Drivers of inter-annual variability in Net Ecosystem Exchange in a semi-arid savanna ecosystem, South Africa

. Inter-annual variability in primary production and ecosystem respiration was explored using eddy-covariance data at a semi-arid savanna site in the Kruger Park, South Africa. New methods of extrapolating night-time respiration to the entire day and ﬁlling gaps in eddy-covariance data in semi-arid systems were developed. Net ecosystem exchange (NEE) in these systems occurs as pulses associated with rainfall events, a pattern not well-represented in current standard gap-ﬁlling procedures developed primarily for temperate ﬂux sites. They furthermore do not take into account the decrease in respiration at high soil temperatures. An artiﬁcial neural network (ANN) model incorporating these features predicted measured ﬂuxes accurately (MAE 0.42 gC/m 2 /day), and was able to represent the seasonal patterns of photosynthesis and respiration at the site. The amount of green leaf area (indexed using satellite-derived estimates of fractional interception of photosynthetically active radiation f APAR ) , and the timing and magnitude of rainfall events, were the two most important predictors used in the ANN model. These drivers were also identiﬁed by multiple linear regressions (MLR), with strong interactive effects. The annual integral of the ﬁlled NEE data was found to range from − 138 to +155 g C/m 2 /y over the 5 year eddy covariance measurement period. When applied to a 25 year time series of meteorological data, the ANN model predicts an annual mean NEE of 75( ± 105) g C/m 2 /y. The main correlates of this inter-annual variability were found to be variation in the amount of absorbed photosynthetically active radiation (APAR), length of the growing season, and number of days in the year when moisture was available in the soil.


Introduction
Carbon dioxide flux measurements using the eddy covariance technique generate a raw dataset with a very high temporal resolution (generally 10-20 Hz). The first step in the analysis of these data is to screen them for spurious values, perform various corrections, and then integrate the fluxes over periods of about 30 min. The half-hourly data provide important insights into many short-term physiological processes, but most ecological and management-relevant questions are framed over even longer timeframes -from days to years. A matter of particular interest to both ecologists and ecosystem managers is the inter-annual variability of primary production and carbon storage (Lauenroth et al., 2006). Semi-arid savannas are characterised by high inter-annual variability, in response to highly variable rainfall. This underlies many features of their ecology, including the likelihood and intensity of fires, the growth and migration of animal populations, and the stability of the tree-grass mixture (Higgins et al., 2000;Tyson, 1986;Reed et al., 1994;Ma et al., 2007;Serneels et al., 2007), and makes savanna systems particularly hard to manage.
Accumulating 30 min flux measurements to longer time periods is not a simple matter of adding them up, for two main reasons. The first is that even the best-run eddy covariance datasets have gaps, due to instrument failure or weather conditions that cause the eddy covariance flux assumptions to be violated. The second is that the eddy covariance measurement, net ecosystem exchange (NEE), is often not what is needed by ecologists who are often more interested in its components, gross primary production (GPP) and ecosystem respiration (R eco ): Published by Copernicus Publications on behalf of the European Geosciences Union. 252 S. A. Archibald et al.: Inter-annual variability in NEE NEE=GPP+R eco (observing the convention that fluxes from the atmosphere to the ground are given a negative sign).
A model is used to bridge the data gaps in what is intended to be an unbiased fashion. The same or different models can be used to deconvolve the NEE signal into its components. A wide range of standard procedures have been developed for this process, largely for application in temperate ecosystems (Falge et al., 2001;Papale et al., 2006;Moffat et al., 2007). These are not always appropriate for tropical wet-dry systems. They use phenomenological models, neural networks or process-based models to achieve their objectives. The readily-available ones do not work well for data from semiarid sites in Southern Africa. This is because they assume the major controls on flux processes to be solar radiation and temperature, whereas temperatures in the semi-arid tropics are almost always warm enough to permit physiological activity, and insolation is sufficient, at least during non-cloudy days, for light saturation of part or all of the typically-sparse canopy. In arid and semi-arid systems, the main control on the rate and duration of many ecosystem processes is soil moisture.
As a further complication, in low-rain, high-evaporation ecosystems, where the soils dry out between successive rainfall events (so-called pulse-driven systems), the various terms in the carbon budget are highly dependent on the recent history of the system (Huxman et al., 2004). For example, following a rainfall event, respiration increases rapidly whereas it takes several days for the ecosystem to reach maximum photosynthesis (Huxman et al., 2004;Xu et al., 2004). Similarly, the magnitude of the system response depends not only on the size of the current rainfall event, but on the amount and timing of preceding events: after a long drought the response to a rain event is larger than to a similarsized event during the middle of the rainy season, but the time taken to reach the peak response is longer (Veenendaal et al., 2004). Therefore, it is not possible to use instantaneous measures such as the soil moisture content as a sole proxy for the state of the system. Gap-filling requires consideration of indices that have "memory": for instance, accumulators of water deficit.
Moreover, "phenomenological" models will only be appropriate when they truly represent the underlying responses (Falge et al., 2001). Most current respiration models define the relationship between respiration and temperature using an exponential-or logistic-shaped function; i.e. functions that either continually increase, or level off at a maximum value (Moffat et al., 2007). These models were developed in systems where temperature ranges are generally below 30 • C (Fang and Moncrieff, 2001;Lloyd and Taylor, 1994). Physiologically, respiration is expected to decrease once temperature exceeds the optimum for microbial activity (Yamano and Takahashi, 1983), and photosynthesis shows a similar reduction at high temperatures (Hamerlynck and Huxman, 2000). In tropical dry systems, the soil temperature in the top centimetres often exceeds 40 • C. Thus more appropriate functional forms need to be developed before current gapfilling methodologies can be applied globally.
Improving functional relationships to include extreme conditions would also be valuable in the context of climate change. In coming decades, many ecosystems around the world are likely to be exposed to higher temperatures and reduced moisture availability. Information on ecosystem responses to high temperatures and intermittent droughts will be valuable in predicting responses to these changes.
We present a statistical approach to estimating annual NEE for a semi-arid savanna system in Southern Africa. We tested the importance of six environmental drivers of daily photosynthesis (GPP) and respiration (R eco ) at the Skukuza flux tower in the Kruger Park (25.02 • S, 31.50 • E). Predictors commonly used in temperate systems were included, together with a range of environmental predictors chosen to reflect the effect of pulsed rainfall events. Predictive models were then used to interpolate annual fluxes over a 25 year time period, and to investigate the degree and possible causes of inter-annual variation in CO 2 exchange.
Our approach was motivated by the fact that there was a limited amount and duration of flux data (spanning six years with many gaps, which is too short for a reliable estimate of variance), but that a full time series of daily meteorological and phenological data was available for a 25 year period. Working at a daily time-step allowed us to bridge the gap between the half-hourly flux data and the crucial annual timescale, and to use the long-term meteorological data to estimate inter-annual variability. Process-based modelling would be ideal for these systems where previous conditions affect the response of the system to perturbation, but we chose to limit ourselves to a statistical analysis, given our imperfect understanding of the processes driving NEE in these systems. Results from this research will be used to develop more process-based models.
This paper aims to: -Document new procedures for eddy covariance gapfilling that are appropriate for dry, hot ecosystems; -Explore the factors associated with short-term (daily) variation in NPP, GPP and R eco ; -Calculate annual estimates of NEE and explore the main factors driving inter-annual variation in savanna carbon exchange at the Skukuza flux site in South Africa.

Study site
A flux tower situated in a semi-arid savanna near Skukuza, in the Kruger National Park has been collecting data since February 2000. The site is 370 m above sea level with strongly seasonal rainfall occurring between November and April. Mean annual rainfall is 550 ±160 mm. The landscape is gently undulating, consisting of broad-leaved Combretum apiculatum-dominated savanna on the coarse sand crests and fine-leaved Acacia nigrescens savanna on sandy clay loam in the valleys (Scholes et al., 2001). The soils are about 0.6 m deep. The eddy covariance flux tower is situated at the ecotone between the two vegetation types. The woody vegetation reaches 8-10 m in height and the flux sensors are at 17 m, giving the tower a footprint of about 500 m. The vertically projected tree canopy cover in this area is about 30% and woody basal area is 7 m 2 ha −1 . The grass layer is dominated by Panicum maximum, Digitaria eriantha, Eragrostis rigidor, and Pogonarthria squarrosa.
The tower is instrumented with a sonic anemometer (WindMaster, Gill Instruments Ltd., Lymington, UK) measuring wind velocity in three dimensions and a closed-path infrared gas analyzer (LI-6262, LI-COR Inc., Lincoln, NE, USA) measuring water vapour and CO 2 concentration. The raw high frequency (10 Hz) data was processed following (Lee et al., 2004) to produce half-hourly measures of abovecanopy turbulent fluxes of sensible heat, water vapour, and carbon dioxide. Heat and mass fluxes were calculated based on conventional equations and corrections (see e.g. Moncrieff et al., 1997;Aubinet et al., 2000) and all fluxes are reported as positive upward from the land to the atmosphere. Canopy storage flux was estimated from the half-hourly time derivative of a 16 m column integral based on CO 2 concentrations measured at 0.75, 2.0, 3.5, 5.25, and 16 m, and added to the above-canopy turbulent flux for data analysis. Incoming and outgoing long-and shortwave radiation was measured with a net radiometer, (NR Lite, Zipp & Zonen B.V., Delft, The Netherlands), with the incoming and outgoing shortwave radiation measured with pyranometers (CM 21, Zipp & Zonen B.V., Delft, The Netherlands) mounted at 22 m.
Average half-hourly volumetric soil (θ) water content was estimated with 15 cm long frequency domain reflectometry sensors (CS616, Campbell Scientific Inc., Logan, UT, USA) installed horizontally at soil depths of 3, 7, 16, 30, and 50 cm in the clayey Acacia -dominated soils downhill of the tower, and 5, 13, 29, and 61 cm in the sandier Combretum -dominated soils uphill. Rainfall per half hour was measured with a tipping bucket rain gauge (TE525, Campbell Scientific Inc., Logan, UT, USA) located on the tower top, along with other standard meteorological variables such as air temperature and humidity, wind speed and direction.

The effect of the ecotone
The differences in soil properties and species composition above and below the seepline were expected to be apparent in the flux data from the tower. To test this we separated the half-hourly fluxes into predominantly broad-leafed and predominantly fine-leafed (based on the wind direction (Fig. 1). The only observable difference between the two vegetation types was a slightly higher night time carbon flux in the broad-leafed savanna during the dry months -this was not considered significant enough to justify separating the fluxes into two datasets with the consequent reduction in sample size. Kutsch et al. (2008) similarly notes that the data "show no significant differences between the savanna types in terms of fluxes". Whether this was due to a lack of ability to differentiate between fluxes from the two sites, or because at the landscape level the differences are not significant, we chose to complete the rest of the analysis using all flux data as one unit. We used a model to create an integrated site-level soil moisture record (For soil moisture modelling methods see Archibald and Scholes, 2007). See Appendix A for a calibration of measured vs modelled soil moisture.

Data processing and gap filling
Flux data were available from February 2000 to December 2005 (the site continues to operate, but with an open-path IRGA). Of the half-hourly data, 41% was missing, which is slightly more than the average among flux sites of 35% (Falge et al., 2001). As rainfall occurs during summer months of November to April the flux data were summarised by rainfall years (July to June) which provided five full years of flux data -with data coverage ranging from 30-74 % annually. Most of the data gaps were for a single half hour interval, but instrument failure due to lightning strikes resulted in six gaps of over two months duration, usually occurring during summer periods. These large, non-random gaps limit the types of gap filling approaches that can be used.  When a u * filter of 0.1 m/s was applied to eliminate periods of low turbulence during which eddy covariance measurements are unreliable (Goulden, 1996), the missing flux data increased to 49%. Linear interpolation was used to fill gaps <2 h in duration, which reduced the data gaps to 44%. These half-hourly data were then summed to calculate daily NEE values for all days with unbroken 30-min time series. The result was 698 days of NEE data. These days were not randomly distributed through the year, with the rainy months (particularly December and January) having many fewer data points than the dry months of June through September (Fig. 2). Dry, winter conditions are therefore over-represented in the sample. In addition, one of the periods of most continuous and cleanest observations spans an intense drought, 2002-2003 growing season, further biasing results.
Simple gap-filling techniques using mean daily averages are inadequate for filling gaps in the Skukuza data because the stochastic and variable NEE response over the course of a wetting event would not be well represented by a summary value and because gaps in the data often span several weeks. Non-linear regression methods work well when there is just one main driver of carbon uptake or release (in temperate systems, temperature is normally used to drive respiration, and APAR to drive photosynthesis (Moffat et al., 2007). However, the presence of multiple drivers at the Skukuza site means that single-parameter non-linear methods are unlikely to be sufficient.
Similarly, Marginal Distribution Sampling (MDS: Reichstein et al., 2005) uses a look-up table approach which fills gaps by taking the average value for data collected under similar meteorological conditions within a certain window of the missing data. In this way it accounts for temporal auto-correlation as well as co-variation with meteorological drivers. However, when there are long gaps this method breaks down, and the choice of "similar" meteorological conditions requires that the appropriate hydrological indices are considered in the model -current implementations use only radiation, temperature, and vapour pressure deficit (http://gaia.agraria.unitus.it/database/eddyproc/). We used artificial neural networks (ANN) as our gapfilling approach, as this method accommodates non-linear relationships between variables but requires few a priori assumptions on the relative importance of different variables or their functional relationships. The usefulness of ANNs depends entirely on the appropriate selection of input variables -and we hoped to improve on standard methods available by choosing variables which would reflect the pulsed response to soil moisture in arid systems. We also ran standard multiple linear regression models on the data to explore interactive effects between the variables. This approach allowed us to investigate the important drivers of NEE, as well as develop models which could be used for prediction using long-term meteorological data.

NEE, photosynthesis, respiration
Half-hourly night-time fluxes were used to estimate the daytime respiration. A stricter u * threshold of 0.25 ms −1 (Kutsch et al., 2008) was used for this analysis, as it was more important to have reliable data than large sample sizes. Respiration is controlled by temperature, which generally varies quite predictably over the course of a day, as well as variables such as soil water content and the amount of actively photosynthesising leaf material, which are relatively constant over a single day, but vary over longer time scales. We therefore took a two-scale approach to determining day-time ecosystem respiration: we derived a temperature response curve by fitting it to "optimum" respiration conditions -i.e. the maximum values measured at a range of temperatures (all valid halfhourly night-time fluxes were used for this). This curve was used to estimate the maximum potential respiration rate for each daylight interval, using the daytime temperature trend as input (see Appendix B for more details on this method). The actual respiration during any particular day was then estimated as the temperature-driven "potential" scaled by the ratio of observed night-time respiration to the potential nighttime respiration for that day. By multiplying the temperature response function of R eco by the scaling parameter, the estimated respiration values are shrunk towards the mean respiration value for that day, thereby resulting in a day-specific temperature response function for R eco . This scaling factor was assumed to account for the effects of soil moisture and physiological activity. Unlike the flux-partitioning method of Reichstein et al. (2005) this method does not require a separate temperature response function to be derived for each day.
Conventional exponential (e.g. Arrhenius, Lloyd-Taylor) temperature functions were not considered appropriate representations of the response functions, as day-time temperatures at the site often exceed that which is optimum for microbial activity (Yamano and Takahashi, 1983). An analysis of independently-collected respiration data from the site, collected using soil chambers, indicated that a generalised Poisson temperature relationship produced the best fit to measurements of soil respiration (Kirton et al., unpublished data).
We therefore used the following equation to describe the optimal temperature response: Parameters were estimated using a non-linear least squares by means of the Levenberg-Marquardt algorithm: where values in brackets represent the standard error of the estimate. Only days when there were more than three valid night time flux values with which to estimate the scaling parameter were used to interpolate day-time fluxes. See Appendix B for details on this method and a comparison with other methods. Negative night time fluxes were excluded from the model fitting, as there was no theoretical justification for negative respiration. Interpolated respiration values that dropped below zero (which can occur at very high or low temperatures, using the parabolic curve) were given a value of zero. This method produces predicted respiration values with similar distributions to those recorded for all conditions of soil moisture and f APAR (Fig. 3).
Daily respiration (R eco ) values were obtained by calculating a half-hourly value (multiplying the per second value by 60 × 30) and summing this over the 48 half-hours. All other daily values were calculated in the same way. Daily Gross Primary Production (GPP) was calculated by subtracting the interpolated day-time respiration values from the recorded daytime NEE values, and summing over the daylight hours. This resulted in a dataset with 372 valid daily records for R eco and 529 for GPP.

Drivers of NEE
In temperate systems incoming solar radiation (PAR) and temperature are the main drivers used to predict photosynthesis and respiration. In some models these are modified by measures of LAI and soil moisture (Moffat et al., 2007). We chose to test six input variables as predictors of GPP and R eco (see Table 1).
Only data that could be derived from standard daily South African Weather Services (SAWS) climate records or longterm low-resolution satellite vegetation indices were used as  (red) halfhourly respiration values over temperature. Data are presented for all conditions, for periods of low soil moisture, for periods with little leaf material (low f APAR ), and for conditions of low soil moisture and f APAR . Interpolated values lie well within the distribution of observed values for all conditions. It is also clear that respiration drops off at high temperatures, and that temperature-response functions need to include this reduction at high temperatures if they are to be appropriate for this site.
input predictors, in order that the models could be used in conjunction with the long-term records to estimate NEE over periods much longer than the eddy covariance data would permit. The daily time-course of temperature variables was estimated from daily maximum and minimum air temperature. Soil water content was modelled using a simple bucket model and Penman-Monteith evapo-transpiration functions (Archibald and Scholes, 2007). The half-hourly meteorological data available at the flux tower was used to validate these models (see Appendix A).
Three different measures were used to indicate the hydrological state and history of the ecosystem: Relative Available Water Content (RAWC); water deficit (a function which accumulates the deficit for all days of water stress θ <θ crit until rewetting occurs); and time since wetting (the time since the last big wetting event -i.e. time since θ increased above θ crit ). Equations for these indices can be found in Table 1. Mean air temperature -which correlates well with soil temperature (Appendix A) -was used as the predictor of R eco , whereas mean daytime temperature was used at the predictor for GPP. The European Joint Research Centre 10-day f APAR product (Pinty et al., 2002) was linearly interpolated to create a daily f APAR parameter. A relationship between AVHRR- Table 1. Defining the six input variables used in the models to predict GPP and R eco . All input variables were derived from data available at a daily level from the SA Weather Services, so they could be used to produce long-term predictions. Relative available water content (RAWC) is calculated as in the formula below where θ = volumetric soil moisture content, WP = wilting point and FC = field capacity. Accumulated water deficit (wdef) is the accumulated difference between θ crit and θ while θ is less than θ crit (plants under water stress) and is 0 whenever θ is greater than or equal to θ crit (plants not under water stress). Time since wetting is the number of days that water deficit has been 0 (plants have not been water stressed).  Tucker et al., 2005) and f APAR was used to define the daily f APAR input for the period before 1998 which was when the Joint Research Centre (JRC) dataset started (see Appendix A).

Modelling approach
Two different artificial neural network (ANN) methods were tested: Generalised Regression Neural Network (GRNN) and Multi-Layer Feed Forward Neural Network (MLF). The GRNN is based on a kernel smoothing approach and has the advantage of using non-parametric regression procedures (which makes no assumptions about the underlying data) and can be trained quickly as only the smoothing parameter needs to be estimated and optimised. As has been found in other studies (Cigizoglu, 2005;Currit, 2002;Kisi, 2006) this method is efficient for modelling non-linear systems and worked as well as the more traditional MLF, which required excessive fine-tuning to optimise the system architecture. Three separate models were developed for predicting R eco , GPP, as well as daily NEE. Models were developed using 80% of the data for training and 20% for testing (proportions of 70-30% were also tried, without substantially changing the results). Multiple linear regression (MLR) equations with up to three-way interactions were examined for both photosynthesis and respiration. A combination of backward selection and stepwise selection was used to obtain significant predictors in the model. The ability of the MLR to explore the importance of different variables separately and in combination added value to the results of the ANN. However, there are strong theoretical reasons against using ordinary least squares (OLS) regression for data-filling (Richardson and Hollinger, 2005), which is why we restricted their use to exploring the relationships between variables. Many of the meteorological variables, at least over a certain range, are expected to have a near-linear relationship with respiration and photosynthesis. Temperature is an exception: therefore quadratic terms of temperature were also included during the model selection process.

Error estimation
The random error component of the total error in the daily carbon fluxes was considered in an attempt to obtain a confidence interval for the annual estimates of NEE. The systematic component of the error was not assessed for this paper, but this analysis will be carried out at a later stage. To estimate the random error, the method described by Richardson et al. (2008) was used, where the model error was used as a surrogate for the random error. The error of the daily ANN model prediction (difference between the observed and modelled daily fluxes) was calculated for all cases where the observed daily fluxes were available. The distribution of these errors fitted a Laplace distribution better than a normal distribution (Chi-squared tests for goodness of fit were χ 2 =37.37 compared with χ 2 =111.01 for the normal distribution). Richardson et al. (2008) also found the errors in halfhourly flux data to be distributed according to the Laplace distribution.
We assumed the daily errors were independent and identically distributed. This allowed us to use the Central Limit Theorem to assume normality for the annual sum of the errors in the fluxes. The expected value of the errors was assumed to be zero and the variance of the errors was estimated by the sample variance. The approximate standard error for the annual estimates was then calculated to be 12.9 g Cm −2 year −1 , leading to coefficients of variation from  Maximum CO 2 sequestration occurs when soil moisture is low but green leaves are still present. Wet conditions were defined as periods when the soil moisture was greater than 9% volumetric water content, dry conditions, less than 6%. Periods with green leaves were defined as periods when the f APAR value was greater than 0.2. The average number of days each year for each combination of physiological and soil moisture conditions are shown, together with the average daily sum of NEE (g C/m 2 /day) for these conditions. 8-30%, and therefore the error in the annual NEE estimates is 25.3 g Cm −2 year −1 with 95% confidence. This agrees with the estimate of random error obtained by Richardson and Hollinger (2005), where they used the Monte Carlo simulation to estimate the error in the model parameters and model estimates. Goulden et al. (1996) and Oren et al. (2006) both reported instrument error of approximately 5% for closed path eddy covariance systems. If the same instrument error can be assumed for the Skukuza data, this increases the error value by between 4.1 and 15.5 g Cm −2 year −1 .

Carbon balance
The diurnal time-course of NEE is highly responsive to soil moisture and the presence of green leaves (Fig. 4). Interestingly, maximum CO 2 uptake occurs during periods of low soil moisture when green leaves are still present, because under these circumstances the contribution of soil respiration is low, but a substantial amount of photosynthesis is still occurring using water stored in the plant, or accessed from deeper soil layers that do not contribute much to ecosystem respiration. 3.2 Gap-filling: modelling R eco and GPP Despite the relative paucity of daily data both the ANN and multiple regression methods produced models which reasonably represented the input data (Table 2). Mean absolute error (MAE) ranged from 0.37 to 0.56 g C/m 2 /day, which compares favourably to the 1-1.5 g C/m 2 /day range of values reported by Moffat et al. (2007) for a range of gap-filling methods and vegetation types. Respiration was generally harder to  Table 4a. Results of a multiple linear regression to predict ecosystem respiration (a), and GPP (b). Parameters are displayed in order of decreasing significance and non significant parameters are excluded from the tables. The best respiration model included f APAR , time since wetting, soil temperature, and relative available water content, and two-way interactions between these variables. This corroborates the findings of the ANN model, but does not produce a good prediction (r 2 =0.41, MAE=0.85 g C/m 2 /day). The best GPP model included f APAR , time since wetting, relative available water content, mean daytime temperature, and threeway interaction between several variables. This also corroborates ANN results, and produces a reasonable prediction (r 2 =0.68, MAE=0.62 g C/m 2 /day). The ANN identified available green leaf material (indexed by f APAR ) to be the most important predictor of both R eco and GPP, but f APAR was relatively more important for predicting GPP than for predicting R eco , as would be expected (Table 3). We interpret the role of f APAR in driving R eco as reflecting the availability of readily-respired sub-strate. For GPP the time since wetting event was the next most important predictor, which corroborates findings of Wohlfahrt (2008) and Xu et al. (2004) that there is a delay in the pulse of photosynthetic activity after a rainfall event. In terms of water relations, soil moisture content was the best predictor for R eco , but water deficit and time since wetting were also identified as important. Interestingly, temperature did not prove to be important in predicting either respiration or photosynthesis. This could reflect the daily time-step at which we did the analysis -in this sub-tropical system temperature variation between days and over the growth season is much less important than variation in leaf dynamics and soil moisture in driving NEE.

Std. Estimate Error t-value P
For respiration models using MLR, f APAR and time since wetting were the most significant single predictors (Table 4). Interactions between various soil moisture parameters and f APAR also significantly improved the fit of the respiration model. As can be seen in Fig. 4, the effect of a parameter like soil moisture greatly depends on the amount of photosynthesising green leaf material, so it is unsurprising that these interaction terms are important.
In the photosynthesis model soil moisture was very significant, and three-way interactions between f APAR , soil moisture, PAR, and time since wetting were important in improving model fit. The importance of the interactive terms perhaps goes some way to representing the delayed photosynthetic response to wetting events identified by Xu et al. (2004). It usually takes 5-7 days in this system before photosynthesis reaches its maximum after a wetting event, and this response depends on how much leaf material is present. Temperature was included in both the GPP and R eco models as it produced significant interactions with other variables, but as a main effect it was not significant. The ANN net ecosystem exchange model had the lowest error (Table 2), so this model was used to gap-fill the six year dataset .

Inter-annual variability
Annually-integrated net ecosystem exchange varied from −138 to +155 gC/m 2 /y over the 5 year period for which there were flux data (Table 5). In drought years limited carbon uptake occurs even during the height of summer, but in years with above average rainfall the site can be a sink of carbon for several months of the year (Fig. 5). Only two of the five years had negative NEE (in other words, were net carbon sinks at the annual timescale). It is possible that our gap filling methods over-estimate the amount of respiration occurring at this site: there was very little data available during the summer months (Fig. 2), so the model was probably not well trained to identify days of maximum GPP in this system. To test this we will need to acquire a more extensive summer dataset for this site. Estimates of random error suggest that years where predicted annual NEE was within ±25 gC/m 2 /y might actually have been close to carbon-neutral. If systematic error is included, this error estimate increases further to up to approximately 40 gC/m 2 /y. When the 25 year NEE sequence is predicted the pattern becomes more obvious (Fig. 6). The site was predicted to be a net sink for carbon in only 6 of the 25 years, but three other years (1989, 1996, and 2000) may have been near-sinks. The data give a long-term mean annual NEE of 75(±105) g C/m 2 /y. Loss of a cohort of aging Acacia nilotica trees at the site, and increased stem damage with increasing elephant populations over the last 20 years might both and four potential drivers of inter-annual variability in carbon uptake: annual rainfall, available photosynthetically active radiation, length of the growing season, and number of growth days. Annual rainfall seems to be the least significant, compared with parameters that include seasonal variation in leaf display (APAR and length of growing season), and the seasonal distribution of rainfall. Solid circles represent years 2000-2005 for which flux data were available to constrain the model. contribute to making this site appear as a net source in this analysis. Recent field data at the site record high rates of tree turnover: 8%(±3%) per annum -with damage by elephants and senescence of old Acacia nilotica trees being the main cause (Archibald, unpublished data). These turnover rates are high, but not exceptional for southern African savannas (Shackleton, 1997), and it is perfectly feasible that tree growth could match these losses. Therefore, it would be precipitous to speculate further on the implications of the long-term predictions until there is better information on tree productivity, and more peak-growing season flux data with which to calibrate the models. The most useful information provided by the long-term prediction are estimates of the inter-annual variation for this site. Figure 7a indicates that there is a strong relationship between predicted annual NEE and absorbed photosynthetically active radiation (APAR, which is PAR * f APAR ). This analysis suggests that once annually accumulated APAR exceeds about 675 MJ/m 2 , the system becomes a sink for carbon (Fig. 7a).
It might seem surprising that soil moisture, which was so important at a daily time scale, does not show a stronger relationship with annual NEE. Even when photosynthesis and respiration are considered separately (Fig. 7b, c), by far   the best relationship is found with APAR. This result makes sense when one considers that both the ANN and the MLR analyses showed strong interactive effects of soil moisture with f APAR -i.e. the effect of available soil moisture in driving Pn and R eco depends heavily on the amount of photosynthetically active green leaf material. Similarly, soil moisture has been shown to be an important driver of seasonal patterns of leaf display at the site (Archibald and Scholes, 2007). Therefore f APAR can be seen as an integrated measure of Table 5. Summary of NEE over the 5 year period for which there was flux data. Negative values represent an overall sink of carbon. Data gaps were filled using an ANN and predictors f APAR , water deficit, relative soil moisture content, mean day time temperature, time since wetting, and mean soil temperature, in that order of importance. Also reported are annual summaries of rainfall, available photosynthetically active radiation, length of the growing season, and number of growth days (days when soil moisture content is greater than θ crit , 7% by volume).

Rainfall year
Annual NEE 95% confidence Annual rainfall Annual PAR Growing season Number of (July to June) ( hydrological conditions at the site, which is better at predicting annual-scale carbon exchange than any measure derived from short-term measurements of daily soil moisture. For example in the 2003-2004 rainfall year the total annual rainfall was above average (618 mm) but it was heavily skewed towards the last part of the growing season, and the start of grass growth was delayed by several weeks. In this instance integrated values of APAR would represent the growing conditions for a season better than total rainfall, or even number of growing season days.

Other pathways of carbon loss from the system
A savanna carbon budget would be incomplete without a consideration of fire and herbivory. The fluxes of CO 2 to the atmosphere via these two pathways have not been directly measured at the Skukuza site, but can be inferred and constrained from other data. The abundant large mammalian herbivore (>5 kg body mass) community in this landscape consists of 14 species, mostly Bovidae. The combined herbivore biomass is 3155 kg km −2 (Scholes et al., 2004). Taking into account the effect of body mass on metabolic requirements and digestability, this translates to a herbivore respiratory flux of 4.5 g Cm −2 y −1 and a flux from the decomposition of dung of 5.0 g Cm −2 y −1 . The uncertainty range associated with these estimates is unknown, but thought to be around 20%, related mostly to errors in game census. The inter-annual variability is thought to be relatively low. The herbivore respiration and dung decomposition fluxes are subsumed in the ecosystem respiration measured by the eddy covariance system (Table 6). The mean fire return time in this landscape in the KNP is 4.2 years (Van Wilgen et al., 2000). The most comprehensive set of fuel measurements for this landscape was taken in August 1992 at 10 locations within 30 km of the Skukuza site (Shea et al., 1996). The combusted material was predominantly dry grass (1442±975 SD kg ha −1 ), tree litter (1452±636 kg ha −1 ) and a contribution from dead wood (226±194 kg ha −1 ) giving a total of 3120±1795 kg ha −1 . A multi-site, multi-year mean grass fuel load for the KNP is 3359 kg ha −1 , with a range of 1152-6728 (Trollope and Potgieter, 1985). The emission factor for CO 2 , measured for the same fires as the above fuel loads (Ward et al., 1996) is 1699±33 gCO 2 kg DM −1 . Therefore, the long-term annualised emission of CO 2 through fire is around 136±58 gCO 2 m −2 y −1 . An additional 6.4±3.9 gCO m −2 y −1 and 0.2±0.2 g CH 4 m −2 y −1 are also emitted from fires, so the total pyrogenic carbon loses are around 40.0±17.5 g Cm −2 y −1 ( Table 6). The flux site has burned five times since 2000 1 , which suggests that the pyrogenic emissions during this period are probably about twice the long-term, landscape-scale averages calculated above. The pyrogenic fluxes are in principle part of ecosystem respiration, but in practice are not measured by the eddy covariance system because they occur briefly, and during that period exceed the measurement range of the infra-red gas analyser. The inter-annual variability is high because a given site does not burn at all in most years, and the fuel load varies greatly in the years when it does burn, in response to the variability of rainfall in the preceding season. The total carbon budget -including fire, herbivory, and plant growth and decomposition -for the skukuza site is therefore 115.0±122.5 g Cm −2 y −1 . 262 S. A. Archibald et al.: Inter-annual variability in NEE

Conclusions
Inter-annual variability in carbon exchange at the Skukuza flux site is on the same scale as an oak savanna in California. In a seven year study Ma et al. (2007) measured values from −155 to −56 g Cm −2 y −1 in the savanna and from −88 to 141 g Cm −2 y −1 in an adjacent grassland. This compares to −138 to 155 g Cm −2 y −1 from the six year Skukuza dataset. The variability at Skukuza seems to be largely controlled by variations in the length of time that green leaves are displayed by the trees and grasses, and by changes in seasonal patterns of water availability (Fig. 7) -both ultimately driven by variations in rainfall between years.
The flux-partitioning and gap-filling procedures developed in this paper are a distinct improvement on standard methodologies largely because they use more appropriate temperature-response functions and explicitly include a soil moisture control, including indices of the wetting history. Estimates of annual CO 2 flux obtained through gap-filling using an ANN may be slight over-estimates (i.e., slightly biased toward the sink side), because of the paucity of peak growing season flux data. However, it is also possible that this particular savanna site has been a carbon source in recent years due to high tree turnover. Results of the ANN gap-filling procedures and MLR models indicate a large degree of interaction between driver variables and lend support for the development of a process-driven model for this system. Such a model would need to include explicit measures of leaf mass, soil moisture and temperature.
The generalised Poisson function used here to fit an optimum temperature response curve is an effective method for extrapolating day-time respiration in systems where temperatures often exceed 30 • C -provided a scaling factor is used to control for the co-limiting factors of LAI and soil moisture. At a daily to seasonal level, however, temperature was shown to be less important than other factors in influencing NEE.

Comparison of meteorological data
Correlation between the flux tower variables and corresponding variables from other sources appears in Table A1. Strong linear relationships exist between the flux tower daily measurements for the mean soil temperature and the mean daytime temperature and the corresponding temperature variables derived from the minimum and maximum daily temperatures of the South African Weather Services (SAWS) data. There is also a strong linear relationship between the measured mean soil moisture and the modelled soil moisture using the SAWS data, as well as a fairly strong linear relationship between PAR derived from the shortwave radiation from the flux tower and the modelled PAR. The correlation between the flux tower rainfall and the SAWS rainfall is significant, but not as strong as that of the previous comparisons to SAWS derived variables. The peaks of the environmental data are usually slightly higher than recorded from the flux tower, although there are few days when the flux tower recorded higher values. This could be due to localised rainfall events. Peaks in the data do not always correspond and this could be due to the measurements from the SAWS data being taken daily from a rain gauge, whereas the flux tower took instantaneous measurements of rainfall. Therefore daily rainfall events may not always correspond exactly. The pattern of rainfall over time appears to match for the two data sets. The annual sum of rainfall for the environmental data is always more than that for the flux tower data (Table A1). This is due to missing data from the flux tower.
There is a strong linear relationship between Gimms NDVI and f APAR (Table A2). Therefore a linear regression equation was derived to describe this relationship. The linear regression obtained a r 2 -value of 0.71 and an MAE of 0.05.
The standard error for the intercept is 0.004 and the standard error for the slope is 0.009.

Appendix B Interpolating day-time respiration
Fitting an optimal temperature function to the mass of nighttime flux measurements involved making several assumptions about a) the shape of the temperature-respiration curve, and b) the values to use to fit the curve.

B1 Shape of the temperature-response curve
Field data indicate that a generalised Poisson function is the best descriptor of the effect of temperature on respiration, as it describes both the exponential increase of respiration with temperature and the sudden decrease once the temperature optimum has been reached (Kirton et al., unpublished data). However, for this analysis we also tried a simple parabolic function.
B2 Values used to fit the curve This interpolation method relies on deriving a curve that represents the temperature response under a certain set of environmental conditions. Any deviation from this line by an observed point is then assumed to be due to different environmental conditions. The curve can be pulled up and down to match this point, and thereby adjust for these varying environmental conditions, by the use of a scaling parameter. Missing respiration values (day time points) can then be interpolated on this day (because the environmental conditions other than temperature are going to remain stable at a daily time step) by using the temperature at each point and the adjusted temp/resp equation.
With this in mind, extracting the points to be used could be done in a number of different ways. The easiest way to identify points where all factors other than temperature are constant would be to identify the maximum points for each ±25% quantiles, bar: data range). The median and ±25% quantiles are very similar for each method, but the method that calculates the fitted values had slightly lower maxima than the other two methods. All data are well within the range of measured R eco values (u * -corrected half-hourly night-time fluxes). temperature value (which would represent respiration under completely optimal conditions of soil moisture and LAI). We tried three different methods for extracting these values: manually picking the maximum respiration values, calculating the maximum respiration value for each degree temperature change, and calculating the 95th quantile for each degree temperature change (Fig. B1). We also tried manually picking values at the top of the thickest part of the cloud of respiration points. This approach would exclude any extreme outliers but could also be assumed to represent the same set of other environmental conditions. Because the curve is adjusted up and down based on the respiration values on the day in question, the position of the curve on the y-axis is unimportant. It is the shape of the curve that will affect the interpolation.
Using the 95th quantile was not satisfactory as some temperature categories had orders of magnitude more respiration measurements than others. We therefore abandoned that method and tested six different respiration interpolation methods (Table B1): manually selected maximum points (fitting a parabolic and generalised Poisson), manually selected points at edge of data cloud (parabolic and GDP), and calculated maximum points (parabolic and GDP).

B3 Results
Results indicate that the interpolated values are very resilient to the method used to fit the temperature response curve. The distribution of interpolated points was similar for all six methods (Fig. B2), and linear regression models show similar fits to the observed respiration data (Table B1). A visual assessment of the interpolated points (Fig. B3) indicates Table B1. The six different methods used to fit a temperature response curve to the measured night-time (respiration) fluxes. Two different fitting functions were used, and three different methods for identifying points to fit the curve to. The distributions of the data interpolated with each method were very similar to each other (Fig. B2), and fell well within the bounds of the observed respiration data (Fig. B3) Figure B3: The distribution of measured half-hourly night-time fluxes (black circles) and 921 interpolated half-hourly respiration (red crosses) along a temperature axis. Interpolated fluxes 922 represent all half-hour values which had soil temperature data and at least three night-time 923 fluxes to estimate the scaling parameter. 924 Fig. B3. The distribution of measured half-hourly night-time fluxes (black circles) and interpolated half-hourly respiration (red crosses) along a temperature axis. Interpolated fluxes represent all half-hour values which had soil temperature data and at least three night-time fluxes to estimate the scaling parameter. that the generalised Poisson interpolations fell more clearly within the main data cloud. We therefore chose to use the calculated maximum value method fitted to the generalised Poisson distribution.