OzFlux Data : Network integration from collection to curation

Measurement of the exchange of energy and mass between the surface and the atmospheric boundary-layer by the 10 eddy covariance technique has undergone great change in the last two decades. Early studies of these exchanges were confined to brief field campaigns in carefully controlled conditions followed by months of data analysis. Current practice is to run tower-based eddy covariance systems continuously over several years due to the need for continuous monitoring as part of a global effort to develop local-, regional-, continentaland global-scale budgets of carbon, water and energy. Efficient methods of processing the increased quantities of data are needed to maximise the time available for analysis and 15 interpretation. Standardised methods are needed to remove differences in data processing as possible contributors to observed spatial variability. Furthermore, public availability of these datasets assists with undertaking global research efforts. The OzFlux data path has been developed (i) to provide a standard set of quality control and post-processing tools across the network, thereby facilitating inter-site integration and spatial comparisons; (ii) to increase the time available to researchers for analysis and interpretation by reducing the time spent collecting and processing data; (iii) to propagate both 20 data and metadata to the final product; and (iv) to facilitate the use of the OzFlux data by adopting a standard file format and making the data available from web-based portals. Discovery of the OzFlux data set is facilitated through incorporation in FluxNet data syntheses and the publication of collection metadata via the RIF-CS format. This paper serves two purposes. The first is to describe the datasets, along with their quality control and post-processing, for the other papers of this Special Issue. The second is to provide an example of one solution to the data collection and curation challenges that are 25 encountered by similar flux tower networks worldwide.


Introduction
Studies of the interactions between terrestrial ecosystems and the atmosphere have evolved over the last century, and the development of eddy covariance (EC) between the early studies and current networks has faced several challenges.In the mid-to late 1950s, the first eddy covariance systems were designed and built by researchers at the Australian Commonwealth Scientific and Industrial Research Organisation (CSIRO; Dyer, 1961), when the challenge was to build the instruments with which to measure atmospheric turbulence and turbulent transport of momentum and scalars.These early experiments and those that followed made the basic measurements possible and led to rapid advances in instrumentation technology.Even still, measurements were restricted to short, intensive field campaigns for several decades to follow.By the mid-to late 1980s, the First ISLSCP Field Experiment (FIFE) was organised to extend the studies to full growing seasons (Kim and Verma, 1990).This intensive field campaign was a large collaborative venture that produced insight into fundamental processes of turbulence, CO 2 fluxes and thermodynamics of the surface layer (Kim and Verma, 1990;Norman et al., 1992;Dias and Brutsaert, 1996).Through the 1990s, EC studies were organ-P.Isaac et al.: Network integration from collection to curation ised into similarly intensive, multi-disciplinary projects at a single location: OASIS (Observations at Several Interacting Scales) during 1994 and 1995 in New South Wales, Australia (Cleugh et al., 2004;Leuning et al., 2004), and BOREAS (Boreal Ecosystem-Atmosphere Study) 1994-1996in Manitoba, Canada (Goulden and Crill, 1997;Goulden et al., 1997;Hogg et al., 1997;Blanken et al., 1998).These large studies were intermittent in space and time.
Shortly after the OASIS and BOREAS experiments, several regional flux networks were formed: EUROFLUX in 1998 (Tenhunen et al., 1998), Ameriflux in 1996 (Hollinger et al., 1999;Wilson and Baldocchi, 2000) and OzFlux in 2001 (Beringer et al., 2016a).FLUXNET, the global "network of networks", was established in 1998 (Baldocchi et al., 2001) initially for validation of remote sensing products but now operating in a wider capacity as a clearing house for global flux data sets.The current grand challenge has become to develop routines for processing the data and correcting for instrumental and physical artefacts.These algorithms were developed independently in various laboratories and shared across the network.The network approach increased the number of investigators who could apply the EC method, which led to rapid advances in instrumentation.These advances facilitated the continuous application of EC over years, and this continuous data stream has led to the use of flux data for process studies and model parameterisation and validation.However, data were still sparsely distributed across the globe, but regional networks continued to grow, both internally and with the addition of networks in other parts of the world (e.g., ChinaFLUX).
With the expansion within these flux networks, the challenge is to handle effectively the large volumes of data generated by many towers in multiple ecosystems, whilst assuring data quality, and this has driven the development of integrated software packages (e.g., EdiRe, EddyPro, TK3).At the same time, there has been a parallel push to integrate across networks, requiring standardisation in the quality and format of data sets to easily combine them into larger databases like the LaThuile and FLUXNET2015 releases.This area has been advancing rapidly in recent years, but it is not as mature as instrument technology and processing algorithms that have preceded it.In OzFlux, we developed a solution (OzFluxQC) that provides a standardised set of tools and file formats to facilitate the public availability of data in a timely manner.By offering data consistency across the network, OzFluxQC provides the basis for network integration in the products that are supplied to global networks and other stakeholders, even whilst providing tools that are configurable for the specific conditions at individual sites.Most importantly, standardisation of data at the network level creates the possibility for data archiving that render the data discoverable, reusable and accessible for decades to come, long after the researchers who collected the data have moved on.
Provision of high-quality EC data requires that "bad" observations be removed, generating gaps in the data set.These data sets are increasingly used for construction of local to global carbon, water or energy budgets and in modelling applications (Moffat et al., 2007), as Falge et al. (2001b) had predicted they would.However, gaps in the data have to be filled to produce defensible sums or to incorporate into modelling frameworks.Early efforts using interpolation to fill gaps were replaced by lookup tables developed from the data set (Falge et al., 2001a, b).In this method, gaps are filled from the measured flux during a period of similar vapour pressure deficit, solar radiation and air temperature within a ±7-day window, which is expanded to a ±14-day window if no other similar periods are present within ±7 days (Reichstein et al., 2005).Gap filling via lookup tables is widely used, and the errors from filling gaps this way are smaller than for most methods, other than artificial neural networks (ANNs), which produce comparably small errors when trained upon a full year (instead of the ±7-to 14-day windowed approach used in the lookup tables; Moffat et al., 2007).Two types of ANNs are in use for filling of gaps in flux data.The first type are feed-forward ANNs (Papale and Valentini, 2003), which are in widespread use for gap filling of flux data (Beringer et al., 2007;Moffat et al., 2007;Beringer et al., 2016b).Alternatively, a self-organising linear output (SOLO) model based upon a self-organising feature map (SOFM) is a type of ANN that is used in hydrology for its small error and resistance to overtraining (Hsu et al., 2002).SOFM has been used to evaluate the meteorological drivers of fluxes (Schmidt et al., 2011), and SOLO is effective for modelling fluxes (Abramowitz, 2005;Abramowitz et al., 2006) and gap filling of fluxes (Cleverly et al., 2013;Eamus et al., 2013) and meteorological drivers (Cleverly et al., 2016d).Because of different assumptions in various gapfilling approaches regarding the relationships between meteorological conditions and fluxes, the choice of method can affect the results obtained from flux studies and ultimately bias our understanding of biogeochemical exchange.
The EC method is used to measure the net ecosystem exchange of carbon (NEE).However, the expanded use of flux data globally (e.g., in biogeochemical models and in comparisons to remote sensing studies; Baldocchi, 2008) means that the components which comprise NEE, gross primary production (GPP) and ecosystem respiration (ER), are required by the users of flux data.There are broadly two approaches to model GPP or ER from flux data (Falge et al., 2002;Baldocchi, 2008): from nocturnal thermal sensitivity relationships (for ER; Reichstein et al., 2005) or light-response functions (for GPP and ER; Stoy et al., 2006;Lasslop et al., 2010).Most variations on these two approaches (e.g., Q 10 and Arrhenius thermal-response functions, and rectangular and nonrectangular hyperbola light-response functions) produce similar site ranking, conferring greater confidence in the partitioned products on the condition that the same method is used to compare fluxes across sites (Desai et al., 2008).Whereas nocturnal methods remain the most commonly applied solution to partitioning of net carbon fluxes (Reichstein et al., 2012), the combination of methods shows great promise for reducing spurious correlations potentially present in these methods in isolation (Baldocchi and Sturtevant, 2015).Some methods are systematically biased, especially in drylands and Mediterranean climates (Desai et al., 2008), in which case partitioning by multiple methods can help to avoid unrealistic results (Cleverly et al., 2013;Baldocchi and Sturtevant, 2015).Because of the potential for bias originating from a poor choice of partitioning scheme, standardisation of partitioning methods across sites remains a challenge, especially in OzFlux sites such as AU-Tum, where exchange with CO 2 stored in the canopy interacts with drainage flows (van Gorsel et al., 2007), and at AU-TTE, where standard partitioning methods generate unrealistic values of GPP efflux (Cleverly et al., 2016b).For these extreme cases, we defer to Ray Leuning's oft-repeated maxim, "Know thy site", to remind us that there may not be a single, site-independent method to correctly partition carbon fluxes.At a network level, tools to employ standard approaches can be provided.In this work, we will address how the choice of standard partitioning method affects the resultant estimates of GPP and ER.It was recognised in the early stages of the development of OzFlux that a standard quality control (QC), postprocessing, gap filling and partitioning tool was needed for the OzFlux network.The principle aims in developing such a tool were to reduce site-to-site variability in the budgets of energy, water and CO 2 due to differences in the processing steps adopted by individual site principal investigators (PIs); to make available the cumulative wisdom of a few experienced researchers for the many OzFlux site PIs who were new to the fields of EC and surface-atmosphere interactions; and to provide processed data to the Australian and international research communities in a standard format and with sufficient metadata to make the data sets selfdocumenting.
In this paper, we will describe the production of common, standardised flux tower data sets using a processing tool developed by OzFlux.We investigate how the choice of data processing procedures, particularly gap-filling and partitioning methods, affects estimates of NEE, GPP, ER, evapotranspiration (ET), and energy fluxes across the OzFlux network at a continental scale.Following Falge et al. (2001a) and Desai et al. (2008), we anticipated that the various approaches would produce similar rankings of GPP and ER across sites, i.e., that site-to-site variability would be greater than variability between methods.We further hypothesised that surface energy balance (SEB) across OzFlux will average 80 %, following trends in FLUXNET (Wilson et al., 2002).Lastly, we will demonstrate how consistent, network-wide standards of data quality can be propagated to the end user by developing a common set of procedures to the data providers (i.e., individual site PIs).The definitions, symbols and sign conventions for carbon cycle terminology given in Chapin et al. (2006) are used throughout this paper.Refer to Table A1 in Appendix A for definitions of symbols.

The OzFlux tower network
Figure 1 shows a map of Australia containing the location of the OzFlux towers with thumbnail plots of monthly average air temperature and precipitation for climate stations that are representative of the climatic regions where the bulk of towers are located.Only those sites for which data are available from the OzFlux Data Portal (ODP; http://data.ozflux.org.au/) are shown.The OzFlux Data Portal contains 23 active sites, of which 11 receive contributions to their running costs from the Terrestrial Ecosystem Research Network (TERN), and another 13 inactive (closed) sites.Beringer et al. (2016a) provide an overview of the OzFlux network, the location of OzFlux sites in the climate space and approximate areal extents of the ecosystems represented within the network.Of note is the bias toward sites in native or remnant vegetation (n = 24) where carbon and water budgets are dominated by persistent ("woody") vegetation compared to the number in annually cyclic ("grassy") ecosystems (n = 12).This is important because of the finding in Haverd et al. (2013) that two-thirds of the annual net ecosystem productivity (NEP) across Australia is attributable to "grassy" biomes.The bias in site location relative to ecosystem types reflects the ad hoc nature of the network evolution over time, which is itself a feature of the network's funding history and something seen in other regional networks (Baldocchi et al., 2001).
Much of Australia receives less than 450 mm of precipitation annually, but a relatively narrow band extends around the coast from the Top End (Darwin) to southern Australia (Melbourne and Hobart), which receives much more precipitation annually.There are large extremes in precipitation amount and seasonality across the continent.For example, northern Australia is characterised by a wet-dry tropical climate with precipitation occurring mainly during the monsoon season (November to April; see thumbnails in Fig. 1).Central Australia experiences a continental climate with the largest seasonal variation in temperature (Alice Springs thumbnail, Fig. 1), and small amounts of precipitation on average, but with very large interannual variability (Cleverly et al., 2016a).The southern coastal regions experience mild temperatures and a large range in the precipitation seasonality, from an autumn peak in Sydney to little seasonality in Melbourne and a pronounced winter peak in Perth.The large range in precipitation amount and seasonality has resulted in a large range of biomes across Australia.Beringer et al. (2016a) give details of the range of biomes sampled by the OzFlux network.OzFlux was established as an informal network of researchers working in the field of land surface-atmosphere exchange of momentum, energy and mass in 2001 (Beringer et al., 2016a).The provision of significant funding by TERN in 2009 allowed the network to grow in size and this was done with the benefit of knowledge gained since the establishment of OzFlux.It was understood from the outset that the resources available for operating OzFlux would be limited and that many of the site PIs were new to the area of EC measurements of surface fluxes.These considerations suggested that a high degree of standardisation would be required across the network to reduce the cost of establishing the new sites and to reduce the time taken to quality control and post-process the flux tower data.

Instrumentation suite
OzFlux chose a standard suite of instruments in 2010, but with the final decision dependent upon local conditions as these can be extreme.The EC instruments at most sites consists of a CSAT3 sonic anemometer (Campbell Scientific, Logan, Utah, USA) and a Li-7500 [A] open path infrared gas analyser (IRGA, LI-COR Biosciences, Lincoln, Nebraska, USA).Three sites (AU-Tum, AU-Otw and AU-Vir) use Gill HS sonic anemometers (Gill Instruments Ltd, Lymington, UK), and one site (AU-Wrr) uses an EC155 IRGA (Campbell Scientific).Measurements of the fluxes are made outside the roughness sublayer for half of the OzFlux sites using the least conservative criterion for the depth of the roughness sublayer (RSL) given in Katul et al. (1999;depth < twice canopy height).This means that care must be taken when interpreting small site-to-site variabilities in the fluxes over similar ecosystems because these may be due to the influence of local roughness elements at a particular site.
Logging of the EC data at most sites is done by a CR3000 or CR5000 measurement and control data logger (Campbell Scientific) with data transfer between the EC instruments and the logger by the synchronous device for measurements protocol (Campbell Scientific).The use of this interface allows diagnostic information from the CSAT3 and the Li-7500 to be recorded and these are used during the quality control and post-processing of the data; see Sect.3.2.The Campbell Scientific data loggers are fitted with compact flash card (CFC) modules (CF100 or NL115, Campbell Scientific).Three sites (AU-Tum, AU-Otw and AU-Vir) used PC-based data logging systems built at the site PIs institution.
The four components of the radiation budget are measured by CNR1 or CNR4 radiometers (Kipp and Zonen, Delft, Netherlands) and NR01 radiometers (Hukseflux, Delft, Netherlands), and most sites also measure net radiation using an NRlite (Kipp and Zonen).Additional meteorological information of air temperature and relative humidity (HMP45, HMP60, HMP155, Vaisala, Vantaa, Finland), wind speed and direction (Wind Sentry, R. M. Young, Traverse City, Michigan, USA, and WindSonic4, Gill) and precipitation (various, Hydrological Services, NSW, Australia, Rimco, Campbell Scientific) are also collected at all sites.Soil measurements across the OzFlux network consist of soil temperature (TCAV, Campbell Scientific), soil water content (CS605, CS616 and CS650, Campbell Scientific) and ground heat flux (CN3, Middleton, Victoria, Australia, HFP01, Hukseflux, HFT3, Campbell Scientific).Ground heat flux plates are buried between 5 and 8 cm below the surface with soil temperature probes placed to give the average temperature in the layer between the ground heat flux plates and the surface.Soil water content sensors were buried according to manufacturer's recommendations.The typical suite of soil sensors consists of three ground heat flux plates and soil temperature sensors to provide some degree of spatial sampling.Soil water content measurements are made at a minimum of 2 depths, 5 and 50 cm, below the surface.Several sites, where PIs have a particular interest in soil moisture or where required by the site soil characteristics, have arrays of soil moisture sensors at multiple depths in separate soil pits to provide spatial replicates.Where the number of sensors exceeded the input capacity of the data logger, additional inputs were provided using an AM 16/32B analogue multiplexer (Campbell Scientific).
Profile systems to measure [CO 2 ] and [H 2 O] in the canopy are installed at AU-Cum, AU-Tum, AU-Whr, AU-Wac and AU-Wom.Profile measurements are made at seven heights at AU-Cum and AU-Tum and six heights at AU-Whr, AU-Wac and AU-Wom and all systems were built within the site PIs institution.The general sampling protocol is to draw air at a constant rate through tubing of equal length for each height being sampled and to switch these air streams through the analyser (Li-840 or Li-7000, LI-COR) over cycles lasting for between 2 and 3.25 min (site dependent).Storage terms are calculated from the first and last profile measurements in each tower time step (30 or 60 min).The resultant storage correction terms have been applied to the AU-Tum and AU-Whr data, processing is in progress for AU-Cum and AU-Wom and will be included in future revisions of the OzFlux data set.McHugh et al. (2016) provide details of profile systems at AU-Whr and AU-Wom.
Details of the sensor arrays at each OzFlux site are available from the OzFlux web site (http://ozflux.org.au/monitoringsites/index.html).

Measurement protocols
This section addresses OzFlux sites that use Campbell Scientific data loggers.Data measurement and collection for the remaining three sites (AU-Tum, AU-Otw and AU-Vir) is described in Leuning et al. (2005).All sites fitted with Campbell Scientific data loggers recorded the three components of the wind field, virtual air temperature and H 2 O and CO 2 concentration at 10 Hz.Slow response meteorological and soil sensors are sampled every 10 s, except at AU-ASM and AU-TTE, where measurement frequencies depend upon the response time for the sensor and medium (e.g., soil), including measurements at 1, 10, 30 s, 1 and 30 min periods.All sites use a 30 min averaging period, except AU-Tum, AU-Otw and AU-Vir, which use an averaging period of 60 min.The time stamp refers to the end of the averaging period, which is retained throughout the OzFlux data quality control and postprocessing system.
The data loggers are programmed to write the 10 Hz turbulence data, sonic diagnostic and IRGA diagnostic directly to the CFC.The 10 Hz data stream is also processed by the logger program to remove samples recorded when the sonic or IRGA diagnostics indicated poor data quality and this quality-controlled data are then passed to the logger program's covariance routine.Online fluxes are calculated from these covariances at the end of each averaging period and the Webb, Pearman, Leuning (WPL) density terms are applied.Note that the data logger program does not apply a coordinate rotation to the 10 Hz turbulence data before calculating the online fluxes.For this reason, the online calculated fluxes are not used in the subsequent data processing (see Sect. 3.2) but serve as a useful instrument diagnostic during site visits.All slow response data are averaged or summed as appropriate over the 30 min averaging period (and a 1 min averaging period for AU-ASM and AU-TTE; Cleverly et al., 2016b) and written to the CF card in separate tables for fluxes (including all covariance terms), radiation, meteorological and soil data.In addition to the 30 min data written to the CFC, a subset of core data (radiation, online calculated fluxes, basic meteorology and soil data) is written only to the data logger's memory in a ring buffer of 6 months' duration.This acts as a backup of core measurements in the event of a CFC failure.
P. Isaac et al.: Network integration from collection to curation

Data collection
Several methods of data collection are used across the OzFlux network.Because of the range in access to telecommunications across the vast and sparsely populated Australian continent, a standard approach to telecommunications cannot be adopted.At all sites with Campbell Scientific data loggers, manual collection of data stored on the compact flash cards is the most basic and most robust method of data collection.However, this method is not suited to extremely remote sites and sites that are located at large distances from the site PI's institution where site visits cannot always be scheduled before the compact flash cards are filled.Modems are used at all but three OzFlux sites to provide data in near-real time and to provide an alternative data collection method in the event that overwrites occur on the CFC.The three sites not fitted with modems (AU-DaS, AU-Dry and AU-Wrr) are outside the mobile telephone coverage area.Two modem types are used: packet switch (ModMax, InteliMax, Maxon, NSW, Australia) and Ethernet (UniMax, Maxon).Packet switch modems are usually restricted to upload of the 30 min average data from the sites with the 10 Hz turbulence data still read from the CF card, except where the site PI has disabled the LoggerNet's (Campbell Scientific) default behaviour to convert the 10 Hz data to ASCII before transfer (AU-ASM and AU-TTE).Ethernet modems are used to upload both the 30 min average data and the 10 Hz turbulence data.Some modems are connected via a virtual private network (Maxon) to provide a secure, static IP service to the modem.

Ancillary data
OzFlux acquires additional data from the Australian Bureau of Meteorology (BoM), the European Centre for Medium Range Weather Forecasting (ECMWF) and the Moderate Resolution Imaging Spectroradiometer (MODIS) on the TERRA and AQUA satellites.The additional data provide alternative data for gap filling flux tower data sets, additional drivers for gap filling that contain seasonal and disturbance information and act as aids to researchers when interpreting the flux tower data.Multiple sources of ancillary data are used to obtain the best quality data; e.g., meteorological quantities for gap filling come from the BoM automated weather station (AWS) network, radiation and soil data from 2011 onwards come from the BoM weather forecasting model ACCESS-R and radiation and soil data prior to 2011 come from the ECMWF reanalysis product.Access to the additional data is via the Australian Academic Research Network (AARNet) cloud storage facility.OzFlux maintains a shared folder on this service to which all OzFlux site PIs and invited guests have access.The additional data files are placed in the shared storage area as they become available and users download these as required.

Automatic weather station data
The BoM operates a network of 620 AWSs across Australia that provides observations of air temperature, humidity, wind speed, wind direction and precipitation every 30 min.Of the 32 OzFlux sites for which data are available, only three are farther than 50 km from an AWS and none are farther than 100 km.OzFluxQC uses data from up to three AWSs for each flux tower site and chooses the AWS that correlates best with the tower data.AWS data are supplied by the BoM monthly, and processing is applied by OzFlux that includes checks for plausible values on all variables, linear interpolation to fill gaps shorter than 2 h and 2-D bi-linear interpolation to fill gaps shorter than 3 days.The AWS data are then output as netCDF (see Sect. 3) files to be compatible with the flux tower data format.

ACCESS-R
The second alternative source of data is the Australian Community Climate Earth System Simulator (ACCESS) numerical weather prediction model used by the Bureau of Meteorology.The regional version of ACCESS (ACCESS-R) used for operational forecasting has a horizontal resolution of 12.5 km for the whole of Australia and a time step of 1 h (Bi et al., 2013).ACCESS-R is initialised every 6 h with an analysis field based on surface and upper air observations and between these analysis times uses information on the cloud field from polar orbiting satellites (Puri et al., 2013).OzFlux uses the 6-hourly analysis fields and the intervening 5 h of forecast data to provide time series of radiation, meteorological and soil quantities at all of the active flux tower sites.Model output from 2011 onward is available from the BoM archive.Current ACCESS-R data are available from the BoM OPeNDAP (http://www.opendap.org/)server and are automatically retrieved by OzFlux daily.Processing of the ACCESS-R data consists of extracting cut-outs of 3 × 3 grid squares centred on the flux towers, interpolating from the ACCESS-R time step of 60 min to the flux tower time where required and calculation of instantaneous rainfall rates from the cumulative product output by the model.

ERA-Interim reanalysis
The third source of alternative data is the ERA-Interim data set from the European Centre for Medium Range Weather Forecasting (Dee et al., 2011).The ERA-Interim data set is a reanalysis product available for the whole globe from 1979 onward.It has a horizontal resolution of approximately 75 km across Australia and is available at 6hourly time steps with the 00:00 and 12:00 UTC data based on observations and the 06:00 and 18:00 UTC data based on forecast fields.The ERA-Interim surface data set provides time series of radiation, meteorological and soil quantities (http://apps.ecmwf.int/datasets/data/interim-full-daily/levtype=sfc/).Processing of the ERA-Interim data consists of extracting the time series data for the grid square containing the flux tower, calculating instantaneous quantities from the cumulative products and interpolating from the ERA-Interim 6-hourly time step to the flux tower time step.Interpolation for short-wave radiation data is based on the solar zenith angle and the 6-hourly rainfall data are spread evenly across all times in the interpolation period.Linear interpolation is used for all other quantities following Vuichard and Papale (2015).

MODIS
OzFlux harvests the MOD13Q1 data (Normalised Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI)) from the TERN-AusCover Data Portal (http://www.auscover.org.au/).Cut-outs of 5 × 5 pixels around each tower are extracted from the spatial aggregates and filtered using the MODIS quality control flags.Additional quality control includes rejecting pixels whose value is outside ±2 standard deviations of the mean value of the 25 pixels.Values from the remaining pixels are averaged and missing periods are filled by linear interpolation.The resulting time series of MODIS data at 16-day intervals is then smoothed using a Savitzky-Golay filter, linearly interpolated to the flux tower time step and output as netCDF files compatible with the flux tower data format.The MODIS data provide information on the state of the ecosystems around the flux towers at a temporal resolution fine enough to resolve changes due to disturbance or seasonality.This is particularly important when filling long data gaps caused, for example, by fire because this information is often not present in the other alternate data sources.

OzFlux data access
There is an increasing awareness within the research community and those who fund research that data should be easily available and reusable.The reuse of data is promoted when it is accompanied by sufficient information, or metadata, to allow the user to decide if the data are fit for this purpose.Within the FLUXNET community, the BADM templates are one approach to collecting these metadata.In OzFlux, metadata are provided by OzFluxQC and bundled directly with the data in the netCDF file.Alternative file formats can be made available to facilitate collaboration with other networks (e.g., FLUXNET) or research organisations (e.g., NASA's SMAP project; http://smap.jpl.nasa.gov/).In these latter cases where data are provided as CSV files, OzFluxQC can include metadata in the file header, as negotiated on a case-by-case basis.
OzFlux contributed 17 site-years of L3 data from four sites to the LaThuile FLUXNET data set.At the April 2016 update of the FLUXNET2015 synthesis, there were 62 siteyears of L3 data from 15 OzFlux sites and this will be extended to 86 site-years from 23 sites by the July 2016 update.
The OzFlux submission to FLUXNET is available from the FLUXNET2015 download page (http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/) and from the FLUXNET collection on the ODP.

The OzFlux Data Portal
The primary repository for OzFlux data is the ODP (http: //data.ozflux.org.au/).Data on the ODP are organised into collections such that each collection represents a single flux tower.The collections can be viewed as a list or via a map interface.The collection home pages contain a brief description of the site and the temporal coverage of the collection.Citation information is also provided including a persistent identifier (handle) that resolves to the collection when typed into the address bar of a web browser (hdl.handle.net).The handle simplifies the citation of data sets and increases the visibility of the data.Like all TERN data sets, the OzFlux collections can be browsed by the public without login credentials.OzFlux L3 data are stored on the portal in annual files, although some PIs also choose to provide L4 to L6 data as well.The metadata associated with each netCDF file (the global and variable attributes) can be viewed from the collection web page.This enables users to assess the suitability of the data without first downloading the netCDF files.The collection page entry for each file also indicates when access to the data is restricted.Site PIs may choose to restrict access to data for up to 18 months when it is necessary to protect student intellectual property, although a waiver for data usage that does not conflict with the student's research can be negotiated with the site PI.The download of files that are not restricted does not require the user to have an account on the portal or to be logged in.More than 90 % of the data on the ODP are unrestricted.
The default license applied to data uploaded to the ODP is the TERN BY-SA-NC license.This is a variant of the Creative Commons V3 license that requires attribution of the data owner when the data are used (BY), which states that any derivative products are also covered by the same license (share alike or SA) and allows commercial use only if authorised by the data owner (NC).TERN reserves the right to change the licence at their discretion.Collection metadata are made available by the ODP in Registry Interchange Format -Collections and Services schema on a publicly available server.Organisations such as Research Data Australia (https://researchdata.ands.org.au/) and the TERN Data Discovery Portal (http://portal.tern.org.au/)can then harvest this metadata, making the ODP collections visible via their own web presence.Source code for the ODP is publicly available (https://code.google.com/archive/p/eddy/).

The OzFlux OPeNDAP server
The Open-source Project for a Network Data Access Protocol (OPeNDAP; http://opendap.org/)provides a mechawww.biogeosciences.net/14/2903/2017/Biogeosciences, 14, 2903-2928, 2017 nism for serving data via the internet.Access to an OPeN-DAP server can be via a web browser or via OPeNDAPaware programmes such as Panoply (http://www.giss.nasa.gov/tools/panoply/).A major feature of OPeNDAP is that the server can provide subsets of the data, as configured by the users, allowing them to view and manipulate the data without downloading the complete file.OzFlux has implemented an OPeNDAP server (http://dap.ozflux.org.au/), and unrestricted data from the OzFlux data set are available from this server.The data on the OPeNDAP server are stored by site and then by processing level (L3 to L6).Within each processing level there are "site_pi" and "default" data sets.
The "default" data sets are processed by OzFlux using the standard methods described in this paper, using the default options.This achieves a high level of standardisation for the processing across all sites but may not produce the best results at an individual site.The "site_pi" data sets are processed by the site PIs and may use non-standard methods or options based on the PIs knowledge of their own site.Data available from the OzFlux OPeNDAP server are combined into single, multiple-year files for each processing level at each site.
3 OzFlux data processing The processing system described here has several novel features.It integrates the quality control, post-processing, gap filling and partitioning of flux tower data into a single graphical user interface (GUI)-driven package written in a simple but powerful scripting language that is easy to maintain and extend.The system performs gap filling using meteorological, radiation and soil data from multiple alternate sources to improve the quality of the filled data for gaps longer than several days.Finally, the package encourages a deeper understanding of ecosystem processes operating at a site by presenting multiple visualisations of the data and allowing the user to quickly iterate over options at each processing level.Each step through the data processing is presented graphically to the user, who is then able to decide if the results of a particular stage are satisfactory.This allows users and PIs to maintain a constantly updated familiarity with the conditions at the OzFlux site.
The system uses a platform independent, selfdocumenting file format (network Common Data Format, netCDF) that allows information describing the data and the processing (metadata) to be packaged together with the data (Rew and Davis, 1990).This file format is also supported by a number of off-the-shelf tools that enable data to be easily served via the internet (Signell et al., 2008).netCDF is the data format adopted by the Australian land surface model CABLE (Kowalczyk et al., 2006).
The utility of the metadata contained in the netCDF files is enhanced when the type of metadata and its possible values conform to a widely adopted schema.OzFlux has adopted the Climate and Forecasting (CF) metadata conventions (http://cfconventions.org/,Gregory, 2003).The CF conventions define, among other things, a variable attribute called "standard_name", the value of which is chosen from a controlled vocabulary.Selecting data from a netCDF file using the CF standard_name variable attribute can avoid problems with differing variable naming schemes used by different data providers but is limited due to the origins of the CF conventions in the climate modelling domain rather than the flux observation world.At present, 20 of the 30 main variables output by OzFluxQC have standard names defined by the CF conventions.OzFlux is planning to submit proposed standard names for those currently not defined to the CF conventions working group.

The OzFluxQC Python suite
OzFlux has developed a suite of Python scripts, called OzFluxQC, that are integrated into a single GUI application that addresses the quality control, post-processing, gap filling and partitioning requirements associated with processing data from flux towers.The Python language was chosen because of its clear syntax, its widespread use in the scientific community and its encouragement of modular, easyto-maintain and reusable program development (Oliphant, 2007).The full suite of scripts consists of over 10 000 lines of code and is available from GitHub (https://github.com/OzFlux/OzFluxQC) under the GNU Public License.
Figure 2 shows a high level diagram of the OzFluxQC data path.Processing of flux tower data is divided into six stages, and each stage generates a netCDF file that is CF compliant.The six stages are identified as follows: -L1 (level 1) processing ingests the flux tower data and writes the combined data and metadata to a netCDF file.
-L2 (level 2) processing applies a suite of quality control checks to the L1 data.
-L3 (level 3) processing applies a range of corrections to the L2 data.
-L4 (level 4) processing fills gaps in the radiation, meteorological and soil quantities.
-L5 (level 5) processing fills gaps in the flux data.
-L6 (level 6) processing partitions the gap-filled NEE into GPP and ER.
All stages of processing read configuration information from text files (called control files).This convention allows siteto-site variability in the processing requirements to be abstracted from the Python code in to user editable files.The primary function of OzFluxQC is to post-process, gap fill and partition flux tower data; it does not process raw turbulence data (typically sampled at 10 to 20 Hz) to covariance or flux values averaged over some nominal period (e.g., 30 or 60 min).This functionality is provided by a number of readily available software packages, e.g., EddyPro (LI-COR), TK3 (Maunder and Foken, 2015), EdiRe (R. Clement, University of Edinburgh, UK) and ECPack (van Dijk et al., 2004).OzFluxQC is able to use the output from these packages and then skips the unnecessary steps at L3.All of the code in OzFluxQC has been written in Python by members of the OzFlux community with the exception of the neural network used when gap filling fluxes at L5.The neural network code, written in C and C++, is called directly by OzFluxQC.The re-implementation of algorithms such as meteorological functions and u * threshold detection in different computer languages can lead to errors caused by programming mistakes, subtle differences in physical constants and different assumptions or inferences (Fratini and Mauder, 2014).Harmonising the routines used by OzFluxQC with the same algorithms implemented in other gap-filling and partitioning packages is an important task for the future.

The roles of OzFluxQC and DINGO
This paper describes the OzFluxQC suite of Python scripts used to quality control, post-process, gap fill and partition data from flux towers.Beringer et al. (2016b) present a description of the Dynamic INtegrated Gap filling and partitioning tool for OzFlux (DINGO), a second tool that can be used for gap filling and partitioning of flux tower data.The two systems are independent and serve different roles within OzFlux.The core distinction is that OzFluxQC is intended to be the operational tool used by site PIs to process their data, while DINGO is a research tool with distinct capabilities.The major differences between the two approaches can be summarised as follows.
OzFluxQC provides an integrated processing and plotting environment for all stages (L1 to L6) of processing flux tower data, whereas DINGO performs the equivalent of L4 to L6 processing using data output by OzFluxQC at L3.Both systems support batch processing of data from multiple sites, and OzFluxQC provides an interactive GUI mode that allows site PIs to vary processing options to achieve the best results www.biogeosciences.net/14/2903/2017/Biogeosciences, 14, 2903-2928, 2017 on a site by site, year-by-year basis.In keeping with its role as a production tool, OzFluxQC propagates and adds metadata throughout the processing stages and saves this with the data in netCDF files, written at the conclusion of each processing level, that are intended to be self-documenting.For example, gap-filled and partitioned fluxes at L6 contain in their variable attributes details of processing options such as the source of alternate data and settings for the neural network used during gap filling, friction velocity and incoming short-wave radiation thresholds and other processing options often known only to the data owner.The use of netCDF files also promotes easy integration of the OzFluxQC data path with servers designed to make the data publicly available (e.g., the OzFlux Data Portal and the OzFlux OPeNDAP server; see Sect. 4).In keeping with its role as a research tool, DINGO provides data in comma separated value (CSV) files for easy access.Both systems use data from the BoM AWS network for gap filling meteorological data, but where OzFluxQC uses ACCESS-R and ERAI for radiation and soil data, DINGO uses a daily, gridded radiation product from the BoM interpolated to the tower time step and soil data from a land surface model run as part of the gap filling process.Gap filling of fluxes is done using ANNs in both systems but the type of ANN differs between the two systems and both approaches use the change point detection (CPD) method to estimate the u * threshold.OzFluxQC and DINGO both offer partitioning using the u * filter/ANN approach (with different ANNs as per the gap filling of fluxes) and the light-response function method.OzFluxQC also offers the u * filter/Lloyd-Taylor approach for partitioning NEE.

Data ingestion, quality control and post-processing (L1 to L3)
The first stage of processing is designed to accept data in multiple forms, combine these data with metadata entered by the user in the L1 control file and to write the data and metadata to a CF V1.6 compliant netCDF file.Input data can be as comma separated values (CSV) or Excel spreadsheets.OzFluxQC is designed to work with input data from two sources.For EC systems using Campbell data loggers, OzFluxQC is able to use the average covariances output by the logger.In this mode, the QC checks are done on the covariance values and the fluxes are calculated at L3 after coordinate rotation is applied to the covariances.This approach allows site PIs to quickly analyse data without having to first process the fast turbulence data but this does mean that some processing steps normally applied to the fast data (e.g., stationarity checks) cannot be performed.The second source of input data comes from software designed to process the fast turbulence data (e.g., EddyPro).With this method, the fluxes calculated by the fast data processing software are read directly into OzFluxQC at L1, the QC checks are performed on the fluxes, and the otherwise redundant corrections and flux calculations at L3 are skipped based on user-selected processing switches.Inconsistent combinations of input data and processing switches will cause OzFluxQC to terminate with an error message.Quality control of surface flux data has long been recognised as an important step in producing robust surfaceatmosphere data sets (Foken and Wichura, 1996;Vickers and Mahrt, 1997;Falge et al., 2001a;Rebmann et al., 2005;Gockede et al., 2008;Mauder et al., 2013).Reviews of quality control steps are also given in Foken et al. (2010) and(2012).Quality control checks used by OzFluxQC for all variables include (i) range checks for plausible limits, (ii) spike detection, (iii) dependency on other variables and (iv) manual rejection of date ranges.Specific checks applied to the sonic and IRGA data include rejection of points based on the sonic and IRGA diagnostic values and and on either automatic gain control (AGC) or CO 2 and H 2 O signal strength, depending upon the configuration of the IRGA.The AGC and signal strengths for CO 2 and H 2 O have been found to be particularly useful for detecting periods when water droplets are present in the optical path.The spike detection routine is a modification of the standard deviation test of Vickers and Mahrt (1997), applied here to averaged data.Dependency checks allow data points of one variable to be rejected based on the value of a second variable; for example, flux measurements can be rejected when the wind direction is outside of an acceptable range.Limits for the range check (upper and lower limits) and spike detection (the number of standard deviations about the mean) are specified for each month of the year to allow for large seasonal variability seen in some Australian ecosystems (e.g., the seasonal wet-dry tropical savanna).
L3 includes computation of meteorological variables, application of standard corrections, calculation of fluxes and optional application of profile measurements for estimating CO 2 storage dynamics.First, a suite of meteorological variables are calculated and added to the data structure, including the atmospheric vapour pressure, dry and moist air densities, specific heat capacities and humidity deficits.Equations for these calculations are taken from Stull (1988) and van Dijk et al. (2004).Next, corrections are performed if they have not already been performed separately (e.g., by a SmartFlux system or by calculating the fluxes from the turbulence data), which include two-dimensional coordinate rotation (Lee et al., 2010); corrections for frequency attenuation due to stability, sensor geometry and sensor line averaging (Massman and Clement, 2004); recalculation of the fluxes from the rotated, corrected covariances and meteorological variables computed in the previous step; conversion of the virtual heat flux to sensible heat flux (Schotanus et al., 1983;Campbell Scientific Inc., 2004); the WPL density terms for the effects of sensible and latent heat fluxes (F h and F e , respectively) on density measurements made by the IRGA, which is applied to correct F e and then CO 2 flux (F c , Webb et al., 1980); correction of ground heat flux (F g ) for heat storage in the soil above the heat flux plates (Kustas et al., 2000); calculation of available energy as the difference between net radiation (F n ) and storage-corrected F g (Leuning et al., 2012); and addition of the F c storage term to the EC values of F c , where applicable.The F c storage term can be supplied from values calculated externally from profile data or internally from CO 2 concentration at the EC instrument height.

u * threshold detection
The underestimation of ER by EC measurements at night is well documented (Goulden et al., 1996;Moncrieff et al., 1996;Aubinet et al., 2000).One of the approaches used by OzFluxQC to mitigate this affect is to reject nocturnal periods when the friction velocity falls below a threshold value (see Sect. 3.7 for an alternative partitioning method).The threshold value for u * is determined following the CPD method of Barr et al. (2013).In this approach, the time series of nocturnal NEE measurements is divided into windows, or "seasons", of 1000 points with successive windows overlapping by 500 points.The data for each window are divided into four equally populated temperature classes and within each temperature class the nocturnal NEE is averaged over 50 u * bins.Bins with fewer than five points (30 min averaging period) or three points (60 min) are removed from further analysis.The u * threshold for each temperature class in each 1000-point "season" is then found by successive fitting of a two-stage linear regression to the binned NEE data while varying the centre point of the linear fit from the lowest u * bin to the highest.The u * thresholds for all "seasons" in a year are then averaged to get annual threshold values after rejecting those "seasons" for which the sign of the regression slopes differed from the majority (dominant mode).The fraction of "seasons" rejected for this reason is typically less than 5 %.Uncertainty in the u * threshold value is estimated by repeating the CPD method 1000 times and randomising the time order of the input data prior to each run (bootstrapping).Bootstrapping yields a mean threshold value and 95 % confidence limits that are used to estimate uncertainty in NEE due to uncertainty in the u * threshold.In practice, mean and confidence limits of the u * threshold are insensitive to the number of bootstrapping iterations between 100 and 1000 runs.
By default, OzFluxQC will use the mean, annual u * threshold from the CPD method when filtering nocturnal NEE.This can be overridden by the user and an arbitrary number of date periods with different thresholds can be specified.This allows the user to accommodate seasonally varying u * thresholds where these exist.We examined the seasonal variability in the u * threshold from the CPD approach using the technique described in Barr et al. (2013) for nine OzFlux sites.At seven of the nine sites, there was no seasonal variability in the u * threshold (s 1 /s 0 = 0) and at the remaining two sites, the seasonal variability was not significant (s 1 /s 0 < 0.2 and R < 0.2).All nine sites are evergreen, native vegetation with very small changes in canopy structure over the course of a year and this lack of seasonal variability in the vegetation is the most likely explanation for lack of seasonality in the u * thresholds.
At some OzFlux sites, the u * filter can produce unrealistic results, for example due to decoupling between the sensors deployed above a forested canopy and drainage flow beneath (AU-Tum;van Gorsel et al., 2009) or due to doubleaccounting of storage flows (Aubinet, 2008) at the flat, sparsely vegetated sites in semiarid central Australia (AU-ASM and AU-TTE; Cleverly et al., 2013).OzFluxQC offers an alternative partitioning pathway (the daytime method, Lasslop et al., 2010) for use in cases where standard methods might introduce site-specific biases.

Gap filling of drivers and fluxes (L4 to L5)
Gap-filled time series are required if site data are to be used by modellers for parameterisation and validation of land surface models and to construct annual sums of carbon and water exchange between the land surface and the atmosphere.OzFlux data are gap filled in two stages.Before the gaps in the fluxes are filled, gaps are filled in the environmental measurements that are used to drive gap filling of the fluxes.OzFlux sites are usually located in harsh climates that occasionally prevent access to the sites for maintenance and repair (e.g., the North Australian Tropical Transect sites in the Northern Territory).As a result, gaps in flux tower data are frequent and occasionally long.Figure 3 shows the distribution of gap occurrence (top panels) and fraction of towww.biogeosciences.net/14/2903/2017/Biogeosciences, 14, 2903-2928, 2017 tal missing data (bottom panels) for 86 site-years across 23 OzFlux sites.The plots show results for the combined environmental drivers (left column) and the fluxes (u * , F h , F e and F c , right column).Short gaps of 1 day or shorter in duration are the most common, but the greatest contribution to the total amount of missing data comes from long gaps of 30 days or more.The high proportion of missing data due to gaps longer than 30 days means that gap-filling techniques such as mean diurnal variation and marginal distribution sampling (Moffat et al., 2007) that are based on site data alone will perform poorly.As a substitute for climatologytype approaches, OzFluxQC uses data from three alternative sources to fill time series of radiation, meteorological and soil data from flux towers: one based on observations and two based on model or reanalysis outputs.
The three sources of alternate data used by OzFluxQC, AWS, ACCESS-R and ERAI are described in Sect.2.5.1, 2.5.2 and 2.5.3, respectively.All three sources provide meteorological quantities and ACCESS-R and ERAI data provide radiation and soil data as well.The choice of which source to use when more than one is available can be made by the user but in practice the default configuration of AWS for meteorology, ACCESS-R for radiation and soil data from 2011 onwards and ERAI for radiation and soil data prior to 2011 has proved to be the best combination for all 23 OzFlux sites.During gap filling of the drivers, OzFluxQC produces an Excel workbook containing statistics (R, bias, root mean square error (RMSE), ratio of the variances) describing the fit between data measured at the flux tower and from the alternate sources and the default choice can be confirmed or modified based on these results.
Bias is frequently observed between the data from flux towers and the alternative sources and this must be removed before the alternate data can be used for gap filling the flux tower data.Bias correction in OzFluxQC is done on 90-day windows (default, user able to specify) as a compromise between providing sufficient samples for training the bias correction model and using short enough periods to avoid seasonal changes in the relationship between the flux tower and alternate data.Phase differences between the flux tower and alternate data are detected using lagged correlations and removed before performing the bias correction.Bias correction is done in each 90-day period using a linear fit calculated for each variable provided there is a minimum percentage of good data available within the window (default 50 %).For gaps that do not satisfy the minimum percentage of good data threshold, the window size is increased by moving the start and end dates of the window period by 1 day at a time until the minimum percentage of good data requirement is met.This technique ensures that the window length is kept as short as possible while still providing sufficient good data for the linear fit.The AWS and ACCESS data sets contain more than one replicate for each variable being filled (typically three for AWS and nine for ACCESS).The replicate with the highest correlation to the flux tower data in each 90-day window is used.The effect of alternate data sources for gap filling of environmental drivers is summarised in Figs. 4 (for F sd ) and 5 (for T a ) for nine OzFlux sites that represent a range of ecosystems with data sets of varying gap fraction across the OzFlux network.
Quality assurance of gap filling for environmental drivers was based upon estimates of bias, variance ratio, correlation coefficient (R) and RMSE in the comparison of measured vs. modelled (ACCESS and ERAI) data sets.The variance ratio is defined here as σ 2 tower /σ 2 alternate where σ 2 is the variance calculated over the 90-day window period.The variance ratio quantifies the extent to which the alternate data capture the variance in the tower data.Bias, R and RMSE values for F sd were comparable for both sources of alternate data, but ACCESS variance ratios were closer to unity than those for ERAI, on average (Fig. 4).The results show that both ACCESS and ERAI provide good quality data for gap filling F sd with ACCESS performing slightly better than ERAI.The results for F su , F ld , F lu , F n and F a (not shown) are similar.Figure 5 shows the performance of alternate data sources (AWS, ACCESS and ERAI) on gap filling of air temperature.Bias in air temperature was small at most sites and variance ratios were close to unity.Both R and RMSE show that the AWS data for air temperature were a significantly better match (higher R, lower RMSE) to the tower data than ACCESS.ERAI data perform poorly compared to the AWS and ACCESS data.The results of comparisons for all radiation, meteorological and soil quantities (not shown) confirm that the combination of numerical weather prediction (NWP) model or reanalysis data for gap filling radiation and soil variables and AWS data for gap filling meteorological variables provides the best overall performance.
OzFluxQC uses the SOLO ANN (Hsu et al., 2002) to gap fill the fluxes.Details of the SOLO neural network design and operation are given in and Abramowitz (2005) and its use in Australian conditions is discussed in Cleverly et al. (2013) and Eamus et al. (2013).In SOLO, SOFM performs the ANN equivalent of a principal component analysis to determine the relationships amongst environmental drivers.Next, SOLO performs the ANN equivalent of a multiple linear regression between the results of SOFM and the fluxes.The choice of fluxes and environmental drivers is configurable by the user, but the default settings are to fill u * (with WS, F n , T a and q as drivers), F h (F a , T a and WS), F e (F a , q, T a and WS) and F c (F n , F g , q, vapour pressure deficit (VPD), T a and T s ).A u * filter was applied to the F c data prior to gap filling, and annual thresholds were determined as described in Sect.3.5.The proportion of nocturnal NEE measurements remaining after application of the u * filter varied between 11 and 23 % across all sites in the OzFlux network.The choice of window size for gap filling the fluxes is a compromise between a long enough period to provide sufficient points for training the ANN and short enough to allow the ANN to accommodate seasonal changes in the relationships between the drivers and the fluxes.The default window size is 60 days, but the window size and all other ANN parameters can be set by the user.As with the gap-filling strategy for the drivers described above, OzFluxQC will automatically adjust the start and end dates of the window period in increments of 1 day until the minimum percentage of good data requirement is met.Multiple interactive runs are generally required to find the best combination of window length and ANN parameters for any given site.All statistics were improved (smaller RMSE, higher R and variance ratio closer to 1) when gap filling was performed on windowed data rather than on entire time series (see Fig. 6).RMSE was least sensitive to window length, but shorter window lengths were associated with values of R that were higher and values of the variance ratio closer to unity, much in the same way that windowed lookup tables perform with small errors when the gaps are small enough (Moffat et al., 2007).

Partitioning of NEE (L6)
A common goal of flux tower research is to partition observed NEE into the component fluxes of GPP and ER.This is an ill-posed problem where we infer two large numbers from their small difference (Baldocchi, 2008).OzFluxQC offers three methods for estimating ER from flux tower data: two are based on the nocturnal thermal-response approaches (Papale et al., 2006), and one is based on daytime lightresponse functions (Lasslop et al., 2010).The two nocturnal approaches apply a u * filter to measurements of F c (either nocturnal only or both nocturnal and daytime, user selectable), but they apply different models to estimate ER from the filtered data.
The first nocturnal approach applies a modified Arrhenius equation (Lloyd and Taylor, 1994): In this equation, T ref = 10 • C is the reference temperature, T 0 = −46.04• C is a model parameter chosen by Lloyd and Taylor (1994) to fit their soil respiration data, ER 10 is the base respiration rate at the reference temperature, E 0 is the activation energy and ER LT is the ecosystem respiration estimated www.biogeosciences.net/14/2903/2017/Biogeosciences, 14, 2903-2928, 2017 using the Lloyd-Taylor model.Either air temperature or soil temperature can be used in the implementation of Eq. (1) in OzFluxQC (Lasslop et al., 2012).The values of ER 10 and E 0 were estimated from nocturnal, u * -filtered measurements of F c using the Levenberg-Marquardt damped least squares method.E 0 values were determined using a window size of one calendar year, and ER 10 was estimated using a window size of 15 days and an overlap of 10 days.The resulting values for ER 10 and E 0 were quality controlled as described in Lasslop et al. (2010), and missing parameter values were filled using linear interpolation.ER LT was then estimated using Eq. ( 1) with the gap-filled air or soil temperature, as appropriate.
The second nocturnal approach also used nocturnal, u *filtered F c measurements, but here respiration was modelled using SOLO, with T a , T s , soil water content and, optionally, MODIS EVI data as drivers.Due to the small amount of data retained after the u * filter was applied, we use window lengths of a minimum of 1 year in length for this approach.Inclusion of MODIS EVI as a driver improved the fit of the SOLO output when long window lengths were used.The ANN was trained only on observations that have passed all quality control checks; that is, gap-filled data were not used.After training, SOLO was used to produce predictions of ER ANN at all time steps in the tower record (i.e., based upon gap-filled drivers).
The third approach for estimating ER used daytime data, with a rectangular hyperbola fit between F sd and NEE (following Lasslop et al., 2010): The second term on the right side of Eq. ( 2) is nocturnal ER LL (i.e., in the absence of light; Eq. 1).In the first term, F sd is incoming short-wave radiation, α is the initial slope of the light-response curve (LRC) and β is the photosynthetic capacity at light saturation, which is modified from its maxi- mum value β 0 by an exponential function of VPD: In this model, T ref , T 0 and VPD 0 = 1 kPa all have fixed values.E 0 is calculated for each calendar year as described above.α, β, ER 10 and k are all found by non-linear curve fitting using the Levenberg-Marquardt method for 15-day windows with a 10-day overlap between consecutive windows.Quality control of the estimated parameters follows Lasslop et al. (2010).With all of the above methods, the final ER time series was constructed from observations of F c at night (F sd < 10 W m −2 ) when u * is above the threshold and modelled ER at all other times.The final NEE time series was constructed from gap-filled F c during the day (F sd > = 10 W m −2 ) and ER at night.GPP was then calculated from the three NEE and ER time series described above as Estimation of the uncertainty in annual sums of NEE, GPP and ER is important to decide whether observed source or sink magnitudes and observed inter-and intra-annual variability are significant.Possible sources of uncertainty include uncertainty in the u * threshold value, random error in the flux measurements, model error due to limitations in the models adopted for estimating ER and GPP, uncertainties in the model parameters derived from observations and fraction of missing data (Barr et al., 2013;Hollinger and Richardson, 2005;Keith et al., 2009;Papale et al., 2006).OzFlux is developing a system for estimating the uncertainty in estimates of NEE, GPP and ER and this is described in Sect.6, Future Directions.

Surface energy balance closure
The closure of the SEB is an important diagnostic for EC measurements (Aubinet et al., 2000;Leuning et al., 2012;Stoy et al., 2013).In their analysis of data from the LaThuile FLUXNET data set, Stoy et al. (2013) found an overall value for the SEB closure of 0.84 ± 0.2 and concluded that, among other factors, non-closure of the SEB is related to landscapescale heterogeneity.In an analysis of the same data set but using a slightly different methodology, Leuning et al. (2012) found a median value for SEB closure of 0.75 for half-hourly data, rising to 0.90 when daily averages were used.We investigated the SEB closure for 86 site-years of data across 23 OzFlux sites following the approach of Leuning et al. (2012).Daily values of (F h + F e ) and (F n − F g ) were calculated for each site using non-gap-filled data on days when more than 80 % of records were present.Figure 7 shows a histogram of SEB ratios ([F h + F e ]/[F n F g ]) for the OzFlux sites; 50 % of site-years had an SEB ratio of 0.89 or higher, and 80 % had an SEB ratio greater than 0.80.Only 8 % of site-years have an SEB ratio greater than 1.00.These values for SEB closure are similar to those found in recent reviews (Stoy et al., 2013;Leuning et al., 2012).

Comparison of annual sums
We compare annual sums of NEE, GPP and ER estimated using five different methods at nine OzFlux sites for the calendar year 2013 (AU-Cpr, Calperum  years available from the OzFlux and FLUXNET2015 data sets and to minimise the extent of gap filling in the tower data.Note that for these results, OzFluxQC was set to apply the u * filter only to nocturnal measurements of F c whereas the FLUXNET results were obtained by applying the u * filter to both nocturnal and daytime measurements. The general rankings of sites were consistent across methods, with some notable exceptions (see Fig. 9).Daytime approaches by OzFlux, REddyProc and FLUXNET produced 20-57 % (median 35 %) smaller values of ER at all sites compared to the nocturnal methods.The underestimation of ER by the daytime methods was particularly noticeable at AU-Cum, AU-Tum and AU-Whr.By contrast, nocturnal, u *filtered approaches using neural networks to estimate ER (OzFlux and DINGO) produced very similar results at all sites (median 7 % difference, range 0 to 14 %).The largest difference (14 %) occurred for AU-Tum, which was likely to be due to differences in the application of the nocturnal u * filter (DINGO used only the first 3 h after sunset; see Beringer et al, this issue).The nocturnal, u * -filtered approaches using the Lloyd-Taylor ER model shows the largest variability between methods within a single site (median 24 %, range −6 to 170 %), particularly at AU-Cum, AU-DaS and AU-Tum.The within-site variability can be explained by differences in the approach and data used by each method.For example, OzFluxQC determines a value of E 0 annually whereas REddyProc uses a single value for all years (details for FLUXNET not available at the current time).OzFluxQC and FLUXNET calculate the u * threshold using the CPD method but REddyProc uses the method described in Papale et al. (2006).The OzFluxQC CPD method gives a u * threshold for AU-Tum of 0.58 m s −1 compared to values of 0.24 m s −1 for the same method implemented by FLUXNET and 0.28 m s −1 for the Reichstein method used in REddyProc.To examine the effect of these different u * threshold values on the results we repeated the AU-Tum processing with OzFluxQC and REddyProc using the u * threshold applied in the FLUXNET processing.Annual sums for the reprocessed data are given in Table 1.The median difference between the methods reduces to 12, 4 and 22 % for NEE, GPP and ER, respectively.This is a substantial reduction but disparities remain.Small differences in results from different methods are to be expected and may be another indicator of uncertainty if, a priori, we have no basis for deciding which method is correct at a given site or under particular circumstances.However, different implementations of the same algorithm (e.g., the nocturnal method with the Lloyd-Taylor respiration model) are expected to agree and where differences exist, these need to be resolved.

Comparison of methods and uncertainty in the u * threshold
The previous section compared annual sums for NEE, GPP and ER for seven different methods at nine OzFlux sites and found significant variation in these between approaches and even between implementations of the same method.In this section, we compare the range in monthly totals of NEE, GPP and ER from the seven different methods at three OzFlux sites (AU-Cpr, AU-DaS and AU-Whr) to the range in these quantities due to uncertainty in the u * threshold to determine the relative sizes of these two sources of uncertainty.While there are other components to uncertainty in addition to that arising from the value of the u * threshold (e.g., random error, model error), we expect that the uncertainty due to the u * threshold to be the largest contributor (Barr et al., 2013;McHugh et al., 2016).Extension of the measurement footprint outside the target ecosystem, especially in stable conditions, will also contribute to uncertainty in NEE, GPP and ER estimates.The range in monthly sums from the seven different methods (described in Sect.4.2) was found by taking the largest and smallest values from any method for each month.The monthly sums for the uncertainty in the u * threshold were calculated by constructing distributions of u * threshold from the mean and the 5 and 95 % confidence intervals given by the Barr et al. (2013) approach, sampling 1000 u * -threshold values from the distribution, applying the u * filter with these values and partitioning the filtered NEE using the nocturnal method with the Lloyd-Taylor respiration model.The range in the monthly sums due to the uncertainty in the u * threshold is again taken as the largest and smallest values for each month.
The results are shown in Fig. 9 with the range in monthly sums across the seven different methods plotted as light coloured bands and the range due to uncertainty in the u * threshold plotted as dark coloured bands.At all three sites, the range in monthly sums from the different partitioning methods is more than 3 times the range due to uncertainty in the u * threshold (medians 3.8, 3.6 and 3.2 for NEE, GPP and ER, respectively).Restricting the comparison from the seven methods used in Sect.4.2 to the three methods implemented in OzFluxQC (nocturnal u * -filtered Lloyd-Taylor, nocturnal u * -filtered ANN and daytime method) reduces the range in monthly sums from the different partitioning schemes to 2 times the range due to uncertainty in the u * threshold (medians 2.1, 2.6 and 2.4 for NEE, GPP and ER, respectively; data not shown).Further restricting the comparison to different implementations of the same method (nocturnal u * -filtered Lloyd-Taylor from OzFluxQC, FLUXNET and REddyProc) brings the range in monthly sums from the different methods close to the range due to the uncertainty in the u * threshold (medians 1.3, 1.3 and 0.9 for NEE, GPP and ER, respectively; data not shown) although the scatter in the comparison remains high (5 and 95 % quantile values of the ratio vary between 0.1 and 6).
These results suggest that implementations of the same partitioning method by different software packages cause uncertainties in monthly sums that are slightly larger than that caused by uncertainty in the value of the u * threshold.Extending this to include other partitioning methods (daytime LRC, use of ANN) leads to the uncertainty due to choice of method being significantly larger than uncertainty due to the choice of u * threshold.Resolving the differences between implementations of the same method by different software and between different methods is a high priority for future work.

Conclusions
In the 36 years since Webb et al. (1980), there has been a concerted effort to standardise algorithms for calculating and correcting fluxes.As noted 15 years ago by Falge et al. (2001b), there has not been the same advance in standard approaches to gap filling and partitioning, although progress has been made (Reichstein et al, 2005;Papale et al 2006;Moffat et al, 2007).As well as a standard approach to gap filling and partitioning, the advance from individual site programs to global syntheses requires a corresponding advance in the standardisation of data archiving, data visibility and data accessibility to promote the reuse of flux network data, especially by those outside the EC community.The FLUXNET LaThuile and 2015 synthesis data sets and the FLUXNET BADM initiative are valuable moves in this direction.
In this paper we have described an integrated suite of Python scripts, OzFluxQC, that provides a powerful and flexible tool for processing flux tower data that addresses some of these issues.A key feature of this processing path is the use of netCDF files that provide a cross-platform file format, which allows metadata to be transported with the data in a www.biogeosciences.net/14/2903/2017/Biogeosciences, 14, 2903-2928, 2017 self-describing file format and is widely supported in atmospheric and oceanic research communities.The primary design goals for OzFluxQC were (i) to provide a standard processing method that makes the expert knowledge of a few available to a large audience of users with various levels of expertise and (ii) to reduce the time required to produce gap-filled and partitioned fluxes so that PIs have more time to concentrate on the science.Novel aspects of the OzFluxQC suite include the integration of all steps into a single framework, interactive processing via a GUI or batch processing of multiple sites, the use of netCDF as the file format so that metadata travel with the data, the use of AWS and NWP models as sources for alternate data when gap filling meteorological data from towers, the implementation of several partitioning methods in a single package and the integration of data processing and data visibility and availability made possible by the use of the netCDF format.
We have demonstrated the utility of this package using a data set of ∼ 86 site-years from ∼ 23 OzFlux sites.Fluxes calculated with this package show excellent agreement with those calculated from 10 Hz turbulence data.Overall, the SEB closure across the OzFlux network is similar to that found for other flux tower synthesis data sets.
While gaps of short duration make up the majority of gap occurrences in the OzFlux data set, gaps longer than 5 days account for most of the missing data and gaps longer than 30 days are a significant component.To improve the filling of long gaps we use data from AWS, NWP models and global reanalysis.For radiation quantities we find that the global reanalysis at ∼ 75 km horizontal resolution performs only slightly worse than the NWP output at 12.5 km resolution and that both are highly correlated to flux tower data.For meteorological quantities we find that AWS data perform significantly better than NWP output with reanalysis data performing worst of the alternate sources.These results demonstrate the utility of NWP output and global reanalysis products for gap filling radiation but suggest local AWS data are a better choice for meteorological quantities.
When gap filling fluxes, the user often has a choice of the window size over which the gap filling method is to be trained.For the ANN used in OzFluxQC, we find the best performance with a window size of 60 days, the ANN performance degrades with increasing window length and applying the ANN to the whole data set gives the worst performance.The most likely explanation for this behaviour is that the use of short window lengths allows the ANN to re-train and take in to account seasonal changes in the relationship between the drivers and the target flux.It is possible that using more complex ANN designs and training these harder would improve the statistics for long windows but this carries an increased risk of an overtrained network and consequent introduction of spurious predictions.Providing the ANN with a driver that contains information on seasonal and interannual variability, such as MODIS EVI or simply the day of the year, may improve ANN performance with long window periods.
Finally, we have used results from nine OzFlux sites to compare annual sums of NEE, GPP and ER for 2013 calculated using seven different methods (three from OzFluxQC, two from FLUXNET2015, one each from REddyProc and DINGO).We find that daytime partitioning methods using a combination of light-response curve for GPP and an Arrhenius-style equation for ER underestimate ER and overestimate NEE compared to the nocturnal, u * -filtered methods at most sites.Site-to-site variability follows the same trends across all partitioning methods.Variability between different implementations of the same method at a single site is significant though still smaller than the site-to-site differences.

Future directions
The development of OzFluxQC has provided OzFlux with a powerful tool for quality control, post-processing, gap filling and partitioning of flux tower data that is now in routine use across the network.With this base established, several areas present themselves as potential future directions for the OzFlux data set and OzFluxQC.
One of the most important areas for future work is to harmonise the OzFlux data with FLUXNET.The OzFlux netCDF files are a rich source of metadata but need to be sup-plemented by the completion of the FLUXNET BADM templates for all OzFlux sites.This will make local knowledge about the sites available to the wider research community and aid in the use and interpretation of the OzFlux data.Other areas for rationalising the approaches across different networks are variable names and the terminology for processing levels.
A major area of future work is resolving the disparities in annual sums of NEE, GPP and ER given by different implementations of the same processing method, as shown in Sect.4.2 and 4.3.A related area of research is resolving the disparities between different methods, for example daytime approaches underestimate ER and overestimate NEE at OzFlux sites compared to night-time, u * -filter approaches.
Characterisation of the uncertainty in the carbon and water budgets from flux towers is an important area for future work.We propose to estimate the uncertainty due to random error in the EC measurements using the paired-observation technique of Hollinger and Richardson (2005).Uncertainty due to the distribution of u * threshold and gaps in the data will be estimated from multiple runs by selecting threshold values from the distribution, selecting a gap scenario and then repeating the gap filling and partitioning process to construct distributions of NEE, GPP and ER.This will require a large number of runs, of the order of 10 4 , to cover the range of threshold values and gap scenarios and will be computationally expensive making this feature better suited to a standalone utility, rather than as part of the interactive OzFluxQC.
Footprint visualisation is an important aid to interpretation of flux tower data, especially at inhomogeneous sites.We plan to integrate existing code that compiles a footprint climatology using the Kormann and Meixner (Kormann and Meixner, 2001) and Kljun 2-D (Kljun et al., 2015) models into OzFluxQC over the next 6 months.

Figure 1 .
Figure 1.Map of Australia showing the location of OzFlux sites (red for active sites, blue for inactive) and thumbnail plots of monthly average air temperature and precipitation at Bureau of Meteorology sites representative of the tower locations.The diameter of the symbol marking the tower locations is proportional to the canopy height; see scale at bottom left.

Figure 2 .
Figure 2. High level diagram of the OzFlux data path.Data input can be the averaged covariances output by the data logger ("Data logger output") or fluxes calculated by EddyPro or similar from the fast turbulence data ("EddyPro output").Processes in the dotted orange line are contained in OzFluxQC, while processes in the dotted blue line are utilities provided with OzFluxQC.

Figure 3 .
Figure 3. Histograms of gap frequency by duration for (a) drivers and (b) fluxes and histograms of the fraction of total gap length for (c) drivers and (d) fluxes.

Figure 4 .
Figure 4. Box plot of bias, variance ratio, correlation coefficient and root mean square error for the comparison of tower and alternate incoming short wave radiation at nine OzFlux sites.Statistics for ACCESS-R are plotted in khaki and ERA-Interim in blue.Box lengths represent the 75 and 25 % quartiles and the median is plotted as a solid line across the boxes.Numbers at the top are the percentage of missing data that are gap filled.

Figure 5 .
Figure 5. Box plot of bias, variance ratio, correlation coefficient and root mean square error for the comparison of tower and alternate air temperature at nine OzFlux sites.Statistics for AWS are plotted in red, ACCESS-R in khaki and ERA-Interim in blue.Box lengths represent the 75 and 25 % quartiles and the median is plotted as a solid line across the boxes.Numbers at the top are the percentage of missing data that are gap filled.

Figure 6 .
Figure 6.Statistics for different gap filling window lengths for CO 2 flux.Window lengths are 60 days (red), 182 days (khaki), 365 (blue) and whole length of data set (black dots).Box lengths represent the 75 and 25 % quartiles with the median plotted across the boxes as a solid line.Numbers at the top are the percentage of missing data that are gap filled.

Figure 7 .
Figure 7. Histogram of surface energy budget closure for 86 siteyears of data from 23 OzFlux sites.

Figure 8 .
Figure 8. Bar chart of the annual sum of NEE, GPP and ER for nine OzFlux sites in 2013.The bars, from left to right for each site, give the results from the nocturnal method with the Lloyd-Taylor respiration model (OzFluxQC, light red; REddyProc, mid red; FLUXNET, dark red), the nocturnal method with an ANN respiration model (OzFluxQC, light green; DINGO, dark green) and the daytime method (OzFluxQC, light blue; FLUXNET, dark blue).

Figure 9 .
Figure 9.Time series of monthly sums of NEE (red), GPP (green) and ER (blue) at the Calperum (AU-Cpr, top), Daly Savanna (AU-DaS, middle) and Whroo (AU-Whr, bottom) sites.Dark colours show the range of values due to uncertainty in the u * threshold (OzFluxQC, nocturnal Lloyd-Taylor approach), while light colours show the range of values across all seven methods used in Fig. 8.