Testing the applicability of neural networks as a gap-filling method using CH4 flux data from high latitude wetlands

. Since the advancement in CH 4 gas analyser technology and its applicability to eddy covariance ﬂux measurements, monitoring of CH 4 emissions is becoming more widespread. In order to accurately determine the greenhouse gas balance, high quality gap-free data is required. Currently there is still no consensus on CH 4 gap-ﬁlling methods, and methods applied are still study-dependent and often carried out on low resolution, daily


Introduction
Methane is one of the most important long-lived greenhouse gases, second only to CO 2 (IPCC, 2007), with natural wetlands thought to be the biggest individual source (IPCC, 2007;EPA, 2010). Since the advancement in CH 4 gas analyser technology and its applicability to eddy covariance flux measurements (Hendriks et al., 2008;Eugster and Plüss, 2010;Dengel et al., 2011;McDermitt et al., 2011;Peltola et al., 2013), monitoring of CH 4 emissions is becoming more widespread in northern regions (Mastepanov et al., 2008;Sachs et al., 2008;Zona et al., 2009;Sturtevant et al., 2012). These measurements contribute to a better understanding of the greenhouse gas balance of the Arctic and subarctic. In order to accurately estimate annual greenhouse gas budgets, time series of high quality gap-free data are required (Falge et al., 2001;Rinne et al., 2007).
Currently there is no consensus on CH 4 gap-filling methods. Several studies (Zona et al., 2009;Gažovič et al., 2010;Sturtevant et al., 2012) did not apply any gap-filling to their CH 4 flux data. Studies where gap-filling was applied were site dependent and often applied to low resolution, daily 8186 S. Dengel et al.: Testing the applicability of neural networks as a gap-filling method mean values (Hargreaves et al., 2001;Rinne et al., 2007;Riutta et al., 2007;Jackowicz-Korczyński et al., 2010;Long et al., 2010;Tagesson et al., 2012), while Wille et al. (2008), Parmentier et al. (2011a) and Forbrich et al. (2012) employed a model in order to recover missing data in their daily, 3 h and 30 min mean data, respectively. Hargreaves et al. (2001), Rinne et al. (2007), Long et al. (2010) and Tagesson et al. (2012) identified a non-linear relationship between CH 4 flux and peat temperature at depths of 0-10 cm, 35 cm, and 50 cm in subarctic ecosystems, respectively. During extended periods where no dependency on peat temperature was found, Rinne et al. (2007) and Tagesson et al. (2012) applied a simple interpolation to gap-fill these data sets. In addition, Tagesson et al. (2012) applied an exponential regression between half-hourly CH 4 fluxes and friction velocity measured after the soil was completely frozen. No dependency on water table position was found by the two studies mentioned above. A similarly simple peat temperature relationship with CH 4 emissions was also found by Zona et al. (2009) and Jackowicz-Korczyński et al. (2010). Wille et al. (2008) and Sachs et al. (2008) found strong relationships between CH 4 flux, friction velocity and soil temperature at a depth of 20 and 10 cm, respectively. Some of the above mentioned studies considered non-linear relationships to gap-fill their daily averaged CH 4 fluxes, while Parmentier et al. (2011a) provide a method for gap-filling of higher resolution (3 h) data. This method applied a gap-filling model that includes the attenuating effect of atmospheric stability on flux measurements, where methane production was related to soil temperature and water table level. Recently Forbrich et al. (2011) tested various models where peat temperatures at various depths, water table level, barometric pressure and friction velocity were integrated in order to gap-fill their time series. Furthermore, large uncertainties in applied methods do still exist with no common protocol on missing data recovery of CH 4 eddy covariance flux data.
The application of neural networks (Jain et al., 1996;Svozil et al., 1997;Elizondo and Góngora, 2005;Saxén and Pettersson, 2006) for data recovery and gap-filling (Aubinet et al., 2000;Gorban et al., 2002;Papale and Valentini, 2003;Ooba et al., 2006;Moffat et al., 2007 andSchmidt et al., 2008) has proven to be a very reliable tool in several scientific disciplines Dorling, 1998, 1999;Lek and Guégan, 1999;Lee and Jeng, 2002;Toptygin et al., 2005). In atmospheric sciences (Gardner and Dorling, 1998;Toptygin et al., 2005;Chattopadhyay G. and Chattopadhyay S., 2008), applying neural networks in forecasting has become a standard application. Neural networks have the reputation of being a "black box" where transparency is limited in most cases (Elizondo and Góngora, 2005). This partly results from a neural network's high capacity in training itself where coefficients are distributed through fitted weights and spread across several layers to accurately reproduce a given data set. In the current study, we discuss the applicability of neural networks to gap-fill CH 4 flux data from northern high lati- tude ecosystems (wet sedge tundra, sedge fen and polygonal tundra), some driver dynamics, the advantages and disadvantages of this method, and what information can be extracted from such models.
Since CH 4 is the second most potent, long-lived greenhouse gas in the atmosphere (IPCC, 2007), it is becoming increasingly important to introduce a method which is capable of dealing with such high resolution data combined with auxiliary measurements, and which is easy to implement across a variety of ecosystems. Regarding Arctic and subarctic regions, it is very important to work with time series where data gaps have been filled using reliable methods in order to accurately determine CH 4 emissions, potential annual budgets, and prediction of future emissions under a changing climate (Anisimov, 2007;IPCC, 2007). The data sets introduced in the current study were chosen, as they show distinctive differences in their emission patterns and originate from high latitude ecosystems (Fig. 1), to assure the broad applicability of the introduced methods and results.
The aim of the current study is to test and compare three different neural network approaches as a gap-filling method for high resolution, 30 min and 1 h methane (CH 4 ) eddy covariance flux data from Arctic and subarctic ecosystems. Included are a limited number of standard meteorological variables that are measured at all sites that act as drivers for methane emissions in such dynamic ecosystems. This novel approach in CH 4 studies is a first step towards standardising CH 4 gap-filling and a contribution to standardising CH 4 measurement protocols.

Methane flux and meteorological data
The CH 4 eddy covariance flux data used in the current study originates from five distinctively different northern ecosystems ( Fig. 1): the subarctic sites of Stordalen (68 • 20 N, 19 • 03 E, northern Sweden), a mixed mire (Johansson et al., 2006) and Lompolojänkkä a nutrient-rich sedge fen located in the aapa mire region of north-western Finland (67 • 59 N, 24 • 12 E) (Aurela et al., 2009), and the tundra sites underlain by permanent permafrost: Samoylov Island in the southern central Lena River Delta (72 • 22 N, 126 • 30 E) (Sachs et al., , 2010, Kytalyk (70 • 49 N,147 • 29 E) (Parmentier et al., 2011a, b) and Barrow, a wet sedge tundra in the northern part of the Arctic Coastal plain (71 • 17 N, 156 • 36 W) (Zona et al., 2009. The CH 4 fluxes were measured by the eddy covariance (EC) method (Baldocchi, 2003). Instrumentation used in these six studies were the three-dimensional sonic anemometer R3-50 (Gill Instruments Ltd., Lymington, Hampshire, England) coupled with a closed path Fast Greenhouse Gas Analyser (FGGA, Los Gatos Research, Mountain View, California, USA) in Stordalen; the USA-1 (METEK, Elmshorn, Germany) three-axis sonic anemometer/thermometer and the closed-path DLT-100 fast response CH 4 gas analyser (Los Gatos Research, Mountain View, California, USA) in Lompolojänkkä, and the three-dimensional Solent R3 sonic anemometer (Gill Instruments Ltd., Lymington, Hampshire, UK) and the TGA 100 tunable diode laser spectrometer (Campbell Scientific Ltd., USA) in the Lena River Delta. At the Kytalyk site, a three-dimensional Solent R3-50 sonic anemometer (Gill Instruments Ltd., Lymington, Hampshire, UK) and a closed-path DLT-100 fast response CH4 gas analyser (Los Gatos Research, Mountain View, California, USA) were used in both years, while a WindMasterPro sonic anemometer (Gill Instruments Ltd., Lymington, Hampshire, UK) and the closed-path DLT-100 fast response CH4 gas analyser (Los Gatos Research, Mountain View, California, USA) was used in Barrow. The reader is advised to consult Jackowitz-Korczynski et al. (2010), Aurela et al. (2009, Parmentier et al. (2011a, b) and Zona et al. (2009), for more details about the sites, measurements and further instrumentation.
All five sites recorded standard meteorological variables, such as air temperature, solar radiation, soil temperature at various depths, wind speed and wind direction (for the current study wind speed has been decomposed into its horizontal (along wind u) and perpendicular (across wind v) components). Furthermore, CH 4 eddy covariance flux data from Lompolojänkkä were u * filtered, using 0.1 m s −1 as a threshold. Here, fluxes were binned according to u * and tested on two soil temperature ranges resulting in the threshold value mentioned above. At the Barrow site a threshold of 0.1 m s −1  was also applied. The data from Kytalyk were filtered for occurrences of high atmospheric stability, prior to including it in the current study. Wille et al. (2008) have shown that u * itself can enhance methane fluxes and it is therefore useful to obtain also data points at low u * , as long as storage effects are not an issue. Storage did not represent an issue at the Kytalyk site, and after filtering for high stability no significant effect of u * was seen. The reader is advised to consult Parmentier et al. (2011a, b) for more background information on the processing of these fluxes. Data introduced in the current study was not previously gap-filled at 30 min and 1 h (Lena River Delta) resolution. In order to integrate the lagged effect of the water table depth (WTD) and that of precipitation as a potential input variable, we included both variables as they were, and in addition lagged precipitation by one and WTD by 12 days, as has been identified by Kettunen et al. (1996) and Suyker et al. (1996).
To give an overview of the data coverage and availability for the current study, and the representation of day and night (07:00-18:30 and 19:00-06:30) respectively, required for the current application (efficient network training) relevant information has been listed in Table 1.

Artificial neural networks
The topology of a simple multi-layer, feed-forward neural network includes non-linear elements (neurons) that are arranged in successive layers (Fig. 2). The information flows unidirectionally, from the input (covariates) layer to the output (response) layer, through the hidden layer(s) (Jain et al., 1996;Svozil et al., 1997;Elizondo and Góngora, 2005;Saxén and Pettersson, 2006).
In the initial phase, a set of input and target data is used for training and presented to the network many times (also known as iterations). A training data set should have sufficient data to be representative of the overall data set, meaning the whole range of meteorological and flux variability should be available for training (including emission events so that the network can learn such conditions) (Moffat et al., 2010;Papale, 2012). Furthermore, the cross-correlative (Guan et al. 2007) and cross-dependency nature of climatic variables should be taken into account when choosing the appropriate input variables, as some add only little extra information to the network . Training is carried out by constantly adjusting the fitted weights so that the network output matches the target data. In order to test the trained network, a new set of input data is fed into the network and the desired output compared with those predicted by the network. The agreement or disagreement of these two data sets is an indication of the performance of the neural network model. A chosen error function measures the difference between predicted and observed output.
One of the drawbacks of neural networks is the nonuniqueness of the global minimum (Hammerstrom, 1993;Nguyen and Chan, 2004) which changes as each training run achieves different weights and results (it is important to find The architecture of the "seasonal" neural network topology used in the current study. Input variables (left side of the network) are fed into the network with weights fitted (along grey arrows) with information flowing unidirectionally to the nodes (marked as circles) within the hidden layer, where a bias (offset) (marked as 1) is added (along black arrows). Here a sigmoid function (activation function) is applied to the weighted sum, leading further to the next layer, the "output" layer where a new set of weights is distributed, together with a bias and the sigmoid activation function before making an estimate for the output value. As the output still has a range of 0-1, it is rescaled prior to replacing missing data values. Actual fitted weights and biases are removed from the graph for clarity. Please see Table 1 for a definition of the input variables.
a set of weights that processes data accurately enough for the application). Another issue with neural networks is the possibility of under-or over-fitting of networks (Hansen and Salamon, 1990;Jain et al., 1996;Svozil et al., 1997). This can happen when data used to train the network is not representative enough for the entire observation span, if the number of hidden layers or neurons is not correct, if the global minimum is overshot or when the network learns the train-ing pattern well but is underperforming during its validation (poor generalisation) (Jain et al., 1996;Gardner and Dorling, 1998;Nguyen and Chan, 2004;Wang et al., 2005;Saxén and Pettersson, 2006;Stathakis, 2009). To prevent this from happening, one can remove redundant input data (Gunaratnam et al., 2003;Saxén and Pettersson, 2006), reduce or increase the number of neurons in the network and use the appropriate generalisation such as early stopping (Hansen and Salamon, 1990;Amari et al., 1997;Svozil et al., 1997;Wang et al., 2005) or another stopping criteria (Günther and Fritsch, 2010;Fritsch and Günther, 2012).

Pre-processing of data
Recently, several studies (Zhang and Qi, 2005; Klevecka and Lelis, 2009) pointed out an ongoing debate on whether data should be de-seasonalised prior to applying neural networks. Nelson et al. (1999) showed better results for de-seasonalised time series, while others (Sharda and Patil, 1992;Franses and Draisma, 1997) found that neural networks are able to model seasonality directly and prior de-seasonalisation is not necessary. Regarding gap-filling of atmospheric trace gas fluxes (van Wijk and Bouten, 1999;Aubinet et al., 2000;Carrara et al., 2003;Papale and Valentini, 2003;Ryan et al., 2004;Ooba et al., 2006 andSchmidt et al., 2008), no de-seasonalisation of data was carried out prior to applying neural networks. As artificial neural networks have the ability to deal with complex data sets and as no de-seasonalisation of data was carried out in previous gap-filling studies (see above references) we decided to proceed in the same manner. Table 1 gives an overview of the data availability and distribution used in the current study. All sites are located within the Arctic Circle and are experiencing the polar day (∼ 3 months of no darkness over the summer period). As u * filtering has been applied to two sites only (very good nighttime data coverage, nonetheless), data were not split into the two categories day/night as required for CO 2 gap-filling studies. Furthermore, by including all available data (see effective data coverage in days in Table 1) we avoided a further reduction in number of data points and risking working with models that lack in power or representativeness. Also, grouping S. Dengel et al.: Testing the applicability of neural networks as a gap-filling method 8189 data by different meteorological conditions would lead here to a subjective choice as not every emission event, or weather condition has the same effect on methane fluxes. One of the reasons could be lagged effects of some of the drivers (only the lagged effect of water table depth and precipitation are discussed here) that need further investigation.
We are dealing with seasonal and diurnal data that experience regular and predictable changes. In order to add this seasonal and diurnal effect to their neural network, Papale and Valentini (2003) introduced several fuzzy data sets reflecting the diurnal and seasonal variation to reduce the linear cumulative numerical weight of time in relation to other variables. Adding this type of input to neural networks does not always increase the neural network performance, and has been shown by Schmidt et al. (2008) to have sometimes little effect. Our analysis, however, showed an increase in network performance when including these fuzzy sets. Therefore, four fuzzy sets representing the diurnal effect and three (spring, summer and autumn) mirroring seasonal variations were included in the current study. Winter as a fuzzy set has been excluded as none of the used flux data sets extended into the winter period. Furthermore, these seasonal fuzzy sets were adjusted to correspond to the seasons in northern latitudes (onset of spring is later than in temperate regions, for example). We kept the models simple following the principle of parsimony (Beck, 1943;Bugmann and Martin, 1995) and the quality assurance standards highlighted in Moffat et al. (2010). The predictive ability of a model initially increases with complexity but they do also have the tendency to decline once a model becomes too complicated (Bugmann and Martin, 1995).
The meteorological, soil, and CH 4 flux data as well as the fuzzy data sets consist of different magnitudes and units. In order to generalise the data, we have scaled all data from 0 to 1 as has been previously applied by van Wijk and Bouten (1999), Papale and Valentini (2003), Nguyen and Chan (2004) and Moffat et al. (2010). Furthermore, the range between 0 and 1 is also necessary as we are applying a sigmoid activation function (Cybenko, 1989), which has a range of 0-1. By scaling the data that we are feeding into the network, all data is being treated equally and weights can be distributed evenly. A sigmoid function was also used in the output layer (Fig. 2), as has been previously applied by Papale and Valentini (2003).
Three approaches were tested here: two including meteorological variables recorded at all five sites ("seasonal" and "diurnal") and a third, a more thermo-hydrological approach ("lagged"), where the lag effects of water table depth (WTD) and precipitation were incorporated at some of the sites.

Applying artificial neural networks to data
Introducing the neural network topology (Fig. 2) used in the current study, input variables (left side of the network) are fed into the network with weights fitted and spread across the two layers with information flowing unidirectionally (grey arrows) to the 4 nodes (marked as circles) within the hidden layer, where a bias (offset) (marked as 1) is added.
In this case, the underlying function is simply written as where n is the number of hidden neurons (4 in our case), x i represents the input variables (x 1 , . . ., x 9 -in our case), w ij denotes the fitted weights (w 1 , . . ., w 4 -for each neuron) attached to each input variable and b denotes the bias (or offset) that is added to the weighted sum prior to applying the sigmoid activation function (f ), leading further to the next layer, where a new set of weights are distributed, together with a bias and the sigmoid activation function before making an estimation of the output values (o). These output values also have a range of 0-1 which were then rescaled to their appropriate physical unit (nmol m −2 s −1 ).
As we are testing and comparing different artificial neural networks as a gap-filling method for CH 4 eddy covariance flux data, we utilised the artificial gap length scenarios introduced in Moffat et al. (2007, Appendix A). These scenarios served in extracting the same data rows (∼ 10 %) from all data sets which were utilised for model performance testing, necessary for reliable model comparison and evaluation. Moffat et al. (2007) generated secondary data sets by flagging 10 % of the data as artificial gaps. Ten data sets were generated for each gap scenario, each having a different temporal shift in gap distribution. We have chosen three scenarios per gap length . These gap lengths represent very short gaps (V, 1-3) of random 30 min values; short gaps (S, 1-3) of random 4 h gaps, medium (M, 1-3) of 1.5 days, long (L, 1-3) of 12 full days and a mixed scenario (X, 1-3), representing a mix of the above mentioned gap lengths, resulting in 15 scenarios per data set. These scenario labels 1-3 do not correspond to the consecutive scenario numbering introduced in Moffat et al. (2007), but are chosen for simplicity. Scenarios were chosen in such a way that the maximal existent data coverage is achieved, together with an appropriate gap length. There were cases where artificial gap data points coincided with already existing gaps, resulting in a non-uniform length of data pairs used as test data sets. Nevertheless, each scenario extended the already existing gaps by a further 8-14 %. The artificial gap scenarios introduced in Moffat et al. (2007) are for 30 min resolution files which were adjusted for the Lena River Delta data set, where a 30 min artificial gap was applied to the respective hour value. The remaining available data were used for training, avoiding any further reduction of the number of data rows out of reasons mentioned in the previous section.
Several learning algorithms are available for neural network training. In the current study we applied the resilient backpropagation algorithm (Riedmiller, 1994). It is a first-order optimisation algorithm that acts on each weight separately. It modifies the weights in order to find a local minimum of the error function. The weights are modified going in the opposite direction of the partial derivatives until a local minimum is reached (if the partial derivative is negative, the weight is increased; if the partial derivative is positive, the weight is decreased). This ensures that a local minimum is reached, leading to an efficient and transparent adaptation process (Riedmiller, 1994;Günther and Fritsch, 2010). The resilient backpropagation algorithm has been chosen as it appears to be the fastest in training and most consistent learning algorithm in several studies (Schiffmann et al., 1994;Treadgold and Gedeon, 1996;Kişi and Uncouglu, 2005). Furthermore, this learning algorithm does limit the size of the weights, which are generally also linked to over-fitting of a model.
In order to test the network's performance, various error functions can be applied. We chose the sum of squared errors (SSE), as previously used by Moffat et al. (2010) and assembled the neural network by implementing the "neuralnet" package (Günther and Fritsch, 2010;Fritsch and Günther, 2012) in R statistical language (R Development Core Team, 2013). Here we applied the built-in training function and modified it accordingly to suit our purposes (maximum number of iterations and threshold value for the partial derivatives of the error function). Moffat (2010) employed a similar method but used the root mean square error (RMSE) as an error function. The process stops when this training error levels off and all partial derivatives of the error function reach the pre-specified threshold value of 0.01 (1 %) that acts as a stopping criteria.
One of the advantages of applying the "neuralnet" package is the possibility to choose this integer specifying the threshold as a stopping criteria (Günther and Fritsch, 2010) that should be achieved during the training phase, along with the maximum number of iterations (per repetition) the network should carry out in order to fulfil our requirements (convergence of the network and finding the local minimum). The trained network is then tested on the extracted ∼ 10 % of data (artificial gaps) by applying the test function (compute) within the "neuralnet" package. If the error function is equivalent to the negative log-likelihood function then the Akaike information criterion (AIC) can be used to avoid an overfitting of the trained network (F. Günther, personal communication, 2013).
The data were not split into the traditional three-way crossvalidation data sets: training, validation and testing subsets (where a model is trained on a training data set with a smaller validation data set that is periodically passed through the network, before being tested on the independent test data). One reason is the use of the applied package, but also to keep the training and testing of the network transparent. Another reason is the applicability of the method to short time series. Would a short time series, such as Kytalyk (2008), be split into three data sets then the resulting analysis would lack any power and representativeness. This way all six data sets were treated in the same manner. In order to ensure the reliability of all trained models an additional test across sites has been carried out. Moffat (2010) showed that model reliability can also be checked via cross-validation using data from another site.
A reliable model does require a good generalisation (also to avoid overtraining/over-fitting), which was taken into account in the current study. Moffat (2010) does list the main points of having the right number of neurons, stopping the training as soon as the error levels off, pruning the nodes (neurons) and penalising large weights by regularisation. When applying the resilient backpropagation learning algorithm weights are limited internally in size (Riedmiller, 1994). Verifying the reliability of a model not only on its test data but also via cross-validation with data from another site does also contribute to a good generalisation of a model. In order to ensure the reliability of our networks and their applicability in CH 4 studies (also regarding drivers/input parameters) site combinations were chosen according to their flux range and site properties.
There is currently no consensus in the scientific community on the number of neurons that should be used (Svozil et al., 1997;Saxén and Pettersson, 2006;Stathakis, 2009) when applying neural networks to data series. In order to apply the appropriate number of neurons, 25 repetitions were run for a selection of neurons (1-12) to help in choosing the appropriate number of neurons (Järvi et al., 2012) to be applied within the hidden layer of our networks. Furthermore, utilising the integrated AIC's calculation did also support the choice in applied number of neurons. The AIC value reflects the overall fit of a model and it is used to avoid an overfitting of the trained model. While by itself the AIC does not give much information, it only becomes applicable when it is compared to the AIC value of a series of models in which the same data sets (observations) were used. It does also help to determine which model is most parsimonious and an information criterion commonly used for model selection (Hurvich and Tsai, 1989;Burnham and Anderson, 2004). Once no further improvement could be realised in terms of goodness of fit and the networks were able to predict CH 4 fluxes relatively accurately (Pearson's correlation coefficient (r) and the RMSE achieved when testing the trained networks (data not shown)), we decided to proceed further using four neurons within the hidden layer. All networks (15 scenarios per site for all tree approaches) were trained again 50 times and the same error values calculated again (Figs. 3-5).

Statistical analysis
In order to examine all input variables and their effect on methane fluxes, we applied a simple stepwise regression (a combination of backward elimination and forward selec- resolution data. Following the principle of parsimony, we decided on a few meteorological variables and four fuzzy sets representing time of day (morning, afternoon, evening and night) and three representing the seasonal variation (spring, summer and autumn) (Fig. 2, Table 2). This selection helped to prune the network by avoiding insignificant input data (Gunaratnam et al., 2003;Saxén and Pettersson, 2006) and to avoid cross-correlations (to some extent) between input variables, adding little extra information to the network. We have tested three approaches by using three different sets of input data: the most simple approach included solar radiation as a time of day indicator and the three seasonal fuzzy sets ("seasonal", Table 2), a second approach in which solar radiation has been removed and replaced by the time of day fuzzy sets ("diurnal", Table 2), while a third, a more thermo-hydrological approach is tested by integrating the lagged effect of precipitation and WTD which was applied to four out of six data sets ("lagged", Table 2).
For the first two approaches, we chose those variables that appeared important and available in all data sets. Furthermore it did help to standardise the method and make it applicable to all six different data sets in the same way. The study by Wille et al. (2008) has shown that u * can enhance CH 4 emissions. Therefore, adding u * as an input variable where data has previously been u * filtered, would lead to uncertainties, as u * filtered data do not provide the information necessary for the network to train and learn such conditions in order to predict CH 4 fluxes occurring under similar conditions.
Precipitation and water table depth can also act as CH 4 drivers (Whalen and Reeburgh, 1992;Roulet et al., 1992;Christensen, 1993;, and can additionally also have a lagged effect on CH 4 emissions (Windsor et al., 1993;Bubier et al., 1995;Kettunen et al., 1996;Suyker et al., 1996). In order to integrate these two hydrological variables, data from Kytalyk were tested by using the current corresponding precipitation values as well as precipitation values lagged by one day, as has been identified in the study conducted by Kettunen et al. (1996). Kettunen et al. (1996) and Suyker et al. (1996) identified a distinctive 12 day lag when investigating the water table fluctuation effect on CH 4 fluxes from boreal wetlands. Stordalen and Lompolojänkkä (both subarctic) were used to test this approach. Barrow was tested by integrating other thermohydrological variables, such as soil moisture, vapour pressure deficit (VPD) and soil heat flux as input variables. Lena River Delta data were used to test relative humidity in addition to barometric air pressure as an indicator for atmospheric changes.
In order to investigate the performance of all the networks of the three approaches we estimated the mean r (at 95 % confidence level), AIC and root mean square error (RMSE) as a goodness of fit indicator of the measured and predicted fluxes. The simple RMSE indicates the range of the error for each scenario and each site (shown in their true physical unit of nmol m −2 s −1 ). The resulting r values for the training (grey full circles) and test (black full circles) are shown in Figs. 3-5 (lower panel), while the RMSE for the test data are shown as box-plots in the upper panel of these figures. AIC values, necessary for model selection and an indicator for model performance (the lower the value, the better the model) for all six sites, scenarios and model approaches are visualised in Fig. 6. The Pearson correlation coefficient was also utilised as goodness of fit in the cross-validation analysis (Fig. 7).

Results
We applied artificial neural networks to six different CH 4 flux data sets originating from subarctic and Arctic regions of Fennoscandia, Siberia and Alaska. We have tested three approaches by using three different sets of input data. These input variables (listed in Table 2) were, amongst others, air temperature (Air T), soil temperature at the depth of 10 cm (Soil T), wind direction (decomposed wind speed into along (u) and across wind (v)), barometric air pressure (Air P) and the fuzzy transformation of the time of day and seasonal variations.
In the first step of the analysis 1-12 neurons were applied to both standardised approaches in order to find the optimal number of neurons, as has been introduced in Järvi et al. (2012) prior to deciding on the final number of neurons included in the hidden layer. The training distribution showed an increase in correlation coefficient value with each added neuron (data not shown here), while some of the test results showed no improvement with increase in neurons added. The distribution indicated that from four neurons onwards no real improvement was visible, be it for short, long or mixed gap length scenarios. This is also confirmed by the lack of statistical significance at the 95 % confidence level (notched Tukey box plots) beyond four neurons leading to the assumption that four neurons were ideal within the hidden layer to be included in this study for all three approaches. The integrated AIC calculations did also support this approach.
The output from the first approach (Table 2) (seasonal effect) is visualised in Fig. 3, showing the r coefficients achieved during the training (in grey) and test phases (in black), by applying four neurons only. In the upper panel the RMSE is given in true physical units (nmol m −2 s −1 ) for the test outcome only. Missing values indicate the lack of data or inappropriate length of data coverage or a distribution outside the visualised plot margins. Figure 4 shows the outcome from the second approach (diurnal effect) where solar radiation, as an indicator of time of day, was replaced by the four fuzzy sets representing the diurnal effect of the data. Here again, the performance of the network at each site and their error values become visible and comparable. The third approach (thermo-hydrological or lagged effects) in which different input variables were chosen Table 2. Input variables used in three different neural network gap filling approaches. The input variables listed are: air temperature (Air T ), soil temperature at the depth of 10 cm (Soil T ), wind direction (Wind U and Wind V (horizontal and perpendicular)), solar radiation (Sol Rad, substituted with photosynthetic active radiation where not available), barometric air pressure (Air P ), relative humidity (RH), water table depth (WTD), precipitation, soil heat flux, soil moisture, soil temperature of the polygon rim (Rim temp) and vapour pressure deficit (VPD). Furthermore, included are the fuzzy transformation of the seasonal variation and time of day represented by spring, summer and autumn, as well as the four time periods morning, afternoon, evening and night.

All sites
All sites Stordalen Lompolojänkkä Lena River Delta Kytalyk Barrow (approach 1) (approach 2) (approach 3) (approach 3) (approach 3) (approach 3) (approach 3) "seasonal" "diurnal" "lagged" "lagged" "lagged" "lagged" "lagged" for each site is illustrated in Fig. 5. The error values for Stordalen, Lompolojänkkä and Kytalyk (2008) (Figs. 3-5) appear much higher than those of the remaining three results; they also have the highest flux ranges (see Fig. 8) in our study. In order to visualise the model performances (overall fit of the model) we included the AIC values as an aid for model selection (Fig. 6) for each scenario. The "seasonal" and "diurnal" model included the same number of input vari- ables and same number of neurons. Their values appeared the lowest across all five sites with the "seasonal" model showing the lowest values for all scenarios in five out of six data sets. The AIC values for the "lagged" model for Stordalen and Kytalyk show the highest values. This information criterion is a measure of the overall fit but does also indicate how parsimonious a model is. The "lagged" models for these two sites do include more input variables and they are therefore "penalised" accordingly (part of the AIC principle), (Hurvich and Tsai, 1989). Therefore, the "seasonal" model was chosen to visualise the results from our additional model reliability test carried out for all six trained networks (Fig. 7). None of these results point to any degradation of the network performance or over-fitting of the trained networks. In order to visualise the performance of the artificial neural networks applied to the "seasonal" data, we illustrated the goodness of fit of the predicted and actually measured CH 4 flux data (test data) for all three mixed scenarios showing their distribution along the ideal 1 : 1 regression line in

Discussion
Artificial neural networks that have previously been successfully implemented as a gap-filling method (Falge et al., 2001, Moffat et al., 2007 for carbon dioxide flux time series (Aubinet et al., 2000;Carrara et al., 2003;Valentini, 2003 andSchmidt et al., 2008) have been described as a robust, reliable and versatile tool. Nevertheless, their application is time consuming, particularly in finding the appropriate input variables, the appropriate number of hidden layers, and neurons/nodes within these layers, as well as the choice of training and test data sets (data rows). Furthermore, the global minimum (Hammerstrom, 1993;Nguyen and Chan, 2004) is not unique and changes with each training run because every training run/repetition (a run includes many iterations) achieves different fitted weights and results (it is important to find a set of weights that processes data accurately enough for the application).
In the current study, we tested the applicability of neural networks as a gap-filling tool for methane flux data and also made an attempt to standardise the method by including the same input variables for all data sets and using the same number of neurons within the hidden layer for each data scenario. In order to test their applicability, we applied the method to various ecosystems by including six distinctively different data sets from high latitudes, one showing   . Illustrated are the results from the "seasonal" model in black, the "diurnal" model in light grey and the "lagged" model in dark grey. Following the AIC principle it becomes visible that the "seasonal" fitted model appears to be the preferred across all 15 scenarios. diurnal and seasonal variation, one only seasonal while three data sets do not show any diurnal or seasonal variation. The sixth data set (Stordalen) reflected its position on the shores of Villasjön Lake in its emission patterns. Three different approaches have been introduced here: two including the same input variables across all sites (in an attempt to standardise the method), while a third approach included different input variables at each site and is also taking the lagged effect of water table depth and precipitation into account. Furthermore, we tested the reliability of our models by testing our trained network with data from another site with similar ranges in fluxes or site properties.
The chosen input variables (Table 2; commonly recorded meteorological parameters that act as CH 4 drivers and wind direction) as well as fuzzy sets representing time of day and seasonal changes appear to be the right choice. On the one hand adding more input variables would not comply with the simple model approach, while, on the other hand, adding more input variables would introduce the risk of cross-dependency and to some extent cross-correlation (Guan et al., 2007) between input variables, as input variables should remain as independent as possible as described in Moffat et al. (2010). Reducing the number of chosen input variables any further would not comply anymore with the principle of parsimony, of keeping a model simple but not too simple, leading to an underperformance of the network as predicted fluxes did not reach the full range which means the networks were underestimating the fluxes. The chosen meteorological variables included in the current study belong to the main drivers as shown in previous studies (see Introduction). Hydrological properties, such as precipitation and water table depth, can have a lagged effect on methane emissions (Windsor et al., 1993;Bubier et al., 1995;Kettunen et al., 1996;Suyker et al., 1996). Kettunen et al. (1996) carried out a cross-correlation study looking at the different lag effects for both variables at a boreal mire complex (six sites). They identified a significant lag of one day for both their boreal flark and hummock sites, indicating that precipitation increased emissions throughout the summer. Furthermore, Kettunen et al. (1996) also identified a 12 day lag with WTD at both these sites. Here again, a rise in water table was reflected in CH 4 emissions. Suyker et al. (1996) also identified a distinctive 12 day lag when investigating the water table fluctuation effect on the midday CH 4 fluxes from their boreal fen.
Water table position is not always recorded, or not recorded with the same time resolution as the CH 4 fluxes. Nevertheless two of the selected sites (Stordalen and Lompolojänkkä) do record water table depth continuously and were included in the current study. Results show that their network performance remained very high showing these two lagged variables to be reliable input variables in the current study. Including the lagged effect of precipitation did slightly decrease the performance of the Kytalyk (2008) network (Fig. 5). A reason for the decrease could be the data set lengths or the fact that the lagged effect of precipitation was not present.
The performance of the mixed scenarios (representation of the most realistic gap scenario in flux data) originally chosen by Moffat et al. (2007) as a crosscheck of the other four different gap scenarios and also included in the current study show the same, in some cases better, results than some of the individual scenarios (see Figs. 3-5) themselves. The Lena River Delta appears to display the lowest mean correlation coefficient values. The extreme discrepancy in case of the Lena River Delta results are due to low CH 4 fluxes recorded at the site with few significant emission events resulting in much higher fluxes. Furthermore the correlation coefficient values for the test (artificial gaps) data sets showed sometimes higher values than those achieved in the training phase . This could be due to the fact that Barrow (for example) experienced a diurnal trend and the data composing the test data set (artificial gaps) did not include any specific events. In case of the Lena River Delta data set, the artificially mixed gaps included two little events that the "seasonal" network was capable to predict (Fig. 8). This performance could be credited to the fact that the network has learnt about such events from similar conditions during the training phase (indication of good representativeness of the overall data as part of the training data set). The introduced lag effect for this site (Fig. 5) did slightly reduce the network performance, showing low correlation values with values still within acceptable significance level margins, nonetheless. This reduction in network performance could be credited maybe to the choice of relative humidity as an input variable in this study.
Some of the Pearson correlation coefficients achieved in the current study appear low (Figs. 3-8), compared to those achieved for CO 2 fluxes when applying the same method (Moffat et al., 2007). Much higher correlation coefficients (r > 0.95) were achieved in the current study when comparing trained data versus actual measured data, but resulted frequently in no acceptable values when testing the network performance. Outliers were introduced in places where there were no high or low fluxes. Such results could also be due to existing and gap scenario data distribution, as artificial gaps coincided with existing gaps, reducing the number of testing data points. None of the data sets included an entire gap free 12 day period that could have been used as a classical "long gap" test data set, as introduced in Moffat et al. (2007). Furthermore it is to be expected that predicting CH 4 emission events is more complex than predicting CO 2 fluxes that undergo a regularity and predictability in respiration or photosynthetic uptake, respectively. Integrating the additional test did help in testing the reliability of our trained networks beyond the simple testing using test data originating from the same overall data set. These results do show that, though methane does behave in a different manner than CO 2 , it is still possible to apply this type of cross-validation method to such fluxes as common drivers do act similarly at Fig. 7. Results from the additional tests carried out, testing each "seasonal" trained network with data from another site that showed a similar range in fluxes or similar site properties. Site names on the left side in the legend indicate the name of the trained network, while the site name on the right side indicate the site name whose data has been used to test the reliability of the model. all sites. These results are also indicative of the robustness of the models introduced in the current study.
Two other important aspects in the current study are the length of the time series (Table 1), and thus consequently the sufficiency of available training data, and secondly the gap length in the existing time series. Time series vary between 15 days (Kytalyk, 2008) and 181 days (Lompolojänkkä), with Lompolojänkkä having the highest proportion of gaps in the CH 4 fluxes. Regarding sufficiency of available training data, we believe that by utilising the method described here, we kept the method transparent, avoided working with models lacking in power, and making this method applicable to short and longer time series. Moffat et al. (2010) showed that neural networks are actually also applicable to one day of incomplete data, when applied to CO 2 flux data. However, none of the time series included in the current study were gap free or included an entire year's worth of data. Therefore, no bias error to indicate the bias induced on the annual sums/budgets, as referred to by Moffat et al. (2007), was calculated.
Nevertheless, neural networks show excellent performance (Figs. 3-8), which proves that this standardised method is easy to be implemented and applicable to many different ecosystems in the northern latitudes (Fig. 1). We have reliably reproduced and predicted methane fluxes. Furthermore, we were able to incorporate successfully the lagged effect of hydrological properties such as water table depth at two sites and precipitation at one site (Kytalyk). We find artificial neural networks to be recommendable as a reliable and robust gap-filling method for high resolution CH 4 flux data originating from various ecosystems at high latitudes, as estimated annual budgets rely on accurate gap-free or gap-filled data.

Challenges and recommendations
A peculiar characteristic of CH 4 is its higher emission variability than CO 2 fluxes, often connected to specific events such as those visible in the Lena River Delta data set. In our case, the network was able to reproduce these events . In case such events are triggered by other drivers or physical forcing (not included as input variables in neural networks), predicted values do diverge from the actually measured values. Precipitation and water table depth can affect CH 4 fluxes (Windsor et al., 1993;Bubier et al., 1995;Kettunen et al., 1996;Suyker et al., 1996) with each rain event being different in intensity and length. The same can be said of water table depths whose rise and fall are not equally predictable after each such event also behaving differently at each site. This lagged effect has, so far, only been confirmed in in situ soil chamber studies (Windsor et al., 1993;Kettunen et al., 1996), laboratory experiments (Funk et al., 1994;Kettunen et al., 1999) and midday CH 4 flux values, measured with the eddy covariance method (Suyker et al., 1996) but not in high resolution CH 4 eddy covariance flux studies. In addition, further uncertainties in predicting/estimating accurate CH 4 emissions do exist regarding the insufficiently understood "pressure pumping effect" (Zamolodchikov et al., 2003;Takle et al., 2004) and friction velocity (u * -correction), a parameter known to act as a driver for CH 4 emissions (Wille et al., 2010) but also used as a filtering criterion for low turbulence, both affecting methane emissions. These factors might have an influence on how far neural networks are reproducing and predicting CH 4 fluxes accurately.
In order to evaluate, carry out and apply standardised CH 4 flux measurements, data post-processing and gap-filling methods in the future, standardised protocols and required auxiliary measurements are needed to be implemented. First steps towards such procedures have already been initiated during the international ESF (European Science Foundation) explanatory (Germany, April 2012) and FLUXNET CH 4 (Finland, September 2012) workshops.