Evaluation and uncertainty analysis of regional-scale CLM4.5 net carbon flux estimates

Modeling net ecosystem exchange (NEE) at the regional scale with land surface models (LSMs) is relevant for the estimation of regional carbon balances, but studies on it are very limited. Furthermore, it is essential to better understand and quantify the uncertainty of LSMs in order to improve them. An important key variable in this respect is the prognostic leaf area index (LAI), which is very sensitive to forcing data and strongly affects the modeled NEE. We applied the Community Land Model (CLM4.5BGC) to the Rur catchment in western Germany and compared estimated and default ecological key parameters for modeling carbon fluxes and LAI. The parameter estimates were previously estimated with the Markov chain Monte Carlo (MCMC) approach DREAM(zs) for four of the most widespread plant functional types in the catchment. It was found that the catchment-scale annual NEE was strongly positive with default parameter values but negative (and closer to observations) with the estimated values. Thus, the estimation of CLM parameters with local NEE observations can be highly relevant when determining regional carbon balances. To obtain a more comprehensive picture of model uncertainty, CLM ensembles were set up with perturbed meteorological input and uncertain initial states in addition to uncertain parameters. C3 grass and C3 crops were particularly sensitive to the perturbed meteorological input, which resulted in a strong increase in the standard deviation of the annual NEE sum (σNEE) for the different ensemble members from ∼ 2 to 3 gC m−2 yr−1 (with uncertain parameters) to ∼ 45 gC m−2 yr−1 (C3 grass) and ∼ 75 gC m−2 yr−1 (C3 crops) with perturbed forcings. This increase in uncertainty is related to the impact of the meteorological forcings on leaf onset and senescence, and enhanced/reduced drought stress related to perturbation of precipitation. The NEE uncertainty for the forest plant functional type (PFT) was considerably lower (σNEE ∼ 4.0–13.5 gC m −2 yr−1 with perturbed parameters, meteorological forcings and initial states). We conclude that LAI and NEE uncertainty with CLM is clearly underestimated if uncertain meteorological forcings and initial states are not taken into account.

Abstract.Modeling net ecosystem exchange (NEE) at the regional scale with land surface models (LSMs) is relevant for the estimation of regional carbon balances, but studies on it are very limited.Furthermore, it is essential to better understand and quantify the uncertainty of LSMs in order to improve them.An important key variable in this respect is the prognostic leaf area index (LAI), which is very sensitive to forcing data and strongly affects the modeled NEE.We applied the Community Land Model (CLM4.5-BGC) to the Rur catchment in western Germany and compared estimated and default ecological key parameters for modeling carbon fluxes and LAI.The parameter estimates were previously estimated with the Markov chain Monte Carlo (MCMC) approach DREAM (zs) for four of the most widespread plant functional types in the catchment.It was found that the catchment-scale annual NEE was strongly positive with default parameter values but negative (and closer to observations) with the estimated values.Thus, the estimation of CLM parameters with local NEE observations can be highly relevant when determining regional carbon balances.To obtain a more comprehensive picture of model uncertainty, CLM ensembles were set up with perturbed meteorological input and uncertain initial states in addition to uncertain parameters.C 3 grass and C 3 crops were particularly sensitive to the perturbed meteorological input, which resulted in a strong increase in the standard deviation of the annual NEE sum (σ NEE ) for the different ensemble members from ∼ 2 to 3 g C m −2 yr −1 (with uncertain parameters) to ∼ 45 g C m −2 yr −1 (C 3 grass) and ∼ 75 g C m −2 yr −1 (C 3 crops) with perturbed forcings.This increase in uncertainty is related to the impact of the meteorological forcings on leaf onset and senescence, and enhanced/reduced drought stress related to perturbation of precipitation.The NEE uncertainty for the forest plant functional type (PFT) was considerably lower (σ NEE ∼ 4.0-13.5 g C m −2 yr −1 with perturbed parameters, meteorological forcings and initial states).We conclude that LAI and NEE uncertainty with CLM is clearly underestimated if uncertain meteorological forcings and initial states are not taken into account.

Introduction
Net ecosystem CO 2 exchange (NEE), the difference of CO 2 release via soil and plant respiration and photosynthetic CO 2 uptake, is an important indicator for the net carbon sink or source function of terrestrial ecosystems.The understanding of factors controlling the spatial and temporal variability of carbon fluxes like respiration is still limited (Reichstein and Beer, 2008).Eddy covariance (EC) net carbon flux measurements are limited to a relatively small area.Chen et al. (2012) showed that the 90 % cumulative annual footprint area of 12 EC towers located at Canadian sites (with different land cover including grassland and forest) varied from about 1.1 to 5.0 km 2 , and that the spatial representativeness of the EC flux measurements depends on the degree of the land surface heterogeneity.Biogeochemical fluxes are spatially and temporally highly variable and nonlinear due to the spatial het-Published by Copernicus Publications on behalf of the European Geosciences Union.erogeneity of soil properties, vegetation and fauna, and to the temporal variability of the environmental drivers (e.g., meteorological conditions, management schemes; Chen et al., 2009;Stoy et al., 2009;Borchard et al., 2015).Therefore, conventional interpolation methods, such as kriging or inverse distance weighting, are not suitable for upscaling EC carbon flux measurements to larger areas.
Land surface models like CLM (Oleson et al., 2013) simulate the coupled carbon, nitrogen, water and energy cycle of the land surface, and are essential to understand interactions between the climate and the terrestrial carbon cycle and to predict climate-ecosystem feedbacks (e.g., Le Quéré et al., 2012;Arora et al., 2013;Brovkin et al., 2013;Todd-Brown et al., 2014).In this study, CLM version 4.5 in the biogeochemistry (BGC) mode (CLM4.5-BGC)was applied.The prognostically calculated leaf area index (LAI) is a major indicator for the model representation of plant phenology.Moreover, it affects photosynthesis and transpiration.Therefore, the LAI is a key state variable for carbon flux predictions and land surface-atmosphere exchange fluxes of water and carbon.Thus, a correct representation of the simulated LAI in terms of magnitude and timing is highly desirable.The representation of plant phenology (timing of plant emergence, length in growing season) has been shown to be seriously flawed in land surface models (LSMs; e.g., Richardson et al., 2012) including CLM (Dahlin et al., 2015), which can also affect carbon flux estimates (Baldocchi and Wilson, 2001;Richardson et al., 2012).Simulated carbon fluxes and the prognostic LAI in CLM4.5-BGC are closely linked, because they depend on common ecological key parameters and plant phenology schemes.
Commonly, LSMs are applied at global or continental scale (e.g., Stöckli et al., 2008;Bonan et al., 2011;Lawrence et al., 2012) with grid sizes of ∼ 0.25-1.5 • .At this coarse scale with such a high degree of spatial aggregation, and given the non-linearity of the governing equations, the modeled values for output variables like NEE can deviate strongly from those that would have been calculated with a fineresolution model using fine-resolution input.Moreover, reliable calibration and validation of global LSMs is difficult, because observed data including soil carbon stocks and EC fluxes are only available for single locations.When applying a LSM for a small region or catchment with high spatial resolution (e.g., 1 km 2 , as in this study) the error of simulated fluxes is expected to be smaller due to the lower degree of spatial aggregation (Anderson et al., 2003).In addition, the land cover within a 1 km 2 grid cell more likely matches with the land cover at the EC site, which enables grid-based evaluation of modeled NEE.High spatial resolution can better represent the land surface heterogeneity and regional weather variability than a coarse spatial resolution.Thus, regional or catchment-scale applications of LSMs allow for investigating spatial patterns of model states, biogeochemical fluxes and interactions with the regional climate and catchment hydrology.Accordingly, quantification of carbon fluxes at re-gional scales can enhance the understanding of CO 2 dynamics and their drivers (Desai et al., 2008).This has been shown in various studies, for example, for western Africa (Bonan et al., 2002;Li et al., 2007) and the Alaskan Arctic (Fisher et al., 2014).However, to our knowledge, studies like Xiao et al. (2011), who optimized a simple ecosystem model to upscale measured EC carbon fluxes to the regional scale, do not exist yet for more complex LSMs like CLM.This is because (i) high-resolution input data are often not available, (ii) the implementation of a new model setup to a specific region is relatively time consuming and (iii) careful parameter estimation is required to allow for meaningful predictions.
LSM predictions of carbon, water and energy fluxes are still subject to a high degree of uncertainty due to (i) model structural deficits related to an imperfect and incomplete model representation of the biogeochemical processes (Todd-Brown et al., 2012;Foereid et al., 2014), (ii) poorly constrained model parameters (Abramowitz et al., 2008;Beven and Freer, 2001;Todd-Brown et al., 2013), (iii) errors in the representation of initial model conditions generated via a spin-up (Carvalhais et al., 2010;Kuppel et al., 2012;Xia et al., 2012;Exbrayat et al., 2014) and (iv) errors in both atmospheric and land surface input data.Some studies estimate the uncertainty of terrestrial carbon flux predictions based on an ensemble of many different LSMs (e.g., Fisher et al., 2014;Huntzinger et al., 2012;Piao et al., 2013;Zhao et al., 2016).These studies focus on differences between models and therefore model structural deficits.Those studies highlight that (i) carbon flux predictions are generally highly uncertain, which contributes to the uncertainty in climate change predictions, (ii) interactions of the different processes and drivers are not understood satisfactorily and (iii) models require structural improvement to produce more consistent predictions.In order to improve LSM model structure and thus model-data and inter-model consistency, a more comprehensive understanding of model functionality and the contribution and link of the different model error sources is required.However, as highlighted by Xiao et al. (2014), the uncertainty of carbon fluxes obtained by ecosystem and land surface models has not been analyzed and quantified enough, particularly in regional-scale studies.
Whereas the uncertainty of land surface model parameters has been subject to intensive investigations (e.g., Ren et al., 2013;Xiao et al., 2014), and several works are dedicated to reducing this uncertainty, for example by parameter estimation with data assimilation methods (e.g., Safta et al., 2015), the other sources of uncertainty are less studied.Earlier work has concluded that parameter uncertainty alone cannot explain observed deviations between measured and simulated NEE (e.g., Pridhoko et al., 2008;Wang et al., 2011), and the remaining deviations are often attributed to model structural errors.However, uncertainty in atmospheric forcings and initial conditions could also contribute to unexplained deviations between simulated and measured NEE.From global-scale studies it is known that NEE is sensitive to the climate scenario.For example, in a study the LSM LPJ-GUESS was forced with output from 18 different coupled atmosphere-ocean global circulation models or Earth system models, and showed very different NEE responses depending on the climate scenario.Of the 18 models, 10 project that the land will become a carbon source in the 21st century, while the other 8 indicate that the land will act as a carbon sink (Ahlstrom et al., 2012).Very few studies analyzed the impact of uncertainty in meteorological forcings on NEE at the plot, catchment or regional scale and for shorter time periods in the recent past.Studies which analyzed interannual variability in NEE detected the important role of temperature and precipitation as drivers of this variability (e.g., Keppel-Aleks et al., 2014).Gu et al. (2016) found that for the Missouri Ozark AmeriFlux forest site, the interannual NEE variability is smaller in simulations by CLM than in the data.On the other hand, some field studies for pairs of monitoring sites (Kwon et al., 2006) or experimentation sites (with temperature increase and precipitation increase or decrease; Xu et al., 2016) found a limited impact of temperature and precipitation differences (or changes) on NEE.Zhang et al. (2012) is one of the few studies that analyzed the role of uncertain meteorological forcings (together with parameter uncertainty) for calculating NEE with a process-based ecosystem model in more detail.Their simulations for a Korean pine mixed forest site with two different meteorological input data sets revealed clear differences in model response.Spadavecchia et al. (2011) performed a rigorous uncertainty analysis for the combination of parameter and forcing uncertainty with a simple LSM for a pine stand in Oregon, USA.They found that the contribution of parameter uncertainty to NEE uncertainty is larger than the contribution of forcing uncertainty, although it also has a significant impact.If meteorological stations are located more than 100 km away from the study site, forcing uncertainty starts to dominate parameter uncertainty.The relative contribution of initial state uncertainty to prediction uncertainty of LSMs has not been thoroughly studied, although it has been recognized as one of the major challenges in model-data fusion studies (Williams et al., 2009).Richardson et al. (2010) performed Monte Carlo inverse modeling to assimilate many different data types in a forest carbon cycle model for the Howland AmeriFlux site in Maine, USA.They estimated jointly model parameters and initial carbon pools.Although they found that model performance improved strongly, they also concluded that carbon pools are more difficult to constrain than ecosystem parameters.Peylin et al. (2016) updated jointly parameters and carbon pools with the land surface model ORCHIDEE for a large-scale application.These last two studies explicitly considered initial state uncertainty together with other uncertainty sources, but did not focus on the relative contribution of the different uncertainty sources.
As atmospheric forcings and initial state uncertainty, except parameter uncertainty, have not received much consideration in the literature until now, we investigate the uncer-tainty of model predictions in relation to uncertain model parameters, initial conditions and atmospheric forcings in this work.
Accordingly, this study has two main objectives.The first objective is to investigate to what extent can the parameter estimates based on the Markov chain Monte Carlo (MCMC) method enhance the model-data consistency of NEE and LAI for a small European catchment.The model used was CLM4.5 and compared with EC measurements and LAI from the RapidEye satellite.We applied successfully validated parameter estimates from Post et al. (2016) for C 3 grass, evergreen coniferous forest and broadleaf deciduous forest.For C 3 crops, we estimated a new set of parameters as the estimated parameters by Post et al. (2016) did not improve the model-data fit in verification experiments, using the Dif-feRential Evolution Adaptive Metropolis (DREAM; Vrugt, 2015).Main advantages of a multi-chain MCMC approach like DREAM are its robustness to find the global minimum and that it is not limited to Gaussian distributed states and parameters.The model performance with the new parameter estimates was then compared to a reference run with global default parameters.The second objective was to investigate the uncertainty of modeled NEE and LAI resulting from uncertain model parameters, atmospheric forcings and initial states.We set up three different ensembles with different combinations of perturbed model input data in order to obtain insight into the contribution of these three main sources of model uncertainty to the uncertainty of the final model output.
2 Data and methods

The Rur catchment
The Rur catchment (Fig. 1) is located around the Belgian-Dutch-German border region and covers an area of 2354 km 2 .It is characterized by two distinctly different areas of land use and climate.In the northern lowland part, precipitation amounts are lower (650-850 mm yr −1 ), and potential evapotranspiration is higher (580-600 mm yr −1 ) compared to the mountainous Eifel region in the south where annual precipitation is 850-1300 mm yr −1 and potential evapotranspiration is 450-550 mm yr −1 (Montzka et al., 2008a, b).The annual mean temperature in the catchment ranges from about 7.5 • C in the south of the catchment to about 10.3 • C in the north (Baatz et al., 2014).The northern part is dominated by fertile loess soils and is intensely used for agriculture.Sugar beet and cereals (winter wheat, barley) are the most cultivated crops in the catchment (Fig. 1).In the mountainous southern part shallow, less fertile soils dominate.It is mainly covered by meadows and forests.The Rur catchment is one of four central research regions of the TERENO project (Zacharias et al., 2011).The main goal of TERENO is to determine global change impacts across different ter- restrial compartments at the regional level.Therefore, comprehensive input and evaluation data are available for the catchment, including information on land use (Lussem and Waldhoff, 2013), LAIs (Ali et al., 2015;Reichenau et al., 2016) andEC data (Schmidt et al., 2012;Graf et al., 2014;Kessomkiat et al., 2013;Post et al., 2015).

Eddy covariance data
Eddy covariance (EC) data in the Rur catchment were measured for four C 3 crop sites for the evaluation periods summarized in Table 1.The EC crop sites Merzenhausen (ME), Niederzier (NZ), Selhausen (SE) and Engelskirchen (EN) are located in the northern lowland region of the catchment.In ME and SE winter wheat was grown during the measurement period, and in EN and NZ sugar beet.The EC towers in ME and SE were permanently installed.For those sites EC data were available for more than 1 year.EC data at EN and NZ were measured by a roving station, which was installed for 2 to 3 months at each of the three sites (Table 1).The complete processing of the raw data was performed with the TK3.1 software (Mauder and Foken, 2011), using the quality flagging and uncertainty estimation scheme by Mauder et al. (2013) as outlined in Post et al. (2015).Only nongap-filled data with quality flag 0 (high-quality data) and 1 (moderate-quality data) were included in this study.Rollesbroich (RO) is an extensively used grassland site (Korres et al., 2010;Post et al., 2015).Wüstebach (WÜ; Graf et al. 2014) is located in the Eifel National Park and is largely covered by evergreen coniferous forest, particularly spruce.The parameters adopted for the current study were previously estimated and validated at RO and WÜ in a previous study (Post et al., 2016).

RapidEye-based leaf area index
RapidEye is a commercial satellite mission initiated by RapidEye AG (Tyc et al., 2005) and consists of five identical satellites, which were launched in August 2008.RapidEye provides multi-spectral images of five spectral bands (blue, green, red, red edge and near-infrared).The nominal temporal resolution is daily.The ground sampling distance is 6.5 m and the pixel size is 5 m for the orthorectified Level 3A data used here.The LAI data derived from satellite images are determined based on the NDVI (normalized difference vegetation index), which is related to the chlorophyll content in leaves.The NDVI is calculated based on the reflectances at near-infrared (NIR) and red (RED).NDVI-based LAI data (LAI NDVI ) are affected by various error sources, which can result in high uncertainty of the LAI estimate.The major error sources are summarized in Garrigues et al. (2008) and include (i) uncertainties in surface reflectance measure-  (Chen et al., 1997).Clumping effects on landscape scale are also related to the fact that LAI algorithms have been calibrated at the plot scale, but are applied over larger heterogeneous pixels, which can induce substantial scaling biases on the LAI estimates (Garrigues et al., 2006).The latter error source is assumed relatively small for the LAI retrieval from RapidEye due to the high spatial resolution (5 m) of the images.Studies on verification or uncertainty quantification of LAI data derived from satellite images are very rare (Garrigues et al., 2008), but they are very important for land surface model applications.Ali et al. (2015) used orthorectified and radiometrically corrected Level 3A data to generate 5 m resolution LAI data for the Rur catchment, with the same methodology previously applied to MODIS data (e.g., Propastin and Erasmi, 2010).Those LAI data were validated for two crop sites (Merzenhausen and Selhausen) in the Rur catchment using in situ data measured with a destructive, ground-based method (Bréda, 2003) at several equally distributed points within the fields at 6 and 11 days during the growing season.The results indicate that LAI and LAI NDVI measure in site, derived from RapidEye, are highly consistent (Ali et al., 2015).Because only the two crop sites were included in this evaluation approach, the LAI data for crop sites (winter wheat) are considered most reliable.For this study, the LAI NDVI data for the Rur catchment obtained according to Ali et al. (2015) were aggregated from the 5 m 2 to the 1 km 2 grid of the CLM Rur catchment domain by arithmetic averaging.

Community Land Model setup
The Community Land Model (CLM) version 4.5 (Oleson et al., 2013) with the active biogeochemistry model (CLM4.5-BGC) is fully prognostic with respect to the seasonal timing of vegetation growth and litter fall.The day length, soil and air temperature and soil water content are main determinants of plant phenology.Plant phenology representation follows three different schemes depending on the particular plant functional type (PFT): (1) evergreen phenology, (2) seasonal deciduous phenology and (3) stress deciduous phenology.The four most widespread CLM PFTs in the Rur catchment are (1) needleleaf evergreen temperate trees, (2) broadleaf deciduous temperate trees, (3) C 3 non-Arctic grass and (4) C 3 crops.The average PFT coverage of the vegetated land in Rur catchment was ∼ 34 % C 3 crops, ∼ 32 % grassland, ∼ 17 % broadleaf deciduous forest and ∼ 14 % coniferous forest.Evergreen coniferous trees follow phenology scheme 1, deciduous broadleaf trees follow scheme 2 and C 3 grass and C 3 crops follow scheme 3 (Oleson et al., 2013).The LAI and all carbon and nitrogen state variables in the vegetation, litter and soil organic matter are calculated prognostically.NEE in CLM is calculated as the sum of gross primary production (GPP) and total ecosystem respiration (ER).Ecosystem respiration includes heterotrophic respiration (HR) and autotrophic respiration, the sum of maintenance respiration (MR) and growth respiration (GR; Oleson et al., 2013).Photosynthesis is determined at leaf scale (Dai et al., 2004;Thornton and Zimmermann, 2007) and is upscaled by means of the LAI.The definition of land use cover in CLM follows a nested sub-grid hierarchy structure (Oleson et al., 2013).The main land units, which are defined as percentage coverage per grid cell are glacier, wetland, vegetated land, lake and urban area.Each land unit follows a different sub-model scheme to calculate the respective carbon, water and energy fluxes for a certain grid cell.Each vegetated land unit has 15 soil columns and can include different plant functional types.The PFTs are also defined as percentage area of the vegetated area within the grid cell.
To apply CLM for the Rur catchment domain, a land surface input data set was generated with a spatial resolution of 1 km 2 .The land unit for each grid cell and the PFT distribution of each vegetated land unit were defined based on the land use classification derived from supervised, multitemporal remote sensing data analysis using RapidEye and ASTER data (Waldhoff et al., 2012;Lussem and Waldhoff, 2013).In addition to the land use coverage, CLM requires information on the percentages of clay and sand for each of the 15 soil columns of the vegetated area per grid cell.For each soil layer, the soil texture was defined based on the German soil map (BK50) provided by the Geological Survey of North Rhine-Westphalia.Mean topographic slope, mean elevation and maximum fractional saturated area were determined for the 1 km 2 grid from a 10 m resolution digital elevation model (scilands GmbH, 2010).Additional land surface data required to run CLM4.5 such as soil color were adopted from the default CLM4.5 0.9 • × 1.25 • resolution global land surface data file of year 2000 (surfdata_0.9× 1.25_simyr2000_c110921.nc).
The atmospheric forcing data used to run CLM are hourly time series of precipitation (mm s −1 ), incoming shortwave radiation (W m −2 ), incoming longwave radiation (W m −2 ), atmospheric pressure (Pa), air temperature (K), specific humidity (kg kg −1 ) and wind speed (mm s −1 ) at the lowest atmospheric level.The data were obtained for the years 2008-2013 from the reanalysis COSMO-DE data set provided by the German Weather Service (DWD) in 2.8 km × 2.8 km resolution (Baldauf et al., 2009).The COSMO-DE data were downscaled to 1 km 2 using natural neighbor interpolation based on Delaunay triangulation.
To generate the initial state variables such as the carbon and nitrogen pools, CLM was spun up over a period of 1200 years, using COSMO-DE data of the years 2008-2010.According to DWD (http://www.dwd.de/DE/klimaumwelt/klimaatlas/klimaatlas_node.html), the annual average temperature in the years 2008 and 2009 was ∼ 0.5-1.0• C higher than the long-term average  and the mean annual temperature in 2010 was ∼ 0.5-1.0• C lower.Mean precipitation amounts and freezing days were representative for the long-term average.We also studied the effect of the forcing data used for the model spin-up.For example, we tested using a longer time series (1998)(1999)(2000)(2001)(2002)(2003)(2004) of global climate data.However, we found that using more recent, regional forcing data during the spin-up resulted in carbon and energy fluxes in better agreement with the observations.The model states obtained after the 1200-year spin-up were then used as input for a second 3-year "exit spin-up" also using the meteorological data for the years 2008-2010.The exit spin-up in CLM is necessary for technical reasons and switches the CLM settings from the (accelerated) spin-up mode to the "normal" mode in terms of the calculated carbon-nitrogen cycling.A longer exit spin-up period of 100 years was tested, but results and model states differed only minimally from the case with a 3-year exit spin-up.In this study, we estimated the posterior probability density functions (pdfs) of five PFT-specific CLM4.5 parameters for C 3 crops using the adaptive Markov chain Monte Carlo (MCMC) method DREAM (zs) (Ter Braak and Vrugt, 2008;Laloy and Vrugt, 2012;Vrugt, 2015), according to the approach presented in Post et al. (2016).DREAM (zs) is based on the Bayes' theorem (A1).In multi-chain MCMC methods like DREAM (zs) , different (in our case three) Markov chains generate random walks through the parameter space and successively visit solutions with stable frequencies stemming from a stationary distribution.It is assumed herein that the prior distribution is uniform (non-informative) using ranges of the predefined upper and lower bounds for each parameter according to Post et al. (2016).The use of multiple chains offers robust protection against premature convergence and thus the probability of becoming stuck in local minima is considerably reduced compared to single-chain methods (Vrugt, 2015).The convergence of the parameters is monitored with the R convergence diagnostic of Gelman and Rubin (1992) which is computed for each dimension as the ratio of variance within one chain and the variance between different chains.Convergence is achieved if R is smaller than 1.2 for all parameters.Because every point in the parameter space is hit with a frequency proportional to its probability, the random walk allows one to iteratively find a stable posterior distribution.Hence, after convergence, the density of the acceptance points in the parameter space approximates the posterior pdf according to Bayes' theorem (see Appendix).DREAM exhibits excellent sampling efficiencies for multimodal and high-dimensional posterior distributions (Vrugt, 2015), which is an important advantage if applied to complex models like CLM.A full description of the DREAM (zs) algorithms can be found in Vrugt (2015).
To constrain the C 3 crop parameters, a 1-year time series of non-gap-filled, half-hourly NEE data from the EC tower at the ME site (1 December 2011-30 November 2012) was used.The five key parameters are (1) the fraction of leaf N in Rubisco enzyme (fl N R ), (2) the growth respiration factor (g R ), (3) the rooting distribution parameter (1 m −1 ) (r b ), (4) the specific leaf area at top of canopy (m 2 g C; sla top ) and (5) the soil water potential at full stomatal closure (mm; ψ c ).This selection is based on Post et al. (2016) and previous studies which highlighted the sensitivity of these parameters (Foereid et al., 2014;Göhler et al., 2013).
Parameter estimates for C 3 grass, evergreen coniferous forest and broadleaf deciduous forest have been successfully estimated and validated for central European sites by Post et al. (2016) with DREAM (zs) -CLM4.5 and are therefore adopted in this study.Post et al. (2016) used NEE data from the RO and WÜ sites to estimate parameters for C 3 grass and coniferous forest, as well as FLUXNET data from the Fontainebleau site (FR-Fon) in France, located about 300 km southwest of the Rur catchment (48.4763 • N, 2.7801 • E; e.g., Migliavacca et al., 2015).The validation was based on NEE data from EC sites of corresponding PFTs that were located ∼ 600 km away from the parameter estimation sites.Post et al. (2016) estimated the parameters fl N R , g R , r b , sla top and ψ c jointly with three more parameters: the temperature coefficient (Q 10 ), the base rate for maintenance respiration (mr b ) and the Ball-Berry slope of conductance-photosynthesis relationship (b s ).Q 10 quantifies the fractional change of the respiration rate in response to a 10 • C temperature rise.Q 10 , mr b and b s in CLM4.5 are by default hard-wired in the CLM4.5 source code and not defined separately for each PFT like the other five parameters.This implies that one single value of those parameters is defined globally, which is then applied to all PFTs.However, Post et al. (2016) found that values of the estimated hard-wired parameters, especially Q 10 , varied for different PFTs or sites.The fact that Q 10 , mr b and b s are hard-wired in CLM imposes a challenge when applying the jointly estimated parameter sets on regional scale because it requires that Q 10 , mr b and b s be defined separately for each PFT.Therefore, we modified CLM in order to input the PFT-specific values for Q 10 , mr b and b s of the jointly estimated parameter sets by Post et al. (2016).Post et al. (2016) showed that the estimated parameter values for C 3 crops did not enhance model performance compared to the default parameter values, if applied to an evaluation year or another C 3 crop site.This applies to the parameter values that were estimated and applied using a 1-year time series of NEE data, as done in this study.Therefore, we estimated a new set of C 3 crop parameters.

Evaluation of parameter estimates with eddy covariance NEE data
In order to evaluate the performance of the estimated CLM C 3 -crop-specific parameters in terms of the consistency of modeled and measured data, two CLM cases were defined and compared: (i) a reference run (CLM-Ref), which is a forward run for the years 2011-2013 with default parameters and the atmospheric input data and initial conditions as described in Sect.2.2, and (ii) an ensemble run with 60 realizations, for the same time period, same initial conditions and same atmospheric input data as CLM-Ref, but with parameters sampled randomly from the DREAM (ZS) multivariate posterior pdfs (CLM-Ens P ).Thus, we did not perform a separate model spin-up for each of the estimated set of parameters, as this was computationally too expensive.The parameter estimates for ME were evaluated with eddy covariance NEE data from four different C 3 crop sites in the Rur catchment (ME, SE, NZ and EN).For ME, the NEE observation time series of the year that followed the parameter estimation year were used for evaluation (Table 1).The evaluation was conducted with time series of halfhourly NEE data measured at the EC tower sites.First it was verified that the PFT coverage of one CLM grid cell coincided with the dominant PFT at the respective EC tower site.More than 80 % of each of the four grid cells was covered by C 3 crops.This was considered sufficient for grid-based model evaluation.In the following, the subscript "gc" is used to refer to the grid cell in which one of the EC towers is located; e.g., ME gc refers to the respective grid cell of the ME site.The length of the available NEE time series differed among the EC sites, ranging from 2 to 12 months (including data gaps).See Fig. 1.Accordingly, only the model output that coincides with the observed NEE data was used.
The calculated NEE output was evaluated based on the following indices: The root mean square error (RMSE m ): where y is measured half-hourly NEE (µmol m −2 s −1 ) at time step i for the given time series of length n and m is the modeled equivalent.
The mean absolute difference of the mean diurnal NEE cycle (MAD dir ): where ȳ is average measured NEE at a given time d during the day (µmol m −2 s −1 ) and m is the modeled equivalent.This performance measure is evaluated half-hourly (48 times per day).For the sites ME and SE, where a complete year of NEE data was available, the mean diurnal NEE cycle and the index MAD diur calculated separately for each of the four seasons within the evaluation year (winter: December-February; spring: March-May; summer: June-August; autumn: September-November).In order to summarize the four seasonal MAD diur indices to one MAD diur index representative for the whole evaluation year, the four seasonal indices were averaged.For the other sites where only NEE data for ∼ 2-3 months were available, the indices including MAD diur were calculated for this shorter time series.The relative difference (%) of the NEE sum ( NEE) was calculated for all half-hourly data available in the respective evaluation period (RD NEE ): where y is measured NEE (non-gap-filled) and m is modeled equivalent.
All evaluation indices were determined for both CLM-Ref and for the ensemble mean of CLM-Ens P .Post et al. (2016) showed that parameters estimated for the forest sites WÜ and FR-Fon are well transferable to other FLUXNET sites of corresponding PFTs located more than 600 km from WÜ and FR-Fon: Tharandt in Germany (DE-Tha; e.g., Grünwald and Bernhofer, 2007) and Hainich in Germany (DE-Hai; e.g., Knohl et al., 2003).Because WÜ is the only forested EC tower site in the Rur-catchment and no additional EC towers were available for the validation of the estimated parameters, we assume the parameter estimates provide more reliable NEE data for a large part of the forested area in the www.biogeosciences.net/15/187/2018/Biogeosciences, 15, 187-208, 2018 H. Post et al.: Evaluation and uncertainty analysis of regional-scale CLM4.5 net carbon flux estimates Rur catchment.We also adopted successfully estimated and validated parameters for the C 3 grass site RO from Post et al. (2016).

Evaluation of LAI predictions
Various studies highlight current deficits to accurately estimate and validate LAI derived from multispectral satellite images for deciduous and needleleaf forest (Ganguly et al., 2012;Tillack et al., 2014;Härkönen et al., 2015).In addition, the LAI data derived from RapidEye (LAI RapidEye ) used herein have not yet been validated for the forest PFTs.Thus, the model performance in terms of the LAI was only evaluated for grid cells with more than 80 % C 3 grass or C 3 crop coverage, not for the forest PFTs.The effect of the parameter estimates on LAI was evaluated for the 1 km 2 grid of the Rur catchment domain.LAI RapidEye data of about 18 days (depending on the location) between 1 November 2011 and 16 September 2012 were used for the LAI evaluation and compared with modeled LAI data on those days and those grid cells for which RapidEye data were available.The LAI was evaluated separately for C 3 grass and C 3 crops, and both for the winter half-year (November-April) and summer halfyear (May-October).To evaluate and compare modeled LAI for CLM-Ens P and CLM-Ref, the mean absolute difference between simulated LAI and measured LAI (MAD LAI ) was calculated over all grid cells with more than 80 % C 3 grass (n pft = 224) or C 3 crop (n pft = 404) coverage: where m is the modeled daily LAI (m 2 m −2 day −1 ) and y is the measured equivalent, n days is number of days and n pft is number of grid cells for which LAI RapidEye data were available at a particular day.RMSE LAI was calculated according to Eq. ( 1).The mean LAI for each PFT was calculated by where n days is the number of days RapidEye data for a given PFT were available, and LAI i is the LAI observed or modeled at a particular day (m 2 m −2 day −1 ).For each LAI PFT value, a respective standard deviation was calculated.

Perturbation of atmospheric input data
In order to take the uncertainty of the meteorological input data into account, a 60-member ensemble of perturbed meteorological forcings was generated for the years 2008-2012 using hourly COSMO-DE data.The approach used to generate the perturbation fields has previously been applied in soil moisture data assimilation studies (Reichle et al., 2007(Reichle et al., , 2010;;Kumar et al., 2012;Han et al., 2012Han et al., , 2013Han et al., , 2014)).Perturbation fields were applied to air temperature (K), incoming longwave radiation (W m 2 ), incoming shortwave radiation (W m 2 ) and precipitation (mm s −1 ).Normally distributed additive perturbations were applied to longwave radiation (LW) and air temperature (Temp).Log-normally distributed multiplicative perturbations were applied to precipitation (Prec) and shortwave radiation (SW).The parameters used for the perturbations were adapted from Han et al. (2014) and are listed in Table 2.The perturbations for the different atmospheric variables are cross-correlated in order to generate physically plausible perturbations of the atmospheric forcings.We considered spatial correlation in addition to temporal correlation of the meteorological variables.The multiplicative perturbations are truncated by a defined maximum of 2.5 standard deviations, to remove outliers from the generated perturbation fields (Reichle et al., 2007).The spatially correlated noise is calculated first using the fast Fourier transform approach (Park and Xu, 2013) with a 10 km spatial correlation scale.Next, the temporally correlated noise is added to the spatially correlated noise.The temporal correlation for all perturbed variables is imposed using a first-order AR(1) autoregressive model (Reichle et al., 2007).The AR(1) temporal correlation coefficient for a time lag of 1 day was 0.368 for all variables.

Generation of perturbed initial state input files and the perturbed forward run for the Rur catchment
As shown in various studies, carbon fluxes predicted by land surface models strongly depend on the carbon-nitrogen pools generated during the model spin-up (Carvalhais et al., 2010;Kuppel et al., 2012).In order to take the uncertainty of initial states into account, a 60-member ensemble of perturbed initial states was generated.This was done via a 15year spin-up using perturbed atmospheric forcings for the years 2008-2010 (Sect.2.4.1) and parameter values sampled randomly from the joint DREAM posterior pdfs.The initial conditions used at the beginning of the 15-year perturbed spin-up were taken from the end of the main 1200year spin-up plus 3-year exit spin-up.This perturbation takes into account the uncertainty of the less stable carbon and nitrogen pools, but uncertainty with respect to the stable and resistant carbon and nitrogen pools is not taken into account.
As the stable nitrogen pool is not perturbed, the availability of N from mineralization is very similar across the ensemble members.This might be an unproblematic assumption given N fertilization by agriculture and atmospheric deposition, but it also implies that the uncertainty of the net N availability at the sites is likely underestimated.3), which were estimated for C 3 crops or adopted from Post et al. (2016) for C 3 grass, evergreen coniferous forest and broadleaf deciduous forest.For C 3 crops, the setup of CLM-Ens P was identical to the one used for the evaluation of the estimated C 3 crop parameters in terms of the NEE model-data consistency (Sect.2.3.2).
2. CLM-Ens PA : ensemble model runs for 1 December 2012-30 November 2013 with parameter values sampled according to CLM-Ens P and perturbed atmospheric forcings, but using deterministic initial states.
3. CLM-Ens PAI : ensemble model runs according to CLM-Ens PA , but starting from perturbed initial states.
4. CLM-Ens P+Q10 : a forward run for the period 1 December 2012-30 November 2013 with deterministic forcings, deterministic initial states and parameters sampled according to CLM-Ens P , except for Q 10 .Q 10 values for C 3 crop and forest PFTs were sampled from a normal distribution with a mean of 2.0 and a standard deviation of 0.5 with a minimum and maximum bound set to 1.3 and 3.0.We choose Q 10 because of its central role in carbon stock and flux predictions and the respective uncertainties in most land surface models (Post et al., 2008;Hararuk et al., 2014), including CLM (Post et al., 2016).The objective of ensemble CLM-Ens P+Q10 was to estimate the uncertainty of the CLM output induced by the prior uncertainty of one single key parameter, and compare it to the posterior uncertainty from CLM-Ens p .
The modeled NEE sum (g C m −2 yr −1 ) for the period 1 December 2012-30 November 2013 was calculated for each  1) including all half-hourly time steps where eddy covariance data (Obs) were available, for the CLM ensembles Ens P with estimated parameters and Ens PAI with additional perturbed atmospheric forcings and perturbed initial states (including standard deviations of the 60 ensemble members), in comparison to a reference run with default parameters (CLM-Ref).In all CLM cases time series were subsetted according to available Obs.
ensemble member using the complete time series of daily outputs.The PFT-specific NEE sum ( NEE PFT ) was then calculated separately for each of the four main PFTs by averaging the simulated NEE sum over all grid cells with more than 80 % coverage by the particular PFT.The analysis of model uncertainty was then based on the standard deviation (σ ) of NEE PFT between the different ensemble members, σ ( NEE PFT ).Accordingly for GPP and ER, σ ( GPP PFT ) and σ ( ER PFT ) were calculated.

Evaluation of simulated NEE and LAI with estimated parameter values
As shown in  evaluation grid cells dominated by C 3 crops, except EN gc , if estimated parameters were used.The RMSE m was reduced by up to 13 % (SE gc ) for all sites except NZ gc .RD NEE was most notably reduced compared to the other evaluation indices.The measured NEE was negative for each of the four EC sites and for all of the respective model outputs (Fig. 2).This would imply that all sites were net carbon sinks during the evaluation period.However, because data gaps were included in the NEE time series and because more EC data were available for summer and daytime than for winter and nighttime, those values do not represent the true NEE sum of the evaluation period.For CLM-Ref and all evaluation sites, NEE differed notably from the observed data (Fig. 2,  2 highlights that the uncertainty of modeled NEE is probably underestimated if the uncertainty of meteorological input data and initial states is not taken into account.The standard deviation of CLM-Ens PAI is notably higher compared to CLM-Ens P , where only parameter uncertainty is considered.Table 5 summarizes the catchment average LAIs and the corresponding mean absolute differences (MAD LAI ) and RMSE between the observed and modeled LAI PFT .In the summer half-year, LAI Ens was closer to LAI RapidEye than LAI Ref , both for C 3 grass and C 3 crops.RMSE LAI was on average 1.5 for CLM-Ens P and 3.4 for CLM-Ref.MAD LAI for summer was 1.2 for CLM-Ens P and 3.4 for CLM-Ref on average.In winter, the difference was only 0.2 (RMSE LAI ) and 0.3 (MAD LAI ) and thus the improvement not significant.
While CLM-Ref overestimated LAI PFT , ENS P underestimated LAI PFT .Overall, the standard deviation of LAI PFT , i.e., the LAI variation over all included grid cells n throughout the half-year period, was very high for both the observed and the modeled data.The mean annual LAI cycles for ME gc and SE gc (Fig. 3) indicate that the uncertainty of LAI was un-Table 5. Mean LAI values (LAI PFT ) for the RapidEye data (Obs), the CLM ensembles with estimated parameters (Ens P ) and the CLM reference run (Ref) with default parameters, as well as the mean absolute differences (MAD LAI ) and root mean square error (RMSE LAI ).N is the number of grid cells in the catchment with more than 80 % coverage of one particular PFT.1271 3.0 ± 1.4 1.8 ± 0.6 5.5 ± 1.4 1.9 3.2 1.5 2.6 C 3 crop w 4392 2.3 ± 1.3 0.7 ± 0.5 2.4 ± 1.7 2.1 1.9 1.7 1.6 C 3 crop s 2887 2.1 ± 0.9 1.5 ± 0.5 5.5 ± 0.8 1.0 3.6 0.8 3.5

PFT
w: winter half-year (November-April); s: summer half-year (May-October); ± standard deviation; Ens P : CLM ensemble with estimated parameters.derestimated for CLM-ENS P .The uncertainty of simulated LAI was much higher for CLM-Ens PAI .Figure 3 highlights that for C 3 crops, the simulated yearly LAI cycle did not match well the observed and expected annual LAI course.The delay of the plant emergence indicated by these LAI courses is related to the strong underestimation of daytime NEE in spring 2013 (Figs. 4,5).This underestimation of NEE can mainly be attributed to an underestimation of GPP, which is too low in spring because the simulated plant onset was about 2 weeks later than observed.For CLM-Ref and for most of the CLM-Ens PAI ensemble members, leaf onset started in May.For those model realizations, the underestimation of daytime NEE in spring was highest.In contrast, for CLM-Ens P and a small proportion of CLM-Ens PAI , leaf onset started in March.For those cases, the underestimation of daytime NEE in spring was notably lower.This elucidates the close link of modeled NEE and LAI and highlights that errors in the timing of leaf onset can lead to substantial errors in simulated NEE.The evaluation of simulated LAI showed that modeled and observed LAI are closer for simulations with estimated CLM parameters than for simulations with default parameters and that for C 3 crops, simulated leaf onset was better represented.

Uncertainties of simulated carbon fluxes on catchment scale
In this section, results of modeled NEE, GPP and ER are summarized and compared for the CLM ensembles CLM-Ens P , CLM-Ens PA , CLM-Ens PAI and CLM-Ens P+Q10 with focus on the model uncertainty.The uncertainty is evaluated with the standard deviations σ ( NEE PFT ), σ ( GPP PFT ) and σ ( ER PFT ) of the 60 ensemble members.In this case the time series from 1 December 2012 to 30 November 2013 of the simulated fluxes was used to calculate the PFTspecific, catchment-average sums NEE PFT , GPP PFT and ER PFT (g C m −2 yr −1 ).
Figure 6 shows the means and standard deviations of NEE PFT for the four CLM ensembles and for CLM-Ref.The sign of NEE PFT indicates that for CLM-Ref, all PFTs in the catchment act as net carbon sources.NEE PFT changed significantly if estimated parameter values were applied (CLM-Ens P ).In that case, all PFTs except C 3 grass acted as carbon sink.Ens P , the NEE sum became negative for most grid cells in the catchment including a large part of the northern lowland area (Fig. 7).Thus, with estimated parameters the catchment became a net CO 2 sink (disregarding CO 2 fluxes due to harvesting, land use change or fossil fuel combustion).Figure 7 indicates that both GPP and ER increased with estimated parameters.However, GPP increased more than ER.Since we found that at verification sites simulations with estimated parameters gave NEE sums closer to the observed NEE sums than simulations with default parameters, we as-sume the catchment-scale NEE sum (Fig. 7) and NEE PFT (Fig. 6) are more reliable for CLM-Ens P than for CLM-Ref.
As indicated in Fig. 6, for C 3 grass and C 3 crops, σ ( NEE PFT ) was significantly higher for CLM-Ens PA compared to CLM-Ens P .For CLM-Ens PA , σ ( NEE PFT ) was ∼ 45 g C m −2 yr −1 (C 3 grass) and ∼ 75 g C m −2 yr −1 (C 3 crops) compared to ∼ 2-3 g C m −2 yr −1 for CLM-Ens P .Thus, applying perturbed forcings led to a very strong increase in the uncertainty of simulated NEE PFT by a factor of ∼ 14 (C 3 grass) and ∼ 42 (C 3 crops).This finding reveals that C 3 grass and C 3 crops in CLM were extremely sensitive to minor differences in the meteorological input data.In comparison, the differences of σ ( NEE PFT ) between CLM-Ens PA and CLM-Ens PAI were minor.Thus, the much higher σ ( NEE PFT ) for CLM-Ens PAI compared to CLM-Ens P can mainly be ascribed to the perturbed meteorological input data, rather than the perturbed initial states.Due to the large uncertainty of NEE PFT for both CLM-Ens PA and CLM-Ens PAI , differences of the ensemble mean NEE PFT were not significant between the different two CLM ensembles.For C 3 crops and C 3 grass, GPP PFT and ER PFT were strongly sensitive to the perturbed forcings such that the strong increase in σ ( NEE PFT ) for CLM-Ens PA compared to CLM-Ens P is related to changes in both ER and GPP (Fig. 6).For coniferous forest and deciduous forest, σ ( NEE PFT ) for CLM-Ens PA and CLM-Ens PAI were ∼ 4.0-13.5 g C m −2 yr −1 and thus considerably lower in comparison to C 3 grass and C 3 crops.The increase in σ ( GPP PFT ) and σ ( ER PFT ) with perturbed forcings and perturbed initial states was also minor in comparison to CLM-Ens P .This indicates that the carbon cycle of the forest PFTs in CLM is much less sensitive to the atmospheric input data compared to C 3 grass and C 3 crops.However, when comparing CLM-Ens PA and CLM-Ens PAI , the additional effect of the perturbed initial states was notably higher for forest compared to C 3 grass and C 3 crops.For the forest PFTs, the ensemble mean NEE PFT changed significantly if initial states were perturbed (Fig. 6).This was related to the fact that estimated parameters and perturbed atmospheric forcings disrupted the steady state of the forest carbon pools.Hence, carbon pools increased relatively rapidly during the www.biogeosciences.net/15/187/2018/Biogeosciences, 15, 187-208, 2018 perturbed spin-up (Sect.2.4.2), which was not the case for C 3 grass and C 3 crop.As already shown in the previous section, the spread of CLM-Ens P was low, indicating that the uncertainty of simulated NEE induced by the posterior parameter values that were sampled from the estimated pdfs was low.This is, however, related to the fact that parameters were already conditioned to NEE data, which reduced the parameter uncertainty.The additional perturbation of Q 10 (CLM-Ens P+Q10 ) increased σ ( NEE PFT ) by a factor of ∼ 6 (C 3 grass, coniferous forest) to ∼ 19 (deciduous forest) in comparison to CLM-Ens P .This highlights the strong effect of an uncertain Q 10 parameter on the uncertainty of predicted carbon fluxes in CLM.As shown in Fig. 6, the perturbed Q 10 parameter strongly affected both ER PFT and GPP PFT .For the forest PFTs, the effect of Q 10 (Ens P+Q10 ) on the uncertainty of the regional-scale GPP PFT , ER PFT and NEE PFT was much stronger than the effect of both uncertain meteorological forcings and initial states (CLM-Ens PAI ).The ensemble mean ER PFT did not change significantly, except for C 3 grass.However, for coniferous forest, the ensemble means for CLM-Ens P+Q10 of ER PFT and NEE PFT were significantly different from CLM-Ens P .This is not surprising, since the mean of Q 10 = 2.0 for CLM-Ens P+Q10 was considerably lower than the Q 10 values of CLM-Ens P sampled from the estimated parameter sets (Table 3).

Discussion
The results of the study showed that the model-data consistency was enhanced with estimated parameters for RO gc , ME gc , WÜ gc , SE gc and NZ gc .For KA gc and EN gc , not all evaluation indices indicated an improvement of modeled NEE or the improvement was not significant.For the roving station sites NZ, KA and EN, NEE time series of only 2-3 months were available for evaluation in contrast to the other sites, where time series of a whole year were available.Accordingly, evaluation results were not directly comparable between those sites.Post et al. (2016) showed that the evaluation runs with parameter estimates had a strongly varying performance over the year.Thus, if a complete year of NEE data had been available for evaluation of the roving station sites, evaluation results for those sites would have been more informative.Particularly the transfer of parameters estimated for C 3 grass and C 3 crops to the catchment domain was found to be critical.As expected, parameter values estimated for the winter wheat site ME did not enhance the model performance at the grid cell EN gc which was dominated by sugar beet.Accordingly, results showed that parameter values estimated for one C 3 crop site are not necessarily transferable to other C 3 crop types characterized by different physiology and management.This highlights the limitations of large-scale modeling with land surface models that only distinguish between very broad groups of PFTs.For SE gc , where winter wheat was grown during the evaluation period like in ME, parameter estimates clearly improved the model performance in terms of NEE.Several studies have emphasized that LSM parameters can vary within one group of PFT such that the transfer from single site estimates to other sites with the same PFT is not trivial (Groenendijk et al., 2011;Xiao et al., 2011).In addition to deficits in the representation of plant phenology, management (harvesting, cutting, etc.) is not explicitly considered for C 3 crops and C 3 grass in CLM although it has a significant impact on NEE (Borchard et al., 2015).Therefore, observed temporal LAI variations in the growing season were not well represented in the default model simulations.However, estimated parameters by CLM-Ens P provided more reliable estimates of the NEE sums for ME and SE than the default run because the modeled plant emergence was shifted ahead.Therefore, we expect that the performance of CLM can be improved if it is coupled to a crop model which can treat specific plant traits and include management practices.Li et al. (2011) found that crop variety and management factors like irrigation, fertilization and planting date have a significant impact on NEE (and evapotranspiration) for the coupled LSM-crop model ORCHIDEE-STICS.Wu et al. (2016) showed that a crop model coupled to the LSM ORCHIDEE, combined with assimilation of many different data types, was able to reproduce many measurement data up to the measurement uncertainty, whereas other remaining errors could be related to management activities which were not correctly represented in the model.
For the forest PFTs, the uncertainty of the Q 10 parameter had the largest impact on NEE, followed by uncertainty of the initial states and uncertainty of meteorological forcings.The crucial role of the Q 10 parameter in carbon stock and flux predictions and the respective uncertainties in most land surface models has been highlighted in previous studies (Post et al., 2008(Post et al., , 2016;;Hararuk et al., 2014).Post et al. (2016) showed that Q 10 correlates strongly with other CLM4.5 key parameters like fl N R and the Ball-Berry slope of stomatal conductance as well as with the initial carbon-nitrogen pools.This may explain why the spread of the regional-scale GPP PFT , ER PFT and NEE PFT in Ens P+Q10 was notably higher for coniferous forest than for the other PFTs (Fig. 6).
It is important to stress again that uncertainty in initial states was considered by an additional 15-year spin-up with perturbed parameters and meteorological forcings.This short additional spin-up does not consider uncertainty of the most stable carbon and nitrogen pools.In addition, carbon and nitrogen pools are influenced by changing land use in recent centuries and by management practices, which were not considered here.Therefore, the true uncertainty related to the initial states will be larger than in these simulation experiments.The simulation results for C 3 grass and C 3 crops were surprising as uncertainty in the meteorological forcings had a much larger impact on uncertainty of NEE than the uncertainty of initial states and Q 10 had.We argue that the applied meteorological perturbations (standard deviations of 1.0 K for air temperature, 30 % of the incoming shortwave radiation and 50 % of precipitation amount) are realistic for meteorological reanalysis products, and in correspondence with other studies.The perturbations (which are also temporally correlated) result in considerable variations, between ensemble members, of leaf onset and senescence.The stress deciduous phenology scheme (which determines plant onset and offset for C 3 crops and C 3 grass in CLM as outlined in Sect.2.2) is strongly determined by various arbitrary thresholds such as "crit_offset_swi" (water stress days for offset trigger) or "crit_onset_fdd" (critical number of freezing degree days to trigger onset).The phenology representation of C 3 crops and C 3 grass could be too sensitive to those thresholds (Dahlin et al., 2015).Verheijen et al. (2015) argue that the use of PFTs instead of plant traits in LSMs reduces the adaptive response of vegetation to environmental drivers.
A second possible reason for the large impact of the perturbation of meteorological forcings on NEE uncertainty is the preferential location of the C 3 crops and C 3 grass in the drier, northern part of the catchment, which sometimes experiences drought stress during summer.The perturbation of the precipitation amounts affects drought stress and can enhance the variability in simulated NEE and LAI for C 3 crops and C 3 grasses in drier areas.Thus, the impact of the uncertainty of meteorological forcings depends on the local conditions.Jung et al. (2011) already showed that interannual variability of NEE is larger in semi-arid and semi-humid areas than in other regions.In summary, the large impact of uncertainty of meteorological input on NEE uncertainty is likely primarily model specific, but also related to the semi-humid conditions in the agricultural part of the region.High impact of uncertainty of meteorological conditions can be expected in other semi-humid and semi-arid regions and should be taken into account in simulation studies.
In order to better estimate the uncertainty of the initial carbon and nitrogen pools, it is necessary to include the spin-up (for each of the ensemble members) in the data assimilation experiments where parameters are estimated.As model runs have to be repeated thousands of times, this would be extremely CPU expensive.The spin-up should also consider uncertainty of historical land use and uncertainty of historical meteorological conditions.In particular, land use and meteorological conditions of the last few centuries impact initial states.We believe that it is important to jointly consider the uncertainty of initial conditions and parameters in LSM forward and inverse modeling runs.Pinnington et al. (2016) already pointed towards the importance of considering correlations between initial states and parameters in inverse modeling studies.

Conclusions
This study evaluated ecosystem parameters estimated with DREAM (zs) for the Community Land Model (CLM4.5)and analyzed the uncertainty of modeled NEE and LAI at regional scale for a catchment in the border region of western Germany.The ecosystem parameters were estimated for EC sites with different plant functional types (PFTs) in western Germany and northern France by conditioning them to time series of NEE data.These parameters were assigned to a high-resolution (1 km 2 ) CLM4.5 setup for the Rur catchment in Germany.It was evaluated whether the distributed land surface model reproduced measured NEE and LAI better with estimated parameters than with default parameters.Moreover, a comprehensive model uncertainty analysis was done for the four main PFTs in the catchment, comparing the effect of different model error sources (parameters, atmospheric forcings and initial states) on the CLM ensemble spread.
Parameter estimation using NEE data was found suitable to improve LAI predictions.For C 3 crop, the timing of plant emergence in spring was more accurate.This resulted in an improved characterization of NEE, which by default was highly overestimated in spring in cases where the plant emergence was delayed.This highlights the potential of DREAM-CLM parameter estimates for utilizing individual observations at a minimal number of grid cells to improve catchment-wide predictions of LAI and carbon fluxes.
Estimated parameters reduced the relative difference between the observed and modeled NEE sum significantly for most evaluation sites compared to a reference run with default parameters.For all PFTs except C 3 grass, the sign of the mean annual NEE sum over the catchment was reversed from positive to negative with estimated parameters compared to default parameters.This implies that forest and C 3 crop areas in the catchment were predicted as net CO 2 sources with global default parameter values and as net CO 2 sinks with the successfully validated parameter estimates.This elucidates the potential and relevance of parameter estimation in terms of obtaining more reliable estimates of regional carbon balances.
If estimated parameters were sampled without additional consideration of uncertain atmospheric input data and uncertain initial conditions, the uncertainty of predicted NEE and LAI was underestimated.Constraining CLM parameters with DREAM (zs) resulted in a very low uncertainty of the predicted NEE and LAI.However, with additional consideration of uncertainty in the initial model conditions and atmospheric input data, the uncertainty of NEE was significantly higher.Thus it is essential to take into account the uncertainty of parameters, atmospheric input data and initial states for predictions of carbon fluxes or stocks with CLM4.5.
The effect of uncertainty of atmospheric forcing data on the LAI and NEE uncertainty was considerably higher for C 3 grass and C 3 crops than for the forest PFTs.This is re-lated to the specific C 3 crop/C 3 grass phenology representation and internal model thresholds in CLM.Many of these thresholds relate to temperature and strongly control leaf onset and senescence.The strong effect of the perturbed forcing data on modeled carbon fluxes in CLM4.5 is closely linked to the effect of uncertain model parameters like the temperature coefficient Q 10 .
The model uncertainty resulting from uncertain atmospheric forcing data for C 3 grass and C 3 crops is additionally enhanced because these crops are located in the northern part of the catchment, which is prone to some drought stress in summer.In combination with soil-moisture-related model thresholds, the effect of the perturbed precipitation on modeled NEE and LAI can be regionally different due to differences in the regional climate.This stresses the importance of regional-scale modeling.
This study demonstrated that if different crop types are grown in a region, it is particularly important to parameterize them independently in order to obtain reliable carbon flux estimates with a land surface model like CLM.Separate differentiation of different crop types including a crop-specific representation of phenology and management is very important for obtaining more reliable carbon flux and stock estimates in the future, and it is foreseen for future CLM versions (personal communication with collaborators from the CLM development team at NCAR, 2016).

2. 3
Parameter estimation and evaluation of model performance 2.3.1 Parameter estimation with DREAM (zs) of model uncertainty In order to evaluate the effect of uncertain CLM parameters, meteorological input data and initial model states, we set up four CLM ensembles with 60 ensemble members each: 1. CLM-Ens P : ensemble model runs for 1 December 2012-30 November 2013 with deterministic (nonperturbed) initial states and non-perturbed input data from COSMO-DE (Sect.2.2).The initial states are the outcome of the default 1203-year spin-up, without perturbations.The parameter values were sampled randomly from the DREAM (zs) multivariate posterior pdfs (Table

Figure 2 .
Figure2.NEE sum ( NEE) determined for the evaluation periods (see Table1) including all half-hourly time steps where eddy covariance data (Obs) were available, for the CLM ensembles Ens P with estimated parameters and Ens PAI with additional perturbed atmospheric forcings and perturbed initial states (including standard deviations of the 60 ensemble members), in comparison to a reference run with default parameters (CLM-Ref).In all CLM cases time series were subsetted according to available Obs.

Figure 3 .
Figure 3. Daily LAIs for the evaluation period 1 December 2011-30 November 2013 for grid cells where the sites Merzenhausen (ME) and Selhausen (SE) are located.Results are shown for the 60 ensemble members of the CLM cases Ens PertP with estimated parameters, and Ens PAI with additional perturbed atmospheric forcings and perturbed initial states, in comparison to a reference run with default parameters (CLM-Ref) and RapidEye data (Obs.RapidEye; bold lines: ensemble mean).
Figure 7 illustrates how estimated parameters affected the modeled annual NEE sum (g C m −2 yr −1 ) for the Rur catchment.With default parameter values, the annual NEE was positive for most of the grid cells, particularly in the northern part of the catchment which is dominated by agriculture.In terms of NEE, the catchment is a net CO 2 source.With estimated parameters from CLMwww.biogeosciences.net/15/187/2018/Biogeosciences, 15, 187-208, 2018

Figure 4 .
Figure 4. Mean diurnal course of half-hourly NEE for winter 2012/13 (a), spring 2013 (b), summer 2013 (c) and autumn 2013 (d) for the Merzenhausen site (ME).Results are shown for the 60 ensemble members of the CLM cases Ens P with estimated parameters, and Ens PAI with additional perturbed atmospheric forcings and perturbed initial states, in comparison to a reference run with default parameters (CLM-Ref) and EC data (EC-Obs.;bold lines: ensemble mean).

Figure 6 .
Figure 6.Ensemble mean and standard deviation of carbon fluxes simulated by four different CLM ensembles: (i) with estimated parameters (Ens P ) and (ii) additional perturbation of Q 10 (Ens P+Q10 ) or atmospheric forcings (Ens P ) or atmospheric forcings and initial states (Ens PAI ), in comparison to the reference run with default parameters (Ref).Plotted are the PFT-specific, catchment-average sums of NEE, GPP and ER.

Table 1 .
Eddy covariance tower sites in the Rur catchment.for example, from calibration errors or cloud contamination and (ii) deficiencies in the representation of canopy architecture in the algorithms applied for the LAI retrieval, for example, negligence of foliage clumping.This can lead to gross underestimation of actual LAI, especially for needleleaf forests a Parameter estimation b Per.t.: Permanent EC tower; Rov.st.: roving station ments resulting,

Table 2 .
Han et al. (2014)d for the perturbation of the meteorological input data, adapted fromHan et al. (2014).

Table 4 ,
CLM parameter estimates notably reduced the mismatch between modeled and measured NEE data in comparison with the reference run (Ref), where global default parameter values were used.MAD diur was reduced by 7 % (SE gc ), 25 % (NZ gc ) and 35 % (ME gc ) and increased by 7 % for EN gc .Thus, the mean diurnal NEE cycles were closer to the observed ones for all www.biogeosciences.net/15/187/2018/Biogeosciences, 15, 187-208, 2018

Table 4 .
Root mean square error RMSE m (µmol m −2 s −1 ), mean absolute difference for the mean diurnal NEE cycle MAD diur (µmol m −2 s −1 ) and relative difference of the NEE sum over the evaluation period RD NEE (%) for the CLM ensemble with estimated parameters (Ens P ) in comparison to the reference run (Ref) with default parameters.Results are given for ME (Merzenhausen), SE (Selhausen), NZ (Niederzier) and EN (Engelskirchen).Here, n is the number of non-gap-filled half-hourly measurement data that were available to calculate these evaluation indices.

Table 4 )
. For ME gc , SE gc and NZ gc , the predicted NEE was significantly closer to the observations for CLM-Ens P than for CLM-Ref, and RD EE was reduced by a factor of 1.2 (SE gc ) to 7.2 (ME gc ).RD EE was 13 % (SE gc ) to 68 % (ME gc ) lower for CLM-Ens P compared to CLM-Ref.
For EN gc , RD EE was slightly lower for CLM-Ref, but as indicated in Fig. 2, the difference of NEE was not significant.Overall, results indicate that NEE sums over a time period from several months to 1 year were better represented with estimated values than with global default values.Figure