Meta-analysis of high-latitude nitrogen-addition and warming studies implies ecological mechanisms overlooked by land models

Accurate representation of ecosystem processes in land models is crucial for reducing predictive uncertainty in energy and greenhouse gas feedbacks with the climate. Here we describe an observational and modeling meta-analysis approach to benchmark land models, and apply the method to the land model CLM4.5 with two versions of belowground biogeochemistry. We focused our analysis on the aboveground and belowground responses to warming and nitrogen addition in high-latitude ecosystems, and identified absent or poorly parameterized mechanisms in CLM4.5. While the two model versions predicted similar soil carbon stock trajectories following both warming and nitrogen addition, other predicted variables (e.g., belowground respiration) differed from observations in both magnitude and direction, indicating that CLM4.5 has inadequate underlying mechanisms for representing high-latitude ecosystems. On the basis of observational synthesis, we attribute the model-observation differences to missing representations of microbial dynamics, aboveground and belowground coupling, and nutrient cycling, and we use the observational meta-analysis to discuss potential approaches to improving the current models. However, we also urge caution concerning the selection of data sets and experiments for meta-analysis. For example, the concentrations of nitrogen applied in the synthesized field experiments (average = 72 kg N ha−1 yr−1) are many times higher than projected soil nitrogen concentrations (from nitrogen deposition and release during mineralization), which precludes a rigorous evaluation of the model responses to likely nitrogen perturbations. Overall, we demonstrate that elucidating ecological mechanisms via meta-analysis can identify deficiencies in ecosystem models and empirical experiments.

Published by Copernicus Publications on behalf of the European Geosciences Union.
N. J. Bouskill et al.: Testing model performance via meta-analysis


Introduction
Northern Hemisphere high-latitude soils are among the largest global stores of soil organic matter (SOM) (Grosse et al., 2011). Recent studies have estimated SOM storage within permafrost regions to be ∼ 1700 Pg to 3 m in depth (Schuur et al., 2012), representing nearly 50 % of global terrestrial organic carbon, or nearly twice that currently in the atmosphere (King et al., 2007). Permafrost SOM is stabilized by cold temperatures, and is therefore vulnerable to the warming that high-latitude regions will experience over the next century (Schuur and Abbott, 2011). However, the response of high-latitude ecosystems to global climate change is complex. Under warming, the active layers of permafrost soils thicken, and may serve as a reservoir of chemically labile organic carbon. Carbon released from these soils (mostly as CO2 or CH4) may accelerate the rate of warming and form a positive feedback to climate change (Koven et al., 2011). Alternatively, elevated rates of organic matter decomposition release limiting nutrients (e.g., nitrogen) that could stimulate plant productivity, sequestering CO2 from the atmosphere, serving as a negative feedback on climate change (Shaver et al., 1992).
Predictions of how future climate change will alter high-latitude soil carbon are derived mainly from (a) conclusions of in situ field manipulation studies and (b) output of land models either coupled or uncoupled with an atmospheric model. The Earth system models (ESMs) couple land and atmospheric processes by simulating land biogeochemical and biophysical states and fluxes (including soil carbon dynamics and effluxes) and feedbacks to atmospheric carbon concentrations across decadal, centennial, and millennial timescales (Kaplan et al., 2002; Koven et al., 2011). Current ESMs have high uncertainty in their predicted magnitude of carbon-climate feedbacks (Arora et al., 2013; Friedlingstein et al., 2006) because of insufficiencies in model structure and parameterization (Bonan et al., 2011; Jung et al., 2007; Piao et al., 2013; Zaehle et al., 2014).
Benchmarking the performance of land models has been challenging (Luo et al., 2012). One approach has been to compare model output against the output of distinct manipulation studies (Thomas et al., 2013b) that acutely perturb ecosystems on short timescales (months to years). However, the broad spatial heterogeneity of high-latitude soils may not be well represented by the concentration of high-latitude field studies within a few sites. Herein, we benchmark the models by compiling data from a range of studies measuring the same variables across spatial gradients. This approach can determine an overall ecosystem response to perturbation, eliminating the weight placed on any one study. Data compilation can also identify important mechanisms that determine the fate of soil carbon but are currently not represented in the land models.
In the present study, we examined the fate of high-latitude soil carbon based on conclusions drawn from (1) meta-analyses of high-latitude field studies (≥ 60° N) focusing on ecosystem responses to warming and nitrogen additions and (2) meta-analyses of simulations mimicking the experiments using the land component (CLM4.5) of the Community Earth System Model (CESM). We address four questions: (1) Do the models and synthesized data predict a similar response of carbon and nutrient cycling to ecosystem warming and nitrogen addition? (2) In what areas do the models and experiments diverge? (3) What are the mechanisms, including those absent in the models, that the field experiments demonstrate to be important for evaluating the fate of soil C? (4) What types of observationally derived model benchmarks are appropriate for the various ecosystem processes relevant to high-latitude soil C dynamics?

Materials and methods

Literature search
We compiled published observations for replicated field studies from high-latitude ecosystems (≥ 60° N) (Fig. 1) examining responses of belowground biogeochemistry to warming and nitrogen addition. The data were mainly extracted from published figures or tables, or directly from the authors in cases where unpublished results were referenced in a published study. Manipulation studies were located by searching the ISI Web of Knowledge, using the following principal terms: "Arctic", "Permafrost", and "High-latitude", paired with "Manipulation", "Nitrogen", and "Warming". Where available, we collected data from control and perturbed soils on microbial (i.e., bacterial + fungal) biomass, fungal biomass, aboveground biomass, belowground respiration, heterotrophic respiration, gross primary productivity (GPP), litter decomposition, soil organic matter content (SOM), net nitrogen mineralization, and soil and microbial nitrogen and phosphorus concentrations.
To characterize the response of high-latitude soils to warming we collected data from studies that passively warmed soil using open top chambers (OTC) or greenhouses (OTG) and from snow manipulation studies. We also collected data from studies that used incubations to increase temperature. We collected more than 2800 entries from 53 field studies across 17 different high-latitude ecosystems. We present the data as a response ratio across all of the studies. We also sought to understand the influence of duration on certain responses and, where appropriate, data were further partitioned by experimental duration: short-term (< 2 yr), intermediate (2-4 yr), and long-term (> 5 yr).
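The duration partitioning described above can be sketched as a small classifier. This is an illustrative sketch only: the function name and the example records are hypothetical, and because the stated bins leave experiments of 4 to 5 yr unassigned, the sketch assumes that everything from 2 to 5 yr counts as intermediate.

```python
from collections import defaultdict


def duration_class(years: float) -> str:
    """Assign an experiment to a duration class.

    Assumption (not stated in the source): experiments lasting
    2 to 5 yr inclusive are all treated as "intermediate".
    """
    if years < 0:
        raise ValueError("duration must be non-negative")
    if years < 2:
        return "short-term"
    if years > 5:
        return "long-term"
    return "intermediate"


# Hypothetical entries: (study label, experimental duration in years)
entries = [("study_a", 1.0), ("study_b", 3.0), ("study_c", 8.0), ("study_d", 1.5)]

# Group study labels by duration class before computing response ratios
groups = defaultdict(list)
for label, years in entries:
    groups[duration_class(years)].append(label)
```

Each group would then be passed separately through the meta-analysis, so that a transient response in one duration class is not averaged away by the others.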
For nitrogen addition, we collected studies that applied nitrogen as either ammonium nitrate (NH4NO3) or nitrogen phosphorus potassium fertilizer (NPK). We analyzed over 2300 entries (i.e., individual measurements of each metric) across 37 nitrogen addition field studies from 14 geographically distinct sites (Table S1). We examined the influence of geography on the response of our data sets by partitioning the data between that collected from European and North American manipulation studies. The data were also temporally disaggregated in a similar manner as described above for the warming experiments.
Data were extracted from figures using the Data Thief software (Tummers, 2006). Comparison data were standardized to units of "g m−3" prior to calculating a response ratio. Bulk density measurements for the different soils were extracted from the published studies or through personal communication with the authors. In the cases where authors could not be contacted, bulk density was estimated using a previously published approach (Calhoun et al., 2001).
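As a rough illustration of why bulk density is needed for this standardization, the sketch below converts a mass-based concentration to a volumetric one. The function name, the assumed input units (g per kg of dry soil, bulk density in g cm−3), and the example values are all hypothetical; the source does not state the units in which the raw data were reported.

```python
def to_g_per_m3(conc_g_per_kg: float, bulk_density_g_per_cm3: float) -> float:
    """Convert a mass-based concentration to g per m^3 of soil.

    Assumes the value is reported per kg of dry soil and that bulk
    density is given in g cm^-3 (hypothetical input units).
    """
    if bulk_density_g_per_cm3 <= 0:
        raise ValueError("bulk density must be positive")
    # 1 g cm^-3 equals 1000 kg m^-3, so this is kg of soil per m^3 of soil
    soil_kg_per_m3 = bulk_density_g_per_cm3 * 1000.0
    return conc_g_per_kg * soil_kg_per_m3


# Illustrative values only: 0.5 g kg^-1 in a soil with bulk density 0.8 g cm^-3
control = to_g_per_m3(0.5, 0.8)
treatment = to_g_per_m3(0.6, 0.8)
```

When control and treatment plots share a bulk density, the conversion cancels in the response ratio; it matters when the two differ or when studies report different unit bases.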

Meta-analysis
Data were analyzed using the MetaWin 2.2 software package (Rosenberg et al., 2000), using the standard deviation (SD) reported from each individual observation. In the majority of cases, SD was calculated from the reported standard error and number of replicates. A response metric was calculated as the natural log of the treatment group relative to a control:

\[ \ln R = \ln\left(\frac{\bar{X}_T}{\bar{X}_A}\right), \]

where $\bar{X}_T$ and $\bar{X}_A$ are the mean values for the treatment and ambient response variable, respectively. The sampling variance ($V_{\ln R}$) was calculated as

\[ V_{\ln R} = \frac{s_T^2}{N_T \bar{X}_T^2} + \frac{s_A^2}{N_A \bar{X}_A^2}, \]

where $s_T$ and $s_A$ represent the normalized standard deviations around the mean values and $N_T$ and $N_A$ are the number of replicate studies from treatment and ambient experiments, respectively. The effect size for different response metrics was subsequently calculated using a weighted average value, where the weight for the ith study is the reciprocal of its sampling variance. A mixed model was used to calculate the cumulative differences in the response variables in treatment versus control plots. These cumulative differences were calculated for the overall data set, and also after constraining the data sets to similar conditions and forcings (e.g., geographic location, magnitude of N added). When an effect size was drawn from a low number of contributing studies (< 15), the data were resampled (using 2500 iterations) by bootstrapping to give a conservative estimate of the confidence interval (CI). Data were also gathered on climate conditions (mean annual air temperature (MAT) and precipitation (MAP), and growing season mean air temperature (GSMT)) and experimental conditions (experimental duration and magnitude of warming or nitrogen added) for each site sampled. We used a regression analysis to examine whether variability in response variables (e.g., belowground respiration and microbial biomass) was due to spatial differences in climate or due to experimental manipulation (e.g., warming or nitrogen added).
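The calculation chain described in this section can be sketched in a few lines of Python. This is a minimal sketch and not the MetaWin implementation: all function names are hypothetical, the variance uses the standard delta-method approximation for a log response ratio (an assumption about the exact form used), the weighted mean is a fixed-effect simplification (a mixed model would add a between-study variance component to each weight), and the bootstrap is a plain percentile bootstrap over studies.

```python
import math
import random


def sd_from_se(se: float, n: int) -> float:
    """Recover a standard deviation from a reported standard error."""
    return se * math.sqrt(n)


def log_response_ratio(mean_t: float, mean_a: float) -> float:
    """Natural log of the treatment mean relative to the ambient mean."""
    return math.log(mean_t / mean_a)


def sampling_variance(mean_t, sd_t, n_t, mean_a, sd_a, n_a) -> float:
    """Delta-method variance of ln R (assumed form, see lead-in)."""
    return sd_t**2 / (n_t * mean_t**2) + sd_a**2 / (n_a * mean_a**2)


def weighted_mean_effect(effects, variances) -> float:
    """Inverse-variance weighted mean of the per-study effects."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def bootstrap_ci(effects, variances, iterations=2500, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling studies with replacement."""
    rng = random.Random(seed)
    n = len(effects)
    estimates = []
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            weighted_mean_effect([effects[i] for i in idx],
                                 [variances[i] for i in idx])
        )
    estimates.sort()
    lower = estimates[int((alpha / 2) * iterations)]
    upper = estimates[int((1 - alpha / 2) * iterations) - 1]
    return lower, upper


def percent_change(ln_r: float) -> float:
    """Express a log response ratio as a percentage change from control."""
    return (math.exp(ln_r) - 1.0) * 100.0
```

A per-study effect and variance would be computed first, then pooled with `weighted_mean_effect`, with `bootstrap_ci` applied when fewer than 15 studies contribute. `percent_change` converts the pooled log ratio into a percentage change relative to the control.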

CLM-BGC spin-up and experimental manipulation scenarios
We simulated the ecosystem perturbation experiments using the Community Land Model (CLM4.5) with two different representations of belowground biogeochemistry: a vertically resolved belowground module with similar biogeochemistry to the Century model (termed CLM-Century, Koven et al., 2013), and the Carbon-Nitrogen biogeochemistry module (termed CLM-CN, Thornton et al., 2007). CLM-Century and CLM-CN share the same formulation of aboveground biogeochemical processes and land biogeophysics, but differ in their representation of belowground carbon turnover and nitrogen cycling. For example, CLM-CN represents the belowground decomposition cascade as four discrete pools with faster turnover times than the three-pool approach used by CLM-Century (Koven et al., 2013). Furthermore, the nitrogen cycle of CLM-CN is much more open (i.e., higher cycling rates and losses) than that of CLM-Century. Finally, CLM-CN does not resolve the vertical biogeochemical gradients characteristic of CLM-Century. All simulations were run at a spatial resolution of 1.9° × 2.5°, using the Qian et al. (2006) data set for atmospheric forcing. The models were spun up for 1500 yr to preindustrial equilibrium following an improved spin-up approach (Koven et al., 2013). Simulations were then run from 1850 to 1979 under contemporary climate forcing before the onset of perturbation conditions over the following 21 yr (from 1980 to 2000). Vegetation cover type was specified as described in Oleson et al. (2013). Model simulations were parameterized to replicate the field experiments: the soil was warmed by scaling the aerodynamic resistance by a factor of 10, a value obtained by trial and error to achieve a desired warming of ∼ 1 °C (in accordance with the average temperature increase noted for the experimental manipulations, see the results section below), while keeping sufficient spatial variability of the warming.
CLM forces the soil heat transport process through the residual flux from incoming radiation, latent heat, and sensible heat. Increasing aerodynamic resistance reduces the sensible and latent heat fluxes and warms the soil during the growing season. We tried warming the soil by increasing the surface air temperature (which is a diagnostic variable in CLM), but this approach violated CLM's surface energy budget and was therefore avoided. Furthermore, increasing aerodynamic resistance is more analogous to the approach of installing open-top chambers to warm the soil.
[Figure 2 caption, partial] ... Fig. 2b, R_B is given as the response to the average nitrogen concentration and also to lower, more realistic concentrations (represented by the green square). The modeled response in Fig. 2b is the collated response following the addition of low nitrogen concentrations (i.e., 0.2, 1.0, 2.0 and 3.0 kg N ha−1 yr−1) and high concentrations (20, 60 and 100 kg N ha−1 yr−1). Note the axis change in Fig. 2b following the break. The number of individual studies and data points (in brackets) used in calculating the observation response ratio are given in blue on the right-hand side of the figure.

Nitrogen was added in the form of NH4NO3 at concentrations that replicated the very high concentrations of the nitrogen addition experiments (20, 40, 60, 80, and 100 kg N ha−1 yr−1). However, for comparison, we also simulated the model response to a range of nitrogen concentrations that reflect more realistic nitrogen deposition scenarios up to 2050 (0.2, 1.0, 2.0, 3.0 kg N ha−1 yr−1, Galloway et al., 2004). To mimic the approach of most field studies, we began the perturbation (warming or nitrogen addition) when a given model grid was snow free for 7 days (< 1 mm standing stock) and ended after more than 7 days with standing snow (> 1 mm standing stock). Model output was collected for each site considered in the meta-analysis (Fig. 1) using a 3 × 3 grid that surrounded the experimental manipulation site at the center. The mean and standard deviation (SD) of predictions from the nine grid cells were then used to calculate the response ratios from that site. For coastal sites, some modeled grid cells were not on land due to model spatial resolution, and data statistics were therefore scaled with the actual number of data points accordingly. For all sites we took the mean and SD of the grid cells and analyzed the data using the meta-analysis approach applied to the observations and described above. Our model analysis was limited to the output from the surface soil (10 cm for CLM-Century and bulk prediction for CLM-CN, which represents approximately the top 20 cm of the soil) where the majority of the collected studies focused their measurements.
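The per-site aggregation over the 3 × 3 grid, including the handling of non-land cells at coastal sites, might look like the sketch below. The function names and example values are hypothetical, non-land cells are represented as `None`, and the use of the sample standard deviation is an assumption; the source does not specify these details.

```python
import math
from statistics import mean, stdev


def site_summary(cells):
    """Mean, SD, and count over the land cells of a 3 x 3 neighborhood.

    `cells` holds up to nine model values; non-land cells are None
    (an assumed encoding) and are dropped before computing statistics.
    """
    land = [value for value in cells if value is not None]
    if len(land) < 2:
        raise ValueError("need at least two land cells to estimate an SD")
    return mean(land), stdev(land), len(land)


def site_log_response(control_cells, treated_cells):
    """Log response ratio and its sampling variance for one site."""
    mean_c, sd_c, n_c = site_summary(control_cells)
    mean_t, sd_t, n_t = site_summary(treated_cells)
    ln_r = math.log(mean_t / mean_c)
    variance = sd_t**2 / (n_t * mean_t**2) + sd_c**2 / (n_c * mean_c**2)
    return ln_r, variance


# Hypothetical coastal site: three of the nine cells fall on ocean
control = [10.0, 11.0, 9.0, None, 10.5, 9.5, None, 10.0, None]
treated = [12.0, 13.5, 11.0, None, 12.5, 11.5, None, 12.0, None]
ln_r, variance = site_log_response(control, treated)
```

Dropping the ocean cells reduces the replicate count for that site, which widens its sampling variance and lowers its weight in the pooled effect.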

Response of belowground C cycling to warming
On average, experimental warming increased soil temperatures by 1.4 °C (±0.7 °C). Belowground respiration increased significantly under warming by 9 % (±5 %) compared to the controls. This increase in belowground respiration was largely driven by the response of European soils, which increased 33 % (±11 %) above control soils. Conversely, belowground respiration from North American soils showed a more modest, and non-significant, increase (2.5 ± 6.5 %; Fig. 2a, Supplement Fig. S1b). It is unlikely that this spatial difference is due to greater experimental warming of European soils: passive warming increased soil temperatures by 1.4 ± 0.6 °C in Europe and 1.3 ± 0.5 °C in North American experiments. A transient effect of belowground respiration in high-latitude soils was also noted in the data set. Short-term experiments (< 2 yr) showed a large significant increase (34.4 ± 16 %) in belowground respiration, which was not evident in studies lasting 2-4 yr. However, studies lasting longer than 5 yr also had significant increases in belowground respiration. GPP increased significantly (11.8 %) in warmed soils (Fig. 2a) and showed a positive relationship with belowground respiration (Fig. 3). Despite elevated GPP, litter decomposition declined significantly, by 9 % (±5 %), while SOM did not change significantly from control values (Fig. 2a). Both microbial and fungal biomass increased non-significantly under warming: microbial biomass increased by 3.8 % (±12 %), while fungal biomass increased by 11.5 % (±19 %).
Modeled warming experiments increased soil temperature by 1.21 ± 0.47 °C in CLM-CN and 0.91 ± 0.35 °C in CLM-Century. In response, the two models each predicted stronger relative and absolute increases in belowground respiration compared with the observational data. The models predicted higher litter decomposition in response to warming, which is in contrast to the decreasing trend found in the observational data. Both models also predicted increased nitrogen mineralization following warming, contrary to the observational data. The relative changes in SOM under warming were consistent between the model predictions and observations. Soil moisture increased non-significantly in both models (CLM-CN: 38 ± 42 %; CLM-Century: 7 ± 33 %), but with a wide variability. In general, CLM-CN tended to predict a much stronger temperature response than CLM-Century (Fig. 2a).

Response of belowground carbon cycling to nitrogen addition
The field experiments added an average of 72 kg N ha−1 yr−1 (±38 kg N ha−1 yr−1) of nitrogen to soils, with a range of 1-100 kg N ha−1 yr−1. This additional nitrogen reduced belowground respiration and resulted in a larger sink for SOM, indicating a negative feedback to atmospheric CO2 concentrations (Fig. 2b). Belowground respiration in soils receiving additional nitrogen (in the form NH4NO3) declined 11.8 % (±7 %), significantly below control soils (Fig. 2b).
This pattern was consistent for the two geographical regions examined and was not dependent on the duration of the experiment. Belowground respiration in European soils declined, non-significantly, by 7 % (±9.5 %) below control soils (Fig. S1a). Belowground respiration in North American soils also declined, in this case significantly, by 12.7 % (±9 %). Belowground respiration showed a negative relationship with increasing soil nitrogen concentration (Fig. 4a). Linear regressions failed to uncover a significant relationship between the response of belowground respiration and climate (MAT, MAP) or experimental factors (experimental duration and magnitude of nitrogen added). Heterotrophic respiration showed no significant change under nitrogen addition; however, the data are highly variable (±12 %). Nitrogen addition resulted in a significant decline in litter decomposition (% mass loss yr−1) of 4.8 % (±3 %), while SOM increased significantly, by 19.5 % (±10 %), in perturbed soils. GPP increased significantly under nitrogen addition (44.3 ± 7.5 %) compared with the control soils (Fig. 2b). On average, aboveground biomass (vascular + non-vascular plants) increased non-significantly upon nitrogen addition (15 ± 22 %). Vascular plant biomass increased significantly (33 ± 8 %) over that of the control soils (Fig. S1b).
Overall, a non-significant increase in microbial biomass was observed for experimental soils (Fig. 2b), yet microbial biomass declined with increasing concentrations of nitrogen added to the soil (Fig. 4b). When factoring in geographical location, microbial biomass in European soils increased significantly above the controls (17.5 ± 9 %), but decreased non-significantly relative to control soils in North American soils (Fig. S1a). While different forms of nitrogen were applied in the experiments (e.g., NH4NO3 or NPK), the most significant factors, explaining 37 % of the variance in microbial biomass, were site-specific pH and mean annual temperature. Finally, fungal biomass increased significantly by 23 % (±20.5 %) compared to the control soils.
For nitrogen-perturbed CLM-CN and CLM-Century simulations we analyzed the relative response of variables complementary to the observational meta-analysis. Under nitrogen addition, the modeled response variables matched observations for only two parameters, GPP and SOM, and only at the lowest nitrogen-addition concentrations (i.e., ≤ 1 kg N ha−1 yr−1, Fig. S3). Neither model accurately replicated the trend in the observed response of belowground respiration, litter decomposition, and nitrogen mineralization (Fig. 2b), while both models overestimated the response of heterotrophic respiration.

Discussion
Accurate representation of the processes governing soil carbon cycling in high-latitude soils is crucial for reducing model uncertainty in energy and greenhouse gas feedbacks with climate. By comparing meta-analyses based on model output and observations, we show that two belowground biogeochemical representations in CLM4.5 are unable to represent adequately many of the observed high-latitude ecosystem responses to two important climate change variables: temperature and nitrogen availability. We focus our discussion on the potential reasons for the discrepancies in responses by highlighting (1) the most important mechanisms currently missing from, or poorly represented in, the models, and (2) instances where deficiencies in the experimental approaches prohibit the data from being used to benchmark the model. We also recommend further approaches to improve the mechanistic basis of the belowground biogeochemistry representation in ESMs.

Response of belowground carbon cycling to warming
The observational meta-analysis suggests that elevated belowground respiration is balanced by elevated GPP (and associated increases in soil organic matter). We therefore conclude that the coupling of aboveground and belowground processes resulted in these soils being carbon neutral under modest (+1.3 °C) warming. The models also predicted no significant changes to belowground SOM content under warming due to concomitant increases in belowground respiration and GPP. However, the magnitude of the modeled fluxes is many times larger than the observed fluxes. Therefore, the net impact of the manipulation on SOM was predicted by the models, but with incorrect mechanisms. As a broader point, we believe this result illustrates a common problem among tests of land model performance, i.e., inferences of model fidelity based on comparisons solely with observations of emergent responses that have a low signal-to-noise ratio. For example, it is insufficient to use net ecosystem exchange (NEE) as a sole model benchmark (Schwalm et al., 2010), because it ignores the fact that (1) NEE is typically a small difference between ecosystem respiration and assimilation, and (2) models separately represent these gross fluxes as being differently controlled by climate and antecedent system states. We contend that representing this type of emergent ecosystem net flux within the observational uncertainty gives little information as to whether the model is accurately representing the underlying mechanisms.
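The point about net fluxes can be made concrete with a small arithmetic sketch. The numbers below are purely hypothetical, in arbitrary units, and are not taken from the study; the sign convention (positive NEE means a net source to the atmosphere) is also an assumption.

```python
def nee(ecosystem_respiration: float, gpp: float) -> float:
    """Net ecosystem exchange as respiration minus assimilation.

    Sign convention assumed here: positive values are a net carbon
    source to the atmosphere, negative values a net sink.
    """
    return ecosystem_respiration - gpp


# Hypothetical gross fluxes (arbitrary units), not data from the study
observed = {"respiration": 500, "gpp": 510}
modeled = {"respiration": 800, "gpp": 810}

nee_observed = nee(observed["respiration"], observed["gpp"])
nee_modeled = nee(modeled["respiration"], modeled["gpp"])

# Relative error in each gross flux despite an identical net flux
respiration_error = (modeled["respiration"] - observed["respiration"]) / observed["respiration"]
gpp_error = (modeled["gpp"] - observed["gpp"]) / observed["gpp"]
```

In this constructed case the model reproduces the net flux exactly while overestimating both gross fluxes by roughly 60 %, which is why a benchmark based on NEE alone cannot distinguish a correct model from one with compensating errors.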

Nitrogen cycling under warming
Confronting the model outputs with observations showed a consistent overestimation of key variables in the model predictions (Fig. 2a). One potential reason for a larger modeled response is the approaches CLM-CN and CLM-Century take to representing the nitrogen cycle, as modeled nitrogen input, retention, and loss have been shown to have a large impact on ecosystem carbon sequestration (Thomas et al., 2013b; Zaehle and Dalmonech, 2011). Moreover, data-based modifications to ESM nitrogen cycling mechanisms may further improve the correspondence between observations and model output (Thomas et al., 2013b). CLM-CN predicts much higher rates of nitrogen loss from denitrification than CLM-Century, and is therefore more responsive to changing nitrogen availability from inputs, mineralization, and losses. Under warming, CLM-CN predicted a significant loss of soil nitrogen not predicted in CLM-Century, which has a more closed nitrogen cycle, possibly more representative of the nitrogen cycle in high-latitude soils (Barsdate and Alexander, 1975), where mineralization is the main source of nitrogen for plant and microbial growth during the growing season (Shaver et al., 1992). Depolymerization of proteinaceous compounds to amino acids and eventually ammonia (Jones et al., 2009; Schimel and Bennett, 2004) is the critical step in this process; it depends on microbial physiology and is subject to the same biotic and abiotic controls as organic matter decomposition (see discussion below).

Biogeosciences, 11, 6969-6983, 2014 www.biogeosciences.net/11/6969/2014/
Modeled nitrogen mineralization, however, increases under warming, with a concomitant increase in soil nitrogen in the CLM-Century framework. CLM-CN, with its high rates of mineral nitrogen losses, shows a very large decline in soil nitrogen, possibly rendering the aboveground and belowground communities nitrogen limited throughout. In our data analyses, nitrogen mineralization declined as microbial nitrogen (i.e., immobilization) increased. The end result in both cases (i.e., the models and observations) is the potential limitation of plant growth over long timescales. Our data synthesis suggests that the release of nitrogen from increased decomposition is used to meet microbial demands or immobilized. Microbial immobilization is regulated by the stoichiometric imbalance between the substrate being depolymerized and the physiological nutrient demand. While analogous to carbon use efficiency (CUE), nitrogen use efficiency (NUE), which relates immobilization and mineralization to microbial growth (Mooshammer et al., 2014), is regulated independently in order to maintain cell stoichiometry. Some attempts have been made to incorporate NUE controls into ecosystem models (Manzoni and Porporato, 2009), but further experimental and modeling work is required to understand NUE's plasticity and impacts on soil carbon dynamics.

Aboveground dynamics
The biogeochemical coupling between aboveground and belowground components of the ecosystem is crucial for understanding high-latitude carbon cycling under a changing climate. The meta-analysis of field measurements showed a general stimulation of aboveground activity under warming, while previous field studies have noted a shift in plant community composition with warming, favoring the establishment of deciduous shrubs and graminoids and selecting against mosses and lichens (Schuur et al., 2007; Sistla et al., 2013; Walker et al., 2006). This shift toward more woody plants changes the ecosystem carbon balance and nutrient dynamics (Jackson et al., 2002; Welker et al., 2004), as shrubs tend towards higher internal carbon allocation toward woody tissue, but also may increase belowground carbon allocation (as both litter and exudates) relative to mosses (Street et al., 2013). This change in belowground allocation may result in the observed relationship between GPP and belowground respiration (Fig. 3), indicating a close coupling between these two processes.
Current models crudely represent aboveground and belowground biogeochemical coupling and do not represent some of the crucial roles plants play in soil carbon dynamics (Ostle et al., 2009; Schmidt et al., 2011). Of particular relevance to high-latitude ecosystems is the lack of any representation of cryptogams or bryophytes in CLM4.5. These plants contribute substantially to aboveground biomass and biogeochemical processes in tundra soils (Cornelissen et al., 2007; Elbert et al., 2012) and are clearly important for accurate simulations of tundra carbon dynamics. Few ESM land models (including CLM4.5) include dynamic vegetation, and when it is included, representation tends to be coarse (Ostle et al., 2009). Ongoing work will attempt to address some of these deficiencies by including representations of aboveground ecosystem demography (Huntingford et al., 2008; Moorcroft et al., 2001) and soil carbon dynamics (Tang and Riley, 2013). Integration of these approaches into the CLM framework may improve the robustness of long-term tundra soil simulations and reduce uncertainty associated with the aboveground model response.

Litter decomposition
Disagreement between the observations and model predictions was also noted for litter decomposition. Under warming, litter decomposition declined in the observations, possibly contributing to SOM accumulation, but increased in the models. In previous studies, the response of litter decomposition to warming was largely dependent on the method used to increase soil temperature (Aerts, 2006). OTCs tend to warm the soil and reduce soil moisture, limiting litter decomposition by saprotrophic fungi. Soil moisture in the models showed a non-significant increase with warming as the permafrost began to thaw (Fig. 2a). The difference between the observational meta-analysis and the models represents a potentially confounding factor in using these data to benchmark the model. A previous meta-analysis focused solely on litter decomposition in Arctic and Alpine tundra found that warming induced a small increase in decomposition provided sufficient soil moisture (Aerts, 2006). This response was not apparent in our data syntheses, but suggests the model results, while overestimating litter decomposition, were at least in the appropriate direction. Soil moisture is an important controller on decomposition (Aerts, 2006; Hicks Pries et al., 2013). However, changes to surface hydrology during permafrost thaw are dependent on thermokarst formation and topological features of the landscape (Jorgenson and Osterkamp, 2005) and may result in increased or decreased soil moisture. We identify these issues as important for further experimental and modeling work in order to better represent future changes in surface hydrology and the consequences for litter decomposition.

Belowground response to warming
The observational data indicated elevated belowground respiration under warming. The response of microbial heterotrophs to warming can partially be explained by kinetic theory, whereby biochemical reaction rates increase with increasing temperature (Davidson and Janssens, 2006). Hydrolytic and oxidative extracellular enzymes, secreted to depolymerize complex organic matter (Allison et al., 2010), are sensitive to temperature (German et al., 2012). Structural modifications in cold ecosystems maximize their specific activity under in situ temperatures relative to temperate ecosystems (Hochachka and Somero, 2002), which may result in significantly enhanced activity under warming (Koch et al., 2007). This theory fits with the short-term (< 2 yr) data from the current meta-analysis showing increasing belowground respiration despite no increase in microbial biomass. However, we also identified a drop in belowground respiration in studies lasting longer than 2 yr and shorter than 5 yr (Fig. S1b). Belowground respiration has consistently been reported to decline under prolonged warming (Rustad et al., 2001) and attributed to substrate limitation (Hartley et al., 2008) or a community-level response of microbial populations to warmer temperatures offsetting the kinetic response of individual microbes (Bradford, 2013;Bradford et al., 2008). Given the increased GPP found in our meta-analysis, belowground communities are unlikely to be substrate limited. Therefore, we hypothesize that the community-level response is likely responsible for the drop in belowground respiration under 2-5 yr of warming.
The subsequent increase in belowground respiration over prolonged warming (> 5 yr) could represent either the decomposition of leaf litter driven by changes in microbial community composition, or thawing subsurface organic matter (Dorrepaal et al., 2009). This latter hypothesis is relevant to the long-term fate of high-latitude carbon. In the current analysis, NEE appears balanced, with no change in SOM. However, temporal patterns of vegetation response to warming show a transient effect of warming, with nutrient limitation reducing plant productivity on longer timescales (Arft et al., 1999;Chapin and Shaver, 1996). It is possible, given the large nitrogen immobilization under warming, that belowground respiration may continue longer than productivity, unbalancing NEE and leading to net carbon loss.
Temperature is a key factor influencing biogeochemical mechanisms in the model. CLM models belowground respiration using static Q10 and fixed carbon use efficiencies (CUE) for different SOM pool sizes. This approach may result in the large modeled increase in belowground respiration. In reality, both Q10 and CUE vary on spatial and temporal scales, and respond nonlinearly to changes in temperature (Janssens and Pilegaard, 2003;Sinsabaugh et al., 2013, Tang and. Recent microbe-explicit models (MEMs) that consider basic microbial physiology (e.g., Lawrence et al., 2009) introduce direct biological control over soil carbon cycling and reach different conclusions on soil carbon pool size and dynamics under warming (Allison et al., 2010;Lawrence et al., 2009;Wieder et al., 2013). For example, by scaling the CUE value with temperature, in accordance with published observations (Luo et al., 2001;Melillo, 2002), the MEMs show a decline in soil carbon turnover under warming (Li et al., 2014;Wieder et al., 2013). Recent work using a MEM with explicit representation of internal physiology, extracellular enzymes, and mineral surfaces demonstrates that both decomposition temperature sensitivity and CUE are hysteretic and cannot easily be represented by a simple function of soil temperature. However, it is also important to note that microbial CUE is not solely temperature dependent, and other factors, some of which are already present in CLM-CN and CLM-Century (including nutrient and soil moisture limitations), may uncouple growth and respiration and change CUE (Manzoni et al., 2008;Sinsabaugh et al., 2013). The predictions of the MEMs provide further impetus for greater representation of the structure and function of belowground biomass.
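To illustrate why a static Q10 combined with a fixed CUE yields a monotonic respiration increase under warming, the following minimal Python sketch may help. It is not CLM code: the function names, the reference temperature, and the parameter values (Q10 = 2.0, CUE = 0.5) are hypothetical and chosen only for illustration, and the sketch deliberately omits microbial biomass, substrate, moisture, and nutrient controls.

```python
def q10_scalar(temp_c, ref_temp_c=25.0, q10=2.0):
    """Temperature multiplier for a static Q10 (illustrative values only)."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)


def heterotrophic_respiration(decomp_flux_ref, temp_c, cue=0.5,
                              ref_temp_c=25.0, q10=2.0):
    """Respired carbon flux for one pool under static Q10 and fixed CUE.

    decomp_flux_ref is the decomposition flux at the reference temperature.
    A fraction `cue` of decomposed carbon is retained; the rest is respired.
    """
    decomposed = decomp_flux_ref * q10_scalar(temp_c, ref_temp_c, q10)
    return (1.0 - cue) * decomposed


# With both parameters fixed, 10 degrees of warming always multiplies
# respiration by exactly Q10, whatever the starting temperature.
ratio = (heterotrophic_respiration(1.0, 35.0)
         / heterotrophic_respiration(1.0, 25.0))
```

Because neither parameter responds to temperature history or community change in this formulation, it cannot produce the transient decline in belowground respiration seen in the 2-5 yr observations.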

Response of belowground carbon cycling to nitrogen addition
Our meta-analysis of field observations found that the addition of inorganic nitrogen to traditionally nitrogen-limited ecosystems enhances the carbon sink, consistent with previous studies (Luo et al., 2012;McGuire et al., 2012). Interactions between the carbon and nitrogen cycles resulting in soil carbon accumulation in different ecosystems have been reported previously (Magnani et al., 2007;Thomas et al., 2013b), and have been attributed to an increased carbon allocation to woody tissue (Ciais et al., 2008;Tummers, 2006) and reduction in the SOM decomposition rate (Olsson et al., 2005). Overall, our data synthesis is largely consistent with the overarching conclusions of previous meta-analyses (Janssens et al., 2010;Knorr et al., 2005). A question remains, however, about the value of the responses synthesized from studies that add fertilizer (NH4NO3 or NPK) as a source of nitrogen far in excess of anticipated global change scenarios for high-latitude ecosystems. The average concentration of nitrogen added to the soils in the tundra studies (∼72 kg ha−1 yr−1) is extremely high when compared with (1) estimates of nitrogen fixation (< 10 kg ha−1 yr−1, Cleveland et al., 1999) [...] nitrogen availability from organic matter mineralization under a warming climate (Harden et al., 2012). Consequently, we question whether such data lend themselves to understanding the response of the ecosystem to realistic chronic incremental changes in nitrogen availability, and the benefit of benchmarking the ecosystem models against such a data set. On the other hand, if the models include the relevant underlying mechanisms, then they should reproduce the field studies, regardless of the amount of nitrogen added. We give further examples below of where the high nitrogen concentrations may confound the interpretation of the experiments with respect to the model predictions.

SOM dynamics
SOM accumulation under nitrogen addition experiments is a common feature of both the field experiments and the model simulations. However, the underlying mechanisms leading to SOM accumulation are very different, adding uncertainty to the model-predicted soil carbon fate over longer timescales.
In both versions of CLM, the alleviation of nitrogen limitation stimulates a number of ecosystem processes including aboveground primary productivity, litter decomposition, and organic matter decomposition. The accumulation of SOM indicates the stimulation of GPP and litter decomposition (as a source into the SOM pools) must outweigh losses from increased belowground respiration. The observations, on the other hand, show a significant decline in belowground respiration and litter decomposition under nitrogen addition. Belowground respiration depends on the decomposition and substrate utilization capabilities of the microbial (i.e., bacterial and fungal) community to mineralize root exudates and litter. A drop in belowground respiration may, therefore, be attributable to several mechanisms not included in either version of CLM, including the internal reallocation of carbon in plants and trees that reduces the rate of root exudation to belowground ecosystems (Janssens et al., 2010). Carbon limitation of the microbial community may result in a decline in biomass and belowground respiration (Janssens et al., 2010). Our empirical data show increased GPP and vascular plant biomass that could indicate the reallocation of newly fixed carbon in vascular plants (Ciais et al., 2008) and a drop in belowground exudation.

Belowground response to nitrogen addition
Overall, the current observational meta-analysis found a nonsignificant increase in microbial biomass (i.e., bacterial and fungal) but a significant increase in fungal biomass under nitrogen addition. This response appears contrary to previous studies that have recorded a drop in microbial biomass under nitrogen addition, but in line with fertilization studies in tundra ecosystems (Clemmensen et al., 2006). We also note that microbial biomass (and belowground respiration) are inversely related to the amount of nitrogen added to the soils (Fig. 4a, b). At low nitrogen concentrations, microbial community activity can be stimulated (Allison et al., 2009) and decomposition elevated, as indicated by the models (Fig. 2b) and some of the observations (Fig. 4b). Elevated nitrogen concentrations, however, have a negative impact on microbial biomass and decomposition (Janssens et al., 2010). This response can occur through the inhibition of lignin-degrading enzymes produced by saprotrophic fungi (Sinsabaugh et al., 2002;but see Hobbie, 2008), or the increased physical protection of organic matter from decomposition attributed to soil carbon undergoing condensation reactions with high concentrations of inorganic nitrogen (Dijkstra et al., 2004). Therefore, under the high nitrogen inputs used in the present field studies, the coupling between aboveground and belowground ecosystems can decrease belowground respiration and litter decomposition, resulting in an accumulation of SOM.
Whereas the warming meta-analysis yielded results that could be used to constrain model mechanisms, the same cannot be concluded for the nitrogen-addition studies due to the uncertainty of how high-latitude soils will respond to lower concentrations of nitrogen. However, we suggest two potential model changes that could reconcile the different conclusions derived from the observations and models. (1) A dynamic vegetation approach sensitive to changes in nitrogen inventory could represent compositional changes across the tundra with important ramifications for root biomass, litter quality, and plant exudates that play a significant role in soil carbon dynamics (Aerts et al., 2005). (2) Representation of discrete belowground biomass functional groups (e.g., heterotrophic and fungal decomposers) alongside their dependencies on soil nitrogen may help to constrain the belowground response to nitrogen addition. Finally, while the model mechanisms should ideally be able to reproduce the observed response to high nitrogen loading, we believe that future manipulation studies in high-latitude soils that use realistic nitrogen additions would be more relevant for understanding the tundra soil response. For example, recent studies have added nitrogen to tundra soils at rates one order of magnitude higher than measured concentrations (Lavoie et al., 2011) or at rates guided by soil mineralization (Sistla et al., 2012). The ecosystem response in such studies is therefore more likely to reflect future responses under anticipated mineralization or deposition scenarios (Galloway et al., 2004).

Barriers and criteria for successful experiment-based model benchmarking
While we were able to benchmark some aspects of the model predictions using the observational meta-analysis, we acknowledge several concerns that may have complicated the data-model comparison. [...] structures and degrees of complexity, a standard approach to establishing perturbations would be beneficial. In our model, atmospheric warming resulted in unrealistic uniform soil warming across the study domain and therefore underestimated the spatial heterogeneity found in passive warming experiments (Bokhorst et al., 2012). On the other hand, solely reducing the wind speed failed to alter the soil thermal regime, indicating a possible problem in the formulation of CLM's surface boundary layer resistance. However, our approach of warming via enhanced aerodynamic resistance is not transferable to models using atmospheric temperature, rather than a surface energy balance scheme, to force soil thermal dynamics. Therefore, criteria need to be established to ensure, regardless of the method used, that the experimental manipulation is reproduced in the model with sufficient fidelity that the predicted and observed responses can be reasonably compared. We consider the criterion used here for the warming experiments (i.e., that the mean predicted manipulation soil temperatures are not significantly different from the observations) to be a minimally acceptable criterion. Ideally, the predicted response of soil temperature, soil moisture, and radiation under warming would emerge in a statistically similar manner to the observations. In the current study, this criterion was not met for soil moisture, where the observations found that soil moisture declined (by 8 ± 6 %) under warming and the model predicted large increases (CLM-CN: 38 ± 42 %; CLM-Century: 7 ± 33 %). This may be important, given the significant impacts moisture has on decomposition and nitrogen cycling.
Second, the spatial discrepancy between the model predictions and observational data is large. This mismatch arises from several sources, including uncertainties caused by spatial heterogeneity in the site and experimental manipulation (e.g., unequal heating within the open-top chambers, energy leaking at the boundary with surrounding soil), and uncertainties in the climate and environmental forcing data used to drive the models.
Third, while we acknowledge the complexity of interpreting single-factor manipulation experiments, the multifaceted nature of climate change calls for more multifactorial experiments and models that can reproduce any response. The few studies we could find measuring the response of similar variables to combined warming and nitrogen addition (e.g., Shaver et al., 1998) found an even larger warming response than for the single-factor experiments. However, there were too few studies measuring complementary variables to conduct a complete meta-analysis. Previous studies conducted in high-latitude soils have recorded a stronger response of decomposition following perturbation by a combination of drivers (e.g., elevated temperature and CO2) than if those factors were considered in isolation (Fenner et al., 2007). In contrast, Leuzinger et al. (2011) give several examples where the opposite occurs: a combination of multiple drivers lessens the ecological response relative to individual drivers.
These contradictory results call for further consideration of the impact of multiple drivers in high-latitude ecosystems that might be used to benchmark model performance.

Overall recommendations
We have demonstrated here that despite some experimental drawbacks, the underlying biogeochemical mechanisms of CLM-CN and CLM-Century are insufficient to accurately reproduce the observations of a number of high-latitude perturbation experiments. However, we can identify several metrics from the meta-analyses, including nitrogen mineralization and litter decomposition, which may serve as useful indices of model performance. The sign and magnitude of these response ratios were incorrectly predicted by the models under both warming and nitrogen addition. This error in the sign of the response also occurred for simulated belowground respiration under nitrogen addition, where the model was unable to capture the detrimental impact of very high nitrogen concentrations. In contrast, the SOM response under temperature and nitrogen perturbations appears to be a poor metric to benchmark the models, possibly owing to the large size and undefined composition of the soil organic matter stock.
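A comparison of the sign and magnitude of response ratios can be sketched as follows. The natural-log response ratio used here is a common meta-analytic effect size; treating it as the metric is an assumption of this sketch rather than a statement of the exact statistic used in this study. The function names and all numerical values are hypothetical and serve only to show the logic of a direction check.

```python
import math


def log_response_ratio(treatment_mean, control_mean):
    """ln(treatment / control); positive means the treatment raised the variable."""
    if treatment_mean <= 0 or control_mean <= 0:
        raise ValueError("means must be positive to take a log ratio")
    return math.log(treatment_mean / control_mean)


def same_direction(observed_lnrr, modelled_lnrr):
    """True when observation and model agree on the sign of the response."""
    return observed_lnrr * modelled_lnrr > 0


# Hypothetical numbers: a variable that declines in the field
# experiment but increases in the simulation.
observed = log_response_ratio(treatment_mean=8.0, control_mean=10.0)
modelled = log_response_ratio(treatment_mean=12.0, control_mean=10.0)
agrees = same_direction(observed, modelled)
```

A sign disagreement of this kind is a stricter failure than a magnitude mismatch, because it indicates that the model responds in the opposite direction to the observed ecosystem.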
Future development of biogeochemistry representation in CLM should focus on improvements to the nitrogen cycle. Recent work has shown that specific modifications to different nitrogen cycle pathways (e.g., redox cycling, plant-microbial interactions) can improve the correspondence between model predictions and observational data (Thomas et al., 2013a). Development should also focus on improved kinetics (e.g., equilibrium chemistry approximations, Tang and Riley, 2013) to regulate competition for nutrients between biotic and abiotic sinks (e.g., plants, microbes, minerals) as an alternative to the current allocation schemes of CLM-CN (Thornton et al., 2007). In addition, dissolved organic nitrogen (DON) appears to be an important source of nitrogen for microbes and plants in high-latitude soils (Hobbie et al., 2009;Weintraub and Schimel, 2005), and its cycling should be integrated into the model. However, insufficient data were available to include DON as a response factor in the meta-analysis.
Because the model lacks explicit coupling between plant functional types (PFT) and belowground microbial ecosystems, it fails to capture the importance of this interaction for carbon and nutrient cycling and SOM stability. The PFT concept could be further extended to characterize differential belowground carbon allocation (Street et al., 2013). In addition, symbiotic relationships between different plants and mycorrhizal fungi can increase nutrient acquisition (Hobbie et al., 2009) by facilitating nitrogen fixation (Nasto et al., 2014) and phosphorus acquisition (Smith et al., 2011), thereby increasing photosynthetic rates (Jia et al., 2004). Improving and expanding the definition of the PFT to include these associations may serve to improve coupling between nutrient cycling and belowground biogeochemistry.
Finally, while the magnitudes of nitrogen added to tundra soils were very high, the threshold relationship (Fig. 4) that separates the alleviation of nitrogen limitation and stimulation of ecosystem processes at low concentrations from their inhibition at high nitrogen concentrations has support from previous studies (Knorr et al., 2005). However, mechanisms have not yet been integrated into the model to capture this range of responses. The model should be able to reproduce the impact of high nitrogen concentrations associated with agricultural soils, and more work is required to further characterize this threshold effect. It is unlikely, however, that the model-predicted linear relationship between nitrogen availability and ecosystem processes will, in general, be true.

Summary and conclusions
The use of a meta-analysis to benchmark models has a distinct advantage of aggregating the response of a number of different climate change experiments across spatial and temporal scales to converge upon an average ecosystem or biome response. This aggregation reduces the weight that any one study has on the development of a model benchmark metric. This approach is particularly valuable in ecosystems in which a large number of studies have been performed (e.g., temperate systems, Lu et al., 2013). However, we also caution that the field experiments used in a benchmarking meta-analysis must be carefully chosen. We demonstrated the utility of benchmarking land models using studies and measurements that attain a realistic ecosystem response to warming, and the difficulties associated with comparing model performance against nitrogen addition studies that do not replicate conditions under current or anticipated future climates.
The Supplement related to this article is available online at doi:10.5194/bg-11-6969-2014-supplement.