Underestimation of boreal soil carbon stocks by mathematical soil carbon models linked to soil nutrient status

Underestimation of boreal soil carbon stocks by mathematical soil carbon models linked to soil nutrient status Boris Ťupek1, Carina A. Ortiz2, Shoji Hashimoto3, Johan Stendahl2, Jonas Dahlgren4, Erik Karltun2, and Aleksi Lehtonen1 1Natural Resources Institute Finland, P.O. Box 18, 01301 Vantaa, Finland 2Swedish University of Agricultural Sciences, P.O. Box 7014, 75007 Uppsala, Sweden 3Forestry and Forest Products Research Institute, Tsukuba, Ibaraki 305-8687, Japan 4Swedish University of Agricultural Sciences, Skogsmarksgränd, 90183 Umeå, Sweden Correspondence to: B. Ťupek (boris.tupek@luke.fi), A. Lehtonen (aleksi.lehtonen@luke.fi)


I see now that Fig A1 caption no longer calls it 'measured fAPAR' but 'actual fAPAR'. So that seems to mean that we have 'actual fAPAR' which is a calculated quantity based on Harkonen et al. and your modelled fAPAR (?). So really we are comparing model vs. model? So where do things like observations fit in? # In
caption and through the manuscript the site specific "actual = observed" fAPAR was directly calculated based on observations from Swedish forest inventory data and a method described in Härkönen et al. (2010). Thus fAPAR values in our study represented the observed state of forest stand biomass. The long term average fAPAR (fAPAR70) was found iteratively from the "observed" fAPAR maxima for the sites across Sweden by comparison of SOC stocks estimated with litterfall derived by fAPAR models for fAPAR70 and similarity of distributions between measured and modeled SOC stocks (Fig. S2).
This brings me to a more major point. There is a heavy reliance on models to check models within this paper. For eg."We modelled the steady state biomass by applying the fitted exponential functions between the actual state forest biomass components (stem, branch, foliage, stump, coarse-roots, fineroots, estimated by tree stand measurements and the allometric biomass functions) and the actual fraction of absorbed radiation (fAP AR ) (Appendix B) to the estimated fAP AR70 of the steady state forest" (line 146 in revised MS) and then: "Forest stand biomass was estimated by allometric biomass functions for stem with bark, branch,foliage, stump, coarse-roots and fine-roots applied to basic tree dimensions (breast height diameter,total height of tree, number of trees) of SFI stands " (line 130) Then in appendix B: "The biomass components derived with allometric models (measured) and those derived with fAP AR models (modeled) showed strong correlations (Fig. B1). " But really they aren't measured! They are both modelled quantities; allometric models are not 'real'.This seems like a rabbit hole where modelled quantities are used to check other modelled quantities which then generate further modelled quantities. How do you propagate error from the allometric relations? How do you know that using one modelled quantity to check another results in a correct answer? Since those are very hard questions to answer, it needs to be made completely transparent what is observed and what is modelled. So if you are comparing a modelled value to a modelled value there can be no mistaking it for measured quantities. What you have done could be valid but it is still very opaque on what is going on. # We acknowledge the point needed to clarify our method and test the model outputs with litterfall data. Although data on required litterfall components was not available for Sweden and for soil carbon modelling litterfall needs to be modeled (Ortiz et al. 2011, and2013), the use of allometric models with data from forest inventory and applying litterfall production rates in order to estimate litter input for soil carbon models is a standard method of national greenhouse gas inventories estimating soil carbon stock changes (Statistics Finland 2013). Therefore we adopted this standard method. In order to model SOC stocks of forest in equilibrium (not SOC stocks changes) we modified it only by estimating the long term litterfall of forest in equilibrium.
We added following sentence into the section on 2.1.1 Biomass and litterfall estimates: For the biomass and litterfall estimation we adopted standard method of national greenhouse gas inventories for estimating soil carbon stock changes (Statistics Finland 2013). In order to model SOC stocks of forest in equilibrium (not SOC stocks changes) we modified the method by estimating the long term litterfall of forest in equilibrium. Statistics Finland 2013. Greenhouse gas emissions in Finland 1990-2011. In: National Inventory Report to the UNFCCC Secretariat, Ministry of the Environment, Helsinki, Finland, pp. 285-286.
Dr. Leonid L. Golubyatnikov, Referee #1: I do not see in this manuscript new ideas and new approaches. The main results are expected in advance. I think this manuscript does not correspond to the level offered for articles in journal Biogeosciences. # Thank you for your opinion! Although as we thoroughly replied to your previous comments (see Interactive comment on Biogeosciences Discuss., doi:10.5194/bg-2015-657, 2016, we do not see any scientifically valid reason for ignoring our results from the SOC stock inter-comparison between the massive Swedish forest soil inventory data and Yasso07, CENTURY and Q process based models.
In a nutshell, our study provided unique solution for combining massive ground measurements from Swedish forest and forest soil inventories, estimating litterfall for forest in equilibrium required for modelling equilibrium soil carbon stocks, as well as unique evaluation of physicochemical soil, climatic, and plant factors by using recursive partitioning method for generating regression trees of SOC stocks.
We agree that soil nutrient status as a missing driver of soil carbon sequestration of process based models can be hypothesized, as it has been discussed (Schmidt et al., 2011;Averill et al., 2014), and as it's known that soil nutrient status drives the ecosystem carbon use efficiency and soil carbon sequestration (Ågren et al., 2001, Manzoni et al., 2012, Fernández-Martínez et al., 2014, and also that models fail to reconstruct the spatial variation (Todd-Brown et al., 2013;Ortiz et al., 2013;Lehtonen et al., 2015a). Although to our knowledge this has not been tested by other studies, and our findings that Yasso07, CENTURY and Q SOC stock estimates generally agreed well for 2/3 of the Swedish boreal forest sites but underestimated for sites with higher fertility due to models poorly predicting the large carbon stabilization in the mineral soil are new.
Biogeosciences (BG) frequently publishing results on interactions between the drivers of soil carbon sequestration provides the best platform for sharing our findings with the international research community. Hopefully by now this has convinced you, that our findings would appeal to a broad audience of scientists studying the interactions between climate, forest stand, soil, and carbon sequestration, such as the readership of BG. Our analysis suggested that the soils with poorly predicted SOC stocks, as characterized by the high nutrient status and well sorted parent material, indeed have had other predominat drivers of SOC stabilization lacking in the models presumably the mycorrhizal organic uptake and organo-mineral stabilization processes. Our results imply that the role of soil nutrient status as regulator of organic 20 matter mineralization has to be re-evaluated, since correct steady state SOC stocks are decisive for predicting future SOC change and soil CO 2 efflux.

Introduction
In spite of the historical net carbon sink of boreal soils, 500 Pg of carbon since the last ice age (Rapalee et al., 1998;DeLuca and Boisvenue 2012;Scharlemann et al., 2014), boreal soils could 25 become a net source of carbon :::::: dioxide : to the atmosphere as a result of long-term climate warming (Kirschbaum 2000;Amundson 2001). They have the potential to release larger quantities of carbon than all anthropogenic carbon emissions combined (337 Pg) (Boden et al., 2010). In order to preserve the soil carbon pool and to utilize the soil carbon sequestration potential to mitigate anthropogenic CO 2 emissions, mitigation strategies of climate forcing aim to improve soil organic matter 30 management (Schlesinger 1999;Smith 2005;Wiesmeier et al., 2014).
Supporting soil management decisions requires an accurate quantification of spatially variable soil organic carbon (SOC) stock and SOC stock changes (Scharlemann et al., 2014). The initial level of SOC stock is essential in order to estimate SOC stock changes (Palosuo et al., 2012, Todd-Brown et al., 2014, especially when estimating carbon emissions due to land-use change e.g. afforestation

35
of grasslands (Berthrong et al., 2009 Ortiz et al., 2013). Despite the potentially quantitative importance of CO 2 emissions the expected change will be small in relation to the SOC stock. Therefore, the uncertainty of measurements and/or model estimates could prevent conclusions on SOC stock changes (Palosuo et al., 2012;Ortiz et al., 2013;Lethonen et al., 2015a) especially for the soils with largest SOC stocks which are the most sensitive to carbon loss. Beside large uncertainties, the poor agreement between 45 the modelled and measured SOC stocks (Todd-Brown et al., 2013) could also indicate missing biotic or abiotic drivers of long-term carbon storage (Schmidt et al., 2011;Averill et al., 2014).
For example ignoring the essential role of soil nutrient availability in ecosystem carbon use efficiency (Fernández-Martínez et al., 2014) could lead to missing important controls of plant litter production and soil organic matter stabilization mechanisms. Soil nutrient status is linked to the 50 mobility of nutrients in the water solution (Husson et al., 2013), production, quality and microbial decomposition of plant litter (Orwin et al., 2011), and formation of the soil organic matter (SOM).
The SOM affects soil nutrient status by recycling of macronutrients (Husson et al., 2013), and water retention and water availability (Rawls et al., 2003).
In spite of state of the art soil carbon modelling based on the amount and quality of plant litter 55 "recalcitrance", affected by climate and/or soil properties as in the Yasso07, Q and CENTURY models, these type of process based models do not include mechanisms for SOM stabilization by a) the organic nutrient uptake by mycorrhizal fungi; b) humic organic carbon interactions with silt-clay minerals; and c) the inaccessibility of deep soil carbon and carbon in soil aggregates to soil biota 2 (Orwin et al., 211;Sollins et al., 1996;Torn et al., 1997;Six et al., 2002;Fan et al., 2008;Dungait 60 et al., 2012;Clemente et al., 2011). Although the models do not contain aforementioned mechanisms and controls for changes in SOM stabilization processes, they have been parameterized using a wide variety of datasets and can treat soil biotic, physicochemical and environmental changes implicitly.
The Yasso07 model (Tuomi et al., , 2011 is an advanced forest soil carbon model and it is used for Kyoto protocol reporting of changes in soil carbon amounts for the United Nations Framework

65
Convention on Climate Change (UNFCCC) by European countries e.g. Austria, Finland, Norway, and Switzerland. The Q model (Ågren et al., 2007) is a mechanistic litter decomposition model developed in Sweden and used e.g. to compare results produced with Swedish national inventory data (Stendahl et al., 2010, Ortiz et al., 2011 and also with other models at national or global scales (Ortiz et al., 2013;Yurova et al., 2010). The CENTURY model (Parton et al., 1987, 1994, Adair 70 et al., 2008 is one of the most widely applied models and it is used for soil carbon reporting to UNFCCC by Canada, Japan, and USA. Although individual parameters and functions vary, mathematical models such as Yasso07, Q and CENTURY have similar structures. For example, these models are driven by the decomposition rates of litter input and soil organic matter (SOM). Decomposing litter and SOM is divided into pools based on litter quality, and its transfer from one pool 75 to another is apart from model functions and parameters affected by temperature (Q) and/or water (Yasso07), and/or soil texture and structure (CENTURY). The Q model does not include explicit moisture function, whereas for the Yasso07 and CENTURY models precipitation affects decomposition Adair et al., 2008). On the other hand, the models do not explicitly or by default include mechanisms that reduce decomposition by excessive precipitation/moisture (Falloon We hypothesized that (1) soil carbon estimates of the Yasso07, Q, and CENTURY models would deviate for soils where SOC stabilization processes not implicitly accounted by the models are predominant, (2) the Yasso07 and Q models ignoring soil properties would fail on the nutrient rich sites of South-West coast of Sweden and on occasionally paludified clay and silt soils, and (3) the CEN-

85
TURY model outperforms the Yasso07 and Q models due to fact that it includes soil properties as input variables.
We grouped Swedish forest soil inventory data into homogenous groups with specific soil physicochemical conditions using regression tree and recursive partitioning modelling methods. After that we ran the models into a steady state :: an :::::::::: equilibrium : with a litter input which was derived from the 90 Swedish forest inventory. Thereafter we compared the model estimates against data by groups that were obtained from the regression tree model. In discussion we address the reasons why the models deviate and indicate directions of further improvements.
2 Material and methods

95
We analysed data from the Swedish forest soil inventory (SFSI) which is a stratified national grid survey of vegetation and physicochemical properties of soils (SLU, 2011, Olsson et al., 2009). All analysis was done using R software for statistical computing and graphics (R core team 2014). The soil data were identical to dataset used in Stendahl et al. (2010). We restricted our sample plots to minerogenic soils since the Q, Yasso07, and CENTURY models were not developed for use on 100 peat soils, and only to plots for forest land use with Swedish forest inventory data (SFI). We also excluded samples with total SOC stock below 2.8 and above 470. Each sample plot contained categorical data from the field survey on the sorting of soil parent material, humus type, soil texture, and soil moisture. In our analysis we reduced categorical classes by basing them on the sorting of soil parent material and humus type (Table 1). We determined numeric values for silt, clay, and sand content from soil texture categories by Albert Atterberg's distribution 115 of the different grain size fractions in tills and by Lindén's (2002) distributions for sediments (Table 1). We also determined numeric values of volumetric soil water content (SWC) from categorical field data classified according to the depth of the ground water level (WL) ( Table 1).
In order to derive the litter inputs, annual turnover rate (TR, the fraction of living biomass that is shed onto the ground per year, unitless) of biomass components were applied to the modelled biomass components of the steady state ::::::::: equilibrium forest. The needle litter TR was a linear function of latitude for pine and spruce and a constant for deciduous species (Ågren et al., 2007). The TR of 170 branches and roots were from Mukkonen and Lehtonen (2004), Lehtonen et al. (2004) and the TR of stump and stem were from Viro (1955), Mälkönen (1974Mälkönen ( , 1977 as cited in Liski et al. (2006).
For tree fine roots we assumed there was a difference between tree species and between southern and northern Sweden. For pine, spruce, and birch the fine roots TR were 0.811, 0.868, and 1.0 respectively as reported by Maidi (2001) and Kurz et al. (1996), and cited in Liski et al. (2006). Kleja

180
The major part of the litter input originated from the tree stand biomass components which were modeled by the non-linear functions with R 2 values close to 0.9 (Fig. B1, Tables A1 and B1).
The linear understory vegetation models had low R 2 values (Table C1). However, when the understory models (Appendix C) were applied only to plots close to steady state ::::::::: equilibrium : forest, as in our application, the R 2 values of predicted and observed understory components were larger 185 ( Fig. S9). In comparison to major understory litterfall originating from reasonably well predicted dwarf-shrubs and mosses ( Fig. S9 and S10), the influence of poorer understory models (for herbs, grass, and lichens) was small on predictions of the understory litter and marginal on predictions of the total forest litterfall (Fig. S10). The main improvement on the accuracy of total litter input was achieved by avoiding the confounding effect of actual :::::::: observed forest state by modelling the 190 biomass/litterfall estimates representing the mean long-term conditions (defined by estimated steady state :::::::::: equilibrium f AP AR70 ) for small regions (defined by degree of latitude and productivity class for dominant species, Fig. A1). Thus the estimates accurately reflected the long-term spatial variability in dominant species, nutrient status and climate (Fig. S11) and lacked higher spatial and temporal precission; as attempts for high precision of the estimates applied for the period of the last 195 few thousands of years would be uncertain due to high variation of factors affecting plot history.

Correlation analysis
Overall our data consists of 3230 soil samples and their carbon stocks linked to soil physicochemical variables, stand and ground vegetation biomass and litterfall components, and nearest weather station environmental variables. We performed the Spearman's rank correlation analysis between the 200 6 total soil carbon stock and the other soil variables, site, climate and vegetation characteristics. As expected the total soil carbon stock most strongly correlated with the measured variables used for its calculation e.g. bulk density, depth of humus and mineral soil, carbon content, and stoniness. These variables were excluded from further regression tree analysis which aimed to group data according to the processes of soil carbon stock development.

Regression trees
In order to organize SOC data into groups according to the physicochemical soil variables and to better understand the nature of measured data, we generated regression trees of SOC stocks by using recursive partitioning (RPART) (Therneau and Atkinson 1997). RPART is based on developing decision rules for predicting and cross validation of continuous output of soil carbon stocks (regression 210 tree). The classification tree was built by finding a single variable which best splits the data into two groups. Each sub-group was recursively separated until no improvement could be made to the soil carbon stock estimated by using the split based regression model. The complex resultant regression tree model was cross validated for a nested set of sub trees by computing the estimate of soil carbon stock to trim back the full tree.

215
When building the regression tree models we excluded variables such as bulk density, carbon contents of soil layers, soil depth, and stoniness, since these measured variables were used for determining the total soil carbon stock. The selected variables for the RPART data mining were based on the correlations analysis (see 2.1.2), the processes of soil organic matter formation (e.g. Husson et al., 2013) and decomposition, and represented the soil categorical variables (sorting of parent ma-220 terial, soil texture, long-term soil moisture and humus form), soil physicochemical variables (sand, clay, and silt content, long-term soil moisture, highly bound water, C/N ratio, pH, CEC of organic, B, BC, and C horizons), climatic variables (annual mean air temperature, annual precipitation sum), and stand and site characteristics (tree species coverage of pine, spruce and deciduous, total foliar litter input, productivity class and N deposition). Alternatively we also ran regression and classifi-225 cation analysis by excluding all measured soil variables because soil variables are often unavailable for landscape level modelling.
The regression tree model separated the measured total SOC stocks (tC ha −1 ) into 10 groups.
The cation exchange capacity of the BC horizon (CEC, mmol c kg −1 ) divided all the samples into 2/3 of lower SOC stock groups (means between 65 and 130 tC ha −1 ) and 1/3 of larger groups 230 (means between 86 and 269 tC ha −1 ) (Fig. 2a). The group of the smallest SOC stock consisted of 959 samples compared to 8 samples of the group with the largest SOC stocks. We acknowledge that this is a small distinct group based only on 8 observation. However, we did not have any reasons to exclude these datapoints as outliers. Two-thirds of samples with smaller SOC stocks were subdivided by CEC and the type sorting of soil parent material (sorted or unsorted). One-third of samples with 235 larger SOC stocks was subdivided by the C/N ratio, CEC, N deposition among others. Roughly generalized, groups from left to right or from 1 to 10 formed a gradient in levels of SOC stock, moisture, nutrient status, and production ( Fig. 2, Table S1).

Soil carbon stock modelling
The Q model (Rolff and Ågren, 1999) is a continuous mechanistic litter decomposition model describing change of soil organic matter over time. The decomposition rate for the branch, stem, needle, fine root, and woody litter fractions is controlled by the temperature, litter quality, microbial growth and litter invasion rate. The model has been calibrated for seven climatic regions of Sweden in order 255 to account for Swedish temperature and precipitation gradients (Ortiz et al., 2011) (Table 3). The Q model was applied in several studies of SOC stock and change estimation in Sweden (e.g. Stendahl et al., 2010;Ortiz et al., 2013;Ågren et al., 2007). The Q model was run for seven Swedish climatic regions (Ortiz et al., 2011). The mean regional parameterization from the calibration of the 2011 Q model was used for the plot simulations. Thus, the simulations in each region represent variations in 260 climate and litter input and not parameter variations. The steady state ::::::::: equilibrium soil carbon stocks are estimated in the model using the equation for steady state ::::::::: equilibrium : soil carbon stock which is derived from the decomposition functions with constant amounts and quality of litter input.
The Yasso07 model  is one of the most widely applied SOC models. The model was calibrated based on almost 10 000 measurements of litter decomposition from Europe,

265
North and South America (Table 3). The required annual inputs of litterfall, its size and chemical composition, temperature and precipitation determine the decomposition and sequestration rates of soil organic matter. Yasso07 estimates SOC stock to a depth of 1 m (organic and mineral layers), change of SOC stock, and heterotrophic soil respiration. Species specific chemical composition of different litter compartments of Yasso07 were used according to Liski et al. (2009). The initial soil 270 organic matter of Yasso07 was zero. The simulated soil carbon stock corresponding to a steady-state ::::::::: equilibrium : between the litter input and decomposition was achieved by a Yasso07 spin-up run of 10 8 000 years. Yasso07 runs used litter inputs of the steady state ::::::::: equilibrium : forest biomasses (see 2.1.1) and climate variables (annual air temperature, monthly temperature amplitude, and annual precipitation). The global parameter values of decomposition rates, flow rates, and other dependencies of 275 Yasso07 soil carbon model were adopted from Tuomi et al. (2011) and the estimates of Yasso07 SOC stocks were used in comparison with measurements and other models. We did not use the SOC stocks simulated with the more recent Yasso07 parameters based on the litter decomposition data from the Nordic countries (Rantakari et al., 2012), because the SOC stocks simulated with the global parameter values produced better fit with SFSI measurements.

280
The CENTURY mathematical model originally developed for grassland systems (Parton et al., 1987;1992) has been since modified for various ecosystems including boreal forests (Nalder and Wein 2006). The CENTURY is also one of the most widely applied models. The soil organic matter in the model consists of active, slow, and passive pools which have different TR (Table 3).
The decomposition rates are modified by temperature and moisture, and in addition the decomposi-  S11). For CENTURY we adopted general parameters from the parameter file "tree.100", parameters of site "AND H_J_ANDREWS" for conifers, and site "CWT Coweeta" for deciduous trees. The N dynamics in CENTURY sub-model included tuning site specific parameters of topsoil mineral 295 N relative to N deposition (Throop et al. 2004) and reduction of C/N ratio of the litterfall up to 15% for most productive sites (Merilä et al. 2014). We also accounted for site specific soil drainage by varying its parameter between 1 and 0.6 relative to long-term soil water content ranging between 10 and 50% (Raich et al. 2000). The CENTURY SOC stocks simulation were run with steady state :::::::::: equilibrium forest litter inputs, site specific C/N ratios of litterfall, site specific soil parameters 300 (specific bulk density, sand, silt, and clay content, mineral N in topsoil, and drainage) and climate variables (monthly air temperature, and monthly precipitation). In order to account for the deep soil carbon (Jobbágy and Jackson 2000), we scaled CENTURY estimates representing the topsoil horizon by adding 40% of estimated site specific SOC stock. The simulated steady state :::::::::: equilibrium SOC stocks were estimated by a spin-up run of 5 000 years. The number of years to reach steady 305 state :::::::::: equilibrium (equilibrium between the litter input and decomposition) was sought empirically on 100 random sites, and differs from Yasso07 and Q models.

Results
The distributions of Yasso07, Q, and CENTURY model estimates of total SOC stocks (tC ha −1 ) were in agreement for 2/3 of the measured data with lower SOC stock (Fig. 3, distributions of 310 groups 1, 2, and 4). The remaining 1/3 of data was underestimated by models. This 1/3 of data was separated into 7 physicochemical soil groups (means of groups in range from 104 to exceptionally large 269 tC ha −1 , see Fig. 3, distributions of groups 3, and 5-10). The linear regression of mean levels of all 10 physicochemical soil groups (weighted by the number of samples in each group) between the modelled and measured SOC stocks showed smaller underestimation of CENTURY 315 compared to Yasso07 and Q models (Fig. 4). The weighted root mean square error (RMSE) was 27.5 (tC ha −1 ) for CENTURY and 31.6 and 38.8 for Yasso07 and Q respectively. The proportion of explained variance was larger for Q ( r 2 = 0.58) than for Yasso07 and CENTURY ( r 2 = 0.42 and 0.32) (Fig. 4). The deviation of the distributions of CENTURY SOC stocks, simulated using soil bulk density, sand, silt, and clay content, were lower than those of Yasso07 and Q estimates 320 for 10 physicochemical soil groups (Fig. 3). Accounting for site specific soil texture (clay, silt, and sand content) and structure (bulk density) by CENTURY model improved SOC stock estimates for fertile sites with high clay content, but not for sites with high N deposition. Varying CENTURY parameters of site specific topsoil mineral Nitrogen and C/N ratio of the litterfall showed that this impact on SOC stocks estimates was small in comparison to sensitivity of SOC stock estimates to 325 litterfall (Fig. S12). The application of site specific drainage on our mostly well drained soils showed minor impact on estimated CENTURY SOC stocks.
As expected, the models clearly showed less variation than the measurements. The shift of the mean values from the center of distribution, the width of confidence intervals of means, and the width of the tails of distributions were clearly larger for the measurements than for the modelled estimates 330 (Fig. 3). The modelled distributions agreed for the poor-medium fertility soils with low and medium measured SOC stocks, low and medium CEC, unsorted parent material, low temperatures and low production (groups 1, 2, and 4) (Fig. 2, Table S1, Fig. 3). Disagreement between modelled and measured SOC stock distributions were formed on fertile soils with sorted parent material (groups 3 and 5), soils with higher water content (groups 3, 5, and 10), where nitrogen deposition was large 335 (groups 7 and 8), and where CEC was median or large (Fig. 2, Fig. 3). The largest deviation between the measured and modelled distributions was found for the relatively small physicochemical groups of soils (3%) typical for highly bound water and peat humus types (groups 8 and 10) (Fig. 2, Fig. 3).
The distributions of measured total SOC stocks (tC ha −1 ) generally increased for the groups with higher nutrient status (Fig. 3, Fig. S4). The distributions of SOC stocks in mineral soil were larger 340 than those in humus horizon, and distributions of mineral SOC stocks increased with fertility slightly more than distributions of SOC stocks in humus horizon (Fig. S4).
After excluding all the soil physicochemical characteristics from the recursive partitioning, the SOC stock distributions of 5 groups regression tree model (Fig. S3, Table S2) were in agreement between the measurements and model estimates for 3 groups (77% of samples) and deviated for 2 345 groups (23%) (Fig. S5). The modelled SOC stock distributions agreed with measurements for all models on sites with cold ::: low : annual temperatures < 3 • C in northern sites (low-C.cold.pine, low-C.cold.other) and for warmer conditions in middle Sweden on sites with low nitrogen deposition and median SOC stocks (Fig. S5). However, the models underestimated SOC stocks on sites with high (> 10 kgN ha −1 y −1 ) N deposition (21% of samples) and on sites with warm and dry climate (2% 350 of samples) (Fig. S5).
The variation of density functions of modeled SOC stocks for 10 physicochemical groups (Fig. 3) was similar to the variation of the total annual plant litter input (tC ha −1 ) (Fig. S6) indicating that litterfall was the main driver of SOC accumulation in the models . The mean levels of annual plant litter input and mean SOC stocks for 10 soil groups were more strongly correlated for Yasso07 and 355 Q models (with r 2 values 0.86 and 0.96, respectively) than for CENTURY (r 2 = 0.52). Although, models performed reasonably well for the largest soil groups of nutrient and production levels ( Fig. 3 and Fig. 4), none of the models was able to predict variation of individual samples (Fig. S7). The model estimates were well correlated between Yasso07 and CENTURY with r 2 ranging from 45 to 73% for individual samples of 10 soil groups, whereas the correlations of estimates between Q and 360 the other two models were lower (Fig. S8).

SOC stock distributions linked to mechanisms of SOM stabilization
It has been suggested that process based soil carbon models with the current formulation lacking major soil environmental and biological controls of decomposition would fail for conditions where 365 these controls predominate (Schmidt et al., 2011;Averill et al., 2014). Although, the effect of the soil properties on SOC stocks e.g. soil nutrient status in the widely used models such as Yasso07, Q, and CENTURY have not previously been quantitatively evaluated. We found that in comparison with Swedish forest soil inventory data, the models based on the amount and quality of inherent structural properties of plant litter (Q, Yasso07, and CENTURY) produced accurate SOC stock es-370 timates for 2/3 of northern boreal forest soils in Sweden. Two-thirds of the distributions of SOC stocks measurements of SFSI agreed with distributions of SOC stock estimates of the Q, Yasso07, and CENTURY soil carbon models (Fig. 3, distributions of groups 1, 2, and 4). However, the SOC stocks underestimation by these models for one third of the data (Fig. 3, distributions of groups 3, and 5-10) indicated that some drivers other than molecular structure, especially site nutrient status, 375 play an important role in higher SOC stocks sequestration.
Some level of deviation from measurements and poorly explained spatial variation (Fig. S7) (Fig. S6) then the spatial differences between measured and modeled SOC stock distributions could be linked to sites with rich nutrient status through cation 385 exchange capacity, C/N ratio, N deposition, drainage (sorting of parent material) among other factors ( Fig. 2 and 3). Additionally, when the soil properties were excluded from the regression, the estimates of SOC stocks also deviated for the fertile groups (Fig. S5). However, the rich nutrient status for these groups was linked to differences in species composition, N deposition, and climate (temperature, precipitation) instead of soil properties (Fig. S3).

390
Larger net soil carbon accumulation in nutrient rich sites could be attributed to the relative differences in litterfall components (relatively more leaves and branches with higher N content than fine roots), and to higher N availability and carbon use efficiency of decomposers, reduction of respiration per unit of C uptake (Ågren et al., 2001, Manzoni et al., 2012, Fernández-Martínez et al., 2014. Largest deviation between measured and modeled data in our study was found for fertile presumably 395 N rich and fresh to fresh-moist sites. The soils with large N deposition were also highly productive and showed high to exceptionally high SOC stocks (Fig. 2, Fig. 3, soil groups 7 and 8). This was in agreement with fertilization and modelling study of Franklin et al. (2003) showing an increase in soil C accumulation with N addition. Our forest biomass and litterfall estimates were based on forest inventory and modeling, but the site nutrient status and N deposition was only partially reflected in 400 the amount of biomass/litterfall (Fig. S11) and its quality. The quality was only reflected through the biochemical differences between species and plant litter components. The relative differences between the biomass/litterfall components or between C/N ratios of litterfall in relation to site fertility are not accounted by the current biomass models, but soil fertility could be considered in an attempt of SOC stock modelling (included in CENTURY but missing in Yasso07 and Q models). For exam-405 ple the proportion of acid -, water -, and ethanol-soluble and non-soluble litter inputs for Yasso07 could be re-evaluated by allowing it to vary depending on site fertility, in addition to currently used variation specific for species and the litter components. Although CENTURY SOC stocks were sensitive to the amount of clay, the variation of topsoil mineral N and C/N ratio of litterfall did not improved SOC stock predictions for sites with high N deposition ( Fig. 3 and Table S1).

410
The litter decomposition and SOC stabilization rates in Yasso07, Q, and CENTURY based on the litter quality "recalcitrance" originating from the litter bag mass loss measurements have major drawbacks. The mass loss from the litter bags is assumed to be fully mineralized, although the litterbags are subjected to non-negligible leaching (Rantakari et al., 2012;Kammer and Hagedorn, 2011). The SOC stabilization represented in models by the remaining litter mass is thus underestimated due to 415 the fraction of particulate organic matter and dissolved organic carbon that is lost from the litterbags but later immobilized e.g. through organo-mineral stabilization. The use of stable isotopes seems to determine the field carbon mineralization and accumulation rates from the labile (high C quality and N concentration) or recalcitrant (low C quality and N concentration) litter more accurately than litter bags (Kammer and Hagedorn, 2011).

420
Higher amount of more recalcitrant fine roots compared to more labile leaves (Xia et al., 2015) heavily increased the soil carbon sequestration in CENTURY model simulations which was in line with McCormack et al. (2015). Though, the contribution of fine roots to SOC stabilization is still not settled due to the significant role of mycorrhizal fungi in SOC accumulation (Averill et al., 2014;Orwin et al., 2011). Xia et al. (2015) claimed that more recalcitrant fine roots contribute to stable 425 SOC more than leaf litter, because fine roots degrade slower. This would be supported by the fact if the precursors of fine roots that are degraded by fungi are more stable than the precursors of leaves degraded by microbes. However, more recalcitrant plant litter has been also suggested to stabilize less SOC stocks (Kammer and Hagedorn, 2011). This is a result of recalcitrant litter satisfying less of the microbial N demands promoting respiration and reducing the long-term production of mi-430 crobial products, precursors for the organo-mineral stabilization (Cotrufo et al., 2013, Castellano et al., 2015. According to the microbial efficiency-matrix (MEM) stabilization mechanism (Cotrufo et al., 2013) fertile sites with relatively more labile plant litter, but with larger absolute production and larger microbial activity than poor sites, would in long-term stabilize more carbon through organo-mineral stabilization. Our results supported MEM stabilization theory by showing larger car-435 bon stocks in mineral soil than in humus horizon, and by relatively more SOC stocks in mineral soil in fertile groups than in poor conditions (Fig. S4).
Expanding on the CENTURY model structure, the MySCaN model incorporating the organic nutrient uptake by mycorrhizal fungi estimated positive effect on SOC accumulation, relatively larger in poor than in fertile sites (Orwin et al., 2011). Therefore, not accounting for the organic nutrient 440 uptake by mycorrhizal fungi by the Yasso07, Q, and CENTURY models probably led to the underestimation of SOC stocks in sites with higher nutrient status. This hypothesis needs to be tested in further studies. We did not have all input data and the source code to include MySCaN into our model intercomparison. The spatial trends of N and P data of litter in Sweden that would be needed to make such study were not available. However, adjusting biomass turnover rates, used for the lit-445 ter input estimation, in dependence to site fertility would lead into larger inputs for fertile sites and increased SOC stock accumulation as a result of increasing plant productivity and inputs. It is well established that SOM increases soil fertility by improving the soil water and nutrient holding capacity; recycling of SOM increases CEC, humic substances and nutrient availability for plant resulting in larger biomass/litter production (Zandonadi et al., 2013). As an alternative to adjusting turnover 450 rates with site fertility, we suggest that a feedback link in models between increasing fertility due to SOC stock accumulation (e.g. due to increased CEC relative to humus, increased nitrogen availability), increasing litter inputs, and reduced rates of SOC decomposition per unit of litter input (e.g. through satisfying more microbial N demand with less respiration, limited oxygen in increased moisture conditions) would also increase SOC stock accumulation.

455
Increased moisture and more frequent water saturation due to SOC accumulation limits soil oxygen availability and slows rates of microbial decomposition which increases the rate of SOC stabilization. Our results, which were derived from mostly well drained soils, suggest that measured high SOC stocks may be partly caused by reduction of decomposition at increased water content (Fig. 2). The CENTURY model has an optional function that represents the reduction of decompo-460 sition caused by anaerobic conditions. The function becomes active when a controlling parameter, "drain", is changed, and the value of the parameter has to be arbitrarily determined through parameter fitting against SOC data (e.g. Raich et al., 2000). However, this function was meant for anaerobic conditions in poorly drained soils, therefore it was not applicable to the prevailing conditions of our sites. Accounting for drainage only on some sites slightly affected decomposition, when precip-465 itation increased and potential evapotranspiration decreased in late spring or early autumn. Water availablility affecting soil fertility and SOC formation is beside climate also affected by topography (Clarholm et al., 2013) which was not accounted for by CENTURY. Detailed modelling of soil water conditions requires specific functions and many parameters, which are not included in simpler SOC models like Q and Yasso07. However, appropriate modelling of soil water conditions and reduction 470 of decomposition in wet conditions (not necessarily at saturation) would potentially improve the performance of SOC models in particular for soils with high SOC stocks.

Intercomparison of models
The similarities between the variations of modeled SOC stocks and litterfall inputs for the soil groups with different fertilities (Fig. 3, Fig. S6) could be expected for the Yasso07 and Q models which ig-475 nore the soil properties. These models run organic matter decomposition and humus stabilization with litterfall, temperature and/or precipitations input data. Litter quality as input in Yasso07 and Q implicitly includes some information on soil properties, but as we saw litter quality hardly mapped any of soil fertility. Although, the impact of soil properties on the estimates was seen in the more complex CENTURY model for sites with high clay content, the SOC stock of sites with high N 480 deposition were underestimated. The CENTURY model depended less on the amount of litter input, and its variations of the estimated SOC stocks distributions were less pronounced than those for the Yasso07 and Q models. In testing multiple soil carbon models with same litter inputs Palosuo et al. (2012) observed larger variation in modeled SOC stocks at the early stage of the litter decomposition (10 years) but later on at 100 years the variation decreased. Although the variations of 485 SOC stocks were similar between the models, the estimated CENTURY SOC stocks distributions were lower than the Yasso07 estimates when we did not accounted for deep soil carbon. CENTURY in its original configuration simulated SOC stock up to 20 cm soil depth (Metherell et al., 1993) whereas the Yasso07, Q, and measured SOC stocks data represented up to 100 cm of the soil , Stendahl et al., 2010. In Yasso07 model parameters were calibrated based on soil age 490 chronosequence data of SOC stocks for soil depths up to 30 cm, which was assumed to represent 60% of the total SOC stocks up to 100 cm soil depth (Liski et al., 1998(Liski et al., , 2005 as cited by Tuomi et al., 2009). Therefore, when 40% of the missing deep carbon (Jobbágy and Jackson 2000) were added on top of the original CENTURY estimates as was done when callibrating Yasso07, the SOC stock levels for CENTURY were larger than those for the Yasso07 and Q models.

495
Although estimated SOC stocks of CENTURY were generally larger than those of Yasso07, the correlation between CENTURY and Yasso07 estimates was stronger than for Q model compared to two other models (Fig. S8). The reason was probably similar global parameterizations of Yasso07 and CENTURY whereas Q was specifically parameterized and applied for the regions in Sweden Hyvönen 2003, Ortiz et al., 2013). Furthermore the Q model SOC stock estimates 500 were more sensitive to differences in species coverage e.g. to pine and spruce (Ågren and Hyvönen 2003) and formed two distinct point cloud distributions (one for pine and broadleaves, the other for spruce) when compared with the CENTURY or Yasso07 estimates (Fig. S8). In spite of similarities in Yasso07 and CENTURY SOC stocks estimates, Yasso07 was more sensitive to species coverage through species specific litterfall solubility ) than CENTURY which treated 505 conifers in a single group (Metherell et al., 1993). Pine and other species (spruce) coverage was shown to affect measured low and median SOC stocks of colder climate if the soil properties were not considered (Fig. S5). Therefore the pattern of increased accumulation of SOC stock on sites with larger spruce coverage partially observed in distribution of Yass07 estimates, and missing in the CENTURY estimates, could be related to the slightly lower solubility/decomposability of spruce 510 compared to pine litterfall. However, the CENTURY model SOC stocks were also highly sensitive to accurate estimation of fineroots litterfall (Mc Cormack et al., 2015) typically increasing with colder climate and increasing the C/N ratio of the organic layer (Lehtonen et al., 2015b) which is driven by the dominant tree species (Cools et al., 2014).
Large SOC stocks measurements on sites with high long-term nitrogen deposition over 10 515 kgN ha −1 y −1 (Fig. 3 and Fig. S4) were underestimated by the Q, Yasoo07, and CENTURY models.
A positive correlation between nitrogen deposition and SOC stocks measurements in Sweden had been previously reported by Olsson et al. (2009), and the modelling study by Svensson et al. (2008) indicated that Swedish soil carbon was decreasing in the North and increasing in the South mainly as a result of different nitrogen inputs. The Q and Yasso07 models do not have nitrogen processes.

520
As for CENTURY, it is reported that large N input could enhance plant productivity and then increase SOC (Raich et al., 2000). The purpose of the study was to evaluate the performance of soil carbon models against the same SOC data using the same litter input, and therefore only the soil carbon submodel was used and the feedback of nitrogen input to plant productivity was primarily included in this study indirectly, through estimated steady state ::::::::: equilibrium : litter input based on site 525 productivity class which strongly correlated with N deposition (Fig. A1 and S11). In spite of slight increase of SOC stock estimates when CENTURY accounted for the site specific topsoil mineral N, C/N ratio of litterfall, in sites with large N deposition CENTURY still underestimated. However, as in the case of drainage discussed above, the CENTURY incorporates more detailed processes than the relatively simpler soil carbon models, Q and Yasso07, do, and hence the CENTURY could 530 potentially reproduce a wider range of SOC stocks if it was parameterized with more detailed data.

Conclusions
In this study we presented the reasons to re-evaluate the connection between the soil nutrient status and performance of widely applied soil carbon models (Yasso07, Q, and CENTURY). As previously described in detail, our simulation was based on the widely used process based SOC models, accurate 535 driving data including litter inputs, and massive SOC data points (Swedish inventory data, N=3230).
The models differed in the main controls and functions and their performance was expected to depend on model complexity (CENTURY outperforming Q and Yasso07). The intercomparison of SOC stocks between Yasso07, Q, and CENTURY models and Swedish soil carbon inventory data revealed that these process based mathematical models developed for predicting short-term SOC 540 stock changes can all in their current state predict accurate long-term SOC stocks for most soils.
The estimates of CENTURY fitted generally better to measurements than those of Yasso07 and Q model. However, Yasso07 model which requires fewer parameters and less input data showed similar performance than CENTURY, except for sites with hig clay content. The models with their current formulation lack nutrient status related controls of decomposition and soil carbon accumulation and 545 underestimated for conditions where the high nutrient status predominate, in our application for medium-highly productive sites of Southern Sweden.
Through the intercomparison of three different widely-used SOC models with massive data points, we identified that re-evaluating of the impact of nutrient status would improve the model development towards their accuracy. Particularly, the relationship between the soil nutrient status and the 550 mechanism of soil organo-mineral carbon stabilization needs to be re-evaluated, because larger SOC stocks were found in the mineral than in the humus soil horizon. We suggest evaluating enhanced microbial transformation of soil organic matter and the mycorrhizal organic nutrient uptake in relation to larger plant biomass/litter production in nutrient rich sites resulting to higher SOC stock accumulation in deeper soil layers. In addition for the organo-mineral carbon stabilization, we also 555 suggest further model development accounting for the soil nutrient status through evaluating the effect of topography on sorting of the parent material, and its silt and clay complexes.
Our study is very useful for developing accurate soil carbon and Earth system models. Furthermore, developing accurate models that would account for the soil nutrient status as one of the key controls affecting the soil organic matter production and SOC stabilization improves estimation of 560 feedback of global warming on SOC stock temperature sensitivity and soil CO 2 efflux, national reporting of soil carbon stock changes for UNFCCC, and implications of decisions mitigating the climate change effects on soil carbon stocks.
Appendix A: Models of fraction of absorbed radiation for actual ::::::: observed : and steady state :::::::::: The fraction of photosynthetically active absorbed radiation (f AP AR ) for actual :::::::: observed state forest was calculated based on basic tree measurements of Swedish forest inventory data as in Härkönen et al. (2010). For the main tree species f AP AR was also well correlated with the stand basal area ( r 2 was 0.85, 0.86, and 0.88 for pine, spruce, and deciduous stands respectively, coefficients of regressions in Table A1). The actual ::::::: observed : state forest f AP AR varied between 0 and maximum 570 close to 1 (Fig. A1).
The steady state :::::::::: equilibrium : forest f AP AR values were assumed to be in range between the median and the maximum fraction of actual :::::::: observed state forest f AP AR for given species, latitudinal degree, and site productivity class (indicated by the height of largest tress at 100 years of stands age). The steady state ::::::::: equilibrium : forest f AP AR values were set to 70th percentile of 575 maximum (f AP AR70 ) for given species, latitudinal degree, and site productivity class. We selected 70th percentile out of range from 50th to 95th, because the modelled soil carbon distributions with the litter input from biomass of f AP AR70 best agreed with measured soil carbon distributions (Fig. S2). The f AP AR70 values specific for pine, spruce, and deciduous stands were first modelled by regression models with latitude (f AP AR70LAT ) (Table A2) and then reduced by the differ-580 ence between the modelled f AP AR70 by regression models with productivity class :: site ::::::::::: productivity :::: index : (H100) (f AP AR70H100 ) (Table A1) and maximum f AP AR70H100 (f AP AR70 = f AP AR70LAT + f AP AR70H100 -maximum f AP AR70H100 ). The f AP AR70 values equaled the f AP AR70LAT values only for the maximum productivity class, otherwise it was reduced.
Appendix B: Models of forest dry weight biomass (kg ha −1 ) with f AP AR .
The biomass components derived with allometric models (measured) and those derived with f AP AR models (modeled) showed strong correlations (Fig. B1). In order to model the longterm mean forest 590 biomass "steady state :::::::::: equilibrium forest biomass" we applied the f AP AR biomass models to the modeled f AP AR70 values.
Appendix C: Models of understory vegetation.
We used Swedish forest inventory ground vegetation coverage (%) data visually monitored between 1993 and 2002 on 2440 plots around Sweden with altogether 4472 observations separately for 595 species of forest floor vegetation /or their classes (Table S3). In order to derive the ground vegetation biomass and to apply the coverage/biomass conversion functions (Lehtonen et al., manuscript), we grouped the species coverage observations into five functional types (dwarf-shrubs, herbs, grasses, moss, and lichen) ( Table S3). The applied coverage/biomass conversion functions estimated separately the above-and below-ground biomass components for dwarf-shrubs, herbs, and grasses, and 600 total biomass for moss, and lichen.
Except the understory coverage, the forest inventory data also contained basic tree dimensions (diameter and height of trees) and stand variables (species dominance, age, basal area, site productivity class indicated by the height of largest tress at 100 years of stands age), and also we linked the plots by their closest proximity to SMHI weather stations with weather data (air temperature, 605 precipitation) and location attributes of the weather stations (latitude, longitude, altitude).
We built linear models for dry weight biomass of understory vegetation (kg ha −1 ) in a two level selection of the predictors from stand, weather and location variables. First, we selected the predictors into linear models by using R package "Mass" and its stepwise model selection by exact AIC (Venables and Ripley, 2002). Second, we refined the model by using "relaimpo" R package estimat-610 ing usefulness (Grömping, 2006), or relative importance for each of the predictors in the model, and by selecting only predictors with relative importance ≥ 0.1. The general form of the models was: Where y i is the understory dry weight biomass (kg ha −1 ), x 1 . . . x n are the predictors, a, b 1 . . . b n are parameters of the i th understory functional type (Table C1), and ε is the residual error. Statistics 615 of the models are shown in Table C1. Scatter plots between the measured coverage derived biomass and modelled dry weight biomass (kg ha −1 ) of the functional types of ground vegetation for the forests in their actual ::::::: observed : state close to the estimated steady state ::::::::: equilibrium : are shown on Code and data availability 620 The source codes of the Yasso07, Q and CENTURY models used in this paper are available through the supplementary material. Data used in this study can be available directly by contacting the authors.       Table C1. Parameter estimates and their standard errors for the coefficients of the forest understory vegetation dry weight biomass (kg ha −1 ) models (Eq. C1) for functional types (1-dwarfshrubs, 2-herbs, 3-grasses, 4mosses and 5-lichens) with intercept (a) and n number of predictors (b1-age (years), b2 -basal area (m 2 ha −1 ), b3 -annual air temperature ( • C), b4 -latitude ( • ), b5 -H100 (height of trees at 100 years of age, m), b6 -H100 of spruce trees (m), b7 -H100 of pine trees (m), b8-pine dominance (0/1), b9-spruce dominance (0/1)).
For the latin names of species included into understory functional types see Table S3.  32 Figure 3. Bean plot of density functions for 10 physicochemical groups of the soil carbon (tC ha −1 ) measurements (grey fill) and estimates simulated by the soil carbon models Yasso07, CENTURY, and Q with the litter input derived from the steady state ::::::::: equilibrium forest. The thin lines are the density distributions. The thick lines are the group means and dashed lines are their confidence intervals. The n is number of samples. For description of group levels of SOC stocks, moisture, and fertility see Fig.2 and Table S1. Figure A1. Actual state ::::::: Observed : fraction of absorbed radiation (fAP AR, estimated as in Härkönen et al., 2010) (actual :::::: observed : fAP AR) and steady state ::::::::: equilibrium fAP AR (modeled fAP AR70) which was set to 70th percentile of maximum fAP AR for given species, latitudinal degree, and site productivity class. Panels a), b), and c) show relation between fAP AR and latitude (°) for forest stands dominant by Scots pine, Norway spruce and deciduous species, whereas panels d), e), and f) show relation between fAP AR and site productivity class ( :::: index H100 , : (height of dominant trees at 100 years in meters). Figure B1. Scatter plots : (n : = :::: 3698 :: in :: all :::::: panels) for the dry weight tree biomass components (tC ha −1 ) between "modelled" (estimated based on fraction of absorbed radiation,fAP AR, and our fAP AR models) and "measured" (estimated based on basic tree stand dimensions and allometric biomass models). The r 2 values represent the coefficient of determination indicating how close the modeled values fit the measured values.