Land-use and land-cover change carbon emissions between 1901 and 2012 constrained by biomass observations

W. Li et al.

Abstract. The use of dynamic global vegetation models (DGVMs) to estimate CO2 emissions from land-use and land-cover change (LULCC) offers a new window to account for spatial and temporal details of emissions and for ecosystem processes affected by LULCC. One drawback of LULCC emissions from DGVMs, however, is the lack of observational constraint. Here, we propose a new method of using satellite- and inventory-based biomass observations to constrain historical cumulative LULCC emissions ($E^c_{LUC}$) from an ensemble of nine DGVMs based on emerging relationships between simulated vegetation biomass and $E^c_{LUC}$. This method is applicable on the global and regional scale. The original DGVM estimates of $E^c_{LUC}$ range from 94 to 273 PgC during 1901-2012. After constraining by current biomass observations, we derive a best estimate of 155 ± 50 PgC (1σ Gaussian error). The constrained LULCC emissions are higher than prior DGVM values in tropical regions but significantly lower in North America. Our emergent constraint approach independently verifies the median model estimate by biomass observations, giving support to the use of this estimate in carbon budget assessments. The uncertainty in the constrained $E^c_{LUC}$ is still relatively large because of the uncertainty in the biomass observations, and thus reduced uncertainty in addition to increased accuracy in biomass observations in the future will help improve the constraint. This constraint method can also be applied to evaluate the impact of land-based mitigation activities.

Introduction
Carbon emissions from land-use and land-cover change (LULCC) are part of the human perturbation to the global carbon cycle (Houghton et al., 2012; Le Quéré et al., 2015) and started before the industrial era, when fossil fuel CO2 emissions appeared. Since 1850, estimated cumulative LULCC emissions, $E^c_{LUC}$, have represented one-third of total cumulative anthropogenic CO2 emissions (Boden et al., 2013; Houghton et al., 2012; Le Quéré et al., 2015). Annual LULCC emissions were higher than those from fossil fuel burning until the 1930s (Boden et al., 2013; Houghton et al., 2012; Le Quéré et al., 2015) and today represent a smaller but persistent perturbation in the global carbon cycle. Unlike those in fossil fuel emissions, relative uncertainties in LULCC emissions are high because this flux is difficult to assess from measurements. Some progress has been made to better quantify gross tropical deforestation emissions by combining spatial biomass data with satellite-derived maps delineating forest cover loss (Harris et al., 2012). However, such spatially resolved data are not available beyond the last decade and provide only gross deforestation emissions; i.e., they do not track the regrowth of secondary ecosystems or legacy soil carbon losses that can persist long after deforestation.
Bookkeeping models (Hansis et al., 2015; Houghton, 1999) based on historical LULCC area data and tabulated functions of carbon losses and gains are one approach to estimating $E^c_{LUC}$, but they do not include the effects of environmental changes on carbon stocks before and after LULCC happens (Gasser and Ciais, 2013; Pongratz et al., 2014). The bookkeeping model of Houghton (1999) used for the annual update of the global carbon budget (Le Quéré et al., 2015) is based on regionally aggregated data and does not consider spatial differences in LULCC fluxes within a region. Alternatively, the LULCC fluxes estimated by dynamic global vegetation models (DGVMs) account for spatial and temporal variations in carbon stock densities and land-cover change, as well as for delayed ("legacy") carbon fluxes. In DGVMs, LULCC fluxes are related to environmental conditions through simulated carbon cycle processes, i.e., net primary production (NPP) and respiration, resulting in changes in biomass and soil carbon stocks simulated with variable atmospheric CO2 concentration and climate. Yet, LULCC emissions from DGVMs differ greatly, even when these models are prescribed with the same inputs of land-cover change data (such as time-variable areas of pasture and crops; Pitman et al., 2009). Several factors are responsible for differences in $E^c_{LUC}$ among DGVMs, including (1) different representations of processes that determine the carbon densities of vegetation and soils subject to land-use change; (2) using dynamic vegetation or prescribing a fixed vegetation distribution; and (3) the use of different rules assigning how natural vegetation types change to agricultural areas (Peng et al., 2017; Pitman et al., 2009; Reick et al., 2013).
Carbon initially stored in forest biomass contributes the predominant portion of the LULCC emissions after deforestation (Hansis et al., 2015). Thus, an accurate representation of the biomass carbon density exposed to LULCC is crucial to reduce uncertainties in DGVM-based $E^c_{LUC}$ estimates. Global biomass datasets based on inventories and satellites recently became available. These datasets (Table 1) provide the spatially distributed biomass carbon density on regional or global scales (Avitabile et al., 2016; Baccini et al., 2012; Carvalhais et al., 2014; Liu et al., 2015; Pan et al., 2011; Saatchi et al., 2011; Santoro et al., 2015; Thurner et al., 2014), but differ in terms of their coverage of aboveground or belowground biomass and whether they provide only forest biomass or biomass for all vegetation types.
In this study, we propose a new method to combine recent satellite- and inventory-based biomass datasets to constrain $E^c_{LUC}$ simulated by DGVMs (Fig. 1). We analyzed the outputs from nine DGVMs (Table 2) of the Trends in Net Land-Atmosphere Exchange (TRENDY-v2) project (Sitch et al., 2015; http://dgvm.ceh.ac.uk/node/9) and developed global and regional regressions between initial biomass in 1901 and present-day biomass (average of 2000-2012) and between $E^c_{LUC}$ during 1901-2012 and initial biomass across the DGVMs. The former set of regressions is used to extrapolate present-day observation-based biomass (Table 1) to initial biomass in the year 1901. The latter set of regressions is applied to provide an emerging constraint on $E^c_{LUC}$ as a function of initial biomass (Fig. 1). Using the Gaussian uncertainties associated with the observation-based biomass datasets and the uncertainties in the two regressions, the Gaussian errors in $E^c_{LUC}$ can be derived after applying the biomass constraint.

LULCC emissions and biomass from the DGVMs
The DGVMs in TRENDY-v2 were used to conduct two simulations (labeled S2 and S3) between 1860 (except JSBACH, which starts from 1850; Table 2) and 2012, with outputs quantifying LULCC emissions over the period 1901-2012 (Sitch et al., 2015). Both simulations are performed with changing climate and CO2 concentration, but one (called S3) has variable LULCC maps based on the Land-Use Harmonization (LUH) dataset (Hurtt et al., 2011; with an extension until 2012), and the other (called S2) has a time-invariant land-cover map representing the state in 1860. The difference in net biome production (NBP, the net carbon exchange between the biosphere and the atmosphere) between these two simulations (S3 and S2) defines modeled LULCC emissions. This calculation of LULCC emissions by DGVMs includes the "lost sink capacity" (called "altered sink capacity" in Gasser and Ciais, 2013, and "the loss of additional sink capacity" in Pongratz et al., 2014) because simulated NBP in the S2 simulation without LULCC is a net sink over areas affected by LULCC in S3. For example, forests have larger carbon storage and a slower turnover time than croplands and are thus expected to be carbon sinks when the atmospheric CO2 level increases. After deforestation to croplands, this sink capacity due to CO2 fertilization is lost. Modeled LULCC emissions include the legacy emissions from soil carbon losses and emissions from wood and other products produced by LULCC, as far as the latter are included in the TRENDY-v2 models (Table 2). The DGVMs used in this study are CLM4.5 (Oleson et al., 2013), JSBACH (Reick et al., 2013), JULES3.2 (Best et al., 2011; Clark et al., 2011), LPJ (Sitch et al., 2003), LPJ-GUESS (Smith et al., 2001), LPX-Bern (Stocker et al., 2014), ORCHIDEE (Krinner et al., 2005), VISIT (Ito and Inatomi, 2012; Kato et al., 2013) and OCN (Zaehle and Friend, 2010). Each DGVM is described briefly in Table 2.
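The S3 and S2 differencing described above can be illustrated with a minimal Python sketch. It assumes that NBP is positive when the land takes up carbon, so that emissions are computed as NBP(S2) minus NBP(S3); this sign convention, the function names and the three-year series are hypothetical and serve only as an illustration, not as the TRENDY processing code.

```python
from itertools import accumulate


def annual_lulcc_emissions(nbp_s2, nbp_s3):
    """Annual LULCC emissions as NBP(S2) minus NBP(S3).

    Assumption: NBP is positive when the land takes up carbon, so a smaller
    NBP in the run with land-use change (S3) gives a positive emission.
    """
    if len(nbp_s2) != len(nbp_s3):
        raise ValueError("S2 and S3 series must cover the same years")
    return [s2 - s3 for s2, s3 in zip(nbp_s2, nbp_s3)]


def cumulative_emissions(annual):
    """Running total of annual emissions; the last entry is the cumulative flux."""
    return list(accumulate(annual))


# Hypothetical NBP series in PgC per year (illustrative values, not model output).
nbp_s2 = [1.0, 1.5, 2.0]
nbp_s3 = [0.0, 0.5, 0.5]
annual = annual_lulcc_emissions(nbp_s2, nbp_s3)
running_total = cumulative_emissions(annual)
```

Because the S2 run keeps the sink that would have operated over undisturbed land, the difference computed this way contains the lost sink capacity in addition to the direct carbon losses.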
LULCC can either reduce or increase the biomass amount over time depending on the LULCC types. For example, forest clearing eventually turns forest biomass into atmospheric CO2, while secondary forest regrowth can increase biomass. The overall effect of LULCC on biomass during the historical period is a net loss of carbon (Houghton, 1999) due to the conversion of natural vegetation into cultivated lands by humans (Klein Goldewijk et al., 2011). Identifying the LULCC-affected grid cells in each model is thus critical because only biomass in these grid cells should be used to constrain LULCC emissions. Grid cells affected by LULCC differ among models. Although all models share the same pasture and cropland areas from the LUH dataset (Hurtt et al., 2011), the models have different numbers of plant functional types (PFTs), use different PFT definitions and have different allocation rules for translating the shared agricultural data into the new vegetation cover (Peng et al., 2017; Pitman et al., 2009; Reick et al., 2013). As a result, there is no unified map to determine the LULCC-affected grid cells in all models. For the same reasons, the forest areas and the LULCC types are also different among models.
Table 1. The different biomass datasets based on observations. The biomass information from the TRENDY-v2 project is also listed for comparison.
www.biogeosciences.net/14/5053/2017/ Biogeosciences, 14, 5053-5067, 2017
In this study, we adopted the "deforestation grid cells" in their corresponding PFT maps as a criterion to locate the LULCC-affected grid cells from DGVM outputs. Thus we used the PFT maps from each model to first calculate the temporal change in forest area (total area of all forest PFTs) during 1901-2012 and then selected the grid cells that experienced deforestation by comparing the forest area maps between 1901 and 2012 (net deforestation). This procedure produces a good approximation given the continuously decreasing trend of forest area in LULCC hotspot regions like South and Central America (Fig. 2). We also tested an alternative method to determine the LULCC-affected grid cells in TRENDY model outputs; i.e., PFT maps were compared year by year during 1901-2012, and grid cells with deforestation were selected (gross deforestation). This method tends to give a greater number of LULCC-affected grid cells, reducing the goodness of fit in the regression between the biomass in 1901 and $E^c_{LUC}$ during 1901-2012 (Figs. S1 and S2 in the Supplement). Therefore, the method of gross deforestation is not used for further analyses.
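The two selection criteria can be contrasted in a short Python sketch. It assumes that each grid cell carries a time series of total forest area ordered from the first to the last year; the dictionary layout, the cell labels and the values are hypothetical and chosen only for illustration.

```python
def net_deforestation_cells(forest_area):
    """Cells whose forest area in the last year is below that in the first year."""
    return {cell for cell, series in forest_area.items() if series[-1] < series[0]}


def gross_deforestation_cells(forest_area):
    """Cells with a forest-area decrease between any two consecutive years."""
    return {
        cell
        for cell, series in forest_area.items()
        if any(later < earlier for earlier, later in zip(series, series[1:]))
    }


# Hypothetical forest area per grid cell (arbitrary units), first to last year.
forest_area = {
    "a": [10.0, 8.0, 6.0],   # steady loss
    "b": [10.0, 9.0, 11.0],  # temporary loss, then regrowth above the initial area
    "c": [5.0, 5.0, 5.0],    # unchanged
}
net_cells = net_deforestation_cells(forest_area)
gross_cells = gross_deforestation_cells(forest_area)
```

In this toy example cell "b" is selected only by the gross criterion, which mirrors why the year-by-year comparison yields a larger set of LULCC-affected grid cells.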
We verified that deforestation grid cells are responsible for most of the total net LULCC flux. In fact, the average of the different model simulations of LULCC emissions from deforestation grid cells between 1901 and 2012 is approximately 90 % of the total LULCC emissions from all grid cells (Fig. S1). The LULCC emissions in this study are thus taken to equal the sum of LULCC emissions from the selected deforestation grid cells using our criterion. It should be noted that although only deforestation is used as a single criterion to define grid cells affected by LULCC in DGVMs, modeled LULCC emissions also include other types of land-use transitions involving pairs of non-forest PFTs in the selected grid cells.
Table 2, footnote a: a T63 grid has an approximate resolution of 1.9°.
In each model, only biomass in deforestation grid cells is considered. Biomass in the year 1901 is thereby defined as initial biomass, and biomass averaged during 2000-2012 is defined as present biomass. An ordinary least squares linear regression is performed with the outputs of all models between initial biomass and $E^c_{LUC}$ from 1901 to 2012 and between the initial and the present biomass on both global and regional scales. Our division of the world into nine regions (Fig. 2) for estimating LULCC fluxes is the same as in Houghton et al. (1999).
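A minimal Python sketch of such an across-model ordinary least squares fit is given below. It returns the slope, intercept, coefficient of determination and residual standard deviation, with n − 2 degrees of freedom assumed for the residuals. The function name and the input values are hypothetical; the example points lie exactly on a line and are not TRENDY output.

```python
import math


def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r2, resid_sd); resid_sd uses n - 2 degrees
    of freedom, which is an assumption of this sketch.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    resid_sd = math.sqrt(ss_res / (n - 2))
    return slope, intercept, r2, resid_sd


# Hypothetical per-model values: initial biomass (x) and cumulative emissions (y).
initial_biomass = [1.0, 2.0, 3.0, 4.0]
cumulative_flux = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r2, resid_sd = ols_fit(initial_biomass, cumulative_flux)
```

The same function would be applied twice per region: once with initial biomass as the predictor of cumulative emissions and once with present biomass as the predictor of initial biomass.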

Observation-based biomass datasets
Several biomass datasets (Avitabile et al., 2016; Baccini et al., 2012; Carvalhais et al., 2014; Liu et al., 2015; Pan et al., 2011; Saatchi et al., 2011; Santoro et al., 2015; Thurner et al., 2014) based on inventories and remote sensing can potentially be used to constrain $E^c_{LUC}$ through the set of regressions from DGVMs. However, these biomass datasets cover different parts of biomass (aboveground, belowground or total) and different regions (tropics, Northern Hemisphere or the globe) at different spatial resolutions (Table 1). We choose the global grid-based biomass dataset from Carvalhais et al. (2014) to derive an observational constraint that results in a best estimate of $E^c_{LUC}$. This map merges the Northern Hemisphere biomass dataset from Thurner et al. (2014) and the tropical biomass dataset from Saatchi et al. (2011). An advantage of this map is its consistency in biomass terms with the outputs of TRENDY models because it documents aboveground + belowground and forest + herbaceous biomass (Tables 1, 2). Three other biomass maps are used as alternative datasets for sensitivity tests: (1) the global biomass map from the GEOCARBON project, a merged product of the biomass datasets in the Northern Hemisphere (Santoro et al., 2015) and tropics (Avitabile et al., 2016); (2) regional biomass estimates from Pan et al. (2011) based on forest inventory data; and (3) the biomass map from Liu et al. (2015) derived from satellite vegetation optical depth. The GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) and Liu et al. (2015) datasets that only provide aboveground biomass were extended to total forest biomass using the conversion factors for the nine regions (Liu et al., 2015). The global biomass maps from GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) and Pan et al. (2011) are only for forest (Table 1), and we do not add the herbaceous biomass to these two datasets because the global herbaceous biomass only accounts for about 3 % of the global total biomass (Carvalhais et al., 2014). Note that the uncertainties in the corresponding constrained results using these three alternative datasets do not include (1) the uncertainties in converting aboveground biomass to the total of aboveground and belowground biomass for the datasets from Liu et al. (2015) and GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) or (2) the uncertainties in ignoring non-woody biomass in the datasets from GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) and Pan et al. (2011). The biomass maps of Carvalhais et al. (2014), GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) and Liu et al. (2015) with different spatial resolutions were aggregated to a 1° × 1° resolution before selecting the deforestation grid cells.
Peng et al., 2017) assumes that the increase in cropland and pasture is first taken from forest and then from natural grassland if no more forest area is available and that the regional forest area change is set to match the historical forest reconstruction from Houghton (2003). Because the biomass distribution in Pan et al. (2011) is given as regional mean values and not resolved on a grid cell basis, it is impossible to select deforestation grid cells directly from this dataset using the above methods. Therefore, for each region, we calculated the ratios of biomass in deforestation grid cells according to Method A, Method B and Method C to the total biomass in all grid cells in each of the other three biomass datasets (Carvalhais et al., 2014; GEOCARBON, Avitabile et al., 2016; Santoro et al., 2015; Liu et al., 2015). For each method (Method A, B and C), the three ratios corresponding to the three biomass datasets were further averaged in each region. The total biomass amount from Pan et al. (2011) in each region was multiplied by the average ratio to derive the biomass equivalent to using Method A, Method B and Method C for the dataset from Pan et al. (2011). These three methods applied to the above-listed biomass datasets are also applied as sensitivity tests to select the deforestation grid cells since 1901 in the TRENDY model outputs. In the same way, regressions are performed using the initial biomass amount and $E^c_{LUC}$ from these selected grid cells. Due to the inconsistencies between the three methods and the historical PFT maps of each DGVM, the biomass amount in 1901 in the selected grid cells using these three methods is higher than that using PFT maps, but the $E^c_{LUC}$ values are lower, reflecting a lower representativeness of the deforestation grid cells using these three methods for DGVM outputs (Fig. S1). As a consequence, a weaker goodness of regression fit was found between $E^c_{LUC}$ and initial biomass (Fig. S2).
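The ratio-based scaling applied to the regional totals of Pan et al. (2011) can be sketched as follows. The function names and all numbers are hypothetical; the sketch only illustrates averaging the deforestation-cell fractions from the three gridded datasets and applying the mean fraction to a regional total.

```python
def deforestation_fraction(deforest_biomass, total_biomass):
    """Share of a region's biomass that lies in deforestation grid cells."""
    if total_biomass <= 0:
        raise ValueError("total biomass must be positive")
    return deforest_biomass / total_biomass


def scale_regional_total(regional_total, fractions):
    """Multiply a regional biomass total by the mean of the given fractions."""
    if not fractions:
        raise ValueError("at least one fraction is required")
    mean_fraction = sum(fractions) / len(fractions)
    return regional_total * mean_fraction


# Hypothetical (deforestation-cell biomass, regional total) pairs in PgC for the
# three gridded datasets in one region and for one selection method.
gridded = [(5.0, 20.0), (10.0, 20.0), (15.0, 20.0)]
fractions = [deforestation_fraction(d, t) for d, t in gridded]

# Hypothetical regional total from the inventory-based dataset, in PgC.
equivalent_biomass = scale_regional_total(40.0, fractions)
```

The calculation would be repeated per region and per selection method (Method A, B and C), since each method gives its own set of fractions.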

Uncertainties in constrained LULCC emissions
The biomass from Method A, Method B and Method C obtained from each dataset is extrapolated into biomass for the year 1901 using the regression between initial biomass and present biomass modeled by the DGVMs. This biomass in 1901 is then applied in the regression between modeled $E^c_{LUC}$ and modeled initial biomass among different DGVMs to calculate constrained $E^c_{LUC}$. In this emerging constraint approach (Fig. 1), the uncertainties in constrained $E^c_{LUC}$ are a function of the uncertainties in the observed biomass datasets, the linear regression goodness of fit for the two regressions (regressions between $E^c_{LUC}$ and the initial biomass and between the initial and present biomass) and the slopes of the regressions. The uncertainty in constrained LULCC emissions is calculated as in Stegehuis et al. (2013):

$$\sigma_{\mathrm{LULCC}} = \sqrt{\alpha^2 \sigma_{\mathrm{initial\_biomass}}^2 + \sigma_{\mathrm{res\_LULCC}}^2} \qquad (1)$$

$$\sigma_{\mathrm{initial\_biomass}} = \sqrt{\beta^2 \sigma_{\mathrm{present\_biomass}}^2 + \sigma_{\mathrm{res\_biomass}}^2} \qquad (2)$$

where $\sigma_{\mathrm{LULCC}}$, $\sigma_{\mathrm{initial\_biomass}}$ and $\sigma_{\mathrm{present\_biomass}}$ are the uncertainties in constrained $E^c_{LUC}$, initial biomass and present biomass, respectively; $\alpha$ and $\sigma_{\mathrm{res\_LULCC}}$ represent the slope and the standard deviation of the residuals from the linear regression fit between $E^c_{LUC}$ and initial biomass, and $\beta$ and $\sigma_{\mathrm{res\_biomass}}$ represent the slope and standard deviation of the residuals from the linear regression between initial biomass and present biomass.
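The chain of the two regressions and the associated error propagation can be sketched in Python. The sketch assumes independent Gaussian errors, so that the variance of each prediction is the squared slope times the input variance plus the residual variance of the fit. The function names, the tuple layout of the fit parameters and all numbers are hypothetical.

```python
import math


def predict_with_uncertainty(x, sigma_x, slope, intercept, resid_sd):
    """Linear prediction with Gaussian error propagation.

    Assumption: the input error and the regression residuals are independent,
    so their variances add.
    """
    y = intercept + slope * x
    sigma_y = math.sqrt((slope * sigma_x) ** 2 + resid_sd ** 2)
    return y, sigma_y


def constrain_emissions(present, sigma_present, fit_biomass, fit_lulcc):
    """Chain present biomass -> initial biomass -> cumulative emissions.

    Each fit is a (slope, intercept, resid_sd) tuple.
    """
    initial, sigma_initial = predict_with_uncertainty(present, sigma_present, *fit_biomass)
    return predict_with_uncertainty(initial, sigma_initial, *fit_lulcc)


# Hypothetical regression parameters and observation (PgC), for illustration only.
fit_biomass = (1.0, 0.0, 40.0)  # initial biomass as a function of present biomass
fit_lulcc = (0.5, 10.0, 0.0)    # cumulative emissions as a function of initial biomass
emissions, sigma_emissions = constrain_emissions(100.0, 30.0, fit_biomass, fit_lulcc)
```

With these inputs, the first step carries both the observation error and the scatter of the biomass regression into the initial biomass, and the second step scales that combined error by the slope of the emission regression.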

Forest area change and cumulative LULCC emissions in DGVMs
As expected, a general decrease in forest area is found between 1901 and 2012, especially in regions subject to extensive deforestation over the last decades, namely South and Central America, South and Southeast Asia and tropical Africa (Fig. 2). This supports our method of defining deforestation grid cells, although the forest area in some regions differs substantially across DGVMs. Differences in forest area are large in tropical Africa, North America and the former Soviet Union, while they are smaller in South and Central America and South and Southeast Asia (Fig. 2). There are several reasons for these differences in forest area: (1) the models have different initial distributions of PFTs (the TRENDY-v2 protocol only prescribed the same initial area of natural vegetation, but did not specify the PFTs that compose natural vegetation); (2) some models consider only net LULCC, but others have gross LULCC including some sub-grid transitions (Table 2; see a comparison using the JSBACH model; Wilkenskjeld et al., 2014); and (3) the models have different treatments for changing pasture areas (either proportional from natural vegetation or preferential from natural grasslands). In North America, the China region and Western Europe, the forest area decreased in the first half of the 20th century and then increased in recent decades. Yet, the magnitude of the increase is smaller than that of the previous decrease in these regions, and the global result is a net forest loss between 1901 and 2012 (ranging from 2.3 to 16.8 Mkm$^2$ across the nine models). $E^c_{LUC}$ from the nine DGVMs between 1901 and 2012 ranges from 1.7 PgC (−0.6 to 6.0; median and range, with positive values indicating a net cumulative flux to the atmosphere) in North Africa and the Middle East to 42.6 PgC (33.5 to 81.4) in South and Central America, resulting in a global total of 148 PgC (94 to 273; Table 3). Tropical Africa and South and Southeast Asia have the second-largest $E^c_{LUC}$ of 21.8 PgC (15.8 to 57.8) and 21.8 PgC (9.6 to 46.6), respectively. Although afforestation and reforestation occurred in North America after around 1960 and in China after 2000 (Fig. 2), $E^c_{LUC}$ in these two regions has been positive since 1901, with median values of 19.9 and 10.7 PgC, respectively (Table 3).

Relationship between cumulative LULCC emissions and initial biomass
We found a positive linear relationship between $E^c_{LUC}$ and initial biomass in the deforestation grid cells of each model on a global scale and in the regions considered (Fig. 3). The coefficients of determination ($r^2$) are 0.61, 0.58 and 0.76 in South and Central America, South and Southeast Asia and tropical Africa, respectively. Due to stable or slightly increasing forest area (Fig. 2), the correlation between initial biomass and $E^c_{LUC}$ is small in Western Europe (Fig. 3). The slopes of the relationships between $E^c_{LUC}$ and initial biomass shown in Fig. 3 range from 0.13 PgC PgC$^{-1}$ in Western Europe to 0.63 PgC PgC$^{-1}$ in North Africa and the Middle East. In tropical regions with intensive LULCC, the slope is similar between South and Southeast Asia (0.36 PgC PgC$^{-1}$) and tropical Africa (0.37 PgC PgC$^{-1}$), but lower in South and Central America (0.21 PgC PgC$^{-1}$). These slopes reflect the sensitivity of cumulative carbon loss to initial biomass carbon stock. They are mainly influenced by the fraction of deforested area relative to the initial forest area in each region, which explains 46 % of the variations in the slopes across regions (Fig. S3). Differences in biomass density across regions and in the use of gross or net transitions among DGVMs (Table 2) also contribute to variations in slopes.

Cumulative LULCC emissions constrained by present-day biomass observations
There is also a strong positive relationship between initial biomass in 1901 and present-day biomass in grid cells that have experienced deforestation (Fig. 4). The $r^2$ of this regression is higher than 0.92 in most regions, except in North America and the China region (0.89 and 0.76, respectively). The regression between present-day and initial biomass was applied to extrapolate current observation-based biomass back to the year 1901. The extrapolated biomass in 1901 is higher than that in the present day, mainly due to a larger forest area, although it is difficult to discriminate other effects, such as CO2 fertilization, that might have increased biomass between 1901 and 2012.
Using the chain of emerging constraints between present-day and initial biomass (Fig. 4) and between $E^c_{LUC}$ and initial biomass (Fig. 3), with all uncertainties being propagated (Eqs. 1 and 2), we were able to constrain $E^c_{LUC}$ during 1901-2012 by biomass observations (Figs. 3, S4, S5, Table 3). The $E^c_{LUC}$ value constrained by the biomass dataset of Carvalhais et al. (2014) is 155 ± 50 PgC (mean and 1σ Gaussian error), and this estimate is robust to the choice of the methods to define deforestation grid cells in biomass datasets (constrained $E^c_{LUC}$ = 152 ± 49, 154 ± 50 and 159 ± 51 PgC for Method A, Method B and Method C, respectively). The difference between the global constrained $E^c_{LUC}$ and the median value of original $E^c_{LUC}$ (148 PgC) from TRENDY DGVMs is not significant, suggesting that the median model estimate is independently verified by biomass observations. Still, some models that are inconsistent with the observations can be identified (Fig. 3).
The uncertainties reported in our constrained estimate of $E^c_{LUC}$ include uncertainties in the biomass observations and in the scatter of the two regressions (Figs. 3, 4) used to construct the emerging constraint. The uncertainties in the constrained $E^c_{LUC}$ are still relatively large, resulting from the large uncertainties in the biomass observations. However, it should be noted that we summed the biomass uncertainty in each deforestation grid cell to give the regional biomass uncertainty, which gives a maximum uncertainty under the implicit assumption that the uncertainties in all grid cells are fully correlated. In reality, the regional biomass uncertainty should be lower, thus leading to lower uncertainty in constrained $E^c_{LUC}$. At this stage, however, it is difficult to estimate the error correlations of observation-based biomass between different grid cells.
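The effect of the correlation assumption can be illustrated with a small Python sketch that compares the two limiting cases: a plain sum of grid-cell uncertainties (fully correlated errors) and a sum in quadrature (fully independent errors). The function names and the two example values are hypothetical.

```python
import math


def sigma_fully_correlated(cell_sigmas):
    """Regional uncertainty when all grid-cell errors are fully correlated."""
    return sum(cell_sigmas)


def sigma_independent(cell_sigmas):
    """Regional uncertainty when all grid-cell errors are independent."""
    return math.sqrt(sum(s * s for s in cell_sigmas))


# Hypothetical grid-cell biomass uncertainties (PgC), for illustration only.
cell_sigmas = [3.0, 4.0]
upper_bound = sigma_fully_correlated(cell_sigmas)
lower_bound = sigma_independent(cell_sigmas)
```

For non-negative grid-cell uncertainties the plain sum is never smaller than the quadrature sum, so the fully correlated case acts as an upper bound, and the gap between the two widens as the number of grid cells grows.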
Although the constrained global $E^c_{LUC}$ value is only 7 PgC higher than the median of the original DGVM ensemble (Table 3), larger differences can be found on a regional scale (Fig. 5). Constrained $E^c_{LUC}$ estimates are higher than the original modeled values in South and Southeast Asia, tropical Africa and South and Central America (Table 3). For example, the constrained $E^c_{LUC}$ value is 37.2 ± 14.4 PgC in South and Southeast Asia compared to the original TRENDY median value of 21.8 PgC (range of 9.6 to 46.6 PgC) for that region. The constrained emissions are also higher in the China region and the Pacific developed region compared to the prior median value (see Table 3). A large reduction in $E^c_{LUC}$ through the emerging constraint is found in North America because of the lower biomass amount from observation-based datasets than from DGVMs. The original median $E^c_{LUC}$ value of that region is 19.9 PgC (range of 8.6 to 40.8 PgC), while the constrained result is 10.8 ± 7.1 PgC. Constrained $E^c_{LUC}$ values are also lower than original estimates in Western Europe, North Africa and the Middle East, although their contributions to the global total emissions are very small (Table 3).
Alternative estimates of $E^c_{LUC}$ constrained by three other biomass datasets (Liu et al., 2015; GEOCARBON, Avitabile et al., 2016; Santoro et al., 2015; Pan et al., 2011) are provided in Fig. 6 and Table 3. In general, the constrained $E^c_{LUC}$ values using biomass maps from Liu et al. (2015) and GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) are rather consistent (on average only 4.5 % higher) with those from Carvalhais et al. (2014), implying the robustness of our estimates. The biomass dataset from Pan et al. (2011) leads to lower LULCC emission estimates on a global scale, mainly due to a lower estimate in South and Southeast Asia (Table 3) compared to the other products. In the Pacific developed region, GEOCARBON-based estimates (Avitabile et al., 2016; Santoro et al., 2015) are much higher than those from Carvalhais et al. (2014) because the latter has a gap in the biomass map in the southern part of Australia (Carvalhais et al., 2014). In Fig. 6, we show the original $E^c_{LUC}$ from TRENDY DGVMs as quantiles because we do not know whether they follow a normal distribution; to be comparable, the interquantile ranges of the constrained $E^c_{LUC}$ are also shown. The interquantile range of constrained $E^c_{LUC}$ is larger than that of the original $E^c_{LUC}$ (Fig. 6). This, however, does not mean that our emerging constraint method is not effective, but that the relatively large uncertainty in the constrained $E^c_{LUC}$ is propagated from the biomass observation uncertainty, which is about one-third of the mean biomass at the global level (Carvalhais et al., 2014).
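One way to express a Gaussian estimate as quantiles, so that it can be compared with the quantiles of a model ensemble, is sketched below with the Python standard library. This is an assumed procedure for illustration and not necessarily the one used to produce Fig. 6; the function name is hypothetical, and the mean and 1σ error are the global constrained values reported above.

```python
from statistics import NormalDist


def gaussian_quantiles(mean, sigma, probs=(0.25, 0.5, 0.75)):
    """Quantiles of a normal distribution with the given mean and 1-sigma error."""
    dist = NormalDist(mu=mean, sigma=sigma)
    return [dist.inv_cdf(p) for p in probs]


# Global constrained estimate reported in the text: 155 +/- 50 PgC.
q25, q50, q75 = gaussian_quantiles(155.0, 50.0)
interquartile_range = q75 - q25
```

For a normal distribution the interquartile range is about 1.35 times the standard deviation, so a 1σ error of 50 PgC corresponds to a range of roughly 67 PgC between the 25th and 75th percentiles.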
The global constrained $E^c_{LUC}$ value obtained by using the two supplementary methods is almost identical to that from our original method in Fig. 1 (see an example in Fig. S6). The difference in $E^c_{LUC}$ between the supplementary and original methods at the global level is < 1 % for all biomass observation datasets (Carvalhais et al., 2014; Liu et al., 2015; GEOCARBON, Avitabile et al., 2016; Santoro et al., 2015; Pan et al., 2011) and all methods to select LULCC grid cells (Method A, B and C). This suggests that our constrained results are very robust. The change in the uncertainty in global constrained $E^c_{LUC}$ is also very small (< 2 %) because most of the uncertainties are from the biomass observations (see Discussion) and the regression between $E^c_{LUC}$ and biomass (see $r^2$ in Fig. 3), rather than from converting present-day biomass to biomass in 1901 (see $r^2$ in Fig. 4). The difference in regional $E^c_{LUC}$ between different constraint methods is larger (12 % on average), but the difference remains very small in tropical regions (∼ 1 %). However, we note that the results from the two supplementary methods (Method S1 and S2) should be treated cautiously. First, because $E^c_{LUC}$ is related to the biomass that has been affected since the start of the land-use perturbation, only biomass in 1901 (rather than that left out of land use in the 2000s) in LULCC-affected grid cells is logically related to historical $E^c_{LUC}$. Thus, converting present-day biomass to biomass in 1901 (the original method; Fig. 1) is a more direct and process-justified approach compared to regressing present-day biomass versus $E^c_{LUC}$ (Method S1), which is not justified by a logical mechanism. Second, using B in Method S2 is not a perfect solution to extrapolate biomass in 1901 from present-day biomass because the change in biomass is not solely impacted by land-use change. The interactions between biomass and climate conditions, disturbances and nutrient limitation are also very important in DGVMs. For example, historical LULCC may reduce biomass over LULCC-affected regions by replacing forests with croplands. In contrast, the CO2 fertilization effects may increase biomass over LULCC and non-LULCC regions. Therefore, B reflects a mixed effect of different factors, not a sole response to LULCC. In addition, as B has a higher relative uncertainty among models (∼ 53 % at the global level), using the regression ($r^2$ > 0.92 in seven regions; Fig. 4) to calculate biomass in 1901 could include less noisy information than using B.

Discussion
Our approach to constraining $E^c_{LUC}$ from an ensemble of DGVMs provides a best estimate that is between those from two bookkeeping models (∼ 130 PgC from Houghton et al., 2012, and 212 PgC for the default dataset from Hansis et al., 2015). Although the bookkeeping model from Hansis et al. (2015) was driven by the same agricultural land-use maps as the TRENDY models (the model of Houghton et al., 2012, uses FRA/FAO data), the $E^c_{LUC}$ value from Hansis et al. (2015) is different from that constrained from the DGVMs. Differences in estimates between DGVMs and bookkeeping models have been attributed to different definitions of LULCC emissions (Pongratz et al., 2014; Stocker and Joos, 2015). Indeed, LULCC emissions from DGVM simulations in TRENDY include the "missed sink capacity in the deforested area" (Gasser and Ciais, 2013; Pongratz et al., 2014), and so, all else being equal, should simulate higher emissions than bookkeeping models, which do not include this term. However, bookkeeping models take forest degradation into account, while this process is ignored in DGVMs. Bookkeeping models also represent shifting cultivation (resulting in larger sub-grid-scale gross land transitions as opposed to net transitions) and wood harvest; these are processes that are accounted for in only a subset of the TRENDY models (see Table 2). In addition to different driving LULCC area data, differences between the two bookkeeping models were discussed by Hansis et al. (2015).
We are aware that our truncated diagnostic of a set of deforestation grid cells, instead of grid cells affected by all LULCC types, is an underestimate of the total area subject to LULCC because we ignore grid cells that experienced land-use transitions between non-forest vegetation only (e.g., only conversions from grasslands to cropland happening in a grid cell). However, the conversion of forest to croplands and pasture dominates the total net LULCC flux (Houghton, 2003, 2010), while the contribution of transitions between non-forest vegetation and agriculture to $E^c_{LUC}$ is comparatively small (Fig. S1). In fact, the annual LULCC emission from deforestation was estimated to be 2.2 PgC yr$^{-1}$ during the 1990s, and the total emissions from other activities (e.g., afforestation, reforestation, non-forest transitions) are nearly neutral (Houghton, 2003).
The lack of direct biomass observations at the initial state forces us to hindcast biomass in 1901 based on present-day observations; this is an extrapolation that also comes with uncertainties. Some of the observed biomass datasets only cover forests, and satellite measurements usually quantify aboveground biomass carbon stocks and not total biomass stocks (Table 1). In addition, the regression of modeled biomass between 1901 and 2000-2012 (average) to extrapolate the biomass amount in 1901 is only a statistical approach. This regression cannot be mechanistically explained because its slope and intercept are impacted by multiple factors in the models like land clearing, secondary vegetation regrowth, CO2 fertilization, climate, disturbances and the nutrient limitation on biomass. Despite these uncertainties, the high coefficient of determination in the regression increases our confidence in the biomass extrapolation to 1901. For a given biomass dataset, the choice of a method for defining deforestation grid cells (Method A, Method B and Method C) has a very small influence on our results (Table 3).
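The statistical hindcast described above can be illustrated with a short Python sketch. It fits an ordinary least-squares line between present-day and 1901 biomass across models and applies it to an observed present-day value. All numbers are hypothetical (they are not TRENDY output or values from any biomass dataset), and the names `fit_line` and `hindcast_1901` are illustrative only.

```python
# Illustrative sketch only: every number below is hypothetical.

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical biomass (PgC) for nine models: 2000-2012 mean and 1901.
present_biomass = [310.0, 355.0, 402.0, 448.0, 471.0, 520.0, 566.0, 610.0, 655.0]
biomass_1901 = [352.0, 390.0, 451.0, 489.0, 530.0, 571.0, 622.0, 660.0, 718.0]

slope, intercept, r2 = fit_line(present_biomass, biomass_1901)


def hindcast_1901(observed_present):
    """Map an observed present-day biomass onto a 1901 biomass estimate."""
    return intercept + slope * observed_present


observed = 450.0  # hypothetical observation-based present-day biomass (PgC)
print(f"slope={slope:.2f}, r2={r2:.3f}, "
      f"hindcast 1901 biomass={hindcast_1901(observed):.0f} PgC")
```

The fit is purely statistical: a high r² only indicates that the models agree on how 1901 and present-day biomass co-vary, not why.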
LULCC carbon emissions are influenced not only by changes in biomass, but also by how these changes are prescribed in the models to influence subsequent changes in detrital and soil organic carbon pools. However, LULCC emissions are dominated by changes in biomass. For example, LULCC resulted in a net carbon loss of 110 PgC in biomass during 1850-1990, accounting for 89 % of the total E c LUC (Houghton, 1999). The soil carbon change after LULCC is also indirectly impacted by initial biomass, since the dead roots and remaining aboveground debris turn into soil organic carbon after land clearing, which takes longer to return to the atmosphere. In addition, it is not necessary to account for all factors when applying an emergent constraint approach (e.g., Cox et al., 2013; Kwiatkowski et al., 2017; Wenzel et al., 2016). The regression between E c LUC and biomass in 1901 in the models in our study is sufficiently strong (e.g., r² = 0.66 on a global scale; Fig. 3) to constrain E c LUC through biomass observations.
The required model outputs for carbon stocks and fluxes in the TRENDY project are not PFT specific; only the mean PFT-mixed variables in each grid cell are required. Such an aggregation prevents a rigorous separation of biomass between forest and other biomes in each grid cell. It was thus impossible for us to calculate individual contributions of different LULCC types to the overall LULCC emissions, which induces uncertainties when matching model results with observed forest biomass distributions (e.g., only forest biomass in datasets from GEOCARBON; Avitabile et al., 2016; Santoro et al., 2015; Pan et al., 2011). Therefore, we suggest that the next generation of DGVM comparisons report PFT-specific carbon stocks and fluxes, and other model intercomparison exercises should follow suit. The approach of using multiple biomass observation datasets to constrain the LULCC emissions could also be applied in other modeling projects, such as the Coupled Model Intercomparison Project Phase 5 (CMIP5) and CMIP6.
Currently, the uncertainties in the satellite-based biomass datasets are relatively large (e.g., 38 % on average in the tropics at the pixel level (< 1 km); Saatchi et al., 2011). This introduces uncertainties in the constrained cumulative LULCC emissions, depending on the forest types and biomass range. For example, on average on the global scale, the uncertainty at the resolution of DGVM grid cells (0.5° × 0.5°) is about one-third of the mean biomass (Carvalhais et al., 2014) and the relative uncertainty is smaller for high biomass areas in the tropics (Avitabile et al., 2016; Saatchi et al., 2011).
The main sources of uncertainties in satellite-based biomass datasets depend on the specific product, the spatial resolution of the datasets and the methodology used to validate the data. For instance, in the case of radar remote sensing used for biomass mapping in Northern Hemisphere boreal and temperate forests, the uncertainty is largely due to the sensitivity of the signal to properties other than vegetation structure (e.g., moisture), the influence of non-forest vegetation on the signal (especially in fragmented landscapes; Santoro et al., 2015) and uncertainties in the additional datasets (allometric databases, land cover) used for the conversion of satellite measurements to biomass estimates (Thurner et al., 2014). At the level of pixels and modeling grid cells, uncertainties may also be strongly influenced by the quality and size of the inventory data used for validation and the significant mismatch between pixel area and the plot data, as well as the difference between the dates of satellite and ground observations (Saatchi et al., 2011, 2015; Thurner et al., 2014).
Moreover, the satellite-derived biomass datasets used in this study represent different dates. The tropical biomass products represent the circa 2000 status of forests, whereas the boreal and temperate biomass maps are based on spaceborne radar data from the year 2010. These differences in the date of observations introduce additional uncertainty in the biomass estimates due to changes in forest cover from the disturbance, recovery and land-use activities (Hurtt et al., 2011) occurring annually and regionally.
However, in boreal, temperate and tropical regions, the estimated relative uncertainties were lowest in high biomass areas (Avitabile et al., 2016; Thurner et al., 2014), which dominate the contribution to our results. Moreover, the relatively high accuracy of biomass datasets when aggregated to modeling grid cells from higher-resolution maps (< 1 km; Saatchi et al., 2011; Thurner et al., 2014) suggests that the biomass datasets implemented in our study provide a realistic representation of carbon stocks to constrain the historical cumulative LULCC emissions from vegetation.

Conclusions
Uncertainties in LULCC carbon emissions are relatively large compared to other terms in the global carbon budget. The wide spread is partly due to the differences in model structure but also because of the difficulty in constraining models by observations of LULCC, particularly emissions resulting from deforestation. We propose an observationally constrained global cumulative LULCC emission of 155 ± 50 PgC during 1901-2012. Although the constrained cumulative LULCC emissions are close to the unconstrained ones from models, our study offers an evaluation of the modeling results using the observation-based biomass. More importantly, we combine the uncertainties in the regressions from state-of-the-art models with uncertainties in multiple observation-based biomass datasets and give a constrained E c LUC with a 1σ Gaussian uncertainty. The idea of an emergent constraint approach is to give a more accurate estimate and/or reduced uncertainty in an unknown variable by combining a heuristic relationship between two modeled variables (an observable and an unknown one) with actual observations of the observable variable. Thus, our study shows (1) that there is a heuristic relationship between initial biomass and E c LUC among different models, (2) that available biomass observation data independently confirm the median of modeled emission estimates and (3) that more accurate biomass data in the future would allow some of the modeled estimates of emissions to be falsified. Although the uncertainties in current observation-based biomass datasets are relatively high, as more accessible and accurate observation data become available, many data-driven opportunities are being created to improve the accuracy of DGVM predictions.
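The logic of an emergent constraint can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not necessarily the exact procedure used in this study: it assumes a linear across-model relationship, Gaussian observational error, and the standard prediction error of a least-squares fit, and it combines them by Monte Carlo sampling. All model values and the observation are hypothetical, and the function names are illustrative only.

```python
import math
import random
import statistics

# Illustrative sketch only: every number below is hypothetical.


def fit_with_prediction_error(x, y):
    """Least-squares fit of y on x; returns predict(x0) -> (mean, sigma)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    def predict(x0):
        # Standard prediction error of a simple linear regression.
        sigma = s * math.sqrt(1.0 + 1.0 / n + (x0 - mean_x) ** 2 / sxx)
        return intercept + slope * x0, sigma

    return predict


def constrain(predict, obs_mean, obs_sigma, n_samples=20000, seed=0):
    """Propagate a Gaussian observation through the regression by sampling."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        x0 = rng.gauss(obs_mean, obs_sigma)   # sample the observable
        mu, sigma = predict(x0)               # regression prediction at x0
        draws.append(rng.gauss(mu, sigma))    # sample the unknown variable
    return statistics.mean(draws), statistics.stdev(draws)


# Hypothetical 1901 biomass (PgC) and cumulative emissions (PgC), nine models.
biomass_1901 = [352.0, 390.0, 451.0, 489.0, 530.0, 571.0, 622.0, 660.0, 718.0]
e_cluc = [100.0, 118.0, 150.0, 139.0, 171.0, 190.0, 182.0, 228.0, 260.0]

predict = fit_with_prediction_error(biomass_1901, e_cluc)
# Hypothetical observation-based 1901 biomass: 520 +/- 60 PgC (1 sigma).
mean, sd = constrain(predict, obs_mean=520.0, obs_sigma=60.0)
print(f"constrained estimate: {mean:.0f} +/- {sd:.0f} PgC (1 sigma)")
```

In this formulation the width of the constrained estimate has two sources, the scatter of the models around the regression line and the uncertainty in the observation, which is why more accurate biomass data would narrow the result.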
Data availability. Different biomass datasets used in this study can be downloaded based on information in their original publications. Specifically, the biomass dataset of Carvalhais et al. (2014) can be downloaded from the MPI BGI Data Portal: https://www.bgc-jena.mpg.de/geodb/projects/Home.php; the biomass dataset of Liu et al. (2015) can be downloaded from http://www.wenfo.org/wald/global-biomass/; the biomass dataset of GEOCARBON (Avitabile et al., 2016; Santoro et al., 2015) can be downloaded from http://www.wur.nl/en/Expertise-Services/Chairgroups/Environmental-Sciences/; and the regional biomass of Pan et al. (2011) can be found in Table 2 of their paper. The outputs (biomass-constrained cumulative LULCC emissions) of this study are provided in Table 3.
Competing interests. The authors declare that they have no conflict of interest.

Figure 2. Temporal change in forest area from TRENDY-v2 models in each of the nine regions. Differences between models arise from their specific vegetation maps and the rules through which natural PFTs are chosen to give land to agriculture.

2.5 Two supplementary methods to constrain E c LUC using biomass observations
We also tested two supplementary methods to constrain E c LUC: first, Method S1 using the regression between E c LUC and present-day biomass from TRENDY models rather than extrapolating present biomass to biomass in 1901, and then Method S2 using B (biomass difference between present biomass and biomass in 1901 derived from the model simulations) instead of a regression between biomass in 1901 and present-day biomass to extrapolate the observation-based biomass in 1901. In Method S1, the uncertainties in the biomass observations and in the regression between E c LUC and present biomass from the models are used to calculate the uncertainties in the constrained E c LUC. In Method S2, the uncertainties in the biomass observations and the standard deviation of B among the models are used.
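The extrapolation step of Method S2 can be illustrated with a minimal Python sketch. It subtracts the model-mean biomass change B from an observed present-day biomass and carries two error terms: the observation uncertainty and the standard deviation of B among models. Combining the two terms in quadrature is an assumption made here for illustration (it treats them as independent) and is not stated in the text. All numbers are hypothetical, and the function name `hindcast_1901_s2` is illustrative only.

```python
import math
import statistics

# Illustrative sketch only: every number below is hypothetical.

# Hypothetical biomass (PgC) for nine models: 2000-2012 mean and 1901.
present_biomass = [310.0, 355.0, 402.0, 448.0, 471.0, 520.0, 566.0, 610.0, 655.0]
biomass_1901 = [330.0, 350.0, 440.0, 452.0, 500.0, 515.0, 600.0, 598.0, 700.0]

# B: present-day biomass minus biomass in 1901, one value per model.
delta_b = [p - h for p, h in zip(present_biomass, biomass_1901)]
mean_delta = statistics.mean(delta_b)
sd_delta = statistics.stdev(delta_b)


def hindcast_1901_s2(obs_present, obs_sigma):
    """Return (estimate, 1-sigma) of 1901 biomass from an observed value."""
    estimate = obs_present - mean_delta
    # Assumption: independent errors, so the two terms add in quadrature.
    sigma = math.sqrt(obs_sigma ** 2 + sd_delta ** 2)
    return estimate, sigma


# Hypothetical observation-based present-day biomass: 450 +/- 60 PgC.
estimate, sigma = hindcast_1901_s2(450.0, 60.0)
relative_spread = sd_delta / abs(mean_delta)
print(f"mean B={mean_delta:.1f} PgC, relative spread of B={relative_spread:.0%}")
print(f"1901 biomass={estimate:.0f} +/- {sigma:.0f} PgC (1 sigma)")
```

When the models disagree strongly on B, as in these invented numbers, the spread of B becomes a substantial part of the hindcast uncertainty.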
Relationship between biomass in 1901 and cumulative land-use and land-cover change (LULCC) emissions during 1901-2012 across the nine TRENDY-v2 models. The black solid line is the linear regression line. The vertical green solid line indicates the reconstructed biomass in 1901 from Carvalhais et al. (2014) by applying Method A (the increase in cropland in HYDE v3.1 data from forest; see Figs. S4 and S5 for the results of Method B and Method C) to define deforestation grid cells. The orange solid horizontal line indicates the cumulative LULCC emissions constrained by reconstructed biomass in 1901. Dashed lines represent 1σ uncertainties. The probability density function of the constrained cumulative LULCC emissions is shown on the right.

Figure 4. The relationship between initial biomass in 1901 and present biomass (average of biomass from 2000 to 2012) across the TRENDY-v2 models for each region. Note that both biomass in 1901 and present biomass are from TRENDY models, not the observations. The dashed line is the 1 : 1 line.

Figure 5. Comparisons between the original TRENDY land-use and land-cover change (LULCC) emissions and the cumulative LULCC emissions constrained by the biomass dataset from Carvalhais et al. (2014). Panels (a), (b) and (c) are the results from Method A, Method B and Method C, respectively. The original TRENDY emissions are shown as the median value of all models. The dashed line is the 1 : 1 line.

Table 2. Description of TRENDY model setups used in this study.
to identify grid cells subject to past deforestation in biomass datasets
It is not practical to use PFT maps from DGVMs to define deforestation grid cells in the observation-based biomass datasets because PFT maps and forest area change since 1901 differ across DGVMs. Instead, we diagnosed deforestation grid cells in the biomass maps using three harmonized methods (Method A, Method B and Method C). All the methods are based on the reconstructed historical agricultural area from the History Database of the Global Environment (HYDE; Klein Goldewijk et al., 2011) but with different hypotheses regarding how agricultural expansion has affected forests. These harmonized methods are representative of the different rules for assigning LULCC data to natural vegetation types in DGVMs. Method A assumes that the increase in cropland area in a grid cell between 1901 and 2012 is taken from forest; Method B assumes that the increase in cropland and pasture is taken proportionally from all natural vegetation types; and Method C (like the "BM3" scenario in

Table 3. The global and regional cumulative land-use and land-cover change (LULCC) emissions (PgC) during 1901-2012 from original TRENDY models and from the estimates constrained by different biomass datasets with different methods to define deforestation grid cells. The interquantile ranges are shown in Table S1.