Estimating spatial variation in Alberta forest biomass from a combination of forest inventory and remote sensing data

Uncertainties in the estimation of tree biomass carbon storage across large areas pose challenges for the study of forest carbon cycling at regional and global scales. In this study, we attempted to estimate the present aboveground biomass (AGB) in Alberta, Canada, by taking advantage of a spatially explicit data set derived from a combination of forest inventory data from 1968 plots and spaceborne light detection and ranging (lidar) canopy height data. Ten climatic variables, together with elevation, were used for model development and assessment. Four approaches, including spatial interpolation, non-spatial and spatial regression models, and decision-tree-based modeling with random forests algorithm (a machine-learning technique), were compared to find the “best” estimates. We found that the random forests approach provided the best accuracy for biomass estimates. Non-spatial and spatial regression models gave estimates similar to random forests, while spatial interpolation greatly overestimated the biomass storage. Using random forests, the total AGB stock in Alberta forests was estimated to be 2.26× 109 Mg (megagram), with an average AGB density of 56.30± 35.94 Mg ha−1. At the species level, three major tree species, lodgepole pine, trembling aspen and white spruce, stocked about 1.39 × 109 Mg biomass, accounting for nearly 62 % of total estimated AGB. Spatial distribution of biomass varied with natural regions, land cover types, and species. Furthermore, the relative importance of predictor variables on determining biomass distribution varied with species. This study showed that the combination of groundbased inventory data, spaceborne lidar data, land cover classification, and climatic and environmental variables was an efficient way to estimate the quantity, distribution and variation of forest biomass carbon stocks across large regions.

sion factors (e.g., Brown, 1997;Cairns et al., 1997;Schroeder et al., 1997), local and regional scale forest inventories (Monserud et al., 2006;Blackard et al., 2008), simulation modelling (Tans et al., 1990;Ciais et al., 1995), to methods using only remote sensing or combined with inventory data (Hall et al., 2011;Myneni et al., 2001;Wulder et al., 2008;Yemshanov et al., 2012). However, the estimates obtained by these 15 different approaches are often inconsistent. For example,  compared several biomass estimates for the Brazilian Amazon forests and found very low agreement across the estimates, with the range ranging from 39 to 93 gigatons (Gt) of carbon. Blackard et al. (2008) compared several estimates of C pools in living forest biomass of continental US forests and found that satellite-image based estimation was 20 two times higher than estimates based on inventory data.
Forest ground-based inventory laid out in a statistically sound design is considered to be the optimum approach to accurately and precisely measure forest biomass C stocks (Schroeder et al., 1997;Ketterings et al., 2001;Brown, 2002). However, sampling a sufficient number of trees to represent the size and species distribution in 25 a forest is extremely time-consuming and costly. The task becomes much harder for accurate estimation of biomass C stocks over large areas. For carbon estimation at the regional scale, most researchers tend to measure biomass on a few small, generally non-randomly selected plots, and use various prediction approaches (e.g., spa-Introduction Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | tial interpolation techniques, and regression models), to estimate regional biomass C stocks based on observed values of these small sampling plots. However, inventories based on ground samplings are not free of problems. The first problem is related to the scarcity of ground-based inventory plots (Botkin and Simpson, 1990;Wulder et al., 2008;Pan et al., 2011). The lack of sufficient and high-quality sample plots has been 5 identified as a major barrier to the development of robust biomass estimates and to the subsequent validation of these estimates (Wulder et al., 2008). For example, in a recent report about global carbon storage, Pan et al. (2011) stated that estimates of C stocks are only limited to the 230 million hectares (Mha) of managed forest in Canada, leaving about 118 Mha of northern forests unaccounted for because of data paucity.

10
The second problem is related to the fact that forest inventories tend to be conducted in forests that are considered to have commercial value, i.e., closed forests, with little regard to the open, drier forests, woodlands, or human-disturbed forests (Botkin and Simpson, 1990;Brown, 1997). This biased sampling design usually tends to overestimate biomass C stocks over large areas. 15 Light Detection And Ranging (LiDAR) is perhaps the most promising remote sensing technology for estimating biomass because it directly measures vertical forest structure, such as canopy height and crown dimensions (Simard et al., 2011). Generally, LiDAR remote sensing has three platforms, including spaceborne, airborne, and ground-based platforms. While airborne or ground-based LiDAR methods have been 20 intensely used for biomass-related measurements at the stand level or individual tree level, these methods are only feasible at local or small-regional scales, rarely at larger scales (Popescu et al., 2011). The main reason of this restriction is because the costs of airborne or ground-based LiDAR on data acquisition and analysis are still high at large extents (Popescu et al., 2011;Saatchi et al., 2011). For biomass and carbon 25 estimation at the regional scale, spaceborne LiDAR with relatively low cost has some competitive advantages.
The boreal forest, containing large amounts of carbon in its biomass and soils, has been recognized as an important global contributor to the net balance of carbon ex-Introduction change between the atmosphere and the biosphere (Kurz and Apps, 1999;Fyles et al., 2002;Pan et al., 2011). According to Intergovernmental Panel on Climate Change (IPCC, 2007), climate warming in northern latitudes is occurring almost twice as rapidly as the global average. Climate warming in the boreal may be leading to increases the frequency of wildfires (Harden et al., 2000), insect outbreaks (e.g., mountain pine bee-5 tle, Kurz et al., 2008) and regional drought events (Allen et al., 2010), thus influencing carbon stocks and dynamics (Kurz et al., 2008;Monserud et al., 2006;Pan et al., 2011). Since forest biomass is a key biophysical parameter in evaluating and modeling terrestrial carbon stocks and dynamics (Houghton et al., 2009), an accurate estimation of regional biomass is important for understanding boreal forests and their responses to 10 climate warming. However, most of the previous studies for biomass estimations in the boreal were limited to the regions with high productivity and little disturbance (Botkin and Simpson, 1990 toring Institute (ABMI). In total, 1968 plots measured in the period 2000-2012 were selected to estimate current biomass carbon stock in Alberta forest region (Fig. 1).
For the selected plots with more than one census, only the latest inventory data was selected for the current analysis.

10
The Alberta PSP network has maintained more than two thousand PSPs established and re-censused by the government and forest companies starting from 1950s. Most PSPs were selected in forest regions with high productivity, and these plots were excluded from normal harvesting and other human disturbances. Plot sizes ranged from 400m 2 (0.04 ha) to 8092m 2 (0.81 ha) (mean: 0.12 ha sites are established on each grid with a random distance and directional offset of up to 5.5 km from this grid. Different from the PSP network, ABMI sampling plots were more randomly distributed and were thus more representative of the full range of forest stand ages and disturbance regimes at the landscape level. The area of each ABMI plot is one hectare (100 × 100 m). On each site, all trees and snags with ≥ 25 cm DBH in four selected 25 × 25 m plots, all trees and snags with ≥ 7 cm DBH in four 10 × 10 m subplots, and all trees and snags in four 5 × 5 m further subplots were measured regardless of size. Totally, 490 sampling plots were included for current work, including 36 059 living trees and 7046 snags.

10
Spaceborne LiDAR top canopy height data for Alberta forest regions (Supplement A) were obtained from a global wall-to-wall canopy height map at 1 km spatial resolution (Simard et al., 2011). This map was produced by using the data acquired by the Geoscience Laser Altimeter System (GLAS), onboard the Ice, Cloud, and land Elevation Satellite (ICESat), in combination with several global ancillary variables, which 15 correspond to climate and vegetation characteristics. These variables included: annual mean precipitation, precipitation seasonality, annual mean temperature, temperature seasonality, elevation, tree cover, and protection status.

Climatic variables
Climate data for Alberta forests were derived from the program ClimateWNA 4.70 20 (Wang et al., 2012). This program uses baseline climate data derived from monthly precipitation and temperature grids ( map of the forested region, and Agriculture Agri-Food Canada's map of the agricultural zone. This map consists of 11 land cover classes, including waters, snow/ice, rock/rubble, exposed land, developed, shrubland, grassland, agriculture, coniferous forest, broadleaf forest, and mixed forest.

5
To compare how tree biomass carbon stock varies in different forest regions, we used Alberta natural regions (NRs) and natural subregions (NSRs) classification system (Alberta Natural Regions Committee, 2006) as the basis for our comparisons. In Alberta, this system has supplied for the provincial natural resource management activities since the 1970s. The current version of this system consists of 6 NRs and 21 NSRs.
NRs, the largest mapped ecological units in this system, are defined geographically on the basis of landscape patterns, notably vegetation, soils and physiographic features. NSRs, subdivisions of a NR, are generally characterized by vegetation, climate, elevation, and latitudinal or physiographic differences within a given NR.

Estimation of aboveground biomass
Aboveground biomass was estimated for each individual tree in all ground inventory plots using DBH-and height-based biomass allometric equations and tree speciesspecific parameters provided by Lambert et al. (2005) and Ung et al. (2008). These equations were derived from thousands of trees sampled across Canada and allow the 20 calculation of tree biomass (foliage, branches, stem bark, and stem wood) based on DBH measurements (for details see Lambert et al., 2005 andUng et al., 2008). The form of the allometric equation is as follows: where Y is the biomass component of interest, diameter (D) is measured on each tree, height (H) is measured on a subsample trees in each plot, and β 1 , β 2 and β 3 are parameters. For trees with missing height measure, the heights are estimated from local species-specific height-diameter equations developed by Huang et al. (2009). Total aboveground biomass of each PSP was summed up from all trees in each 5 plot. Total aboveground biomass of each ABMI site was summed up from three parts: the biomass per hectare of trees ≥ 25 cm DBH in the 25 × 25 m plots, the biomass per hectare of trees 7-25 cm DBH in 10 × 10 m subplots, and the biomass per hectare of trees < 7 cm DBH in 5 × 5 m subplots.

Estimation of belowground biomass
Since belowground data were not measured at all sites, we estimated belowground tree root biomass using the following regression equation developed for boreal forests by Cairns et al. (1997): where BGB is the belowground biomass (coarse and fine roots), and AGB is the above-15 ground biomass.

Estimation of debris biomass
Our aboveground biomass estimates included standing dead trees. However, there was no inventory data on down and dead woody material (fine and coarse woody debris) for most of our study plots. To estimate debris biomass, we calculated the ratios of 20 debris biomass (fine and coarse woody debris) to above-ground biomass for 90 study sites across Canada's forest regions (Shaw et al., 2005). The average ratio of debris biomass to above ground biomass was 5 %, which was used to estimate the debris biomass in the plots.

Estimation of biomass carbon stock
Estimates of belowground biomass and debris biomass were added to the aboveground estimates to produce total biomass estimates. Biomass carbon pool was calculated by multiplying a carbon biomass conversion factor of 0.5 to the total biomass (Schlesinger, 1997).

Biomass-environment correlations
We used simple Pearson correlations to explore covariation among biomass and 14 environmental variables. Because the presence of spatial autocorrelation in model residuals violates the assumption of data independence (Bini et al., 2009), Pearson correlations among biomass and biotic and abiotic variables were calculated after ac-10 counting for spatial autocorrelation using the R package modttest 1.4 (José Manuel Blanco Moreno, Universitat de Barcelona, Spain, personal communication).

Scaling up to the whole region
To get an accurate estimate of biomass distribution, four approaches were selected for our analysis, including spatial interpolation of direct field measurements, non-spatial 15 regression model, spatial regression model, and decision-tree based modelling with random forests algorithm (RF). Spatial interpolation methods: These methods have been used for mapping forest variables (e.g. site index, standing volume, above-ground biomass, productivity, etc.) based on forest inventory data where these variables seemingly have spatial au-20 tocorrelation (e.g., Dungan, 1998;Freeman and Moisen, 2007;Viana et al., 2012). In this study, we compared several different approaches to find the "best" method for spatial interpolation of tree biomass. These approaches included ordinary kriging (Krige, 1951), standardized ordinary cokriging (elevations as the covariate), inverse distance weighting, thin-plate smoothing splines, and partial thin-plate smoothing splines. Cross-validation analysis was used to evaluate effective parameters for these interpolation methods. The results with the highest R 2 in cross-validation analysis were finally selected. Kriging, cokriging and inverse distance weighting were calculated using the geostatistics software GS+ (http://www.gammadesign.com), and thin-plate smoothing splines were calculated using the R package "fields" (Fields Development Team 2006).

5
After producing the biomass map for Alberta, we used the Alberta Natural Region GIS map to crop grassland and parkland regions, and the Alberta land cover map to crop the areas with the following land cover classes: waters, snow/ice, rock/rubble, exposed land, shrubland, grassland, and agriculture. Non-spatial and spatial regression models: Two steps were used to estimate biomass 10 stocks using canopy height data from spaceborne LiDAR. First, we used the data from the 1968 forest inventory plots to establish the relationships between total tree biomass and ground-measured top canopy height, climatic variables, elevations, latitudes, and longitudes. Both non-spatial multiple regression models (ordinary least squares, OLS) and spatial linear models (here "spatial simultaneous autoregressive 15 error models (SARs)", Kissling and Carl, 2008) were used. The SARs models allow the inclusion of the residual spatial autocorrelation of the data. Among these predictors, some of them were highly correlated. To reduce the risk of multi-collinearity, we used VIF (Variance Inflation Factors) for variable selection. The variables with VIF > 10, which represent high collinearity, were removed. The "best" model is selected based 20 on lower AIC (Akaike information criterion) and higher R 2 . Second, we applied this selected model to estimate tree biomass density (Mg ha −1 ) using LiDAR canopy height and other environmental variables in each 1 × 1 km grid in Alberta forest regions. All the analyses were done using R language (R Core Team, 2013), and SARs were calculated using the R package "spdep" (version 0.5-33).

25
Decision-tree based modelling with random forests algorithm (RF): The method is an ensemble machine learning technique, where many decision trees are constructed based on random sub-sampling of the given data set (Breiman, 2001). As one of tree-based models, RF performs recursive partitioning of data sets, and makes no assumptions regarding the distribution of the input data. RF can capture non-linear relationships between the response variable (tree biomass in our study) and predictor variables (canopy height, climate, and other environmental variable in our study), and can deal with correlated variables while producing a low generalization error (Breiman, 2001). In addition, RF can be used to rank the importance of variables in a regression 5 or classification problem in a natural way. In our study, this method was used to detect the relative importance of climate, topography and other environmental variables, and predict the distributions of forest biomass. All analyses were implemented in the R package "randomForest" (Liaw and Wiener, 2002).

Model accuracy assessment
10 Three well-known error statistics were calculated to measure the difference between the observed and predicted forest biomass, including mean absolute error (MAE), root mean-square error (RMSE), and the normalized root-mean-square error (NRMSE). They are defined as: Based on our inventory data, we detected a large variation of total tree biomass along forest stand ages ( Fig. 2a and b). We classified these plots into four forest age groups (young, immature, mature, and old-growth forests). Old-growth forests (age > 120 yr) and mature forests (80-120 yr) had the highest average tree biomass, 214.32 and 187.96 Mg ha −1 respectively. The average biomass density in immature 15 forests (50-80 yr) was 121.04 Mg ha −1 , and the average in young forests (< 50 yr) was 63.97 Mg ha −1 .

Biomass-environment correlations
The results of Pearson correlations after accounting for spatial autocorrelation showed that total tree biomass of each ground plot was strongly correlated with observed 20 canopy height (R 2 : 0.752, P < 0.001, Table 1, Fig. 2c). Elevation also showed significant correlations with total biomass. Among the 10 climatic variables, most variables were highly correlated with others. MCMT (mean coldest month temperature) and DD0 (degree-days below 0 • C) had relatively high correlations with total tree biomass.

Biomass estimates from four different approaches
We compared the results of four approaches for biomass estimation (  (Table 2). Nonspatial and spatial regression models performed nearly as well as the RF approach, 5 while spatial interpolation had the poorest estimate (R 2 = 0.29, MAE = 66.77 Mg ha −1 , RMSE = 85.08 Mg ha −1 , NRMSE = 84.20 %). The total tree biomass estimation from spatial interpolation was 5.07×10 9 Mg, which was much larger than the estimates from spatial regression model (3.01 × 10 9 Mg) and RF (3.11 × 10 9 Mg) (Fig. 3).
Using the RF approach, the estimates of total tree biomass across Alberta forest 10 regions was 3.11 × 10 9 Mg (Table 3, Fig. 3). The average biomass density in each 1×1 km grid was 77.59 Mg ha −1 . Nearly 25 % of total forest areas had biomass densities between 40-60 Mg ha −1 , and around 11 % of total forest areas had biomass densities larger than 150 Mg ha −1 (Fig. 4).
Total tree biomass in the boreal region (RF approach) was about 1.81 × 10 9 Mg, 15 accounting for 58.17 % of total tree biomass in Alberta forests among the four main natural regions of Alberta (Table 3). The estimated biomass was about 0.76×10 9 Mg in the Foothills, 0.50 × 10 9 Mg in the Rocky Mountain, and 0.03 × 10 9 Mg in the Canadian Shield. Among the fourteen natural subregions ( Alberta forests, and broadleaf forests and mixed forests stored 0.84 × 10 9 and 0.23 × 10 9 Mg biomass, respectively.

Biomass estimates of major tree species
Three major tree species, lodgepole pine, trembling aspen and white spruce, stocked about 1.91 × 10 9 Mg biomass in total, accounting for 61 % of total biomass in Alberta 5 forests (Fig. 5, Table 4). Total biomass of lodgepole pine was 0.76×10 9 Mg, and 84 % of which is distributed in the Foothills and Rocky Mountain regions. For trembling aspen, total biomass was 0.68 × 10 9 Mg, of which 78 % is distributed in the Boreal region. For white spruce, total biomass was 0.47 × 10 9 Mg, of which 58 % is distributed in the Boreal region.

Variable importance on biomass distribution
Using the RF, we also assessed the importance of various predictor variables on biomass distribution (Fig. 6). Canopy height, which was directly related to biomass, had major influence on biomass distribution at both stand and species levels. Elevation was also significantly correlated with biomass distribution. Each of the ten climatic vari- 15 ables had relatively weak effects on biomass distribution at the stand level. The three major tree species showed differing relationships with climatic variables. For lodgepole pine, DD0, MCMT and DD5 had stronger impacts on biomass than the other climatic variables. For trembling aspen, four climatic variables related to site dryness, including MAP, MSP, DI and AHM, were much more important than the other climatic variables. 20 For white spruce, MSP and DD5 had slightly stronger impacts on biomass than others.

Discussion
We reported a large-scale spatially explicit dataset for presenting biomass storage in Alberta's forest regions, derived from a combination of forest inventory data from 1968 19021 Introduction plots, spaceborne LiDAR data, land cover classification, climate and other environmental variables. Using decision-tree based approach with random forests algorithm, the total biomass stock in the study region was estimated to be 3.11 × 10 9 Mg, which is very close to Bonnor's (1985) estimate (3.15 × 10 9 Mg) based on volume inventory data ( Table 5). The average biomass density was 77.59 Mg ha −1 , which is close to 5 Bonnor's (1985) estimate (77.52 Mg ha −1 ). This study showed that the combination of multisource data could be a cost-effective way to estimate the amounts, distributions and variations of biomass carbon stocks across large regions with good accuracy.

Comparison with previous biomass estimations
We summarized previous studies on boreal forest biomass estimation at different spa-10 tial extents (Table 5). At the global scale, total biomass estimates of boreal forests ranged from 111.32× 10 9 Mg (Cao and Woodward, 1998) to 176 × 10 9 Mg (Dixon et al., 1994). In Canada forests, total biomass estimates varied from 29.02 × 10 9 Mg (Kurz and Apps, 1999) to 56.34 × 10 9 Mg (Penner et al., 1997). In Alberta forest regions, our estimate (3.11 × 10 9 Mg) using decision-tree approach was very similar to the estimate of Bonnor (1985), but smaller than the estimate of Penner et al. (1997) (Table 5). Compared with other studies, our estimate of mean biomass density was close to several studies at global and regional scales, while it also had a large difference from the estimates of some other studies, such as Dixon et al. (1994), Pan et al. (2011) andPenner et al. (1997) (Table 5). Clearly, there is a huge disagreement among different 20 estimates, but it is hard to compare them because of different data sources, estimation methodologies, and time periods of data collection. Compared with these previous studies, our current study has several improvements and advantages: (1) multisource data: we combined the data from ground-based inventory, LiDAR, land cover, climate and other environmental variables. Many previous 25 studies used only a single data source, and did not consider the role of climate and other variables in their analyses; (2) large, relative unbiased sample plots on forest inventory: the lack of sufficient and unbiased sample plots has been identified as a ma-19022 Introduction

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | jor barrier to accurately estimate biomass stocks at large area (Botkin and Simpson, 1990;Brown, 1997;Wulder et al., 2008). In the present study, the two different sources of plot data showed significant differences on stand age structure and biomass distribution (Fig. 2). The PSP data was derived from undisturbed, relatively productive stands and thus gave much greater average values of biomass density than the ABMI plots, 5 which includes both disturbed and undisturbed sites. Further, the regular distribution of ABMI plots places some of them in peatlands, which generally were avoided in the PSP inventory. Thus, the use of PSP data alone would lead to the overestimation of biomass. In terms of the scope and sample sizes, the data used in this study are more comprehensive and extensive than previous datasets; (3) By combining inventory data 10 and remote sensing data, we provide a cost-effective scheme of mapping biomass stock for provincial-and national-scale assessments.

Comparison of different methods for biomass estimations
Selection of appropriate models plays a central role in estimating biomass and carbon stocks (Fang et al., 1998;Saatchi et al., 2011). Four different approaches, includ- 15 ing spatial interpolation, non-spatial and spatial regression models, and decision-tree based modeling with random forests algorithm (RF), were used to yield an estimate of total tree biomass for our study area. We found that spatial interpolation greatly overestimated total tree biomass, while regression models and RF provided similar estimate with high accuracy. The overestimation by spatial interpolation might be related to the 20 characteristics of the approach itself and the data we use. First, the spatial interpolation approach assumes that spatial distribution of the variable we try to predict is a spatially continuous surface, and the near points generally receive higher weights than far away points (Krig, 1951). This principle can be easily used to the prediction of some climate and topography variables, but, for biomass and carbon, it might not be suitable because 25 the distribution of biomass is discontinuous usually because of different types of natural and anthropogenic disturbances (Supplement B). Second, the spatial interpolation approach we used only considered one additional variable, which seriously constricts 19023 Introduction

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | the ability to accurately predict. Although some techniques have been developed to consider multiple variables into spatial interpolation, they are still not available in most of widely used geostatistics software. Furthermore, for most of the PSP plots placed on upland sites, these are intermixed with a fine-scale mosaic of forested peatlands with much lower biomass.
As a nonparametric approach, RF has shown some outstanding advantages in our study. This is also supported by previous studies for soil mapping (e.g., Grimm et al., 2008), biomass mapping in forests (Baccini et al., 2004;Neumann et al., 2011;Asner et al., 2013) and seafloor (Wei et al., 2010), and bird distribution modeling (Kreakie et al., 2012). The advantages of random forests include: ability of modeling high dimensional non-linear relationships, handling of categorical and continuous predictors, resistance to overfitting, relative robustness with respect to noise features, unbiased measure of error rate, and measures of variable importance (Breiman, 2001;Grimm et al., 2008). Therefore, by combining different predictor variables, this approach has a great potential for improving the estimation of forest biomass at regional and global 15 scales.

Canopy height as an important determinant of biomass distribution
It is well known that canopy height is a critical indicator of forest site quality and growth potential (Kimmins 2004;Fang et al., 1998). Also, canopy height is highly related to stand age and forest disturbance, both of which affect directly forest biomass and pro-20 ductivity. Using a large sample of forest inventory data, we detected a significant relationship between biomass and canopy height (Table 1, Fig. 2). The assessment of variable importance using the RF approach also showed that canopy height was the most important variable for determining biomass distribution in our study area (Fig. 6). However, canopy height has been rarely used in previous estimations of regional-scale 25 biomass and carbon storage, because this information was not available over large areas in the past. The development of remote sensing techniques, especially LiDAR, has provided high or medium resolution canopy height products at both regional and 19024 Introduction  (Lefsky et al., 2010;Simard et al., 2011), and provides an opportunity to obtain more accurate estimates of biomass and carbon storage over large areas. For example, based on 1 km resolution spaceborne LiDAR canopy height data (Lefsky et al., 2010) and ground inventory data, Saatchi et al. (2011) mapped the total biomass carbon stocks in tropical regions across three continents with a forest area of 5 2.5 billion ha. Therefore, the integration of plot-based measurements of biomass with remotely-sensed observations of canopy height can provide a cost-effective method for large-scale mapping. In addition, the LiDAR canopy height data are closely related to logging and fire history, allowing recently logged and burned sites to be more accurately accounted for in biomass carbon estimation.

Biomass-climate relationships
Understanding biomass-climate relationships is important for biomass and carbon mapping under past and current conditions as well as for making future projections under a changing climate. Although climatic variables have been used in biomass estimations, we know relatively little about how climate influences variation in biomass 15 stocks (Stegen et al., 2011). In this study, we found that climate explained relatively little of the observed, stand-level variation in Alberta forest biomass (Table 1, Fig. 6), which is consistent with Stegen et al. (2011)'s findings on biomass-climate relationships in temperate and tropical forests. Disturbance regime is likely a better predictor of biomass but these are often difficult to map at regional scales. Because canopy 20 height is strongly influenced by time since the last stand-replacing disturbance (e.g., fire), high-resolution LiDAR data can play an important role in estimating biomass and productivity at regional and national scales. Species-level analysis on biomass-climate relationships showed that tree species respond differently to how climate affects biomass distribution (Fig. 6). For lodgepole 25 pine, chilling degree-days (DD0), mean coldest month temperature (MCMT) and growing degree-days (DD5) played a more important role than other climatic variables. This strong correlation with degree-days is also supported by previous studies on lodgepole 19025 Introduction  (Monserud et al., 2006). For trembling aspen, four drought-related variables (MAP, MSP, DI and AHM) were much more important than other climatic variables, which confirm previous studies about drought-related impacts on aspen stand dynamics (e.g., Hogg et al., 2008;Michaelian et al., 2011).

5
To map total carbon (C) storage of Alberta forests, we also need high quality data on soil C in our study area. Boreal forest ecosystems contain vast C stocks in soil, most of which is found in peatlands and permafrost soils (Deluca and Boisvenue, 2012). Soil C in boreal ecosystems has been reported to account for about five times the total C in the standing biomass or about 85 % of the total biome C (Malhi et al., 1999). The 10 large-scale estimation of soil C stocks poses many challenges (Liu et al., 2013), and was thus not specifically included in the current study. However, using the recent data set of North American soil organic carbon content at 0.25 • resolution (Liu et al., 2013), we estimated the total soil carbon stocks in Alberta's forests to be about 11.8 × 10 9 Mg, with a high proportion in peatlands (Vitt et al., 2000). Our estimate of biomass carbon 15 (1.56× 10 9 Mg, 50 % of total tree biomass) only accounted for 12 % of total carbon stocks (13.36×10 9 Mg), while soil carbon accounted for 88 %. Clearly, more efforts are needed to better understand spatial and temporal variation of biomass and soil carbon stocks in the boreal forest.  BGD 10,2013 Estimating spatial variation in Alberta forest biomass J. Zhang et al.   Fig. 6. Relative importance of predictor variables for biomass estimation by decision-tree based modeling with random forest algorithm. Variable importance is measured in mean decrease in accuracy, which is the decrease in accuracy of a classification after the variable has been randomly permuted. A higher mean decrease in accuracy means the variable contributes more to the accuracy of the classification.