Technical note : A new approach for comparing soil depth profiles using bootstrapped Loess regression ( BLR )

Introduction Conclusions References


Introduction
Understanding the consequences of different land uses for the soil system is important to better inform decisions based on sustainability (Foley et al., 2005;Haygarth and Ritz, 2009).The ability to assess change in soil properties effected by altered land use or management is therefore a critical step in this process.Greatest change is likely in the surface layers with factors such as tillage and plant inputs impacting the physical, chemical and biological properties of the soil.Many soil properties, however, will also be modified below this depth, particularly as time since land use change (LUC) increases (Popelau et al., 2011).It is therefore important that changes can be assessed below the topsoil and throughout the soil profile.Introduction

Conclusions References
Tables Figures

Back Close
Full As a prime example, a number of studies, including global meta-analyses, have summarised the impacts of LUC on soil C concentration and stocks (e.g.Guo and Gifford, 2002;Maquere et al., 2008;Laganière et al., 2010;Poeplau et al., 2011).Soil organic matter and organic C is generally concentrated in the top 30 cm of the soil and so LUC is generally expected to have the greatest impact on soil C in these upper layers (Lorenz and Lal, 2005;Laganière et al., 2010).Even within this surface soil, however, the magnitude and sometimes direction of the effects of LUC on soil C can depend on the depth that is being considered (Guo and Gifford, 2002;Popelau et al., 2011).It is also becoming more evident that, in addition to there being a large proportion of total soil C stocks resident in the subsoil, important C dynamics may also occur deeper in the soil (Jobbágy and Jackson, 2000;Lorenz and Lal, 2005).
The turnover time of soil C generally increases with depth and hence the stabilisation of C may take place in deeper soil.This may take place through biochemical stabilisation driving reduced decomposition, by the inherent recalcitrance of root litter (e.g.lignins) and by physicochemical stabilisation (e.g.complexing with minerals and clay in subsoils) (Lorenz and Lal, 2005).Conversely, priming of the decomposition of older soil C may occur following LUC, especially with woody species (see Fontaine et al., 2007).This is particularly relevant for LUC to perennial vegetation or forest where deeper rooting plants are involved.For example, the root systems of perennial or tree species are likely to be more permanent and extensive in the subsoil, with a greater contribution of recalcitrant litter and potential priming down the soil profile (Fontaine et al., 2007).Altered land use or management may also impact the translocation of particulate and dissolved organic C likely to occur down the soil profile via effects on leaching.Such mechanisms may produce more complex relationships between soil depth and soil characteristics, and even discontinuous horizonation, rather than linear gradients.Introduction

Conclusions References
Tables Figures

Back Close
Full

Existing approaches to model soil depth profiles
There are various approaches that have been used recently to model non-linear soil depth profiles including, for example, modified exponential decay (Maquere et al., 2008), spline functions (Malone et al., 2009;Wendt and Hauser, 2013), depth distribution functions which utilise multiple regression (Indorante et al., 2013) and asymmetric peak functions (Myers et al., 2011).The most common of these alternative methods is the use of Generalised Additive Models (Hastie and Tibshirani, 1990).Differences in soil C across transitions and soil depth profiles can be tested with both land use and depth included as fixed factors in an interaction model, and appropriate random terms to account for non-independence of depth increments within the same core and/or plots.Non-linear relationships between soil C and soil depth across LUC transitions can also be incorporated by the inclusion of flexible splines (Wood, 2003).Confidence intervals and significance tests are, however, based upon the assumption that the response variable is drawn from the exponential family of distributions and inference is very sensitive to this assumption.
Recent work modelling depth profiles has focussed on deriving parametric non-linear relationships between soil depth and the response of interest.Maquere et al. (2008) adopt a parametric form with modified exponential decay, whereas Myers et al. (2011) use an approach based on asymmetric peak functions.Whilst capturing the non-linear form of the soil depth profile, neither method adequately handles the associated uncertainty and hence confidence intervals, with the Maquere method assuming a t distribution and the Myers method failing to produce confidence envelopes at all.
Regression-based approaches similar to the popular GAM method have also been adopted using multiple covariates to account for any non-linearity (Indorante et al., 2013) and fitting cubic splines directly (Wendt and Hauser, 2013).However, the multiple regression approach assumed a normal distribution of the response variables, which is often not realised, and the cubic spline method presented by Wendt and Hauser does not provide any measure of uncertainty.Malone et al. (2009), in their study mapping Introduction

Conclusions References
Tables Figures

Back Close
Full continuous depth functions of soil carbon and water storage, highlighted the need for better estimation of uncertainty in model outputs, suggesting the use of simulation and re-sampling approaches.Simulation and re-sampling techniques avoid the necessity to assume a distributional form for the response variable in order to obtain confidence intervals and test hypotheses.Such approaches are rarely used to investigate soil depth relationships despite the often flawed assumptions made by the more commonly applied methods.Clifford et al. (2014) adopted a simulation routine from a master database to impute missing values and this clearly demonstrated another strength of the simulation approach, though they did not apply the method directly to test specific hypotheses relating to changes along the soil profile.
We sought to develop an approach which (1) would be able to compare and test for significant differences between potentially non-linear depth profiles of land uses (or across land use transitions), (2) did not need to meet any parametric distribution assumptions given that individual datapoints in soil datasets are typically nonindependent (i.e.vertically or horizontally nested measurements) and (3) would be generally applicable regardless of specific contexts of land use and soil type.Below, we describe the resulting non-parametric approach and provide an example comparing soil C depth profiles across a land use transition.

A Bootstrapped Loess Regression (BLR) approach
The developed approach combines bootstrapped resampling of data with local least squares-based polynomial smoothing (Loess) regression.Consequently, this nonparametric method benefits from being data-driven and unconstrained by distributional form or rigid model parameterisation.Like a cubic spline approach (Wendt and Hauser, 2013), it does not assume constant values for soil layers or horizons.Such a nonparametric approach is highly suitable where data are non-independent.This is particularly applicable in soil profiles where measurements made in depth increments down

Conclusions References
Tables Figures

Back Close
Full a soil profile may be correlated and even more relevant where data are cumulative (e.g.cumulative C stocks).It is also appropriate where soil cores have been sampled using a nested spatial design with multiple cores taken from within plots.
The BLR approach is intended to make use of soil data which has been measured at fixed-depth intervals down the soil profile at a generally high resolution, or at least at a resolution satisfactory for the purposes of an assessment.The vertical sampling resolution is not limited to any specific depth interval (e.g. 10 cm increments) but clearly a greater, and regular, resolution provides more detailed information on potential differences and their specific location in the soil profile.Low sample sizes will affect the amount of smoothing that can be done by the Loess algorithm.As the algorithm fits polynomial regressions within local neighbourhoods, the definition and size of the neighbourhood determines the smoothness and sensitivity of the fitted regression line.Typically a minimum of 3 observations per neighbourhood would be required.
The initial dataset comprises all data for the soil variable of interest from the two land uses (LU 1 ∪ LU 2 = LU ALL ) which are to be compared, with the associated depth and/or soil mass as reference.A subset is then created containing only data from the "second" land use (LU 2 ).In theory, it does not matter which of the land uses are subsetted for LU 2 but one may be more intuitive given the direction of a specific land use transition.It is also useful to plot the data to determine whether the datasets contain outliers that may need to be excluded before bootstrapping to prevent skewing the Loess regression.
For cumulative mass-based data, if datapoints from the bottom depths of either LU 1 or LU 2 are at distinctly greater cumulative masses than others, these could also be trimmed so that the comparison is made to the same approximate lower bounds of the reference.Using a large number of bootstrap samples, however, should negate the need for extensive data cleansing prior to analysis.
The combined data (LU ALL ) is re-sampled by bootstrap with replacement, with the number of data-points resampled equal to the number of data-points in LU 2 .This is repeated n = 1000 times.Each bootstrapped set of data are then modelled using Loess regression and these regressions are used to generate 95 % confidence inter-Introduction

Conclusions References
Tables Figures

Back Close
Full vals around a modelled soil depth profile by taking pointwise percentiles at each depth.
As each sub-sample is taken from the union of the two land uses, this confidence interval (or confidence envelope) represents the null hypothesis that there is no difference between the LU 1 and LU 2 .The data from only LU 2 is then modelled using Loess regression; if the modelled line for the LU 2 profile sits outside the confidence envelope of the null hypothesis it can be inferred that the soil variable is significantly different between LU 1 and LU 2 at that particular point in the profile.Overall P values for the difference between depth profiles can be obtained by taking normalised test statistics across the full set of bootstrap samples and taking the percentile of these values corresponding to the same statistic obtained from the LU 2 data.This is a similar approach to that adopted in the spatial statistics literature when analysing K functions under resampling as demonstrated in Diggle et al. (2008) and Henrys and Brown (2009) for example.This relatively straightforward non-parametric method may be most useful in comparing soil carbon or organic matter profiles between land uses, but it can be applied to any soil property which has been measured at satisfactory resolution down the soil profile.Many of these other properties measured in soil (e.g.bulk density, pH, root biomass) can vary in a non-linear fashion down the soil profile, with potential horizonation.The effects of land use change are typically examined using either a paired-site or chronosequence approach.These assume that each paired or chronosequence site only differs in their age or, for example, time since disturbance and have comparable biotic and abiotic histories (Laganière et al., 2010).While this BLR method benefits from being unconstrained by assumptions of parametric methods, it must still satisfy the assumptions of the paired-site and chronosequence approaches, particularly if spacefor-time substitution is used (Indorante et al., 2013).

Conclusions References
Tables Figures

Back Close
Full The bootstrap re-sampling and Loess regression used to test differences between soil profiles was conducted using the R statistical programming language (R Core Team, 2015).Example code to demonstrate the BLR approach using real data are available via http://doi.org/10/f3jp5d(Keith et al., 2015).These data are from a study examining the impacts of bioenergy LUC on soil C in the UK (Rowe et al., 2015).A LUC transition from arable to Short-Rotation Coppice (SRC) willow was selected and the data were separated into subsets of those from each component of the transition (i.e.arable and SRC willow samples) before analysis.Data on soil carbon concentration (expressed as a percentage), cumulative soil C stock and cumulative dry soil mass were derived at 10 cm increments to 1 m depth in order to construct fixed-depth profiles of soil C concentration (Fig. 1a) and mass-based depth profiles of soil C stocks (i.e. the relationship between soil mass and soil C sensu Gifford and Roderick, 2003) (Fig. 1a).Cumulative soil mass was used because measured soil C stock in small fixed-depth increments (as was required in this study) may not be directly comparable across LUC transitions, due to potential variation in bulk densities and any compression or expansion introduced through sampling (e.g.Gál et al., 2007).An approach using soil mass as the independent variable overcomes this issue more generally because profiles can be directly compared at a particular reference soil mass (Gifford and Roderick, 2003;Wendt and Hauser, 2013).Gifford and Roderick (2003) suggest a reference dry soil mass of 4000 and 12 000 t ha −1 may be used to approximate sampling to 30 cm and 1 m depth in agricultural systems, respectively.This is not an issue when examining soil C concentration, as these data are not directly influenced by core volume and apparent bulk density.
Individual datapoints for each land use, the confidence envelope of the null hypothesis and the modelled profile for the SRC willow were plotted following BLR (Fig. 1).
Where the modelled line sits outside the confidence envelope it can be inferred whether there are significant effects of land use in the soil profile, at either a particular depth or Introduction

Conclusions References
Tables Figures

Back Close
Full references soil mass.In Fig. 1a, the soil C concentration is significantly greater under SRC willow compared with arable at 10 and 20 cm, where the modelled line sits to the right of the confidence envelope.The modelled line sits within the confidence envelope between 40-100 cm and so there is no significant difference (Fig. 1a).Nevertheless, the two depth profiles are significantly different overall (P < 0.01).The depth profile of soil C concentration is reflected in the cumulative soil C stock profile, with the modelled line for SRC willow moving further from the confidence envelope up to approximately 5000 t ha −1 (Fig. 1b).The difference in cumulative soil C stock between arable and SRC willow is maintained to 100 cm and, consequently, is significantly different down the whole soil profile (P < 0.01; Fig. 1b).

Conclusions
We modelled soil profiles and tested differences in soil characteristics between land use or land management using a non-parametric approach combining bootstrap sampling and Loess regression.The development of this approach was driven by a need for a flexible method which could compare potential non-linear relationships between land uses (or across land use transitions) and would not be constrained to specific contexts.While there are several other methods which can be used to model non-linear relationships in soil depth profiles, the BLR approach is generally more flexible because it is data-driven and does not need to meet any distributional assumptions.The confidence envelopes obtained are robust to miss-specification of the error distribution and provide clear inspection of significant differences across the full depth profile.Sampling to depth and increasing the resolution of depth increments can provide useful profiles or "fingerprints" of soil properties under different land uses and soil types.In particular, assessment of soil C to depth, and determining the rooting systems that may influence the C balance deeper in the subsoil via priming of decomposition, C stabilisation or translocation.The BLR approach can be, however, applied to any soil property of interest giving the ability to assess land use effects at any point down the soil profile.Being data-driven and flexible, it is hoped that further studies of land use and land management, based on new or existing data, can make use of this approach to examine differences in soil profiles.Introduction

Conclusions References
Tables Figures

Back Close
Full Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 4 Applying a BLR approach -an example of bioenergy land use change Discussion Paper | Discussion Paper | Discussion Paper | response of soil C to land use change (LUC) or land management change is essential to understand the sustainability of different soil use options.This may be particularly important for land-use transitions to perennial crops, which have deeper and more permanent Discussion Paper | Discussion Paper | Discussion Paper |

Figure 1 .
Figure 1.Depth profiles of (a) soil C concentration as a function of sampling depth and (b) cumulative soil C stock as a function of soil mass.Depth represents values of samples from 10 cm increments.Grey and black symbols represent SRC willow and arable data-points, respectively.Dashed lines represent upper and lower bounds of 95 % confidence intervals from bootstrapped (n = 1000) Loess regressions of combined arable and SRC willow data; solid lines represents Loess regression of percent C and cumulative soil C in SRC willow only, if this line sits outside the confidence interval it can be inferred that arable and SRC willow are significantly different.