The effect of using the plant functional type paradigm on a data-constrained global phenology model

Leaf seasonality impacts a variety of important biological, chemical, and physical Earth system processes, which makes it essential to represent leaf phenology in ecosystem and climate models. However, we are still lacking a general, robust parametrisation of phenology at global scales. In this study, we use a simple process-based model, which describes phenology as a strategy for carbon optimality, to test the effects of the common simplification in global modelling studies that plant species within the same plant functional type (PFT) have the same parameter values, implying they are assumed to have the same species traits. In a previous study this model was shown to predict spatial and temporal dynamics of leaf area index (LAI) well across the entire global land surface provided local grid cell parameters were used, and is able to explain 96 % of the spatial variation in average LAI and 87 % of the variation in amplitude. In contrast, we find here that a PFT level parametrisation is unable to capture the spatial variability in seasonal cycles, explaining on average only 28 % of the spatial variation in mean leaf area index and 12 % of the variation in seasonal amplitude. However, we also show that allowing only two parameters, light compensation point and leaf age, to be spatially variable dramatically improves the model predictions, increasing the model’s capability of explaining spatial variations in leaf seasonality to 70 and 57 % of the variation in LAI average and amplitude, respectively. This highlights the importance of identifying the spatial scale of variation of plant traits and the necessity to critically analyse the use of the plant functional type assumption in Earth system models.


Introduction
The ability to understand and predict leaf seasonal cycles, a process known as leaf phenology, is essential to our understanding of Earth systems processes, through its impact on the carbon and water cycles (White et al., 1999;Wilson and Baldocchi, 2000) and climate (Hayden, 1998).As such, phenology is an essential component of global vegetation models and an improvement in our understanding of, and ability to predict, leaf phenology would improve Earth system model predictions.
One of the aspects of global vegetation models that is currently under scrutiny is the way parameters are assigned to the simulated vegetation within a given model grid cell.Traditionally, models make use of the plant functional type (PFT) concept.In this approach, a small number of PFTs are defined, each with a corresponding set of parameters, then a given grid cell is assigned to one, or a mixture of, these PFTs.However, more recently efforts are being made to include a more biologically detailed representation in the form of plant traits.PFTs are classes of plant species with similar characteristics and roles within ecosystems (Box, 1996;Smith, 1997) and found within certain bioclimatic regions (Prentice et al., 1992;Haxeltine and Prentice, 1996).All model parameter values are then assigned to each PFT either based on ground measurements or through parameter estimation.This approach has the underlying assumption that all plants within such a PFT show an identical behaviour (Sitch et al., 2003), an assumption applied to all processes represented in such models, including leaf phenology.Such an assumption is necessary because of the lack of available measurements Published by Copernicus Publications on behalf of the European Geosciences Union.S. Caldararu et al.: The PFT paradigm in a phenology model across the globe needed for models, which are not data constrained.
The main advantage of using PFTs in vegetation models is the simplicity of the concept and the relatively small number of parameters, minimising both the amount of data and computational effort required.Using PFTs to represent ecological processes at global scales would be the obvious initial choice for parameter inference because the number of parameters can be kept low while still representing the various types of vegetation.PFTs are also a useful concept for future climate predictions where expected changes in vegetation type can be easily represented in this way.Dynamic global vegetation models predict PFT distributions based either on pre-defined climate envelopes (Prentice et al., 1992) or pre-defined competitive outcomes, both approaches being based on existing PFT distributions (Arora and Boer, 2006).Recent studies have attempted to use a more physiologicalbased approach (Fisher et al., 2015).
However, there are a number of disadvantages to using the PFT approach, mainly due to the fact that a PFT-type categorisation imposes fixed parameter values and cannot capture the continuous variation observed in plant traits within and among PFTs (see review by Van Bodegom et al., 2012).Capturing such heterogeneity not only may improve the prediction of biogeochemical and physical dynamics in Earth system models but also may improve predictions of other longer-term vegetation processes such as shifts in vegetation composition to climate change.Recent studies have therefore focussed on replacing the PFT method with using plant traits (Sakschewski et al., 2015;Verheijen et al., 2013;Pavlick et al., 2013) and identifying the distribution of traits to use in different locations across the Earth's surface (Kattge et al., 2011;Reich et al., 2007).
Given the potential advantages and disadvantages of the PFT approach, it is important to formally evaluate it in comparison to alternative approaches, such as using locationspecific traits, but such a formal comparison has not been carried out to date.
In the current paper we aim to investigate the use of PFT and trait-based parameters within the framework of a dataconstrained global phenology model.We have chosen to use a previously developed leaf phenology model (Caldararu et al., 2014) as a simpler case than a full-scale dynamic global vegetation model (DGVM).For the purpose of this paper, we use the term phenology to encompass seasonal trajectories of leaf area index (LAI) as well as the timing of leaf off and leaf on, which is what the term refers to in its stricter sense.We explore the extent to which the PFT assumption can capture the spatial variability in leaf seasonality.To this end, we use three main different model parametrisations: the local parametrisation, the fitted parameters at the PFT level and a novel approach, which combines PFT level parameters with local traits, and two additional ones -a global and regional parametrisation (Sect.3).We explore the differences between the different parametrisations (Sect.4) and we aim to explain the effects shown by local parameters and their relationships with plant traits (Sect.5).

LAI data
We use LAI data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Terra platform.We use the MODIS collection 5 product MOD15A, which is available at 1 km spatial resolution and an 8-day time step (https://lpdaac.usgs.gov/).The MODIS LAI is based on a reflectance algorithm, which uses the red and nearinfrared bands and includes corrections for canopy structure and background soil reflectance (Knyazikhin et al., 1999).In cases where this main algorithm fails, a backup algorithm is used, which is based on an empirical relationship between LAI and NDVI (normalised difference vegetation index).We use the quality assurance flags provided with this product to filter pixels that were derived using the backup algorithm or which are classified as snow covered, as described in Caldararu et al. (2012).We use data for the globe with a spatial resolution of 1 km, which we then aggregate to the GEOS-4 base resolution of 2 • latitude by 2.5 • longitude, by calculating the mean of all pixels within a grid cell.All pixels classified as cropland were excluded prior to averaging.The aggregation to the coarser resolution was done both to reduce computational effort and to obtain time series without the gaps resulting from our filtering procedure.The data were split into a training (2001)(2002)(2003)(2004)(2005) and an evaluation (2006) data set.

Environmental variables
To drive the model, we use temperature and photosynthetically active radiation (PAR) data from assimilated meteorological data products of the Goddard Earth Observing System (GEOS-4) (Bey et al., 2001), which is available at a spatial resolution of 2 • latitude by 2.5 • longitude and a temporal resolution of 3 h, which we average to a 1-day temporal resolution.The soil moisture data required in the model were obtained from the NCAR/NCEP reanalysis daily average surface flux data set (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surfaceflux.html)(Kalnay et al., 1996), which is provided at a 1-day temporal resolution and has been regridded to the GEOS-4 spatial resolution.

Plant functional type map
We use a global PFT map, which is used in the Integrated Biosphere Simulator Model (IBIS) (Kucharik et al., 2000).This differentiates between 13 different plant functional types based on general plant properties (trees vs. grasses), temperature tolerance (tropical vs. temperate), and leaf habit (deciduous vs. evergreen).The PFT data are provided at a where P refers to leaf production processes, which are calculated as a function of solar radiation I 0 and the LAI at the previous time step LAI(x, t − 1), and L refers to leaf loss summed over all groups of leaves of the same age a (see Table 1 for a full list of parameters).The model allows for continuous leaf gain and loss and does not include abrupt leaf on and leaf off thresholds, thus being able to capture irregular seasonal cycles (Caldararu et al., 2012(Caldararu et al., , 2014)).
To describe leaf gain, we define the concept of target LAI as the optimum number of leaf layers for a given light level at the top of the canopy I 0 so that the bottommost leaf layer receives sufficient light for photosynthesis, that is light at the compensation point C (Wm −2 ).The target is calculated using Beer's law of light extinction and expressed as where α is the canopy extinction coefficient calculated as a function of day of year and latitude (Brock, 1981;dePury and Farquhar, 1997).The solar radiation at the top of the canopy I 0 is averaged over a number of p days.We calculate separate values for LAI targ for direct and diffuse radiation to account for the different response of photosynthesis to the two.The overall target is then calculated as the minimum of the two values.We choose to use a direct and diffuse compensation point as a simplified representation of light distribution in the canopy.At any time step, if the existing LAI is lower than the target value, new leaves are gained to reach the target LAI.We introduce a parameter gain max to limit the new leaves that can be added at each time step to reflect the physiological limits to building new leaves.The gain at any time t and for all locations x is then calculated as (3) To account for the effects of temperature, we set a threshold of 0 • C mean daily temperature under which no leaves are gained.Initial parameter optimisations where this threshold is a free parameter have shown that the model is not very sensitive to its value (Caldararu et al., 2014).Following the optimality hypothesis, leaves are lost when their carbon assimilation is less than their respiration and maintenance costs, defined as the limit assimilation value A min .We calculate the carbon assimilation as a linear function of PAR absorbed by the canopy, I tot , per unit leaf area: here φ and q are model parameters representing photosynthetic efficiency (µmol s −1 W −1 ) and canopy level light compensation point (µmol m −2 s −1 ).Due to the lack of data constraints for carbon assimilation in our modelling framework, we normalise assimilation values and associated parameters.As a result, parameter values for φ and q in the above equation do not represent measurable values in the field, but instead scale between potential minimum and maximum photosynthetic rates within the model.While the absolute values of these two parameters have been scaled, the relative distributions across the globe can still be interpreted as having physical meaning.It is worth noting that the canopy level compensation point q and the direct and diffuse compensation points used in the calculation of the target LAI have different units (µmol m −2 s −1 and Wm −2 , respectively), due to the units that the original PAR data are in as well as the structure of the model.
To account for water limitation to assimilation and, implicitly, phenological processes, we introduce a factor f W calculated as where W s is volumetric soil moisture (unit less) obtained from the NCAR/NCEP data set, s 1 and s 2 are parameters associated with water extraction capacity from the soil, represent potential evapotranspiration and u is plant water use.
Similarly, we define an age factor f age to describe the declining carbon assimilation of leaves as they age: where µ is the rate of decrease with age (yr −1 ) after a limit age a crit (years).Using both these factors, the overall assimilation is calculated as Overall, the leaf loss at any point in time t and all locations x for any group of leaves of the same age a (cohort) is To calculate the overall canopy LAI loss, we can then sum over all age groups.

Model setup
We use five different model parametrisation to explore the extent to which the PFT approach is applicable to a dataconstrained phenology model.The first such model setup, previously used in Caldararu et al. (2014), is to fit a unique parameter set to each grid cell.We will term this the local model.This approach involves a very large number of parameters (14 parameters at each grid cell, for 2041 vegetated grid cell results in a total of 28 574 parameters).It is important to note, however, that the total amount of data available from sources such as MODIS is also very large, making it possible to parametrise extremely parameter-rich models, depending on the exact nature of the data.
The second model setup is using one set of parameters for each PFT, resulting in only 182 parameters for the entire globe.We term this the PFT model.To investigate the potential effects of geographical separation, we further separate each PFT into geographical regions (e.g.North American temperate deciduous broadleaf and European temperate deciduous broadleaf), resulting in 44 regions.This was done to test the assumption that species evolving in different geographical locations have different physiological parameters even when belonging to the same PFT.This setup is referred to as the region model and has 616 parameters.As a point of reference, we also introduce a global model where parameters are common for all grid cells, under the assumption that there is no difference in phenological behaviour between vegetation types or geographical regions.
To test the extent to which each parameter represents local characteristics, in the final model setup one or more parameters are location specific while the rest have PFT-wide values.We then term a parameter local if it has a specific value at each grid cell.This setup is the combination model.As there are a very large number of possible combinations of local parameters, we perform an initial analysis to determine which parameters would most improve the model performance, if local.We performed a principal components analysis (PCA) of the spatial variation in parameter values fitted for the local model.This highlighted that the principal axis of variation in all parameter values was strongly correlated with variation in C direct , while the second axis was dominated by variation in a crit (Appendix Table A1).We also fit 14 different model parametrisations, allowing each parameter in turn to be local, while the other parameters are fitted at the PFT level.The two parameters identified by the PCA, the light compensation point C direct and the leaf age limit a crit , also show an increase in model performance, especially in terms of spatial variation explained (Table A2).As a consequence of these two analyses, we focussed in detail on only one model that combined local and PFT parameters in which the C direct and a crit parameters are local.This model has 4238 parameters for the whole globe, compared to 28 574 for the local model and 182 for the PFT.
We fitted all models to the data using a custom Markov chain Monte Carlo (MCMC) algorithm, known as the Filzbach algorithm (http://research.microsoft.com/en-us/um/cambridge/groups/science/tools/filzbach/filzbach.htm), which has been described in detail in Caldararu et al. (2012).Filzbach utilises MCMC with the Metropolis-Hastings algorithm to estimate the joint distribution of the parameter set θ .In our study we assume no prior information about θ and therefore our implementation is reduced to estimating the θ associated with the highest probability of the observations given by the model.To do this we need to define a likelihood function that gives the probability of the data for any set of predictions from the model with a given parametrisation.
For the local model this likelihood function is maximised independently at each location x and is calculated as where LAI pred (x, t, θ x ) is the predicted LAI at location x at time t (this depends on the model parameters θ x ); LAI obs (x, t) is the observed MODIS LAI at location x at time t; and n LAI obs (x, t), LAI pred (x, t, θ x ), σ x denotes the probability density for observing LAI obs (x, t) given a normal distribution with mean LAI pred (x, t, θ x ) and standard deviation σ x , which expresses the magnitude of unexplained variation in LAI.The likelihood is calculated as a sum over all time steps at location x, expressed as t (x).
For the global, regional and PFT models, the likelihood estimation is carried out at the global, regional or PFT level, the likelihood being calculated as the sum at all locations x within a group G, x(G): ln n LAI obs (x, t), LAI pred (x, t, θ G ), σ G , (10) where Z G and θ G denote observed LAI and model parameters for a given group of grid cells G. Within the combination model, the likelihood is again minimised for a whole PFT, but in addition to the PFT level parameters θ G , the predicted LAI is also a function of local parameters θ B,x .For all model parametrisations, we use years 2001-2005 as training data and 2006 for evaluation purposes.The model was run at a daily time step, but the likelihood was only computed when MODIS data were available, with a time step of 8 days.
Without separating training and test data in this way, the more parameter-rich models would be guaranteed to give a better fit to the data.Separating the training and test improves our ability to assess model performance, although, given that the training test data are separated by a relatively short time and not separated in space, we expect a tendency for the more parameter-rich models to provide superior performance against the test data.

Model performance metrics
To compare the different types of models described above, we define several model performance metrics against the test data.The best model should be able to capture both the timing and magnitude of the seasonal cycle at each location and the spatial variability in seasonal cycles across the globe.As an overall measure of fit we use the root mean squared error (RMSE) normalised by the mean LAI, which is a measure of the fit at each particular location.The mean LAI and LAI amplitude describe the magnitude of the seasonal cycle and we use the percent of variation explained to capture the extent to which the model describes their spatial distribution.Similarly, we use the start and end of the growing season to describe the timing.We define the start of the growing season as the first date of the year when the LAI reaches 0.2 of the maximum LAI, while the end of the growing season is the equivalent last date.To capture the timing in tropical areas with a less pronounced seasonal cycle, we also use the timing of maximum LAI.All metrics are reported for the model evaluation period (2006).
We choose not to use statistical information criteria (e.g.Bayesian information criteria) because our model fitting methodology does not easily allow for the computation of a single likelihood metric.The model structure is the same for all parametrisations, with the main model differences being the number of parameters at each grid cell.However, this means that different quantities of data are also used to fit different models.For example, since the local model is fitted separately at each location, it effectively consists of 2041 separate models, each with 14 parameters, while the PFT model contains 13 models each with 14 parameters.Rather than work out a global information criterion-based metric for the models, we instead opt to use the more meaningful metrics of the relationships between the model predictions and the data described

Results
An overall comparison of the five model parametrisations (Table 2) shows that the global model has the highest error, while the local model has the lowest error.The fact that the global setup has a very high error is not unexpected since there are known physiological differences between plant functional types, which is why the use of PFTs is common in global modelling studies.However, the PFT model also has a much higher error than the local one.The regional model does not show a significant improvement from the PFT, with the exception of the tropical broadleaf evergreen forest PFT.Below we will discuss in detail only the PFT, combination and local models, where this particular forwww.biogeosciences.net/13/925/2016/Biogeosciences, 13, 925-941, 2016 est PFT has been separated into geographical (continental) regions.
Figure 1 shows the overall model error over the entire study period for the three main model parametrisations.Relative root mean squared error (unitless) values are much higher for the PFT model than for the local model, 0.52 ± 0.5 compared to only 0.24 ± 0.03.The combination model has a lower error of 0.38 ± 0.45.These errors are much lower for tropical forests, typically 0.15 for the local model, compared to 0.22 for the PFT and 0.16 for the combination models (Fig. 1).Similar errors occur in temperate deciduous areas.The highest errors are observed in tropical grasslands and shrublands for all models and specifically for the PFT model (up to 2).
Figure 2 shows the relative difference between model and observed LAI annual mean and amplitude.Both the local and combination models underestimate the mean LAI across all PFTs by 11.3 and 23.4 %, respectively.The PFT model exhibits a higher bias, with a mean value of 45.4 %, with the highest difference in tropical and temperate deciduous regions (over 90 %).The PFT model underestimates the seasonal amplitude in tropical forests by up to 50 and by 20 % in higher latitude regions, while overestimating it by up to 200 % in subtropical grasslands and savannas.The combination model shows a similar pattern but a lower bias, with differences of 27 % in tropical forests and 13 % in temperate areas, similar to those of the local model.
Figures 3 and 4 show a comparison of predicted and observed LAI mean and amplitude for forest and grass PFTs, respectively.The PFT model captures the mean behaviour but is not able to predict the full range of values in either mean LAI or seasonal amplitude for any PFT, explaining on average only 28 and 12 %, respectively, of the spatial variation in LAI mean and amplitude.The combination model shows an improvement, explaining on average 70 % of the spatial variation in mean LAI and 56 % of the amplitude, compared to the local model, which explains 90 and 87 %, respectively.The model results in boreal regions are difficult to interpret because of the uncharacteristically low values of the MODIS LAI in these regions, which are partially caused by high within cell heterogeneity (Caldararu et al., 2014).
All models show a similar ability to predict the timing of the seasonal cycle, with an error of 16 days for the start of the growing season and differences of up to 24 days for the maximum and end of the growing season, whereas in tropical Figure 5 shows LAI time series for four different PFTs.At the tropical evergreen forest location the local and combination models show a similar fit, whilst the PFT model cannot capture any seasonal cycle.At the dry tropical (savanna) location, the local model shows a good fit, but both the combined and PFT model predict a much higher LAI.For the temperate deciduous forest, all models capture the timing of the seasonal cycle, but the PFT model predicts a lower amplitude than that observed in the MODIS data.For the boreal evergreen forest, both the PFT and combination model predict a higher LAI than that observed by MODIS.
Figure 6 shows the relationship between model error and grid cell heterogeneity within the PFT model in terms of fraction of cell occupied by the dominant PFT for model RMSE, bias in LAI mean, and bias in LAI amplitude.All three metrics show no correlation with grid cell heterogeneity, with an R 2 of less than 0.01, indicating that there is no systematic bias in errors caused by the chosen PFT map.
To further investigate the observed differences arising from the model parametrisation, we can analyse the parameter values for each different model.Figures 7 and 8 show parameter distributions for the light compensation point and leaf age limit parameters for six selected PFTs. Figure 9 shows global distributions of the local parameters in the combined model.The PFT model fitted parameters are in most cases capturing the mean values of the local parameter distributions, but the discrepancy is higher in PFTs where the distribution has a long tail or multiple modes, especially in the grass PFTs (Fig. 8).In the evergreen tropical forest the discrepancy between the one value estimated by the PFT model and the wide range of both the local and combination parameters is particularly large, as, according to the model, phenology in these areas is limited by leaf age (Caldararu et al., 2014) and the different modes observed in the parameter distribution are essential for representing the leaf cycles caused by species with long but varied lifespans.The discrepancy in leaf age values between the different model parametrisations for the temperate PFT does not have such a profound effect on predicted LAI as phenology in these regions is limited by temperature and light and the age parameters are often poorly constrained even for the local model.Other large differences in parameter values are observed in the grass PFTs, which, as discussed above, have some of the highest errors.
Overall, all metrics show that the PFT model performs poorly across the globe, while the combination model, which has only two location-specific parameters, shows a good fit to the data.

Discussion
In this paper we have investigated the capacity of a global phenology model parametrised at the PFT level to represent observed phenological behaviour.We show that the PFT model cannot fully capture spatial variations in LAI mean and amplitude.In contrast, a model with local parameters results in a better model fit, but has a very large number of parameters, which make it very difficult to use.However, a combination model, where two of the model parameters are local while the others are fitted at the PFT scale, performs well with a reduced number of parameters.Our analysis shows that two specific parameters need to have local values, the direct compensation point C direct (henceforth referred to as compensation point) and the leaf age limit age crit .Below, we focus on the possible biological significance of the combined model and the possibility of using this concept in a more general setting.

Plant traits in the combined model
The most straightforward biological explanation for the observed results of the combined model is that the two local parameters -the light compensation point and leaf age limitare location-specific plant traits that vary within a PFT sufficiently to affect model performance.Previous studies, which have included traits as a replacement for the PFT concept, have done so starting from biological principles, either based on trait databases (Verheijen et al., 2013) or by evolving traits through plant competition (Sakschewski et al., 2015).In contrast, in the current study we include no prior knowledge of which parameters correspond to known plant traits and the local parameters in the combination model are inferred from the fitted model.The question that arises is if the resulting parameters and parameter values provide any biological insights or if this is just a mathematical artefact, resulting either from the data used or the model structure.
The light compensation point is not a trait commonly used in models or included in trait data, but it is closely related to leaf photosynthetic parameters, such as V cmax and J max , and could easily be derived in terms of these if our model included a biochemical description of photosynthesis.There is one other parameter in our model, the photosynthetic efficiency, φ, that is perhaps closer to the commonly used traits but did not emerge as the most important parameter in the PCA (Table A1) or was able to explain the spatial variability in LAI (Table A2).In contrast to the compensation point parameter, which drives leaf gain across the globe, φ mainly determines leaf loss in temperate and boreal regions, which are light and temperature limited (Caldararu et al., 2014).This result shows that leaf loss within a given PFT across temperate and boreal forests can be predicted well from environmental factors alone, without any inherent trait variation within a PFT.This could result from the real trait variation being low or the real trait variation having such a strong correlation with environmental factors that the effects of the trait variation cannot be separated from the effects of the environment.More ground measurements are needed to help confirm this.The model also contains a parameter for the diffuse light compensation point, C diffuse , introduced as a simple way of representing light in the canopy.This parameter is expressed relative to the C direct value at each location even though the parameter acts as a PFT level parameter, the absolute value for diffuse compensation point will vary spatially.
While the light compensation point is not a common parameter in vegetation models, measured values can be obtained from light response curves measured for individual plant species.Reported values range from 0.5 to 16.2 Wm −2 for tree species, varying with species and light environment (Riddoch et al., 1991;Lewis et al., 2000;Givnish et al., 2004;Baltzer and Thomas, 2007).The compensation point values for the combined model agree broadly with these values (Fig. 7), with the exception of values for boreal forests, where values can be much higher (up to 60 Wm −2 ), an error that we attribute to the poor quality of MODIS data in this region.
Leaf longevity is one of the main parameters used in vegetation models, which employ plant traits (e.g.Sakschewski et al., 2015) as well as in the analysis of the leaf trait spectrum (Wright et al., 2004).The second local parameter used in the combination model, the leaf age limit age crit , does not have the same meaning as leaf lifespan, mainly because leaf ageing is only one of the three leaf loss mechanisms in our model, alongside temperature and light and water limitation.Thus, in regions where leaf loss is not age limited, for example in temperate areas, the parameter is poorly constrained and its age value is never reached, as leaf loss occurs much sooner.In wet tropical systems where leaf ageing is the main driver of leaf loss, this age parameter is the critical age where leaves start ageing; therefore, the effective lifespan can be much larger.However, according to our model, it is the main driver of leaf loss in tropical systems and thus a proxy for determining leaf lifespan (see Caldararu et al., 2014, for a detailed discussion of the physical interpretation of this parameter).

Model structure
Our results show that allowing two critical traits to vary within a PFT among locations provides a superior model performance.It is likely that such traits vary due to underlying factors that are not explicitly included in our model.Two likely candidates for such hidden factors are nutrient availability and canopy structure.If the effects of these factors on traits could be understood and modelled explicitly, this could dramatically reduce the number of parameters required by the model, without making the assumption that the traits are constant within any PFT.Leaf photosynthetic capacity is a function of leaf nitrogen content (Farquhar et al., 1980;dePury and Farquhar, 1997;Hikosaka, 2003), a factor that has not been included in our model.According to current photosynthetic models, a higher leaf nitrogen content would lead to a higher light limited photosynthetic rate and hence a lower compensation point.shows the spatial distribution of the compensation point parameter as fitted in the combination model.The highest values are observed in grasslands, especially in the tropical region.In forest PFTs, the highest compensation point occurs over tropical forests, followed by temperate deciduous regions.This is supported by field studies, as higher latitude forests are generally more nitrogen limited while tropical and temperate grasslands are one of the most nutrient poor systems in terms of phosphorus (Bustamante et al., 2006;Elser et al., 2007).To explore the intra-PFT distribution of nitrogen availability and fully explain the locality of our compensation point parameter, we would need either a global data set of nitrogen availability such as the nutrient limitation index derived as a function of evapotranspiration and ecosystem production (Fisher et al., 2012) or coupling the phenology model with a full-scale vegetation model with an explicit representation of the nitrogen cycle (e.g.Zaehle and Friend, 2010).
Canopy structure determines the light environment in the canopy and controls the actual amount of light that reaches the leaves for a given light intensity above the canopy.This means it can be of important value in determining the compensation point, both through model structure and long-term impact on plant behaviour.Within the model used in this study, we assume a homogeneous canopy, with a random distribution of leaf angles and no clumping, assumptions that can be considered valid at very large scales, but can potentially introduce errors for certain forest structure types.It has been shown (Chen et al., 2012) that including leaf clumping in a carbon assimilation model has a major impact on resulting global gross primary productivity values.A leaf clumping factor would be used to adjust the attenuation coefficient α (Eq.2) to improve the description of light transmission through the canopy.It is possible that the compensation point parameter C direct artificially accounts for this variation in canopy structure, which explains its observed spatial variability.Further information such as the leaf clumping index map developed by Chen et al. (2005) would be needed to distinguish between the actual compensation point and canopy structure.This relationship is further complicated by the fact that plants adapt to their light environment; therefore, leaves in closed canopies will be better adapted to shaded conditions and will have lower compensation points and thus tropical forests, which are highly stratified, will have a much lower compensation point than other systems.The question is further complicated as canopy structure itself can be an adaptation to the available resources, such as light, water, or nitrogen, making it difficult to distinguish between all possible factors in the absence of further data.

Model parametrisation
One of the main possible sources of error in our conclusion is the way we have parametrised the PFT model.In most models, which use the PFT concept, grid cells are represented www.biogeosciences.net/13/925/2016/Biogeosciences, 13, 925-941, 2016 as a mix of PFTs, with PFT-specific parameters assigned to each fraction (e.g.Stockli et al., 2008), while we have chosen, in order to reduce the computational effort necessary for a global data-constrained model, to only use the dominant PFT in each grid cell.This approach, together with the low resolution that the model is run at could mean that the poor fit shown by the PFT model is due to a poor representation of PFTs rather than the unsuitability of the concept in vegetation models.If this was the case, we would expect high model errors in grid cells with a larger mix of vegetation types.However, the high errors in the PFT model are consistent throughout and do not show a significant correlation with the grid cell PFT heterogeneity (Fig. 6), indicating that the mix of vegetation types within grid cells cannot be the only explanatory factor.For a more robust conclusion, we would need to re-run the analysis with either a higher spatial resolution or with a PFT mix in each grid cell.Given the high computational demand of the fitting procedure, this would ideally be done for a smaller number of PFTs rather than at global scales.
We use space borne vegetation data from the MODIS Terra sensor, as satellite measurements are one of the only sources of data for constraining global level vegetation models, although they suffer from instrument error and atmospheric contamination.We have attempted to filter the data robustly using data quality flags, as discussed in Sect.2.1 and previous studies (Caldararu et al., 2012(Caldararu et al., , 2014) ) and the fitting procedure contains a parameter σ x , which accounts for the error in the observations (Sect.3.2).The largest possible source of error is the seasonality shown by the MODIS data in the Amazon basin and other tropical regions, which is most likely to determine the spatial distribution of the age crit parameter.Initial studies have shown that there is an increase in satellite observed LAI during the dry season over the Amazon (Myneni et al., 2007;Huete et al., 2006), but more recent studies have argued that the observed change in LAI is due to sunsensor geometry (Morton et al., 2014).This finding has been contradicted by subsequent papers (Bi et al., 2015) and we do not attempt to present an answer to this debate.For the purpose of this study, we assume that this observed change in LAI is a reflection of actual changes in leaf cover, an assumption backed by observed changes in gross primary productivity and litterfall (da Rocha et al., 2004;Goulden et al., 2004;Hutyra et al., 2007).

Method generality
As more studies begin to acknowledge that the PFT concept is not necessarily the best approach to vegetation modelling, we need to quantify the extent to which the inclusion of spatially distributed parameters or plant traits improve our predictive capability and to identify the optimal number of parameters that both give a good model fit and minimise computational cost.In this study we have attempted not only to build a model with locally distributed parameters but also to quantify the extent to which a model with local parameters and one with PFT level parameters can capture the spatial variability in global LAI observations.Furthermore, we quantitatively identified which parameters need to be local to improve model performance with a view to reduce data and computational needs.We believe that the method used here for investigating the use of PFT level parameters has a high degree of generality and can be applied to a large variety of models and input data sets.
One of the advantages of the PFT concept is its capacity to represent future changes in vegetation distribution within DGVMs.Given a predicted change in climate, models using PFTs can then predict a change in PFT distribution, using either predefined climate envelopes (Sitch et al., 2003) or predefined plant competition rules (Arora and Boer, 2006).Models that instead use plant traits do not offer such a straightforward solution, but have a number of advantages.The PFT approach only allows abrupt changes in vegetation and cannot capture any plant adaptation to climate, whereas a trait approach can represent gradual changes; however, representing changing traits in response to climate is more difficult, both conceptually and computationally.Recent studies have proposed the use of plant competition and emergent traits to predict vegetation distribution (van Bodegom et al., 2014;Wullschleger et al., 2014;Fisher et al., 2015).Therefore, a logical next step to our analysis would be to identify the environmental responses of the two combined parameters and their relationships with plant physiology responses, so parameter values can be estimated independent of data constraints for the purpose of model predictions under future climate change.

Conclusions
In this paper we explored the extent to which plants within the same PFT exhibit the same phenological characteristics using a process-based global phenology model.We showed that a model with PFT-wide parameters cannot explain the observed spatial variation in seasonal cycles, but that an intermediate model with two location-specific parameters gives a good overall model fit and can reliably be used for phenological studies.The spatial patterns of these local parameters, the light compensation point, and leaf age limit might be explained by species adaptation to the local climate or nutrient and water availability, and further data are needed to fully understand the observed distribution.The modelling approach used to determine the validity of PFT level models can provide further insight for global vegetation models, which use plant functional types as a basis for upscaling measured or fitted parameter values and can hence improve global simulations of ecosystem processes.

Figure 1 .
Figure 1.Root mean squared error (RMSE) of predicted LAI over the model study period for the local, PFT, and combined models.All values have been normalised to the mean observed LAI at all locations.

Figure 2 .
Figure 2. Difference between predicted and observed annual mean LAI (left) and seasonal amplitude (right) for the local, PFT, and combined models.All values have been normalised to the mean observed LAI at all locations

Figure 4 .Figure 5 .
Figure 4. Comparison of predicted and observed mean LAI and seasonal amplitude for the local, PFT, and combined models for tropical (green), temperate (red), and boreal (blue) grass PFTs.

Figure 6 .Figure 7 .
Figure 6.Correlations between model error and fraction of each grid occupied by the dominant PFT in the PFT model as a proxy for grid heterogeneity.(a) Model RMSE (b) bias in LAI mean, and (c) bias in LAI amplitude.

Figure 8 .
Figure 8. Parameter distributions for the light compensation point and age limit in three representative grass PFTs: savanna (S), grassland (G), and tundra (T).Parameter values are the mean of the fitted posterior distributions and the represented values reflect the variation in space within one PFT, for the local (top) and combined (bottom) models, as well as for the fitted PFT (black line).

Figure 9 .
Figure 9. Posterior parameter means for the compensation point C direct and the leaf age limit a crit resulting from the combination model.

Figure A1 .
Figure A1.Root mean squared error (RMSE) and difference in mean and amplitude of predicted LAI over the model study period for a selecion of regions in the regional model.All values have been normalised to the mean observed LAI at all locations.

Table 1 .
Model parameters for leaf gain and loss processes.

Table 2 .
Goodness of fit metrics for all five model parametrisations: root mean square error (RMSE) normalised by mean LAI value, difference in observed and predicted mean LAI and difference in observed and predicted annual amplitude.All metrics here are median values across the globe and the two difference values are shown as absolute values.