Interactive comment on “ Using satellite data to improve the leaf phenology of a global Terrestrial Biosphere Model ”

Abstract. Correct representation of seasonal leaf dynamics is crucial for terrestrial biosphere models (TBMs), but many such models cannot accurately reproduce observations of leaf onset and senescence. Here we optimised the phenology-related parameters of the ORCHIDEE TBM using satellite-derived Normalized Difference Vegetation Index data (MODIS NDVI v5) that are linearly related to the model fAPAR. We found the misfit between the observations and the model decreased after optimisation for all boreal and temperate deciduous plant functional types, primarily due to an earlier onset of leaf senescence. The model bias was only partially reduced for tropical deciduous trees and no improvement was seen for natural C4 grasses. Spatial validation demonstrated the generality of the posterior parameters for use in global simulations, with an increase in global median correlation of 0.56 to 0.67. The simulated global mean annual gross primary productivity (GPP) decreased by ~ 10 PgC yr−1 over the 1990–2010 period due to the substantially shortened growing season length (GSL – by up to 30 days in the Northern Hemisphere), thus reducing the positive bias and improving the seasonal dynamics of ORCHIDEE compared to independent data-based estimates. Finally, the optimisations led to changes in the strength and location of the trends in the simulated vegetation productivity as represented by the GSL and mean annual fraction of absorbed photosynthetically active radiation (fAPAR), suggesting care should be taken when using un-calibrated models in attribution studies. We suggest that the framework presented here can be applied for improving the phenology of all global TBMs.


Introduction
Leaf phenology, the timing of leaf onset, growth and senescence, is a critical component of the coupled soil-vegetationatmosphere system as it directly controls the seasonal exchanges of carbon, C, as well as affecting the surface energy balance and hydrology through changing albedo, surface roughness, soil moisture and evapotranspiration.In turn leaf phenology is largely governed by the climate, as leaf onset and senescence are triggered by seasonal changes in temperature, moisture and radiation.Leaf phenology is therefore sensitive to inter-annual climate variability and future climate change (Cleland et al., 2007;Körner and Basler, 2010;Reyer et al., 2013), as well as to increasing atmospheric CO 2 concentrations (Reyes-Fox et al., 2014), and will provide feedback on both climate and atmospheric C02 (Richardson et al., 2013).It is expected that climate warming will advance leaf onset in temperature limited northern biomes.Such trends have already been observed in the Northern Hemisphere (NH) using either satellite or in situ observations (Badeck et al., 2004;Delbart et al., 2008;Jeong et al., 2011;Myneni et al., 1997;Parmesan, 2007).However, increasing temperatures may either advance or delay senescence, depending on species-specific responses to other environmental variables (Hänninen and Tanino, 2011;Piao et al., 2007).Future changes of precipitation in a warming climate will also likely affect tropical and semi-arid ecosystems that are more controlled by moisture availability (e.g.Anyamba and Tucker, 2005;Dardel et al., 2014;Fensholt et al., 2012).
In order to improve predictions of the impact of future climate change on vegetation and its interaction with the global C and water cycles, it is crucial to have prognostic leaf phe-N.MacBean et al.: Using satellite data to improve the leaf phenology nology schemes in process-based terrestrial biosphere models (TBMs) that constitute the land component of Earth system models (ESMs) (Kovalskyy and Henebry, 2012b;Levis and Bonan, 2004).Many such models exist in the literature, especially for temperate and boreal forests (e.g.Arora and Boer, 2005;Caldararu et al., 2014;Chuine, 2000;Hänninen and Kramer, 2007;Knorr et al., 2010;Kovalskyy and Henebry, 2012a) and have been included in most TBMs.However, model evaluation studies have shown that there are biases in the growing season length and magnitude of the leaf area index (LAI) predicted by TBMs when compared to groundbased observations of leaf emergence and LAI (Kucharik et al., 2006;Richardson et al., 2012) or satellite-derived measures of vegetation greenness and LAI (Kim and Wang, 2005;Lafont et al., 2012;Maignan et al., 2011;Murray-Tortarolo et al., 2013).This can result in systematic errors in model predictions of the seasonal carbon, water and energy exchanges (Kucharik et al., 2006;Richardson et al., 2012;Walker et al., 2014).
As is always the case prior to model parameter calibration, it is unclear whether the misfit between modelled and observed measures of leaf phenology is the result of inaccurate parameter values, model structural error, or both.In order to answer this question, the parameters first need to be optimised using data assimilation (DA) techniques, and if the models cannot reproduce the data within defined uncertainties we expect to gain insights into possible directions for model improvement.DA is also a useful way to better characterise and possibly reduce uncertainty in model simulations, and to determine the relative influence of parametric, structural and driver uncertainty (e.g.Migliavacca et al., 2012).
Many studies have optimised the parameters of phenology models for a range of species with ground-based observations of the date of leaf onset (Blümel and Chmielewski, 2012;Chuine et al., 1998;Fu et al., 2012;Jeong et al., 2012), the "green fraction" derived from ground-based digital photography (Migliavacca et al., 2011) or with spring onset dates derived from carbon fluxes taken at flux tower sites (Melaas et al., 2013).Melaas et al. (2013) went further and demonstrated the transferability of parameters in time and between sites by including multiple sites in the optimisation.All of these studies have used DA to test different phenology model structures, thereby contributing significantly to the debate about whether a simple classical temperature-driven budburst model is sufficient, or whether more complex chilling and/or photoperiodic cues are needed to best predict leaf onset.Several studies also investigated the impacts of optimising phenology on the resulting C and water budgets (Migliavacca et al., 2012;Picard et al., 2005;Richardson and O'Keefe, 2009).Peñuelas et al. (2009) noted that medium-to coarseresolution satellite data might be more appropriate for optimising the phenology in TBMs, due to the large difference in scale between the resolution of a typical model grid cell (1 × 1 • ) and ground-based data, which may cause representation errors (Rayner, 2010).A few studies to date have performed a global optimisation of model phenology using satellite data, in the sense that multiple sites and/or plant functional types (PFTs) are included in the assimilation (Forkel et al., 2014;Knorr et al., 2010;Stöckli et al., 2011).In this study we aim to reinforce this line of research and to answer the following questions: i. Can we constrain the phenology-related parameters and processes of a typical process-based TBM at global scale using satellite "greenness" index data?
ii.Does this produce a generic parameter set that results in improved simulations of the seasonal cycle of the vegetation, or are further model structural developments required?
iii.What is the impact of the optimisation on mean patterns and trends in vegetation productivity (as represented by the mean fraction of absorbed photosynthetically active radiation (fAPAR), amplitude and growing season length (GSL)) at regional and global scales?
To achieve this we performed a global, multi-PFT, multisite optimisation of the phenology model parameters for the six non-agricultural deciduous PFTs of the ORCHIDEE TBM.The phenology models in ORCHIDEE are common to many process-based TBMs.Note there is no specific phenology model associated to evergreen PFTs, where leaf turnover is simply a function of climate and leaf age.
Some of the carbon cycle-related parameters of OR-CHIDEE (including phenology-related parameters) have previously been optimised using in situ flux measurements (e.g.Kuppel et al., 2014;Santaren et al., 2014;Bacour et al., 2015).Here we focus purely on improving the timing of both spring onset and autumn senescence of ORCHIDEE at global scale, by using a novel approach to assimilate normalised medium-resolution satellite-derived vegetation "greenness" index data (MODIS NDVI collection 5) that are linearly related to the simulated daily fAPAR.The aim of a multi-site (MS) (i.e.model grid cell) assimilation is to find a unique parameter set for each PFT that results in a similar improvement as a single-site (SS) optimisation, as the range of posterior parameter values for individual sites/species can be large (Richardson and O'Keefe, 2009).We hypothesise that the MS approach may average out the site-based variability, and thus provide one consistent PFT-generic parameter vector that can be used for global simulations (e.g.Kuppel et al., 2014).

ORCHIDEE terrestrial biosphere model
ORCHIDEE is a global process-oriented TBM (Krinner et al., 2005) and is the land surface component of the IPSL-CM5 Earth System Model (Dufresne et al., 2013).In this study we used the "AR5" version that was used for the IPCC Fifth Assessment Report (Ciais et al., 2013).The model calculates carbon, water and energy fluxes between the land surface and the atmosphere at a half-hourly time step.The water and energy module computes the major biophysical variables (albedo, roughness height, soil humidity) and solves the energy and hydrological budgets.The carbon module controls the uptake of carbon into the system and respiration following cycling of C through the litter and soil pools.Carbon is assimilated via photosynthesis depending on light availability, CO 2 concentration and soil moisture, based on the work of Farquhar et al. (1980) for C3 plants and Collatz et al. (1992) for C4 plants.The module includes the calculation on a daily time step of a prognostic LAI and allocation of newly formed photosynthates towards leaves, roots, sapwood, reproductive structures and carbohydrate reserves, depending on the availability of moisture, light availability and heat (Friedlingstein et al., 1999).The phenology models that control the timing of leaf onset and senescence in ORCHIDEE, depending on PFT, were described in Botta et al. (2000) and Maignan et al. (2011) but are described in more detail in Appendix A. The simulated fAPAR can be calculated from the model LAI using the following Beer-Lambert extinction law, assuming a spherical leaf angle distribution and that the sun is at nadir (following Bacour et al., 2015): ORCHIDEE's functioning relies on the concept of plant functional types (see Wullschleger et al. (2014) for a review).
A PFT groups plants that have the same physiological behaviour under similar climatic conditions.ORCHIDEE uses 13 PFTs that are listed in Table 1 (along with the phenology model used).Different PFTs share the same processes but usually with different parameter values, except for the phenological models that are PFT-dependent (Table 1).The model is driven by meteorological variables related to temperature, precipitation, short-and long-wave incident radiation, specific humidity, surface pressure and wind speed.Soil texture and PFT fraction are also prescribed per grid cell.

Observing the seasonal cycle of vegetation
The seasonal cycle of the terrestrial vegetation is observed daily, cloud cover permitting, at a global scale and mediumscale spatial resolution (250 m) from several polar orbiting spectroradiometers.Studies have shown that considerable discrepancies exist between so-called "high-level" satellite products such as LAI or fAPAR, especially when considering their magnitude (D'Odorico et al., 2014;Garrigues et al., 2008;Pickett-Heaps et al., 2014).This is because radiative transfer models are used to derive these products, which introduces uncertainty due to undetermined parame- ters or potentially incomplete descriptions of the radiative transfer model physics.Instead therefore, we considered a vegetation greenness index, the Normalized Difference Vegetation Index (NDVI), that is directly related to the near infrared (NIR) and red (RED) surface reflectance, ρ (NDVI = ρ NIR − ρRED / ρ NIR +ρRED).This index is based on the fact that photosynthesising vegetation reflects a high proportion of the incoming NIR radiation, whilst absorbing most of the red.NDVI has been shown to be linearly related to fAPAR, though with uncertainties related to the issues mentioned above (Fensholt et al., 2004;Knyazikhin et al., 1998;Myneni and Williams, 1994).Therefore although NDVI is not directly related to a physical property of the vegetation, it does capture its seasonal cycle together with inter-annual anomalies, and therefore can be used to optimise the model phenology (via the simulated fAPAR).In order to optimise the seasonality (but not the magnitude) of modelled daily fA-PAR using the NDVI data, we normalise both to their maximum and minimum values of the whole time series at each site (following Bacour et al., 2015).

MODIS NDVI data and processing
NDVI observations are derived from the MOD09CMG collection 5 (v5) surface red (620-670 nm) and near-infrared (841-876 nm) daily global reflectance products available at 5 km from the MODerate resolution Imaging Spectrometer (MODIS) on-board the NASA's Terra satellite.The reflectance data were cloud-screened and corrected for atmospheric and directional effects (related to the change of reflectance with observation geometry) following (Vermote et al., 2009), and the corresponding NDVI was calculated for the 2000-2008 period.(Vermote et al., 2009).However, in this study the daily model and observation uncertainty used in the assimilation was defined as the RMSE between the normalised prior model fAPAR simulation (using default ORCHIDEE values) and the normalised NDVI observations, following (Kuppel et al., 2012).This error thus accounts for the spatial and temporal averaging, the error of the NDVI retrieval and the model structural error.

System description
The ORCHIDEE Data Assimilation System (http://orchidas.lsce.ipsl.fr) is based on a variational data assimilation procedure that has been described in detail in previous studies using ground-based net surface CO 2 and energy fluxes (Kuppel et al., 2012;Santaren et al., 2007;Verbeeck et al., 2011).Kuppel et al. (2012) presented the first results using a multisite (MS) version of the system at selected eddy-covariance flux tower locations, where data from all sites were used to optimise the model parameters at the same time for each PFT.
As with most statistical data assimilation approaches it follows a Bayesian framework, where prior knowledge of the parameter values is updated based on new information from the observations.Assuming that the probability distribution functions (PDFs) of the model parameter and observation uncertainties are Gaussian, the optimal parameter vector x can be found by minimising the following cost-function J (x) (Tarantola, 1987): where y is the observation vector, H (x) the model outputs, given parameter vector x, R the uncertainty matrix of the observations (including observation and model errors), x b the a priori parameter values (the standard values of ORCHIDEE) and P b the a priori uncertainty matrix of the parameters.Hence, the cost function describes the misfit between the observations and corresponding model outputs, plus the misfit between the current and prior parameter vectors, weighted by prior information on the parameter and observation uncertainties.Observation and model errors are assumed to be uncorrelated in space and time, and parameters are assumed to be independent; hence R and P b are diagonal matrices.The cost function is iteratively minimised using the gradientbased L-BFGS-B algorithm (Byrd et al., 1995), which allows the definition of boundary constraints for the parameters.The prior parameter vector is most commonly used as the starting point in the iterative minimisation, but it can be started from any point (set of parameter values) in the parameter space.
The gradient of the J (x) is estimated using the tangent lin-ear model, except for parameters that impose a threshold on the model processes.For these, the finite difference method is used.The posterior parameter covariance can be approximated from the inverse of the second derivative (Hessian) of the cost function around its minimum, which is calculated using the Jacobian of the TBM model with respect to fAPAR at the minimum of J (x) (for the set of optimised parameters), H ∞ , following Tarantola (1987): The posterior parameter covariance can then be propagated into the model state variables (fAPAR or net C flux) space given the following matrix product and the hypothesis of local linearity (Tarantola, 1987): (4) The square root of the diagonal elements of R post corresponds to the posterior error (standard deviation σ ), on the state variables considered of each grid cell.In order to appraise the knowledge improvement brought by the assimilation, the error reduction is determined as 1 − R post /R prior ).

Parameters to be optimised
Figure 1 shows a general schematic of how the parameters of the phenological equations used in ORCHIDEE (Botta et al., 2000) control the timing of the seasonal cycle of the LAI as well as the rate of leaf growth and fall.The parameters that are optimised for each PFT are given in Table 2 and are briefly described here.A more detailed description can be found in Appendix A. The start of the seasonal cycle of temperature-driven PFTs is constrained by optimising the growing degree day threshold, GDD threshold (Eqs.A1 and A2), the threshold for the number of growing days, NGD threshold and the LAI threshold (Eq.A4) parameters, which all play a part in controlling leaf onset and rate of canopy growth.As the GDD threshold is calculated in different ways depending on the PFT-dependent phenology model used, and as the NGD threshold acts in a similar way to the GDD models, we introduced one single multiplicative effective parameter, K pheno_crit , to optimise the GDD threshold and NGD threshold for all phenology models.Thus the GDD threshold and the NGD threshold become the following: with an a priori value of 1.In a similar manner a new effective parameter, K lai_happy was introduced to compute the LAI threshold parameter, which is the LAI below which the carbohydrate reserves are used for leaf growth at the beginning of the growing season, following the equation  Although it is not strictly a phenology model parameter, we optimise K lai_happy as it is partly responsible for the rate of leaf growth.LAI max is a fixed parameter in this study as we only examine the seasonal cycle of the vegetation, not its magnitude.
The end of the seasonal cycle is constrained by optimising the critical leaf age for senescence, L agecrit (Eq.A5), the senescence temperature threshold, T threshold (Eq.A6) and the rate of leaf fall, L fall (Eq.A8) parameters.L agecrit and L fall are optimised directly, and T threshold is optimised through the C 0 parameter (Eq.A6), and is henceforth called T senes .
For phenology models that are driven by soil moisture conditions ("MOI" models -see Appendix A and Table 1) the parameter that controls leaf onset is the "minimum time since the last moisture minimum" (Moist Tmin ), and the parameters that control senescence are Moist senes and Moist no_senes , the critical moisture levels below and above which senescence does and does not occur, respectively.These PFT-dependent parameters are optimised directly; i.e. no effective parameters are introduced to scale the original ORCHIDEE parameter.
The prior parameter values are taken from the ORCHIDEE standard (non-optimised) version and are detailed in Table 2.The maximum and minimum bounds of the parameters were set based on literature and "expert" knowledge.Prior uncertainty on the parameters was taken to be 40 % of the parameter range following Kuppel et al. (2012).

PFTs optimised and site selection
The six deciduous, non-agricultural PFTs of ORCHIDEE are optimised in this study.For each of the PFTs that were optimised we selected 30 sites (where one site is equal to one model grid cell at 0.72 • resolution -see Sect.2.4) that fulfilled several constraints (Fig. 2).First the grid cells have to be representative of the considered PFT and thus contain a high fraction of the PFT in question.This was mostly > 0.6 except for the boreal broadleaved deciduous (BoBD) PFT where the fractional cover is never greater than 0.4.For this PFT, all the grid points selected contained 40 % BoBD trees and high fractions (0.5-0.6) of natural C3 grasses (NC3); thus both PFTs were optimised for these grid cells simultaneously.Second, the site locations should be as representative as possible of the PFT spatial distribution.This is achieved by a random sampling of grid cells with a fractional coverage above the given threshold.Lastly each NDVI time series was visually inspected and discarded if it was too noisy or con-N.MacBean et al.: Using satellite data to improve the leaf phenology tained an incomplete seasonal cycle.Whilst we could not be 100 % certain that no land cover change or disturbance had taken place for the grid cells selected, none of the time series showed discernible signs of a shift in vegetation.A total of 15 of the sites were used in the optimisation, and the other 15 were kept for spatial validation following the optimisations (see Fig. 2).To separate the 30 sites into 2 sets of 15 for optimisation and validation, we ordered all sites by their grid cell row number and took alternate points for each list.

Optimisations and simulations performed
In this study ORCHIDEE is used in forced offline mode and is driven by 3-hourly ERA-Interim meteorological fields (Dee et al., 2011), on a regular 0.72 • grid, which are linearly interpolated to a half hour time step within ORCHIDEE.We use the Olson land cover classification, which contains 96 classes at a resolution of 5 km, to derive the PFT fractions at 0.72 • following (Vérant et al., 2004).The soil texture classes are derived from Zobler (1986).The impact of land use change, forest management, harvesting and fires were not included in any simulation.

Multi-site optimisation
For each PFT optimised, the 15 optimisation sites (see Sect. 2.3.3) were first optimised simultaneously (i.e.all sites were included in the same cost function), over the 2000-2008 period using the multi-site (MS) approach detailed in Kuppel et al. (2012).Following (Santaren et al., 2014) we tested the ability of the algorithm to find the global minimum of the cost function by starting the iterative minimisation algorithm (see Sect. 2.3.1) at different points in the parameter space, choosing 20 random "first guess" sets of parameters and performing a MS optimisation for each.The results of these tests are presented in Sect.3.1.

Single-site optimisation
A single site (SS) optimisation was then performed for each of the same 15 optimisation sites.The assimilations were exactly the same as for the MS optimisation, except each site was optimised separately.The posterior parameter vector resulting from the "best" random first guess MS optimisation (taken as the greatest % reduction in the cost function) was used as the first guess for the SS optimisation.The first guess with the greatest % reduction in the cost function was equivalent to the first guess that resulted in the lowest value of the cost function, as the % reduction was calculated using the value of the cost function using the default (prior) OR-CHIDEE parameters.

Site-based validation
The same MS posterior parameter vector for each PFT was then used to perform a simulation at each of the 15 extra spatial validation sites (see Sect. 2.3.3)over the same time period.In addition, prior and posterior simulations at all 30 optimisation and validation sites were extended to cover the 2009-2010 period in order to perform a temporal validation.

Global-scale evaluation
Finally two global-scale simulations (with increasing atmospheric CO 2 concentration and changing climate) were performed for the 1990-2010 period with both the prior and MS posterior parameter values, in order to evaluate the impact of the optimisation on the global mean patterns and trends in annual mean fAPAR, amplitude and GSL.The same ERA-Interim 0.72 • forcing data were used as for the site-based optimisations.Note that no spinup of the soil C pools was needed for this study.

Post-processing and analysis
The prior and posterior RMSE and correlation coefficient, R, were calculated for both the MS and SS optimisations at all sites for a comparison.The values for the spatial and temporal validation simulations were evaluated to assess the spatial and temporal generality of the posterior vectors.For all the analyses performed in this study, metrics given per PFT at global scale were derived for grid cells that contained > = a certain PFT fraction following the rules used to select the optimisation sites (see Sect. 2.3.3).

Calculation of the start of leaf onset and senescence, GSL and trend analysis
The curve-fitting method of Thoning et al. (1989) was used to fit a function to the daily time series of observations and model output as described in Maignan et al. (2008).The function consists of two parts; a second-order polynomial that is used to account for the long-term trend, and a fourth order Fourier function to approximate the annual cycle.The residuals of the fit to this function were filtered with two low pass filters in Fourier space (80 and 667 cut-off days) and then added back to the function to produce a smoothed function that captures the seasonal and inter-annual variability and long-term trend.The detrended curve can be calculated by subtracting the trend from the smoothed function.
The start of season (SOS -leaf onset) and end of season (EOS -the start of leaf senescence) were defined as the upward and downward crossing points of the "zero-line" of the de-trended curve per calendar year (see Fig. 1 in Maignan et al., 2008).These values were calculated for all grid cells with only one seasonal cycle per year (this includes grid cells in the SH where the growing season spans 2 calendar years).
The GSL was calculated as the number of days per calendar year when the detrended curve is greater than zero.Therefore unlike the SOS and EOS, the GSL was also calculated for grid cells that contain multiple growing seasons within a calendar year.For the trend analysis, a linear least squares regression was used to calculate the long-term trend in the annual fAPAR amplitude, growing season length (GSL) and the mean fA-PAR time series.

Global evaluation with MODIS NDVI
The global simulations were evaluated with the same MODIS NDVI data that were used at the site level for the optimisation, following the protocol of Maignan et al. (2011).The following metrics were used for evaluation of both the prior and posterior simulations.
The correlation between the normalised simulated fAPAR and MODIS NDVI weekly time series.
The bias (in days) between the modelled and observed SOS and EOS dates (model -observations) were also examined so as to investigate the impact on the timing of the phenology more directly (a positive bias indicates the model date is later than the date calculated from the observations).
The above metrics were calculated for each grid cell.Following this a global median value was calculated, as well as a median correlation per PFT.

Convergence of the optimisation algorithm
We initially tested the ability of the MS optimisation to find the global minimum of the cost function (J (x)) by starting at 20 different random "first guess" points in the parameter space.For the forest PFTs and natural C3 grasses, the final cost function value was mostly within ∼ 30 % of the minimum (lowest) cost function value (up to 50 % for TeBD -Table 3, 2nd column), and in the majority of cases a 30-60 % reduction in the cost function was achieved (  the value of J (x) for the default parameters of ORCHIDEE) across all 20 random tests was seen for the BoND PFT.There was a higher spread in the % reduction for natural C3 grasses and the TeBD PFTs, suggesting the cost function is not as smooth (i.e.contains more local minima) as for the BoND PFT (for example).This is possibly linked to the need for a greater number of species for certain PFTs or due to differing parameter sensitivities under different climate regimes (the NC3 sites have a particularly wide global distribution).Nevertheless, examining these results we feel confident that for these PFTs the assimilation system converges to a value of the cost function that is reasonably close to the likely global minimum.Thus for the SS optimisations, the site-based validation and the global-scale evaluation we used the MS posterior vector that resulted in the minimum value of the cost function.
However the picture is different for natural C4 grasses (NC4).Only 2 out of the 20 random first guess tests resulted in a > 10 % reduction in the cost function, and although the spread of final values of the cost function was low and close to the minimum value (Table 3, column 2), the final value was between 2 and 10 times higher than that achieved for other PFTs (Table 3, column 1).This suggests that the optimisation algorithm cannot find a better fit to the data than with the default parameter values.It is possible that the BFGS algorithm is not adequate for exploring the parameter space for NC4 grasses, but given that none of the random tests resulted in a noticeable reduction in the cost function, it is more likely that the model sensitivity to the parameters is lower than for other PFTs.This in turn suggests that the phenology model structure itself is inadequate for this NC4 grasses.

Temperate and boreal PFTs
There is an improvement in the model-data fit after both SS and MS optimisations for all temperate and boreal broadleaved and needleleaved deciduous forests (TeBD, BoND, BoBD) and for natural C3 grasses (NC3), largely resulting from an earlier onset of senescence in the model and therefore a substantially shortened growing season length (Figs. 3 and 4).The shift in the start of leaf growth is much smaller, which is not surprising as the prior model more closely matches the observations.Of the four PFTs listed above, only TeBD trees have a slightly later leaf onset as a result of the optimisation.Figure 3 shows the full time series at one site of both the normalised and un-normalised fAPAR and NDVI, together with the simulated LAI, for the BoBD PFT.This site is provided as an example of the typical changes in temporal behaviour seen for the four PFTs listed above.Figure 4 shows the mean seasonal cycle of the normalised fAPAR/NDVI across all sites and years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) for each of the four PFTs and demonstrates that the patterns seen in Fig. 3 are similar for all the boreal and temperate deciduous PFTs.The optimisations resulted in a significant reduction in the RMSE (34-61 %) and increase in correlation (posterior R > 0.82) between the normalised modelled fAPAR and MODIS NDVI data for all four temperate and boreal PFTs (Table 4).The variation in the RMSE and R at each site for prior, multi-site optimisation and the spread for all the singlesite optimisations for all four PFTs are shown in Fig. S1 in the Supplement.The improvement is greatest for the Boreal PFTs (median reduction in RMSE across all sites for both SS and MS optimisations of ∼ 50-60 % and an increase in R from 0.41 to 0.9), but nonetheless the optimisations of the temperate broadleaved deciduous (TeBD) and natural C3 grasses (NC3) PFTs result in a median reduction of uncertainty of between ∼ 20-40% and an increase in R of up to ∼ 0.2.
There is a discernible slowing down in the rate of leaf growth towards the end of the leaf onset after the assimilation, which particularly results in an improved fit to the observations for the TeBD, BoND and NC3 PFTs.However it is noticeable that although parameters that partially control the rate of leaf growth and fall are included in the optimisation, the model generally grows and sheds leaves too fast compared to the observations, except for the BoBD PFT, which results in the model having an unnatural "box-like" temporal profile (Fig. 4 and see Sect.4.3 for further discussion).
The MS optimisation (red line in Figs. 3 and 4) largely results in a similar reduction in RMSE and increase in R as the SS optimisations at each individual site except for TeBD trees, although a similar magnitude of improvement is achieved for this PFT as for the others (Table 4 and Fig. S1).This suggests that the MS parameter vector is universal enough to be used to perform global simulations.

Tropical deciduous forest and C4 grasses
For both the tropical broadleaved raingreen (TrBR) and natural C4 grass (NC4) PFTs, the median prior fAPAR simulation performs reasonably well compared to the observations, with RMSEs of 0.29 and 0.23 and R of 0.73 and 0.81, respectively (Table 4 and Fig. S1).Following optimisation there is an improvement in the median model-data misfit both for the SS and MS optimisation for TrBR trees across all sites, but the spread in posterior RMSE and R values remains high (Fig. S1). Figure 5a shows an example of the typical issues seen for some TrBR sites that have not been resolved by optimising the phenological parameters.The growing season is always slightly out of phase, even after the optimisation, with the simulated SOS and EOS lagging that of the observations.The observed start of leaf growth coincides with the start of increased precipitation, as would be expected, but the simulated onset does not, despite the fact that the onset phenology model is solely driven by soil water availability.Possible causes of inconsistencies in the model will be discussed in Sect.4.4.At the end of the growing season the optimisation results in a more gradual start of leaf turnover at many sites, thus better matching the observation temporal profile, but still the end of the leaf fall lags that of the observations (Fig. 5a).Finally at some of the sites the observations show a smaller, second period of growth in some years (not shown), but the model also does not capture this.
There is no change in the median RMSE and R for the NC4 PFT with the MS optimisation, although the prior model-data misfit is relatively small (Fig. S1).The SS optimisations do result in a small reduction in the median RMSE (0.21) but no improvement in the median correlation.This is not surprising given the results of the random first guess tests presented in Sect.3.1.A comparison of the observed and modelled time series of NDVI and fAPAR at each site again reveals where the model is not able to fully reproduce the seasonality seen in the observations.At many sites the model predicts a positively biased (i.e.too late) SOS, followed by a drop in fAPAR in the first half to middle of the year that is not seen in the observations, and does not correspond to a decline in precipitation (e.g.Fig. 5b).The optimisation can result in the partial removal of this feature, particularly for SS optimisations at certain sites, but at the expense of a further delay to the start of leaf growth.Neither the SS or MS optimisations are able to reduce the model-data misfit by forcing an earlier start of senescence at any of the sites.As there is no discernible improvement after the MS optimisation of the NC4 PFT, the posterior parameters are not used in the further analysis of global changes to the gross primary productivity (GPP), GSL or trends.It is likely that the phenology models that are used in water limited ecosystems (TrBR and NC4) need revising (see discussion in Sect.4.4).

Spatial and temporal validation
Table 4 also shows the RMSE and R for the extra 15 sites that were not included in the optimisation (spatial validation), and for all sites for the period 2009-2010 that were not included in the optimisation (temporal validation).These validation exercises were performed with just the MS posterior parameter vector for each PFT.Similar magnitudes and patterns of improvement are observed between the PFTs as for the sites used for optimisation -the largest improvement is seen for the BoND PFT, and there is no improvement for NC4 grasses.These results again give confidence in the generality of the MS posterior vector and its use in regional and global scale simulations, except for the NC4 PFTs for which there was an insufficient improvement post-optimisation.

Validation at global scale with MODIS NDVI
The global median correlation between the model and the MODIS NDVI data have increased from 0.56 to 0.67 after the optimisation, demonstrating an overall improvement in the simulated fAPAR time series.As seen at the site level, the largest increase in correlation between the modelled and observed time series is for the boreal PFTs.There is also a modest improvement for natural C3 grasses (Table 5).Figure 6 shows the spatial distribution of the correlation for the both posterior simulation and the difference after optimisa- tion (posterior -prior).The difference map shows that R has mainly improved for boreal regions in the north of Canada and in Siberia, the C3 grasslands in central Asia and western North America, and the high altitude regions of the Andes (due to improvements in BoBD, BoND and NC3 PFTs) with slight changes for tropical raingreen trees in savannahdominated regions in Africa.A decrease in R was seen in part of the drylands of western North America and along the western boundary of South America.
The global median end of season (EOS) bias (modelobservations) between the model and MODIS data was reduced dramatically as a result of the optimisation (prior: 33 days; posterior 5 days).Note that a positive bias indicates the model date is later than the date derived from the MODIS data.Again, boreal PFTs and NC3 grasses showed considerable improvement, as expected from the site-level behaviour (Sect.3.2), but grid cells containing high fractions of temperate and boreal evergreen trees were also positively affected (Table 6).The bias in the start of season (SOS) dates also decreased (prior and posterior global median bias of 22 and 14 days, respectively), with improvements seen for all PFTs except TeBD trees and crops (Table 6).

Posterior parameters and processes constrained
The prior value, prior range and posterior value from the MS optimisation for each parameter (per PFT) are shown in Table 2. Figure 7 shows the prior and posterior parameter values for both the SS and MS optimisation for each parameter and each PFT.In Fig. 7 the mean and standard error of the mean of the SS posterior parameters are shown (circle with error bars), together which the value obtained at each individual site (crosses).For the prior simulation and MS optimisation the error bar corresponds to the standard deviation of the parameter value (calculated using Eq.3).The MS uncertainty is lower than spread of SS posterior values, suggesting that it underestimates the true uncertainty of the posterior parameters.This may be the case given the assumptions of linearity of the model and of Gaussian and uncorrelated errors.
The lower posterior values of K pheno_crit reduced the positive bias in the SOS dates, for temperate and boreal deciduous trees and natural C3 grasses, thus providing a better fit to the model.For all temperate and boreal PFTs the posterior value of the K lai_happy parameter, which is used to calculate the LAI value below which leaf growth is supported by the carbohydrate reserves, decreases for both the SS and MS optimisation.In ORCHIDEE the rate of leaf growth slows down after this LAI threshold is reached as the C now comes from photosynthesis, which may be limited by various factors.Lower values of K lai_happy therefore result in an earlier end to the period of rapid canopy growth at the beginning of the season and thus a smoother temporal profile follows during the final stages of growth.This partially compensates for the "box-like" model behaviour, but structural deficiencies related to spatial variability need to be addressed further (see Sect. 4.3).The lower value of K lai_happy (together with the earlier start to senescence) is also responsible for the reduction in the peak LAI at some sites for some PFTs, particularly BoND trees (see Fig. 3 for example).It may be hypothesised that the optimisation resulted in a reduction in the C allocated to the carbohydrate reserve, which therefore may also have contributed to the decrease in the peak LAI.However an examination of the simulated carbohydrate reserve showed this not to be the case (results not shown).In any case this ob- served reduction in peak LAI was not a common result of the optimisation when considering all PFTs.
The earlier start of senescence is overwhelmingly caused by an increase of T senes for all temperate and boreal trees and natural C3 grasses in both the SS and MS optimisations, as well as a lowering of the critical leaf age (L agecrit ), though this is not the case for BoBD trees (Fig. 7).It is important to remember however that the optimisations at BoBD sites also included high fractions of natural C3 grass that were optimised at the same time (see methods Sect.2.3.3) and therefore the same parameter can have strong interactions between the two PFTs, as can be seen in for the L agecrit parameter in the correlation matrices in Fig. S2.The L fall parameter has not changed considerably after the optimisation, suggesting that the rate of leaf fall matches that of the observations when parameters that govern the start of leaf senescence have been optimised.However, one exception to this is seen for TeBD trees, where the rate of leaf fall has decreased following optimisation (Fig. 4a), and thus the model better matches the observations.
The phenology of C3 grasses is also controlled by soil moisture availability.The moisture-related leaf onset parameter, Moist Tmin , does not appear to be as important as the temperature-related leaf onset parameter (K pheno_crit ).However, the increase in the value of both T senes and Moist senes post-optimisation show that both moisture and temperature conditions are responsible for the marked shortening of the growing season length for natural C3 grasses (Fig. 4), as would be expected given the wide geographical distribution of this PFT.
For the tropical raingreen forests (TrBR), there is a decrease in the posterior SS and MS values of Moist Tmin and an increase for the Moist senes parameter, which results in an earlier start of both leaf growth and senescence, again reducing the positive bias in the SOS and EOS dates predicted by the model.However the parameters are "edge-hitting", which suggests the optimisation has not necessarily found the optimum solution.This may explain why the model remains out of phase with the observations as described above (Fig. 5a).
Although the phenology of C4 grasses is governed by both temperature and moisture conditions, the fact that there is no change in the value and uncertainty of the temperaturerelated parameters, K pheno_crit and T senes shows a lack of sensitivity to both (Fig. 7), which is not surprising as the location of the C4 grasses pixels used in this study are mainly located in moisture-limited tropical regions.The SS and MS posterior values of the moisture-related parameters, Moist Tmin and Moist senes , have changed (increased), but the spread of the SS optimised parameter values is large, which accounts for the lack of improvement in the median RMSE and correlation between the time series of the model and observations (Table 4).

Change in global patterns and trends of mean annual fAPAR, amplitude and GSL
Figure 8 shows the change (posterior -prior) in the mean annual GSL, fAPAR amplitude and mean simulated fAPAR globally over the 1990-2010 period.As expected from the site-level results for temperate and boreal PFTs (Fig. 4 and Table 4), there was a strong decrease in the mean GSL in the high latitudes and grasslands across much of the NH (median of −30 and −10 days for boreal (60-90 • N) and temperate (30-60 • N) regions, respectively), as well as in equatorial Africa (median of −28 days) (Fig. 8a).An increase in the GSL of ∼ 7 days was mostly observed in the Sahel and Miombo savannah regions of Africa.A decrease in fAPAR amplitude was also seen in Siberia and in waterlimited grasslands in southern Africa, the western United States and central Asia (Fig. 8b).Although the primary aim of the assimilation was to constrain the timing of the phenology, changes in the amplitude are the result of the inter- play between parameters controlling the rate of leaf growth and fall and the timing of senescence that ultimately result in a lower maximum LAI, as discussed in Sect.3.4.The combined result was a strong decrease in the annual mean fAPAR in most regions of the globe that are not dominated by evergreen trees, crops or bare soil, except for the Sahelian region in Africa (Fig. 8c).
Figure 9 shows the linear trend (yr −1 ) in the annual mean fAPAR for the 1990-2010 period, both for the prior and posterior simulation and their difference.The trends of the annual mean are shown as it is a more comprehensive metric compared to the daily/monthly time series of fAPAR, the GSL or fAPAR annual amplitude.The large-scale spatial patterns of positive and negative trends over the 1990-2010 period were not altered significantly after the assimilation, however the strength of the trend is generally reduced.This can be seen in Fig. 9 as positive "greening" trends mostly correspond to regions where there was a decrease in the slope of the trend after optimisation (orange areas in Fig. 9c), and vice versa for negative "browning" trends (purple areas in Fig. 9c).This was not always the case however, for example the "greening" trend in central Siberia (around 60 • N) and the "browning" trend in parts of the Sahelian both increase in strength following the assimilation.Note that an examination of the trends in other metrics (SOS, EOS, GSL or annual amplitude) did reveal different changes in the spatial patterns following the assimilation.For example the increase in GSL actually increased slightly in north-western Siberia, and the decline in GSL in Mongolia (also seen as a browning trend in the annual mean in Fig. 9) increased by ∼ 2-3 days after the optimisation (results not shown).Further details are not discussed here.
We chose not to compare the simulated trends with that of the MODIS NDVI.This was partly because the 2000-2010 period is likely too short to calculate a robust trend, as the influence of inter-annual variability will be stronger; indeed the trends over the longer 1990-2010 period are more geographically distinct.Secondly it was not the aim of this study to validate the modelled trends, nor would it be appropriate, because we have used a version of the model that does not include land use change, disturbance and other effects that will contribute to changes in vegetation greenness at global scale.The results presented here serve to highlight that different parameter values can change the strength, sign and location of the trends.However, it is worth noting that the sign of the simulated trend does not always match the MODIS data, especially for drier, warmer semi-arid regions.For example, the browning trend seen in the Kazakh Steppe in the MODIS data is stronger in ORCHIDEE and extends further east into Mongolia (results not shown).Similarly, the model predicts a decline in fAPAR across much of sub-Sahelian Africa is not seen in the MODIS NDVI.The optimisation did not result in a change in trend direction for this region, though generally it did reduce its strength.Overall the greening trend in the NH (> ∼ 30 • N) was well captured by the model both before and after optimisation.

Optimisation
Whilst the SS optimisations result in a reduction in the model-data misfit across a range of sites, this study has shown that a MS optimisation can achieve a similar improvement with one unique parameter vector.This is an important result, and reinforces the conclusions of Kuppel et al. (2014) that the MS posterior parameters can be used with confidence to perform global simulations.Previous site-level optimisations of phenology models have resulted in a wide range of parameterisations of growing degree day sum, chilling requirements and light availability (Migliavacca et al., 2012;Richardson and O'Keefe, 2009); therefore it is difficult to know which values to use for regional-to-global scale simulations.In most cases the MS optimisation averages out the variability due to specific site characteristics.The validation at the site and global scale using daily MODIS NDVI data demonstrates the generality of the MS posterior parameter vectors given the ORCHIDEE model structure.This gives confidence in using these values in regional-to-global scale simulations of the carbon and water fluxes and for future predictions with this model.
Although the optimisation resulted in a dramatic improvement in the seasonal leaf dynamics for temperate and boreal ecosystems, the impact in the inter-annual variability (IAV) as a result of the optimisation for any variable -mean annual fAPAR, mean spring/autumn fAPAR, GSL, SOS, EOS -was minimal (results not shown).This was disappointing as generally there is a low correlation in the IAV between OR-CHIDEE and the MODIS data, and IAV in spring phenology has been shown to be a dominant control on C flux anomalies (Keenan et al., 2012).

Validity of the posterior parameters and optimised phenology models
As data assimilation schemes are expected to result in a reduction in the prior data-model misfit it is useful to assess if the posterior parameter values are indeed realistic, which is conditioned on the prior range as well as the interaction (correlation) between the parameters.However validating these values remains difficult, as there is no database that corresponds directly to phenology model parameters, which may have a different meaning in different models.
Leaf lifespan (LL) however can more easily be measured and therefore can be found in the literature and in plant trait databases (Kattge et al., 2011;Wright et al., 2004).Although leaf lifespan is not the same as the critical age for senescence parameter (L agecrit ), the leaf lifespan should be similar if not lower than L agecrit , and thus serves as a benchmark to a certain degree.Reich (1998) gave 4-6 months for coldtemperate and boreal broadleaved and needleleaved deciduous trees, similar to the mean values for the same functional types in the TRY database (Kattge et al., 2011).These values encompass the range of optimised values for temperate and boreal broadleaved (TeBD and BoBD) PFTs (160 and 240 days, respectively, Table 2).The posterior value of L agecrit for the boreal needleleaved (BoND) PFT was lower than this range at 90 days, and was at the lower boundary of the parameter range.The TRY database gives a mean value of 3.85 and 1.68 months for C3 and C4 grasses, respectively (Kattge et al., 2011).These values are higher than the posterior value of the C3 grasses (60 days) and lower than the posterior value for C4 grasses (166 days).The values of L agecrit for NC3 and BoND PFTs were therefore too low and it is also notable that the values of T senes were also at their upper bounds for these two PFTs.
Most of the posterior values for the moisture-related phenology parameters for the tropical broadleaved raingreen and natural C4 grasses (TrBR and NC4) PFTs were also at their upper or lower bounds, even with liberal parameter bounds due to lack of prior knowledge.For example the posterior value of Moist Tmin , the number of days since the last moisture minimum before leaf onset, was unrealistically low at 10 days for TrBR trees.In turn the threshold of relative soil moisture needed for senescence (Moist senes ) was 0.8 and 0.7 % for TrBR and NC4 PFTs, respectively, which is likely too high.Assuming the prior range of the parameters does encompass the likely variability, such "edge-hitting" poste-rior parameter values can indicate that other processes may be missing.Indeed certain studies (Galvagno et al., 2013;Rosenthal and Camm, 1997) have suggested that photoperiod is also important in determining autumnal senescence in deciduous conifers (e.g.BoND PFT).Possible deficiencies in the models that control tropical deciduous phenology are discussed further in Sect.4.4.
In order to better evaluate the model parameterisations, we suggest that the phenology observation and modelling community could engage in an elicitation exercise (O'Hagan, (1998(O'Hagan, ( , 2012)); http://www.tonyohagan.co.uk/ shelf/).This would involve gathering knowledge on parameter ranges of typical phenological model parameters from experts in the field, in order to derive probability distributions of the parameter values.However, even if this exercise is carried out, measurements made at the species and site levels will be difficult to scale across species and to the coarse resolution used in TBMs.
The onset models used for temperate and boreal PFTs in ORCHIDEE are mostly simple spring warming models, though some include a chilling requirement (Table 1).Their comparative ability to reproduce the observations could add to evidence that more complex representations including light availability may not be needed (Fu et al., 2012;Picard et al., 2005;Richardson and O'Keefe, 2009).Other studies however have showed improved performance when a photoperiod term was included for species with a late leaf onset (Hunter and Lechowicz, 1992;Migliavacca et al., 2012;Schaber and Badeck, 2003).Several authors (e.g.Fu et al., 2012;Linkosalo et al., 2008) have pointed out that whilst the simple warming onset models may perform well under current climate conditions, future predictions may require additional complexity; for example the model-defined chilling period may not be sufficient in warmer conditions.More importantly perhaps, any model that requires a fixed start date from which thresholds are calculated may be inconsistent under increased temperatures, as warming will start before the defined start date (Blümel and Chmielewski, 2012).Ideally therefore these models should also be tested under future warming scenarios, although (Wolkovich et al., 2012) showed the magnitude of species' phenological response to temperature increases is lower in warming experiments compared to historical observations.

Accounting for spatial variability
The coarse-resolution observations used in this study will include the spatial variability in the timing of phenological events for different species within a PFT, or even due to specific site characteristics (edaphic conditions, terrain/slope, local meteorological effects) within the same species (Fisher and Mustard, 2007).During senescence in particular, spatial variability in the rate of leaf fall, and to some extent the leaf colouration, will likely contribute to the decline in "greenness" seen in NDVI observations.ORCHIDEE does not exwww.biogeosciences.net/12/7185/2015/Biogeosciences, 12, 7185-7208, 2015 plicitly represent this variability, instead vegetation is represented as a mean stand (which is also responsible for the unnatural "box-like" temporal profile seen for some sites/PFTs) and thus the posterior value of L fall parameter likely accounts for such missing structural processes and sub-grid variability.This could explain why the SS optimisations for the TeBD PFT result in a wide variety of values for this parameter (Fig. 7).This leads us to question whether phenology should be optimised at the species level (Chuine et al., 2000;Olsson et al., 2013), and/or whether there is a need to include a term that accounts for spatial variability in the model (e.g.Knorr et al., 2010).If the phenology models were optimised at species level there would need to be an increase in the number of points included in the optimisation in order to properly account for the variability between grid cells, which may result in strong correlations between the same parameters shared by the different species (e.g.Fig. S2c).On the other hand, prescribing a spatial variability term is a non-trivial issue, as it would not only encompass physiological differences between species but also variation in other site characteristics, as mentioned above.These issues have previously been discussed (e.g.Morisette et al., 2009;Bacour et al., 2015).

Phenology in ecosystems driven by water availability
The processes that govern leaf phenology in ORCHIDEE cannot reproduce the observations as well in regions where moisture availability, and not temperature, is likely the dominant control.The optimisation has revealed structural deficiencies as the probable cause of inaccurate simulations, rather than incorrect parameter values.In addition to problematic, "edge-hitting" posterior parameter values (see Sect. 4.2) the prior and posterior model incorrectly simulates a strong decline in vegetation productivity in the Sahelian region, which is opposite to that seen in satellite observations (results not shown, but see Traore et al. (2014a).Traore et al. (2014a) suggested that incorrect trends predicted by OR-CHIDEE could be related to the phenology models, however this study shows that is not the case.Even without a comparison to observations, ORCHIDEE does not always appear to have the correct response to precipitation, with two periods of simulated growth seen in one rainy season but without any decline in rainfall.This unexpected model behaviour is not corrected by the optimisation and needs investigating.
It is likely that the phenology models in ORCHIDEE are too simplistic for these regions and/or that the computation of soil water availability or the plant water stress function are inadequate.Such issues are likely encountered in other TBMs as they rely on similar models.Knorr et al. (2010) pointed out that evaporative demand, and not just moisture availability, should be considered in phenology models.(Traore et al., 2014b) evaluated the inter-annual variability of the soil moisture of ORCHIDEE across Africa using satellite-derived estimates, and found that the new 11-layer hydrology version performed better than the 2-layer version that was used in this study.The latest version of ORCHIDEE (Naudts et al., 2015) has a more mechanistic representation of plant water stress using water potential in the soil-plant continuum, which may lead to better predictions of leaf dynamics in drought-prone regions.
Although there are fewer studies focusing on the modelling of plant moisture-availability driven phenology, some models do exist for the dry tropics/semi-arid regions.Such models aim to simulate the vegetation response to soil and groundwater availability or atmospheric demand, both empirically (Archibald and Scholes, 2007;Do et al., 2005), or in a more mechanistic way by including non-linear feedbacks via transpiration (Choler et al., 2010).Similar approaches could be included in TBMs in order to better represent leaf growth and turnover in semi-arid grassland and savannah ecosystems in the dry tropics.

Importance of constraining phenology for global change studies
Observed increases in GSL and/or increases in vegetation density have been shown to result in concurrent impacts on the C surface fluxes on seasonal time scales (Dragoni et al., 2011;Piao et al., 2007;Richardson et al., 2009), although the magnitude and sign of the effect on net C fluxes is still a topic of debate (see Barichivich et al., 2013;Keenan et al., 2014;Piao et al., 2008;Richardson et al., 2010;White and Nemani, 2003).An in-depth analysis of the impact of the modified leaf phenology on the C, water and energy cycles and the subsequent feedbacks to the atmosphere was beyond the scope of this study.However, the changes in leaf phenology described above resulted in a ∼ 10 PgC yr −1 decrease in the simulated global mean annual GPP (uptake of C) over the 1990-2010 period (prior: 172.2 PgC yr −1 , posterior: 162.5 PgC yr −1 calculated using PFT fraction-and area-weighted fluxes for each grid cell).Reductions in GPP (results not shown) follow the same global spatial pattern as the difference in annual mean fAPAR (Fig. 8c).The decline in GPP is predominantly caused by the shorter GSL in the high latitudes and grasslands across the NH (median of −30 and −10 days for boreal and temperate regions, respectively), and in equatorial Africa and western South America (Fig. 8a).The exception to this is eastern Siberia, where the decrease in fAPAR amplitude, due to the lower peak LAI for BoND trees discussed in Sect.3.4, contributes to an even stronger reduction in GPP.In east Africa and South America, the lower fAPAR amplitude is predominantly responsible for the decrease in GPP.The mean annual GPP only increased in the Sahel and northern Australia, which is due to the fact that the optimisation resulted in an increase in both the GSL and fAPAR amplitude in these regions (Fig. 8).The decrease in GPP per day change in the GSL ( GPP / GSL) was ∼ 3-4 gCm −2 d −1 for boreal and temperate regions and an increase of ∼ 2 gCm −2 d −1 in the Sahel.Large variations in this ratio are seen across the NH depending on the PFT in question, owing to their different physiologies (TeBD: 31 gCm −2 d −1 ; BoBN: 10 gCm −2 d −1 ; BoBD: 5 gCm −2 d −1 ; NC3: 3 gCm −2 d −1 ).The reduction in the mean annual GPP (1990GPP ( -2010) ) partially accounts for the large positive bias in ORCHIDEE compared to data-driven global estimates of ∼ 120 PgC yr −1 (Jung et al., 2011).Note also that the increase in time series correlation and reduction in the bias of SOS and EOS dates have resulted in a improvement in the timing and magnitude of the simulated GPP and ET compared to the data-driven product (Jung et al., 2009).We suggest therefore that the bias in the timing of simulated onset and senescence and GPP estimates in other TBMs, as seen in the Richardson et al. (2012) model intercomparison for example, would be reduced if the phenology-related parameters were optimised using a similar framework.Also, the impact on energy and water fluxes in particular could result in strong feedbacks to climate (Peñuelas et al., 2009), possibly leading to different predictions under future warming.
Aside from making predictions of the carbon, water and energy budgets, TBMs are routinely used in trend attribution studies.A good example in this context would be the exploration of the causes of "greening" or "browning" trends in vegetation productivity (Hickler et al., 2005;Piao et al., 2006), or the impact of such trends on resource use efficiency (Traore et al., 2014b) or the C cycle (Piao et al., 2007).The fact that the optimisation resulted in changes in the strength and location of these trends (Fig. 9) demonstrates that such analyses are partly dependent upon model parameters, which can be a considerable source of uncertainty (Enting et al., 2012).

Conclusions
This study has demonstrated that a time series of normalised coarse-resolution satellite NDVI data can be used to optimise the parameters of phenology models commonly used in TBMs, and crucially that a multi-site optimisation can find a unique parameter vector that enables better predictions of the seasonal leaf dynamics at global scale.This type of model calibration framework is thus imperative for Earth system model development.The results also highlight that optimising the parameters allows model developers to distinguish between inaccurate model representations resulting from incorrect parameter values and model structural deficiencies.In ORCHIDEE the models used for predicting the leaf phenology in temperate and boreal regions are able to reproduce the seasonal cycle of the vegetation well after calibration, but ecosystems driven by water availability require further modification, particularly for natural C4 grasses.The optimisation also led to changes in the strength and location of "greening" and "browning" trends in the model, suggesting caution should be exercised when using un-calibrated models for trend attribution studies.Furthermore, the observed trends were not well captured in some regions, which is a key aspect to improve upon when considering future simulations of climate, CO 2 and anthropogenic change.In temperate and boreal regions the onset of leaves is mainly driven in the spring by an accumulation of warm temperatures (see the review of Hänninen and Kramer, 2007).The well-known growing degree day (GDD) model (Chuine, 2000) sums up the temperatures above a given temperature threshold, T 0 (for example 0 • C), starting at a given date (for example the first of January in the Northern Hemisphere).The onset of leaves starts when the GDD reaches a plant/PFT-specific threshold.In ORCHIDEE, T 0 is −5 • C, and the GDD sum is calculated from the beginning of the dormancy period, which starts when the leaves were lost or when GPP decreased below a certain threshold.
This simple model may be refined.For example the GDD threshold has been reported to depend on a "chilling requirement" for some species; i.e. their physiology requires cold temperatures to trigger the mechanism that will allow budburst to occur (e.g.Orlandi et al., 2004).This ensures that the dormancy has been broken after a required cold period, and thus prevents a too-early awakening.The Number of Chilling Days (NCD) GDD model initiates leaf onset earlier with an increase in the number of chilling days, defined as a day with a daily mean air temperature below a PFT-dependent threshold accumulated after a given starting date (e.g.Botta et al., 2000;Cannell and Smith, 1986;Murray et al., 1989).The GDD threshold therefore decreases as NCD increases.This experimental relationship is a negative exponential with three PFT-specific parameters (Botta et al., 2000): where A 0 , A 1 and A 2 are parameters that have been calibrated against satellite data (Botta et al., 2000).The growing season begins if the daily calculated GDD is higher than the calculated GDD threshold .This model (hereafter referred to as the "NCD_GDD" model) is used for the temperate and boreal broadleaved deciduous (TeBD and BoBD) PFTs in OR-CHIDEE.
The start of the growing season for the boreal needleleaved deciduous (BoND) PFT occurs when the number of growing days (NGD), i.e. days with a mean daily temperature above the threshold temperature, T 0 (−5 • C), has exceeded a PFT-dependent threshold (NGD threshold ).The NGD is calculated from the beginning of the dormancy period.This model (hereafter referred to as the "NGD" model) was proposed by Botta et al. (2000) for boreal and arctic biomes, and is designed to initiate leaf onset after the end of the soil frost.
For C3 and C4 natural grasses and crops (NC3, NC4, AC3, AC4), the GDD threshold is given by a second-degree polynomial of the long term mean annual air surface temperature T l : where B 0 , B 1 and B 2 are PFT-dependent parameters.In addition, leaf onset is initiated only when a moisture availability criterion is met, namely when the moisture minimum occurred a sufficiently long time ago, as specified by a PFTdependent threshold parameter (Moist Tmin ): time since moisture minimum > Moist Tmin . (A3) This moisture availability criterion corresponds to Model 4b in Botta et al. (2000), who assume leaf onset in tropical biomes requires a certain amount of accumulation of water in the soil.Both the moisture availability and GDD threshold criteria (hereafter referred to the "MOI_GDD" model) must be met for leaf onset to occur in grasses and crops.
For the tropical broadleaved raingreen (deciduous) (TrBR) PFT, the start of the growing season depends only on the moisture availability criterion (hereafter referred to as the "MOI" model) previously described for grasses and crops.
When the onset of leaves is declared, the allocation module first allocates carbon from the carbohydrate reserves towards leaves and roots as long as the LAI is lower than a given threshold, which is a function of the maximum LAI, LAI max , a PFT-dependent value: LAI threshold = 0.5LAI max . (A4) The onset parameter values are listed in Table 2.

A2 Leaf age and leaf senescence
To account for the fact that the photosynthetic efficiency of where B l is the leaf biomass lost through this aging process and t is the daily time step.
The second turnover process is a senescence process (the end of the growing season and the shedding of leaves) that is based on climatic conditions.This only exists for deciduous PFTs.For tree PFTs whose senescence is driven by sensitivity to cold temperatures (TeBD, BoND, BoBD and grasses), senescence begins when the monthly air surface temperature goes below a threshold temperature, defined as a second order polynomial of the long-term mean annual air surface temperature T l : where C 0 , C 1 and C 2 are PFT-dependent parameters.
For phenology models that are driven by soil moisture conditions ("MOI" models -see Appendix A and Table 1) the parameter that controls leaf onset is the "minimum time since the last moisture minimum" (Moist Tmin ), and the parameters that control senescence are Moist senes and Moist no_senes , the critical moisture levels below and above which senescence does and does not occur, respectively.These PFT-dependent parameters are optimised directly; i.e. no effective parameters are introduced to scale the original ORCHIDEE parameter.
For grasses, the climatic senescence is controlled by both temperature and moisture conditions.The senescence parameter prior values are listed in Table 2.Note that no senescence models in ORCHIDEE currently include a photoperiod term for either onset or senescence.

Figure 1 .
Figure 1.Schematic to show how the optimised parameters control the timing of the leaf phenology in ORCHIDEE.The dotted arrow shows that the temperature and moisture threshold for senescence also affects the rate of leaf fall for grasses by slowing down the turnover rate once this threshold has been reached (whereas for trees only the L fall parameter is used).

Figure 2 .
Figure 2. Global distributions of fractional cover for the six PFTs optimised in this study.Red upright triangles mark the location of the optimisation sites, and yellow upside-down triangles mark the location of the validation sites.

Figure 3 .
Figure 3.Time series (zoom to 2003-2007) for one example BoND PFT site (72 • N, 120.24 • E) of (a) the normalised modelled total fA-PAR and MODIS NDVI data; (b) the un-scaled model total fAPAR and MODIS NDVI data; (c) the corresponding modelled LAI for the BoND PFT only.The black curve corresponds to the MODIS NDVI data, the blue curve is the prior model simulation, the orange curve shows the model simulation using the posterior parameters from the SS optimisation, and the red curve corresponds to the model simulation at this site using the MS posterior parameter values.The prior and posterior RMSE and R are given in the upper left box.The MODIS NDVI and model fAPAR time series were normalised to their maximum and minimum value over the 2000-2008 period for the optimisation (see Sect. 2.2.1).

Figure 4 .
Figure 4.The mean seasonal cycle of the normalised modelled fA-PAR before and after optimisation, compared to that of the MODIS NDVI data, for the temperate and boreal deciduous PFTs (TeBD, BoBD, BoND and NC3).Black = MODIS NDVI data; blue = prior model; orange = single-site optimisation; red = multi-site optimisation.

Figure 5 .
Figure 5. Example time series (2002-2008 period) for the tropical deciduous PFTs for which phenology is driven by moisture availability.The two panels compare the normalised simulated fAPAR to the normalised MODIS NDVI (black curve) prior to (blue curve) and after the optimisations (orange curve = SS optimisations; red curve = MS optimisation) for a (a) TrBR tree site (5.77• S, 25.92 • E) and a (b) NC4 grass site (10.08 • N, 4.32 • W).The prior and posterior RMSE and R are given in the upper left boxes.The grey vertical lines show the daily precipitation.

Figure 6 .
Figure 6.Global maps showing the correlation between the simulated fAPAR and MODIS NDVI data in the weekly time series for the posterior simulation (left column) and the difference after optimisation (posterior -prior) (right column).Note POST refers to the posterior simulation after optimisation, and PRIOR to the simulation using the standard parameters of ORCHIDEE.

Figure 7 .
Figure 7.The prior (blue), MS posterior (red) and SS posterior (orange) parameter values (circles) and uncertainty (error barsance calculated in Eq. 3) for each parameter and each PFT.For the SS optimisations the circle error bars represent the mean and standard error of the mean of all sites, and the crosses give the posterior values for each site.Refer to Table 1 for a description of the PFTs and Fig. 1 and Appendix A for a of each parameter.The y axis range represents the maximum upper and lower bounds for each parameter across all PFTs, and the horizontal dashed lines represent the parameter range for each individual PFT.

Figure 9 .
Figure 9. Linear trend (yr −1 ) in the annual mean of the simulated normalised fAPAR for the 1990-2010 period: (a) prior simulation; (b) posterior simulation; (c) difference between the prior and posterior (posterior -prior).

Table 1 .
Standard PFTs used in ORCHIDEE, their short name and corresponding phenology model (see Appendix A for a full description of the phenology models).Evergreen and agriculture PFTs do not have a specific phenology model in the ORCHIDEE TBM.
averaged at the model forcing spatial resolution (0.72 • ) for each time step.The data have a noise range of ∼ 0.025 to 0.03, with highest values in densely forested areas The time series were interpolated on a daily basis, in order to account for any missing values due to cloud, using a third degree polynomial and considering the 10 nearest valid acquisitions, with a maximum allowed difference of 15 days.The NDVI values were then spatially www.biogeosciences.net/12/7185/2015/Biogeosciences, 12, 7185-7208, 2015

Table 2 .
ORCHIDEE parameters optimised.For each PFT the prior values, minimum and maximum values (in squared brackets) and multisite posterior mean values (in bold) are given.Note the prior uncertainty on the parameters is defined as 40 % of the full parameter range.

Table 3 ,
3rd column).The skill of the optimisation algorithm is highly dependent on the PFT in question; the lowest final value of the cost function and the highest % reduction (normalised to www.biogeosciences.net/12/7185/2015/Biogeosciences, 12, 7185-7208, 2015

Table 3 .
Metrics to describe the ability of the optimisation algorithm to find the global minimum of the cost function for the MS optimisation using 20 different random "first guess" parameters.The 2nd column shows the final value (after 25 iterations of the BFGS optimiser) for the random test that resulted in the lowest value of J (x)(the cost function), which corresponds to the highest % reduction in J (x).The 3rd column gives an indication of the distribution of the final values for all 20 tests with respect to the lowest value obtained (given in column 1).This is represented as the difference between the final value of J (x) for each test and the minimum value of J (x), divided by the same minimum value.The 4th column shows the % reduction of J (x) normalised to the value of J (x) for the default parameters of ORCHIDEE.The two values for the 3rd and 4th columns represent the interquartile range (25th to 75th percentiles) for the spread across the 20 random tests.

Table 4 .
Prior and posterior median (across sites) RMSE and R (and median reduction of RMSE in % in brackets) for the 15 sites included in the optimisation and for the 15 sites used for spatial validation, for each PFT.For the temporal validation all 30 sites are included, but only for the 200-2010 period.For the spatial and temporal validation, the posterior MS parameter vector was used.

Table 5 .
Median prior and posterior correlation between modelled fAPAR and satellite NDVI daily time series and inter-annual anomalies of the annual mean.The metrics are computed for each PFT for grid cells that contained > = fraction that was used to select the sites for the PFT in question (seeSect.2.3.3).

Table 6 .
Median prior and posterior bias between model-and observation-derived start of season (SOS) and end of season (EOS) dates (model -observations).The metrics are computed for each PFT for grid cells that contained > = fraction that was used to select the sites for the PFT in question (seeSect.2.3.3).A negative bias indicates the modelled date is earlier than the one calculated from the observations.
leaves depends on their age, L age , they are separated into four age classes.Biomass newly allocated to leaves goes into the first age class and leaf biomass, B l , is then transferred from one class to the next based on a PFT-specific critical leaf age value, L agecrit .The long-term reference temperature modulates the critical leaf age of grasses in order to account for the fact that leaves can live longer in colder climates.The leaf age continually affects the turnover of the leaves both during the growing season and once senescence has begun.Different turnover processes control leaf fall, the first one is a simple aging process based on the L agecrit parameter.For trees, when leaf age is greater than half the critical leaf age a turnover rate is applied following