Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model

Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, [...]

Abstract. Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood.
For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individualbased model of tropical forest dynamics.
We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from Approximate Bayesian Compu-tation (ABC), another method commonly used to generate simulation-based likelihood approximations.
Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.

Introduction
Parameter estimation of process-based models is a longstanding problem in many scientific disciplines. Early proponents of process-based modeling in ecology have stressed the importance of deriving predictions from basic physical processes, with physical parameters that can be experimentally determined (Bossel, 1992). In practice, however, for various reasons ranging from time limitations to fundamental observability restrictions, most process-based models have parameters for which direct measurements are not available . These parameters need to be estimated inversely, meaning that they are adjusted by comparing model outputs to observed data.
To make this comparison, Bayesian methods have become increasingly popular in ecological research during the last decade (e.g. O'Hara et al., 2002;Clark, 2005;Purves et al., 2 Hartig et al.: Approximate bayesian parameterization of a tropical forest model 2007; Higgins et al., 2012). In addition to their flexibility and explicit treatment of parameter uncertainty, a particularly appealing property of Bayesian statistics is that they offer the possibility of combining existing information on likely parameters values with the information that is generated inversely . As with other inverse parameterization approaches, Bayesian methods require the definition of a metric that quantifies how well model predictions fit to the observed data. In non-statistical inversion approaches, such metrics are often called goal functions, objective functions or cost functions (e.g. Schröder and Seppelt, 2006). Bayesian approaches use a particular statistical metric, the probability of obtaining the observed data given the current model and parameter values, usually referred to as the likelihood.
Most previous applications of Bayesian statistics to process-based ecological models derive this probability by making distributional assumptions about how observations vary around mean model predictions that are independent of the processes in the model, either ad-hoc or based on the observed variance in the data (e.g. Martínez et al., 2011;van Oijen et al., 2013). This is usually justified with the idea in mind that there is observation uncertainty or variability in environmental conditions that is not accounted for in the model. The approach of constructing likelihoods from such assumptions is the current state-of-the-art, but it has a major limitation: many process-based ecological models are already stochastic and predict variability of certain model outputs. In principle, one would prefer using this variability for deriving the likelihood, because it is based on our mechanistic understanding of the ecological system and accounts, for example, for the possibility that expected variability may change with the parameters of the ecological processes. Moreover, once we base our likelihood on the outputs of the stochastic model, additional observer submodels that describe how field data was collected could easily be added (e.g. Zurell et al., 2009). However, while theoretically possible, calculating likelihoods for such complicated stochastic interactions used to be intractable in practice (Hartig et al., 2011).
This technical limitation has been reduced in recent years by novel simulation-based approximation techniques that allow practically any stochastic model to be treated in a formal statistical inference framework. Of those, Approximate Bayesian Computation (ABC Beaumont, 2010) has arguably attracted most attention, but there are other approaches as well. Their common principle is very simple: what is needed for including a stochastic simulation model in a formal inferential framework is the likelihood p(D|M (φ)) for an outcome D to occur under a model M with parameters φ (Diggle and Gratton, 1984). Simulation-based likelihood approximations estimate this probability by generating draws from the stochastic model. Subsequently, different methods are used to approximate the likelihood or posterior (Hartig et al., 2011). Often, this involves comparing the model output and observed data by means of data aggregations, also called pat-terns (Wiegand et al., 2004;Grimm and Railsback, 2012) or summary statistics (Beaumont, 2010;Wood, 2010). For brevity, we will refer to these methods in general simply as likelihood approximations, or, in the context of a Bayesian analysis, as approximate Bayesian methods.
The potential of likelihood approximations in ecology has been repeatedly stressed, but applications to community or population ecology are still rare (but see Chave, 2009, 2011;May et al., 2013). To our knowledge, there is no previous study that applies likelihood approximations to a computationally expensive, parameter-rich model simulating an ecological community.
The aim of our study is to show that simulation-based likelihood approximations can be successfully applied to complex process-based models. We use a simulation-based likelihood approximation proposed by Wood (2010) to infer the parameters of FORMIND, an individual-based model of tropical forest communities. We first fit to virtual inventory data that were generated from the model with known parameters. This allows us to test whether the method can correctly identify all model parameters for different kinds of field data, and to examine how choice and aggregation (summary statistics) of data affect the results of the inference. Finally, we apply the method to fit the model to field data from a tropical montane forest in Ecuador.

Forest gap dynamics and the FORMIND model
Forests ecosystems are locally highly dynamic. One of the most prominent drivers of these dynamics, particularly in the tropics, are natural disturbances, where large trees that have lost stability due to mortality or other factors fall and damage or kill other trees. Gap formation creates a dynamic mosaic of light-filled gaps in natural forests (e.g. Shugart, 1984;Mc-Carthy, 2001). Within these gaps, pioneer species colonize first, until other species take over and continue the successional dynamics that are thought to be one part of the explanation for forest diversity (e.g. Kohyama, 1993).
Mechanistic forest models that describe the processes of gap formation and recovery have a long history in ecology (Pacala et al., 1996;Bugmann, 1996;Shugart, 1998;Huth and Ditzer, 2000). These models typically include several tree species with different growth properties and light demands. For highly diverse systems such as tropical rainforests, species are usually grouped into plant functional types (PFTs) that represent a group of species with similar functional properties. Parameters and model predictions per plant functional types then represent a mean over the species that are represented by this type. Gap formation by falling dead trees maintains the modeled forest in a dynamic equilibrium. As a result, forest gap models do not merely predict a mean value for outputs such as biomass, species composi- Table 1. Important model parameters and their interpretation. The parameter dbhm in the last line refers to the maximum diameter of a tree, which is a species or PFT specific parameter of the model (see Dislich et al., 2009 tion, or tree size distributions. Rather, they deliver samples of different possible values for these outputs and therefore allow probabilities to be assigned to different community or biomass states. These predictions of spatio-temporal variation in community composition is what we will use later to derive a probabilistic measure of distance between the model output and observed data. FORMIND, the forest model used for this study, is a stochastic, individual-based forest model designed in the tradition of classical forest gap models (Köhler, 2000). It has been applied for estimating forest succession, variability and disturbances impacts in various tropical locations around the world (e.g. Rüger et al., 2007;Köhler and Huth, 2010;Dislich and Huth, 2012;Gutiérrez and Huth, 2012). The simulation area (plot) in FORMIND, which can be of variable size (we use 1 ha throughout the paper) is subdivided into 20 × 20m grid cells. Tree individuals are assigned to one of these cells and interact with each other on the cell, but do not have an explicit spatial position within the cells. The model state is entirely described by species respectively functional type, size (measured in diameter at breast height dbh), and location (cell) of all trees. Other variables, such as tree height and crown dimensions, are derived through fixed allometric relationships.
At each time step (we use 5-year time steps), the light climate in each cell is calculated from the trees on that cell and their respective crowns. Subsequently, establishment (lightdependent, stochastic), mortality (stochastic) and tree growth (light-dependent) act on all tree individuals. Important parameters in the model (Table 1) are recruitment and mortality rates, parameters that describe the size-specific maximum growth rates, and the allometric relationships that determine height and crown dimensions. Details of these processes, together with a more detailed description of the model scheduling, are provided in the supplementary material (see also Köhler, 2000;Dislich et al., 2009).

Bayesian parameter estimation with simulationbased likelihood approximations
We use a Bayesian approach for parameter estimation. One of the advantages of using Bayesian methods with Markov Chain Monte Carlo (MCMC) sampling for simulation-based likelihood approximations is that MCMCs, unlike optimization approaches, are more robust towards variance in likelihood estimates generated by the approximation (Hartig et al., 2011). Bayesian methods are also somewhat better suited to dealing with interactions between parameters, which is a phenomenon to be expected in process-based models. In principle, however, one could use the likelihood approximation used in this study with an optimization algorithm in a maximum-likelihood framework as well.
There are a number of introductions to Bayesian statistics. A detailed reference is Gelman et al. (2003), for a shorter introduction see Ellison (2004). We give only a brief summary here. The outcome of a Bayesian inference is a probability distribution P (φ|D obs ) for the parameters φ given the observed data D obs . This distribution, called the posterior, is calculated as where c is a normalization constant, the prior probability density p(φ) quantifies parameter uncertainties before comparing the model to the observed data, and the likelihood function p(D obs |M (φ)) describes the probability of obtaining the observed data conditional on the model M with parameters φ. Broadly speaking, we may say the likelihood quantifies the quality of the fit, while the prior quantifies our prior expectation for each possible parameter value.
Because our main concern in this paper is the approximation of the likelihood, we chose wide uniform (flat) priors for all parameters and data types, which means that the pos-   (black), standard deviation (gray) and minimum/maximum values (light gray) for the biomass of a 1 ha plot over 10,000 years, starting from an empty plot. Panel (b) shows the same mean equilibrium biomass (black) and two standard deviations (gray), but as a function of the mortality of the latesuccessional type PFT 3; all other parameters constant. Comparing the observed biomass from panel (a), which was created with a mortality rate of 0.005, with the predicted biomass for different mortality rates, we can infer the original value as well as a statistical uncertainty, without having to define a statistical model. terior and likelihood are strictly proportional to each other across the possible prior range. Tables with the widths of these uniform priors are provided in the supplementary material. Given that we knew that the model reacts nonlinearly to many parameters, other uninformative priors choices would have been possible (e.g. Kass and Wasserman, 1996), but we felt for the purpose of our study it is more useful to ensure proportionality of likelihood and posterior to facilitate interpretation of the results.

Generating approximate likelihoods
The technical key novelty in this study is the definition of the likelihood p(D obs |M (φ)). In "conventional" Bayesian or maximum likelihood studies, this conditional probability is obtained by formulating an error model that quantifies probabilities of deviations between model predictions and observations ocurring (e.g. Van Oijen et al., 2005). This model may be mechanistically motivated, for example by knowledge about measurement uncertainties. In practical situations, however, there are usually a number of error sources that interact, and error models are therefore typically either fixed ad hoc (van Oijen et al., 2013) or derived from the observed variability in the data (Martínez et al., 2011). Hence, "conventional" likelihoods are usually independent of the mechanisms in the process-model that is fit.
Our approach goes beyond such an independent error model towards an approach where both the mean model prediction, but also the probability of observing deviations from the mean, are derived from the same stochastic ecological processes. This is particularly promising in systems where process-stochasticity dominates observation errors. For inventory data from tropical forests, this is generally the case. Given typical observation errors (see Chave et al., 2004), we can assume that, for small plots, observation uncertainty is small compared to local biomass variation due to successional dynamics (e.g. Chave et al., 2003). FORMIND simulations of the aforementioned successional dynamics triggered by gap formation explain the extent of this variability well (e.g. Köhler and Huth, 2010) and can therefore be used to generate statistical expectations for model outputs such as biomass conditional on the model parameters (Fig 1).
Several techniques have been suggested for achieving likelihood approximations from stochastic simulations. Most prominent is arguably the method of Approximate Bayesian Computation (ABC Csilléry et al., 2010;Beaumont, 2010), which has attracted much attention in recent years. However, as discussed in Hartig et al. (2011), there are a number of closely related methods that are currently not counted as examples of ABC, but that apply similar principles. In this study, we use the method of "synthetic likelihoods" suggested by Wood (2010), classified as a "parametric likelihood approximation" in Hartig et al. (2011).
The principle of this method is to estimate p(D obs |M (φ)), for any φ desired, by fitting a parametric distribution to the output of the stochastic simulation, and estimating the probability of obtaining D obs from this distribution (Fig. 2). We  used a multivariate normal distribution because it fitted well to the simulation outputs, and allows a convenient estimation of the covariance structure, but normality is by no means a fundamental requirement of the approach. For the multivariate normal approximation, the likelihood of obtaining the observed data D obs with model M and parameters φ is Here, c = (2π) −k/2 , k being the dimension of D obs ,d sim (φ) is the corresponding vector of mean simulation outputs, Σ sim (φ) the covariance matrix of the simulation outputs that summarizes variability of and correlations between simulation outputs, and |Σ sim (φ)| the determinant of the covariance matrix. Pseudocode for the entire parameter estimation algorithm is provided in the supplementary material.

Representation of the data
As in ABC, it is desirable to represent the data used in eq. 2 in a low-dimensional form so that the estimation particularly of Σ −1 sim (φ) can be achieved in a computationally efficient way. The challenge here is to find lower-dimensional aggregations (summary statistics) of the data that still contain the same amount of information for the purpose of the inference as the raw data (sufficiency). Unfortunately, there is still no generally accepted rule on how to find good summuary statistics (but see Fearnhead and Prangle, 2012; Blum et al., 2013). We therefore decided to use mainly two aggregations that have been frequently used for summarizing inventory data in forest modeling, and tested their information content by fitting the model to simulated data. The first aggregation is using stem size distributions, which count the number of tree individuals per (in our study 10cm) size class per PFT or for all trees. The second is the size specific mean growth, which quantifies the mean stem diameter growth for different size classes. We also experimented with other forest attributes or aggregations of the data (see Table 2).

Posterior estimation
Subsequent posterior estimation based on the approximate likelihood was done with an adaptive Metropolis-Hastings MCMC (Haario et al., 2001). We always ran several chains and checked convergence visually and with Gelman-Rubin diagnostics (Gelman and Rubin, 1992, see supplementary material for further details). As several seconds were typically required to evaluate a single parameter combination with FORMIND, posterior estimations cost substantial computing time. The exact number, length and burn-in of chains are provided in the figure captions of the supplementary material. Fig. 2 provides a visual summary of the analysis method.

Field data, model setup and analysis
We used two data sets to fit the parameters of the model, a "virtual" 1-ha inventory with three plant functional types (PFT1 = pioneer, PFT2 = mid-successional, PFT3 = late successional) that was created from the FORMIND model itself (which has the advantage that the "true" parameter values are Table 2. Overview of parameter estimations with different models, parameters and summary statistics. Abbreviations for the data: SSD = stem size distribution (16 10-cm classes), GRO = mean stem diameter growth for each of the 16 10-cm stem diameter classes, BM = biomass. If not stated otherwise, the data type was available for each PFT separately. If we use the mean over all PFTs, we label this by "total". "Full parameters" means that all parameters listed in Tables 1,2 of the supplementary material are estimated inversely. Reduced parameters means that only recruitment, mortality, maximum growth and maximum growth diameter are estimated. "Number of parameters" and "Data dimension" give the number of parameters and data points, respectively. "Posterior width" measures the posterior width of the marginal distributions by the ratio between marginal posterior standard deviation and uniform prior width averaged over all parameters. "Convergence ranking" provides a ranking of the speed of convergence of the MCMCs based on the convergence diagnostics discussed in the supplementary material. Lower numbers indicate fastest convergence. As E1 uses different data and a different number of PFTs, the convergence ranking is not fully comparable and was set in parenthesis. known), and a 5 ha forest inventory from a montane tropical rainforest in Ecuador that is described in (Dislich et al., 2009). The purpose of the virtual dataset is to test the parameter estimation method for different data types in a situation where true parameters are known, while the data from Ecuador provides a realistic case study that allows us to test the method in a situation that had previously been dealt with by manual calibration based on visual assessment of model fit.

LABEL
To create the "virtual" inventory, we used a base parameterization that was adjusted for exhibiting biomass values and successional patterns typical to a wet tropical lowland rainforest. With this setting, we simulated 1000 model runs, and created virtual datasets from the mean equilibrium values of these replicates for different types of output variables (summary statistics) such as biomass, stem diameter growth rates and stem size distributions. We also experimented with a different number of parameters to be estimated. A summary of these options, labeled V1-V5, is provided in Table 2. For complex models, it can usually not be known a priori which data types are sufficient for a particular inferential question, and we therefore have to test this with virtual data (see also Jabot and Chave, 2009). The number of estimated parameters, on the other hand, is more a practical issue: from our understanding of the processes, it was foreseeable that FOR-MIND would exhibit interactions between parameters with respect to these outputs, but it is of practical interest to determine to which extent posterior estimation is slowed down by these interactions, and how those interactions look exactly. For fitting the model to field data in Ecuador, a tree-species grouping into seven PFTs was used that is described in de-tail in Dislich et al. (2009). Due to data availability, we used only the stem size distributions for the parameter estimation, which we label E1.

Fit to virtual inventory data (tropical lowland rain-
forest, V1-V5) As explained above, we considered a number of options to fit the model to the virtual inventory data. Those options differed in the aggregation of model outputs, and in the number of estimated parameters. We concentrate here on the case V1 in Table 2 (detailed data, not all parameters under calibration). Results for the other cases are discussed in brief below. Detailed results are provided in the supplementary material. Fig. 3a shows the estimated marginal posterior densities (eq. 1) for the parameterization V1 in Table 2. Those marginal posterior densities represent the probability assigned for the values of each parameter. We find that most parameter values are retrieved correctly and show moderate uncertainties of the order of 20% − 50% of the mean. When interpreting these plots, note that "marginal" means that we display the values of one particular parameter in the posterior sample without taking the corresponding values of the other parameters into consideration. If there are correlations between parameters densities in the posterior, marginal uncertainties often appear substantially larger than they effec-  Table 2). The distributions in a) correspond to the marginal posterior density p(φ|D) for each parameter, scaled relative to the "true" values that were used to create the synthetic data (see Table 2 in the supplementary material for true values and units). The dot within each distribution denotes the median value. Panels in b) visualize correlations between recruitment and mortality parameters in the posterior sample (recr1 refers to the recruitment rate of PFT1, mort2 refers to mortality of PFT2 and so on). The diagonal shows the marginal distributions displayed in panel a). The lower triangle shows the correlation density between the parameters on the diagonal (red values denoting higher density) and a nonlinear fit of the correlation (black line). The upper triangle shows Spearman's rank correlation coefficients for the correlations in the lower triangle.

Marginal distributions
tively are when viewed multivariately. We examine this in the next subsection.

Correlations
Marginal distributions represent a cross-section of the posterior sample along one parameter, which neglects potential trade-offs between parameters with respect to the data to which the model is fit. Statistical models are usually designed to avoid such correlations wherever possible. For processbased models, on the other hand, the correspondence to specific biological mechanisms is usually the main design criterion. It is therefore likely that such correlations will appear when estimating their parameters, as evidenced by Fig 3.b. Moreover, it is to be expected that the correlation structure depends on the data used to fit the model. Less informative data will typically lead to more parameter combinations that can reproduce this data, affecting the correlation structure in the posterior sample.
These expectations are largely confirmed by our results. We find strong positive correlations particularly for recruitment and mortality of early successional types, as one would expect, because, for those PFTs, increased mortality can be compensated for to some extent by increased recruitment. Also, we find that the correlation structure changes with the data types used. A detailed analysis of the correlation structure for the different data types (summary statistics) tested by us is provided in the supplementary material.

Choice of data type and number of fitted parameters
For the parameter estimation scenario V3-V5 ( Table 2) that used less information (more aggregated model outputs or summary statistics), posterior parameter estimates were wider than for our baseline scenario V1 (Table 2; for details see supplementary material). It can therefore be concluded that all further aggregations of the data used in V1 lose information for the purpose of estimating the considered parameters (see also Wiegand et al., 2004). Similarly, increasing the number of fitted parameters (scenario V2) increased the width of the posterior distribution. For all cases, the results indicate that more coarse aggregations or more parameters under calibration lead to additional correlations between parameters with respect to the objective of reproducing the respective data type, leading to wider marginal distributions  Fig. 4. True, prior and posterior predictive uncertainty. Each distribution is created from 1000 model runs, observing the biomass on a 1 ha forest plot after 2000 years. The upper distribution shows biomass values from model runs with the same, "true" parameters ( Table 2, supplementary material), and thereby gives an estimate of the stochastic uncertainty of the model. For the middle distribution, model parameters were drawn from the prior distribution (resulting in what is called the prior predictive distribution). For the lower distribution, model parameters were drawn from the posterior (posterior predictive uncertainty). and slower convergence of the MCMC algorithms (overview in Table 2, see supplementary material for details).

Reduction of predictive uncertainty
The Bayesian framwork also allows convenient estimation of the predictive uncertainty before and after fitting the model to the data. We compare three cases, the inherent stochastic uncertainty of the model with the "true" parameters, the uncertainty resulting from parameters drawn from the prior distribution (i.e. before parameter estimation), and the uncertainty for parameters drawn from the posterior distribution (i.e. after parameter estimation). The results displayed in Fig 4 show that the posterior predictive mean is similar to that of the "true" parameters, with predictive uncertainty only slightly larger than for the "true", fixed parameter value, which indicates that, for a single 1 ha plot, the output uncertainty generated from process-stochasticity is of the same order of magnitude as the uncertainty originating from the parameters. The prior predictive distribution, showing the predictions before calibration, is biased towards smaller values. This is likely due to the fact that many parameters in the prior distribution, in particular those with high mortality, result in very low biomass values.

Fit to Ecuadorian montane rain forest, E1
The results of the fit to field data from Ecuador (case E1 in table 2) are displayed in Fig. 5. We show the marginal distributions for each parameter scaled to the prior range. Priors were uniform distributions within plausible ranges for a forest of that type. Hence, the figure provides a visual estimate of the reduction of parameter uncertainty that would be reached starting from a state at which no specific information about the plot is available, with a distribution of a width 0.2, for example, indicating that the prior uncertainty is reduced by 80% with the chosen data type. Parameter correlations and unscaled marginal parameter estimates are provided in the supplementary material, Fig. 13 and Fig. 12, respectively.

Discussion
Inverse parameter estimation of ecological models requires a metric that quantifies how well model predictions fit to observed data. Because of technical limitations, the current state-of-the-art is choosing these metrics from expert knowledge or deriving them from field data. However, new statistical methods make it possible to generate goodness-of-fit metrics directly from any stochastic simulation model. More specifically, simulation-based likelihood approximations allow the generation of approximate likelihood functions that return the probability of obtaining a certain field observation given the model parameters directly from the stochastic model outputs. This technique provides a universal and unambiguous way to connected stochastic ecological models to field data.
The present study is one of the first to apply this method to a parameter-rich ecological model. We use a parametric likelihood approximation, proposed by Wood (2010), for fitting FORMIND, a relatively complex individual-based forest gap model, to a range of different virtual inventory data created from the model as well as to real field data from an Ecuadorian tropical forest.

Validation of the method with virtual inventory data
Fitting the model to different virtual inventory data sets allowed us to assess uncertainty and bias of the fit for situations where true parameters were known. For the most detailed data (abundance and growth distributions, scenario V1 in Table 2), estimated parameter values were largely unbiased, with correlations between a few of the parameters (Fig. 4, as well as Figs. 2,3 in the supplementary material). With increasing level of aggregation (scenarios V3-V5) parameter values showed increasing correlations, bias and uncertainty. Correlations in the posterior indicate a trade-off between parameters for the purpose of reproducing the data used for the fit. For scenario V1 (full data), for example, correlations occurred mostly within the parameters of the same PFT, between mortality and recruitment, which indicates that higher  Table 3 in the supplementary material for prior values and units). Values used by Dislich et al. (2009) are marked as dark red triangles. An unscaled version of these distributions and correlations are provided in Fig. 12,13 of the supplementary material.
mortality can to some extent be compensated by higher recruitment to produce similar population sizes (as V1 contained growth data, growth rates are tightly constrained, so the only option to maintain similar population sizes is to increase recruitment). In scenario V3, which did not include growth data, posterior parameter estimates also show correlation between mortality and growth, evidently because growth is unconstrained for this data type. It is important to take correlations into account when interpreting marginal parameter uncertainties such as Fig. 3a: if there are correlations between parameters, marginal uncertainties appear wider than in the multivariate correlation plots. This remains true for higher-order correlations, which are likely present for more aggregated data types used in scenarios V4 and V5, but which are difficult to visualize. Comparing the extent to which model parameters are constrained by the data based on the width of their marginal posterior distribution only can therefore be misleading in the presence of strong correlations. It is an advantage of the Bayesian analysis (or rather the use of an MCMC) that these interactions can be made explicit and interpreted. Thinking about the reasons for correlations may also be helpful for understanding and improving the model structure, although we stress that a correlation in the posterior does not necessarily mean that a parameter is redundant. It merely means that changes in one of the parameters may be counterbalanced by the other to maintain the same value of the model output under consideration. For example, correlations and bias increase from V1 to V3, indicating that even for fitting recruitment, mortality and growth parameters only, static data such as stem size distributions do not provide sufficient information to constrain all parameters at once. Thus, correlations are connected to a particular data type, and they inform us which parameters cannot be fully constraint by this data type.
Bias and correlations observed in the scenarios V1-V5 using the virtual inventory data seemed to originate predominantly from data limitations and not from problems with the simulation-based likelihood approximation. We saw no indications that would suggest that the parametric model (multivariate normal) used in the likelihood approximation created any problems or bias by not adequately summarizing model outputs, which would be theoretically possible. However, due to the computational complexity of our study, it was not possible to make a more systematic analysis of this question, for example by using virtual replicates of the field data sets or less aggregated data types.

Fit to Ecuadorian field data
Only static data was available to us for fitting the FORMIND model to field data from a montane forest in Ecuador. Our previous analysis suggested that these data would not be sufficient to sensibly constrain all demographic parameters at once. To get ecologically interpretable results, we therefore fixed the recruitment parameters to the values used in Dislich et al. (2009), and calibrated mortality and growth parameters only. Prior uncertainty was considerably reduced by these data (Fig. 5), suggesting that our approach together with the Ecuadorian data is able to substantially constrain the parameters under calibration. Marginal posterior parameter estimates are similar to those derived by Dislich et al. (2009) with a combination of literature data, expert knowledge and calibration (see supplementary material, Table 3 for exact values).
From the fits to the virtual inventory data V3 (Figure 7, supplementary material), we expected correlations in the posterior mostly to occur between parameters of the same PFT. We find those correlations, but we also find additional correlations, particularly between the mortality parameters of some PFTs (Fig. 13, supplementary material). To understand this, one has to know that species grouping designed by Dislich et al. (2009) is hierarchical, consisting of 7 PFTs that were further divided into 4 growth groups with equal maximum diameter growth for the PFTs in each group, with the following relation between (PFT) and growth group: (1)-2,(2)-1,(3,4)-3,(5,6,7)-4. Diameter growth parameter 3, which is estimated lower, thus applies to the midsuccessional PFTs 3 and 4, and diameter growth parameter 4, estimated higher, applies to the late successional PFTs 5, 6 and 7. This hierarchical species grouping is mirrored in the correlation structure, with particularly strong correlations in the mortality parameters of PFTs that belong to the same growth group. Our interpretation of this pattern is that PFTs in the same growth group are competing more strongly with each other than those that are in different growth groups.
Differences to the parametrization of Dislich et al. (2009) are particularly evident in the mortality parameters. Lower values were estimated for the mortality of the midsuccessional PFTs 3 and 4, while mortality of the late successional PFTs 5, 6 and 7 was estimated higher. This pattern is mirrored in the maximum diameter growth rates of mid-successional species. Thus, our study points to less pronounced differences between mid and late-successional types than Dislich et al. (2009). We can only speculate about the reason for these differences. In general, one would think that the systematic parameter estimation is more reliable than the manual calibration by Dislich et al. (2009). However, although Dislich et al. (2009) calibrated to the same data, they also considered the fit of other model outputs such as total biomass and expert opinions for fixing the parameters. Expert opinion in particular would favor more pronounced differences in mortality rates between mid and late successional species due to ecological expectations, although specific empirical data on tree mortality or on maximum growth rates under full light were not available. Secondly, there are significant correlations between the parameters, which allow us to gain a similar fit with a range of different parameter values. And finally, we were using the model in this study at a lower temporal resolution (5-year time steps) than Dislich et al. (2009) to reduce computing time, which can affect model dynamics and equilibrium distributions, meaning that slightly different parameter values would be estimated for the same model with different temporal resolution.

Advantages compared to conventional calibration methods
Our results demonstrate that inverse parameter estimation with a likelihood function derived from the stochasticity in the model outputs is feasible and provides good results, even for a relatively complex and runtime-intensive ecological model. This is encouraging in itself, as it is neither trivial to calibrate a parameter-rich model with heterogeneous data in general, nor easy to address all the technical challenges for performing the simulation-based likelihood approximation. A valid question, however, is whether the gain is worth the effort -after all, our approach is connected with considerable computational and conceptual costs, and all we gain are parameter estimates that could probably also have been derived with conventional inversion methods such as parameter optimization.
We believe the effort is justified, particularly because there are practical advantages of simulation-based likelihood approximations for ecological research that extend far beyond what we could demonstrate in this study. First of all, there is considerable interest in connecting models to large and heterogeneous data sources that become increasingly available (Luo et al., 2011;Hartig et al., 2012;Dietze et al., 2013). A practical problem in this context is that conventional methods provide no good answer as to how different data sources should be weighted to construct a joint likelihood or objective function. Moreover, ecological processes almost inevitably lead to correlations between those different data types, meaning that we would not expect errors to be independent, posing a challenge for conventional methods. Simulation-based likelihood approximations provide a natural answer to these problems. Assuming that the simulation model includes all major sources of stochasticity, likelihoods approximations automatically weight the importance of different model outputs and account for correlations between them. In our study, we can see this in the combined fit of growth rates and stem size distributions, which required no weighting of these two patterns and automatically accounted for second order correlations between them.
Moreover, under conventional inverse parametrization procedures, one might see that a certain pattern is not well represented, but it is often difficult to decide whether this is a random or a systematic problem. Simulation-based likelihood approximations allow us to make a definite statement about the probability of observed patterns given the current model (parameters). Thus, we can use the full arsenal of statistical procedures, including Bayesian and frequentist model selection, to compare alternative ecological hypotheses. The possibility of such rigorous statistical tests for alternative process-based models will likely increase the acceptance of process-based models as a tool, not only for representing and predicting, but for statistically testing ecological knowledge.

Differences to ABC
The comments in the previous subsection apply to parametric and non-parametric likelihood approximation alike. However, it also seems interesting to discuss differences between the parametric likelihood approximation used in this study and the more widely used non-parametric approximation used in Approximate Bayesian Computation (ABC, Beaumont, 2010). As discussed in Hartig et al. (2011), unlike ABC, parametric likelihood approximations will almost inevitably exhibit a certain amount of bias because it is unlikely that a simple distributional model can emulate model output distributions in all respects (particularly in the tails of the output distribution). Yet, the parametric approximation also has practical advantages. Many ecological models have to be run into equilibrium before predictions can be made. Once such a model is in equilibrium, more draws for the parametric approximation can be generated relatively cheap, while a new run has to be started for each ABC step. In our example, the time required for the parametric approximation in one MCMC step was not much longer than for an ABC step, but the parametric approximation ensures a good acceptance probability. To reach the same acceptance probability with ABC, we would have to accept a relatively large ABC approximation error. This error may be corrected later, but the fact remains that for situations where the number of possible MCMC evaluation is fixed (complex models), both ABC and parametric approximations will have a non-negligible error. We conjecture that the balance could well be in favor of parametric approximations in situations such as the one encountered in this study.

Conclusions
Our results suggest that likelihood approximations, in particular parametric likelihood approximations, are a promising route for the parametrization of stochastic ecological models. Their use is technically more challenging than the "traditional" Bayesian approach where likelihoods are based on phenomenological error models. The advantage, however, is that error models are based on the same ecological mechanisms as all other model predictions. Thus, they allow a more rigorous test of the mechanistic model assumptions, because the mechanisms have to explain both the mean and the variance in the data. Moreover, likelihood approximations account for the relative importance and correlations between different data types predicted by the model, which makes them interesting when models have to be coupled to heterogeneous data. In this study, additional computational costs of the approach were moderate (factor 2-5) compared to a standard Bayesian approach due to the fact that the model had to be run into equilibrium in any case. Such runtime differences appear secondary compared to the methodological advantage of rigorously testing our mechanistic understanding of ecosystems against field data, including the sampling and measurement process. Parametric likelihood approximations therefore seem particularly promising for models that have to be run into equilibrium, contain the dominant stochastic processes, use heterogeneous data, and predict outputs that can be well summarized by standard distributions.