Interactive comment on “Potential sources of variability in ocean acidification mesocosm experiments”

General comments: Variability in the responses and in the interpretation of the results of acidification experiments is a relevant problem, and the topic of this paper is interesting. However, the title does not accurately reflect the content of the paper. It is general enough to have a reader believe that the article treats variability in all kinds of mesocosm experiments using acidification treatments, while in fact the paper is an in-depth analysis of the causes of variability in two specific experiments and deals only with primary producer responses.

In the following we provide our responses to the individual comments raised by Referee #1.
Comment 1 (C1): "I am uncomfortable with the assertion that these model simulations produced in this study can be used to define which uncertainties lead to observed variably in experimental results (L4-6, page 1)" Authors' response to C1: The model is a mechanistic description of plankton growth dynamics based on dynamically and ecologically consistent equations. Given the skill of this simple model and the consistency of our methodology, we regard our results as reasonable estimates of how the investigated factors can generate the observed variability among replicates. However, we decided to address this concern of Referee #1 and now explicitly mention the caveat that agreement between experimental data and simulation results does not necessarily imply that the model structure and the imposed parameterizations are unique or maximally realistic. As a consequence, results based on any particular model always carry some degree of indeterminacy.
Regarding the variability decomposition method, we performed uncertainty propagation following the "Guide to the expression of uncertainty in measurement" (GUM), in particular Supplement 1, "Propagation of distributions using a Monte Carlo method", prepared in 2008 by the international Joint Committee for Guides in Metrology (JCGM). This method has been adopted by many organizations, is widely used, and has been implemented in standards and guides on measurement uncertainty.
Authors' response to C2: The reference solution was obtained by adjusting parameter values so that they can be used for simulations of two independent experiments. Had we applied a data assimilation approach, we might have identified parameter values based on probabilistic considerations, e.g. by maximizing a likelihood. Doing so does not automatically provide reasonable parameter estimates. The additional application of a data assimilation approach is clearly beyond the scope of the study presented here. The 'adjustment' of parameter values to provide a qualitative fit to experimental data is common rather than unusual. In preparation for the study described here we performed an extensive series of simulation runs, with models of different complexity and with different parameterizations. This preparatory work is not documented explicitly, as it does not provide much insight with respect to the objective of the manuscript. For the same reason we did not introduce a data assimilation method for specifying a reference model solution.
The major achievement during the process of model selection and of parameter adjustment was to identify the simplest model structure while keeping parameter values within meaningful limits. Thus, we have identified an effective representation of major processes underlying the observed experimental dynamics, which specified the model reference solutions. This 'calibrated' reference solution successfully explains 54 data sets of repeated measures of POC, PON and DIN (i.e., 3 quantities x 3 treatment levels x 3 replicates x 2 independent experiments on ocean acidification with primary producers).
We added some more information (second paragraph of Section 2.1) about the calibration of the model and how the reference solution is identified. In fact, we also considered a third independent data set for model validation. The third data set is not included since it is not directly relevant for the study presented here.
Comment 3 (C3): "Through the various optimizations of parameter values, it is therefore possible that random combinations of parameters can be achieved to produce the observed variability in POC, or close to it, but this does not mean that this is what happens in reality. I believe you address this partially at the end of the methods section (L24-27, page 5), but isn't this the rationale of your study?" Authors' response to C3: We disagree with the Referee's remark that any random combination of parameter values would yield similar variability in POC. We recall that no systematic data assimilation method was employed for parameter optimization. By 'optimization of parameter values' we assume that Referee #1 actually refers to the iterative procedure of assessing the limits of variations of parameter values that generate variations in model states at specific times. These variations in model states (such as POC) are thus mechanistically (dynamically) linked to the variations of parameter values, including initial conditions. The parameter values were varied individually while the other parameter values remained fixed (note that the fixed values correspond to the calibrated model reference solution).
In L24-27, page 5, we noted that it is already a substantial achievement to elucidate mechanistically a minimum set of requirements that are sufficient for the uncertainty to escalate and mask treatment effects (especially since it is not possible to find the total number of requirements, either experimentally or mathematically). The rationale of our study is to show that slight differences in initial conditions (in particular, in nutrient concentration, mean cell size or biomass losses) are already sufficient to blur the signal of treatment effects.
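The one-at-a-time variation described in the response to C3 can be illustrated with a minimal Python sketch. It is not the model of the manuscript: the toy growth function `toy_poc`, the factor names and all numerical values are hypothetical placeholders, chosen only to show how the spread of a single factor is propagated to the spread of a model output while the remaining factors stay at their reference values.

```python
import random
import statistics

# Hypothetical reference values (assumptions, not taken from the manuscript).
REFERENCE = {"initial_biomass": 1.0, "growth_rate": 0.5, "loss_rate": 0.1}


def toy_poc(params, t=10.0, dt=0.05, capacity=50.0):
    """Toy logistic growth with a linear loss term, integrated with Euler steps."""
    biomass = params["initial_biomass"]
    for _ in range(int(round(t / dt))):
        growth = params["growth_rate"] * biomass * (1.0 - biomass / capacity)
        loss = params["loss_rate"] * biomass
        biomass += dt * (growth - loss)
    return biomass


def propagate_one_factor(name, rel_sd, n=200, seed=0):
    """Vary one factor (normal, relative sd); keep the others at reference.

    Returns the mean and standard deviation of the model output over the
    n virtual replicates.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        params = dict(REFERENCE)  # all other factors stay at reference
        params[name] = rng.gauss(REFERENCE[name], rel_sd * REFERENCE[name])
        outputs.append(toy_poc(params))
    return statistics.mean(outputs), statistics.stdev(outputs)


if __name__ == "__main__":
    for factor in REFERENCE:
        mean, sd = propagate_one_factor(factor, rel_sd=0.05)
        print(f"{factor}: mean={mean:.2f}, sd={sd:.2f}")
```

Comparing the output spread obtained for each factor in turn indicates which factor, in this toy setting, contributes most to the variability among virtual replicates.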
Comment 4 (C4): "Indeed, in page 4, L29-32, it seems you optimize the parameter values by minimizing a model costs based on an emergent property (POC) (i.e. the relationship between POCexp and POCmod), not your state variables. Is this the case?" Authors' response to C4: We learned that the essence of the method was not sufficiently well documented. The Referee's comment is helpful. A conceptual diagram of the method has been devised and included in our revised manuscript, see new Fig. 1. The diagram illustrates all steps of the workflow of the analysis: the model was calibrated with POC, PON and DIN experimental data (calculation of the reference run, steps 1 and 2 in Fig. 1); later, we performed the uncertainty propagation analysis (part of it is the comparison of POC experimental and simulated variability to estimate the tolerance thresholds, step 5 in Fig. 1).
The model counterpart to the POC measurements is equal to the sum of two state variables, namely phytoplankton carbon and the carbon pool attributed to detritus and all heterotrophs: POC = PhyC + DH_C (see former L23, page 3).

We find sufficient evidence in the literature that primary production becomes enhanced under elevated CO2 conditions. This enhancement may be small, and whether it can be unambiguously detected depends on the experimental design. We also follow theoretical considerations, as described in Wirtz (2011, Journal of Plankton Research 33(9): 1325-1341), that support enhanced carbon fixation rates with increasing CO2 levels. We greatly appreciate that Referee #1 provided two valuable references. In Artioli et al. (2016, Biogeosciences Discussions 11: 601-612) it is stressed that high CO2 enhances primary production (and PeECE III data are also used in that article). In the second study, by Nagelkerken & Connell (2015, PNAS 112: 13272-13277), it is shown that high CO2 enhanced primary production in most cases, although the variability in the data may become large. These studies are good examples that highlight the relevance of our study, namely the controversial results obtained in ocean acidification experiments. We have considered both studies in the Introduction and Methods sections.

Responses to detailed comments by Referee #1
Detailed comment 1 (DC1): "Did you constrain the normal distributions sampled in order to limit the parameter values to positive values? If so, please state this and explain how you constrained the sampled distributions used for delta phi" Authors' response to DC1: We thank Referee #1 for this remark. We added a sentence at the end of the Methods section to explain that we dismissed negative values, which represented less than 5% of all trajectories. Given the more than sufficient number of virtual replicates, this reduction did not affect the results.
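A minimal Python sketch of this kind of rejection step is given below. It only illustrates the idea of drawing from a normal distribution and dismissing the draws that are not positive; the function name `sample_positive_normal`, the mean, the standard deviation and the number of draws are assumptions made for the example and are not the values used in the manuscript.

```python
import random


def sample_positive_normal(mean, sd, n, seed=0):
    """Draw n normal samples and dismiss those that are not positive.

    Returns the retained samples and the fraction that was dismissed.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(mean, sd) for _ in range(n)]
    kept = [x for x in draws if x > 0.0]
    return kept, 1.0 - len(kept) / n


if __name__ == "__main__":
    # Hypothetical factor with a large relative spread (sd = 0.5 * mean).
    kept, dismissed = sample_positive_normal(mean=1.0, sd=0.5, n=10_000)
    print(f"kept {len(kept)} of 10000 draws; dismissed fraction = {dismissed:.3f}")
```

Reporting the dismissed fraction alongside the results makes it easy to check that the truncation removes only a small share of the virtual replicates.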
Detailed comment 2 (DC2): "Could you explicitly provide the values you used in the definition of future, present and past CO2 conditions? Could you provide the references you used to define your initial conditions for all your model factors (Tables I and II)? My expertise in plankton ecology is not sufficient to allow me to comment on the actual values used as initial conditions for the model. So these should be reviewed by someone with that knowledge." Authors' response to DC2: The CO2 values we used as forcing are plotted in Appendix D. They were downloaded from PANGAEA, together with the initial conditions of the state variables given in Table 1. Table 2 lists the parameter values used in the reference run. Many parameters are difficult to measure and have not been experimentally determined, but the values used remain within the range of plausible biological values.
Detailed comment 3 (DC3): "L3-5 page 5: is this a reasonable expectation, given that your parameters are not independent? E.g. aCO2 is possibly quite tightly dependent on V*max? Please expand on why you think this is an appropriate assumption." Authors' response to DC3: We refer to the independence of errors (i.e., covariances being zero), not to the independence of the parameters (no collinearities). The independence of uncertainties is widely assumed (we have included the reference to the GUM). Although this is a typical assumption, we agree that it introduces limitations and that considering correlations would likely be an improvement.
Our model accounts for the dependence between aCO2 and V*max because we follow Edwards et al. (2011, Ecology 92: 2085-2095) and resolve allometric relationships that describe the relation between maximum growth and nutrient uptake through the logarithm of the equivalent spherical diameter. This is in line with GUM Supplement 1 (Section 6.1.4, NOTE: "It may be possible to remove some or all dependencies by re-expressing relevant input quantities in terms of more fundamental independent input quantities on which the original input quantities depend").
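The idea of re-expressing dependent inputs through a more fundamental quantity can be sketched in Python as follows. The sketch assumes a log-linear allometric form, ln(trait) = c0 + c1 ln(ESD); the coefficients in `ALLOMETRY`, the trait names `v_max` and `a_co2`, and the size distribution are hypothetical placeholders and are not the values of Edwards et al. (2011) or of the manuscript.

```python
import math
import random

# Hypothetical allometric coefficients (intercept, slope on ln ESD).
# Placeholders for illustration only.
ALLOMETRY = {"v_max": (0.0, -0.3), "a_co2": (-1.0, 0.4)}


def traits_from_size(esd_um):
    """Derive size-dependent traits from the equivalent spherical diameter (um)."""
    log_esd = math.log(esd_um)
    return {name: math.exp(c0 + c1 * log_esd) for name, (c0, c1) in ALLOMETRY.items()}


def sample_traits(n=1000, median_esd_um=5.0, log_sd=0.2, seed=0):
    """Sample only the cell size; all traits follow from it deterministically."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        esd = math.exp(rng.gauss(math.log(median_esd_um), log_sd))
        samples.append(traits_from_size(esd))
    return samples


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    samples = sample_traits()
    log_v = [math.log(s["v_max"]) for s in samples]
    log_a = [math.log(s["a_co2"]) for s in samples]
    print(f"correlation of log traits: {pearson(log_v, log_a):.3f}")
```

Because both traits are deterministic functions of the same sampled size, they are fully correlated in log space in this sketch (negatively, given the placeholder slopes), although only one independent input quantity is drawn.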
Detailed comment 4 (DC4): "Results L7-10 page 6: it could be argued that the purpose of conducting mesocosm experiments in real life (usually to investigate what we think are the mechanism underlying variation in some variable in real life) is to observe what mean and variability we get under pre-determined conditions. To bind the initial conditions of the experiment in order to modify the result (variability) could be perceived to be a circular argument." Authors' response to DC4: Effects of the treatment are expected to appear as differences among treatment levels, not as differences among replicates of the same treatment level. We provide an estimate of how much the latter need to be constrained in order to observe the former. We do not suggest 'binding' the initial conditions of different treatment levels, since the exploration of the differences among treatment levels is the aim of the experiment, as correctly pointed out by Referee #1.
Detailed comment 5 (DC5): "L14-onwards: if the parameters are tuned based on model cost calculated using POCexp, how can we be sure that this matters in any way other than in the model structure used here? What you have carried out is a model perturbation experiment, with tuning of parameters. I find it difficult to determine how we can derive new knowledge about the way in which ocean acidification impacts plankton communities." Authors' response to DC5: The main point is that our method involves a mechanistic understanding of how the plankton community can react to ocean acidification. Based on the available data and on this mechanistic description, we make inferences about CO2 effects on the timing and intensity of the phytoplankton bloom (it is earlier and larger under high CO2 conditions) and about the origins of variability in observations from ocean acidification mesocosm experiments that include a natural plankton community. Furthermore, we can disentangle differences in observed POC in response to a CO2 effect from those in response to variations in ecophysiological factors (phi_i). Making similar inferences from statistical analyses of the data is hardly possible, unless such an analysis accounts for some of the predominant interdependencies between nonlinear processes. The mechanistic model description introduces an explicit representation of such nonlinearities.
We thank Referee #1 for sharing her/his thoughts and for proposing two new references. The comments helped to improve the manuscript and to avoid further misconceptions. Following her/his suggestion, we also hired an English editing service for the manuscript text.

Fig. 1. Conceptual diagram for the variability decomposition method based on uncertainty propagation.

Fig. 2. The exploration of the sources of variability in an experiment requires a multi-factorial, high-dimensional set-up (left). Alternatively, we simulate the biomass dynamics with virtual replicates (right).

Comment 2 (C2): "... you cannot use this study to infer about what is observed in POC in the experiments, because you cannot trace how likely your parameter values used, or the simulations of your state variables are. You do not use any observational datasets to validate the state variables simulations associated with the mechanisms represented, or the parameter values chosen here (adjusted?)?