Use of remote-sensing reflectance to constrain a data assimilating marine biogeochemical model of the Great Barrier Reef

Skillful marine biogeochemical (BGC) models are required to understand a range of coastal and global phenomena such as changes in nitrogen and carbon cycles. The refinement of BGC models through the assimilation of variables calculated from observed in-water inherent optical properties (IOPs), such as phytoplankton absorption, is problematic. Empirically derived relationships between IOPs and variables such as chlorophyll-a concentration (Chl a), total suspended solids (TSS) and coloured dissolved organic matter (CDOM) have been shown to have errors that can exceed 100 % of the observed quantity. These errors are greatest in shallow coastal regions, such as the Great Barrier Reef (GBR), due to the additional signal from bottom reflectance. Rather than assimilate quantities calculated using IOP algorithms, this study demonstrates the advantages of assimilating quantities calculated directly from the less error-prone satellite remote-sensing reflectance (RSR). To assimilate the observed RSR, we use an in-water optical model to produce an equivalent simulated RSR and calculate the mismatch between the observed and simulated quantities to constrain the BGC model with a deterministic ensemble Kalman filter (DEnKF). The traditional assumption that simulated surface Chl a is equivalent to the remotely sensed OC3M estimate of Chl a resulted in a forecast error of approximately 75 %. We show this error can be halved by instead using simulated RSR to constrain the model via the assimilation system. When the analysis and forecast fields from the RSR-based assimilation system are compared with the non-assimilating model, a comparison against independent in situ observations of Chl a, TSS and dissolved inorganic nutrients (NO3, NH4 and DIP) showed that errors are reduced by up to 90 %. In all cases, the assimilation system improves the simulation compared to the non-assimilating model. Our approach allows for the incorporation of vast quantities of remote-sensing observations that have in the past been discarded due to shallow water and/or artefacts introduced by terrestrially derived TSS and CDOM or the lack of a calibrated regional IOP algorithm.

Abstract.Skillful marine biogeochemical (BGC) models are required to understand a range of coastal and global phenomena such as changes in nitrogen and carbon cycles.The refinement of BGC models through the assimilation of variables calculated from observed in-water inherent optical properties (IOPs), such as phytoplankton absorption, is problematic.Empirically derived relationships between IOPs and variables such as chlorophyll-a concentration (Chl a), total suspended solids (TSS) and coloured dissolved organic matter (CDOM) have been shown to have errors that can exceed 100 % of the observed quantity.These errors are greatest in shallow coastal regions, such as the Great Barrier Reef (GBR), due to the additional signal from bottom reflectance.Rather than assimilate quantities calculated using IOP algorithms, this study demonstrates the advantages of assimilating quantities calculated directly from the less error-prone satellite remote-sensing reflectance (RSR).To assimilate the observed RSR, we use an in-water optical model to produce an equivalent simulated RSR and calculate the mismatch between the observed and simulated quantities to constrain the BGC model with a deterministic ensemble Kalman filter (DEnKF).The traditional assumption that simulated surface Chl a is equivalent to the remotely sensed OC3M estimate of Chl a resulted in a forecast error of approximately 75 %.We show this error can be halved by instead using simulated RSR to constrain the model via the assimilation system.When the analysis and forecast fields from the RSR-based assimilation system are compared with the non-assimilating model, a comparison against independent in situ observations of Chl a, TSS and dissolved inorganic nutrients (NO 3 , NH 4 and DIP) showed that errors are reduced by up to 90 %.In all cases, the assimilation system improves the simulation compared to the non-assimilating model.Our approach allows for the incorporation of vast quantities of remote-sensing observations that have in the past been discarded due to shallow water and/or artefacts introduced by terrestrially derived TSS and CDOM or the lack of a calibrated regional IOP algorithm.

Introduction
Aquatic biogeochemical (BGC) models have been used to understand a range of coastal and global phenomena such as ocean acidification (Mongin et al., 2016), nutrient pollution (Skerratt et al., 2013) and carbon cycles, and they are central to our predictions of global climate (Sarmiento and Gruber, 2006).At the coastal/regional scale, non-linear BGC processes driven by planktonic interactions, as well as nonlinear circulation features such as mesoscale eddies, limit the timescale over which BGC properties are deterministically predictable (Baird, 2010).For the purposes of prediction, it is therefore necessary to assimilate observations to correct for model errors and non-linear processes.
In situ observations of phytoplankton pigments and macronutrients are sparse in space and time due to the pro- A diagram denoting the steps used to relate and transform various optical properties with the system state and observations.The green circle represents the in-water system state as predicted by a marine BGC model.The blue circle represents the inherent optical properties (IOPs) of the system state.The magenta circle represents the depth resolved apparent optical properties (AOPs).The cyan circle represents the 2-D remote-sensing reflectance (RSR).The two red boxes represent either in situ or satellite remote-sensing observations.Each circle is partitioned into three segments where each segment represents the possibility to compare like-for-like variables.
hibitive expense of collecting them.Optical sensors on gliders and floats provide high-resolution in situ observations that are used to estimate pigment and nutrient concentrations.Nonetheless, these observations in large parts of the ocean remain sparse.The most spatially comprehensive dataset available for BGC assimilation is from Ocean Color (OC) remote sensing.The assimilation of remotely sensed data into marine BGC models has been problematic due to differences between the variables represented in models and the variables that are routinely observed (Baird et al., 2016a), typically referred to as "difference in kind" errors.Satellites measure the top of atmosphere radiance, not chlorophyll a concentration (Chl a; or other modelled variables) directly.To attain estimates of Chl a or other variables, the spectrally resolved top of atmosphere radiance is converted into atmospherically corrected remote-sensing reflectances (RSRs), which are then related to Chl a via empirical statistical relationships derived from in situ observations.Figure 1 graphically demonstrates the key steps in the OC processing chain and points at which models and remotely sensed and in situ observations can be compared.
Early studies investigating the benefits of assimilating OC products, predominantly SeaWiFS-derived Chl a, into BGC models include those of Carmillet et al. (2001) and Natvik and Evensen (2003), with a comprehensive review of algorithms used and observations assimilated detailed in Gregg (2008).Considerable effort has been invested in data assimilation (DA) algorithm development, with ensemble and variational approaches being the most common.A thorough review of these approaches in a statistical sense is presented in Dowd et al. (2014).There are now examples of operational and pre-operational global systems that routinely assimilate Chl a products (Ford et al., 2012).Additionally, there has been further experimentation with assimilating alternative remotely sensed apparent optical properties (AOPs) such as the vertical attenuation coefficient at 443 nm, Kd 443 (Ciavatta et al., 2014) and inherent optical properties (IOPs) such as phytoplankton absorption (a ph ), as described in Shulman et al. (2013).A map of the Great Barrier Reef region, with the colour bar denoting the water depth, markers denoting the population centres (red triangles), IMOS NRS sites (yellow triangles), GBRMPA MMP Water Quality Meters (WQMs; yellow circles) and points of interest referred to in the text (red circles), with the glider track (white line adjacent to Lizard Island).The in situ sampling locations and glider observations are used to assess the data assimilation system performance.
It is well known that OC algorithms, such as observed OC3M (Moderate Resolution Imaging Spectroradiometer (MODIS) three-band Chl a algorithm), that are optimized for global applications suffer from errors due to a variety of optically active constituents in coastal and shelf waters (Odermatt et al., 2012).Furthermore, it has also been noted that even globally, there is a non-uniform distribution of error and substantial bias in the remotely sensed OC3M-derived Chl a. Satellite-derived OC algorithms such as OC3M are a function of observed RSRs.A substantial effort is invested in empirical studies that convert RSRs to BGC quantities such as Chl a, total suspended solid concentration (TSS), phytoplankton functional types (PFTs) and coloured dissolved organic matter (CDOM; Odermatt et al., 2012).Each of these empirical relationships have differing error magnitudes stemming not only from a difference in kind but also from representation errors.In optically deep regions (e.g.offshore waters) not influenced by sediment resuspension and terrestrial runoff, typical errors for OC-derived Chl a (e.g.OC3M) are less than 40 % and as low as 5-20 % for IOPs and AOPs.However, in optically complex coastal areas where there is river discharge, sediment resuspension and a surface expression of benthic reflectance, errors can exceed 300 % (e.g.Schroeder et al., 2016;Qin et al., 2007).
Numerical weather prediction (NWP) avoids difference in kind errors through using the model to simulate directly observed quantities, such as temperature brightness, in preference to deriving other quantities such as temperature and humidity profiles from the brightness measurements (Derber and Wu, 1998;Dee et al., 2011).The goal of this study is to apply this approach to marine BGC modelling, assimilating RSRs rather than quantities calculated using empiricalstatistical relationships.We assess the approach of assimilating RSRs through a comparison against withheld in situ observations of a range of BGC quantities such as Chl a, TSS and dissolved nutrients.
The Great Barrier Reef (GBR) region, located along the northeastern coastline of Australia, is used to demonstrate the assimilation of RSRs in optically complex and shallow waters (Blondeau-Patissier et al., 2009).The GBR is characterized by fringing reefs along the continental slope that create a semi-connected inshore lagoon that spans over 3000 km of coastline (Fig. 2).The GBR ecosystem, described as one of the seven natural wonders of the world, is under increasing pressure from local and global anthropogenic stressors (De'ath et al., 2012).Decreasing water clarity due to nutrient and sediment pollution is considered a serious threat to the GBR ecosystem (Thompson et al., 2014), with major concerns including the impact of reduced benthic light levels on coral and seagrass communities (Collier et al., 2012;Baird et al., 2016b) and the impacts of invasive species (e.g.Morello et al., 2014).
The paper is structured in the following manner: in Sect. 2 (Methods) the data for assimilation and skill assessment are presented, along with a description of the model and assimilation methods used.Section 3 contains the results from the control (non-assimilating) run of the model and subsequent DA experiments.In Sect. 4 we discuss the approach used and implications of the findings more generally.We conclude with major findings in Sect. 5.

Methods
The assimilation system was tested between 25 May and 22 September 2013.This period was chosen as it coincides with a field program by the Great Barrier Reef Marine Park Authority (GBRMPA) inshore Marine Monitoring Program (MMP) and an autonomous glider deployment (Integrated Marine Observing System, IMOS), which are used for assessment of the assimilation system.Additionally, this period also has a large number of cloud-free days, which increases the amount of remotely sensed observational data available for assimilation.Details of the observations, model and assimilation system are given below.

In situ observations
All in situ observations have been withheld from the assimilation system for validation purposes and are primarily obtained from two different programs.
The IMOS has deployed fluorometers on moorings at Yongala and North Stradbroke National Reference Stations (NRS; Fig. 2).This study uses the monthly observations of dissolved inorganic nutrients (NO 3 , NH 4 and DIP) at the NRS sites (Lynch et al., 2014).Additionally, glider data obtained from the IMOS-operated Australian National Facility for Ocean Gliders (ANFOG) provide cross-shelf sections of water column properties, including temperature, salinity and chlorophyll fluorescence.These locations are shown in Fig. 2.
The GBRMPA MMP (Fig. 2, yellow circles) samples 13 sites in inshore regions and in the GBR lagoon, and samples nutrients and Chl a extractions three times a year (Thompson et al., 2011;Rolfe and Gregg, 2015); no MMP bottle sampling occurred during the simulation period.The Australian Institute of Marine Sciences (AIMS) deployed moorings at the GBRMPA MMP sites from 2009 to 2014 and included Sea-Bird water quality monitors (WQMs) that measure chlorophyll fluorescence and turbidity, which are used for assessment purposes in this study.A comparison of the 2011-2014 control run simulation against the GBRMPA MMP observations, and other observations, is available in a skill assessment report at https://research.csiro.au/ereefs/models.

MODIS observations
The daily observations of RSR are obtained from MODIS-Aqua and an atmospheric correction developed for the region is applied (Schroeder et al., 2007).The atmospheric correction applied an Artificial Neural Network (ANN) approach trained by a radiative transfer model to invert the topof-atmosphere signal measured by MODIS Aqua.The ANN algorithm was adapted to an approach previously developed for the MEdium Resolution Imaging Spectrometer (MERIS) sensor but on the basis of a different learning algorithm Figure 3.The eReefs modelling system with optically active components identified with beige colouring and an asterisk, with the number of asterisks denoting the number of different optically active elements with this component.Thus, each of the four microalgae have two pigment types, one absorbing like divinyl chlorophyll a, and the other like photosynthetic carotenoids.There are two seagrass types, and corals have both skeletons and zooxanthellae.There are three types of detritus that absorb and scatter, and the sediment model contains a suspended fraction and four (mixed) sediment compositions.Additionally, pure seawater both absorbs and scatters light.(Schroeder et al., 2007).Algorithm performance is described in detail in Goyens et al. (2013) and King et al. (2014).
SeaDAS-provided Level-2 flags were used to quality control the observed RSR and to exclude erroneous and out-ofrange pixels.We filtered the data for land and severe sunglint-affected pixels, cloud contamination including cloud shadows, and rejected pixels with observing and solar zenith angles above 52 and 70 • respectively.Super-observations (Cummings, 2005;Oke et al., 2008a), of RSR and OC3M (see Sect. 2.5.2,Eq. 21) are generated for the assimilation system by taking the mean and variance of all the 1 km resolution observations that fall within a 4 km model grid cell.The mean value of the super-observation is then assimilated, while the variance is used as an estimate of the representation error in the observation error covariance matrix.More details on this process are given in Sect.2.5.2.

The eReefs modelling system
We used the eReefs coupled hydrodynamic, sediment and BGC modelling system (Schiller et al., 2014).The hydrodynamic model is a fully 3-D finite-difference baroclinic model based on the 3-D equations of momentum, continuity and conservation of heat and salt, employing the hydrostatic and Boussinesq assumptions (Herzfeld, 2006;Herzfeld and Gillibrand, 2015).The sediment transport model adds a multilayer sediment bed to the hydrodynamic model grid and simulates sinking, deposition and resuspension of multiple size classes of suspended sediment (Margvelashvili, 2009;Margvelashvili et al., 2016).The complex BGC model simulates optical, nutrient, plankton, benthic organisms (seagrass, macroalgae and coral), detritus, chemical and sediment dynamics across the whole GBR region, spanning estuarine systems to oligotrophic offshore reefs (Fig. 3; Baird et al., 2016a).An expanded description of the BGC model is given in Appendix A, with a brief description of the optical model in Appendix B. Briefly, the BGC model considers four groups of microalgae (small and large phytoplankton, Trichodesmium and microphytobenthos), two zooplankton groups, three macrophytes types (seagrass types corresponding to Zostera and Halophila, macroalgae) and coral communities.Photosynthetic growth is determined by concentrations of dissolved nutrients (nitrogen and phosphorous) and photosynthetically active radiation.Microalgae contain two www.biogeosciences.net/13/6441/2016/Biogeosciences, 13, 6441-6469, 2016 pigments (chlorophyll a and an accessory pigment) and have variable carbon : pigment ratios determined using a photoadaptation model (described in Baird et al., 2013).Overall, the model contains 23 optically active constituents (Baird et al., 2016a;and Appendix A).
The model is forced with freshwater inputs at 21 rivers along the GBR and the Fly River in southwest Papua New Guinea.River flows are obtained from the DERM (Department of Environment and Resource Management) gauging network.Statistical flow/load relationships are used to account for nutrient and sediment inputs from rivers into the model (statistical relationships between river flow and nutrient concentrations; Furnas, 2003).Nutrient concentrations flowing in from the ocean boundaries were obtained from the CSIRO Atlas of Regional Seas (CARS) 2009 climatology (Ridgway et al., 2002).

Calculation of remote-sensing reflectance from BGC state
The model contains 23 optically active constituents (Baird et al., 2016a;Appendices A and B).To calculate the RSR at the surface, we need to consider the light returning from multiple depths, and from the bottom.Rather than using a computationally expensive radiative transfer model, we approximate RSR based on an optical-depth weighted scheme (Baird et al., 2016a); alternative methods are given in Fujii et al. (2007) and Dutkiewicz et al. (2015).The ratio of the backscattering coefficient to the sum of backscattering (b b ) and absorption (a) coefficients for the whole water column at wavelength, λ, is where w λ,z is a weighting for each layer representing the component of the RSR due to the absorption and scattering at depth z .The weighting fraction is given by where k λ is the vertical attenuation coefficient at wavelength λ, z 0 and z 1 are the top and bottom depths of the layer and the factor of 2 accounts for the path length of both downwelling and upwelling light.The vertical attenuation coefficient is calculated from the sum of the absorption and scattering properties of each of the optically active constituents, and the zenith angle (for each of these relationships, and more information, see Baird et al., 2016a).The integral of w λ,z to infinite depth is 1.In areas where light reaches the bottom, the integral of w λ,z to the bottom is less than 1, and benthic reflectance is included in the sum as an extra term with a weighting of 1 − w λ,z .
The subsurface RSR, r rs , is given by where g 0 = 0.0895 and g 1 = 0.1247 are coefficients for the nadir view in oceanic waters that vary with wavelength and other optical properties (Morel et al., 2002) but can be approximated as constants (Lee et al., 2002).The constants result in a change of units from the unitless u to a per unit of solid angle, sr −1 , quantity, r rs .The above-surface RSR is given by (Lee et al., 2002) R rs,λ = 0.52 r rs,λ 1 − 1.7r rs,λ .
(4) Thus, the above-surface RSR is calculated from the inherent optical properties of the optically active constituents in the BGC model.

Data assimilation system
The DA algorithm used in this study is the deterministic ensemble Kalman filter (DEnKF; Sakov and Oke, 2008).The full BGC state variable list contains over 130 2-D and 3-D variables.Including all of these variables in the assimilation system is impractical due to memory constraints, but we also acknowledge that for many variables the observations will be uninformative and therefore not good candidates to include in the assimilation state vector.We therefore limit the variables that are updated within the system to a select subset that is detailed in Sect.2.5.2.

Data assimilation algorithm
The DEnKF is based on the Kalman filter analysis equation, of which various flavours have had general success in state estimation in other marine BGC DA problems (e.g.Hu et al., 2012;Ciavatta et al., 2016).The derivation of the DEnKF is given in Sakov and Oke (2008) and is a modification to the traditional Kalman filter equation: where x is the model state, Y is the vector of observations, H is the observation operator and the superscripts of "a" and "f" denote the analysis and forecast fields respectively.In this study, we only use a subset of the state variables in the DEnKF update; those variables not included in the state vectors denoted in Table 1 are not altered by the state update.
The forecast innovations (F ) are defined by The Kalman gain matrix, K, is given by where l is the localisation operator, applied in the form of covariance localisation (Gaspari and Cohn, 1999;Sakov and Bertino, 2011), with an isotropic localisation radius of 60 km, and R is the observation error covariance matrix.Off diagonal elements of R are set to 0, and the diagonal elements of R are the sum of the representation error, difference in kind error and analytical measurement error, termed E tot .These error terms are discussed in detail in Sect.2.5.2 (additionally see Schaeffer et al., 2016, for a further discussion of these error sources relating to BGC variables and methods to estimate them from glider data).
The background error covariance matrix, P, is given by where m is the ensemble size and i denotes the ith member of the ensemble.The background error covariance is approximated by a 36-member dynamic ensemble, where X f i is the ith ensemble member and x is the ensemble mean.We acknowledge that a 36-member ensemble is small, but, as demonstrated in the results section, this small ensemble performs adequately for the assimilation of a univariate observation type.The ensemble size will need to be increased to assimilate multivariate observations.To avoid negative values and normalise the state, we log-transform the state before forming the state vector.The application of the log-transform to biological variables is discussed in Parslow et al. (2013) and is commonly used in other BGC assimilation schemes (e.g.Ciavatta et al., 2014).The background error covariance matrix is never computed; rather a series of anomaly fields are constructed and denoted by A. We then construct the Kalman gain matrix in the observation subspace as per Sakov and Oke (2008): where the ith anomaly field is given by where x is the ensemble mean and is updated via Eq.( 5) and each anomaly field is updated by The full analysed ensemble is then given by The assimilation system iterates through time using a fiveday forecast duration.The assimilation system is cycled by calculating the analysis fields at time t, using the forecast from the previous cycle, X f (t), and observations Y (t) at time t, using all observations that fall within a window of t ±3 h.The numerical model is initialised using the analysis fields X a (t) and the next 5-day forecast is made.This forecast at t + 5 days, X f (t + 5), is then used in the next assimilation cycle.
The DEnKF requires the ensemble to be perturbed in such a way that it captures the main source of error.These perturbations are introduced in a way that captures our prior understanding of the dominant errors.In this system we expect that errors will stem from uncertainty in the initial conditions (ICs) as per most assimilation systems.Additional sources of error can stem from uncertainty associated with BGC process parameters, which has been discussed at length in Parslow et al. (2013), and river boundary conditions.In this study we have ignored the effects of uncertainty that propagate from the model physics (hydrodynamics), short-wave radiation forcing and open ocean boundary fluxes of BGC tracers, but we do acknowledge that they contribute to the overall uncertainty in the predicted system state.
In the context of this study, we have introduced perturbations to the ensemble by sampling ICs randomly from a 4-year run (where T = 4 years) of the BGC model: where X(t = 0) i is the initial condition for the model state for the ith member sampled for a uniform distribution with www.biogeosciences.net/13/6441/2016/Biogeosciences, 13, 6441-6469, 2016 no replacement, and X(t = 0) i=1 is the ensemble mean.Sensitivity experiments have shown the model is sensitive to perturbations in the quadratic zooplankton mortality rate coefficient for small (m Q,ZS ) and large (m Q,ZL ) zooplankton with units of d −1 (mg N m −3 ) −1 .These are considered system parameters and are as such uncertain.To this end we have perturbed the ensemble by sampling space-and time-invariant parameters from where LN is a log-normal distribution, and respective means of 0.012 and 0.007 are values as used in the control run (Baird et al., 2016a) and are given relatively broad standard deviations of 1 for both parameters.The river nutrient and sediment loads were altered by a time-invariant scaling factor (θ ) to all rivers: where N is a normal distribution truncated at 0. Each ensemble member has their load (Q i ) scaled according to where Q control is the load entering in the control run.

Assimilation system experiments and configuration
The assimilation system was assessed using five assimilation system configurations (Table 1) using a subset of the full model state in the assimilation state vector and corresponding diagonal elements (E tot ; see below) of the observation error covariance matrix (R).The assimilation experiments were conducted on Raijin, a super-computer hosted at the National Computational Infrastructure (NCI; http://nci.org.au/).The assimilation cycle progressed via a two-step process.The first step generated the forecast fields by integrating the ensemble forwards on 36 nodes of the system.Each node has a dual 8 core processor, with 16 Gb of RAM.This step takes approximately 90 min to simulate 5 model days.The second step generates the analysis fields and updates model scripts to allow for the next forecast cycle.The analysis step is undertaken on a single high-memory node and requires 800 Gb of RAM and takes 50 min using 16 cores.
We only allow the observations to update the variables contained in the assimilation state vector.The analysed assimilation state vector is then inserted into the full model state vector.
There are three sources of observational error (E tot ) that must be accounted for when relating remotely sensed observations to a modelled state variable: 1. Representation errors (E R ) arise due to the approximation that the modelled tracer quantities are an average over a whole model cell.This can be thought of as unresolved spatial variability.In the case of the MODIS observations, we set E R to be the standard deviation of all the 1 km log-scaled observations that fall within a 4 km model grid cell.
2. Difference in kind errors (E D ) arise when the variables that are being modelled and included in the assimilation state vector differ from the observations.For example, many studies have included surface Chl a (or some optical depth weighted average) in the assimilation state vector and assume there is a direct relationship with Chl a estimated using OC and the OC3M algorithm (or other quantities).The OC3M algorithm is known to have typical errors of between 30 and 70 % in blue water domains and errors that exceed 300 % in optically complex (or optically shallow) waters (Qin et al., 2007).
3. Analytical/sensor/processing errors (E A ), depending on the observational platform in use, can be small (e.g.Argo floats) or in the case of remote-sensing products, even with the ANN atmospheric correction, as large as 15-20 % (see Schroeder et al., 2007).
The total observation error used on the diagonal of R is then given by the sum of error types, E tot = E R +E D +E A , and is expressed as a standard deviation in log-space and can therefore be thought of as a fraction of the observed quantity.The sum of these error types can be large and forms the diagonal element of the observation error covariance matrix (R).The larger E tot is the lower the impact of the observations in the assimilation system.If we can remove the difference in kind error (E D ), then we only have representation error and analytical/sensor/processing error.Given that Level 3 OC remote-sensing products rely on empirical/statistical relationships, E D dominates; therefore if we can minimise E D (or remove it entirely), then the information content of the observations increases.Conversely, if we do not have a large enough observation error, we run the risk of overfitting the observations and generating unrealistically large increments for the unobserved model state variables.
Experiment 1 (EXP1) was designed to test the assimilation system under the assumption that modelled surface total Chl a (the sum of small and large phytoplankton Chl a, benthic Chl a and Trichodesmium Chl a) was equivalent to the Chl a from MODIS-observed OC3M.This experiment is analogous to those of Natvik and Evensen (2003), Gregg (2008) and Ford et al. (2012).This is a reasonable assumption in offshore waters, but OC3M is known to be unreliable in coastal waters where sediments (e.g.TSS), bottom reflectance and CDOM cause artificially high OC3M values.We expect that EXP1 will contain difference in kind errors that stem from the assumption that surface Chl a is equiv-alent to observed OC3M.The observation error prescribed for this experiment is considered to be the lower bound of OC3M errors as reported in Qin et al. (2007).
In experiments 2-4 (EXP2-4) we assume that the simulated OC3M (i.e.calculated from simulated RSR) is equivalent to the observed OC3M (i.e.calculated from observed RSR) and it is used as an input into the observation operator.The simulated OC3M (Eq.21) contains the signature from simulated TSS, CDOM and bottom reflectance as per Sects.2.3, 2.4 and Appendix A. Thus for EXP2-4, the observed and simulated OC3M contain no difference-in-kind errors (although we leave E D high during EXP2 for experimental purposes).In other words, the configuration used for EXP2 is the same as EXP1, except the effect of difference in kind error has been removed by using simulated OC3M, in place of total surface Chl a.In EXP3, we reduce the observation error to account for the reduction in E D , and in EXP4 we add additional variables to the assimilation state vector.
In EXP5, we assimilate observed RSR at 551 nm (R551) using the simulated RSR at the equivalent wavelength.Typically errors in RSR as reported in Schroeder et al. (2007) are in the order of 10-20 %, and we have used 0.2 for this experiment.
In all experiments we have generated a set of superobservations (Cummings, 2005;Oke et al., 2008b) by spatially averaging the 1 km observations onto the 4 km model grid.Prior to generating the super-observations in experiments 1-4 (EXP1-4), the observed atmospherically corrected RSRs (Schroeder et al., 2007) at 443, 488 and 551 nm were transformed into a single observation using the OC3M algorithm: ).
R rs,λ1 and R rs,λ2 are determined by the absolute magnitude (whichever is greater) of the remote-sensing reflectance, and R rs,λ1 is either the band centred on 443 or 488 nm and R rs,λ2 is the band centred on 551 nm.We apply the OC3M algorithm (Eq.21) to both the observed and simulated RSRs.
The assimilation system preserves the stoichiometry of the small and large phytoplankton (referred to as PhyS Chl a and PhyL Chl a respectively) as follows.In the BGC model, each phytoplankton cell (small, large, benthic or Trichodesmium) is represented by a quantity of structural material, B, and reserves of nitrogen, R N , reserves of phosphorus, R P , reserves of energy, R I , and an intracellular chlorophyll a concentration, c i .Our intention in the assimilation is to change the number of cells, as quantified by B, not the physiological status of the cell, as represented by R N , R P , R I and c i .Since the reserves are quantified as the total of these reserves across the entire population, each of the reserves is changed by the same proportion as the biomass.Once the analysed quantity is determined (e.g.PhyL Chl a and PhyS Chl a), the quantities of R N , R P , R I and B are updated such that the respective ratios prior to assimilation are preserved.

Control run
The modelling system has been designed to represent the spatially resolved water quality dynamics (phytoplankton, nutrients, turbidity and oxygen) of the GBR World Heritage Area for informed management.A number of indicators have been used to assess the skill of the model, including RMS errors, Pearson's correlation coefficients and Wilmott's skill indicators (Wilmott et al., 1985; https://research.csiro.au/ereefs/models).
The simulated state variable concentrations resemble both the regional climatology for offshore-reef, lagoon-reef and near-shore zones and water quality observations under contrasting seasons-loads and flood events (not shown here, but detailed at https://research.csiro.au/ereefs/models,with optical (Baird et al., 2016a) and carbon chemistry (Mongin et al., 2016) skill assessment published elsewhere).As mentioned above, the model simulates RSR.It is therefore possible to incorporate these RSRs into standard, well-recognised remote-sensing products.Figure 4 presents a snapshot of simulated surface Chl a and the simulated OC3M, as well as remotely sensed products (regional ANN-observed OC3M and NASA-observed OC3M).The images are during the dry season conditions with little cloud contamination along the inshore region.
The two panels on the right side of Fig. 4 represent the simulated (top) and NASA remotely sensed (bottom) OC3M estimate of Chl a.Both combine individual RSRs into proxies for Chl a using the OC3M algorithm (Eq.21).OC3M poorly represents surface Chl a close to the coast where sediment resuspension and CDOM absorption dominate.By comparing the two panels, we can conclude that the model represents the general distribution of OC3M Chl a throughout the region, with high values along the coast and above each reef systems, and low concentrations offshore.The model predicts a large Chl a plume south of Papua New Guinea that is not present in the remote-sensing observations.In the reference simulation, the model overpredicts inshore sediment resuspension, which causes high simulated OC3M values in shallow coastal regions.The simulated surface Chl a inside the coastal band is lower than in the remotely sensed OC3M observation.The two panels on the left of Fig. 4   gionally optimized RSR (bottom; ANN-observed OC3M).
Regardless of the remote-sensing products used, there are clear differences between the simulated Chl a and the simulated OC3M, and these exceed the differences between those of the ANN-observed and NASA-observed Chl a.The reflectances derived from the ANN method have been shown to have substantially lower errors due to the improved atmospheric correction near the coast for case 2 waters (Schroeder et al., 2007), and we have therefore used these observations in the assimilation experiments below.Surface Chl a and OC3M are typically assumed to be equivalent.However, many studies have shown that there are regional biases and substantial overestimation in optically complex coastal waters (Qin et al., 2007;Brando et al., 2015).This is demonstrated in Fig. 5, where we plot simu-lated surface Chl a against simulated OC3M for deep water (Fig. 5, top-left) and the whole domain (Fig. 5, top-right) for 26 May 2013.It is immediately obvious that these variables are not equivalent and the error structure is non-linear, especially when coastal regions are included.A scatter plot of in situ Chl a plotted against MODIS OC3M observations (Fig. 5, lower panel) further highlights the magnitude of the OC3M errors in complex coastal waters.While it is obvious that coastal water types are outside of the design specification of the OC3M algorithm, MODIS OC3M contains errors upwards of 400 % at low Chl a. Figure 5 shows that for the GBR region, there is a substantial risk of OC3M overestimating the in situ Chl a even in optically simple deep water regions of the domain.

Assimilation system configuration experiments
To choose the best configuration for the assimilation of RSR into a coastal BGC model, five experiments were undertaken using a variety of state variables in the state vector (X, Table 1) and by altering the diagonal elements of the observation error covariance matrix (R, Eq. 7).

Forecast innovations
The forecast innovations (Eq.6) for the five assimilation system configuration experiments are shown in Fig. 6: -EXP 1 (black lines, Fig. 6) assumes that total surface Chl a is equivalent to observed OC3M and is used to calculate the forecast innovations, which are then used to update small and large phytoplankton Chl a.An 80 % error in the ANN-observed OC3M observation is prescribed on the diagonal elements of R.
-EXP 2 (red line, Fig. 6) uses simulated OC3M (calculated from simulated RSR at 443, 488 and 551 nm, as described by Baird et al., 2016a) to calculate the forecast innovations.The state variable and observation errors are the same as EXP1.-EXP 3 (green line, Fig. 6) uses the same configuration as EXP 2, with a reduced observation error imposed on the diagonal of the observation error covariance matrix.
-EXP 4 (blue line, Fig. 6) includes additional variables in the assimilation state vector, which now comprises of small and large phytoplankton, ammonia, nitrate and total suspended solids concentrations.The observation error is the same as EXP3.
-EXP5 (dashed magenta line, Fig. 6) is the same assimilation vector as EXP4, but using R551 observations, with a lower observation error of 0.2.Note that these innovations relate to the difference in simulated and observed R551, not OC3M as in EXP1-4.
The forecast innovation (Eq.6) statistics for experiments 1-5 (Fig. 6) provide an insight into the assimilation system performance.An optimal assimilation system should result in mean forecast innovations (mismatches between observations and the model) of close to 0 and low mean absolute innovations.The assimilation system where the observation operator assumed that there was a direct relationship between simulated surface Chl a and ANN-observed OC3M (EXP1, Fig. 6 black line), performed very poorly and was discontinued after nine cycles; the model at times became numerically stiff, requiring the adaptive fourth-fifth order ordinary differential equation (ODE) integrator to take progressively smaller steps.The innovation statistics for EXP1 suggested the model was constantly overpredicting Chl a with the mean absolute innovation exceeding 0.7 more than 50 % of the time.Calculating the forecast innovations with simulated OC3M and ANN-observed OC3M, rather than simulated surface Chl a and observed ANN-observed OC3M, improved innovation statistics dramatically (EXP2-4, Fig. 6).
In EXP3, the magnitude of the diagonal elements of R(E tot ) are reduced from 0.8 to 0.4, yielding a very minor improvement in forecast statistics over EXP2, and was discontinued after 10 cycles.Additional variables were added to the assimilation state vector for EXP4, while keeping the observation error equal to EXP3.The additional assimilation state variables improved the forecast statistics and decreased the mean absolute forecast innovation when compared with EXP2 and EXP3.
Of the experiments that assimilated OC3M, the configuration used for EXP4 (Fig. 6, blue line) gave the lowest mean absolute forecast innovation.While the EXP5 config- uration (noting that this experiment assimilates R551, not OC3M) beat EXP4 with a lower mean absolute innovation, these innovations are for R551, not OC3M, and therefore cannot be directly compared.An assessment against independent (non-assimilated) in situ observations is given in Sect.3.2.2-3.2.5 to assess the relative performance between the non-assimilating control run and EXP1 (glider only), EXP2, EXP3 (glider only), EXP4 and EXP5.

Independent assimilation system assessment: glider
The control and assimilating runs were compared to withheld data obtained from an ocean glider that was deployed on 26 May 2013, and recovered on 4 August 2013.The glider track largely followed the shelf break and headed in a southeasterly direction.To make a comparison between glider observations and the model, we take a subsample of glider observations centred at the time of model output, with a time www.biogeosciences.net/13/6441/2016/Biogeosciences, 13, 6441-6469, 2016 window of 2 h.For each glider observation that falls within this time period, we find a corresponding 3-D cell from the model and extract the equivalent model solution.
A persistent feature of the observed Chl a when interpolated onto the model time-space grid (Fig. 7, top panel) is the relatively low values in the upper 100 m of water column.For much of the record, there is a persistent weak deep chlorophyll maximum (DCM) that is centred at between 80 and 120 m depth.Rarely do concentrations in the upper 80 m exceed 0.5 mg m −3 .When the control run is examined, Chl a in the top 80 m regularly exceeds 1 mg m −3 , and DCMs, when they exist, are located between 30 and 60 m deep.A detailed analysis of the control run demonstrates the model is able to reliably produce DCMs (contained in the skill assessment report at https://research.csiro.au/ereefs/models),but, in this particular location in time and space, the model does not generate one consistent with the observations.Using the EXP1 configuration, the assimilation system does not improve the solution when compared with the control run.The EXP2 configuration marginally improves the solution by reducing the mixed layer Chl a.However, using the EXP4 and EXP5 configurations, the assimilation system substantially improves on the control when compared to the glider observations.The difference between the EXP2 and EXP4 configuration is the reduction in observation error and the inclusion of additional variables in the assimilation state vector.Regardless of the assimilation configuration used, the assimilation system cannot place the DCM in the correct location because the remote-sensing observations provide no information about such a deep feature.The remote-sensing observations do remove the bias in the upper 80 m with concentrations in the assimilating run ranging between 0.1 and 0.3 mg m −3 (Fig. 7).The configuration used in EXP5 captures a weak DCM that persists for 7 days from 7 to 14 July 2013 and also maintains the lowest mixed layer Chl a for the feature captured on from the 7-13 July 2013 (Fig. 7: bottom panel).Assimilating OC3M still overpredicts the mixed layer Chl a.
A comparison between non-interpolated individual profiles from the glider and equivalent sampling of the model shows substantial unresolved subgrid-scale variability (or instrument noise).Figure 8 (top left) shows that in the upper 70 m the observed Chl a as measured by the fluorometer ranges between 0.14 and 0.3 mg m −3 , with a mean value of 0.18 mg m −3 .There is no quenching evident in this or other profiles, as the observed fluorescence and backscatter rates are constant in the upper 50 m.Within each of the subplots, the glider profiles collected over the 2 h window may sample multiple model grid cells.However, due to the relatively slow horizontal glider speed, these glider profiles fall within two adjacent model cells.In most cases, the control and assimilating runs gave indistinguishable solutions between adjacent cells with the exception of Fig. 8 (top left).In all cases the assimilation of observations of both OC3M and R551 has reduced the error in the simulated Chl a when assessed against independent glider data.The noise in the fluorometer observations appears to range between 0.04 and 0.08 mg m −3 .
The Chl a root mean square difference (RMSD) for each layer of the model, and aggregated in time for all glider observations, is shown in Fig. 9. Above 80 m, there is a substantial reduction in RMSD between the control (cyan) and assimilating runs (EXP2-red, EXP4-blue and EXP5-magenta).EXP3 behaved similarly to EXP2.The RMSD from EXP1 (dashed black) is marginally higher than the RMSD in the control run.The assimilating runs (EXP4 and EXP5) have an RMSD of between 0.10 and 0.17 mg m −3 compared with 0.30 to 0.41 mg m −3 in the control run.In the upper 60 m, the assimilation of R551 (EXP5) results in a minor improvement in RMSD when compared with the assimilation of OC3M (EXP4).Below 80 m, the RMSD profile is similar between the control and assimilating runs.However, the assimilation of R551 results in the lowest RMSD below 80 m.The largest impact of assimilating remote-sensing observations is in the top 80 m of the water column.

Independent assimilation system assessment:
GBRMPA MMP Chl a and TSS The AIMS moorings were deployed at the 13 GBRMPA MMP sites in the shallow inshore regions of the Great Barrier Reef Lagoon (Fig. 1).The control run typically had RMSDs of between 0.4 and 0.6 mg m −3 for in situ Chl a (Fig. 10; upper panel).In all cases the EXP2 configuration performed worse than the control run.In most cases the EXP4 configuration reduced the RMSD of in situ Chl a by 0-10 % when compared with the control run.The assimilation of R551 (EXP5) was more variable.Some sites improved, whereas others were degraded by the assimilation of this observation type when compared with the control run.
The TSS RMSD varies widely across all the GBRMPA MMP sites (Fig. 10; lower panel), driven by the strong variation in magnitude of the spring-neap tidal forcing.The combination of the initialization of the ensemble and perturbed forcing caused the ensemble mean for EXP2 to have a substantially reduced RMSD when compared with the control run.The inclusion of the TSS constituents in the state vector in the assimilating model further reduced the RMSD, and has generated realistic time-varying correlations between the observed OC3M/R551 and inshore TSS.These cross correlations allow for the correction of simulated TSS from OC3M/R551 observations.The performances of EXP4 and EXP5 were indistinguishable, but the inclusion of TSS in the assimilation state vector was beneficial.

Independent assimilation system assessment: nutrients
Within the GBR region, there are two IMOS NRS sites (Yongala and North Stradbroke, NS, Fig. 1).The dissolved inorganic nutrients of NO 3 , NH 4 and DIP are taken monthly.
Biogeosciences, 13, 6441-6469, 2016 At Yongala, water samples are taken at the surface (0 m), 10 m, 20 m and bottom (26 m).At NS, samples are taken at the surface (0 m) and 10 m.It should be noted that there are only three to four samples per depth at each site during the 4-month simulation period.At Yongala, typical control run RMSDs for NO 3 range from 5 to 12 mg m −3 .Assimilating OC3M/R551 halved these errors using the EXP4/5 configurations.The improvement at NS is evident with improvements found in the upper 10 m of the water column, with EXP4 and EXP5 giving indistinguishable results.The EXP2 configuration yielded higher RMSDs than EXP4 and EXP5.
With the exception of the surface samples at Yongala, the assimilation system improved the prediction of NH 4 at all depths for each site.Most notably was the 70 to 90 % reduction in RMSD at the deeper locations at Yongala.EXP4 and EXP5 outperformed EXP2 in all cases for NH 4 .There were marginal improvements to DIP, which displayed a 0 to 30 % reduction in RMSD across all sites.EXP4 resulted in lower DIP errors at NS, but EXP5 resulted in lower DIP errors at Yongala.EXP2 had the lowest DIP RMSDs.

Summary of assimilation system experiments
The direct assimilation of RSR (EXP5: R551), or a function of RSR, f (RSR) (EXP4: OC3M), improved the model so-lution when compared with the control run against three independent non-assimilated in situ observational datasets.In most cases the EXP4 and EXP5 configurations outperformed the EXP2 configuration, which was to be expected due to the additional variables included in the assimilation state vector.The subtle differences between the two approaches will be discussed in Sect. 4. Based on these findings, the preferred assimilation system for state estimation on the GBR is that used in EXP4.A further examination of the EXP4 forecast error statistics and example forecast, analysis and increment field is given Sect.3.3.

EXP4: assimilation system forecast errors
The assimilation system was run with a 5-day forecast cycle.Using the forecast at t + 5 days and comparing the temporal mean (across all cycles) of the RMSD and percentage error against observations provides insight into the value of the assimilation system, when compared with a non-assimilating system.By comparing the forecast fields against yet to be assimilated observations, we are providing a semi-independent estimate of forecast skill.Additionally, by comparing the forecast against the persisted analysis field from the previous analysis cycle (e.g. the analysis field from t − 5 days), Figure 9.A profile of the temporal mean Chl a RMSD between the glider observations presented in Fig. 7 and the non-assimilating control (cyan), EXP1 (black dashed; note that the mean RMSD is calculated using a short time period), EXP2 (red), EXP3 (green dashed; note that the mean RMSD is calculated using a shorter time period), EXP4 (blue) and EXP5 (magenta).
it can be determined whether the dynamic model is adding skill to the forecast.
A comparison of simulated OC3M and observed OC3M for the non-assimilating control run gives a domain-wide median error (range) of 0.32 (0.27-0.48) mg m −3 (Fig. 12; Tables 2 and 3).This is approximately equivalent to a domainwide median percentage error (range) of 100 % (80-130 %; Fig. 12).The DA system reduces the forecast errors to a median value (range) of 0.23 (0.20-0.30) mg m −3 and median (range) percentage errors of 55 % (43-63 %) respectively.The analysis errors are again reduced when observations are assimilated, with median and percentage errors (range) of 0.19 (0.14-0.23) mg m −3 and 39 % (37-42 %) respectively.When the analysis field from the previous assimilation cycle is persisted forward, the errors (and percentage errors) slightly exceed that of the forecast field with values of 0.26 (0.21-0.29) mg m −3 and 52 % (44-65 %) respectively.How-ever, it is not expected that these error statistics are spatially uniform given the large percentage of area that is dominated by deep oceanic waters.To understand the spatial variability of the forecast error statistics, the whole domain is divided into three regions representing shallow coastal waters (depth < 30 m), lagoon and shelf waters (30 m < depth < 500 m), and deep oceanic waters (depth > 500 m).
In shallow coastal areas (Fig. 12, second column), the nonassimilating control run has a median error (range) of 1.35 (1.1-2.45)mg m −3 , which corresponds to a percentage error (range) of 130 (105-180) %.The distribution of control run errors in the coastal zone is positively skewed, with the mean value of the distribution sitting some way from the median.The assimilation system marginally reduces the median forecast error when compared with the control run, though, most notably, it reduces the median percentage error and associated variability.The forecast also beats persistence in this region.There is a marked improvement for lagoon and shelf waters with the assimilation system reducing the median error from 0.34 to 0.25 mg m −3 , which corresponds to a reduction in percentage error from 96 to 48 %.In the oceanic regions of the domain, the assimilation system reduces the error from 0.16 to 0.10 mg m −3 , corresponding to a percentage error reduction from 91 to 45 %.In all cases the forecast fields beat persistence.A summary of the results can be found in Tables 2 and 3.

EXP4: forecast, increment and analysis fields
The sum of the surface Trichodesmium Chl a, PhyS Chl a and PhyL Chl a biomass differs substantially from the simulated OC3M as shown in Fig. 5.The assimilation system updates all of the state variables included in the assimilation state vector.Simulated OC3M is a diagnostic variable that is a function of all the optically active dynamic state variables as described in Sect. 2. To demonstrate the impact on the dynamic variables of PhyS and PhyL, results from the forecast step, and the assimilation update, are presented in Figs. 13, 14 and 15.Cycle 22 was chosen to demonstrate the spatial impact of the system as it is representative of the last 10 cycles and was relatively cloud free.
The simulated OC3M forecast field for cycle 22 (12 September 2013) displays elevated OC3M in the shallow near-shore environment throughout the whole of the GBR region and southern shelf of Papua New Guinea (PNG; Fig. 13).Additional features are elevated OC3M in the vicinity of the central and southern fringing reefs and a plume originating from the eastern region of PNG.Offshore oceanic waters generally have low OC3M of 0.2 mg m −3 or less.There is some evidence of mesoscale blooms in the northern and southern sections of the domain.Observed OC3M is overlaid on Fig. 13 (left).Where there is a difference in colour, the simulated OC3M differs from the observed OC3M.Table 2. Forecast error statistics for OC3M (mg Chl a m −3 ) by region (coastal, lagoon, oceanic) for the control (C) and EXP4 forecast (F), analysis (A) and persistence (P) fields.The forecast surface layer fields for PhyS and PhyL appear substantially different to the simulated OC3M field (Fig. 13).The differences near the coast are where TSS and CDOM are known to cause artefacts in OC3M.While there are patchy blooms of small phytoplankton at various locations within the domain, rarely does the PhyS Chl a exceed 0.5 mg m −3 .The exception to this is in the inner central coastal region of the GBR and in the vicinity of the Fly River plume on the south coast of PNG.Similarly, the PhyL Chl a remains very low for large areas of the domain; however, in regions with additional nutrient supply (e.g. in upwelling regions, mesoscale eddies and some river mouths) blooms do occur.
When the observed OC3M is assimilated, increment fields are calculated using Eq. ( 5) and are presented in Fig. 14 for simulated OC3M, PhyS Chl a and PhyL Chl a.The innovations are overlaid on the increment field for OC3M to give an indication of how well we are fitting the observations.In areas where the model overpredicts OC3M, the increments will be negative.In areas where the model underpredicts OC3M, the increments will be positive.The increments and innovations here are presented as a fractional change with respect to the background (forecast) field.
For this particular analysis cycle, it appears that the model is underestimating inshore OC3M by up to 10-30 % and overestimating OC3M by upwards of 50 % offshore (Fig. 14,left).By using the background ensemble correlation structure, the increments applied to PhyS biomass increase its concentration in the inner lagoon by up to 20 %, and substantially increase the PhyS biomass offshore of the central outer reefs by more than 50 % (Fig. 14, centre).It should be noted that the increments being applied to the background fields contain meso-and sub-mesoscale information.Significantly,  features such as upwelling filaments, eddies and plumes are maintained through the assimilation procedure, demonstrating that they are allowed to dynamically evolve in the assimilation system.The increments applied to PhyL biomass (Fig. 14, right) differ substantially to those of PhyS biomass (Fig. 14, centre).For large areas of the domain, the assimilation system decreases the PhyL biomass by up to 50 %, whereas there are some areas in which it increases.These areas correspond to regions where a bloom may be occurring (there is a small westward shift in the major bloom off the Papua New Guinea coast).The increment applied to the central region of the domain, offshore of the outer reefs, is linear and coherent and likely a result of shifting a dynamic feature such as an upwelling-induced bloom to better match observations.
When the increments contained in Fig. 14 are applied to the forecast fields, the resulting analysis field for simulated OC3M better fits the observed OC3M, with a substantially reduced error inshore and in the vicinity of the outer reefs (Fig. 15).The difference between simulated and observed OC3M is small in the deeper offshore regions and shallow sections of the lagoon.The greatest error in OC3M occurs in the central lagoon and the outer reefs where spatial variability is highest.The corresponding analysis fields for small phytoplankton and large phytoplankton are contained in Fig. 15.There are elevated concentrations of small phytoplankton biomass in the near-shore region near river mouths and the outer fringing reefs.The large phytoplankton biomass is concentrated in the region of Broad Sound, the Fly River plume and the Papua New Guinea upwelling.Each of these fea- Presented are statistics for the whole domain (left column) and regions as defined by bottom depth (three rightmost columns), for which the mean and range of values are presented in Tables 2 and 3.
tures was predicted by the forecast, as little biomass is added or subtracted by the assimilation update.However, there is substantial removal of large phytoplankton biomass from the northern and central offshore regions.This leaves very little large phytoplankton biomass present in substantial areas of the domain during this particular analysis cycle.

Discussion
In the optically complex waters of the GBR, the use of an optics model to calculate simulated RSR and subsequent simulated OC3M to constrain the BGC model substantially reduces the errors in BGC state.This has been achieved by explicitly assimilating like-for-like variables of simulated R551 or simulated OC3M.The DA is constrained by the mismatch between simulated and observed R551 and OC3M.A summary of the RMSDs is contained in Table 4.Our approach of simulating the observation is the opposite to the conversion of observed RSRs into modelled variables, e.g. the assimilation of phytoplankton functional types, TSS, CDOM and Chl a.We therefore advocate for the use of a "forward" model to predict optical properties, rather than rely on the "inversion" to back calculate the model state from reflectances.The conversion of RSRs into derived variables (e.g.Chl a, PFTs) has associated errors that are as large as 300 % in some locations and can be biased by up to 70 % (Qin et al., 2007), whereas the forward approach has errors of approximately 20 % (Baird et al., 2016a).The errors stemming from the "inversion" technique are at times difficult to characterise in data assimilation systems.Using a forward model avoids these errors and therefore the dominant source of error stems from model error, rather than observation error.The approach of Ciavatta et al. (2014) contained similar results where they found the assimilation of Kd 443 , to be superior to remotely sensed Chl a.
A significant source of error in algorithms such as OC3M is that they produce a single value for each horizontal pixel, generally considered to be representative of the first optical depth of the water column.If this is to be compared to a single value in a BGC model, then it must be assumed that the water column is well mixed to the optical depth and that there is an equal optical depth of each of the wavebands used in the algorithm.Both of these conditions are rarely met in coastal waters.Matching RSR requires no assumptions about the structure of the water column or of the vertical distribution of the optically active constituents, because both observed and modelled quantities are 2-D fields.
EXP1 performed poorly for a number of reasons.It is likely that even an 80 % observation error was insufficient to adequately account for positive biases in the OC3M obser-  vations.These biases, present even in offshore waters, lead to positive innovations that result in adding phytoplankton biomass in the increments.These large increments lead to persistently high biomass, which draws available nutrients down to very low levels.The only way to account for this form of observation error is to inflate the "difference in kind error" to a large value.Given that the non-assimilating control run of the model had forecast errors that range between 70 and 100 % (region dependent), running an assimilation system with an observation error larger than the error present in the non-assimilating model does not make sense.
The similarity in RMSDs between EXP4 and EXP5 (Table 4) suggests there is little difference between using sim-ulated R551 and simulated OC3M.Univariate observations were assimilated in each case, and the information content of the observations is likely going to be similar in deep water, hence the similar results for the Ocean Glider.However, due to the simulated OC3M containing information from 443 and 488 nm, which are important in shallow regions due the absorption of these wavelengths by sediments and CDOM, there is additional information in the OC3M observation that is not present in the 551 nm RSR.Ultimately the greatest information content will come from the simultaneous assimilation of multivariate observations obtained from the multispectral OC sensors and including the longer wavelengths.Table 4. Root mean square differences (RMSDs) for the withheld observations and the control run (control) and assimilation experiment 2 (EXP2), experiment 4 (EXP4) and experiment 5 (EXP5).The lowest RMSD is given in bold.

True-colour visualisation of EXP4
In order to visualise the impact of the assimilation system on the prediction of water clarity, we compare the observed true colour (Fig. 16) with the simulated true colour of the control run (Fig. 17, top left) and the EXP4 assimilating run (Fig. 17, top right).Simulated true-colour images are generated from RSR at the red, green and blue wavelengths calculated using the optical model and the 3-D fields of the 23 model-predicted, optically active constituents.
The observed true-colour image on the 12 September 2013 shows brown-yellow features associated with high suspended sediment concentrations.As these concentrations become more diluted, and mixed with phytoplankton, the water appears more greenish blue.Offshore reefs, with clear water above white substrates, appear as light blue features, with the intensity depending on the reef depth.Qualitatively, the control run (Fig. 17, top left) does a reasonably good job of reproducing the observed true colour.The quantification of this mismatch can be done on individual colour bands (not shown; Baird et al., 2016a).Qualitatively, the control run does not have enough suspended solids in the surface wa-ter in the mouth of Broad Sound (22.2 • S, 149.5 • E) and has too high phytoplankton concentrations offshore, especially in a feature centred at 23 • S, 151.5 • E. The assimilated run (EXP4), while not that different to the control run, corrects some of these errors.
To approximately quantify impacts of the assimilation of water clarity, it is possible to consider the colour of the added (and subtracted) constituents in the assimilation procedure.To avoid confusion with the phrases "falsely coloured" or "negative", which have distinct meanings in visualisation science, but to still provide a phrase for true-colour error, we use the term "off-colour" and distinguish between off-colour that requires correction through addition (Fig. 17, bottom left) and subtraction (Fig. 17, bottom right).The assimilation procedure added yellow colours (suspended sediment) within Broad Sound and green colours (phytoplankton) in the mouth.Offshore the assimilation removed green, particularly, as noted above at 23.0 • S, 151.5 • E. By removing green it made the water more blue (Fig. 17  The RGB wavelengths used were 667, 551 and 488 nm and processed using the MODIS true-colour algorithm (Gumley et al., 2010;Baird et al., 2016a).The white pixels are clouds, and grey is land.

Like-for-like assimilation
The general methodology presented in this study is similar to that used in NWP.More than two decades ago, the NWP community moved away from assimilating satellite-derived temperature and humidity profiles, in favour of assimilating radiances (or temperature brightness), this approach is detailed in Derber and Wu (1998).By assimilating satellitederived radiance data, the NWP community avoided any reliance on empirical relationships used to predict the temperature and humidity profiles.Similarly, the approach taken here is to avoid the use of an empirical inverse model and use a physics-based forward model to predict RSR centred at the MODIS bandwidths.We then post-process these simulated RSRs into a simulated OC3M.The simulated OC3M is directly comparable to the observed OC3M with both containing quantitatively similar sources of error derived from bottom reflectance and turbid coastal waters.By avoiding the use of an inverse empirical statistical model, we present a BGC DA approach that were adopted by the NWP community decades ago (Derber and Wu, 1998;Dee et al., 2015).
An additional benefit of avoiding the use of IOP/AOPbased empirical products is that assimilation of RSRs can take advantage of non-ocean-colour-specific missions such as Himawari 8.The spectral resolution of simulated RSR can be altered to simulate reflectances at the Himawari 8 truecolour bands, providing a step change in the data available for areas such as the GBR due to the high spatial resolution The difference between the remote-sensing reflectance in the control and assimilated runs was used to quantify the colour (referred to as off-colour) added (i.e.greater surface expression, bottom left) and subtracted (i.e. less surface expression, bottom right) due to the updating of optically active constituents in the assimilation run (see Fig. 16 for more details).Note that the off-colour images have a smaller brightening factor as the MODIS true-colour stretch saturates the features that are of most interest.Simulated true-colour images are not falsely coloured, and thus do not require a colour map, nor are they 2-D as they have a depth of field, being based on reflectance from multiple depths and the bottom (Baird et al., 2016a).Thus simulated true colour can be considered a photograph of the optical state of the different model runs and, like observed true colour, a powerful and intuitive visualisation tool for water clarity in BGC models.
(nominally 500 m) and temporal resolution (every 30 min).This data density far exceeds that available from orbiting satellites and will provide coverage similar to the products being assimilated in NWP systems.

Multiband assimilation
If multiple reflectance bands are to be simultaneously assimilated, it will be necessary to account for cross correlation between observation errors.For example, the RSR at 443 nm is strongly correlated with the RSR at 488 nm.Therefore, the observations of adjacent bands are no longer independent and it is likely we need to reconsider the assumption that the off diagonal elements of the observation error covariance matrix (R) are 0. Using simulated and observed OC3M eliminates the possibility of cross correlation and contains information derived from multiple bands.The OC3M algorithm can be considered a band-ratio function, f (RSR), that transforms multivariate observations into univariate observations.It is likely that there are other functional forms that could combine information from multiple bands into a single noncorrelated observation.
The configuration of the DA presented in this study requires the dominant error subspace to be spanned by the ensemble.Pragmatic choices have been made to allow the system to run on the available compute resources.To this end we have perturbed two sensitive model parameters and river loads of nutrients and sediments.The distributions that have been sampled to perturb the zooplankton mortality rates and θ i along with their respective shape parameters could be considered a subjective choice.There is substantial scope to recast the problem with a Bayesian hierarchical modelling (BHM) framework (as in Parslow et al., 2013;and Dowd et al., 2014), whereby the prior distribution are assigned to uncertain parameters, and a thorough metaanalysis of the literature could be used to construct informative distributions.The observations could then be used to construct not only a posterior over the state, but a full joint posterior over the state and parameters.Furthermore, we have not allowed uncertainty in the physics to propagate into the BGC solution.We recognise this is a shortcoming of the study but, given the computational constraints, we are not in a position to expand the ensemble to include physics perturbations (which would require an ensemble that is up to an order of magnitude larger).As more computing power becomes available, ensemble sizes could be increased, stochastic parameterisations introduced (Garnier et al., 2016) and DA methods with less parametric assumptions (e.g.Parslow et al., 2013) could be adopted.
There have been two recent discussion papers that detail the pathway towards operationalising BGC forecasting systems (Gehlen et al., 2015;Ford and Barciela, 2015), analogous to the current NWP and hydrodynamic prediction system that routinely run at numerous operational centres.It has been acknowledged that satellite remote sensing will play a key role in such systems; there appears to be two possible pathways to achieve this vision, dependent upon on whether bio-optics are taken into account.In the absence of an optical model, Ford and Barciela (2015) suggest further exploration of the assimilation of empirical statistical products such as Chl a and PFTs.If simple bio-optical properties are taken into account, an alternative approach is the assimilation of diffuse attenuation coefficient(s) (e.g.Ciavatta et al., 2014) or phytoplankton absorption (e.g.Shulman et al., 2013).For complex coastal regions that are dominated by case 2 waters, an explicit spectrally resolved in-water optics model opens the possibility of directly assimilating RSRs and avoids the costly requirement of calibrating an empirical IOP algorithm that is regionally specific.Whilst the results from this study have shown to be valuable in the GBR region, further work needs to be undertaken to demonstrate the broad applicability of this approach.Nonetheless, we would advocate a third approach should be considered -the assimilation of RSR.

Conclusions
In this study we have used a spectrally resolved optical model coupled to a BGC model to simulate the RSR centred at the MODIS OC bands.A series of assimilation system configuration experiments were undertaken to test the assimilation system performance.When the simulated OC3M (EXP4) and remote-sensing reflectances (EXP5) were assimilated into the model, the forecast errors in Chl a fell from 100 to 55 % when compared to the non-assimilating model.By using a function of the remote-sensing reflectances (OC3M), information from multiple bands is included in a univariate observation and the forecast error is halved compared to simply assuming the OC3M is directly related to the model prediction of surface total Chl a.A comparison against in situ observations of NO 3 , NH 4 , DIP and TSS shows the assimilating model (EXP4) reduces the MAPE from 90 to less than 20 % at most stations.By using a forward model that includes a majority of error sources present in the observed OC3M, we have shown that the assimilation of remotely sensed products in optically complex case 2 waters can be achieved and adds substantial predictive skill when compared to the non-assimilating model.Furthermore, this approach can be generalized to non-OC-specific missions by assimilating the remote-sensing reflectances directly (e.g.EXP5), liberating a vast quantity of data that cannot be used in traditional BGC assimilation systems.

Data availability
The data used and produced in this study can be considered either observational data or model output.The eReefs modelling system is a suite of coupled hydrodynamic, sediment, optical and BGC models specifically tailored to the Great Barrier Reef.The hydrodynamic model is a 3-D, finite-difference, baroclinic model based on the 3-D equations of momentum, continuity and conservation of heat and salt, employing the hydrostatic and Boussinesq assumptions (Herzfeld, 2006;Schiller et al., 2015).The equations of motion are discretized on a finite-difference stencil corresponding to the Arakawa C grid.In the vertical z coordinate scheme, there are 47 fixed z levels.The atmospheric forcing products (wind, pressure, rain and heat fluxes) are supplied by Bureau of Meteorology (BOM) reanalysis products.A tidal signal was superimposed on the low-frequency sea level oscillation provided by BRAN2.3 (Oke et al., 2008a) on the regional grid open boundary.This tidal signal was introduced via a local flux adjustment.The OTIS tidal model (Egbert and Erofeeva, 2002) was used to generate the tidal signal from amplitude and phase information for eight constituents.The local grid open boundary was forced with temperature, salinity and velocity (with local flux adjustment) derived from the regional grid.A mass conserving flux-based advection scheme is used to transport sediment and BGC tracers.The sediment transport model adds a multilayer sediment bed to the hydrodynamic model grid and simulates sinking, deposition and resuspension of multiple size classes of suspended sediment (Margvelashvili et al., 2008).The model solves advection-diffusion equations of the mass conservation of suspended and bottom sediments and is particularly suitable for representing fine sediment dynamics, including resuspension and transport of BGC particles.The model is initialised with the observed distribution of gravel, sand and mud in the seabed of the shelf region.Sediment particles settle on the seabed due to gravity and resuspend into the water column whenever the bottom shear stress, exerted by waves and currents, exceeds the critical shear stress of erosion.The resuspension and deposition fluxes are parameterised with the Ariathurai and Krone (1976) formula.The bottom friction under combined waves and currents is estimated through the non-linear bottom boundary layer model (Madsen, 1994).
Sediments in benthic layers undergo vertical mixing due to bioturbation, represented by local diffusion.The corresponding diffusion coefficient scales with the sediment depth so that the bioturbation ceases to operate beneath the biologically active layer.The resistance of sediments to resuspension also varies with the sediment depth to reflect the consolidated nature of deep sediments.The numerical grid for sediment variables in the water column coincides with the numerical grid for the hydrodynamic model.Within the bottom sediments, the model utilises a time-varying sedimentthickness-adapted grid, where the thickness of sediment layers varies with time to accommodate the deposited sediment.
Horizontal resolution within sediments follows the resolution of the water column grid.
The BGC model is organised into three zones: pelagic, epibenthic and sediment.The epibenthic zone overlaps with the lowest pelagic layer and the top sediment layer, sharing the same dissolved and suspended particulate material fields.Dissolved and particulate BGC tracers are advected and diffused throughout the model domain.Additionally, BGC particulate substances sink and are resuspended in the same way as sediment particles.BGC processes are organised into pelagic processes of phytoplankton and zooplankton growth and mortality, remineralisation of particulate and organic material, and fluxes of dissolved oxygen, nitrogen, phosphorus and carbon (including nitrogen fixation, phosphorus adsorption and desorption, surface gas exchanges, respiration and photosynthesis, and fluxes to and from biotic pools); epibenthic processes of growth and mortality of macroalgae, seagrass and corals; and sediment-based processes of phytoplankton mortality, microphytobenthos growth, detrital remineralisation and fluxes of dissolved substances (Fig. 3).
The BGC model includes four groups of microalgae (small and large phytoplankton, Trichodesmium and microphytobenthos) and three macrophytes types (seagrass types corresponding to Zostera and Halophila, macroalgae and coral communities).Photosynthetic growth is determined by concentrations of dissolved nutrients (nitrogen and phosphate) and photosynthetically active radiation.Autotrophs take up dissolved ammonium, nitrate, phosphate and inorganic carbon and, in the case of Trichodesmium, fix atmospheric nitrogen (Robson et al., 2013).Microalgae incorporate carbon (C), nitrogen (N) and phosphorus (P) at the Redfield ratio (106C : 16N : 1P; Redfield, 1963) while macrophytes do so at the Atkinson ratio (550C : 30N : 1P; Atkinson, 1983).Microalgae contain two pigments (chlorophyll-a and an accessory pigment) and have variable carbon:pigment ratios determined using a photoadaptation model (described in Baird et al., 2013).
Microzooplankton graze on small phytoplankton and mesozooplankton graze on large phytoplankton and microzooplankton, at rates determined by particle encounter rates and maximum ingestion rates.Of the grazed material that is not incorporated into zooplankton biomass, half is released as dissolved and particulate carbon, nitrogen and phosphate, with the remainder forming detritus.Additional detritus accumulates by mortality.Detritus and dissolved organic substances are remineralised into inorganic carbon, nitrogen and phosphate with labile detritus transformed most rapidly (days), refractory detritus slower (months) and dissolved organic material transformed over the longest timescales (years).The production (by photosynthesis) and consumption (by respiration and remineralisation) of dissolved oxygen is also included in the model and depending on prevailing concentrations, facilitates or inhibits the oxidation of ammonia to nitrate and its subsequent denitrification (in the sediment) to di-nitrogen gas which is then lost from the system.Full details of equations used in the BGC model are given by Baird et al. (2016a) and details of parameter values and implementation for the Great Barrier Reef are given by Herzfeld et al. (2016).
The model is forced using flow and concentrations of dissolved and particulate constituents from 21 rivers along the Queensland coast (north to south: Normanby, Daintree, Barron, combined Mulgrave + Russell, Johnstone, Tully, Herbert, Haughton, Burdekin, Don, O'Connell, Pioneer, Fitzroy, Burnett, Mary, Calliope, Boyne, Caboolture, Pine, combined Brisbane + Bremer, and combined Logan + Albert) and the Fly River in Papua New Guinea (Herzfeld, 2015).To determine river concentrations, sediment and nutrient observations were statistically evaluated over 10 years (Furnas, 2003).Separate analysis was undertaken for wet (the Fly and the northernmost six rivers in Queensland) and dry (remainder) catchment rivers.Volume-averaged wet season export coefficients based on this observed dataset were derived for wet-and dry-catchment river types, and mean flowweighted concentrations determined.These constant concentrations are multiplied by higher frequency (daily) observed discharge data to calculate the flux of constituents at the river mouths.
The eReefs BGC and sediment model has three open ocean boundaries.Nutrient concentrations flowing in from the boundaries were obtained from the CSIRO Atlas of Regional Seas (CARS) 2009 climatology (Ridgway et al., 2002) and empirical nutrient-temperature relationships.The ICs are specified by a generalised empirical relationship and scaled nutrient profiles on the model density profile specifying top and bottom water column values from the CARS ocean atlas.Surface NO 3 is usually low (< 3 mg m −3 ).In deeper waters nutrient concentrations increase from 0 to 1500 m depth and then remain constant down to the ocean floor (4000 m depth, 500 mg m −3 ).The ICs for most other tracers were not spatially resolved, since observations for the outer reef and Coral Sea are limited temporally and spatially.

Appendix B: Calculation of inherent optical properties
The optical model considers the processes of absorption and scattering by pure seawater, CDOM, non-algal particulate (NAP) and phytoplankton cells, as well as benthic reflectance.Here we describe the calculation of the IOPs, such as total phytoplankton absorption at a specific wavelength (those required for the assimilation), that are calculated from the model state variables (e.g.phytoplankton chlorophyll biomass) and model parameters (e.g.cell radius).A full description is given in Baird et al. (2016a), including benthic reflection calculations that are not given here and a description for how the optical model is used to calculate photosynthetic processes.

B1 Phytoplankton absorption
The absorption cross section (α) of a spherical cell of radius (r) pigment-specific absorption coefficient (γ ), and homogeneous intracellular pigment concentration (c i ), calculated using geometric optics: α = π rr 2 1 − 2(1 − (1 + 2γ c i r) e −2γ c i r (2γ c i r) 2 , where π r 2 is the projected area of a sphere.The use of an absorption cross section of an individual cell has two significant advantages.Firstly, the same model parameters used here to calculated absorption in the water column are used to determine photosynthesis in the biogeochemical model, including the effect of packaging of pigments within cells.Secondly, the dynamic chlorophyll concentration modelled in the biogeochemical model can be explicitly included in the calculation of phytoplankton absorption (Baird et al., 2013).The absorption of a population of n cell m −3 is given by nα m −1 .

B2 Coloured dissolved organic matter absorption
The absorption of CDOM at 443 nm, a CDOM,443 , is determined from a relationship with salinity in the following region (Schroeder et al., 2012): a CDOM,443 = −0.0332S + 1.2336, where S is the salinity.In order to avoid unrealistic extrapolation, the salinity used in this relationship is the minimum of the model salinity and 36.In some cases coastal salinities exceed 36 due to evaporation.The absorption due to CDOM at other wavelengths is calculated using a CDOM spectral slope for the region (Blondeau-Patissier et al., 2009) a CDOM = a CDOM,443 exp (−S CDOM (λ − 443)) , where S CDOM = 0.012 nm −1 is an approximate spectral slope for CDOM, with observations in the GBR ranging from 0.01 to 0.02 nm −1 for significant concentrations of CDOM.

B3 Absorption due to non-algal particulate material
In the model, optically significant NAPs include mineral particulates and detritus, with NAP absorption given by a NAP = c 1 NAP, where c 1 is spectrally resolved coefficients determined from in situ observations in Gladstone Harbour at times when absorption was dominated by particles.

B4 Total absorption
The total absorption, a, is given by a = a w + a NAP + a CDOM + where a w is pure seawater absorption and N is the number of phytoplankton classes.

B5 Scattering
The total scattering coefficient is given by b = b w + c 3 NAP + b phy where NAP is the concentration of non-algal particulates, b W is the scattering coefficient due to pure seawater, c 3 is the NAP-specific scattering coefficient and b phy is the chlorophyll-specific scattering coefficient with the water column chlorophyll concentration of each classes is given by n x c i,x V x (where c i is the intracellular chlorophyll concentration and V is the cell volume).Similarly to absorption of NAPs, spectrally resolved values for c 3 were based on field observations in Gladstone Harbour at times when absorption was dominated by particles.

Figure 1 .
Figure 1.A diagram denoting the steps used to relate and transform various optical properties with the system state and observations.The green circle represents the in-water system state as predicted by a marine BGC model.The blue circle represents the inherent optical properties (IOPs) of the system state.The magenta circle represents the depth resolved apparent optical properties (AOPs).The cyan circle represents the 2-D remote-sensing reflectance (RSR).The two red boxes represent either in situ or satellite remote-sensing observations.Each circle is partitioned into three segments where each segment represents the possibility to compare like-for-like variables.

Figure 2 .
Figure 2. A map of the Great Barrier Reef region, with the colour bar denoting the water depth, markers denoting the population centres (red triangles), IMOS NRS sites (yellow triangles), GBRMPA MMP Water Quality Meters (WQMs; yellow circles) and points of interest referred to in the text (red circles), with the glider track (white line adjacent to Lizard Island).The in situ sampling locations and glider observations are used to assess the data assimilation system performance.

Figure 4 .
Figure 4. Simulated surface Chl a (mg m −3 ) of the non-assimilating control run (top left) and the simulated OC3M (top right) derived from the simulated remote-sensing reflectance for 14 July 2013.The observed OC3M with ANN-derived observed remote-sensing reflectance (bottom left) and NASA-derived observed remote-sensing reflectance (bottom right).

Figure 5 .
Figure5.A log 10 -scaled scatter plot of simulated surface total Chl a and simulated OC3M for deep water regions (top-left) and whole of domain (top-right).The lower panel contains a scatter plot of MODIS OC3M against in situ Chl a data obtained from the IMOS Bio-Optical Database, which is publicly available through the Australian Ocean Data Network (AODN) portal (https://portal.aodn.org.au/).Matchups were included for MODIS overpasses within ±24 h of the in situ sampling.Points are coloured according to optical water type(Moore et al., 2009), ranging from low-chlorophyll open water (types 1-2) to coastal, CDOM and sediment-dominated waters (types 6-8).

Figure 6 .
Figure6.Comparison of mean innovation statistics for EXP 1-5.EXP 5 should be analysed with caution as these innovations relate to R551, not OC3M.An innovation of 0 indicates perfect agreement between model and observations.The top panel plots the mean innovation for each assimilation cycle.The lower panel plots the mean absolute innovation against assimilation cycle.The colours correspond to black (EXP1), red (EXP2), green (EXP3), blue (EXP4) and magenta (EXP5).The variables in the legend correspond to the observation error (R) and the assimilation state vector (X).

Figure 7 .
Figure 7.Comparison against independent IMOS glider observations, with the observed section of Chl a derived from the onboard fluorescence sensor.The comparison is undertaken in model space, whereby all glider data that fall within a 1 h window either side of when there is a 3-D model output available are extracted, and equivalent simulated Chl a are extracted from the model.The glider and model data are then spatially aggregated and interpolated onto the vertical model grid.The resulting observed (top) and simulated (control and assimilation experiments) sections for the IMOS glider are shown.

Figure 8 .
Figure8.Comparison of depth-resolved Chl a against six individual glider profiles using the method described in the caption of Fig.7.Glider observations of Chl a and backscatter coefficient are green and black respectively, the non-assimilating control run profiles are cyan and the assimilating run profiles for EXP4 are blue and EXP5 are magenta.

Figure 10 .
Figure 10.A comparison of Chl a and TSS RMSDs between the in situ GBRMPA MMP moorings for the non-assimilating (cyan) and assimilating runs of EXP2 (red), EXP4 (blue) and EXP5 (magenta); the GBRMPA MMP sites are denoted by the yellow circles in Fig. 1.

Figure 11 .
Figure11.A comparison of RMSD of simulated nutrients with in situ bottle samples for the non-assimilating (cyan) and assimilating run of EXP2 (red), EXP4 (blue) and EXP5 (magenta).Observations are obtained at the Queensland IMOS site (yellow triangles in Fig.1) at Yongala (Y) and North Stradbroke (NS) Island.Y_0 are the Yongala surface samples, while Y_26 are the samples taken from 26 m depth.

Figure 12 .
Figure 12.Box and whisker plots of RMSD (top row) and MAPE (bottom row) of the mismatch between simulated OC3M and ANNcorrected observed OC3M.Each panel contains the control run (C) and EXP4 showing forecast (F), persistence (P) and analysis (A).Presented are statistics for the whole domain (left column) and regions as defined by bottom depth (three rightmost columns), for which the mean and range of values are presented in Tables2 and 3.

Figure 14 .
Figure14.Increments that are added to the forecast fields generated by the assimilation system (EXP4 configuration) for simulated OC3M with innovations overlaid (left), and the prognostic variables of surface small phytoplankton Chl a (centre) and surface large phytoplankton Chl a (right) for cycle 22.

Figure 15 .
Figure 15.The resulting analysis fields for simulated OC3M and withheld ANN OC3M observations for EXP4 (cycle 22) overlaid (left) and the analysis fields for the prognostic variables of small phytoplankton (centre) and large phytoplankton (right) for the 12 September 2013.

Figure 16 .
Figure16.Observed true-colour image on 12 September 2013 obtained from 1 km resolution, atmospherically corrected ANN remote-sensing reflectance.The RGB wavelengths used were 667, 551 and 488 nm and processed using the MODIS true-colour algorithm(Gumley et al., 2010;Baird et al., 2016a).The white pixels are clouds, and grey is land.

Figure 17 .
Figure 17.The simulated true-colour image on 12 September 2013 of the control run (top left) and EXP4 assimilation run (top right).The difference between the remote-sensing reflectance in the control and assimilated runs was used to quantify the colour (referred to as off-colour) added (i.e.greater surface expression, bottom left) and subtracted (i.e. less surface expression, bottom right) due to the updating of optically active constituents in the assimilation run (see Fig.16for more details).Note that the off-colour images have a smaller brightening factor as the MODIS true-colour stretch saturates the features that are of most interest.Simulated true-colour images are not falsely coloured, and thus do not require a colour map, nor are they 2-D as they have a depth of field, being based on reflectance from multiple depths and the bottom(Baird et al., 2016a).Thus simulated true colour can be considered a photograph of the optical state of the different model runs and, like observed true colour, a powerful and intuitive visualisation tool for water clarity in BGC models.

Table 1 .
The subset of state variables included in the state vector and corresponding sum of the analytical (E A ) and difference in kind errors (E D ) expressed as standard deviations in log-space used for the five assimilation system configurations; note that E tot = E A + E D + E R , as per Sect.2.5.2, and E tot are used on the diagonal elements of the observation error covariance matrix (R).The bold variables in the state vector are used in the input to the observation operator.It should be noted than the state variables are transformed by taking the natural logarithm of the variables.The observation error is then applied to the log-transformed state vector.
The observational data are publicly available via the IMOS portal (https://portal.aodn.org.au/search).The AIMS MMP data are available via Australian Institute of Marine Science (AIMS) 2016, AIMS Water Quality Chlorophyll and Turbidity Time-series Data, http://data.aims.gov.au/metadataviewer/faces/view.xhtml?uuid=8a698de1-3fbf-48a5-b068-358b07aad35c.The model output is very large and is not catalogued online; however, it is available upon request from the author (Emlyn M. Jones).