Detection of anthropogenic climate change in satellite records of ocean chlorophyll and productivity

. Global climate change is predicted to alter the ocean’s biological productivity. But how will we recognise the impacts of climate change on ocean productivity? The most comprehensive information available on its global distribution comes from satellite ocean colour data. Now that over ten years of satellite-derived chlorophyll and productivity data have accumulated, can we begin to detect and attribute climate change-driven trends in productivity? Here we compare recent trends in satellite ocean colour data to longer-term time series from three biogeochemical models (GFDL, IPSL and NCAR). We ﬁnd that detection of climate change-driven trends in the satellite data is confounded by the relatively short time series and large interannual and decadal variability in productivity. Thus, recent observed changes in chlorophyll, primary production and the size of the oligotrophic gyres cannot be unequivocally attributed to the impact of global climate change. Instead, our analyses suggest that a time series of ∼ 40 years length is needed to distinguish a global warming trend from natural variability. In some regions, notably equatorial regions, detection times are predicted to be shorter ( ∼ 20 − 30 years). Analysis of modelled chlorophyll and primary production from 2001– 2100 suggests that, on average, the climate change-driven trend will not be unambiguously separable from decadal variability until ∼ 2055. Because the magnitude of natural variability in chlorophyll and primary production is larger than, or similar to, the global warming trend, a consistent, decades-long data record must be established if the impact of climate change on ocean productivity is to be deﬁnitively detected.


Introduction
Ocean primary production (PP) makes up approximately half of the global biospheric production (Field et al., 1998), so detecting the impact of global climate change on ocean productivity and biomass is an essential task. The consequence of increasing global temperatures, in combination with altered wind patterns, is to change the mixing and stratification of the surface ocean (e.g. Sarmiento et al., 2004). Reduced mixing and increased stratification at low latitudes will further limit the supply of nutrients to the euphotic zone, and is predicted to result in reduced PP. The canonical view is that at high latitudes, where phytoplankton growth is light limited during winter, decreased mixing may result in earlier re-stratification and a lengthened growing season, resulting in increased PP (Bopp et al., 2001;Doney, 2006). In contrast, recent analyses by Behrenfeld et al. (2008a and suggest that increasing sea surface temperature (SST) corresponds to reduced PP in sub-polar regions (although the response is weaker than for the sub-tropics). Water temperature also has a direct influence on phytoplankton growth and metabolic rates. Production increases with increasing temperature until a species-specific maximum is reached, after which rates decline rapidly (Eppley, 1972). Rising temperatures will also result in changes to the distribution of phytoplankton species. Some species, adapted to warm temperatures and low nutrient levels (usually small picoplankton) will expand their range, whilst others that prefer turbulent, cool and nutrient-rich waters (mostly large phytoplankton, e.g. diatom species) may migrate poleward as temperatures rise. Polar and ice-edge species will have to adapt to warmer conditions and associated changes in stratification and freshwater input, or risk extinction. These shifts in species composition may alter carbon export and the availability of food to higher trophic levels. Large phytoplankton, such as diatoms Published by Copernicus Publications on behalf of the European Geosciences Union. and coccolithophores, are believed to be responsible for the majority of carbon export (e.g. Michaels and Silver, 1988;Brzezinski et al., 1998). If replaced by smaller warm-water species, the export of carbon from surface waters may be reduced. Phytoplankton are also the base of the marine food web and shifts in the dominant species or overall abundance may alter fishery ranges and yields (e.g. Iverson, 1990;Chavez et al., 2003;Ware and Thomson, 2005;Cheung et al., 2009aCheung et al., , 2009b. Because of ocean productivity's key role in the global carbon cycle, many studies have sought to quantify the variability and climate response of biomass and/or PP (for a recent review of studies using satellite data, see McClain et al., 2009). Long (∼ 20 years) time series of chlorophyll and PP have been measured at the BATS and HOT stations. These time series stations have the advantage of measuring subsurface and biogeochemical properties that cannot be estimated using satellite data. However, the principal source of global, multi-year ocean productivity data are the Sea-WiFS and MODIS-Aqua ocean colour instruments. Sea-WiFS has been operating since September 1997 (continuously until December 2007, and intermittently thereafter), and MODIS-Aqua since July 2002. In addition, limited ocean colour data are available from the Coastal Zone Color Scanner (CZCS), which operated from 1978-1986(Madrid et al., 1978, although difficulties cross-calibrating the CZCS and contemporary records have hampered efforts to study multi-decadal variability (Antoine et al., 2005;Gregg et al., 2003). Notwithstanding this, Martinez et al. (2009) demonstrated that the first principal component of CZCS (processed using the Antoine et al. (2005) methodology) and SeaWiFS chlorophyll-a (chl) show similar responses to SST. However, a direct comparison of changes in the magnitude of chl or PP between the CZCS and SeaWiFS datasets remains problematic.
With over 10 years of data now available, SeaWiFS products are being used to explore variability and trends in subtropical productivity (e.g. Behrenfeld et al., 2006;McClain et al., 2004;Gregg et al., 2005;, high latitude productivity (Behrenfeld et al., 2008a;Behrenfeld et al., 2009), coastal productivity (Kahru and Mitchell, 2008;Kahru et al., 2009) and extent of the oligotrophic gyres (McClain et al., 2004;Irwin and Oliver, 2009). These studies found that trends in the SeaWiFS record are often dominated by natural decadal variability, as embodied in indices such as the Pacific Decadal Oscillation or the Multivariate ENSO Index. However, one study concluded that the size of the oligotrophic gyres had increased over the span of the SeaW-iFS record as a response to global warming (Polovina et al., 2008).
Models have also been used to investigate the interplay between natural variability and global climate change trends. For example, Boyd et al. (2008) demonstrated that, in the Southern Ocean, the magnitude of long-term changes in stratification and mixed layer depth in an earlier version of the NCAR model run under global warming conditions were small relative to the interannual variability; and that a definitive climate-warming signal in modelled mixed layer depth could not be detected until ∼ 2040 in Southern Ocean polar waters and no unequivocal trend at all was detected in the sub-polar region (their analysis extended to 2100). Boyd et al. (2008) speculated that the gradual changes in physical properties would result in similarly slow changes in phytoplankton populations. Similarly, an experiment with an earlier version of the IPSL model, forced with a CO 2 doubling scenario, demonstrated that it took between 30 and 60 years to detect changes in export production in the equatorial Pacific (Bopp et al., 2001).
Here, we use both satellite ocean colour data and output from 3 coupled physical-biogeochemical models (GFDL, IPSL and NCAR) to explore the decadal variability, historical trends and future response of chlorophyll concentration and PP. We examine trends in both chl and PP here, as the chl product from the SeaWiFS instrument is better validated and has lower errors than PP. This is partly because the database of in situ observations used to calibrate the algorithms contains many more chl than PP measurements, and partly because chl is more closely related to what SeaWiFS actually measures (water leaving radiances). In addition, the chl product represents surface concentrations, whereas PP is an estimate of the depth-integrated productivity. Algorithms to derive PP from satellite data are still subject to fairly large uncertainties (e.g. Joint and Groom, 2000), partly because satellite ocean colour instruments only measure surface conditions and extrapolating to a depth-integrated quantity poses additional difficulties. Uncertainties also arise from errors in the input parameters to PP algorithms (i.e. chl, SST and photosynthetically available radiation; Friedrichs et al., 2009). Indeed, in some instances, satellite PP algorithms are no more skilful at reproducing in situ PP measurements than biogeochemical models (Friedrichs et al., 2009). We investigate both chl and PP here because chl can change without corresponding changes in phytoplankton biomass or PP, due to the ability of cells to alter their chlorophyll to carbon ratio in response to light or nutrient stress (e.g. Laws and Bannister, 1980;Geider, 1987). An investigation of how this might be manifested in the SeaWiFS record can be found in Behrenfeld et al. (2008b). Primary production, on the other hand, is the parameter that will have a more direct impact on the global carbon cycle.
A note on terminology is warranted here to avoid confusion. Throughout the paper, we use the phrase "natural variability" to refer to interannual, decadal or multi-decadal variability in PP or chl driven by oscillatory or transient physical forcing (e.g. El Niño events, shifts in the phase of the North Atlantic Oscillation etc). This is contrasted with "trends", which we use to indicate long-term (multi-decadal or longer) changes in PP or chl driven by persistent anomalous forcing (i.e. global warming). We first investigate whether the trends in PP, chl and oligotrophic gyre size observed in the 10-year Biogeosciences, 7, 621-640, 2010 www.biogeosciences.net/7/621/2010/ satellite record are likely to be reflecting climate change, and conclude that the dataset is not yet long enough to unequivocally detect a global warming trend. A statistical analysis of biogeochemical model output suggests that a PP or chl time series of ∼ 40 years duration will be needed to distinguish a climate change signal from natural interannual to decadal variability. We also explore predictions of when the impact of global climate change on chl and PP will exceed the range of natural variability and become unambiguously detectable. The analyses presented here have significant implications for our ability to recognise the impacts of climate change on ocean productivity, and for strategies for monitoring ocean biology into the future.

Satellite data
Monthly mean level-3 chlorophyll concentration data (derived from algorithm OC4, reprocessing v5.2) for September 1997-December 2007 were downloaded from http:// oceancolor.gsfc.nasa.gov/. Satellite-derived chl, SST and photosynthetically available radiation were used to estimate PP using three different algorithms (Behrenfeld and Falkowski, 1997;Carr, 2002;Marra et al., 2003). Each algorithm has been validated against a database of in situ measurements, but each is formulated somewhat differently. To minimise potential biases or errors associated with any one algorithm, we use the PP estimated from all three methods. Each of these three algorithms produced PP trends of similar magnitude and spatial distribution. A fourth PP algorithm, the Carbon-based Productivity Model (CbPM), was also tested (Behrenfeld et al., 2005). Calculation of PP using the CbPM requires knowledge of the mixed layer depth (MLD). We calculated PP using MLD estimated from the ECCO (Estimating the Circulation and Climate of the Ocean; www.ecco-group.org) and the SODA (Simple Ocean Data Assimilation; www.atmos.umd.edu/ ∼ ocean) reanalysis programmes, and also using the hybrid MLD data used in Behrenfeld et al. (2005) and described at http://www.science. oregonstate.edu/ocean.productivity/mld.html. The sensitivity of the CbPM-derived PP to the MLD product used is detailed in Milutinovic et al. (2009). Our analyses found that each MLD product resulted in substantially different magnitude and spatial distribution of statistically significant trends. Results from the CbPM algorithm were also substantially different from the three other algorithms. The PP derived from this algorithm was excluded from the subsequent analyses.

Global physical-biogeochemical models
Three coupled physical-biogeochemical models are used to estimate long-term trends in PP: GFDL-TOPAZ (Dunne et al., 2005 and, IPSL-PISCES (Aumont and Bopp, 2006) and NCAR-CCSM3 . For the hindcast runs, ocean-only versions of the different models are used. Air temperature and incoming fluxes of wind stress, freshwater, shortwave and longwave radiation are prescribed as boundary conditions from the CORE version 2 reanalysis effort (for the GFDL and NCAR models), which covers the period 1958-2006 Yeager, 2004 and, and from NCEP forcing for the IPSL model, from 1948(Kalnay et al., 1996. The CORE forcing dataset is based on the NCEP forcing, with additional satellite data incorporated. The forcing datasets thus contain recent signals of climate change, e.g. rising air temperatures. Running the models in hindcast mode means that the modelled interannual and decadal variability is directly comparable to the variability in the data, rather than just in a statistical sense (as is the case for the global warming simulations in the coupled models). For the future climate change runs, the full coupled climate-biogeochemistry versions of the different models are used. These future climate change runs all use historical forcing (greenhouse gases and aerosols emissions or concentrations) from 1860-2000 and the IPCC A2 scenario (Nakicenovic and Swart, 2000) from 2001-2100. The A2 scenario envisages continued population growth and an increasing economic gap between the industrialised and developing nations, resulting in high cumulative carbon emissions. Each model's biogeochemistry is parameterised differently, and so results from all three models are compared in order to minimise potential errors and biases associated with any one model.

GFDL model
A biogeochemical and ocean ecosystem model (TOPAZ), developed at GFDL, has been integrated with the MOM-4 ocean model (Griffies et al., 2004;Gnanadesikan et al., 2006). MOM-4 has 50 vertical z-coordinates and spatial resolution is nominally 1 • globally, with higher 1/3 • resolution near the equator. The TOPAZ biogeochemical model includes all major nutrient elements (N, P, Si and Fe), and both labile and semi-labile dissolved organic pools, along with parameterizations to represent the microbial loop. Growth rates are modelled as a function of variable chl:C ratios and are co-limited by nutrients and light. Photoacclimation is based on the Geider et al. (1997) algorithm, extended to account for co-limitation by multiple nutrients and including a parameterisation for the role of iron in phytoplankton physiology (following Sunda and Huntsman, 1997). Loss terms include zooplankton grazing and ballast-driven particle export. Remineralisation of detritus and cycling of dissolved organic matter are also explicitly included (Dunne et al., 2005). The model includes highly flexible phytoplankton stoichiometry and variable chl:C ratios. Three classes of phytoplankton form the base of the global ecosystem. Small phytoplankton represent cyanobacteria and picoeukaryotes, resisting sinking and tightly bound to the microbial loop.
Large phytoplankton represent diatoms and other eukaryotic phytoplankton, which sink more rapidly. Diazotrophs fix nitrogen directly rather than requiring dissolved forms of nitrogen. Wet and dry dust deposition fluxes are prescribed from the monthly climatology of Ginoux et al. (2001) and converted to soluble iron using a variable solubility parameterisation (Fan et al., 2006). TOPAZ includes a simplified version of the oceanic iron cycle including biological uptake and remineralisation, particle sinking and scavenging and adsorption/desorption. Application of the TOPAZ model to determining global particle export and phytoplankton bloom timing have been detailed in Dunne et al. (2007) and Henson et al. (2009), respectively. The hindcast simulations were spun-up for 250 years using a repeat annual cycle of physical forcing, prior to initiating the interannually varying forcing. For the coupled runs, the GFDL Earth System Model (ESM2.1) includes atmospheric (AM2.1) and terrestrial biosphere (LM3) components (Anderson et al., 2004), in addition to the TOPAZ biogeochemistry model. The physical variables in GFDL's ESM2.1 were initialized from GFDL's CM2.1 (Delworth et al., 2006). The control run based on 1860 conditions was spun-up for 2000 years. Biogeochemical parameters were initialized from observations from the World Ocean Atlas 2001 (Conkright et al., 2002) and GLO-DAP (Key et al., 2004). This model was spun up for an additional 1000 years, with a fixed CO 2 atmospheric boundary condition of 286 ppm. For an additional 100 years, the atmospheric boundary condition was switched to a fully interactive atmospheric CO 2 tracer. Simulations were then made based on the IPCC AR4 protocols (A2 scenario).

IPSL model
The IPSL PISCES biogeochemical model (Aumont and Bopp, 2006) is coupled to the NEMO-OPA ocean general circulation model (Madec, 2008) in a configuration that here has 30 vertical levels and a horizontal resolution of 2 • ×cos (latitude) in the extratropics, with enhanced resolution of 0.5 • at the equator. Phytoplankton growth in the PISCES model can be limited by temperature, light and five different nutrients (NO 3 , PO 4 , Si, Fe and NH 4 ). Two phytoplankton and two zooplankton size classes are represented: nanophytoplankton, diatoms, microzooplankton and mesozooplankton. The diatoms differ from the nanophytoplankton by requiring silica for growth, by having higher requirements for iron (Sunda and Huntsman, 1995) and by higher half-saturation constants. For all species, the C:N:P ratios are assumed constant at the values proposed by Takahashi et al. (1985), but the internal ratios of Fe:C, chl:C and Si:C of phytoplankton are prognostically simulated. There are three non-living components of organic carbon: semi-labile DOC, and large and small detrital particles that differ by their vertical sinking speeds. The microbial loop is also implicitly represented. Nutrients are supplied to the ocean from three sources: atmospheric deposition, rivers and sedimen-tary sources. Iron is supplied by aeolian dust deposition, estimated from the monthly modelled results of Tegen and Fung (1995). Iron is also supplied from sediments following the method of Moore et al. (2004). Iron concentrations at the surface are restored to a minimum of 0.01 nM. This baseline concentration, which represents non-accounted for sources of iron that could arise from processes not explicitly taken into account in the model, has been shown to greatly improve the representation of the chlorophyll tongue and the iron-to-nitrate limitation transition zone in the Equatorial Pacific (Schneider et al. 2008). An improved version of PISCES (Tagliabue et al., 2009), taking into account Fe speciation, is able to represent the zonal extent of Equatorial Pacific chlorophyll without needing to include an unconstrained Fe source, but is not used in this study. The PISCES model has previously been used for a variety of studies concerned with paleo, historical and future climate. A full description and an extended evaluation against climatological datasets can be found in Aumont and Bopp (2006).
For the hindcast simulations, the initial conditions for nutrients and inorganic carbon are prescribed from data-based climatologies and the model is spun-up for 150 years using ERA-40 interannually varying forcing, prior to initiating the NCEP interannually varying forcing in 1948. For the global warming simulations, we used an offline version of the PISCES model that is forced with monthly outputs of a coupled climate simulation carried out with the IPSL-CM4 model as described in Bopp et al. (2005). IPSL-CM4 consists of an atmospheric model (LMDZ-4;Hourdin et al., 2006), a terrestrial biosphere component (ORCHIDEE; Krinner et al., 2005) and the OPA-8 ocean model and LIM sea ice model (Madec et al., 1998).

NCAR model
The Community Climate System Model (CCSM-3) ocean biogeochemistry model, consisting of an upper-ocean ecological module (Moore et al., 2004) and a full-depth ocean biogeochemistry module , is embedded in a global physical ocean general circulation model (Collins et al., 2006). The ecosystem module is based on the traditional NPZD (nutrients-phytoplankton-zooplankton-detritus) type models, expanded to include multiple nutrients that can limit phytoplankton growth (N, P, Si and Fe) and specific phytoplankton types (Moore et al., 2004). Three phytoplankton classes are represented: diatoms, diazotrophs and small plankton (pico/nanoplankton). Diazotrophs fix their required nitrogen from N 2 gas; small plankton exhibit rapid and very efficient nutrient recycling; and the diatom group represents large, bloom-forming phytoplankton. Growth rates are determined by available light and nutrients and photoadaptation is parameterised with variable chl:C ratios. The model has one zooplankton class which grazes on phytoplankton and large detritus. The biogeochemistry module includes full carbonate system thermodynamics and air-sea CO 2 and O 2 fluxes Biogeosciences, 7, 621-640, 2010 www.biogeosciences.net/7/621/2010/ 2006), nitrogen fixation and denitrification (Moore and Doney, 2007) and a dynamic iron cycle with atmospheric dust deposition, scavenging and a lithogenic source (Moore et al., 2006). For the hindcast simulations, the initial conditions for nutrients and inorganic carbon are prescribed from data-based climatologies (e.g. Key et al., 2004). The biogeochemistry model is spun up for several hundred years using a repeat annual cycle of physical forcing, prior to initiating the interannually varying forcing. Model ecosystem components converge within a few years (Doney et al., 2009a and. For the coupled runs, the NCAR model (CCSM3.1) includes, in addition to the ocean biogeochemistry and ecosystem components, a prognostic carbon cycle and coupled terrestrial carbon and nitrogen cycles (Thornton et al., 2009) embedded into a land biogeophysics model (Dickinson et al., 2006). Details of the initialisation, spin-up and overall behaviour of this version of the model can be found in Thornton et al. (2009). In brief, a sequential spin-up procedure was employed, similar to one previously described by Doney et al. (2006), to reduce the magnitude of drifts in the carbon pools when carbon and nitrogen are coupled to the climate of the coupled model. The ocean component was branched from the end of the ocean-only spin-up simulation mentioned above and which was forced with an observationally based physical atmospheric climatology and fixed atmospheric CO 2 held at a preindustrial value. Several incremental coupling steps are performed over several hundreds of years of model simulation to bring the system gradually to a stable initial condition in both surface temperature and atmospheric CO 2 . A 1000-year long preindustrial control simulation was then performed, and the historical and A2 simulations were branched from the middle of the pre-industrial control simulation.

Statistical analyses
The linear trend in monthly anomalies of SeaWiFS-derived chl and PP was calculated using a simple model, which can be formally stated as: where Y t is the data, µ is a constant term (the intercept), X t is the linear trend function (here time in months), ω is the magnitude of the trend (the slope) and N t is the noise, or unexplained portion of the data. The noise, N t , is assumed to be autoregressive of the order 1 (i.e. AR, Eq. 1), so that successive measurements are autocorrelated, φ =Corr (N t , N t−1 ).
Large values of autocorrelation, as often found in geophysical variables, increase the length of trend-like segments in the data, confounding the estimate of the real trend.
For the global warming simulations, we also tried fitting an exponential curve to the PP time series, of the form: where µ =ln (α). The coefficient of determination and standard deviation of the residuals were compared for the linear and exponential fits. In the vast majority of cases the two fits had similar statistics, so in the interests of parsimony we used the linear trend model. The number of years required to detect a trend above natural variability is calculated by the method of Tiao et al. (1990) and Weatherhead et al. (1998). More details of the origin of this equation can be found in Appendix A. The number of years, n * , required to detect a trend with a probability of 90% is: where σ N is the standard deviation of the noise (i.e. the residuals after the trend has been removed), ω is the estimated trend and φ is the autocorrelation of the AR (Eq. 1) noise. The number of years required to detect a trend when a data gap is present, n * * , is: where τ = (T 0 -1)/T and T is the total length of the time series and T 0 is the time of the interruption. For an interruption half-way through the data collection period, τ =0.5 and n * * is a factor of 1.59 larger than n * .

Biome definition
For ease of presentation, the calculated trends are averaged within 14 biomes (marked in contours on Fig. 1). The mid to high latitude biomes are defined as the regions in which phytoplankton growth is seasonally light limited (> 6 months/year when depth-averaged irradiance is < 21 W m −2 ; Riley, 1957). The equatorial regions are those in which annual mean net heat flux is < 30 W m −2 (ocean gaining heat). The remaining areas are classed as oligotrophic. Mixed layer depth data used to define the biomes came from the SODA programme; photosynthetically available radiation data came from the SeaWiFS project (http: //oceancolor.gsfc.nasa.gov); and net heat flux data was calculated using NCEP-NCAR Reanalysis Project fields (http: //www.cdc.noaa.gov/data/gridded/). The biomes are further divided by hemisphere and ocean basin, and finally the low latitude Indian Ocean is separated into Arabian Sea and Bay of Bengal biomes.

Climate change or decadal variability?
As a measure of the change in ocean productivity in the last was calculated (Fig. 1). Only those regions where the trend is statistically significant at the 95% level are plotted. The strong El Niño event at the start of the Sea-WiFS record in 1997/1998 is worth noting here, as linear trends in short data records can be sensitive to the values at the beginning and end of the time series. There are several large, coherent patches of significant trend in both chl and PP, particularly in the oligotrophic gyres of all three ocean basins, whilst at high latitudes there are a few smaller patches of significant trend. The typical magnitude of trends in chl is ∼ ± 0.002 mg m −3 /year, with peak values of ± 0.04 mg m −3 /year. For PP, typical trend magnitudes are of the order ∼ ± 1 mg C m −2 day −1 /year, with maxima of ∼ ± 30-40 mg C m −2 day −1 /year. The strongest negative trend is in the northern North Atlantic, and strongest positive trend is south and east of Australia. The spatial distribution of statistically significant trends are similar to the regions of large PP change between 1999 and 2004 shown in Behrenfeld et al. (2006;their Fig. 3b). The trends in the sub-tropics have been interpreted as reflecting the impact of global warming on PP (Polovina et al., 2008;Gregg et al., 2005;Kahru et al., 2008). However, to positively attribute these trends to climate change it has to be demonstrated that a 10-year record is able to capture a real trend, rather than just natural interannual to decadal variability. The SeaWiFS trends are compared to those estimated from three different biogeochemical models run using reanalysis forcing. The modelled chl and PP is split into overlapping 10-year sections (i.e. the trend for the period January 1958-December 1967 is calculated, then the trend for the period January 1959-December 1968, in order to examine the effect of using the relatively short time series of SeaWiFS observations to define trends. The 10-year trends calculated from the models are compared to the SeaWiFS chl trend in Fig. 2 and SeaWiFS PP trend in Fig. 3. (Note that analysis of different periods, e.g. September 1959 to December 1968, and so on, in the models, or January 1998 to December 2007 in the SeaWiFS data, did not significantly change the results). For ease of presentation the trends are reported as the average in each biome (biomes marked on Fig. 1). The trends and variability are similar in all 3 models, particularly in low latitudes, despite each model having differently parameterised physics and biogeochemistry. The trends in PP for each of the three different SeaWiFS algorithms are plotted as red stars in Fig. 3, and are generally similar. The high latitude North Pacific has the largest difference between the three algorithms, with the Behrenfeld and Falkowski (1997) algorithm showing a small positive trend in PP and the Carr (2002) and Marra et al. (2003) algorithms exhibiting a negative trend. In the other regions, the calculated trend is relatively insensitive to the algorithm used to estimate SeaWiFS PP.
The final datapoints of the modelled results in Figs. 2 and 3 represent the 10-year model trend that overlaps with the SeaWiFS record. In some biomes, e.g. the equatorial Pacific, the modelled trends overlap with the trends in SeaWiFSderived chl and PP, but in other regions the modelled and data trends diverge (e.g. the oligotrophic North Atlantic). This may arise from the models' lack of skill in reproducing the observed interannual variability. Of particular importance is the models' ability to reproduce the chl or PP response to the 1997/1998 El Niño. Correlation coefficients between time series of satellite-derived and modelled globally averaged chl and PP show a wide range of skill (For chl, r = 0.32 (p < 0.05) for GFDL, r = 0.22 (p < 0.05) for IPSL and r = 0.1 (n. s.) for NCAR. For PP, r = 0.23 (p < 0.05) for GFDL, r = 0.62 (p < 0.05) for IPSL and r = 0.15 (n. s.) for NCAR.) The models' coarser resolution, as compared to the data, errors in spatial positioning of circulation features, e.g. upwelling regions, and variability in the trends within each biome (Fig. 1), may result in mismatch between modelled and data-derived trends, when averaged within the biomes. We use the modelled trends here to place the SeaW-iFS data in a longer-term context and provide an estimate of variability in previous decades.  If a climate change trend were dominating the chl or PP signal, Figs. 2 and 3 would show consistently positive or negative trends. Instead, the sign of the trend in the 10-year long sections of modelled chl and PP switches between positive and negative on decadal timescales. The 10-year trend in SeaWiFS chl and PP is of similar magnitude to trends of previous decades, suggesting that the magnitude of decadal variability in chl or PP is currently larger than, or similar to, the response to global climate change. This influence of decadal variability on determining the apparent trends in relatively short time series is particularly evident in the low latitude biomes. For example, in the oligotrophic North Pacific, strong decadal variability is evident in the regular switching between periods of positive and negative trends. Seen in this longer-term context, it appears that the negative trend in the oligotrophic gyres observed in the last 10 years of SeaW-iFS data (Polovina et al., 2008;Gregg et al., 2005) is likely reflecting decadal variability, rather than a global warming response. For both chl and PP, the trends in the 10 years of SeaWiFS data fall within the bounds of trends in previous decades in most biomes in at least two of the models (i.e. the 95% confidence intervals overlap). The exceptions for chl are the high latitude North Atlantic and the Arabian Sea (Fig. 2), where the observed variability is greater than expected from the model results. In all other biomes, the trends in the 10 years of SeaWiFS chl and PP are not unprecedented when viewed in a longer-term context.
A climate change trend may be present in the data, in addition to the natural variability. However, within the relatively short length of the satellite ocean colour time series, the decadal variability is of a greater, or similar, magnitude than the trend. Therefore, the linear trends in PP or chl estimated 10-year trend in SeaWiFS-derived primary production (calculated using three different PP algorithms) compared to 10-year trends in three global biogeochemical models. The mean trend and 95% confidence interval in SeaWiFS-derived PP in each biome (see Fig. 1) and from 10-year long segments of output from the GFDL, IPSL and NCAR models are plotted. Negative (positive) trends for a particular 10-year period represent declining (increasing) primary production values over that period. The first data point is the trend in modelled primary production from January 1958-December 1967 and is plotted at 1958; the second is the trend from January 1959-December 1968 and is plotted at 1959, etc. from the SeaWiFS record cannot be separated from interannual to decadal variability, and cannot be attributed unequivocally to the impact of global warming.

Expansion of the oligotrophic gyres
The negative trends in SeaWiFS chl in the oligotrophic gyres ( Fig. 1) have been attributed to global warming-related increases in SST and stratification (Polovina et al., 2008). The models again allow the recent observed trends in the areal extent of oligotrophic waters to be put into a longer-term context. The size of the oligotrophic regions are estimated as the area (km 2 ) of the ocean where chl < 0.07 mg m −3 , following Polovina et al. (2008) and McClain et al. (2004). The time series from 1958-2006 of oligotrophic gyre size, both globally and regionally, in each of the three models is plotted in Fig. 4. In all three models, the global extent of oligotrophic waters has distinct multi-decadal variability, with a period of reduced size from 1958-1977, and increased area from 1977-1996. There is a local minimum in 1998, after which the global oligotrophic area increases again.
Regionally, the North Pacific gyre size has pronounced variability with a period of 4-6 years and reflects the El Niño-Southern Oscillation (ENSO) cycle. During El Niño events equatorial upwelling is curtailed, resulting in a temporary expansion of the region of low productivity, and vice versa during La Niña years. The size of the South Pacific gyre has a distinct step change around 1977, coinciding with the well-documented regime shift of the North Pacific ecosystem (Francis et al., 1998;McGowan et al., 1998). Superimposed on this increase of ∼ ×8 × 10 6 km 2 is substantial interannual variability. In the GFDL and NCAR models, the South Biogeosciences, 7, 621-640, 2010 www.biogeosciences.net/7/621/2010/ Atlantic gyre has a more gradual decline in size with a transition around 1990 to an oligotrophic area ∼ ×1 × 10 6 km 2 smaller than in previous decades. The North Atlantic has an increasing trend in oligotrophic area with large decadal variability superimposed in the GFDL and NCAR models. No trend in the North or South Atlantic gyre size is evident in the IPSL model. This may be due to the implementation of a minimum iron concentration in the IPSL model, which has the effect of dampening the variability of iron and corresponding variability in PP. In most oligotrophic regions, and in the global total, a local minimum occurs around 1998, after which the size of the low chlorophyll area increases again. The minimum is likely driven by the strong ENSO event which occurred in 1997/1998, and which happened to coincide with the start of the SeaWiFS data record. This is the likely origin of the increasing trend in gyre size observed in the SeaWiFS data (Polovina et al., 2008;Irwin and Oliver, 2009). Evidently, large decadal variability in the extent of the oligotrophic waters confounds attempts to extract trends from the 10-year satellite record. The models provide the needed context and suggest that in some regions, and some models, the size of the low chlorophyll area may have a long-term trend (in some areas increasing and in others decreasing), in addition to decadal variability. More certain is that ENSO events, regime shifts, and decadal variability have a pronounced influence on the size of the oligotrophic gyres.

Modelled trends in productivity in global warming simulations
So far the analysis has used output from hindcast model simulations for the contemporary period. The results generally indicate that any climate change trend in the 10 years of satellite-derived chl or PP is not yet distinguishable from the natural interannual to decadal variability. Clearly, 10 years is not enough, but how many years of observations will we need to detect a trend? To answer this question, we use output from coupled ocean-atmosphere models run into the future under global warming conditions. For the rest of the analysis, we turn to simulations forced with the IPCC global warming scenario, A2. The modelled trends in chl and PP for the period 2001-2100 for all three coupled models are plotted in Fig. 5. For detailed intercomparisons of modelled global warming response in chl and PP see also Schneider et al. (2008) and Steinacher et al. (2009)

How many years of data are needed to detect a trend in ocean productivity?
The output from the global warming simulations can be used to investigate the length of time series needed to detect a trend above the natural variability. We employ a method that calculates the signal-to-noise (i.e. trend-to-natural variability) ratio of a time series and, accounting for auto-correlation, estimates the number of data points necessary to detect a real trend (Eq. 3); Weatherhead et al., 1998). The method is applied to output from the three models run under the IPCC A2 scenario. The number of years required to detect a trend above the natural variability in chl and PP is Biogeosciences, 7, 621-640, 2010 www.biogeosciences.net/7/621/2010/ Table 1. Length of time series in years needed to detect a global warming trend in chlorophyll concentration and primary production (bold) above the natural variability, reported for each model as the average within the biomes (see Fig. 1 for biome locations). One standard deviation of the spatial average is shown in brackets.  (13) 44 (14) 34 (11) (7) 15 (2) 24 (8) 32 (6) 32 31 9. Oligotrophic South Atlantic 40 (12) 40 (13) 35 (12) 23 (13) 33 (13) 38 ( plotted in Fig. 6. The minimum length time series required is at least 15 years, but in many regions a time series of 50-60 years or more is needed (see Table 1 for biome mean values). (Note that the methodology used here does not specify a start date for the period of observations. In Sect. 3.5 the time period during which the climate change-driven signal exceeds natural variability is estimated, specifically to address whether a global warming signal might already be detectable in the satellite ocean colour record.) All three models suggest relatively short detection times (∼ 20 − 30 years) for chl in the equatorial regions. Longest detection times for chl (∼ 50 − 60 years) occur in parts of the Southern Ocean. The climate change trend in PP in the IPSL model is the most rapidly detectable, with a mean of ∼ 33 years. All three models suggest shorter detection times (∼ 20 − 30 years) for PP in equatorial regions (including the Arabian Sea) and the South Atlantic. Longest detection times (∼ 50 − 60 years) for PP occur in parts of the Southern Ocean and in the Arctic north of Iceland. Globally, the average length of a continuous time series required to unequivocally detect a trend in chl is 39 years or 41 years for PP. The satellite ocean colour dataset is currently 30 years short of that target. In order to extend the ocean productivity dataset, the CZCS data (1978)(1979)(1980)(1981)(1982)(1983)(1984)(1985)(1986) have been reprocessed to be consistent with SeaWiFS, creating a quasi 31-year dataset. However, two different methodologies have been developed, each of which gives different results. One method yields a 6% decrease in global chl between the 1980's and the early part of the SeaWiFS period (Gregg et al., 2003); the other method indicates a 22% increase (Antoine et al., 2005). Although a recent study (Martinez et al., 2009) employed EOF analysis to demonstrate that variability in CZCS and SeaWiFS chl responded in a similar fashion to sea surface temperature, directly comparing the two datasets remains challenging. The obvious technical difficulties in producing a consistent time series from two differently designed instruments that did not overlap in time sounds a clear note of caution about potential future gaps in the satellite ocean colour record. Fig. 6. Number of years required to detect a global warming trend in chlorophyll concentration (left column) and primary production (right column) above the natural variability, calculated for the GFDL, IPSL andNCAR models (A2 scenario, 2001-2100). Only points where the trend is statistically significant at the 95% level are plotted.
If there is a gap in the ocean colour time series, there are not only cross-calibration issues to face; the number of years required to detect a trend will also increase. If the data gap occurs roughly halfway through data collection, the number of years required would increase by ∼ 50% (Eq. 4); Weatherhead et al., 1998). So in the case of ocean PP or chl, if a data gap arises due to the failure of SeaWiFS and MODIS-Aqua, the mean length of time needed to detect a global climate change response would increase from ∼ 40 to ∼ 60 years.

When could the climate change signal exceed natural variability in productivity?
Although we need many more years of data before a trend in chl or PP can be unequivocally ascribed to global warming, is it possible that climate change has already impacted productivity within the satellite ocean colour era? The modelled chl and PP provides an estimate of the year when the climate change signal exceeds the natural variability of the system, represented by the standard deviation of the models' control runs (i.e. no external CO 2 forcing is applied). The year when the climate change-driven signal exceeds the variability is defined here as the year when the chl or PP in the warming run exceeds the standard deviation of the control run for at Biogeosciences, 7, 621-640, 2010 www.biogeosciences.net/7/621/2010/  Table 2. In general, even if the extended CZCS-SeaWiFS dataset were used, the observed shifts in chl or PP are unlikely to exceed the natural variability, and therefore cannot be unequivocally attributed to global warming. Note also that there are extensive regions where the changes in chl or PP remain smaller than the natural variability throughout the time frame of this analysis (which extends to 2100). An example from the oligotrophic Pacific (Fig. 7b) demonstrates how a climate change signal may be masked by vigorous interannual and decadal variability. As a global average, the climate change-driven trend in chl does not exceed natural variability until ∼ 2052 and not until ∼ 2057 for PP.

Discussion
The launch of the SeaWiFS ocean colour instrument in September 1997 ushered in a new era of biological oceanography. For the first time, daily high resolution images of surface phytoplankton distributions became publicly available, resulting in a substantial leap forward in our understanding of ocean productivity patterns from the global scale to the mesoscale and in temporal variability from days to years. Ten-plus years of ocean colour data have provided unprecedented coverage of changes in ocean productivity -but are the observed changes reflecting global climate change or natural variability? Our analyses suggest that 10 years of ocean colour data alone are not enough to unequivocally ascribe a trend in PP or chl to global climate change. Decadal variabilty in chl and PP is sufficiently large that it confounds attempts to determine trends in the relatively short time series available. Indeed, Fig. 8. The year when the trend in chlorophyll concentration (left column) and primary production (right column) exceeds the natural variability in the GFDL, IPSL and NCAR models, run with the IPCC A2 warming scenario from 1968-2100. White areas are where the trend never exceeds the natural variability. Purple and dark blue areas are where the trend exceeded the natural variability within the time period of contemporary satellite data. decadal variability can appear to reverse a climate change trend when 10-year datasets are examined. Consider the time series of PP from a global warming simulation shown in Fig. 7c. If a satellite with a 10-year life span were launched in 2007, we might be tempted to assume that there was a positive trend in PP. However, if a satellite were launched instead in 2016, we would observe a decreasing trend in PP. Ocean productivity has multiple time scales, responding as it does to variability in physical forcing on seasonal, interannual and decadal scales. In order to detect a long-term trend, a dataset that is considerably longer than the time scale of natural variability is necessary. In the case of ocean productivity, 10 years of data is insufficient.
The strong interannual and decadal variability in chl and PP masks any climate change-driven trend that may be present in the current satellite dataset. This effect has been noted previously in studies that examined the satellite ocean colour record for evidence of global warming (e.g. Martinez et al., 2009) and in modelling studies that investigated the timescales over which the climate change response exceeds the natural variability. Boyd et al. (2008) concluded that global warming induced changes in mixed layer depth in the Southern Ocean could not be separated from the natural variability until ∼ 2040; and Bopp et al. (2001) found that 30 to 60 years of data are necessary to detect climate change signals in modelled export production. The time scales for trend detection in chl and PP found in our analysis are consistent with these studies.  Our analysis of future model simulations suggests that ∼ 40 years of data are needed to distinguish a climate changedriven trend from natural variability. This conclusion depends on the ability of the models to simulate both natural variability and the biological response to global warming conditions. The models do well at simulating the contemporary variability in chl, PP and oligotrophic gyre size (Figs. 2,3 and 4). Confidence in the predictions of the response to global warming is lower. Potentially, a model's accuracy under high CO 2 conditions could be assessed by validating results against reconstructions of past marine biogeochemical conditions from sedimentary records. For example, an earlier version of the IPSL model was successfully evaluated against glacial-interglacial changes using a global compilation of paleoceanographic indicators from marine sediments (Bopp et al., 2003). In addition to the problem of validating simulations of future conditions, there are also some potentially climate-sensitive biological processes that the models do not represent, such as the complete spectrum of phytoplankton species, zooplankton and higher trophic level dynamics, or the evolution or acclimation of primary producers to changing conditions. There are potentially large (and mostly unquantifiable) uncertainties in the models' predictions of future conditions. Clearly, more data are needed to continue testing and validating biogeochemical models in order to improve confidence in the predictions. It could be that a climate change-driven trend in PP or chl will be detectable considerably sooner than the models suggest, particularly as the global CO 2 emissions growth rate had exceeded the worst case scenario used in the IPCC reports by 2007 (Raupach et al., 2007), although CO 2 emissions reduced slightly in 2009 due to the global economic crisis (Le Quéré et al., 2009). In addition, other indicators of the biological response to climate change may be more rapidly detectable than the change in PP or chl, such as shifts in biome boundaries (e.g. Sarmiento et al., 2004) or changes in phenology (Edwards and Richardson, 2004). As demonstrated by our analysis and others (e.g. Chavez et al., 2003;Behrenfeld et al., 2006;Henson and Thomas, 2007), the magnitude of interannual to decadal changes in physical forcing can be large and result in substantial yearto-year variability in productivity. On the other hand, the models suggest that global climate change may result in more gradual changes in conditions, potentially allowing time for phytoplankton populations to adapt or acclimate. If ecosystems are very plastic, there may be only small changes in the phytoplankton community due to the resident populations' ability to adapt to changing conditions over many years or decades (Boyd et al., 2008). Alternatively, a new ecosystem structure may develop as conditions at a particular location change (e.g. Boyd and Doney, 2002;Bopp et al., 2005). However, rather than a gradual change, ocean ecosystems may instead reach a "tipping point" and undergo rapid alterations, such as observed in regime shifts. For example, the 1976/77 North Pacific shift saw basin-scale alteration of the entire ecosystem, from phytoplankton to fish (e.g. Francis and Hare, 1994;deYoung et al., 2008;Alheit, 2009). These regime shifts may pose difficulties for accurately estimating satellite PP derived from empirical algorithms, as used here. In the tropical Pacific for example, Friedrichs et al. (2009) demonstrated that satellite PP models successfully reproduced in situ PP in the 1990s, but were much less successful in the 1980s. This possibility points to the necessity of understanding the mechanisms of present day variability in ocean productivity -not only might it provide an indication of the ecosystem response to future changes, but it may also aid in separating natural variability from the global climate change trend. For example, if one suspected that the El Niño-Southern Oscillation was a dominant source of the decadal variability evident in the SeaWiFS data (as shown in e.g. Behrenfeld et al., 2006;, one could add an El Niño index term to (Eq. 1), assuming a linear response is appropriate. This could assist in separating the decadal variability from the trend and permit even a trend of small magnitude, relative to the variability, to be examined. However, although progress has been made in understanding the relationships between contemporary natural variability and ecosystems (e.g. Behrenfeld et al., 2006;Martinez et al., 2009), it is not yet clear whether these will hold under global warming conditions. The currently observed relationships may prove to be an analogue of the future response to climate change; alternatively future changes in productivity may not map onto contemporary modes of variability, and the system will undergo unpredictable changes (Stone et al. (2001) provides a review of both arguments). Finally, the linear fit used here likely represents an upper limit on the length of time series required to detect a trend. With more sophisticated analyses, such as inclusion of spatial patterns via EOF or optimal fingerprint analysis (e.g. Hasselmann, 1993), or Bayesian methods to detect changes in the phase of the seasonal cycle (e.g. Dose and Manzel, 2004), we may be able to more rapidly detect climate change-driven trends in chl or PP.
All of these considerations point to the absolute necessity of continued global monitoring of ocean productivity. Climate change will almost certainly have a significant impact on ocean ecosystems, but it will be difficult to distinguish natural variability from a global warming trend without a substantially longer time series of data. The 10-plus years of ocean colour data currently available are not sufficient. Unfortunately, SeaWiFS and MODIS-Aqua, the two US ocean colour satellites and primary sources of data for the research community world-wide, are both well past their operational lifetimes, and there could potentially be a long wait before the next ocean colour instrument with similar capabilities is launched. Ocean colour missions are currently underway or planned outside the US, particularly by India and the European Space Agency (ESA). ESA launched the MERIS ocean colour instrument in 2002 and has supported a programme to merge MERIS, MODIS and SeaWiFS data to construct a consistent ocean colour record (GlobColour; www.globcolour.info). ESA also plans to launch an ocean colour sensor on Sentinel-3 in 2013, and India has recently launched OceanSat2 which has ocean colour capabilities. However, restricted routine access to data and poorly characterised imaging capabilities have limited the use of non-US ocean colour data in the past. Any potential gap in the time series of ocean colour data will severely compromise our ability to detect and quantify ocean biology's response to global climate change.
The possibility of an imminent gap in ocean colour data has led to the proposal of alternative monitoring strategies. The use of "sentinel sites" -point locations where comprehensive, regular sampling is carried out and which are intended to be representative of large ecological provinceshas been suggested as a strategy for detecting the biological response to climate change. The substantial spatial variability revealed by this analysis suggests however that time series stations alone are unlikely to be an optimal strategy and instead a global observing system is necessary to detect the PP or chl response to global climate change. Current ocean colour satellites are limited to measuring surface properties, but changes will occur throughout the water column, altering plankton community composition and trophic dynamics. Therefore, an integrated observing strategy consisting of satellites, time series stations, gliders, floats and moorings will be necessary to detect the full suite of biological responses to global warming.

Trend detection
We provide an abbreviated derivation of (Eq. 3) here. The interested reader is referred to Appendix 3 of Weatherhead et al. (1998) for the full derivation. The unexplained portion of the data after fitting a trend (Eq. 1), N t , is assumed to be autoregressive, so that N t = N t1 + ε t , where ε t is white noise (zero mean and variance σ 2 ε ). The variance of the noise N t is related to the variance of the white noise process as σ 2 N = σ 2 ε /(1 − φ 2 ).
Biogeosciences, 7, 621-640, 2010 www.biogeosciences.net/7/621/2010/ The estimate of the trend, ω in (Eq. 1), has a standard deviation associated with it, σ ω = √ Var(ω). The exact form of σ ω is given as Eq. A5 in Weatherhead et al. (1998). It simplifies to: where T = 12n denotes the number of months of data. Therefore, The commonly used rule is adopted, that a real trend is indicated at the 95% confidence level if |ω/σ ω | > 2, i.e. the trend is twice the standard deviation, z > 2 − |ω/σ ω |. From standard normal tables, z = −1.3 for a probability of detection of at least 90%, therefore |ω/σ ω | > 3.3 (Tiao et al., 1990). The minimum number of years to detect a trend, n * , is thus (rearranging Eq. A2): The derivation of the additional time needed to detect a trend if an interruption is present, n * * (Eq. 4), is outside the scope of this paper, and so the interested reader is referred to Appendix 3, Eq. A4 in Weatherhead et al. (1998).