Assessment of time of emergence of anthropogenic deoxygenation and warming: insights from a CESM simulation from 850 to 2100 CE

. Marine deoxygenation and anthropogenic ocean warming are observed and projected to intensify in the future. These changes potentially impact the functions and services of marine ecosystems. A key question is whether marine ecosystems are already or will soon be exposed to environmental conditions not experienced during the last millennium. Using a forced simulation with the Community Earth System Model (CESM) over the period 850 to 2100, we ﬁnd that anthropogenic deoxygenation and warming in the thermocline exceeded natural variability in, respectively, 60 % and 90 % of total ocean area. Control simulations are typically used to estimate the pre-industrial variability level. However, the natural variability of oxygen (O 2 ) and temperature ( T ) inferred from the last millennium period is system-atically larger than the internal variability simulated in the corresponding control simulation. This renders natural variability from control simulations to be biased towards low estimates. Here, natural variability is assessed from the last millennium period (850–1800 CE), thus considering the response to forcing from explosive volcanic eruptions, solar irradiance and greenhouse gases in addition to internal, chaotic variability. Results suggest that in the tropical thermocline, where biological and solubility-driven O 2 changes counter-act each other, anthropogenic changes in apparent oxygen utilisation (AOU) and in O 2 solubility (O 2 , sol ) are detectable earlier than O 2 changes. Both natural variability and change in AOU are predominantly driven by variations in with a smaller for They exceeds

Abstract. Marine deoxygenation and anthropogenic ocean warming are observed and projected to intensify in the future. These changes potentially impact the functions and services of marine ecosystems. A key question is whether marine ecosystems are already or will soon be exposed to environmental conditions not experienced during the last millennium. Using a forced simulation with the Community Earth System Model (CESM) over the period 850 to 2100, we find that anthropogenic deoxygenation and warming in the thermocline exceeded natural variability in, respectively, 60 % and 90 % of total ocean area. Control simulations are typically used to estimate the pre-industrial variability level. However, the natural variability of oxygen (O 2 ) and temperature (T ) inferred from the last millennium period is systematically larger than the internal variability simulated in the corresponding control simulation. This renders natural variability from control simulations to be biased towards low estimates. Here, natural variability is assessed from the last millennium period (850-1800 CE), thus considering the response to forcing from explosive volcanic eruptions, solar irradiance and greenhouse gases in addition to internal, chaotic variability. Results suggest that in the tropical thermocline, where biological and solubility-driven O 2 changes counteract each other, anthropogenic changes in apparent oxygen utilisation (AOU) and in O 2 solubility (O 2,sol ) are detectable earlier than O 2 changes. Both natural variability and change in AOU are predominantly driven by variations in circulation with a smaller role for productivity. By the end of the 21st century, ventilation becomes more vigorous in the tropical thermocline, whereas ideal age in deep waters increases by more than 200 years relative to the pre-industrial period. Different methodological choices are compared and the time for an anthropogenic signal to emerge (ToE) is earlier in many thermocline regions when using variability from a shorter period, from the control simulation or estimates from the industrial period instead of the variability from the last millennium. Our results highlight that published methods may lead to deviations in ToE estimates, calling for a careful quantification of variability. They also highlight that realised anthropogenic change exceeds natural variations in many regions.

Introduction
Oceanic oxygen O 2 concentrations have been observed to decrease over the past 50 years (e.g. Stramma et al., 2008;Helm et al., 2011;Ito et al., 2017;Schmidtko et al., 2017) and are projected to further decline under anthropogenic climate change (Sarmiento et al., 1998;Plattner et al., 2001;Keeling and Garcia, 2002;Shaffer et al., 2009;Cocco et al., 2013;Battaglia and Joos, 2018). This deoxygenation, in combination with other stressors Cocco et al., 2013), such as ocean warming (IPCC, 2013), poses major risks for the function of marine ecosystems and the services they provide (Gattuso et al., 2015;Mislan et al., 2017;Breitburg et al., 2018). The natural climate variability of the recent past provides a range of "familiar" climate conditions to which many ecological and socio-economic systems are adapted. Forcing environmental conditions outside the Published by Copernicus Publications on behalf of the European Geosciences Union. source: https://doi.org/10.7892/boris.135333 | downloaded: 4.6.2020 bounds of recent natural variability may trigger adverse impacts (Pörtner et al., 2014). It is still an open question to which extent marine species and ecosystems are already confronted with "unfamiliar" environmental and climatic conditions not encountered during the last millennium. A difficulty arises from the need to quantify natural variability that stems both from the partly chaotic internal variability of the climate system (Lorenz, 1963;Hasselmann, 1976) and from natural external forcing factors including variations in solar energy output and explosive volcanic eruptions (e.g. Jungclaus et al., 2017). Studies that address the natural variability of marine O 2 and temperature during the recent millennium and that compare this variability with anthropogenic change are missing.
Studies that address the ToE for temperature in the thermocline are nevertheless still missing. This is a gap as the observed and projected warming in the thermocline affects the physiology of fish and their habitat distribution in addition to changes in O 2 (Pörtner et al., 2014). A limited number of studies addressed the ToE for O 2 in the thermocline, often in combination for other tracers (Rodgers et al., 2015;Frölicher et al., 2016;Henson et al., 2016;Henson et al., 2017;Long et al., 2016). The results demonstrate a wide range of ToE associated with the range of different methods used.
The method applied for estimating ToE depends on the scientific questions. Most of the earlier studies on marine O 2 are directed towards the detection of the current anthropogenic trend by using modern measurements systems and, closely related, to quantify the uncertainty in projections arising from natural variability (Rodgers et al., 2015;Frölicher et al., 2016;Henson et al., 2016;Long et al., 2016). Studies that employ large model ensembles (Deser et al., 2014) highlight in particular the large contribution of internal natural variability to the spread in projected trends (Rodgers et al., 2015). Alternatively, Henson et al. (2017) address the question of when ecosystems are exposed to conditions outside the range of previously experienced seasonal variability, and hence ToE is estimated relative to the pre-industrial period.
ToE studies employ a wide range of different methods. For example, the anthropogenic signal is computed as a linear trend over a few decades (Keller et al., 2014;Rodgers et al., 2015) or a linear trend over the industrial period and the 21st century (Henson et al., 2017) or by a polynomial fit to simulated data (Carter et al., 2016). Even a wider range of approaches is used to estimate the variability or noise. It can be taken as the standard deviation (SD) of annual extrema from a control simulation (Henson et al., 2017) or SD remaining after removing anthropogenic trends (Keller et al., 2014;Henson et al., 2016). Others used SD from a model ensemble for the 1920-1950 period  or added the SD of annual values and monthly values in addition to estimation of measurement uncertainty (Carter et al., 2016). The amplitude of the simulated pre-industrial annual cycle (Friedrich et al., 2012) and the extrema simulated over the historical period (Mora et al., 2013) were also applied as measures of noise. In these studies, decadal-scale anthropogenic change is compared with variability on the annual or seasonal scale. More recently, outputs from a large ensemble of 21st century simulations were used to estimate mean trends and the standard deviation in the projected trends in O 2 (Rodgers et al., 2015;Frölicher et al., 2016). This approach enables the characterisation of anthropogenically forced trends and the uncertainty in future trends due to internal variability on the same timescale. A drawback is that variability in decadal or, even, multi-decadal trends is difficult to estimate from existing measurements, as these cover a short time period. It is therefore difficult to validate the results.
Most of the earlier studies did not consider variability from natural external forcing (e.g. Rodgers et al., 2015;Frölicher et al., 2016;Henson et al., 2017) or only through indirect methods (e.g. Keller et al., 2014;Henson et al., 2016). Indeed, forcing from explosive volcanic eruptions  and solar irradiance variations (Muscheler et al., 2007) are, by design, not included in model control simulations. Also, the indirect method to estimate natural variability using the residual variability from the simulation long-term trend excludes the multi-decadal natural variability. These shortcomings may bias earlier estimates of natural variability systematically low. As a consequence, the time when the anthropogenic signal leaves the bounds of natural variations would also be biased towards early emergence.
A range of approaches suggests substantial variations in physical and biogeochemical variables in response to the volcanic eruptions and solar irradiance variations (Mignot et al., 2011;Schmidt et al., 2011;Jungclaus et al., 2017). These include analyses of the last millennium using climate reconstructions (McGregor et al., 2015;PAGES 2k Consortium et al., 2015, 2017, climate simulations (Crowley, 2000;Ammann et al., 2007;Fernández-Donado et al., 2013;Camenisch et al., 2016) and the few existing Earth system model (ESM) simulations with enabled carbon and biogeochemical cycles Lehner et al., 2015;Brovkin et al., 2010;Chikamoto et al., 2016a). Regarding biogeochemical cycles, a substantial role of natural forced variability is also found in simulations with and without volcanic forcing (Frölicher et al., 2009(Frölicher et al., , 2011. Particularly, volcanic eruptions induce significant interannual and decadal variability in surface O 2 , which gradually penetrates the ocean's top 500 m and persists for several years (Frölicher et al., 2009). These forced variations add to the internal natural O 2 variations. In conclusion, the available evidence suggests that forced natural variability should not be ignored when comparing the relative importance of anthropogenic trends versus natural variability.
The goals of this study are to quantify when anthropogenic marine deoxygenation and warming leave the bound of natural variability of the last millennium in the thermocline (ToE). Moreover, we estimate the relative role of natural forced and internal variability in marine O 2 and temperature variations. A further goal is to document the influence of different methodological choices on estimates of ToE. We exploit results from one of the few available ESM simulations with active marine carbon and O 2 cycle that covers the entire last millennium, the industrial period and the 21st century . We further document the variability of apparent oxygen utilisation, the solubility-driven O 2 change and their covariance as well as variability in ideal age and export production of particulate organic carbon. This allows us to discuss the role of solubility, biological export and circulation for O 2 variability and, similarly, for the anthropogenic signal. The model used for this study is the Community Earth System Model version 1.0 (CESM1), released in 2010. It is a fully coupled state-of-the-art Earth system model (Hurrell et al., 2013). This version of the model was used in the Coupled Model Intercomparison Project (CMIP5). Its physics originates from the Community Climate System Model (CCSM4; Gent et al., 2011), which includes modules for the atmosphere, the land, the sea ice and the ocean, all coupled by a flux coupler. As compared with CCSM4, CESM1 additionally includes a fully interactive carbon cycle between the atmosphere, ocean and land modules. The atmospheric module is the Community Atmosphere Model version 4 (CAM4; Neale et al., 2010), with a horizontal resolution of 1.25 • × 0.9 • and 26 vertical levels. CAM4 provides an interactive coupled biogeochemistry module (CAM-CHEM). The land module is the Community Land Model version 4 (CLM4; Lawrence et al., 2012). It operates on the same horizontal grid as the atmospheric component. The land surface is represented as a hierarchy of subgrid types, includ-ing glacier, lake, wetland, urban and vegetated land units. The ocean is simulated by POP2 (Parallel Ocean Program version 2; Smith et al., 2010;Danabasoglu et al., 2011), with 60 vertical levels. The horizontal resolution varies around 1 • . It is higher at low latitudes (around the Equator) and around Greenland to where the North Pole is displaced in order to avoid singularity problems in the ocean model equations. Note that, for convenience, the global maps shown here are regridded using a regular grid. The global circulation model (POP2) also includes a water age tracer (ideal age). It is set to zero at the surface and ages every day in the ocean interior. The sea-ice component is the Community Ice Code (CICE4; Hunke et al., 2010). It operates on the same horizontal resolution as the ocean module.
The Biogeochemical Elemental Cycling model (BEC; Moore et al., 2002Moore et al., , 2004 is implemented in POP2. It is built on traditional phytoplankton-zooplankton-detritus food web models . It carries three different phytoplankton types: diatoms, diazotrophs and small phytoplankton. The photosynthesis and associated production rate of oxygen depend on the phytoplankton type, the solar irradiance and the nutrient-limitation terms (Cullen, 1990). The nutrient-limitation terms are scaled by Redfield ratios, which represent the nutrient assimilation type; C : O varies depending on whether NH 3 or NH + 4 is assimilated or N fixed (nitrogen fixation; only for diazotrophs). At the surface, the rate of air-sea oxygen transfer depends on the modelled wind speed, the temperature-dependent Schmidt number (Wanninkhof, 1992) and the air-sea partial pressure difference in O 2 . The solubility component of O 2 (O 2,sol ) is parameterised as a function of temperature (T ) and salinity (S) (Garcia and Gordon, 1992). It is defined as the O 2 concentration in equilibrium with an atmosphere of standard composition, fully saturated with water and with a total pressure of 1 atm. The oxygen content is homogenised within the mixed layer. The typical timescale to equilibrate the oxygen concentration in the mixed layer with the atmosphere by gas exchange is about 1 month.
The modelled consumption of oxygen occurs through remineralisation of organic material, respiration by zooplankton and grazing of the phytoplankton. The biomass of dead phytoplankton is distributed among dissolved and particulate organic and inorganic carbon and nutrient pools. The distribution varies by the type of plankton and the type of mortality (Moore et al., 2004). Aggregated biomass is routed to the sinking particulate organic matter (POM) pools and sinks at a rate of 20.0 m d −1 . Carbon export and remineralisation are following Armstrong et al. (2001). Remineralisation parameters of the detrital pools are set with the condition that [O 2 ] 4 mmol m −3 and a temperature-dependent function. The length scale of remineralisation for the sinking POM pool varies from 200 to 1000 m (Moore et al., 2002). Organic material reaching the ocean floor is remineralised instantaneously; i.e. no sediment module is included.

Description of the simulations
This study uses results from a forced, transient simulation spanning from 850 to 2100 CE (Common Era) and from a corresponding control simulation, both performed with CESM1.0.1. The experimental setup is described in detail by Lehner et al. (2015), and further results of these runs are described by PAGES 2k-PMIP3 group (2015), Keller et al. (2015), Camenisch et al. (2016) and Chikamoto et al. (2016b). The reference simulation (CTRL) branched from the CMIP5 CCSM4 (Gent et al., 2011) pre-industrial control and ran for 258 years with the 850 CE external forcing set to allow a spinup. After reaching the surface equilibrium, the forced simulation branched off. The CTRL was continued for another 518 years. The transient forcing largely follows the protocol of the Paleoclimate and Modelling Intercomparison Project 3 (Schmidt et al., 2011), applying reconstructed variations of the volcanic forcing (Gao et al., 2008), land use changes (Pongratz et al., 2008;Hurtt et al., 2011) and fossil fuel emissions (1750 to 2005 CE, following Andres et al., 2012). The total solar irradiance is taken from the reconstruction by Vieira and Solanki (2010); the original curve is scaled to have an amplitude change from the Maunder Minimum to present day of 0.2 % rather than 0.1 %, consistent with Bard et al. (2000). Over the period 2005-2100 CE, the Representative Concentration Pathway RCP8.5 (Moss et al., 2010) is used (see Fig. A1a in Appendix A for an overview of the forcings).

Model evaluation
We briefly compare model results for O 2 (average between 1986 and 2005) to observations . The reader is referred to Doney et al. (2006) and Najjar et al. (2007) for a more comprehensive evaluation of CESM. The model reproduces the main features of the O 2 distribution given by the World Ocean Atlas 2013  in the thermocline (here defined as the layer between 200 and 600 m; Fig. A2). O 2 concentrations are generally high at these intermediate depths in the midlatitudes and high latitudes of the Southern Hemisphere as well as in the midlatitudes of the North Pacific and Atlantic basins. Both the model and the World Ocean Atlas show low concentrations in the equatorial thermocline and in the northern North Pacific thermocline (Moore et al., 2013). The model simulates too widely expanded oxygen minimum zones (OMZs, defined here as areas where the oxygen concentration is below 20 mmol m −3 ; magenta contours) in the eastern Pacific, Atlantic and Indian oceans. Similar biases have been identified in other models and attributed to biases in the production and remineralisation of particulate organic matter and to deficiencies in the representation of the equatorial undercurrent (Cocco et al., 2013;Brandt et al., 2015;Cabré et al., 2015;Oschlies et al., 2017).
The evaluation of the modelled variability remains difficult as observational data are sparse. Furthermore, there is an inherent mismatch in the spatial scale between local measurements and the model resolution, which is of the order of 100 km. The modelled data are thus spatially averaged as compared to the observations and their variability does not explicitly take into account mesoscale eddies and other small-scale processes . Figure A3 compares the reconstructed variability and trends in O 2 over recent decades at 300 m depths and for different basins  with the model results. CESM-simulated historical O 2 concentrations show multidecadal variability, although with a much smaller amplitude compared to the observations. Long et al. (2016) have compared historical time series from the Hawaii Ocean Time-Series (HOT) station, the Ocean Station Papa (OSP) and the Bermuda Ocean Time Series (BATS) to the CESM large ensemble. They show that modelled variability in annually averaged O 2 is substantially smaller than observed. Taken at face value, beyond the limitations described above, these comparisons suggest that variability in CESM may be biased towards low estimates. This would tend to bias ToE towards early emergence.
Regarding the temperature mean distribution (Fig. A2d), the model is able to simulate the isopycnal structure at intermediate depths reported by Locarnini et al. (2013): cold water masses at the poles; the centre of the subtropical gyres shows higher temperatures than the surroundings (Fig. A2b). Nevertheless, the model simulates colder water in the Southern Ocean and in the equatorial Pacific band, and warmer water in the eastern North Atlantic (isotherm 14 • C) in the thermocline compared to the observations.

Correcting for model drift and removing millennial trends
The upper 500 m of the ocean are generally equilibrated with the forcing at the start of the simulations. This results in negligible linear drifts of T (−2.8 × 10 −2 • C per century), S (−5.0 × 10 −3 ‰ per century), O 2 (0.22 mmol m −3 per century) and other properties in the CTRL (not shown). The drift, however, increases with depth. It starts to become noticeable compared to the variability around 500 m depth. In this study, we focus on variability within the thermocline (200-600 m) where drifts are still small and do not affect conclusions (Fig. A1c). Nevertheless, we use the results from the CTRL to estimate and correct for model drifts in all the studied variables. It appears that model drift diminishes by the end of the control simulation. Hence, an exponential curve (a(1 − e −|b|t ) + c) was fitted to the annual outputs of the CTRL simulation at each grid cell and for each variable of interest, and extended to 2100 CE. The fit was then re-moved from the original output of the CTRL and the forced simulation ( Fig. A1b and c; solid purple curve). Yet, in the last millennium simulation, the drift-corrected signal shows residual millennial-scale trends in the subsurface waters and the deep ocean. We do not exactly know the origin of the millennial-scale trends in the deep ocean, but we hypothesise that these trends are a slow and accumulative response to the volcanic forcing. In this study, we are primarily interested in investigating variability on timescales ranging from years to many centuries in the upper ocean and to detect anthropogenic trends emerging from this variability. Slow natural trends influence the computation of metrics such as the standard deviation around a mean value and ToE. Although they are small in the surface ocean and the thermocline, we removed them from the forced simulation by fitting a linear trend to the model output over the period 850-1800 CE at each grid cell and for each variable of interest (Fig. A1b, c; red solid curve). Figure A1b and c illustrate the computation of drift-and trend-corrected fields from the original outputs for the globally averaged ocean and the thermocline depth range (200-600 m). We note that these corrections have a small influence on the results in the upper ocean, our main study area, and do not affect our main conclusions.

Separation of O 2 concentration into components
Following earlier studies (e.g. Frölicher et al., 2009;Bopp et al., 2017;Ito et al., 2017), marine O 2 concentration can be expressed as the sum of two components: O 2 solubility (O 2,sol ) and apparent oxygen utilisation (AOU). O 2,sol is approximated by the saturation concentration as described in Sect. 2.1.1. It depends non-linearly on T and S, but the variations in O 2,sol are mainly driven by the variations in temperature. AOU is computed as a residual from modelled O 2 and diagnosed O 2,sol : (1) AOU predominantly reflects O 2 respiration due to remineralisation of organic material in the model. It depends on the availability of dead organic matter and on water mass age. We use the ideal age tracer to estimate water mass age which is dictated by circulation, mixing and convection. Production of particulate organic carbon (POC) is used as a diagnostic for available dead organic matter. For completeness, we note that the diagnosed AOU component is additionally influenced by deficiencies in the saturation concentration O 2,sol to represent the solubility component. These arise due to the mixing of different source waters given the non-linear relationship between solubility and T , S as well as by incomplete air-sea surface equilibration of these source waters.

Estimation of the time of emergence
We apply the ToE concept (e.g. Hawkins and Sutton, 2012) to detect anthropogenic change. ToE is the time when changes due to anthropogenic forcing in O 2 , temperature and related variables emerge from natural variations (Eq. 2). Driftand trend-corrected concentrations are expressed as annual anomalies relative to a pre-industrial reference period spanning from 1720 to 1800 CE . The natural variability or noise, N, is computed as 1 SD of the anomalies of each variable of interest over the period from 850 to 1800 CE in the forced simulation and for each water volume. When considering spatially averaged variables, their standard deviation is computed from the spatially averaged values, rather than by taking the averaged deviations of the corresponding individual grid cells. The low-frequency climate change, S, is diagnosed as the spline-fitted (Enting, 1987) anomalies using a cut-off period of 40 years, from the year 1800 in order to remove short-term variations over the industrial period. Similarly, as for N , S is computed on the spatially averaged values when relevant. The ToE is determined as the time when the signal S becomes for the first time larger than twice the noise N (Eq. 2; Fig. A4).
The threshold of 2 SD allows the distinction of the signal from the variability with a confidence interval of 95.45 %; this confidence level is selected following many earlier studies (e.g. Christensen et al., 2007).

Evolution for globally averaged perturbations in ocean temperature and oxygen
First, we discuss variability and trends for averaged temperature (T ) and dissolved oxygen (O 2 ) at the surface, in the thermocline (200-600 m) and for the whole ocean (Figs. 1 and A5). The magnitude of variability and anthropogenic trends is larger for the surface ocean and the thermocline than for the deep ocean. Globally averaged T and O 2 show near-exponential evolution at all depths ( Fig. A5, right) during the industrial period and the 21st century in response to the prescribed anthropogenic forcing. Globally averaged T increases by 3.7, 2.0 and 0.7 • C from the pre-industrial reference period (1720-1800 CE) until 2100 in the surface layer, the thermocline and the whole ocean. Regarding O 2 , the anthropogenic perturbation leads to a O 2 decrease by about 15 mmol m −3 (−6 %) in the spatially averaged surface ocean, by 16 mmol m −3 (−11 %) in the thermocline and by 10.5 mmol m −3 (−5 %) when averaged over the whole ocean. The anthropogenic trends are qualitatively consistent with earlier observational (Keeling and Garcia, 2002;Keeling et al., 2010;Bakun, 2017;Ito et al., 2017;Schmidtko et al., 2017) and modelling studies Cocco et al., 2013;IPCC, 2013;Bopp et al., 2017). Last millennium variability in averaged surface ocean T and O 2 appears to be dominated by interannual to decadal timescales, whereas large variations on multi-decadal and centennial timescales are simulated for the thermocline and the whole ocean (Fig. 1a, left). During the period 850 to 1800 CE, simulated global mean sea surface temperature (SST) varies generally within an interval of about ±0.3 • C and global mean surface O 2 within ±1.2 mmol m −3 relative to the reference period (1720-1800 CE). Large global mean SST changes of up to 2 • C cooling are modelled after large explosive volcanic eruptions. These are accompanied by large positive anomalies in surface mean O 2 of up to 7 mmol m −3 . The large post-eruption anomalies decay within a few years to decades in the surface ocean. In the thermocline, annual T varies between −0.15 and +0.1 • C, and O 2 between −0.7 and 4 mmol m −3 relative to the reference period. These variations occur on multi-decadal to centennial timescales. The imprint of large explosive eruptions is visible as abrupt, sudden perturbations (e.g. in the year 1258), followed by a long-term recovery. Variability for the whole ocean is qualitatively similar to the thermocline, but peak variations are an order of magnitude smaller than for the thermocline depth range. . These figures show well-defined patterns, with zones of early emergence (before 2020) and zones of late (2020 < ToE < 2099) or no emergence (by 2099) of the anthropogenic signal. In general, the human-induced O 2 changes in the thermocline emerge early (before 2020) in high-and midlatitudes, whereas they emerge late or not at all in the tropics (Fig. 2a). Late or no emergence until 2100 is also found in the subtropical Atlantic, western South Pacific and Indian oceans. In contrast, a rather early emergence is simulated in the eastern equatorial Atlantic and Indian Ocean subtropical gyre. These patterns are consistent with the results from Long et al. (2016) and Henson et al. (2017).
In the case of temperature (Fig. 2b), the anthropogenic signal emerges generally before the end of the 21st century in the thermocline. Exceptions are small areas in the North Atlantic and in the western North Pacific. In contrast to O 2 , early emergence of the anthropogenic warming is simulated in the subtropical Atlantic gyres and in the equatorial regions. Late emergence is simulated in the subpolar North Atlantic and in the large gyres of the Pacific.
Interestingly, the spatial pattern of the difference between ToE of T minus ToE of O 2 resembles the spatial pattern of ToE of O 2 (Fig. 2c). The anthropogenic T signal emerges in large regions much earlier than the O 2 signal in the thermocline (brown areas; Fig. 2c). These regions include the tropics, the midlatitude Atlantic and the thermocline off the coast of South America. Yet, in some areas, the anthropogenic signal emerges first in O 2 . The reasons for these results are analysed in Sect. 3.4. The following section is dedicated to a more in-depth analysis of the signal and the noise that both define ToE.

Natural variability and anthropogenic changes
We analyse the magnitude and spatial feature of the noise (Fig. 3a, b) and of the signal (Fig. 3c, d) to understand why a signal emerges early or late. Natural variability of O 2 in the thermocline (Fig. 3a) ranges from less than 1 mmol m −3 to more than 10 mmol m −3 (SD ± 2.50 mmol m −3 ; 850 to 1800 CE). Variability is small in the core of the O 2 minimum zones. In contrast, variability is generally high at the edge of the major oceanic gyres, including transition zones to O 2 minimum regions. The anthropogenic signal in O 2 remains relatively small in the O 2 minimum zones and the subtropical Atlantic gyres (Fig. 3c). O 2 is projected to increase in the thermocline in the southern subtropical Indian Ocean gyre region and in the tropical Pacific but to decrease in midlatitudes and high latitudes in the thermocline.
By definition, the areas of early emergence (late or no emergence) result from a high (low) signal-to-noise ratio. Locally, a signal may emerge early (late) compared to other regions due to relatively weak (high) variability or a relatively strong (weak) anthropogenic response, or a combination of both. For example, in the region south 30 • S, simulated O 2 concentrations show generally relatively weak natural variability (±2 mmol m −3 ; Fig. 3a) and large anthropogenic O 2 changes (> +12 mmol m −3 by 2100 CE; Fig. 3c) resulting in early ToE. However, in the North Pacific, the early emergence arises from the large anthropogenic signal (−50 mmol m −3 by 2100 CE) despite the intense variability (±10 mmol m −3 ). In the eastern tropical Atlantic, weak changes and low natural variability result in early emergence. On the contrary, in the western tropical Pacific, the anthropogenic O 2 signal has not emerged by 2100 due to the combination of strong natural variability and relatively weak changes.
In general, the temperature signal-to-noise ratio is high in the thermocline, and the emergence of human-induced changes occurs before the end of the 21st century. The thermocline temperature varies naturally by less than ±1 • C (Fig. 3b) and the anthropogenic changes are between +1 and +4 • C by the end of the 21st century (Fig. 3d). However, the subtropical Pacific gyres, the northern tropical Atlantic and the subtropical Indian gyre show slightly more intense natural variations which delay the emergence of the anthropogenic signal. In parts of the western North Pacific and the North Atlantic, temperature variability in the thermocline is high and the anthropogenic changes remain within the range of natural variability.

Sensitivity of ToE to methodological differences
Different methodological choices were applied in earlier studies to estimate ToE for precipitation (Giorgi and Bi, 2009), air surface temperature (Karoly and Wu, 2005;Diffenbaugh and Scherer, 2011;Hawkins and Sutton, 2012), SST, pCO 2 , pH, dissolved inorganic carbon (Keller et al., 2014), primary production or O 2 (Rodgers et al., 2015;Carter et al., 2016;Frölicher et al., 2016;Henson et al., 2016;Henson et al., 2017;Long et al., 2016). These different definitions and methods are used to estimate the noise (or natural variability), the anthropogenic signal and the ToE. There seems to be no consensus on the method to estimate ToE. In the following part, in order to gain confidence in the ToE estimates presented above, the impact of different choices on the estimate of ToE is investigated.

Noise estimated from internal variability of a control simulation
The background noise is estimated in many studies from the temporal SD of a control simulation (CTRL) (Hawkins and Sutton, 2012;Maraun, 2013;Long et al., 2016;Henson et al., 2017). In this way, only the internal chaotic variability is taken into account. Accounting for external natural forcing may enlarge the natural O 2 variability significantly (Table 1)  T . There are many regions where the ratio between the SD from the forced versus the SD from the CTRL simulation is close to 1, indicating that internal and total natural variability are approximately equal. In particular, forced and internal natural variability in O 2 are comparable in most thermocline regions (Fig. 4a). However, there are also regions where natural variability is much larger than internal variability. Indeed, natural variability exceeds internal variability in O 2 by more than 50 % in 3 % of the thermocline, while deviations larger than 20 % are found in 18 % of the thermocline. For the temperature, forced variability exceeds internal variability by more than 20 % over a third of the thermocline and by more than 50 % over 10 % of the thermocline. The relative difference between natural and internal variability is thus often smaller for O 2 than for T in the thermocline. Turning to the surface ocean, natural variability in O 2 and T is substantially larger than internal variability in many low-latitude regions (Fig. 4c, d). Hence, using results from a control simulation to estimate natural variability leads to an underestimation of total natural variability in some specific regions. This affects ToE, as illustrated in Fig. 5. Nevertheless, the results from a control simulation appear to yield a reasonable approximation of simulated natural variability in O 2 and T on the local scale in the thermocline. Regions with the largest differences are located at the edges of the subtropical gyres in the North Pacific and the tropical Atlantic (Fig. 5a, b).

Noise estimated over the industrial period
A direct estimation of total natural variability is often not possible because ESM simulations typically start in the 19th century and apply natural plus anthropogenic forcings. Other studies have therefore estimated natural variability from the residual variability (e.g. Keller et al., 2014;Henson et al., 2016) after removing the anthropogenic response from the model output. The residual variability is then defined as the difference between the annual model output and a low-passfiltered signal. Here, a spline with a cut-off period of 40 years is used to compute the filtered signal. Then, the standard deviation of the residual variability is computed for the period 1850 to 2005 CE. Surprisingly, the SD of O 2 and T in the thermocline as computed from the residual variability is much smaller than that computed from the last millennium (LM) simulation in many regions (Fig. 6a, b). Particularly large differences (100 % of LM SD or more) are found in the Atlantic and high-latitude regions for T and O 2 and in large parts of the Pacific for O 2 . The stronger variability in LM compared to the residual variability likely results from low-frequency variability included in LM but removed by the spline in the residuals. As a consequence, O 2 and T in the thermocline emerge earlier when using SD from the residuals instead of SD from the 850 to 1800 CE period to compute ToE (Fig. 6c, d).
Estimates of noise may further be sensitive to the choice of the period. Because fully coupled millennial-scale simulations are expensive and relatively rare, we compare the SD of T and O 2 in the thermocline in the forced LM simulation for the period 850-1800 CE versus the shorter pre-industrial reference period 1720-1800 CE (Fig. A6a, b). In a large part of the thermocline, SD in T and O 2 are similar (within ±10 %), reflecting a similar estimated natural variability. Differences in SD are, for example, found in regions of the South Pacific, the eastern North Pacific, the Arctic Ocean and for O 2 in the Arabian Sea. For O 2 , substantially earlier ToE values are esti- Table 1. Overview of the standard deviations of T , O 2 , O 2,sol and −AOU, and the corresponding covariance between −AOU and O 2,sol for the global mean ocean, the thermocline (200-600 m) and the surface ocean for the control simulation (CTRL) and the forced simulation (850-1800; LM). SD and covariance (COV) are computed from annually and spatially averaged data.

Standard deviation COV(−AOU, O 2,sol )
Temperature Global ocean CTRL 0.3 × 10 −2 3.6 × 10 −2 4.9 × 10 −2 3.4 × 10 −2 −1.13 × 10 −3 LM 1.1 × 10 −2 16.5 × 10 −2 10.7 × 10 −2 7.2 × 10 −2 5.30 × 10 −3 Thermocline ( mated in large parts of the Pacific, the Arctic and the Southern Ocean when using the pre-industrial period to approximate the natural variability (Fig. A6a). However, changes in oxygen are estimated to appear later in some parts of the equatorial Atlantic, in the Arabian Sea and in a few grid cells in the Pacific and Arctic. Similarly, earlier ToE values for T are found, for example, in large parts of the Pacific when using the variability from the pre-industrial period (Fig. A6b).
The results suggest that a century-long period of the forced simulation may not yield robust results for variability and ToE in all regions.

Apparent oxygen utilisation, O 2 solubility, ventilation and organic matter cycling
We analyse AOU and the O 2 solubility component (O 2,sol ) in order to gain insight into the processes underlying O 2 variations. O 2,sol variations are driven by SST changes and AOU variations mainly reflect the imprint of changes in water mass ventilation and in the remineralisation of organic matter. Here, ventilation is diagnosed by an ideal age tracer, and changes in remineralisation of organic matter are linked to the production of organic matter.

Natural variability
Changes in AOU largely explain most of the pre-industrial variations in O 2 with a generally small role for O 2,sol both at the global scale and on average in the thermocline (Figs. 1a, b and A5a  (Fig. A5a). An even larger relative contribution of O 2,sol is simulated for the spatially averaged thermocline signal (Fig. 1b). Changes in AOU in the thermocline remain small over the industrial period due to the compensation of regionally opposed responses, as described below. The O 2 variations at the surface are, unsurprisingly, primarily caused by changes in solubility (O 2,sol ; Fig. A5b). Quantification of the natural variability of O 2 can be split into the contributions from individual components given in ( Var() stands for the variance of the variable and equals the square of SD. Both metrics are positive by definition. The sign of the covariance between O 2,sol and −AOU(COV(−AOU, O 2,sol )) indicates whether the contributions from O 2,sol and −AOU enhance or partly cancel each other. If COV(−AOU, O 2,sol ) is negative, for example, the resulting Var(O 2 ) will be smaller than the sum of Var(O 2,sol ) and Var(−AOU) and the two components partly cancel each other. On the contrary, they have an additive effect when their covariance is positive. Table 1 gathers the SD of O 2 and its components and the corresponding covariance in the global averaged surface ocean, thermocline and world ocean for the control simulation and the forced simulation (850-1800 CE). COV(−AOU, O 2,sol ) is negative in CTRL in all three water bodies. Hence, O 2,sol and −AOU partly compensate for each other in CTRL. COV(−AOU, O 2,sol ) is also negative in LM for the surface ocean. This implies that changes in solubility are partly compensated by changes in air-sea O 2 disequilibrium. In contrast, COV(−AOU, O 2,sol ) is positive in the LM simulation for the global ocean and the spatially averaged thermocline during the pre-industrial period. Hence, changes in O 2,sol and −AOU are positively correlated and reinforce each other on average. In conclusion, the natural forcing leads to a positive correlation of −AOU and O 2,sol . This statistical analysis is consistent with the discussion on O 2 perturbations above: −AOU and O 2,sol change hand in hand after large volcanic eruptions in the global mean and in the global thermocline (Fig. 1a, b). The variability in CTRL is not only smaller than in LM but apparently also different in terms of underlying physical and biogeochemical mechanisms and in their interactions.
The spatial pattern of the natural variability in O 2 (Fig. 3a) can largely be attributed to −AOU (Fig. 7a). O 2,sol variations have generally a limited impact on O 2 in the thermocline (Fig. 7b). By definition, O 2,sol is largely congruent with the previously discussed pattern of SD(T ) (Fig. 3b). Nevertheless, in the high-variability regions in the western North Pacific and the northern North Atlantic, SD(O 2,sol ) is of the same order of magnitude as SD(−AOU). The latter resembles the pattern of SD for ideal age (Fig. 7d). This suggests that a significant fraction of the variability in −AOU (and thus O 2 ) is driven by changes in circulation and water mass age. Variability in POC production in the surface layer (Fig. 7e), indicative of water column remineralisation of organic material, may also contribute to the variability in −AOU in the thermocline. For example, in the northern North Atlantic, SD in POC production and −AOU is relatively large, while SD in ideal age is low. On the other hand, the large SDs in POC production in parts of the Southern Ocean are not reflected in SD(−AOU). COV(−AOU, O 2,sol ) is strongly negative in some regions with large variability in −AOU (Fig. 7c). These regions, as discussed in Sect. 3.2, are located at the boundaries of major gyres. This suggests that changes in −AOU and O 2,sol partly compensate for each other in these regions. An exception is the boundary region between the subtropical and subpolar gyres in the North Atlantic, where the two components tend to enhance each other and therefore increase O 2 variability.

Anthropogenic change
Next, we address the pattern of anthropogenic changes in O 2,sol ( O 2,sol ) and in −AOU ( (−AOU)) in the thermocline (Figs. 8 and 9). O 2,sol shows a spatially coherent decrease as dictated by the global warming pattern (Fig. 3d). In contrast, (−AOU) shows a strong spatial pattern in the thermocline with positive values in the tropics, the Arctic and subtropical Atlantic and negative values in the midlatitude and high-latitude Pacific as well as the Southern Ocean and the subtropical Indian and the Pacific Ocean. Regional changes in −AOU largely balance each other, explaining the small change in spatially averaged −AOU (Fig. 1b). The anthropogenic increase in O 2 (Fig. 3c) in parts of the tropical thermocline is attributed to the increase in −AOU, partly offset by the decrease in O 2,sol , while the anthropogenic O 2 decrease in the northern Pacific and the Southern Ocean results from a decrease in both −AOU and O 2,sol .
(−AOU) in the thermocline is mainly driven by changes in ventilation and modulated by changes in remineralisation rates (Fig. 8c, d). The increase in ideal age and the increase in POC production (indicative of an increase in remineralisation rate) explain the decrease in −AOU in the Southern Ocean. By contrast, in the western tropical Pacific, the equatorial Indian and the Atlantic oceans, the combination of a decrease in water mass age and in POC production explains the increase in −AOU over the industrial period and the 21st century. In the eastern tropical Pacific and in the North Pacific, the impacts of changes in ventilation are partly mitigated by changes in organic matter remineralisation rate.
Below the thermocline and in the deep ocean, O 2 decreases over the industrial period and the 21st century as both −AOU and O 2,sol decrease (Fig. 9). The decrease in −AOU is again mainly explained by an increase in ideal age. Changes in ideal age in the deep Atlantic, Southern Ocean and Pacific partly exceed 150 years and indicate a general reduction in the deep water mass formation over the industrial period and the 21st century in the simulation.

ToE of O 2 components
Both the variability and the anthropogenic signal of O 2 in the thermocline are driven primarily by changes in −AOU, the former of which is influenced in part by changes in O 2,sol . The question arises whether the signal of −AOU or of O 2,sol emerge earlier than the signal of O 2 . As O 2,sol is a function of T , ToE(O 2,sol ) is very similar to ToE(T ) and we refer to the previous discussion on ToE(T ) (Fig. 2b) and its difference to ToE(O 2 ) (Fig. 2c). Interestingly, the −AOU and O 2 signals seem to emerge around the same time in the thermocline in the Pacific and Indian subtropical gyres (Fig. 10b). But the signal of −AOU emerges later in the thermocline in many regions including the subtropical oceans in the Southern Hemisphere, while it emerges earlier in the tropics and in the subtropical gyres of the Atlantic as well as south of Madagascar. This suggests that anthropogenic changes in −AOU might be detectable earlier in large oceanic regions. However, the specific results may be model-dependent and need to be confirmed by other models or by observations. Earlier emergence of −AOU than O 2 is generally found in regions with a positive change in −AOU and thus in regions where −AOU and O 2,sol partly offset each other. Late emergence in −AOU is found in regions with a small anthropogenic change in −AOU.

Discussion and conclusion
We have analysed the natural variability over the last millennium and anthropogenic trends of marine oxygen (O 2 ) and temperature (T ) in simulations using the Community Earth System Model (CESM). The relative roles of the internal and forced natural variability were quantified. We have also determined the time of emergence (ToE) of T , O 2 and apparent oxygen utilisation (AOU) in the thermocline.
We find that anthropogenic deoxygenation and warming in the thermocline has today already left the bounds of natural variability on the global scale. By 2020, these signals have emerged in over 60 % and 90 % of the thermocline area, respectively (Fig. 11). By the end of this century, these values are approaching 100 % if greenhouse gas emissions continue unabated. This presents an increasing risk for the function and services of marine ecosystems (Pörtner et al., 2014). ToE for O 2 is relatively early in the midlatitude and high-latitude thermocline and late in the tropical ocean and subtropical Atlantic, in agreement with earlier work Henson et al., 2017). Temperature ToE has not been analysed in the thermocline in earlier studies. In CESM, ToE is relatively early for T in the thermocline of eastern boundary systems, the subtropical Atlantic and in large parts of the Southern Ocean but late in many subtropical regions and in large parts of the North Pacific.
A large O 2 decline is simulated by CESM for the future in the North Pacific thermocline, in agreement with other models Cocco et al., 2013). This region is characterised by particularly large changes in ventilation. As an interesting consequence, the anthropogenic O 2 signal is detectable earlier than the anthropogenic temperature signal in large parts of the northern North Pacific. Along the same line, Keller et al. (2015) show that a potential weakening of the El Niño-Southern Oscillation (ENSO) variability is verifiable earlier and more widespread for carbon cycle tracers than for temperature, and Séférian et al. (2014) highlight the multi-year predictability of tropical productivity. This cor-roborates earlier suggestions by Joos et al. (2003) that measurements of O 2 , or, more generally, multi-tracer observations, are critical in detecting or predicting anthropogenic changes.
For a better understanding of these results, we have analysed the natural variability and the anthropogenic response of the respiration (−AOU) and solubility (O 2,sol ) components of O 2 . Variations in AOU dominate variations in O 2 during the last millennium, as well as the local response to anthropogenic emissions in the thermocline. For the globally averaged thermocline, internal variability in AOU and in O 2,sol are negatively correlated and partly offset each other. In contrast, the total natural variability of these two components is positively correlated and thus enforces variability in O 2 in the global thermocline (Table 1). Under human activities, changes in AOU and O 2,sol partly cancel each other in the tropical thermocline and in the subtropical Atlantic Ocean, in accordance with Bopp et al. (2017). Moreover, AOU (thus O 2 ) variability and its signal are explained by changes in water mass ventilation, with a smaller role for changes in biological productivity (Figs. 8 and 9). Strongly reduced ventilation in the northern North Pacific therefore drives the early O 2 emergence. This suggests that weakened ventilation precedes warming in the thermocline in this region.
The comparison between the transient last millennium simulation and the corresponding control simulation shows that both the naturally forced and the internal variability contributed to the simulated climate and biogeochemical variations of the last millennium. The internal variability arises from the inherent and partly chaotic variability of the climate system (Frölicher et al., 2009;Deser et al., 2012;Resplandy et al., 2015). The natural forced climate variability arises from explosive volcanism and from changes in solar irradiance on interannual to centennial timescales (Wanner et al., 2008). The relative importance of forced variability is particularly large when considering large-scale averages (Table 1, Fig. 4) but can also be substantial on the local scale. Our results imply that natural forced variations should not be neglected when comparing century-scale anthropogenic climate change to natural climate variability.
Moreover, we show that ToE is highly sensitive to the method applied for estimating the natural variability. We have estimated variability from a forced simulation over the last millennium (850-1800 CE), over the period 1720 to 1800 CE, and from a control simulation. As expected, using variability from the control and the short period yields, in general, an earlier emergence of the anthropogenic signal than when using variability from the last millennium simulation (Fig. 11). These two estimates of noise do not capture the full natural variability of the last millennium. Yet, differences in estimated ToE and noise are often modest on the local scale (Figs. 2, 4 and 11). In addition, we have tested the use of the "residual" variability of a time series to estimate noise. The long-term trend in the industrial period (1850( -2005 data is removed by a low-pass filter; the standard deviation of the remaining annual anomalies provides then an estimate of the noise. This yields a much smaller variability than estimated from the last millennium output and thus a much earlier ToE (Figs. 6 and 11). Applying a smoothingspline cut-off period of 80 or 100 years instead of 40 years leaves a higher variability but the full last millennium variability is still underestimated (not shown). In conclusion, it might be appropriate, though not ideal, to use the results from a control simulation to estimate natural variability, as last millennium Earth system simulations extended towards the future period are still rare.
Alternatively, large ensemble simulations for the industrial period and the future will become available within CMIP6. Different ensembles including or excluding anthropogenic forcing (Stott et al., 2000) and including or excluding natural forcing may be used to disentangle the individual contributions to trends and variability. Some model centres may also wish to generate large ensemble simulations for the last millennium to study natural variability over the more recent pre-industrial period .
However, there are still some uncertainties in our results, and some are linked to the relatively coarse resolution of the CESM model of the order of 1 • . Larger variability may be found on smaller scales. For example, Long et al. (2016) document that interannual variability from the Hawaii Ocean Time-Series (HOT) station is about a factor of 2 larger than the variability at the same location in CESM. Another source of error is structural model uncertainty. Comparison with observations (Sect. 2.1.3) and multi-model studies shows weaknesses of the current class of Earth system models in simulating the observed O 2 distribution and variability. Projections of anthropogenic O 2 change are particularly uncertain Figure 11. Fraction of the (a) surface ocean and (b) thermocline (200-600 m) where the signal has already emerged in oxygen (blue) and temperature (red) for different definitions of the background noise: (i) naturally forced variability during the last millennium (850-1800 CE, LM; solid), (ii) internal variability (CTRL; dotted), (iii) naturally forced variability during the pre-industrial period (1720-1800 CE, PI; dashed), (iv) residual variability using a low-pass filter (cut-off period of 40 years, residual; dashed dotted).
O 2 changes will likely continue beyond 2100 CE. We find a strong link between changes in O 2 , AOU and ideal age, with a shift to older water mass ages accompanied by a shift to lower O 2 . By 2100, ideal age in the near-bottom waters of the Southern Ocean and the deep Pacific has increased by up to 240 years and O 2 decreased by around 16 to 20 mmol m −3 relative to pre-industrial period. These anomalies are likely to spread further into the deep ocean. A long-term reduction in deep ocean ventilation and O 2 under anthropogenic forcing is consistent with results from Earth system models of intermediate complexity (Schmittner et al., 2008;Battaglia and Joos, 2018). For example, Battaglia and Joos (2018) find a large, transient decline in deep ocean O 2 and in the global O 2 inventory by as much as 40 % in scenarios where radiative forcing is stabilised in 2300 CE. In their simulations, deoxygenation peaks about a thousand years after the forcing stabilisation and new steady-state conditions are established only after 8000 CE. The CESM results also support the notion of long-term deep ocean deoxygenation.
In conclusion, we find that natural radiative forcing arising from explosive volcanism and solar irradiance changes contributes notably to the overall past variability of marine O 2 and temperature. We simulate large and widespread ocean deoxygenation under anthropogenic forcing and suggest that large parts of the thermocline are already experiencing environmental conditions that are outside the range of natural variability of the last millennium.
Data availability. Data are available on request. Figure A4. Illustration of the ToE method. The example is for the thermocline (200-600 m) and grid cells located at 158 • W and 22 • N near Hawaii. The SD of the detrended, annually and vertically (200-600 m) averaged data (blue) for the period 850-1800 CE is used to define the "noise" or bounds of natural variability which is set to ±2 SD (blue area). The annually and vertically averaged data are fitted with a spline using a cut-off period of 40 years (light blue). ToE is the point in time when the spline crosses and leaves the bounds of natural variability; here, ToE is 1984 and indicated by the vertical dashed line. All data are anomalies relative to the pre-industrial period (1720-1800 CE; yellow). Figure A5. Temporal evolution of simulated anomalies in (a) global mean O 2 (dark blue), −AOU (light blue), the O 2 solubility component (dashed black) and T (red) over the last millennium (left) and the historical and future periods (right). The horizontal dashed lines stand for the 2 standard deviation envelopes for O 2 (blue) and T (red) computed over the period 850-1800 CE. (b) Same as (a) but for the surface. The anomalies are relative to the pre-industrial reference period (1720-1800 CE). Figure A6. Ratio of the SDs of the forced simulation computed over the period 850-1800 CE and over the period 1720-1800 CE for (a) O 2 and (b) T . Difference of ToE using the SD of the forced simulation over the period 850-1800 CE minus the ToE using the SD of the same simulation but over the pre-industrial period (1720-1800 CE) for (c) O 2 and (d) T . The magenta contours highlight the ratio equal to 1, i.e. where SDs are equal. SDs are computed from annually and vertically averaged values between 200 and 600 m.