Detecting an external influence on recent changes in oceanic oxygen using an optimal fingerprinting method.

. Ocean deoxygenation has been observed in all major ocean basins over the past 50 yr. Although this signal is largely consistent with oxygen changes expected from anthropogenic climate change, the contribution of external forcing to recent deoxygenation trends relative to natural internal variability is yet to be established. Here we conduct a formal optimal ﬁngerprinting analysis to investigate if external forcing has had a detectable inﬂuence on observed dissolved oxygen concentration ([O 2 ]) changes between ∼ 1970 and ∼ 1992 using simulations from two Earth System Models (MPI-ESM-LR and HadGEM2-ES). We detect a response to external forcing at a 90 % conﬁdence level and ﬁnd that observed [O 2 ] changes are inconsistent with internal variability as simulated by models. This result is robust in the global ocean for depth-averaged (1-D) zonal mean patterns of [O 2 ] change in both models. Further analysis with the MPI-ESM-LR model shows similar positive detection results for depth-resolved (2-D) zonal mean [O 2 ] changes globally and for the Paciﬁc Ocean individually. Observed oxygen changes in the Atlantic Ocean are indistinguishable from natural internal variability. Simulations from both models consistently underestimate the amplitude of historical [O 2 ]


Introduction
The oceanic oxygen inventory is coupled to the climate system via a number of physical and biogeochemical processes making oxygen a useful tracer for detecting changes in the state of the earth system (Joos et al., 2003;Brennan et al., 2008). In particular, changes in temperature, ocean circulation, biological production and respiration, expected in response to global climate change, all exert a major control on dissolved oxygen concentration ([O 2 ]). Open ocean deoxygenation has been recorded in nearly all ocean basins during the second half of the 20th century (Helm et al., 2011) with more acute [O 2 ] decreases in the coastal ocean . Oxygen time series measurements also show historical expansion and intensification of established oxygen minimum zones (OMZs) in the eastern tropical Atlantic and equatorial Pacific since 1960 (Stramma et al., 2008) along with long-term [O 2 ] decreases in the subarctic North Pacific between 1956 and 2006 (Whitney et al., 2007).
Historical deoxygenation has been associated with global climate change, chiefly due to enhanced ocean stratification in a warming climate which increases the ventilation age of downwelling water masses, and augmented by the reduced solubility of dissolved oxygen at higher temperatures (Keeling et al., 2010). These findings are supported by prognostic simulations from a suite of Atmosphere-Ocean General Circulation Models (AOGCMs), which show ventilation-driven reductions in global mean [O 2 ] between 3 and 12 µmol kg −1 O. D. Andrews et al.: Detecting an external influence on recent changes in oceanic oxygen by 2100 (Sarmiento et al., 1998;Bopp et al., 2002;Matear and Hirst, 2003;Frölicher et al., 2009). Global [O 2 ] decreases have also been simulated over longer timescales in response to protracted ocean warming. For example, model simulations using Intermediate Complexity Earth System Models (EMICs) produce a tripling in the volume of suboxic ([O 2 ] < 5 µmol kg −1 ) waters by 2500 . Ocean deoxygenation and expansion of the OMZs has also been projected to persist on millennial timescales for EMIC simulations with high greenhouse gas emissions or high climate sensitivity (Shaffer et al., 2009).
All global models simulate that the most significant [O 2 ] decreases occur at mid-to high latitudes, consistent with observations and caused by decreasing ventilation of the oceanic interior in response to enhanced upper ocean stratification. However, models generally do not reproduce the deoxygenation observed at low latitudes, which leads to considerable uncertainties for future projections. The predictive skill of coarse resolution ocean models at low latitudes is limited by the need to resolve zonal currents thought to be important in supplying oxygen into the tropical OMZs (Keeling et al., 2010;Brandt et al., 2010;Stramma et al., 2010). Climate change integrations of coupled AOGCMs often simulate [O 2 ] increases in the tropical thermocline associated with reductions in the volume of O 2 -depleted waters at low latitudes, contrary to recent trends (Matear and Hirst, 2003;Gnanadesikan et al., 2012). Gnanadesikan et al. (2007) propose that this oxygenation is driven by reduced exchange between poorly ventilated deep water and the tropical thermocline in a more stratified ocean. However, this process has been posited to be a numerical artifact tied to unrealistically high rates of diapycnal mixing in coarse resolution ocean models (Keeling et al., 2010). A recent sensitivity study by Duteil and Oschlies (2011) confirms the importance of vertical diffusivity and its parameterisation on the evolution of suboxic areas in global models. Simulated oxygenation of suboxic zones has also been attributed to an elevated supply of oxygen from isopycnal diffusion in one study using the GFDL ESM2.1 (Gnanadesikan et al., 2012). Up to now, biogeochemical models can reproduce observed low-latitude deoxygenation and expansion of hypoxic and suboxic waters in the tropical OMZs only when they take into account poorly constrained pCO 2 -driven changes to the C:N ratio in primary production Tagliabue et al., 2011) or the alteration to CaCO 3 mineral ballasting in response to ocean acidification (Hofmann and Schellenhuber, 2009).
The implications of a climate change signal in [O 2 ] changes are profound, owing to the deleterious impact of reduced oxygen concentrations and expanding hypoxic zones on marine ecosystems and fisheries (e.g., Doney et al., 2012;Stramma et al., 2012b), and the sensitivity of biolimiting marine nutrient cycles such as nitrogen and phosphorous to oceanic redox conditions (Keeling et al., 2010). However, attribution of secular [O 2 ] changes to global climate change is confounded by natural interannual to decadal variability driven by the leading climate modes which act to mask any climate-driven signal (e.g., the North Atlantic Oscillation, Frölicher et al., 2009). To this end, modelling studies have demonstrated that internal variability associated with the major climate modes in the North Pacific drive propagating O 2 anomalies that are large enough to frequently preclude unequivocal detection of anthropogenic trends in dissolved oxygen on decadal timescales and regional scales (Deutsch et al., 2006;Ito and Deutsch, 2010). However, Deutsch et al. (2005Deutsch et al. ( , 2006 highlight that in the North Pacific strong historical [O 2 ] decreases within the lower ventilated thermocline are reproduced by models and are likely to be driven by long-term changes in ocean ventilation and circulation, but a formal detection study is required to separate external causes from internal variability. We use an optimal fingerprinting methodology to objectively quantify the detectability of externally forced changes in oceanic [O 2 ] relative to natural internal periodicity. Numerous detection studies have been conducted investigating the impact of external forcing on oceanic variables such as ocean temperature (e.g., Barnett et al., 2005;Pierce et al., 2006;Gleckler et al., 2012) and salinity Terray et al., 2012), but these techniques have not yet been used for biogeochemical tracers like oxygen. [O 2 ] changes, therefore, provide a novel means for detecting and attributing anthropogenic climate change in the oceans and for the evaluation of model performance (Hegerl et al., 2006). As such, this study aims to conduct a formal optimal fingerprinting analysis to investigate the "signal in noise" problem of historical changes in oceanic oxygen using [O 2 ] data and simulations from two Earth System Models (ESMs) participating in Phase 5 of the Coupled Model Intercomparison Project (CMIP5).

Observations and CMIP5 models
We use a recent collation of global World Ocean Circulation Experiment (WOCE) O 2 data compared with earlier ocean oxygen profiles (Helm et al., 2011) to investigate the cause of the observed historical changes in global marine oxygen distribution. This compilation maps all historical [O 2 ] data (1940-1988; mean year ∼ 1970) onto the locations of WOCE profiles to facilitate comparison with WOCE measurements (∼ 1989-2000; mean year ∼ 1992) using an optimal-interpolation technique that adapts the length scales to suit the density and distribution of oxygen data . The [O 2 ] data is binned into regular 1 • × 1 • grid cells with 101 fixed pressure levels, excluding shallow (< 1000 m) profiles and observations above 100 m where seasonal variability dominates the oxygen signal (Garcia et al., 2005). Noise variance calculated using the difference between neighbouring data points is used to provide an a priori estimate of natural variability in [O 2 ] data (Helm et al., , 2011Bindoff and Wunsch, 1992). This technique accounts for mesoscale processes and to some extent longer period internal variability such as the dominant climate modes.
This dataset is used in an optimal detection analysis along with the biogeochemical output from two state-of-the-art ESMs (MPI-ESM-LR and HadGEM2-ES) participating in the CMIP5 experiments (Taylor et al., 2009. [O 2 ] fields from MPI-ESM-LR and HadGEM2-ES have been selected here because they display a higher level of realism in simulating both climatological [O 2 ] distribution and historical oxygen changes compared to the other available CMIP5 ESMs. Sufficient qualitative agreement between modelled and observed [O 2 ] distribution allows for direct quantitative comparison using an optimal fingerprinting algorithm without any requirement for a prior transformation of the model's variables (e.g., Banks and Bindoff, 2003).
The Max Plank Institute for Meteorology Earth System Model used here (MPI-ESM-LR; Giorgetta et al., 2012) is a low resolution model version that includes the ECHAM6 atmospheric GCM with T63 horizontal resolution (1.875 • ) and 47 vertical layers developed from ECHAM5 (Roeckner et al., 2006) with modifications in the shortwave radiative transfer and representation of the middle atmosphere). It is coupled to the MPIOM physical ocean model Marsland et al., 2003) which includes a thermodynamic-dynamic sea ice component ) and ocean biogeochemistry model (HAMOCC5; Ilyina et al., 2013;Maier-Reimer, 1993;Maier-Reimer et al., 2005) implemented on a curvilinear bipolar orthogonal grid with a nominal horizontal resolution of 1.5 • and 40 z-levels which increase in thickness with depth. The HAMOCC5 ocean biogeochemistry component includes an extended Nutrient-Phytoplankton-Zooplankton-Detritus (NPZD) type ecosystem model (Six and Maier-Reimer, 1996) along with over 30 prognostic variables in the water column and the upper sediments including co-limiting nutrients nitrate, phosphate and iron, as well as DIC and oxygen. The ocean and atmosphere components are coupled daily without flux corrections.
The Hadley Centre Global Environment Model Version 2 Earth System Model (HadGEM2-ES, Collins et al., 2011;HadGEM2 Development Team, 2011) is also a coupled AOGCM and has an atmospheric resolution of N96 (1.875 • × 1.25 • ) with 38 vertical levels. The ocean model has a 1 • × 1 • horizontal resolution (increasing smoothly to 1/3 • at the equator) with 40 unevenly spaced depth levels. The ocean biogeochemistry component of this model is Diat-HadOCC, which is an NPZD model with two phytoplankton functional types (developed from the HadOCC model; Palmer and Totterdell, 2001), and includes the nutrient cycles of nitrogen, silica and iron along with prognostic tracers such as dissolved oxygen.
We use output from the ∼ 1000 yr control (piControl) experiments of MPI-ESM-LR and HadGEM2-ES, which pre-  Table 1). The piControl runs were initialised from a preindustrial spin-up to pseudo-equilibrium. Boundary conditions for the historical experiments are prescribed from observations as an evolving record of climate forcing factors. These include external forcings from: historical greenhouse gas concentrations, tropospheric and stratospheric ozone changes, surface emissions of tropospheric aerosols and land use changes as well as the natural forcings from changes in solar irradiance and volcanic aerosols. Each member of the model ensemble is initialised from a different point in the control simulation in order to create a spectrum of equally plausible historical simulations each starting at a different phase of internal variability. The MPI-ESM-LR and HadGEM2-ES CMIP5 experimental design and spin-up are described in Mauritsen et al. (2012) and , respectively. A persistent climate drift in HadGEM2-ES oxygen fields has been isolated from a 20 yr low-pass filtered version of the control run and subtracted at each grid point of the historical [O 2 ] data using the corresponding segment of the piControl integration. Since residual drift can also bias noise estimates required for the optimal fingerprinting procedure (as estimated from long control experiments) an equivalent point-by-point drift is also removed from the HadGEM2-ES piControl integration. In this case removal of a linear trend diagnosed from a low pass filter provides an adequate representation of climate drift behaviour, as distinct from natural internal periodicity.

Optimal fingerprinting method
We test the null hypothesis that historical changes in [O 2 ] are indistinguishable from natural internal variability (β = 0) using an optimal detection algorithm (Hasselmann, 1997;Allen and Tett, 1999). This statistical technique is widely used in the detection and attribution of climate change (e.g., IDAG, 2005;Hegerl et al., 2010) and also provides a powerful test of ESM performance which includes the effect of natural internal variability. We use this technique to regress model method  which estimates the scaling factors (β i ) required to match simulated and observed changes following Eq. (1): where in the single signal case, for the i-th forcing, v i is the error in the model response x i and v 0 is the climate noise in the observations. If the confidence interval that contains β exceeds zero a signal is detected in response to an imposed forcing i. If this scaling factor is consistent with one the simulated and observed responses are said to be similar in magnitude. In this study, the simulation consists of the ESM, all of its parameters and all of the temporal and spatial variations of the external forcing i transformed through the physical and chemical processes represented in the model. Thus, a value of β consistent with one demonstrates that forcing i and the theory are consistent with the observations, and a β estimate which is significantly greater than 0 demonstrates that the null hypothesis of no contribution from forcing i can be rejected. TLS is likely to yield a more robust β estimate than an Ordinary Least Squares (OLS) approach because it also includes a signal error term (v i ) arising from averaging across a finite model ensemble to obtain the simulated response pattern. Signal error (v i ) is inversely proportional to the model ensemble size (n) and can negatively bias scaling factors (Allen and Tett, 1999), particularly for variables where the forced response is small relative to internal variability. Thus, TLS is widely used in optimal detection studies as a more conservative approach which explicitly accounts for the effect of noise in simulated response patterns relative to that which is found in the observations (e.g., Stott et al., 2003Stott et al., , 2008Terray et al., 2012). We focus our single fingerprint analysis on the "all forcings" historical scenario of MPI-ESM-LR and HadGEM2-ES, with model output being bi-linearly interpolated onto a 1 • × 1 • grid and masked to emulate the pattern of missing values found in the observations of Helm et al. (2011) (Fig. 1). [O 2 ] changes between ∼ 1970 and ∼ 1992 are then calculated for these experiments and provided as model response patterns (x HIST ) in the TLS regression against corresponding observed [O 2 ] changes (y) in order to estimate scaling factors (β). The instrumental record of dissolved O 2 measurements is not sufficiently long to get a reliable approximation of internal climate variability (v 0 ), and also includes perturbations driven by external forcing. In order to characterise unforced climate variability in [O 2 ] on a global scale, we estimate v 0 by sampling non-overlapping 22 yr slabs of [O 2 ] fields taken from the long (∼ 1000 yr) control integrations of MPI-ESM-LR and HadGEM2-ES. The non-overlapping model [O 2 ] fields are also masked and re-gridded as "pseudoobservations". Subsampled model piControl output is then used to (1) estimate internal variability in [O 2 ] data and (2) place 5-95 % uncertainty limits on calculated scaling factors.
The TLS regression is carried out in a reduced dimension space where model and observed data are projected onto k leading Empirical Orthogonal Functions (EOFs) of simulated internal variability. Signal-to-noise ratios are optimised in a standard way via normalisation of observations and model response patterns by internal climate variability (e.g., IDAG, 2005), a transformation which down-weights patterns of [O 2 ] change with high internal variability and vice versa. Dimensionality of the detection space in this study is further reduced by averaging across multiple en-semble members in each historical CMIP5 experiment (see Table 1), and by calculating zonal means for observed and modelled [O 2 ] changes. The analysis of Helm et al. (2011) shows pronounced global [O 2 ] decreases between ∼ 1970 and ∼ 1992 in all ocean basins, accommodating a zonally averaged spatial domain. This removes small-scale variability in [O 2 ] and allows the detection analysis to focus on global and basin-scale patterns of oxygen change. Gridded dissolved oxygen data are also smoothed vertically and horizontally. Modelled [O 2 ] changes are calculated as temporal averages of annual data between 1960-1980 and 1990-2000 in order to be consistent with the historical spread of observations, with sensitivity tests demonstrating simulated [O 2 ] changes to be relatively invariant across a range of small changes in averaging period.
We prepare model fingerprints and observations for the optimal detection analysis using two different zonal averaging schemes: (1) 1-D analysis: zonal mean of depth-averaged [O 2 ] changes between 100 and 3000 m; (2) 2-D analysis: zonal mean [O 2 ] changes explicitly resolve depth (between 100-3000 m) globally and separately for the Atlantic and Pacific basins. Both MPI-ESM-LR and HadGEM2-ES models are used in the 1-D analysis. The MPI-ESM-LR model only is used to extend the work and provide a more detailed 2-D analysis. For the 2-D optimal detection, masked MPI-ESM-LR output and observations are remapped onto a ∼ 5 • latitude by ∼ 10 • longitude grid with 40 unevenly spaced zlevels to mediate the effects of internal noise on the signal whilst still retaining the depth structure of meridional [O 2 ] change between ∼ 1970 and ∼ 1992. The use of several models and several spatial averaging schemes provides multiple model fingerprints that are used to quantify possible errors in model response patterns ("structural uncertainty") driven by inadequate representation of physical and biogeochemical process in ESMs (e.g., Hegerl and Zwiers, 2011).
In order to avoid spurious detection it is a necessary prerequisite that the internal variability estimated from control simulations (v 0 ) provides a realistic estimate of observed climate noise in [O 2 ]. As such, the number of EOFs retained in the optimal detection analysis is guided by checking the fidelity of model simulated internal variability against the residual observed variance at k truncations using a standard residual consistency F test (Allen and Tett, 1999). This check is used to test the null hypothesis that internal variability as simulated by models is consistent with observed variability on the scales retained in the analysis. Failure of the residual consistency test could also indicate that the timing or pattern of ESM response is incorrect. We choose to truncate the 1-D MPI-ESM-LR analysis at 40 EOFs (k = 40) since this is the first truncation for which the residual test passes the consistency check, reducing the likelihood of keeping poorly sampled higher order EOFs in the optimal detection. Fewer truncations (k = 31) are needed to pass the residual consistency check in the 1-D HadGEM2-ES analysis, however, scaling factors are not substantially different at higher k A higher number of retained EOFs is consistent with the greater amount of information needed to describe spatial patterns of both depth and latitude variations, with more modes being necessary to explain variability in the depth-resolved signal.
In addition to the residual consistency test, we assess the reliability of model simulated climate variability by comparing piControl output with detrended subsurface [O 2 ] measurements from two long-term time series: Ocean Station Papa (1956( , Whitney et al., 2007 and the Oyashio Current region (1968( , Ono et al., 2001. Observed decadal standard deviations calculated for both time series fall within the 10-90 % ranges of MPI-ESM-LR control simulation estimates, demonstrating that this model provides a robust estimate of internal variability in [O 2 ] on decadal timescales ( Table 2). The HadGEM2-ES control simulation significantly underestimates decadal variability in [O 2 ] when compared to time series data and is, thus, less reliable than simulations from MPI-ESM-LR in the context of our analysis.  (Fig. 2) Ilyina et al. (2013) present a detailed comparison between biogeochemical tracers in MPI-ESM-LR CMIP5 historical simulations and observations using a range of statistical metrics to assess model capability.

Model-data comparison
A marked meridional structure also exists in observed depth-averaged zonal mean [O 2 ] changes, with deoxygenation increasing with latitude poleward of 40 • (up to 12 µmol kg −1 ) in both hemispheres (Fig. 3). Pronounced deoxygenation in the mid-to high-latitude ocean is opposed by no change or a small zonal mean [O 2 ] increase of The consistency between models and observations is further examined in zonal mean sections as a function of depth (Fig. 4). [O 2 ] data show acute deoxygenation of the ventilated thermocline (100-1000 m depth) at all latitudes, with large oxygen decreases (> 10 µmol kg −1 ) extending throughout the water column poleward of 40 • in both hemispheres (Fig. 4a). These regions of deoxygenation are countered by areas of increasing [O 2 ] (5-10 µmol kg −1 ) located between 30 • S and 30 • N below 1000 m depth and within the shallow subtropical gyres (15-30 • (Fig. 4b). MPI-ESM-LR also displays significant deoxygenation between 40 • S and 60 • S, but this is opposed by a limb of [O 2 ] increase within the interior of the Southern Ocean (a feature which is entrained into the depth-averaged trend of Fig. 3). HadGEM2-ES exhibits deoxygenation in the upper ocean at mid-to high latitudes in both hemispheres, with [O 2 ] depletion extending down to ∼ 3000 m depth south of 65 • S and at ∼ 60 • N (Fig. 4c) [O 2 ] are given at the 95 % confidence level and are associated with instrumental uncertainty, un-resolved ocean processes and methodological uncertainty in forming the zonal averages (derived from an a priori estimate of noise using the method of Bindoff and Wunsch, 1992).
This failure of the models to replicate the low-latitude OMZs is discussed in Sect. 1 and is well known. Overall, simulated [O 2 ] changes for both models tend to be similar but smaller in magnitude than recorded by observations, with MPI-ESM-LR exhibiting generally more skill at reproducing patterns of observed [O 2 ] change as a function of depth.
On a basin scale (Fig. 5) there is a higher level of agreement between modelled and observed water mass changes over the analysis period in the Pacific compared to the Atlantic. MPI-ESM-LR reproduces observed deoxygenation of the high-latitude North Pacific down to ∼ 3000 m depth, and shows a general trend towards decreasing [O 2 ] levels within the ventilated thermocline. In comparison, the structure of modelled oxygen changes within the Atlantic Ocean is largely inconsistent with observations. MPI-ESM-LR simulates extensive deoxygenation (up to 4 µmol kg −1 ) throughout much of the Atlantic between 1000 and 3000 m depth, and does not resolve major regions of observed [O 2 ] increase in the interior of the Atlantic between 30 • S and 30 • N. In both the Atlantic and Pacific basins, following the global trend, MPI-ESM-LR does not reproduce the pronounced deoxygenation signal recorded within the low-latitude OMZs. This model-data mismatch is most apparent in the thermocline of the tropical Atlantic where MPI-ESM-LR shows strong [O 2 ] increases of > 5 µmol kg −1 in opposition to observed [O 2 ] depletion.
Changes in upper ocean stratification are associated with changes in the ventilation (and oxygenation) of subsurface water masses. Figure 6 shows historical changes in zonal mean upper ocean density stratification as a function of lat- Southern Ocean both MPI-ESM-LR and HadGEM2-ES underestimate the observed stratification change signal, with major increases in stratification between 50 • S and 60 • S not reproduced by either model. Analysis of the density structure of these models show that in this region the density profile is more homogeneous than observed with more vigorous vertical mixing.

Optimal detection analysis
We apply the optimal detection approach outlined in Sect. 2.2 to quantitatively investigate the consistency of modelled and observed changes in [O 2 ], and to assess the detectability of observed changes in response to external forcing. Scaling factors (β) resulting from the single fingerprint analysis confirm that a statistically significant change in observed [O 2 ] in response to external forcing is detected and inconsistent with simulated internal variability at a 90 % confidence level (Fig. 8) models and observations have significant correlations in the latitudinal pattern of variation. However, since these β values are greater than one we can infer that the ESM simulated [O 2 ] responses to external forcing are significantly underestimated and need to be amplified (by a factor of between ∼ 2 and ∼ 4) to be consistent with observed changes. The residual consistency test passes for both model experiments indicating no inconsistency between residual observed variance and model simulated internal variability, and suggesting that both ESMs simulate the externally forced signal adequately to explain observed [O 2 ] changes. For both MPI-ESM-LR and HadGEM2-ES β estimates for the 1-D analysis are robust across a range of truncations proximal to the chosen values of k, adding confidence to the presented optimal detection results (Fig. 9).
In agreement with the 1-D result, the 2-D detection analysis using MPI-ESM-LR yields a best estimate scaling factor of greater than 1 (β = 2.26) for the global ocean, although in this case the 5-95 % uncertainty bounds on this value are consistent with 1. This range of estimated β values demonstrates that the simulated response of the MPI-ESM-LR model provides a good fit to observed zonal mean [O 2 ] changes between ∼ 1970 and ∼ 1992 in response to external forcing. A positive detection result (β = 5.30) is also found for observed [O 2 ] changes in the Pacific Ocean using a 2-D pattern with MPI-ESM-LR. However, this scaling factor is much larger than 1 indicating poor model-data agreement in amplitude. In contrast, the null hypothesis that observed changes in marine oxygen are caused by natural internal variability cannot be rejected for the Atlantic Ocean, where the 5-95 % range of β estimates are indistinguishable from zero, with a best estimate scaling factor of less than 1 (β = 0.89  ocean and Pacific basin, although the observed response is larger than that simulated by ESMs at all scales.

Discussion
This study presents, for the first time, an optimal fingerprinting analysis detecting statistically significant global decreases in observed oceanic [O 2 ] in response to external forcing, as distinct from internal variability driven by the leading climate modes. The primary natural external forcing imposed on [O 2 ] is explosive volcanism, a perturbation on the climate system that is generally limited to the ocean's upper ∼ 500 m on interannual timescales (Frölicher et al., 2009). Because the observed and modelled deoxygenation occurs throughout the water column and on interdecadal timescales, we can infer that rising greenhouse gas concentration is the main driver of a detectable external forcing on historical [O 2 ] changes between ∼ 1970 and ∼ 1992. In the 1-D analysis we find the most detectable [O 2 ] change signal relative to internal variability at mid-to high latitudes in regions of water mass renewal where the observed subsurface zonal mean oxygen decreases are largest. Helm et al. (2011) attribute ∼ 85 % of global ocean deoxygenation in polar regions to elevated upper ocean stratification which increases the ventilation age of downwelling water parcels allowing for more biological oxygen consumption to occur, consistent with a range of prognostic modelling studies (Sarmiento et al., 1998;Bopp et al., 2002;Matear and Hirst, 2003). We also find significant increases in upper ocean stratification between ∼ 1970 and ∼ 1992 at high latitudes for MPI-ESM-LR and HadGEM2-ES historical integrations concurrent with enhanced subsurface deoxygenation. Moreover, the inability of either model to reproduce observed increases in zonal mean stratification in the Southern Ocean could provide a mechanistic explanation for why modelled [O 2 ] decreases are considerably smaller than observed in this region.  also show that the ocean model within MPI-ESM-LR overestimates ventilation in the Antarctic Circumpolar Current system, producing enhanced oxygenation of the oceanic interior in this region. Large uncertainties are entrained into model estimates of physical circulation changes owing to incomplete assessments of the competing roles of stratification and wind forcing (e.g., Le Quéré et al., 2007;Böning et al., 2008;Downes et al., 2011) in controlling ventilation processes in a warmer world.
Although we find a clearly identifiable influence of external forcing on recorded patterns of [O 2 ] change, best estimates of β for both 1-D and 2-D global analyses are consistently greater than one suggesting that model responses need to be scaled up to match the observations. There are a number of potential explanations for this result. One possibility, endemic to climate sensitive variables that are locally heterogeneous and difficult to simulate (Masson and Knutti, 2011), is that [O 2 ] data could exhibit too much small-scale variability to be directly comparable with smoother model fields. This potential discrepancy could yield observed change signals with artificially inflated amplitudes relative to model response patterns, although we try to account for this effect by spatially smoothing the data prior to model evaluation. CMIP5 historical experiments are too weak or ESMs are less responsive to external forcing than in reality. Since CMIP5 historical forcings are prescribed from observations (Taylor et al., 2009) the former seems unlikely; however, there is evidence to support the proposition that the current generation of ESMs systematically underestimate observed variability in [O 2 ]. For example, in good agreement with our 1-D scaling factors, hindcast simulations initialised globally (Rödenbeck et al., 2008) and for the North Pacific (Deutsch et al., 2005(Deutsch et al., , 2006 underestimate observed interannual to decadal variability in ocean oxygen by a factor of between ∼ 2 and ∼ 3. It has been suggested that coarse resolution models might underestimate variability in simulated passive tracers since they do not resolve mesoscale ocean dynamics (Hirst et al., 2000;Jochum et al., 2007). Moreover, coupling frequency between the ocean and atmosphere (once a day in MPI-ESM-LR and HadGEM2-ES simulations) might also be insufficient to fully capture climate variability in state-of-theart ESMs. This could support the assertion that the response of models to secular climate change may also be underestimated, since a more naturally variable ocean-climate system will likely be more sensitive to imposed external forcings (von Storch and Zwiers, 1999;Swanson et al., 2009). However, the piControl derived estimates of natural internal variability used in this analysis are shown to be consistent with observed variance using a standard residual consistency test and comparison of simulated noise with two observational estimates (Sect. 2.2 and Table 2).
Qualitative model-data comparison (Sect. 3.1) suggests that regional differences between modelled and observed patterns of [O 2 ] change could also contribute to the weaker simulated zonal mean signal. However, consistent with the assumption of the TLS detection model that "structural uncertainty" has the same structure as internal variability (e.g., Terray et al., 2012), model errors generally fall within the range of internal variability (Fig. 7). Regional differences probably result from the omission of key biogeochemical and physical processes that control marine oxygen distribution at a local scale. Particularly, 2-D model response patterns are unable to reproduce a net deoxygenation signal in the tropical thermocline, consistent with deficiencies in modelling studies (Bopp et al., 2002;Matear and Hirst, 2003;Stramma et al., 2012a). Stramma et al. (2012a) report that the spurious tropical [O 2 ] increases at 300 dbar in the UVic ESM may be primarily related to the inability of coarse resolution ocean models to resolve the fine scale equatorial currents which control the oxygen budget of the OMZs (Brandt et al., 2010;Stramma et al., 2010). Their sensitivity study suggests that alternative proposed model deficiencies such as artificially high rates of diapycnal mixing within the tropical thermocline (Gnanadesikan et al., 2007) and the omission of a pCO 2 sensitive C:N stoichiometry in primary production Tagliabue et al., 2011) do not resolve the erroneous tropical [O 2 ] increases in their model. In the case of MPI-ESM, a comparison between the low resolution (analysed in this study) and eddy-permitting model configurations (MPI-ESM-MR with a nominal resolution of ∼ 0.4 • ) shows only a slight improvement in the representation of [O 2 ] in the thermocline (Ilyina et al., 2013). This indicates that eddy-permitting spatial resolution of the ocean model alone is insufficient to solve the coupled models deficiencies with respect to [O 2 ] dynamics. Other model-data discrepancies, particularly simulated oxygen increases within the ocean interior at high latitudes, are likely to be related to persistent errors in model physical mixing and deep ocean ventilation, as reported for historical CMIP5 experiments of IPSL-CM5A and CNRM-CM5.1 (Séférian et al., 2012). In addition, the inability of models to capture [O 2 ] dynamics in ice-covered high-latitude areas can be attributed to uncertainties in the underlying sea-ice models related to the growth and melting of seasonal ice (e.g., . Counter to the global and Pacific Ocean detection results, our analysis finds observed patterns of [O 2 ] change in the Atlantic basin to be indistinguishable from natural internal variability. The high internal variability of Atlantic oxygen has been reported by others (e.g., Johnson and Gruber, 2007) and in an analysis of the essential ocean indices the Atlantic scored poorly from the perspective of the strength of signal-to-noise ratio (Banks and Wood, 2002). These findings are supported by modelling studies which show secular increases in Apparent Oxygen Utilisation (AOU) of the subsurface North Atlantic to be rendered statistically insignificant by internal variability associated with the North Atlantic Oscillation (Frölicher et al., 2009). Thus, in the Atlantic, high levels of internal variability coupled to a smaller observed mid-to high-latitude deoxygenation trend (relative to the Pacific and Southern Oceans; Helm et al., 2011) reduce the signal-to-noise ratio, contributing to the null detection result for this basin. The same conclusion was reached for sea surface salinity where the single fingerprint analysis of Terray et al. (2012) shows robust detection of observed salinity changes in response to anthropogenic forcing for the Pacific basin but with scaling factors consistent with 0 for the Atlantic Ocean.

Summary and conclusions
Our analysis shows that the global consistency and pronounced meridional structure of [O 2 ] changes between ∼ 1970 and ∼ 1992 recorded by Helm et al. (2011) provide a distinct fingerprint of climate change in the oceans, which is robustly detected against internal variability as simulated by models. The unprecedented spatiotemporal coverage of data and sensitivity of [O 2 ] to climate perturbation provides a unique opportunity to validate the response of ESMs to conditions of global change on multidecadal timescales. Rigorous assessment of ESMs against observations using detection and attribution methods supports the need to improve model physics and biogeochemistry, but also reveals that these models already have significant capacity to simulate many aspects of the ocean system, particularly the deoxygenation at mid-to high latitudes. However, both MPI-ESM-LR and HadGEM2-ES underestimate the observed [O 2 ] response to external forcing, suggesting that model projections for future ocean deoxygenation in response to climate change may be too conservative. Furthermore, both models perform poorly at low latitudes, indicating that model projections in that region may not be reliable.
Detectable anthropogenic contributions to recent trends in ocean temperature (Barnett et al., 2005;Pierce et al., 2006;Gleckler et al., 2012) and salinity Terray et al., 2012) have already been identified. Taken together, these marked changes to global heat and freshwater fluxes implicitly support our finding that stratificationdriven global deoxygenation has started to emerge from the envelope of internal variability. We find the most detectable changes in [O 2 ] relative to internal variability to occur at high latitudes where the independent and synergistic effects of secular ocean warming, acidification and deoxygenation could have a major impact on polar ecosystems and biogeochemical cycles (Gruber, 2011). Subsurface [O 2 ] changes at high latitudes (particularly in the Pacific and Southern Oceans) should, therefore, be monitored closely alongside more widely documented [O 2 ] decreases within the OMZs (e.g., Stramma et al., 2008) to better constrain the anthropogenic fingerprint of climate change in the oceans. Models predict that ocean deoxygenation at mid-to high latitudes will continue to intensify under global warming conditions, such that climate-driven perturbation to oceanic oxygen will become more distinct with time. Therefore, on-going observational efforts from time-series, repeat hydrographic sections and global in-situ profiling floats (e.g., Argo;  are crucial to better understanding natural variability in marine O 2 on multidecadal timescales and improving the detectability of emergent anthropogenic trends.