The vehicles that fly the satellite into a model of the Earth
system are

Earth system models (ESMs) are complex software systems that capture our knowledge of
how the ocean, atmosphere, land, and ice operate and interact. ESMs provide
scientists with powerful tools to better understand our global environment,
its evolution, and the potential impact of human activities (e.g. analyses of
relevant processes, and their interaction and feedback mechanisms). ESM
applications range from numerical weather prediction (NWP), to seasonal or
decadal forecasting

Before being used for predictions, ESMs and their components should be
confronted with observations in order to ensure their realism (validation).
Such validation procedures can be extended to standardized assessments of
model performance in so-called benchmarking systems by evaluation of a set of
observation-based metrics

Such confrontation with observations is hampered by the fact that observed
and modelled quantities typically differ in nature or scale (in space and
time). For example, a flask sample of the atmospheric carbon dioxide
concentration provides a value at a specific point in space and time, whereas
an atmospheric tracer model operates in a discretized representation of space
and time, i.e. on values that refer to a box in the atmosphere and a
particular period of time. Any comparison of the two quantities (modelled and
observed) must hence take the uncertainty arising from this representation
error into account
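The flask example can be made concrete with a minimal sketch. It assumes a hypothetical two-dimensional model field on a regular grid, and it assumes that measurement and representation errors are independent, so that their variances add; neither assumption is taken from the studies discussed here, and all names and numbers are illustrative.

```python
from bisect import bisect_right
import math

def box_index(edges, coord):
    """Index i of the grid box [edges[i], edges[i+1]) that contains coord."""
    i = bisect_right(edges, coord) - 1
    if i < 0 or i >= len(edges) - 1:
        raise ValueError("coordinate lies outside the model grid")
    return i

def model_equivalent(field, lat_edges, lon_edges, lat, lon):
    """Box-mean model value of the grid box enclosing a point observation."""
    return field[box_index(lat_edges, lat)][box_index(lon_edges, lon)]

def combined_sigma(sigma_measurement, sigma_representation):
    """Total uncertainty, ASSUMING independent errors (variances add)."""
    return math.hypot(sigma_measurement, sigma_representation)

def normalized_misfit(observed, modelled, sigma):
    """Model-minus-observation difference in units of the total uncertainty."""
    return (modelled - observed) / sigma

# Hypothetical 2 x 2 field of CO2 concentrations (ppm), rows are latitude bands.
field = [[400.0, 401.0], [402.0, 403.5]]
lat_edges = [-90.0, 0.0, 90.0]
lon_edges = [-180.0, 0.0, 180.0]

modelled = model_equivalent(field, lat_edges, lon_edges, lat=19.5, lon=-155.6)
sigma = combined_sigma(sigma_measurement=0.3, sigma_representation=0.4)
misfit = normalized_misfit(observed=403.0, modelled=modelled, sigma=sigma)
```

In this sketch the point sample is compared with the mean over the enclosing box, and the representation error inflates the uncertainty against which the difference is judged.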

The link from the model to the observations is provided through a set of
relationships expressed in terms of an observation operator. We can
think of an observation operator as an arm that enables the ESM to access a
particular type of observation (see Fig.

Schematic of an ESM accessing several data types via observation
operators

The layout of the remainder of this paper is as follows. Section

Model–data comparison at the sensor level (level 1, solid arrows) and at the level of geophysical variables (level 2, dotted arrows). Ovals denote data, and rectangles denote some form of processing.

Mathematically, the observation operator is defined as mapping

The solid path in Fig.

Figure

The simulation of spectral radiances at the sensor level requires information
from the atmosphere and the land/ocean surface, including the description of
ice or snow covers. Hence, the observation operator typically consists of
various modules. First, from the model state, the relevant electromagnetic
signatures are simulated. For example, for a passive optical sensor observing
the terrestrial vegetation, this would be the reflected sunlight, and it
would be computed by a model of the radiative transfer within the canopy; for
examples, see

Each type of observation requires its own observation operator in order to be
accessible to a model. The complexity of the observation operator typically
reflects a compromise between the accuracy required for the application at
hand and the available computational resources. In a space mission, the
observation operator depends on characteristics such as the geometry of the
observation (as a function of the orbit of the platform) or the measuring
principle and therefore the spectral sensitivity of the sensor. The observation
operator also depends on the formulation of the dynamical model. One aspect
is the state space, which depends on the model formulation. For example, an
atmospheric model can either diagnose clouds or include them in the state
space

Generic scheme of an observation operator for spectral radiance. Oval boxes denote data, and rectangular boxes denote processing.
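The modular structure described above can be sketched as a chain of three functions: a surface module, an atmosphere module, and a sensor module. All three are toy stand-ins with hypothetical names and coefficients, chosen only to show how the modules compose; they are not physically realistic radiative transfer models.

```python
import math

# Hypothetical per-band reflectances; illustrative values only.
LEAF = {"red": 0.05, "nir": 0.45}
SOIL = {"red": 0.15, "nir": 0.25}

def canopy_module(state):
    """Toy canopy signature: soil seen through gaps, leaves elsewhere."""
    gap = math.exp(-0.5 * state["lai"])  # assumed gap fraction
    return {b: gap * SOIL[b] + (1.0 - gap) * LEAF[b] for b in LEAF}

def atmosphere_module(surface, transmittance=0.9, path=0.02):
    """Toy atmosphere: attenuate the surface signal and add a path term."""
    return {b: path + transmittance * r for b, r in surface.items()}

def sensor_module(top_of_atmosphere, response):
    """Weight the spectral bands with each channel's spectral response."""
    return {
        channel: sum(w * top_of_atmosphere[b] for b, w in weights.items())
        for channel, weights in response.items()
    }

def make_observation_operator(response):
    """Compose the modules into one mapping from model state to sensor signal."""
    def operator(state):
        return sensor_module(atmosphere_module(canopy_module(state)), response)
    return operator

# Hypothetical two-channel sensor with slightly overlapping spectral responses.
RESPONSE = {"ch1": {"red": 0.9, "nir": 0.1}, "ch2": {"red": 0.1, "nir": 0.9}}
observe = make_observation_operator(RESPONSE)

bare = observe({"lai": 0.0})   # bare soil
dense = observe({"lai": 4.0})  # dense canopy
```

Because each module has a narrow interface, the sensor module could be swapped for a different instrument, or the canopy module for a different model formulation, without touching the other parts.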

The crucial role of observation operators is reflected in comparison
exercises, such as the radiation transfer model intercomparison (RAMI)
initiative for the transfer of radiation in plant canopies and over soil
surfaces

This section starts with an introduction of the formalism behind advanced
data assimilation and retrieval schemes. The details of the formalism are
useful to understand the application examples in this section (and the
commonalities between assimilation and retrievals) and the need for
derivative information that is discussed in Sect.

Data assimilation is a procedure to combine the information from observations
with the information in a dynamical model. There is a range of data
assimilation techniques with varying degrees of sophistication. The simplest
techniques try to replace a component of the model state vector with an
observation or, more precisely, some average of the two. More advanced
approaches can assimilate observations

The function

Equation (

The model dynamics are even more emphasized when the scheme of
Eq. (

In the 4D-Var approach, the vector of unknowns

The

In the case of linear

In the non-linear case (i.e.

Alternatives to the above assimilation approaches (which are based on
linearizations) are ensemble methods, such as Markov chain Monte Carlo

We used Eq. (

Another perspective on the assimilation of level 1 data is to regard it as an
advanced form of retrieval, and to regard the assimilation system as an advanced
retrieval algorithm that optimally combines the information from remote
sensing, radiative transfer, and dynamical model. The other point to note is
that

The prime example of an atmospheric 4D-Var system is the one

A prominent example of a variational ocean assimilation system was set up around the MIT General Circulation Model (MITgcm;

A recent example of a regional variational assimilation system for the
coupled ocean–sea-ice system in the northern latitudes was developed by

An example for the global terrestrial vegetation is provided by the Carbon
Cycle Data Assimilation System (CCDAS). Initially set up for the assimilation
of in situ observations of the atmospheric CO

The integrated retrieval of

An example for the land surface is the Joint Research Centre Two-stream
Inversion Package (JRC-TIP)

The Earth Observation Land Data Assimilation System

A serious practical difficulty in data assimilation is the specification of

A related topic is the consistency of the prior information
(

Observing system simulation experiments (OSSEs) and quantitative network
design (QND) are two methodologies that evaluate observation impact on
assimilation systems. By an observing system or observational network,
we mean the superset of all observations that are made available to an
assimilation system. We only give a brief introduction to the topic, as QND
for the carbon cycle is addressed by another contribution to this special
issue (

An OSSE

QND (for overviews, see

For both approaches, OSSE and QND, suitable observation operators are essential. A disadvantage of both is that the results depend on the model used. Both techniques require the specification of data uncertainties for the hypothetical data streams to be evaluated.
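The role of the data uncertainties can be illustrated with a minimal scalar sketch. It assumes a single unknown, a linearized observation operator, and Gaussian uncertainties, for which the posterior precision is the prior precision plus the sum of squared sensitivities divided by the data variances. The function name and the two candidate networks are hypothetical.

```python
import math

def posterior_sigma(prior_sigma, sensitivities, data_sigmas):
    """Posterior uncertainty of one unknown in the linear Gaussian case.

    sensitivities: derivative of each simulated observation with respect
                   to the unknown (from the linearized observation operator)
    data_sigmas:   uncertainty assumed for each hypothetical observation
    """
    precision = 1.0 / prior_sigma ** 2
    for h, sigma in zip(sensitivities, data_sigmas):
        precision += (h / sigma) ** 2
    return math.sqrt(1.0 / precision)

# Two hypothetical candidate networks, evaluated against the same prior.
prior = 2.0
network_a = posterior_sigma(prior, sensitivities=[1.0], data_sigmas=[1.0])
network_b = posterior_sigma(prior, sensitivities=[1.0, 0.5], data_sigmas=[1.0, 0.5])
```

No observed values enter this calculation, only sensitivities and data uncertainties, which is why hypothetical data streams can be evaluated before they exist.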

This section first summarizes how the capability to evaluate derivatives of the observation operator is used in efficient schemes for retrieval, assimilation, or QND. It then introduces a technique for providing derivative information.

In variational assimilation, Eqs. (

Likewise, the Kalman filter requires derivatives of

As mentioned, advanced retrieval algorithms are based on the same equations;
i.e. they typically solve Eq. (

Traditionally, derivatives were approximated by multiple forward runs (finite
difference approximation)

Both disadvantages can be avoided by automatic differentiation
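The contrast between the two approaches can be sketched with a toy example. Note the assumptions: the AD tools discussed in this paper transform source code, whereas this sketch uses operator overloading with dual numbers (forward mode only), and the scalar function `toy_operator` is a hypothetical stand-in for an observation operator.

```python
import math

class Dual:
    """A value together with its derivative with respect to one chosen input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = lift(other)
        # Product rule, applied alongside the function evaluation.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))

def exp(x):
    x = lift(x)
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)  # chain rule

def toy_operator(lai):
    """Hypothetical scalar operator: absorbed fraction 1 - exp(-0.5 * lai)."""
    return 1.0 + (-1.0) * exp(-0.5 * lai)

def ad_derivative(f, x):
    """One evaluation with the input seeded by derivative 1."""
    return f(Dual(x, 1.0)).deriv

def fd_derivative(f, x, h=1e-3):
    """One-sided finite difference: needs an extra forward run per input."""
    return (f(x + h).value - f(x).value) / h

x0 = 2.0
exact = 0.5 * math.exp(-0.5 * x0)
ad_error = abs(ad_derivative(toy_operator, x0) - exact)
fd_error = abs(fd_derivative(toy_operator, x0) - exact)
```

In this sketch the AD derivative agrees with the analytical value to rounding error, while the finite difference result carries a truncation error that depends on the step size `h`.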

For variational assimilation, we require the derivative of the scalar-valued
cost function

A particular advantage of AD is that it can guarantee readability and
locality

Since an AD tool operates at the code level, it is restricted to a particular
programming language. For the most frequently used programming languages in
Earth system science, namely Fortran and C, AD tools are, however, available.
It is a considerable effort to develop and maintain an AD tool at a level
robust enough for relevant scientific applications. Over the last decade,
tool development has made good progress, and there is a long list of
successful AD applications to component models of the Earth system. A prime
example is the above-mentioned MITgcm, which is compliant with multiple AD
tools

In some cases, analytical formulations of the derivative can be derived and
implemented with the observation operator

Whether coded by hand or by an AD tool, the differentiation process typically
reveals issues in the function code that are not apparent otherwise. A
standard example is the square root (used, for example, in the computation
of a norm), the derivative of which tends to infinity as the argument tends
to 0. Infinite sensitivities were typically not intended when the model code
was designed, and we can regard a differentiable reformulation as model
improvement. A further example is the introduction of a floor value
of 0 to avoid negative values of the simulated ice-covered area. An obvious
implementation of this floor value as the maximum of the simulated area and 0 produces a step in
the derivative at 0. Another example (now for the implementation as a
minimum) is the formulation of co-limitation in biogeochemical models, in
particular for carbon fixation in the photosynthesis model of

Typical formulations of leaf phenology rely on a number of on–off switches
that yield non-differentiable behaviour and hamper the performance in a
CCDAS. This problem was addressed by
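The floor value and square root examples above can be sketched as follows. The smooth replacements are one common choice, introduced here purely for illustration; they are not necessarily the reformulations used in the models cited, and the smoothing width `eps` is a hypothetical tuning parameter.

```python
import math

def hard_floor(area):
    """Obvious implementation: maximum of the simulated area and 0."""
    return max(area, 0.0)

def hard_floor_derivative(area):
    """Jumps from 0 to 1 at area = 0."""
    return 1.0 if area > 0.0 else 0.0

def smooth_floor(area, eps=1e-3):
    """Differentiable approximation of max(area, 0); never negative."""
    return 0.5 * (area + math.sqrt(area * area + eps * eps))

def smooth_floor_derivative(area, eps=1e-3):
    """Varies continuously from 0 to 1 over a width of order eps."""
    return 0.5 * (1.0 + area / math.sqrt(area * area + eps * eps))

def smooth_norm(u, v, eps=1e-6):
    """Norm with a small offset so the square root argument stays above 0."""
    return math.sqrt(u * u + v * v + eps * eps)

def smooth_norm_derivative_u(u, v, eps=1e-6):
    """Finite everywhere, including at u = v = 0."""
    return u / smooth_norm(u, v, eps)

# The hard floor has a step in its derivative; the smooth one does not.
step = hard_floor_derivative(1e-9) - hard_floor_derivative(-1e-9)
change = smooth_floor_derivative(1e-9) - smooth_floor_derivative(-1e-9)
```

The price of the smooth versions is a small bias near 0 (of order `eps`), which has to be weighed against the benefit of well-behaved derivatives.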

EO products can only be accessed by Earth system models via suitable
observation operators. Hence, the careful design of observation operators is
essential to optimally exploit the observational information. There are
overlaps between observation operators used to confront dynamical models with
EO data (validation, benchmarking, assimilation) and forward models used for
retrievals of geophysical products. To allow the most flexible use, observation
operators should be designed in modular form with carefully constructed
interfaces. Several advanced retrieval algorithms and advanced assimilation
techniques (Kalman filter, 3D-Var, and 4D-Var) rely on first derivatives
(linearizations) of the observation operators, i.e. their tangent and adjoint
versions. The assessment of uncertainties and quantitative network design additionally require second derivatives of observation operators. To maximize
their application range, these derivative codes should be developed and
maintained together with their underlying observation operators. This
procedure is, for example, applied at the European Centre for Medium-Range
Weather Forecasts. Automatic differentiation (AD) provides a means to
minimize the development and maintenance effort for these derivative codes.
There is an ever-increasing list of successful AD applications to large-scale
Earth system science codes, including many observation operators. Meanwhile, there
is a tendency among code developers to achieve and preserve compliance with
an AD tool and thus enhance the functionality of their modelling system
through the availability of derivative information. In the development of an
AD-compliant modelling or retrieval system, the system's sustainability can
be maximized by the selection of a mature AD tool that is permanently
maintained by an experienced development team and extended in response to the
evolution of user needs and programming languages. Close collaboration with
AD tool developers has proven beneficial in the efficient setup of robust
AD-compliant systems for modelling

No data sets were used in this article.

The authors declare that they have no conflict of interest.

The authors would like to thank Laurent Bertino, Frédéric Chevallier, Patrick Heimbach, Christian Melsheimer, Bernard Pinty, and the anonymous reviewers for helpful comments. We acknowledge the support from the International Space Science Institute (ISSI). This publication is an outcome of the ISSI's Working Group on “Carbon Cycle Data Assimilation: How to Consistently Assimilate Multiple Data Streams”. Thomas Kaminski was in part funded by the ESA GHG-CCI project.

Edited by: M. Scholze
Reviewed by: two anonymous referees