Extreme hydrometeorological conditions typically impact ecophysiological
processes on land. Satellite-based observations of the terrestrial biosphere
provide an important reference for detecting and describing the
spatiotemporal development of such events. However, in-depth investigations
of ecological processes during extreme events require additional in situ
observations. The question is whether the density of existing ecological
in situ networks is sufficient for analysing the impact of extreme events,
and what are expected event detection rates of ecological in situ networks of
a given size. To assess these issues, we build a baseline of extreme
reductions in the fraction of absorbed photosynthetically active
radiation (FAPAR), identified by a
new event detection method tailored to identify extremes of regional
relevance. We then investigate the event detection success rates of
hypothetical networks of varying sizes. Our results show that large extremes
can be reliably detected with relatively small networks, but also reveal a
linear decay of detection probabilities towards smaller extreme events in
log–log space. For instance, networks with

Many lines of evidence point towards an intensification of
certain hydrometeorological extreme events, such as hot temperature extremes
or droughts in many regions of the world over the next few decades

Earth observations (EOs), especially satellite remote-sensing data, encode
relevant information on anomalous ecosystem functioning

Although EOs enable the detection of extremes in the terrestrial biosphere, a
deeper understanding of impacts on ecosystem functioning can be gained from
combining EOs with in situ observations

The site distribution in space of ecological in situ monitoring networks is
typically sparse. One obvious and common critique is that networks emerging
either as voluntary associations of sites or being constructed on the basis
of existing sites (naturally) cannot provide an equitable representation of
the world's ecosystems

In this paper we aim to understand the potential of ecological in situ networks of varying size for monitoring the impact of extreme events. We address this issue in three steps. (1) We propose an approach for detecting extremes that are of regional relevance. This step is important to avoid a bias toward considering extremes that take place only in high-variance regions, and may be a relevant contribution beyond our application. (2) We explore a series of random networks of varying sizes to estimate the expected detection rates. We aim to understand the observed patterns using probabilistic approaches and formulate a theoretical expectation of detection probabilities of extremes. (3) We then analyse the detection probabilities in two real networks (NEON and AmeriFlux) and compare these to random networks of identical size. The paper concludes with an outlook on how our findings could inform network design aimed at improving the detection of extreme events.

We required a catalogue of extreme events experienced by terrestrial
ecosystems in the past several years to analyse the suitability of in situ
networks for detecting them. To create such a catalogue of extreme impacts,
we used extreme negative anomalies of the fraction of absorbed
photosynthetically active radiation, FAPAR. These values are a dimensionless
spatiotemporal indicator of how much solar radiation energy (in the PAR
domain) is effectively absorbed by vegetation, i.e. converted by
photosynthesis

FAPAR is considered an “essential climate variable” (ECV)

The temporal variability of FAPAR is influenced by vegetation development,
but likewise encodes, for example, fire events and other extreme reductions in FAPAR
that are assumed to have a pronounced effect on GPP. Here we use FAPAR data
derived by the JRC-TIP approach

First, we create artificial random in situ networks in order to systematically study the effects of varying network sizes and as a reference for the analysis of existing networks. Then we analyse existing or recently established in situ networks for their capability to detect the impacts of extreme events.

We use the geographical locations of EC flux tower networks but not
the actual measurements. Our main target is FLUXNET, a global collection
of EC data collected

The National Ecological Observatory Network (NEON;

The question of how to define extreme events in spatiotemporal data cubes is
key to the evaluation of the suitability of ecological in situ networks. One
approach would be to define some global threshold and identify values
exceeding this threshold as potential extremes (“peak over threshold”).
Choosing a global threshold setting is suitable when the question is about
how extremes add up to global anomalies

In the following we develop a strategy to define thresholds of regional
relevance. This is an attempt to find a compromise between fully local and
global thresholding. Our idea builds on the concept of optical types

Estimate mean seasonal cycles of the data sets under scrutiny at each grid
cell

Reduce the temporal dimensionality of the mean seasonal cycles (MSCs) by
a principal component analysis such that each principal component (PC)
represents a main feature underlying the seasonal cycles. The orthogonal
basis for the PCs can be approximated using a random subset of MSCs,
rendering the approach very efficient in dealing with this very large data
set. Figure

Identify pixels of comparable phenology by binning the scores of the
MSCs on the three leading PCs as illustrated in Fig.

Estimate a characteristic FAPAR anomaly threshold in each bin, considering
all grid cell

The top three principal components of the mean seasonal cycles of FAPAR over Europe visualized as red (R), green (G) and blue (B) channels. The first component accounts for 84 % of the variance, the first two components together explain 95 %, and the first three 97 %. Similar RGB colour combinations indicate comparable mean phenological patterns. These similarities are used to define overlapping regions of comparable phenology. Within each phenological region we estimate suitable and spatially varying thresholds as references for flagging potential extreme reductions in FAPAR.

Map of the regionally varying FAPAR threshold used for detecting extreme events. These thresholds are derived within each subregion as defined by the leading PCs of the mean seasonal cycles. The gradient between central and southern Europe indicates that we may classify an event as extreme in one ecosystem that would be considered part of the normal variability elsewhere; i.e. arid ecosystems have lower thresholds of extremeness in FAPAR compared to humid areas.

The rationale behind this approach is primarily that similar mean seasonal cycles indicate which pixels form a “phenological cluster”, requiring the application of similar quantiles. Additionally, the identification of these clusters based on the leading PCs avoids complications of an analogous analysis in geographical space where regions of similar phenology might be spatially separated by some barrier like a different land cover type, orography, or a body of water.
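The pipeline described above (mean seasonal cycles, PCA fitted on a random subset, equidistant binning of the leading-PC scores) can be sketched as follows. Array shapes, the subset size, and all variable names are our own assumptions for illustration, not the implementation used for this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# msc: one mean seasonal cycle per land pixel, shape (n_pixels, n_timesteps).
msc = rng.random((10_000, 46))

# Fit the orthogonal basis on a random subset of MSCs (cheap for large data).
subset = msc[rng.choice(len(msc), size=2_000, replace=False)]
mean = subset.mean(axis=0)
_, _, vt = np.linalg.svd(subset - mean, full_matrices=False)
components = vt[:3]                                  # three leading PCs

# Project all pixels onto the leading PCs.
scores = (msc - mean) @ components.T                 # (n_pixels, 3)

# Equidistant binning: grain size = 4 % of the range of the first PC.
grain = 0.04 * np.ptp(scores[:, 0])
bins = np.floor((scores - scores.min(axis=0)) / grain).astype(int)
```

Within each resulting bin (and its neighbours in PC space), a low quantile of the FAPAR anomalies then serves as the regional extremeness threshold.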

Based on the regional extreme threshold (Fig.

A critical step of this process is defining the search space around each
voxel for detecting potential neighbour extremes that should be concatenated.
Throughout this paper we consider the direct neighbourhood around a central
voxel as follows:

We define a spatial search space

We also define a temporal search horizon

Conceptual visualization of the presented approach. An extreme occurs over a well-defined spatiotemporal domain (which could be asymmetric as shown here on the latitude–longitude projection). The rank of an extreme can be determined, for example, by the anomaly integrated by the red voxels, or the maximum spatial extent (grey area), or the duration along the time axis, amongst other properties. Black lines indicate the spatial position and active time of three in situ measurement stations. In this example, only one site would have coincided with the extreme and would be considered as a potential basis for exploring the in situ effects of the event.
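Concatenating flagged voxels into coherent events amounts to connected-component labelling over a (time, lat, lon) cube. The sketch below uses `scipy.ndimage.label` with a 3×3×3 structuring element as a stand-in for the search space described above; the flagging rate and cube size are arbitrary assumptions.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
# Voxels below the regional anomaly threshold (binary cube: time, lat, lon).
flagged = rng.random((20, 50, 50)) < 0.02

# 3x3x3 structuring element: direct spatial neighbours plus adjacent time
# steps; a larger search radius would use a bigger structure.
structure = np.ones((3, 3, 3), dtype=bool)
labels, n_events = ndimage.label(flagged, structure=structure)

# Rank events by voxel count; ranking by integrated anomaly would instead
# sum each voxel's FAPAR anomaly within the labelled component.
sizes = ndimage.sum(flagged, labels, index=np.arange(1, n_events + 1))
ranking = np.argsort(sizes)[::-1] + 1   # event labels, largest first
```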

In summary, in this study we used the following settings:

Mean seasonal cycles computed over a time span from 2001 to 2014.

The first three PCs binned using a grain size of 4 % of the range of the first PC.

For each bin in the PC space and its surrounding 26 cells we estimate the
quantile

The search space for detecting extreme events is parameterized with

Comparison
of average detection rates for randomly placed networks of different sizes in
Europe for the period from 2000 to 2014. The colour code shows the network
sizes under consideration, which increase exponentially. Lines show the
average percentage of detected events by

In situ observations typically capture subgrid-level processes or footprints.
For the sake of simplicity, here we assume that each point measurement is
representative of one pixel

To better understand expected extreme event detection rates, we initially
explore random networks and their hypothetical capability to detect extreme
FAPAR reductions. We focus on Europe and vary the network sizes from

Figure

A different view on this phenomenon is offered by Fig.

The results shown in Fig.

This formulation helps explain the parallel decline (linear in log–log) in
the detection probabilities for small extremes: we can rewrite
Eq. (

In other words, the observed detection probabilities for small extremes are higher than expected, whereas detection probabilities of large extremes are lower in random networks compared to theoretical expectations. Our hypothesis is that these discrepancies are related to the spatiotemporal correlation structure of the extreme events, which is not taken into account in the above theoretical analysis.
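Our reading of the probabilistic argument can be made concrete in a few lines (a sketch, not the paper's code): if each of n randomly placed sites independently samples one pixel, an event covering an areal fraction f = a/A is missed with probability (1 − f)^n, so for small f the detection probability is approximately n·f, a straight line of slope 1 in log–log space, which produces the parallel decline.

```python
def detection_probability(f, n):
    """P(at least one of n independent random sites hits an event covering
    an areal fraction f of the domain)."""
    return 1.0 - (1.0 - f) ** n

f = 1e-4                          # a small event: 0.01 % of the domain
for n in (10, 100, 1000):
    p = detection_probability(f, n)
    # For n*f << 1, p is close to n*f: slope 1 in log-log space.
    assert abs(p - n * f) / (n * f) < 0.1
```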

Comparison of the affected area of extremes (continuous lines are a
subset from Fig

In order to investigate the discrepancy revealed in
Fig.

One concern in applying a regional event detection approach was whether key
aspects of extreme event distributions would be affected. Occurrence
probabilities of extreme events in the terrestrial biosphere have often been
reported to follow a power law of the form

Without over-interpreting these patterns

Our results so far show that random networks may differ somewhat from our expected detection rates for various reasons. But the overarching hypothesis is that even relatively small networks may have a good chance of detecting large-scale extreme events. We therefore consider the configuration of real EC networks. We now focus on the US (continental areas only) instead of Europe. We have two networks with very different histories and therefore configurations – AmeriFlux and NEON – and we consider them both individually and combined. Again, we compare our results to random networks of equal size.

The starting point for our considerations was whether ecological in situ
networks have effectively been able to detect the most relevant extreme
events experienced by land ecosystems due to their network construction, or
if these were lucky circumstances. We therefore ranked the 100 largest events
detectable in the continental US by their integrated FAPAR anomalies. We then
counted the number of events that could have been detected by at least one of
the AmeriFlux or NEON towers, or by taking both together (if all towers would
have been active over the entire monitoring period).
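This counting exercise amounts to a spatiotemporal intersection between a labelled event cube and the tower pixels. The sketch below uses entirely hypothetical coordinates and label values, and assumes every tower is active over the whole period and represents a single pixel.

```python
import numpy as np

rng = np.random.default_rng(2)
# Labelled event cube: 0 = no event, 1..100 = the 100 ranked events.
labels = rng.integers(0, 101, size=(20, 50, 50))

# Tower pixel coordinates (row, col) of a hypothetical network.
towers = rng.integers(0, 50, size=(30, 2))

# An event counts as detected if any tower pixel lies inside it at any time.
hit = labels[:, towers[:, 0], towers[:, 1]]        # (time, n_towers)
detected = np.setdiff1d(np.unique(hit), [0])

print(f"{detected.size} of 100 ranked events detected")
```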
Figure

Due to its large network size, AmeriFlux detects many more extremes than NEON
(128 vs. 39 sites in the contiguous US, excluding Alaska and islands).
Concatenating both networks helps increase the detection rates for small
events. Our next question was whether these detection rates are comparable to
random networks of the same size. For the case of NEON we find that the
median detection rate of randomly designed networks is slightly higher
compared to the real network – which still remains above the 2.5th percentile.
At first glance this is an unexpected finding: we would expect that undesired
vicinity may occur by chance in a random network, increasing redundancy among
towers in space compared to the very systematic sampling design of NEON

The equivalent experiment conducted on the AmeriFlux network yields much
higher detection rates for the random networks compared to the established
network (Fig.

Another aspect to investigate in this context is concatenating NEON and
AmeriFlux (both data sets are intended to be freely available to the research
community, Fig.

Comparison of the potential of NEON (39 terrestrial sites) and
AmeriFlux (128 sites) for detecting extremes defined by varying thresholds in
the contiguous continental US (excluding Alaska and islands). The purple
dashed line shows a merged AmeriFlux–NEON network. Dashed lines enveloped by
a 95th percentile range are detection rates of random networks. The sizes of
the random networks correspond to NEON (blue) and AmeriFlux (brown) and
summarize 100 repetitions. We also show the

Reliable event detection algorithms are a prerequisite to addressing the
question of how effective in situ networks are for detecting extreme events
of a given geographical extent. Our aim here is to classify events as
“extreme” if they exceed an anomaly value that is unusual across regions
that follow the same main phenological pattern. This contribution could be
relevant to other studies beyond the present application. This method has
advantages over using a global threshold, which fundamentally changes the
obtained picture and leads to a few hotspots of extremes in regions where the
data have high variability

Regarding the details of the chosen methodological approach, one may question
why we propose simply binning the leading PCs derived from the MSC of our EOs.
This approach was mainly developed to effectively deal with the very high
resolution of the underlying data, seeking a very efficient subgridding
approach. One alternative would have been to cluster the PCs directly.
However, besides the computational costs, conventional clustering methods
lead to a non-uniform partitioning of the space spanned by PCs. This
non-uniform partitioning makes it slightly more complicated to identify
neighbouring clusters, which is necessary to stabilize the quantile-based
computation of anomaly thresholds. Having an equal meshgrid over the PCs that
we can also compute on a subset of MSCs renders the approach very efficient
for very large data sets and is completely data adaptive. It was very
important for this exercise to have many small classes, in order to compute a
very well regionalized anomaly threshold (shown in Fig.

A further consideration regarding our approach is that we rely on a limited number of events detected in a finite time horizon of available satellite data. Monitoring 15 years of extreme events probably does not allow us to conclude anything about the future occurrences of extreme events. In this sense, this study can only be read as a call for (re)considering the density of ecological networks in network design studies. An alternative would be to also consider climate projections and put more emphasis on more “vulnerable” ecoregions. Non-stationary climate and environmental conditions notwithstanding, we have to acknowledge that extremes are too rare to derive a spatial occurrence probability using data from the satellite era only.

To the best of our knowledge, there are only a few realized examples of
systematically designed in situ ecological networks. One of the best examples
is NEON, which is therefore particularly interesting in the context of this
study. The underlying design principle is to cluster environmental conditions
and states, including precipitation, radiation, topography and water
table depth, among others

Our finding that concatenating NEON and AmeriFlux would have yielded only a
minimal increase in detection capacities for extreme events can be understood
as a call to avoid co-locating towers in relatively close vicinities – at
least when the objective of detecting extreme events is highly relevant. In
fact, when the objective is to monitor and understand the impacts of climate
extremes on ecosystems, we show here that probability-theoretical
expectations should be taken into account but would need to be extended to
consider temporal autocorrelation as well as the event detection approaches
chosen. In our case, the latter had a relatively large footprint (

Nevertheless, we think that the remarks presented here could become useful
elements for quantitative network design studies. In our area, earlier
considerations in this direction have put their emphasis on reducing the
uncertainties for upscaling fluxes from the site level to continental or
global flux fields

Overall, this study can also be seen as a prototype. In Appendix B we show that analogous studies can be effectively implemented. There we use the ISMN and detect EO anomalies using a drought indicator. This very brief analysis stresses one additional aspect that we have effectively ignored throughout the main paper: the importance of keeping network measurements alive over time. Many of the sites have only been active for short monitoring periods, leading to substantial losses in event detection rates. It is continuously sustained measurement networks that will substantially improve event detection rates in the long term.

This study tries to understand to what degree ecological in situ networks such as AmeriFlux or NEON can capture extreme events of a given size that affect land ecosystems. We find, for instance, that the 10 largest events that have occurred in the US between 2000 and 2014 would all have been identified with the current networks, offering a good perspective for in-depth site-level analyses of these phenomena. Concretely, this finding means that there is a high chance of capturing major extreme events – beyond the very few (2–3) prominent events that may receive major media coverage, such as the 2003 heatwave in Europe or the 2012 US drought. In general, we find that “large” extreme events could have been detected in a very reliable way, whereas there was a linear decay of detection probabilities for smaller extreme events in log–log space. We can explain this general behaviour with straightforward considerations in probability theory, but the slopes of the decay rates deviate: while we find lower detection rates for the very large extremes, the opposite is the case for very small extremes. Experiments with artificial networks reveal that these deviations stem both from autocorrelation issues and the exact implementation of the detection algorithm.

Our original motivation for pursuing this study was the question of whether one could optimize the design of ecological in situ networks for maximizing the detection rates of extreme events. Indeed, we find some general rules; for example, when the goal is detecting very large events (i.e. low-rank events), network sizes can differ by up to 2 orders of magnitude but still yield nearly comparable detection rates. Only if the goal was to reliably enhance the detection probabilities of small-scale events would a disproportionate “investment” in large networks be required, which would then also become orders of magnitude more efficient compared to the small networks.

However, any inference on the future spatial occurrence probability of extremes is not tenable based on data from a decade of observation. It is not only data paucity that limits our insights here: quantitative network design is per se non-trivial in a changing world. We find, however, that certain general patterns could be taken into consideration, for instance the fact that event occurrence probabilities are clearly inversely related to detection probabilities on a very well defined and robust scale, and that the power law distribution of extreme event size seems to have practical relevance for network design purposes.

The JRC-TIP product based on MODIS collection 5 at 1 km resolution is available upon request from the corresponding author.

In the following we develop a strategy for defining thresholds of regional relevance that are computationally suitable for dealing with high-resolution remote sensing data like the 1 km FAPAR data considered here. Our aim is to find regions of comparable phenology. Our assumption is that the expected seasonal cycle in FAPAR is a good representation of overall phenology and hence ecosystem type.

The first step considers the data set of mean seasonal FAPAR patterns

In the second step, we use principal component analysis (PCA) to reduce this

Common patterns of seasonality are identified by first estimating the

Third, the question is how to identify regions of similar phenology in this
continuous space spanned by the principal components. One could use, for
instance, some clustering algorithm. However, given the high density of
spatial points and the continuous sampling, an equivalent approach is to
choose an equidistant grid in the space of the principal components. We
choose a very dense grid, such that each cell is as wide as 4 % of the
range of the first PC. We then define an FAPAR anomaly threshold as a
predefined quantile based on the distribution of FAPAR values separately for
each grid cell and its 26 neighbours in the space of the leading 3 PCs. This
threshold is assigned to all points in the respective grid cell.
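A minimal sketch of this neighbour-stabilized quantile (with assumed bin indices and synthetic anomalies): the threshold of a PC-space cell is the low quantile over the pooled anomalies of the cell and its 26 neighbours.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(3)
bin_idx = rng.integers(0, 10, size=(5000, 3))   # PC-space cell of each pixel
anoms = rng.normal(size=5000)                   # one FAPAR anomaly per pixel

# Group anomaly values by their PC-space cell.
cells = {}
for idx, a in zip(map(tuple, bin_idx), anoms):
    cells.setdefault(idx, []).append(a)

def threshold(cell, q=0.05):
    """q-quantile over the cell and its 26 neighbours in the 3-D PC grid."""
    pooled = []
    for offset in product((-1, 0, 1), repeat=3):   # 27 cells incl. the centre
        neighbour = tuple(c + o for c, o in zip(cell, offset))
        pooled.extend(cells.get(neighbour, []))
    return np.quantile(pooled, q)

t = threshold((5, 5, 5))
```

Pooling over the 27-cell neighbourhood stabilizes the quantile estimate when individual cells contain few pixels.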
Figure

Illustration of identification of regions with similar threshold: we
define a grid in the space of the leading PCs (geographically shown in
Fig. 1), where each mesh width corresponds to 4 % of the total min–max
range of the first PC. We assign percentile thresholds as calculated over a

We have now proposed an FAPAR threshold for each point and can map this
threshold back to the geographical space by remapping each point to the known
geographical coordinates

Artificial data example.

Artificial data example considering the actual event detection
algorithm.

Figure

Ignoring the time domain: in this case, the empirically identified
detection rates correspond exactly to the theoretical detection
probabilities. This finding reveals that the spatial correlation structure
does not explain a deviation from the theoretically expected pattern (compare
Fig.

Considering spatial and temporal correlations: in this case we find a
tendency towards lower detection probabilities. This effect becomes more
pronounced with larger extremes and spatiotemporal autocorrelation (see
Fig.

However, the approximation of the expected probabilities for the small events
is still inconsistent with our empirical finding (recall
Fig.

Ignoring the time domain: using a large search radius for detecting
extremes (which is clearly necessary in real, e.g. fragmented, landscapes)
leads to increased event detection rates. This effect can lead to higher
detection rates that exceed the simple statistical expectations as derived
from the binomial distribution by several orders of magnitude in the case of
small extremes (see Figs.

Considering the full spatiotemporal case reduces the discrepancy
slightly (i.e. for large events that would be detected anyway), but still
results in an overestimation (see Fig.

These numerical experiments highlight some of the issues that need to be considered in evaluating real networks or in quantitative network design: the phenomena we aim to monitor are highly autocorrelated in time, which leads to considerable edge effects for large events. Therefore, theoretically expected detection rates estimated from the binomial distribution are overly optimistic for large events – unless autocorrelation and the resulting edge effects for large events are analytically taken into account.
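A toy Monte Carlo of our own construction (not from the paper) illustrates the point: two events with the same voxel count but different spatiotemporal shapes expose different spatial footprints to a random network, so an expectation based on event size alone must deviate.

```python
import numpy as np

rng = np.random.default_rng(4)
A = 100 * 100        # pixels in the domain
n = 50               # network size
trials = 5_000

def hit_rate(k):
    """Monte Carlo fraction of random n-site networks with at least one
    site inside a spatial footprint of k pixels (the first k pixel ids)."""
    sites = rng.integers(0, A, size=(trials, n))   # with replacement: a sketch
    return (sites < k).any(axis=1).mean()

# Two events with the same voxel count (1000) but different shapes:
persistent = hit_rate(100)      # 100 pixels persisting over 10 time steps
widespread = hit_rate(1000)     # 1000 pixels for a single time step
```

The widespread short event is detected far more often than the persistent local one, even though both comprise the same number of voxels.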

Average detection rates of extremes of given ranks (each line
represents the rank of an extreme event) across varying network sizes in
logarithmic representation

Comparison of the affected area of extremes
(Fig.

The probability distribution of areas affected by extremes in

The approach for testing a network design for its capacity to detect extremes
is generic by construction. As an additional demonstration we explore the
capacity of the International Soil Moisture Network (ISMN)

Direct observations of soil moisture from satellites are available

Further, a local 10th-percentile threshold is applied to the SPI time series
to flag dry events, with subsequent detection of the large connected events.
The choice of the local threshold is consistent with the typical
meteorological/climatological use of SPI time series. Hence, in contrast to
biophysical applications as presented in the main part of the paper, global
or regional thresholds might not be physically meaningful for evaluating the
local impacts of climate variables. Since meteorological reanalyses typically
operate at much coarser resolution than EO data sets, for the analogous
analysis presented here both the spatial and temporal search space are chosen
to comprise only the spatially and temporally adjacent voxel (i.e.

International Soil Moisture Network and its capacity to detect SPI extremes in Europe. Again, the red line shows the reduction in detection capacity due to inactive towers. Randomly placing observation years in space and time leads to higher detection rates for large extremes and lower rates for small extremes.

To evaluate the ISMN, all station locations and the periods of active data
sampling of each station were used for spatiotemporal intersection with the
SPI extremes in two different setups: firstly, we consider all stations
active only in periods when these stations were collecting data (“dynamic”
network), and secondly, a “static” (counterfactual) situation is taken into
account, where all stations are taken as active throughout the entire
ERA-Interim period. The comparison was restricted to Europe due to data
availability (i.e. most regional networks that form ISMN are operated in
Europe

If we consider the full spatiotemporal intersection we find that only the
first five SPI extremes would have affected areas where the ISMN has stations
(Fig.

International Soil Moisture Network and its capacity to detect SPI
extremes in Europe vs. a random network for the 1980s

An interesting feature of ISMN is that the network has changed its structure
over the last decades to a very large extent. In the 1980s, all station
locations were confined to eastern Europe (Fig.

Number of stations in the International Soil Moisture Network over time compared with the drought-affected area.

The first three authors contributed equally to the analyses presented in this study. JFD helped in deriving the probability-theoretic explanations for the identified patterns. All authors provided substantial input to the design of the study and the discussion of the results.

This study was supported by the European Space Agency with the Support to
Science Element STSE “Coupled Atmosphere Biosphere virtual LABoratory
project CAB-LAB” (see