Detecting impacts of extreme events with ecological in-situ monitoring networks

. Extreme hydrometeorological conditions typically impact ecophysiological processes of terrestrial vegetation. Satellite based observations of the terrestrial biosphere provide an important reference for detecting and describing the spatiotemporal development of such events. However, in-depth investigations of ecological processes during extreme events require additional in-situ observations. The question is if the density of existing ecological in-situ networks is sufﬁcient for analyzing the impact of extreme events, or what are expected event detection rates of ecological in-situ networks of a given size. To 5 assess these issues, we build a baseline of extreme reductions in the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), identiﬁed by a new event detection method tailored to identify extremes of regional relevance. We then investigate the event detection success rates of hypothetical networks of varying sizes. Our results show that large extremes can be reliably detected with relatively small network, but also reveal a linear decay of detection probabilities towards smaller extreme events in log-log space. For instance, networks with ≈ 100 randomly placed sites in Europe yield a ≥ 90% chance of detecting the 10 largest 8 (typically very large) extreme events; but only a ≥ 50% chance of capturing the largest 39 events. These ﬁnding are consistent with probability-theoretic considerations, but the slopes of the decay rates deviate due to temporal autocorrelation issues and the exact implementation of the extreme event detection algorithm. Using the examples of AmeriFlux and NEON, we then investigate to what degree ecological in-situ networks can capture extreme events of a given size. Consistent with our theoretic considerations, we ﬁnd that today’s systematic network designs (i.e. NEON) reliably detects the largest extremes. But 15 the extreme event detection rates are not higher than they would be achieved by randomly designed networks. Spatiotemporal expansions of ecological in-situ monitoring networks should carefully consider the size distribution characteristics of extreme events if the aim is also to monitor their impacts in the terrestrial biosphere.


Introduction
Many lines of evidence point towards an intensification of certain hydrometeorological extreme events, such as hot temperature extremes or droughts in many regions of the world over the next few decades (IPCC, 2012).Consequently, much research focuses on understanding how extreme hydrometeorological events affect ecosystems and their functioning (overviews of the state of research and concepts are given in, for example, Smith, 2011;Reyer et al., 2013;Niu et al., 2014;Frank et al., 2015).For instance, ecosystem responses could be manifested in extreme anomalies of phenology (Ma et al., 2015), biogeochemical fluxes (Frank et al., 2015), or even in altered ecosystem structure due to induced mortality (Hartmann et al., 2015).Global analyses of the geographical extent and integrated anomalies of extremes in the terrestrial biosphere reveal that only a very few extremes affect large areas, whereas most events are only of very local relevance (Reichstein et al., 2013).Nevertheless, the integrated effects of extreme events may have global relevance.For instance, Zscheischler et al. (2014a) showed that extreme anomalies in gross primary production (GPP) to a large extent explain global inter-annual variability in gross carbon uptake.
Earth observations (EOs), especially satellite remotesensing data, encode relevant information on anomalous ecosystem functioning (Pfeifer et al., 2012;McDowell et al., 2015).Examples include the exploration of soil moisture anomalies in tandem with climate patterns to understand anomalous vegetation responses (Nicolai-Shaw et al., 2017), snow-cover-induced albedo anomalies with consequences for local climate (Chen et al., 2017), and the impact of weather extremes on vegetation indices to track anomalies in productivity and explain vector-borne disease outbreaks (Anyamba et al., 2014), among many others.The consistent and contiguous spatiotemporal data coverage and, more importantly, the fact that observations of the land surface typically integrate a plethora of processes make EOs very attractive for detecting extremes affecting the land surface.
Although EOs enable the detection of extremes in the terrestrial biosphere, a deeper understanding of impacts on ecosystem functioning can be gained from combining EOs with in situ observations (Frank et al., 2015;Babst et al., 2017).In fact, ecological in situ networks play an increasingly important role in analysing ecological phenomena and often provide a complementary perspective on natural phenomena to EOs (Nasahara and Nagai, 2015;Papale et al., 2015;Wingate et al., 2015) and complement model analyses (Rammig et al., 2015;Sippel et al., 2017).One prominent example is FLUXNET, with its proven record of advancing our understanding of the functioning of terrestrial ecosystems (Balddocchi, 2014).FLUXNET assembles data on the turbulent land-atmosphere exchanges of CO 2 , H 2 O, and energy via the eddy covariance (EC) technique (Aubinet et al., 2000(Aubinet et al., , 2012) ) as they are collected in regional networks at the country or continent scale (e.g. the pan-European Network Integrated Carbon Observation System (ICOS), AmeriFlux, AsiaFlux).Today, many additional networks are operational or are concatenating data from past campaigns.For instance, the International Soil Moisture Network (ISMN) includes a wide range of soil-moisture observations at different depths (Dorigo et al., 2011(Dorigo et al., , 2013)); phenological observations are collected in EUROPhen (Wingate et al., 2015) or Phenocam (Richardson et al., 2013), and one could easily extend this list.
The site distribution in space of ecological in situ monitoring networks is typically sparse.One obvious and common critique is that networks emerging either as voluntary associations of sites or being constructed on the basis of existing sites (naturally) cannot provide an equitable representation of the world's ecosystems (Schimel et al., 2015).In fact, geographic clustering of sites (Oliphant, 2012) as well as incoherence in their temporal continuity is problematic.However, it has also been shown that the problem of spatiotemporal representation for "upscaling" (sensu Jung et al., 2009;Xiao et al., 2012;Tramontana et al., 2016) is relatively minor compared to the advantages of the sheer size of the network (Papale et al., 2015).
In this paper we aim to understand the potential of ecological in situ networks of varying size for monitoring the impact of extreme events.This paper addresses this issue in three steps.(1) We propose an approach for detecting extremes that are of regional relevance.This step is important to avoid a bias toward considering extremes that take place only in high-variance regions, and may be a relevant contribution beyond our application.(2) We explore a series of random networks of varying sizes to explore the expected detection rates.We aim to understand the observed patterns using probabilistic approaches and formulate a theoretical expectation of detection probabilities of extremes.(3) We then analyse the detection probabilities in two real networks (NEON and AmeriFlux) and compare these to random networks of identical size.The paper concludes with an outlook on how our remarks could lead to improvements in network design that could be implemented to improve the detection of extreme events.

Earth observations
We required a catalogue of extreme events experienced by terrestrial ecosystems in the past several years to analyse the suitability of in situ networks for detecting them.To create such a catalogue of extreme impacts, we used extreme negative anomalies of the fraction of absorbed photosynthetically active radiation, FAPAR.These values are a dimensionless spatiotemporal indicator of how much solar radiation energy (in the PAR domain) is effectively absorbed by vegetation, i.e. converted by photosynthesis (Pinty et al., 2009;McCallum et al., 2010).
FAPAR is considered an "essential climate variable" (ECV) (Global Terrestrial Observing System, 2008) because it supports a large variety of studies on the states and variability of the biosphere (e.g.Knorr et al., 2007;Verstraete et al., 2008) and plays an increasingly important role in the investigation of global biogeochemical cycles (in particular carbon and water fluxes).For instance, FAPAR can be conceptually related to GPP (typically estimated from EC tower measurements).This relationship is of the general form GPP = ε × FAPAR × PAR, where ε is some "light use efficiency" and PAR is the "photosynthetically active radiation" (e.g.Monteith, 1977); one may also include other limiting factors.Consequently, FAPAR is an important basis for empirical estimates of GPP (Jung et al., 2008;Beer et al., 2010;Tramontana et al., 2016) and other relevant ecosystem-atmosphere fluxes (e.g.evapotranspiration, ET; Jung et al., 2010) or is directly used as input to diagnostic biosphere models (Seixas et al., 2009;Carvalhais et al., 2010).Given the tight link between FAPAR and land-surface fluxes, this variable has been used in various studies as a reference for monitoring extremes affecting terrestrial ecosystems (Zscheischler et al., 2013;Reichstein et al., 2013).
The temporal variability of FAPAR is influenced by vegetation development, but likewise encodes, e.g.fire events and other extreme reductions of FAPAR that are assumed to have a pronounced effect on GPP.Here we use FAPAR data derived by the JRC-TIP approach (TIP-FAPAR; Pinty et al., 2011).These estimates are based on the MODIS broadband visible and near-infrared surface albedo products from NASA Collection 5 at 1 km spatial resolution (MCD43B.005;Schaaf et al., 2002, available on demand from co-author Thomas Kaminski).These satellite data cover the entire surface every 16 days and the data range from 2000 to 2014; in this study we use data covering Europe and the contiguous US (excluding Alaska).In the following we denote this data set as a 3-D data cube where u is the index across the U grid longitudes, v the corresponding index on V latitudes and t is the index on the T time steps.Each element x uvt is called a voxel and is characterized by a well-defined space-time volume.

In situ networks
First, we create artificial random in situ networks in order to systematically study the effects of varying network sizes and as a reference for the analysis of existing networks.Then we analyse existing or recently established in situ networks for their capability to detect the impacts of extreme events.
We use the geographical locations of EC flux tower networks but to the actual measurements.Our main target is FLUXNET, a global collection of EC data collected (www.fluxdata.org;for in-depth descriptions see Baldocchi, 2008;Balddocchi, 2014).FLUXNET is a bottom-up initiative of regional networks which decided to bring their data to a central repository.Hence, there is no systematic sampling design, resulting in unbalanced spatial coverage biased towards central Europe and the contiguous US (Papale et al., 2015).In the US, FLUXNET is mainly composed of the regional network AmeriFlux https://ameriflux.lbl.gov/ and we use the geographical coordinates of their towers.In Europe, an overview of the most widely used EC can be found in the European Fluxes database http://www.europe-fluxdata.eu, which will be partly maintained in the future by ICOS https://www.icos-cp.eu.Here, we rely on the site distribution described in the LaThuile data set (Balddocchi, 2014).
The National Ecological Observatory Network (NEON; http://www.neoninc.org/;Keller et al., 2008) is an initiative to monitor ecosystems of the United States and was constructed using a systematic sampling design chosen to equitably represent the dominant ecoregions across the US.Comparable to AmeriFlux, NEON sites are equipped with EC towers, but also a large suite of additional instrumentation (SanClements et al., 2015), and human-based observations are recorded frequently (Kao et al., 2012).We also use the site coordinates of NEON to compare these with AmeriFlux in the US.

Regional extreme event flagging
The question of how to define extreme events in spatiotemporal data cubes is key to the evaluation of the suitability of ecological in situ networks.One approach would be to define some global threshold and identify values exceeding this threshold as potential extremes ("peak over threshold").Choosing a global threshold setting is suitable when the question is about how extremes add up to global anomalies (Zscheischler et al., 2014a), i.e. when one is working with extensive data properties where the target is the integral over space and time.However, the consequence of setting a global threshold is that values that are flagged as potential extremes will occur exclusively in high-variance regions, whereas low-variance regions will apparently never experience extreme events.An alternative would be using only highly local thresholds (defined over time at each spatial point x uv ).However, the latter approach would necessarily lead to an equal spatial distribution of extreme event occurrences, which is also not desirable.We want to define extremes relative to regions that are characterized by a similar ecophysiology; i.e. we want to compare each grid cell with other grid cells that have a comparable phenology and search for extremes across these geographical locations.However, as our approach should be entirely data driven, we refrain from using precomputed definitions of ecoregions.
In the following we develop a strategy to define thresholds of regional relevance.This is an attempt to find a compromise between fully local and global thresholding.Our idea builds on the concept of optical types (Ustin and Gamon, 2010), as they have been concretely elaborated for EOs by Huesca et al. (2015).The key idea offered by them is that similar autocorrelation functions allow us to classify ecosystems according to their temporal dynamics (see also Houborg et al., 2015).Huesca et al. (2015) use the leading principal components of the autocorrelation estimated at each pixel across time lags.We have developed a similar scheme to identify regions in the EOs that are of similar dynamics, but we use mean seasonal cycles instead of the autocorrelation patterns.The rationale of our choice is that we want to also maintain differences in amplitude and phasing.The main steps applied for obtaining a regional threshold are the following (for a full description of the regional event detection method, see Appendix A): 1. Estimate mean seasonal cycles of the data sets under scrutiny at each grid cell u, v.The mean seasonal cycles are centred around a mean of zero.
2. Reduce the temporal dimensionality of the mean seasonal cycles (MSCs) by a principal component analysis such that each principal component (PC) represents a main feature underlying the seasonal cycles.The orthogonal basis for the PCs can be approximated using a random subset of MSCs, rendering the approach very efficient in dealing with this very large data set.Figure 1 shows the first three PCs as an RGB image map for Europe.Although the nonlinearity of colour perception by the human eye limits the quantitative informative value of the map, similar colours still represent regions of similar phenological dynamics in FAPAR, so one can gain an impression of environmental heterogeneity in the investigated area.
3. Identify pixels of comparable phenology by binning the scores of the MSCs on the three leading PCs as illustrated in Fig. A1 into bins of equal size.Note that the bins are very small compared to the length of the PC, guaranteeing a very fine binning.
4. Estimate a characteristic FAPAR anomaly threshold in each bin, considering all grid cell u, v belonging to this bin and grid cell u, v in the adjacent bins.Note that in the case of binning the leading three PCs, we have all grid cells u, v in 27 bins to estimate an FAPAR anomaly threshold as a quantile of the anomalies.Figure 2 illustrates the resulting regional threshold of FAPAR anomalies.In southern European ecosystems, smaller negative anomalies of FAPAR (i.e. higher values in Fig. 2) would be used to flag values as potential extremes.The overall geographical pattern suggests that low-variance regions (i.e.arid ecosystems) typically require smaller deviations from the expected variability to be considered abnormal situations.The rationale behind this approach is primarily that similar mean seasonal cycles indicate which pixels form a "phenological cluster", requiring the application of similar quan-tiles.Additionally, the identification of these clusters based on the leading PCs avoids complications of an analogous analysis in geographical space where regions of similar phenology might be spatially separated by some barrier like a different land cover type, orography, or a body of water.

Contiguous spatiotemporal extremes
Based on the regional extreme threshold (Fig. 2) one may flag individual events as potential ("candidate") extremes.However, these initially flagged values may likewise reflect observational noise.Zscheischler et al. (2013) therefore proposed only considering events as extremes if larger geographical areas are synchronously affected or if the extreme persists over some temporal threshold (a very similar idea was proposed in the context of monitoring droughts by Lloyd-Hughes, 2012).This idea is realized by identifying clusters in the data cube where the spatial or temporal voxel neighbours are likewise flagged as potential ("candidate") extremes.Each of these clusters is subsequently considered a singular event; for a conceptual illustration, see Fig. 3.
A critical step of this process is defining the search space around each voxel for detecting potential neighbour extremes that should be concatenated.Throughout this paper we consider the direct neighbourhood around a central voxel as follows: -We define a spatial search space z.Two voxels x uvt and |v − v | ≤ z to obtain a spatial connectivity structure for a given t.
-We also define a temporal search horizon τ from the central voxel to compare x uvt and Visually speaking, we search a square in space and a short line structure in time centred on a locally detected extreme event.Note that a wide range of alternative spatiotemporal connectivity structures could be used, for instance emphasizing the temporal dimension by extending the search space along the t axis.Our choices of z = 5 (corresponding to 25 km) and τ = 1 (16 days) are adjusted ad hoc to the specific properties of the TIP-FAPAR data with its relatively high spatial resolution.By setting z = 5 we guarantee that, for example, similar vegetation types (from which we would assume a similar responsiveness to some extreme event) could be concatenated to one extreme, even if these vegetation types are spatially fragmented due to a mosaic of land cover types.In time we search only starting from the central voxel, but given that we do this at each v, u combination, relatively complex spatiotemporal structures are allowed.Each event may consist of a set of voxels with characteristic geometric properties such as the event average or maximum duration across all affected grid cells, or the maximum areal extent.Another interesting property is the average duration of an extreme per affected grid cell.Another way of looking at these events is to integrate the variable anomaly over the voxels affected by an event, and one could also define additional metrics.

Specific setting for this study
In summary, in this study we used the following settings: -Mean seasonal cycles computed over a time span from 2001 to 2014.
-The first three PCs binned using a grain size of 4 % of the range of the first PC.
-For each bin in the PC space and its surrounding 26 cells we estimate the quantile = 0.025.The FAPAR-anomaly values corresponding to this quantile are assigned as the threshold for the grid cells corresponding to this central bin.
-The search space for detecting extreme events is parameterized with z = 5 and τ = 1 corresponding here to a search space of ±5 km and ±16 days.

Coinciding in situ observations and 3-D extremes
In situ observations typically capture subgrid-level processes or footprints.For the sake of simplicity, here we assume that each point measurement is representative of one pixel x uv [1 km 2 ] and we intersect geographical positions u and v of the in situ data with the occurrences of 3-D extremes.This approach allows us to answer the hypothetical question of whether a certain observation site would have detected an extreme in the past.An intersection considering the time domain as well would allow us to understand if an extreme had a chance of being effectively observed.Along these lines, we can also investigate whether random placement of towers would have improved or deteriorated the capability to detect extreme events.

Random networks
To better understand expected extreme event detection rates, we initially explore random networks and their hypothetical capability to detect extreme FAPAR reductions.We focus on Europe and vary the network sizes from n = 5, . .., 10 000 sites on a logarithmic scale, asking how many of the detected extremes can be identified for each size class.More precisely, we investigate the probability that an extreme event of a given size m (measured in terms of affected area) will be detected by n hypothetical towers P (m, n).All following analyses are based on repeating the tower placement 100 times per size class.We mimic real site placement by assuming that a tower is not mobile -i.e. it remains active at a given location over the entire period covered by the FAPAR observations.Figure 4 shows the average detection success rates for the random networks.The ranks r shown in Fig. 4a are derived here from the integrated spatiotemporal FAPAR anomalies (i.e. the total impact); the latter are displayed in Fig. 4b.Across network sizes we find that empirical event detection probabilities increase with event impact.These increases typically follow a straight line in the log-log plot (power-lawlike behaviour) for small extremes and then level off for very large event sizes.To better illustrate this pattern, we selected the network of size n = 103 and display it as black lines in Fig. 4.This specific network size has a P ≥ 90 % chance of detecting the eight largest extreme events (according to the ranks of integrated FAPAR anomaly; see Fig. 4a).This success rate declines rapidly for smaller events; for example, we have only a ≥ 50 % chance of capturing the r = 39th largest event.An analogous pattern is found for the detection probabilities assessed in terms of spatial extents (Fig. 4c).In contrast, investigating the event durations (Fig. 4d) did not reveal such a clear pattern, which could be explained by the fact that we are dealing with a relatively short time series, in which only a few discrete duration classes can be recognized.The fact that global impacts of extreme events in the terrestrial biosphere behave similarly to those at smaller spatial extents is expected because these properties are known to be strongly correlated as shown in, for example, Reichstein et al. (2013).This study also reported that the duration of extreme events is less strongly correlated with their impact, as we would also suspect from Fig. 4.
A different view on this phenomenon is offered by Fig. C1 showing the detection likelihood for extremes of a given rank r across varying network sizes.Extremes of low rank (i.e.large in impact) need very small networks to be detected with rates near to 100 %, whereas high-rank events (of small impact) need much larger networks to reach similar detection rates.The detection probability scales linearly in loglog space with network size, indicating that one would need to inflate in situ networks by orders of magnitude in order to detect small-scale events at comparable rates to large-scale extremes.

Statistical considerations
The results shown in Fig. 4c are an empirical approach to describe the detection probability of extremes characterized by a given spatial extent m (measured, for example, in terms of the number of pixels or area affected during an event) using a network constructed with n randomly placed towers.In other terms, this figure reports the probability P (m, n) that at least one tower detects the extreme and a single extreme event of spatial extent m is detected by a single randomly placed tower with probability where m max is the maximum possible extent m (in our case the maximally affected area across all time steps).However, an equivalent question is the probability that one extreme is not detected by any of the n towers.According to the binomial distribution, the latter probability is (1 − p) n , and our estimated probabilities should be described by This formulation helps explain the parallel decline (linear in log-log) in the detection probabilities for small extremes: we can rewrite Eq. ( 2) as A Taylor expansion of Eq. ( 3) for a small number of towers n and small event sizes m/m max (here realized by assuming that |n ln(1 − m m max )| 1) yields Further adjusting this formula for small extremes with which, in a logarithmic form, reads We expect that this equation explains the empirically identified parallel lines of positive slope in Fig. 4 and compare our empirical findings to this theoretical expectation.Figure 5 compares the expected and observed detection probabilities.The levelling-off of event detection probabilities for large events is indeed theoretically expected, but the loglinear scaling for small events is expected to be steeper sensu Eq. (2).Our empirical detection probability is lower than the theoretical expected ones for large extremes and higher for small extremes.However, the overall pattern of the expected detection probabilities is well captured by the theoretical expectation.
In other words, the observed detection probabilities for small extremes are higher than expected, whereas detection probabilities of large extremes are lower in random networks compared to theoretical expectations.Our hypothesis is that these discrepancies are related to the spatiotemporal correlation structure of the extreme events, which is not taken into account in the above theoretical analysis.
In order to investigate the discrepancy revealed in Fig. 5, we performed a series of simulations using artificial data that are characterized by varying spatiotemporal correlation structures, and compared these to the expected detection rates.The results of these experiments are reported in Appendix B. There are very few effectively independent observations because the extremes are highly autocorrelated in time.Hence, these strong correlations lead to the fact that the largest spatiotemporal extremes tend to occur at some distance from the boundary of the domain (i.e. from the coasts).Because the networks are randomly placed, i.e. without regard to the differentiated occurrence probabilities of large vs. small extremes, this leads to the observed underestimation of detection probabilities for large extremes.A simple thought experiment can intuitively explain this effect: imagine a landscape that consists of a contiguous, relatively large mainland (e.g.Europe) and a number of islands or otherwise disconnected regions (e.g.Great Britain, Ireland, Sicily) that are all far enough from the mainland that spatiotemporal extremes can by definition not be connected, i.e. exceeding the search space z.In addition, imagine that the few largest extremes that affect the mainland exceed the size of any of the islands.In this case, any tower randomly placed on an island cannot contribute to detecting large extremes, which intuitively illustrates why not taking into account the effects of autocorrelation and edge effects in our analysis results in overly optimistic theoretical predictions of detection rates based on the binomial distribution for real-world landscapes.Contrarily, for medium-sized and small events, the chosen spatial search space of z = 5 leads to an overestimation of detection probabilities in the real data as compared to the theoretical predictions.Nonetheless, the theoretical predictions provide an exact expectation under simplified settings (i.e.no boundary effects, and an event search only in directly adjacent grid cells (z = 1); see Appendix B) and are thus useful for illustrating and understanding the almost linear scaling of detection rates and the size of extremes in log-log space.

Scaling issues
One doubt in applying a regional event detection approach was whether key aspects of extreme event distributions would be affected.Occurrence probabilities of extreme events in the terrestrial biosphere have often been reported to follow a power law of the form p(m) ∝ m −α in the tails, i.e. for some values ≥ m min (see Reichstein et al., 2013;Zscheischler et al., 2014a, for scaling examples in FAPAR and gross primary production respectively).Using a maximum likelihood estimator as suggested by Clauset et al. (2009) and Clauset and Woodbard (2013), we analyse the scaling characteristics of contiguous areas affected by extreme events.We find that the event properties follow a power law (see Fig. C3).The probabilities of areas affected by extremes in both areas decline with α = 1.85±0.007(uncertainties given as standard errors from 1000 bootstrap samples).
Without over-interpreting these patterns (i.e.many processes could lead to the emergence of these power laws, some of which are discussed in Zscheischler et al., 2014b) we consider that this property could be exploited to inform network design issues.According to Newman (2005) and others, there are a few considerations pointing in this direction: the expectation value E[m(r)] of an extreme event of rank r (in this formulation, the largest event has rank 1 as in Fig. 4a) has the form where α is the scaling exponent, and c is some normalization constant -both can be obtained from a fit to the empirically obtained rank function m(r).Applying Eq. ( 7) would allow us to study the network detection probability as a function of rank (see Figs. 4a and C1) and we can insert the expressions into Eq.( 2): Furthermore, using the approximated log-log form of the network detection probability (Eq.7) yields This equation may explain the parallel lines for ranks r corresponding to small extreme event extents m(r) (see e.g.Fig. C1).More importantly, it relates the scaling exponent to the expected detection probabilities.In other words, gaining insights about the scaling behaviour of the extremes can be used to formulate clear expectations about event detection probabilities of a given rank and size.

Comparing AmeriFlux and NEON
Our results so far show that random networks may differ somewhat from our expected detection rates for various reasons.But the overarching hypothesis is that even relatively small networks may have a good chance of detecting largescale extreme events.We therefore consider the configuration of real EC networks.We now focus on the US (continental areas only) instead of Europe.We have two networks with very different histories and therefore configuration -AmeriFlux and NEON -and we consider them both together.Again, we compare our results to random networks of equal size.The starting point for our considerations was whether ecological in situ networks have effectively been able to detect the most relevant extreme events experienced by land ecosystems due to their network construction, or if these were lucky circumstances.We therefore ranked the 100 largest events detectable in the continental US by their integrated FAPAR anomalies.We then counted the number of events that could have been detected by at least one of the AmeriFlux or NEON towers, or by taking both together (if all towers would have been active over the entire monitoring period).Figure 6 shows the number of detected events for these three network configurations (of NEON, AmeriFlux, and both together) as a function of their rank.
Due to its large network size, AmeriFlux detects many more extremes than NEON (128 vs. 39 sites in the contiguous US, excluding Alaska and islands).Concatenating both networks helps increase the detection rates for small events.Our next question was whether these detection rates are comparable to random networks of the same size.For the case of NEON we find that the median detection rate of randomly designed networks is slightly higher compared to the real network -which still remains above the 2.5 percentile.At first glance this is an unexpected finding: we would expect that undesired vicinity may occur by chance in a random network, increasing redundancy among towers in space compared to the very systematic sampling design of NEON (Keller et al., 2008).We conclude here that while the design efforts used in establishing NEON may pay off for certain studies, they are not an effective means to maximize the detection of extremes.This observation again reflects the lack of spatial regularity in the occurrence of extremes.
The equivalent experiment conducted on the AmeriFlux network yields much higher detection rates for the random networks compared to the established network (Fig. 6).We attribute this difference to one particular characteristic of AmeriFlux: many of the sites in this network are colocated on purpose (e.g. to explore spatial heterogeneity or to monitor different disturbance regimes in adjacent and hence climatologically similar ecosystems).Figure 6 shows that AmeriFlux sites have a relatively high degree of spatial clustering.If the target were to analyse continental extreme events and guarantee monitoring the largest events, the AmeriFlux configuration would be suboptimal.In other words, the spatial autocorrelation in an ecological in situ network that was not systematically designed can be outperformed by a random (and hence spatially independent) network.
Another aspect to investigate in this context is concatenating NEON and AmeriFlux (both data sets are intended to be freely available to the research community, Fig. 6 dashed line).Our results show that this approach would marginally increase the detection capacity.One reason for this marginal improvement is again that AmeriFlux and NEON sites are partly geographically co-located and that AmeriFlux -despite being a bottom-up activity -already has a significant spread across the country that is competitive with a novel network designed for the purpose of capturing large-scale extremes.

Regionalized event detection
Reliable event detection algorithms are a prerequisite to addressing the question of how effective in situ networks are for detecting extreme events of a given geographical extent.Our aim here is to classify events as "extreme" if they exceed an anomaly value that is unusual across regions that follow the same main phenological pattern.This contribution could be relevant to other studies beyond the present application.This method has advantages over using a global threshold, which fundamentally changes the obtained picture and leads to a few hotspots of extremes in regions where the data have high variability (for the case of GPP, see Zscheischler et al., 2014b).The effect of building on regional thresholds to delineate which anomalies should be considered "extreme" (recall Fig. 2) is that we find only very moderate geographi-Figure 6.Comparison of the potential of NEON (39 terrestrial sites) and AmeriFlux (128 sites) for detecting extremes defined by varying thresholds in the contiguous continental US (excluding Alaska and islands).The purple dashed line shows a merged AmeriFlux-NEON network.Dashed lines enveloped by a 95th percentile range are detection rates of random networks.The sizes of the random networks correspond to NEON (blue) and AmeriFlux (brown) and summarize 100 repetitions.We also show the 1 : 1 line, which would correspond to perfect detection performance and is the theoretical limit.cal clustering of event occurrences (not shown).From our viewpoint, this is very logical, as there is no reason why relative extremes should preferentially happen in certain regions.Methods of this kind are particularly relevant in times of increasing availability of EOs to detect impacts rather than referring to anomalous observations in the meteorological records, which may or may not affect terrestrial ecosystems.In fact, all of the largest extreme events that have had severe impacts on agriculture and human well-being and attracted the attention of the media are well detectable with our approach.Prominent examples are the 2003 European heat wave (e.g.Ciais et al., 2005), the 2010 Russian heat wave (e.g.Bastos et al., 2014), or the 2012 US drought (e.g.Schwalm et al., 2012), which are all easily detectable both from climate records and remote sensing data.However, the smaller the spatial extents become, the more relevant a remote-sensing-based regional assessment will be.We also expect that a regionalization of this kind could be useful when using more advanced multivariate event detection algorithms (see e.g.Flach et al., 2017) that can tap into the full potential of many EOs.
www.biogeosciences.net/14/4255/2017/Biogeosciences, 14, 4255-4277, 2017 Regarding the details of the chosen methodological approach, one may question why we propose simply binning the leading PCs derived from the MSC of our EOs.This approach was mainly developed to effectively deal with the very high resolution of the underlying data, seeking a very efficient subgridding approach.One alternative would have been to cluster the PCs directly.However, besides the computational costs, conventional clustering methods lead to a non-uniform partitioning of the space spanned by PCs.This non-uniform partitioning makes it slightly more complicated to identify neighbouring clusters, which is necessary to stabilize the quantile-based computation of anomaly thresholds.Having an equal meshgrid over the PCs that we can also compute on a subset of MSCs renders the approach very efficient for very large data sets and is completely data adaptive.It was very important for this exercise to have many small classes, in order to compute a very well regionalized anomaly threshold (shown in Fig. 2), which would not have been achievable using classical climate classifications of ecoregions.A more detailed follow-up study should explore the question of how the choice of the various parameters affects the event detection accuracies.A crucial question in this context will be whether one can tune these parameters such that a baseline of events is well detected.
A further argument in favour of our approach was that we rely on a limited number of events detected in a finite time horizon of available satellite data.Monitoring 15 years of extreme events probably does not allow us to conclude anything about the future occurrences of extreme events.In this sense, this study can only be read as a call for (re)considering the density of ecological networks in network design studies.An alternative would be to also consider climate projections and put more emphasis on more "vulnerable" ecoregions.Non-stationary climate and environmental conditions notwithstanding, we have to acknowledge that extremes are too rare to derive a spatial occurrence probability using data from the satellite era only.

Relevance for network design
To the best of our knowledge, there are only a few realized examples of systematically designed in situ ecological networks.One of the best examples is NEON, which is therefore particularly interesting in the context of this study.The underlying design principle is to cluster environmental conditions and states, including precipitation, radiation, topography and water table depth, among others (Hargrove and Hoffman, 2004).These delineated ecoregions are taken to be representative of approximately homogeneous areas in the mean land-climate system state, and yield an equitable representation of land-surface processes in upscaling activities (e.g. the spatiotemporal inter-and extrapolation of land-atmosphere fluxes of CO 2 , H 2 O and others; Jung et al., 2011;Xiao et al., 2012;Papale et al., 2015) or model-data integration studies (sensu Williams et al., 2009).
Our finding that concatenating NEON and AmeriFlux would have yielded only a minimal increase in detection capacities for extreme events can be understood as a call to avoid co-locating towers in relatively close vicinities -at least when the objective of detecting extreme events is highly relevant.In fact, when the objective is to monitor and understand the impacts of climate extremes on ecosystems, we show here that probability-theoretical expectations should be taken into account but would need to be extended to consider temporal autocorrelation as well as the event detection approaches chosen.In our case, the latter had a relatively large footprint (z = 5) in order to not miss events that may appear fragmented due to, for example, heterogeneous landscape characteristics.Clearly, one would need to determine such parametric choices depending on the type of extreme events and underlying question.
Nevertheless, we think that the remarks presented here could become useful elements for quantitative network design studies.In our area, earlier considerations in this direction have put their emphasis on reducing the uncertainties for upscaling fluxes from the site level to continental or global flux fields (Papale et al., 2015).Focusing on this first-order question is of course essential, before focusing on detecting rare anomalies.This is also reflected in the alternative methodological avenues that were used for addressing the network design problem.For instance, carbon cycle data assimilation systems (CCDAS; Rayner et al., 2005) were very useful for quantitative network design (QND; see e.g.Kaminski et al., 2010;Kaminski and Raynner, 2017), i.e. to evaluate real or hypothetical candidate networks in terms of their ability to constrain target quantities of interest.The quantitative network design (QND) approach within a Carbon Cycle Data Assimilation Scheme (CCDAS) allows us to combine terrestrial, atmospheric and ultimately also oceanic data streams.A key finding so far was that EC networks with one site per ecosystem type achieve excellent performance.QND studies have also been performed for EO data streams such as column-integrated atmospheric CO 2 (Kaminski et al., 2010;Kaminski and Raynner, 2017).But again, none of these studies so far have attempted to unravel the impacts of extreme events on the terrestrial biosphere, which might be a relevant pursuit for subsequent studies.
Overall, this study can be also seen as a prototype.In Appendix B we show that analogous studies can be effectively implemented.There we use the ISMN and detect EO anomalies using a drought indicator.This very brief analysis stresses one additional aspect that we have effectively ignored through the main paper: the importance of keeping network measurements alive over time.Many of the sites have only been active for short monitoring periods, leading to substantial losses in event detection rates.It is the continuously sustained measurement networks that will substantially improve event detection rates in the long term.

Conclusions
This study tries to understand to what degree ecological in situ networks such as AmeriFlux or NEON can capture extreme events of a given size that affect land ecosystems.We find, for instance, that the 10 largest that have occurred in the US between 2000 and 2014 would all have been identified with the current networks, offering a good perspective for in-depth site-level analyses of these phenomena.Concretely, this finding means that there is a high chance of capturing major extreme events -beyond the very few (2-3) prominent events that may receive major media coverage such as the 2003 heatwave in Europe or the 2012 US drought.In general, we find that "large" extreme events could have been detected in a very reliable way, whereas there was a linear decay of detection probabilities for smaller extreme events in log-log space.We can explain this general behaviour with straightforward considerations in probability theory, but the slopes of the decay rates deviate: while we find lower detection rates for the very large extremes, the opposite is the case for very small extremes.Experiments with artificial networks reveal that these deviations stem both from autocorrelation issues and the exact implementation of the detection algorithm.
Our original motivation for pursuing this study was the question of whether one could optimize the design of ecological in situ networks for maximizing the detection rates of extreme events.Indeed, we find some general rules; for example, when the goal is detecting very large events (i.e.lowrank events), network sizes can differ by up to 2 orders of magnitude but still yield nearly comparable detection rates.Only if the goal was to reliably enhance the detection probabilities of small-scale events would a disproportionate "investment" in large networks be required, which would then also become orders of magnitude more efficient compared to the small networks.
However, any inference on the future spatial occurrence probability of extremes is not tenable based on data from a decade of observation.It is not only data paucity that limits our insights here: quantitative network design is per se non-trivial in a changing world.We find, however, that certain general patterns could be taken into consideration, for instance the fact that event occurrence probabilities are clearly inversely related to detection probabilities on a very well defined and robust scale, and that the power law distribution of extreme event size seems to have practical relevance for network design purposes.
where E k the kth eigenvector of length S, and λ k the corresponding eigenvalue.The scores of the kth principal component are given by and k leading A k can be interpreted as a proxy for the characteristic patterns underlying the mean seasonal cycles across space.Figure 1 visualizes the three leading principal components as an RGB-colour composite, revealing a distinct map of European phenological regions.Third, the question is how to identify regions of similar phenology in this continuous space spanned by the principal components.One could use, for instance, some clustering algorithm.However, given the high density of spatial points and the continuous sampling, an equivalent approach is to choose an equidistant grid in the space of the principal components.We choose a very dense grid, such that each cell is as wide as 4 % of the range of the first PC.We then define an FAPAR anomaly threshold as a predefined quantile based on the distribution of FAPAR values separately for each grid cell and its 26 neighbours in the space of the leading 3 PCs.This threshold is assigned to all points in the respective grid cell represented herein.This threshold is assigned to the all points represented therein.Figure A1 illustrates this approach in detail.
We have now proposed a FAPAR threshold for each point and can map this threshold back to the geographical space by remapping each point to the known geographical coordinates u, v.This is shown in Fig. 2.  1), where each mesh width corresponds to 4 % of the total min-max range of the first PC.We assign percentile thresholds as calculated over a 3×3×3 set of mesh elements (shown in orange) and assign these percentiles to the central dots (shown in red).For the sake of clarity, we illustrate the approach only in the space of the leading two PCs.

Appendix B: Spatiotemporal correlations
Figure 5 reveals a strong discrepancy between theoretical and observed detection probability.Here we investigate this discrepancy further.We generated Gaussian data but introduced varying spatiotemporal correlation structures of different degrees.We followed the approach suggested by Venema et al. (2006a, b) to simulate data with a power law power spectrum of some prescribed exponential spectral decay.The method combines an approach for generating spatial fields of a desired correlation structure that likewise have a similar temporal correlation.The idea is that the Fourier coefficients of some artificial data (white noise) are forced to decay as a power law function across frequencies i.e. proportionally to f −β .An inverse transformation to space yields a correlated data field.If we choose β = 0, it corresponds to uncorrelated, β = − 3 5 to moderately correlated, and β = − 8 5 to highly correlated data.These artificial data sets are visualized in Fig. B1g-i.We used a simplified event search radius (z = 1, τ = 1) and investigate two cases: 1. Ignoring the time domain: in this case, the empirically identified detection rates correspond exactly to the theoretical detection probabilities.This finding reveals that the spatial correlation structure does not explain a deviation from the theoretically expected pattern (compare Fig. B1a-c).This is explained by the fact that, although patterns of extreme anomalies might be correlated in space, the tower placement is still random and for sufficiently sparse networks and relatively contiguous landscapes (i.e.only small edges, no islands, etc.) it has no effect.
2. Considering spatial and temporal correlations: in this case we find a tendency towards lower detection probabilities.This effect becomes more pronounced with larger extremes and spatiotemporal autocorrelation (see Fig. B1d-f) due to a stronger tendency for large spatiotemporal extremes to occur away from the domain's boundaries; thus any tower that is randomly placed close to a boundary would have a disproportionately low chance of detecting large extremes.
However, the approximation of the expected probabilities for the small events is still inconsistent with our empirical finding (recall Fig. 5).Hence, we repeat the artificial experiment using the exact algorithmic settings applied to the FA-PAR data: we allow for a tolerance radius (z 1, τ = 1) to identify each extreme by a given tower.Again we distinguish the two cases: 1. Ignoring the time domain: using a large search radius for detecting extremes (which is clearly necessary in real and e.g fragmented landscapes) leads to increased event detection rates.This effect can lead to higher detection rates that exceed the simple statistical expectations as derived from the binomial distribution by sev- In this case, the empirically identified detection rates dramatically overestimate the theoretical detection probabilities.If we induce moderate spatiotemporal correlations in (b), and stronger ones in (c), we still find this pattern, but it is less pronounced for the very large events.This shows that having a large footprint for the event detection algorithm leads to an overestimation of the detection rates of small extremes.If the detection rates over space and time are considered, however, the events are no longer independent due to their temporal autocorrelation.Parts (e) and (f) reveal lower deviations from the expected detection rates, which is a compensating effect of the autocorrelation and event detection method setting.The data corresponding to results in the columns are shown in (g), (f), and (i).
eral orders of magnitude in the case of small extremes (see Figs. B2a-c).
2. Considering the full spatiotemporal case reduces the discrepancy slightly (i.e. for large events that would be detected anyway), but still results in an overestimation (see Fig. B2d-f).For very large events, the lines may even cross in the case of strongly autocorrelated data.
These numerical experiments highlight some of the issues that need to be considered in evaluating real networks or quantitative network-design: the phenomena we aim to monitor are highly autocorrelated in time, which leads to consid-erable edge effects for large events.Therefore, theoretically expected detection rates estimated from the binomial distribution are overly optimistic for large events -unless the effects of autocorrelation and edge effects as a consequence for large events are analytically taken into account.The approach for testing a network design for its capacity to detect extremes is generic by construction.As an additional demonstration we explore the capacity of the International Soil Moisture Network (ISMN) (http://ismn.geo.tuwien.ac.at/; Dorigo et al., 2011), a steadily growing initiative that comprises collections of soil moisture only.Comparable to FLUXNET there is no specific funding for measurement campaigns, and ISMN crucially depends on the contributions of historical observations by the respective communities.

D1.1 Methodology
Direct observations of soil moisture from satellites are available (Liu et al., 2011), but these data still suffer from concatenating different data sources.In fact these transitions make the data set very problematic for detecting extremes -or in other words, extreme event detection may identify the data merging edges.Alternatives are classical drought indicators that can be derived from climatological data only.Here, we rely on the standardized precipitation index (SPI) for detecting extreme events as extracted from SPI and compare it to a random network of the same size (Fig. D1).The SPI is extracted following standard methodology (McKee et al., 1993) from monthly ERA-Interim rainfall data (Dee et al., 2011), using a 3-monthly aggregation window over the 1979-2011.We use the SPI only for illustration purposes until more robust EOs for soil moisture become available -i.e.we assume that low SPI values are proxies for low soil moisture contents.
Further, a local 10th percentile threshold is applied on the SPI time series to flag dry events with subsequent detection of the large connected events.The choice of the local threshold is consistent with the typical meteorological/climatological use of SPI time series.Hence, in contrast to biophysical applications as presented in the main part of the paper, global or regional thresholds might not be physically meaningful for evaluating the local impacts of climate variables.Since meteorological reanalyses typically operate at much coarser resolution than EO data sets, for the analogous analysis presented here both the spatial and temporal search space are chosen to comprise only the spatially and temporally adjacent voxel (i.e.z = 0.5 • and τ = 1 month in the SPI data set).
To evaluate the ISMN, all station locations and the periods of active data sampling of each station were used for spatiotemporal intersection with the SPI extremes in two different setups: firstly, we consider all stations active only in periods when these stations were collecting data ("dynamic" network), and secondly, a "static" (counterfactual) situation is taken into account, where all stations are taken as active throughout the entire ERA-Interim period.The comparison was restricted to Europe due to data availability (i.e.most regional networks that form ISMN are operated in Europe Dorigo et al., 2011).

D1.2 Results
If we consider the full spatiotemporal intersection we find that only the first five SPI extremes would have affected areas where the ISMN has stations (Fig. D1, red line).Higher ranked extremes are less likely of being detected.An annual random site placement (grey lines) would have been more efficient in identifying the extremes.In fact the current geographical coordinates u, v would have only reached the potential of a random network if they had been operated without ceasing over the entire monitoring period (blue lines).But that would have implied much more measurement years than the random site placement.For very high ranks of extremes (the very small events) the continuously operated real-world network would have outperformed the random network.These results are consistent with the results shown in the main paper.
An interesting feature of ISMN is that the network has changed its structure over the last decades to a very large extent.In the eighties, all station locations are confined to eastern Europe (Fig. D2a).In the last decade, western European station networks became active, but both the number and data availability from eastern European stations were severely reduced (Fig. D2b).This change in network design www.biogeosciences.net/14/4255/2017/Biogeosciences, 14, 4255-4277, 2017 materializes strongly in the spatial locations of the detected events: while in the 1980s most extremes in eastern Europe were "seen" by at least one tower and the detection rates in western Europe were poor, this pattern is reversed in the last decade (Fig. D2).Further, both decades highlight that a static random tower placement is more efficient than the current network, which is explicable by the high degree of site clustering.The importance of maintaining continuous observation alive becomes even more evident if one analyses the network development over time in more detail (Fig. D3).In conclusion, the complementary analysis presented here substantiates the main paper in that the consideration of both the spatial location and the availability of historical data is a crucial element to reconstruct the impacts of extreme events in the recent past.www.biogeosciences.net/14/4255/2017/Biogeosciences, 14, 4255-4277, 2017

Figure 1 .
Figure 1.The top three principal components of the mean seasonal cycles of FAPAR over Europe visualized as red (R), green (G) and blue (B) channels.The first component accounts for 84 % of the variance.The cumulative explained variances in the first two components explain 95 % of the variance, and the first three components sum up to 97 %.Similar RGB colour combinations indicate comparable mean phenological patterns.These similarities are used to define overlapping regions of comparable phenology.Within each phenological region we estimate suitable and spatially varying thresholds as references for flagging potential extreme reductions in FAPAR.

Figure 2 .
Figure2.Map of the regionally varying FAPAR threshold used for detecting extreme events.These thresholds are derived within each subregion as defined by the leading PCs of the mean seasonal cycles.The gradient between central and southern Europe indicates that we may classify an event as extreme in one ecosystem that would be considered part of the normal variability elsewhere; i.e. arid ecosystems have lower thresholds of extremeness in FAPAR compared to humid areas.

Figure 3 .
Figure3.Conceptual visualization of the presented approach.An extreme occurs over a well-defined spatiotemporal domain (which could be asymmetric as shown here on the latitude-longitude projection).The rank of an extreme can be determined, for example, by the anomaly integrated by the red voxels, or the maximum spatial extent (grey area), or the duration along the time axis, amongst other properties.Black lines indicate the spatial position and active time of three in situ measurement stations.In this example, only one site would have coincided with the extreme and would be considered as a potential basis for exploring the in situ effects of the event.

Figure 4 .
Figure 4. Comparison of average detection rates for randomly placed networks of different sizes in Europe for the period from 2000 to 2014.The colour code shows the moderately exponentially increasing size of networks under consideration.Lines show the average percentage of detected events by (a) rank, (b) integrated FAPAR anomaly, (c) affected spatial area and (d) event duration.The black line shows the case of a hypothetical network of 103 towers.

Figure 5 .
Figure 5.Comparison of the affected area of extremes (continuous lines are a subset from Fig 4c)and the theoretical expectation according to a binomial distribution and uncorrelated data (dashed lines) for varying network sizes (shown as different colours).Our empirical detection probability is lower than the theoretical expected ones for large extremes and higher for small extremes.However, the overall pattern of the expected detection probabilities is well captured by the theoretical expectation.

Figure A1 .
Figure A1.Illustration of identification of regions with similar threshold: we define a grid in the space of the leading PCs (geographically shown in Fig.1), where each mesh width corresponds to 4 % of the total min-max range of the first PC.We assign percentile thresholds as calculated over a 3×3×3 set of mesh elements (shown in orange) and assign these percentiles to the central dots (shown in red).For the sake of clarity, we illustrate the approach only in the space of the leading two PCs.

Figure B1 .
Figure B1.Artificial data example.(a) Detection probabilities when ignoring the time domain for varying network sizes.In this case, the empirically identified detection rates correspond exactly to the theoretical detection probabilities.If we induce moderate spatiotemporal correlations in (b), and stronger ones in (c), we still find an excellent fit to the theoretical expectation because we still have relatively sparse networks and the towers are independent samples of the underlying distribution.If the detection rates over space and time are considered, however, the events are no longer independent due to their temporal autocorrelation, and thus the largest extremes tend to cluster towards the centre of the domain.Parts (e) and (f) show these lower detection rates, and (g), (f), (i) are the data corresponding to results in the columns.

Figure B2 .
Figure B2.Artificial data example considering the actual event detection algorithm.(a) Detection probabilities when ignoring the time domain for varying network sizes.In this case, the empirically identified detection rates dramatically overestimate the theoretical detection probabilities.If we induce moderate spatiotemporal correlations in (b), and stronger ones in (c), we still find this pattern, but it is less pronounced for the very large events.This shows that having a large footprint for the event detection algorithm leads to an overestimation of the detection rates of small extremes.If the detection rates over space and time are considered, however, the events are no longer independent due to their temporal autocorrelation.Parts (e) and (f) reveal lower deviations from the expected detection rates, which is a compensating effect of the autocorrelation and event detection method setting.The data corresponding to results in the columns are shown in (g), (f), and (i).
Figure D1.International Soil Moisture Network and its capacity to detect SPI extremes in Europe.Again red line shows the reduction of detection capacity due to inactive towers.Randomly placing observation years in space and time leads to higher detection rates for large extremes, and lower rates for small extremes.

Figure D2 .
Figure D2.International Soil Moisture Network and its capacities to detect SPI extremes in Europe vs. a random network for the 1980s (a) and 2000s (b).

Figure D3 .
Figure D3.Number of stations in the International Soil Moisture Network over time confronted with drought-affected area.