Spring Blooms in the Baltic Sea have weakened but lengthened from 2000 to 2014

Phytoplankton spring bloom phenology was derived from a 15-year time-series (20002014) of ship-of-opportunity chlorophyll-a fluorescence observations collected in the Baltic Sea through the Alg@line network. Decadal trends were analysed against inter-annual variability in bloom timing and intensity, and environmental drivers (nutrient concentration, temperature, radiation level, wind speed). 5 Spring blooms developed from the south to the north with the first blooms peaking mid-March in the Bay of Mecklenburg and the latest bloom peaks occurring mid-April in the Gulf of Finland. Bloom duration was similar between sea areas (43± 2 d), except for shorter bloom duration in the Bay of Mecklenburg (36± 11 d). Variability in bloom timing increased towards the south. Bloom peak chlorophyll-a concentrations were highest (and most variable) in the Gulf of Finland 10 (20.2± 5.7mg m−3) and the Bay of Mecklenburg (12.3± 5.2mg m−3). Bloom peak chlorophyll-a concentration showed a negative trend of −0.31± 0.10mg m−3 yr−1. Trend-agnostic distribution-based (Weibull-type) bloom metrics showed a positive trend in bloom duration of 1.04± 0.20 d yr−1, which was not found with any of the threshold-based metrics. The Weibull bloom metric results were considered representative in presence of bloom intensity trends. 15 Bloom intensity was mainly determined by winter nutrient concentration, while bloom timing and duration co-varied with meteorological conditions. Longer blooms corresponded to higher water temperature, more intense solar radiation, and lower wind speed. It is concluded that nutrient reduction efforts led to decreasing bloom intensity, while changes in Baltic Sea environmental conditions associated with global change correspond to a lengthening spring bloom period. 20


Introduction
Human influence and climate change transform terrestrial and marine ecosystems worldwide at unprecedented rates (Cleland et al., 2007;Cloern et al., 2015).Coastal marine systems experience anthropogenic pressure as well as indirect changes in climatic conditions, which affect the marine food web (Heisler et al., 2008;Zhai et al., 2013;Paerl and Huisman, 2008).Ecosystem responses to these changes are difficult to relate to unique causes (HELCOM, 2007b;Winder and Cloern, 2010;Neumann et al., 2012).Experiments designed to support biogeochemical model scenarios (e.g.Neumann et al., 2002;Tamminen and Andersen, 2007;Seppälä and Olli, 2008) help to disentangle observed trends.However, the predictive capabilities of biogeochemical models (e.g Kuusisto et al., 1998;Meier et al., 2011;Gnanadesikan and Anderson, 2009) remain dependent on calibration against long and consistent multi-variable time series.
Phytoplankton bloom intensity and timing (bloom phenology) are indicators of ecosystem health at the base of the food web (e.g.Hays et al., 2005;Adrian et al., 2009;Vargas et al., 2009).Phenological studies are increasingly P. M. M. Groetsch et al.: Spring blooms in the Baltic Sea used to inspect regional ecosystem response to nutrient reduction efforts (HELCOM, 2007a;Voss et al., 2011;Fleming-Lehtinen et al., 2015) and changing climatic conditions (Sommer and Lengfellner, 2008;Paerl and Huisman, 2009).The Baltic Sea is a coastal ecosystem affected by eutrophication (Korpinen et al., 2012), which intensifies naturally occurring spring and summer bloom (Bianchi and Engelhaupt, 2000;HELCOM, 2007a).The Helsinki Commission formulated a nutrient reduction scheme aimed at improving ecosystem health in 1992 (HELCOM, 2008), which came into force in 2000.Monitoring of key ecosystem health indicators is implemented in the national monitoring programmes of HELCOM contracting parties.These programmes include traditional dedicated sampling campaigns at sea, and increasingly, the use of highly resolving observation platforms.
Ships of opportunity (typically cargo ships or passenger ferries) offer a largely weather-independent, reliable, and cost-effective platform for the collection of high frequency in situ observations (Leppänen et al., 1995;Ainsworth, 2008).Phytoplankton pigment fluorometers are included in most of these ferryboxes.In the Baltic sea, such systems have recorded phytoplankton blooms on the route from Helsinki to Travemünde (and vice versa) since 1992 (Rantajärvi et al., 2003).On this route, ferryboxes have collected over 9.5 million chlorophyll a pigment fluorescence observations from 1926 transects, with a median revisit time of under two days in the last 15 years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014).Ship-based observations from merchant vessels provide continuity in monitoring, which is particularly important in seasons when other observation systems are less reliable.In spring, satellite observations are rare due to high average cloud cover, while high costs of dedicated research cruises and coastal laboratories limit their spatio-temporal coverage.Ferrybox observations are therefore the primary source of observations to study spring bloom dynamics in this region.
Phytoplankton abundance and succession in the Baltic Sea is controlled by nutrient (Neumann et al., 2002;Tamminen and Andersen, 2007) and light availability (Sverdrup, 1953;Smetacek and Passow, 1990;Nelson and Smith, 1991;Siegel et al., 2002), mixing status (Ueyama and Monger, 2005;Sharples et al., 2006), temperature (Grayek and Staneva, 2011), ice cover (Kahru and Nommann, 1990;Omstedt et al., 2004;Sommer and Lengfellner, 2008), and salinity (Fennel, 1999;Tamminen and Andersen, 2007).In addition, the quantum yield of fluorescence is influenced by solar irradiance (Kiefer, 1973;Dandonneau and Neveux, 1997;Marra, 1997;Sackmann et al., 2008), species composition, and physiology (Kiefer et al., 1989).Hence, interpretation of unattended pigment fluorescence measurements in terms of phytoplankton biomass presents a number of challenges (Roesler and Barnard, 2013).Firstly, phytoplankton distribution exhibits high spatial and temporal variability, while ferryboxes measure pigment fluorescence at fixed depth (Ruokanen et al., 2003).Therefore, stratified conditions may not be well rep-resented in the data (Groetsch et al., 2014).Secondly, in a typical ferrybox setup, fluorescence yield is at best determined as a daily regional average, which disregards variability on smaller spatio-temporal scales.Despite these challenges, Fleming and Kaitala (2006) demonstrated that ferrybox observations in the Baltic Sea can be used to derive bloom timing and intensity for biomass-rich sea areas.They report a slightly negative trend in bloom initiation in the northern Baltic Proper and the Gulf of Finland for the period 1992-2004.Recent studies also reported shifts in phytoplankton spring bloom biomass or species composition (e.g.Klais et al., 2011;Wasmund et al., 2011Wasmund et al., , 2013)).Kahru and Elmgren (2014) reported that the timing of cyanobacterial surface accumulations has advanced approximately 20 days from 1979 to 2013.However, information about shifts in Baltic Sea spring bloom timing is still lacking.
Choosing an adequate bloom metric is not trivial, as no clear guidelines exist that conclusively support one metric over others.Bloom metrics for both remotely sensed and in situ sampled time series are commonly divided into three groups: (1) fixed or variable concentration threshold metrics (Siegel et al., 2002;Fleming and Kaitala, 2006;Lips et al., 2014;Racault et al., 2015), (2) growth-rate-based metrics (Rolinski et al., 2007;Wiltshire et al., 2008), and (3) distribution-based metrics (Rolinski et al., 2007;Platt et al., 2009;Vargas et al., 2009;Zhai et al., 2011).Threshold-based and growth-rate-based metrics typically require data preprocessing (e.g.interpolation and smoothing) to mitigate the impact of gaps, noise, outliers, and multi-modal bloom distributions on the derived bloom phenology (Rolinski et al., 2007;Cole et al., 2012;Ferreira et al., 2014).Distributionbased metrics fit an analytical expression to observations using fitting routines designed to cope with imperfections in the input data while optimally preserving natural variability.Distribution-based bloom metrics are considered more robust than threshold-based or growth-rate-based metrics, in the presence of complex, multi-modal bloom observations (Ji et al., 2010).Interpretation based on several, conceptually different bloom metrics can be used to obtain uncertainty estimates (Ho and Michalak, 2015).It also allows long-term trends in bloom phenology to be screened for.The latter is because threshold-based metrics are biased by longterm bloom intensity trends, whereas growth-rate-based and distribution-based metrics are not.Figure 1 illustrates how a gradual decline (negative trend) in bloom peak concentration causes any metric based on fixed thresholds (e.g.derived from climatology or expert judgement) to introduce an artificial negative trend in bloom duration.In contrast, metrics based on growth rate, distribution, or annually derived thresholds yield a single bloom duration in this example because bloom intensity does not influence these metrics.
The aims of this study are twofold: (1) to report longterm trends for Baltic Sea spring bloom intensity and timing, and (2) to attribute these trends to changes in environmental conditions.To meet these objectives, we describe a methodology to derive quality-controlled time series of chlorophyll a concentrations from observations collected under the Baltic Sea Alg@line program over a period of 15 years (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014).Uncertainties arising from variability in the phytoplankton pigment fluorescence yield are estimated.Bloom phenology parameters, derived from threshold-and distribution-based bloom metrics, are explored for long-term trends.Inter-annual variability of bloom phenology parameters are attributed to nutrient availability and meteorological conditions (temperature, radiation level, wind speed), which might help to relate long-term trends to unique causes.Finally, we summarize how these results contribute to the discussion on recent changes in the Baltic Sea, and the monitoring practices that need to be in place to detect such changes.
2 Materials and methods

Alg@line data
In situ data in this study were collected until 2009 by the Finnish Institute of Marine Research, and by the Finnish Environment Institute (SYKE) from 2009 onwards, within the Alg@line network of Baltic Sea ferryboxes.Here we consider systems installed on two cargo vessels, M/S Finnpartner (2000)(2001)(2002)(2003)(2004)(2005)(2006) and M/S Finnmaid (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014), which served between Travemünde (Germany) and Helsinki (Finland) as depicted in Fig. 2. Three routes were sailed during the study period.Depending on weather conditions, the passage between Gotland and the mainland of Sweden (39 % of all transects) was favoured over the direct route east of Gotland (52 %), while the route with a lay-over in Gdansk (Poland) was only occasionally served during 2009 to 2012 (7 %).Several transects (2 %) were sailed for refuelling or maintenance in other ports and not used for this study.
Quality control flags were derived from (1) sensor reading thresholds on speed, flow rate, hull, and sampled water temperature, and (2) data variability, expressed as lower and upper bounds for standard deviation between neighbouring measurements, as described below.Measurements at low (< 5 knots) or zero ship speed are typically collected in the harbour and were omitted.Erroneous records, e.g.caused by instrument communication errors, were removed using a moving window mean filter.A window length of 25 observations (approximately 8.3 min) was used for records of ship speed, and a window length of 100 observations (33.3 min) was used for flow rate and temperature records.Low flow rates can indicate blocked passages, pump failure, or leaks.Flow meter readings were available for approximately onethird of all records.A proxy for flow disruption is the difference in ship-hull temperature and in-line temperature.Flow rates < 0.3 L min −1 or a temperature difference > 2 • C were used to flag records as suspect.Instrument failure, communication or digitizing errors may lead to "stuck" values, which were detected by calculating standard deviation in a moving window of 100 samples.Observations corresponding to low standard deviation (σ < 1e −4 ) of Chl a fluorescence measurements or GPS-derived latitude were omitted.GPS-derived latitude was additionally filtered for exceptionally high short-term variability (σ > 0.5, window size 50 samples), caused by poor satellite reception or serial communication errors.Table 1 provides an overview of the applied quality control flags.
Chl a fluorescence data were corrected for sensor drift and discontinuities by transect-wise normalization (division by transect mean).This was necessary to account for changes in instrumentation, signal contamination due to bio-fouling, trapped bubbles and particles, and changes in sensor sensitivity due to deterioration or manual adjustments.Laboratory analysis results of bottle samples are typically available from every sixth transect, with up to 24 samples collected by automated, refrigerated water samplers (Teledyne Isco).Laboratory analyses included inorganic nutrient concentrations (nitrate+nitrite, phosphate and silicate), Chl a concentration, and occasionally inverted light microscopy counts of phytoplankton species.Laboratory Chl a concentration results were used to convert transect-normalized Chl a fluorescence to units of Chl a concentration (in mg m −3 ).First, a linear (generalized least squares) regression fit of normalized Chl a fluorescence against corresponding Chl a lab measurements was carried out for each sampled transect.If the regression failed (R 2 < 0.3 or p > 1), a moving window regression was carried out (window length 10 samples), and the subset with the highest R 2 was used to determine the correction factor.The threshold for R 2 was determined manually based on the distribution of R 2 , while p > 1 indicated numerical instabilities during the fitting procedure.Each transect without corresponding bottle samples was corrected by individually applying the regression parameters of the two neighbouring sampled transects.These two solutions were then interpolated linearly, weighted by their temporal distance to the respective transect.Negative concentration values occasionally occurred for weak fluorescence signals, and were set to zero.
The diurnal variability of the fluorescence signal was estimated from quality-controlled observations in all seasons.First, these observations were divided by their respective transect mean to remove biomass-driven first-order variability in the fluorescence signal.Then, diurnal cycles were derived by dividing these observations into hourly bins and sun elevation angle ranges (0.1 rad bins).

Meteorological data
Photosynthetically active radiation (par), sea surface temperature (sst), and wind speed (wind) were derived from the ECMWF (European Centre for Medium-Range Weather Forecasts) ERA-Interim reanalysis data set (Dee et al., 2011).The spatial resolution of the model is constrained by the underlying atmospheric model, which is stored on a spatial T255 grid corresponding to approximately 79 km cell size when projected to a reduced Gaussian grid.Four values per day were retrieved for each parameter and the entire Baltic Sea.Parameter values for each Alg@line obser-vation were extracted using spatio-temporal spline interpolation of third order.The first-order seasonal signal (e.g.rising par and sst in spring) was removed from the observations by subtracting multi-year (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) daily sea area averages, approximated by second-order polynomials.The seasonally detrended parameters were then averaged over the bloom period and are further referred to as par, sst, and wind.

Nutrient concentration and depletion timing
A single term for nutrient availability was adopted from Fleming and Kaitala (2006), calculated as nut = 3 , where NO 3 + NO 2 , PO 4 , and SiO 4 are the concentrations of nitrite+nitrate, phosphate, and silicate, respectively.These concentrations were derived from laboratory analysis of bottle samples that were regularly collected along the transect (further detail in Sect.2.1).nut was spatially binned for each investigated sea area and resampled to daily averages and consecutively smoothed with a 21-day centred-running-mean filter.This treatment resembles the processing of Alg@aline observations (see Sect. 2.4) to enable consistent interpretation of the joint data set.Nutrient concentrations and depletion timing are described using the following metrics.The nutrient concentration prior to bloom start (nut-peakvalue) was defined as the yearly maximum nutrient concentration (day of year between 31 and 160).The day of year when the nutrient concentrations equalled 100, 50, and 25 % of their peak values are referred to as nut-peakday, nut-deplday-50, and nut-deplday-25.The day and value of the lowest nutrient concentration index are referred to as nut-minday and nut-minvalue.The rate of nutrient depletion between 75 and 25 % of the peak value (nut-slope) was determined through linear regression.

Extraction of bloom timing and intensity
Extraction of bloom timing and intensity was carried out for five Baltic Sea areas, where each area follows definitions of the HELCOM Combine program (HELCOM, 2013).Figure 2 illustrates the location of the areas: the western Gulf of Finland (gof: > 59.5 • N latitude, alongtransect), the northern Baltic Proper (nbp: 58.4-59.5 • N latitude, along-transect), the combined western and eastern Gotland basins (got: 56.2-58.4• N latitude, along-transect), the southern Baltic Proper (sbp: 54.5-56.2• N latitude, alongtransect), and the Bay of Mecklenburg (bom: < 54.5 • N latitude, along-transect).For the got and sbp areas, only routes that passed by Gotland were selected, whereas routes via Gdansk were excluded.This is because the route through Gdansk was sailed only from 2009 to 2012.If not otherwise stated, all further steps are carried out individually for each of these areas and for day of year between 31 (31 January) and 160 (9 June).The ship-of-opportunity (Alg@line) measurements typically commenced in the second half of January, Table 1.Quality control flag definitions and statistics.Observations were omitted if any of the flags exceeded the respective threshold.Absolute temperature difference is measured between the water intake and the flow-through sensors.Availability and rejection rates were calculated relative to the total number of observations.SD denotes standard deviation.

Sign
Threshold which is why 31 January was chosen as the start of our analysis.The end date was chosen such that it covers all spring bloom events in all basins but excludes summer bloom.Alg@line Chl a concentrations (see Sect. 2.1) were resampled to daily sea area averages, using linear interpolation, and subsequently smoothed with a 21-day centred-running-mean filter (e.g.Ferreira et al., 2014;Racault et al., 2015) to fill in gaps and reduce short-term variability.We derive several metrics, all of which have in common that the bloom peak concentration (peakheight, see Table 2 for explanations of acronyms) and timing (peakday) are defined as the maximum Chl a value at the corresponding day-of-year, respectively.Two threshold-based metrics and one distribution-fit-based metric were calculated.
1. Chl a concentration exceeding a fixed threshold of 5 mg m −3 was defined as bloom by Fleming and Kaitala (2006), further referred to as const5.A 21-day centredrunning-mean filter was used to keep results comparable to the other metrics considered, whereas Fleming and Kaitala (2006) used a 7-day centred-running-median filter.
2. Siegel et al. (2002) proposed a spatially variablethreshold metric based on the 5 % above median concentration, but reported small quantitative differences for thresholds between 1 and 30 % above median.Their threshold is based on the complete annual cycle, while here only the spring bloom period from day-of-year 31 to 160 is considered.We refer to this metric as median5.
3. Distributions proposed to describe bloom phenology include shifted-Gaussian (Platt et al., 2009), gamma (Vargas et al., 2009), and Weibull distributions (Rolinski et al., 2007).The shifted Gaussian is symmetric in shape, whereas gamma distributions allow for different slopes of bloom rise and decline.In addition, Weibull functions recognize non-zero offsets before and after the bloom phase.The latter has proven essential to obtain a good fit for the transition phase between spring  et al. (2007), was fitted non-linearly to the preprocessed and scaled (to a range of 0-1) Chl a concentrations.The bloom initiation and end are defined as the 10th and 90th percentiles before and after the bloom peak, respectively.This metric is further referred to as weibull.
For each metric, bloom initiation, peak, and end dates (startday, peakday, and endday) were extracted from the data set.
Based on these dates, bloom duration (duration), concentration average (concavg), and the sum of daily Chl a concentrations (bloomidx) were calculated.The latter was proposed by Fleming and Kaitala (2006) to characterize bloom intensity.We assumed the bloom to have started prior to Alg@line service commencement if the first data point had already satisfied the bloom criterion for a given metric.Such cases were identified for 30 out of 225 combinations of sea region, year, and bloom metric (nine times for bloom metric const5, sixteen times for median5, and five times for weibull).Corresponding bloom start days were replaced by the median value for the region over the 15 years studied in all subsequent calculations.

Principal component analysis
Principal component analysis (PCA) was carried out to attribute seasonally detrended meteorological conditions (sst, par, wind) and nutrient concentrations (nut-peakvalue, nutminvalue) to the inter-annual variability in bloom intensity (bloomidx, concavg, peakheight) and timing (startday and peakday, duration).Outliers were defined for each parameter as departure by more than 3 standard deviations from the parameter mean, and replaced with the region median.
Z score normalization (subtraction of mean, division by standard deviation) was carried out on a per-region basis.Regionequalized, zero-mean, and unit-variance data were then subjected to the PCA function in the Python framework scikitlearn (Pedregosa and Varoquaux, 2011).

Quality-controlled chlorophyll a concentration time series
The Alg@line ferrybox systems collected over 9.5 × 10 6 observations between 2000 and 2014, of which 3.8×10 6 observations were sampled during spring (day-of-year 31 to 160).Availability and rejection rates for each quality control parameter are listed in Table 1.In total, quality control procedures removed 4.55 % of all observations.Determination of the fluorescence yield was supported by an "adaptive regression" method.Where necessary (R 2 < 0.3 or p > 1), it selected the subset of bottle-sampled and laboratory-analysed Chl a concentrations that yielded the best linear fit to Chl a fluorescence observations for a given transect.This procedure allowed 318 (98 %) out of 324 transects for which bottle samples were collected to be successfully fit.Only 266 (82 %) transects could have been used (R 2 >=0.3 and p 1) without applying this technique.
Figure 3a shows normalized fluorescence observations as a function of sampling time-of-day.Results are presented separately for summer (May to August), winter (November to February), and the transition periods (autumn, spring).Diurnal variability was most pronounced in summer, when the fluorescence signal varied on average 50 % over the course of a day.In winter and during the transition periods (spring, autumn) a diurnal variability of 35 and 38 %, respectively, was contained in the fluorescence signals.This seasonal effect is likely caused by variations in average irradiance intensity, which are modulated primarily by sun elevation, but also by atmospheric conditions (e.g.cloud cover, aerosol optical thickness) and optical properties of the water body (e.g.ice cover, attenuation).Figure 3b depicts normalized fluorescence as a function of solar elevation.In this representation, seasonal differences in diurnal variability are essentially absent and the correspondence between solar elevation and average fluorescence response was approximately linear for daytime observations.

Bloom intensity and timing
Blooms generally developed first in the south and progressed towards the north (see Fig. 4 and Table 3).Bloom peak timing (not influenced by choice of metric) followed this pattern, as did metric-dependent bloom start and end dates.The fixed-threshold bloom metric const5 suggested longer blooms in high-biomass sea areas like the gof, compared to low-biomass areas such as the sbs.The spatially variablethreshold metric median5 applies area-specific bloom thresholds (nbp: 3.52, gof: 4.95, got: 2.51, sbs: 2.62, bom: 4.02 mg m −3 ) and resulted in approximately stable bloom duration in all sea areas.The weibull metric, which is not sensitive to absolute bloom intensity, also resulted in comparable bloom durations for all sea areas.The year-to-year variability of start, peak, and end days generally increased towards the south for all metrics.
Spring bloom intensity was described by three parameters: the metric-independent bloom peak concentration (peakheight), the Chl a concentration average during bloom conditions (concavg), and the sum of daily Chl a concentrations over the bloom period (bloomidx).Similar patterns were observed for all these parameters and bloom metrics, as illustrated in Fig. 5.The highest bloom intensity was found in the gof and nbp, followed by the bom.Low-intensity blooms were observed in the sbp and the got.Variability was generally proportional to bloom intensity, highest in the highbiomass and coastal gof and bom.Variability in bloomidx was comparable to that in peakheight, while concavg was Table 3. Bloom timing and intensity for each investigated sea area (Fig. 2) and for all applied bloom metrics.SD denotes standard deviation.considerably more stable.All calculated bloom phenology parameters can be found in the Supplement.
Bloom duration resulting from the weibull metric stands out in the result set with a positive trend of 1.04 ± 0.20 day yr −1 (R 2 = 0.28, p 0.01, Fig. 7).No significant trend in bloom duration was found for any fixed-or variablethreshold metric.
PCA scores and loadings of the first three principal components (PCs) are shown as biplots in Fig. 8.The first PC is dominated by negative correlations to bloom intensity parameters (peakheight, concavg, bloomidx).This component is positively correlated to pre-bloom nutrient concentration (nut-peakvalue) and bloom duration, illustrating that bloom intensity is driven by pre-bloom nutrient availability.The second PC is linked to bloom timing, with strong positive correlations to startday and peakday.Correlations to par (positive), sst (positive), and wind (negative) suggest that weather conditions affect bloom timing.Bloom duration is positively correlated to the third PC, as well as to bloomidx.Additional negative correlations to nut-minvalue and wind, as well as a positive correlation to par, suggest a link between favourable meteorological conditions (low wind-mixing, high light level) and efficient nutrient depletion.

Discussion
Trends in spring bloom phenology can be interpreted as responses to nutrient reduction as well as to slowly acting environmental processes, such as climate change.To disentangle or even quantify these trends, suitable observation platforms and subsequent analytical approaches must be chosen.We present evidence that fundamental challenges of ferrybox observations can be overcome to yield an internally consistent data source.Subsequently, the behaviour of commonly used bloom metrics in the presence of decadal trends can be scrutinized in the context of previously reported system knowledge.Finally, we attempt to disentangle the effects of nutrient availability and meteorological conditions on interannual variability in bloom phenology.

Automated processing of ferrybox observations
Thresholds for speed, flow rate, and data variability were iteratively adjusted to the data set and may not be applicable to other ferrybox implementations.Particularly flow rate, derived from differences in line and hull temperature, will likely require tuning to each ferrybox installation.However, here we analysed data from two ferrybox installations, which could be treated with the same set of thresholds.Transect-wise normalization of the quality-controlled fluorescence data was adequate to consistently interpret observations collected by different generations of instrumentation.However, this approach crucially depends on continuous temporal coverage of reference measurements for cali-  bration to Chl a concentrations.Adaptive regression analysis improved the handling of statistical outliers which would otherwise hamper determination of fluorescence yield, while transects for which no bottle samples are available were corrected with an interpolated fluorescence yield derived from the closest bottle-sampled transects.The present procedure allows for automated and reproducible processing, which is an improvement over manual quality control.Applying the proposed interpolated fluorescence yield helps in reprocessing and long-term data analysis of ferrybox fluorescence observations to better represent natural variability.(Kahru and Elmgren, 2014), influenced fluorescence yield considerably less than diel cycles.
The diurnal variability in fluorescence response of 50 % during an average summer day is within the range of earlier findings, e.g.66 % (±33 %) for near-surface observations in upwelled waters of the equatorial Pacific reported by Dandonneau and Neveux (1997) or 30 % for near-surface seaglider observations in northeast Pacific waters off the Washington coast, United States (Sackmann et al., 2008); although differences in normalization impede direct comparison.The sampling depth of 5 m for Alg@line systems and the high attenuation of the Baltic Sea in comparison to clear Pacific Ocean waters are likely to dampen the observed diurnal variability.
In this study, fluorescence observations during spring, when diurnal variability reached on average 38 %, were binned for five large Baltic Sea areas.At a typical cruising speed of approximately 23 knots, each sea area is sampled for at least several hours.This limits the influence of diurnal variability in fluorescence yield along a transect on derived Chl a concentration, which is therefore of lesser relevance for the present study.However, if fluorescence measurements were to be quantitatively evaluated at a higher spatial resolution, locally varying fluorescence yield should be accounted for.Analysis of signal coherence (Groetsch et al., 2014) offers an alternative to quantitative interpretation of fluorescence observations and can be used to qualitatively detect cyanobacterial surface bloom.If light history is known, e.g. from a dedicated irradiance sensor, a correction of diurnal fluorescence yield variability might be possible, and further research in this direction is recommended.

Spring bloom timing and intensity
The presented bloom phenology expands the time series presented by Fleming and Kaitala (2006) and is in good agreement for the overlapping period (2000)(2001)(2002)(2003)(2004) when comparing the const5 metric results.Remaining differences are likely due to quality-control and preprocessing procedures on the fluorescence records.The authors reported for gof, nbp, and the Arkona Sea that bloom typically started in the south and ended in the north, while bloom intensity increased towards the north.These observations are confirmed here.Sea areas not covered in Fleming and Kaitala (2006), e.g the high-biomass bom and low-biomass sbp and got, followed the reported south-north trend in bloom development.Present results also support and expand the findings of Fennel (1999), who showed, with simulations and monitoring data from 1994-1996 for the western Baltic Sea, that surface heating in early spring needs to overcome the temperature of maximum density to repress convective mixing and allow spring bloom to emerge.The temperature of maximum density increases with decreasing salinity; therefore convective mixing is sustained longer in less saline northern Baltic Sea waters when spring temperature is on the rise.At the same time, incident solar radiation increases slower in the north due to lower solar elevation.

Trends
Inter-annual variability in coastal systems exceeds long-term trends by orders of magnitude (Cole et al., 2012).Consequently, trends were observed at relatively low coefficients of correlation.The importance of appropriate data preprocessing and gap handling (e.g Cole et al., 2012;Racault et al., 2014) and choice of metric (Ferreira et al., 2014) has been demonstrated in literature and is further emphasized by the present analysis.Robustness of the reported decadal trends is documented by high statistical significance levels (p 0.01, Figs.7 and 6), which were supported by spatially binning phenology parameters from all examined Baltic Sea areas.Similar trends were observed earlier for individual Baltic Sea areas, however, usually outside 95 % confidence intervals (e.g Wasmund and Uhlig, 2003).
Łysiak Pastuszak et al. ( 2014) reported stable or increasing Chl a concentrations for the period 2007-2011 in several Baltic Sea areas despite signs of declining nutrient concentrations.More recently, eutrophication trend reversal and oligotrophication processes were reported by Andersen et al. (2015), based on analysis of 112 years of consolidated Baltic Sea observations.Both reports considered surfacelayer Chl a concentration in summer as one of the direct indicators of eutrophication, but did not include spring bloom in their assessment.The time series for 2000-2014 that we present here fills this gap: a negative trend in bloom intensity was also found for spring bloom, providing further evidence for their hypothesis of gradual nutrient load reduction.Thresholds of const5 and median5 are fixed for the whole time series.The observed negative trend in peak concentration was expected to introduce an artificial negative trend in bloom duration because an increasingly higher percentile of the distribution is seen below the bloom threshold (Fig. 1).Contrary to this expected behaviour, however, const5 and median5 revealed no significant trends in bloom duration.This indicates that the anticipated negative trend in bloom duration was countered by a positive trend, e.g. in bloom intensity.The Weibull metric is based on concentration distribution ratios that are calculated individually for each bloom.Therefore, Weibull-metric results for bloom duration are not sensitive to long-term trends in peak concentration.Weibulldistribution metrics confirmed a highly significant, positive trend in bloom duration.These two sets of results corroborate the conclusion that spring blooms in the Baltic Sea have become longer, while Chl a peak and average concentration levels have declined.
This "flattening" of the concentration distribution is supported by the absence of a trend in time-integrated biomass bloomidx and by shifts in nutrient concentration timing (earlier nutrient peak concentration, later 25 % of peak value day).These results indicate that annually generated spring bloom biomass has not changed significantly over the study period, in contrast to bloom timing.Kahru and Elmgren (2014) found a similar development for cyanobacterial summer surface bloom, and reported decadal oscillations, yet no long-term trend, of surface area covered by cyanobacteria in the period 1979-2013.In the same period, summer bloom initiation moved to earlier dates by −0.6 day yr −1 .These results suggest that the gap has decreased between dinoflagelate-and diatom-dominated spring bloom and cyanobacterial summer bloom.Due to the shorter period covered here as compared to the time series presented by Kahru and Elmgren (2014), it cannot be ruled out that the spring bloom trends are caused by decadal oscillation.Moreover, Alg@line nutrient records often did not commence sufficiently early in the season to record bloom onset.Trends in bloom start and nutrient peak timing can therefore not be derived at the same accuracy and precision as the other phenological parameters.In future, additional data and longer time series may revise this analysis.To this end, nutrient metrics derived in this work are provided in the Appendix.
Our findings emphasize that bloom timing is an essential indicator to monitor marine ecosystem dynamics, and thus eutrophication status.Observations at high temporal resolution and choice of bloom metrics are crucial to derive bloom timing trends.Eutrophication status assessment frameworks such as HEAT3.0 (Andersen et al., 2015) may be adapted to embrace available high-frequency data sources to include bloom timing in their analysis.The present results may also prove useful in the calibration and validation of ecosystem models of the Baltic Sea.

Environmental forcing
Gradually decreasing nutrient concentrations (Łysiak Pastuszak et al., 2014;Andersen et al., 2015), as well as rising average air and sea-surface temperatures (Omstedt et al., 2004;Borsenkova et al., 2013) have been reported for recent years, corresponding to a combination of nutrient reduction efforts and global climate change.Several scenarios for future change are plausible (Duarte et al., 2009) but extrapolation of the present results to climate scenarios is beyond the scope of this study.We nevertheless make an attempt to attribute the observed bloom phenology shifts to reported changes in environmental drivers.
Wintertime nutrient concentration and bloom intensity were positively correlated if no spatial normalization was apwww.biogeosciences.net/13/4959/2016/Biogeosciences, 13, 4959-4973, 2016 plied.This supports the paradigm that the first-order driver of bloom intensity is nutrient availability.Lacking alternative explanations, we attribute the reported negative trend in bloom peak concentration to declining nutrient concentrations.First-order spatial trends in bloom intensity and timing can be removed by an area-wise z score normalization, which effectively constrains the analysis to inter-annual variability.After this normalization, both regression and PCA resulted in negative correlation between wintertime nutrient concentration and bloom intensity.This negative feedback can be understood as a subtle interaction between meteorological forcing and nutrient supply: strong wind-forced mixing can cause upwelling of deep, nutrient-rich waters to surface layers.Wind speed, however, was found to be negatively correlated to the prevalent light level, as well as to bloom duration and bloom index.Therefore, in years when additional nutrients are available due to strong wind forced mixing, lowlight regimes that can slow down bloom development are also likely to prevail.Bloom duration primarily co-varied with weather conditions; e.g.high irradiance levels and low wind speeds were frequently observed for long-lasting blooms (and vice versa).Although the same pattern was observed for bloom timing, no trend was found for bloom start and peak day.Increasingly favourable meteorological conditions in late bloom phases are thus a likely driver for the observed increase in bloom duration.Similar weather-driven modulations of bloom timing were reported earlier (Fleming and Kaitala, 2006;Meier et al., 2011;Neumann et al., 2012) for spring, and especially cyanobacterial summer bloom (Wasmund, 1997;Kanoshina et al., 2003;Wynne et al., 2010Wynne et al., , 2011)).

Conclusions
A Baltic Sea spring bloom phenology was derived from 15 years of automated ferrybox Chl a fluorescence observations.Procedures for automated quality control and processing were introduced, and uncertainty due to diurnal variability in phytoplankton fluorescence response was resolved.Both innovations promote increased use of ferrybox observations for scientific research and monitoring purposes, such as the periodic HELCOM eutrophication status assessments.Negative trends in spring bloom peak and average concentration were found, and an increase in bloom duration was derived from conceptually differing bloom metrics.Interannual variability in bloom intensity was primarily linked to nutrient availability, while bloom timing and duration were found to be related to meteorological conditions.In the future, these findings might help to better disentangle ecosystem response to changing nutrient availability and climatic conditions.

Data availability
The data set has been made publicly available (Groetsch and Simis, 2016).
The Supplement related to this article is available online at doi:10.5194/bg-13-4959-2016-supplement.

Figure 1 .
Figure 1.Illustration of threshold-based bloom metric behaviour when applied to a data set with a negative peak concentration trend.

Figure 2 .
Figure 2. Transect of M/S Finnmaid and M/S Finnpartner through the Baltic Sea from Helsinki (Finland) to Travemünde (Germany; vice versa).The following sea areas are considered in this study: the western Gulf of Finland (gof: > 59.5 • N latitude, along transect), the northern Baltic Proper (nbp: 58.4-59.5 • N latitude, along transect), the western and eastern Gotland basins (got: 56.2-58.4• N latitude, along transect), the southern Baltic Proper (sbp: 54.5-56.2• N latitude, along transect) and the Bay of Mecklenburg (bom: < 54.5 • N latitude, along transect).Depending on weather conditions, the north or south of Gotland routes were sailed.

Figure 3 .
Figure 3.Diurnal variability in the chlorophyll a fluorescence yield: (a) normalized (division by transect-mean) chlorophyll a fluorescence observations plotted against time of day, and (b) sun elevation angle.The analysis was carried out on three subsets: winter (November-February), summer (May-August), and transition periods (March, April, September, October), using all ferrybox observations along the routes shown in Fig. 2.

Figure 4 .
Figure 4. Bloom timing (bloom start, peak, and end day) for each sea area along the routes in Fig. 2, averaged over the period 2000 to 2014, and for all applied bloom metrics.Whiskers indicate standard deviations over the 15-year study period.The bloom peak day is independent of the chosen metric and plotted separately.The sea areas are ordered by latitude, from south to north.

Figure 5 .
Figure 5. (a) Concentration average and (b) bloom intensity index for each sea area along the routes in Fig. 2, averaged over the years 2000 to 2014, and for all applied bloom metrics.Whiskers indicate standard deviations over the 15-year study period.The sea areas are ordered by latitude.The metric-independent bloom peak concentration is added in both plots for visual comparison.

Figure 6 .
Figure 6.(a) Decadal trend of average (concavg) and (b) peak (peakheight) chlorophyll a concentration during bloom conditions, derived from the Weibull-distribution metric.Concentrations were normalized prior to regression (subtraction of area-average concentration).Dashed lines indicate the trend line (bold) and its confidence intervals (5 %, small dashes).SE denotes standard error; RMSE denotes root mean square error.

Figure 7 .
Figure 7. Decadal trend of bloom duration, calculated with the Weibull-distribution metric.Durations were normalized prior to regression (subtraction of area-average duration).Dashed lines indicate the trend line (bold) and its confidence intervals (5 %, small dashes).SE denotes standard error; RMSE denotes root mean square error.

Figure 8 .
Figure 8. Principal component analysis biplots: arrows indicate correlation of a parameter with the principal components (bottom and left axes, percentages refer to the variability explained by the principal component), and black dots indicate scores of individual observations (top and right axes) on the principal components (a component 1 and 2, b component 2 and 3).

Table 2 .
Description and acronyms of bloom phenology, nutrient, and meteorological parameters that were used in the trend and multi-variate analysis.