Map-based prediction of organic carbon in headwater streams improved by downstream observations from the river

. In spite of the great abundance and ecological importance of headwater streams, managers are usually limited by a lack of information about water chemistry in these head-waters. In this study we test whether river outlet chemistry can be used as an additional source of information to improve the prediction of the chemistry of upstream headwaters (size < 2 km 2 ) , relative to models based on map information alone. We use the concentration of total organic carbon (TOC), an important stream ecosystem parameter, as the target for our study. Between 2000 and 2008, we carried out 17 synoptic surveys in 9 mesoscale catchments (size 32–235 km 2 ) . Over 900 water samples were collected in total, primarily from headwater streams but also including each catchment’s river outlet during every survey. First we used partial least square regression (PLS) to model the distribution (median, interquartile range (IQR)) of headwater stream TOC for a given catchment, based on a large number of candidate variables including sub-catchment characteristics from GIS, and measured river chemistry at the catchment outlet. The best candidate variables from the PLS models were then used in hierarchical linear mixed models (MM) to model TOC in individual headwater streams. Three predictor variables were consistently selected for the MM calibration sets: (1) proportion of forested wetlands in the sub-catchment (positively correlated with headwater stream TOC), (2) proportion of lake surface cover in the sub-catchment (negatively correlated with headwater stream TOC), and (3) river outlet TOC (positively correlated with headwater stream TOC).


Introduction
Headwaters make up most of the watercourse length and hence provide a large proportion of the lotic habitat in a landscape (Meyer et al., 2007).The headwaters also provide much of the water and solutes to downstream locations (Person et al., 1936;Leopold et al., 1964).It is widely known that variability in water chemistry changes with catchment size, typically with small watercourses showing the highest variability in space (Wolock et al., 1997;Temnerud and Bishop, 2005) and time (Nagorski et al., 2003;Buffam et al., 2007).Significant field sampling efforts (Hutchins et al., 1999;Smart et al., 2001;Likens and Buso, 2006;McGuire et al., 2014) have been made to quantify the variability of Published by Copernicus Publications on behalf of the European Geosciences Union.

J. Temnerud et al.: Modelling TOC in headwater streams
headwaters in individual catchments.Readily derived GIS data from maps and satellite images have been used to model some chemical constituents in larger rivers (Alexander et al., 2007), but are seldom effective at predicting the chemistry of individual headwater streams (Strayer et al., 2003b, a;Temnerud et al., 2010).This is presumably in large part due to the greater importance of small-scale heterogeneity in headwater catchment characteristics, as compared to riverine catchments where much of the variability averages out at larger spatial scales (Gomi et al., 2002;MacDonald and Coe, 2007).
Since aquatic monitoring activities are generally located at downstream sites (Evans et al., 2010), this might provide information about the headwaters upstream from the monitoring sites.In an attempt to use environmental monitoring data to predict seldom assessed headwater streams, the chemistry at the river outlet was used by Temnerud et al. (2010) to predict the median and interquartile range (IQR) of several environmentally relevant stream chemistry parameters, including total organic carbon (TOC), acid neutralising capacity (ANC) and pH.This demonstrated that the river outlets were correlated to statistical features of the upstream population of headwaters.In that study significant relationships were found for ANC, pH and TOC between headwaters median and IQR vs the river outlets, with the strongest relationships for ANC, and the weakest for TOC.Of seven different leave-one-out attempts one model was significant for headwater median TOC and none for headwater TOC IQR (Temnerud et al., 2010).No map information was employed in that study.
In this study the goal was to test whether map information can be combined with river outlet chemistry to predict TOC in individual headwaters.More specifically, would the combination of map and river outlet chemistry give a better prediction than either one used separately.In this follow-up study we have chosen to focus solely on the prediction of TOC, for two main reasons.First, TOC is of great ecological importance for boreal and other watercourses because of its influence on pH, buffering capacity, nutrient bioavailability, metal and pollutants transport, light climate, microbial productivity, and carbon cycling (Wetzel, 2001;Schwarzenbach et al., 2003).Secondly, the statistical distribution of headwater TOC was not well predicted in the previous study using only downstream chemistry as the predictor (Temnerud et al., 2010).If the approach succeeds with TOC, then there is reason to hope that it would be even more effective in predicting other aspects of water chemistry.
An important aspect of modelling headwaters is that the spatial variation is largely dependent on temporal factors, often flow-related (Buffam et al., 2007), but also season (temperature and precipitation) and even long-term trends (Hytteborn et al., 2015).This temporal variation within a single headwater can be greater than the variation of TOC between catchments in the same biome.We want to make the reader aware that it could be easier to model headwaters in different catchments (at the same time) than headwaters in the same catchments that are sampled at different times.
Thus we modelled headwater TOC concentration's on several occasions in different stream networks using catchment map information (Andersson and Nyberg, 2009) complemented by data on the river outlet TOC concentration.
While this may seem straight-forward, there are in fact some theoretical challenges; the method must deal with both strong correlations between observations and between explanatory variables.We used a two-step modelling approach to handle these challenges.First we used partial least square regression (PLS), which can deal with strong correlations between explanatory variables, to model headwater stream TOC median and IQR-values for the mesoscale catchments from information on catchment land cover, geology, soil type maps, vegetation and river outlet chemistry.The best explanatory variables from the PLS models were then used as candidate variables for hierarchical linear mixed models (MM).Such MM have the advantage of being able to deal with strong correlations between observations (such as within a mesoscale catchment stream network), but MM are not appropriate for large numbers of explanatory variables.So PLS modelling was used first to narrow the number of explanatory variables, and then MM was used in the next step to model individual headwater streams.Thus, MM can account for the clustered data structure of catchment properties in a drainage network (Littell et al., 2006).Mixed models have been used rather successfully in related types of data evaluation (Jager et al., 2011;Sakamaki and Richardson, 2013).Two distinguishing features of our study are that (i) we test a combination of information from maps (GIS) with direct measurements of chemistry at the river outlet to create models of individual headwaters and (ii) we test our models on data that were not used in model calibrations.

Sampling approach
The synoptic surveys of mesoscale catchments used in this study were designed to provide a snapshot of the water chemistry in stream networks (Table 1, Fig. 1), with a strategy of sampling most watercourse junctions, lake inlets and outlets as well as the river outlets from each mesoscale catchment.In total there were data from 17 synoptic surveys conducted between 2000 and 2008 in nine catchments distributed across Sweden, spanning a north-south gradient of 800 km through the north-temperate and boreal zones (Fig. 1).This data set amounted to 938 stream samples of which 420 were from headwaters.Headwaters are defined as first order streams with catchments smaller than 2 km 2 in each of the nine drainage networks sampled (Table 1).The number of sampled headwaters differs between surveys (n HW in Table 2).All sampling during a given survey was carried out during a  1 to 3-day period (Table 2 and Fig. S1 in the Supplement), except for R. Krycklan in winter 2005, which took 2 weeks due to cold weather and difficulties finding the streams in deep snow.Stable base flow discharge was maintained throughout that winter sampling period, so that survey was still effectively a snapshot in time.
Five of the nine catchments contain at least one headwater stream site that has been monitored for runoff and chemistry regularly for a decade or more (Edström and Rystam, 1994;Temnerud et al., 2009;Köhler et al., 2008;Löfgren et al., 2011;Laudon et al., 2013).Of the other four catchments, R. Ottervattsbäcken was sampled twice and R. Vänjaurbäcken sampled once (the name R. Sörbäcken was used in the references, as this is a tributary in R. Vänjaurbäcken, Temnerud and Bishop, 2005;Temnerud et al., 2007) while R. Viggan and R. Mangslidsälven were sampled once, for this study.

Study sites
All sampled catchments (including headwaters, intermediate watercourses and river outlets) consisted mainly of forest (> 80 %) with a dominance of coniferous forest made up of Norway spruce (Picea abies) and Scots pine (Pinus silvestris) (Table S1).Mires and small humic lakes made up most of the remaining parts of the catchments, while the proportion of agricultural and developed areas were minimal (< 1 %).The mean annual air temperature in the river catchments (1990-2010) ranged from 7.8 • C in the southernmost river, R. Anråse å, to 2.6 • C in the northernmost, Krycklan.Mean annual precipitation (1990-2010) ranged from 980 mm at R. Anråse å, to 649 mm at R. Vänjaurbäcken.
Daily mean air temperature, daily precipitation and daily runoff for 1961-2010 at each river outlet were modelled by the Swedish Meteorological and Hydrological Institute (SMHI) (Johansson, 2000;Johansson and Chen, 2003), data received 20th of June 2013 from http://luftweb.smhi.se.Daily runoff for 1990-2010 at each river outlet was modelled based on Hydrological Predictions for the Environment program (HYPE) (Lindström et al., 2010), data received 20th of June 2013 from http://vattenweb.smhi.se.Typical accuracy in the HYPE modelling for small catchments (< 200 km 2 ) is 10 % (Strömqvist et al., 2012;Arheimer and Lindström, 2013).Catchment-specific daily mean air temperature, total precipitation and mean specific discharge are illustrated (Fig. S1) for the period of 30 days up to and including each sampling.

Map information
To relate headwater TOC to catchment characteristics, we began with 34 catchment parameters (Table S1, Figs.S3-S8) taken from the Swedish land cover data map (SMD), year 2000, version 2.1, which is based on the CORINE database (Bossard et al., 2000) as well as Geology and Quaternary deposits from the Geological Survey of Sweden (SGU) map, scale 1.1 million.The kNN-database of vegetation that has forestry variables estimated from LANDSAT 5 and LAND-SAT 7 satellite photos (version year 2000, Reese et al., 2002Reese et al., , 2003)), provided data about the average age and height of the forest as well as volume of the biomass for different tree species.All catchment map information uses the same version, year and scale of the maps so that the map data are commensurate between catchments (Table S1 in the Supplement).

Chemical analyses
After collection, all water samples were kept dark and cool until they were analysed.Total organic carbon (TOC) was measured by combustion and analysis as CO 2 using a Shimadzu TOC-VPCH analyser after acidification and sparging to remove inorganic carbon.Dissolved organic carbon (DOC) is the concentration of organic carbon in a filtered water sample (common cut-off is 0.45 µm).It has previously been shown that DOC and TOC differ on average by less than 5 % (Ivarsson and Jansson, 1994;Köhler, 1999), so TOC is essentially identical to DOC in the large majority of the Swedish surface waters (see also Gadmar et al., 2002;Laudon et al., 2011).

Statistical analysis
The main objective of this article is to model the TOC of individual headwaters based on map information and river outlet TOC.We use the following two-step approach where we first find the best map variables for predicting the average level and distribution around that level using PLS.Secondly, we use MM to predict individual headwaters within each catchments.
We are particularly interested in determining whether models based on geographical data, like lake surface coverage, forest coverage or altitude can be improved by adding data on TOC concentrations measured at the river outlet.

Modelling headwater median and interquartile range
To model the median and interquartile range of TOC in headwaters in different catchments we use partial least squares regression (PLS).Variables included in this model are TOC at the river outlet (OutletTOC) and a number of variables describing information derived from land cover, geology, soil type maps and vegetation (kNN) (Table S1).The main purpose of PLS was to narrow down the number of explanatory variables for subsequent use in the mixed-models approach.All data, both explanatory and response variables, were centred by mean normalisation and weighted by dividing the variables with the standard deviation prior to PLS analysis in SIMCA for Windows v13.0 (Umetrics).PLS identifies the relationship between explanatory variables and response variables through a linear model, and is less sensitive to correlated explanatory variables (so-called multicollinearity) when compared to multiple linear regression approaches (Geladi and Kowalski, 1986), since the explanatory variables are combined to factors.
In the PLS analyses, the goodness-of-fit parameter Q 2 was used to quantify the model performance, which is the average (n = 7, default value in SIMCA) explained variance of a randomly selected fraction (1/7 of the data) of the validation data not used to fit the model.In robust models, R 2 and Q 2 are often similar, but the latter will decline as models become increasingly over-fit.Even though PLS models work by defining factors, i.e. combinations of explanatory variables, it is also possible to compute coefficients and weights that describe the direction and relative strength of the individual explanatory variables on the response variable; weights with larger absolute values indicate greater importance to a given latent component.All PLS-models were refined by iteratively removing variables that had non-significant coefficients.This procedure served to minimise the difference between R 2 and Q 2 values while retaining high explanatory power, i.e. to find a model that can be generalized to new data, while retaining good explanatory power.The relative importance of each explanatory variable is ranked using variable importance on the projection (VIP) scores, derived as the sum of square of the PLS weights across all components.VIP values greater than one are considered to indicate variables that are most important to the overall model (Eriksson et al., 2006).
PLS allows for more explanatory variables than observations and gives us the possibility to include many candidate variables.Still, some of the variables available needed to be excluded for the following reasons: -Some variables, e.g.volume of oak, had zero value for all observations or only few observations different from zero.These variables could not be included due to the lack of variation.
-Geographical variables for the river outlet were not included, since they correlate highly with the median of the corresponding variable at the headwater scale.The latter is considered to bear more information and was therefore included.
After running the three different PLS models we determined whether the same variables appeared to be important (VIP > 1) in the model fittings.These variables were taken as good candidates to reproduce the TOC level of the headwaters and therefore included in the mixed modelling of the individual headwaters in the next step (Sect.2.5.2).An additional test set was created, Test PLS 02&05 c3 , which was comprised of data from the two catchments sampled once in 2002 and twice in 2005 (catchments c3, n = 3; O 2 , K5s, K5w, where s stands for summer and w for winter).No calibration was done on this data set, but was used for testing robustness of the models with respect to seasonality.

Modelling individual headwaters
When modelling individual headwaters we want to predict specific values for each of the different headwaters in all catchments.As an effort to improve these simulations, we make an assumption that headwaters within the same mesoscale catchment are more similar to each other than to headwaters from other mesoscale catchments due to subtle combinations of physiographic, weather and other factors which combine to influence the TOC levels in ways which are not readily apparent from the available map information, but might be reflected in differences between the average TOC levels in the different mesoscale catchments.This assumption leads to a new data structure.To model this data structure we use hierarchical linear mixed models (MM; Littell et al., 2006), which allow the estimation of the correlation between headwaters within the same mesoscale catchment and adjusts the analysis accordingly.A MM does not allow for highly correlated explanatory variables, so the number of explanatory variables must be substantially smaller than the number of observations.To fit these models we use candidate explanatory variables from the PLS approach described in Sect.2.5.1.In the PLS analysis some explanatory variables were excluded due to a large number of zero values giving rise to a median sub-catchment value of zero for all catchments.One of these parameters, lake surface area, may still be important for modelling individual headwaters.If lakes have a moderately large volume (appreciable residence time) they are known to influence the organic content (Eriksson, 1929;Birge and Juday, 1926).Therefore lake cover surface is expected to have an influence on the prediction of individual TOC HW , and thus this variable was included as a potential predictor in the MM models even though this variable was not important in the PLS modelling.
For fitting of the MM, Akaike information criterion (AIC) (Akaike, 1974) and p values were used.AIC is a goodnessof fit measure, which is corrected for the complexity of the model, similar to the adjusted R 2 .The p-values in regression models determine if parameter estimates are significantly different from zero, i.e. if there is a significant relationship between an explanatory variable and the response variable.The p values were calculated according to Kenward and Roger (1997) using "krmodcomp" in R package "pbkrtest" (version 0.4-1).During the model fitting, added variables were checked to see if they increased the predictive ability of the model by computing the prediction error sum of squares (PRESS).The smaller the PRESS value the closer the prediction is to the observed values.Kenward and Roger (1997) version of R 2 for predictions is called P 2 (similar to Q 2 for PLS).The P 2 were calculated according to Méndez Mediavilla et al. ( 2008), with the modification that instead of leave-one-out validation we compute P 2 on the evaluation test sets: P 2 = 1-PRESS/TSS where TSS is the total sum of squared differences between modelled values and the mean of observations in the evaluation set.Median absolute errors (MedAE) and median relative errors (MedRE%) were also calculated.
As an additional step in the evaluation of the models the most successful MM from the nine MM calibrations was tested on the sites between the headwaters and the river outlets, the intermediate sites (n = 501).

Results
For all synoptic surveys the median TOC HW values were higher than the values at the respective outlets (Table 2), with large differences (> 20 %) for 14 of 17 synoptic surveys.The median TOC HW for all surveys together (12 mg L −1 ) was also higher than the median outlet TOC of 10 mg L −1 (Fig. 2 and Table 2).For all synoptic surveys, except A8, there was a funnel-shape in the TOC concentration with larger variation in smaller catchments that attenuates with increasing catchment size (Fig. 2).Reproducing this variation in the headwaters (Fig. S2), and assigning individual headwaters to the proper value within that large variation, is one of the challenges of modelling water chemistry in a landscape perspective.

Modelling headwater median
The first principal component (PC) of the PLS-model of median headwater TOC was significant for both PLS calibration sets for the years 2007 and 2008, but not the second PC.No significant PLS-model was established using calibration set for the year 2000 (Cal PLS 00 c1 ).Calibration using data set 2007 (Cal PLS 07 c2 ) gave higher R 2 and Q 2 than using the data set for 2008 calibration (Cal PLS 08 c2 ) (Table 3).The calibration of the model using the Cal PLS 08 c2 data was evaluated by using the test data.This yielded a PRESS for the median TOC that was lower than when the model was calibrated using Cal PLS 07 c2 (Table 3).Based on PLS-modelling  1 and 2 for description of surveys.
of median headwater TOC, suitable candidates for the mixed models (MM) of individual headwaters were altitude of sampling sites, OutletTOC, proportion of clear-felled, coniferous forest, mixed forest, wet mires, coniferous forest on mires as well as the volume of birch-, spruce-and total forest volume.

Modelling headwater interquartile range (IQR)
For both PLS calibration sets year 2007 and 2008 the first principal component (PC) was significant in the PLS-models for IQR headwater TOC, but not the second PC.No signif-Table 3. Partial least square regression (PLS) results indicating the goodness of fit for the prediction of median and the interquartile range (IQR) of headwater total organic carbon (TOC) concentration (mg L −1 ).Cal is calibrated and 00, 07 and 08 refer to sampling year (2000, 2007 and 2008)  icant PLS-model was established using calibration set year 2000 (Cal PLS 00 c1 ).The calibration of the model using the Cal PLS 07 c2 data was evaluated by using the test data.This yielded a PRESS for TOC IQR that was lower than when the model was calibrated using Cal PLS 08 c2 (Table 3).Headwaters TOC IQR modelled by PLS indicates three variables as suitable candidates for the MM: OutletTOC, proportion of clear-felled area and birch volume.

Modelling individual headwaters
The PLS approach from Sect.3.1 identified the variables that can determine the median level of TOC, and the variation around that median, on a range of different catchments.In the following analysis we seek models to predict the TOC of individual watercourses at different locations and points in time.

Modelling individual headwaters and predicting those same headwaters at other points in time
When we fit the models to a calibration set, e.g.headwaters measured in 2007, we start with a base model consisting of the variables identified by the PLS model for the interquartile range, i.e.OutletTOC on the catchment scale and proportion wet mires as well as volume birch, with different values for different headwaters.The base model was fitted with a MM using catchment as the random factor describing the hierarchical structure.Other variables were included in a forward selection procedure always adding the most significant variable of the remaining set of variables.Candidate variables used in this were all land use variables (including lake surface coverage) and all variables giving the volume of different tree species with exception of the volume of oak and beech, since these volumes are generally very low and zero for many headwaters.After fitting the model the ability to predict new data was tested and non-significant variables in the model were individually removed to check if their removal also worsened the predictive ability of the model.The models created by this procedure are listed in Table 4.The models produced by Cal MM 07 c2 and Cal MM 08 c2 were very similar and can predict the data at the same sites quite well, i.e. the Cal MM 07 c2 model can predict the Test MM 08 c2 data set well and vice versa (Fig. 4-5 and Table 5).

Modelling individual headwaters and predicting new headwaters at other time points
When we use the calibration set Cal MM 00 c1 to fit a model, the variables selected (same procedure as in Sect.3.3.1)were OutletTOC, lake surface coverage and coniferous forest on mires.We evaluate this model by predicting values in the test sets Test MM 07 c2 and Test MM 08 c2 (Fig. 3).The best prediction model parameters are given in Table 4, with model performance in Fig. 3 and error results in Table 5. Predictions for new sites in the test set Test MM 00 c1 and Test MM 02&05 c3 are less satisfactory and indicate that the models might be over fitting the data.

Evaluation on intermediate sites
The headwater models were also tested on the sites of intermediate size (i.e.> 2 km 2 ) but excluding the river outlet (lower parts of Table 5).In general the intermediate sites were modelled as successfully as the headwaters (Table 5).Cal MM 07 c2 gave predictions for the intermediate sites which were not as good as for the other data sets, i.e. higher MedRE%, than Cal MM 08 c2 and Cal MM 00 c1 .

Evaluation of river outlets
To test the effect of including the river outlet TOC on the performance of MM predictions for individual headwaters, three versions of each calibration data set was used to create three different models, one using outlet TOC alone (Out), one using map information alone (Map), and one using both the outlet and map information (OutMap).The map variables were the same for each calibration data set but with different calibrated coefficients (Table 4).In 25 out of 27 different combinations of MM (three different calibration data sets, three versions of each calibration (Out, Map, OutMap) and three different test data sets), the OutMap version gave the best performance with the lowest PRESS, while two Map versions (map information only, no OutletTOC included) gave the lowest PRESS (Table 5 and Fig. 3-5).Similar results were observed for the intermediate sites (Table 5 and Fig. 3-5).The OutMap version gave 5-15 % better predictions than Map only.

Discussion
Twenty-five of twenty-seven tests, including river outlet chemistry (OutletTOC), resulted in lower errors in the mixed models predictions of the TOC for individual headwaters, and intermediate sites, compared to using map information alone.The measurements at the river outlet were necessary to reproduce more correct average headwater TOC levels (Table 5).Excluding the OutletTOC measurements leads to the assumption that average TOC levels in the headwaters were similar in different catchment stream networks if the map information is similar, which is not always true (cf. sampling 2007 and 2008).This is the first article to test how to include river outlet chemistry with map information for modelling headwaters, and how well the river outlet chemistry improved the models.
To predict the correct mean values for headwater TOC is still a challenge in our application of mixed models, since the calibration sets consist of 4-5 catchment systems, and these were sampled only a few times for each calibration.This clearly makes generalisations to new catchments or flowsituations difficult and uncertain.Calibration sets Cal07 and Cal08 share most of the catchments, but are measured during different years and perhaps more importantly, during different flow situations and seasons.Even if most headwaters are the same in Cal MM 07 c2 and Cal MM 08 c2 , and the models produced were similar, it was still not possible to account for more than about 50 % of the variation in the other set of data, indicating that there is large variability in time.
Weather is a factor that varies with time (and space) and influences stream water chemistry.In our approach we did not include the weather related data (temperature, precipitation, flow) in the models (PLS and MM) since it was not available for headwaters, but only for the river outlet.Presumably discharge for each headwater could benefit the models, but measuring discharge at all individual headwaters would have been very time consuming (and was not performed).To model discharge with appropriate accuracy at all these headwww.biogeosciences.net/13/399/2016/Biogeosciences, 13, 399-413, 2016 waters (size < 2 km 2 ) is so far too difficult due to large heterogeneity at these small scales (Lyon et al., 2012), and lack of precipitation data for all these headwaters.With the approach in this study we were able to achieve P 2 values around 50 %, indicating that about 50 % of the variation in the individual headwater test sets can be explained with a model including the explanatory variables OutletTOC, lake surface coverage and proportion of coniferous forests on mires.In two of the three calibrations, the proportion of broad-leaved forest and elevation was also significant.
Most lakes in these catchments are dimictic (mixing of the lake from the surface to bottom twice each year).Some of the data used in this study (year 2007 and 2008) have been used to evaluate the impact of lakes on stream water chemistry and there were indications that lake influence differs as a function of season, catchment and constituent (Lyon et al., 2011).The presence of lakes had a stronger influence on stream water TOC levels in October 2007 than in April 2008.Thus the presence of lakes could influence the impact of river outlet TOC on headwater TOC in MM.Lakes are known to often decrease TOC concentration (Müller et al., 2013;Weyhenmeyer et al., 2012), although this effect is not always visible at a landscape scale (Lottig et al., 2013) and lakes can also delay pulses of TOC within river networks, which is a complicating factor (Hytteborn et al., 2015).
The proportion of coniferous forest on mires had a positive sign and proportion of lake surface coverage had a negative sign for all calibration sets.This is plausible based on earlier studies (Andersson and Nyberg, 2009;Pers et al., 2001;Oni et al., 2013;Lottig et al., 2013;Clark et al., 2010).Ågren et al. (2014); Hope et al. (1997); Löfgren et al. (2014);Mattsson et al. (2003) and Walker et al. (2012) have also found that the amount of organic matter in the catchment soils (mire, wetor peat land proportion) is often positively correlated with stream TOC concentration (e.g.Mulholland et al., 2001), even if the extent of organic soils can be hard to estimate from maps (Creed et al., 2003;Johnston et al., 2008).
That broad-leaved forest had a negative coefficient for Cal07 c2 and Cal08 c2 (Table 4-5) could be related to several factors.In these systems most of the broad-leaved forest is made up of birch (Betula pendula).The negative coefficient of broad-leaved forest/birch could be a direct effect of birch on water chemistry; i.e. more birch in a coniferous landscape could give runoff with lower organic carbon (Brandtberg et al., 2000;Fröberg et al., 2011).However, in a set of explanatory variables like this, with a large amount of geographical information, many of the variables are correlated.This results in the fact that similarly good models could be found with other sets of explanatory variables.For instance, in our data set we found high correlations among volume of various tree species, furthermore volume of pine also had high correlation with these variables: proportion of coniferous forest, site altitude, volume of birch, broad-leaved forest and volume of spruce.
In this work we use calibration sets and test sets rather than the popular leave-one-out cross-validation method, since we have dependent, clustered data.Shao (1993) showed that leave-one-out cross-validation tends to select unnecessarily large models if observations are correlated.Libiseller and Grimvall (2003) showed that this is true for data that are serially correlated, since a single removed observation can be reproduced easily by observations in the temporal vicinity of the left-out observation.The same should hold for clustered data, where a single left-out headwater would be reasonably predicted by other headwaters in the same catchment.

Conclusion
Our modelling approach, using both river outlet TOC and map information from the headwater catchments, could explain up to 52 % of the variance in TOC among individual headwater streams.This is much better performance than an earlier attempt using river outlet TOC without map information (Temnerud et al., 2010), in which only one of seven models were significant for predicting headwater median TOC and none were significant for predicting headwater TOC IQR.The key factor in our approach here is the use of mixed models which allow the same headwaters to have different TOC depending on weather and flow etc.Since MM cannot use large numbers of correlated explanatory variables, PLS was used to identify a set of candidate explanatory variables for the MM.Since our combined approach increased the predictability for TOC, it would be interesting to evaluate whether the method could improve prediction of headwater pH and ANC, for which models using outlet catchment chemistry were already fairly successful (Temnerud et al., 2010).
In order to have the same map resolution for all catchments, due to lack of universal availability of fine-scale data, a rather coarse resolution was used (e.g.50 m grid data for altitude and soil map of scale 1.1 million).The Swedish authorities are LiDAR scanning all of Sweden to build a 2 m grid digital elevation model and are generating maps connecting all watercourses up to the headwaters (through lakes and wetlands), which by 2017 will provide new data that could help in modelling the headwaters.This improved map information might further improve the mixed model approach demonstrated here which includes river outlet chemistry.This will hopefully get us closer to the ability to predict individual headwaters that are such vital building blocks of aquatic ecosystems, but remain so very difficult to model.
The Supplement related to this article is available online at doi:10.5194/bg-13-399-2016-supplement.

Figure 1 .
Figure1.Map of Sweden with the nine investigated catchments, see Table1for coordinates.Labels in brackets are the abbreviated names of the catchments and year of sampling.

Figure 3 .
Figure 3. Scatterplots of measured headwaters with total organic carbon (TOC in mg L −1 ) on the x axis, and the three different versions of the mixed models Cal MM 00 c1 on the y axis: Out version on the left panel, OutMap version on the right panel and Map inbetween.Data for year 2000 (Cal MM 00 c1 ) on the top row in red text, followed by Test data; second row 2002 & 2005 data, third row is 2007 data and the last row is 2008 data.R. Anråse å indicated by circles, R. Danshytteån by diamonds, R. Getryggsån by rectangles, R. Krycklan by triangles (winter 2005 by upside-down triangles), R. Lugnån by squares, R. Mangslidsälven by multiplication sign, R. Ottervattsbäcken by upside-down triangles, R. Vänjaurbäcken by right tilted triangles and R. Viggan by plus sign.The black line is the 1:1 line.

Figure 4 .
Figure 4. Scatterplots of measured headwaters with total organic carbon (TOC in mg L −1 ) on the x axis, and the three different versions of the mixed models Cal MM 07 c2 on the y axis: Out version on the left panel, OutMap version on the right panel and Map inbetween.Data for year 2007 (Cal MM 07 c2 ) on the third row in red text, followed by Test data; first row 2000 data, second row 2002 & 2005 data and the last row is 2008 data.R. Anråse å indicated by circles, R. Danshytteån by diamonds, R. Getryggsån by rectangles, R. Krycklan by triangles (winter 2005 by upside-down triangles), R. Lugnån by squares, R. Mangslidsälven by multiplication sign, R. Ottervattsbäcken by upside-down triangles, R. Vänjaurbäcken by right tilted triangles and R. Viggan by plus sign.The black line is the 1 : 1 line.

Figure 5 .
Figure 5. Scatterplots of measured headwaters with total organic carbon (TOC in mg L −1 ) on the x axis, and the three different versions of the mixed models Cal08 on the y axis: Out version on the left panel, OutMap version on the right panel and Map in-between.Data for year 2008 (Cal MM 08 c2 ) on the last row in red text, followed by Test data; first row 2000 data, second row 2002 & 2005 data and the third row is 2007 data.R. Anråse å indicated by circles, R. Danshytteån by diamonds, R. Getryggsån by rectangles, R. Krycklan by triangles (winter 2005 by upside-down triangles), R. Lugnån by squares, R. Mangslidsälven by multiplication sign, R. Ottervattsbäcken by upside-down triangles, R. Vänjaurbäcken by right tilted triangles and R. Viggan by plus sign.The black line is the 1 : 1 line.
Table 1 for coordinates.Labels in brackets are the abbreviated names of the catchments and year of sampling.

Table 1 .
Characteristics of the rivers.Cluster is the grouping of calibration and test sets used in the modelling.Air temperature (T • C) and specific discharge (q mm day −1 ) are the medians of daily values for 1990-2010, precipitation (P mm) is the median of the yearly sum.P , T and q are modelled by SMHI, see Methods for more details.

Table 2 .
Median concentration of total organic carbon (TOC, mg L −1 ), with 25th and 75th percentiles in brackets, plus weather parameters (T , P , q) using the median value from the 30 days prior to sampling as modelled by SMHI (See Methods for more details).Clusters are the different groups of calibration and test sets used in the modelling.Sets are the different data sets used, with acronyms corresponding to river names (Table1) and the last digit of the sampling year.M = Month of sampling, HW is headwaters and n HW = number of sampled HW.
. For this observations for 2008 (Test MM 08 c2 ) were predicted by the model calibrated with data from 2007 (Cal MM 07 c2 ).The observations from 2007 (Test MM 07 c2 ) were also predicted based on a model calibrated using the 2008 data (Cal MM 08 c2 ).Test MM 02&05 c3 ), 2007 (Test MM 07 c2 ) and 2008 (Test MM 08 c2 ) respectively were predicted by models calibrated using data from 2000 on an entirely different set of catchments (Cal MM 00 c1 ) and of catchment (number = n).PRESS is the prediction error sum of squares.

Table 4 .
Coefficients for the best-fit hierarchical linear mixed models (MM), where Log 10 (TOC HW ) is the response variable.See the Method section for more details.Is the proportion of the entire catchment area covered by this particular feature. *

Table 5 .
Hierarchical linear mixed models (MM) results predicting headwater (HW) organic carbon (TOC) concentration in Log10.Cal is calibrated and 00, 07 and 08 refer to sampling year(2000, 2007 and 2008).The different coefficients are found in Table4.Each calibration has three versions (intercept is always included): Out, Map and OutMap.The Out version includes OutletTOC, the Map version includes the map information, while OutMap includes both OutletTOC and map information.The prediction error sum of squares (PRESS) is the squared differences between observed and predicted values for the Y-data kept out of the model fitting in the Test sets (00 stands for year 2000, 02&05 for 2002 and 2005, 07 for 2007 and 08 for 2008).The bold values show the lowest PRESS of the three versions for that Test data, bold PRESS per test sites are the lowest for that version.Intermediate sites (is) stands for sites between headwaters and the river outlet.R 2 for predictions is called P 2 (similar to Q 2 for PLS): P 2 = 1-PRESS/TSS where TSS is the total sum of squared differences between modelled and the mean of observations.AE is absolute error and RE is relative error, calculated on TOC in mg L −1 .