Estimating nitrogen fluxes at the European scale by upscaling INTEGRATOR model outputs from selected sites

G. J. Reinds, G. B. M. Heuvelink, T. Hoogland, J. Kros, and W. de Vries
Alterra, Wageningen University and Research Centre, P.O. Box 47, 6700 AA Wageningen, The Netherlands
ISRIC World Soil Information, P.O. Box 353, Wageningen University, 6700 AJ Wageningen, The Netherlands
Soil Geography and Landscape, Wageningen University, P.O. Box 47, 6700 AA Wageningen, The Netherlands


Introduction
The intensification of agricultural production by enhanced nitrogen use over the past few decades has strongly increased global food production, but at high environmental costs (e.g. Smil, 1999; Tilman et al., 2001). The environmental effects of this intensification are manifested through the loss of nitrogen (N) to the atmosphere and hydrosphere, which causes a number of ecological and human health effects, such as (i) decreased biodiversity of terrestrial and aquatic ecosystems caused by eutrophication and acidification, (ii) excessive nitrate (NO3) concentrations in drinking water, caused by elevated NO3 leaching to groundwater, and (iii) global climate change, induced by emissions of nitrous oxide (Galloway et al., 2003, 2008; Erisman, 2004).
To gain insight into the risks associated with N use at the European scale, the model INTEGRATOR (De Vries et al., 2011a, c; Kros et al., 2012) has been developed. In response to N inputs, it assesses N output fluxes from agricultural and terrestrial ecosystems in Europe to (1) air, i.e. emissions of ammonia (NH3), nitrous oxide (N2O), nitrogen oxides (NOx) and dinitrogen (N2), and (2) water, i.e. N leaching to groundwater and N runoff to surface water.
INTEGRATOR calculates these output fluxes for 36 466 calculation units (NCUs: NitroEurope Calculation Units), which are unique combinations of soil type, administrative region, slope class and altitude class with a mean size of 199 km2. To gain insight into the uncertainty of the N fluxes, the NitroEurope project (www.nitroeurope.eu) foresaw a comparison of results from different complex dynamic models that predict soil N fluxes at the European scale, such as DayCent (Del Grosso et al., 2005), DNDC (e.g. Li et al., 2000) and EPIC (Bouraoui and Aloe, 2007; Van der Velde et al., 2009), including an uncertainty analysis for each model. However, this requires a Monte Carlo type of uncertainty quantification/uncertainty analysis (UQ/UA) of these complex models, which cannot be done for all 36 466 NCUs: with hundreds of Monte Carlo runs for each NCU, it would be too computationally demanding. Therefore it was decided to limit the analysis to a sample of 450 sites within Europe, a realistic number to process in a UQ/UA with the complex dynamic models. The UQ/UA results can be scaled up to the whole of Europe with statistical quantification of the upscaling error, provided that the sample of sites is chosen using probability sampling (De Gruijter et al., 2006).
In this paper, a comparison is made between upscaled model results from 450 selected sites within Europe and the results based on simulations for the whole of Europe (i.e. all NCUs), using the model INTEGRATOR. To enable statistical upscaling of results to the entire EU, sites were selected using stratified random sampling. With appropriate statistical inference, this yields not only an estimate but also the estimation error in statistical terms (i.e. as a probability distribution). Deterministic quantification of the upscaling error (i.e. the actual value of the error) is achieved by a direct comparison between results from the 450 sites and those obtained for the whole of EU-27. This is feasible because a single run of INTEGRATOR takes much less computing time than a run of the more complex dynamic models.
The aim of the study was to assess whether a reliable estimate of the total European greenhouse gas emission can be made by upscaling results for the selected sites. First the model INTEGRATOR is briefly described, followed by a description of the plot selection by stratified random sampling and of the statistical methodology used to upscale results to the European and country scale. Next the results of the plot selection are presented, followed by the results on N fluxes using both methods. Results focus on the comparability of N inputs by fertilizer and manure, N uptake and N surplus, NH3 and N2O emissions, and N leaching/runoff. Based on the comparison, the reliability of the sampling and upscaling procedure is discussed.

The model INTEGRATOR
INTEGRATOR has been developed to assess responses of N and greenhouse gas (GHG: N2O, CH4 and CO2) emissions to European-scale changes in land use, land management and climate at a high spatial resolution, both in the past and in the future, focusing on changes in the period 1970-2030 (De Vries et al., 2011a). It covers major agricultural and non-agricultural ecosystems (grassland, arable land, forests, and short vegetation) and includes interactions between agricultural land and non-agricultural land via gaseous emissions of NH3 (and NOx) and the resulting N deposition. INTEGRATOR includes complete N balances, and can be applied at EU level. To achieve all these aims, INTEGRATOR (i) uses relatively simple and transparent model calculations based on the use and adaptation of available modelling approaches, (ii) includes empirical modelling approaches, using statistical relationships between model outputs and environmental variables, and (iii) focuses on the derivation of high-resolution spatially explicit input data. The INTEGRATOR concept is based on an appropriate balance between model complexity and data availability.
For N fluxes, which are the focus of this paper, INTEGRATOR includes sub-models to predict
- NH3, NOx, N2O and CH4 emissions from housing and manure storage systems and agricultural soils, based on the adapted MITERRA-Europe model;
- NOx and N2O emissions from non-agricultural terrestrial systems, based on a meta-model of DNDC (Kesik et al., 2005; De Vries et al., 2007) and empirical relationships relating N2O and NOx emissions to georeferenced N inputs and stand/site characteristics, including meteorological parameters and soil characteristics (Bloemerts and de Vries, 2009);
- N2O emissions from groundwater and surface waters according to a hole-in-the-pipe model (Keuskamp et al., 2012);
- N deposition (an emission-deposition matrix for NH3 and NOx), accounting for the interaction between agricultural and non-agricultural soils (EMEP, 2009).
To derive a complete N budget, background emissions and energy emissions are also included in INTEGRATOR, based on literature data (Simpson et al., 1999) and IMAGE (Bouwman et al., 2006) model calculations. Since this paper focuses on results from agricultural systems, obtained from an adapted version of MITERRA-Europe, this model is described briefly below. MITERRA-Europe is a deterministic and static N cycling model that calculates N emissions on an annual basis, using N emission factors and N leaching fractions that distinguish various manure types and manure application practices. The model can be used to assess the effects of measures and policies aimed at reducing emissions of ammonia (NH3), nitrous oxide (N2O), N oxides (NOx) and methane (CH4) to the atmosphere and leaching of N (including nitrate) to groundwater and surface waters, as well as their effects on the phosphorus (P) budget, at EU-27 level, country level and regional (NUTS-2) level. INTEGRATOR includes an adapted version of MITERRA-Europe with more detail in N2O and NOx emission factors, based on a literature study (Lesschen and Velthof, 2009).
The set of 450 sites in Europe for which INTEGRATOR was applied was obtained using stratified simple random sampling. Since land use is considered a dominant factor, we required each stratum to be homogeneous with respect to land use. The EU was divided into 150 geographical strata, such that the strata were also as homogeneous as possible with respect to the other environmental factors that control N fluxes. Next, three sites were randomly selected within each stratum. Simple random sampling of three sites from each of the 150 clusters was achieved by using the "spsample" function from the sp package for R (Bivand et al., 2008). Sampling was based on digital representations of the NCU clusters in the Lambert Azimuthal Equal Area projection (Geographic Coordinate System GCS ETRS 1989). The sampling procedure is explained in detail below.

Principle of stratified random sampling
Stratified random sampling starts by dividing the total population of M NCUs into K strata, ideally such that the variation of the target variable (i.e. greenhouse gas emission) within strata is small compared to the variation between strata. Each of the K strata contains M_k NCUs (k = 1, ..., K), and the sum of all M_k equals M. Within each stratum, m_k sites are chosen by simple random sampling. In our case M = 36 466, and we used K = 150 and m_k = 3. This yields a total of 450 sites. The important advantage of using m_k > 1 is that variability within strata can also be assessed. Using multiple sites from the same stratum allows comparison of model results at different sites with similar environmental conditions. The approach taken to define the K strata (or groups) conforms with conventional clustering techniques from multivariate statistics; see Kaufman and Rousseeuw (1990) or Davis (2002) for details.
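As an illustration of the principle only, the selection of m_k units per stratum can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used in the study: it draws units with equal probability from a list, whereas the study sampled point locations within mapped clusters, and the data layout and function name are hypothetical.

```python
import random

def stratified_sample(strata, m_k=3, seed=0):
    """Draw m_k units without replacement from each stratum.

    strata: dict mapping a stratum label to a list of unit ids
            (hypothetical layout, chosen for this example).
    """
    rng = random.Random(seed)
    sample = {}
    for label, units in strata.items():
        if len(units) < m_k:
            raise ValueError(f"stratum {label!r} has fewer than {m_k} units")
        # Simple random sampling without replacement inside the stratum.
        sample[label] = rng.sample(units, m_k)
    return sample

# Toy population: 4 strata of 10 units each (not the real NCU data).
toy_strata = {k: [f"ncu_{k}_{i}" for i in range(10)] for k in range(4)}
toy_sample = stratified_sample(toy_strata, m_k=3, seed=42)
n_sites = sum(len(sites) for sites in toy_sample.values())
```

With K = 150 strata and m_k = 3, the same loop would return the 450 sites mentioned in the text.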

Factors controlling clustering
The M NCUs were clustered into groups that are homogeneous with respect to the expected model output, in this case annual N emission. Clustering was therefore based on important environmental factors that are known to influence emission, i.e. (i) land use, divided into arable land, grassland (including rough grazing) and nature (including forests but excluding wetlands), (ii) N manure application and N grazing, (iii) N fertilizer application, (iv) mean annual temperature, (v) total annual precipitation and (vi) soil type/texture, divided into sand, clay and peat. Loamy soils were added to the clay soils because of comparable hydrological characteristics. Wetlands were excluded (220 NCUs), and so were 2 NCUs for which INTEGRATOR calculated N manure inputs greater than 400 kg ha−1, because these values were considered unrealistic. As a result the total number of NCUs was reduced from 36 466 to 36 244. Data for precipitation and temperature for each NCU were derived from Mitchell et al. (2004). The assessment of N inputs for each NCU is described in detail in De Vries et al. (2011b). Soil texture is an attribute of each NCU, obtained from the European Soil Database polygon map (Panagos, 2006).
The subsequent cluster analysis was applied to each of the three land use groups (the initial clusters) separately. Of the 36 244 NCUs, 11 375 comprise arable land (1.763 million km2), 9789 comprise managed grass and rough grazing (grassland in pastoral use, 0.866 million km2), and 15 080 are nature NCUs (1.923 million km2). Differences in inclusion probabilities were taken into account when the results of the 450 sites were scaled up to the entire EU (see Sect. 2.3).

Initial clustering per land use type by rounding
To reduce the number of objects entering the cluster analysis (initially the NCUs) for each of the three land use types, we grouped NCUs with similar values of the continuous numerical environmental factors by rounding these factors to classes. For each NCU, annual precipitation less than 50 mm was rounded to multiples of 10 mm and annual precipitation greater than 50 mm was rounded to multiples of 50 mm. Mean annual temperature was rounded to multiples of 0.5 °C.
Applications of N through chemical fertilizer and animal manure less than 50 kg ha−1 yr−1 were rounded to multiples of 1 kg ha−1 yr−1, and applications greater than 50 kg ha−1 yr−1 were rounded to multiples of 10 kg ha−1 yr−1. After rounding, the original set of 36 244 NCUs was reduced to 11 629 unique combinations, which substantially reduced the computational burden of the cluster analysis.
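The rounding rules above can be sketched as follows. This is an illustrative sketch only: the function names and the dictionary keys are hypothetical, the text does not say how a value of exactly 50 is treated (here it falls in the coarser class), and Python's built-in round resolves exact ties to the nearest even multiple.

```python
def round_to_multiple(value, step):
    """Round value to the nearest multiple of step."""
    return round(value / step) * step

def round_precipitation(p_mm):
    # Below 50 mm: multiples of 10 mm; otherwise multiples of 50 mm.
    return round_to_multiple(p_mm, 10) if p_mm < 50 else round_to_multiple(p_mm, 50)

def round_temperature(t_c):
    # Multiples of 0.5 degrees Celsius.
    return round_to_multiple(t_c, 0.5)

def round_n_input(n_kg):
    # Below 50 kg/ha/yr: multiples of 1; otherwise multiples of 10.
    return round_to_multiple(n_kg, 1) if n_kg < 50 else round_to_multiple(n_kg, 10)

def class_key(ncu):
    """Combination of rounded factors; NCUs sharing a key are grouped."""
    return (
        round_precipitation(ncu["precip"]),
        round_temperature(ncu["temp"]),
        round_n_input(ncu["n_manure"]),
        round_n_input(ncu["n_fert"]),
        ncu["soil"],
    )

# Two made-up NCUs that end up in the same class after rounding.
ncu_a = {"precip": 734, "temp": 9.3, "n_manure": 123, "n_fert": 37.4, "soil": "sand"}
ncu_b = {"precip": 760, "temp": 9.6, "n_manure": 118, "n_fert": 36.8, "soil": "sand"}
same_class = class_key(ncu_a) == class_key(ncu_b)
```

Counting the distinct keys over all NCUs of a land use type gives the number of unique combinations that enter the cluster analysis.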

Data transformation
The relationship between annual N emission and annual temperature and precipitation is known to be non-linear. Therefore, annual temperature and precipitation were transformed prior to entering the cluster analysis. For temperature a square root transformation and for annual precipitation a sigmoid transformation were used:
$$T_{tr} = \sqrt{T + T_0}$$
$$P_{tr} = \frac{1}{1 + e^{-c\,(P - h)}}$$
where T_tr and P_tr are the transformed mean annual temperature (T) and precipitation (P), respectively. The coefficients were taken as T_0 = 7 °C, c = ln(9)/500 mm−1, and h = 1000 mm. T_0 was chosen such that no NCU had a negative value for T + T_0; the sigmoid transformation parameters were chosen such that precipitation values of 500, 1000 and 1500 mm yielded transformed values of 0.10, 0.50 and 0.90, respectively, assuming that precipitation amounts above 10 000 mm will lead to soils being wet for large parts of the year with associated high N2O emissions.
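As a quick numerical check of the stated coefficients, the two transformations can be written in Python. The sketch assumes the square root form sqrt(T + T_0) and the logistic form 1 / (1 + exp(−c (P − h))), which are the forms that reproduce the transformed values of 0.10, 0.50 and 0.90 quoted above; the function names are hypothetical.

```python
import math

T0 = 7.0                  # degrees Celsius, shift that keeps T + T0 non-negative
C = math.log(9) / 500.0   # per mm, steepness of the sigmoid
H = 1000.0                # mm, precipitation at which the sigmoid equals 0.5

def transform_temperature(t):
    """Square root transform of mean annual temperature (assumed form)."""
    return math.sqrt(t + T0)

def transform_precipitation(p):
    """Sigmoid (logistic) transform of annual precipitation (assumed form)."""
    return 1.0 / (1.0 + math.exp(-C * (p - H)))

checks = [transform_precipitation(p) for p in (500.0, 1000.0, 1500.0)]
```

The choice c = ln(9)/500 follows from requiring 1 / (1 + e^{500c}) = 0.1, i.e. e^{500c} = 9.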

Calculating dissimilarities
Most clustering techniques make use of so-called dissimilarity matrices, which measure the dissimilarity or "distance" between objects. The dissimilarity is calculated from a weighted comparison of the differences between the factor values of a pair of objects. Thus, the dissimilarity between two objects p and q (i.e. NCUs) may be computed as
$$D(p, q) = \sum_{f=1}^{F} \alpha_f \, D_f(p, q)$$
where F is the number of factors, the D_f are individual factor distances and the α_f are weights. In the case of NCU clustering, there are five factors: N manure application, N fertilizer application, transformed annual temperature, transformed annual precipitation and soil texture class (recall that land use has already been dealt with). For soil texture, we used D_soil(sand, clay) = 0.6, D_soil(sand, peat) = 0.8, and D_soil(clay, peat) = 0.4; we thus assumed the largest dissimilarity between sand and peat soils. Soil texture dissimilarities between two NCUs with the same soil texture are zero. The distances D_f of the continuous numerical factors were computed by taking the absolute differences of the standardised values of the factors. Standardisation of factor values was achieved by subtracting the mean factor value among all NCUs from each observation and dividing by their standard deviation. The weights were chosen as α_soil = 0.20, α_Nmanure = 0.40, α_Nfert = 0.25, α_Temp = 0.10 and α_Prec = 0.05. We thus assumed that e.g. differences in the amount of manure between two sites are more important than differences in temperature or precipitation. The weights were assigned based on the relative importance of environmental factors for N emissions, as derived from literature data and expert knowledge.
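A minimal sketch of this dissimilarity, assuming it is the weighted sum of the factor distances (the weights listed above add up to 1) and that the continuous factors have already been standardised. The dictionary keys and function names are hypothetical.

```python
# Weights and soil texture distances as listed in the text.
WEIGHTS = {"soil": 0.20, "n_manure": 0.40, "n_fert": 0.25, "temp": 0.10, "prec": 0.05}
SOIL_DISTANCE = {
    frozenset(("sand", "clay")): 0.6,
    frozenset(("sand", "peat")): 0.8,
    frozenset(("clay", "peat")): 0.4,
}
CONTINUOUS = ("n_manure", "n_fert", "temp", "prec")

def soil_distance(a, b):
    """Distance between two soil texture classes; zero if they are equal."""
    return 0.0 if a == b else SOIL_DISTANCE[frozenset((a, b))]

def dissimilarity(p, q):
    """Weighted sum of factor distances between two objects.

    p and q are dicts holding a soil class and standardised values of the
    continuous factors (assumption: standardisation was done beforehand).
    """
    d = WEIGHTS["soil"] * soil_distance(p["soil"], q["soil"])
    for f in CONTINUOUS:
        d += WEIGHTS[f] * abs(p[f] - q[f])
    return d

# Made-up standardised values for two objects.
p = {"soil": "sand", "n_manure": 1.0, "n_fert": 0.0, "temp": 0.5, "prec": -0.2}
q = {"soil": "peat", "n_manure": 0.0, "n_fert": 0.0, "temp": 0.5, "prec": 0.2}
d_pq = dissimilarity(p, q)  # 0.20*0.8 + 0.40*1.0 + 0.05*0.4
```

Evaluating this function for every pair of objects fills the dissimilarity matrix that the clustering algorithms take as input.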

Comparison of three cluster techniques
The dissimilarities D were used to group objects into clusters. The general principle is that the dissimilarity between objects of the same cluster should be as small as possible. Three techniques were evaluated: (i) hierarchical agglomerative clustering (HAC), (ii) hierarchical divisive clustering (HDC) and (iii) k-means clustering (KMC). The most satisfactory technique was used to define the strata used for sampling. HAC techniques (Kaufman and Rousseeuw, 1990) start with as many clusters as there are objects: at first, each object is a small cluster by itself. At each stage the two nearest clusters are combined to form one larger cluster, and merging continues until only one large cluster remains, which contains all the objects. This requires a definition of what "nearest" means. The "Agnes" algorithm from the R cluster package (Maechler et al., 2009) distinguishes several options. In the most common approach, the distance between two clusters is the mean of the dissimilarities between the objects in one cluster and the objects in the other cluster. The clustering may be stopped when the required number of clusters has been reached. In our case we used 50 final clusters per land use type.
HDC algorithms construct a hierarchy of clusters starting with one large cluster containing all objects (i.e. all NCUs of a given land use type). We used the "Diana" algorithm from the R cluster package (Maechler et al., 2009). Clusters are split until each cluster contains only a single object. At each stage, the cluster with the largest diameter is selected and split. The diameter of a cluster is defined as the largest dissimilarity between any two of its objects. To divide the selected cluster, the algorithm first looks for its most disparate object (i.e. the object with the largest mean dissimilarity to the other objects of the selected cluster). This object initiates the "splinter group". In subsequent steps, the algorithm reassigns objects that are closer to the "splinter group" than to the "old party". The result is a branching tree that can be cut at any desired number of final clusters (i.e. 50 per land use class).
The KMC algorithm (Hartigan and Wong, 1979) is not hierarchical and therefore neither divisive nor agglomerative. In this case the algorithm starts with arbitrarily chosen centroids, one for each of the desired clusters. Next all objects are assigned to their nearest cluster. Once all objects have been assigned to clusters, the centroids are computed again. In case of a categorical variable such as soil texture type, the centroid takes the dominant value. Next, all objects are reallocated based on the new centroids, and the process is repeated iteratively until no more changes occur. We used the "PAM" algorithm, which is fully described in Kaufman and Rousseeuw (1990).
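The assign-and-update iteration described above can be sketched for a precomputed dissimilarity matrix, using medoids (actual objects) as cluster centres. This is a simplified alternating scheme written for illustration; it is not the PAM build-and-swap algorithm of Kaufman and Rousseeuw (1990) and not the R implementation used in the study, and the function name is hypothetical.

```python
import random

def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix.

    dist: square list of lists, dist[i][j] = dissimilarity between i and j.
    Returns (medoids, labels) with labels[i] the cluster index of object i.
    """
    n = len(dist)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)  # arbitrary starting medoids
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: the new medoid is the member with the smallest
        # summed dissimilarity to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:  # no change: converged
            break
        medoids = new_medoids
    return medoids, labels

# Toy example: two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2, seed=1)
```

Working on the dissimilarity matrix rather than on coordinates is what allows a categorical factor such as soil texture to be included alongside the continuous factors.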
We used the implementations of the Agnes, Diana and PAM algorithms in the "cluster" library of the R statistical software package (http://www.r-project.org/). The dissimilarity matrices for arable land, grassland and nature were calculated and used to assess the performance of the three algorithms. To evaluate and compare the algorithms, we calculated the homogeneity of the 150 clusters and the cluster sizes (i.e. the number of NCUs grouped into a cluster). Homogeneity was evaluated by calculating the summed within-cluster variances for each of the four continuous-numerical factors and the summed within-cluster entropy for soil texture (Brus et al., 2008).
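The two homogeneity measures can be sketched as follows. The exact definitions used in the study are not given in the text, so this sketch assumes the population variance per cluster and the Shannon entropy with natural logarithm, each summed over clusters; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import pvariance

def _group(values, labels):
    """Collect values per cluster label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return groups

def summed_within_cluster_variance(values, labels):
    """Sum over clusters of the variance of a continuous factor (assumed form)."""
    return sum(pvariance(g) for g in _group(values, labels).values())

def summed_within_cluster_entropy(categories, labels):
    """Sum over clusters of the Shannon entropy of a categorical factor (assumed form)."""
    total = 0.0
    for g in _group(categories, labels).values():
        n = len(g)
        total -= sum((c / n) * math.log(c / n) for c in Counter(g).values())
    return total

# Toy data: four objects in two clusters.
labels = [0, 0, 1, 1]
var_homogeneous = summed_within_cluster_variance([1, 1, 3, 3], labels)
var_mixed = summed_within_cluster_variance([1, 3, 1, 3], labels)
entropy = summed_within_cluster_entropy(["sand", "sand", "clay", "peat"], labels)
```

Smaller values of both measures indicate clusters that are more homogeneous with respect to the factor considered.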

Spatial aggregation of model outputs to the European scale
Since the selected 450 sites are a stratified simple random sample from the entire EU-27, with known inclusion probabilities, upscaling of the results at these sites to the entire EU is relatively straightforward and also allows the assessment of the associated estimation error. The mathematical-statistical procedure used to obtain estimates of averaged model outputs for the entire EU-27 and for countries within EU-27, including the sampling error associated with the estimates, is described below.
Biogeosciences, 9, 4527-4536, 2012, www.biogeosciences.net/9/4527/2012/

Upscaling to EU-27
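The mean of model outputs across the EU-27, µ, was estimated by (De Gruijter et al., 2006)
$$\hat{\mu} = \sum_{i=1}^{K} a_i \, \hat{y}_i$$
with K the number of strata, a_i the relative area of stratum i (the stratum area A_i divided by the total area) and ŷ_i the mean of the model outputs y_{ij} at the m sampled sites in stratum i:
$$\hat{y}_i = \frac{1}{m} \sum_{j=1}^{m} y_{ij}$$
The variance of the estimation error is estimated by
$$\hat{V}(\mu - \hat{\mu}) = \sum_{i=1}^{K} a_i^2 \, \hat{V}(\hat{y}_i)$$
where V(ŷ_i) is the variance of ŷ_i, which is estimated by
$$\hat{V}(\hat{y}_i) = \frac{1}{m\,(m-1)} \sum_{j=1}^{m} \left( y_{ij} - \hat{y}_i \right)^2$$
Assuming a normal distribution of the estimation error, the lower and upper limits of the symmetric 95 % confidence interval for the mean µ are estimated by $\hat{\mu} - 1.96 \sqrt{\hat{V}(\mu - \hat{\mu})}$ and $\hat{\mu} + 1.96 \sqrt{\hat{V}(\mu - \hat{\mu})}$.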
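A minimal sketch of this upscaling step, assuming the standard estimator for stratified simple random sampling: the area-weighted sum of stratum sample means, with the error variance given by the sum of squared relative areas times the sample variance divided by the number of sites in the stratum. The function name and input layout are hypothetical.

```python
import math
from statistics import mean, variance

def upscale(strata):
    """Estimate the area-weighted mean and its 95 % confidence interval.

    strata: list of (relative_area, [model outputs at the sampled sites]);
            the relative areas are assumed to sum to 1.
    Returns (mu_hat, var_hat, (lower, upper)).
    """
    mu_hat = 0.0
    var_hat = 0.0
    for a_i, ys in strata:
        m = len(ys)
        if m < 2:
            raise ValueError("at least two sites per stratum are needed for the variance")
        mu_hat += a_i * mean(ys)
        # variance() is the sample variance (divisor m - 1); dividing by m
        # gives the estimated variance of the stratum mean.
        var_hat += a_i ** 2 * variance(ys) / m
    half_width = 1.96 * math.sqrt(var_hat)
    return mu_hat, var_hat, (mu_hat - half_width, mu_hat + half_width)

# Toy example with two equally large strata and three sites each.
mu_hat, var_hat, (lower, upper) = upscale([
    (0.5, [1.0, 2.0, 3.0]),
    (0.5, [4.0, 4.0, 4.0]),
])
```

In the toy example the second stratum contributes nothing to the variance because its three sites give identical outputs, which illustrates why homogeneous strata lead to narrow confidence intervals.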

Upscaling to countries
Upscaling to countries instead of the EU is easy when all K strata lie either entirely inside or entirely outside a country. In that case the equations above can be applied to the subset of strata that lie inside the country (i.e. sum only over the strata inside the country). When a stratum lies partly inside and partly outside the country, only the part of the stratum that lies in the country is included in the analysis and replaces the original stratum (note that this affects the stratum area A_i). The equations presented above are used as before, but in this case m no longer needs to be constant and equal to three: it is replaced by the number of stratum sites that are located inside the country, which may be smaller than three for strata that are partly outside the country. This creates a problem when m is smaller than two. When m = 1, the stratum variance can no longer be estimated, and when m = 0 neither can the mean value. For a stratum with m = 1, the variance could be estimated by merging the stratum with a neighbouring stratum (collapsed strata method, Cochran, 1977, p. 139), but in this study we applied the approach only for strata with m > 1.
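The selection of usable strata for one country can be sketched as below. This is an illustrative sketch only: the input layout, key names and function name are hypothetical, and how the areas of skipped strata were handled in the study is not specified in the text.

```python
from collections import defaultdict

def strata_for_country(sites, area_in_country, country):
    """Group site results by stratum for one country.

    sites: list of dicts with hypothetical keys 'stratum', 'country', 'y'.
    area_in_country: dict stratum -> area of the stratum part inside the country.
    Returns (usable, skipped): usable maps stratum -> (area, [y, ...]) for
    strata with at least two sites inside the country; skipped lists strata
    with area in the country but fewer than two sites there.
    """
    by_stratum = defaultdict(list)
    for site in sites:
        if site["country"] == country:
            by_stratum[site["stratum"]].append(site["y"])
    usable, skipped = {}, []
    for stratum, area in area_in_country.items():
        ys = by_stratum.get(stratum, [])
        if len(ys) >= 2:  # mean and variance can both be estimated
            usable[stratum] = (area, ys)
        else:
            skipped.append(stratum)
    return usable, skipped

# Toy example: stratum A lies fully in NL, stratum B straddles the border.
toy_sites = [
    {"stratum": "A", "country": "NL", "y": 1.0},
    {"stratum": "A", "country": "NL", "y": 2.0},
    {"stratum": "A", "country": "NL", "y": 3.0},
    {"stratum": "B", "country": "NL", "y": 5.0},
    {"stratum": "B", "country": "DE", "y": 6.0},
    {"stratum": "B", "country": "DE", "y": 7.0},
]
usable, skipped = strata_for_country(toy_sites, {"A": 10.0, "B": 4.0}, "NL")
```

Here stratum B is skipped for NL because only one of its three sites lies inside the country, so its variance cannot be estimated.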

Clustering and stratified random sampling
The results of the performance assessment of the agglomerative (AG), divisive (DI) and k-means partitioning around medoids (PAM) clustering methods, based on the dissimilarity matrices for arable land, grassland and nature, are presented in Table 1. Medoids are objects in a cluster with minimal mean dissimilarity to all other objects. The table shows the summed within-cluster variance for the various factors affecting N fluxes. Within-cluster variances of temperature and within-cluster entropies for soil texture are much smaller for nature than for grassland and arable land, which means that the 50 nature clusters are much more homogeneous with respect to mean annual temperature and soil texture. This can be explained by the fact that clustering for nature is not influenced by the factors manure and fertilizer, since these are zero for nature (hence zero within-cluster variances). Note also that the differences between precipitation variances for the three land uses are much smaller because precipitation has a much smaller weight. For nature, PAM has smaller variances than DI and AG, whereas for grassland and arable land AG has the smallest variances for the continuous-numerical factors (except for manure in arable land). PAM also has the smallest within-cluster entropy for soil texture. Overall, differences between the heterogeneity measures of the algorithms are not very large. Figure 1 gives the number of NCUs within each of the 50 clusters for arable land using agglomerative (AG), divisive (DI) and k-means (PAM) clustering. PAM proved to be most successful in creating clusters of uniform size, while maintaining acceptable summed within-cluster variances (Table 1).
The same results were obtained for grassland and nature (not shown). It was therefore decided to derive 50 clusters for each of the three land use types using k-means clustering with the PAM algorithm. Below, resulting maps and cluster properties are only presented for arable land.
Table 1. Summed within-cluster variance for mean annual temperature (°C), annual precipitation (mm), N manure (kg N ha−1 yr−1), fertilizer (kg N ha−1 yr−1) and summed within-cluster entropy for soil texture using agglomerative (AG), divisive (DI) and k-means (PAM) clustering.
Box plots of the values of the continuous-numerical factors for the 50 arable clusters are shown in Fig. 2. The largest differentiation in factor values between clusters is in annual application of N-manure and N-fertilizer. This is in agreement with the large weights assigned to these factors. Some clusters, such as clusters 6, 10, 19 and 25 (all located in North-West Europe), have large values for N-manure, whereas many have small values. Cluster 45 has the largest value for N-fertilizer, whereas cluster 26 has no N-fertilizer. Annual precipitation and temperature show much more overlap between clusters, indicating that clustering was not very successful in differentiating these factors (partly caused by the low weight assigned to them). The proportion of the three soil texture classes, peat, sand and clay, over the clusters (Fig. 3) shows that the majority of clusters are largely homogeneous with respect to soil texture; in almost all cases one of the soil texture classes is clearly dominant.
The majority of clusters are also geographically concentrated (Fig. 4). For instance, note that the Czech Republic is almost entirely grouped into a single cluster. The differentiation of the various clusters is mainly associated with distinct differences in the N input by fertilizer and manure (Fig. 4). Simple random sampling of three sites within each of the 150 clusters yielded the 450 sites given in Fig. 5.

Results at European level
For arable land, the mean total N inputs, crop N uptake, emissions of N2O, NH3 and NOx to the atmosphere and losses of N to groundwater and surface water based on the 450 selected points compare well with the mean computed with full areal support (Table 2). The overall difference is always less than 10 %, and the 95 % confidence interval includes the value obtained by running INTEGRATOR for the whole of Europe. Noteworthy is also the difference in the computed confidence intervals: for most balance terms, the 450 points yield a narrow interval, but for the emissions of N2O and NOx the uncertainty is larger. This is most likely due to the many factors that influence these emissions, such as rainfall, temperature and soil characteristics, which have a large spatial variability. For nature, the mean N inputs and N losses from both methods are also mostly within 10 %. The emission of N2O is even identical for both methods (Table 3). The NH3 flux from full areal support, however, is approximately 30 % larger than that from the 450 points and lies outside the 450-point confidence interval. This is most likely due to the influence of N deposition: with full areal support the mean N deposition is higher than the mean of the 450 points. This could be due to the way the points are sampled or to the non-normal distribution of the N deposition. In INTEGRATOR, NH3 emissions increase with higher N deposition; including NCUs with higher N deposition thus causes a higher mean emission. The same holds for other balance terms influenced by N deposition, such as N uptake and N leaching. The results for agricultural land based on full areal support are also presented and discussed in detail in De Vries et al. (2011b) and those for non-agricultural land in De Vries et al. (2011c). Results for grassland are not discussed because emission from grassland in INTEGRATOR is computed with MITERRA for managed grass and with the DNDC meta-model for unmanaged grass.
First, since both methods are already used for arable land and nature, for which the results are extensively described, adding grassland would not provide new insights into how well the method performs. Second, since grassland consists of both managed and unmanaged grass, results are not easily interpretable because a mix of two models is used to obtain them.

Results at country level
Results for countries are presented only for those countries where, for a given land use type (nature and arable land), at least two points fall within the country area, so that a mean and standard deviation can be computed. As an example, results are presented for the N2O emission. The mean N2O emission from arable land calculated for the countries from the sample plots within the country differs from the results based on the full areal support (Fig. 6, upper left). For most countries the absolute difference is < 0.5 kg ha−1 yr−1, but for Austria, UK and Hungary the difference is > 1 kg ha−1 yr−1. With the exception of Hungary, the variance in the computed emissions at the selected plots within a country is small, as shown by the narrow error bars in the graphs. For Hungary the plots in the sample yield very different N2O emissions, leading to a large within-country variation. For arable land the upscaled N2O emissions from the selected plots are mostly lower than the N2O emissions computed with full areal support, in line with the results for EU-27. For nature (Fig. 6, upper right), the uncertainty in modelled N2O emissions is much lower because, in the absence of N inputs by manure and fertilizers, the N input consists of N deposition, which has a much lower spatial variability. There is a reasonable agreement between the results from the two methods, and no systematic bias. Uncertainties in computed N leaching for arable soils are high (Fig. 6, lower left), and the results based on the sample plots mostly fall within the confidence interval based on full areal support. Again for nature, the uncertainty is lower (for the same reason as for the N2O emission), and for a few countries the estimates based on the sample plots fall outside the confidence interval using full support (Fig. 6, lower right).
Results of this study show that an accurate estimate (within 10 %) of the total N inputs, crop N uptake, emissions of N2O, NH3 and NOx to the atmosphere and losses of N to groundwater and surface water from arable land at EU-27 level can be obtained by running the model for a stratified random sample of 150 points. This shows that the method and data used to stratify spatial units into clusters contained the most important environmental factors that are known to influence emission. For nature, estimated N2O and NOx emissions were very accurate, but the procedure was less successful in predicting NH3 emissions. This may be caused by the fact that N deposition was not included in the clustering procedure, while this parameter strongly influences the NH3 emissions from these ecosystems, or by the fact that the distribution of N deposition over Europe is skewed and we missed the high deposition areas in our set of selected sites. This study thus shows that, with only a few exceptions, a reliable estimate of the total N fluxes for EU-27 can be made by upscaling results of the 450 selected sites. If we assume that this also holds for other, more detailed, process-based N emission models, an enormous reduction in computation time can be achieved with this clustering method without substantial deterioration of results. This is specifically useful when an uncertainty analysis is carried out at EU-27 scale, because a Monte Carlo type uncertainty analysis requires a large number of model runs per site. It is, however, advisable to test the methods used in this study for these types of models as well. INTEGRATOR computes N emissions from N inputs using mostly linear functions, and it is as yet unclear whether calculations on the selected sites with more complex non-linear models such as DNDC or DayCent would also be in line with Europe-wide assessments of these models.
This study also shows that the sample of 150 points per land use class for the whole of EU-27 is too small to make reliable N2O emission estimates for arable land for most individual countries. In 14 out of the 27 countries, there were fewer than two points for a given land use type (nature and arable land) within the country area. For the remaining 13 countries, reasonable estimates on a country basis were only made for model outputs that depend on spatially homogeneous input, such as NH3 emissions from arable land and nature. In general, the number of points per country is too small to accurately represent the spatial variability of the factors influencing, for example, N2O emissions. It is therefore advised to restrict the upscaling procedure either to EU-27 or to large regions within Europe, such as Scandinavia, the Mediterranean or Central Europe.