Remotely sensed land-surface energy fluxes at sub-field scale in heterogeneous agricultural landscape and coniferous plantation

. In this study we evaluate a methodology for disaggregating land surface energy ﬂuxes estimated with the Two-Source Energy Balance (TSEB)-based Dual-Temperature Difference (DTD) model which uses day and night polar orbiting satellite observations of land surface temperature (LST) as a remotely sensed input. The DTD model is run with MODIS input data at a spatial resolution of around 1 km while the disaggregation uses Landsat observations to produce ﬂuxes at a nominal spatial resolution of 30 m. The higher-resolution modelled ﬂuxes can be directly compared against eddy covariance (EC)-based ﬂux tower measurements to ensure more accurate model validation and also provide a better visualization of the ﬂuxes’ spatial patterns in heterogeneous areas allowing for development of, for example, more efﬁcient irrigation practices. The disaggregation technique is evaluated in an area covered by the Danish Hydrological Observatory (HOBE), in the west of the Jut-land peninsula, and the modelled ﬂuxes are compared against measurements from two ﬂux towers: the ﬁrst one in a heterogeneous agricultural landscape and the second one in a homogeneous conifer plantation. The results indicate that the coarse-resolution DTD ﬂuxes disaggregated at Landsat scale have greatly improved accuracy as compared to high-resolution ﬂuxes derived directly with Landsat data without the disaggregation. At the agricultural site the disaggregated ﬂuxes display small bias and very high correlation ( r ≈ 0 . 95) with EC-based measurements, while at the plantation site the results are encouraging but still with signiﬁcant errors. In addition, we introduce a modiﬁcation to the DTD model by replacing the “parallel” conﬁguration of the resistances to sensible heat exchange by the “series” conﬁguration. The latter takes into account the in-canopy air temperature and substan-tially improves the accuracy of the DTD model.


Introduction
The reliable estimation of surface energy fluxes (latent heat -LE, sensible heat -H , ground heat -G, and net radiation -R n ) in agricultural landscapes requires that the model's spatial resolution matches the dominant landscape feature scale (Kustas and Albertson, 2003;Kustas et al., 2004). Since most of the models require satellite observations, particularly of land surface temperature (LST), for operational use over larger areas, their spatial resolution is limited by the resolution of those satellite observations. In many heterogeneous agricultural landscapes the field sizes can be of order of a couple of hectares, meaning that the spatial resolution of the LST satellite observation needs to be in the order of 100 m × 100 m. Among the few satellites which can provide this information on a regular basis is the Landsat satellite family with LST resolution of 120 m for Landsat 5, 60 m for Landsat 7 and 100 m for Landsat 8. In all three cases the LST is resampled by the data provider to 30 m (http://landsat.usgs. gov/band_designations_landsat_satellites.php, last accessed 17 March 2014). There are a number of methodologies which can exploit the Landsat-derived LST for estimating surface energy fluxes. They range from empirical, like the triangle approach (Stisen et al., 2008), to more physically based, such as one-source energy balance (Bastiaanssen et al., 1998) or two-source energy balance (TSEB) models (Norman et al., 1995). The empirical methods suffer limitations due to the simple assumptions and ratios employed within the models. On the other hand, the physically based models are highly sensitive to errors in absolute temperature measurements, either of satellite LST or of air temperature from numerical models or remotely located meteorological stations (Anderson et al., 1997). This is particularly important when using Landsat LST estimates, since they are derived from only one thermal band and so are highly susceptible to atmospheric water vapour absorption (Sobrino et al., 2004). Although the new Landsat 8 satellite has two thermal bands it is recommended to refrain from using one of them for LST estimation (USGS, http://landsat.usgs.gov/calibration_notices.php, last accessed 14 January 2014). Li et al. (2004) have estimated LST using the older Landsat 5 and Landsat 7 satellites and obtained a mean absolute difference of between 1 and 1.5 • C when compared to tower-based brightness temperature measurements.
To overcome issues arising from uncertainty in absolute temperature measurements, models such as Atmosphere-Land-Exchange-Inverse (ALEXI) (Anderson et al., 1997) or Dual-Temperature Difference (DTD) (Norman et al., 2000), both based on the TSEB modelling scheme, make use of the temperature difference between early morning and late morning or early afternoon observations. Both models require LST estimates provided by geostationary satellites, due to their high temporal resolution, and therefore produce flux estimates at low spatial resolution of around 5 km. The ALEXI model has previously been coupled to the DisALEXI disaggregation algorithm (Anderson et al., 2004), allowing the fluxes estimated at the geostationary satellite spatial resolution to be disaggregated using a Landsat LST observation to Landsat spatial scale while preserving the advantages provided by using the differential temperature estimates. This allows the ALEXI modelled fluxes to be directly validated against EC-based flux tower measurements, which have a measurement footprint that ranges from tens to hundreds of metres away from the tower and is constantly changing depending on wind speed and direction, atmospheric stability and surface roughness. It also provides a better visualization of the fluxes' spatial patterns in heterogeneous areas, such as typical agricultural landscapes, allowing for monitoring of individual fields and development of, for example, more efficient irrigation practices (Anderson et al., 2011).
More recently, a modification of the DTD model has been developed to allow the use of LST derived by polar orbiting satellites with night-time and daytime overpass times (Guzinski et al., 2013). This was done by exploiting the day and night LST estimates provided by the MODIS sensor on board of the Aqua and Terra satellites and by replacing the early morning temperature measurement in the DTD model with one taken at night by the Aqua satellite. By using the data provided by polar orbiting satellites the geographical region of the applicability of the DTD model has been extended to areas at high latitudes, such as Scandinavia, whose land surface processes cannot be reliably monitored with geostationary satellites due to severe geometric and atmospheric effects. The modified DTD model was run with the LST estimates provided by the MODIS sensor aboard the Terra and Aqua satellites and compared to flux tower observations in a number of ecosystems, in most cases obtaining satisfactory results (Guzinski et al., 2013).
However, one site where the modified DTD model did not perform satisfactorily was the Voulund (VOU) agricultural site in western Denmark. A possible factor contributing to the poor performance of the model at this particular location is the highly heterogeneous nature of the site (Guzinski et al., 2013). Figure 1a shows the orthophoto of the VOU site, with the overlaid grid indicating the location of MODIS pixels in the MODIS sinusoidal projection. Within each 930 m MODIS pixel there are a number agricultural fields present, each one at a different stage of crop development, as well as shrubs and small plantations of young spruce and fir. The LST contrast within the MODIS pixel, caused by heterogeneity, should not have such a large influence on the coarse-scale modelled fluxes, especially for estimates which are not either very high or very low (Kustas and Albertson, 2003). Therefore we hypothesize that the discrepancy between the modelled and measured fluxes is in large measure due to the mismatch between the footprint of the flux tower measurements and the size of the MODIS pixel. In addition, as the dominant wind direction in the region is from the west, a large part of the flux tower footprint lies in the MODIS pixel adjacent to the one containing the flux tower.
To test the hypothesis, our objective was to develop a DTD disaggregation algorithm, based on DisALEXI (Anderson et al., 2004), using high-resolution observations from the Landsat family of satellites. This disaggregation methodology has not been previously applied to fluxes derived with the DTD model or fluxes derived purely with polar orbiting satellites. We tested the disaggregation at two different sites, the agricultural site mentioned previously (VOU) as well as a conifer plantation Gludsted (GLU), which in contrast to VOU is homogeneous at MODIS pixel scale but with certain heterogeneity at a smaller spatial scale (Fig. 1b). This allowed us to evaluate the performance of the disaggregation algorithm in different ecosystems and at different spatial scales of heterogeneity. At both sites we tested the robustness of the disaggregation algorithm by comparing it to fluxes estimated by running the TSEB model directly with Landsat data without performing the disaggregation. In addition, we also modified the DTD formulation to enable interaction between the modelled canopy and soil sensible heat fluxes, which has a significant impact on the accuracy of the output energy flux estimates. Finally, we tested the performance of DTD when used with modelled, instead of in situ, meteorological data for operational applications at regional scales.
In Sect. 2 we present the data used in this study: the validation fluxes and meteorological observations from the VOU and GLU tower-based measurements, satellite-based inputs and modelled meteorological data. Section 3 explains the principles behind the TSEB and DTD models and the disaggregation algorithm used in this study, with the actual model equations presented in Appendices A1 and A2 respectively. In Sect. 4 we first compare the performance of the DTD model at MODIS pixel scale with the old and new model formulations. Then, we evaluate the disaggregated high-resolution fluxes and compare them with fluxes obtained by using Landsat data directly with TSEB without disaggregation and the low-resolution DTD fluxes. This is done using both locally measured and modelled meteorological inputs. We end with conclusions in Sect. 5.

Flux tower data
The models were run over an area covering the measurement footprints of two EC flux towers located on the Jutland peninsula in western Denmark (Fig. 1). The first flux tower is placed in a highly heterogeneous agricultural site, Voulund (VOU), while the second is in a coniferous plantation, Gludsted (GLU), dominated by Picea abies with an average height of 20 m and homogeneous at MODIS spatial resolution while displaying small-scale heterogeneity due to forest roads and clearings and stands of different species and ages (Ringgaard et al., 2012). Both sites are in sandy soils with a temperate maritime climate experiencing mean annual precipitation of 990 mm and mean annual temperature of 8.2 • C. The two ecosystems encompassed by the VOU and GLU flux tower sites together represent around 85 % of the land cover type of the Skjern river catchment, which is the largest river in Denmark in terms of water volume. A more detailed description of the sites is presented in Ringgaard et al. (2011).
Both sites were equipped with a Gill R3-50 sonic anemometer (Gill Instruments Ltd., Lymington, UK) and LI-7500 open-path infrared gas analyser (IRGA) (Li-Cor Inc., Lincoln, NE, USA) to continuously measure wind components in three dimensions and concentrations of water vapour, sensible heat and CO 2 . The EC system at VOU is mounted at a height of 6 m above ground level (a.g.l) and air temperature is measured at 4 m a.g.l., while at GLU the EC system is 38 m a.g.l. and air temperature is measured at 30 m a.g.l. above ground. Turbulent fluxes were calculated using the EddyPro 4.2 software (Li-Cor Inc., Lincoln, NE, USA). The routine to calculate the fluxes includes coordinate rotation, block averaging (30 min windows), corrections for density fluctuations (Webb et al., 1980) and spectral corrections (Moncrieff et al., 1997). Additionally, the fluxes were corrected for surface heating of the IRGA (Burba et al., 2008), which has a most pronounced effect during cold season. Fluxes were quality checked according to Mauder and Foken (2006) and flagged if quality criteria were not met. The only setting that was different between the two sites was the coordinate rotation. At VOU, where canopy height and structure changes during the season, and where the canopy is nearly homogeneous in all directions, double rotation was applied. At GLU, where the lined structure of the trees could potentially cause flow distortions in specific directions, the planar fit method (Wilczak et al., 2001) was applied. Processed data were subjected to further quality control to detect outliers in the calculated 30min average fluxes according to the method proposed in Papale et al. (2006). As some data were rejected in the quality control, there are gaps in the data set. Gaps are filled by the standardized method proposed by Reichstein et al. (2005) with the online tool available at: http://www.bgc-jena.mpg. de/bgi/index.php/Services/REddyProcWeb (last accessed 10 January 2014).
Both towers also have sensors for measuring the four components of net radiation, incoming/outgoing and short 5024 R. Guzinski et al.: Remotely sensed land-surface energy fluxes at sub-field scale wave/long wave, as well as air temperature and humidity. The 30 min averaged air temperature, wind speed, humidity and incoming solar radiation observations from the towers were used as input for the DTD and TSEB models. Energy closure in the measured fluxes was ensured by assigning residual energy flux to the latent heat, based on the assumption that errors in the measurements of LE are larger than in the measurements of H due to the nature of the sensors and the fluxes (Foken et al., 2011). In addition, at sites where Bowen ratio (BR) is low, such as VOU, it is recommended to assign the residual energy to LE , while at high flux towers, such as at GLU, the 30 min averaging period can miss the low frequency eddies which, once again, affects mostly LE (Finnigan et al., 2003).
During the validation of the model performance only the measured, not gap filled, fluxes were used. Measurements when either H or closed LE had implausible values (H < −100 W m −2 or H > 900 W m −2 and LE < 0 W m −2 or LE > 900 W m −2 ) were also removed.

Satellite data
The use of MODIS data as input for the DTD algorithm is presented in Guzinski et al. (2013). Briefly, the M*D11A1 V5 daily product (Wan, 2006) from Aqua and Terra satellites was used for LST and emissivity estimations, with the nighttime observations taken from Aqua overpass around 1 a.m. local time and daytime observations taken from Aqua, Terra or both satellites (resulting in two flux estimates per day) depending on the quality of the observations. LST observations with all view zenith angles (VZAs), up to 65 • , were included. Other MODIS products employed are 8-day MCD15A2 V5 for leaf area index (LAI) (Knyazikhi et al., 1999), 16-day MOD13A2 V5 for vegetation indices required for estimating the fraction of vegetation that is green (Guzinski et al., 2013) and 8-day MCD43B3 V5 product for albedo.
LAI is used in the DTD and TSEB models for three major functions: to estimate the fraction of vegetation viewed by the sensor; to estimate the amount of net radiation (mostly incoming short wave) intercepted by the canopy; and in calculations of resistances to heat transfer, e.g. for wind profile estimation through the canopy (see Appendices). In the first two applications, the "view" of the canopy is from the top. Therefore, in coniferous evergreen forests it is mainly the needles which are in the field of view of the sensor and which intercept net radiation. The woody biomass is mostly hidden below the needles. In contrast, in crops where the whole plant turns yellow during senescence, the green and non-green parts of the plant form a more homogeneous mixture and are significant in all three functions. Therefore, in croplands the MCD15A2 green LAI was divided by fraction of vegetation that is green to obtain plant area index (PAI) before using it in the models, while in coniferous forest MCD15A2 was used directly.
All Landsat data came from cloud-free observations taken by Landsat 5, Landsat 7 and Landsat 8 over the period 2009-2013 over the study area with the spatial resolution in the visible/near-infrared part of the spectrum of 30 m and thermal infrared observations resampled from 120 m (Landsat 5), 60 m (Landsat 7) and 100 m (Landsat 8) to 30 m by the data provider (USGS, http://landsat.usgs.gov/band_designations_ landsat_satellites.php, last accessed 17 January 2014). An atmospheric correction was performed with MODTRAN 5 (Berk et al., 2006) with the standard mid-latitude summer atmospheric profile used for all of the atmospheric parameters except for the total column water vapour, ozone, atmospheric optical thickness and temperature and pressure profiles which came from the Terra MOD08 gridded atmospheric product produced daily at one degree spatial resolution. The top-of-canopy reflectances in the visible and nearinfrared parts of the spectrum were derived following the approach of Xu et al. (2008) which is based on the Fast Lineof-sight Atmospheric Analysis of Hypercubes (FLAASH) method (Anderson et al., 2002) with the adjacency effect considered to be significant in an area with a diameter of 1 km (Verhoef and Bach, 2003). Cloud masking was performed using the Fmask algorithm (Zhu and Woodcock, 2012), modified to also work with Landsat 8 bands assuming that the Landsat 8 sensor has the same characteristics that the Landsat 7 sensor has.
The vegetation indices (NDVI and EVI) were calculated directly from the top-of-canopy reflectances of the appropriate Landsat bands. LAI and albedo at Landsat scale were derived using decision tree regression trained with highquality MODIS LAI and albedo observations and Landsat reflectances, from all the VIS and NIR bands, aggregated to MODIS pixel size (Gao et al., 2012). Therefore any errors present in MODIS LAI or albedo products were inherited by the Landsat estimates. The temporal distribution and magnitude of Landsat LAI at the two study sites is shown in Fig. 2. Emissivity was linearly scaled with fractional vegetation cover obtained from NDVI (Stisen et al., 2007), with bare soil emissivity set to 0.950 at NDVI of 0.15 and full vegetation cover emissivity set to 0.995 at NDVI of 0.70. Landsat LST was estimated using the approach of Coll et al. (2010) with the upwelling atmospheric radiance and atmospheric transmittance obtained from a MODTRAN run with the simulated sensor at satellite height, and LST of 0 K and ground emissivity of 1 (albedo of 0) to avoid any emitted or reflected long-wave radiation signal from the surface, and the downwelling atmospheric radiance from a MODTRAN run with the sensor just above the surface and ground emissivity of 0 (albedo of 1). Finally, land cover classification was taken from the 2006 update of the Corine land cover (Büttner et al., 2004). igure 2. Temporal distribution of Leaf Area Index (LAI) estimates at VOU and GLU sites. The x a dicates the day of year of the estimate, while the y axis shows the mean Landsat-scale LAI with e flux tower footprint. Landsat LAI estimates were based on MODIS LAI esitmates. Empty circ epresent green LAI derived directly from MODIS estimates, while full circles represent total LAI, us s input to the model and derived by (in some cases) dividing green LAI by fraction of vegetation tha reen. For details see sections 2.2. 51 Figure 2. Temporal distribution of leaf area index (LAI) estimates at VOU and GLU sites. The x axis indicates the day of year of the estimate, while the y axis shows the mean Landsat-scale LAI within the flux tower footprint. Landsat LAI estimates were based on MODIS LAI estimates. Empty circles represent green LAI derived directly from MODIS estimates, while full circles represent total LAI, used as input to the model and derived by (in some cases) dividing green LAI by fraction of vegetation that is green. For details see Sect. 2.2.

Modelled meteorological data
To determine regional-scale surface energy fluxes the local, tower-based, meteorological observations need to be replaced by an interpolated or modelled meteorological data set. We have therefore also tested the performance of the disaggregation algorithm when such data are being used. The modelled meteorological data are obtained from the ERA-Interim reanalysis data set (Dee et al., 2011) provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). The products used were the 2 m air temperature (2T ), 2 m dew point temperature (2D) used to calculate the vapour pressure, the 10 m horizontal wind speed (10U and 10V ) and surface solar radiation downward (SSRD). Although the air temperature field is nominally at 2 m height, in our model it was assumed that it is the air temperature measured at 100 m above the ground. This can be justified based on the fact that the air temperature is modelled at very low spatial resolution indicating that is can be treated as regional temperature above blending height. In addition the time differential nature of DTD should remove any biases caused by increasing the temperature measurement height. The wind field was assumed to be at 10 m above the canopy, or the ground if the canopy is shorter than 10 m. The data are provided at a 0.75 • spatial resolution and were subset and resampled into a MODIS sinusoidal grid projection for the MODIS tile covering the area of interest. In the temporal domain the data were linearly interpolated between the 3-hourly ERA-Interim time steps.

TSEB
The TSEB model was developed by Norman et al. (1995) and later underwent a number of modifications (e.g. Kustas and Norman, 1999). The main innovation in the TSEB model is to split the LST observed at VZA θ, namely T R (θ ), into vegetation canopy and soil temperatures, T C and T S respectively, based on the fraction of view of the radiometer covered by vegetation, f θ , which is estimated from observation geometry and the fraction of vegetation canopy cover: This allows the sensible heat fluxes from the vegetation and soil to be computed separately, based on the temperature gradient from the canopy and soil respectively and air temperature at some height above or within the canopy. The latent heat flux from the canopy is initially estimated using a modified Priestley-Taylor approximation (Priestley and Taylor, 1972), while the latent heat flux from the soil is estimated as a residual of the other fluxes, thus ensuring energy balance closure. The TSEB model is used in this study for disaggregating fluxes from low to high spatial resolutions as well as for directly estimating fluxes with Landsat data. The equations used in the current implementation of the TSEB model are presented in Appendix A1.

DTD
One of the limitations of temperature-based energy balance models, such as the TSEB model, is their sensitivity to the temperature gradient between the LST (or its soil and canopy components) and the air temperature. This makes the models highly susceptible to errors introduced in the absolute measurements of LST or air temperature. To improve the robustness of the TSEB modelling scheme two approaches, ALEXI (Anderson et al., 1997) and DTD (Norman et al., 2000), have been developed which replace the absolute temperature estimates by a time-differential temperature measurement between a time early in the morning and another time later during the day. ALEXI couples the surface energy balance to a model of atmospheric boundary layer growth during the morning hours and requires an atmospheric profile sounding during the early morning hours, while DTD implements a simpler model formulation requiring the same inputs as TSEB but at the two observation times. Both ALEXI and DTD models require the first temperature measurement at one hour past sunrise, when surface heat fluxes are minimal (Anderson et al., 1997). This means that they require precise timing of the morning temperature estimate and so are dependent on observations from geostationary satellites with their sub-hourly temporal resolution. However, Guzinski et al. (2013) have established that it is possible to replace the early morning temperature observation in the DTD model with night-time observations with minimal degradation in the retrieved fluxes. They have also introduced modifications to the model to accommodate the two LST observations having different VZAs thus allowing the use of DTD with polar orbiting satellites with both nighttime and daytime overpasses. Another modification of the DTD model, presented in this paper, is to use a "series" resistance network instead of a "parallel" one for the calculation of the sensible heat fluxes. The advantage of the "series" resistance network is that it takes into account the interaction between the fluxes coming from the canopy and the soil by estimating the in-canopy air temperature. The differences between the two resistance network configurations are presented in Norman et al. (1995) while the modification to the DTD model, together with other implementation details, is shown in Appendix A2. This modified DTD model is used in this study to estimate the coarse-resolution fluxes.

Disaggregation
The disaggregation methodology is based on the DisALEXI algorithm, developed for disaggregating fluxes derived with the ALEXI model . Since ALEXI requires geostationary observations as input it produces flux estimates with a pixel resolution ranging from 3 to 10 km depending on the satellite resolution at a given latitude and longitude. Another output of ALEXI is the air temperature at blending height, assumed to be 50 m above the surface, also at 3-10 km resolution (Anderson et al., 1997). During the disaggregation procedure this air temperature is used as an upper boundary condition, while LST derived with a highresolution sensor (such as Landsat) is bias corrected to match the LST used as input to ALEXI at 3-10 km scale and then used as the lower boundary condition for a TSEB model (see Fig. 1 in Norman et al., 2003). This ensures that the surface to air temperature gradient, and therefore the surface fluxes, are consistent between the lower-resolution ALEXI estimates and the higher-resolution TSEB ones. One of the assumptions of DisALEXI is that the temperature at blending height is constant within the 3-10 km pixel. Even though the blending height air temperature is more spatially uniform than near-ground air temperature, this assumption might not necessarily hold in highly heterogeneous landscapes due to advection and localized couplings between the surface and the atmosphere (Anderson et al., 2004;Kustas and Albertson, 2003). However, in this application we are disaggregating from 1 km MODIS pixels which means that the assumption has a better chance of being met than if 3-10 km pixels were used.
In a recent application of DisALEXI, consistency between the low-resolution and high-resolution model runs was ensured using the daily H estimates, up-scaled from the instantaneous H estimates provided by ALEXI and the higher-resolution TSEB model using the assumption of selfpreservation of evaporative fraction (EF) (Cammalleri et al., 2013). This removes the requirement of having the low-and high-resolution flux estimates coincident in time and allows the technique to be used with polar orbiting satellites with different overpass times. In this approach the ALEXI-derived blending height temperature is used as the initial value for the upper boundary condition of the high-resolution TSEB run and is then iteratively adjusted until the daily H estimates of ALEXI and TSEB, aggregated to ALEXI pixel size, match.
However, there is no general agreement on the best way to up-scale instantaneous fluxes to daily values, or to compare two instantaneous flux measurements taken at different times of the same day while removing their time-dependent component. Some recent studies suggest that EF remains stable, especially around noon hours in cloud-free condition, in a wide range of ecosystems (Peng et al., 2013) while others have proposed the replacement of EF with the ratio of LE to incoming solar radiation at ground level (Cammalleri et al., 2014). Therefore in this study we evaluate three approaches to estimate what we term the constant ratio (CR): EF, ratio of LE to incoming solar radiation, LE/R s, in , and the ratio of H to incoming solar radiation, H /R s, in . The third approach is included since TSEB-based models estimate LE as a residual of the other fluxes, while H is calculated directly from the model inputs. It can therefore be assumed that errors present in all the other flux estimates will contribute to error in the LE estimate, meaning that it can potentially be less accurate than the H estimate.
Since the CRs are assumed to remain constant between the Terra/Aqua and Landsat overpass times, it is not necessary to determine daily fluxes, as was done in Cammalleri et al. (2013), to ensure consistency between the low-and high-resolution estimates. To obtain mean daily fluxes from the CRs, the ratios would have to be multiplied by the mean daily net radiation (in the case of EF) or mean daily incoming solar radiation (in the case of other ratios). Since the same mean daily net/solar radiation would be used for estimating mean daily fluxes from the instantaneous MODIS and Landsat fluxes, the mean daily net/solar radiation would just serve as a scaling factor. Therefore our hypothesis is that the use of daily fluxes is redundant and it is possible to perform the disaggregation on instantaneous fluxes, even though there is a time offset of a couple of hours between the low-and highresolution satellite observations, by using any of the CRs instead.
In summary the disaggregation method is as follows: 1. Estimates of instantaneous fluxes at MODIS pixel scale are provided by running DTD with MODIS inputs.
2. For each MODIS pixel the constant ratio, which is assumed to remain constant during the daylight hours, is calculated using the DTD output fluxes.
3. TSEB is run for all Landsat pixels falling within one MODIS pixel, with the air temperature at blending height (50 m) set at some plausible initial value.
4. The instantaneous TSEB estimated fluxes within one MODIS pixel are aggregated and the constant ratio of the aggregated fluxes is calculated.
5. If the ratios derived from DTD and TSEB runs do not match, the air temperature at blending height is adjusted and the TSEB model is rerun. This is repeated until the ratios match.
6. Once all the MODIS pixels in the region of interest have been processed, a 2 km × 2 km moving average filter is run over the resulting air temperature map under the assumption that air temperature at blending height should be rather homogeneous at that spatial scale. The filtered air temperature is then used for a final run of TSEB over the whole region to produce flux estimates at Landsat scale.
In Norman et al. (2003) and Cammalleri et al. (2013) the disaggregation is performed on fluxes produced with the ALEXI model, which also outputs an estimate of blending height air temperature (Anderson et al., 1997). This air temperature can then be used during the disaggregation procedure. This is not possible with the DTD model, since this output is not produced. Therefore, when running the model with tower-measured meteorological inputs the air temperature at tower height is used as the initial value of air temperature at blending height in step 3 of the disaggregation procedure. When using ERA-Interim meteorological data the air temperature from the 2T data set is used as the initial air temperature at blending height. Figure 3 illustrates model configurations used during the study. To perform the disaggregation (Fig. 3a), first the DTD model (Sect. 3.2) is run with MODIS and meteorological data as inputs to produce one of the three CRs as output. This CR is then used during the disaggregation procedure (Sect. 3.3) to establish boundary conditions for the fluxes derived with TSEB model (Sect. 3.1), which in addition to Landsat data, uses the same meteorological inputs as DTD. In order to evaluate the performance of the disaggregation procedure, DTD-derived fluxes at 930 m spatial resolution ( Fig. 3b) and TSEB-derived fluxes at a nominal resolution of 30 m ( Fig. 3c) are also produced. The model configurations are run with either tower-based meteorological observations (Sect. 2.1) or the ERA-Interim reanalysis meteorological data set (Sect. 2.3) used as input.

Flux tower footprint
The accuracy of the disaggregated modelled fluxes is evaluated by comparison with sensible and latent heat flux measurements from the tower-mounted EC systems. Since the disaggregated fluxes are at a spatial scale comparable to the size of the area contributing to the measured fluxes it is important to establish the actual flux tower footprint before performing the comparisons. The 2-D footprint is estimated using the approach of Detto et al. (2006). The footprint weights in the upwind direction are derived using the model of Hsieh et al. (2000), which takes atmospheric stability into account, and the weights in the cross-wind direction are assumed to be normally distributed with a standard deviation dependent on the standard deviation of the horizontal crosswind velocity fluctuations (Schmid, 1994). This results in a 2-D grid of pixels representing the relative contribution of each pixel to the total EC measurement, with the sum of all pixels being 1 (Fig. 1). When evaluating the high-resolution fluxes, each modelled pixel is scaled according to the strength of the contribution of its location to the EC measurement.

"Parallel" vs. "Series" DTD
The DTD model was run at the VOU and GLU flux towers using remotely sensed MODIS inputs with the exception of meteorological parameters which were taken from the tower-based observations. The model was run for all suitable MODIS observations from 2009 to 2013 giving around 200 modelled fluxes at each of the sites. The two versions of DTD, "parallel" and "series", used the same model formulations with the exception of the equation for the estimation of H . In the case of "parallel" DTD the original equation (Eq. A37) was used while in the "series" DTD version the new equation (Eq. A40) was used. The results for sensible and latent heat fluxes are presented in Fig. 4 and Table 1.
The model output fluxes were split into two groups. The first group consisted of all model outputs, while the second group (numbers in parentheses in Table 1) consisted only of outputs where "parallel" DTD converged to a plausible solution with positive LE. When a plausible solution could not be found at the end of model run, it was assumed that due to dry conditions there is no evapotranspiration (LE = 0) and the net radiation was split between H and G (see the Appendix for details). This is a backup behaviour of the model and can lead to inaccurate results when the lack of convergence in the model is not due to low ET but due to, for example, noise in the input parameters. The "series" implementation of DTD always produced plausible solution whenever "parallel" implementation did, and often managed to converge even when "parallel" implementation could not (Fig. 4).
The "series" implementation of DTD improved the accuracy of the modelled fluxes significantly, especially at VOU where the RMSE of H is significantly reduced (87 W m −2 vs. 160 W m −2 when all modelled fluxes are considered) and bias is reduced by a factor of around 3. At GLU the results are less pronounced with bias of H switching from positive to negative and increasing, although the RMSE is still reduced (118 W m −2 vs. 127 W m −2 ). In the case of LE, the "series" implementation reduced RMSE and bias significantly at both sites, while also improving correlation.
At both sites using the "series" implementation of DTD reduced the magnitude of the modelled sensible heat flux mainly when sensible heat was relatively high and latent heat was relatively low, i.e. during dry conditions. This is similar to the effect that the "series" formulation has on the TSEB model and is explained by the importance of the heating up of the in-canopy air temperature, which is explicitly modelled in the "series" formulation while being left out in the "parallel" formulation (Norman et al., 1995).
Since the "series" version of the DTD model provides substantial improvements, it is the version that is used in the reminder of this study.

Disaggregation at the agricultural site -VOU
The DTD model was run at the MODIS resolution of 930 m over the VOU site using MODIS data and tower-based meteorological observations as input and the results were disaggregated to 30 m resolution using the TSEB model with Landsat data and tower-based meteorological observations as input. All the dates between 2009 and 2013 when highquality, cloud-free MODIS and Landsat observations were available were considered. Figure 5 shows the comparison between the instantaneous modelled heat fluxes, aggregated to the estimated flux tower footprint, and the 30 min ECbased measurements. The results for the three methods of estimating the CR ratio are presented (Fig. 5a-c) together with the fluxes estimated purely using TSEB with Landsat ( Fig. 5d) and with DTD without disaggregation (Fig. 5e). The latter fluxes are not aggregated to the flux tower footprint but instead the value of the MODIS pixel containing the flux tower is used. A statistical comparison is presented in Table 2.
The results are split into two sets. The first set (called S70) contains dates on which the disaggregation was successfully performed and Landsat pixels contributing to at least 70 % of the EC measured flux are present but there also might be missing Landsat pixels within the modelled MODIS pixels. The second set (called S100) is a subset of S70 and consists of dates where, additionally to S70 criteria, all the Landsat pixels within the modelled MODIS pixels are present. This does not necessarily mean that pixels representing 100 % of EC measured flux are present, since sometimes the measurement footprint extends slightly beyond the modelled MODIS pixels. However, in practice all the dates in S100 set contain pixels representing at least 98 % of EC measured flux footprint. For dates when pixels representing less than 100 % of flux footprint contribution are present, the aggregated flux is scaled by the fraction of the missing footprint. For example, if pixels representing only 80 % of EC footprint are present then the aggregated modelled flux is divided by 0.8 before it is compared with EC measured flux. Table 1. Statistical comparison of modelled vs. observed sensible and latent heat fluxes at VOU and GLU for the "parallel" and "series" implementation of DTD. Modelled fluxes are instantaneous at the time of daytime Terra/Aqua overpass and observed fluxes are 30 min averaged EC measurements. Numbers in parentheses indicate statistics only for model runs when "parallel" DTD converged to a plausible solution (LE > 0 W m −2 ) while numbers outside brackets present statistics for all model runs, even when a backup formulation was used. The statistical parameters used are bias (modelled-measured), root mean square error (RMSE), coefficient of variation (CV -RMSE divided by mean of observed values), and correlation (r). Bias and RMSE are in W m −2 , the other parameters are unitless.

Site
Implemen The missing Landsat pixels are most frequently caused by the faulty sensor aboard Landsat 7 satellite but can also be caused by clouds smaller than the MODIS pixel size (Fig. 6). In both cases the missing pixels can cause biases during the disaggregation procedure, especially if the gaps are over an area with flux values significantly different from the mean MODIS pixel value, such as over an irrigated field surrounded by drier shrubland. This is because the disaggregation algorithm forces the high-resolution CR aggregated to a MODIS pixel to match the MODIS CR even if there are some high-resolution pixels missing within the MODIS pixel footprint.
The results in Table 2 show that the RMSE is reduced by around a factor of 2 for both H (from 51 W m −2 to 25-27 W m −2 in the case of S70 and from 52 W m −2 to 27-33 W m −2 in the case of S100) and LE (from 80 W m −2 to 35-37 W m −2 in the case of S70 and from 59 W m −2 to 24-32 W m −2 in case of S100) when the fluxes are disaggregated. The correlation is also improved, reaching a value around 0.95 for both turbulent fluxes, with particularly strong positive impact on LE in the S100 data set. The effect of disaggregation on bias is very varied, with the magnitude mostly increasing and sign changing from positive to negative in the case of H . However, there is significant increase in RMSE of net radiation (from 24 W m −2 to 44-46 W m −2 in the case of S70 and from 27 W m −2 to 48-51 W m −2 in the case of S100) which can be attributed to more-than-doubling of underestimation bias.
The statistical differences between the three disaggregation methods are not very pronounced. However, there is clear distinction in where they allocate the deficit of energy, caused by the underestimation of R n . When H /R s, in is used as CR, the bias of H is the smallest and the bias of LE is the largest in comparison to the other CRs. In contrast, when LE/R s, in is used as CR the situation reverses and the bias of H becomes the largest and that of LE the smallest. The magnitude of bias influences the magnitude of RMSE with H /R s, in producing the smallest RMSE for H and LE/R s, in leading to the smallest RMSE for LE. Using EF as CR achieves an optimal compromise with biases of H and LE falling between the two extremes but RMSE staying close to the minimum magnitude.
To evaluate whether using the low-resolution fluxes to establish boundary conditions for high-resolution fluxes actually improves the high-resolution model performance, the TSEB model was also run with air temperature taken directly from tower measurements and not adjusted based on the CR. In those cases the height of temperature measurements was set to 4 m which is the height of the tower-based temperature sensor. The results are presented in Fig. 5d and in column ND H of Table 2. When disaggregating the lowresolution fluxes it is often possible to obtain two estimates per day (although both at the same time of Landsat satellite overpass): one produced by disaggregating fluxes modelled with daytime Aqua satellite observation and the other by using the daytime Terra satellite observation. When using Landsat data directly to obtain high-resolution estimates, only one model run per Landsat overpass is possible. Therefore, there are fewer points in panel (d) of Fig. 5 than in the other panels. The RMSE of high-resolution disaggregated turbulent fluxes is significantly lower than when those fluxes are derived without disaggregation. The correlation becomes stronger with disaggregation when looking at the S70 data set but decreases slightly (although still remaining above 0.94) when considering just the fluxes in S100.

Disaggregation at coniferous plantation -GLU
The DTD model and the three variants of the disaggregation algorithm were run over the GLU site in a similar fashion as in VOU. The results are presented in Fig. 7 and Table 3. In the case of GLU the disaggregation does not improve the accuracy of the modelled turbulent heat fluxes when compared to the MODIS-scale estimates. This was partially expected since the area around the tower is quite homogeneous at MODIS scale. For the S70 data set the RMSE of lowresolution H fluxes (118 W m −2 ) was among the range of RMSE of the disaggregated fluxes (from 116 to 160 W m −2 ). The same is true for the correlation coefficient, having a value Empty symbols indicate model runs when "parallel" DTD could not converge to a plausible solution (LE < 0 W m −2 ) and instead a backup formulation was used. In the top panels the "parallel" implementation of DTD was used, in the lower panels the "series" implementation.
of 0.51 for low-resolution fluxes and between 0.33 and 0.55 for the disaggregated fluxes, while the bias increased in magnitude during the disaggregation. When just the S100 data set is considered, the disaggregation reduces the accuracy of the modelled sensible heat fluxes. This situation is reflected when looking at statistics for LE, although in this case the RMSE of the three disaggregation methods and the lowresolution fluxes are much closer together. The correlation is reduced from already very low value to around 0 or even becoming negative. As happened in VOU, the RMSE of R n in-creased significantly during the disaggregation (from around 50 W m −2 to around 90 W m −2 ) and this once again can be attributed to a similar increase in the magnitude of bias. Similarly to what happened at VOU, using EF as CR leads to the most balanced partition of the energy deficit between H and LE. When TSEB is used directly with Landsat inputs (Fig. 7d), without performing the disaggregation, a very large negative bias of sensible heat flux is present (over −200 W m −2 ) ( Table 3). This also leads to very high values of RMSE.  Panel (e) shows the low-resolution, non-disaggregated fluxes modelled with DTD. Rectangles represent LE, circles represent H , triangles represent G and diamonds represent R n . Empty symbols indicate aggregated fluxes where pixels comprising 70 % of EC footprint were present (S70 set), and full symbols where in addition there were no missing Landsat pixels within the MODIS pixels (S100 set). Figure 6. Two examples of gaps present in the disaggregated high-resolution flux estimates. On the left stripped gaps at VOU due to the failure of the sensor on board Landsat 7 and on the right gaps at GLU due to clouds during Landsat overpass. The darker reds indicate higher sensible heat fluxes. Even though small-scale flux variations due to features such as roads or fields are properly modelled, the total heat flux within a MODIS pixel might be biased due to the gaps.   Table 3. Statistical comparison of modelled vs. observed fluxes at GLU for the three approaches used to estimate the constant ratio used during the disaggregation procedure and for the non-disaggregated high, ND H , and low, ND L , resolution fluxes. Modelled fluxes are instantaneous at the time of satellite overpass and observed fluxes are 30 min averaged EC measurements. The statistical parameters used are bias (modelled-measured), root mean square error (RMSE), coefficient of variation (CV -RMSE divided by mean of observed values), and correlation (r). Bias and RMSE are in W m −2 , the other parameters are unitless. However, the correlation coefficient is the highest among all the model runs at the GLU site. In the case of LE the magnitude of bias (around 100 W m −2 ) is significantly larger than for any other model runs and it becomes positive whereas before it was mostly negative. RMSE also increases, although not as much as for H .

Disaggregation when using modelled meteorological inputs
To be able to operationally apply the models at regional scales, the tower-based meteorological inputs (air temperature, wind speed, relative humidity and incoming short-wave radiation) have to be replaced with spatially distributed inputs. As described in Sect. 2.3 the ERA-Interim reanalysis data set was used in this study. To evaluate the performance of the models when run with the ERA-Interim inputs, the low-resolution DTD modelled fluxes, the high-resolution fluxes modelled directly with TSEB and Landsat data and the disaggregated high-resolution fluxes were compared to measured fluxes at both VOU and GLU flux tower sites. Only the disaggregation algorithm which uses EF as CR is analysed here since, when tower-based meteorological inputs were used, it produced the most promising results. The results are presented in Fig. 8 and Table 4 for VOU and Fig. 9 and Table 5 for GLU. The first observation is that the magnitude of bias and RMSE of R n is increased by a factor of 2 to 3 between the model runs with ERA-and towerbased meteorological data. This is due to underestimation of incoming short-wave radiation by ERA SSRD product when compared to tower measurements. Therefore it was decided to also run the models with ERA meteorological inputs with the exception of R s, in , which was obtained from tower observations. The results of those runs are presented in Fig. 10 and Table 6 for VOU and Fig. 11 and Table 7 for GLU.
It appears that using ERA meteorological inputs (both with and without R s, in ) does not have a big impact on the accuracy of modelled H fluxes in all the model runs. For S70 data set in VOU, the RMSE does not change significantly between the runs with different inputs, with the exception of disaggregation using EF which is slightly more accurate when using tower-based observations. For dates in S100 there is a reduction in error when using ERA meteorological inputs, although that could be just due to rearrangement of the limited number of points present for the evaluation. Latent heat flux is much more sensitive to the amount of available net radiation, since it is calculated as a residual. Therefore, when purely ERA meteorological inputs are used there is a large increase in negative bias and RMSE. When ERA SSRD is replaced by tower observations, the errors are similar to errors obtained in model runs with purely tower-based observations, with RMSE of low-resolution LE not changing much, the RMSE of high-resolution non-disaggregated LE improving a bit and the RMSE of disaggregated LE becoming a bit bigger.
At GLU the statistics present a similar picture, with the underestimation of incoming short-wave radiation present in ERA SSRD affecting mostly the latent heat flux. Using ERA meteorological inputs actually significantly improves the statistics for the turbulent heat fluxes compared to when field observations are used. For H , there is a reduction in the magnitude of both bias and RMSE, while the correlation increases. In the case of LE the best results are obtained when ERA inputs are combined with tower-based R s, in .

Discussion
The results demonstrate that obtaining high-resolution fluxes through disaggregation of the low-resolution, DTD-derived fluxes is more accurate than obtaining the high-resolution fluxes by applying TSEB directly to Landsat data. This implies that there is some extra knowledge in the low-resolution flux estimates which is not present in the fluxes derived directly by TSEB. Therefore, it can be inferred that the accuracy of the low-resolution estimates is similar to that of the disaggregated fluxes if those estimates were to be compared to flux measurements on the same low-resolution spatial scale. By disaggregating the fluxes to a spatial scale below the flux tower footprint we were able to directly compare the modelled and measured fluxes. It should be noted, however, that although the Landsat thermal data is provided at 30 m resolution it was acquired at a resolution of between 60 and 120 m depending on the satellite. This could contribute to the uncertainty when comparing the model output with flux tower measurements.
There were substantial differences observed between the two flux tower sites and between the S70 and S100 data sets. In theory, when looking at the non-disaggregated fluxes at both at low and high resolutions (Tables 2-7, columns ND H and ND L ), the statistical measures of accuracy of the modelled fluxes should be the same for points in the S70 and S100 sets. This is because the membership of the sets was based mostly on the Landsat sensor used: the S100 set had nonstripped observations from Landsat 5 and Landsat 8 while the S70 set contained all the observations, including Landsat 7 observations with a strip of missing pixels due to sensor failure. In practice, the statistical measures of non-disaggregated modelled fluxes in S70 and S100 sets are only similar at GLU. At VOU the non-disaggregated modelled fluxes in the S70 data set have larger errors and lower correlation than the ones in the S100 data set. This could be due to different climatic conditions present at the study sites during the years of operation of the different Landsat satellites. It could also be due to the relatively small number of points present in the data sets not being enough to represent the error distribution.
There is a larger difference between the three variations of the disaggregation algorithm at GLU compared to VOU. These differences could be due to large bias in the estimated net radiation at Landsat scale in GLU. The modelled net radiation is consistently underestimated with a bias of around Panel (e) shows the low-resolution, non-disaggregated fluxes modelled with DTD. Rectangles represent LE, circles represent H , triangles represent G and diamonds represent R n . Empty symbols indicate aggregated fluxes where pixels comprising 70 % of EC footprint were present (S70 set), and full symbols where in addition there were no missing Landsat pixels within the MODIS pixels (S100 set). 80 W m −2 compared to the tower measurement and around 40 W m −2 compared to MODIS-scale modelled net radiation. At VOU the underestimation of high-resolution net radiation is around 40 W m −2 compared to the tower measurement and around 25 W m −2 compared to low-resolution estimates. This mismatch could be partly due to the point nature of net radiation measurement vs. the spatially distributed nature of the modelled net radiation or due to inaccurate parameterization of physical parameters, such as LAI or albedo, at Landsat scale at GLU. For example, at MODIS scale the forested pixels have an albedo of around 0.09 while at Landsat scale that rises to around 0.10-0.12.
However, it could also be due to the long-wave component of the net radiation, particularly due to changes in estimated air temperature during the disaggregation. It should be noted that the value of the derived air temperature does not necessarily reflect the actual air temperature. This is because the value is derived to compensate for any errors in the Landsat LST and to ensure consistent fluxes between the DTD and disaggregated estimates. At VOU the final estimated T A usually does not change much from the initial value, except for one date around DOY 200 when the large decrease is most probably caused by incorrect estimation of Landsat LST (Fig. 12a and b). At GLU the final T A is predominantly decreased, sometimes by up to 10 • C ( Fig. 12c  and d). This is reflected in the changes in bias of R n between the disaggregated and non-disaggregated high-resolution estimates. At VOU the bias stays constant between the different high-resolution model runs, while at GLU there is a difference of up to 40 W m −2 between the disaggregated and nondisaggregated net radiation. This can be attributed purely to changes in T A , since all the other inputs have remained the same. Figure 12 also illustrates that the final step of the disaggregation procedure (spatial averaging of T A between the MODIS pixels) does not impact on the modelled fluxes and  (a) and (b)) are aggregated to EC footprint. Rectangles represent LE, circles represent H , triangles represent G and diamonds represent R n . Empty symbols indicate aggregated fluxes where pixels comprising 70 % of EC footprint were present (S70 set), and full symbols where in addition there were no missing Landsat pixels within the MODIS pixels (S100 set).
is purely cosmetic to produce smooth looking energy flux maps.
Another possible reason why the modelled high-resolution fluxes at GLU are less accurate than the low-resolution ones is the accuracy of flux tower footprint modelling in the forested landscape. The footprint model assumes a constant roughness and while, as mentioned earlier, the area appears homogeneous at MODIS scale, at Landsat scale different stand ages, roads and clearings become apparent, causing the assumption of uniformity to be broken. However, this does not appear to be a major issue as illustrated by the accuracy statistics when TSEB model is used directly with Landsat inputs without disaggregation (Table 3, column ND H ). In this case the correlation between the modelled and observed H fluxes is the highest of all the model runs (including the lowresolution one) with the correlation coefficient having the value of 0.65 for the S70 set, while the negative bias is at least twice as large as in any disaggregated run. Similar correlation values are obtained for H in the S70 data set for all model runs with ERA meteorological inputs. Those reasonably high Rectangles represent LE, circles represent H , triangles represent G and diamonds represent R n . Empty symbols indicate aggregated fluxes where pixels comprising 70 % of EC footprint were present (S70 set), and full symbols where in addition there were no missing Landsat pixels within the MODIS pixels (S100 set).  (panels (a) and (b)) are aggregated to EC footprint. Rectangles represent LE, circles represent H , triangles represent G and diamonds represent R n . Empty symbols indicate aggregated fluxes where pixels comprising 70 % of EC footprint were present (S70 set), and full symbols where in addition there were no missing Landsat pixels within the MODIS pixels (S100 set).
correlations would indicate that the footprint model, which is the same for all high-resolution runs, is working satisfactorily.
Yet another reason for the larger errors over the forested site is the nature of the site and the flux tower setup. For example, due to the large size of the canopy a large amount of heat can be stored in the in-canopy air layer and in the tree biomass (Lindroth et al., 2010). In addition, at sites in which the EC equipment is mounted on a tall tower the 30 min averaging period might not be enough to capture all the contributing eddies (Finnigan et al., 2003). In a previous study Ringgaard et al. (2011) have hypothesized that large-scale advection can be a considerable factor in the current study area and at the GLU site in particular. Even though the apparent effects of advection were most significant in winter they might still impact on the measured fluxes during other  Figure 12. Temporal distribution of air temperature estimates at three stages of the disaggregation procedure with EF used as CR: initial value (green circles), value after disaggregation (blue squares), and value after smoothing (red diamonds). The air temperature is the mean of the estimates falling within the flux tower footprint. The panels on the left are with model runs using tower-based meteorological observations, and those on the right are with model runs using ERA-Interim-based meteorological inputs with the exception of incoming short-wave radiation which came from tower observations.
periods. It has also been observed that there are significant differences in the fluxes from trees at the edges or inside of the forest stands due to canopy structure (Ringgaard et al., 2012). Errors in parameterization of model inputs related to the canopy conditions, such as LAI or emissivity, could also play a significant role at the forest site. For example, MODIS LAI estimates are known to be not particularly accurate in coniferous forests (Jensen et al., 2011;Kauwe et al., 2011).
All those issues affect flux modelling at both high and low spatial resolutions, however they might be less significant at low resolution due to spatial averaging of the modelled fluxes. Finally, we discuss the impact on the accuracy of estimated fluxes of using model meteorological data, (Tables 4-7) instead of measured data (Tables 2 and 3) as input. It is clear that the ERA-Interim meteorological fields are suitable as Table 4. Statistical comparison of modelled vs. observed fluxes at VOU with the meteorological inputs provided by the ERA-Interim data set. Statistics are shown for the EF approach for estimating constant ratio used during the disaggregation procedure and for the non-disaggregated high, ND H , and low, ND L , resolution fluxes. Modelled fluxes are instantaneous at the time of satellite overpass and observed fluxes are 30 min averaged EC measurements. The statistical parameters used are bias (modelled-measured), root mean square error (RMSE), coefficient of variation (CV -RMSE divided by mean of observed values), and correlation (r inputs into the DTD or TSEB models. The only exception is the incoming short-wave radiation (SSRD) product which severely underestimates the clear-sky incoming short-wave radiation measured at the tower sites. The reason for this is that SSRD is not a clear-sky product, so it models the radiation for some estimated cloud cover. In regions such as Denmark the estimated cloud cover within a 0.75 • pixel can be significant even though at certain locations within this pixel (such as at the flux tower sites) the conditions are cloud free. Fortunately, the clear-sky incoming short-wave radiation around noon remains quite steady at regional scales. Therefore, it is possible to extrapolate point observations of the radiation to all cloud-free areas within a region. On the other hand, using the ERA Interim air temperature esti-mates instead of tower-based observation can even improve the model performance. For example in GLU a large reduction in bias of H is obtained when running the DTD model (and the disaggregation) with ERA Interim meteorological inputs. This could be caused by the fact that (1) DTD was designed to reduce errors caused by systematic bias in the input temperatures, and (2) that the air temperature estimated by the meteorological forecast and analysis models is representing regional blended air temperature rather than local, tower measured, air temperature. This indicates that while there is a dominant unsystematic difference between the air temperature measured at the flux tower site and air temperature at another point in the modelling domain, due to different heating of the air from the underlying surfaces, the difference between the modelled blending height temperature at very low resolution (0.75 • in case of ERA interim) and air temperature at a point within the modelling domain has a rather dominant systematic bias. This allows DTD to obtain more accurate results with the modelled meteorological inputs and this improvement is propagated through the disaggregation procedure.

Conclusions
In the current study we have looked at disaggregating MODIS spatial scale (930 m) sensible heat fluxes derived with the DTD model to Landsat thermal observations' spatial scale (60-120 m resampled to 30 m) using the TSEB model, and the assumption of self-preservation of evaporative fraction and ratios of H /R s, in and LE/R s, in , at a highly heterogeneous agricultural site and a more homogeneous coniferous plantation forest. It was found that using EF as the CR parameter during the disaggregation produces the most balanced results for H and LE at both agricultural and forested sites. The results at both sites also show that disaggregating the low-resolution fluxes to higher resolution produced more accurate results than when TSEB was applied directly to high-resolution Landsat data. This indicates that the low-resolution fluxes are accurate at the 1 km spatial scale, since they provide useful additional information to the highresolution fluxes during the disaggregation procedure. It also corroborates the theory raised in the Introduction that the discrepancy between fluxes modelled with DTD and measured using tower-based EC equipment is in large part due to the scale mismatch between the 930 m model pixel and the measurement footprint, especially at heterogeneous sites. At the agricultural site the disaggregated high-resolution fluxes compare very well with the flux tower measurements with small bias (−3 W m −2 for H and −25 W m −2 for LE) and RMSE (25 W m −2 with CV of 0.19 for H and 37 W m −2 with CV of 0.13 for LE) and correlation coefficient above 0.94. At the physically more complex forest site the disaggregated high-resolution fluxes were not so accurate, with the low-resolution fluxes comparing more favourably to the flux tower measurements. We have also shown that when the tower-measured meteorological model inputs are replaced with ERA-Interim model inputs (with the exception of incoming short-wave radiation) the accuracy of the DTD model, and the disaggregated fluxes, is not greatly affected and sometimes even improves, which is encouraging for applying the models for the derivation of high-resolution fluxes at regional scales. The results show that it is possible to accurately model heat fluxes in highly heterogeneous areas at both MODIS and Landsat spatial scales.
In addition to evaluating the disaggregation procedure we have made a small, but significant, modification to the DTD model by replacing the "parallel" resistance network with "series" resistance network which explicitly takes the incanopy air temperature into consideration. The modification resulted in large improvement in the accuracy of the modelled fluxes at both evaluation sites.
Further work should be conducted to better understand the processes occurring in forested ecosystems and to incorporate them into the TSEB models. Additionally, the performance of DTD and the disaggregation procedure when using new-generation sensors, such as VIIRS on the Suomi NPP satellite or SLSTR on the upcoming Sentinel-3 satellite, should be evaluated since the Terra and Aqua satellites are already running beyond their expected design life. Finally, even though the spatial resolution of Landsat-scale flux estimates is useful for many practical applications, such as precision agriculture, the low temporal resolution is a limiting factor. This is especially true in regions such as Denmark where the very frequent cloud cover results in only a few high-resolution observations per year. Therefore, the recent advances in temporal data fusion techniques (e.g. Cammalleri et al., 2013) should be tested in Danish climatic conditions.

Appendix A
When a study refers to model equations that come from a number of different papers, it is often unclear to the reader which formulation was actually used. Therefore, the purpose of this appendix is not to describe new model developments (apart from the series resistance network in DTD) but to clearly show the model implementation used in this study.

A1 TSEB model description
The TSEB model implemented in this study assumes an interaction between the soil and vegetation fluxes, i.e. the flux resistance network is implemented in series (see Fig. 11 in Norman et al., 1995). In the initial state of the model it is assumed that there is neutral atmospheric stability, meaning that the Obukhov length, L, is approaching ±∞. The actual stability of the boundary layer is later iteratively derived.
Firstly, the parameters which do not depend on L or canopy and soil temperatures, T C and T S respectively, are calculated. The fraction of vegetation that is green, f g = 1.2 EVI NDVI , is estimated for all the land-cover types except for croplands during the growing season (day of year ≤ 180) where it is assumed that the vegetation is fully green (Guzinski et al., 2013). The leaf area index taken from the MCD15A2 MODIS product is assumed to be the green leaf area index, LAI g . Therefore the total leaf (plant) area index is calculated as LAI = LAI g f g and is used in all the following equations with a symbol F . The only exception is in coniferous forest, where LAI g is used, also represented by a symbol F (see Sect. 2.2).
The nadir-view clumping factor, 0 , is assigned a value of 1.0 for the croplands and 0.5 for the coniferous forest, although in other studies the clumping factor is estimated (Kustas and Norman, 1999). The fraction of view of the radiometer covered by the vegetation depends on clumping factor and LAI as well as the VZA of the radiometer in radians, θ, and is calculated following Eq. (3) from Norman et al. (2000) as where θ is the clumping factor at VZA θ (Kustas and Norman, 1999): where D is the ratio of vegetation height to plant crown width which is set to a value of 1.0 for the croplands and 3.5 for the coniferous forest. A maximum limit of 0.95 has to be applied to f (θ ) to ensure that a fraction of soil is always visible to the radiometer. Without this limit T S calculated by the model can obtain extreme, and hence unrealistic, values. Equations for deriving displacement height, d 0 = 0.65h C , local roughness length for momentum transport, z 0M = 0.13h C , and local roughness length for heat transport, z 0H = z 0M exp(2) , are taken from Norman et al. (2000) and depend only on vegetation height, h C . In coniferous forest h C is kept constant at 20 m, while in cropland it depends on LAI and is calculated as follows: where h C,max is the maximum value of h C , set to 0.8 m, and F f is the value of LAI when the crop has reached a full height, set to 2. The net radiation reaching the ensemble soil and vegetation surface, R n , is estimated as the sum of its short-wave and long-wave components, R s and R l respectively: where σ is the Stefan-Boltzmann constant, α is combined soil and vegetation albedo derived from satellite observations, surf is combined soil and vegetation emissivity also derived from satellite observation and atm is the emissivity of the atmosphere derived following Brutsaert (1975) as In the above equations air temperature, T A , and LST, T R , are in Kelvin.
Once the parameters that remain constant for the duration of an individual TSEB run are calculated, the iterative part of the model can be computed initially with the assumption of |L| → ∞. First the wind friction velocity, u * , is calculated following rearranged Eq. (2.54) from Brutsaert (2005): where u is the wind speed measured at height z u , k is the von Karman constant and M (ζ ) is the Monin-Obukhov stability correction function for momentum calculated as in Eqs. (2.59) and (2.63) of Brutsaert (2005): where ζ = z u −d 0 L or ζ = z 0M L as required, y = −ζ , x = ( y a ) 1 3 , 0 = − ln(a) + 3 1 2 ba 1 3 π 6 , a = 0.33 and b = 0.41. In the second equation the value of y is limited such that y ≤ b −3 . In neutral atmospheric stability condition, when |L| → ∞, the stability correction function is set to 0.
There are three resistances in the soil-canopy-atmosphere heat flux network: R A -aerodynamic resistance to heat transport in the surface layer, R S -resistance to heat transport from the soil surface and R x -the total boundary layer resistance of the leaf canopy. The first resistance is estimated following Norman et al. (2000) as where H (ζ ) is the Monin-Obukhov stability function for heat, calculated in the same way as M (ζ ) for stable conditions and as in Eq. (2.64) from Brutsaert (2005) for unstable conditions: where y = −ζ , c = 0.33, d = 0.057 and n = 0.78. Calculation of R S is also taken from Norman et al. (2000): In the above equation the parameter c T is varying smoothly from a value of 0.006 ms −1 for LAI less than 2 to 0.004 ms −1 for LAI more than 2, b is a constant with a value of 0.012 (unitless) and u S is the wind speed just above the soil surface and is determined from wind speed just above the canopy, u C , following Norman et al. (1995) and taking 0 into account: where s is the leaf size in metres, h S is set to 0.05 m and h C has a minimum limit of 0.1 m in the u S equation. The final resistance R x is calculated as (Norman et al., 1995) where C is a constant with value of 90 s 1/2 m −1 and u d is wind speed at height d 0 + z 0M and is derived using the equation for u S with h S = d 0 + z 0M .
Once the values of the three resistances are known the temperature of the canopy, T C , soil, T S , and the inter-canopy air, T AC can be estimated. Firstly, the energy divergence in the canopy, R n , has to be established. During the first iteration, when T C and T S are not yet known, the short-wave and long-wave components of the net radiation are lumped together, and the divergence is calculated following Eq. (8b) from (Norman et al., 2000) and taking 0 into account: where θ s is the sun zenith angle and κ is an extinction coefficient varying smoothly from 0.45 for LAI more than 2 to 0.8 for LAI less than 2. In the following iterations the divergence of short-wave and long-wave radiation is treated explicitly so that R n = R s + R l . R s is calculated the same as R n in the first iteration with R n replaced by R s while R l is calculated as in Eq. (2b) of Kustas and Norman (1999): where R l,sky is the long-wave radiation from the sky calculated as in Eq. (A6) and R l,S and R l,C are long-wave radiation emitted from soil and canopy respectively and calculated using Stefan-Boltzmann equation and T S and T C . τ is the transmissivity of the vegetation estimated as τ = 1 − exp(−κ L F ) and κ L varies smoothly between 0.7 for LAI more than 1 and 0.95 for LAI less than 1. With R n it is possible to estimate the sensible heat flux of the canopy by using the Priestley-Taylor approximation (Norman et al., 2000): Initially it is assumed that the vegetation is transpiring at potential rate and the Priestley-Taylor parameter, α PT , has a value of 1.26. If implausible results are obtained, α PT can be reduced as explained later. sp is the slope of the saturation pressure curve and γ is the psychometric constant and both were obtained from Annex 3 of Allen et al. (1998). With the value of H C the temperature of the canopy can be estimated following Norman et al. (1995) as where T C,lin is the linear approximation of the canopy temperature: and T C is the correction factor: where The soil temperature can now be estimated from the canopy temperature, the T R and the viewing geometry: Finally, the inter-canopy air temperature can be estimated: With all the resistances and component temperatures now known it is finally possible to calculate the fluxes. Firstly, the canopy fluxes are calculated: Those are followed by soil fluxes: where G represents the soil heat flux and the Eq. (A32) is based on Liebethal and Foken (2007). The total sensible and latent heat fluxes are taken as the sum of their canopy and soil components: With the values of H and LE it is possible to recalculate L using the Eq. (2.46) from Brutsaert (2005): where g is gravitational constant with value of 9.8 ms −2 and E is the rate of surface evaporation in kg m −2 s −1 derived from LE using the equation from Annex 3 of Allen et al. (1998). The iterative part of the model is now re-run with the new value of L and the process is repeated until L converges to a stable value. Once L stabilizes, if the value of LE S is negative then it is assumed that the canopy transpiration has been overestimated and α PT is reduced and the iterative part of the model repeated once again. If α PT reaches zero and the modelled LE S is still negative then it is considered that there is no evaporation or transpiration in the modelled pixel (Norman et al., 1995). In those cases LE = LE S = LE C = 0 and since α PT = 0 it follows from Eq. (A21) that H C = R n . H can then be estimated as normally and the output value limited such that H ≤ R n − G. The limit is enforced since it is implausible that on a dry day without evapotranspiration the ground heat flux would be negative. If H < R n − G then any residual energy is assigned to G.

A2 DTD model description
The DTD model replaces the observations of absolute air temperature and LST with their time-differential values, taken as a difference between a night-time observation and another one in the late morning or early afternoon (Guzinski et al., 2013). Therefore, although it uses many of the TSEB model formulations, some of the key equations have been modified. The original DTD formulation uses "parallel" flux resistance network (see Fig. 1 in Norman et al., 1995), instead of the "series" resistance network implemented in TSEB. The former one is a simpler formulation which ignores the interaction between soil and vegetation fluxes and potentially can produce less accurate results (Kustas and Norman, 1999).
The main DTD equation is derived by applying Eq. (14) from Anderson et al. (1997) to night-time and daytime temperature observations, taking the difference of the two and simplifying by removing insignificant early morning, or night-time, fluxes (Norman et al., 2000): H 1 = ρc p (T R,1 (θ 1 ) − T R,0 (θ 0 )) − (T A,1 − T A,0 ) (1 − f θ,1 )(R A,1 + R S,1 ) The subscripts 0 and 1 in the above equation, and in all the following equations, refer to observations taken at night and during the day respectively. f θ,1 can be calculated in the same fashion as in the TSEB model. The sensible heat flux of the canopy, H C,1 , is derived using Eq. (A21) from the TSEB model with R n estimated with the short-wave and long-wave components of R n lumped together. The resistances used in the "parallel" resistance network, R A and R S , can also be calculated using the same equations as in TSEB. However, there is one important change in that the Richardson number, Ri, is used as an approximation for z u −d 0 L in all the resistance equations. Ri is calculated using time-differential observations as in Norman et al. (2000): In this study a new formulation for estimating H 1 in the DTD model using the "series" resistance network has