Global assessment of Vegetation Index and Phenology Lab ( VIP ) and Global Inventory Modeling and Mapping Studies ( GIMMS ) version 3 products

Earth observation-based long-term global vegetation index products are used by scientists from a wide range of disciplines concerned with global change. Intercomparison studies are commonly performed to keep the user community informed on the consistency and accuracy of such records as they evolve. In this study, we compared two new records: (1) Global Inventory Modeling and Mapping Studies (GIMMS) normalized difference vegetation index version 3 (NDVI3g) and (2) Vegetation Index and Phenology Lab (VIP) version 3 NDVI (NDVI3v) and enhanced vegetation index 2 (EVI3v). We evaluated the two records via three experiments that addressed the primary use of such records in global change research: (1) leaf area index (LAI), (2) vegetation climatology, and (3) trend analysis of the magnitude and timing of vegetation productivity. Unlike previous global studies, a unique Landsat 30 m spatial resolution and in situ LAI database for major crop types on five continents was used to evaluate the performance of not only NDVI3g and NDVI3v but also EVI3v. The performance of NDVI3v and EVI3v was worse than NDVI3g using the in situ data, which was attributed to the fusion of GIMMS and MODIS data in the VIP record. EVI3v has the potential to contribute biophysical information beyond NDVI3g and NDVI3v to global change studies, but we caution its use due to the poor performance of EVI3v in this study. Overall, the records were most consistent at northern latitudes during the primary growing season and southern latitudes and the tropics throughout much of the year, while the records were less consistent at northern latitudes during green-up and senescence, and in the great deserts of the world throughout much of the year. These patterns led to general agreement (disagreement) between trends in the magnitude (timing) of NDVI over the study period. Bias in inter-calibration of the VIP record at northernmost latitudes was suspected to contribute most to these discrepancies.


Introduction
The normalized difference vegetation index (NDVI) (Rouse, 1974) is defined as (ρ NIR −ρ RED )/(ρ NIR +ρ RED ), where ρ NIR and ρ RED are surface reflectance in the near infrared (NIR; 0.725-1.10µm) and visible red (0.58-0.68 µm), respectively.As plants become more photoactive, they absorb more visible red light due to the chlorophyll content of leaves and stems, and scatter more in the near infrared due to the alignment of cell walls (Tucker et al., 1994).This relationship, detected by remote sensing instruments at the canopy scale, has the effect of making the index increase (decrease) as the density of the canopy increases (decreases; Tucker, 1979).As such, NDVI has been used widely in global change research with Earth observation remote sensing for three general purposes: (1) the estimation of canopy properties related to light-use efficiency, such as the leaf area index (LAI) and fraction of photosynthetically active radiation intercepted by the canopy (F PAR ; e.g.Zhu et al., 2013), (2) representation of Published by Copernicus Publications on behalf of the European Geosciences Union.
vegetation climatology in soil-vegetation-atmosphere transfer models (e.g.O'ishi and Abe-Ouchi, 2009), and (3) detection of trends in vegetation (e.g. de Jong et al., 2011) and phenology (e.g. de Jong et al., 2012).Several agro-ecosystem modeling applications fall into these categories, including agro-climate forecasting (Funk and Brown, 2006), drought monitoring (Karnieli et al., 2006), and crop yield estimation (Xin et al., 2013).Although NDVI is widely used, it is sensitive to atmospheric effects, soil background, and saturates at high LAI.The enhanced vegetation index (EVI) was introduced to overcome these limitations, as it includes a visible blue band to reduce atmospheric effects, calibration terms to reduce the effects of soil background, and does not saturate as severely as NDVI at high LAI (Huete et al., 2002).EVI has also been used in a wide array of global change studies, but post 2000, when the Moderate-Resolution Imaging Spectroradiometric (MODIS) satellite sensor began retrieving visible blue reflectance (see Huete et al., 2010 for a review).
The Advanced Very High Resolution Radiometer (AVHRR) is the most commonly used sensor for long-term (i.e.pre-MODIS) global change studies, because it began retrieving visible red and NIR reflectance needed to estimate NDVI from 1981 (Brown et al., 2006).The AVHRR sensor has been on board eight National Oceanic and Atmospheric Administration (NOAA) satellites: 7 (1981)(1982)(1983)(1984)(1985), 9 (1985-1988 and 1994-1995 descending), 11 (1988-1994), 14 (1995-2000), 16 (2000-2003), 17 (2003-2009), 18 (2005-present), and 19 (2009-present).Reflectance data collected from the earlier AVHRR sensors (7, 9, 11, and 14) were difficult to process and synthesize, because they lacked onboard calibration; the NIR channel was sensitive to water, sun glint, glaciers at high latitudes, and clouds; and of orbital drift (Rao andChen, 1995, 1996).These issues were rectified with the launch of the AVHRR sensors onboard NOAA 16,17,18,and 19, but have resulted in radiometric and spectral inconsistencies across sensors that can significantly bias global change analyses (van Leeuwen et al., 2006).Various methods have been developed to make these data continuous and consistent through time, but take different approaches and are frequently updated, necessitating new accuracy assessments to inform the user community as they evolve.
The Global Inventory Modeling and Mapping Studies (GIMMS; Tucker et al., 1994) and Vegetation Index and Phenology Lab (VIP; Didan, 2014) AVHRR products are actively used and frequently updated, but represent fundamentally different approaches to synthesis.The NOAA Global Vegetation Index (Jiang et al., 2010) is a category onto itself, since it is stationary and therefore not appropriate for change detection.Both GIMMS and VIP are aggregated to a 15-day time step from daily data and are calibrated with higher spatial resolution sensors in the period that overlaps NOAA 7,9,11,and 14 and NOAA 16,17,18,and 19.However, before aggregation, the former undergoes minor radiometric and spectral correction, while the later undergoes rigorous atmospheric correction.Perhaps most importantly, GIMMS is developed solely from AVHRR, while VIP is a blend of the AVHRR 1981-1999 Long-Term Data Record (LTDR) (Nagol et al., 2009;Pedelty et al., 2007) and MODIS 2000-present.Finally, the VIP product includes EVI2 (Jiang et al., 2008), which is a red-NIR version of EVI that has not been widely evaluated and can potentially provide additional biophysical information and improve the accuracy of long-term global change analyses (Rocha and Shaver, 2009).Given these differences, studies have been performed at the global (Beck et al., 2011) and regional (Scheftic et al., 2014) scale to assess the performance of older product versions.Only one recent study compared the product versions analyzed in this study globally, but only for the consistency of trends (Tian et al., 2015).There remains no general consensus on which product is superior; however, GIMMS NDVI tends to be more appropriate than VIP NDVI for trend analysis, because the combination of poor orbital drift correction and blending between LTDR and MODIS potentially contributes to large interseasonal variations in VIP NDVI.VIP NDVI, on the other hand, may be more appropriate for estimating phenology (start of season, length of season, and timing of peak NDVI) and other applications that require absolute NDVI values.In each case, the performance of EVI2 was not evaluated nor was in situ data used for intercomparison.
The aim of this study was to perform a global assessment of the latest version of GIMMS and VIP over a 30-year period (January 1982 to December 2011) in order to aid the user (global change) community in interpreting results that involve these data.In doing so, we helped resolve the superiority of one product over another.The assessment was performed with three experiments that address the three major themes of global change research that involve Earth observation remote sensing.Unlike other intercomparison studies, we evaluated EVI2 and used an agro-ecosystem database comprised of relatively high spatial resolution Landsat and in situ LAI sample pairs to assess the performance of each product for agro-ecosystems in absolute terms.In addition, unlike other studies, the trend analysis was performed not only on the magnitude of change across the globe on an annual basis, but also on the change in the timing of NDVI according to the unique phenology in each hemisphere.

Global Inventory Modeling and Mapping Studies (GIMMS) normalized difference vegetation index version 3 (NDVI3g)
The GIMMS vegetation index record evaluated is version three, which is labeled as NDVI3g for the remainder of the paper.Full details on the product version can be found in Pinzon and Tucker (2014).The new product includes a se-ries of updates since the original GIMMS NDVI and second generation NDVIg (Tucker et al., 2005) products.Like ND-VIg, it is a non-stationary NDVI series at 15-day intervals and 1/12 • (∼ 8 km at the equator) resolution; corrected for orbital drift, Rayleigh scattering, and radiometric and spectral inconsistencies over deserts; furthermore, takes an empirical (Bayesian) approach to normalize overlapping AVHRR periods with another higher-resolution sensor that overlaps the two periods.In addition, daily NDVI data are scaled to 15day composites using a maximum value compositing (MVC) algorithm (Holben, 1986), which reduces further inconsistencies in the daily data.The most unique development in NDVI3g is the use of Sea-viewing Wide Field-of-view Sensor (SeaWiFS) for intercalibration instead of the System Pour I'Observation de la Terre (SPOT) sensors.This is intended to reduce significant bias in NDVI at extreme northern latitudes that has been observed in SPOT imagery (Guay et al., 2014).

Vegetation Index and Phenology Lab version 3 NDVI3v and enhanced vegetation index 2 (EVI3v)
The VIP vegetation index record evaluated is also in its third version, which is labeled as NDVI3v and EVI3v for NDVI and EVI2 data, respectively, for the remainder of the paper.Further information on the product version can be found in Didan (2014).Like previous versions, it is a nonstationary series at 15-day intervals and 1/20 • (∼ 5 km at the equator) resolution; corrected using radiometric, drift, and cloud screening procedures recommended in El Saleous et al. (2000); an atmospheric algorithm that reduces the effects of Rayleigh scattering, ozone, aerosols, and water vapor (Vermote et al., 1997); and takes an empirical (linear regression by land cover type) approach for intercalibration.Unlike GIMMS, SPOT is used for intercalibration and daily data are aggregated to 15-day composites using the constrained view angle -maximum value composite (CV-MVC) approach (Cihlar et al., 1997).Unlike MVC, CV-MVC does not give preference to off-nadir values that may be higher than true (at-nadir) values.Version three includes one notable improvement over version two, namely, the correction of NDVI and EVI2 for sparsely vegetated areas pre-MODIS era (Scheftic et al., 2014).EVI2 is derived from the following equation and responds similarly to EVI (Jiang et al., 2008): The VIP product contained persistent data gaps due to cloud cover and other noise and was at a higher spatial resolution than the GIMMS product, so additional steps were taken to process it before the assessment.A MODIS filtering algorithm described in Xiao et al. (2003) andFensholt et al. (2006).Data gaps due to cloud cover and poor data quality were not gap-filled.The algorithm was considered a compromise between preserving the actual data as much as possible and filling missing data so that a reasonable com- parison could be made.Figure 1 shows the percentage of missing data filled by the filtering algorithm.On a monthly basis, less than 20 % of the data was filled for the majority of pixels.Notable exceptions were primarily in the mid-and extreme latitudes during wintertime.The most severe case was in southern Asia during the monsoon (June-September) where more than 50 % of the pixels were filled by the filtering algorithm.After the filter was applied, NDVI3g was resampled to NDVI3v/EVI3v resolution using the gdalwarp utility (http://www.gdal.org/gdalwarp.html) with default parameters.Missing values were then made consistent across GIMMS and VIP, so that the summary statistics (experiment two below) and trends (experiment three below) were captured only for the 15-day values that the two products shared.The data sets were then resampled back to the native NDVI3g spatial resolution for evaluation.
M.  (Glenn et al., 2008).Monsi and Saeki (1953) found that light attenuation in the canopy followed Beer's law (Beer, 1852).This means that for a random canopy with a spherical leaf angle distribution, LAI, the second most commonly derived biophysical parameter from NDVI and EVI, can be approximated from F PAR using the following equation (Norman et al., 1995) where k is an extinction coefficient and LAI is the leaf area index (m 2 m −2 ).Given the importance of NDVI and EVI in estimating F PAR and LAI, standard regression techniques were used to measure the relative ability of NDVI3g, NDVI3v, and EVI3v to capture in situ LAI variability.It is difficult to compare these records to in situ LAI directly, because the NDVI/EVI-LAI relationship is typically scale dependent or non-linear (Friedl et al., 1995;Gao et al., 2000;Hall et al., 1992;Huete et al., 2005).Therefore F PAR derived from Landsat Thematic Mapper/The Enhanced Thematic Mapper Plus (TM/ETM+) 30 m resolution surface reflectance data was used intermediately to downscale NDVI3g, NDVI3v, and EVI3v to 30 m resolution to facilitate the comparison.

Landsat Thematic Mapper/The Enhanced Thematic Mapper Plus and in situ LAI
The Landsat TM/ETM+ surface reflectance and in situ LAI data were extracted from a database that was developed to determine the ability of Landsat-based NDVI, EVI2, and other vegetation indices to predict LAI for field crops around the world.Results of the analysis, along with a full description of the database can be found in Kang et al. (2015).Figure 2 shows the distribution of the Landsat-LAI sample pairs in the database.It includes nine major global field crops (barley, cotton, maize, pasture, potato, rice, soybean, sugar beet, and wheat) and several less common fields crops classified as "other" for purposes of this analysis.The in situ LAI was determined using ground-based optical (LAI 2000, AccuPar, and hemispherical) and destructive techniques and compiled from a number of sources.These include AmeriFlux (http: //ameriflux.ornl.gov/)and AsiaFlux (http://asiaflux.net/)regional flux networks; experimental and validation projects (e.g.Marshall and Thenkabail, 2015); the VALidation of European Remote sensing Instruments project (Baret et al., 2014); the Australian Airborne Cal/val Experiments for SMOS project (Peischl et al., 2012); as well as data retrieved from peer-reviewed journals.For each LAI record in the database, Landsat TM/ETM+ radiance was extracted from the US Geological Survey archive within a ±15-day window encompassing the date of in situ measurement and converted to surface reflectance with the Landsat Ecosystem Disturbance Adaptive Processing System (Masek et al., 2006).NDVI and EVI2 were computed using the equations above.In rare cases where more than one LAI observation fell in a single Landsat pixel, the LAI values were averaged, so that each in situ entry corresponded to a unique Landsat NDVI/EVI2 value.After averaging, the data set consisted of 2086 LAI-Landsat pairs, which was subsequently reduced to 1459 measurements after further quality control measures described in Kang et al. (2015) were taken to remove inconsistent samples.

Downscaling long-term records with the fraction of photosynthetically active radiation intercepted by the canopy (F par ) and evaluation with in situ LAI data
Downscaling was performed by converting AVHRR and Landsat vegetation indices to F PAR .Unlike the NDVI/EVI-LAI relationship, the NDVI/EVI-F PAR relationship is quasiscale invariant (Asrar et al., 1992;Friedl et al., 1995;Gutman and Ignatov, 1998;Myneni et al., 2002;Sellers, 1985), meaning a coarse-resolution F PAR pixel is approximately equal to the average of overlapping higher-resolution F PAR pixels.
In this study, on a per pixel basis, most of the in situ LAI was retrieved only once, so using a ratio-based approach was not feasible.Therefore, the AVHRR vegetation indices were downscaled to 30 m spatial resolution by regressing (linearly) Landsat F PAR and NDVI3g, NDIV3v, and EVI3v F PAR .In order to reduce the impact of land cover dependence, the models were developed for each crop.
The fraction of photosynthetically active radiation intercepted by the canopy was computed using the ratio method first proposed in Gutman and Ignatov (1998): where VI min is the vegetation index (NDVI or EVI2) for bare soil (LAI → 0), and VI max is the vegetation index (NDVI or EVI2) for dense vegetation (LAI → ∞).VI min and VI max for NDVI and EVI2 were set to 0.05 and 0.95 (Fisher et al., 2008;Mu et al., 2007).These limits are sometimes considered dependent on the spatial and temporal resolution and land cover type (Zeng et al., 2000).However, the limits proved arbitrary for downscaling purposes and using the range 0.05 to 0.95 guaranteed that fractions ranged from zero to one.Once NDVI3g, NDVI3v, and EVI3v F PAR were downscaled to corresponding Landsat data, their performance was evaluated by regressing them (linearly) with the in situ LAI data.Since the relationship between F PAR and LAI is logarithmic, as shown in Eq. ( 2), standardized residual plots (not shown) were made and linear transformations were per-Figure 2. Sites where in situ (destructive or optical) measurements and Landsat Thematic Mapper/The Enhanced Thematic Mapper Plus ground reflectance data were compiled, resulting in more than 1400 data pairs.The sites are overlaid with 1 km grid cells that contain 5 % or more crop area (Ramankutty et al., 2008).
formed to verify that the assumptions of normality were met.In most cases, transformations were not required.The performance of the final model selected in each case was characterized by the coefficient of determination (R 2 ), significance tests, and root mean square error (RMSE).
Of the original 1459 Landsat-LAI data pairs, only 242 were used for the final analysis.The majority of the data loss was due to considerable overlap of LAI data in space and time, because they were collected without remote sensing applications in mind: (1) LAI values that were captured by the same coarse-resolution pixels were averaged along with Landsat NDVI/EVI2 and (2) due to the presence of missing values in the long-term records, LAI and Landsat NDVI/EVI2 were averaged on a 15-day basis.These reductions led to small sample sizes for each crop.The sample sizes for cotton and rice were so small that they were omitted to avoid over-fitting.In order to increase the sample size on a per-crop basis, two aggregations based on the presumed similarity of crop spectral/canopy characteristics were made: (1) barley and wheat (winter and spring varieties) were classified as wheat and (2) garlic, onion, potato, and sugar beet were classified as tuber.

Second experiment: comparison of NDVI3g and
NDVI3v climatology used to parametrize SVAT models SVAT models traditionally were stand-alone and used to simulate the interaction of incoming solar radiation with the canopy driven by F PAR and biogeochemical processes for a single location, but are becoming increasingly coupled to regional and global scale climate models and run over regu-larized grids, given the importance of vegetation feedbacks on the atmosphere (Quillet et al., 2010).A common data set used to parameterize the F PAR component of SVATs is the 0.15 • resolution monthly climatology of AVHRR NDVI (Gutman and Ignatov, 1998).Given the importance of the F PAR climatology in SVATs, long-term summary statistics for NDVI3g and NDVI3v were computed as part of the assessment.EVI3v was not included in this experiment, because it does not have a GIMMS counterpart to compare it to, has different and more well-documented statistical properties than NDVI, and it is derived from the same visible red and NIR channels and underwent the same corrections as NDVI3v making its comparison redundant.The summary statistics were computed from the 15-day data, but the results are presented here on a monthly basis to reflect the NDVI climatology used in SVATs.The summary statistics included: mean, standard deviation, R 2 from linear regression, and slope from linear regression.The mean and standard deviation statistics are most critical for understanding the differences in NDVI climatology, while R 2 and slope indicate the strength, magnitude, and direction of the correlation between the two data sets.All summary statistics are presented with significance (p) ≤ 0.05.Non-linear correlation statistics were also computed, but were not included, because they showed similar spatial patterns as the linear statistics.

Third experiment: comparison of NDVI3g and NDVI3v trends in magnitude and timing (phenology)
Changes in the magnitude and timing (phenology) of plant productivity are important for understanding how ecosystems respond to climate change (Nemani et al., 2003).In North America, for example, trend analysis of these changes has revealed that global warming is driving an increase in plant productivity and a lengthening of the growing season (i.e. earlier green-up in the spring and later senescence in the fall; Barichivich et al., 2013).The characterization of the magnitude and phenology of productivity over a year is typically estimated with empirical methods that include NDVI and other bioclimatic predictors such as temperature and relative humidity (e.g. Brown et al., 2012).In order to avoid confounding the assessment of GIMMS and VIP with other variables, harmonic regression (Eastman et al., 2009;Jakubauskas et al., 2001) was performed on the vegetation index records to measure the magnitude and timing of NDVI on an annual basis.As with experiment two, EVI3v was not evaluated in this experiment.A trend analysis was then performed on the regression parameters to compare NDVI3g and NDVI3v as surrogates for the change in magnitude and timing of plant productivity over time.The primary parameters of harmonic regression are the amplitude (in this case the difference between peak and mean NDVI) and phase (in this case timing of NDVI peaks and troughs).Amplitude and phase are computed by fitting a series of sinusoidal functions to the time series (Eq.3).The harmonic regression was performed on a monthly basis for each year.Monthly values were determined by taking the maximum NDVI of the two 15-day composites per month.
where NDVI t is the predicted normalized difference vegetation index at month (t), NDVI 0 is the annual monthly mean, i is the number of harmonics up to the jth harmonic, N is the number of samples (months) in the year, and A and B are coefficients used to compute the amplitude and phase.
The regression was performed for the first harmonic, which represents the primary growing season, because multimodal systems (harmonics > 1) are uncommon and capturing them risks over-fitting.The change in amplitude and phase over time was quantified using the Theil-Sen technique (Gilbert, 1987).The Theil-Sen technique takes the median slope over all possible pairwise slopes through time.Unlike linear regression, it does not require normality or homoscedasticity, making it appropriate for trend analyses involving NDVI data (de Beurs and Henebry, 2005).The significance of the amplitude and phase trends (p ≤ 0.05) was identified using the nonparametric Mann-Kendall test.Since the primary growing season in the Southern Hemisphere occurs over two given calendar years, the trend analysis was repeated for the Southern Hemisphere by advancing the regression 6 months ahead each year.This resulted in one less year or a 29-year trend analysis for the Southern Hemisphere.

First experiment: performance of long-term records using Landsat F PAR and in situ LAI
The accuracy of each long-term record when compared to in situ LAI was mixed, but NDVI3g performed moderately better than NDVI3v and EVI3v.The scatter plots of predicted (downscaled) NDVI3g, NDVI3v, and EVI3v F PAR vs. Landsat F PAR for wheat and pasture are shown in Fig. 3, while the summary statistics of the linear models used to downscale the records for all crops with sufficient samples sizes and reasonable correlations are shown in Table 1.The models used to downscale NDVI3g yielded higher correlations and lower error than the models used to downscale NDVI3v for maize and wheat, while NDVI3v yielded higher correlations and lower error for soybean and pasture, and EVI3v was the most difficult to downscale of the three.Specifically, R 2 for NDVI3g over NDVI3v was 0.04 for maize and 0.18 for wheat, while R 2 for NDVI3v over NDVI3g was 0.06 and 0.04 for pasture and soybean.It is important to note, however, that the strength of the relationships were low across all records with the exception of pasture, which could be due to the homogeneity of pasture over large areas.The relationship for tuber was so poor that it was not included in the LAI evaluation.The relationship between the downscaled NDVI3g, NDVI3v, and EVI3v F PAR and in situ LAI are shown for wheat and pasture is in Fig. 4, while the model statistics and transformation for a linear comparison, are presented in Table 2.The NDVI3g-LAI models captured in situ variability better than NDVI3v and EVI3v for maize ( R 2 = 0.06), pasture ( R 2 = 0.11), and wheat ( R 2 = 0.10), with comparable results between NDVI3g and NDVI3v for soybean.EVI3v tended to perform better than NDVI3v for two of the crops: pasture ( R 2 = 0.05) and wheat ( R 2 = 0.04).As can be seen in Fig. 4, however, the predictive power of EVI3v could be inflated by leveraging at high LAI, i.e.EVI3v tends to be more variable than NDVI3v at higher LAI.

Second experiment: similarity of NDVI3g and NDVI3v climatology
On a monthly basis, NDVI3g and NDVI3v showed a high level of consistency in terms of relative magnitude expressed as R 2 (Fig. 5) and direction expressed as slope (Fig. 6).Both metrics were computed with the slopes forced through the origin (0, 0).In the Northern Hemisphere, R 2 approached one after green-up (May) and progressively got stronger over the boreal summer months (June, July, and August).The poorest correlations (R 2 < 0.7) were seen primarily at the northern-most latitudes during the transition between boreal winter and spring.Correlations were more consistent in the Southern Hemisphere where snow and cloud cover was notably less than in the north.A glaring exception, however, was the Strut Stony Desert of South Central Australia, which showed poor correlations during the transition between austral summer (December, January, and February) and fall.The tropics showed high and significant correlations throughout most of the year as well.The slopes followed a similar pattern as the correlations, with values approaching a one-toone relationship (slope = 1.0) after the transition from winter to spring in the Northern Hemisphere and consistently over much of the year in the tropics and Southern Hemisphere.
The great deserts of the world and sparsely vegetated areas had slopes approaching zero throughout the year.Since the slopes were expressed with NDVI3v as the dependent variable and the slopes were always less than one, NDVI3g was always less than NDVI3v.The difference in NDVI3g and NDVI3v magnitudes is more clearly shown in Fig. 7, which illustrates the monthly latitudinal mean and standard deviation for both.Mean NDVI3v was always higher and more variable than NDVI3g.In addition, large divergence in means between the two records occurred during senescence in the Northern Hemisphere.Other patterns were more consistent: NDVI3g and NDVI3v were high in the tropics throughout the year and peaked and declined following the seasons in the Northern and Southern hemispheres; and the standard deviations for both were higher in the Northern Hemisphere than the Southern Hemisphere due to continentally.

Third experiment: similarity of NDVI3g and NDVI3v trends in magnitude and phenology
The two NDVI records exhibited a high level of correspondence in maximum primary season NDVI (first harmonic amplitude), both in direction and location (Fig. 8).In terms of magnitude trends, however, NDVI3v was higher than NDVI3g.The figure was masked for pixels that had complete NDVI records to facilitate curve fitting in a given year and then again for trends that were statistically significant over the 30-year period.This resulted in no trends over much of the northern latitudes.In addition, NDVI amplitudes ≥ 0.03 per year (or 1.0 over the 30-year period) and NDVI amplitudes ≤ −0.03 (or −1.0 over the 30-year period) were flagged as missing, since NDVI ranges from −1 to 1.In most cases, however, the increase in absolute amplitude per year was less than 0.01 or 0.3 over the 30-year period.Overall, the positive NDVI3g trends appeared to be more consistent spatially in several important cropping and grazing regions, including the Great Plains of the United States; the Region del Norte Grande of Argentina; the Iberian Peninsula (particularly Portugal); Lesotho, South Africa (east), and Swaziland; Ganges (India) and Indus (Pakistan) plains; the Sahel of West Africa; and Cape York Peninsula (Australia).Negative trends (also more consistent in NDVI3g) appeared to be primarily in the great deserts of the Northern Hemisphere.
In the Southern Hemisphere, however, some negative trends were seen in the tropical forests of the Amazon and Congo River basins.The two records in terms of primary season timing (first harmonic phase) showed a lower level of correspondence than for amplitude (Fig. 9).As above, trends were not seen over much of the Northern Hemisphere.In addition, the NDVI phases ≥ 0.07 per year (or ∼ 2 months over the 30year period) and NDVI phases ≤ −0.07 (or ∼ −2 months over the 30-year period) were flagged as missing, because changes of more than 2 months were deemed aberrant.In most cases, however, the absolute change in timing was less than 2 months.As with trends in amplitude, the trends in phase were more consistent spatially over both hemispheres from NDVI3g.Earlier green-up (negative trend) represented the majority of trends in the two data sets, though considerably less than the increase in amplitude shown in Fig. 8. Negative trends were seen over many important cropping and grazing areas: California and the southwestern United States, the Iberian Peninsula, the Sahel of sub-Saharan Africa, Iran (east), South Africa (west), Turkmenistan (north), and over much of the areas bordering the deserts of Australia.Later green-up (positive trend) was primarily concentrated in the great deserts (e.g. the Great Sandy and Gibson deserts of northwestern Australia).

Discussion
This study assessed the latest versions of two non-stationary and long-term vegetation index records used in global change studies.The assessment was performed with three experiments that addressed important global change applications, namely, F PAR and LAI, vegetation climatology, and trend analysis of vegetation magnitude and phenology.The results of the analysis highlight important similarities and differences between the two records that the global change community should be aware of before using them for these applications: (1) NDVI3v was consistently higher and more variable than NDVI3g, which in Tian et al. (2015) has been attributed to artificial jumps in the record between AVHRR and MODIS periods and may contribute to relatively lower correlations and higher errors with in situ LAI; (2) the performance of EVI3v with in situ LAI compared to NDVI3g was unexpectedly poor; (3) correlations between GIMMS and VIP were the highest during the primary growing season; therefore, trends in peak NDVI were fairly consistent between the two, both showing increases over much of the globe and decreases in tropical rainforests; and (4) correlations between GIMMS and VIP were lower during green-up and senescence, which were most pronounced at high latitudes where the NDVI3g product is expected to have much lower bias due SeaWiFS inter-calibration.Overall, we recommend using NDVI3g over NDVI3v and EVI3v for vegetation climatology and trend analysis, because it is spatially and temporally more consistent.Unlike previous studies, however, the in situ LAI experiment revealed that NDVI3g is better suited for absolute measurements as well.

First experiment: performance of long-term records using Landsat FPAR and in situ LAI
Unlike previous inter-comparison studies, a unique moderate resolution remote sensing and in situ LAI database for agro-ecosystems was used for accuracy assessment.In most cases, NDVI3g was more accurate than NDVI3v or EVI3v.EVI3v performed considerably worse than NDVI3g, which is surprising, because EVI tends to be better correlated than NDVI from other sensors with canopy structural properties (Huete et al., 2002).Earlier studies have suggested that the LTDR NDVI from which MODIS data are merged in the VIP product is more appropriate for modeling applications requiring absolute values (Beck et al., 2011), meaning NDVI3v should reproduce more accurate estimates of F PAR and LAI than NDVI3g, but this was not the case in this study.Tian et al. (2015) assessed NDVI3v.They attributed jumps in the If users require the higher spatial resolution offered by VIP and the added biophysical information afforded by EVI3v for application purposes, several options exist for improving their accuracy.Perhaps the most important would be to fill the remaining data gaps in the filtered VIP data sets generated here with smoothed data (see Kandasamy et al., 2013 for examples), which will address some of the noise in the data observed in Tian et al. (2015) and this study.Another option widely used in the climate modeling community, is to generate an ensemble mean of NDVI3v and NDVI3g to account for some of the bias and uncertainties in each product.Finally, instead of using EVI3v, the red and NIR channels included in the VIP database could be used to calculate the Soil Adjusted Vegetation Index (SAVI; Huete, 1988) instead.Unlike EVI2, SAVI has undergone extensive evaluation.

Second experiment: similarity of NDVI3g and NDVI3v climatology
NDVI3g and NDVI3v showed a high level of agreement with one another at mid-latitudes during the primary growing season and in the densely vegetated tropics throughout most of the year, and a low level of agreement at high latitudes during winter months and in the sparsely vegetated sub-tropics throughout most of the year.The high level of agreement is expected, because data gaps, cloud contamination, and atmospheric water vapor is less at mid-latitudes during summer months (Beck et al., 2011;Moulin et al., 1997).The high level of agreement in the tropics was more surprising, because data gaps and cloud contamination are persistent there throughout much of the year, typically leading to large discrepancies among records (Brown et al., 2006).However, as previously stated, many contaminated pixels were omitted from the analysis.The large discrepancy at high latitudes could have been due to factors other than cloud contamination and other noise, including the (1) presence of snow cover; (2) high frequency of off-nadir pixels, which would impact the results of the compositing algorithm (MVC vs. CV-MVC); and, perhaps most importantly, (3) use of Sea-WiFS over SPOT for GIMMS inter-calibration (Hall et al., 2006).The large discrepancy in deserts and sparsely vegetated areas on the other hand was most likely due to the dominance of soil in the signal and sensitivity of NDVI to soil wetness (Jiang et al., 2006).With the high level of correlation during the primary growing season and higher and more variable NDVI3v, users should expect NDVI3v climatology during the primary growing season to be higher at mid-latitudes and in the tropics throughout most of the year, but consistent with changes in NDVI3g.During winter months, especially at high latitudes and in semi-arid to arid subtropical regions, where SeasWiFS inter-calibration is less biased, NDVI3v will be higher, more variable, and less accurate than NDVI3g.

Third experiment: similarity of NDVI3g and NDVI3v trends in magnitude and timing
NDVI3g and NDVI3v both showed greening (positive NDVI amplitude) globally, with localized browning (negative NDVI amplitude) over a 30+ year time frame, but the magnitude of the trends in the latter was higher.Therefore, trend analyses of peak NDVI or annual means will be higher in NDVI3v than NDVI3g, but the direction will be the same.The direction of change in general corroborated previous global studies.The gain or loss of plant productivity is generally attributed to biophysical drivers (temperature and precipitation), human-related change, and discontinuities in the long-term record (de Jong et al., 2012).At midlatitudes, warming (cooling) at the beginning of the growing season can lead to greening (browning) in areas where water supplies are ample.In North America east of the Great  Plains, for example, greening was observed in NDVI3g and NDVI3v, which has been attributed to temperature-driven increases in plant productivity in previous studies (Wang et al., 2011).Increased rainfall (droughts) proceeding or during the growing season can lead to greening (browning) particularly in water-limited regions such as the Sahel.As shown here, the Sahel has experienced greening over the past 30+ years.This greening, typically referred to as the "re-greening of the Sahel" is defined in other studies as the increase in woody biomass (Brandt et al., 2015) that followed the recovery of rains in the 1990s after 2 decades of severe droughts driven by below normal sea surface temperatures in the subtropical North Atlantic (Giannini et al., 2013).Deforestation is perhaps the most appreciated human driver of plant productivity.Browning in the Amazon and Congo River basins, as was shown in this study, has been attributed to widespread deforestation in previous studies (Hansen et al., 2010;Mayaux et al., 2013), though other drivers, such as the shift in Walker circulation potentially contribute to the loss as well (Zhou et al., 2014).Greening was observed in tropical rainforests as well and this has been attributed in previous studies to rapid regrowth after deforestation, the way VIs are composited, and the methods by which trends are detected (Beck et al., 2011).Some of the trends disagree with previous research and should be addressed in future studies.Most prominent were that no trend was detected at extreme northern latitudes, though previous studies have shown summer drought-driven declines in boreal forest productivity (Goetz et al., 2005), and positive trends were detected for the Region del Norte Grande of Argentina, though previous studies have shown negative trends attributed to the rapid encroachment of agriculture into subtropical forests of the region (Paruelo et al., 2004).NDVI3g and NDVIv both showed earlier green-up (negative NDVI phase) more than later green-up (positive NDVI phase), but they were less consistent with one another compared to trends in peak NDVI.NDVI3g and NDVI3v showed low correlations during green-up and diverging climatology during senescence, which could lead to discrepancies in the timing of start of season (SOS) and end of season (EOS).Global studies seldom analyze trends in vegetation timing.On a regional basis, however, the findings appear to be less consistent with previous studies.Over the majority of northern regions, for example, SOS has been retreating as shown; however unlike this study, previous studies have shown that EOS has been advancing.The combination of the two processes has led to a longer growing season attributed primarily to asymmetric and rising global temperatures.One of the limitations of the harmonic approach taken in this study is that it is rigid, i.e. it assumes that the time series oscillates at a regular interval over the year.In the future, a harmonic or other phenological model that accounts for SOS and EOS asymmetry may be more appropriate for accurate trend analysis.

Conclusions
This paper revealed important similarities and differences of two new long-term vegetation databases: (1) Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3 (NDVI3g) and (2) Vegetation Index and Phenology Lab version 3 NDVI (NDVI3v) and enhanced vegetation index 2 (EVI3v).Overall, NDVI3g performed better and more consistently than NDVI3v and EVI3v in the three experiments designed to evaluate the two products in absolute terms and changes in magnitude and timing.VIP tended to be higher in magnitude, more variable, and less consistent in terms of trends, due primarily to the blending of two sensors with different attributes (AVHRR with MODIS).GIMMS, on the other hand only uses AVHRR.The two databases showed a high level of consistency during the primary growing season, which contributed to similar changes in the relative magnitude and direction of plant productivity climatology and dynamics, which are critical to global change research.The two products were less consistent in timing, especially at the start and end of the primary growing seasons at high latitudes.It is suspected that these poor correlations are attributed to the higher-resolution sensors each product uses for intercalibration.In conclusion, we suggest users requiring a longterm product to measure biophysical parameters, vegetation climatology, and trends in plant productivity magnitude and timing to use NDVI3g and to avoid using EVI3v.

Figure 1 .
Figure 1.Percentage increase in pixels added (i.e.gaps filled) after applying the temporal filter to Vegetation Index and Phenology Lab version 3 records.

Figure 3 .
Figure 3. Scatter plots of the the fraction of photosynthetically active radiation intercepted by the canopy (F par ) vs. Landsat F par for wheat (a-c) and pasture (d-f) estimated by the Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3, Vegetation Index and Phenology Lab version 3 normalized difference vegetation index, and Vegetation Index and Phenology Lab version 3 enhanced vegetation index 2.The solid lines represent the linear model used to downscale the vegetation record for evaluation with in situ leaf area index.

Figure 4 .
Figure 4. Scatter plots of in situ leaf area index for wheat (a-c) and pasture (d-f) vs. corresponding Landsat resolution pixels downscaled from the Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3, Vegetation Index and Phenology Lab version 3 normalized difference vegetation index, and Vegetation Index and Phenology Lab version 3 enhanced vegetation index 2 data sets.The solid lines represent the leastsquares model fit.

Figure 5 .
Figure 5.The coefficient of determination (R 2 ) on a per-pixel basis for the Vegetation Index and Phenology Lab version 3 normalized difference vegetation index vs. the Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3. R 2 was determined using a 30-year time series of 15-day composites for each month.The images have been masked for significance ≤ 0.05 and latitudes ranging from 60 • N-60 • S.

Figure 6 .
Figure 6.The slope (intercept = 0) determined from linear regresion on a per-pixel basis for the Vegetation Index and Phenology Lab version 3 normalized difference vegetation index vs. the Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3. Slope was determined using a 30-year time series of 15-day composites for each month.The images have been masked for significance ≤ 0.05 and latitudes ranging from 60 • N-60 • S.

Figure 7 .
Figure 7.The latitudinal mean (solid line) and standard deviation (ribbon) of the Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3 (blue) and Vegetation Index and Phenology Lab version 3 normalized difference vegetation index (black) over 30 years.Values are shown from 60 • N-60 • S.

Figure 8 .
Figure 8.The change in maximum normalized difference vegetation index (NDVI) per year (yr) from the (a) Global Inventory Modeling and Mapping Studies (GIMMS) and (b) Vegetation Index and Phenology Lab (VIP) records.The upper panals represent the Northern Hemisphere (30-year change) and the lower panels represent the Southern Hemisphere (29-year change).The trends have been masked for significance ≤ 0.05.

Figure 9 .
Figure 9.The change in timing of the normalized difference vegetation index (NDVI) per year (yr) from the (a) Global Inventory Modeling and Mapping Studies (GIMMS) and (b) Vegetation Index and Phenology Lab (VIP) records.The upper panals represent the Northern Hemisphere (30-year change) and the lower panels represent the Southern Hemisphere (29-year change).Negative values indicate earlier green-up/scenence, while positive values indicate later green-up/scenence.The trends have been masked for significance ≤ 0.05.

Table 1 .
Summary statistics (R 2 is coefficient of determination, m is slope, b is intercept, p is significance, and RMSE is root mean square error) of the linear relationships between the fraction of photosynthetically active radiation intercepted by the canopy (F PAR ) estimated by Landsat Thematic Mapper or Enhanced Thematic Mapper Plus and F PAR estimated by the long-term vegetation

Table 2 .
Summary statistics (R 2 is coefficient of determination, m is slope, b is intercept, p is significance, and RMSE is root mean square error) of the relationships between in situ leaf area index (LAI) and fraction of photosynthetically active radiation intercepted by the canopy (F PAR ) estimated by the downscaled long-term vegetation records (NDVI3g is Global Inventory Modeling and Mapping Studies normalized difference vegetation index version 3, NDVI3v is Vegetation Index and Phenology Lab version 3 normalized difference vegetation index, and EVI3v is Vegetation Index and Phenology Lab enhanced vegetation index 2).A logarithmic transformation was performed for soybean to meet the assumptions of normality, while the in situ LAI from the other crops were not transformed.