Interactive comment on "Water, Energy, and Carbon with Artificial Neural Networks (WECANN): A statistically-based estimate of global surface turbulent fluxes using solar-induced fluorescence

from different existing datasets to combine their strengths and factor out their limitations. On one hand, I appreciate this effort to bring together different data streams and somehow harmonize them through this new consolidated product, but on the other, I am wary of this approach of blindly adding further algorithmic layers without really trying to understand mechanistically why the initial datasets have shortcomings. If all products are equally off in some places, combining them just gives the false impression that we are going in the right direction while reality is still off. Also, the FLUXNET-MTE used for training is already a machine learning product driven by various input variables, very much like WECANN is. Furthermore, there is considerable circularity in the work, since FLUXNET-MTE and MODIS GPP are both strongly based on the same flux towers used here for validation. I deem that all these points need to be acknowledged clearly and discussed thoroughly.

Re: The Tramontana et al. (2016) paper uses a regression model to upscale fluxes from FLUXNET observations. In contrast, we use remote sensing observations to estimate fluxes and use FLUXNET towers to evaluate the performance of our retrievals. The strategy is therefore quite different from the Tramontana et al. retrieval. In addition, our main objective is to show that SIF provides useful information on the rates of photosynthesis and evapotranspiration. To our knowledge, this is the first direct estimate of fluxes based on SIF data. We revised portions of the introduction section to make sure the novelty of our approach is clearly stated.
Third, the authors used the MPI-BGC product as a training dataset while testing the product against FLUXNET data. As the MPI-BGC product was trained against the FLUXNET dataset, the approach is self-correlated. Why not evaluate the product against datasets independent of MPI-BGC, e.g., water-balance-derived ET at basin scale?
Re: That is not exactly correct. We train our algorithm against a target dataset which is derived from three products (including MPI-BGC) by using the Triple Collocation method and assigning a priori weights to every product in each pixel. This means that our target dataset carries collective information from all three products, not just MPI-BGC. We therefore acknowledge that some information from the FLUXNET tower data enters our training process through MPI-BGC. However, this self-correlation mainly affects the comparison of FLUXNET-MTE with the tower estimates, not the WECANN estimate. Indeed, it has been shown (see Jimenez et al. 2009, for instance) that the spatial and temporal correlations of a global artificial neural network are due not to the initial training dataset but to the remote sensing observations used as input.
Moreover, given that FLUXNET-MTE uses flux tower estimates for the retrieval, we would have expected this product to perform best when compared to local eddy covariance data. In fact, we show the opposite: WECANN, informed by direct remote sensing observations, typically outperforms FLUXNET-MTE, especially in terms of the seasonal cycle, further emphasizing the information content provided by the remote sensing data.
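For readers unfamiliar with the weighting step in this response, the classical triple collocation estimator can be sketched as follows. This is a minimal, illustrative version with synthetic data, not the actual WECANN implementation; the function name, noise levels, and sample size are ours.

```python
import numpy as np

def triple_collocation_weights(x, y, z):
    """Estimate the error variance of three collocated products with
    mutually independent errors (classical triple collocation), then
    derive merging weights inversely proportional to error variance."""
    C = np.cov(np.vstack([x, y, z]))  # 3x3 sample covariance matrix
    # Error variance of each product: total variance minus signal variance
    ex = C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2]
    ey = C[1, 1] - C[0, 1] * C[1, 2] / C[0, 2]
    ez = C[2, 2] - C[0, 2] * C[1, 2] / C[0, 1]
    err = np.maximum(np.array([ex, ey, ez]), 1e-12)  # guard against negatives
    return (1.0 / err) / np.sum(1.0 / err)           # inverse-variance weights

# Synthetic demo: one common signal, independent noise of differing magnitude
rng = np.random.default_rng(0)
truth = rng.standard_normal(5000)
x = truth + 0.2 * rng.standard_normal(5000)   # most accurate product
y = truth + 0.5 * rng.standard_normal(5000)
z = truth + 1.0 * rng.standard_normal(5000)   # least accurate product
w = triple_collocation_weights(x, y, z)
merged = w[0] * x + w[1] * y + w[2] * z       # weighted target for this "pixel"
```

In this sketch the most accurate product receives the largest weight, which is the behavior the a priori weighting in the response relies on.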

Finally, conducting a water balance analysis would be informative, but it has its own challenges: closing the budget requires multiple sources of information, each with its own uncertainties. We believe, however, that this is beyond the scope of the current study, which is focused solely on developing the retrieval algorithm; the other referee also commented on the length of the paper and asked us to reduce it.
Fourth, the spatial domain should be clearly defined. The authors said it is a global product, but it does not include Antarctica and Greenland. Given the coarse resolution (100 km), most islands are likely uncovered, yet the global map (Fig 2) shows fluxes on some islands. How did this happen? Also, how is the water fraction treated within each 1-degree pixel?

Re: Thank you for the comments. We have now revised the description in the introduction section to clearly note what the coverage of the new product is.

Re: The referee's point is an important one. The SIF relationship with GPP will likely change in C4 plants. However, we explicitly did not want to impose a C4/C3 (or even CAM) delimitation in the artificial neural network, as it would be highly dependent on the quality of the classification map used and might be time varying. Given that we do not have a partitioning of transpiration within total ET, it is hard to say whether the water use efficiency is indeed low or whether rain re-evaporation and soil evaporation are the main processes explaining the difference. We have nonetheless added a comment in the text emphasizing the referee's point(s).

Specific comments:
P6: Why were only 21 FLUXNET sites used? Data from more than 150 sites are open to the public.

Re: We selected these 21 sites to represent a range of climatic conditions along a geographical gradient for validation of our retrieval. Presenting evaluation metrics and time series for 150 sites would lengthen the manuscript and make it hard to read. However, in the revised manuscript we will present summary statistics from the comparison of WECANN retrievals against a much larger number of tower sites from the FLUXNET 2015 dataset in the Appendix.

P7 L9: Please define "multiple datasets." Is this training dataset?
Re: This refers to the three products that we use (together with error weights from Triple Collocation) to define a target dataset for training.We revised the text in the new version of the manuscript to clarify this.

P7 L12: What is "this" in "this prior distribution"?
Re: It refers to the pseudo Bayesian training mentioned in the lines before.We revised the text in the new version and clarified the point.
Re: Yes, this is the same. We made changes to sections 3 and 4 of the manuscript in the new version to clarify all these terminologies.
P8 L22: Add another unit for GPP in PgC yr-1, which can easily be compared with other studies. Same for LE (km3).
Re: Thanks for noting this. We included the new units along with the previous ones in the new version of the manuscript.

Re: We would like to emphasize that any new retrieval algorithm development requires some validation against ground truth observations; in fact, other reviewers wanted to see such a comparison. While there are caveats in validating against point-based tower data, these are the only ground-based observations available for such a validation. Moreover, in the comparison against tower data, many large-scale features such as the seasonal cycle are comparable to pixel-based retrievals. This is also the case for interannual variability, as discussed in detail in section 4.4 of the original manuscript. For instance, phenology has a strong impact on the seasonal cycle of the fluxes and is clearly highlighted here when comparing the different products to flux tower estimates.
In the revised manuscript, we highlighted this limitation in section 4.4, while noting that comparison against ground-based tower observations is common practice and is what the community indeed looks for when a new retrieval algorithm is developed. We believe that focusing on specific drought or flood events would lack the generality of the all-years/all-months comparison provided here.

Fifth, I recommend showing global uncertainty maps for GPP, LE, and H. I think one of the strengths of WECANN is its ability to quantify uncertainty. Show the uncertainty map and discuss where and why uncertainties are high. Also quantify uncertainties in global values (e.g. XXX PgC yr-1 +- Y PgC yr-1).

Re: In the revised manuscript we now include uncertainty estimates based on errors in the input data propagated through the network. We report a global average value, as the error is spatially and temporally variable.

Sixth, test the global maps more carefully. When I look at Fig 2, I find higher ET in mid-to-southeastern South America (e.g. the cerrado) than in other global ET products. Also, ET in this region is very high relative to your GPP map, so water use efficiency would be very low there, which is unlikely. See global distribution maps of C4 vegetation: a higher proportion of C4 plants in this area would likely lead to higher water use efficiency. It is notable that your ANN did not consider C4 information.
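The input-error propagation mentioned in the response to the Fifth comment can be sketched as a Monte Carlo ensemble. This is an illustrative version only: the stand-in model, error magnitudes, and ensemble size are placeholders, not the actual WECANN network or its input-error specification.

```python
import numpy as np

def propagate_input_errors(model, x, x_err, n_ensemble=500, seed=0):
    """Monte Carlo propagation of input uncertainty: perturb each input
    with zero-mean Gaussian noise of the stated standard deviation, run
    the (trained) model on every perturbed copy, and report the ensemble
    mean and standard deviation of the output."""
    rng = np.random.default_rng(seed)
    perturbed = x + x_err * rng.standard_normal((n_ensemble, x.size))
    outputs = np.array([model(p) for p in perturbed])
    return outputs.mean(axis=0), outputs.std(axis=0)

# Stand-in for a trained network: any vector -> scalar mapping works here
model = lambda v: np.tanh(v).sum()
mean, std = propagate_input_errors(model, np.zeros(3),
                                   np.array([0.1, 0.1, 0.1]))
```

The ensemble standard deviation plays the role of the per-pixel uncertainty estimate; averaging it over all pixels and months yields a single global value of the kind reported in the response.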

P6 L23-24:
The authors explained that the target data are used for training, validation, and testing. I am confused by the terminology of validation and testing: how do they differ? Also, in L36, "after training, ... was evaluated" — does "evaluation" here indicate validation or testing? I recommend clearly defining each term and using them consistently across the whole manuscript.

Re: We apologize for the confusion. The training, validation, and testing proportions relate to the training phase of the retrieval. The backpropagation algorithm uses one portion of the training data for training proper (estimating the weights of each neuron) and the other portions for validation and testing, which check the convergence of the training step. After training is done, we use a subset of the data that was not used in the training process for evaluation. We revised the text in the new version of the manuscript to clarify these terminologies.
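The three-way split of the training data described in this response can be sketched as follows. This is an illustrative version with made-up fractions, not the exact proportions used in the manuscript.

```python
import numpy as np

def split_training_data(n_samples, f_train=0.7, f_val=0.15, seed=0):
    """Randomly partition sample indices into the three subsets used
    during the training phase: 'train' fits the neuron weights, 'val'
    monitors convergence (e.g. for early stopping), and 'test' gives an
    unbiased internal check of the converged network. A separate,
    fully held-out subset would then be used for final evaluation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle sample indices
    n_tr = int(f_train * n_samples)
    n_va = int(f_val * n_samples)
    return {
        "train": idx[:n_tr],                  # weight estimation
        "val": idx[n_tr:n_tr + n_va],         # convergence monitoring
        "test": idx[n_tr + n_va:],            # internal post-training check
    }

parts = split_training_data(1000)
```

The key distinction the response draws is that all three of these subsets belong to the training phase, whereas "evaluation" in the manuscript refers to data excluded from this split altogether.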

P9 L29: I was surprised to see the reduction of GPP in the Saharan Desert after removing SIF. How should this be interpreted, given that we know GPP must be zero there? Also, excluding SIF produced mixed tendencies in LE in this region. As we are confident LE and GPP are close to nil in this area, it would be interesting to test the impacts of including/excluding SIF on LE and GPP here.
Re: This observation is true, and is caused by noise. As noted correctly by the referee, P12-