EGU22-5580
https://doi.org/10.5194/egusphere-egu22-5580
EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Assessing the generalization power of three machine learning models and three evapotranspiration formulas using 143 FLUXNET towers data

Alireza Amani1, Marie-Amélie Boucher1, Alexandre R. Cabral1, and Daniel F. Nadeau2
  • 1Université de Sherbrooke, Civil and Building Engineering department, Sherbrooke, Canada
  • 2Université Laval, Department of Civil and Water Engineering, Quebec, Canada

Direct measurement of evapotranspiration (ET) is costly and difficult to implement on a large scale, so reliable approaches to estimate it are needed. Among such approaches, machine learning models (MLMs) are easily applicable and computationally inexpensive, especially for broad-scale analyses. In this study, three types of MLMs, namely Random Forest, Light Gradient Boosting Machine and Neural Networks, are assessed for their estimation accuracy at unseen locations (i.e. their generalization power). ET estimates from these MLMs are compared against direct observations from 143 eddy-covariance flux towers spanning a broad range of climate and vegetation types. We initially hypothesized that the MLMs, provided that they are trained using data from a wide variety of climate and vegetation types, are able to accurately estimate ET at unseen locations (the default experiment). The MLMs are benchmarked against the Penman, Priestley-Taylor, and Oudin ET formulas. The results show that the MLMs perform satisfactorily at the majority of the test locations, though not at all of them, with a normalized mean absolute error (NMAE) that is on average 15% lower than that of the Priestley-Taylor formula. We also compared the performance of the MLMs trained and tested using different data splitting strategies. When training and testing data are not spatially separated, the Random Forest model has a 7% lower NMAE than when the spatial separation is applied (the default experiment). This suggests that the MLMs are prone to overfitting to site-specific patterns that might not be relevant for other locations. In conclusion, the results of this large-scale study point toward the reliability of the MLMs as far as their generalization power is concerned. At the same time, they show that different data splitting strategies can lead to significantly different results.
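
The difference between the two data splitting strategies can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 20% test fraction, the toy site identifiers, and the choice to normalize the mean absolute error by the mean observed value are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
from statistics import mean


def nmae(observed, estimated):
    """Mean absolute error divided by the mean observed value.

    The normalization by the observed mean is an assumption; the abstract
    does not state how NMAE is normalized.
    """
    mae = mean(abs(o - e) for o, e in zip(observed, estimated))
    return mae / mean(observed)


def split_by_site(site_ids, test_fraction=0.2, seed=0):
    """Spatially separated split: all records of a site fall on one side only."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    n_test = max(1, round(test_fraction * len(sites)))
    held_out = set(sites[:n_test])
    train_idx = [i for i, s in enumerate(site_ids) if s not in held_out]
    test_idx = [i for i, s in enumerate(site_ids) if s in held_out]
    return train_idx, test_idx


def split_randomly(n_records, test_fraction=0.2, seed=0):
    """Non-spatial split: records are shuffled, so a site can appear on both sides."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * n_records))
    return idx[n_test:], idx[:n_test]


# Hypothetical toy data: 5 sites with 4 records each.
site_ids = [f"site_{k}" for k in range(5) for _ in range(4)]

train_idx, test_idx = split_by_site(site_ids)
train_sites = {site_ids[i] for i in train_idx}
test_sites = {site_ids[i] for i in test_idx}

# With the spatial split, no test site was seen during training.
print(train_sites.isdisjoint(test_sites))
```

With `split_by_site`, the error measured on the test records reflects performance at locations the model has never seen. With `split_randomly`, records from the same tower can land in both sets, which is the situation in which the abstract reports a lower NMAE for the Random Forest model.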

How to cite: Amani, A., Boucher, M.-A., Cabral, A. R., and Nadeau, D. F.: Assessing the generalization power of three machine learning models and three evapotranspiration formulas using 143 FLUXNET towers data, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5580, https://doi.org/10.5194/egusphere-egu22-5580, 2022.
