Large-scale comparison of machine and statistical learning algorithms for blending gridded satellite and earth-observed precipitation data

Georgia Papacharalampous; Hristos Tyralis; Anastasios Doulamis; Nikolaos Doulamis

doi:https://doi.org/10.5194/egusphere-egu23-3296

Session HS3.1

EGU23-3296

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Georgia Papacharalampous¹, Hristos Tyralis², Anastasios Doulamis³, and Nikolaos Doulamis⁴

¹National Technical University of Athens, School of Rural, Surveying and Geoinformatics Engineering, Athens, Greece (papacharalampous.georgia@gmail.com)
²National Technical University of Athens, School of Rural, Surveying and Geoinformatics Engineering, Athens, Greece (montchrister@gmail.com)
³National Technical University of Athens, School of Rural, Surveying and Geoinformatics Engineering, Athens, Greece (adoulam@cs.ntua.gr)
⁴National Technical University of Athens, School of Rural, Surveying and Geoinformatics Engineering, Athens, Greece (ndoulam@cs.ntua.gr)

An established way to improve the accuracy of gridded satellite precipitation products is to “correct” them using ground-based precipitation measurements together with machine and statistical learning algorithms. Such corrections are made in regression settings, where the ground-based measurements are the dependent variable and the satellite data are the predictor variables. The literature regularly compares machine and statistical learning algorithms to identify which of them yield the most useful corrected precipitation datasets. Nonetheless, most of these comparisons consider only a small number of algorithms and examine small geographical regions and limited time periods. Their results therefore tend to be of local importance and do not offer more general guidance. To provide generalizable results, we compared eight state-of-the-art machine and statistical learning algorithms in correcting satellite precipitation data for the entire contiguous United States over a 15-year period. We used monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) gridded dataset and the Global Historical Climatology Network monthly database, version 2 (GHCNm). Our results suggest that extreme gradient boosting (XGBoost) and random forests are more accurate than the remaining algorithms, which can be ordered from best to worst as follows: Bayesian regularized feed-forward neural networks, multivariate adaptive polynomial splines (poly-MARS), gradient boosting machines (gbm), multivariate adaptive regression splines (MARS), feed-forward neural networks, and linear regression.
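The regression setting described above can be illustrated with a minimal Python sketch. It is not the study's code or data: the values are synthetic, the bias model for the satellite estimates is invented for illustration, and only a single satellite predictor and linear regression (the simplest of the compared algorithms) are used. All function and variable names are hypothetical. The sketch fits the correction on one part of the data and compares the error of the raw and corrected satellite values on the held-out part.

```python
import math
import random


def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic monthly totals in mm (assumption: made-up numbers, not PERSIANN/GHCNm).
rng = random.Random(0)
gauge = [rng.uniform(0.0, 200.0) for _ in range(400)]  # ground-based measurements
# Assumed satellite estimate: biased, noisy, and never negative.
satellite = [max(0.0, 0.6 * g + 15.0 + rng.gauss(0.0, 10.0)) for g in gauge]

# Hold out the last 100 samples to evaluate the correction.
train, test = slice(0, 300), slice(300, 400)

# Dependent variable: gauge measurements; predictor: satellite estimates.
intercept, slope = fit_simple_linear_regression(satellite[train], gauge[train])
corrected = [intercept + slope * s for s in satellite[test]]

rmse_raw = rmse(satellite[test], gauge[test])
rmse_corrected = rmse(corrected, gauge[test])
print(f"RMSE of raw satellite values:       {rmse_raw:.2f} mm")
print(f"RMSE of corrected satellite values: {rmse_corrected:.2f} mm")
```

A comparison of several algorithms would follow the same pattern: each learner is fitted on the same training data and ranked by its error on the same held-out data.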

How to cite: Papacharalampous, G., Tyralis, H., Doulamis, A., and Doulamis, N.: Large-scale comparison of machine and statistical learning algorithms for blending gridded satellite and earth-observed precipitation data, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-3296, https://doi.org/10.5194/egusphere-egu23-3296, 2023.
