- ¹Dept. of Earth and Environmental Sciences, University of Milano-Bicocca, Milan, Italy
- ²Dept. of Physics, University of Milano-Bicocca, Milan, Italy
Accurate precipitation data at high spatiotemporal resolution are crucial for hydrological forecasting, meteorological analysis, and climate studies. This is especially true in mountainous areas, where rain gauges are sparse and traditional climate models struggle to predict precipitation accurately, partly because of their low spatial resolution. High-elevation areas are particularly relevant because they act as reservoirs of water resources and are characterized by elevation-dependent climate change signals (Pepin et al., 2022). By leveraging the good performance of the satellite-based IMERG (Integrated Multi-satellitE Retrievals for GPM) rainfall product and the realism of the ERA5 atmospheric reanalysis, we aim to produce a multi-decadal daily rainfall product at the IMERG spatial resolution (roughly 8 km) over the Greater Alpine Region (GAR). To achieve this, we employ machine learning techniques designed to capture the complex, non-linear relationships inherent in atmospheric processes.
Twenty years of IMERG data (2001 to 2020) are used to train and test several machine learning algorithms that estimate daily precipitation maps from a set of ERA5 atmospheric fields, including mid-tropospheric temperature and winds; vertically integrated ice, liquid water, and water vapour contents; total precipitation; and other relevant variables. In addition to these atmospheric fields, a high-resolution elevation dataset (ETOPO) is used to represent the intricate terrain of the Alps. The Recursive Feature Elimination (RFE) technique is employed to select the key input variables, which identifies the most effective predictors and improves the understanding of the influence of physical atmospheric variables, and of their inter-relationships, in mountainous regions. ERA5 total precipitation, vertically integrated ice content, and vertically integrated water vapour content appear to be the three most relevant input fields for an optimal estimate of IMERG precipitation. Among the algorithms tested (XGBoost, Random Forest, Convolutional Neural Networks, Deep Neural Networks), XGBoost (XGB) is found to be the most reliable and computationally efficient.
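As an illustration of how Recursive Feature Elimination works in general, the following minimal sketch repeatedly fits a model, drops the predictor with the smallest importance, and refits until the requested number of predictors remains. It is not the pipeline used in this study: a plain least-squares fit stands in for the tree-based and neural models, the data are synthetic, and the feature names (`tp`, `ice`, `vapour`, `t500`, `u500`) and the helper functions `fit_linear` and `rfe` are hypothetical.

```python
import random


def fit_linear(X, y):
    """Least-squares coefficients (no intercept) via Gauss-Jordan on the normal equations."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def rfe(X, y, names, n_keep):
    """Refit, drop the feature with the smallest |coefficient|, repeat until n_keep remain."""
    active = list(range(len(names)))
    dropped = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        coef = fit_linear(X_active, y)
        weakest = min(range(len(active)), key=lambda k: abs(coef[k]))
        dropped.append(names[active.pop(weakest)])
    return [names[j] for j in active], dropped


# Synthetic data (assumption): all features are drawn at unit scale, so
# coefficient magnitudes are comparable; only the first three drive the target.
rng = random.Random(0)
names = ["tp", "ice", "vapour", "t500", "u500"]
X = [[rng.gauss(0.0, 1.0) for _ in names] for _ in range(500)]
y = [2.0 * r[0] + 1.0 * r[1] + 0.5 * r[2] + rng.gauss(0.0, 0.1) for r in X]

kept, dropped = rfe(X, y, names, n_keep=3)
print("kept:", kept)
print("dropped:", dropped)
```

With a tree-based estimator, the coefficient magnitude would typically be replaced by the model's own feature-importance score; the elimination loop itself stays the same.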
The results show a spatiotemporal RMSE improvement of approximately 15 percent, with the RMSE decreasing from 5.18 mm/day (ERA5 against IMERG) to 4.37 mm/day (XGB against IMERG). On a seasonal basis, the RMSE is higher in summer and fall, when higher mean precipitation intensities are observed. Similarly, the RMSE closely follows the elevation dependence of mean precipitation. The XGB model is used to extend the IMERG dataset backward in time, so that precipitation biases and trends can be computed over a multi-decadal time range. These findings demonstrate the potential of machine learning to improve the accuracy of ERA5 rainfall data, which can be exploited to advance our understanding of the emerging elevation-dependent climate change signal.
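The arithmetic behind the quoted improvement can be checked with a short sketch. The functions `rmse` and `relative_improvement` are illustrative helpers, not part of the study's code, and the three-value sample is a toy example rather than real data; only the two RMSE values (5.18 and 4.37 mm/day) come from the text above.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equally long sequences (e.g. mm/day)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of the model with respect to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy values (assumption), only to show the call pattern.
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# RMSE values reported in the abstract: ERA5 vs IMERG and XGB vs IMERG.
gain = relative_improvement(5.18, 4.37)
print(f"{gain:.1%}")  # prints "15.6%"
```

The reduction of 0.81 mm/day relative to the 5.18 mm/day baseline gives about 15.6 percent, consistent with the approximate figure quoted above.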
How to cite: Goudarzi, I., Fazzini, D., Pasquero, C., Meroni, A. N., and Borgnino, M.: A machine learning-based backward extension of IMERG daily precipitation over the Greater Alpine Region, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-369, https://doi.org/10.5194/egusphere-egu25-369, 2025.