- 1 Euro-Mediterranean Center on Climate Change, Venice, Italy (majid.niazkar@cmcc.it)
- 2 Ca’ Foscari University of Venice, Venice, Italy
- 3 Centre of Natural Hazards and Disaster Science, Uppsala University, Uppsala, Sweden
- 4 Department of Earth Sciences, Uppsala University, Uppsala, Sweden
Accurate streamflow prediction is crucial for water resources management, particularly in regions facing challenges such as water scarcity and hydrological unpredictability. Physically based hydrological models have long been used for rainfall-runoff simulation by solving the equations that govern hydrological processes in a watershed. In addition, Machine Learning (ML) models have emerged as versatile data-driven approaches capable of capturing intricate patterns in hydroclimatic variables, which can be used for streamflow prediction.
The aim of this study is to compare the performance of two distinct approaches: (i) the process-based, semi-distributed Hydrological Predictions for the Environment (HYPE) model and (ii) Extreme Gradient Boosting (XGBoost), a tree-based ML algorithm. The case study is the upper Reno River Basin in northern Italy. Precipitation across the basin varies considerably due to orographic influences, and this spatial variability drives diverse seasonal and regional streamflow patterns. Both models were driven by 5-km gridded meteorological data for 2001 to 2022 from the ERG5 dataset, which was developed by ARPAe-SIMC for the Emilia-Romagna region of Italy, and streamflow was the output variable. For the sake of comparison, both models were calibrated using the same time series, with 75% of the data used for calibration/training and 25% for testing.
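The 75%/25% partition can be illustrated with a minimal sketch. It assumes the split is chronological (earliest 75% of time steps for calibration/training, the remainder for testing), which the abstract does not state explicitly; the function name `chronological_split` and the example array are hypothetical.

```python
import numpy as np


def chronological_split(n_samples, train_fraction=0.75):
    """Return index slices for a chronological train/test split.

    Assumption: samples are ordered in time and the first block is
    used for calibration/training, the last block for testing.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    n_train = int(round(n_samples * train_fraction))
    return slice(0, n_train), slice(n_train, n_samples)


# Hypothetical series of 100 time steps standing in for the real record.
series = np.arange(100)
train_idx, test_idx = chronological_split(series.size)
train, test = series[train_idx], series[test_idx]
```

Applying the same slices to the forcing data and to the observed streamflow keeps the two models on identical calibration and test periods, which is what makes the comparison fair.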
Both models simulated river discharge well in terms of the Kling-Gupta Efficiency (KGE). In the training phase, XGBoost achieved a slightly higher KGE (0.86) than HYPE (0.82). In the test period, the KGE values were comparable at around 0.8, with HYPE slightly outperforming XGBoost (0.82 vs. 0.78). The flow-duration curves revealed that both models performed well in estimating peak discharges (below 30% occurrence). For drier conditions, however, HYPE showed better agreement with the observed data, while XGBoost tended to overestimate discharge.
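For readers unfamiliar with the two diagnostics, the sketch below shows one way to compute them. It assumes the original 2009 formulation of KGE (correlation, variability ratio and bias ratio), since the abstract does not say which variant was used, and it assumes a Weibull plotting position for the flow-duration curve. The function names are illustrative and are not taken from the study.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta Efficiency, assuming the 2009 formulation:
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]    # linear correlation
    alpha = sim.std() / obs.std()      # variability ratio
    beta = sim.mean() / obs.mean()     # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def flow_duration_curve(q):
    """Return (exceedance probability in %, discharge sorted high to low).

    Assumption: Weibull plotting position, rank / (n + 1).
    """
    q_sorted = np.sort(np.asarray(q, dtype=float))[::-1]
    n = q_sorted.size
    exceedance = 100.0 * np.arange(1, n + 1) / (n + 1)
    return exceedance, q_sorted
```

A perfect simulation gives KGE = 1. Plotting the curves of the observed and simulated series on the same axes shows where along the flow regime a model departs from the observations, for example in the low-flow tail.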
The results indicate that the traditional hydrological model (HYPE) performed slightly better than XGBoost in the test period for streamflow estimation in the region under investigation. The performance of XGBoost might improve if seasonality were taken into account, which can be explored in future work. Based on the comparative analysis, ML techniques can provide a suitable alternative in cases where little is known about the region’s hydrological characteristics, since they leverage data patterns without requiring detailed process knowledge. Nonetheless, the application of ML requires caution, as its black-box nature may obscure the underlying physical and hydrological processes, potentially leading to misinterpretation of results. Finally, this comparison offers guidance for researchers and practitioners in selecting appropriate tools for streamflow prediction tasks.
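The abstract does not specify how seasonality would be introduced into XGBoost. One common option, shown here purely as an assumption, is to encode the day of year as a sine/cosine pair and append it to the meteorological inputs, so that the end of December and the start of January are close in feature space. The function name `seasonal_features` is hypothetical.

```python
import numpy as np


def seasonal_features(day_of_year, period=365.25):
    """Cyclical encoding of the day of year as (sin, cos) columns.

    Assumption: an annual cycle of `period` days; this is one possible
    way to expose seasonality to a tree-based model, not the authors'.
    """
    doy = np.asarray(day_of_year, dtype=float)
    angle = 2.0 * np.pi * doy / period
    return np.column_stack([np.sin(angle), np.cos(angle)])


# Hypothetical usage: three dates expressed as day of year.
features = seasonal_features([1, 182, 365])
```

The resulting two columns could be stacked next to the gridded precipitation and temperature predictors before training.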
Acknowledgements: This research work was carried out as part of the TRANSCEND project with funding received from the European Union Horizon Europe Research and Innovation Programme under Grant Agreement No. 10108411.
How to cite: Niazkar, M., Cenobio-Cruz, O., Mozzi, G., Di Baldassarre, G., and Pal, J.: Hydrological Modelling vs. Machine Learning for Water Availability: Case Study from the Reno Basin (Italy), EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17296, https://doi.org/10.5194/egusphere-egu25-17296, 2025.