From rainfall datasets to flood prediction: evaluating the impact of precipitation data source on catastrophic risk assessment by machine learning in France

Franck Baton; Mulah Moriah

doi:https://doi.org/10.5194/egusphere-egu26-19601

Session ITS4.36/NH13.11

EGU26-19601, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Franck Baton¹ and Mulah Moriah²

¹R&D Addactis, climate risk, France (baton.franck@gmail.com)
²R&D Addactis, climate risk, EURIA-LMBA (Laboratoire de Mathématiques de Bretagne Atlantique – LMBA) (mulah.moriah@etudiant.univ-brest.fr)

Precipitation is the primary driver of flood risk in France, with both cumulative totals and extreme intensity governing runoff and overflow events. Given the variety of available precipitation products, the choice of data source represents a critical methodological challenge for assessing flood risk. This study evaluates the reliability and predictive sensitivity of several daily precipitation datasets over French territory, including the new SIM2 chain, Météo-France station observations, ECMWF reanalyses (ERA5-Land and ERA-OBS), and regional reanalyses (CERRA and CERRA-Land).

We first perform an in-depth statistical intercomparison for the 1991–2020 period, using the Météo-France station network and ERA-OBS as references. Beyond standard performance metrics (Kling-Gupta Efficiency, RMSE), we place particular emphasis on extreme events, using indices such as the Critical Success Index (CSI). Our results identify SIM2 as the most robust overall performer, while ERA-OBS shows high consistency in representing intense rainfall episodes.
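
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how RMSE, the Kling-Gupta Efficiency and the CSI could be computed for one daily series against a reference. It is not the authors' code: the abstract does not state which KGE variant or which exceedance threshold was used, so the original three-component KGE form and the `threshold` argument are assumptions made for illustration.

```python
import numpy as np

def rmse(sim, obs):
    """Root-mean-square error between a product and the reference (mm/day)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def kge(sim, obs):
    """Kling-Gupta Efficiency, assuming the original three-component form:
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with r the linear correlation, alpha the ratio of standard deviations
    and beta the ratio of means (product over reference)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def csi(sim, obs, threshold):
    """Critical Success Index for exceedances of `threshold` (assumed, in mm/day):
    hits / (hits + misses + false alarms). Returns NaN if no event occurs."""
    sim_event = np.asarray(sim, dtype=float) >= threshold
    obs_event = np.asarray(obs, dtype=float) >= threshold
    hits = np.sum(sim_event & obs_event)
    misses = np.sum(~sim_event & obs_event)
    false_alarms = np.sum(sim_event & ~obs_event)
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom > 0 else float("nan")
```

Because the CSI ignores correct non-events, it is not inflated by the many dry or low-rainfall days in a daily series, which is why it suits an evaluation focused on extremes.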

Building on this comparison, we assess the operational impact of these data sources through a flood modelling application. Using municipal 'natural disaster' decrees (CatNat) available since 1989, an automatic and fully standardised procedure for variable construction, selection, and modelling is implemented, in which only the precipitation data source varies. We test several machine learning methods (Random Forest, XGBoost, etc.) and design variables in multiple formats. This cross-sectional approach reveals how specific biases in meteorological products propagate into flood occurrence predictions. Our findings reinforce the importance of dataset selection in hydrometeorological studies and provide a quantitative framework for evaluating the relevance of precipitation sources when assessing insurance-related flood risk in France.
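
The idea of a standardised procedure in which only the precipitation source varies can be illustrated with a small sketch of the variable-construction step. Everything here is hypothetical: the abstract does not list the variables actually built, so the accumulation windows, the 1 mm wet-day threshold, the function names and the two toy series are assumptions chosen to reflect the cumulative-total and intensity drivers mentioned earlier. The modelling step itself (for example Random Forest or XGBoost) is not shown.

```python
import numpy as np

def rolling_sum(x, window):
    """Trailing sum over `window` days; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if window <= x.size:
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[window - 1:] = csum[window:] - csum[:-window]
    return out

def build_features(daily_precip, windows=(1, 3, 7)):
    """Build the same candidate variables for any precipitation source.
    Windows and the 1 mm wet-day threshold are illustrative assumptions."""
    feats = {}
    for w in windows:
        # Maximum w-day accumulated total (mm) within the period
        feats[f"max_{w}d_total_mm"] = float(np.nanmax(rolling_sum(daily_precip, w)))
    feats["wet_days"] = int(np.sum(np.asarray(daily_precip, dtype=float) >= 1.0))
    return feats

# Hypothetical daily series (mm) for one municipality from two products
sources = {
    "source_A": [0, 5, 20, 10, 0, 0, 2],
    "source_B": [0, 4, 28, 6, 0, 1, 0],
}

# Identical procedure for every source: only the input data changes
features_by_source = {name: build_features(series) for name, series in sources.items()}
```

Keeping the feature builder fixed and swapping only the input series means any difference in the downstream predictions can be attributed to the precipitation product rather than to the modelling choices.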

How to cite: Baton, F. and Moriah, M.: From rainfall datasets to flood prediction: evaluating the impact of precipitation data source on catastrophic risk assessment by machine learning in France, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-19601, https://doi.org/10.5194/egusphere-egu26-19601, 2026.