- ¹ RWTH Aachen University, Institute of Hydraulic Engineering and Water Resources Management, Aachen, Germany (paul.reis@rwth-aachen.de)
- ² Karlsruhe Institute of Technology, Institute of Water and Environment, Chair of Hydrology, Karlsruhe, Germany
- ³ GFZ Helmholtz Centre for Geosciences, Section Hydrology, Potsdam, Germany
Deep learning, and Long Short-Term Memory (LSTM) networks in particular, has become popular for rainfall-runoff modelling in recent years. However, recent studies show that LSTM performance is constrained by a theoretical threshold, which limits the simulation of extreme discharge events. The internal structure of the LSTM is one contributing factor; another is the limited availability and diversity of hydro-meteorological training data for extremes, as major floods represent only a small fraction of the observed record.
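The abstract does not spell out how the LSTM structure produces such a threshold. One common explanation, assumed here for illustration, is that each hidden unit is computed as an output gate in (0, 1) times a tanh of the cell state, so it stays within [-1, 1]; a linear output layer on top therefore cannot exceed the sum of its absolute weights plus its bias. The minimal sketch below checks that bound numerically with made-up weights; `head_output_bound` and `linear_head` are hypothetical helper names, not part of any library.

```python
import math
import random


def head_output_bound(weights, bias):
    """Upper bound of a linear head when every hidden unit lies in [-1, 1]."""
    return sum(abs(w) for w in weights) + bias


def linear_head(hidden, weights, bias):
    """Linear output layer mapping a hidden state to one discharge value."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


random.seed(0)
# Assumption: arbitrary illustrative weights, not taken from a trained model.
weights = [random.uniform(-1.0, 1.0) for _ in range(8)]
bias = 0.5
bound = head_output_bound(weights, bias)

# Sample hidden states as gate * tanh(cell), with gate in [0, 1) and an
# unbounded cell state, and track the largest output the head produces.
max_seen = -math.inf
for _ in range(10_000):
    hidden = [random.random() * math.tanh(random.gauss(0.0, 5.0)) for _ in weights]
    max_seen = max(max_seen, linear_head(hidden, weights, bias))

print(f"bound = {bound:.3f}, largest sampled output = {max_seen:.3f}")
```

However large the cell states become, the sampled outputs stay at or below the bound. Under this assumption, the bound can only move if training changes the output-layer weights, which is one way that more extreme training targets might raise it.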
To mitigate the underrepresentation of extreme hydrological events in the training data, this study investigates the effectiveness of data augmentation for rainfall-runoff modelling with LSTMs. Pre-generated artificial meteorological time series from the non-stationary climate-informed weather generator (nsRWG) are used to increase the representation of extreme events in the training data. The study area covers North Rhine-Westphalia, Germany, and the generated data comprise 100 alternative precipitation and temperature scenarios spanning the past 70 years. Discharge for the catchments is simulated with an HBV model driven by the nsRWG outputs. By combining observed time series from the CAMELS-DE dataset with artificial samples, the training set is enriched with additional extreme events, including samples that are more extreme in magnitude than those in the observed data. This augmented dataset is used to assess whether model performance in predicting extreme events can be improved. We aim to (1) assess whether data augmentation can shift the theoretical threshold of the LSTM, (2) quantify this threshold, (3) optimize the integration of the weather generator data during training, and (4) evaluate overall predictive performance and, in particular, whether the prediction of extreme floods improves with the augmented training data.
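The combination of observed and artificial samples described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_augmented_set`, the `synthetic_fraction` parameter, and the toy values are all hypothetical, and each "sample" stands in for a full training sequence.

```python
import random


def build_augmented_set(observed, synthetic, synthetic_fraction, seed=0):
    """Mix observed and synthetic samples into one shuffled training list.

    synthetic_fraction is the target share of synthetic samples in the
    combined set (hypothetical tuning parameter, 0 <= value < 1).
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of synthetic samples needed to reach the target share,
    # capped by how many synthetic samples are available.
    n_syn = round(len(observed) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_syn = min(n_syn, len(synthetic))
    chosen = rng.sample(synthetic, n_syn)
    combined = [(s, "observed") for s in observed]
    combined += [(s, "synthetic") for s in chosen]
    rng.shuffle(combined)
    return combined


# Toy stand-ins: made-up peak values, with synthetic peaks exceeding observed ones.
observed = [float(i) for i in range(90)]
synthetic = [100.0 + i for i in range(500)]

train = build_augmented_set(observed, synthetic, synthetic_fraction=0.25)
n_synthetic = sum(1 for _, source in train if source == "synthetic")
print(len(train), n_synthetic)
```

Keeping the source label on each sample makes it possible to evaluate on observed data only, and the mixing fraction is one candidate knob for aim (3), assuming the integration is tuned at the sample level.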
How to cite: Reis, P., Dolich, A., Dasgupta, A., Hassenjürgen, P., Vorogushyn, S., Nguyen, V. D., and Loritz, R.: Integration of Generated Weather Data into LSTM Training to Improve the Simulation of Extreme Flood Events, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12499, https://doi.org/10.5194/egusphere-egu26-12499, 2026.