- 1BioSense, Novi Sad, Serbia
- 2Skoltech, Moscow, Russia
- 3WPI, Moscow, Russia
The last decade has been crucial for the development of artificial intelligence across all spheres of human life. Interest in the field is driven by two factors: growing computational capacity and the availability of datasets that are large in both quantity and quality. This study applies an LSTM neural network to modeling and forecasting daily discharge time series for rivers of the East European Plain with predominantly snowmelt or mixed river nourishment. A unique dataset, CAMELS_ru, was created; it includes both dynamic and static characteristics for 75 rivers. Reanalysis data, geospatial grids, and time series of meteorological and hydrological characteristics covering the 70 years from 1950 to 2019 were collected and processed. As part of input data preparation, information on soil parameters, forest cover, geological structure, and averaged climatic and hydrological parameters was obtained. A model implementation for the selected rivers was built on an existing LSTM architecture. The dataset was partitioned as follows: 60% for training, 10% for validation, and 30% for testing.
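The 60/10/30 partition can be illustrated with a minimal sketch. The abstract does not state how the split was made, so the chronological ordering used here (earliest data for training, latest for testing) is an assumption, and `chronological_split` is a hypothetical helper name, not part of the authors' code.

```python
import numpy as np

def chronological_split(n_samples, train_frac=0.6, val_frac=0.1):
    """Return (train, val, test) slices over a time-ordered series.

    Assumption: the split is chronological; the source only gives the ratios.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    train_end = int(n_samples * train_frac)
    val_end = int(n_samples * (train_frac + val_frac))
    return (slice(0, train_end),
            slice(train_end, val_end),
            slice(val_end, n_samples))

# Daily time axis for the study period, 1950-01-01 through 2019-12-31.
dates = np.arange("1950-01-01", "2020-01-01", dtype="datetime64[D]")
train, val, test = chronological_split(len(dates))

for name, part in (("train", train), ("val", val), ("test", test)):
    chunk = dates[part]
    print(name, chunk[0], chunk[-1], len(chunk))
```

Keeping the test period at the end of the record means the model is evaluated on years it has never seen, which is the usual motivation for splitting a time series this way.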
The Nash-Sutcliffe coefficient is close to 0.9 in most cases, which indicates that the model has sufficient predictive ability. The model captures the main patterns and trends in the existing data well, and the low ratio of RMSE to the standard deviation (STD) confirms that it predicts the time series with high accuracy. However, forecasting extreme events that lie beyond the range of the existing time-series data remains a “mission impossible” because of the very concept of data-driven (DD) models.
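For reference, the two scores mentioned above can be computed as in the following minimal sketch. The abstract does not give formulas, so the standard definitions are assumed: the Nash-Sutcliffe coefficient as one minus the ratio of squared model error to the variance of the observations about their mean, and the RMSE to STD ratio (often abbreviated RSR) as RMSE divided by the standard deviation of the observations. The function names and the sample values are illustrative only, not data from the study.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe coefficient: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse_std_ratio(obs, sim):
    """RMSE divided by the standard deviation of the observations (lower is better)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return rmse / obs.std()  # population STD, same 1/n normalisation as the RMSE

# Made-up daily discharges in m^3/s, for illustration only.
observed = [10.0, 12.0, 30.0, 55.0, 40.0, 22.0, 15.0]
simulated = [11.0, 13.0, 27.0, 50.0, 42.0, 24.0, 14.0]

print("NSE:", nash_sutcliffe(observed, simulated))
print("RSR:", rmse_std_ratio(observed, simulated))
```

With these definitions the two scores are linked by NSE = 1 - RSR², so a Nash-Sutcliffe coefficient near 0.9 corresponds to an RMSE to STD ratio of roughly 0.3.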
An additional modeling experiment on a reduced sample of 5 years confirmed the hypothesis that the high model performance is a consequence of the long data series and not an error. In addition, we assume that, when neural networks are used, there is no need to limit the time series to the last 30 years on account of climate variability. A neural network can account for climate dynamics when building its internal relationships, which supports good reproduction of the data and high-quality modeling. The current version of the model is a promising start for the development of data-driven models on a major regional scale.
How to cite: Kireeva, M., Gorbarenko, A., and Moreydo, V.: Harnessing data-driven insights: advanced modeling of discharge time-series for the East European plain, application and potential, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13977, https://doi.org/10.5194/egusphere-egu25-13977, 2025.