EGU24-12293, updated on 09 Mar 2024
https://doi.org/10.5194/egusphere-egu24-12293
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Towards improved spatio-temporal selection of training data for LSTM-based flow forecasting models in Canadian basins

Everett Snieder and Usman Khan
Everett Snieder and Usman Khan
  • York University, Civil Engineering, Toronto, Canada (esnieder@yorku.ca)

Machine learning has extensively been applied to for flow forecasting in gauged basins. Increasingly, models generating forecasts in some basin(s) of interest are trained using data from beyond the study region. With increasingly large hydrological datasets, a new challenge emerges: given some region of interest, how do you select which basins to include among the training dataset?

There is currently little guidance on selecting data from outside the basin(s) under study. An intuitive approach might be to select data from neighbouring basins, or basins with similar hydrological characteristics. However, a growing body of research suggests that including hydrologically dissimilar basins can in fact produce greater improvements to model generalisation. In this study, we use clustering as a simple yet effective method for identifying temporal and spatial hydrological diversity within a large hydrological dataset. The clustering results are used to generate information-rich subsets of data, that are used for model training. We compare the effects that basin subsets, that represent various hydrological characteristics, have on model generalisation.
Our study shows that data within individual basins, and between hydrologically similar basins, contain high degrees of redundancy. In such cases, training data can be heavily undersampled with no adverse effects – or even moderate improvements to model performance. We also show that spatial hydrological diversity can hugely benefit model training, providing improved generalisation and a regularisation effect.

How to cite: Snieder, E. and Khan, U.: Towards improved spatio-temporal selection of training data for LSTM-based flow forecasting models in Canadian basins, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12293, https://doi.org/10.5194/egusphere-egu24-12293, 2024.