Streamflow Forecasting using Genetic Programming, LSTM, and Hybrid Meta-Learning GP-LSTM model in Monsoon-Dominated Basins

Digvijay Singh; Vinayakam Jothiprakash

doi:https://doi.org/10.5194/egusphere-egu26-16415

EGU26-16415, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Digvijay Singh¹ and Vinayakam Jothiprakash²

¹Indian Institute of Technology Bombay, Department of Civil Engineering, Mumbai, India (24m0576@iitb.ac.in)
²Indian Institute of Technology Bombay, Department of Civil Engineering, Mumbai, India (vprakash@iitb.ac.in)

Accurate streamflow prediction remains challenging in monsoon-dominated basins characterized by extreme flow variability. This study evaluated three machine learning approaches for daily streamflow forecasting using 39 years of data (1980-2018) from the Basantpur station in the Mahanadi basin, India, taken from the CAMELS-India dataset: (1) Optuna-optimized Genetic Programming (GP) for interpretable symbolic regression, (2) Optuna-optimized bidirectional LSTM networks, and (3) a novel GP-LSTM meta-learning framework that predicts optimal hyperparameters from statistical features of the time series.

The flow distribution is highly skewed (97.66% of values <5,000 m³/s), and the False Nearest Neighbor method identified an embedding dimension of six days. For regular flow conditions without extreme outliers, the optimized LSTM achieved higher skill scores (NSE = 0.92, KGE = 0.93, R² = 0.92) than GP (NSE = 0.86, KGE = 0.87, R² = 0.86). However, GP yielded a lower RMSE (197.68 vs. 210.46 m³/s) and produced interpretable mathematical expressions that revealed lag-dependent hydrological relationships.
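As a minimal sketch of the quantities named above, the following Python builds six-day lagged inputs and computes RMSE, NSE, and KGE. It is an illustration and not the authors' code: the function names are hypothetical, and KGE is assumed to follow the original formulation based on correlation, variability ratio, and bias ratio, since the abstract does not state which variant was used.

```python
import math


def lagged_samples(flow, dim=6):
    """Pair each day's flow with the `dim` preceding daily values."""
    inputs, targets = [], []
    for t in range(dim, len(flow)):
        inputs.append(list(flow[t - dim:t]))
        targets.append(flow[t])
    return inputs, targets


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    # Population standard deviation.
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def rmse(obs, sim):
    """Root mean square error, in the units of the flow series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed original form; undefined for constant series)."""
    mo, ms = _mean(obs), _mean(sim)
    so, ss = _std(obs), _std(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

With a daily flow list `flow`, `lagged_samples(flow)` returns the input windows and next-day targets that a forecasting model would be trained on, and the three score functions can then be applied to observed and simulated targets.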

The meta-learning framework showed the best results when tested on complete datasets, including those with extreme events. The GP-based meta-model extracts thirty-two statistical features covering central tendency, temporal autocorrelation, complexity measures, and spectral properties, and learns to predict the best LSTM configuration for different flow patterns. This flexible approach performed better on test data with outliers, showing improved predictions for rare but important flood events.
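To make the feature-extraction step concrete, the sketch below computes a handful of statistical descriptors of a flow series. It is illustrative only: the abstract does not list the thirty-two features, so the six shown here, the function names, and the 5,000 m³/s default threshold (borrowed from the skewness statistic quoted earlier) are assumptions.

```python
import statistics


def autocorr(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    den = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / den


def skewness(series):
    """Population skewness: third central moment over variance^1.5."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m3 = sum((x - mean) ** 3 for x in series) / n
    return m3 / m2 ** 1.5


def flow_features(series, threshold=5000.0):
    """Return a small, hypothetical feature vector describing a flow series."""
    return {
        "mean": statistics.fmean(series),
        "median": statistics.median(series),
        "std": statistics.pstdev(series),
        "skewness": skewness(series),
        "acf_lag1": autocorr(series, 1),
        "frac_below_threshold": sum(x < threshold for x in series) / len(series),
    }
```

In a meta-learning setup of the kind described, a vector like this would be the input to the meta-model, and the corresponding tuned LSTM hyperparameters would be the target it learns to predict.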

The results suggest that standard deep learning is effective under normal flow conditions. However, meta-learning frameworks, which adjust model structure based on flow characteristics, provide better reliability for operational flood forecasting in complex monsoon-influenced areas. The proposed hybrid meta-learning framework aims to combine the strengths of both methods, although our initial implementation reveals challenges that require further work.

How to cite: Singh, D. and Jothiprakash, V.: Streamflow Forecasting using Genetic Programming, LSTM, and Hybrid Meta-Learning GP-LSTM model in Monsoon-Dominated Basins, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16415, https://doi.org/10.5194/egusphere-egu26-16415, 2026.

Supplementary materials

supplementary materials version 1 – uploaded on 05 May 2026, no comments