Hydrologically constrained genetic programming for interpretable rainfall–runoff model discovery

Naila Matin; Viraj Vidura Herath Herath Mudiyanselage; Abhishek Saha; Lucy Marshall; Vladan Babovic

doi:https://doi.org/10.5194/egusphere-egu26-8757

Session HS3.1

EGU26-8757, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.

Naila Matin¹, Viraj Vidura Herath Herath Mudiyanselage², Abhishek Saha³,⁴, Lucy Marshall², and Vladan Babovic¹

¹Department of Civil and Environmental Engineering, National University of Singapore, Singapore 117576, Singapore.
²School of Civil Engineering, Faculty of Engineering, The University of Sydney, Sydney, New South Wales, Australia.
³Hydroinformatics Institute, Singapore 118256, Singapore.
⁴Delft Institute of Applied Mathematics, Delft University of Technology, Delft, The Netherlands.

Data-driven rainfall–runoff models often deliver high predictive skill but provide limited insight into hydrological processes. Classic conceptual models, by contrast, are transparent and process-based but rely on a limited collection of empirically designed structures, so choosing and adapting an appropriate model across diverse catchments remains difficult. To bridge this gap, this study explores a hydrologically constrained genetic programming (GP) framework that automatically discovers basin-specific conceptual model structures from a shared library of hydrological building blocks. Model structures are built from modular storages, flux functions, and routing components adapted from the Modular Assessment of Rainfall–Runoff Models Toolbox (MARRMoT) [1], which collects and standardizes the storage and flux formulations of 47 established conceptual models. GP, an evolutionary algorithm, operates on structural flags and parameter values, selecting and combining these components into explicit model equations. Each candidate’s reservoir system is then assembled automatically and advanced with a mass-conserving implicit time-stepping scheme. Calibration uses the multi-objective NSGA-II algorithm, so structural choices and parameters are explored within a single optimization loop.
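As a rough illustration of the idea described above, the following minimal Python sketch selects a flux function for a single storage by a structural flag and advances it with a backward-Euler (implicit) step. It is not the authors' implementation: the names `FLUX_LIBRARY`, `implicit_step`, and `simulate`, the two flux forms, and the bisection solver are all assumptions made for illustration, and a real MARRMoT-style model would couple several storages and a routing component.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical flux library: outflow (mm/day) as a function of storage S (mm)
# and a rate parameter k. Both forms are zero at S = 0 and increase with S.
FLUX_LIBRARY: Dict[str, Callable[[float, float], float]] = {
    "linear": lambda S, k: k * S,
    "quadratic": lambda S, k: k * S * S,
}


def implicit_step(S_old: float, P: float, dt: float,
                  flux: Callable[[float, float], float], k: float) -> float:
    """One backward-Euler step of dS/dt = P - flux(S, k).

    Solves S = S_old + dt * (P - flux(S, k)) for S by bisection. The root is
    bracketed by [0, S_old + dt * P] because the flux is non-negative,
    increasing, and zero at S = 0.
    """
    def residual(S: float) -> float:
        return S - S_old - dt * (P - flux(S, k))

    lo, hi = 0.0, S_old + dt * P
    for _ in range(200):  # interval shrinks to floating-point resolution
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def simulate(precip: List[float], flux_flag: str, k: float,
             S0: float = 0.0, dt: float = 1.0) -> Tuple[List[float], float]:
    """Run the single-storage model; returns (outflow series, final storage)."""
    flux = FLUX_LIBRARY[flux_flag]  # the structural flag picks the flux form
    S, flows = S0, []
    for P in precip:
        S = implicit_step(S, P, dt, flux, k)
        flows.append(flux(S, k))  # outflow evaluated at the new storage
    return flows, S


if __name__ == "__main__":
    precip = [10.0, 0.0, 5.0, 0.0, 0.0]  # toy forcing, not real data
    for flag in FLUX_LIBRARY:
        flows, S_end = simulate(precip, flag, k=0.3, S0=2.0)
        # Water balance: inflow minus outflow equals the change in storage.
        imbalance = (sum(precip) - sum(flows)) - (S_end - 2.0)
        print(flag, [round(q, 3) for q in flows], f"imbalance={imbalance:.1e}")
```

Because the outflow is evaluated at the end-of-step storage, the discrete water balance closes to solver tolerance at every step, which is the property the phrase "mass-conserving" refers to in this sketch. In a search loop, an evolutionary algorithm would vary `flux_flag` and `k` together.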

The framework is evaluated on CAMELS-US basins through three experiments. In the first, on a snow-dominated mountain catchment (Buffalo Fork, 13011900), the discovered structure reproduces the snowmelt-driven regime and flow-duration curve over the held-out test period with high Nash–Sutcliffe efficiency (NSE_test ≈ 0.85), and uncertainty analyses indicate that a snow–soil–single-routing backbone is consistently retained. In the second, a transfer experiment to a hydrologically similar basin (Johnson Creek, 13313000) shows that directly reusing the Buffalo Fork structure and parameters already yields useful skill (NSE_test ≈ 0.72), while a short “hot-start” GP run seeded with this transferred solution can reach NSE_test ≈ 0.84, capturing most of the benefit of a much longer optimization with ~40× fewer generations and a small fraction of the computational cost. In the third, the framework is benchmarked in a broader hydro-climatic context against the conceptual and LSTM rainfall–runoff models from the CAMELS benchmark study by Kratzert et al. [2]. Eighteen representative CAMELS-US basins are used (one medoid per HUC-2 region), and the system evolves a distinct model structure tuned to the hydro-climate of each basin from the same shared component library. In this multi-basin setting, the GP-derived models achieve a median NSE_test of about 0.70, generally match or exceed the conceptual benchmarks, and remain competitive with the LSTM variants. The results indicate that hydrologically constrained automated model discovery can help narrow the accuracy–interpretability trade-off, yielding transparent, physically consistent rainfall–runoff models and suggesting a potential path toward structure transfer in data-sparse or ungauged basins.
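For readers less familiar with the skill score quoted above, the short Python sketch below computes the Nash–Sutcliffe efficiency in its standard form, NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))². The function name `nse` and the toy series are assumptions for illustration only; the abstract does not describe the authors' evaluation code.

```python
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency of simulated flows against observed flows.

    Returns 1.0 for a perfect fit and 0.0 when the simulation is no better
    than predicting the observed mean; negative values are worse than that.
    """
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sq_dev = sum((o - mean_obs) ** 2 for o in obs)
    if sq_dev == 0.0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sq_err / sq_dev


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]        # toy values, not real streamflow
    print(nse(observed, observed))         # perfect simulation
    print(nse(observed, [2.5] * 4))        # mean-flow baseline
```

A multi-basin summary such as the median NSE_test would then be obtained by applying `statistics.median` to the per-basin scores.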

[1] L. Trotter, W. J. M. Knoben, K. J. A. Fowler, M. Saft, and M. C. Peel, “Modular Assessment of Rainfall–Runoff Models Toolbox (MARRMoT) v2.1: an object-oriented implementation of 47 established hydrological models for improved speed and readability,” Geosci. Model Dev., vol. 15, pp. 6359–6369, 2022.

[2] F. Kratzert, D. Klotz, G. Shalev, G. Klambauer, S. Hochreiter, and G. Nearing, “Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets,” Hydrol. Earth Syst. Sci., vol. 23, pp. 5089–5110, 2019.

How to cite: Matin, N., Herath Mudiyanselage, V. V. H., Saha, A., Marshall, L., and Babovic, V.: Hydrologically constrained genetic programming for interpretable rainfall–runoff model discovery, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8757, https://doi.org/10.5194/egusphere-egu26-8757, 2026.