- Karlsruhe Institute of Technology (KIT), Institute for Water and Environment, Hydrology, Karlsruhe, Germany (ralf.loritz@kit.edu)
Hydrological modelling has long been shaped by a steady drive toward ever more sophisticated models. In the era of machine learning, this drive has turned into a relentless pursuit of complexity: deeper networks and ever more elaborate architectures that often feel outdated by the time the ink on the paper is dry. Motivated by a genuine belief in methodological progress, I, like many others, spent considerable effort exploring this direction, driven by the assumption that finding the “right” architecture or model would inevitably lead to better performance. This talk is a reflection on that journey; you could say my own Leidensweg, my own path of suffering. Over several years, together with excellent collaborators, I explored a wide range of state-of-the-art deep-learning approaches for rainfall–runoff modelling and other hydrological modelling challenges. Yet, regardless of the architecture or training strategy, I repeatedly encountered the same performance ceiling. In parallel, the literature appeared to tell a different story, with “new” models regularly claiming improvements over established baselines. A closer inspection, however, revealed that rigorous and standardized benchmarking is far from common practice in hydrology, making it difficult to disentangle genuine progress from artefacts of experimental design. What initially felt like a failure to improve my models turned out to be a confrontation with reality. The limiting factor was not the architecture, but the problem itself. We have reached a point where predictive skill is increasingly bounded by the information content of our benchmark datasets and, perhaps more importantly, by the way we frame our modelling challenges, rather than by model design. Like many others, I have come to believe that if we want to move beyond the current performance plateau, the next breakthroughs are unlikely to come from ever more complex models alone.
Instead, as a community, we need well-designed modelling challenges, better benchmarks, and datasets that meaningfully expand the information available to our models, so that model comparisons become genuinely informative.
How to cite: Loritz, R., Dolich, A., and Heudorfer, B.: The empty mine: Why better tools do not help you find new diamonds, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-10401, https://doi.org/10.5194/egusphere-egu26-10401, 2026.