- 1 Department of Geography, University of Bonn, Bonn, Germany
- 2 Schulich School of Engineering, University of Calgary, Calgary, Canada
Large Language Models (LLMs) are being developed and marketed at a rapid pace, and practitioners and scientists across many fields are exploring applications that deliver on the promises made by the leading LLM providers. Given the advent of this new technology, the fields of hydrology and hydrologic modeling are starting to investigate its potential applications. The idea of an AI assistant that is skilled in hydrological reasoning is exciting and timely. Despite the growing application of LLMs across the Earth sciences, it remains unclear if and how they can provide meaningful guidance for hydrological modeling.
In this study, we investigate whether LLMs provide robust a priori suggestions for conceptual model structure, based on the implicit hydrological understanding captured in their training data. We addressed this aim across 14 diverse catchments and a separate set of 26 hydrologically similar catchments in the contiguous United States, using Google’s Gemini 2.5 Flash model. We translated the conceptual hydrological modeling framework FUSE (Framework for Understanding Structural Errors) into five structured text-based prompts that differ in their level of symbolic abstraction. We then tasked the LLM with recommending suitable hydrological model components for each catchment based on its geographic location. These recommendations were evaluated against an exhaustive set of all 78 plausible FUSE configurations.
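As an illustration of this prompting step, the sketch below shows how one structured prompt could be sent to Gemini 2.5 Flash via the Python google-generativeai client. The FUSE decision list, prompt wording, and coordinates are hypothetical placeholders, not the prompts or parsing used in the study.

```python
# Minimal sketch of prompting Gemini for FUSE component recommendations.
# Assumptions: google-generativeai client; the FUSE decision subset below is illustrative only.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-2.5-flash")

# Hypothetical subset of FUSE structural decisions and their allowed options.
FUSE_DECISIONS = {
    "upper-zone architecture": ["single state", "tension + free storage"],
    "lower-zone architecture": ["single linear reservoir", "two parallel reservoirs"],
    "percolation": ["field-capacity limited", "lower-zone demand"],
    "surface runoff": ["saturated-area (ARNO/VIC)", "TOPMODEL-based"],
}

def recommend_components(lat: float, lon: float) -> str:
    """Ask the LLM to pick exactly one option per FUSE decision for a catchment location."""
    options = "\n".join(f"- {name}: {' | '.join(opts)}" for name, opts in FUSE_DECISIONS.items())
    prompt = (
        f"A catchment is centred at {lat:.3f} N, {abs(lon):.3f} W in the contiguous United States.\n"
        "Choose exactly one option per decision below and answer only with 'decision: option' lines.\n"
        f"{options}"
    )
    return model.generate_content(prompt).text

# Example call with placeholder coordinates.
print(recommend_components(40.015, -105.270))
```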
We assessed the streamflow simulations produced by the LLM-recommended structures in terms of KGE performance, regional consistency, and fidelity in representing hydrological signatures. Our preliminary results indicate that LLMs can be prompted to adhere to strict modeling frameworks and to provide component recommendations that respect the given restrictions, resulting in executable model setups. Furthermore, the structure of the prompt profoundly impacts efficacy, highlighting the need for future research on prompt design. However, the model commonly did not recommend the top-performing structures and was inconsistent, recommending different model components across repeated identical prompts. This research represents a first step toward establishing benchmarks for "hydrologic understanding" in LLMs and assessing their viability in future modeling applications.
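For reference, the KGE scores follow the standard Kling-Gupta Efficiency formulation, which combines correlation, a variability ratio, and a bias ratio; the snippet below is a generic implementation of that formula, not code from the study.

```python
import numpy as np

def kge(sim: np.ndarray, obs: np.ndarray) -> float:
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    r = np.corrcoef(sim, obs)[0, 1]      # linear correlation between simulation and observation
    alpha = np.std(sim) / np.std(obs)    # variability ratio
    beta = np.mean(sim) / np.mean(obs)   # bias (mean) ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```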
How to cite: Schultze, P., Eythorsson, D., Clark, M., and Klaus, J.: Does AI Understand Hydrology? - Investigating AI recommended conceptual hydrological model setups, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8123, https://doi.org/10.5194/egusphere-egu26-8123, 2026.