- ¹University of Stuttgart, Stuttgart Center for Simulation Science, Research Group for Statistical Model-Data Integration, Stuttgart, Germany (anneli.guthke@simtech.uni-stuttgart.de)
- ²Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Despite the great success of deep learning models in many hydrological prediction tasks, they still struggle to predict extreme events and to generalize to unseen conditions, which raises questions about their fidelity and their applicability beyond purely operational purposes. Physics-informed hybrid modelling is often proposed as a way to instill interpretability and to enable trustworthy data-driven predictions that agree with theoretical knowledge. Yet the community is still searching for best practices for constructing physics-informed machine learning models: several “entry points” for physics knowledge exist, namely the loss function, the model inputs, and the architecture. Here, we focus on the architecture, and on arguably the most “constrained” way of bringing physics into a hybrid model: a traditional, process-based (conceptual) hydrological model is combined with a data-driven component (here: a long short-term memory network, LSTM) that modifies the parameters of the conceptual model over time, as learned by training on observed discharge values. For this apparently well-constrained scenario of hybrid modelling, we ask whether it can faithfully be called “physics-constrained”, or whether the data-driven component is able to overwrite these constraints for the sake of increased performance.
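To make the hybrid setup described above concrete, the following is a minimal sketch and not the model used in this study. It assumes a single linear reservoir as the conceptual model, with a time-varying outflow coefficient `k_series` that stands in for whatever a data-driven component (such as an LSTM) would supply; the function name `run_bucket` and all variable names are hypothetical.

```python
import numpy as np

def run_bucket(precip, k_series, s0=0.0):
    """Hypothetical single linear reservoir with a time-varying parameter.

    At each step: S <- S + P_t, Q_t = k_t * S, S <- S - Q_t.
    k_series plays the role of the parameter values that a data-driven
    component would provide (assumption: one value per time step, in [0, 1]).
    """
    precip = np.asarray(precip, dtype=float)
    k_series = np.asarray(k_series, dtype=float)
    storage = float(s0)
    discharge = np.empty_like(precip)
    for t in range(precip.size):
        storage += precip[t]                   # add input to storage
        discharge[t] = k_series[t] * storage   # outflow set by current k_t
        storage -= discharge[t]                # update storage
    return discharge, storage

# Toy forcing, once with a static and once with a time-varying parameter.
precip = np.array([1.0, 0.0, 2.0, 0.0])
q_static, s_static = run_bucket(precip, np.full(4, 0.5))
q_dynamic, s_dynamic = run_bucket(precip, np.array([0.1, 0.9, 0.1, 0.9]))
```

In this toy case the water balance closes for any `k_series`, yet the discharge dynamics depend entirely on how the parameter is varied, which is the kind of freedom the question above is about.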
To address this question objectively, we introduce an entropy-based method to quantify the “activity” of the data-driven component in acting against the conceptual constraints. This metric is complemented by a diagnostic workflow for better understanding how the resulting, effective hybrid model structure functions internally when predicting discharge. Through didactic examples inspired by real-world case studies, we present the method and build an intuition for what our entropy-based metric represents. Further, we discuss selected results from a large-sample case study on CAMELS-GB to illustrate the variety of findings and insights: (1) performance relies heavily on the data-driven component, and the physics constraints often even make the prediction problem harder instead of adding helpful information; (2) the data-driven component tends to overwrite the constrained architecture “silently”, but this can be detected with our proposed workflow; (3) even constraints that seem nonsensical at first sight can in fact increase performance, as the hybrid model is transformed into a new structure that is parsimonious and efficient; (4) claiming interpretability on the basis of prescribed constraints is risky at best: before calling a hybrid model of this type interpretable, we should carefully check what is happening inside. Overall, these findings provide fundamental guidance for (hybrid) model building and will help us find better ways to reconcile knowledge with the information in data for trustworthy models.
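The abstract does not spell out the entropy-based metric, so the following is only a generic illustration of the underlying intuition, not the authors' method. It assumes that "activity" can be approximated by the Shannon entropy of a parameter's time series, binned over its feasible range; the function name `parameter_entropy`, the bin count, and the bounds are all hypothetical choices.

```python
import numpy as np

def parameter_entropy(values, bounds, n_bins=10):
    """Shannon entropy (in bits) of a parameter series binned over `bounds`.

    Assumption: values lie within `bounds`; np.histogram silently ignores
    values outside the given range.
    """
    lo, hi = bounds
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, hi))
    total = counts.sum()
    if total == 0:
        raise ValueError("no values fall inside the given bounds")
    probs = counts[counts > 0] / total          # drop empty bins
    return float(np.sum(probs * np.log2(1.0 / probs)))

# A static parameter occupies one bin; a parameter that is moved across
# its whole range spreads over all bins.
h_static = parameter_entropy(np.full(100, 0.5), bounds=(0.0, 1.0))
h_active = parameter_entropy(np.linspace(0.05, 0.95, 10), bounds=(0.0, 1.0))
```

Under this proxy, a parameter that stays fixed yields zero entropy, while one that is spread evenly over the bins reaches the maximum of `log2(n_bins)` bits. How such values relate to the metric presented in this contribution is left open here.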
How to cite: Guthke, A., Álvarez Chaves, M., Acuna Espinoza, E., and Ehret, U.: Physics-constrained or physics-ignored? An entropy-based approach to diagnose if your hybrid model effectively skips conceptual constraints, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18384, https://doi.org/10.5194/egusphere-egu26-18384, 2026.