Peeking Inside Hydrologists' Minds: Comparing Human Judgment and Quantitative Metrics of Hydrographs

Martin Gauch; Frederik Kratzert; Oren Gilon; Hoshin Gupta; Juliane Mai; Grey Nearing; Bryan Tolson; Sepp Hochreiter; Daniel Klotz

doi:https://doi.org/10.5194/egusphere-egu23-12261

Session HS1.3.1

EGU23-12261

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Martin Gauch¹,², Frederik Kratzert², Oren Gilon², Hoshin Gupta³, Juliane Mai⁴, Grey Nearing², Bryan Tolson⁴, Sepp Hochreiter¹, and Daniel Klotz¹

¹LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
²Google Research, Austria, Israel, USA
³Department of Hydrology and Atmospheric Sciences, University of Arizona, Tucson, USA
⁴Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Canada

Everyone wants their hydrologic models to be as good as possible. But how do we know if a model is accurate or not? In the spirit of rigorous and reproducible science, the answer should be: we calculate metrics. Yet, as humans, we sometimes follow a scheme of "I know a good model when I see it" and manually inspect hydrographs to assess their quality. This is certainly a valid method for sanity checks, but it is unclear whether these subjective visual ratings agree with metric-based rankings. Moreover, the consistency of such inspections is unclear, as different observers might come to different conclusions about the same hydrographs.

In this presentation, we report a large-scale study in which we collected responses from 622 experts, who compared and judged more than 14,000 pairs of hydrographs from 13 different models. Our results show that, overall, human ratings broadly agree with quantitative metrics in a clear preference for a machine learning model. At the level of individuals, however, ratings from different participants are highly inconsistent. Still, in cases where experts agree, we can predict their most likely rating purely from quantitative metrics. This indicates that we can encode intersubjective human preferences with a small set of objective, quantitative metrics. To us, these results make a compelling case for the community to put more trust into existing metrics, for example by conducting more rigorous benchmarking efforts.
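As a rough illustration of what comparing a pairwise human rating with a metric-based ranking can look like, the sketch below scores two simulated hydrographs against observations and checks whether the metric prefers the same one as a rater. It rests on several assumptions: the abstract does not name the metrics used, so the Nash-Sutcliffe efficiency (NSE) stands in as one common choice; the function names `nse`, `metric_preference`, and `agreement_rate` are hypothetical; and the discharge values and ratings are invented toy data, not results from the study.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is as good as the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((simulated - observed) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance


def metric_preference(observed, sim_a, sim_b):
    """Return 'A' or 'B' for the simulation with the higher NSE, or 'tie'."""
    score_a, score_b = nse(observed, sim_a), nse(observed, sim_b)
    if np.isclose(score_a, score_b):
        return "tie"
    return "A" if score_a > score_b else "B"


def agreement_rate(pairs, human_choices):
    """Fraction of hydrograph pairs where the metric and the human pick the same side."""
    matches = [
        metric_preference(obs, a, b) == choice
        for (obs, a, b), choice in zip(pairs, human_choices)
    ]
    return float(np.mean(matches))


# Invented toy data (assumption): one observed series and two simulations.
observed = np.array([1.0, 2.0, 5.0, 3.0, 1.5])
sim_a = np.array([1.1, 2.2, 4.6, 3.1, 1.4])  # tracks the peak closely
sim_b = np.array([2.0, 2.0, 2.5, 2.5, 2.0])  # misses the peak

# The same two simulations shown in both orders; the hypothetical rater picks "A" twice.
pairs = [(observed, sim_a, sim_b), (observed, sim_b, sim_a)]
human_choices = ["A", "A"]

print(metric_preference(observed, sim_a, sim_b))
print(agreement_rate(pairs, human_choices))
```

In this toy setup the rater agrees with the metric on the first pair only. A single metric is used here to keep the example short; the abstract refers to a small set of metrics.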

How to cite: Gauch, M., Kratzert, F., Gilon, O., Gupta, H., Mai, J., Nearing, G., Tolson, B., Hochreiter, S., and Klotz, D.: Peeking Inside Hydrologists' Minds: Comparing Human Judgment and Quantitative Metrics of Hydrographs, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-12261, https://doi.org/10.5194/egusphere-egu23-12261, 2023.