EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Peeking Inside Hydrologists' Minds: Comparing Human Judgment and Quantitative Metrics of Hydrographs

Martin Gauch1,2, Frederik Kratzert2, Oren Gilon2, Hoshin Gupta3, Juliane Mai4, Grey Nearing2, Bryan Tolson4, Sepp Hochreiter1, and Daniel Klotz1
  • 1LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
  • 2Google Research, Austria, Israel, USA
  • 3Department of Hydrology and Atmospheric Sciences, University of Arizona, Tucson, USA
  • 4Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Canada

Everyone wants their hydrologic models to be as good as possible. But how do we know whether a model is accurate? In the spirit of rigorous and reproducible science, the answer should be: we calculate metrics. Yet, as humans, we sometimes follow a scheme of "I know a good model when I see it" and manually inspect hydrographs to assess their quality. This is certainly a valid method for sanity checks, but it is unclear whether these subjective visual ratings agree with metric-based rankings. Moreover, such inspections may be inconsistent, as different observers can come to different conclusions about the same hydrographs.
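To make the "we calculate metrics" option concrete: a widely used quantitative measure of hydrograph fit is the Nash–Sutcliffe efficiency (NSE). The sketch below is illustrative only, assuming simple Python lists of observed and simulated streamflow; it is not the evaluation code used in the study.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 indicates a perfect fit; 0 means the
    simulation is no better than predicting the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    # Sum of squared simulation errors.
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    # Variance of the observations around their mean (times n).
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

# Hypothetical observed and simulated discharge values for illustration.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print(round(nse(obs, sim), 3))  # → 0.98
```

Metrics like this one reduce an entire hydrograph comparison to a single number, which is exactly what makes them reproducible but also what visual inspection is suspected to capture differently.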

In this presentation, we report a large-scale study in which we collected responses from 622 experts, who compared and judged more than 14,000 pairs of hydrographs from 13 different models. Our results show that, overall, human ratings broadly agree with quantitative metrics in a clear preference for a Machine Learning model. At the level of individuals, however, ratings from different participants are often inconsistent. Still, in cases where experts agree, we can predict their most likely rating purely from quantitative metrics. This indicates that we can encode intersubjective human preferences with a small set of objective, quantitative metrics. To us, these results make a compelling case for the community to put more trust into existing metrics, for example by conducting more rigorous benchmarking efforts.

How to cite: Gauch, M., Kratzert, F., Gilon, O., Gupta, H., Mai, J., Nearing, G., Tolson, B., Hochreiter, S., and Klotz, D.: Peeking Inside Hydrologists' Minds: Comparing Human Judgment and Quantitative Metrics of Hydrographs, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-12261, 2023.