EGU22-8396, updated on 28 Mar 2022
https://doi.org/10.5194/egusphere-egu22-8396
EGU General Assembly 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Rate my Hydrograph: Evaluating the Conformity of Expert Judgment and Quantitative Metrics

Martin Gauch1, Frederik Kratzert2, Juliane Mai3, Bryan Tolson3, Grey Nearing4, Hoshin Gupta5, Sepp Hochreiter1, and Daniel Klotz1
  • 1Institute for Machine Learning, Johannes Kepler University, Linz, Austria (gauch@ml.jku.at)
  • 2Google Research, Vienna, Austria
  • 3Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Canada
  • 4Google Research, Mountain View, USA
  • 5Department of Hydrology and Atmospheric Sciences, The University of Arizona, Tucson, USA

As hydrologists, we pride ourselves on being able to identify deficiencies of a hydrologic model by looking at its runoff simulations. One of the first questions a practicing hydrologist asks when presented with a new model is: "Show me some hydrographs!" Everyone has an intuition about how a "real" (i.e., observed) hydrograph should behave [1, 2]. Although a large suite of summary metrics exists for measuring differences between simulated and observed hydrographs, these metrics do not always fully capture our professional intuition about what constitutes an adequate hydrological prediction (perhaps because metrics typically aggregate over many aspects of model performance). To us, this suggests that (a) there is potential to improve existing metrics so that they conform better with expert intuition, (b) our expert intuition is overvalued and we should rely more on metrics, or (c) a bit of both.
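For concreteness, here is a minimal sketch (in Python with NumPy) of two such summary metrics, the Nash–Sutcliffe efficiency (NSE) and the Kling–Gupta efficiency (KGE). Both compress the comparison of an entire hydrograph into a single number, which is exactly the kind of aggregation referred to above; the function names and inputs are illustrative, not part of the proposed study.

    import numpy as np

    def nse(obs, sim):
        # Nash–Sutcliffe efficiency: 1 is a perfect fit,
        # 0 means the simulation is no better than the mean of the observations.
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

    def kge(obs, sim):
        # Kling–Gupta efficiency: combines correlation (r),
        # variability ratio (alpha), and bias ratio (beta) into one score.
        r = np.corrcoef(obs, sim)[0, 1]
        alpha = np.std(sim) / np.std(obs)
        beta = np.mean(sim) / np.mean(obs)
        return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

Two simulations with very different visual behavior (e.g., one that misses peak timing and one that misses baseflow) can end up with similar NSE or KGE values, which is why a single aggregate score may diverge from expert judgment.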

In the social study proposed here, we aim to address this issue in a data-driven fashion: we will ask experts to visit a website where they are shown two unlabeled simulated hydrographs side by side with an observed hydrograph, and asked to decide which of the two matches the observations better. Together with information about the experts' backgrounds, the collected responses should paint a more nuanced picture of the aspects of hydrograph behavior that different members of the community consider important. This should provide valuable information that may enable us to derive new (and hopefully better) model performance metrics directly from human ratings.
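The abstract does not commit to a particular method for aggregating the collected pairwise judgments. As one hypothetical illustration, a Bradley–Terry model, fit here with the standard Zermelo/MM iteration, turns pairwise "A looks better than B" counts into a latent quality score per hydrograph; the function name and example data are made up for this sketch.

    import numpy as np

    def bradley_terry(wins, n_iter=100):
        # Fit Bradley–Terry strengths from a pairwise win-count matrix,
        # where wins[i, j] = number of times hydrograph i was preferred over j.
        n = wins.shape[0]
        p = np.ones(n)  # initial quality scores
        for _ in range(n_iter):
            for i in range(n):
                total_wins = wins[i].sum()
                denom = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                            for j in range(n) if j != i)
                p[i] = total_wins / denom
            p /= p.sum()  # normalize so scores sum to 1
        return p

    # Example: expert preference counts over three candidate simulations.
    wins = np.array([[0, 8, 6],
                     [2, 0, 5],
                     [4, 5, 0]])
    print(bradley_terry(wins))  # latent "quality" score per simulation

Scores fitted this way could then be regressed against hydrograph features to learn which aspects of model behavior drive expert preference.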

[1] Crochemore, Louise, et al. "Comparing expert judgement and numerical criteria for hydrograph evaluation." Hydrological Sciences Journal 60.3 (2015): 402–423.

[2] Wesemann, Johannes, et al. "Man vs. Machine: An interactive poll to evaluate hydrological model performance of a manual and an automatic calibration." EGU General Assembly Conference Abstracts. 2017.

How to cite: Gauch, M., Kratzert, F., Mai, J., Tolson, B., Nearing, G., Gupta, H., Hochreiter, S., and Klotz, D.: Rate my Hydrograph: Evaluating the Conformity of Expert Judgment and Quantitative Metrics, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8396, https://doi.org/10.5194/egusphere-egu22-8396, 2022.