The persistence of errors: How evaluating models over data partitions relates to a global evaluation

Daniel Klotz; Martin Gauch; Grey Nearing; Sepp Hochreiter; Frederik Kratzert

doi:https://doi.org/10.5194/egusphere-egu23-15221

EGU23-15221

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Daniel Klotz¹, Martin Gauch¹,², Grey Nearing², Sepp Hochreiter¹, and Frederik Kratzert²

¹LIT AI Lab & Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria
²Google Research, Austria, USA

Skillful today, inept tomorrow. Today's hydrological models have pronounced and complex error dynamics (e.g., small, highly correlated errors for low flows and large, random errors for high flows). Modellers generally accept that simple, variance-based evaluation criteria, such as the Nash-Sutcliffe Efficiency (NSE), cannot fully capture these intricacies. The (implied) consequences of this are, however, seldom discussed.

This contribution examines how evaluating the model over two data partitions (above and below a chosen threshold) relates to a global model evaluation of both partitions combined (i.e., the usual way of computing the NSE). For our experiments, we manipulate dummy simulations with gradient descent to approximate specific NSE values for each partition individually. Specifically, we fix the NSE for runoff values that fall below the threshold, and vary both the NSE of the simulations above the threshold and the threshold itself. This enables us to study how the global NSE relates to the partition NSEs and the threshold. Intuitively, one would want the global NSE to reflect the performance on the partitions in a comprehensible manner. We show, however, that this relation is not trivial.
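The setup described above can be imitated in a few lines. The following is a minimal sketch and not the authors' procedure: instead of gradient descent, it rescales random noise in closed form so that each partition reaches a chosen NSE exactly. The synthetic gamma-distributed "runoff", the 80% quantile threshold, the target values 0.5 and 0.0, and all function names are assumptions made for illustration. The sketch also checks the standard sum-of-squares identity that links the global NSE to the partition NSEs, under which each partition's error share is weighted by its within-partition variability relative to the total variability.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / SST, with SST taken around mean(obs)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def simulate_with_target_nse(obs, target, rng):
    """Return obs plus noise, rescaled so that nse(obs, sim) == target (target <= 1).

    Closed-form stand-in for the gradient-descent manipulation in the abstract.
    """
    obs = np.asarray(obs, dtype=float)
    noise = rng.standard_normal(obs.shape)
    sst = np.sum((obs - obs.mean()) ** 2)
    # Choose the scale so that the squared error equals (1 - target) * SST.
    scale = np.sqrt((1.0 - target) * sst / np.sum(noise ** 2))
    return obs + scale * noise


def global_nse_from_partitions(obs, mask_low, nse_low, nse_high):
    """Recombine partition NSEs into the global NSE via sums of squares."""
    obs = np.asarray(obs, dtype=float)
    sst = np.sum((obs - obs.mean()) ** 2)
    sst_low = np.sum((obs[mask_low] - obs[mask_low].mean()) ** 2)
    sst_high = np.sum((obs[~mask_low] - obs[~mask_low].mean()) ** 2)
    return 1.0 - ((1.0 - nse_low) * sst_low + (1.0 - nse_high) * sst_high) / sst


rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=1.5, size=2000)  # synthetic "runoff" (assumption)
threshold = np.quantile(obs, 0.8)                 # illustrative threshold (assumption)
mask_low = obs < threshold

# Build one simulation whose partitions hit the chosen NSE targets.
sim = np.empty_like(obs)
sim[mask_low] = simulate_with_target_nse(obs[mask_low], 0.5, rng)
sim[~mask_low] = simulate_with_target_nse(obs[~mask_low], 0.0, rng)

nse_low = nse(obs[mask_low], sim[mask_low])
nse_high = nse(obs[~mask_low], sim[~mask_low])
nse_global = nse(obs, sim)
nse_recombined = global_nse_from_partitions(obs, mask_low, nse_low, nse_high)

print(f"NSE below threshold: {nse_low:.3f}")
print(f"NSE above threshold: {nse_high:.3f}")
print(f"global NSE:          {nse_global:.3f}")
print(f"recombined:          {nse_recombined:.3f}")
```

Because the within-partition sums of squares add up to less than the total sum of squares whenever the partition means differ, the weights in this identity sum to less than one. The global NSE is therefore not a weighted average of the partition NSEs, and it can exceed both of them. Repeating the run with other values for `threshold` or the targets gives a rough feel for how the three quantities interact.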

Our results also show that subdividing the data and evaluating over the resulting partitions yields different information regarding model deficiencies than an overall evaluation. The downside is that less data is available for estimating the NSE of each partition. In the future, such partition-wise evaluation could be used for model selection and diagnostic purposes.

How to cite: Klotz, D., Gauch, M., Nearing, G., Hochreiter, S., and Kratzert, F.: The persistence of errors: How evaluating models over data partitions relates to a global evaluation, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-15221, https://doi.org/10.5194/egusphere-egu23-15221, 2023.