Suggesting a new diagram and convention for characterising and reporting model performance
- 1. Department of Sustainable Development, Environmental Science and Engineering, KTH Royal Institute of Technology, Teknikringen 10B, 100 44 Stockholm, Sweden
- 2. Department of Earth Sciences and Centre of Natural Hazards and Disaster Science, Uppsala University, Uppsala, Sweden (sina.khatami@geo.uu.se)
- 3. Department of Hydrology and Atmospheric Sciences, The University of Arizona, Tucson, AZ, USA
- 4. Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Melbourne, Australia
- 5. Department of Water Resources and Drinking Water, Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
A long-standing research issue, in the hydrological sciences and beyond, is that of developing methods to evaluate the predictive/forecasting skill, errors, and uncertainties of a model (or model ensemble). Various methods have been proposed for characterising time series residuals, i.e. the differences between observed (target) and modelled (estimated) time series. Most notably, the Taylor diagram summarises model performance in a single plot using three related statistics: the (linear, Pearson) correlation, the standard deviations, and the centred root-mean-square difference of one or more pairs of target and estimated time series. Despite its theoretical elegance and widespread use, the Taylor diagram does not account for bias, an important summary statistic for evaluating model performance, because the centred difference removes the mean error by construction. Further, it is very common to evaluate, compare, and report model “skill” using a single aggregate metric value, even when a vector of metrics is used to calibrate/train the model; most commonly this is a dimensionless efficiency metric such as the Nash-Sutcliffe Efficiency (NSE) or the Kling-Gupta Efficiency (KGE). Such “efficiency” metrics typically aggregate over multiple types of residual behaviour: for example, the most commonly used version of KGE combines correlation, bias, and variability errors, although its authors recommended applying it in a context-dependent fashion, according to which model behaviours are deemed important in a given situation. Nevertheless, a single summary value fails to account for the interactions among the error components, which can be quite informative for the evaluation and benchmarking of models. In this study, we propose a new diagram that is as easy to use and interpret as the Taylor diagram while also accounting for bias. We further suggest a new convention for reporting model skill that is based on foundational error terms. Our vision is that this new diagram and convention will enable researchers and practitioners to better interpret and report model performance. We provide multiple numerical examples to illustrate how the approach can be used to evaluate performance in multi-model and multi-catchment (large-sample) studies.
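To make the error components discussed above concrete, the following minimal sketch computes the Taylor-diagram statistics (correlation, standard deviations, centred root-mean-square difference) together with the bias term and the KGE components of Gupta et al. (2009) for one pair of series. This is not the authors' code: the function name, variable names, and synthetic data are our own, for illustration only, and the proposed diagram itself is not reproduced here.

```python
import numpy as np

def error_components(obs, sim):
    """Summary statistics underlying the Taylor diagram and KGE.

    `obs` and `sim` are 1-D arrays of paired observed and simulated
    values. Illustrative sketch, not the abstract authors' code.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]         # linear (Pearson) correlation
    s_obs, s_sim = obs.std(), sim.std()     # standard deviations
    # Centred RMS difference: the Taylor diagram's third statistic.
    # Subtracting the means removes the mean error, which is why the
    # Taylor diagram cannot display bias.
    crmsd = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    bias = sim.mean() - obs.mean()          # the component Taylor omits
    # KGE (Gupta et al., 2009): correlation, variability ratio (alpha),
    # and bias ratio (beta), aggregated as the Euclidean distance from
    # the ideal point (1, 1, 1).
    alpha = s_sim / s_obs
    beta = sim.mean() / obs.mean()
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return {"r": r, "sd_obs": s_obs, "sd_sim": s_sim,
            "crmsd": crmsd, "bias": bias,
            "alpha": alpha, "beta": beta, "kge": kge}

# Example usage on synthetic data: a biased, noisy estimate of a
# skewed "streamflow-like" series. Reporting the component vector,
# rather than only `kge`, is the convention argued for above.
rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 5.0, size=1000)
sim = 1.1 * obs + rng.normal(0.0, 2.0, size=1000)
print(error_components(obs, sim))
```

The Taylor diagram works because the centred statistics satisfy the law-of-cosines identity crmsd² = σ_obs² + σ_sim² − 2·σ_obs·σ_sim·r, so all three fit on one polar plot; bias lies outside this identity, which is the gap the proposed diagram and reporting convention address.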
How to cite: Khatami, S., Di Baldassarre, G., Gupta, H., Moallemi, E. A., and Pool, S.: Suggesting a new diagram and convention for characterising and reporting model performance, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8211, https://doi.org/10.5194/egusphere-egu22-8211, 2022.