HS1.3.1
Revisiting good modelling practices – where are we today?

EDI
Convener: Diana Spieler | Co-conveners: Janneke Remmers, Keirnan Fowler, Joseph Guillaume, Lieke Melsen
Presentations
| Tue, 24 May, 08:30–10:00 (CEST)
 
Room 2.31

Chairpersons: Diana Spieler, Janneke Remmers, Lieke Melsen
08:30–08:35
|
EGU22-5215
|
Virtual presentation
Claude Gout and Marie-Christine Cacas-Stentz

Deep subsurface dynamic models allow simulating the interaction of multiple physical processes at regional and geological scales. Over the past three decades, the oil and gas (O&G) industry has developed so-called Basin and Petroleum Systems Models to improve the prediction of hydrocarbon accumulation and reduce the risk of exploration well failure. By simulating the geological history of a sedimentary basin from its origin, these thermo-hydro-mechanical and chemical (THMC) models provide a balanced present-day distribution of the static and dynamic properties of a huge volume of rock.

 

In recent years, one of these THMC simulators has been extended to more generic applications, such as geothermal potential assessment of sedimentary basins, appraisal of large-scale aquifer systems for massive CO2 sequestration, or quantification of present-day methane seepage from shallow biogenic gas production.

 

At the basin scale, data to describe the subsurface are very diverse and scattered and the uncertainty of representativeness of basin geological models is large, especially if one expects to obtain results in quantitative terms on connected pore volumes, temperatures, pressures, stress or fluid composition.

This scarcity of data requires geoscientists to describe alternative scenarios that are compatible with the observational data. Describing a 4D model (3D structure through geological time) of a sedimentary basin is a long and complex task, so creating and analysing multiple digital scenarios is almost impossible in a reasonable timeframe.

 

We have developed and proven the concept of an interactive basin model that allows simulating while interpreting, and hence comparing scenarios while interpreting. In the implementation of the concept, surface and subsurface data analysis, 3D scenario model building, simulation parameter setup, THMC simulation, results visualisation and analysis, and scenario comparison are performed in a single “real-time” loop.

The concept also allows the incremental building of a geological basin model. Therefore, one can start by building a coarse model of the full sedimentary basin that is continuously watertight and consistent. Then by visualising the result of the simulation in terms of present-day temperature, pressure, stress, and fluid chemistry fields compared instantaneously with the available data, it can be improved to a more complete and consistent representation. This interactive loop avoids the need for costly and complex inversion and allows the geologist to quickly explore the consistency of his or her assumptions.

 

Ultimately, this interactive modelling protocol based on advanced multi-physics simulation tools should become an essential weapon for rapidly defining the basis for assessing the potential, risks and balances between human activity and the nature of an often poorly documented deep underground.

It is complementary to specific tools for data analysis or uncertainty and risk assessment, such as specialised simulators like reservoir or aquifer models.

How to cite: Gout, C. and Cacas-Stentz, M.-C.: An interactive geological basin model: supporting the fast-track assessment of large-scale subsurface potential in the context of the ecological transition, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5215, https://doi.org/10.5194/egusphere-egu22-5215, 2022.

08:35–08:37
08:37–08:42
|
EGU22-13083
|
Virtual presentation
|
Mark Thyer, Jason Hunter, David McInerney, and Dmitri Kavetski

Probabilistic predictions describe the uncertainty in modelled streamflow, which is a critical input for many environmental modelling applications. A residual error model typically produces the probabilistic predictions in tandem with a hydrological model that predicts the deterministic streamflow. However, many objective functions that are commonly used to calibrate the parameters of the hydrological model make (implicit) assumptions about the errors that do not match the properties (e.g. heteroscedasticity and skewness) of those errors. The consequence of these assumptions is often low-quality probabilistic predictions, which reduces the practical utility of probabilistic modelling. Our study has two aims:

1. Evaluate the impact of objective function inconsistency on the quality of probabilistic predictions;

2. Demonstrate how a simple enhancement to a residual error model can rectify the issues identified with inconsistent objective functions in Aim 1, and thereby improve probabilistic predictions in a wide range of scenarios.

Our findings show that the enhanced error model enables high-quality probabilistic predictions to be obtained for a range of catchments and objective functions, without requiring any changes to the hydrological modelling or calibration process. This advance has practical benefits that are aimed at increasing the uptake of probabilistic predictions in real-world applications, in that the methods are applicable to existing hydrological models that are already calibrated, simple to implement, easy to use and fast. Finally, these methods are available as an open-source R-shiny application and an R-package function.
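As a rough illustration of what a residual error model does, the sketch below turns a deterministic simulation into a prediction interval. The abstract does not describe the proposed enhancement, so this is not the authors' method: it assumes zero-mean Gaussian residuals in Box-Cox-transformed space, a common way to handle heteroscedasticity and skewness, and the value lam = 0.2 and all function names are illustrative assumptions.

```python
import math

def boxcox(q, lam):
    """Box-Cox transform of a positive flow q (lam = 0 gives the log transform)."""
    return math.log(q) if lam == 0 else (q ** lam - 1.0) / lam

def inv_boxcox(z, lam):
    """Inverse Box-Cox transform, clipped so the back-transformed flow is >= 0."""
    if lam == 0:
        return math.exp(z)
    return max(lam * z + 1.0, 0.0) ** (1.0 / lam)

def fit_residual_sd(obs, sim, lam):
    """Spread of residuals in transformed space (assumption: zero-mean errors)."""
    res = [boxcox(o, lam) - boxcox(s, lam) for o, s in zip(obs, sim)]
    return math.sqrt(sum(r * r for r in res) / len(res))

def prediction_interval(sim_value, sd, lam, z=1.645):
    """Approximate 90% interval around one simulated flow (z = 1.645)."""
    centre = boxcox(sim_value, lam)
    return inv_boxcox(centre - z * sd, lam), inv_boxcox(centre + z * sd, lam)

# Hypothetical flows, used only to exercise the functions.
obs = [1.2, 5.0, 20.0, 80.0]
sim = [1.0, 6.0, 18.0, 90.0]
sd = fit_residual_sd(obs, sim, lam=0.2)
low_flow_interval = prediction_interval(1.0, sd, lam=0.2)
high_flow_interval = prediction_interval(100.0, sd, lam=0.2)
```

Because the interval is symmetric only in transformed space, it widens with flow magnitude after back-transformation, which is the heteroscedastic behaviour that a plain least-squares objective function implicitly ignores.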

How to cite: Thyer, M., Hunter, J., McInerney, D., and Kavetski, D.: High-quality probabilistic predictions for existing hydrological models with common objective functions, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-13083, https://doi.org/10.5194/egusphere-egu22-13083, 2022.

08:42–08:44
08:44–08:49
|
EGU22-8603
|
Presentation form not yet defined
|
Arnald Puy, Razi Sheikholeslami, Hoshin Gupta, Jim Hall, Bruce Lankford, Samuele Lo Piano, Jonas Meier, Florian Pappenberger, Amilcare Porporato, Giulia Vico, and Andrea Saltelli

Irrigated agriculture is the most important user of freshwater resources worldwide, which makes it one of the key actors conditioning sustainable development and water security. Anticipated climate change, population growth, and rapidly rising global demand for food will likely lead to agricultural expansion through the development of irrigated areas. This, together with the fact that irrigated crops are approximately four times more profitable than rainfed crops, will place much additional pressure on water resources in the coming years. Therefore, it is of vital importance to devise solutions that minimize the negative impacts of agricultural expansion, particularly on biodiversity and water use, so as to help us achieve environmental and economic sustainability. To realize such an ambition, quantifying irrigation water withdrawal at different spatio-temporal scales is essential. Global Hydrological Models (GHMs) are often used to produce irrigation water withdrawal estimates. Yet GHMs questionably rely on several uncertain estimates of irrigated areas, crop evapotranspiration processes, precipitation and irrigation efficiency, which are the four main inputs in the structure of GHMs. Here we show that, once basic uncertainties regarding these estimates are properly integrated into the calculations, the point-based irrigation water withdrawal estimates actually correspond to uncertainty intervals that span several orders of magnitude already at the grid cell level. Our approach is based on the concept of “sensitivity auditing”, a practice of process-oriented skepticism towards mathematical models. The numerical results suggest that current estimates of global irrigation water withdrawals are spuriously accurate due to their neglect of several ambiguities/uncertainties, and thus need to be re-assessed.
Our analysis highlights that models of global irrigation water demands need to better integrate uncertainties, both technical and epistemological, so as to avoid misguiding the development of strategies intended to help ensure water and food security.
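To make the idea of propagating input uncertainty concrete, the following minimal Monte Carlo sketch contrasts a single point estimate with an interval for one grid cell. It is not the authors' analysis: the withdrawal formula (irrigated area times the evapotranspiration deficit, divided by irrigation efficiency), the uniform distributions and all numeric ranges are simplifying assumptions chosen only for illustration.

```python
import random

def withdrawal_m3(area_ha, et_mm, precip_mm, efficiency):
    """Hypothetical seasonal withdrawal for one cell (1 mm over 1 ha = 10 m3)."""
    deficit_mm = max(et_mm - precip_mm, 0.0)
    return area_ha * 10.0 * deficit_mm / efficiency

def monte_carlo_interval(ranges, n=10000, seed=0):
    """Sample each input uniformly within its range; return 2.5%, 50%, 97.5% quantiles."""
    rng = random.Random(seed)
    draws = sorted(
        withdrawal_m3(**{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()})
        for _ in range(n)
    )
    def pick(q):
        return draws[min(int(q * n), n - 1)]
    return pick(0.025), pick(0.5), pick(0.975)

# Illustrative (made-up) uncertainty ranges for the four inputs named in the abstract.
ranges = {
    "area_ha": (50.0, 150.0),
    "et_mm": (400.0, 800.0),
    "precip_mm": (100.0, 500.0),
    "efficiency": (0.3, 0.9),
}
point_estimate = withdrawal_m3(100.0, 600.0, 300.0, 0.6)
lower, median, upper = monte_carlo_interval(ranges)
```

Reporting `lower` and `upper` alongside `point_estimate` is the kind of practice the abstract argues for; the width of the interval here depends entirely on the assumed ranges and says nothing about the results of the study.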

How to cite: Puy, A., Sheikholeslami, R., Gupta, H., Hall, J., Lankford, B., Lo Piano, S., Meier, J., Pappenberger, F., Porporato, A., Vico, G., and Saltelli, A.: How certain are we about the model-based estimations of global irrigation water withdrawal?, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8603, https://doi.org/10.5194/egusphere-egu22-8603, 2022.

08:49–08:51
08:51–08:56
|
EGU22-11463
|
ECS
|
Virtual presentation
|
Andrijana Todorović, Thomas Grabs, and Claudia Teutschbein

Effective water resources management and mitigation of adverse effects of global warming require accurate flow projections. These projections are generally focused on statistical changes in hydrologic signatures (e.g., 100-year floods, 30-year or 7-day minimum flows), which are obtained from statistical analyses of simulated flows under baseline and future conditions. However, hydrological models used for these simulations are traditionally calibrated to reproduce entire flow series, rather than statistical properties of the hydrologic signatures. Therefore, there is a dichotomy between criteria for hydrological model evaluation/selection and the actual requirements of climate change impact studies.

Here, we address this dichotomy by providing novel insights into the assessment of model suitability for climate change impact studies. Specifically, we analyse the performance of numerous spatially-lumped, bucket-style hydrological models in reproducing observed distributions and trends in the annual series of hydrologic signatures relevant for hydrologic impact studies under climate change. Model performance in reproducing distributions of the signatures is evaluated by applying the Wilcoxon rank sum test. We consider that a model properly reproduces trends in the series of signatures if the series of observed and simulated signatures either both exhibit a lack of statistically significant trends, or both exhibit statistically significant trends of the same sign. Statistical significance of the trends is estimated by applying the Mann-Kendall test, while signs of the trends are obtained from the Sen slope. Model performance is also quantified in terms of commonly used numerical indicators, such as Nash-Sutcliffe or Kling-Gupta coefficients.
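The trend-agreement criterion described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: it uses the standard Mann-Kendall statistic with a normal approximation and no correction for ties or autocorrelation, a significance level of 0.05 is assumed, and the function names are hypothetical. The Wilcoxon rank sum test on distributions is not included here.

```python
import math
from itertools import combinations
from statistics import median

def _sign(x):
    return (x > 0) - (x < 0)

def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test (no tie or autocorrelation correction)."""
    n = len(series)
    s = sum(_sign(xj - xi) for xi, xj in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, p < alpha

def sen_slope(series):
    """Sen slope: median of all pairwise slopes, per time step."""
    pairs = combinations(enumerate(series), 2)
    return median((xj - xi) / (j - i) for (i, xi), (j, xj) in pairs)

def trends_agree(obs, sim, alpha=0.05):
    """True if both series lack a significant trend, or both have one of equal sign."""
    sig_obs = mann_kendall(obs, alpha)[2]
    sig_sim = mann_kendall(sim, alpha)[2]
    if not sig_obs and not sig_sim:
        return True
    if sig_obs and sig_sim:
        return _sign(sen_slope(obs)) == _sign(sen_slope(sim))
    return False
```

In use, `obs` and `sim` would be the annual series of one signature (for example annual maximum flow) derived from observed and simulated streamflow for the same years.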

Our results, which are based on streamflow simulations in 50 high-latitude catchments in Sweden, show that high model performance quantified in terms of traditional performance indicators does not necessarily imply that distributions or trends in series of hydrologic signatures are well reproduced, and vice-versa. Therefore, these two aspects of model performance are distinct and complementary, and they require separate evaluation analyses. Accurate reproduction of statistical properties of hydrologic signatures relevant for climate change impact studies is essential for improving the credibility of future flow projections. We, therefore, recommend that the traditional process of selecting hydrological models for the impact studies should be enhanced with assessments of model ability to reproduce distributions and trends in the hydrologic signatures.

How to cite: Todorović, A., Grabs, T., and Teutschbein, C.: Assessment of Suitability of Hydrological Models for Climate Change Impact Studies, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-11463, https://doi.org/10.5194/egusphere-egu22-11463, 2022.

08:56–08:58
08:58–09:03
|
EGU22-3905
|
ECS
|
On-site presentation
|
Clara Hohmann, Christina Maus, Dörte Ziegler, Sameh Kantoush, and Qasem Abdelal

Severe flash floods have hit Jordan in recent years, e.g., in 2018 and 2020, leading to fatalities and infrastructure damage. Moreover, even though Jordan is one of the most water-scarce countries in the world, extreme rainfall events might occur more frequently under climate change (IPCC Sixth Assessment Report 2021), causing flash floods in wadi systems. In addition, population growth combined with construction and surface sealing in cities increases the risk of damage, and authorities are under pressure to provide solutions for disaster risk reduction. Few flash flood models have been adopted and developed for wadi systems. Here the scientific community might help by providing tools to better understand, assess, and predict such events and to introduce possible adaptation strategies.

The BMBF-funded German-Jordanian project “CapTain Rain” studies flash flood risks with a transdisciplinary approach, interacting with local stakeholders. Jordan receives annual precipitation of around 110 mm overall, and hydrological data are scarce, discontinuous, and of varying quality. Hence, flash flood modelling approaches and available software developed for humid regions in industrialized countries of the northern hemisphere cannot be easily transferred. Therefore, we want to review the variety of model options for flash flood modelling in arid and humid areas and give an overview of the selection process.

Model selection is often based on aspects such as the application of interest, data requirements and availability, model complexity, code availability and open-source options, user knowledge, and modeling group experience. On the one hand, Beven and Young (2013) stress that models should not be more complex than necessary and should be fit for purpose. On the other hand, Addor and Melsen (2019) saw a strong social component: they report that hydrological model selection is more strongly influenced by legacy than by adequacy. Horton et al. (2021) reviewed hydrological model applications in Switzerland. They note that not all aspects of model selection, in particular social elements, are mentioned in published articles. In addition, their author survey shows that modeling group experience is a crucial factor in model selection, and that most models used have a strong basis in the country.

When focusing on Jordan or other dry and data-scarce regions worldwide, further aspects need to be considered. For example, the modelling knowledge of users might be limited, validation and calibration data are scarce, and financial resources for software are restricted. Therefore, we see an urgent need to analyze the aspects of model selection for flash floods in wadi systems in a scientific context and to give stakeholders a fact-based overview of possible model options.

 

Literature:

Addor, N.; Melsen, L.A. (2019): Legacy, Rather Than Adequacy, Drives the Selection of Hydrological Models. WRR. 55, 378–390

Beven, K.; Young, P. (2013): A guide to good practice in modeling semantics for authors and referees. WRR. 49, 5092–5098

Horton, P.; Schaefli, B.; Kauzlaric, M. (2021): Why do we have so many different hydrological models? A review based on the case of Switzerland. Wiley Interdiscip. Rev.-Water, e1574

How to cite: Hohmann, C., Maus, C., Ziegler, D., Kantoush, S., and Abdelal, Q.: Selection of flash flood models in data-scarce regions like Jordan, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-3905, https://doi.org/10.5194/egusphere-egu22-3905, 2022.

09:03–09:05
09:05–09:10
|
EGU22-6363
|
On-site presentation
|
Pascal Horton, Bettina Schaefli, and Martina Kauzlaric

Hydrological models are fundamental tools that play a key role in many areas of hydrological science and climate change impact studies. However, it is well known that the number of models has increased beyond what is necessary. One of the key drivers for model diversity in hydrology is the wide range of model applications, motivated by specific needs and contexts that require suitable models. Yet, a significant part of this diversity is not driven by the context, as different models are applied under analogous circumstances.

To better understand the main drivers of model diversity, a review of hydrological modelling habits was conducted on studies carried out in Switzerland. Despite being a small country, Switzerland has a variety of hydro-climatological regimes, water resource management challenges, and hydrological research institutes, and can thus be representative of other regions. A first observation was that the motivations for selecting a model are rarely stated in scientific articles, and the adequacy of the model for the context or landscape is often not addressed. Thus, a survey was conducted to evaluate some subjective aspects that are otherwise difficult to retrieve from the scientific literature.

Not surprisingly, researchers are very keen on using a model developed at their own institute, which provides the benefit of expertise and efficiency, but at increased risk of context inadequacy and automatism in decisions. Other aspects were considered relevant in the model selection process, such as – indeed – adequacy, access to the code, reuse of existing model setups, collaborations, technical constraints or data availability.

Several hydrological models exist in Switzerland, while the vast majority of the studies were conducted using a single model. To some extent, model diversity is desirable to assess model variability, but multi-model applications to harness this diversity are largely missing. The survey highlighted that most researchers consider multi-model approaches important, but most do not apply them for various practical reasons, such as lack of resources (time and/or money) or lack of expertise in another model. We believe that some barriers can be lowered to facilitate multi-model approaches, requiring efforts from the modelling community and the funding agencies.

How to cite: Horton, P., Schaefli, B., and Kauzlaric, M.: Drivers of hydrological model diversity and model selection factors - The example of Switzerland., EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-6363, https://doi.org/10.5194/egusphere-egu22-6363, 2022.

09:10–09:12
09:12–09:17
|
EGU22-1167
|
ECS
|
On-site presentation
|
Lieke Melsen

Computer models are frequently used tools in hydrological research. Many decisions related to the model set-up and configuration have to be made before a model can be run, and these decisions might influence the model results. This study is an empirical investigation of the motivations for certain modeling decisions. Fourteen modelers from three different institutes were interviewed about their modeling decisions. In total, 83 different motivations were identified. Most motivations were related to the team of the modeler and the modelers themselves; 'Experience from colleagues' was the most frequently mentioned motivation. Institutionalization and internalization were observed: a modeler can introduce a concept that subsequently becomes the team's standard, or a modeler can internalize the default team approach. These processes depend on the experience of the modeler. For model selection, two types of motivations were identified: experience (from colleagues or the modelers themselves), and model vision (the model has assets that align with the modeling vision). Model studies are mainly driven by context, such as time constraints, colleagues, and facilities at the institute, rather than by epistemic considerations (such as aligning with the modeling vision). The role of local context in the construction of and the value assigned to models shows that models are social constructs, making model results time and place dependent. To account for this context in the estimation of the robustness of model results, we need diversity of opinions, perspectives, and approaches. This requires transparent modeling procedures and an explicit modeling vision for each model study.

How to cite: Melsen, L.: It takes a village to run a model, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-1167, https://doi.org/10.5194/egusphere-egu22-1167, 2022.

09:17–09:19
09:19–09:24
|
EGU22-4802
|
Virtual presentation
Fabrizio Fenicia and Dmitri Kavetski

Stringent modelling methods and diagnostic techniques for improving the credibility of model predictions have received a lot of attention in the hydrological literature. However, previous discussions have revolved mainly around theoretical aspects, and arguably lacked persuasive examples. In this work, in order to illustrate the weaknesses of widespread modelling practices, we instead provide an applied perspective. In particular, we present the case of a distributed rainfall-runoff model that evolves in response to progressively more stringent application of model diagnostics. Through this example we demonstrate the usefulness of the following methodological instruments: (i) benchmarking model results against a null-hypothesis model, (ii) testing model predictions in space-time validation, and (iii) carrying out controlled model comparisons. These instruments, arguably still underutilized in the hydrological community, offer important diagnostic capabilities to increase the rigor of hydrological and environmental model applications. Therefore, their more widespread application is encouraged.
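Instrument (i) can be illustrated with a generic skill score that measures a model against a benchmark series instead of against zero error. The abstract does not say which null-hypothesis model is used, so the calendar-month climatology below is an assumed stand-in, and both function names are hypothetical.

```python
from collections import defaultdict

def monthly_climatology_benchmark(months, obs):
    """Benchmark prediction: mean observed flow of each calendar month.

    Assumption: in a real study the climatology would come from an
    independent (e.g. calibration) period, not from the evaluated data.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for m, q in zip(months, obs):
        totals[m] += q
        counts[m] += 1
    return [totals[m] / counts[m] for m in months]

def skill_vs_benchmark(obs, sim, bench):
    """1 = perfect, 0 = no better than the benchmark, < 0 = worse than it."""
    sse_model = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sse_bench = sum((o - b) ** 2 for o, b in zip(obs, bench))
    return 1.0 - sse_model / sse_bench

# Tiny made-up example: two months, two observations each.
months = [1, 1, 2, 2]
obs = [1.0, 3.0, 10.0, 14.0]
bench = monthly_climatology_benchmark(months, obs)
score = skill_vs_benchmark(obs, [1.0, 3.0, 10.0, 12.0], bench)
```

With the overall mean flow as the benchmark this score reduces to the Nash-Sutcliffe efficiency; a seasonal benchmark is harder to beat in strongly seasonal catchments, which is why the choice of null hypothesis matters.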

How to cite: Fenicia, F. and Kavetski, D.: Behind every robust result is a robust method: Perspectives from a hydrological case study, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-4802, https://doi.org/10.5194/egusphere-egu22-4802, 2022.

09:24–09:26
09:26–09:31
|
EGU22-12403
|
Virtual presentation
Daniel Klotz, Martin Gauch, Grey Nearing, Sepp Hochreiter, and Frederik Kratzert

The goal of this contribution is to demonstrate deficiencies that we observe in hydrological modelling studies. Our hope is that awareness of potential mistakes, errors, and habits will support accurate communication and analysis, and consequently lead to better modelling practices in our community.

By deficiencies, we broadly mean wrong assumptions, false conclusions, and artificial limitations that impair our modelling efforts. To give some explicit examples:

  • Model calibration: Often, only two data splits are used: one for model calibration and one for model validation. To provide a robust estimate of model quality on unseen data, one should, however, involve a three-way split: a calibration set used for parameter adaptation, a validation set used for hyperparameter tuning and intermediate model evaluations, and a test set used only once to derive the final, independent model accuracy.
  • Artificial restrictions: Studies often restrict modelling setups to specific settings (e.g., model classes, input data, or objective functions) for comparative reasons. In general, one should use the best available data, inputs, and objective functions for each model, irrespective of the diagnostic metric used for evaluation and irrespective of what other models are (able to) use.
  • (Missing) Model rejection: Although benchmarking efforts are not an entirely new concept in our community, we do observe that the results of model comparisons are seemingly without consequences. Models that repeatedly underperform on a specific task continue to be used for the same task they were just proven not to be good for. At some point, these models should be rejected and we as a community should move forward to improve the other models or develop new models.
  • Interpretation of intermediate states: Many hydrologic models attempt to represent a variety of internal physical states that are not calibrated (e.g., soil moisture). Unfortunately, these states are often mistaken for true measurements and used as ground truth in downstream studies. We believe that, unless the quality of these states has been evaluated successfully, using intermediate model outputs is risky, as it may distort subsequent analyses.
  • Noise: Although it is commonly accepted that hydrological input variables are subject to large uncertainties and imprecisions, the influence of input perturbations is often not explicitly accounted for in models.
  • Model complexity: We aim to model one of the most complex systems that exists, our nature. In practice, we will only be able to obtain a simplified representation of the system. However, we should not reduce complexity for the wrong reasons. While there is a tradeoff between simplicity and complexity, we should not tend towards the simplest models, such as two- or three-bucket models.
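The three-way split from the first bullet can be sketched as follows. This is a minimal illustration with assumed choices: the record is split chronologically with the oldest data used for calibration, the 60/20/20 proportions are arbitrary, and `three_way_split` is a hypothetical helper name.

```python
def three_way_split(records, cal_frac=0.6, val_frac=0.2):
    """Split a time-ordered record into calibration, validation and test parts.

    Assumption: chronological order, oldest data first; the remainder after
    calibration and validation becomes the test set.
    """
    if not (0 < cal_frac and 0 < val_frac and cal_frac + val_frac < 1):
        raise ValueError("fractions must be positive and leave room for a test set")
    n = len(records)
    n_cal = round(n * cal_frac)
    n_val = round(n * val_frac)
    if n_cal == 0 or n_val == 0 or n_cal + n_val >= n:
        raise ValueError("record too short for a three-way split")
    calibration = records[:n_cal]
    validation = records[n_cal:n_cal + n_val]
    test = records[n_cal + n_val:]
    return calibration, validation, test

# Ten hypothetical years of data, identified by index only.
cal, val, test = three_way_split(list(range(10)))
```

Parameters would be fitted on `cal`, choices such as the objective function or model structure compared on `val`, and the final accuracy reported once on `test`.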

Our belief is that modelling should be a community-wide effort, involving benchmarking, probing, model building, and examination. Being aware of deficiencies will hopefully bring forth a culture that adheres to best practices, rigorous testing, and probing for errors, ultimately benefiting us all by leading to more performant and reliable models.

How to cite: Klotz, D., Gauch, M., Nearing, G., Hochreiter, S., and Kratzert, F.: Deficiencies in Hydrological Modelling Practices, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-12403, https://doi.org/10.5194/egusphere-egu22-12403, 2022.

09:31–09:33
09:33–09:38
|
EGU22-9464
|
Virtual presentation
Joseph Guillaume

It has been said that culture eats strategy for breakfast. The effect of legacy over adequacy in modelling practice exemplifies the difficulty in changing behaviours to improve modelling outcomes. Ideally, good modelling practice would be incentivised by the systems in which modellers operate, and modelling practice would moreover have a learning orientation, gradually improving over time and seeking an ever closer alignment with organisational and societal needs.

Digital twins institutionalised within organisational operations provide a possible opportunity to incentivise these behaviours. A digital twin is a time-varying representation of a system that brings together observed information and predictive model capabilities. Juxtaposing model predictions with other sources of information forces models to demonstrate their value, in continually changing conditions. Operational use of a digital twin means that models need to be fit for purpose. The need to prioritise investment across a digital twin means that the model suite needs to address a broad range of purposes and model augmentation is more likely to be driven by consideration of value of information and prioritisation of efforts to reduce uncertainty over time.

These theoretical benefits are explored with example use cases in the context of cross-scale catchment water resource, landscape, and irrigation management, drawing on preliminary experiments in Australia.

How to cite: Guillaume, J.: Can digital twins incentivise good modelling practice?, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-9464, https://doi.org/10.5194/egusphere-egu22-9464, 2022.

09:38–09:40
09:40–09:45
|
EGU22-10846
|
ECS
|
Virtual presentation
|
Hongren Shen, Bryan Tolson, and Juliane Mai

Model calibration and validation are critical in hydrological model robustness assessment. Unfortunately, the commonly used split-sample test (SST) framework for data splitting requires modelers to make subjective decisions without clear guidelines.

A massive SST experiment for hydrological modeling is proposed and tested across a large sample of catchments to empirically reveal how data availability and calibration period features (i.e., length and recentness) simultaneously impact model performance in the post-validation period (e.g., forecasting or prediction), thus providing practical guidance on split-sample design. Unlike most SST studies that use two sub-periods (i.e., calibration and validation) to build models, this study incorporates an independent model testing period in addition to calibration and validation periods. Two lumped conceptual hydrological models (GR4J and HMETS) are calibrated and tested in 463 CAMELS catchments across the United States using 50 different data splitting schemes. These schemes are established regarding the data availability, length, and data recentness of the continuous calibration sub-periods (CSPs). A full-period CSP is also included in the experiment, which skips model validation entirely. The results are synthesized regarding the large sample of catchments and are comparatively assessed in multiple novel ways, including framing model building decisions as a decision tree problem and viewing the model validation process as a formal testing period classification problem, aiming to accurately predict model success/failure in the testing period.

Results span different climate and catchment conditions across a 35-year period with available data, making conclusions generalizable. Strong patterns show that calibrating to older data and then validating models on newer data produces inferior model testing period performance in every single analysis conducted and should hence be avoided. Calibrating to the full available data and skipping model validation entirely is the most robust split-sample decision. Findings have significant implications for SST practice in hydrological modeling. As the next phase of this study, results for discontinuous calibration sub-periods (DCSP) will be evaluated as an alternative SST design choice and then contrasted with the CSP results.

How to cite: Shen, H., Tolson, B., and Mai, J.: Time to Update the Split Sample Approach to Hydrological Model Calibration: A Massive Empirical Study, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10846, https://doi.org/10.5194/egusphere-egu22-10846, 2022.

09:45–09:47
09:47–09:52
|
EGU22-8211
|
ECS
|
On-site presentation
Sina Khatami, Giuliano Di Baldassarre, Hoshin Gupta, Enayat A Moallemi, and Sandra Pool

A long-standing research issue, in the hydrological sciences and beyond, is that of developing methods to evaluate the predictive/forecasting skill, errors and uncertainties of a model (or model ensembles). Various methods have been proposed for characterising time series residuals, i.e. the differences between observed (or target) and modelled (or estimate) time series. Most notably, the Taylor Diagram summarises model performance via a single plot based on three related metrics: the (linear Pearson) correlation, standard deviation, and root mean squared differences of one or multiple pairs of target and estimate time series. Despite its theoretical elegance and widespread use, the Taylor diagram does not account for bias, which is an important summary statistic for evaluating model performance. Further, it is very common to evaluate, compare, and report on model “skill” by use of a single aggregate metric value, even when a vector of metrics is used to calibrate/train the model; most commonly this is a dimensionless efficiency metric such as Nash-Sutcliffe Efficiency (NSE) or Kling-Gupta Efficiency (KGE). Such “efficiency” metrics typically aggregate over multiple types of residual behaviours: for example, the most commonly used version of KGE is based on correlation, bias, and variability errors, although the authors recommended that it should be applied in a context-dependent fashion based on which model behaviours are deemed to be important to a given situation. Nevertheless, the use of a single summary value fails to account for the interactions among the error component terms, which can be quite informative for the evaluation and benchmarking of models. In this study, we propose a new diagram that is as easy to use and interpret as the Taylor Diagram, while also accounting for bias. We further suggest a new convention for reporting model skill that is based on foundational error terms.
Our vision is that this new diagram and convention will enable researchers and practitioners to better interpret and report model performance. We provide multiple numerical examples to illustrate how this approach can be used for evaluating performance in the context of multi-model and multi-catchment (large-sample) studies.
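The error terms named above can be computed side by side, which makes the bias blind spot of the Taylor Diagram visible. The sketch below is not the proposed diagram or convention, which the abstract does not specify; it uses the textbook definitions of the Taylor statistics, NSE, and the 2009 form of KGE, and the function names are illustrative.

```python
import math
from statistics import fmean, pstdev

def error_components(obs, sim):
    """Correlation, standard deviations, bias, centred RMSD and total RMSD."""
    mu_o, mu_s = fmean(obs), fmean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    cov = fmean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    crmsd = math.sqrt(fmean([((s - mu_s) - (o - mu_o)) ** 2 for o, s in zip(obs, sim)]))
    rmsd = math.sqrt(fmean([(s - o) ** 2 for o, s in zip(obs, sim)]))
    return {"r": cov / (sd_o * sd_s), "sd_obs": sd_o, "sd_sim": sd_s,
            "bias": mu_s - mu_o, "crmsd": crmsd, "rmsd": rmsd}

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency."""
    mu_o = fmean(obs)
    return 1.0 - (sum((o - s) ** 2 for o, s in zip(obs, sim))
                  / sum((o - mu_o) ** 2 for o in obs))

def kge(obs, sim):
    """Kling-Gupta Efficiency (2009 form): correlation, variability ratio, bias ratio."""
    c = error_components(obs, sim)
    alpha = c["sd_sim"] / c["sd_obs"]
    beta = fmean(sim) / fmean(obs)
    return 1.0 - math.sqrt((c["r"] - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# A simulation that is perfect except for a constant offset of +2.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [o + 2.0 for o in obs]
components = error_components(obs, sim)
```

For this offset-only example the three Taylor statistics are indistinguishable from a perfect simulation (correlation of one, equal standard deviations, zero centred RMSD), while the bias, NSE and KGE all change. The total RMSD satisfies rmsd² = bias² + crmsd², and the centred RMSD satisfies the law-of-cosines relation crmsd² = sd_obs² + sd_sim² − 2·sd_obs·sd_sim·r on which the Taylor Diagram is built.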

How to cite: Khatami, S., Di Baldassarre, G., Gupta, H., Moallemi, E. A., and Pool, S.: Suggesting a new diagram and convention for characterising and reporting model performance, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8211, https://doi.org/10.5194/egusphere-egu22-8211, 2022.

09:52–09:54
09:54–09:59
|
EGU22-8396
|
ECS
|
On-site presentation
|
Martin Gauch, Frederik Kratzert, Juliane Mai, Bryan Tolson, Grey Nearing, Hoshin Gupta, Sepp Hochreiter, and Daniel Klotz

As hydrologists, we pride ourselves on being able to identify deficiencies of a hydrologic model by looking at its runoff simulations. Generally, one of the first questions that a practicing hydrologist always asks when presented with a new model is: "show me some hydrographs!". Everyone has an intuition about how a "real" (i.e., observed) hydrograph should behave [1, 2]. Although there exists a large suite of summary metrics that measure differences between simulated and observed hydrographs, those metrics do not always fully account for our professional intuition about what constitutes an adequate hydrological prediction (perhaps because metrics typically aggregate over many aspects of model performance). To us, this suggests that either (a) there is potential to improve existing metrics to conform better with expert intuition, or (b) our expert intuition is overvalued and we should focus more on metrics, or (c) a bit of both.

In the social study proposed here, we aim to address this issue in a data-driven fashion: We will ask experts to access a website where they are tasked to compare two unlabeled hydrographs (at the same time) against an observed hydrograph, and to decide which of the unlabeled ones they think matches the observations better. Together with information about the experts’ background expertise, the collected responses should help paint a more nuanced picture of the aspects of hydrograph behavior that different members of the community consider important. This should provide valuable information that may enable us to derive new (and hopefully better) model performance metrics in a data-driven fashion directly from human ratings.

 

[1] Crochemore, Louise, et al. "Comparing expert judgement and numerical criteria for hydrograph evaluation." Hydrological sciences journal 60.3 (2015): 402-423.

[2] Wesemann, Johannes, et al. "Man vs. Machine: An interactive poll to evaluate hydrological model performance of a manual and an automatic calibration." EGU General Assembly Conference Abstracts. 2017.

How to cite: Gauch, M., Kratzert, F., Mai, J., Tolson, B., Nearing, G., Gupta, H., Hochreiter, S., and Klotz, D.: Rate my Hydrograph: Evaluating the Conformity of Expert Judgment and Quantitative Metrics, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-8396, https://doi.org/10.5194/egusphere-egu22-8396, 2022.

09:59–10:00