EGU2020-10883
https://doi.org/10.5194/egusphere-egu2020-10883
EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Assessment of Predictive Uncertainty of Data-Driven Environmental Models

Benedikt Knüsel1,2, Christoph Baumberger1, Marius Zumwald1,2, David N. Bresch1,3, and Reto Knutti2
  • 1ETH Zürich, Institute for Environmental Decisions, Department of Environmental Systems Science, Switzerland
  • 2ETH Zürich, Institute for Atmospheric and Climate Science, Department of Environmental Systems Science, Switzerland
  • 3Federal Office of Meteorology and Climatology MeteoSwiss, Switzerland

Due to ever larger volumes of environmental data, environmental scientists can increasingly use machine learning to construct data-driven models of phenomena. Data-driven environmental models can provide useful information to society, but this requires that their uncertainties be understood. However, new conceptual tools are needed for this because existing approaches to assess the uncertainty of environmental models do so in terms of specific locations, such as model structure and parameter values. These locations are not informative for an assessment of the predictive uncertainty of data-driven models. Rather than the model structure or model parameters, we argue that it is the behavior of a data-driven model that should be subject to an assessment of uncertainty.

In this paper, we present a novel framework that can be used to assess the uncertainty of data-driven environmental models. The framework uses argument analysis and focuses on epistemic uncertainty, i.e., uncertainty that is related to a lack of knowledge. It proceeds in three steps. The first step consists in reconstructing the justification of the assumption that the model used is fit for the predictive task at hand. Arguments for this justification may, for example, refer to sensitivity analyses and model performance on a validation dataset. In a second step, this justification is evaluated to identify how conclusively the fitness-for-purpose assumption is justified. In a third step, the epistemic uncertainty is assessed based on the evaluation of the arguments. Epistemic uncertainty emerges due to insufficient justification of the fitness-for-purpose assumption, i.e., if the model is less-than-maximally fit-for-purpose. This lack of justification translates to predictive uncertainty, or first-order uncertainty. Uncertainty also emerges if it is unclear how well the fitness-for-purpose assumption is justified. We refer to this uncertainty as “second-order uncertainty”. In other words, second-order uncertainty is uncertainty that researchers face when assessing first-order uncertainty.
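Purely as a toy illustration of the three steps above (not part of the authors' framework, which rests on qualitative argument analysis), one might represent the reconstructed arguments and derive rough first- and second-order uncertainty scores. All names, scales, and aggregation rules here are our own assumptions, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class Argument:
    """One reconstructed argument for the fitness-for-purpose assumption."""
    description: str
    strength: float      # assumed: how strongly it supports the assumption, in [0, 1]
    confidence: float    # assumed: how confidently that strength could be judged, in [0, 1]

def assess(arguments):
    """Toy mapping from evaluated arguments to (first_order, second_order) uncertainty.

    First-order uncertainty reflects insufficient justification;
    second-order uncertainty reflects how unclear the justification's quality is.
    The max/min aggregation is an arbitrary illustrative choice.
    """
    if not arguments:
        return 1.0, 1.0  # no justification at all
    justification = max(a.strength for a in arguments)
    assessability = min(a.confidence for a in arguments)
    return 1.0 - justification, 1.0 - assessability

args = [
    Argument("good performance on a held-out validation set", 0.7, 0.8),
    Argument("sensitivity analysis shows plausible responses", 0.5, 0.4),
]
first_order, second_order = assess(args)
```

In this sketch, a model justified only by arguments whose quality is hard to judge (low `confidence`) ends up with high second-order uncertainty even if the arguments themselves look strong.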

We illustrate how the framework is applied by discussing a case study from environmental science in which data-driven models are used to make long-term projections of soil selenium concentrations. We highlight that in many applications, the lack of system understanding and the lack of transparency of machine learning can introduce a substantial level of second-order uncertainty. We close by sketching how the framework can inform uncertainty quantification.

How to cite: Knüsel, B., Baumberger, C., Zumwald, M., Bresch, D. N., and Knutti, R.: Assessment of Predictive Uncertainty of Data-Driven Environmental Models, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10883, https://doi.org/10.5194/egusphere-egu2020-10883, 2020

Comments on the presentation

AC: Author Comment | CC: Community Comment

Presentation version 1 – uploaded on 28 Apr 2020
  • CC1: Comment on EGU2020-10883, Peter Düben, 06 May 2020

Did or do you want to check whether non-data-driven models are fit for purpose? ;-)

    • AC1: Reply to CC1, Benedikt Knüsel, 06 May 2020

Hi Peter. Thanks for your question. We certainly did not want to imply that process-based models automatically have a higher degree of fitness-for-purpose. My impression is, though, that people seem more confident in making that assessment for process-based models because it seems more obvious what aspects the evaluation should focus on.

      That being said, I do believe that this framework could be applied to process-based models, too.