EGU23-12528, updated on 10 Jan 2024
EGU General Assembly 2023
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Evaluation of explainable AI solutions in climate science

Philine Bommer1,2, Marlene Kretschmer3,4, Anna Hedstroem1,2, Dilyara Bareeva1, and Marina M.-C. Hoehne2
  • 1Technical University Berlin, Informatics, Machine Learning, Berlin, Germany
  • 2Leibniz Institute of Agricultural Engineering and Bio-economy
  • 3University of Reading
  • 4University of Leipzig

Explainable artificial intelligence (XAI) methods help researchers shed light on the reasons behind the predictions made by deep neural networks (DNNs). XAI methods have already been applied successfully in climate science, revealing physical mechanisms inherent in the studied data. However, evaluating and validating XAI performance is challenging, as explanation methods often lack a ground truth. As the number of XAI methods grows, a comprehensive evaluation is necessary to enable well-founded XAI application in climate science.

In this work we introduce explanation evaluation in the context of climate research. We apply XAI evaluation to compare multiple explanation methods for a multi-layer perceptron (MLP) and a convolutional neural network (CNN), both of which assign temperature maps to classes based on their decade. We assess the explanation methods using evaluation metrics that measure robustness, faithfulness, randomization, complexity and localization. Based on the results of a random-baseline test, we establish an explanation-evaluation guideline for the climate community and use it to rank the performance, in each property, of similar sets of explanation methods for the MLP and the CNN. Independent of the network type, we find that Integrated Gradients, Layer-wise Relevance Propagation and InputGradients exhibit higher robustness, faithfulness and complexity than purely gradient-based methods, at the cost of reduced sensitivity to network parameters, i.e. low randomization scores. The opposite holds for Gradient, SmoothGrad, NoiseGrad and FusionGrad. Another key observation is that explanations using input perturbations, such as SmoothGrad and Integrated Gradients, do not improve robustness and faithfulness, contrary to theoretical claims. Our experiments highlight that XAI evaluation can be applied to different network tasks and offers more detailed information about the properties of explanation methods than previous research. We demonstrate that XAI evaluation helps to tackle the challenge of choosing an explanation method.
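To illustrate the kind of comparison described above, the following is a minimal sketch, not the authors' actual pipeline: it uses a toy one-hidden-layer network with random weights, and the helper names (`gradient`, `integrated_gradients`, `max_sensitivity`) are hypothetical. It computes a plain Gradient explanation, an Integrated Gradients attribution, and a simple robustness score (max-sensitivity under small random input perturbations), one of the metric families mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the MLP: one hidden tanh layer, scalar output (a class logit).
W1 = rng.normal(size=(8, 4)); b1 = rng.normal(size=8)
W2 = rng.normal(size=(1, 8)); b2 = rng.normal(size=1)

def forward(x):
    """Scalar network output for input x of shape (4,)."""
    return (W2 @ np.tanh(W1 @ x + b1) + b2)[0]

def gradient(x):
    """Plain Gradient explanation: analytic d(output)/d(input)."""
    h = np.tanh(W1 @ x + b1)
    return ((W2 * (1.0 - h**2)) @ W1).ravel()

def integrated_gradients(x, baseline=None, steps=64):
    """Integrated Gradients: average gradient along the straight path
    from a baseline to x, scaled by (x - baseline)."""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean(
        [gradient(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

def max_sensitivity(explain, x, eps=0.05, n=20):
    """Robustness metric: largest change in the explanation under
    small uniform input perturbations (lower = more robust)."""
    e0 = explain(x)
    return max(
        np.linalg.norm(explain(x + rng.uniform(-eps, eps, x.shape)) - e0)
        for _ in range(n)
    )

x = rng.normal(size=4)
print("Gradient attribution:      ", gradient(x))
print("Integrated Gradients:      ", integrated_gradients(x))
print("Max-sensitivity (Gradient):", max_sensitivity(gradient, x))
```

Integrated Gradients approximately satisfies the completeness property, i.e. its attributions sum to `forward(x) - forward(baseline)`, which gives a quick sanity check on the implementation.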

How to cite: Bommer, P., Kretschmer, M., Hedstroem, A., Bareeva, D., and Hoehne, M. M.-C.: Evaluation of explainable AI solutions in climate science, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12528, 2023.

Supplementary materials
