EGU23-8479
https://doi.org/10.5194/egusphere-egu23-8479
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Model evaluation strategy impacts the interpretation and performance of machine learning models

Lily-belle Sweet1, Christoph Müller2, Mohit Anand1, and Jakob Zscheischler1
  • 1Helmholtz Centre for Environmental Research - UFZ, Department of Computational Hydrosystems, Leipzig, Germany (lily-belle.sweet@ufz.de)
  • 2Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany

Machine learning models are able to capture highly complex, nonlinear relationships, and have been used in recent years to accurately predict crop yields at regional and national scales. This success suggests that the use of 'interpretable' or 'explainable' machine learning (XAI) methods may facilitate improved scientific understanding of the compounding interactions between climate, crop physiology and yields. However, studies have identified implausible, contradictory or ambiguous results from the use of these methods. At the same time, researchers in fields such as ecology and remote sensing have called attention to issues with robust model evaluation on spatiotemporal datasets. This suggests that XAI methods may produce misleading results when applied to spatiotemporal datasets, but the impact of model evaluation strategy on the results of such methods has not yet been examined.

In this study, machine learning models are trained to predict simulated crop yield, and the impact of model evaluation strategy on the interpretation and performance of the resulting models is assessed. Using data from a process-based crop model allows us to assess the plausibility of the explanations provided by common XAI methods. Our results show that the choice of evaluation strategy has an impact on (i) the interpretations of the model using common XAI methods such as permutation feature importance and (ii) the resulting model skill on unseen years and regions. We find that use of a novel cross-validation strategy based on clustering in feature space results in the most plausible interpretations. Additionally, we find that the use of this strategy during hyperparameter tuning and feature selection results in improved model performance on unseen years and regions. Our results provide a first step towards the establishment of best practices for model evaluation strategy in similar future studies.
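The general idea of feature-space-clustered cross-validation combined with permutation feature importance can be illustrated with a short sketch. Note that the data, model choice, and all parameters below are illustrative placeholders and not those of the study; the abstract does not specify the exact implementation.

```python
# Sketch: cross-validation with folds defined by clustering samples in
# feature space, so each held-out fold covers a distinct region of the
# feature space, followed by permutation feature importance on held-out data.
# Synthetic data and hyperparameters are hypothetical, for illustration only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features = 500, 4
X = rng.normal(size=(n_samples, n_features))            # stand-in climate predictors
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=n_samples)  # nonlinear stand-in "yield"

# Cluster labels serve as CV groups: samples in the same feature-space
# cluster are never split between training and test folds.
groups = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=groups)
print("clustered-CV R^2 per fold:", np.round(scores, 2))

# Permutation feature importance evaluated on one held-out cluster.
held_out = groups == 0
model.fit(X[~held_out], y[~held_out])
result = permutation_importance(model, X[held_out], y[held_out],
                                n_repeats=10, random_state=0)
print("permutation importances:", np.round(result.importances_mean, 2))
```

Because each test fold lies in a region of feature space unseen during training, the resulting skill scores and importances probe extrapolation rather than interpolation, which is the contrast at the heart of the evaluation-strategy comparison.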

How to cite: Sweet, L., Müller, C., Anand, M., and Zscheischler, J.: Model evaluation strategy impacts the interpretation and performance of machine learning models, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-8479, https://doi.org/10.5194/egusphere-egu23-8479, 2023.

Supplementary materials
