- 1Max Planck Institute for Biogeochemistry, Biogeochemical Integration, Germany
- 2Institute for Atmospheric and Climate Science (IAC), ETH, Zurich, Switzerland
- 3ELLIS Unit Jena, Jena, Germany
Machine learning (ML), and deep learning (DL) in particular, holds the potential to solve long-standing challenges in understanding and modeling the Earth system. Earth system model (ESM) developers, however, are reluctant to adopt DL algorithms because they are considered opaque: it is unclear how these models extrapolate to unseen conditions, e.g., under a changing climate. Nevertheless, ML is often used to extrapolate into the future, which can lead to misleading results.
We demonstrate these limitations and the dangers of naive extrapolation by using a set of deep neural networks to emulate simulated data of gross primary production (GPP). We use a process-based model (PBM) that simulates photosynthetic CO2 uptake as a product of radiation (PAR), stress from daily meteorology (fTmin, fVPD, fSM), vegetation state (fPAR), and CO2 (εmax(CO2)). It is given by

GPP = εmax(CO2) · PAR · fPAR · fTmin · fVPD · fSM + ε.    (1)
The PBM poses many of the typical challenges of using ML in Earth system science. It includes stochastic noise (ε), is capable of exhibiting multi-year memory, and its predictors are highly correlated on multiple time scales. Further, the model exhibits interesting extrapolation behavior: some of the factors (fTmin, fVPD, fSM, fPAR) saturate under extreme meteorological conditions, while others (PAR, εmax) do not. We drive the PBM with predictors obtained from historical and future climate simulations of a comprehensive Earth system model. The training dataset contains the predictors and the PBM's predictions for various locations in a similar climate zone but on different continents, for the historical time frame (1850–present), together with a spurious predictor, namely surface wind speed. To obtain a set of independent models, each co-author separately implements a custom architecture without knowing which predictor is which. This results in four different models: a linear model, a multi-layer perceptron, a long short-term memory (LSTM) network, and an attention-based model.
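The structure of Eq. (1) can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the function names, the piecewise-linear form of the saturating stress factors, and the noise model are all assumptions chosen to mirror the multiplicative, saturating behavior described above.

```python
import numpy as np

def stress_ramp(x, x0, x1):
    """Hypothetical saturating stress factor (for fTmin, fVPD, fSM):
    0 below x0, 1 above x1, linear in between."""
    return np.clip((x - x0) / (x1 - x0), 0.0, 1.0)

def gpp(par, fpar, f_tmin, f_vpd, f_sm, eps_max, noise_std=0.0, rng=None):
    """Toy version of Eq. (1): GPP as a product of light, vegetation
    state, and stress factors in [0, 1], plus additive noise ε."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, noise_std, size=np.shape(par))
    return eps_max * par * fpar * f_tmin * f_vpd * f_sm + noise
```

The ramp makes the extrapolation issue concrete: beyond x1 the true stress factor is flat, so a network that has only seen the linear regime during training has no signal from which to learn the saturation.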
We find that all models show strong predictive performance in cross-validation (normalized Nash–Sutcliffe efficiency, NNSE > 0.9) and decent performance when extrapolating to sites on different continents (NNSE > 0.7), but three of the four models show virtually no skill when extrapolating to a changed climate (NNSE < 0.6). Additionally, when values at which some factors saturate are excluded, most models produce gradients of the same order of magnitude as the PBM, indicating that the networks did not learn the saturation behavior from the data. Further, the model that extrapolates best is the LSTM, an architecture with a built-in maximum output that therefore has to saturate.
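For readers unfamiliar with the skill score used above, the NNSE can be computed as follows. This is a minimal sketch assuming the common normalization NNSE = 1 / (2 − NSE), which maps the Nash–Sutcliffe efficiency from (−∞, 1] onto (0, 1]; the function name is ours.

```python
import numpy as np

def nnse(obs, sim):
    """Normalized Nash–Sutcliffe efficiency.
    NSE = 1 - SSE / variance-sum of obs; NNSE = 1 / (2 - NSE),
    so a perfect model scores 1 and the obs-mean baseline scores 0.5."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    nse = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
    return 1.0 / (2.0 - nse)
```

Under this normalization, the reported NNSE < 0.6 for climate extrapolation sits close to the 0.5 score of simply predicting the historical mean.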
In conclusion, strong spatial generalization and cross-validation performance do not guarantee decent extrapolation for neural networks, even in relatively simple, stable systems. These findings highlight the importance of selecting architectures in line with the expected extrapolation behavior when predicting Earth system processes under climate change.
How to cite: Reimers, C., ElGhawi, R., Kraft, B., and Winkler, A. J.: Limitations of Machine Learning Models in Extrapolating to a Changing Climate, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-16360, https://doi.org/10.5194/egusphere-egu25-16360, 2025.