- 1 Oxford Robotics Institute, University of Oxford, Oxford, United Kingdom
- 2 Atmospheric, Oceanic and Planetary Physics, University of Oxford, United Kingdom
Data-driven parameterisations of sub-grid processes could allow Earth system models to surpass their current computational constraints. However, machine learning (ML) can be brittle. State-of-the-art ML approaches perform reliably on in-distribution data, exceeding human ability across a diverse range of tasks. Yet when the data distribution shifts, performance degrades. In climate modelling, where the task is to predict the state of a non-stationary system, this is a substantial issue. We illustrate this with the ClimSim dataset by forming spatio-temporal groups and quantifying how even small shifts in distribution affect performance.
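As a rough illustration of the group-based evaluation described above, the following sketch trains a simple emulator on some spatio-temporal groups and scores it on held-out in-distribution samples and on unseen groups. Everything in it is an assumption made for illustration: the data are synthetic rather than ClimSim, the grouping by latitude band and half-year is hypothetical, and a linear least-squares fit stands in for an ML parameterisation.

```python
import numpy as np

def split_by_group(groups, train_groups):
    """Return boolean masks for in-domain and shifted-domain samples."""
    in_domain = np.isin(groups, train_groups)
    return in_domain, ~in_domain

def fit_linear(X, y):
    """Ordinary least squares with a bias term (stand-in emulator)."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def rmse(w, X, y):
    """Root-mean-square error of the linear emulator on (X, y)."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(np.sqrt(np.mean((Xb @ w - y) ** 2)))

rng = np.random.default_rng(0)
n = 3000
lat = rng.uniform(-90, 90, n)      # synthetic latitude in degrees
month = rng.integers(1, 13, n)     # synthetic month, 1..12

# Hypothetical grouping: latitude band crossed with half of the year.
band = (np.abs(lat) > 30).astype(int)    # 0 = tropics, 1 = extratropics
season = (month > 6).astype(int)         # 0 = Jan-Jun, 1 = Jul-Dec
groups = band * 2 + season               # group ids 0..3

# Synthetic inputs whose distribution shifts with latitude band,
# and a nonlinear synthetic target.
X = rng.normal(size=(n, 4)) + band[:, None]
y = np.tanh(X[:, 0]) + 0.1 * X[:, 1] ** 2 + 0.05 * rng.normal(size=n)

# Train only on the tropical groups (0 and 1); groups 2 and 3 are "shifted".
in_mask, shift_mask = split_by_group(groups, train_groups=[0, 1])
in_idx = np.flatnonzero(in_mask)
rng.shuffle(in_idx)
cut = int(0.8 * len(in_idx))
train_idx, val_idx = in_idx[:cut], in_idx[cut:]

w = fit_linear(X[train_idx], y[train_idx])
rmse_in = rmse(w, X[val_idx], y[val_idx])
rmse_shift = rmse(w, X[shift_mask], y[shift_mask])
print(f"in-distribution RMSE: {rmse_in:.3f}")
print(f"shifted-group RMSE:   {rmse_shift:.3f}")
```

Comparing the two scores gives a per-split measure of sensitivity to distribution shift; repeating the split over different choices of training groups would show how that sensitivity varies with the size of the shift.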
Next, we use the theory of compositional generalisation (CG) to build models that are less susceptible to these shifts in distribution. Compositional generalisation is the formation of novel combinations of observed elementary components: the ability to decompose data into building blocks that are reused across both in-domain and shifted-domain data, such that a model can capture a domain-shifted state through a set of learnt, in-domain abstractions. Inspired by these concepts, we propose various architectural and regularisation changes to standard ML parameterisations to improve generalisation. Preliminary results with sub-grid process emulators offer new insights into whether and how CG can reduce model sensitivity to domain shifts.
How to cite: Stanley-Clamp, B., Posner, I., and Christensen, H.: Beyond In-Distribution Skill: Towards Robust ML Parameterisations for Non-Stationary Climate Systems, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-20353, https://doi.org/10.5194/egusphere-egu26-20353, 2026.