- ¹Research Institute for Statistics and Information Science, University of Geneva, Geneva, Switzerland (sebastian.engelke@unige.ch)
- ²Department of Mathematics, Imperial College London, London, UK
- ³Institute of Statistics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Recent AI weather models outperform traditional physics-based weather prediction models on many benchmarks. Their evaluation is mostly restricted to point-wise metrics such as the mean squared error and therefore does not assess whether the joint multivariate behavior is well captured. Since AI weather models do not rely on any physical laws, there are strong concerns, and first indications, that the forecasted fields lack physical consistency in terms of spatial coherence and energy constraints. Verifying such constraints directly is, however, far from trivial.
We propose a Turing test for physicality that leverages the spread of an ensemble of pre-trained AI forecasting models. The main idea is that the epistemic uncertainty of these models is much larger when they are applied to non-physical conditions than when they are applied to physical conditions that were part of the training data. We combine this intuition with the theory of conformal inference to obtain a statistical test for physicality with finite-sample guarantees. Case studies on the 1963 Lorenz system show the effectiveness of the proposed approach in identifying conditions that lie outside its attractor. We then illustrate the applicability of our methodology to recent AI weather models.
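The combination of ensemble spread and conformal inference described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the score or the calibration scheme, so the sketch assumes a standard split-conformal p-value with the ensemble spread as nonconformity score. The function names `ensemble_spread` and `conformal_p_value`, the choice of spread (mean per-component standard deviation), and all numbers are hypothetical.

```python
import numpy as np


def ensemble_spread(forecasts):
    """Spread of an ensemble forecast (assumed score, not from the abstract).

    forecasts: array of shape (n_models, dim), one forecast per ensemble member.
    Returns the standard deviation across members, averaged over components.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    return float(np.mean(np.std(forecasts, axis=0)))


def conformal_p_value(cal_scores, test_score):
    """Split-conformal p-value for the null 'the test condition is physical'.

    cal_scores: spreads computed on held-out physical conditions.
    A large test spread relative to the calibration spreads gives a small p-value.
    If calibration and test scores are exchangeable under the null,
    P(p <= alpha) <= alpha holds for any finite calibration size.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = cal_scores.size
    return float((1 + np.sum(cal_scores >= test_score)) / (n + 1))


# Hypothetical calibration spreads from nine physical conditions.
cal_scores = [0.11, 0.09, 0.14, 0.10, 0.12, 0.13, 0.08, 0.15, 0.10]

# Hypothetical ensemble forecasts (3 members, 2 components) for a test
# condition on which the members disagree strongly.
test_forecasts = np.array([[1.0, 2.0], [3.0, -1.0], [-2.0, 4.0]])

p = conformal_p_value(cal_scores, ensemble_spread(test_forecasts))
alpha = 0.1
reject_physicality = p <= alpha  # flag the condition as non-physical
```

With n calibration scores the smallest attainable p-value is 1/(n + 1), so a test at level alpha can only reject if the calibration set contains at least 1/alpha - 1 conditions.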
How to cite: Engelke, S., Gnecco, N., Froelich, M., Hentschel, M., and Zhang, Z.: A Turing test for physicality in AI weather models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18411, https://doi.org/10.5194/egusphere-egu26-18411, 2026.