EGU26-13486, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-13486
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 04 May, 16:40–16:50 (CEST)
 
Room -2.92
Learning representations from different pre-training strategies in the WeatherGenerator 
Sebastian Hickman1, Sophie Xhonneux2, Ilaria Luise2, Julian Kuehnert2, Matthias Karlbauer2, Kerem Tezcan3, Yura Perugachi Diaz4, Timothee Hunter2, and Christian Lessig2
  • 1ECMWF, Reading, United Kingdom
  • 2ECMWF, Bonn, Germany
  • 3MeteoSwiss, Zurich, Switzerland
  • 4KNMI, De Bilt, Netherlands

In general, pre-training of large machine learning models uses self-supervised learning to generate expressive latent representations. These can then be used for downstream applications with little to no fine-tuning. The WeatherGenerator project follows this paradigm and aims to train a foundation model on a large number of weather and climate datasets to learn general and useful representations that may be used for a variety of downstream tasks, such as forecasting, downscaling or data assimilation. A wide variety of self-supervised tasks and training paradigms from other domains, such as computer vision, provide impressive performance. However, the extent to which these strategies transfer to atmospheric dynamics, and the physical sciences in general, has not been widely explored except for a few notable cases (Lessig et al., 2023; Parker et al., 2025).
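To make the pre-training paradigm concrete, the sketch below shows a minimal masked-token pre-training step in PyTorch: a fraction of input tokens is replaced by a learned mask token, an encoder produces latent representations, and a lightweight head reconstructs the masked values. The architecture, token dimensions and masking ratio are illustrative assumptions and do not describe the WeatherGenerator model itself.

```python
# Minimal masked-token pre-training sketch (illustrative; not the
# WeatherGenerator architecture). Assumes tokenised atmospheric fields
# of shape (batch, tokens, features).
import torch
import torch.nn as nn

class MaskedTokenModel(nn.Module):
    def __init__(self, d_in=64, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(d_in, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(d_model, d_in)  # lightweight reconstruction head

    def forward(self, tokens, mask):
        # tokens: (B, T, d_in); mask: (B, T) boolean, True where masked
        x = self.embed(tokens)
        x = torch.where(mask.unsqueeze(-1), self.mask_token, x)  # hide masked tokens
        z = self.encoder(x)                                      # latent representations
        return self.decode(z), z

model = MaskedTokenModel()
tokens = torch.randn(8, 128, 64)        # dummy tokenised fields
mask = torch.rand(8, 128) < 0.5         # mask ~50% of the tokens
recon, latents = model(tokens, mask)
loss = nn.functional.mse_loss(recon[mask], tokens[mask])  # loss only on masked tokens
loss.backward()
```

Student-teacher methods replace the reconstruction target with the output of a slowly updated teacher network, but the overall structure of the training step is similar.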

We explore how different pre-training approaches, including masked token modelling and student-teacher methods (Caron et al., 2021; Zhou et al., 2022; Assran et al., 2023), can be adapted to learn representations for atmospheric dynamics using reanalysis, forecast, and observation datasets. We then show how linear probing and small non-linear decoders can be used to evaluate the quality of the representations learned by different pre-training strategies, and we explore the relationship between the pre-training task and the quality of the representations learned for different downstream tasks (see the probing sketch below). Finally, we illustrate the importance of including varied and representative datasets during pre-training, and compare its effect with that of the specific pre-training method used.
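The probing procedure can be sketched as follows: the pre-trained encoder is frozen and only a small head is trained on a downstream target, so the head's skill reflects the quality of the learned representations. The encoder interface, data loader and target shapes below are illustrative assumptions, not the evaluation code used in the project.

```python
# Linear-probe evaluation sketch (illustrative). Only the linear head is
# trained; the pre-trained encoder stays frozen throughout.
import torch
import torch.nn as nn

def linear_probe(encoder, loader, d_model=256, d_out=1, epochs=5, lr=1e-3):
    encoder.eval()                          # freeze the pre-trained encoder
    for p in encoder.parameters():
        p.requires_grad_(False)

    probe = nn.Linear(d_model, d_out)       # trainable linear head only
    opt = torch.optim.Adam(probe.parameters(), lr=lr)

    for _ in range(epochs):
        for tokens, target in loader:       # (B, T, d_in), (B, T, d_out)
            with torch.no_grad():
                latents = encoder(tokens)   # frozen representations (B, T, d_model)
            pred = probe(latents)
            loss = nn.functional.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```

A small non-linear decoder is the same recipe with the linear head replaced by a shallow MLP.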

Parker, L., Lanusse, F., Shen, J., Liu, O., Hehir, T., Sarra, L., Meyer, L., Bowles, M., Wagner-Carena, S., Qu, H. and Golkar, S., 2025. AION-1: Omnimodal Foundation Model for Astronomical Sciences. arXiv preprint arXiv:2510.17960. 

Lessig, C., Luise, I., Gong, B., Langguth, M., Stadtler, S. and Schultz, M., 2023. AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. arXiv preprint arXiv:2308.13280. 

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P. and Joulin, A., 2021. Emerging Properties in Self-Supervised Vision Transformers. arXiv preprint arXiv:2104.14294.

Assran, M., Duval, Q., Misra, I., Bojanowski, P., Vincent, P., Rabbat, M., LeCun, Y. and Ballas, N., 2023. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. arXiv preprint arXiv:2301.08243.

Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A. and Kong, T., 2022. iBOT: Image BERT Pre-Training with Online Tokenizer. arXiv preprint arXiv:2111.07832.

How to cite: Hickman, S., Xhonneux, S., Luise, I., Kuehnert, J., Karlbauer, M., Tezcan, K., Perugachi Diaz, Y., Hunter, T., and Lessig, C.: Learning representations from different pre-training strategies in the WeatherGenerator, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13486, https://doi.org/10.5194/egusphere-egu26-13486, 2026.