EGU26-13486, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-13486
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 04 May, 16:40–16:50 (CEST)
 
Room -2.92
Learning representations from different pre-training strategies in the WeatherGenerator 
Sebastian Hickman1, Sophie Xhonneux2, Ilaria Luise2, Julian Kuehnert2, Matthias Karlbauer2, Kerem Tezcan3, Yura Perugachi Diaz4, Timothee Hunter2, and Christian Lessig2
  • 1ECMWF, Reading, United Kingdom
  • 2ECMWF, Bonn, Germany
  • 3MeteoSwiss, Zurich, Switzerland
  • 4KNMI, De Bilt, Netherlands

In general, pre-training of large machine learning models uses self-supervised learning to generate expressive latent representations. These can then be used for downstream applications with little to no fine-tuning. The WeatherGenerator project follows this paradigm and aims to train a foundation model on a large number of weather and climate datasets to learn general and useful representations that may be used for a variety of downstream tasks, such as forecasting, downscaling or data assimilation. A wide variety of self-supervised tasks and training paradigms from other domains, such as computer vision, provide impressive performance. However, the extent to which these strategies transfer to atmospheric dynamics, and the physical sciences in general, has not been widely explored except for a few notable cases (Lessig et al., 2023; Parker et al., 2025).
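To make the pre-training paradigm concrete, the sketch below shows a minimal masked-token pre-training step in PyTorch: a fraction of input tokens is replaced by a learned mask token, an encoder produces latent representations, and a lightweight head reconstructs the masked values. The architecture, token dimensions and masking ratio are illustrative assumptions and do not describe the WeatherGenerator model itself.

```python
# Minimal masked-token pre-training sketch (illustrative; not the
# WeatherGenerator architecture). Assumes tokenised atmospheric fields
# of shape (batch, tokens, features).
import torch
import torch.nn as nn

class MaskedTokenModel(nn.Module):
    def __init__(self, d_in=64, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(d_in, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(d_model, d_in)  # lightweight reconstruction head

    def forward(self, tokens, mask):
        # tokens: (B, T, d_in); mask: (B, T) boolean, True where masked
        x = self.embed(tokens)
        x = torch.where(mask.unsqueeze(-1), self.mask_token, x)  # hide masked tokens
        z = self.encoder(x)                                      # latent representations
        return self.decode(z), z

model = MaskedTokenModel()
tokens = torch.randn(8, 128, 64)        # dummy tokenised fields
mask = torch.rand(8, 128) < 0.5         # mask ~50% of the tokens
recon, latents = model(tokens, mask)
loss = nn.functional.mse_loss(recon[mask], tokens[mask])  # loss only on masked tokens
loss.backward()
```

Student-teacher methods replace the reconstruction target with the output of a slowly updated teacher network, but the overall structure of the training step is similar.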

We explore how different pre-training approaches, including masked token modelling and student-teacher methods (Caron et al., 2021; Zhou et al., 2022; Assran et al., 2023), can be adapted to learn representations for atmospheric dynamics using reanalysis, forecast, and observation datasets. We then show how linear probing and small non-linear decoders can be used to evaluate the quality of the representations learned by different pre-training strategies, and we explore the relationship between the pre-training task and the quality of the representations learned for different downstream tasks (see the probing sketch below). Finally, we illustrate the importance of including varied and representative datasets during pre-training, and compare its effect with that of the specific pre-training method used.
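The probing procedure can be sketched as follows: the pre-trained encoder is frozen and only a small head is trained on a downstream target, so the head's skill reflects the quality of the learned representations. The encoder interface, data loader and target shapes below are illustrative assumptions, not the evaluation code used in the project.

```python
# Linear-probe evaluation sketch (illustrative). Only the linear head is
# trained; the pre-trained encoder stays frozen throughout.
import torch
import torch.nn as nn

def linear_probe(encoder, loader, d_model=256, d_out=1, epochs=5, lr=1e-3):
    encoder.eval()                          # freeze the pre-trained encoder
    for p in encoder.parameters():
        p.requires_grad_(False)

    probe = nn.Linear(d_model, d_out)       # trainable linear head only
    opt = torch.optim.Adam(probe.parameters(), lr=lr)

    for _ in range(epochs):
        for tokens, target in loader:       # (B, T, d_in), (B, T, d_out)
            with torch.no_grad():
                latents = encoder(tokens)   # frozen representations (B, T, d_model)
            pred = probe(latents)
            loss = nn.functional.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```

A small non-linear decoder is the same recipe with the linear head replaced by a shallow MLP.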

Parker, L., Lanusse, F., Shen, J., Liu, O., Hehir, T., Sarra, L., Meyer, L., Bowles, M., Wagner-Carena, S., Qu, H. and Golkar, S., 2025. AION-1: Omnimodal Foundation Model for Astronomical Sciences. arXiv preprint arXiv:2510.17960. 

Lessig, C., Luise, I., Gong, B., Langguth, M., Stadtler, S. and Schultz, M., 2023. AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. arXiv preprint arXiv:2308.13280. 

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P. and Joulin, A., 2021. Emerging Properties in Self-Supervised Vision Transformers. arXiv preprint arXiv:2104.14294.

Assran, M., Duval, Q., Misra, I., Bojanowski, P., Vincent, P., Rabbat, M., LeCun, Y. and Ballas, N., 2023. Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. arXiv preprint arXiv:2301.08243.

Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A. and Kong, T., 2022. iBOT: Image BERT Pre-Training with Online Tokenizer. arXiv preprint arXiv:2111.07832.

How to cite: Hickman, S., Xhonneux, S., Luise, I., Kuehnert, J., Karlbauer, M., Tezcan, K., Perugachi Diaz, Y., Hunter, T., and Lessig, C.: Learning representations from different pre-training strategies in the WeatherGenerator, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-13486, https://doi.org/10.5194/egusphere-egu26-13486, 2026.