AtmoRep: Large Scale Representation Learning for Atmospheric Data

Christian Lessig; Ilaria Luise; Martin Schultz

doi:https://doi.org/10.5194/egusphere-egu23-3117

Session ITS1.13/AS5.2

EGU23-3117

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Christian Lessig¹, Ilaria Luise², and Martin Schultz³

¹Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany (christian.lessig@ovgu.de)
²OpenLab, CERN, Meyrin, Switzerland (ilaria.luise@cern.ch)
³Supercomputing Center Jülich, Jülich, Germany (m.schultz@fz-juelich.de)

The AtmoRep project asks whether one can train a single neural network that represents and describes all atmospheric dynamics. AtmoRep’s ambition is hence to demonstrate that the concept of large-scale representation learning, whose feasibility in principle and potential were established by large language models such as GPT-3, is also applicable to scientific data and in particular to atmospheric dynamics. The project is enabled by the large amounts of atmospheric observations that have been made in the past, as well as by advances in neural network architectures and self-supervised learning that allow for effective training on petabytes of data. Eventually, we aim to train on all of the ERA5 reanalysis and, furthermore, to fine-tune on observational data such as satellite measurements to move beyond the limits of reanalyses.

We will present the theoretical formulation of AtmoRep as an approximate representation of the atmosphere as a stochastic dynamical system. We will also detail our transformer-based network architecture and the training protocol for self-supervised learning, which allows unlabelled data such as reanalyses, simulation outputs and observations to be employed for training and refining the network. Results will be presented for the performance of AtmoRep on downscaling, precipitation forecasting, the prediction of tropical convection initialization, and model correction. Furthermore, we demonstrate that AtmoRep has substantial zero-shot skill, i.e., it is capable of performing well on tasks it was not trained for. Zero- and few-shot performance (or in-context learning) is one of the hallmarks of large-scale representation learning and, to our knowledge, has never been demonstrated in the geosciences.

How to cite: Lessig, C., Luise, I., and Schultz, M.: AtmoRep: Large Scale Representation Learning for Atmospheric Data, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-3117, https://doi.org/10.5194/egusphere-egu23-3117, 2023.