4-9 September 2022, Bonn, Germany
EMS Annual Meeting Abstracts
Vol. 19, EMS2022-560, 2022
https://doi.org/10.5194/ems2022-560
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

Latent Dirichlet Allocation: a new machine learning tool to evaluate CMIP6 climate models atmospheric circulation

Nemo Malhomme1,2, Davide Faranda2, Bérengère Podvin3, and Lionel Mathelin1
  • 1Université Paris-Saclay, CNRS, Laboratoire interdisciplinaire des sciences du numérique, 91405, Orsay, France
  • 2Université Paris-Saclay, CNRS, CEA, UVSQ, Laboratoire des sciences du climat et de l'environnement, 91191, Gif-sur-Yvette, France
  • 3Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire EM2C, 91190, Gif-sur-Yvette, France

Climate models aim to represent as closely as possible the observed state of climate components such as the atmosphere or the ocean. This is a fundamental requirement for correctly projecting changes in their dynamics due to anthropogenic forcing. To evaluate how closely models match observations, we need algorithms capable of selecting, processing and evaluating relevant dynamical features of these components. This must be done efficiently for large datasets such as those produced by the Coupled Model Intercomparison Project Phase 6 (CMIP6). In this work, we use Latent Dirichlet Allocation (LDA), a statistical learning method initially designed for natural language processing, to extract synoptic patterns from sea-level pressure data and evaluate how close the dynamics of CMIP6 climate models are to those of state-of-the-art reanalysis datasets such as ERA5 or NCEPv2.

LDA learns a basis for decomposing maps into objects called "motifs". Applied to sea-level pressure data, whether from reanalysis or simulation, it robustly yields motifs that correspond to known synoptic objects, i.e. cyclones or anticyclones. Furthermore, LDA provides the weight of each motif in each map of the dataset, its most probable geographical position, and its possible changes due to internal variability or external forcings. The LDA decomposition is efficient because most of the information in a given sea-level pressure map is contained in about 5 motifs, making it possible to decompose any map into a limited number of easy-to-interpret synoptic objects. This opens a variety of new angles for statistical analysis.
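As an illustration of the decomposition described above, the following minimal Python sketch applies scikit-learn's LatentDirichletAllocation to synthetic data. It is not the authors' implementation. The grid size, the number of motifs, the quantization scale, and in particular the encoding of pressure anomalies as non-negative counts (positive and negative anomalies split into separate channels) are assumptions made here only so that the example runs.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Synthetic stand-in for sea-level pressure anomaly maps
# (assumed sizes: 200 maps on an 8 x 12 grid, flattened to 96 cells).
n_maps, n_lat, n_lon = 200, 8, 12
anomalies = rng.normal(size=(n_maps, n_lat * n_lon))

# Assumption: LDA expects non-negative "word counts", so positive and
# negative anomalies are placed in separate channels and rounded to integers.
scale = 10.0
counts = np.hstack([
    np.rint(scale * np.clip(anomalies, 0, None)),
    np.rint(scale * np.clip(-anomalies, 0, None)),
]).astype(int)

# Fit LDA; each "topic" plays the role of a motif (n_motifs is an assumption).
n_motifs = 6
lda = LatentDirichletAllocation(n_components=n_motifs, max_iter=20, random_state=0)

# weights[i, k] is the weight of motif k in map i; each row sums to 1.
weights = lda.fit_transform(counts)

# Normalize the fitted components so each motif is a distribution over cells.
motifs = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Share of each map's total weight carried by its k heaviest motifs.
k = 3
top_k_share = np.sort(weights, axis=1)[:, -k:].sum(axis=1)
print("mean weight share of the top", k, "motifs:", top_k_share.mean())
```

On real data, each row of `motifs` could be reshaped back onto the latitude-longitude grid to inspect the corresponding synoptic object, and `top_k_share` gives one way to check how many motifs carry most of a map's weight.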

We look at the dominant motifs and their distributions, either over entire datasets or conditional on particular extreme events such as cold waves or heat waves, and compare results between reanalysis data and historical simulations. This enables us to assess which models can or cannot reproduce statistical properties of the observations, and whether there are properties that no model yet reproduces. We find that models can capture the statistical synoptic composition of sea-level pressure data in general, but that some shortcomings remain in the modelling of extreme events. LDA can also be applied separately to each dataset, and the two resulting synoptic bases can be compared. We find that the sets of motifs from reanalysis and historical simulations are very similar, even when different spatial resolutions are used.
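One possible way to compare two motif bases learned separately is sketched below in Python. The abstract does not state which similarity measure is used; cosine similarity combined with an optimal one-to-one assignment is an assumption chosen for illustration, as is the requirement that both bases are expressed on a common grid (for example after regridding). The function name `match_motifs` and the synthetic bases are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_motifs(basis_a, basis_b):
    """Pair each motif in basis_a with one motif in basis_b.

    Both inputs have shape (n_motifs, n_cells) on a common grid (assumption).
    Returns the index of the matched motif in basis_b for each motif in
    basis_a, together with the cosine similarity of each matched pair.
    """
    a = basis_a / np.linalg.norm(basis_a, axis=1, keepdims=True)
    b = basis_b / np.linalg.norm(basis_b, axis=1, keepdims=True)
    similarity = a @ b.T
    # The assignment solver minimizes cost, so negate to maximize similarity.
    rows, cols = linear_sum_assignment(-similarity)
    return cols, similarity[rows, cols]


# Synthetic example: the "model" basis is a shuffled, slightly perturbed
# copy of the "reanalysis" basis.
rng = np.random.default_rng(1)
reanalysis_motifs = rng.random((5, 40))
reanalysis_motifs /= reanalysis_motifs.sum(axis=1, keepdims=True)

permutation = np.array([3, 0, 4, 1, 2])
model_motifs = reanalysis_motifs[permutation] + 0.01 * rng.random((5, 40))
model_motifs /= model_motifs.sum(axis=1, keepdims=True)

pairing, scores = match_motifs(reanalysis_motifs, model_motifs)
print("matched model motif for each reanalysis motif:", pairing)
print("lowest matched similarity:", scores.min())
```

Matched similarities close to 1 for every pair would indicate that the two bases contain essentially the same synoptic objects, while a low score would flag a motif with no clear counterpart in the other dataset.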

How to cite: Malhomme, N., Faranda, D., Podvin, B., and Mathelin, L.: Latent Dirichlet Allocation: a new machine learning tool to evaluate CMIP6 climate models atmospheric circulation, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-560, https://doi.org/10.5194/ems2022-560, 2022.
