EGU23-382
https://doi.org/10.5194/egusphere-egu23-382
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Latent Dirichlet Allocation: a new machine learning tool to evaluate CMIP6 climate models atmospheric circulation and extremes

Nemo Malhomme1,2, Bérengère Podvin3, Davide Faranda1, and Lionel Mathelin2
  • 1LSCE, CEA, CNRS, France
  • 2LISN, CNRS, France
  • 3EM2C, CentraleSupélec

Climate models aim to represent the statistical properties of the climate components, including extreme events, as closely as possible. This is a fundamental requirement for correctly projecting changes in their dynamics due to anthropogenic forcing. To evaluate how closely models match observations, we need algorithms capable of selecting, processing and evaluating relevant dynamical features of the climate components. This must be done efficiently and repeatedly for large datasets such as those produced by the Coupled Model Intercomparison Project Phase 6 (CMIP6). In this work, we use Latent Dirichlet Allocation (LDA), a statistical learning method initially designed for natural language processing, to extract synoptic patterns from sea-level pressure data and evaluate how close the dynamics of CMIP6 climate models are to those of state-of-the-art reanalysis datasets such as ERA5 or NCEPv2, both in general and in the case of extremes.

LDA learns a basis for decomposing maps into objects called "motifs". Applied to sea-level pressure data, whether reanalysis or simulation, it robustly yields motifs that correspond to known synoptic objects, i.e. cyclones or anticyclones. Furthermore, LDA provides their weight in each map of the dataset, their most probable geographical position and their possible changes due to internal variability or external forcings. The LDA decomposition is efficient and sparse: most of the information in a given sea-level pressure map is contained in a few motifs, making it possible to decompose any map into a limited number of easy-to-interpret synoptic objects. This opens a variety of new angles for statistical analysis.
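
As a minimal illustration of what "sparse" means here, the sketch below treats a map as a weighted mixture of motifs and counts how many motifs are needed to account for a chosen fraction of the total weight. It is not the implementation used in this work: the toy motifs, the weights, the 4-cell grid and the helper names `mix_motifs` and `motifs_needed` are all hypothetical, and in practice the motifs and weights would be estimated by fitting LDA to the data.

```python
# Hypothetical toy setup: 3 motifs on a 4-cell grid. In LDA each motif is a
# probability distribution over grid cells, so every row sums to 1.
motifs = [
    [0.7, 0.3, 0.0, 0.0],      # e.g. a pattern localised on cells 0-1
    [0.0, 0.2, 0.8, 0.0],      # e.g. a pattern localised on cells 1-2
    [0.25, 0.25, 0.25, 0.25],  # a diffuse background pattern
]

# Hypothetical motif weights for one map (they sum to 1).
weights = [0.6, 0.3, 0.1]


def mix_motifs(weights, motifs):
    """Return the mixture sum_k weights[k] * motifs[k] as a list over grid cells."""
    n_cells = len(motifs[0])
    return [
        sum(w * motif[cell] for w, motif in zip(weights, motifs))
        for cell in range(n_cells)
    ]


def motifs_needed(weights, fraction=0.9):
    """Smallest number of motifs whose weights reach `fraction` of the total weight."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(weights)
    running = 0.0
    # Accumulate weights from largest to smallest until the target is reached.
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        running += w
        if running >= fraction * total:
            return count
    return len(weights)


mixture = mix_motifs(weights, motifs)
n_needed = motifs_needed(weights, fraction=0.8)
```

A small value of `n_needed` relative to the number of motifs indicates that the map is well described by a few synoptic objects. The 80% threshold is an arbitrary choice made for this example.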

We look at the dominant motifs and their distributions, either over entire datasets or conditional on particular extreme events such as cold or heat waves, and compare results between reanalysis data and historical simulations. This enables us to assess which models can or cannot reproduce statistical properties of the observations, and whether there are properties that no model yet reproduces. We find that models capture the statistical synoptic composition of sea-level pressure data in general, but that some shortcomings remain in the modelling of extreme events. LDA can also be applied separately to each dataset, and the two resulting synoptic bases can be compared. We find that the sets of motifs from reanalysis and historical simulations are very similar, even when different spatial resolutions are used.
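
One possible way to set up such a comparison is sketched below. It assumes each day has already been labelled with the index of its dominant motif and with a flag marking extreme-event days. The distance used, total variation, is an assumption made for illustration and is not stated in this abstract; the data and the names `motif_frequencies` and `total_variation` are likewise hypothetical.

```python
from collections import Counter


def motif_frequencies(dominant, n_motifs, mask=None):
    """Relative frequency of each motif index among the selected days."""
    if mask is None:
        mask = [True] * len(dominant)
    selected = [m for m, keep in zip(dominant, mask) if keep]
    if not selected:
        raise ValueError("no days selected")
    counts = Counter(selected)
    return [counts[k] / len(selected) for k in range(n_motifs)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical dominant-motif index per day, for 3 motifs and 8 days.
reanalysis_dominant = [0, 0, 1, 2, 1, 0, 2, 2]
model_dominant = [0, 1, 1, 1, 1, 0, 2, 1]

# Hypothetical heat-wave flags; each dataset carries its own flags.
reanalysis_heat_wave = [False, False, True, True, False, False, True, True]
model_heat_wave = [False, False, True, True, False, False, True, True]

# Distance between motif distributions over all days.
tv_all = total_variation(
    motif_frequencies(reanalysis_dominant, 3),
    motif_frequencies(model_dominant, 3),
)

# Distance between motif distributions restricted to heat-wave days.
tv_extreme = total_variation(
    motif_frequencies(reanalysis_dominant, 3, reanalysis_heat_wave),
    motif_frequencies(model_dominant, 3, model_heat_wave),
)
```

In this toy data the conditional distance `tv_extreme` is larger than `tv_all`, which is how a model that matches the overall synoptic composition but not the composition during extremes would appear under this particular metric.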

How to cite: Malhomme, N., Podvin, B., Faranda, D., and Mathelin, L.: Latent Dirichlet Allocation: a new machine learning tool to evaluate CMIP6 climate models atmospheric circulation and extremes, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-382, https://doi.org/10.5194/egusphere-egu23-382, 2023.