EMS Annual Meeting Abstracts
Vol. 21, EMS2024-326, 2024, updated on 05 Jul 2024
https://doi.org/10.5194/ems2024-326
EMS Annual Meeting 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Monday, 02 Sep, 10:00–10:15 (CEST) | Aula Magna

Machine Learning methodologies for identifying atmospheric deep convective events in ERA5

Alberto Sanchez-Marroquin1, Jordi Barcons Roca2, Omjyoti Dutta2, Iciar Guerrero Calzas2,3, Lorenzo Rossetto2, Mirta Rodriguez Pinilla2, and Fernando Cucchietti1
  • 1Barcelona Supercomputing Centre, Spain (alberto.sanchez@bsc.es)
  • 2Mitiga Solutions S.L., Passeig del Mare Nostrum, 15, 08039 Barcelona, Spain
  • 3Escola d’Enginyeria. Universitat Autònoma de Barcelona. C. de les Sitges, 08193 Cerdanyola

Atmospheric deep convection can occur when the warming of the Earth’s surface
by solar radiation leads to buoyant plumes that break through the mixed layer and
produce vertical clouds reaching the tropopause. This phenomenon is associated with
thunderstorms, heavy precipitation, hail, strong winds and other events that cause
severe damage to life and property. However, representing deep convection and its
associated events in models is challenging, as they depend on many high-resolution
sub-grid processes that are difficult and expensive to simulate. As a consequence, some
approaches based on artificial intelligence, and especially Machine Learning (ML), have
recently emerged to bypass some of these limitations of physical models. Here we discuss
some of the ML methodologies implemented in the Convective Day Detector (CDD), a
statistical model designed to identify hazardous convective events at ground level based
on ERA5 reanalysis data.
First, we will describe the CDD, an ML classifier based on meteorological
variables from the ERA5 reanalysis that are associated with deep convection, such as
convective available potential energy, vertical wind velocity or specific humidity. The
CDD is trained to find the relationship between these variables and the occurrence of
severe weather events, such as hailstorms and severe wind, recorded in observation-based
report databases. The trained CDD is subsequently employed to infer the probability of
occurrence of these convective events beyond the training region, where observations
are more limited or inconsistent, if available at all.
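
As an illustration only, the kind of classifier described above can be sketched in a few
lines of Python. The abstract does not state which model the CDD uses, so the logistic
regression below is a stand-in, the three standardised predictors are hypothetical
placeholders for ERA5 variables (CAPE, vertical velocity, specific humidity), and the
labels are synthetic rather than taken from any storm report database.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_days(n, seed=0):
    """Generate n fake 'days' with three standardised predictors.

    The columns stand in for CAPE, vertical velocity and specific humidity.
    The labelling rule is invented for this sketch and is not physical.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        p_event = sigmoid(2.0 * x[0] + 1.0 * x[1] + 0.5 * x[2] - 2.0)
        rows.append(x)
        labels.append(1 if rng.random() < p_event else 0)
    return rows, labels


def predict_proba(w, b, x):
    """Probability of a convective event for one feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit logistic regression by full-batch gradient descent."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    rows, labels = make_synthetic_days(500, seed=1)
    w, b = fit_logistic(rows, labels)
    # Score a hypothetical high-instability day outside the training sample.
    print(predict_proba(w, b, [2.0, 1.0, 0.5]))
```

Once fitted, the same `predict_proba` call can be applied to predictors from any region,
which is the sense in which a trained classifier is used beyond its training domain.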
However, this modelling approach presents many challenges that need to be
overcome. To start with, hazardous convective events are rare and difficult to measure in a
consistent manner. This leads to a very unbalanced training dataset in which many
positive cases are unlabelled. Therefore, we will discuss some ways to address these problems,
such as undersampling, artificially filtering the storm database or positive-unlabelled
learning methodologies. Additionally, the meteorological conditions that lead to the
development of convective events differ depending on the location. As a
consequence, we will also discuss transfer learning methodologies to apply a classifier trained
on North America to other regions of the world, such as Europe, and how to validate
the results with very scarce and inconsistent observations.
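
Two of the remedies named above can be sketched briefly. The code below is a minimal
illustration, not the CDD implementation: `undersample` keeps every labelled positive and
a random subset of the remaining days, and the positive-unlabelled correction follows the
Elkan and Noto approach, which assumes that labelled positives are a random sample of all
true positives. All function names and the toy numbers are hypothetical.

```python
import random


def undersample(rows, labels, ratio=1.0, seed=0):
    """Keep all labelled positives and `ratio` unlabelled days per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    keep = pos + rng.sample(neg, n_keep)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def estimate_label_frequency(scores_on_labelled_positives):
    """Estimate c = P(labelled | truly positive).

    Under the random-labelling assumption, c is approximated by the mean
    classifier score on held-out labelled positives.
    """
    scores = list(scores_on_labelled_positives)
    return sum(scores) / len(scores)


def pu_correct(score, c):
    """Convert P(labelled | x) into an estimate of P(event | x), capped at 1."""
    return min(1.0, score / c)


if __name__ == "__main__":
    # Toy dataset: 10 reported events among 100 days.
    rows = list(range(100))
    labels = [1] * 10 + [0] * 90
    sub_rows, sub_labels = undersample(rows, labels, ratio=2.0)
    print(len(sub_rows), sum(sub_labels))

    # Hypothetical held-out scores for days with a reported event.
    c = estimate_label_frequency([0.4, 0.6])
    print(pu_correct(0.3, c))
```

Undersampling changes the event frequency seen during training, so scores from a
classifier fitted on the reduced set generally need recalibration before they are read
as event probabilities.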

How to cite: Sanchez-Marroquin, A., Barcons Roca, J., Dutta, O., Guerrero Calzas, I., Rossetto, L., Rodriguez Pinilla, M., and Cucchietti, F.: Machine Learning methodologies for identifying atmospheric deep convective events in ERA5, EMS Annual Meeting 2024, Barcelona, Spain, 1–6 Sep 2024, EMS2024-326, https://doi.org/10.5194/ems2024-326, 2024.