- (1) Environment and Climate Change Canada, Toronto, Canada (dominique.brunet@canada.ca)
- (2) Adam Mickiewicz University, Poznan, Poland (mateusz.taszarek@amu.edu.pl)
- (3) University of Waterloo, Waterloo, Canada (j82su@uwaterloo.ca)
- (4) University of Manitoba, Winnipeg, Canada (John.Hanesiak@umanitoba.ca)
Forecasting lightning from environmental parameters is well established for mid-latitude regions over land. A combination of lifted index (LI), convective available potential energy (CAPE), convective inhibition (CIN) and mid-tropospheric relative humidity is known to be a good predictor of lightning occurrence. However, forecasting lightning globally with these parameters is less skillful, particularly in the tropics and over the oceans. Using the XGBoost machine learning method, we implemented a global lightning forecast from a selection of environmental parameters derived from ERA5 and the thundeR package. The method is trained efficiently on several million pairs of global lightning observations and hundreds of convective parameters. In a first experiment, called BigXGB, we trained models over the entire globe using all parameters. In a second experiment, called RegionalizedXGB, we trained four different models, one each for land mid-latitudes (LM), ocean mid-latitudes (OM), land tropics (LT) and ocean tropics (OT). Finally, in a third experiment, we incrementally dropped the least important feature in terms of information gain until only one feature remained. When trained on the years 2019-2022, BigXGB achieved a ROC-AUC score of 0.94 for the entire domain (LM: 0.97, LT: 0.92, OM: 0.98, OT: 0.95) on the 2023 test year. The four most important features were a special CIN formulation (MU5_CIN_4km), a special LI formulation (MU5_LI_eff), total column cloud ice water (tciw) and total column supercooled liquid water (tcslw). RegionalizedXGB obtained scores similar to those of BigXGB when using the same set of features, but the most important features varied by region. The most important features for LM and OM were related to LI, CIN and CAPE, while for LT and OT the most skillful predictors were more diverse.
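As an illustration of how the per-region scores quoted above could be computed, the following minimal sketch assigns each sample to one of the four regions (LM, LT, OM, OT) and evaluates ROC-AUC separately in each. It is not the authors' code. The 30-degree latitude boundary between tropics and mid-latitudes, the function names and the sample layout are all assumptions made for this example; the abstract does not state how the regions were delimited.

```python
from collections import defaultdict

# Assumed boundary between tropics and mid-latitudes (hypothetical value,
# not given in the abstract).
TROPICS_MAX_ABS_LAT = 30.0


def region_code(lat_deg, is_land):
    """Return 'LM', 'LT', 'OM' or 'OT' for one grid point."""
    surface = "L" if is_land else "O"
    zone = "T" if abs(lat_deg) <= TROPICS_MAX_ABS_LAT else "M"
    return surface + zone


def roc_auc(labels, scores):
    """ROC-AUC from the rank-sum identity; tied scores share an average rank."""
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of samples tied with the sample at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both lightning and no-lightning samples")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_by_region(samples):
    """samples: iterable of (lat_deg, is_land, label, score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for lat_deg, is_land, label, score in samples:
        labels, scores = grouped[region_code(lat_deg, is_land)]
        labels.append(label)
        scores.append(score)
    return {region: roc_auc(ys, ss) for region, (ys, ss) in grouped.items()}
```

Here `label` is 1 when lightning was observed and 0 otherwise, and `score` is the raw model output for that sample. Scoring each region separately shows whether a single global model performs unevenly across land, ocean, tropics and mid-latitudes.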
Incrementally dropping features showed that only 40-50 features are necessary to obtain top performance, with significant performance declines below 15 features. Many of the top convective parameters are variants of the same parameter computed for different parcel types (most-unstable, mixed-layer, etc.), indicating that several flavours of the same convective parameter help to increase predictive accuracy. A calibrated probabilistic lightning occurrence forecast was then obtained by isotonic regression between the raw uncalibrated predictions and the observed frequency. This new machine learning-based global lightning prediction model opens the door to building a global lightning climatology for the past 75 years and to implementing accurate lightning diagnostics in operational global numerical weather prediction.
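The calibration step can be sketched with a small pool-adjacent-violators routine, the standard algorithm behind isotonic regression. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the fitted mapping is a plain step function with clipping at both ends, whereas library implementations often interpolate between steps.

```python
from bisect import bisect_left


def fit_isotonic(raw_scores, labels):
    """Fit a non-decreasing step function from raw scores to observed frequency.

    Returns (uppers, values): each block covers scores up to uppers[i] and
    maps them to the calibrated probability values[i].
    """
    # Pool samples that share exactly the same raw score.
    sums, counts = {}, {}
    for s, y in zip(raw_scores, labels):
        sums[s] = sums.get(s, 0.0) + y
        counts[s] = counts.get(s, 0) + 1

    blocks = []  # each block: [label_sum, sample_count, largest_score_in_block]
    for s in sorted(sums):
        blocks.append([sums[s], counts[s], s])
        # Merge while the previous block's mean exceeds the current one
        # (means compared by cross-multiplication; counts are positive).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(raw_score, uppers, values):
    """Map one raw score to a calibrated probability, clipping out-of-range scores."""
    idx = min(bisect_left(uppers, raw_score), len(values) - 1)
    return values[idx]
```

A typical use would be to fit the mapping on held-out raw predictions with `uppers, values = fit_isotonic(raw, observed)` and then apply `calibrate` to new raw predictions. Because the mapping is monotone, it changes the predicted probabilities without reordering the samples.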
How to cite: Brunet, D., Taszarek, M., Su, J., and Hanesiak, J.: Machine Learning-based Global Lightning Prediction from Convective Parameters, 12th European Conference on Severe Storms, Utrecht, The Netherlands, 17–21 Nov 2025, ECSS2025-319, https://doi.org/10.5194/ecss2025-319, 2025.