EGU24-6282, updated on 08 Mar 2024
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Identifying precipitation types over China using a machine learning algorithm

Yi Wang
  • University of Exeter, Global Systems Institute, Department of Mathematics and Statistics, United Kingdom of Great Britain – England, Scotland, Wales

In the context of global warming, changes in extreme weather events may pose a growing threat to society. It is therefore particularly important to improve our climatological understanding of high-impact precipitation types (PTs) and of how their frequency may change under warming. In this study, we use MIDAS (the Met Office Integrated Data Archive System) observational data to provide our best estimate of historical PTs (e.g. liquid rain, freezing rain and snow) over China. We apply machine learning (ML) techniques and meteorological analysis methods to the ERA5 historical climate reanalysis to find the best variables for diagnosing PTs, and we use these variables to form the training and testing sets for ML training. We evaluate the diagnostic ability of the Random Forest Classifier (RFC) for different PTs. The results show that, when meteorological variables such as temperature, relative humidity and winds are used to determine PTs, ERA5 gridded data and MIDAS station data match well. Comparing the feature selection results with Kernel Density Estimation shows that the two methods give consistent assessments of how well each variable distinguishes different PTs. The RFC is robust in predicting different PTs after learning the differences in meteorological variables from data between 1990 and 2014. It captures the frequency and spatial distribution of different PTs well, but this skill is sensitive to how the algorithm is trained. In addition, the algorithm struggles to identify events, such as hail, that occur very rarely in the observations. Tests for different regions and seasons in China show that models trained on seasonal data samples perform relatively well, especially in winter.
These results show the potential for combining an RFC with state-of-the-art climate models to project how the frequencies of different PTs may respond to future climate warming. However, the training method of the ML algorithm should be selected with caution.
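
As an illustration of how Kernel Density Estimation can be used to judge whether a variable separates two PTs, the following is a minimal sketch and not the method used in this study. It assumes an overlap coefficient between two Gaussian kernel density estimates as the separability measure, uses only the Python standard library, and runs on synthetic values. The sample distributions, the variable names and the helper functions (`gaussian_kde`, `silverman_bandwidth`, `overlap_coefficient`) are all hypothetical.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (simplified form based on the standard deviation only)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def overlap_coefficient(a, b, grid_points=400):
    """Integrate min(f_a, f_b) on a shared grid: near 0 = separable, near 1 = indistinguishable."""
    bw_a, bw_b = silverman_bandwidth(a), silverman_bandwidth(b)
    f_a, f_b = gaussian_kde(a, bw_a), gaussian_kde(b, bw_b)
    pad = 3.0 * max(bw_a, bw_b)  # extend the grid so the kernel tails are included
    lo = min(min(a), min(b)) - pad
    hi = max(max(a), max(b)) + pad
    step = (hi - lo) / grid_points
    # Midpoint Riemann sum of the pointwise minimum of the two densities.
    return step * sum(
        min(f_a(lo + (i + 0.5) * step), f_b(lo + (i + 0.5) * step))
        for i in range(grid_points)
    )


# Synthetic, made-up samples for two precipitation types (NOT MIDAS or ERA5 data).
random.seed(0)
snow = {
    "temperature_degC": [random.gauss(-3.0, 2.0) for _ in range(300)],
    "relative_humidity_pct": [random.gauss(85.0, 8.0) for _ in range(300)],
}
rain = {
    "temperature_degC": [random.gauss(8.0, 4.0) for _ in range(300)],
    "relative_humidity_pct": [random.gauss(83.0, 8.0) for _ in range(300)],
}

# A smaller overlap means the variable discriminates better between the two types.
overlaps = {name: overlap_coefficient(snow[name], rain[name]) for name in snow}
for name, value in sorted(overlaps.items(), key=lambda kv: kv[1]):
    print(f"{name}: overlap = {value:.2f}")
```

In this toy setup the variable with the smaller overlap ranks as the better discriminator. That ranking could then be compared with the feature importances from a tree-based classifier, which is the kind of consistency check the abstract describes.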

How to cite: Wang, Y.: Identifying precipitation types over China using a machine learning algorithm, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6282, 2024.