A framework for benchmarking precipitation type classifiers used in weather and climate models

Ali Nazemi; Ramin Ahmadi; Amin Hammad

doi:https://doi.org/10.5194/egusphere-egu26-15328

Session HS7.2

EGU26-15328, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.




Concordia University, Building, Civil and Environmental Engineering, Canada (ali.nazemi@concordia.ca)

Diagnosing precipitation type (ptype) is a major source of uncertainty in hydroclimatological applications. We propose a systematic framework for benchmarking the algorithms used to identify ptype in numerical weather prediction and climate models. Six widely used ptype algorithms, proposed by Derouin (1973), Cantin & Bachand (1993), Baldwin & Contorno (1993), Ramer (1993), Bourgouin (2000), and the European Centre for Medium-Range Weather Forecasts (ECMWF, 2024), are considered over a box region in northeastern North America with Montreal at its center. The benchmarking uses hourly data collected at 25 Automated Surface Observing Systems during the period 2007 to 2024. All ptype algorithms are driven by ERA5 single- and pressure-level climate reanalysis fields at 0.25° resolution. We consider four skills for benchmarking: (1) efficiency at the local scale, (2) temperature conditioning at the regional scale, (3) spatial coherence, and (4) spatiotemporal coherence. To assess efficiency at the local scale, we use three measures (precision, recall, and F1-score) that reveal how modeled ptypes compare with observed ones at each station. For regional temperature conditioning, we extract the probabilities of ptypes conditioned on near-surface temperature and compare the observed and modeled conditional density functions using the Kolmogorov–Smirnov test and the Wasserstein-1 (W1) distance. For both spatial and spatiotemporal coherence, we consider probabilities of co-occurrence and the Jaccard similarity index at the 0-hour time lag (spatial) and at 1–48-hour lags (spatiotemporal), and quantify the agreement between modeled and observed ptypes using the F1-score. Our results reveal a pronounced weakness of current ptype algorithms in distinguishing rare, high-impact ptypes such as freezing rain and ice pellets. Temperature conditioning shows that rain, freezing rain, and ice pellets are frequently shifted toward colder regimes, with W1 reaching up to 8.3 °C.
While rain classification shows moderate spatial realism, the skill for snow and freezing rain is substantially weaker. When temporal structure is added, coherence declines even further, with Bourgouin (2000) standing out among the algorithms, with F1-scores reaching 0.5 for freezing rain and 0.61 for other/mixed types. Our findings call for improving ptype algorithms in weather and climate models, particularly for predicting rare but high-impact ptypes.
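
For readers unfamiliar with the measures named above, the following is a minimal Python sketch of three of them: per-class precision, recall, and F1-score; the W1 distance between two one-dimensional temperature samples; and the Jaccard index between two binary occurrence series at a given lag. It is an illustration under stated assumptions, not the authors' implementation. The function names, the label strings, and the toy data are hypothetical, and the abstract does not specify how stations are paired or how the F1-score is applied to the coherence measures, so those steps are left out.

```python
from bisect import bisect_right


def precision_recall_f1(observed, modeled, ptype):
    """Per-class precision, recall and F1 for one ptype label (hypothetical helper)."""
    tp = sum(o == ptype and m == ptype for o, m in zip(observed, modeled))
    fp = sum(o != ptype and m == ptype for o, m in zip(observed, modeled))
    fn = sum(o == ptype and m != ptype for o, m in zip(observed, modeled))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def wasserstein_1(sample_a, sample_b):
    """W1 distance between two 1-D empirical distributions: integral of |F_a - F_b|."""
    a, b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


def lagged_jaccard(events_a, events_b, lag=0):
    """Jaccard index between series A at time t and series B at time t + lag."""
    pairs = list(zip(events_a, events_b[lag:]))
    both = sum(1 for x, y in pairs if x and y)
    either = sum(1 for x, y in pairs if x or y)
    return both / either if either else 0.0


# Toy data (made up for illustration; labels such as "fzra" are assumptions).
obs = ["rain", "snow", "fzra", "rain", "snow", "fzra"]
mod = ["rain", "snow", "rain", "rain", "fzra", "fzra"]
fzra_scores = precision_recall_f1(obs, mod, "fzra")

# Temperatures (°C) at which a ptype occurs; the modeled sample is shifted colder.
temps_obs = [-1.0, 0.0, 1.0]
temps_mod = [-3.0, -2.0, -1.0]
w1 = wasserstein_1(temps_obs, temps_mod)

# Binary hourly occurrence of one ptype at two stations.
station_a = [1, 0, 1, 1, 0]
station_b = [0, 1, 0, 1, 1]
jaccard_lag0 = lagged_jaccard(station_a, station_b, lag=0)
jaccard_lag1 = lagged_jaccard(station_a, station_b, lag=1)
```

In this toy case the modeled temperature sample is the observed one shifted by 2 °C, so W1 comes out at about 2 °C, which is how a value such as the reported 8.3 °C can be read: an average displacement of the conditional distribution along the temperature axis. The Jaccard index is computed here for a single pair of series at lag 0 and lag 1; repeating it for lags of 1 to 48 hours would follow the same pattern.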

How to cite: Nazemi, A., Ahmadi, R., and Hammad, A.: A framework for benchmarking precipitation type classifiers used in weather and climate models, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-15328, https://doi.org/10.5194/egusphere-egu26-15328, 2026.