Exploring Tree-Based Machine Learning Methods for Estimation of Hail Sizes

Amruta Vurakaranam; Christian Berndt; Katharina Lengfeld; Lukas Josipovic; Markus Schultze; Katharina Schröer

DOI: https://doi.org/10.5194/ecss2025-229


ECSS2025-229, updated on 05 Oct 2025


12th European Conference on Severe Storms

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.


Amruta Vurakaranam¹, Christian Berndt², Katharina Lengfeld², Lukas Josipovic², Markus Schultze², and Katharina Schröer¹


¹Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
²Deutscher Wetterdienst, Offenbach, Germany

Hail remains one of the most challenging and least understood severe weather hazards in Germany: it is difficult to forecast and causes substantial economic losses, particularly in agriculture, infrastructure, and the associated insurance sector. While the occurrence and probability of hail have been studied, estimating hail size remains a key open research question from both a forecasting and a climatological perspective.

This study is part of the HAIPI project (Hailstorm Analysis, Impact, and Prediction Initiative). Funded by the German Weather Service (Deutscher Wetterdienst, DWD), HAIPI aims to improve hail size estimation by leveraging several newly developed datasets. These include advanced polarimetric radar products, numerical weather prediction (NWP) output, lightning data, and crowd-sourced observations from platforms such as the European Severe Weather Database (ESWD) and the DWD WarnWetter app.

We present initial results from a set of tree-based machine learning approaches, including Random Forests and Gradient Boosting methods. These models use atmospheric variables such as convective available potential energy (CAPE) and wind shear, together with radar products from DWD's KONRAD3D forecast system. We compare model performance for two tasks: binary classification, which distinguishes severe from non-severe hail under several threshold definitions, and multiclass classification, which assigns hail sizes to three categories: Category 1 (<2 cm), Category 2 (2–5 cm), and Category 3 (≥5 cm).
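The size classes above amount to a simple labeling step applied to each hail report before training. The sketch below is a hypothetical illustration, not the authors' code: the function names hail_category and is_severe are invented here, and the binary threshold is left as a required parameter because the abstract does not state which threshold definitions were tested.

```python
def hail_category(diameter_cm: float) -> int:
    """Map a reported maximum hail diameter (cm) to the three size classes.

    Hypothetical helper; class bounds follow the abstract:
    Category 1 (< 2 cm), Category 2 (2 to < 5 cm), Category 3 (>= 5 cm).
    """
    if diameter_cm < 0:
        raise ValueError("diameter must be non-negative")
    if diameter_cm < 2.0:
        return 1  # Category 1: small hail
    if diameter_cm < 5.0:
        return 2  # Category 2: 2-5 cm
    return 3      # Category 3: 5 cm and larger


def is_severe(diameter_cm: float, threshold_cm: float) -> bool:
    """Binary label for one (assumed) severity threshold definition."""
    return diameter_cm >= threshold_cm


# Toy diameters, not observations from the study.
reports = [1.5, 2.0, 3.5, 5.0, 6.2]
print([hail_category(d) for d in reports])                 # [1, 2, 2, 3, 3]
print([is_severe(d, threshold_cm=3.0) for d in reports])   # [False, False, True, True, True]
```

Keeping the threshold as an argument makes it easy to rebuild the binary labels for each threshold definition being compared, while the multiclass labels stay fixed.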

A preliminary model achieves around 70% accuracy with balanced performance across the hail size classes, suggesting potential for operational forecasting. Feature importance analysis identifies radar-derived vertical extent features (e.g., vertical_extent, echo_top_threshold_55dBZ) and model-based reflectivity metrics (e.g., cell_based_VIL) as key predictors. These initial findings highlight the value of integrating radar, model-based, and crowd-sourced data to improve hail size prediction.
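Overall accuracy can hide poor performance on rare classes, which is why balanced performance across size classes matters. The minimal sketch below assumes integer class labels 1 to 3 and uses made-up toy predictions (not data from this study). It computes per-class recall and balanced accuracy, the unweighted mean of per-class recalls and the same quantity scikit-learn's balanced_accuracy_score reports. The abstract does not say which balanced metric the authors used, so this is one common choice, not their evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples / true samples of that class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in sorted(totals.items())}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical labels: Category 1 dominates, Categories 2 and 3 are rare.
y_true = [1] * 8 + [2, 2] + [3, 3]
y_pred = [1] * 8 + [1, 2] + [1, 3]

print(round(accuracy(y_true, y_pred), 3))           # 0.833
print(per_class_recall(y_true, y_pred))             # {1: 1.0, 2: 0.5, 3: 0.5}
print(round(balanced_accuracy(y_true, y_pred), 3))  # 0.667
```

In this toy case overall accuracy is 0.833, yet each rare class is only half detected; balanced accuracy (0.667) makes that weakness visible.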

How to cite: Vurakaranam, A., Berndt, C., Lengfeld, K., Josipovic, L., Schultze, M., and Schröer, K.: Exploring Tree-Based Machine Learning Methods for Estimation of Hail Sizes, 12th European Conference on Severe Storms, Utrecht, The Netherlands, 17–21 Nov 2025, ECSS2025-229, https://doi.org/10.5194/ecss2025-229, 2025.