- Department of Earth and Atmospheric Sciences, University of Houston, Houston, TX, USA (abhatt21@cougarnet.uh.edu)
Accurate initiation of deep convection remains a persistent challenge in weather and climate models. Most general circulation models (GCMs) operate at coarse resolution and therefore cannot explicitly resolve convective events; instead, they rely on convective parameterizations in which triggering is diagnosed from environmental thresholds, commonly based on convective available potential energy (CAPE). Convection-permitting models (CPMs) alleviate some of these structural limitations by resolving the grid-scale part of the convective spectrum, although sub-grid-scale events remain unresolved. Machine learning (ML)-based convection trigger functions have emerged as an alternative, but their predictions remain uncertain, and the causes of that uncertainty are rarely examined. Here, we diagnose the atmospheric states associated with “blind spots” in ML predictors of deep convection initiation, leveraging the Department of Energy Atmospheric Radiation Measurement constrained variational analysis (VARANAL) product and the CPM-based CONUS404 hydroclimate dataset over the Southern Great Plains (SGP). We train a conventional artificial neural network (ANN) and a controlled abstention network (CAN), evaluate their skill in identifying deep convection, and use the CAN to isolate low-confidence samples and characterize the physical conditions under which the models are least reliable. The ANN and CAN show comparable baseline performance, and for both models, skill increases when low-confidence samples are excluded, indicating that abstention identifies systematically difficult conditions rather than random noise. Across both the VARANAL and CONUS404 datasets, low-confidence samples preferentially occur under weak-to-moderately negative mid-level vertical velocity (−10 to −5 hPa hr⁻¹) and a dynamic generation rate of CAPE (dCAPE) of 0–200 J kg⁻¹ hr⁻¹. Additionally, these cases are dominated by short convective episodes that persist for only a few hours and occur predominantly during the afternoon.
These abstention samples also exhibit locally forced, non-equilibrium environments characterized by a longer convective adjustment time (τ), consistent with reduced predictability relative to regimes controlled by broader synoptic forcing, where τ is shorter. Collectively, our results identify the regimes and associated physical mechanisms in which ML-based convection predictors are least robust, providing actionable guidance for operational forecasters to treat predictions with greater caution when these low-confidence conditions are present.
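The comparison of skill with and without low-confidence samples can be illustrated with a minimal Python sketch. It is not the CAN used in this study, which learns when to abstain during training. As a simplifying assumption, it ranks samples after the fact by how far the predicted probability lies from 0.5 and scores only the most confident fraction. The function name `accuracy_at_coverage`, the 0.5 decision threshold, and the toy data are hypothetical.

```python
def accuracy_at_coverage(probs, labels, coverage):
    """Accuracy on the most confident `coverage` fraction of samples.

    probs    : predicted probabilities of deep convection (0 to 1)
    labels   : observed outcomes (1 = convection, 0 = no convection)
    coverage : fraction of samples retained, in (0, 1]
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    # Assumed confidence proxy: distance of the probability from 0.5.
    ranked = sorted(zip(probs, labels),
                    key=lambda pair: abs(pair[0] - 0.5),
                    reverse=True)
    n_keep = max(1, round(coverage * len(ranked)))
    kept = ranked[:n_keep]
    # A sample counts as correct when the thresholded prediction matches the label.
    correct = sum(int(p >= 0.5) == y for p, y in kept)
    return correct / n_keep


# Toy data (made up for illustration, not from VARANAL or CONUS404).
probs = [0.95, 0.05, 0.90, 0.10, 0.55, 0.45, 0.60, 0.40]
labels = [1, 0, 1, 0, 0, 1, 1, 0]

full = accuracy_at_coverage(probs, labels, 1.0)   # all samples scored
half = accuracy_at_coverage(probs, labels, 0.5)   # least confident half dropped
print(full, half)
```

In this toy case the two misclassified samples are also the least confident ones, so accuracy rises once they are dropped. If errors were random with respect to confidence, accuracy would stay roughly flat as coverage shrinks.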
How to cite: Bhattarai, A. and Zheng, Y.: Uncertainty-Aware Machine Learning for Deep Convection Initiation: Insights from ARM Observations and Kilometer-Scale Hydroclimate Reanalysis, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8710, https://doi.org/10.5194/egusphere-egu26-8710, 2026.