- ¹Development Innovation Lab India, University of Chicago Trust, India, 560025
- ²Development Innovation Lab, University of Chicago, IL, 60637
- ³Department of the Geophysical Sciences, University of Chicago, IL, 60637
- ⁴Data Science Institute, University of Chicago, IL, 60637
- ⁵Harris School of Public Policy, University of Chicago, IL, 60637
- ⁶Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, California, 94720
- ⁷Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, 94720
False monsoon onsets consist of an early-season wet spell followed by a prolonged dry spell, and they often cause agricultural losses when farmers sow during the premature rains and are unprepared for the dry conditions that follow. Despite their importance for risk reduction for hundreds of millions of farmers in the tropics, the predictability of these pre-monsoonal wet-to-dry events remains largely unexplored. Here, we benchmark six state-of-the-art artificial intelligence weather prediction (AIWP) models (AIFS, FuXi, FuXi-S2S, GraphCast, GenCast, NeuralGCM) and one numerical weather prediction (NWP) model (IFS) against novel, decision-relevant historical reference forecasts to assess their ability to predict false monsoon onsets at lead times of up to 30 days. We find that both AIWP and NWP models exhibit positive predictive skill in the core monsoon zone of India, with ensemble-based probabilistic models retaining positive predictive value relative to these reference forecasts across all lead times. Deterministic skill varies strongly by region, with good predictability at short lead times (0-10 days) and declining skill at longer lead times (11-30 days). We further evaluate the models on well-documented canonical false onset events from the literature and find that skillful forecasts are associated with the ability to reproduce the large-scale circulation evolution characteristic of false onsets, in particular the progression from a transient monsoon-like state to a subsequent circulation collapse that produces a dry spell.
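To make the wet-to-dry event structure concrete, the following is a minimal sketch of how a false onset could be flagged in a daily rainfall series at a single grid point. The function name and every numeric threshold (`wet_thresh`, `wet_days`, `dry_thresh`, `dry_days`, `window`) are hypothetical placeholders chosen for illustration; they are not the agriculturally relevant thresholds used in this work, and the logic is a simplification that returns only the first qualifying event.

```python
import numpy as np


def find_false_onset(rain_mm, wet_thresh=5.0, wet_days=3,
                     dry_thresh=1.0, dry_days=10, window=20):
    """Return the start index of the first wet spell followed by a dry spell.

    A wet spell is `wet_days` consecutive days with rain >= wet_thresh.
    A dry spell is `dry_days` consecutive days with rain < dry_thresh that
    begins within `window` days after the wet spell ends.
    All default values are illustrative assumptions, not the study's values.
    Returns None when no such wet-to-dry sequence is found.
    """
    rain = np.asarray(rain_mm, dtype=float)
    n = len(rain)
    for start in range(n - wet_days + 1):
        wet_end = start + wet_days
        if not np.all(rain[start:wet_end] >= wet_thresh):
            continue  # no wet spell starting on this day
        # Search for a dry spell that starts soon after the wet spell ends.
        last_dry_start = min(wet_end + window, n - dry_days + 1)
        for d in range(wet_end, last_dry_start):
            if np.all(rain[d:d + dry_days] < dry_thresh):
                return start
    return None


# Synthetic series: three wet days, then ten dry days, then sustained rain.
false_onset_series = [0, 0, 8, 9, 7] + [0] * 10 + [10] * 5
# Synthetic series: wet spell followed by sustained rain (no dry spell).
true_onset_series = [8, 9, 7] + [6] * 20

false_onset_index = find_false_onset(false_onset_series)
true_onset_index = find_false_onset(true_onset_series)
```

In this sketch the first series yields the index of the day the premature wet spell begins, while the second yields `None` because the rains persist. An operational definition would also need to handle the seasonal search window and the distinction between false and true onset dates.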
We use agriculturally relevant thresholds to define monsoon onset, wet spells, and dry spells. To enable a meaningful assessment of model skill, the reference forecast is constructed from 124 years of gridded rain-gauge observations and quantifies the baseline probability of false monsoon onsets within a decision-relevant framework. Rather than applying bias correction directly to rainfall fields, we first calibrate the wet- and dry-spell thresholds used in the event definition separately for each model, using quantile mapping within a leave-one-year-out cross-validation framework. Forecast performance is evaluated using deterministic and probabilistic metrics, including probability of detection, false alarm ratio, critical success index, and Brier score. Reliability diagrams show systematic overconfidence at higher forecast probabilities, indicating the need for additional calibration and post-processing. Together, these elements establish a decision-relevant benchmark and an evaluation of current AI-based and physics-based forecast systems for the sub-seasonal early warning of false onsets involving dry spells.
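The threshold calibration and the verification metrics named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: all function names are hypothetical, the quantile mapping uses a simple empirical non-exceedance probability, and the skill score is computed against a constant reference probability rather than the gauge-based reference forecast described in the abstract.

```python
import numpy as np


def map_threshold(obs_thresh, obs_clim, model_clim):
    """Map an observed-rainfall threshold to the model value at the same quantile."""
    # Empirical non-exceedance probability of the threshold in observations.
    q = np.mean(np.asarray(obs_clim, dtype=float) <= obs_thresh)
    return float(np.quantile(np.asarray(model_clim, dtype=float), q))


def loyo_thresholds(obs_thresh, obs_by_year, model_by_year):
    """Leave-one-year-out: the threshold for each year uses only the other years."""
    out = {}
    for year in obs_by_year:
        obs_train = np.concatenate(
            [v for y, v in obs_by_year.items() if y != year])
        mod_train = np.concatenate(
            [v for y, v in model_by_year.items() if y != year])
        out[year] = map_threshold(obs_thresh, obs_train, mod_train)
    return out


def contingency_scores(forecast, observed):
    """Probability of detection, false alarm ratio, critical success index."""
    f = np.asarray(forecast, dtype=bool)
    o = np.asarray(observed, dtype=bool)
    hits = int(np.sum(f & o))
    misses = int(np.sum(~f & o))
    false_alarms = int(np.sum(f & ~o))

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        "pod": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
        "csi": ratio(hits, hits + misses + false_alarms),
    }


def brier_score(prob, observed):
    """Mean squared difference between forecast probability and binary outcome."""
    p = np.asarray(prob, dtype=float)
    o = np.asarray(observed, dtype=float)
    return float(np.mean((p - o) ** 2))


def brier_skill_score(prob, observed, ref_prob):
    """Skill relative to a constant reference probability (positive is better)."""
    ref = np.full(len(observed), ref_prob, dtype=float)
    return 1.0 - brier_score(prob, observed) / brier_score(ref, observed)


# Toy data, invented purely for illustration.
obs_by_year = {2001: [0, 1, 2, 3, 4], 2002: [5, 6, 7, 8, 9], 2003: [0, 2, 4, 6, 8]}
model_by_year = {y: [2 * x for x in v] for y, v in obs_by_year.items()}
thresholds = loyo_thresholds(4.0, obs_by_year, model_by_year)

scores = contingency_scores([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
bss = brier_skill_score([0.9, 0.6, 0.2, 0.1, 0.8], [1, 0, 1, 0, 1], ref_prob=0.6)
```

Mapping the threshold rather than the rainfall field leaves each model's output untouched and adjusts only the event definition, which is the design choice the abstract describes. A positive Brier skill score indicates that the probabilistic forecast improves on the reference.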
How to cite: Gupta, M., Aitken, C., Masiwal, R., Marchakitus, A., Boos, W., Kowal, K., Jina, A., and Hassanzadeh, P.: Operational benchmarking of AI and NWP models for false monsoon onset prediction in India, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-17228, https://doi.org/10.5194/egusphere-egu26-17228, 2026.