- ¹Department of the Geophysical Sciences, University of Chicago, Chicago, United States of America
- ²Development Innovation Lab, University of Chicago, Chicago, United States of America
- ³Development Innovation Lab India, University of Chicago Trust, India
- ⁴Data Science Institute, University of Chicago, United States of America
- ⁵NorthWest Research Associates, Boulder, United States of America
- ⁶Department of Earth and Planetary Science, University of California, Berkeley, United States of America
- ⁷Harris School of Public Policy, University of Chicago, United States of America
- ⁸Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, United States of America
Rapid advances in artificial intelligence weather prediction (AIWP) have enabled AI models to potentially outperform traditional numerical weather prediction (NWP) models while requiring only a fraction of the computational resources. However, many AI forecast evaluation studies have compared models using global metrics over a limited number of years, without focusing on sector- and region-specific applications. Operationally driven benchmarking is necessary to deploy these models effectively, informing both model selection and model improvement for different decision-making needs. Such benchmarking has been instrumental in driving AI progress in areas such as image recognition (ImageNet) and protein structure prediction (AlphaFold). In this work, we benchmark the performance of six state-of-the-art AIWP models (AIFS, FuXi, FuXi-S2S, GraphCast, GenCast, NeuralGCM) and an NWP model (IFS) in forecasting local-scale, agriculturally relevant monsoon onset over India. The models’ onset forecasts are compared with over a century of rain gauge–based ground truth observations, using standard verification metrics for both deterministic and probabilistic forecasts. This multiperiod evaluation is specifically designed to align with how such forecasts will be disseminated to stakeholders. In this operationally oriented benchmarking, we find that most AIWP models outperform climatological baseline forecasts at medium-range timescales (~15 days) but show skill only comparable to climatology at subseasonal timescales (~30 days) in the core monsoon zone. These models also perform comparably to IFS, while enabling calibration of probabilistic forecasts through precisely controlled ensembles that can be efficiently generated for multiple past decades. The speed and open-source nature of AIWP models provide the additional advantage that they can be localized.
This benchmark guided model selection for large-scale AI-based generation and dissemination of the 2025 monsoon onset forecast to 38 million farmers in India. Our work presents a framework for developing operational, decision-oriented benchmarks that can accelerate the translation of the AI-driven second weather revolution into the democratization of weather forecasting worldwide.
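As an illustration of how a probabilistic onset forecast can be scored against a climatological baseline, the sketch below computes a Brier skill score. The abstract does not name the specific verification metrics used, so the choice of the Brier score is an assumption, and the function names and all numbers are hypothetical.

```python
from statistics import fmean


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return fmean((p - o) ** 2 for p, o in zip(probs, outcomes))


def brier_skill_score(probs, outcomes, clim_prob):
    """Skill relative to a constant climatological probability.

    Positive values mean the forecast beats climatology, zero means
    equal skill, and negative values mean it is worse.
    """
    bs_forecast = brier_score(probs, outcomes)
    bs_clim = brier_score([clim_prob] * len(outcomes), outcomes)
    if bs_clim == 0:
        raise ValueError("climatological Brier score is zero; skill is undefined")
    return 1.0 - bs_forecast / bs_clim


# Hypothetical data: 1 means onset occurred in the forecast window, 0 means it did not.
observed = [1, 0, 0, 1, 0]
# Hypothetical ensemble-derived onset probabilities for the same five cases.
forecast = [0.8, 0.2, 0.1, 0.6, 0.3]
# Hypothetical climatological onset frequency for this window.
climatology = 0.4

bss = brier_skill_score(forecast, observed, climatology)
print(f"Brier skill score vs. climatology: {bss:.3f}")
```

A score near zero corresponds to the "comparable to climatology" case reported at subseasonal lead times, while a clearly positive score corresponds to outperforming the baseline.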
How to cite: Masiwal, R., Aitken, C., Marchakitus, A., Gupta, M., Kowal, K., Pahlavan, H., Yang, T., Sun, Y. Q., Jina, A., Boos, W., and Hassanzadeh, P.: Decision-oriented benchmarking of AI weather models for subseasonal monsoon onset forecasts in India, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-16083, https://doi.org/10.5194/egusphere-egu26-16083, 2026.