- (1) Forschungszentrum Jülich GmbH, Jülich Supercomputing Centre (JSC), Germany (a.semcheddine@fz-juelich.de; s.melidonis@fz-juelich.de; m.schultz@fz-juelich.de)
- (2) University of Cologne, Germany
- (3) European Centre for Medium-Range Weather Forecasts, Germany (christian.lessig@ecmwf.int)
Inadequate air quality is still a major cause of illness and premature death, and accurate air pollution analyses and predictions are needed to enhance human and animal well-being, protect natural and agricultural vegetation, and reduce climate change impacts. In a collaborative effort led by ECMWF and funded by the European Union, the WeatherGenerator, a foundation model for Earth system prediction, has been under development for nearly one year. In this study, we train and evaluate an early prototype of the WeatherGenerator model as a first step towards assessing how well foundation models can represent chemical transport and regime behavior over extended forecast horizons. While some machine learning models, such as Aurora [1], have demonstrated skillful 3-5 day predictions, air quality services require extended predictability windows (10-30 days) for strategic early warning systems and emission mitigation planning, especially for longer-lived species such as CO and background/regional O₃. WeatherGenerator's multi-resolution infrastructure ingests datasets at different spatial and temporal resolutions simultaneously and without regridding, which allows CAMS reanalysis chemistry data (0.75°, 3-hourly) to be combined with ERA5 meteorological information at synoptic scales (1°, 6-hourly). The model was trained from scratch on 2003-2021 reanalysis data to forecast four reactive chemical species (O₃, CO, NO, NO₂) and three particulate matter size fractions (PM1, PM2.5, PM10). Training was conducted in two stages: pre-training on 2-step autoregressive rollouts, followed by fine-tuning on 8-step rollouts. Predictions cover surface-level concentrations and 13 vertical levels spanning the tropospheric column, from the boundary layer (1000 hPa) to the upper troposphere and tropopause region (50 hPa). We evaluate this proof of concept using 30-day autoregressive forecasts initialized on 1 June 2022.
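The two-stage training described above relies on autoregressive rollouts, in which each prediction is fed back as the input to the next step. The following is a minimal sketch of that idea in Python with NumPy, not the WeatherGenerator implementation: `toy_step`, `rollout`, and `rollout_mse` are hypothetical names introduced here for illustration, and the toy dynamics stand in for a trained model. The model's actual step length and loss are not stated in this abstract.

```python
import numpy as np


def rollout(step_fn, state, n_steps):
    """Apply step_fn autoregressively, feeding each prediction back as input."""
    states = []
    for _ in range(n_steps):
        state = step_fn(state)
        states.append(state)
    return np.stack(states)  # shape: (n_steps, *state.shape)


def rollout_mse(step_fn, initial_state, targets):
    """Mean squared error averaged over every step of a rollout."""
    preds = rollout(step_fn, initial_state, len(targets))
    return float(np.mean((preds - targets) ** 2))


def toy_step(state):
    """Hypothetical stand-in for one model step (assumption: simple decay)."""
    return 0.9 * state


# Toy "truth" generated by slightly different dynamics (decay factor 0.95).
x0 = np.ones(4)
truth = np.stack([x0 * 0.95 ** (k + 1) for k in range(8)])

loss_2 = rollout_mse(toy_step, x0, truth[:2])  # short rollout, as in pre-training
loss_8 = rollout_mse(toy_step, x0, truth[:8])  # longer rollout, as in fine-tuning
```

In this toy setting the per-step error grows as the rollout lengthens, so the 8-step loss penalizes accumulated drift that the 2-step loss barely sees. This is one plausible motivation for fine-tuning on longer rollouts before attempting extended forecasts.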
The trained model produced stable, continuous 30-day predictions of all species across all vertical levels, with 5-day forecast skill comparable to that of CAMS (at 0.75° resolution). Extended evaluation over June-November 2022 is currently underway to enable direct benchmark comparison. Notably, WeatherGenerator's training and fine-tuning required only 127 hours on 8 NVIDIA A100 GPUs. Ongoing work includes: (1) expanding the training data to incorporate CAMS operational analyses at higher spatial resolution, (2) hyperparameter optimization, and (3) quantitative comparison with existing air quality forecast models to contextualize skill relative to operational CAMS and other baseline systems.
[1] Bodnar C, Bruinsma WP, Lucic A, Stanley M, Allen A, Brandstetter J, Garvan P, Riechert M, Weyn JA, Dong H, Gupta JK. A foundation model for the Earth system. Nature. 2025 May 21:1-8.
How to cite: Semcheddine, B. A., Melidonis, S., Schultz, M. G., and Lessig, C.: Tropospheric Transport Simulation with the WeatherGenerator Prototype Model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-11888, https://doi.org/10.5194/egusphere-egu26-11888, 2026.