EGU26-22911, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-22911
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 07 May, 11:00–11:10 (CEST)
Room 0.14
Hierarchical Testing of a Hybrid Machine Learning-Physics Global Atmosphere Model
Ziming Chen¹, L. Ruby Leung¹, Wenyu Zhou¹, Jian Lu²,³, Sandro W. Lubis¹, Ye Liu¹, Jay Chang¹, Bryce E. Harrop¹, Ya Wang⁴, Mingshi Yang⁵, Gan Zhang⁵, and Yun Qian¹
  • ¹Atmospheric, Climate, & Earth Sciences (ACES) Division, Pacific Northwest National Laboratory, Richland, Washington, USA
  • ²College of Oceanic and Atmospheric Sciences, Ocean University of China, Qingdao, China
  • ³State Key Laboratory of Physical Oceanography, Ocean University of China, Qingdao, China
  • ⁴State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
  • ⁵Department of Climate, Meteorology, and Atmospheric Sciences, University of Illinois Urbana-Champaign, Urbana, United States of America

Machine learning (ML)-based models have recently demonstrated high skill and computational efficiency, often outperforming conventional physics-based models in weather forecasting and subseasonal prediction. Prior efforts have assessed their ability to capture atmospheric dynamics at the synoptic scale. Their performance across broader timescales and under out-of-distribution forcing, however, remains insufficiently understood, even though it is an essential criterion for establishing their credibility in Earth system science.

In this study, we design three idealized test cases to evaluate the Neural General Circulation Model (NeuralGCM), a hybrid model that couples a dynamical core with ML-based physical parameterizations. The test cases span synoptic-scale phenomena, interannual variability, and out-of-distribution forcing imposed via uniform warming. We benchmark NeuralGCM against observations and conventional physics-based Earth system models (ESMs). At the synoptic scale, NeuralGCM captures the evolution and propagation of extratropical cyclones with performance comparable to that of ESMs. At the interannual scale, when forced by El Niño-Southern Oscillation (ENSO) sea surface temperature (SST) anomalies, NeuralGCM reproduces the associated teleconnection patterns but shows deficiencies in capturing the nonlinear response. Under out-of-distribution uniform-warming forcing, NeuralGCM simulates global-average temperature and precipitation responses, as well as large-scale tropospheric circulation features, similar to those in ESMs. Notable weaknesses include overestimated extratropical cyclone tracks and spatial extent, and biases in the teleconnected wave train triggered by tropical SST anomalies. Furthermore, in the uniform-warming simulations its temperature responses near the tropopause and in the stratosphere deviate from those of physics-based models, likely owing to biases in vertical temperature advection by the residual circulation.
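The forcing strategies described above can be illustrated with a short sketch. It assumes gridded (lat, lon) NumPy arrays and is not the authors' workflow: uniform warming adds the same offset to every SST grid point, the ENSO experiments add a scaled anomaly pattern to a climatology, and nonlinearity is diagnosed with one common (here assumed) approach that compares responses to forcings of opposite sign. All function names are hypothetical and are not part of NeuralGCM or any ESM.

```python
import numpy as np

# Hypothetical helpers for idealized SST-forcing experiments. Names and
# signatures are illustrative only, not NeuralGCM's API or the authors' code.

def global_mean(field, lat_deg):
    """Area-weighted global mean of a (lat, lon) field using cos(latitude) weights."""
    weights = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], field.shape)
    return float(np.sum(field * weights) / np.sum(weights))

def uniform_warming_sst(sst_clim, delta_k):
    """Climatological SST plus a spatially uniform offset delta_k (K).

    Assumption: no sea-ice masking; real protocols may treat ice-covered points differently.
    """
    return sst_clim + delta_k

def enso_forced_sst(sst_clim, ssta, amplitude=1.0):
    """Add a scaled ENSO SST anomaly pattern; a negative amplitude gives the mirror-image forcing."""
    return sst_clim + amplitude * ssta

def split_linear_nonlinear(resp_pos, resp_neg):
    """Split responses to +A and -A forcing into parts that are odd and even in A.

    For a perfectly linear model resp_neg == -resp_pos, so the even (nonlinear) part is zero.
    """
    linear = 0.5 * (resp_pos - resp_neg)
    nonlinear = 0.5 * (resp_pos + resp_neg)
    return linear, nonlinear

def percent_change_per_kelvin(ctrl_mean, warm_mean, delta_t):
    """Fractional change of a global mean, expressed in % per K of warming."""
    return 100.0 * (warm_mean - ctrl_mean) / ctrl_mean / delta_t

if __name__ == "__main__":
    lat = np.linspace(-87.5, 87.5, 36)
    lon = np.linspace(0.0, 355.0, 72)
    # Synthetic SST climatology that is warmest at the equator.
    sst_clim = 300.0 - 30.0 * np.sin(np.deg2rad(lat))[:, None] ** 2 * np.ones(lon.size)
    warm = uniform_warming_sst(sst_clim, 4.0)
    print(global_mean(warm - sst_clim, lat))  # ~4.0 K by construction

    # Synthetic response with a linear (p) and a quadratic (q) component in amplitude A.
    p = np.random.default_rng(0).standard_normal((lat.size, lon.size))
    q = np.random.default_rng(1).standard_normal((lat.size, lon.size))
    A, a, b = 1.5, 2.0, 0.3
    lin, nonlin = split_linear_nonlinear(a * A * p + b * A**2 * q,
                                         -a * A * p + b * A**2 * q)
    print(np.allclose(lin, a * A * p), np.allclose(nonlin, b * A**2 * q))
```

The odd/even split assumes that the positive and negative experiments use the same anomaly pattern with opposite sign. Any departure of the even part from zero then measures how far the response is from a mirror image. Area weighting by cos(latitude) keeps polar grid points from dominating global averages such as the temperature and precipitation responses to uniform warming.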

Despite these limitations, NeuralGCM exhibits credible responses across all test cases and performs comparably to both observations and physics-based ESMs. These results suggest that hybrid models like NeuralGCM, which integrate dynamical cores with ML physics, offer a promising path toward the next generation of ML-based ESMs.

How to cite: Chen, Z., Leung, L. R., Zhou, W., Lu, J., Lubis, S. W., Liu, Y., Chang, J., Harrop, B. E., Wang, Y., Yang, M., Zhang, G., and Qian, Y.: Hierarchical Testing of a Hybrid Machine Learning-Physics Global Atmosphere Model, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-22911, https://doi.org/10.5194/egusphere-egu26-22911, 2026.