Physics-guided machine learning improves spatial structure and transferability in high-resolution NO2 mapping under sparse observations

Wenfu Sun; Frederik Tack; Lieven Clarisse; Michel Van Roozendael

DOI: https://doi.org/10.5194/egusphere-egu26-14446

Session BG9.7

EGU26-14446, updated on 14 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.

Wenfu Sun¹,², Frederik Tack¹, Lieven Clarisse², and Michel Van Roozendael¹

¹Royal Belgian Institute for Space Aeronomy (BIRA-IASB), Brussels, Belgium
²Université libre de Bruxelles (ULB), Spectroscopy, Quantum Chemistry and Atmospheric Remote Sensing (SQUARES), Brussels, Belgium

Machine learning has become an important tool for producing high-resolution environmental maps, because traditional chemistry-transport models are often limited by computational cost and spatial detail at kilometer-scale, hourly resolution. At such high spatiotemporal resolution, target fields become highly dynamic and spatially heterogeneous, while ground observations remain sparse. This raises a key question: how can we improve physical consistency and recover realistic spatial structure (e.g., transport-related spatial patterns) when reconstructing fields at this resolution from sparse station observations?

We address this question by systematically comparing three machine learning models for hourly mapping of surface NO₂, a critical air pollutant, at 2 km resolution over Western Europe. All models use the same inputs, namely static emission-related fields, satellite remote-sensing products, and meteorological variables, and all are constrained by ground-based measurements from the European Environment Agency's AirBase network.
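
The abstract does not state how station measurements are matched to the 2 km grid. A common approach, sketched here only as an assumption, is to assign each station to the grid cell that contains it and pair the station's hourly value with that cell's input features. The projected kilometre coordinates, the lower-left grid origin, and the names `Grid`, `station_to_cell`, and `build_training_pairs` are all hypothetical.

```python
from dataclasses import dataclass

CELL_KM = 2.0  # grid spacing stated in the abstract


@dataclass
class Grid:
    """Hypothetical regular grid in projected kilometre coordinates."""
    x0_km: float  # easting of the lower-left grid corner
    y0_km: float  # northing of the lower-left grid corner
    nx: int       # number of columns (eastward)
    ny: int       # number of rows (northward)


def station_to_cell(grid, x_km, y_km):
    """Return (row, col) of the cell containing a station, or None if outside the grid."""
    col = int((x_km - grid.x0_km) // CELL_KM)
    row = int((y_km - grid.y0_km) // CELL_KM)
    if 0 <= row < grid.ny and 0 <= col < grid.nx:
        return row, col
    return None


def build_training_pairs(grid, features, stations):
    """Pair each hourly station value with the input features of its grid cell.

    features: dict mapping (row, col) -> list of input values for one hour
              (emission-related, satellite and meteorological predictors)
    stations: list of (x_km, y_km, observed_no2) tuples for the same hour
    """
    pairs = []
    for x_km, y_km, obs in stations:
        cell = station_to_cell(grid, x_km, y_km)
        if cell is not None and cell in features:
            pairs.append((features[cell], obs))
    return pairs
```

Stations that fall outside the grid, or in cells without features, are simply skipped; a real pipeline would also need to handle several stations sharing one cell.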

Model A is trained on station observations only. Model B extends Model A with a wind-driven advection encoding that explicitly accounts for atmospheric transport. Model C builds on Model B by adding a pretraining stage informed by hourly gridded NO₂ fields at a coarser resolution (10 km) from the Copernicus Atmosphere Monitoring Service (CAMS) European reanalysis. Models B and C represent two physics-guided machine learning paradigms.
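
The abstract does not specify how Model B's advection encoding is constructed. One plausible form, shown purely as a hypothetical sketch, samples an emission field along a straight-line back-trajectory defined by the local wind and sums the upwind values with an exponential decay weight. Because the sampled cells move when the wind changes, a feature of this kind responds to wind direction. The lag set, the decay time `tau_s`, nearest-cell sampling, the row/column orientation, and the function name are all assumptions.

```python
import math


def upwind_emission_feature(emis, row, col, u_ms, v_ms, cell_m=2000.0,
                            lags_s=(0.0, 1800.0, 3600.0, 5400.0, 7200.0),
                            tau_s=3600.0):
    """Hypothetical advection encoding: decay-weighted emissions along a back-trajectory.

    emis: 2-D list indexed [row][col]; rows are assumed to increase northward
          and columns eastward.
    u_ms, v_ms: eastward and northward wind components (m/s) at the target cell.
    Each lag gives the position the air parcel occupied that many seconds
    earlier under a constant wind; the emission there is weighted by
    exp(-lag / tau_s), a crude stand-in for chemical loss and dilution.
    """
    ny, nx = len(emis), len(emis[0])
    total = 0.0
    for lag in lags_s:
        # Upwind displacement in grid cells, rounded to the nearest cell
        r = round(row - v_ms * lag / cell_m)
        c = round(col - u_ms * lag / cell_m)
        if 0 <= r < ny and 0 <= c < nx:
            total += emis[r][c] * math.exp(-lag / tau_s)
    return total
```

For a single source west of the target cell, a westerly wind (`u_ms > 0`) picks up the source while an easterly wind (`u_ms < 0`) does not, which is the kind of wind sensitivity that the plume-orientation experiments probe.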

In the study region, Model A and Model B show comparable predictive performance at unobserved stations and comparable structural similarity to the CAMS fields, while Model C performs best. However, unlike Model A, both Model B and Model C reproduce plume-like structures that respond coherently to wind-field perturbations, for example by changing plume orientation when the wind direction is altered. In a transfer-learning experiment over Central Europe, Model C also achieves the highest transferability in terms of preserving spatial structure.
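
Comparing a 2 km prediction with the 10 km CAMS field requires bringing both to a common grid. A minimal sketch, assuming aligned grids and that "structural similarity" refers to the SSIM index, is to average 5 × 5 blocks of 2 km cells into 10 km cells and then compute SSIM. For brevity this version uses a single window over the whole field instead of the usual sliding local windows; `k1 = 0.01` and `k2 = 0.03` are the conventional SSIM constants, and the function names are hypothetical.

```python
def block_average(field, factor=5):
    """Average non-overlapping factor x factor blocks (5 x 5 maps 2 km cells to 10 km).

    Assumes the fine grid is aligned with the coarse grid and that both
    dimensions are multiples of `factor`.
    """
    ny, nx = len(field), len(field[0])
    coarse = []
    for i in range(0, ny, factor):
        row = []
        for j in range(0, nx, factor):
            block = [field[a][b] for a in range(i, i + factor) for b in range(j, j + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM computed over one window spanning the whole field (no sliding window)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) ** 2 for a in xs) / n
    vy = sum((b - my) ** 2 for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical fields give an SSIM of 1, while fields with opposite spatial anomalies give a negative value even when their means match, which is why SSIM captures structure that station-based error metrics do not.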

Overall, our results show that at high spatiotemporal resolution, encoding simple advection physics can help recover pollutant transport, but training on station observations alone is insufficient to capture the full dynamics and physically plausible spatial patterns. In contrast, pretraining on large-scale simulation data yields larger improvements in spatial structure, physical sensitivity, and transferability, as well as in station-based metrics. Our study highlights the value of pretraining on large-scale simulations for physically consistent, transferable learning in complex environmental systems with sparse observations.
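
The two-stage idea behind Model C can be illustrated with a toy linear model. This is not the authors' architecture, which the abstract does not describe: the sketch assumes dense simulation-derived samples (standing in for regridded CAMS fields) are available for pretraining, then fine-tunes on a handful of station-like samples starting from the pretrained weights. The learning rates, epoch counts, and function names are hypothetical.

```python
def predict(w, b, x):
    """Linear model y = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sgd_fit(samples, w, b, lr, epochs):
    """Per-sample gradient descent on squared error, starting from (w, b)."""
    w = list(w)
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def pretrain_then_finetune(sim_samples, station_samples, n_features,
                           pre_lr=0.05, pre_epochs=500, ft_lr=0.01, ft_epochs=50):
    # Stage 1 (pretraining): fit dense, simulation-derived targets from scratch.
    w_pre, b_pre = sgd_fit(sim_samples, [0.0] * n_features, 0.0, pre_lr, pre_epochs)
    # Stage 2 (fine-tuning): start from the pretrained weights and adapt to the
    # sparse station samples with a smaller learning rate and fewer epochs.
    w_ft, b_ft = sgd_fit(station_samples, w_pre, b_pre, ft_lr, ft_epochs)
    return (w_pre, b_pre), (w_ft, b_ft)
```

Starting stage 2 from the pretrained weights rather than from zeros is what distinguishes this setup from fitting the station samples alone; the smaller fine-tuning learning rate is intended to let the stations refine, rather than overwrite, the relationship learned from the simulation.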

How to cite: Sun, W., Tack, F., Clarisse, L., and Van Roozendael, M.: Physics-guided machine learning improves spatial structure and transferability in high-resolution NO2 mapping under sparse observations, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14446, https://doi.org/10.5194/egusphere-egu26-14446, 2026.