Integrating validated large-scale sensor observations into ML-based PM2.5 mapping: lessons from Europe with global relevance

Philipp Schneider; Shobitha Shetty; Amirhossein Hassani; Vasileios Salamalikis; Kerstin Stebel; Paul Hamer; Terje Koren Berntsen; Nuria Castell

doi:https://doi.org/10.5194/egusphere-egu26-7366


EGU26-7366, updated on 14 Mar 2026


EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Philipp Schneider¹, Shobitha Shetty¹,², Amirhossein Hassani¹, Vasileios Salamalikis¹, Kerstin Stebel¹, Paul Hamer¹, Terje Koren Berntsen³, and Nuria Castell¹


¹NILU, Kjeller, Norway
²SatSure, Bangalore, India
³University of Oslo, Oslo, Norway

Low-cost sensor (LCS) networks can complement sparse regulatory monitoring, but their value depends on robust integration strategies that preserve data quality while exploiting dense spatial sampling. Here we assess the added value of incorporating validated LCS PM2.5 observations into the S-MESH (Satellite and ML-based Estimation of Surface air quality at High resolution) machine learning framework (Shetty et al., 2024, 2025) to generate continental-scale, 1 km resolution surface PM2.5 fields across Central Europe. Two integration strategies are evaluated for 2021–2022 within a stacked XGBoost architecture driven by satellite aerosol optical depth, meteorological predictors, and CAMS regional forecasts: (a) using LCS data as an additional training target (LCST), and (b) using LCS information as a model input feature (LCSI) via an inverse-distance-weighted spatial convolution layer that encodes local sensor influence. Relative to a baseline trained only on official monitoring stations, LCSI yields consistent performance gains, with RMSE reductions of ~15–20% in urban areas, whereas LCST provides less consistent improvement. The resulting high-resolution mapping product achieves skill comparable to the CAMS regional reanalysis, often regarded as a modelling “gold standard” for European air-quality assessment, and in some evaluations surpasses it, with lower annual mean absolute error (2.68 vs 3.32 µg m⁻³) (Shetty et al., 2026). This demonstrates that a data-fusion ML approach including LCS information can deliver reanalysis-level performance at 1 km resolution while requiring only modest computational resources compared with running full chemical transport model reanalyses, enabling rapid updates and scalable deployment. SHAP-based attribution further suggests that LCSI improves the model’s ability to capture localized pollution variability, while performance degrades where sensor density is low, limiting representation of inter-urban transport.
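To illustrate the idea behind the LCSI strategy, the following is a minimal sketch of how an inverse-distance-weighted sensor feature could be computed for each grid cell before being passed to a regression model. It is not the S-MESH implementation: the function name `idw_sensor_feature`, the search radius, the distance exponent, and the use of projected coordinates in kilometres are all assumptions made for illustration.

```python
import numpy as np


def idw_sensor_feature(grid_xy, sensor_xy, sensor_pm25,
                       radius_km=10.0, power=2.0, eps=1e-6):
    """Inverse-distance-weighted mean of nearby sensor PM2.5 per grid cell.

    Hypothetical helper: coordinates are assumed to be projected (x, y) in km;
    radius_km and power are illustrative choices, not values from the abstract.
    Cells with no sensor inside the radius get NaN.
    """
    grid_xy = np.asarray(grid_xy, dtype=float)
    sensor_xy = np.asarray(sensor_xy, dtype=float)
    sensor_pm25 = np.asarray(sensor_pm25, dtype=float)

    # Pairwise distances between grid cells and sensors: (n_grid, n_sensors).
    diff = grid_xy[:, None, :] - sensor_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Weight 1/d^power inside the search radius, zero outside it.
    weights = np.where(dist <= radius_km, 1.0 / (dist + eps) ** power, 0.0)
    weight_sum = weights.sum(axis=1)

    # Normalised weighted mean where at least one sensor contributes.
    feature = np.full(grid_xy.shape[0], np.nan)
    has_sensor = weight_sum > 0
    feature[has_sensor] = (weights[has_sensor] @ sensor_pm25) / weight_sum[has_sensor]
    return feature


# Two grid cells: one near two sensors, one far from any sensor.
grid = [[0.0, 0.0], [100.0, 100.0]]
sensors = [[1.0, 0.0], [2.0, 0.0]]
values = [10.0, 20.0]  # made-up PM2.5 readings in µg m⁻³
print(idw_sensor_feature(grid, sensors, values))
```

In this toy case the first cell is dominated by the closer sensor, while the second cell has no sensor within the radius and stays undefined. That mirrors the reported behaviour that performance degrades where sensor density is low.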

Although demonstrated in Europe, the underlying methodology, namely combining globally available satellite products and meteorology with quality-controlled LCS networks in a computationally efficient ML framework, has the potential to strengthen air-quality assessment in resource-limited settings where regulatory infrastructure is scarce. This requires that appropriate sensor calibration/validation workflows are in place and that equitable partnerships support sustainable sensor deployment and data stewardship.

Shetty, S., Schneider, P., Stebel, K., Hamer, P. D., Kylling, A., and Koren Berntsen, T.: Estimating surface NO2 concentrations over Europe using Sentinel-5P TROPOMI observations and Machine Learning, Remote Sens. Environ., 312, 114321, https://doi.org/10.1016/j.rse.2024.114321, 2024.

Shetty, S., Hamer, P. D., Stebel, K., Kylling, A., Hassani, A., Berntsen, T. K., and Schneider, P.: Daily high-resolution surface PM2.5 estimation over Europe by ML-based downscaling of the CAMS regional forecast, Environ. Res., 264, 120363, https://doi.org/10.1016/j.envres.2024.120363, 2025.

Shetty, S., Hassani, A., Hamer, P. D., Stebel, K., Salamalikis, V., Berntsen, T. K., Castell, N., and Schneider, P.: Evaluating the role of low-cost sensors in machine learning based European PM2.5 monitoring, Environ. Res., 291, 123558, https://doi.org/10.1016/j.envres.2025.123558, 2026.

How to cite: Schneider, P., Shetty, S., Hassani, A., Salamalikis, V., Stebel, K., Hamer, P., Berntsen, T. K., and Castell, N.: Integrating validated large-scale sensor observations into ML-based PM2.5 mapping: lessons from Europe with global relevance, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7366, https://doi.org/10.5194/egusphere-egu26-7366, 2026.