EGU26-18107, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-18107
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Wednesday, 06 May, 14:27–14:30 (CEST)
 
vPoster spot 5
Poster | Wednesday, 06 May, 16:15–18:00 (CEST), Display time Wednesday, 06 May, 14:00–18:00
 
vPoster Discussion, vP.10
Satellite-Based PM2.5 Estimation in Data-Sparse Urban Environments: Comparing Machine Learning and Geostatistical Approaches in Kolkata, India
Anjali Raj1, Tirthankar Dasgupta2, Manjira Sinha2, and Adway Mitra1
  • 1Department of Artificial Intelligence, Indian Institute of Technology Kharagpur, India
  • 2Tata Consultancy Services Research, Kolkata, India

Fine particulate matter (PM2.5) is among the foremost environmental determinants of human health, contributing to cardiovascular disease, respiratory illness, and premature mortality. In rapidly urbanizing regions of the Global South, accurate spatial characterization of PM2.5 exposure requires spatially continuous concentration surfaces that also provide reliable uncertainty estimates, yet ground-based monitoring networks remain severely sparse. Kolkata, India’s third-largest metropolitan area (population 14.9 million), exemplifies this challenge: only seven regulatory monitoring stations cover the entire city, leaving large areas unobserved.

This study evaluates how different PM2.5 surface generation strategies—satellite-based machine learning (ML) and spatial interpolation—differ not only in predictive accuracy but also in their ability to provide decision-relevant uncertainty under sparse monitoring conditions. Using six years of daily observations (2019–2024), we compare two complementary approaches. The first employs satellite-based ML, integrating Sentinel-5P trace gases, MODIS aerosol optical depth, ERA5 meteorological reanalysis, and static urban features (VIIRS nightlights, population density) to predict PM2.5. The second evaluates spatial interpolation methods—ordinary kriging, inverse distance weighting (IDW), and simple averaging—using station observations alone.

For satellite-based ML (Random Forest), the station-level model achieved R² = 0.79 under leave-one-station-out (LOSO) validation, while the grid-based model trained on kriging-interpolated targets reached R² = 0.70 under temporal out-of-sample validation (train: 2019–2022, test: 2023–2024). Feature importance analysis consistently identified dewpoint temperature, air temperature, and surface albedo as dominant predictors, indicating that atmospheric conditions exert stronger control on PM2.5 variability than emission proxies or land-use variables.
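The two validation schemes named above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`r2_score`, `loso_splits`, `temporal_split`) and the record layout (one station identifier and one year per observation) are assumptions made for illustration, and any regressor, such as a Random Forest, would be fitted on the training indices of each fold.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loso_splits(station_ids):
    """Yield (held_out_station, train_idx, test_idx), one fold per station.

    Every observation from the held-out station goes to the test set, so the
    model is always scored at a location it never saw during training.
    """
    by_station = defaultdict(list)
    for i, station in enumerate(station_ids):
        by_station[station].append(i)
    for station in sorted(by_station):
        test_idx = by_station[station]
        train_idx = [i for i, s in enumerate(station_ids) if s != station]
        yield station, train_idx, test_idx


def temporal_split(years, last_train_year=2022):
    """Split indices by year: train up to last_train_year, test afterwards."""
    train_idx = [i for i, y in enumerate(years) if y <= last_train_year]
    test_idx = [i for i, y in enumerate(years) if y > last_train_year]
    return train_idx, test_idx
```

LOSO measures how well a model transfers to unmonitored locations, whereas the temporal split measures how well it transfers to future years, so the two R² values answer different questions and are not directly interchangeable.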

For spatial interpolation evaluated under daily LOSO, all methods achieved comparable point prediction accuracy (R² ≈ 0.85). However, uncertainty calibration diverged sharply. Ordinary kriging achieved 88% empirical coverage for nominal 95% prediction intervals (90% when including observation noise), approaching theoretical calibration, whereas IDW and simple averaging exhibited severe under-coverage (45–52%), substantially underestimating true prediction error.
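As a rough illustration of the quantities compared here, the sketch below shows a basic IDW predictor and an empirical coverage calculation. It makes several assumptions not stated in the abstract: planar coordinates, a distance power of 2, and symmetric Gaussian intervals of the form prediction ± 1.96 × standard error. How the authors derived intervals for IDW and simple averaging is not described, so `std_errors` is a hypothetical input.

```python
import math


def idw_predict(target_xy, stations, power=2.0):
    """Inverse distance weighting: weights proportional to 1 / distance**power.

    `stations` is a list of ((x, y), value) pairs; coordinates are assumed to
    be planar (for example, projected metres), which is an assumption here.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in stations:
        d = math.hypot(target_xy[0] - x, target_xy[1] - y)
        if d == 0.0:
            # Target coincides with a station: return its value directly.
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def empirical_coverage(observed, predicted, std_errors, z=1.96):
    """Fraction of observations inside predicted +/- z * std_error.

    With z = 1.96 (assumed Gaussian errors), a well-calibrated method should
    return a value close to 0.95.
    """
    hits = sum(
        1
        for o, p, s in zip(observed, predicted, std_errors)
        if abs(o - p) <= z * s
    )
    return hits / len(observed)
```

Under daily LOSO, each station would be withheld in turn, predicted from the remaining stations, and the held-out observation checked against its interval. Coverage far below the nominal level, as reported for IDW and simple averaging, means the stated intervals are too narrow.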

These findings yield three key insights: (1) satellite-derived predictors enable spatially complete PM2.5 estimation beyond monitoring locations, though with moderate accuracy; (2) when temporally aligned station data are available, interpolation achieves higher point accuracy than satellite-based ML; and (3) regardless of estimation strategy, only geostatistical approaches provide uncertainty estimates suitable for health-protective decision-making. We conclude that hybrid frameworks combining satellite-based spatial prediction with kriging-derived uncertainty characterization offer a principled pathway for generating spatially complete and risk-aware PM2.5 maps in data-sparse urban environments.

How to cite: Raj, A., Dasgupta, T., Sinha, M., and Mitra, A.: Satellite-Based PM2.5 Estimation in Data-Sparse Urban Environments: Comparing Machine Learning and Geostatistical Approaches in Kolkata, India, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-18107, https://doi.org/10.5194/egusphere-egu26-18107, 2026.