Evaluating Environmental and Temporal Performance of Machine Learning Calibration Models for Low-cost Particulate Matter Sensors: A Case Study Across 4 Indian Cities

Roshan Wathore; Devishree Jadhao; Abhishek Chakraborty; Nitin Labhasetwar

doi:https://doi.org/10.5194/egusphere-egu26-976

EGU26-976, updated on 13 Mar 2026

EGU General Assembly 2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

Roshan Wathore¹,², Devishree Jadhao¹, Abhishek Chakraborty³,⁴, and Nitin Labhasetwar¹,²

¹CSIR-National Environmental Engineering Research Institute, CSIR-NEERI, Nagpur 440020, India
²Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
³Environmental Science and Engineering Department, Indian Institute of Technology Bombay, Mumbai, 400076, India
⁴Centre for Climate Studies, Indian Institute of Technology Bombay, Mumbai, 400076, India

Low-cost particulate matter sensors (LCPMS) offer scalable and affordable capabilities that complement regulatory-grade monitoring networks by enabling high-resolution urban air quality monitoring. However, they frequently suffer from inaccuracies arising from environmental and temporal variability, so robust calibration is needed to ensure measurement reliability. Co-location studies against reference-grade monitors, combined with machine learning (ML) calibration algorithms, have emerged as effective strategies to significantly improve LCPMS performance. In this study, a long-term co-location experiment was conducted in Vishakhapatnam, India, from February 2018 to January 2020, incorporating environmental and temporal covariates: temperature, relative humidity, hour of day, and month of year. The baseline Linear Regression (LR) model used only raw sensor readings as input. Subsequent models incrementally incorporated environmental variables (temperature and relative humidity), temporal features (hour of day and month of year), and finally all covariates combined. The ML approaches included LR, Random Forest (RF), eXtreme Gradient Boosting (XGB), and a hybrid ensemble combining the best-performing models, with all comparisons made relative to the baseline LR model. Results demonstrate that the ML models, particularly the hybrid ensemble, yielded substantial improvements in predictive accuracy. The baseline LR model exhibited an RMSE of 17.62 µg/m³. In comparison, the best-performing RF model achieved a 58% RMSE reduction, while the hybrid ensemble model attained a 63% reduction relative to the baseline, satisfying the performance criteria recommended by the USEPA. We also examine model performance across temporal, environmental, and AQI categories to identify potential performance variations and to inform strategies for maintaining reliable measurements under changing environmental and pollution conditions. Although the hybrid model performed best overall, the analysis highlights that no single model consistently performs optimally across all conditions, suggesting that adaptive calibration strategies, such as using different models for different seasons or environmental conditions, are more effective than relying on a single model throughout the year.
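The incremental comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the humidity-dependent sensor response is an assumed toy model, scikit-learn's `GradientBoostingRegressor` stands in for XGB, and the "hybrid" is assumed to be an unweighted mean of the two tree-based predictions, since the abstract does not specify how the ensemble is built.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000

# Synthetic stand-in for a co-location record (NOT the study's data).
hour = rng.integers(0, 24, n)
month = rng.integers(1, 13, n)
temp = 28 + 5 * np.sin(2 * np.pi * (hour - 9) / 24) + rng.normal(0, 1, n)
rh = np.clip(65 - 1.5 * (temp - 28) + rng.normal(0, 10, n), 20, 95)
reference = rng.gamma(shape=3.0, scale=15.0, size=n)  # "true" PM2.5, µg/m³
# Assumed sensor response: humidity-dependent overestimation plus noise.
growth = 1 + 0.4 * (rh / 100) ** 2 / (1 - rh / 100)
raw = reference * growth + rng.normal(0, 2, n)

features = {"raw": raw, "temp": temp, "rh": rh, "hour": hour, "month": month}
feature_sets = {
    "raw only": ["raw"],
    "raw + environmental": ["raw", "temp", "rh"],
    "raw + temporal": ["raw", "hour", "month"],
    "all covariates": ["raw", "temp", "rh", "hour", "month"],
}


def rmse(y_true, y_pred):
    """Root-mean-square error in the units of the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def make_models():
    """Fresh, unfitted models for each feature set."""
    return {
        "LR": LinearRegression(),
        "RF": RandomForestRegressor(n_estimators=50, random_state=0),
        "GB": GradientBoostingRegressor(random_state=0),  # stand-in for XGB
    }


idx_train, idx_test = train_test_split(np.arange(n), test_size=0.25, random_state=0)

results = {}
for set_name, cols in feature_sets.items():
    X = np.column_stack([features[c] for c in cols])
    preds = {}
    for model_name, model in make_models().items():
        model.fit(X[idx_train], reference[idx_train])
        preds[model_name] = model.predict(X[idx_test])
        results[(set_name, model_name)] = rmse(reference[idx_test], preds[model_name])
    # Assumed hybrid: unweighted mean of the two tree-based predictions.
    hybrid = (preds["RF"] + preds["GB"]) / 2
    results[(set_name, "hybrid")] = rmse(reference[idx_test], hybrid)

# Report every model relative to the raw-only LR baseline.
baseline = results[("raw only", "LR")]
for (set_name, model_name), value in results.items():
    reduction = 100 * (1 - value / baseline)
    print(f"{set_name:20s} {model_name:6s} RMSE={value:6.2f}  reduction={reduction:5.1f}%")
```

The same loop could be extended to report RMSE per season, humidity bin, or AQI category, which is one way to probe whether a single model holds up across conditions.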

To examine the generalizability of this ML-based calibration framework, we used a publicly available co-location dataset (Campmier et al., 2023) covering three Indian cities: Delhi, Hamirpur, and Bangalore. For these cities, the RMSE of the baseline model (factory calibration) is 90.5 µg/m³, 123 µg/m³, and 75.3 µg/m³, respectively, and a physics-based Köhler theory calibration model reduced RMSE by 66%, 83%, and 75%, respectively. Our calibration framework outperformed these results, with reductions of 77%, 95%, and 97% in the respective cities, demonstrating strong generalizability across different urban contexts. These improvements highlight the advantages of ML-based methods in capturing nonlinear sensor-environment interactions and in addressing the limitations of physics-based or factory-derived calibration algorithms, which assume fixed aerosol properties or rely on simplified empirical relationships. Collectively, our findings indicate that ML-based calibration frameworks enhance measurement accuracy and generalize effectively across geographically diverse urban Indian environments, which are often characterized by high PM₂.₅ levels. The proposed framework has the potential to serve as a reliable and scalable solution for improving LCPMS performance in large-scale air quality monitoring efforts. It is easy to incorporate, computationally inexpensive, and agnostic to sensor models, target pollutants, and calibration approaches.
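To make the percentage reductions above easier to compare, the short sketch below converts them into implied post-calibration RMSE values. It uses only the figures quoted in the abstract; because the reported percentages are rounded, the implied values are approximate, and the helper name `implied_rmse` is introduced here purely for illustration.

```python
# Figures quoted in the abstract: baseline RMSE (µg/m³) and percent reductions.
cities = {
    "Delhi": {"baseline": 90.5, "kohler": 66, "ml": 77},
    "Hamirpur": {"baseline": 123.0, "kohler": 83, "ml": 95},
    "Bangalore": {"baseline": 75.3, "kohler": 75, "ml": 97},
}


def implied_rmse(baseline, reduction_pct):
    """RMSE remaining after a given percent reduction from the baseline."""
    return baseline * (1 - reduction_pct / 100)


implied = {}
for city, v in cities.items():
    implied[city] = {
        "kohler": implied_rmse(v["baseline"], v["kohler"]),
        "ml": implied_rmse(v["baseline"], v["ml"]),
    }
    print(
        f"{city:10s} baseline={v['baseline']:6.1f}  "
        f"Köhler={implied[city]['kohler']:6.2f}  ML={implied[city]['ml']:6.2f}"
    )
```

Under these figures, the ML framework's implied RMSE is roughly 21, 6, and 2 µg/m³ for Delhi, Hamirpur, and Bangalore, against roughly 31, 21, and 19 µg/m³ for the Köhler-theory model.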

How to cite: Wathore, R., Jadhao, D., Chakraborty, A., and Labhasetwar, N.: Evaluating Environmental and Temporal Performance of Machine Learning Calibration Models for Low-cost Particulate Matter Sensors: A Case Study Across 4 Indian Cities, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-976, https://doi.org/10.5194/egusphere-egu26-976, 2026.

This contribution takes part in the OSPP contest.

Supplementary materials

Supplementary materials version 1, uploaded on 07 May 2026.