PM2.5 Reconstruction using MERRA-2 using Ensemble Machine Learning Approach and Long-term Analysis for India (1980-2021)

Vikas Kumar; Vasudev Malyan; Manoranjan Sahu

doi:https://doi.org/10.5194/egusphere-egu23-10764

Session AS5.13

EGU23-10764, updated on 26 Feb 2023

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Vikas Kumar¹, Vasudev Malyan², and Manoranjan Sahu¹,²,³

¹Indian Institute of Technology Bombay, Interdisciplinary Programme (IDP) in Climate Studies, India (214400003@iitb.ac.in)
²Indian Institute of Technology Bombay, Environmental Science and Engineering Department, India (malyanvasudev@iitb.ac.in)
³Indian Institute of Technology Bombay, Centre for Machine Intelligence and Data Science, India (mrsahu@iitb.ac.in)

Exposure to particulate matter affects more people globally than any other air pollutant. However, because instruments are expensive and infrastructure is lacking, a monitoring network with high spatiotemporal resolution is not feasible, which leaves many regions data-scarce. Satellite and reanalysis datasets can be used to estimate particulate matter, but they do not provide the surface concentration directly; it must be reconstructed from the individual components. In this study, a machine learning (ML) framework is implemented to reconstruct PM₂.₅ from MERRA-2 data components, namely black carbon (BC), organic carbon (OC), dust (DUST), sea salt (SS), and sulfate (SO₄). Ground-level measurements from 335 continuous ambient air quality monitoring stations (CAAQMS) in India and the corresponding MERRA-2 data were collected for 2017-2021 at hourly resolution. Random forest (RF) performs better, with train and test scores (R²) of 0.86 and 0.74, respectively, whereas the empirical equation yields an R² of only 0.27 on test data. The estimated PM₂.₅ for Indian states from 1980 to 2021 indicates a significant increase in most states. States in the Indo-Gangetic Plain, such as Delhi, Punjab, Haryana, and Uttar Pradesh, are the most polluted regions of India. The major shift in concentration occurs from 2000 onwards, which can be seen as a direct result of the economic liberalization policies implemented in 1991. The results provide evidence of the limitations of applying the empirical equation broadly and of the feasibility of ML algorithms as a reconstruction technique for developing robust and accurate region-specific models from MERRA-2 data.
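The abstract does not state the empirical equation it evaluates. As a minimal sketch, the Python below assumes the commonly cited MERRA-2 reconstruction, in which the PM₂.₅-sized dust and sea salt fractions are added to BC, OC scaled by 1.4 (organic carbon to organic matter), and SO₄ scaled by 1.375 (sulfate ion to ammonium sulfate). The coefficients, the function names, and the choice of µg m⁻³ as the input unit are assumptions for illustration, not the authors' implementation. An R² helper is included because that is the score the abstract reports for both approaches.

```python
from typing import Sequence

# Assumed coefficients (not given in the abstract): the commonly cited
# MERRA-2 surface PM2.5 reconstruction.
OC_TO_ORGANIC_MATTER = 1.4          # organic carbon -> organic matter
SO4_TO_AMMONIUM_SULFATE = 1.375     # sulfate ion -> ammonium sulfate


def empirical_pm25(bc: float, oc: float, dust: float, ss: float, so4: float) -> float:
    """Reconstruct surface PM2.5 from MERRA-2 aerosol components.

    All inputs are surface mass concentrations in the same unit
    (assumed here to be ug/m3); dust and ss are the PM2.5 size fractions.
    """
    return (
        dust
        + ss
        + bc
        + OC_TO_ORGANIC_MATTER * oc
        + SO4_TO_AMMONIUM_SULFATE * so4
    )


def r2_score(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A region-specific ML model of the kind described above would replace the fixed coefficients with a regressor trained on the same five components against the ground-station PM₂.₅, and both could then be compared with `r2_score` on held-out data.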

How to cite: Kumar, V., Malyan, V., and Sahu, M.: PM2.5 Reconstruction using MERRA-2 using Ensemble Machine Learning Approach and Long-term Analysis for India (1980-2021), EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-10764, https://doi.org/10.5194/egusphere-egu23-10764, 2023.
