Enhancing Urban PM2.5 Predictions: An Innovative Machine Learning Approach to Address Data Gaps
- ¹Dept. of Engineering ‘Enzo Ferrari’, University of Modena and Reggio Emilia, Modena, 41125, Italy
- ²University School for Advanced Studies IUSS Pavia, Palazzo Del Broletto, Pavia, 27100, Italy
- ³School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- ⁴Department of Environmental Sciences, Faculty of Meteorology, Environment and Arid Land Agriculture, King Abdulaziz University, Jeddah 21589, Saudi Arabia
PM2.5 (particulate matter < 2.5 µm in diameter) pollution is a significant environmental and public health concern in Europe. According to the European Environment Agency (EEA, 2023), 97% of the urban population is exposed to concentrations above the World Health Organization's (WHO) 2021 annual guideline level of 5 µg m⁻³. PM levels can be predicted with various models, such as Chemical Transport Models (CTMs) and statistical approaches based on meteorological variables. Machine Learning (ML) tools, particularly tree-based algorithms, outperform linear models because atmospheric species respond non-linearly to environmental conditions and emissions.
Our research introduces a novel methodology for predicting PM2.5 levels at fine spatial and temporal scales using ML tools. The primary objective is to demonstrate the methodology's ability to estimate PM2.5 concentrations in urban areas where direct observations are unavailable. To this end, we compiled a hybrid dataset using data from an intensive aerosol campaign conducted in the Selly Oak neighbourhood of Birmingham, UK, from 15 April to 20 June 2023. The campaign focused on a 1×1 km² block housing approximately 10,000 university students. Four low-cost Optical Particle Counters (OPC-N3, Alphasense, UK) were deployed at fixed locations within the study area, measuring the particle number size distribution (PNSD) in 24 bins from 0.35 to 40 µm, as well as PM1, PM2.5, and PM10 mass concentrations. Four additional OPC-N3 units were carried in backpacks for mobile aerosol mapping. Data collection followed a citizen science approach: local businesses and schools hosted the static sensors, and university students carried the mobile sensors. All sensors were calibrated by co-location with research-grade instruments at the Birmingham Air Quality Supersite (BAQS), both before and after the campaign.
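The abstract does not state how the co-location calibration was performed. As a minimal sketch, assuming a simple linear correction fitted by ordinary least squares of reference PM2.5 on sensor PM2.5, the step might look like the following. The function names, and the averaging of pre- and post-campaign coefficients as a way to account for drift, are hypothetical illustrations rather than the authors' procedure.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, float]  # (slope, intercept)


def fit_linear_calibration(sensor: Sequence[float], reference: Sequence[float]) -> Coeffs:
    """Ordinary least squares fit of reference ~= slope * sensor + intercept.

    Assumption: a linear correction is adequate; the abstract does not specify the model.
    """
    mx, my = fmean(sensor), fmean(reference)
    sxx = sum((x - mx) ** 2 for x in sensor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sensor, reference))
    slope = sxy / sxx
    return slope, my - slope * mx


def combine_pre_post(pre: Coeffs, post: Coeffs) -> Coeffs:
    """Hypothetical drift handling: average the pre- and post-campaign coefficients."""
    return (pre[0] + post[0]) / 2, (pre[1] + post[1]) / 2


def apply_calibration(values: Sequence[float], coeffs: Coeffs) -> List[float]:
    """Correct raw sensor PM2.5 readings with the fitted coefficients."""
    slope, intercept = coeffs
    return [slope * v + intercept for v in values]
```

For example, `fit_linear_calibration([1, 2, 3, 4], [3, 5, 7, 9])` returns `(2.0, 1.0)`. Interpolating the coefficients in time between the two co-location periods would be an equally plausible alternative to simple averaging.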
To analyse the PM2.5 distribution along different road segments in detail, the road network was divided into 30 m segments, and the centroid of each segment was computed. Spatially resolved proxy variables for atmospheric emissions were assigned to each centroid, including population data, average traffic counts by vehicle type, road rank, and the average frequency distribution of vehicle speeds. The hybrid dataset also integrated meteorological parameters from BAQS (wind speed, wind direction, atmospheric pressure, relative humidity, and air temperature) and aerosol properties measured by reference instruments at BAQS.
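A minimal sketch of the segmentation step, assuming each road is available as a polyline in a projected coordinate system with units of metres. It cuts the polyline every 30 m and takes the length-weighted centroid of each piece. As a hypothetical illustration of attaching point data (such as georeferenced mobile readings) to segments, it also assigns points to the nearest centroid and averages them per segment; the abstract does not say how the assignment was actually done, and all function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates in metres (assumption)


def split_polyline(coords: Sequence[Point], seg_len: float = 30.0, eps: float = 1e-9) -> List[List[Point]]:
    """Cut a polyline into consecutive pieces of length seg_len (the last may be shorter)."""
    segments: List[List[Point]] = []
    current: List[Point] = [coords[0]]
    remaining = seg_len  # length still needed to complete the current piece
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        edge = math.hypot(x1 - x0, y1 - y0)
        used = 0.0  # distance already consumed along this edge
        while edge - used >= remaining:
            used += remaining
            t = used / edge
            cut = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            current.append(cut)
            segments.append(current)
            current = [cut]
            remaining = seg_len
        remaining -= edge - used
        if (x1, y1) != current[-1]:
            current.append((x1, y1))
    # Keep the trailing piece only if it has non-negligible length.
    if len(current) > 1 and seg_len - remaining > eps:
        segments.append(current)
    return segments


def line_centroid(pts: Sequence[Point]) -> Point:
    """Length-weighted centroid of a polyline (each edge weighted by its length)."""
    total = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        cx += length * (x0 + x1) / 2
        cy += length * (y0 + y1) / 2
        total += length
    return cx / total, cy / total


def nearest_centroid(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for each point (brute force, fine for small networks)."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def mean_by_segment(values: Sequence[float], labels: Sequence[int]) -> Dict[int, float]:
    """Average the values assigned to each segment index."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for v, k in zip(values, labels):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}
```

For a straight 100 m road from (0, 0) to (100, 0), `split_polyline` yields four pieces (30, 30, 30 and 10 m) whose centroids lie at x = 15, 45, 75 and 95 m. For a real network, a GIS library and a spatial index would replace the brute-force nearest-centroid search.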
Three distinct calibration approaches were employed:
1) Standard Random Forest (RF) regression with an 80/20 train-test split to predict PM2.5 levels from the input features (R² = 0.85, MBE = -0.01 µg m⁻³).
2) Sensor transferability evaluation: the RF was calibrated on one OPC unit and evaluated on an independent OPC unit (best performance: R² = 0.65, MBE = 0.43 µg m⁻³). This approach assesses how well the model generalizes across sensors.
3) Road transferability evaluation: the model was calibrated on one road and evaluated on a different, previously unseen road (R² = 0.71, MBE = -1.14 µg m⁻³). This approach tests the model's ability to generalize across road types.
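The three evaluation designs differ mainly in how the rows are split into training and test sets. The plain-Python sketch below assumes each row is a dict tagged with hypothetical `sensor` and `road` keys; it also assumes that R² is the coefficient of determination and that MBE is the mean of (predicted − observed), neither of which the abstract defines. The model itself is left out; scikit-learn's `RandomForestRegressor`, trained with `fit` and applied with `predict`, would be one way to fill that slot.

```python
import random
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def r2_score(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R2)."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mean_bias_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean of (predicted - observed); negative means underestimation (assumed sign convention)."""
    return fmean(p - o for o, p in zip(obs, pred))


def random_split(rows: Sequence[Row], test_frac: float = 0.2, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Approach 1: shuffled 80/20 train-test split."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_frac)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def transfer_split(rows: Sequence[Row], key: str, train_value: object, test_value: object) -> Tuple[List[Row], List[Row]]:
    """Approaches 2 and 3: train on one sensor (key='sensor') or road (key='road'), test on another."""
    train = [r for r in rows if r[key] == train_value]
    test = [r for r in rows if r[key] == test_value]
    return train, test
```

Holding out an entire sensor or road, rather than random rows, prevents near-duplicate measurements from the same unit or street appearing in both sets. This is why the transferability scores can be expected to sit below the random-split score.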
This methodology has significant potential to increase spatial resolution beyond what regulatory monitoring networks provide, to refine air quality predictions, and to improve the exposure assessments that are critical for investigating health impacts.
How to cite: Baruah, A., Bousiotis, D., Damayanti, S., Bigi, A., Ghermandi, G., Harrison, R. M., and Pope, F. D.: Enhancing Urban PM2.5 Predictions: An Innovative Machine Learning Approach to Address Data Gaps, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6448, https://doi.org/10.5194/egusphere-egu24-6448, 2024.