EGU24-1214, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-1214
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Forecasting PM2.5 concentrations using machine learning approaches: added value of low-cost monitoring and regional modeling

K. Santiago Hernández1,2, Duvan Nieves1, Jhayron S. Pérez-Carrasquilla1,3, Paola Montoya1, Manuel D. Zuluaga1, and Mauricio Ramírez1
  • 1Early Warning System of Medellín and the Aburrá Valley - SIATA, Metropolitan Area of the Aburrá Valley, Medellín, Colombia
  • 2Environmental School, Engineering Faculty, University of Antioquia, Medellín, Colombia
  • 3Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD, U.S.

Machine learning (ML) techniques have become important tools for forecasting air pollution events because of their relatively low computational cost and skillful results at horizons of up to 3 days. In this study, we forecast PM2.5 concentrations measured by low-cost sensors in the Aburrá Valley, a densely populated region of complex terrain in the Colombian Andes. ML models such as Artificial Neural Networks, Random Forest, Gradient Boosting, and Support Vector Regression were trained for each forecast horizon (up to 72 hours) using data from satellites and global atmospheric models, which are also available for other cities with little in-situ information. The predictors include 2-meter temperature, boundary layer height, latent heat flux, winds at different levels, and precipitation from the Global Forecast System (GFS); total aerosol optical depth (AOD), dust AOD, black carbon AOD, and sea salt AOD from the CAMS Global Atmospheric Composition Forecast; and an index calculated from predicted back-trajectories and the fire radiative power of hotspots detected by MODIS, which accounts for long-range transport of biomass burning aerosols. As an added value, we investigated the effect of including real-time PM2.5 concentrations from low-cost sensors, as well as operational forecasts produced with the WRF regional model by the Early Warning System of Medellín and the Aburrá Valley (SIATA). The predictions were evaluated with multiple performance metrics and during a special air quality period in which air pollution in the region increases. Our results show that ML-based forecasts perform better than those obtained directly from CAMS. Including real-time measurements significantly improves forecast performance during the first 24 hours after initialization.
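The idea of training a model for each forecast horizon can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-nearest-neighbour regressor is a standard-library stand-in for the ML models named above, the two predictors (latest PM2.5 observation and boundary layer height valid at the target time) are a reduced, assumed feature set, the function and class names are hypothetical, and the data are synthetic.

```python
import math
from math import dist

HORIZONS_H = (1, 24, 48, 72)  # illustrative subset of lead times, in hours


def build_pairs(pm25, predictor, horizon):
    """Build (features, targets) for one forecast horizon.

    Features at initialization time t: the latest PM2.5 observation and an
    exogenous predictor valid at t + horizon. Target: PM2.5 at t + horizon.
    """
    features, targets = [], []
    for t in range(len(pm25) - horizon):
        features.append((pm25[t], predictor[t + horizon]))
        targets.append(pm25[t + horizon])
    return features, targets


class KNNRegressor:
    """Tiny k-nearest-neighbour regressor (assumed stand-in model).

    Features are not rescaled here; a real application would standardize
    them so that no single predictor dominates the distance.
    """

    def __init__(self, k=5):
        self.k = k

    def fit(self, features, targets):
        self.features, self.targets = list(features), list(targets)
        return self

    def predict_one(self, x):
        pairs = zip(self.features, self.targets)
        nearest = sorted(pairs, key=lambda ft: dist(ft[0], x))[: self.k]
        return sum(target for _, target in nearest) / len(nearest)


def train_per_horizon(pm25, predictor, horizons=HORIZONS_H, k=5):
    """Fit one independent model per horizon (direct multi-horizon strategy)."""
    models = {}
    for h in horizons:
        features, targets = build_pairs(pm25, predictor, h)
        models[h] = KNNRegressor(k).fit(features, targets)
    return models


# Synthetic hourly data: 30 days with a purely diurnal cycle (not real data).
hours = range(24 * 30)
blh = [500 + 400 * math.sin(2 * math.pi * t / 24) for t in hours]
pm25 = [30 - 0.02 * (b - 500) for b in blh]

models = train_per_horizon(pm25, blh)
# In operations the second feature would be a model forecast valid at the
# target time; here the last synthetic value is reused as a placeholder.
forecast_24h = models[24].predict_one((pm25[-1], blh[-1]))
```

Keeping the models independent means each horizon can rely on different predictors, for example recent observations at short lead times and model meteorology at longer ones.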
In addition, meteorological data from the WRF model help extend the usefulness of the forecasts to longer horizons (2 to 3 days). Because the approach relies on satellite data and global atmospheric models, it can be easily replicated in other cities with scarce in-situ information. Finally, this work highlights the value of these tools for air quality management and serves as a reference framework for implementing forecasting tools where air quality data are scarce.
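One common way to compare forecasts against a reference (such as a direct CAMS forecast) at each horizon is an RMSE-based skill score. The abstract does not state which metrics were used, so the sketch below is an assumption about one reasonable choice; the function names are hypothetical and the numbers are made up for illustration, not results from the study.

```python
from math import sqrt


def rmse(forecast, observed):
    """Root-mean-square error between two equally long sequences."""
    if len(forecast) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return sqrt(squared / len(observed))


def rmse_skill_score(forecast, reference, observed):
    """Return 1 - RMSE(forecast) / RMSE(reference).

    Positive values mean the forecast beats the reference, zero means equal
    error, and negative values mean it is worse.
    """
    reference_error = rmse(reference, observed)
    if reference_error == 0:
        raise ValueError("reference forecast is perfect; skill is undefined")
    return 1.0 - rmse(forecast, observed) / reference_error


def skill_by_horizon(model_fc, reference_fc, observed):
    """Compute the skill score per horizon; inputs are dicts keyed by horizon."""
    return {
        h: rmse_skill_score(model_fc[h], reference_fc[h], observed[h])
        for h in sorted(model_fc)
    }


# Made-up PM2.5 values (µg/m³) for two horizons, for illustration only.
observed = {24: [10.0, 20.0, 30.0], 72: [10.0, 20.0, 30.0]}
model_fc = {24: [11.0, 19.0, 31.0], 72: [12.0, 18.0, 32.0]}
reference_fc = {24: [12.0, 18.0, 32.0], 72: [12.0, 18.0, 32.0]}

skill = skill_by_horizon(model_fc, reference_fc, observed)
```

Computing the score separately for each horizon shows at which lead times an added data source, such as real-time observations or regional model output, changes forecast quality.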

How to cite: Hernández, K. S., Nieves, D., Pérez-Carrasquilla, J. S., Montoya, P., Zuluaga, M. D., and Ramírez, M.: Forecasting PM2.5 concentrations using machine learning approaches: added value of low-cost monitoring and regional modeling, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1214, https://doi.org/10.5194/egusphere-egu24-1214, 2024.