Big data techniques to improve the performance of an air quality model in a mega-city with limited air pollution monitoring
- (1) Centro de Investigaciones del Mar y la Atmósfera (CIMA), UBA-CONICET-CNRS-IRD IFAECI, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- (2) Leloir Institute - IIBBA/CONICET, Buenos Aires, Argentina
Urban-scale atmospheric dispersion models play a crucial role in air quality (AQ) management, enabling the evaluation of pollutant concentration distribution in unsampled regions and determining necessary emission reductions for compliance with local regulations. However, a significant challenge in implementing air quality models is that their performance assessment requires observations from numerous AQ monitoring stations, a resource often lacking in low- and middle-income countries. This constraint is particularly evident in the case of the DAUMOD-GRS model, developed for estimating nitrogen dioxide (NO2) and ozone (O3) concentrations in the Metropolitan Area of Buenos Aires (MABA), where AQ monitoring is scarce. In an effort to overcome this limitation and comprehensively understand model outcomes, even in non-monitored areas, we have devised two innovative methods employing big data techniques. The first method focuses on analysing both input and output (I/O) conditions that are associated with elevated air pollutant concentrations, without relying on observational data. For instance, applying a clustering analysis to an ensemble of I/O data related to summer maximum O3 concentrations in the MABA showed four distinct solution patterns varying with emissions. This analysis revealed different ozone dynamics in the suburban areas. A similar approach used to investigate conditions leading to elevated hourly NO2 concentrations suggested that the model's memory effect could contribute significantly to overestimations in low emission zones of the MABA under conditions of low wind speed. The second method was used to analyse the first long time series of hourly NO2 concentrations measured in the city, which has recently become available. This has allowed a comprehensive assessment of the performance of DAUMOD-GRS.
While the model shows an overall acceptable performance at the three monitoring sites, a complementary methodology was introduced to discern whether errors are randomly distributed or concentrated in specific regions within the space of the input data conditions. Employing a k-means algorithm on three daily-calculated performance metrics (fractional bias, FB; normalised mean square error, NMSE; and correlation coefficient, R), we ranked days according to their levels of model performance. This approach revealed a systematic underestimation of NO2 concentration at the coastal monitoring site when winds come from the river, suggesting a significant impact of the southernmost power plant. Furthermore, it highlighted that the removal of the memory effect leads to an improved estimate of the daily maximum NO2 concentrations. Subsequent re-evaluation of the first method after this modification identified a large number of NO2 events concentrated in a few hours during warm months. A detailed analysis of these cases revealed a change in the reporting of low wind speed values from 2010 onwards. These examples show that analysing both I/O data of high pollutant concentrations and disaggregating model errors by short time periods can help identify possible model improvements and increase confidence in model results in a context of limited air quality monitoring.
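The second method (daily FB, NMSE and R, followed by k-means and a ranking of days) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract gives no formulas, so the metric definitions and the FB sign convention (positive when the model underestimates) follow common usage in the model-evaluation literature; the function names, the deterministic farthest-point seeding, the number of clusters, the ranking by distance to the ideal point (FB = 0, NMSE = 0, R = 1) and the synthetic data are all hypothetical.

```python
import math
import random
from statistics import mean


def daily_metrics(obs, pred):
    """Return (FB, NMSE, R) for one day of paired hourly concentrations.

    Assumed conventions: FB > 0 means the model underestimates on average.
    """
    mo, mp = mean(obs), mean(pred)
    fb = 2.0 * (mo - mp) / (mo + mp)
    nmse = mean([(o - p) ** 2 for o, p in zip(obs, pred)]) / (mo * mp)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (so * sp)
    return fb, nmse, r


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each day goes to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(mean(c) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def rank_clusters(centers):
    """Order cluster indices from best to worst performance, measured as
    distance of the centre to the ideal point (FB=0, NMSE=0, R=1)."""
    ideal = (0.0, 0.0, 1.0)
    return sorted(range(len(centers)), key=lambda j: math.dist(centers[j], ideal))


def synthetic_day(rng, scale):
    """Hypothetical day: a diurnal cycle plus noise; the 'model' output is
    the observation multiplied by `scale`, plus small noise."""
    obs = [
        20.0 + 10.0 * math.sin(2 * math.pi * h / 24) + rng.uniform(-2, 2)
        for h in range(24)
    ]
    pred = [scale * o + rng.uniform(-1, 1) for o in obs]
    return obs, pred


rng = random.Random(42)
# Ten well-simulated days followed by ten strongly underestimated days.
days = [synthetic_day(rng, 1.0) for _ in range(10)]
days += [synthetic_day(rng, 0.4) for _ in range(10)]

metrics = [daily_metrics(o, p) for o, p in days]
labels, centers = kmeans(metrics, k=2)
order = rank_clusters(centers)  # order[0] is the best-performing cluster
```

Once each day carries a cluster label, the days in the worst-ranked cluster can be cross-tabulated against input conditions such as wind direction or wind speed to check whether poor performance concentrates in particular regions of the input space. With real data, the three metrics may need rescaling before clustering so that no single metric dominates the Euclidean distance.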
How to cite: Pineda Rojas, A. and Kropff, E.: Big data techniques to improve the performance of an air quality model in a mega-city with limited air pollution monitoring, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-186, https://doi.org/10.5194/egusphere-egu24-186, 2024.