EGU25-20571, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-20571
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Wednesday, 30 Apr, 14:00–15:45 (CEST), Display time Wednesday, 30 Apr, 08:30–18:00
 
vPoster spot 5, vP5.29
Characterization and machine learning prediction of atmospheric pollutants in an urban region of the Cerrado biome
Marco Aurélio Franco1 and Márcio Teixeira2
  • 1Institute of Astronomy, Geophysics and Atmospheric Sciences, University of São Paulo, São Paulo, Brazil
  • 2School of Technology, University of Campinas, Limeira, SP, Brazil

The Cerrado biome, a globally significant biodiversity hotspot, is undergoing rapid degradation, primarily due to anthropogenic activities. Large-scale conversion of native vegetation to agriculture, particularly soybean cultivation and cattle ranching, and rapid urbanization are the main drivers of biome loss. Additionally, unsustainable water use, infrastructure development, and recurrent fires exacerbate ecosystem degradation, leading to significant biodiversity decline and impairment of ecosystem services. A direct consequence of this land-use change is the generation of substantial quantities of air pollutants, mainly particulate matter with diameters of up to 2.5 and 10 μm (PM2.5 and PM10, respectively). These particles, emitted from biomass burning, soil erosion, and dust storms, can penetrate the respiratory tract, leading to various health issues, including respiratory infections, cardiovascular disease, and increased mortality rates. Using measurements of meteorological variables and air pollutants from CETESB (Environmental Company of the State of São Paulo) from 2017 to 2023 in an important urbanized region of the Brazilian Cerrado, we characterized the seasonal distribution of PM2.5 and PM10, together with other pollutants such as nitrogen oxides (NOx), carbon monoxide (CO), and ozone (O3). In addition, using different combinations of meteorological and air pollution variables, we trained machine learning models, namely Random Forest, XGBoost, and Artificial Neural Networks (ANN), to predict the concentrations of PM2.5 and PM10. Our results show that concentrations of PM10, PM2.5, CO, and NOx are lowest during summer and peak during winter, a pattern directly related to the seasons with higher and lower precipitation rates, respectively. Curiously, O3 peaks in spring and is lowest in autumn, likely related to cloud occurrence.
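The seasonal characterization described above can be illustrated with a minimal sketch. It assumes Southern Hemisphere meteorological seasons (December to February as summer, and so on), which the abstract does not state explicitly, and it uses made-up values rather than CETESB data. The function name `seasonal_means` and the record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumed convention: Southern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}


def seasonal_means(records):
    """Average concentration per season from (date, value) pairs.

    Missing measurements (None) are skipped.
    """
    buckets = defaultdict(list)
    for day, value in records:
        if value is not None:
            buckets[SEASON_BY_MONTH[day.month]].append(value)
    return {season: mean(values) for season, values in buckets.items()}


# Illustrative daily PM2.5 values in μg m⁻³ (invented, not measured data).
sample = [
    (date(2020, 1, 15), 10.0),
    (date(2020, 2, 15), 14.0),
    (date(2020, 7, 15), 30.0),
    (date(2020, 8, 15), None),   # gap in the record
    (date(2020, 8, 16), 40.0),
]

print(seasonal_means(sample))
```

Grouping by month rather than by astronomical season dates keeps the sketch simple; the actual analysis may have used a different season definition.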
During the whole analyzed period, NOx, PM10, and PM2.5 exceeded the World Health Organization daily average limits by about 15, 22, and 35%, respectively. Among the predictive models, Random Forest best predicted PM10 and PM2.5 concentrations. For PM10, the train (80% of the data)/test (20% of the data) results were R² = 0.79/0.92 (p-value < 0.05), with RMSE of 10.7 and 6.5 μg m⁻³, respectively. For PM2.5, the model returned R² = 0.74/0.91, with RMSE of 4.3 and 2.6 μg m⁻³ for the train/test sets, respectively. Although not the best performer, the ANN also worked relatively well after proper tuning. Future investigations will extend the approach to other stations in the Cerrado biome and validate the predictions there with multiple models, in order to map PM predictions spatially and identify the regions where most air pollutants are emitted.
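As a rough illustration of the evaluation reported above, the sketch below shows an 80/20 train/test split and the two metrics, RMSE and R². It is not the authors' code: the abstract does not say how the split was drawn or which definition of R² was used, so a random shuffle and the coefficient of determination (1 − SS_res/SS_tot) are assumptions here. The function names are hypothetical and the numbers are invented.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented observed and predicted PM10 values (μg m⁻³) for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

train, test = train_test_split(range(10))
print(len(train), len(test))
print(rmse(observed, predicted), r_squared(observed, predicted))
```

In practice, the predictions would come from a fitted model such as a Random Forest, and the same two functions would be applied separately to the training and test sets to obtain paired values like those quoted in the abstract.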

How to cite: Franco, M. A. and Teixeira, M.: Characterization and machine learning prediction of atmospheric pollutants in an urban region of the Cerrado biome, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-20571, https://doi.org/10.5194/egusphere-egu25-20571, 2025.