EGU25-19339, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-19339
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 01 May, 16:45–16:55 (CEST) | Room M2
Reconstruction, Regionalization, and Prediction of Tropospheric Pollution in the Mediterranean Basin: A Machine Learning Approach
Francisco Sánchez-Jiménez1, Eloisa Raluy-López1, Leandro Cristian Segado-Moreno1, Ester García-Fernández1, Pedro Jiménez-Guerrero2, and Juan Pedro Montávez1
  • 1Physics of the Earth, Regional Campus of International Excellence (CEIR) “Campus Mare Nostrum”, University of Murcia, Spain (francisco.sanchez16@um.es)
  • 2Biomedical Research Institute of Murcia (IMIB-Arrixaca), Spain
Tropospheric air pollution is a critical concern, particularly in the Mediterranean basin, which faces significant air quality challenges. This study focuses on five key pollutants: ozone (O₃), particulate matter (PM₁₀ and PM₂.₅), nitrogen monoxide (NO), and nitrogen dioxide (NO₂). Hourly measurements from 3323, 4727, 2317, 3446, and 4933 monitoring stations, respectively, spanning the period 2000–2022, were analyzed. These data, sourced from the AirBase database provided by the European Environment Agency (EEA), exhibit problems typical of long-term monitoring, such as missing data, inconsistencies, outliers, and station reassignments due to relocations.
To address these problems, a robust and reliable database was constructed by applying advanced data-cleaning techniques that ensure data quality while maximizing the number of valid entries. Subsequently, a backward-reconstruction algorithm for time series was developed, leveraging the higher data density available from 2013 onwards. This algorithm, based on Bayesian Ridge Regression and interpolation methods, reconstructed historical records station by station while preserving temporal trends and spatial coherence. The methodology enabled complete reconstruction for every station with sufficient data quality in the post-2013 period.
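The backward-reconstruction idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of neighbouring stations as predictors, the fixed precision hyperparameters, and the synthetic data are all hypothetical. The regression is the closed-form Bayesian ridge posterior mean with fixed noise and weight precisions (scikit-learn's `BayesianRidge` would instead estimate these precisions from the data), and the interpolation step mentioned in the abstract is not covered.

```python
import numpy as np


def fit_bayesian_ridge(X, y, noise_precision=1.0, weight_precision=1.0):
    """Posterior-mean weights for y ~ N(Xw + b, 1/noise_precision),
    with prior w ~ N(0, I/weight_precision). Precisions are fixed (assumption)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centring handles the intercept
    posterior_precision = (
        weight_precision * np.eye(X.shape[1]) + noise_precision * Xc.T @ Xc
    )
    weights = noise_precision * np.linalg.solve(posterior_precision, Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def reconstruct_backward(target, neighbours):
    """Fill NaNs in `target` from `neighbours` (hypothetical predictor stations).

    The model is trained on time steps where the target and all neighbours
    are observed (the dense recent period) and applied where only the
    neighbours are observed (the sparse early period).
    """
    target_observed = ~np.isnan(target)
    neighbours_complete = ~np.isnan(neighbours).any(axis=1)
    train = target_observed & neighbours_complete
    fill = ~target_observed & neighbours_complete
    weights, intercept = fit_bayesian_ridge(neighbours[train], target[train])
    filled = target.copy()
    filled[fill] = neighbours[fill] @ weights + intercept
    return filled


# Synthetic example: the first 200 time steps of the target station are missing.
rng = np.random.default_rng(0)
n_steps, n_early = 500, 200
neighbours = rng.normal(40.0, 10.0, size=(n_steps, 3))
truth = 5.0 + neighbours @ np.array([0.5, 0.3, 0.2]) + rng.normal(0.0, 1.0, n_steps)
target = truth.copy()
target[:n_early] = np.nan

filled = reconstruct_backward(target, neighbours)
rmse = np.sqrt(np.mean((filled[:n_early] - truth[:n_early]) ** 2))
print(f"RMSE on reconstructed early period: {rmse:.2f}")
```

In this sketch the observed values are left untouched and only the missing early period is predicted, so the reconstructed series stays identical to the measurements wherever measurements exist.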
The reconstructed dataset facilitated a regional clustering analysis that grouped stations by similar spatiotemporal pollution patterns. This regionalization revealed distinct areas with shared trends in the evolution of tropospheric pollution. Meteorological variables such as solar radiation, temperature, cloud cover, and precipitation, together with pollution persistence, were then integrated to enrich the analysis. Machine learning techniques, including Principal Component Analysis (PCA) and Random Forest models, were employed to develop predictive models for each pollutant, enabling accurate forecasts of pollutant concentrations.
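The regionalization step can be illustrated with a small clustering sketch. The abstract does not name the clustering algorithm, so the choice of k-means here is an assumption, as are the function names, the per-station standardisation, and the synthetic seasonal data. The predictive PCA and Random Forest stage is not covered by this example.

```python
import numpy as np


def standardise(series):
    """Z-score each station's series (rows = stations) so that clusters
    reflect the shape of the temporal pattern rather than the mean level."""
    mean = series.mean(axis=1, keepdims=True)
    std = series.std(axis=1, keepdims=True)
    return (series - mean) / std


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest centre so far.
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(dist)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres


# Synthetic monthly series for 12 hypothetical stations over 10 years:
# six stations peak in summer, six peak in winter, each with its own mean level.
rng = np.random.default_rng(42)
months = np.arange(120)
season = np.sin(2 * np.pi * months / 12)
levels = rng.uniform(20, 80, size=(12, 1))
pattern = np.vstack([np.tile(season, (6, 1)), np.tile(-season, (6, 1))])
series = levels + 10 * pattern + rng.normal(0, 3, size=(12, 120))

labels, _ = kmeans(standardise(series), k=2)
print(labels)
```

Standardising each station before clustering is a design choice in this sketch: it groups stations by the timing of their variability, so two stations with very different mean concentrations but the same seasonal cycle fall into the same region.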
This research highlights the potential of combining statistical reconstruction techniques, spatiotemporal clustering, and machine learning to enhance our understanding and prediction of atmospheric pollution trends. By addressing long-standing data issues and leveraging modern computational tools, the study contributes a robust framework for long-term air quality analysis in the Mediterranean region, offering insights applicable to other regions facing similar challenges.

How to cite: Sánchez-Jiménez, F., Raluy-López, E., Segado-Moreno, L. C., García-Fernández, E., Jiménez-Guerrero, P., and Montávez, J. P.: Reconstruction, Regionalization, and Prediction of Tropospheric Pollution in the Mediterranean Basin: A Machine Learning Approach, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19339, https://doi.org/10.5194/egusphere-egu25-19339, 2025.