Reconstruction, Regionalization, and Prediction of Tropospheric Pollution in the Mediterranean Basin: A Machine Learning Approach

Francisco Sánchez-Jiménez; Eloisa Raluy-López; Leandro Cristian Segado-Moreno; Ester García-Fernández; Pedro Jiménez-Guerrero; Juan Pedro Montávez

doi:https://doi.org/10.5194/egusphere-egu25-19339

Session AS3.28

EGU25-19339, updated on 15 Mar 2025

EGU General Assembly 2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Oral | Thursday, 01 May, 16:45–16:55 (CEST)

Room M2

Francisco Sánchez-Jiménez¹, Eloisa Raluy-López¹, Leandro Cristian Segado-Moreno¹, Ester García-Fernández¹, Pedro Jiménez-Guerrero², and Juan Pedro Montávez¹

¹Physics of the Earth, Regional Campus of International Excellence (CEIR) “Campus Mare Nostrum”, University of Murcia, Spain (francisco.sanchez16@um.es)
²Biomedical Research Institute of Murcia (IMIB-Arrixaca), Spain

Tropospheric air pollution is a critical concern in the Mediterranean basin, which faces significant air quality challenges. This study focuses on five key pollutants: ozone (O₃), particulate matter (PM₁₀ and PM₂.₅), nitrogen monoxide (NO), and nitrogen dioxide (NO₂). Hourly measurements from 3323, 4727, 2317, 3446, and 4933 monitoring stations, respectively, spanning the period 2000–2022, were analyzed. These data, sourced from the AirBase database provided by the European Environment Agency (EEA), exhibit problems typical of long-term monitoring, such as missing data, inconsistencies, outliers, and station reassignments due to relocations.

To address these problems, a robust and reliable database was constructed by applying data-cleaning techniques that ensure data quality while retaining as many valid entries as possible. A backward-reconstruction algorithm for time series was then developed, leveraging the higher data density available from 2013 onwards. This algorithm, based on Bayesian Ridge Regression and interpolation methods, reconstructed historical records station by station, incorporating temporal trends and spatial coherence. The methodology enabled complete reconstruction for stations with sufficient data quality from 2013 onwards.
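As a rough illustration of the backward-reconstruction idea, the sketch below fits a Bayesian Ridge Regression on a data-rich recent period and uses it to fill the missing earlier part of one station's series. It is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not specify the predictors, so the use of neighbouring-station series as inputs, the function name `reconstruct_backward`, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge


def reconstruct_backward(target, neighbours, train_mask):
    """Fill NaNs in `target` from `neighbours` (hypothetical predictor choice).

    target     : 1-D array with NaN where the station has no record
    neighbours : 2-D array (time x stations) of predictor series
    train_mask : boolean array marking the data-rich period used for fitting
    """
    target = np.asarray(target, dtype=float)
    neighbours = np.asarray(neighbours, dtype=float)
    complete = ~np.isnan(neighbours).any(axis=1)

    # Fit only where both the target and all predictors are observed.
    fit_rows = train_mask & ~np.isnan(target) & complete
    model = BayesianRidge()
    model.fit(neighbours[fit_rows], target[fit_rows])

    # Predict the gaps wherever the predictors are available.
    fill_rows = np.isnan(target) & complete
    filled = target.copy()
    if fill_rows.any():
        filled[fill_rows] = model.predict(neighbours[fill_rows])
    return filled


# Synthetic example: the first half of the record is missing.
rng = np.random.default_rng(0)
n = 400
neighbours = rng.normal(50, 10, size=(n, 3))
truth = neighbours @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 1, n)
target = truth.copy()
target[:200] = np.nan
train_mask = np.arange(n) >= 200

filled = reconstruct_backward(target, neighbours, train_mask)
rmse = float(np.sqrt(np.mean((filled[:200] - truth[:200]) ** 2)))
print(f"RMSE on reconstructed period: {rmse:.2f}")
```

Observed values are left untouched and only the gaps are predicted. A real application would also need the interpolation step mentioned in the abstract for periods where the predictors themselves are missing.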

The reconstructed dataset facilitated a regional clustering analysis, grouping stations by similar spatiotemporal pollution patterns. This regionalization revealed distinct areas with shared trends in tropospheric pollution evolution. Integrating meteorological variables such as solar radiation, temperature, cloud cover, and precipitation, together with pollution persistence, further enriched the analysis. Machine learning techniques, including Principal Component Analysis (PCA) and Random Forest models, were then used to build predictive models for each pollutant, enabling accurate pollution forecasts.
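One plausible way to combine PCA with a Random Forest for this kind of prediction is a scikit-learn pipeline that standardizes the predictors, projects them onto principal components, and regresses the pollutant on those components. The sketch below assumes this arrangement; the abstract does not describe how the two techniques were combined, and the five synthetic predictor columns (standing in for radiation, temperature, cloud cover, precipitation, and a persistence term) and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000

# Hypothetical predictors: radiation, temperature, cloud cover,
# precipitation, and the previous value of the pollutant (persistence).
X = rng.normal(size=(n, 5))
y = (2.0 * X[:, 0] + 1.5 * X[:, 1] - 1.0 * X[:, 2]
     + 0.8 * X[:, 4] + rng.normal(0, 0.3, n))

# Chronological split: fit on the earlier part, evaluate on the later part.
split = 800
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep components explaining 95% of variance
    RandomForestRegressor(n_estimators=200, random_state=0),
)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
r2 = model.score(X[split:], y[split:])
print(f"Held-out R^2: {r2:.2f}")
```

In practice one such model would be trained per pollutant, and possibly per region from the clustering step. Splitting by time rather than at random avoids leaking future information into training.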

This research highlights the potential of combining statistical reconstruction techniques, spatiotemporal clustering, and machine learning to enhance our understanding and prediction of atmospheric pollution trends. By addressing long-standing data issues and leveraging modern computational tools, the study contributes a robust framework for long-term air quality analysis in the Mediterranean region, offering insights applicable to other regions facing similar challenges.

How to cite: Sánchez-Jiménez, F., Raluy-López, E., Segado-Moreno, L. C., García-Fernández, E., Jiménez-Guerrero, P., and Montávez, J. P.: Reconstruction, Regionalization, and Prediction of Tropospheric Pollution in the Mediterranean Basin: A Machine Learning Approach, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19339, https://doi.org/10.5194/egusphere-egu25-19339, 2025.