Reconstruction, Regionalization, and Prediction of Tropospheric Pollution in the Mediterranean Basin: A Machine Learning Approach
Tropospheric air pollution is a critical concern, particularly in the Mediterranean basin, which experiences significant air quality challenges. This study focuses on key pollutants: ozone (O₃), particulate matter (PM₁₀ and PM₂.₅), nitrogen monoxide (NO), and nitrogen dioxide (NO₂). Hourly measurements from 3323, 4727, 2317, 3446, and 4933 monitoring stations, respectively, spanning the period 2000–2022, were analyzed. These data, sourced from the AirBase database provided by the European Environment Agency (EEA), exhibit challenges typical of long-term monitoring, such as missing data, inconsistencies, outliers, and station reassignments due to relocations.
To address these challenges, a robust and reliable database was constructed by applying data-cleaning techniques that ensure data quality while maximizing the number of valid entries. A backward-reconstruction algorithm for time series was then developed, leveraging the higher data density available from 2013 onwards. This algorithm, based on Bayesian Ridge Regression and interpolation methods, reconstructed historical records station by station, incorporating temporal trends and spatial coherence. The methodology enabled complete reconstruction for stations with sufficient data quality from 2013 onwards.
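The backward-reconstruction idea can be illustrated with a minimal sketch. It assumes that a station's missing early record is predicted from gap-free neighbouring stations, with the regression trained on the later, data-rich period. The function name `reconstruct_backward`, the choice of neighbours as predictors, and the synthetic series are illustrative assumptions, not the study's actual implementation or data.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge


def reconstruct_backward(target, neighbors):
    """Fill missing (NaN) values of one station from neighbouring stations.

    target    : 1-D array, NaN where the station has no valid record.
    neighbors : 2-D array (n_times, n_neighbors), assumed gap-free here.
    Returns the filled series and the predictive std (0 where observed).
    """
    target = np.asarray(target, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    observed = ~np.isnan(target)

    # Learn the station-to-neighbours relationship on the data-rich period.
    model = BayesianRidge()
    model.fit(neighbors[observed], target[observed])

    filled = target.copy()
    std = np.zeros_like(target)
    if (~observed).any():
        # Bayesian Ridge also returns a predictive standard deviation.
        mean, sd = model.predict(neighbors[~observed], return_std=True)
        filled[~observed] = mean
        std[~observed] = sd
    return filled, std


# Synthetic demonstration (hypothetical values, not AirBase data).
rng = np.random.default_rng(0)
n_times = 500
neighbors = 40.0 + 10.0 * rng.normal(size=(n_times, 3))
truth = neighbors @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=1.0, size=n_times)
target = truth.copy()
target[:200] = np.nan  # pretend the early part of the record is missing

filled, std = reconstruct_backward(target, neighbors)
gap_corr = np.corrcoef(filled[:200], truth[:200])[0, 1]
print(f"correlation with withheld values: {gap_corr:.3f}")
```

The predictive standard deviation returned alongside each reconstructed value is one reason a Bayesian regressor may suit this task, since it offers a rough indication of how far a filled value can be trusted.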
The reconstructed dataset facilitated a regional clustering analysis, grouping stations by similar spatiotemporal pollution patterns. This regionalization revealed distinct areas with shared trends in tropospheric pollution evolution. Integrating meteorological variables (solar radiation, temperature, cloud cover, and precipitation) together with pollution persistence further enriched the analysis. Machine learning techniques, including Principal Component Analysis (PCA) and Random Forest models, were employed to develop predictive models for each pollutant, enabling accurate pollution forecasts.
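As a rough illustration of how PCA and a Random Forest can be chained into a single predictive model, the sketch below standardizes the predictors, reduces them with PCA, and fits a regressor. The five synthetic predictor columns, the 95% explained-variance threshold, the chronological train/test split, and all hyperparameters are assumptions made for the example. The study's actual features and settings are not specified here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic predictors (hypothetical, standardized units). Columns:
# solar radiation, temperature, cloud cover, precipitation, persistence.
rng = np.random.default_rng(0)
n_samples = 600
X = rng.normal(size=(n_samples, 5))
y = (
    1.5 * X[:, 0]
    + 2.0 * X[:, 1]
    - 1.0 * X[:, 2]
    - 0.5 * X[:, 3]
    + 3.0 * X[:, 4]
    + rng.normal(scale=0.3, size=n_samples)
)

# Chronological split: train on the earlier samples, test on the later ones.
split = 450

model = make_pipeline(
    StandardScaler(),
    # Keep enough components to explain 95% of the predictor variance.
    PCA(n_components=0.95, svd_solver="full"),
    RandomForestRegressor(n_estimators=200, random_state=0),
)
model.fit(X[:split], y[:split])

r2 = model.score(X[split:], y[split:])  # coefficient of determination
n_components = model.named_steps["pca"].n_components_
print(f"components kept: {n_components}, test R^2: {r2:.2f}")
```

In practice one such pipeline would be fitted per pollutant, and possibly per region, with the persistence column built from lagged concentrations of that pollutant.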
This research highlights the potential of combining statistical reconstruction techniques, spatiotemporal clustering, and machine learning to enhance our understanding and prediction of atmospheric pollution trends. By addressing long-standing data issues and leveraging modern computational tools, the study contributes a robust framework for long-term air quality analysis in the Mediterranean region, offering insights applicable to other regions facing similar challenges.