The role of gap-filling observational data in air quality data-fusion methods: a case study with CALIOPE
- ¹Barcelona Supercomputing Center, Earth Sciences Department, Spain (ada.barrantes@bsc.es)
- ²Department of Fluid Mechanics, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
Reliable air quality data are vital for informed decision-making, enabling evidence-based mitigation strategies that improve public health and sustainability. Although monitoring stations are essential for assessing air quality, their spatial representativeness is limited, leaving large areas without adequate observational data. Numerical air quality systems, in contrast, provide full spatial coverage. However, modeled data are affected by persistent uncertainties, mainly due to inaccuracies in emission inventories and the complexity of the atmospheric processes involved in pollution transport. Data-fusion methods combine both sources to produce bias-corrected air quality maps with full spatial coverage. The reliability of their results, however, depends strongly on the availability of observational data.
In this study, we quantify the impact of imputing missing observational data on data-fusion methods. We focus on PM2.5 in Catalonia (northeastern Spain) during 2019, for which data availability is strongly limited. We first present straightforward gap-filling methodologies, such as linear interpolation and persistence (repetition of the previous available value). We then compare these techniques with a state-of-the-art artificial intelligence gap-filling method based on the Gradient Boosting Machine (GBM) algorithm, trained with several years of data (2019-2022). To assess the gap-filling methodologies, we generate random gaps of varying characteristics and identify the optimal technique for each gap size and frequency. Finally, we study how these methods affect the data-fusion process applied to the mesoscale air quality model CALIOPE. The output of this system has a horizontal spatial resolution of 1 km × 1 km and a daily temporal resolution. The data-fusion method uses universal kriging, a geostatistical technique based on a regression model and the spatial correlation between the model and observational data.
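The two simple gap-filling baselines and the random-gap evaluation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function names, the use of `None` to mark missing values, the RMSE score, and the synthetic daily series are all assumptions made for the example. The GBM-based method and the kriging step are not shown.

```python
import math
import random


def fill_persistence(series):
    """Fill gaps (None) by repeating the previous available value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # stays None if no earlier value exists
    return filled


def fill_linear(series):
    """Fill interior gaps by linear interpolation between valid neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled  # leading/trailing gaps are left as None


def make_gaps(series, n_gaps, gap_len, rng):
    """Mask n_gaps random runs of gap_len values (runs may overlap).

    The first and last values are never masked, so both fillers can act.
    """
    gapped = list(series)
    for _ in range(n_gaps):
        start = rng.randint(1, len(series) - gap_len - 1)
        for i in range(start, start + gap_len):
            gapped[i] = None
    return gapped


def rmse(truth, filled, gapped):
    """RMSE over the positions that were masked and then filled."""
    errors = [
        (t - f) ** 2
        for t, f, g in zip(truth, filled, gapped)
        if g is None and f is not None
    ]
    return math.sqrt(sum(errors) / len(errors)) if errors else float("nan")


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic daily series (assumption): smooth cycle plus noise.
    truth = [12 + 5 * math.sin(2 * math.pi * d / 30) + rng.gauss(0, 1)
             for d in range(365)]
    for gap_len in (1, 3, 7, 14):
        gapped = make_gaps(truth, n_gaps=10, gap_len=gap_len, rng=rng)
        print(gap_len,
              round(rmse(truth, fill_linear(gapped), gapped), 2),
              round(rmse(truth, fill_persistence(gapped), gapped), 2))
```

Scoring only the artificially masked positions keeps the comparison fair, since the true values are known there. Repeating the loop over several gap lengths and gap counts gives the kind of per-size, per-frequency comparison the text describes.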
Data-fusion results improve significantly when gap-filled observational data are used. Notably, the effectiveness of the data-fusion method depends on observation availability, and it performs better with GBM-filled data.
How to cite: Barrantes, A., Carnerero, C., and Mateu Armengol, J.: The role of gap-filling observational data in air quality data-fusion methods: a case study with CALIOPE, EMS Annual Meeting 2024, Barcelona, Spain, 1–6 Sep 2024, EMS2024-768, https://doi.org/10.5194/ems2024-768, 2024.