- ¹Eurasia Institute of Earth Sciences, Istanbul Technical University, Istanbul, Türkiye
- ²Faculty of Aeronautics and Astronautics, Istanbul Technical University, Istanbul, Türkiye
- ³Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, USA
Accurate air quality measurements are critical for protecting environmental and public health; however, gaps in the resulting datasets, which arise for a variety of reasons, often undermine the reliability of downstream analyses. This study therefore presents a novel hybrid methodology that uses the Optuna framework to optimize the hyperparameters of an Extreme Gradient Boosting (XGBoost) model for imputing missing values in PM2.5 data, one of the most significant indicators of air quality. The proposed approach was systematically evaluated under varying data loss scenarios, using synthetic datasets generated under the Missing Completely at Random (MCAR) mechanism with missing rates of 5%, 10%, 20%, and 30%. Traditional interpolation methods (such as linear and spline) and widely adopted machine learning techniques (namely random forest and multivariate adaptive regression splines) were also applied, both as benchmarks and to provide a common basis for comparison. Three experimental configurations were examined: (1) imputation based solely on the PM2.5 time series, (2) integration of ERA5 reanalysis covariates, and (3) inclusion of data from neighboring monitoring stations. The results indicate that the XGBoost-Optuna model outperformed its counterparts across all missing data scenarios, with R² values of 0.852, 0.874, 0.862, and 0.866 for missing rates of 5%, 10%, 20%, and 30%, respectively. These findings highlight the potential of the XGBoost-Optuna model as a robust tool for handling missing air quality data, with consistent accuracy across the missing rates and configurations tested.
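The evaluation protocol described above (masking known values under MCAR at several missing rates, imputing them, and scoring against the held-out truth) can be illustrated with a minimal sketch. This is not the authors' implementation: the PM2.5 series is a synthetic stand-in, the function names `make_mcar_mask`, `linear_impute`, and `r2_score` are hypothetical, and only the linear interpolation baseline is shown. The XGBoost-Optuna model, the ERA5 covariates, and the neighboring-station inputs are omitted.

```python
import numpy as np


def make_mcar_mask(n, missing_rate, rng):
    """Boolean mask with round(n * missing_rate) True entries.

    Positions are drawn uniformly at random without replacement, so
    missingness does not depend on the values (the MCAR assumption).
    """
    n_missing = int(round(n * missing_rate))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_missing, replace=False)] = True
    return mask


def linear_impute(values, mask):
    """Fill masked positions by linear interpolation over observed ones.

    np.interp holds the nearest observed value constant for gaps at
    either end of the series.
    """
    idx = np.arange(len(values))
    filled = values.astype(float).copy()
    filled[mask] = np.interp(idx[mask], idx[~mask], values[~mask])
    return filled


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)

# Synthetic hourly series with a daily cycle plus noise (assumption:
# a stand-in for real PM2.5 observations, 60 days long).
hours = np.arange(24 * 60)
pm25 = 20 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

results = {}
for rate in (0.05, 0.10, 0.20, 0.30):
    mask = make_mcar_mask(pm25.size, rate, rng)
    imputed = linear_impute(pm25, mask)
    # Score only the artificially removed points, where truth is known.
    results[rate] = r2_score(pm25[mask], imputed[mask])
    print(f"missing rate {rate:.0%}: R^2 = {results[rate]:.3f}")
```

Any other imputer, including a tuned gradient boosting model, could be substituted for `linear_impute` in this loop while keeping the same masks and the same scoring, which is what makes the comparison across methods like for like.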
How to cite: Denizoğlu, M., Sezen, İ., Deniz, A., and Ünal, A.: A Novel Hybrid Approach for Missing PM2.5 Data Imputation Using Optuna-Optimized Extreme Gradient Boosting, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12477, https://doi.org/10.5194/egusphere-egu25-12477, 2025.