- 1 Welsh School of Architecture, Cardiff University, Cardiff, United Kingdom
- 2 School of Art, Design and Architecture, University of Plymouth, Plymouth, United Kingdom
Urban climate observation networks, including crowdsourced data from citizen weather stations (CWS), provide critical insights into intra-urban climate variability. However, the usability of CWS data is often limited by continuous gaps (of weekly or monthly length) and high missing rates caused by connection disruptions and power outages. To address this challenge, we propose a machine learning-based framework designed specifically for gap-filling in CWS datasets. We evaluate multiple data-driven approaches, including Multiple Linear Regression (MLR), Random Forest (RF), and Multilayer Perceptron (MLP), that leverage relationships between CWS and official weather station (OWS) data during periods of data availability. During training, Bayesian optimization is used to tune hyperparameters, while a model-based feature selection process mitigates overfitting by identifying the most relevant predictors for each CWS. Using complete CWS and OWS datasets from various urban areas in London in July 2018, the MLP-based models incorporating temporal and meteorological predictors demonstrate superior performance across various missing-data scenarios. Under the most challenging condition (70%-80% missing data with continuous gaps), the MLP model achieves an MAE of 0.59°C, an RMSE of 0.73°C, and an R² of 0.94. This study provides strategies for addressing continuous data gaps in CWS data, even in small datasets, and offers a reference for future machine learning-related research.
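The evaluation idea described above (mask a continuous block of a complete CWS record, fit a model on the remaining CWS-OWS pairs, then score the filled values against the withheld truth) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, all function and variable names are hypothetical, and a single-predictor least-squares fit stands in for the simplest member of the MLR family. The RF and MLP models, Bayesian optimization, and feature selection are not reproduced here.

```python
import math
import random


def continuous_gap_mask(n, missing_rate, rng):
    """Boolean mask with one contiguous missing block covering missing_rate of n samples."""
    gap_len = round(n * missing_rate)
    start = rng.randrange(n - gap_len + 1)
    return [start <= i < start + gap_len for i in range(n)]


def fit_linear(x, y):
    """Ordinary least squares for y = a + b * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def scores(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    err = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in err) / n
    sse = sum(e * e for e in err)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return mae, math.sqrt(sse / n), 1 - sse / ss_tot


# Synthetic hourly temperatures for a 31-day month (assumed values, not real data).
rng = random.Random(0)
n = 31 * 24
ows = [20 + 5 * math.sin(2 * math.pi * (h % 24 - 9) / 24) + rng.gauss(0, 0.5)
       for h in range(n)]
# Assumed CWS-OWS relationship: a warm offset, a slight gain, and sensor noise.
cws_true = [1.5 + 1.05 * t + rng.gauss(0, 0.3) for t in ows]

# Withhold one continuous block covering 75% of the record.
mask = continuous_gap_mask(n, 0.75, rng)
cws_obs = [None if m else v for m, v in zip(mask, cws_true)]

# Train only on hours where the CWS value is observed.
train_x = [x for x, y in zip(ows, cws_obs) if y is not None]
train_y = [y for y in cws_obs if y is not None]
a, b = fit_linear(train_x, train_y)

# Fill the gap from the OWS series; keep observed values unchanged.
filled = [a + b * x if y is None else y for x, y in zip(ows, cws_obs)]

# Score the filled values against the withheld truth inside the gap only.
gap_true = [v for m, v in zip(mask, cws_true) if m]
gap_pred = [v for m, v in zip(mask, filled) if m]
mae, rmse, r2 = scores(gap_true, gap_pred)
print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.2f}")
```

Scoring only inside the masked block keeps the metrics from being inflated by hours that were never missing. The numbers printed by this sketch reflect the synthetic setup and are not comparable to the results reported in the abstract.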
How to cite: He, M., Luo, Z., and Xie, X.: Robust imputation of extensive missing data in crowdsourced urban temperature using machine learning models, 12th International Conference on Urban Climate, Rotterdam, The Netherlands, 7–11 Jul 2025, ICUC12-114, https://doi.org/10.5194/icuc12-114, 2025.