ICUC12-114, updated on 21 May 2025
https://doi.org/10.5194/icuc12-114
12th International Conference on Urban Climate
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Robust imputation of extensive missing data in crowdsourced urban temperature using machine learning models
Miao He1, Zhiwen Luo1, and Xiaoxiong Xie2
  • 1Welsh School of Architecture, Cardiff University, Cardiff, United Kingdom
  • 2School of Art, Design and Architecture, University of Plymouth, Plymouth, United Kingdom

Urban climate observation networks, including crowdsourced data from citizen weather stations (CWS), provide critical insights into intra-urban climate variability. However, the usability of CWS data is often limited by long continuous gaps (spanning weeks or months) and high missing rates caused by connection disruptions and power outages. To address this challenge, we propose a machine learning-based framework designed specifically for gap-filling in CWS datasets. We evaluate multiple data-driven approaches, including Multiple Linear Regression (MLR), Random Forest (RF), and Multilayer Perceptron (MLP), which leverage relationships between CWS and official weather station (OWS) data during periods when both are available. During training, Bayesian optimization is used to tune hyperparameters, while a model-based feature selection process mitigates overfitting by identifying the most relevant predictors for each CWS. Using complete CWS and OWS datasets from various urban areas of London in July 2018, MLP-based models incorporating temporal and meteorological predictors demonstrate superior performance across a range of missing-data scenarios. Under the most challenging condition (70%–80% missing data with continuous gaps), the MLP model achieves an MAE of 0.59 °C, an RMSE of 0.73 °C, and an R² of 0.94. This study provides strategies for addressing continuous gaps in CWS data, even for small datasets, and offers a reference for future machine learning research in this area.
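A minimal sketch of the evaluation idea described in the abstract follows: take a complete series, mask it with long continuous gaps, train on the remaining samples using an official station as the reference, then score the imputed values. It uses only the MLR baseline, solved with plain least squares in standard-library Python. Everything here is an assumption for illustration. The synthetic series, the hour-of-day sine/cosine encoding as the temporal predictor, the weekly gap length and the helper names (`continuous_gap_mask`, `features`, `fit_linear`, `scores`, `demo`) are hypothetical. The RF and MLP models, the Bayesian hyperparameter optimization and the feature selection step are not reproduced. R² is computed as 1 - SSE/SST, which may differ from the definition the authors used.

```python
import math
import random


def continuous_gap_mask(n, missing_rate, gap_len, rng):
    """Mask exactly round(missing_rate * n) samples using contiguous gaps.

    Gaps of up to gap_len samples start at random positions (they may overlap);
    the last gap is truncated so the masked count hits the target exactly.
    True means "missing". (Hypothetical helper, not the authors' code.)
    """
    mask = [False] * n
    target = int(round(missing_rate * n))
    count = 0
    while count < target:
        start = rng.randrange(0, n - gap_len + 1)
        for i in range(start, start + gap_len):
            if count >= target:
                break
            if not mask[i]:
                mask[i] = True
                count += 1
    return mask


def features(t_ows, hour):
    """Intercept, reference (OWS) temperature, and cyclic hour-of-day terms (assumed predictors)."""
    angle = 2 * math.pi * hour / 24
    return [1.0, t_ows, math.sin(angle), math.cos(angle)]


def fit_linear(X, y):
    """Ordinary least squares (MLR): solve (X^T X) b = X^T y by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution on the upper-triangular system
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef


def scores(obs, pred):
    """MAE, RMSE and R^2 (computed as 1 - SSE/SST) on the imputed (masked) samples."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)
    mean = sum(obs) / n
    sst = sum((o - mean) ** 2 for o in obs)
    return sum(abs(e) for e in err) / n, math.sqrt(sse / n), 1 - sse / sst


def demo(seed=0, missing_rate=0.75, gap_len=24 * 7):
    rng = random.Random(seed)
    n = 24 * 31  # hourly samples over one 31-day month (synthetic)
    hours = [i % 24 for i in range(n)]
    # Synthetic OWS series: diurnal cycle plus noise (not real data)
    t_ows = [18 + 6 * math.sin(2 * math.pi * (h - 9) / 24) + rng.gauss(0, 1) for h in hours]
    # Synthetic CWS series: OWS plus a night-time warm offset (cos peaks at midnight) plus sensor noise
    t_cws = [t + 1.0 + 0.8 * math.cos(2 * math.pi * h / 24) + rng.gauss(0, 0.4)
             for t, h in zip(t_ows, hours)]
    mask = continuous_gap_mask(n, missing_rate, gap_len, rng)
    X = [features(t, h) for t, h in zip(t_ows, hours)]
    train = [i for i in range(n) if not mask[i]]
    gaps = [i for i in range(n) if mask[i]]
    # Fit only on periods where the CWS value is "available", then impute the gaps
    coef = fit_linear([X[i] for i in train], [t_cws[i] for i in train])
    pred = [sum(c * x for c, x in zip(coef, X[i])) for i in gaps]
    return mask, scores([t_cws[i] for i in gaps], pred)


mask, (mae, rmse, r2) = demo()
print(f"missing={sum(mask) / len(mask):.0%}  MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```

Replacing `fit_linear` with a random forest or MLP regressor, and wrapping it in a hyperparameter search, would move this sketch toward the framework the abstract describes. The numbers it prints come from synthetic data and say nothing about the reported London results.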

How to cite: He, M., Luo, Z., and Xie, X.: Robust imputation of extensive missing data in crowdsourced urban temperature using machine learning models, 12th International Conference on Urban Climate, Rotterdam, The Netherlands, 7–11 Jul 2025, ICUC12-114, https://doi.org/10.5194/icuc12-114, 2025.
