EGU24-7377, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-7377
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Calibration of integrated low-cost environmental sensors based on machine learning with multiple scenes

Fang Nan1, Chao Zeng1, and Huanfeng Shen1,2,3
  • 1School of Resource and Environmental Sciences, Wuhan University, Wuhan, China (e-mail: nf2020@whu.edu.cn; zengchao@whu.edu.cn; shenhf@whu.edu.cn)
  • 2Collaborative Innovation Center of Geospatial Technology, Wuhan, China (e-mail: shenhf@whu.edu.cn)
  • 3Hubei Luojia Laboratory, Wuhan, China (e-mail: shenhf@whu.edu.cn)

With growing attention to urban temperature and outdoor thermal comfort, low-cost monitoring of urban microenvironments is an effective way to fill the spatiotemporal gaps of traditional monitoring networks. However, the widespread use of low-cost sensors has been hampered by uncertainty about their data quality. Calibrating these sensors is therefore key to enabling their large-scale application and to increasing confidence in the research that relies on them. This study calibrates low-cost integrated environmental sensors and improves the quality of their hourly data, based on an IoT case study in Wuhan, China.

Using 24 traditional weather stations operated by the meteorological regulatory authorities at different locations as the reference standard, this study applied eight machine learning (ML) algorithms to calibrate low-cost sensors and compared their performance. The eight ML algorithms are: (a) Multiple Linear Regression (MLR); (b) Random Forest (RF); (c) K-Nearest Neighbors (KNN); (d) Gradient Boosting Regression Tree (GBRT); (e) Decision Tree (DT); (f) AdaBoost; (g) Bagging; (h) Extremely Randomized Trees (Extra-Trees). Hourly raw data collected by 34 low-cost sensors deployed near the traditional weather stations were calibrated, and the models were tested using ten-fold cross-validation. The two most distant locations are 121 km apart in a straight line, and the longest record from a single sensor covers 12,406 hours. In addition, model migration, that is, applying a model trained at one site to a sensor at another site, was evaluated in different field scenarios covering six typical land surface types: built area, scrub, water, artificial surfaces, woodland, and cultivated land.

The results show that the random forest model outperforms the other models across multiple low-cost sensors at different locations. With our method, the average R-squared value improves from 0.682 to 0.980, the Root Mean Square Error (RMSE) decreases from 5.989 to 1.355, and the Mean Absolute Error (MAE) decreases from 4.250 to 0.932. The random forest model also migrates better between similar scenarios. When each of the 34 sensors is calibrated using a model whose surface type is more similar to that of the sensor, the average R-squared is 0.946 and the average MAE is 1.584. The distance between the sensor to be calibrated and its best-performing migration model was also considered: the farthest straight-line distance is 94 km and the nearest is 7 km.
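For reference, the three reported metrics follow their standard definitions, sketched below in plain Python. The helper names and the four sample values are illustrative only and are not data from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between reference and calibrated values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error between reference and calibrated values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up example: reference station readings vs. calibrated sensor readings.
station = [10.0, 12.0, 14.0, 16.0]
calibrated = [10.5, 11.5, 14.5, 15.5]

print(r_squared(station, calibrated))  # about 0.95
print(rmse(station, calibrated))       # 0.5
print(mae(station, calibrated))        # 0.5
```

R-squared is unitless, while RMSE and MAE carry the unit of the calibrated variable. RMSE penalizes large errors more strongly than MAE, which is why it is never smaller than MAE on the same data.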

This study introduces a calibration method for low-cost integrated meteorological sensors used in long-term monitoring of complex field environments. We also compared the migration performance of the random forest model across different typical field scenes. Similar surface types favor model migration, and the model remains stable even between locations that are far apart. The results show that this method can significantly improve data quality and increase user confidence in low-cost environmental sensor applications.
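One way to act on the surface-type finding is sketched below. The selection rule is an assumption made for illustration, not the procedure used in the study: among sites with a trained model, prefer those sharing the target sensor's land surface type, then take the nearest by great-circle distance. The `Site` class, the `pick_donor` function, and all coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    lat: float    # degrees north
    lon: float    # degrees east
    surface: str  # land surface type, e.g. "woodland"


def haversine_km(a, b):
    """Great-circle distance between two sites, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_donor(target, donors):
    """Hypothetical rule: same surface type first, then the smallest distance."""
    same_surface = [d for d in donors if d.surface == target.surface]
    pool = same_surface or donors  # fall back to all donors if none match
    return min(pool, key=lambda d: haversine_km(target, d))


# Made-up sites; coordinates are illustrative only.
target = Site("sensor_A", 30.50, 114.30, "woodland")
donors = [
    Site("model_built", 30.52, 114.31, "built area"),
    Site("model_wood", 31.10, 114.90, "woodland"),
]

donor = pick_donor(target, donors)
print(donor.name, round(haversine_km(target, donor), 1), "km")
```

In this example the woodland model is chosen even though the built-area model is much closer, which mirrors the observation that surface similarity matters more than proximity.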

How to cite: Nan, F., Zeng, C., and Shen, H.: Calibration of integrated low-cost environmental sensors based on machine learning with multiple scenes, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7377, https://doi.org/10.5194/egusphere-egu24-7377, 2024.