Indoor and Outdoor Population Prediction Using Location-Based Services Based on Machine Learning Approach
- 1The University of Tokyo, Faculty of Engineering, Systems Innovation, Japan (suzan19990419@g.ecc.u-tokyo.ac.jp)
- 2The University of Tokyo, Faculty of Engineering, Systems Innovation, Japan (kimura-teru232@g.ecc.u-tokyo.ac.jp)
- 3Faculty of Biosphere-Geosphere Science, Okayama University of Science, Kita-ku, Okayama City, Japan (ohashi@ous.ac.jp)
- 4Environmental Management Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba City, Ibaraki, Japan (takane.yuya@aist.go.jp)
- 5TEPCO Research Institute, Tokyo Electric Power Company Holdings, Tsurumi-ku, Yokohama City, Kanagawa, Japan (Kazuki Yamaguchi)
- 6Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa City, Chiba, Japan (ihara-t@k.u-tokyo.ac.jp)
In research on heatstroke, air pollution, and disasters, indoor and outdoor populations must be distinguished in order to evaluate the benefits of countermeasures in each setting. However, few studies have taken this distinction into account, because datasets describing the distribution of indoor and outdoor populations are difficult to obtain.
This study predicts the ratio of the indoor population on a 500 m grid in Tokyo using a machine learning approach. Time-series location data, building area, building attributes, and temperature are used as explanatory variables, and the ratio of the indoor population is the objective variable. In particular, the study focuses on improving prediction accuracy for the times when the ratio of indoor to outdoor populations stabilizes, which is critical for assessing risks such as heatstroke.
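The setup described above can be illustrated with a minimal Python sketch of how one grid-hour record might be turned into an explanatory-variable vector and an objective value. The abstract does not specify the data schema, so every field name here (`indoor_count`, `office_floor_share`, and so on) is a hypothetical placeholder, and the sketch assumes that each location point has already been labeled as indoor or outdoor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GridHourRecord:
    """One 500 m grid cell at one hour. All field names are hypothetical."""
    grid_id: str
    hour: int                  # hour of day, 0-23
    indoor_count: float        # location points assumed to be labeled indoor
    outdoor_count: float       # location points assumed to be labeled outdoor
    building_area_m2: float    # total building footprint in the cell
    office_floor_share: float  # example building attribute, 0-1 (assumption)
    temperature_c: float       # air temperature for that hour


def indoor_ratio(indoor: float, outdoor: float) -> Optional[float]:
    """Return indoor / (indoor + outdoor), or None when the cell is empty."""
    total = indoor + outdoor
    if total <= 0:
        return None
    return indoor / total


def to_training_row(rec: GridHourRecord) -> Optional[Tuple[List[float], float]]:
    """Build (explanatory variables, objective variable) for one record."""
    target = indoor_ratio(rec.indoor_count, rec.outdoor_count)
    if target is None:
        return None  # skip grid-hours without any observed population
    features = [
        float(rec.hour),
        rec.indoor_count + rec.outdoor_count,  # total observed population
        rec.building_area_m2,
        rec.office_floor_share,
        rec.temperature_c,
    ]
    return features, target


if __name__ == "__main__":
    example = GridHourRecord("cell-A", 9, 80.0, 20.0, 120000.0, 0.5, 30.0)
    print(to_training_row(example))
```

Skipping empty grid-hours rather than assigning them a ratio of zero is a design choice of this sketch, not something stated in the abstract.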
The Random Forest model achieved high predictive accuracy, with an average error of 4%. Its performance improved during morning commute hours, lunch breaks, and evening return hours. The study also investigates how factors such as holidays, transportation convenience, and commercial activity relate to increases and decreases in the indoor and outdoor populations.
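As a rough illustration of this kind of evaluation, the following Python sketch fits a scikit-learn `RandomForestRegressor` to synthetic data and reports the error overall and by hour of day. It is not the authors' pipeline: the data are randomly generated, the feature names are hypothetical, and "average error" is assumed here to mean the mean absolute error of the predicted indoor ratio, which the abstract does not define.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; none of these values come from the study.
rng = np.random.default_rng(0)
n = 2000
hour = rng.integers(0, 24, size=n)
building_share = rng.uniform(0.0, 1.0, size=n)  # built-up fraction of the cell
temperature = rng.uniform(20.0, 36.0, size=n)   # degrees Celsius
holiday = rng.integers(0, 2, size=n)            # 1 = holiday, 0 = working day

# Hypothetical relationship used only to generate a target in [0, 1].
ratio = (
    0.6
    + 0.3 * building_share
    - 0.005 * (temperature - 28.0)
    - 0.05 * holiday
    + 0.05 * np.cos(2.0 * np.pi * hour / 24.0)
    + rng.normal(0.0, 0.02, size=n)
)
ratio = np.clip(ratio, 0.0, 1.0)

feature_names = ["hour", "building_share", "temperature", "holiday"]
X = np.column_stack([hour, building_share, temperature, holiday])
X_train, X_test, y_train, y_test = train_test_split(
    X, ratio, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Overall error, assumed here to be the mean absolute error of the ratio.
overall_mae = mean_absolute_error(y_test, pred)

# Error broken down by hour of day, to compare commute, lunch, and other hours.
hours_test = X_test[:, 0].astype(int)
mae_by_hour = {
    int(h): float(np.mean(np.abs(y_test[hours_test == h] - pred[hours_test == h])))
    for h in np.unique(hours_test)
}

# Impurity-based importances give a first look at which factors matter.
importances = dict(zip(feature_names, model.feature_importances_))

if __name__ == "__main__":
    print(f"overall MAE: {overall_mae:.3f}")
    for h, err in sorted(mae_by_hour.items()):
        print(f"hour {h:2d}: MAE {err:.3f}")
    print(importances)
```

A random train/test split is used here only for brevity. Because neighboring grid cells and consecutive hours are correlated, a split by area or by day would probably give a more conservative error estimate.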
Although our method has limitations, such as the accuracy of the location data and the scarcity of data points, the results are expected to have broad social applications, including risk assessments for heatstroke, air pollution, and disasters.
Future work includes establishing more robust validation methods, using more accurate time-series location data, and adding explanatory variables that better represent the characteristics of each grid cell. These steps should lead to even more accurate predictions.
How to cite: Suzuki, T., Kimura, T., Ohashi, Y., Takane, Y., Yamaguchi, K., and Ihara, T.: Indoor and Outdoor Population Prediction Using Location-Based Services Based on Machine Learning Approach, EMS Annual Meeting 2024, Barcelona, Spain, 1–6 Sep 2024, EMS2024-221, https://doi.org/10.5194/ems2024-221, 2024.