- National Disaster Management Research Institute, Disaster Scientific Investigation Division
As similar disasters and accidents recur, public concern is growing about the limitations of existing disaster response systems and the need for institutional improvement. The National Disaster Management Research Institute of Korea conducts disaster cause investigations as part of its statutory responsibilities, examining problems observed before and after disasters, institutional weaknesses, and public demands for improvement. News data are valuable in this context: as unstructured text reflecting on-site conditions, response activities, policy debates, and public opinion, they complement official investigation records in understanding the institutional and managerial factors behind disasters.
This study develops a media analysis framework based on big data and text mining for use in disaster cause investigations. Disaster-related news articles were first collected, and a large language model (Gemini) was used to identify and extract sentences describing problems and proposed improvements in the occurrence and response stages of a disaster. The extracted sentences were then processed with natural language processing techniques, including stopword removal and the merging of duplicate and semantically similar sentences. The remaining sentences were grouped by semantic similarity to organize the major issues. Finally, nouns were extracted and their frequencies analyzed by year to identify key terms and track changes in the topics emphasized in media coverage.
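The grouping and yearly term-frequency steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: bag-of-words cosine similarity stands in for whatever semantic similarity measure the study used (sentence embeddings, for example); the stopword list, the 0.4 similarity threshold, the example sentences, and the function names `tokenize`, `cosine`, `group_sentences`, and `term_frequency_by_year` are all hypothetical; and every token left after stopword removal is counted rather than nouns only, since restricting to nouns would require a part-of-speech tagger or, for Korean text, a morphological analyzer. The Gemini extraction step is not shown.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, minimal English stopword list (assumption). A real pipeline
# would use a curated list and a morphological analyzer to keep only nouns.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "was", "were", "is",
             "in", "on", "for", "not"}


def tokenize(sentence):
    """Lowercase, split into alphabetic words, and drop stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def group_sentences(sentences, threshold=0.4):
    """Greedy single-pass grouping (stand-in for semantic clustering).

    Each sentence joins the first group whose representative (its first
    member) has similarity >= threshold; otherwise it starts a new group.
    Exact and near duplicates therefore collapse into the same group.
    """
    groups = []  # list of (representative vector, member sentences)
    for s in sentences:
        vec = Counter(tokenize(s))
        for rep, members in groups:
            if cosine(vec, rep) >= threshold:
                members.append(s)
                break
        else:
            groups.append((vec, [s]))
    return [members for _, members in groups]


def term_frequency_by_year(records, top_n=3):
    """records: iterable of (year, sentence). Returns {year: [(term, count), ...]}."""
    counts = defaultdict(Counter)
    for year, sentence in records:
        counts[year].update(tokenize(sentence))
    return {year: c.most_common(top_n) for year, c in sorted(counts.items())}


# Usage with hypothetical example sentences (not data from the study).
records = [
    (2023, "The underpass was not closed before the flooding."),
    (2023, "The underpass was not closed in advance of flooding."),
    (2024, "The river embankment was poorly maintained."),
    (2024, "Embankment maintenance on the river was inadequate."),
]
groups = group_sentences([s for _, s in records])
print(len(groups))  # → 2 (one underpass-closure issue, one embankment issue)
print(term_frequency_by_year(records, top_n=2))
```

Because each sentence is compared only with the first member of each existing group, the result depends on input order; agglomerative clustering over a full pairwise similarity matrix avoids this at quadratic cost.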
Applying the proposed framework to the 2025 disaster cause investigation of the 2023 Osong Underpass Flooding Disaster, we identified 21 problem items in seven categories, such as insufficient advance closure of the underpass and inadequate maintenance of river embankments. We also derived 17 improvement measures in six categories, including improved underpass closure criteria and flood risk grading and strengthened river management practices, and organized them systematically into proposals. The results indicate that combining news big data, text mining, and large language models can effectively structure key issues and institutional weaknesses, and can serve as a useful analytical tool for strengthening the evidence base and explanatory power of disaster cause investigations.
How to cite: Kim, J. E., Shin, H., and Choi, S.: A Big Data and Text Mining–Based Media Analysis Framework for Disaster Cause Investigation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-8542, https://doi.org/10.5194/egusphere-egu26-8542, 2026.