- ¹IMT Mines Albi, France
- ²Department of Risks and Prevention, French Geological Survey BRGM, France
- ³Dauphine Recherches en Management (DRM) UMR 7088, France
- ⁴Predict Service, France
The increasing availability of social media data offers valuable opportunities for real-time crisis monitoring and disaster management. However, extracting actionable insights from these unstructured, multilingual, and often ambiguous data sources remains a significant challenge, particularly in non-English contexts. Natural language processing (NLP) and machine learning techniques are key tools for automating data extraction and enhancing situational awareness for crisis managers, particularly during flash floods and earthquakes.
In crisis management, the rapid processing and transformation of unstructured social media data into actionable information is essential for effective decision-making. While the literature highlights the value of social media for improving the situational awareness of decision-makers, extracting relevant information remains resource-intensive, especially for most French crisis management units, which lack the necessary tools and resources. Although several systems exist for automatically extracting information from social media, only a few of them handle the French language. One of the main challenges with social media data lies in its inherent ambiguity, including semantic variability (context-dependent meanings of words and idioms), informal language (abbreviations, typos, emojis, and neologisms), entity ambiguity (e.g., locations or organizations with identical names), and a high proportion of noisy or irrelevant content.
The French ReSoCIO project addresses these challenges by bringing together experts in earth sciences, AI, and social sciences with specialists and software developers in risk management and forecasting to develop a novel approach to social data disambiguation for geospatial visualization of crisis situations. This study introduces an innovative pipeline that combines filtering, entity linking, and geolocation integration to enhance data disambiguation and is tailored for real-time predictions. The pipeline first employs a supervised classifier to filter out unrelated tweets. Relevant messages are then processed through an entity linking module, where detected entities are disambiguated by matching them with Wikidata entries. This process compares embeddings derived from Wikipedia with tweet embeddings using CamemBERT, enriching the extracted data with contextual and geospatial information. The final step employs large language models (LLMs) to summarize and link the extracted information, ensuring that stakeholders receive concise and accurate overviews validated against structured event reports. By characterizing and predicting the impacts and damages of crisis events, this pipeline provides a robust framework for transforming fragmented online data into structured, actionable knowledge.
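The entity linking step can be illustrated with a minimal sketch: an ambiguous mention is resolved by scoring each candidate entry's description against the tweet and keeping the closest one. This is not the project's implementation. The `embed` function below is a toy bag-of-words stand-in for the CamemBERT embeddings named in the abstract, and the candidate identifiers and descriptions are hypothetical placeholders rather than real Wikidata entries.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts.
    (Assumption: a real system would use contextual embeddings instead.)"""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_entity(tweet: str, candidates: dict[str, str]) -> tuple[str, float]:
    """Return the candidate id whose description is most similar to the tweet,
    together with its similarity score."""
    tweet_vec = embed(tweet)
    scores = {cid: cosine(tweet_vec, embed(desc)) for cid, desc in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidates for the ambiguous mention "Vienne".
candidates = {
    "river_vienne": "Vienne, rivière française, affluent de la Loire",
    "city_vienne_isere": "Vienne, commune française du département de l'Isère",
    "city_vienna_austria": "Vienne, capitale de l'Autriche",
}

tweet = "La Vienne déborde, la rivière inonde les routes"
best_id, score = link_entity(tweet, candidates)
print(best_id, round(score, 3))
```

In this toy example the river candidate wins because its description shares the most vocabulary with the tweet. A contextual encoder would serve the same role while also capturing meaning beyond exact word overlap, and the selected entry could then supply coordinates for the geolocation step.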
The system's performance aligns with state-of-the-art models, effectively identifying entities that correspond with the spatiotemporal patterns of actual natural disasters. This suggests the system's potential utility in enhancing situational awareness for crisis managers by providing timely and accurate geolocated information extracted from social media posts. In addition, experimental observations conducted during the ReSoCIO project confirm the contribution of this disambiguation pipeline to French crisis managers.
How to cite: Montarnal, A., Gracianne, C., Caillaut, G., Sabouni, A., Adrot, A., Chave, S., Rigart, L., Faï, F., and Auclair, S.: ReSoCIO: Towards geospatial visualization of Social Media Data by AI-driven Disambiguation. Application to Crisis Management in the French Context., EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-18732, https://doi.org/10.5194/egusphere-egu25-18732, 2025.