ITS1.13/NH13.1 | Text data and other emerging data sources in earth system sciences
EDI PICO
Convener: Lina Stein | Co-conveners: Mariana Madruga de Brito, Gabriele Messori, Georgia Destouni

Earth System Science is witnessing an ever-increasing availability of textual, digital trace, social sensing, mobile phone, opportunistic sensing, audiovisual, and crowdsourced data. These data open new research avenues and opportunities but also pose important challenges, from technical hurdles and skewed coverage to difficulties in quality control and limits on reproducibility.

Textual data is a case in point. Digital newspaper repositories, social media platforms, and archives of peer-reviewed articles provide vast amounts of digitized text. At the same time, large language models, such as ChatGPT, have opened new, scalable ways of extracting research-relevant and actionable information from texts. However, such models are far from unbiased and may not be transparent, interpretable, or open access, which hinders reproducibility. The same holds true for other types of data and their associated data mining methods, such as knowledge extraction from images, audio, and videos.

This session welcomes abstracts that explore the use of text and other emerging data sources in Earth System Sciences, especially in hydrology, natural hazards, and climate research. The session scope spans data analysis methodologies, scientific advances from the analysis of emerging data, and broader perspectives on the opportunities and challenges that these data sources present. Specific topics include, but are not limited to: assessment of natural hazard impacts (e.g. floods, droughts, landslides, temperature extremes, windstorms), real-time monitoring of disasters, evidence synthesis, public sentiment analysis, policy and awareness tracking, discourse and narrative analyses, natural language processing, large language models, social media analysis, historical data rescue, image mining, deep learning, and machine learning.