- Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
PANGAEA GPT is a Large Language Model (LLM) multi-agent framework that aims to streamline how geoscientists work with the diverse Earth system datasets held in the PANGAEA archive (pangaea.de), a widely used data repository in Earth and Environmental Sciences. Built on the LangChain library and the LangGraph framework, it follows a multi-agent collaboration approach in which a centralized Supervisor Agent interprets incoming user queries and coordinates specialized agents according to task requirements. These specialized agents include the Search Agent, which looks up data via API requests to PANGAEA and locates related publications via Crossref, so that it can also answer questions about what has been published on the basis of a particular dataset. They also include a set of Data Agents configured in different modes, such as "oceanographer," "ecologist," or "geologist," to perform dataset-specific analyses. Each Data Agent operates within a dedicated Python environment that allows for code manipulation, data analysis, visualization, and iterative refinement of results. The Supervisor Agent then aggregates the output of these Data Agents and returns a consolidated response to the user, including the generated analysis scripts. The current framework has been shown to excel at listing relevant datasets, locating related publications, and performing statistical analysis on user request, which greatly simplifies data discovery and use for geoscientists. Beyond the rapid discovery, analysis, and visualization of heterogeneous datasets, a particularly valuable end goal of PANGAEA GPT is to generate concise documentation for historical or underutilized datasets that currently lack related publications, so that their information endures and drives further scientific discoveries.
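The supervisor pattern described above can be illustrated with a minimal sketch in plain Python. This is not the PANGAEA GPT implementation: it does not use LangChain or LangGraph, the agents are stubs that make no API calls, and the keyword-based routing stands in for the LLM-driven query interpretation. All names (`AgentResult`, `Supervisor`, `MODE_KEYWORDS`, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentResult:
    agent: str        # which agent produced this result
    summary: str      # human-readable finding
    script: str = ""  # generated analysis code, if any


def search_agent(query: str) -> AgentResult:
    # Stub: a real Search Agent would query PANGAEA and Crossref here.
    return AgentResult("search", f"datasets and publications matching '{query}'")


def make_data_agent(mode: str) -> Callable[[str], AgentResult]:
    def data_agent(query: str) -> AgentResult:
        # Stub: a real Data Agent would run code in its own Python environment.
        return AgentResult(
            agent=f"data:{mode}",
            summary=f"{mode} analysis for '{query}'",
            script=f"# {mode} analysis script (placeholder)",
        )
    return data_agent


# Assumption: keyword routing replaces the LLM's interpretation of the query.
MODE_KEYWORDS: Dict[str, tuple] = {
    "oceanographer": ("salinity", "temperature", "ctd"),
    "ecologist": ("species", "abundance"),
    "geologist": ("sediment", "core"),
}


class Supervisor:
    def __init__(self) -> None:
        self.data_agents = {m: make_data_agent(m) for m in MODE_KEYWORDS}

    def plan(self, query: str) -> List[Callable[[str], AgentResult]]:
        # Always search first, then add every Data Agent whose keywords match.
        q = query.lower()
        steps: List[Callable[[str], AgentResult]] = [search_agent]
        for mode, keywords in MODE_KEYWORDS.items():
            if any(k in q for k in keywords):
                steps.append(self.data_agents[mode])
        return steps

    def run(self, query: str) -> dict:
        # Call each selected agent, then aggregate into one response.
        results = [step(query) for step in self.plan(query)]
        return {
            "answer": "\n".join(f"[{r.agent}] {r.summary}" for r in results),
            "scripts": [r.script for r in results if r.script],
        }


if __name__ == "__main__":
    response = Supervisor().run("Plot salinity profiles from sediment core sites")
    print(response["answer"])
```

In this sketch the supervisor returns both a consolidated answer and the list of generated scripts, mirroring the aggregation step in the abstract; a real system would replace `plan` with an LLM call and the stubs with tool-using agents.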
How to cite: Pantiukhin, D., Shapkin, B., Kuznetsov, I., Jost, A. A., Jung, T., and Koldunov, N.: PANGAEA GPT: A Coordinated Multi-Agent Architecture for Earth System Data Discovery and Analysis, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-13656, https://doi.org/10.5194/egusphere-egu25-13656, 2025.