EGU25-7059, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-7059
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
PICO | Thursday, 01 May, 08:43–08:45 (CEST)
 
PICO spot 2, PICO2.5
LLM-Enhanced CMIP6 Search
Boris Shapkin, Dmitrii Pantiukhin, Ivan Kuznetsov, Antonia Anna Jost, and Nikolay Koldunov
  • Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany

We present LLM-Enhanced CMIP6 Search, a Python-based tool built with the LangChain and LangGraph frameworks that simplifies the discovery of, and access to, Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data through natural language processing. By combining Large Language Models (LLMs) with retrieval-augmented generation (RAG), our system translates user queries into precise CMIP6 search parameters, bridging the gap between researchers' information needs and CMIP6's structured metadata system. The tool employs a single LLM agent that coordinates three specialized tools: a search tool that maps natural language to CMIP6 parameters (such as model, experiment, and variable identifiers), an access tool that verifies data availability and generates ready-to-use Python code for retrieval, and an adviser tool that helps refine search criteria. To improve search accuracy, we developed a refined database of CMIP6 metadata descriptions that optimizes vector-based similarity matching between user queries and technical CMIP6 terminology. This database provides a foundation for more intuitive climate data discovery.
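
The similarity-matching idea can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps learned embeddings for a plain bag-of-words cosine similarity so that it runs with the Python standard library alone. The variable identifiers (`tas`, `pr`, `tos`, `siconc`) and the `variable_id` facet name are standard CMIP6 terms, but the descriptions, the function names (`tokenize`, `cosine`, `match_variable`), and the tiny catalogue are assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical mini-catalogue: real CMIP6 variable_id values paired with
# short descriptions written for this example (not the authors' database).
VARIABLE_DESCRIPTIONS = {
    "tas": "near surface air temperature at two metres",
    "pr": "precipitation rainfall and snowfall flux",
    "tos": "sea surface temperature ocean",
    "siconc": "sea ice area percentage concentration",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_variable(query, catalogue=VARIABLE_DESCRIPTIONS):
    """Return (variable_id, score) for the description closest to the query."""
    query_vec = Counter(tokenize(query))
    scores = {
        var_id: cosine(query_vec, Counter(tokenize(description)))
        for var_id, description in catalogue.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    var_id, score = match_variable("how much sea ice is there in the Arctic")
    # Build a search-parameter dictionary from the best match.
    print({"variable_id": var_id}, round(score, 3))
```

In a RAG setting the token-count vectors would be replaced by embeddings from a language model, and richer descriptions would give the matcher more vocabulary to align with user phrasing, which is the motivation for refining the metadata descriptions.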

How to cite: Shapkin, B., Pantiukhin, D., Kuznetsov, I., Jost, A. A., and Koldunov, N.: LLM-Enhanced CMIP6 Search, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7059, https://doi.org/10.5194/egusphere-egu25-7059, 2025.