- Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany
We present LLM-Enhanced CMIP6 Search, a Python-based tool built with the LangChain and LangGraph frameworks that uses natural language processing to simplify the discovery of, and access to, Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data. By combining Large Language Models (LLMs) with retrieval-augmented generation (RAG), our system translates user queries into precise CMIP6 search parameters, bridging the gap between researchers' information needs and CMIP6's structured metadata system. The tool employs a single LLM agent that coordinates three specialized tools: a search tool that maps natural language to CMIP6 parameters (such as model, experiment, and variable identifiers); an access tool that verifies data availability and generates ready-to-use Python code for retrieval; and an adviser tool that helps users refine their search criteria. To improve search accuracy, we developed a refined database of CMIP6 metadata descriptions that optimizes vector-based similarity matching between user queries and technical CMIP6 terminology. This provides a foundation for more intuitive climate data discovery.
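The idea behind the search step, matching a free-text query against CMIP6 metadata descriptions by vector similarity, can be illustrated with a minimal sketch. It assumes a simple bag-of-words term-count vector and cosine similarity in place of the embedding model a production RAG pipeline would more likely use. The descriptions attached to the real CMIP6 `variable_id` values `tas`, `pr`, `tos`, `siconc`, and `psl` are hypothetical, hand-written stand-ins, not the authors' refined database, and `rank_variables` is an illustrative helper, not part of the tool's API.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written descriptions for a few real CMIP6 variable_id values.
# They stand in for a refined metadata database; the wording is illustrative only.
VARIABLE_DESCRIPTIONS = {
    "tas": "near-surface air temperature at 2 m, surface warming",
    "pr": "precipitation flux, rainfall and snowfall at the surface",
    "tos": "sea surface temperature of the ocean",
    "siconc": "sea-ice area percentage, sea ice concentration on the ocean grid",
    "psl": "sea level pressure, air pressure at mean sea level",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    """Bag-of-words term-count vector (a stand-in for a learned embedding)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_variables(query: str, descriptions=VARIABLE_DESCRIPTIONS, top_k: int = 3):
    """Return the top_k (variable_id, score) pairs most similar to the query."""
    query_vec = vectorize(query)
    scored = [
        (var_id, cosine(query_vec, vectorize(desc)))
        for var_id, desc in descriptions.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(rank_variables("How warm is the ocean surface?")[0][0])  # → tos
    print(rank_variables("sea ice extent in the Arctic")[0][0])  # → siconc
```

In this toy setting, a query only matches a variable when it shares terms with that variable's description, so adding plain-language synonyms (such as "rainfall" for `pr`) widens the set of queries that land on the right identifier. That is one plausible reading of why a refined description database would help similarity matching, with real embeddings capturing synonymy far better than term overlap.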
How to cite: Shapkin, B., Pantiukhin, D., Kuznetsov, I., Jost, A. A., and Koldunov, N.: LLM-Enhanced CMIP6 Search, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-7059, https://doi.org/10.5194/egusphere-egu25-7059, 2025.