EGU26-21959, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-21959
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Monday, 04 May, 10:45–12:30 (CEST), Display time Monday, 04 May, 08:30–12:30
 
Hall X5, X5.122
An LLM-assisted global proxy database for the Common Era
Feng Shi
  • Institute of Geology and Geophysics, Chinese Academy of Sciences, State Key Laboratory of Lithospheric and Environmental Coevolution, Beijing, China (shifeng@mail.iggcas.ac.cn)

Understanding modern warming in a long-term geological context and separating natural forcing from internal variability remain key challenges in paleoclimatology. The Common Era (last two millennia) provides an important bridge between paleoclimate and the instrumental period. Yet building global proxy databases is often limited by manual literature review, data extraction, and metadata curation, tasks that are time-consuming and can introduce inconsistencies.

We present a workflow assisted by Large Language Models (LLMs), with human oversight, to improve proxy database construction. LLMs are used to parse peer-reviewed paleoclimate publications and extract metadata such as sampling coordinates, temporal coverage, resolution, archive and proxy types, dating methods, and the authors' climate interpretations. The extracted information is then organized into a consistent format and checked through a multi-level quality control (QC) process.
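As an illustration of what a "consistent format" for the extracted metadata might look like, the following is a minimal Python sketch. The class `ProxyRecord`, its field names, and the `validate` function are hypothetical and are not taken from the actual database; the sketch only mirrors the metadata categories listed above and shows how simple range checks could flag implausible LLM output before human review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProxyRecord:
    """Hypothetical container for metadata extracted from one publication."""
    site_name: str
    latitude: float          # decimal degrees, north positive
    longitude: float         # decimal degrees, east positive
    start_year_ce: int       # earliest year covered (CE)
    end_year_ce: int         # latest year covered (CE)
    resolution_years: float  # nominal sampling resolution in years
    archive_type: str        # e.g. "tree ring", "speleothem"
    proxy_type: str          # e.g. "d18O", "tree-ring width"
    dating_method: Optional[str] = None
    climate_interpretation: Optional[str] = None


def validate(record: ProxyRecord) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not -90.0 <= record.latitude <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= record.longitude <= 180.0:
        problems.append("longitude out of range")
    if record.start_year_ce > record.end_year_ce:
        problems.append("start year after end year")
    if record.resolution_years <= 0:
        problems.append("non-positive resolution")
    return problems


# Illustrative values only; this is not a record from the database.
example = ProxyRecord("Example Cave", 30.0, 110.0, 500, 1990, 5.0,
                      "speleothem", "d18O", dating_method="U-Th")
print(validate(example))
```

Records that return a non-empty list would be routed back to a human curator rather than corrected automatically, in keeping with the human-oversight design described above.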

Using this workflow, we compiled data from major community repositories (PAGES2k, ISO2k, ARC2k, SISALv3, ITRDB, and NTPDC) into a dataset of over 12,000 records. The database covers various archives including tree rings, speleothems, sediments, ice cores, corals, sclerosponges, and documentary sources, with around 70 proxy variables (e.g., δ¹⁸O, δ¹³C, tree-ring width). Quality control includes: (i) chronology classification, (ii) expert review of climate targets, and (iii) standardized formatting for analysis.
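QC step (i), chronology classification, could be approximated by a rule-based first pass such as the Python sketch below. The class labels, the keyword lists, and the function name `classify_chronology` are assumptions made for illustration only; the abstract does not specify the actual classification scheme. Anything that matches no rule is deferred to expert review.

```python
def classify_chronology(dating_method: str) -> str:
    """Map a free-text dating method to a coarse, hypothetical chronology class."""
    text = dating_method.lower()
    # Keyword lists are illustrative assumptions, not the database's rules.
    annual_keywords = ("cross-dat", "crossdat", "layer count", "varve", "band count")
    radiometric_keywords = ("14c", "radiocarbon", "u-th", "u/th", "210pb")
    if any(keyword in text for keyword in annual_keywords):
        return "annually counted"
    if any(keyword in text for keyword in radiometric_keywords):
        return "radiometric"
    return "needs expert review"


for method in ("Cross-dating of ring widths", "U-Th dating", "not reported"):
    print(method, "->", classify_chronology(method))
```

A keyword pass like this is deliberately conservative: it assigns a class only on an explicit match and leaves ambiguous descriptions to a human reviewer.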

Initial results show clear cooling signals following major volcanic eruptions (e.g., mid-6th century, 1257 CE, 1815 CE), and selected proxy networks capture ENSO variability and Northern Hemisphere temperature changes reasonably well. Current work focuses on expanding literature coverage through automated search, improving proxy classification with active learning, and developing machine-learning approaches for spatial reconstructions and model-data comparisons.

How to cite: Shi, F.: An LLM-assisted global proxy database for the Common Era, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-21959, https://doi.org/10.5194/egusphere-egu26-21959, 2026.