EGU24-18265, updated on 11 Mar 2024
https://doi.org/10.5194/egusphere-egu24-18265
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

GeoGPT, the large earth science language model system

Jian Wang
  • Zhejiang Lab, Hangzhou, China (jianw@aliyun.com)

GeoGPT, a large earth science language model system for geoscientists, was designed and developed in response to the Deep-time Digital Earth (DDE) International Science Initiative, which was officially launched at the DDE Open Science Forum, co-organized by UNESCO, in 2021.

Starting with leading open-source large language models, GeoGPT has built fundamental capabilities, including extraction of key information from geoscience documents, question-and-answer interaction, logical reasoning, automatic code generation, and numerical computation analysis. A smart incremental training strategy based on open-source large-scale models rapidly enhances the adaptability and performance of GeoGPT in the field of Earth sciences. GeoGPT is architected to adapt flexibly to different foundation models in the future.

To ensure accuracy and professionalism in the field of earth science, a large, high-quality geoscience corpus covering 8 secondary disciplines of earth science has been constructed for GeoGPT, together with a new software system for annotating geoscience data more efficiently. Hundreds of Earth scientists have collaborated to annotate nearly one hundred thousand highly specialized question-and-answer pairs, which has greatly enriched the training data available for this geoscience model.

GeoGPT is a global open science effort across research institutes, universities, industry, and other organizations. The GeoGPT model is open to the global research community today. Open access to the large-scale datasets used in GeoGPT and to the GeoGPT API is also planned for the Earth science community. GeoGPT is helping to transform the research paradigm of earth science through its potential to generate scientific hypotheses, construct theoretical models, and draft research plans.

How to cite: Wang, J.: GeoGPT, the large earth science language model system, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18265, https://doi.org/10.5194/egusphere-egu24-18265, 2024.

This abstract will not be presented.