EGU26-12877, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-12877
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Tuesday, 05 May, 16:15–18:00 (CEST), Display time Tuesday, 05 May, 14:00–18:00
 
Hall X4, X4.31
EVE: An Open Source Earth Science LLM for Researchers, Policymakers, and the Public
Àlex R. Atrio¹, Antonio Lopez¹, Jino Rohit¹, Yassine Elhouadi², Marcello Politi¹, Vijayasri Iyer¹, Sébastien Bratières¹,³, Umar Jamil², and Nicolas Longépé⁴
  • ¹Pi School, Rome, Italy
  • ²Mistral AI, Paris, France
  • ³Translated, Rome, Italy
  • ⁴European Space Agency Φ-Lab, Frascati, Italy

Recent advances in Large Language Models (LLMs) have created opportunities to support reasoning, discovery, and synthesis in Earth Observation (EO) and Earth Sciences, provided domain specificity and reliability can be ensured. In this work, we introduce Earth Virtual Expert (EVE), a comprehensive open-source initiative to develop, evaluate, and deploy a domain-specialized LLM for EO. EVE serves as a testbed for studying domain-adaptive training, grounded generation, and evaluation strategies tailored to scientific use, rather than general-purpose conversational performance.

As part of this initiative, we present EVE-instruct, a text-only, instruction-tuned, and aligned LLM specialized for EO. Built on Mistral Small 3.2 (24B parameters) with a 128k-token context window, it focuses on domain-specific reasoning, question answering, and retrieval- and hallucination-aware generation, without a significant loss of general capabilities. We release all data used to train and evaluate EVE-instruct: a large-scale curated EO corpus of 3B tokens, synthetic fine-tuning datasets derived from this corpus (4B tokens), and manually created EO-specific evaluation sets comprising 7,500 samples across multiple-choice and open-ended question answering, together with factuality test sets.
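To make the multiple-choice part of such an evaluation concrete, here is a minimal sketch of how accuracy might be scored. It assumes, hypothetically, that each gold answer is a single option letter A to D and that the model is prompted to begin its reply with the chosen letter (optionally prefixed by "Answer:"). The helper names `extract_choice` and `mcqa_accuracy` are illustrative only and are not part of the released EVE code.

```python
import re
from typing import List, Optional

# Hypothetical reply format: an optional "Answer:" prefix, then a letter A-D,
# optionally in parentheses, followed by punctuation, whitespace, or end of text.
_CHOICE = re.compile(r"\s*(?:(?i:answer)\s*:\s*)?\(?([A-D])\)?(?:[.:)\s]|$)")


def extract_choice(reply: str) -> Optional[str]:
    """Return the option letter at the start of a model reply, or None if absent."""
    match = _CHOICE.match(reply)
    return match.group(1) if match else None


def mcqa_accuracy(replies: List[str], gold: List[str]) -> float:
    """Fraction of replies whose extracted letter equals the gold letter.

    Replies with no recognizable letter count as incorrect.
    """
    if len(replies) != len(gold):
        raise ValueError("replies and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(extract_choice(r) == g for r, g in zip(replies, gold))
    return correct / len(gold)


if __name__ == "__main__":
    replies = ["B", "(C) Sentinel-2 carries MSI", "Answer: A", "I am not sure"]
    gold = ["B", "C", "D", "A"]
    print(mcqa_accuracy(replies, gold))  # 2 of 4 correct -> 0.5
```

Treating unparseable replies as wrong, rather than dropping them, keeps the denominator fixed across models, so accuracies stay comparable.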

To support trustworthy use and deployment, we further develop a Retrieval-Augmented Generation (RAG) database built from the curated corpus and a hallucination-detection module focused on factual consistency and scientific grounding. These components are integrated with EVE-instruct, deployed with a graphical user interface, and accessible via an API; the system currently serves more than 300 users from EO research and industry.
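The abstract does not specify how EVE's retrieval or hallucination detection is implemented, so the following is only a generic, self-contained sketch of the two ideas. It swaps in bag-of-words cosine similarity for retrieval (production RAG systems usually rely on dense embeddings) and a crude lexical "support score" for grounding (real factual-consistency checks typically use NLI or LLM-based judges). All function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

_TOKEN = re.compile(r"[a-z0-9]+")
# Tiny illustrative stop-word list; a real system would use a fuller one.
_STOP = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on", "with", "by", "at"}


def tokens(text: str) -> List[str]:
    """Lower-case alphanumeric tokens with stop words removed."""
    return [t for t in _TOKEN.findall(text.lower()) if t not in _STOP]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query (lexical stand-in for a vector DB)."""
    q = Counter(tokens(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokens(p))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Prepend numbered retrieved passages so the model can ground its answer."""
    joined = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return f"Answer using only the context below.\n{joined}\n\nQuestion: {query}"


def support_score(answer: str, context: List[str]) -> float:
    """Fraction of answer tokens that also occur in the retrieved context.

    A crude lexical proxy for grounding: a low score flags a possible hallucination.
    """
    ans = tokens(answer)
    if not ans:
        return 0.0
    ctx = set(tokens(" ".join(context)))
    return sum(t in ctx for t in ans) / len(ans)


if __name__ == "__main__":
    corpus = [
        "Sentinel-1 carries a C-band synthetic aperture radar.",
        "Sentinel-2 carries a multispectral imager with 13 spectral bands.",
        "Landsat 8 carries the Operational Land Imager and a thermal sensor.",
    ]
    query = "How many spectral bands does the Sentinel-2 imager have?"
    context = retrieve(query, corpus, k=1)
    print(build_prompt(query, context))
    print(support_score("The Sentinel-2 imager has 13 spectral bands.", context))
    print(support_score("Sentinel-2 has 20 thermal bands.", context))  # 0.5
```

In this toy run the Sentinel-2 passage is retrieved, the grounded answer scores about 0.86, and the answer with invented figures scores 0.5; a threshold on such a score is one simple way to decide when to warn the user.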

All models, datasets, and code are publicly released at: https://huggingface.co/eve-esa and https://github.com/eve-esa.

How to cite: R. Atrio, À., Lopez, A., Rohit, J., Elhouadi, Y., Politi, M., Iyer, V., Bratières, S., Jamil, U., and Longépé, N.: EVE: An Open Source Earth Science LLM for Researchers, Policymakers, and the Public, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-12877, https://doi.org/10.5194/egusphere-egu26-12877, 2026.