Open-source large language models in astronomical data classification: applications and benchmarking

Evgeny Smirnov

doi:https://doi.org/10.5194/epsc-dps2025-226

Session MITM5

EPSC Abstracts

Vol. 18, EPSC-DPS2025-226, 2025, updated on 09 Jul 2025


EPSC-DPS Joint Meeting 2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Evgeny Smirnov

Belgrade Astronomical Observatory, Serbia (smirik@gmail.com)

Recent advances in large language models (LLMs) have opened new possibilities for astronomical data analysis and classification tasks. While multimodal models such as GPT-4o/4.1 and Claude 3.5/3.7 have demonstrated remarkable capabilities in processing both text and images, their application to astronomy has been limited by substantial operational costs. This work presents a comprehensive evaluation of open-source LLMs, including LLaMA 3.2, Gemma, and DeepSeek, for astronomical data classification, with particular emphasis on mean-motion resonance identification in asteroid dynamics.

This research demonstrates that open-source models can achieve performance acceptable for the given problem and can outperform traditional neural networks, while significantly reducing operational costs. The approach leverages the inherent pattern-recognition capabilities of LLMs to analyze time series and astronomical images, tasks that traditionally require specialized algorithms and extensive computational resources. Building on previous work that established the viability of multimodal LLMs for resonance identification, it is shown that careful prompt engineering, model instruction, and fine-tuning can yield acceptable accuracy even with freely available models that run on a researcher's laptop.
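The abstract does not give the prompts or the label set used. As a minimal sketch of the time-series route, assuming the resonant angle is supplied as a numeric series and the model must answer with one of three hypothetical labels (libration, circulation, mixed), the series can be rendered as a text table and the model's free-form reply mapped back onto a label. The names `build_prompt`, `parse_label` and `LABELS`, the 200-point cap, and the prompt wording are illustrative assumptions, not the author's implementation; the call to the model itself is omitted because it depends on the local runtime.

```python
import math
import re

# Hypothetical label set (assumption): the abstract does not list the classes.
LABELS = ("libration", "circulation", "mixed")


def sample_indices(n, max_points):
    """Return at most max_points evenly spaced indices into a series of length n."""
    if n <= max_points:
        return list(range(n))
    step = n / max_points
    return [int(i * step) for i in range(max_points)]


def build_prompt(times, angles_deg, max_points=200):
    """Render a resonant-angle time series as a plain-text classification prompt."""
    rows = "\n".join(
        # Wrap the angle to [0, 360) so the model sees a consistent range.
        f"{times[i]:.1f}\t{angles_deg[i] % 360:.1f}"
        for i in sample_indices(len(times), max_points)
    )
    return (
        "You are classifying the resonant angle of an asteroid.\n"
        "The table below gives time (yr) and angle (deg, wrapped to [0, 360)).\n"
        f"Answer with exactly one word from: {', '.join(LABELS)}.\n\n"
        f"time\tangle\n{rows}\n"
    )


def parse_label(reply):
    """Map a free-text reply onto exactly one label; return None if absent or ambiguous."""
    text = reply.lower()
    found = [lab for lab in LABELS if re.search(rf"\b{lab}\b", text)]
    return found[0] if len(found) == 1 else None


if __name__ == "__main__":
    # Synthetic librating angle: oscillates around 180 deg with 40 deg amplitude.
    t = [float(i) for i in range(1000)]
    phi = [180 + 40 * math.sin(2 * math.pi * x / 1000) for x in t]
    print(build_prompt(t, phi)[:300])
    print(parse_label("Answer: Libration."))
```

Forcing a one-word answer and treating ambiguous replies as `None` keeps scoring deterministic; such replies can then be counted as errors rather than silently guessed.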

Furthermore, a structured framework for developing standardized benchmarks for astronomical tasks using LLMs is introduced. This framework includes: (1) systematic dataset curation protocols, (2) evaluation metrics adapted to specific astronomical applications, (3) cross-model performance comparison methodologies, and (4) guidelines for prompt engineering techniques. These benchmarks enable reproducible performance assessment across different LLM architectures and can help identify cost-effective solutions for specific astronomical problems.
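Items (2) and (3) of the framework can be illustrated with a small scoring harness. This is a sketch under stated assumptions: every model is run on the same labelled items, unparseable replies are recorded as `None` and counted as errors, and accuracy plus macro-averaged F1 are used as example metrics because the abstract does not name the ones actually adopted. The functions `evaluate` and `compare` and the model names are hypothetical.

```python
from collections import Counter


def evaluate(y_true, y_pred, labels):
    """Accuracy, macro-F1 and the share of unparsed replies (None counts as wrong)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need non-empty, equal-length label lists")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1          # the true class was missed
            if p is not None:
                fp[p] += 1      # a wrong class was asserted
    f1s = []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_f1": sum(f1s) / len(labels),
        "unparsed": sum(p is None for p in y_pred) / len(y_pred),
    }


def compare(y_true, predictions, labels):
    """Score each model on the same items and rank them by macro-F1, best first."""
    scores = {name: evaluate(y_true, preds, labels) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1]["macro_f1"], reverse=True)


if __name__ == "__main__":
    labels = ("libration", "circulation", "mixed")
    y_true = ["libration", "libration", "circulation", "mixed"]
    preds = {
        "model_a": ["libration", "libration", "circulation", "mixed"],
        "model_b": ["libration", None, "circulation", "circulation"],
    }
    for name, metrics in compare(y_true, preds, labels):
        print(name, metrics)
```

Macro-F1 weights every class equally, which matters when some classes are much rarer than others; a per-model cost figure could be attached to each entry to support the cost-effectiveness comparison the framework aims at.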

How to cite: Smirnov, E.: Open-source large language models in astronomical data classification: applications and benchmarking, EPSC-DPS Joint Meeting 2025, Helsinki, Finland, 7–12 Sep 2025, EPSC-DPS2025-226, https://doi.org/10.5194/epsc-dps2025-226, 2025.