EGU23-1902, updated on 22 Feb 2023
https://doi.org/10.5194/egusphere-egu23-1902
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

TACTICIAN: AI-based applications knowledge extraction from ESA’s mission scientific publications

Omiros Giannakis (1), Iason Demiros (2), Konstantinos Koutroumbas (1), Athanasios Rontogiannis (1,3), Vassilis Antonopoulos (2), Guido De Marchi (4), Christophe Arviset (5), George Balasis (1), Athanasios Daglis (1), George Vasalos (1), Zoe Boutsi (1), Jan Tauber (4), Marcos Lopez-Caniego (5), Mark Kidger (5), Arnaud Masson (5), and Philippe Escoubet (4)
  • (1) National Observatory of Athens, Institute for Astronomy, Astrophysics, Space Applications and Remote Sensing, Greece (ogiannakis@noa.gr)
  • (2) 11Tensors, Greece
  • (3) School of Electrical & Computer Engineering, National Technical University of Athens, 9, Iroon Polytechniou St., 157 80, Athens, Greece
  • (4) European Space Research and Technology Centre, European Space Agency, the Netherlands
  • (5) European Space Astronomy Center, European Space Agency, Spain

Scientific publications in space science contain extensive and valuable information on the links between the data interpreted by the authors and the associated observational elements (e.g., instrument or experiment names, observing times). Faced with this information overload, researchers are often overwhelmed by the enormous and continuously growing number of articles they need to consult in their daily work. Exploring recent advances on specific topics, methods and techniques, reviewing and evaluating research proposals, and in general any activity that requires a careful and comprehensive assessment of the scientific literature has become an extremely complex and time-consuming task.

Natural Language Processing (NLP) tools that can extract information from unstructured scientific text and turn it into highly organized, interconnected knowledge are fundamental to making use of scientific information. Exploiting the knowledge contained in scientific publications requires state-of-the-art NLP. The semantic interpretation of scientific texts can support a varied set of applications, such as information retrieval from the texts, linking to existing knowledge repositories, topic classification, semi-automatic assessment of publications and research proposals, tracking of scientific and technological advances, scientific intelligence-assisted reporting, review writing, and question answering.

The main objectives of TACTICIAN are to introduce Artificial Intelligence (AI) techniques to the textual analysis of the publications of all ESA Space Science missions, to monitor and evaluate the scientific productivity of the science missions, and to integrate the scientific publications’ metadata into the ESA Space Science Archive. Through TACTICIAN, we extract lexical, syntactic, and semantic information from the scientific publications by applying NLP and Machine Learning (ML) algorithms and techniques. Drawing on the wealth of publications, we have created valuable scientific language resources, such as labeled datasets and word embeddings, which were used to train Deep Learning models that assist us in most of the language understanding tasks. In the context of TACTICIAN, we have devised methodologies and developed algorithms that can assign scientific publications to the Mars Express, Herschel, and Cluster ESA science missions and identify selected named entities and observations in these publications. We also introduced a new unsupervised ML technique, based on Nonnegative Matrix Factorization (NMF), for classifying the Planck mission scientific publications into categories according to their use of the Planck data products.
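
To illustrate the general idea behind NMF-based grouping of publications, the following minimal sketch factorizes a small document-term count matrix V into nonnegative factors W and H and assigns each document to its dominant component. It is not the technique introduced in TACTICIAN, whose details are not given here: it uses the standard multiplicative-update rules for the squared reconstruction error, and the toy counts, vocabulary, and function names (`nmf`, `assign_categories`) are hypothetical.

```python
import random

# Hypothetical toy data: rows are publications, columns are term counts for
# the vocabulary ["cmb", "power", "spectrum", "dust", "polarization", "foreground"].
TOY_COUNTS = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iterations=500, seed=0, eps=1e-9):
    """Approximate V (n x m) by W (n x k) times H (k x m), all nonnegative.

    Uses standard multiplicative updates for the squared error; this is an
    assumption for illustration, not the variant used in TACTICIAN.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization keeps every update well defined.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[c][j] * num[c][j] / (den[c][j] + eps) for j in range(m)]
             for c in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(n)]
    return W, H


def assign_categories(W, H):
    """Label each document with the component that contributes most to it.

    Each weight is scaled by the row sum of H so that the comparison does not
    depend on how the overall scale is split between W and H.
    """
    k = len(H)
    scale = [sum(H[c]) for c in range(k)]
    return [max(range(k), key=lambda c: row[c] * scale[c]) for row in W]


if __name__ == "__main__":
    W, H = nmf(TOY_COUNTS, k=2)
    print(assign_categories(W, H))
```

In this sketch the component indices are arbitrary: NMF recovers groups of documents with similar vocabulary, and mapping those groups to meaningful categories (for example, how a data product is used) would still require inspecting the dominant terms in each row of H.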

These methodologies can be applied to any other mission. The combination of NLP and ML provides a general basis that has proved able to assist in establishing links between the missions’ observations and the scientific publications, and in classifying the publications into categories, with high accuracy.

This work has received funding from the European Space Agency under the "ArTificiAl intelligenCe To lInk publiCations wIth observAtioNs (TACTICIAN)" activity under ESA Contract No 4000128429/19/ES/JD.

How to cite: Giannakis, O., Demiros, I., Koutroumbas, K., Rontogiannis, A., Antonopoulos, V., De Marchi, G., Arviset, C., Balasis, G., Daglis, A., Vasalos, G., Boutsi, Z., Tauber, J., Lopez-Caniego, M., Kidger, M., Masson, A., and Escoubet, P.: TACTICIAN: AI-based applications knowledge extraction from ESA’s mission scientific publications, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-1902, https://doi.org/10.5194/egusphere-egu23-1902, 2023.