EGU25-17005, updated on 21 Mar 2025
https://doi.org/10.5194/egusphere-egu25-17005
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 02 May, 10:45–12:30 (CEST), Display time Friday, 02 May, 08:30–12:30
 
Hall X4, X4.145
Building semantic bridges between multi-domain scientific data resources
Alexandra Kokkinaki1, Gwenaelle Moncoiffe1, Christelle Pierkot2, and Guillaume Alviset2
  • 1National Oceanography Centre, British Oceanographic Data Centre, Liverpool, UK
  • 2IR DATA-TERRA / UAR Data Terra, Montpellier, France

Data silos exist as a legacy of domain-specific semantic and syntactic conventions, and they continue to be created as an inevitable consequence of the exploratory and experimental nature of scientific investigation. New technologies are developed and adopted, new variables are measured, new terms and concepts are defined, and new formats are required, typically within fairly narrow and highly specialised fields of research. Data-sharing technologies must adapt to this evolving environment, and flexibility and connectivity between neighbouring or overlapping fields of research are key. The challenge is to bridge semantic gaps and discrepancies so that data can be seamlessly discovered, shared and exploited.

Achieving cross-domain interoperability requires the establishment and harmonisation of both the syntax and semantics of datasets. The Semantic Analyser was developed to address the semantic challenge by scanning metadata records and data files to identify and analyse the semantics used for specific metadata elements, focusing on instruments, parameters, platforms, and keywords.

To determine whether the values of these metadata elements originated from semantic artefacts, we initially explored leveraging existing large semantic repositories, such as Earth Portal and BioPortal. These repositories offer an extensive range of semantic artefacts, which could reduce the effort required to match terms. However, this approach presented two significant challenges: (1) a federated term-matching service querying these repositories proved slow and inefficient, and (2) the large number of matched terms confused users, largely because of the difficulty of selecting appropriate vocabularies and ontologies for a specific domain and target context.

To overcome these obstacles, we built a dedicated knowledge base (KB) containing well-known vocabularies relevant to the datasets in focus. The KB was refined iteratively as new insights were gained, providing a streamlined, domain-specific solution for semantic harmonisation that improved both the usability and the performance of the Semantic Analyser.
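The abstract does not say how the KB is stored or queried. As a rough sketch of why a local, curated KB addresses both problems, the code below indexes concepts from a small set of selected vocabularies in an in-memory dictionary, so a lookup needs no network round trip to remote repositories and can only return candidates from vocabularies chosen for the domain. The `Concept` and `KnowledgeBase` names, the normalisation rule and the `example.org` URIs are all hypothetical, and matching is deliberately limited to exact comparison of normalised preferred and alternative labels.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    vocabulary: str   # short name of the source vocabulary (hypothetical)
    uri: str          # concept identifier
    pref_label: str   # preferred label


def normalise(term):
    """Case-fold and collapse whitespace so trivial spelling variants match."""
    return re.sub(r"\s+", " ", term).strip().casefold()


class KnowledgeBase:
    """In-memory index from normalised labels to concepts."""

    def __init__(self):
        self._index = {}

    def add(self, concept, alt_labels=()):
        # Index the preferred label and every alternative label.
        for label in (concept.pref_label, *alt_labels):
            self._index.setdefault(normalise(label), set()).add(concept)

    def match(self, term):
        """Return concepts whose preferred or alternative label equals the term."""
        return sorted(self._index.get(normalise(term), set()), key=lambda c: c.uri)

    def unmatched(self, terms):
        """List terms with no match: candidates for extending the KB."""
        return [t for t in terms if not self.match(t)]
```

Usage: after `kb.add(Concept("instruments", "https://example.org/vocab/instruments/ctd", "CTD"), alt_labels=["CTD profiler"])`, both `kb.match(" ctd ")` and `kb.match("ctd   PROFILER")` return that single concept, while `kb.unmatched(["CTD", "ADCP"])` returns `["ADCP"]`. Reviewing the unmatched terms is one plausible way to drive the iterative refinement described above, by adding vocabularies or alternative labels where gaps appear.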

How to cite: Kokkinaki, A., Moncoiffe, G., Pierkot, C., and Alviset, G.: Building semantic bridges between multi-domain scientific data resources, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17005, https://doi.org/10.5194/egusphere-egu25-17005, 2025.