- 1 National Oceanography Centre, British Oceanographic Data Centre, Liverpool, UK
- 2 IR DATA-TERRA / UAR Data Terra, Montpellier, France
Data silos arise from domain-specific semantic and syntactic legacies, and they continue to be created as an inevitable consequence of the exploratory and experimental nature of scientific investigation. New technologies are developed and adopted, new variables are measured, new terms and concepts are defined, and new formats are required, all within typically narrow and highly specialised fields of research. Data-sharing technologies must adapt to this evolving environment. Flexibility and connectivity between neighbouring or overlapping fields of research are key. The challenge is to bridge the semantic gaps and discrepancies so that data can be discovered, shared and exploited seamlessly.
Achieving cross-domain interoperability requires the establishment and harmonisation of both the syntax and semantics of datasets. The Semantic Analyser was developed to address the semantic challenge by scanning metadata records and data files to identify and analyse the semantics used for specific metadata elements, focusing on instruments, parameters, platforms, and keywords.
To determine whether the values for these metadata elements originated from semantic artefacts, we initially explored leveraging existing large semantic repositories, such as Earth Portal and BioPortal. These repositories offer extensive collections of semantic artefacts, potentially reducing the effort required to match terms. However, this approach presented two significant challenges: (1) a federated service for term matching against these repositories proved slow and inefficient, and (2) the large number of matched terms confused users, largely because it was difficult to select the vocabularies and ontologies appropriate to a specific domain and context.
To overcome these obstacles, we decided to construct a dedicated knowledge base (KB) containing well-known vocabularies relevant to the datasets in focus. The KB was iteratively refined as new insights were gained, providing a streamlined, domain-specific solution for semantic harmonisation and improving the usability and performance of the Semantic Analyser.
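As an illustration only, matching metadata values against a small local knowledge base can be sketched as follows. This is not the Semantic Analyser's implementation: the `Concept` structure, the `build_index` and `analyse` functions, the vocabulary names and the `example.org` URIs are all hypothetical, and a simple normalised-label lookup stands in for whatever matching strategy the tool actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A vocabulary term held in the local knowledge base (hypothetical structure)."""
    uri: str
    pref_label: str
    vocabulary: str


# Hypothetical KB: a few placeholder concepts per metadata element.
# The URIs and vocabulary names are invented for this sketch.
KB = {
    "parameters": [
        Concept("https://example.org/vocab/param/001", "Sea water temperature", "example-parameters"),
        Concept("https://example.org/vocab/param/002", "Practical salinity", "example-parameters"),
    ],
    "platforms": [
        Concept("https://example.org/vocab/platform/001", "Research vessel", "example-platforms"),
    ],
}


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivially different labels compare equal."""
    return " ".join(text.casefold().split())


def build_index(kb: dict) -> dict:
    """Build one lookup table per metadata element, keyed by URI and by normalised label."""
    index = {}
    for element, concepts in kb.items():
        table = {}
        for concept in concepts:
            table[concept.uri] = concept
            table[normalise(concept.pref_label)] = concept
        index[element] = table
    return index


def match(value: str, table: dict) -> Optional[Concept]:
    """Return the concept whose URI or label matches the value, or None."""
    return table.get(value.strip()) or table.get(normalise(value))


def analyse(record: dict, index: dict) -> dict:
    """For each metadata element, pair every value with its matched concept (or None)."""
    report = {}
    for element, values in record.items():
        table = index.get(element, {})
        report[element] = [(value, match(value, table)) for value in values]
    return report


if __name__ == "__main__":
    index = build_index(KB)
    record = {
        "parameters": ["  sea water  TEMPERATURE", "water colour"],
        "platforms": ["https://example.org/vocab/platform/001"],
    }
    for element, results in analyse(record, index).items():
        for value, concept in results:
            status = concept.uri if concept else "no match in KB"
            print(f"{element}: {value!r} -> {status}")
```

Because the lookup is local and restricted to vocabularies chosen for the domain, each value yields at most one candidate per element, which is the contrast with the slow, many-candidate federated approach described above.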
How to cite: Kokkinaki, A., Moncoiffe, G., Pierkot, C., and Alviset, G.: Building semantic bridges between multi-domain scientific data resources, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-17005, https://doi.org/10.5194/egusphere-egu25-17005, 2025.