- 1 Department of Earth Science, University of Bergen, Bergen, Norway
- 2 Istituto Nazionale di Geofisica e Vulcanologia, Rome, Italy
- 3 NORSAR, Kjeller, Norway
- 4 GFZ Helmholtz Centre for Geosciences, Potsdam, Germany
The rapid evolution of cross-disciplinary research in geoscience has led to an exponential increase in the production of complex data, which poses significant challenges for data experts and for the management of data repositories. This complexity is evident in large-scale data infrastructure projects such as the EU-funded Geo-INQUIRE project, which includes five major Research Infrastructures (RIs) in geoscience, namely EPOS-ERIC, EMSO-ERIC, ECCSEL-ERIC, ARISE and ChEESE, offering both Transnational Access (TA) and Virtual Access (VA).
Integrating data from TA into unified VA systems often presents challenges, particularly in multi-institutional projects. The process requires significant expert intervention, often involves an excessive number of meetings, and can end in integration failures.
To address this, this contribution proposes a novel data science-driven method targeting research infrastructure governance challenges. The approach introduces an automated analytical framework to guide the integration of TA assets into VA systems. Leveraging Large Language Models (LLMs) for semantic embedding, the method transforms unstructured metadata from VA and TA sources into structured vector representations. The resulting unified data frame then undergoes a series of similarity analyses based on cross-semantic embedding evaluations. Using data from the multidisciplinary Geo-INQUIRE project, the method is tested for its ability to manage complex asset integration across five major geoscience RIs.
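The embedding-and-similarity step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a simple term-count stand-in for an LLM embedding model, cosine similarity is assumed as the similarity measure, and all asset names and metadata strings are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for an LLM embedding (assumption): a sparse term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical free-text metadata for TA assets and VA services.
ta_assets = {
    "TA-seismic-array": "seismic array waveform recordings from field deployment",
    "TA-ocean-lab": "seafloor observatory pressure and temperature measurements",
}
va_services = {
    "VA-waveform-archive": "archive serving seismic waveform recordings and station metadata",
    "VA-ocean-portal": "portal for seafloor observatory temperature and pressure time series",
}


def rank_matches(ta_assets, va_services):
    """For each TA asset, rank all VA services by descending similarity."""
    va_vecs = {name: embed(desc) for name, desc in va_services.items()}
    ranking = {}
    for ta_name, ta_desc in ta_assets.items():
        ta_vec = embed(ta_desc)
        scores = [(cosine(ta_vec, vec), name) for name, vec in va_vecs.items()]
        ranking[ta_name] = sorted(scores, reverse=True)
    return ranking


matches = rank_matches(ta_assets, va_services)
best = {ta: ranked[0][1] for ta, ranked in matches.items()}
for ta, va in best.items():
    print(f"{ta} -> {va}")
```

In a real pipeline, `embed` would presumably be replaced by calls to an embedding model, with the ranking logic left unchanged.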
The primary outcome is a preemptive framework that streamlines the connections needed to integrate TA assets into appropriate VA systems and supports decision-making on the asset integration flow.
The resulting mapping not only optimizes TA-VA asset matching but also uncovers cross-connections between installations (services) and between RIs, as well as potential multi-institutional collaborations. Furthermore, the research explores complex scenarios through idealized simulations in which TA-VA metadata variables are changed, proposing alternative integration pathways when minor asset adjustments or enhancements are implemented at the VA installation level.
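A what-if simulation of this kind might look like the following sketch. It is only an illustration under stated assumptions: term-count vectors stand in for LLM embeddings, cosine similarity is the assumed measure, and the TA description, VA service names and the simulated metadata enhancement are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for an LLM embedding (assumption): a sparse term-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_match(ta_desc, va_services):
    """Return the name of the VA service most similar to the TA description."""
    ta_vec = embed(ta_desc)
    return max(va_services, key=lambda name: cosine(ta_vec, embed(va_services[name])))


# Hypothetical TA asset and baseline VA metadata.
ta_desc = "infrasound array recordings of atmospheric pressure waves"
va_baseline = {
    "VA-seismic-archive": "archive of seismic array recordings",
    "VA-atmosphere-portal": "portal for atmospheric model output",
}

# Simulated enhancement at the VA installation level: the portal's metadata
# is extended to state that it also hosts infrasound recordings.
va_enhanced = dict(va_baseline)
va_enhanced["VA-atmosphere-portal"] += " with infrasound pressure waves recordings"

before = best_match(ta_desc, va_baseline)
after = best_match(ta_desc, va_enhanced)
print(f"baseline pathway: {before}")
print(f"pathway after enhancement: {after}")
```

Comparing `before` and `after` shows how a small change to one VA service's metadata can redirect a TA asset to a different integration pathway.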
This contribution is a proof-of-concept study of a data-driven solution aimed at streamlining data integration in large-scale geoscience projects. It could potentially reduce expert intervention, enhance cross-disciplinary research opportunities, and improve overall efficiency in managing complex, multi-institutional data infrastructures.
How to cite: Ramanantsoa, J., Bailo, D., Michalek, J., Näsholm, S. P., Paciello, R., and Strollo, A.: Optimizing Transnational and Virtual Access: A Data-Driven Framework for Managing Geoscience Research Infrastructure, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10649, https://doi.org/10.5194/egusphere-egu25-10649, 2025.