EGU25-9063, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-9063
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Wednesday, 30 Apr, 10:45–12:30 (CEST), Display time Wednesday, 30 Apr, 08:30–12:30 | Hall X4, X4.66
Improving the findability of legacy laboratory data: enrich metadata using controlled vocabularies
Laurens Samshuijzen¹, Otto Lange¹, Ronald Pijnenburg², Richard Wessels², and Maik Nothbaum²
  • ¹ Utrecht University Library, Utrecht University, Utrecht, Netherlands
  • ² Faculty of Geosciences, Utrecht University, Utrecht, Netherlands

The EPOS TCS Multi-Scale Laboratories (MSL) collects and harmonizes both existing and newly emerging laboratory (meta)data, aiming to generate data products that are easily Findable, Accessible, Interoperable and Reusable (FAIR) for future research, notably into Geo-resources, Geo-storage, Geo-hazards and Earth System Evolution. Key to the discovery of MSL data is the use of well-established, openly published, controlled community vocabularies. These vocabularies provide the terms needed for a full contextual description of a laboratory experiment (e.g., the materials and apparatus used). To improve the findability of future data publications, we provide (metadata) editor components that connect to the community vocabularies. The vocabularies themselves are openly accessible and ready for incorporation into existing data publication chains at data repositories.

Challenges arise especially with respect to legacy content stemming from the long tail of science, i.e., data that were published before the MSL community standards for metadata and vocabularies became available. In many such cases, standardized metadata for discovery and provenance is limited. To improve the findability of these valuable but non-harmonized data publications, we developed a strategy that makes use of the MSL vocabularies. With this strategy we demonstrate how controlled vocabularies can be used to fill metadata gaps in older data publications, and can thus improve FAIRness not only for new data publications but for older data sets as well.

The first challenge was to identify relevant legacy content within the large offering at repositories. Using controlled term recognition, we were able to identify a large set of data publications that appeared to be relevant to the MSL community. The second challenge was to enrich the metadata of the identified publications in order to improve their findability. Combining the MSL vocabularies with a textual analysis of the collected abstracts and titles allowed for a hierarchical description of the data, the experiment itself, and the equipment used. The result was improved findability through an extension of the initial metadata.
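
As an illustration only, the two steps described above (recognizing controlled terms in titles and abstracts, then attaching hierarchical keywords) could be sketched as follows. This is a minimal sketch under stated assumptions, not the workflow actually used by MSL: the vocabulary excerpt, the record fields (`title`, `abstract`, `keywords`, `relevant`), the function names and the relevance threshold are all hypothetical, and the matching is plain whole-word lookup rather than a full textual analysis.

```python
import re

# Hypothetical vocabulary excerpt: each label maps to its path in the
# hierarchy, broadest term first. The terms are illustrative and are
# NOT taken from the published MSL vocabularies.
VOCABULARY = {
    "sandstone": ("material", "sedimentary rock", "sandstone"),
    "limestone": ("material", "sedimentary rock", "limestone"),
    "triaxial compression apparatus": (
        "apparatus", "deformation testing", "triaxial compression apparatus"
    ),
    "permeability": ("measured property", "permeability"),
}


def recognize_terms(text, vocabulary=VOCABULARY):
    """Return the sorted hierarchy paths of all vocabulary terms found in text."""
    lowered = text.lower()
    found = []
    for label, path in vocabulary.items():
        # Whole-word match on the lowercased text; the optional trailing
        # "s" catches simple plurals such as "sandstones".
        pattern = r"\b" + re.escape(label) + r"s?\b"
        if re.search(pattern, lowered):
            found.append(path)
    return sorted(found)


def enrich(record, min_hits=1):
    """Return a copy of a metadata record extended with recognized keywords.

    `min_hits` is an assumed threshold: a record counts as relevant when
    at least that many vocabulary terms occur in its title and abstract.
    """
    text = record.get("title", "") + " " + record.get("abstract", "")
    paths = recognize_terms(text)
    enriched = dict(record)
    enriched["keywords"] = [" > ".join(path) for path in paths]
    enriched["relevant"] = len(paths) >= min_hits
    return enriched


legacy_record = {
    "title": "Permeability of sandstones under stress",
    "abstract": "Samples were deformed in a triaxial compression apparatus.",
}
enriched_record = enrich(legacy_record)
for keyword in enriched_record["keywords"]:
    print(keyword)
```

Storing each keyword as a full path, rather than as a bare term, is what lets the same enrichment result be described at several levels of the hierarchy at once.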

The extended metadata is shared via the EPOS Platform (https://www.ics-c.epos-eu.org/) and the MSL community data catalogue (https://epos-msl.uu.nl), which guides users in finding data publications by providing hierarchical filtering options with increasing granularity. The methodology we describe could be applied in broader contexts within the solid Earth sciences.
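
To make the idea of hierarchical filtering with increasing granularity concrete, the following sketch selects records by a progressively longer keyword path. It is an assumption-laden illustration and does not reflect how the EPOS Platform or the MSL catalogue is implemented: the record layout (`id`, `keyword_paths`), the example terms and the function `filter_by_path` are hypothetical.

```python
def filter_by_path(records, selected):
    """Keep records with at least one keyword path that starts with `selected`."""
    selected = tuple(selected)
    depth = len(selected)
    return [
        record
        for record in records
        if any(tuple(path[:depth]) == selected for path in record["keyword_paths"])
    ]


# Hypothetical enriched records; the terms are illustrative only.
catalogue = [
    {"id": "A", "keyword_paths": [("material", "sedimentary rock", "sandstone")]},
    {"id": "B", "keyword_paths": [("material", "sedimentary rock", "limestone")]},
    {"id": "C", "keyword_paths": [("material", "igneous rock", "basalt")]},
]

# Each additional path element narrows the selection.
broad = filter_by_path(catalogue, ("material",))
narrower = filter_by_path(catalogue, ("material", "sedimentary rock"))
narrowest = filter_by_path(catalogue, ("material", "sedimentary rock", "sandstone"))

for selection in (broad, narrower, narrowest):
    print([record["id"] for record in selection])
```

Matching on path prefixes means that a filter on a broad term automatically includes every record tagged with any of its narrower terms.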

How to cite: Samshuijzen, L., Lange, O., Pijnenburg, R., Wessels, R., and Nothbaum, M.: Improving the findability of legacy laboratory data: enrich metadata using controlled vocabularies, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9063, https://doi.org/10.5194/egusphere-egu25-9063, 2025.