EGU24-19155, updated on 11 Mar 2024
https://doi.org/10.5194/egusphere-egu24-19155
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

FAIR Environmental Data through a STAC-Driven Inter-Institutional Data Catalog Infrastructure – Status quo of the Cat4KIT-project

Mostafa Hadizadeh1, Christof Lorenz1, Sabine Barthlott2, Romy Fösig3, Katharina Loewe4, Corinna Rebmann4, Benjamin Ertl2, Robert Ulrich5, and Felix Bach6
  • 1Karlsruhe Institute of Technology (KIT), Institute of Meteorology and Climate Research - Atmospheric Environmental Research (IMK-IFU), Garmisch-Partenkirchen, Germany
  • 2Karlsruhe Institute of Technology (KIT), Institute of Meteorology and Climate Research - Atmospheric Trace Gases and Remote Sensing (IMK-ASF), Karlsruhe, Germany
  • 3Karlsruhe Institute of Technology (KIT), Institute of Meteorology and Climate Research - Atmospheric Aerosol Research (IMK-AAF), Karlsruhe, Germany
  • 4Karlsruhe Institute of Technology, Institute of Meteorology and Climate Research - Department Troposphere Research (IMK-TRO), Germany
  • 5Karlsruhe Institute of Technology (KIT), KIT Library (BIB), Karlsruhe, Germany
  • 6Leibniz Institute for Information Infrastructure (FIZ), Karlsruhe, Germany

In the rapidly advancing domain of environmental research, a comprehensive, state-of-the-art Research Data Management (RDM) framework is increasingly important. Such a framework is key to ensuring that data are FAIR (findable, accessible, interoperable, and reusable), laying the groundwork for transparent and reproducible Earth system sciences.

Today, datasets associated with research articles are commonly published via prominent data repositories like PANGAEA or Zenodo. In contrast, data used in day-to-day research and inter-institutional projects tends to be shared through basic cloud storage solutions or, even worse, via email. This practice often conflicts with the FAIR principles, as much of this data ends up in private, restricted systems and local storage, limiting its broader accessibility and use.

In response to this challenge, our research project Cat4KIT aims to establish a cross-institutional catalog and Research Data Management framework. The framework is thus an important building block towards the FAIRification of environmental data. It streamlines the process of making large-scale environmental datasets available and accessible, and it enhances their value for interdisciplinary research and informed decision-making in environmental policy.

The Cat4KIT system comprises four essential elements: data service provision, (meta)data harvesting, catalog service, and user-friendly data presentation. The data service provision module facilitates access to data within typical storage systems through well-defined, standardized community interfaces, using tools like the THREDDS Data Server, Intake catalogs, and the OGC SensorThings API. In this way, we ensure seamless data retrieval and management for typical use cases in environmental sciences.
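To illustrate what access through a standardized community interface can look like, the following minimal sketch builds a query URL for observations served via the OGC SensorThings API. It is not part of Cat4KIT: the base URL, the datastream ID, and the function name are assumptions chosen for illustration; only the OData-style query options (`$filter`, `$orderby`, `$top`) come from the SensorThings standard.

```python
from urllib.parse import urlencode, quote


def observations_url(base_url, datastream_id, start_iso, limit=100):
    """Build a SensorThings URL for observations of one datastream.

    base_url and datastream_id are placeholders (assumptions); the query
    options are the standard OData-style ones defined by SensorThings.
    """
    params = {
        "$filter": f"phenomenonTime ge {start_iso}",  # only newer observations
        "$orderby": "phenomenonTime asc",              # oldest first
        "$top": str(limit),                            # page size
    }
    # Keep "$" literal in the option names and encode spaces as %20.
    query = urlencode(params, safe="$", quote_via=quote)
    return f"{base_url.rstrip('/')}/Datastreams({datastream_id})/Observations?{query}"


# Hypothetical endpoint, used only to show the shape of the request.
url = observations_url("https://example.org/sta/v1.1", 42, "2024-01-01T00:00:00Z", limit=10)
print(url)
```

Because the request is a plain URL, the same data can be retrieved from any HTTP client without service-specific code.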

(Meta)data harvesting via our DS2STAC package entails collecting metadata from the various data services, converting it into STAC metadata, and integrating it into our STAC-API-based catalog service.
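The conversion step can be pictured as mapping one harvested record onto a STAC Item. The following is a minimal sketch and not the DS2STAC implementation: the layout of the input record, the function names, and the example.org URL are assumptions made for illustration; only the top-level Item fields follow the STAC specification.

```python
import json


def bbox_to_polygon(bbox):
    """Turn [west, south, east, north] into a closed GeoJSON Polygon."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def to_stac_item(record):
    """Map a harvested metadata record (hypothetical layout) to a STAC Item dict."""
    bbox = list(record["bbox"])
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": record["id"],
        "geometry": bbox_to_polygon(bbox),
        "bbox": bbox,
        "properties": {
            # A time range is given, so "datetime" is null and the
            # start/end pair carries the temporal extent.
            "datetime": None,
            "start_datetime": record["start"],
            "end_datetime": record["end"],
            "title": record["title"],
        },
        "links": [],
        "assets": {
            "data": {
                "href": record["href"],
                "type": record["media_type"],
                "roles": ["data"],
            }
        },
    }


# Hypothetical record, e.g. as it might be harvested from a data server.
record = {
    "id": "demo-precipitation-2023",
    "title": "Demo precipitation dataset",
    "bbox": [10.0, 47.0, 12.0, 48.5],
    "start": "2023-01-01T00:00:00Z",
    "end": "2023-12-31T23:59:59Z",
    "href": "https://example.org/thredds/dodsC/demo.nc",
    "media_type": "application/x-netcdf",
}
item = to_stac_item(record)
print(json.dumps(item, indent=2))
```

Keeping the mapping as a pure function from record to dictionary makes it straightforward to reuse the same conversion for records coming from different data services.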

This catalog service module combines diverse datasets into a cohesive, searchable spatial catalog, enhancing data discoverability and utility via our Cat4KIT UI.
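As a sketch of how such a catalog can be searched programmatically, the snippet below prepares an Item Search request as defined by the STAC API specification (a POST to `/search` with `bbox`, `datetime`, `collections`, and `limit`). The endpoint URL, the collection name, and the helper function are assumptions for illustration and do not describe the actual Cat4KIT deployment; the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def build_search_request(api_url, bbox, start, end, collections, limit=10):
    """Prepare a STAC API Item Search request (POST <api_url>/search)."""
    body = {
        "bbox": list(bbox),              # [west, south, east, north]
        "datetime": f"{start}/{end}",    # closed interval of RFC 3339 timestamps
        "collections": list(collections),
        "limit": limit,
    }
    return Request(
        url=f"{api_url.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and collection name (assumptions, not the real service).
request = build_search_request(
    "https://example.org/stac",
    bbox=[10.0, 47.0, 12.0, 48.5],
    start="2023-01-01T00:00:00Z",
    end="2023-12-31T23:59:59Z",
    collections=["demo-collection"],
    limit=5,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` against a real STAC API would return a GeoJSON FeatureCollection of matching Items.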

Finally, our framework's data portal is tailored to elevate data accessibility and comprehensibility for a wide audience, including researchers, enabling them to efficiently search, filter, and navigate through data from decentralized research data infrastructures.

One notable characteristic of Cat4KIT is its reliance on open-source solutions and strict adherence to community standards. This ensures not only that the framework works well with current data systems but also that it can be easily adapted and extended to meet future needs. Our presentation demonstrates the technical structure of Cat4KIT, examining how each module is developed and integrated to adhere to the FAIR principles. Additionally, it showcases examples that illustrate the practical use of the framework in real-life situations, emphasizing its efficacy in enhancing data management practices within KIT and its potential relevance for other research organizations.

How to cite: Hadizadeh, M., Lorenz, C., Barthlott, S., Fösig, R., Loewe, K., Rebmann, C., Ertl, B., Ulrich, R., and Bach, F.: FAIR Environmental Data through a STAC-Driven Inter-Institutional Data Catalog Infrastructure – Status quo of the Cat4KIT-project, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19155, https://doi.org/10.5194/egusphere-egu24-19155, 2024.