EGU23-6726
https://doi.org/10.5194/egusphere-egu23-6726
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data compilations for enriched reuse of sea ice data sets

Anna Simson, Anil Yildiz, and Julia Kowalski
  • Methods for Model-based Development in Computational Engineering, RWTH Aachen University, Aachen, Germany (simson@mbd.rwth-aachen.de)

A vast amount of in situ cryospheric data has been collected during publicly funded field campaigns to the polar regions over the past decades. Each individual data set yields important insights into local thermo-physical processes, but the data sets need to be assembled into informative data compilations to unlock their full potential to produce regional or global outcomes for climate-change-related research. The efficient and sustainable interdisciplinary reuse of such data compilations is of great interest to the scientific community. Yet creating such compilations is often challenging, as they have to be composed of heterogeneous data sets from various data repositories. In this contribution, we focus on the reuse of data sets and on generating extendible data compilations with enhanced reusability.

Data reuse is typically conducted by researchers other than the original data producers, and it is therefore often limited by the metadata and provenance information available. Reuse scenarios include the validation of physics-based process models, the training of data-driven models, and data-integrated predictive simulations. All these use cases rely heavily on a diverse data foundation in the form of a data compilation, which in turn depends on high-quality information. In addition to metadata, provenance, and licensing conditions, the data set itself must be checked for reusability. Individual data sets containing the same metrics often differ in structure, content, and metadata, which complicates data compilation.

In order to generate data compilations for a specific reuse scenario, we propose to break down the workflow into four steps:
1) Search and selection: Searching, assessing, optimizing search, and selecting data sets.
2) Validation: Understanding and representing data sets in terms of the data collectors including structure, terms used, metadata, and relations between different metrics or data sets.
3) Specification: Defining the format, structure, and content of the data compilation based on the scope of the data sets.
4) Implementation: Integrating the selected data sets into the compilation.
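
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the workflow implemented by the authors: the `DataSet` container, the function names, the `REQUIRED_METADATA` keys, and all example values are hypothetical, and validation is reduced here to a check for missing metadata entries.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical in-memory stand-in for one repository data set."""
    name: str
    metadata: dict
    records: list  # one dict per measurement


# Assumed minimum metadata; a real compilation would define its own list.
REQUIRED_METADATA = ("location", "campaign", "method")


def search_and_select(candidates, metric):
    """Step 1: keep data sets that report the requested metric."""
    return [d for d in candidates if any(metric in r for r in d.records)]


def validate(dataset):
    """Step 2 (simplified): list required metadata keys that are missing or empty."""
    return [k for k in REQUIRED_METADATA if not dataset.metadata.get(k)]


def specify(datasets):
    """Step 3: derive the compilation columns from the union of record keys."""
    columns = set()
    for d in datasets:
        for r in d.records:
            columns.update(r)
    columns.discard("source")  # reserved for the originating data set
    return ["source"] + sorted(columns)


def implement(datasets, columns):
    """Step 4: merge all records into rows; metrics a data set lacks become None."""
    rows = []
    for d in datasets:
        for r in d.records:
            row = {c: r.get(c) for c in columns}
            row["source"] = d.name
            rows.append(row)
    return rows


# Made-up example values, for illustration only.
core_a = DataSet("core_a",
                 {"location": "site-1", "campaign": "cruise-x", "method": "conductivity"},
                 [{"depth_m": 0.1, "salinity": 5.2}])
core_b = DataSet("core_b",
                 {"location": "site-2", "campaign": ""},
                 [{"depth_m": 0.1, "salinity": 4.8, "temperature_c": -3.0}])
core_c = DataSet("core_c", {"location": "site-3"},
                 [{"depth_m": 0.2, "density": 900.0}])

selected = search_and_select([core_a, core_b, core_c], "salinity")
issues = {d.name: validate(d) for d in selected}
columns = specify(selected)
compilation = implement(selected, columns)
```

In this sketch, `issues` records which metadata entries would have to be clarified with the data producers before a data set is integrated, while `compilation` keeps a `source` column so that every row stays traceable to its original data set.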

We present a workflow to create a data compilation from heterogeneous sea ice core data sets following the structure introduced above. We report on obstacles encountered in the validation of data sets, mainly due to missing or ambiguous metadata. Such gaps leave the (re)user room for subjective interpretation and thus increase the uncertainty of the compilation. Examples are challenges in relating different data repositories associated with the same location or the same campaign, in determining the accuracy of measurement methods, and in identifying the processing stage of the data, all of which often require a bilateral iteration with the data acquisition team. Our study shows that enriching data reusability with data compilations requires quality-ensured metadata on the individual data set level.
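
One of the obstacles named above, relating entries from different repositories to the same campaign and location, can be illustrated with a small sketch. It assumes that each entry carries hypothetical `campaign`, `lat`, and `lon` metadata fields and uses coordinate rounding as a crude stand-in for a proper distance check; the repository and entry names are invented. Entries with incomplete metadata are reported separately, since they are the ones that would need follow-up with the data acquisition team.

```python
from collections import defaultdict


def link_key(meta, decimals=1):
    """Return (campaign, lat, lon) as a coarse matching key, or None if incomplete."""
    try:
        campaign = meta["campaign"].strip().lower()
        lat = round(float(meta["lat"]), decimals)
        lon = round(float(meta["lon"]), decimals)
    except (KeyError, ValueError, TypeError, AttributeError):
        return None
    if not campaign:
        return None
    return (campaign, lat, lon)


def group_related(entries):
    """Group entry names by matching key; collect entries that cannot be keyed."""
    groups = defaultdict(list)
    unresolved = []
    for name, meta in entries.items():
        key = link_key(meta)
        if key is None:
            unresolved.append(name)
        else:
            groups[key].append(name)
    return dict(groups), unresolved


# Invented entries: different spelling and types for the same campaign and site.
entries = {
    "repo1/core_a": {"campaign": "Cruise-X", "lat": 78.04, "lon": 15.02},
    "repo2/core_a_salinity": {"campaign": "cruise-x ", "lat": "78.01", "lon": "15.00"},
    "repo2/core_b": {"campaign": "Cruise-X"},  # coordinates missing
}

groups, unresolved = group_related(entries)
```

Rounding splits sites that lie close to a bin boundary, so a match found this way is a candidate for manual confirmation rather than a verified link.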

How to cite: Simson, A., Yildiz, A., and Kowalski, J.: Data compilations for enriched reuse of sea ice data sets, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-6726, https://doi.org/10.5194/egusphere-egu23-6726, 2023.