EGU26-9441, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-9441
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Monday, 04 May, 16:15–18:00 (CEST), Display time Monday, 04 May, 14:00–18:00
 
Hall X4, X4.135
The Data-to-Knowledge Package - A Framework for publishing reproducible and reusable analysis workflows in Earth System Science
Markus Konkol¹, Simon Jirka¹, Sami Domisch², Merret Buurman², Vanessa Bremerich², and Astra Labuce³
  • ¹ 52°North Spatial Information Research GmbH, Münster, Germany (m.konkol@52north.org)
  • ² Leibniz-Institute of Freshwater Ecology and Inland Fisheries
  • ³ Latvian Institute of Aquatic Ecology, Agency of Daugavpils University

More and more funders, reviewers, and publishers ask researchers to follow Open Science principles and make their research results publicly accessible. For a computational analysis workflow, this means providing access to the data and code that produced the figures, tables, and numbers reported in a paper. However, doing so, even when following the FAIR Principles, does not guarantee that others can easily reuse the materials and continue the research. It still takes considerable effort to understand an analysis script (e.g., written in R or Python). It takes further effort to extract the parts of a workflow (i.e., the code snippets) that generate, for instance, a particular figure.

In this contribution, we demonstrate the concept and realization of the Data-to-Knowledge Package (D2K-Package). A D2K-Package is a collection of digital assets that facilitate the reuse of computational research results [1]. Its core is the reproducible basis, composed of the data and code underlying, for instance, a statistical analysis. Instead of simply providing access to the analysis script as a whole, the idea is to structure the code into self-contained, containerized functions. This makes the individual workflow steps more reusable. Each function follows an input-processing-output logic and performs a specific task such as data processing, analysis, or visualization. Once such a reproducible basis exists, the following further components of the D2K-Package can be derived from it:

A virtual lab is a web application, for example a JupyterLab environment provided with the help of MyBinder. Users access it via the browser and obtain a computational environment with the runtime and all dependencies pre-installed. Creating such a virtual lab is possible because all code is containerized. The container image is built from a specification of the libraries and runtime used, including their versions. A virtual lab helps users with programming expertise engage with the code in a ready-to-use programming environment.
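MyBinder builds its images with repo2docker. repo2docker reads standard configuration files from the repository root or a `binder/` folder, such as `requirements.txt` (pinned Python packages) and `runtime.txt` (Python version). The following minimal sketch assumes a Python-based analysis and renders such files from a single version specification. The package names and version pins in `SPEC`, and the helper names `render_binder_files` and `write_binder_files`, are hypothetical placeholders, not the configuration of an actual D2K-Package.

```python
from pathlib import Path

# Hypothetical, illustrative pins only; not an actual D2K-Package environment.
SPEC = {
    "python": "3.11",
    "packages": {"numpy": "1.26.4", "pandas": "2.2.2", "matplotlib": "3.8.4"},
}


def render_binder_files(spec: dict) -> dict[str, str]:
    """Turn a version specification into repo2docker configuration file contents."""
    # One pinned "name==version" line per package, sorted for stable output.
    requirements = "".join(
        f"{name}=={version}\n" for name, version in sorted(spec["packages"].items())
    )
    # repo2docker expects the Python version as "python-X.Y" in runtime.txt.
    runtime = f"python-{spec['python']}\n"
    return {"requirements.txt": requirements, "runtime.txt": runtime}


def write_binder_files(spec: dict, target_dir: str = "binder") -> None:
    """Write the rendered files into a folder (by default `binder/`)."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, content in render_binder_files(spec).items():
        (out / filename).write_text(content)
```

Keeping the pins in one place means the same specification could also feed other container builds. That is a design choice of this sketch, not something the abstract prescribes.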

A web API service exposes the encapsulated, self-contained functions so that every function has a dedicated URL endpoint. Users can send requests from their own analysis scripts to that endpoint and obtain the results via HTTP. Hence, they can reuse the functions without copying code snippets or struggling with dependencies. Such a service can be realized using the OGC API - Processes standard and pygeoapi.
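In OGC API - Processes, a process is executed by POSTing a JSON document with an `inputs` object to `/processes/{processId}/execution`, an endpoint that pygeoapi implements. A minimal client using only the Python standard library might look as follows. The base URL, the process identifier `summarize`, and its input name `values` are hypothetical. Synchronous execution is assumed, and the shape of the returned JSON depends on how the process defines its outputs.

```python
import json
import urllib.request

# Hypothetical deployment URL; a real D2K-Package would publish its own.
BASE_URL = "https://example.org/pygeoapi"


def build_execute_request(
    base_url: str, process_id: str, inputs: dict
) -> urllib.request.Request:
    """Build a POST request for the OGC API - Processes execution endpoint."""
    url = f"{base_url.rstrip('/')}/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def execute_process(
    base_url: str, process_id: str, inputs: dict, timeout: float = 60.0
) -> dict:
    """Send the request and decode the JSON response (synchronous execution assumed)."""
    request = build_execute_request(base_url, process_id, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (hypothetical process "summarize" with an input called "values"):
# result = execute_process(BASE_URL, "summarize", {"values": [12.5, 14.5]})
```

The request construction is kept separate from sending it, so the request can be inspected or logged without network access.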

The computational workflow connects the functions into an executable analysis pipeline and acts as an entry point to a complex analysis. Such a workflow can help users better understand the functions and their relevant input parameters. With workflow tools such as the Galaxy platform, even users without programming experience can change the parameter configuration and see how the new settings affect the final output.
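The interplay of self-contained functions and a workflow that chains them can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the D2K-Package implementation and not a Galaxy tool definition (Galaxy describes tools in its own XML format). The step functions `parse_csv`, `filter_values`, and `summarize`, the `PIPELINE` list, and the per-step parameter dictionaries are all hypothetical. Each function follows the input-processing-output logic. The `config` dictionary stands in for the parameter configuration a Galaxy user would edit in the web interface.

```python
import csv
import io
from statistics import mean
from typing import Any, Callable

# Each step is a hypothetical self-contained function: it receives its input
# data and its own parameters and returns its output without touching global
# state (input-processing-output logic).
Step = Callable[[Any, dict], Any]


def parse_csv(raw_csv: str, params: dict) -> list[float]:
    """Data processing: read one numeric column, skipping empty cells."""
    column = params.get("column", "value")
    values = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    return values


def filter_values(data: list[float], params: dict) -> list[float]:
    """Processing: drop values below a configurable threshold."""
    min_value = params.get("min_value", float("-inf"))
    return [v for v in data if v >= min_value]


def summarize(data: list[float], params: dict) -> dict:
    """Analysis: count and mean, rounded to a configurable precision."""
    digits = params.get("digits", 2)
    return {"n": len(data), "mean": round(mean(data), digits)}


# The workflow: an ordered list of named steps (hypothetical).
PIPELINE: list[tuple[str, Step]] = [
    ("parse", parse_csv),
    ("filter", filter_values),
    ("summarize", summarize),
]


def run_pipeline(raw_csv: str, config: dict[str, dict]) -> dict:
    """Run the steps in order, feeding each step's output into the next one."""
    data: Any = raw_csv
    for name, step in PIPELINE:
        data = step(data, config.get(name, {}))
    return data


if __name__ == "__main__":
    raw = "site,temp\nA,12.5\nB,3.0\nC,\nD,14.5\nE,20.0\n"
    base = {"parse": {"column": "temp"}, "filter": {"min_value": 10.0}}
    print(run_pipeline(raw, base))  # {'n': 3, 'mean': 15.67}
    # Changing one parameter and re-running shows its effect on the output.
    changed = {**base, "filter": {"min_value": 0.0}}
    print(run_pipeline(raw, changed))  # {'n': 4, 'mean': 12.5}
```

Because every step only depends on its arguments, the same functions could in principle be exposed individually as web API endpoints or wrapped as separate workflow tools.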

In addition to the concepts outlined above, this contribution will also report on real demonstrators that showcase the idea of a D2K-Package.

This project has received funding from the European Commission’s Horizon Europe Research and Innovation programme. Grant agreement No 101094434.

[1] Konkol et al. (2025), Open Research Europe, https://doi.org/10.12688/openreseurope.20221.3

How to cite: Konkol, M., Jirka, S., Domisch, S., Buurman, M., Bremerich, V., and Labuce, A.: The Data-to-Knowledge Package - A Framework for publishing reproducible and reusable analysis workflows in Earth System Science, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9441, https://doi.org/10.5194/egusphere-egu26-9441, 2026.