EGU24-12410, updated on 09 Mar 2024
https://doi.org/10.5194/egusphere-egu24-12410
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Towards Enhancing WaaS and Data Provenance over Reana

Iraklis Klampanos, Antonis Ganios, and Antonis Troumpoukis
Iraklis Klampanos et al.
  • Institute of Informatics and Telecommunications, NCSR "Demokritos", Agia Paraskevi, Greece (iaklampanos@iit.demokritos.gr)

Interoperability and reproducibility are critical aspects of scientific computation. The data analysis platform Reana [1], developed by CERN, enhances the interoperability and reproducibility of scientific analyses by allowing researchers to describe, execute, and share their analyses. This is achieved via the execution of standardised scientific workflows, such as CWL, within reusable containers. Moreover, it allows execution to span different types of resources, such as Cloud and HPC. 

In this session we will present ongoing work to enhance Reana’s Workflows-as-a-Service (WaaS) functionality and also support Workflow registration and discoverability. Building upon the design goals and principles of the DARE platform [2], this work aims to enhance Reana by enabling users to register and discover available workflows within the system. In addition, we will present the integration of Data Provenance based on the W3C PROV-O standard [3] allowing the tracking and recording of data lineage in a systematic and dependable way across resource types. 

In summary, key aspects of this ongoing work include:

  • Workflows-as-a-Service (WaaS): Extending Reana's service-oriented mode of operation, allowing users to register, discover, access, execute, and manage workflows by name or ID, via APIs, therefore enhancing the platform's accessibility and usability.
  • Data Provenance based on W3C PROV-O: Implementing support for recording and visualising data lineage information in compliance with the W3C PROV-O standard. This ensures transparency and traceability of data processing steps, aiding in reproducibility and understanding of scientific analyses.

This work aims to broaden Reana's functionality, aligning with best practices for reproducible and transparent scientific research. We aim to make use of the enhanced Reana-based system on the European AI-on-demand platform [4], currently under development, to address the requirements of AI innovators and researchers when studying and executing large-scale AI-infused workflows.

References: 

[1] Simko et al., (2019). Reana: A system for reusable research data analyses. EPJ Web Conf., 214:06034, https://doi.org/10.1051/epjconf/201921406034

[2] Klampanos et al., (2020). DARE Platform: a Developer-Friendly and Self-Optimising Workflows-as-a-Service Framework for e-Science on the Cloud. Journal of Open Source Software, 5(54), 2664, https://doi.org/10.21105/joss.02664

[3] PROV-O: The PROV Ontology: https://www.w3.org/TR/prov-o/ (viewed 9 Jan 2024)

[4] The European AI-on-Demand platform: https://aiod.eu (viewed 9 Jan 2024)

This work has been has received funding from the European Union’s Horizon Europe research and innovation programme under Grant Agreement No 101070000.

How to cite: Klampanos, I., Ganios, A., and Troumpoukis, A.: Towards Enhancing WaaS and Data Provenance over Reana, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-12410, https://doi.org/10.5194/egusphere-egu24-12410, 2024.