EGU25-10981, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-10981
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
yProv: a Software Ecosystem for Multi-level Provenance Management and Exploration in Climate Workflows
Fabrizio Antonio1, Gabriele Padovani2, Ludovica Sacco2, Carolina Sopranzetti2, Marco Robol2, Konstantinos Zefkilis2, Nicola Marchioro2, and Sandro Fiore2
  • 1CMCC Foundation - Euro-Mediterranean Center on Climate Change, Lecce, Italy
  • 2University of Trento, Trento, Italy

Scientific workflows and provenance are two sides of the same coin. While the former addresses the coordinated execution of multiple tasks over a set of computational resources, the latter is the historical record of data, traced back to its original sources. As experiments rapidly evolve towards complex end-to-end workflows, handling provenance at different levels of granularity and throughout the entire analytics workflow lifecycle is key both to managing lineage information for large-scale experiments in a flexible way and to enabling reproducibility, and thus plays a relevant role in Open Science.

The contribution highlights the importance of tracking multi-level provenance metadata in complex, AI-based scientific workflows as a way to foster documentation of data and experiments in a standardized format, strengthen the interpretability, trustworthiness and authenticity of results, facilitate performance diagnosis and troubleshooting, and advance provenance exploration. More specifically, the contribution introduces yProv, a joint research effort between CMCC and the University of Trento targeting multi-level provenance management in such workflows. The yProv project provides a rich software ecosystem consisting of a web service (yProv service) to store and manage provenance documents compliant with the W3C PROV family of standards, two libraries to track provenance in scientific workflows at different levels of granularity with a focus on AI model training (yProv4WFs and yProv4ML), and a data science tool for provenance inspection, navigation, visualization, and analysis (yProv Explorer). Work on trustworthy provenance with yProv is also ongoing, with the aim of fully addressing end-to-end provenance management requirements.
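
To make the notion of a W3C PROV-compliant provenance document more concrete, the following is a minimal sketch, not the yProv API. It assumes the PROV-JSON serialization (a W3C Member Submission) and uses only the Python standard library. All identifiers (the `ex:` prefix, `ex:training_run`, `ex:training_data`, `ex:trained_model`, `ex:researcher`) and both helper functions are hypothetical and chosen for illustration only. It records a single model-training activity and then walks one lineage step back from the trained model to its input data.

```python
import json


def build_prov_document():
    """Return a PROV-JSON-style dict describing one hypothetical training run."""
    return {
        # Namespace prefix for the illustrative identifiers below (assumption).
        "prefix": {"ex": "https://example.org/"},
        # Entities: the data consumed and the artifact produced.
        "entity": {
            "ex:training_data": {"prov:label": "input dataset"},
            "ex:trained_model": {"prov:label": "model checkpoint"},
        },
        # Activity: the training run itself, with start and end times.
        "activity": {
            "ex:training_run": {
                "prov:startTime": "2025-01-01T00:00:00",
                "prov:endTime": "2025-01-01T01:00:00",
            }
        },
        # Agent: who was responsible for the activity.
        "agent": {"ex:researcher": {"prov:label": "researcher"}},
        # Relations linking entities, activities and agents.
        "used": {
            "_:u1": {
                "prov:activity": "ex:training_run",
                "prov:entity": "ex:training_data",
            }
        },
        "wasGeneratedBy": {
            "_:g1": {
                "prov:entity": "ex:trained_model",
                "prov:activity": "ex:training_run",
            }
        },
        "wasAssociatedWith": {
            "_:a1": {
                "prov:activity": "ex:training_run",
                "prov:agent": "ex:researcher",
            }
        },
    }


def upstream_entities(doc, entity_id):
    """Return the entities used by the activities that generated entity_id."""
    generating = {
        rel["prov:activity"]
        for rel in doc.get("wasGeneratedBy", {}).values()
        if rel["prov:entity"] == entity_id
    }
    return sorted(
        rel["prov:entity"]
        for rel in doc.get("used", {}).values()
        if rel["prov:activity"] in generating
    )


doc = build_prov_document()
print(json.dumps(doc, indent=2))
print(upstream_entities(doc, "ex:trained_model"))
```

A coarser, workflow-level document could describe the same run as a single task among many, while a finer-grained one could add per-epoch entities; both would reuse the same entity, activity and agent vocabulary.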

The contribution will present the yProv software ecosystem and use cases from the interTwin (https://www.intertwin.eu/) and ClimateEurope2 (https://climateurope2.eu/) European projects, as well as from the ICSC National Center on HPC, Big Data and Quantum Computing, targeting Digital Twins for extreme weather and climate events and data-driven/data-intensive workflows for climate change.

How to cite: Antonio, F., Padovani, G., Sacco, L., Sopranzetti, C., Robol, M., Zefkilis, K., Marchioro, N., and Fiore, S.: yProv: a Software Ecosystem for Multi-level Provenance Management and Exploration in Climate Workflows, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-10981, https://doi.org/10.5194/egusphere-egu25-10981, 2025.