Provenance powered microservices: a flexible and generic approach fostering reproducible research in Earth Science

Alessandro Spinuso1, Ian van der Neut1, Mats Veldhuizen1, Christian Pagé2, and Daniele Bailo3
  • 1Royal Netherlands Meteorological Institute (KNMI), R&D Observations and Data Technology, De Bilt, The Netherlands
  • 2CECI, Université de Toulouse, CNRS, Cerfacs, Toulouse, France
  • 3Istituto Nazionale di Geofisica e Vulcanologia (INGV), Rome, Italy

Scientific progress requires research outputs to be reproducible, or at least persistently traceable and analysable for defects over time. This can be facilitated by coupling analysis tools that scientists already know with reproducibility controls built around common containerisation technologies and standard formats for representing metadata and provenance. Moreover, modern interactive tools for data analysis and visualisation, such as computational notebooks and visual analytics systems, are built to expose their functionalities through the Web. This facilitates the development of integrated solutions that support computational research with reproducibility in mind and that, once deployed onto a Cloud infrastructure, benefit from securely managed and reliable operations. Such systems should easily accommodate specific requirements, for instance the deployment of particular scientific software and the collection of tailored, yet comprehensive, provenance records about data and processes. By decoupling and generalising the description of the environment in which a piece of research took place from its underlying implementation, which may become obsolete over time, we improve our chances of recovering the information needed for the retrospective analysis of a scientific product in the long term, enhancing the preservation and reproducibility of results.

In this contribution we illustrate how this can be achieved by adopting microservice architectures combined with a provenance model that supports metadata standards and templating. We aim to empower scientific data portals with Virtual Research Environments (VREs) and provenance services, which are controlled programmatically through high-level functions over the internet. Our system, SWIRRL, handles on behalf of its clients the complexity of allocating the interactive services of the VREs on a Cloud platform. It runs staging and preprocessing workflows that gather and organise remote datasets, making them collaboratively accessible. We show how the Provenance Services manage provenance records about the underlying environment, datasets and analysis workflows, and how researchers exploit these records to support different reproducibility use cases. Our solutions are currently being implemented in several Earth Science contexts. We will provide an overview of the progress of these efforts for the EPOS and IS-ENES research infrastructures, which address solid Earth and climate studies, respectively.
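The idea of a provenance model that supports templating can be illustrated with a minimal sketch, assuming a PROV-style vocabulary of entities, activities and agents. A template fixes the shape of the provenance statements for a recurring step, such as a staging workflow, and each run only supplies the concrete identifiers. The template layout, the `expand` function and the `ex:` identifiers are hypothetical illustrations, not the actual interface of SWIRRL or of the Provenance Services.

```python
from string import Template

# Hypothetical provenance template: PROV-style (relation, subject, object) statements
# with ${...} placeholders. This is a simplification for illustration only, not the
# template format used by SWIRRL or its Provenance Services.
PROV_TEMPLATE = [
    ("wasGeneratedBy", "${output}", "${activity}"),
    ("used", "${activity}", "${input}"),
    ("wasAssociatedWith", "${activity}", "${agent}"),
]


def expand(template, bindings):
    """Fill every placeholder in the template with a concrete identifier.

    Template.substitute raises KeyError for any unbound placeholder, so an
    incomplete set of bindings never produces a partially filled record.
    """
    return [
        tuple(Template(term).substitute(bindings) for term in statement)
        for statement in template
    ]


if __name__ == "__main__":
    # Hypothetical identifiers for a single staging run.
    bindings = {
        "input": "ex:raw_waveforms",
        "output": "ex:staged_dataset",
        "activity": "ex:staging_run_1",
        "agent": "ex:researcher",
    }
    for statement in expand(PROV_TEMPLATE, bindings):
        print(statement)
```

Separating the fixed template from the per-run bindings reflects the decoupling argued for above: the template describes what should be recorded, independently of the software that produced a given run.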

Finally, although reproducibility challenges can largely be tackled with modern technology, the resulting solutions will be further consolidated and made interoperable through the implementation and uptake of FAIR Digital Objects (FDOs). To achieve this goal, it is fundamental to establish a dialogue between engineers, data stewards and researchers early in the process of delivering a scientific product. This fosters the definition and implementation of suitable best practices for a particular research group to adopt. Scientific tools and repositories built around modern FAIR-enabling resources can be incrementally refined through this mediated exchange. We will briefly introduce success stories towards this goal in the context of the IPCC Assessment Reports.

How to cite: Spinuso, A., van der Neut, I., Veldhuizen, M., Pagé, C., and Bailo, D.: Provenance powered microservices: a flexible and generic approach fostering reproducible research in Earth Science, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-2744, 2023.