A machine-actionable workflow for the publication of climate impact data of the ISIMIP project

Jochen Klar and Matthias Mengel
The Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) is a community-driven climate impact modeling initiative that aims to contribute to a quantitative and cross-sectoral synthesis of the various impacts of climate change, including associated uncertainties. ISIMIP is organized into simulation rounds, for each of which a simulation protocol defines a set of common scenarios. Participating modeling groups run their simulations according to these scenarios and with a common set of climatic and socioeconomic input data. The model output data are collected by the ISIMIP team at the Potsdam Institute for Climate Impact Research (PIK) and made publicly available in the ISIMIP Repository. Currently, the ISIMIP Repository includes data from over 150 impact models spanning 13 different sectors and comprises over 100 TB of data.

The ISIMIP Repository is the world's largest archive of model-based climate impact data, and its output data is used by a very diverse audience inside and outside of academia for all kinds of research and analyses. Special care is taken to enable persistent identification, provenance, and citability. A set of workflows and tools ensures the conformity of the model output data with the protocol and the transparent management of caveats and updates to already published data. Datasets are referenced using unique internal IDs, and hash values are stored for each file in the database.
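As an illustration of how per-file hash values can support the management of updates, the following minimal sketch computes a checksum for a file and compares it with a previously stored value. The function names are hypothetical and the choice of SHA-256 is an assumption; the abstract does not state which algorithm or tooling ISIMIP uses.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to bound memory use.

    The default algorithm is an assumption made for this sketch.
    """
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum stored for it."""
    return file_checksum(path, algorithm) == expected_checksum
```

A mismatch between the stored and the recomputed value would indicate that a file was altered or replaced after it was registered, which is the kind of change a transparent update process needs to detect.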

In recent years, this process has been significantly improved by introducing a machine-readable protocol, which is version controlled on GitHub and can be accessed over the internet. A set of software tools for quality control and data publication accesses this protocol to enforce consistent data quality and to extract metadata. Some of the tools can be used independently by the modeling groups even before submitting the data. After the data is published in the ISIMIP Repository, it can be accessed via the web or through an API (e.g. for access from Jupyter notebooks) using the same controlled vocabularies from the protocol. In order to make the data citable, a DOI for each output sector is registered with DataCite. For each DOI, a precise list of the datasets it contains is maintained. If data for a sector is added or replaced, a new, updated DOI is created.
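The idea of checking data against a machine-readable protocol can be sketched as follows. This is only an illustration under stated assumptions: the protocol fragment, its keys, its vocabulary terms, and the function name are hypothetical and do not reproduce the actual ISIMIP protocol or tools.

```python
# Hypothetical protocol fragment: each specifier maps to its controlled vocabulary.
# The real protocol defines its own structure and terms.
PROTOCOL = {
    "sector": {"water_global", "agriculture"},
    "climate_scenario": {"historical", "ssp126", "ssp585"},
    "time_step": {"daily", "monthly", "annual"},
}


def check_specifiers(specifiers, protocol=PROTOCOL):
    """Return a list of problems found when comparing a dataset's specifiers
    with the controlled vocabularies of the protocol (empty if it conforms)."""
    errors = []
    for key, allowed in protocol.items():
        value = specifiers.get(key)
        if value is None:
            errors.append(f"missing specifier: {key}")
        elif value not in allowed:
            errors.append(f"{key}={value!r} is not in the controlled vocabulary")
    return errors


# Usage: a conforming dataset yields no errors, a non-conforming one is reported.
ok = check_specifiers(
    {"sector": "agriculture", "climate_scenario": "ssp126", "time_step": "monthly"}
)
bad = check_specifiers({"sector": "agriculture", "climate_scenario": "rcp99"})
```

Because the same vocabularies drive both the checks and the extracted metadata, a modeling group could run such a check locally before submission and expect the same result as the central quality control.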

While the specific implementation is closely tailored to the peculiarities of ISIMIP, the general ideas should be transferable to other projects. In our presentation, we will discuss the various tools and how they interact to create an integrated curation and publishing workflow.

