Scientific progress greatly benefits from the participation of a broad and diverse community. Increasing data volumes put pressure on this scientific ecosystem, limiting the participation in the scientific process to a select group of researchers with access to sufficient storage and compute resources. This is not new.
To level the playing field for all researchers, a shared infrastructure had to be developed. We know it today as the Earth System Grid Federation (ESGF). The European contribution to ESGF has been coordinated mainly through the IS-ENES projects. The current infrastructure provides access to the data as well as compute resources. So far, so good.
The next bottleneck for a smooth scientific process is ease of use. A lot of progress has already been made on standardization of climate model output, so that it is easier to analyse and compare different models. Moreover, a broad range of tools is being developed to better facilitate the processing of large data volumes. The constraint then becomes the ability to navigate this new scientific landscape and to effectively wield the new tools we have at our disposal.
There is another factor that hampers scientific progress. The increasing complexity of climate analysis workflows makes it difficult to reproduce, reuse, and build upon previous results. Of course, it does not help that the main scientific mode of exchange is through journal articles, which are not well suited for sharing workflows. This brings us to sharing code.
Code, by its nature, documents a workflow and thereby helps reproducibility. Sharing code is only just starting to take off, as part of a broader development towards a more transparent and reproducible scientific process. Now, interestingly, it is not the scarcity of tools, but rather their abundance that can lead to diverging workflows and poor interoperability.
The Earth System Model eValuation Tool (ESMValTool) was originally developed as a command line tool for routine evaluation of climate models. This tool encourages some degree of standardization by factoring out common operations, while allowing for custom analytics of the pre-processed data. All scripts are bundled with the tool. Over time this has grown into a library of so-called ‘recipes’.
Recently we have started developing a Python API for the ESMValTool. This allows for interactive exploration, modification, and execution of existing recipes, as well as the creation of new workflows. At the same time, partners in IS-ENES3 are making their infrastructure accessible through JupyterLab. Through the combination of these technologies, researchers have direct access to data and resources, and they can easily reuse existing analysis workflows, all from the comfort of the web browser. During the conference, we will give an overview of the current possibilities, and we would like to encourage discussion of the future developments needed for a fruitful scientific process.
How to cite: Kalverla, P. C., Smeets, S., Drost, N., Andela, B., Alidoost, F., Camphuijsen, J., and Vreede, B.: The evolution of shared infrastructure for climate analytics, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-338, https://doi.org/10.5194/ems2021-338, 2021.