EGU23-7417
https://doi.org/10.5194/egusphere-egu23-7417
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data-integrated executable publications for reproducible geohazards research

Anil Yildiz and Julia Kowalski
Anil Yildiz and Julia Kowalski
  • Methods for Model-based Development in Computational Engineering, RWTH Aachen University, Aachen, Germany (yildiz@mbd.rwth-aachen.de)

Investigating the mechanics of physical processes involved in various geohazards, e.g. gravitational, flow-like mass movements, shallow landslides or flash floods, predicting their temporal or spatial occurrence, and analysing the associated risks clearly benefit from advanced computational process-based or data-driven models. Reproducibility is needed not only for the integrity of the scientific results, but also as a trustbuilding element in practical geohazards engineering. Various complex numerical models or pre-trained machine learning algorithms exist in the literature, for example, to determine landslide susceptibility in a region or to predict the run-out of torrential flows in a catchment. These use FAIR datasets with increasing frequency, for example DEM data to set up the simulation, or open access landslide databases for training and validation purposes. However, we maintain that workflow reproducibility is not ensured simply due to the FAIRness of input or output datasets. Underlying computational or machine learning model needs to be (re)structured to enable the reproducibility and replicability of every step in the workflow so that a model can be (re)built to either reproduce the same results, or can be (re)used to elaborate on new cases or new applications. We propose a data-integrated, platform-independent scientific model publication approach combining self-developed Python packages, Jupyter notebooks, version controlling, FAIR data repositories and high-quality metadata. Model development in the form of a Python package guarantees that model can be run by any end-user, and defining submodules of analysis or visualisation within the package helps the users to build their own models upon the model presented. Publishing the manuscript as a data- and model-integrated Jupyter notebook creates a transparent application of the model, and the user can reproduce any result either presented in the manuscript or in the datasets. We demonstrate our workflow with two applications from geohazards research herein while highlighting the shortcomings of the existing frameworks and suggesting improvements for future applications.

How to cite: Yildiz, A. and Kowalski, J.: Data-integrated executable publications for reproducible geohazards research, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-7417, https://doi.org/10.5194/egusphere-egu23-7417, 2023.