In recent years, the geoscience community has been making strides towards making our science more open, inclusive, and accessible, driven both by individual- or community-led initiatives and by broader-scale regulatory changes. Open-source software, accessible codebases and open online collaboration resources (such as GitHub, VHub, etc.) are becoming the norm in many disciplines. The open-access publishing landscape has been changing too: several geoscience journals have defined data availability policies, and many publishers have introduced green and gold open-access options to their journal collections. Pre-print servers and grassroots diamond open-access journals are changing the readiness with which scholarly content can be accessed beyond the traditional paywall model.

However, good scientific practice requires research results to be reproducible, experiments to be repeatable and methods to be reusable. This can be a challenge in the geosciences, where available data sets are becoming more complex and are constantly superseded by new, improved releases. Similarly, new models and computational tools keep emerging in different versions and programming languages, with large variability in the quality of the documentation. Moreover, how data and models are linked together to produce scientific output is very rarely documented in a reproducible way. As a result, very few published results are reproducible for the general reader. These challenges apply especially to hydrology, which is highlighted here as an example within the geosciences.

This session is designed to gain a community overview of the current open-science landscape and how this is expected to evolve in the future. It aims to foster a debate on open science, lower the bar for engaging in open science and showcase examples, including software and other instruments for assisting open research. This may include software and tools, open science dissemination platforms (such as pre-print servers and journals), the teams driving the development of open-science resources and practices, and discussion on the regulatory moves towards standardising open access in the scientific community and what those policies mean in practice. The session has a focus on hydrological sciences, as an example within the geosciences. This session should advance the discussion on open and reproducible science, highlight its advantages and also provide the means to bring this into practice.

Co-organized by HS1.2
Convener: Remko C. Nijzink | Co-conveners: Niels Drost, James Farquharson, Alexandra Kushnir (ECS), Francesca Pianosi, Stan Schymanski, Leonardo Uieda (ECS), Fabian Wadsworth (ECS)
vPICO presentations | Tue, 27 Apr, 15:30–17:00 (CEST)

Chairpersons: Alexandra Kushnir, Remko C. Nijzink, Stan Schymanski
David Rosenberg

Science and engineering rest on the concept of reproducibility. Yet across numerous fields, such as psychology, computer systems, and water resources, research results are often hard to reproduce. In this presentation, I identify reasons for low reproducibility in science. I share tools to make results more reproducible. I introduce financial incentives and awards to encourage you and our colleagues to make our research more reproducible. Finally, I advance a vision for what our future reproducible science should look like and I ask each attendee to identify and commit to taking at least one step to make their research results more reproducible.


How to cite: Rosenberg, D.: Can you make your science more reproducible?, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-16547, https://doi.org/10.5194/egusphere-egu21-16547, 2021.

Caitlyn Hall, Sheila Saia, Andrea Popp, Stan Schymanski, Niels Drost, Nilay Dogulu, Tim van Emmerik, Rolf Hut, and Lieke Melsen

To have lasting impact on the scientific community and broader society, hydrologic research must be open, accessible, reusable, and reproducible. With so many different perspectives on open science approaches and technologies, and with their constant evolution, it can be overwhelming for hydrologists to start down the path towards open research or to expand their existing efforts. Open hydrology practices are becoming more widely embraced by members of the community and key organizations, yet technical (e.g., limited coding experience), resource (e.g., open access fees), and social barriers (e.g., fear of being scooped) still exist. These barriers may seem insurmountable without practical suggestions on how to proceed. Here, we propose the Open Hydrology Principles to guide individual and community progress toward open science. To increase accessibility and make the Open Hydrology Principles more tangible and actionable, we also present the Open Hydrology Practical Guidelines. Our aim is to help hydrologists transition from closed, inaccessible, not reusable, and not reproducible ways of conducting scientific work to open hydrology and empower researchers by providing information and resources to equitably grow the openness of hydrological sciences. We provide the first version of a practical open hydrology resource that may evolve with open science infrastructures, workflows, and research experiences. We discuss some of the benefits of open science and common reservations about it, and how hydrologists can pursue an appropriate level of openness in the presence of barriers. Further, we highlight how the practice of open hydrology can be expanded. The Open Hydrology Principles, Practical Guide, and additional resources reflect our knowledge of the current state of open hydrology and we recognize that recommendations and suggestions will evolve.
Therefore, we encourage hydrologists all over the globe to join the open science conversation by contributing to the living version of this document and sharing open hydrology resources at the community-supported repository at open-hydrology.github.io.

How to cite: Hall, C., Saia, S., Popp, A., Schymanski, S., Drost, N., Dogulu, N., van Emmerik, T., Hut, R., and Melsen, L.: A Hydrologist’s Guide to Open Science, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-605, https://doi.org/10.5194/egusphere-egu21-605, 2021.

Robert Reinecke, Tim Trautmann, Thorsten Wagener, and Katja Schüler

Software development has become an integral part of the earth system sciences as models and data processing get more sophisticated. Paradoxically, it poses a threat to scientific progress, as the pillar of science, reproducibility, is seldom achieved. Software code tends to be either poorly written and documented or not shared at all; proper software licenses are rarely assigned. This is especially worrisome as scientific results have potentially controversial implications for stakeholders and policymakers and may influence public opinion for a long time.

In recent years, progress towards open science has led to more publishers demanding access to data and source code alongside peer-reviewed manuscripts. Still, recent studies find that results in hydrology can rarely be reproduced. 

In this talk, we present first results of a poll conducted in spring 2021 among the hydrological science community. Therein, we strive to investigate the causes for that lack of reproducibility. We take a peek behind the curtain and unveil how the community develops and maintains complex code and what that entails for reproducibility. Our survey includes background knowledge, community opinion, and behaviour practices regarding reproducible software development.  

We postulate that this lack of reproducibility might be rooted in insufficient reward within the scientific community, insecurity regarding proper licensing of software and other parts of the research compendium, as well as scientists’ lack of awareness of how to make software available in a way that allows for proper attribution of their work. We question putative causes such as unclear guidelines of research institutions, or that software has been developed over decades by cohorts of researchers without a proper software engineering process and transparent licensing.

To this end, we also summarize solutions such as the adaptation of modern project management methods from the computer engineering community, which will eventually reduce costs while increasing the reproducibility of scientific research.

How to cite: Reinecke, R., Trautmann, T., Wagener, T., and Schüler, K.: A Community Perspective on Research Software in the Hydrological Sciences, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-801, https://doi.org/10.5194/egusphere-egu21-801, 2021.

Chelle Gentemann, Chris Holdgraf, Ryan Abernathey, Daniel Crichton, James Colliander, Edward Kearns, Yuvi Panda, and Richard Signell

The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open source cloud-based data science platforms, accessed through a web-browser window, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making them more accessible to average scientists. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to the data without specialized cloud computing knowledge. Practically, for scientists, the effect of these changes is to vastly shrink the amount of time spent acquiring and processing data, freeing up more time for science. This shift in paradigm is lowering the threshold for entry, expanding the science community, and increasing opportunities for collaboration, while promoting scientific innovation, transparency, and reproducibility. These changes are increasing the speed of science, broadening the possibilities of what questions science can answer, and expanding participation in science.

How to cite: Gentemann, C., Holdgraf, C., Abernathey, R., Crichton, D., Colliander, J., Kearns, E., Panda, Y., and Signell, R.: Science storms the cloud, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-1005, https://doi.org/10.5194/egusphere-egu21-1005, 2021.

Marcus Strobl, Elnaz Azmi, Sibylle K. Hassler, Mirko Mälicke, Jörg Meyer, and Erwin Zehe

The virtual research environment V-FOR-WaTer aims at simplifying data access for environmental sciences, fostering data publications and facilitating data analyses. By giving scientists from universities, research facilities and state offices easy access to data, appropriate pre-processing and analysis tools and workflows, we want to accelerate scientific work and facilitate the reproducibility of analyses.

The prototype of the virtual research environment consists of a database with a detailed metadata scheme that is adapted to water and terrestrial environmental data. Present datasets in the web portal originate from university projects and state offices. We are also finalising the connection of V-FOR-WaTer to GFZ Data Services, an established repository for geoscientific data. This will ease publication of data from the portal and in turn give access to datasets stored in this repository. Key to being compatible with GFZ Data Services and other systems is the compliance of the metadata scheme with international standards (INSPIRE, ISO19115).

The web portal is designed to facilitate typical workflows in environmental sciences. Map operations and filter options ensure easy selection of the data, while the workspace area provides tools for data pre-processing, scaling, and common hydrological applications. The toolbox also contains more specific tools, e.g. for geostatistics and soon for evapotranspiration. It is easily extendable and will ultimately also include user-developed tools, reflecting the current research topics and methodologies in the hydrology community. Tools are accessed through Web Processing Services (WPS) and can be joined, saved and shared as workflows, enabling more complex analyses and ensuring reproducibility of the results.

How to cite: Strobl, M., Azmi, E., Hassler, S. K., Mälicke, M., Meyer, J., and Zehe, E.: V-FOR-WaTer: A Virtual Research Environment for Environmental Research, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3356, https://doi.org/10.5194/egusphere-egu21-3356, 2021.

Niels Drost, Jerom P.M. Aerts, Fakhereh Alidoost, Bouwe Andela, Jaro Camphuijsen, Nick van de Giesen, Rolf Hut, Eric Hutton, Peter Kalverla, Gijs van den Oord, Inti Pelupessy, Stef Smeets, Stefan Verhoeven, and Ben van Werkhoven

The eWaterCycle platform (https://www.ewatercycle.org/) is a fully Open Source system designed explicitly to advance the state of Open and FAIR Hydrological modelling. While working with Hydrologists to create a fully Open and FAIR comparison study, we noticed that many ad-hoc tools and scripts are used to create input (forcing, parameters) for a hydrological model from source datasets such as climate reanalysis and land-use data. To make this part of the modelling process more reproducible and more transparent, we have created a common forcing input processing pipeline based on an existing climate model analysis tool: ESMValTool (https://www.esmvaltool.org/).

Using ESMValTool, the eWaterCycle platform can perform commonly required preprocessing steps such as cropping, re-gridding, and variable derivation in a standardized manner. If needed, it also allows for custom steps for a hydrological model. Our pre-processing pipeline directly supports commonly used datasets such as ERA-5, ERA-Interim, and CMIP climate model data, and creates ready-to-run forcing data for a number of Hydrological models.
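As a rough illustration of what two of the preprocessing steps named above do, the following sketch crops a small gridded field to a bounding box and regrids it to a coarser resolution by block averaging. It is a toy in plain Python under stated assumptions: the function names `crop` and `coarsen`, the regular latitude/longitude grid, and the sample values are all hypothetical, and none of this is the ESMValTool or eWaterCycle API.

```python
"""Toy cropping and re-gridding on a regular lat/lon grid (illustration only)."""


def crop(grid, lats, lons, lat_range, lon_range):
    """Keep only the cells whose coordinates fall inside the bounding box."""
    rows = [i for i, lat in enumerate(lats) if lat_range[0] <= lat <= lat_range[1]]
    cols = [j for j, lon in enumerate(lons) if lon_range[0] <= lon <= lon_range[1]]
    return [[grid[i][j] for j in cols] for i in rows]


def coarsen(grid, factor):
    """Regrid to a coarser resolution by averaging factor x factor blocks."""
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    coarse = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [
                grid[r * factor + di][c * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 4 x 4 field with cell-centre coordinates in degrees.
lats = [10.0, 20.0, 30.0, 40.0]
lons = [0.0, 10.0, 20.0, 30.0]
field = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]

cropped = crop(field, lats, lons, lat_range=(20.0, 40.0), lon_range=(0.0, 10.0))
coarse = coarsen(field, factor=2)
```

Keeping each step as a small, parameterised function is one way to make the same preprocessing repeatable across datasets, which is the point of a standardized pipeline.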

Besides creating forcing data, the eWaterCycle platform allows scientists to run Hydrological models in a standardized way using Jupyter notebooks, wrapping the models inside a container environment, and interfacing to these using BMI, the Basic Model Interface (https://bmi.readthedocs.io/). The container environment (based on Docker) stores the entire software stack, including the operating system and libraries, in such a way that a model run can be reproduced using an identical software environment on any other computer.
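To make the idea of a standardized model interface more concrete, here is a minimal sketch of a toy model that exposes a few BMI-style control functions (`initialize`, `update`, `get_current_time`, `get_end_time`, `get_value`, `finalize`). The linear-reservoir model, the class name `LinearReservoirBmi`, and the `run` helper are hypothetical. The sketch also simplifies the real specification: BMI's `initialize` takes a configuration file path and `get_value` fills a caller-supplied array, whereas a dictionary and a plain return value are used here to keep the example self-contained.

```python
"""Toy model with a BMI-style interface (simplified, illustration only)."""


class LinearReservoirBmi:
    """Hypothetical model: a fixed fraction of storage drains every time step."""

    def initialize(self, config):
        # Simplification: real BMI passes a path to a configuration file.
        self._storage = float(config["initial_storage"])
        self._k = float(config["recession_coefficient"])
        self._precip = list(config["precipitation"])
        self._time = 0
        self._discharge = 0.0

    def update(self):
        # Advance the model state by exactly one time step.
        self._storage += self._precip[self._time]
        self._discharge = self._k * self._storage
        self._storage -= self._discharge
        self._time += 1

    def get_current_time(self):
        return float(self._time)

    def get_end_time(self):
        return float(len(self._precip))

    def get_value(self, name):
        # Simplification: real BMI copies values into a caller-supplied array.
        values = {"discharge": self._discharge, "storage": self._storage}
        return values[name]

    def finalize(self):
        # Release resources held by the model.
        self._precip = []


def run(model, config, variable="discharge"):
    """Drive any model that offers the functions above, without model-specific code."""
    model.initialize(config)
    series = []
    while model.get_current_time() < model.get_end_time():
        model.update()
        series.append(model.get_value(variable))
    model.finalize()
    return series


discharge = run(
    LinearReservoirBmi(),
    {"initial_storage": 10.0, "recession_coefficient": 0.5, "precipitation": [0.0, 0.0]},
)
```

Because `run` only relies on the shared function names, the same driver loop could in principle be reused for any model wrapped in this way, which is what makes a common interface attractive for notebooks and containers.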

The reproducible processing of forcing and a reproducible software environment are important steps towards our goal of fully reproducible, Open, and FAIR Hydrological modelling. Ultimately, we hope to make it possible to fully reproduce a hydrological model experiment from data pre-processing to analysis, using only a few clicks.

How to cite: Drost, N., Aerts, J. P. M., Alidoost, F., Andela, B., Camphuijsen, J., van de Giesen, N., Hut, R., Hutton, E., Kalverla, P., van den Oord, G., Pelupessy, I., Smeets, S., Verhoeven, S., and van Werkhoven, B.: Towards Open and FAIR Hydrological Modelling with eWaterCycle, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7797, https://doi.org/10.5194/egusphere-egu21-7797, 2021.

YoungDon Choi, Jonathan Goodall, Raza Ahmad, Tanu Malik, and David Tarboton

It is widely acknowledged that the reproducibility of published computational results is critical to advancing science. Creating reproducible computational workflows, however, is burdensome and requires significant work to share the complete package that efficiently encapsulates all required data and software. Computational hydrology is one field that has seen rapid advancements through fast-evolving technologies for supporting increasingly complex computational hydrologic modeling and analysis. This growing model complexity, along with rapidly evolving underlying software technologies, makes the options and approaches for achieving computational reproducibility extremely challenging to settle. We argue that the technologies needed to achieve open and reproducible hydrological modeling can be grouped into three general categories: 1) data (and metadata) sharing, 2) containerizing computational environments, and 3) capturing and executing modeling workflows. While a growing set of science gateways and virtual research environments have been created to support one or more of these technologies to improve reproducibility, the integration and interoperability across all three needs are still lacking, making end-to-end systems still out of reach. The objective of this research is to advance such an end-to-end solution that can support open and reproducible hydrological modeling that effectively integrates data sharing, containerization, and workflow execution environments. Our approach emphasizes 1) well-documented modeling objects shared with meaningful metadata through the HydroShare open repository, 2) version control with efficient containerization using the Sciunit software, and 3) immutable, but flexible, computational environments to use newly developing software packages. 
A key to this work is advancing Sciunit, a tool for easily containerizing, sharing, and tracking deterministic computational applications, to minimally containerize reproducible hydrologic modeling workflow objects into the same container with version control capabilities. We present how to add new model input and modeling dependencies into the Sciunit container for flexibility and how to create Docker images through Sciunit containers for compatibility with popular containerization tools. In this presentation, we will emphasize both the underlying technological developments made possible through this research along with a user-centric case study showing the application of the technology from a hydrologic modeler’s perspective.

How to cite: Choi, Y., Goodall, J., Ahmad, R., Malik, T., and Tarboton, D.: An Approach for Open and Reproducible Hydrological Modeling using Sciunit and HydroShare, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-13763, https://doi.org/10.5194/egusphere-egu21-13763, 2021.

Alexander Jüstel, Arthur Endlein Correira, Florian Wellmann, and Marius Pischke

Geological modeling methods are widely used to represent subsurface structures for a multitude of applications – from scientific investigations, through natural resource and reservoir studies, to large-scale analyses and geological representations by geological surveys. In recent years, we have seen an increase in the availability of geological modeling methods. However, many of these methods are difficult to use due to preliminary data processing steps, which can be especially difficult for geoscientific data in geographic coordinate systems.

We attempt to simplify the access to open-source spatial data processing for geological modeling with the development of GemGIS, a Python-based open-source library. GemGIS wraps and extends the functionality of packages known to the geo-community such as GeoPandas, Rasterio, OWSLib, Shapely, PyVista, Pandas, NumPy and the geomodelling package GemPy. The aim of GemGIS, as indicated by the name, is to become a bridge between conventional geoinformation systems (GIS) such as ArcGIS and QGIS, and geomodelling tools such as GemPy, allowing simpler and more automated workflows from one environment to the other.

Data within the different disciplines of geosciences are often available in a variety of data formats that need to be converted or transformed for visualization in 2D and 3D and for subsequent geomodelling methods. This is where GemGIS comes into play. GemGIS is capable of working with vector data created in GIS software through GeoPandas, Pandas and Shapely, with raster data through Rasterio and NumPy, with data obtained from web services such as maps or digital elevation models through OWSLib, and with meshes through PyVista. Support for geophysical data and additional geo-formats is constantly being added.

The GemGIS package already contains several tutorials explaining how the different modules can be used to process spatial data. We decided against creating new data classes, since users may already be familiar with concepts such as (Geo-)DataFrames in (Geo-)Pandas or PolyData/Grids in PyVista.

The GemGIS package is hosted at https://github.com/cgre-aachen/gemgis, and the documentation is available at https://gemgis.readthedocs.io/en/latest/index.html. GemGIS is also available on PyPI and can be installed in your Python environment using ‘pip install gemgis’.

We welcome contributions to the project through pull requests and are open to suggestions and comments, including via GitHub issues, especially about possible links to other existing software developments and approaches to integrate geoscientific data processing and geomodelling.

How to cite: Jüstel, A., Endlein Correira, A., Wellmann, F., and Pischke, M.: GemGIS – GemPy Geographic: Open-Source Spatial Data Processing for Geological Modeling, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-4613, https://doi.org/10.5194/egusphere-egu21-4613, 2021.

Louis Krieger, Remko Nijzink, Gitanjali Thakur, Chandrasekhar Ramakrishnan, Rok Roskar, and Stan Schymanski

Good scientific practice requires good documentation and traceability of every research step in order to ensure reproducibility and repeatability of our research. However, with increasing data availability and ability to record big data, experiments and data analysis become more complex. This complexity often requires many pre- and post-processing steps that all need to be documented for reproducibility of final results. This poses very different challenges for numerical experiments, laboratory work and field-data analysis. The platform Renku (https://renkulab.io/), developed by the Swiss Data Science Center, aims at facilitating reproducibility and repeatability of all these scientific workflows. Renku stores all data, code and scripts in an online repository, and records in their history how these files are generated, interlinked and modified. The linkages between files (inputs, code and outputs) lead to the so-called knowledge graph, which is used to record the provenance of results and to connect them with all other relevant entities in the project.

We will discuss here several use examples, including mathematical analysis, laboratory experiments, data analysis and numerical experiments, all related to scientific projects presented separately. Reproducibility of mathematical analysis is facilitated by clear variable definitions and a computer algebra package that enables reproducible symbolic derivations. We will present the use of the Python package ESSM (https://essm.readthedocs.io) for this purpose, and how it can be integrated into a Renku workflow. Reproducibility of laboratory results is facilitated by tracking experimental conditions for each data record and instrument re-calibration activities, mainly through Jupyter notebooks. Data analysis based on different data sources requires the preservation of links to external datasets and snapshots of the dataset versions imported into the project, both of which are facilitated by Renku. Renku also takes care of clear links between input, code and output of large numerical experiments, our last use example, and enables systematic updating if any of the input or code files are changed.
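The idea of recording links between inputs, code and outputs, and of using those links to decide what must be updated, can be sketched in a few lines. The following is a minimal illustration, not Renku's implementation or API: the `ProvenanceGraph` class, its method names, and the file names are hypothetical, and content hashes stand in for the repository history that a real platform would use.

```python
"""Toy provenance graph with change detection (illustration only)."""
import hashlib


def digest(content):
    """Return a SHA-256 fingerprint of a file's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class ProvenanceGraph:
    """Hypothetical record of which inputs and code produced which output."""

    def __init__(self):
        self.files = {}  # file name -> current content
        self.steps = {}  # output name -> {input name: digest when the step ran}

    def add_file(self, name, content):
        self.files[name] = content

    def record_step(self, output, inputs):
        # Remember the exact version of every input used to create `output`.
        self.steps[output] = {name: digest(self.files[name]) for name in inputs}

    def stale_outputs(self):
        """List outputs whose inputs changed, directly or through other outputs."""
        stale = set()
        changed = True
        while changed:
            changed = False
            for output, recorded in self.steps.items():
                if output in stale:
                    continue
                if any(
                    name in stale or digest(self.files[name]) != old
                    for name, old in recorded.items()
                ):
                    stale.add(output)
                    changed = True
        return sorted(stale)


graph = ProvenanceGraph()
for name, content in [
    ("raw.csv", "1,2,3"),
    ("clean.py", "drop missing values"),
    ("clean.csv", "1,2,3"),
    ("plot.py", "draw a line plot"),
    ("figure.png", "image bytes"),
]:
    graph.add_file(name, content)

graph.record_step("clean.csv", ["raw.csv", "clean.py"])
graph.record_step("figure.png", ["clean.csv", "plot.py"])

up_to_date = graph.stale_outputs()
graph.add_file("raw.csv", "1,2,3,4")  # a new version of the input data arrives
needs_update = graph.stale_outputs()
```

In this sketch a change to the raw data marks both the cleaned data and the figure as out of date, because the figure depends on an output that is itself out of date.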

These different examples demonstrate how Renku can assist in documenting the scientific process from input to output and the final paper. All code and data are directly available online, and the recording of the workflows ensures reproducibility and repeatability.

How to cite: Krieger, L., Nijzink, R., Thakur, G., Ramakrishnan, C., Roskar, R., and Schymanski, S.: Repeatable and reproducible workflows using the RENKU open science platform, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7655, https://doi.org/10.5194/egusphere-egu21-7655, 2021.

Andres Peñuela and Francesca Pianosi

Reproducibility and re-usability of research require giving access to data and numerical code but, equally importantly, helping others to understand how inputs, models and outputs are linked together. Jupyter Notebooks is a programming environment that dramatically facilitates this task by enabling users to create stronger and more transparent links between data, model and results. Within a single document where all data, code, comments and results are brought together, Jupyter Notebooks provide an interactive computing environment in which users can read, run or modify the code, and visualise the resulting outputs. In this presentation, we will explain the philosophy that we have applied to the development of interactive Jupyter Notebooks for two Python toolboxes, iRONS (a package of functions for reservoir modelling and optimisation) and SAFE (a package of functions for global sensitivity analysis). The Jupyter Notebooks serve two purposes: some Notebooks target current users by demonstrating the key functionalities of the toolbox (‘how’ to use it), effectively replacing the technical documentation of the software; other Notebooks target potential users by demonstrating the general value of the methodologies implemented in the toolbox (‘why’ use it). In all cases, the Notebooks integrate the following features: 1) the code is written in a math-like style to make it readable to a wide variety of users, 2) they integrate interactive results visualization to facilitate the conversation between the data, the model and the user, even when the user does not have the time or expertise to read the code, 3) they can be run on the cloud by using online computational environments, such as Binder, so that they are accessible by a web browser without requiring the installation of Python. We will discuss the feedback received from users and our preliminary results of measuring the effectiveness of the Notebooks in transferring knowledge of the different modelling tasks.

How to cite: Peñuela, A. and Pianosi, F.: Creating a more fluent conversation between data, model and users through interactive Jupyter Notebooks, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7755, https://doi.org/10.5194/egusphere-egu21-7755, 2021.

Bernadette Fritzsch and Daniel Nüst

Open Science has established itself as a movement across all scientific disciplines in recent years. It supports good practices in science and research that lead to more robust, comprehensible, and reusable results. The aim is to improve the transparency and quality of scientific results so that more trust is achieved, both in the sciences themselves and in society. Transparency requires that uncertainties and assumptions are made explicit and disclosed openly. 
Currently, the Open Science movement is largely driven by grassroots initiatives and small scale projects. We discuss some examples that have taken on different facets of the topic:

  • The software developed and used in the research process is playing an increasingly important role. The Research Software Engineers (RSE) communities have therefore organized themselves in national and international initiatives to increase the quality of research software.
  • Evaluating reproducibility of scientific articles as part of peer review requires proper credit and incentives for both authors and specialised reviewers to invest the extra effort needed to facilitate workflow execution. The Reproducible AGILE initiative has established a reproducibility review at a major community conference in GIScience.
  • Technological advances for more reproducible scholarly communication beyond PDFs, such as containerisation, exist, but are often inaccessible to domain experts who are not programmers. Targeting geoscience and geography, the project Opening Reproducible Research (o2r) develops infrastructure to support publication of research compendia, which capture data, software (incl. execution environment), text, and interactive figures and maps.

At the core of scientific work lie replicability and reproducibility. Even if different scientific communities use these terms differently, the recognition that these aspects need more attention is commonly shared and individual communities can learn a lot from each other. Networking is therefore of great importance. The newly founded initiative German Reproducibility Network (GRN) wants to be a platform for such networking and targets all of the above initiatives. GRN is embedded in a growing network of similar initiatives, e.g. in the UK, Switzerland and Australia. Its goals include 

  • Supporting local open science groups
  • Connecting local or topic-centered initiatives for the exchange of experiences
  • Attracting institutions to the goals of Open Science
  • Cultivating contacts with funding organizations, publishers and other actors in the scientific landscape

In particular, the GRN aims to promote the dissemination of best practices through various formats of further education, in order to raise awareness of the topic, especially among early career researchers. By providing a platform for networking, local and domain-specific groups should be able to learn from one another, strengthen one another, and shape policies at a local level.

We present the GRN in order to reach existing local initiatives and to invite them to join the GRN or its sibling networks in other countries.

How to cite: Fritzsch, B. and Nüst, D.: German Reproducibility Network - a new platform for Open Science in Germany, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14724, https://doi.org/10.5194/egusphere-egu21-14724, 2021.

Wouter Knoben, Shervan Gharari, and Martyn Clark

Setting up earth system models can be cumbersome and time-consuming. Model-agnostic tasks are typically the same regardless of model used and include definition and delineation of the modeling domain and preprocessing of forcing data and parameter fields. Model-specific tasks include conversion of preprocessed data into model-specific formats and generation of model inputs and run scripts. We present a workflow that includes both model-agnostic and model-specific steps needed to set up the Structure for Unifying Multiple Modeling Alternatives (SUMMA) anywhere on the planet, with the goal of providing a baseline SUMMA set up that can easily be adapted for specific study purposes. The workflow therefore uses open source data with global coverage to derive basin delineations, climatic forcing, and geophysical inputs such as topography, soil and land use parameters. The use of open source data, an open source model and an open source workflow that relies on established software packages results in transparent and reproducible scientific outputs, open to verification and adaptation by the community. The workflow substantially reduces model configuration time for new studies and paves the way for more and stronger scientific contributions in the long term, as it lets the modeler focus on science instead of set up.

How to cite: Knoben, W., Gharari, S., and Clark, M.: Facilitating reproducible science: a workflow for setting up SUMMA simulations anywhere on the globe, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-3091, https://doi.org/10.5194/egusphere-egu21-3091, 2021.

Riccardo Rigon, Marialaura Bancheri, Giuseppe Formetta, Francesco Serafin, Michele Bottazzi, Niccolò Tubini, and Concetta D'Amato

The scope of this work is to present new insights into the GEOframe system. GEOframe is an open-source, semi-distributed, component-based hydrological modeling system. It is developed in Java and Python and is based on the environmental modeling framework Object Modeling System V3 (OMS3). Each part of the hydrological cycle is implemented in a self-contained building block, commonly called a component. Components can be joined together to obtain multiple modeling solutions that can accomplish tasks ranging from simple to very complicated. More than 50 components are available for estimating all the variables of the hydrological cycle. Starting from the geomorphic and DEM analyses, GEOframe allows the spatial interpolation of the meteorological forcing data, the simulation of the radiation budget, and the estimation of evapotranspiration (ET) and snow processes. Runoff production is computed using the Embedded Reservoir Model (ERM) or a combination of its reservoirs. Model parameters can be calibrated using two algorithms and several objective functions. A graph-based structure, called NET3, is employed for the management of process simulations. NET3 is designed using a river network/graph structure analogy, where each hydrologic response unit (HRU) is a node of the graph, and the channel links are the connections between the nodes. In any NET3 node, a different modeling solution can be implemented, and nodes (HRUs or channels) can be connected or disconnected at run time through scripting. Thanks to its solid informatics infrastructure and physical basis, GEOframe has proved highly flexible and robust in several applications, from small to large catchments. GEOframe is open source, its chain of development is based on open-source products, and its code is engineered to be inspectable, which supports the reproducibility and replicability of research. Developers and users can easily collaborate, share documentation, and archive examples and data within the GEOframe community.
We believe that these are a priori conditions for verifying the reliability and robustness of models. GEOframe's modular structure allows for a fair comparison of model structural units and algorithm implementations, because only the component performing the specific task has to be changed. In this contribution we list the available components and discuss some applications at different scales with different modeling tools, which return what we consider realistic results. We show that there exists no perfect model of a process, but that the art and science of modelling can be made more evolutionary even when they are revolutionary.
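The river network/graph analogy described above can be illustrated with a small sketch. This is not GEOframe or NET3 code (those are built on Java and OMS3); the `Node` and `Network` classes, the node names and the toy "modeling solutions" are all hypothetical, chosen only to show nodes that carry different solutions and can be connected or disconnected by a script.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Node:
    """One node of the graph: an HRU or a channel (hypothetical class)."""
    name: str
    # Modeling solution: (local forcing, upstream inflow) -> outflow
    solution: Callable[[float, float], float]
    downstream: Optional[str] = None


class Network:
    """Toy river network; each node drains to at most one downstream node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def connect(self, upstream: str, downstream: str) -> None:
        self.nodes[upstream].downstream = downstream

    def disconnect(self, upstream: str) -> None:
        self.nodes[upstream].downstream = None

    def run(self, forcing: Dict[str, float]) -> Dict[str, float]:
        """Evaluate every node once, upstream nodes before downstream ones."""
        inflow = {name: 0.0 for name in self.nodes}
        pending = {name: 0 for name in self.nodes}  # upstream nodes not yet run
        for node in self.nodes.values():
            if node.downstream is not None:
                pending[node.downstream] += 1
        ready = deque(name for name, count in pending.items() if count == 0)
        outflow: Dict[str, float] = {}
        while ready:  # nodes caught in a cycle never become ready and are skipped
            name = ready.popleft()
            node = self.nodes[name]
            outflow[name] = node.solution(forcing.get(name, 0.0), inflow[name])
            if node.downstream is not None:
                inflow[node.downstream] += outflow[name]
                pending[node.downstream] -= 1
                if pending[node.downstream] == 0:
                    ready.append(node.downstream)
        return outflow


# Two HRUs with different (made-up) runoff coefficients drain to one outlet.
net = Network()
net.add(Node("hru_a", lambda p, q_in: 0.5 * p + q_in))
net.add(Node("hru_b", lambda p, q_in: 0.25 * p + q_in))
net.add(Node("outlet", lambda p, q_in: q_in))
net.connect("hru_a", "outlet")
net.connect("hru_b", "outlet")

forcing = {"hru_a": 10.0, "hru_b": 4.0}
connected = net.run(forcing)

net.disconnect("hru_b")  # rewire the graph by script, then run again
rewired = net.run(forcing)
```

Swapping the function stored in a node changes that node's modeling solution without touching the rest of the graph, which is the kind of like-for-like comparison the abstract refers to.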

How to cite: Rigon, R., Bancheri, M., Formetta, G., Serafin, F., Bottazzi, M., Tubini, N., and D'Amato, C.: The GEOframe system: a modular, expandible, open-source system for doing hydrology by computer according to the open science paradigms., EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7070, https://doi.org/10.5194/egusphere-egu21-7070, 2021.

Luojia Hu, Wei Yao, Zhitong Yu, and Yan Huang

A high-resolution (e.g., 10-m) mangrove map that can identify small mangrove patches (< 1 ha) is central to quantifying ecosystem functions and helping governments take effective steps to protect mangroves. Small patches are becoming more numerous owing to artificial destruction and the planting of new mangrove trees; they are vulnerable to climate change and sea level rise, and they are important for estimating mangrove habitat connectivity with adjacent coastal ecosystems and for reducing the uncertainty of carbon storage estimates. However, the latest national-scale mangrove forest maps, derived mainly from Landsat imagery with 30-m resolution, are too coarse to accurately characterize the distribution of mangrove forests, especially small ones (area < 1 ha). Sentinel imagery with 10-m resolution provides the opportunity to identify these small mangrove patches and generate high-resolution mangrove forest maps. Here, we used spectral/backscatter-temporal variability metrics (quantiles) derived from Sentinel-1 SAR (Synthetic Aperture Radar) and Sentinel-2 MSI (Multispectral Instrument) time-series imagery as input features for a random forest classifier to map mangroves in China. We found that Sentinel-2 imagery is more effective than Sentinel-1 for mangrove extraction, and that combining SAR and MSI imagery yields better accuracy (F1-score of 0.94) than using them separately (F1-score of 0.88 using Sentinel-1 only and 0.895 using Sentinel-2 only). The 10-m mangrove map derived by combining SAR and MSI data identified 20,003 ha of mangroves in China; the area of small mangrove patches (< 1 ha) was 1741 ha, or 8.7% of the total mangrove area. The largest area of small mangrove patches (819 ha) is located in Guangdong Province, and Fujian has the highest percentage of small mangrove patches in its total mangrove area (11.4%).
A comparison with existing 30-m mangrove products showed noticeable disagreement, indicating the need for a mangrove extent product with 10-m resolution. This study demonstrates the significant potential of using Sentinel-1 and Sentinel-2 images to produce an accurate, high-resolution mangrove forest map with Google Earth Engine (GEE). The mangrove forest maps are expected to provide critical information to conservation managers, scientists, and other stakeholders in monitoring the dynamics of mangrove forests.
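As a brief aside on the figures quoted above, the sketch below shows how an F1-score is obtained from classification counts and how the reported share of small patches follows from the two areas given in the abstract. The pixel counts passed to `f1_score` are invented for illustration; only the 1741 ha and 20,003 ha values come from the abstract.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for one class (here: mangrove)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation counts, NOT taken from the study.
example_f1 = f1_score(true_pos=90, false_pos=10, false_neg=10)

# Areas reported in the abstract (hectares).
total_mangrove_ha = 20003
small_patch_ha = 1741
small_patch_share = 100 * small_patch_ha / total_mangrove_ha  # percent
```

Rounded to one decimal place, `small_patch_share` reproduces the 8.7% stated in the abstract.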

How to cite: Hu, L., Yao, W., Yu, Z., and Huang, Y.: An updated national-scale mangrove forest map in China Using Sentinel-1 and Sentinel-2 Time-Series Data with Google Earth Engine, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5310, https://doi.org/10.5194/egusphere-egu21-5310, 2021.

Matteo Ravasi, Carlos Alberto da Costa Filho, Ivan Vasconcelos, and David Vargas

Inverse problems lie at the core of many geophysical algorithms, from earthquake and exploration seismology, all the way to electromagnetics and gravity potential methods.

In 2018, we open-sourced PyLops, a Python-based framework for large-scale inverse problems. By leveraging the concept of matrix-free linear operators – together with the efficiency of numerical libraries such as NumPy, SciPy, and Numba – PyLops solves computationally intensive inverse problems with high-level code that is highly readable and resembles the underlying mathematical formulation. While initially aimed at researchers, its parsimonious software design choices, large test suite, and thorough documentation render PyLops a reliable and scalable software package easy to run both locally and in the cloud.

Since its initial release, PyLops has incorporated several advancements in scientific computing, leading to the creation of an entire ecosystem of tools: operators can now run on GPUs via CuPy, scale to distributed computing through Dask, and be seamlessly integrated into PyTorch’s autograd to facilitate research in machine-learning-aided inverse problems. Moreover, PyLops contains a large variety of inverse solvers, including least-squares, sparsity-promoting algorithms, and proximal solvers well suited to convex, possibly nonsmooth problems. PyLops also contains sparsifying transforms (e.g., wavelets, curvelets, seislets) which can be used in conjunction with the solvers. By offering a diverse set of tools for inverse problems under one unified framework, it expedites the use of state-of-the-art optimization methods and compressive sensing techniques in the geoscience domain.
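To make the idea of a matrix-free linear operator concrete, here is a minimal sketch in plain Python. It deliberately does not use the PyLops API: the `CumSum` class, the `dot` helper and the `cgls` function are illustrative names, and the method names `matvec`/`rmatvec` merely mirror a common convention for forward and adjoint products. The operator (a running sum) is never stored as a matrix, and the least-squares solver only needs its forward and adjoint actions.

```python
from typing import List

Vector = List[float]


class CumSum:
    """Running-sum operator defined by its action only (no matrix is built)."""

    def matvec(self, x: Vector) -> Vector:
        # Forward: y[i] = x[0] + ... + x[i]
        out, total = [], 0.0
        for value in x:
            total += value
            out.append(total)
        return out

    def rmatvec(self, y: Vector) -> Vector:
        # Adjoint: z[j] = y[j] + ... + y[n-1] (running sum from the end)
        out, total = [], 0.0
        for value in reversed(y):
            total += value
            out.append(total)
        return out[::-1]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def cgls(op: CumSum, y: Vector, niter: int = 50, tol: float = 1e-20) -> Vector:
    """Conjugate gradient least squares using only matvec and rmatvec."""
    x = [0.0] * len(y)  # this operator is square, so model and data sizes match
    r = list(y)                      # residual y - A x for x = 0
    s = op.rmatvec(r)                # gradient direction A^T r
    p = list(s)
    gamma = dot(s, s)
    for _ in range(niter):
        if gamma < tol:
            break
        q = op.matvec(p)
        qq = dot(q, q)
        if qq == 0.0:
            break
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = op.rmatvec(r)
        gamma_new = dot(s, s)
        beta = gamma_new / gamma
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


op = CumSum()

# Dot test: forward and adjoint are consistent if (A u).v equals u.(A^T v).
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0, -1.0, 2.0, 0.0, 3.0]
lhs = dot(op.matvec(u), v)
rhs = dot(u, op.rmatvec(v))

# Invert synthetic data generated from a known model.
x_true = [1.0, -2.0, 0.5, 3.0, 0.0]
y_obs = op.matvec(x_true)
x_est = cgls(op, y_obs)
```

The same pattern scales to operators far too large to store explicitly, since the solver never asks for more than the two products.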

Beyond our initial expectations, the framework is currently used to solve problems beyond geoscience, including astrophysics and medical imaging. Likewise, it has inspired the development of the occamypy framework for nonlinear inversion in geophysics. In this talk, we share our experience in building such an ecosystem and offer further insights into the needs and interests of the EGU community to help guide future development as well as achieve wider adoption.

How to cite: Ravasi, M., da Costa Filho, C. A., Vasconcelos, I., and Vargas, D.: Developing open-source tools for reproducible inverse problems: the PyLops journey, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-5772, https://doi.org/10.5194/egusphere-egu21-5772, 2021.

Jason Hunter, Mark Thyer, Dmitri Kavetski, and David McInerney

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results is often reliant on the objective function used to calibrate the hydrological model.

We present an open-source R package and an online web application that achieve the following two aims. First, these resources are easy to use and accessible, so that users need no specialised knowledge of probabilistic modelling to apply them. Second, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions. It is able to do so only because it includes an innovation that resolves a long-standing issue relating to model assumptions that previously prevented this broad application.

We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study.

The new probabilistic error model, together with the open-source software and web application, aims to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions that are made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.

How to cite: Hunter, J., Thyer, M., Kavetski, D., and McInerney, D.: An open-source R-package and web application for high-quality probabilistic predictions in hydrology, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-8549, https://doi.org/10.5194/egusphere-egu21-8549, 2021.

Sandra Hellmers, Christoph Sauer, and Peter Fröhle

We present an efficient method that addresses a significant weakness in hydrological modelling: the computation of backwater effects in low-lying catchments. The re-usable and transferable method is implemented in the open-source software KalypsoNA (KalypsoHydrology) and validated with results from a study of a tidally influenced, low-lying catchment.
Especially in low-lying (marshy) catchments, the pressure on current stormwater drainage systems is rising due to the combined impacts of increasing urbanisation on the one hand and mean sea level rise and heavy storm events on the other. Models are applied to analyse and assess the consequences of these impacts on flood routing along a stream using different approaches: (i) pure black-box (namely empirical, lumped), (ii) hydrological conceptual or (iii) hydrodynamic-numerical approaches. Flow depths, velocities and backwater effects in streams and on forelands are not yet modelled with hydrological approaches, but with simplified hydrodynamic-numerical approaches. Accurate hydrodynamic-numerical modelling requires high-resolution topographic data of the main channel and, in case of bank overflow, of the flood plain. Hence, the availability of suitably detailed profile data from measurements is crucial for hydrodynamic-numerical modelling. The comparatively long computing time of hydrodynamic-numerical model simulations is no limitation for answering special research questions, but it does limit real-time operational application and catchment modelling at the meso to regional scale (> 100 km²).
To resolve the shortcomings of hydrological approaches in modelling water depths and backwater effects, new concepts are required that are applicable to catchments with scarce data availability, efficient for real-time operational model application, open to further model development and re-usable in other hydrological model implementations.
This contribution presents the development, implementation and evaluation of a method for modelling backwater effects based on a hydrological flood routing approach and a backwater volume routing according to the water level slope. The developed method computes the backwater effects in two steps. First, the inflow from sub-catchments and the flood routing processes not affected by backwater are computed. Second, the afflux conditions that cause backwater effects in the upstream direction are calculated. Afflux conditions occur mainly at tributary inlets or control structures (for example, tide gates, weirs, retention ponds or sluices). The input parameters comprise simplified or complex geometrical data per stream segment. Therefore, the model is applicable to catchments with good or scarce data availability. Computation time is at most 3 minutes even for large catchments (> 150 km² with several sub- and sub-sub-catchments) for a 14-day simulation with a time step of 15 minutes, so the method is applicable to real-time operational simulations in flood forecasting.
The proposed method is re-useable and transferable to other hydrological numerical models which use conceptual hydrological flood routing approaches (e.g. Muskingum-Cunge or Kalinin-Miljukov). The open source software model KalypsoHydrology and the calculation core KalypsoNA are available at https://sourceforge.net/projects/kalypso/ and http://kalypso.wb.tu-harburg.de/downloads/. Open access for developments and user application is supported by an online accessible commitment management via SourceForge and a wiki as an online manual.
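For readers unfamiliar with conceptual flood routing, the sketch below implements the classic Muskingum scheme, a close relative of the Muskingum-Cunge approach named above. It is not the backwater method of this contribution and not KalypsoNA code; the function names, the parameter values (K = 2 h, X = 0.2, 1 h time step) and the inflow hydrograph are assumptions chosen purely for illustration.

```python
from typing import List, Tuple


def muskingum_coefficients(k: float, x: float, dt: float) -> Tuple[float, float, float]:
    """Routing coefficients for storage constant k, weighting factor x, time step dt.

    k and dt must share the same time unit; the three coefficients sum to 1.
    """
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom
    return c0, c1, c2


def route(inflow: List[float], k: float, x: float, dt: float) -> List[float]:
    """Route an inflow hydrograph through one reach.

    Assumes the reach starts in steady state (first outflow equals first inflow).
    """
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    outflow = [inflow[0]]
    for i in range(1, len(inflow)):
        outflow.append(c0 * inflow[i] + c1 * inflow[i - 1] + c2 * outflow[-1])
    return outflow


# Hypothetical flood wave (m3/s) at hourly steps.
inflow = [10.0, 10.0, 30.0, 60.0, 40.0, 20.0, 10.0, 10.0, 10.0, 10.0]
outflow = route(inflow, k=2.0, x=0.2, dt=1.0)
coefficients = muskingum_coefficients(k=2.0, x=0.2, dt=1.0)
steady = route([5.0] * 4, k=2.0, x=0.2, dt=1.0)
```

A scheme of this kind moves water downstream only, which is why an additional step, such as the afflux computation described in the abstract, is needed to represent backwater effects.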

How to cite: Hellmers, S., Sauer, C., and Fröhle, P.: Computation of backwater effects in low lying (marshland) catchments – a re-usable and efficient method in an open source hydrological model, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11714, https://doi.org/10.5194/egusphere-egu21-11714, 2021.

Matevž Vremec and Raoul Collenteur

Evapotranspiration (ET) is a major component of the hydrological cycle, and accurate estimates of the flux are important to the water and agricultural sectors, among others. Due to difficulties in the direct observation of ET in the field, the flux is often estimated from other meteorological data using empirical formulas. There is a wide variety of such formulas, with different levels of input data and parameter requirements. While some packages are available in the Python ecosystem for these tasks, they typically focus on one specific formula or data type. The goal of PyEt is to provide a Python package for the estimation of ET that works with many different data types, is well documented and tested, and is simple to use. The source code is hosted on GitHub (https://github.com/phydrus/PyEt) and the package can be installed from PyPI. PyEt currently contains nine different methods to estimate ET and various methods to estimate surface and aerodynamic resistance. The methods are tested against other open-source data to ensure that they function properly. While the methods are currently implemented only for 1D data (e.g. time series), future work will focus on enabling them for 2D and 3D data as well (such as NumPy arrays, xarray objects, and NetCDF files). The package allows hydrologists to compute and compare evapotranspiration estimates using different approaches with minimum effort. The presentation will focus on the problems associated with reproducibility in ET estimation and on linkage with existing Python libraries to perform complex sensitivity and uncertainty analyses.
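As an example of the kind of empirical formula referred to above, the following sketch computes reference ET with the Hargreaves equation as given in the FAO-56 guidelines, using only the Python standard library. It does not use or reproduce the PyEt API; the function names and the input values (latitude, day of year, temperatures) are assumptions for illustration.

```python
import math

SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1


def extraterrestrial_radiation(latitude_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra in MJ m-2 day-1 (FAO-56 formulation).

    Not valid at latitudes with polar day or night, where acos() is undefined.
    """
    phi = math.radians(latitude_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    inverse_distance = 1.0 + 0.033 * math.cos(angle)   # relative Earth-Sun distance
    declination = 0.409 * math.sin(angle - 1.39)        # solar declination (rad)
    sunset_angle = math.acos(-math.tan(phi) * math.tan(declination))
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * inverse_distance * (
        sunset_angle * math.sin(phi) * math.sin(declination)
        + math.cos(phi) * math.cos(declination) * math.sin(sunset_angle)
    )


def hargreaves_et0(tmin_c: float, tmax_c: float, ra_mj: float) -> float:
    """Reference ET in mm/day from daily min/max air temperature (deg C) and Ra."""
    tmean = 0.5 * (tmin_c + tmax_c)
    ra_mm = 0.408 * ra_mj  # convert MJ m-2 day-1 to equivalent evaporation in mm/day
    return 0.0023 * (tmean + 17.8) * math.sqrt(tmax_c - tmin_c) * ra_mm


# Hypothetical site: latitude 20 degrees south, day 246 of the year.
ra = extraterrestrial_radiation(latitude_deg=-20.0, day_of_year=246)
et0 = hargreaves_et0(tmin_c=15.0, tmax_c=27.0, ra_mj=ra)
```

Because this formula needs only temperature and site geometry, while others need radiation, humidity and wind data, being able to run several methods on the same inputs makes differences between estimates easier to trace.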

How to cite: Vremec, M. and Collenteur, R.: PyEt - a Python package to estimate potential and reference evapotranspiration, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15008, https://doi.org/10.5194/egusphere-egu21-15008, 2021.

Martijn van den Ende, Lucile Bruhat, Gareth Funning, Alice-Agnes Gabriel, Stephen Hicks, Romain Jolivet, Thomas Lecocq, Christie Rowe, and The Seismological Community

On 24 November 2020, the Springer Nature publishing group announced the introduction of Open Access (OA) journals in Nature and its sibling journals. The corresponding OA publication fee (charged directly to the authors) was set to €9,500/$11,390/£8,290, an amount that may be well out of reach for researchers with limited financial means. This is especially a problem for researchers in developing countries, and for early-career researchers on small, personal fellowships. Funding agencies often demand that research be published under an OA license, forcing authors to accept the high publication fees.

The high cost of these and similar OA fees for other Earth science journals prompted a discussion among the seismological community on Twitter, during which the idea was raised to start a free-to-publish, free-to-read journal for seismological research. The concept of Diamond Open Access was previously adopted by Volcanica (www.jvolcanica.org) for volcanological research, providing a precedent and directives for similar initiatives (like Seismica, but also Tektonika for the structural geology community). Following the community discussion on Slack with over 100 participants, a small "task force" was formed to investigate in detail the possibility of starting a Diamond OA seismology journal, taking Volcanica as a model. In this contribution, we report the progress that has been made by the task force and the seismological community in the conceptualisation of the journal, and the steps that remain to be taken. Once the initiation of the journal is completed, Seismica will offer a platform for researchers to publish and access peer-reviewed work with no financial barriers, promoting seismological research in an inclusive manner. We invite all interested members of the seismological and earthquake community to participate in the discussions and development of this OA journal, by contacting the authors listed on this abstract.

How to cite: van den Ende, M., Bruhat, L., Funning, G., Gabriel, A.-A., Hicks, S., Jolivet, R., Lecocq, T., Rowe, C., and Seismological Community, T.: The Seismica initiative: a community-led Diamond Open Access journal for seismological research, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-2519, https://doi.org/10.5194/egusphere-egu21-2519, 2021.

Catherine Jex, William Colgan, Michael Bryld Wessel Fyhn, Adam A Garde, Jon R Ineson, Adam Hambly, Kim Hyojin, Julian Koch, Thomas Find Kokfelt, Signe Hillerup Larsen, Sofie Lindström, Stefanie Lode, Rasmus Bødker Madsen, Mette Olivarius, Kerstin Saalmann, Sara Salehi, Marit-Solveig Seidenkrantz, Lars Stemmerik, and Kristian Svennevig

One active journal. Fourteen legacy titles. More than 3000 articles published since 1893 – some digitised, some not. One full-time member of staff. A small team of dedicated geoscientists. Limited budget. Plan S. Open-source journal software. If these are the ingredients, what is the recipe?

Like many surveys, the Geological Survey of Denmark and Greenland (GEUS) has a long history of publishing. Our full catalogue of titles extends back to 1893 and our current title, GEUS Bulletin (www.geusbulletin.org; formerly Geological Survey of Denmark and Greenland Bulletin), has been active since 2003. Our journals have always been grassroots initiatives – run by scientists, for scientists. But two years ago, amid the fast-changing demands of digital publishing, the Survey faced a quandary: should we continue publishing our own journal? At a time of rapid proliferation of journals for any discipline imaginable, what niche did a geographically-focused journal fill? What should we modernise? Could we relaunch as an online, diamond open-access journal on our existing budget? Could we implement more of the services our authors wanted and attract more authors beyond our traditional audience? 

Two years later, we have successfully re-launched our collection of journals, without increasing our overall budget. Using open-source solutions, we have transformed our print-focused publication workflow to a new online, open-access platform and data repository. We are currently migrating our entire back catalogue of legacy titles to the same platform. Although we only have visitor data for our new platform since November 2020, we can see early signs of increased article views (c. +82% in Nov–Dec 2020, compared with the same months in 2018 and 2019) and a jump in traffic from external websites like Google Scholar (from 5% before re-launch to 35% after re-launch). In this presentation, we present a recipe that we hope other geological surveys, societies and institutions can follow when launching (or relaunching) their own journals using open-source solutions. We review the options available to small survey or society publishers on a limited budget, from journal hosting to typesetting. We highlight the advantages of non-profit open-access publishing and open source, community-driven solutions that currently exist. We close by highlighting the barriers that remain for small non-profit publishers when balancing discoverability, journal impact and compliance with the latest open-access initiatives such as Plan S, and web accessibility regulations.  

It is still early days for GEUS Bulletin, but we see the adoption of open-source platforms as the key ingredient to our potential for success in the coming years. Such platforms allow us to offer diamond open-access publishing and a data repository, while maintaining our non-profit publishing model with neither author nor reader fees.

How to cite: Jex, C., Colgan, W., Fyhn, M. B. W., Garde, A. A., Ineson, J. R., Hambly, A., Hyojin, K., Koch, J., Kokfelt, T. F., Larsen, S. H., Lindström, S., Lode, S., Madsen, R. B., Olivarius, M., Saalmann, K., Salehi, S., Seidenkrantz, M.-S., Stemmerik, L., and Svennevig, K.: A recipe for launching a diamond open-access journal with a century of geological knowledge in the pantry: Lessons learned from GEUS Bulletin, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-7168, https://doi.org/10.5194/egusphere-egu21-7168, 2021.

Mohamed Gouiza, David Fernández Blanco, Clare Bond, Dave McCarthy, Amicia Lee, Lucia Pérez-Díaz, and τeκτoniκa community

τeκτoniκa is an upcoming community-led diamond open access (DOA) journal, which aims to publish high-quality research in structural geology and tectonics. It is a grass-roots, community-driven initiative that relies on the involvement of Earth scientists from around the globe, who together represent the wide and diverse spectrum of the structural geology and tectonics community.

Beyond the obvious objective of publishing novel research on structural geology and tectonics, it is intended to offer an alternative to traditional publishing models, which hide scholarly work behind exclusive and expensive paywalls. τeκτoniκa is a new addition to the growing set of DOA journals that have appeared in recent years. Along with preprint platforms, data and software repositories, it is part of an expanding movement within academia focused on breaking the barriers inherited from the pre-internet publishing era, to ensure free and open access to knowledge.

This contribution aims to showcase the value of this ambitious project as well as our vision for how DOA journals in general (and Tektonika in particular) might shape the future of geoscience publishing.

How to cite: Gouiza, M., Fernández Blanco, D., Bond, C., McCarthy, D., Lee, A., Pérez-Díaz, L., and community, Τ.: τeκτoniκa, a journal for an open future, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-11146, https://doi.org/10.5194/egusphere-egu21-11146, 2021.

Fabian Wadsworth, Jamie Farquharson, Alexandra Kushnir, Michael Heap, Ben Kennedy, Oryaëlle Chevrel, Rebecca Williams, and Pierre Delmelle

The case for open access research is well established. At the core of the pro-openness argument is a philosophy that it is good if research outputs are widely accessible, breaking down the walls that stand between the world of research and the public. Moreover, openness ensures that scientists can access resources worldwide, even if their institutions cannot afford subscription fees, thereby breaking down economic barriers and access disparities that exist globally. In large part, publishers and publications are adopting this philosophy, and ensuring that the costs of publication are covered by charging them to the authors (via Article Processing Charges, or APCs) rather than the readers of research – this is the ‘gold’ openness model. However, these charges for publication are often very high, which discourages submission to gold open access forums and maintains an academic environment that favours the older ‘subscription’ models of publication.

At the journal Volcanica, we have found a way to remove both costs – costs to readers and costs to authors – by building a community journal that maintains exceptionally low running costs, paid for by a university press publisher – this is the ‘diamond’ openness model. We can achieve this by relying on volunteer time provided at no cost. In this presentation, we explore the current state of our journal three years after the publication of our first article. We survey the challenges faced by Volcanica as we grow, handle more submissions, and expand our reviewing, typesetting, and back-end work-flow. To meet these challenges, we have expanded our technical and editorial personnel.

Here we explore the growth challenges that are still to come, and compare our volunteer model with the model of the ‘academic society journal’, in which relatively minimal staff costs are paid for by a mixed model. The mixed model is still driven by article processing charges, but keeps those costs comparatively low, and offers fee-waivers on a needs basis, acknowledging that not all authors are well-funded. In doing so, we take a nuanced approach to the realities of growing a community-led endeavour, and examine the extent to which our model could be scaled to the size of the leading journals in our field. While we do not reach a definitive conclusion as to the role that Diamond publishing models will play in the future landscape of research dissemination, we hope that the presentation of our experiences is informative to the geo-scientific community, especially as new ‘diamond’ open journals – Seismica and Tektonika – are slated for launch in the coming years.

How to cite: Wadsworth, F., Farquharson, J., Kushnir, A., Heap, M., Kennedy, B., Chevrel, O., Williams, R., and Delmelle, P.: Growing a diamond open access community initiative: Volcanica 3 years on, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-15938, https://doi.org/10.5194/egusphere-egu21-15938, 2021.

Caroline Coward

Open Science, as we commonly define it, has grown steadily over the past two to three decades, thanks to the proliferation of electronic data and information, as well as ease of access to computers with high speed Internet connectivity. What began as a mechanism to share the products of our scientific research has evolved into a global movement involving journal article manuscripts, source code, copyright, access, and intellectual property negotiations, digital repositories, cloud-based tools, and data in a variety of formats.

This presentation will briefly define Open Science, and enumerate and describe common elements of Open Science through a brief history of the movement. It will also touch on both triumphs and challenges faced by proponents, discuss the role of professional publishers, aggregators, and other traditional gatekeepers, and propose scenarios for the future of the movement. Questions, anecdotes, vexations, and suggestions from attendees are welcomed at the end of the presentation, with the goal of generating deeper discussion around the future and sustainability of Open Science.

How to cite: Coward, C.: Open Science – The Great Equalizer: Decolonizing the tools, technologies, and results of science in the 21st century, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-14830, https://doi.org/10.5194/egusphere-egu21-14830, 2021.