Earth Sciences depend on detailed multivariate measurements and investigations to understand the physical, geological, chemical, biogeochemical and biological processes of the Earth. Making accurate predictions and providing solutions for current questions related to climate change, water, energy and food security are important demands on the Earth Science community worldwide. In addition to these society-driven questions, Earth Sciences are still strongly driven by the eagerness of individuals to understand processes, interrelations and tele-connections within and between small sub-systems and the Earth System as a whole. Understanding and predicting temporal and spatial changes across these scales, from the micro scale to the Earth System as a whole, is the key to understanding Earth ecosystems; we need to utilize high-resolution data across all scales in an integrative, holistic approach. Using Big Data, which are often distributed and highly inhomogeneous, has become standard practice in Earth Sciences, and digitalization in conjunction with Data Science promises new discoveries.
The understanding of the Earth System as a whole and its sub-systems depends on our ability to integrate data from different disciplines, between earth compartments, and across interfaces. The need to advance Data Science capabilities and to enable earth scientists to follow best-possible workflows, apply methods, and use computerized tools properly and in an accessible way has been identified worldwide as an important next step for advancing scientific understanding. This is particularly necessary to access knowledge contained in already acquired data, which currently remains invisible due to the limited possibilities for data integration and joint exploration. This session aims to bring together researchers from Data and Earth Sciences working on, but not limited to,
• SMART monitoring designs that advance monitoring strategies, e.g. to detect observational gaps and refine sensor layouts for better and statistically robust extrapolation
• Data management and stewardship solutions compliant with FAIR principles, including the development and application of real-time capable data management and processing chains
• Data exploration frameworks providing qualified data from different sources and tailoring available computational and visual methods to explore and analyse multi-parameter data generated through monitoring efforts or model simulations
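
Detecting observational gaps, as named in the first point, can be as simple as scanning sensor timestamps for intervals that exceed the expected sampling rate. A minimal sketch (the function name, readings and threshold are illustrative, not from any of the presented projects):

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive readings are further apart than max_gap."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap]

# Hypothetical sensor readings: a 75-minute outage between minute 20 and minute 95
readings = [datetime(2020, 5, 6, 10, 0) + timedelta(minutes=m) for m in (0, 10, 20, 95, 105)]
gaps = find_gaps(readings, max_gap=timedelta(minutes=30))
```

Flagged gaps could then feed back into refining the sensor layout or survey schedule.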

Convener: Jens Greinert | Co-conveners: Peter Dietrich, Andreas Petzold, Roland Ruhnke, Viktoria Wichert
| Attendance Wed, 06 May, 10:45–12:30 (CEST)


Chat time: Wednesday, 6 May 2020, 10:45–12:30

Chairperson: Viktoria Wichert
D800 | Highlight
Philipp Fischer, Madlen Friedrich, Markus Brand, Uta Koedel, Peter Dietrich, Holger Brix, Dorit Kerschke, and Ingeborg Bussmann

Measuring environmental variables over longer time periods in coastal marine environments is a challenge with regard to sensor maintenance and the processing of continuously produced, comprehensive datasets. In the project “MOSES” (Modular Observation Solutions for Earth Systems), this becomes even more complicated because seven large Helmholtz centers from the research field Earth and Environment (E&E) within the framework of the German Ministry of Education and Research (BMBF) work together to design and construct a large-scale monitoring network across earth compartments to study the effects of short-term events on long-term environmental trends. This requires the development of robust, standardized and automated data acquisition and processing routines to ensure reliable, accurate and precise data.

Here, the results of two intercomparison workshops on sensor accuracy and precision for selected environmental variables are presented. Environmental sensors to be used in MOSES campaigns on hydrological extremes (floods and droughts) in the Elbe catchment and the adjacent coastal areas of the North Sea in 2019 to 2020 were compared for selected parameters (temperature, salinity, chlorophyll-a, turbidity and methane) in the same experimentally controlled water body, assuming that all sensors provide comparable data. Results were analyzed with respect to individual sensor accuracy and precision relative to an “assumed” real value, as well as with respect to a cost versus accuracy/precision index for measuring specific environmental data. The results show that accuracy and precision do not necessarily correlate with the price of the sensors and that low-cost sensors may provide the same or even higher accuracy and precision than the highest-priced sensor types.
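
The distinction between accuracy (closeness to the assumed real value) and precision (repeatability of readings) that the workshops evaluated can be sketched as follows; the sensor readings and reference value are invented for illustration:

```python
from statistics import mean, stdev

def accuracy_precision(readings, reference):
    """Accuracy: mean offset (bias) from the assumed true value; precision: spread of repeated readings."""
    bias = mean(readings) - reference
    spread = stdev(readings)
    return bias, spread

cheap = [19.8, 20.1, 20.0, 19.9, 20.2]      # hypothetical low-cost temperature sensor (degC)
expensive = [20.4, 20.6, 20.5, 20.5, 20.7]  # hypothetical high-end sensor with a calibration offset
ref = 20.0                                   # "assumed" real value in the controlled water body

bias_cheap, prec_cheap = accuracy_precision(cheap, ref)
bias_exp, prec_exp = accuracy_precision(expensive, ref)
```

In this constructed example the cheap sensor is less biased than the expensive one, mirroring the abstract's finding that price and accuracy need not correlate.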

How to cite: Fischer, P., Friedrich, M., Brand, M., Koedel, U., Dietrich, P., Brix, H., Kerschke, D., and Bussmann, I.: The challenge of sensor selection, long-term sensor operation and data evaluation in inter-institutional long term monitoring projects (lessons learned in the MOSES project), EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-21816, https://doi.org/10.5194/egusphere-egu2020-21816, 2020.

D801 |
Pei Liu, Ruimei Han, and Leiku Yang

Rapid urbanization has become a major urban sustainability concern due to environmental impacts such as the development of urban heat islands (UHI) and the reduction of urban security. To date, most research on urban sustainability has focused on dynamic change monitoring or UHI state characterization, while there is little literature on UHI change analysis. In addition, there has been little research on the impact of land use and land cover changes (LULCCs) on UHI, especially research that simulates future trends of LULCCs, UHI change, and the dynamic relationship between the two. The purpose of this research is to design a remote sensing based framework that investigates and analyses how LULCCs in the process of urbanization affect the thermal environment. To assess and predict the impact of LULCCs on the urban thermal environment, multi-temporal remotely sensed data from 1986 to 2016 were selected as source data, and Geographic Information System (GIS) methods such as the CA-Markov model were employed to construct the proposed framework. The results show that (1) there was substantial urban expansion during the 40-year study period; (2) the largest movement of the urban center of gravity was from the north-northeast (NNE) to the west-southwest (WSW) direction; (3) the dominant temperature levels in the research area were the middle, sub-high and high levels; (4) there was a higher changing frequency and range from east to west; (5) there was a significant negative correlation between land surface temperature and vegetation, and a significant positive correlation between temperature and human settlement.
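
The CA-Markov model combines a Markov-chain transition matrix (the temporal part) with cellular-automata rules for spatial allocation. A sketch of just the Markov-chain projection step, with invented land-cover shares and transition probabilities (not values from this study):

```python
# Land-cover classes: built-up, vegetation, water (hypothetical area shares at t0)
state = [0.30, 0.60, 0.10]
T = [                      # T[i][j]: probability that class i becomes class j per step
    [0.95, 0.04, 0.01],
    [0.10, 0.88, 0.02],
    [0.02, 0.03, 0.95],
]

def step(state, T):
    """One Markov-chain step: new share of class j sums contributions from every class i."""
    return [sum(state[i] * T[i][j] for i in range(len(state))) for j in range(len(T[0]))]

projection = state
for _ in range(3):         # project three steps forward, e.g. decadal intervals
    projection = step(projection, T)
```

The cellular-automata part would then allocate the projected shares to specific cells based on neighborhood suitability, which is where the spatial patterns come from.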

How to cite: Liu, P., Han, R., and Yang, L.: Land-Use/Land-Cover Changes and Their Influence on Urban Thermal Environment in Zhengzhou City During the Period of 1986 to 2026, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9338, https://doi.org/10.5194/egusphere-egu2020-9338, 2020.

D802 |
Everardo González Ávalos and Ewa Burwicz

Over the past decade deep learning has been used to solve a wide array of regression and classification tasks. Compared to classical machine learning approaches (k-Nearest Neighbours, Random Forests, …), deep learning algorithms excel at learning complex, non-linear internal representations, in part due to the highly over-parametrised nature of their underlying models; this advantage, however, often comes at the cost of interpretability. In this work we used a deep neural network to construct a global map of total organic carbon (TOC) concentration at the seafloor. By implementing Softmax distributions on implicitly continuous data (regression tasks) we were able to obtain probability distributions to assess prediction reliability. A variation of Dropout called Monte Carlo Dropout is also used during the inference step, providing a tool to model prediction uncertainties. We used these techniques to create a model information map, which is a key element in developing new data-driven sampling strategies for data acquisition.
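
Monte Carlo Dropout keeps dropout active at inference time and treats the spread of repeated stochastic forward passes as a proxy for predictive uncertainty. A toy sketch of the idea (the one-layer "network" and its weights are placeholders, not the authors' TOC model):

```python
import random
from statistics import mean, stdev

def predict_stochastic(x, weights, p_drop=0.5):
    """One stochastic forward pass: randomly drop hidden units, as in Monte Carlo Dropout."""
    hidden = [max(0.0, w * x) for w in weights]                    # toy one-layer ReLU "network"
    kept = [h / (1 - p_drop) if random.random() > p_drop else 0.0  # inverted-dropout scaling
            for h in hidden]
    return sum(kept) / len(kept)

def mc_dropout_predict(x, weights, passes=200):
    """Repeat stochastic passes; the sample spread approximates predictive uncertainty."""
    samples = [predict_stochastic(x, weights) for _ in range(passes)]
    return mean(samples), stdev(samples)

random.seed(42)
mu, sigma = mc_dropout_predict(x=1.0, weights=[0.2, -0.4, 0.7, 0.1])
```

High `sigma` regions are exactly where a model information map would recommend acquiring new samples.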

How to cite: González Ávalos, E. and Burwicz, E.: Deep neural networks for total organic carbon prediction and data-driven sampling, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22587, https://doi.org/10.5194/egusphere-egu2020-22587, 2020.

D803 |
Angela Schäfer, Norbert Anselm, Janik Eilers, Stephan Frickenhaus, Peter Gerchow, Frank Oliver Glöckner, Antonie Haas, Isabel Herrarte, Roland Koppe, Ana Macario, Christian Schäfer-Neth, Brenner Silva, and Philipp Fischer

Today's fast digital growth has made data the most essential tool for scientific progress in Earth System Science. Hence, we strive to assemble a modular research infrastructure comprising a collection of tools and services that allow researchers to turn big data into scientific outcomes.

Major roadblocks are (i) the increasing number and complexity of research platforms, devices, and sensors, (ii) the heterogeneous project-driven requirements regarding, e.g., satellite data, sensor monitoring, quality assessment and control, processing, analysis and visualization, and (iii) the demand for near-real-time analyses.

These requirements have led us to build O2A (Observation to Archive), a generic and cost-effective framework to enable, control, and access the flow of sensor observations to archives and repositories.

By establishing O2A within major cooperative projects like MOSES and Digital Earth in the research field Earth and Environment of the German Helmholtz Association, we extend research data management services, computing power, and skills to connect with the evolving software and storage services for data science. This fully supports the typical scientific workflow from beginning to end, that is, from data acquisition to final data publication.

The key modules of O2A's digital research infrastructure, established by AWI to enable Digital Earth Science, implement the FAIR principles:

  • Sensor Web, to register sensor applications and capture controlled metadata before and alongside any measurement in the field
  • Data ingest, allowing researchers to feed data into storage systems and processing pipelines in a prepared and documented way, ideally in controlled near-real-time (NRT) data streams
  • Dashboards, allowing researchers to find and access data and share and collaborate among partners
  • Workspace, enabling researchers to access and use data with research software in a cloud-based virtualized infrastructure, allowing massive amounts of data to be analysed on the spot
  • Archiving and publishing data via repositories and Digital Object Identifiers (DOI).

How to cite: Schäfer, A., Anselm, N., Eilers, J., Frickenhaus, S., Gerchow, P., Glöckner, F. O., Haas, A., Herrarte, I., Koppe, R., Macario, A., Schäfer-Neth, C., Silva, B., and Fischer, P.: Implementing FAIR in a Collaborative Data Management Framework, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19631, https://doi.org/10.5194/egusphere-egu2020-19631, 2020.

D804 |
David Schäfer, Bert Palm, Lennart Schmidt, Peter Lünenschloß, and Jan Bumberger

The number of sensors used in the environmental system sciences is increasing rapidly, and while this trend undoubtedly provides great potential to broaden the understanding of complex spatio-temporal processes, it comes with its own set of new challenges. The flow of data from source to sink, from sensors to databases, involves many, usually error-prone, intermediate steps. From data acquisition with its specific scientific and technical challenges, through data transfer from often remote locations, to the final data processing, every step carries great potential to introduce errors and disturbances into the actual environmental signal.

Quantifying these errors becomes a crucial part of the later evaluation of all measured data. While many large environmental observatories are moving from manual to more automated ways of data processing and quality assurance, these systems are usually highly customized and hand-written. This approach is non-ideal in several ways: first, it is a waste of resources, as the same algorithms are implemented over and over again; second, it imposes great challenges to reproducibility. If the relevant programs are made available at all, they expose all the problems of software reuse: correctness of the implementation, readability and comprehensibility for future users, as well as transferability between different computing environments. Besides these general software development problems, another crucial factor comes into play: the end product, a processed and quality-controlled data set, is closely tied to the current version of the programs in use. Even small changes to the source code can lead to vastly differing results. If this is not approached responsibly, data and programs will inevitably fall out of sync.

The presented software, the 'System for automated Quality Control (SaQC)' (www.ufz.git.de/rdm-software/saqc), helps to solve, or at least greatly simplify, these challenges. As a mainly no-code platform with a large set of implemented functionality, SaQC lowers the entry barrier for the non-programming scientific practitioner without sacrificing fine-grained adaptation to project-specific needs. The text-based configuration allows easy integration into version control systems and thus opens the opportunity to use well-established software for data lineage. We will give a short overview of the program's unique features and showcase possibilities to build reliable and reproducible processing and quality assurance pipelines for real-world data from a spatially distributed, heterogeneous sensor network.
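
The idea of a declarative, version-controllable configuration driving a quality-control pipeline can be illustrated as below; note this is a generic sketch of the concept, not SaQC's actual configuration syntax or API:

```python
# Illustrative only: a minimal declarative QC pipeline in the spirit of SaQC.
CONFIG = [
    ("water_temp", "range", {"lo": -2.0, "hi": 35.0}),
    ("water_temp", "spike", {"max_jump": 5.0}),
]

def qc_range(values, lo, hi):
    return [not (lo <= v <= hi) for v in values]

def qc_spike(values, max_jump):
    return [False] + [abs(b - a) > max_jump for a, b in zip(values, values[1:])]

TESTS = {"range": qc_range, "spike": qc_spike}

def run_pipeline(data, config):
    """Apply each configured test in order; a value is flagged if any test rejects it."""
    flags = {var: [False] * len(vals) for var, vals in data.items()}
    for var, test, params in config:
        for i, bad in enumerate(TESTS[test](data[var], **params)):
            flags[var][i] = flags[var][i] or bad
    return flags

data = {"water_temp": [12.1, 12.3, 40.0, 12.4, 19.0]}
flags = run_pipeline(data, CONFIG)
```

Because the configuration is plain data, diffing it in version control records exactly which tests produced a given flagged data set, which is the data-lineage benefit described above.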

How to cite: Schäfer, D., Palm, B., Schmidt, L., Lünenschloß, P., and Bumberger, J.: From source to sink - Sustainable and reproducible data pipelines with SaQC, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-19648, https://doi.org/10.5194/egusphere-egu2020-19648, 2020.

D805 |
Daniel Eggert and Doris Dransch

Environmental scientists aim at understanding not only single components but whole systems. One example is the flood system: scientists investigate the conditions, drivers and effects of flood events and the relations between them. Investigating environmental systems with a data-driven research approach requires linking a variety of data, analytical methods, and derived results.

Several obstacles in the current scientific work environment hinder scientists from easily creating these links: distributed and heterogeneous data sets, separated analytical tools, discontinuous analytical workflows, and isolated views of data and data products. We address these obstacles, with the exception of distributed and heterogeneous data, since the latter is part of other ongoing initiatives.

Our goal is to develop a framework supporting the data-driven investigation of environmental systems. First, we integrate separated analytical tools and methods by means of a component-based software framework. Furthermore, we allow for seamless and continuous analytical workflows by applying the concept of digital workflows, which also demands the aforementioned integration of separated tools and methods. Finally, we provide integrated views of data and data products through interactive visual interfaces with multiple linked views. The combination of these three concepts from computer science allows us to create a digital research environment that enables scientists to create the initially mentioned links in a flexible way. We developed a generic concept for our approach, implemented a corresponding framework, and finally applied both to realize a “Flood Event Explorer” prototype supporting the comprehensive investigation of a flood system.

To implement a digital workflow, our approach starts by precisely defining the workflow’s requirements, mostly through informal interviews with the domain scientists. The defined requirements also include the needed analytical tools and methods, as well as the utilized data and data products. For technically integrating the needed tools and methods, our software framework provides a modularization approach based on a messaging system. This allows us to create custom modules or wrap existing implementations and tools. The messaging system (e.g. Apache Pulsar) then connects these individual modules, enabling us to combine multiple methods and tools into a seamless digital workflow. This approach, of course, demands properly defined interfaces to modules and data sources. Finally, our software framework provides multiple generic visual front-end components (e.g. tables, maps and charts) to create interactive linked views supporting the visual analysis of the workflow’s data.

How to cite: Eggert, D. and Dransch, D.: An integrative framework for data-driven investigation of environmental systems, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9251, https://doi.org/10.5194/egusphere-egu2020-9251, 2020.

D806 |
Hai-Po Chan and Kostas Konstantinou

Mayon Volcano on eastern Luzon Island is the most active volcano in the Philippines. Renowned as the "perfect cone" for its symmetric conical shape, it has recorded eruptions over 50 times in the past 500 years. Geographically, the volcano is surrounded by eight cities and municipalities with 1 million inhabitants. Currently, its activity is monitored daily by on-site observations such as seismometers installed on Mayon's slopes, plus electronic distance meters (EDMs), precise leveling benchmarks, and portable fly spectrometers. Compared to existing direct on-site measurements, satellite remote sensing is currently assuming an essential role in understanding the whole picture of volcanic processes. The vulnerability to volcanic hazards is high for Mayon, given that it is located in an area of high population density on Luzon Island. However, satellite remote sensing methods and datasets have not been integrated into Mayon’s hazard mapping and monitoring system, despite abundant open-access satellite data archives. Here, we perform multiscale and multitemporal monitoring based on the analysis of a nineteen-year Land Surface Temperature (LST) time series derived from satellite-retrieved thermal infrared imagery. Both Landsat thermal imagery (30-meter spatial resolution) and MODIS (Moderate Resolution Imaging Spectroradiometer) LST products (1-kilometer spatial resolution) are used for the analysis. Ensemble Empirical Mode Decomposition (EEMD) is applied to decompose oscillatory components of various timescales within the LST time series. The physical interpretation of the decomposed LST components at various periods is explored and compared with Mayon’s eruption records. Results show that annual-period components of LST tend to lose their regularity following an eruption, and amplitudes of short-period LST components are very responsive to eruption events.
The satellite remote sensing approach provides more insights at larger spatial and temporal scales on this renowned active volcano. This study not only presents the advantages and effectiveness of satellite remote sensing for volcanic monitoring but also provides valuable surface information for exploring the subsurface volcanic structures of Mayon.
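
A full EEMD implementation is beyond a short sketch, but the basic idea of isolating an annual-period component from the rest of a temperature series can be illustrated with a simple monthly-climatology split, a crude stand-in for EEMD, here applied to synthetic data:

```python
import math
from statistics import mean

def split_annual(series, period=12):
    """Separate a monthly series into a repeating annual cycle and a residual
    (a crude stand-in for the oscillatory components EEMD extracts adaptively)."""
    climatology = [mean(series[m::period]) for m in range(period)]
    annual = [climatology[i % period] for i in range(len(series))]
    residual = [v - a for v, a in zip(series, annual)]
    return annual, residual

# Synthetic monthly LST in degC: a 3-degC annual cycle plus a slow warming trend
lst = [25 + 3 * math.sin(2 * math.pi * m / 12) + 0.1 * m for m in range(48)]
annual, residual = split_annual(lst)
```

Unlike this fixed-period split, EEMD derives its components adaptively from the data, which is what allows it to detect a loss of regularity in the annual component after an eruption.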

How to cite: Chan, H.-P. and Konstantinou, K.: Surface Temperature Monitoring by Satellite Thermal Infrared Imagery at Mayon Volcano of Philippines, 1988-2019, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1412, https://doi.org/10.5194/egusphere-egu2020-1412, 2020.

D807 |
Erik Nixdorf, Marco Hannemann, Uta Ködel, Martin Schrön, and Thomas Kalbacher

Soil moisture is a critical hydrological component for determining hydrological state conditions and a crucial variable in controlling land-atmosphere interactions, including evapotranspiration, infiltration and groundwater recharge.

At the catchment scale, spatio-temporal variations in soil moisture distribution are high due to the influence of various factors such as soil heterogeneity, climate conditions, vegetation and geomorphology. Among the various existing soil moisture monitoring techniques, the application of vehicle-mounted Cosmic-Ray Neutron Sensors (CRNS) allows soil moisture to be monitored noninvasively by surveying larger regions within a reasonable time. However, measured data and their corresponding footprints are often confined to the existing road network, leaving inaccessible parts of a catchment unobserved, and surveying larger areas at short intervals is often hindered by limited manpower.

In this study, data from more than 200,000 CRNS rover readings measured over different regions of Germany within the last 4 years have been employed to characterize the trends of soil moisture distribution in the 209 km² Mueglitz River Basin in Eastern Germany. Subsets of the data have been used to train three different supervised machine learning algorithms (multiple linear regression, random forest and artificial neural network) based on 85 independent, relevant dynamic and stationary features derived from public databases. The Random Forest model outperforms the other models (R² ≈ 0.8), relying on day-of-year, altitude, air temperature, humidity, soil organic carbon content and soil temperature as the most influential predictors.

After training and testing the models, CRNS records for each day of the last decade were predicted on a 250 × 250 m grid of the Mueglitz River Basin using the same type of features. The derived CRNS record distributions are compared with both spatial soil moisture estimates from a hydrological model and point estimates from a sensor network operated during spring 2019. After variable standardization, preliminary results show that the applied Random Forest model is able to reproduce the spatio-temporal trends estimated by the hydrological model and the point measurements. These findings demonstrate that training machine learning models on domain-unspecific large datasets of CRNS records using spatio-temporally available predictors has the potential to fill measurement gaps and to improve soil moisture dynamics predictions at the catchment scale.
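
The variable standardization used before comparing CRNS counts with model and point estimates is typically a z-score transform; a sketch with invented values (note that neutron counts and soil moisture are anti-correlated, since wetter soil moderates more neutrons):

```python
from statistics import mean, stdev

def standardize(values):
    """Z-score standardization: puts CRNS counts, simulated soil moisture and
    point measurements on a common, unitless scale before comparing trends."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

crns_counts = [820, 760, 905, 700, 840]          # hypothetical neutron counts (dry soil = high)
model_moisture = [0.21, 0.26, 0.15, 0.31, 0.19]  # hypothetical simulated soil moisture

z_crns = standardize(crns_counts)
z_model = standardize(model_moisture)

# After standardization, agreement of trends can be quantified directly,
# e.g. via the Pearson correlation of the z-scores:
r = sum(a * b for a, b in zip(z_crns, z_model)) / (len(z_crns) - 1)
```

In this constructed example `r` is strongly negative, reflecting the inverse counts-to-moisture relationship.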

How to cite: Nixdorf, E., Hannemann, M., Ködel, U., Schrön, M., and Kalbacher, T.: Catchment scale prediction of soil moisture trends from Cosmic Ray Neutron Rover Surveys using machine learning, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-3049, https://doi.org/10.5194/egusphere-egu2020-3049, 2020.

D808 |
Christian Scharun, Roland Ruhnke, Jennifer Schröter, Michael Weimer, and Peter Braesicke

Methane (CH4) is the second most important greenhouse gas after CO2 affecting global warming. Various sources (e.g. fossil fuel production, agriculture and waste, biomass burning and natural wetlands) and sinks (the reaction with the OH radical, the main sink, contributes to tropospheric ozone production) determine the methane budget. Due to its long lifetime in the atmosphere, methane can be transported over long distances.
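
The balance between sources and the lifetime-driven sink can be illustrated with a one-box budget model, dC/dt = E − C/τ; all numbers below are round, hypothetical values, not EDGAR inventory figures:

```python
# One-box sketch of the methane budget: dC/dt = E - C / tau
emission = 550.0   # global source strength, Tg CH4 per year (hypothetical)
tau = 9.0          # atmospheric lifetime in years (OH reaction as the main sink)
dt = 0.1           # Euler time step in years

burden = 4000.0    # initial atmospheric burden, Tg CH4 (hypothetical)
for _ in range(int(200 / dt)):            # integrate long enough to approach equilibrium
    burden += dt * (emission - burden / tau)

steady_state = emission * tau             # analytic equilibrium burden: E * tau
```

The sketch shows why added sources (such as missing gas platforms) raise the equilibrium burden proportionally to both the emission increment and the lifetime.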

Disused and active offshore platforms can emit methane, in amounts that are difficult to quantify. In addition, explorations of the sea floor in the North Sea showed a release of methane near the boreholes of both oil- and gas-producing platforms. The basis of this study is the established emission database EDGAR (Emission Database for Global Atmospheric Research), an inventory that includes methane emission fluxes in the North Sea region. While methane emission fluxes in the EDGAR inventory and platform locations match for most of the oil platforms, almost all gas platform sources are missing from the database. We develop a method for estimating the missing sources based on the EDGAR emission inventory.

In this study, the global model ICON-ART (ICOsahedral Nonhydrostatic model - Aerosols and Reactive Trace gases) is used. ART is an online-coupled model extension for ICON that includes chemical gases and aerosols. One aim of the model is the simulation of interactions between trace substances and the state of the atmosphere by coupling the spatiotemporal evolution of tracers with atmospheric processes. ICON-ART sensitivity simulations are performed with inserted and adjusted sources to assess their influence on the methane and OH-radical distribution on regional (North Sea) and global scales.

How to cite: Scharun, C., Ruhnke, R., Schröter, J., Weimer, M., and Braesicke, P.: Modeling methane from the North Sea region with ICON-ART, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5028, https://doi.org/10.5194/egusphere-egu2020-5028, 2020.

D809 | Highlight
Wolfgang Kurtz, Stephan Hachinger, Anton Frank, Wolfram Mauser, Jens Weismüller, and Christine Werner

The ViWA (Virtual Water Values) project aims to provide a global-scale assessment of the current usage of water resources, of the efficiency of water use and of agricultural yields, as well as of the flow and trade of ‘virtual’ water across country boundaries. This is achieved by establishing a global management and monitoring system which combines high-resolution (1 km²) agro-hydrological model simulations with information from high-resolution remote-sensing data from Copernicus satellites. The monitoring system is used to judge progress in achieving water-related UN Sustainable Development Goals on the local and global scale. Specific goals of the project are, for example, to:

  • evaluate possible inefficiencies of the current water use in agriculture, industry and water management and its economic consequences.
  • assess the vulnerability of agriculture and ecosystems to climate variability with a special emphasis on water availability.
  • identify regional hot-spots of unsustainable water use and to analyze possible institutional obstacles for a sustainable and efficient water use.
  • identify trade-offs between the commercial water use and protection of ecosystem services.

A cornerstone for reaching these project goals is a set of high-resolution global ensemble simulations with an agro-hydrological model for a variety of crop types and management practices. These simulations provide the relevant information on agricultural yields and water demands at different scales. In this context, a considerable amount of data is generated, and subsets of these data might also be of direct relevance for different external interest groups.

In this presentation, we describe our approach for managing the simulation data, with a special focus on possible strategies for data provisioning to interested stakeholders, scientists, practitioners and the general public. We give an overview of the corresponding simulation and data storage workflows on the HPC systems used, and we discuss methods for providing the data to the different interest groups. Among other aspects, we address findability (in the sense of the FAIR principles) of simulation results for the scientific community in indexed search portals through proper metadata annotation. We also discuss a prototypical interactive web portal for visualizing, subsetting and downloading selected parts of the data set.
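
Metadata annotation for findability amounts to attaching a small, machine-readable record to each simulation output so that search portals can index it. A hypothetical minimal record (the field names are illustrative, not the project's actual schema):

```python
import json

# Hypothetical metadata record for one simulation output; the fields mirror
# common findability requirements (the F in FAIR), not ViWA's real schema.
record = {
    "title": "Agro-hydrological ensemble simulation, member 03",
    "variable": "crop_water_demand",
    "spatial_resolution_km": 1,
    "temporal_coverage": {"start": "2010-01-01", "end": "2019-12-31"},
    "license": "CC-BY-4.0",
    "keywords": ["water use", "agriculture", "SDG 6"],
}

serialized = json.dumps(record, indent=2, sort_keys=True)  # what a portal would harvest
restored = json.loads(serialized)
```

A harvester can then index `keywords` and `temporal_coverage` without ever opening the (much larger) simulation files themselves.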

How to cite: Kurtz, W., Hachinger, S., Frank, A., Mauser, W., Weismüller, J., and Werner, C.: Management and dissemination of global high-resolution agro-hydrological model simulation data from the Virtual Water Values project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-10239, https://doi.org/10.5194/egusphere-egu2020-10239, 2020.

D810 |
Valeriy Kovalskyy and Xiaoyuan Yang

Imagery products are critical for digital agriculture as they help deliver value and insights to growers. The use of publicly available satellite data feeds by digital agriculture companies helps keep imagery services affordable for a broader base of farmers. Optimal use of public and private imagery data sources plays a critical role in the success of image-based services for agriculture.

At The Climate Corporation we have established a program focused on intelligence about the satellite image coverage and frequency expected in different geographies and times of the year, which is becoming critical for the global expansion of the company. In this talk we report the results of our analysis of publicly available imagery data sources for key agricultural regions of the globe. We also demonstrate how these results can guide commercial imagery acquisition decisions in a case study in Brazil, where some growers run the risk of going through the growing season without receiving any imagery if relying on a single source of satellite imagery. The study clearly shows the validity of the approaches taken, as the results matched actual image deliveries to within single-digit percentages of coverage at the regional level. Our analysis also captured realistic temporal and spatial details of changes in image frequency from the addition of alternative satellite imagery sources to the production stream. The optimization of imagery acquisitions enables filling data gaps for research and development, while also delivering greater value to growers in Crop Health Monitoring and other image-based services.

How to cite: Kovalskyy, V. and Yang, X.: Assessment of Multiplatform Satellite Image Frequency for Crop Health Monitoring, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12328, https://doi.org/10.5194/egusphere-egu2020-12328, 2020.

D811 |
Daniela Henkel, Everardo González Ávalos, Mareike Kampmeier, Patrick Michaelis, and Jens Greinert

Marine munitions, or unexploded ordnance (UXO), were massively disposed of in coastal waters after World War II and are still being introduced into the marine environment during war activities and military exercises. UXO detection and removal have gained great interest during the ongoing efforts to install offshore wind parks for energy generation, as well as cable routing through coastal waters. Additionally, 70 years after the World War II munition dumping events, more and more chemical and conventional munition is rusting away, increasing the risk of toxic contamination.

The general detection methodology includes high-resolution multibeam mapping, hydroacoustic sub-bottom mapping, electromagnetic surveys with gradiometers, as well as visual inspections by divers or remotely operated vehicles (ROVs). Using autonomous underwater vehicles (AUVs) for inspections with multibeam, camera and EM systems is the next technological step in acquiring meaningful high-resolution data independently of a mother ship. However, it would be beneficial for the use of such technology to be able to better predict potential hot spots of munition targets and distinguish them from other objects such as rocks, small artificial constructions or metallic waste (wires, barrels, etc.).

The above-mentioned data sources could serve as predictor layers for machine learning with different, already existing and accessible algorithms. The structure of the data bears a high similarity to image data, an area where neural networks are the benchmark. As a first approach, we therefore trained convolutional neural networks in a supervised manner to detect seafloor areas contaminated with UXO. For this, we manually annotated known UXO locations as well as known non-UXO locations to generate a training dataset, which was later augmented by rotating and flipping each annotated tile. We achieved high accuracy with this approach using only a subset of the data sources mentioned above as input layers. We also explored the use of further input layers and larger training datasets, and their impact on performance. This is a good example of machine learning enabling us to classify large areas in a short time and with minimal need for manual annotation.
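
The rotate-and-flip augmentation of annotated tiles mentioned above yields up to eight variants per tile (the rotations and reflections of a square). A minimal sketch on a toy 2×2 tile:

```python
def rotate90(tile):
    """Rotate a 2-D tile (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]

def flip(tile):
    """Mirror a tile horizontally."""
    return [row[::-1] for row in tile]

def augment(tile):
    """All 8 rotations/reflections of a tile, enlarging the training set 8-fold."""
    variants = []
    current = tile
    for _ in range(4):
        variants.append(current)
        variants.append(flip(current))
        current = rotate90(current)
    return variants

tile = [[0, 1],
        [2, 3]]
augmented = augment(tile)
```

Because seafloor targets have no preferred orientation, these label-preserving transforms add variety without any extra manual annotation.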

How to cite: Henkel, D., González Ávalos, E., Kampmeier, M., Michaelis, P., and Greinert, J.: Machine learning as supporting method for UXO mapping and detection, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22594, https://doi.org/10.5194/egusphere-egu2020-22594, 2020.

D812 |
Uta Koedel, Peter Dietrich, Erik Nixdorf, and Philipp Fischer

The term “SMART Monitoring” is often used in digital projects to describe surveying and analyzing data flows in near-real time or real time. The term is also adopted in the project Digital Earth (DE), which was jointly launched in 2018 by the eight Helmholtz centers of the research field Earth and Environment (E&E) within the framework of the German Ministry of Education and Research (BMBF). Within DE, the “SMART Monitoring” sub-project aims at developing workflows and processes that make scientific parameters and the related datasets SMART: specific, measurable, accepted, relevant, and trackable.

“SMART Monitoring” in DE comprises a combination of hardware and software tools to evolve the traditional sequential monitoring approach, in which data are analyzed and processed step by step from the sensor towards a repository, into an integrated analysis approach in which the measured value, the status of each sensor, and any relevant auxiliary sensor data in a sensor network are available and used in real time to enhance the sensor output in terms of data accuracy, precision, and availability. SMART Monitoring can thus be defined as a computer-enhanced monitoring network with automatic data-flow control from the individual sensors in a sensor network to databases, enhanced by automated (machine learning) and near-real-time interactive data analysis and exploration that uses the full potential of all available sensors within the network. In addition, “SMART Monitoring” aims to support a better adjustment of sensor settings and monitoring strategies in time and space through an iterative feedback loop.

This poster presentation will show general concepts, workflows, and possible visualization tools based on examples that support the SMART Monitoring idea.

How to cite: Koedel, U., Dietrich, P., Nixdorf, E., and Fischer, P.: Significance and implementation of SMART Monitoring Tools, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11084, https://doi.org/10.5194/egusphere-egu2020-11084, 2020.

D813 |
Katharina Höflich, Martin Claus, Willi Rath, Dorian Krause, Benedikt von St. Vieth, and Kay Thust

Demand on high-end high performance computer (HPC) systems by the Earth system science community today encompasses not only the handling of complex simulations but also machine and deep learning as well as interactive data analysis workloads on large volumes of data. This poster addresses the infrastructure needs of large-scale interactive data analysis workloads on supercomputers. It lays out how to enable optimizations of existing infrastructure with respect to accessibility, usability and interactivity and aims at informing decision making about future systems. To enhance accessibility, options for distributed access, e.g. through JupyterHub, will be evaluated. To increase usability, the unification of working environments via the operation and the joint maintenance of containers will be explored. Containers serve as a portable base software setting for data analysis application stacks and allow for long-term usability of individual working environments and repeatability of scientific analysis. Aiming for interactive big-data analysis on HPC will also help the scientific community in utilizing increasingly heterogeneous supercomputers, since the modular data-analysis stack already contains solutions for seamless use of various architectures such as accelerators. However, to enable day-to-day interactive work on supercomputers, the inter-operation of workloads with quick turn-around times and highly variable resource demands needs to be understood and evaluated. To this end, scheduling policies on selected HPC systems are reviewed with respect to existing technical solutions such as job preemption, utilizing the resiliency features of parallel computing toolkits like Dask. Presented are preliminary results focussing on the aspects of usability and interactive use of HPC systems on the basis of typical use cases from the ocean science community.
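A containerized, distributed-access setup of the kind outlined above might be submitted to the scheduler as an ordinary batch job. The following configuration sketch assumes a Slurm scheduler and an Apptainer/Singularity image; the partition name, image path, and port are hypothetical placeholders, not artifacts of the systems discussed here.

```shell
#!/bin/bash
#SBATCH --partition=interactive    # hypothetical partition for short, interactive jobs
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Launch JupyterLab from a jointly maintained container image so that every
# user gets the same reproducible data-analysis stack; the image path below
# is an illustrative placeholder.
singularity exec /sw/containers/data-analysis.sif \
    jupyter lab --no-browser --ip=0.0.0.0 --port=8888
```

Whether such jobs obtain quick turn-around times then depends on the site's scheduling policy, e.g. a dedicated interactive partition or job preemption as mentioned above.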

How to cite: Höflich, K., Claus, M., Rath, W., Krause, D., von St. Vieth, B., and Thust, K.: Towards easily accessible interactive big-data analysis on supercomputers, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-22618, https://doi.org/10.5194/egusphere-egu2020-22618, 2020.

D814 |
| Highlight
Uta Koedel and Peter Dietrich

The FAIR principle is on its way to becoming a conventional standard for all kinds of data. However, it is often forgotten that this principle does not consider data quality or data reliability issues. If the data quality is not sufficiently described, misinterpretation and misuse of these data in a joint interpretation can lead to false scientific conclusions. Hence, a statement about data reliability is an essential component of secondary data processing and joint interpretation efforts. Information on data reliability, uncertainty, and quality, as well as information on the devices used, is essential and needs to be introduced, or even implemented, in the workflow from the sensor to a database if the data are to be considered in a broader context.

In the past, many publications have shown that identical devices at the same location do not necessarily provide the same measurement data. Likewise, statistical quantities and confidence intervals are rarely given in publications, making it difficult to assess the reliability of the data. Many secondary users of measurement data assume that calibration data and the measurement of other auxiliary variables are sufficient to estimate data reliability. However, even if some devices require on-site field calibration, that does not mean that the data are comparable. Heat, cold, and internal processes in electronic components can lead to differences in measurement data recorded with devices of the same type at the same location, especially as the devices themselves become increasingly complex.
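As a minimal illustration of the statistical quantities that are rarely reported, the sketch below computes the mean and an approximate 95 % confidence interval (mean ± 1.96 standard errors) for repeated readings from two devices of the same type at the same location. The readings are invented example values, not data from the study.

```python
# Mean and approximate 95% confidence interval for repeated readings,
# used here to compare two devices of the same type at the same location.
from statistics import mean, stdev
from math import sqrt

def ci95(readings):
    """Return (mean, (lower, upper)) with a normal-approximation 95% CI."""
    m = mean(readings)
    se = stdev(readings) / sqrt(len(readings))  # standard error of the mean
    return m, (m - 1.96 * se, m + 1.96 * se)

device_a = [20.1, 20.3, 20.2, 20.2, 20.4]  # invented example values
device_b = [20.6, 20.8, 20.7, 20.9, 20.7]

for name, data in (("device A", device_a), ("device B", device_b)):
    m, (lo, hi) = ci95(data)
    print(f"{name}: mean {m:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# Non-overlapping intervals indicate that the two "identical" devices do not
# deliver comparable data without a comparative assessment.
```

Reporting such intervals alongside the data would let secondary users judge whether datasets from nominally identical devices can be jointly interpreted.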

Data reliability can be increased by addressing data uncertainty within the FAIR principle. The poster presentation will show the importance of comparative measurements, the information needed to apply proxy-transfer functions, and suitable uncertainty analyses for databases.

How to cite: Koedel, U. and Dietrich, P.: Going beyond FAIR to increase data reliability, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11117, https://doi.org/10.5194/egusphere-egu2020-11117, 2020.