GI2.3 | Data and Information Services for Interdisciplinary Research and Applications in Earth Science
EDI
Co-organized by EMRP2/ESSI3/SM2
Convener: Sebastien Payan | Co-conveners: Hela Mehrtens, Wolfgang zu Castell, Frederic Huynh
Orals | Fri, 28 Apr, 08:30–10:15 (CEST) | Room -2.91
Posters on site | Attendance Fri, 28 Apr, 14:00–15:45 (CEST) | Hall X4
Posters virtual | Attendance Fri, 28 Apr, 14:00–15:45 (CEST) | vHall ESSI/GI/NP
Research in Earth and environmental sciences benefits from interdisciplinary approaches (e.g. to understand and model multi-scale processes). The study of complex environmental processes may involve diverse collections of samples and associated field or laboratory measurements, sensor data and remote sensing data, often spanning several countries. Research benefits from practices that use easily portable and reproducible tools and techniques. Best practices for sharing data and software are now well established, and the Earth science community needs to move forward with generally accepted methodologies for software and data distribution that can easily expand to address complex-system and multi-domain challenges.

This session seeks innovative presentations on interdisciplinary research and applications, including, but not limited to, Earth Science data and service activities. Presentations addressing specific societal needs, best practices, lessons learned and new challenges in data provenance, information access, visualization and analysis are highly encouraged, as are presentations on ways to adopt the FAIR data principles for sustainable solutions in Earth Science and on the path to open science. Discussions of challenges for future data services or European infrastructure are also welcome.

Orals: Fri, 28 Apr | Room -2.91

Chairpersons: Sebastien Payan, Hela Mehrtens, Wolfgang zu Castell
08:30–08:35
08:35–08:45 | EGU23-16605 | GI2.3 | ECS | On-site presentation
Kaylin Bugbee, Ashish Acharya, Carson Davis, Emily Foshee, Rahul Ramachandran, Xiang Li, and Muthukumaran Ramasubramanian

NASA’s Science Plan includes a strategy to advance discovery by leveraging opportunities across scientific disciplines. In addition, NASA is committed to building an inclusive, open science community over the next decade and is championing the new Open-Source Science Initiative (OSSI) to foster that community. The OSSI supports many activities to promote open science, including the development of an empowering cyberinfrastructure to accelerate the time to actionable science. One component of the OSSI cyberinfrastructure is the Science Discovery Engine (SDE). The goal of the SDE is to enable the discovery of data, software and documentation across the five Science Mission Directorate (SMD) divisions: Astrophysics, Biological and Physical Sciences, Earth Science, Heliophysics and Planetary Science. The SDE increases access to NASA’s open science data and information and promotes interdisciplinary scientific discovery. In this presentation, we describe our work to develop the Science Discovery Engine using Sinequa, a cognitive search platform. We also share lessons learned about data governance, curation and information access.

How to cite: Bugbee, K., Acharya, A., Davis, C., Foshee, E., Ramachandran, R., Li, X., and Ramasubramanian, M.: NASA’s Science Discovery Engine: An Interdisciplinary, Open Science Data and Information Discovery Service, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16605, https://doi.org/10.5194/egusphere-egu23-16605, 2023.

08:45–08:55 | EGU23-1099 | GI2.3 | On-site presentation
Isabelle Braud, Charly Coussot, Véronique Chaffard, and Sylvie Galle

Understanding, modeling and predicting the future of the Earth System in response to global change is a challenge for the Earth system science community, but it is necessary to address pressing societal needs related to the UN Sustainable Development Goals and to risk monitoring and prediction. These “wicked” environmental problems require integrated modeling tools, which can only provide reliable answers if they integrate all existing multidisciplinary data sources. Open science and data sharing based on the FAIR (Findable, Accessible, Interoperable, Reusable) principles provide the framework for such sharing. In practice, however, we face a highly fragmented landscape, with different communities having developed their own data management systems, standards and tools.

When we started work on the Theia/OZCAR Information System (IS), which aims to facilitate the discovery of in situ data on continental surfaces collected by French research organizations and their foreign partners and to make those data FAIR, we performed a “Tour de France” to understand what critical zone scientists need when searching for data. The common criterion that emerged was variable names. We believe that this need is shared by all disciplines involved in Earth System sciences, and that it is all the more important when data are searched for by scientists from other disciplines who are unfamiliar with the vocabularies of other communities. This abstract aims to share our experience in building tools for harmonizing and sharing variable names according to the FAIR principles.

In the Theia/OZCAR critical zone research community, the long-term observatories that produce the data have heterogeneous data description practices and variable names. Different names may be used for the same variable (e.g. "soil moisture", "soil water content", "humidité des sols"). Moreover, similarities between these variable names cannot be inferred automatically or semi-automatically. To identify these similarities and implement data discovery functionalities on these dimensions in the IS, we built the Theia/OZCAR variable thesaurus. To enable technical interoperability, the thesaurus is published on the web using the SKOS vocabulary description standard. Other thesauri used in environmental sciences in Europe and worldwide have been identified, and defining associative relationships with these vocabularies ensures the semantic interoperability of the Theia/OZCAR thesaurus. However, the variable names used as search dimensions often remain general (e.g. "soil moisture") and are not specific enough for end users to interpret exactly what has been measured (e.g. "soil moisture at 10 cm depth measured by TDR probe"). Therefore, to improve data reuse and interoperability, the thesaurus now follows a recommendation of the Research Data Alliance and implements the I-ADOPT framework to describe variables more precisely. Each variable is described through relationships with atomic concepts whose definitions are specified. Using these atomic concepts enhances interoperability with other catalogues or services and supports reuse of the data by communities other than those who collected them.
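To make the thesaurus idea concrete, the sketch below shows two things in plain Python. The first is a SKOS-like concept whose preferred and alternative labels (the equivalents of skos:prefLabel and skos:altLabel) all resolve to one identifier. The second is an I-ADOPT-style decomposition of a variable into atomic components (Property, ObjectOfInterest, Matrix, Constraint). This is a minimal illustration under our own assumptions: the classes, the example.org URI and the chosen decomposition are hypothetical and are not taken from the published Theia/OZCAR thesaurus, which is served as SKOS rather than as Python objects. The measurement method (the TDR probe) is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """SKOS-like concept: one preferred label and several alternative labels."""
    uri: str                                                # hypothetical URI, not a real thesaurus identifier
    pref_label: str                                         # plays the role of skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)     # plays the role of skos:altLabel
    related_match: List[str] = field(default_factory=list)  # e.g. skos:relatedMatch links to other vocabularies


@dataclass
class IAdoptVariable:
    """Variable decomposed into I-ADOPT-style atomic components (hypothetical encoding)."""
    prop: str                     # Property, e.g. "volume fraction"
    object_of_interest: str       # ObjectOfInterest, e.g. "water"
    matrix: Optional[str] = None  # Matrix, e.g. "soil"
    constraints: List[str] = field(default_factory=list)  # Constraint, e.g. "depth 10 cm"

    def describe(self) -> str:
        """Assemble a human-readable label from the atomic components."""
        text = f"{self.prop} of {self.object_of_interest}"
        if self.matrix:
            text += f" in {self.matrix}"
        if self.constraints:
            text += " (" + ", ".join(self.constraints) + ")"
        return text


def build_label_index(concepts: List[Concept]) -> Dict[str, Concept]:
    """Map every preferred and alternative label (case-folded) to its concept."""
    index: Dict[str, Concept] = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.casefold()] = concept
    return index


soil_moisture = Concept(
    uri="https://example.org/thesaurus/soil-moisture",
    pref_label="soil moisture",
    alt_labels=["soil water content", "humidité des sols"],
)
index = build_label_index([soil_moisture])
variable = IAdoptVariable("volume fraction", "water", "soil", ["depth 10 cm"])

print(index["Soil Water Content".casefold()].pref_label)  # -> soil moisture
print(variable.describe())  # -> volume fraction of water in soil (depth 10 cm)
```

A search on any of the three names returns the same concept. The decomposed form carries the detail that a bare label like "soil moisture" cannot.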

How to cite: Braud, I., Coussot, C., Chaffard, V., and Galle, S.: Theia/OZCAR Thesaurus: a terminology service to facilitate the discovery, interoperability and reuse of data from continental surfaces and critical zone science in interdisciplinary research, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1099, https://doi.org/10.5194/egusphere-egu23-1099, 2023.

08:55–09:05 | EGU23-15863 | GI2.3 | On-site presentation
Benedikt Gräler, Katharina Demmich, Johannes Schnell, Merel Vogel, Stefano Bagli, and Paolo Mazzoli

Climate Services (CS) are crucial in empowering citizens, stakeholders and decision-makers to define resilient pathways for adapting to climate change and extreme weather events. Despite advances in scientific data and knowledge (e.g. Copernicus, GEOSS), current CS fail to deliver their full value to end users. Challenges include incorporating social and behavioral factors, local needs and knowledge, and the customs of end users. In I-CISK, we put forward a co-design-based requirement analysis to develop a Spatial Data Infrastructure and Platform that empowers a next generation of end-user CS, following a socially and behaviorally informed approach to co-producing services that meet the climate information needs of the Living Labs of the European I-CISK project. Core to the project are climate extremes such as droughts, floods and heatwaves. The use cases cover agriculture, forestry, tourism, energy, health and the humanitarian sector. We will present a summary of the stakeholders' requirements for the new climate service platform and their technical implications for the open-source spatial infrastructure. The design also addresses assessing, managing and presenting the uncertainties that are inherent to climate models.

How to cite: Gräler, B., Demmich, K., Schnell, J., Vogel, M., Bagli, S., and Mazzoli, P.: Building an Open Source Infrastructure for Next Generation End User Climate Services, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15863, https://doi.org/10.5194/egusphere-egu23-15863, 2023.

09:05–09:15 | EGU23-12144 | GI2.3 | On-site presentation
Rorie Edmunds

Material samples are a vital output of the scientific endeavour. They underpin research in the Earth, Space, and Environmental Sciences, and are a necessary component of ensuring the transparency and reproducibility of such research. While there has been a lot of discussion in recent years about the openness and FAIRness of data, code, methods, and so on, material samples have been much less under the spotlight.

The lack of focus on material samples is partly because they are unique as a research output: they are inherently physical, so they are mostly transported and managed by people rather than machines. It is much more straightforward to archive and share both information about an output and the output itself when that output is already a digital object. However, it is precisely for this reason that material samples must be made more FAIR and treated as first-class citizens of Open Science. To do this, one needs to connect the physical and digital worlds. IGSN IDs enable these connections to be made.

The IGSN ID is a globally unique and persistent identifier (PID) specifically for labelling material samples themselves (i.e., they are for neither images nor data about a sample). Functionally a Digital Object Identifier (DOI) registered under DataCite services, the IGSN ID can be applied to all types of material samples coming from any discipline. Not only can IGSN IDs be used to identify individual material samples that currently exist in a repository, museum, or otherwise, but they can also be registered

  • At the aggregate level for sample collections.
  • For the sites from which the samples are taken.
  • For ephemeral samples.

Importantly, in all cases, registering an IGSN ID requires supplying metadata in the DataCite Metadata Schema and creating landing pages that provide additional, disciplinary, user-focussed information about the collection, site or (sub)sample. Registering a PID for a physical object thus gives it a permanently resolvable URI pointing to a findable and accessible digital footprint, and the provision of rich metadata enables its interoperability and reusability. Associated data can also be shared within the metadata, and one may even include information on the potential for relocating a sample itself for reuse.
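As an illustration of the metadata side, the following sketch assembles a minimal JSON payload of the kind used by the DataCite REST API for a physical sample. It sets resourceTypeGeneral to PhysicalObject and adds an IsPartOf relation to a parent collection. It is a hypothetical example. The DOI prefix, titles and URLs are placeholders, and the field set is deliberately minimal. Nothing is sent to DataCite: actual registration goes through a repository or allocating-agent account, and the field names should be checked against the current DataCite documentation.

```python
import json
from typing import Optional


def igsn_metadata(doi: str, title: str, creator: str, publisher: str,
                  year: int, landing_page: str,
                  parent_doi: Optional[str] = None) -> dict:
    """Build a minimal DataCite-style JSON payload describing one material sample.

    Field names follow the JSON:API layout of the DataCite REST API as we
    understand it; this is an illustrative sketch, not a registration client.
    """
    attributes = {
        "doi": doi,                     # the IGSN ID is functionally a DOI
        "titles": [{"title": title}],
        "creators": [{"name": creator}],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "PhysicalObject", "resourceType": "Sample"},
        "url": landing_page,            # landing page with disciplinary detail
    }
    if parent_doi is not None:
        # Link a subsample or collection member to its parent record.
        attributes["relatedIdentifiers"] = [{
            "relatedIdentifier": parent_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsPartOf",
        }]
    return {"data": {"type": "dois", "attributes": attributes}}


payload = igsn_metadata(
    doi="10.12345/EXAMPLE-CORE-001",    # hypothetical prefix and suffix
    title="Sediment core section 1 (example)",
    creator="Example, Researcher",
    publisher="Example Sample Repository",
    year=2023,
    landing_page="https://example.org/samples/core-001",
    parent_doi="10.12345/EXAMPLE-CORE",
)
print(json.dumps(payload, indent=2))
```

The same function covers a collection-level record: call it without `parent_doi`, then reference that record's DOI from each member.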

This presentation will briefly introduce the IGSN ID and the partnership between DataCite and the IGSN e.V. to transfer the IGSN PID infrastructure under DataCite DOI services. It will mainly highlight practical use cases of IGSN IDs, including what is needed to include them in the sample workflow. It will also talk about efforts to better support IGSN IDs and sample metadata within the DataCite Metadata Schema.

How to cite: Edmunds, R.: FAIR & Open Material Samples: The IGSN ID, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12144, https://doi.org/10.5194/egusphere-egu23-12144, 2023.

09:15–09:25 | EGU23-12173 | GI2.3 | On-site presentation
Andrew Valentine, Jiawen He, Juerg Hauser, and Malcolm Sambridge

Many Earth systems cannot be observed directly, or in isolation. Instead, we must infer their properties and characteristics from their signature in one or more datasets, using a variety of techniques (including those based on optimization, statistical methods, or machine learning). Development of these techniques is an area of focus for many geoscience researchers, and methodological advances can be instrumental in enhancing our understanding of the Earth.         

In our experience, progress is substantially hindered by the absence of infrastructure facilitating communication between sub-disciplines. Researchers tend to focus on one area of the Earth sciences, such as seismology, hydrology or oceanography, with only slow percolation of ideas and innovations from one area to another. Indeed, silos often exist even within these subfields. Testing new ideas on new problems is challenging because it requires the acquisition of domain knowledge, an often difficult and time-consuming endeavour with uncertain returns. Key questions that arise include: What is a relevant field data set, and how has it been processed? Which simulation package is most appropriate to predict the data? What would a 'good' model look like and what should it be able to resolve? What is the current best practice?

To address this, we introduce the ESPRESSO project, a collection of Earth Science Problems for the Evaluation of Strategies, Solvers and Optimisers. It aims to provide access to a suite of ‘test problems’ spanning a wide range of inference and inversion scenarios. Each test problem defines appropriate dataset(s) and simulation routines, accessible within a standardised Python interface. This will allow researchers to rapidly test new techniques across a spectrum of problems, share domain-specific inference problems and ultimately identify areas where there may be potential for fruitful collaboration and development. ESPRESSO is envisaged as an open, community-sourced project, and we invite contributions from across the geosciences.
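The idea of a standardised problem interface can be sketched in a few lines of Python. The `InferenceProblem` class, its `data`/`forward`/`good_model` methods and the straight-line toy problem below are hypothetical and are not the actual ESPRESSO API. The point is that a generic solver, here a Gauss-Newton iteration with a finite-difference Jacobian, touches a problem only through that interface. The same solver could therefore be evaluated against any problem that implements it.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class InferenceProblem(ABC):
    """Hypothetical common interface for a test problem (not ESPRESSO's real API)."""

    @abstractmethod
    def data(self) -> List[float]:
        """Observed (here: synthetic) data vector."""

    @abstractmethod
    def forward(self, model: Sequence[float]) -> List[float]:
        """Predict data for a given model vector."""

    @abstractmethod
    def good_model(self) -> List[float]:
        """A reference model that explains the data well."""


class StraightLine(InferenceProblem):
    """Toy problem: recover intercept and slope from noise-free samples."""

    def __init__(self) -> None:
        self.x = [0.0, 1.0, 2.0, 3.0, 4.0]
        self.true_model = [1.0, 2.0]

    def forward(self, model):
        intercept, slope = model
        return [intercept + slope * xi for xi in self.x]

    def data(self):
        return self.forward(self.true_model)

    def good_model(self):
        return list(self.true_model)


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gauss_newton(problem, start, iterations=5, h=1e-6):
    """Generic solver that only uses problem.data() and problem.forward()."""
    model = list(start)
    d = problem.data()
    n = len(model)
    for _ in range(iterations):
        f = problem.forward(model)
        # Finite-difference Jacobian stored column-wise: jac[j][i] = d f_i / d m_j.
        jac = []
        for j in range(n):
            step = list(model)
            step[j] += h
            jac.append([(fs - fi) / h for fs, fi in zip(problem.forward(step), f)])
        # Normal equations (J^T J) dm = J^T (d - f).
        normal = [[sum(p * q for p, q in zip(jac[j], jac[k])) for k in range(n)] for j in range(n)]
        rhs = [sum(p * (di - fi) for p, di, fi in zip(jac[j], d, f)) for j in range(n)]
        model = [mj + dm for mj, dm in zip(model, solve(normal, rhs))]
    return model


problem = StraightLine()
estimate = gauss_newton(problem, start=[0.0, 0.0])
misfit = max(abs(e - g) for e, g in zip(estimate, problem.good_model()))
print(estimate, misfit < 1e-6)
```

Swapping in a different solver, or a different problem class, requires no change on the other side. That separation is what makes a shared problem suite useful for comparing methods.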

How to cite: Valentine, A., He, J., Hauser, J., and Sambridge, M.: ESPRESSO: Earth Science Problems for the Evaluation of Strategies, Solvers and Optimizers, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12173, https://doi.org/10.5194/egusphere-egu23-12173, 2023.

09:25–09:35 | EGU23-13420 | GI2.3 | ECS | On-site presentation
Alessandro Morichetta, Anne-Marie Lézine, Aline Govin, and Vincent Douet

Studying how the Earth’s climate changed in the past requires a joint interdisciplinary effort of scientists from different scientific domains. Paleoclimatic records are increasingly obtained on multiple archives (e.g. marine and terrestrial sediments, ice cores, speleothems, corals) and they document past changes in various climatic variables of the different components of the climatic system (e.g. ocean, atmosphere, vegetation, ice). 

Most paleoclimatic records still rely on independent observations with no standard format for describing their data or metadata, which has led to a steady proliferation of variable names and taxonomies. Therefore, despite the achievements of recent decades (e.g. the NOAA, NEOTOMA and PANGAEA databases), the lack of a common language strongly limits the systematic reuse of paleoclimate data, for example for constructing paleoclimatic data syntheses or evaluating climate model simulations.

The international project “Abrupt Change in Climate and Ecosystems: Data and e-infrastructure” (ACCEDE, funded by the Belmont Forum) aims to create an ecosystem for paleoclimatic data in order to investigate the tipping points of past climatic changes. In this context, the recently formalized Linked PaleoData (LiPD) format is the core of the standardization of paleoclimate data and metadata, and it acts as the communication protocol between the different databases that make up the e-infrastructure.

Here we present two web-based solutions that are part of this effort and take advantage of the LiPD ecosystem. The African Pollen Database and the IPSL Paleoclimate Database, both hosted and developed by the Institut Pierre-Simon Laplace, France, aim (1) to provide open access, in line with the FAIR principles, to a variety of paleoclimate datasets (from fossil pollen to various tracers measured on marine sediments, ice cores or tree rings), and (2) to combine and compare, using visualization tools, carefully selected and well-dated paleoclimatic records from different disciplines to address specific research questions.

Both databases result from recovering data from pre-existing, obsolete archives, followed by data (and metadata) consolidation, enrichment and formatting to comply with the LiPD specification and to ensure interoperability between them and existing databases. We designed harmonised web interfaces and REST APIs to explore and export existing datasets with the help of filtering tools. Datasets are published with a DOI under an open license, allowing free access to the complete information. A LiPD upload form is embedded in the websites to encourage both users and data stewards to propose, edit and add new records, and to encourage the community to adopt the LiPD format. We are currently finalizing visualization tools to evaluate aggregated data for research and education purposes.
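In a LiPD record, measurements are nested as paleoData, then measurementTable, then columns, and each column carries a variableName and units. The sketch below filters in-memory records of that shape by archive type and variable name, the kind of query a filtering tool or REST endpoint might answer. It rests on several assumptions. The records and values are made up. Real LiPD files are zipped bundles in which column values are stored in separate CSV files, so they would first need to be loaded with LiPD tooling, which is assumed here to have attached the values to each column.

```python
from typing import Any, Dict, List, Tuple

# Two made-up, heavily trimmed records in a LiPD-like layout (illustrative only).
datasets: List[Dict[str, Any]] = [
    {
        "dataSetName": "ExampleLake.Core1",
        "archiveType": "LakeSediment",
        "paleoData": [{"measurementTable": [{"columns": [
            {"variableName": "age", "units": "yr BP", "values": [100, 500, 900]},
            {"variableName": "pollen_total", "units": "count", "values": [320, 280, 410]},
        ]}]}],
    },
    {
        "dataSetName": "ExampleCave.Stal1",
        "archiveType": "Speleothem",
        "paleoData": [{"measurementTable": [{"columns": [
            {"variableName": "d18O", "units": "permil", "values": [-5.1, -5.4, -4.9]},
        ]}]}],
    },
]


def find_columns(records: List[Dict[str, Any]], archive_type: str,
                 variable_name: str) -> List[Tuple[str, Any, list]]:
    """Return (dataset name, units, values) for every column matching the filters."""
    hits: List[Tuple[str, Any, list]] = []
    for record in records:
        if record.get("archiveType") != archive_type:
            continue
        # Walk the nesting: paleoData -> measurementTable -> columns.
        for paleo in record.get("paleoData", []):
            for table in paleo.get("measurementTable", []):
                for column in table.get("columns", []):
                    if column.get("variableName") == variable_name:
                        hits.append((record["dataSetName"], column.get("units"),
                                     column.get("values", [])))
    return hits


print(find_columns(datasets, "Speleothem", "d18O"))
# -> [('ExampleCave.Stal1', 'permil', [-5.1, -5.4, -4.9])]
```

Because every record follows the same nesting, one traversal serves all archive types. A shared format buys exactly this kind of uniformity.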

With this effort, we are developing a framework in which heterogeneous paleoclimatic records are fully interoperable, allowing scientists across the community to take advantage of all the available data and to reuse them for very different research applications.

How to cite: Morichetta, A., Lézine, A.-M., Govin, A., and Douet, V.: Development of interoperable web applications for paleoclimate research, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13420, https://doi.org/10.5194/egusphere-egu23-13420, 2023.

09:35–09:45 | EGU23-5626 | GI2.3 | On-site presentation
Chen Wang, David Miller, Alessandro Gimona, Maria Nijnik, and Yang Jiang

A digital twin is a digital representation of a real-world physical product, system or process. Digital twins potentially offer a much richer capability to model and analyze real-world systems and to improve environmental sustainability.

In this work, an integrated 3D GIS and VR model for scenario modeling and interactive data visualisation has been developed and implemented using digital twin technology at the Glensaugh research farm. Spatial Multi-criteria Analysis has been applied to decide where to plant new woodlands, recognizing a range of land-use objectives while acknowledging concerns about possible conflicts with other uses of the land. The virtual content (e.g. forest spatial datasets, monitored climate data, analyzed carbon stocks and a natural capital asset index) has been embedded in the virtual landscape model, which helps raise public awareness of changes in rural areas.
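The abstract does not specify how the Spatial Multi-criteria Analysis was computed. One common approach is a weighted linear combination of normalised criteria with hard exclusion constraints, sketched below for four hypothetical grid cells. The criteria names, weights and exclusion mask are illustrative assumptions, not the Glensaugh inputs.

```python
from typing import Dict, List, Sequence, Tuple


def min_max(values: Sequence[float], benefit: bool = True) -> List[float]:
    """Rescale a criterion to [0, 1]; invert it when lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # the criterion does not discriminate between cells
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if benefit else [1.0 - s for s in scaled]


def weighted_suitability(criteria: Dict[str, Tuple[Sequence[float], bool]],
                         weights: Dict[str, float],
                         excluded: Sequence[bool]) -> Tuple[List[float], List[int]]:
    """Weighted linear combination of normalised criteria with hard exclusions.

    criteria: name -> (raw value per cell, True if higher raw values are better)
    weights:  name -> weight (assumed to sum to 1)
    excluded: True for cells where planting would conflict with another land use
    """
    n = len(excluded)
    scores = [0.0] * n
    for name, (values, benefit) in criteria.items():
        for i, s in enumerate(min_max(values, benefit)):
            scores[i] += weights[name] * s
    candidates = [i for i in range(n) if not excluded[i]]
    ranking = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return scores, ranking


# Hypothetical criteria for four grid cells.
criteria = {
    "soil_suitability": ([2.0, 4.0, 6.0, 8.0], True),
    "distance_to_woodland_m": ([100.0, 300.0, 200.0, 400.0], False),
}
weights = {"soil_suitability": 0.6, "distance_to_woodland_m": 0.4}
excluded = [False, False, True, False]  # e.g. cell 2 is reserved for another use
scores, ranking = weighted_suitability(criteria, weights, excluded)
print(ranking)  # -> [3, 0, 1]
```

Cell 2 has the second-highest raw score but is dropped from the ranking. Exclusions act as constraints rather than as one more weighted criterion.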

The digital twin prototype for Glensaugh Climate-Positive Farming was used at the 2021 STFC workshop, GISRUK 2022 and the 2022 Royal Highland Show, and it provides an innovative framework for integrating spatial data modelling, analytical capabilities and immersive visualization.

Audience feedback suggested that the virtual environment was very effective in providing a more realistic impression of the different land-use and woodland expansion scenarios and of the environmental characteristics. This suggests considerable added value in using digital twin technology to deal with the complexity of data analysis and scenario simulation and to enable rapid interpretation of solutions.

The findings show that this method could influence future woodland planning and enables rapid interpretation of forest and climate data, which increases the effectiveness of their use and their contribution to wider environmental sustainability.

How to cite: Wang, C., Miller, D., Gimona, A., Nijnik, M., and Jiang, Y.: An integration of digital twin technology, GIS and VR for the service of environmental sustainability, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5626, https://doi.org/10.5194/egusphere-egu23-5626, 2023.

09:45–09:55 | EGU23-7703 | GI2.3 | On-site presentation
Constanze Roedig and Kareem Sorathia

We present a method for publishing high-performance computing (HPC) code and results in a scalable, portable and ready-to-use interactive environment in order to enable sharing, collaboration, peer review and teaching. We show how we use cloud-native elements such as Kubernetes, containerization, automation and web shells to achieve this, and we demonstrate such an OpenScienceLab for the MAGE (Multiscale Atmosphere Geospace Environment) model, which is being developed by the recently selected NASA DRIVE Center for Geospace Storms.
We argue that a key factor in the successful design of such an environment is its (cyber)security, as these labs open non-trivial compute resources to a vast audience. The benefits and implied costs of different hosting options are discussed, comparing public cloud, hybrid, private cloud and even large desktops.
We encourage HPC centers to test our method using our fully open-source blueprints. We hope thereby to make it easier for research staff and scientists to follow the FAIR principles and support open-source goals without needing deep knowledge of cloud computing.

How to cite: Roedig, C. and Sorathia, K.: Cloud native OpenScienceLabs for HPC : Easing the road to FAIR collaboration and OpenSource, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7703, https://doi.org/10.5194/egusphere-egu23-7703, 2023.

09:55–10:05 | EGU23-8585 | GI2.3 | On-site presentation
Aaron Kaulfus, Alfreda Hall, Manil Maskey, Will McCarty, and Frederick Policelli

Established in 2017 as a pilot project, the NASA Commercial Smallsat Data Acquisition (CSDA) Program evaluates and acquires commercial datasets that complement NASA Earth Science research and application goals. The success of the pilot and recognition of the value that commercial data provide to the scientific community led to the establishment of a sustained program within NASA’s Earth Science Division (ESD). Its objectives are to provide a continuous on-ramp for new commercial vendors, so that their potential to advance NASA’s Earth science research and application activities can be evaluated; to enable sustained use of the purchased data by the scientific community; to ensure long-term preservation of purchased data for scientific reproducibility; and to coordinate with other U.S. Government agencies and international partners on the evaluation and use of commercial data. This presentation will focus on the data made available for scientific use through the CSDA Program, especially datasets added since the conclusion of the original pilot project. It will describe the process by which end users can access CSDA-managed datasets and give a status overview of ongoing and upcoming vendor evaluation activities. Recent scientific research results from CSDA subject matter experts using commercial data will also be presented.

How to cite: Kaulfus, A., Hall, A., Maskey, M., McCarty, W., and Policelli, F.: Programmatic Update for NASA’s Commercial Smallsat Data Acquisition (CSDA) Program, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-8585, https://doi.org/10.5194/egusphere-egu23-8585, 2023.

10:05–10:15 | EGU23-7015 | GI2.3 | ECS | On-site presentation
Mariam Er-rondi, Magali Troin, Sylvain Coly, Emmanuel Buisson, Laurent Serlet, and Nourddine Azzaoui

Agriculture is extremely vulnerable to climate change. Rising air temperatures and more frequent extreme climate events are the main negative impacts of climate change, affecting crop yields, safety and quality. One approach to assessing the impacts of climate change on agriculture is the use of agro-climatic indicators (AgcIs). AgcIs characterize plant-climate interactions and are practical and understandable for both farmers and decision makers.

Studies of climate and climate change impacts on crops require long records of reliable past and future data describing both spatial and temporal variability. The lack of observed historical data with an appropriate temporal resolution (i.e., 30 years of continuous daily data) and sufficient local precision (i.e., 1 km) is a major concern. To overcome this, reanalysis products (RPs) are often used as reference data for the observed climate in impact studies. However, RPs have limitations, as they contain biases and uncertainties. In addition, RPs are often evaluated on climate indicators, which raises questions about their suitability for agro-climatic indicators.

This work aims to evaluate the ability of five of the most widely used RPs to reproduce observed AgcIs for three specific crops (apple, corn and vine) over France. The five RPs selected for this study are SCOPE Climate, FYRE Climate, ERA5, ERA5-Land and the gridded dataset RFHR. They are compared with SYNOP meteorological data provided by Météo-France, which serve as the reference dataset for 1996–2021.

Our findings show higher agreement between the five RPs and SYNOP for temperature-based AgcIs than for precipitation-based AgcIs. The RPs tend to overestimate precipitation-based AgcIs. We also note that, for each RP, the discrepancies between its AgcIs and those from the reference SYNOP dataset do not depend on geographical location or crop. This study emphasizes the need to quantify uncertainty in the climate data used in studies of climate variability and climate change impacts on agriculture.
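The sketch below shows one temperature-based AgcI and one way to compare a reanalysis product with station data. It computes growing degree days from daily minimum and maximum temperatures, then a mean bias between two indicator series. The abstract does not list the specific AgcIs used. Growing degree days with a 10 °C base temperature and the (Tmin + Tmax) / 2 daily mean are assumptions chosen for illustration, and the three-day temperatures are made up.

```python
from typing import Sequence


def growing_degree_days(tmin: Sequence[float], tmax: Sequence[float],
                        t_base: float = 10.0) -> float:
    """Sum of daily mean temperature above a base temperature (degree-days, °C)."""
    return sum(max(0.0, (lo + hi) / 2.0 - t_base) for lo, hi in zip(tmin, tmax))


def mean_bias(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Average difference between an estimated and a reference indicator series."""
    return sum(e - r for e, r in zip(estimate, reference)) / len(reference)


# Made-up three-day example: a station record and the co-located reanalysis cell.
station = growing_degree_days([8.0, 10.0, 12.0], [16.0, 20.0, 22.0])
reanalysis = growing_degree_days([9.0, 11.0, 12.0], [17.0, 21.0, 24.0])
print(station, reanalysis, mean_bias([reanalysis], [station]))  # -> 14.0 17.0 3.0
```

A real evaluation would compute the indicator per season for each station or grid cell and for each year of 1996–2021, then apply `mean_bias` to the resulting yearly series.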

How to cite: Er-rondi, M., Troin, M., Coly, S., Buisson, E., Serlet, L., and Azzaoui, N.: Evaluation of five reanalysis products over France: implications for agro-climatic studies, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-7015, https://doi.org/10.5194/egusphere-egu23-7015, 2023.

Posters on site: Fri, 28 Apr, 14:00–15:45 | Hall X4

Chairpersons: Wolfgang zu Castell, Sebastien Payan, Hela Mehrtens
X4.176 | EGU23-1599 | GI2.3
Sabine Schmidt, Erwann Quimbert, Marine Vernet, Joël Sudre, Caroline Mercier, Dominique Obaton, Jean-François Piollé, Frédéric Merceur, Gérald Dibarboure, and Gilbert Maudire

Global change affects the ocean in multiple ways, including rising temperatures and sea levels, stronger storms, deoxygenation and impacts on ecosystems. However, detecting changes and impacts is still difficult because of the diversity and variability of marine environments. While the number of marine and coastal observations, whether in situ, laboratory or remote sensing measurements, has clearly increased, each dataset is both costly to acquire and unique. The number and variety of data acquisition techniques require efficient methods of improving data availability via interoperable portals, which facilitate data sharing according to the FAIR principles for producers and users. ODATIS (Ocean Data Information and Services; www.odatis-ocean.fr/en/), the ocean cluster of Data Terra, the French research infrastructure for Earth data, is the entry point for accessing all French ocean observation data. The first challenge for ODATIS is to get data producers to share data. To that end, ODATIS offers several services to help them define a Data Management Plan (DMP), implement the FAIR principles, make data more visible and accessible through referencing in the ODATIS catalog, and better track and cite them through a Digital Object Identifier (DOI). ODATIS also offers a service for publishing open scientific data on the sea through SEANOE (www.seanoe.org), which provides a DOI that can be cited in scientific articles in a reliable and sustainable way. Alongside the IT development of the ocean cluster, further communication and training are needed to inform the research community of these new tools. Through technical workshops, ODATIS offers data providers practical experience and support in implementing data access, visualization and processing services.
Finally, ODATIS relies on scientific consortia in order to promote and develop innovative processing methods and products for remote, airborne, or in situ observations of the ocean and its interfaces (atmosphere, coastline, seafloor) with the other clusters of the RI Data Terra.

How to cite: Schmidt, S., Quimbert, E., Vernet, M., Sudre, J., Mercier, C., Obaton, D., Piollé, J.-F., Merceur, F., Dibarboure, G., and Maudire, G.: Overview of the services provided to marine data producers by ODATIS, the French ocean data center, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1599, https://doi.org/10.5194/egusphere-egu23-1599, 2023.

X4.177 | EGU23-1294 | GI2.3
Sung Dae Kim, Hyuk Min Park, Young Shin Kwon, and Hyeon Gyeong Han

A data integration and processing system was established to provide long-term and real-time data to researchers interested in long-term variations of ocean data in the Northwest Pacific (NWP) area. All available ocean data for 6 variables (ocean temperature, salinity, dissolved oxygen, ocean CO2, nutrients) in the NWP area (0°N–65°N, 95°E–175°E) were collected from Korean domestic organizations (KIOST, NFIS, KHOA, KOEM), international data systems (WOD, GTSPP, SeaDataNet, etc.) and international observation networks (Argo, GO-SHIP, GLODAP, etc.). More than 5 million data records were collected, with observation dates from 1938 to 2022. After consulting several QC manuals and related papers, we determined and documented QC procedures and test criteria for the 6 data items. Several MATLAB programs implementing these QC procedures were developed and used to check the quality of all collected data. We removed duplicate data from the dataset and saved the remainder in 0.25° gridded data files. The 40-year long-term average and standard deviation were calculated at each standard depth and grid point. All quality-controlled data, QC flags, averages and standard deviations of each ocean variable are saved in netCDF format and provided to ocean climate researchers and numerical modelers. We also have two plans for the collected data for 2023–2025: producing a long-term gridded dataset focused on the NWP area, and developing a data service system that provides observation data and reanalysis data together.
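The deduplication and gridding steps described above can be sketched as follows. Observations that have passed QC are deduplicated on position, depth and time, binned into 0.25° cells at each standard depth, and summarised by mean, standard deviation and count. This is a minimal sketch, not the authors' MATLAB code. The duplicate key, the use of the population standard deviation and the example observations are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def grid_statistics(observations, resolution=0.25):
    """Drop duplicates, bin by grid cell and standard depth, return (mean, std, n).

    Each observation is a dict with lat, lon, depth (already a standard depth),
    time and value, and is assumed to have passed QC. Cells are identified by
    integer indices (floor(lat / resolution), floor(lon / resolution), depth).
    """
    seen = set()
    bins = defaultdict(list)
    for obs in observations:
        # The same cast can arrive from several sources; keep only the first copy.
        key = (round(obs["lat"], 4), round(obs["lon"], 4), obs["depth"], obs["time"])
        if key in seen:
            continue
        seen.add(key)
        cell = (math.floor(obs["lat"] / resolution),
                math.floor(obs["lon"] / resolution),
                obs["depth"])
        bins[cell].append(obs["value"])
    return {cell: (mean(v), pstdev(v), len(v)) for cell, v in bins.items()}


obs = [
    {"lat": 35.1, "lon": 129.1, "depth": 10, "time": "1990-08-01", "value": 15.0},
    {"lat": 35.1, "lon": 129.1, "depth": 10, "time": "1990-08-01", "value": 15.0},  # duplicate
    {"lat": 35.2, "lon": 129.2, "depth": 10, "time": "2010-08-01", "value": 17.0},
]
print(grid_statistics(obs))  # -> {(140, 516, 10): (16.0, 1.0, 2)}
```

The duplicate is counted once, so it does not pull the cell mean toward 15.0.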

Acknowledgement: This research was supported by the Korea Institute of Marine Science & Technology Promotion (KIMST), funded by the Ministry of Oceans and Fisheries (KIMST-20220033).

How to cite: Kim, S. D., Park, H. M., Kwon, Y. S., and Han, H. G.: A data integration system for ocean climate change research in the Northwest Pacific, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-1294, https://doi.org/10.5194/egusphere-egu23-1294, 2023.

X4.178 | EGU23-5866 | GI2.3 | ECS
Tamar Chichinadze, Zaza Gulashvili, Nana Bolashvili, Lile Malania, and Nikoloz Suknidze

Anthrax is a rare but serious disease caused by the gram-positive, rod-shaped bacterium Bacillus anthracis, a toxin-producing, encapsulated, facultatively anaerobic organism. Anthrax is found naturally in soil and mainly affects livestock and wildlife. It can cause serious illness in both humans and animals. Anthrax, an often fatal disease of animals, is spread to humans through contact with infected animals or their products. People become infected when anthrax spores get into the body.

The study aims to map the locations of anthrax foci and to identify geographical variables that are significantly associated with environmental risk factors for anthrax recurrence in Georgia (Caucasus), since specific diseases are linked to the geographical environment (soil, climate, etc.).

We analyzed 1664 human and 621 animal cases of anthrax and up to 1430 locations of anthrax foci (animal burial sites, slaughterhouses, BP roads, construction sites, etc.) observed in Georgia over more than 70 years, as documented in the literature and by the National Center for Disease Control. We analyzed more than 30 geographical variables, such as climate, topography, soil (type, chemical composition, acidity) and landscape, and created several digital thematic maps, as well as maps of the distribution and detection of anthrax foci. The identified variables will help to monitor the development of anthrax foci.
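The abstract does not state which statistical test was used to identify significant variables. One common option for a categorical variable, such as the soil type at each location, is a Pearson chi-square test of independence against the presence or absence of an anthrax focus. That test is sketched below. The choice of test, the function name `chi_square_independence` and the toy data are all assumptions for illustration.

```python
from collections import Counter


def chi_square_independence(categories, outcomes):
    """Pearson chi-square statistic for a contingency table built from two
    parallel lists, e.g. soil type per location and focus present (1) / absent (0).
    Returns (chi2, degrees_of_freedom)."""
    n = len(categories)
    observed = Counter(zip(categories, outcomes))
    row_totals = Counter(categories)
    col_totals = Counter(outcomes)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Made-up toy data: 8 locations, two soil types
soil = ["A", "A", "A", "A", "B", "B", "B", "B"]
focus = [1, 1, 1, 0, 0, 0, 0, 1]
chi2, dof = chi_square_independence(soil, focus)
print(chi2, dof)
```

For one degree of freedom, the 5% critical value of the chi-square distribution is about 3.84. A statistic of 2.0 on this toy data would therefore not indicate a significant association. Continuous variables such as acidity would need a different test or would have to be binned first.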

How to cite: Chichinadze, T., Gulashvili, Z., Bolashvili, N., Malania, L., and Suknidze, N.: Mapping and Analysis of Anthrax Cases in Humans and Animals, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-5866, https://doi.org/10.5194/egusphere-egu23-5866, 2023.

X4.179
|
EGU23-12381
|
GI2.3
|
ECS
|
Highlight
Daehwan Kim, Kwanchul Kim, Dasom Lee, Jae-Hoon Yang, Seong-min Kim, and Jeong-Min Park

This paper examines the spatial patterns of living environment elements (PM2.5, PM10, noise) across Seoul, and the urban planning characteristics that affect them, using big data from Seoul's S-Dot sensors, which have recently attracted considerable attention. In other words, it proposes a big-data-based research methodology and research direction for establishing the relationship between urban characteristics and the environmental factors that directly affect citizens. The temporal range is 2020 to 2022, the period for which S-Dot time series are available, and the spatial range covers all of Seoul on a 500 m × 500 m grid. First, to characterize specific living environment patterns, simple trends are identified through exploratory data analysis (EDA), and a cluster analysis is conducted based on these trends. Then, to derive the specific urban planning characteristics of each cluster, basic statistical analyses such as ANOVA and OLS regression, as well as multinomial logit (MNL) analysis, were conducted. As a result, cluster patterns of PM2.5, PM10 and noise and the urban planning characteristics that affect them are identified, revealing areas with relatively high or low long-term living environment values compared to other regions. The results are expected to serve as a reference for urban planning management measures in areas with a vulnerable living environment, and this exploratory study may provide directions for future research on environmental data in various fields.
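The cluster-analysis step could, for example, use k-means on per-cell summary features. The sketch below rests on several assumptions. The abstract does not name the clustering algorithm. The features (mean PM2.5, mean PM10 and mean noise per 500 m cell) and their values are made up. In practice, features with different units should be standardized before clustering.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.
    The first k points serve as initial centroids to keep the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # update step: move each centroid to the mean of its member points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-cell features: (mean PM2.5, mean PM10, mean noise in dB)
cells = [(15, 30, 50), (16, 32, 52), (40, 70, 65), (42, 68, 66)]
labels, centroids = kmeans(cells, k=2)
print(labels, centroids)
```

The resulting cluster labels could then serve as the grouping factor for ANOVA or as the dependent variable of an MNL model. This is one plausible way to connect the steps, not necessarily the authors' setup.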

How to cite: Kim, D., Kim, K., Lee, D., Yang, J.-H., Kim, S., and Park, J.-M.: An Exploratory Study on the Methodology for the Analysis of Urban Environmental Characteristics in Seoul City based on S-Dot Sensor Data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-12381, https://doi.org/10.5194/egusphere-egu23-12381, 2023.

X4.180
|
EGU23-13455
|
GI2.3
Gaetano Festa, Shane Murphy, Mariusz Majdanski, Iris Christadler, Fabrice Cotton, Angelo Strollo, Marc Urvois, Volker Röhling, Stefano Lorito, Andrey Babeyko, Daniele Bailo, Jan Michalek, Otto Lange, Javier Quinteros, Mateus Prestes, and Stefanie Weege

The Geo-INQUIRE (Geosphere INfrastructure for QUestions into Integrated REsearch) project, supported by the Horizon Europe Programme, is aimed at enhancing services to make data and high-level products accessible to the broad Geoscience scientific community. Geo-INQUIRE’s goal is to encourage curiosity-driven studies into understanding the geosphere dynamics at the interface between the solid Earth, the oceans and the atmosphere using long data streams, high-performance computing and cutting-edge facilities.

In the framework of Geo-INQUIRE, Transnational Access (TA, both virtual and on-site) will be provided at six test beds across Europe: the Bedretto Laboratory, Switzerland; the Ella-Link Geolab, Portugal; the Liguria-Nice-Monaco submarine infrastructure, Italy/France; the Irpinia Near-Fault Observatory, Italy; the Eastern Sicily facility, Italy; and the Corinth Rift Laboratory, Greece. These test beds are state-of-the-art research infrastructures covering the Earth’s surface, subsurface and marine environments over different spatial scales, from small-scale laboratory experiments to kilometric submarine fibre cables. The TA will revolve around key scientific questions on the fundamental processes associated with geohazards and georesources, such as the preparatory phases of earthquakes, the role of fluids within the Earth’s crust, fluid-solid interaction at the seabed, and the impact of geothermal exploitation. TA will also be offered for software and workflows belonging to EPOS-ERIC and to the ChEESE Centre of Excellence for Exascale in Solid Earth, to develop the projects of awarded users. These are grounded on simulations of seismic waves and rupture dynamics in complex media, tsunamis, and subaerial and submarine landslides. HPC-based Probabilistic Tsunami, Seismic and Volcanic Hazard workflows are offered to assess hazard at high resolution with extensive uncertainty exploration. Awardees will receive support and collaboration to facilitate their access to and use of HPC resources for tackling geoscience problems. Geo-INQUIRE will grant TA to researchers to develop their own laboratory or numerical experiments, with the aim of advancing scientific knowledge of Earth processes while fostering cross-disciplinary research across Europe. To be granted access, researchers submit a proposal to one of the yearly TA calls, which will be issued three times during the project’s lifetime.
Calls will be advertised at the Geo-INQUIRE web page https://www.geo-inquire.eu/ and through the existing community channels.

To encourage cross-disciplinary research, Geo-INQUIRE will also organize a series of training courses and workshops focused on data, data products and software delivered by research infrastructures that are useful for researchers. In addition, two summer schools dedicated to cross-disciplinary interactions between solid Earth and marine science will be organized.

The proposals, for both transnational access and training, will be evaluated by a panel that reviews the technical and scientific feasibility of each project, ensuring equal opportunities and diversity in terms of gender, geographical distribution and career stage. The first call is expected to be issued by the end of summer 2023. The data and products generated during the TAs will be made available to the scientific community in strict adherence to the FAIR principles.

How to cite: Festa, G., Murphy, S., Majdanski, M., Christadler, I., Cotton, F., Strollo, A., Urvois, M., Röhling, V., Lorito, S., Babeyko, A., Bailo, D., Michalek, J., Lange, O., Quinteros, J., Prestes, M., and Weege, S.: The Transnational access and training in the Geo-INQUIRE EU-project, an opportunity for researchers to develop leading-edge science at selected facilities and test-beds across Europe, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-13455, https://doi.org/10.5194/egusphere-egu23-13455, 2023.

X4.181
|
EGU23-14423
|
GI2.3
|
Mathilde Vergnolle and Jean-Luc Menut

EPOS-GNSS is the Thematic Core Service dedicated to GNSS data and products for the European Plate Observing System.
EPOS-GNSS provides a service to explore and download validated and quality-controlled data and metadata. This service is based on a network of 10 data nodes connected to a centralized portal, called the "EPOS-GNSS Data Gateway". The service aims to follow the FAIR principles and continues to evolve to meet them better. It currently provides more than 4 million daily files in the standardized RINEX format for 1670 European GNSS stations, together with their associated metadata.
In addition to its integration into the multidisciplinary EPOS data portal, the service offers direct access to the data and metadata for users who need more complex or more specific queries and filtering. A GUI (web client) and a specialized command-line client are provided to facilitate the exploration and download of the data and metadata.
The presentation introduces the EPOS-GNSS Data Gateway (https://gnssdata-epos.oca.eu), its clients, and its use.
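Daily RINEX 3 files usually follow the standard long filename convention, which encodes the station, start time, file period and sampling interval. This makes filenames a convenient key for client-side filtering. A minimal parser is sketched below, assuming RINEX 3 long names. It is not part of the EPOS-GNSS Data Gateway or its clients, and the function name `parse_rinex3_name` is hypothetical. Files distributed with RINEX 2 short names (`ssssdddf.yyt`) would need a different pattern.

```python
import re
from datetime import date, timedelta

# RINEX 3 long filename, e.g. ABMF00GLP_R_20230010000_01D_30S_MO.crx.gz
RINEX3_LONG_NAME = re.compile(
    r"^(?P<marker>[A-Z0-9]{4})(?P<monument>\d)(?P<receiver>\d)(?P<country>[A-Z]{3})"
    r"_(?P<source>[RSU])"                     # R = receiver, S = stream, U = unknown
    r"_(?P<year>\d{4})(?P<doy>\d{3})(?P<hhmm>\d{4})"
    r"_(?P<period>\d{2}[MHDYU])"              # file period, e.g. 01D = one day
    r"_(?P<interval>\d{2}[CZSMHDU])"          # sampling interval, e.g. 30S
    r"_(?P<content>[A-Z]{2})"                 # e.g. MO = mixed-constellation observations
    r"\.(?P<fmt>rnx|crx)(?:\.(?P<compression>gz|zip|bz2|Z))?$"
)


def parse_rinex3_name(filename):
    """Split a RINEX 3 long filename into its fields and derive the start date."""
    m = RINEX3_LONG_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a RINEX 3 long filename: {filename}")
    info = m.groupdict()
    # the day of year counts from 1, so day 001 is 1 January
    info["start_date"] = date(int(info["year"]), 1, 1) + timedelta(days=int(info["doy"]) - 1)
    info["is_daily"] = info["period"] == "01D"
    return info


info = parse_rinex3_name("ABMF00GLP_R_20230010000_01D_30S_MO.crx.gz")
print(info["marker"], info["country"], info["start_date"], info["is_daily"])
```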

How to cite: Vergnolle, M. and Menut, J.-L.: EPOS-GNSS DATA GATEWAY: a portal to European GNSS Data and Metadata, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14423, https://doi.org/10.5194/egusphere-egu23-14423, 2023.

X4.182
|
EGU23-14605
|
GI2.3
Wolfgang zu Castell, Jan Bumberger, Peter Braesicke, Stephan Frickenhaus, Ulrike Kleeberg, Ralf Kunkel, and Sören Lorenz

Earth System Science (ESS) relies on the availability of data from various sources and across different disciplines. Hence, data sources are rich and diverse, including observatories, satellites, measurement campaigns, model simulations, case studies, laboratory experiments and citizen science. At the same time, practices of professional research data management (RDM) differ significantly among disciplines. There are many well-known challenges in enabling a free flow of data in the sense of the FAIR criteria, such as data quality assurance, unique digital identifiers, and access to and integration of data repositories, to mention just a few.

The Helmholtz DataHub Earth&Environment is addressing digitalization in ESS by developing a federated data infrastructure. Existing RDM practices at seven centers of the Helmholtz Association, working together in a joint research program within the Research Field Earth and Environment (RF E&E), are being harmonized and comprehensively integrated. The vision is to establish a digital research ecosystem that fosters digitalization in the geosciences and environmental sciences. Here, common metadata standards, digital object identifiers for samples, instruments and datasets, and defined role models for data sharing play a central role. The various data-generating infrastructures are registered digitally in order to collect metadata as early as possible and to enrich them along the research cycle.

Joint RDM bridging several institutions relies on professional practices of distributed software development. Apart from operating cross-center software development teams, the solutions rely on concepts of modular software design. For example, a generic framework has been developed to allow for quick development of tools for domain specific data exploration in a distributed manner. Other tools incorporate automated quality control in data streams. Software is being developed following guiding principles of open and reusable research software development.
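Automated quality control in a data stream can be illustrated with a generator that flags each value as it arrives. The generator never needs the whole series in memory. The range and spike tests, thresholds, flag values and the name `qc_stream` below are hypothetical and are not taken from the DataHub tools.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical flag values for this sketch
GOOD, SUSPECT, BAD = 1, 3, 4


def qc_stream(values: Iterable[float], valid_min: float, valid_max: float,
              max_step: float) -> Iterator[Tuple[float, int]]:
    """Flag each value as it arrives, without buffering the whole series.

    - range test: a value outside [valid_min, valid_max] is BAD
    - spike test: a jump larger than max_step from the last GOOD value is SUSPECT
    """
    last_good = None
    for v in values:
        if not valid_min <= v <= valid_max:
            flag = BAD
        elif last_good is not None and abs(v - last_good) > max_step:
            flag = SUSPECT
        else:
            flag = GOOD
            last_good = v
        yield v, flag


# e.g. water temperature in °C with a spike and a fill value
readings = [10.0, 10.5, 25.0, 10.7, -99.0]
for value, flag in qc_stream(readings, valid_min=0.0, valid_max=40.0, max_step=5.0):
    print(value, flag)
```

The spike test compares each value against the last good value, so a genuine step change would keep being flagged as suspect. A real implementation would need additional logic or manual review for that case.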

A suite of views is being provided, allowing for varying user perspectives, such as monitoring data flows from sensor to archive or publishing data in quality-assured repositories. Furthermore, high-level data products are being provided for stakeholders and for knowledge transfer (for examples see https://datahub.erde-und-umwelt.de). In addition, tools for integrated data analysis, e.g. AI approaches for marine litter detection, can be implemented on top of the existing software stack.

Of course, this initiative does not exist in isolation. It is part of a long-term strategy being embedded within national (e.g. NFDI) and international (e.g. EOSC, RDA) initiatives.

How to cite: zu Castell, W., Bumberger, J., Braesicke, P., Frickenhaus, S., Kleeberg, U., Kunkel, R., and Lorenz, S.: Towards an interoperable digital ecosystem in Earth System Science research, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-14605, https://doi.org/10.5194/egusphere-egu23-14605, 2023.

X4.183
|
EGU23-6357
|
GI2.3
Vincent Douet, Sophie Bouffiès-Cloché, Joanne Dumont, Martial Haeffelin, Jean-Charles Dupoont, Simone Kotthaus, Valéry Masson, Aude Lemonsu, Valerie Gros, Christopher Cantrell, Vincent Michoud, and Sébastien Payan

Urban areas are at the heart of many disciplinary projects covering very broad scientific fields. Acquired data or simulations are often accessible, when they are accessible at all, only via targeted thematic portals. However, for several years transdisciplinarity has been essential to answer specific scientific questions or societal demands. To this end, cross-referencing data from the human sciences, health, air quality, land use, emission inventories, biodiversity, etc. would enable new, innovative studies related to the city.

PANAME (PAris region urbaN Atmospheric observations and models for Multidisciplinary rEsearch), developed by AERIS, was designed as the first building block of a data portal that can promote the discovery, access, cross-referencing and representation of urban data from various sectors, with air quality and urban heat islands as a starting point. The portal and its future developments will be discussed in this presentation.

How to cite: Douet, V., Bouffiès-Cloché, S., Dumont, J., Haeffelin, M., Dupoont, J.-C., Kotthaus, S., Masson, V., Lemonsu, A., Gros, V., Cantrell, C., Michoud, V., and Payan, S.: PANAME: a portal laboratory for city's environmental data, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6357, https://doi.org/10.5194/egusphere-egu23-6357, 2023.

X4.184
|
EGU23-6873
|
GI2.3
Hela Mehrtens, Janine Berndt, Klaus Getzlaff, Andreas Lehmann, and Sören Lorenz

GEOMAR research covers a unique range of physical, chemical, biological and geological ocean processes. The Digital Research Services department develops and provides advice and tools to support scientific data workflows, including the metadata description of expeditions, model experiments, lab experiments and samples. Our focus lies on standardized internal data exchange in large interdisciplinary scientific projects and on citable data and software publications in discipline-specific repositories, in line with the FAIR principles. GEOMAR aims to provide its services not only internally but also as a collaborative RDM platform for marine projects, as a community service. How to achieve this at the operational level is currently being worked out jointly with other research institutions in community projects, e.g. within the DAM (German Alliance of Marine Research), the DataHUB (an initiative of several research centres within the Helmholtz research field Earth and Environment), and the national research data infrastructure NFDI4Earth, a network of more than 60 partners.

Our latest use cases are the inclusion of seismic data and numerical model simulations in the community portals to increase their visibility and reusability. We present the success stories and pitfalls of bringing a locally well-established system into larger communities and address the challenges we are facing.

How to cite: Mehrtens, H., Berndt, J., Getzlaff, K., Lehmann, A., and Lorenz, S.: From local to global: Community services in interdisciplinary research data management, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-6873, https://doi.org/10.5194/egusphere-egu23-6873, 2023.

Posters virtual: Fri, 28 Apr, 14:00–15:45 | vHall ESSI/GI/NP

Chairpersons: Sebastien Payan, Wolfgang zu Castell
vEGN.7
|
EGU23-15072
|
GI2.3
|
Sabino Maggi, Silvana Fuina, and Saverio Vicario

Since the development of the original specifications in the '90s the PDF document format has become the de-facto standard for the distribution and archival of documents in electronic form because of its ability to preserve the original layout of the documents, independently of the hardware, operating system and application software used to visualize them.

Unfortunately, the PDF format does not contain explicit structural and semantic information, which makes it very difficult to extract structured information, in particular tabular data, from PDF documents.
The automatic extraction of tabular data is a challenging task because tables can have extremely varied formats and layouts. It involves several complex steps: the proper recognition and conversion of printed text into machine-encoded characters, the identification of logically coherent table constructs (headers, columns, rows, spanning elements), and the breakdown of these constructs into elemental objects.

Several tools have been developed to support the extraction process. In this work we survey the most interesting tools for the automatic detection and extraction of tabular data, analyzing their respective advantages and limitations. Particular emphasis is given to programmable open-source tools because of their flexibility and long-term availability, and because they can easily be tweaked to meet the specific needs of the problem at hand.

As a practical application, we also present a workflow based on a set of R and AWK scripts that can automatically extract daily temperature and precipitation data from the official PDF documents made available each year by Regione Puglia, in Italy. The lessons learned from the development of this workflow and the possibility to generalize the approach to different kinds of PDF documents are also discussed.
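The authors' workflow is based on R and AWK. The sketch below is a rough Python analogue of the row-parsing step only. It assumes the PDF pages have first been converted to text with Poppler's `pdftotext -layout`. It also assumes that each table row holds a day number, maximum and minimum temperature and precipitation. The row layout, the missing-value markers (`-`, `--`), the decimal-comma handling and the sample text are hypothetical, not taken from the Regione Puglia documents.

```python
import re

# One numeric cell: optional minus, digits, optional decimal part (point or comma),
# or a "-"/"--" missing-value marker.
NUM = r"-?\d+(?:[.,]\d+)?|-{1,2}"
# Hypothetical row layout: day, Tmax, Tmin, precipitation
ROW = re.compile(r"^\s*(\d{1,2})" + (r"\s+(" + NUM + ")") * 3 + r"\s*$")


def to_float(token):
    """Convert a table cell to float, returning None for a missing-value marker."""
    if token.strip("-") == "":
        return None
    return float(token.replace(",", "."))


def parse_daily_rows(text):
    """Keep only lines that look like a daily data row; headers and notes are skipped."""
    records = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        day = int(m.group(1))
        tmax, tmin, prec = (to_float(g) for g in m.groups()[1:])
        records.append({"day": day, "tmax": tmax, "tmin": tmin, "prec": prec})
    return records


SAMPLE = """Station: example   Month: January
Day      Tmax   Tmin   Prec
  1      14,2    6,1    0,0
  2      13,8    -      2,4
"""
for rec in parse_daily_rows(SAMPLE):
    print(rec)
```

A regular-expression filter like this is brittle when column layouts change between years. The data can be sanity-checked cheaply, for example by checking that the day numbers run from 1 to the number of days in the month.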

How to cite: Maggi, S., Fuina, S., and Vicario, S.: Automated Extraction of Bioclimatic Time Series from PDF Tables, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15072, https://doi.org/10.5194/egusphere-egu23-15072, 2023.

vEGN.8
|
EGU23-16416
|
GI2.3
Simone Tarquini, Francesco Martinelli, Marina Bisson, Emanuela De Beni, Claudia Spinetti, and Gabriele Tarabusi

Active volcanoes are complex, poorly predictable systems that can pose a threat to humans and their infrastructure. It is therefore important to improve the understanding of their behavior as much as possible. The Stromboli volcano, in Italy, is one of the most active volcanoes in the world, and its almost persistent activity has been documented for centuries. This persistent background activity is occasionally interrupted by much more energetic and dangerous episodes. The Istituto Nazionale di Geofisica e Vulcanologia (Italy) set up the interdisciplinary “UNO” project, aimed at understanding when the Stromboli volcano is about to switch from ordinary to extraordinary activity. The UNO project includes an outstanding variety of research activities, such as field sampling, the modeling of Stromboli’s topography from airborne laser scanning (ALS) and satellite data, 3D numerical simulations of ballistic trajectories, and the setup of an ultrasonic microphone system. Key to the success of the project is the collection of integrated high spatial and temporal resolution data and their joint analysis in a shared relational database. We present here the simplified logical model of this database, focusing on the identification of entities and their relationships.
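A simplified relational model of this kind can be sketched with SQLite. The entities (activity phases, instruments, observations), their columns and the demo values below are hypothetical and are not the UNO project's actual schema. The query shows how a relationship defined by time intervals, rather than by a foreign key, lets observations be compared between ordinary and extraordinary activity.

```python
import sqlite3

# Hypothetical entities; the real UNO schema is not given in the abstract.
SCHEMA = """
CREATE TABLE activity_phase (
    phase_id   INTEGER PRIMARY KEY,
    label      TEXT NOT NULL,           -- e.g. 'ordinary' or 'extraordinary'
    start_time TEXT NOT NULL,           -- ISO 8601, compared as text
    end_time   TEXT                     -- NULL while the phase is ongoing
);
CREATE TABLE instrument (
    instrument_id INTEGER PRIMARY KEY,
    kind          TEXT NOT NULL         -- e.g. 'ultrasonic microphone'
);
CREATE TABLE observation (
    observation_id INTEGER PRIMARY KEY,
    instrument_id  INTEGER NOT NULL REFERENCES instrument(instrument_id),
    obs_time       TEXT NOT NULL,
    quantity       TEXT NOT NULL,
    value          REAL
);
"""

# Observations are related to activity phases through their time stamps.
PER_PHASE = """
SELECT p.label, COUNT(o.observation_id), MAX(o.value)
FROM activity_phase AS p
JOIN observation AS o
  ON o.obs_time >= p.start_time
 AND (p.end_time IS NULL OR o.obs_time < p.end_time)
GROUP BY p.phase_id
ORDER BY p.phase_id
"""


def build_demo_db():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO activity_phase VALUES (?, ?, ?, ?)", [
        (1, "ordinary", "2020-01-01T00:00", "2020-06-01T00:00"),
        (2, "extraordinary", "2020-06-01T00:00", None),
    ])
    con.execute("INSERT INTO instrument VALUES (1, 'ultrasonic microphone')")
    con.executemany(
        "INSERT INTO observation (instrument_id, obs_time, quantity, value) "
        "VALUES (1, ?, 'acoustic pressure', ?)",
        [("2020-03-10T12:00", 0.8), ("2020-05-31T23:59", 1.1), ("2020-06-02T08:30", 4.7)],
    )
    con.commit()
    return con


for row in build_demo_db().execute(PER_PHASE):
    print(row)
```

Storing timestamps as uniformly formatted ISO 8601 text keeps the interval join correct with plain string comparison. Mixed formats or time zones would break that assumption.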

How to cite: Tarquini, S., Martinelli, F., Bisson, M., De Beni, E., Spinetti, C., and Tarabusi, G.: The set up of the “UNO” project relational database for Stromboli volcano, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-16416, https://doi.org/10.5194/egusphere-egu23-16416, 2023.

vEGN.9
|
EGU23-15293
|
GI2.3
|
ECS
|
Highlight
Anastasia Angelou, Sandra Gewehr, Spiros Mourelatos, and Ioannis Kioutsioukis

The transmission of West Nile Virus (WNV) is known to be affected by multiple factors related to the behavior of, and interactions between, reservoirs (birds), vectors (Culex mosquitoes) and hosts (humans). Environmental parameters can play a critical role in understanding WNV epidemiology. The aim of this research was to determine the association of various climatic factors with Culex mosquito abundance in Greece during the period 2011-2022. Climate data were acquired from ERA5 (European Centre for Medium-Range Weather Forecasts), while Culex abundance data were obtained through the mosquito surveillance network of ECODEVELOPMENT S.A., which operates the largest mosquito surveillance network in Greece. The research was conducted at the municipality level. Culex abundance depends nonlinearly on temperature (Figure 1). However, the spread of the measurements indicates that other factors also affect mosquito abundance.

Figure 1 Scatter plot of air temperature vs. Culex abundance in a municipality (Delta) with a relatively sizeable mosquito population.

Correlation heatmaps were used to visualize the correlation between vector abundance and average monthly temperature, with lags of up to 2 months, at several municipalities in the Region of Central Macedonia. The correlations decrease as the temperature lag increases (Figure 2). Moreover, in some municipalities the correlation coefficient is considerably greater than in others. These correlations cannot be explained without considering the mosquito breeding sites in these municipalities, which contain important water resources such as rice paddies, drainage canals, wetland systems or a combination of these. When surface waters warm and the air temperature rises, the mosquito life cycle is completed more quickly, so more generations are produced in a shorter period of time.

Figure 2 Correlation heatmap of the correlation coefficient between the mosquito abundance (municipality scale) and the average monthly temperature up to 2 months before.
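The lagged correlation behind such heatmaps can be sketched as follows. For each lag of 0 to 2 months, the abundance in month t is correlated with the mean temperature in month t - lag. The use of Pearson's coefficient, the function names and the synthetic series are assumptions. One heatmap row would hold these values for one municipality.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(abundance, temperature, max_lag=2):
    """Correlate abundance in month t with mean temperature in month t - lag.
    Both lists are monthly and aligned on the same calendar months."""
    result = {}
    for lag in range(max_lag + 1):
        a = abundance[lag:]                          # months t = lag .. end
        t = temperature[:len(temperature) - lag]     # the matching months t - lag
        result[lag] = pearson(a, t)
    return result


# Synthetic series (not data from the study)
temperature = [10, 14, 18, 22, 26, 24, 20]
abundance = [2 * x + 5 for x in temperature]  # linear in same-month temperature
print(lagged_correlations(abundance, temperature))
```

Each additional lag shortens the overlapping series by one month, so coefficients at larger lags rest on fewer points. Since the abstract reports a nonlinear dependence on temperature, a rank correlation such as Spearman's may be a reasonable alternative to Pearson's.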

Scatterplots and correlation heatmaps computed for Culex abundance against total precipitation, relative humidity or wind speed did not reveal similar patterns. Ongoing analysis focuses on further factors, environmental and otherwise, that affect the abundance of the mosquitoes that transmit WNV.

Acknowledgments 
This research has been co‐financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: Τ2ΕΔΚ-02070). 

How to cite: Angelou, A., Gewehr, S., Mourelatos, S., and Kioutsioukis, I.: Environmental parameters as a critical factor in understanding mosquito population, EGU General Assembly 2023, Vienna, Austria, 23–28 Apr 2023, EGU23-15293, https://doi.org/10.5194/egusphere-egu23-15293, 2023.