GI2.4 | Artificial Intelligence in Geosciences: applications, innovative approaches and new frontiers.
Convener: Andrea Vitale | Co-conveners: Jie Dodo Xu (ECS), Luigi Bianco (ECS), Giacomo Roncoroni (ECS), Ivana Ventola (ECS), J ZhangZhou, Guillaume Siron
Orals | Mon, 15 Apr, 10:45–12:30 (CEST), 14:00–15:45 (CEST) | Room 0.94/95
Posters on site | Attendance Mon, 15 Apr, 16:15–18:00 (CEST) | Display Mon, 15 Apr, 14:00–18:00 | Hall X4
Posters virtual | Attendance Mon, 15 Apr, 14:00–15:45 (CEST) | Display Mon, 15 Apr, 08:30–18:00 | vHall X4
In recent years, technologies based on Artificial Intelligence (AI), such as image processing, smart sensors, and intelligent inversion, have garnered significant attention from researchers in the geosciences community. These technologies offer the promise of transitioning geosciences from qualitative to quantitative analysis, unlocking new insights and capabilities previously thought unattainable.
One of the key reasons for the growing popularity of AI in geosciences is its unparalleled ability to efficiently analyze vast datasets within remarkably short timeframes. This capability empowers scientists and researchers to tackle some of the most intricate and challenging issues in fields like Geophysics, Geochemistry, Seismology, Hydrology, Planetary Science, Remote Sensing, and Disaster Risk Reduction.
As we stand on the cusp of a new era in geosciences, the integration of artificial intelligence promises to deliver more accurate estimations, efficient predictions, and innovative solutions. By leveraging algorithms and machine learning, AI empowers geoscientists to uncover intricate patterns and relationships within complex data sources, ultimately advancing our understanding of the Earth's dynamic systems. In essence, artificial intelligence has become an indispensable tool in the pursuit of quantitative precision and deeper insights in the fascinating world of geosciences.
For this reason, the aim of this session is to explore new advances and approaches in AI for the geosciences.

Orals: Mon, 15 Apr | Room 0.94/95

Chairpersons: Andrea Vitale, Luigi Bianco, Ivana Ventola
10:45–10:50
10:50–11:00 | EGU24-340 | ECS | Highlight | On-site presentation
Super-resolution for satellite imagery: uncovering details using a new Cross Band Transformer architecture
Jasper S. Wijnands, Nikolaos Ntantis, Jan Fokke Meirink, and Domenica Dibenedetto

Recent advances in artificial intelligence (AI) techniques have enabled the processing and analysis of vast datasets, such as archives of satellite observations. In the geosciences, remote sensing has transformed the way in which the atmosphere and surface are observed. Traditionally, substantial funding is directed towards the development of new satellites to improve observation accuracy. Nowadays, novel methods based on AI could become a complementary approach to further enhance the resolution of observations. Therefore, we developed a new, state-of-the-art super-resolution methodology.

Satellites commonly measure electromagnetic radiation, reflected or emitted by the earth's surface and atmosphere, in different parts of the spectrum. Many instruments capture both panchromatic (PAN) and low-resolution multi-spectral (LRMS) images. While PAN typically covers a broad spectral range, LRMS focuses on details in narrow bands within that range. Pansharpening is the task of fusing the spatial details of PAN with the spectral richness of LRMS, to obtain high-resolution multi-spectral (HRMS) images. This has proven to be valuable in many areas of the geosciences, leading to new capabilities such as detecting small-sized marine plastic litter and identifying buried archaeological remains. Although HRMS images are not directly captured by the satellite, they can provide enhanced visual clarity, uncover intricate patterns and allow for more accurate and detailed analyses.
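As a concrete illustration of the fusion described above, a classic baseline (not the CBT model presented here) is the Brovey transform, which rescales each upsampled multi-spectral band by the ratio of the PAN image to the bands' mean intensity. The arrays below are synthetic stand-ins:

```python
import numpy as np

def brovey_pansharpen(pan, lrms_up):
    """Brovey-transform pansharpening: scale each upsampled multi-spectral
    band by the ratio of the panchromatic image to the band-mean intensity."""
    intensity = lrms_up.mean(axis=0)
    return lrms_up * (pan / np.maximum(intensity, 1e-6))[None, :, :]

pan = np.full((4, 4), 0.8)                      # high-resolution PAN
lrms = np.stack([np.full((4, 4), 0.2),          # upsampled LRMS bands,
                 np.full((4, 4), 0.6)])         # mean intensity = 0.4
hrms = brovey_pansharpen(pan, lrms)             # bands become 0.4 and 1.2
```

Modern learned methods such as CBT replace this fixed arithmetic with trained cross-band attention, but the input/output contract (PAN + LRMS in, HRMS out) is the same.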

Technically, pansharpening is closely related to the single image super-resolution task, where attention-based models have achieved excellent results. In our study a new Cross Band Transformer (CBT) for pansharpening was developed, incorporating and adapting successful features of vision transformer architectures. Information sharing between the panchromatic and multi-spectral input streams was enabled through two novel components: the Shifted Cross-Band Attention Block and the Overlapping Cross-Band Attention Block, implementing mechanisms for shifted and overlapping cross-attention. Each block led to a more accurate fusion of panchromatic and multi-spectral data. For evaluation, CBT was also compared to seven competitive benchmark methods, including MDCUN, PanFormer and ArbRPN. Our model produced state-of-the-art results on the widely used GaoFen-2 and WorldView-3 pansharpening datasets. Based on peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) scores of the generated images, CBT outperformed all benchmark methods. Our AI method can be integrated in existing remote sensing pipelines, as CBT converts actual observations into a high-resolution equivalent for use in downstream tasks. A PyTorch implementation of CBT is available at https://github.com/VisionVoyagerX/CBT.
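The PSNR score used in this comparison has a simple closed form; a minimal NumPy version is below (SSIM is more involved and is typically taken from a library such as scikit-image):

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
est = ref + 0.1           # uniform error of 0.1 -> MSE = 0.01
score = psnr(ref, est)    # ≈ 20 dB
```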

Furthermore, we developed the Sev2Mod dataset, available at https://zenodo.org/record/8360458. Unlike conventional benchmark datasets, Sev2Mod acquired input and target pairs from two different satellite instruments: (i) SEVIRI onboard the Meteosat Second Generation (MSG) satellite in geostationary orbit and (ii) MODIS onboard the Terra satellite in polar, sun-synchronous orbit. SEVIRI measures a fixed field of view quasi-continuously, while MODIS passes only twice a day but observes at a much higher spatial resolution. Our study investigated image generation at the spatial resolution of MODIS, while preserving SEVIRI's high temporal resolution. Since Sev2Mod is better aligned with actual situations one may encounter in applications of pansharpening methods (e.g., noise, bias, approximate temporal matching), it provides a solid foundation to design robust pansharpening models for real-world applications.

How to cite: Wijnands, J. S., Ntantis, N., Meirink, J. F., and Dibenedetto, D.: Super-resolution for satellite imagery: uncovering details using a new Cross Band Transformer architecture, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-340, https://doi.org/10.5194/egusphere-egu24-340, 2024.

11:00–11:10 | EGU24-16211 | ECS | On-site presentation
Waterbody Detection of Korean Reservoirs from Sentinel-1 Images and the Analysis of its Relationship with Water Level: A Deep Learning Approach
Soyeon Choi and Yangwon Lee

In this study, we propose a method for monitoring the surface area of agricultural reservoirs in South Korea using Sentinel-1 synthetic aperture radar (SAR) images and deep learning models. The approach includes verifying the correlation between the monitored water surface area and real-time water level gauge readings. Leveraging the Google Earth Engine (GEE) platform, we constructed datasets for seven reservoirs with capacities of 700,000, 900,000, and 1.5 million tonnes, covering the period from 2017 to 2021. Model training was conducted on 1,283 images from four reservoirs, applying shuffling and 5-fold cross-validation. Detection results were evaluated using mean Intersection over Union (mIoU). Using the highest-performing model, we integrated the computed water surface areas with real-time reservoir water level records from RAWRIS (Rural Agricultural Water Resource Information System) and confirmed the correlation between changes in water surface area and water level from 2017 to 2021. This study illustrates that satellite-based monitoring of water surface areas can be effectively utilized for tracking status changes in agricultural reservoirs in South Korea.
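The mIoU criterion used here can be computed directly from predicted and reference label maps; a minimal sketch with illustrative 2×2 water masks:

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean Intersection over Union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[1, 1], [0, 0]])   # predicted water mask
target = np.array([[1, 0], [0, 0]])   # reference water mask
score = mean_iou(pred, target)        # (2/3 + 1/2) / 2 ≈ 0.583
```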

How to cite: Choi, S. and Lee, Y.: Waterbody Detection of Korean Reservoirs from Sentinel-1 Images and the Analysis of its Relationship with Water Level: A Deep Learning Approach, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-16211, https://doi.org/10.5194/egusphere-egu24-16211, 2024.

11:10–11:20 | EGU24-18300 | Virtual presentation
Geospatial Foundation Models for Efficient Retrieval of Remote Sensing Images
Thomas Brunschwiler, Benedikt Blumenstiel, Viktoria Moor, and Romeo Kienzler

This work explores the potential of content-based image retrieval to enable efficient search through vast amounts of satellite data. Images can be identified across multiple semantic concepts without needing specific annotations. We propose to use Geospatial Foundation Models (GeoFMs) for remote sensing image retrieval and evaluate the models on two datasets. The GeoFM named Prithvi uses six bands and outperforms other RGB-based models, achieving a mean Average Precision of 61% on ForestNet-4 and 98% on BigEarthNet-19. The results demonstrate that the model efficiently encodes multi-spectral data and generalizes without requiring further fine-tuning. Additionally, this work evaluates three compression methods: i) binary embeddings, ii) trivial hashing, and iii) locality-sensitive hashing. Compression with binarized embeddings is the best option for balancing retrieval speed and accuracy: it matches the latency of much shorter hash codes while maintaining the same accuracy as floating-point embeddings.
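The binarized-embedding retrieval evaluated here can be sketched in a few lines: threshold the float embeddings at zero and rank database items by Hamming distance. The data below are random stand-ins, not Prithvi embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 128))             # float image embeddings
query = db[42] + 0.01 * rng.normal(size=128)  # near-duplicate of item 42

# Binarize by sign; Hamming distance then replaces costly float comparisons
db_bits = db > 0
q_bits = query > 0
hamming = (db_bits != q_bits).sum(axis=1)
best = int(np.argmin(hamming))                # item 42 should rank first
```

At scale, the boolean arrays would be packed into machine words so that the Hamming distance becomes a popcount of XORed bits, which is where the latency advantage over float embeddings comes from.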

How to cite: Brunschwiler, T., Blumenstiel, B., Moor, V., and Kienzler, R.: Geospatial Foundation Models for Efficient Retrieval of Remote Sensing Images, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18300, https://doi.org/10.5194/egusphere-egu24-18300, 2024.

11:20–11:30 | EGU24-19507 | ECS | Highlight | On-site presentation
Ground Deformation Forecasting and Modeling in Mining Areas Using Artificial Intelligence Techniques
Patryk Balak, Przemysław Tymków, and Paweł Bogusławski

Underground mining activities cause ground deformations that threaten the stability of surface infrastructure and ecosystems, posing risks to both the environment and human populations. In response to these threats, this study focuses on developing a method for forecasting and modelling ground deformations caused by underground mining activities, using advanced artificial intelligence (AI) techniques. The primary goal is to create a model utilising data from Differential Interferometric Synthetic Aperture Radar (DInSAR) and specialised mining data, enabling precise monitoring and forecasting of future changes to support better-informed decision-making in the mining industry.

 

The study employed two categories of neural networks: Convolutional Neural Networks (CNN) and Feedforward Neural Networks (FNN). For the FNN, a detailed analysis was conducted on a per-pixel basis across the entire dataset. Each pixel, representing a specific point on the terrain, was analysed with its associated feature vector. This vector comprised multiple attributes derived from the mining data and DInSAR images, effectively capturing the local characteristics of each point, such as its relative position, historical deformation patterns, and proximity to mining activities. For the CNN method, the study focused on exploring the impact of different kernel sizes on model performance. Kernels in CNNs are small matrices used to process data across the image, essential for extracting and learning features crucial for understanding and predicting ground deformations. Varying kernel sizes allows the network to capture different aspects of the data. Considered features included the distance from the centre of the subsidence basin and the mining face at different time intervals. In the context of forecasting, the use of high-quality data is crucial. Unfortunately, some DInSAR images exhibited noise, due in part to a lack of stable coherence and adverse atmospheric effects. A key aspect of the study was therefore the creation and testing of a classifier for the suitability of DInSAR images for forecasting purposes. The analyses showed that the developed classifier achieved an accuracy of 83%. The training data for the prediction study came from the Budryk-Knothe method. The network was tasked with reproducing the operation of this method while simultaneously predicting six days ahead. The models were evaluated based on the mean squared error (MSE) in the areas of the subsidence basin. The test set consisted of specially prepared and trimmed DInSAR images. The FNN-based solution achieved the best results.
For this network, satisfactory accuracy was achieved in determining the direction of settlement, with an MSE of 0.12, corresponding to a percentage error of approximately 10% (5 cm for a subsidence of 50 cm).
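The per-pixel setup described above can be imitated with a small feedforward regressor on synthetic feature vectors; the feature names and the toy subsidence signal are illustrative, not the study's data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Hypothetical per-pixel features: distance to the basin centre, distance to
# the mining face, and the previous DInSAR displacement at that pixel
X = rng.uniform(size=(500, 3))
y = -0.5 * X[:, 2] - 0.1 * X[:, 0]            # synthetic subsidence target

fnn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
fnn.fit(X, y)
mse = np.mean((fnn.predict(X) - y) ** 2)      # training MSE of the fit
```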

 

The results from the study highlight the significant potential of integrating AI techniques with advanced geodetic methods, opening new possibilities in monitoring the impact of mining on the environment. Future work may focus on further optimization of AI algorithms to increase forecasting accuracy over longer periods and in various geological and operational conditions.

How to cite: Balak, P., Tymków, P., and Bogusławski, P.: Ground Deformation Forecasting and Modeling in Mining Areas Using Artificial Intelligence Techniques, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19507, https://doi.org/10.5194/egusphere-egu24-19507, 2024.

11:30–11:40 | EGU24-22089 | ECS | On-site presentation
Classification of different physical scatterers in weather radar data using machine learning techniques
Alakh Agrawal, Swasti Pahuja, Anjita Neelatt, and J. Indu

The present study classifies echoes from meteorological and biological targets using dual-polarization Doppler weather radar data from the Next Generation Weather Radar (NEXRAD). Preliminary results are presented using six key variables, namely base reflectivity, base velocity, spectrum width, differential reflectivity, correlation coefficient, and differential phase. A threshold-based filtering methodology was implemented for biological scatterers and heavy precipitation events. To automate the classification, multiple machine learning algorithms were implemented and fine-tuned for the highest classification accuracy. Through the integration of machine learning techniques with dual-polarization Doppler weather radar data, this research endeavors to contribute to the development of robust models capable of distinguishing multiple types of physical scatterers from each other.
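A threshold-based filter of the kind described can be written as a simple rule over the dual-polarization variables; the cut-off values below are illustrative, not NEXRAD operational settings:

```python
def classify_echo(rhohv, zdr):
    """Rule-of-thumb echo typing: biological scatterers tend to show a low
    correlation coefficient (RhoHV) and high differential reflectivity (ZDR)."""
    if rhohv < 0.80 and zdr > 2.0:
        return "biological"
    if rhohv > 0.97:
        return "meteorological"
    return "uncertain"

insect_like = classify_echo(0.65, 4.1)   # -> "biological"
rain_like = classify_echo(0.99, 0.4)     # -> "meteorological"
```

A trained classifier replaces these hand-set thresholds with decision boundaries learned jointly over all six variables.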

How to cite: Agrawal, A., Pahuja, S., Neelatt, A., and Indu, J.: Classification of different physical scatterers in weather radar data using machine learning techniques, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22089, https://doi.org/10.5194/egusphere-egu24-22089, 2024.

11:40–11:50 | EGU24-19021 | ECS | Highlight | On-site presentation
Optimizing crop type mapping for fairness
Ilya Gorbunov, Caroline Gevaert, and Mariana Belgiu

Minor crops are crucial for food security, especially due to their resilience against climate change-related challenges (Renard & Tilman, 2019). Consequently, accurate crop mapping is essential for monitoring policies seeking to incentivize minor crop production. However, the class imbalance problem in machine learning introduces a bias against these crops, leading to unfair classifications. This research explores how this bias can be mitigated through two main class imbalance correction approaches: sample balancing methods and cost-sensitive learning. Apart from investigating how these methods address the typical class imbalance problem, where there are simply fewer labelled samples of a specific class, we investigate how they can be used to address another level of bias: that created by omitted sensitive attributes. These are attributes, such as parcel size, that are not explicitly considered by the classifier yet significantly impact accuracy and contribute to the unfairness of the classification, as evidenced by notably lower accuracy for smaller parcels. By integrating these attributes into the class imbalance correction methods, we assess the potential for enhancing fairness. This approach is vital, as it corrects performance biases affecting specific sub-groups, which are not necessarily class dependent, thus addressing a critical but overlooked dimension of fairness in classification.

Utilizing the BreizhCrops dataset, we create sub-sampled datasets that represent a variety of class imbalance problems. This enables us to conduct an across-the-board comparison of the selected class imbalance correction techniques, providing insights that may help streamline future research looking to employ these techniques. For the classifier architecture, we select the transformer encoder, chosen for its superior performance among the deep learning methods tested on the BreizhCrops dataset (Rußwurm et al., 2020).

This research contributes to the broader understanding of class imbalance correction in classification tasks, particularly for crop mapping, though the methods can also be applied in other GeoAI contexts. By evaluating sample balancing and cost-sensitive learning in varied contexts, we provide insights into optimizing classification tasks for fairness. Our work contributes to the development of responsible AI practices by offering valuable insights on how fairness can be enhanced across GeoAI applications.
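Of the two correction families compared, cost-sensitive learning is the easiest to sketch: reweight the loss inversely to class frequency. The toy data and scikit-learn's "balanced" heuristic below stand in for the study's schemes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
# 950 majority-crop vs 50 minority-crop samples: a strong class imbalance
X = np.vstack([rng.normal(0.0, 1.0, (950, 4)), rng.normal(1.5, 1.0, (50, 4))])
y = np.array([0] * 950 + [1] * 50)

# "balanced" gives n_samples / (n_classes * n_c): minority errors cost 19x more
w = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
clf = LogisticRegression(class_weight=dict(enumerate(w))).fit(X, y)
minority_recall = (clf.predict(X[y == 1]) == 1).mean()
```

The same weighting idea extends to sub-group fairness by assigning weights per sensitive-attribute group (e.g. small parcels) rather than per class.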

 

Renard, D., & Tilman, D. (2019). National food production stabilized by crop diversity. Nature, 571(7764), 257–260. https://doi.org/10.1038/s41586-019-1316-y

Rußwurm, M., Pelletier, C., Zollner, M., Lefèvre, S., & Körner, M. (2020). BreizhCrops: A Time Series Dataset for Crop Type Mapping (arXiv:1905.11893). arXiv. https://doi.org/10.48550/arXiv.1905.11893

How to cite: Gorbunov, I., Gevaert, C., and Belgiu, M.: Optimizing crop type mapping for fairness , EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19021, https://doi.org/10.5194/egusphere-egu24-19021, 2024.

11:50–12:00 | EGU24-15426 | ECS | On-site presentation
Characterizing subsurface structures from hard and soft data with multiple-condition fusion neural network
(withdrawn after no-show)
Qiyu Chen and Zhesi Cui
12:00–12:10 | EGU24-10105 | ECS | Virtual presentation
Estimation of Lateral River Aquifer Exchanges with Physics Informed Neural Networks
Mayank Bajpai, Lakhadive Mehulkumar Rajkumar, Shreyansh Mishra, and Shishir Gaur

This study introduces a novel approach for estimating lateral river-aquifer exchanges by employing Physics Informed Neural Networks (PINNs). The methodology compares the predictive capabilities of neural networks with the physics-based modeling provided by MODFLOW's Horizontal Flow Barrier (HFB) package, implemented through FloPy. As a foundation, the HFB package in MODFLOW establishes a baseline model, serving as a benchmark for performance comparison.

The integrated model leverages observed data and the fundamental principles of hydrogeology, enabling a robust estimation of lateral exchanges. The synergy of PINNs and MODFLOW HFB enhances the model's adaptability to diverse hydrogeological conditions, providing accurate predictions of intricate river-aquifer interactions. The comparative analysis with the MODFLOW HFB package underscores the efficacy of the proposed approach, offering insights for improved water resource management and environmental decision-making.
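The core PINN idea, a loss that combines data misfit with a physics residual, can be sketched on a toy 1-D steady groundwater-flow problem, d²h/dx² = 0. The setup, parameterization, and grid search below are illustrative stand-ins, not the study's MODFLOW-coupled model:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 21)
h_obs = 2.0 - 1.5 * x                       # observed heads along a flow line

def loss(a, b):
    """Data misfit plus discretised physics residual for h(x) = a*x + b."""
    h = a * x + b
    data = np.mean((h - h_obs) ** 2)
    physics = np.mean(np.diff(h, 2) ** 2)   # second difference ~ d2h/dx2
    return data + physics

# A crude search over the two parameters stands in for gradient training
grid = np.linspace(-3.0, 3.0, 121)
_, a_best, b_best = min((loss(a, b), a, b) for a in grid for b in grid)
```

In a real PINN, h(x) is a neural network, the residual is computed by automatic differentiation, and both loss terms are minimised jointly by gradient descent.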

How to cite: Bajpai, M., Mehulkumar Rajkumar, L., Mishra, S., and Gaur, S.: Estimation of Lateral River Aquifer Exchanges with Physics Informed Neural Networks, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10105, https://doi.org/10.5194/egusphere-egu24-10105, 2024.

12:10–12:20 | EGU24-5905 | ECS | On-site presentation
Deep Learning for Detecting Thrust Faults in Subduction Zones
Wenhao Zheng, Rebecca Bell, Cédric M. John, and Lluis Guasch

Subduction plate boundary faults and splay faults in accretionary wedges are capable of generating some of the largest earthquakes and tsunamis on Earth. Owing to the complexity of geological structures and the inherent ambiguity in geophysical data, comprehensively characterizing potential thrust fault systems presents a considerable challenge. Current automated fault detection methods, primarily targeting normal faults, show limited efficacy in the complex fault systems of subduction zones. Treating fault detection as a binary image segmentation problem, we propose a supervised end-to-end fully convolutional neural network (U-Net) to automatically and accurately delineate thrust faults from seismic data. To circumvent the labour-intensive and potentially subjective manual labelling process required for model training, we designed a workflow to efficiently auto-generate more than 10,000 training pairs comprising 2D synthetic seismic images and the corresponding labelled images of the thrust faults simulated in them. Each synthetic seismic image includes randomly undulating stratigraphic strata and faults with dip angles between 5 and 40 degrees, aiming to simulate realistic and varied geological structures and thrust fault features in subduction zones; this equipped the U-Net model to achieve a 91% accuracy rate in fault detection on the test dataset. An example from the Hikurangi subduction zone, New Zealand, demonstrates that the U-Net trained on only synthetic data is superior to conventional automatic methods, such as unsupervised methods or supervised methods trained on normal faults, delineating more than 70% of thrust faults from seismic images.
To enhance the U-Net model's adaptation to specific regional fault characteristics and reduce interference from noise, we incorporated a select set of real 2D seismic images and manually interpreted fault labels into the transfer learning process, which significantly improved its prediction accuracy and made the results clearer. From the comprehensive 2D characterizations based on the U-Net model, we can further extract 3D thrust fault systems and quantitatively measure their geometric parameters.
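The auto-generation of training pairs can be illustrated in miniature: build an undulating reflectivity image, insert a planar fault at a chosen dip, and emit the matching binary label. This is a toy stand-in for the study's generator, not its actual workflow:

```python
import numpy as np

def synthetic_pair(size=64, dip_deg=30.0):
    """Toy training pair: folded 'strata' image plus a binary thrust-fault mask."""
    rows = np.arange(size)[:, None].astype(float)
    cols = np.arange(size)[None, :].astype(float)
    folded = rows + 3.0 * np.sin(np.linspace(0.0, 4.0, size))[None, :]
    image = np.sin(0.5 * folded)                    # undulating reflectors
    fault_row = np.tan(np.radians(dip_deg)) * cols + size / 4
    label = (np.abs(rows - fault_row) < 1.0).astype(np.uint8)
    image = image + 0.5 * label                     # crude offset at the fault
    return image, label

img, lab = synthetic_pair()
```

Thousands of such image/label pairs, with randomised folding and dips, are what make supervised segmentation training possible without manual interpretation.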

How to cite: Zheng, W., Bell, R., John, C. M., and Guasch, L.: Deep Learning for Detecting Thrust Faults in Subduction Zones, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5905, https://doi.org/10.5194/egusphere-egu24-5905, 2024.

12:20–12:30 | EGU24-10627 | ECS | Highlight | On-site presentation
Enhancing Geoscience Analysis: AI-Driven Imputation of Missing Data in Well Logging Using Generative Models
Abdulrahman Al-Fakih, Ardiansyah Koeshidayatullah, and Sanlinn Kaka

The integrity of well logging data is paramount in geophysical exploration for accurate subsurface analysis, notably in the Dutch sector of the North Sea, known for its extensive hydrocarbon exploration. Addressing the common challenge of missing data in well logs, our study introduces an AI-driven methodology employing generative models. These models use machine learning to analyze existing data patterns and generate realistic imputations for missing values. The approach has been shown not only to enhance the quality of geological interpretations but also to streamline the workflow in hydrocarbon exploration. This integration of AI signifies a substantial move towards more precise and efficient geoscience data analysis. A qualitative comparison using Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) was conducted to evaluate the results. The PCA comparison demonstrates the synthetic data's alignment with real data in principal component space, effectively capturing the variance. The t-SNE analysis further validates the model's fidelity, with the synthetic data exhibiting clustering behaviors analogous to real data. Together, these results showcase the transformative potential of machine learning in geosciences, providing a robust framework for enhancing data reliability in geophysical studies.
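The PCA check described can be reproduced in a few lines: project both real and synthetic samples into the principal-component space of the real data and compare their distributions. The arrays below are random stand-ins, not actual well logs:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 5))                      # stand-in log curves
synthetic = real + 0.05 * rng.normal(size=(200, 5))   # faithful imputations

pca = PCA(n_components=2).fit(real)
real_pc = pca.transform(real)
synth_pc = pca.transform(synthetic)
# Good synthetic data should occupy the same region of PC space
centroid_shift = np.abs(real_pc.mean(axis=0) - synth_pc.mean(axis=0)).max()
```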

How to cite: Al-Fakih, A., Koeshidayatullah, A., and Kaka, S.: Enhancing Geoscience Analysis: AI-Driven Imputation of Missing Data in Well Logging Using Generative Models, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-10627, https://doi.org/10.5194/egusphere-egu24-10627, 2024.

Lunch break
Chairpersons: Jie Dodo Xu, J ZhangZhou, Guillaume Siron
14:00–14:10 | EGU24-7440 | ECS | On-site presentation
Recover the Water Content of Mid-Ocean Ridge Basalts by a Machine Learning Method
Jingjun Zhou, Jia Liu, Qunke Xia, Cheng Su, Takeshi Kuritani, and Eero Hanski

Water's impact on the physicochemical attributes of mantle rocks makes it a pivotal factor in mantle evolution. Mid-ocean ridge basalts (MORBs) are essential for analyzing the upper mantle's composition, yet many global MORB samples lack direct water content measurements. The common method, estimating MORB water contents from a correlation between H2O and trace elements such as Ce, often presumes a constant H2O/Ce ratio. This assumption can be unreliable owing to heterogeneity in H2O/Ce ratios, even within short ridge segments. To address this gap, we utilize compositional data from 1,467 global MORB glasses with measured water contents to develop a Random Forest Regression model. This machine learning-based model can predict the water concentrations of MORB glasses from major and trace element data, without the need for a fixed H2O/trace element ratio. Our model accurately recovers the water contents of MORB glasses, with precision comparable to traditional analytical methods. Applying this model to 1,931 MORB glass samples has significantly expanded the global MORB water content database, revealing the widespread presence of high-water MORBs. Importantly, this approach enables the exploration of water content in MORBs from regions previously without such data, such as the Chile Ridge and the Pacific-Antarctic Ridge. Moreover, it allows us to deduce variations in the water contents of MORB sources by applying the model to transform-fault samples, thereby offering novel insights into the dynamics of the mantle.
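The Random Forest setup maps element concentrations directly to water content, with no fixed H2O/Ce ratio imposed; a toy sketch with synthetic compositions (the functional form below is invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(600, 8))            # stand-in element concentrations
h2o = 0.3 * X[:, 0] + 0.2 * X[:, 3] * X[:, 5] + 0.02 * rng.normal(size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, h2o, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2 = rf.score(X_te, y_te)                 # held-out R² of the recovery
```

Because the forest learns the composition-to-H2O mapping directly, it can exploit interactions between elements that a single fixed ratio cannot.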

How to cite: Zhou, J., Liu, J., Xia, Q., Su, C., Kuritani, T., and Hanski, E.: Recover the Water Content of Mid-Ocean Ridge Basalts by a Machine Learning Method, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7440, https://doi.org/10.5194/egusphere-egu24-7440, 2024.

14:10–14:20 | EGU24-11007 | ECS | solicited | Highlight | On-site presentation
MagMaTaB: A machine learning-based model for magmatic liquid thermobarometry
Gregor Weber and Jon Blundy

Determining pressures and temperatures of magmas is crucial for addressing diverse challenges in petrology, geodynamics, and volcanology. However, inherent inaccuracies, especially in barometry, have limited the effectiveness of existing models in unravelling the architecture of crustal igneous systems. In this presentation, I will introduce a novel machine learning model, calibrated using an extensive experimental database, to create regression models for extracting P-T conditions of magmas. Calculations are conducted by considering melt chemistry and the coexisting mineral assemblage as input variables.
Our approach is versatile, applicable across a wide range of compositions from basalt to rhyolite, covering pressures from 0.2 to 15 kbar and temperatures ranging from 675 to 1400°C. Testing and optimization demonstrate that the model can recover pressures with a root-mean-square error of 1.1-1.3 kbar and temperature estimates with errors as low as 21°C. This indicates that melt chemistry-mineral assemblage pairs reliably capture magmatic variables across a broader spectrum of conditions than previously thought. We propose that this reliability arises from the relatively low thermodynamic variance in natural magma compositions, despite the presence of numerous oxide components.
Applying our model to two cases with well-constrained geophysics, Mount St. Helens volcano (USA) and the Askja caldera in Iceland, we analyse dacite whole-rocks from Mount St. Helens erupted between 1980 and 1986. These rocks, inferred to represent liquids extracted from a complex mineral mush, yield melt extraction source pressures that align remarkably well with geophysical constraints. For Askja caldera, our model allows us to assign basaltic and rhyolitic magma chemistries to distinct seismic wave speed anomalies, highlighting its potential to bridge the gap between petrology and geophysics. Our model, named MagMaTaB, is accessible through a user-friendly web application.

How to cite: Weber, G. and Blundy, J.: MagMaTaB: A machine learning-based model for magmatic liquid thermobarometry, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11007, https://doi.org/10.5194/egusphere-egu24-11007, 2024.

14:20–14:30 | EGU24-19015 | ECS | On-site presentation
Explainable machine learning to uncover hydrogen diffusion mechanism in clinopyroxene
Anzhou Li, Sensen Wu, Huan Chen, Zhenhong Du, and Qunke Xia

Estimating the water content of mantle-derived magma using clinopyroxene (cpx) phenocrysts serves as a valuable constraint on the water budget in deep Earth. Intricate magma processes and the high hydrogen diffusion rate necessitate careful evaluations of whether the water content in cpx preserves its original state. Machine learning (ML) has been utilized to develop a classifier for judging hydrogen diffusion in cpx. Nevertheless, the opaqueness and complexity of most ML models hinder a clear understanding of their classification principles. To elucidate the mechanistic basis of the ML model, the Shapley theory is integrated to determine the contributions of major elements of cpx as features in a linear additive manner. This study achieves superior classification performance using an extreme gradient boosting model and innovatively presents a quantitative evaluation of feature importance at the sample level for each observation. The results indicate that Na plays a predominant role in the diffusion process surpassing other major elements and its associated hydrogen can easily diffuse out of cpx. Our model also identifies various hydrogen association modes in different elemental compositions and puts constraints on the properties of incorporated hydrogen with non-lattice forming elements in cpx. The findings demonstrate that the application of explainable ML methods in mineralogy holds significant potential for advancing the comprehension of geological phenomena.
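Shapley-value attribution is usually computed with a dedicated package (e.g. shap); a lighter stand-in that conveys the same per-feature attribution idea is permutation importance, shown here on toy cpx-like features where "Na" (column 0) drives the label by construction:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))          # stand-in major-element features
y = (X[:, 0] > 0.5).astype(int)         # diffusion-affected iff "Na" is high

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
top_feature = int(np.argmax(result.importances_mean))   # expect column 0
```

Unlike permutation importance, Shapley values additionally give signed, per-sample contributions, which is what enables the sample-level evaluation described above.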

How to cite: Li, A., Wu, S., Chen, H., Du, Z., and Xia, Q.: Explainable machine learning to uncover hydrogen diffusion mechanism in clinopyroxene, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19015, https://doi.org/10.5194/egusphere-egu24-19015, 2024.

14:30–14:40 | EGU24-5002 | Highlight | On-site presentation
Geochemistry π: Automated Machine Learning Python Framework for Tabular Data
Jianming Zhao, Johnny ZhangZhou, Can He, and Yang Lyu and the ZJU Earth Data Group

Machine learning has significantly advanced geochemistry research, but its implementation can be arduous and time-consuming. In response to this challenge, we introduce Geochemistry π, an open-source automated machine learning Python framework. With Geochemistry π, geochemists can effortlessly process tabulated data and execute machine learning algorithms by selecting preferred options. This streamlined process operates in a user-friendly question-and-answer format, eliminating the need for coding expertise. Following automatic or manual parameter adjustment, Geochemistry π furnishes users with comprehensive performance metrics and predictive outcomes for their machine learning models. Leveraging the scikit-learn library, Geochemistry π has developed a tailored automated workflow encompassing classification, regression, dimensionality reduction, and clustering algorithms. The framework's extensibility and portability are enhanced through a modular pipeline architecture, segregating data handling from algorithm application. Geochemistry π's Auto Machine Learning module integrates the Cost-Frugal Optimization and Blended Search Strategy hyperparameter search methods from FLAML, a fast and lightweight automated machine learning library. Additionally, model parameter optimization is expedited using the Ray distributed computing framework. Efficient machine learning lifecycle management is facilitated through integration with the MLflow library, allowing users to compare multiple trained models at various scales and manage generated data and visualizations. To enhance accessibility, Geochemistry π separates front-end and back-end frameworks, culminating in a user-friendly web portal. This portal not only showcases the machine learning model but also presents the data science workflow, making it accessible to both researchers and developers.
In summary, Geochemistry π offers a robust Python framework that empowers users and developers to significantly enhance their data mining efficiency, with options for both online and offline operation.
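The style of workflow that Geochemistry π automates can be sketched with plain scikit-learn (an illustrative stand-in, not the Geochemistry π API itself; the synthetic data, pipeline steps, and parameter grid are all assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic "tabulated geochemical data": 200 samples, 5 element concentrations
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A pipeline keeps data handling separate from the algorithm,
# mirroring the modular architecture described in the abstract.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=0)),
])

# Automated hyperparameter search over a small grid
search = GridSearchCV(pipe, {"model__n_estimators": [50, 100]}, cv=3)
search.fit(X_train, y_train)

r2 = r2_score(y_test, search.predict(X_test))
print(f"best params: {search.best_params_}, test R2: {r2:.3f}")
```

In the actual framework this search is delegated to FLAML's optimizers and distributed with Ray, but the pipeline-plus-search structure is the same idea.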

How to cite: Zhao, J., ZhangZhou, J., He, C., and Lyu, Y. and the ZJU Earth Data Group: Geochemistry π: Automated Machine Learning Python Framework for Tabular Data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5002, https://doi.org/10.5194/egusphere-egu24-5002, 2024.

14:40–14:50
|
EGU24-14857
|
ECS
|
Virtual presentation
Himanshi Bansal, Venkataramana Devarakonda, and Mayank Dixit

Groundwater is a natural water source crucial to sustaining ecosystems and meeting various human needs, yet it is often contaminated by anthropogenic and non-anthropogenic activities. Nitrate, which may be of exogenic or anthropogenic origin, is the most abundant groundwater pollutant. We studied nitrate ion concentration in groundwater from dug-well data. In Karnataka state, nitrate ion concentration varies from 0 to 1696 mg/l, which in most places exceeds the admissible limit of 45 mg/l set by the World Health Organisation (WHO). The correlation of various parameters, such as pH, electrical conductivity (EC), fluoride, and chloride, with nitrate was studied, and the strongest correlations were found with chloride and EC. We predicted nitrate ion concentration using different Machine Learning (ML) algorithms, including Regression, Random Forest (RF), Support Vector Regression (SVR) and Decision Tree (DT) models, with pH, EC, chloride, and fluoride as input parameters. The results showed that the best model was Support Vector Regression (SVR), with an R2 value of 0.93 and a Mean Square Error (MSE) of 0.02 for the region. The region's nitrate pollution might thus be forecast using the SVR model for better estimation.
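A minimal sketch of the SVR workflow described above, using synthetic stand-in data (the actual dug-well measurements are not reproduced here); the input columns mirror the abstract's pH, EC, chloride, and fluoride:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Synthetic stand-in for the dug-well dataset: pH, EC, chloride, fluoride
rng = np.random.default_rng(1)
n = 300
X = np.column_stack([
    rng.uniform(6.0, 8.5, n),     # pH
    rng.uniform(200, 3000, n),    # EC
    rng.uniform(10, 1000, n),     # chloride (mg/l)
    rng.uniform(0.1, 2.0, n),     # fluoride (mg/l)
])
# Nitrate correlated most strongly with chloride and EC, as in the study
y = 0.03 * X[:, 2] + 0.01 * X[:, 1] + rng.normal(scale=2.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Scaling matters for SVR: RBF kernel distances are scale-sensitive
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0))
model.fit(X_train, y_train)

pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
mse = mean_squared_error(y_test, pred)
print(f"R2 = {r2:.2f}, MSE = {mse:.2f}")
```

The scores obtained on this synthetic example are not comparable to the study's reported R2 of 0.93 and MSE of 0.02; the block only shows the shape of the method.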

How to cite: Bansal, H., Devarakonda, V., and Dixit, M.: Nitrate contamination prediction in Groundwater data in Karnataka, India, using Machine Learning (ML) Techniques, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14857, https://doi.org/10.5194/egusphere-egu24-14857, 2024.

14:50–15:00
|
EGU24-1619
|
ECS
|
On-site presentation
Hamid Zekri, David Cohen, Neil Rutherford, and Matilda Thomas

Field-based data acquired from drillholes by pXRF, spectrometers, and wireline logging can provide prompt, relatively inexpensive and precise information about the geochemistry, mineralogy and petrophysical properties of geological units. When on-site data capture is followed by proper visualisation and statistical analysis, these non-destructive methods can support rapid interpretation and decision-making.

Identifying distinctive units and critical zones in exploration under cover can be challenging even for experienced geologists when dealing with drilling chips. This study presents a data-driven framework for rapid boundary detection from drillhole cuttings through a combination of geochemical, mineralogical, and geophysical data. The workflow was tested on two drillholes during a drilling campaign conducted by the Mineral Exploration Cooperative Research Centre (MinEx CRC) for Geoscience Australia's Exploring for the Future program in the Delamerian Orogen in far western New South Wales, Australia.

A multivariate change point detection technique was applied to the 30 effective attributes retained from various geochemical variables, spectral scalars, and petrophysical parameters obtained through field-based instruments. These include major elements (e.g., Al, K, Ca, Fe), conserved elements (Ti and Zr), and trace elements (e.g., Cu, Pb, and Zn), as well as spectral features associated with ferric oxides, kaolinite, micas, smectite, chlorites, and epidote. Natural gamma, electrical conductivity and resistivity, and magnetic susceptibility were also used as petrophysical parameters. Various interfaces between the weathered profile and basement rocks were detected at two scales, providing useful insights into the stratigraphy and complementing the detailed geochemical logs previously produced by the field geologists. Using the different data types together resulted in more reliable boundary detection than using each data type on its own. This approach was also able to delineate a critical zone in the saprock above the fresh basement where elevated concentrations of lead and zinc have accumulated, providing guidance for more detailed sampling and analysis.

This framework can be utilised for data-driven stratigraphy/lithology logging, regolith characterisation, and identification of key horizons for further sampling and study, and can facilitate decision-making during exploration drilling campaigns.
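The multivariate change point idea can be illustrated with greedy binary segmentation on a sum-of-squares cost over depth-ordered attributes (a sketch of the general technique, not the authors' specific algorithm; the toy two-attribute "drillhole" is invented):

```python
import numpy as np

def segment_cost(x):
    """Sum of squared deviations from the segment mean, over all attributes."""
    return ((x - x.mean(axis=0)) ** 2).sum()

def find_change_points(x, min_size=5, min_gain=1.0):
    """Greedy binary segmentation for multivariate mean-shift change points.

    x: (n_samples, n_attributes) array, e.g. depth-ordered geochemical,
    spectral and petrophysical logs. Returns sorted split indices.
    """
    def best_split(lo, hi):
        base = segment_cost(x[lo:hi])
        best, gain = None, 0.0
        for k in range(lo + min_size, hi - min_size + 1):
            g = base - segment_cost(x[lo:k]) - segment_cost(x[k:hi])
            if g > gain:
                best, gain = k, g
        return best, gain

    splits, stack = [], [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        k, gain = best_split(lo, hi)
        if k is not None and gain > min_gain:
            splits.append(k)
            stack += [(lo, k), (k, hi)]
    return sorted(splits)

# Toy "drillhole": two attributes whose means shift at depths 40 and 70
rng = np.random.default_rng(2)
log = np.vstack([
    rng.normal([0, 0], 0.3, (40, 2)),
    rng.normal([3, -2], 0.3, (30, 2)),
    rng.normal([1, 4], 0.3, (30, 2)),
])
cps = find_change_points(log, min_gain=5.0)
print(cps)
```

Combining attributes in one cost function is what makes boundaries detectable even when no single log shifts strongly on its own, echoing the abstract's point about multi-data reliability.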

How to cite: Zekri, H., Cohen, D., Rutherford, N., and Thomas, M.: Near real-time drillhole data analysis using non-destructive mineral exploration tools, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-1619, https://doi.org/10.5194/egusphere-egu24-1619, 2024.

15:00–15:10
|
EGU24-5256
|
ECS
|
Highlight
|
On-site presentation
Pierre-Yves Raumer, Sara Bazin, Jean-Yves Royer, Cazau Dorian, and Vaibhav Vijay Ingale

Underwater seismic events such as earthquakes are known to produce not only seismic waves but also hydro-acoustic waves. Indeed, seismic waves arriving at the ocean bottom convert into acoustic waves in the water column. Other events, such as hot lava-seawater interactions or icequakes, also generate water-borne acoustic signals. Monitoring these different signals with moored hydrophones has proved useful and very efficient thanks to the low attenuation of acoustic waves propagating in the Sound Fixing and Ranging (SOFAR) channel. This led to the deployment of wide-range moored hydrophone networks to monitor active seafloor-spreading ridges in the world ocean. However, analyzing year-round data recordings from several stations is a cumbersome, user-dependent and, most importantly, time-consuming task. Despite some efforts to develop automatic detection algorithms, the community still lacks efficient, readily available off-the-shelf tools, as well as open datasets and benchmarks against which they could be compared objectively. To address this problem, we are glad to make publicly available three partially-annotated hydroacoustic datasets consisting of recordings from the Atlantic and Indian Oceans, totalling ~60,000 hours. We propose a benchmark of models on a first task of binary classification, and an original convolutional neural network (CNN) model called TiSSNet that shows promising results. To maximize the reliability of the evaluations, two datasets have been carefully and exhaustively annotated to serve as evaluation datasets. Getting-started code has also been made available on GitHub. We hope the datasets and benchmarks will be used as references upon which the state of the art can be developed in a collaborative way.
In the future, the best model, used as an automatic or semi-automatic detection framework, will be applied to larger datasets and combined with multi-station association and trilateration techniques to output nearly complete catalogs of geophonic events (source type and location, with signal characteristics).

How to cite: Raumer, P.-Y., Bazin, S., Royer, J.-Y., Dorian, C., and Vijay Ingale, V.: Hydroacoustic geophony automatic detection: an open benchmark dataset with an open model, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-5256, https://doi.org/10.5194/egusphere-egu24-5256, 2024.

15:10–15:20
|
EGU24-20777
|
On-site presentation
Carlo Cipolloni, Jasna Šinigoj, Martin Schiegl, Ángel Prieto Martin, Jorgen Tulstrup, and Stephan Gruijters

One of the main objectives of the European project GSEU (Geological Service for Europe) is to develop, starting from the EGDI (European Geological Data Infrastructure), a system for exchanging knowledge and skills in the geological and geothematic fields, as well as to strengthen the development of data and information standards and harmonization frameworks.

To achieve this objective, it was decided to evolve the EGDI towards a Geospatial Knowledge Infrastructure (EGDI-KI). The conceptual model and the first prototype under development are presented here; they were developed starting from the document published by UN-GGIM in 2021 on Geospatial Knowledge Infrastructure.

The essential elements of the EGDI-KI are shown in figure 1. The Knowledge Hub forms part of the EGDI structure and serves as a gateway that facilitates access to the other components. It enhances the accessibility and usability of the following components:

  • Data Hub: This component focuses on data exchange, access technologies, and supports data science, data engineering, and data warehouse endpoints. The Knowledge Hub helps streamline access to the data hub, making it easier for users to leverage data resources.
  • Applications: The Applications component encompasses WebGIS or thematic portals designed to share information, data, and enable big data analysis. The Knowledge Hub contributes to the seamless integration and utilization of these applications, ensuring efficient access to information and facilitating analysis.
  • Collaboration Tools: Collaboration Tools within EGDI enable the sharing of documents, models, and methods among users. The Knowledge Hub complements this by providing a platform for organizing and accessing these shared resources, fostering collaboration and knowledge exchange.
  • Educational Facilities: EGDI includes educational facilities that support end-users and thematic domains in sharing and transferring knowledge. The Knowledge Hub plays a role in facilitating access to these educational resources, making them readily available to users seeking to enhance their understanding of relevant topics.
  • Expertise & Networking Hub: a catalogue of thematic experts and physical infrastructure, as well as a channel for possible interactions between the research and industry communities.

Finally, the knowledge infrastructure platform is the portal to query and navigate all the knowledge resources available in the Knowledge Hub.

The Knowledge Hub plays an important role in the European Geological Data Infrastructure (EGDI) by ensuring that the wealth of knowledge and expertise available within the system is not fragmented and disconnected. Instead, it enables the organisation and accessibility of this knowledge using a semantic Knowledge engine.

By leveraging the semantic Knowledge engine, the Knowledge Hub facilitates the integration and structuring of diverse pieces of information within the EGDI system. It allows for the establishment of meaningful connections and relationships between different data sources, ensuring a coherent and organized presentation of knowledge.

Through the Knowledge Hub, users can efficiently navigate and explore the EGDI system, accessing relevant information in a structured and interconnected manner. It enhances the overall usability and effectiveness of the system, enabling users to leverage the collective knowledge and expertise within the EGDI framework.

How to cite: Cipolloni, C., Šinigoj, J., Schiegl, M., Martin, Á. P., Tulstrup, J., and Gruijters, S.: The EGDI Knowledge Infrastructure – the next step of Geological data, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-20777, https://doi.org/10.5194/egusphere-egu24-20777, 2024.

15:20–15:30
|
EGU24-9192
|
On-site presentation
Pranav Chandramouli, Caroline M. Gevaert, Francesco Nattino, Ou Ku, Alexandra Aguiar Pedro, Patricia do Prado Oliveira, Eduardo Hortal Pereira Barreto, and Felipe de Ovileira

Despite the increased availability of UAV / drone imagery in low- to upper-middle-income countries and the demonstrated potential of deep learning to support the interpretation of these images for sustainable development purposes, practical operations in these countries are constrained by the need for sufficiently large labeled datasets, which are often difficult to obtain (especially for tropical forests). This makes it difficult to train suitable networks and assess whether a model is performing well. One such example is the use of drones to monitor the Atlantic Forest in Sao Paulo, Brazil. Here, members of the Sao Paulo Municipal Green and Environment Secretariat (Secretaria do Verde e do Meio Ambiente - SVMA) are starting to use drones to identify some native and invasive species in their forests. Deep learning could greatly speed up this process, but there is little training data available. This is the so-called ‘small-data problem’ commonly found in DL for remote sensing applications [1]. A workflow was designed to support this application through a novel zero-shot learning technique and explainable AI methods. A pre-trained tree-crown detection model, ‘DeepForest’ [2], is used to identify individual tree crowns in the UAV imagery. The detected tree crowns are then classified with a Siamese network architecture using zero-shot learning: the model is trained on relevant datasets but not exposed to the species found in the test dataset. The Siamese architecture is motivated by the need for explainability in DL models, since the results will be used to make administrative decisions for forest management. A more intricate DL model (such as image segmentation) could be more accurate, but at the cost of transparency/explainability. In particular, we apply a variation of the ‘What I Know’ (WIK) explainability method [3], which presents examples from the training set alongside the test sample, increasing the transparency and understanding of the model results.
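The zero-shot matching step can be illustrated with a minimal nearest-prototype scheme on embedding vectors (a drastically simplified stand-in for the Siamese network; the class names and embeddings below are invented, and in the real workflow the embeddings would come from the trained encoder):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(query, support):
    """Assign the query embedding to the class of its closest support example.

    support: {class_name: list of embedding vectors}. The classes need not
    have appeared in training -- only reference embeddings are needed, which
    is the essence of zero-shot matching. Returning the best-matching example
    itself is what enables WIK-style, example-based explanations.
    """
    best = max(
        ((name, ex, cosine(query, ex))
         for name, exs in support.items() for ex in exs),
        key=lambda t: t[2],
    )
    return best[0], best[2]

# Hypothetical tree-crown embeddings for two species classes
support = {
    "native_sp": [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1])],
    "invasive_sp": [np.array([0.0, 1.0, 0.3])],
}
label, score = classify(np.array([0.95, 0.15, 0.05]), support)
print(label, round(score, 3))
```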

[1] Safonova, Anastasiia, et al. "Ten deep learning techniques to address small data problems with remote sensing." International Journal of Applied Earth Observation and Geoinformation 125 (2023): 103569.

[2] Weinstein, Ben G., et al. "DeepForest: A Python package for RGB deep learning tree crown delineation." Methods in Ecology and Evolution 11.12 (2020): 1743-1751.

[3] Ishikawa, Shin-nosuke, et al. "Example-based explainable AI and its application for remote sensing image classification." International Journal of Applied Earth Observation and Geoinformation 118 (2023): 103215.

How to cite: Chandramouli, P., Gevaert, C. M., Nattino, F., Ku, O., Aguiar Pedro, A., do Prado Oliveira, P., Hortal Pereira Barreto, E., and de Ovileira, F.: XAI for small-data problems in remote sensing: monitoring Atlantic forests with UAVs, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9192, https://doi.org/10.5194/egusphere-egu24-9192, 2024.

15:30–15:40
|
EGU24-6817
|
Highlight
|
On-site presentation
Sengyong Choi, Do Woo Kim, Eun Hye Shin, and Yeon Ju Kim

In Korea, when a disaster occurs, numerous related news reports are published very quickly through various media. These reports contain useful disaster-related information that disaster researchers need, such as the causes of disasters, problems in the process of disaster occurrence, and improvement measures suggested by relevant experts. However, finding articles containing the needed disaster-related information among the numerous news reports in the media is not easy and takes a long time. Accordingly, in this study, the R-Scanner model, which uses text mining technology, was developed to extract disaster and safety information desired by users from large-scale news data, a form of unstructured big data. Here, R stands for Risk. The model was built on natural language processing systems for Korean and English and performs Sentence Segmentation, Tokenization, and Morphological Analysis on text as the analysis data. Within the Morphological Analysis process, the model performs Entity Recognition, Semantic Role Labeling, and Semantic Chunking. Additionally, when the user inputs keywords related to the desired information, the model extracts articles containing that information from news big data reported through the media, and the extracted articles can be downloaded in Excel format. To verify the performance of the developed model, we applied it to landslides that caused 14 deaths due to torrential rains in Korea in 2023. Problems and improvement measures in the landslide occurrence process were set as the desired information, and keywords were set to extract each kind of information. About 200 keywords related to problems were set, such as 'procrastination', 'defenseless', 'ignored', 'sloppy', and 'careless', and dozens of keywords such as ‘suggested’, ‘should be prepared’, and ‘necessary’ were set as keywords related to improvement measures.
After applying the model, a total of 364 articles related to problems and improvement measures were extracted from 15,911,665 articles published by 30 media outlets, and after grouping the extracted problems and improvement measures by similar content, 24 problems and 22 improvement measures were finally derived. A review by relevant experts confirmed that the derived problems and improvement measures were quite meaningful. They were then used as basic data for establishing government measures to prevent landslides. In the future, the developed model is expected to be used not only to establish the government's disaster countermeasures but also to monitor disaster and safety issues in real time, and furthermore to detect disaster risks at an early stage.
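The keyword-matching stage of a model like R-Scanner can be sketched in a few lines (a simplified illustration; the keyword lists below are English stand-ins for the Korean keywords quoted in the abstract, and the articles are invented):

```python
# English stand-ins for the Korean keyword lists described in the abstract
PROBLEM_KWS = {"procrastination", "defenseless", "ignored", "sloppy", "careless"}
MEASURE_KWS = {"suggested", "should be prepared", "necessary"}

def tag_article(text):
    """Return which keyword categories an article matches, if any."""
    lowered = text.lower()
    tags = []
    if any(kw in lowered for kw in PROBLEM_KWS):
        tags.append("problem")
    if any(kw in lowered for kw in MEASURE_KWS):
        tags.append("improvement")
    return tags

articles = [
    "Officials ignored repeated warnings about slope instability.",
    "Experts suggested early-warning sirens should be prepared in valleys.",
    "Rainfall totals reached record levels over the weekend.",
]
# Keep only articles that match at least one category
hits = {a: tag_article(a) for a in articles if tag_article(a)}
for a, tags in hits.items():
    print(tags, "->", a[:40])
```

The real model adds morphological analysis and semantic labeling on top of this matching, which matters for agglutinative Korean text where plain substring search is insufficient.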

How to cite: Choi, S., Kim, D. W., Shin, E. H., and Kim, Y. J.: Development and Application of a Model to Extract Disaster and Safety-related Information from News Big Data reported in the Media using Text Mining, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-6817, https://doi.org/10.5194/egusphere-egu24-6817, 2024.

15:40–15:45

Posters on site: Mon, 15 Apr, 16:15–18:00 | Hall X4

Display time: Mon, 15 Apr, 14:00–Mon, 15 Apr, 18:00
Chairpersons: Luigi Bianco, Giacomo Roncoroni, Ivana Ventola
X4.189
|
EGU24-2695
Liubomyr Gavryliv, Vitalii Ponomar, and Marián Putiš

Recently, there has been a substantial surge in the availability of web services offering access to geological, geochemical, crystallographic, and mineralogical data. Embracing this data-rich environment, mineralogy.rocks emerges as a pioneering outreach project, poised to harness the vast potential embedded in this information reservoir. Focused on extracting valuable insights, the project leverages the wealth of open-access data and fosters knowledge dissemination by openly sharing the underlying code of its processes under the MIT license, available at https://github.com/orgs/mineralogy-rocks.

Mineralogy.rocks' core developers recently tackled the challenge of establishing relationships between minerals and their associated entities such as synonyms, varieties, and parental groups. Often, these related entries lack distinct properties; synonyms may only have a name and historical context, chemical varieties might differ only in impurity presence, and structural variations may diverge solely in crystal system. Database-wise, all other properties remain identical to the parent mineral. 

In response, we introduce the concept of Data Inheritance, drawing parallels with Object-Oriented Programming's class inheritance mechanism. This concept permits multiple base classes, enabling a derived class to override methods of its base class or classes, thus allowing objects to encompass diverse and arbitrary data. Applied to a data warehouse dimension, this concept facilitates the retrieval of the actual properties of a related entry defined in the database and the inherited properties not defined for this specific entry but established for the parental mineral. 

To implement this, we calculate the inheritance chain, representing the chain of relations from the bottom-most child entry to the top-most parental mineral, such as in the case of agate—chalcedony—quartz. The chain, coupled with specific code rules and patterns, enables the retrieval of properties for each entry in the chain, effectively determining which properties are pertinent to the child species. This systematic approach adds precision and clarity to the extraction and utilization of mineralogical data in the context of inherited properties.
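The inheritance chain and property resolution described above can be sketched with a plain dictionary-based model (an illustrative schema with a single parent per entry, not the actual mineralogy.rocks data model, which also supports multiple base classes):

```python
# Minimal sketch of the inheritance-chain lookup (illustrative schema)
MINERALS = {
    "quartz":     {"parent": None,         "formula": "SiO2",
                   "crystal_system": "trigonal"},
    "chalcedony": {"parent": "quartz",     "habit": "cryptocrystalline"},
    "agate":      {"parent": "chalcedony", "habit": "banded"},
}

def inheritance_chain(name):
    """Chain from the child entry up to the top-most parental mineral."""
    chain = []
    while name is not None:
        chain.append(name)
        name = MINERALS[name]["parent"]
    return chain

def resolve(name, prop):
    """Own property if defined, otherwise the nearest ancestor's value."""
    for entry in inheritance_chain(name):
        if prop in MINERALS[entry]:
            return MINERALS[entry][prop], entry
    return None, None

print(inheritance_chain("agate"))   # the agate -> chalcedony -> quartz chain
print(resolve("agate", "habit"))    # own property, overriding the parent's
print(resolve("agate", "formula"))  # inherited from quartz
```

This mirrors the abstract's example: agate stores only what differs from chalcedony, and everything else is resolved by walking the chain.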

mineralogy.rocks is dedicated to open science, prioritizing innovation, quality, and public impact in mineralogical research. Our commitment is evident through actions that swiftly share research outcomes and metadata, fostering accessibility and reuse. Embracing open science principles, we contribute to advancing the field with transparent and collaborative practices.

This project, No. 3007/01/01, has received funding from the European Union’s Horizon 2020 research and innovation Programme based on a grant agreement under the Marie Skłodowska-Curie scheme No. 945478 and was supported by the Slovak Research and Development Agency (contracts APVV-19-0065 and APVV-22-0092).

How to cite: Gavryliv, L., Ponomar, V., and Putiš, M.: Data inheritance concept in mineralogical warehouse, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-2695, https://doi.org/10.5194/egusphere-egu24-2695, 2024.

X4.190
|
EGU24-7272
|
ECS
|
Che-Wei Liang and Jui-Pin Tsai

The ground-source heat pump (GSHP) is an efficient thermal exchange system that utilizes natural environmental heat for heating and cooling. Heat exchange efficiency depends not only on factors such as pipe material and diameter but also on the groundwater flow field and the soil's thermal parameters. This study aims to estimate hydraulic and geothermal parameters using convolutional encoder-decoder neural networks and hydraulic tomography, a data collection strategy; the proposed method is named THT-NN. To examine the capability of THT-NN for parameter estimation, we developed numerical experiments. To produce the training and validation data pairs, we created a two-dimensional heterogeneous groundwater and heat transport model in TOUGH2 with constant injection patterns and 10,000+ realizations of parameter fields. The groundwater heads and temperatures collected from the monitoring well groups form the two channels of the input layers, and four parameter fields (hydraulic conductivity, porosity, heat conductivity, and specific heat) form the four channels of the output layers. The estimated parameter fields are evaluated using R2 and root mean squared error. The performance of the proposed THT-NN is discussed in this study.

How to cite: Liang, C.-W. and Tsai, J.-P.: Estimation of Hydraulic and Thermal Parameters Using Convolutional Neural Network and Hydraulic Tomography, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7272, https://doi.org/10.5194/egusphere-egu24-7272, 2024.

X4.191
|
EGU24-8843
wanli ma, Hugh Coe, David Topping, Zhonghua Zheng, Congbo Song, and Hao Zhang

Satellite monitoring plays a significant role in measuring nitrogen dioxide (NO2) concentrations in the atmospheric column, but it is often affected by clouds and by ice- and snow-covered surfaces, which leads to substantial missing data. Deep learning with a Partial Convolutional Neural Network (PCNN) is adept at handling incomplete or missing data in image processing by focusing only on the known pixels during convolution, making the approach well suited to tasks such as image restoration, denoising, and resolution enhancement.

It is therefore important to reduce such data gaps. Under cloudy skies, ground-level NO2 often tends to be higher. Clouds are typically associated with low pressure and increased wind speeds in mid-latitudes, leading to enhanced dispersion of pollutants. However, low cloud often occurs during periods of high pressure when boundary layer heights are lower and air pollutants are trapped closer to the ground. Additionally, clouds intensify the Surface Sensible Heat Flux, contributing to the urban heat island effect and potentially increasing NO2 concentrations. On the other hand, clouds decrease Surface Net Solar Radiation, which might mitigate NO2 photolysis.

It is therefore likely that NO2 concentrations close to the surface during cloudy conditions will not necessarily be well represented by satellite-derived NO2 columns obtained in clear-sky conditions, and it becomes necessary to recalibrate satellite-derived data to reflect actual meteorological conditions. In this work we separate ground-level data from an urban network across Paris, France, into two categories: those with contemporaneous TROPOMI observations and those without. Each category is then analyzed against the weather conditions at that time. This analysis helps estimate the variance in NO2 concentrations due to cloud presence. Subsequently, the determined percentage difference, indicative of the cloud cover's impact, is applied to the NO2 estimates provided by the PCNN model.

This adjustment strengthens not only the data's coverage but also its reliability, reducing the biases in the original satellite data that result from clear-sky viewing only and thereby giving a closer representation of urban atmospheric pollution. This approach, combining technical precision with contextual sensitivity, improves the use of satellite data as a tool for understanding and interpreting urban pollution.
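The core idea of partial convolution — renormalising each window by its valid-pixel count so that masked (cloudy) pixels do not bias the output, and growing the mask each pass — can be sketched in NumPy (a simplified, weights-of-one version, not the authors' PCNN):

```python
import numpy as np

def partial_conv_fill(grid, mask, passes=3, k=1):
    """Iteratively fill missing pixels from valid neighbours only.

    grid: e.g. an NO2 column field; mask: 1 where the satellite retrieval
    exists, 0 under cloud. Each output pixel averages only the valid pixels
    in its (2k+1)^2 window, renormalised by the valid-pixel count.
    """
    grid, mask = grid.copy(), mask.astype(float).copy()
    rows, cols = grid.shape
    for _ in range(passes):
        new_grid, new_mask = grid.copy(), mask.copy()
        for i in range(rows):
            for j in range(cols):
                if mask[i, j]:
                    continue  # already valid, leave untouched
                sl = np.s_[max(i - k, 0):i + k + 1, max(j - k, 0):j + k + 1]
                valid = mask[sl].sum()
                if valid:  # renormalise by number of valid neighbours
                    new_grid[i, j] = (grid[sl] * mask[sl]).sum() / valid
                    new_mask[i, j] = 1.0
        grid, mask = new_grid, new_mask  # mask grows each pass
    return grid, mask

# Toy field with one cloud-masked pixel
field = np.full((5, 5), 10.0)
cloud = np.ones((5, 5))
cloud[2, 2] = 0.0
field[2, 2] = np.nan
filled, m = partial_conv_fill(np.nan_to_num(field), cloud)
print(filled[2, 2], m.min())
```

A trained PCNN replaces the all-ones weights with learned kernels, but the mask-renormalisation mechanism is the same.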

How to cite: ma, W., Coe, H., Topping, D., Zheng, Z., Song, C., and Zhang, H.: Use of deep learning and a partial convolutional neural network to gap-fill a long term time series of NO2 columns from satellite impacted by cloud, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8843, https://doi.org/10.5194/egusphere-egu24-8843, 2024.

X4.192
|
EGU24-13948
Progress in the construction of DDE OnePetrology Igneous Rock Database
(withdrawn after no-show)
Yi Ding
X4.193
|
EGU24-14437
|
ECS
Soo-Jin Lee and Yangwon Lee

Droughts are expected to increase in both frequency and severity, exacerbated by rising global temperatures associated with climate change. These trends pose serious threats to the agricultural sector, directly impacting food production and security. Moreover, increasing drought incidence increases the risks associated with agricultural and forestry disasters, including reduced crop yields, soil degradation, and wildfires. Given these challenges, the ability to accurately monitor and predict drought conditions is critical. Effective drought forecasting plays an important role in establishing agricultural and water management policies and enabling better handling of the impacts of these events. This will enable timely and informed decisions to ensure that appropriate measures are in place to mitigate the adverse impacts of drought on ecosystems, food supplies and overall environmental health. The development and improvement of tools for drought time series forecasting is therefore essential to ongoing efforts to adapt to and mitigate the impacts of climate change. This study introduces a model designed to predict Vegetation Health Index (VHI) time series data using the Predictive Recurrent Neural Network Version 2 (PredRNN-V2). The VHI, which effectively integrates land surface temperature and vegetation status, has been widely used in drought assessment. The study focuses on South Korea, utilizing long-term weekly VHI data from NOAA for short-term prediction. The PredRNN-V2 model utilizes a network of interconnected spatio-temporal LSTM cells to learn and predict the temporal and spatial characteristics of time series images. This architecture can properly handle the complex spatial and temporal dynamics inherent in satellite-based drought data and can therefore be an effective tool for drought prediction.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2022R1I1A1A01073185).

How to cite: Lee, S.-J. and Lee, Y.: PredRNNv2-based drought prediction using Vegetation Health Index (VHI), EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-14437, https://doi.org/10.5194/egusphere-egu24-14437, 2024.

X4.194
|
EGU24-16473
|
ECS
Segment Every Fossil - A Deep Learning Model Tailored for Automatic Segmentation of Microfossils in Thin Section Images
(withdrawn after no-show)
Ivan Ferreira-Chacua and Ardiansyah Koeshidayatullah
X4.195
|
EGU24-18515
|
ECS
Valerie Locher, Rebecca Bell, Cedric John, and Parastoo Salah

Variations in earthquake frequency and magnitude across global subduction zones are thought to be influenced by a combination of geological and geophysical factors, such as the age and dip angle of the subducting plate. Despite numerous previous qualitative studies on the correlation between seismic behaviour and subduction zone characteristics, the parameters and mechanisms governing seismicity at subduction zones remain elusive. Our limited historical record of earthquakes further complicates this understanding. Finding underlying general correlations and mechanisms that are valid across different subduction trenches is critical for assessing seismic behaviour and earthquake hazards along subduction plate boundaries which are poorly monitored or have been seismically quiet during the short instrumental record. 
This study aims to bridge the knowledge gaps highlighted above by applying specific unsupervised machine learning techniques to publicly available data on subduction zone parameters and earthquake catalogues. This approach is particularly adept at uncovering hidden correlations in complex, high-dimensional datasets, which might not be discernible through traditional analysis methods. We suggest that seismic behaviour may be describable as a non-linear combination of subduction margin parameters and present a quantitative tool for comparing seismic behaviours across different margins. This may help assess seismic hazards in regions with scant seismic records or that have been historically quiescent. By doing so, we hope to contribute significantly to the predictive modelling of earthquake occurrences and their potential impacts globally.  

How to cite: Locher, V., Bell, R., John, C., and Salah, P.: Toward Determining the Controls on Subduction Zone Seismic Behaviour with Machine Learning, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-18515, https://doi.org/10.5194/egusphere-egu24-18515, 2024.

X4.196
|
EGU24-19139
|
ECS
Muhammad Asif Suryani, Ewa Burwicz-Galerne, Klaus Wallmann, and Matthias Renz

Research Data Management (RDM) in the natural sciences establishes a structured foundation for organizing and preserving scientific data. Effective management of and access to these diverse data sources are crucial for supporting domain scientists in future knowledge discovery. Scientific publications, a primary data source often presented in Portable Document Format (PDF), serve as a rich source of information, encompassing text, tables, figures, and metadata. These components present information individually or collectively, offering the potential to explore exciting research directions. To fully exploit these aspects, however, it is necessary to acquire data from the publications, focus on these data components, and carry out the corresponding information extraction. Furthermore, modeling the extracted information into a Heterogeneous Information Network of publications enhances accessibility, collaboration, and information harvesting within the natural sciences domain.

We developed a comprehensive framework, ensuring user accessibility and widespread applicability, that is capable of modeling diverse information from marine science publications into a Heterogeneous Information Network. The framework comprises three modules: Data Acquisition, Information Extraction, and Information Modeling. The Data Acquisition (DA) module extracts various data components from the relevant publications and transforms them into machine-readable formats. The Information Extraction (IE) module includes two sub-modules: Named Entity Recognition (NER) models trained on annotated marine-science text, capable of extracting eight different types of entities from plain text; and an information parser module responsible for extracting quantitative information from tabular data, which first detects and then extracts scientific measurements, relevant spatial information, and other available characteristics. Finally, the Information Modeling module assembles the extracted information from the data components and performs information linking. Consequently, the information is structured into a Heterogeneous Information Network (HIN) of scientific publications, ensuring effective information delivery and providing diverse information to domain experts while supporting the Research Data Management initiative.
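The measurement-parsing idea can be illustrated with a small regex-based extractor (the pattern, units, and sentence below are hypothetical examples, not the framework's actual parser):

```python
import re

# Hypothetical pattern: a number, a unit, and an optional depth qualifier,
# as such quantities might appear in marine-science text or tables.
MEASUREMENT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mg/l|µmol/kg|cm/kyr)"
    r"(?:\s+at\s+(?P<depth>\d+(?:\.\d+)?)\s*m)?"
)

def extract_measurements(text):
    """Pull (value, unit, depth-or-None) tuples out of plain text."""
    return [
        (float(m["value"]), m["unit"],
         float(m["depth"]) if m["depth"] else None)
        for m in MEASUREMENT.finditer(text)
    ]

sentence = "Sulfate decreased to 12.5 µmol/kg at 340 m; sedimentation was 4 cm/kyr."
print(extract_measurements(sentence))
```

In the full framework such parsed quantities become nodes and attributes linked into the HIN alongside NER-derived entities.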

How to cite: Suryani, M. A., Burwicz-Galerne, E., Wallmann, K., and Renz, M.: Data Bridges: Modeling Marine Science Information to Heterogeneous Information Network for Research Data Management, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-19139, https://doi.org/10.5194/egusphere-egu24-19139, 2024.

Posters virtual: Mon, 15 Apr, 14:00–15:45 | vHall X4

Display time: Mon, 15 Apr, 08:30–Mon, 15 Apr, 18:00
Chairpersons: Jie Dodo Xu, J ZhangZhou, Guillaume Siron
vX4.22
|
EGU24-77
|
ECS
Yuan Meng and Liqiang Zhang

The Mesozoic in the Chengdao-Zhuanghai area has been shaped by a complex tectonic evolution and hosts diverse sedimentary types and lithologies. Reservoir heterogeneity is extremely strong and reservoir quality is difficult to predict, so the accurate identification and division of lithofacies types plays a crucial role in reservoir classification and evaluation. For well sections with relatively few cores, four logging curves sensitive to diagenesis (GR, AC, DEN, and RD) were selected as the basis for diagenetic facies division, which was carried out by machine learning. Traditional machine learning falls into two categories: supervised learning, which requires a large number of training samples to ensure accuracy, and unsupervised learning, which needs no training samples but may not yield the expected classification types. Given the strong heterogeneity, the relatively few cored sections, and the limited results of plain unsupervised learning in this area, an unsupervised learning method with a single-factor constraint was adopted to identify and divide the logging facies of the three formations in the Chengdao-Zhuanghai area. Combined with geological data such as cores, cast thin-section identification, and logging data, the logging facies were calibrated against diagenetic facies, completing the identification and division of regional diagenetic facies. Finally, the accuracy of the method was verified by comparison with thin-section identification results, providing a basis for identifying reservoir diagenetic facies in well sections that lack cores.

Keywords: clastic rocks; Chengdao-Zhuanghai area; Mesozoic; diagenetic facies logging identification; univariate-constrained unsupervised learning
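As an illustrative sketch only (not the authors' code), the unsupervised step can be mimicked with Lloyd's k-means on standardized versions of the four curves, with the single-factor constraint approximated by seeding the centroids from quantile groups of one constraining curve (here GR). The well-log data below are synthetic.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Minimal Lloyd's k-means. Centroids are seeded by ranking samples on
    the first column (the constraining "single factor", here GR) and
    splitting the ranking into k quantile groups."""
    order = np.argsort(X[:, 0])
    centroids = np.array([X[chunk].mean(axis=0)
                          for chunk in np.array_split(order, k)])
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

rng = np.random.default_rng(0)
# Synthetic stand-ins for two facies in the four diagenesis-sensitive
# curves (columns: GR, AC, DEN, RD); values are illustrative only.
facies_a = rng.normal([60, 90, 2.30, 5], [2, 3, 0.02, 0.5], size=(50, 4))
facies_b = rng.normal([110, 60, 2.60, 30], [2, 3, 0.02, 0.5], size=(50, 4))
logs = np.vstack([facies_a, facies_b])

X = (logs - logs.mean(axis=0)) / logs.std(axis=0)  # standardize each curve
labels = kmeans(X, k=2)
```

Standardizing each curve first matters because the raw logs have very different numeric ranges (DEN near 2.5, GR near 100), and unscaled distances would be dominated by the widest-ranging curve.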

How to cite: Meng, Y. and Zhang, L.: Identification and Application of Detrital Diagenetic Facies Logging Based on Unsupervised Learning Technology: A Case Study of the Mesozoic in Chengdao-Zhuanghai Area, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-77, https://doi.org/10.5194/egusphere-egu24-77, 2024.

vX4.23
|
EGU24-8771
|
ECS
|
Evelyn Jessica Jaya, Xinbing Wang, Chenghu Zhou, and Nanyang Ye

Mineral Prospectivity Mapping (MPM) is a crucial process in mineral exploration, traditionally hampered by subjective interpretations and labor-intensive methods that lead to unreliable outcomes. Predominantly focused on Independent and Identically Distributed (IID) scenarios, traditional research in MPM often struggles to generalize to Out-of-Distribution (OOD) scenarios, which are vital for accurate mineral exploration. Addressing these challenges, we introduce an innovative automated conformalized causal learning system for MPM. The system integrates a comprehensive data preprocessing pipeline, including interpolation, feature filtering, data augmentation, and splitting, that effectively manages diverse and imbalanced geological datasets. A central component is Bayesian Optimization, which autonomously selects optimal machine learning models and hyperparameters and significantly enhances performance over non-automated methods. The system's most significant innovation is the incorporation of conformalized causal learning, which is exceptionally effective on OOD data. This methodology introduces an 'uncertainty region' into predictive models through conformal prediction, substantially reducing misclassification risks, while causal learning elucidates the complex cause-and-effect relationships among geological features that are essential for precise mineral deposit predictions. We evaluated our approach on six datasets, where the area under the receiver operating characteristic curve (AUC-ROC) of the automated optimized system surpassed the baseline method by an overall 17.84% and the false positive rate (FPR) was reduced by an overall 84.31%. This development marks a significant advancement in MPM, enhancing accuracy and efficiency in mineral resource exploration and setting a new benchmark in the field. Released as an open-source platform, the system offers the geological community a highly efficient, adaptable, and user-friendly tool, poised to revolutionize mineral prospectivity mapping in varied real-world scenarios.
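The 'uncertainty region' idea can be sketched with a minimal split-conformal classifier. This is a generic textbook construction, not the authors' released system: a calibration set fixes a nonconformity threshold, and any test sample whose prediction set is not a single class is flagged as uncertain rather than force-classified. All probabilities below are synthetic.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal prediction sets for classification.
    cal_probs / test_probs: (n, n_classes) predicted class probabilities.
    Returns a boolean mask: class j is in sample i's set iff mask[i, j]."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected (1 - alpha) quantile of calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    # A class enters the set if its nonconformity would fall below q.
    return test_probs >= 1.0 - q

rng = np.random.default_rng(0)
n_cal = 200
cal_labels = rng.integers(0, 2, n_cal)
p_true = rng.uniform(0.6, 0.99, n_cal)   # a mostly-confident classifier
cal_probs = np.empty((n_cal, 2))
cal_probs[np.arange(n_cal), cal_labels] = p_true
cal_probs[np.arange(n_cal), 1 - cal_labels] = 1.0 - p_true

# One confident and one ambiguous test sample.
test_probs = np.array([[0.97, 0.03], [0.50, 0.50]])
mask = conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1)
```

The confident sample yields a singleton set (a normal classification), while the ambiguous one does not, so it lands in the uncertainty region and can be routed to further inspection instead of risking a misclassification.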

How to cite: Jaya, E. J., Wang, X., Zhou, C., and Ye, N.: An Automated Conformalized Causal Learning System for Enhanced Mineral Prospectivity Mapping, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-8771, https://doi.org/10.5194/egusphere-egu24-8771, 2024.

vX4.24
|
EGU24-11850
|
ECS
|
solicited
|
Evelyn Jessica Jaya, Xinbing Wang, Chenghu Zhou, and Nanyang Ye

In our study, we present a segmentation-based, rule-driven classification of igneous rocks through the analysis of thin section photomicrographs, representing a significant advancement over traditional petrographic methods. This deep learning-based approach is especially innovative in its recognition that the naming of rocks is intrinsically linked to the proportion of the minerals they contain, a vital aspect frequently overlooked in conventional classification techniques. By focusing on accurately quantifying these mineral proportions, our method effectively addresses the subjectivity and observer variability inherent in traditional petrography. Utilizing semantic image segmentation on 963 petrographic thin section photomicrographs, we have successfully identified 29 distinct minerals and classified 15 types of igneous rocks. This showcases the precision and scope of our approach, which automates the quantification of mineral proportions, thus ensuring a more objective and precise rock classification. The development of our proprietary dataset mask, despite its labor-intensive nature and the challenges of incomplete labelling, was crucial for achieving accurate segmentation based on the proportional regions of each mineral within the photomicrographs. This segmentation, key to our rule-driven classification, streamlines the rock naming process. Our method not only sets new standards in igneous rock classification but also signifies a transformative leap in geological research. By integrating advanced image processing with deep learning, we are opening new frontiers in Earth sciences, highlighting the transformative impact of technology in refining traditional geological methodologies. Given the dataset's incomplete and highly imbalanced mask scenario, our method achieves an accuracy of 73.32%, significantly surpassing the baseline method using VGG16 as the backbone, which attains only 63.64% classification accuracy.
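The proportion-based naming step can be sketched as follows. The rule table here is a deliberately simplified toy keyed to a single diagnostic mineral; it is neither the IUGS/QAPF scheme nor the authors' 15-class rules, and the mask and mineral labels are synthetic.

```python
import numpy as np

def name_from_mask(mask, labels):
    """mask: 2-D array of per-pixel mineral class ids (the output of a
    semantic segmentation model); labels: class id -> mineral name.
    Returns (modal proportions, rock name from a toy rule table)."""
    ids, counts = np.unique(mask, return_counts=True)
    props = {labels[i]: c / mask.size for i, c in zip(ids, counts)}
    # Toy rule: name from the modal proportion of quartz alone.
    quartz = props.get("quartz", 0.0)
    if quartz > 0.20:
        rock = "granitoid"
    elif quartz > 0.05:
        rock = "quartz-bearing intermediate rock"
    else:
        rock = "quartz-poor rock"
    return props, rock

# Synthetic 100 x 100 segmentation mask: 30% quartz, 7% biotite,
# remainder plagioclase.
labels = {0: "plagioclase", 1: "quartz", 2: "biotite"}
mask = np.zeros((100, 100), dtype=int)
mask[:, :30] = 1
mask[:10, 30:] = 2
props, rock = name_from_mask(mask, labels)
```

Because pixel counts in a segmentation mask are a direct proxy for modal (areal) mineral proportions, the naming rule becomes a deterministic lookup; all subjectivity is pushed into the segmentation model, which is exactly where supervised training data can discipline it.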

How to cite: Jaya, E. J., Wang, X., Zhou, C., and Ye, N.: Revolutionizing Igneous Rock Classification: Proportion-Based Deep Learning Analysis of Petrographic Thin Section Photomicrographs, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-11850, https://doi.org/10.5194/egusphere-egu24-11850, 2024.