LithoNet: A benchmark dataset for machine learning with digital outcrops

Sam Thiele; Ahmed J. Afifi; Sandra Lorenz; Raimon Tolosana-Delgado; Moritz Kirsch; Pedram Ghamisi; Richard Gloaguen

doi:https://doi.org/10.5194/egusphere-egu23-14007

Session TS11.1

EGU23-14007

EGU General Assembly 2023

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Sam Thiele¹, Ahmed J. Afifi¹,², Sandra Lorenz¹, Raimon Tolosana-Delgado¹, Moritz Kirsch¹, Pedram Ghamisi¹, and Richard Gloaguen¹

¹Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz-Institut Freiberg, School of Earth, Atmosphere and Environment, Freiberg, Germany (sam.thiele01@gmail.com)
²Karlsruher Institut für Technologie (KIT), Institut für Industrielle Informationstechnik (IIIT), Karlsruhe, Germany

Deep learning techniques are increasingly used to automatically derive geological maps from digital outcrop models, reducing interpretation time and (ideally) bias. Such techniques are especially needed when hyperspectral images are back-projected to create data-rich ‘hypercloud’ digital outcrop models. However, accurately validating these automated mapping approaches remains a significant challenge, owing to the subjective nature of geological mapping and the difficulty of collecting quantitative validation data. This, in turn, makes it exceedingly difficult to validate and compare different machine learning approaches for geological applications. Furthermore, many state-of-the-art deep learning methods are limited to 2-D image data, so their application to 3-D digital outcrops (e.g., hyperclouds) remains an outstanding challenge.

In this contribution we present LithoNet, a benchmark digital outcrop dataset designed to (1) enable quantitative comparison of learning approaches for geological mapping, and (2) facilitate the development of new approaches that are compatible with unstructured 3-D data (i.e., point clouds). LithoNet comprises two halves: a set of real digital outcrop models acquired at Corta Atalaya (Spain), attributed with different spectral and ground-truth data, and a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground truth. We have used these datasets to explore the abilities of different machine learning approaches for automated geological mapping. By making LithoNet public, we hope to foster the development and adaptation of new machine learning tools.
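
As a rough illustration of what a quantitative comparison on a labelled point cloud could look like, the sketch below scores two sets of per-point lithology predictions against ground-truth labels using a confusion matrix and macro-averaged F1. It is not the authors' evaluation protocol: the function names, the three-class setup, and the tiny label arrays are all hypothetical, and the abstract does not state which metrics LithoNet uses.

```python
import numpy as np


def confusion_matrix(truth, pred, n_classes):
    """Count points for each (true class, predicted class) pair."""
    truth = np.asarray(truth, dtype=np.int64)
    pred = np.asarray(pred, dtype=np.int64)
    # Encode each (true, predicted) pair as a single index, then count.
    flat = truth * n_classes + pred
    counts = np.bincount(flat, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)


def per_class_f1(cm):
    """F1 score per class from a confusion matrix (rows = truth)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp  # actually class k but predicted as another class
    denom = 2 * tp + fp + fn
    # Classes that never occur in truth or prediction get a score of 0.
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)


def macro_f1(truth, pred, n_classes):
    """Unweighted mean of the per-class F1 scores."""
    return float(per_class_f1(confusion_matrix(truth, pred, n_classes)).mean())


# Hypothetical per-point labels for a six-point cloud with three lithologies.
truth = np.array([0, 0, 1, 1, 2, 2])
pred_a = np.array([0, 0, 1, 2, 2, 2])  # output of a hypothetical model A
pred_b = np.array([0, 1, 1, 1, 2, 0])  # output of a hypothetical model B

for name, pred in [("model A", pred_a), ("model B", pred_b)]:
    print(name, "macro F1:", round(macro_f1(truth, pred, 3), 3))
```

Because the labels are attached to individual points rather than pixels, the same scoring applies to unstructured 3-D data without any rasterisation. Macro averaging is used here so that rare lithologies count as much as abundant ones; a point-weighted average would be an equally reasonable choice.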

How to cite: Thiele, S., Afifi, A. J., Lorenz, S., Tolosana-Delgado, R., Kirsch, M., Ghamisi, P., and Gloaguen, R.: LithoNet: A benchmark dataset for machine learning with digital outcrops, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-14007, https://doi.org/10.5194/egusphere-egu23-14007, 2023.