Microscopic analysis of sediment microfractions requires skilled scientists and is time-consuming and expensive. Because microparticles are diagnostic of paleoenvironments, sedimentological processes and time ranges (biostratigraphy), image recognition through machine learning holds great potential for automating the identification of microfossils, mineral grains, anthropogenic remnants (microplastics), and other microparticles. Automatic image recognition and sorting is therefore likely to make data acquisition more cost- and time-effective, increasing traceability and reproducibility while also reducing identification errors.
The purpose of this session is to gather experts from the geoscientific, engineering, and deep learning communities who are collaborating to apply machine and deep learning techniques to microscopic analysis. Given the novelty of this technique, we encourage contributions addressing developments in this field, for example production of training sets, laboratory and camera/video setups and designs, applied robotics, and algorithmic developments. We also welcome any geological studies applying machine learning and numerical approaches (including biometric studies) via image recognition of microscopic images.

Convener: Morten Hald | Co-conveners: Thibault de Garidel-Thoron, Fred Godtliebsen, Allison Hsiang, Marit-Solveig Seidenkrantz
| Attendance Mon, 04 May, 16:15–18:00 (CEST)

Chat time: Monday, 4 May 2020, 16:15–18:00

Chairperson: Morten Hald
D1102 |
Jiaxin Yu, Joyce Schmatz, Marven von Domarus, Mingze Jiang, Simon Virgo, Bastian Leibe, and Florian Wellmann

Machine learning approaches and deep learning-based methods are efficient tools for problems in which large amounts of observations and data are documented. They have shown excellent performance in many applications in the geosciences and remote sensing. However, they have not yet been applied to their full potential to one of the most fundamental data types in geoscientific studies: mineral thin sections. Mineral thin sections contain a wealth of information. It is anticipated that thin section samples can be systematically and quantitatively analyzed with a specifically designed system equipped with machine learning (ML) approaches or deep learning (DL) methods such as convolutional neural networks (CNNs). The development of any artificial intelligence application that enables automated image analysis requires consistent and sufficiently large training datasets with ground truth labels. However, a dataset for visual object detection in petrographic thin section analysis is still missing. We wish to close this data gap by generating a large dataset of pixel-wise annotated microscopic images of thin sections.

The variation of the optical features of certain minerals under different settings of a petrographic microscope is closely related to crystallographic characteristics that can be indicative of a mineral. To fully capture these optical features in digital images, we generated raw microscopic image data for different rock samples using virtual petrographic microscopy (ViP), a cutting-edge methodology that automatically scans entire thin sections at gigapixel resolution under various polarization angles and illumination conditions. We showed that using ViP data yields better segmentation results than single image acquisition.

Image annotation, especially pixel-wise annotation, is always a time-consuming and inefficient process. Manually creating dense semantic labels for ViP data is particularly challenging in view of its size and dimensionality. To address this problem, we proposed a human-computer collaborative annotation pipeline in which computers extract image boundaries by splitting images into superpixels, and human annotators subsequently assign a class label to each superpixel with a single mouse click or brush stroke. This frees the human annotator from the burden of painstakingly delineating the exact boundaries of grains by hand, and it has the potential to significantly speed up the annotation process.

Compared with the discrete pixel representation of an image, superpixels are better aligned with region boundaries and greatly reduce image complexity. The use of superpixel segmentation in the annotation pipeline not only significantly reduces the manual workload for human annotators but also provides a significant dataset reduction by reducing the number of image primitives to operate on. To find the most suitable algorithms for generating superpixel segmentations, we evaluated state-of-the-art superpixel algorithms with regard to standard error metrics, based on scanned ViP images and corresponding boundary maps traced by hand. We also proposed a novel adaptation of the SLIC superpixel extraction algorithm that can cope with the multiple information layers of ViP data. We plan to use these superpixel algorithms in our pipeline to generate open datasets of several types of mineral thin sections for training ML and DL algorithms.
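The labelling step of such a pipeline can be illustrated with a minimal sketch. It assumes the superpixel segmentation has already been computed (for example by SLIC) and is available as a 2D map of integer superpixel IDs. The function name `labels_from_clicks`, the toy map and the class names are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter


def labels_from_clicks(superpixels, clicks, unlabeled="unlabeled"):
    """Expand per-superpixel class assignments into a dense per-pixel label map.

    superpixels: 2D list of integer superpixel IDs (one ID per pixel).
    clicks: dict mapping a superpixel ID to the class chosen by the annotator.
    """
    return [[clicks.get(sp_id, unlabeled) for sp_id in row] for row in superpixels]


# Hypothetical 4 x 6 image already split into four superpixels (IDs 0-3).
superpixels = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 2, 2, 1, 1],
    [3, 3, 2, 2, 2, 1],
    [3, 3, 3, 2, 2, 2],
]

# One click per superpixel; superpixel 3 has not been annotated yet.
clicks = {0: "class_A", 1: "class_B", 2: "class_A"}

dense = labels_from_clicks(superpixels, clicks)
pixel_counts = Counter(label for row in dense for label in row)

n_pixels = sum(len(row) for row in superpixels)
n_primitives = len({sp_id for row in superpixels for sp_id in row})
print(pixel_counts)
print(f"{n_pixels} pixels reduced to {n_primitives} image primitives")
```

In this toy case three clicks label 19 of 24 pixels, which is the kind of workload reduction the abstract describes, although real savings depend on how well the superpixels follow grain boundaries.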

How to cite: Yu, J., Schmatz, J., von Domarus, M., Jiang, M., Virgo, S., Leibe, B., and Wellmann, F.: Generating a pixel-wise annotated training dataset to train ML algorithms for mineral identification in rock thin sections, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18865, https://doi.org/10.5194/egusphere-egu2020-18865, 2020.

D1103 |
Joanna Pszonka

The Mineral Liberation Analysis (MLA) setup is an automated measurement system that provides quantitative data on material features. The MLA system was originally created for and applied to mineralogical and metallurgical processing. However, it has proved promising for extracting quantitative datasets in other areas, including sedimentary geology, for example grain size and shape, digital textural maps, porosity, modal mineralogy and mineral associations.

The system is based on a scanning electron microscope (SEM) with an energy dispersive X-ray (EDX) spectrometer and computer software:

(i) backscattered electron (BSE) image analysis is used to determine grain boundaries and locations for X-ray spectral acquisition,

(ii) X-ray spectra are used to classify the mineralogical composition of samples by comparison to a library of reference spectra, and

(iii) software automates microscope operations and data acquisition.
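Step (ii), classification by comparison to a library of reference spectra, can be sketched as a nearest-reference search. The sketch below is illustrative only: it uses cosine similarity as the matching criterion, which is an assumption and not necessarily what the MLA software does, and the five-channel spectra, phase names and `min_similarity` threshold are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra given as equal-length lists.

    Both spectra are assumed to contain at least one non-zero channel.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_spectrum(spectrum, library, min_similarity=0.9):
    """Return (best matching phase or 'unknown', similarity score)."""
    best_name, best_score = max(
        ((name, cosine_similarity(spectrum, ref)) for name, ref in library.items()),
        key=lambda item: item[1],
    )
    if best_score < min_similarity:
        return "unknown", best_score
    return best_name, best_score


# Hypothetical reference library: counts in five energy channels per phase.
library = {
    "phase_A": [0.0, 9.0, 1.0, 0.0, 0.0],
    "phase_B": [0.0, 0.0, 1.0, 8.0, 1.0],
    "phase_C": [3.0, 6.0, 4.0, 0.0, 0.0],
}

measured = [0.2, 8.5, 1.4, 0.1, 0.0]  # hypothetical measured spectrum
mineral, score = classify_spectrum(measured, library)
print(mineral, round(score, 3))
```

A similarity threshold keeps spectra that match no reference well from being forced into the closest class; a suitable value would have to be tuned on real data.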

The MLA is useful for collecting textural and mineralogical features of siliciclastic sediments that are relevant for assessing the hydrodynamic properties of the flows that deposited them. Moreover, this approach seems to be crucial for analysing the processes governing difficult-to-monitor submarine gravity flows, one of the most important sediment transport processes on Earth. The non-linear, non-uniform and unsteady dynamics of submarine gravity flows cause uncertainty in the understanding of their nature. Use of the MLA increases productivity, provides significant statistical representation, reduces human error, bias and tedious manual analysis, and is cost-effective.

This research is the result of project no. 2017/01/X/ST10/00048, funded by the Polish National Science Centre.

How to cite: Pszonka, J.: Insight in hydrodynamic properties of submarine flows by the mineral liberation analysis system, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-928, https://doi.org/10.5194/egusphere-egu2020-928, 2020.

D1104 |
Benjamin Emmel, Ole-Andre Roli, and Anouar Romdhane

Sand volume and porosity measurements on sandstones are routine work in geoscientific applications, providing useful input to flow simulation in porous media-based analyses (e.g., in CO2 storage and/or hydrocarbon migration studies). The classic way to obtain these parameters is point counting on thin sections. This time-consuming, repetitive, and subjective work is usually done by an experienced petrographer. Attempts to automate and digitize this process are therefore promising. An example using image analysis is discussed in Roduit (2007). Combining image analysis with machine learning goes one step further.

In this work, we evaluate the use of a neural network learning algorithm to classify selected sandstone properties from thin section images. Our database consists of ca. 3500 thin section images from different sandstone types with known properties. The images are grouped into 8 different sand volume classes and 8 different porosity classes. We split the dataset into a training dataset (85 %) and a validation dataset (15 %). In the preprocessing stage, we normalize all images and scale them to a reference size of 128 pixels. For both classifications, we trained a convolutional neural network consisting of 5 convolutional layers and 4 max-pooling layers. Batch normalization is applied after each pooling layer, and a dropout layer is used before flattening to reduce overfitting. A final softmax layer is added so that the output can be interpreted as a probability distribution. We perform the training phase with a varying number of epochs, ranging between 20 and 200. Training and validation accuracies above ca. 90 % are obtained after 25 epochs. For both classifications, the initially high model loss on the validation data reaches low values after 50 epochs.
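The role of the final softmax layer can be shown with a small standalone sketch. The eight raw scores below are hypothetical stand-ins for the network output for one image (one score per class); the function is the standard softmax with the usual max-shift for numerical stability, not code from the authors.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for one thin section image, one score per class.
logits = [0.5, 2.0, -1.0, 0.0, 3.5, 1.0, -0.5, 0.2]

probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs])
print("predicted class index:", predicted_class)
```

Because the outputs sum to one, the largest value can be read as the model's confidence in the predicted class, which is what makes the classes comparable across images.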

To further test the approach, in a second stage we analyse a holdout dataset of sandstones from the Norwegian Continental Shelf. Preliminary results show that the derived sand volume classification reproduces the point counting results well (80 % accuracy in predicting the correct class or a neighbouring class). The reproducibility of porosities is more problematic. Here, models trained with different numbers of epochs show variable results, and the models trained with ≥100 epochs systematically underestimate the measured rock porosities. We observe that only porosity classes well represented in the initial population of training images are reproduced with high accuracy. We finally discuss strategies to overcome such limitations.
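The "class or neighbouring class" accuracy quoted above can be computed as follows. This is a sketch under the assumption that the classes are ordered bins encoded as consecutive integers, so that neighbouring classes differ by 1; the example labels are invented and do not reproduce the reported results.

```python
def accuracy_within(true_classes, predicted_classes, tolerance=0):
    """Fraction of predictions at most `tolerance` classes away from the truth."""
    if len(true_classes) != len(predicted_classes):
        raise ValueError("both sequences must have the same length")
    hits = sum(
        abs(t - p) <= tolerance for t, p in zip(true_classes, predicted_classes)
    )
    return hits / len(true_classes)


# Hypothetical point-counting classes and model predictions (bins 0-7).
true_classes = [0, 1, 2, 3, 4, 5, 6, 7, 3, 4]
predicted = [0, 2, 2, 5, 4, 4, 6, 7, 3, 1]

exact = accuracy_within(true_classes, predicted)
neighbour = accuracy_within(true_classes, predicted, tolerance=1)
print(f"exact: {exact:.0%}, exact or neighbouring class: {neighbour:.0%}")
```

Reporting both numbers side by side shows how much of the apparent accuracy comes from near misses at class boundaries.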


Roduit, N., 2007. JMicroVision: un logiciel d'analyse d'images pétrographiques polyvalent. PhD Thesis, University of Geneva, 116 pp.

How to cite: Emmel, B., Roli, O.-A., and Romdhane, A.: Machine learning to identify sandstone properties from thin sections, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-4697, https://doi.org/10.5194/egusphere-egu2020-4697, 2020.

D1105 |
Niccolò Maffezzoli, Giovanni Baccolo, Patrizia Ferretti, Barbara Delmonte, Kerim Nisancioglu, and Carlo Barbante

The detection of insoluble particles trapped in ice cores, such as volcanic and dust particles, pollen grains, foraminifera and diatom assemblages, represents the experimental basis for multiple lines of environmental paleoresearch regarding the atmosphere, the biosphere and volcanology. To date, except for ice core dust, such particles are detected through manual microscopy. Artificial Intelligence predictive models are already applied in several research fields within geoscience, but to date they have not been implemented in ice core science. The recently EU-funded Marie Curie ICELEARNING project (2020-2022) aims to develop a two-phase routine for the automatic quantification of insoluble particles trapped in ice cores. The routine is based on a commercial Flow Imaging Microscope that produces micro-scale images of insoluble particles from melted ice core samples. The image collection of mineral dust, tephra, pollen and marine foraminifera obtained from natural and/or ad hoc prepared samples will constitute the training datasets. The images will then be analyzed by pattern recognition algorithms developed for automatic particle classification and counting. The routine will be specifically developed for implementation in ice core Continuous Flow Analysis (CFA) systems, thus improving on more traditional methods and potentially providing continuous ice core insoluble particle records. The ICELEARNING methodology is suitable for melted ice core samples and any diluted aqueous sample, thus representing a ground-breaking analytical advancement for a wide range of research fields, from ice core science to marine geology. The routine presented here is automatic and non-destructive, both essential prerequisites for future Antarctic ice core projects.

How to cite: Maffezzoli, N., Baccolo, G., Ferretti, P., Delmonte, B., Nisancioglu, K., and Barbante, C.: The ICELEARNING project - Artificial Intelligence techniques for ice core analyses, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-11879, https://doi.org/10.5194/egusphere-egu2020-11879, 2020.

D1106 |
Juho Junttila, Steffen Aagaard Sørensen, Thomas Haugland Johansen, and Geir Wing Gabrielsen

Information about the distribution of microplastics is crucial in marine environmental research. At present, plastic pollution is an environmental threat to the oceans, and more than 90 % of microplastic particles are assumed to be deposited in the sediments on the ocean floor. An efficient way of identifying microplastic particles in marine sediments would improve the understanding of microplastic distribution, inception, accumulation areas, and impact on marine ecosystems. Today, manual classification of microplastic particles using a microscope is time-consuming. The goal of this study is to identify microplastic particles in marine sediment samples with the help of image recognition and machine learning. The possibility of using artificial microplastic particles will also be tested as a means of constructing comprehensive training sets. Existing algorithms have already been successful in classifying microfossils and could be further developed for recognition of microplastic particles. Furthermore, hyperspectral analysis will be tested to determine the origin of the microplastic particles. Our overall goal is to train classifiers that can in the future successfully recognize different plastic objects in marine sediment samples and thereby replace the time-consuming manual classification task. To test these classifiers, human-based and machine-based identifications will be compared for a large number of datasets.

How to cite: Junttila, J., Aagaard Sørensen, S., Haugland Johansen, T., and Wing Gabrielsen, G.: Image recognition of microplastic particles in marine sediments – planned activities, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7858, https://doi.org/10.5194/egusphere-egu2020-7858, 2020.

D1107 |
| Highlight
Steffen Aagaard-Sørensen, Thomas Haugland Johansen, and Juho Junttila

Foraminifera are microscopic single-celled organisms, ubiquitous in the marine realm, that construct shells during their life cycle. The shells generally fossilize well in the sediment and can be identified owing to inter-species variability in morphology and ornamentation. Classifying and counting foraminiferal shells is an important tool in assessing and reconstructing past and present environmental, oceanographic and climatological conditions. However, the present-day manual identification procedure, performed with a microscope and a needle/brush, is very time-consuming. Circumventing this manual procedure using machine learning promises to dramatically reduce the time needed to generate foraminiferal data records.

The first step towards that end is developing a deep learning model that can detect and classify microscopic foraminifera in 2D digital microscope pictures. The work is based on a VGG16 model implementation that has been pretrained on the ImageNet dataset and employs transfer learning techniques to adapt the model to the foraminifera task. The 2D photographic training data were constructed by combining objects representative of, and extracted from, Arctic marine sediments (100 µm to 1 mm size fraction) from the Barents Sea region. Four object groups were used in the training data construction: 1) calcareous benthic foraminifera, 2) agglutinated benthic foraminifera, 3) planktic foraminifera and 4) sediments. With the initial set-up, the algorithms assigned objects to the correct group ~90% of the time; with further fine-tuning and refinement, this reached 98% correct identifications.

The second step is to use machine learning to classify individual benthic calcareous foraminiferal species within the sediment. The work will focus on the 20 most common species, which comprise ≥ ca. 80% of the total benthic calcareous foraminiferal fauna in the Arctic. The algorithms will be trained using targeted species-specific 2D photographic and 3D CT scanning data, potentially in addition to hyperspectral imaging.

How to cite: Aagaard-Sørensen, S., Haugland Johansen, T., and Junttila, J.: First-order machine learning based detection and classification of foraminifera in marine sediments from Arctic environments, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-7606, https://doi.org/10.5194/egusphere-egu2020-7606, 2020.

D1108 |
| Highlight
Thibault de Garidel-Thoron, Ross Marchant, Martin Tetard, Michael Adebayo, and Yves Gally

Recent progress in image processing and image recognition has paved the way for automated procedures to classify natural objects such as foraminifera. Foraminifera are among the most useful tracers in biostratigraphy and paleoceanography. Yet the protocol used to extract and recognize foraminifera has not changed since the mid-18th century: manual picking using a brush under a stereomicroscope.


Here we present results achieved with MiSo (Microfossil Sorter), an automaton we developed to automatically pick microfossils from the sediment coarse fraction. This automated system, built with ATG Technologies, is fully operational and works 24/7 at CEREGE. In this study, we will detail the basic workflow of the automaton, which processes ~8000 particles/day, and its ability to cope with the large morphological and structural variability of particles encountered in real, marginal to deep-sea sediments. We use convolutional neural networks adapted and trained on deep-sea sediment samples to classify the coarse sediment particles, including planktonic and benthic foraminifera.


As a test case, we will compare paleoceanographic records generated by a micropaleontologist with those generated by our automaton: relative abundance, fragmentation rate and biometrical changes. We have studied two deep-sea cores from the equatorial Pacific to document past hydrographic changes in the late Quaternary, achieving millennial-scale resolution through the last deglaciation. Using the automaton, we processed more than 500,000 foraminifera. The accuracy of recognition typically ranges from 85 to 95%, depending on the morphoclasses and on the CNN used for the training. Morphoclass size probability density functions and assemblages derived from the CNN will be compared to multi-proxy (micropaleontological and geochemical) records. We will discuss the ongoing applications of our workflow, from foraminifera to pteropods in deep-sea sediments, and the recent updates of our system.
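Two of the records mentioned above, relative abundance and fragmentation rate, can be derived directly from per-particle class labels. The sketch below assumes a hypothetical label scheme with a dedicated "fragment" class and a "sediment" class for non-foraminiferal particles, and it defines the fragmentation rate as fragments divided by all foraminiferal particles. Other definitions exist, and the abstract does not state which one the authors use.

```python
from collections import Counter


def relative_abundance(labels, exclude=("fragment", "sediment")):
    """Share of each morphoclass among whole, identified specimens."""
    counts = Counter(label for label in labels if label not in exclude)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def fragmentation_rate(labels, fragment_label="fragment", non_foram=("sediment",)):
    """Fragments as a fraction of all foraminiferal particles (assumed definition)."""
    forams = [label for label in labels if label not in non_foram]
    if not forams:
        return 0.0
    return forams.count(fragment_label) / len(forams)


# Hypothetical classifier output for the particles of one sample.
labels = (
    ["morphoclass_A"] * 6
    + ["morphoclass_B"] * 3
    + ["morphoclass_C"] * 1
    + ["fragment"] * 5
    + ["sediment"] * 5
)

abundance = relative_abundance(labels)
frag = fragmentation_rate(labels)
print({name: round(share, 2) for name, share in abundance.items()})
print(f"fragmentation rate: {frag:.2f}")
```

Applying the same functions to labels from the micropaleontologist and from the automaton would give directly comparable records, sample by sample.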

How to cite: de Garidel-Thoron, T., Marchant, R., Tetard, M., Adebayo, M., and Gally, Y.: Automated recognition and picking of foraminifera using the MiSo (microfossil sorter) prototype, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-18067, https://doi.org/10.5194/egusphere-egu2020-18067, 2020.

D1109 |
| Highlight
Marit-Solveig Seidenkrantz, Claus Melvad, Kim Bjerge, Peter Ahrendt, Emiel J. Broeders, Anders O. B. Christensen, Mikkel Førrisdahl, Troels Poulsen, and Esben Skov

One of the best methods for studying past climate variability is the analysis of microfossils in sediment cores, especially foraminifera. However, this is highly laborious and time-consuming work. Consequently, several independent endeavors are currently underway with the aim of automating this procedure, each testing different techniques. Here, we present preliminary results of one of these endeavors, which focuses on benthic foraminifera from Arctic and temperate regions. The study is based on ongoing student projects carried out in collaboration between engineers and geologists. We combine robotics, imaging and machine learning.

The project is divided into three stages, with stages 1 and 2 currently ongoing: 1) robotic separation of foraminiferal specimens from sediment particles; 2) a species classification algorithm based on Convolutional Neural Networks (CNN), including creation of training material; 3) system verification comparing analyses carried out by the automated system and by a foraminiferal specialist on the same dataset. Stage 3 has not yet commenced, but initial results of stages 1 and 2 are available. In time, we hope to build up a database of about 100 different foraminiferal species, which will cover the main assemblages of the coastal regions of the Arctic and Atlantic cold temperate regions.

For separating and picking specimens (1), we have evaluated two different methods, using either a custom-made xyz-platform or a robotic arm. Moving the specimens with a robotic arm appears to work well, but the price of such an arm makes this solution less practical. In contrast, separating the specimens by shaking the sample in a tray and then picking them for photographing and analysis using a suction system on a custom-made xyz-platform is the best solution when considering quality, speed and price. Subsequently, the picked foraminifera/grains are delivered automatically to a digital microscopy system and photographed. So far, the focus of this part of the process has been on developing a precise system for moving and picking. In the future, we will work towards handling particles of highly variable size in the same sample and towards increasing the speed of the picking and photographing process.

For foraminiferal identification (2), parts of the labeling process have been automated using the Django (Python) framework and Amazon Web Services. In addition, a number of imaging experiments have been conducted, and several Convolutional Neural Network (CNN) algorithms are being developed and tested. In this first test, we include three different benthic foraminiferal species with very distinct morphologies, as well as various types of clastic grains in approximately the same size fraction as the foraminiferal individuals. In this initial test case, only relatively few specimens were included in the database (Ammonia batava - 168 specimens, Elphidium williamsoni - 168 specimens and Quinqueloculina seminulum - 168 specimens, as well as 449 clastic grains). Using a customized CNN algorithm, the separation of foraminifera from mineral grains and the foraminiferal species identification could be carried out with a precision, recall and F1-score of 94% and 91%, respectively.
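For readers less familiar with the metrics quoted above, the following sketch shows how precision, recall and F1-score are obtained from the counts of true positives, false positives and false negatives for one class. The counts are hypothetical and are not derived from the study's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1-score from raw counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for the class "foraminifera" in a test set:
# 90 correctly detected, 10 grains wrongly called foraminifera, 30 missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

The F1-score is the harmonic mean of precision and recall, so it stays low whenever either of the two is low, which matters for small and unbalanced datasets such as the one described here.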

How to cite: Seidenkrantz, M.-S., Melvad, C., Bjerge, K., Ahrendt, P., Broeders, E. J., Christensen, A. O. B., Førrisdahl, M., Poulsen, T., and Skov, E.: Foraminiferal sorting and identification: preliminary results of a test phase, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-5953, https://doi.org/10.5194/egusphere-egu2020-5953, 2020.

D1110 |
| Highlight
Allison Hsiang and Pincelli Hull

The rich fossil record of planktonic foraminifera makes them an indispensable group for understanding interactions between climatic, oceanic, and biological dynamics through time and space. Over the past few years, we have been working to provide databases and informatics resources to standardize and speed up the generation of large datasets for community-scale analyses of planktonic foraminifera. Our public database Endless Forams Most Beautiful (www.endlessforams.org), which currently contains >34,000 unique images of individual planktonic foraminifera comprising 35 species, is an important new resource for taxonomic training and standardization, supervised machine learning, and large-scale analyses of community ecology and morphological evolution. Here, we present one such application using both the individuals in the Endless Forams database and an additional ~26,000 specimens from across the North Atlantic, identified using a supervised machine learning classifier trained on the Endless Forams data. We combine taxonomic information from these ~60,000 individuals with morphometric measurements extracted using our open source software AutoMorph to explore ecological and evolutionary drivers of modern planktonic foraminifera diversity and size.

How to cite: Hsiang, A. and Hull, P.: Next-generation community ecology: Exploring ecological and evolutionary drivers of planktonic foraminifera diversity using the Endless Forams database and a supervised machine learning classifier, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-9421, https://doi.org/10.5194/egusphere-egu2020-9421, 2020.

D1111 |
Martin Tetard, Ross Marchant, Yves Gally, Thibault de Garidel-Thoron, and Luc Beaufort

Identification of microfossils is usually done by expert taxonomists and requires significant systematic knowledge and time, as about 300 specimens per sample are commonly identified for statistically reliable studies. Radiolaria are no exception, and their utility has long been demonstrated in biostratigraphy, using the presence/absence of some species, as well as in palaeoceanographic reconstructions (past productivity, temperature, and water mass variability). Traditionally, these studies have required the manual identification of numerous species in many samples under a transmitted light microscope, which is very time-consuming. Furthermore, identification may differ between operators, biasing reproducibility. Recent technological advances in image acquisition, processing, and recognition now enable automated procedures for this process, from microscopic slide field-of-view acquisition to taxonomic identification.


A new workflow was developed for radiolarian acquisition, processing and identification. Firstly, a new protocol was developed as a proposed standard methodology for preparing radiolarian microscopic slides. We mount 8 samples per slide (using 12x12 mm cover slides), onto which radiolarians are randomly and uniformly decanted using a new 3D-printed decanter that minimizes the loss of material. The slides are then imaged automatically using an automated transmitted light microscope. About 500 individual radiolarian specimens per sample (excluding broken and overlapping specimens), or about 4000 specimens per slide, are recovered from 3375 original images per sample (15 z-stacked images per field of view x 225 fields of view), after which automated image processing and segmentation is performed using a custom plugin developed for the ImageJ software. Each image is then classified using a convolutional neural network (CNN) trained on a database of radiolarian images.


To create the CNN classification stage, a dedicated software program, ParticleTrieur, was used to annotate a large dataset of radiolarian taxa (currently more than 27,488 images, corresponding to 101 classes, from Neogene to Recent). This software enables the visualisation of radiolarian pictures and their assignment to defined taxa by progressively learning and suggesting taxon labels based on previous labelling. This database was then used to train a CNN for the automated taxonomic identification stage. After merging classes containing fewer than 10 images into a single "other" class, 69 classes were trained to be recognised, with an overall accuracy of 93 %. This new workflow will now be used on a Miocene to Recent sedimentary record from IODP Expedition 363 (Core U1488A), recovered in the West Pacific Warm Pool.
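The merging of rare classes before training can be sketched in a few lines. The threshold of 10 images follows the text; the function name, the taxon names and the image counts below are hypothetical.

```python
from collections import Counter


def merge_rare_classes(labels, min_images=10, other_label="other"):
    """Relabel every class with fewer than `min_images` images as `other_label`."""
    counts = Counter(labels)
    return [
        label if counts[label] >= min_images else other_label for label in labels
    ]


# Hypothetical annotation labels, one per image.
labels = ["taxon_A"] * 25 + ["taxon_B"] * 12 + ["taxon_C"] * 4 + ["taxon_D"] * 9

merged = merge_rare_classes(labels)
merged_counts = Counter(merged)
print(merged_counts)
```

Pooling rare taxa keeps them in the training set as a catch-all class rather than discarding the images, at the cost of no longer distinguishing between them.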

How to cite: Tetard, M., Marchant, R., Gally, Y., de Garidel-Thoron, T., and Beaufort, L.: A new automated radiolarian image acquisition, processing and identification workflow, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-16034, https://doi.org/10.5194/egusphere-egu2020-16034, 2020.

D1112 |
Luc Beaufort, Yves Gally, Thibault de Garidel-Thoron, Ross Marchant, and Martin Tetard

SYRACO (SYstème de Reconnaissance Automatique de COccolithes) is software that controls an automatic microscope and a digital camera in order to automatically recognize coccolith species and measure their morphological characteristics using artificial neural networks. The first version was presented in 1996 (Dollfus and Beaufort, 1996; 1999) and was used scientifically for the first time in 2001 (Beaufort et al., 2001). SYRACO has evolved over the last 20 years in many respects, such as the architecture of the neural networks, the image scanning and the pre-treatments. Twenty years ago, SYRACO was dedicated to Quaternary paleoceanographic studies, because it could recognize only morphological classes. With all these developments, it can now be used in biostratigraphy, as it is able to determine coccolith species. The latest version of SYRACO will be described, and an example of its application to a South Pacific core will be given.


Beaufort, L., de Garidel-Thoron, T., Mix, A. C., and Pisias, N. G.: ENSO-like forcing on Oceanic Primary Production during the late Pleistocene, Science, 293, 2440-2444, 2001.

Dollfus, D., and Beaufort, L.: Automatic pattern recognition of calcareous nannoplankton, Neural Network and their Applications: NEURAP 96, Marseille, France, 1996, 306-311.

Dollfus, D., and Beaufort, L.: Fat neural network for recognition of position-normalised objects, Neural Networks, 12, 553-560, 1999.

How to cite: Beaufort, L., Gally, Y., de Garidel-Thoron, T., Marchant, R., and Tetard, M.: Automatic calcareous nannofossil biostratigraphy using the latest version of SYRACO, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-17548, https://doi.org/10.5194/egusphere-egu2020-17548, 2020.