EGU23-13612
https://doi.org/10.5194/egusphere-egu23-13612
EGU General Assembly 2023
© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

MIN-ML: A Machine Learning Framework for Exploring Mineral Relations and Classifying Common Igneous Minerals

Sarah Shi1, Penny Wieser2, Kerstin Lehnert1, and Lucia Profeta1
  • 1Columbia University in the City of New York, Lamont-Doherty Earth Observatory, Geoinformatics, New York, United States of America (sarahshi@ldeo.columbia.edu)
  • 2University of California, Berkeley, Department of Earth and Planetary Science, Berkeley, United States of America

Studies that use mineral compositions to reveal complex magmatic processes have proliferated with the growing accessibility of computational methods and of geochemical datasets through databases including PetDB, LEPR/TraceDs, and GEOROC. The generation and continuous quality assurance of mineral data in these databases require significant human intervention and individual post-processing. One major problem is that minerals may be misclassified (e.g., a compiled dataset of clinopyroxenes may contain some amphiboles), and compilations may contain poor-quality electron probe microanalyses (EPMA), with low totals, low cation sums, or poor correspondence to the theoretical stoichiometry of a mineral phase. At present, individual studies compiling geochemical datasets for specific tectonic settings [1] or calibrating thermobarometers based on mineral-melt equilibrium [2] tend to apply their own filters. To support a more consistent approach, we present MIN-ML (MINeral classification using Machine Learning), a new open-source Python package that classifies common igneous minerals from oxide data collected by EPMA and provides functions for calculating stoichiometries and crystallographic sites based on this classification. The package allows misclassified mineral phases and poor-quality data to be identified. We streamline data processing and cleaning to allow a rapid transition to usable data, improving the utility of data curated in these databases and furthering computing and modeling capabilities.
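
The quality criteria named above (oxide totals and cation sums checked against ideal stoichiometry) can be illustrated with a minimal sketch. This is not the MIN-ML API: the function names, the 98 to 102 wt% total window, the 0.05 cation tolerance, and the treatment of all iron as FeO are assumptions chosen only for illustration, applied here to a clinopyroxene formula with 4 cations per 6 oxygens.

```python
# Illustrative sketch only; names and thresholds are assumptions, not MIN-ML.
# Each oxide maps to (molar mass in g/mol, cations per formula, oxygens per formula).
OXIDES = {
    "SiO2": (60.084, 1, 2),
    "TiO2": (79.866, 1, 2),
    "Al2O3": (101.961, 2, 3),
    "Cr2O3": (151.990, 2, 3),
    "FeO": (71.844, 1, 1),   # assumption: all iron reported as FeO
    "MnO": (70.937, 1, 1),
    "MgO": (40.304, 1, 1),
    "CaO": (56.077, 1, 1),
    "Na2O": (61.979, 2, 1),
    "K2O": (94.196, 2, 1),
}


def oxide_total(analysis):
    """Sum of the reported oxide weight percentages."""
    return sum(analysis.get(ox, 0.0) for ox in OXIDES)


def cation_sum(analysis, n_oxygens):
    """Total cations per formula unit, normalized to n_oxygens oxygens."""
    moles = {ox: analysis.get(ox, 0.0) / OXIDES[ox][0] for ox in OXIDES}
    oxygen_moles = sum(moles[ox] * OXIDES[ox][2] for ox in OXIDES)
    scale = n_oxygens / oxygen_moles
    return sum(moles[ox] * OXIDES[ox][1] * scale for ox in OXIDES)


def quality_flags(analysis, n_oxygens=6, ideal_cations=4.0,
                  total_range=(98.0, 102.0), cation_tol=0.05):
    """Flag an analysis with a poor total or a non-ideal cation sum.

    Defaults describe a pyroxene (4 cations per 6 oxygens); the
    acceptance windows are hypothetical, not published filters.
    """
    total = oxide_total(analysis)
    cations = cation_sum(analysis, n_oxygens)
    return {
        "total": total,
        "cation_sum": cations,
        "bad_total": not (total_range[0] <= total <= total_range[1]),
        "bad_cation_sum": abs(cations - ideal_cations) > cation_tol,
    }


# Near-ideal diopside (CaMgSi2O6) composition in wt%.
diopside = {"SiO2": 55.49, "MgO": 18.61, "CaO": 25.90}
# The same composition with every oxide scaled down to give a low total.
low_total = {ox: 0.9 * wt for ox, wt in diopside.items()}

good = quality_flags(diopside)
low = quality_flags(low_total)
print(good)
print(low)
```

Because the cation sum is normalized to a fixed number of oxygens, uniformly scaling an analysis changes its total but not its cation sum, so the two checks catch different kinds of problems. The check also presupposes a phase label, which is why classification has to come before stoichiometric filtering.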

Mineral identification and classification are critical to the success of computational methodologies and machine learning (ML) applied to these large datasets, which raises the question of how best to classify minerals from EPMA analyses. We approach this question by exploring and developing ML workflows, both supervised (classification algorithms) and unsupervised (dimensionality reduction and clustering). Unsupervised methods, including autoencoders (a type of artificial neural network), present the opportunity to classify minerals with little a priori information. An autoencoder pairs two neural networks: an encoder, which compresses input data to a lower-dimensional latent representation, and a decoder, which expands that latent representation to reconstruct the input. The pair is trained to minimize the reconstruction loss. We present a novel autoencoder model aimed at meaningfully representing EPMA analyses of minerals in latent space, investigating the relationships between mineral phases, and performing classifications of these minerals. The model is trained on newly compiled datasets of twelve igneous mineral phases, with thousands to tens of thousands of analyses per phase, spanning tectonic settings. The autoencoder is applied to datasets of mineral analyses from PetDB, LEPR, and GEOROC to evaluate model performance, and it shows significant improvements in mineral phase segregation and classification, which are critical to rigorous dataset quality control and future integration into data processing routines.
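
The encoder and decoder pairing described above can be sketched with a very small autoencoder written in NumPy. This is not the authors' model: the single tanh encoder layer, the linear decoder, the two-dimensional latent space, the learning rate, and the synthetic two-cluster "oxide" data are all assumptions made for illustration, and the cluster means are rough placeholders rather than measured compositions.

```python
import numpy as np


def standardize(X):
    """Scale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_autoencoder(X, latent_dim=2, lr=0.1, epochs=1000, seed=0):
    """Train a one-layer tanh encoder and linear decoder by gradient descent.

    Returns an encode function and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_features, latent_dim))  # encoder weights
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, (latent_dim, n_features))  # decoder weights
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)        # encoder: compress to latent space
        X_hat = Z @ W2 + b2             # decoder: reconstruct the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))  # mean squared error
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_W2 = Z.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - Z ** 2)  # derivative of tanh
        g_W1 = X.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def encode(A):
        return np.tanh(A @ W1 + b1)

    return encode, losses


# Synthetic stand-in data: two hypothetical phases, five oxide features
# (SiO2, MgO, CaO, Al2O3, FeO in wt%). Means are illustrative placeholders.
rng = np.random.default_rng(42)
phase_a = rng.normal([40.0, 45.0, 0.3, 0.05, 14.0], 0.5, (200, 5))
phase_b = rng.normal([52.0, 16.0, 21.0, 3.0, 6.0], 0.5, (200, 5))
X = standardize(np.clip(np.vstack([phase_a, phase_b]), 0.0, None))

encode, losses = train_autoencoder(X)
latent = encode(X)  # one 2-D latent coordinate per analysis
print(latent.shape, losses[0], losses[-1])
```

In a sketch like this, the latent coordinates could then be passed to a clustering step to group analyses by phase without labels. A model for twelve real phases would likely need deeper networks and careful validation.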

[1] Gale, A., et al., The mean composition of ocean ridge basalts. Geochemistry, Geophysics, Geosystems 14, 489-518 (2013).

[2] Petrelli, M., et al., Machine learning thermobarometry: Application to clinopyroxene-bearing magmas. JGR: Solid Earth 125, e2020JB020130 (2020).

[3] Lehnert, K. A., et al., IEDA2: Evolving EarthChem, LEPR/TraceDs, and SESAR into a Next Generation Data Infrastructure for Data-Driven Research Paradigms in Geochemistry, Petrology, and Volcanology. Goldschmidt Conference (2022).

How to cite: Shi, S., Wieser, P., Lehnert, K., and Profeta, L.: MIN-ML: A Machine Learning Framework for Exploring Mineral Relations and Classifying Common Igneous Minerals, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-13612, https://doi.org/10.5194/egusphere-egu23-13612, 2023.