Optimization of clustering analyses for classification of ChemCam data from Gale crater, Mars

Kristin Rammelkamp; Olivier Gasnault; Olivier Forni; Jeremie Lasue; Sylvestre Maurice

doi:https://doi.org/10.5194/epsc2020-867

EPSC Abstracts

Vol. 14, EPSC2020-867, 2020, updated on 26 Aug 2024

Europlanet Science Congress 2020

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Institut de recherche en Astrophysique et Planétologie, Toulouse, France (kristin.rammelkamp@irap.omp.eu)

1. Introduction
ChemCam, the first LIBS (laser-induced breakdown spectroscopy) instrument used extraterrestrially, on board the Mars Science Laboratory (MSL) rover, has been operating successfully since landing in Gale crater in 2012 [1,2]. ChemCam is used nearly every sol [3], which makes it possible to track compositional variations on a small scale and to collect a large dataset. Machine-learning-based classification can therefore help to identify phases of similar composition [4]. Supervised techniques are challenging to use because of the lack of training data from Mars; unsupervised methods such as hierarchical clustering, however, have provided conclusive and interpretable classifications when applied to particular datasets [5-7]. Here, we present an optimized approach that relies on repeated clustering of randomly selected sub-datasets in order to consolidate the selection of clusters whose targets have similar compositions and are reasonably distinct from those of the other clusters.

2. Method

We limited the ChemCam dataset to spectra from targets measured at a maximum distance of 3.5 m up to sol 2756 of the mission, which gives 18833 spectra in total. Non-negative matrix factorization (NMF) with six components was applied for dimensionality reduction. For the clustering, we tested several techniques: agglomerative hierarchical clustering with complete and Ward linkage, spectral clustering, and k-means clustering. While the former techniques are efficient at identifying outliers, k-means gives better results in terms of cluster quality measures such as silhouette scores and distance-to-spread ratios. Hence, we focus on k-means clustering in the following.
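For reference, the two standard quantities named above can be written out as follows. This is a generic textbook formulation, not taken from the abstract: the squared Frobenius loss is an assumption (the abstract does not state the NMF objective), and the symbols X, W, H, n, m, r, a(i) and b(i) are introduced here for illustration only. X holds the n = 18833 spectra as rows with m spectral channels each, the rows of W are the six-component scores that are clustered, a(i) is the mean distance from spectrum i to the other members of its own cluster, and b(i) is the mean distance from spectrum i to the members of the nearest other cluster.

```latex
% NMF with r = 6 components (loss function is an assumption)
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{n \times m},\;
W \in \mathbb{R}_{\ge 0}^{n \times r},\;
H \in \mathbb{R}_{\ge 0}^{r \times m},\;
r = 6

\min_{W \ge 0,\; H \ge 0} \; \lVert X - W H \rVert_F^2

% Silhouette coefficient of spectrum i
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Values of s(i) close to 1 indicate that a spectrum lies well inside its own cluster, while values near 0 or below indicate overlap with a neighbouring cluster.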
The applied approach includes 100 repetitions with one repetition structured as follows:

3. Results
To evaluate whether the clustering is consistent across the 100 repetitions, we computed the number of members n in each cluster for each repetition (Figure 1). Apart from a few outliers, the membership is consistent between the runs, which indicates that our clustering method is stable. Next, we used the major-element compositions [8] to compare the mean composition of each cluster among the repetitions; this is shown for all major elements in Figure 2. As with the membership, the mean compositions of the clusters are consistent apart from a few outliers. Furthermore, there are no strong overlaps of mean major-element concentrations between the clusters. Within the 100 repetitions, each spectrum was selected multiple (20-40) times, and as a measure of accuracy we counted, for each spectrum, how often it was assigned to the same cluster. For ≈88% of all spectra, the same cluster assignment was obtained every time the spectrum was in the selected sub-dataset. The spread of compositions within each cluster is larger than the spread of the cluster means shown in Figure 2; therefore, standard deviations were also computed (for example, for SiO₂: clusters 1-5 ≈ 2-7 wt%, cluster 6 ≈ 17 wt%). In general, the differences between the means, taken together with the computed standard deviations, show that each cluster has a unique signature in at least one element.
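The share of spectra with an unambiguous cluster assignment can be computed in a few lines. The following is a minimal sketch, not the authors' code. It assumes that cluster labels have already been made comparable between repetitions (the abstract does not state how this was done); the function name `unambiguous_fraction` and the toy data are hypothetical.

```python
from collections import defaultdict

def unambiguous_fraction(repetitions):
    """Fraction of spectra that received the same label in every run they were part of.

    `repetitions` is a list with one dict per run, mapping spectrum id -> cluster label.
    Labels are assumed to be comparable between runs.
    """
    labels_seen = defaultdict(set)
    for run in repetitions:
        for spectrum_id, label in run.items():
            labels_seen[spectrum_id].add(label)
    if not labels_seen:
        raise ValueError("no cluster assignments given")
    stable = sum(1 for labels in labels_seen.values() if len(labels) == 1)
    return stable / len(labels_seen)

# Toy example: three runs over four spectra; only "c" changes its cluster.
toy_runs = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "c": 3, "d": 2},
    {"b": 2, "c": 1, "d": 2},
]
fraction = unambiguous_fraction(toy_runs)  # 3 of 4 spectra are stable
```

A spectrum counts as stable only if every one of its assignments agrees, which matches the strict reading of "the same cluster assignment was obtained every time".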

Figure 1: Boxplots of the number of members n in each cluster over the 100 repetitions. Apart from a few outliers, the cluster sizes are consistent between the runs.

The mean compositions in Figure 2 show that clusters 1 and 6 have divergent mean SiO₂ concentrations, while clusters 2-5 are at a similar SiO₂ level. Clusters 2-5, on the other hand, have varying mean MgO concentrations. Furthermore, the largest clusters, 3 and 5, are together with cluster 4 the clusters with the highest mean FeO_T concentrations. Alkalis, in particular Na₂O, are remarkably high in cluster 2, which is also enriched in Al₂O₃, indicating a felsic composition [9]. Cluster 6 contains spectra from targets with high CaO, which is most likely associated with the frequently observed Ca-sulphate veins [10]. The members of the high-SiO₂ cluster 1 were mainly measured between sol 1000 and sol 1500, which is in agreement with local enrichments in SiO₂ at Marias Pass and Bridger Basin [11].

Figure 2: For each repetition, the mean major-element composition of each cluster was computed. Here, the results of all repetitions are shown in the form of boxplots for each major element. Apart from a few outliers, the clusters have stable mean compositions without strong overlaps.

4. Summary and Conclusions
The presented approach, which uses repeated k-means clustering of the NMF scores of randomly selected sub-datasets, yields promising results to support the classification of ChemCam data [12]. The clustering is consistent between the runs in terms of membership and major-element composition, and the rate of unambiguous cluster assignments is ≈88%. Six distinct compositions, one for each of the six clusters, were observed; these could correspond to end-members. Further investigations are needed to derive and validate possible mineralogical phase labels for the clusters. With those labels, supervised learning models could be trained in order to rapidly identify new targets of similar composition.
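The general idea of repeated k-means clustering on randomly selected sub-datasets can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several choices are assumptions made for the example: the toy data are 2-D points in three well-separated groups instead of six-component NMF scores, the sub-dataset size is arbitrary, the k-means routine is a plain Lloyd iteration with farthest-point seeding, and clusters are made comparable between runs by ranking their centroids. All function and variable names are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Toy Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points,
                             key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def repeated_clustering(scores, k, n_repetitions, subset_size, seed=0):
    """Cluster random sub-datasets; return one {id: label} dict per repetition."""
    rng = random.Random(seed)
    ids = sorted(scores)
    runs = []
    for _ in range(n_repetitions):
        chosen = rng.sample(ids, subset_size)
        labels, centroids = kmeans([scores[i] for i in chosen], k)
        # Assumption: labels are matched between runs by ranking the centroids.
        order = sorted(range(k), key=lambda j: centroids[j])
        rank = {j: r for r, j in enumerate(order)}
        runs.append({i: rank[lab] for i, lab in zip(chosen, labels)})
    return runs

# Toy data: three groups of 30 points centred at x = 0, 10 and 20.
data_rng = random.Random(1)
toy_scores = {
    f"s{b}_{n:02d}": (cx + data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
    for b, cx in enumerate((0.0, 10.0, 20.0))
    for n in range(30)
}
runs = repeated_clustering(toy_scores, k=3, n_repetitions=10, subset_size=45, seed=2)
# Cluster sizes per repetition, the quantity summarised as boxplots in Figure 1.
sizes = [[list(run.values()).count(c) for c in range(3)] for run in runs]
```

On real data the matching of clusters between runs is the delicate step; ranking centroids only works here because the toy groups are separated along one axis.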

Acknowledgements
This work was supported by the Centre National d'Etudes Spatiales (CNES), France.

References

[1] Maurice et al. (2012), SSR, 170, 95
[2] Wiens et al. (2012), SSR, 170, 167
[3] Maurice et al. (2016), JAAS, 31, 863
[4] Forni et al. (2019), 50th LPSC, Abstract #1404
[5] Gasnault et al. (2013), 44th LPSC, Abstract #1994
[6] Gasnault et al. (2019), 9th Int. Conf. on Mars, Abstract #6199
[7] Bedford et al. (2020), Icarus, 341, 113622
[8] Clegg et al. (2017), SAP B, 129, 64
[9] Cousin et al. (2017), Icarus, 288, 265
[10] Nachon et al. (2014), JGR: Planets, 119, 1991
[11] Frydenvang et al. (2017), GRL, 44, 4716
[12] Mangold et al. (2017), Icarus, 284, 1

How to cite: Rammelkamp, K., Gasnault, O., Forni, O., Lasue, J., and Maurice, S.: Optimization of clustering analyses for classification of ChemCam data from Gale crater, Mars, Europlanet Science Congress 2020, online, 21 Sep–9 Oct 2020, EPSC2020-867, https://doi.org/10.5194/epsc2020-867, 2020.