- ¹Universidad de Chile, Department of Mining Engineering, Santiago, Chile (abhishek.borah@ug.uchile.cl)
- ²Advanced Mining Technology Center, Universidad de Chile, Santiago, Chile
Introduction: This work addresses the regionalized classification of hydrothermal alteration types from continuous features (assays of trace elements and sulfide minerals) in a porphyry copper-gold deposit in Mongolia, using supervised learning algorithms. Traditional machine learning methods ignore the spatial correlations of regionalized data, whereas geostatistics can exploit these correlations and thereby improve classification scores. The novelty of our proposal lies in the deployment of a complementary set of features (‘proxies’) at the sampled data points, calculated through geostatistical simulation with nugget effect filtering.
Methodology: We clean and prepare a large set of exploratory drill hole samples and split it into training and testing subsets in a 70:30 ratio. The dataset is used for the geostatistical modeling of the feature variables, which are then simulated (by spectral simulation with filtering) at the training and testing data points. Because of the nugget effect filtering, the simulated values ('proxies') do not coincide with the measured (noisy) values and exhibit a stronger spatial continuity. The proxies are then taken as the input for a supervised classification of the hydrothermal alteration type on the training data, which incorporates misclassification cost matrices that account for geological criteria. The performance of the classifier is finally assessed on the testing data on the basis of standard metrics.
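The abstract does not specify how the misclassification cost matrices enter the classifier. One common mechanism, sketched here purely as an illustration, is the minimum-expected-cost decision rule: given class probabilities from any classifier, the predicted class is the one with the lowest expected cost rather than the highest probability. The class indices, probabilities, and cost values below are hypothetical and are not taken from the study.

```python
# Minimal sketch of a minimum-expected-cost decision rule (an assumption,
# not necessarily the mechanism used by the authors).
# cost[i][j] = cost of predicting class j when the true class is i.

def expected_costs(probs, cost):
    """Return the expected cost of predicting each class j."""
    n_classes = len(cost)
    return [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]

def min_cost_class(probs, cost):
    """Return the index of the class with the lowest expected cost."""
    costs = expected_costs(probs, cost)
    return min(range(len(costs)), key=costs.__getitem__)

# Hypothetical posterior probabilities for three alteration classes.
probs = [0.5, 0.3, 0.2]

# Hypothetical cost matrix: confusing classes 0 and 2 is penalized more
# heavily than confusing either of them with class 1.
geological_cost = [
    [0, 1, 4],
    [1, 0, 1],
    [4, 1, 0],
]

# Uniform 0-1 costs reduce the rule to picking the most probable class.
zero_one_cost = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

chosen_with_costs = min_cost_class(probs, geological_cost)
chosen_without_costs = min_cost_class(probs, zero_one_cost)
```

With these illustrative numbers, the cost-weighted rule selects a different class than the most probable one, which is how a cost matrix can encode the idea that some confusions between alteration types are geologically less acceptable than others.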
Results and Conclusions: Compared to the traditional approach, where hydrothermal alteration types are predicted directly from the measured features, the classification that uses the geostatistical proxies systematically provides better scores (accuracy rate and Cohen’s kappa statistic increased by 5 to 10 percentage points), showing the importance of incorporating proxy variables obtained by a spatial processing of the input information. Another advantage of using geostatistical proxies in the classification is the handling of missing data, insofar as these proxies provide a ‘clever’ alternative to the imputation of missing values, based on the spatial correlation structure of the feature variables and neighboring information, instead of a simple median value by alteration class. The use of geostatistical proxies can therefore be decisive in the presence of highly heterotopic datasets, for which discarding missing data implies a considerable loss of information. In a nutshell, our study demonstrates two things. First, it shows how geostatistics enriches machine learning to achieve higher predictive performance and to handle incomplete and noisy datasets in a spatial setting. Second, it establishes that better prediction accuracy can be achieved than in previous studies, where alteration types were predicted solely from geochemical data.
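For readers unfamiliar with the two scores mentioned above, the following sketch computes the accuracy rate and Cohen's kappa statistic from a confusion matrix. The confusion matrix is a made-up two-class example and does not reproduce any result of the study.

```python
# Minimal sketch: accuracy and Cohen's kappa from a confusion matrix.
# confusion[i][j] = number of samples of true class i predicted as class j.

def accuracy(confusion):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total

def cohen_kappa(confusion):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    p_observed = accuracy(confusion)
    row_totals = [sum(confusion[k]) for k in range(n_classes)]
    col_totals = [
        sum(confusion[i][k] for i in range(n_classes))
        for k in range(n_classes)
    ]
    # Agreement expected by chance from the marginal class frequencies.
    p_expected = sum(
        row_totals[k] * col_totals[k] for k in range(n_classes)
    ) / (total * total)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical confusion matrix for two classes (50 test samples).
example_confusion = [
    [20, 5],
    [10, 15],
]

example_accuracy = accuracy(example_confusion)
example_kappa = cohen_kappa(example_confusion)
```

Unlike accuracy, kappa discounts the agreement expected by chance from the class proportions, which is why the two scores are usually reported together when classes are imbalanced.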
The proposed approach has far-reaching consequences for decision-making in mining exploration, geological modeling, and geometallurgical planning. We expect it to be used in supervised classification problems that arise in varied disciplines of natural sciences and engineering and involve regionalized data.
How to cite: Borah, A. and Emery, X.: Integration of Machine Learning and Geostatistics for Hydrothermal Alteration Classification in Smart Mining, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-2073, https://doi.org/10.5194/egusphere-egu26-2073, 2026.