Enhancing accuracy and interpretability of machine learning models using super learning and permutation feature importance techniques in digital soil mapping

Ruhollah Taghizadeh-Mehrjardi; Nikou Hamzehpour; Maryam Hassanzadeh; Karsten Schmidt; Thomas Scholten

doi:https://doi.org/10.5194/egusphere-egu21-9382

Session SSS10.3

EGU21-9382

EGU General Assembly 2021

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Ruhollah Taghizadeh-Mehrjardi¹,²,³, Nikou Hamzehpour⁴, Maryam Hassanzadeh⁴, Karsten Schmidt⁵, and Thomas Scholten¹,²

¹Eberhard Karls University Tübingen, Institute of Geography, Soil Science and Geomorphology, Tübingen, Germany (ruhollah.taghizadeh-mehrjardi@mnf.uni-tuebingen.de)
²CRC 1070 ResourceCultures, University of Tübingen, Gartenstr. 29, Tübingen, Germany
³Faculty of Agriculture and Natural Resources, Ardakan University, Ardakan, Iran
⁴Soil Science Department, Faculty of Agriculture, University of Maragheh, Maragheh, Iran
⁵eScience Center, University of Tübingen, 72070 Tübingen, Germany

The digital soil mapping (DSM) approach uses machine learning (ML) models to predict soil characteristics from the relationship between soil observations and related covariates. In this research, we applied a wide range of ML models (12 base learners) to predict and map soil characteristics. To enhance accuracy and interpretability, we combined the base learner predictions using a super learning strategy. However, a major problem with super learning and other complex models is that the share of individual covariates in the overall result cannot be explicitly quantified. To overcome this restriction and make the super learning models interpretable, we employed model-agnostic interpretation tools such as permutation feature importance. In particular, we integrated the weight that super learning assigned to each ML base learner with the covariate ranking that permutation feature importance produced for that base learner, in order to explore the contribution of each covariate to the final prediction. We tested our super learning and permutation feature importance techniques by predicting and mapping physicochemical soil characteristics of Urmia Playa Lake (UPL) sediments in Iran. As expected, our results indicated that super learning predicted soil characteristics significantly more accurately than the single base learners. In terms of root mean square error, super learning improved on the performance of linear regression by an average of 45.7%. Furthermore, permutation feature importance allowed us to interpret our results better and to demonstrate the significant contribution of geomorphological features and groundwater data to predicting soil characteristics of UPL sediments.
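
The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the data are synthetic (not the UPL observations), only three base learners stand in for the twelve used in the study, the super learner weights are estimated by non-negative least squares on out-of-fold predictions (one common choice of meta-learner), and the rule that combines the weights with the permutation importances (a weighted sum of normalized importances) is one plausible reading of "integrated", since the abstract does not state the formula.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for soil observations (y) and covariates (X); hypothetical data.
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Three illustrative base learners (the study used twelve).
base_learners = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}

# Level-one data: out-of-fold predictions of every base learner.
Z = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5)
    for model in base_learners.values()
])

# Meta-learner (assumed): non-negative least squares, weights rescaled to sum to 1.
raw_weights, _ = nnls(Z, y_train)
weights = raw_weights / raw_weights.sum()

# Refit each base learner on all training data and form the ensemble prediction.
for model in base_learners.values():
    model.fit(X_train, y_train)
P_test = np.column_stack([m.predict(X_test) for m in base_learners.values()])
sl_pred = P_test @ weights


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rmse_linear = rmse(y_test, P_test[:, 0])
rmse_sl = rmse(y_test, sl_pred)
# Relative RMSE improvement over linear regression, in percent.
improvement = 100.0 * (rmse_linear - rmse_sl) / rmse_linear

# Permutation feature importance per base learner, normalized to sum to 1.
importances = []
for model in base_learners.values():
    result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=0,
        scoring="neg_root_mean_squared_error")
    imp = np.clip(result.importances_mean, 0.0, None)
    total = imp.sum()
    importances.append(imp / total if total > 0 else imp)

# Assumed combination rule: weight each learner's importances by its ensemble weight.
combined = np.sum([w * imp for w, imp in zip(weights, importances)], axis=0)

print("weights:", dict(zip(base_learners, np.round(weights, 3))))
print("RMSE improvement over linear regression (%):", round(improvement, 1))
print("covariate ranking (most important first):", np.argsort(combined)[::-1])
```

On real data, the columns of `X` would be the environmental covariates (for example terrain attributes or groundwater variables), and the ranking in the last line would indicate which covariates drive the ensemble prediction most strongly under the assumed combination rule.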

How to cite: Taghizadeh-Mehrjardi, R., Hamzehpour, N., Hassanzadeh, M., Schmidt, K., and Scholten, T.: Enhancing accuracy and interpretability of machine learning models using super learning and permutation feature importance techniques in digital soil mapping, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-9382, https://doi.org/10.5194/egusphere-egu21-9382, 2021.

Corresponding displays formerly uploaded have been withdrawn.