Advances in soil modeling through data analytics, machine learning and prediction


Soil modeling is witnessing an unprecedented increase in data volume, opening up new opportunities to advance physical understanding, improve earth system modeling, and increase the predictive ability of climate and earth surface processes at a range of scales. Mining the data for new knowledge also presents new challenges and opportunities to the soil science community, instigating the development or adaptation of tools from mathematics, statistics, and computer science for the problems at hand. This session provides a forum for scientists to exchange ideas on topics such as, but not limited to, the following: (1) innovative data analytics approaches in soil modeling, (2) using machine learning (deep learning) and other innovative approaches in predicting soil hydraulic properties, soil transport parameters, thermal parameters, biogeochemical parameters, etc., (3) machine learning coupled with field data and numerical data, (4) advances in theoretical and applied studies in soil modeling along with their predictability and uncertainty, and (5) development of soil-related data and its application in soil modeling. Scientists working in soil modeling sciences related to the above topics are encouraged to participate.

Convener: Yonggen Zhang | Co-Conveners: Tomislav Hengl, Teamrat Ghezzehei, Wei Shangguan
| Thu, 20 May, 15:00–16:30 (CEST)
| Attendance Thu, 20 May, 16:30–18:00 (CEST)

Oral: Thu, 20 May

Chairpersons: Yonggen Zhang, Wei Shangguan, Teamrat Ghezzehei
José Padarian

Since the first applications of machine learning (ML) methods in the 1980s, ML adoption in soil science has increased considerably. In parallel, the size of soil datasets has also increased. However, current soil modelling is mostly based on “traditional” ML approaches, which do not take full advantage of large datasets or the multiple opportunities provided by more advanced modelling methods. Here I present the latest examples of the use of ML for soil predictive modelling, specifically the use of deep learning models in the context of soil spatial modelling and soil spectroscopy. Additionally, I will show less traditional ML applications that allow the integration of field data into numerical workflows, and some advanced training techniques that showcase the flexibility of neural networks and open new, exciting opportunities to solve soil modelling challenges.

How to cite: Padarian, J.: Deep neural networks: a flexible framework for soil modelling, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-105, 2021.

Surya Gupta, Peter Lehmann, Andreas Papritz, Tomislav Hengl, Sara Bonetti, and Dani Or

Saturated soil hydraulic conductivity (Ksat) is a key parameter in many hydrological and climatic modeling applications, as it controls the partitioning between precipitation, infiltration and runoff. Values of Ksat are often deduced from Pedotransfer Functions (PTFs) using maps of soil attributes. To circumvent inherent limitations of present PTFs (heavy reliance on arable land measurements, neglect of soil structure, and geographic bias toward temperate regions), we propose a new global Ksat map at 1 km resolution by harnessing technological advances in machine learning and the availability of remotely sensed surrogate information (terrain, climate and vegetation). We compiled a comprehensive Ksat data set with 13,258 geo-referenced data points from literature and other sources. The data were standardized and quality-checked in order to provide a global database of soil saturated hydraulic conductivity (SoilKsatDB). The SoilKsatDB was then applied to develop a Covariate-based GeoTransfer Function (CoGTF) model for predicting spatially distributed Ksat values using remotely sensed information on various environmental covariates. The model accuracy assessment based on spatial cross-validation shows a concordance correlation coefficient (CCC) of 0.16 and a root mean square error (RMSE) of 1.18 for log10 Ksat values in cm/day (CCC=0.79 and RMSE=0.72 for non-spatial cross-validation). The generated maps of Ksat represent spatial patterns of soil formation processes more distinctly than previous global maps of Ksat based on soil texture information and bulk density. The validation indicates that Ksat could be modeled without bias using CoGTFs that harness spatially distributed surface and climate attributes, compared to PTFs based on soil information. The relatively poor performance of all models in the validation (low CCC and high RMSE) highlights the need to collect additional Ksat values to train the model for regions with sparse data.
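The two accuracy measures quoted above can be sketched as follows. This is only an illustration, assuming the usual definition of Lin's concordance correlation coefficient with population (divide-by-n) moments; the abstract does not state which variant was used, and the function names and the sample log10 Ksat values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)


def concordance_cc(obs, pred):
    """Lin's concordance correlation coefficient (population moments).

    Unlike Pearson's r, it is reduced by both a shift in the mean and a
    difference in spread between predictions and observations.
    """
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical log10(Ksat) values in cm/day, for illustration only.
observed = [0.5, 1.2, 2.0, 2.6, 1.8]
predicted = [0.9, 1.0, 1.7, 2.2, 2.1]
print("CCC :", round(concordance_cc(observed, predicted), 3))
print("RMSE:", round(rmse(observed, predicted), 3))
```

In a spatial cross-validation, the held-out points come from spatially separated blocks, so the same two functions would simply be applied to the pooled out-of-block predictions.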

How to cite: Gupta, S., Lehmann, P., Papritz, A., Hengl, T., Bonetti, S., and Or, D.: Global Mapping of Soil Saturated Hydraulic Conductivity Combining Legacy Data, Spatial Covariates and Machine Learning, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-90, 2021.

Jingyi Huang, Ankur Desai, Jun Zhu, Alfred Hartemink, Paul Stoy, Steven Loheide II, Heye Bogena, Yakun Zhang, Zhou Zhang, and Francisco Arriaga

Current in situ soil moisture monitoring networks are sparsely distributed, while remote sensing satellite soil moisture maps have a very coarse spatial resolution. In this study, an empirical global surface soil moisture (SSM) model was established via fusion of in situ continental and regional scale soil moisture networks, remote sensing data (SMAP and Sentinel-1) and high-resolution land surface parameters (e.g., soil texture, terrain) using a quantile random forest (QRF) algorithm. The model had a spatial resolution of 100 m and performed moderately well under cultivated, herbaceous, forest, and shrub soils (R2 = 0.524, RMSE = 0.07 m3 m−3). It showed relatively good transferability at the regional scale among different continental and regional networks (mean RMSE = 0.08–0.10 m3 m−3). The global model was then applied to map SSM dynamics at 30–100 m across a field-scale network (TERENO-Wüstebach) in Germany and an 80-ha irrigated cropland in Wisconsin, USA. Without local training data, the model was able to delineate the variations in SSM at the field scale but contained large bias. With the addition of 10% local training datasets (“spiking”), the bias of the model was significantly reduced. The QRF model was also affected by the resolution and accuracy of soil maps. It was concluded that the empirical model has the potential to be applied elsewhere across the globe to map SSM at the regional to field scales for research and applications. Future research is required to improve the performance of the model by incorporating more field-scale soil moisture sensor networks and high-resolution soil maps as well as assimilation with process-based water flow models.

How to cite: Huang, J., Desai, A., Zhu, J., Hartemink, A., Stoy, P., Loheide II, S., Bogena, H., Zhang, Y., Zhang, Z., and Arriaga, F.: A data-driven approach for mapping global surface soil moisture at 100 m using high-resolution remote sensing data and land surface parameters, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-26, 2021.

Ruhollah Taghizadeh-Mehrjardi, Razieh Sheikhpour, Norair Toomanian, and Thomas Scholten

The most critical limitation in applying digital soil mapping is its limited transferability. Modelling soil properties for regions where no or only sparse soil information is available is highly uncertain when using low-cost geo-spatial environmental covariates alone. To overcome this drawback, transfer learning has been introduced in different environmental sciences, including soil science. The general idea behind extrapolating soil information with transfer learning is that the target area is similar to the reference area, e.g. in terms of soil-forming factors, so that the same machine learning rules can be applied. So far, supervised machine learning has been used to transfer soil information from reference to target areas that have very similar environmental characteristics. Hence, it is unclear how well machine learning performs for target regions with different environmental characteristics. Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data (reference area) with a large amount of unlabeled data (target area) during training. In this study, we explored whether semi-supervised learning could improve the transferability of digital soil mapping relative to supervised learning methods. Soil data for two arid regions and associated environmental covariates were obtained. Semi-supervised and supervised learning models were trained on the data in the reference area and then tested on the data in the target area. The results indicated that semi-supervised learning transferred soil information from one area to another better than the supervised learning method.

How to cite: Taghizadeh-Mehrjardi, R., Sheikhpour, R., Toomanian, N., and Scholten, T.: Semi-supervised learning for increasing transferability of machine learning in digital soil mapping, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-32, 2021.

Rodrigo Miranda, Rodolfo Nobrega, Estevão Silva, Jadson Freire, José Filho, Magna Moura, Alexandre Barros, Alzira Saraiva, Anne Verhoef, Raghavan Srinivasan, Suzana Montenegro, Maria Araújo, and Josiclêda Galvíncio

Environmental models often require soil maps to represent the spatial variability of soil attributes. However, mapping soils using conventional in-situ survey protocols is time-consuming and costly. As an alternative, digital soil mapping offers a fast-mapping approach that might be used to monitor soil attributes and their interrelationships over large areas. In Brazil, conventional survey methods are still widely used, so maps that are still in development have been regarded as state-of-the-art products for decades. In this study, we address this lack of updated spatial information on many soil attributes by producing regional statistical soil models using an innovative framework. This new framework attempts to reduce prediction redundancies due to high multicollinearity by implementing a Feature Selector algorithm. This is expected to improve a model’s strength by decreasing its unexplained variance. The framework’s core is composed of the Soil-Landscape Estimation and Evaluation Program (SLEEP) and a calibrated Gradient Boosting Model capable of modelling the spatial distribution of soil attributes at multiple soil depths. These models allowed us to explain the spatial distribution of some basic soil attributes (physical and chemical) and their environmental drivers. The model training and testing approach used 30 environmental attributes and data from 223 soil profiles for the state of Pernambuco, Brazil. Our models demonstrated a consistent potential to perform spatial extrapolations, with r2 ranging from 0.8 to 0.97 and PBIAS from -0.51 to 2.03. The properties related to topographic and climatic conditions dominated when estimating the number of horizons, percentage of silt and the sum of bases (a measure of soil fertility). We believe that our framework offers high flexibility while reducing capital investments compared to in situ surveys and traditional mapping protocols.
These findings also have implications for the improvement and testing of pedotransfer functions. We thank FACEPE for funding this through APQ 0646-9.25/16.

How to cite: Miranda, R., Nobrega, R., Silva, E., Freire, J., Filho, J., Moura, M., Barros, A., Saraiva, A., Verhoef, A., Srinivasan, R., Montenegro, S., Araújo, M., and Galvíncio, J.: Digital soil mapping using machine learning techniques in a varied tropical environment, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-78, 2021.

Interactive: Thu, 20 May, 16:30–18:00 | virtual poster area

Chairpersons: Wei Shangguan, Teamrat Ghezzehei, Yonggen Zhang
Annelie Ehrhardt, Jannis Groh, and Horst H. Gerke

Preferential and lateral subsurface flow may be responsible for the accelerated transport of water and solutes in sloping agricultural landscapes; however, the process is difficult to observe. One idea is to compare time series of soil moisture observations in the field with those in lysimeters, where flow is vertically oriented. This study aims at identifying periods of deviations in soil water contents and pressure heads measured in the field and in a weighing lysimeter with the same soil profile. Wavelet Coherency Analysis (WCA) was applied to time series of hourly soil water content and pressure head data (15, 32, 60, 80, and 140 cm depths) from Colluvic Regosol soil profiles in summer 2017. The phase shifts and periodicities indicated by the WCA plots reflected the response times to rain events at the same depth in lysimeter and field soil. For many rain events and depths, sensors installed in the field soil showed a faster response than those in the lysimeter soil. This could be explained by either vertical preferential flow or lateral subsurface flow from upper hillslope positions. Conversely, a faster sensor response in the lysimeter soil could be indicative of vertical preferential flow effects. The WCA plots summarize all temporal patterns of time shifts and correlations between long time series in a condensed form, which helps identify potentially relevant periods for more detailed analyses of subsurface flow dynamics.

How to cite: Ehrhardt, A., Groh, J., and Gerke, H. H.: Wavelet Analysis of Soil Water State Variables for Identification of Lateral Subsurface Flow: Lysimeter versus Field Data, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-3, 2021.

Sen Lu

The relationship between soil thermal conductivity (λ) and matric suction of soil water (h, the negative of matric potential) has been widely used in land surface models for estimating soil temperature and heat flux, following the McCumber and Pielke (1981, MP81) λ-h model. However, few datasets are available for evaluating the accuracy and feasibility of the MP81 λ-h model under various soil and moisture conditions. In this study, we developed a new λ-h model and compared its performance with that of the MP81 model using measurements on 18 soils with a wide range of textures, water contents and bulk densities. The heat pulse technique was used to measure λ, and the suction table, micro-tensiometers, pressure plate device, and dew point potentiometer were applied to obtain soil water retention curves at the appropriate suction ranges. In the range of pF ≤ 3 (pF being the common logarithm of h in cm), the λ-h relationships were highly nonlinear and varied strongly with soil texture and bulk density. In the dry range (i.e., pF > 3), there existed a universal λ-h relationship for all soil textures and bulk densities, and an exponential function was established to describe the relationship. Independent evaluations using λ-h data on five intact soil samples showed that the new model produced accurate λ data from pF values, with root mean square errors (RMSE) in the range of 0.03–0.18 W m−1 K−1. In contrast, large errors (RMSEs within 0.17–0.36 W m−1 K−1) were observed with λ estimates from the MP81 model.

How to cite: Lu, S.: The relationship between thermal conductivity and matric suction of soils, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-5, 2021.

Qingliang Li, Zhongyan Li, Wei Shangguan, Yifei Yao, Xuezhi Wang, and Fanhua Yu

Skillful long-term prediction of soil moisture (a lead time of 3 days or more) can be more helpful than short-term prediction for many practical applications, including ecosystem management and precision agriculture. It presents great challenges because soil moisture variation in the far future is more uncertain than in the near future. Therefore, a novel circulating learning deep learning (DL) model based on Long Short-Term Memory (LSTM) is developed in this study as an alternative data-intelligence tool. This model includes two layers: the encoder-decoder LSTM layer and an LSTM with a fully connected layer, which were used to enhance the long-term prediction ability by considering the intermediate time-series data between the input timestep and the predictive timestep. We applied this model using the FLUXNET2015 Tier 1 and Tier 2 subset data products over seven sites in different countries. The results show that our model predicts soil moisture with better accuracy in average state, fluctuation pattern and amplitude when compared with other methods, such as Multiple Linear Regression (MLR), Long Short-Term Memory (LSTM) and encoder-decoder LSTM models. Furthermore, the predictability of soil moisture at different lead times (short-term, medium-term and long-term) under various conditions (i.e., different hyper-parameters in our model, different predictive models, different climate regions and different sites) is discussed extensively. The code of our model is publicly available. We hope that this work will accelerate research on long-term soil moisture prediction.

How to cite: Li, Q., Li, Z., Shangguan, W., Yao, Y., Wang, X., and Yu, F.: Exploring the Long-Term Soil Moisture Predictability with FLUXNET Site Data using Circulating Learning Model, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-21, 2021.

Liang Zhong, Xi Guo, Zhe Xu, and Meng Ding

Soil, as a non-renewable resource, should be monitored continuously to prevent its degradation and promote sustainable agricultural management. Soil spectroscopy in the visible-near infrared range is a fast and cost-effective analytical technique to predict soil properties. The use of large soil spectral libraries can reduce the work needed to reliably estimate soil properties and obtain robust models capable of widespread applicability. Deep learning is apt for big data analysis, and this approach could herald a profound change in the way we model soil spectral data generally. Accordingly, we explored the modeling potential of deep convolutional neural networks (DCNNs) for soil properties based on a large soil spectral library. The European topsoil dataset provided by the Land Use/Cover Area frame Survey (LUCAS) was used without any pre-processing of spectra or added covariates. Two 16-layer DCNN models (ResNet-16 and VGGNet-16) were successfully used to make regression predictions of seven soil properties and classification predictions of soil texture into four groups and 12 levels. Our results showed that the ResNet-16 and VGGNet-16 models produced highly accurate predictions for most soil properties, being superior to both a shallow convolutional neural network and traditional machine learning approaches. Soil organic carbon content, nitrogen content, cation exchange capacity, pH, and calcium carbonate content were well predicted, having a ratio of performance to deviation (RPD) > 2.0. Soil potassium content was adequately predicted (1.4 ≤ RPD ≤ 2.0) and phosphorus content was poorly predicted (RPD < 1.4). The overall classification accuracy of soil texture was 0.749 (four groups) and 0.566 (12 levels). The position of feature wavelengths differed among the soil properties, for which multiple characteristic peaks were common.
This study fully demonstrates the modeling potential of deep learning with soil hyperspectral data, which could bring us closer to achieving precision agriculture.
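The RPD thresholds used above to grade the predictions can be illustrated with a short sketch. It assumes the common definition of RPD as the standard deviation of the observed values divided by the RMSE of the predictions; the abstract does not spell out the formula, and the function names and sample values are hypothetical.

```python
import math
import statistics


def rpd(obs, pred):
    """Ratio of performance to deviation: SD(observed) / RMSE(predicted)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return statistics.stdev(obs) / rmse


def rpd_class(value):
    """Grade a model using the thresholds quoted in the abstract."""
    if value > 2.0:
        return "well predicted"
    if value >= 1.4:
        return "adequately predicted"
    return "poorly predicted"


# Hypothetical soil organic carbon values (g/kg), for illustration only.
observed = [12.0, 25.0, 8.0, 40.0, 18.0, 30.0]
predicted = [14.0, 22.0, 10.0, 37.0, 20.0, 27.0]
score = rpd(observed, predicted)
print(round(score, 2), rpd_class(score))
```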

How to cite: Zhong, L., Guo, X., Xu, Z., and Ding, M.: Soil properties: their prediction and feature extraction from the LUCAS spectral library using deep convolutional neural networks, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-28, 2021.

Nisha Bao, Xiaoyu Yang, and Yue Cao

Soil nutrient content is one of the most important properties supporting farmland quality and productivity. Imaging spectrometry has the potential for rapid acquisition and real-time monitoring of soil characteristics. The goal of this study was to explore preprocessing and modeling methods for hyperspectral images acquired from a UAV platform to estimate soil organic matter (SOM) and soil total nitrogen (STN) content in farmland. The results showed that: 1) the Multiple Scattering Correction method performed better in reducing image scattering noise than the Standard Normal Variate transformation or spectral derivatives, with higher correlation and lower signal-to-noise ratio; 2) the proposed feature selection method, which combined the Competitive Adaptive Reweighted Sampling algorithm (CARS) and the Successive Projections Algorithm (SPA), could provide selective preference for hyperspectral bands, with a final 24 feature bands for SOM estimation and 22 feature bands for STN estimation; 3) the particle swarm optimization (PSO) algorithm was selected to optimize the input weights and hidden biases of the extreme learning machine (ELM) model for SOM and STN prediction. The PSO-ELM model with the selected bands as input produced higher prediction accuracy than the ELM model, with an R2 of 0.73 and RPD of 1.91 for SOM and an R2 of 0.63 and RPD of 1.53 for STN. These outcomes provide technical support for wider application of soil property estimation using imaging spectrometry in precision agriculture monitoring and mapping.

How to cite: Bao, N., Yang, X., and Cao, Y.: Soil nutrient estimation and mapping in agriculture land based on improved ELM and UAV imaging spectrometry, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-31, 2021.

Fatemeh Hateffard and Tibor József Novák

One of the most critical steps in digital soil mapping is finding a sampling approach that provides good spatial coverage of the area with respect to soil spatial variation. Here, environmental variables can help place samples at more informative and more precise locations while reducing soil sampling effort in terms of time and cost. Conditioned Latin hypercube sampling (cLHS) is a stratified random design strategy designed to represent the variability of auxiliary variables in feature space. This study applied this method and compared it to simple random sampling to optimize sampling designs for mapping at an agricultural study site in Hungary. The covariates were indices extracted from the digital elevation model and Landsat images. Principal component analysis (PCA) was applied to reduce data overlap and select the most important variables as the model's inputs. By computing statistical criteria (mean, variance, standard deviation, etc.) for the covariates and comparing the results between the sampled populations and the entire population, we may conclude that both designs gave almost similar predictions. However, for most covariates, the means from cLHS approximated the population means more closely than those from random sampling, while the variances and SDs were similar for both designs. Furthermore, the histograms of most variables in the cLHS sample followed the original distribution of the environmental covariates more closely. Overall, considering the type of the study site and the chosen variables, cLHS appears to be the more suitable method.


How to cite: Hateffard, F. and József Novák, T.: Soil sampling design optimization by using conditioned Latin Hypercube sampling, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-35, 2021.

Gao Bingbo, Alfred Stein, and Wang Jinfeng

Soil heavy metal contamination has become a serious problem worldwide. Accurate prediction of soil heavy metal concentration at un-sampled locations from a small sample remains a challenge, because many natural and human factors produce a complex heterogeneous pattern and the relationships between influencing factors are also not homogeneous. To overcome these heterogeneities and improve prediction accuracy, a two-point machine learning method is proposed in this paper that fully leverages the spatial relationship and similarity relationship of high-dimensional ancillary variables. It first models the difference between paired points using a machine learning model, then predicts the concentration differences between sampling points and the un-sampled points, and finally uses the predicted differences to choose near neighbors to obtain the final concentration prediction. In this method, an innovative way to search for near neighbors for the local model, based on the difference of the response variable, is put forward to overcome the curse of dimensionality. Its performance was illustrated in two diverse case studies, which demonstrated that the proposed method can dramatically improve the prediction accuracy for soil heavy metals. Besides spatial prediction of soil pollution, it can also be applied to spatial prediction of other elements of the earth system. Furthermore, the machine learning model used in this paper can be replaced by any other supervised learning model according to the specific situation.





How to cite: Bingbo, G., Stein, A., and Jinfeng, W.: A two point machine learning method for spatial prediction for soil: overcoming the spatially heterogeneous distribution and relationship of soil heavy metal concentration, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-37, 2021.

Zhihan Yang, Xiaolu Tang, Xinrui Luo, and Yuehong Shi

Soil respiration (RS), consisting of soil autotrophic respiration (RA) and heterotrophic respiration (RH), is the largest outflux of CO2 from terrestrial ecosystems to the atmosphere. The temperature sensitivity (Q10) of RS plays a crucial role in benchmarking the intensity of terrestrial soil carbon-climate feedbacks. However, the heterogeneity of Q10 of RS has not been well explored. To fill this substantial knowledge gap, gridded long-term Q10 datasets of RS at 5 cm with a spatial resolution of 1 km were developed from 515 field observations using a random forest (RF) algorithm linked to climate, soil and vegetation variables. Q10 of RA and RH were estimated based on the linear correlation between Q10 of RS and RA/RH. Field observations indicated that regardless of ecosystem type, Q10 of RS ranged from 1.54 to 4.17 with an average of 2.52. Q10 varied significantly among ecosystem types, with the highest mean value of 3.18 for shrubland, followed by wetland (2.66), grassland (2.49) and forest (2.48), whereas the lowest value of 2.14 was found in cropland. RF could explain the spatial variability of Q10 of RS well (model efficiency = 0.5). Temporally, Q10 of RS, RA and RH did not differ significantly (p = 0.386). Spatially, Q10 of RS, RA and RH varied greatly. Among climatic zones, the plateau areas had the highest mean Q10 value of 2.88, followed by tropical areas (2.63) and temperate areas (2.52), while the subtropical region had the lowest Q10 on average (2.37). The predicted mean Q10 of RS, RA and RH were 2.52, 2.29 and 2.64, respectively, with strong spatial patterns, indicating that the traditional constant Q10 of 2 may introduce great uncertainties into the understanding of soil carbon-climate feedbacks in a warming climate.
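To make the role of Q10 concrete, the sketch below uses the standard Q10 formulation, R(T) = R_ref · Q10^((T − T_ref)/10), to compare a constant Q10 of 2 with the mean value of 2.52 reported above. The abstract does not say how Q10 was derived from the field observations, so the function names, the reference rate and the temperatures are assumptions chosen only for illustration.

```python
def respiration_at(r_ref, q10, temp_c, temp_ref_c):
    """Respiration rate at temp_c, given rate r_ref at temp_ref_c (standard Q10 model)."""
    return r_ref * q10 ** ((temp_c - temp_ref_c) / 10.0)


def q10_from_rates(rate_1, temp_1_c, rate_2, temp_2_c):
    """Q10 implied by two respiration rates measured at two temperatures."""
    return (rate_2 / rate_1) ** (10.0 / (temp_2_c - temp_1_c))


# Hypothetical reference rate (arbitrary units) at 10 degrees C.
r_ref, t_ref = 1.0, 10.0
for q10 in (2.0, 2.52):
    warmed = respiration_at(r_ref, q10, t_ref + 5.0, t_ref)
    print(f"Q10 = {q10}: rate after 5 degrees of warming = {warmed:.3f}")
```

With these assumptions, the gap between the two curves widens as the warming increases, which is the sense in which a fixed Q10 of 2 can misstate the feedback.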

How to cite: Yang, Z., Tang, X., Luo, X., and Shi, Y.: Spatial heterogeneity of temperature sensitivity of soil respiration across China, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-42, 2021.

Leonardo Deiss, Shameema Oottikkal, Karen Tomko, Wanyu Huang, Steve Culman, and Scott Demyan

Soil infrared spectroscopy has great potential for estimating soil properties, but reference soil measurements are typically required in combination with multivariate statistical models to estimate soil properties. The lack of user-friendly predictive tools based on open-source statistical environments remains one of the main limitations to technology diffusion to non-specialist users. Our aim is to build capacity for an automated machine learning routine for rapid and robust prediction of soil health indicators using lab-acquired soil infrared spectra. This intelligent system runs in the R statistical environment and includes (1) a diverse soil spectral library covering the main physiographic regions of the USA Midwest under diverse land uses and various sampling depths, (2) a classification process to detect potential outliers in newly acquired spectra using supervised machine learning techniques, and (3) a multi-model optimized prediction process based on linear and non-linear statistical procedures (partial least squares, support vector machines, and neural network). This prediction system works at the intersection of soil science, data science and high-performance computing to enable efficient parallel processing of spectral data on multi-core coprocessors. Using artificial intelligence to automate soil infrared spectroscopy is a fundamental requirement for making this technique an effective routine in soil laboratories to estimate soil health.

How to cite: Deiss, L., Oottikkal, S., Tomko, K., Huang, W., Culman, S., and Demyan, S.: Machine learning to automate rapid soil health assessment using infrared spectroscopy, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-83, 2021.

Wenjuan Zheng, Chongyang Shen, Lianping Wang, and Yan Jin

Knowledge of the soil water retention curve (SWRC) is critical to mathematical modeling of soil water dynamics in the vadose zone. Traditional SWRC models were developed based on bundles of cylindrical capillaries (BCCs) using a residual water content, and they fail to accurately describe the dry end of the curve. This study improved and expanded on the traditional BCC models. Specifically, the total water retention was treated as a weighted superposition of capillary and adsorptive components. We proposed a continuous mathematical expression for water retention from saturation to oven dryness, which also allowed for a partition of capillary and adsorptive retention. We further evaluated six capillary retention functions using different probability laws for pore-size distribution, namely the log-logistic, Weibull, lognormal, two-parameter van Genuchten (VG), three-parameter VG (or Dagum), and Fredlund–Xing (FX) distributions. Model testing against 144 experimental data showed better agreement of the proposed model with experimental observations than the traditional approaches that use the residual water content. The Dagum and FX distributions, which have one more degree of freedom, provided better agreement with experimental data than the other four distributions. The log-logistic and lognormal distributions fitted the experimental data better than the Weibull and VG distributions for loam soils. In addition, the fitted weighting factor w using the log-logistic and lognormal distributions correlated better with soil clay content than the other four distributions. Our study suggests that the log-logistic and lognormal distributions are more suitable for modeling soils’ pore-size distribution than the other tested distributions.

How to cite: Zheng, W., Shen, C., Wang, L., and Jin, Y.: An empirical soil water retention model based on probability laws for pore-size distribution, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-87, 2021.

Hao Chen, Tiejun Wang, and Yonggen Zhang

Accurately mapping soil water retention parameters is vital for modeling atmosphere-land interactions but is challenged by limited measurements and simulations globally. Ensemble pedotransfer functions (PTFs) have been highly recommended because of the higher reliability of ensemble models and the error compensation among ensemble members. However, conventional ensemble approaches assign a fixed weight to each PTF and may not fully utilize the strengths of individual PTFs. In this work, we developed a new ensemble approach based on an automated machine learning workflow that assigns varying weights to combine 13 widely used PTFs. The AutoML-assisted ensemble approach (AutoML-Ens), as well as the simple average (MEAN), Bayesian Model Average (BMA), and the hierarchical multi-model ensemble approach (HMME), were evaluated using the global-coverage National Cooperative Soil Survey (NCSS) Soil Characterization Database. Results indicate that the AutoML-Ens approach performs better than the conventional approaches in terms of the coefficient of determination (R2) and root mean square error (RMSE). Three soil hydraulic parameters, i.e., saturated water content, field capacity, and wilting point, and their corresponding uncertainties, were further derived through the AutoML-Ens approach at a 30″ × 30″ geographical spatial resolution based on a global soil composition database (SoilGrids), and can be applied in Earth system modeling. This study demonstrated the necessity of dynamic weight assignment in ensemble approaches and the great potential of coupling data-driven (here, AutoML) and modeling (empirical or physically based PTFs) approaches in mapping global soil water retention-like parameters.

How to cite: Chen, H., Wang, T., and Zhang, Y.: An automated machine learning based ensemble approach for improving estimates of soil water retention parameters, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-96, 2021.

Alexandre Stehlick, Ana Elisa Barioni, Paulino Ribeiro, and Luís Gustavo Barioni

Most models of soil C dynamics can be expressed by the differential vector equation:

dC(t)/dt = f(t).K.C(t) + b(t)

where each element of the vector C(t) represents a carbon compartment with an intrinsic decomposition rate (usually fast, slow and passive); K is the transition matrix between the compartments (decomposition rates and decomposition partitioning); the scalar function f(t) combines the modifiers of the decomposition rates (e.g. soil moisture and temperature); and b(t) is the vector of external C input rates for each compartment. When only total soil carbon is measured, only the sum of C over all compartments can be used for model evaluation, calibration and data assimilation. Also, most compartmental models have too many adjustable parameters, leading to identifiability problems. Although some parameters can be constrained according to the model’s assumptions, identifiability remains problematic except for the simplest compartmental models. By working on the differential equation, it is possible to deduce an explicit representation of the total carbon trajectory in which the number of necessary empirical parameters is reduced, without loss of generality or the need for further assumptions. In this work we propose such a representation of the total carbon trajectory, general enough to implicitly embrace the mechanisms of models such as Century, RothC and CQESTR. The solution requires fewer parameters than the original models but still allows the original model parameters and decomposition modifier functions to be mapped onto it. Additionally, we show how the main processes of soil organic matter decomposition can be represented by the terms of the solution. Finally, we present the behavior of the solution under extreme conditions of temperature, humidity and initial stocks. We expect our general framework to help improve model calibration and data assimilation procedures.
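To make the compartmental equation concrete, the sketch below integrates dC/dt = f(t) K C + b(t) numerically and returns the total carbon, which is the only quantity observed when total soil carbon alone is measured. It is a plain forward-Euler integration, not the analytic representation proposed in the abstract, and the three-pool rates, transfer fractions, and inputs are hypothetical values chosen for illustration.

```python
def simulate_total_carbon(c0, K, f, b, t_end, dt=0.001):
    """Integrate dC/dt = f(t) * K @ C + b(t) with forward Euler.

    c0: initial carbon per compartment; K: transition matrix (list of rows);
    f: scalar rate modifier f(t); b: function returning the input vector b(t).
    Returns (total carbon at t_end, carbon per compartment at t_end).
    """
    n = len(c0)
    c = list(c0)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        ft = f(t)
        bt = b(t)
        dc = [ft * sum(K[i][j] * c[j] for j in range(n)) + bt[i]
              for i in range(n)]
        c = [c[i] + dt * dc[i] for i in range(n)]
        t += dt
    return sum(c), c


# Hypothetical three-pool example (fast, slow, passive); rates in 1/year.
# Diagonal entries are decomposition rates; off-diagonal entries are the
# fraction of decomposed carbon transferred to the next pool.
K = [
    [-1.0, 0.0, 0.0],
    [0.3, -0.1, 0.0],
    [0.0, 0.01, -0.01],
]
initial_pools = [2.0, 10.0, 30.0]      # hypothetical stocks


def constant_modifier(t):
    return 1.0                         # no moisture or temperature effect


def constant_inputs(t):
    return [1.5, 0.0, 0.0]             # hypothetical input, fast pool only


total, pools = simulate_total_carbon(
    initial_pools, K, constant_modifier, constant_inputs, t_end=10.0
)
print("total carbon after 10 years:", total)
print("per-pool carbon:", pools)
```

Many different parameter sets in K can produce nearly the same total trajectory, which is the identifiability problem the abstract describes.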

How to cite: Stehlick, A., Barioni, A. E., Ribeiro, P., and Barioni, L. G.: An analytic representation of the total soil carbon trajectory implied by the general mathematical framework of most soil carbon models, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-97, 2021.

Yang Zhou, Xiaomin Zhao, and Xi Guo

Soil total nitrogen is closely related to soil quality and fertility. Knowing its spatial distribution is important for implementing precision agriculture management. The spatial distribution of total nitrogen in the surface soil of Xunwu County was predicted and mapped using two methods: random forest and random forest plus residual kriging. These methods were combined with multi-source auxiliary variables: (i) terrain factors, (ii) geographical coordinates, (iii) remote sensing factors, (iv) climate factors, (v) distance factors, and (vi) soil physical or chemical factors. The prediction accuracy of the two models was compared over 100 repeated runs. Our results show that the mean values of the coefficient of determination (R2 = 0.6291) and concordance correlation coefficient (CCC = 0.7613) of the random forest model were higher than those of the random forest plus residual kriging method (R2 = 0.5719, CCC = 0.6881). Likewise, its mean absolute error (MAE = 0.1570 g·kg⁻¹) and root mean squared error (RMSE = 0.2108 g·kg⁻¹) were lower than those of the random forest plus residual kriging method (MAE = 0.1682 g·kg⁻¹, RMSE = 0.2267 g·kg⁻¹). Importantly, adding residual kriging to the random forest model did not improve its accuracy. These results suggest that the random forest model can be used as a method for predicting soil properties and can provide technical support for agricultural management.
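The four accuracy measures reported above can be computed as in the following sketch. It assumes R2 is defined as 1 − SS_res/SS_tot and CCC is Lin's concordance correlation coefficient with population (divide-by-n) variances; the abstract does not state which variants were used, and the sample values are made up for illustration.

```python
import math


def evaluation_metrics(obs, pred):
    """Return R2, CCC, MAE and RMSE for paired observations and predictions."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = var_o * n
    return {
        # Assumed definition: 1 - residual sum of squares / total sum of squares.
        "R2": 1.0 - ss_res / ss_tot,
        # Lin's CCC penalizes both scatter and bias (mean shift).
        "CCC": 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "MAE": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Made-up total nitrogen values (g/kg), for illustration only.
observed_tn = [1.0, 2.0, 3.0]
predicted_tn = [2.0, 3.0, 4.0]
print(evaluation_metrics(observed_tn, predicted_tn))
```

The example predictions are perfectly correlated with the observations but shifted by a constant, so CCC and R2 both fall well below 1; this sensitivity to bias is why CCC is often reported alongside R2.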

How to cite: Zhou, Y., Zhao, X., and Guo, X.: Prediction of Total Nitrogen Distribution in Surface Soil Based on Multi-source Auxiliary Variables and Random Forest Approach, 3rd ISMC Conference ─ Advances in Modeling Soil Systems, online, 18–22 May 2021, ISMC2021-103, 2021.