- Ohio State University, Department of Food, Agricultural and Biological Engineering, Columbus, United States of America (drewryd@gmail.com)
Soil heat flux (SHF) is a key component of the surface energy balance and a driver of soil physicochemical and biological processes. Despite its importance, accurate estimation of SHF is hindered by variations in soil composition, overlying vegetation density and phenology, and highly variable environmental forcings. These factors have challenged the development of robust SHF models, and modeling studies have focused on mid-day conditions corresponding to satellite overpass times, missing the significant variability that occurs throughout diurnal periods across a growing season. Here we assess the performance of ensemble machine learning (ML) modeling for predicting SHF at half-hourly resolution for multiple agro-ecosystems. Observations span a wide range of phenological and climatological variability over a complete growing season. We used the random forest ML approach to develop a wide range of models from combinations of predictor variables that include widely available meteorological conditions and proximal remote sensing observations of reflectance indices and land surface temperature (LST). The performance of these ML models was compared with a set of six semi-empirical SHF models developed around the use of remote sensing information. The random forest ensembles were generally able to significantly outperform the six semi-empirical models in capturing diurnal variations across the growing season for each of the four crops examined here (soybean, corn, sorghum and miscanthus). ML models using the complete set of meteorological and remote sensing predictors captured over 90% of the variability in SHF across all crops. ML models using only LST and the normalized difference vegetation index (NDVI) as predictors captured over 82% of SHF variability across all crops.
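As an illustration of what "variability captured" could mean in practice, the sketch below computes the coefficient of determination (R²) between observed and predicted SHF. This is an assumption: the abstract does not name the skill metric, and the function name `r_squared` and the sample values are hypothetical, not data from the study.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumed here as the measure of "variability captured";
    the abstract does not state which metric was used.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = fmean(observed)
    # Residual sum of squares: mismatch between model and observations.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations about their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - ss_res / ss_tot


# Made-up half-hourly SHF values in W m^-2, for illustration only.
observed_shf = [-20.0, -5.0, 30.0, 60.0, 25.0, -10.0]
predicted_shf = [-18.0, -8.0, 27.0, 55.0, 28.0, -12.0]
score = r_squared(observed_shf, predicted_shf)
print(f"R^2 = {score:.3f}")
```

Under this reading, a model that "captures over 90% of the variability" would have an R² above 0.90 on the evaluation data.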
Shapley additive explanations (SHAP) methods were examined as a means of model interpretability, providing insights into the typically opaque ML modeling process. From a set of seven observation variables, an exhaustive search was performed to identify predictor attributions for each of the four crops examined here. Models trained with fewer input variables tended to display more linear and interpretable feature attribution, suggestive of physical consistency. When present, LST and air temperature were often the most important predictors because of their high correlation with SHF, followed by NDVI, which quantifies canopy density and phenological status. These results suggest that robust and accurate SHF estimates can be made at high temporal resolution using only simple proximal remote sensing observations and widely available meteorological observations.
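The exhaustive search over seven variables can be sketched as an enumeration of predictor subsets. This is a minimal sketch that assumes "exhaustive" means every non-empty combination of the seven variables; the abstract names only LST, air temperature and NDVI, so the remaining four entries below are hypothetical placeholders, as is the function name `all_predictor_subsets`.

```python
from itertools import combinations

# Three predictors are named in the abstract; "var_4" to "var_7" are
# hypothetical placeholders for the four variables it does not name.
PREDICTORS = ["LST", "air_temperature", "NDVI",
              "var_4", "var_5", "var_6", "var_7"]


def all_predictor_subsets(predictors):
    """Return every non-empty combination of the given predictors."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


subsets = all_predictor_subsets(PREDICTORS)
# One candidate model per subset and per crop would be trained from this list.
print(len(subsets))
```

With seven candidate variables this yields 2^7 - 1 = 127 subsets per crop, which includes the two-variable LST and NDVI configuration among the smaller models.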
How to cite: Drewry, D. and Cross, J.: Ensemble machine learning for interpretable soil heat flux estimation, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-2935, https://doi.org/10.5194/egusphere-egu25-2935, 2025.