EGU26-9587, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-9587
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 07 May, 15:05–15:15 (CEST)
Room 0.15
A Generic Data-driven model for Soil Moisture Prediction 
Vidya Sumathy, Ilektra Tsimpidi, and George Nikolakopoulos
  • Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Luleå, Sweden (vidya.sumathy@ltu.se)

Most approaches to Soil Moisture (SM) estimation reported in the literature attempt to model the physical interactions between the sampled parameters and their environment. Classical approaches in this direction are physics-based models, such as the water balance model, which describe hydrological processes [Hu et al., 2025]. These models rely on physics-based equations and require high-quality input data, and their high computational cost limits their use in large-scale applications. Statistical methods were subsequently incorporated to enhance model adaptability [Fu et al., 2023]. Data-driven approaches, in turn, can establish empirical relationships between soil moisture and environmental parameters at low computational cost.
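For orientation, a water balance model of the kind mentioned above is often written as a storage equation for the root zone. The form below is a generic textbook sketch, not necessarily the formulation used in the cited works; the symbols are assumptions introduced here: S is soil water storage, P precipitation, ET evapotranspiration, R surface runoff, and D deep drainage.

```latex
\[
  \frac{dS}{dt} = P - ET - R - D
\]
```

Each flux on the right-hand side has to be measured or parameterized, which is one reason such models depend on high-quality input data.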

In contrast to these physics-based approaches, this work aims to develop a comprehensive survey of the most popular data-driven algorithms reported in the literature. These algorithms can be categorized as: a) classical machine learning models (e.g., Random Forests and Support Vector Machines), b) deep learning models (e.g., Long Short-Term Memory, Artificial Neural Networks, and Convolutional Neural Networks), c) statistical models (Multiple Linear Regression and Autoregressive Integrated Moving Average), and d) geostatistical models (Kriging). As the name indicates, these models take data as input, namely historical SM data, environmental data, or both, and are used either to classify the soil state (e.g., wet or dry soil) or to estimate soil moisture as a continuous value by regression. The construction of such models typically entails an initial exploration of the data, the evaluation of several candidate models, and the final selection and training of a model using an appropriate learning algorithm [Ding et al., 2018].
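The workflow described above (explore the data, evaluate several candidates, select one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic series, the three simple candidates (a mean baseline, a persistence model, and a linear regression), and the function names are hypothetical stand-ins, not the models or data of this work.

```python
import math
import random


def make_synthetic_series(n=300, seed=0):
    """Hypothetical data: daily precipitation and a soil-moisture series driven by it."""
    rng = random.Random(seed)
    precip, sm = [], []
    level = 0.25
    for _ in range(n):
        rain = rng.uniform(0.0, 10.0) if rng.random() < 0.3 else 0.0
        # Slow drying, wetting by rain, small measurement noise, clipped to a plausible range.
        level = 0.95 * level + 0.004 * rain + rng.gauss(0.0, 0.003)
        level = min(max(level, 0.05), 0.50)
        precip.append(rain)
        sm.append(level)
    return precip, sm


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    return lambda r: coef[0] + sum(c * v for c, v in zip(coef[1:], r))


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def select_model(precip, sm, train_fraction=0.7):
    """Score each candidate on a chronological hold-out set and pick the lowest RMSE."""
    # Features: previous-day soil moisture and same-day precipitation.
    rows = [(sm[t - 1], precip[t]) for t in range(1, len(sm))]
    targets = sm[1:]
    split = int(train_fraction * len(rows))
    train_x, test_x = rows[:split], rows[split:]
    train_y, test_y = targets[:split], targets[split:]

    mean_y = sum(train_y) / len(train_y)
    candidates = {
        "mean": lambda r: mean_y,          # always predicts the training mean
        "persistence": lambda r: r[0],     # predicts yesterday's value
        "linear": fit_linear(train_x, train_y),
    }
    scores = {name: rmse([f(r) for r in test_x], test_y) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


precip, sm = make_synthetic_series()
best_name, rmse_by_model = select_model(precip, sm)
print(best_name, rmse_by_model)
```

The split is chronological rather than random so that the hold-out score is computed only on data that come after the training period, which is the usual precaution for time series.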

Overall, the physical parameters most commonly used in data-driven models to capture SM variation include air temperature, precipitation, relative air humidity, solar radiation, soil type, topography, and vegetation cover. GPS location data are also important for generality and adaptability in the field. We therefore aim to create a novel generic data-driven model, depicted in Figure 1, that takes all of the above parameters into account in order to generalize SM estimation and extend its applicability to other fields without requiring real field measurements. To this end, the first candidate for the data-driven learning model will be the Long Short-Term Memory (LSTM) network.

Figure 1: A block diagram of the proposed Generic Data-Driven Model.
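An LSTM consumes sequences, so the time-varying inputs listed above are usually arranged into fixed-length look-back windows before training. The sketch below shows one way this could be done; the feature names, the `make_windows` helper, the window length, and the toy values are all hypothetical, and static attributes such as soil type, topography, vegetation cover, or location are left out for brevity.

```python
# Hypothetical names for the time-varying inputs; the real model may use others.
FEATURES = ["air_temperature", "precipitation", "relative_humidity", "solar_radiation"]


def make_windows(records, targets, lookback):
    """Build (X, y) pairs for sequence models.

    X[i] holds `lookback` consecutive feature vectors and y[i] is the soil
    moisture at the time step immediately after that window.
    """
    if len(records) != len(targets):
        raise ValueError("records and targets must have the same length")
    X, y = [], []
    for end in range(lookback, len(records)):
        window = [[rec[name] for name in FEATURES] for rec in records[end - lookback:end]]
        X.append(window)
        y.append(targets[end])
    return X, y


# Toy data: six daily records with made-up values.
records = [
    {
        "air_temperature": 10.0 + t,
        "precipitation": float(t % 2),
        "relative_humidity": 60.0 + t,
        "solar_radiation": 100.0 + 10 * t,
    }
    for t in range(6)
]
targets = [0.20 + 0.01 * t for t in range(6)]

X, y = make_windows(records, targets, lookback=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, features
```

The resulting nested list has the (samples, time steps, features) layout that common deep learning libraries expect for recurrent layers, so it can be converted to an array and passed on without further reshaping.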

References 

Hu, Xiande, et al. "Urban rainwater resource utilization: A sustainable environmental impact assessment using life cycle assessment (LCA) and water balance model." Desalination and Water Treatment 322 (2025): 101094.

Fu, Rong, et al. "A soil moisture prediction model, based on depth and water balance equation: A case study of the Xilingol League Grassland." International Journal of Environmental Research and Public Health 20, no. 2 (2023): 1374.

Ding, Jie, et al. "Model selection techniques: An overview." IEEE Signal Processing Magazine 35, no. 6 (2018): 16-34.

How to cite: Sumathy, V., Tsimpidi, I., and Nikolakopoulos, G.: A Generic Data-driven model for Soil Moisture Prediction, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-9587, https://doi.org/10.5194/egusphere-egu26-9587, 2026.