EGU24-7223, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-7223
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Comparison of Root Zone Soil Moisture Data Fusion Using Machine Learning, Triple Collocation, and Three-Cornered Hat Methods

Jing Tian and Yongqiang Zhang
  • Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographical Sciences and Natural Resources Research (IGSNRR), Chinese Academy of Sciences, Beijing, China (tianj.04b@igsnrr.ac.cn)

Root zone soil moisture (RZSM) is a key measure of the water stored in the soil. RZSM is commonly estimated with models, but modelled RZSM often deviates from true values because of errors in model input data and parameters. Machine learning methods and data fusion techniques can improve the accuracy of these estimates. In this study, we compare three methods for RZSM data fusion: random forest (RF), extended triple collocation (ETC), and Bayes Three Cornered Hat (BTCH).

Soil moisture observations from 2018 to 2022 were collected at 2121 sites across China from the China Meteorological Administration (Fig. 1). Daily values were calculated as the arithmetic mean of the hourly data. Six RZSM datasets were used: SMAP Level 4, GLDAS-NOAH2.1 (NOAH), GLDAS-Catchment2.2 (CLSM), ERA5, MERRA2, and CFSR. All datasets were resampled to 0.25° for a common spatial resolution and arithmetically averaged to daily values. In addition, soil, climate, and vegetation parameters were used to build the machine learning model, a random forest.
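The daily averaging step can be illustrated with a minimal sketch. The function name `daily_means`, the record layout (ISO timestamp, value), and the choice to skip missing readings are assumptions made for illustration; the abstract does not describe how gaps in the hourly data were handled.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(hourly_records):
    """Arithmetic mean of hourly soil moisture per calendar day.

    hourly_records: iterable of (ISO timestamp string, value or None).
    Readings equal to None are skipped (an assumption, not from the abstract).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stamp, value in hourly_records:
        if value is None:
            continue
        day = datetime.fromisoformat(stamp).date()
        sums[day] += value
        counts[day] += 1
    # Days with no valid reading never enter `sums`, so no division by zero.
    return {day: sums[day] / counts[day] for day in sorted(sums)}


# Hypothetical hourly readings in m3/m3 for one site.
records = [
    ("2018-01-01T00:00:00", 0.20),
    ("2018-01-01T01:00:00", 0.30),
    ("2018-01-02T00:00:00", None),
    ("2018-01-02T05:00:00", 0.25),
]
daily = daily_means(records)
```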

Fig. 1 Distribution of soil moisture sites and daily soil moisture (m³/m³) at depths of 0–50 cm across China during the period from 2018 to 2019

To investigate the impact of different inputs on the performance of the RF method, three groups of inputs were used. The inputs for the three RF models and for the BTCH and ETC methods are listed in Table 1. The RF results were evaluated using five-fold cross-validation.
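A five-fold cross-validation of a random forest regressor might look like the sketch below, written with scikit-learn. The synthetic features, the number of trees, and the random (shuffled) split of samples into folds are all assumptions; the abstract does not state the hyperparameters or whether folds were split by sample or by site.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict


def cross_validated_rf(features, target, n_splits=5, seed=0):
    """Return out-of-fold predictions from a k-fold cross-validated random forest."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each sample is predicted by a model that never saw it during training.
    return cross_val_predict(model, features, target, cv=folds)


# Synthetic stand-in data: three noisy "products" plus one unrelated predictor.
rng = np.random.default_rng(0)
n_samples = 300
truth = rng.uniform(0.05, 0.45, n_samples)            # in situ RZSM, m3/m3
products = [truth + rng.normal(0.0, 0.05, n_samples) for _ in range(3)]
extra = rng.uniform(0.0, 6.0, n_samples)              # e.g. a LAI-like variable
features = np.column_stack(products + [extra])

predictions = cross_validated_rf(features, truth)
```

Because every prediction is out-of-fold, `predictions` can be compared directly with `truth` to score the model.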

Table 1 Inputs used by the three random forest models, BTCH, and ETC

Model      Inputs
RFmodel1   NOAH, SMAP, ERA5, MERRA2, CFSR, CLSM, LAI, Soil properties, Meteorological data
RFmodel2   NOAH, LAI, Soil properties, Meteorological data
RFmodel3   NOAH, SMAP, ERA5, MERRA2, CFSR, CLSM
BTCH       NOAH, SMAP, ERA5, MERRA2, CFSR, CLSM
ETC        NOAH, MERRA2, CLSM
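ETC takes exactly three inputs because it estimates each product's error variance and its correlation with the unknown truth from the 3 × 3 covariance matrix of a triplet, assuming the errors are mutually independent and independent of the truth. The sketch below uses the standard covariance-based formulas on synthetic data. The function names are hypothetical, and the inverse-variance weighting is one common way to merge products that share a common scale, not necessarily the scheme used in this study.

```python
import numpy as np


def extended_triple_collocation(x, y, z):
    """Estimate error variances and squared correlations with the truth.

    Assumes the three products have mutually independent, zero-mean errors
    that are also independent of the true signal.
    """
    q = np.cov(np.vstack([x, y, z]))
    # Signal variance seen by each product, e.g. q01*q02/q12 for product x.
    signal = np.array([
        q[0, 1] * q[0, 2] / q[1, 2],
        q[0, 1] * q[1, 2] / q[0, 2],
        q[0, 2] * q[1, 2] / q[0, 1],
    ])
    total = np.diag(q)
    err_var = total - signal      # error variance of each product
    rho_sq = signal / total       # squared correlation with the truth
    return err_var, rho_sq


def inverse_variance_weights(err_var):
    """Merging weights proportional to 1/error variance (illustrative choice)."""
    err_var = np.asarray(err_var, dtype=float)
    if np.any(err_var <= 0):
        raise ValueError("error variances must be positive")
    inv = 1.0 / err_var
    return inv / inv.sum()


# Synthetic triplet standing in for NOAH, MERRA2, and CLSM.
rng = np.random.default_rng(1)
truth = rng.normal(0.25, 0.08, 20000)
noah = truth + rng.normal(0.0, 0.02, truth.size)
merra2 = truth + rng.normal(0.0, 0.03, truth.size)
clsm = truth + rng.normal(0.0, 0.04, truth.size)

err_var, rho_sq = extended_triple_collocation(noah, merra2, clsm)
weights = inverse_variance_weights(err_var)
merged = weights[0] * noah + weights[1] * merra2 + weights[2] * clsm
```

Note that `truth` is used only to generate the synthetic products; the estimation itself never touches it, which is why the approach remains usable when no in situ data exist.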

 

The boxplots (Fig. 2) show that RFmodel1 performs best, which emphasizes the need for comprehensive input information in machine learning models. RFmodel2 outperforms RFmodel3, highlighting the significance of LAI, soil properties, and meteorological data in RZSM estimation. ETC and BTCH both outperform the individual RZSM datasets, which makes them especially useful when true data are unavailable. The superior performance of ETC over BTCH is attributed to ETC's inputs, namely NOAH, MERRA2, and CLSM, which are more accurate than SMAP, ERA5, and CFSR, the additional datasets included in BTCH.

Fig. 2 Boxplots of the Pearson correlation coefficient (R), Root Mean Square Error (RMSE), and bias between in situ root zone soil moisture (RZSM) and its estimates from the three random forest models, Bayes Three Cornered Hat (BTCH), and Extended Triple Collocation (ETC) methods
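The three statistics in Fig. 2 can be computed per site as in the following sketch. The function name `evaluate` is hypothetical, and two conventions are assumed here rather than taken from the abstract: bias is defined as estimate minus observation, and pairs with missing values are dropped.

```python
import numpy as np


def evaluate(estimate, observed):
    """Pearson R, RMSE, and bias between an RZSM estimate and in situ data."""
    estimate = np.asarray(estimate, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Keep only pairs where both values are present (assumed gap handling).
    valid = np.isfinite(estimate) & np.isfinite(observed)
    est, obs = estimate[valid], observed[valid]
    diff = est - obs
    return {
        "R": float(np.corrcoef(est, obs)[0, 1]),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "bias": float(np.mean(diff)),  # positive means overestimation
    }


# Hypothetical daily series for one site, in m3/m3.
scores = evaluate([0.22, 0.31, np.nan, 0.40], [0.20, 0.28, 0.30, 0.41])
```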

In summary, the random forest method outperforms BTCH and ETC in the fusion of root zone soil moisture (RZSM) data, highlighting the importance of including leaf area index (LAI), soil properties, and meteorological data in the construction of the random forest model. Both BTCH and ETC demonstrate utility in enhancing RZSM estimates, making them valuable options when true data is unavailable.

How to cite: Tian, J. and Zhang, Y.: Comparison of Root Zone Soil Moisture Data Fusion Using Machine Learning, Triple Collocation, and Three-Cornered Hat Methods, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-7223, https://doi.org/10.5194/egusphere-egu24-7223, 2024.