Random forest algorithm for long-gap imputation in Eddy Covariance data: a case study in an upland semi-natural grassland in the Auvergne region
- Université Clermont Auvergne, INRAE, VetAgro Sup, UMR Ecosystème Prairial, 63000 Clermont-Ferrand, France
Eddy covariance techniques are widely used to measure the net exchange of greenhouse gases between the surface and the atmosphere. They provide high-resolution, instantaneous flux measurements as well as long-term observations, which in turn allow more accurate assessments of ecosystem state. However, gaps in eddy covariance time series reduce statistical efficiency and increase the bias of estimates, hampering predictions of ecosystem function. Several imputation techniques have been proposed to overcome these difficulties, including Marginal Distribution Sampling (MDS), the standard method of FLUXNET, but MDS has limitations for filling long gaps (weeks to months). In this study, we combine MDS and machine learning imputation techniques to fill an 18-year time series of carbon (C) fluxes. Our objectives were to evaluate whether Random Forest algorithms are able to fill long gaps and detect seasonality, and to identify the best predictors of net ecosystem exchange, gross primary productivity, and ecosystem respiration. The eddy covariance raw data were obtained from an experiment in an upland semi-natural grassland in the Auvergne region of France, managed by continuous cattle grazing at a low animal stocking rate. After processing the raw data with the EddyPro software, we applied MDS to the half-hourly data to fill short gaps, and then applied a Random Forest (RF) algorithm to daily data to fill longer gaps. The time series was split into training and testing datasets, and all variables describing atmospheric conditions, solar radiation, and energy fluxes were used to predict C fluxes. Random Forest models with high R² and small increases in prediction error were used to impute the long gaps. Cross-validation between observed and predicted values in the test dataset yielded R² greater than 0.85 for all carbon flux variables.
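The second stage of this two-step workflow (daily Random Forest gap-filling after MDS) can be sketched as follows. This is a minimal sketch, not the study's implementation: it assumes scikit-learn's `RandomForestRegressor` as the RF implementation, that MDS has already filled short gaps in the half-hourly data, and that the daily predictors are complete on gap days. The helper names `daily_means` and `rf_gap_fill`, the 70/30 random split, the 60-day gap, and the synthetic drivers are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def daily_means(halfhourly):
    """Average a half-hourly series (48 values per day) to daily means.

    Assumes short gaps were already filled (e.g. by MDS); any day that still
    contains a NaN stays NaN and is treated as part of a long gap.
    """
    arr = np.asarray(halfhourly, dtype=float)
    n_days = arr.size // 48
    return arr[: n_days * 48].reshape(n_days, 48).mean(axis=1)


def rf_gap_fill(X, y, test_size=0.3, seed=0):
    """Fill NaN days in y with Random Forest predictions from drivers X.

    Hypothetical helper: X (n_days x n_predictors) must be complete on every
    day, including gap days. Returns the filled series, the R2 on a held-out
    test set of observed days, and the fitted model.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[observed], y[observed], test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    r2 = r2_score(y_te, model.predict(X_te))
    filled = y.copy()
    filled[~observed] = model.predict(X[~observed])  # only gap days are replaced
    return filled, r2, model


# Synthetic daily drivers for three years (illustration only, not study data).
rng = np.random.default_rng(42)
n = 3 * 365
doy = np.arange(n) % 365
season = np.sin(2 * np.pi * (doy - 80) / 365)
tair = 8 + 10 * season + rng.normal(0, 2, n)       # air temperature
rad = 150 + 120 * season + rng.normal(0, 30, n)    # incoming radiation
ustar = rng.uniform(0.1, 0.6, n)                   # friction velocity
X = np.column_stack([tair, rad, ustar])
gpp = np.clip(0.04 * rad + 0.3 * tair, 0, None)
reco = 1.5 * np.exp(0.07 * tair)
nee = reco - gpp + rng.normal(0, 0.5, n)           # NEE = Reco - GPP

nee_gappy = nee.copy()
nee_gappy[400:460] = np.nan                        # hypothetical 60-day gap
filled, r2, model = rf_gap_fill(X, nee_gappy)
print(f"test R2 = {r2:.2f}, remaining NaNs = {np.isnan(filled).sum()}")
```

A random split of observed days is used here for brevity; because daily fluxes are autocorrelated, a blocked split (for example, holding out whole months or years) would likely give a more conservative estimate of how well the model fills genuinely long gaps.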
Our analysis also revealed that daily carbon flux values could be estimated using basic meteorological variables (air temperature, precipitation, atmospheric pressure, friction velocity, and wind speed) as well as energy fluxes. Finally, the imputed dataset showed a similar seasonal pattern across years, with the highest C sequestration and respiration in summer and spring. These results highlight the value of machine learning techniques for producing robust, long-term eddy covariance flux time series.
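A predictor ranking and a seasonality check like those described above might be computed as in the sketch below. It assumes permutation importance from scikit-learn, which measures how much the test-set mean squared error increases when one predictor is randomly shuffled; this is conceptually close to the %IncMSE measure of R's randomForest package, but it is an assumption, not necessarily the measure used in the study. The predictor names, the synthetic GPP series, and the helpers `rank_predictors` and `seasonal_means` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_predictors(model, X_test, y_test, names, seed=0):
    """Rank predictors by the mean increase in test MSE when each one is shuffled."""
    result = permutation_importance(
        model, X_test, y_test,
        scoring="neg_mean_squared_error", n_repeats=10, random_state=seed,
    )
    # The score is negative MSE, so a drop in score equals an increase in MSE.
    return sorted(zip(names, result.importances_mean), key=lambda t: t[1], reverse=True)


def seasonal_means(values, dates):
    """Mean of a daily series per meteorological season (DJF, MAM, JJA, SON)."""
    months = dates.astype("datetime64[M]").astype(int) % 12 + 1
    seasons = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}
    return {s: float(np.mean(values[np.isin(months, m)])) for s, m in seasons.items()}


# Synthetic daily data (illustration only): "noise" is deliberately uninformative.
rng = np.random.default_rng(0)
dates = np.datetime64("2001-01-01") + np.arange(3 * 365)  # 2001-2003, no leap days
doy = np.arange(dates.size) % 365
season = np.sin(2 * np.pi * (doy - 80) / 365)
names = ["tair", "rad", "ustar", "noise"]
X = np.column_stack([
    8 + 10 * season + rng.normal(0, 2, dates.size),      # air temperature
    150 + 120 * season + rng.normal(0, 30, dates.size),  # incoming radiation
    rng.uniform(0.1, 0.6, dates.size),                   # friction velocity
    rng.normal(0, 1, dates.size),                        # pure noise
])
gpp = np.clip(0.04 * X[:, 1] + 0.3 * X[:, 0], 0, None) + rng.normal(0, 0.5, dates.size)

X_tr, X_te, y_tr, y_te = train_test_split(X, gpp, test_size=0.3, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
ranking = rank_predictors(model, X_te, y_te, names)
by_season = seasonal_means(gpp, dates)
print(ranking)
print(by_season)
```

Permutation importance can be shared out among correlated predictors (here, air temperature and radiation follow the same seasonal cycle), so a ranking is best read as "which drivers the fitted model relies on" rather than as a causal ordering.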
How to cite: WINCK, B., BLOOR, J., and KLUMPP, K.: Random forest algorithm for long-gap imputation in Eddy Covariance data: a case study in an upland semi-natural grassland in the Auvergne region, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-8031, https://doi.org/10.5194/egusphere-egu23-8031, 2023.