- Max Planck Institute for Biogeochemistry, Biogeochemical Integration, Jena, Germany (drachti@bgc-jena.mpg.de)
Land ecosystems play a crucial role in the global carbon cycle, absorbing large amounts of atmospheric CO2 through photosynthesis and releasing it back through decomposition and respiration. However, predicting the net carbon flux is a complex task, as meteorological variability affects these processes in different ways and at various timescales. Data-driven models, such as global upscaling methods based on local flux tower measurements, particularly struggle to predict the year-to-year fluctuations in net terrestrial carbon uptake, known as inter-annual variability (IAV). These difficulties are often attributed to a lack of observational data. But do we need longer time series of terrestrial carbon flux observations, or better spatial coverage, to improve IAV predictions?
Here, we test how the performance of an interpretable machine learning (ML) framework changes as training datasets with different properties grow. The datasets are created from the output of a global land surface model (JSBACH3.2). All training datasets start from an initial setting comparable to the actual observational setting (FLUXNET sites). We scale the training datasets by increasing the number of pixels in training (space model), by extending the time series (time model), or both (space-time model), while keeping the increment in additional training samples for each training dataset constant. The ML framework is trained on the training datasets of different sizes and characteristics and evaluated on predicting IAV in an independent test set. To take the various effective timescales into account, our ML framework* is based on a wavelet transform of the predictor variables and a convolutional neural network to jointly predict carbon and water fluxes.
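The three scaling schemes can be illustrated with a minimal sketch. It assumes that one training sample is one pixel-year, that the space and time models each grow along a single axis, and that the space-time model grows both axes by the same factor; none of these details is stated in the abstract. The function name `growth_schedule` and all numbers in the usage line are hypothetical.

```python
import math


def growth_schedule(n_pixels0, n_years0, increment, n_steps):
    """Return (pixels, years) per step for three hypothetical growth schemes.

    Assumption: a training sample is one pixel-year, so a dataset with
    p pixels and y years holds p * y samples. Every step adds the same
    number of samples (`increment`) to each scheme.
    """
    base = n_pixels0 * n_years0
    schedule = []
    for step in range(n_steps + 1):
        target = base + step * increment
        # Common growth factor for the space-time scheme (assumption).
        factor = math.sqrt(target / base)
        schedule.append({
            "target_samples": target,
            # Space model: more pixels, fixed record length.
            "space": (math.ceil(target / n_years0), n_years0),
            # Time model: fixed pixels, longer records.
            "time": (n_pixels0, math.ceil(target / n_pixels0)),
            # Space-time model: both axes grow; the sample count is only
            # approximately equal to the target because of rounding.
            "space_time": (round(n_pixels0 * factor), round(n_years0 * factor)),
        })
    return schedule


if __name__ == "__main__":
    # Hypothetical initial setting: 200 pixels with 10 years each.
    for row in growth_schedule(200, 10, 2000, 3):
        print(row)
```

Under these assumptions, the space and time models at any step contain the same number of samples and differ only in how those samples are distributed across pixels and years, which is the comparison the study relies on.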
Our results confirm that increasing the sample size of the training dataset substantially enhances performance in predicting global IAV. Further, we find that, relative to the initial setup, increasing the spatial coverage during training (space model; ΔR2=0.83) improves IAV predictions more than increasing the length of the time series (time model; ΔR2=0.60). Overall, the model trained with the largest number of pixels (space model) outperforms the other models, which are trained on the same total number of training samples but fewer pixels. Using the interpretable ML technique based on the wavelet transform, we investigate how the three models differ in their sensitivity to different meteorological factors. We focus this part of the analysis on test pixels where the space and time models show the largest performance discrepancy.
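As a rough illustration of how a ΔR2 value for IAV could be computed, the sketch below assumes that IAV is the deviation of the area-weighted annual global flux from its multi-year mean, and that R2 is the coefficient of determination. The abstract specifies neither definition, so both are assumptions, and all function names are hypothetical.

```python
import numpy as np


def annual_anomalies(flux, area_weights):
    """Global IAV as anomalies of the area-weighted annual mean flux.

    flux: array of shape (years, pixels) with annual mean fluxes.
    area_weights: array of shape (pixels,) with pixel areas.
    Assumption: IAV = annual global value minus the multi-year mean
    (no detrending).
    """
    global_flux = flux @ area_weights / area_weights.sum()
    return global_flux - global_flux.mean()


def r2_score(target, prediction):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed metric)."""
    ss_res = np.sum((target - prediction) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def delta_r2(target, pred_initial, pred_scaled):
    """Change in R2 of a scaled model relative to the initial setup."""
    return r2_score(target, pred_scaled) - r2_score(target, pred_initial)
```

With this definition, a positive ΔR2 means the model trained on the larger dataset explains more of the year-to-year variance in the test set than the model trained on the initial setup.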
In conclusion, our study demonstrates that a large spatial representation in the observational training data is more important than longer observational time series for predicting year-to-year fluctuations in global land carbon uptake.
*Reimers, C., Hafezi Rachti, D., Liu, G., & Winkler, A. J. (2024). Comparing Data-Driven and Mechanistic Models for Predicting Phenology in Deciduous Broadleaf Forests. arXiv preprint arXiv:2401.03960.
How to cite: Hafezi Rachti, D., Reimers, C., and Winkler, A. J.: Space versus Time: Better Spatial Representation in Training Beats Longer Time Series for Predicting Global Land Carbon Uptake Variability, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-12808, https://doi.org/10.5194/egusphere-egu25-12808, 2025.