- Department of Civil Engineering, Schulich School of Engineering, University of Calgary, Calgary, Canada (ignacio.aguirre@ucalgary.ca)
Accurately simulating latent and sensible heat fluxes is a long-standing open challenge in the land modeling community. The recent model intercomparison project PLUMBER 2, covering 154 flux towers, showed that simple one-variable linear regression models can outperform process-based models in simulating latent and sensible heat. However, the PLUMBER 2 simulations were run using default model parameters, leaving the potential performance gains from parameter estimation unquantified.
Identifying optimal parameters in land models poses several challenges, including high computational cost and the need to identify parameters that correctly reproduce both temporal dynamics (i.e., good performance across different time periods) and spatial patterns (i.e., good performance across many sites). To evaluate how well different calibration methods handle these challenges, this study compared traditional and machine-learning emulator-based calibration methods against Long Short-Term Memory (LSTM) benchmarks, using single-objective experiments (latent heat or sensible heat calibrated individually) and multi-objective experiments (latent and sensible heat calibrated simultaneously). We also tested two ways to train emulators and LSTMs: considering one site at a time, or leveraging information from multiple sites and their attributes simultaneously.
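To make the idea of emulator-based calibration concrete, the following is a minimal sketch, not the method used in this study. It assumes a hypothetical one-parameter toy model (`toy_land_model`, which splits net radiation into latent and sensible heat using an evaporative fraction), a mean-squared-error objective, and a quadratic emulator fitted to three model runs. All function names, the parameter, and the data values are illustrative assumptions.

```python
def toy_land_model(evap_fraction, net_radiation):
    """Hypothetical stand-in for an expensive process-based land model."""
    latent = [evap_fraction * rn for rn in net_radiation]
    sensible = [(1.0 - evap_fraction) * rn for rn in net_radiation]
    return latent, sensible


def mse(sim, obs):
    """Mean squared error between simulated and observed series."""
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)


def objective(evap_fraction, net_radiation, obs_latent, obs_sensible,
              multi_objective):
    """Single-objective: latent heat only. Multi-objective: sum of both errors."""
    latent, sensible = toy_land_model(evap_fraction, net_radiation)
    error = mse(latent, obs_latent)
    if multi_objective:
        error += mse(sensible, obs_sensible)
    return error


def fit_quadratic(xs, ys):
    """Coefficients (a, b, c) of the parabola a*x^2 + b*x + c through 3 points."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c


def calibrate_with_emulator(net_radiation, obs_latent, obs_sensible,
                            multi_objective=True):
    # Step 1: run the (expensive) model at only a few design points.
    design = [0.2, 0.5, 0.8]
    errors = [
        objective(p, net_radiation, obs_latent, obs_sensible, multi_objective)
        for p in design
    ]
    # Step 2: fit a cheap emulator of the error surface.
    a, b, c = fit_quadratic(design, errors)
    # Step 3: search the emulator densely instead of the model itself.
    grid = [i / 1000 for i in range(1001)]
    return min(grid, key=lambda p: a * p * p + b * p + c)


# Synthetic, noise-free "observations" generated with a known parameter (0.6).
net_radiation = [100.0, 250.0, 400.0, 300.0, 150.0]
obs_latent, obs_sensible = toy_land_model(0.6, net_radiation)

best_multi = calibrate_with_emulator(net_radiation, obs_latent, obs_sensible, True)
best_single = calibrate_with_emulator(net_radiation, obs_latent, obs_sensible, False)
```

In this noise-free toy case the error surface is exactly quadratic, so three model runs suffice and both experiments recover the generating parameter. A real land model has many parameters and a far less regular response surface, which is why a more flexible machine-learning emulator would be needed in practice.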
Our results show that the calibrated simulations outperformed both the default-parameter simulations and the simple benchmarks used in PLUMBER 2, demonstrating the potential to improve process-based models. Moreover, traditional calibration methods tend to overfit: they achieve high performance during calibration but do not achieve similar results during validation. The emulator-based methods achieve more consistent results across the calibration and validation periods. Additionally, parameter estimation methods that incorporate information from multiple sites simultaneously achieve better spatial consistency than methods that learn from only one site at a time. These results suggest that the performance gap between LSTMs and process-based models can be narrowed substantially through calibration.
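The overfitting pattern described above can be checked by scoring the same simulation separately on the calibration and validation periods and comparing the two. The sketch below is a hedged illustration only: the root-mean-square error metric, the function name `period_skill_gap`, and the example values are assumptions, not the metric or data used in this study.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def period_skill_gap(sim, obs, split_index):
    """Score calibration and validation periods separately.

    Time steps before split_index form the calibration period; the rest form
    the validation period. A large positive gap (validation error well above
    calibration error) is a sign of overfitting.
    """
    cal_error = rmse(sim[:split_index], obs[:split_index])
    val_error = rmse(sim[split_index:], obs[split_index:])
    return cal_error, val_error, val_error - cal_error


# Illustrative series: a perfect fit in calibration that degrades afterwards.
simulated = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed = [1.0, 2.0, 3.0, 5.0, 7.0, 9.0]

cal_error, val_error, gap = period_skill_gap(simulated, observed, split_index=3)
```

Applying the same comparison per site, rather than per period, would give an analogous check of spatial consistency.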
How to cite: Aguirre, I., Knoben, W., Vasquez, N., and Clark, M.: Benchmarking machine learning-based emulators and traditional methods to calibrate land model parameters for 124 global flux tower sites, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14172, https://doi.org/10.5194/egusphere-egu26-14172, 2026.