- ¹Department of Physical Geography, Faculty of Geosciences, Utrecht University, Utrecht, The Netherlands
- ²Department of Information and Computing Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- ³Univ. Grenoble Alpes, Université de Toulouse, Météo-France, CNRS, CNRM, Centre d'Études de la Neige, Grenoble, France
Snow water equivalent (SWE) is an important component of the hydrological cycle, but its quantification remains highly uncertain because of its large temporal and spatial variability. While machine learning (ML) has been applied in many areas of hydrology, its use for SWE prediction has been hindered by the scarcity of observational training data beyond the local scale. Hybrid models that combine simulated data from physics-based models with an ML setup may overcome this lack of observations and outperform both physics-based models and conventional ML approaches in data-scarce regions.
In this project, we tested two hybrid ML setups that predict the daily change in SWE using Crocus snow model simulations together with data from ten meteorological and snow observation stations across the northern hemisphere, each with 7–20 years of data. The first setup follows a common post-processor approach, in which the outputs and state variables from Crocus are fed to the ML model as additional predictors at each time step. The second setup follows the concept of data augmentation, in which Crocus is used to simulate SWE at stations for which no observations are available. These simulations are then fed to the ML model as additional data points, but are weighted in the loss function to control their influence during training.
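The loss weighting used in the data augmentation setup can be illustrated with a weighted mean squared error. This is a minimal sketch under stated assumptions: the abstract does not specify the loss function or weighting scheme, so the use of MSE, a single scalar weight shared by all Crocus-simulated samples, normalization by the sum of weights, and the names `weighted_mse` and `sim_weight` are all hypothetical.

```python
from typing import Sequence


def weighted_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    is_simulated: Sequence[bool],
    sim_weight: float = 0.5,
) -> float:
    """Weighted MSE over a mix of observed and simulated targets.

    Assumption (hypothetical scheme): observed samples get weight 1.0 and
    Crocus-simulated samples get `sim_weight`, so lowering `sim_weight`
    reduces the influence of simulated data on the training loss.
    """
    if not (len(y_true) == len(y_pred) == len(is_simulated)):
        raise ValueError("inputs must have equal length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    if sim_weight < 0:
        raise ValueError("sim_weight must be non-negative")

    # Per-sample weights: 1.0 for observations, sim_weight for simulations.
    weights = [sim_weight if sim else 1.0 for sim in is_simulated]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all sample weights are zero")

    # Weighted sum of squared errors, normalized by the total weight so the
    # loss stays on the same scale as an ordinary MSE.
    weighted_sq_err = sum(
        w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred)
    )
    return weighted_sq_err / total_weight


if __name__ == "__main__":
    # Illustrative values only: two observed and two simulated daily SWE changes.
    y_true = [2.0, 0.0, -1.0, 3.0]
    y_pred = [1.0, 0.0, -1.0, 1.0]
    is_sim = [False, False, True, True]
    for w in (0.0, 0.5, 1.0):
        print(w, weighted_mse(y_true, y_pred, is_sim, sim_weight=w))
```

With `sim_weight=0.0` the simulated samples are ignored entirely, and with `sim_weight=1.0` they count as much as observations; intermediate values interpolate between the two. Normalizing by the total weight rather than the sample count is one possible design choice that keeps the loss scale comparable across weight settings.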
The results show that the post-processor approach is best suited for predicting SWE in years excluded from training. However, when predicting SWE at stations excluded from training, the data augmentation setup achieves the largest performance gain, reducing the root mean squared error (RMSE) by 22% compared to Crocus and by 42% compared to the measurement-based ML model. A feature importance analysis reveals that the hybrid model predictions are most strongly influenced by the current SWE state, incoming radiation, snowfall and air temperature. These results showcase the potential of hybrid models for predicting variables that suffer from data scarcity, such as SWE.
How to cite: Pomarol Moya, O., Karssenberg, D., Immerzeel, W. W., Kraaijenbrink, P., Nussbaum, M., Mehrkanoon, S., and Gouttevin, I.: Bridging machine learning and physics-based models for improving snow water equivalent predictions in the northern hemisphere, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-11168, https://doi.org/10.5194/egusphere-egu25-11168, 2025.