A Random-Forest approach to predicting preferential-flow snowpack runoff: early results and outlook for the future
- 1University of Utah, Civil and Environmental Engineering, Water Resources, United States of America (ryan.c.johnson@utah.edu)
- 2University of California, Berkeley, Department of Civil and Environmental Engineering, United States of America
- 3National Research Institute for Earth Science and Disaster Resilience, Snow and Ice Research Center, Japan
- 4Division of Hydrology and Hydraulics, CIMA Research Foundation, Italy
Predicting the occurrence of preferential-flow snowpack runoff, as opposed to spatially homogeneous matrix flow, has recently become an important topic of cryosphere research because of its implications for understanding and forecasting wet-snow avalanche formation, streamflow generation during rain-on-snow events, and the polar ice-sheet water balance. Using twelve seasons of daily data from nine multi-compartment snow lysimeters and concurrent weather and snowpack observations, we explored the accuracy of a machine-learning algorithm, Random Forest, in predicting the occurrence of preferential-flow snowpack runoff in a maritime context where sub-freezing conditions are rare (Nagaoka, Niigata prefecture, Japan). The algorithm was trained to predict three metrics of preferential-flow snowpack runoff: the coefficient of variation, the standard deviation, and the maximum deviation from the spatial mean of snowpack runoff. Two validation scenarios were used: an all-year scenario, in which data were randomly subsampled from the entire period of record (66% training data, 33% testing), and a leave-one-year-out scenario, in which the model was trained on 11 years and tested on an unseen year. The latter was intended to represent a more realistic scenario in which limited data are available. Five tiers of features were used as inputs (independent variables) to the algorithm: concurrent weather and bulk-snow properties, snow-atmosphere energy-balance components, internal snow structure, simulated matrix-flow snowpack runoff, and a selection of the five most important features from all previous groups. Relatively high model performance (Nash-Sutcliffe Efficiency, NSE, > 0.53) was observed in every all-year scenario, whereas the leave-one-year-out scenario displayed nearly a 50% reduction in performance, indicating that the relationship among weather, snow conditions, and preferential-flow snowpack runoff generation is inconsistent between seasons.
Random Forest also underestimated seasonal peaks in preferential flow, which suggests either under-sampling of peak events in the dataset or unrepresented processes that exceed the spatial scale of multi-compartment lysimeters. This research presents an initial framework for understanding the key factors that influence preferential-flow occurrence; improvements in algorithm accuracy could support predictions of preferential-flow snowpack runoff, especially in sparsely monitored regions.
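To make the target metrics and the validation design concrete, the following minimal Python sketch shows one way the three preferential-flow metrics, the NSE score, and the leave-one-year-out split could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the standard deviation is taken as the population form, and "maximum deviation" is read as the largest absolute departure of any lysimeter compartment from the spatial mean. The Random Forest model itself is omitted.

```python
from statistics import mean, pstdev


def preferential_flow_metrics(compartment_runoff):
    """Return (cv, std_dev, max_dev) for one day of per-compartment runoff.

    Assumptions (not stated in the abstract): population standard deviation,
    and maximum deviation as the largest absolute departure from the mean.
    """
    mu = mean(compartment_runoff)
    sd = pstdev(compartment_runoff)
    max_dev = max(abs(q - mu) for q in compartment_runoff)
    # The coefficient of variation is undefined when mean runoff is zero.
    cv = sd / mu if mu > 0 else float("nan")
    return cv, sd, max_dev


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mu = mean(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mu) ** 2 for o in observed)
    return 1.0 - residual / variance


def leave_one_year_out(seasons):
    """Yield (held_out_season, train_indices, test_indices) for each season."""
    for held_out in sorted(set(seasons)):
        train = [i for i, s in enumerate(seasons) if s != held_out]
        test = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, train, test
```

In such a setup, a model would be fitted on the `train` indices of each fold and scored with `nse` on the `test` indices, so that each season is predicted without having been seen during training.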
How to cite: Johnson, R., Oroza, C., Avanzi, F., Satoru, Y., Hirashima, H., and Maurer, T.: A Random-Forest approach to predicting preferential-flow snowpack runoff: early results and outlook for the future, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-1212, https://doi.org/10.5194/egusphere-egu2020-1212, 2019