Applying Non-Random Block Cross-Validation to Improve Reliability of Model Selection and Evaluation in Hydrology: An illustration using an algorithmic model of seasonal snowpack

Charles Luce; Abigail Lute

doi:https://doi.org/10.5194/egusphere-egu2020-12176

Session HS8.1.6

EGU2020-12176

EGU General Assembly 2020

© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Charles Luce¹ and Abigail Lute²

¹US Forest Service Research, Rocky Mountain Research Station, Boise, United States of America (charlie.luce@usda.gov)
²University of Idaho Water Resources Graduate Program

A central question in model structural uncertainty is how complex a model should be to achieve the greatest generality or transferability. One school of thought holds that models become more general as process subroutines are added. On the other hand, model parameters and structures have been shown to change significantly when calibrated to different basins or time periods, suggesting that model complexity and model transferability may be antithetical. An important facet of this discussion is that the validation methods and data used for model evaluation and selection may bias the answer to this question. Here we apply non-random block cross-validation as a direct assessment of model transferability to a series of algorithmic space-time models of April 1 snow water equivalent (SWE) at 497 SNOTEL stations over 20 years. In general, we show that low to moderate complexity models transfer most successfully to new conditions in space and time. In other words, there is an optimum between overly complex and overly simple models. Because temporal dynamics and spatial dependence in atmospheric and hydrological processes impose structure on the data, naïvely applied cross-validation practices can lead to overfitting, overconfidence in model precision or reliability, and poor ability to infer causal mechanisms. For example, random k-fold cross-validation methods, which are commonly used to evaluate models, essentially assume that the data are independent and would promote selection of more complex models. We further demonstrate that blocks sampled with pseudoreplicated data can produce similar outcomes. Some sampling strategies favored for hydrologic model validation may tend to promote pseudoreplication, so model selection and evaluation require heightened attentiveness. While the illustrative examples are drawn from snow modeling, the concepts can be readily applied to common hydrologic modeling issues.
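The contrast between random k-fold and block cross-validation described above can be illustrated with a small sketch. This is not the authors' model or data. It assumes a synthetic data set in which each hypothetical station has one covariate `x` and a persistent station effect, and it compares a simple linear fit with a flexible nearest-neighbour predictor. All names (`make_data`, `station_block_folds`, `cv_rmse`) and all numeric settings are illustrative assumptions. Blocking here means holding out whole stations together, which is only one of several possible block designs.

```python
import random
from math import sqrt

def make_data(n_stations=200, n_years=5, seed=0):
    """Synthetic (station, x, y) rows; all values are assumptions for illustration."""
    rng = random.Random(seed)
    rows = []
    for s in range(n_stations):
        x = rng.random()               # station covariate, e.g. scaled elevation
        effect = rng.gauss(0.0, 1.0)   # persistent station effect not explained by x
        for _ in range(n_years):
            rows.append((s, x, 2.0 * x + effect + rng.gauss(0.0, 0.3)))
    return rows

def fit_linear(train):
    """Simple model: ordinary least squares of y on x."""
    n = len(train)
    mx = sum(x for _, x, _ in train) / n
    my = sum(y for _, _, y in train) / n
    sxx = sum((x - mx) ** 2 for _, x, _ in train)
    sxy = sum((x - mx) * (y - my) for _, x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(train):
    """Flexible model: return y of the training row with the closest x."""
    return lambda x: min(train, key=lambda r: abs(r[1] - x))[2]

def random_folds(rows, k, seed=1):
    """Random k-fold: rows are shuffled and dealt to folds one at a time."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    fold_of = [0] * len(rows)
    for pos, i in enumerate(order):
        fold_of[i] = pos % k
    return fold_of

def station_block_folds(rows, k):
    """Block k-fold: every row from a station lands in the same fold."""
    return [s % k for s, _, _ in rows]

def cv_rmse(rows, fold_of, fit, k):
    """Cross-validated RMSE pooled over all held-out rows."""
    total = 0.0
    for f in range(k):
        train = [r for r, g in zip(rows, fold_of) if g != f]
        test = [r for r, g in zip(rows, fold_of) if g == f]
        model = fit(train)
        total += sum((model(x) - y) ** 2 for _, x, y in test)
    return sqrt(total / len(rows))

K = 5
rows = make_data()
schemes = {"random": random_folds(rows, K),
           "station_block": station_block_folds(rows, K)}
models = {"linear": fit_linear, "nearest": fit_nearest}
results = {(s, m): cv_rmse(rows, folds, fit, K)
           for s, folds in schemes.items() for m, fit in models.items()}

for (scheme, model), rmse in sorted(results.items()):
    print(f"{scheme:14s} {model:8s} RMSE = {rmse:.2f}")
```

Under these assumptions, random folds let the nearest-neighbour model borrow other years from the same station, so it appears more accurate than the linear fit. When whole stations are held out, that shortcut disappears and the simpler model scores better, which mirrors the pseudoreplication concern raised in the abstract.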

How to cite: Luce, C. and Lute, A.: Applying Non-Random Block Cross-Validation to Improve Reliability of Model Selection and Evaluation in Hydrology: An illustration using an algorithmic model of seasonal snowpack, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12176, https://doi.org/10.5194/egusphere-egu2020-12176, 2020
