Large Sample, High Dimension Hydrology Dataset Validation: Getting Bit By Bytes
- University of British Columbia, Civil Engineering, Vancouver, Canada (dkovacek@mail.ubc.ca)
Many large hydrometeorological datasets have been developed and published in recent years to support a wide range of applications, from physical to machine learning models and from operational forecasting to prediction in ungauged basins. The HYSETS database (Arsenault et al. 2019) is one such large-sample dataset, featuring numerous physiographic, geologic, and climate attributes associated with over fourteen thousand monitored watersheds across North America, including Mexico. Because the basin attributes in this dataset are extracted from a wide array of geospatial data sources, and because the study region is continental in scale, building it requires assembling geospatial data sources with non-uniform properties and analyzing observations collected by different governing organizations.
In this study, the static basin attribute set derived for the HYSETS database was replicated. Preliminary results suggest that two factors can yield distinct estimates of the statistical properties of basin attributes, with implications for their use as model input data: the incorporation of updated geospatial data sources, such as a higher-resolution digital elevation model (DEM), and differences in how basin attribute derivations are interpreted and implemented in different software packages. At the very least, the preliminary results demonstrate that the greater the size and complexity of a dataset, the greater the likelihood of introducing bias and computational error.
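As an illustration of the kind of comparison described above, the following is a minimal sketch, not the method used in this study, of how paired estimates of a single basin attribute from an original and a replicated dataset might be summarized. The function name `compare_attribute`, the choice of summary statistics, and the slope values are all assumptions made for illustration; the numbers are synthetic placeholders and not results from HYSETS or from this work.

```python
from statistics import mean, median


def compare_attribute(original, replicated):
    """Summarize paired differences between two estimates of one basin attribute.

    Hypothetical helper: both inputs hold one value per basin, in the same order.
    """
    if len(original) != len(replicated):
        raise ValueError("inputs must be paired, one value per basin")

    # Absolute difference per basin (replicated minus original).
    diffs = [r - o for o, r in zip(original, replicated)]
    # Relative difference per basin, skipping basins where the original is zero.
    rel = [(r - o) / o for o, r in zip(original, replicated) if o != 0]
    if not rel:
        raise ValueError("no non-zero original values to compare against")

    return {
        "n": len(diffs),
        "mean_bias": mean(diffs),
        "median_relative_diff": median(rel),
        "max_abs_diff": max(abs(d) for d in diffs),
    }


# Synthetic placeholder values (e.g. a mean basin slope), for illustration only.
original_slope = [10.0, 20.0, 40.0, 50.0]
replicated_slope = [11.0, 22.0, 38.0, 55.0]

summary = compare_attribute(original_slope, replicated_slope)
print(summary)
```

A mean bias far from zero would point to a systematic shift between the two derivations, while a large maximum difference with a small mean bias would point to basin-specific discrepancies that merit individual inspection.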
How to cite: Kovacek, D., Eugeni, S., and Weijs, S.: Large Sample, High Dimension Hydrology Dataset Validation: Getting Bit By Bytes, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-10908, https://doi.org/10.5194/egusphere-egu22-10908, 2022.