Effect of merging large datasets on prediction accuracy of low flow estimation by random forest

Johannes Laimighofer; Gregor Laaha

https://doi.org/10.5194/egusphere-egu22-7312


EGU22-7312, updated on 28 Mar 2022


EGU General Assembly 2022

© Author(s) 2022. This work is distributed under the Creative Commons Attribution 4.0 License.



University of Natural Resources and Life Sciences, Institute of Statistics, Department of Landscape, Spatial and Infrastructure Sciences, Austria (johannes.laimighofer@boku.ac.at)

Low flow estimation is a crucial part of water management. Low flows in ungauged basins are often predicted with statistical models. These range from regionalization approaches, in which separate models are built for homogeneous regions, to single-model frameworks that span simple linear models as well as more complex methods such as random forests, support vector regression or deep learning. Although there are large-sample studies for the US (e.g. Tyralis et al. 2021) or Australia (e.g. Worland et al. 2018), we are not aware of a study that combines different large datasets and analyzes the effect on prediction accuracy. We hypothesize that the heterogeneity of several merged datasets can improve the prediction accuracy of tree-based models relative to linear models. Hence, we propose to combine several similar datasets and analyze the effect on the accuracy of estimating Q95, the flow exceeded 95 % of the time, with a simple random forest model.
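Q95 has to be derived from the flow duration curve of each gauged catchment before it can serve as the regression target. The following is a minimal sketch, assuming a plain sequence of daily discharge values with missing days encoded as NaN or None. The function name `q95` and the linear-interpolation percentile convention (the default of `numpy.percentile` and of `statistics.quantiles(..., method="inclusive")`) are assumptions, since the abstract does not state how Q95 was computed.

```python
import math


def q95(daily_flows):
    """Flow exceeded 95 % of the time, i.e. the 5th percentile of daily flows.

    Missing values (None or NaN) are dropped. The percentile is interpolated
    linearly between order statistics (an assumed convention, not the authors').
    """
    values = sorted(
        float(q) for q in daily_flows if q is not None and not math.isnan(q)
    )
    if not values:
        raise ValueError("no valid discharge values")
    position = 0.05 * (len(values) - 1)  # fractional index into the sorted flows
    lower = math.floor(position)
    upper = min(lower + 1, len(values) - 1)
    weight = position - lower
    return values[lower] + (values[upper] - values[lower]) * weight


print(q95([2.0, float("nan"), 1.0, None, 4.0, 3.0]))
```

For intermittent catchments that have zero flow on a sufficiently large share of days (roughly more than 5 %), this returns exactly 0, which corresponds to the near-zero low flows that are hard to predict.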

Our study uses four large hydrological datasets: CAMELS-GB (Coxon et al. 2020), CAMELS-US (Addor et al. 2017), CAMELS-AUS (Fowler et al. 2021) and LamaH-CE (Klingler et al. 2021). We apply a random forest model so that interactions and non-linearities can be captured. Prediction accuracy is evaluated by leave-one-out cross-validation (LOOCV) and several performance metrics, e.g. the median absolute error (MDAE) and the root mean squared error (RMSE). LOOCV is run separately for each individual dataset and once for the merged dataset. Results indicate that merging datasets can improve prediction accuracy, but the models fail to correctly predict low flows close to zero.
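The evaluation design (LOOCV per dataset versus one LOOCV run on the merged data, scored with MDAE and RMSE) can be sketched as follows. This is a minimal, dependency-free sketch, not the authors' code. It assumes every catchment is described by the same numeric attribute vector in all datasets (harmonizing attributes across the four datasets is itself non-trivial). To keep it runnable without third-party packages, the random forest is replaced by a hypothetical 1-nearest-neighbour placeholder, `nearest_neighbour_fit`. Scoring the merged-run predictions separately for each source dataset is also an assumption about how the two runs could be compared.

```python
import math
import statistics


def nearest_neighbour_fit(X_train, y_train):
    """Hypothetical placeholder for a random forest: 1-nearest-neighbour regression.

    Returns a predictor mapping one attribute vector to the target of the
    closest training catchment (Euclidean distance; ties go to the first one).
    """
    def predict(x):
        nearest = min(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
        return y_train[nearest]
    return predict


def loocv(X, y, fit):
    """Leave-one-out CV: each catchment is predicted by a model fitted on all others."""
    X, y = list(X), list(y)
    predictions = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(model(X[i]))
    return predictions


def mdae(observed, predicted):
    """Median absolute error."""
    return statistics.median(abs(o - p) for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(statistics.fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


def compare_separate_vs_merged(datasets, fit):
    """Run LOOCV per dataset and once on the merged data.

    `datasets` maps a dataset name to (X, y). Merged-run predictions are scored
    per source dataset so both runs are compared on the same catchments.
    """
    X_all, y_all, source = [], [], []
    for name, (X, y) in datasets.items():
        X_all.extend(X)
        y_all.extend(y)
        source.extend([name] * len(y))
    merged_pred = loocv(X_all, y_all, fit)

    results = {}
    for name, (X, y) in datasets.items():
        separate_pred = loocv(X, y, fit)
        merged_subset = [p for p, s in zip(merged_pred, source) if s == name]
        results[name] = {
            "separate": (mdae(y, separate_pred), rmse(y, separate_pred)),
            "merged": (mdae(y, merged_subset), rmse(y, merged_subset)),
        }
    return results


# Toy example: each dataset alone covers the attribute space sparsely,
# so neighbours from the other dataset help after merging.
toy = {
    "A": ([[0.0], [4.0]], [0.0, 4.0]),
    "B": ([[1.0], [5.0]], [1.0, 5.0]),
}
# (MDAE, RMSE) for A and B: separate (4.0, 4.0), merged (1.0, 1.0)
print(compare_separate_vs_merged(toy, nearest_neighbour_fit))
```

In practice, `fit` would wrap a real random forest, for example scikit-learn's `RandomForestRegressor`: fit it on the training rows and return a function that calls `predict` on a single attribute vector. Passing the learner as a `fit` callable keeps the LOOCV loop and the metrics independent of the model, so a linear baseline can be evaluated through the same harness.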

References

Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, https://doi.org/10.5194/hess-21-5293-2017, 2017.
Coxon, G., Addor, N., Bloomfield, J. P., Freer, J., Fry, M., Hannaford, J., Howden, N. J. K., Lane, R., Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain, Earth Syst. Sci. Data, 12, 2459–2483, https://doi.org/10.5194/essd-12-2459-2020, 2020.
Fowler, K. J. A., Acharya, S. C., Addor, N., Chou, C., and Peel, M. C.: CAMELS-AUS: hydrometeorological time series and landscape attributes for 222 catchments in Australia, Earth Syst. Sci. Data, 13, 3847–3867, https://doi.org/10.5194/essd-13-3847-2021, 2021.
Klingler, C., Schulz, K., and Herrnegger, M.: LamaH-CE: LArge-SaMple DAta for Hydrology and Environmental Sciences for Central Europe, Earth Syst. Sci. Data, 13, 4529–4565, https://doi.org/10.5194/essd-13-4529-2021, 2021.
Tyralis, H., Papacharalampous, G., Langousis, A., and Papalexiou, S. M.: Explanation and probabilistic prediction of hydrological signatures with statistical boosting algorithms, Remote Sens., 13, 333, https://doi.org/10.3390/rs13030333, 2021.
Worland, S. C., Farmer, W. H., and Kiang, J. E.: Improving predictions of hydrological low-flow indices in ungaged basins using machine learning, Environ. Modell. Softw., 101, 169–182, https://doi.org/10.1016/j.envsoft.2017.12.021, 2018.

How to cite: Laimighofer, J. and Laaha, G.: Effect of merging large datasets on prediction accuracy of low flow estimation by random forest, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-7312, https://doi.org/10.5194/egusphere-egu22-7312, 2022.
