4-9 September 2022, Bonn, Germany
EMS Annual Meeting Abstracts
Vol. 19, EMS2022-467, 2022, updated on 28 Jun 2022
https://doi.org/10.5194/ems2022-467
EMS Annual Meeting 2022
© Author(s) 2022. This work is distributed under
the Creative Commons Attribution 4.0 License.

ML Driven Imputation of Precipitation Data Collected at High Sampling Rates

Peter Lünenschloß1, David Schäfer1, Florian Gransee1, Antje Claußnitzer2, Thomas Schartner2, and Jan Bumberger1
  • 1UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany (peter.luenenschloss@ufz.de)
  • 2DWD - Deutscher Wetterdienst, Potsdam, Germany (antje.claussnitzer@dwd.de)

For the mitigation of climate change and for understanding the effects of anthropogenic interventions on environmental systems, monitoring these systems is a fundamental requirement, and it relies heavily on the availability of extensive yet consistent data sets.

Quality control tests and consistency routines that generate those data sets from available sensor data will inevitably produce data gaps wherever measurements fail the tests or are simply not available.

However, most downstream uses of the data require those gaps to be filled (imputed) in a consistent way. This consistency is usually ensured by a good set of predictors together with a suitable method for predicting the variable to be imputed.

This also holds for precipitation, a meteorological variable that is fundamental to understanding hydrological system dynamics but notoriously hard to predict at the microclimatic scale and at sampling rates higher than once per hour.

We conducted an imputation study with machine learning methods on precipitation time series collected at a reference set of gauging stations. These stations are a subset of the wider network of the German Meteorological Service (DWD), where precipitation and other meteorological data are available at a 10-minute sampling interval.

We trained an Extreme Gradient Boosted Tree classifier and a Deep Neural Network regressor on a 10-year record of those data. We selected several distinct sets of predictors available in the surroundings of the reference station, based on temporal and spatial proximity, and evaluated the feature importance at different proximity levels.
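A minimal sketch of how predictor stations might be grouped by spatial proximity. It assumes great-circle (haversine) distance as the proximity measure; the station coordinates, the radii, and the function names are hypothetical, since the abstract does not specify them.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def predictor_sets(reference, stations, radii_km):
    """Return {radius: [station names within that radius], nearest first}."""
    dist = {name: haversine_km(*reference, *pos) for name, pos in stations.items()}
    ranked = sorted(dist, key=dist.get)
    return {r: [n for n in ranked if dist[n] <= r] for r in radii_km}

# Hypothetical coordinates (lat, lon), for illustration only.
reference = (51.35, 12.43)
stations = {"A": (51.40, 12.50), "B": (51.10, 12.00), "C": (52.40, 13.06)}
sets = predictor_sets(reference, stations, radii_km=[10, 50, 200])
```

Each radius yields one candidate predictor set, so a model could be trained per set and the feature importances compared across proximity levels.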

Assuming that the imputation does not have to be performed in real time but serves as a post-processing step, we could extend the set of bounding conditions to measurements obtained after the gap to be imputed, and could thus improve on results obtained in regular forecasting scenarios.
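The difference from a forecasting setup can be illustrated with a small sketch that builds one predictor row from both past and future neighbour measurements. The window half-width `k`, the function names, and the sample values are assumptions made for illustration, not the feature layout used in the study.

```python
def window_features(series, t, k):
    """Values of `series` at offsets -k..+k around index t (None outside the record)."""
    n = len(series)
    return [series[t + d] if 0 <= t + d < n else None for d in range(-k, k + 1)]

def build_sample(neighbours, t, k):
    """Concatenate past and future windows of all neighbour series for target time t."""
    row = []
    for name in sorted(neighbours):
        row.extend(window_features(neighbours[name], t, k))
    return row

# Hypothetical 10-minute precipitation sums (mm) at two neighbouring stations.
neighbours = {
    "A": [0.0, 0.0, 0.2, 1.1, 0.4, 0.0],
    "B": [0.0, 0.1, 0.9, 0.3, 0.0, 0.0],
}
sample = build_sample(neighbours, t=2, k=1)
```

A forecasting model would be restricted to offsets at or before the target time; in post-processing the positive offsets are available as well.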

To further improve the imputation results, especially for the matching of singular and erratic rainfall events, we aligned spatio-temporally separated measurements of the same (traveling) rainfall events by including a non-linear time series stretching algorithm (dynamic time warping) in the sample preprocessing.
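For readers unfamiliar with dynamic time warping, the following is a sketch of the classic textbook algorithm with an absolute-difference cost. It is not necessarily the variant or the constraints used in the study, and the two example series are hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping: return (total cost, alignment path) for a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which samples were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    return cost[n][m], path[::-1]

# The same hypothetical rainfall event, seen one time step earlier at station b.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
total, path = dtw(a, b)
```

The returned path pairs each index of `a` with the index of `b` it was matched to, so the peak of a traveling event can be aligned across stations before the samples are passed to the model.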

We observed that meteorological variables such as wind and humidity, which are useful for predicting precipitation at lower sampling rates, cannot compensate for the noise that their inclusion in the predictor set introduces when imputing precipitation sampled at 10-minute intervals.

However, with precipitation collected at neighboring stations used as predictors, and with the preprocessing measures described, we were able to achieve a solid correlation score. This shows that ML-driven post-processing routines enable imputations at high temporal resolutions, providing the end user with consistent precipitation data sets.
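One way such a correlation score could be computed is sketched here, assuming the Pearson coefficient evaluated on artificially withheld observations. The abstract does not state which correlation measure or validation scheme was used, and all values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def score_imputation(observed, imputed, gap_idx):
    """Correlate imputed values with withheld observations at the gapped indices."""
    return pearson([observed[i] for i in gap_idx], [imputed[i] for i in gap_idx])

# Hypothetical 10-minute sums (mm); indices 1..4 were withheld and then imputed.
observed = [0.0, 0.0, 0.4, 1.2, 0.3, 0.0]
imputed = [0.0, 0.1, 0.5, 1.0, 0.2, 0.0]
r = score_imputation(observed, imputed, gap_idx=[1, 2, 3, 4])
```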

How to cite: Lünenschloß, P., Schäfer, D., Gransee, F., Claußnitzer, A., Schartner, T., and Bumberger, J.: ML Driven Imputation of Precipitation Data Collected at High Sampling Rates, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-467, https://doi.org/10.5194/ems2022-467, 2022.
