EGU24-4280, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-4280
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Exploring Machine Learning Models to Detect Outliers in HydroMet Sensors

Iram Parvez
  • University of Genova, DICCA, Italy (iram.parvez@edu.unige.it)

Iram Parvez1, Massimiliano Cannata2, Giorgio Boni1, Rossella Bovolenta1, Eva Riccomagno3, Bianca Federici1

1 Department of Civil, Chemical and Environmental Engineering (DICCA), Università degli Studi di Genova, Via Montallegro 1, 16145 Genoa, Italy (iram.parvez@edu.unige.it, bianca.federici@unige.it, giorgio.boni@unige.it, rossella.bovolenta@unige.it).

2 Institute of Earth Sciences (IST), Department for Environment Constructions and Design (DACD), University of Applied Sciences and Arts of Southern Switzerland (SUPSI), CH-6952 Canobbio, Switzerland (massimiliano.cannata@supsi.ch).

3 Department of Mathematics, Università degli Studi di Genova, Via Dodecaneso 35, 16146 Genova, Italy (riccomag@dima.unige.it).

The deployment of hydrometeorological sensors generates large volumes of real-time data. Ensuring the quality and reliability of these datasets is a considerable challenge, because erroneous data can lead to flawed analyses and decisions. This research addresses anomaly detection in real-time sensor data by exploring machine learning models. Time-series data are collected from IstSOS (Sensor Observation Service), open-source software that collects, stores, and disseminates sensor data. The methodology uses recurrent neural networks built on Gated Recurrent Units (GRUs), together with the corresponding prediction intervals, applied both to individual sensors and jointly to all temperature sensors in the Ticino region of Switzerland. Because the data are not normally distributed, non-parametric methods such as the bootstrap and the mean absolute deviation are used instead of standard prediction intervals, which assume normality. The results indicate that GRU-based recurrent neural networks coupled with non-parametric forecast intervals perform well in identifying erroneous data points. Applying the model to the multivariate sensor time series establishes a baseline of normal behavior for the area (Ticino). When a new sensor is installed in the same region, this baseline serves as a reference for identifying outliers in the data collected by the new sensor.
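The abstract does not describe the network architecture. For orientation, a standard GRU cell updates its hidden state h_t from the input x_t as shown below, where sigma is the logistic sigmoid and the circled dot is the element-wise product. Libraries differ on whether z_t or 1 - z_t weights the previous state. The linear read-out producing the one-step forecast is an assumption for illustration, not a detail given in the abstract; outliers would then be judged on the residual e_t = y_t - \hat{y}_t.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \\
\hat{y}_{t+1} &= W_o h_t + b_o
\end{aligned}
```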
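The abstract does not spell out how the non-parametric intervals are built. The sketch below shows one simple way to do it, under stated assumptions: forecasts come from an already trained model (for example a GRU network, not implemented here); residuals are observed minus forecast values from a calibration period; the bootstrap variant averages the residual quantiles over resampled replicates; and the mean-absolute-deviation interval uses mean ± k·MAD with a hypothetical k = 3. The function names (`bootstrap_bounds`, `mad_bounds`, `flag_outliers`) and the example numbers are hypothetical, not the authors' code or data.

```python
import math
import random
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_bounds(residuals, alpha=0.05, n_boot=500, seed=0):
    """Bootstrap estimate of the alpha/2 and 1 - alpha/2 residual quantiles.

    Each replicate resamples the residuals with replacement; the returned
    bounds are the averages of the replicate quantiles (no normality assumed).
    """
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        sample = sorted(rng.choices(residuals, k=len(residuals)))
        lows.append(quantile(sample, alpha / 2))
        highs.append(quantile(sample, 1 - alpha / 2))
    return statistics.fmean(lows), statistics.fmean(highs)


def mad_bounds(residuals, k=3.0):
    """Bounds at mean +/- k * mean absolute deviation of the residuals.

    k = 3.0 is a hypothetical choice, not a value taken from the abstract.
    """
    center = statistics.fmean(residuals)
    mad = statistics.fmean(abs(r - center) for r in residuals)
    return center - k * mad, center + k * mad


def flag_outliers(observed, forecast, bounds):
    """Flag y as an outlier when y - forecast falls outside [lo, hi]."""
    lo, hi = bounds
    return [not (f + lo <= y <= f + hi) for y, f in zip(observed, forecast)]


# Hypothetical example data (not from the study): calibration residuals
# spread evenly over [-0.5, 0.5] degC, and three new observations, the
# second of which is a spike.
residuals = [-0.5 + i / 49 for i in range(50)]
forecast = [20.0, 20.5, 21.0]
observed = [20.1, 25.0, 20.9]

if __name__ == "__main__":
    print(flag_outliers(observed, forecast, bootstrap_bounds(residuals)))
    print(flag_outliers(observed, forecast, mad_bounds(residuals)))
```

For the regional use case, `forecast` would presumably come from the model trained on all Ticino temperature sensors and `observed` from the newly installed sensor; this wiring is an assumption, since the abstract does not describe it. When residuals are skewed, the bootstrap bounds come out asymmetric, which is the reason for avoiding a Gaussian interval in the first place.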

How to cite: Parvez, I.: Exploring Machine Learning Models to Detect Outliers in HydroMet Sensors, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-4280, https://doi.org/10.5194/egusphere-egu24-4280, 2024.