EGU2020-14903
https://doi.org/10.5194/egusphere-egu2020-14903
EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Providing a user-friendly outlier analysis service implemented as open REST API

Doron Goldfarb, Johannes Kobler, and Johannes Peterseil
  • Environment Agency Austria, Ecosystem Research & Environmental Information Management, Vienna, Austria (doron.goldfarb@umweltbundesamt.at)

Outliers in a data set can distort subsequent scientific analysis, so measuring an environmental parameter and detecting outliers in the resulting data are closely linked. Outlier analysis is nevertheless difficult, because the definition of an outlier remains contested and therefore vague. Multiple methods for detecting outliers in data sets have been implemented all the same, but applying them often requires statistical expertise.

The present use case, developed as a proof-of-concept implementation within the EOSC-Hub project, provides a user-friendly outlier analysis web service via an open REST API. It processes environmental data that are either served by a Sensor Observation Service (SOS) or stored as data files in a cloud-based data repository. The service is driven by an R script that performs three steps: data retrieval, outlier analysis and final data export. To cope with the vague definition of an outlier, the outlier analysis step applies numerous statistical methods implemented in various R packages.
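The idea of applying several detection methods to the same series, rather than relying on one definition of an outlier, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R script. The two rules shown (Tukey's interquartile-range fences and the median-absolute-deviation score), their thresholds, and all function names are assumptions chosen for the example, since the abstract does not name the methods actually used.

```python
import statistics


def iqr_flags(values, k=1.5):
    """Flag values outside Tukey's fences (k times the interquartile range)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def mad_flags(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: no spread around the median, so flag nothing.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def outlier_votes(values):
    """Count, per observation, how many of the rules flag it."""
    per_method = [iqr_flags(values), mad_flags(values)]
    return [sum(flags) for flags in zip(*per_method)]


# Hypothetical sensor readings with one suspicious value at index 6.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
votes = outlier_votes(readings)
```

Reporting a vote count per observation, instead of a single yes/no verdict, leaves the final judgement to the user while still showing where the methods agree.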

The web service encapsulates the R script behind a REST API, which is described by a dedicated OpenAPI specification defining two distinct access methods (SOS-based and file-based) and the parameters required to run the R script. This formal specification is then used to automatically generate a server stub based on the Python Flask framework, which is customized to execute the R script on the server whenever an appropriate web request arrives. The output is currently collected in a ZIP file, which is returned after each successful web request. The service prototype is designed to be operated using generic resources provided by the European Open Science Cloud (EOSC) and the European Grid Initiative (EGI) in order to ensure sustainability and scalability.
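The server-side step described above (turn request parameters into an R script invocation, run it, and collect the output in a ZIP file) might look roughly like the following sketch. It uses only the Python standard library and leaves out the Flask request handling. The script name, the `--mode` flag, the parameter names and both function names are hypothetical, as the abstract does not specify the actual command-line interface of the R script.

```python
import subprocess
import tempfile
import zipfile
from pathlib import Path


def build_command(script, mode, params):
    """Build an Rscript command line for one of the two access methods."""
    if mode not in ("sos", "file"):
        raise ValueError(f"unknown access method: {mode}")
    # Hypothetical flag layout; the real script may expect other arguments.
    command = ["Rscript", script, f"--mode={mode}"]
    command += [f"--{key}={value}" for key, value in sorted(params.items())]
    return command


def run_and_zip(command, zip_path, runner=subprocess.run):
    """Run the command in a scratch directory and zip every file it writes."""
    with tempfile.TemporaryDirectory() as workdir:
        # check=True raises CalledProcessError if the script exits with an error.
        runner(command, cwd=workdir, check=True)
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(Path(workdir).rglob("*")):
                if path.is_file():
                    archive.write(path, str(path.relative_to(workdir)))
    return zip_path
```

Passing the command as a list, without a shell, keeps request parameters from being interpreted as shell syntax. The `runner` argument exists only so that the function can be exercised without an R installation.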

Owing to its user-friendliness and open availability, the presented web service will facilitate access to standardized, scientifically based outlier analysis methods not only for individual scientists but also for networks and research infrastructures such as eLTER. It will thus contribute to the standardization of quality control procedures for data provision in distributed networks of data providers.

Keywords: quality assessment, outlier detection, web service, REST-API, eLTER, EOSC, EGI, EOSC-Hub

How to cite: Goldfarb, D., Kobler, J., and Peterseil, J.: Providing a user-friendly outlier analysis service implemented as open REST API, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-14903, https://doi.org/10.5194/egusphere-egu2020-14903, 2020
