Data flow, harmonization, and quality control
- ¹Alfred-Wegener-Institute, Computing and Data Centre, Germany (bsilva@awi.de)
- *A full list of authors appears at the end of the abstract
Earth system cyberinfrastructures include three types of data services: repositories, collections, and federations. These services arrange data by purpose, level of integration, and governance. For instance, registered data from uniform measurements fulfill the goal of publication but do not necessarily flow through an integrated data system. A data repository provides the first and highest level of integration, and much of its functionality depends on the standardization of the incoming data. One example is the framework Observation to Archive and Analysis (O2A), which is operational and continuously developed at the Alfred-Wegener-Institute, Bremerhaven; a data repository is one of its components.

In this context, we focus on a modular approach to standardization and quality control for the monitoring of near-real-time data. Two modules are under development. First, the driver module transforms different tabular data into a common format. Second, the quality control module runs quality tests on the ingested data. Both modules rely on the sensor operator and the data scientist, two actors who interact with the two ends of the ingest component of the O2A framework (http://data.awi.de/o2a-doc).

We demonstrate the driver and quality control modules in the data flow within Digital Earth showcases, which also connect repositories and federated databases to the end-user. The end-user is the scientist, who works closely with the developers to ensure applicability. The result demonstrates the benefit of harmonizing data and metadata from multiple sources, easy integration, and rapid assessment of the ingested data. Further, we discuss concepts and current developments aimed at enhanced monitoring and scientific workflows.
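The driver/quality-control split described above can be sketched roughly as follows. This is a minimal illustration, not the actual O2A implementation: the column-name mapping, plausibility ranges, and flag vocabulary are all hypothetical assumptions chosen for the example.

```python
import csv
import io

# Hypothetical driver mapping: each sensor's native tabular headers are
# mapped onto a shared vocabulary (names here are illustrative only).
COLUMN_MAP = {
    "temp_degC": "temperature",
    "sal_psu": "salinity",
}

# Hypothetical plausibility ranges for a simple range test, one of the
# basic checks a near-real-time ingest pipeline might run.
QC_RANGES = {
    "temperature": (-2.5, 40.0),
    "salinity": (0.0, 42.0),
}

def harmonize(raw_text, delimiter=","):
    """Driver step: parse one sensor's tabular output into rows with
    harmonized column names and numeric values."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    for row in reader:
        yield {COLUMN_MAP.get(k, k): float(v) for k, v in row.items()}

def quality_flags(row):
    """QC step: flag each harmonized value as 'good' or 'suspect'
    according to the range test."""
    flags = {}
    for name, value in row.items():
        lo, hi = QC_RANGES.get(name, (float("-inf"), float("inf")))
        flags[name] = "good" if lo <= value <= hi else "suspect"
    return flags

# Two sensor readings in a semicolon-delimited native format.
raw = "temp_degC;sal_psu\n4.2;34.9\n99.0;35.1\n"
rows = list(harmonize(raw, delimiter=";"))
print([quality_flags(r) for r in rows])
```

The point of the split is that a new sensor type only requires a new driver (a header mapping and parser), while the quality tests operate on the common format and therefore apply unchanged to every harmonized source.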
Stephan Frickenhaus (speaker)
How to cite: Silva, B., Fischer, P., Immoor, S., Denkmann, R., Maturilli, M., Weidinger, P., Rehmcke, S., Düde, T., Anselm, N., Gerchow, P., Haas, A., Schäfer-Neth, C., Schäfer, A., Frickenhaus, S., and Koppe, R. and the Computing and Data Centre of the Alfred-Wegener-Institute: Data flow, harmonization, and quality control, EGU General Assembly 2021, online, 19–30 Apr 2021, EGU21-10547, https://doi.org/10.5194/egusphere-egu21-10547, 2021.