ES1.6

Open Data - data, application development, impact

Including EMS Technology Achievement Award

Convener: Hella Riede | Co-conveners: Roope Tervo, Björn Reetz, Håvard Futsæter

Orals: Mon, 05 Sep, 14:00–17:15 (CEST) | Room HS 2

Posters: Attendance Tue, 06 Sep, 09:00–10:30 (CEST) | b-IT poster area

Orals: Mon, 5 Sep | Room HS 2

Chairpersons: Hella Riede, Roope Tervo

Oral session part 1

14:00–14:15

EMS2022-347

Onsite presentation

Open Data from a mixed on-premise and cloud environment at the Finnish Meteorological Institute

Mikko Visa

The Finnish Meteorological Institute has a long history of providing open data. Partly as a result of the INSPIRE directive, almost all important data was opened back in 2013. Because of this, we have quite a long data usage record and experience with technical solutions and user needs. The presentation will cover the current status and future development, keeping in mind the upcoming WMO WIS2 development as well as the Open Data directive with its High Value Dataset proposal.

Data is provided via machine-readable interfaces as well as human-usable web interfaces. We use on-premise storage and interfaces and, in addition, offer cloud-based distribution via the Amazon Public Dataset program. The current operational interfaces are based on WFS 2.0, WMS and Amazon S3. The most recently added datasets include weather and flood warnings in Common Alerting Protocol (CAP) format and a radar data archive via Amazon S3 in GeoTIFF and HDF5 formats. Development is ongoing to provide data via even more developer-friendly interfaces such as the OGC EDR and OGC Features API. New data is also being added continuously based on our own and user needs.
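As a rough illustration of what a request against a WFS 2.0 interface looks like, the sketch below only builds a GetFeature URL for a stored query and does not contact any server. The endpoint, the stored query identifier, and the `place` and `parameters` keys are assumptions made for illustration and should be checked against the FMI open data documentation.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint and stored query name; verify against the provider's documentation.
BASE_URL = "https://opendata.fmi.fi/wfs"
STORED_QUERY = "fmi::observations::weather::simple"


def build_getfeature_url(place, parameters, base_url=BASE_URL, stored_query=STORED_QUERY):
    """Build a WFS 2.0 GetFeature URL for a stored query (no network access)."""
    query = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "storedquery_id": stored_query,
        # The two keys below are stored-query parameters and are assumptions.
        "place": place,
        "parameters": ",".join(parameters),
    }
    return f"{base_url}?{urlencode(query)}"


url = build_getfeature_url("Helsinki", ["temperature", "windspeedms"])
# Decode the query string again to inspect what would be sent.
decoded = parse_qs(urlparse(url).query)
print(url)
```

The response to such a request would be an XML feature collection, which a client still has to parse; that step is left out here.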

An impact study has also been conducted on the usage of data from 2018, which revealed what data is used and how it impacts the users and their potential businesses. Valuable information on the future needs of users was also gathered, and the most important findings of this study will be presented during the session. Based on these findings, an impact dashboard is currently being set up to continuously monitor the impact and usage of our open data offering.

How to cite: Visa, M.: Open Data from a mixed on-premise and cloud environment at the Finnish Meteorological Institute, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-347, https://doi.org/10.5194/ems2022-347, 2022.

14:15–14:30

EMS2022-380

Onsite presentation

ZAMG Data Hub – Open access to high value data sets

Erika Dautz, Irene Teubner, Martin Auer, Alexander Beck, Fabian Pechstein, Julia Schöberl, Bernhard Stuxer, and Daniel Lang

With new open data policies in place, the European Commission aims to speed up the development of innovative services and products, thereby playing a significant role in economic growth. In particular, the reuse of so-called high value data sets is expected to bring remarkable benefits to society and the economy. Because meteorological data sets are defined as high value information, public sector bodies need to make them available free of charge and with minimal legal restrictions, published via APIs and in machine-readable formats.

Austria's national weather service Zentralanstalt für Meteorologie und Geodynamik (ZAMG) is aiming to implement those directives by publishing public sector data on the ZAMG Data Hub. Data sets are made available to the public in a stepwise approach, which started in 2021 with station measurement data and raster data sets and continues with the preparation of forecasts. Besides the open data sector, the ZAMG Data Hub provides commercial data for registered users and serves as a data platform for universities and ZAMG internal users. The main services and interfaces for the end user are a CKAN-based web portal, a REST API, and an associated Metadata Hub, which are connected to the underlying Zarr and S3 file storage infrastructure by state-of-the-art technologies and components. The numerous use cases and actors and the large amount of data to be stored and processed, together with the demand for a highly available and reliable infrastructure, raise challenging questions regarding the architecture of the system and its components. Key aspects are the need for a powerful access control system, scalable and performant data access, and easy-to-use metadata information. Besides continuous improvements, ZAMG wants to provide further data sets and implement additional data services to fulfill open data directives and maximize customer satisfaction.

How to cite: Dautz, E., Teubner, I., Auer, M., Beck, A., Pechstein, F., Schöberl, J., Stuxer, B., and Lang, D.: ZAMG Data Hub – Open access to high value data sets, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-380, https://doi.org/10.5194/ems2022-380, 2022.

14:30–14:45

EMS2022-141

Online presentation

Tiny Weather Forecast Germany - an open source weather app based on open data from the Deutscher Wetterdienst (DWD)

Pawel Dube

Weather data, and especially a precise weather forecast on mobile devices, are of great public interest. However, many apps for mobile devices are closed-source, and even a lot of open source weather apps rely on commercial weather data providers. Another drawback of many weather apps is the large number of permissions they request on the device.

The idea was to create a free, open source weather app based on the open data from the Deutscher Wetterdienst (DWD). Another requirement was a privacy-friendly design without user tracking. Furthermore, the package size should remain low and the amount of data traffic should be limited as far as possible.

The app has been continuously developed since 2020. The first version was released in July 2020, and it left beta status in September 2020. It started with a simple weather forecast; features added over time include weather warnings for Germany, weather texts provided by the DWD and a rain radar. Lately, notifications about warnings applying to the selected location were added.

The project is licensed under the “GNU General Public License v3.0 or later”, ensuring that it will remain open source. It uses only open source code; in particular, no binary or closed-source libraries are used. This contributes to transparency and keeps the size of the program package fairly low (approx. 7 MB, version 0.58.0). The number of required permissions is very limited. Internet traffic only takes place between the app and the open data server of the DWD in order to fetch weather forecast data and weather warnings.

The project also uses non-commercial infrastructures on the web: it is hosted on codeberg.org, published on fdroid.org and translations take place on a private platform. Many volunteers contributed to the project over time and translated it into many languages.

The challenges were a proper implementation of weather warnings without using proprietary push services, and the correct implementation and use of the open data without any special knowledge of meteorology, despite very good documentation by the DWD.

How to cite: Dube, P.: Tiny Weather Forecast Germany - an open source weather app based on open data from the Deutscher Wetterdienst (DWD), EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-141, https://doi.org/10.5194/ems2022-141, 2022.

14:45–15:00

EMS2022-468

Onsite presentation

DWD Geoportal – A central hub for Open Data, API and communication

Björn Reetz, Hella Riede, Dirk Fuchs, Matthias Jerg, and Renate Hagedorn

In November 2021, DWD launched the pilot phase of the new DWD Geoportal at dwd-geoportal.de. It offers user-friendly exploration of DWD's Open Data, combining access to data, metadata, and documentation along with full-text search and interactive previews powered by OGC WMS. This new front end covers datasets hosted on opendata.dwd.de as well as other DWD Open Data sources, for instance the Climate Data Centre and the DWD GeoServer.

The contribution will outline the latest version of the new DWD Geoportal and discuss current development and features that are planned for integration in the near future.

Open Data has been a part of the DWD data distribution strategy since 2017, starting with a small selection of meteorological products, and the number of available datasets has grown continuously over the last years. Since the start, users have been able to download file-based meteorological products without registration. Free access and the variety of products have been welcomed by the general public as well as private met service providers. However, the more datasets were provided in a directory structure, the more tedious it became to find and select among all available data. Also, metadata and documentation were available, but on separate DWD websites. The DWD Geoportal addresses these trends, especially with new users of DWD's open data in mind.

Cloud technology is a suitable way forward for hosting the geoportal in its operational state along with the data. Benefits are expected for the easy integration of rich APIs with the DWD Geoportal, and the flexible and fast deployment and scaling of optional or prototypical services such as WMS-based previews. DWD focuses on cloud technology for new development projects.

How to cite: Reetz, B., Riede, H., Fuchs, D., Jerg, M., and Hagedorn, R.: DWD Geoportal – A central hub for Open Data, API and communication, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-468, https://doi.org/10.5194/ems2022-468, 2022.

15:00–15:15

EMS2022-421

Presentation form not yet defined

Introducing BitTorrent: a scary but efficient way to disseminate archive and real-time data

(withdrawn)

Nicolas Baldeck

15:15–15:30

EMS2022-263

Online presentation

A modernised Data Store infrastructure for improving the access to Copernicus Climate and Atmosphere data and services.

Angel Lopez, Carlo Buontempo, Martin Suttie, Baudouin Raoult, Edward Comyn-Platt, and James Varndell

The Copernicus Climate (CDS) and Atmosphere (ADS) Data Stores, implemented by ECMWF on behalf of the EC, are instances of a shared underlying infrastructure which was designed as a distributed system and open framework to provide seamless web-based and API-based access to a wide catalogue of datasets, tools, applications and other digital information fulfilling the objectives of the Services. Such an approach also allowed the implementation of quality-controlled standards. The infrastructure also integrates a Toolbox platform to perform operations and create web-based applications that can subsequently be made available to end-users within the Data Store portals or even embedded on external platforms, as in the case of Climate-Adapt (EEA). Due to the adoption of the FAIR guiding principles (Findable, Accessible, Interoperable, Reusable) and internationally recognized standards across different components of the infrastructure, the Data Stores are currently able to interoperate and establish close synergies with other data and services platforms such as WEkEO. The Data Store infrastructure is hosted in an on-premises cloud physically located within ECMWF premises in Bologna, providing elasticity of resources and automated deployment capabilities.

Having grown at a steady rate in terms of users, functional capabilities, workload and content since their official opening, the infrastructure is now looking to the new challenges and opportunities that lie ahead. In the coming years the Data Stores will remain at the core of both the C3S and CAMS Services, but the underlying infrastructure is in the process of being further improved. Taking on board operational experience, user feedback, lessons learned, know-how and technologies that have evolved since the initial implementation are the key priorities of this new phase. The final objective of this modernisation effort is to make the current services more accessible and to fully embrace open-source scientific software to ensure compatibility with state-of-the-art solutions such as machine learning, data cubes and interactive notebooks. In summary, the Data Stores are evolving into a modern, cloud-based, more usable and interoperable infrastructure that will make it possible to better meet evolving requirements, scale up according to increased demand, strengthen synergies with other platforms such as WEkEO, contribute to related initiatives such as the Destination Earth system, and become a core building block for the European green and digital transformation.

After several years in operation, with more than 130k registered users and daily rates of 90 TB of data delivered, the aim of this presentation is to guide the audience through the past and present of the Climate and Atmosphere Data Stores and their Toolbox and to engage participants in a discussion about the future infrastructure, which is currently under development.

How to cite: Lopez, A., Buontempo, C., Suttie, M., Raoult, B., Comyn-Platt, E., and Varndell, J.: A modernised Data Store infrastructure for improving the access to Copernicus Climate and Atmosphere data and services., EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-263, https://doi.org/10.5194/ems2022-263, 2022.

Coffee break

Chairpersons: Håvard Futsæter, Björn Reetz, Hella Riede

Oral session part 2

16:00–16:10

EMS2022-726

solicited

Presentation form not yet defined

Introduction to the EMS Technology Achievement Award (TAA) by Robert Mureau, Chair of the TAA Committee

Robert Mureau

16:10–16:30

EMS2022-725

solicited

EMS Technology Achievement Award

Onsite presentation

The Weather Observations Website

Ken Mylne, Simon Gilbert, Hannah Male, Ed Pavelin, Jacqueline Sugier, Jouke De Baar, Maarten Reyniers, Josef Runbäck, Joanne Walker, Kevin Alder, and David Gooding

16:30–16:45

EMS2022-189

Online presentation

Open-Data and the Citizens: gathering weather and climate data in a digital common, crowdsourcing from the community, and producing value-added tools for the ecosystem

Frederic Ameye and the Infoclimat team

Infoclimat is a non-profit organization created 20 years ago, aiming at facilitating the use, production and dissemination of weather and climate data, to experts and to the citizens, while promoting scientific education towards the general public and children.

The organization, made up of hundreds of volunteers and no employees at this time, started its work by gathering sources of open weather data at the beginning of this century. It has grown from a few French weather stations under WMO Resolution 40 to more than 18,000 weather stations around the world today, with 6 billion climate records accessible. Today, more than 40 sources of data are processed, from the GTS SYNOP records to national open-data APIs made accessible by NWS all over the world, to custom IoT data transmission for owned weather stations. Often, those records do not use the same standards, data granularity, or time granularity, which makes merging them the trickiest part of the platform.
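To make the time-granularity problem concrete, here is a minimal sketch that brings records from two sources onto a common hourly grid by averaging. The record layout, the sample values, and the choice of a plain hourly mean are assumptions for illustration only; the abstract does not describe how the platform actually merges its sources.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def to_hourly_means(records):
    """Average (timestamp, temperature) pairs into one value per full hour."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical inputs: one source reports every 10 minutes, the other hourly.
ten_minute_source = [
    (datetime(2022, 9, 5, 14, 0), 20.0),
    (datetime(2022, 9, 5, 14, 10), 21.0),
    (datetime(2022, 9, 5, 14, 20), 22.0),
]
hourly_source = [(datetime(2022, 9, 5, 15, 0), 19.5)]

merged = to_hourly_means(ten_minute_source + hourly_source)
```

A real merge would also have to reconcile units, station identifiers, and differing definitions of what an hourly value represents, which this sketch ignores.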

The citizens are also part of this development: more than 1800 of their weather stations, strictly quality-controlled, complement the official networks, maintained by the organization or its contributors in France, or its non-profit counterparts in other European countries. The data produced by the organization's network is placed under open licenses, and access is provided through a standardized API. The metadata, available to all consumers of the data, is followed closely by a team of enthusiasts and algorithms, which monitor data quality, instrument calibration, and environmental changes. This way, high-quality data is obtained in environments agreed with the national weather service Météo-France.

The platform maintained by the volunteers of Infoclimat provides access to data in a common form, whatever its source, with respect for the producers' licenses. There are many uses of the data stored in the platform: climatological analyses, real-time or climate interactive maps, national indicators, and data fusion with gridded products (e.g. reanalysis, Copernicus products). The tools are made to be accessible to the general public for educational purposes, on computers or smartphones, but they are designed so that more professional users can dig out the data, analyze them further or download raw data.

Finally, the organization is now changing its scale: from 100% volunteers to a first developer, and from a budget of €30,000 in 2015 to €100,000 in 2022. The platform is shifting towards better internationalization, to make those tools available in more languages, and aims at integrating more regionalized climate predictions (like CORDEX) for better user awareness of climate change. Work is also in progress to make the platform available for scientific teams to store and analyze their data reliably and in a persistent way, for example for researchers on Urban Heat Islands.

How to cite: Ameye, F. and the Infoclimat team: Open-Data and the Citizens: gathering weather and climate data in a digital common, crowdsourcing from the community, and producing value-added tools for the ecosystem, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-189, https://doi.org/10.5194/ems2022-189, 2022.

16:45–17:00

EMS2022-424

Onsite presentation

GeoE3 - combining meteorological data with geospatial and statistical data

Mikko Visa

GeoE3 (Geospatially Enabled Ecosystem for Europe) is a project co-financed by the Connecting Europe Facility of the European Union. It aims to connect existing national, regional and cross-border services and data sets including meteorological, statistical and geospatial such as building data or road network data. This simplifies meaningful analysis and visualization in a national and cross-border context.

GeoE3 develops tools and APIs that will merge available information from national sources. It demonstrates data and service interoperability and creates dashboards and visualizations for an improved understanding of data from a variety of sources. To produce tangible results that can serve as best practices for other domains, disciplines, and areas, GeoE3’s tools and services address compelling use cases. The action will simplify the discovery of relevant data and improve services through the adoption of the latest standards with emphasis on Web APIs and linked data principles.

The three use cases are Solar Energy, Smart Cities and Electric Cars. Investigation of the solar energy potential and energy efficiency of buildings is based on detailed 3D building data, digital elevation models, climate normals, observations and forecasts. The electric cars use case is based on evaluation and prediction of the energy consumption of electric cars as well as providing new services for the European C-ITS platform (Cooperative Intelligent Transport Systems). This use case will use 2D and 3D road data, weather and traffic data, road signs and speed limits. Smart Cities explores the renewable energy potential of planned development areas and deals with optimizing the efficiency of urban expansion.

From a meteorological domain perspective, GeoE3 will deliver an implementation of the OGC EDR API built on top of the Finnish Meteorological Institute's open source SmartMet Server data server. The focus this year will be on the datasets necessary to fulfill the GeoE3 project use cases.
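For readers unfamiliar with the OGC EDR API, the sketch below assembles a position query, which asks a collection for parameter values at a single point. The path pattern and the `coords`, `parameter-name` and `datetime` query parameters follow the OGC API - EDR standard; the base URL, collection name, and parameter name are hypothetical placeholders and do not describe the GeoE3 or SmartMet Server deployment.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def edr_position_url(base_url, collection_id, lon, lat, parameter_names, datetime_range):
    """Build an OGC EDR position query URL (no network access)."""
    query = {
        "coords": f"POINT({lon} {lat})",  # WKT point, longitude first
        "parameter-name": ",".join(parameter_names),
        "datetime": datetime_range,  # ISO 8601 instant or start/end interval
    }
    return f"{base_url}/collections/{collection_id}/position?{urlencode(query)}"


# Hypothetical server, collection and parameter names.
url = edr_position_url(
    "https://example.org/edr",
    "observations",
    24.96,
    60.2,
    ["temperature"],
    "2022-09-05T00:00:00Z/2022-09-05T12:00:00Z",
)
params = parse_qs(urlparse(url).query)
```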

How to cite: Visa, M.: GeoE3 - combining meteorological data with geospatial and statistical data, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-424, https://doi.org/10.5194/ems2022-424, 2022.

17:00–17:15

EMS2022-17

Onsite presentation

Providing AI- and ML-ready data

Roope Tervo and Mike Grant

Artificial Intelligence (AI) and Machine Learning (ML) applications have become a huge hype. What does it mean to serve data for AI and ML? EUMETSAT climate reprocessing data records try to meet the following guidelines as far as possible.

In ML applications, data is typically combined from several sources. Training an ML model normally needs a long history of data. Typical environmental ML applications employ 1-5 years of historical data, while impact forecasts, for example, often require at least 10 years of history to contain enough extreme weather samples. ML applications are often trained with historical data but applied to near-real-time (NRT) data. Thus, corresponding NRT data should always be available.

The historical data series should obviously be as harmonised as possible. However, the harmonisation doesn’t need to be perfect. Small changes in the data do not necessarily affect the performance of an ML model too much. The changes in the underlying data should be well documented.

Data quality is also a very important aspect, as ML models are only as good as the underlying data. Thus, quality flags should always be available and provided in a way that they can be used to filter out bad samples. While a reasonable default is to provide only good quality data, other samples should also be available, as sometimes a larger amount of lower-quality data yields better results than a smaller amount of higher-quality data. Whenever possible, users should be given the option to access the raw data as well, since it may open avenues to new ways of applying ML models or pre-processing.
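The filtering idea can be sketched in a few lines: the default returns only good samples, while a caller can opt in to lower-quality ones. The flag convention (0 = good, 1 = suspect, 2 = bad) and the sample values are hypothetical and are not taken from any EUMETSAT product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    value: float
    quality_flag: int  # hypothetical convention: 0 = good, 1 = suspect, 2 = bad


def select_samples(samples, max_flag=0):
    """Keep samples whose quality flag does not exceed max_flag."""
    return [s for s in samples if s.quality_flag <= max_flag]


samples = [Sample(12.1, 0), Sample(11.8, 1), Sample(99.9, 2), Sample(12.4, 0)]

# Default: good-quality samples only.
good_only = select_samples(samples)
# Opt-in: trade quality for quantity by also accepting suspect samples.
good_and_suspect = select_samples(samples, max_flag=1)
```

Keeping the flag next to each value, instead of dropping flagged samples at the source, is what lets the user make this trade-off.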

Data access should obviously be as fast as possible, and all data should always be served from online data storage. As datasets are almost always combined with each other, data formats should be as well-known and widely supported as possible, even if that means a loss of metadata. Typically, it’s better to provide metadata beside the actual data and keep the data as consistent as possible.

Some ML methods, such as Random Forests (RF), are more often used for supervised learning at specific points, while, e.g., neural networks (NN) are used for images and gridded fields (tensors). Serving data for point-based applications greatly benefits from an API capable of providing the best representative samples for any given point so that they can easily be combined with labels. Serving data for grid-based applications, however, benefits from relatively raw interfaces, such as S3, with wide client support. A critical requirement for the interface and the data model is to enable sub-setting and slicing.
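The two access patterns can be contrasted with a small sketch on a regular latitude/longitude grid: a point lookup for point-based methods and a bounding-box subset for grid-based ones. The grid, its values, and the use of the nearest grid cell as the "representative" sample are simplifying assumptions for illustration.

```python
def nearest_index(axis, target):
    """Index of the axis value closest to target."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - target))


def sample_at_point(field, lats, lons, lat, lon):
    """Point-based access: value of the grid cell nearest to (lat, lon)."""
    return field[nearest_index(lats, lat)][nearest_index(lons, lon)]


def subset(field, lats, lons, lat_min, lat_max, lon_min, lon_max):
    """Grid-based access: slice out the cells inside a bounding box."""
    rows = [i for i, v in enumerate(lats) if lat_min <= v <= lat_max]
    cols = [j for j, v in enumerate(lons) if lon_min <= v <= lon_max]
    return [[field[i][j] for j in cols] for i in rows]


# Hypothetical 3 x 4 gridded field.
lats = [60.0, 60.5, 61.0]
lons = [24.0, 24.5, 25.0, 25.5]
field = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

point_value = sample_at_point(field, lats, lons, lat=60.4, lon=24.9)
window = subset(field, lats, lons, 60.5, 61.0, 24.5, 25.0)
```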

Finally, providing well-known and documented reference datasets with ready labels would be highly beneficial for ML developers. General-domain datasets of this kind, such as the Iris dataset, already exist. The meteorological community should publish such datasets along with ready methods in common libraries to load the dataset easily.

How to cite: Tervo, R. and Grant, M.: Providing AI- and ML-ready data, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-17, https://doi.org/10.5194/ems2022-17, 2022.

Discussion and wrap-up

Display time: Tue, 6 Sep, 08:00–18:00

Posters: Tue, 6 Sep, 09:00–10:30 | b-IT poster area

Chairpersons: Hella Riede, Håvard Futsæter, Björn Reetz

Poster session and open discussion on open data

EMS2022-480

Onsite presentation

DWD-Crowdsourcing: User Reports available on Open Data

Arne Spitzer, Harald Kempf, Matthias Jerg, and Ulrich Blahak

Since July 2020, the DWD WarnWetter-App has included the Crowdsourcing-module “User Reports”. This module lets users report observations about current weather conditions and severe weather to DWD and other users. The data is collected daily and is available on DWD’s Open Data portal (https://opendata.dwd.de/weather/crowdsourcing/warnwetter/).

The user reports represent the current meteorological conditions at a certain place at a certain point in time. The Crowdsourcing-module provides 10 different meteorological categories (lightning, wind, hail, rain, wet icy conditions, snowfall, snow cover, cloudiness, fog, tornado), each of which contains specific characteristic levels and optionally additional attributes. In addition, the user has the option of setting the location and time of the event manually.

The benefit of the data is that meteorological information at ground level is collected at places where no weather station is located in the immediate vicinity. The dataset is able to complement the existing synoptic station network. In the future, the data could improve the evaluation of the current meteorological conditions and the warning management particularly during extreme weather events.

There is no sophisticated quality control for the user reports. Instead, the users are expected to estimate and report the weather conditions as accurately as possible. Grossly inaccurate and false reports are detected using reference data and are excluded instantly. Additionally, users have the opportunity to manually flag meteorologically doubtful reports in the app. Other quality assurance methods are under development.

This contribution contains some numbers and statistics on previous user reports, shows some meteorologically interesting cases, and gives an insight into quality control.

How to cite: Spitzer, A., Kempf, H., Jerg, M., and Blahak, U.: DWD-Crowdsourcing: User Reports available on Open Data, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-480, https://doi.org/10.5194/ems2022-480, 2022.

EMS2022-502

Onsite presentation

Open-data licenses for meteorological datasets - a detailed overview

(withdrawn)

Nicolas Baldeck

EMS2022-722

Onsite presentation

Open Data at ECMWF

Emma Pidduck, Victoria Bennett, Xiaobo Yang, Maartje Kuilman, Ilaria Parodi, and Baudouin Raoult

As the world focuses its efforts on understanding and mitigating the impacts of climate change, historical and predictive weather and climate data have become critical inputs for a broad user base, ranging from policy and decision-makers, local and national governments, and private sector entities to the public. Not making full use of this data can have devastating consequences in terms of loss of and disruption to life, but also financially. It is estimated that in 2021 natural disasters led to damages and losses of US$343 billion, of which European flooding events alone contributed US$13bn. Of these, the ten most costly events caused 9,500 deaths.

ECMWF products and weather data in general contribute to a broad range of activities by service providers, and their use enables and enhances the protection of life and property by National Weather Services and humanitarian agencies. The broad reach, use and significance of the information require that data and derived outputs are communicated effectively, promptly, and without restriction wherever possible.

Open data has been recognised as one of the main tools to maximise the socio-economic benefits of investments in weather and climate data production and forms a key part of the ECMWF Strategy between now and 2030. To realise the full potential of open policies, data need to be easily accessible and with the appropriate supporting information to allow users to derive information and form valuable conclusions.

This presentation highlights the ECMWF roadmap for the transition to open data, including a summary of recent changes to data policy and data access methods, as well as the proposed changes in the coming years, such as new open datasets and reductions in the cost of data. The challenges associated with the transition are also presented.

How to cite: Pidduck, E., Bennett, V., Yang, X., Kuilman, M., Parodi, I., and Raoult, B.: Open Data at ECMWF, EMS Annual Meeting 2022, Bonn, Germany, 5–9 Sep 2022, EMS2022-722, https://doi.org/10.5194/ems2022-722, 2022.
