Climate monitoring: data rescue, management, quality and homogenization


Robust and reliable climatic studies, particularly those assessments dealing with climate variability and change, greatly depend on availability and accessibility to high-quality/high-resolution and long-term instrumental climate data. At present, a restricted availability and accessibility to long-term and high-quality climate records and datasets is still limiting our ability to better understand, detect, predict and respond to climate variability and change at lower spatial scales than global. In addition, the need for providing reliable, opportune and timely climate services deeply relies on the availability and accessibility to high-quality and high-resolution climate data, which also requires further research and innovative applications in the areas of data rescue techniques and procedures, data management systems, climate monitoring, climate time-series quality control and homogenisation.
In this session, we welcome contributions (oral and poster) in the following major topics:
• Climate monitoring, including early warning systems and improvements in the quality of the observational meteorological networks
• More efficient transfer of the data rescued into the digital format by means of improving the current state-of-the-art on image enhancement, image segmentation and post-correction techniques, innovating on adaptive Optical Character Recognition and Speech Recognition technologies and their application to transfer data, defining best practices about the operational context for digitisation, improving techniques for inventorying, organising, identifying and validating the data rescued, exploring crowd-sourcing approaches or engaging citizen scientist volunteers, conserving, imaging, inventorying and archiving historical documents containing weather records
• Climate data and metadata processing, including climate data flow management systems, from improved database models to better data extraction, development of relational metadata databases and data exchange platforms and networks interoperability
• Innovative, improved and extended climate data quality controls (QC), including both near real-time and time-series QCs: from gross-errors and tolerance checks to temporal and spatial coherence tests, statistical derivation and machine learning of QC rules, and extending tailored QC application to monthly, daily and sub-daily data and to all essential climate variables
• Improvements to the current state-of-the-art of climate data homogeneity and homogenisation methods, including methods intercomparison and evaluation, along with other topics such as climate time-series inhomogeneities detection and correction techniques/algorithms, using parallel measurements to study inhomogeneities and extending approaches to detect/adjust monthly and, especially, daily and sub-daily time-series and to homogenise all essential climate variables
• Fostering evaluation of the uncertainty budget in reconstructed time-series, including the influence of the various data processes steps, and analytical work and numerical estimates using realistic benchmarking datasets
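
As an illustration of the QC topics listed above (gross-error limits, tolerance checks and temporal coherence tests), the following is a minimal sketch for a daily temperature series. The function name `qc_flags`, the flag labels and all thresholds are hypothetical choices made for this example, not rules from any operational QC system.

```python
from typing import List, Optional


def qc_flags(values: List[Optional[float]],
             lower: float = -80.0,
             upper: float = 60.0,
             max_step: float = 25.0) -> List[str]:
    """Flag each daily temperature value (degrees Celsius).

    Flags (all labels and thresholds are illustrative assumptions):
      "missing"     - no observation
      "gross_error" - value outside the plausible range [lower, upper]
      "step"        - change from the previous in-range value exceeds max_step
      "ok"          - passed both checks
    """
    flags: List[str] = []
    previous: Optional[float] = None  # last value that passed the range check
    for v in values:
        if v is None:
            flags.append("missing")
            continue
        # Gross-error / tolerance check against fixed physical limits.
        if v < lower or v > upper:
            flags.append("gross_error")
            continue
        # Temporal coherence check against the previous in-range value.
        if previous is not None and abs(v - previous) > max_step:
            flags.append("step")
        else:
            flags.append("ok")
        previous = v
    return flags


series = [10.0, 12.0, None, 99.0, 11.0, 40.0]
flags = qc_flags(series)
```

Note that a single spike will usually flag both the jump up and the jump back down, so a "step" flag marks a value for review rather than proving it wrong. Spatial coherence tests would additionally compare each value with neighbouring stations.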

Convener: Manola Brunet-India | Co-conveners: Victor Venema (deceased), Dan Hollis, John Kennedy
Lightning talks | Fri, 10 Sep, 11:00–12:30 (CEST)

Chairpersons: Victor Venema (deceased), Manola Brunet-India
Climate dataset development
Progress toward a holistic land and marine surface meteorological database and a call for additional contributions.
Simon Noone, Chris Atkinson, David I. Berry, Robert J.H. Dunn, Eric Freeman, Irene Perez Gonzalez, John J. Kennedy, Elizabeth C. Kent, Anthony Kettle, Shelley Mc Neill, Matthew Menne, Ag Stephens, Peter W. Thorne, William Tucker, Corinne Voces, and Kate M. Willet
Marc J. Prohom-Duran and Monica Herrero-Anaya

Data rescue of instrumental meteorological measurements plays an important role in climate research, as it allows daily-to-decadal variability and changes, including extremes, to be addressed. In this context, long and high-quality series are the most valuable tool, but attention to short series is also needed for extremes evaluation and reanalysis of historical episodes. This need is even more relevant in those geographical areas with a great spatial and temporal variability, like the Mediterranean basin.

Here, we describe a new data source of instrumental and phenological observations for Catalonia (northeast of Iberia) and the Balearics, covering a period from 1894 to 1908 (until 1917 for the city of Barcelona), and known as the “Meteorological Network of Catalonia and the Balearics”. This observational network was the first successful coordinated initiative in Spain and was conducted by the Granja Experimental de Barcelona, an institution created by the Barcelona Provincial Council with the objective of collecting meteorological and phenological data for agronomic studies. The Granja created a network of 51 weather stations, supplying instrumentation and rules of observation to the volunteer observers. Most of the stations provided air temperature, rainfall, and air pressure data, and more detailed information was added from 1898 onward, including sky conditions and evaporation, with daily and sub-daily reports (twice a day). Regarding phenology, several stations reported various phenophases, such as first leaves, first fruits, fruit ripening and defoliation for plants and trees, and the passage, arrival, and departure of certain birds. The network, although short-lived, marked the beginning of many observatories that continued in later decades, and laid the first stone of the Meteorological Service of Catalonia (SMC), established in 1921.

Original observations are kept on paper sheets and have recently been digitized (scanned) and catalogued by the current SMC, together with additional documentation, such as written correspondence between the observers and the Granja (a valuable source of metadata) and special reports on the intensity and duration of thunderstorms. The digitized documents (4,100 images) will soon be fully available through the public website “Digital Memory of Catalonia”, while daily maximum and minimum air temperature and rainfall data have already been extracted and recorded in the SMC database.

How to cite: Prohom-Duran, M. J. and Herrero-Anaya, M.: A new source of instrumental and phenological data for Catalonia and the Balearics (1895-1908), EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-101, https://doi.org/10.5194/ems2021-101, 2021.

Hans Olav Hygen, Christoffer Artturi Elo, and Herdis Motrøen Gjelten

MET Norway has, like many other NHMS, more than a century of data, of which a large portion of the early measurements is not digitized. At MET Norway these data were stored on microfilm, which was slowly deteriorating. In the autumn of 2020 we were able to scan all of these microfilms, which produced 1.3 million pictures with 1 to 6 datasheets per image. Each image contains daily data from one month. These images were produced automatically and essentially without metadata.

Turning these images into data requires multiple steps, and doing the job manually in-house at MET Norway would be too costly. In other projects at MET Norway, image recognition has been applied to, e.g., cloud classification. Building on this experience, image recognition was applied here as well. We use a multi-stage approach to ensure the quality of every stage and to gain confidence in the technology. The first stage is to make a catalogue of which stations are represented in each image. The next stage is to capture metadata such as the year and month of each image/datasheet. Stage three is to extract the data from each image and prepare them for MET Norway’s data storage and distribution. As part of this, MET Norway’s regular quality control will be performed on the dataset, thus ensuring that the quality is up to MET Norway’s regular standard.

We are still in the early stages of the project. As stated, the microfilms have been scanned, and we have used image recognition to create a catalogue of which image belongs to which station. We have also extracted which year is represented in each picture. This was possible because this information is in printed letters. The observations are handwritten and have so far proven harder to extract. An internal website has been established to monitor the progress of the image recognition and of limited manual corrections.

How to cite: Hygen, H. O., Elo, C. A., and Gjelten, H. M.: Data rescue and digitization through image recognition, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-43, https://doi.org/10.5194/ems2021-43, 2021.

Hans Olav Hygen, Abigail Louise Aller, Anette Lauen Borg, Line Båserud, Louise Oram, Nina Elisabeth Larsgård, and Vegar Kristiansen

The meteorological observation networks are changing rapidly. Among other trends, these changes include increased frequency of observations, increased spatial resolution of observations, and increased heterogeneity in observation platforms. These changes challenge the current data storage and quality control. MET Norway has implemented a new data storage, ODA, to be able to receive significantly more data.

A significant challenge is that the current quality control system does not scale to the new world of observations. It is not built to be modular and thus requires significant work to integrate improvements.

MET Norway is rising to the challenge of the new observation structure and storage by renewing the handling of observational networks and the quality control system. Previously there were strict criteria on how MET Norway should handle data from an observational station; this is changing with the emergence of new, cheap observational platforms. To accommodate this, we are structuring the handling of stations in a hierarchical system where some stations will have fully populated metadata and be treated at the highest level, whilst others will have less information, down to unknown stations with unknown setup, e.g. Netatmo.

The new quality control system will be modular to ensure the ability to change and upgrade different parts. One major module of the system is an in-house developed library for spatial quality control, Titan (presented at EMS 2019).

Unlike the present quality control, which is separate from the data storage, the new system, CONFIDENT, will be built to use ODA as data storage to ensure the best information is available for users and CONFIDENT at all times. We are also working on how we can integrate other software performing quality control of the data, e.g. for assimilation.

The project is planned to start in the autumn of 2021 and run for three years. Spring of 2021 was used to map relevant activities and modules as the foundation for the planned development of the new system. The plan is not to replace the current quality system in one go, but to start implementing the different modules in 2022 and to phase out the current system over the three-year project period.

How to cite: Hygen, H. O., Aller, A. L., Borg, A. L., Båserud, L., Oram, L., Larsgård, N. E., and Kristiansen, V.: CONFIDENT - Met Norway’s plan for a new quality control system, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-45, https://doi.org/10.5194/ems2021-45, 2021.

Angelika Heil, Anette Ganske, Andrea Lammert, Daniel Heydebreck, and Hannes Thiemann

Atmospheric model data form the basis to understand and predict weather, climate and air quality phenomena. Access to these data is of interest not only to a wide scientific community but also to public services, companies, politicians and citizens. One way to make the data available is to publish them via a data repository. To ensure that datasets in a repository are indeed Findable, Accessible, Interoperable, and Reusable (i.e. FAIR [1]), it is essential that the data are stored together with detailed metadata and that the file structure and metadata follow an established standard. Furthermore, datasets are easier to find and reuse if the corresponding metadata are machine-readable and use a standardised vocabulary. While data standardization is well established in large, internationally coordinated model intercomparison projects (e.g. for climate models in CMIP [2]), joint standards are still lacking in many atmospheric modelling sub-disciplines, such as urban climate or cloud-resolving modelling.

The AtMoDat project (Atmospheric Model Data) [3], led by a team of atmospheric scientists and infrastructure providers, aims to improve the overall FAIRness of atmospheric model data and thus promote their re-use. Within the project, the ATMODAT standard [4] has been developed, which includes precise recommendations to achieve enhanced FAIRness of atmospheric model data in repositories. A prerequisite of this standard is that the data are published with a DataCite DOI [5]. The ATMODAT standard specifies requirements for rich metadata with controlled vocabularies, structured landing pages, file formats (netCDF) and the structure within files. Human- and machine-readable landing pages holding discipline-specific metadata are a core element of this standard.

The ATMODAT standard is easy to implement and provides checklists for data curators and data producers. In addition, to facilitate the compliance check with the ATMODAT standard, the atmodat data checker [6] has been developed. A dataset that complies with this standard will follow the FAIR principles and its metadata will be of high quality. If this compliance has been verified by the respective repository, the dataset can be labelled with the Earth System Data Branding (EASYDAB) [7]. This branding makes it easy for users to verify that the data are properly curated and the metadata have been quality assured.

[1] Juckes et al., 2020: https://doi.org/10.5194/gmd-13-201-2020
[2] Eyring et al., 2016: https://doi.org/10.5194/gmd-9-1937-2016
[3] www.atmodat.de
[4] https://doi.org/10.35095/WDCC/atmodat_standard_en_v3_0
[5] https://datacite.org
[6] https://github.com/AtMoDat/atmodat_data_checker
[7] https://easydab.de

How to cite: Heil, A., Ganske, A., Lammert, A., Heydebreck, D., and Thiemann, H.: The ATMODAT Standard enhances FAIRness of Atmospheric Model data, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-298, https://doi.org/10.5194/ems2021-298, 2021.

Christopher William Frank, Richard Figura, Bernd Fischer, Frank Dimpfel, Ulrich Rothstein, Charlotte Eberz, Marvin Schuchert, and Max Lübcke

Climate and weather data play an important role in, for example, identifying actions against climate change and optimizing industries. However, a correct understanding and handling of such data is often difficult for users without a meteorological background. Moreover, specialized software solutions and an infrastructure capable of handling large amounts of data are needed to process and analyze these data.

The research project FAIR addresses this issue by simplifying the exchange of information and data between the German Meteorological Service (DWD) and stakeholders from industry and the public. To fulfill this purpose, microservices for processing, caching, visualizing, and analyzing meteorological data in an efficient way are being developed. Processing comprises, for example, the selection of specific information from model data or the conversion of the result into formats commonly used by the user. The compilation of microservices makes it possible to support different types of applications and at the same time to make data from third parties available to the DWD. To demonstrate the utility of these microservices, three test scenarios are considered: 1) wind farm planning, 2) integration of meteorological data for individual traffic routing, and 3) planning of social events such as festivals.

In this article, we present the general idea and the current state of the project. The focus is on the challenges that have been identified for the three test scenarios and our technical approaches to address them. Herein we present the developed architecture, the data flow, the FAIR portal and the handling of metadata.

How to cite: Frank, C. W., Figura, R., Fischer, B., Dimpfel, F., Rothstein, U., Eberz, C., Schuchert, M., and Lübcke, M.: FAIR: User-friendly delivery of climate and weather data, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-302, https://doi.org/10.5194/ems2021-302, 2021.

Peter Domonkos

The development of the ACMANT homogenization software started during the European COST HOME project, around 2010. Owing to its excellent results in method comparison tests, the development of ACMANT has continued since then. While its first version was applicable only to the homogenization of monthly temperature series, the later versions are applicable to a wide range of climatic variables and to either monthly or daily time series.

The operation of ACMANT is fast and automatic, and it is easy to use even for large datasets. The method can homogenize time series of varied lengths together, tolerates data gaps well, and includes outlier filtering and (optional) infilling of data gaps. ACMANT includes modern and effective statistical tools for the detection and removal of inhomogeneities, such as step function fitting, bivariate detection for breaks of annual means and seasonal amplitudes (where applicable), the ANOVA correction method and ensemble homogenization with varied pre-homogenization of neighbour series. Owing to these properties, ACMANTv4 was the most accurate homogenization method in most method comparison tests of the Spanish MULTITEST project (https://doi.org/10.1175/JCLI-D-20-0611.1). One important exception occurred in these tests: network mean trend errors were removed with significantly higher certainty by the Pairwise Homogenization Algorithm when approximately half of the time series were affected by quasi-synchronous breaks imitating concerted technical changes in the performance of climate observations. The most recent developments, aimed at the release of ACMANTv5, include the elimination of this drawback of ACMANT.

For ACMANTv5, a new break detection method has been developed that combines two time series comparison methods. The new method uses both composite reference series and pairwise comparisons, and in the detection with composite reference series the step function fitting is forced to include the breaks detected by pairwise comparisons. Another novelty of ACMANTv5 is that it offers options to use metadata in the homogenization procedure. The default operation mode of ACMANTv5 is still fully automatic, with or without the automatic use of a prepared metadata table. ACMANTv5 uses every date of the metadata list as a break indicator, and these are evaluated together with other indicators obtained by pairwise comparisons. Optionally, ACMANTv5 allows users to edit the list of detected breaks based on the pairwise detections of the first homogenization round. In the later steps of ACMANTv5 user intervention is not possible, but metadata may also be considered by the automatic procedure in the final estimation of break positions.
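
To make the idea of step function fitting on a relative series more concrete, the sketch below finds the single most likely break in the difference between a candidate series and a reference series by choosing the split that maximises the between-segment sum of squares. This is a generic textbook-style illustration, not ACMANT's actual algorithm (which handles multiple breaks, seasonal amplitudes, ensembles and metadata); the function name `most_likely_break` and the `min_segment` parameter are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def most_likely_break(candidate: Sequence[float],
                      reference: Sequence[float],
                      min_segment: int = 3) -> Tuple[int, float]:
    """Fit a two-level step function to (candidate - reference).

    Returns (k, shift): k is the index of the first value after the break,
    shift is the mean of the difference series after the break minus the
    mean before it. min_segment is an illustrative minimum segment length.
    """
    diff = [c - r for c, r in zip(candidate, reference)]
    n = len(diff)
    if n < 2 * min_segment:
        raise ValueError("series too short for the requested segment length")

    overall = mean(diff)
    best_k, best_gain = -1, -1.0
    for k in range(min_segment, n - min_segment + 1):
        left, right = diff[:k], diff[k:]
        # Reduction in squared error when one mean is replaced by two means.
        gain = (len(left) * (mean(left) - overall) ** 2
                + len(right) * (mean(right) - overall) ** 2)
        if gain > best_gain:
            best_k, best_gain = k, gain

    shift = mean(diff[best_k:]) - mean(diff[:best_k])
    return best_k, shift


reference = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 10.0, 9.7, 10.4, 10.2, 9.9]
# Candidate equals the reference, with an artificial +1.0 shift from index 6 on.
candidate = [r + (1.0 if i >= 6 else 0.0) for i, r in enumerate(reference)]
break_index, shift = most_likely_break(candidate, reference)
```

In practice a significance test or penalty term is needed to decide whether the best split is a real break at all; this sketch always returns one.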

How to cite: Domonkos, P.: Time series homogenization with the ACMANT software, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-29, https://doi.org/10.5194/ems2021-29, 2021.

Moritz Buchmann, Michael Begert, Stefan Brönnimann, Gernot Resch, and Christoph Marty

Measurements of snow depth and snowfall can vary dramatically over small distances. However, it is not clear if this applies to all derived variables and is the same for all seasons. Almost all meteorological time series incorporate some sort of inhomogeneities. Complete metadata and existing “parallel” stations in close proximity are not always available.
First, we analyse the impacts of local-scale variations based on a unique set of parallel manual snow measurements for the Swiss Alps consisting of 30 station pairs with up to 70 years of parallel data. Station pairs are mostly located in the same villages (or within 3 km horizontal and 150 m vertical distances).
Seasonal analysis of derived snow climate indicators such as maximum seasonal snow depth, sum of new snow, or days with snow on the ground shows that the largest differences occur in spring and the smallest ones are found in DJF and NDJFMA. Relative inter-pair differences (uncertainties) for days with snow on the ground (average snow depth) are below 15% for 90% (30%).
Second, in view of any homogenization efforts of snow data series, it is paramount to understand the impacts of inhomogeneities. Using state-of-the-art break detection algorithms, we strive to investigate which method works best for detecting breaks in snow data series. The results can then be used on time series with insufficient metadata or no neighbouring stations in order to include them in future homogenization processes.
Furthermore, the knowledge about inhomogeneities and breakpoints paves the way for new applications such as the reliable combination of two parallel series into one single series.

How to cite: Buchmann, M., Begert, M., Brönnimann, S., Resch, G., and Marty, C.: Evaluation of break detection methods for snow data series, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-47, https://doi.org/10.5194/ems2021-47, 2021.

Elinah Khasandi Kuya, Herdis Motrøen Gjelten, and Ole Einar Tveito

Climate normals play an important role in weather and climate studies and therefore require a high-quality dataset that is both consistent and homogeneous. The Norwegian observation network has changed considerably during the last 20-30 years, introducing non-climatic changes such as automation and relocation. Homogenization was therefore necessary and work has been done at the Norwegian Meteorological Institute to establish a homogeneous precipitation reference dataset for the purpose of calculating the new climatological standard normals for the period 1991-2020.

The homogenization tool Climatol was applied to detect inhomogeneities in the Norwegian precipitation series for the period 1961-2018. A total of 370 series of monthly precipitation sums (including 44 from Sweden and one from Finland) from the ClimNorm precipitation dataset were used in the homogenization analysis. ClimNorm is an international network activity under the Nordic Framework for Climate Services covering six countries in the Nordic region (Denmark, Estonia, Finland, Latvia, Norway and Sweden), with objectives that include sharing data, methods and experiences in preparing the best possible data basis for the calculation of new climate normals.

Homogeneity testing found inhomogeneities in 95 (29 %) of the 325 Norwegian precipitation series. However, only 81 (25 %) of the series were classified as inhomogeneous after conferring with metadata and were therefore adjusted. Relocation of the precipitation gauge and automation were the main causes of the inhomogeneities in the Norwegian series, explaining 71 % and 12 %, respectively, of all detected breaks. All but one of the accepted inhomogeneities could be confirmed with metadata. Inhomogeneities found in the Swedish and Finnish series were adjusted without metadata. The results further showed the benefits of combining metadata with the automatically detected inhomogeneities. Linear trend analysis showed increasing trends in the period 1961-2018 except in autumn, where a decreasing trend was observed. The homogeneity analysis produced a 58-year-long homogeneous dataset of 325 monthly precipitation sum series with regional temporal variability and spatial coherence that were significantly better than those of the non-homogenized series. The homogenized dataset is more reliable in explaining large-scale climate variations and was used to calculate the new climate normals in Norway.
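
The counts and percentages quoted above are mutually consistent, as the short check below shows. It only re-derives figures stated in the abstract; the variable and function names are introduced here for illustration.

```python
# Counts quoted in the abstract above.
total_series = 370          # all series used in the analysis
swedish, finnish = 44, 1    # series from neighbouring countries
norwegian = total_series - swedish - finnish

detected = 95               # Norwegian series with detected inhomogeneities
accepted = 81               # series classified as inhomogeneous after metadata review


def percent(part: int, whole: int) -> int:
    """Share of part in whole, rounded to the nearest whole percent."""
    return round(100 * part / whole)


detected_share = percent(detected, norwegian)
accepted_share = percent(accepted, norwegian)

# Length of the analysis period 1961-2018, counting both end years.
period_years = 2018 - 1961 + 1
```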

How to cite: Kuya, E. K., Gjelten, H. M., and Tveito, O. E.: Homogenization of Norwegian monthly precipitation series, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-211, https://doi.org/10.5194/ems2021-211, 2021.

Sven Brinckmann, Anna Klameth, and Jörg Trentmann

Measurements of the surface solar radiation have a high importance for the fields of meteorology, climatology, solar energy, agriculture, forestry and other applications. Radiation measurements at ground stations using high quality instruments such as pyranometers began in the second half of the nineteenth century. From the 1980s onward, satellite imagery in the visible radiation spectrum has been used to calculate gridded data of cloud information and, subsequently, of solar radiation at the Earth's surface. Compared to station data, satellite data have the advantage of spatial continuity, but have disadvantages in temporal resolution and data accuracy.

As part of a restructuring of the radiation measurement network, the German Meteorological Service (DWD) is pursuing the goal of expanding its high-quality surface measurements using pyranometers (to 42 stations) and largely discontinuing other radiation measurements, such as direct measurements of sunshine duration. At the same time, surface solar radiation products from satellite data are progressively improving in quality and can be used to compensate for the reduction of ground measurements and increase the spatial coverage of radiation information over Germany. For this purpose, the project DUETT aims to merge solar radiation data from the 42 pyranometer stations with near-real-time data based on measurements from METEOSAT-SEVIRI. As products, hourly values of the parameters global horizontal irradiance (GHI) and sunshine duration (SDU) will be provided on a 1 × 1 km grid for Germany with a time delay of 15 minutes after each full hour.

Merging is performed in three main steps, which are described in the following for the parameter GHI. First, the hourly mean values of both data sources are calculated. In the case of the satellite data, this step involves the use of an 'optical flow' technique to generate intermediate images to increase the original time resolution from 15 minutes to virtually 1 minute. Using this technique, the displacement of fast-moving clouds is better reflected. In the second step, systematic deviations between the two data sources are determined and corrected for by using predictors. Preliminary research suggests that cloudiness (or clearness index) is one such appropriate predictor. In the final step, the local differences between the corrected satellite data and the station data are interpolated to the target grid using Universal Kriging and the results are added to the corrected satellite data.
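
The final merging step described above (interpolate the station-minus-satellite differences to the grid and add them to the satellite field) can be sketched as follows. As a deliberate simplification, the sketch uses inverse-distance weighting in place of the Universal Kriging named in the abstract, and it assumes the satellite values have already been bias-corrected. All function names, coordinates and values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in arbitrary planar units, e.g. km


def idw(target: Point, points: Sequence[Point], values: Sequence[float],
        power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at target (stand-in for kriging)."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a station: use its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def merge(grid_points: Sequence[Point], satellite_grid: Sequence[float],
          station_points: Sequence[Point], station_obs: Sequence[float],
          satellite_at_stations: Sequence[float]) -> List[float]:
    """Add interpolated station-minus-satellite residuals to the satellite field."""
    residuals = [obs - sat for obs, sat in zip(station_obs, satellite_at_stations)]
    return [sat + idw(p, station_points, residuals)
            for p, sat in zip(grid_points, satellite_grid)]


# Two hypothetical stations and three grid points along a line (GHI in W/m^2).
stations = [(0.0, 0.0), (2.0, 0.0)]
merged = merge(
    grid_points=[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    satellite_grid=[500.0, 400.0, 500.0],
    station_points=stations,
    station_obs=[510.0, 480.0],
    satellite_at_stations=[500.0, 500.0],
)
```

At grid points that coincide with a station the merged value reproduces the station measurement, while between stations the correction is a distance-weighted blend of the residuals. Kriging would additionally account for spatial covariance structure and provide an error estimate.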

We present the first results of the merging procedure to be developed for both radiation parameters GHI and SDU. Analyses of the systematically occurring radiation differences between the two data sources are shown as well as the related correction functions. Furthermore, first results of the validation of the combined radiation products will be presented. This includes comparisons with measurements at validation stations as well as analyses based on cross-validation.

How to cite: Brinckmann, S., Klameth, A., and Trentmann, J.: Merging of satellite and ground measurements of hourly surface solar radiation variables in Germany, EMS Annual Meeting 2021, online, 6–10 Sep 2021, EMS2021-128, https://doi.org/10.5194/ems2021-128, 2021.

