EGU25-9581, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-9581
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Friday, 02 May, 08:30–10:15 (CEST), Display time Friday, 02 May, 08:30–12:30 | Hall X5, X5.209
MeteoSaver: a new machine-learning based software for transcription of historical weather data
Derrick Muheki1, Bas Vercruysse2, Krishna Kumar Thirukokaranam Chandrasekar2,10, Christophe Verbruggen2, Julie M. Birkholz2,9, Koen Hufkens3, Hans Verbeeck4, Pascal Boeckx5, Seppe Lampe1, Ed Hawkins6, Peter Thorne7, Dominique Kankonde Ntumba8, Olivier Kapalay Moulasa8, and Wim Thiery1
  • 1Department of Water and Climate, Vrije Universiteit Brussel, Brussel, Belgium (derrick.muheki@vub.be)
  • 2Department of History, Ghent Centre for Digital Humanities, Ghent University, Ghent, Belgium
  • 3BlueGreen Labs (bv), Melsele, Belgium
  • 4Department of Environment, Ghent University, Ghent, Belgium
  • 5Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
  • 6National Centre for Atmospheric Science, Department of Meteorology, University of Reading, Reading, United Kingdom
  • 7ICARUS Climate Research Centre, Maynooth University, Maynooth, Ireland
  • 8Institut National pour l’Etude et la Recherche Agronomiques, Direction Générale, Kinshasa, Democratic Republic of the Congo
  • 9Digital Research Lab, KBR - Royal Library of Belgium, Brussels, Belgium
  • 10Royal Museums of Art and History, Brussels, Belgium

Archives of observed weather data offer scientists a unique opportunity to obtain long time series of the historical climate for many regions of the world. Unfortunately, most of these observational records are to date available only on paper, and thus require digitization and transcription to facilitate analysis of climatic trends. Here we present MeteoSaver, new open-source software that uses machine learning (ML) algorithms to transcribe handwritten records of historical weather data. MeteoSaver version 1.0 processes images of tabular sheets alongside user-defined configuration settings and performs transcription in five sequential steps: (i) image pre-processing, (ii) table and cell detection, (iii) transcription, (iv) quality assessment and quality control, and (v) data formatting and upload. To illustrate and evaluate the software, we apply MeteoSaver to ten imaged sheets of handwritten temperature observations from the Democratic Republic of the Congo. The results show that 95–100% of the records can be transcribed, of which a median of 74.4% reach the highest internal quality flag and 74% match the manually transcribed record, yielding a median mean absolute error of 0.3°C. These results illustrate that MeteoSaver can be applied to a range of handwriting styles and to varying tabular dimensions, paper sizes, and preservation conditions, highlighting its potential for transcribing tabular meteorological observations from multiple regions, especially if the sheets have a consistent format. Overall, our open-source software can help address the limited availability of hydroclimatic data in many regions of the world by helping to save millions of handwritten records of historical weather data presently stored in archives, and can expedite research on climate and environmental change in data-scarce regions.
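
The evaluation figures quoted above (share of records transcribed, share matching the manual transcription, and mean absolute error, each summarised as a median over sheets) can be illustrated with a minimal Python sketch. This is not MeteoSaver code: the function names `sheet_metrics` and `median_over_sheets`, the use of `None` for an untranscribed cell, the matching tolerance `tol`, and the sample temperatures are all assumptions made for illustration, and the exact metric definitions used by the authors are not given in the abstract.

```python
from statistics import mean, median
from typing import Optional, Sequence

Cell = Optional[float]  # None marks a cell with no transcribed value (assumption)


def sheet_metrics(auto: Sequence[Cell], manual: Sequence[Cell], tol: float = 1e-6) -> dict:
    """Compare automated and manual transcriptions of one sheet, cell by cell."""
    if len(auto) != len(manual):
        raise ValueError("auto and manual must cover the same cells")
    # Cells for which a manual reference value exists.
    reference = [(a, m) for a, m in zip(auto, manual) if m is not None]
    # Of those, the cells the automated transcription also produced.
    pairs = [(a, m) for a, m in reference if a is not None]
    if not pairs:
        raise ValueError("no cell was transcribed in both versions")
    errors = [abs(a - m) for a, m in pairs]
    return {
        # Share of reference cells that were transcribed at all.
        "transcribed_pct": 100.0 * len(pairs) / len(reference),
        # Share of transcribed cells that agree with the manual value.
        "match_pct": 100.0 * sum(e <= tol for e in errors) / len(pairs),
        # Mean absolute error over the transcribed cells (same unit as the input).
        "mae": mean(errors),
    }


def median_over_sheets(per_sheet: Sequence[dict], key: str) -> float:
    """Median of one metric across several sheets."""
    return median(sheet[key] for sheet in per_sheet)


# Illustrative values only, not data from the study.
sheets = [
    sheet_metrics([21.5, 30.2, None, 18.0], [21.5, 30.0, 25.1, 18.0]),
    sheet_metrics([19.0, 27.4, 22.1], [19.0, 27.4, 22.3]),
]
print(median_over_sheets(sheets, "mae"))
```

Computing each metric per sheet and only then taking the median, as assumed here, keeps a single badly preserved sheet from dominating the summary.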

How to cite: Muheki, D., Vercruysse, B., Chandrasekar, K. K. T., Verbruggen, C., Birkholz, J. M., Hufkens, K., Verbeeck, H., Boeckx, P., Lampe, S., Hawkins, E., Thorne, P., Ntumba, D. K., Moulasa, O. K., and Thiery, W.: MeteoSaver: a new machine-learning based software for transcription of historical weather data, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-9581, https://doi.org/10.5194/egusphere-egu25-9581, 2025.