EGU25-14610, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-14610
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Friday, 02 May, 16:25–16:35 (CEST) | Room -2.32
Weather Data Streaming with Kerchunk: Strengthening Early Warning Systems 
Nishadh Kalladath1, Masilin Gudoshava1, Shruti Nath2, Jason Kinyua1, Fenwick Cooper2, Hannah Kimani3, David Koros3, Christine Maswi3, Zacharia Mwai3, Asaminew Teshome4, Samrawit Abebe4, Isaac Obai5, Jesse Mason5, Ahmed Amdihun1, and Tim Palmer2
  • 1IGAD Climate Prediction and Applications Centre, Nairobi, Kenya
  • 2University of Oxford, Oxford, United Kingdom
  • 3Kenya Meteorological Department, Nairobi, Kenya
  • 4Ethiopia Meteorological Institute, Addis Ababa, Ethiopia
  • 5World Food Programme, Rome, Italy

The Ensemble Prediction System (EPS) provided by global weather forecast centres generates vast amounts of data that are crucial for early warnings of extreme weather and climate events. However, regional and national meteorological services often struggle to process these data efficiently, particularly during regional downscaling and post-processing. Conventional methods of downloading and storing GRIB-format data have become increasingly inefficient and unsustainable. The Strengthening Early Warning Systems for Anticipatory Actions (SEWAA) project aims to address these challenges by exploring cloud-native operations and post-processing systems driven by generative AI (GenAI) based on conditional generative adversarial networks (cGAN).

Kerchunk offers a practical solution for real-time weather data streaming, well suited to the shift by global weather forecasting centres towards open, free-to-use cloud-based object storage. Used in conjunction with GRIB index files, Kerchunk enables efficient, real-time access to weather data, fostering more sustainable workflows in weather and climate services and thereby strengthening early warning systems.

This study developed a workflow for streaming forecast data using Kerchunk with two primary objectives:  

1. Use GRIB index files to reduce redundant reads of GRIB data when generating Kerchunk reference files.

2. Convert the reference files into virtual Zarr datasets through streaming-like access, and use Dask compute for scalable data handling.
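
The first objective can be illustrated with a minimal sketch, not the project's actual code. It assumes the index files follow the colon-separated wgrib2 short-inventory layout (message number, start byte, date, variable, level, forecast type) used by NOAA ".idx" sidecar files. The sample offsets, the file size, and the helper names parse_idx and range_header are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple, Optional

# Illustrative index text in the wgrib2 short-inventory layout.
# The byte offsets are made up for this sketch.
SAMPLE_IDX = """\
1:0:d=2025010100:PRMSL:mean sea level:anl:
2:990253:d=2025010100:TMP:2 m above ground:anl:
3:1534087:d=2025010100:UGRD:10 m above ground:anl:
4:2367321:d=2025010100:VGRD:10 m above ground:anl:
"""


class GribMessage(NamedTuple):
    number: str            # kept as text; some inventories use sub-message ids
    offset: int            # first byte of the message in the GRIB file
    length: Optional[int]  # None when the end of the last message is unknown
    variable: str
    level: str


def parse_idx(text: str, file_size: Optional[int] = None) -> List[GribMessage]:
    """Turn index lines into byte ranges without reading the GRIB file."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(":")
        rows.append((parts[0], int(parts[1]), parts[3], parts[4]))

    messages = []
    for i, (number, offset, variable, level) in enumerate(rows):
        # A message ends where the next one starts; the last message
        # ends at the file size, if the caller knows it.
        end = rows[i + 1][1] if i + 1 < len(rows) else file_size
        length = None if end is None else end - offset
        messages.append(GribMessage(number, offset, length, variable, level))
    return messages


def range_header(message: GribMessage) -> str:
    """HTTP Range header value that would fetch only this message."""
    if message.length is None:
        return f"bytes={message.offset}-"
    return f"bytes={message.offset}-{message.offset + message.length - 1}"


if __name__ == "__main__":
    for msg in parse_idx(SAMPLE_IDX, file_size=3_000_000):
        if msg.variable == "TMP":
            print(msg.variable, msg.level, range_header(msg))
```

Because the byte ranges come from the small text index, only the selected messages need to be requested from object storage, which is the idea behind reducing redundant reads.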

The methodology utilised recent improvements in the Kerchunk library that integrate GRIB scanning with the accompanying index files. This allowed the system to scan only a sample of the GRIB corpus rather than every file in an entire Forecast Model Run Collection (FMRC), significantly improving performance.
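
To make the notion of a reference file concrete, the following simplified sketch shows the general shape of a Kerchunk version 1 reference set: a mapping from Zarr-style keys to [url, offset, length] triples that point into the original GRIB file. It is an illustration under stated assumptions rather than the workflow's implementation. The URL, the variable names, the byte ranges, and the helper build_references are hypothetical, and the array metadata and GRIB decoding information that real references also carry are omitted.

```python
import json
from typing import Dict, Iterable, Tuple

# Hypothetical object-store location; not a real dataset path.
GRIB_URL = "s3://example-bucket/example-forecast.grib2"

# Hypothetical (variable name, byte offset, byte length) triples, such as
# could be derived from a GRIB index file.
MESSAGES = [
    ("t2m", 990253, 543834),
    ("u10", 1534087, 833234),
]


def build_references(url: str, messages: Iterable[Tuple[str, int, int]]) -> Dict:
    """Build a minimal reference set in the Kerchunk version 1 layout."""
    # Inline values are stored as strings; here, the Zarr group marker.
    refs = {".zgroup": json.dumps({"zarr_format": 2})}
    for name, offset, length in messages:
        # One chunk per 2-D field: key "<variable>/0.0" maps to the byte
        # range of the GRIB message holding that field.
        refs[f"{name}/0.0"] = [url, offset, length]
    return {"version": 1, "refs": refs}


if __name__ == "__main__":
    reference_set = build_references(GRIB_URL, MESSAGES)
    # The reference set is plain JSON, so it can be stored next to the data.
    print(json.dumps(reference_set, indent=2))
```

A reader that understands this layout can fetch each chunk with a ranged request on demand, so the GRIB files themselves are never copied or converted.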

The workflow was implemented using cloud-based compute operations via the Coiled Python library and its service on the Google Cloud Platform. A Dask cluster, managed through Coiled, enabled the creation of virtual Zarr datasets for analysis and visualisation. This streaming approach loads weather data into memory on demand, avoiding unnecessary data downloads and duplication.

We validated the solution with NOAA GFS/GEFS datasets hosted as open datasets in AWS S3. The optimised workflow read less than 5% of the original GRIB data, with index files supplying the remaining input for reference file creation. Dask clusters then converted the Kerchunk reference files into virtual Zarr datasets, allowing processing at a regional scale, such as East Africa, to complete in minutes and supporting near real-time applications across spatial and temporal scales.

This approach significantly enhances post-processing workflows for EPS weather forecasts, bolstering early warning systems and anticipatory action. Future work will focus on using the method to scale training datasets and improve the cost efficiency of cGAN training to advance operational early warning systems. The solution directly addresses the challenges that meteorological services face in processing massive weather datasets, providing a scalable, cost-effective foundation for developing GenAI-based post-processing and improving early warning systems.

How to cite: Kalladath, N., Gudoshava, M., Nath, S., Kinyua, J., Cooper, F., Kimani, H., Koros, D., Maswi, C., Mwai, Z., Teshome, A., Abebe, S., Obai, I., Mason, J., Amdihun, A., and Palmer, T.: Weather Data Streaming with Kerchunk: Strengthening Early Warning Systems, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-14610, https://doi.org/10.5194/egusphere-egu25-14610, 2025.