EGU25-1294, updated on 14 Mar 2025
https://doi.org/10.5194/egusphere-egu25-1294
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Poster | Thursday, 01 May, 10:45–12:30 (CEST), Display time Thursday, 01 May, 08:30–12:30
 
Hall X4, X4.74
A new sub-chunking strategy for fast netCDF-4 access in local, remote and cloud infrastructures. 
Flavien Gouillon, Cédric Penard, Xavier Delaunay, and Sylvain Herlédan
  • CNES, Data Campus, France (flavien.gouillon@cnes.fr)

NetCDF (Network Common Data Form) is a self-describing, portable and platform-independent format for array-oriented scientific data. It has become a community standard for sharing measurements and analysis results in oceanography, meteorology and the space domain.

The volume of scientific data is growing rapidly. Object storage, a paradigm that emerged with cloud infrastructures, can help with data storage and parallel access issues, but NetCDF may not get the most out of this technology without some tuning.

The ample network bandwidth available within cloud infrastructures makes it practical to work with large amounts of data. Processing data where it is located is preferable, as it can yield substantial resource savings. However, some use cases require downloading data from the cloud (e.g. processing that also involves confidential data), and results still have to be fetched once processing tasks have run in the cloud.

Networks vary significantly in capacity and quality, ranging from fiber-optic and copper connections to satellite connections with poor reception in degraded conditions (on boats, among other scenarios). It is therefore crucial that formats and software libraries be designed to optimize data access by limiting transfers to what is strictly necessary.

In this context, a new approach has emerged in the form of a library that indexes the content of netCDF-4 datasets. This indexing enables the retrieval of sub-chunks, which are pieces of data smaller than a chunk, without the need to reformat the existing files. This approach targets access patterns such as time series in netCDF-4 datasets formatted with large chunks.
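As an illustration of the sub-chunk idea only, the following minimal sketch computes which byte range of a chunk would need to be fetched to read one point of a time series. It is not the library's actual index format, which the abstract does not describe. It assumes uncompressed, row-major 2D chunks (one chunk per time step) whose file offsets are already known, and the function names and the rows_per_subchunk parameter are hypothetical. Compressed chunks, common in netCDF-4 files, would require a different kind of index.

```python
def subchunk_byte_range(chunk_offset, chunk_shape, itemsize, rows_per_subchunk, y):
    """Return the (start, stop) file byte range of the sub-chunk holding row y.

    Assumption: the chunk is stored uncompressed in row-major order, and a
    sub-chunk is a block of `rows_per_subchunk` consecutive rows.
    """
    ny, nx = chunk_shape
    if not 0 <= y < ny:
        raise IndexError("row index outside the chunk")
    first_row = (y // rows_per_subchunk) * rows_per_subchunk
    last_row = min(first_row + rows_per_subchunk, ny)  # last sub-chunk may be short
    row_bytes = nx * itemsize
    return (chunk_offset + first_row * row_bytes,
            chunk_offset + last_row * row_bytes)


def time_series_ranges(chunk_offsets, chunk_shape, itemsize, rows_per_subchunk, y):
    """One byte range per time step, given one chunk offset per time step."""
    return [
        subchunk_byte_range(off, chunk_shape, itemsize, rows_per_subchunk, y)
        for off in chunk_offsets
    ]


if __name__ == "__main__":
    # Hypothetical layout: three time steps, each a 1000 x 1000 float32 chunk.
    chunk_bytes = 1000 * 1000 * 4
    offsets = [4096 + i * chunk_bytes for i in range(3)]
    ranges = time_series_ranges(offsets, (1000, 1000), 4, 100, y=250)
    fetched = sum(stop - start for start, stop in ranges)
    print(ranges)
    print(f"fetched {fetched} bytes instead of {3 * chunk_bytes}")
```

Each returned range could be passed to a ranged read (a file seek, or an HTTP Range request against object storage), so that only a fraction of each large chunk is transferred.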

This report assesses the performance of access to netCDF-4 datasets across varied use cases. The use cases are run under several conditions, including POSIX and S3 local filesystems and a simulated degraded network connection. The results may help in choosing the most suitable and efficient library for reading netCDF data in a given situation.

How to cite: Gouillon, F., Penard, C., Delaunay, X., and Herlédan, S.: A new sub-chunking strategy for fast netCDF-4 access in local, remote and cloud infrastructures, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-1294, https://doi.org/10.5194/egusphere-egu25-1294, 2025.