- 1 STFC, CEDA, United Kingdom of Great Britain – England, Scotland, Wales (rhys.r.evans@stfc.ac.uk)
- 2 STFC, CEDA, United Kingdom of Great Britain – England, Scotland, Wales (philip.kershaw@stfc.ac.uk)
- 3 STFC, CEDA, United Kingdom of Great Britain – England, Scotland, Wales (david.poulter@stfc.ac.uk)
- 4 NCAS, CEDA, United Kingdom of Great Britain – England, Scotland, Wales (rhys.evans@ncas.ac.uk)
- 5 NCAS, CEDA, United Kingdom of Great Britain – England, Scotland, Wales (philip.kershaw@ncas.ac.uk)
- 6 NCAS, CEDA, United Kingdom of Great Britain – England, Scotland, Wales (david.poulter@ncas.ac.uk)
The Earth System Grid Federation (ESGF) is the international partnership responsible for distributing, cataloguing and archiving data from both the Coupled Model Intercomparison Project (CMIP) and the Coordinated Regional Climate Downscaling Experiment (CORDEX). In operation since 2009, it was the first decentralised climate data repository of its kind, storing and serving many petabytes of data across tens of global and regional data centre partners.
Over the last five years, the system has been fully rearchitected, introducing a cloud-ready deployment architecture and a new system for distributed search, which is fundamental to ESGF’s federated model for data access. This work has translated successful experience with the STAC (SpatioTemporal Asset Catalog) specification from the Earth observation (EO) world and developed a profile for its use with global climate projection data. Providing a STAC interface to ESGF archives has allowed us to explore alternative access methods for cloud-accessible, analysis-ready data formats through tools such as Kerchunk, a lightweight, non-conversion approach for referencing existing data that works with open-source Python packages such as fsspec and Xarray. Use of STAC also offers the potential for greater integration between the EO and climate modelling domains, which is essential for the validation of model outputs.
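To make the non-conversion idea concrete, the following is a minimal sketch of how a Kerchunk-style reference set maps Zarr chunk keys onto byte ranges inside an existing file, so the file itself never needs to be rewritten. The URL, variable name, offsets and lengths are invented for illustration, and `byte_range_header` is a hypothetical helper, not part of Kerchunk or fsspec.

```python
import json

# A hand-written reference set in the Kerchunk version 1 layout.
# Every value here (URL, offsets, lengths) is invented for illustration.
REFS = {
    "version": 1,
    "refs": {
        # Small metadata is stored inline as a JSON string.
        ".zgroup": json.dumps({"zarr_format": 2}),
        # A chunk is stored as [url, byte offset, byte length].
        "tas/0.0.0": ["https://example.org/data/tas_example.nc", 8192, 4096],
        "tas/1.0.0": ["https://example.org/data/tas_example.nc", 12288, 4096],
    },
}


def byte_range_header(refs: dict, key: str) -> tuple:
    """Return the URL and HTTP Range header needed to fetch one chunk.

    Hypothetical helper: in practice a reference-aware filesystem layer
    performs this lookup and the result is read as a Zarr store.
    """
    entry = refs["refs"][key]
    if isinstance(entry, str):
        raise ValueError(f"{key!r} is stored inline, not as a byte range")
    url, offset, length = entry
    # HTTP byte ranges are inclusive at both ends.
    return url, {"Range": f"bytes={offset}-{offset + length - 1}"}


url, headers = byte_range_header(REFS, "tas/1.0.0")
print(url, headers)
```

Because only offsets and lengths are recorded, the original NetCDF files can stay where they are in the archive while clients read them chunk by chunk.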
ESGF has traditionally used a distributed model for search services which, though powerful, has led to challenges around the consistency of search content. Over the last twelve months, in preparation for CMIP7, a further fundamental change has been made to the architecture to address these issues. The new system adopts a centralised model with two search nodes, one in the US and one in Europe, each hosted on public cloud. The two nodes are synchronised using a new event-driven architecture. This approach, driven by a shared messaging framework between the nodes, ensures eventual consistency across them, reducing or eliminating errors caused by individual node downtime and simplifying processes such as the replication and retraction of data from the archives distributed at sites across the federation.
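The eventual-consistency property can be illustrated with a toy model. The sketch below is not the ESGF implementation: the event schema, the `SearchNode` class and the dataset identifiers are all assumptions made for illustration. It shows two nodes receiving the same publish and retract events in different orders, with a duplicate delivery, and still converging on the same index.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A publish or retract event for one dataset (illustrative schema)."""
    event_id: str
    dataset_id: str
    action: str   # "publish" or "retract"
    version: int  # increases with every change to the dataset


@dataclass
class SearchNode:
    """Toy search index that applies events idempotently."""
    name: str
    index: dict = field(default_factory=dict)  # dataset_id -> (version, action)
    seen: set = field(default_factory=set)     # event ids already applied

    def apply(self, event: Event) -> None:
        if event.event_id in self.seen:
            return  # duplicate delivery: ignore
        self.seen.add(event.event_id)
        current = self.index.get(event.dataset_id)
        # Highest version wins, so delivery order does not matter.
        if current is None or event.version > current[0]:
            self.index[event.dataset_id] = (event.version, event.action)

    def visible(self) -> set:
        """Datasets whose latest known state is 'publish'."""
        return {d for d, (_, action) in self.index.items() if action == "publish"}


e1 = Event("e1", "dataset-A", "publish", 1)
e2 = Event("e2", "dataset-B", "publish", 1)
e3 = Event("e3", "dataset-A", "retract", 2)

us, eu = SearchNode("us"), SearchNode("eu")
for ev in (e1, e2, e3):        # in-order delivery
    us.apply(ev)
for ev in (e3, e2, e1, e2):    # reordered, with a duplicate
    eu.apply(ev)

print(us.index == eu.index, us.visible())
```

The design choice that matters here is that applying an event is idempotent and order-independent, which is what allows a node that was temporarily down to replay missed messages and catch up safely.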
The move to a message-based, event-driven architecture has been integrated with STAC records and services. In ESGF Next Generation (ESGF-NG), data is shared between nodes as messages in the form of STAC Item records, ensuring a consistent, publicly documented archive distributed across many nodes. The ESGF team has contributed several changes to the STAC project to facilitate this. Looking forward, we see potential in this new event-driven architecture for search systems as a means to integrate across federations. In the European context, this could include the ESA Climate Change Initiative open data portal, work with the Copernicus Climate Data Store and DestinE.
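As a rough illustration of a STAC Item travelling as a message, the sketch below builds a minimal Item with the core fields the STAC Item specification requires and wraps it in an envelope. The envelope fields (`action`, `source_node`, `payload`), the identifier and the asset URL are assumptions for illustration only and do not reflect the actual ESGF-NG message schema or STAC profile.

```python
import json

# Core top-level fields required on a STAC Item.
REQUIRED_ITEM_FIELDS = {
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
}


def make_item(dataset_id: str, start: str, end: str, asset_href: str) -> dict:
    """Build a minimal global-coverage STAC Item (illustrative values)."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": dataset_id,
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90],
            ]],
        },
        "bbox": [-180.0, -90.0, 180.0, 90.0],
        # A time range uses a null datetime plus start/end datetimes.
        "properties": {
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
        },
        "links": [],
        "assets": {"data": {"href": asset_href, "roles": ["data"]}},
    }


def encode_message(action: str, source_node: str, item: dict) -> str:
    """Wrap an Item in a hypothetical message envelope and serialise it."""
    missing = REQUIRED_ITEM_FIELDS - item.keys()
    if missing:
        raise ValueError(f"Item is missing required fields: {sorted(missing)}")
    return json.dumps({"action": action, "source_node": source_node, "payload": item})


item = make_item(
    "example-dataset-id",                      # hypothetical identifier
    "2015-01-01T00:00:00Z",
    "2100-12-31T23:59:59Z",
    "https://example.org/data/tas_example.nc",  # hypothetical URL
)
message = encode_message("publish", "example-node", item)
decoded = json.loads(message)
print(decoded["action"], decoded["payload"]["id"])
```

Carrying the full Item in the message means a receiving node can index the record directly, without a second request back to the publishing site.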
How to cite: Evans, R., Poulter, D., Kershaw, P., Foster, I., Ananthakrishnan, R., Hoffman, F., Radhakrishnan, A., Kinderman, S., Ames, S., and Westwood, D.: ESGF Next Generation and preparations for CMIP7, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-19453, https://doi.org/10.5194/egusphere-egu25-19453, 2025.