EGU26-14863, updated on 14 Mar 2026
https://doi.org/10.5194/egusphere-egu26-14863
EGU General Assembly 2026
© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Thursday, 07 May, 12:00–12:10 (CEST)
 
Room -2.33
Breaking Data Siloes: How the NSF NCAR Geoscience Data Exchange Powers Collaboration
Douglas Schuster (1), Harsha Hampapura (1), Riley Conroy (1), and Brian Bockelman (2)
  • (1) NSF National Center for Atmospheric Research, Computational and Information Systems Lab, United States of America (schuster@ucar.edu)
  • (2) Morgridge Institute for Research, United States of America (bbockelman@morgridge.org)

Data-intensive research continues to drive innovation and discovery across Earth system science (ESS). ESS datasets maintained in discipline-specific repositories, including climate model projections, historical reanalysis products, and observational datasets, provide rich resources to support these initiatives. While significant progress has been made through the hosting of datasets by commercial cloud providers, many of these data resources, sometimes stored in non-standard formats, are primarily maintained in unconnected, domain-focused data systems designed to support the legacy “download, clean, and analyze” model. This is a time-consuming process with bandwidth and storage requirements that may be prohibitive, particularly for institutions with limited resources. Together, the download, clean, and analyze model and the use of non-standard formats create a barrier to realizing the full research potential of ESS data assets.


This presentation will highlight the National Science Foundation National Center for Atmospheric Research (NSF NCAR) efforts to develop and deploy its Geoscience Data Exchange, Research Data Commons (GDEX, https://gdex.ucar.edu). GDEX is designed to overcome the challenges described above by: 1) curating standards-based (FAIR), analysis- and AI-optimized (AR/AI) versions of global and regional atmospheric reanalysis outputs, Earth system simulation outputs, and observations produced at NSF NCAR and partner organizations, 2) providing direct access to these datasets through its integration with on-premises computational resources, and 3) providing performant distributed access through its integration with the Open Science Data Federation (OSDF, https://osg-htc.org/services/osdf). The OSDF supports streaming data access and integration with a variety of data and compute services through its system of geographically distributed data caches, including commercial cloud-hosted open datasets. GDEX’s integration with OSDF supports a wider variety of cross-domain research use cases by enabling efficient access to the spectrum of datasets hosted through OSDF’s origin access points. Finally, GDEX is integrated with colocated data analytics services to support rapid development and iteration of data science (e.g. AI/ML) workflows and to facilitate open sharing of those workflows. To promote user adoption of these services, a set of reference data analysis workflows has been seeded in public collaboration software repositories and documented in Jupyter Book-style web pages. GDEX users are encouraged to submit their own workflow examples through this resource, amplifying the impact of their science by allowing others to build more easily upon their work.

How to cite: Schuster, D., Hampapura, H., Conroy, R., and Bockelman, B.: Breaking Data Siloes: How the NSF NCAR Geoscience Data Exchange Powers Collaboration, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-14863, https://doi.org/10.5194/egusphere-egu26-14863, 2026.