EGU2020-12386
https://doi.org/10.5194/egusphere-egu2020-12386
EGU General Assembly 2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Data dissemination best practices and challenges identified through NOAA’s Big Data Project

Meredith Richardson1, Ed Kearns1, and Jonathan O'Neil2
  • 1Office of the Chief Information Officer (OCIO), NOAA, United States of America
  • 2BDP Director, OCIO, NOAA, United States of America

Through satellites, ships, radars, and weather models, the National Oceanic and Atmospheric Administration (NOAA) generates and handles tens of terabytes of data per day. Many of NOAA’s key datasets have been made available to the public through partnerships with Google, Microsoft, Amazon Web Services, and others as part of the Big Data Project (BDP). This movement of data to the Cloud has given researchers from all over the world access to vast amounts of NOAA data, initiating a new form of federal data management as well as exposing key challenges for the future of open-access data. NOAA researchers have run into challenges in providing “analysis-ready” datasets that researchers from varying fields can easily access, manipulate, and use for different purposes. This issue arises because there is no agreed-upon format or method across research communities for transforming traditional datasets for the cloud, with each scientific field or startup expressing differing data formatting needs (cloud-optimized, cloud-native, etc.). Some possible solutions involve converting data into formats widely used throughout the visualization community, such as Cloud-Optimized GeoTIFF. Initial findings have led NOAA to facilitate roundtable discussions with researchers, public and private stakeholders, and other key members of the data community, to encourage the development of best practices for the use of public data on commercial cloud platforms. Overall, by uploading NOAA data to the Cloud, the BDP has led to the recognition and ongoing development of new best practices for data authentication and dissemination and the identification of key areas for targeting collaboration and data use across scientific communities.

How to cite: Richardson, M., Kearns, E., and O'Neil, J.: Data dissemination best practices and challenges identified through NOAA’s Big Data Project, EGU General Assembly 2020, Online, 4–8 May 2020, EGU2020-12386, https://doi.org/10.5194/egusphere-egu2020-12386, 2020

Displays

Comments on the display

AC: Author Comment | CC: Community Comment

displays version 1 – uploaded on 05 May 2020
  • CC1: Answering your question in the chat: PPPs in Europe, Knut Behrends, 06 May 2020

    I am not a lawyer and not involved in administrative processes, but one Public-Private Partnership in Germany is DFN (Deutsches ForschungsNetz) and Deutsche Telekom (and Cisco?) striking deals to give research institutions discounted access to Internet infrastructure for data transfer and videoconferencing. Organising cloud storage is an upcoming additional project of the DFN members. See www.DFN.de. I don't know how it works in practice (this is way over my head). We also have Academies of Science (Leopoldina, Acatech) that serve as kick-off platforms for big projects and initiatives (because their members are professors, politicians, top managers, and super-rich people).

    • AC2: Reply to CC1, Meredith Richardson, 06 May 2020

      Knut, thank you for this insight into PPPs in Germany!

  • AC1: Comment on EGU2020-12386, Meredith Richardson, 06 May 2020

    Continuing the discussion from the live chat,

    Philip Kershaw asked, "@Meredith, how do you manage sustainability with serving data on public cloud? Is data withdrawn if the cloud provider decides it's not commercially viable?"

    The data that the partners request to host will be there as long as the cloud provider sees fit. However, any NOAA data that are being stored by NOAA on the Partners’ commercial cloud platforms using other contract vehicles for mission purposes (free egress) will continue to be stored as long as the contract vehicle exists.

    • AC3: Clarification, Meredith Richardson, 06 May 2020

      To clarify a caveat: the minimum time guaranteed by the cloud service providers to host NOAA data under the Big Data Project is 2 years.

  • CC2: Who pays for the data storage? Might the data selection process be biased by commercial interests?, Daniel Heydebreck, 06 May 2020

    Having data in the Google/AWS/Microsoft clouds might be beneficial for a lot of users, particularly those of the Pangeo community. Additionally, I expect that the companies will be faster in adapting to new data access technologies than publicly funded organizations can be. Thus, there is a clear value in putting the data into commercial clouds. The security aspect also seems quite reasonable; it is partly neglected in research.

    In the beginning, Google, Amazon, and Microsoft might host the data for free (advertisement, CSR, ...). But someone has to pay for the storage in the long run: either the companies themselves, the data users, or the data providers. What are the perspectives?

    Do you see a danger that the hosted data might be selected based on how well the content of the data fits the companies' own agenda? Meaning that some data are "undesirable" and, hence, not hosted?

  • AC4: Comment on EGU2020-12386, Jonathan O'Neil, 07 May 2020

    During the CRADA phase of the project, data selection was mostly driven by the demand of the CSPs' customers and, to a lesser extent, the interest of the CSPs themselves. Under the contract, NOAA is allocated storage, provided by the CSPs, for which it decides what data are stored. This was done in an effort to create a better balance between the various types of NOAA data assets represented on the cloud platforms.