EGU24-22461, updated on 11 Mar 2024
https://doi.org/10.5194/egusphere-egu24-22461
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Pretraining a foundation model using MODIS observations of the earth’s atmosphere

Valentine Anantharaj1, Takuya Kurihana2, Gabriele Padovani3, Ankur Kumar4, Aristeidis Tsaris1, Udaysankar Nair4, Sandro Fiore3, and Ian Foster2
  • 1Oak Ridge National Laboratory, Oak Ridge, USA
  • 2University of Chicago, Chicago, USA
  • 3University of Trento, Trento, Italy
  • 4University of Alabama - Huntsville, Huntsville, USA

The earth and atmospheric sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESM). Smaller and cheaper satellites with reduced operational costs have made a variety of EO data affordable, and technological advances have made the data accessible to a wide range of stakeholders, especially the scientific community (EY, 2023). The NASA ESDS program alone is expected to host 320 PB of data by 2030 (NASA ESDS, 2023). The rise and application of artificial intelligence foundation models (FM) can be attributed to the availability of large volumes of curated data, access to extensive compute resources, and the maturity of deep learning architectures, especially the transformer (Bommasani et al., 2021).

Developing a foundation model involves pretraining a suitable deep learning architecture on large amounts of data, often via self-supervised learning (SSL) methods. The pretrained model can then be adapted to downstream tasks via fine-tuning, which requires less data than task-specific models do. Large language models (LLM) are likely the most common type of foundation model encountered by the general public. Vision transformers (ViT) build on the transformer architecture (Vaswani et al., 2017) that underlies LLMs and adapt it to image and image-like data (Dosovitskiy et al., 2020), such as EO data and ESM simulation output. We are in the process of pretraining a ViT model for the earth’s atmosphere using a select few bands of 1-km Level-1B MODIS radiances and brightness temperatures, MOD021KM and MYD021KM, from the NASA Terra and Aqua satellites, respectively. We are using 200 million image chips of size 128x128 pixels to pretrain two ViT models, with 100 million and 400 million parameters, respectively. The pretrained models will be fine-tuned for cloud classification and evaluated against the AI-driven Cloud Classification Atlas (AICCA; Kurihana et al., 2022). We will discuss our experiences involving data and computing, and present preliminary results.
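To make the tokenization step concrete, the following is a minimal sketch of how one 128x128 multi-band chip could be split into ViT patch tokens and how a random subset of patches could be hidden for a masked-patch style of self-supervised pretraining. Only the chip size comes from the abstract. The patch size (16, borrowed from Dosovitskiy et al., 2020), the number of bands, the mask ratio, the choice of a masked-patch objective, and the helper names patchify and split_visible_masked are all illustrative assumptions; this is not the authors' pipeline.

```python
import random

CHIP_SIZE = 128    # pixels per side, as stated in the abstract
PATCH_SIZE = 16    # assumption: patch size from Dosovitskiy et al. (2020)
NUM_BANDS = 6      # assumption: the abstract says only "a select few bands"
MASK_RATIO = 0.75  # assumption: the abstract does not name an SSL objective


def patchify(chip, patch_size):
    """Split chip[band][row][col] into flattened, non-overlapping patches."""
    size = len(chip[0])
    patches = []
    for top in range(0, size, patch_size):
        for left in range(0, size, patch_size):
            # One token holds every band's values for this patch window.
            token = [
                band[r][c]
                for band in chip
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(token)
    return patches


def split_visible_masked(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are hidden from the encoder."""
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = int(num_patches * mask_ratio)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


rng = random.Random(0)
# Synthetic stand-in for one chip; real inputs would be MODIS radiances.
chip = [
    [[rng.random() for _ in range(CHIP_SIZE)] for _ in range(CHIP_SIZE)]
    for _ in range(NUM_BANDS)
]

patches = patchify(chip, PATCH_SIZE)
visible, masked = split_visible_masked(len(patches), MASK_RATIO, rng)

print(len(patches))               # 64 patches: (128 / 16) ** 2
print(len(patches[0]))            # 1536 values per patch: 6 * 16 * 16
print(len(visible), len(masked))  # 16 48
```

Under these assumed settings each chip yields 64 tokens, so the 200 million chips mentioned above would correspond to roughly 12.8 billion patch tokens; a different patch size would change that count quadratically.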

 

References

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., et al.: On the opportunities and risks of foundation models. CoRR abs/2108.07258. https://arxiv.org/abs/2108.07258, 2021.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

Ernst & Young (EY): How can the vantage of space give you strategic advantage on Earth? https://www.ey.com/en_gl/technology/how-can-the-vantage-of-space-give-you-strategic-advantage-on-earth, 2023. Accessed 10 January 2024.

Kurihana, T., Moyer, E. J., and Foster, I. T.: AICCA: AI-Driven Cloud Classification Atlas. Remote Sensing 14, no. 22: 5690. https://doi.org/10.3390/rs14225690, 2022.

NASA MODIS: MODIS - Level 1B Calibrated Radiances. DOI: 10.5067/MODIS/MOD021KM.061 and DOI: 10.5067/MODIS/MYD021KM.061

NASA ESDS: Earthdata Cloud Evolution. https://www.earthdata.nasa.gov/eosdis/cloud-evolution, 2023. Accessed 10 January 2024.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.: Attention is all you need. Adv Neural Inf Process Syst 30, 2017.





How to cite: Anantharaj, V., Kurihana, T., Padovani, G., Kumar, A., Tsaris, A., Nair, U., Fiore, S., and Foster, I.: Pretraining a foundation model using MODIS observations of the earth’s atmosphere, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-22461, https://doi.org/10.5194/egusphere-egu24-22461, 2024.

Supplementary materials

Supplementary material file


supplementary materials version 1 – uploaded on 18 Apr 2024, no comments