EGU24-3202, updated on 08 Mar 2024
https://doi.org/10.5194/egusphere-egu24-3202
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Foundation Models for Science: Potential, Challenges, and the Path Forward

Manil Maskey1, Rahul Ramachandran1, Tsengdar Lee1, Kevin Murphy1, Sujit Roy2, Muthukumaran Ramasubramanian2, Iksha Gurung2, and Raghu Ganti3
  • 1NASA
  • 2University of Alabama in Huntsville
  • 3IBM Research

Foundation models (FMs) mark a significant shift in AI: they are large-scale machine learning models pre-trained on wide-ranging datasets. These models act as flexible starting points, ready to be fine-tuned for various specialized tasks. Unlike traditional models designed for narrow objectives, foundation models draw on their broad pre-training to learn patterns across data, which makes them more adaptable and efficient across diverse domains. This approach reduces the need for extensive, task-specific labeled datasets and prolonged training periods. A single foundation model can be tailored to many scientific applications and can outperform traditional models on some tasks, even when labeled data is scarce.

Addressing the right array of complex scientific challenges with AI FMs requires interdisciplinary teams drawn from various groups and organizations. No single research group or institution can independently muster the resources or expertise needed to construct useful AI FMs. Collaborative efforts are therefore essential, combining diverse skills, resources, and viewpoints to create more comprehensive solutions. The right blend of domain-specific expertise and a broad understanding of various AI subfields is crucial to ensure the versatility and adaptability of foundation models. Moreover, for these models to be accepted and widely used within science, the scientific community must develop a wide array of use cases, labeled datasets, and benchmarks to evaluate them effectively across different scenarios.

Building foundation models for science demands collaboration among a diverse spectrum of research groups to ensure this broad range of perspectives. The collaboration should include stakeholders such as individual researchers, academic and government institutions, and tech companies. Embedding it within the principles of open science is vital. Open science calls for conducting research transparently, sharing findings openly, promoting reproducibility by making methodologies and data accessible, and providing tools that researchers can freely use, modify, and distribute. Encouraging community collaboration in model pre-training and development leads to more robust and functional FMs. Guaranteeing open access to datasets, models, and fine-tuning code enables researchers to validate findings and build upon previous work, thus reducing redundancy in data collection and cultivating a culture of shared knowledge and progress.

How to cite: Maskey, M., Ramachandran, R., Lee, T., Murphy, K., Roy, S., Ramasubramanian, M., Gurung, I., and Ganti, R.: Foundation Models for Science: Potential, Challenges, and the Path Forward, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-3202, https://doi.org/10.5194/egusphere-egu24-3202, 2024.