ESSI1.9 | AI Foundation Models for Earth, Space and Planetary Sciences
Convener: Takuya Kurihana | Co-convener: Valentine Anantharaj

Science foundation models (SFMs), trained on large volumes of data with computing at scale, have already demonstrated their usefulness in science applications. These generalized AI models are designed not for a single task but for a wide range of downstream applications. Trained on sequence data through self-supervised methods, foundation models (FMs) reduce the need for extensive labeled datasets. Often built on transformer architectures, whose self-attention mechanisms capture intricate relationships in data across space and time, FMs derive emergent properties from the data that make them valuable tools for scientific research. When fine-tuned, FMs can outperform traditional models in both efficiency and accuracy, paving the way for rapid development of diverse applications. With their ability to synthesize vast amounts of data and discern intricate patterns, FMs can transform our understanding of, and response to, challenging global problems such as monitoring and mitigating the impacts of climate change and other natural hazards.
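
To make the pretraining recipe above concrete, the following is a minimal, illustrative sketch of self-supervised masked-reconstruction pretraining with a transformer encoder, assuming PyTorch; the class name, shapes, masking ratio, and hyperparameters are hypothetical and not tied to any specific foundation model discussed in the session.

```python
# Minimal sketch of self-supervised masked-reconstruction pretraining with a
# transformer encoder. All names, shapes, and hyperparameters are illustrative.
import torch
import torch.nn as nn

class TinyFMEncoder(nn.Module):
    """Toy transformer encoder over a sequence of spatiotemporal patch embeddings."""
    def __init__(self, n_patches=64, d_model=128, n_heads=4, n_layers=2, patch_dim=256):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)                    # patch -> token embedding
        self.pos = nn.Parameter(torch.zeros(1, n_patches, d_model))   # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)         # self-attention backbone
        self.decode = nn.Linear(d_model, patch_dim)                   # reconstruct masked patches

    def forward(self, patches, mask):
        x = self.embed(patches) + self.pos
        x = x.masked_fill(mask.unsqueeze(-1), 0.0)                    # hide the masked tokens
        return self.decode(self.encoder(x))

# One pretraining step on unlabeled data: mask a random subset of patches and
# train the model to reconstruct them, so no labels are required.
model = TinyFMEncoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
patches = torch.randn(8, 64, 256)                                     # stand-in for real Earth-observation patches
mask = torch.rand(8, 64) < 0.5                                        # mask roughly half of the patches
recon = model(patches, mask)
loss = (recon - patches)[mask].pow(2).mean()                          # loss computed only on masked patches
loss.backward()
opt.step()
```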

The session will discuss advances, early results, and best practices related to the preparation and provisioning of curated data; the construction and evaluation of model architectures; the scaling properties and computational characteristics of model pretraining; use cases and fine-tuning for downstream applications (see the sketch below); and MLOps for the deployment of models for research and applications. The session also encourages discussion of broad community involvement in the development of open foundation models for science that are accessible to all.
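
As a companion to the pretraining sketch above, this hedged example shows one common fine-tuning pattern for a downstream application: freezing the pretrained encoder and training only a small task head on scarce labels. The classifier name, task, and dimensions are assumptions for illustration only.

```python
# Illustrative fine-tuning sketch: reuse the toy pretrained encoder from the
# previous example and attach a lightweight head for a downstream task.
import torch
import torch.nn as nn

class DownstreamClassifier(nn.Module):
    def __init__(self, pretrained_encoder, d_model=128, n_classes=10):
        super().__init__()
        self.backbone = pretrained_encoder                    # pretrained FM encoder (frozen below)
        self.head = nn.Linear(d_model, n_classes)             # small task-specific head

    def forward(self, patches):
        tokens = self.backbone.encoder(self.backbone.embed(patches) + self.backbone.pos)
        return self.head(tokens.mean(dim=1))                  # pool tokens, then classify

# Fine-tune only the head, a common choice when labeled data are scarce.
clf = DownstreamClassifier(model)                              # `model` from the pretraining sketch
for p in clf.backbone.parameters():
    p.requires_grad = False
opt = torch.optim.AdamW(clf.head.parameters(), lr=1e-3)
patches = torch.randn(8, 64, 256)                              # stand-in labeled samples
labels = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(clf(patches), labels)
loss.backward()
opt.step()
```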
