EGU25-21496, updated on 15 Mar 2025
https://doi.org/10.5194/egusphere-egu25-21496
EGU General Assembly 2025
© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.
Oral | Friday, 02 May, 17:10–17:20 (CEST)
 
Room -2.92
Enhancing Earth Science Research through the FAIR-EASE Data Lake Infrastructure: Integrating Diverse Data Sources for Advanced Computational Services
Samuel Keuchkerian1, Vincent Breton1, David Sarramia1,3, Marc Portier2, Antoine Mahul3, and Erwan Bodere4
  • 1CNRS
  • 2VLIZ
  • 3UCA
  • 4Ifremer

The FAIR-EASE Data Lake infrastructure is a pivotal development in Earth sciences: it makes extensive use of cloud approaches to significantly enhance the accessibility and utility of complex data for the Earth science research community. At its core, the infrastructure integrates diverse data sources, including Copernicus and several existing infrastructures, enabling comprehensive environmental and geophysical analyses. A key strength of the FAIR-EASE Data Lake lies in its sophisticated collaborative framework. It incorporates features from established environments such as the European Grid Infrastructure (EGI), Galaxy.eu, D4Science, and the UCA test bed, along with several analytics tools that collectively enhance the infrastructure's operational and security capabilities. This setup will ensure high levels of interoperability and facilitate the use of data and data sources across the various domains of Earth science. The integration among the five strategic pilot projects (Coastal Water Dynamics, Earth Critical Zones, Ocean Biogeochemical Observation, Marine Omics Observations, and Volcano Space Observatory) demonstrates the infrastructure's unique capability. These projects benefit from shared data access, whatever the format and access protocol, and from synergistic interactions among their data sources, allowing for innovative correlations, such as combining satellite data with biological and in-situ data. This synergy provides new insights into biodiversity patterns or ecosystem health, showcasing the power of cross-disciplinary data integration. By providing discovery of and access to diverse data sources and offering advanced analytical tools in a secure, collaborative environment, the FAIR-EASE Data Lake is pioneering new methodologies that transcend traditional disciplinary boundaries.
It exemplifies the transformative potential of integrated data systems using distributed infrastructures in advancing our understanding of Earth’s dynamic systems. This has been done by identifying technical solutions used in other communities (such as Galaxy) to tackle this distributed way of working, and by identifying, integrating and deploying cloud data and data management tools (such as S3 and Apache Iceberg). By developing tools enabling data discovery, namely IDDAS (Interdisciplinary Data Discovery Access Service), and libraries, namely UDAL (Uniform Data Access Layer), in combination with the proposal to adopt the Amazon S3 API for data access in users' practices, the FAIR-EASE Data Lake has given the pilots the opportunity to include cloud technologies in their practices and to take advantage of distributed data resources in a very transparent way. In conclusion, the FAIR-EASE Data Lake infrastructure sets new standards for data-driven research and data analytics in Earth sciences. By merging extensive data access with sophisticated computational resources and a robust collaborative framework, it empowers researchers to expand the frontiers of knowledge about Earth systems and their complex interactions.

How to cite: Keuchkerian, S., Breton, V., Sarramia, D., Portier, M., Mahul, A., and Bodere, E.: Enhancing Earth Science Research through the FAIR-EASE Data Lake Infrastructure: Integrating Diverse Data Sources for Advanced Computational Services, EGU General Assembly 2025, Vienna, Austria, 27 Apr–2 May 2025, EGU25-21496, https://doi.org/10.5194/egusphere-egu25-21496, 2025.