Pangeo for everyone with Galaxy
- 1 The Nordic e-Infrastructure Collaboration (NeIC), University of Oslo, Norway (annefou@geo.uio.no)
- 2 Pôle National de Données de Biodiversité (PNDB), Muséum National d’Histoire Naturelle, France (yvan.le-bras@mnhn.fr)
- 3 Department of Geosciences, University of Oslo, Norway (adelez@student.matnat.uio.no)
Pangeo has been deployed on a number of diverse infrastructures, and learning resources are available, for instance the Pangeo Tutorial Gallery (http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/index.html). However, knowledge of Python is necessary to develop or reuse applications with the Pangeo ecosystem, which hinders its wider adoption and limits potential interdisciplinary collaborations.
Our main objective is to lower the barriers to using the Pangeo ecosystem: to allow everyone to understand the fundamental concepts behind Pangeo, and to offer a Pangeo deployment for teaching and for developing reproducible, reusable and fully automated workflows.
Most Pangeo tutorials and examples use Jupyter notebooks, but the gap between these “toy examples” and real, complex applications is still huge: adopting best software practices for both Jupyter notebooks and large applications is essential for the reuse and automation of workflows.
The Galaxy project is a worldwide community dedicated to making tools, workflows and infrastructures open and accessible to everyone. Each tool in Galaxy has a wrapper describing the tool itself along with its input and output parameters, citations, and possible annotations based on the EDAM ontology. Galaxy workflows are also annotated and can contain any kind of Galaxy tool, including interactive tools such as Pangeo notebooks.
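To make the idea of a tool wrapper more concrete, the sketch below parses a deliberately simplified, hypothetical wrapper with Python's standard library and lists the metadata it declares. The tool id, parameter names, EDAM topic identifier and DOI are placeholders invented for illustration and do not correspond to an existing Galaxy tool; only the general layout (inputs, outputs, EDAM topics, citations) follows the structure described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified wrapper: every id, name and
# identifier below is a placeholder, not a real Galaxy tool.
WRAPPER = """<tool id="example_xarray_tool" name="Example Xarray tool" version="0.1.0">
    <description>hypothetical wrapper used only for illustration</description>
    <edam_topics>
        <edam_topic>topic_0000</edam_topic>
    </edam_topics>
    <inputs>
        <param name="input" type="data" format="netcdf" label="Input netCDF file"/>
        <param name="variable" type="text" label="Variable name"/>
    </inputs>
    <outputs>
        <data name="output" format="tabular"/>
    </outputs>
    <citations>
        <citation type="doi">10.0000/placeholder</citation>
    </citations>
</tool>"""


def summarise_wrapper(xml_text):
    """Collect the metadata a wrapper declares into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "inputs": [(p.get("name"), p.get("type")) for p in root.iterfind("inputs/param")],
        "outputs": [(d.get("name"), d.get("format")) for d in root.iterfind("outputs/data")],
        "edam_topics": [t.text for t in root.iterfind("edam_topics/edam_topic")],
        "citations": [c.text for c in root.iterfind("citations/citation")],
    }


summary = summarise_wrapper(WRAPPER)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because inputs, outputs and annotations are declared rather than buried in code, a platform can generate a form for the tool and chain it into workflows without the user writing any Python.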
Galaxy is also accessible via a web-based interface. The platform is designed to be community and technology agnostic and has gained adoption in various communities, ranging from Climate Science and Biodiversity to Biology and Medicine.
By combining Pangeo and Galaxy, we provide access to the Pangeo ecosystem to everyone, including those who are not familiar with Python, and we offer fully automated and annotated Pangeo “tools”.
Two main sets of tools are currently available in Galaxy:
- Pangeo notebook (kept in sync with the corresponding Pangeo Docker image, https://github.com/pangeo-data/pangeo-docker-images)
- Xarray tools to manipulate and visualise netCDF data from the Galaxy graphical user interface.
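For readers curious about what manipulating netCDF-like data with Xarray looks like in Python, here is a minimal sketch of two typical operations: selecting the grid point nearest to a location and averaging over time. It is not the code behind the Galaxy tools. The dataset is synthetic and built in memory, and the variable name `tas` and the coordinate values are assumptions chosen for illustration; with a real file one would typically start from `xr.open_dataset(...)` instead.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the contents of a netCDF file:
# one variable on a (time, lat, lon) grid. All names and values are illustrative.
values = np.arange(24, dtype=float).reshape(4, 3, 2)
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), values, {"units": "K"})},
    coords={
        "time": np.arange(4),
        "lat": np.array([-10.0, 0.0, 10.0]),
        "lon": np.array([0.0, 20.0]),
    },
)

# Time series at the grid point closest to a requested location.
point = ds["tas"].sel(lat=1.0, lon=18.0, method="nearest")

# Map of the time-averaged field.
time_mean = ds["tas"].mean(dim="time")

print(point.values)
print(time_mean.values)
```

Label-based selection (`sel`) and named dimensions (`dim="time"`) are what allow such operations to be exposed as simple form fields in a graphical interface.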
Training material is being developed and included in the Galaxy Training Network (https://training.galaxyproject.org/):
- “Pangeo ecosystem 101 for everyone - Introduction to Xarray Galaxy Tools”, where anyone can learn about Pangeo and its main concepts and try it out without using the command line;
- “Pangeo Notebook in Galaxy - Introduction to Xarray”: it is very similar to the “Xarray Tutorial” from Pangeo (http://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/xarray.htm) but makes use of Galaxy Pangeo notebooks and offers a different entry point to Pangeo.
Galaxy Training Infrastructure as a Service (https://galaxyproject.eu/tiaas.html) is provided by Galaxy Europe, with infrastructure at no cost, for teachers and instructors. It was used for the FORCeS eScience course “Tools in Climate Science: Linking Observations with Modeling” (https://galaxyproject.eu/posts/2021/11/13/tiaas-anne/), where about 30 students learned about Pangeo (see https://nordicesmhub.github.io/forces-2021/intro.html).
Galaxy Pangeo also contributes to the worldwide online training “GTN Smörgåsbord” (last event 14-18 March 2022, https://gallantries.github.io/posts/2021/12/14/smorgasbord2-tapas/), where everyone is welcome as a trainee, a trainer or just an observer! This will contribute to democratising Pangeo.
How to cite: Fouilloux, A., Le Bras, Y., and Zaini, A.: Pangeo for everyone with Galaxy, EGU General Assembly 2022, Vienna, Austria, 23–27 May 2022, EGU22-5709, https://doi.org/10.5194/egusphere-egu22-5709, 2022.