SC 4.14 | Harnessing the Power of Pangeo: Enhancing Your Scientific Data Analysis Workflow with scalable open source tools
EDI
Harnessing the Power of Pangeo: Enhancing Your Scientific Data Analysis Workflow with scalable open source tools
Co-organized by ESSI2
Convener: Anne Fouilloux | Co-conveners: Tina Odaka, Scott Henderson, Max Jones, Justus Magin

The analysis and visualisation of data is fundamental to research across the earth and space sciences. The Pangeo (https://pangeo.io) community has built an ecosystem of tools designed to simplify these workflows, centred around the Xarray library for n-dimensional data handling and Dask for parallel computing. In this short course, we will offer a gradual introduction to the Pangeo toolkit, through which participants will learn the skills required to scale their local scientific workflows through cloud computing or large HPC with minimal changes to existing codes.
The course is beginner-friendly but assumes a prior understanding of the Python language. We will guide you through hands-on jupyter notebooks that showcase scalable analysis of in-situ, satellite observation and earth system modelling datasets to apply your learning. By the end of this course, you will understand how to:
- Efficiently access large public data archives from Cloud storage using the Pangeo ecosystem of open source software and infrastructure.
- Leverage labelled arrays in Xarray to build accessible, reproducible workflows
- Use chunking to scale a scientific data analysis with Dask
All the Python packages and training materials used are open-source (e.g., MIT, Apache-2, CC-BY-4). Participants will need a laptop and internet access but will not need to install anything. We will be using the free and open Pangeo@EOSC (European Open Sicence Cloud) platform for this course. We encourage attendees from all career stages and fields of study (e.g., atmospheric sciences, cryosphere, climate, geodesy, ocean sciences) to join us for this short course. We look forward to an interactive session and will be hosting a Q&A and discussion forum at the end of the course, including opportunities to get more involved in Pangeo and open source software development. Join us to learn about open, reproducible, and scalable Earth science!
Preparation: We recommend learners with no prior knowledge of Python review resources such as the Software Carpentry training material and Project Pythia in advance of this short course. Participants should bring a laptop with an internet connection. No software installation is required as resources will be accessed online using the Pangeo@EOSC platform. Temporary user accounts will be provided for the course and we will also teach attendees how to request an account on Pangeo@EOSC to continue working on the platform after the training course.

The analysis and visualisation of data is fundamental to research across the earth and space sciences. The Pangeo (https://pangeo.io) community has built an ecosystem of tools designed to simplify these workflows, centred around the Xarray library for n-dimensional data handling and Dask for parallel computing. In this short course, we will offer a gradual introduction to the Pangeo toolkit, through which participants will learn the skills required to scale their local scientific workflows through cloud computing or large HPC with minimal changes to existing codes.
The course is beginner-friendly but assumes a prior understanding of the Python language. We will guide you through hands-on jupyter notebooks that showcase scalable analysis of in-situ, satellite observation and earth system modelling datasets to apply your learning. By the end of this course, you will understand how to:
- Efficiently access large public data archives from Cloud storage using the Pangeo ecosystem of open source software and infrastructure.
- Leverage labelled arrays in Xarray to build accessible, reproducible workflows
- Use chunking to scale a scientific data analysis with Dask
All the Python packages and training materials used are open-source (e.g., MIT, Apache-2, CC-BY-4). Participants will need a laptop and internet access but will not need to install anything. We will be using the free and open Pangeo@EOSC (European Open Sicence Cloud) platform for this course. We encourage attendees from all career stages and fields of study (e.g., atmospheric sciences, cryosphere, climate, geodesy, ocean sciences) to join us for this short course. We look forward to an interactive session and will be hosting a Q&A and discussion forum at the end of the course, including opportunities to get more involved in Pangeo and open source software development. Join us to learn about open, reproducible, and scalable Earth science!
Preparation: We recommend learners with no prior knowledge of Python review resources such as the Software Carpentry training material and Project Pythia in advance of this short course. Participants should bring a laptop with an internet connection. No software installation is required as resources will be accessed online using the Pangeo@EOSC platform. Temporary user accounts will be provided for the course and we will also teach attendees how to request an account on Pangeo@EOSC to continue working on the platform after the training course.