EGU24-15872, updated on 09 Mar 2024
https://doi.org/10.5194/egusphere-egu24-15872
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Deploying Pangeo on HPC: our experience with the Remote Sensing Deployment Analysis environmenT on SURF infrastructure

Francesco Nattino, Meiert W. Grootes, Pranav Chandramouli, Ou Ku, Fakhereh Alidoost, and Yifat Dzigan
Francesco Nattino et al.
  • Netherlands eScience Center, Amsterdam, Netherlands (f.nattino@esciencecenter.nl)

The Pangeo software stack includes powerful tools that have the potential to revolutionize the way in which research on big (geo)data is conducted. A few of the aspects that make them very attractive to researchers are the ease of use of the Jupyter web-based interface, the level of integration of the tools with the Dask distributed computing library, and the possibility to seamlessly move from local deployments to large-scale infrastructures. 

The Pangeo community and project Pythia are playing a key role in providing training resources and examples that showcase what is possible with these tools. These are essential to guide interested researchers with clear end goals but also to provide inspiration for new applications. 

However, configuring and setting up a Pangeo-like deployment is not always straightforward. Scientists whose primary focus is domain-specific often do not have the time to spend solving issues that are mostly ICT in nature. In this contribution, we share our experience in providing support to researchers in running use cases backed by deployments based on Jupyter and Dask at the SURF supercomputing center in the Netherlands, in what we call the Remote Sensing Deployment Analysis environmenT (RS-DAT) project. 

Despite the popularity of cloud-based deployments, which are justified by the enormous data availability at various public cloud providers, we discuss the role that HPC infrastructure still plays for researchers, due to the ease of access via merit-based allocation grants and the requirements of integration with pre-existing workflows. We present the solution that we have identified to seamlessly access datasets from the SURF dCache massive storage system, we stress how installation and deployment scripts can facilitate adoption and re-use, and we finally highlight how technical research-support staff such as Research Software Engineers can be key in bridging researchers and HPC centers. 

How to cite: Nattino, F., Grootes, M. W., Chandramouli, P., Ku, O., Alidoost, F., and Dzigan, Y.: Deploying Pangeo on HPC: our experience with the Remote Sensing Deployment Analysis environmenT on SURF infrastructure, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-15872, https://doi.org/10.5194/egusphere-egu24-15872, 2024.