- ECMWF, Computing Department, Reading, UK
MediTwin is a European research initiative aimed at developing digital twin technologies for the Mediterranean region, integrating Earth observation data, numerical modelling and artificial intelligence to support environmental monitoring and decision-making. In this context, the MediTwin Summer School 2025 was organised to provide hands-on training on data-driven workflows, cloud-native tools and AI/ML techniques for Earth system applications. The school targeted early-career researchers, PhD students and technical staff from research institutions, with a total of 20 participants.
The School required a scalable, secure and reproducible cloud infrastructure capable of supporting hands-on training activities in Earth system digital twins, data analysis and AI/ML workflows. This contribution presents the design and provisioning of the cloud-native infrastructure deployed to support the school, with a focus on Infrastructure as Code (IaC), Kubernetes-based orchestration and hybrid GPU-enabled workloads.
The infrastructure was deployed on the ECMWF on-premises cloud, based on OpenStack and backed by Ceph software-defined storage, providing elastic compute, networking and persistent storage services. The Kubernetes cluster was provisioned in a high-availability configuration using Terraform and Rancher Cluster Manager. The cluster architecture comprised dedicated control-plane, worker, ingress and GPU nodes, enabling both standard cloud-native services and accelerated AI/ML workloads. Cluster lifecycle management, configuration drift prevention and application delivery followed a GitOps approach based on Rancher Fleet.
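The node-pool layout described above can be sanity-checked before provisioning. The following is a minimal sketch, not the project's actual tooling: the `NodePool` type, the `check_layout` helper and all node counts are hypothetical, and it assumes etcd runs on the control-plane nodes, which the abstract does not state. It relies only on the fact that etcd needs a majority of its members to stay available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    role: str   # "control-plane", "worker", "ingress" or "gpu"
    count: int  # hypothetical size, not taken from the abstract


def etcd_fault_tolerance(members: int) -> int:
    """Number of members that can fail while etcd keeps a majority quorum."""
    quorum = members // 2 + 1
    return members - quorum


def check_layout(pools: list[NodePool]) -> list[str]:
    """Return human-readable problems; an empty list means the layout passes."""
    problems = []
    by_role = {p.role: p.count for p in pools}
    # Every dedicated role named in the abstract must have at least one node.
    for role in ("control-plane", "worker", "ingress", "gpu"):
        if by_role.get(role, 0) < 1:
            problems.append(f"missing node pool: {role}")
    # High availability: the control plane must survive one node failure
    # (assumption: etcd members are co-located with control-plane nodes).
    cp = by_role.get("control-plane", 0)
    if cp >= 1 and etcd_fault_tolerance(cp) < 1:
        problems.append("control plane cannot survive a single node failure")
    return problems


# Illustrative sizes only.
layout = [
    NodePool("control-plane", 3),
    NodePool("worker", 4),
    NodePool("ingress", 2),
    NodePool("gpu", 2),
]
print(check_layout(layout))
```

A check of this kind could run in a CI pipeline ahead of the Terraform plan, so that a layout without a fault-tolerant control plane is rejected before any resources are created.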
GitLab acted as the central orchestration platform for source control, CI/CD pipelines and IaC automation, hosting Terraform modules, Helm charts, Rancher cluster definitions and configuration templates. This ensured full traceability, auditability and reproducibility of both infrastructure and application deployments. Sensitive credentials and API keys were securely managed using HashiCorp Vault and dynamically injected into workloads.
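The abstract does not say which mechanism was used to inject secrets into workloads. One common option is the Vault Agent Injector, which reads pod annotations and renders the requested secrets into files inside the pod. The sketch below builds such annotations under that assumption; the function name, the Vault role `jupyterhub` and the secret path are hypothetical.

```python
def vault_inject_annotations(role: str, secrets: dict[str, str]) -> dict[str, str]:
    """Build pod annotations understood by the Vault Agent Injector.

    `secrets` maps a file name to the Vault path it should be read from;
    the injector renders each secret under /vault/secrets/<name>.
    """
    annotations = {
        "vault.hashicorp.com/agent-inject": "true",
        "vault.hashicorp.com/role": role,
    }
    for name, path in secrets.items():
        annotations[f"vault.hashicorp.com/agent-inject-secret-{name}"] = path
    return annotations


# Hypothetical role and secret path, for illustration only.
annotations = vault_inject_annotations(
    role="jupyterhub",
    secrets={"s3-credentials": "secret/data/school/s3"},
)
for key, value in annotations.items():
    print(f"{key}: {value}")
```

With this approach no credential appears in the Git repository or in the Helm values; only the annotation naming the Vault path is stored in version control.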
To support interactive training activities, a JupyterHub service was deployed on Kubernetes using the official Helm chart, customised for resource management, authentication and storage integration. GPU acceleration was enabled via the NVIDIA GPU Operator, which automated driver installation, device discovery and scheduler integration. In addition, 20 GPU-enabled virtual machines were provisioned outside the Kubernetes environment, directly on OpenStack, for student exercises that required isolated VM-based access. These were created with an Ansible role executed through AWX, which itself ran on the Kubernetes cluster.
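The JupyterHub customisation described above could be expressed as a values file for the official Helm chart. The following sketch generates one, assuming per-user resource limits and a selectable GPU profile; the function name and every limit shown are illustrative, not the values used for the school. The keys follow the chart's `singleuser` section, and the GPU request uses the `nvidia.com/gpu` resource that the NVIDIA device plugin advertises to the scheduler.

```python
import json


def jupyterhub_values(cpu_limit: float, memory_limit: str,
                      storage: str, gpus_per_user: int) -> dict:
    """Build a values structure for the JupyterHub Helm chart (sketch)."""
    return {
        "singleuser": {
            "cpu": {"limit": cpu_limit},
            "memory": {"limit": memory_limit},
            "storage": {"capacity": storage},
            "profileList": [
                {"display_name": "CPU only", "default": True},
                {
                    "display_name": "GPU",
                    # KubeSpawner passes this through as a container
                    # resource limit, so the pod lands on a GPU node.
                    "kubespawner_override": {
                        "extra_resource_limits": {
                            "nvidia.com/gpu": str(gpus_per_user)
                        }
                    },
                },
            ],
        }
    }


# Illustrative limits only.
values = jupyterhub_values(cpu_limit=2, memory_limit="8G",
                           storage="10Gi", gpus_per_user=1)
# JSON is also valid YAML, so the output can be saved as a values file.
print(json.dumps(values, indent=2))
```

Keeping the values in a generated or templated file stored in Git fits the GitOps workflow described earlier, since any change to user limits then goes through review and is traceable.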
This experience demonstrates how modern cloud-native and DevSecOps practices can be effectively applied to provision short-lived yet production-grade scientific training infrastructures, ensuring scalability, security and reproducibility for future Earth observation and digital twin education initiatives.
How to cite: Fornari, F., Pisa, C., Antonacci, M., Baousis, V., Kaprol, T., and Albughdadi, M.: Provisioning a Cloud-Native Training Infrastructure for the MediTwin Summer School 2025, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-7413, https://doi.org/10.5194/egusphere-egu26-7413, 2026.