Session 1

Preparation and optimization of HPC codes to Exascale
Convener: Boris Kaus | Co-conveners: Marta Pienkowska, Marisol Monterrubio-Velasco
Orals | Wed, 24 May, 09:00–13:00
Poster | Attendance Wed, 24 May, 10:30–11:30
Many scientific codes are being restructured (modernized), optimized and ported to run efficiently on emerging hybrid-node architectures, often equipped with different types of accelerators. Exascale also poses challenges in code scalability, load balancing, fault tolerance, and co-design, among others.

Orals: Wed, 24 May | Sala d'Actes

09:00–10:00 | GC11-solidearth-65 | solicited
Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Matthew G. Knepley, Scott Kruger, Hannah Morgan, Todd Munson, Karl Rupp, Barry Smith, Stefano Zampini, Hong Zhang, and Junchao Zhang

The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization; it is used in dozens of scientific fields, and has been an important building block for many computational geoscience applications. Starting from the terascale era in the 1990s and continuing into the present dawn of the exascale era, a major goal of PETSc development has been achieving the scalability required to fully utilize leadership-class supercomputers. We will describe some of the algorithmic developments made during the era in which achieving inter-node scalability was the primary challenge to enabling extreme-scale computation, and then survey the challenges posed by the current era in which harnessing the abundant fine-scale parallelism within compute nodes (primarily in the form of GPU-based accelerators) has assumed at least equal importance. We will discuss how the PETSc design for performance portability addresses these challenges while stressing flexibility and extensibility by separating the programming model used by application code from that used by the library. Additionally, we will discuss recent developments in PETSc's communication module, PetscSF, that enable flexibility and scalable performance across large GPU-based systems while overcoming some of the difficulties posed by working directly with the Message Passing Interface (MPI) on such systems. A particular goal of this talk is not simply to describe the work performed to prepare PETSc, and the simulation codes that rely on it, to run on exascale-class systems, but also to enumerate the challenges we encountered and to share the essential lessons learned that can help other developers prepare and optimize their own high-performance scientific computing codes for the exascale era.
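As an illustration of the back-end-agnostic design described above (a minimal sketch, not code from this talk): a PETSc application can be written purely against the ordinary Vec/Mat/KSP interface, while the CPU or GPU back-end, Krylov method and preconditioner are all selected at run time from the options database, e.g. -vec_type cuda -mat_type aijcusparse -ksp_type cg -pc_type jacobi.

#include <petscksp.h>

int main(int argc, char **argv)
{
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  const PetscInt n = 100;                      /* global size of a 1D Laplacian test problem */
  Mat A;
  Vec x, b;

  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));             /* e.g. -mat_type aijcusparse selects the GPU format */
  PetscCall(MatSetUp(A));

  PetscInt rstart, rend;
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (PetscInt i = rstart; i < rend; ++i) {   /* assemble the tridiagonal operator */
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &b));         /* vectors inherit the back-end chosen for A */
  PetscCall(VecSet(b, 1.0));

  KSP ksp;
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));           /* Krylov method and preconditioner taken from options */
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}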

How to cite: Tran Mills, R., Adams, M., Balay, S., Brown, J., Faibussowitsch, J., G. Knepley, M., Kruger, S., Morgan, H., Munson, T., Rupp, K., Smith, B., Zampini, S., Zhang, H., and Zhang, J.: Enabling End-to-End Accelerated Multiphysics Simulations in the Exascale Era Using PETSc, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-65, https://doi.org/10.5194/egusphere-gc11-solidearth-65, 2023.

10:00–10:15 | GC11-solidearth-21
Marc de la Asunción, Jorge Macías, and Manuel Jesús Castro

The recently finished ChEESE European project aimed at establishing a center of excellence in the domain of solid Earth by developing ten flagship European codes prepared for the upcoming exascale supercomputers. The EDANYA group at the University of Malaga, Spain, has developed two of these flagship codes: Tsunami-HySEA and Landslide-HySEA, for the simulation of tsunamis generated by earthquakes and landslides, respectively. These two codes, although already implemented for multi-GPU architectures at the beginning of the ChEESE project, underwent substantial changes during the lifetime of the project with the objective of improving several crucial aspects such as their efficiency, scaling and input/output requirements. Specifically, we added features such as improved load balancing, direct GPU-to-GPU data transfers and compressed output files, among others. Additionally, we developed a version of Tsunami-HySEA, named Monte-Carlo, particularly suited for areas such as probabilistic tsunami forecasting or machine learning, capable of performing multiple simulations in parallel. In this presentation we describe all these developments carried out during the ChEESE project, along with the two audits that the codes went through, performed by the Performance Optimisation and Productivity Centre of Excellence in HPC (POP CoE). Finally, we show comparative results on realistic scenarios obtained at the beginning and at the end of the project.
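The kind of direct GPU-to-GPU transfer mentioned above typically relies on a CUDA-aware MPI library, to which device buffers can be passed directly so that halo data never has to be staged through host memory. A minimal illustrative sketch (not HySEA source code):

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const int n = 1 << 20;                       /* size of one halo buffer (illustrative) */
  double *d_send = nullptr, *d_recv = nullptr;
  cudaMalloc(&d_send, n * sizeof(double));     /* buffers live in GPU memory */
  cudaMalloc(&d_recv, n * sizeof(double));
  cudaMemset(d_send, 0, n * sizeof(double));

  int peer = rank ^ 1;                         /* pair ranks 0<->1, 2<->3, ... */
  if (peer < size) {
    /* Device pointers are handed straight to MPI; a CUDA-aware MPI moves the
       data GPU-to-GPU (via GPUDirect where the hardware supports it). */
    MPI_Sendrecv(d_send, n, MPI_DOUBLE, peer, 0,
                 d_recv, n, MPI_DOUBLE, peer, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }

  cudaFree(d_send);
  cudaFree(d_recv);
  MPI_Finalize();
  return 0;
}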

How to cite: de la Asunción, M., Macías, J., and Castro, M. J.: Towards exascale-prepared codes for tsunami simulation, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-21, https://doi.org/10.5194/egusphere-gc11-solidearth-21, 2023.

10:15–10:30 | GC11-solidearth-16 | ECS
Ivan Utkin and Ludovic Räss

Continuum-based numerical modelling is a useful tool for interpreting field observations and geological or geotechnical data. In order to match the available data with the results of numerical simulations, it is necessary to estimate the sensitivity of a particular model to changes in its parameters. Recent advances in hardware design, such as the development of massively parallel graphics processing units (GPUs), make it possible to run simulations at unprecedented resolution, close to that of the original data. Thus, automated methods for calculating sensitivities in a high-dimensional parameter space are in demand.

The adjoint method of computing sensitivities, i.e., gradients of the forward model solution with respect to the parameters of interest, is gaining increasing attention in the scientific and engineering communities. This method computes the sensitivities at every point of the computational domain using the results of only one forward solve, in contrast to the direct method, which would require running a forward simulation for each point of the domain. This property of the adjoint method significantly reduces the amount of computational resources required for sensitivity analysis and inverse modelling.
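For reference, the gradient delivered by the adjoint method can be written in the standard form (notation assumed here, not taken from this abstract): for a forward problem F(u, p) = 0 with state u and parameters p, and a scalar objective J(u, p),

\[
\left(\frac{\partial F}{\partial u}\right)^{\!\top}\lambda = \left(\frac{\partial J}{\partial u}\right)^{\!\top},
\qquad
\frac{\mathrm{d}J}{\mathrm{d}p} = \frac{\partial J}{\partial p} - \lambda^{\top}\frac{\partial F}{\partial p},
\]

so a single forward solve plus a single linear adjoint solve for \(\lambda\) yields the sensitivity of J with respect to every parameter at once, whereas the direct approach requires one additional forward solve per parameter.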

In this work, we demonstrate the application of the adjoint method to inverse modelling in geosciences. We developed massively parallel 3D forward and inverse solvers with full GPU support using the Julia language. We present the results of performance and scalability tests on the Piz Daint supercomputer at CSCS.

How to cite: Utkin, I. and Räss, L.: Massively parallel inverse modelling on GPUs using the adjoint method, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-16, https://doi.org/10.5194/egusphere-gc11-solidearth-16, 2023.

Coffee break and poster session for session 1

Poster: Wed, 24 May, 10:30–11:30 | Poster area (V217)

P1 | GC11-solidearth-15
Albert de Montserrat Navarro, Boris Kaus, Ludovic Räss, Ivan Utkin, and Paul Tackley

Following the long-standing paradigm in HPC, scientific software has typically been written in high-level, statically typed and compiled languages, namely C/C++ and Fortran. The arguably low productivity of these languages led to the so-called two-language problem, where dynamic languages such as Python or MATLAB are used for prototyping before the algorithms are ported to high-performance languages. The Julia programming language aims to offer the productivity and other advantages of such dynamic languages without sacrificing the performance provided by their static counterparts. The combination of high performance, productivity and other powerful tools, such as advanced meta-programming (i.e. code generation), makes Julia a suitable candidate for the next generation of HPC-ready scientific software.

We introduce the open-source, Julia-written package JustRelax.jl (https://github.com/PTsolvers/JustRelax.jl) as a way forward for the next generation of geodynamic codes. JustRelax.jl is a production-ready API for a collection of highly efficient numerical solvers (Stokes, diffusion, etc.) based on the embarrassingly parallel pseudo-transient method. We rely on ParallelStencil.jl (https://github.com/omlins/ParallelStencil.jl), which leverages the advanced meta-programming capabilities of Julia to generate efficient computational kernels agnostic to the back-end system, i.e. Central Processing Unit (CPU) or Graphics Processing Unit (GPU). Using ImplicitGlobalGrid.jl (https://github.com/eth-cscs/ImplicitGlobalGrid.jl) to handle MPI and CUDA-aware MPI communication, these computational kernels run seamlessly on local shared-memory workstations and on distributed-memory, multi-GPU HPC systems with little effort for the front-end user.
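As a reminder of the underlying idea (standard formulation, not specific to JustRelax.jl): the pseudo-transient method augments a steady problem with a suitably defined residual R(u) = 0 by a pseudo-time derivative,

\[
\frac{\partial u}{\partial \tau} = R(u), \qquad u^{k+1} = u^{k} + \Delta\tau\, R(u^{k}),
\]

and iterates these explicit updates until the residual norm drops below a tolerance. Each update involves only a grid point and its stencil neighbours, which is what makes the iteration embarrassingly parallel and well suited to GPUs.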

Efficient computation of the (local) physical properties of different materials is another critical feature required in geodynamic codes, for which we employ GeoParams.jl (https://github.com/JuliaGeodynamics/GeoParams.jl). This package provides lightweight, optimised, and reproducible computation of different material properties (e.g. advanced rheological laws, density, seismic velocity), amongst other features. GeoParams.jl is also carefully designed to support CPU and GPU devices and to be fully compatible with external packages such as ParallelStencil.jl and existing automatic-differentiation packages.

We finally show high-resolution GPU examples of geodynamic models based on the presented open-source Julia tools.

How to cite: de Montserrat Navarro, A., Kaus, B., Räss, L., Utkin, I., and Tackley, P.: Using Julia for the next generation of HPC-ready software for geodynamic modelling, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-15, https://doi.org/10.5194/egusphere-gc11-solidearth-15, 2023.

P2 | GC11-solidearth-37
Rui M L Ferreira, Daniel Conde, Zixiong Zhao, and Peng Hu

This work addresses current performance limitations by introducing new distributed multi-architecture design approaches for massively parallel hyperbolic solvers. Successful implementations that couple MPI with either OpenMP or CUDA have previously been reported in the literature. We present novel approaches that remain intuitive and compatible with a developer-centered object-oriented programming (OOP) paradigm, coupled with a cache-conscious data layout that is compatible with both structured and unstructured meshes and promotes memory efficiency and quasi-linear scalability.

One of the approaches is based on a unified object-oriented CPU+GPU framework, augmented with an inter-device communication layer, enabling both coarse- and fine-grain parallelism in hyperbolic solvers. The framework is implemented through a combination of three programming models, namely OpenMP, CUDA and MPI. The second approach is also based on a unified object-oriented CPU+GPU framework, augmented with an improved local time-stepping (LTS) algorithm for variable updating. This framework is implemented through a combination of a parallel technology (CUDA Fortran) and a mathematical algorithm (LTS).

The efficiency of these distributed-heterogeneous frameworks is quantified under static and dynamic loads on consumer- and professional-grade CPUs and GPUs. In both approaches, an asynchronous communication scheme is implemented and described, showing very low overheads and nearly linear scalability for multiple device combinations. For simulations (or systems) with non-homogeneous workloads (or devices), the domain decomposition algorithm incorporates a low-frequency load-to-device fitting function to ensure computational balance. Real-world applications to high-resolution shallow-water problems are presented. The proposed implementations show speedups of up to two orders of magnitude, opening new perspectives for shallow-water solvers with demanding requirements.
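An asynchronous scheme of this kind typically follows the pattern sketched below (an illustrative sketch under assumed names, not the authors' framework): halo exchanges are posted as non-blocking operations, interior cells are updated while messages are in flight, and boundary cells are updated only once communication completes.

#include <mpi.h>
#include <vector>

/* Hypothetical kernels standing in for the real flux/update routines. */
void update_interior(std::vector<double> &u) { /* ... */ }
void update_boundary(std::vector<double> &u, double halo_left, double halo_right) { /* ... */ }

void exchange_and_update(std::vector<double> &u, int left, int right, MPI_Comm comm)
{
  double send_left = u.front(), send_right = u.back();   /* 1-cell halo, 1D slice for brevity */
  double halo_left = 0.0, halo_right = 0.0;
  MPI_Request reqs[4];

  /* Post the halo exchange without blocking. */
  MPI_Irecv(&halo_left,  1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
  MPI_Irecv(&halo_right, 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
  MPI_Isend(&send_left,  1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
  MPI_Isend(&send_right, 1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

  /* Overlap: the interior does not depend on halo data. */
  update_interior(u);

  /* Only the boundary cells have to wait for the messages. */
  MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
  update_boundary(u, halo_left, halo_right);
}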

How to cite: L Ferreira, R. M., Conde, D., Zhao, Z., and Hu, P.: A distributed-heterogeneous framework for explicit hyperbolic solvers of shallow-water equations, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-37, https://doi.org/10.5194/egusphere-gc11-solidearth-37, 2023.

P3 | GC11-solidearth-38
Daniel Caviedes-Voullième, Mario Morales-Hernández, and Ilhan Özgen-Xian

The Simulation EnviRonment for Geomorphology, Hydrodynamics and Ecohydrology in Integrated form (SERGHEI) is a model framework for environmental hydrodynamics, ecohydrology, morphodynamics and, importantly, the interactions and feedbacks among such processes. SERGHEI is designed to be applicable both to geoscientific questions of coupled processes in Earth system science, such as hydrological connectivity and river stability, and to engineering applications such as flooding and transport phenomena.

In this contribution we present the SERGHEI model framework, its modular concept and its performance-portable implementation. We discuss the implementation of SERGHEI, including the specifics of a highly efficient parallel implementation of the numerical scheme (based on augmented Riemann solvers) and how we achieve portability using the Kokkos programming model as an abstraction layer. Our experience with SERGHEI suggests that Kokkos is a robust route towards performance portability, and puts SERGHEI on a realistic path to readiness for the upcoming European exascale systems.
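The abstraction layer works roughly as in the following sketch (illustrative only, with placeholder variable names, not SERGHEI source): the same parallel_for body is compiled for the OpenMP, CUDA, HIP or SYCL back-end chosen when Kokkos is configured, so the solver source itself does not change across machines.

#include <Kokkos_Core.hpp>

int main(int argc, char **argv)
{
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    Kokkos::View<double *> h("h", n);   /* water depth, allocated in the default memory space */
    Kokkos::View<double *> q("q", n);   /* discharge */

    /* One schematic update sweep; the body runs on whatever back-end Kokkos
       was built with, without changes to this source. */
    Kokkos::parallel_for("update", n, KOKKOS_LAMBDA(const int i) {
      q(i) = 0.5 * h(i);                /* placeholder for the real flux/update kernel */
    });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}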

We focus on the SERGHEI-SWE module, which solves the 2D shallow-water equations. We show that this fully operational module is performance-portable across CPUs and GPUs on several TOP500 systems, and present first results on portability across GPU vendors. We discuss its computational performance on benchmark problems and show its scalability into the range of hundreds of scientific-grade GPUs. Additionally, we show first performance results for the upcoming transport module in SERGHEI, and discuss the computational implications and outlook considering the further integration of new modules and solvers in SERGHEI.

How to cite: Caviedes-Voullième, D., Morales-Hernández, M., and Özgen-Xian, I.: Towards exascale shallow-water modelling with SERGHEI model and Kokkos, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-38, https://doi.org/10.5194/egusphere-gc11-solidearth-38, 2023.

P4 | GC11-solidearth-39
Pascal Aellig, Boris Kaus, Albert de Montserrat, Daniel Kiss, and Nicolas Berlie

Large, voluminous volcanic eruptions, so-called caldera-forming eruptions, pose a threat to humankind and to the growing cities located near volcanoes. With today's technology, the underlying processes of large-scale magmatic systems can be modelled to further improve our understanding of all phases of an eruption. In caldera-forming events, the deficit in overpressure and in magma stored within the chamber results in a collapse of the chamber roof and permanently alters the geomorphology of the region. The processes of magma accumulation, the resulting overpressure and the fracturing of the country rock can be modelled using various dynamic magma models. In this study we take a multiphysics approach and couple an open-source thermal-evolution magma intrusion model with a pseudo-transient Stokes solver (PT solver) written in Julia. The model is set up to run in parallel on graphics processing units (GPUs) to maximise its efficiency and applicability to the newest generation of high-performance computing (HPC) machines. The coupling enables us to model the growth of the magmatic system while also accounting for different complexities in rheology. The model provides an indication of the long-term magmatic evolution, both thermal and volumetric, during the build-up stage prior to caldera-forming eruptions.

How to cite: Aellig, P., Kaus, B., de Montserrat, A., Kiss, D., and Berlie, N.: Modelling the accumulation of magma prior to the caldera collapse, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-39, https://doi.org/10.5194/egusphere-gc11-solidearth-39, 2023.

P5 | GC11-solidearth-67
Kim Olsen, Daniel Roten, Te-Yang Yeh, Ke Xu, and Yifeng Cui

AWP-ODC is an open-source dynamic rupture and wave propagation code which solves the 3D velocity-stress wave equation explicitly by a staggered-grid finite-difference method with fourth-order accuracy in space and second-order accuracy in time. The code is memory-bandwidth bound, with excellent scalability up to full machine scale on CPUs and GPUs, tuned on CLX, with support for generating vector-folded finite-difference stencils using intrinsic functions. AWP-ODC includes frequency-dependent anelastic attenuation Q(f), small-scale media heterogeneities, support for topography, Drucker-Prager visco-plasticity, and a multi-yield-surface, hysteretic (Iwan) nonlinear model using an overlay concept. Support for a discontinuous mesh is available for increased efficiency. An important application of AWP-ODC is the CyberShake Strain-Green-Tensor (SGT) code used for probabilistic seismic hazard analysis in California and other regions.
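For readers unfamiliar with the discretization, a velocity-stress scheme of this kind combines a fourth-order staggered-grid spatial derivative (standard coefficients 9/8 and 1/24) with a second-order leapfrog time update, roughly as in the following sketch (illustrative only, not AWP-ODC source):

#include <vector>
#include <cstddef>

/* Fourth-order staggered-grid d/dx of a field f, evaluated half a cell to the
   right of index i. */
inline double dxf4(const std::vector<double> &f, std::size_t i, double dx)
{
  const double c1 = 9.0 / 8.0, c2 = 1.0 / 24.0;
  return (c1 * (f[i + 1] - f[i]) - c2 * (f[i + 2] - f[i - 1])) / dx;
}

/* Second-order-in-time (leapfrog) update of a velocity component from the
   stress gradient, shown for a 1D slice for brevity. */
void update_vx(std::vector<double> &vx, const std::vector<double> &sxx,
               double dt, double dx, double rho)
{
  for (std::size_t i = 2; i + 2 < sxx.size(); ++i)
    vx[i] += (dt / rho) * dxf4(sxx, i, dx);
}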

Here, we summarize the implementation and verification of some of the widely used capabilities of AWP-ODC, as well as validation against strong-motion data and recent applications to future earthquake scenarios. We show, for a M7.8 dynamic rupture ShakeOut scenario on the southern San Andreas fault, that while simulations with a single yield surface reduce long-period ground-motion amplitudes by about 25% inside a wave guide in greater Los Angeles, multi-surface Iwan nonlinearity further reduces the values by a factor of two. In addition, we show the assembly and calibration of a 3D Community Velocity Model (CVM) for central and southern Chile as well as Peru. The CVM is validated for the 2010 M8.8 Maule, Chile, earthquake up to 5 Hz, and the validated CVM is used for scenario simulations of megathrust events with magnitudes up to M9.5 in the Chile-Peru subduction zone for risk assessment. Finally, we show simulations of 0–3 Hz 3D wave propagation for the 2019 Mw 7.1 Ridgecrest earthquake, including a data-constrained high-resolution fault-zone model. Our results show that the heterogeneous near-fault low-velocity zone inherent to the fault-zone structure significantly perturbs the predicted wave field in the near-source region, in particular by more accurately generating Love waves at its boundaries, in better agreement with observations, including at distances of 200+ km in Los Angeles.

How to cite: Olsen, K., Roten, D., Yeh, T.-Y., Xu, K., and Cui, Y.: AWP-ODC: A Highly Scalable HPC Tool for Dynamic Rupture and Wave Propagation Simulations, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-67, https://doi.org/10.5194/egusphere-gc11-solidearth-67, 2023.

P6 | GC11-solidearth-69
Casper Pranger, Dave A. May, and Alice-Agnes Gabriel

Seismic cycle modeling has advanced to the point where 3D models can be connected to geodetic and seismic observations to test hypotheses about the controls on the depth and along-strike extent of ruptures, and about interactions between seismic and aseismic slip events in geometrically complex fault systems. Such an undertaking does, however, require a greater degree of geometrical flexibility, material behavior, code performance, and community participation than has so far been the standard.

In this light we present tandem, an open-source C++ code for 3D earthquake sequence modeling (Uphoff et al., 2022a). Tandem solves the elasticity problem with a discontinuous Galerkin finite element method on unstructured tetrahedral meshes. It can handle multiple, nonplanar faults and spatially variable elastic properties (Figure 1). The method can be extended to nonlinear off-fault material response (e.g., power-law viscoelasticity). The code is parallelized with MPI and uses the PETSc-TAO library (Balay et al., 2022) for time integrators and preconditioned Krylov methods to solve the static elasticity problem. Faults are governed by rate-and-state friction, and adaptive time-stepping permits modeling of dynamic rupture, the postseismic period, and interseismic loading, all across multiple earthquake cycles. The code is developed with best practices for open-source community software and includes documentation and tutorials to facilitate use by the community (github.com/TEAR-ERC/tandem).
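For reference, rate-and-state friction is classically written (standard notation, assumed here rather than quoted from the code) as

\[
\mu(V,\theta) = \mu_0 + a \ln\frac{V}{V_0} + b \ln\frac{V_0\,\theta}{D_c},
\qquad
\frac{\mathrm{d}\theta}{\mathrm{d}t} = 1 - \frac{V\theta}{D_c}\ \text{(aging law)},
\]

where V is the slip rate, \(\theta\) a state variable, and a, b and \(D_c\) laboratory-derived parameters; the widely varying time scales of this system are what make adaptive time-stepping essential across interseismic loading, rupture and postseismic phases.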

Uphoff, C., D. A. May, and A.-A. Gabriel (2022a), A discontinuous Galerkin method for sequences of earthquakes and aseismic slip on multiple faults using unstructured curvilinear grids, Geophysical Journal International.
Balay, S., et al. (2022), PETSc/TAO users manual, Tech. rep., Argonne National Lab., Argonne, IL, USA (petsc.org/).

How to cite: Pranger, C., May, D. A., and Gabriel, A.-A.: Tandem: A discontinuous Galerkin method for sequences of earthquakes and aseismic slip on multiple faults using unstructured curvilinear grids, Galileo Conference: Solid Earth and Geohazards in the Exascale Era, Barcelona, Spain, 23–26 May 2023, GC11-solidearth-69, https://doi.org/10.5194/egusphere-gc11-solidearth-69, 2023.