EGU24-13522, updated on 09 Mar 2024
https://doi.org/10.5194/egusphere-egu24-13522
EGU General Assembly 2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Developing cutting-edge geophysical data and software infrastructure to future-proof national scale geophysical assets for 2030 computation

Nigel Rees1, Lesley Wyborn1,2,3, Rui Yang1, Jo Croucher1, Hannes Hollmann1, Rebecca Farrington2, Yue Sun1, Yiling Liu1, and Ben Evans1
  • 1National Computational Infrastructure, Australian National University, Canberra, Australia
  • 2AuScope, Wurundjeri Country, Melbourne, Australia
  • 3Australian Research Data Commons, Australian National University, Canberra, Australia

The 2030 Geophysics Collections Project was a collaborative effort between the National Computational Infrastructure (NCI), AuScope, the Terrestrial Ecosystem Research Network (TERN) and the Australian Research Data Commons (ARDC) that aimed to create a nationally transparent, online geophysics data environment suitable for programmatic access on High Performance Computing (HPC) at the NCI. Key focus areas included the publication of internationally standardised geophysical data on NCI’s Gadi Tier 1 research supercomputer, and the development of specialised geophysics and AI-ML software environments that allow for efficient multi-physics processing, modeling and analysis at scale on HPC systems.

Raw and high-resolution versions of AuScope-funded Magnetotelluric (MT), Passive Seismic (PS) and Distributed Acoustic Sensing (DAS) datasets are now accessible on HPC, along with selected higher-level data products. These datasets have been structured to enable horizontal integration, allowing disparate datasets to be accessed in real time as online web services from other repositories. Additionally, vertical integration has been established for MT data, linking the source field-acquired datasets with derivative processed data products at the NCI repository, as well as with other derivative data products hosted by external data portals.

To support next-generation geophysical research at scale, these valuable datasets and accompanying metadata need to be captured in machine-readable formats and leverage international standards, vocabularies and identifiers. For MT, automations were developed that generate different MT processing levels at scale in internationally compliant, high-performance data and metadata standards. By parallelising these automated processes across HPC clusters, the different processing levels for an entire geophysical survey can be generated in a matter of minutes.
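The parallelisation described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions rather than the project's actual workflow: `process_station`, `process_survey` and the station identifiers are hypothetical names, the per-station function only returns a placeholder record, and a single-node process pool stands in for a multi-node HPC job.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_station(station_id: str) -> dict:
    """Hypothetical stand-in for one station's processing chain.

    A real workflow would read the raw time series for the station and
    write a processed product; here we only return a placeholder record.
    """
    return {"station": station_id, "status": "processed"}


def process_survey(station_ids, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Apply process_station to every station of a survey in parallel.

    executor_cls is an assumption of this sketch: it lets the same code run
    with processes (CPU-bound work) or threads (I/O-bound work or testing).
    Results are returned in the same order as station_ids.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(process_station, station_ids))


if __name__ == "__main__":
    # Hypothetical station identifiers, for illustration only.
    for record in process_survey(["ST001", "ST002", "ST003"]):
        print(record)
```

Because each station is processed independently, the work is embarrassingly parallel, which is why adding workers (or nodes) shortens the wall-clock time for a whole survey.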

In parallel with these data enhancements, the NCI-geophysics software environment was developed, which compiled and containerised a wide range of geophysical and data-science-related packages in Python, Julia and R. In addition, the NCI-AI-ML environment bundled together popular machine learning and data science packages and configured them for HPC GPU architectures. Standalone open-source geophysical applications that support parallel computation have also been added to NCI’s Gadi supercomputer.

The 2030 Geophysics Collections Project has made the first strides towards enabling a new era in Australian geophysical research, opening up the potential for rapid multi-physics geophysical analysis at scale with the computational tools available within the NCI. By establishing and continuing to build on this geophysical infrastructure, the nation will be better equipped to address the various geophysical challenges and opportunities in the decades ahead.

How to cite: Rees, N., Wyborn, L., Yang, R., Croucher, J., Hollmann, H., Farrington, R., Sun, Y., Liu, Y., and Evans, B.: Developing cutting-edge geophysical data and software infrastructure to future-proof national scale geophysical assets for 2030 computation, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-13522, https://doi.org/10.5194/egusphere-egu24-13522, 2024.