- 1Forschungszentrum Jülich GmbH, Jülich, Germany (j.chew@fz-juelich.de)
- 2Department of Physical Geography, Faculty of Geosciences, Utrecht University, Utrecht, Netherlands (k.dejong1@uu.nl)
Forward simulation of geographical systems typically iterates over time steps, reading, computing, and writing temporal states until the target end time is reached. As the spatial resolution of geographical data is refined to achieve higher-accuracy simulations, the number of read, compute, and write operations within each time step grows accordingly. Continental- or global-scale simulations can only be completed within a reasonable time if the data are distributed over multiple supercomputer nodes, in conjunction with parallel execution of the operations within each time step.
The LUE framework is designed as a general software platform that enables scientists to define custom computational models and achieve scalable performance on large-scale computing environments. Previous work on parallel implementations of compute operations has demonstrated good scaling behaviour [1,2]. LUE achieves this by asynchronously distributing small subsets of the global geographical dataset to available CPU threads across multiple supercomputer nodes, each subset carrying its own set of compute operations to be executed. The asynchronicity of the workload queueing allows a large number of subsets to be processed in parallel and ensures full occupancy of all available compute resources.
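The partition-wise asynchronous execution described above can be illustrated with a minimal sketch. This is not LUE's actual API (LUE is implemented in C++ on top of an asynchronous many-task runtime); the function names `split_into_partitions`, `compute`, and `run_time_step` are hypothetical, and a plain thread pool stands in for the task scheduler. The key idea it demonstrates is that every partition's work is submitted immediately, so the queue keeps all workers occupied:

```python
# Illustrative sketch only, not LUE's API: split a global raster into
# partitions and submit each partition's compute task asynchronously.
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np

def split_into_partitions(raster, tile):
    """Yield (row, col, view) for each tile-sized partition of the raster."""
    rows, cols = raster.shape
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            yield r, c, raster[r:r + tile, c:c + tile]

def compute(partition):
    """Stand-in for a per-partition model operation."""
    return partition * 2.0

def run_time_step(raster, tile=256, workers=4):
    result = np.empty_like(raster)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every partition's work up front; the pool keeps all
        # workers busy as long as partitions remain in the queue.
        futures = {pool.submit(compute, part): (r, c)
                   for r, c, part in split_into_partitions(raster, tile)}
        for fut in as_completed(futures):
            r, c = futures[fut]
            out = fut.result()
            result[r:r + out.shape[0], c:c + out.shape[1]] = out
    return result
```

In the real framework, partitions additionally flow between operations as futures, so downstream tasks can start as soon as their input partitions are ready.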
This advancement, however, inadvertently highlighted the inefficiency of serial handling of read/write operations, also known as input/output (I/O) operations. Just as scalable computation requires parallel algorithms, scalable I/O requires parallel I/O libraries that distribute the I/O workload over multiple I/O-specific compute nodes. However, combining parallel I/O with asynchronously spawned computations, while ensuring that the resulting file output is correct, is challenging.
The challenge originates from the complexity of ensuring that data in memory are synced to the file storage system while that system is being accessed by all participating CPU threads. Oftentimes, careless management of I/O results in unintended overwriting of file content due to concurrent accesses. This highlights the added difficulty of parallelizing file access compared to in-memory operations such as computations. Much care is therefore needed in the design of file access and synchronisation patterns to obtain meaningful gains in parallel I/O performance within an asynchronous many-task execution.
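One common way to avoid the concurrent-overwrite hazard described above is to give each writer a disjoint byte range of the shared file. The following sketch, under the assumption of POSIX positional I/O (`os.pwrite`, which takes an explicit offset and never touches a shared file position), shows the idea; `parallel_write` and `write_partition` are hypothetical names, not part of any parallel I/O library:

```python
# Illustrative sketch of disjoint-region parallel writes (POSIX only):
# each worker writes its partition to its own byte range, so concurrent
# writers cannot clobber each other's output.
import os
from concurrent.futures import ThreadPoolExecutor

def write_partition(fd, index, payload, partition_size):
    # pwrite takes an explicit offset, so no shared file position is
    # mutated by concurrent writers.
    os.pwrite(fd, payload, index * partition_size)

def parallel_write(path, partitions):
    size = len(partitions[0])
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        # Pre-size the file so every worker's offset is valid.
        os.ftruncate(fd, size * len(partitions))
        with ThreadPoolExecutor(max_workers=4) as pool:
            for i, part in enumerate(partitions):
                pool.submit(write_partition, fd, i, part, size)
        # Leaving the "with" block waits for all writes to finish.
    finally:
        os.close(fd)
```

Production parallel I/O libraries generalize this pattern to multiple nodes, adding collective operations and metadata synchronisation on top of the same disjoint-access principle.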
In this work, we attempt to implement a parallel read/write access pattern that works well with the asynchronous parallel compute paradigm deployed within the LUE modelling framework. Integrating parallel I/O into an asynchronous execution brings the additional benefit of interleaved compute and I/O tasks: part of the I/O latency can be hidden by concurrent compute workloads, which is harder to realize in a synchronous parallel execution. Success of this work will enable scalable compute and parallel file access for geoscience simulation workloads carried out via the LUE framework, reducing the overall computational resource consumption of large-scale simulations.
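The latency-hiding effect of interleaving can be sketched as a simple software pipeline: while partition i is being processed, the read of partition i+1 is already in flight on another thread. This is an illustrative toy, not the LUE implementation; `pipeline` and its `read`/`compute`/`write` callables are hypothetical:

```python
# Illustrative sketch of compute/I/O interleaving: reads are prefetched
# and writes run asynchronously, so I/O overlaps with computation.
from concurrent.futures import ThreadPoolExecutor

def pipeline(read, compute, write, n_partitions):
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending_writes = []
        next_read = pool.submit(read, 0)           # prefetch first partition
        for i in range(n_partitions):
            data = next_read.result()              # blocks only if the read lags
            if i + 1 < n_partitions:
                next_read = pool.submit(read, i + 1)  # overlap the next read
            # Compute on the current partition while the next read proceeds,
            # then hand the result to an asynchronous write.
            pending_writes.append(pool.submit(write, i, compute(data)))
        for fut in pending_writes:
            fut.result()                           # drain outstanding writes
```

In a synchronous execution, each read would have to finish before its compute starts and each write before the next read, serializing exactly the latencies this pattern hides.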
References:
1. https://doi.org/10.1016/j.cageo.2022.105083
2. https://doi.org/10.1016/j.envsoft.2021.104998
How to cite: Chew, J. and de Jong, K.: Parallel file access: the missing piece in efficient large scale geosimulation, EGU General Assembly 2026, Vienna, Austria, 3–8 May 2026, EGU26-5035, https://doi.org/10.5194/egusphere-egu26-5035, 2026.