16 File operations for a parallel world

This chapter covers

  • Modifying a parallel application for standard file operations
  • Writing out data using parallel file operations with MPI-IO and HDF5
  • Tuning parallel file operations for different parallel filesystems

Filesystems provide a streamlined way to retrieve, store, and update data. For any computing work, the product is the output, whether that is data, graphics, or statistics. This includes final results, but also intermediate output for graphics, checkpointing, and analysis. Checkpointing is a particular need on large HPC systems, where long-running calculations might span days, weeks, or months.

Definition

Checkpointing is the practice of periodically storing the state of a calculation to disk so that the calculation can be restarted in the event of system failures or because of finite-length run times in a batch system.
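To make the definition concrete, here is a minimal serial sketch of checkpoint and restart. It is an illustration under stated assumptions, not how any particular application does it: the state dictionary, the toy update rule, the `stop_after` argument (which stands in for a batch allocation running out), and the function names are all hypothetical. The checkpoint is written to a temporary file and then renamed, so a failure in the middle of a write should not destroy the previous checkpoint.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())      # ask the OS to push the data to disk
    os.replace(tmp_path, path)    # replaces the old checkpoint in one step


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Toy calculation that resumes from the last checkpoint if one exists.

    stop_after is a hypothetical knob that simulates the job being cut off.
    """
    state = load_checkpoint(path) or {"step": 0, "value": 0.0}
    while state["step"] < total_steps:
        state["value"] += 0.5 * state["step"]   # stand-in for real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and state["step"] >= stop_after:
            return state                        # simulated interruption
    return state
```

If the first run is cut off between checkpoints, the second run repeats the steps since the last checkpoint and then continues, so the final result matches an uninterrupted run. The checkpoint interval is a trade-off between the cost of writing state and the amount of work lost on a failure.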

Highly parallel applications need a safe and performant way to read and store data at run time, which is why it pays to understand file operations in a parallel world. Some of the concerns you should keep in mind are correctness, reducing duplicate output, and performance.

16.1 The components of a high-performance filesystem

16.2 Standard file operations: A parallel-to-serial interface

16.3 MPI file operations (MPI-IO) for a more parallel world

16.4 HDF5 is self-describing for better data management

16.5 Other parallel file software packages

16.6 Parallel filesystem: The hardware interface

16.6.1 Everything you wanted to know about your parallel file setup but didn’t know how to ask

16.6.2 General hints that apply to all filesystems

16.6.3 Hints specific to particular filesystems

16.7 Further explorations

16.7.1 Additional reading

16.7.2 Exercises

Summary