
7 OpenMP that performs

 

This chapter covers

  • Planning and designing a correct and performant OpenMP program
  • Quickly writing loop-level OpenMP for modest parallelism
  • Detecting correctness problems and improving robustness
  • Fixing performance issues with OpenMP
  • Writing scalable OpenMP for high performance

7.1   OpenMP introduction

7.1.1   OpenMP concepts

7.1.2   A very simple OpenMP program

7.2   Typical OpenMP use cases: Loop-level, high-level, and MPI+OpenMP

7.2.1   Loop-level OpenMP for quick parallelization

7.2.2   High-level OpenMP for better parallel performance

7.2.3   MPI + OpenMP for extreme scalability

7.3   Examples of standard loop-level OpenMP

7.3.1   Loop-level OpenMP: Vector addition example

7.3.2   Stream triad example

7.3.3   Loop-level OpenMP: Stencil example

7.3.4   Performance of loop-level examples

7.3.5   Reduction example of a global sum using OpenMP threading

7.3.6   Potential loop-level OpenMP issues

7.4   Variable scope is critically important in OpenMP for correctness

7.5   Function-level OpenMP: Making a whole function thread parallel

7.6   Improving parallel scalability with high-level OpenMP

7.6.1   How to implement high-level OpenMP

7.6.2   Example of implementing high-level OpenMP

7.7   Hybrid threading and vectorization with OpenMP

7.8   Advanced examples using OpenMP

7.8.1   Stencil example with a separate pass for the x and y directions

7.8.2   Kahan summation implementation with OpenMP threading

7.8.3   Threaded implementation of the prefix scan algorithm

7.9   Threading tools essential for robust implementations

7.9.1   Using Allinea MAP to get a quick high-level profile of your application

7.9.2   Finding thread race conditions with Intel Inspector

7.10   Example of a task-based support algorithm

7.11   Further explorations

7.11.1   Additional reading

7.11.2   Exercises

7.12   Summary