- The planning and design of a correct and performant OpenMP program
- How to quickly write loop-level OpenMP for modest parallelism
- Detecting correctness problems and improving robustness
- Fixing performance issues with OpenMP
- How to write scalable OpenMP for high performance
7.1 OpenMP introduction
7.1.1 OpenMP concepts
7.1.2 A very simple OpenMP program
7.2 Typical OpenMP use cases: Loop-level, high-level, and MPI + OpenMP
7.2.1 Loop-level OpenMP for quick parallelization
7.2.2 High-level OpenMP for better parallel performance
7.2.3 MPI + OpenMP for extreme scalability
7.3 Examples of standard loop-level OpenMP
7.3.1 Loop-level OpenMP: Vector addition example
7.3.2 Stream triad example
7.3.3 Loop-level OpenMP: Stencil example
7.3.4 Performance of loop-level examples
7.3.5 Reduction example of a global sum using OpenMP threading
7.3.6 Potential loop-level OpenMP issues
7.4 Variable scope is critically important in OpenMP for correctness
7.5 Function-level OpenMP: Making a whole function thread parallel
7.6 Improving parallel scalability with high-level OpenMP
7.6.1 How to implement high-level OpenMP
7.6.2 Example of implementing high-level OpenMP
7.7 Hybrid threading and vectorization with OpenMP
7.8 Advanced examples using OpenMP
7.8.1 Stencil example with a separate pass for the x and y directions
7.8.2 Kahan summation implementation with OpenMP threading
7.8.3 Threaded implementation of the prefix scan algorithm
7.9 Threading tools essential for robust implementations
7.9.1 Using Allinea MAP to get a quick high-level profile of your application
7.9.2 Finding your thread race conditions with Intel Inspector
7.10 Example of a task-based support algorithm
7.11 Further explorations
7.11.1 Additional reading
7.11.2 Exercises
7.12 Summary