7 OpenMP that performs

 

This chapter covers

  • Planning and designing a correct and performant OpenMP program
  • Writing loop-level OpenMP for modest parallelism
  • Detecting correctness problems and improving robustness
  • Fixing performance issues with OpenMP
  • Writing scalable OpenMP for high performance

As many-core architectures grow in size and popularity, the details of thread-level parallelism become a critical factor in software performance. In this chapter, we first introduce the basics of Open Multi-Processing (OpenMP), a shared memory programming standard, and explain why a fundamental understanding of how OpenMP functions is important. We then look at sample problems ranging in difficulty from a simple “Hello World” example to a complex split-direction stencil implementation with OpenMP parallelization. We thoroughly analyze the interaction between OpenMP directives and the underlying OS kernel, as well as the memory hierarchy and hardware features. Finally, we investigate a promising high-level approach to OpenMP programming for future extreme-scale applications and show that high-level OpenMP is efficient for algorithms containing many short loops of computational work.

7.1 OpenMP introduction

 
 

7.1.1 OpenMP concepts

 
 
 

7.1.2 A simple OpenMP program

 
 

7.2 Typical OpenMP use cases: Loop-level, high-level, and MPI plus OpenMP

 
 
 
 

7.2.1 Loop-level OpenMP for quick parallelization

 
 
 

7.2.2 High-level OpenMP for better parallel performance

 
 

7.2.3 MPI plus OpenMP for extreme scalability

 
 

7.3 Examples of standard loop-level OpenMP

 
 
 

7.3.1 Loop-level OpenMP: Vector addition example

 
 

7.3.2 Stream triad example

 

7.3.3 Loop-level OpenMP: Stencil example

 
 

7.3.4 Performance of loop-level examples

 
 
 

7.3.5 Reduction example of a global sum using OpenMP threading

 
 

7.3.6 Potential loop-level OpenMP issues

 
 

7.4 Variable scope importance for correctness in OpenMP

 

7.5 Function-level OpenMP: Making a whole function thread parallel

 
 

7.6 Improving parallel scalability with high-level OpenMP

 

7.6.1 How to implement high-level OpenMP

 
 