11 Directive-based GPU programming

This chapter covers

Selecting the best directive-based language for your GPU
Using directives or pragmas to port your code to GPUs or other accelerator devices
Optimizing the performance of your GPU application

There has been a scramble to establish standards for directive-based languages for programming GPUs. The pre-eminent directive-based language, OpenMP, released in 1997, was the natural candidate to look to as an easier way to program GPUs. At the time GPU programming emerged, though, OpenMP was playing catch-up and focused mainly on new CPU capabilities. To address GPU accessibility, a small group of compiler vendors (Cray, PGI, and CAPS), along with NVIDIA as the GPU vendor, joined in 2011 to release the OpenACC standard, providing a simpler pathway to GPU programming. Like OpenMP, which you saw in chapter 7, OpenACC uses pragmas. In this case, the pragmas direct the compiler to generate GPU code. A couple of years later, the OpenMP Architecture Review Board (ARB) added its own pragma support for GPUs to the OpenMP standard.
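To make the idea of a pragma directing the compiler concrete, here is a minimal sketch of the stream triad loop written once with an OpenACC directive and once with an OpenMP target directive. The function names `stream_triad_acc` and `stream_triad_omp` are hypothetical and chosen only for illustration; they are not taken from the chapter's example repository, and the data clauses shown are one reasonable choice rather than the only one.

```c
/* Sketch only: function names are hypothetical, for illustration. */

/* Stream triad, a[i] = b[i] + scalar * c[i], with an OpenACC directive.
 * The data clauses copy b and c to the device and copy a back. */
void stream_triad_acc(int n, double scalar,
                      double *restrict a,
                      const double *restrict b,
                      const double *restrict c)
{
    #pragma acc parallel loop copyin(b[0:n], c[0:n]) copyout(a[0:n])
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}

/* The same loop with an OpenMP target directive.
 * The map clauses play the role of the OpenACC data clauses. */
void stream_triad_omp(int n, double scalar,
                      double *restrict a,
                      const double *restrict b,
                      const double *restrict c)
{
    #pragma omp target teams distribute parallel for \
        map(to: b[0:n], c[0:n]) map(from: a[0:n])
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

In both versions the loop body is ordinary C; only the pragma changes. A compiler built or invoked without OpenACC or OpenMP offload support typically ignores the pragmas, in which case both functions run as plain serial loops on the CPU and produce the same results.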

We’ll work through some basic examples in OpenACC and OpenMP to give you an idea of how they work. We suggest that you try out the examples on your target system to see which compilers are available and how well they currently support these directives.

Note

As always, we encourage you to follow along with the examples for this chapter at https://github.com/EssentialsofParallelComputing/Chapter11.

11.1 Process to apply directives and pragmas for a GPU implementation

11.2 OpenACC: The easiest way to run on your GPU

11.2.1 Compiling OpenACC code

11.2.2 Parallel compute regions in OpenACC for accelerating computations

11.2.3 Using directives to reduce data movement between the CPU and the GPU

11.2.4 Optimizing the GPU kernels

11.2.5 Summary of performance results for the stream triad

11.2.6 Advanced OpenACC techniques

11.3 OpenMP: The heavyweight champ enters the world of accelerators

11.3.1 Compiling OpenMP code

11.3.2 Generating parallel work on the GPU with OpenMP

11.3.3 Creating data regions to control data movement to the GPU with OpenMP

11.3.4 Optimizing OpenMP for GPUs

11.3.5 Advanced OpenMP for GPUs

11.4 Further explorations