Part 2 CPU: The parallel workhorse

 

Today, every developer should understand the growing parallelism available within modern CPUs. Unlocking the untapped performance of CPUs is a critical skill for parallel and high performance computing applications. To show how to take advantage of CPU parallelism, we cover:

  • Using vector hardware
  • Using threads for parallel work across multi-core processors
  • Coordinating work on multiple CPUs and multi-core processors with message passing

The CPU’s parallel capabilities need to be at the core of your parallel strategy. As the central workhorse, the CPU controls all memory allocation, memory movement, and communication. The application developer’s knowledge and skill are the most important factors in fully using the CPU’s parallelism; no magic compiler will do this optimization automatically. In practice, applications commonly leave many of the CPU’s parallel resources untapped. We can break down the available CPU parallelism into three components, listed in order of increasing effort:

  • Vectorization—Exploits the specialized hardware that can do more than one operation at a time
  • Multi-core and threading—Spreads out work across the many processing cores in today’s CPUs
  • Distributed memory—Harnesses multiple nodes into a single, cooperative computing application
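As a rough illustration of how the first two components can apply to the same loop, here is a minimal sketch. It assumes a C99 compiler with OpenMP support (for example, GCC invoked with -fopenmp). The function name `triad` and the loop itself are hypothetical examples, not code from this part. The OpenMP `parallel for simd` construct asks the compiler to split the iterations across threads (multi-core level) and to vectorize each thread's chunk (vector level). The third component, distributed memory, would additionally divide the arrays across processes that communicate through a message-passing library such as MPI, and is not shown here.

```c
#include <stddef.h>

/* Hypothetical helper: computes a[i] = b[i] + scalar * c[i] for i in [0, n).
 * "parallel for" spreads the iterations across threads on multiple cores;
 * "simd" asks the compiler to vectorize each thread's share of the loop.
 * If the code is built without OpenMP (e.g., no -fopenmp), the pragma is
 * ignored and the loop runs serially, producing identical results. */
void triad(long n, double *restrict a, const double *restrict b,
           const double *restrict c, double scalar)
{
#pragma omp parallel for simd
    for (long i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

Because every iteration is independent, the loop needs no synchronization between threads. The `restrict` qualifiers tell the compiler that the arrays do not overlap, which removes one common obstacle to vectorization.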