3 Performance limits and profiling

This chapter covers

  • Understanding the limiting aspect of application performance
  • Evaluating performance for the limiting hardware components
  • Measuring the current performance of your application

Programmer resources are scarce, so you need to direct them where they will have the most impact. How do you do that if you don’t know the performance characteristics of your application and of the hardware you plan to run on? This chapter addresses that question. By measuring the performance of your hardware and your application, you can determine where your development time is most effectively spent.

Note

We encourage you to follow along with the exercises for this chapter. The exercises can be found at https://github.com/EssentialsofParallelComputing/Chapter3.

3.1 Know your application’s potential performance limits

3.2 Determine your hardware capabilities: Benchmarking

3.2.1 Tools for gathering system characteristics

3.2.2 Calculating theoretical maximum flops

3.2.3 The memory hierarchy and theoretical memory bandwidth

3.2.4 Empirical measurement of bandwidth and flops

3.2.5 Calculating the machine balance between flops and bandwidth

3.3 Characterizing your application: Profiling

3.3.1 Profiling tools

3.3.2 Empirical measurement of processor clock frequency and energy consumption

3.3.3 Tracking memory during run time

3.4 Further explorations

3.4.1 Additional reading

3.4.2 Exercises

Summary