3 Performance limits and profiling

 

This chapter covers:

  • How to identify what limits your application’s performance: flops, memory bandwidth, or reading data from disk.
  • How to evaluate hardware performance against the target of your next set of changes. For example, if you plan to add vectorization to the code, a vectorized benchmark and the theoretical performance for vectorized code are useful references.
  • How to measure the current performance of your application.

Programmer resources are scarce, so you need to direct them to where they have the most impact. How do you do this if you don’t know the performance characteristics of your application and of the hardware you plan to run on? This chapter addresses that question. By measuring the performance of your hardware and your application, you can determine where your development time is spent most effectively.
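As a rough illustration of how hardware and application measurements combine, the sketch below compares a kernel's flops per word loaded against the machine balance (peak flops divided by peak memory bandwidth). It is only a minimal sketch: the function names `machine_balance` and `likely_limit` and the hardware figures are hypothetical, chosen for illustration, and the comparison ignores disk I/O and cache effects.

```python
def machine_balance(peak_flops: float, peak_bandwidth_bytes: float,
                    word_bytes: int = 8) -> float:
    """Return flops the machine can perform per word loaded from memory.

    peak_flops           -- peak floating-point rate, in flops per second
    peak_bandwidth_bytes -- peak main-memory bandwidth, in bytes per second
    word_bytes           -- size of one word (8 for double precision)
    """
    peak_words_per_second = peak_bandwidth_bytes / word_bytes
    return peak_flops / peak_words_per_second


def likely_limit(flops_per_word: float, balance: float) -> str:
    """Guess the limiting resource for a kernel.

    A kernel that does fewer flops per word than the machine balance
    cannot keep the floating-point units busy, so memory bandwidth is
    the probable limit; otherwise flops are.
    """
    return "memory bandwidth" if flops_per_word < balance else "flops"


if __name__ == "__main__":
    # Hypothetical hardware: 200 Gflops/s peak, 25 GB/s memory bandwidth.
    balance = machine_balance(peak_flops=200e9, peak_bandwidth_bytes=25e9)
    print(f"machine balance: {balance:.0f} flops/word")

    # Example kernel a[i] = b[i] + s * c[i]:
    # 2 flops per iteration, 3 words moved (load b, load c, store a).
    print("likely limit:", likely_limit(2 / 3, balance))
```

With numbers like these, a simple streaming loop falls far below the machine balance, which suggests that effort spent on reducing memory traffic would pay off more than effort spent on raw floating-point throughput.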

3.1       Know your application’s potential performance limits

3.2       Determine your hardware capabilities: benchmarking

3.2.1   Tools for gathering system characteristics

3.2.2   Calculating theoretical maximum flops

3.2.3   The memory hierarchy and theoretical memory bandwidth

3.2.4   Empirical measurement of bandwidth and flops

3.2.5   Calculating the machine balance between flops and bandwidth

3.3       Characterizing your application: profiling

3.3.1   Profiling tools

3.3.2   Empirical measurement of processor clock frequency and energy consumption

3.3.3   Tracking memory during runtime

3.5       Summary
