This chapter covers two intimately coupled topics: (1) performance models that are increasingly dominated by data movement and, as a direct consequence, (2) the underlying design and structure of data. Although it may seem secondary to performance, the design of the data structure is critical. It must be settled in advance because it dictates the entire form of the algorithms, the code, and, later, the parallel implementation.
The choice of data structures, and thereby the data layout, often determines the performance you can achieve, in ways that are not always obvious when the design decisions are made. Thinking about data layout and its performance impact is at the core of a new and growing programming approach called data-oriented design. This approach examines the patterns in which the program will use its data and proactively designs around them. Data-oriented design gives us a data-centric view of the world, which is consistent with our focus on memory bandwidth rather than floating-point operations (flops). In summary, for performance, our approach is to think about