As many-core architectures grow in size and popularity, the details of thread-level parallelism become a critical factor in software performance. In this chapter, we first introduce the basics of Open Multi-Processing (OpenMP), a shared-memory programming standard, and explain why a fundamental understanding of how OpenMP works is important. We then work through sample problems ranging in difficulty from a simple “Hello World” program to a split-direction stencil implementation parallelized with OpenMP. Along the way, we analyze how OpenMP directives interact with the underlying operating system kernel, the memory hierarchy, and hardware features. Finally, we investigate a promising high-level approach to OpenMP programming for future extreme-scale applications and show that high-level OpenMP is efficient for algorithms containing many short loops of computational work.
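
As a first taste of the kind of example the chapter starts from, a minimal OpenMP “Hello World” in C might look like the following sketch; the file name hello_openmp.c and the exact output format are our own illustration rather than the chapter’s listing.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Each thread in the team created by the parallel
           region executes this block once. */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();      /* this thread's ID */
            int nthreads = omp_get_num_threads(); /* size of the team */
            printf("Hello from thread %d of %d\n", tid, nthreads);
        }
        return 0;
    }

With GCC or Clang this would typically be built with the -fopenmp flag (for example, cc -fopenmp hello_openmp.c), and the number of threads can be controlled at run time through the OMP_NUM_THREADS environment variable.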