3 Concurrency, parallelism, and asynchronous processing


This chapter covers

  • Using asynchronous processing to design applications with reduced wait times
  • Threading in Python and its limitations for writing parallel applications
  • Writing multiprocessing applications that take full advantage of multicore computers

Modern CPU architectures can execute more than one sequential program at the same time, permitting impressive gains in processing speed. In principle, a program can run faster by a factor of up to the number of parallel processing units (e.g., CPU cores) that are available. The bad news is that to take advantage of this parallel processing power, we need to make our code parallel-aware, and Python is ill-suited for writing parallel solutions. Most Python code is sequential, so it cannot use all the available CPU resources. Furthermore, as we will see, the implementation of the Python interpreter is not tuned for parallel processing. In other words, our usual Python code cannot make use of modern hardware’s capabilities, and on a multicore machine it runs at a fraction of the speed the hardware allows. So we need techniques that help Python use all the available CPU power.
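To make the gap between sequential code and the available cores concrete, here is a minimal sketch (not code from this chapter) that runs the same CPU-bound task first sequentially and then across worker processes with the standard library's `concurrent.futures.ProcessPoolExecutor`. The function names `count_primes`, `run_sequential`, and `run_parallel` are hypothetical, chosen only for illustration. The measured speedup depends on your hardware, and it stays below the core count because of process start-up and data-transfer overhead.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Hypothetical helper: count primes below `limit` by trial division.

    Deliberately CPU-bound so that it keeps one core busy.
    """
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def run_sequential(limits: list[int]) -> list[int]:
    # Plain sequential code: one task after another, on a single core.
    return [count_primes(limit) for limit in limits]


def run_parallel(limits: list[int]) -> list[int]:
    # By default the pool starts roughly one worker process per CPU core.
    # Each process has its own interpreter, so tasks can run simultaneously.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # The __main__ guard is needed because worker processes may re-import
    # this module (e.g., with the "spawn" start method).
    limits = [50_000] * 8
    print(f"CPU cores reported by os.cpu_count(): {os.cpu_count()}")

    start = time.perf_counter()
    sequential = run_sequential(limits)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    parallel = run_parallel(limits)
    print(f"parallel:   {time.perf_counter() - start:.2f}s")

    # Both versions must compute the same answers; only the timing differs.
    assert sequential == parallel
```

On a multicore machine the parallel run should finish noticeably faster, while the sequential run leaves all but one core idle.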

3.1 Writing the scaffold of an asynchronous server

3.1.1 Implementing the scaffold for communicating with clients

3.1.2 Programming with coroutines

3.1.3 Sending complex data from a simple synchronous client

3.1.4 Alternative approaches to interprocess communication

3.1.5 The takeaway: Asynchronous programming

3.2 Implementing a basic MapReduce engine

3.2.1 Understanding MapReduce frameworks

3.2.2 Developing a very simple test scenario

3.2.3 A first attempt at implementing a MapReduce framework

3.3 Implementing a concurrent version of a MapReduce engine

3.3.1 Using concurrent.futures to implement a threaded server

3.3.2 Asynchronous execution with futures

3.3.3 The GIL and multithreading

3.4 Using multiprocessing to implement MapReduce

3.4.1 A solution based on concurrent.futures

3.4.2 A solution based on the multiprocessing module