2 Extracting performance from built-in features

This chapter covers:

Profiling code to find speed and memory bottlenecks
Making more efficient use of existing Python data structures
Understanding the memory cost of allocating typical Python data structures
Using lazy programming techniques to process large amounts of data

There are many tools and libraries to help us write more efficient Python. But before we dive into the external options for improving performance, let’s take a closer look at how we can write pure Python code that is more efficient in both computation and I/O. Indeed, many Python performance problems, though certainly not all, can be solved by being more mindful of Python’s limits and capabilities.

2.1 Introducing the project dataset

2.1.1 An architecture for big data processing

2.1.2 Preparing the data

2.2 Profiling code to detect performance bottlenecks

2.2.1 Using Python’s built-in profiling module

2.2.2 Visualizing profiling information

2.2.3 Line profiling

2.3 Optimizing basic data structures for speed: lists, sets, dictionaries

2.3.1 Performance of list searches

2.3.2 Using the `bisect` module

2.3.3 Content-aware search approaches

2.3.4 Searching using sets or dictionaries

2.3.5 List complexity in Python

2.4 Finding excessive memory allocation

2.4.1 Navigating the minefield of Python memory estimation

2.4.2 Using more compact representations

2.4.3 Packing many observations in a number

2.4.4 Using the `array` module

2.4.5 Systematizing what we have learned: Estimating the memory usage of Python objects

2.5 Using laziness and generators for big-data pipelining