chapter five

5 Re-implementing critical code with Cython

This chapter covers

How to re-implement Python code more efficiently
Understanding Cython from a data processing perspective
Profiling Cython code
Using Cython to implement performant NumPy functions
Releasing the GIL to implement true threaded parallelism

Python is slow. The standard implementation, CPython, is slow, and the language’s dynamic features come at a performance cost. Many Python libraries are performant precisely because they are partially implemented in lower-level languages, which makes efficient data processing algorithms available to Python code. But sometimes we will need to implement our own high-performance algorithms in something faster than Python. In this chapter, we will consider Cython, a superset of Python that is translated to C and can be substantially more performant than pure Python.
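To make the claim about lower-level implementations concrete, here is a minimal sketch that compares a hand-written Python loop with the built-in `sum`, which is implemented in C in CPython. The function name `python_sum` and the input size are illustrative assumptions, not part of the chapter's code, and the timings will vary with the machine and interpreter version.

```python
import timeit


def python_sum(values):
    """Sum numbers with an explicit Python-level loop (hypothetical helper)."""
    total = 0
    for value in values:
        total += value
    return total


# Illustrative input size; chosen only so the timing is measurable.
data = list(range(100_000))

# Both approaches compute the same result.
assert python_sum(data) == sum(data)

# Time each approach over the same number of repetitions.
loop_time = timeit.timeit(lambda: python_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"Python loop:  {loop_time:.4f} s")
print(f"built-in sum: {builtin_time:.4f} s")
```

On a typical CPython installation the built-in version is expected to be noticeably faster, because the loop runs in compiled C code rather than in the interpreter.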

There are plenty of alternatives to Cython that can be integrated with Python for performance, so we will start with a brief overview of the available options. After that, we will delve into Cython itself.

5.1 Overview of techniques for efficient code re-implementation

5.2 A whirlwind tour of Cython

5.2.1 A naive implementation in Cython

5.2.2 Using Cython annotations to increase performance

5.2.3 Why annotations are fundamental to performance

5.2.4 Adding typing to function returns

5.3 Profiling Cython code

5.3.1 Using Python’s built-in profiling infrastructure

5.3.2 Using line_profiler

5.4 Optimizing array access with Cython memoryviews

5.4.1 The takeaway

5.4.2 Cleaning up all internal interactions with Python

5.5 Writing NumPy generalized universal functions in Cython

5.5.1 The takeaway

5.6 Advanced array access in Cython

5.6.1 Bypassing the GIL’s limitation on running multiple threads at a time