chapter four

4 High performance NumPy

This chapter covers

Rediscovering NumPy from a performance perspective
Leveraging NumPy views for computing efficiency and memory conservation
Introducing array programming as a paradigm
Configuring NumPy internals for efficiency

It is difficult to overstate the importance of NumPy for data analytics with Python. This book might well be called "High Performance Python with NumPy." NumPy will be found somewhere in your stack: You use Pandas? NumPy. You use scikit-learn? NumPy. Dask? NumPy. SciPy? NumPy. Matplotlib? NumPy. TensorFlow? NumPy. If you are doing data analytics in Python, your stack almost surely includes NumPy.

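NumPy provides N-dimensional array objects (matrices are the familiar two-dimensional case, but arrays can have any number of dimensions) along with functionality to manipulate them. The implementation is extremely efficient: its core is written in C, and it delegates linear algebra to optimized BLAS and LAPACK libraries, which are themselves written in C or Fortran. At their core, almost all data analysis problems can be modeled with N-dimensional arrays (a table of numeric data is a two-dimensional array, a batch of color images a four-dimensional one), which is why NumPy is pervasive in this field.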
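As a minimal illustration of the paragraph above (a sketch, not code from the chapter itself), the example below builds a small two-dimensional array, inspects its shape and dtype, and compares a pure-Python loop with the equivalent vectorized NumPy expression, which runs in NumPy's compiled loops. The helpers `python_sum_of_squares` and `numpy_sum_of_squares` are hypothetical names introduced only for this example, and the timings you see will depend on your machine.

```python
import timeit

import numpy as np


def python_sum_of_squares(values):
    # Pure-Python loop: each element is a Python object and every
    # iteration goes through the interpreter.
    total = 0
    for v in values:
        total += v * v
    return total


def numpy_sum_of_squares(arr):
    # Vectorized expression: the element-wise multiplication and the
    # reduction both run inside NumPy's compiled code.
    return int(np.sum(arr * arr))


# A 2-D array (a 3 x 4 matrix) of 64-bit integers
matrix = np.arange(12, dtype=np.int64).reshape(3, 4)
print(matrix.shape)  # (3, 4)
print(matrix.ndim)   # 2
print(matrix.dtype)  # int64

# Arithmetic applies element-wise to the whole array at once
print(matrix * 2)

# ravel() gives a 1-D version of the data (a view for this contiguous array)
flat = matrix.ravel()
print(numpy_sum_of_squares(flat))  # 506

# Compare both approaches on a larger input; absolute numbers vary by machine
big = np.arange(1_000_000, dtype=np.int64)
big_list = big.tolist()
t_py = timeit.timeit(lambda: python_sum_of_squares(big_list), number=3)
t_np = timeit.timeit(lambda: numpy_sum_of_squares(big), number=3)
print(f"pure Python: {t_py:.3f}s, NumPy: {t_np:.3f}s")
```

Both functions compute the same value; the difference is only where the loop runs. On typical hardware the NumPy version is much faster, but measure on your own setup rather than relying on any particular ratio.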

Given NumPy's central importance and pervasiveness in Python data analysis, some related topics will be discussed in other chapters, notably:

4.1 Understanding NumPy from a performance perspective

4.1.1 Copies and views of existing arrays

4.1.2 Understanding NumPy’s view machinery

4.1.3 Making use of views for efficiency

4.2 Using array programming

4.2.1 Broadcasting in NumPy

4.2.2 Applying array programming

4.2.3 Developing a "vectorized mentality"

4.3 Tuning NumPy’s internal architecture for performance

4.3.1 An overview of NumPy dependencies

4.3.2 How to tune NumPy in your Python distribution

4.3.3 Threads in NumPy

4.4 Summary