chapter four

4 Using NumPy more efficiently

This chapter covers:

Rediscovering NumPy from a performance perspective
Leveraging NumPy views for computing efficiency and memory conservation
Introducing array programming as a paradigm
Configuring NumPy internals for efficiency

It is difficult to overstate the importance of NumPy in the context of data analytics with Python. This book could well be called "High Performance Python with NumPy." NumPy will be found somewhere in your stack: You use Pandas? NumPy. You use scikit-learn? NumPy. Dask? NumPy. SciPy? NumPy. Matplotlib? NumPy. TensorFlow? NumPy. If you are doing data analytics in Python, your stack almost surely includes NumPy.

NumPy provides N-dimensional array objects (matrices being just one of many examples of such objects) along with functionality to manipulate them. The implementation is extremely efficient, with its core written in Fortran and C. Almost all data analysis problems can be modeled, at their core, with N-dimensional arrays, hence the pervasiveness of NumPy.
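To make the idea of N-dimensional arrays concrete, here is a minimal sketch that builds arrays with one, two, and three dimensions. The variable names and shapes are chosen purely for illustration (the 3-D array stands in for a tiny RGB image); only standard NumPy calls are used.

```python
import numpy as np

# A 1-D array (vector), a 2-D array (matrix), and a 3-D array.
# The 3-D case is a hypothetical tiny image: height x width x color channels.
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
image = np.zeros((4, 5, 3), dtype=np.uint8)

# ndim is the number of dimensions; shape is the size along each one.
for name, arr in [("vector", vector), ("matrix", matrix), ("image", image)]:
    print(name, arr.ndim, arr.shape)

# Operations apply to the whole array at once, with no explicit Python loop.
doubled = matrix * 2
```

A matrix is simply the two-dimensional case; the same object type and the same operations carry over to any number of dimensions.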

Given the central role and pervasiveness of NumPy in Python's data analysis ecosystem, some topics related to it will be discussed in other chapters, notably:

4.1 Understanding NumPy from a performance perspective

4.1.1 Copies and views

4.1.2 Understanding NumPy’s view machinery

4.1.3 Making use of views for efficiency

4.2 Using array programming

4.2.1 Broadcasting in NumPy

4.2.2 Applying array programming to image manipulation

4.2.3 Developing a "vectorized mentality"

4.3 Tuning NumPy’s internal architecture for performance

4.3.1 An overview of NumPy dependencies

4.3.2 How to tune NumPy in your Python distribution

4.3.3 Threads in NumPy

4.4 Summary