4 High-performance NumPy

 

This chapter covers

  • Rediscovering NumPy from a performance perspective
  • Leveraging NumPy views for computing efficiency and memory conservation
  • Introducing array programming as a paradigm
  • Configuring NumPy internals for efficiency

It is difficult to overstate the importance of NumPy for doing data analytics with Python. This book might as well be called High-Performance Python with NumPy. NumPy will be found somewhere in your stack: Do you use pandas? NumPy. Do you use scikit-learn? NumPy. Dask? NumPy. SciPy? NumPy. Matplotlib? NumPy. TensorFlow? NumPy. If you are doing data analytics in Python, almost surely your answer includes NumPy.

NumPy is a Python library that provides N-dimensional or multidimensional array objects such as matrices, which are two-dimensional, along with the functionality to manipulate them. The implementation is extremely efficient with its core written in Fortran and C. Many data analysis problems can be modeled at their core by N-dimensional arrays; this is why NumPy is pervasive in this field.

Given the importance and wide usage of NumPy in Python’s data analysis, some topics related to it will be discussed in other chapters, notably:

4.1 Understanding NumPy from a performance perspective

4.1.1 Copies vs. views of existing arrays

4.1.2 Understanding NumPy’s view machinery

4.1.3 Making use of views for efficiency

4.2 Using array programming

4.2.1 The takeaway

4.2.2 Broadcasting in NumPy

4.2.3 Applying array programming

4.2.4 Developing a vectorized mentality

4.3 Tuning NumPy’s internal architecture for performance

4.3.1 An overview of NumPy dependencies

4.3.2 How to tune NumPy in your Python distribution

4.3.3 Threads in NumPy

Summary