NumPy, which stands for Numerical Python, is the engine that powers Pythonic data science. Python, despite its many virtues, is not well suited on its own to large-scale numeric analysis. Hence, data scientists rely on the external NumPy library to efficiently store and manipulate numeric data. NumPy is a powerful tool for processing large collections of raw numbers, and many of Python’s external data processing libraries are NumPy compatible. One such library is Matplotlib, which we introduced in the previous section. Other NumPy-driven libraries are discussed in later portions of the book. This section focuses on randomized numerical simulations. We will use NumPy to analyze billions of random data points; these random observations will allow us to learn hidden probabilities.
Because NumPy is one of Matplotlib’s requirements, it should already be installed in your working environment. Let’s import NumPy as np, following the common usage convention.
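The import described above can be sketched as follows. The lines after the import are only a hypothetical illustration of the kind of randomized simulation this section works toward; the seed value, the variable names, and the choice of ten coin flips are assumptions, not taken from the text.

```python
import numpy as np  # "np" is the conventional alias for NumPy

# Hypothetical illustration: simulate 10 flips of a fair coin,
# encoding tails as 0 and heads as 1.
rng = np.random.default_rng(seed=0)  # seed is an arbitrary choice, for reproducibility
flips = rng.integers(0, 2, size=10)  # upper bound 2 is exclusive, so values are 0 or 1

# The mean of a 0/1 array is the observed frequency of heads.
head_frequency = flips.mean()
print(flips, head_frequency)
```

With more flips, the observed frequency of heads should tend toward the underlying probability of 0.5, which is the sense in which random observations can reveal hidden probabilities.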