1 An urgent need for efficiency in data processing

This chapter covers

  • The challenges of dealing with the exponential growth of data
  • Comparing traditional and recent computing architectures
  • The role and shortcomings of Python in modern data analytics
  • Techniques for delivering efficient Python computing solutions

An enormous amount of data is being collected all the time, at intense speeds, and from a broad range of sources. It is collected whether or not there is currently a use for it. It is collected whether or not there is a way to process, store, access, or learn from it. Before data scientists can analyze it, and before designers, developers, and policymakers can use it to create products, services, and programs, software engineers must find ways to store and process it. Now more than ever, those engineers need efficient ways to improve performance and optimize storage.

In this book, I share a collection of strategies for performance and storage optimization that I use in my own work. Simply throwing more machines at the problem is often neither possible nor helpful, so the solutions I introduce here rely more on understanding and exploiting what we all have at hand: coding approaches, hardware and system architectures, available software, and, of course, the nuances of the Python language, libraries, and ecosystem.

1.1 How bad is the data deluge?

1.2 Modern computing architectures and high-performance computing

1.2.1 Changes inside the computer

1.2.2 Changes in the network

1.2.3 The cloud

1.3 Working with Python’s limitations

1.3.1 The Global Interpreter Lock

1.4 A summary of the solutions

Summary