Chapter 4. Handling large data on a single computer

This chapter covers

Working with large data sets on a single computer
Working with Python libraries suitable for larger data sets
Understanding the importance of choosing correct algorithms and data structures
Understanding how you can adapt algorithms to work inside databases

What if you had so much data that it seemed to outgrow you, and your techniques no longer seemed to suffice? What would you do: surrender or adapt?

Luckily, you chose to adapt, because you’re still reading. This chapter introduces the techniques and tools for handling larger data sets that a single computer can still manage, provided you take the right approach.

Whereas chapter 3 focused on in-memory data sets, this chapter gives you the tools to perform classification and regression when the data no longer fits into your computer’s RAM (random access memory). Chapter 5 goes a step further and teaches you how to deal with data sets that require multiple computers to process. When we refer to large data in this chapter, we mean data that is hard to work with because of memory or speed constraints but can still be handled by a single computer.
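To make the idea of data that does not fit into RAM concrete, here is a minimal sketch of processing rows one at a time instead of loading everything at once. It is an illustration under stated assumptions, not a method taken from this chapter: the function name streaming_mean, the column name url_length, and the sample rows are all hypothetical, and a small in-memory buffer stands in for a file that would be too large to load.

```python
import csv
import io


def streaming_mean(rows, column):
    """Return (mean, count) for one numeric column, reading one row at a time."""
    count = 0
    mean = 0.0
    for row in rows:
        count += 1
        # Incremental update: only the running mean and the count stay in memory,
        # so memory use does not grow with the number of rows.
        mean += (float(row[column]) - mean) / count
    return mean, count


# Hypothetical data: a tiny in-memory stand-in for a file too large for RAM.
sample = io.StringIO("url_length,label\n10,0\n20,1\n30,0\n")

# csv.DictReader yields rows lazily, so the whole file is never held at once.
mean, count = streaming_mean(csv.DictReader(sample), "url_length")
print(mean, count)
```

With a real file, the same function could be given csv.DictReader(open(path, newline="")), where path is whatever file you are working with. The point of the design is that memory use depends on the size of one row, not on the size of the data set.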

4.1. The problems you face when handling large data

4.2. General techniques for handling large volumes of data

4.3. General programming tips for dealing with large data sets

4.4. Case study 1: Predicting malicious URLs

4.5. Case study 2: Building a recommender system inside a database

4.6. Summary
