7 Memory management with Python
This chapter covers
- How to profile your code for memory usage and issues
- Handling and consuming large datasets
- Optimizing data types for memory
- Training an ML model when your data doesn’t fit in memory
- Making use of Python’s data structures for memory efficiency
For this chapter, our primary dataset will be the ad clicks dataset available at https://www.kaggle.com/competitions/avazu-ctr-prediction/data. The training dataset has over 40 million rows. Many of the techniques we discuss in this chapter also apply to even larger datasets, including those with billions of rows.
In Chapter 6 (Section 71.2), we covered using line-profiler to profile your code for computational speed and efficiency issues. This helped us identify which points in our code take the longest to run. We'll start this chapter by discussing memory profiling, a similar mechanism for identifying which points in your code consume the most memory.
7.1 Memory profiler
A memory profiler is a tool that shows how much memory is consumed by the various operations in your code. Just as we profiled computation time in the last chapter, we can perform an analogous check for memory.
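To make the idea concrete before looking at a dedicated profiler, here is a minimal sketch that uses tracemalloc from Python's standard library to measure the peak memory allocated while a function runs. This is an illustration only and not necessarily the tool used in the rest of the chapter. The function names build_squares and measure_peak_memory are hypothetical helpers introduced for this example.

```python
import tracemalloc


def build_squares(n):
    """Hypothetical workload: materialize a list of n squared integers."""
    return [i * i for i in range(n)]


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as traced by tracemalloc.

    Only allocations made by Python while tracing is active are counted.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns a (current, peak) tuple in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Stop tracing so later measurements start from a clean state.
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    for size in (1_000, 100_000):
        _, peak = measure_peak_memory(build_squares, size)
        print(f"{size:>7,} items: peak of about {peak / 1024:.1f} KiB")
```

Comparing the peak for a small and a large input gives a quick sense of how memory grows with data size. It does not tell you which line is responsible, which is the gap a line-by-line memory profiler fills.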