In our last chapter, we focused on building a feature engineering pipeline that would maximize our model’s performance on our dataset. This is generally the stated goal of an ML problem. Our goal in this chapter will be to not only monitor and measure model performance but also to keep track of how our model treats different groups of data because sometimes data are people.
In our case study today, data are people whose lives are on the line. Data are people who simply want to have the best life they can possibly have. As we navigate the waters around bias and discrimination, around systemic privilege and racial discrepancies, we urge you to keep in mind that when we are talking about rows of data, we are talking about people, and when we are talking about features, we are talking about aggregating years—if not decades—of life experiences into a single number, class, or Boolean. We must be respectful of our data and of the people our data represents. Let’s get started.