In the last chapter, you began analyzing the DC taxi fare data set. After converting the data set to the analysis-friendly Apache Parquet format, you crawled the data schema and used the Athena interactive query service to explore the data. These first steps of data exploration surfaced numerous data quality issues, motivating you to establish a rigorous approach to the garbage in, garbage out problem in your machine learning project. Next, you learned about the VACUUM principles for data quality, along with several case studies illustrating their real-world relevance. Finally, you applied VACUUM to the DC taxi data set to “clean” it, producing a data set of sufficient quality to sample from for machine learning.