Parts 1 and 2 were all about data transformation, but in part 3 we’re going beyond that by tackling scalable machine learning. While not a complete treatment of machine learning in itself, this part will give you the foundation to write your own ML programs in a robust and repeatable fashion.
Chapter 12 sets the stage for machine learning by building features, curated bits of information to use for the training process. Feature engineering itself is akin to purposeful data transformation. Get ready to use the skills learned in parts 1 and 2!
Chapter 13 introduces ML pipelines, Spark’s way to encapsulate ML workflows so that they stay robust and repeatable. Now more than ever, good code structure makes or breaks ML programs, so this tool will keep you sane as you build your models.