chapter seven

7 Data Analysis & Preparation

This chapter covers

Introducing the Capstone Projects
Building and launching images for Kubeflow notebooks
Using Kubeflow notebooks for data analysis
Data passing in Kubeflow Pipelines
Writing Kubeflow components that pass data
Developing the Data Preparation pipeline for Object Detection, including downloading the dataset and splitting it into training, validation, and test sets

This chapter kicks off two capstone projects: one that detects identity cards and one that recommends movies. The chapters that follow cover the remaining stages of the ML pipeline (model training, evaluation, and serving), and each stage is applied directly to these projects rather than discussed in isolation. Working through concrete, real-world implementations shows how each concept fits into a pipeline you are actually building.

Capstone Project 1: Identity Card Detection

Machine learning evolves quickly, with notable developments appearing almost every week. When deep learning took center stage, new versions of models such as YOLO (You Only Look Once) and ResNet drew widespread attention.

7.1 Data analysis

7.1.1 Launching a notebook server in Kubeflow

7.1.2 Workspace and data volumes

7.1.3 Configurations, affinity, and tolerations

7.1.4 Customizing the menu

7.1.5 Creating a custom Kubeflow notebook image

7.2 Data Passing

7.2.1 Scenario 1: Passing Simple Values to Downstream Components

7.2.2 Scenario 2: Passing Paths for Larger Data

7.2.3 Rules for Input and Output Pipeline Parameters

7.3 Project: Data Preparation

7.3.1 Data preparation: Object detection
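The preparation step for this project splits the downloaded dataset into training, validation, and test sets. The following is a minimal, framework-independent sketch of that split, not the pipeline code used in this book. It assumes that each image and its annotation file share a file stem (for example, img_0001.jpg and img_0001.xml), so splitting the list of stems keeps every image together with its labels. The function name split_dataset, the 80/10/10 ratios, and the fixed seed are illustrative assumptions.

```python
import random
from collections.abc import Sequence


def split_dataset(
    items: Sequence[str],
    train_ratio: float = 0.8,
    val_ratio: float = 0.1,
    seed: int = 42,
) -> tuple[list[str], list[str], list[str]]:
    """Shuffle items reproducibly and split them into train, validation, and test lists.

    Hypothetical helper for illustration. Items are assumed to be file stems
    shared by an image and its annotation, so each pair lands in the same split.
    Whatever remains after the train and validation slices becomes the test set.
    """
    if not (0 < train_ratio < 1 and 0 <= val_ratio < 1 and train_ratio + val_ratio <= 1):
        raise ValueError("ratios must lie in [0, 1) and sum to at most 1")

    shuffled = list(items)
    # A local RNG with a fixed seed gives the same split on every run
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_ratio)
    n_val = int(len(shuffled) * val_ratio)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    stems = [f"img_{i:04d}" for i in range(100)]
    train, val, test = split_dataset(stems)
    print(len(train), len(val), len(test))  # 80 10 10
```

Fixing the seed matters in a pipeline: if the preparation step is rerun, the same images end up in the same split, so evaluation results remain comparable across runs.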

7.3.2 Data preparation: Movie recommender

7.4 Summary