3 Building an ML platform in Kubernetes

This chapter covers

  • Setting up an Amazon EKS Kubernetes cluster using Terraform
  • Creating an Ingress using the NGINX Ingress Controller
  • Deploying an identity provider using Keycloak
  • Creating a scalable data science development environment using JupyterHub
  • Enabling GPU workloads in Kubernetes

After learning the fundamentals of Kubernetes in the previous chapter, you are now ready to get your hands dirty. For that, we’ll need a real Kubernetes cluster. Deploying a production-grade Kubernetes cluster is a cumbersome task, and maintaining its control plane is even more arduous. This is why most organizations use a managed Kubernetes service to reduce their operational burden, and we will do the same in this book: we will build our data analytics and machine learning system on Amazon Web Services (AWS), using Amazon Elastic Kubernetes Service (EKS).
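
To make this concrete, the sketch below shows what provisioning an EKS cluster with Terraform can look like, using the community terraform-aws-modules/eks module. The cluster name, Kubernetes version, node sizing, and the referenced VPC module are illustrative assumptions rather than the configuration we build in this book; section 3.1.1 develops the real one step by step.

# Minimal, illustrative EKS cluster definition using the community
# terraform-aws-modules/eks module. All names and sizes are placeholders.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "mlops-platform"   # placeholder cluster name
  cluster_version = "1.29"             # placeholder Kubernetes version

  vpc_id     = module.vpc.vpc_id       # assumes a separately defined VPC module
  subnet_ids = module.vpc.private_subnets

  # A single CPU-backed managed node group to start with.
  eks_managed_node_groups = {
    default = {
      instance_types = ["m5.large"]
      min_size       = 2
      max_size       = 5
      desired_size   = 2
    }
  }
}

With a configuration like this in place, terraform init followed by terraform apply provisions the control plane and worker nodes, and terraform destroy tears them down again when you no longer need them.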

While we use AWS for the implementation, our goal is to keep the architecture proposed in this book cloud-agnostic. We therefore won't sacrifice our commitment to open-source tooling in the pursuit of cloud-native convenience.

3.1 Creating a Kubernetes cluster

3.1.1 Creating an EKS cluster using Terraform

3.2 Architecting an MLOps system on Kubernetes

3.3 Setting up an identity provider

3.3.1 Configuring DNS

3.3.2 Getting a TLS certificate

3.3.3 Deploying an Ingress Controller

3.3.4 Creating an identity provider using Keycloak

3.3.5 Preparing Keycloak for client authentication

3.3.6 Creating a user in Keycloak

3.3.7 Understanding the authentication workflow

3.4 Creating a self-service development environment

3.4.1 Deploying JupyterHub

3.4.2 Configuring role-based access control with Keycloak

3.4.3 Providing persistence to notebook servers

3.4.4 Customizing the user environment

3.4.5 Enabling GPU workloads in Kubernetes

3.4.6 Reducing wastage by shutting down idle notebooks

3.4.7 Optimizing GPU node utilization

3.5 Summary