A Grand Canyon–like gulf separates experimental machine learning code from production machine learning systems. The scenic view across the “canyon” is magical: when a machine learning system is running successfully in production, it can seem prescient. The first time I started typing a query into a machine learning–powered autocomplete search bar and saw the system anticipate my words, I was hooked. I must have tried dozens of different queries to see how well the system worked. So, what does it take to trek across the “canyon”?
It is surprisingly easy to get started. Given the right data and less than an hour of coding time, it is possible to write experimental machine learning code and re-create the remarkable experience I had with the search bar that predicted my words. In my conversations with information technology professionals, I find that many have started to experiment with machine learning. Online machine learning classes, such as Andrew Ng’s class on Coursera, offer a wealth of information about how to get started with the basics. Increasingly, companies hiring for information technology jobs expect entry-level experience with machine learning.1
While it is relatively easy to experiment with machine learning, building on the results of the experiments to deliver products, services, or features has proven to be difficult. Some companies have even started to use the word unicorn to describe the unreasonably hard-to-find machine learning practitioners with the skills needed to launch production machine learning systems. Practitioners with successful launch experience often have skills that span machine learning, software engineering, and many information technology specialties.
This book is for those who are interested in trekking the journey from experimental machine learning code to a production machine learning system. In this book, I will teach you how to assemble the components for a machine learning platform and use them as a foundation for your production machine learning system. In the process, you will learn:
- How to use and integrate public cloud services, such as those from Amazon Web Services (AWS), for machine learning tasks like data ingest, storage, and processing
- How to assess and achieve data quality standards for machine learning from structured data
- How to engineer synthetic features to improve machine learning effectiveness
- How to reproducibly sample structured data into experimental subsets for exploration and analysis (see the sketch after this list)
- How to implement machine learning models using PyTorch and Python in a Jupyter notebook environment
- How to implement data processing and machine learning pipelines to achieve both high throughput and low latency
- How to train and deploy machine learning models that depend on data processing pipelines
- How to monitor and manage the life cycle of your machine learning system once it is put in production
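To give you an early taste of one of these skills, here is a minimal sketch of reproducible sampling. It uses pandas for illustration only; later chapters work with cloud-scale tooling rather than a single-machine library, and the file and column names here are hypothetical placeholders.

```python
# A minimal sketch of reproducible sampling with pandas. The file and
# column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("trips.csv")  # hypothetical structured data set

# Fixing random_state makes the sample reproducible: rerunning the
# code yields the identical experimental subset for analysis.
sample = df.sample(frac=0.01, random_state=42)

sample.to_csv("trips_sample.csv", index=False)
```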
Why should you invest the time to learn these skills? They will not make you a renowned machine learning researcher or help you discover the next ground-breaking machine learning algorithm. However, if you learn from this book, you can prepare yourself to deliver the results of your machine learning efforts sooner and more productively, and grow to be a more valuable contributor to your machine learning project, team, or organization.

If you have never heard the phrase “yak shaving” as it is used in the information technology industry,2 here’s a hypothetical example of how it may show up during a day in the life of a machine learning practitioner:
My company wants our machine learning system to launch in a month . . . but it is taking us too long to train our machine learning models . . . so I should speed things up by enabling graphical processing units (GPUs) for training . . . but our GPU device drivers are incompatible with our machine learning framework . . . so I need to upgrade to the latest Linux device drivers for compatibility . . . which means that I need to be on the new version of the Linux distribution.
There are many more similar possibilities in which you need to “shave a yak” to speed up machine learning. The contemporary practice of launching machine learning–based systems in production and keeping them running has too much in common with the yak-shaving story. Instead of focusing on the features needed to make the product a resounding success, too much engineering time is spent on apparently unrelated activities like re-installing Linux device drivers or searching the web for the right cluster settings to configure the data processing middleware.
Why is that? Even if you have the expertise of machine learning PhDs on your project, you still need the support of many information technology services and resources to launch the system. “Hidden Technical Debt in Machine Learning Systems,” a peer-reviewed article published in 2015 and based on insights from dozens of machine learning practitioners at Google, advises that mature machine learning systems “end up being (at most) 5% machine learning code” (http://mng.bz/01jl).
This book uses the phrase “machine learning platform” to describe the other 95%, which plays a supporting yet critical role in the entire system. Having the right machine learning platform can make or break your product.
If you take a closer look at figure 1.1, you should be able to describe some of the capabilities you need from a machine learning platform. Obviously, the platform needs to ingest and store data, process data (which includes applying machine learning and other computations to data), and serve the insights discovered by machine learning to the users of the platform. The less obvious observation is that the platform should be able to handle multiple, concurrent machine learning projects and enable multiple users to run the projects in isolation from each other. Otherwise, swapping out just the machine learning code translates to reworking the other 95% of the system.
Figure 1.1 Although machine learning code is what makes your machine learning system stand out, it amounts to only about 5% of the system code according to the experiences described in “Hidden Technical Debt in Machine Learning Systems” by Google’s Sculley et al. Serverless machine learning helps you assemble the other 95% using cloud-based infrastructure.


How much data should the platform be able to store and process? AcademicTorrents.com is a website dedicated to helping machine learning practitioners get access to public data sets suitable for machine learning. The website lists over 50 TB of data sets, of which the largest are 1–5 TB in size. Kaggle, a website popular for hosting data science competitions, includes data sets as large as 3 TB. You might be tempted to ignore the largest data sets as outliers and focus on more common data sets that are at the scale of gigabytes. However, you should keep in mind that successes in machine learning are often due to reliance on larger data sets. “The Unreasonable Effectiveness of Data,” by Alon Halevy, Peter Norvig, and Fernando Pereira (http://mng.bz/5Zz4), argues in favor of machine learning systems that can take advantage of larger data sets: “simple models and a lot of data trump more elaborate models based on less data.”
A machine learning platform that is expected to store and process terabytes to petabytes of data must be built as a distributed computing system using multiple inter-networked servers in a cluster, each processing a part of the data set. Otherwise, a data set of hundreds of gigabytes to terabytes will cause out-of-memory problems when processed by a single server with a typical hardware configuration. Having a cluster of servers as part of a machine learning platform also addresses the input/output bandwidth limitations of individual servers. Most servers can supply a CPU with just a few gigabytes of data per second. This means that most types of data processing performed by a machine learning platform can be sped up by splitting the data sets into chunks (sometimes called shards) that are processed in parallel by the servers in the cluster. This distributed systems design for a machine learning platform is commonly known as scaling out.
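To illustrate the idea behind scaling out, here is a single-machine sketch that splits a data set into shards and processes them in parallel with worker processes; in a real cluster, each shard would be handled by a separate server rather than a local process. The shard file names and the column being aggregated are hypothetical.

```python
# A single-machine sketch of "scaling out": process shards of a data
# set in parallel so no single process holds the full data set in
# memory. File and column names are hypothetical.
from multiprocessing import Pool

import pandas as pd

def process_shard(path):
    # Each worker loads and aggregates only its own shard.
    shard = pd.read_csv(path)
    return shard["amount"].sum()

if __name__ == "__main__":
    shard_paths = [f"data/shard-{i:04d}.csv" for i in range(16)]
    with Pool(processes=4) as pool:
        partial_sums = pool.map(process_shard, shard_paths)
    print(sum(partial_sums))  # combine the per-shard results
```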
A significant portion of figure 1.1 is the serving part of the infrastructure used in the platform. This is the part that exposes the data insights produced by the machine learning code to the users of the platform. If you have ever had your email provider classify your emails as spam or not spam, or if you have ever used a product recommendation feature of your favorite e-commerce website, you have interacted as a user with the serving infrastructure of a machine learning platform. The serving infrastructure for a major email or e-commerce provider needs to be capable of making decisions for millions of users around the globe, millions of times a second. Of course, not every machine learning platform needs to operate at this scale. However, if you are planning to deliver a product based on machine learning, keep in mind that it is within the realm of possibility for digital products and services to reach hundreds of millions of users in months. For example, Pokemon Go, a machine learning–powered video game from Niantic, reached half a billion users in less than two months.
Is it prohibitively expensive to launch and operate a machine learning platform at scale? As recently as the 2000s, running a scalable machine learning platform would have required a significant upfront investment in servers, storage, and networking, as well as in software and the expertise needed to build one. The first machine learning platform I worked on for a customer, back in 2009, cost over $100,000 USD and was built using on-premises hardware and open source Apache Hadoop (and Mahout) middleware. In addition to upfront costs, machine learning platforms can be expensive to operate due to wasted resources: most machine learning code underutilizes the capacity of the platform. The training phase of machine learning is resource-intensive, leading to high utilization of computing, storage, and networking. However, training runs are intermittent and relatively rare for a machine learning system in production, translating to low average utilization. Serving infrastructure utilization varies based on the specific use case for a machine learning system and fluctuates based on factors like time of day, seasonality, marketing events, and more.

The good news is that public cloud-computing infrastructure can help you create a machine learning platform and address the challenges described in the previous section. In particular, the approach described in this book will take advantage of public clouds from vendors like Amazon Web Services, Microsoft Azure, or Google Cloud to provide your machine learning platform with:
- Secure isolation so that multiple users of your platform can work in parallel with different machine learning projects and code
- Access to information technologies like data storage, computing, and networking when your projects need them and for as long as they are needed
- Metering based on consumption so that your machine learning projects are billed just for the resources you used
This book will teach you how to create a machine learning platform from public cloud infrastructure using Amazon Web Services as the primary example. In particular, I will teach you:
- How to use public cloud services to cost-effectively store data sets regardless of whether they are made of kilobytes or terabytes of data
- How to optimize the utilization and cost of your machine learning platform computing infrastructure so that you are using just the servers you need
- How to elastically scale your serving infrastructure to reduce the operational costs of your machine learning platform

Serverless machine learning is a model for the software development of machine learning code written to run on a machine learning platform hosted in a cloud-computing infrastructure with consumption-based metering and billing.
If a machine learning system runs on a server-based cloud-computing infrastructure, why is this book about serverless machine learning? The idea of using servers from a public cloud for a machine learning platform clearly contradicts the premise of serverless. Machine learning without servers? How is that even possible?
Before you object to the use of the word serverless in the definition, keep in mind that information technology professionals working with cloud-computing platforms have adopted serverless as a moniker for an approach to using cloud resources and services, including computing, storage, and networking, in a way that helps them spend their time more effectively, improve their productivity, and optimize costs. Serverless does not mean without servers; it means that with a serverless approach a developer can ignore the existence of the cloud provider’s servers and focus on writing code.
Serverless, as it is used in this book, describes an approach for building machine learning systems that enables machine learning practitioners to spend as much of their time as possible writing machine learning code and as little of their time as possible managing and maintaining the computing, storage, networking, operating systems, middleware, or any other parts of the underlying information technology needed to host and run the machine learning platform. Serverless machine learning also delivers on a key idea for cost optimization in cloud computing: consumption-based billing. This means that with serverless machine learning, you are billed just for the resources and services that you use.
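To make the idea concrete, here is a minimal, hedged sketch of what serverless code can look like on AWS Lambda. The handler signature follows the Lambda convention for Python, but the event fields and the inference step are hypothetical placeholders.

```python
# A minimal sketch of a serverless function for AWS Lambda. The
# platform, not the developer, provisions and scales the servers that
# run this handler, and billing is metered per invocation. The event
# fields are hypothetical.
import json

def lambda_handler(event, context):
    # The cloud runtime invokes this function on demand; there is no
    # server for the developer to configure, patch, or monitor.
    features = json.loads(event["body"])
    # ... apply a trained machine learning model to the features here ...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```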
Machine learning, as used in academia as well as in the information technology industry, covers a broad spectrum of algorithms and systems, including those that defeated top human players at the ancient board game of Go, won on the TV show Jeopardy!, and generated deep-fake images of the world’s celebrities and leaders. This book focuses on a specific subfield of machine learning known as supervised learning with structured data (tables of rows and columns). If you are worried that this subfield is too narrow, note that over 80% of production machine learning systems implemented and used at various stages of maturity at Google, arguably the leader in adopting machine learning, are built using supervised learning from structured data sets.

Prior to serverless machine learning, developers involved in getting machine learning code to run in production had to either work in concert with team members from an operations organization or take on the operations role themselves (this is known in the industry as DevOps). The responsibility of the development role included writing the machine learning code, for example, the code to perform inference, such as estimating a house sales price from real estate property records. Once the code was ready, the developers packaged it, typically as part of a machine learning framework such as PyTorch (more about PyTorch in part 2) or along with external code libraries so that it could be executed as an application (or a microservice) on a server, as shown in figure 1.2.
Figure 1.2 Before serverless platforms, most cloud-based machine learning platforms relied on infrastructure-as-a-service (IaaS) or platform-as-a-service (PaaS) service models, illustrated in the figure. Both IaaS and PaaS require an operations role responsible for instantiating infrastructure: server-based, in the case of IaaS, or application-based, in the case of PaaS. Operations are also responsible for managing the life cycle of the infrastructure once it is running.
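As an illustration of the developer side of this division of labor, here is a hedged sketch of inference code packaged as a microservice. Flask and a TorchScript model are used only for illustration, and the model file, route, and feature names are hypothetical; in the IaaS and PaaS models of figure 1.2, an operations role would still need to provision and manage the server that runs this application.

```python
# A hypothetical sketch of developer-packaged inference code exposed
# as a microservice. The model file and feature names are placeholders.
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)
model = torch.jit.load("house_price_model.pt")  # hypothetical trained model
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    record = request.get_json()  # e.g., {"sqft": 1500.0, "bedrooms": 3.0}
    features = torch.tensor([[record["sqft"], record["bedrooms"]]])
    with torch.no_grad():
        price = model(features).item()
    return jsonify({"estimated_price": price})

if __name__ == "__main__":
    app.run()  # in production, a WSGI server would host this app
```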

The operations role involved instantiating the infrastructure required to run the code while ensuring the infrastructure had the appropriate capacity (memory, storage, bandwidth). The role was also responsible for configuring the server infrastructure with the operating system, middleware, updates, security patches, and other prerequisites. Next, operations started the execution of the developer’s code as one or more application instances. After the code was up and running, operations managed the execution of the code, ensuring that requests were serviced with high availability (i.e., reliably) and low latency (i.e., responsively). Operations were also called on to help reduce costs by optimizing infrastructure utilization. This meant continuously monitoring the levels of CPU, storage, network bandwidth, and service latency in order to change the infrastructure capacity (e.g., de-provision servers) and achieve target utilization goals.
Cloud-computing service models such as IaaS replaced physical servers with virtual servers and thus made operations more productive: it took significantly less time and effort to provision and de-provision virtual servers than physical ones. Operations were further automated in the cloud with features such as auto-scaling, which automatically provisioned and de-provisioned virtual servers depending on near-real-time measurements of CPU, memory, and other server-level metrics. PaaS, a more abstract cloud service model, further reduced the operations overhead with virtual servers preconfigured with code execution runtimes, along with pre-installed middleware and operating systems.
While cloud-computing service models like IaaS and PaaS worked well for the serving infrastructure part of machine learning platforms, they fell short elsewhere. While performing exploratory data analysis as preparation for training, a machine learning engineer may execute dozens of different queries against data before settling on the right one. In IaaS and PaaS models, this means that the infrastructure handling data analysis queries needs to be provisioned (sometimes by the operations team) even before the first query can execute. To make matters worse, the utilization of the provisioned infrastructure is entirely at the whim of the user. In an extreme example, if the machine learning engineer runs just one data analysis query a day and it takes 1 hour to execute, the data analysis infrastructure can end up idle, while still incurring costs, for the other 23 hours of the day.
In contrast, the serverless approach illustrated in figure 1.3 helps further optimize the utilization and costs of the machine learning platform. Serverless platforms eliminate the need to perform traditional operations tasks. With serverless machine learning, the machine learning platform takes over the entire life cycle of the machine learning code, instantiating and managing it. This is accomplished by the platform hosting dedicated runtimes for different programming languages and functions: for example, one runtime to execute Python code for machine learning model training, another to execute SQL code for structured data queries, and so on.
Figure 1.3 Serverless platforms eliminate the need for operations to manage the life cycle of the code infrastructure. The cloud-based platform is responsible for instantiating the code in runtime to service requests and for managing the infrastructure to ensure high availability, low latency, and other performance characteristics.
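As one hedged example of such a runtime, the sketch below submits a SQL query to Amazon Athena using the boto3 library; Athena executes the query without any servers for you to provision, and you are billed for the data scanned. The database, table, and S3 bucket names are hypothetical.

```python
# A sketch of a serverless SQL query using Amazon Athena via boto3.
# The database, table, and S3 output bucket names are hypothetical.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT passenger_count, AVG(fare) FROM trips GROUP BY passenger_count",
    QueryExecutionContext={"Database": "ml_project"},
    ResultConfiguration={"OutputLocation": "s3://my-query-results/"},
)
print(response["QueryExecutionId"])  # poll this ID for the query status and results
```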

The most impactful consequence of using serverless as opposed to IaaS or PaaS models is cost. With both IaaS and PaaS models, public cloud vendors bill based on provisioned capacity. In contrast, with serverless models, it is possible to optimize machine learning platform costs based on whether the code is actually executed on the platform.
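A back-of-the-envelope comparison shows why this matters. The hourly and per-request prices below are made-up assumptions for illustration, not actual vendor pricing.

```python
# Back-of-the-envelope cost comparison; the hourly and per-request
# prices are assumed for illustration, not actual vendor pricing.
hours_per_month = 24 * 30

# Provisioned (IaaS/PaaS): billed for the server whether or not it is used.
server_hourly_rate = 0.10  # assumed $/hour
provisioned_cost = server_hourly_rate * hours_per_month  # $72.00/month

# Serverless: billed only when the code actually runs.
requests_per_month = 100_000
price_per_request = 0.00002  # assumed $/request
serverless_cost = price_per_request * requests_per_month  # $2.00/month

print(f"provisioned: ${provisioned_cost:.2f}, serverless: ${serverless_cost:.2f}")
```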
Serverless machine learning exists at the intersection of two information technologies. On one hand, machine learning opens the potential for new products, new features, or even re-invented industries based on capabilities that previously didn’t exist in the marketplace. On the other hand, serverless models strike a balance between productivity and customization, enabling developers to focus on building differentiating capabilities while reusing existing components from cloud-computing platforms. The serverless approach is more than a reuse of black-box components. It is about rapidly assembling project-specific machine learning platforms that can be customized with code to enable the development of new products and services.
Machine learning–based systems become more valuable when they can operate at scale, making frequent and repetitive decisions about data while supporting a large quantity of users. To get a sense of machine learning operating at this scale, think about your email provider classifying emails as spam or not spam for millions of emails every second and for millions of concurrent users around the globe. Alternatively, consider product recommendations (“If you bought this, you may also like that”) from a major e-commerce website.
While machine learning–based systems grow more valuable at larger scales, just like any software project, they should work efficiently when they are small and, if successful, scale for growth. Yet most software projects don’t become overnight successes and don’t grow to reach billions of users. Although supporting this range of scale can sound expensive, the serverless part of serverless machine learning, as used in this book, is about ensuring your project can benefit from the original promise of public cloud computing: paying only for what you use, no more and no less.

The serverless machine learning approach described in this book is targeted at teams and individuals who are interested in building a machine learning system that may need to scale up to a large number of users, requests, and data volumes, but that also needs to scale down when necessary to stay cost efficient. Even if you decide against using machine learning algorithms in a project, you can still use this book to learn how serverless and cloud computing can help you manage, process, and analyze data.
If you are planning to put a machine learning system in production, at some point you have to decide whether to buy or to build the supporting 95%, in other words, the components of a machine learning platform. The examples, such as the ones from “Hidden Technical Debt in Machine Learning Systems,” include the serving infrastructure, data collection, verification, storage, monitoring, and more.
If you plan to build most or all of your machine learning platform, you can approach this book as a series of design use cases or inspirational examples from a sample machine learning project. The book demonstrates how the platform capabilities are implemented in cloud-computing platforms from various public cloud vendors, including AWS, Google Cloud, and Microsoft Azure. The book will also teach you about the features you will need for the machine learning platform, including object storage, data warehousing, interactive querying, and more. Whenever possible, the book will highlight the open source projects you can use in your platform build-out. While this book will not give you the step-by-step instructions for how to build your machine learning platform, you can use it as a case study and a guide for the components of the architecture that you should be building.
If you are planning to acquire most of the machine learning platform capabilities, the book gives you the instructions and walks you through the process for how to build a sample machine learning project and then put it into production using Amazon Web Services. The book will also walk you through the implementation steps for a machine learning platform, including the source code needed for the project. Whenever possible, the approach in this book relies on portable open source technologies such as Docker (more about Docker in appendix B) and PyTorch (more about PyTorch in part 2) that will ease the process of porting the project to other cloud providers such as Google Cloud and Microsoft Azure.

The field of machine learning exists at the intersection of computer science and statistics, so it should come as no surprise that there are alternative routes for introducing a reader to the applications of machine learning. Many information technology professionals began their studies of machine learning with the well-known Coursera class by Andrew Ng (https://www.coursera.org/learn/machine-learning). Those with a statistical or academic background often cite An Introduction to Statistical Learning by James et al. (Springer, 2013) as their first textbook on machine learning.
This book takes a software engineering approach to machine learning. For the purposes of this book, machine learning is the practice of building software-based systems with the defining ability to automatically derive answers from data in order to augment, and often replace, the need for humans in repetitive data-driven decision making. The focus on software engineering also means that the details of machine learning algorithms, techniques, and statistical foundations are covered with less rigor than in the other sources mentioned. Instead, this book focuses on describing how to engineer production-ready systems that have machine learning–based features at their core.

Based on everything you’ve read so far, you may develop a mistaken impression that serverless machine learning is suitable for every application of machine learning. So, when does it make sense to use serverless machine learning? I will be the first to admit that it does not apply in every circumstance. If you are working on an experimental, one-of-a-kind project, one that is limited in scope, size, or duration, or if your entire working data set is and always will be small enough to fit in the memory of a single virtual server, you should reconsider using serverless machine learning. You are probably better off with a single dedicated virtual server (a single node) running a Jupyter notebook in an Anaconda installation, Google Colaboratory, or a similar Jupyter notebook hosting environment.
The serverless approach does help optimize the costs of running a machine learning project on a public cloud; however, this does not mean that re-implementing the project from this book is free of charge. To get the most from this book, you will want to use your AWS account to reproduce the examples described in the upcoming chapters. To do so, you will need to spend about $45 USD to re-create the project by following the steps described in the book. However, to benefit from the book, you don’t need to stick to AWS. Whenever possible, this book will make references to alternative capabilities from other vendors such as Google Cloud and Microsoft Azure. The good news is that this book’s entire project can be completed within the free credit allowances available from the three major public cloud vendors. Alternatively, if you choose not to implement the code examples or the project from this book in a public cloud, you can still rely on the descriptions to get a conceptual understanding of what it takes to launch a machine learning system at scale.
Keep in mind that you should not use the approach in this book if you are not prepared to maintain your system after it is put into production. The reality is that the serverless approach integrates with the capabilities of the public cloud platforms, such as AWS, and those capabilities, specifically their APIs and endpoints, change over time. While the public cloud vendors provide some stability for those endpoints (e.g., managed phaseout plans), you should expect vendors to introduce new features and changes that, in turn, require you to invest time and effort in maintaining your system. If you need to minimize ongoing maintenance, the serverless approach is not for you.
Privacy concerns could give rise to another host of reasons to avoid using a public cloud-based infrastructure for your project. Although most public cloud providers offer sophisticated encryption key-based data security mechanisms and have features to help meet data privacy needs, in a public cloud you can achieve a high degree of certainty in data privacy but not necessarily a complete guarantee that your data and processes will be secure. With that said, this book does not teach you how to ensure 100% security for your data in the cloud, how to provide authentication and authorization, or how to handle other types of security concerns for the machine learning systems described in the book. Whenever possible, I provide references that can help you with security, but it is outside the scope of this book to teach you the security aspects of data and privacy.
From a portability standpoint, the approach described in this book tries to strike the balance between ideal code portability and the need to minimize the amount of effort needed to deploy a machine learning project. If portability is the overriding concern for you, you will be better off attempting a different approach. For example, you can rely on complex infrastructure management stacks, such as Kubernetes or Terraform, for infrastructure deployment and runtime management. You should also not use the serverless machine learning approach if you are determined to use a proprietary framework or technology that is incompatible with the stack used in this book. The book will attempt to use nonproprietary, portable, and open source tools whenever possible.

What problems can this book solve for the reader, and what value can the reader get out of it? The contemporary practice of machine learning sucks too much productivity out of machine learning practitioners. This book teaches the reader to work efficiently through a sample machine learning project. Instead of navigating the maze of alternatives for a machine learning platform, risking mistakes or failure, this book teleports the reader right to the well-trodden path of experienced machine learning practitioners. Instead of having to rediscover the practices of machine learning yourself, you can use this book to take advantage of the capabilities that work well for the requirements of the vast majority of machine learning projects.
This book is for someone who already has some experience with machine learning because it does not teach machine learning from scratch. The book focuses on practical, pragmatic understanding of machine learning and provides you with just enough knowledge to understand and complete the sample project. By the end of the book, you will have completed your machine learning project, deployed it to a machine learning platform on a public cloud, made your system available as a highly available web service accessible to anyone on the internet, and prepared for the next steps of ensuring the system’s long-term success.
- Successful machine learning systems consist of about 5% machine learning code. The rest is the machine learning platform.
- Public cloud-computing infrastructure enables cost-effective scalability for a machine learning platform.
- Serverless machine learning is a model for the software development of machine learning code that is written to run on a machine learning platform hosted in a cloud-computing infrastructure.
- Serverless machine learning can help you develop new products and services by rapidly assembling a machine learning system.
- This book will help you navigate the path from experimental machine learning code to a production machine learning system running in a public cloud.
1. If you need or would like a refresher on machine learning basics, there is a section about the topic in appendix A.
2. The phrase is thought to have originated at the MIT AI Lab in the 1990s (see http://mng.bz/m1Pn).