Artificial intelligence (AI), which reaches into many aspects of everyday life, has been extensively explored in recent years. It attempts to use computational devices to automate tasks by allowing them to perceive the environment as humans do. As a branch of AI, machine learning (ML) enables a computer to perform a task through self-exploration of data. It allows the computer to learn, so it can do things that go beyond what we know how to order it to do. But the barriers to entry are high: the cost of learning the techniques involved and accumulating the necessary experience with applications means practitioners without much expertise cannot easily use ML. Taking ML techniques from their ivory tower and making them accessible to more people is becoming a key focus of research and industry. Toward this end, automated machine learning (AutoML) has emerged as a prevailing research field. Its aim is to simulate how human experts solve ML problems and discover the optimal ML solutions for a given problem automatically, thereby granting practitioners without extensive experience access to off-the-shelf ML techniques. As well as being beneficial for newcomers, AutoML will relieve experts and data scientists of the burden of designing and configuring ML models. Being a cutting-edge topic, it is new to most people, and its current capabilities are often exaggerated by mass media. To give you a glimpse of what AutoML is, this chapter provides some background and an introduction to the fundamental concepts and orients you to its research value and practical benefits. Let’s start with a toy example.

Suppose you want to design an ML model to recognize handwritten digits in images. The ML model will take the images as inputs and output the corresponding digits in each of the images (see figure 1.1).
In case you’re not experienced with ML, let’s use a programmatic illustration with Pythonic style to show how we usually achieve this goal in practice. We take an ML model as an object instantiated from a class, as shown in listing 1.1. This class corresponds to a specific type of ML algorithm (a set of procedures) that we would like to use in our model.1 To instantiate a model, besides selecting the algorithm class to be used, we also need to feed the algorithm some historical data and arguments (arg1 and arg2). The historical data used here consists of images of handwritten digits, whose labels (corresponding numbers) are already known. This helps the machine (or the ML algorithm) to conduct the learning process—that is, to learn how to recognize the digits in images, similar to how a child is trained to recognize objects from pictures. (You’ll see the details of this process in later sections.) The arguments here are used to control the algorithm, instructing it how to do this process. The resulting ML model will be able to predict the digits in previously unseen images (see figure 1.1) with the second line of code in the next listing.
Listing 1.1 A simplified ML process
ml_model = MachineLearningAlgorithm1(
    arg1=..., arg2=...,
    data=historical_images)    #1
digits = [ml_model.predict_image_digit(image)
          for image in new_images]    #2
As you can see from the code, besides the dataset, which we may need to prepare ourselves, we need to provide the following two things based on our prior knowledge to address the task:
- The ML algorithm (or method) to be used; that is, MachineLearningAlgorithm1
- The arguments of the algorithm
Selecting the algorithm and configuring its arguments can be difficult in practice. Let’s use algorithm selection as an example. As a beginner, a typical approach is to collect some learning materials, explore the code for some related tasks, and identify a pool of ML algorithms you might be able to use for the task at hand. You can then try them out one by one on your historical data (as we do in listing 1.1) and pick the best one based on their performance at recognizing the digits in the images. This repetitive process is summarized in the next code sample.
Listing 1.2 A naive way of selecting ML algorithms
ml_algorithm_pool = [
    MachineLearningAlgorithm1,    #1
    MachineLearningAlgorithm2,    #1
    ...,                          #1
    MachineLearningAlgorithmN,    #1
]
result_pool, model_pool = [], []
for ml_algorithm in ml_algorithm_pool:    #2
    model = ml_algorithm(                 #3
        arg1=..., arg2=...,               #3
        data=historical_images)           #3
    result = evaluate(model)              #3
    result_pool.append(result)
    model_pool.append(model)
best_ml_model = pick_the_best(result_pool, model_pool)    #4
return best_ml_model
The process looks intuitive, but if you do not have much ML knowledge or experience, it may take you hours or days, for a few reasons. First, collecting a pool of feasible ML algorithms could be challenging. You may need to explore the literature, identify the state-of-the-art algorithms, and learn how to implement them. Second, the number of feasible ML algorithms could be huge. Trying them out one by one may not be a good choice and may even be prohibitive. Third, each algorithm has its own arguments. Configuring them correctly requires expertise, experience, and even some luck.
Might there be a better way of doing this? Is it possible to let the machine perform automatically for you? If you have faced similar problems and want to adopt ML in a more labor-saving way, AutoML could be the tool you are looking for. Loosely speaking, AutoML mimics the manual process described in the preceding pseudocode. It tries to automate the repetitive and tedious process of selecting and configuring ML algorithms and can allow you access to many advanced algorithms without even knowing they exist. The following two lines of pseudocode illustrate how to use an AutoML algorithm to generate the ML solution:
automl_model = AutoMLAlgorithm()
best_ml_model = automl_model.generate_model(data=historical_images)
Creating an AutoML model object from an AutoML algorithm means you don’t even need to provide the pool of ML algorithms to test, and you can generate the desired model simply by feeding data into it.
But how do you select an AutoML algorithm? What are the ML algorithms it will choose from? How does it evaluate them and choose a model? Before going any further, I’ll give you some background on ML so you can better understand what AutoML automates and how to use it in practice to save yourself time and effort. The focus here will be on what you need to know to learn and use AutoML. If you want to learn more about these algorithms, I recommend referring to other ML books, such as Machine Learning in Action by Peter Harrington (Manning, 2012) and Deep Learning with Python, 2nd ed., by François Chollet (Manning, 2021). For readers who are already familiar with the basics of ML, this next section will serve as a recap, make sure we’re all on the same page with some terminology, and better motivate the following introduction to AutoML.

This section provides a brief introduction to ML—what it is, the critical components in an ML algorithm, and how an ML model is created based on a selected algorithm and data input. Learning these basics is essential to understanding the concepts of AutoML introduced in the next sections.
Before the emergence of ML, the dominant paradigm in AI research was symbolic AI, where the computer could process data only based on predefined rules explicitly input by humans. The advent of ML revolutionized the programming paradigm by enabling knowledge to be learned from the data implicitly. For example, suppose you want a machine to recognize images of apples and bananas automatically. With symbolic AI, you would need to provide human-readable rules associated with the reasoning process, perhaps specifying features like color and shape, to the AI method. In contrast, an ML algorithm takes a bunch of images and their corresponding labels (“banana” or “apple”) and outputs the learned rules, which can be used to predict unlabeled images (see figure 1.2).
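Here is a toy sketch of the contrast; the fruit data and the color feature are invented for illustration. The symbolic approach hard-codes the rule, whereas the learning approach derives it from labeled examples:

from collections import Counter

labeled_fruit = [("yellow", "banana"), ("red", "apple"),
                 ("yellow", "banana"), ("green", "apple")]

def classify_by_rule(color):
    # Symbolic AI: a human writes the rule explicitly
    return "banana" if color == "yellow" else "apple"

def learn_rule(data):
    # ML flavor: derive the rule (most common label per color) from the data
    by_color = {}
    for color, label in data:
        by_color.setdefault(color, Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in by_color.items()}

learned_rule = learn_rule(labeled_fruit)
print(learned_rule["yellow"])    # 'banana', learned rather than hand-coded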
The essential goals of ML are automation and generalization. Automation means an ML algorithm is trained on the data provided to automatically extract rules (or patterns) from the data. It mimics human thinking and allows the machine to improve itself by interacting with the historical data fed to it, which we call training or learning. The rules are then used to perform repetitive predictions on new data without human intervention. For example, in figure 1.2, the ML algorithm interacts with the apple and banana images provided and extracts a color rule that enables it to recognize them through the training process. These rules can help the machine classify new images without human supervision, which we call generalizing to new data. The ability to generalize is an important criterion in evaluating whether an ML algorithm is good. In this case, suppose an image of a yellow apple is fed to the ML algorithm—the color rule will not enable it to correctly discern whether it’s an apple or a banana. An ML algorithm that learns and applies a shape feature for prediction may provide better predictions.
An ML algorithm learns rules through exposure to examples with known outputs. The rules are expected to enable it to transform inputs into meaningful outputs, such as transforming images of handwritten digits to the corresponding numbers. So, the goal of learning can also be thought of as enabling data transformation. The learning process generally requires the following two components:
- Data inputs—Data instances of the target task to be fed into the ML algorithm, for example, in the image recognition problem (see figure 1.2), a set of apple and banana images and their corresponding labels
- Learning algorithm—A mathematical procedure to derive a model based on the data inputs, which contains the following four elements:
- An ML model with a set of parameters to be learned from the data
- A measurement to measure the model’s performance (such as prediction accuracy) with the current parameters
- A way to update the model, which we call an optimization method
- A stop criterion to determine when the learning process should stop
After the model parameters are initialized,2 the learning algorithm can update the model iteratively by modifying the parameters based on the measurement until the stop criterion is reached. This measurement is called a loss function (or objective function) in the training phase; it measures the difference between the model’s predictions and the ground-truth targets. This process is illustrated in figure 1.3.
Let’s look at an example to help you better understand the learning process. Imagine we have a bunch of data points in two-dimensional space (see figure 1.4). Each point is either black or white. We want to build an ML model that, whenever a new point arrives, can decide whether this is a black point or a white point based on the point’s position. A straightforward way to achieve this goal is to draw a horizontal line to separate the two-dimensional space into two parts based on the data points in hand. This line could be regarded as an ML model. Its parameter is the horizontal position, which can be updated and learned from the provided data points. Coupled with the learning process introduced in figure 1.3, the required components could be chosen and summarized as follows:
- The data inputs are a bunch of white and black points described by their location in the two-dimensional space.
- The learning algorithm consists of the following four selected components:
- ML model—A horizontal line that can be formulated as y = a, where a is the parameter that can be updated by the algorithm.
- Accuracy measurement—The percentage of points that are labeled correctly based on the model.
- Optimization method—Move the line up or down by a certain distance. The distance can be related to the value of the measurement in each iteration. It will not stop until the stop criterion is satisfied.
- Stop criterion—Stop when the measurement is 100%, which means all the points in hand are labeled correctly based on the current line.
Figure 1.4 An example of the learning process: Learning a horizontal line to split white and black points
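To make this concrete, the following is a minimal runnable sketch of the learning process in figure 1.4. The point coordinates, the labeling convention (black points lie below the line), and the fixed step size are assumptions made for illustration; the data is chosen so that simply moving the line upward converges:

points = [(1.0, 0.5, "black"), (2.0, 1.0, "black"),
          (1.5, 3.0, "white"), (2.5, 3.5, "white")]

a = 0.0    # the model's only parameter: the line y = a

def accuracy(a):
    # Measurement: fraction of points the line y = a labels correctly
    correct = sum(1 for (_, y, label) in points
                  if (label == "black") == (y < a))
    return correct / len(points)

step = 0.25    # optimization method: move the line up a fixed distance
while accuracy(a) < 1.0:    # stop criterion: 100% accuracy on the points in hand
    a += step

print(f"Learned line: y = {a}, accuracy = {accuracy(a):.0%}")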

In the example shown in figure 1.4, the learning algorithm takes two iterations to achieve the desired line, which separates all the input points correctly. But in practice, this criterion may not always be satisfied. It depends on the distribution of the input data, the selected model type, and how the model is measured and updated. We often need to choose different components and try different combinations to adjust the learning process to get the expected ML solution. Also, even if the learned model is able to label all the training inputs correctly, it is not guaranteed to work well on unseen data. In other words, the model’s ability to generalize may not be good (we’ll discuss this further in the next section). It’s important to select the components and adjust the learning process carefully.
How do we select the proper components to adjust the learning process so that we can derive the expected model? To answer this question, we need to introduce a concept called hyperparameters and clarify the relationship between these and the parameters we’ve been discussing as follows:
- Parameters are variables that can be updated by the ML algorithm during the learning process. They are used to capture the rules from the data. For example, the position of the horizontal line is the only parameter in our previous example (figure 1.4) to help classify the points. It is adjusted during the training process by the optimization method to capture the position rule for splitting the points with different colors. By adjusting the parameters, we can derive an ML model that can accurately predict the outputs of the given input data.
- Hyperparameters are also parameters, but they’re ones we predefine for the algorithm before the learning process begins, and their values remain fixed during the learning process. These include the measurement, the optimization method, the speed of learning, the stop criterion, and so on. An ML algorithm usually has multiple hyperparameters. Different combinations of them have different effects on the learning process, resulting in ML models with different performances. We can also consider the algorithm type (or the ML model type) as a hyperparameter, because we select it ourselves, and it is fixed during the learning process.
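To see the distinction in code, consider scikit-learn's decision tree classifier; this is a sketch, with x_train and y_train standing in for your data:

from sklearn.tree import DecisionTreeClassifier

# max_depth is a hyperparameter: we fix it before learning begins
model = DecisionTreeClassifier(max_depth=3)

# The split thresholds the tree learns during fit() are parameters:
# the algorithm updates them itself, based on the data
model.fit(x_train, y_train)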
The selection of an optimal combination of hyperparameters for an ML algorithm is called hyperparameter tuning and is often done through an iterative process. In each iteration, we select a set of hyperparameters to use to learn an ML model with the training dataset. The ML algorithm block in figure 1.5 denotes the learning process described in figure 1.3. By evaluating each learned model on a separate dataset called the validation set, we can then pick the best one as the final model. We can evaluate the generalizability of that model using another dataset called the test set, which concludes the whole ML workflow.
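In code, the tuning loop might look like the following sketch. The train_model and evaluate helpers are hypothetical placeholders for the learning process in figure 1.3 and your chosen metric:

hyperparameter_grid = [
    {"learning_rate": lr, "max_iterations": n}
    for lr in (0.01, 0.1, 1.0)
    for n in (100, 1000)
]

best_model, best_score = None, float("-inf")
for hp in hyperparameter_grid:
    model = train_model(training_set, **hp)    # one full learning process
    score = evaluate(model, validation_set)    # judged on the validation set
    if score > best_score:
        best_model, best_score = model, score

final_score = evaluate(best_model, test_set)   # used once, at the very end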
In general, we will have three datasets in the ML workflow. Each dataset is distinct from the other two, as described next:
- The training set is used during the learning process to train a model given a fixed combination of hyperparameters.
- The validation set is used during the tuning process to evaluate the trained models to select the best hyperparameters.
- The test set is used for the final testing, after the tuning process. It is used only once, after the final model is selected, and should not be used for training or tuning the ML algorithm.
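A common way to produce the three datasets is to split your data twice. This sketch uses scikit-learn's train_test_split and assumes your features x and labels y are already loaded:

from sklearn.model_selection import train_test_split

x_rest, x_test, y_rest, y_test = train_test_split(x, y, test_size=0.2)
x_train, x_val, y_train, y_val = train_test_split(
    x_rest, y_rest, test_size=0.25)
# Resulting split: 60% training, 20% validation, 20% test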
The training and test sets are straightforward to understand. The reason we want to have an additional validation dataset is to avoid exposing the algorithm to all the data during the tuning stage—this enhances the generalizability of the final model to unseen data. If we didn’t have a validation set, the best model selected in the tuning stage would be the one that extracts every subtle feature of the training data to ceaselessly increase the training accuracy, without caring about any unseen dataset. This situation will likely lead to bad performance on the final test set, which contains different data. When the model performs worse on the test set (or validation set) than on the training set, this is called overfitting. It’s a well-known problem in ML and often happens when the model’s learning capacity is too strong and the size of the training dataset is limited. For example, suppose you want to predict the fourth number of a series, given the first three numbers as training data: a₁ = 1, a₂ = 2, a₃ = 3, a₄ = ? (a₄ is the validation set here; a₅ onward is the test set.) If the right solution is a₄ = 4, the naive model aᵢ = i provides the correct answer. But if you use a third-degree polynomial to fit the series, a perfect solution for the training data would be aᵢ = i³ - 6i² + 12i - 6, which predicts a₄ = 10. The validation process enables a model’s generalization ability to be better reflected during evaluation, so that better models can be selected.
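You can check this with a few lines of Python. Both models reproduce the three training points exactly, but only the naive one generalizes to the fourth:

def naive_model(i):
    return i

def cubic_model(i):
    # The third-degree polynomial from the example above
    return i**3 - 6 * i**2 + 12 * i - 6

for i in (1, 2, 3):
    assert naive_model(i) == cubic_model(i) == i  # both fit the training data

print(naive_model(4), cubic_model(4))    # 4 versus 10: the cubic has overfit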
Note
Overfitting is one of the most important problems studied in ML. Besides doing validation during the tuning process, we have many other ways to address the problem, such as augmenting the dataset, adding regularization to the model to constrain its learning capacity during training, and so on. We won’t go into this in more depth here. To learn more about this topic, see Chollet’s Deep Learning with Python.
At this point, you should have a basic understanding of what ML is and how it proceeds. Although you can make use of many mature ML toolkits, you may still face difficulties in practice. This section describes some of these challenges—the aim is not to scare you off, but to provide context for the AutoML techniques that are described afterward. Obstacles you may meet include the following:
- The cost of learning ML techniques—We’ve covered the basics, but more knowledge is required when applying ML to a real problem. For example, you’ll need to think about how to formulate your problem as an ML problem, which ML algorithms you could use for your problem and how they work, how to clean and preprocess the data into the expected format to input into your ML algorithm, which evaluation criteria should be selected for model training and hyperparameter tuning, and so on. All these questions need to be answered in advance, and doing so may require a large time commitment.
- Implementation complexity—Even with the necessary knowledge and experience, implementing the workflow after selecting an ML algorithm is a complex task. The time required for implementation and debugging will grow as more advanced algorithms are adopted.
- The gap between theory and practice—The learning process can be hard to interpret, and the performance is highly data driven. Furthermore, the datasets used in ML are often complex and noisy and can be difficult to interpret, clean, and control. This means the tuning process is often more empirical than analytical. Even ML experts sometimes cannot achieve the desired results.
These difficulties significantly impede the democratization of ML to people with limited experience and correspondingly increase the burden on ML experts. This has motivated ML researchers and practitioners to pursue a solution to lower the barriers, circumvent the unnecessary procedures, and alleviate the burden of manual algorithm design and tuning—AutoML.

The goal of AutoML is to allow a machine to mimic how humans design, tune, and apply ML algorithms so that we can adopt ML more easily (see figure 1.6). Because a key property of ML is automation, AutoML can be regarded as automating automation.
ml_algorithm_pool = [
    MachineLearningAlgorithm1,
    MachineLearningAlgorithm2,
    ...,
    MachineLearningAlgorithmN,
]
result_pool, model_pool = [], []
for ml_algorithm in ml_algorithm_pool:
    model = ml_algorithm(arg1=..., arg2=..., data=historical_images)
    result = evaluate(model)
    result_pool.append(result)
    model_pool.append(model)
best_ml_model = pick_the_best(result_pool, model_pool)
return best_ml_model
This pseudocode can be regarded as a simple AutoML algorithm that takes a pool of ML algorithms as input, evaluates them one by one, and outputs a model learned from the best algorithm. Each AutoML algorithm consists of the following three core components (see figure 1.7):
- Search space—A set of hyperparameters, and the ranges of each hyperparameter from which to select. The range of each hyperparameter can be defined based on the user’s requirements and knowledge. For example, the search space can be a pool of ML algorithms, as shown in the pseudocode. In this case, we treat the type of ML algorithm as a hyperparameter to be selected. The search space can also be the hyperparameters of a specific ML algorithm, such as the structure of the ML model. The design of the search space is highly task-dependent, because we may need to adopt different ML algorithms for various tasks. It is also quite personalized and ad hoc, depending on the user’s interests, expertise, and level of experience. There is always a tradeoff between the convenience you’ll enjoy by defining a large search space and the time you’ll spend identifying a good model (or the performance of the model you can achieve in a limited amount of time). For beginners, it can be tempting to define a broad search space that is general enough to apply to any task or situation, such as a search space containing all the ML algorithms—but the time and computational cost involved make this a poor solution. We’ll discuss these considerations more in the second part of the book, where you’ll learn how to customize your search space in different scenarios based on additional requirements.
- Search strategy—A strategy to select the optimal set of hyperparameters from the search space. Because AutoML is often an iterative trial-and-error process, the strategy often sequentially selects the hyperparameters in the search space and evaluates their performance. It may loop through all the hyperparameters in the search space (as in the pseudocode), or the search strategy may be adapted based on the hyperparameters that have been evaluated so far to increase the efficiency of the later trials. A better search strategy can help you achieve a better ML solution within the same amount of time. It may also allow you to use a larger search space by reducing the search time and computational cost. How to adopt, compare, and implement different search algorithms will be introduced in the third part of the book.
- Performance evaluation strategy—A way to evaluate the performance of a specific ML algorithm instantiated by the selected hyperparameters. The evaluation criteria are often the same as the ones used in manual tuning, such as the validation performance of the model learned from the selected ML algorithm. In this book, we discuss different evaluation strategies in the context of adopting AutoML to solve different types of ML tasks.
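Putting the three components together, the following sketch swaps the exhaustive loop of the earlier pseudocode for a simple random search strategy. The trial budget and the train_and_evaluate helper are hypothetical:

import random

search_space = {    # search space: one entry per hyperparameter
    "algorithm": [MachineLearningAlgorithm1, MachineLearningAlgorithm2],
    "learning_rate": [0.01, 0.1, 1.0],
}

best_model, best_score = None, float("-inf")
for _ in range(20):    # search strategy: 20 random trials
    hp = {name: random.choice(choices)
          for name, choices in search_space.items()}
    model, score = train_and_evaluate(hp, data)    # evaluation strategy
    if score > best_score:
        best_model, best_score = model, score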
To facilitate the adoption of AutoML algorithms, an AutoML toolkit often wraps up these three components and provides some general application programming interfaces (APIs) with a default search space and search algorithm so that you don’t need to worry about selecting them yourself. For end users, in the simplest case, all you need to do to obtain the final model is provide the data, as shown here—you don’t even need to split the data into training and validation sets:
automl_model = AutoMLAlgorithm()
best_ml_model = automl_model.generate_model(data=...)
But because different users may have different use cases and levels of ML expertise, they may need to design their own search spaces, evaluation strategies, and even search strategies. Existing AutoML systems, therefore, often also provide APIs with configurable arguments to allow you to customize different components. A broad spectrum of solutions are available, from the simplest to the most configurable (figure 1.8).
The range of APIs available allows you to pick the most suitable one for your use case. This book will teach you how to select the right API in an advanced AutoML toolkit, AutoKeras, for different AutoML applications. You’ll also learn how to create your own AutoML algorithm with the help of KerasTuner.
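As a taste of what’s ahead, here is roughly what the simplest AutoKeras workflow looks like for the digit recognition task; the argument values are illustrative:

import autokeras as ak

clf = ak.ImageClassifier(max_trials=3)    # try up to three model configurations
clf.fit(x_train, y_train)                 # search, train, and keep the best model
predicted_digits = clf.predict(x_test)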
The field of AutoML has been evolving for three decades, with the involvement of industry and the open source community. Many successful implementations and promising developments have been seen, as described here:
- Many company internal tools and open source platforms have been developed to help with hyperparameter tuning of ML models and model selection (Google Vizier, Facebook Ax, and so on).
- AutoML solutions performing at near human levels have been observed in many Kaggle data science competitions.
- A vast number of open source ML packages for improved hyperparameter tuning and ML pipeline creation have been developed, such as Auto-sklearn and AutoKeras.
- Commercial AutoML products are helping many companies, big and small, to adopt ML in production. For example, Disney has successfully used Google Cloud AutoML to develop ML solutions for its online store without hiring a team of ML engineers (https://blog.google/products/google-cloud/cloud-automl-making-ai-accessible-every-business/).
- Researchers in fields other than computer science, such as medicine, neurobiology, and economics, are also leveraging the power of AutoML. They can now bring new ML solutions to domain-specific problems such as medical image segmentation,3 genomic research,4 and animal recognition and protection,5 without going through the long learning curve of ML and programming.
We are still exploring the full capabilities of AutoML to democratize ML techniques and make them accessible to more people in different domains. Despite the many successful applications of AutoML that have been seen so far, we still have a lot of challenges and limitations to further explore and address, including the following:
- The difficulty of building AutoML systems—Compared to building an ML system, building an AutoML system from scratch is a more complex and involved process.
- The automation of collecting and cleaning data—AutoML still requires people to collect, clean, and label data. These processes are often more complicated in practice than the design of ML algorithms, and, for now at least, they cannot be automated by AutoML. For AutoML to work today, it has to be given a clear task and objective with a high-quality dataset.
- The costs of selecting and tuning the AutoML algorithm—The “no free lunch” theorem tells us that there is no omnipotent AutoML algorithm that fits any hyperparameter tuning problem. The effort you save on selecting and tuning an ML algorithm may be amortized or even outweighed by the effort you need to put into selecting and tuning the AutoML algorithm.
- Resource costs—AutoML is a relatively costly process, in terms of both time and computational resources. Existing AutoML systems often need to try more hyperparameters than human experts to achieve comparable results.
- The cost of human-computer interaction—Interpreting the solution and the tuning process of AutoML may not be easy. As these systems become more complex, it will become harder and harder for humans to get involved in the tuning process and understand how the final model is achieved.
AutoML is still in its early stages of development, and its continuing progress will rely heavily on the participation of researchers, developers, and practitioners from different domains. Although you may contribute to that effort one day, the goal of this book is more modest. It mainly targets practitioners who have limited expertise in machine learning, or who have some experience but want to save themselves some effort in creating ML solutions. The book will teach you how to address an ML problem automatically with as few as five lines of code. It will gradually approach more sophisticated AutoML solutions for more complicated scenarios and data types, such as images, text, and so on. To get you started, in the next chapter, we’ll dig more deeply into the fundamentals of ML and explore the end-to-end pipeline of an ML project. It will help you better understand and make use of AutoML techniques in the later chapters.
- Machine learning refers to the capacity of a computer to modify its processing by interacting with data automatically, without being explicitly programmed.
- The ML process can be described as an iterative algorithmic process to adjust the parameters of an ML model based on the data inputs and certain measurements. It stops when the model is able to provide the expected outputs, or when some particular criterion defined by the user is reached.
- Tuning the hyperparameters in an ML algorithm allows you to adjust the learning process and select components tailored to the ML problem at hand.
- AutoML aims to learn from the experience of designing and applying ML models and automate the tuning process, thereby relieving data scientists of this burden and making off-the-shelf ML techniques accessible to practitioners without extensive experience.
- An AutoML algorithm consists of three key components: the search space, search strategy, and evaluation strategy. Different AutoML systems provide different levels of APIs that either configure these for you or allow you to customize them based on your use case.
- AutoML contains many unaddressed challenges, preventing it from living up to the highest expectations. Achieving true automatic machine learning will be difficult. We should be optimistic but also take care to avoid exaggerating AutoML’s current capabilities.
1. Many well-known ML packages provide these kinds of classes corresponding to ML algorithms, such as scikit-learn.
2. The parameter values may be initialized randomly or assigned following a strategy such as a warm start, where you begin with some existing parameters learned by similar models.