1 Why deep learning with structured data?


This chapter covers

  • A high-level overview of deep learning
  • Benefits and drawbacks of deep learning
  • Introduction to the deep learning software stack
  • Structured versus unstructured data
  • Objections to deep learning with structured data
  • Advantages of deep learning with structured data
  • Introduction to the code accompanying this book

Since 2012, we have witnessed what can only be called a renaissance of artificial intelligence. A discipline that had lost its way in the late 1980s is important again. What happened?

In October 2012, a team of students working with Geoffrey Hinton (a leading academic proponent of deep learning, based at the University of Toronto) announced a result in the ImageNet computer vision contest: an error rate in identifying objects that was close to half that of the nearest competitor. This result exploited deep learning, and it ushered in an explosion of interest in the topic. Since then, we have seen deep learning applications achieve world-class results in many domains, including image processing, audio to text, and machine translation. In the past couple of years, the tools and infrastructure for deep learning have reached a level of maturity and accessibility that makes it possible for nonspecialists to take advantage of deep learning’s benefits. This book shows how you can use deep learning to get insights into and make predictions about structured data: data organized as tables with rows and columns, as in a relational database. You will see what deep learning can do by working step by step through a complete, end-to-end example, from ingesting the raw structured input data to making the trained deep learning model available to end users. By applying deep learning to a problem with a real-world structured dataset, you will see the challenges and opportunities of deep learning with structured data.

1.1 Overview of deep learning

Before reviewing the high-level concepts of deep learning, let’s introduce a simple example that we can use to explore these concepts: detection of credit card fraud. Chapter 2 introduces the real-world dataset and an extensive code example that prepares this dataset and uses it to train a deep learning model. For now, this basic fraud detection example is sufficient for a review of some of the concepts of deep learning.

Why would you want to exploit deep learning for fraud detection? There are several reasons:

  • Fraudsters can find ways to work around the traditional rules-based approaches to fraud detection (http://mng.bz/emQw).
  • A deep learning approach that is part of an industrial-strength pipeline--in which the model performance is frequently assessed and the model is automatically retrained if its performance drops below a given threshold--can adapt to changes in fraud patterns.
  • A deep learning approach has the potential to provide near-real-time assessment of new transactions.

In summary, deep learning is worth considering for fraud detection because it can be the heart of a flexible, fast solution. Note that in addition to these advantages, there is a downside to using deep learning as a solution to the problem of fraud detection: compared with other approaches, deep learning is harder to explain. Other machine learning approaches allow you to determine which input characteristics most influence the outcome, but this relationship can be difficult or impossible to establish with deep learning.

Assume that a credit card company maintains customer transactions as records in a table. Each record in this table contains information about the transaction, including an ID that uniquely identifies the customer, as well as details about the transaction, including the date and time of the transaction, the ID of the vendor, the location of the transaction, and the currency and amount of the transaction. In addition to this information, which is added to the table every time a transaction is reported, every record has a field to indicate whether the transaction was reported as a fraud.

The credit card company plans to train a deep learning model on the historical data in this table and use this trained model to predict whether new incoming transactions are fraudulent. The goal is to identify potential fraud as quickly as possible (and take corrective action) rather than waiting days for the customer or vendor to report that a particular transaction is fraudulent.

Let’s examine the customer transaction table. Figure 1.1 contains a snippet of what some records in this table would look like.

Figure 1.1 Dataset for credit card fraud example

The columns customer ID, transaction date, transaction time, vendor ID, City, Country, currency, and amount contain details about individual credit card transactions for the previous quarter. The fraud column is special because it contains the label: the value that we want the deep learning model to predict when it has been trained on the training data. Assume that the default value in the fraud column is 0 (meaning “not a fraud”), and that when one of our customers or vendors reports a fraudulent transaction, the value in the fraud column for that transaction in the table is set to 1.
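To make the shape of this data concrete, here is a minimal sketch of such a table as a Pandas dataframe (Pandas is introduced in section 1.3 and chapter 2). The column names and values are hypothetical stand-ins for the columns shown in figure 1.1, not the book’s actual dataset:

import pandas as pd

# Hypothetical transactions table with the same kinds of columns as figure 1.1
transactions = pd.DataFrame({
    "customer_id": [1001, 1002, 1001],
    "transaction_date": ["2020-01-15", "2020-01-15", "2020-01-16"],
    "transaction_time": ["09:12", "17:45", "23:58"],
    "vendor_id": [501, 502, 503],
    "city": ["Toronto", "Montreal", "Lagos"],
    "country": ["CA", "CA", "NG"],
    "currency": ["CAD", "CAD", "NGN"],
    "amount": [35.50, 120.00, 999.99],
    "fraud": [0, 0, 1],   # the label: 0 = not a fraud, 1 = reported as a fraud
})

# Separate the features (model inputs) from the label (what the trained model will predict)
X = transactions.drop(columns=["fraud"])
y = transactions["fraud"]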

As new transactions arrive, we want to be able to predict whether they are fraudulent so that we can quickly take corrective action. By training the deep learning model on the historical dataset, we will be defining a function that can predict whether new credit card transactions are fraudulent. In this example of supervised learning (http://mng.bz/pzBE), the model is trained by means of a dataset that incorporates examples with labels. The dataset that is used to train the model includes the value that the trained model will predict (in this case, whether a transaction is fraudulent). By contrast, in unsupervised learning the training dataset does not include labels.

Now that we have introduced the credit card fraud example, let’s use it to take a brief tour of some key deep learning concepts. For a more in-depth treatment of these concepts, see François Chollet’s Deep Learning with Python, 2nd ed. (http://mng.bz/OvM2), which describes them excellently:

  • Deep learning is a machine learning approach in which a multilayer artificial neural network is trained by adjusting the weights and offsets (biases) at each layer to minimize a loss function (a measure of the difference between the actual outcomes [the values in the fraud column] and the predicted outcomes), using gradient-based optimization and backpropagation.
  • Neural networks in a deep learning model have a series of layers, starting with the input layer, followed by several hidden layers, and culminating with an output layer.
  • In each of these layers, the output of the previous layer (or, in the case of the first layer, the training data, which for our example is the dataset columns customer ID, transaction date, transaction time, vendor ID, city, country, currency, and amount) goes through a series of operations (multiplication by a matrix of weights, addition of an offset [bias], and application of a nonlinear activation function) to produce the input for the next layer. In figure 1.2, each circle (node) has its own set of weights. The inputs are multiplied by those weights, the bias is added, and an activation function is applied to the result to produce the output that is taken in by the next layer.
Figure 1.2 Multilayered neural network
  • The final output layer generates the prediction of the model based on the input. In our example of predicting credit card fraud, the output indicates whether the model predicts a fraud (output of 1) or not a fraud (output of 0) for a given transaction.
  • Deep learning works by iteratively updating the weights in the network to minimize the loss function (the function that defines the aggregate difference between the predictions of the model and the actual result values in the training dataset). As the weights are adjusted, the model’s predictions in aggregate get closer to the actual result values in the fraud column of the input table. With each training iteration, the weights are adjusted based on the gradient of the loss function.
  • You can think of the gradient of the loss function as being roughly equivalent to the slope of a hill. If you take small, incremental steps in the direction opposite to the slope, you will eventually get to the bottom of the hill. By making small changes to the weights in the direction opposite to the gradient on each iteration through the network, you reduce the loss function bit by bit. A process called backpropagation computes the gradient of the loss function, which is then used to update the weights of each node in the neural network so that, with repeated applications, the loss function is minimized and the accuracy of the model’s predictions is maximized. (A toy code sketch of this idea follows this list.) The training process is summarized in figure 1.3.
Figure 1.3 Training data is used when weights are iteratively updated in the network to train the model.
  • When the training is complete (the weights in the model have been repeatedly updated using the gradient provided by backpropagation to achieve the desired performance with the training data), the resulting model can be used to make predictions on new data that the model has never seen.
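To make the “slope of a hill” idea concrete, here is a toy sketch of gradient descent on a single weight. The loss function here is an invented quadratic with its minimum at w = 3; real deep learning losses are far more complex, and backpropagation (rather than a hand-written derivative) supplies the gradient, but the update rule is the same idea:

def loss(w):
    return (w - 3.0) ** 2          # toy loss with its minimum at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)         # derivative of the loss with respect to w

w = 0.0                            # initial weight
learning_rate = 0.1                # size of each step down the hill
for step in range(25):
    w -= learning_rate * gradient(w)   # small step in the direction opposite the gradient

print(round(w, 3), round(loss(w), 6))  # w is now close to 3.0 and the loss is close to 0.0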

The output of the process is a trained deep learning model that incorporates the final weights and can be used to predict outputs from new input data, as shown in figure 1.4.

Figure 1.4 A trained model generates predictions on new data.
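The following is a minimal, generic Keras sketch of this train-then-predict cycle. It is not the model built in this book (that model is developed in chapters 5 and 6); it assumes the transaction features have already been converted to eight numeric values per record and uses random placeholder data purely to illustrate the flow from training data to predictions on new data:

import numpy as np
from tensorflow import keras

# Placeholder data: 1,000 transactions, 8 already-numeric features, 0/1 fraud labels.
# A real dataset needs the preparation steps covered in chapters 2 through 4.
X_train = np.random.rand(1000, 8)
y_train = np.random.randint(0, 2, size=1000)

# Input layer -> two hidden layers -> single-node output layer (predicted fraud probability)
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# The loss function, plus the optimizer that applies the gradients computed by backpropagation
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training: iteratively update the weights to reduce the loss on the training data
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)

# The trained model can now make predictions on transactions it has never seen
new_transactions = np.random.rand(3, 8)
print(model.predict(new_transactions))   # values close to 1.0 suggest fraud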

This book does not cover the mathematical basis of deep learning. The section on the mathematical building blocks of deep learning in Deep Learning with Python, 2nd ed., provides a clear, concise description of the math behind deep learning. You can also see the reference to the deeplearning.ai curriculum in chapter 9 for another good overview.


1.2 Benefits and drawbacks of deep learning

The core point of deep learning is both simple and profound: a trained deep learning model can incorporate a function of incredible complexity that accurately characterizes patterns implicit in the data on which the model is trained. Given enough labeled data to train on (such as a large-enough dataset of credit card transactions with a column to indicate whether each transaction is a fraud), deep learning can define a model that predicts the label values for new data that the model never saw during the training process. The functions that deep learning defines in the form of trained models can include millions of parameters, well beyond what any human could create by hand.

In some use cases, such as image recognition, deep learning models have the benefit of being trainable on data that is closer to the raw input data than is possible with non-deep-learning machine learning approaches. Those approaches may require extensive feature engineering (hand-coded transformations of the input data and new columns in the input table) to achieve good performance.

The benefits of deep learning don’t come free. Deep learning has several significant drawbacks that you need to be prepared to deal with. For deep learning to work, you need

  • Lots of labeled data --You may need millions of examples, depending on the domain.
  • Hardware capable of doing massive matrix manipulations --As you will see in chapter 2, a modern laptop may be sufficient to train a simple deep learning model. Bigger models will require specialized hardware (GPUs and TPUs) to train efficiently.
  • Tolerance for the model’s imperfect transparency --When you compare deep learning with classic, non-deep-learning machine learning, it can be more difficult to spell out why a deep learning model is making the predictions it is making. In particular, if a model is trained on a certain set of features (customer ID, transaction date, transaction time, and so on), it can be difficult to determine which features contribute most to the model’s capability to predict an outcome.
  • Significant engineering to avert common pitfalls --These pitfalls include overfitting (the model is accurate for the data it was trained on, but doesn’t generalize to new data) and vanishing/exploding gradients (backpropagation blows up or grinds to a halt because the modifications to the weights become too large or too small at each step).
  • Ability to manipulate multiple hyperparameters --Data scientists need to control a set of knobs called hyperparameters, including learning rate (the size of the steps taken each time the weights are updated), regularization (various tactics to avert overfitting), and the number of times the training process iterates through the input dataset to train the model. Adjusting these knobs to get a good result can be like trying to fly a helicopter. Just as a helicopter pilot needs to coordinate hands and feet in harmony to keep the machine on a steady path and avoid crashing, a data scientist training a deep learning model needs to coordinate the hyperparameters in harmony to get desired results out of the model and avoid pitfalls such as overfitting. See chapter 5 for details about the hyperparameters used to train the model for this book’s extended example. (A brief code sketch after this list shows where these settings typically appear in Keras.)
  • Tolerance for less-than-perfect accuracy --Deep learning is, by its nature, not going to produce 100% accurate predictions. If absolute accuracy is required, it’s better to use a more deterministic approach.
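As a rough illustration of where some of these knobs live, the sketch below shows where the learning rate, a regularization layer, and the number of training epochs typically appear in Keras code. It is a generic fragment with assumed values, not the configuration used for this book’s model (chapter 5 covers that):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dropout(0.2),     # regularization: randomly drop 20% of units during training to limit overfitting
    keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),   # learning rate: size of each weight update
    loss="binary_crossentropy",
)

# epochs: how many times training iterates through the entire input dataset
# model.fit(X_train, y_train, epochs=10, batch_size=32)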

Here are some mitigations for these drawbacks:

  • Lots of labeled data --Deep learning’s thirst for massive amounts of labeled data can be tempered with transfer learning: reusing models, or subsets of models, that were trained on one task to perform a related task. A model trained on a large, general set of labeled image data can be used to jump-start a model that is being applied to a specific domain in which labeled image data is scarce. The extended example in this book does not apply transfer learning, but you can see Transfer Learning for Natural Language Processing by Paul Azunre (http://mng.bz/GdVV) for details on the key role that transfer learning plays in deep learning use cases such as natural language processing and computer vision. (A generic sketch of this pattern follows this list.)
  • Hardware capable of doing massive matrix manipulations -- Today, it’s easy to get access to environments (including the cloud environments introduced in chapter 2) with sufficient hardware power to train challenging models at modest cost. The extended deep learning example in this book can be exercised faster in a cloud environment with hardware specifically designed for deep learning, but you can also exercise it on a reasonably provisioned modern laptop.
  • Tolerance for the model’s imperfect transparency -- Several vendors (including Amazon, Google, and IBM) now offer solutions to help make deep learning models more transparent and explain the behavior of deep learning models.
  • Significant engineering to avert common pitfalls -- Algorithm improvements keep making their way into common deep learning frameworks to help insulate you from problems like exploding gradients.
  • Ability to manipulate multiple hyperparameters -- Automated approaches to optimizing hyperparameters have the potential to reduce the complexity of tuning hyperparameters and make the experience of training a deep learning model less like flying a helicopter and more like driving a car, in that a limited set of inputs (steering wheel, accelerator) has direct results (car changes direction, car changes speed).
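As a concrete illustration of the transfer learning pattern mentioned in the first point above, the sketch below reuses a model pretrained on the large, general ImageNet dataset as the frozen base for a new image classification task with scarce labeled data. This is a generic image-domain sketch, not something the book’s structured data example uses:

from tensorflow import keras

# Pretrained base: weights learned on ImageNet, with the original classification layer removed
base = keras.applications.MobileNetV2(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
    pooling="avg",
)
base.trainable = False            # freeze the pretrained weights

# Add a small new output layer to be trained on the scarce, domain-specific labeled data
model = keras.Sequential([
    base,
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")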

Less-than-perfect accuracy remains a challenge. The impact of imperfect accuracy depends on the problem that you are trying to solve. If you are predicting whether a client is going to churn (take its business to a competitor), being right 85% or 90% of the time may be more than sufficient for the problem. If you are predicting a potentially fatal medical condition, however, the intrinsic limits of deep learning are harder to get around. How much inaccuracy you can tolerate will depend on the problem you are solving.

1.3 Overview of the deep learning stack

A variety of deep learning frameworks is available today. The two most popular are TensorFlow (https://www.tensorflow.org), which dominates in industrial applications of deep learning, and PyTorch (https://pytorch.org), which has a strong following in the research community.

In this book, we’re going to use Keras (https://keras.io) as our deep learning library. Keras began life as a freestanding project that could be used as a frontend for a variety of deep learning frameworks. As explained in chapter 5, as of TensorFlow 2.0, Keras is integrated into TensorFlow. Keras is the recommended high-level API for TensorFlow. The code accompanying this book has been validated with TensorFlow 2.0, but you should not have any issues using later versions of TensorFlow.

Here is a brief introduction to the main components of the stack:

  • Python --This easy-to-learn, flexible interpreted language is by far the most popular language for machine learning. Python’s growth in popularity has closely tracked the machine learning renaissance in the past decade, and it now far outstrips its closest rival, R, as the lingua franca of machine learning. Python has a huge ecosystem and a massive set of libraries that cover not only everything you want to do with machine learning, but also the gamut of development. In addition, Python has a huge developer community, and you can easily find answers online to almost any Python question or problem. The code examples in this book are written entirely in Python, with the exception of the YAML config files described in chapter 3; an SQL example in chapter 2; and the deployments described in chapter 8, which include code in Markdown, HTML, and JavaScript.
  • Pandas --This Python library gives you everything you need to conveniently deal with tabular, structured data within Python. You can easily import structured data (whether from CSV or Excel files or directly from a table in a relational database) into a Pandas dataframe and then manipulate it with table operations (such as dropping and adding columns, filtering by column values, and joining tables). You can think of Pandas as being Python’s answer to SQL. Chapter 2 contains several examples of loading data into Pandas dataframes and using Pandas to perform common SQL-type operations. (A short sketch of these operations follows this list.)
  • scikit-learn --An extensive Python library for machine learning. The extended example in this book makes heavy use of it, including the data transformation utilities described in chapters 3 and 4 and the facility described in chapter 8 for defining trainable data pipelines that prepare data both for training the deep learning model and for getting predictions from the trained model.
  • Keras --Keras is a straightforward library for deep learning that gives you ample flexibility and control while abstracting out some of the complexity of the low-level TensorFlow API. Keras has a large, active community that includes beginners and experienced machine learning practitioners, and it’s easy to find solid examples of using Keras for deep learning applications.
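The short sketch below, using hypothetical tables, illustrates the SQL-style operations that Pandas makes available; chapter 2 works through real examples:

import pandas as pd

# Hypothetical tables used only to illustrate SQL-style operations in Pandas
transactions = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "amount": [35.50, 120.00, 999.99],
})
customers = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "country": ["CA", "CA", "NG"],
})

# SELECT ... WHERE amount > 100
large = transactions[transactions["amount"] > 100]

# JOIN the two tables on customer_id
joined = transactions.merge(customers, on="customer_id")

# Add a column, then drop one
joined["amount_rounded"] = joined["amount"].round(0)
joined = joined.drop(columns=["country"])

print(large)
print(joined)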

1.4 Structured vs. unstructured data

The title of this book contains two terms that do not commonly appear together: deep learning and structured data. Structured data (in the context of this book) refers to data that is organized in tables with rows and columns--the kind of data that resides in relational databases. Deep learning is an advanced machine learning technique that has demonstrated success on a range of problems with data that is not commonly stored in tables, such as images, video, audio, and text.

Why apply deep learning to structured data? Why combine a data paradigm that is 40 years old with cutting-edge deep learning? Aren’t there simpler approaches to solving problems that involve structured data? Aren’t there better applications of the power of deep learning than attempting to train models with data that resides in tables?

To answer these valid questions, we’re first going to define in a bit more detail what we mean by structured and unstructured data; in section 1.5, we’ll address these and other objections to applying deep learning to structured tabular data.

In this book, structured data is data that has been organized to reside in a relational database with rows and columns. The columns can contain numeric values (such as currency amounts, temperatures, time durations, or other quantities expressed as integer or floating-point values) or non-numeric values (such as strings, embedded structured objects, or unstructured objects).

All relational databases support SQL (albeit with varying dialects) as the primary interface to the database. Common relational databases include the following:

  • Proprietary databases --Oracle, SQL Server, Db2, Teradata
  • Open source databases --Postgres, MySQL, MariaDB
  • Proprietary database offerings based on open source --AWS Redshift (based on Postgres)

Relational databases can include relationships between tables, such as foreign keys (in which the permissible values in the column of one table depend on the values in an identified column in another table). Tables can be joined to create new tables that contain combinations of the rows and columns from the tables participating in the join. Relational databases can also incorporate sets of code, such as sets of SQL statements called stored procedures, that can be invoked to access and manipulate data in the database. For the purposes of this book, we will be focusing on the row and typed column nature of tables rather than the additional intertable interactions and code interfaces provided by relational databases.

Relational databases are not the only possible repositories of structured tabular data. As shown in figure 1.5, data in Excel or CSV files is intrinsically structured in rows and columns, although unlike in relational tables, the types of the columns are not encoded as part of the structure but inferred from the column contents. The dataset for the main example in this book comes from a set of Excel files.

Figure 1.5 Examples of tabular structured data
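Here is a brief sketch of what loading such files looks like in Pandas, with hypothetical file names. Note that the column types are inferred from the contents rather than being stored in the file, unlike the typed columns of a relational table:

import pandas as pd

# Hypothetical file names; the files themselves carry no column type information
df_csv = pd.read_csv("transactions.csv")
df_xlsx = pd.read_excel("transactions.xlsx")   # needs an Excel reader library such as openpyxl

print(df_csv.dtypes)    # the types Pandas inferred for each column
print(df_xlsx.head())   # the first few rows, organized as rows and columns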

For the purposes of this book, we will not be looking at unstructured data--data that is not organized to reside in tabular form in a relational database. As shown in figure 1.6, unstructured data includes image, video, and audio files, as well as text and tagged formats such as XML, HTML, and JSON. By this definition, unstructured data doesn’t necessarily have zero structure. The key/value pairs in JSON are a kind of structure, for example, but in its native state JSON is not organized in a tabular form with rows and columns, so for the purposes of this book, it is unstructured. To complicate matters further, structured data can contain unstructured elements, such as columns in a table that contain freeform text or that refer to XML documents or BLOBs (binary large objects).

Figure 1.6 Examples of unstructured data

Many books cover applications of deep learning to unstructured data such as images and text. This book takes a different direction by focusing exclusively on deep learning applied to tabular structured data. Sections 1.5 and 1.6 provide some justification for this focus on structured data, first discussing some reasons why you might be skeptical about a focus on structured data, and then reviewing the benefits of exploring a structured data problem with deep learning.


1.5 Objections to deep learning with structured data

Many of the celebrated applications of deep learning have involved unstructured data such as images, audio, and text. Some deep learning experts question whether deep learning should be applied to structured data at all and insist that a non-deep-learning approach is best for structured data.

To motivate your exploration of deep learning with structured data, let’s review some of the objections:

  • Structured datasets are too small to feed deep learning. Whether this objection is valid depends on the domain. Certainly, there are many domains (including the problem explored in this book) in which the labeled structured dataset contains tens of thousands or even millions of examples, making them large enough to be in contention for training a deep learning model.
  • Keep it simple. Deep learning is hard and complicated, so why not use an easier solution, such as non-deep-learning machine learning or traditional business intelligence applications? This objection was more valid three years ago than it is today. Deep learning has reached a tipping point in terms of simplicity and widespread use. Thanks to the popularity of deep learning, the tools available to exploit it are much easier to use. As you will see in the extended coding examples in this book, deep learning is now accessible to nonspecialists.
  • Handcrafted deep learning solutions are becoming less necessary. Why go through the effort of creating an end-to-end deep learning solution, particularly if you are not a full-time data scientist, if handcrafted solutions will increasingly be replaced by solutions that require little or no coding? The fast.ai library (https://docs.fast.ai), for example, allows you to create powerful deep learning models with a few lines of code, and data science environments like Watson Studio offer GUI-based model builders (as shown in figure 1.7) that let you create a deep learning model without doing any coding at all. With these solutions, why make the effort to learn how to code a deep learning model directly? To understand how to use low-code or no-code solutions, you still need to understand how a deep learning model is put together, and the fastest way to learn that is to write the code to harness deep learning frameworks. If you deal primarily with tabular data in your job, it makes sense to be able to apply deep learning to that data. By coding a deep learning solution to a problem that involves structured tabular data that you understand thoroughly, you gain an understanding of the concepts, strengths, and limitations of deep learning. Armed with that understanding, you will be able to exploit deep learning (whether hand-coded or not) to solve further problems. The extended example in this book takes you through an end-to-end example of applying deep learning to structured tabular data. In chapter 9, you will learn how to adapt the example in this book to your own structured datasets.
Figure 1.7 Creating a deep learning model by using a GUI (Watson Studio)

In this section, we looked at common objections to using deep learning to solve problems involving structured data and reviewed subjective responses. A subjective response is not sufficient, however; we also need to compare working code implementations of deep learning versus non-deep learning. In chapter 7, we make a head-to-head comparison of two solutions to the extended example in this book: the deep learning solution and a solution based on a non-deep-learning approach called XGBoost. We compare these two approaches in terms of performance, model training time, code complexity, and flexibility.


1.6 Why investigate deep learning with a structured data problem?

In section 1.5, we reviewed some of the objections to applying deep learning to structured data. Let’s assume that you are satisfied with how these objections were handled. There’s still the question of what benefit you will get by taking the time to go through an extended example of applying deep learning to structured data. Many books can take you through the process of applying deep learning to a wide variety of problems and datasets. What distinguishes this book? What is the benefit of going through an end-to-end problem using a structured dataset with deep learning?

Let’s start with the big picture: there is a lot more unstructured data in the world than structured data (https://learn.g2.com/structured-vs-unstructured-data). If 80% of data is unstructured, why bother trying to apply deep learning to the small subset of all data that is structured? Although there may be four times as much unstructured data as structured data, the slice of the pie that is structured is extremely important. Banks, retailers, insurance companies, manufacturers, governments--the building blocks of modern life--run on relational databases. Every day as you go about your daily activities, you generate updates in dozens or even hundreds of tables in various relational databases. When you pay for something with your debit card, make a mobile phone call, or check your bank balance online, you are accessing or updating data in a relational database. On top of the importance of structured data to our daily lives, many jobs revolve around structured tabular data. Using deep learning on images and video is fun, but what if your job doesn’t deal with this kind of data? What if your job is all about tables in relational databases or CSV and Excel files? If you master the techniques of applying deep learning to structured data, you will be able to apply these techniques to solve real problems with the kinds of datasets that you encounter in your job.

In this book, you’ll learn from start to finish how to apply deep learning to a tabular structured dataset. You’ll learn how to prepare a real-world dataset (with all the typical warts and problems that these datasets have) for training a deep learning model, how to categorize the dataset by the column types in the table, and how to create a simple deep learning model that is automatically defined by this categorization of the data. You will learn how this model combines layers that are adapted to each category of data so that you can take advantage of different types of data in the source tables (text, categorical, and continuous) to train the model. You will also learn how to deploy the deep learning model and make it available for other people to use. The techniques that you will learn in this book are applicable to a wide variety of structured datasets and will allow you to unlock the potential of deep learning to solve problems with these datasets.
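To give a flavor of what “layers adapted to each category of data” can look like, here is a minimal, generic Keras sketch of a model with one input per column category: an embedding for a categorical column, the raw value for a continuous column, and an embedding plus pooling for a text column. The column names and sizes are hypothetical, and this is not the model built in chapters 5 and 6:

from tensorflow import keras

# Categorical column (say, a vendor ID with 500 possible values): embedding layer
cat_in = keras.Input(shape=(1,), name="vendor_id")
cat_branch = keras.layers.Flatten()(
    keras.layers.Embedding(input_dim=500, output_dim=8)(cat_in))

# Continuous column (say, a transaction amount): fed in directly
cont_in = keras.Input(shape=(1,), name="amount")

# Text column (say, a short description), already encoded as a fixed-length
# sequence of 20 word indices: embedding followed by pooling
text_in = keras.Input(shape=(20,), name="description")
text_branch = keras.layers.GlobalAveragePooling1D()(
    keras.layers.Embedding(input_dim=10000, output_dim=16)(text_in))

# Combine the per-category branches and add the output layer
merged = keras.layers.concatenate([cat_branch, cont_in, text_branch])
hidden = keras.layers.Dense(16, activation="relu")(merged)
output = keras.layers.Dense(1, activation="sigmoid")(hidden)

model = keras.Model(inputs=[cat_in, cont_in, text_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy")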

1.7 An overview of the code accompanying this book

The heart of this book is an extended coding example that applies deep learning to solve a problem with a real-world structured dataset. Chapter 2 introduces the problem and describes all the code used in this example. In this section, we briefly summarize the most important programs used to solve the problem.

The code that accompanies this book is made up of a series of Jupyter Notebooks and Python programs that take you from the raw input dataset to a deployed, trained deep learning model. You can find all the code, along with associated data and configuration files, at http://mng.bz/v95x. Following are some of the key files in the repo:

  • chapter2.ipynb --Code snippets associated with introductory code in chapter 2.
  • chapter5.ipynb --Code snippets associated with using Pandas to do SQL-type operations, as described in chapter 2.
  • Data preparation notebook --Code to ingest the raw dataset and perform common data cleansing and preparation steps. The output of this notebook is a Python pickle file that contains the Pandas dataframe with the cleansed training data.
  • Basic data exploration notebook --Basic exploratory data analysis of the dataset for the main example in this book, as described in chapter 3.
  • Data preparation for geocoding --Code for preparing the latitude and longitude values derived from location values in the main dataset, as described in chapter 4.
  • Time-series forecasting data exploration notebook --Additional exploration of the dataset for the main example in this book, using time-series forecasting techniques, as described in chapter 3.
  • Deep learning model training notebook --Code to refactor the cleansed data into a format that accounts for periods when there are no delays for a given streetcar and to prepare this refactored data for input to the Keras deep learning model, as described in chapters 5 and 6. The output of this notebook is a trained deep learning model.
  • XGBoost model training notebook --Code for exercising a non-deep-learning model. This notebook is identical to the notebook for training the deep learning model up to the actual model training code. In chapter 7, we compare the results of this model with the deep learning model.
  • Web deployment --Code for a simple web-based deployment of the trained deep learning model, as described in chapter 8.
  • Facebook Messenger deployment --Code for a deployment of the trained deep learning model as a chatbot in Facebook Messenger, as described in chapter 8.

The raw dataset used by the main example in the book is not in the repo but is published at http://mng.bz/4B2B.


1.8 What you need to know

To get the most out of this book, you should be comfortable with coding in Python in the context of Jupyter Notebooks as well as raw Python files. You should also be familiar with non-deep-learning machine learning approaches. In particular, you should have a grasp of the following concepts: overfitting, underfitting, loss function, and objective function. You should be comfortable with basic operations in one of the common cloud environments, such as AWS, Google Cloud, or Azure. For the deployment section, you should have some basic familiarity with web programming. Finally, you should have a background in relational databases and be comfortable with SQL.

This book covers the essentials of deep learning but does not dig into the theoretical details. Instead, it takes you through an extended, practical example of applying deep learning. If you need a deeper examination of deep learning and its implementation in the Python environment, Deep Learning with Python is an excellent resource. I heartily recommend the whole book as a complement to this one. Here are three chapters that provide additional background on general deep learning topics:

  • “The mathematical building blocks of neural networks”--Provides background on concepts that are fundamental to deep learning, including tensors (the core theoretical data container for deep learning) and backpropagation.
  • “Getting started with neural networks”--Walks through a variety of simple deep learning problems covering classification (predicting which class an input data point belongs to) and regression (predicting a continuous value target for an input data point).
  • “Advanced deep learning best practices”--Examines a variety of deep learning architectures and includes details on Keras callbacks (a topic introduced in chapter 6 of this book) and monitoring your deep learning model with TensorBoard.

In this book, I emphasize providing a practical exploration of the end-to-end process of deep learning with tabular, structured data, starting with the raw input data and going right to the deployed, trained deep learning model. Because this book covers such a broad scope, it won’t always be possible to go into details about every related technical topic. Throughout this book, where appropriate, I will refer to Deep Learning with Python, other Manning publications, and technical articles for more details on related topics. In addition, chapter 9 recommends resources on the theoretical background of deep learning.

Summary

  • Deep learning is a powerful technology that has come into its own in the past decade. So far, the celebrated applications of deep learning deal with nontabular data, such as images and text. In this book, I demonstrate that deep learning should also be considered for problems related to tabular, structured data.
  • Deep learning applies a set of techniques (including gradient-based optimization and backpropagation) to input data to automatically define functions that can predict outcomes on new data.
  • Deep learning has produced state-of-the-art results in a range of domains, but it has drawbacks compared with other machine learning techniques. These drawbacks include a lack of transparency about which features matter most to the model and a thirst for training data.
  • Some people think that deep learning should not be applied to tabular, structured data. These people say that deep learning is too complex, structured datasets are too small to train deep learning models, and simpler alternatives are adequate for structured data problems.
  • At the same time, structured data is essential to modern life. Why limit deep learning’s scope to images and freeform text? Many important problems involve structured data, so it’s worthwhile to learn how to harness deep learning to solve structured data problems.