List of Figures

Chapter 1. The data science process

Figure 1.1. The lifecycle of a data science project: loops within loops

Figure 1.2. The fraction of defaulting loans by credit history category. The dark region of each bar represents the fraction of loans in that category that defaulted.

Figure 1.3. A decision tree model for finding bad loan applications, with confidence scores

Figure 1.4. Notional slide from an executive presentation

Chapter 2. Loading data into R

Figure 2.1. Car data viewed as a table

Figure 2.2. SQuirreL SQL table explorer

Figure 2.3. Browsing PUMS data using SQuirreL SQL

Figure 2.4. Strings encoded as indicators

Chapter 3. Exploring data

Figure 3.1. Some information is easier to read from a graph, and some from a summary.

Figure 3.2. A unimodal distribution (gray) can usually be modeled as coming from a single population of users. With a bimodal distribution (black), your data often comes from two populations of users.

Figure 3.3. A histogram tells you where your data is concentrated. It also visually highlights outliers and anomalies.

Figure 3.4. Density plots show where data is concentrated. This plot also highlights a population of higher-income customers.

Figure 3.5. The density plot of income on a log10 scale highlights details of the income distribution that are harder to see in a regular density plot.

Figure 3.6. Bar charts show the distribution of categorical variables.