List of Tables

Chapter 2. The data science process

Table 2.1. A list of open-data providers that should get you started

Table 2.2. An overview of common errors

Table 2.3. Detecting outliers on simple variables with a frequency table

Table 2.4. An overview of techniques to handle missing data

Chapter 3. Machine learning

Table 3.1. Confusion matrix example

Table 3.2. The first three rows of the Red Wine Quality Data Set

Table 3.3. The findings of the PCA

Table 3.4. How PCA calculates the 11 original variables’ correlation with 5 latent variables

Table 3.5. Interpretation of the wine quality PCA-created variables

Table 3.6. The first three rows of the Red Wine Quality Data Set recoded in five latent variables

Chapter 4. Handling large data on a single computer

Table 4.1. Classification problem: Can a website be trusted or not?

Table 4.2. Examples of calculating the Hamming distance

Table 4.3. Combining the information from different columns into the movies column. This is also how DNA works: all information in a long string.

Table 4.4. Excerpt from the client database and the movies customers rented

Table 4.5. The most similar customers to customer 27

Table 4.6. Movies from customer 2 can be used as suggestions for customer 27.

Chapter 5. First steps in big data

Table 5.1. List of common Hadoop file system commands

Chapter 8. Text mining and text analytics

Table 8.1. A list of all POS tags