Copyright
Brief Table of Contents
Table of Contents
Foreword
Preface
Acknowledgments
About this Book
About the Cover Illustration
Part 1. Introduction to data science
Chapter 1. The data science process
1.1. The roles in a data science project
1.1.1. Project roles
1.2. Stages of a data science project
1.2.1. Defining the goal
1.2.2. Data collection and management
1.2.3. Modeling
1.2.4. Model evaluation and critique
1.2.5. Presentation and documentation
1.2.6. Model deployment and maintenance
1.3. Setting expectations
1.3.1. Determining lower and upper bounds on model performance
1.4. Summary
Chapter 2. Loading data into R
2.1. Working with data from files
2.1.1. Working with well-structured data from files or URLs
2.1.2. Using R on less-structured data
2.2. Working with relational databases
2.2.1. A production-size example
2.2.2. Loading data from a database into R
2.2.3. Working with the PUMS data
2.3. Summary
Chapter 3. Exploring data
3.1. Using summary statistics to spot problems
3.1.1. Typical problems revealed by data summaries
3.2. Spotting problems using graphics and visualization
3.2.1. Visually checking distributions for a single variable
3.2.2. Visually checking relationships between two variables
3.3. Summary
Chapter 4. Managing data
4.1. Cleaning data
4.1.1. Treating missing values (NAs)
4.1.2. Data transformations
4.3. Summary