Table of Contents


Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this Book

About the Cover Illustration

Part 1. Introduction to data science

Chapter 1. The data science process

1.1. The roles in a data science project

1.1.1. Project roles

1.2. Stages of a data science project

1.2.1. Defining the goal

1.2.2. Data collection and management

1.2.3. Modeling

1.2.4. Model evaluation and critique

1.2.5. Presentation and documentation

1.2.6. Model deployment and maintenance

1.3. Setting expectations

1.3.1. Determining lower and upper bounds on model performance

1.4. Summary

Chapter 2. Loading data into R

2.1. Working with data from files

2.1.1. Working with well-structured data from files or URLs

2.1.2. Using R on less-structured data

2.2. Working with relational databases

2.2.1. A production-size example

2.2.2. Loading data from a database into R

2.2.3. Working with the PUMS data

2.3. Summary

Chapter 3. Exploring data

3.1. Using summary statistics to spot problems

3.1.1. Typical problems revealed by data summaries

3.2. Spotting problems using graphics and visualization

3.2.1. Visually checking distributions for a single variable

3.2.2. Visually checking relationships between two variables

3.3. Summary

Chapter 4. Managing data

4.1. Cleaning data

4.1.1. Treating missing values (NAs)

4.1.2. Data transformations

4.3. Summary