Chapter 3. Real-world Data

2.1. Getting started: data collection

2.1.1. Which features should be included?

2.1.2. How can we obtain ground truth for the target variable?

2.1.3. How much training data is required?

2.1.4. Is the training set representative enough?

2.2. Preprocessing the data for modeling

2.2.1. Categorical features

2.2.2. Dealing with missing data

2.2.3. Simple feature engineering

2.2.4. Data normalization

2.3. Using data visualization

2.3.1. Mosaic plots

2.3.2. Box plots

2.3.3. Density plots

2.3.4. Scatter plots

2.4. Summary

2.5. Terms from this chapter

