part three

Part 3 covers the practical issues you’ll likely encounter when performing outlier detection, such as working with different types of data, very large datasets, time constraints, and memory limits. It also covers techniques to evaluate individual detectors and the outlier detection system as a whole, including techniques to create synthetic data. Finally, it explains how to create ensembles and how to process and interpret the results of outlier detection, even where large numbers of outliers are flagged.

In chapter 8, we go over techniques to identify the most useful detectors and the best hyperparameters for a given project. There are often many possible approaches to finding outliers in a dataset, and choosing the most appropriate tools and settings for your needs can be difficult.

In chapter 9, we look at working with specific types of data (for example, text, dates, and addresses), encoding categorical data, binning and scaling numeric data, and the distance metrics used by many algorithms. These decisions can significantly affect which outliers are flagged, so it is important to understand how to make them appropriately.

In chapter 10, we look at handling very large and very small datasets.