8 Learning with Categorical Features
This chapter covers
- An introduction to categorical features in machine learning
- Preprocessing categorical features using supervised and unsupervised encoding
- Understanding how ordered boosting works
- Introducing CatBoost: a powerful ordered boosting framework for categorical variables
- Handling high-cardinality categorical features
Data sets for supervised machine learning consist of features that describe objects and labels that describe the targets we want to model. At a high level, features (also known as attributes or variables) are usually classified into two types: continuous and categorical.
A categorical feature is one that takes its value from a finite set of discrete, non-numeric values, called categories. Categorical features are ubiquitous: they appear in nearly every data set and in every domain. For example,