chapter four

4 Classical algorithms for tabular data

This chapter covers

An introduction to Scikit-learn
Exploring and processing features of the Airbnb NYC dataset
Some classic machine learning techniques

Depending on the problem, classic machine learning algorithms are often the most practical approach to working with tabular data. With decades of research and practice behind these tools and algorithms, there is a rich palette of solutions to choose from.

In this chapter, we’ll cover essential classical machine learning algorithms for making predictions from tabular data. We focus on linear models because they remain the most common choice both as a challenging baseline and as a solid, robust model in production. In addition, discussing linear models helps us build concepts and ideas that reappear in deep learning architectures and in more advanced machine learning algorithms, such as gradient-boosted decision trees (one of the topics of the next chapter).

We’ll also give you a quick introduction to Scikit-learn, a powerful and versatile machine learning library that we’ll use to continue exploring the Airbnb NYC dataset. We’ll stay away from lengthy mathematical definitions and textbook details in favor of examples and practical recommendations for applying these models to tabular data problems.
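To give a flavor of what a linear baseline looks like in Scikit-learn, here is a minimal sketch. It is an illustration only: the data is synthetic (generated with `make_regression`) rather than the Airbnb NYC dataset, and the variable names (`baseline`, `r2`) and the choice of scaling before the model are assumptions made for this example, not the chapter's own code.

```python
# Illustrative sketch: synthetic data stands in for a real tabular dataset.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Generate a small synthetic regression problem (hypothetical stand-in data).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Chain feature scaling and a linear model into a single estimator.
baseline = make_pipeline(StandardScaler(), LinearRegression())
baseline.fit(X_train, y_train)

# For regressors, score() returns the coefficient of determination (R^2).
r2 = baseline.score(X_test, y_test)
print(f"Baseline R^2 on held-out data: {r2:.3f}")
```

The same fit, predict, and score pattern applies to the other estimators discussed in this chapter, which is what makes a linear model such a convenient first baseline.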

4.1 Introducing Scikit-learn

4.1.1 Common features of Scikit-learn packages

4.1.2 Common Scikit-learn interface

4.1.3 Introduction to Scikit-learn pipelines

4.2 Exploring and processing features of the Airbnb NYC dataset

4.2.1 Dataset exploration

4.2.2 Pipelines preparation

4.3 Classical machine learning

4.3.1 Linear and logistic regression

4.3.2 Regularized methods

4.3.3 Logistic regression

4.3.4 Generalized linear methods

4.3.5 Handling large datasets with stochastic gradient descent

4.3.6 Choosing your algorithm

Summary