Chapter 7. Getting smart with MLlib

 

This chapter covers

  • Machine-learning basics
  • Performing linear algebra in Spark
  • Scaling and normalizing features
  • Training and applying a linear regression model
  • Evaluating the model’s performance
  • Using regularization
  • Optimizing linear regression

Machine learning is a scientific discipline that studies the use and development of algorithms that let computers accomplish complicated tasks without being explicitly programmed to do so. That is, the algorithms eventually learn how to solve a given task. These algorithms draw on methods and techniques from statistics, probability, and information theory.

Today, machine learning is ubiquitous. Examples include online stores that recommend items similar to those other users have viewed or bought, email clients that automatically move messages to spam, advances in autonomous driving recently developed by several car manufacturers, and speech and video recognition. It's also becoming a big part of online business: finding hidden relationships in user habits and actions (and learning from them) can bring critical added value to existing products and services.

But as more companies handle huge amounts of data (known as big data), they need more scalable machine-learning packages. Spark provides distributed, scalable implementations of various machine-learning algorithms, which makes it possible to handle those continuously growing datasets.[1]

7.1. Introduction to machine learning

7.2. Linear algebra in Spark

7.3. Linear regression

7.4. Analyzing and preparing the data

7.5. Fitting and using a linear regression model

7.6. Tweaking the algorithm

7.7. Optimizing linear regression

7.8. Summary
