Chapter 1. Meet Apache Mahout

 

This chapter covers

  • What Apache Mahout is, and where it came from
  • A glimpse of recommender engines, clustering, and classification in the real world
  • Setting up Mahout

As you may have guessed from the title, this book is about putting a particular tool, Apache Mahout, to effective use in real life. It has three defining qualities.

First, Mahout is an open source machine learning library from Apache. The algorithms it implements fall under the broad umbrella of machine learning or collective intelligence. This can mean many things, but at the moment for Mahout it means primarily recommender engines (collaborative filtering), clustering, and classification.

It’s also scalable. Mahout aims to be the machine learning tool of choice when the collection of data to be processed is very large, perhaps far too large for a single machine. In its current incarnation, these scalable machine learning implementations in Mahout are written in Java, and some portions are built upon Apache’s Hadoop distributed computation project.

Finally, it’s a Java library. It doesn’t provide a user interface, a prepackaged server, or an installer. It’s a framework of tools intended to be used and adapted by developers.

To set the stage, this chapter will take a brief look at the sorts of machine learning that Mahout can help you perform on your data—using recommender engines, clustering, and classification—by looking at some familiar real-world instances.

1.1. Mahout’s story

1.2. Mahout’s machine learning themes

1.3. Tackling large scale with Mahout and Hadoop

1.4. Setting up Mahout

1.5. Summary

sitemap