Chapter 7. Data mining: process, toolkits, and standards

This chapter covers

A brief overview of the data mining process
Introduction to key mining algorithms
WEKA, the open source data mining software
JDM, the Java Data Mining standard

The data mining process lets us find gems of information by analyzing data. This chapter introduces the field. The sheer number of data mining algorithms, tools, and terms can be overwhelming, so we begin with a brief overview and walk through the process of building useful models.

Implementing these algorithms from scratch takes time and expertise. Fortunately, free open source data mining frameworks are available. We use WEKA (Waikato Environment for Knowledge Analysis), a Java-based open source toolkit that’s widely used in the data mining community. We look at the core packages of WEKA and work through a simple example to show how WEKA can be used for learning.

We don’t want our implementation to be tied to WEKA, though. Two initiatives through the Java Community Process, JSR 73 and JSR 247, define a standard API for data mining known as Java Data Mining (JDM). We discuss JDM and review its core components in section 7.3, and we take a deeper look at it in chapters 9 and 10, when we discuss clustering and predictive models.

7.1. Core concepts of data mining

7.2. Using an open source data mining framework: WEKA

7.3. Standard data mining API: Java Data Mining (JDM)

7.4. Summary

7.5. Resources