chapter two

2 Your first NLP example


This chapter covers:

  • How to implement your first practical NLP application from scratch
  • How to structure an NLP project from beginning to end
  • A number of useful NLP concepts, including tokenization and text normalization
  • How to apply a machine learning algorithm to textual data

In this chapter, you will learn how to implement your own NLP application from scratch. In doing so, you will also learn how to structure a typical NLP pipeline and how to apply a simple machine learning algorithm to solve your task. The application you will implement is a spam filter. Chapter 1 introduced spam filtering as one of the classic tasks at the intersection of NLP and machine learning.

2.1 Introducing NLP in practice: spam filtering

In this book, spam filtering serves as your first practical NLP application because it belongs to a very widespread family of tasks: text classification. Text classification covers a number of applications discussed in this book, for example user profiling (Chapter 5), sentiment analysis (Chapter 6), and topic labeling (Chapter 8), so this chapter will give you a solid foundation for the rest of the book. First, let's see what exactly classification addresses.

2.2 Understanding the task

2.3 Implementing your own spam filter

2.3.1 Step 1: Define the data and classes
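As a minimal sketch of this step, assuming a hypothetical in-memory toy corpus rather than real email files, the data can be represented as (text, label) pairs drawn from a fixed set of classes. The names `CLASSES`, `emails`, and `class_counts` are illustrative only and do not come from any library.

```python
# Hypothetical toy corpus: each example pairs an email text with its class label.
# A real spam filter would load many labelled emails from disk instead.
CLASSES = ("spam", "ham")  # "ham" is the usual name for legitimate, non-spam email

emails = [
    ("Win a FREE prize now! Click here to claim your reward", "spam"),
    ("Cheap meds available, limited offer, buy now", "spam"),
    ("Meeting moved to 3pm, see the updated agenda attached", "ham"),
    ("Can you review my draft report before Friday?", "ham"),
]


def class_counts(dataset):
    """Count how many examples belong to each class, rejecting unknown labels."""
    counts = {label: 0 for label in CLASSES}
    for _text, label in dataset:
        if label not in counts:
            raise ValueError(f"unknown label: {label!r}")
        counts[label] += 1
    return counts


print(class_counts(emails))  # {'spam': 2, 'ham': 2}
```

Checking the class distribution early is useful because a heavily imbalanced dataset affects both training and how you should read the evaluation scores later.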

2.3.2 Step 2: Split the text into words
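A sketch of word splitting, assuming a simple regular-expression tokenizer built on Python's `re` module. The pattern and the `tokenize` helper are hypothetical; libraries such as NLTK provide more careful tokenizers. The example contrasts it with a plain whitespace split, which leaves punctuation attached to words.

```python
import re

# A token is either a word (letters/digits, optionally with internal
# apostrophes such as "don't") or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text):
    """Split a string into word and punctuation tokens; whitespace is dropped."""
    return TOKEN_PATTERN.findall(text)


print("Click here to claim your prize!".split())
# ['Click', 'here', 'to', 'claim', 'your', 'prize!']
print(tokenize("Click here to claim your prize!"))
# ['Click', 'here', 'to', 'claim', 'your', 'prize', '!']
```

With a whitespace split, `prize!` and `prize` would count as different words; separating the punctuation lets both occurrences map to the same feature.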

2.3.3 Step 3: Extract and normalize the features
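One possible version of this step, sketched under two assumptions: normalization here means lowercasing plus dropping punctuation and stop words, and the features are bag-of-words counts. The short `STOP_WORDS` set and the helper names are hypothetical; a stemmer or lemmatizer could be added inside `normalize` for more aggressive normalization.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")

# A tiny illustrative stop-word list; real stop-word lists are much longer.
STOP_WORDS = {"a", "an", "the", "to", "your", "you", "is", "and", "of"}


def normalize(tokens):
    """Lowercase tokens and drop pure punctuation and stop words."""
    result = []
    for token in tokens:
        token = token.lower()
        if not any(ch.isalnum() for ch in token):
            continue  # skip punctuation such as '!' or ','
        if token in STOP_WORDS:
            continue
        result.append(token)
    return result


def extract_features(text):
    """Map an email to a bag-of-words feature dictionary (word -> count)."""
    tokens = TOKEN_PATTERN.findall(text)
    return dict(Counter(normalize(tokens)))


print(extract_features("Claim your FREE prize! Free entry, claim now."))
# {'claim': 2, 'free': 2, 'prize': 1, 'entry': 1, 'now': 1}
```

Lowercasing merges `FREE` and `Free` into a single feature, so the classifier sees one stronger signal instead of two weaker ones.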

2.3.4 Step 4: Train the classifier
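To make training concrete, the following sketch implements a multinomial Naive Bayes classifier with add-one (Laplace) smoothing from scratch. Naive Bayes is a common baseline for spam filtering, but treat it here as an assumed choice of algorithm, not necessarily the one this chapter uses. The toy training emails and all function names are hypothetical, and `features` is a simplified stand-in for Step 3 (lowercased word counts, punctuation dropped, no stop-word removal).

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def features(text):
    """Lowercased word counts; tokens with no letters or digits are dropped."""
    return Counter(
        t.lower() for t in TOKEN_PATTERN.findall(text) if any(ch.isalnum() for ch in t)
    )


def train_naive_bayes(dataset):
    """Estimate log priors and add-one-smoothed log likelihoods for each class."""
    class_docs = Counter(label for _text, label in dataset)
    word_counts = defaultdict(Counter)  # class -> word -> count
    for text, label in dataset:
        word_counts[label].update(features(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_docs.items():
        model["log_prior"][label] = math.log(n_docs / len(dataset))
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + 1) / denom) for w in vocab
        }
    return model


def classify(model, text):
    """Return the class with the highest log posterior score."""
    feats = features(text)
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        for word, count in feats.items():
            if word in model["vocab"]:  # ignore words never seen in training
                score += count * model["log_likelihood"][label][word]
        scores[label] = score
    return max(scores, key=scores.get)


train = [
    ("Win a free prize now, click here", "spam"),
    ("Free offer, buy cheap meds now", "spam"),
    ("Meeting moved to 3pm, agenda attached", "ham"),
    ("Please review the draft report before the meeting", "ham"),
]
model = train_naive_bayes(train)
print(classify(model, "Claim your free prize now"))  # spam
print(classify(model, "Can we move the meeting?"))   # ham
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow on long emails, and smoothing keeps a word that was seen in only one class from forcing the other class's probability to zero.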

2.3.5 Step 5: Evaluate your classifier
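A minimal evaluation sketch, assuming spam is treated as the positive class: it computes accuracy together with precision, recall, and F1 from paired gold and predicted labels. The `evaluate` helper and the label lists are hypothetical, not results from any real classifier.

```python
def evaluate(gold, predicted, positive="spam"):
    """Compute accuracy plus precision, recall and F1 for the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # spam caught
    fp = sum(g != positive and p == positive for g, p in pairs)  # ham flagged as spam
    fn = sum(g == positive and p != positive for g, p in pairs)  # spam missed
    correct = sum(g == p for g, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and classifier predictions for six test emails.
gold = ["spam", "spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "ham", "ham", "spam"]
print(evaluate(gold, predicted))  # all four scores come out at about 0.67 here
```

Precision tells you how many of the emails flagged as spam really were spam, which matters because each false positive hides a legitimate email; recall tells you how much of the spam was actually caught. Accuracy alone can look good on imbalanced data even when one class is handled poorly.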

2.4 Deploying your spam filter in practice

2.5 Summary