In this chapter, you will learn how to implement your own NLP application from scratch. In doing so, you will also learn how to structure a typical NLP pipeline and how to apply a simple machine-learning algorithm to solve your task. The particular application you will implement is spam filtering. We overviewed it in chapter 1 as one of the classic tasks on the intersection of NLP and machine learning.
In this book, you use spam filtering as your first practical NLP application, as it exemplifies a widely spread family of tasks—text classification. Text classification comprises several applications that we discuss in this book, including user profiling (chapters 5 and 6), sentiment analysis (chapters 7 and 8), and topic classification (chapter 9), so this chapter will give you a good start. First, let’s see what exactly classification addresses.