Chapter 8. Text mining and text analytics

 

This chapter covers

  • Understanding the importance of text mining
  • Introducing the most important concepts in text mining
  • Working through a text mining project

Most of the information humans have recorded is in the form of written text. We learn to read and write from an early age so we can express ourselves in writing and learn what others know, think, and feel. We use this skill all the time when reading or writing an email, a blog, text messages, or this book, so it’s no wonder written language comes naturally to most of us. Businesses are convinced that much value can be found in the texts people produce, and rightly so: those texts contain information on what people like and dislike, what they know or would like to know, what they crave and desire, their current health or mood, and much more. Many of these things can be relevant to companies or researchers, but no single person can read and interpret this tsunami of written material alone. Once again, we need to turn to computers to do the job for us.

Sadly, however, natural language doesn’t come as “naturally” to computers as it does to humans. Deriving meaning and filtering the important from the unimportant is still something humans do better than any machine. Luckily, data scientists can apply specific text mining and text analytics techniques to find the relevant information in heaps of text that would otherwise take them centuries to read.

8.1. Text mining in the real world

 
 

8.2. Text mining techniques

 

8.3. Case study: Classifying Reddit posts

 
 
 

8.4. Summary

 
 