Chapter 1. Getting started taming text

In this chapter

  • Understanding why processing text is important
  • Learning what makes taming text hard
  • Setting the stage for leveraging open source libraries to tame text

If you’re reading this book, chances are you’re a programmer, or at least work in the information technology field. You operate with relative ease when it comes to email, instant messaging, Google, YouTube, Facebook, Twitter, blogs, and most of the other technologies that define our digital age. After you’re done congratulating yourself on your technical prowess, take a moment to imagine your users. They often feel imprisoned by the sheer volume of email they receive. They struggle to organize all the data that inundates their lives. And they probably don’t know or even care about RSS or JSON, much less search engines, Bayesian classifiers, or neural networks. They want to get answers to their questions without sifting through pages of results. They want their email to be organized and prioritized, but they want to spend little time actually doing it themselves. Ultimately, your users want tools that enable them to focus on their lives and their work, not just their technology. They want to control, or tame, the uncontrolled beast that is text.

But what does it mean to tame text? We’ll talk more about it later in this chapter, but for now taming text involves three primary things:

1.1. Why taming text is important

1.2. Preview: A fact-based question answering system

1.3. Understanding text is hard

1.4. Text, tamed

1.5. Text and the intelligent app: search and beyond

1.6. Summary

1.7. Resources
