Lesson 23. Working with text and Unicode

After reading lesson 23, you’ll be able to

  • Use the Text type for more-efficient text processing
  • Change Haskell’s behavior with language extensions
  • Program by using common text functions
  • Use Text to properly handle Unicode text

So far in this book, you’ve made heavy use of the String type. In the preceding lesson, you saw that you can even view an I/O stream as a lazy list of Char values, which is to say a String. String has been useful in helping you explore many topics in this book. Unfortunately, String has a huge problem: it can be woefully inefficient.

From a philosophical standpoint, nothing could be more perfect than representing one of the more important types in programming as one of the most foundational data structures in Haskell: a list. The problem is that a list isn’t a great data structure for storing data for heavy string processing. The details of Haskell performance are beyond the scope of this book, but it suffices to say that implementing String as a linked list of characters is needlessly expensive in terms of both time and space.
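To make the "String is a list" point concrete, here is a minimal sketch using only the Prelude. The names greeting, greetingAsList, and greetingAsCons are made up for illustration; the only fact relied on is that the Prelude defines String as a synonym for [Char].

```haskell
-- In the Prelude, String is only a type synonym:
--   type String = [Char]

-- A String written the usual way.
greeting :: String
greeting = "hello"

-- The same value written as an explicit list of characters.
greetingAsList :: [Char]
greetingAsList = ['h', 'e', 'l', 'l', 'o']

-- Fully desugared: each (:) is one cell of the linked list.
greetingAsCons :: [Char]
greetingAsCons = 'h' : 'e' : 'l' : 'l' : 'o' : []

main :: IO ()
main = do
  print (greeting == greetingAsList)  -- True
  print (greeting == greetingAsCons)  -- True
  print (length greeting)             -- 5, found by walking every cell
```

Because every character lives in its own list cell, even a simple operation such as length has to traverse the whole structure, which hints at why this representation gets costly for large amounts of text.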

In this lesson, you’ll take a look at a new type, Text. You’ll explore how to replace String with Text for more-efficient text processing. Then you’ll learn about the functions common to both String and Text for processing text. Finally, you’ll learn how Text handles Unicode by building a function that can highlight search text, even in Sanskrit!
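As a small preview, the sketch below converts between String and Text and calls one function that exists under the same name for both. It assumes the text package is available (it ships with GHC) and uses the conventional qualified import name T; sampleString and sampleText are hypothetical names chosen for this example only.

```haskell
import qualified Data.Text as T

-- An ordinary String value.
sampleString :: String
sampleString = "a few words"

-- T.pack converts a String into a Text value.
sampleText :: T.Text
sampleText = T.pack sampleString

main :: IO ()
main = do
  -- T.unpack converts back, so the round trip returns the original String.
  print (T.unpack sampleText == sampleString)          -- True
  -- Many list functions have Text counterparts with the same name.
  print (length sampleString, T.length sampleText)     -- (11,11)
  print (words sampleString)                           -- ["a","few","words"]
  print (T.words sampleText)                           -- ["a","few","words"]
```

The qualified import keeps Text functions such as T.length and T.words from clashing with the Prelude versions that work on lists.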

23.1. The Text type

23.2. Using Data.Text

23.3. Text and Unicode

23.4. Text I/O

Summary