Lesson 6. Lists

published book

After reading lesson 6, you’ll be able to

Identify the parts that make up a list
Know how to build lists
Understand the role of lists in functional programming
Use common functions on a list
Learn the basics of lazy evaluation

In many ways, an array is the fundamental data structure for programming in C. If you properly understand arrays in C, you necessarily understand how memory allocation works, how data is stored on a computer, and the basics of pointers and pointer arithmetic. For Haskell (and functional programming in general), the fundamental data structure is a list. Even as you approach some of the more advanced topics in this book, such as functors and monads, the simple list will still be the most useful example.

This lesson provides a proper introduction to this surprisingly important data structure. You’ll learn the basics of taking lists apart and putting them back together, as well as learning some of the essential functions for a list that Haskell provides. Finally, you’ll take a peek at another unique feature of Haskell: lazy evaluation. Lazy evaluation is so powerful that it allows you to represent and work with lists that are infinitely long! If you get stuck on a topic in Haskell, it’s almost always helpful to turn back to lists to see if they can give you some insight.

Consider this

You work for a company that has 10,000 employees, and some of them want to play on an after-work softball team. The company has five teams, named after colors, which you want to use to assign employees:

teams = ["red","yellow","orange","blue","purple"]

You have a list of employees and you want to match them to the correct team as evenly as possible. What’s a simple way that you can use Haskell’s list functions to perform this task?

6.1. The anatomy of a list

Lists are the single most important data structure in functional programming. One of the key reasons is that lists are inherently recursive. A list is either an empty list or an element followed by another list. Taking apart and building lists are fundamental tools for many techniques in functional programming.

When taking apart a list, the main pieces are the head, the tail, and the end (represented by []). The head is just the first element in a list:

GHCi> head [1,2,3]
1
GHCi> head [[1,2],[3,4],[5,6]]
[1,2]

The tail is the rest of the list left over, after the head:

GHCi> tail [1,2,3]
[2,3]
GHCi> tail [3]
[]

The tail of a list with just one element is [], which marks the end of the list. This end of the list is just an empty list. But an empty list is different from other lists, as it has neither a head nor a tail. Calling head or tail on [] will result in an error. If you look at the head and tail, you can start to see the recursive nature of working with lists: a head is an element, and a tail is another list. You can visualize this by imagining tearing the first item off a grocery list, as in figure 6.1.

Figure 6.1. A list is made up of the head element and the tail list.

You can break a list into pieces, but this does you little good if you can’t put them back together again! In functional programming, building lists is just as important as breaking them down. To build a list, you need just one function and the infix operator (:), which is called cons. This term is short for construct and has its origins in Lisp. We’ll refer to this operation as consing, because : looks a bit odd in a sentence.

To make a list, you need to take a value and cons it with another list. The simplest way to make a list is to cons a value with the empty list:

GHCi> 1:[]
[1]

Under the hood, all lists in Haskell are represented as a bunch of consing operations, and the [...] notation is syntactic sugar (a feature of the programming language syntax designed solely to make things easier to read):

GHCi> 1:2:3:4:[]
[1,2,3,4]

GHCi> (1,2):(3,4):(5,6):[]
[(1,2),(3,4),(5,6)]

Notice that all of these lists end with the empty list []. By definition, a list is always a value consed with another list (which can also be an empty list). You could attach the value to the front of an existing list if you wanted:

GHCi> 1:[2,3,4]
[1,2,3,4]

It’s worth noting that the strings you’ve seen so far are themselves syntactic sugar for lists of characters (denoted by single quotes rather than double quotes):

GHCi>['h','e','l','l','o']
"hello"
GHCi> 'h':'e':'l':'l':'o':[]
"hello"

An important thing to remember is that in Haskell every element of the list must be the same type. For example, you can cons the letter 'h' to the string "ello" because "ello" is just a list of characters and 'h' (single quotes) is a character:

GHCi> 'h':"ello"
"hello"

But you can’t cons "h" (double quotes) to "ello" because "h" is a list of one character and the values inside "ello" are individual characters. This becomes more obvious when you remove the syntactic sugar.

Listing 6.1. Consing characters and strings

GHCi> "h":"ello"                     #1
GHCi> ['h']:['e','l','l','o']        #2
GHCi>  'h':[]:'e':'l':'l':'o':[]     #3

#1 Error!
#2 Same code with one layer of sugar removed
#3 Completely desugared

If you do want to combine two lists, you need to concatenate them by using ++. You saw this in lesson 3 with concatenating text, but given that strings are just lists, it will work on any list:

GHCi> "h" ++ "ello"
"hello"
GHCi> [1] ++ [2,3,4]
[1,2,3,4]

Consing is important to understand because it’s an essential part of writing recursive functions on lists. Nearly all sequential operations in functional programing involve building lists, breaking them apart, or a combination of the two.

6.2. Lists and lazy evaluation

Because lists are so important in Haskell, there are many ways to quickly generate ranges of data. Here are some examples:

GHCi> [1 .. 10]                               #1
[1,2,3,4,5,6,7,8,9,10]

GHCi> [1,3 .. 10]                             #2
[1,3,5,7,9]

GHCi> [1, 1.5 .. 5]                           #3
[1.0,1.5,2.0,2.5,3.0,3.5,4.0,4.5,5.0]

GHCi> [1,0 .. -10]                            #4
[1,0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10]

#1 Generates a list of numbers from 1 through 10
#2 Adding the next step, 3, generates odd numbers.
#3 Generates a list in increments of 0.5
#4 Generates a decrementing list

These are useful but not particularly interesting. Many programing languages have a range function that works in a similar manner. What happens if you forget to put an upper bound to your range?

GHCi> [1 .. ]
[1,2,3,4,5,6,7,8,9,10,11,12 ..

An unending list is generated! This is cool but quickly clogs up the terminal and doesn’t seem particularly useful. What’s interesting is that you can assign this list to a variable and even use it in a function:

simple x = x
longList = [1 .. ]
stillLongList = simple longList

What’s shocking is that this code compiles just fine. You defined an infinite list and then used it in a function. Why didn’t Haskell get stuck trying to evaluate an infinitely long list? Haskell uses a special form of evaluation called lazy evaluation. In lazy evaluation, no code is evaluated until it’s needed. In the case of longList, none of the values in the list were needed for computation.

Lazy evaluation has advantages and disadvantages. It’s easy to see some of the advantages. First, you get the computational benefit that any code you don’t absolutely need is never computed. Another benefit is that you can define and use interesting structures such as an infinite list. This can be useful for plenty of practical problems. The disadvantages of lazy evaluation are less obvious. The biggest one is that it’s much harder to reason about the code’s performance. In this trivial example, it’s easy to see that any argument passed to simple won’t be evaluated, but even a bit more complexity makes this less obvious. An even bigger problem is that you can easily build up large collections of unevaluated functions that would be much cheaper to store as values.

Quick check 6.1

Q1:

True or false: You can compile and run a program with the variable backwardsInfinity = reverse [1..].

QC 6.1 answer

True. Even though you’re reversing an infinite list, you’re never calling this code, so the infinite list is never evaluated. If you loaded this code into GHCi and typed the following

GHCi> backwardsInfinity

you’d have a problem, as the program would need to evaluate this argument to print it out.

6.3. Common functions on lists

Because lists are so important, a wide range of useful functions are built into Haskell’s standard library module, called Prelude. So far, you’ve seen head, tail, : and ++, which allow you to take apart lists and put them back together. There are many other useful functions on lists that will come up so frequently when writing Haskell that it’s worth familiarizing yourself with them.

6.3.1. The !! operator

If you want to access a particular element of a list by its index, you can use the !! operator. The !! operator takes a list and a number, returning the element at that location in the list. Lists in Haskell are indexed starting at 0. If you try to access a value beyond the end of the list, you’ll get an error:

GHCi> [1,2,3] !! 0
1
GHCi> "puppies" !! 4
'i'
GHCi> [1..10] !! 11
*** Exception: Prelude.!!: index too large

As mentioned in lesson 5, any infix operator (an operator that’s placed between two values, such as +) can also be used like a prefix function by wrapping it in parentheses:

GHCi> (!!) [1,2,3] 0
1

Using prefix notation can often make things such as partial application easier. Prefix notation is also useful for using operators as arguments to other functions. You can still use partial application with an infix operator; you just need to wrap the expression in parentheses:

GHCi> paExample1 = (!!) "dog"
GHCi> paExample1 2
'g'
GHCi> paExample2 = ("dog" !!)
GHCi> paExample2 2
'g'

Notice that in paExample2 you see how partial application works with infix binary operators. To perform partial application on a binary operator, called a section, you need to wrap the expression in parentheses. If you include only the argument on the right, the function will be waiting for the leftmost argument; if you include only the argument on the left, you get a function waiting for the argument on the right. Here’s paExample3, which creates partial application of the right argument:

GHCi> paExample3 = (!! 2)
GHCi> paExample3 "dog"
'g'

The important thing to remember about sections is that the parentheses aren’t optional.

6.3.2. length

The length function is obvious; it gives you the length of the list!

GHCi> length [1..20]
20
GHCi> length [(10,20),(1,2),(15,16)]
3
GHCi> length "quicksand"
9

6.3.3. reverse

As expected, reverse reverses the list:

GHCi> reverse [1,2,3]
[3,2,1]
GHCi> reverse "cheese"
"eseehc"

You can use reverse to make a basic palindrome checker, as shown in the next listing.

Listing 6.2. `isPalindrome`

isPalindrome word = word == reverse word

GHCi> isPalindrome "cheese"
False
GHCi> isPalindrome "racecar"
True
GHCi> isPalindrome [1,2,3]
False
GHCi> isPalindome [1,2,1]
True

6.3.4. elem

The elem function takes a value and a list and checks whether the value is in the list:

GHCi> elem 13 [0,13 .. 100]
True
GHCi> elem 'p' "cheese"
False

elem is a function that you may want to treat as an infix operator for readability. Any binary function can be treated as an infix operator by wrapping it in back-quotes (`). For example, the function respond returns a different response depending on whether a string has an exclamation mark, as follows.

Listing 6.3. `respond`

respond phrase = if '!' `elem` phrase
                 then "wow!"
                 else "uh.. okay"

GHCi> respond "hello"
"uh.. okay"
GHCi> respond "hello!"
"wow!"

Whether infix elem adds much readability is certainly debatable, but in the real world you’ll frequently come across infix forms of binary functions.

6.3.5. take and drop

The take function takes a number and a list as arguments and then returns the first n elements of the list:

GHCi> take 5 [2,4..100]
[2,4,6,8,10]
GHCi> take 3 "wonderful"
"won"

If you ask for more values then a list has, take gives you what it can, with no error:

GHCi> take 1000000 [1]
[1]

take works best by being combined with other functions on lists. For example, you can combine take with reverse to get the last n elements of a list.

Listing 6.4. `takeLast`

takeLast n aList = reverse (take n (reverse aList))

GHCi> takeLast 10 [1..100]
[91,92,93,94,95,96,97,98,99,100]

The drop function is similar to take, except it removes the first n elements of a list:

GHCi> drop 2 [1,2,3,4,5]
[3,4,5]
GHCi> drop 5 "very awesome"
"awesome"

6.3.6. zip

You use zip when you want to combine two lists into tuple pairs. The arguments to zip are two lists. If one list happens to be longer, zip will stop whenever one of the two lists is empty:

GHCi> zip [1,2,3] [2,4,6]
[(1,2),(2,4),(3,6)]
GHCi> zip "dog" "rabbit"
[('d','r'),('o','a'),('g','b')]
GHCi> zip ['a' .. 'f'] [1 .. ]
[('a',1),('b',2),('c',3),('d',4),('e',5),('f',6)]

6.3.7. cycle

The cycle function is particularly interesting, because it uses lazy evaluation to create an infinite list. Given a list, cycle repeats that list endlessly. This may seem somewhat useless but comes in handy in a surprising number of situations. For example, it’s common in numerical computing to need a list of n ones. With cycle, this function is trivial to make.

Listing 6.5. `ones`

ones n = take n (cycle [1])

GHCi> ones 2
[1,1]
GHCi> ones 4
[1,1,1,1]

cycle can be extremely useful for dividing members of a list into groups. Imagine you want to divide a list of files and put them on n number of servers, or similarly spilt up employees onto n teams. The general solution is to create a new function, assignToGroups, that takes a number of groups and a list, and then cycles through the groups, assigning members to them.

Listing 6.6. `assignToGroups`

assignToGroups n aList = zip groups aList
  where groups = cycle [1..n]

GHCi> assignToGroups 3 ["file1.txt","file2.txt","file3.txt"
                     ,"file4.txt","file5.txt","file6.txt","file7.txt"
                     ,"file8.txt"]

[(1,"file1.txt"),(2,"file2.txt"),(3,"file3.txt"),(1,"file4.txt"),
(2,"file5.txt"),(3,"file6.txt"),(1,"file7.txt"),(2,"file8.txt")]

GHCi> assignToGroups 2 ["Bob","Kathy","Sue","Joan","Jim","Mike"]
[(1,"Bob"),(2,"Kathy"),(1,"Sue"),(2,"Joan"),(1,"Jim"),(2,"Mike")]

These functions are just some of the more common of a wide range of list functions that Haskell offers. Not all of the functions on lists are included in the standard Prelude module. All list functions, including those automatically included in Prelude, are in the Data.List module. An exhaustive list of Data.List functions can be found online in the standard Haskell documentation (https://hackage.haskell.org/package/basedocs/Data-List.html).

Summary

In this lesson, our objective was to go over the basic structure of a list. You learned that a list is made up of a head and a tail that are consed together. We also went over many of the most common functions on a list. Let’s see if you got this.

Q6.1

Haskell has a function called repeat that takes a value and repeats it infinitely. Using the functions you’ve learned so far, implement your own version of repeat.

Q6.2

Write a function subseq that takes three arguments: a start position, an end position, and a list. The function should return the subsequence between the start and end. For example:

GHCi> subseq 2 5 [1 .. 10]
[3,4,5]
GHCi> subseq 2 7 "a puppy"
"puppy"

Q6.3

Write a function inFirstHalf that returns True if an element is in the first half of a list, and otherwise returns False.