1 Introduction

This chapter covers

What Haskell is
What pure functional programming is and why it matters
The advantages of using abstractions within programming
What we will learn in this book

Complex software is all around us, and we as programmers need tools to construct it. Above all, we need programming languages that ease our development process, and Haskell, a state-of-the-art language combining a variety of cutting-edge technologies, is exactly that. Featuring an impressive number of language features mixed with an elegance that few other languages achieve, Haskell has become shrouded in legend and myth… and we are going to take a look behind the curtain!

In this book we will cover the implementation of various small (some might even call them tiny) projects. Some of them are just for fun, some of them are useful tools and some of them were chosen to specifically show you, the reader, how to effectively use Haskell to create fast, safe and reliable software. In following along with these projects, you will learn the ins and outs of writing software using some of the most elegant and elaborate programming models.

If you have researched Haskell before and thought "I don’t know where to start" or were unsure about its practicality, don’t worry. This book does not require any prior knowledge of Haskell’s features or of functional programming in general. We will start slow with simple algorithms and end by writing our own web servers!

So, jump in. The journey is worth it!

1.1 What is Haskell?

If you have picked up this book, there is a good chance you already have some idea of what Haskell is. You know that it is a programming language, you most likely know that it is a functional programming language and you have probably heard of the unique nature of some of its features. But why would you use Haskell, and what is it that makes it special?

The first exciting thing is the wealth of high-level abstractions such as algebraic data types, type classes and monads, all of which will be explored in this book. These abstractions let us write neat, composable and clean code and let us generalize complexities into reusable functionality. Why is that a good thing? It saves time and avoids a lot of headaches when debugging. It is very common to implement an algorithm only once in Haskell and then reuse it in different contexts over and over again. Why write a sorting algorithm for multiple different data types when you could just write it once?
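To make the sorting question a little more concrete, here is a minimal sketch of an insertion sort that is written once and works for any type with an ordering. The names `Priority`, `insertSorted` and `insertionSort` are made up for this illustration, and the syntax will be explained properly in later chapters.

```haskell
module Main where

-- A small algebraic data type (hypothetical, for illustration only).
-- Deriving Ord orders the constructors as declared: Low < Medium < High.
data Priority = Low | Medium | High
  deriving (Show, Eq, Ord)

-- Insert a value into an already sorted list, keeping the list sorted.
insertSorted :: Ord a => a -> [a] -> [a]
insertSorted x [] = [x]
insertSorted x (y : ys)
  | x <= y    = x : y : ys
  | otherwise = y : insertSorted x ys

-- Insertion sort, written once for every type that has an Ord instance.
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insertSorted []

main :: IO ()
main = do
  print (insertionSort [3, 1, 2 :: Int])     -- [1,2,3]
  print (insertionSort "haskell")            -- "aehklls"
  print (insertionSort [High, Low, Medium])  -- [Low,Medium,High]
```

The `Ord a =>` part of the type is a type class constraint: it states that the function works for every type `a` whose values can be compared, which is all the algorithm needs to know.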

The second special property of Haskell is its direct connection to academia and contemporary programming language research. Other languages are typically born from an industrial need, and this is heavily reflected in their architectural design. Java serves as a prime example, since many low-level design decisions were first made because of the advertised "write once, run anywhere" promise. Later specification changes came with the Enterprise Edition, promising a well-designed API for web services and distributed computing. Haskell, on the other hand, doesn’t have such a background. It was built on (then modern) results from type theory and programming language design, focusing on a few simple concepts from which the language was constructed. The core language is thus relatively simple and small. Many of the additional features of the language were first studied in academia and then added to it. This makes Haskell much more focused on program correctness than other languages, leading to design decisions that heavily favor safety.

However, that does not mean that performance has been neglected in Haskell. Its compiler is an optimizing compiler, which means that it analyzes the source code and performs rewriting steps to decrease execution time. In certain situations, aggressive optimizations can even lead to code with performance comparable to low-level languages such as C! Haskell also features powerful libraries for parallelism, concurrency and asynchronous computations, making it simple to write multi-threaded code to improve performance even further. All the while, you as a programmer do not have to worry about pesky details such as memory management, since the runtime uses a garbage collector.

Haskell is not just an ordinary functional programming language; it is a pure functional programming language. By pure we mean that the functions we write in our programs are much like functions in the mathematical sense. They take some input and produce some output, but have no side effects. This means that functions can only work with the data we put into them and cannot, for example, read additional data from files or the network. Is that helpful? Yes! It makes Haskell programs easier to analyze and to understand just by reading the code itself and generally leads to fewer bugs and more reliable software. We will explore this concept at great length in this book since it forces us to think about programming differently than we normally do.
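As a small, hedged illustration of what "pure" means in practice, consider the following sketch. The function `applyDiscount` is a made-up example and not part of any library.

```haskell
module Main where

-- A pure function (hypothetical example): the result depends only on the
-- two arguments. Nothing in the type allows reading files or the network.
applyDiscount :: Double -> Double -> Double
applyDiscount percent price = price * (1 - percent / 100)

main :: IO ()
main = do
  -- Calling the function twice with the same input gives the same output.
  print (applyDiscount 50 80)                         -- 40.0
  print (applyDiscount 50 80 == applyDiscount 50 80)  -- True
```

Because the result is fully determined by the arguments, we can reason about `applyDiscount` by looking at its definition alone, without knowing anything about the rest of the program.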

So how does Haskell compare to other languages, keeping the aforementioned design decisions in mind? An important attribute of a programming language is the strength of its assertions, meaning how many rules on the program’s semantics the compiler or interpreter of that language enforces. This brings about two qualities in programming languages: freedom and safety. Freedom describes how unrestricted the programmer is in their ability to modify the program state and work with resources such as memory, files, sockets and threads. Safety describes how likely a written program is to be inherently free of undefined behavior and bugs. The two are inversely correlated. If a language allows many dangerous operations, its safety decreases. If these operations are forbidden, safety increases while the freedom to do dangerous things decreases, since these operations are either impossible or have a much higher implementation overhead. This relationship is represented in Figure 1.1.

Figure 1.1. The Freedom-Safety relationship

Programming languages have to make compromises when it comes to these qualities. Many dynamically typed scripting languages, like Python, allow almost everything that you could possibly want. As a result of this design decision, the inherent safety of these languages is very low. It is easy to write scripts and it is even easier to write scripts that fail on execution. Statically typed, compiled languages try to increase safety by letting the compiler perform sanity checks on the program semantics, which forces the programmer to adhere to the type system’s rules in order to produce programs that are allowed to be executed. The stronger the assertions the compiler makes, the less error-prone the language becomes, but the harder it also is to write a program that the compiler accepts. For example, Python allows values of arbitrary type to be passed to any function, which makes it very easy to write polymorphic functions but is also prone to errors, since you cannot necessarily be sure of the types of your functions’ arguments. In a language like C, the compiler at least checks whether the types of variables match, but it gives you free rein over the memory, which makes it easy to manipulate data on a low level but might lead to undefined behavior. Java uses a garbage collector as an abstraction to deal with this issue but still lets you share object references across threads, which can lead to race conditions. A language like Rust disallows such behavior, ruling out many kinds of unexpected behavior, but still allows for mutable state. Haskell is a language on the safe side of programming, with many restrictions such as enforcing immutable data and not allowing side effects without the use of special programming models.

Figure 1.2 shows how we can understand the process of compiling programs as a pipeline. The programmer writes (syntactically correct) source code, which is then used by the compiler (or interpreter) to create executable binary code. The more strictly the compiler enforces its rules, the "harder" it becomes for the programmer. However, this filters away many programs which might eventually fail while executing. So if the compiler is very strict, there is a good chance that a program that compiles produces the correct result and neither crashes while executing nor exhibits undefined behavior. This means that the compiler makes sure that the programmer’s intent is correctly reflected by the executable code. Haskell falls into this safe category.

Figure 1.2. The programming pipeline

To understand the uniqueness and helpfulness of Haskell’s safe paradigm, I want to give a little bit of background.

1.2 The Pure Functional Way

Since the dawn of programming we have been confronted with a question which is easily stated but almost impossible to answer: How do we know that our programs are correct? With this question we don’t just want to know that our programs don’t crash but that they indeed have the behavior we as programmers intended. Many resources are spent on quality assurance and testing in software development. This makes it seem that we simply can’t trust the programs we write. Why is that? When we write a function, don’t we know what it does? For the most part, we do! However, there is one thing which often cannot be correctly accounted for: state. When a program is running it has some internal state, which consists of the data the program is working with (e.g. the values of our variables) and the execution context (e.g. what system the program is running on and what its environment is). In the last 50 years, many attempts have been made to work with state in a more manageable way. One answer was to split up the state and package it into objects, each associated with an interface to work with that state in a controlled manner. That is essentially what object-oriented programming is. However, there is another way of dealing with this issue.

What if, instead of trying to split up the state we tried to simply minimize it? The less of that pesky stuff around, the better, right? This idea is at the heart of pure functional programming used by Haskell!

For the uninitiated, here is a little excursion into the world of functional programming. What is it? First let us think about how non-functional programs usually work. Such a program can be thought of as a sequence of instructions, much like the instructions of a recipe. They tell you the steps you need to take in order to arrive at a finished result. That might look something like this:

1. Melt butter in a pan on low heat
2. Add chocolate
3. Beat 8 eggs in a bowl
4. Add flour, sugar and baking powder
5. Mix well with a hand mixer
6. Add butter and chocolate mixture
7. Pour chocolate cake batter into baking sheet
8. Bake for 25 minutes at 200°C in an oven

In the end, following the steps will result in a chocolate cake! But what if we now want to create a lemon cake? The recipe would probably look almost identical to our chocolate cake but differ in step 2, where we don’t add chocolate but something lemony. So if we wanted to write a lemon cake recipe we would need to copy most steps. This poses a problem. Which steps do we copy? We realize that steps 1 and 2 have nothing to do with making a general cake batter. They are additional steps. Step 6 is also highly specialized and cannot be copied into other recipes, since you need a butter and chocolate mixture in order to add it to something. This is a problem of state! After completing each step you have changed the state (the finished products in your kitchen in this case) and some steps depend on this state. An instruction like "Mix well" only makes sense when you have something to mix!

So what would a functional recipe look like?

1. A butter and chocolate mixture consists of melted butter and chocolate using low heat
2. A cake batter consists of 8 beaten eggs, flour, sugar and baking powder
3. A chocolate cake batter is a cake batter mixed with a butter and chocolate mixture using a hand mixer
4. A chocolate cake is a chocolate cake batter baked in an oven for 25 minutes at 200°C

We immediately see a profound difference. The recipe doesn’t tell us what to do, but how our intermediate results are defined! However, from this we can easily infer the steps we need to take in order to produce the finished chocolate cake. We first have to create the chocolate cake batter, for which we need the general cake batter mixed with the butter and chocolate mixture. Both, in turn, have their own definition. So by recursively looking at the definitions we arrive at the most basic steps (adding of basic ingredients) from which we can then produce the wanted end result. So what if we wanted to create a lemon cake recipe? It’s easy! We need to switch out the butter and chocolate mixture in the third step with something lemony and we are done! The definition of a cake batter simply stays the same. We have no state in our recipe, we only have definitions for intermediate results. This also means that this recipe doesn’t have a fixed order. It doesn’t matter if you first create the cake batter or the butter and chocolate mixture. You as the baker can decide which is more convenient for you!
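To give a first impression of how such a recipe of definitions might translate into code, here is a rough sketch in Haskell. All names are invented for this illustration, ingredients are simply represented as descriptive text, and `lemonMixture` stands in for the "something lemony" that the recipe leaves open.

```haskell
module Main where

-- Hypothetical modelling choice: every mixture is just a description.
type Mixture = String

butterAndChocolate :: Mixture
butterAndChocolate = "butter and chocolate melted on low heat"

-- An assumed stand-in for "something lemony".
lemonMixture :: Mixture
lemonMixture = "lemon juice and lemon zest"

cakeBatter :: Mixture
cakeBatter = "8 beaten eggs, flour, sugar and baking powder"

-- A flavored batter is the general cake batter mixed with some flavoring.
flavoredBatter :: Mixture -> Mixture
flavoredBatter flavoring =
  "(" ++ cakeBatter ++ ") mixed with (" ++ flavoring ++ ") using a hand mixer"

-- Baking turns a batter into a cake.
bake :: Mixture -> String
bake batter = batter ++ ", baked for 25 minutes at 200 degrees Celsius"

chocolateCake :: String
chocolateCake = bake (flavoredBatter butterAndChocolate)

-- The lemon cake reuses every definition except the flavoring.
lemonCake :: String
lemonCake = bake (flavoredBatter lemonMixture)

main :: IO ()
main = do
  putStrLn chocolateCake
  putStrLn lemonCake
```

Note that the order of the definitions in the file does not matter. Just like in the recipe, each definition only refers to other definitions by name.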

So how does this relate to programming? If we imagine the ingredients, mixtures and cakes to be variables we realize that in the imperative recipe they change over time, being modified by our instructions. In the functional recipe they never change. They are produced once. The steps are not reliant on the state of those variables but only their existence. Once a variable is needed we can evaluate it. In the functional recipe some steps contain information on how to perform transformations, like baked in an oven for 25 minutes at 200°C. In software this would be represented by a function, which in turn can be parametrized. This is one of the main properties of functional programming: Functions are first-class objects. They can be passed as arguments to other functions!
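The idea of passing functions to other functions can be sketched in a few lines. The names `applyTwice` and `scale` below are hypothetical and only serve as an illustration.

```haskell
module Main where

-- A function that receives another function as its first argument
-- and applies it two times.
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)

-- A parametrized transformation: fixing the factor yields a new function.
scale :: Int -> Int -> Int
scale factor n = factor * n

main :: IO ()
main = do
  print (applyTwice (scale 3) 2)   -- 18
  print (map (scale 2) [1, 2, 3])  -- [2,4,6]
```

Here `scale 3` is itself a function (it multiplies its argument by three) that we hand over to `applyTwice` and `map` like any other value.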

This type of programming is also called declarative programming since the program can be interpreted as a single big definition made up of other smaller or even recursive definitions. This approach makes it much easier to create reusable code since the same definition might be used in different contexts. This makes it clear that we don’t want to have mutable variables since they don’t follow a clear definition but change over time. However, some functional programming languages allow mutable state. Those languages might be functional but they are not pure. For a programming language to be pure, functions cannot have side effects. A side effect is any interaction with the state of the program outside of a function while being inside that function. This means that functions are not allowed to do a number of things:

Input (reading) / Output (writing) of
- Files
- Network sockets
- Threads or other processes
- Databases
Random Numbers
Information on the current time
Access memory

Of course there is a way of doing all of these operations in pure languages by controlling the effects using a variety of software design patterns. In Haskell the chosen paradigm is called monadic programming, which we will eventually go into. This way of programming is quite different from the programming you are probably used to and forces us as programmers to rethink many concepts we have taken for granted. Interestingly, modern programming languages, be it Java, TypeScript, Python or even C++, employ an increasing number of functional concepts and features. Additionally, many new languages designed by large companies leave behind historically popular paradigms like imperative or object-oriented programming and focus more on a functional design. F# (Microsoft) and Reason (Facebook), two direct descendants of the functional programming language OCaml, serve as examples. Understanding functional concepts is an important step toward future-proofing yourself as a developer.
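As a first, simplified glimpse of how Haskell keeps effects under control, the following sketch separates a pure function from an action that inspects the program’s environment. The names `shout` and `greet` are made up, and reading the `USER` environment variable is just an assumed example of an effect.

```haskell
module Main where

import Data.Char (toUpper)
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- Pure: the type String -> String rules out any input or output.
shout :: String -> String
shout text = map toUpper text ++ "!"

-- Effectful: the IO in the type marks that this action interacts with
-- the outside world (here: an environment variable and the terminal).
greet :: IO ()
greet = do
  user <- lookupEnv "USER"  -- assumed variable; may not be set
  putStrLn (shout ("hello " ++ fromMaybe "stranger" user))

main :: IO ()
main = greet
```

The important detail is visible in the types alone: anything that performs an effect carries `IO` in its type, while `shout` is guaranteed to be free of effects.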

Additionally, Haskell uses a large number of exotic and foreign concepts as its main language features. Some of them are:

Monadic programming
Type classes
Lazy evaluation
Software Transactional Memory
Generalized algebraic data types

But why? What is it for? In software engineering we have figured out a number of principles to help our architecture stay maintainable. KISS (Keep it simple, stupid) and DRY (Don’t repeat yourself) are two examples. While we try our best to make our implementations follow these rules, we as software engineers often have to fight against the programming languages we are working with to achieve these goals. Sometimes we just have to copy code from one class to another because otherwise we would need to refactor our class hierarchy, and sometimes we cannot keep our procedures short and simple since there is no way to further break up the process we are trying to describe. Haskell’s declarative approach mixed with its high abstraction potential doesn’t just make it simple to follow clean coding principles but actively aids the developer in creating maintainable code, which is why the language is becoming more and more popular in the software industry. Haskell’s academic roots show in almost every language feature it presents. These features were not just created as fun thought experiments but have practical value if you know how to use them.

Industrial applications aside, Haskell also has a passionate open-source community that actively develops tools and libraries for the language. This passion might be fueled by intellectual curiosity or the need for a quicker and easier way to develop but I think there is a much more important reason: Haskell is fun! It’s that easy. Developers making a career of writing Haskell code seemingly do it out of this simple motivation. I know that my colleagues and I do it, at least partially, for this exact reason.

1.3 Usage of Abstraction

It is clear that no programming language is universally useful. Haskell is highly abstract. It redefines what a high-level language is. This means that the source code the developer writes is far removed from the actual instructions run by the processor in the end. This has clear upsides and downsides.

1.3.1 The Good Parts

Abstraction shines when complexity becomes overbearing, and modern software is full of complexities. Haskell makes them manageable, which makes it a prime candidate for use in cryptographic and distributed systems. By minimizing state, the developer minimizes the possibilities for bugs and unexpected behavior, which is why implementing security-related protocols and web server logic becomes almost stress free. Declarative languages are generally very definition heavy. This is great if the program’s logic is mostly based on definitions, which is true for compilers, transpilers and programs for file conversion.

Haskell is a general-purpose programming language. Generally speaking, any program could be written in it. The vast number of libraries available allows the language to be used in a wide variety of industry applications. Haskell truly shines in building large software with varied data sources, since at its very core lies the composition of disparate software components. After all, the whole is greater than the sum of its parts.

As beginners to the language we might ask: What projects can we do in this language? A good choice is a tool with well-defined input and output. Think of UNIX tools. They follow a simple philosophy: Do one thing and do it well. They often read their input, process it and produce some output. Haskell’s pure nature is ideal for such tools. Even if the actual task becomes slightly complicated, like computing advanced statistics on a bunch of data, it can often be modeled very easily. A nice example of such a tool is Pandoc, an open-source tool used for converting documents between various file types. It can take documents in e.g. HTML, OpenDocument or LaTeX and output equivalent documents in formats such as DocBook, EPUB or PDF.
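A tool in this UNIX spirit can be sketched in very few lines. The following hypothetical example numbers the lines of its input. The function `numberLines` is a made-up name, while `interact` is a standard function that feeds everything read from standard input through a pure function and prints the result.

```haskell
module Main where

-- The pure core of the tool: prefix every line with its line number.
numberLines :: String -> String
numberLines = unlines . zipWith prefix [1 :: Int ..] . lines
  where
    prefix n line = show n ++ ": " ++ line

-- The thin effectful shell: read standard input, transform it, print it.
main :: IO ()
main = interact numberLines
```

Assuming the compiled program is called `number`, it could be used in a shell pipeline such as `cat notes.txt | ./number`. All of the actual logic lives in a pure function that can be tested without touching any files.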

Unlike many other languages, Haskell was born in the world of academia and not in industry. Thus, it is no wonder that many programming projects in Haskell (by hobbyists and professionals alike) revolve around building compilers, interpreters and domain-specific languages. What could be more fun than programming in a language someone else wrote? Programming in a language you yourself wrote! Haskell allows you to define your own operators, which lets you create the components of your own domain-specific language within Haskell itself. The language’s academic roots also show in the many proof assistants and automatic reasoning tools written in it. These applications can be used for mathematical proofs or for testing software specifications.
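As a hedged sketch of what defining your own operators can look like, here is a tiny arithmetic language embedded in Haskell together with an interpreter for it. The type `Expr`, the operators `|+|` and `|*|` and the function `eval` are all invented for this illustration.

```haskell
module Main where

-- A tiny expression language, represented as an algebraic data type.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Show, Eq)

-- Custom operators with explicit precedence: |*| binds tighter than |+|.
infixl 6 |+|
infixl 7 |*|

(|+|) :: Expr -> Expr -> Expr
(|+|) = Add

(|*|) :: Expr -> Expr -> Expr
(|*|) = Mul

-- An interpreter for the little language.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

main :: IO ()
main = do
  let expr = Lit 1 |+| Lit 2 |*| Lit 3
  print expr         -- Add (Lit 1) (Mul (Lit 2) (Lit 3))
  print (eval expr)  -- 7
```

The expression `Lit 1 |+| Lit 2 |*| Lit 3` is ordinary Haskell code, yet it reads like a sentence in our own small language.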

Haskell’s ability to simplify complex software architectures makes it a popular choice for back-end software for data analysis or complex data management. It is used by large organizations like Facebook, Target, Barclays, Standard Chartered and NASA in data-intensive applications, and there is a good reason for it. Haskell code tends to be more reliable, easier to refactor and much simpler to test and verify. If you need a safe bet when writing software, Haskell is a very good choice.

I personally find Haskell to be an excellent choice for rapid prototyping because it is easy to create simple applications with just a few lines of code and then extend those lines often without the need for any refactoring. You can turn a single threaded processing model into a multi-threaded one with just a few lines of code and no additional changes. You can often switch out configuration file structures by simply switching from one parser to the other without any other major changes. You can prototype an application with file input and then much later down the line switch to network input, often without changing most parts of your system. Haskell applications often grow very naturally with your changing needs.

1.3.2 The Bad Parts

Abstractions are great. However, Haskell is as far removed from low-level code as a space shuttle is from the Mariana Trench. This has a profound downside: As a programmer you are never fully in control of what is going on in your program. Between your intentions and the actual hardware sits a runtime you have little to no control over. Threads are handled by it. Memory gets allocated whenever the runtime pleases. The garbage collector starts acting whenever it deems it necessary. Of course, in some ways, this makes the developer’s job easier, but it makes the language largely unsuitable for a number of applications:

Real-time critical applications
State heavy programs such as video games or multimedia applications
Device drivers or operating systems

It is by no means impossible to write such applications in Haskell. However, doing so will prove a challenge.

1.4 The Things We Learn

Some people claim you have to have studied maths in order to learn Haskell. This book tries to break this stereotype once and for all by explaining the fundamental concepts of the language in an easy-to-understand manner. You don’t need any prior exposure to functional programming at all! However, you should be familiar with procedural or imperative programming (C, Java, Python, etc.) and have a basic understanding of algorithms and data structures as well as operating and file systems. The ideal reader has:

At least 1-2 years of programming experience
Worked on different (small) software projects and knows the problems that can arise in real-world applications
Some basic knowledge on operating systems (specifically UNIX)

Unlike many other materials, this book will not give you a crash course on the most advanced techniques and concepts in Haskell but will highlight fun, creative and useful projects, showing you how to tackle certain problems that arise when writing real applications. The chapters will also present best practices and explain why these practices are important. Haskell can be an ocean of foreign concepts and this book is not trying to give you a deep dive. Rather, it tries to be the diving instructor showing you the safer, shallower depths of this ocean. By the end of the last chapter you should feel comfortable boarding a submarine and exploring the deepest depths yourself. It’s going to be an amazing journey.

This book is designed to cover a large variety of projects. We will begin by writing beginner-friendly tools like a simple (but clever) artificial intelligence for a special variety of the game word ladder and a CSV tool, which will be capable of neatly formatting a CSV file and printing it as a table in ASCII form as well as providing additional features like searching. Later in the book we will cover more data-intensive work like working with audio and image files and manipulating them within our software, even creating our own musical synthesizer and multi-threaded image processing library. At some point we will honor the academic roots of this language by writing our own toy programming language complete with its own interpreter! Of course no book on Haskell would be complete without trying to emulate the big boys by creating our own microservice: a web server capable of answering requests by performing actions that we can configure on the spot. All the while we will have a chance to look at some advanced Haskell libraries in action!

We will mostly focus on writing applications for UNIX-like systems such as Linux, BSD or macOS. If you want to use Haskell on Windows, that is definitely possible by using WSL (Windows Subsystem for Linux). While Haskell can also be used natively on Windows, I personally would not recommend it. Most of the projects will consist of applications running on the command line or terminal. This is a deliberate choice, since it simplifies the development process when we can concentrate on minimal user interfaces and don’t have to worry about GUI programming.

After reading this book, readers will feel comfortable using Haskell to implement real-world projects without falling into the typical pitfalls that beginners usually encounter. Additionally, readers will know how to apply the functional concepts they have learned to other languages.

1.5 Summary

Haskell is a pure functional programming language focusing on safe and composable code
Safe code disallows dangerous operations, which translates to fewer bugs and less undefined behavior, thus better reflecting the programmer’s intention
Haskell has strong roots in academia and presents many results from contemporary programming theory in its feature set
Haskell is a garbage-collected and compiled language featuring native support for parallelism and concurrency
Pure functions are functions with no side effects, only consuming some input and producing some output
Functional and declarative programming focuses on the definitions of intermediate results instead of providing a sequence of instructions
A side effect is any interaction with the state of the program outside of a function while being inside that function
Haskell’s declarative approach mixed with its abstractions aids the programmer in following clean coding principles
Haskell’s abstractions simplify complex software architectures, making it a popular choice for back-end software for data analysis or complex data management
