chapter two

2 Strings

This chapter covers

Defining strings in the ECMAScript specification
Comparing ASCII, Latin-1, and Unicode encodings
Examining UTF-32, UTF-16, and UTF-8 formats
Distinguishing DOMString, ByteString, and USVString
Exploring V8’s string implementation strategies

In Chapter 1, we saw how identical JavaScript code produces different results across runtimes. Some of these differences stem from how runtimes handle the most fundamental data type: strings. As we'll discover through a real debugging story, even something as basic as reading a filename can behave differently depending on assumptions about character encoding.
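As a minimal illustration of this kind of encoding assumption (a hypothetical filename, not the debugging story itself), the sketch below takes the UTF-8 bytes for a name and decodes them two ways. It assumes a runtime that provides the standard `TextEncoder` and `TextDecoder` globals, as modern browsers, Node.js, and Deno do. Mapping each byte to one character with `String.fromCharCode` stands in for a Latin-1 decoder.

```javascript
// Hypothetical filename, encoded to the UTF-8 bytes a file system might store.
const bytes = new TextEncoder().encode("café.txt"); // "é" becomes 0xC3 0xA9

// Decoding those bytes as UTF-8 recovers the original name.
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Decoding the same bytes as Latin-1 (one byte = one character) does not.
// String.fromCharCode maps each byte 0x00-0xFF to the code point with the
// same value, which is exactly what ISO-8859-1 (Latin-1) does.
const asLatin1 = String.fromCharCode(...bytes);

console.log(asUtf8);              // "café.txt"
console.log(asLatin1);            // "cafÃ©.txt"
console.log(asUtf8 === asLatin1); // false: same bytes, different strings
```

A lookup that builds a name under one assumption and compares it with a name built under the other will miss, even though the underlying bytes never changed.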

Most of the time, a JavaScript implementation takes care of language details that user code (that is, your code) should never have to worry about. If that’s the case, why talk about these internal details at all? Well, breaking down the internals serves two purposes:

First, it further illustrates the fundamental difference between language-defined behaviors and implementation-defined behaviors.
Second, it turns out that how an application uses strings can carry a very real performance cost.
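To make the first point concrete, here is a small sketch of string behavior that is language-defined. The ECMAScript specification defines a String as a sequence of 16-bit code units, so the results below should be identical in every conforming engine. The example string and variable names are illustrative only.

```javascript
// ECMAScript defines a String as a sequence of 16-bit (UTF-16) code units,
// so every conforming engine must produce these same results.
const ascii = "hi";
const emoji = "\u{1F600}"; // U+1F600, outside the Basic Multilingual Plane

console.log(ascii.length);                      // 2: one code unit per character
console.log(emoji.length);                      // 2: a surrogate pair
console.log(emoji.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(emoji.charCodeAt(1).toString(16));  // "de00" (low surrogate)
console.log(emoji.codePointAt(0).toString(16)); // "1f600"
console.log([...emoji].length);                 // 1: iteration walks code points

// What the specification does NOT define is how an engine lays these code
// units out in memory. That choice is implementation-defined and cannot be
// observed from the results above.
```

Because the storage layout is invisible to user code, an engine is free to choose whatever internal representation performs best, which is where the second point about performance comes in.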

In this chapter, we will start by explaining some of the background of what makes a string a string and how strings encode information. We will then conclude with a brief look at how JavaScript engines implement strings efficiently to maximize performance.

2.1 What is a string?
