chapter two

2 Strings

 

This chapter covers

  • Defining strings in the ECMAScript specification
  • Comparing ASCII, Latin-1, and Unicode encodings
  • Examining UTF-32, UTF-16, and UTF-8 formats
  • Distinguishing DOMString, ByteString, and USVString
  • Exploring V8’s string implementation strategies

In Chapter 1, we saw how identical JavaScript code produces different results across runtimes. Some of these differences stem from how runtimes handle one of the most fundamental data types: strings. As we’ll discover through a real debugging story, even something as basic as reading a filename can behave differently depending on assumptions about character encoding.
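The debugging story comes later in the chapter. As a minimal, hypothetical illustration of the general hazard (not necessarily the cause in that story), the following sketch decodes one byte sequence under two different encoding assumptions. The byte values are an assumed example of how a file system might store the UTF-8 name “café”. UTF-8 decoding uses the standard `TextDecoder`; Latin-1 decoding is done by hand, because Latin-1 maps each byte directly to the code point with the same value.

```javascript
// Hypothetical input: the UTF-8 bytes a file system might store for "café".
const bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);

// Assumption 1: the bytes are UTF-8, so 0xC3 0xA9 together encode U+00E9 (é).
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Assumption 2: the bytes are Latin-1 (ISO-8859-1), so every byte becomes
// one code point: 0xC3 is "Ã" and 0xA9 is "©".
const asLatin1 = String.fromCharCode(...bytes);

console.log(asUtf8);              // café
console.log(asLatin1);            // cafÃ©
console.log(asUtf8 === asLatin1); // false
```

Neither decoding throws an error; each decoder does exactly what it was told. Code that then compares the mis-decoded name against the expected one simply finds no match.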

Most of the time, a JavaScript implementation takes care of language details that user code (that is, your code) never has to think about. If that’s the case, why talk about these internal details at all? Breaking down the internals serves two purposes:

  • First, it further illustrates the fundamental difference between language-defined behaviors and implementation-defined behaviors.
  • Second, it turns out that the way an application uses strings can carry a very real performance cost.
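As a small illustration of the first point, the sketch below contrasts the two kinds of behavior. The values in the first half are fixed by the ECMAScript specification, which defines a string as a sequence of 16-bit code units, so every conforming engine must agree on them. The comment in the second half describes something the specification leaves to the engine; the one-byte versus two-byte layout mentioned there is only an example of what an engine may choose.

```javascript
// Language-defined behavior: ECMAScript treats a string as a sequence of
// 16-bit code units, so every conforming engine must produce these results.
const emoji = "\u{1F600}"; // 😀, a code point above U+FFFF

console.log(emoji.length);                      // 2: stored as a UTF-16 surrogate pair
console.log(emoji.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(emoji.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log([...emoji].length);                 // 1: string iteration walks code points
console.log(emoji.codePointAt(0).toString(16)); // 1f600

// Implementation-defined behavior: how the engine lays a string out in memory
// (for example, one byte versus two bytes per character for a string like
// "abc") is not specified, and JavaScript code cannot observe it directly.
const ascii = "abc";
console.log(ascii.length); // 3 on every engine, whatever the internal layout
```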

In this chapter, we start by explaining the background of what makes a string a string and how strings encode information. We then conclude with a brief look at how JavaScript engines implement strings efficiently to maximize performance.
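As a small preview of what it means for a string to encode information, here is a sketch that measures how many bytes the same characters need in UTF-8, UTF-16, and UTF-32. The helper name `encodedSizes` is hypothetical. Only the UTF-8 count comes from a built-in API (`TextEncoder`); the other two are computed from the definitions of those formats, assuming well-formed strings with no lone surrogates.

```javascript
// Hypothetical helper: bytes needed to store `str` in each Unicode encoding form.
function encodedSizes(str) {
  // TextEncoder always produces UTF-8.
  const utf8 = new TextEncoder().encode(str).length;
  // JavaScript strings are already sequences of UTF-16 code units,
  // so each unit counted by .length occupies 2 bytes.
  const utf16 = str.length * 2;
  // UTF-32 uses exactly 4 bytes per code point; spreading a string
  // iterates by code point rather than by code unit.
  const utf32 = [...str].length * 4;
  return { utf8, utf16, utf32 };
}

for (const s of ["A", "\u00e9", "\u20ac", "\u{1F600}"]) {
  console.log(s, encodedSizes(s));
}
// "A"  -> utf8: 1, utf16: 2, utf32: 4
// "é"  -> utf8: 2, utf16: 2, utf32: 4
// "€"  -> utf8: 3, utf16: 2, utf32: 4
// "😀" -> utf8: 4, utf16: 4, utf32: 4
```

UTF-8 is the most compact for ASCII text, UTF-16 needs either 2 or 4 bytes per character, and UTF-32 always spends 4.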

2.1 What is a string?

2.1.1 Let’s talk about encodings

2.1.2 Unicode, UTF-32, UTF-16, and UTF-8

2.1.3 DOMString vs. ByteString vs. USVString

2.1.4 So, about those “file not found” errors…

2.2 How V8 implements strings

2.2.1 Setting the stage: pathologically bad template generation

2.2.2 String memory

2.2.3 V8’s multiple string types

2.3 Summary