2 Strings
This chapter covers
- Defining strings in the ECMAScript specification
- Comparing ASCII, Latin-1, and Unicode encodings
- Examining UTF-32, UTF-16, and UTF-8 formats
- Distinguishing DOMString, ByteString, and USVString
- Exploring V8’s string implementation strategies
In Chapter 1, we saw how identical JavaScript code produces different results across runtimes. Some of these differences stem from how runtimes handle the most fundamental data type: strings. As we'll discover through a real debugging story, even something as basic as reading a filename can behave differently depending on assumptions about character encoding.
Most of the time, a JavaScript implementation handles the details of the language internally, and user code (that is, your code) never has to worry about how those details are implemented. If that’s the case, why talk about these internals at all? Breaking them down serves two purposes:
- First, it further illustrates the fundamental difference between language-defined behaviors and implementation-defined behaviors.
- Second, the way an application uses strings can carry a very real performance cost, depending on how the engine implements them.
In this chapter, we will start with the background of what makes a string a string and how strings encode information. We will then conclude with a brief look at how JavaScript engines implement strings efficiently to maximize performance.