chapter eleven

11 Working with strings


This chapter covers

  • Understanding the relationship between Unicode, code points, and UTF-8 encoding
  • Comparing strings, converting them to lowercase, and performing other string operations
  • When and how to use raw strings
  • Different kinds of non-standard string literals: regular expressions, MIME types, and BigInt literals

We have already had some hands-on experience working with strings in earlier chapters. However, to use text strings correctly, there are many details worth knowing. In this chapter we will examine these details more closely. As long as you are working only with the letters A-Z, things are simple. However, the world has a multitude of languages, each with its own set of characters, and Julia needs to be able to deal with all of them.

That means working effectively with Julia strings requires at least some knowledge of Unicode. Unicode is the international standard for mapping numbers (code points) to characters.
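The mapping between numbers and characters can be explored directly. The following is a minimal sketch using Julia's built-in codepoint and Char functions. The particular characters are arbitrary picks for illustration.

```julia
# A Char holds one Unicode character; codepoint() returns its code point number.
for c in ['A', 'é', 'ж', '€']
    # Show each code point in the conventional U+XXXX hex form and as a decimal integer
    hex = uppercase(string(codepoint(c), base = 16, pad = 4))
    println(c, " => U+", hex, " (", Int(codepoint(c)), ")")
end

# The mapping also works in reverse: Char(n) gives the character for code point n.
euro = Char(0x20AC)
println(euro)
```

Because a Char simply wraps a code point, you can convert between the two freely without any knowledge of how the text will later be encoded as bytes.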

Julia also supports special string literals that help with a variety of tasks. For example, regular expressions are special strings that let you check whether another string matches a particular pattern, such as an email address, an IP address, or a ZIP code.
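As a taste of what a regular expression literal looks like, here is a minimal sketch that checks candidate ZIP codes. The pattern is a simplifying assumption of our own, accepting five digits optionally followed by a hyphen and four more digits (US ZIP+4). It is not a complete validator.

```julia
# r"..." creates a Regex object. ^ and $ anchor the match to the whole string.
zip_pattern = r"^\d{5}(-\d{4})?$"

for s in ["90210", "90210-1234", "9021", "ABCDE"]
    # occursin(regex, string) returns true if the pattern matches
    println(rpad(s, 12), occursin(zip_pattern, s) ? "matches" : "no match")
end

# match returns a RegexMatch (or nothing), giving access to the matched text.
m = match(r"\d{5}", "Ship to 10001 today")
m === nothing || println("Found ZIP: ", m.match)
```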

11.1 UTF-8 and Unicode

Text strings in Julia are Unicode, encoded in UTF-8 format. What does that mean, and should you even care? Let me walk you through a simple example that shows why you need a better understanding of Unicode.
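Here is one minimal sketch of the kind of surprise UTF-8 can cause. The string is an arbitrary choice for illustration. In UTF-8, ASCII characters take one byte each, while other characters take two to four bytes. Julia string indices are byte offsets, so not every integer index lands at the start of a character.

```julia
s = "naïve café"

# length counts characters (code points); ncodeunits counts UTF-8 bytes.
println(length(s))        # 10
println(ncodeunits(s))    # 12, because 'ï' and 'é' each take two bytes

# 'ï' occupies bytes 3 and 4, so index 4 falls in the middle of it.
println(isvalid(s, 3))    # true
println(isvalid(s, 4))    # false

# Indexing into the middle of a character throws an error.
try
    s[4]
catch err
    println(typeof(err))  # StringIndexError
end

# eachindex yields only the byte positions where characters start.
for i in eachindex(s)
    print(s[i], '@', i, ' ')
end
println()
```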

11.1.1 Understanding the relationship between code points and code units

11.2 String operations

11.2.1 Camel case to snake case

11.2.2 Converting between numbers and strings

11.2.3 String interpolation and concatenation

11.2.4 sprintf formatting

11.3 Using string interpolation to generate code

11.4 Working with non-standard string literals

11.4.1 DateFormat strings

11.4.2 Raw strings

11.4.3 Using regular expressions to match text

11.4.4 Making large integers with BigInt

11.4.5 MIME types

11.5 Summary