This chapter covers
- Understand the relation between Unicode, code points, and UTF-8 encoding
- Compare strings, convert them to lowercase, and perform other string operations
- When and how to use raw strings
- Learn about different kinds of string literals: regular expressions, MIME types, and BigInt literals
We have had some hands-on experience working with strings in earlier chapters. However, to use text strings correctly, there are many details worth knowing. In this chapter, we will examine these details more closely. As long as you are working with the letters A-Z, things are simple. However, there are a multitude of languages in the world, each with its own unique set of characters, which Julia needs to be able to deal with.
That means working effectively with Julia strings requires at least some knowledge of Unicode. Unicode is the international standard for mapping numbers (code points) to characters.
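To make the mapping between numbers and characters concrete, here is a minimal sketch you could try in the Julia REPL. The variable names are only illustrative; `Int`, `Char`, and `codepoint` are functions from Julia's standard library.

```julia
# Convert a character to its Unicode code point (a number)
code_A = Int('A')               # 65

# Convert a code point back to the character it represents
char_A = Char(65)               # 'A'

# codepoint returns the code point as an unsigned 32-bit integer
code_ntilde = codepoint('ñ')    # 0x000000f1, which is 241 in decimal
```

The same conversions work for characters outside the A-Z range, which is what makes code points useful when dealing with text in other languages.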
Julia also supports special string literals that help with a variety of tasks. For example, there are special strings called regular expressions, which allow you to check whether another string matches a particular pattern, such as an email address, IP address, or ZIP code.
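As a small taste of what such a check might look like, the sketch below tests whether a string looks like a five-digit ZIP code. The pattern is a deliberately simplified assumption for illustration and does not cover every valid ZIP code format; the `r"..."` literal and `occursin` are part of standard Julia.

```julia
# Simplified, illustrative pattern: exactly five digits and nothing else
zip_pattern = r"^\d{5}$"

# occursin checks whether the pattern matches somewhere in the string
matches_valid   = occursin(zip_pattern, "90210")   # true
matches_invalid = occursin(zip_pattern, "9021a")   # false
```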
Text strings in Julia are Unicode, encoded in UTF-8 format. What does that mean, and should you even care? Let me walk you through a simple example to motivate your need to understand Unicode better.