6 Strings

This chapter covers

Problems with characters that don’t fit the Java ‘char’ type
Bugs caused by relying on the default system locale
Discrepancies between format string and subsequent format arguments
Accidental use of regular expressions
Pitfalls around Java escape sequences
Possible mistakes when using the indexOf() method

There are many possible bugs involving strings. Strings look deceptively simple, yet working with them correctly is quite difficult, and many common assumptions about them are wrong.

6.1 Mistake #45. Assuming that char value is a character

Developers often assume that the Java char type corresponds to a single displayed character, that the String.length method returns the number of displayed characters, or that strings can be processed char by char. This holds in simple cases, but a character whose Unicode code point is 0x10000 or higher lies outside the so-called Basic Multilingual Plane (BMP) and is represented as a surrogate pair: two Java char values that together encode a single character. Many emoji characters are located outside the BMP and require a surrogate pair to be represented.
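To make the difference concrete, here is a minimal sketch that compares the number of char values with the number of code points. The class name SurrogateDemo and its helper methods are made up for illustration; the String and Character methods they call are standard.

```java
public class SurrogateDemo {
    // Number of UTF-16 code units: this is what String.length() reports.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points: a surrogate pair is counted once.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face) lies outside the BMP; it is spelled here
        // explicitly as its surrogate pair \uD83D\uDE00.
        String s = "a\uD83D\uDE00";
        System.out.println(charCount(s));       // 3
        System.out.println(codePointCount(s));  // 2
        // The char at index 1 is only the first half of the emoji.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```

Note that a code point is still not necessarily a displayed character: combining marks and joined emoji sequences can span several code points, so this sketch only addresses the surrogate pair issue.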

For example, if it’s necessary to split the text into fixed-size chunks to distribute it across several rows in the UI, a naïve approach would be to use something like this (for simplicity, let’s omit bounds checking):

String part = string.substring(rowNumber * rowLength,
                              (rowNumber + 1) * rowLength);
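The snippet above counts char values, so a chunk boundary can fall in the middle of a surrogate pair and leave half of a character on each row. One possible alternative, shown here as a sketch rather than a definitive fix, is to measure rows in code points using String.offsetByCodePoints. The class CodePointChunks and the methods naiveRow and row are hypothetical names, and the clamping to the end of the text is an added assumption, since the original snippet omits bounds checking.

```java
public class CodePointChunks {
    // The naive version from the text: rowLength is measured in char values.
    static String naiveRow(String text, int rowNumber, int rowLength) {
        return text.substring(rowNumber * rowLength,
                              (rowNumber + 1) * rowLength);
    }

    // Hypothetical helper: rowLength is measured in code points, so a
    // surrogate pair is never split between two rows.
    static String row(String text, int rowNumber, int rowLength) {
        int total = text.codePointCount(0, text.length());
        // Clamp the requested range to the text (an assumption of this sketch).
        int fromCp = Math.min(rowNumber * rowLength, total);
        int toCp = Math.min(fromCp + rowLength, total);
        // Translate code point positions into char indexes.
        int from = text.offsetByCodePoints(0, fromCp);
        int to = text.offsetByCodePoints(from, toCp - fromCp);
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        // Five code points, six char values: 'a', 'b', U+1F600, 'c', 'd'.
        String text = "ab\uD83D\uDE00cd";
        String broken = naiveRow(text, 0, 3);
        // The naive row ends with a lone high surrogate.
        System.out.println(Character.isHighSurrogate(broken.charAt(2))); // true
        System.out.println(row(text, 0, 3).equals("ab\uD83D\uDE00"));    // true
        System.out.println(row(text, 1, 3));                             // cd
    }
}
```

Each call to codePointCount and offsetByCodePoints scans the string, so for long texts it may be preferable to walk the string once and remember the row boundaries. Rows built this way can still split a sequence of several code points that is displayed as one character.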

6.2 Mistake #46. Incorrect case conversion in Turkish locale

6 Strings

This chapter covers

6.1 Mistake #45. Assuming that char value is a character

6.2 Mistake #46. Incorrect case conversion in Turkish locale

6.3 Mistake #47. Using String.format with the default locale

6.4 Mistake #48. Mismatched format arguments

6.5 Mistake #49. Using plain strings instead of regular expressions

6.6 Mistake #50. Accidental use of replaceAll

6.7 Mistake #51. Accidental use of escape sequences

6.8 Mistake #52. Comparing strings in different case

6.9 Mistake #53. Not checking the result of indexOf method

6.10 Mistake #54. Mixing arguments of indexOf

6.11 Summary
