After reading lesson 8, you’ll be able to
- Manipulate substrings
- Do mathematical operations with strings
If you’re given a long file, it’s typical to read the entire file as one large string. But working with such a large string can be cumbersome. One useful thing you might do is break it into smaller substrings, most often at newlines, so that every paragraph or data entry can be examined separately. Another is to find multiple instances of the same word. You could decide that using the word “very” more than 10 times is annoying. Or if you’re reading the transcript of someone’s award acceptance speech, you may want to find all instances of the word “like” and remove them before posting it.
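As a minimal sketch of these ideas in Python, the snippet below breaks a string at newlines, counts a word, and removes a filler word. The speech text and all variable names are made up for illustration.

```python
# Hypothetical two-line transcript stored as one large string.
speech = "I am, like, very very happy.\nThis is, like, a very big honor."

# Break the large string into one substring per line.
paragraphs = speech.split("\n")

# Count how many times "very" appears anywhere in the string.
very_count = speech.count("very")

# Remove the filler ", like," wherever it appears.
cleaned = speech.replace(", like,", "")

print(len(paragraphs))  # 2
print(very_count)       # 3
print(cleaned)
```

Note that `count` matches substrings rather than whole words, so a word such as “every” would also be counted. Splitting the text into words first is one way around that.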
Consider this
While researching the way that teens text, you gather some data. You’re given a long string with many lines, in the following format:
- #0001: gr8 lets meet up 2day
- #0002: hey did u get my txt?
- #0003: ty, pls check for me
- ...
Given that this is originally one large string, what are some steps you could take to make the data easier to analyze?
1. Separate the big data string into a substring for each line.
2. Replace common acronyms with proper words (for example, “pls” with “please”).
3. Count the number of times certain words occur in order to report on the most popular acronyms.
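The three steps above can be sketched in Python roughly as follows. This is only one possible approach: the sample string reuses the three messages shown earlier, while the acronym table and every variable name are assumptions made for illustration.

```python
from collections import Counter

# The sample messages from above, stored as one large string.
data = (
    "#0001: gr8 lets meet up 2day\n"
    "#0002: hey did u get my txt?\n"
    "#0003: ty, pls check for me"
)

# Hypothetical acronym table; a real study would need a much larger one.
acronyms = {
    "gr8": "great", "2day": "today", "u": "you",
    "txt": "text", "ty": "thank you", "pls": "please",
}

# Step 1: separate the big string into one substring per line.
lines = data.splitlines()

# Steps 2 and 3: replace acronyms word by word and tally each one seen.
counts = Counter()
expanded_lines = []
for line in lines:
    # Split "#0001: message" into the label and the message text.
    label, _, message = line.partition(": ")
    words = []
    for word in message.split():
        core = word.strip(",.?!")  # ignore surrounding punctuation
        if core in acronyms:
            counts[core] += 1
            word = word.replace(core, acronyms[core])
        words.append(word)
    expanded_lines.append(label + ": " + " ".join(words))

for line in expanded_lines:
    print(line)
print(counts.most_common(3))
```

Stripping punctuation before the lookup lets “txt?” and “ty,” match their table entries while keeping the punctuation in the rewritten message.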