Lesson 8. Advanced string operations

After reading lesson 8, you’ll be able to

  • Manipulate substrings
  • Do mathematical operations with strings

If you’re given a long file, it’s typical to read the entire file as one large string. But working with such a large string can be cumbersome. One useful thing you might do is break it into smaller substrings, most often by new lines, so that every paragraph or every data entry can be looked at separately. Another useful thing is to find multiple instances of the same word. You could decide that using the word "very" more than 10 times is annoying. Or if you’re reading the transcript of someone’s award acceptance speech, you may want to find all instances of the word "like" and remove them before posting it.

Consider this

While researching the way that teens text, you gather some data. You’re given a long string with many lines, in the following format:

  • #0001: gr8 lets meet up 2day
  • #0002: hey did u get my txt?
  • #0003: ty, pls check for me
  • ...

Given that this is originally one large string, what are some steps that you could take to make the data more approachable by analyzing it?

Answer:

1.  Separate the big data string into a substring for each line.

2.  Replace common acronyms with proper words (for example, pls with please).

3.  Count the number of times certain words occur in order to report on the most popular acronyms.
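The three steps in the answer can be sketched in a few lines of Python. This is only an illustration: the sample data below is made up to match the format shown above, and splitlines() is one possible way to separate lines.

```python
# Hypothetical sample data in the format shown above (not real research data).
data = "#0001: gr8 lets meet up 2day\n#0002: hey did u get my txt?\n#0003: ty, pls check for me"

# Step 1: separate the big string into one substring per line.
lines = data.splitlines()

# Step 2: replace a common acronym with the proper word.
expanded = data.replace("pls", "please")

# Step 3: count how many times an acronym occurs.
pls_count = data.count("pls")

print(lines)
print(expanded)
print(pls_count)
```

Note that count() and replace() work on substrings, not whole words, so "pls" would also match inside a longer word that happens to contain those letters.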

8.1. Operations related to substrings

In lesson 7, you learned to retrieve a substring from a string when you knew what indices you wanted to use. You can do more-advanced operations that can give you more information regarding the composition of a string.

8.1.1. Find a specific substring in a string with find()

Suppose you have a long list of filenames on your computer and want to find out whether a specific file exists, or you want to search for a word in a text document. You can find a particular case-sensitive substring inside a larger string by using the find() command.

As with the commands to manipulate case, you write the string you want to do the operation on, then a dot, then the command name, and then the parentheses. For example: "some_string".find(). Note that the empty string, '', is in every string.

But this isn’t all. In addition, you tell the command what substring you want to find by putting it in the parentheses: for example, "some_string".find("ing").

The substring you want to find must be a string object. The result you get back is the index (starting from 0), in the string, where the substring starts. If more than one substring matches, you get the index of the first one found. If the substring isn’t in the string, you get -1. For example, "some_string".find("ing") evaluates to 8 because "ing" starts at index 8 in "some_string".
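The rules for find() described above can be checked with a short sketch. The variable name text is only for illustration.

```python
text = "some_string"

print(text.find("ing"))   # 8: "ing" starts at index 8
print(text.find("s"))     # 0: two matches, the first one found is reported
print(text.find("xyz"))   # -1: the substring isn't in the string
print(text.find("ING"))   # -1: find() is case-sensitive
print(text.find(""))      # 0: the empty string is found right at the start
```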

If you want to start looking for a substring from the end of the string instead of the beginning, you can use a different command, rfind(). The r in rfind stands for reverse find. It looks for the substring nearest to the end of the string and reports the index (starting from 0) at which the substring starts.

If you have who = "me myself and I", then figure 8.1 shows how to evaluate the following:

  • who.find("and") evaluates to 10 because the substring starts at index 10.
  • who.find("you") evaluates to -1 because the substring isn’t in the string.
  • who.find("e") evaluates to 1 because the first occurrence of the substring is at index 1.
  • who.rfind("el") evaluates to 6 because the first occurrence of the substring nearest to the end of the string is at index 6.

Figure 8.1. Four examples of substrings to find in the string "me myself and I". The arrows indicate the direction in which you’re finding the substring. The check mark tells you the index in which you found the substring. An x tells you that the substring wasn’t found.
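To see the difference between find() and rfind() side by side, here is a small sketch using the same who string. The two commands disagree only when the substring occurs more than once.

```python
who = "me myself and I"

# One occurrence: both directions report the same index.
print(who.find("and"))    # 10
print(who.rfind("and"))   # 10

# Two occurrences: find() reports the first, rfind() the last.
print(who.find("e"))      # 1
print(who.rfind("e"))     # 6
print(who.find("m"))      # 0
print(who.rfind("m"))     # 3

# No occurrence: both report -1.
print(who.rfind("you"))   # -1
```

In every case, the index reported is where the substring starts, counted from the beginning of the string, even when the search runs from the end.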

Quick check 8.1

You’re given a = "python 4 ever&EVER". Evaluate the following expressions. Then try them in Spyder to check yourself:

1.  a.find("E")

2.  a.find("eve")

3.  a.rfind("rev")

4.  a.rfind("VER")

5.  a.find(" ")

6.  a.rfind(" ")

8.1.2. Find out whether a substring is in the string with “in”

The find and rfind operations tell you where to find a substring. Sometimes you only want to know whether the substring is in the string. This is a small variation on find and rfind. You can get the yes or no answer to this question more efficiently when you don’t need to know the exact location of the substring. Because there are only two values, the answer to this question is an object of type Boolean, and the value you get back will be either True or False. The operation to find the answer to this question uses the keyword in. For example, "a" in "abc" is an expression that evaluates to True because the string "a" is in the string "abc". The keyword in is used frequently in Python because it makes a lot of the code you write look very much like English.
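A short sketch of the in keyword follows. The filename is a made-up example, and the last line shows how the same yes or no answer could be obtained, less readably, from find().

```python
filename = "notes_lesson8.txt"   # hypothetical filename, for illustration only

print("a" in "abc")              # True
print("A" in "abc")              # False: the check is case-sensitive
print(".txt" in filename)        # True
print(".pdf" in filename)        # False

# The same question asked with find(): -1 means "not found".
print(filename.find(".txt") != -1)   # True
```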

Quick check 8.2

You’re given a = "python 4 ever&EVER". Evaluate the following expressions. Then try them in Spyder to check yourself:

1.  "on" in a

2.  "" in a

3.  "2 * 2" in a

8.1.3. Count the number of times a substring occurs with count()

Especially when editing a document, you’ll find it useful to make sure you aren’t overusing words. Suppose you’re editing an essay and find that within the first paragraph, you used the word "so" five times already. Instead of manually counting the number of times that word occurs in the whole essay, you can take the text you’ve written and automatically find the number of times the substring "so" occurs by using an operation on strings.

You can count the number of times a substring occurs in a string by using count(), which will give you back an integer. For example, if you have fruit = "banana", then fruit.count("an") evaluates to 2. One important point about count() is that it doesn’t count overlapping substrings. fruit.count("ana") evaluates to 1 because the "a" overlaps between the two occurrences of "ana", as shown in figure 8.2.

Figure 8.2. Counting the number of occurrences of "ana" in the string "banana". The answer is 1 because "a" overlaps between the two occurrences, and the Python count() command doesn’t take this into account.
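If you ever do need overlapping matches counted, one possible approach is to call find() repeatedly, each time starting one position after the previous match. The helper below is a hypothetical sketch, not part of Python, and it uses a function and a loop, which go beyond what this lesson covers. It relies on the optional second argument of find(), which is the index to start searching from.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps (hypothetical helper)."""
    if sub == "":
        return 0  # assumption: treat the empty substring as having no matches
    total = 0
    start = text.find(sub)
    while start != -1:
        total += 1
        # Search again, starting one character after the last match.
        start = text.find(sub, start + 1)
    return total

fruit = "banana"
print(fruit.count("an"))                 # 2
print(fruit.count("ana"))                # 1: count() skips the overlapping match
print(count_overlapping(fruit, "ana"))   # 2: matches at index 1 and index 3
```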

Quick check 8.3

You’re given a = "python 4 ever&EVER". Evaluate the following expressions. Then try them in Spyder to check yourself:

1.  a.count("ev")

2.  a.count(" ")

3.  a.count(" 4 ")

4.  a.count("eVer")

8.1.4. Replace substrings with replace()

Suppose your son wrote a short report on his favorite fruit: apples. The morning of the day it’s due, he changes his mind, hates apples, and now loves pears. You can take his entire report as a string and easily replace all instances of the word apple with pear.

A final useful string operation is to replace one substring in the string with another substring. This command operates on a string, as the previous ones do, but you have to put in two items in the parentheses, separated by a comma. The first item is the substring to find, and the second is the substring replacement. This command replaces all occurrences. For example, "variables have no spaces".replace(" ", "_") replaces all occurrences of the space string with the underscore string in the string "variables have no spaces", and evaluates to "variables_have_no_spaces".
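Here is a minimal sketch of the fruit report scenario. The report text is invented for illustration. It also shows that replace() gives you back a new string and leaves the original one untouched.

```python
# Hypothetical report text, for illustration only.
report = "My favorite fruit is the apple. Every apple is tasty."

updated = report.replace("apple", "pear")

print(updated)   # My favorite fruit is the pear. Every pear is tasty.
print(report)    # the original string is unchanged

print("variables have no spaces".replace(" ", "_"))   # variables_have_no_spaces
```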

Quick check 8.4

You’re given a = "Raining in the spring time." Evaluate the following expressions. Then try them in Spyder to check yourself:

1.  a.replace("R", "r")

2.  a.replace("ing", "")

3.  a.replace("!", ".")

4.  b = a.replace("time","tiempo")

8.2. Mathematical operations

You can do only two mathematical operations on string objects: addition and multiplication.

Addition, which is allowed only between two string objects, is called concatenation. For example, "one" + "two" evaluates to 'onetwo'. When you add two strings, you put the values of each string together, in the order of the addition, to make a new string object. You may want to add one string to another if, for example, you have three people who worked on a report and wrote individual sections; all that remains is to combine the first, then the second, and lastly, the third.
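The report scenario might look like the following sketch. The three section strings are made up for illustration. Notice that the order of the addition decides the order of the text.

```python
# Hypothetical sections written by three different people.
section_one = "Apples are red. "
section_two = "Pears are green. "
section_three = "Plums are purple."

report = section_one + section_two + section_three
print(report)                      # Apples are red. Pears are green. Plums are purple.

# Swapping the order of the addition swaps the order of the text.
print(section_two + section_one)   # Pears are green. Apples are red.

print("one" + "two")               # onetwo
```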

Multiplication, which is allowed only between a string object and an integer, is called repetition. For example, 3 * "a" evaluates to 'aaa'. When you multiply a string by an integer, the string is repeated that many times. Multiplying a string by a number is often used to save time and for precision. For example, let’s say you want to create a string representing all unknown letters when playing hangman. Instead of initializing a string as "----------", you could do "-" * 10. This is especially useful if you don’t know the size of the word to guess in advance, and you can store the size in a variable that you’ll then multiply by the "-" character.
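The hangman idea can be sketched as follows. The secret word and variable names are assumptions chosen for illustration; the size is stored in a variable and then multiplied by the "-" character.

```python
secret_word = "python"            # hypothetical word to guess
word_size = len(secret_word)      # size stored in a variable

hidden = "-" * word_size
print(hidden)                     # ------

print(3 * "a")                    # aaa
print("-" * 10)                   # ----------
```

The integer can go on either side of the * operator, so 3 * "a" and "a" * 3 give the same result.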

Quick check 8.5

Evaluate the following expressions. Then try them in Spyder to check yourself:

1.  "la" + "la" + "Land"

2.  "USA" + " vs " + "Canada"

3.  b = "NYc"
    c = 5
    b * c

4.  color = "red"
    shape = "circle"
    number = 3
    number * (color + "-" + shape)

Summary

In this lesson, my objective was to teach you about more operations you can do with string objects, specifically related to substrings. You learned how to find whether a substring is in a string, get its index location, count the number of times it occurs, and replace all occurrences of the substring. You also saw how to add two strings and what it means to multiply a string by a number. Here are the major takeaways:

  • You can manipulate a string with just a few operations to make it look the way you’d like.
  • Concatenating two strings means you’re adding them together.
  • Repeating a string means you’re multiplying the string by a number.

Let’s see if you got this...

Write a program that initializes a string with the value "Eat Work Play Sleep repeat". Then, use the string manipulation commands you’ve learned so far to get the string "working playing".
