2 Processing and formatting strings


This chapter covers

  • Using f-strings to interpolate expressions and apply formatting
  • Converting strings to other applicable data types
  • Joining and splitting strings
  • Using regular expressions for advanced string processing

Textual information is among the most important forms of data in almost every application. Both textual and numeric data can be saved as text files, and reading them requires us to process strings. On a shopping website, for example, we use text to provide product descriptions. Machine learning is trending, and you may have heard about one machine learning specialty: natural language processing, which extracts information from text. Because strings are used so universally, text processing is an inevitable step in preparing data in these scenarios. In our task management app, we need to convert a task’s attributes to textual data so that we can present them at the frontend of our web app. When we obtain data entries at the frontend, we must convert these strings to a proper type, such as an integer, for further processing. In numerous real-life cases like these, we need to process and format strings properly. In this chapter, we tackle some common text processing problems.


2.1 How do I use f-strings for string interpolation and formatting?


In Python, you can format text strings in a variety of ways. One emerging approach is to use an f-string, which allows you to embed expressions inside a string literal. Although you can use other string formatting approaches, an f-string offers a more readable solution; thus, you should use f-strings as the preferred approach when you prepare strings as output.

TRIVIA

F-strings were introduced in Python 3.6. Both f and F (which mean formatted) can be the prefix for the f-string. A string literal is a series of characters enclosed within single or double quotation marks.

When you use strings as output, you often need to deal with nonstring data, such as integers and floats. Suppose that our task management application has the requirement of creating a string output from existing variables:

# existing variables
name = "Homework"
urgency = 5

# desired output:
Name: Homework; Urgency Level: 5

In this section, you’ll learn how to use f-strings to interpolate nonstring data and present strings in the desired format. As you’ll discover, f-strings are a more readable solution for formatting strings from existing strings and other types of variables.

2.1.1 Formatting strings before f-strings

The str class handles textual data through its instances, which we refer to as string variables. Besides string variables, textual information often involves data types such as integers and floats. Theoretically, we can convert nonstring data to strings and concatenate them to create the desired textual output, as shown in the next listing.

Listing 2.1 Creating string output using string concatenation
task = "Name: " + name + "; Urgency Level: " + str(urgency)

print(task)
# output: Name: Homework; Urgency Level: 5

There are two potential problems with the code creating the task variable. First, it looks cumbersome and doesn’t read smoothly, as we’re dealing with multiple strings, each of which is enclosed in quotation marks. Second, we must convert urgency from int to str before it can be joined with other strings, further complicating the string concatenation operation.

CONCEPT

In general, methods are functions that are defined within a class. Here, format is a function defined in the str class, and we call these methods on str instance objects.
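For comparison, the format method named in the concept above, along with the older C-style % operator, can be sketched as follows. This is a minimal illustration that reuses the name and urgency values from listing 2.1; the variable names task_format and task_percent are assumptions made only for this example.

```python
# Values reused from listing 2.1
name = "Homework"
urgency = 5

# str.format: the {} placeholders are filled in positional order
task_format = "Name: {}; Urgency Level: {}".format(name, urgency)

# C-style formatting: %s formats a string, %d formats an integer
task_percent = "Name: %s; Urgency Level: %d" % (name, urgency)

print(task_format)
# output: Name: Homework; Urgency Level: 5
```

Both approaches convert urgency to text for us, but the values sit far from the placeholders they fill, which is the readability cost that f-strings remove.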

2.1.2 Using f-strings to interpolate variables

Formatting strings often involves combining string literals and variables of different types, such as integers and strings. When we integrate variables into an f-string, we can interpolate these variables to convert them to the desired strings automatically. In this section, you’ll see a variety of interpolations involving common data types using f-strings. Let’s see first how we use f-strings to create the output shown in listing 2.1:

task_f = f"Name: {name}; Urgency Level: {urgency}"
 
assert task == task_f == "Name: Homework; Urgency Level: 5"

In this example, we create the task_f variable by using the f-string approach. The most significant thing is that we use curly braces to enclose variables for interpolation. As f-strings integrate string interpolation, they’re also referred to as interpolated string literals.

CONCEPT

The term string interpolation isn’t Python-specific, as most common modern languages (such as JavaScript, Swift, and C#) have this feature. In general, it’s a more concise and readable syntax for creating formatted strings than string concatenation and alternative string formatting approaches.

We’ve seen that an f-string interpolates string and integer variables. How about other types, such as list and tuple? These types are supported by f-strings, as shown in this code snippet:

tasks = ["homework", "laundry"]
assert f"Tasks: {tasks}" == "Tasks: ['homework', 'laundry']"  # interpolating a list

task_hwk = ("Homework", "Complete physics work")
assert f"Task: {task_hwk}" == "Task: ('Homework', 'Complete physics work')"  # interpolating a tuple

task = {"name": "Laundry", "urgency": 3}
assert f"Task: {task}" == "Task: {'name': 'Laundry', 'urgency': 3}"  # interpolating a dict

PEEK

F-strings also support custom class instances. When we’re learning about creating our own custom classes in chapter 8, we’ll revisit how string interpolation works with the custom instances (section 8.4).

2.1.3 Using f-strings to interpolate expressions

We’ve seen how f-strings interpolate variables. As a more general usage, f-strings can also interpolate expressions, which eliminates the need to create intermediate variables. You may access an item in a dict object to create string output, for example, or use the result of calling a function. In these common scenarios, you can plug these expressions into f-strings, as shown in the following code snippet:

tasks = ["homework", "laundry", "grocery shopping"]
assert f"First Task: {tasks[0]}" == 'First Task: homework'                #1
 
task_name = "grocery shopping"
assert f"Task Name: {task_name.title()}" == 'Task Name: Grocery Shopping' #2
 
number = 5
assert f"Square: {number*number}" == 'Square: 25'                         #3

These expressions are enclosed within curly braces, allowing f-strings to evaluate them directly to produce the desired string output: {tasks[0]} -> "homework"; {task_name.title()} -> "Grocery Shopping"; {number*number} -> 25.

As a key programming concept, we often encounter the term expression. Some beginners may confuse this term with a related concept: statement. An expression usually is one line of code (it can expand to multiple lines, such as a triple-quoted string) that evaluates to a value or an object, such as a string or a custom class instance. Applying this definition, we can easily figure out that variables are a kind of expression.

By contrast, statements don’t create any value or object, and a statement’s purpose is to complete an action. We use assert, for example, to create an assertion statement, which ensures that something is valid before proceeding. We aren’t trying to produce a True or False Boolean value; we’re checking or asserting a condition. Figure 2.1 illustrates the differences between expressions and statements.

Figure 2.1 Differences between expressions and statements. Expressions represent something and are evaluated to a value or an object, whereas statements execute specific actions and can’t be evaluated to a value.
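The distinction between expressions and statements can be checked directly in code. The sketch below is only an illustration; it relies on the built-in eval function (which accepts expressions only) and the built-in exec function (which runs statements), and the names statement_rejected and namespace are made up for this example.

```python
# An expression evaluates to a value, so eval accepts it
assert eval("3 * 4") == 12

# An assignment is a statement and has no value, so eval rejects it
try:
    eval("x = 3 * 4")
except SyntaxError:
    statement_rejected = True
else:
    statement_rejected = False

# exec runs statements; exec itself always returns None
namespace = {}
result = exec("x = 3 * 4", namespace)
assert result is None
assert namespace["x"] == 12
```

The same rule explains why only expressions can appear inside the curly braces of an f-string.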

Although f-strings interpolate expressions natively, we should use this skill with caution because any complicated expression in an f-string compromises the readability of your code. The following example represents a misuse of an f-string that has a complex expression:

summary_text = f"Your Average Score: {sum([95, 98, 97, 96, 97, 93]) / len([95, 98, 97, 96, 97, 93])}."

A rule of thumb for checking your code’s readability is to determine how much time a reader needs to digest your code. In the preceding code, it may take tens of seconds for a reader to know what you want to achieve. As a direct contrast, consider the following refactored version:

scores = [95, 98, 97, 96, 97, 93]

total_score = sum(scores)
subject_count = len(scores)
average_score = total_score / subject_count

summary_text = f"Your Average Score: {average_score}."

This version has several things to note. First, we use a list object to store the scores to remove the duplication of the data. Second, we use separate steps, with each step representing a simpler calculation. Third, the key thing for improved readability is that each step uses a sensible name to indicate the calculation result. Without any comment, your code is comfortable to read; everything is clear by itself.

Readability

Create necessary intermediate variables with sensible names to clearly indicate each step of your operations. For these simple operations, you don’t even need to write any comment because the sensible names indicate the purpose of each operation.

2.1.4 Applying specifiers to format f-strings

The proper formatting of textual data, such as alignment, is key to conveying the desired information. As they are designed to handle string formatting, f-strings allow us to set a format specifier (beginning with a colon) to apply additional formatting configurations to the expression in the curly braces (figure 2.2). In this section, you’ll learn how to apply the specifiers to format f-strings.

Figure 2.2 Components of an f-string. The expression is the first part and is required. The expression is evaluated first, and a corresponding string is created. The second part, which is the format specifier, is optional.

As an optional component, the format specifier defines how the interpolated string of the expression should be formatted. An f-string can accept different kinds of format specifiers. Let’s explore some of the most useful ones next, starting with text alignment.

Aligning strings to create a visual structure

One way to improve communication efficiency is to use a structured organization, which is also true for presenting textual data. As shown in figure 2.3, scenario B provides clearer information than scenario A due to its more organized structure, with the columns aligned.

Figure 2.3 Improved clarity when the texts are presented in an organized structure (scenario B) compared with the default left alignment (scenario A)

Text alignment in f-strings involves three characters: <, >, and ^, which align the text left, right, and center, respectively. If you’re confused about which is which, remember to focus on the arrow’s tip; if it’s on the left side, for example, the text is left-aligned.

To specify text alignment as the format specifier, we use the syntax f"{expr:x<n}", in which expr means the interpolated expression, x means the padding character (when omitted, it defaults to spaces) for alignment, < means left alignment, and n is an integer giving the width to which the string expands. Applying this syntax, the code in the next listing shows how to create properly aligned records with improved clarity.

Listing 2.2 Applying format specifiers in f-strings
task_ids = [1, 2, 3]
task_names = ['Do homework', 'Laundry', 'Pay bills']
task_urgencies = [5, 3, 4]
 
for i in range(3):
    print(f'{task_ids[i]:^12}{task_names[i]:^12}{task_urgencies[i]:^12}') #1
 
# Output the following lines:
     1      Do homework      5      
     2        Laundry        3      
     3       Pay bills       4

One thing that should catch your attention is that you apply the same format specifier for all the expressions, which represents repetition. When you see repetitions in your code, you’re likely violating the DRY (Don’t Repeat Yourself) principle, which is a signal for refactoring.

In listing 2.2, if we have a new text alignment requirement, we must update the code in three locations, which is inconvenient and error-prone. Thus, the objective of refactoring is to have a mechanism to use a variable for the format specifier. Listing 2.3 shows a possible solution that extracts the repetitive part: the format specifier. Taking the refactoring a step further, we define a function to accept the format specifier as a parameter, allowing us to try different format specifiers. To improve readability, we create separate variables for the task’s information.

Listing 2.3 Refactored function to take any format specifier
def create_formatted_records(fmt):
   for i in range(3):
       task_id = task_ids[i]
       name = task_names[i]
       urgency = task_urgencies[i]
       print(f'{task_id:{fmt}}{name:{fmt}}{urgency:{fmt}}')

One important thing to note in listing 2.3 is that the format specifier fmt is enclosed within curly braces, embedded within the outside curly braces. Python knows how to replace {fmt} with the proper format specifier. Let’s try this function with different format specifiers:

>>> create_formatted_records('^15')
      1         Do homework         5       
      2           Laundry           3       
      3          Pay bills          4       
>>> create_formatted_records('^18')
       1            Do homework            5         
       2              Laundry              3         
       3             Pay bills             4

As you can see, the refactored code allows us to set any format specifier, and this flexibility highlights the benefit of refactoring. When we use format specifiers for text alignment, text forms distinct columns, creating visual boundaries to separate different pieces of information.
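The nested-brace technique is not limited to passing a whole specifier. As a rough sketch, each part of the specifier (padding character, alignment, width) can be its own nested field. The format_cell helper and its parameter names are hypothetical and exist only for this illustration.

```python
def format_cell(value, fill=" ", align="^", width=12):
    """Format one value; each part of the specifier is a separate nested field."""
    return f"{value:{fill}{align}{width}}"

# Center an int, left-align a string, right-align an int padded with dots
row = (
    format_cell(2)
    + format_cell("Laundry", align="<")
    + format_cell(3, fill=".", align=">")
)
print(repr(row))
```

Splitting the specifier this way lets a caller change one aspect, such as the alignment of a single column, without rebuilding the whole specifier string.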

Maintainability

We constantly spot opportunities to refactor our code, usually at a “local” level. The local optimization may seem to be insignificant, but these small improvements add up and determine the entire project’s overall maintainability.

We have been using spaces as padding for the alignment; we can use other characters as padding too. Our choice of characters depends on whether they make the information stand out. Table 2.1 shows some examples of using different paddings and alignments.

Table 2.1 F-string format specifiers for text alignment

F-string          Output         Description
f"{task:*>10}"    "**homework"   Right alignment, * as padding
f"{task:*<10}"    "homework**"   Left alignment, * as padding
f"{task:*^10}"    "*homework*"   Center alignment, * as padding
f"{task:^10}"     " homework "   Center alignment, space as padding
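The rows of table 2.1 can be verified with a few assertions. The snippet assumes that task holds the eight-character string "homework", which is what the outputs in the table imply; the last line is an extra case added here to show where the odd padding character goes when centering.

```python
task = "homework"  # assumed value: 8 characters, padded to a width of 10

assert f"{task:*>10}" == "**homework"   # right alignment
assert f"{task:*<10}" == "homework**"   # left alignment
assert f"{task:*^10}" == "*homework*"   # center alignment
assert f"{task:^10}" == " homework "    # center alignment, space padding

# With an odd amount of padding, centering puts the extra character on the right
centered_odd = f"{task:*^11}"
assert centered_odd == "*homework**"
```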

Formatting numbers

Numbers are integral sources of information that we often include in textual material. There are multiple forms of numeric values, such as large integers, floating-point numbers, and percentages. In this section, you’ll learn how f-strings can represent numeric values with proper format specifiers to improve their readability.

There is an infinite number of prime numbers. By doing a quick Google search, we can find that the smallest prime number greater than 1 billion is 1000000007. To show this large integer, it’s a good idea to use separators between digits, and a common approach is to use commas every three digits. To apply separators to integers in an f-string, the format specifier is xd, where x is the separator (Python accepts a comma or an underscore) and d is the specific format specifier for integers:

large_prime_number = 1000000007

print(f"Use commas: {large_prime_number:,d}")
# output: Use commas: 1,000,000,007
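As a short additional sketch, the separator can also be an underscore, and a separator can be combined with a precision for floats. The price variable and the result names are invented for this example.

```python
large_prime_number = 1000000007

# An underscore is the other separator that the format specifier accepts
with_underscores = f"{large_prime_number:_d}"
print(with_underscores)
# output: 1_000_000_007

# A separator can be combined with a precision for floating-point values
price = 1234567.891
with_commas_and_precision = f"{price:,.2f}"
print(with_commas_and_precision)
# output: 1,234,567.89
```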

Floating-point numbers, or decimal numbers in general, can be found in almost any scientific or engineering report. As you probably expect, f-strings have format specifiers that allow us to format decimals in a readable manner. Consider the following examples:

decimal_number = 1.23456

print(f"Two digits: {decimal_number:.2f}")
# output: Two digits: 1.23

print(f"Four digits: {decimal_number:.4f}")
# output: Four digits: 1.2346

As with d for integers, we use f as a format specifier for decimal values. Although the f format specifier can be used alone, it’s more often used to specify how many digits we want to keep after the decimal symbol: .2 to keep two digits, .4 to keep four digits, and so on.

In a similar fashion to using f for decimals, we can use e as the format specifier for scientific notation. Consider the following examples of this feature:

sci_number = 0.00000000412733

print(f"Sci notation: {sci_number:e}")
# output: Sci notation: 4.127330e-09

print(f"Sci notation: {sci_number:.2e}")
# output: Sci notation: 4.13e-09

Another common form of numeric values is percentages, and the format specifier for percentages is the percent sign (%). As we do with the e and f specifiers, we can use the % specifier alone or in conjunction with the precision specification, such as .2 for two-digit precision:

pct_number = 0.179323

print(f"Percentage: {pct_number:%}")
# output: Percentage: 17.932300%

print(f"Percentage two digits: {pct_number:.2%}")
# output: Percentage two digits: 17.93%

In addition to these format specifiers, f-strings support other specifiers. Table 2.2 shows common specifiers that you can apply to f-strings when you deal with numbers.

Table 2.2 Common format specifiers for formatting numbers with f-strings

Numeric type   F-string          Output       Description
int            f"{number:b}"     "1111"       Binary format, using base 2
               f"{number:c}"     "\x0f"       Unicode representation of the integer
               f"{number:d}"     "15"         Decimal format, using base 10
               f"{number:o}"     "17"         Octal format, using base 8
               f"{number:x}"     "f"          Hexadecimal format, using base 16
float          f"{point:.2e}"    "1.23e+00"   Scientific notation
               f"{point:.2f}"    "1.23"       Fixed-point notation with two-digit precision
               f"{point:.2g}"    "1.2"        General format, automatically applying e or f
               f"{point:.2%}"    "123.45%"    Percentage with two-digit precision
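The specifiers in table 2.2 can be tried out with a small script. The values number = 15 and point = 1.2345 are assumptions inferred from the outputs listed in the table, not values stated in the text. Note that with the g specifier, the precision counts significant digits rather than digits after the decimal symbol.

```python
number = 15      # assumed value, inferred from the table's outputs
point = 1.2345   # assumed value, inferred from the table's outputs

int_formats = {spec: format(number, spec) for spec in "bcdox"}
assert int_formats == {"b": "1111", "c": "\x0f", "d": "15", "o": "17", "x": "f"}

assert f"{point:.2e}" == "1.23e+00"   # scientific notation
assert f"{point:.2f}" == "1.23"       # two digits after the decimal symbol
assert f"{point:.2g}" == "1.2"        # two significant digits
assert f"{point:.2%}" == "123.45%"    # multiplied by 100, then two digits
```

The built-in format function used for the integer rows applies the same specifiers as an f-string, which makes it convenient for looping over several of them.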

2.1.5 Discussion

Although directly interpolating expressions by f-strings makes code cleaner, avoid using complicated expressions in f-strings, which may confuse your readers. Instead, create intermediate variables with sensible names when the expressions are complicated.

Python still supports the conventional C-style and format-based approaches, but there is no real need for you to learn them (you may see them in legacy code, though). Whenever you need to create string output, use f-strings. Don’t forget about aligning your text and formatting numeric values to improve the text output’s clarity.

2.1.6 Challenge

James works in a wholesale company’s IT department and is preparing a template of price tags. Suppose that the product’s data is saved as a dict object: {"name": "Vacuum", "price": 130.675}. How can James write an f-string if the desired output is Vacuum: {130.68}? Note that the price requires two-digit precision and that the output includes curly braces, which are coincidentally the characters for string interpolation in f-strings.

Hint

Curly braces are special characters in f-strings. When a string literal includes special characters, you need to escape them in such a way that they’re no longer evaluated as special characters. To escape curly braces, you use an extra curly brace: {{ means {, and }} means }.
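To make the escaping rule concrete without giving away the challenge, here is a small sketch using an unrelated variable. The names count, literal_only, and wrapped_value are made up for this illustration.

```python
count = 3

# Doubled braces produce literal braces; nothing is interpolated here
literal_only = f"{{count}}"
assert literal_only == "{count}"

# Three braces on each side: an escaped brace plus a real interpolation field
wrapped_value = f"{{{count}}}"
assert wrapped_value == "{3}"
```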


2.2 How do I convert strings to retrieve the represented data?

Although strings are textual data on their surface, the actual data represented by strings can be integers, dictionaries, and other data types. The built-in input function, for example, is the most basic way to collect users’ input in a Python console:

>>> age = input("Please enter your age: ")
Please enter your age: 35
>>> type(age)             #1
<class 'str'>

As shown in the preceding code snippet, the user’s input is taken as a string. Suppose that we wanted to check whether the user’s age is over 18. We think we can run the following code:

>>> age > 18
# ERROR: TypeError: '>' not supported between instances of 'str' and 'int'

Unfortunately, the comparison didn’t work because age is a string, and you can’t compare a string with an integer. This example highlights the necessity of converting a string to an integer. More broadly, many other scenarios require that we convert strings to lists, dictionaries, and other applicable data types. Such conversion is essential for subsequent data processing. In this section, you’ll learn how to check the data types represented by the strings and the proper ways to convert strings to the desired data types.

2.2.1 Checking whether strings represent alphanumeric values

In Python, strings can be anything you can type with your keyboard. One common need is to check whether strings include only alphanumeric characters. In this section, you’ll learn a variety of ways to check the nature of a string’s characters.

Suppose that the task management app requires users to set a username, which must be alphanumeric. We can implement this functionality by using the isalnum method, which examines whether a string contains only letters and digits, such as a-z, A-Z, and 0-9. Some examples follow:

bad_username0 = "123!@#"
assert bad_username0.isalnum() == False
 
bad_username1 = "abc..."
assert bad_username1.isalnum() == False
 
good_username = "1a2b3c"
assert good_username.isalnum() == True

Suppose that when a user creates a task, we require the name to contain letters only. For this feature, we can use the isalpha method, which returns True or False. As you’ve probably noticed, all these is- methods return Boolean values:

assert "Homework".isalpha() == True
 
assert "Homework123".isalpha() == False

In a similar fashion, you can use the isnumeric method to check whether all characters in the string are numeric characters:

assert "123".isnumeric() == True
 
assert "a123".isnumeric() == False

Here, I want to discuss a couple of gotchas about checking whether a string represents a numeric value when we use the isnumeric method:

  • Strings that represent floats won’t pass the isnumeric check. It would be reasonable to expect that strings with valid numeric values would return True on this method call. Unfortunately, that’s not the case:
assert "3.5".isnumeric() == False
  • Strings that represent negative integers won’t pass the isnumeric check. It probably goes against many people’s intuition, too, as in this example:
assert "-2".isnumeric() == False
  • Empty strings are evaluated as False with isnumeric. Evaluating empty strings as non-numeric is probably a desired behavior. We should understand this behavior when we deal with conversions from strings to numbers.

To avoid these gotchas, remember that a string produces a True value by means of the isnumeric method only if all the characters in a nonempty string are numeric characters. Please note that numeric characters don’t include the decimal symbol or the negative sign. For this reason, the isnumeric method evaluates floats and negative numbers as False.
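One further gotcha is worth a sketch: isnumeric is broader than its name suggests, because Unicode classifies characters such as fractions and superscripts as numeric. The example below compares it with the related isdigit and isdecimal methods; the variable names are invented for this illustration.

```python
half = "\u00bd"      # the fraction character for one half
squared = "\u00b2"   # the superscript two character

# A fraction character is numeric but is not a digit
assert half.isnumeric() and not half.isdigit()

# A superscript is a digit but is not a decimal character
assert squared.isdigit() and not squared.isdecimal()

# Passing isnumeric does not mean that int() can cast the string
try:
    int(squared)
except ValueError:
    superscript_castable = False
else:
    superscript_castable = True

assert "123".isdecimal()
assert not "".isnumeric()
```

If the goal is to accept only the digits 0-9 in a form field, isdecimal is therefore the stricter check of the three, although it also accepts decimal digits from other scripts.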

Besides the discussed is- methods for checking the numeric nature of strings, as a refresher, Python strings have other is- methods that perform other checking tasks, such as islower and isupper. Although I don’t cover these other is- methods in this book, you should be familiar with them.

TRIVIA

Among these is- methods, isidentifier is interesting because it tests whether a string is a valid identifier to name a variable, a function, or an object in general.
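A brief sketch of isidentifier follows. One caveat that the method name doesn’t reveal is that reserved keywords also pass the check, so the standard library’s keyword module is used here as a complement; the example strings are arbitrary.

```python
import keyword

assert "task_1".isidentifier()       # letters, digits, and underscores are fine
assert not "1task".isidentifier()    # an identifier can't start with a digit
assert not "task-1".isidentifier()   # a hyphen isn't allowed

# Keywords are syntactically valid identifiers but can't be used as names
is_usable_name = "class".isidentifier() and not keyword.iskeyword("class")
assert is_usable_name is False
```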

2.2.2 Casting strings to numbers

In the preceding section, you learned to examine whether a string represents a positive integer. But there seems to be no easy way to tell whether a string represents a numeric value, particularly when it’s a floating-point or negative number. Converting strings to numbers is important because we can’t do any numeric calculations with strings, such as comparing age with 18. Thus, in many cases, we must derive the represented numeric values of strings for subsequent processing. In this section, you’ll learn to convert strings to numbers, a process termed casting.

CONCEPT

In programming, the process of converting a data type to another data type, such as converting a string to an integer, is known as casting.

The two common data types for numeric values are float and int. The syntax for creating these instances from strings is float("string") and int("string"). Python evaluates the string objects to cast them to a proper float or int object, if possible.

If you expect a float with a string, you can send it to the built-in float constructor. In the following examples, all the casted numbers are of the float type, even if the string represents an integer:

>>> float("3.25")
3.25
>>> float("-2")     #1
-2.0

CONCEPT

A constructor refers to a special kind of function that creates an instance object of a class. For more on this topic, see chapter 8. Here, we use float and int constructors to create objects of the float and int types, respectively.

If you expect an integer with a string, you can use the built-in int constructor:

>>> int("-5")
-5
>>> int("123")
123

Note that when these strings have desired numeric values, these casting operations succeed. When they don’t, however, these castings result in errors, which cause your entire program to crash, as shown in the following code snippet:

>>> float("3.5a")
# ERROR: ValueError: could not convert string to float: '3.5a'
 
>>> int("one")
# ERROR: ValueError: invalid literal for int() with base 10: 'one'

To prevent your program from being terminated due to this error, it is important to use the try...except... statement to handle the exception. Although I’m not expanding the discussion here, the next listing shows such usage. I’ll discuss this feature in chapter 12 (section 12.3).

Listing 2.4 Casting numbers from strings
def cast_number(number_str):
    try:
        casted_number = float(number_str)
    except ValueError:
        print(f"Couldn't cast {repr(number_str)} to a number")    #1
    else:
        print(f"Casting {repr(number_str)} to {casted_number}")
 
# Use the above function in a console
>>> cast_number("1.5")
Casting '1.5' to 1.5
>>> cast_number("2.3a")
Couldn't cast '2.3a' to a number
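Listing 2.4 prints a message, which is handy in a console but hard to reuse. A variation that returns the value instead might look like the following sketch. The parse_number name and the choice of returning None on failure are assumptions made for this example, not part of the listing.

```python
def parse_number(number_str):
    """Return an int or a float parsed from number_str, or None on failure."""
    # Try int first so that "-2" stays an integer instead of becoming -2.0
    for constructor in (int, float):
        try:
            return constructor(number_str)
        except ValueError:
            continue
    return None

print(parse_number("-2"), parse_number("3.25"), parse_number("2.3a"))
# output: -2 3.25 None
```

Be aware that float also accepts strings such as "nan" and "inf", so a caller that needs ordinary numbers may want an extra check on the result.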

2.2.3 Evaluating strings to derive their represented data

Besides numeric values, our application often has textual data that represents other data types, such as lists and tuples. For example, in a web application, data are commonly entered as text, such as "[1, 2, 3]", which resembles a list object. Because its data type is str, you can’t apply any list methods to this textual data; that is, you can only call list methods on list objects. In this case, data conversion is required. In this section, you explore how to derive the underlying data, other than numbers, from strings.

In the previous section, you learned to use float and int constructors to cast strings to derive numeric values. The approach of using the constructor with a string object won’t always work, however. Consider the three common data types (list, tuple, and dict), which are represented by strings in the following code snippet:

numbers_list_str = "[1, 2]"
numbers_tuple_str = "(1, 2)"
numbers_dict_str = "{1:'one', 2: 'two'}"

When we attempt to send the strings directly to their respective constructors, unexpected outcomes happen:

>>> list(numbers_list_str)       #1
['[', '1', ',', ' ', '2', ']']
 
>>> tuple(numbers_tuple_str)     #1
('(', '1', ',', ' ', '2', ')')
 
>>> dict(numbers_dict_str)
# ERROR: ValueError: dictionary update sequence element #0 has length 1; 2 is required

Although the list and tuple constructors do create a list and a tuple object by treating strings as iterables, the created objects wouldn’t be the data that you would expect to extract from these strings. Specifically, strings are iterables that consist of characters. When you include a string in a list constructor, its characters become items of the created list object. The same operation happens to a tuple constructor.

CONCEPT

Iterables are objects that can render items one by one. Strings, lists, and tuples are common examples of iterables. For further discussion of iterables, see chapter 5.

To solve this unpredicted behavior, use the built-in eval function, which takes a string as though you typed it in the console and returns the evaluated result:

assert eval(numbers_list_str) == [1, 2]
 
assert eval(numbers_tuple_str) == (1, 2)
 
assert eval(numbers_dict_str) == {1: 'one', 2: 'two'}

By evaluating these strings, we can retrieve the data that these strings represent. This transformation is useful because we often use texts as the data interchange format. The benefit of using eval is that the evaluation result of the supplied text is guaranteed to be what you expect from running the same text as code in a console.

If your application is concerned with the validity of the data source, I recommend that you parse the strings yourself. If you need to get a list object of integers from a string, for example, you can remove the square brackets and split the strings to recreate the applicable list object. A trivial example follows for your reference. Please note that the code snippet involves a few techniques, such as string splitting and list comprehension, that I cover later (sections 2.3 and 5.2):

list_str = "[1, 2, 3, 4]"
stripped_str = list_str.strip("[]")
number_list = [int(x) for x in stripped_str.split(",")]

print(number_list)
# output: [1, 2, 3, 4]

Maintainability

Using eval without verifying the integrity of the string object can cause bugs or even catastrophic outcomes. Be cautious whenever you need to use this method.

2.2.4 Discussion

When we use the float or int constructor to derive the actual numeric values that strings represent, consider using try...except... because successful casting is never guaranteed, and when casting fails, it crashes the program if the exception isn’t handled. When you use eval to obtain the underlying data, you should be cautious, as it can introduce danger to a program if you use untrusted sources. Thus, when data security is a concern, you should consider parsing the data yourself or using a more secure tool, such as the ast module. If you work on your own data, such as a script for processing data, you can just use eval to obtain the underlying data.
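As a minimal sketch of the ast alternative mentioned above, the standard library’s ast.literal_eval function evaluates only Python literals (strings, numbers, tuples, lists, dicts, and similar) and refuses anything else. The untrusted string below is a made-up example of input that eval would happily execute.

```python
import ast

# Literal containers are evaluated just as eval would evaluate them
assert ast.literal_eval("[1, 2]") == [1, 2]
assert ast.literal_eval("(1, 2)") == (1, 2)
assert ast.literal_eval("{1:'one', 2: 'two'}") == {1: "one", 2: "two"}

# A function call is not a literal, so literal_eval rejects it
untrusted = "__import__('os').getcwd()"
try:
    ast.literal_eval(untrusted)
except ValueError:
    rejected = True
else:
    rejected = False

assert rejected
```

Because malformed input can also raise SyntaxError, production code would typically catch both exception types around the call.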

2.2.5 Challenge

At the beginning of this section, you learned that you can use the input function to collect a user’s input. Mary is an elementary school teacher who wants to write a simple toy program for her students. Suppose that she wants to ask the students about today’s temperature in Celsius degrees, using a Python console. How can she write the program so that it meets the following requirements? x represents the value that the user enters:

  • When the temperature is < 10 degrees, output You entered x degrees. It's cold!
  • When the temperature is between 10 and 25 degrees, output You entered x degrees. It's cool!
  • When the temperature is > 25 degrees, output You entered x degrees. It's hot!
  • The x value should have one decimal precision. If the user enters 15.75, for example, it should be displayed as 15.8.

Hint

The entered string input needs to be casted to a float number before it can be compared with other numbers. To create a string output, use f-strings. Don’t forget about format specifiers!


2.3 How do I join and split strings?

Strings are not always in the format that you want them to be. In some cases, individual strings represent discrete pieces of related information, and we need to join them to form a single string. Suppose that a user enters multiple strings, with each representing a fruit that they like. We may join the strings to create a single string to display the user’s likes, as shown here:

# initial input
fruit0 = "apple"
fruit1 = "banana"
fruit2 = "orange"

# desired output
liked_fruits = "apple, banana, orange"

At other times, we need to split strings to create multiple strings. Suppose that a user enters all the countries that they've been to as a single string. We want to have a list of these countries, as shown here:

# initial input
visited_countries = "United States, China, France, Canada"

# desired output
countries = ["United States", "China", "France", "Canada"]

These two scenarios are plausible examples of basic string processing jobs that you might encounter in a real-life project. In this section, we explore key functionalities for joining and splitting strings, using realistic examples.

2.3.1 Joining strings with whitespaces

When you join multiple strings, you can use the explicit concatenation operator: the + symbol, which you saw in listing 2.1. When you have multiple string literals, you can join them if they're separated by whitespaces, such as spaces, tabs, and newline characters. In this section, you'll see how strings separated by whitespaces can be joined.

Suppose that we have multiple configurations to set a display style for our application. We separate each configuration as a string literal, and these individual configuration settings are joined automatically:

style_settings = "font-size=large, " "font=Arial, " "color=black, " "align=center"

print(style_settings)
# output: font-size=large, font=Arial, color=black, align=center

Automatic concatenation can occur only among string literals, however, and you can't use this technique with string variables or a mixture of string literals and variables. F-strings also support automatic concatenation. This feature is useful when you construct a long f-string by breaking distinct string literals into separate lines of code for clarity:

settings = {"font_size": "large", "font": "Arial", "color": "black", "align": "center"}

styles = f"font-size={settings['font_size']}, " \
         f"font={settings['font']}, " \
         f"color={settings['color']}, " \
         f"align={settings['align']}"      # adjacent f-string literals are joined automatically
Readability

When a string is long, consider breaking it into multiple lines, with each line representing a meaningful substring. These substrings can be joined automatically when they're separated by whitespaces.

2.3.2 Joining strings with any delimiters

Joining strings separated by spaces can be a little confusing because the boundaries (spaces) between string literals don't make it easy for us to eyeball the individual strings. Moreover, it can occur only between string literals, which is an additional restriction. As a general scenario, joining strings with any delimiters is ideal. In this section, you'll learn to join strings with any applicable delimiter.

Still, consider the style setting example. We can use the join method to concatenate these separate strings:

style_settings = ["font-size=large", "font=Arial", "color=black", "align=center"]
merged_style = ", ".join(style_settings)

print(merged_style)
# output: font-size=large, font=Arial, color=black, align=center

The join method takes a list of strings as its argument. The items of the list are joined sequentially with the delimiter string that we use to call the method. Although we use a list object here, more broadly speaking, it can be any iterable, such as tuple or set.
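The following sketch illustrates join with iterables other than lists. The variable names and sample values are made up for illustration; the one behavior worth noting is that every item must already be a string, so other types need converting first.

```python
# join accepts any iterable of strings, not only lists
fruits_tuple = ("apple", "banana", "orange")
from_tuple = ", ".join(fruits_tuple)
print(from_tuple)       # apple, banana, orange

# Non-string items raise a TypeError, so convert them first
task_ids = [1001, 1002, 1003]
from_numbers = "-".join(str(task_id) for task_id in task_ids)
print(from_numbers)     # 1001-1002-1003

# Sets are unordered, so sort them when the output order matters
fruit_set = {"orange", "apple", "banana"}
from_set = ", ".join(sorted(fruit_set))
print(from_set)         # apple, banana, orange
```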

Compared with direct concatenation, join is more readable, as the contributing strings are separate items; thus, it's easy for us to know what is to be joined. More importantly, join has an extra advantage: we can manipulate the items dynamically in the list object.

Suppose that we want to have a string to list the tasks that we want to complete for the week in our task management application. To begin, we have the following tasks. We can join these strings to generate a string as a note to display on our desktop:

tasks = ["Homework", "Grocery", "Laundry", "Museum Trip", "Buy Furniture"]
note = ", ".join(tasks)

print("Remaining Tasks:", note)
# output: Remaining Tasks: Homework, Grocery, Laundry, Museum Trip, Buy Furniture

After some hard work, a few tasks are done, so we're removing these tasks:

tasks.remove("Buy Furniture")
tasks.remove("Homework")

After removing these tasks, we can still use the join method to create the needed string:

print("Remaining Tasks: ", ", ".join(tasks))
# output: Remaining Tasks:  Grocery, Laundry, Museum Trip

This example shows a use case with a list of strings that is subject to dynamic changes. When we have additional tasks, we can add the tasks to the list object and regenerate the desired string with the join method to create an updated string.
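To round out the example, here is a small sketch of the opposite change: adding a task and regenerating the note. The helper build_note and the new task name are hypothetical, introduced only for illustration.

```python
tasks = ["Grocery", "Laundry", "Museum Trip"]


def build_note(task_list):
    """Create the display string from the current list of tasks."""
    return "Remaining Tasks: " + ", ".join(task_list)


tasks.append("Pay Bills")   # a hypothetical new task
note = build_note(tasks)
print(note)
# output: Remaining Tasks: Grocery, Laundry, Museum Trip, Pay Bills
```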

2.3.3 Splitting strings to create a list of strings

We often use text files to save and transfer data. We can save tabulated data to a text file, for example, with each line representing a record. When we read the text file, each row is a single string containing multiple substrings, and each substring represents a value for the record. To process the data, we need to extract these values by splitting strings to obtain separate substrings. This section covers topics related to string splitting.

Suppose that we have a text file named "task_data.txt" that stores some tasks. Each row represents a task's information, including task ID number, name, and urgency level, as shown in the following code snippet. Because you're going to learn how to read data from a file in chapter 11, assume that you've read the text data and saved it as a multiline string, using triple quotes:

task_data = """1001,Homework,5
1002,Laundry,3
1003,Grocery,4"""
TRIVIA

You can use single or double quotes to create a triple-quoted string that spans multiple lines. F-strings also support triple quotes for a multiline f-string.

To process this string, we can use the split method, which can locate the specified delimiters and separate the string accordingly. The next listing shows a possible solution.

Listing 2.5 Processing text data by splitting strings
processed_tasks = []
for data_line in task_data.split("\n"):
    processed_task = data_line.split(",")     # split each row on commas
    processed_tasks.append(processed_task)
    
print(processed_tasks)
# output the following line:
[['1001', 'Homework', '5'], ['1002', 'Laundry', '3'], ['1003', 'Grocery', '4']]
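Note that every value produced by split is still a string. As a hedged follow-up sketch (not part of the original listing), the snippet below converts the ID and urgency level to integers while splitting. It uses str.splitlines, which behaves like split("\n") for this data, and the name typed_tasks is an assumption.

```python
task_data = """1001,Homework,5
1002,Laundry,3
1003,Grocery,4"""

typed_tasks = []
for data_line in task_data.splitlines():
    # Unpack the three fields, then cast the numeric ones
    task_id, name, urgency = data_line.split(",")
    typed_tasks.append((int(task_id), name, int(urgency)))

print(typed_tasks)
# output: [(1001, 'Homework', 5), (1002, 'Laundry', 3), (1003, 'Grocery', 4)]
```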

One limitation of the split method is that it allows us to specify only one separator, which can be a problem when strings are separated with different separators. Suppose that we have a text file that mixes the use of commas and underscores as separators. For simplicity, only one separator exists between words. For demonstration purposes, consider a single line of data: messy_data = "process,messy_data_mixed,separators".

The problem is likely to occur in real life when we deal with uncleaned raw data. When we encounter this problem, we must think about a programmatic way to solve it because chances are that the text file has tons of records. Apparently, using the split method on these records won't work, as we can set only one kind of separator. Thus, we must consider alternative solutions:

  1. Use separators sequentially:
    1. We split the strings by using commas to create a list.
    2. We examine whether each item in the list contains any underscores. If not, the item is ready. If so, we perform a second split using underscores:
separated_words0 = []
for word in messy_data.split(","):
    if word.find("_") < 0:                          # no underscore in this item
        separated_words0.append(word)
    else:
        separated_words0.extend(word.split("_"))    # split again on underscores
  2. Consolidate the separators. Because we know that there are only two possible separators, we can convert one separator to the other, which allows us to call the split method just once to complete the needed operation:
consolidated = messy_data.replace(",", "_")   # turn every comma into an underscore
separated_words1 = consolidated.split("_")

The two solutions above are straightforward. If you know the basic operations with strings and lists, they are perfect solutions as long as performance isn't a concern. They require multiple passes to examine the separators, however, particularly when you must deal with multiple separators. In that case, the operations are more expensive in terms of computation.

Is there a more performant solution? The answer is yes. Regular expressions are designed to handle this more complicated pattern matching and searching, as I discuss in sections 2.4 and 2.5.

CONCEPT

Regular expressions, often shortened to regex or regexp, are sequences of characters that define specific search patterns.

2.3.4 Discussion

Choosing string concatenation, f-strings, or join should be evaluated on a case-by-case basis. The key is making your code readable. When you have a small number of strings to join, you can use concatenation operators to join them. When you have more strings, you should consider using f-strings first to bring related strings together. The join method is particularly useful for joining individual strings when these strings are saved in an iterable.

Besides split, strings have another method, rsplit, which has similar functionality to split. The only difference appears when you set the maxsplit parameter to cap the number of splits: split counts the splits from the left, whereas rsplit counts them from the right. Section 2.3.5 explores split and rsplit further.
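As a brief illustration of that difference, the following sketch applies both methods to one made-up record string, first without and then with maxsplit.

```python
record = "1001,Homework,5"

# Without maxsplit, both methods produce the same list
full_left = record.split(",")
full_right = record.rsplit(",")
print(full_left == full_right)   # True

# With maxsplit=1, split cuts at the first comma ...
left_once = record.split(",", maxsplit=1)
print(left_once)                 # ['1001', 'Homework,5']

# ... whereas rsplit cuts at the last comma
right_once = record.rsplit(",", maxsplit=1)
print(right_once)                # ['1001,Homework', '5']
```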

2.3.5 Challenge

The split and rsplit methods have the following calling signatures. Both methods take an argument to specify the separator and another to specify the maximal number of splits. Can you write a few strings to split so that the two methods behave the same way in some cases and differently in others?

str.split(separator, maxsplit)
str.rsplit(separator, maxsplit)
Hint

Both methods typically behave the same way. When the number of maximal splits is smaller than the number of split items, you'll see a difference.


2.4 What are the essentials of regular expressions?

Python's str class has useful methods, such as find and rfind, for searching substrings. Many scenarios go beyond what these basic methods can address, however, particularly when it comes to complex pattern matching. In these cases, we should consider using regular expressions. In the previous section, I mentioned that you can use regular expressions to split a string containing multiple kinds of separators, a use case that isn't easy to address with pure str-based methods. Here's a look at the solution using regular expressions:

import re
 
regex = re.compile(r"[,_]")        # a character set that matches a comma or an underscore
separated_words2 = regex.split(messy_data)

From the performance perspective, we traverse the string only one time to complete the split. When there are more separators, regular expressions perform much better than the other two solutions (section 2.3.3), which require multiple traversals of the string. Because of its flexibility and performance, the regular-expression approach is the irreplaceable technique for conducting advanced string processing. In this section, I use string searching as the teaching topic to explain the mechanisms of regular expressions.
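As a quick sanity check, the sketch below runs the three approaches side by side on the same messy_data string and confirms that they agree. The variable names sequential, consolidated, and with_regex are chosen here for illustration, and the first approach is condensed slightly from the version in section 2.3.3.

```python
import re

messy_data = "process,messy_data_mixed,separators"

# Approach 1: split on commas, then split each piece on underscores
sequential = []
for word in messy_data.split(","):
    sequential.extend(word.split("_"))

# Approach 2: consolidate the separators, then split once
consolidated = messy_data.replace(",", "_").split("_")

# Approach 3: a single regular-expression split on either separator
with_regex = re.split(r"[,_]", messy_data)

print(with_regex)
# output: ['process', 'messy', 'data', 'mixed', 'separators']
print(sequential == consolidated == with_regex)
# output: True
```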

TRIVIA

Regular expressions are considered to be independent entities, and all common programming languages support regular expressions despite some variations in terms of the syntax. Regular expressions are similar, however, and you can think of different programming languages as having their own dialects for them.

2.4.1 Using regular expressions in Python

To learn regular expressions, you'll start with getting the big picture: the pertinent module and its core syntax. This section provides a 10,000-foot overview of regular expressions in Python.

Python's standard library includes the re module, which provides features related to regular expressions. There are two ways to use this module. The first approach pertains to the object-oriented programming (OOP) aspect of Python. Applying the OOP paradigm to regular expressions (figure 2.4), we carry out our operations with a focus on Pattern objects. In this approach, we first create a Pattern object by compiling the desired string pattern. Next, we use this Pattern object to search for the occurrences that match the pattern.

CONCEPT

OOP stands for object-oriented programming, which is a programming design model with a central focus on data and objects rather than functions and procedures.

Figure 2.4 Applying the general OOP in pattern matching. In a general OOP approach, we first determine the proper class for the task. In this case, we use the Pattern class in the re module. The second step is creating the instance object. In the OOP paradigm, an object consists of attributes, which are accessible via dot notations, and methods, which are callable via parentheses. The third step is using the created Pattern object, such as by accessing its attributes or calling the methods.

The following code snippet shows how to apply the OOP paradigm to use regular expressions for pattern searching:

import re
 
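regex = re.compile("do")       # create the Pattern object
regex.pattern                  # access an attribute: the pattern string 'do'
regex.search("do homework")    # call a method: find the first match
regex.findall("don't do that") # call a method: find all matches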

The other style adopts a functional approach. Instead of creating a Pattern object, we call the functions directly in the module. In the function call, we specify the pattern as well as the string against which the pattern is tested:

import re

re.search("pattern", "the string to be searched")
re.findall("pattern", "the string to be searched")

Behind the scenes, when we call re.search, Python creates the Pattern object for us and calls the search method on the pattern. Thus, using the module to call these functions is a convenient way to use regular expressions. You should be aware of a difference, however: when you use the compile function to create a Pattern object, the compiled pattern is cached in such a way that it's more efficient to use the pattern multiple times because there is no need to compile the pattern the second time.

CONCEPT

Cache or caching is a mechanism used in programming (and computing in general) to store pertinent data so that the data can serve any future requests faster.

By contrast, the functional approach has to obtain the pattern on every call, so it doesn't benefit as directly from holding on to a compiled pattern (the re module does keep an internal cache of recently used patterns, which keeps this overhead small). Thus, if you use the pattern once, you don't need to worry about the difference between these two approaches.
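The two styles can be compared with a small sketch like the following, in which one pattern is applied to several strings. The sample lines and variable names are assumptions for illustration; both styles return the same matches, and the compiled Pattern object is simply created once and reused.

```python
import re

lines = ["do homework", "wash clothes", "don't do that"]

# OOP style: compile once, then reuse the Pattern object
do_pattern = re.compile(r"do")
compiled_hits = [line for line in lines if do_pattern.search(line)]

# Functional style: pass the pattern to the module function on every call
functional_hits = [line for line in lines if re.search(r"do", line)]

print(compiled_hits)
# output: ['do homework', "don't do that"]
print(compiled_hits == functional_hits)
# output: True
```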

2.4.2 Creating the pattern with a raw string

The key manifestation of the power of regular expressions is the conciseness of a pattern to match a wide range of possibilities. To create a pattern, we often need to use raw strings, such as a string literal with the prefix r, as in r"pattern". In this section, you'll see why it's necessary to use raw strings to build a regular-expression pattern.

In regular expressions, we use \d to match any digit and \w to denote a Unicode word character. These are examples of special characters in regular expressions, and we use backslashes as the prefix to indicate that these characters have special meanings beyond what they appear to be. Notably, Python strings also use backslashes to denote special characters, such as \t for tab, \n for newline, and \\ for backslash.

When these coincidences are combined, we end up using weird-looking patterns. Suppose that we want to search for \task in strings. Notably, \t is a literal here; it really means a backslash and a letter t, not the tab character. We must use \\task so Python can search for \task. Making things even more complicated, when we create such a pattern, both backslashes must be escaped, which leads to four backslashes (\\\\task) to search for \task in strings. Sounds confusing? Examine the following code:

task_pattern = re.compile("\\\\task")
texts = ["\task", "\\task", "\\\task", "\\\\task"]
for text in texts:
   print(f"Match {text!r}: {task_pattern.match(text)}")

# output the following lines:
Match '\task': None
Match '\\task': <re.Match object; span=(0, 5), match='\\task'>
Match '\\\task': None
Match '\\\\task': None

As match searches a string at the beginning, our pattern can match only "\\task". This behavior is expected; the two consecutive backslashes are interpreted as a literal backslash, which makes the string effectively "\task", matching the pattern that we want to search.

Apparently, using so many backslashes is confusing. To address this problem, we should use raw-string notation in such a way that Python doesn't process any backslashes. As in f-string notation, we use r instead of f as the prefix to convert a regular string literal to a raw string. Applying raw strings to the pattern, we get the following solution:

task_pattern_r = re.compile(r"\\task")
texts = ["\task", "\\task", "\\\task", "\\\\task"]
for text in texts:
   print(f"Match {text!r}: {task_pattern_r.match(text)}")

# output the following lines:
Match '\task': None
Match '\\task': <re.Match object; span=(0, 5), match='\\task'>
Match '\\\task': None
Match '\\\\task': None

As you can tell, the raw string defines a cleaner pattern than the regular string literal, with which we had to use four consecutive backslashes. As you can imagine, when you build a more complex pattern, you need more backslashes to denote special characters. Without raw strings, your patterns will look like puzzles. Thus, it's always a good practice to use raw strings to create regular-expression patterns.

Readability

Using raw strings to build a pattern eliminates the need to escape the special character backslash, making it easier for users to read.
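One way to convince yourself that the raw and regular literals define the same pattern is to compare them directly, as in this short sketch (the variable names are illustrative):

```python
regular_literal = "\\\\task"   # four backslashes in source, two in the actual string
raw_literal = r"\\task"        # two backslashes in source, two in the actual string

print(regular_literal == raw_literal)   # True
print(len(raw_literal))                 # 6: two backslashes plus "task"
print(raw_literal)                      # \\task

# The same holds for character classes such as \d
print(r"\d" == "\\d")                   # True
```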

2.4.3 Understanding the essentials of a search pattern

The syntax of regular expressions confuses most programmers. As mentioned at the beginning of section 2.4, regular expressions constitute a separate language with its own unique syntax. The good news is that Python adopts regular expressions' syntax in general. In this section, I go over the essential components of a pattern.

Boundary anchors

When you work with strings, you may want to know whether a string begins or ends with a particular pattern. These use cases are concerned with the boundaries of the strings, and we refer to them as boundary anchors, including the beginning and the end of a string, as illustrated in the following code:

^hi        starts with hi
task$      ends with task
^hi task$  starts and ends with "hi task", and thus exact matching

The ^ symbol signifies that the pattern is concerned with the start of the string, whereas the $ symbol signifies that the pattern is concerned with the end of the string. The following code snippet shows some examples of these anchors:

re.search(r"^hi", "hi Python")
# output: <re.Match object; span=(0, 2), match='hi'>

re.search(r"task$", "do the task")
# output: <re.Match object; span=(7, 11), match='task'>

re.search(r"^hi task$", "hi task")
# output: <re.Match object; span=(0, 7), match='hi task'>

re.search(r"^hi task$", "hi Python task")
# output: None (omitted output in an interactive console)

You may know that there are startswith and endswith methods in the str class, which work in simple cases. But when you have a more complex need, such as searching for a string that starts with one or more instances of h followed by i, it's impossible to use startswith because you must account for hi, hhi, hhhi, and more. In such a scenario, regular expressions become very handy.

Maintainability

Although regular expressions are powerful, it's always a good idea to see whether a simpler solution would work, such as startswith or endswith. These solutions are more straightforward and less error-prone.
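The "one or more h followed by i" case might be sketched as follows. The pattern relies on the + quantifier, which is covered in the next subsection, and the sample strings are invented for illustration.

```python
import re

greetings = ["hi there", "hhi there", "hhhi there", "oh hi", "hello"]

# ^ anchors the match to the start; h+ means one or more h characters
starts_with_h_then_i = re.compile(r"^h+i")
matching = [text for text in greetings if starts_with_h_then_i.search(text)]
print(matching)
# output: ['hi there', 'hhi there', 'hhhi there']

# For a fixed prefix, the simpler str method is enough
print("hi there".startswith("hi"))
# output: True
```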

Quantifiers

In the previous section, I brought up the question of searching for a variable number of characters, which requires creating a pattern that accounts for the quantity. Regular expressions address this problem by supporting the quantifiers category. This category includes several special characters:

hi?       h followed by zero or one i
hi*       h followed by zero or more i
hi+       h followed by one or more i
hi{3}     h followed by iii
hi{1,3}   h followed by i, ii, or iii
hi{2,}    h followed by 2 or more i

As you can see, there are four general quantifiers: ? for 0 or 1, * for 0 or more, + for 1 or more, and {} for a range. One important thing to note: searching a string with the patterns using ?, *, and + is greedy, which means that the pattern matches the longest sequence whenever possible. To modify this default behavior, we can append the suffix ? to these quantifiers:

test_string = "h hi hii hiii hiiii"
test_patterns = [r"hi?", r"hi*", r"hi+", r"hi{3}", r"hi{2,3}", r"hi{2,}", 
                r"hi??", r"hi*?", r"hi+?", r"hi{2,}?"]

for pattern in test_patterns:
    print(f"{pattern: <9}--->  {re.findall(pattern, test_string)}")

# output the following lines:
hi?      --->  ['h', 'hi', 'hi', 'hi', 'hi']
hi*      --->  ['h', 'hi', 'hii', 'hiii', 'hiiii']
hi+      --->  ['hi', 'hii', 'hiii', 'hiiii']
hi{3}    --->  ['hiii', 'hiii']
hi{2,3}  --->  ['hii', 'hiii', 'hiii']
hi{2,}   --->  ['hii', 'hiii', 'hiiii']
hi??     --->  ['h', 'h', 'h', 'h', 'h']
hi*?     --->  ['h', 'h', 'h', 'h', 'h']
hi+?     --->  ['hi', 'hi', 'hi', 'hi']
hi{2,}?  --->  ['hii', 'hii', 'hii']

These search results should be consistent with what you expect. Among these results, the last several patterns involve the use of the ? suffix, which makes the pattern match the shortest possible sequence that satisfies the pattern instead of the longest one.
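The difference between greedy and non-greedy matching tends to matter most when a pattern has a closing delimiter. The sketch below uses a made-up string with tag-like markers; it also uses the . character (any character except a newline), which is introduced in the next subsection.

```python
import re

text = "<b>urgent</b> and <i>today</i>"

# Greedy: .+ grabs as much as possible, up to the last ">"
greedy = re.findall(r"<.+>", text)
print(greedy)
# output: ['<b>urgent</b> and <i>today</i>']

# Non-greedy: .+? stops at the first ">" it can
non_greedy = re.findall(r"<.+?>", text)
print(non_greedy)
# output: ['<b>', '</b>', '<i>', '</i>']
```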

Character classes and sets

The flexibility of regular expressions arises from the simplicity of using a few characters to denote multiple possibilities of characters. When I introduced raw strings in section 2.4.2, I mentioned that you can use \d to denote any digit. You can specify many other character sets with regular expressions. Here, I focus on the most common ones:

\d       any decimal digit
\D       any character that is not a decimal digit
\s       any whitespace, including space, \t, \n, \r, \f, \v
\S       any character that isn't a whitespace
\w       any word character, means alphanumeric plus underscores
\W       any character that is not a word character
.        any character except a newline
[]       a set of defined characters

You should note a few things about using [] to define a character set:

  • You can include individual characters. [abcxyz] will match any of these six characters, and [0z] will match "0" and "z".
  • You can include a range of characters. [a-z] will match any character between "a" and "z", and [A-Z] will match any character between "A" and "Z".
  • You can even combine different ranges of characters. [a-dw-z] will match any character between "a" and "d" or between "w" and "z".

The best way to remember what each character set does is to study specific examples, as shown in the following code snippet:

test_text = "#1$2m_ M\t"
patterns = [r"\d", r"\D", r"\s", r"\S", r"\w", r"\W", r".", r"[lmn]"]
for pattern in patterns:
   print(f"{pattern: <9}--->  {re.findall(pattern, test_text)}")

# output the following lines:
\d       --->  ['1', '2']
\D       --->  ['#', '$', 'm', '_', ' ', 'M', '\t']
\s       --->  [' ', '\t']
\S       --->  ['#', '1', '$', '2', 'm', '_', 'M']
\w       --->  ['1', '2', 'm', '_', 'M']
\W       --->  ['#', '$', ' ', '\t']
.        --->  ['#', '1', '$', '2', 'm', '_', ' ', 'M', '\t']
[lmn]    --->  ['m']

The identified matches form several pairs of complements. \d locates all digits, for example, and \D locates all the nondigits. Recognizing that these character classes have opposite matches helps you remember them. The key to mastering regular expressions is practice!
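For a little more practice, the following sketch combines character sets with quantifiers and checks the complement relationship between \d and \D. The sample strings are made up for illustration.

```python
import re

# Combined ranges: a-d plus w-z
combined = re.findall(r"[a-dw-z]", "abcdefwxyz")
print(combined)
# output: ['a', 'b', 'c', 'd', 'w', 'x', 'y', 'z']

text = "Task_42 due 2024-05-17!"
number_runs = re.findall(r"\d+", text)        # runs of digits
letter_runs = re.findall(r"[A-Za-z]+", text)  # runs of ASCII letters
print(number_runs)
# output: ['42', '2024', '05', '17']
print(letter_runs)
# output: ['Task', 'due']

# \d and \D are complements: together they cover every character
sample = "#1$2m_ M\t"
digits = re.findall(r"\d", sample)
non_digits = re.findall(r"\D", sample)
print(len(digits) + len(non_digits) == len(sample))
# output: True
```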

Logical operators

Like other programming languages, regular expressions have logical operations in terms of defining the patterns. These operations are the most common ones:

a|b       a or b
(abc)     abc as a group
[^a]      any character other than a

Use a pair of parentheses to denote an exact group of characters that must be present, and use the caret sign to create a character set by negating a specific one. If you want to find any character that is not s, for example, you can use [^s]. Here are some examples for your reference:

re.findall(r"a|b", "a c d d b ab")
# output: ['a', 'b', 'a', 'b']

re.findall(r"a|b", "c d d b")
# output: ['b']

re.findall(r"(abc)", "ab bc abc ac")
# output: ['abc']

re.findall(r"(abc)", "ab bc ac")
# output: []

re.findall(r"[^a]", "abcde")
# output: ['b', 'c', 'd', 'e']

2.4.4 Dissecting the matches

When you've learned to build a proper pattern, one obvious task is finding all the matches, as you did with the findall method (section 2.4.3). The findall method may be the most useful when the involved texts are short and we can easily figure out where the matches are. In actual projects, we'll likely deal with a large chunk of text, so showing us what the matches are doesn't help. Instead, we want to know where and what the matches are. This task is what Match objects are all about. This section shows how to process the matches.

Creating Match objects

The match and search methods are often used for pattern searching. The major difference between match and search is where they look for matches. The match method is interested in whether a match exists at the beginning of the string; the search method scans the string until it finds a match (if one exists). Despite this difference, both methods return a Match object when the pattern finds a match. For the sake of learning Match objects, focus on an example that calls the search method:

match = re.search(r"(\w\d)+", "xyza2b1c3dd")

print(match)
# output: <re.Match object; span=(3, 9), match='a2b1c3'>

The key information about a Match object is its matched string and the span. We can retrieve them with their respective methods: group, span, start, and end, as shown in the next listing.

Listing 2.6 Methods of a Match object
print("matched:", match.group())
# output: matched: a2b1c3

print("span:", match.span())
# output: span: (3, 9)

print(f"start: {match.start()} & end: {match.end()}")
# output: start: 3 & end: 9

When we use regular expressions, we perform specific operations only if a match is identified. To make our life easy, a Match object always evaluates to True when used in a conditional statement. Here's a general-use style:

match = re.match("pattern", "string to match")
if match:
   print("do something with the matched")
else:
   print("found no matches")
Readability

When you use if...else... with regular expressions, you can include a Match object directly in the if clause, as a Match object evaluates to True.
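On Python 3.8 or later, the same check can be written a bit more compactly with an assignment expression (the := operator). This is only a stylistic variation of the pattern shown above; the function name describe and the sample strings are assumptions for illustration.

```python
import re


def describe(text):
    """Report the first run of digits in text, if any."""
    # search returns None when nothing matches, and None evaluates to False
    if (match := re.search(r"\d+", text)):
        return f"found {match.group()} at {match.span()}"
    return "found no matches"


print(describe("task 101 is urgent"))
# output: found 101 at (5, 8)
print(describe("no numbers here"))
# output: found no matches
```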

Working with multiple groups

One thing that may puzzle you is why these pieces of information are retrieved by calling methods instead of attributes: match.span() vs. match.span. If you're wondering why, congratulations; you're developing a good sense of the OOP principle. I agree with you that from the OOP perspective, your intuition that the data should be attributes is correct. But the feature is implemented through method invocations because pattern searching can result in multiple groups. If you pay close attention to listing 2.6, you'll notice that you use the group method to retrieve the matched string. Are you wondering when a match can have multiple groups? Find out through an example:

match = re.match(r"(\w+), (\w+)", "Homework, urgent; today")
print(match)
# output: <re.Match object; span=(0, 16), match='Homework, urgent'>

match.groups()
# output: ('Homework', 'urgent')

match.group(0)
# output: 'Homework, urgent'

match.group(1)
# output: 'Homework'

match.group(2)
# output: 'urgent'

This pattern involves two groups (enclosed within parentheses), each of which searches for one or more word characters, separated by a comma and a space. As mentioned previously, the matching is greedy, so the match is the longest possible sequence: 'Homework, urgent'. The identified match creates separate groups that correspond to the pattern's groups.

By default, group 0 is the entire match. The subsequent groups are matched based on the pattern's groups. Because of the multiple groups that a pattern can match, it's better to use methods to retrieve each group's information instead of an attribute, which can't accept arguments. The same grouping also applies to span:

match.span(0)
# output: (0, 16)

match.span(1)
# output: (0, 8)

match.span(2)
# output: (10, 16)
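To see how span and group relate, the following sketch slices the original text with each group's span and compares the result with what group returns. The loop and variable names are illustrative only.

```python
import re

text = "Homework, urgent; today"
match = re.match(r"(\w+), (\w+)", text)

slices_agree = []
# Index 0 is the entire match; 1 and 2 are the pattern's groups
for index in range(len(match.groups()) + 1):
    start, end = match.span(index)
    slices_agree.append(text[start:end] == match.group(index))
    print(index, (start, end), text[start:end])

# output the following lines:
# 0 (0, 16) Homework, urgent
# 1 (0, 8) Homework
# 2 (10, 16) urgent
```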

2.4.5 Knowing the common methods

To use regular expressions effectively in our projects, we must know what functionalities are available for us to use. Table 2.3 summarizes the key methods; each method is accompanied by an example for illustration purposes.

Table 2.3 Common regular expression methods

Method / Code example                                  Match/return value

search: Returns a Match if a match is found anywhere in the string.
    re.search(r"\d+", "ab12xy")                        '12'
    re.search(r"\d+", "abxy")                          None

match: Returns a Match only if a match is found at the string's beginning.
    re.match(r"\d+", "ab12xy")                         None
    re.match(r"\d+", "12abxy")                         '12'

findall: Returns a list of strings that match the pattern. When the pattern has multiple groups, the item is a tuple.
    re.findall(r"h[ie]\w", "hi hey hello")             ['hey', 'hel']
    re.findall(r"(h|H)(i|e)", "Hey hello")             [('H', 'e'), ('h', 'e')]

finditer: Returns an iterator that yields the Match objects.
    re.finditer(r"(h|H)(i|e)", "hi Hey hello")         An iterator

split: Splits the string by the pattern.
    re.split(r"\d+", 'a1b2c3d4e')                      ['a', 'b', 'c', 'd', 'e']

sub: Creates a string by replacing the matched with the replacement.
    re.sub(r"\D", "-", '123,456_789')                  '123-456-789'

For the methods in table 2.3, I want to highlight the key points regarding their usage:

  • Both search and match identify a single Match object. The biggest difference is that match is anchored to the beginning of the string, whereas search scans the string, and a match in the middle is also valid.
  • When you try to locate all matches, the findall method returns all the matches without providing any information about where they are. Thus, more commonly, you want to use finditer. That method returns an iterator that yields each Match object, which has more descriptive information about the match (such as location).
  • The split method splits the string by all the matched patterns. Optionally, you can specify the maximum number of splits that you want.
  • The sub method's name means substitute, and you use this method to replace any identified pattern with the specified replacement. In an advanced use case, you can specify a function instead of a string literal, which takes a Match object as its argument to produce the desired replacement.
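The points above can be sketched in a few lines. The sample strings and the helper double are hypothetical, chosen only to illustrate finditer, the maxsplit argument of split, and a function used as the replacement in sub.

```python
import re

# finditer yields Match objects, so each match carries its location
found = [(m.group(), m.span()) for m in re.finditer(r"(h|H)(i|e)", "hi Hey hello")]
print(found)
# output: [('hi', (0, 2)), ('He', (3, 5)), ('he', (7, 9))]

# split with a cap on the number of splits
limited = re.split(r"\d+", "a1b2c3d4e", maxsplit=2)
print(limited)
# output: ['a', 'b', 'c3d4e']


def double(match):
    """Return the matched number doubled, as a string."""
    return str(int(match.group()) * 2)


# sub calls the function once per match to build each replacement
doubled = re.sub(r"\d+", double, "buy 2 apples and 10 pears")
print(doubled)
# output: buy 4 apples and 20 pears
```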

2.4.6 Discussion

The key steps in using regular expressions are (1) creating a pattern, (2) finding matches, and (3) processing matches. These steps should be built on a clear understanding of the exact needs of your text processing job. Think of the pattern at a higher level. Do you need boundary anchors, quantifiers, or character sets? Then drill down to the syntax for these categories. Be prepared for your pattern not to work as you expect. You must test your pattern by evaluating the matches with a subset of your text. There are almost always some edge cases that will surprise you. Ensure that the pattern accounts for such cases before you deploy anything to production.

2.4.7 Challenge

Suppose that a graduate student has a project that requires him to extract data from text. The text data is "abc_,abc__,abc,,__abc_,_abc", where abc stands for the needed data values. That is, the data values are separated by one or more separators. How can he use regular expressions to extract the data values?

Hint

When you need to create a pattern that involves a variable number of characters, think about using pattern quantifiers.


2.5 How do I use regular expressions to process texts?

Regular expressions are not the easiest topic to grasp because we're creating a general pattern that can match a variety of possibilities. In most cases, the pattern looks rather abstract and thus is confusing to many beginners. Therefore, don't feel frustrated if the concept is not making sense to you now; it takes time to master regular expressions. When you grasp them, you'll find them powerful for processing textual data.

Using our task management app as an example, suppose that we have the text shown in the following listing to begin with. The text, which is the data recovered from a database crash, contains multiple valid records of the tasks, but unfortunately, random text appears throughout the data.

Listing 2.7 Text data to be processed
text_data = """101, Homework; Complete physics and math
some random nonsense
102, Laundry; Wash all the clothes today
54, random; record
103, Museum; All about Egypt
1234, random; record
Another random record"""   # triple quotes create a multiline string

Our job is to extract all the valid records from the text data, leaving the invalid records. Suppose that there are several thousand lines of text, making it unrealistic to go through the data manually. We need to use a general pattern-searching approach to conquer this job, which is exactly what regular expressions are designed to do. In this section, I go over the key steps in solving this problem.

2.5.1 Creating a working pattern to find the matches

The string shown in listing 2.7 highlights a common task when we deal with texts: cleaning up the data. Often, the needed data is mixed with unneeded data. Thus, we want to implement a programmatic solution, taking advantage of regular expressions, to keep only the needed data. In this section, you’ll learn the first step: creating the pattern.

After making a careful inspection of the raw data, you notice that the valid records have three contributing groups: the task ID number in the form of three digits, the title of the task, and the description of the task. The first two groups are separated by a comma, and the last two groups are separated by a semicolon. Based on these pieces of information, you might build the following pattern, with each of the components analyzed in detail:

r"(\d{3}), (\w+); (.+)"
 
(\d{3}):   a group of 3 digits
, :        string literals, a comma and a space
(\w+):     a group of one or more word characters
; :        string literals, a semicolon and a space
(.+):      a group of one or more characters
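
The same breakdown can be kept next to the pattern itself by compiling it with the standard re.VERBOSE flag, which ignores whitespace in the pattern and allows inline comments. This is only a sketch; because whitespace is ignored in this mode, each literal space is written as [ ]. The names verbose_regex, compact_regex, and sample are illustrative.

```python
import re

verbose_regex = re.compile(
    r"""
    (\d{3})   # group 1: exactly three digits (the task ID)
    ,[ ]      # a literal comma followed by a literal space
    (\w+)     # group 2: one or more word characters (the title)
    ;[ ]      # a literal semicolon followed by a literal space
    (.+)      # group 3: one or more characters (the description)
    """,
    re.VERBOSE,
)

# The compact form of the same pattern, for comparison.
compact_regex = re.compile(r"(\d{3}), (\w+); (.+)")

sample = "101, Homework; Complete physics and math"
verbose_groups = verbose_regex.match(sample).groups()
compact_groups = compact_regex.match(sample).groups()
print(verbose_groups == compact_groups)
```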

Applying this pattern to the text data, you can have a quick look at the outcome. At this stage, don’t worry about processing the matches, because you want to make sure that the pattern works as expected. You can run the following code after you test and modify the pattern multiple times before you reach the desired pattern:

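import re

regex = re.compile(r"(\d{3}), (\w+); (.+)")
for line in text_data.split("\n"):                   # process the text one line at a time
    match = regex.match(line)                        # a Match object, or None when the line doesn't fit
    if match:
        print(f"{'Matched:':<12}{match.group()}")    # group() without arguments returns the entire match
    else:
        print(f"{'No Match:':<12}{line}")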
 
# output the following lines:
Matched:    101, Homework; Complete physics and math
No Match:   some random nonsense
Matched:    102, Laundry; Wash all the clothes today
No Match:   54, random; record
Matched:    103, Museum; All about Egypt
No Match:   1234, random; record
No Match:   Another random record

As mentioned in section 2.4.4, an important feature of the Match object is that it evaluates to True, allowing us to work on the Match object only if it is created by the match method. When a line doesn’t fit the pattern, the match method returns None, which evaluates to False. From the printout, you see that you obtain valid records from the matched objects. By contrast, in the unmatched cases, the records are indeed invalid.
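
The truthiness behavior can be checked directly. The following sketch reuses the pattern above on three lines taken from listing 2.7; the variable names hit, too_short, and too_long are illustrative.

```python
import re

regex = re.compile(r"(\d{3}), (\w+); (.+)")

hit = regex.match("102, Laundry; Wash all the clothes today")
too_short = regex.match("54, random; record")     # only two digits
too_long = regex.match("1234, random; record")    # a fourth digit where a comma is required

print(bool(hit))            # a Match object is truthy
print(too_short is None)    # no match yields None, which is falsy
print(too_long is None)
```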

2.5.2 Extracting the needed data from the matches

Because the pattern works as expected, it’s time to extract the data and prepare it for further processing. To be specific, you want to save each record (ID, title, and description) as a tuple object, and the tuple objects form a list object.

Notably, when you built your pattern, you included three separate groups that accounted for each of the task’s data fields. These groups allow you to access the individual matches for each group. The next listing shows how groups work.

Listing 2.8 Extracting data from individual groups
import re

regex = re.compile(r"(\d{3}), (\w+); (.+)")
tasks = []
for line in text_data.split("\n"):
    match = regex.match(line)
    if match:
        task = (match.group(1), match.group(2), match.group(3))   # groups are numbered from 1
        tasks.append(task)
 
print(tasks)
# output the following line
[('101', 'Homework', 'Complete physics and math'), 
 ('102', 'Laundry', 'Wash all the clothes today'), 
 ('103', 'Museum', 'All about Egypt')]

As shown in listing 2.8, we use the group method and access the three identified groups in a sequential manner: group 1 for the ID, group 2 for the title, and group 3 for the description. As a related note, when we omit the number parameter in the group method, we’ll retrieve the entire match across the groups (see section 2.4.4).

In our example, we have three groups in the pattern. When our records get more complicated, we may have to deal with more groups. Using the integers to track these groups sequentially can be error-prone; it’s not difficult to miscount by one, which can lead to unexpected behaviors.

Isn’t a better solution available? That question leads to the discussion in section 2.5.3.
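
A short sketch of these access options, using one valid line from listing 2.7. The groups method and the multi-argument form of group are standard Match methods that listing 2.8 doesn’t use; they appear here only as related alternatives.

```python
import re

regex = re.compile(r"(\d{3}), (\w+); (.+)")
match = regex.match("103, Museum; All about Egypt")

whole = match.group()            # no argument: the entire match (same as group(0))
task_id = match.group(1)         # numbered groups start at 1
all_groups = match.groups()      # every group at once, as a tuple
id_and_desc = match.group(1, 3)  # several group numbers return a tuple

print(whole)
print(task_id)
print(all_groups)
print(id_and_desc)
```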

2.5.3 Using named groups for text processing

In general, texts provide more semantic information than numbers do. If the integers that refer to the groups can be confusing, do we have the option of using texts for group referencing? Fortunately, Python supports this feature, which is called named groups. In essence, this feature allows you to give a name to the group in such a way that you can use the name to refer to the group for later processing.

To name a group, you use the syntax (?P<group_name>pattern), in which you name the pattern group as group_name. The name should be a valid Python identifier because you must be able to retrieve it by calling the name. Now you can use the named groups technique to update the code in listing 2.8, as the next listing shows.

Listing 2.9 Using named groups to extract data
import re

regex = re.compile(r"(?P<task_id>\d{3}), (?P<task_title>\w+); (?P<task_desc>.+)")
tasks = []
for line in text_data.split("\n"):
    match = regex.match(line)
    if match:
        # retrieve each group by its name instead of its position
        task = (match.group('task_id'), match.group('task_title'),
                match.group('task_desc'))
        tasks.append(task)

In the code snippet, we named the three groups task_id, task_title, and task_desc, which clearly indicate the data for each group. Later, instead of passing an integer to the group method, we can pass the group name directly. Compared with the implementation in listing 2.8, using named groups in listing 2.9 improves code readability; more important, it decreases the likelihood of referencing a wrong group, particularly if a pattern contains many more groups.

Maintainability

Always use sensible identifiers to name variables or any objects. This approach not only improves readability, but also leads to fewer possible mistakes because you know what data you’re dealing with by looking at the names.

Although we use the group method to retrieve the individual items from the identified groups, named groups give us another option for retrieving the identified data: the groupdict method. For the first identified match, we might have the following data:

>>> match.groupdict()
{'task_id': '101', 'task_title': 'Homework', 'task_desc': 
 'Complete physics and math'}

If you prefer using this dict object for data processing, it’s also a good choice in terms of code readability.
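
As a minimal sketch of the dict-based route, the following code collects one dictionary per valid record. The sample_text string is a shortened, illustrative stand-in for the data in listing 2.7.

```python
import re

regex = re.compile(r"(?P<task_id>\d{3}), (?P<task_title>\w+); (?P<task_desc>.+)")

# A shortened stand-in for the text in listing 2.7.
sample_text = """101, Homework; Complete physics and math
some random nonsense
103, Museum; All about Egypt"""

tasks = []
for line in sample_text.split("\n"):
    match = regex.match(line)
    if match:
        # groupdict maps each group name to the text that the group matched
        tasks.append(match.groupdict())

print(tasks)
```

Each record is then read by key, such as tasks[0]["task_title"], which keeps downstream code independent of the order of the groups in the pattern.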

2.5.4 Discussion

The first step in using regular expressions is knowing what business needs we want to achieve and creating a pattern accordingly. You shouldn’t feel obsessed with making the pattern correct on the first try. You must test your pattern with the text, and it’ll take multiple rounds of back-and-forth effort to find the correct pattern (figure 2.5).

Figure 2.5 The general process of using regular expressions in processing texts

When you work with more groups identified through a pattern, I recommend that you use named groups, because by naming these groups, you’re clearly telling the readers what data a group holds. Later, it’ll be easier to refer to the groups because of their sensible names.

2.5.5 Challenge

When we processed the text data to extract the records, we split the text into separate rows. Assuming that each row indeed has one valid record or no record, could you find a pattern that processes all the text without splitting the data into multiple rows?

Hint

Each row ends with a newline character (\n). Integrate that character into your pattern.

Summary

  • An f-string is a concise way to interpolate variables and expressions.
  • Applying a proper text alignment to an f-string makes the information clear by creating visual boundaries for distinct pieces of data.
  • F-strings are also good at formatting numbers, such as scientific notations and precisions for decimals.
  • Python strings have isalnum, isnumeric, and many other is- methods. You can use them to determine the nature of a string.
  • All Python data, such as integers and lists, can have the appearance of a string (such as when data is transferred over the internet and all of it consists of strings). We convert these strings to their native data types by evaluating them, so we can use the data type-specific methods.
  • When we need to join a few strings, it’s fine to use the concatenation symbols. When we deal with multiple strings, however, it’s better to use the join method.
  • The split method splits strings, which is a useful data processing tool as well as the basis for processing tabulated text files. Although built-in modules are available, such as csv, knowing these fundamentals is key to writing a script for your own job.
  • The key to using regular expressions is building a pattern that addresses your needs. When we build a pattern, we need to start our thinking at a higher level. Relevant questions can include these: Do I need multiple groups? How about boundary anchors, character sets, or quantifiers?
  • Named groups make it easier to refer to specific information when you use regular expressions to process complicated text data.

1. We define the task as a string variable: task = "homework".

2. We define the number as an integer variable (number = 15) and the point as a float variable (point = 1.2345). Please note that the .2 portion in the format specifiers for floats is optional. When you use .3, you’ll have three-digit precision.

3. An iterator is an object that can be iterated, such as in a for loop. I cover iterators in chapter 5.
