Chapter 7. Classifying with decision trees


This chapter covers

  • Working with decision trees
  • Using the recursive partitioning algorithm
  • An important weakness of decision trees

There’s nothing like the great outdoors. I live in the countryside, and when I walk my dog in the woods, I’m reminded just how much we rely on trees. Trees produce the atmosphere we breathe, create habitats for wildlife, provide us with food, and are surprisingly good at making predictions. Yes, you read that right: trees are good at making predictions. But before you go asking the birch in your back garden for next week’s lottery numbers, I should clarify that I’m referring to several supervised learning algorithms that use a branching tree structure. This family of algorithms can be used to solve both classification and regression tasks, can handle continuous and categorical predictors, and are naturally suited to solving multiclass classification problems.

Note

Remember that a predictor variable is a variable we believe may contain information about the value of our outcome variable. Continuous predictors can have any numeric value on their measurement scale, while categorical variables can have only finite, discrete values/categories.

The basic premise of all tree-based classification algorithms is that they learn a sequence of questions that separates cases into different classes. Each question has a binary answer, and cases will be sent down the left or right branch depending on which criteria they meet. There can be branches within branches; and once the model is learned, it can be graphically represented as a tree. Have you ever played the game 20 Questions, where you have to guess what object someone is thinking of by asking yes-or-no questions? What about the game Guess Who, where you have to guess the other player's character by asking questions about their appearance? These are examples of tree-based classifiers.

By the end of this chapter, you'll see how such simple, interpretable models can be used to make predictions. We'll finish the chapter by highlighting an important weakness of decision trees, which you'll learn how to overcome in the next chapter.

7.1. What is the recursive partitioning algorithm?

In this section, you'll learn how decision tree algorithms, and specifically the recursive partitioning (rpart) algorithm, work to learn a tree structure. Imagine that you want to create a model to represent the way people commute to work, given features of the vehicle. You gather information on the vehicles, such as how many wheels they have, whether they have an engine, and their weight. You could formulate your classification process as a series of sequential questions. Every vehicle is evaluated at each question and moves either left or right in the model depending on how its features satisfy the question. An example of such a model is shown in figure 7.1.

Figure 7.1. The structure of a decision tree. The root node is the node that contains all the data prior to splitting. Nodes are split by a splitting criterion into two branches, each of which leads to another node. Nodes that do not split any further are called leaves.

Notice that our model has a branching, tree-like structure, where each question splits the data into two branches. Each branch can lead to additional questions, which have branches of their own. The question parts of the tree are called nodes, and the very first question/node is called the root node. Nodes have one branch leading to them and two branches leading away from them. Nodes at the end of a series of questions are called leaf nodes or leaves. Leaf nodes have a single branch leading to them but no branches leading away from them. When a case finds its way down the tree into a leaf node, it progresses no further and is classified as the majority class within that leaf. It may seem strange to you (it does to me, anyway) that the root is at the top and the leaves are at the bottom, but this is the way tree-based models are usually represented.

Note

Although not shown in this small example, it is perfectly fine (and common) to have questions about the same feature in different parts of the tree.

This all seems simple so far. But in our previous simplistic example, we could have constructed this ourselves by hand. (In fact, I did!) So tree-based models aren't necessarily learned by machine learning. A decision tree could be an established HR process for dealing with disciplinary action, for example. You could have a tree-based approach to deciding which flight to buy (is the price above your budget, is the airline reliable, is the food terrible, and so on). So how can we learn the structure of a decision tree automatically for complex datasets with many features? Enter the rpart algorithm.

Note

Tree-based models can be used for both classification and regression tasks, so you may see them described as classification and regression trees (CART). However, CART is a trademarked algorithm whose code is proprietary. The rpart algorithm is simply an open source implementation of CART. You'll learn how to use trees for regression tasks in chapter 12.

At each stage of the tree-building process, the rpart algorithm considers all of the predictor variables and selects the predictor that does the best job of discriminating the classes. It starts at the root and then, at each branch, looks again for the next feature that will best discriminate the classes of the cases that took that branch. But how does rpart decide on the best feature at each split? This can be done a few different ways, and rpart offers two approaches: the difference in entropy (called the information gain) and the difference in Gini index (called the Gini gain). The two methods usually give very similar results; but the Gini index (named after the sociologist and statistician Corrado Gini) is slightly faster to compute, so we'll focus on it.

Tip

The Gini index is the default method rpart uses to decide how to split the tree. If you're concerned that you're missing the best-performing model, you can always compare Gini index and entropy during hyperparameter tuning.

7.1.1. Using Gini gain to split the tree

In this section, I'll show you how Gini gain is calculated to find the best split for a particular node when growing a decision tree. Entropy and the Gini index are two ways of trying to measure the same thing: impurity. Impurity is a measure of how heterogeneous the classes are within a node.

Note

If a node contains only a single class (which would make it a leaf), it would be said to be pure.

By estimating the impurity (with whichever method you choose) that would result from using each predictor variable for the next split, the algorithm can choose the feature that will result in the smallest impurity. Put another way, the algorithm chooses the feature that will result in subsequent nodes that are as homogeneous as possible.

So what does the Gini index look like? Figure 7.2 shows an example split. We have 20 cases in a parent node belonging to two classes, A and B. We split the node into two leaves based on some criterion. In the left leaf, we have 11 cases from class A and 3 from class B. In the right leaf, we have 5 cases from class B and 1 from class A.

Figure 7.2. An example decision tree split for 20 cases belonging to classes A and B

We want to know the Gini gain of this split. The Gini gain is the difference between the Gini index of the parent node and the Gini index of the split. Looking at our example in figure 7.2, the Gini index for any node is calculated as

  • Gini index = 1 – (p(A)² + p(B)²)

where p(A) and p(B) are the proportions of cases belonging to classes A and B, respectively. So the Gini indices for the parent node and the left and right leaves are shown in figure 7.3.

Figure 7.3. Calculating the Gini index of the parent node and the left and right leaves
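Using the counts from figure 7.2 (12 class A and 8 class B cases in the parent node, 11 and 3 in the left leaf, and 1 and 5 in the right leaf), these work out to approximately

  • Gini index (parent) = 1 – ((12/20)² + (8/20)²) = 0.48
  • Gini index (left leaf) = 1 – ((11/14)² + (3/14)²) ≈ 0.34
  • Gini index (right leaf) = 1 – ((1/6)² + (5/6)²) ≈ 0.28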

Now that we have the Gini indices for the left and right leaves, we can calculate the Gini index for the split as a whole. The Gini index of the split is the sum of the left and right Gini indices, each multiplied by the proportion of cases that leaf accepted from the parent node:
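  • Gini index of split = p(left) × Gini(left) + p(right) × Gini(right) = (14/20 × 0.34) + (6/20 × 0.28) ≈ 0.32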

And the Gini gain (the difference between the Gini indices of the parent node and the split) is simply

  • Gini gain = 0.48 – 0.32 = 0.16

where 0.48 is the Gini index of the parent, as calculated in figure 7.3.

The Gini gain at a particular node is calculated for each predictor variable, and the predictor that generates the largest Gini gain is used to split that node. This process is repeated for every node as the tree grows.
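If you'd like to check this arithmetic yourself, here is a small base R sketch (not part of rpart, just the formulas above applied to the class counts from figure 7.2):

# Gini index of a node from its class counts: 1 minus the sum of squared class proportions
giniIndex <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

parent <- c(A = 12, B = 8)  # 20 cases in the parent node
left   <- c(A = 11, B = 3)  # left leaf after the split
right  <- c(A = 1,  B = 5)  # right leaf after the split

# Gini index of the split: each leaf's index weighted by the proportion of cases it received
giniSplit <- sum(left) / sum(parent) * giniIndex(left) +
  sum(right) / sum(parent) * giniIndex(right)

giniIndex(parent) - giniSplit  # Gini gain: roughly 0.48 - 0.32 = 0.16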

Generalizing the Gini index to any number of classes

In this example, we've considered only two classes, but the Gini index of a node is easily calculable for problems that have many classes. In that situation, the equation for the Gini index generalizes to
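  • Gini index = 1 – (p(class 1)² + p(class 2)² + … + p(class K)²)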

which is just a fancy way of saying that we calculate p(class)² for each class from 1 to K (the number of classes), add them all up, and subtract this value from 1.

If you’re interested, the equation for entropy is
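  • Entropy = –p(class 1) × log2 p(class 1) – p(class 2) × log2 p(class 2) – … – p(class K) × log2 p(class K)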

which is just a fancy way of saying that we calculate –p(class) × log2 p(class) for each class from 1 to K (the number of classes) and add them all up (which becomes a subtraction because the first term is negative). As for Gini gain, the information gain is calculated as the entropy of the parent minus the entropy of the split (which is calculated exactly the same way as the Gini index of the split).

7.1.2. What about continuous and multilevel categorical predictors?

In this section, I'll show you how the splits are chosen for continuous and categorical predictor variables. When a predictor variable is dichotomous (has only two levels), it's quite obvious how to use it for a split: cases with one value go left, and cases with the other value go right. Decision trees can also split the cases using continuous variables, but what value is chosen as the split point? Have a look at the example in figure 7.4. We have cases from three classes plotted against two continuous variables. The feature space is split into rectangles by each node. At the first node, the cases are split into those with a value of variable 2 greater than or less than 20. The cases that make it to the second node are further split into those with a value of variable 1 greater than or less than 10,000.

Figure 7.4. How splitting is performed for continuous predictors. Cases belonging to three classes are plotted against two continuous variables. The first node splits the feature space into rectangles based on the value of variable 2. The second node further splits the variable 2 ≥ 20 feature space into rectangles based on the value of variable 1.
Note

Notice that the variables are on vastly different scales. The rpart algorithm isn't sensitive to variables being on different scales, so there's no need to scale and center your predictors!

But how is the exact split point chosen for a continuous predictor? Well, the cases in the training set are arranged in order of the continuous variable, and the Gini gain is evaluated for the midpoint between each adjacent pair of cases. If the greatest Gini gain among all predictor variables is one of these midpoints, then this is chosen as the split for that node. This is illustrated in figure 7.5.
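As a quick illustration of where those candidate split points come from (a sketch with made-up values, not an rpart call), the midpoints between adjacent sorted values of a continuous predictor can be computed like this:

predictorVals <- c(3.1, 9.8, 4.7, 6.2, 5.0)  # made-up values of a continuous predictor

sortedVals <- sort(predictorVals)  # arrange the cases in order of the predictor

# Candidate split points: the midpoint between each adjacent pair of values;
# the Gini gain would be evaluated at each of these midpoints
(head(sortedVals, -1) + tail(sortedVals, -1)) / 2  # 3.90 4.85 5.60 8.00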

A similar procedure is used for categorical predictors with more than two levels. First, the Gini index is computed for each level of the predictor (using the proportion of each class that has that level of the predictor). The factor levels are arranged in order of their Gini indices, and the Gini gain is evaluated for a split between each adjacent pair of levels. Take a look at the example in figure 7.6. We have a factor with three levels (A, B, and C): we evaluate the Gini index of each and find that their values are ordered B < A < C. Now we evaluate the Gini gain for the splits B versus A and C, and B and A versus C.

In this way, we can create a binary split from categorical variables with many levels without having to try every single possible combination of level splits (2^(m−1), where m is the number of levels of the variable). If the split B versus A and C is found to have the greatest Gini gain, then cases reaching this node will go down one branch if they have a value of B for this variable, and will go down the other branch if they have a value of A or C.

Figure 7.5. How the split point is chosen for continuous predictors. Cases (circles) are arranged in order of their value of the continuous predictor. The midpoint between each adjacent pair of cases is considered as a candidate split, and the Gini gain is calculated for each. If one of these splits has the highest Gini gain of any candidate split, it will be used to split the tree at this node.
Figure 7.6. How the split point is chosen for categorical predictors. The Gini index of each factor level is calculated using the proportion of cases from each class with that factor level. The factor levels are arranged in order of their Gini indices, and the Gini gain is evaluated for each split between adjacent levels.

7.1.3. Hyperparameters of the rpart algorithm

In this section, I'll show you which hyperparameters need to be tuned for the rpart algorithm, what they do, and why we need to tune them in order to get the best-performing tree possible. Decision tree algorithms are described as greedy. By greedy, I don't mean they take an extra helping at the buffet line; I mean they search for the split that will perform best at the current node, rather than the one that will produce the best result globally. For example, a particular split might discriminate the classes best at the current node but result in poor separation further down that branch. Conversely, a split that results in poor separation at the current node may yield better separation further down the tree. Decision tree algorithms would never pick this second split because they only look at locally optimal splits, instead of globally optimal ones. There are three issues with this approach:

  • The algorithm isn't guaranteed to learn a globally optimal model.
  • If left unchecked, the tree will continue to grow deeper until all the leaves are pure (of only one class).
  • For large datasets, growing extremely deep trees becomes computationally expensive.

While it's true that rpart isn't guaranteed to learn a globally optimal model, the depth of the tree is of greater concern to us. Besides the computational cost, growing a full-depth tree until all the leaves are pure is very likely to overfit the training set and create a model with high variance. This is because as the feature space is split up into smaller and smaller pieces, we're much more likely to start modeling the noise in the data.

How do we guard against such extravagant tree building? There are two ways of doing it:

  • Grow a full tree, and then prune it.
  • Employ stopping criteria.

In the first approach, we allow the greedy algorithm to grow its full, overfit tree, and then we get out our garden shears and remove leaves that don't meet certain criteria. This process is imaginatively named pruning, because we end up removing branches and leaves from our tree. This is sometimes called bottom-up pruning because we start from the leaves and prune up toward the root.

In the second approach, we include conditions during tree building that will force splitting to stop if certain criteria aren't met. This is sometimes called top-down pruning because we are pruning the tree as it grows down from the root.

Both approaches may yield comparable results in practice, but there is a slight computational edge to top-down pruning because we don't need to grow full trees and then prune them back. For this reason, we will use the stopping criteria approach.

The stopping criteria we can apply at each stage of the tree-building process are as follows:

  • Minimum number of cases in a node before splitting
  • Maximum depth of the tree
  • Minimum improvement in performance for a split
  • Minimum number of cases in a leaf

These criteria are illustrated in figure 7.7. For each candidate split during tree building, each of these criteria is evaluated and must be passed for the node to be split further.

Figure 7.7. Hyperparameters of rpart. Important nodes are highlighted in each example, and the numbers in each node represent the number of cases. The minsplit, maxdepth, cp, and minbucket hyperparameters all simultaneously constrain the splitting of each node.

The minimum number of cases needed to split a node is called minsplit by rpart. If a node has fewer than the specified number, the node will not be split further. The maximum depth of the tree is called maxdepth by rpart. If a node is already at this depth, it will not be split further. The minimum improvement in performance is, confusingly, not the Gini gain of a split. Instead, a statistic called the complexity parameter (cp in rpart) is calculated for each level of depth of the tree. If the cp value of a depth is less than the chosen threshold value, the nodes at that level will not be split further. In other words, if adding another layer to the tree doesn't improve the performance of the model by cp, don't split the nodes. The cp value is calculated as
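  • cp = (p(incorrect l+1) – p(incorrect l)) / (n(splits l) – n(splits l+1))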

where p(incorrect) is the proportion of incorrectly classified cases at a particular depth of the tree, and n(splits) is the number of splits at that depth. The indices l and l + 1 indicate the current depth (l) and one depth above (l + 1). This reduces to the difference in incorrectly classified cases at one depth compared to the depth above it, divided by the number of new splits added to the tree. If this seems a bit abstract at the moment, we'll work through an example when we build our own decision tree later in the chapter.

Finally, the minimum number of cases allowed in a leaf is called minbucket by rpart. If splitting a node would result in leaves containing fewer cases than minbucket, the node will not be split.

These four criteria combined can make for very stringent and complicated stopping criteria. Because the values of these criteria cannot be learned directly from the data, they are hyperparameters. What do we do with hyperparameters? Tune them! So when we build a model with rpart, we will tune these stopping criteria to the values that give us the best-performing model.
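If you were calling rpart directly rather than through mlr (as we do later in this chapter), these stopping criteria are passed via rpart.control(). The sketch below uses rpart's default values and the built-in iris data purely to show where each hyperparameter plugs in:

library(rpart)

irisTree <- rpart(Species ~ ., data = iris,
                  control = rpart.control(minsplit = 20,   # minimum cases in a node before splitting
                                          minbucket = 7,   # minimum cases allowed in a leaf
                                          cp = 0.01,       # minimum improvement needed to keep splitting
                                          maxdepth = 30))  # maximum depth of the tree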

Note

Recall from chapter 3 that a variable or option that controls how an algorithm learns, but which cannot be learned from the data, is called a hyperparameter.


7.2. Building your first decision tree model

In this section, you're going to learn how to build a decision tree with rpart and how to tune its hyperparameters. Imagine that you work in public engagement at a wildlife sanctuary. You're tasked with creating an interactive game for children, to teach them about different animal classes. The game asks the children to think of any animal in the sanctuary, and then asks them questions about the physical characteristics of that animal. Based on the responses the child gives, the model should tell the child what class their animal belongs to (mammal, bird, reptile, and so on). It's important for your model to be general enough that it can be used at other wildlife sanctuaries. Let's start by loading the mlr and tidyverse packages:

library(mlr)
library(tidyverse)

7.3. Loading and exploring the zoo dataset

Let's load the zoo data set that is built into the mlbench package, convert it into a tibble, and explore it. We have a tibble containing 101 cases and 17 variables of observations made on various animals; 16 of these variables are logical, indicating the presence or absence of some characteristic, and the type variable is a factor containing the animal classes we wish to predict.

Listing 7.1. Loading and exploring the zoo dataset
data(Zoo, package = "mlbench")

zooTib <- as_tibble(Zoo)

zooTib

# A tibble: 101 x 17
   hair  feathers eggs  milk  airborne aquatic predator toothed backbone
   <lgl> <lgl>    <lgl> <lgl> <lgl>    <lgl>   <lgl>    <lgl>   <lgl>
 1 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   TRUE     TRUE    TRUE
 2 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   FALSE    TRUE    TRUE
 3 FALSE FALSE    TRUE  FALSE FALSE    TRUE    TRUE     TRUE    TRUE
 4 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   TRUE     TRUE    TRUE
 5 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   TRUE     TRUE    TRUE
 6 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   FALSE    TRUE    TRUE
 7 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   FALSE    TRUE    TRUE
 8 FALSE FALSE    TRUE  FALSE FALSE    TRUE    FALSE    TRUE    TRUE
 9 FALSE FALSE    TRUE  FALSE FALSE    TRUE    TRUE     TRUE    TRUE
10 TRUE  FALSE    FALSE TRUE  FALSE    FALSE   FALSE    TRUE    TRUE
# ... with 91 more rows, and 8 more variables: breathes <lgl>, venomous <lgl>,
#   fins <lgl>, legs <int>, tail <lgl>, domestic <lgl>, catsize <lgl>,
#   type <fct>

Unfortunately, mlr won't let us create a task with logical predictors, so let's convert them into factors instead. There are a few ways to do this, but dplyr's mutate_if() function comes in handy here. This function takes the data as the first argument (or we could have piped this in with %>%). The second argument is our criterion for selecting columns, so here I've used is.logical to consider only the logical columns. The final argument is what to do with those columns, so I've used as.factor to convert the logical columns into factors. This will leave the existing factor, type, untouched.

Listing 7.2. Converting logical variables to factors
zooTib <- mutate_if(zooTib, is.logical, as.factor)
Tip

Alternatively, I could have used mutate_all(zooTib, as.factor), because the type column is already a factor.

7.4. Training the decision tree model

In this section, I'll walk you through training a decision tree model using the rpart algorithm. We'll tune the algorithm's hyperparameters and then train a model using the optimal hyperparameter combination.

Let's define our task and learner, and build a model as usual. This time, we supply "classif.rpart" as the argument to makeLearner() to specify that we're going to use rpart.

Listing 7.3. Creating the task and learner
zooTask <- makeClassifTask(data = zooTib, target = "type")

tree <- makeLearner("classif.rpart")

Next, we need to perform hyperparameter tuning. Recall that the first step is to define a hyperparameter space over which we want to search. Let's look at the hyperparameters available to us for the rpart algorithm, in listing 7.4. We've already discussed the most important hyperparameters for tuning: minsplit, minbucket, cp, and maxdepth. There are a few others you may find useful to know about.

The maxcompete hyperparameter controls how many candidate splits can be displayed for each node in the model summary. The model summary shows the candidate splits in order of how much they improved the model (Gini gain). It may be useful to understand what the next-best split was after the one that was actually used, but tuning maxcompete doesn't affect model performance, only its summary.

The maxsurrogate hyperparameter is similar to maxcompete but controls how many surrogate splits are shown. A surrogate split is a split used if a particular case is missing data for the actual split. In this way, rpart can handle missing data, as it learns which splits can be used in place of missing variables. The maxsurrogate hyperparameter controls how many of these surrogates to retain in the model (if a case is missing a value for the main split, it is passed to the first surrogate split, then to the second surrogate if it is also missing a value for the first surrogate, and so on). Although we don't have any missing data in our data set, future cases we wish to predict might. We could set this to zero to save some computation time, which is equivalent to not using surrogate variables, but doing so might reduce the accuracy of predictions made on future cases with missing data. The default value of 5 is usually fine.

Tip

Recall from chapter 6 that we can quickly count the number of missing values per column of a data.frame or tibble by running map_dbl(zooTib, ~sum(is.na(.))).

The usesurrogate hyperparameter controls how the algorithm uses surrogate splits. A value of zero means surrogates will not be used, and cases with missing data will not be classified. A value of 1 means surrogates will be used, but if a case is missing data for the actual split and for all the surrogate splits, that case will not be classified. The default value of 2 means surrogates will be used, and a case with missing data for the actual split and for all the surrogate splits will be sent down the branch that contained the most cases. The default value of 2 is usually appropriate.

Note

If you have cases that are missing data for the actual split and all the surrogate splits for a node, you should probably consider the impact missing data is having on the quality of your data set!

Listing 7.4. Printing available rpart hyperparameters
getParamSet(tree)

                   Type len  Def   Constr Req Tunable Trafo
minsplit        integer   -   20 1 to Inf   -    TRUE     -
minbucket       integer   -    - 1 to Inf   -    TRUE     -
cp              numeric   - 0.01   0 to 1   -    TRUE     -
maxcompete      integer   -    4 0 to Inf   -    TRUE     -
maxsurrogate    integer   -    5 0 to Inf   -    TRUE     -
usesurrogate   discrete   -    2    0,1,2   -    TRUE     -
surrogatestyle discrete   -    0      0,1   -    TRUE     -
maxdepth        integer   -   30  1 to 30   -    TRUE     -
xval            integer   -   10 0 to Inf   -   FALSE     -
parms           untyped   -    -        -   -    TRUE     -

Now, let's define the hyperparameter space we want to search over. We're going to tune the values of minsplit (an integer), minbucket (an integer), cp (a numeric), and maxdepth (an integer).

Note

Remember that we use makeIntegerParam() and makeNumericParam() to define the search spaces for integer and numeric hyperparameters, respectively.

Listing 7.5. Defining the hyperparameter space for tuning
treeParamSpace <- makeParamSet(
  makeIntegerParam("minsplit", lower = 5, upper = 20),
  makeIntegerParam("minbucket", lower = 3, upper = 10),
  makeNumericParam("cp", lower = 0.01, upper = 0.1),
  makeIntegerParam("maxdepth", lower = 3, upper = 10))

Next, we can define how we're going to search the hyperparameter space we defined in listing 7.5. Because the hyperparameter space is quite large, we're going to use a random search rather than a grid search. Recall from chapter 6 that a random search is not exhaustive (it will not try every hyperparameter combination) but will randomly select combinations as many times (iterations) as we tell it to. We're going to use 200 iterations.

In listing 7.6 we also define our cross-validation strategy for tuning. Here, I'm going to use ordinary 5-fold cross-validation. Recall from chapter 3 that this will split the data into five folds and use each fold as the test set once. For each test set, a model will be trained on the rest of the data (the training set). This will be performed for each combination of hyperparameter values tried by the random search.

Note

Ordinarily, if classes are imbalanced, I would use stratified sampling. Here, though, because we have very few cases in some of the classes, there are not enough cases to stratify (try it: you'll get an error). For this example, we won't stratify; but in situations where you have very few cases in a class, you should consider whether there is enough data to justify keeping that class in the model.
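If you're curious why stratification fails here, a quick count of the cases in each class shows how few some of them have:

table(zooTib$type)  # some classes contain only a handful of cases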

Listing 7.6. Defining the random search
randSearch <- makeTuneControlRandom(maxit = 200)

cvForTuning <- makeResampleDesc("CV", iters = 5)

Finally, let’s perform our hyperparameter tuning!

Listing 7.7. Performing hyperparameter tuning
library(parallel)
library(parallelMap)

parallelStartSocket(cpus = detectCores())

tunedTreePars <- tuneParams(tree, task = zooTask,
                           resampling = cvForTuning,
                           par.set = treeParamSpace,
                           control = randSearch)

parallelStop()

tunedTreePars

Tune result:
Op. pars: minsplit=10; minbucket=4; cp=0.0133; maxdepth=9
mmce.test.mean=0.0698

To speed things up, we first start parallelization by running parallelStartSocket(), setting the number of CPUs equal to the number we have available.

Tip

If you want to use your computer for other things while tuning occurs, you may wish to set the number of CPUs used to fewer than the maximum available to you.

Then we use the tuneParams() function to start the tuning process. The arguments are the same as we've used previously: the first is the learner, the second is the task, resampling is the cross-validation method, par.set is the hyperparameter space, and control is the search method. Once it's completed, we stop parallelization and print our tuning results.

Warning

This takes about 30 seconds to run on my four-core machine.

The rpart algorithm isn't nearly as computationally expensive as the support vector machine (SVM) algorithm we used for classification in chapter 6. Therefore, despite tuning four hyperparameters, the tuning process doesn't take as long (which means we can perform more search iterations).

7.4.1. Training the model with the tuned hyperparameters

Now that we've tuned our hyperparameters, we can train our final model using them. Just like in the previous chapter, we use the setHyperPars() function to create a learner using the tuned hyperparameters, which we access using tunedTreePars$x. We can then train the final model using the train() function, as usual.

Listing 7.8. Training the final tuned model
tunedTree <- setHyperPars(tree, par.vals = tunedTreePars$x)

tunedTreeModel <- train(tunedTree, zooTask)

One of the wonderful things about decision trees is how interpretable they are. The easiest way to interpret the model is to draw a graphical representation of the tree. There are a few ways of plotting decision tree models in R, but my favorite is the rpart.plot() function from the package of the same name. Let's install the rpart.plot package first and then extract the model data using the getLearnerModel() function.

Listing 7.9. Plotting the decision tree
install.packages("rpart.plot")

library(rpart.plot)

treeModelData <- getLearnerModel(tunedTreeModel)

rpart.plot(treeModelData, roundint = FALSE,
           box.palette = "BuBn",
           type = 5)

The first argument of the rpart.plot() function is the model data. Because we trained this model using mlr, the function will give us a warning that it cannot find the data used to train the model. We can safely ignore this warning, but if it irritates you as much as it irritates me, you can prevent it by supplying the argument roundint = FALSE. The function will also complain if we have more classes than its default color palette (neediest function ever!). Either ignore this or ask for a different palette by setting the box.palette argument equal to one of the predefined palettes (run ?rpart.plot for a list of available palettes). The type argument changes how the tree is displayed. I quite like the simplicity of option 5, but check ?rpart.plot to experiment with the other options.

The plot generated by listing 7.9 is shown in figure 7.8. Can you see how simple and interpretable the tree is? When predicting the classes of new cases, they start at the top (the root) and follow the branches based on the splitting criterion at each node.

The first node asks whether the animal produces milk or not. This split was chosen because it has the highest Gini gain of all candidate splits (it immediately discriminates the mammals, which make up 41% of the training set, from the other classes). The leaf nodes tell us which class cases are classified as by that node and the proportion of each class within that node. For example, the leaf node that classifies cases as mollusc.et.al contains 83% mollusc.et.al cases and 17% insect cases. The percentage at the bottom of each leaf indicates the percentage of cases in the training set in that leaf.

To inspect the cp values for each split, we can use the printcp() function. This function takes the model data as the first argument and an optional digits argument specifying how many decimal places to print in the output. There is some useful information in the output, such as the variables actually used for splitting the data and the root node error (the error before any splits). Finally, the output includes a table of the cp values for each split.

Figure 7.8. Graphical representation of our decision tree model. The splitting criterion is shown for each node. Each leaf node shows the predicted class, the proportion of each of the classes in that leaf, and the proportion of all cases in that leaf.
Listing 7.10. Exploring the model
printcp(treeModelData, digits = 3)

Classification tree:
rpart::rpart(formula = f, data = d, xval = 0, minsplit = 7, minbucket = 3,
    cp = 0.0248179216007702, maxdepth = 5)

Variables actually used in tree construction:
[1] airborne aquatic  backbone feathers fins     milk

Root node error: 60/101 = 0.594

n= 101

      CP nsplit rel error
1 0.3333      0     1.000
2 0.2167      1     0.667
3 0.1667      2     0.450
4 0.0917      3     0.283
5 0.0500      5     0.100
6 0.0248      6     0.050

Remember that in section 7.1.3, I showed you how the cp values were calculated:
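  • cp = (p(incorrect l+1) – p(incorrect l)) / (n(splits l) – n(splits l+1))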

So that you can get a better understanding of what the cp value means, let's work through how the cp values were calculated in the table in listing 7.10.

The cp value for the first split is
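  • cp = (1.000 – 0.667) / (1 – 0) ≈ 0.333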

The cp value for the second split is
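  • cp = (0.667 – 0.450) / (2 – 1) ≈ 0.217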

and so on. If any candidate split would yield a cp value lower than the threshold set by tuning, the node is not split further.

Tip

For a detailed summary of the model, run summary(treeModelData). The output is quite long (and gets longer the deeper your tree goes), so I won't print it here. It includes the cp table, orders the predictors by their importance, and displays the primary and surrogate splits for each node.

7.5. Cross-validating our decision tree model

In this section, we'll cross-validate our model-building process, including hyperparameter tuning. We've done this a few times already now, but it's so important that I'm going to reiterate: you must include data-dependent preprocessing in your cross-validation. This includes the hyperparameter tuning we performed in listing 7.7.

First, we define our outer cross-validation strategy. This time I'm using 5-fold cross-validation as my outer cross-validation loop. We'll use the cvForTuning resampling description we made in listing 7.6 for the inner loop.

Next, we create our wrapper by "wrapping together" our learner and hyperparameter tuning process. We supply our inner cross-validation strategy, hyperparameter space, and search method to the makeTuneWrapper() function.

Finally, we can start parallelization with the parallelStartSocket() function, and start the cross-validation process with the resample() function. The resample() function takes our wrapped learner, task, and outer cross-validation strategy as arguments.

Warning

This takes about 2 minutes on my four-core machine.

Listing 7.11. Cross-validating the model-building process
outer <- makeResampleDesc("CV", iters = 5)

treeWrapper <- makeTuneWrapper("classif.rpart", resampling = cvForTuning,
                              par.set = treeParamSpace,
                              control = randSearch)

parallelStartSocket(cpus = detectCores())

cvWithTuning <- resample(treeWrapper, zooTask, resampling = outer)

parallelStop()

Now let's look at the cross-validation result and see how our model-building process performed.

Listing 7.12. Extracting the cross-validation result
cvWithTuning

Resample Result
Task: zooTib
Learner: classif.rpart.tuned
Aggr perf: mmce.test.mean=0.1200
Runtime: 112.196

Hmm, that's a little disappointing, isn't it? During hyperparameter tuning, the best hyperparameter combination gave us a mean misclassification error (MMCE) of 0.0698 (you likely got a different value). But our cross-validated estimate of model performance gives us an MMCE of 0.12. Quite a large difference! What's going on? Well, this is an example of overfitting. Our model is performing better during hyperparameter tuning than during cross-validation. This is also a good example of why it's important to include hyperparameter tuning inside our cross-validation procedure.

We've just discovered the main problem with the rpart algorithm (and decision trees in general): they tend to produce models that are overfit. How do we overcome this problem? The answer is to use an ensemble method, an approach where we use multiple models to make predictions for a single task. In the next chapter, I'll show you how ensemble methods work, and we'll use them to vastly improve our decision tree model. I suggest that you save your .R file, as we're going to continue using the same data set and task in the next chapter. This is so I can highlight for you how much better these ensemble techniques are, compared to ordinary decision trees.


7.6. Strengths and weaknesses of tree-based algorithms

While it often isn't easy to tell which algorithms will perform well for a given task, here are some strengths and weaknesses that will help you decide whether decision trees will perform well for you.

The strengths of tree-based algorithms are as follows:

  • The intuition behind tree building is quite simple, and each individual tree is very interpretable.
  • It can handle categorical and continuous predictor variables.
  • It makes no assumptions about the distribution of the predictor variables.
  • It can handle missing values in sensible ways.
  • It can handle continuous variables on different scales.

The weakness of tree-based algorithms is this:

  • Individual trees are very susceptible to overfitting, so much so that they are rarely used on their own.

Summary

  • The rpart algorithm is a supervised learner for both classification and regression problems.
  • Tree-based learners start with all the cases in the root node and find sequential binary splits until cases find themselves in leaf nodes.
  • Tree construction is a greedy process and can be limited by setting stopping criteria (such as the minimum number of cases required in a node before it can be split).
  • The Gini gain is a criterion used to decide which predictor variable will result in the best split at a particular node.
  • Decision trees have a tendency to overfit the training set.