This chapter covers
- Creating self-organizing maps to reduce dimensionality
- Creating locally linear embeddings of high-dimensional data
In this chapter, we’re continuing with dimension reduction: the class of machine learning tasks focused on representing the information contained in a large number of variables, in a smaller number of variables. As you learned in chapters 13 and 14, there are multiple possible ways to reduce the dimensions of a dataset. Which dimension-reduction algorithm works best for you depends on the structure of your data and what you’re trying to achieve. Therefore, in this chapter, I’m going to add two more nonlinear dimension-reduction algorithms to your ever-growing machine learning toolbox: self-organizing maps (SOMs) and locally linear embedding (LLE).
Both the SOM and LLE algorithms reduce a large dataset into a smaller, more manageable number of variables, but they work in very different ways. The SOM algorithm creates a two-dimensional grid of nodes, like grid references on a map. Each case in the data is placed into a node and then shuffled around the nodes so that cases that are more similar to each other in the original data are put close together on the map.
This is probably difficult to picture in your head, so let's look at an analogy. Imagine that you have a big jar of beads with your sewing kit. There are beads of different sizes and weights, and some are more elongated than others. It's raining, and there's nothing better to do, so you decide that you will organize your beads into sets to make it easier to find the beads you need in the future. You arrange a grid of bowls on the table and consider each bead in turn. You then place beads that are most similar to each other in the same bowl. You put beads that are similar, but not the same, in adjacent bowls, while beads that are very different go into bowls that are far away from each other. An example of what this might look like is shown in figure 15.1.
Figure 15.1. Placing beads into bowls based on their characteristics. Similar beads are placed in the same or nearby bowls, while dissimilar beads are placed in bowls far away from each other. One bowl didn’t have any beads placed in it, but that’s okay.

Once you've placed all the beads into bowls, you look at your grid and notice that a pattern has emerged. All the large, spherical beads congregate around the top-right corner of the grid. As you move from right to left, the beads get smaller; and as you move from top to bottom, the beads become more elongated. Your process of placing beads into bowls, based on the dissimilarities between them, has revealed structure in the beads.
This is what self-organizing maps try to do. The "map" of a self-organizing map is equivalent to the grid of bowls, where each bowl is called a node.
The LLE algorithm, on the other hand, learns a manifold on which the data lies, similar to the UMAP algorithm you saw in chapter 14. Recall that a manifold is an n-dimensional smooth geometric shape that can be constructed from a series of linear "patches." Whereas UMAP tries to learn the manifold in one go, LLE looks for these local, linear patches of data around each case, and then combines these linear patches together to form the (potentially nonlinear) manifold.
If this is hard to picture, take a look at figure 15.2. A sphere is a smooth, three-dimensional manifold. We can approximate a sphere by breaking it up into a series of flat surfaces that combine together (the more of these surfaces we use, the more closely we can approximate the sphere). This is shown on the left side of figure 15.2. Imagine that someone gave you a flat sheet of paper and a pair of scissors, and asked you to create a sphere. You might cut the sheet into the kind of shape shown on the right side of figure 15.2. You could then fold this flat sheet of paper to approximate the sphere. Can you see that the flat, two-dimensional cutting is a lower-dimensional representation of the sphere? This is the general principle behind LLE, except that it tries to learn the manifold that represents the data, and represent it in fewer dimensions.
Figure 15.2. A sphere is a three-dimensional manifold. We can reconstruct a sphere as a series of linear patches that connect to one another. This three-dimensional manifold of a sphere can be represented in two dimensions by cutting a sheet of paper in a certain way.

In this chapter, I'll show you in more detail how the SOM and LLE algorithms work and how we can use them to reduce the dimensions of data collected on various flea beetles. I'll also show you a particularly fun example of how LLE can "unroll" some complex and unusually shaped data.
In this section, I'll explain what SOMs are, how they work, and why they're useful for dimension reduction. Consider the purpose of a map. Maps conveniently represent the layout of a part of the globe (which is not flat) in two dimensions, such that areas of the planet that are close to each other are drawn close to each other on the map. This is a convoluted way of saying that you'll find India drawn closer to Sri Lanka than to Madagascar, because they are closer to each other in space.
The goal of a SOM is very similar; but instead of countries, towns, and cities, the SOM tries to represent a data set in two dimensions, such that cases in the data that are more similar to each other are drawn close to each other. The first step of the algorithm is to create a grid of nodes in a two-dimensional lattice (like the grid of bowls in figure 15.1).
In this section, I'll fully explain what I mean when I say the SOM algorithm creates a grid of nodes. Much like the grid of bowls we sorted beads into in figure 15.1, the SOM algorithm starts by creating a grid of nodes. For now, you can just think of a node as a bowl into which we will eventually put cases from the data set. I've used the word grid to help you picture the lattice structure of the nodes, but the word map is more commonly used, so we'll use this from now on.
The map can be made up of square/rectangular nodes, much like square grid references on a map; or hexagonal nodes, which fit together snugly like a honeycomb. When the map is made of square nodes, each node is connected to four of its neighbors (you could say they're its north, south, east, and west neighbors). When the map is made of hexagonal nodes, each node is connected to six of its neighbors (northeast, east, southeast, southwest, west, and northwest). Figure 15.3 shows two different ways that square and hexagonal SOMs are commonly represented. The left-side representation shows each node as a circle, connected to its neighbors with lines or edges. The right-side representation shows each node as a square or hexagon, connected to its neighbors across its flat sides. The dimensions of the map (how many rows and columns there are) need to be decided upon by us; I'll show you how to choose an appropriate map size later in the chapter. Remember, we're still thinking of these nodes as bowls.
Figure 15.3. Common graphical representations of square and hexagonal self-organizing maps. The top two maps show a grid of rectangular nodes that are each connected to four neighbors. The bottom two maps show a grid of hexagonal nodes that are each connected to six neighbors.

Note
Once the map has been created, the next step is to randomly assign each node a set of weights.
In this section, I'll explain what I mean by the term weights, and what they relate to. I'll show you how these weights are randomly initialized for every node in the map.
Imagine that we have a data set with three variables, and we want to distribute the cases of this data set across the nodes of our map. Eventually, we hope the algorithm will place the cases in the nodes such that similar cases are in the same node or a nearby node, and dissimilar cases are placed in nodes far away from each other.
After the creation of the map, the next thing the algorithm does is randomly assign each node a set of weights: one weight for each variable in the data set. So for our example, each node has three weights, because we have three variables. These weights are just random numbers, and you can think of them as guesses for the value of each of the variables. If this is hard to visualize, take a look at figure 15.4. We have a data set containing three variables, and we are looking at three nodes from a map. Each node has three numbers written under it: one corresponding to each variable in the data set. For example, the weights for node 1 are 3 (for var 1), 9 (for var 2), and 1 (for var 3). Remember, at this point these are just random guesses for the value of each variable.
Next, the algorithm chooses a case at random from the data set and calculates which node's weights are the closest match to this case's values for each of the variables. For example, if there were a case in the data set whose values for var 1, var 2, and var 3 were 3, 9, and 1, respectively, this case would perfectly match the weights of node 1. To find which node's weights are most similar to the case in question, the distance is calculated between each case and the weights of each node in the map. This distance is usually the squared Euclidean distance. Remember that Euclidean distance is just the straight-line distance between two points, so the squared Euclidean distance just omits the square root step to make the computation faster.
In figure 15.4, you can see the distances calculated between the first case and each of the nodes' weights. This case is most similar to the weights of node 1, because it has the smallest squared Euclidean distance to them (93.09).
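If you'd like to see the arithmetic in code, here's a minimal R sketch of the same calculation. The weights of node 1 and the perfectly matching case come from the example above; the other two nodes are made-up numbers purely for illustration, so don't expect this to reproduce the exact distances in figure 15.4.

# Illustrative node weights: one row per node, one column per variable
nodeWeights <- matrix(c(3, 9, 1,
                        6, 2, 8,
                        5, 5, 5),
                      nrow = 3, byrow = TRUE)

# A single case whose values happen to match node 1 exactly
case <- c(3, 9, 1)

# Squared Euclidean distance between the case and each node's weights
sqDist <- rowSums(sweep(nodeWeights, 2, case)^2)

sqDist            # node 1 has distance 0: a perfect match
which.min(sqDist) # index of the best matching unit (BMU)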
Note
The illustration in figure 15.4 shows only three nodes, for brevity, but the distance is calculated for every single node on the map.
Once the distances between a particular case and all of the nodes have been calculated, the node with the smallest distance (most similar to the case) is selected as that case's best matching unit (BMU). This is illustrated in figure 15.5. Just like when we put beads into bowls, the algorithm takes that case and places it inside its BMU.
Figure 15.4. How distances between each case to each node are calculated. The arrows pointing from each variable to each node represent the weight for that variable on that particular node (for example, the weights of node 1 are 3, 9, and 1). Distance is calculated by finding the difference between a node’s weights and a case’s value for each variable, squaring these differences, and summing them.

Figure 15.5. At each stage of the algorithm, the node whose weights have the smallest distance to a particular case is selected as the best matching unit (BMU) for that case.

In this section, I'll show you how the weights of a case's BMU and the weights of the surrounding nodes are updated to more closely match the data. First, though, let's summarize our knowledge of the SOM algorithm so far:
- Create the map of nodes.
- Randomly assign weights to each node (one for each variable in the data set).
- Select a case at random, and calculate its distance to the weights of every node in the map.
- Put the case into the node whose weights have the smallest distance to the case (the case's BMU).
Now that the BMU has been selected, its weights are updated to be more similar to the case we placed inside it. However, it's not only the BMU's weights that are updated. Nodes in the neighborhood of the BMU also have their weights updated (nodes that are near to the BMU). We can define the neighborhood in a few different ways: a common way is to use the bubble function. With the bubble function, we simply define a radius (or bubble) around the BMU, and all nodes inside that radius have their weights updated to the same degree. Any nodes outside the radius are not updated at all. For the bubble function, a radius of 3 would include any node within three direct connections of the BMU.
Another popular choice is to update the node weights of the map based on how far they are from the BMU (the farther from the BMU, the less the node's weights are updated). This is most commonly done using the Gaussian function. You can picture this as though we fit a Gaussian distribution centered over the BMU, and the node weights around the BMU are updated proportionally to the density of the Gaussian over them. We still define a radius around the BMU that defines how broad or skinny the Gaussian is, but this time it's a soft radius that has no hard cutoff. The Gaussian function is popular, but it's a little more computationally expensive than the simple bubble function.
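If it helps to see the two neighborhood functions side by side, here's a small illustrative sketch (my own, not code from the kohonen package we'll use later) that computes how strongly nodes at increasing grid distances from the BMU would be updated under each function.

# Distance (in grid units) of some nodes from the BMU
nodeDist <- 0:5

radius <- 3

# Bubble: full-strength update inside the radius, none outside
bubble <- as.numeric(nodeDist <= radius)

# Gaussian: update strength decays smoothly with distance; the radius
# controls the spread, with no hard cutoff
gaussian <- exp(-nodeDist^2 / (2 * radius^2))

rbind(bubble, gaussian)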
Note
The bubble and Gaussian functions used to update the weights of the nodes in the neighborhood around the BMU are called neighborhood functions.
Our choice of neighborhood function is a hyperparameter, as it will affect the way our map updates its nodes but cannot be estimated from the data itself.
Note
You will sometimes see the set of weights for a node referred to as its codebook vector.
Whichever neighborhood function we use, the benefit of updating node weights in a neighborhood around the BMU is that, over time, doing so creates neighborhoods of nodes that are similar to each other but still capture some variation in the data. Another trick the algorithm uses is that, as time goes on, both the radius of this neighborhood and the amount by which the weights are updated get smaller. This means the map is updated very rapidly initially and then makes smaller and smaller updates as the learning process continues. This helps the map converge to a solution that, hopefully, places similar cases in the same or nearby nodes. This process of updating node weights in the neighborhood of the BMU is illustrated in figure 15.6.
Figure 15.6. Between the first and last iteration of the algorithm, both the radius of the neighborhood around a BMU (the darkest node) and the amount by which neighboring node weights are updated get smaller. The radius of a Gaussian neighborhood function is shown as a translucent circle centered over the BMU, and the amount each neighboring node is updated is represented by how dark its shading is. If the bubble neighborhood function was shown, all nodes would be shaded the same (as they’re updated by the same amount).

Now that we've determined the BMU for a particular case and updated its weights and the weights of its neighbors, we simply repeat the procedure for the next iteration, selecting another random case from the data. As this process continues, cases will likely be selected more than once and will move around the map as their BMU changes over time. To put it another way, cases will change nodes if the one they are currently in is no longer their BMU. Eventually, similar cases will converge to a particular region of the map.
The result is that over time, the nodes on the map start to fit the data set better. And eventually, cases that are similar to each other in the original feature space will be placed either in the same node or in nearby nodes on the map.
Note
Remember that the feature space refers to all possible combinations of predictor variable values.
Before we get our hands dirty by building our own SOM, let's recap the whole algorithm to make sure it sticks in your mind (a bare-bones code sketch of this loop follows the list):
- Create the map of nodes.
- Randomly assign weights to each node (one for each variable in the data set).
- Select a case at random, and calculate its distance to the weights of every node in the map.
- Put the case into the node whose weights have the smallest distance to the case (the case's BMU).
- Update the weights of the BMU and the nodes in its neighborhood (depending on the neighborhood function) to more closely match the cases inside it.
- Repeat steps 3-5 for the specified number of iterations.
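To tie the steps together, here is an educational, bare-bones implementation of that loop in base R. It's only a sketch: the data is assumed to be a scaled numeric matrix, the neighborhood is a simple bubble, and the learning rate and radius shrink linearly, which are my simplifications. The kohonen package we use below is far more efficient and complete.

simpleSOM <- function(data, gridX = 5, gridY = 5, iter = 2000,
                      alpha = c(0.05, 0.01)) {
  # Step 1: create the map as a set of grid coordinates
  grid <- expand.grid(x = 1:gridX, y = 1:gridY)
  # Step 2: randomly initialize one weight per variable for each node
  weights <- matrix(runif(nrow(grid) * ncol(data)),
                    nrow = nrow(grid), ncol = ncol(data))
  for (i in 1:iter) {
    # Step 3: select a case at random and compute its distance to every node
    case <- data[sample(nrow(data), 1), ]
    dists <- rowSums(sweep(weights, 2, case)^2)
    # Step 4: the node with the smallest distance is the BMU
    bmu <- which.min(dists)
    # Step 5: update the BMU and its neighborhood; learning rate and
    # radius both shrink as the iterations progress
    lr <- alpha[1] + (alpha[2] - alpha[1]) * (i - 1) / (iter - 1)
    radius <- max(gridX, gridY) / 2 * (1 - (i - 1) / iter)
    gridDist <- sqrt((grid$x - grid$x[bmu])^2 + (grid$y - grid$y[bmu])^2)
    inHood <- gridDist <= radius
    weights[inHood, ] <- weights[inHood, , drop = FALSE] +
      lr * (matrix(case, sum(inHood), length(case), byrow = TRUE) -
            weights[inHood, , drop = FALSE])
  }
  list(grid = grid, weights = weights)
}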
Jn qajr cstieno, J’ff wbxz gvb dwk re cdv gvr SQW algorithm re ucdeer pkr dimensions of z data vra rnjv z vwr-daenmsoiiln usm. Cb ngido zk, wk kxdu vr varlee comx ucsttreur nj ogr data uq icangpl aslrmii cases jn rdk ocam tv ynbrae nodes. Vxt xepemla, jl c poruigng cetsrurut jz dneidh nj ryv data, vw ybxx rrsu eeriffndt orugsp fwjf esaaretp rv reetfdnif orsgein vl rob mcg. J’ff zcfk vdcw vyh vrq algorithm ’a hyperparameters ncq rgsw qrpx xp.
Note
Remember that a hyperparameter is a variable that controls the function/performance of an algorithm but cannot be directly estimated from the data itself.
Imagine that you're the ringleader of a flea circus. You decide to take measurements for all of your fleas to see if different groups of fleas perform better at certain circus tasks. Let's start by loading the tidyverse and GGally packages:
library(tidyverse)

library(GGally)
Now let's load the data, which is built into the GGally package; convert it into a tibble (with as_tibble()); and plot it using the ggpairs() function we discovered in chapter 14.
Listing 15.1. Loading and exploring the flea dataset
data(flea)

fleaTib <- as_tibble(flea)

fleaTib

# A tibble: 74 x 7
   species  tars1 tars2  head aede1 aede2 aede3
   <fct>    <int> <int> <int> <int> <int> <int>
 1 Concinna   191   131    53   150    15   104
 2 Concinna   185   134    50   147    13   105
 3 Concinna   200   137    52   144    14   102
 4 Concinna   173   127    50   144    16    97
 5 Concinna   171   118    49   153    13   106
 6 Concinna   160   118    47   140    15    99
 7 Concinna   188   134    54   151    14    98
 8 Concinna   186   129    51   143    14   110
 9 Concinna   174   131    52   144    14   116
10 Concinna   163   115    47   142    15    95
# ... with 64 more rows

ggpairs(flea, mapping = aes(col = species)) +
  theme_bw()
We have a tibble containing 7 variables, measured on 74 different fleas. The species variable is a factor telling us the species each flea belongs to, while the others are continuous measurements made on various parts of the fleas' bodies. We're going to omit the species variable from our dimension reduction, but we'll use it later to see whether our SOM clusters together fleas from the same species.
The resulting plot is shown in figure 15.7. We can see that the three species of fleas can be discriminated between using different combinations of the continuous variables. Let's train a SOM to reduce these six continuous variables into a representation with only two dimensions, and see how well it separates the three species of fleas.
Figure 15.7. A matrix of plots created using the ggpairs() function, plotting all variables against each other from the flea dataset. Because the individual plots are quite small, I’ve manually zoomed in on one plot with a virtual magnifying glass (much like one you might need to use to see the fleas).

Let's train our SOM to place fleas in nodes such that (hopefully) fleas of the same species are placed near each other and fleas of different species are separated. We start by installing and loading the kohonen package (named after Teuvo Kohonen, of course). The next thing we need to do is create a grid of nodes that will become our map. We do this using the somgrid() function (as shown in listing 15.2), and we have a few choices to make:
- The dimensions of the map
- Whether our map will be made of rectangular or hexagonal nodes
- Which neighborhood function to use
- How the edges of the map will behave
I've used the arguments of the somgrid() function to make these choices, but let's explore what they each mean and how they each affect the resulting map.
Listing 15.2. Loading the kohonen package and creating a SOM grid
install.packages("kohonen") library(kohonen) somGrid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal", neighbourhood.fct = "bubble", toroidal = FALSE)
First, we need to choose the number of nodes in the x and y dimensions, using the xdim and ydim arguments, respectively. This is very important because it determines the size of the map and the granularity with which it will partition our cases. How do we choose the dimensions of our map? This, as it turns out, isn't an easy question to answer. Too few nodes, and all of our data will be piled up so that clusters of cases merge with each other. Too many nodes, and we could end up with nodes containing a single case, or even no cases at all, diluting any clusters and preventing interpretation.
The optimal dimensions of a SOM depend largely on the number of cases in the data. We want to aim to have cases in most of the nodes for a start, but really the optimal number of nodes in the SOM is whichever best reveals patterns in the data. We can also plot the quality of each node, which is a measure of the average difference between each case in a particular node and that node's final weights. We can then consider choosing a map size that gives us the best-quality nodes. In this example, we'll start by creating a 5 × 5 grid, but this subjectivity in selecting the dimensions of the map is arguably a weakness of SOMs.
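If you want numbers to compare candidate grid sizes with (rather than only eyeballing the diagnostic plots we draw later), a quick sketch like the following can help once the SOM has been trained in listing 15.3. It assumes, to the best of my knowledge of the kohonen package, that the trained object exposes unit.classif (each case's winning node) and distances (each case's distance to its BMU's codebook vector).

# Cases per node: empty nodes may suggest the map is too big
table(fleaSom$unit.classif)

# Overall map quality (lower is better)
mean(fleaSom$distances)

# Mean quality per occupied node
tapply(fleaSom$distances, fleaSom$unit.classif, mean)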
Tip
The x and y dimensions of the grid don't need to be of equal length. If I find a grid dimensionality that reveals patterns reasonably well in a data set, I may extend the map in one dimension to see if this further helps to separate clusters of cases. There is an implementation of the SOM algorithm called growing SOM, where the algorithm grows the size of the grid based on the data. After you finish this chapter, I suggest you take a look at the GrowingSOM package in R: https://github.com/alexhunziker/GrowingSOM.
The next choice is to decide whether our grid is formed of rectangular or hexagonal nodes. Rectangular nodes are connected to four adjacent nodes, whereas hexagonal nodes are connected to six adjacent nodes. Thus when a node's weights are updated, a hexagonal node will update its six immediate neighbors the most, whereas a rectangular node will update its four immediate neighbors the most. While hexagonal nodes can potentially result in "smoother" maps in which clusters of data appear more rounded (whereas clusters of data in a grid of rectangular nodes may appear "blocky"), it depends on your data. In this example, we'll specify that we want a hexagonal topology by setting the topo = "hexagonal" argument.
Tip
I usually prefer the results I get from hexagonal nodes, both in terms of the patterns they reveal in my data and aesthetically.
Next, we need to choose which neighborhood function we're going to use, supplying our choice to the neighbourhood.fct argument (note the British spelling). The two options are "bubble" and "gaussian", corresponding to the two neighborhood functions we discussed earlier. Our choice of neighborhood function is a hyperparameter, and we could tune it; but for this example we're just going to use the bubble neighborhood function, which is the default.
The final choice we need to make is whether we want our grid to be toroidal (another word to impress your friends with). If the grid is toroidal, nodes on the left edge of the map are connected to the nodes on the right edge (and the equivalent for nodes on the top and bottom edges). If you were to walk off the left edge of a toroidal map, you would reappear on the right! Because nodes on the edges have fewer connections to other nodes, their weights tend to be updated less than those of nodes in the middle of the map. Therefore, it may be beneficial to use a toroidal map to help prevent cases from "piling up" on the map edges, though toroidal maps tend to be harder to interpret. In this example, we will set the toroidal argument to FALSE to make the final map more interpretable.
Now that we've initialized our grid, we can pass our tibble into the som() function to train our map.
Listing 15.3. Training the SOM
fleaScaled <- fleaTib %>%
  select(-species) %>%
  scale()

fleaSom <- som(fleaScaled, grid = somGrid,
               rlen = 5000, alpha = c(0.05, 0.01))
We start by piping the tibble into the select() function to remove the species factor. Cases are assigned to the node with the most similar weights, so it's important to scale our variables so that variables on large scales aren't given more importance. To this end, we pipe the output of the select() function call into the scale() function to center and scale each variable. We then train the SOM with the som() function, supplying:
- The data as the first argument
- The grid object created in listing 15.2 as the second argument
- The two hyperparameter arguments rlen and alpha
The rlen hyperparameter is simply the number of times the data set is presented to the algorithm for sampling (the number of iterations); the default is 100. Just like in other algorithms we've seen, more iterations are usually better until we get diminishing returns. I'll show you soon how to assess whether you've included enough iterations.
The alpha hyperparameter is the learning rate and is a vector of two values. Remember that as the number of iterations increases, the amount by which the weights of each node are updated decreases. This is controlled by the two values of alpha. Iteration 1 uses the first value of alpha, which linearly declines to the second value of alpha at the last iteration.
The vector c(0.05, 0.01) is the default; but for larger SOMs, if you're concerned the SOM is doing a poor job of separating classes with subtle differences between them, you can experiment with reducing these values to make the learning rate even slower.
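As a quick illustration of that schedule, you can compute the per-iteration learning rate yourself. This is only a sketch of the linear decline described above, not code used internally by the kohonen package.

rlen <- 5000
alpha <- c(0.05, 0.01)

# Learning rate at each iteration: starts at 0.05 and declines
# linearly to 0.01 by the final iteration
alphaSchedule <- seq(from = alpha[1], to = alpha[2], length.out = rlen)

head(alphaSchedule)
tail(alphaSchedule)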
Note
If you make the learning rate of an algorithm slower, you typically need to increase the number of iterations to help it converge to a stable result.
Now that we've trained our SOM, let's plot some diagnostic information about it. The kohonen package comes with plotting functions to draw SOMs, but it uses base R graphics rather than ggplot2. The syntax to plot a SOM object is plot(x, type, shape), where x is our SOM object, type is the type of plot we want to draw, and shape lets us specify whether we want the nodes to be drawn as circles or with straight edges (squares if the grid is rectangular, hexagons if the grid is hexagonal).
Listing 15.4. Plotting SOM diagnostics
par(mfrow = c(2, 3))

plotTypes <- c("codes", "changes", "counts", "quality",
               "dist.neighbours", "mapping")

walk(plotTypes, ~plot(fleaSom, type = ., shape = "straight"))
Note
I prefer to draw straight-edged plots, but the choice is aesthetic only. Experiment with setting the shape argument to "round" and "straight".
There are six different diagnostic plots we can draw for our SOM, but rather than writing out the plot() function six times, we define a vector with the names of the plot types and use walk() to plot them all at once. We first split the plotting device into six regions by running par(mfrow = c(2, 3)).
We could achieve the same thing with purrr::map(), but purrr::walk() calls a function for its side effects (such as drawing a plot) and silently returns its input (which is useful if you want to plot an intermediate data set in a series of operations that pipe into each other). The convenience here is that purrr::walk() doesn't print any output to the console.
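For example, here's a trivial sketch (unrelated to our SOM) of walk()'s invisible return: the plots are drawn as side effects, and the original values carry on down the pipeline.

# walk() calls the function for its side effect (drawing a plot) and
# invisibly returns its input, so we can keep piping the data along
result <- c(1, 5, 3) %>%
  walk(~plot(1:.)) %>%
  sum()

result  # 9: the original values were passed straight through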
Warning
The kohonen package also contains a function called map(). If you have the kohonen package and the purrr package loaded, it's a good idea to include the package prefix in the function call (kohonen::map() and purrr::map()).
The resulting plots are shown in figure 15.8. The Codes plot is a fan plot representation of the weights for each node. Each segment of the fan represents the weight for a particular variable (as designated in the legend), and the distance the fan extends from the center represents the magnitude of its weight. For example, nodes in the top-left corner of my plot have the highest weights for the tars2 variable. This plot can help us to identify regions of the map that are associated with higher or lower values of particular variables.
Figure 15.8. Diagnostic plots for our SOM. The Codes fan plot for each node indicates the weight for each variable. The Training Progress plot shows the mean distance between each case and its BMU for each iteration. The Counts plot shows the number of cases per node. The Quality plot shows the mean distance between each case and the weights of its BMU. The Neighbor Distance plot shows the sum of differences between cases in one node and cases in neighboring nodes. The Mapping plot draws the cases inside their assigned nodes.

Note
Do your plots look a little different than mine? That's because the node weights are randomly initialized each time we run the algorithm. Arguably, this is a disadvantage of the SOM algorithm, as it may produce different results on the same data when run repeatedly. This disadvantage is mitigated by the fact that, unlike t-SNE, for example, we can map new data onto an existing SOM.
The Training Progress plot helps us to assess if we have included enough iterations while training the SOM. The x-axis shows the number of iterations (specified by the rlen argument), and the y-axis shows the mean distance between each case and its BMU at each iteration. We hope to see the profile of this plot flatten out before we reach our maximum number of iterations, which it seems to in this case. If we felt that the plot hadn't leveled out yet, we would increase the number of iterations.
The Counts plot is a heatmap showing the number of cases assigned to each node. In this plot, we're looking to be sure we don't have lots of empty nodes (suggesting the map is too big) and that we have a reasonably even distribution of cases across the map. If we had lots of cases piled up at the edges, we might consider increasing the map dimensions or training a toroidal map instead.
The Quality plot shows the mean distance between each case and the weights of its BMU. The lower this value is, the better.
The Neighbour Distance plot shows the sum of distances between cases in one node and cases in the neighboring nodes. You'll sometimes see this referred to as a U matrix plot, and it can be useful in identifying clusters of cases on the map. Because cases on the edge of a cluster of nodes have a greater distance to cases in an adjacent cluster of nodes, high-distance nodes tend to separate clusters. This often looks like dark regions of the map (potential clusters) separated by light regions. It's difficult to interpret a map as small as this, but it appears as though we may have clusters on the left and right edges, and possibly a cluster at the top center.
Finally, the Mapping plot shows the distribution of cases among the nodes. Note that the position of a case within a node doesn't mean anything; the cases are just dodged (moved a small, random distance) so that they don't all sit on top of each other.
The Codes plot is a useful way to visualize the weights of each node, but it becomes difficult to read when you have many variables, and it doesn't give an interpretable indication of magnitude. Instead, I prefer to create heatmaps: one for each variable. We use the getCodes() function to extract the weights for each node, where each row is a node and each column is a variable, and convert this into a tibble. The following listing shows how to then create a separate heatmap for each variable, this time using iwalk() to iterate over each of the columns.
Listing 15.5. Plotting heatmaps for each variable
getCodes(fleaSom) %>%
  as_tibble() %>%
  iwalk(~plot(fleaSom, type = "property", property = .,
              main = .y, shape = "straight"))
Note
Recall from chapter 2 that each of the map() functions has an i equivalent (imap(), imap_dbl(), iwalk(), and so on) that allows us to pass the position/name of each element to the function. The iwalk() function is shorthand for walk2(.x, .y = names(.x), .f), allowing us to access the name of each element by using .y inside our function.
We set the type argument equal to "property", which allows us to color each node by some numerical property. We then use the property argument to tell the function exactly what property we want to plot. To set the title of each plot equal to the name of the variable it displays, we set the main argument equal to .y (this is why I chose to use iwalk() instead of walk()).
The resulting plot is shown in figure 15.9. The heatmaps show very different patterns of weights for each of the variables. Nodes on the right side of the map have higher weights for the tars1 and aede2 variables and lower weights for the aede3 variable (which is lowest in the bottom-right corner of the map). Nodes in the upper-left corner of the map have higher weights for the tars2, head, and aede1 variables. Because the variables were scaled before training the SOM, the heatmap scales are in standard deviation units for each variable.
Because we have some class information about our fleas, let's plot our SOM, coloring each case by its species.
Listing 15.6. Plotting flea species onto the SOM
par(mfrow = c(1, 2))

nodeCols <- c("cyan3", "yellow", "purple")

plot(fleaSom, type = "mapping", pch = 21,
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight", bgcol = "lightgrey")
Figure 15.9. Separate heatmaps showing node weights for each original variable. The scales are in standard deviation units.

First, we define a vector of colors to use to distinguish the classes from each other. Then we create a mapping plot using the plot() function with the type = "mapping" argument. We set the pch = 21 argument to use a filled circle to indicate each case (so we can set a background color for each species). The bg argument sets the background color of the points. By converting the species variable into a numeric vector and using it to subset the color vector, each point gets a background color corresponding to its species. Finally, we use the shape argument to draw hexagons instead of circles, and set the background color of the nodes (bgcol) equal to "lightgrey".
The resulting plot is shown in figure 15.10. Can you see that the SOM has arranged itself such that fleas from the same species (which are more similar to each other than to fleas from other species) are assigned to nodes near cases of the same species? I've created a plot on the right side of figure 15.10 that uses a clustering algorithm to find clusters of nodes. I've colored the nodes by the cluster each node was assigned to, and added thick borders that separate the clusters. Because we haven't covered clustering yet, I don't want to explain how I did this (the code is available at www.manning.com/books/machine-learning-with-r-the-tidyverse-and-mlr), but I wanted to show you that the SOM managed to separate the different classes and that clustering can be performed on a SOM! We'll start covering clustering in the next chapter.
Figure 15.10. Showing class membership on the SOM. The left-side mapping plot shows cases drawn inside their assigned nodes, shaded by which flea species they belong to. The right-side plot shows the same information, but nodes are shaded by cluster membership after applying a clustering algorithm to the nodes. The solid black lines separate nodes assigned to different clusters.

Note
SOMs are a little different than other dimension-reduction techniques, in that they don't really create new variables for which each case is given a value (for example, the principal components in PCA). SOMs reduce dimensionality by placing cases into nodes on a two-dimensional map, rather than creating new variables. So if we want to perform cluster analysis on the result of a SOM, we can use the weights to cluster the nodes. This essentially treats each node as a case in a new data set. If our cluster analysis returns clusters of nodes, we can assign cases from the original data set to the cluster that their node belongs to.
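For the curious, here is a minimal sketch of that general idea: treat each node's codebook vector as a case, cluster the nodes hierarchically, and overlay the result on the map. This is not the exact code behind the right panel of figure 15.10 (that's on the book's website), and it assumes the add.cluster.boundaries() function that, as far as I know, the kohonen package provides for this purpose. The choice of three clusters is simply because we have three species.

# Cluster the nodes using their codebook vectors (one row per node)
nodeClusters <- getCodes(fleaSom) %>%
  dist() %>%
  hclust() %>%
  cutree(k = 3)

# Shade each node by its cluster and draw boundaries between clusters
plot(fleaSom, type = "mapping", pch = 21,
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight", bgcol = rainbow(3)[nodeClusters])

kohonen::add.cluster.boundaries(fleaSom, nodeClusters)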
Exercise 1
Create another map using the somgrid() function, but this time set the arguments as follows:
- topo = "rectangular"
- toroidal = TRUE
Train a SOM using this map, and create its mapping plot, as in figure 15.10. Notice how each node is now connected with four of its neighbors. Can you see how the toroidal argument affects the final map? If not, set this argument to FALSE, but keep everything else the same, and see the difference.
In this section, I'll show you how we can take new data and map it onto our trained SOM. Let's create two new cases with all of the continuous variables in the data we used to train the SOM.
Listing 15.7. Mapping new data onto the SOM
newData <- tibble(tars1 = c(120, 200),
                  tars2 = c(125, 120),
                  head = c(52, 48),
                  aede1 = c(140, 128),
                  aede2 = c(12, 14),
                  aede3 = c(100, 85)) %>%
  scale(center = attr(fleaScaled, "scaled:center"),
        scale = attr(fleaScaled, "scaled:scale"))

predicted <- predict(fleaSom, newData)

par(mfrow = c(1, 1))

plot(fleaSom, type = "mapping", classif = predicted, shape = "round")
Once we define the tibble, we pipe it into the scale() function, because we trained the SOM on scaled data. But here's the really important part: a common mistake is to scale the new data by subtracting its own mean and dividing by its own standard deviation. This will likely lead to an incorrect mapping, because we need to subtract the mean and divide by the standard deviation of the training set. Fortunately, these values are stored as attributes of the scaled data set, and we can access them using the attr() function.
Tip
If you're not quite sure what the attr() function is retrieving, run attributes(fleaScaled) to see the full list of attributes of the fleaScaled object.
We use the predict() function with the SOM object as the first argument and the new, scaled data as the second argument, to map the new data onto our SOM. We can then plot the position of the new data on the map using the plot() function, supplying the type = "mapping" argument. The classif argument allows us to specify an object returned by the predict() function, to draw only the new data. This time, we use the argument shape = "round" to show what the circular nodes look like.
The resulting plot is shown in figure 15.11. Each case is placed in a separate node whose weights best represent the case's variable values. Look back at figures 15.9 and 15.10 and see what you can infer about these two cases based on their position on the map.
Figure 15.11. New data can be mapped onto an existing SOM. This mapping plot shows a graphical representation of the nodes to which the two new cases are assigned.

Using SOMs for supervised learning
We're concentrating on SOMs for their use as unsupervised learners for dimension reduction. This is probably the most common use for SOMs, but they can also be used for both regression and classification, making SOMs very unusual among machine learning algorithms.
In a supervised setting, SOMs actually create two maps: let's call them the x and y maps. The x map is the same as what you've learned so far; the weights of its nodes are iteratively updated such that similar cases are placed in nearby nodes and dissimilar cases are placed in distant nodes, using only the predictor variables in the data set. Once the cases are placed into their respective nodes on the x map, they don't move. The weights of the y map's nodes represent values of the outcome variable. The algorithm now randomly selects cases again and iteratively updates the weights of each y map node to better match the values of the outcome variable of the cases in that node. The weights could represent a continuous outcome variable (in the case of regression) or a set of class probabilities (in the case of classification).
We can train a supervised SOM using the xyf() function from the kohonen package. Use ?xyf() to learn more.
In this section, I'll explain what LLE is, how it works, why it's useful, and how it differs from SOMs. Just like UMAP, the LLE algorithm tries to identify an underlying manifold that the data lies on. But LLE does this in a slightly different way: instead of trying to learn the manifold all at once, it learns local, linear patches of data around each case and then combines these linear patches to form the (potentially nonlinear) manifold.
Note
An oft-quoted mantra of the LLE algorithm is to "think globally, fit locally": the algorithm looks at small, local patches around each case and uses these patches to construct the wider manifold.
The LLE algorithm is particularly good at "unrolling" or "unfurling" data that is rolled or twisted into unusual shapes. For example, imagine a three-dimensional data set where the cases are rolled up into a Swiss roll. The LLE algorithm is capable of unrolling the data and representing it as a two-dimensional rectangle of data points.
Figure 15.12. The distance between each case and every other case is calculated, and their k-nearest neighbors are assigned (distance along the z-axis in the top-left plot is indicated by the size of the circle). For each case, the algorithm learns a set of weights, one for each nearest neighbor, that sum to 1. Each neighbor’s variable values are multiplied by its weight (so row 1 becomes x = 3.1 × 0.1, y = 2.0 × 0.1, z = 0.1 × 0.1). The weighted values of each neighbor are summed (the columns are summed) to approximate the original values of the selected case.

So how does the LLE algorithm work? Take a look at figure 15.12. It starts by selecting a case from the data set and calculating its k-nearest neighbors (this is just like in the kNN algorithm from chapter 3, so k is a hyperparameter of the LLE algorithm). LLE then represents this case as a linear, weighted sum of its k neighbors. I can already hear you asking: what does that mean? Well, each of the k neighbors is assigned a weight: a value between 0 and 1, such that the weights for all the k-nearest neighbors sum to 1. The variable values of a particular neighbor are multiplied by its weight (so the weighted values are a fraction of the original values).
Note
Because the LLE algorithm relies on measuring the distance between cases to calculate the nearest neighbors, it is sensitive to differences between the scales of the variables. It's often a good idea to scale the data before embedding it.
When the weighted values for each variable are added up across the k-nearest neighbors, this new weighted sum should approximate the variable values of the case for which we calculated the k-nearest neighbors in the first place. Therefore, the LLE algorithm learns a weight for each nearest neighbor such that, when we multiply each neighbor by its weight and add these values together, we get the original case (or an approximation of it). This is what I mean when I say LLE represents each case as a linear, weighted sum of its neighbors.
This process is repeated for each case in the data set: its k-nearest neighbors are calculated, and then weights are learned that can be used to reconstruct it. Because the weights are combined linearly (summed), the algorithm is essentially learning a linear "patch" around each case. But how does it combine these patches to learn the manifold? Well, the data is placed into a low-dimensional space, typically two or three dimensions, such that the coordinates in this new space preserve the weights learned in the previous step. Put another way, the data is placed in this new feature space such that each case can still be calculated from the weighted sum of its neighbors.
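To make the "weighted sum of neighbors" idea concrete, here's a small sketch. The first neighbor and its weight of 0.1 come from the example in figure 15.12; the other two neighbors and their weights are made-up values chosen only so that the weights sum to 1.

# Variable values (x, y, z) of a case's three nearest neighbors
neighbors <- matrix(c(3.1, 2.0, 0.1,
                      2.6, 1.8, 0.4,
                      2.9, 2.3, 0.2),
                    nrow = 3, byrow = TRUE)

# One weight per neighbor, constrained to sum to 1
w <- c(0.1, 0.55, 0.35)

# The weighted sum of the neighbors approximates the selected case
reconstruction <- colSums(neighbors * w)

reconstruction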
In this section, I'll show you how to use the LLE algorithm to reduce the dimensions of a data set into a two-dimensional representation. We'll start with an unusual example that really shows off the power of LLE as a nonlinear dimension-reduction algorithm. This example is unusual because it represents data shaped in a three-dimensional S that is unlike anything we're likely to encounter in the real world. Then we'll use LLE to create a two-dimensional embedding of our flea circus data to see how it compares to the SOM we created earlier.
Let’s start by installing and loading the lle package:
install.packages("lle") library(lle)
Next, let's load the lle_scurve_data data set from the lle package, give names to its variables, and convert it into a tibble. We have a tibble containing 800 cases and 3 variables.
Listing 15.8. Loading the S-curve dataset
data(lle_scurve_data)

colnames(lle_scurve_data) <- c("x", "y", "z")

sTib <- as_tibble(lle_scurve_data)

sTib

# A tibble: 800 x 3
        x     y      z
    <dbl> <dbl>  <dbl>
 1  0.955 4.95  -0.174
 2 -0.660 3.27  -0.773
 3 -0.983 1.26  -0.296
 4  0.954 1.68  -0.180
 5  0.958 0.186 -0.161
 6  0.852 0.558 -0.471
 7  0.168 1.62  -0.978
 8  0.948 2.32   0.215
 9 -0.931 1.51  -0.430
10  0.355 4.06   0.926
# ... with 790 more rows
This data set consists of cases that are folded into the shape of the letter S in three dimensions. Let's create a three-dimensional plot to visualize this, using the plot3D and plot3Drgl packages (starting with their installation).
Listing 15.9. Plotting the S-curve dataset in three dimensions
install.packages(c("plot3D", "plot3Drgl")) library(plot3D) scatter3D(x = sTib$x, y = sTib$y, z = sTib$z, pch = 19, bty = "b2", colkey = FALSE, theta = 35, phi = 10, col = ramp.col(c("darkred", "lightblue"))) plot3Drgl::plotrgl()
The scatter3D() function allows us to create a three-dimensional plot, and the plotrgl() function lets us rotate it interactively. Here is a summary of the arguments to scatter3D():
- x, y, and z—Which variables to plot on their respective axes.
- pch—The shape of the points we wish to draw (19 draws filled circles).
- bty—The box type that's drawn around the data ("b2" draws a white box with gridlines; use ?scatter3D to see the alternatives).
- colkey—Whether we want a legend for the coloring of each point.
- theta and phi—The viewing angle of the plot.
- col—The color palette we want to use to indicate the value of the z variable. Here, we use the ramp.col() function to specify the start and end colors of a color gradient.
Once we've created our static plot, we can turn it into an interactive plot that we can rotate by clicking and dragging it with our mouse, by simply calling the plotrgl() function with no arguments.
Tip
You can use your mouse scroll wheel to zoom in and out of this interactive plot.
The resulting plot is shown in figure 15.13. Can you see that the data forms a three-dimensional S? This is an unusual data set for sure, but one which I hope demonstrates the power of LLE for learning the manifold that underlies a data set.
Figure 15.13. The S-curve dataset plotted in three dimensions using the scatter3D() function. The shading of the points is mapped to the z variable.

Aside from the number of dimensions to which we want to reduce our data set (usually two or three), k is the only hyperparameter we need to select. We can choose the best-performing value of k by using the calc_k() function. This function applies the LLE algorithm to our data, using different values of k in a range we specify. For each embedding that uses a different k, calc_k() calculates the distances between cases in the original data and in the low-dimensional representation. The correlation coefficient between these distances is calculated (ρ, or "rho") and used to calculate a metric (1 − ρ²) that can be used to select k. The value of k with the smallest value for this metric is the one that best preserves the distances between cases in the high- and low-dimensional representations.
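If you want to see the flavor of what calc_k() is measuring, you can compute the metric by hand for a single embedding. This is only an illustrative sketch of the 1 − ρ² idea, not the package's internal code, and the value k = 10 here is arbitrary.

# Embed with an arbitrary k, then compare distances between cases in the
# original data and in the two-dimensional embedding
emb <- lle(lle_scurve_data, m = 2, k = 10)

rho <- cor(as.vector(dist(lle_scurve_data)), as.vector(dist(emb$Y)))

1 - rho^2  # smaller values mean distances are better preserved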
Here is a summary of the arguments of calc_k():
- The first argument is the data set.
- The m argument is the number of dimensions we want to reduce our data set into.
- The kmin and kmax arguments specify the minimum and maximum values of the range of k values the function will use.
- The cpus argument lets us specify the number of cores we want to use for parallelization (I used parallel::detectCores() to use all of them).
Note
Because we're calculating an embedding for each value of k, if our range of values is large and/or our data set contains many cases, I recommend parallelizing this function by setting the parallel argument to TRUE.
When this function has finished, it will draw a plot showing the 1 − ρ² metric for each value of k (see figure 15.14).
Figure 15.14. Plotting 1 − ρ² against k to find the optimal value of k. The solid horizontal line indicates the value of k with the lowest 1 − ρ².

The calc_k() function also returns a data.frame containing the 1 − ρ² metric for each value of k. We use the filter() function to select the row containing the lowest value of the rho column. We will use the value of k that corresponds to this smallest value to train our final LLE. In this example, the optimal value of k is 17 neighbors.
Note
This is a little confusing because, actually, we want the highest value of rho (ρ), which gives us the smallest value of 1 − ρ². Despite this column being called rho, it contains the values of 1 − ρ², and so we want the smallest of these values.
Finally, we train the LLE itself with the lle() function, supplying:
- The data as the first argument
- The number of dimensions we want to embed into as the m argument
- The value of the k hyperparameter
Listing 15.10. Calculating k and performing the LLE
lleK <- calc_k(lle_scurve_data, m = 2, kmin = 1, kmax = 20,
               parallel = TRUE, cpus = parallel::detectCores())

lleBestK <- filter(lleK, rho == min(lleK$rho))

lleBestK

   k    rho
1 17 0.1469

lleCurve <- lle(lle_scurve_data, m = 2, k = lleBestK$k)
Now that we've performed our embedding, let's extract the two new LLE axes and plot the data onto them. This will allow us to visualize our data in this new, two-dimensional space to see if the algorithm has revealed a grouping structure.
Listing 15.11. Plotting the LLE
sTib <- sTib %>%
  mutate(LLE1 = lleCurve$Y[, 1],
         LLE2 = lleCurve$Y[, 2])

ggplot(sTib, aes(LLE1, LLE2, col = z)) +
  geom_point() +
  scale_color_gradient(low = "darkred", high = "lightblue") +
  theme_bw()
We start by mutating two new columns onto our original tibble, each containing the values of one of the new LLE axes. We then use the ggplot() function to plot the two LLE axes against each other, mapping the z variable to the color aesthetic. We add a geom_point() layer and a scale_color_gradient() layer that specifies the extreme colors of a color scale that will be mapped to the z variable. This will allow us to directly compare the position of each case in our new, two-dimensional representation to its position in the three-dimensional plot in figure 15.13.
The resulting plot is shown in figure 15.15. Can you see that LLE has flattened out the S shape into a flat, two-dimensional rectangle of points? If not, take a look back at figure 15.13 and try to relate the two figures. It's almost as if the data had been drawn onto a folded piece of paper, and LLE straightened it out! This is the power of manifold-learning algorithms for dimension reduction.
Figure 15.15. Plotting the two-dimensional embedding of the S-curve data. The shading of the points is mapped to the z variable, the same as in figure 15.13.

One criticism that is sometimes leveled at LLE is that it is designed to handle "toy data": in other words, data that is constructed to form interesting and unusual shapes, but which rarely (if ever) manifests in real-world datasets. The S-curve data we worked on in the previous section is an example of toy data that was generated to test algorithms that learn the manifold a data set lies on. So in this section, we're going to see how well LLE performs on our flea circus data set, and whether it can identify the clusters of fleas like our SOM could.
We're going to follow the same procedure as for the S-curve data set:
- Use the calc_k() function to calculate the best-performing value of k.
- Perform the embedding in two dimensions.
- Plot the two new LLE axes against each other.
This time, let's map the species variable to the color aesthetic, to see how well our LLE embedding separates the clusters.
Listing 15.12. Performing and plotting LLE on the flea dataset
lleFleaK <- calc_k(fleaScaled, m = 2, kmin = 1, kmax = 20,
                   parallel = TRUE, cpus = parallel::detectCores())

lleBestFleaK <- filter(lleFleaK, rho == min(lleFleaK$rho))

lleBestFleaK

   k    rho
1 12 0.2482

lleFlea <- lle(fleaScaled, m = 2, k = lleBestFleaK$k)

fleaTib <- fleaTib %>%
  mutate(LLE1 = lleFlea$Y[, 1],
         LLE2 = lleFlea$Y[, 2])

ggplot(fleaTib, aes(LLE1, LLE2, col = species)) +
  geom_point() +
  theme_bw()
The resulting plots are shown in figure 15.16 (I combined the plots into a single figure to save room). LLE seems to do a decent job of separating the different species of fleas, though the result isn't quite as impressive as the way LLE was able to unravel the S-curve data set.
Figure 15.16. Plotting the output of listing 15.12. The top plot shows 1 − ρ² for different values of k. The lower plot shows the two-dimensional embedding of the flea data, shaded by species.

Note
Sadly, because each case is reconstructed as a weighted sum of its neighbors, new data cannot be projected onto an LLE map. For this reason, LLE cannot easily be used as a preprocessing step for other machine learning algorithms, as new data can't be passed through it.
Exercise 2
Add 95% confidence ellipses for each flea species to the lower plot shown in figure 15.16.
While it often isn't easy to tell which algorithms will perform well for a given task, here are some strengths and weaknesses that will help you decide whether the SOM or LLE will perform well for you.
The strengths of SOMs and LLE are as follows:
- They are both nonlinear dimension-reduction algorithms, and so can reveal patterns in the data where linear algorithms (like PCA) may fail.
- New data can be mapped onto an existing SOM.
- They are reasonably inexpensive to train.
- Rerunning the LLE algorithm on the same data set with the same value of k will always produce the same embedding.
The weaknesses of SOMs and LLE are these:
- They cannot natively handle categorical variables.
- The lower-dimensional representations are not directly interpretable in terms of the original variables.
- They are sensitive to data on different scales.
- New data cannot be mapped onto an existing LLE.
- They don't necessarily preserve the global structure of the data.
- Rerunning the SOM algorithm on the same data set will produce a different map each time.
- Small SOMs can be difficult to interpret, so the algorithm works best with large datasets (greater than hundreds of cases).
Exercise 3
Using the original somGrid we created, create another SOM, but increase the number of iterations to 10,000, and set the alpha argument to c(0.01, 0.001) to slow the learning rate. Create the mapping plot just like in exercise 1. Retrain and plot the SOM multiple times. Is the mapping less variable than before? Can you think why?
Exercise 4
Repeat our LLE embedding, but embed in three dimensions instead of two. Plot this new embedding using the scatter3D() function, coloring the points by species.
Exercise 5
Repeat our LLE embedding (in two dimensions), but this time use the unscaled variables. Plot the two LLE axes against each other, and map the species variable to the color aesthetic. Compare this embedding to the result using scaled variables.
- SOMs create a grid/map of nodes to which cases in the dataset are assigned.
- SOMs learn patterns in the data by updating the weights of each node until the map converges to a set of weights that preserves similarities among the cases.
- New data can be mapped onto an existing SOM, and SOM nodes can be clustered based on their weights.
- LLE reconstructs each case as a linear weighted sum of its neighbors.
- LLE then embeds the data in a lower-dimensional feature space that preserves the weights.
- LLE is excellent at learning complex manifolds that underlie a set of data, but new data cannot be mapped onto an existing embedding.
- Train a rectangular, toroidal SOM:
somGridRect <- somgrid(xdim = 5, ydim = 5, topo = "rectangular",
                       toroidal = TRUE)

fleaSomRect <- som(fleaScaled, grid = somGridRect,
                   rlen = 5000, alpha = c(0.05, 0.01))

plot(fleaSomRect, type = "mapping", pch = 21,
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight", bgcol = "lightgrey")

# Making the map toroidal means that nodes on one edge are connected to
# adjacent nodes on the opposite side of the map.
- Add 95% confidence ellipses for each flea species to the plot of LLE1 versus LLE2:
ggplot(fleaTib, aes(LLE1, LLE2, col = species)) +
  geom_point() +
  stat_ellipse() +
  theme_bw()
- Train a SOM with more iterations, but a slower learning rate:
fleaSomAlpha <- som(fleaScaled, grid = somGrid, rlen = 10000,
                    alpha = c(0.01, 0.001))

plot(fleaSomAlpha, type = "mapping", pch = 21,
     bg = nodeCols[as.numeric(fleaTib$species)],
     shape = "straight", bgcol = "lightgrey")

# While the positions of the groups change between repeats, there is less
# variation in how well cases from the same species cluster together.
# This is because the learning rate is slower and there are more iterations.
- Train an LLE in three dimensions:
lleFlea3 <- lle(fleaScaled, m = 3, k = lleBestFleaK$k)

fleaTib <- fleaTib %>%
  mutate(LLE1 = lleFlea3$Y[, 1],
         LLE2 = lleFlea3$Y[, 2],
         LLE3 = lleFlea3$Y[, 3])

scatter3D(x = fleaTib$LLE1, y = fleaTib$LLE2, z = fleaTib$LLE3,
          pch = 19, bty = "b2", colkey = FALSE,
          theta = 35, phi = 10, cex = 2,
          col = c("red", "blue", "green")[as.integer(fleaTib$species)],
          ticktype = "detailed")

plot3Drgl::plotrgl()
- Train an LLE on the unscaled flea data:
lleFleaUnscaled <- lle(dplyr::select(fleaTib, -species),
                       m = 2, k = lleBestFleaK$k)

fleaTib <- fleaTib %>%
  mutate(LLE1 = lleFleaUnscaled$Y[, 1],
         LLE2 = lleFleaUnscaled$Y[, 2])

ggplot(fleaTib, aes(LLE1, LLE2, col = species)) +
  geom_point() +
  theme_bw()

# As we can see, the embedding is different depending on
# whether the variables are scaled or not.