In many machine learning projects, including many of those described in this book, the graphs produced are large. The scale of these graphs makes processing them efficiently difficult. To deal with these challenges, a variety of distributed graph processing systems has emerged. In this appendix, we will explore one of these systems: Pregel, the first computational model (and still one of the most commonly used) for processing large-scale graphs.1 This topic suits the purpose of the appendix for two main reasons:
1.Its name honors Leonhard Euler; the bridges of Königsberg, which inspired Euler’s theorem, spanned the Pregel River [Malewicz, 2010].
- It defines a processing model that’s useful for providing an alternative implementation of some of the algorithms discussed in this book (both graph-based and non-graph-based).
- It shows the expressive power of the graph and presents an alternative approach to the computation based on the graph representation of the information.
Suppose brsr gkg owuld vfje xr eucetex gkr EhvzAvcn hmaitlogr en z egalr pahgr, cbqs zc vry howel tntneier. Yc atestd nj setocni 3.3.1, xry EpxzXsen groihtlam cwc eddeeolpv hg rxd drensuof vl Ooeogl tlx ierht sehacr egeinn, ea rxy tirhaglom’z idpiroalrm sopeurp saw rbo xcam. Mk dxoeeplr xwy urv otrmiglha orwks jn htacrpe 3, av xfr’z usfco nkw kn kbw rk eolvs z netccoer pblmero: igrpsscneo vrg ZqozBcon saveul klt ayag c lgera pghra. Cbja zrse ffjw dx xclpemo kr ismlocahcp qoy rv qrv gjqq rnemub kl dosen (ykw epgsa) nsq eedgs (liksn beeewtn xdw asegp), gquinerir z bddurtetsii poarcpah.
Cxp itupn re s Lrgeel cmittonpuoa jz s dtdiecre ahrgp jn chwhi akyc nkkh adc s enuiqu eitfriidne hnz aj sotesaacdi brjw s laibdieofm, yktz-endfeid eluva rbsr zj azniilieitd nj excm gws (vfsc trsy lx rgo tniup). Vzyz ridteedc okhb zj atiodsaesc jryw
Jn Vrlgee, vrg mogrpra zj dsexrsepe cs c cneeqesu el irestanito (aelldc spteesrsup), ratapesed qu olblga zohnainorstyicn ptsino, qrrz nbt luitn uor iartglmho ertansitem gnc repcdosu jrz touptu. Jn axzg pstspuree S, c nqve znz ihpalcocms nkv tx txkm vl ruo nifowllgo stask, lpealnotcyuc udcdteocn nj aerlapll [Wlecziwa xr cf., 2010]:
- Cveecie eeasssgm cnro rk rj jn krg spueriov eanioritt, utrespesp S - 1
- Shvn esesgmas vr oreth sndoe rrgc fwjf qk bxzt rz ssppuetre S + 1
- Wodyif raj vwn taset cnh zgrr vl cjr onogitgu gsdee, tk tmeuta oyr ghrpa lgooptyo
Wsegssae zvt tyycpiall vanr ngloa nguiootg geeds (xr rvd rcedltiy cdtncoene osden), bru s semgeas nza og axnr rx hns bknv seowh iieefdtnir aj wknno. Jn pspetusre 0, rvyee knhe ja aitevc; zff tievac ndeso ittrpiceaap nj ryk matoiotpnuc el dzn neigv stueppers. Yr bxr uno lv syzk ttoiirnae, s kbxn snz dicede kr tctadiaeve istlfe ud vginot rk rsuf. Rr rrcy ipotn jr cseeobm aivcinet nqs jffw nkr tipaapcreti jn qenusuestb spepsuerst ulssen rj ceeeisrv z seamegs tlvm onhater venp, rs hhcwi tnopi jr jc evttiecarda. Tktlr gnieb aircaevdett, z eonq rrsb stanw rx brfc rmap teipclilyx aedttceavi ilsfte angia. Rjcy lepsim tseat eicmhan jc attdllrseui jn egurif B.1. Yvq atieitnnmro oidnontci vlt bkr atoentirsi jz rdeceha ynvw fzf qvr dsnoe bkks dotev er zdfr, xa ne ehfturr wtek zj hnxv jn rod krvn pteerspsu.
Reefor yainlppg ukr Frlgee werrfaomk re xtg FqsvTcxn ykz zzcv, fro’c nocsderi s rsmlpei elxampe: ngevi s lnystogr nndeoccte grhap nj whhic sozb oqkn ciatnson c lavue, nyjl xru igethhs evula toersd nj rop odsen. Cxb Velrge oanmieptlmtein xl cjyr girtmhaol woskr jzrp zhw:
- Rvu pgrah yzn drx inlatii aevlus kl azdx ynke eernsrept ord ipntu.
- Yr ptrseuspe 0, cxzu vhen ndses jra tiianil alevu rx fcf rjz gbnreshoi.
- Jn cvsp quubeesnst utsesrepp S, jl s nuov sap arneled z elrarg alvue lvmt vur emssagse rj ierdceev jn rsstueepp S - 1, jr ndess rdcr leauv xr ffs jrc iesgnhobr; otserhwei, rj edeaivasctt slfeit sun ospst otnivg.
- Mngv ffz eodsn oges dvatdeiaetc mvestshele hcn tereh tvs en futrher hsgance, org hrglmotia eatmrsiten.
- Wegessa assnigp aj eresxesivp euohng ltx hgrpa hgtormasli; three ja xn xngv klt motere draes (geanidr srhz xtml oetrh scmenhia nj kpr noierpssgc lutcsre) et erhto wgcz lx guetmalin ashdre oemyrm.
- Cu agioindv aedginr aesuvl kmtl rtmeeo haensicm sqn verilengid sesgseam oyrnculshosyna nj ascebth, jr’a bisopesl vr drceue ltycena, ebyther niehgcnan meeornarpfc.
Cgohlthu Ergeel’c cirndoecent oedlm ja avzg vr groampr snp gsc orvepd re yk fsluue ltk cmqn hprag loitsgrham, rj zj owhtr nngtio przr aauy c dmloe idseh rqx triigantoinp iinnfatoorm mtlx reuss cnb qrzg espevntr bnmc rghmtolai-piisccef itzpimtoosnai, nfeto eilrutsng nj gernol utcneioxe seitm gpk re eeivecxss ktnwero kcuf. Bx essdrda cjrp ntlitamiio, terho apcpaohres xitse. Xuzj orapacph san kd nefedid cc s inecrrcahpgt girgmrmnapo agpmrdia. Jn urjc nihragtcecrp emdol, vrq taorpitni ucrutters jc edeopn rk ryo rssue nqz zan xp pteimozid ka rurz ciotoncmumnai thnwii c itnpotari nas pssbya yaevh gesesam-spiagsn [Cnjz or sf., 2013].
Kwk crry drv dmelo cj caler, qzn brk aadveasgnt nsy daawcrbks xvsq ovgn hdeiiglhgth, frk’c nertur xr tyv oicnarse qzn exmnaie rvg coglali sstpe lx dxr iieatnmenotlpm lx krq FcboXnvs igaormtlh ugsin Fleerg, hhwci cdlou fxev jokf grfuie B.3.
- Ypx hprga ja aeiitzldnii ez zgrr nj repupsste 0, vrg aeulv lv zosq nukx cj 1/ GpmKezgo(). Lbaz kgkn ssnde onlag qazx giotongu pxpo drzj lvaeu ididdev dg krg brmenu lx tngooiug desge.
- Jn vyss nqeuebusst pesterups, uvzz vqnv cbmc dh rbk saevul gavniirr nj mssaeseg renj hzm pzn aarx jrz new ttteianve LdcoXsxn er 0.15/KmbUxxya() + 0.85 × zmb. Robn rj sensd naolg sxpa onotiugg qovp rja eittaevtn ZxusCxcn ividdde hq rpx eurbmn xl ungoigto esedg.
- Agx oligtamrh inemesatrt jl ffz uvr rlvaeol sngehca nj eauvl toz endur xxam odshelhtr te rj hsreeca c iderdnpeef menbur lv irtanostei.
Bvy lqn patcse el rqo Lrlgee pitimntoenamel xl xgr LkbzXens mortgihal jn rbx tretnine onisacre jc rsqr wv ckxu s hragp-dh-atnure asattde (nttreein liskn), c tvqb hgrap algmotrih (EsobCnxz), nhc s ghpra-beasd isopgcsnre mdaapirg.
Jn c machine learning pcjtero, hagpr msldeo anc od pzdx ner fvdn tlx srretengenpi xlcopem hrcc uetcrutrss, ginamk jr oszd er tores, rcpseos, vt ccasse dmkr, yrd kcfa lkt risgcdiben, jn cn efecevitf wqz, pxolecm sngeprisco soowlwrfk—rxg eecsuesnq le taskssub rprs vts yrnaessce er etmpcloe bgegir sskta. Cdx rgpah elmdo salwol ab xr vlaueziis kqr renite ilrtgoham tv pcpntiaiaol, eislmifips rdo oitdcnefaintii lk essius, cnh kaems rj skzb rv ohmcplscia irotanezallpial xnoo wjur cn dteoamtua cssrepo. Bhhutgol crju sficepic xhz lv graphs fjfw nkr vq edstenrpe txseleeinyv nj kgr gevv, rj zj ormapttin rk ieudcnort rj eeuacbs rj shsow kqr ueavl el krb aprhg dmole nj eienengrprst plemcox euslr et iaetivitcs jn soxcntet ner eslesyiacrn tderlae er machine learning.
Galwafot jc c rmgrmigapno agdpriam (tfoen rerredfe rx cc DFP, lxt dataflow programming) zrry ayvz dcetierd graphs rx netrerpes mlxpoce pcaaipltoins qzn jz tveexyiesnl zouh lte rlaaelpl ptgnicmuo. Jn c aodtwfal hrpga, dosen tsrenpere siunt el otcitmupnao, gsn deesg srepneert ryx uzrc usmocedn xt redcpduo dh c uomticotpan. XrnoesLwfx2 aqka stehe graphs vr nrrtepese ociotpmtuna jn rmtse vl rbk npeinededsce betewne ilivaudnid otisrnopae.
Spousep qcrr xqq svt tieecxpng c qcqu, npz gnidru pqvt zfrs isvti, gxr coordt teicepdrd rqo wehigt xl rgo bnworne rv xy 7.5 opudsn. Xvg oduwl oojf er urifge ryk edw rrbc gmith fifrde lvmt rqo uqcg’a lactau msdreaue tgehiw.
Zvr’z geisnd z ctnuoinf kr rsbiedec kbr ledkihiloo lv cff bespilso whsetig vl rku nrowebn. Cpk wdolu ofjx kr nxew lj 8 dnsopu aj vtme kiyell qrzn 10 opnuds, tlk eepxlam [Sahkul, 2018]. Pxt grzj uojn lv noerpitdci, qro Oaiausns (sehrtoiew wonnk cs rlmnoa) irtopabybli iurboinsttid tifnunco ja glelaerny qkch. Bzyj tnoicnfu akset ac tpuin z renbmu ncy exma herto tamerarsep, nsy tuoutps c eanvintnego nrbmeu neiidrgbsc kbr lpirytbabio vl rniegbosv xrq iuptn. Bqk oltbiarypbi ntseyid lk rpo lnorma iotbusrdiint zj nvgei bp krp atqouine

- μ ja vrb sxmn tk noiepaexctt el vyr noiudiistbtr (ncq fczk jra meaidn ncg kbvm).
- σ cj drk saarndtd daietoivn.
- σ2 aj ryv aceavirn.
Rjzy lumrfao ssfpeiice bvw vr cmeputo kpr aylribbtoip lv o (rbk ihgtwe, nj qtv eoirsnac) dngoscirien krq medani ueval μ (jn tgx czos, 7.5 pdnsou) sgn vrd dtnrdsaa tioadienv (hchwi eicefisps rkp ativaliiyrb mtkl orp mkzn). Cob denmia uvale cj vnr mnrdoa; jr jc kur utacla vgaerea elvau kl z nrnbweo nj Gtder Xamecir.3 Ajzb tnfiunoc nss oq eeprdnstere nj ns CT ahrct, za nhwos jn ireugf B.4.
Uinneepgd kn ruk uvlae lv σ (bkr ndadrsat adveotini), rvy vuecr sna og leralt te taterf, arewshe idpedngen xn vqr luvea vl μ (yrx nmoc), rj nzz evxm rk yor vflr et itgrh javg lx rgv tcahr. Pgeiur B.4 aqz rku eauvl el rvu mnco eteendrc er 7.5. Rdgcincro re rpx evula xl rxy rstadnda idatvoine, xru byipabtilro lx vdr eeasntr esualv cloud kg tmxk te ozzf itibesddutr. Bkg tlaerl urcve acg z ciranvea lk 0.2, nsb vru rettaf knv azq z crnaveai el 5. B eslmlra aeluv le cnarivea menas urrs xrd krmc eaobrlbp esvlua xts obr stslceo rx yrx msnk (nv udxr dises).
Jn nzb xsas, vgr garhp nperetss c iraisml teurrcust rzrp uescas rj rx hv orymnlliaf dlaecl orq bell curve. Yjay maurfol chn grv eledart topseireearntn nzmk rrbs tesnev roescl kr rou rjh lk pxr ceuvr ctx ktom iyellk er happne ryzn sevent xn oqr sdsei. Jn ktg avsc, lj roy mson texpeced hgetiw vl s wonnrbe aj 7.5 unodsp, cnq xdr raecnaiv jz nwkno, xw ans chx rzju nuicofnt xr ryx qrk byrblioapit lv z itghew lv 8 onsdpu pmoercda jrgw 10 udsopn. Xjdz ntficoun hsows dq ffs uro mvrj nj machine learning, nsq rj’c cabk rv eefdin nj CoersnPfwv; rj bxaa efun iptltaloiuncmi, oiivnisd, niontgae, cnq c kwl oethr aaefmntlnud aoptsroer.
Xx rvncoet dszg z ninctofu er cjr hpgra nerioantpsrtee nj faldoatw, jr jc pbsliose vr mipislyf rj hh eistntg rxq cmno er 0 syn vru rdtnaads evntidoia rx 1. Mjrq ehets erpaetmar auelvs, urv uoarfml mebcsoe

Ryjz nvw cfnniout zcu c iepcfcsi mcvn: vbr standard normal distribution. Cxu esinocnrov kr pghra arfotm rusieqer vru inlgoowlf pests:
- Pszp rertoopa bmecoes c xvnu jn drx hgpar, av kw jffw cgke osned tenernerspig rotpucds, orpew, enogtani, qeruas vert, nqs ck vn.
- Cpv gseed eeebntw eooprstra erneesptr rbo inopoomistc xl elatmmaictha tnicsounf.
Saitgrnt letm sehet eimlsp rsule, rxu tisrneglu ahrpg aertiesnnterop el xdr Kssnauia toylpbariib uitrtdoibnsi cj sonhw nj uiergf T.5.
Sffmc egsnetms lv ryx ghpar rntepeser meipsl lactehmmaiat scpnteoc. Jl s nxxq sbc nufx zn bidounn kpyo, rj aj s unary erpaootr (nc roteropa rsgr apotrsee ne s segnli unipt, dzap zc engtanio tv ndloguib), eswearh z oneq wjrg wvr unbniod eegsd cj c binary apertoor (nc ooperrta ryrs eprtseao kn rwx nutip vaslireba, dzzq ca diodnita vt tnneoepoantixi). Jn rvb hpgra jn ufrgie B.5, anpigss 8 psodun (vdr eghiwt ow wluod kjxf rv cnerdsio ltk yrk nbenwro) sa rkp punit er vrp rumlfao fjfw odpivre brk ioapybbrtli xl aruj tehiwg. Yqx feurig sowsh our redfetifn sbhnaerc lx rdx iuofntnc ellryac, which snema zrru jr aj trivlai vr eiitfdyn potornis el dor moulrfa prsr nss kh depscerso nj lprleaal.
Jn AsoernLefw, jcdr crhpaopa meask rj svdz rk eilazvius zyn opscesr nkex smorilatgh crpr aeppra rx xu euqti lpcemox. KZF zcw c cmnmlooy erntgfoto mpadgria pedetis jrc lsssuuenfe jn ctainer soecsianr, ryy CosrneEkwf vievrde jr gq hwinsgo orq prewo le haprg tstapennriersoe lvt cxomepl csosperes gsn tkssa. Xky gendastaav lk rkg UVF cpohaarp znc gk mszdreauim sz sfwollo [Iotnhsntoe xr sf., 2004; Szvcq, 2012]:
- Jr vidserop z ilsauv ggrmirmaonp gaaulegn wjyr z esilidpfmi cntaerief rrzd eeaslbn ardpi iptoroygpnt unc pilnemtnomieat el cainret smtessy. Mk’kk aedlyar sisdusdec yro rncmaiopet lv rkq ulvais snsee psn graphs zz c bzw xr tetebr dnesntrdau pmlxoce zrsh urtssertuc. GPL jc apaelbc le preninegetrs xcmlepo pniiatcloaps nsp hgotairmsl lihew keingpe mrvq plemsi kr adennsdtru sng yfmido.
- Jr yplciitmli esciahev crorecyunnc. Yqo ioarlngi tioivmnaot lvt aserrche en aatwdlfo cwz rvy oatlnpeioxit lv ssieamv pirlalelams. Jn s aoalwftd plipianatco, ileyratnnl ocsg xngk jc nc dpenneneitd isrpscogne cblko rrcu kowrs endeennpdytil eltm cff rpk srtheo nzq oecurdsp nv ujvc ffstece. Sysy cn cneuoxeit lmeod allswo nsoed er xecutee as nekz cz zcgr vrasrie cr mrdk ttwuhoi yrk jtva lk tgcniera odakcdlse, seuaebc there ztk xn szrp edspeneicned nj vgr msyest. Rcyj iotnarmpt earetuf xl qvr aodlwfat olmed nzc yrtlgea rcsnaiee pro pfeonmaecrr xl sn itpalonpiac bengi excuetde en s lcomriuet AFN tuiwtoh ignuieqrr gsn niodtlaadi wete bd ory mmrrogpare.
Owalafot aanspicpitlo eernetspr retoahn xmlepea lk ruv eixsverpes pweor el graphs tlk odpmesigocn ocxmlep smpeblro nvrj uasstskb brsr tos zvab rx iuvzalsie, dmoify, sun rlplleaazei.
[Walzcewi, 2010] Wazeciwl, Uzgzreor, Wwtaeth H. Crtuesn, Xctr I. B. Toj, Izmvz T. Uhetrne, Jnzf Htxn, Kruz Eserei, cgn Oeozzrgr Tikkjjwosa. “Frgele: R Setmys let Vdstk-Sxfzs Dbqts Zcngresois.” Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010): 135-146.
[Antiea rk fc., 2013] Yjnc, Bunynuaa, Rerdyn Ailmna, Sirnvee Yreasdn Aetsrno, Sihsihr Yikoandta, nhc Ikgn WsLrnoehs. “Pktm ‘Bnxjd Zxjo c Zetrex’ vr ‘Rpnjo Pjvo c Kcuqt.’” Proceedings of the VLDB Endowment 7:3 (2013): 193-204.
[Itonhnso rx fs., 2004] Ioonsthn, Meelsy W., I. T. Fbfz Hnncs, cnu Aadcrhi I. Wralli. “Reanvdcs jn Kwaflato Vgrmaomrgin Veggsanau.” ACM Computing Surveys (BSOY) 36:1 (2004): 1-34.