This chapter covers
- Encoding data into a latent space (dimensionality reduction) and subsequent dimensionality expansion
- Understanding the challenges of generative modeling in the context of a variational autoencoder
- Generating handwritten digits by using Keras and autoencoders
- Understanding the limitations of autoencoders and motivations for GANs
I dedicate this chapter to my grandmother, Aurelie Langrova, who passed away as we were finishing the work on it. She will be missed dearly.
Jakub
You might be wondering why we chose to include this chapter in the book. There are three core reasons:
- Generative models are a new area for most. Most people who come across machine learning typically become exposed to classification tasks in machine learning first and more extensively—perhaps because they tend to be more straightforward. Generative modeling, through which we are trying to produce a new example that looks realistic, is therefore less understood. So we decided to include a chapter that covers generative modeling in an easier setting before delving into GANs, especially given the wealth of resources and research on autoencoders—GANs’ closest precursor. But if you want to dive straight into the new and exciting bits, feel free to skip this chapter.
- Generative models are very challenging. Because generative modeling has been underrepresented, most people are unaware of what a typical model looks like and its challenges. Although autoencoders are in many ways closer to the models that are most commonly taught (such as having an explicit objective function, as we will discuss later), they still present many of the challenges that GANs face—such as how difficult it is to evaluate sample quality. Chapter 5 covers this in more depth.
- Generative models are an important part of the literature today. Autoencoders themselves have their own uses, as we discuss in this chapter. They are also still an active area of research—even state of the art in some areas—and are used explicitly by many GAN architectures. Other GAN architectures use them as implicit inspiration or a mental model—such as CycleGAN, covered in chapter 9.
You should be familiar with how deep learning takes raw pixels and turns them into, for example, class predictions. For instance, we can take three matrices that contain the pixels of an image (one for each color channel) and pass them through a system of transformations to get a single number at the end. But what if we want to go in the opposite direction?
We start with a description of what we want to produce and get the image out at the other end of the transformation. That is generative modeling in its simplest, most informal form; we add more depth throughout the book.
A bit more formally, we take a certain description (z)—for this simple case, let’s say it is a number between 0 and 9—and try to arrive at a generated sample (x*). Ideally, this x* would look as realistic as another real sample, x. The description, z, lives in a latent space and serves as an inspiration so that we do not always get the same output, x*. This latent space is a learned representation—hopefully one that is meaningful to people in the ways we think about it (“disentangled”). Different models will learn a different latent representation of the same data.
You odnmra noise oetrcv wo zwz jn chapter 1 jz often freederr rv zc s sample from the latent space. Fatnte psaec ja c irpslme, dnehdi espentnoirtrea lv s rzzq inpot. Jn dxt xttcoen, jr aj oteeddn pb z, ncq simpler bicr sanem wrleo-siolmeadnni—ltx leepxma, s vertco vt arrya kl 100 snberum rheart nurc vrb 768 rgrz cj ryo eniytdilaoinsm el yro salpsem wv jfwf bzx. Jn zhnm swpc, z xxhy tenlat tetirnnrsaeepo xl s crzb notpi jfwf llwao kph kr pogur hnstgi rsry tcx asimril nj ajru scpea. Mk fjfw rbo xr rgwc latent esnma jn qrk context of sn anoedctroeu nj figure 2.3 qzn cwdv bxh wxg yraj etcfsaf teq gtdnaeree selpmas nj figures 2.6 gnc 2.7, rhp efbero wx zzn hv rdcr, ow’ff dbriecse epw autoencoders tfnniocu.
As their name suggests, autoencoders help us encode data, well, automatically. Autoencoders are composed of two parts: an encoder and a decoder. For the purposes of this explanation, let’s consider one use case: compression.
Imagine that you are writing a letter to your grandparents about your career as a machine learning engineer. You have only one page to explain everything that you do so that they understand, given their knowledge and beliefs about the world.
Now imagine that your grandparents suffer from acute amnesia and do not remember what you do at all. This already feels a lot harder, doesn’t it? That may be because now you have to explain all the terminology. For example, they can still read and understand basic things in your letter, such as your description of what your cat did, but the notion of a machine learning engineer might be alien to them. In other words, their learned transformation from the latent space z into x* has been (almost) randomly initialized. You have to first retrain these mental structures in their heads before you can explain. You have to train their autoencoder by passing in concepts x and seeing whether they manage to reproduce them (x*) back to you in a meaningful way. That way, you can measure their error, called the reconstruction loss (|| x – x* ||).
Implicitly, we compress data—or information—every day so we do not spend ages explaining known concepts. Human communication is full of autoencoders, but they are context-dependent: what we explain to our grandparents, we do not have to explain to our engineering colleagues, such as what a machine learning model is. So some human latent spaces are more appropriate than others, depending on the context. We can just jump to the succinct representation that their autoencoder will already understand.
We can compress because it is useful to simplify certain recurring concepts into abstractions that we have agreed on—for example, a job title. Autoencoders can systematically and automatically uncover these information-efficient patterns, define them, and use them as shortcuts to increase the information throughput. As a result, we need to transmit only the z, which is typically much lower-dimensional, thereby saving us bandwidth.
From an information theory point of view, you are trying to pass as much information through the “information bottleneck” (your letter or spoken communication) as possible without sacrificing too much of the understanding. You can almost imagine this as a secret shortcut that only you and your family understand but that has been optimized for the topics you frequently discussed.[1] For simplicity and to focus on compression, we chose to ignore the fact that words are themselves an explicit model, although most words also have tremendous context-dependent complexity behind them.
1 In fact, the Rothschilds, a famous European financier family, did this in their letters, which is why they were so successful in finance.
Definition
The latent space is the hidden representation of the data. Rather than expressing words or images (for example, machine learning engineer in our example, or the JPEG codec for images) in their uncompressed versions, an autoencoder compresses and clusters them based on its understanding of the data.
One of the key distinctions with autoencoders is that we train the whole network end to end with one loss function, whereas GANs have distinct loss functions for the Generator and the Discriminator. Let’s now look at the context in which autoencoders sit compared to GANs. As you can see in figure 2.1, both are generative models that are subsets of artificial intelligence (AI) and machine learning (ML). In the case of autoencoders (or their variational alternative, VAEs), we have an explicitly written function that we are trying to optimize (a cost function); but in the case of GANs (as you will learn), we do not have an explicit metric as simple as mean squared error, accuracy, or area under the ROC curve to optimize.[2] GANs instead have two competing objectives that cannot be written in one function.
2 A cost function (also known as a loss function or objective function) is what we are trying to minimize or optimize for. In statistics, for example, this could be the root mean squared error (RMSE). The root mean squared error (RMSE) is a mathematical function that gives an error by taking the square root of the mean of the squared differences between the true values of our examples and our predictions.
In statistics, we typically want to evaluate a classifier across several combinations of false positives and negatives. The area under the curve (AUC) helps us do that. For more details, Wikipedia has an excellent explanation, as this concept is beyond the scope of this book.
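To make the RMSE in footnote 2 concrete, here is a minimal NumPy sketch (the array values are made up for the example):

import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # true values (made-up example data)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # our predictions

# RMSE: square root of the mean of the squared differences
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)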
Figure 2.1. Placing GANs and autoencoders in the AI landscape. Different researchers might draw this differently, but we will leave this argument to academics.

As we look at the structure of an autoencoder, we’ll use images as an example, but this structure also applies in other cases (for instance, language, as in our example about the letter to your grandparents). Like many advancements in machine learning, the high-level idea of autoencoders is intuitive and follows these simple steps, illustrated in figure 2.2:
- Encoder network: We take a representation x (for example, an image) and then reduce the dimensionality from x to z by using a learned encoder (typically, a one- or many-layer neural network).
- Latent space (z): As we train, we try to establish the latent space to have some meaning. The latent space is typically a representation of a smaller dimension and acts as an intermediate step. In this representation of our data, the autoencoder is trying to “organize its thoughts.”
- Decoder network: We reconstruct the original object into the original dimensions by using the decoder. This is typically done with a neural network that is a mirror image of the encoder. This is the step from z to x*. We apply the reverse process of the encoding to get, for example, the 784 pixel-values-long reconstructed vector (of a 28 × 28 image) from the 256 pixel-values-long vector of the latent space.
Figure 2.2. Using an autoencoder in our letter example follows these steps: (1) You compress all the things you know about a machine learning engineer, and then (2) compose that into the latent space (letter to your grandmother). When she, using her understanding of words as a decoder (3), reconstructs a (lossy) version of what that means, you get out a representation of an idea in the same space (in your grandmother’s head) as the original input, which was your thoughts.

Here’s an example of autoencoder training:
- We take images x and feed them through the autoencoder.
- We get out x*, a reconstruction of the images.
- We measure the reconstruction loss—the difference between x and x*.
- This is done by using a distance (for example, the mean absolute error) between the pixels of x and x*.
- This gives us an explicit objective function (|| x – x* ||) to optimize via a version of gradient descent.
So we are trying to find the parameters of the encoder and the decoder that minimize the reconstruction loss, which we update by using gradient descent.
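Before moving on, here is a minimal sketch of such a plain (non-variational) autoencoder in Keras—which we introduce properly later in this chapter—using an illustrative 784–256–2 architecture. The layer sizes, activations, and optimizer here are our own choices for illustration, not a prescription:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))                          # a flattened 28 x 28 image
encoded = Dense(256, activation='relu')(inputs)       # encoder: 784 -> 256
latent = Dense(2)(encoded)                            # latent space: 256 -> 2
decoded = Dense(256, activation='relu')(latent)       # decoder mirror: 2 -> 256
outputs = Dense(784, activation='sigmoid')(decoded)   # reconstruction: 256 -> 784

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')     # the explicit objective ||x - x*||
# Training is then just: autoencoder.fit(x_train, x_train, ...)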
And that’s it! We’re done. Now you may be wondering why this is useful or important. You’d be surprised!
Despite their simplicity, there are many reasons to care about autoencoders:
- First of all, we get compression for free! This is because the intermediate step (2) from figure 2.2 becomes an intelligently reduced image or object at the dimensionality of the latent space. Note that in theory, this can be orders of magnitude less than the original input. It obviously is not lossless, but we are free to use this side effect, if we wish.
- Still using the latent space, we can think of many practical applications, such as a one-class classifier (an anomaly-detection algorithm), where we can see the items in a reduced, more quickly searchable latent space to check for similarity with the target class. This can work in search (information retrieval) or anomaly-detection settings (comparing closeness in the latent space).
- Another use case is data denoising or colorization of black-and-white images.[3] For example, if we have an old photo or video or a very noisy one—say, World War II images—we can make them less noisy and add color back in. Hence the similarity to GANs, which also tend to excel at these types of applications.
3 For more information on coloring black-and-white images, see Emil Wallner’s “Coloring Greyscale Images” on GitHub (http://mng.bz/6jWy).
- Some GAN architectures—such as BEGAN[4]—use autoencoders as part of their architecture to help stabilize their training, which is critically important, as you will discover later.
4 BEGAN is an acronym for Boundary Equilibrium Generative Adversarial Networks. This interesting GAN architecture was one of the first to use an autoencoder as part of its setup.
- Training of these autoencoders does not require labeled data. We will get to this and why unsupervised learning is so important in the next section. This makes our lives a lot easier, because it is only self-training and does not require us to look for labels.
- Last, but definitely not least, we can use autoencoders to generate new images. Autoencoders have been applied to anything from digits to faces to bedrooms, but usually the higher the resolution of the image, the worse the performance, as the output tends to look blurry. But for the MNIST dataset—as you will discover later—and other low-resolution images, autoencoders work great; you’ll see what the code looks like in just a moment!
Definition
The Modified National Institute of Standards and Technology (MNIST) database is a dataset of handwritten digits. Wikipedia has a great overview of this extremely popular dataset used in the computer vision literature.
So all of these things can be done just because we found a new representation of the data we already had. This representation is useful because it brings out the core information, which is natively compressed, but it’s also easier to manipulate or generate new data based on the latent representation!
In the previous chapter, we already talked about unsupervised learning without using the term. In this section, we’ll take a closer look.
Definition
Unsupervised learning is a type of machine learning in which we learn from the data itself, without additional labels as to what this data means. Clustering, for example, is unsupervised—because we are just trying to discover the underlying structure of the data; but anomaly detection is usually supervised, as we need human-labeled anomalies.
In this chapter, you will learn why unsupervised machine learning is different: we can use any data without having to label it for a specific purpose. We can throw in all the images from the internet without having to annotate the data about the purpose of each sample, for each representation that we might care about. For example: Is there a dog in this picture? A car?
In supervised learning, on the other hand, if you don’t have labels for that exact task, (almost) all of your labels could be unusable. If you’re trying to make a classifier that would classify cars from Google Street View, but you do not have labels of those images for animals as well, training a classifier that would classify animals with the same dataset would be basically impossible. Even if animals frequently feature in these samples, you would need to go back and ask your labelers to relabel the same Google Street View dataset for animals.
In essence, we need to think about the application of the data before we even know the use case, which is difficult! But for a lot of compression-type tasks, you always have labeled data: your data. Some researchers, such as François Chollet (research scientist at Google and author of Keras), call this type of machine learning self-supervised. For much of this book, our only labels will be either the examples themselves or other examples from the dataset.
Since our training data also acts as our labels, training many of these algorithms becomes far easier from one crucial perspective: we now have lots more data to work with, and we do not need to wait weeks and pay millions for enough labeled data.
Autoencoders themselves are a fairly old idea—at least when measured against the age of machine learning as a field. But seeing as everyone is working on something deep today, it should surprise exactly no one that people have successfully applied deep learning as part of both the encoder and the decoder.
An autoencoder is composed of two neural networks: an encoder and a decoder. In our case, both have activation functions,[5] and we will be using just one intermediate layer for each. This means we have two weight matrices in each network—one from input to intermediate and then one from intermediate to latent. Then again, we have one from latent to a different intermediate and then one from intermediate to output. If we had just one weight matrix in each, our procedure would resemble a well-established analytical technique called principal component analysis (PCA). If you have a background in linear algebra, you should be in broadly familiar territory.
5 We feed any output from an earlier layer’s computation through an activation function before passing it to the next one. Frequently, people pick a rectified linear unit (ReLU)—which is defined as max(0, x). We won’t go into too much depth on activation functions, because they alone could be the subject of a lengthy blog post.
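As a rough sketch of the weight matrices just described—using a hypothetical 784–256–2 architecture like the one we adopt later in this chapter, and omitting biases for brevity—the forward pass looks roughly like this:

import numpy as np

W_enc_1 = np.zeros((784, 256))   # encoder: input -> intermediate
W_enc_2 = np.zeros((256, 2))     # encoder: intermediate -> latent
W_dec_1 = np.zeros((2, 256))     # decoder: latent -> intermediate
W_dec_2 = np.zeros((256, 784))   # decoder: intermediate -> output

x = np.zeros((1, 784))                            # one flattened image
z = np.maximum(0, x @ W_enc_1) @ W_enc_2          # encode, with a ReLU nonlinearity
x_star = np.maximum(0, z @ W_dec_1) @ W_dec_2     # decode back to 784 values

With a single weight matrix in each direction and no nonlinearity, the encoder and decoder collapse into linear projections, which is why the procedure would then resemble PCA.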
Note
Some technical differences exist in how the solutions are learned—for example, PCA is numerically deterministic, whereas autoencoders are typically trained with a stochastic optimizer. There are also differences in the final form of the solution. But we’re not going to give you a lecture about how one of them gives you an orthonormal basis and how fundamentally both still span the same vector space—though if you happen to know what that means, then more power to you.
At the beginning of this chapter, we said that autoencoders can be used to generate data. Some of you who are really keen may have been thinking about the use of the latent space and whether it can be repurposed for something else . . . and it totally can! (If you got this right, you can give yourself an official, approved self-five!)
But you probably didn’t buy this book to look silly, so let’s get to the point. If we go back to the example with your grandparents and apply a slightly different lens, using autoencoders as a generative model might start to make sense. For example, imagine that your idea of what a job is becomes the input to the decoder network. Think of the word job written down on the piece of paper as the latent space input, and the idea of a job in your grandparents’ heads as the output.
In this case, we see that the latent space encoding (a written word, combined with your grandparents’ ability to read and understand concepts) becomes a generative model that generates an idea in their heads. The written letter acts as an inspiration or some sort of latent vector, and the outputs—the ideas—are in the same high-dimensional space as the original input. Your grandparents’ ideas are as complex—albeit slightly different—as yours.
Now let’s switch back to the domain of images. We train our autoencoder on a set of images, tuning the encoder and the decoder to find appropriate parameters for the two networks. We also get a sense for the way the examples are represented in the latent space. For generation, we cut off the encoder part and use only the latent space and the decoder. Figure 2.3 shows a schematic of the generation process.
Figure 2.3. Because we know from training where our examples get placed in the latent space, we can easily generate examples similar to the ones that the model has seen. Even if not, we can easily iterate or grid-search through the latent space to determine the kinds of representations that our model can generate.

(Image adapted from Mat Leonard’s simple autoencoder project on GitHub, http://mng.bz/oNXM.)
You may be wondering: what is the difference between a variational autoencoder and a “regular” one? It all has to do with the magical latent space. In the case of a variational autoencoder, we choose to represent the latent space as a distribution with a learned mean and standard deviation rather than just a set of numbers. Typically, we choose a multivariate Gaussian, but exactly what that is or why we choose this distribution over another is not that important right now. If you would like a refresher on what that might look like, take a look at figure 2.5.
As the more statistically inclined of you may have realized at this point, the variational autoencoder is a technique based on Bayesian machine learning. In practice, this means we have to learn the distribution, which adds further constraints. In other words, frequentist autoencoders would try to learn the latent space as an array of numbers, but Bayesian—for example, variational—autoencoders would try to find the right parameters defining a distribution.
We then sample from the latent distribution and get some numbers. We feed these numbers through the decoder. We get back an example that looks like something from the original dataset, except it has been newly created by the model. Ta-da!
In this book, we use a popular, high-level deep learning API called Keras. We highly suggest that you familiarize yourself with it. If you are not already comfortable with it, plenty of good free resources are available online, including outlets such as Towards Data Science (http://towardsdatascience.com), where we frequently contribute. If you want to learn more about Keras from a book, several good resources exist, including another great Manning book, Deep Learning with Python by François Chollet—the author and creator of Keras.
Keras is a high-level API for several deep learning frameworks—TensorFlow, Microsoft Cognitive Toolkit (CNTK), and Theano. It is easy to use and allows you to work on a much higher level of abstraction, so you can focus on the concepts rather than recording every standard block of multiplication, biasing, activation, and then pooling,[6] or having to worry about variable scopes too much.
6 A pooling block is an operation on a layer that allows us to pool several inputs into fewer—for example, taking a matrix of four numbers and keeping the maximum value as a single number. This is a common operation in computer vision to reduce complexity.
To show the true power of Keras and how it simplifies the process of writing a neural network, we will look at the variational autoencoder example in its simplest form.[7] In this tutorial, we use the functional API that Keras has for a more function-oriented approach to writing deep learning code, but we will show you the sequential API (the other way) in later tutorials as things get more difficult.
7 This example was heavily modified by the authors for simplicity, from http://mng.bz/nQ4K.
The goal of this exercise is to generate handwritten digits based on the latent space. We are going to create an object, generator or decoder, that can use the predict() method to generate new examples of handwritten digits, given an input code, which is just the latent space vector. And of course, we have to use MNIST, because we wouldn’t want anyone getting any ideas that there could be other datasets out there; see figure 2.4.
(Source: Artificial Intelligence Memes for Artificial Intelligence Teens on Facebook, http://mng.bz/vNjM.)
In our code, we first have to import all dependencies, as shown in the following listing. For reference, this code was checked with Keras as late as 2.2.4 and TensorFlow as late as 1.12.0.
Listing 2.1. Standard imports
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K
from keras import objectives
from keras.datasets import mnist
import numpy as np
The next step is to set the global variables and hyperparameters, as shown in listing 2.2. They should all be familiar: the original dimensions are 28 × 28, which is the standard size. We then flatten the images from the MNIST dataset to get a vector of 784 (28 × 28) dimensions. And we will also have a single intermediate layer of, say, 256 nodes. But do experiment with other sizes; that’s why it’s a hyperparameter!
Listing 2.2. Setting hyperparameters
batch_size = 100
original_dim = 28*28    #1
latent_dim = 2
intermediate_dim = 256
nb_epoch = 5            #2
epsilon_std = 1.0
In listing 2.3, we start constructing the encoder. To achieve this, we use the functional API from Keras.
Note
The functional API uses lambda functions in Python to return constructors for another function, which then takes another input and produces the final result.
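For instance, the general pattern looks like this (a toy example of the functional API, not part of the VAE we build next):

from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(10,))                  # declare an input tensor
h = Dense(5, activation='relu')(x)      # the layer h takes x as its input
y = Dense(1, activation='sigmoid')(h)   # and y takes h
toy_model = Model(x, y)                 # Keras links x to y through h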
The short version is that we simply declare each layer, mentioning the previous input as a second group of arguments after the regular arguments. For example, the layer h takes x as its input. At the end, when we compile the model and indicate where it starts (x) and where it ends ([z_mean, z_log_var, and z]), Keras will understand how the starting input and the final list of outputs are linked together. Remember from the diagrams that z is our latent space, which in this case is a normal distribution defined by a mean and variance. Let’s now define the encoder.[8]
8 This idea was suggested by a reader in our book forums. Thank you for the suggestion.
Listing 2.3. Creating the encoder
x = Input(shape=(original_dim,), name="input")                           #1
h = Dense(intermediate_dim, activation='relu', name="encoding")(x)       #2
z_mean = Dense(latent_dim, name="mean")(h)                               #3
z_log_var = Dense(latent_dim, name="log-variance")(h)                    #4
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])    #5
encoder = Model(x, [z_mean, z_log_var, z], name="encoder")               #6
Now comes the tricky part, where we sample from the latent space and then feed this information through to the decoder. But think for a bit about how z_mean and z_log_var are connected: they are both connected to h with a dense layer of two nodes, and they are the defining characteristics of a normal distribution: mean and variance. The preceding sampling function is implemented as shown in the following listing.
Listing 2.4. Creating the sampling helper function
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.)
    return z_mean + K.exp(z_log_var / 2) * epsilon
In other words, we learn the mean (μ) and the variance (σ). This overall implementation, where we have one (z) connected through a sampling function as well as z_mean and z_log_var, allows us to both train and subsequently sample efficiently to get the nice-looking figures at the end. During generation, we sample from this distribution according to these learned parameters, and then we feed these values through the decoder to get the output, as you will see in the figures later. For those of you who are a bit rusty on distributions—or probability density functions in this case—we have included several examples of unimodal two-dimensional Gaussians in figure 2.5.
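In equation form, the sampling function in listing 2.4 implements the standard reparameterization trick: z = z_mean + exp(z_log_var / 2) * epsilon, where epsilon is drawn from a standard normal distribution. Because the randomness is isolated in epsilon, gradients can still flow through z_mean and z_log_var during training.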
Figure 2.5. As a reminder of what a multivariate (2D) distribution looks like, we’ve plotted probability density functions of bivariate (2D) Gaussians. They are uncorrelated 2D normal distributions, except with different variances. (a) has a variance of 0.5, (b) of 1, and (c) of 2. (d), (e), and (f) are the exact same distributions as (a), (b), and (c), respectively, but plotted with a set z-axis limit at 0.7. Intuitively, this is just a function that for each point says how likely it is to occur. So (a) and (d) are much more concentrated, whereas (c) and (f) are making it possible for values far away from the origin (0,0) to occur, but each given value is not as likely.

Now that you understand what defines our latent space and what these distributions look like, we’ll write the decoder. In this case, we write the layers as variables first so we can reuse them later for the generation.
Listing 2.5. Writing the decoder
input_decoder = Input(shape=(latent_dim,), name="decoder_input")    #1
decoder_h = Dense(intermediate_dim, activation='relu',              #2
                  name="decoder_h")(input_decoder)
x_decoded = Dense(original_dim, activation='sigmoid',
                  name="flat_decoded")(decoder_h)                   #3
decoder = Model(input_decoder, x_decoded, name="decoder")           #4
We can now combine the encoder and the decoder into a single VAE model.
Listing 2.6. Combining the model
output_combined = decoder(encoder(x)[2])    #1
vae = Model(x, output_combined)             #2
vae.summary()                               #3
Next, we get to the more familiar parts of machine learning: defining a loss function so our autoencoder can train.
Listing 2.7. Defining our loss function
def vae_loss(x, x_decoded_mean, z_log_var=z_log_var, z_mean=z_mean,
             original_dim=original_dim):
    xent_loss = original_dim * objectives.binary_crossentropy(
        x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(
        1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return xent_loss + kl_loss

vae.compile(optimizer='rmsprop', loss=vae_loss)    #1
Here you can see how the binary cross-entropy and the KL divergence add together to form the overall loss. KL divergence measures the difference between distributions; imagine the two blobs from figure 2.5 and then measuring the volume of their overlap. Binary cross-entropy is one of the common loss functions for two-class classification: here we simply compare each grayscale pixel value of x to the value in x_decoded_mean, which is the reconstruction we were talking about earlier. If you are still confused about this paragraph after the following definition, chapter 5 provides more details on measuring differences between distributions.
Definition
For those interested in more detail and who are familiar with information theory, the Kullback–Leibler divergence (KL divergence), aka relative entropy, is the difference between the cross-entropy of two distributions and their own entropy. For everyone else, imagine drawing out the two distributions; wherever they do not overlap will be an area proportional to the KL divergence.
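For reference, the kl_loss term in listing 2.7 is the closed form of the KL divergence between the learned Gaussian—with mean z_mean and variance exp(z_log_var)—and a standard normal distribution, summed over the latent dimensions: kl_loss = -0.5 * sum(1 + z_log_var - z_mean^2 - exp(z_log_var)).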
Cond vw endife urx emold kr start zr x snu yon cr x_decoded_mean. Cog omled ja ldeoimcp using RMSprop, qry kw oucld kch Tsmg kt nlialav stochastic gradient descent (SGD). Rz qjrw ncp vybv inrngael eymsst, wx xtc using backpropagated errors xr vgnaaeti our ameerptar pasce. Mv tkc syaalw using xvcm rhhv xl gradient descent, ypr nj gneeral, eplope rryeal utr pns ertho ncpr dxr heert motndneie ktux: Tmch, SKG, xt RMSprop.
Definition
Stochastic gradient descent (SGD) is an optimization technique that allows us to train complex models by figuring out the contribution of any given weight to an error and updating this weight (with no update if the prediction is 100% correct). We recommend brushing up on this in, for example, Deep Learning with Python.
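As a reminder, the basic update rule behind all of these optimizers is w ← w − η · ∂L/∂w, where w is a weight, L is the loss, and η is the learning rate; Adam and RMSprop additionally adapt the step size for each weight based on the history of its gradients.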
We train the model by using the standard procedure of a train-test split and input normalization.
Listing 2.8. Creating the train/test split
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
We normalize the data and reshape the training set and the test set to be one 784-element-long array per example instead of a 28 × 28 matrix.
Then we apply the fit function, using shuffling to get a realistic (nonordered) dataset. We also use validation data to monitor progress as we train:
vae.fit(x_train, x_train,
        shuffle=True,
        nb_epoch=nb_epoch,
        batch_size=batch_size,
        validation_data=(x_test, x_test),
        verbose=1)
We’re done!
The full version of the code provides a fun visualization of the latent space; for that, however, look into the accompanying Jupyter/Google Colaboratory notebook. Now we get to kick back, relax, and watch those pretty progress bars. After we are done, we can even take a look at what the values of the latent space look like on a 2D plane, as shown in figure 2.6.
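If you want to reproduce something like figure 2.6 yourself, a minimal sketch (assuming matplotlib is installed; the notebook’s version may be styled differently) looks roughly like this:

import matplotlib.pyplot as plt

# Project the test set into the 2D latent space; output 0 of the encoder is z_mean
x_test_encoded = encoder.predict(x_test, batch_size=batch_size)[0]

plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test, cmap='viridis')
plt.colorbar()   # the color indicates the digit class
plt.show()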
Figure 2.6. 2D projection of all the points from the test set into the latent space and their class. In this figure, we display the 2D latent space onto the graph. We then map out the classes of these generated examples and color them accordingly, as per the legend on the right. Here we can see that the classes tend to be neatly grouped together, which tells us that this is a good representation. A color version is available in the GitHub repository for this book.

We can also compute the values at fixed increments of a latent space grid to take a look at the generated output. For example, going from 0.05 to 0.95 in 0.15 linear increments across both dimensions gives us the visualization in figure 2.7. Remember that we’re using a bivariate Gaussian in this case, giving us two axes to iterate over. Again, for the code for this visualization, look at the full Jupyter/Google Colab notebook.
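That said, a minimal sketch of the grid sweep might look like the following (our own illustration; depending on the notebook version, the grid values may first be mapped through the Gaussian inverse CDF so they cover the latent distribution more evenly):

import numpy as np
import matplotlib.pyplot as plt

grid = np.arange(0.05, 0.95 + 1e-9, 0.15)            # 0.05, 0.20, ..., 0.95
figure = np.zeros((28 * len(grid), 28 * len(grid)))

for i, yi in enumerate(grid):
    for j, xi in enumerate(grid):
        z_sample = np.array([[xi, yi]])               # one point in the 2D latent space
        digit = decoder.predict(z_sample).reshape(28, 28)
        figure[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = digit

plt.imshow(figure, cmap='Greys_r')
plt.show()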
Figure 2.7. We map out the values of a subset of the latent space on a grid and pass each of those latent space values through the decoder to produce this figure. This gives us a sense of how much the resulting picture changes as we vary z.

It would seem that the book could almost stop at this point. After all, we have successfully generated images of MNIST, and that will be our test case for several examples. So before you call it quits, let us explain our motivation for the chapters to come.
To appreciate the challenges, imagine that we have a simple one-dimensional bimodal distribution—as depicted in figure 2.8. (As before, just think of it as a simple mathematical function that is bounded between 0 and 1 and that represents the probability at any given point. The higher the value of the function, the more points we sampled at that exact point before.)
Figure 2.8. Maximum likelihood, point estimates, and true distributions. The gray (theoretical) distribution is bimodal rather than having a single mode. But because we have assumed a single mode, our model is catastrophically wrong. Alternatively, we can get mode collapse, which is worth keeping in mind for chapter 5. This is especially true when we are using flavors of the KL divergence, such as in the VAE or early GANs.

Suppose we draw a bunch of samples from this true distribution, but we do not know the underlying model. We are now trying to infer what distribution generated these samples, but for some reason we assume that the true distribution is a simple Gaussian and we just need to estimate its mean and variance. But because we did not specify the model correctly (in this case, we put in wrong assumptions about the modality of these samples), we get into loads of trouble. For example, if we apply a traditional statistical technique called maximum likelihood estimation to estimate this distribution as unimodal—in some ways, that is what the VAE is trying to do—we get out the wrong estimate. Because we have misspecified the model,[9] it will estimate a normal distribution around the average of the two distributions—called the point estimate. Maximum likelihood is a technique that does not know and cannot figure out that there are two distinct distributions. So to minimize the error, it creates a “fat-tailed” normal around the point estimate. Here, it can seem trivial, but always remember, we are trying to specify models in very high-dimensional spaces, which is not easy!
9 See Pattern Recognition and Machine Learning, by Christopher Bishop (Springer, 2011).
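A tiny NumPy experiment makes the problem concrete (the two modes at 0 and 5 match the definition that follows; everything else here is our own illustration):

import numpy as np

rng = np.random.RandomState(0)
# Bimodal "true" distribution: an even mixture of two normals with means 0 and 5
samples = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])

# Maximum likelihood fit of a *single* Gaussian is just the sample mean and std
mu_hat, sigma_hat = samples.mean(), samples.std()
print(mu_hat, sigma_hat)   # roughly 2.5 and 2.7: a fat normal centered between the two modes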
Definition
Bimodal means having two peaks, or modes. This notion will be useful in chapter 5. In this case, we made the overall distribution a composition of two normals with means of 0 and 5.
Interestingly, the point estimate will also be wrong and can even lie in an area where there is no actual data sampled from the true distribution. When you look at the samples (black crosses), no real samples occur where we have estimated our mean. This is, again, quite troubling. To tie it back to the autoencoder, see how in figure 2.6 we learned 2D normal distributions in the latent space centered around the origin? But what if we had thrown images of celebrity faces into the training data? We would no longer have an easy center to estimate, because the two data distributions would have more modalities than we thought we would have. As a result, even around the center of the distribution, the VAE could produce odd hybrids of the two datasets, because the VAE would try to somehow separate the two datasets.
So far, we have discussed only the hypothetical impact of a statistical mistake. To connect this aspect all the way to autoencoder-generated images, we should think about what our Gaussian latent spaces allow us to do. The VAE uses the Gaussian as a way to build representations of the data it sees. But because Gaussians have 99.7% of their probability mass within three standard deviations of the middle, the VAE will also go for the safe middle. Because VAEs are, in a way, trying to come up directly with the underlying model based on Gaussians, but the reality can be pretty complex, VAEs do not scale up as well as GANs, which can pick up “scenarios.”
You can see what happens when your VAE goes for the “safe middle” in figure 2.9. On the CelebA dataset, which features aligned and cropped celebrity faces, the VAE models the consistently present facial features well, such as eyes or mouth, but makes mistakes in the background.
Figure 2.9. In these images of fake celebrity faces generated by a VAE, the edges are quite blurry and blend into the background. This is because the CelebA dataset has centered and aligned images with consistent features around eyes and mouth, but the backgrounds tend to vary. The VAE picks the safe path and makes the background blurry by choosing a “safe” pixel value, which minimizes the loss, but does not provide good images.

(Source: VAE-TensorFlow by Zhenliang He, GitHub, https://github.com/LynnHo/VAE-Tensorflow.)
On the other hand, GANs have an implicit and hard-to-analyze understanding of the real data distribution. As you will discover in chapter 5, VAEs live in the directly estimated maximum likelihood model family.
This section hopefully made you comfortable with thinking about the distributions of the target data and how these distributional complications manifest themselves in our training process. We will look into these assumptions much more in chapter 10, where the model has assumed how to fill in the distributions, and that becomes a problem that adversarial examples will be able to exploit to make our machine learning models fail.
- Autoencoders on a high level are composed of an encoder, a latent space, and a decoder. An autoencoder is trained by using a common objective function that measures the distance between the reproduced and original data.
- Autoencoders have many applications and can also be used as a generative model. In practice, this tends not to be their primary use because other methods, especially GANs, are better at the generative task.
- We can use Keras (a high-level API for TensorFlow) to write a simple variational autoencoder that produces handwritten digits.
- VAEs have limitations that motivate us to move on to GANs.