This chapter covers
- What generative deep learning is, its applications, and how it differs from the deep-learning tasks we’ve seen so far
- How to generate text using an RNN
- What latent space is and how it can form the basis of generating novel images, through the example of variational autoencoders
- The basics of generative adversarial networks
Some of the most impressive tasks demonstrated by deep neural networks have involved generating images, sounds, and text that look or sound real. Nowadays, deep neural networks are capable of creating highly realistic human face images,[1] synthesizing natural-sounding speech,[2] and composing compellingly coherent text,[3] just to name a few achievements. Such generative models are useful for a number of reasons, including aiding artistic creation, conditionally modifying existing content, and augmenting existing datasets to support other deep-learning tasks.[4]
1Tero Karras, Samuli Laine, and Timo Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks,” submitted 12 Dec. 2018, https://arxiv.org/abs/1812.04948. See a live demo at https://thispersondoesnotexist.com/.
2Aäron van den Oord and Sander Dieleman, “WaveNet: A Generative Model for Raw Audio,” blog, 8 Sept. 2016, http://mng.bz/MOrn.
3“Better Language Models and Their Implications,” OpenAI, 2019, https://openai.com/blog/better-language-models/.
4Antreas Antoniou, Amos Storkey, and Harrison Edwards, "Data Augmentation Generative Adversarial Networks," submitted 12 Nov. 2017, https://arxiv.org/abs/1711.04340.
Apart from practical applications such as putting makeup on the selfie of a potential cosmetics customer, generative models are also worth studying for theoretical reasons. Generative and discriminative models are two fundamentally different types of models in machine learning. All the models we've studied in this book so far are discriminative models. Such models are designed to map an input into a discrete or continuous value without caring about the process through which the input is generated. Recall the classifiers for phishing websites, iris flowers, MNIST digits, and speech sounds, as well as the regressors for housing prices, that we've built. By contrast, generative models are designed to mathematically mimic the process through which the examples of different classes are generated. But once a generative model has learned this generative knowledge, it can perform discriminative tasks as well. So generative models can be said to "understand" the data better compared to discriminative models.
This section covers the foundations of deep generative models for text and images. By the end of the chapter, you should be familiar with the ideas behind RNN-based language models, image-oriented autoencoders, and generative adversarial networks. You should also be familiar with the pattern in which such models are implemented in TensorFlow.js and be capable of applying these models to your own data sets.
Let's start from text generation. To do that, we will use RNNs, which we introduced in the previous chapter. Although the technique you'll see here generates text, it is not limited to this particular output domain. The technique can be adapted to generate other types of sequences, such as music, given the ability to represent musical notes in a suitable way and find an adequate training data set.[5] Similar ideas can be applied to generating pen strokes in sketching, so that nice-looking sketches[6] or even realistic-looking Kanji[7] can be generated.
5For example, see Performance-RNN from Google’s Magenta Project: https://magenta.tensorflow.org/performance-rnn.
6For example, see Sketch-RNN by David Ha and Douglas Eck: http://mng.bz/omyv.
7David Ha, "Recurrent Net Dreams Up Fake Chinese Characters in Vector Format with TensorFlow," blog, 28 Dec. 2015, http://mng.bz/nvX4.
First, let's define the text-generation task. Suppose we have a corpus of text data of a decent size (at least a few megabytes) as the training input, such as the complete works of Shakespeare (a very long string). We want to train a model to generate new texts that look like the training data as much as possible. The key phrase here is, of course, "look like." For now, let's be content with not precisely defining what "look like" means. The meaning will become clearer after we show the method and the results.
Let's think about how to formulate this task in the paradigm of deep learning. In the date-conversion example covered in the previous chapter, we saw how a precisely formatted output sequence can be generated from a casually formatted input one. That text-to-text conversion task had a well-defined answer: the correct date string in the ISO-8601 format. However, the text-generation task here doesn't seem to fit this bill. There is no explicit input sequence, and the "correct" output is not well defined; we just want to generate something that "looks real." What can we do?
A solution is to build a model to predict what character will come after a sequence of characters. This is called next-character prediction. For instance, a well-trained model on the Shakespeare data set should predict the character "u" with a high probability when given the character string "Love looks not with the eyes, b" as the input. However, that generates only one character. How do we use the model to generate a sequence of characters? To do that, we simply form a new input sequence of the same length as before by shifting the previous input to the left by one character, discarding the first character, and sticking the newly generated character ("u") at the end. This gives us a new input for our next-character predictor, namely, "ove looks not with the eyes, bu" in this case. Given this new input sequence, the model should predict the character "t" with a high probability. This process, which is illustrated in figure 10.1, can be repeated as many times as necessary to generate a sequence as long as desired. Of course, we need an initial snippet of text as the starting point. For that, we can just sample randomly from the text corpus.
Figure 10.1. A schematic illustration of how an RNN-based next-character predictor can be used to generate a sequence of text from an initial input snippet of text as the seed. At each step, the RNN predicts the next character using the input text. Then, the predicted next character is appended to the end of the input text and the first character is discarded. The result forms the input for the next step. At each step, the RNN outputs the probability scores for all possible characters in the character set. To determine the actual next character, random sampling is carried out.

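The sliding-window generation loop described above can be sketched in a few lines of plain JavaScript. Here, `predictNextChar` is a hypothetical stand-in for the trained model's predict-and-sample step:

```javascript
// A minimal sketch of the sliding-window text-generation loop.
// `predictNextChar` is a hypothetical function standing in for the trained
// model: it takes the current input window and returns one character.
function generateText(seed, numChars, predictNextChar) {
  let window = seed;      // the fixed-length input window
  let generated = '';
  for (let i = 0; i < numChars; ++i) {
    const nextChar = predictNextChar(window);
    generated += nextChar;
    // Discard the first character and append the new one,
    // keeping the window the same length as before.
    window = window.slice(1) + nextChar;
  }
  return generated;
}
```

The window length stays constant throughout, which is what lets the same fixed-input-shape model be applied at every step.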
This formulation turns the sequence-generation task into a sequence-based classification problem. This problem is similar to what we saw in the IMDb sentiment-analysis problem in chapter 9, in which a binary class was predicted from an input of a fixed length. The model for text generation does essentially the same thing, although it is a multiclass-classification problem involving N possible classes, where N is the size of the character set, namely, the number of all unique characters in the text data set.
This next-character-prediction formulation has a long history in natural language processing and computer science. Claude Shannon, the pioneer of information theory, conducted an experiment in which human participants were asked to guess the next letter after seeing a short snippet of English text.[8] Through this experiment, he was able to estimate the average amount of uncertainty in every letter of typical English texts, given the context. This uncertainty, which turned out to be about 1.3 bits of entropy, tells us the average amount of information carried by every letter in English.
8The original 1951 paper is accessible at http://mng.bz/5AzB.
The 1.3-bits result is less than the number of bits needed if the 26 letters appeared in a completely random fashion, which would be log2(26) = 4.7 bits. This matches our intuition because we know letters don't appear randomly in English. Instead, they follow patterns. At a lower level, only certain sequences of letters are valid English words. At a higher level, only certain orderings of words satisfy English grammar. At an even higher level, only a subset of grammatically valid sentences actually make real sense.
If you think about it, this is what our text-generation task is fundamentally about: learning these patterns on all these levels. Realize that our model is essentially trained to do what Shannon's subjects did, that is, guess the next character. Let's now take a look at the example code and how it works. Keep Shannon's result of 1.3 bits in mind because we'll come back to it later.
The lstm-text-generation example in the tfjs-examples repository involves training an LSTM-based next-character predictor and using it to generate new text. The training and generation steps both happen in JavaScript using TensorFlow.js. You can run the example either in the browser or in the backend environment with Node.js. While the former approach provides a more visual and interactive interface, the latter gives you faster training speed.
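The 4.7-bit figure is easy to check with a few lines of plain JavaScript that compute the entropy of a discrete distribution:

```javascript
// Entropy of a discrete probability distribution, in bits.
function entropyBits(probs) {
  return probs.reduce((h, p) => (p > 0 ? h - p * Math.log2(p) : h), 0);
}

// 26 equally likely letters carry log2(26) ≈ 4.70 bits each ...
const uniform26 = new Array(26).fill(1 / 26);
console.log(entropyBits(uniform26).toFixed(2));  // → "4.70"

// ... while a skewed distribution carries less information per symbol.
console.log(entropyBits([0.9, 0.05, 0.05]) < entropyBits([1 / 3, 1 / 3, 1 / 3]));  // → true
```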
git clone https://github.com/tensorflow/tfjs-examples.git
cd tfjs-examples/lstm-text-generation
yarn && yarn watch
In the page that pops up, you can select and load one of four provided text datasets to train the model on. We will use the Shakespeare data set in the following discussion. Once the data is loaded, you can create a model for it by clicking the Create Model button. A text box allows you to adjust the number of units that the created LSTM will have. It is set to 128 by default. But you can experiment with other values, such as 64. If you enter multiple numbers separated by commas (for example, 128,128), the model created will contain multiple LSTM layers stacked on top of each other.
To perform training on the backend using tfjs-node or tfjs-node-gpu, use the command yarn train instead of yarn watch:
yarn train shakespeare \
    --lstmLayerSize 128,128 \
    --epochs 120 \
    --savePath ./my-shakespeare-model
If you have a CUDA-enabled GPU set up properly, you can add the --gpu flag to the command to let the training happen on your GPU, which will further increase the training speed. The flag --lstmLayerSize plays the same role as the LSTM-size text box in the browser version of the example. The previous command will create and train a model consisting of two LSTM layers, both with 128 units, stacked on top of each other.
The model being trained here has a stacked-LSTM architecture. What does stacking LSTM layers mean? It is conceptually similar to stacking multiple dense layers in an MLP, which increases the MLP's capacity. In a similar fashion, stacking multiple LSTMs allows an input sequence to go through multiple stages of seq2seq representational transformation before being converted into a final regression or classification output by the final LSTM layer. Figure 10.2 gives a schematic illustration of this architecture. One important thing to notice is the fact that the first LSTM has its returnSequences property set to true and hence generates a sequence of outputs that includes the output for every single item of the input sequence. This makes it possible to feed the output of the first LSTM into the second one, as an LSTM layer expects a sequential input instead of a single-item input.
Figure 10.2. How stacking multiple LSTM layers works in a model. In this case, two LSTM layers are stacked together. The first one has its returnSequences property set to true and hence outputs a sequence of items. The sequential output of the first LSTM is received by the second LSTM as its input. The second LSTM outputs a single item instead of a sequence of items. The single item could be a regression prediction or an array of softmax probabilities, which forms the final output of the model.

Listing 10.1 contains the code that builds next-character prediction models with the architecture shown in figure 10.2 (excerpted from lstm-text-generation/model.js). Notice that unlike the diagram, the code includes a dense layer as the model's final output. The dense layer has a softmax activation. Recall that the softmax activation normalizes the outputs so that they have values between 0 and 1 and sum to 1, like a probability distribution. So, the final dense layer's output represents the predicted probabilities of the unique characters.
The lstmLayerSizes argument of the createModel() function controls the number of LSTM layers and the size of each. The first LSTM layer has its input shape configured based on sampleLen (how many characters the model takes at a time) and charSetSize (how many unique characters there are in the text data). For the browser-based example, sampleLen is hard-coded to 40; for the Node.js-based training script, it is adjustable via the --sampleLen flag. charSetSize has a value of 71 for the Shakespeare data set. The character set includes the upper- and lowercase English letters, punctuation, the space, the line break, and several other special characters. Given these parameters, the model created by the function in listing 10.1 has an input shape of [40, 71] (ignoring the batch dimension). This shape corresponds to 40 one-hot-encoded characters. The model's output shape is [71] (again, ignoring the batch dimension), which holds the softmax probability values for the 71 possible choices of the next character.
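To make the [sampleLen, charSetSize] input shape concrete, here is a plain-JavaScript sketch of one-hot encoding a snippet against a tiny, hypothetical three-character set (the real Shakespeare example uses 71 characters and 40-character windows):

```javascript
// One-hot encode a text snippet into a [snippet.length, charSet.length] array,
// the nested-array analog of the tensor fed to the model.
function encodeSnippet(snippet, charSet) {
  return Array.from(snippet, ch => {
    const row = new Array(charSet.length).fill(0);
    row[charSet.indexOf(ch)] = 1;
    return row;
  });
}

const charSet = ['a', 'b', 'c'];       // hypothetical tiny character set
const x = encodeSnippet('abca', charSet);
// x has shape [4, 3]: one length-3 one-hot row per character.
// x[0] is [1, 0, 0] ('a'), x[1] is [0, 1, 0] ('b'), and so on.
```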
Listing 10.1. Building a multilayer LSTM model for next-character prediction
export function createModel(sampleLen,          #1
                            charSetSize,        #2
                            lstmLayerSizes) {   #3
  if (!Array.isArray(lstmLayerSizes)) {
    lstmLayerSizes = [lstmLayerSizes];
  }

  const model = tf.sequential();
  for (let i = 0; i < lstmLayerSizes.length; ++i) {
    const lstmLayerSize = lstmLayerSizes[i];
    model.add(tf.layers.lstm({                  #4
      units: lstmLayerSize,
      returnSequences: i < lstmLayerSizes.length - 1,              #5
      inputShape: i === 0 ? [sampleLen, charSetSize] : undefined   #6
    }));
  }
  model.add(
      tf.layers.dense({
        units: charSetSize,
        activation: 'softmax'
      }));                                      #7
  return model;
}
To prepare the model for training, we compile it with the categorical cross-entropy loss, as the model is essentially a 71-way classifier. For the optimizer, we use RMSProp, which is a popular choice for recurrent models:
const optimizer = tf.train.rmsprop(learningRate);
model.compile({optimizer: optimizer, loss: 'categoricalCrossentropy'});
The data that goes into the model's training consists of pairs of input text snippets and the characters that follow each of them, all encoded as one-hot vectors (see figure 10.1). The class TextData defined in lstm-text-generation/data.js contains the logic to generate such tensor data from the training text corpus. The code there is somewhat tedious, but the idea is simple: randomly sample snippets of a fixed length from the very long string that is our text corpus, and convert them into one-hot tensor representations.
If you are using the web-based demo, the Model Training section of the page allows you to adjust hyperparameters such as the number of training epochs, the number of examples that go into each epoch, the learning rate, and so forth. Click the Train Model button to kick off the model-training process. For Node.js-based training, these hyperparameters are adjustable through the command-line flags. For details, you can get help messages by entering the yarn train --help command.
Depending on the number of training epochs you specified and the size of the model, the training should take anywhere between a few minutes and a couple of hours. The Node.js-based training job automatically prints a number of sample text snippets generated by the model after every training epoch (see table 10.1). As the training progresses, you should see the loss value go down continuously from the initial value of approximately 3.2 and converge in the range of 1.4–1.5. As the loss decreases after about 120 epochs, the quality of the generated text should improve, such that toward the end of the training, the text should look somewhat Shakespearean, and the validation loss should approach the neighborhood of 1.5, not too far from the 1.3 bits/character information uncertainty from Shannon's experiment. But note that given our training paradigm and model capacity, the generated text will never look like actual Shakespeare's writing.
Table 10.1. Samples of text generated by the LSTM-based next-character prediction model. The generation is based on the seed text. Initial seed text: " in hourly synod about thy particular prosperity, and lo".[a] Actual text that follows the seed text (for comparison): "ve thee no worse than thy old father Menenius does! ...".
aFrom Shakespeare's Coriolanus, act 5, scene 2. Note that the sample includes line breaks and stops in the middle of a word (love).
Table 10.1 shows some texts sampled under four different temperature values, a parameter that controls the randomness of the generated text. In the samples of generated text, you may have noticed that lower temperature values are associated with more repetitive and mechanical-looking text, while higher values are associated with less-predictable text. The highest temperature value demonstrated by the Node.js-based training script is 0.75 by default, and it sometimes leads to character sequences that look like English but are not actually English words (such as "startter" and "nopsi" in the samples in the table). In the next section, we'll examine how temperature works and why it is called temperature.
The function sample() in listing 10.2 is responsible for determining which character will be chosen based on the model's output probabilities at each step of the text-generation process. As you can see, the algorithm is somewhat complex: it involves calls to three low-level TensorFlow.js operations: tf.div(), tf.log(), and tf.multinomial(). Why do we use this complicated algorithm instead of simply picking the choice with the highest probability score, which would take a single argMax() call?
If we did that, the output of the text-generation process would be deterministic. That is, it would give you exactly the same output if you ran it multiple times. The deep neural networks we've seen so far are all deterministic, in the sense that given an input tensor, the output tensor is completely determined by the network's topology and the values of its weights. If so desired, you can write a unit test to assert its output value (see chapter 12 for a discussion of testing machine-learning algorithms). This determinism is not ideal for our text-generation task. After all, writing is a creative process. It is much more interesting to have some randomness in the generated text, even when the same seed text is given. This is what the tf.multinomial() operation and the temperature parameter are useful for. tf.multinomial() is the source of randomness, while temperature controls the degree of randomness.
Listing 10.2. The stochastic sampling function, with a temperature parameter
export function sample(probs, temperature) {
  return tf.tidy(() => {
    const logPreds = tf.div(
        tf.log(probs),                 #1
        Math.max(temperature, 1e-6));  #2
    const isNormalized = false;
    return tf.multinomial(logPreds, 1, null, isNormalized).dataSync()[0];   #3
  });
}
The most important part of the sample() function in listing 10.2 is the following line:
const logPreds = tf.div(tf.log(probs), Math.max(temperature, 1e-6));
It takes the probs (the probability outputs from the model) and converts them into logPreds, the logarithms of the probabilities scaled by a factor. What do the logarithm operation (tf.log()) and the scaling (tf.div()) do? We'll explain that through an example. For the sake of simplicity, let's assume there are only three choices (three characters in our character set). Suppose our next-character predictor yields the following three probability scores given a certain input sequence:
[0.1, 0.7, 0.2]
Let's see how two different temperature values alter these probabilities. First, let's look at a relatively low temperature: 0.25. The scaled logits are
log([0.1, 0.7, 0.2]) / 0.25 = [-9.2103, -1.4267, -6.4378]
To understand what the logits mean, we convert them back to actual probability scores by using the softmax equation, which involves taking the exponential of the logits and normalizing them:
exp([-9.2103, -1.4267, -6.4378]) / sum(exp([-9.2103, -1.4267, -6.4378])) = [0.0004, 0.9930, 0.0066]
As you can see, our logits from temperature = 0.25 correspond to a highly concentrated probability distribution in which the second choice has a much higher probability compared to the other two choices (see the second panel in figure 10.3).
Figure 10.3. The probability scores after scaling by different values of temperature (T). A lower value of T leads to a more concentrated (less stochastic) distribution; a higher value of T causes the distribution to be more equal among the classes (more stochastic). A T-value of 1 corresponds to the original probabilities (no change). Note that the relative ranking of the three choices is always preserved regardless of the value of T.

What if we use a higher temperature, say 0.75? By repeating the same calculation, we get
log([0.1, 0.7, 0.2]) / 0.75 = [-3.0701, -0.4756, -2.1459]
exp([-3.0701, -0.4756, -2.1459]) / sum(exp([-3.0701, -0.4756, -2.1459])) = [0.0591, 0.7919, 0.1490]
This is a much less "peaked" distribution compared to the one from before, when the temperature was 0.25 (see the fourth panel in figure 10.3). But it is still more peaked compared to the original distribution. As you might have realized, a temperature of 1 will give you exactly the original probabilities (figure 10.3, fifth panel). A temperature higher than 1 leads to a more "equalized" probability distribution among the choices (figure 10.3, sixth panel), while the ranking among the choices always remains the same.
These converted probabilities (or rather, the logarithms of them) are then fed to the tf.multinomial() function, which acts like a multifaceted die, with unequal probabilities of the faces controlled by the input argument. This gives us the final choice of the next character.
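The whole temperature calculation is easy to reproduce in plain JavaScript. The sketch below mirrors the tf.div(tf.log(probs), temperature) scaling in listing 10.2, followed by the softmax normalization that tf.multinomial() effectively applies to unnormalized logits:

```javascript
// Scale a probability distribution by a temperature:
// take logs, divide by the temperature, then apply a softmax.
function scaleByTemperature(probs, temperature) {
  const logits = probs.map(p => Math.log(p) / Math.max(temperature, 1e-6));
  const exps = logits.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

scaleByTemperature([0.1, 0.7, 0.2], 0.25);
// → ≈ [0.0004, 0.9930, 0.0066]: a low temperature concentrates the distribution
scaleByTemperature([0.1, 0.7, 0.2], 1.0);
// → ≈ [0.1, 0.7, 0.2]: a temperature of 1 leaves the distribution unchanged
```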
So, this is how the temperature parameter controls the randomness of the generated text. The term temperature has its origin in thermodynamics, from which we know that a system with a higher temperature has a higher degree of chaos inside it. The analogy is appropriate here because when we increase the temperature value in our code, we get more chaotic-looking text. There is a "sweet medium" for the temperature value. Below it, the generated text looks too repetitive and mechanical; above it, the text looks too unpredictable and wacky.
This concludes our tour of the text-generating LSTM. Note that this methodology is very general and is applicable to many other sequences with proper modifications. For instance, if trained on a sufficiently large data set of musical scores, an LSTM can be used to compose music by iteratively predicting the next musical note from the ones that came before it.[9]
9Allen Huang and Raymond Wu, "Deep Learning for Music," submitted 15 June 2016, https://arxiv.org/abs/1606.04930.
10.2. Variational autoencoders: Finding an efficient and structured vector representation of images
The previous section gave you a quick tour of how deep learning can be used to generate sequential data such as text. In the remaining parts of this chapter, we will look at how to build neural networks to generate images. We will examine two types of models: the variational autoencoder (VAE) and the generative adversarial network (GAN). Compared to a GAN, the VAE has a longer history and is structurally simpler. So, it forms a good on-ramp for you to get into the fast-moving world of deep-learning-based image generation.
Figure 10.4 shows the overall architecture of an autoencoder schematically. At first glance, an autoencoder is a funny model because its input and output are images of the same size. At the most basic level, the loss function of an autoencoder is the MSE between the input and output. This means that, if trained properly, an autoencoder will take an image and output an essentially identical image. What on earth would a model like that be useful for?
In fact, autoencoders are an important type of generative model and are far from useless. The answer to the prior question lies in the hourglass-shaped architecture (figure 10.4). The thinnest, middle part of an autoencoder is a vector with a much smaller number of elements compared to the input and output images. Hence, the image-to-image transformation performed by an autoencoder is nontrivial: it first turns the input image into a highly compressed representation and then reconstructs the image from that representation without using any additional information. The efficient representation at the middle is referred to as the latent vector, or the z-vector. We will use these two terms interchangeably. The vector space in which these vectors reside is called the latent space, or the z-space. The part of the autoencoder that converts the input image to the latent vector can be called the encoder; the later part that converts the latent vector back to an image is called the decoder.
The latent vector can be hundreds of times smaller compared to the image itself, as we'll show through a concrete example shortly. Therefore, the encoder portion of a trained autoencoder is a remarkably efficient dimensionality reducer. Its summarization of the input image is highly succinct but contains enough essential information to allow the decoder to reproduce the input image faithfully without using any extra bits of information. The fact that the decoder can do that is also remarkable.
We can also look at an autoencoder from an information-theory point of view. Let's say the input and output images each contain N bits of information. Naively, N is the number of pixels multiplied by the bit depth of each pixel. By contrast, the latent vector in the middle of the autoencoder can hold only a very small amount of information because of its small size (say, m bits). If m were smaller than N, it would be theoretically impossible to reconstruct the image from the latent vector. However, pixels in images are not completely random (an image made of completely random pixels looks like static noise). Instead, the pixels follow certain patterns, such as color continuity and characteristics of the type of real-world objects being depicted. This causes the value of N to be much smaller than the naive calculation based on the number and depth of the pixels. It is the autoencoder's job to learn this pattern; this is also the reason why autoencoders can work.
After an autoencoder is trained, its decoder part can be used without the encoder. Given any latent vector, it can generate an image that conforms to the patterns and styles of the training images. This fits the description of a generative model nicely. Furthermore, the latent space will hopefully contain some nice, interpretable structure. In particular, each dimension of the latent space may be associated with a meaningful aspect of the image. For instance, suppose we've trained an autoencoder on images of human faces; perhaps one of the latent space's dimensions will be associated with the degree of smiling. When you fix the values in all other dimensions of a latent vector and vary only the value on the "smile dimension," the images produced by the decoder will be exactly the same face but with varying degrees of smiling (see, for example, figure 10.5). This will enable interesting applications, such as changing the degree of smiling of an input face image while leaving all other aspects unchanged. This can be done through the following steps. First, obtain the latent vector of the input by applying the encoder. Then, modify only the "smile dimension" of the vector; finally, run the modified latent vector through the decoder.
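As a back-of-the-envelope check of how drastic this compression is, consider the example coming up in this section, which uses a 2D latent space over 28 × 28 grayscale images (the numbers below are taken from that example):

```javascript
// Each Fashion-MNIST image consists of 28 × 28 grayscale values.
const imageElements = 28 * 28;     // 784 numbers per image
// The latent vector in the upcoming example has just 2 elements.
const latentElements = 2;
const compressionRatio = imageElements / latentElements;
console.log(compressionRatio);     // → 392
```

The decoder reconstructs 784 values from just 2, which is only possible because the images follow strong patterns rather than being random pixels.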
Figure 10.5. The “smile dimension.” An example of desired structure in latent spaces learned by autoencoders.

Unfortunately, classical autoencoders of the architecture shown in figure 10.4 don't lead to particularly useful or nicely structured latent spaces. They are not very good at compression, either. For these reasons, they had largely fallen out of fashion by 2013. VAEs, discovered almost simultaneously by Diederik Kingma and Max Welling in December 2013[10] and Danilo Rezende, Shakir Mohamed, and Daan Wierstra in January 2014,[11] augment autoencoders with a little bit of statistical magic, which forces the models to learn continuous and highly structured latent spaces. VAEs have turned out to be a powerful type of generative image model.
10Diederik P. Kingma and Max Welling, "Auto-Encoding Variational Bayes," submitted 20 Dec. 2013, https://arxiv.org/abs/1312.6114.
11Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra, "Stochastic Backpropagation and Approximate Inference in Deep Generative Models," submitted 16 Jan. 2014, https://arxiv.org/abs/1401.4082.
A VAE, instead of compressing its input image into a fixed vector in the latent space, turns the image into the parameters of a statistical distribution: specifically, those of a Gaussian distribution. As you may recall from high school math, a Gaussian distribution has two parameters: the mean and the variance (or, equivalently, the standard deviation). A VAE maps every input image into a mean and a variance. The only additional complexity is that the mean and the variance can be higher than one-dimensional if the latent space is more than 1D, as we'll see in the following example. Essentially, we are assuming that the images are generated via a stochastic process and that the randomness of this process should be taken into account during encoding and decoding. The VAE then uses the mean and variance parameters to randomly sample one vector from the distribution and decode that element back to the size of the original input (see figure 10.6). This stochasticity is one of the key ways in which the VAE improves robustness and forces the latent space to encode meaningful representations everywhere: every point sampled in the latent space should be a valid image output when decoded by the decoder.
Figure 10.6. Comparing how a classical autoencoder (panel A) and a VAE (panel B) work. A classical autoencoder maps an input image to a fixed latent vector and performs decoding using that vector. By contrast, a VAE maps an input image to a distribution, described by a mean and a variance, draws a random latent vector from this distribution, and generates the decoded image using that random vector. The T-shirt image is an example from the Fashion-MNIST dataset.

Next, we will show you a VAE in action by using the Fashion-MNIST dataset. As its name indicates, Fashion-MNIST[12] is inspired by the MNIST hand-written digit data set, but it contains images of clothing and fashion items. Like the MNIST images, the Fashion-MNIST images are 28 × 28 grayscale images. There are exactly 10 classes of clothing and fashion items (such as T-shirt, pullover, shoe, and bag; see figure 10.6 for an example). However, the Fashion-MNIST dataset is slightly "harder" for machine-learning algorithms compared to the MNIST data set, with the current state-of-the-art test-set accuracy standing at approximately 96.5%, much lower compared to the 99.75% state-of-the-art accuracy on the MNIST data set.[13] We will use TensorFlow.js to build a VAE and train it on the Fashion-MNIST dataset. We'll then use the decoder of the VAE to sample from the 2D latent space and observe the structure inside that space.
12Han Xiao, Kashif Rasul, and Roland Vollgraf, "Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms," submitted 25 Aug. 2017, https://arxiv.org/abs/1708.07747.
13Source: "State-of-the-Art Result for All Machine Learning Problems," GitHub, 2019, http://mng.bz/6w0o.
To check out the fashion-mnist-vae example, use the following commands:
git clone https://github.com/tensorflow/tfjs-examples.git
cd tfjs-examples/fashion-mnist-vae
yarn
yarn download-data
This example consists of two parts: training the VAE in Node.js and using the VAE decoder to generate images in the browser. To start the training part, use
yarn train
If you have a CUDA-enabled GPU set up properly, you can use the --gpu flag to get a boost in the training speed:
yarn train --gpu
The training should take about five minutes on a reasonably up-to-date desktop equipped with a CUDA GPU, and under an hour without the GPU. Once the training is complete, use the following command to build and launch the browser frontend:
yarn watch
The frontend will load the VAE's decoder, generate a number of images by using a 2D grid of regularly spaced latent vectors, and display the images on the page. This will give you an appreciation of the structure of the latent space.
In technical terms, here is how a VAE works:
- The encoder turns the input samples into two parameters in a latent space: zMean and zLogVar, the mean and the logarithm of the variance (log variance), respectively. Each of the two vectors has the same length as the dimensionality of the latent space.[14] For example, our latent space will be 2D, so zMean and zLogVar will each be a length-2 vector. Why do we use log variance (zLogVar) instead of the variance itself? Because variances are by definition required to be nonnegative, but there is no easy way to enforce that sign requirement on a layer's output. By contrast, log variance is allowed to have any sign. By using the logarithm, we don't have to worry about the sign of the layers' outputs. Log variance can be easily converted to the corresponding variance through a simple exponentiation (tf.exp()) operation.
14Strictly speaking, the covariance matrix of the length-N latent vector is an N × N matrix. However, zLogVar is a length-N vector because we constrain the covariance matrix to be diagonal—that is, there is no correlation between two different elements of the latent vector.
- The VAE algorithm randomly samples a latent vector from the latent normal distribution by using a vector called epsilon—a random vector of the same length as zMean and zLogVar. In simple math equations, this step, which is referred to as reparameterization in the literature, looks like
z = zMean + exp(zLogVar * 0.5) * epsilon
- The multiplication by 0.5 converts the variance to the standard deviation, which is based on the fact that the standard deviation is the square root of the variance. The equivalent JavaScript code is as follows (see listing 10.3). Then, z will be fed to the decoder portion of the VAE so that an output image can be generated.
z = zMean.add(zLogVar.mul(0.5).exp().mul(epsilon));
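To make the arithmetic concrete, here is a minimal plain-JavaScript sketch of the reparameterization step, using Math.* instead of TensorFlow.js tensor ops. The helper names (randn, sampleZ) are hypothetical, not part of the example's code; randn is a Box-Muller standard-normal sampler standing in for tf.randomNormal.

```javascript
// Standard-normal sampler (Box-Muller transform); stands in for tf.randomNormal.
function randn() {
  const u = 1 - Math.random();  // in (0, 1], safe for Math.log
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// z = zMean + exp(0.5 * zLogVar) * epsilon, element by element.
function sampleZ(zMean, zLogVar) {
  return zMean.map((m, i) => m + Math.exp(0.5 * zLogVar[i]) * randn());
}
```

Note that when zLogVar is very negative (variance near zero), the sampled z collapses onto zMean, which matches the intuition that the distribution degenerates to a point.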
In our implementation of VAE, the latent-vector-sampling step is performed by a custom layer called ZLayer (listing 10.3). We briefly saw a custom TensorFlow.js layer in chapter 9 (the GetLastTimestepLayer layer that we used in the attention-based date converter). The custom layer used by our VAE is slightly more complex and deserves some explanation.
The ZLayer class has two key methods: computeOutputShape() and call(). computeOutputShape() is used by TensorFlow.js to infer the output shape of the Layer instance given the shape(s) of the input. The call() method contains the actual math. It contains the equation line introduced previously. The following code is excerpted from fashion-mnist-vae/model.js.
Listing 10.3. Sampling from the latent space (z-space) with a custom layer
class ZLayer extends tf.layers.Layer {
  constructor(config) {
    super(config);
  }

  computeOutputShape(inputShape) {
    tf.util.assert(
        inputShape.length === 2 && Array.isArray(inputShape[0]),
        () => `Expected exactly 2 input shapes. ` +
              `But got: ${inputShape}`);                    #1
    return inputShape[0];                                   #2
  }

  call(inputs, kwargs) {
    const [zMean, zLogVar] = inputs;
    const batch = zMean.shape[0];
    const dim = zMean.shape[1];

    const mean = 0;
    const std = 1.0;
    const epsilon = tf.randomNormal(                        #3
        [batch, dim], mean, std);                           #3

    return zMean.add(                                       #4
        zLogVar.mul(0.5).exp().mul(epsilon));               #4
  }

  static get ClassName() {                                  #5
    return 'ZLayer';
  }
}
tf.serialization.registerClass(ZLayer);                     #6
As listing 10.4 shows, ZLayer is instantiated and used as a part of the encoder. The encoder is written as a functional model, instead of the simpler sequential model, because it has a nonlinear internal structure and produces three outputs: zMean, zLogVar, and z (see the schematic in figure 10.7). The encoder outputs z because it will be used by the decoder, but why does the encoder include zMean and zLogVar in the outputs? It's because they will be used to calculate the loss function of the VAE, as you will see shortly.
Figure 10.7. Schematic illustration of the TensorFlow.js implementation of VAE, including the internal details of the encoder and decoder parts and the custom loss function and optimizer that support VAE training.

In addition to ZLayer, the encoder consists of two one-hidden-layer MLPs. They are used to convert the flattened input Fashion-MNIST images into the zMean and zLogVar vectors, respectively. The two MLPs share the same hidden layer but use separate output layers. This branching model topology is also made possible by the fact that the encoder is a functional model.
Listing 10.4. The encoder part of our VAE (excerpt from fashion-mnist-vae/model.js)
function encoder(opts) {
  const {originalDim, intermediateDim, latentDim} = opts;

  const inputs = tf.input({shape: [originalDim], name: 'encoder_input'});
  const x = tf.layers.dense({units: intermediateDim, activation: 'relu'})
      .apply(inputs);                                                      #1
  const zMean = tf.layers.dense({units: latentDim, name: 'z_mean'}).apply(x);  #2
  const zLogVar = tf.layers.dense({                                        #2
    units: latentDim,                                                      #2
    name: 'z_log_var'                                                      #2
  }).apply(x);                                                             #2

  const z =                                                                #3
      new ZLayer({name: 'z', outputShape: [latentDim]}).apply([zMean,      #3
                                                               zLogVar]);  #3

  const enc = tf.model({
    inputs: inputs,
    outputs: [zMean, zLogVar, z],
    name: 'encoder',
  });
  return enc;
}
The code in listing 10.5 builds the decoder. Compared to the encoder, the decoder has a simpler topology. It uses an MLP to convert the input z-vector (that is, the latent vector) into an image of the same shape as the encoder's input. Note that the way in which our VAE handles images is somewhat simplistic and unusual in that it flattens the images into 1D vectors and hence discards the spatial information. Image-oriented VAEs typically use convolutional and pooling layers, but due to the simplicity of our images (their small size and the fact that there is only one color channel), the flattening approach works well enough for the purpose of this example.
Listing 10.5. The decoder part of our VAE (excerpt from fashion-mnist-vae/model.js)
function decoder(opts) {
  const {originalDim, intermediateDim, latentDim} = opts;
  const dec = tf.sequential({name: 'decoder'});    #1
  dec.add(tf.layers.dense({
    units: intermediateDim,
    activation: 'relu',
    inputShape: [latentDim]
  }));
  dec.add(tf.layers.dense({
    units: originalDim,
    activation: 'sigmoid'                          #2
  }));
  return dec;
}
To combine the encoder and decoder into a single tf.LayersModel object that is the VAE, the code in listing 10.6 extracts the third output (z-vector) of the encoder and runs it through the decoder. Then the combined model exposes the decoded image as its output, along with three additional outputs: the zMean, zLogVar, and z-vectors. This completes the definition of the VAE model's topology. In order to train the model, we need two more things: the loss function and an optimizer. The code in the following listing was excerpted from fashion-mnist-vae/model.js.
Listing 10.6. Putting the encoder and decoder together into the VAE
function vae(encoder, decoder) {
  const inputs = encoder.inputs;                    #1
  const encoderOutputs = encoder.apply(inputs);
  const encoded = encoderOutputs[2];                #2
  const decoderOutput = decoder.apply(encoded);
  const v = tf.model({                              #3
    inputs: inputs,
    outputs: [decoderOutput, ...encoderOutputs],    #4
    name: 'vae_mlp',
  });
  return v;
}
When we were visiting the simple-object-detection model in chapter 5, we described the way in which custom loss functions can be defined in TensorFlow.js. Here, a custom loss function is needed to train the VAE. This is because the loss function will be the sum of two terms: one that quantifies the discrepancy between the input and output and one that quantifies the statistical properties of the latent space. This is reminiscent of the simple-object-detection model's custom loss function, which was a sum of a term for object classification and another for object localization.
As you can see from the code in listing 10.7 (excerpted from fashion-mnist-vae/model.js), defining the input-output discrepancy term is straightforward. We simply calculate the MSE between the original input and the decoder's output. However, the statistical term, called the Kullback-Leibler (KL) divergence, is more mathematically involved. We will spare you the detailed math,[15] but on an intuitive level, the KL divergence term (klLoss in the code) encourages the distributions for different input images to be more evenly distributed around the center of the latent space, which makes it easier for the decoder to interpolate between the images. Therefore, the klLoss term can be thought of as a regularization term added on top of the main input-output discrepancy term of the VAE.
15This blog post by Irhum Shafkat includes a deeper discussion of the math behind the KL divergence: http://mng.bz/vlvr.
Listing 10.7. The loss function for the VAE
function vaeLoss(inputs, outputs) {
  const originalDim = inputs.shape[1];
  const decoderOutput = outputs[0];
  const zMean = outputs[1];
  const zLogVar = outputs[2];

  const reconstructionLoss =                                               #1
      tf.losses.meanSquaredError(inputs, decoderOutput).mul(originalDim);  #1

  let klLoss = zLogVar.add(1).sub(zMean.square()).sub(zLogVar.exp());
  klLoss = klLoss.sum(-1).mul(-0.5);                                       #2
  return reconstructionLoss.add(klLoss).mean();                            #3
}
Another missing piece for our VAE training is the optimizer and the training step that uses it. The type of optimizer is the popular ADAM optimizer (tf.train.adam()). The training step for the VAE differs from all other models we've seen in this book in that it doesn't use the fit() or fitDataset() method of the model object. Instead, it calls the minimize() method of the optimizer (listing 10.8). This is because the KL-divergence term of the custom loss function uses two of the model's four outputs, but in TensorFlow.js, the fit() and fitDataset() methods work only if each of the model's outputs has a loss function that doesn't depend on any other output.
As listing 10.8 shows, the minimize() function is called with an arrow function as the only argument. This arrow function returns the loss under the current batch of flattened images (reshaped in the code), which is closed over by the function. minimize() calculates the gradients of the loss with respect to all the trainable weights of the VAE (including the encoder and decoder), adjusts them according to the ADAM algorithm, and then applies updates to the weights in directions opposite to the adjusted gradients. This completes a single step of training. This step is performed repeatedly, over all images in the Fashion-MNIST dataset, and constitutes an epoch of training. The yarn train command performs multiple epochs of training (default: 5 epochs), after which the loss value converges, and the decoder part of the VAE is saved to disk. The reason the encoder part isn't saved is that it won't be used in the following, browser-based demo step.
Listing 10.8. The training loop of the VAE (excerpt from fashion-mnist-vae/train.js)
for (let i = 0; i < epochs; i++) {
  console.log(`\nEpoch #${i} of ${epochs}\n`);
  for (let j = 0; j < batches.length; j++) {
    const currentBatchSize = batches[j].length;
    const batchedImages = batchImages(batches[j]);      #1
    const reshaped = batchedImages.reshape(
        [currentBatchSize, vaeOpts.originalDim]);

    optimizer.minimize(() => {                          #2
      const outputs = vaeModel.apply(reshaped);
      const loss = vaeLoss(reshaped, outputs, vaeOpts);
      process.stdout.write('.');                        #3
      if (j % 50 === 0) {
        console.log('\nLoss:', loss.dataSync()[0]);
      }
      return loss;
    });
    tf.dispose([batchedImages, reshaped]);
  }
  console.log('');
  await generate(decoderModel, vaeOpts.latentDim);      #4
}
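The shape of this custom loop — pass a loss-returning closure to minimize(), which differentiates it and updates the weights — can be sketched without TensorFlow.js. The following toy example uses a hypothetical minimize() helper that estimates the gradient by finite differences and takes one gradient-descent step on a single scalar weight w; it is a sketch of the pattern, not of the VAE itself.

```javascript
let w = 5;                  // the lone "trainable weight"
const learningRate = 0.1;

// Hypothetical stand-in for optimizer.minimize(): estimate dLoss/dw with a
// finite difference, step opposite to the gradient, and return the new loss.
function minimize(lossFn) {
  const eps = 1e-6;
  const grad = (lossFn(w + eps) - lossFn(w - eps)) / (2 * eps);
  w -= learningRate * grad;
  return lossFn(w);
}

// The loss (w - 2)^2 is minimized at w = 2.
let loss;
for (let step = 0; step < 100; step++) {
  loss = minimize(x => (x - 2) ** 2);
}
```

The real minimize() computes exact gradients via backpropagation and applies ADAM updates to every trainable weight, but the control flow — loop over batches, call minimize with a closure, dispose intermediate tensors — is the same.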
The web page brought up by the yarn watch command will load the saved decoder and use it to generate a grid of images similar to what's shown in figure 10.8. These images are obtained from a regular grid of latent vectors in the 2D latent space. The upper and lower limits along each of the two latent dimensions can be adjusted in the UI.
Figure 10.8. Sampling the latent space of the VAE after training. This figure shows a 20 × 20 grid of decoder outputs. This grid corresponds to a regularly spaced grid of 20 × 20 2D latent vectors, of which each dimension is in the interval of [–4, 4].

The grid of images shows a completely continuous distribution of different types of clothing from the Fashion-MNIST dataset, with one clothing type morphing gradually into another type as you follow a continuous path through the latent space (for example, pullover to T-shirt, T-shirt to pants, boots to shoes). Specific directions in the latent space have a meaning inside a subdomain of the latent space. For example, near the top section of the latent space, the horizontal dimension appears to represent "bootness versus shoeness;" around the bottom-right corner of the latent space, the horizontal dimension seems to represent "T-shirtness versus pantsness," and so forth.
In the next section, we will cover another major type of model for generating images: GANs.
Since Ian Goodfellow and his colleagues introduced GANs in 2014,[16] the technique has seen rapid growth in interest and sophistication. Today, GANs have become a powerful tool for generating images and other modalities of data. They are capable of outputting high-resolution images that in some cases are indistinguishable from real ones to human eyes. See the human face images generated by NVIDIA's StyleGAN in figure 10.9.[17] If not for the occasional artifact spots on the face and the unnatural-looking scenes in the background, it would be virtually impossible for a human viewer to tell these generated images apart from real ones.
16Ian Goodfellow et al., "Generative Adversarial Nets," NIPS Proceedings, 2014, http://mng.bz/4ePv.
17Website at https://thispersondoesnotexist.com. For the academic paper, see Tero Karras, Samuli Laine, and Timo Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks," submitted 12 Dec. 2018, https://arxiv.org/abs/1812.04948.
Figure 10.9. Example human-face images generated by NVIDIA’s StyleGAN, sampled from https://thispersondoesnotexist.com in April 2019

Apart from generating compelling images "out of the blue," the images generated by GANs can be conditioned on certain input data or parameters, which leads to a variety of more task-specific and useful applications. For example, GANs can be used to generate a higher-resolution image from a low-resolution input (image super-resolution), fill in missing parts of an image (image inpainting), convert a black-and-white image into a color one (image colorization), generate an image given a text description, and generate the image of a person in a given pose given an input image of the same person in another pose. In addition, new types of GANs have been developed to generate nonimage outputs, such as music.[18] Apart from the obvious value of generating an unlimited amount of realistic-looking material, which is desired in domains such as art, music production, and game design, GANs have other applications, such as assisting deep learning by generating training examples in cases where such examples are costly to acquire. For instance, GANs are being used to generate realistic-looking street scenes for training self-driving neural networks.[19]
18See the MuseGAN project from Hao-Wen Dong et al.: https://salu133445.github.io/musegan/.
19James Vincent, "Nvidia Uses AI to Make It Snow on Streets that Are Always Sunny," The Verge, 5 Dec. 2017, http://mng.bz/Q0oQ.
Although VAEs and GANs are both generative models, they are based on different ideas. While VAEs ensure the quality of generated examples by using an MSE loss between the original input and the decoder output, a GAN makes sure its outputs are realistic by employing a discriminator, as we'll soon explain. In addition, many variants of GANs allow inputs to consist of not only the latent-space vector but also conditioning inputs, such as a desired image class. The ACGAN we'll explore next is a good example of this. In this type of GAN with mixed inputs, latent spaces are no longer even continuous with respect to the network inputs.
In this section, we will dive into a relatively simple type of GAN. Specifically, we will train an auxiliary classifier GAN (ACGAN)[20] on the familiar MNIST hand-written digit data set. This will give us a model capable of generating digit images that look just like the real MNIST digits. At the same time, we will be able to control which digit class (0 through 9) each generated image belongs to, thanks to the "auxiliary classifier" part of ACGAN. In order to understand how ACGAN works, let's do it one step at a time. First, we will explain how the basic "GAN" part of ACGAN works. Then, we will describe the additional mechanisms by which ACGAN makes the class identity controllable.
20Augustus Odena, Christopher Olah, and Jonathon Shlens, "Conditional Image Synthesis with Auxiliary Classifier GANs," submitted 30 Oct. 2016, https://arxiv.org/abs/1610.09585.
How does a GAN learn to generate realistic-looking images? It achieves this through an interplay between the two subparts that it comprises: a generator and a discriminator. Think of the generator as a counterfeiter whose goal is to create high-quality fake Picasso paintings; the discriminator is like an art dealer whose job is to tell fake Picasso paintings apart from real ones. The counterfeiter (generator) strives to create better and better fake paintings in order to fool the art dealer (the discriminator), while the art dealer's job is to become a better and better critiquer of the paintings so as not to be fooled by the counterfeiter. This antagonism between our two players is the reason behind the "adversarial" part of the name "GAN." Intriguingly, the counterfeiter and art dealer end up helping each other become better, despite apparently being adversaries.
In the beginning, the counterfeiter (generator) is bad at creating realistic-looking Picassos because its weights are initialized randomly. As a result, the art dealer (discriminator) quickly learns to tell real and fake Picassos apart. Here is an important part of how all of this works: every time the counterfeiter brings a new painting to the art dealer, they are provided with detailed feedback (from the art dealer) about which parts of the painting look wrong and how to change the painting to make it look more real. The counterfeiter learns and remembers this so that next time they come to the art dealer, their painting will look slightly better. This process repeats many times. It turns out, if all the parameters are set properly, we will end up with a skillful counterfeiter (generator). Of course, we will also get a skillful discriminator (art dealer), but we usually need only the generator after the GAN is trained.
Figure 10.10 provides a more detailed look at how the discriminator part of a generic GAN model is trained. In order to train the discriminator, we need a batch of generated images and a batch of real ones. The generated ones are generated by the generator. But the generator can't make images out of thin air. Instead, it needs to be given a random vector as the input. The latent vectors are conceptually similar to the ones we used for VAEs in section 10.2. For each image generated by the generator, the latent vector is a 1D tensor of shape [latentSize]. But like most training procedures in this book, we perform the step for a batch of images at a time. Therefore, the latent vector has a shape of [batchSize, latentSize]. The real images are directly drawn from the actual MNIST data set. For symmetry, we draw batchSize real images (exactly the same number as the generated ones) for each step of training.
Figure 10.10. A schematic diagram illustrating the algorithm by which the discriminator part of a GAN is trained. Notice that this diagram omits the digit-class part of the ACGAN for the sake of simplicity. For a complete diagram of generator training in ACGAN, see figure 10.13.

The generated images and real ones are then concatenated into a single batch of images, represented as a tensor of shape [2 * batchSize, 28, 28, 1]. The discriminator is executed on this batch of combined images, which outputs predicted probability scores for whether each image is real. These probability scores can be easily tested against the ground truth (we know which ones are real and which ones are generated!) through the binary cross-entropy loss function. Then, the familiar backpropagation algorithm does its job, updating the weight parameters of the discriminator with the help of an optimizer (not shown in the figure). This step nudges the discriminator a bit toward making correct predictions. Notice that the generator merely participates in this training step by providing generated samples, but it's not updated by the backpropagation process. It is the next training step that updates the generator (figure 10.11).
Figure 10.11. A schematic diagram illustrating the algorithm by which the generator part of a GAN is trained. Notice that this diagram omits the digit-class part of the ACGAN for the sake of simplicity. For a complete diagram of ACGAN’s generator-training process, see figure 10.14.

Figure 10.11 illustrates the generator-training step. We let the generator make another batch of generated images. But unlike the discriminator-training step, we don't need any real MNIST images. The discriminator is given this batch of generated images along with a batch of binary realness labels. We pretend that the generated images are real by setting the realness labels to all 1s. Pause for a moment and let that sink in: this is the most important trick in GAN training. Of course the images are all generated (not real), but we let the realness labels say they are real anyway. The discriminator may (correctly) assign low realness probabilities to some or all of the input images. But if it does so, the binary cross-entropy loss will end up with a large value, thanks to the bogus realness labels. This will cause the backpropagation to update the generator in a way that nudges the discriminator's realness scores a little higher. Note that the backpropagation updates only the generator. It leaves the discriminator untouched. This is another important trick: it ensures that the generator ends up making slightly more realistic-looking images, instead of the discriminator lowering its bar for what's real. This is achieved by freezing the discriminator part of the model, an operation we've used for transfer learning in chapter 5.
To summarize the generator-training step: we freeze the discriminator and feed an all-1 realness label to it, despite the fact that it is given generated images generated by the generator. As a result, the weight updates to the generator will cause it to generate images that look slightly more real to the discriminator. This way of training the generator will work only if the discriminator is reasonably good at telling what's real and what's generated. How do we ensure that? The answer is the discriminator-training step we already talked about. Therefore, you can see that the two training steps form an intricate yin-and-yang dynamic, in which the two parts of the GAN counter and help each other at the same time.
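The all-1s label trick can be made concrete with the binary cross-entropy formula itself. The following plain-JavaScript helper (binaryCrossentropy here is a hypothetical scalar function for illustration, not a TensorFlow.js API) shows why the loss is large precisely when the discriminator assigns a low realness score to a fake image that is labeled real:

```javascript
// Binary cross-entropy for one prediction p (realness score in (0, 1))
// against a label y (1 = "real", 0 = "fake"):
//   bce = -(y * log(p) + (1 - y) * log(1 - p))
function binaryCrossentropy(y, p) {
  return -(y * Math.log(p) + (1 - y) * Math.log(1 - p));
}
```

During the generator step, the label y is always 1. If the discriminator scores a generated image at p = 0.1, the loss is -log(0.1) ≈ 2.3 — large, so the gradients push the generator toward images that score higher; if the discriminator is fooled (p near 1), the loss is near 0 and the generator barely changes.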
That concludes our high-level overview of generic GAN training. In the next section, we will look at the internal architecture of the discriminator and generator and how they incorporate the information about image class.
Listing 10.9 shows the TensorFlow.js code that creates the discriminator part of the MNIST ACGAN (excerpted from mnist-acgan/gan.js). At the core of the discriminator is a deep convnet similar to the ones we saw in chapter 4. Its input has the canonical shape of MNIST images, namely [28, 28, 1]. The input image passes through four 2D convolutional (conv2d) layers before being flattened and processed by two dense layers. One dense layer outputs a binary prediction for the realness of the input image, while the other outputs the softmax probabilities for the 10 digit classes. The discriminator is a functional model that has both dense layers' outputs. Panel A of figure 10.12 provides a schematic view of the discriminator's one-input-two-output topology.
Figure 10.12. Schematic diagrams of the internal topology of the discriminator (panel A) and generator (panel B) parts of ACGAN. Certain details (the dropout layers in the discriminator) are omitted for simplicity. See listings 10.9 and 10.10 for the detailed code.

Listing 10.9. Creating the discriminator part of ACGAN
function buildDiscriminator() {
  const cnn = tf.sequential();
  cnn.add(tf.layers.conv2d({
    filters: 32,
    kernelSize: 3,
    padding: 'same',
    strides: 2,
    inputShape: [IMAGE_SIZE, IMAGE_SIZE, 1]    #1
  }));
  cnn.add(tf.layers.leakyReLU({alpha: 0.2}));
  cnn.add(tf.layers.dropout({rate: 0.3}));     #2

  cnn.add(tf.layers.conv2d(
      {filters: 64, kernelSize: 3, padding: 'same', strides: 1}));
  cnn.add(tf.layers.leakyReLU({alpha: 0.2}));
  cnn.add(tf.layers.dropout({rate: 0.3}));

  cnn.add(tf.layers.conv2d(
      {filters: 128, kernelSize: 3, padding: 'same', strides: 2}));
  cnn.add(tf.layers.leakyReLU({alpha: 0.2}));
  cnn.add(tf.layers.dropout({rate: 0.3}));

  cnn.add(tf.layers.conv2d(
      {filters: 256, kernelSize: 3, padding: 'same', strides: 1}));
  cnn.add(tf.layers.leakyReLU({alpha: 0.2}));
  cnn.add(tf.layers.dropout({rate: 0.3}));

  cnn.add(tf.layers.flatten());

  const image = tf.input({shape: [IMAGE_SIZE, IMAGE_SIZE, 1]});
  const features = cnn.apply(image);

  const realnessScore =                                                    #3
      tf.layers.dense({units: 1, activation: 'sigmoid'}).apply(features);  #3
  const aux = tf.layers.dense({units: NUM_CLASSES, activation: 'softmax'}) #4
      .apply(features);                                                    #4

  return tf.model({inputs: image, outputs: [realnessScore, aux]});
}
The code in listing 10.10 is responsible for creating the ACGAN generator. As we've alluded to before, the generator's generation process requires an input called a latent vector (named latent in the code). This is reflected in the inputShape parameter of its first dense layer. However, if you examine the code more carefully, you can see that the generator actually takes two inputs. This is illustrated in panel B of figure 10.12. In addition to the latent vector, which is a 1D tensor of shape [latentSize], the generator requires an additional input, which is named imageClass and has a simple shape of [1]. This is the way in which we tell the model which MNIST digit class (0 to 9) it is commanded to generate. For example, if we want the model to generate an image for digit 8, we should feed a tensor value of tf.tensor2d([[8]]) to the second input (remember that the model always expects batched tensors, even if there is only one example). Likewise, if we want the model to generate two images, one for the digit 8 and one for 9, then the fed tensor should be tensor2d([[8], [9]]).
As soon as the imageClass input enters the generator, an embedding layer transforms it into a tensor of the same shape as latent ([latentSize]). This step is mathematically similar to the embedding-lookup procedure we used in the sentiment-analysis and date-conversion models in chapter 9. The desired digit class is an integer quantity analogous to the word indices in the sentiment-analysis data and the character indices in the date-conversion data. It is transformed into a 1D vector in the same way that word and character indices were transformed into 1D vectors. However, we use embedding lookup on imageClass here for a different purpose: to merge it with the latent vector and form a single, combined vector (named h in listing 10.10). This merging is done through a multiply layer, which performs element-by-element multiplication between the two vectors of identical shapes. The resultant tensor has the same shape as the inputs ([latentSize]) and goes into later parts of the generator.
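The multiply layer's merge operation is simple enough to spell out with plain arrays. The helper name elementwiseMultiply is hypothetical, introduced only to show what tf.layers.multiply() computes per element:

```javascript
// Element-by-element product of two equal-length vectors, as performed by
// a multiply layer when merging the latent vector with the class embedding.
function elementwiseMultiply(a, b) {
  return a.map((v, i) => v * b[i]);
}
```

Because each digit class has its own learned embedding vector, this product effectively re-scales every dimension of the latent vector in a class-specific way before it enters the deconvolutional stack.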
The generator immediately applies a dense layer on the combined latent vector (h) and reshapes it into a 3D shape of [3, 3, 384]. This reshaping yields an image-like tensor, which can then be transformed by the following parts of the generator into an image that has the canonical MNIST shape ([28, 28, 1]).
Instead of using the familiar conv2d layers to transform the input, the generator uses the conv2dTranspose layer to transform its image tensors. Roughly speaking, conv2dTranspose performs the inverse operation to conv2d (sometimes referred to as deconvolution). The output of a conv2d layer generally has smaller height and width compared to its input (except for the rare cases in which the kernelSize is 1), as you can see in the convnets in chapter 4. However, a conv2dTranspose layer generally has a larger height and width in its output than in its input. In other words, while a conv2d layer typically shrinks the dimensions of its input, a typical conv2dTranspose layer expands them. This is why, in the generator, the first conv2dTranspose layer takes an input with height 3 and width 3, but the last conv2dTranspose layer outputs height 28 and width 28. This is how the generator turns an input latent vector and a digit index into an image in the standard MNIST image dimensions. The code in the following listing is excerpted from mnist-acgan/gan.js; some error-checking code is removed for clarity.
Listing 10.10. Creating the generator part of ACGAN
function buildGenerator(latentSize) {
  const cnn = tf.sequential();
  cnn.add(tf.layers.dense({
    units: 3 * 3 * 384,                        #1
    inputShape: [latentSize],
    activation: 'relu'
  }));
  cnn.add(tf.layers.reshape({targetShape: [3, 3, 384]}));

  cnn.add(tf.layers.conv2dTranspose({          #2
    filters: 192,
    kernelSize: 5,
    strides: 1,
    padding: 'valid',
    activation: 'relu',
    kernelInitializer: 'glorotNormal'
  }));
  cnn.add(tf.layers.batchNormalization());

  cnn.add(tf.layers.conv2dTranspose({          #3
    filters: 96,
    kernelSize: 5,
    strides: 2,
    padding: 'same',
    activation: 'relu',
    kernelInitializer: 'glorotNormal'
  }));
  cnn.add(tf.layers.batchNormalization());

  cnn.add(tf.layers.conv2dTranspose({          #4
    filters: 1,
    kernelSize: 5,
    strides: 2,
    padding: 'same',
    activation: 'tanh',
    kernelInitializer: 'glorotNormal'
  }));

  const latent = tf.input({shape: [latentSize]});   #5
  const imageClass = tf.input({shape: [1]});        #6

  const classEmbedding = tf.layers.embedding({      #7
    inputDim: NUM_CLASSES,
    outputDim: latentSize,
    embeddingsInitializer: 'glorotNormal'
  }).apply(imageClass);

  const h = tf.layers.multiply().apply(             #8
      [latent, classEmbedding]);                    #8

  const fakeImage = cnn.apply(h);
  return tf.model({                                 #9
    inputs: [latent, imageClass],                   #9
    outputs: fakeImage                              #9
  });                                               #9
}
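The 3 → 28 upsampling path in listing 10.10 can be checked with a small shape calculator. The function below is a sketch of the usual output-size rules for transposed convolutions ('same' padding: inputSize * stride; 'valid' padding: (inputSize - 1) * stride + kernelSize); the function name is hypothetical, and the rules are stated here as an assumption about how the framework computes shapes:

```javascript
// Output height/width of a conv2dTranspose layer (stride-driven upsampling).
function conv2dTransposeOutputSize(inputSize, kernelSize, stride, padding) {
  return padding === 'same'
      ? inputSize * stride
      : (inputSize - 1) * stride + kernelSize;  // 'valid'
}
```

Tracing the generator: 3 → (kernel 5, stride 1, 'valid') → 7 → (stride 2, 'same') → 14 → (stride 2, 'same') → 28, which is exactly the canonical MNIST image size.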
The last section should have given you a better understanding of the internal structure of ACGAN's discriminator and generator and how they incorporate the digit-class information (the "AC" part of ACGAN's name). With this knowledge, we are ready to expand on figures 10.10 and 10.11 in order to form a thorough understanding of how ACGAN is trained.
Figure 10.13 is an expanded version of figure 10.10. It shows the training of ACGAN's discriminator part. Compared to before, this training step not only improves the discriminator's ability to tell real and generated (fake) images apart but also hones its ability to determine which digit class a given image (including real and generated ones) belongs to. To make it easier to compare with the simpler diagram from before, we grayed out the parts already seen in figure 10.10 and highlighted the new parts. First, note that the generator now has an additional input (Digit Class), which makes it possible to specify what digits the generator should generate. In addition, the discriminator outputs not only a realness prediction but also a digit-class prediction. As a result, both output heads of the discriminator need to be trained. The training of the realness-predicting part remains the same as before (figure 10.10); the training of the class-predicting part relies on the fact that we know what digit classes the generated and real images belong to. The two heads of the model are compiled with different loss functions, reflecting the different nature of the two predictions. For the realness prediction, we use the binary cross-entropy loss, but for the digit-class prediction, we use the sparse categorical cross-entropy loss. You can see this in the following line from mnist-acgan/gan.js:
discriminator.compile({
  optimizer: tf.train.adam(args.learningRate, args.adamBeta1),
  loss: ['binaryCrossentropy', 'sparseCategoricalCrossentropy']
});
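To make the two loss terms concrete, here is an illustrative plain-JavaScript sketch (not code from the example; the probabilities are made up) of what binary cross entropy and sparse categorical cross entropy compute for a single example:

```javascript
// Binary cross entropy for the realness head: label is 0 (fake) or 1 (real),
// prob is the discriminator's predicted probability of realness.
function binaryCrossentropy(label, prob) {
  return -(label * Math.log(prob) + (1 - label) * Math.log(1 - prob));
}

// Sparse categorical cross entropy for the digit-class head: the label is an
// integer class index, and the loss is -log of the probability the model
// assigns to that class.
function sparseCategoricalCrossentropy(classIndex, probs) {
  return -Math.log(probs[classIndex]);
}

// A real image of digit 3, which the discriminator judges 90% real and
// assigns probability 0.8 to class 3:
const realnessLoss = binaryCrossentropy(1, 0.9);          // -ln(0.9) ~ 0.105
const classLoss = sparseCategoricalCrossentropy(
    3, [0.01, 0.02, 0.05, 0.8, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]);
const totalLoss = realnessLoss + classLoss;               // the two are summed
```

During training, the gradients from both loss terms flow back into the shared discriminator weights, which is what the two curved arrows in figure 10.13 depict.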
Figure 10.13. A schematic diagram illustrating the algorithm by which the discriminator part of ACGAN is trained. This diagram adds to the one in figure 10.10 by showing the parts that have to do with the digit class. The remaining parts of the diagram, which have already appeared in figure 10.10, are grayed out.

As the two curved arrows in figure 10.13 show, the gradients backpropagated from both losses are added on top of each other when updating the discriminator’s weights. Figure 10.14 is an expanded version of figure 10.11 and provides a detailed schematic view of how ACGAN’s generator portion is trained. This diagram shows how the generator learns to generate correct images given a specified digit class, in addition to learning how to generate real-looking images. Similar to figure 10.13, the new parts are highlighted, while the parts that already exist in figure 10.11 are grayed out. From the highlighted parts, you can see that the labels we feed into the training step now include not only the realness labels but also the digit-class labels. As before, the realness labels are all intentionally bogus. But the newly added digit-class labels are more honest, in the sense that we indeed gave these class labels to the generator.
Figure 10.14. A schematic diagram illustrating the algorithm by which the generator part of ACGAN is trained. This diagram adds to the one in figure 10.11 by showing the parts that have to do with the digit class. The remaining parts of the diagram, which have already appeared in figure 10.11, are grayed out.

Previously, we’ve seen that any discrepancies between the bogus realness labels and the discriminator’s realness probability output are used to update the generator of ACGAN in a way that makes it better at “fooling” the discriminator. Here, the digit-class prediction from the discriminator plays a similar role. For instance, if we tell the generator to generate an image for the digit 8, but the discriminator classifies the image as 9, the value of the sparse categorical cross entropy will be high, and the gradients associated with it will have large magnitudes. As a result, the updates to the generator’s weights will cause the generator to generate an image that looks more like an 8 (according to the discriminator). Obviously, this way of training the generator will work only if the discriminator is sufficiently good at classifying images into the 10 MNIST digit classes. This is what the previous discriminator training step helps to ensure. Again, we are seeing the yin-and-yang dynamics between the discriminator and generator portions at play during the training of ACGAN.
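As a made-up numeric illustration of this point (a sketch, not the example's code), compare the sparse categorical cross entropy when the discriminator's class head disagrees versus agrees with the requested digit class:

```javascript
// Sparse categorical cross entropy: -log(probability of the requested class).
const sparseCategoricalXent = (classIndex, probs) => -Math.log(probs[classIndex]);

// Ten made-up class probabilities where most mass is on digit 9, not the
// requested digit 8:
const confusedWith9 = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.9];
// ...and where the discriminator agrees the image looks like an 8:
const looksLike8 = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.9, 0.02];

console.log(sparseCategoricalXent(8, confusedWith9)); // large loss (-ln(0.02) ~ 3.9)
console.log(sparseCategoricalXent(8, looksLike8));    // small loss (-ln(0.9) ~ 0.105)
```

The large loss in the first case produces large gradients, which is exactly the strong corrective signal that pushes the generator toward images the discriminator recognizes as the requested digit.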
The process of training and tuning GANs is notoriously difficult. The training scripts you see in the mnist-acgan example are the crystallization of a tremendous amount of trial and error by researchers. Like most things in deep learning, it’s more like an art than an exact science: these tricks are heuristics, not backed by systematic theories. They are supported by a level of intuitive understanding of the phenomenon at hand, and they are known to work well empirically, although not necessarily in every situation.
The following is a list of noteworthy tricks used in the ACGAN in this section:
- We use tanh as the activation of the last conv2dTranspose layer in the generator. The tanh activation is seen less frequently in other types of models.
- Randomness is good for inducing robustness. Because GAN training may result in a dynamic equilibrium, GANs are prone to getting stuck in all sorts of ways. Introducing randomness during training helps prevent this. We introduce randomness in two ways: by using dropout in the discriminator and by using a “soft one” value (0.95) for the realness labels for the discriminator.
- Sparse gradients (gradients in which many values are zero) can hinder GAN training. In other types of deep learning, sparsity is often a desirable property, but not so in GANs. Two things can cause sparsity in gradients: the max-pooling operation and relu activations. Instead of max pooling, strided convolutions are recommended for downsampling, which is exactly what’s shown in the generator-creating code in listing 10.10. Instead of the usual relu activation, it’s recommended to use the leakyReLU activation, in which the negative part has a small negative value instead of being strictly zero. This is also shown in listing 10.10.
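Two of the tricks above can be sketched in a few lines of plain JavaScript (illustrative only, not the example's code; the leakyReLU slope alpha below is a typical value, not necessarily the one the example uses):

```javascript
// Trick 1: "soft one" realness labels. Real examples are labeled 0.95 rather
// than a hard 1, which keeps the discriminator from becoming overconfident.
const SOFT_ONE = 0.95;
const realnessTargets = (batchSize, isReal) =>
    new Array(batchSize).fill(isReal ? SOFT_ONE : 0);

// Trick 2: leakyReLU vs. relu. relu zeroes out all negative inputs (and their
// gradients), while leakyReLU lets a small negative signal through.
const relu = (x) => Math.max(0, x);
const leakyRelu = (x, alpha = 0.2) => (x >= 0 ? x : alpha * x);

console.log(realnessTargets(3, true));  // [0.95, 0.95, 0.95]
console.log(relu(-2));                  // 0 -> gradient is exactly zero here
console.log(leakyRelu(-2));             // -0.4 -> small but nonzero
```

The nonzero negative slope is what keeps gradients flowing through units that a plain relu would silence, which is why leakyReLU helps against the sparse-gradient problem described above.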
To check out and prepare the mnist-acgan example, use the following commands:
git clone https://github.com/tensorflow/tfjs-examples.git
cd tfjs-examples/mnist-acgan
yarn
Running the example involves two stages: training in Node.js and generation in the browser. To start the training process, simply use the following command:
yarn train
The training uses tfjs-node by default. However, as in the examples involving convnets we’ve seen before, using tfjs-node-gpu can significantly improve the training speed. If you have a CUDA-enabled GPU set up properly on your machine, you can append the --gpu flag to the yarn train command to achieve that. Training the ACGAN takes at least a couple of hours. For this long-running training job, you can monitor the progress with TensorBoard by using the --logDir flag:
yarn train --logDir /tmp/mnist-acgan-logs
Once the TensorBoard process has been brought up with the following command in a separate terminal,
tensorboard --logdir /tmp/mnist-acgan-logs
you can navigate to the TensorBoard URL (as printed out by the TensorBoard server process) in your browser to look at the loss curves. Figure 10.15 shows some example loss curves from the training process. One distinctive feature of loss curves from GAN training is that they don’t always trend downward, unlike the loss curves of most other types of neural networks. Instead, the losses from the discriminator (dLoss in the figure) and the generator (gLoss in the figure) both change in nonmonotonic ways and form an intricate dance with each other.
Figure 10.15. Sample loss curves from the ACGAN training job. dLoss is the loss from the discriminator training step. Specifically, it is the sum of the binary cross entropy from the realness prediction and the sparse categorical cross entropy from the digit-class prediction. gLoss is the loss from the generator training step. Like dLoss, gLoss is the sum of the losses from the binary realness classification and the multiclass digit classification.

Toward the end of the training, neither loss gets close to zero. Instead, they just level off (converge). At that point, the training process ends and saves the generator part of the model to the disk for serving during the in-browser generation step:
await generator.save(saveURL);
To run the in-browser generation demo, use the command yarn watch. It will compile mnist-acgan/index.js and the associated HTML and CSS assets, after which it will pop open a tab in your browser and show the demo page.[21]
21You can also skip the training and building steps entirely and directly navigate to the hosted demo page at http://mng.bz/4eGw.
The demo page loads the trained ACGAN generator saved from the previous stage. Since the discriminator is not really useful for this demo stage, it is neither saved nor loaded. With the generator loaded, we can construct a batch of latent vectors, along with a batch of desired digit-class indices, and call the generator’s predict() with them. The code that does this is in mnist-acgan/index.js:
const latentVectors = getLatentVectors(10);
const sampledLabels = tf.tensor2d(
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 1]);
const generatedImages = generator.predict(
    [latentVectors, sampledLabels]).add(1).div(2);
Our batch of digit-class labels is always an ordered vector of 10 elements, from 0 to 9. This is why the batch of generated images is always an orderly array of images from 0 to 9. These images are stitched together with the tf.concat() function and rendered in a div element on the page (see the top image in figure 10.16). Compared with randomly sampled real MNIST images (see the bottom image in figure 10.16), these ACGAN-generated images look just like the real ones. In addition, their digit-class identities look correct. This shows that our ACGAN training was successful. If you want to see more outputs from the ACGAN generator, click the Generator button on the page. Each time the button is clicked, a new batch of 10 fake images will be generated and shown on the page. You can play with that and get an intuitive sense of the quality of the image generation.
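The .add(1).div(2) at the end of the predict() call in the code above undoes the generator's output range: the last layer uses a tanh activation, which produces pixel values in [-1, 1], while the page renders values in [0, 1]. A plain-JavaScript sketch of the same per-pixel mapping (illustrative only; the helper name is made up):

```javascript
// Map a tanh-range pixel value in [-1, 1] to the display range [0, 1],
// mirroring the .add(1).div(2) call on the generated image tensor.
const tanhToUnit = (x) => (x + 1) / 2;

console.log(tanhToUnit(-1)); // 0   (darkest pixel)
console.log(tanhToUnit(0));  // 0.5
console.log(tanhToUnit(1));  // 1   (brightest pixel)
```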
Figure 10.16. Sample generated images (the 10 x 1 top panel) from the generator part of a trained ACGAN. The bottom panel, which contains a 10 x 10 grid of real MNIST images, is shown for comparison. By clicking the Show Z-vector Sliders button, you can open a section filled with 100 sliders. These sliders allow you to change the elements of the latent vector (the z- vector) and observe the effects on the generated MNIST images. Note that if you change the sliders one at a time, most of them will have tiny and unnoticeable effects on the images. But occasionally, you’ll be able to find a slider with a larger and more noticeable effect.

- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, “Deep Generative Models,” Deep Learning, chapter 20, MIT Press, 2017.
- Jakub Langr and Vladimir Bok, GANs in Action: Deep Learning with Generative Adversarial Networks, Manning Publications, 2019.
- Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks,” blog, 21 May 2015, http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
- Jonathan Hui, “GAN—What is Generative Adversarial Networks GAN?” Medium, 19 June 2018, http://mng.bz/Q0N6.
- GAN Lab, an interactive, web-based environment for understanding and exploring how GANs work, built using TensorFlow.js: Minsuk Kahng et al., https://poloclub.github.io/ganlab/.
- Yshrt tlme rdk Saekehrepas eorr uoprcs, xrp mzrf-ovrr-reeanitong eaxmelp zcb c wkl thore krxr datasets frudgienco zyn adyer xlt ggv rk relxpeo. Xng dkr training vn mprx, cgn vesbreo qkr tfcseef. Lxt sienntca, opz bkr iidinefnmu RrnoesVfwk.ai ekya cz rxp training data rzv. Ogrniu sqn reaft bkr elmod’a training, eebrsov lj xqr drgetenae orro ehiitxbs bkr olwglfnio snetprat vl IescSctpri usocre svhx zbn kgw krp errtpemuate eaemprrta ftcfsae qrk petrnats:
- Shetror-enrag erptasnt gzag ca ydwksroe (ltk expelma, “lkt” gns “ function ”)
- Wdmuei-naegr tsaernpt hyza ca rxg vfjn-qd-njvf nagaoioriztn el rgk kzuv
- Fnegro-ngera septratn shua cz ripinga lx steesenparh zpn esaqru casebtrk, ncq krq clar rucr yscv “ function ” wdkryeo hmrz qk woldleof py s jtbs kl rpteseshean gsn z jtyz xl cryul cresba
- Jn yor fosainh-ntsmi-xxc mxleape, rzwy napseph lj hvg xrzx oqr OF ieenedrgcv trxm rxp xl kdr EBP’z tcomsu cfze? Bckr sqrr dg inidfgymo bor vaeLoss() function jn shnifoa-smnit-o/vameedl.ci (listing 10.7). Gv xdr pemdlas images tlvm orb atelnt epcas iltls kxfx vjof vbr Vhosnia-WKJSR images? Oexc rgo pscea ltils bixihte nsu eenaptlritrbe snepratt?
- Jn qxr mnsit-naagc laxeemp, prt ilslaconpg rxu 10 itigd scesasl rjkn 5 (0 pcn 1 ffjw becemo dvr ifsrt salsc, 2 uns 3 bor dcseon cslas, cpn vc rofht), ynz breoves ewg rrgz acensgh kbr pouttu le xrd BRDTQ efatr training. Mrsy be egy extpec rv xoz jn rou eeraentdg images? Ltv esntnaci, ruzw kq puk xecpte bkr BXORG kr eetgrnae pxnw ygx pifcyes uzrr urk ftrsi cslsa cj ideresd? Hrjn: rv oosm rqaj hnacge, ggv nhxv re mfdoyi uor loadLabels() function nj tinsm-cngaa/ data.ia. Cbo saontctn NUM_CLASSES jn hnz.ia ndees er qo diiofmed gncoyircdal. Jn oditndai, uor sampledLabels aalreibv jn rob generateAnd-VisualizeImages() function (nj nidex.iz) fkzc nsede rv do dveiers.
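For the class-collapsing exercise above, the label mapping itself is a one-liner (an illustrative sketch, assuming the labels are integer digit values; this is not code from the example):

```javascript
// Collapse the 10 MNIST digit classes into 5: digits 0 and 1 become class 0,
// 2 and 3 become class 1, ..., 8 and 9 become class 4.
const collapseLabel = (digit) => Math.floor(digit / 2);

console.log([0, 1, 2, 3, 8, 9].map(collapseLabel)); // [0, 0, 1, 1, 4, 4]
```

The interesting question is then what the generator produces for a class that now covers two different digits.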
- Generative models are different from the discriminative ones we’ve studied throughout earlier chapters of this book in that they are designed to model the process in which examples of the training dataset are generated, along with their statistical distributions. Due to this design, they are capable of generating new examples that conform to the distributions and hence appear similar to the real training data.
- We introduce one way to model the structure of text datasets: next-character prediction. LSTMs can be used to perform this task in an iterative fashion to generate text of arbitrary length. The temperature parameter controls the stochasticity (that is, how random and unpredictable the generated text is).
- Autoencoders are a type of generative model that consists of an encoder and a decoder. First, the encoder compresses the input data into a concise representation called the latent vector, or z-vector. Then, the decoder tries to reconstruct the input data by using just the latent vector. Through the training process, the encoder becomes an efficient data summarizer, and the decoder is endowed with knowledge of the statistical distribution of the examples. A VAE adds some additional statistical constraints on the latent vectors so that the latent spaces comprising those vectors display continuously varying and interpretable structures after the VAE is trained.
- GANs are based on the idea of a simultaneous competition and cooperation between a discriminator and a generator. The discriminator tries to distinguish real data examples from the generated ones, while the generator aims at generating fake examples that “fool” the discriminator. Through joint training, the generator part will eventually become capable of generating realistic-looking examples. An ACGAN adds class information to the basic GAN architecture to make it possible to specify what class of examples to generate.