Chapter 4. Deep Convolutional GAN


This chapter covers

  • Understanding key concepts behind convolutional neural networks
  • Using batch normalization
  • Implementing Deep Convolutional GAN, an advanced GAN architecture

In the previous chapter, we implemented a GAN whose Generator and Discriminator were simple feed-forward neural networks with a single hidden layer. Despite this simplicity, many of the images of handwritten digits that the GAN’s Generator produced after being fully trained were remarkably convincing. Even the ones that were not recognizable as human-written numerals had many of the hallmarks of handwritten symbols, such as discernible line edges and shapes—especially when compared to the random noise used as the Generator’s raw input.

Imagine what we could accomplish with a more powerful network architecture. In this chapter, we will do just that: instead of simple two-layer feed-forward networks, both our Generator and Discriminator will be implemented as convolutional neural networks (CNNs, or ConvNets). The resulting GAN architecture is known as Deep Convolutional GAN, or DCGAN for short.

Before delving into the nitty-gritty of the DCGAN implementation, we will review the key concepts underlying ConvNets, review the history behind the discovery of the DCGAN, and cover one of the key breakthroughs that made complex architectures like DCGAN possible in practice: batch normalization.


4.1. Convolutional neural networks


We expect that you’ve already been exposed to convolutional networks; that said, if this technique is new to you, don’t worry. In this section, we review all the key concepts you need for this chapter and the rest of this book.

4.1.1. Convolutional filters

Unlike a regular feed-forward neural network, whose neurons are arranged in flat, fully connected layers, layers in a ConvNet are arranged in three dimensions (width × height × depth). Convolutions are performed by sliding one or more filters over the input layer. Each filter has a relatively small receptive field (width × height) but always extends through the entire depth of the input volume.

At every step as it slides across the input, each filter outputs a single activation value: the dot product between the input values and the filter entries. This process results in a two-dimensional activation map for each filter. The activation maps produced by each filter are then stacked on top of one another to produce a three-dimensional output layer; the output depth is equal to the number of filters used.
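
To make the sliding-filter mechanics concrete, here is a minimal NumPy sketch (our own illustration, not code from this chapter’s tutorial) of a single 3 × 3 filter moving over a 5 × 5 input with a stride of 2, the same setup as in figure 4.1:

import numpy as np

inp = np.arange(25, dtype=float).reshape(5, 5)     # toy 5 x 5 input
filt = np.ones((3, 3))                             # toy 3 x 3 filter
stride = 2

out = np.zeros((2, 2))                             # the resulting activation map
for i in range(2):
    for j in range(2):
        patch = inp[i*stride:i*stride+3, j*stride:j*stride+3]
        out[i, j] = np.sum(patch * filt)           # dot product: one activation value per step

print(out.shape)                                   # (2, 2)

With several such filters, each 2 × 2 activation map would be stacked along the depth dimension to form the output volume.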

4.1.2. Parameter sharing

Importantly, filter parameters are shared by all the input values to the given filter. This has both intuitive and practical advantages. Intuitively, parameter sharing allows us to efficiently learn visual features and shapes (such as lines and edges) regardless of where they are located in the input image. From a practical perspective, parameter sharing drastically reduces the number of trainable parameters. This decreases the risk of overfitting and allows this technique to scale up to higher-resolution images without a corresponding exponential increase in trainable parameters, as would be the case with a traditional, fully connected network.
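
As a rough back-of-the-envelope illustration (the layer sizes here are our own, hypothetical choices), compare the trainable-parameter counts of a small convolutional layer and a fully connected layer over the same input:

h, w, depth = 28, 28, 1                                # input volume, e.g., an MNIST image
filters, k = 32, 3                                     # 32 filters with 3 x 3 receptive fields

conv_params = filters * (k * k * depth) + filters      # shared weights + biases = 320
dense_params = (h * w * depth) * filters + filters     # 25,120 for the same number of units

print(conv_params, dense_params)

Note that the convolutional count does not depend on the image resolution at all, whereas the fully connected count grows with every added input pixel.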

4.1.3. ConvNets visualized

If all this sounds confusing, let’s make these concepts a little less abstract by visualizing them. Diagrams make everything easier to understand for most people (us included!). Figure 4.1 shows a single convolution operation; figure 4.2 illustrates the convolution operation in the context of the input and output layers in a ConvNet.

Figure 4.1. A 3 × 3 convolutional filter as it slides over a 5 × 5 input—left to right, top to bottom. At each step, the filter moves by a stride of 2; accordingly, it makes a total of four steps, resulting in a 2 × 2 activation map. Notice how at each step, the entire filter produces a single activation value.

(Source: “A Guide to Convolution Arithmetic for Deep Learning,” by Vincent Dumoulin and Francesco Visin, 2016, https://arxiv.org/abs/1603.07285.)

Figure 4.1 depicts the convolution operation for a single filter over a two-dimensional input. In practice, the input volume is usually three-dimensional, and we use several stacked filters. The underlying mechanics, however, remain the same: each filter produces a single value per step, regardless of the depth of the input volume. The number of filters we use determines the depth of the output volume, as their resulting activation maps are stacked on top of one another. All this is illustrated in figure 4.2.

Figure 4.2. An activation value for a single convolutional step within the context of the activation map (feature map) and the input and output volumes. Notice that the ConvNet filter extends through the full depth of the input volume and that the depth of the output volume is determined by stacking together activation maps.

(Source: “Convolutional Neural Network,” by Kareem Hirschkind et al., Brilliant.org, retrieved November 1, 2018, http://mng.bz/8zJK.)

Note

If you would like to dive deeper into convolutional networks and the underlying concepts, we recommend reading the relevant chapters in François Chollet’s Deep Learning with Python (Manning, 2017), which provides an outstanding, hands-on introduction to all the key concepts and techniques in deep learning, including ConvNets. For those with a more academic bent, a great resource is Andrej Karpathy’s excellent lecture notes from his Stanford University class on Convolutional Neural Networks for Visual Recognition (http://cs231n.github.io/convolutional-networks/).


4.2. Brief history of the DCGAN

Introduced in 2016 by Alec Radford, Luke Metz, and Soumith Chintala, DCGAN marked one of the most important early innovations in GANs since the technique’s inception two years earlier.[1] This was not the first time a group of researchers tried harnessing ConvNets for use in GANs, but it was the first time they succeeded at incorporating ConvNets directly into a full-scale GAN model.

1 See “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” by Alec Radford et al., 2015, https://arxiv.org/abs/1511.06434.

The use of ConvNets exacerbated many of the difficulties plaguing GAN training, including instability and gradient saturation. Indeed, these challenges proved so daunting that some researchers resorted to alternative approaches, such as the LAPGAN, which uses a cascade of convolutional networks within a Laplacian pyramid, with a separate ConvNet being trained at each level using the GAN framework.[2] If none of this makes sense to you, don’t worry. Superseded by superior methods, LAPGAN has been largely relegated to the dustbin of history, so it is not important to understand its internals.

2 See “Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks,” by Emily Denton et al., 2015, https://arxiv.org/abs/1506.05751.

Although inelegant, complex, and computationally taxing, LAPGAN yielded the highest-quality images to date at the time of its publication, with a fourfold improvement over the original GAN (40% versus 10% of generated images mistaken for real by human evaluators). As such, LAPGAN demonstrated the enormous potential of marrying GANs with ConvNets.

With DCGAN, Radford and his collaborators introduced techniques and optimizations that allowed ConvNets to scale up to the full GAN framework without the need to modify the underlying GAN architecture and without reducing GAN to a subroutine of a more complex model framework, like LAPGAN. One of the key techniques Radford et al. used is batch normalization, which helps stabilize the training process by normalizing inputs at each layer where it is applied. Let’s take a closer look at what batch normalization is and how it works.


4.3. Batch normalization

Batch normalization was introduced by Google scientists Sergey Ioffe and Christian Szegedy in 2015.[3] Their insight was as simple as it was groundbreaking. Just as we normalize network inputs, they proposed to normalize the inputs to each layer, for each training mini-batch as it flows through the network.

3 See “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” by Sergey Ioffe and Christian Szegedy, 2015, https://arxiv.org/abs/1502.03167.

4.3.1. Understanding normalization

It helps to remind ourselves what normalization is and why we bother normalizing the input feature values in the first place. Normalization is the scaling of data so that it has zero mean and unit variance. This is accomplished by taking each data point x, subtracting the mean μ, and dividing the result by the standard deviation, σ, as shown in equation 4.1:

equation 4.1.

x̂ = (x − μ) / σ

Normalization has several advantages. Perhaps most important, it makes comparisons between features with vastly different scales easier and, by extension, makes the training process less sensitive to the scale of the features. Consider the following (rather contrived) example. Imagine we are trying to predict the monthly expenditures of a family based on two features: the family’s annual income and the family size. We would expect that, in general, the more a family earns, the more they spend; and the bigger a family is, the more they spend.

However, the scales of these features are vastly different—an extra $10 in annual income probably wouldn’t influence how much a family spends, but an additional 10 members would likely wreak havoc on any family’s budget. Normalization solves this problem by scaling each feature value onto a standardized scale, such that each data point is expressed not as its face value but as a relative “score” indicating how many standard deviations the given data point is from the mean.
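
Here is a small NumPy sketch of equation 4.1 applied to the contrived example above (the numbers are made up for illustration):

import numpy as np

income = np.array([40_000.0, 75_000.0, 120_000.0])   # annual income, in dollars
family_size = np.array([2.0, 4.0, 6.0])              # number of family members

def normalize(x):
    return (x - x.mean()) / x.std()                  # equation 4.1: zero mean, unit variance

print(normalize(income))                             # both features now live on the same scale,
print(normalize(family_size))                        # expressed in standard deviations from the mean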

The insight behind batch normalization is that normalizing inputs alone may not go far enough when dealing with deep neural networks with many layers. As the input values flow through the network, from one layer to the next, they are scaled by the trainable parameters in each of those layers. And as the parameters get tuned by backpropagation, the distribution of each layer’s inputs is prone to change in subsequent training iterations, which destabilizes the learning process. In academia, this problem is known as covariate shift. Batch normalization solves it by scaling values in each mini-batch by the mean and variance of that mini-batch.

4.3.2. Computing batch normalization

The way batch normalization is computed differs in several respects from the simple normalization equation we presented earlier. This section walks through it step by step.

Let μ_B be the mean of the mini-batch B, and σ_B² be the variance (mean squared deviation) of the mini-batch B. The normalized value x̂ is computed as shown in equation 4.2:

equation 4.2.

x̂ = (x − μ_B) / √(σ_B² + ϵ)

The term ϵ (epsilon) is added for numerical stability, primarily to avoid division by zero. It is set to a small positive constant value, such as 0.001.

In batch normalization, we do not use these normalized values directly. Instead, we multiply them by γ (gamma) and add β (beta) before passing them as inputs to the next layer; see equation 4.3.

equation 4.3.

y = γx̂ + β

Importantly, the terms γ and β are trainable parameters, which—just like weights and biases—are tuned during network training. The reason for this is that it may be beneficial for the intermediate input values to be standardized around a mean other than 0 and have a variance other than 1. Because γ and β are trainable, the network can learn what values work best.
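
To see equations 4.2 and 4.3 in action, here is a minimal NumPy sketch for a single feature across one mini-batch; in a real network, γ and β would be tuned by backpropagation rather than left at these initial values:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])            # one feature over a mini-batch B
eps = 0.001                                   # numerical-stability constant

mu_B = x.mean()                               # mini-batch mean
var_B = x.var()                               # mini-batch variance
x_hat = (x - mu_B) / np.sqrt(var_B + eps)     # equation 4.2

gamma, beta = 1.0, 0.0                        # trainable scale and shift (typical initial values)
y = gamma * x_hat + beta                      # equation 4.3: what the next layer receives
print(y)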

Fortunately for us, we don’t have to worry about any of this. The Keras function keras.layers.BatchNormalization handles all the mini-batch computations and updates behind the scenes for us.

Batch normalization limits the amount by which updating the parameters in the previous layers can affect the distribution of inputs received by the current layer. This decreases any unwanted interdependence between parameters across layers, which helps speed up the network training process and increase its robustness, especially when it comes to network parameter initialization.

Batch normalization has proven essential to the viability of many deep learning architectures, including the DCGAN, which you will see in action in the following tutorial.


4.4. Tutorial: Generating handwritten digits with DCGAN

In this tutorial, we will revisit the MNIST dataset of handwritten digits from chapter 3. This time, however, we will use the DCGAN architecture and represent both the Generator and the Discriminator as convolutional networks, as shown in figure 4.3. Besides this change, the rest of the network architecture remains unchanged. At the end of the tutorial, we will compare the quality of the handwritten numerals produced by the two GANs (traditional versus DCGAN) so you can see the improvement made possible by the use of a more advanced network architecture.

Figure 4.3. The overall model architecture for this chapter’s tutorial is the same as the GAN we implemented in chapter 3. The only differences (not visible on this high-level diagram) are the internal representations of the Generator and Discriminator networks (the insides of the Generator and Discriminator boxes). These networks are covered in detail later in this tutorial.

As in chapter 3, much of the code in this tutorial was adapted from Erik Linder-Norén’s open source GitHub repository of GAN models in Keras (https://github.com/eriklindernoren/Keras-GAN), with numerous modifications and improvements spanning both the implementation details and network architectures. A Jupyter notebook with the full implementation, including added visualizations of the training progress, is available in the GitHub repository for this book at https://github.com/GANs-in-Action/gans-in-action, under the chapter-4 folder. The code was tested with Python 3.6.0, Keras 2.1.6, and TensorFlow 1.8.0. To speed up the training time, it is recommended to run the model on a GPU.

4.4.1. Importing modules and specifying model input dimensions

First, we import all the packages, modules, and libraries we need to train and run the model. Just as in chapter 3, the MNIST dataset of handwritten digits is imported directly from keras.datasets.

Listing 4.1. Import statements
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist
from keras.layers import (
    Activation, BatchNormalization, Dense, Dropout, Flatten, Reshape)
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import Conv2D, Conv2DTranspose
from keras.models import Sequential
from keras.optimizers import Adam

We also specify the model input dimensions: the image shape and the length of the noise vector z.

Listing 4.2. Model input dimensions
img_rows = 28
img_cols = 28
channels = 1

img_shape = (img_rows, img_cols, channels)    # input image dimensions

z_dim = 100                                   # size of the noise vector, used as input to the Generator

4.4.2. Implementing the Generator

ConvNets have traditionally been used for image classification tasks, in which the network takes in an image with the dimensions height × width × number of color channels as input and—through a series of convolutional layers—outputs a single vector of class scores, with the dimensions 1 × n, where n is the number of class labels. To generate an image by using the ConvNet architecture, we reverse the process: instead of taking an image and processing it into a vector, we take a vector and up-size it to an image.

Oxb vr parj orcsesp jc krg transposed convolution. Cllcea rprz regular convolution aj catylylip bpzo rv ecedru ntipu hwidt npz hihtge hewil inacnrseig zrj htpde. Adponsaser ontovclionu cxoq jn rbk veerrse cnrtedioi: rj cj akgy kr rineaesc opr idhtw nhc tgehih hilwe ucegindr hdetp, sc dhv nzc ozx nj vgr Generator network adaigrm jn figure 4.4.

Figure 4.4. The Generator takes in a random noise vector as input and produces a 28 × 28 × 1 image. It does so through multiple layers of transposed convolutions. Between the convolutional layers, we apply batch normalization to stabilize the training process. (Image is not to scale.)

The Generator starts with a noise vector z. Using a fully connected layer, we reshape the vector into a three-dimensional hidden layer with a small base (width × height) and large depth. Using transposed convolutions, the input is progressively reshaped such that its base grows while its depth decreases until we reach the final layer with the shape of the image we are seeking to synthesize, 28 × 28 × 1. After each transposed convolution layer, we apply batch normalization and the Leaky ReLU activation function. At the final layer, we do not apply batch normalization and, instead of ReLU, we use the tanh activation function.

Putting all the steps together, we do the following:

  1. Take a random noise vector and reshape it into a 7 × 7 × 256 tensor through a fully connected layer.
  2. Use transposed convolution, transforming the 7 × 7 × 256 tensor into a 14 × 14 × 128 tensor.
  3. Apply batch normalization and the Leaky ReLU activation function.
  4. Use transposed convolution, transforming the 14 × 14 × 128 tensor into a 14 × 14 × 64 tensor. Notice that the width and height dimensions remain unchanged; this is accomplished by setting the strides parameter in Conv2DTranspose to 1.
  5. Apply batch normalization and the Leaky ReLU activation function.
  6. Use transposed convolution, transforming the 14 × 14 × 64 tensor into the output image size, 28 × 28 × 1.
  7. Apply the tanh activation function.

The following listing shows what the Generator network looks like when implemented in Keras.

Listing 4.3. DCGAN Generator
def build_generator(z_dim):

    model = Sequential()

    model.add(Dense(256 * 7 * 7, input_dim=z_dim))                            # reshape input into a 7 × 7 × 256 tensor via a fully connected layer
    model.add(Reshape((7, 7, 256)))

    model.add(Conv2DTranspose(128, kernel_size=3, strides=2, padding='same')) # transposed convolution, from 7 × 7 × 256 into a 14 × 14 × 128 tensor

    model.add(BatchNormalization())                                           # batch normalization

    model.add(LeakyReLU(alpha=0.01))                                          # Leaky ReLU activation

    model.add(Conv2DTranspose(64, kernel_size=3, strides=1, padding='same'))  # transposed convolution, from 14 × 14 × 128 into a 14 × 14 × 64 tensor

    model.add(BatchNormalization())                                           # batch normalization

    model.add(LeakyReLU(alpha=0.01))                                          # Leaky ReLU activation

    model.add(Conv2DTranspose(1, kernel_size=3, strides=2, padding='same'))   # transposed convolution, from 14 × 14 × 64 into a 28 × 28 × 1 output

    model.add(Activation('tanh'))                                             # output layer with tanh activation

    return model
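
As an optional sanity check (not part of the original tutorial), we can confirm that an untrained Generator already produces outputs of the right shape, with pixel values bounded by the tanh activation:

import numpy as np

gen_test = build_generator(z_dim)                        # untrained Generator
sample = gen_test.predict(np.random.normal(0, 1, (1, z_dim)))
print(sample.shape)                                      # (1, 28, 28, 1)
print(sample.min() >= -1.0, sample.max() <= 1.0)         # tanh keeps values in [-1, 1]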

4.4.3. Implementing the Discriminator

The Discriminator is a ConvNet of the familiar kind, one that takes in an image and outputs a prediction vector: in this case, a binary classification indicating whether the input image was deemed to be real rather than fake. Figure 4.5 depicts the Discriminator network we will implement.

Figure 4.5. The Discriminator takes in a 28 × 28 × 1 image as input, applies several convolutional layers, and—using the sigmoid activation function σ—outputs a probability that the input image is real rather than fake. Between the convolutional layers, we apply batch normalization to stabilize the training process. (Image is not to scale.)

The input to the Discriminator is a 28 × 28 × 1 image. By applying convolutions, the image is transformed such that its base (width × height) gets progressively smaller and its depth gets progressively deeper. On all convolutional layers, we apply the Leaky ReLU activation function. Batch normalization is used on all convolutional layers except the first. For output, we use a fully connected layer and the sigmoid activation function.

Putting all the steps together, we do the following:

  1. Use a convolutional layer to transform a 28 × 28 × 1 input image into a 14 × 14 × 32 tensor.
  2. Apply the Leaky ReLU activation function.
  3. Use a convolutional layer, transforming the 14 × 14 × 32 tensor into a 7 × 7 × 64 tensor.
  4. Apply batch normalization and the Leaky ReLU activation function.
  5. Use a convolutional layer, transforming the 7 × 7 × 64 tensor into a 4 × 4 × 128 tensor. (With 'same' padding and a stride of 2, a 7 × 7 input maps to a 4 × 4 output.)
  6. Apply batch normalization and the Leaky ReLU activation function.
  7. Flatten the 4 × 4 × 128 tensor into a vector of size 4 × 4 × 128 = 2,048.
  8. Use a fully connected layer feeding into the sigmoid activation function to compute the probability of whether the input image is real.

The following listing is a Keras implementation of the Discriminator model.

Listing 4.4. DCGAN Discriminator
def build_discriminator(img_shape):

    model = Sequential()

    model.add(                                  # convolutional layer, from 28 × 28 × 1 into a 14 × 14 × 32 tensor
        Conv2D(32,
               kernel_size=3,
               strides=2,
               input_shape=img_shape,
               padding='same'))

    model.add(LeakyReLU(alpha=0.01))            # Leaky ReLU activation

    model.add(                                  # convolutional layer, from 14 × 14 × 32 into a 7 × 7 × 64 tensor
        Conv2D(64,
               kernel_size=3,
               strides=2,
               input_shape=img_shape,
               padding='same'))

    model.add(BatchNormalization())             # batch normalization

    model.add(LeakyReLU(alpha=0.01))            # Leaky ReLU activation

    model.add(                                  # convolutional layer, from 7 × 7 × 64 into a 4 × 4 × 128 tensor
        Conv2D(128,
               kernel_size=3,
               strides=2,
               input_shape=img_shape,
               padding='same'))

    model.add(BatchNormalization())             # batch normalization

    model.add(LeakyReLU(alpha=0.01))            # Leaky ReLU activation

    model.add(Flatten())                        # flatten the tensor and compute the output probability with sigmoid
    model.add(Dense(1, activation='sigmoid'))

    return model
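
Likewise, an optional sanity check (again, our own addition) confirms that the Discriminator maps a batch of images to one probability each, thanks to the sigmoid output:

import numpy as np

disc_test = build_discriminator(img_shape)               # untrained Discriminator
batch = np.random.uniform(-1, 1, (16, 28, 28, 1))        # stand-in batch of images
preds = disc_test.predict(batch)
print(preds.shape)                                       # (16, 1)
print(((preds >= 0) & (preds <= 1)).all())               # sigmoid keeps values in [0, 1]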

4.4.4. Building and running the DCGAN

Aside from the network architectures used for the Generator and the Discriminator, the rest of the DCGAN network setup and implementation is the same as the one we used for the simple GAN in chapter 3. This underscores the versatility of the GAN architecture. Listing 4.5 builds the model, and listing 4.6 trains the model.

Listing 4.5. Building and compiling the DCGAN
def build_gan(generator, discriminator):

    model = Sequential()

    model.add(generator)                                      # combine the Generator and Discriminator into a single model
    model.add(discriminator)

    return model

discriminator = build_discriminator(img_shape)                # build and compile the Discriminator
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])

generator = build_generator(z_dim)                            # build the Generator

discriminator.trainable = False                               # keep the Discriminator's parameters constant during Generator training

gan = build_gan(generator, discriminator)                     # build and compile the GAN with the fixed Discriminator to train the Generator
gan.compile(loss='binary_crossentropy', optimizer=Adam())

Listing 4.6. DCGAN training loop
losses = []
accuracies = []
iteration_checkpoints = []


def train(iterations, batch_size, sample_interval):

    (X_train, _), (_, _) = mnist.load_data()                              # load the MNIST dataset

    X_train = X_train / 127.5 - 1.0                                       # rescale [0, 255] grayscale pixel values to [-1, 1]
    X_train = np.expand_dims(X_train, axis=3)

    real = np.ones((batch_size, 1))                                       # labels for real images: all 1s

    fake = np.zeros((batch_size, 1))                                      # labels for fake images: all 0s

    for iteration in range(iterations):


        idx = np.random.randint(0, X_train.shape[0], batch_size)          # get a random batch of real images
        imgs = X_train[idx]

        z = np.random.normal(0, 1, (batch_size, 100))                     # generate a batch of fake images
        gen_imgs = generator.predict(z)

        d_loss_real = discriminator.train_on_batch(imgs, real)            # train the Discriminator
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)


        z = np.random.normal(0, 1, (batch_size, 100))                     # generate a batch of fake images
        gen_imgs = generator.predict(z)

        g_loss = gan.train_on_batch(z, real)                              # train the Generator

        if (iteration + 1) % sample_interval == 0:

            losses.append((d_loss, g_loss))                               # save losses and accuracies to plot after training
            accuracies.append(100.0 * accuracy)
            iteration_checkpoints.append(iteration + 1)

            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %          # output training progress
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))

            sample_images(generator)                                      # output a sample of generated images

For completeness, we are also including the sample_images() function in the following listing. Recall from chapter 3 that this function outputs a 4 × 4 grid of images synthesized by the Generator in a given training iteration.

Listing 4.7. Displaying generated images
def sample_images(generator, image_grid_rows=4, image_grid_columns=4):

    z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim))  # sample random noise

    gen_imgs = generator.predict(z)                                          # generate images from the random noise

    gen_imgs = 0.5 * gen_imgs + 0.5                                          # rescale pixel values from [-1, 1] to [0, 1]

    fig, axs = plt.subplots(image_grid_rows,                                 # set the image grid
                            image_grid_columns,
                            figsize=(4, 4),
                            sharey=True,
                            sharex=True)

    cnt = 0
    for i in range(image_grid_rows):
        for j in range(image_grid_columns):
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')            # output a grid of images
            axs[i, j].axis('off')
            cnt += 1

Next, the following code is used to run the model.

Listing 4.8. Running the model
iterations = 20000                                 # set training hyperparameters
batch_size = 128
sample_interval = 1000

train(iterations, batch_size, sample_interval)     # train the DCGAN for the specified number of iterations

4.4.5. Model output

Figure 4.6 shows a sample of handwritten digits produced by the Generator after the DCGAN is fully trained. For a side-by-side comparison, figure 4.7 shows a sample of digits produced by the GAN from chapter 3, and figure 4.8 shows a sample of real handwritten numerals from the MNIST dataset.

Figure 4.6. A sample of handwritten digits generated by a fully trained DCGAN
Figure 4.7. A sample of handwritten digits generated by the GAN implemented in chapter 3
Figure 4.8. A randomly generated grid of real handwritten digits from the MNIST dataset used to train our DCGAN. Unlike the images produced by the simple GAN we implemented in chapter 3, many of the handwritten digits produced by the fully trained DCGAN are essentially indistinguishable from the training data.

As evidenced by the preceding figures, all the extra work we put into implementing DCGAN paid off handsomely. Many of the images of handwritten digits that the network produced after being fully trained are virtually indistinguishable from the ones written by a human hand.


4.5. Conclusion

DCGAN demonstrates the versatility of the GAN framework. In theory, the Discriminator and Generator can be represented by any differentiable function, even one as complex as a multilayer convolutional network. However, DCGAN also demonstrates that there are significant hurdles to making more complex implementations work in practice. Without breakthroughs such as batch normalization, DCGAN would fail to train properly.

In the following chapter, we will explore some of the theoretical and practical limitations that make GAN training so challenging, as well as the approaches to overcome them.

Summary

  • Convolutional neural networks (ConvNets) use one or more convolutional filters that slide over the input volume. At each step as it slides over the input, a filter uses a single set of parameters to produce a single activation value. Together, all the activation values from all the filters produce the output layer.
  • Batch normalization is a method that reduces the covariate shift (variations in input value distributions between layers during training) in neural networks by normalizing the output of each layer before it is passed as input to the next layer.
  • Deep Convolutional GAN (DCGAN) is a Generative Adversarial Network with convolutional neural networks as its Generator and Discriminator. This architecture achieves superior performance in image-processing tasks, including handwritten digit generation, which we implemented in a code tutorial.