Chapter 6. Progressing with GANs


This chapter covers

  • Progressively growing Discriminator and Generator networks throughout training
  • Making training more stable, and the output more varied and of higher quality and resolution
  • Using TFHub, a new central repository for models and TensorFlow code

In this chapter, we provide a hands-on tutorial to build a Progressive GAN by using TensorFlow and the newly released TensorFlow Hub (TFHub). The Progressive GAN (aka PGGAN, or ProGAN) is a cutting-edge technique that has managed to generate full-HD photorealistic images. Presented at one of the top machine learning conferences, the International Conference on Learning Representations (ICLR) in 2018, this technique made such a splash that Google immediately integrated it as one of the few models to be part of the TensorFlow Hub. In fact, this technique was lauded by Yoshua Bengio—one of the grandfathers of deep learning—as “almost too good to be true.” When it was released, it became an instant favorite of academic presentations and experimental projects.

We recommend that you go through this chapter with TensorFlow 1.7 or higher, but 1.8+ was the latest release at the time of writing, so that was the one we used. For TensorFlow Hub, we suggest using a version no later than 0.4.0, because later versions have trouble importing due to compatibility issues with TensorFlow 1.x. After reading this chapter, you'll be able to implement all the key improvements of the Progressive GAN. These four innovations are as follows:

  • Progressively growing and smoothly fading in higher-resolution layers
  • Mini-batch standard deviation
  • Equalized learning rate
  • Pixel-wise feature normalization

This chapter features two main examples:

  • Code for the crucial innovations of Progressive GANs—more specifically, the smooth fading in of higher-resolution layers and the other three innovations as listed previously. The rest of the implementation of the Progressive GAN technique is too substantial to be included in this book.
  • A pretrained, easily downloadable implementation as provided by Google on TFHub, which is a new centralized repository for machine learning models, similar to Docker Hub or the Conda and PyPI repositories in the software package world. This implementation will allow us to use latent space interpolation to control the features of the generated examples. It will briefly touch on the seeding vectors in the latent space of the Generator so that we can get the pictures that we want. You saw this idea in chapters 2 and 4.

The reasons we decided to implement the PGGAN using TFHub rather than from the ground up, as we do in all the other chapters, are threefold:

  • Especially for practitioners, we want to make sure you are—at least in one chapter—exposed to the software engineering best practices that may speed up your workflow. Want to try a quick GAN on your problem? Just use one of the implementations on TFHub. There are now many more than when we were first writing this chapter, including many reference implementations (for example, for BigGAN in chapter 12 and NS-GAN in chapter 5). We want to give you access to easy-to-use, state-of-the-art examples, because this is the way machine learning is going—automating as much of machine learning as possible so we can focus on what matters the most: delivering impact. Google's Cloud AutoML (https://cloud.google.com/automl/) and Amazon SageMaker (https://aws.amazon.com/sagemaker/) are prime examples of this trend. Even Facebook recently introduced PyTorch Hub, so both major frameworks now have one.
  • The original implementation of the PGGAN took the NVIDIA researchers one to two months to run, which we thought was impractical for any person to run on their own, especially if you want to experiment or get something wrong.[1] TFHub still gives you a fully trainable PGGAN, so if you want to repurpose the days of computation for something else, you can!

    1 See “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” by Tero Karras, 2018, https://github.com/tkarras/progressive_growing_of_gans.

  • We still want to show you PGGANs' most important innovations. But if we want to explain those well—including code—we can't fit all the implementation details into one chapter, even in Keras, as all the implementations tend to be pretty sizeable. TFHub allows us to skip over the boilerplate code and focus on the ideas that matter.

6.1. Latent space interpolation

Recall from chapter 2 that we have this lower-resolution space—called latent space—that seeds our output. As with the DCGAN from chapter 4 and indeed the Progressive GAN, the trained latent space has semantically meaningful properties. It means that we can find the vector offsets that, for example, introduce eyeglasses to an image of a face, and the same offset will introduce glasses in new images. We can also pick two random vectors and then move in equal increments between them and so gradually—smoothly—get an image that matches the second vector.

This is called interpolation, and you can see this process in figure 6.1. As the authors of BigGAN said, meaningful transitions from one vector to another show that the GAN has learned some underlying structure.
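For intuition, here is a minimal sketch of linear interpolation between two latent vectors. The latent dimensionality of 100 and the generator.predict call are placeholder assumptions for whatever trained Generator you have at hand, not any specific model's API:

import numpy as np

latent_dim = 100                                  # assumed latent dimensionality
z_start = np.random.normal(size=(latent_dim,))    # first random latent vector
z_end = np.random.normal(size=(latent_dim,))      # second random latent vector

steps = 8                                         # number of equal increments
interpolated = np.stack([z_start + (z_end - z_start) * t
                         for t in np.linspace(0.0, 1.0, steps)])

# Each row can then be fed to a trained Generator, for example:
# images = generator.predict(interpolated)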

Figure 6.1. We can perform latent space interpolation because the latent vector we send to the Generator produces consistent outcomes that are predictable in some ways; not only is the generative process predictable, but the output also is not jagged—it does not react sharply to small changes in the latent vector. If we, for example, want an image that is a blend of two faces, we just need to search somewhere around the average of the two vectors.

6.2. They grow up so fast

In previous chapters, you learned which results are easy to achieve with GANs and which are difficult. Moreover, things like mode collapse (showing only a few examples of the overall distribution) and lack of convergence (one of the causes of the poor quality of the results) are no longer alien terms to us.

Recently, a Finnish NVIDIA team released a paper that has managed to blow many previous cutting-edge papers out of the water: “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” by Tero Karras et al. This paper features four fundamental innovations, so let's walk through them in order.

6.2.1. Progressive growing and smoothing of higher-resolution layers

Before we dive into what the Progressive GAN does, let's start with a simple analogy. Imagine looking at a mountain region from a bird's-eye view: you have lots of valleys, which have nice creeks and villages—generally quite habitable. Then you have many mountain tops that are rough and generally unpleasant to live on because of weather conditions. This sort of represents the loss function landscape, where we want to minimize the loss by going down the mountain slopes and into the valleys, which are much nicer.

We can imagine training as dropping a mountaineer onto a random place in this mountain region and then following their path down the slope into a valley. This is what stochastic gradient descent does, and chapter 10 revisits this in a lot more detail. Now, unfortunately, if we start with a very complex mountain range, the mountaineer will not know which direction to travel. The space around our adventurer would be jagged and rough. It would be difficult to make out where the nicest, lowest valley is, with lots of habitable land. Instead, we zoom out and reduce the complexity of the mountain range to give the mountaineer a high-level picture of this particular area.

As our mountaineer gets closer to a valley, we can start increasing the complexity by zooming in on the terrain. Then we no longer see just the aliased, upscaled texture, but instead get to see the finer details. This approach has the advantage that as our mountaineer goes down the slope, they can easily make little optimizations to make the hiking easier. For example, they can take a path through a dried-up creek to make the descent into the valley even faster. That is progressive growing: increasing the resolution of the terrain as we go.

Hwvreeo, jl uxy zepx kvtx nock nz uxkn orlwd umtecpor bcmv vt ldcresol xer yqulcki thhorug Google Earth wrjb 3U kn, bdv vwon rsrb liucykq aiegnsrinc gxr iuoslortne kl vrg anriert nuador egh csn gk iastrltng cqn uantelnaps. Dstbjce sff lk c ddnseu gmhi nkjr ixeetescn. Sx dtneais, wo loirverssegpy smooth in ncy wllsoy oecrtduni otmk tocpimlyex az ory uintnreaeom road rleocs rk org jecobeivt.

In technical terms, we are going from a few low-resolution convolutional layers to many high-resolution ones as we train. Thus, we first train the early layers and only then introduce a higher-resolution layer, where it is harder to navigate the loss space. We go from something simple—for example, 4 × 4 trained for several steps—to something more complex—for example, 1024 × 1024 trained for several epochs, as shown in figure 6.2.

Figure 6.2. Can you see how we start with a smooth mountain range and gradually increase the complexity by zooming in? That is effectively what adding extra layers does to the loss function. This is handy, as our mountain region (loss function) is much easier to navigate when it is less jagged. You can think of it as follows: when we have a more complex structure (b), the loss function is jagged and hard to navigate (d), because there are so many parameters—especially in early layers—that can have a massive impact and generally increase the dimensionality of the problem. However, if we initially remove some part of the complexity (a), we can early on get a loss function that is much easier to navigate (c) and increases in complexity only as we gain confidence that we are at the approximately right part of the loss space. Only then do we move from (a) and (c) into (b) and (d) versions.

The problem in this scenario is that upon introducing even one more layer at a time (for example, going from 4 × 4 to 8 × 8), we are still introducing a massive shock to the training. What the PGGAN authors do instead is smoothly fade in those layers, as in figure 6.3, in order to give the system time to adapt to the higher resolution.

Figure 6.3. When we’ve trained for enough steps with, say, 16 × 16 resolution (a), we introduce another transposed convolution in the Generator (G) and another convolution in the Discriminator (D) to get the “interface” between G and D to be 32 × 32. But we also introduce two pathways: (1 – α) simple nearest neighbor upscaling, which does not have any trained parameters, but is also quite naive; and (α) extra transposed convolution, which requires training but will ultimately perform much better.

However, rather than immediately jumping to this resolution, we smoothly fade in this new layer with higher resolution by a parameter alpha (α), which is between 0 and 1. Alpha affects how much we use either the old—but upscaled—layer or the natively larger one. On the side of the D, we simply shrink by 0.5x to allow for smoothly injecting the trained layer for discrimination. This is (b) in figure 6.3. When we are confident about this new layer, we keep the 32 × 32—(c) in the figure—and then we are getting ready to grow yet again after we have trained at 32 × 32 properly.

6.2.2. Example implementation

For all the innovations we've detailed, in this section we'll give you working but isolated versions so that we can also talk code. As an exercise, you may want to try implementing these things as one GAN network, maybe using the existing prior architectures. If you are ready, let's load up ye olde, trusty machine learning libraries and get cracking:

import numpy as np
import tensorflow as tf
import keras as K

In the code, progressive smoothing in may look something like the following listing.

Listing 6.1. Progressive growing and smooth upscaling
def upscale_layer(layer, upscale_factor):
    '''
    Upscales layer (tensor) by the factor (int) where
    the tensor is [group, height, width, channels]
    '''
    height, width = K.backend.int_shape(layer)[1:3]            # static spatial dimensions as plain ints
    size = (upscale_factor * height, upscale_factor * width)   # the target resolution
    upscaled_layer = tf.image.resize_nearest_neighbor(
        layer, size)                                           # naive nearest-neighbor upscaling, no trained parameters
    return upscaled_layer

def smoothly_merge_last_layer(list_of_layers, alpha):
    '''
    Smoothly merges in a layer based on a threshold value alpha.
    This function assumes that all layers are already in RGB.
    This is the function for the Generator.
    :list_of_layers    :   items should be tensors ordered by resolution
    :alpha             :   float \in (0,1)
    '''
    last_fully_trained_layer = list_of_layers[-2]              # the already-trained layer, one resolution step below
    last_layer_upscaled = upscale_layer(
        last_fully_trained_layer, 2)                           # the (1 - alpha) pathway: naive upscaling

    larger_native_layer = list_of_layers[-1]                   # the new layer, trained natively at the higher resolution

    assert larger_native_layer.get_shape() == \
        last_layer_upscaled.get_shape()                        # both pathways must match in shape to be blended

    new_layer = (1 - alpha) * last_layer_upscaled \
        + alpha * larger_native_layer                          # the alpha-weighted blend of the two pathways

    return new_layer
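As a quick sanity check, assuming the TensorFlow 1.x graph mode used throughout this chapter and purely illustrative shapes, we can exercise the merge on dummy tensors:

layer_16x16 = tf.zeros([4, 16, 16, 3])    # the already-trained, lower-resolution output
layer_32x32 = tf.zeros([4, 32, 32, 3])    # the new, natively higher-resolution output

alpha = 0.2                               # early in the fade-in, the upscaled pathway dominates
merged = smoothly_merge_last_layer([layer_16x16, layer_32x32], alpha)
print(merged.get_shape())                 # (4, 32, 32, 3)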

Now that you have an understanding of the lower-level details of progressive growing and smoothing without unnecessary complexity, hopefully you can appreciate how general this idea is. Although Karras et al. were by no means the first to come up with some way of increasing model complexity during training, this seems by far the most promising avenue and indeed the innovation that resonated the most. As of June 2019, this paper was cited over 730 times. With that context in mind, let's move on to the second big innovation.

6.2.3. Mini-batch standard deviation

The next innovation introduced by Karras et al. in their paper is mini-batch standard deviation. Before we dive into it, let's recall from chapter 5 the issue of mode collapse, which occurs when the GAN learns how to create a few good examples or only slight permutations on them. We generally want to produce faces of all the people in the real dataset, maybe not just one picture of one woman.

Therefore, Karras et al. created a way for the Discriminator to tell whether the samples it is getting are varied enough. In essence, we calculate a single extra scalar statistic for the Discriminator. This statistic is the standard deviation of all the pixels in the mini-batch that are generated by the Generator or that come from the real data. That is an amazingly simple and elegant solution: now all the Discriminator needs to learn is that if the standard deviation is low in the images from the batch it is evaluating, the image is likely fake, because the real data has more variance.[2] The Generator has no choice but to increase the variance of the generated samples to have a chance of fooling the Discriminator.

2 Some may object that this can also happen when the sampled real data includes a lot of very similar pictures. Though this is technically true, in practice this is easy to fix, and remember that the similarity would have to be so high that a single pass of a simple nearest neighbor clustering would reveal it.

Moving beyond the intuition, the technical implementation is straightforward, as it applies only to the Discriminator. Given that we also want to minimize the number of trainable parameters, we include only a single extra number, which seems to be enough. This number is appended as a feature map—think dimension or the last number in the tf.shape list.

The exact procedure is as follows; it is sketched in NumPy after this list and implemented fully in listing 6.2:

  1. [4D -> 3D] We compute the standard deviation across all the images in the batch, across all the remaining channels—height, width, and color. We then get a single image with standard deviations for each pixel and each channel.
  2. [3D -> 2D] We average the standard deviations across all the channels—to get a single feature map or matrix of standard deviations for that pixel, but with a collapsed color channel.
  3. [2D -> Scalar/0D] We average the standard deviations for all the pixels within the preceding matrix to get a single scalar value.
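Before the full Keras/TensorFlow version in listing 6.2, here is a minimal NumPy sketch of just these three reductions (the batch shape is illustrative):

import numpy as np

batch = np.random.rand(16, 128, 128, 3)    # [batch, height, width, channels]

pixel_std = batch.std(axis=0)              # step 1, 4D -> 3D: per-pixel, per-channel standard deviation
map_std = pixel_std.mean(axis=-1)          # step 2, 3D -> 2D: averaged over the color channels
scalar_stat = map_std.mean()               # step 3, 2D -> scalar: the single statistic fed to the Discriminator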
Listing 6.2. Mini-batch standard deviation
def minibatch_std_layer(layer, group_size=4):
    '''
    Will calculate minibatch standard deviation for a layer.
    Will do so under a prespecified tf-scope with Keras.
    Assumes layer is a float32 data type. Else needs validation/casting.
    NOTE: there is a more efficient way to do this in Keras, but just for
    clarity and alignment with major implementations (for understanding)
    this was done more explicitly. Try this as an exercise.
    '''
    group_size = K.backend.minimum(group_size, tf.shape(layer)[0])     # the group size must not exceed the mini-batch size

    shape = list(K.int_shape(layer))                                   # static shape, channel-first: [batch, channels, height, width]
    shape[0] = tf.shape(layer)[0]                                      # the batch dimension is only known at runtime

    minibatch = K.backend.reshape(layer,
        (group_size, -1, shape[1], shape[2], shape[3]))                # splits the batch into groups: [G, M, C, H, W]
    minibatch -= tf.reduce_mean(minibatch, axis=0, keepdims=True)      # centers each group around its mean
    minibatch = tf.reduce_mean(K.backend.square(minibatch), axis=0)    # variance per pixel and channel: [M, C, H, W]
    minibatch = K.backend.sqrt(minibatch + 1e-8)                       # standard deviation; epsilon guards against sqrt(0)
    minibatch = tf.reduce_mean(minibatch, axis=[1, 2, 3],
                               keepdims=True)                          # averages over channels and pixels: one scalar per group
    minibatch = K.backend.tile(minibatch,
        [group_size, 1, shape[2], shape[3]])                           # broadcasts that scalar back to a full feature map
    return K.backend.concatenate([layer, minibatch], axis=1)           # appends it to the layer as one extra feature map
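A quick shape check of this function might look as follows; note the channel-first layout implied by the final concatenation along axis 1, and that the sizes are illustrative:

with tf.Session() as session:
    layer = tf.zeros([16, 3, 64, 64])       # a dummy batch: [batch, channels, height, width]
    augmented = minibatch_std_layer(layer)
    print(session.run(augmented).shape)     # (16, 4, 64, 64): one extra feature map appended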

6.2.4. Equalized learning rate

Equalized learning rate is one of those deep learning dark-art techniques that is probably not clear to anyone. Although the researchers do provide a short explanation in the PGGAN paper, they avoided the topic in most presentations, suggesting that this is probably just a hack that seems to work. Frequently in deep learning, this is the case.

Furthermore, many nuances about equalized learning rate require a solid understanding of the implementation of RMSProp or Adam—which is the optimizer used—and also of weight initialization. So don't worry if this does not make sense to you, because it probably does not really make sense to anyone.

But if you're curious, the explanation goes something like this: we need to ensure that all the weights (w) are normalized (w') to be within a certain range, such that w' = w/c for a constant c that is different for each layer, depending on the shape of the weight matrix. This also ensures that if any parameters need to take bigger steps to reach the optimum—because they tend to vary more—these relevant parameters can do that.

Karras et al. use a simple standard normal initialization and then scale the weights per layer at runtime. Some of you may be thinking that Adam already does that—yes, Adam allows learning rates to be different for different parameters, but there's a catch. Adam adjusts the backpropagated gradient by the estimated standard deviation of the parameter, which ensures that the scale of that parameter is independent of the update. Adam has different learning rates in different directions, but does not always take into account the dynamic range—how much a dimension or feature tends to vary over given mini-batches. As some point out, this seems to solve a similar problem as weight initialization.[3]

3 See “Progressive Growing of GANs.md,” by Alexander Jung, 2017, http://mng.bz/5A4B.

However, if this is not clear, do not worry; we highly recommend two excellent resources: Andrej Karpathy's 2016 computer science lecture for notes about weight initialization,[4] and a Distill article for details on how Adam works.[5] The following listing shows the equalized learning rate.

4 See “Lecture 5: Training Neural Networks, Part I,” by Fei-Fei Li et al., 2016, http://mng.bz/6wOo.

5 See “Why Momentum Really Works,” by Gabriel Goh, 2017, Distill, https://distill.pub/2017/momentum/.

Listing 6.3. Equalized learning rate
def equalize_learning_rate(shape, gain, fan_in=None):
    '''
    This adjusts the weights of every layer by the constant from
    He's initializer so that we adjust for the variance in the dynamic
    range in different features
    shape   :  shape of tensor (layer): these are the dimensions
        of each layer.
    For example, [4,4,48,3]. In this case, [kernel_size, kernel_size,
        number_of_filters, feature_maps]. But this will depend
        slightly on your implementation.
    gain    :  typically sqrt(2)
    fan_in  :  adjustment for the number of incoming connections
        as per Xavier's / He's initialization
    '''
    if fan_in is None:
        fan_in = np.prod(shape[:-1])                        # default fan-in: product of all dimensions but the feature maps
    std = gain / np.sqrt(fan_in)                            # He's initialization constant
    wscale = K.backend.constant(std, name='wscale',
                                dtype=np.float32)           # the per-layer scaling constant
    adjusted_weights = K.backend.random_normal_variable(    # standard normal weights, rescaled at runtime
        shape, 0.0, 1.0, name='layer') * wscale
    return adjusted_weights
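As a sanity check of the scaling constant alone, we can compute the per-layer constant by hand, using the docstring's example shape:

import numpy as np

shape = [4, 4, 48, 3]                 # [kernel_size, kernel_size, number_of_filters, feature_maps]
fan_in = np.prod(shape[:-1])          # 4 * 4 * 48 = 768 incoming connections
print(np.sqrt(2) / np.sqrt(fan_in))   # ~0.051: every weight in this layer gets scaled by this constant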

If you are still confused, rest assured that these initialization tricks and complicated learning rate adjustments are rarely a point of differentiation in either academia or industry. Also, just because restricting weight values between –1 and 1 seems to work somewhat better in most runs here, that does not mean this trick will generalize to other setups. So let's move on to better-proven techniques.

6.2.5. Pixel-wise feature normalization in the generator

Let's begin with some motivation for why we would even want to normalize the features—stability of training. Empirically, the authors from NVIDIA have discovered that one of the early signs of diverging training was an explosion in feature magnitudes. A similar observation was made by the BigGAN authors in chapter 12. So Karras et al. introduced a technique to combat this. On a broader note, this is frequently how GAN training is done: we observe a particular problem with the training, so we introduce mechanisms to prevent that problem from happening.

Note that most networks are using some form of normalization. Typically, they use either batch normalization or a virtual version of this technique. Table 6.1 presents an overview of the normalization techniques used in the GANs presented in this book so far. You saw these in chapter 4 (DCGAN) and chapter 5—where we touched on the rest of the GANs and gradient penalties (GPs). Unfortunately, in order for batch normalization and its virtual equivalent to work, we must have large mini-batches so that the individual samples average themselves out.

Table 6.1. Use of normalization techniques in GANs

Method        Authors                                                        G normalization   D normalization
DCGAN         Radford et al., 2015, https://arxiv.org/abs/1511.06434         Batch             Batch
Improved GAN  Salimans et al., 2016, https://arxiv.org/pdf/1606.03498.pdf    Virtual batch     Virtual batch
WGAN          Arjovsky et al., 2017, https://arxiv.org/pdf/1701.07875.pdf    Batch             —
WGAN-GP       Gulrajani et al., 2017, http://arxiv.org/abs/1704.00028        Batch             Layer norm

Based on the fact that all these major implementations use normalization, it is clearly important—but why not just use standard batch normalization? Unfortunately, batch normalization is too memory intensive at our resolution. We have to come up with something that allows us to work with a few examples—so that they fit into our GPU memory along with the two network graphs—but still works well. Now we understand where the need for pixel-wise feature normalization comes from and why we use it.

If we jump into the algorithm, pixel normalization takes the activation magnitude at each layer just before the input is fed into the next layer.

Pixel-wise feature normalization

For each feature map do

  1. Take the pixel value of that feature map (fm) at a position (x, y).
  2. Construct a vector for each (x, y), where
    1. v0,0 = [(0,0) value for fm1, (0,0) value for fm2, ...., (0,0) value for fmn]
    2. v0,1 = [(0,1) value for fm1, (0,1) value for fm2, ...., (0,1) value for fmn] ...
    3. vn,n = [(n,n) value for fm1, (n,n) value for fm2, ...., (n,n) value for fmn]
  3. Normalize each vector vi,j as defined in step 2 to have a unit norm; call it ni,j.
  4. Pass that in the original tensor shape to the next layer.

End for

Figure 6.4 illustrates the process of pixel-wise feature normalization. The exact description of step 3 is shown in equation 6.1.

Equation 6.1.

b_{x,y} = \frac{a_{x,y}}{\sqrt{\frac{1}{N} \sum_{j=0}^{N-1} \left(a_{x,y}^{j}\right)^{2} + \epsilon}}

where a_{x,y} is the vector of feature-map values at pixel (x, y), b_{x,y} is its normalized counterpart, N is the number of feature maps, and ε = 10^-8.
Figure 6.4. We map out all the points in an image (step 1) to a set of vectors (step 2), and then we normalize them so that they are all in the same range (typically between 0 and 1 in the high-dimensional space), which is step 3.

This formula normalizes (divides by the expression under the square root) each vector constructed in step 2 of figure 6.4. This expression is just an average of each squared value for that particular (x, y) pixel. One thing that may surprise you is the addition of a small noise term (ϵ). This is simply a way to ensure that we are not dividing by zero. The whole procedure is explained in greater detail in the 2012 paper “ImageNet Classification with Deep Convolutional Neural Networks,” by Alex Krizhevsky et al. (http://mng.bz/om4d).

The last thing to note is that this term is applied only to the Generator, as the explosion in the activation magnitudes leads to an arms race only if both networks participate. The following listing shows the code.

Listing 6.4. Pixel-wise feature normalization
def pixelwise_feat_norm(inputs, **kwargs):
    '''
    Uses pixelwise feature normalization as proposed by
    Krizhevsky et al., 2012. Returns the input normalized
    :inputs     :    Keras / TF Layers
    '''
    normalization_constant = K.backend.sqrt(K.backend.mean(
        inputs**2, axis=-1, keepdims=True) + 1.0e-8)    # per-pixel root mean square across the feature maps, plus epsilon
    return inputs / normalization_constant              # scales each pixel's feature vector to (roughly) unit norm
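A quick check that the function behaves as advertised: feature vectors of deliberately large magnitude come out with a mean square of roughly 1:

with tf.Session() as session:
    features = tf.random_normal([1, 8, 8, 512]) * 100.0   # activations with deliberately exploded magnitudes
    normed = pixelwise_feat_norm(features)
    mean_square = session.run(tf.reduce_mean(normed ** 2, axis=-1))
    print(mean_square.max())                              # ~1.0: each pixel's feature vector is normalized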

6.3. Summary of key innovations

We have gone through four clever ideas on how to improve GAN training; however, without grounding them in their effects on the training, it may be difficult to isolate those effects. Thankfully, the paper’s authors provide a helpful table to help us understand just that; see figure 6.5.

Figure 6.5. Contributions of various techniques to score improvements. We can see that the introduction of equalized learning rate makes a big difference, and pixel-wise normalization adds to that, though what the authors do not tell us is how effective this technique would be if we had only pixel normalization and did not introduce equalized learning rate. We include this table only as an illustration of the rough magnitude of improvement we can expect from these changes—which is an interesting lesson on its own—but more detailed discussion follows.

The PGGAN paper’s authors are using sliced Wasserstein distance (SWD), where smaller is better. Recall from chapter 5 that a smaller Wasserstein—aka earth mover’s—distance means better results as quantified by the amount of probability mass one has to move to make the two distributions similar. The SWD means that patches of both the real data and the generated samples minimize this distance. The nuances of this technique are explained in the paper, but as the authors said during their presentation at ICLR, better measures—such as the Fréchet inception distance (FID)—now exist. We covered the FID in greater depth in chapter 5.

One key takeaway from this table is that large mini-batches do not work well here: at megapixel resolution, we do not have enough GPU memory to load many images at once. We have to use a smaller mini-batch—which may, overall, perform worse—and we have to reduce the mini-batch sizes further as resolution grows, making our training difficult.


6.4. TensorFlow Hub and hands-on

As part of TensorFlow Extended and the general move toward bringing software engineering best practices into the machine learning world, Google has recently created a central model and code repository called TensorFlow Hub, or TFHub. Working with TFHub is almost embarrassingly easy, especially with the models that Google has put there.

After importing the hub module and calling the right URL, TensorFlow downloads and imports the model all by itself, and you can start. These models are well-documented at the same URL that we use to download the model; just put them into your web browser. In fact, to get a pretrained Progressive GAN, all you need to type is an import statement and one line of code. That’s it!

The following listing shows a complete example of code that should by itself generate a face—based on the random seed that you specify in latent_vector.[7] Figure 6.6 displays the output.

7 This example was generated with the use of TFHub and is based on the example Colab provided at http://mng.bz/nvEa.

Listing 6.5. Getting started with TFHub
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
    module = hub.Module("https://tfhub.dev/google/progan-128/1")  # imports the pretrained Progressive GAN from TFHub
    latent_dim = 512                                              # the latent dimensionality the module expects

    latent_vector = tf.random_normal([1, latent_dim], seed=1337)  # a random vector in latent space, fixed by the seed

    interpolated_images = module(latent_vector)                   # maps the latent vector to a 128 x 128 image

    with tf.Session() as session:                                 # runs the graph to produce the actual pixels
        session.run(tf.global_variables_initializer())
        image_out = session.run(interpolated_images)

plt.imshow(image_out.reshape(128,128,3))
plt.show()
Figure 6.6. Output of listing 6.5. Try changing the seed in the latent_vector definition to get different outputs. A word of warning: even though this random seed argument should consistently define the output we are meant to get, we have found that on reruns we sometimes get different results, depending on the version of TensorFlow. This image is obtained using 1.9.0-rc1.

Hopefully, this should be enough to get you started with Progressive GANs! Feel free to play around with the code and extend it. It should be noted here that the TFHub version of the Progressive GAN is not using the full 1024 × 1024, but rather just 128 × 128. This is probably because running the full version used to be computationally expensive, and the model size can grow huge quickly in the domain of computer vision problems.
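Because the module accepts a whole batch of latent vectors, we can also reproduce the latent space interpolation from section 6.1 in a few extra lines. The following sketch builds on listing 6.5; the seeds and the choice of 10 interpolation steps are arbitrary:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

with tf.Graph().as_default():
    module = hub.Module("https://tfhub.dev/google/progan-128/1")
    latent_dim = 512

    z_start = tf.random_normal([1, latent_dim], seed=42)        # one endpoint in latent space
    z_end = tf.random_normal([1, latent_dim], seed=1337)        # the other endpoint

    weights = tf.reshape(tf.linspace(0.0, 1.0, 10), [10, 1])    # 10 equal increments between 0 and 1
    latent_vectors = (1 - weights) * z_start + weights * z_end  # linear interpolation, broadcast to [10, 512]

    interpolated_images = module(latent_vectors)                # [10, 128, 128, 3]

    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        image_out = session.run(interpolated_images)

plt.imshow(np.hstack(image_out))                                # the 10 frames side by side
plt.show()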


6.5. Practical applications

Understandably, people are curious about the practical applications and ability to generalize Progressive GANs. One great example we’ll present is from our colleagues at Kheiron Medical Technologies, based in London, England. Recently, they released a paper that is a great testament to both the generalizability and practical applications of the PGGAN.[8]

8 See “High-Resolution Mammogram Synthesis Using Progressive Generative Adversarial Networks,” by Dimitrios Korkinof et al., 2018, https://arxiv.org/pdf/1807.03401.pdf.

Using a large dataset of medical mammograms,[9] these researchers managed to generate realistic 1280 × 1024 synthetic images of full-field digital mammography (FFDM), as shown in figure 6.7. This is a remarkable achievement on two levels:

9 X-ray scans for the purposes of breast cancer screening.

  • It shows the generalizability of this technique. Think about how different images of mammograms are from the images of human faces—especially structurally. The bar for whether a tissue structure makes sense is incredibly high, and yet their network manages to produce samples at the highest resolution to date that frequently fool medical professionals.
  • It shows how these techniques can be applied to many fields and uses. For example, we can use this new dataset in a semi-supervised way, as you will discover in the next chapter. Or the synthetic dataset can be open sourced for medical research with arguably fewer worries from General Data Protection Regulation (GDPR) or other legal repercussions, as these do not belong to any one person.
Figure 6.7. Progressive growing of FFDM. This is a great figure because it not only shows the progressively increasing resolution on these mammograms (e), but also some training statistics (a)–(d) to show you that training these GANs is messy for everyone, not just you.

Figure 6.8 shows how realistic these mammograms can look. These have been randomly sampled (so no cherry-picking) and then compared to one of the closest images in the dataset.

Figure 6.8. In comparing the real and the generated datasets, the data looks pretty realistic and generally close to an example in the training set. In their subsequent work, MammoGAN, Kheiron has shown that these images fool trained and certified radiologists.[11] That's generally a good sign, especially at this high resolution. Of course, in principle, we would love to have a statistical way of measuring the quality of the generation. But as we know from chapter 5, this is hard enough to do with standard images, let alone for any arbitrary GAN.

11 See “MammoGAN: High-Resolution Synthesis of Realistic Mammograms,” by Dimitrios Korkinof et al., 2019, https://openreview.net/pdf?id=SJeichaN5E.

GANs may be used for many applications beyond fighting breast cancer or generating human faces: one survey counts 62 other medical GAN applications published through the end of July 2018.[10] We encourage you to look at them—though of course, not all of them use PGGANs. Generally, GANs are enabling massive leaps in many research fields, though they are frequently applied in nonintuitive ways. We hope to make these techniques more accessible so that they can be used by more researchers. Make GANs, not war!

10 See “GANs for Medical Image Analysis,” by Salome Kazeminia et al., 2018, https://arxiv.org/pdf/1809.06222.pdf.

All of the techniques we presented in this chapter represent a general approach to solving GAN problems: training a progressively more complex model. We expect this paradigm to pick up within GANs. The same is true for TensorFlow Hub: it is to TensorFlow what PyPI/Conda is to Python. Most Python programmers use them every week!

We hope that this new Progressive GAN technique opened your eyes to what GANs can do and why people are so excited about this paper. And hopefully not just for the cat meme vector that PGGAN can produce.[12] The next chapter will give you the tools so that you can start contributing to research yourself. See you then!

12 See Gene Kogan’s Twitter image, 2018, https://twitter.com/genekogan/status/1019943905318572033.

Summary

  • We can achieve 1-megapixel synthetic images thanks to the state-of-the-art PGGAN technique.
  • This technique has four key training innovations:
    • Progressive growing and smoothing in higher-resolution layers
    • Mini-batch standard deviation to enforce variation in the generated samples
    • Equalized learning rate that ensures we can take learning steps of appropriate sizes in each direction
    • Pixel-wise vector normalization that ensures that the Generator and the Discriminator do not spiral out of control in an arms race
  • You followed a hands-on tutorial using the newly released TensorFlow Hub and got to use their downsampled version of the Progressive GAN to generate images!
  • You learned about how GANs are helping to fight cancer.