Chapter 9. CycleGAN


This chapter covers

  • Expanding on the idea of Conditional GANs by conditioning on an entire image
  • Exploring one of the most powerful and complex GAN architectures: CycleGAN
  • Presenting an object-oriented design of GANs and the architecture of the CycleGAN's four main components
  • Implementing a CycleGAN to run a conversion of apples to oranges

Finally, a technological breakthrough of almost universal appeal, seeing as everyone seems to love comparing apples to oranges. In this chapter, you will learn how! But this is no small feat, so we will need two Discriminators and two Generators to achieve it. That obviously complicates the architecture, so we will have to spend more time discussing it, but at the very least, it is a great point to start thinking in a fully object-oriented programming (OOP) way.


9.1. Image-to-image translation

One fascinating area of GAN applications that we touched on at the end of the previous chapter is image-to-image translation. Here, GANs have been massively successful—in video, static images, and even style transfer. Indeed, GANs have been at the forefront of many of these applications, as they enable an almost entirely new class of uses. Because of their visual nature, the more successful GAN variants typically make their rounds on YouTube and Twitter, so if you have not seen these videos, we encourage you to check them out by searching for pix2pix, CycleGAN, or vid2vid.

This type of translation in practice means that the input to the Generator is a picture, because we need our Generator (translator) to start from this image. In other words, we are mapping an image from one domain to another. Previously, the latent vector seeding the generation was typically a somewhat uninterpretable vector. Now we are swapping that for an input image.

A good way to think of image-to-image translation is as a special case of the Conditional GAN. However, in this case, we are conditioning on a complete image (rather than just a class)—typically of the same dimensionality as the output image—that is then provided to the network as a kind of label (presented in chapter 8). One of the first famous examples in this space was an image-translation work coming out of the University of California, Berkeley, as shown in figure 9.1.

Figure 9.1. Conditional GANs provide a powerful framework for image translation that performs well across many domains.

(Source: “Image-to-Image Translation with Conditional Adversarial Networks,” by Phillip Isola, https://github.com/phillipi/pix2pix.)

As you can see, we can map from any of the following:

  • From semantic labels (for example, drawing blue where a car should be and purple where a road should be) to photorealistic images of streets
  • From satellite images to a view like the one in Google Maps
  • From day images to night images
  • From black-and-white to color
  • From outlines to synthesized fashion items

The idea is clearly powerful and versatile; however, the issue lies with the need for paired data. From chapter 8, you understand that we need labels for the Conditional GAN. Because in this case we are using another image as a label, the mapping does not make sense unless we are mapping to the corresponding image—the exact same image, except in the other domain.

So, the night image needs to be taken from exactly the same place as the day image. The fashion item's outline needs to have the exact match of a fully synthesized/colorized item in the training set in the other domain. In other words, during training, the GAN needs to have access to corresponding labels of the items in the original domain.

This is typically done—for example, in the case of black-and-white images—by first taking loads of colored pictures, applying the B&W filter on all of them, and then using the unmodified image as one domain and the B&W-filtered images as the other. This ensures that we have the corresponding images in both domains. Then we can apply the trained GAN anywhere, but if we do not have an easy way of generating these "perfect" pairs, we are out of luck!
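
To make the idea of manufacturing such pairs concrete, here is a minimal sketch (not code from the book) of building a color/black-and-white pair from a single color image; the array simply stands in for a real photo, and the luma weights are the standard grayscale coefficients:

import numpy as np

color_img = np.random.rand(128, 128, 3)                   # stands in for a real color photo (domain A)
luma = np.array([0.299, 0.587, 0.114])                    # standard grayscale weights
bw_img = (color_img * luma).sum(axis=-1, keepdims=True)   # its B&W counterpart (domain B)
# (color_img, bw_img) is now a "perfect pair" spanning the two domains.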


9.2. Cycle-consistency loss: There and back aGAN

The genius insight of this UC Berkeley group was that we do not, in fact, need perfect pairs.[1] Instead, we simply complete the cycle: we translate from one domain to another and then back again. For example, we go from a summer picture (domain A) of a park to a winter one (domain B) and then back again to summer (domain A). Now we have essentially created a cycle, and, ideally, the original picture (a) and the reconstructed picture (â) are the same. If they are not, we can measure their loss on a pixel level, thereby getting the first piece of our CycleGAN: cycle-consistency loss, which is depicted in figure 9.2.

1 See “Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks,” by Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf.

Figure 9.2. Because the loss works both ways, we can now reproduce not just images from summer to winter, but also from winter to summer. If G is our Generator from A to B, and F is our Generator from B to A, then F(G(a)) ≈ a and G(F(b)) ≈ b.

(Source: Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf.)

A common analogy is thinking about the process of back-translation—a sentence in Chinese that is translated to English and then back again to Chinese should give back the same sentence. If not, we can measure the cycle-consistency loss by how much the first and the third sentences differ.

To be able to use the cycle-consistency loss, we need two Generators: one translating from A to B, called GAB, sometimes referred to as simply G, and then another one translating from B to A, called GBA, referred to as F for brevity. There are technically two losses—forward cycle-consistency loss and backward cycle-consistency loss—but because all they mean is that F(G(a)) ≈ a as well as G(F(b)) ≈ b, you may think of these as essentially the same, but off by one.
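
Written out in the notation that table 9.1 uses later (G translates A to B, F translates B to A), the two terms combine into a single cycle-consistency loss:

Lcyc(G, F) = Ea~p(a)[||F(G(a)) – a||1] + Eb~p(b)[||G(F(b)) – b||1]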


9.3. Adversarial loss

In addition to the cycle-consistency loss, we still have the adversarial loss. Every translation with a Generator GAB has a corresponding Discriminator DB, and GBA has Discriminator DA. The way to think about it is that we are always testing, when translating to domain B, whether the picture looks real; hence we use DB, and vice versa.

This is the same idea as with simpler architectures, but now, because of the two losses, we have two Discriminators. We need to make sure that not only the translation from apple to orange looks real, but also that the translation from our estimated orange back to the reconstructed apple looks real. Recall that the adversarial loss ensures that the images look real, and as a result, it is still key for the CycleGAN to work. Hence the adversarial loss is presented second. The first Discriminator in the cycle is especially important—otherwise, we'd simply get noise that would help the GAN memorize what it should reconstruct.[2]

2 In practice, this is a little bit more complicated and would depend on, for example, whether you include both forward and backward cyclical loss. But you may use this as a mental model for how to think of the importance of the adversarial loss—remembering that we have both mappings A-B-A and B-A-B, so both Discriminators get to be the first one at some point.
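
For reference, the adversarial term for Discriminator DB in the same notation (the familiar NS-GAN loss from chapter 5) is:

LGAN(G, DB, A, B) = Eb~p(b)[log DB(b)] + Ea~p(a)[log(1 – DB(GAB(a)))]

with an equivalent term LGAN(F, DA, B, A) for Discriminator DA.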


9.4. Identity loss

The idea of identity loss is simple: we want to enforce that the CycleGAN preserves the overall color structure (or temperature) of the picture. So we introduce a regularization term that helps us keep the tint of the picture consistent with the original image. Imagine this as a way of ensuring that even after applying many filters onto your image, you still can recover the original image.

This is done by feeding the images already in domain B to the Generator from A to B (GAB), because the CycleGAN should understand that they are already in the correct domain. In other words, we penalize unnecessary changes to the image: if we feed in a zebra and are trying to "zebrafy" an image, we get the same zebra back, as there is nothing to do.[3] Figure 9.3 illustrates the effects of identity loss.

3 Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf. More at http://mng.bz/loE8.

Figure 9.3. A picture is worth a thousand words to clarify the effects of identity loss: there is a clear tint in the cases without identity loss, and since there seems to be no reason for it, we penalize this behavior. Even in black and white, you should be able to see the difference. However, to see the full extent of it, check out the full-color version online.

Even though identity loss is not, strictly speaking, required for the CycleGAN to work, we include it for completeness. Both our implementation and the CycleGAN authors' latest implementation contain it, because frequently this adjustment leads to empirically better results and enforces a constraint that seems reasonable. But even the CycleGAN paper itself mentions it only briefly as a seeming ex-post justification, so we do not cover it extensively.
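
For completeness, the term written out in the notation of table 9.1 simply asks each Generator to leave alone an image that is already in its target domain:

Lidentity = Ea~p(a)[||GBA(a) – a||1] + Eb~p(b)[||GAB(b) – b||1]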

Table 9.1 summarizes the losses you've learned about in this chapter.

Table 9.1. Losses

Adversarial loss
  Calculation: LGAN(G, DB, A, B) = Eb~p(b)[log DB(b)] + Ea~p(a)[log(1 – DB(GAB(a)))] (This is just the good old NS-GAN presented in chapter 5.)
  Measures: As in previous cases, the loss measures two terms: first is the likelihood of a given image being the real one rather than the translated image. Second is the part where the Generator may get to fool the Discriminator. Note that this formulation is only for DB, with an equivalent term for DA that comes into the final loss.
  Ensures: That the translated images look real, sharp, and indistinguishable from the real ones.

Cycle-consistency loss: forward pass
  Calculation: Difference between a and F(G(a)), denoted by ||F(G(a)) – a||1.[a]
  Measures: The difference between the images from the original domain a and the twice-translated images F(G(a)).
  Ensures: That the original image and the twice-translated image are the same. If this fails, we may not have a coherent mapping A-B-A.

Cycle-consistency loss: backward pass
  Calculation: Difference between b and G(F(b)), denoted by ||G(F(b)) – b||1.
  Measures: The difference between the images from the original domain b and the twice-translated images G(F(b)).
  Ensures: That the original image and the twice-translated image are the same. If this fails, we may not have a coherent mapping B-A-B.

Overall loss
  Calculation: L = LGAN(G, DB, A, B) + LGAN(F, DA, B, A) + λLcyc(G, F)
  Measures: All of the four losses combined (2× adversarial because of two Generators) plus cyclical loss: forward and backward in one term.
  Ensures: That the overall translation is photorealistic and makes sense (provides matching pictures).

Identity loss (outside the overall loss, for consistency with the CycleGAN paper notation)
  Calculation: Lidentity = Ea~p(a)[||GBA(a) – a||1] + Eb~p(b)[||GAB(b) – b||1]
  Measures: The difference between the image in B and GAB(b) and vice versa.
  Ensures: That the CycleGAN changes parts of the image only when it needs to.

a This notation may be unfamiliar to some, but it represents the L1 norm between the two items. For simplicity, you may think of this as, for each pixel, an absolute difference between it and the corresponding pixel on the reconstructed image.
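
As a quick illustration of that footnote, here is a tiny NumPy sketch (made-up arrays, not the book's code) of the per-pixel L1 difference between an image and its reconstruction:

import numpy as np

a = np.random.rand(128, 128, 3)          # original image in domain A
a_hat = np.random.rand(128, 128, 3)      # stand-in for the twice-translated image F(G(a))
l1 = np.mean(np.abs(a - a_hat))          # mean absolute (L1) per-pixel difference
print(l1)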


9.5. Architecture

The CycleGAN setup builds directly on the CGAN architecture and is, in essence, two CGANs joined together—or, as the CycleGAN authors themselves point out, an autoencoder. Recall from chapter 2 that we had an input image x and the reconstructed image x*, which was the result of reconstruction after being fed through the latent space z; see figure 9.4.

Figure 9.4. In this image of an autoencoder from chapter 2, we used the analogy of compressing (step 1) a human concept into a more compact written form in a letter (step 2) and then expanding this concept out to the (imperfect) idea of the same notion in someone else’s head (step 3).

To translate this diagram into the CycleGAN's world, a is an image in the A domain, b is an image in B, and â is the reconstructed a. In CycleGAN's case, however, we are dealing with a latent space—step 2—of equal dimensionality. It just happens to be another meaningful domain (B) that the CycleGAN has to find. Even with the autoencoder, the latent space was just another domain, though it was not as easily interpretable.

Compared to what we know from chapter 2, the main new concept is the introduction of the adversarial losses. These and many other mixtures of autoencoders and GANs are an active area of research in themselves! So that is also a good area for interested researchers. But for now, think of the two mappings as two autoencoders: F(G(a)) and G(F(b)). We take the basic idea of the autoencoder—including a kind of explicit loss function as substituted by the cycle-consistency loss—and add Discriminators to it. The two Discriminators, one at each step, ensure that both translations (including into the kind of latent space) look like real images in their respective domains.

9.5.1. CycleGAN architecture: building the network

Before we jump into the actual implementation of the CycleGAN, let's briefly look at the overall simplified implementation depicted in figure 9.5. There are two flows: in the top diagram, the flow A-B-A starts from an image in domain A, and in the bottom diagram, the flow B-A-B starts with an image in domain B.

Figure 9.5. In this simplified architecture of the CycleGAN, we start with the input image, which either (1) goes to the Discriminator for evaluation or (2) is translated to one domain, evaluated by the other Discriminator, and then translated back.

(Source: “Understanding and Implementing CycleGAN in TensorFlow,” by Hardik Bansal and Archit Rathore, 2017, https://hardikbansal.github.io/CycleGANBlog/.)

The image then follows two paths: it is (1) fed to the Discriminator to get our decision as to whether it is real or not, and (2) (i) fed to the Generator to translate it to B, then (ii) evaluated by the Discriminator B to see if it looks real in domain B, and eventually (iii) translated back to A to allow us to measure the cyclic loss.
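
A schematic way to see those two paths, using placeholder callables in place of the Keras models (g_AB, g_BA, d_A, d_B) that we build later in this chapter:

import numpy as np

g_AB = g_BA = lambda x: x                  # placeholder "translators" (identity stand-ins)
d_A = d_B = lambda x: np.ones((1,))        # placeholder "discriminators"
img_A = np.random.rand(1, 128, 128, 3)     # an image from domain A

validity_real = d_A(img_A)                 # path (1): is the real image judged real in A?
fake_B = g_AB(img_A)                       # path (2i): translate A -> B
validity_fake = d_B(fake_B)                # path (2ii): does it look real in domain B?
reconstr_A = g_BA(fake_B)                  # path (2iii): translate back to A for the cyclic loss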

The bottom image is basically an off-by-one cycle of the top image and follows all the same fundamental steps. We'll use the apple2orange dataset, but many other datasets are available, including the famous horse2zebra dataset, which you can easily use by making a slight modification to the code and downloading the data by using the bash script provided.

To summarize figure 9.5 in another representation for further clarity, table 9.2 reviews all four major networks.

Table 9.2. Networks

Generator: from A to B
  Input: We load either a real picture from A or a translation from B to A.
  Output: We translate it to domain B.
  Goal: Try to create realistic-looking images in domain B.

Generator: from B to A
  Input: We load either a real picture from B or a translation from A to B.
  Output: We translate it to domain A.
  Goal: Try to create realistic-looking images in domain A.

Discriminator A
  Input: We provide a picture in the A domain—either translated or real.
  Output: The probability that the picture is real.
  Goal: Try to not get fooled by the Generator from B to A.

Discriminator B
  Input: We provide a picture in the B domain—either translated or real.
  Output: The probability that the picture is real.
  Goal: Try to not get fooled by the Generator from A to B.

9.5.2. Generator architecture

Figure 9.6 shows the architecture of the Generator. We have re-created the diagram by using the variable names from our code and included the shapes for your benefit. This is an example of a U-Net architecture, because when you draw it in a way that each resolution gets its own level, the network looks like a U.

Figure 9.6. Architecture of the Generator. The generator itself has a contraction path (d0 to d3) and expanding path (u1 to u4). The contraction and expanding paths are sometimes referred to as encoder and decoder, respectively.

A couple of things to note here:

  • We are using standard convolutional layers in the encoder.
  • From those, we create skip connections so that the information has an easier time propagating through the network. In the figure, this is denoted by the outlines and color-coding between d0 to d3 and u1 to u4, respectively. You can see that half of the blocks in the decoder are coming from those skip connections (notice double the number of feature maps!).[4]

    4 As you will see, this just means we concatenate the entire block/tensor onto the equivalently colored tensor in the decoder part of our Generator.

  • The decoder uses deconvolutional layers, with one final convolutional layer to upscale the image into the equivalent size of the original image.

The autoencoder is a useful teaching tool for the architecture of the Generator alone as well, because the Generator has an encoder-decoder architecture:

  • Encoder—Step 1 from figure 9.4: these are the convolutional layers that reduce the resolution of each feature map (layer or slice). This is the contraction path (d0 to d3).
  • Decoder—Step 3 from figure 9.4: these are the deconvolutional layers (transposed convolutions) that upscale the image back to 128 × 128. This is the expansion path (u1 to u4).

To clarify, the autoencoder model here is useful in two ways. First, the overall CycleGAN architecture can be viewed as training two autoencoders.[5] Second, the U-Net itself has parts referred to as encoder and decoder.

5 See Jun-Yan Zhu et al., 2017, https://arxiv.org/pdf/1703.10593.pdf.

You may also be a bit puzzled by the downscaling and the subsequent upscaling, but this is just so that we compress the image to the most meaningful representation, yet at the same time are able to add back all the detail. It's the same reasoning as with the autoencoder, except now we also have a path to remember the nuances. This architecture—the U-Net architecture—has just been empirically shown in several domains to perform better on various segmentation tasks. The key idea is that although during downsampling we can focus on classification and understanding of large regions, including higher-resolution skip connections preserves the detail that can then be accurately segmented.

In our implementation of the CycleGAN, we'll use the U-Net architecture with skip connections as shown in figure 9.6, which is more readable. However, many CycleGAN implementations use the ResNet architecture, which you can implement yourself with a bit more work.

Note

The main advantage of ResNet is that it uses fewer parameters and introduces a part in the middle called the transformer, which has residual connections in lieu of our encoder-decoder skip connections.

Based on our testing, at least on the dataset used, the apple2orange results remain the same. Instead of explicitly defining the transformer, we provide skip connections (as used in the diagram) from the convolutional to the deconvolutional layers. We will mention these similarities again in the code. For now, just remember this.

9.5.3. Discriminator architecture

The CycleGAN's Discriminator is based on the PatchGAN architecture—we will dive into the technical details in the code section. One thing that may be confusing is that we do not get a single float as an output of this Discriminator, but rather a set of single-channel values that may be thought of as a set of mini-discriminators that we then average together.

Ultimately, this allows the design of the CycleGAN to be fully convolutional, meaning that it can scale relatively easily to higher resolutions. Indeed, in the examples of translating video games to reality or vice versa, the CycleGAN authors have used an upscaled version of the CycleGAN, with only minor modifications, thanks to the fully convolutional design. Other than that, the Discriminator should be a relatively straightforward implementation of the Discriminators you have seen before, except there are now two of them.
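
A tiny NumPy sketch of the PatchGAN idea, scoring an 8 × 8 × 1 grid of per-patch decisions (the disc_patch shape used later in the tutorial) against a target of all ones and averaging the result; the random array is just a stand-in for the Discriminator's output:

import numpy as np

patch_scores = np.random.rand(8, 8, 1)        # stand-in for the Discriminator's patch outputs
valid = np.ones((8, 8, 1))                    # target for a real image
loss = np.mean((patch_scores - valid) ** 2)   # MSE averaged over the "mini-discriminators"
print(loss)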


9.6. Object-oriented design of GANs

We have always used objects in TensorFlow and object-oriented programming (OOP) in our code, but we have usually treated the architectures more functionally, because they were generally simple. In the CycleGAN's case, the architecture is complex, and as a result, we need a structure that allows us to keep accessing the original attributes and methods that we have defined. So we will write out the CycleGAN as a Python class of its own, with methods to build the Generator and Discriminator and to run the training.


9.7. Tutorial: CycleGAN

In this tutorial, we'll use the Keras-GAN implementation and run Keras with a TensorFlow backend.[6] Tested as late as Keras 2.2.4 and TensorFlow 1.12.0; Keras_contrib was installed from the hash 46blhs9384u3au9399s651q2u43640zz54098x64. This time, we have to use a different dataset (also to show you that, despite our love from chapter 2, we do know other datasets). But for educational purposes, we will keep using one of the simpler datasets—apple2orange. Let's jump right into it by doing all our usual imports, as shown in the following listing.

6 See the Keras-GAN GitHub repository by Erik Linder-Norén, 2017, https://github.com/eriklindernoren/Keras-GAN.

Listing 9.1. Import all the things
from __future__ import print_function, division
import scipy
from keras.datasets import mnist
from keras_contrib.layers.normalization import InstanceNormalization
from keras.layers import Input, Dense, Reshape, Flatten, Dropout, Concatenate
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
import datetime
import matplotlib.pyplot as plt
import sys
from data_loader import DataLoader
import numpy as np
import os

As promised, we'll use the object-oriented style of programming. In the following listing, we create a CycleGAN class with all the initialization parameters, including the data loader. The data loader is defined in the GitHub repository for our book. It simply loads the preprocessed data.

Listing 9.2. Starting the CycleGAN class
class CycleGAN():
    def __init__(self):
        self.img_rows = 128                                             #1
        self.img_cols = 128                                             #1
        self.channels = 3                                               #1
        self.img_shape = (self.img_rows, self.img_cols, self.channels)

        self.dataset_name = 'apple2orange'                              #2
        self.data_loader = DataLoader(dataset_name=self.dataset_name,   #3
                                      img_res=(self.img_rows, self.img_cols))

        patch = int(self.img_rows / 2**4)                               #4
        self.disc_patch = (patch, patch, 1)

        self.gf = 32                                                    #5
        self.df = 64                                                    #6

        self.lambda_cycle = 10.0                                        #7
        self.lambda_id = 0.9 * self.lambda_cycle                        #8

        optimizer = Adam(0.0002, 0.5)

Two new terms are lambda_cycle and lambda_id. The second hyperparameter influences the identity loss. The CycleGAN authors themselves note that this value influences how dramatic the changes are—especially early in the training process.[7] Setting a lower value leads to unnecessary changes: for example, completely inverting the colors early on. We have selected this value based on re-running the training process for apple2orange several times. Frequently, the process is theory-driven alchemy.

7 See “pytorch-CycleGAN-and-pix2pix Frequently Asked Questions,” by Jun-Yan Zhu, April 2019, http://mng.bz/BY58.

The first hyperparameter—lambda_cycle—controls how strictly the cycle-consistency loss is enforced. Setting this value higher will ensure that your original and reconstructed images are as close together as possible.
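
To make the weighting concrete, here is a rough sketch of how the six loss terms combine into one Generator objective, mirroring the loss_weights passed to the combined model later in listing 9.3. The individual loss values are made-up placeholders, just to make the arithmetic visible:

lambda_cycle = 10.0
lambda_id = 0.9 * lambda_cycle

adv_A, adv_B = 0.30, 0.40     # MSE adversarial terms (validity of the two translations)
cyc_A, cyc_B = 0.05, 0.06     # MAE cycle-consistency terms (A-B-A and B-A-B)
id_A, id_B = 0.02, 0.03       # MAE identity terms

g_loss = (1 * adv_A + 1 * adv_B
          + lambda_cycle * cyc_A + lambda_cycle * cyc_B
          + lambda_id * id_A + lambda_id * id_B)
print(g_loss)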

9.7.1. Building the network

So now that we have our basic parameters out of the way, we will build the basic network, as shown in listing 9.3. We will start from the high-level view and move down. This entails the following:

  1. Creating the two Discriminators DA and DB and compiling them
  2. Creating the two Generators:
    1. Instantiating GAB and GBA
    2. Creating placeholders for the image input for both directions
    3. Linking them both to an image in the other domain
    4. Creating placeholders for the reconstructed images back in the original domain
    5. Creating the identity loss constraint for both directions
    6. Setting the parameters of the Discriminators to non-trainable for now
    7. Compiling the two Generators
Listing 9.3. Building the networks
        self.d_A = self.build_discriminator()                     #1
        self.d_B = self.build_discriminator()                     #1
        self.d_A.compile(loss='mse',                              #1
                         optimizer=optimizer,                     #1
                         metrics=['accuracy'])                    #1
        self.d_B.compile(loss='mse',                              #1
                         optimizer=optimizer,                     #1
                         metrics=['accuracy'])                    #1


        self.g_AB = self.build_generator()                        #2
        self.g_BA = self.build_generator()                        #2

        img_A = Input(shape=self.img_shape)                       #3
        img_B = Input(shape=self.img_shape)                       #3

        fake_B = self.g_AB(img_A)                                 #4
        fake_A = self.g_BA(img_B)                                 #4
        reconstr_A = self.g_BA(fake_B)                            #5
        reconstr_B = self.g_AB(fake_A)                            #5
        img_A_id = self.g_BA(img_A)                               #6
        img_B_id = self.g_AB(img_B)                               #6


        self.d_A.trainable = False                                #7
        self.d_B.trainable = False                                #7

        valid_A = self.d_A(fake_A)                                #8
        valid_B = self.d_B(fake_B)                                #8

        self.combined = Model(inputs=[img_A, img_B],              #9
                              outputs=[valid_A, valid_B,          #9

                                       reconstr_A, reconstr_B,    #9
                                       img_A_id, img_B_id])       #9
        self.combined.compile(loss=['mse', 'mse',                 #9
                                    'mae', 'mae',                 #9
                                    'mae', 'mae'],                #9
                              loss_weights=[1, 1,
                                            self.lambda_cycle,
                                            self.lambda_cycle,
                                            self.lambda_id, self.lambda_id],
                              optimizer=optimizer)

One last thing to clarify from the preceding code: the outputs from the combined model come in lists of six. This is because we always get the validities (from the Discriminators), reconstructions, and identity losses—one for the A-B-A and one for the B-A-B cycle—hence six. The first two are squared errors, and the rest are mean absolute errors. The relative weights are influenced by the lambda factors described earlier.

9.7.2. Building the Generator

Krxe, xw duibl brv Generator khzv nj listing 9.4, hchiw adzx bor skip connections sz ow dreicbsed jn section 9.5.2. Xcpj aj rux U-Net architecture. Cjpz architecture jc plrmsie kr riewt qnrs rkd CxaDkr architecture, hiwch cmkv ntmioiaslmeepnt kzy. Mnihit btv Generator onfitnuc wv rstfi neiefd vbr relhep oisnncfut:

  1. Define the conv2d() function as follows:
    1. Standard 2D convolutional layer
    2. Leaky ReLU activation
    3. Instance normalization[8]

    8 Instance normalization is similar to the batch normalization in chapter 4, except that instead of normalizing based on information from the entire batch, we normalize each feature map within each channel separately. Instance normalization often results in better-quality images for tasks such as style transfer or image-to-image translation—just what we need for our CycleGAN!

  2. Define the deconv2d() function as a transposed[9] convolution (aka deconvolution) layer that does the following:

    9 Here, transposed convolution is—some argue—a more correct term. However, just think of it as the opposite of convolution, or deconvolution.

    1. Upsamples the input_layer
    2. Possibly applies dropout if we set the dropout rate
    3. Always applies InstanceNormalization
    4. More importantly, creates a skip connection between its output layer and the layer of corresponding dimensionality from the downsampling part from figure 9.4

    Then we create the actual Generator:

Note

In the upsampling step of deconv2d, we're using a simple UpSampling2D, which has no learned parameters, but rather uses nearest-neighbor interpolation.

  1. Take the input (128 × 128 × 3) and assign it to d0.
  2. Run that through a convolutional layer d1, arriving at a 64 × 64 × 32 layer.
  3. Take d1 (64 × 64 × 32) and apply conv2d to get 32 × 32 × 64 (d2).
  4. Take d2 (32 × 32 × 64) and apply conv2d to get 16 × 16 × 128 (d3).
  5. Take d3 (16 × 16 × 128) and apply conv2d to get 8 × 8 × 256 (d4).
  6. u1: Upsample d4 and create a skip connection between d3 and u1.
  7. u2: Upsample u1 and create a skip connection between d2 and u2.
  8. u3: Upsample u2 and create a skip connection between d1 and u3.
  9. u4: Use regular upsampling to arrive at a 128 × 128 × 64 image.
  10. Use a regular 2D convolution to get rid of the extra feature maps and get only 128 × 128 × 3 (height × width × color_channels).
Listing 9.4. Building the generator
    def build_generator(self):
        """U-Net Generator"""

        def conv2d(layer_input, filters, f_size=4):
            """Layers used during downsampling"""
            d = Conv2D(filters, kernel_size=f_size,
                       strides=2, padding='same')(layer_input)
            d = LeakyReLU(alpha=0.2)(d)
            d = InstanceNormalization()(d)
            return d

        def deconv2d(layer_input, skip_input, filters, f_size=4,
            dropout_rate=0):
            """Layers used during upsampling"""
            u = UpSampling2D(size=2)(layer_input)
            u = Conv2D(filters, kernel_size=f_size, strides=1,
                       padding='same', activation='relu')(u)
            if dropout_rate:
                u = Dropout(dropout_rate)(u)
            u = InstanceNormalization()(u)
            u = Concatenate()([u, skip_input])
            return u

        d0 = Input(shape=self.img_shape)     #1

        d1 = conv2d(d0, self.gf)             #2
        d2 = conv2d(d1, self.gf * 2)         #2
        d3 = conv2d(d2, self.gf * 4)         #2
        d4 = conv2d(d3, self.gf * 8)         #2

        u1 = deconv2d(d4, d3, self.gf * 4)   #3
        u2 = deconv2d(u1, d2, self.gf * 2)   #3
        u3 = deconv2d(u2, d1, self.gf)       #3

        u4 = UpSampling2D(size=2)(u3)
        output_img = Conv2D(self.channels, kernel_size=4,
                            strides=1, padding='same', activation='tanh')(u4)

        return Model(d0, output_img)
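
The note above mentioned that UpSampling2D is not learned; a quick stand-alone illustration (not part of the listing) of the nearest-neighbor behavior it uses:

import numpy as np

x = np.array([[1, 2],
              [3, 4]])
up = x.repeat(2, axis=0).repeat(2, axis=1)   # each pixel is copied into a 2 x 2 block
# up == [[1, 1, 2, 2],
#        [1, 1, 2, 2],
#        [3, 3, 4, 4],
#        [3, 3, 4, 4]]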

9.7.3. Building the Discriminator

Now for the Discriminator method, which uses a helper function that creates layers formed of 2D convolutions, LeakyReLU, and, optionally, InstanceNormalization.

We apply these layers the following way, as shown in listing 9.5:

  1. We take the input image (128 × 128 × 3) and assign it to d1 (64 × 64 × 64).
  2. We take d1 (64 × 64 × 64) and assign it to d2 (32 × 32 × 128).
  3. We take d2 (32 × 32 × 128) and assign it to d3 (16 × 16 × 256).
  4. We take d3 (16 × 16 × 256) and assign it to d4 (8 × 8 × 512).
  5. We take d4 (8 × 8 × 512) and flatten it with a final conv2d to 8 × 8 × 1.
Listing 9.5. Building the Discriminator
    def build_discriminator(self):

        def d_layer(layer_input, filters, f_size=4, normalization=True):
            """Discriminator layer"""
            d = Conv2D(filters, kernel_size=f_size,
                       strides=2, padding='same')(layer_input)
            d = LeakyReLU(alpha=0.2)(d)
            if normalization:
                d = InstanceNormalization()(d)
            return d

        img = Input(shape=self.img_shape)

        d1 = d_layer(img, self.df, normalization=False)
        d2 = d_layer(d1, self.df * 2)
        d3 = d_layer(d2, self.df * 4)
        d4 = d_layer(d3, self.df * 8)

        validity = Conv2D(1, kernel_size=4, strides=1, padding='same')(d4)

        return Model(img, validity)

9.7.4. Training the CycleGAN

With all networks written, we will now implement the method that creates our training loop. For the CycleGAN training algorithm, the details of each training iteration are as follows.

CycleGAN training algorithm

For each training iteration do

  1. Train the Discriminator:
    1. Take a mini-batch of random images from each domain (imgsA and imgsB).
    2. Use the Generator GAB to translate imgsA to domain B and vice versa with GBA.
    3. Compute DA(imgsA, 1) and DA(GBA(imgsB), 0) to get the losses for real images in A and translated images from B, respectively. Then add these two losses together. The 1 and 0 in DA serve as labels.
    4. Compute DB(imgsB, 1) and DB(GAB(imgsA), 0) to get the losses for real images in B and translated images from A, respectively. Then add these two losses together. The 1 and 0 in DB serve as labels.
    5. Add the losses from steps 3 and 4 together to get a total Discriminator loss.
  2. Train the Generator:
    1. We use the combined model to
      • Input the images from domain A (imgsA) and B (imgsB)
      • The outputs are
        1. Validity of A: DA(GBA(imgsB))
        2. Validity of B: DB(GAB(imgsA))
        3. Reconstructed A: GBA(GAB(imgsA))
        4. Reconstructed B: GAB(GBA(imgsB))
        5. Identity mapping of A: GBA(imgsA)
        6. Identity mapping of B: GAB(imgsB)
    2. We then update the parameters of both Generators in line with the cycle-consistency loss, identity loss, and adversarial loss with
      • Mean squared error (MSE) for the scalars (discriminator probabilities)
      • Mean absolute error (MAE) for images (either reconstructed or identity-mapped)

End for

The following listing implements this CycleGAN training algorithm.

Listing 9.6. Training CycleGAN
    def train(self, epochs, batch_size=1, sample_interval=50):

        start_time = datetime.datetime.now()

        valid = np.ones((batch_size,) + self.disc_patch)                  #1
        fake = np.zeros((batch_size,) + self.disc_patch)

        for epoch in range(epochs):
            for batch_i, (imgs_A, imgs_B) in enumerate(
                self.data_loader.load_batch(batch_size)):

                fake_B = self.g_AB.predict(imgs_A)                        #2
                fake_A = self.g_BA.predict(imgs_B)                        #2

                dA_loss_real = self.d_A.train_on_batch(imgs_A, valid)     #3
                dA_loss_fake = self.d_A.train_on_batch(fake_A, fake)      #3
                dA_loss = 0.5 * np.add(dA_loss_real, dA_loss_fake)        #3
                                                                          #3
                dB_loss_real = self.d_B.train_on_batch(imgs_B, valid)     #3
                dB_loss_fake = self.d_B.train_on_batch(fake_B, fake)      #3
                dB_loss = 0.5 * np.add(dB_loss_real, dB_loss_fake)        #3

                d_loss = 0.5 * np.add(dA_loss, dB_loss)                   #4

                g_loss = self.combined.train_on_batch([imgs_A, imgs_B],   #5
                                                      [valid, valid,
                                                       imgs_A, imgs_B,
                                                       imgs_A, imgs_B])
                if batch_i % sample_interval == 0:                        #6
                    self.sample_images(epoch, batch_i)                    #7
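
Listing 9.6 calls self.sample_images(), which lives in the accompanying repository and is not reproduced in this chapter. A minimal sketch of what such a method could look like follows; the load_data signature and the output file layout are assumptions about the DataLoader helper, not the exact implementation:

    def sample_images(self, epoch, batch_i):
        os.makedirs('images', exist_ok=True)
        imgs_A = self.data_loader.load_data(domain="A", batch_size=1, is_testing=True)   # assumed helper signature
        imgs_B = self.data_loader.load_data(domain="B", batch_size=1, is_testing=True)

        fake_B = self.g_AB.predict(imgs_A)         # apple -> orange
        fake_A = self.g_BA.predict(imgs_B)         # orange -> apple
        reconstr_A = self.g_BA.predict(fake_B)     # back to apple
        reconstr_B = self.g_AB.predict(fake_A)     # back to orange

        gen_imgs = np.concatenate([imgs_A, fake_B, reconstr_A,
                                   imgs_B, fake_A, reconstr_B])
        gen_imgs = 0.5 * gen_imgs + 0.5            # rescale from [-1, 1] to [0, 1]

        titles = ['Original', 'Translated', 'Reconstructed']
        fig, axs = plt.subplots(2, 3)
        for i in range(2):
            for j in range(3):
                axs[i, j].imshow(gen_imgs[i * 3 + j])
                axs[i, j].set_title(titles[j])
                axs[i, j].axis('off')
        fig.savefig('images/%s_%d_%d.png' % (self.dataset_name, epoch, batch_i))
        plt.close()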

9.7.5. Running CycleGAN

We have written all of this complicated code and are now ready to instantiate a CycleGAN object and look at some results from the sampled images:

gan = CycleGAN()
gan.train(epochs=100, batch_size=64, sample_interval=10)

Figure 9.7 shows some results of our hard work.

Figure 9.7. Apples translated into oranges, and oranges into apples. These are results as they appear verbatim in our Jupyter notebook. (Results may vary slightly based on random seeds, implementation of TensorFlow and Keras, and hyperparameters.)

9.8. Expansions, augmentations, and applications

When you get these results, we hope you will be as impressed as we were. Because of the absolutely astonishing results, lots of researchers flocked to improve on the technique. This section details a CycleGAN extension and then discusses some CycleGAN applications.

9.8.1. Augmented CycleGAN

“Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data” is a really neat extension to the standard CycleGAN that injects latent space information during both translations. Presented at ICML 2018 in Stockholm, Augmented CycleGAN gives us extra variables that drive the generative process.[10] In the same way that we have used latent space in the Conditional GAN's case, we can use it in the CycleGAN setting over and above what CycleGAN already does.

10 See “Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation,” by Ehsan Hosseini-Asl, 2019, https://arxiv.org/pdf/1807.00374.pdf.

For example, if we have an outline of a shoe in the A domain, we can generate a sample in the B domain, where the same type of shoe is blue. In the traditional CycleGAN's case, it would always be blue. But now, with the latent variables at our disposal, it can be orange, yellow, or whatever we choose.

This is also a useful framework for thinking about the limitations of the original CycleGAN: because we are not given any extra seeding parameters (such as an extra latent vector z), we cannot control or alter what comes out the other end. If from a particular handbag outline we get an image that is orange, it will always be orange. Augmented CycleGAN gives us more control over the outcomes, as shown in figure 9.8.

Figure 9.8. In this information flow of the augmented CycleGAN, we have latent vectors Za and Zb that seed the Generator along with the image input, effectively reducing the problem to two CGANs joined together. This allows us to control the generation.

(Source: “Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data,” by Amjad Almahairi et al., 2018, http://arxiv.org/abs/1802.10151.)

9.8.2. Applications

Many CycleGAN (or CycleGAN-inspired) applications have been proposed in the short time it has been around. They usually revolve around creating simulated virtual environments and subsequently making them photorealistic. For example, imagine you need more training data for a self-driving car company: just simulate it in Unity or a GTA 5 graphics engine and then use CycleGAN to translate the data.

This works especially well if you need particular risky situations that are expensive or time-consuming to re-create (for example, car crashes, or fire trucks speeding to reach a destination), but you need them in your dataset. For a self-driving car company, this could be extremely useful to balance the dataset with at-risk situations, which are rare, but where correct behavior is all the more important.

One example of this kind of framework is Cycle-Consistent Adversarial Domain Adaptation (CyCADA).[11] Unfortunately, a full explanation of the way it works is beyond the scope of this chapter. This is because there are many more such frameworks: some even experiment with CycleGAN in language, music, or other forms of domain adaptation. To give you a sense of the complexity, figure 9.9 shows the architecture and design of CyCADA.

11 See “CyCADA: Cycle-Consistent Adversarial Domain Adaptation,” by Judy Hoffman et al., 2017, https://arxiv.org/pdf/1711.03213.pdf.

Figure 9.9. This structure should be somewhat familiar from earlier, so hopefully this chapter has at least given you a head start. One extra thing to point out: we now have an extra step with labels and semantic understanding that gives us the so-called task loss. This allows us to also check the produced image for semantic meaning.

Summary

  • Image-to-image translation frameworks are frequently difficult to train because of the need for perfect pairs; the CycleGAN solves this by making this an unpaired domain translation.
  • The CycleGAN has three losses:
    • Cycle-consistent, which measures the difference between the original image and an image translated into a different domain and back again
    • Adversarial, which ensures realistic images
    • Identity, which preserves the color space of the image
  • The two Generators use the U-Net architecture, and the two Discriminators use the PatchGAN-based architecture.
  • We implemented an object-oriented design of the CycleGAN and used it to convert apples to oranges.
  • Practical applications of the CycleGAN include self-driving car training and extensions that allow us to create different styles of images during the translation process.