6 Probabilistic deep learning models in the wild


This chapter covers

  • Probabilistic deep learning in state-of-the-art models
  • Flexible distributions in modern architectures
  • Mixtures of probability distributions for flexible CPDs
  • Normalizing flows to generate complex data like facial images

Much real-world data, such as sound samples or images, comes from complex and high-dimensional distributions. In this chapter, you’ll learn how to define complex probability distributions that can be used to model real-world data. In the last two chapters, you learned to set up models that work with easy-to-handle distributions. You worked with linear regression models with a Gaussian conditional probability distribution (CPD) and with count models with a Poisson distribution as the CPD. (Maybe you find yourself in the figure at the top of this chapter, where the ranger stands in a protected area with some domestic animals, but the animals out in the world are wilder than the ones you’ve worked with up to now.) You also learned enough about different kinds of domestic probabilistic models to join us and journey into the wild to state-of-the-art models that handle complex CPDs.

One way to model complex distributions is to use mixtures of simple distributions such as Normal, Poisson, or logistic distributions, which you know from the previous chapters. Mixture models are used in state-of-the-art networks like Google’s parallel WaveNet or OpenAI’s PixelCNN++ to model the output.

  • WaveNet generates realistic sounding speech from text.
  • PixelCNN++ generates realistic looking images.

In a case study in this chapter, we give you the chance to set up your own mixture models and use these to outperform a recent publicly described prediction model. You also learn another way to model these complex distributions: the so-called normalizing flows. Normalizing flows (NFs) allow you to learn a transformation from a simple distribution to a complicated distribution. In simple cases, this can be done with a statistical method called the change of variable method. You’ll learn how to apply this method in section 6.3.2, and you’ll see that TensorFlow Probability (TFP) supports this method with so-called bijectors.

By combining the change of variable method with DL, you can learn quite complicated and high-dimensional distributions as encountered in real-world applications. For example, sensor readings from complex machines are high-dimensional data. If you have a machine that is working correctly, you can learn the corresponding “machine OK” distribution. After you learn this distribution, you can continuously check if the sensor data the machine produces is still from the “machine OK” distribution. If the probability that the sensor data comes from the “machine OK” distribution is low, you might want to check the machine. This application is called novelty detection. But you can also do more fun applications, such as modeling the distribution of images of faces and then sampling from this distribution to create realistic-looking faces of people who don’t exist. You can imagine that such a facial image distribution is quite complicated. You’ll do other fun stuff with this distribution too, like giving Leonardo DiCaprio a goatee or morphing between different people. Sound complicated? Well, it’s a bit complicated, but the good thing is that it works with the same principle that you’ve used so far (and will continue to use for the rest of the book): the principle of maximum likelihood (MaxLike).


6.1 Flexible probability distributions in state-of-the-art DL models

In this section, you’ll see how to use flexible probability distributions for state-of-the-art models in DL. Up to now, you’ve encountered different probability distributions such as the Normal or uniform distribution for a continuous variable (the blood pressure in the American women data), a multinomial distribution for a categorical variable (the ten digits in the MNIST data), or the Poisson and zero-inflated Poisson (ZIP) for count data (the number of fish caught in the camper data).

The number of parameters defining the distribution is often an indicator of the flexibility of the distribution. The Poisson distribution, for example, has only one parameter (often called rate). The ZIP distribution has two parameters (rate and the mixing proportion), and in chapter 5, you saw that you could achieve a better model for the camper data when using the ZIP distribution instead of the Poisson distribution as the CPD. According to this criterion, the multinomial distribution is especially flexible because it has as many parameters as possible values (or, actually, one parameter less because probabilities need to sum up to one). In the MNIST example, you used an image as input to predict a multinomial CPD for the categorical outcome. The predicted multinomial CPD has ten (or more correctly, nine) parameters, giving us the probabilities of ten possible classes (see figure 6.1).

Figure 6.1 Multinomial distribution with ten classes: MN(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9)

Indeed, using the multinomial distribution for digit classification with convolutional neural networks (CNNs) became the first and most heavily used real-world application of DL models. In 1998, Yann LeCun, who was then working at AT&T Bell Laboratories, implemented a CNN for ZIP code recognition. This is known as LeNet-5.

6.1.1 Multinomial distribution as a flexible distribution

In 2016, an example of a real-world task requiring flexible distributions was Google’s WaveNet. This model generates astonishingly real-sounding artificial speech from text. Go to https://cloud.google.com/text-to-speech/ for a demonstration on a text of your choice. The architecture is based on 1D causal convolutions, like the ones you saw in section 2.3.3, and their specialization called dilated convolutions, which are shown in the notebook http://mng.bz/8pVZ . If you’re interested in the architecture, you might also want to read the blog post http://mng.bz/EdJo .

WaveNet works directly on raw audio, usually using a sampling rate of 16 kHz (16 kilohertz), which is 16,000 samples per second. But you can also use higher sampling rates. The audio signal for each time point t is then discretized (typically one uses 16 bits for this). For example, the audio signal at time t, x_t, takes discrete values from 0 to 2^16 − 1 = 65,535. But the interesting piece for this chapter is the probability part (stuff from the probability shelf on the right side of figure 5.1 in chapter 5). In WaveNet, we assume that x_t depends only on the audio signals of samples earlier in time. This gives us:

$$P(x_t) = P(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_0) \qquad (6.1)$$

You can sample values x_t from previous values as shown in figure 6.2 and then determine the probability distribution of future values. Such models are called autoregressive models. Note that you look at a probabilistic model where you can predict a whole distribution of possible outcomes: P(x_t). This lets you determine the likelihood or probability of the observed value x_t under the predicted distribution. Welcome home! You can use the good old MaxLike principle to fit this kind of model.

Figure 6.2 The WaveNet principle. The discrete values of the sound x_t at time t (on the top) are predicted from earlier values in time (on the bottom). Go to http://mng.bz/NKJN for an animated version of this figure showing how WaveNet can create samples from the future by successively applying equation 6.1.

But which type of distribution do you pick for P(x_t)? The outcome x_t can take all discrete values from 0 to 2^16 − 1 = 65,535. You don’t really know how the distribution of these values will look. It probably won’t look like a Normal distribution; that would suggest that there’s a typical value and that the probability of the outcome values decreases quickly with distance to this typical value. You need a more flexible type of distribution.

In principle, you can model the 65,536 different values with a multinomial distribution, where you estimate a probability for each possible value. This ignores the ordering of the values (0 < 1 < 2 < . . . < 65,535), but it is indeed flexible because you can estimate a probability for each of the 65,536 possible values. The only restriction is that these predicted probabilities need to add up to 1, which can be easily achieved by a softmax layer. Oord et al., the authors of the WaveNet paper, chose to go down this road, but first, they reduced the depth of the signal from 16-bit (encoding 65,536 different values) to 8-bit (encoding 256 different values) after performing a non-linear transformation on the original sound values. All together, the DeepMind people trained a dilated causal 1D convolutional NN with a softmax output, predicting the multinomial CPD with 256 classes, and called it WaveNet.

Now you can draw new samples from the learned distribution. To do so, you provide a start sequence of audio values, x_0, x_1, . . ., x_{t−1}, to the trained WaveNet, which will then predict a multinomial CPD: P(x_t) = P(x_t | x_{t−1}, x_{t−2}, . . ., x_0). From that you then sample the next audio value x_t. You can proceed by providing x_1, x_2, . . ., x_t and then sample the next value x_{t+1} from the resulting CPD, and so on.
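The sampling loop described above can be sketched in a few lines of plain Python. This is only an illustration of the autoregressive principle: `toy_model` is a made-up stand-in for a trained network (it simply favors values close to the last sample), and all function names are assumptions chosen for this sketch, not part of WaveNet or TFP.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(history, num_classes=256):
    """Hypothetical stand-in for a trained network.

    Returns a multinomial CPD over the 256 possible 8-bit values that
    favors values near the most recent sample.
    """
    last = history[-1]
    logits = [-abs(k - last) / 4.0 for k in range(num_classes)]
    return softmax(logits)


def generate(start, steps, rng):
    """Autoregressive sampling: predict P(x_t | history), sample x_t, append, repeat."""
    seq = list(start)
    for _ in range(steps):
        probs = toy_model(seq)
        x_t = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        seq.append(x_t)
    return seq


rng = random.Random(0)
samples = generate(start=[128], steps=20, rng=rng)
print(samples)
```

Each new value is drawn from the predicted distribution rather than set to its most likely value, which is what makes the generated sequence a sample from the model.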

Let’s take a look at another prominent autoregressive model: OpenAI’s PixelCNN. This is a network that can predict a pixel based on “previous” pixels. While for WaveNet, the ordering of the audio values is simply time, for images, there’s no natural way to order the pixels. You could, for example, order them like the characters in a text and read them from left to right and from top to bottom, like you do when reading this text. Then, these models can sample a pixel x_t for a certain color based on all previous pixels x_t' with t' < t. You again have the same structure as in equation 6.1, where x_t is now a pixel value.

How do you train the models? One can take the same approach as in WaveNet and encode the pixel values with 8 bits, which limits the output to 256 possible values, and use a softmax on an output of 256 one-hot encoded categorical variables. This was indeed done in PixelCNN.

A year later, in early 2017, the engineers from OpenAI improved PixelCNN, which is reported in the paper called “PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and other Modifications” (see https://arxiv.org/abs/1701.05517). What! You don’t know what “Discretized Logistic Mixture Likelihood” means? No worries. You’ll learn about that soon. For the moment, let’s just appreciate that with this new kind of CPD, OpenAI improved the prediction performance, quantified by a test NLL of 2.92 compared to an NLL of 3.14 that was achieved by the original PixelCNN. After that paper on PixelCNN++, the Google engineers also enhanced WaveNet to something called parallel WaveNet (see https://arxiv.org/abs/1711.10433). Among other enhancements, they switched from a multinomial CPD to a discretized logistic mixture distribution as CPD. (You’ll see what this means later.) When the parallel WaveNet model was set up, this was quite some work, but now with TensorFlow Probability, it’s quite easy, as you’ll see in the next section.

6.1.2 Making sense of discretized logistic mixture

In both applications, WaveNet and PixelCNN, one has to predict discrete values from 0 to an upper value (typically, 255 or 65,535). This is like count data but with a maximum value. Why not take a count distribution like a Poissonian and clamp the maximal values? This would be fine, in principle, but it turns out the distributions need to be more complex. Therefore, in the papers, a mixture of distributions was used. The distributions used for the mixture in the PixelCNN++ paper were discretized logistic functions.

Let’s unfold the discretized logistic mixture. You know that the density of a Normal distribution is bell-shaped, and the density of a logistic distribution looks, indeed, quite similar. Figure 6.3 shows the densities of the logistic functions with different values for the scale parameter on the left and the corresponding cumulative distribution function (CDF) on the right. The logistic CDF is, in fact, the same as the sigmoid activation function used in chapter 2. Have a look at the following optional notebook to learn more about the logistic functions.

Figure 6.3 Three logistic functions created using tfd.Logistic(loc=1, scale=scale) with values of 0.25, 1.0, and 2.0 for the scale parameter. On the left is the probability density function (PDF) and on the right, the cumulative distribution function (CDF).

Optional exercise Open http://mng.bz/D2Jn . The notebook shows the code for figures 6.3, 6.4, and 6.5, and for listing 6.2.

  • Read it in parallel with this text.
  • Change the parameters of the distributions and see how the curves change.

In the WaveNet and PixelCNN models, the outcome is discrete. An appropriate CPD should, therefore, model discrete (and not continuous) values. But the logistic distribution is for continuous values without lower and upper limits. Therefore, we discretize the logistic distribution and clamp the values to the possible range. In TFP, this can be done using the QuantizedDistribution function. QuantizedDistribution takes a probability distribution (called inner distribution in figure 6.4) and creates a quantized version of it. The optional exercise in the accompanying notebook elaborates on the details of using QuantizedDistribution .

Figure 6.4 A quantized version of a logistic function with the parameters loc=1 and scale=0.25
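To make the idea of discretizing and clamping concrete, here is a minimal sketch in plain Python. It is not TFP's implementation; it assumes that the probability of an integer k is the logistic probability mass between k − 0.5 and k + 0.5, and that the lowest and highest integers absorb the tails outside the allowed range. The function names are chosen for this sketch only.

```python
import math


def logistic_cdf(x, loc, scale):
    """CDF of the logistic distribution (a shifted and scaled sigmoid)."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def discretized_logistic_pmf(loc, scale, low=0, high=15):
    """Probability of each integer in [low, high] under a discretized, clamped logistic."""
    pmf = []
    for k in range(low, high + 1):
        # the edge bins absorb all probability mass outside the allowed range
        upper = 1.0 if k == high else logistic_cdf(k + 0.5, loc, scale)
        lower = 0.0 if k == low else logistic_cdf(k - 0.5, loc, scale)
        pmf.append(upper - lower)
    return pmf


pmf = discretized_logistic_pmf(loc=1.0, scale=0.25)
for k, p in enumerate(pmf[:4]):
    print(k, round(p, 4))
```

Because the bin probabilities are differences of neighboring CDF values, they add up to 1 by construction.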

To handle more flexible distributions, we mix several quantized logistic distributions (see figure 6.5). For the mixing, one can use a categorical distribution that determines the weights (mixture proportions) of the different distributions that are mixed. The following listing shows an example.

Listing 6.1 Mixing two quantized distributions

import tensorflow_probability as tfp
tfd = tfp.distributions

# quantize() is not defined in this listing; it is assumed to be the
# helper from the companion notebook that wraps QuantizedDistribution.
locs = (4.0, 10.0)     # centers of the two base distributions at 4 and 10
scales = (0.25, 0.5)   # spread of the two base distributions
probs = (0.8, 0.2)     # mixes 80% of the first (at 4.0) and 20% of the second

dists = tfd.Logistic(loc=locs, scale=scales)   # two independent distributions
quant = quantize(dists, bits=4)                # quantized versions of the two distributions
quant_mixture = tfd.MixtureSameFamily(         # a mixture of both distributions
    # the mixing is done with a categorical distribution with 80% and 20%
    mixture_distribution=tfd.Categorical(probs=probs),
    components_distribution=quant)

Figure 6.5 shows the resulting distribution. This distribution is appropriate for data like pixel values (in the case of PixelCNN) and sound amplitudes (in the case of WaveNet). You can easily construct more and more flexible outcome distributions if you don’t mix only two but, for example, four or ten distributions together.

Figure 6.5 The resulting discrete distribution when mixing two logistic distributions (see listing 6.2 for the code that produces these plots)
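The mixture can also be worked out by hand, which may help to see what the TFP objects compute. The following sketch uses plain Python and the same numbers as listing 6.1 (centers 4 and 10, scales 0.25 and 0.5, weights 0.8 and 0.2, 4 bits). The binning convention (mass between k − 0.5 and k + 0.5, edge bins absorb the tails) and the function names are assumptions of this sketch.

```python
import math


def logistic_cdf(x, loc, scale):
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))


def quantized_logistic(loc, scale, bits):
    """Probabilities of the integers 0 .. 2**bits - 1 for one component."""
    high = 2 ** bits - 1
    # bin edges on the probability scale: 0, F(0.5), F(1.5), ..., F(high - 0.5), 1
    edges = [0.0] + [logistic_cdf(k + 0.5, loc, scale) for k in range(high)] + [1.0]
    return [edges[k + 1] - edges[k] for k in range(high + 1)]


def mixture_pmf(locs, scales, weights, bits=4):
    """Weighted sum of the component probabilities for every integer value."""
    comps = [quantized_logistic(l, s, bits) for l, s in zip(locs, scales)]
    return [sum(w * c[k] for w, c in zip(weights, comps)) for k in range(2 ** bits)]


mix = mixture_pmf(locs=(4.0, 10.0), scales=(0.25, 0.5), weights=(0.8, 0.2))
for k, p in enumerate(mix):
    print(f"{k:2d} {p:.4f} {'#' * round(p * 50)}")
```

The printed bars show two bumps, a large narrow one around 4 and a smaller, wider one around 10.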

If you want to use this distribution instead of, say, a Poissonian for your own network, you can copy and paste the function quant_mixture_logistic from listing 6.2. It’s taken from the TensorFlow documentation of QuantizedDistribution .

For each mixture component, the NN needs to estimate three parameters: the location and spread of the component, and how much the component is weighted. If you work with num logistic distribution components in the mixture, then the output of the NN needs to have 3 · num output nodes: three for each component, controlling location, spread, and weight. Note that the function quant_mixture_logistic expects an output without activation (as it is by default in Keras). The following listing shows how to use this function for a mixture with two components. In this case, the network has six outputs.

Listing 6.2 Using quant_mixture_logistic() as distribution

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras.layers import Dense
tfd = tfp.distributions
tfb = tfp.bijectors

def quant_mixture_logistic(out, bits=8, num=3):
    # Split the output into num=3 equal chunks:
    # locations, unconstrained scales, and mixture logits
    loc, un_scale, logits = tf.split(out,
                                     num_or_size_splits=num,
                                     axis=-1)
    # Transform into positive values as needed for the scale
    scale = tf.nn.softplus(un_scale)
    discretized_logistic_dist = tfd.QuantizedDistribution(
        distribution=tfd.TransformedDistribution(
            distribution=tfd.Logistic(loc=loc, scale=scale),
            # Shift the distribution by 0.5 (newer TFP releases
            # replace AffineScalar with tfb.Shift(-0.5))
            bijector=tfb.AffineScalar(shift=-0.5)),
        low=0.,
        high=2**bits - 1.)
    mixture_dist = tfd.MixtureSameFamily(
        # Using logits, so there is no need to normalize the probabilities
        mixture_distribution=tfd.Categorical(logits=logits),
        components_distribution=discretized_logistic_dist)
    return mixture_dist


inputs = tf.keras.layers.Input(shape=(100,))
h1 = Dense(10, activation='tanh')(inputs)
# Last layer of the network: controls the parameters of the mixture
# model, three for each component (here 2 · 3 = 6). It keeps the default
# linear activation so the value range isn't restricted; the softplus
# above ensures positive scale values.
out = Dense(6)(h1)
p_y = tfp.layers.DistributionLambda(quant_mixture_logistic)(out)
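To see how six raw network outputs turn into mixture parameters, the bookkeeping can be reproduced in plain Python. This is a sketch under the assumption that the outputs are laid out as in listing 6.2 (first all locations, then all unconstrained scales, then all logits); the function name and the example numbers are made up for illustration.

```python
import math


def split_mixture_params(out, num_components):
    """Split 3 * num_components raw outputs into locations, scales, and weights."""
    n = num_components
    assert len(out) == 3 * n
    loc = out[:n]
    # softplus maps any real number to a positive scale
    scale = [math.log1p(math.exp(u)) for u in out[n:2 * n]]
    # softmax maps the logits to mixture weights that sum to 1
    logits = out[2 * n:]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    weights = [e / sum(exps) for e in exps]
    return loc, scale, weights


raw_output = [4.0, 10.0, -1.0, 0.0, 1.4, 0.0]   # hypothetical values of the last layer
loc, scale, weights = split_mixture_params(raw_output, num_components=2)
print(loc, scale, weights)
```

Because the positivity and the normalization are handled inside the distribution function, the last network layer can stay linear.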


6.2 Case study: Bavarian roadkills

Let’s apply what we learned about mixtures in the last section to a case study demonstrating the advantages of using an appropriate flexible probability distribution as a conditional outcome distribution. Because it takes quite some computational resources to train NNs like PixelCNN, here we use a medium-sized data set. The data set describes deer-related car accidents in the years 2002 through 2011 on roads in Bavaria, Germany. It counts the number of deer killed during 30-minute periods anywhere in Bavaria. We previously used this data set in other studies for the analysis of count data. It’s originally from https://zenodo.org/record/17179 . Table 6.1 contains some rows of the data set after some preprocessing (footnote 1).

Table 6.1 Some rows of deer-related car accidents in Bavaria

Wild  Year    Time      Daytime         Weekday
0     2002.0  0.000000  night.am        Sunday
0     2002.0  0.020833  night.am        Sunday
...   ...     ...       ...             ...
1     2002.0  0.208333  night.am        Sunday
0     2002.0  0.229167  pre.sunrise.am  Sunday
0     2002.0  0.270833  pre.sunrise.am  Sunday

The columns have the following meanings:

  • Wild --The number of deer killed in road accidents in Bavaria.
  • Year --The year (from 2002 to 2009 in the training set and from 2010 to 2011 for the test set).
  • Time --The number of days to the event (starting with January 1, 2002, as zero). These numbers are measured in fractions of a day. The time resolution, 30 min, corresponds to a fraction of 1/48 = 0.020833 (see the second row).
  • Daytime --The time during the day with respect to sunset and sunrise. The following levels are included in the data set: night.am, pre.sunrise.am, post.sunrise.am, day.am, day.pm, pre.sunset.pm, post.sunset.pm, and night.pm, corresponding to the times night, before sunrise, after sunrise, morning, afternoon, and so on.
  • Weekday --The weekday from Sunday to Saturday; holidays are coded as Sundays.

Hands-on time Open http://mng.bz/B2O0 . The notebook contains all you need to load the data set for the deer accident case study.

  • Use all you learned in this section to develop a probabilistic DL model for the target variable (wild). You should get an NLL lower than 1.8 on the test set.

  • A real challenge is an NLL lower than 1.6599, which is a value obtained with sophisticated statistical modeling (see the works from Sandra Siegfried and Torsten Hothorn at http://mng.bz/dygN).

  • A solution is given in the notebook (try to do better than what’s given). Compare your results with the solution.

Good hunting! If you get an NLL significantly below 1.65 on the test set, drop us a line, and we might do a paper together.
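If you want a feeling for the NLL values quoted above, the following sketch computes the mean NLL of a few counts under a Poisson distribution with a single constant rate. The counts are made up for illustration and are not taken from the Bavarian data set; a constant-rate Poisson is only a naive baseline, not the solution from the notebook.

```python
import math


def poisson_log_prob(k, rate):
    """Log probability of observing the count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def mean_nll(counts, rate):
    """Average negative log-likelihood per observation."""
    return -sum(poisson_log_prob(k, rate) for k in counts) / len(counts)


toy_counts = [0, 0, 1, 0, 2, 5, 0, 1, 0, 3]      # made-up values, not the real data
rate_hat = sum(toy_counts) / len(toy_counts)     # MaxLike estimate of a constant rate

print("rate:", rate_hat, "mean NLL:", round(mean_nll(toy_counts, rate_hat), 4))
```

A model with a more flexible CPD, such as the discretized logistic mixture from section 6.1.2, is scored the same way: sum the negative log probabilities of the observed counts and divide by the number of observations.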


6.3 Go with the flow: Introduction to normalizing flows (NFs)

In section 6.1, you saw a flexible way to model complex distributions by providing a mixture of simple base distributions. This method works great when your distribution is in a low-dimensional space. In the case of PixelCNN++ and parallel WaveNet, the application tasks are regression problems, and the conditional outcome distribution is, therefore, one dimensional.

But how does one set up and fit a flexible high-dimensional distribution? Think, for example, of color images with 256 × 256 × 3 = 196,608 pixel values defining a 196,608-dimensional space where each image can be represented by one point. If you pick a random point in this space, then you’d most probably get an image that looks like noise. This means the distribution of realistic images like facial images only covers a subregion, which might not be easy to define. How can you learn the 196,608-dimensional distribution from which facial images can be drawn? Use NFs! In a nutshell, an NF learns a transformation (flow) from a simple high-dimensional distribution to a complex one. A valid distribution needs to be normalized: probabilities need to sum up to 1 in the discrete case, or the integral needs to be 1 in the continuous case. The flows in NFs keep this normalizing property intact. Hence, the name normalizing flow or NF for short.

In this section, we explain how NFs work. You’ll see that NFs are probabilistic models that you can fit with the same MaxLike approach that you’ve used throughout the last couple of chapters. You’ll also be able to use a fitted distribution to generate realistic looking faces of people who don’t even exist or to morph an image of your face with the image of Brad Pitt, for example.

NFs are especially useful in high-dimensional spaces. Because it’s hard to imagine a space with more than three dimensions, we explain NFs in low dimensions. But don’t worry, we get to the high-dimensional distribution of facial images at the end of this section.

What are NFs and what are they good for? The basic idea is that an NF can fit a complex distribution (like the one in figure 6.6) without picking in advance an appropriate distribution family or setting up a mixture of several distributions.

Figure 6.6 Sketch of a parametric probability density estimation. Each point (x1, x2) is assigned a probability density. We choose the parameter θ to match the data points (dots).

A probability density allows you to sample from that distribution. In the case of the facial image distribution, you can generate facial images from the distribution. The generated faces aren’t the ones from the training data (or to be more precise, the chance to draw a training sample from the learned distribution is small).

NFs, therefore, fall under the class of generative models. Other well-known generative models are generative adversarial networks (GANs) and variational autoencoders (VAEs). GANs can generate quite impressive results when it comes to creating images of faces that don’t exist. Visit http://mng.bz/rrNB to see such a generated image. If you want to learn more, the book GANs in Action by Jakub Langr and Vladimir Bok (Manning, 2019) gives an accessible and comprehensive introduction to GANs (see http://mng.bz/VgZP). But as you’ll see later, NFs can also produce real-looking images.

In contrast to GANs and VAEs, NFs are probabilistic models that really learn the probability distribution and allow you to determine the corresponding probability (likelihood) for each sample. Say you’ve used an NF to learn the distribution of facial images, and you have an image x. Then you can ask the NF via p(x): what’s the probability of that image? This has quite useful applications, like novelty detection.

In novelty detection, you want to find out if a data point is from a certain distribution or if it’s an original (novel) data point. For example, you’ve recorded data from a machine (let’s say, a jet engine) under normal conditions. This can be quite high-dimensional data, like a vibration spectrum. You then train an NF for the “machine OK” distribution. While the machine is operating, you constantly check the probability of the data coming from the “machine OK” distribution. If this probability is low, you have an indication that the machine isn’t working correctly and something is wrong. But before we come to high-dimensional data, let’s start our journey into NFs with low-dimensional data.
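The novelty detection recipe can be sketched with any model that returns a probability or density for a data point. The example below uses a one-dimensional Gaussian fitted to simulated readings as a stand-in for a learned “machine OK” distribution; the simulated data, the 1% threshold rule, and the function names are all assumptions of this sketch, and a real application would use a fitted NF on high-dimensional data instead.

```python
import math
import random


def normal_log_prob(x, mean, std):
    """Log density of a Normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


rng = random.Random(1)
ok_readings = [rng.gauss(50.0, 2.0) for _ in range(1000)]   # simulated "machine OK" data

# "Training": fit mean and standard deviation by maximum likelihood
mean = sum(ok_readings) / len(ok_readings)
std = math.sqrt(sum((r - mean) ** 2 for r in ok_readings) / len(ok_readings))

# Hypothetical threshold: the log density that only 1% of the OK data falls below
scores = sorted(normal_log_prob(r, mean, std) for r in ok_readings)
threshold = scores[len(scores) // 100]


def is_novel(reading):
    """Flag a reading whose log density under the OK model is unusually low."""
    return normal_log_prob(reading, mean, std) < threshold


print(is_novel(51.0), is_novel(70.0))
```

The threshold trades off false alarms against missed faults, so in practice it would be chosen with the costs of both in mind.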

6.3.1 The principle idea of NFs

In the left panel in figure 6.7, you see a one-dimensional data set. The data is a quite famous data set in statistics. It holds 272 waiting times between two eruptions of the Old Faithful geyser in Yellowstone National Park. In the right panel of figure 6.7, you see a two-dimensional artificial data set. Imagine your statistics teacher asks you from which distribution this data comes. What would be your answer? Is it a Gaussian, a Weibull, a log normal? Even for the one-dimensional case on the left, none of the listed distributions fit. But because you’re a good reader, you remember section 6.1 and come up with a mixture of, for example, two Gaussians. That works for quite simple distributions, such as the one shown in the left panel of figure 6.7, but for really high-dimensional and complex distributions, this approach breaks down.

Figure 6.7 Two data sets: a real one on the left in 1D (waiting times between two geyser eruptions) and an artificial one on the right. Do you know a probability distribution that produces this kind of data? We don’t.

What to do? Remember the old saying, “If the mountain won’t come to Mohammed, Mohammed must go to the mountain”? Figure 6.8 shows the main idea of an NF. Take data coming from a Gaussian and transform it so that at the end, the data looks like it’s coming from a complicated distribution. This is done by a transformation function g(z). On the other hand, the complicated function describing the data in x is transformed via the function g^{-1}(x) to z.

The main task of the NFs is to find these transformations: g(z) and g^{-1}(x). We assume for a moment that we’ve found such a function pair: g and g^{-1}. We want two things from it. First, it should enable us to sample from the complicated function p_x(x), allowing for applications to generate new, realistic-looking images of faces. Second, it should allow us to calculate the probability p_x(x) for a given x, allowing for applications like novelty detection.

Figure 6.8 The NF principle. The complicated PDF p_x(x) of the data x is transformed to an easy Gaussian with the PDF p_z(z) = N(z; 0, 1). The transformation function x = g(z) transfers between the easy Gaussian in z and the complicated function in x.

Let’s start with the first task and find out how we can use g to do the sampling of a new example x. Remember, you can’t directly sample x from p_x(x) because you don’t know p_x(x). But for the simple distribution p_z(z), you know how to draw a sample. That’s easy! In case of a Gaussian as the simple distribution, you can do it with TFP using z=tfd.Normal(0,1).sample() . Then you apply the transformation function g to get the corresponding sample x = g(z). So, the first task is solved.

What about the second task? How probable is a certain sample x? You can’t calculate p_x(x) directly, but you can transform x back to z via z = g^{-1}(x), for which you know the probability p_z(z). With p_z(z), you can calculate the probability of x. In the case of a Gaussian as the simple distribution p_z(z), determining the probability of a number z is easy: use tfd.Normal(0,1).prob(z).
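Both tasks can be traced in a few lines of plain Python. The transformation g(z) = exp(z) used here is a hypothetical choice made only because it is easy to invert; it is not a learned flow. The sketch stops at evaluating p_z at the back-transformed value, because turning that number into the density of x requires the change of variable technique of section 6.3.2.

```python
import math
import random


def g(z):
    """Hypothetical bijective transformation, chosen only for illustration."""
    return math.exp(z)


def g_inv(x):
    """Inverse of g: maps a sample x back to z."""
    return math.log(x)


def p_z(z):
    """Density of the standard Normal distribution N(z; 0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


rng = random.Random(42)
z_samples = [rng.gauss(0.0, 1.0) for _ in range(5)]

x_samples = [g(z) for z in z_samples]      # task 1: sample x by transforming z
z_back = [g_inv(x) for x in x_samples]     # task 2 starts by mapping x back to z
densities = [p_z(z) for z in z_back]       # density of the simple distribution at that z

for x, z, d in zip(x_samples, z_back, densities):
    print(f"x={x:.3f}  z={z:.3f}  p_z(z)={d:.3f}")
```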

Can we take any transformation function g? No, that isn’t the case. To find out the required properties of the transformation, let’s consider that we go in a loop from z to x and back. Let’s take an example. What happens if we start with a fixed value of, say, z = 4? Then we’d use g to get the corresponding x value, x = g(4), and go back from x to z again with z = g^{-1}(x). You should again end up with the value z = g^{-1}(x) = g^{-1}(g(4)) = 4. This must apply for all values of z. That’s why we call g^{-1} the inverse of g.

It’s not possible for all functions g to find an inverse function g^{-1}. If a function g has an inverse function g^{-1}, then g is called bijective. Some functions such as g(z) = z^2 are only bijective on a limited range of data; here, for example, for positive values. (Can you tell what’s the inverse function?) g has to be bijective. Additionally, we want the flows implemented efficiently. In the next sections, you learn about the mathematical details and their implications for an efficient implementation.
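The loop from z to x and back is easy to try out. The sketch below checks the round trip for g(z) = z^2 with the square root as candidate inverse, once on positive values and once on a range that includes negative values. The variable names are chosen for this illustration.

```python
import math


def g(z):
    return z ** 2


def g_inv(x):
    return math.sqrt(x)


positive = [0.5, 1.0, 4.0]
mixed = [-1.0, -0.5, 0.5]

round_trip_pos = [g_inv(g(z)) for z in positive]
round_trip_mixed = [g_inv(g(z)) for z in mixed]

print(round_trip_pos)     # the loop returns to the starting values
print(round_trip_mixed)   # negative starting values come back with the wrong sign
```

On the positive range every value returns to where it started, so g is bijective there. As soon as negative values are allowed, two different z values map to the same x and the way back is no longer unique.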

6.3.2 The change of variable technique for probabilities

In this section, you first learn how to use the NF method in one dimension. This is what statisticians call the change of variable technique, which is used to properly transform distributions. It’s the core method of all NFs, where (usually) several such transformation layers are stacked to a deep NF model. To explain what’s going on in a single layer of an NF model, we start with the transformation of a one-dimensional distribution. Later, in section 6.3.5, we’ll generalize the findings from this one-dimensional problem to higher dimensions. To code such NF models, we use TFP and especially the TFP bijector package (for an example, see listing 6.3). All TFP bijector classes are about bijective transformations, which apply the change of variable technique to correctly transform probability distributions.

Let’s start simple. Consider the transformation x = g(z) = z^2 and choose z to be uniformly distributed between 0 and 2. (We refer to this as the simple example in this section.) The function g^{-1}(x) = √x satisfies g^{-1}(g(z)) = √(z^2) = z for the picked range of z. By the way, that wouldn’t be possible if z was chosen uniformly from -1 to 1 (a positive range is required). But now, if we work with the uniformly distributed z between 0 and 2, how does the distribution of x = g(z) = z^2 look? See if you can guess first.

A good way of checking something in statistics is to try it out with a simulation. To simulate 100,000 data points from a uniform distribution, you might use tfd.Uniform(0,2).sample(100000) . Take these values, square them, and plot a histogram (plt.hist). The solution is given in the following notebook. But try it first to see if you can do it on your own.

Hands-on time Open http://mng.bz/xWVW . This notebook contains companion code for the change of variables/TFP.bijectors exercise in this chapter. Follow it while reading this section.

Probably this result is a bit contrary to your first intuition. Let’s check what happens when you apply the square transformation to uniformly distributed samples. In figure 6.9, you see a plot of the square transformation function (the solid thick curve) and, on the horizontal axis, 100 samples (depicted by the ticks) drawn from a uniform distribution between 0 and 2. The corresponding histogram is shown above the plot. To square each sample (tick), you can go from the tick vertically up to the square function and, where you hit it, go horizontally to the left. The transformed values are the ticks drawn on the vertical axis. If you do that with all the z samples, you get a distribution of the ticks on the vertical axis. Note that the ticks are denser in the region around 0 than those that are at around 4. The corresponding histogram is shown on the right. With this procedure, it becomes clear that a transformation function can squeeze samples together in regions where the transformation function is flat and move samples apart in regions where it’s steep.
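The squeezing effect can be checked with a small simulation that needs nothing beyond the Python standard library. This is a sketch of the experiment described above, using Python's random module in place of tfd.Uniform and simple bin counts in place of plt.hist; the helper name `histogram` is an assumption of this sketch.

```python
import random

rng = random.Random(0)
z = [rng.uniform(0.0, 2.0) for _ in range(100_000)]   # uniform samples between 0 and 2
x = [v ** 2 for v in z]                                # apply x = g(z) = z^2


def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equally wide intervals."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts


z_counts = histogram(z, 0.0, 2.0, 4)   # roughly equal counts in every bin
x_counts = histogram(x, 0.0, 4.0, 4)   # counts fall from the first to the last bin

print("z:", z_counts)
print("x:", x_counts)
```

About half of all x values land in the first bin from 0 to 1, because every z below 1 is mapped into it, while the steep part of the square function spreads the remaining half over the three bins from 1 to 4.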

Figure 6.9 A square transformation function (solid thick curve) applied to 100 z samples drawn from a uniform distribution (ticks on the horizontal axis) yields 100 transformed x samples (ticks on the vertical axis). The histograms above and on the right of the plot show the distribution of the z and x samples, respectively.

This intuition also implies that linear functions (with constant steepness but different offsets) don’t change the shape of the distribution, only the values. Therefore, you need a non-linear transformation function if you want to go from a simple distribution to a distribution with a more complex shape.

Another important property of the transformation function g is that it needs to be monotone in order to be bijective.2 This implies that the samples stay in the same order (no overtaking). For a transformation function that’s monotone increasing, z1 < z2 always implies x1 < x2 (see figure 6.9 for an example of a monotone increasing transformation). If the transformation function is monotone decreasing, then z1 < z2 always implies x1 > x2 . This property also indicates that you always have the same number of samples between x1 and x2 as between z1 = g^{-1}(x1) and z2 = g^{-1}(x2) .

For the NF, we need a formula to describe the transformation. Now that we’ve built our intuitive model of the transformation, let’s do the final step needed and go from samples and histograms to probability densities. Instead of the number of samples in a certain interval, now the probability in a certain interval is preserved (see figure 6.10). Strictly speaking (and we are quite sloppy in this book), p_z(z) is a probability density. All probability densities are normalized, meaning the area under the density is 1. When using a transformation to go from one distribution to another distribution, this normalization is preserved; hence, the name “normalizing flow.” You don’t lose probability; it’s like a conservation of mass principle. And this preserving property not only holds for the whole area under the density curve, but also for smaller intervals.

Figure 6.10 Understanding transformations. The area p_z(z)|dz| = p_x(x)|dx| (shaded in the figure) needs to be preserved. An animated version is available at https://youtu.be/fJ8YL2MaFHw . Note that strictly speaking, dz and dx should be infinitesimally small.

To come from a probability density value p_x(x) to a real probability for values close to x, we have to look at an area under the density curve p_x(x) within a small interval with the length dx . We get such a probability by multiplying p_x(x) with dx : p_x(x)dx . The same is true for z, where p_z(z)dz is a probability. These two probabilities need to be the same. In figure 6.10, you see the transformation. The shaded areas under the curve need to be the same.

From this we get the equation:3

p_z(z) ⋅ |dz| = p_x(x) ⋅ |dx|

This equation ensures that no probability is lost during the transformation (the mass is conserved). We can solve the equation to:

p_x(x) = p_z(z) ⋅ |dz/dx|

p_x(x) = p_z(z) ⋅ |dx/dz|^{-1}

Here we swapped the numerator dz and the denominator dx. It’s OK and backed up by stricter math.

p_x(x) = p_z(z) ⋅ |dg(z)/dz|^{-1}, where x = g(z)

p_x(x) = p_z(z) ⋅ |g'(z)|^{-1}

p_x(x) = p_z(g^{-1}(x)) ⋅ |g'(g^{-1}(x))|^{-1}, where z = g^{-1}(x)      Equation 6.2

Equation 6.2 is quite famous and has its own name: it’s called the change of variable formula. The change of variable formula determines the probability density p_x of a transformed variable x = g(z). You need to determine the derivative dg(z)/dz and the inverse transformation function, and then you can use equation 6.2 to determine p_x(x). The term |dx/dz| describes the change of a length (the length of the interval on the horizontal axis in figure 6.10) when going from z to x. Dividing by it ensures that the shaded area in figure 6.10 stays constant. We need the absolute value to cover cases where the transformation function is decreasing. In this case, dx/dz would be negative. When going from x to z, the length scales the opposite way:

|dz/dx| = 1 / |dx/dz|

Let’s take a moment to recap what you’ve learned so far. If we have an invertible transformation g(z) going from z to x, and the inverse function g^{-1}(x) going from x to z, equation 6.2 tells us how the probability distribution changes under the transformation. Knowing the transformation g(z) along with its derivative g'(z) and g^{-1}(x), we can apply the NF. We’ll tackle the question of how to learn these flows, g and g^{-1}, in the next section. But first, let’s apply the formula to the initial example and see how this can be done quite elegantly with TFP’s Bijector class.

In the initial example, we assumed that z is uniformly distributed between 0 and 2. In this interval, p_z(z) = 1/2 , so that the distribution is normalized. Let’s do the math for this example where x = g(z) = z^2 . With z = g^{-1}(x) = √x and g'(g^{-1}(x)) = 2 ⋅ g^{-1}(x) = 2 ⋅ √x , equation 6.2 becomes

p_x(x) = p_z(g^{-1}(x)) ⋅ |g'(g^{-1}(x))|^{-1}

p_x(x) = 1/2 ⋅ |2 ⋅ √x|^{-1} = 1 / (4 ⋅ √x)   for 0 < x ≤ 4

This looks the same as the simulation (see figure 6.11).

Figure 6.11 Comparing the densities of x = z^2 resulting from the simulation (histogram) and analytical derivation using equation 6.2 (curved solid line) when assuming a uniformly distributed z
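The comparison between simulation and formula can also be checked numerically. The following is a minimal sketch in plain Python (standard library only, no TFP); the helper names analytic_density and empirical_density, the seed, and the chosen bins are illustrative assumptions and are not taken from the accompanying notebook.

```python
import math
import random

random.seed(0)                       # fixed seed so the run is repeatable
n_samples = 100_000

# z ~ Uniform(0, 2); x = g(z) = z**2 then lies in [0, 4]
xs = [random.uniform(0.0, 2.0) ** 2 for _ in range(n_samples)]


def analytic_density(x):
    """Change of variable result: p_x(x) = 1 / (4 * sqrt(x)) for 0 < x <= 4."""
    return 1.0 / (4.0 * math.sqrt(x))


def empirical_density(samples, lo, hi):
    """Fraction of samples falling into [lo, hi), divided by the bin width."""
    count = sum(1 for s in samples if lo <= s < hi)
    return count / (len(samples) * (hi - lo))


for lo in (0.5, 1.0, 2.0, 3.0):
    hi = lo + 0.5
    mid = 0.5 * (lo + hi)
    print(f"bin [{lo}, {hi}): simulated {empirical_density(xs, lo, hi):.3f}, "
          f"formula at bin center {analytic_density(mid):.3f}")
```

The simulated bin densities should be close to the formula evaluated at the bin center; the agreement gets worse very close to x = 0, where the density changes quickly within a bin.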

It turns out that TFP has good support for transforming variables. At the heart of variable transformation is a bijective transformation function g. The package tfp.bijectors is all about bijectors, which we introduced earlier in this section. Let’s have a first look at a bijector in TFP (see the following listing and also the accompanying notebook http://mng.bz/xWVW).

Listing 6.3 A first bijector

tfb = tfp.bijectors 
g = tfb.Square()       #1 
g.forward(2.0)         #2 
g.inverse(4.0)         #3

This is a simple bijector, going from z → z^2.

Yields 4

Yields 2

In the listing, a bijector g transforms one distribution to another. The first (usually the simple) distribution is called the base distribution or the source distribution on which the bijector g is applied. The resulting distribution is called the transformed distribution or the target distribution. The next listing shows how our simple example can be implemented in TFP.

Listing 6.4 The simple example in TFP

g = tfb.Square()                         #1 
db = tfd.Uniform(0.0,2.0)                #2 
mydist = tfd.TransformedDistribution(    #3 
    distribution=db, bijector=g)
 
xs = np.linspace(0.001, 5,1000)
px = mydist.prob(xs)                     #4

The bijector; here a square function

The base distribution; here a uniform distribution

Combining a base distribution and a bijector into a new distribution

TransformedDistribution behaves like a usual distribution.

Note that we didn’t need to implement the change of variable formula ourselves. TFP is doing the work for us! If we’d like to create our own bijector, we’d need to implement the change of variable formula.
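To make concrete what implementing the formula by hand involves, here is a rough sketch in plain Python. The classes SquareBijector, Uniform, and Transformed are hypothetical stand-ins written for this illustration; they only mimic the idea behind a bijector and a transformed distribution and are not the TFP classes.

```python
import math


class SquareBijector:
    """Hand-written g(z) = z**2 for z > 0 (illustrative, not the TFP class)."""

    def forward(self, z):
        return z * z

    def inverse(self, x):
        return math.sqrt(x)

    def inverse_log_det_jacobian(self, x):
        # log|dz/dx| = -log|g'(g^-1(x))| = -log(2 * sqrt(x))
        return -math.log(2.0 * math.sqrt(x))


class Uniform:
    """Base distribution with constant density on [low, high]."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def log_prob(self, z):
        if self.low <= z <= self.high:
            return -math.log(self.high - self.low)
        return -math.inf


class Transformed:
    """Applies the change of variable formula to a base distribution."""

    def __init__(self, base, bijector):
        self.base, self.bijector = base, bijector

    def log_prob(self, x):
        z = self.bijector.inverse(x)
        return self.base.log_prob(z) + self.bijector.inverse_log_det_jacobian(x)

    def prob(self, x):
        return math.exp(self.log_prob(x))


mydist = Transformed(Uniform(0.0, 2.0), SquareBijector())
for x in (0.25, 1.0, 4.0, 5.0):          # x must be positive for this bijector
    print(x, mydist.prob(x))
```

Working with log probabilities and adding the log of the Jacobian term keeps the computation numerically stable, which is one reason a library would organize the calculation this way.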

6.3.3 Fitting an NF to data

In this section, you learn the first step to use an NF for modeling complicated distributions and that this is quite easy using TFP bijectors. We restrict ourselves first to one-dimensional distributions. We do this by using only one flow, g. In the following section, we go deeper and chain several of those flows to allow for more flexibility to model complex distributions. Then in section 6.3.5, we’ll use flows for higher dimensional distributions. In this section, you’ll learn these flows, which are given by a parametric bijective function g.

Spoiler alert

You determine the parameters of a flow via the good old MaxLike principle.

How do we model a distribution via NFs? If your data x has the complicated unknown distribution p_x(x) , then use the bijective transformation function g to get x by transforming a variable z with a simple base distribution x = g(z). If you know what transformation g to use, you’re fine. You could apply it with the methods you just learned in section 6.3.2. The likelihood p_x(x) for each sample x_i is then given by the likelihood of the transformed value p_x(x_i) = p_z(g^{-1}(x_i)) ⋅ |g'(g^{-1}(x_i))|^{-1} . But how do you know which bijective transformation g to use?

Solution number one: ask old-fashioned statisticians. They’ll fire up EMACS and, in a first step, fit a simple model like a Gaussian to the data. Of course, a Gaussian isn’t enough to fit a complicated distribution. The experienced statistician then stares at the difference between the model and the data and does some other magic only they and their priesthood understand in full detail. Finally, they mumble something like, “Apply a log transformation on your data, kid; then you can fit your data with a Gaussian.” So off you go and implement a flow with tfb.Exp() . Meditate a second on why to use the exponential before you read on.

The answer is that the statistician gave you the transformation on how to go from the complicated distribution to the simple Gauss distribution, z = g^{-1}(x) = log(x). Therefore, the flow g that goes from the simple to the complicated distribution is given by the inverse of the logarithm, which is the exponential x = g(z) = exp(z).

Solution number two: you realize that we live in the 21st century and have the computer power to find in a data-driven way the bijective transformation g that transforms a variable z with a simple base distribution p_z(z) of your choice to the variable x = g(z) of interest. Knowing the flow g allows you to determine the complicated distribution p_x(x) = p_z(z) ⋅ |g'(z)|^{-1} = p_z(g^{-1}(x)) ⋅ |g'(g^{-1}(x))|^{-1} (see equation 6.2).

The key idea for the data-driven approach is that you set up a flexible bijective transformation function g, which has learnable parameters θ. How to determine the values of these parameters? The usual way: with the MaxLike approach. You have training data x_i , and you can calculate the likelihood of a single training sample i by calculating p_x(x_i) = p_z(g^{-1}(x_i)) ⋅ |g'(g^{-1}(x_i))|^{-1} and the joint likelihood of all data points by multiplying all individual likelihood contributions, ∏_i p_x(x_i). In practice, you minimize the NLL, −Σ_i log(p_x(x_i)), of your training data. That’s it!

Let’s start with an extremely easy example. Our first learnable flow is linear and involves only two parameters, a and b: g(z) = a ⋅ z + b . In listing 6.5, you can see that we use an affine bijector. Just to get the term straight, an affine function g(z) = a ⋅ z + b is the linear function g(z) = a ⋅ z plus an offset b. In this book, we’re a bit relaxed; we’ll often say “linear” when we mean “affine.” Of course, with such an easy flow, you can’t do too much fancy stuff.

In the discussion of figure 6.9, we already pointed out that the shape of the distribution stays the same when using a linear transformation function. Now we want to learn the transformation from z ~ N(0, 1) to x ~ N(5, 0.2) . Because both distributions are bell-shaped, an (affine) linear transformation can do the trick. The following listing shows the complete code, and it’s also in the accompanying notebook.

Listing 6.5 A simple example in TFP

a = tf.Variable(1.0)                                       #1 
b = tf.Variable(0.0)                                       #1 
bijector = tfb.AffineScalar(shift=b, scale=a)              #2 
dist = tfd.TransformedDistribution(distribution=
        tfd.Normal(loc=0,scale=1),bijector=bijector)
 
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
 
for i in range(1000):
    with tf.GradientTape() as tape: 
        loss = -tf.reduce_mean(dist.log_prob(X))           #3 
    gradients = tape.gradient(loss,                        #4 
                dist.trainable_variables)     
    optimizer.apply_gradients(
        zip(gradients, dist.trainable_variables))          #5

Defines the variables

Sets up the flow using an affine transformation defined by two variables

The NLL of the data

Calculates the gradients for the trainable variables

Applies the gradients to update the variables

Training for a few epochs results in a ≈ 0.2 and b ≈ 5 and so transforms the N(0, 1) distributed variable into an N(5, 0.2) distributed variable (see the notebook http://mng.bz/xWVW for the result). Of course, such a simple (affine) linear transformation is much too simple to transform a Gaussian into more complex distributions.
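For this particular flow, the values that gradient descent should approach can be cross-checked without TFP. The sketch below is an illustration under stated assumptions: it generates its own toy data in place of X, writes the NLL of the affine flow by hand, and uses the known closed-form MaxLike solution for a Gaussian (sample mean for the shift b, population standard deviation for the scale a) instead of an optimizer. The function name nll and the seed are choices made for this example.

```python
import math
import random
import statistics

random.seed(1)
# Toy data standing in for X: 2,000 draws from N(5, 0.2)
X = [random.gauss(5.0, 0.2) for _ in range(2000)]


def nll(a, b, data):
    """Mean negative log-likelihood of data under x = a*z + b with z ~ N(0, 1)."""
    total = 0.0
    for x in data:
        z = (x - b) / a                                   # z = g^-1(x)
        log_pz = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        total += log_pz - math.log(abs(a))                # subtract log|g'(z)| = log|a|
    return -total / len(data)


b_hat = statistics.fmean(X)      # closed-form MaxLike shift
a_hat = statistics.pstdev(X)     # closed-form MaxLike scale

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
print(f"NLL at start (a=1, b=0): {nll(1.0, 0.0, X):.3f}")
print(f"NLL at MaxLike solution: {nll(a_hat, b_hat, X):.3f}")
```

The closed form only exists because the flow is affine and the base distribution is Gaussian; for the non-linear flows discussed next, iterative optimization of the NLL is the practical route.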

6.3.4 Going deeper by chaining flows

You saw in section 6.3.3 that a linear flow can only shift and stretch the base distribution, but it can’t change the shape of the distribution. Therefore, the result of linearly transforming a Gaussian is, again, a Gaussian (with changed parameters). In this section, you learn a way to model a target distribution that has a very different shape compared to the base distribution. This lets you model complex real-world distributions such as the waiting time between two eruptions of the Old Faithful geyser. You’ll see that this is quite easy with TFP.

How do we create flows that can change the shape of a distribution? Remember the fool’s rule of DL: stack more layers (as discussed in section 2.1.2). Also remember that between the layers in an NN, you use a non-linear activation function; otherwise, a deep stack of layers could be replaced by one layer. With respect to an NF, this rule tells you not to use just one flow but a series of flows (the non-linearities in between are important, and we come back to this point later). You start from z and go along a chain of k transformations to x: z = z0 → z1 → z2 ⋯ → z_k = x . Figure 6.12 shows this transformation.

Figure 6.12 A chain of simple transformations makes it possible to create complex transformations needed to model complex distributions. From right to left, starting from a standard Gaussian distribution z0 ~ N(0,1) changes via successive transformations to a complex distribution with a bimodal shape (on the left).

Let’s look at a chain of two transformations from z0 → z1 → z2 to understand the general formula. You know the probability distribution p_z0(z0), but how can you determine the probability distribution p_z2(z2)? Let’s do it step by step and, in each step, use the change of variable formula (equation 6.2).

First determine the probability distribution p_z1(z1). You get z1 by transforming z0 , z1 = g1(z0). To use the change of variable formula, you need to determine the derivative g1' and the inverted function g1^{-1}. The distribution p_z1(z1) can then be determined by equation 6.2 as p_z1(z1) = p_z0(z0) ⋅ |g1'(z0)|^{-1} . Proceeding like that, you can determine the probability density p_z2 of the transformed variable, p_z2(z2) = p_z1(z1) ⋅ |g2'(z1)|^{-1} , where you can plug in the former formula for p_z1(z1), yielding the chained flow:

p_z2(z2) = p_z0(z0) ⋅ |g1'(z0)|^{-1} ⋅ |g2'(z1)|^{-1}

Often, it’s more convenient to operate on log probabilities instead of probabilities. Taking the log (and using the log rule log(a^b) = b ⋅ log(a) with a = g_i'(z_{i−1}) and b = −1) of the previous formula yields this:

log(p_z2(z2)) = log(p_z0(z0)) − log(|g1'(z0)|) − log(|g2'(z1)|)

For a complete flow (with x = z_k), this formula generalizes to:

log(p_x(x)) = log(p_z0(z0)) − Σ_{i=1}^{k} log(|g_i'(z_{i−1})|)

To calculate the probability p_x(x), simply back up in the chain in figure 6.12 from x = z_k → z_{k−1} → ... → z0 and sum the −log(|g_i'(z_{i−1})|) terms. Let’s build such a chain in TFP.
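Before moving to TFP, the bookkeeping of the chained log probability can be sketched in plain Python. This is an illustrative sketch, not the TFP implementation: the classes Affine and Exp and the function chain_log_prob are hypothetical names, and the flows are listed in the order in which they are applied (z0 → z1 → ... → z_k = x). The chosen chain, an affine step followed by an exponential, produces a log-normal distribution, so the result can be compared with the known log-normal density.

```python
import math


class Affine:
    """g(z) = scale * z + shift."""

    def __init__(self, shift, scale):
        self.shift, self.scale = shift, scale

    def inverse(self, y):
        return (y - self.shift) / self.scale

    def log_abs_derivative(self, z):
        return math.log(abs(self.scale))      # log|g'(z)|, constant for an affine map


class Exp:
    """g(z) = exp(z)."""

    def inverse(self, y):
        return math.log(y)

    def log_abs_derivative(self, z):
        return z                              # log|exp(z)| = z


def standard_normal_log_prob(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def chain_log_prob(x, flows):
    """log p_x(x) for flows given in application order z0 -> z1 -> ... -> zk = x."""
    log_det_sum = 0.0
    z = x
    for g in reversed(flows):                 # back up through the chain
        z = g.inverse(z)                      # step from z_i to z_{i-1}
        log_det_sum += g.log_abs_derivative(z)
    return standard_normal_log_prob(z) - log_det_sum


mu, sigma = 1.0, 2.0
flows = [Affine(shift=mu, scale=sigma), Exp()]    # x = exp(sigma * z0 + mu)

x = 3.0
flow_value = chain_log_prob(x, flows)
lognormal_value = (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
                   - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))
print(flow_value, lognormal_value)
```

Both numbers should agree up to floating-point error, which is a useful check that the signs of the log-derivative terms are handled correctly.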

It’s quite convenient to create a chain of bijectors in TFP: simply use the class Chain(bs) from the tfp.bijectors package with a list of bijectors bs . The result is, again, a bijector. So, do we simply need to chain a few affine scalar bijectors and we are done? We’re not quite done yet. An affine scalar transformation only shifts and scales a distribution. One way to think of this is that such an affine transformation is a straight line in figure 6.9. There isn’t a possibility to change the shape of the distribution.

What do you need to do if you want to change the shape of the distribution? You could introduce some non-linear bijectors between the stacked linear flows, or you could use non-linear bijectors instead of the linear bijectors. Let’s go for the first option.

You need to pick a non-linear parametric transformation function for which you can then find the parameter values via the MaxLike approach. There are many possible transformation functions (bijectors) to do so. Have a look at http://mng.bz/AApz . Many of the bijectors have either no parameters, like softplus , or limit the allowed range of z or x. The SinhArcsinh bijector has a complicated name but looks quite promising: it has two parameters, skewness and tailweight , and if tailweight>0 , there are no restrictions on x and z. Figure 6.13 shows that bijector for some parameters. For tailweight=1 and skewness=1 it looks quite non-linear, and with these parameters, we do not need to restrict the range of x and z. We, therefore, use it to fit the Old Faithful data (see listing 6.6). Note that there might be other bijectors in the TFP package that fulfill the requirements as well.

Figure 6.13 The bijector SinhArcsinh for different parameter values

Let’s construct a chain and add SinhArcsinh bijectors between the AffineScalar bijectors. This is done in the following listing.

Listing 6.6 The simple Old Faithful bijector example in TFP

num_bijectors = 5                                     #1 
bs=[]
for i in range(num_bijectors):

    sh = tf.Variable(0.0)
    sc=tf.Variable(1.0)
    bs.append(tfb.AffineScalar(shift=sh, scale=sc))   #2 

    skewness=tf.Variable(0.0)
    tailweight=tf.Variable(1.0) 
    bs.append(tfb.SinhArcsinh(skewness,tailweight))   #3 
 
 
bijector = tfb.Chain(bs)                              #4 
dist = tfd.TransformedDistribution(distribution=
        tfd.Normal(loc=0,scale=1),bijector=bijector)

Number of layers

The AffineScalar transformation

The SinhArcsinh acting as non-linearity

Creates the chain of bijectors from the list of bijectors

Visit the notebook http://mng.bz/xWVW to see this chain of bijectors used for the Old Faithful geyser waiting times, which yields the histogram in figure 6.14. By the way, in figure 6.12, you see some of the steps from N(0,1) to the distribution of the Old Faithful’s waiting times.

Figure 6.14 The histogram of the Old Faithful geyser waiting times (filled bars), along with the fitted density distribution (solid line). The histogram doesn’t show the shape of a simple distribution like a Gaussian. A flow of five layers captures the characteristics of the data well (solid line).

So far, we’ve considered one-dimensional data, but does this method also work for higher dimensional data? Imagine, for example, image data where each image has the dimension 256 × 256 × 3 (height × width × channels). Again, we’re interested in learning the distribution of this image data so that we can sample from it. It turns out that this flow method also works for higher dimensions. The only principal difference is that the bijectors are no longer one-dimensional functions but have as many dimensions as your data.

In the next two sections, we extend the NF method to higher dimensions. If you’re more interested in the application of this method than in the mathematical twists, you can skip those sections and go directly to section 6.3.7. But if you want to know the details, keep reading!

6.3.5 Transformation between higher dimensional spaces*

Let’s formulate the task of modeling the distribution of high-dimensional data like images so that we can use the flow method to fit the distribution. First, we flatten the image data to receive (for each image) a vector x with 196,608 entries. The resulting vectors live in a 196,608-dimensional space and have an unknown distribution p_x(x), which is probably complex. You can now pick a simple base distribution, p_z(z), for a variable z, such as a Gaussian, for example. The task is to find a transformation g(z) that transforms the vector z to the vector x. We need a bijective transformation g and, therefore, the dimensionality of x and z has to be the same. Let’s see how such a transformation looks for a three-dimensional space, which means we deal with the data points x = (x1 , x2 , x3) and z = (z1 , z2 , z3). The transformation x = g(z) looks like this:

(x1 , x2 , x3) = (g1(z1 , z2 , z3), g2(z1 , z2 , z3), g3(z1 , z2 , z3))

For the one-dimensional flow, the main formula was

p_x(x) = p_z(z) ⋅ |dg(z)/dz|^{-1}

and the term |dg(z)/dz| was identified as a change in length when going from z to x. In two dimensions, the corresponding quantity is the change of an area, and in three dimensions, we need to take into account the change of a volume. For four and more dimensions, the change of the volume is the change of a hypervolume under the transformation g. From now on we just call it a volume, regardless of whether we have a length or an area.

The scalar derivative |dg(z)/dz| in the one-dimensional formula (equation 6.3) is replaced by a matrix of partial derivatives, which is called the Jacobi matrix. To understand the Jacobi matrix, we first recall what a partial derivative is. The partial derivative of a function g(z1 , z2 , z3) of three variables z1, z2, z3 with respect to (w.r.t.) z2 is written as ∂g(z1 , z2 , z3) / ∂z2 . To make an easy example, what is ∂g(z1 , z2 , z3) / ∂z2 when g(z1 , z2 , z3) = 42 ⋅ z2 + sinh(exp(z1 / z3))? Well, you are lucky, it’s 42. That was easy! The partial derivatives w.r.t. z1 and z3 would have been much more complicated. Instead of g(z1 , z2 , z3) just returning a single number, we consider the case in which g returns a vector, let’s say, of three components. If you need an example, this function g could be a fully connected network (fcNN) with three input and output neurons. For this example, the Jacobi matrix looks as follows:

∂g(z)/∂z =
( ∂g1/∂z1   ∂g1/∂z2   ∂g1/∂z3 )
( ∂g2/∂z1   ∂g2/∂z2   ∂g2/∂z3 )
( ∂g3/∂z1   ∂g3/∂z2   ∂g3/∂z3 )

In the one-dimensional case, you had to determine the absolute value of the derivative |dg(z)/dz| in the change of variable formula (equation 6.3). In the higher dimensional case, it turns out that you have to replace this term by the absolute value of the determinant of the Jacobi matrix. The change of variable formula for high-dimensional data looks like this:

p_x(x) = p_z(z) ⋅ |det(∂g(z)/∂z)|^{-1}      Equation 6.4

You don’t know what a determinant is or forgot about it? Don’t worry. The only thing you need to know is that for a triangular matrix (like the one shown in equation 6.5), you can compute the determinant as a product of the diagonal elements. This is, of course, also true if some (or all) of the off-diagonal elements in the lower part of the triangular matrix are also zero. Anyway, you don’t have to compute the determinant yourself.

Each TFP bijector implements the method log_det_jacobian(z) , and the flow or a chain of flows can be calculated as described previously. The calculation of a determinant is quite time-consuming. There’s a nice trick, however, to speed up the calculation. If a matrix is a so-called triangular matrix, then the determinant is the product of the diagonal elements. How to get the zeros in a Jacobi matrix? A partial derivative of a function g_k w.r.t. a variable z_i is zero if the function g_k doesn’t depend on the variable z_i. If we build the flows so that g1(z1, z2, z3) is independent of z2, z3 and g2(z1, z2, z3) is independent of z3, then the previous matrix becomes

∂g(z)/∂z =
( ∂g1/∂z1   0         0       )
( ∂g2/∂z1   ∂g2/∂z2   0       )
( ∂g3/∂z1   ∂g3/∂z2   ∂g3/∂z3 )      Equation 6.5

This matrix is a triangular matrix and, hence, you can obtain the determinant by the product of the diagonal elements. A nice property of a triangular Jacobi matrix is that you don’t have to calculate the off-diagonal terms (shown in gray in equation 6.5) to determine the determinant. These off-diagonal terms play a role in the first term, p_z(z) = p_z(g^{-1}(x)) (equation 6.4), but not for the second term, |det(∂g(z)/∂z)|^{-1} . To model complex distributions, it might be necessary to use complex functions for these off-diagonal terms. Fortunately, it’s not a problem at all if these expressions are complicated because you don’t have to calculate the derivatives of them. To get such a nice triangular Jacobi matrix, one simply has to ensure that g_j(z) is independent of z_i with i > j.
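The claim that only the diagonal matters can be checked numerically on a small example. The sketch below uses plain Python; the triangular map g, the finite-difference helper numerical_jacobian, and the 3 × 3 determinant helper det3 are made up for this illustration. Component j of g depends only on z1, ..., z_j, and the off-diagonal dependencies are deliberately non-linear.

```python
import math


def g(z):
    """A hypothetical triangular map: component j only depends on z1..zj."""
    z1, z2, z3 = z
    x1 = 2.0 * z1
    x2 = math.sin(z1) + 3.0 * z2
    x3 = z1 * z2 + math.exp(z1) + 0.5 * z3
    return [x1, x2, x3]


def numerical_jacobian(f, z, eps=1e-6):
    """Central finite differences; entry [k][i] approximates d f_k / d z_i."""
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for i in range(n):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        for k in range(n):
            jac[k][i] = (f_plus[k] - f_minus[k]) / (2.0 * eps)
    return jac


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


J = numerical_jacobian(g, [0.3, -1.2, 0.7])
full_det = det3(J)
diag_product = J[0][0] * J[1][1] * J[2][2]
print(full_det, diag_product)
```

Both values should be close to 2 ⋅ 3 ⋅ 0.5 = 3, even though the entries below the diagonal involve sin, exp, and a product of inputs.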

We’ve seen that bijectors that lead to a triangular Jacobi matrix are convenient to handle. But are these also flexible enough to model all kinds of complex distributions? Fortunately, the answer is yes! In 2005, Bogachev and colleagues showed that for any D-dimensional distribution pair (a complicated distribution for x and a simple base distribution for z), you can find triangular bijectors that transform one distribution into the other.

6.3.6 Using networks to control flows

Now you’re going to see the powerful combination of networks and NFs. The basic idea is to use NNs to model the components g_j of the D-dimensional bijector function

g(z) = (g1(z1 , ..., z_D), g2(z1 , ..., z_D), ..., g_D(z1 , ..., z_D)). The discussion in the last section gives us some guidelines on how to design the NNs used to model the different components g_j of the bijector g:

  1. We want the bijector to have a triangular Jacobi matrix, which ensures that g_j is independent of z_i (where i > j), i.e., g_j(z1 , z2 , ..., z_D) = g_j(z1 , ..., z_j).
  2. We want the diagonal elements of the Jacobi matrix to be easy to compute: ∂g_j(z1 , ..., z_j) / ∂z_j .
  3. For the off-diagonal elements in the lower triangular part of the Jacobi matrix, there’s no need to compute a partial derivative of these functions. They can be quite complicated.
  4. Last, but not least, we need an invertible transformation.

Let’s focus on the first item in the list and write the components of a triangular bijector function g(z):

x1 = g1(z1 , z2 , ..., z_D) = g1(z1)

x2 = g2(z1 , z2 , ..., z_D) = g2(z1 , z2)

....

x_D = g_D(z1 , z2 , ..., z_D) = g_D(z1 , ..., z_D)

The next question is which parametric functions do we use for the components g_j? Here the second and third guidelines in the preceding list come into play. You should design g_j such that the partial derivative ∂g_j(z1 , ..., z_j) / ∂z_j , corresponding to a diagonal element of the Jacobi matrix, is easy to compute. For a linear function, the derivative is easy to compute. Let’s choose g_j to be linear in z_j :

x_j = g_j(z1 , z2 , ..., z_j) = b + a ⋅ z_j

Note that g_j can be non-linear in z1 , z2 , ..., z_{j−1} . This means that the intercept b and the slope a can be complex functions of these z components: b_j = b_j(z1 , z2 , ..., z_{j−1}) and a_j = a_j(z1 , z2 , ..., z_{j−1}). This yields

x_j = g_j(z1 , z2 , ..., z_j) = b_j(z1 , z2 , ..., z_{j−1}) + a_j(z1 , z2 , ..., z_{j−1}) ⋅ z_j      Equation 6.6

Because b_j = b_j(z1 , z2 , ..., z_{j−1}) and a_j = a_j(z1 , z2 , ..., z_{j−1}) can be complex functions, you can use NNs to model these. It’s a known fact that NNs with at least one hidden layer are flexible enough to approximate a very wide class of functions, and so a and b can depend in a complex manner on the provided z components. In one-dimensional cases, you need a monotone increasing or decreasing function to ensure bijectivity. This can be guaranteed by ensuring that the slope isn’t zero. In multidimensional cases, instead of the slope, you now need to ensure that the determinant of the Jacobi matrix isn’t zero. We do this by making sure that all entries of the diagonal are larger than zero. For this, you can use the same trick as in chapter 4 when modeling a positive standard deviation: you don’t directly use the output α_j(z1 , z2 , ..., z_{j−1}) of the NN as a slope, but you first pipe it through an exponential function. This yields a_j = exp(α_j(z1 , z2 , ..., z_{j−1})) and, in this case, equation 6.6 becomes

x_j = g_j(z1 , z2 , ..., z_j) = b_j(z1 , z2 , ..., z_{j−1}) + exp(α_j(z1 , z2 , ..., z_{j−1})) ⋅ z_j      Equation 6.7

Computing the determinant of the Jacobi matrix is easy. Just compute the product of the partial derivatives of g_j w.r.t. z_j:

det(∂g(z)/∂z) = ∏_{j=1}^{D} ∂g_j/∂z_j = ∏_{j=1}^{D} exp(α_j(z1 , z2 , ..., z_{j−1}))

As you can see, the determinant of this matrix is easy to calculate! The determinant is given by the product of positive terms and, thus, is also positive.

As discussed, models like the one in equation 6.6 allow for an efficient implementation of NF models. In the literature, these kinds of models are sometimes also called inverse autoregressive models. The name “autoregressive” indicates that the input of the regression model for the variable x_i depends only on previous observations x1 , ..., x_{i−1} from the same variable (hence the name, “auto”). You saw examples of the WaveNet and PixelCNN autoregressive models in section 6.1.1. But a flow model isn’t autoregressive per se because x_j = g_j(z1 , z2 , ..., z_j) is determined by the former (and current) values of z and not x. Still, there’s a connection that gives rise to the name inverse autoregressive models. If you’re interested in the details, you might want to look at the blog post http://mng.bz/Z26P .

To realize such an NF model with an fcNN, you need D different networks, each calculating α_j and b_j (see equation 6.7) from different inputs: z1 , z2 , ..., z_{j−1} for j ∈ {1 , 2 , ..., D}. Having D networks would require many parameters, and further, the sampling from the D networks would also take quite some time. Why not take a single network that takes all z1 , z2 , ..., z_D inputs and then outputs all α_j and b_j values from which you can calculate all x1 , x2 , ..., x_D values in one go? Take a second to come up with an answer.

The answer is that an fcNN violates the requirement that g_j is independent of z_i with i > j, that is, g_j(z1 , z2 , ..., z_D) = g_j(z1 , ..., z_j), and thus doesn’t yield a triangular Jacobian. But there’s a solution. There are special networks, called autoregressive networks, that mask parts of the connections to ensure that the output nodes for component j don’t depend on the input nodes z_i with i ≥ j. Luckily, you can use TFP’s tfp.bijectors.AutoregressiveNetwork , which guarantees that property. This network was first described in a paper called “Masked Autoencoder for Distribution Estimation (MADE)” (see https://arxiv.org/abs/1502.03509).

Let’s look at the training of such a network in D = 4 dimensions. In the training, we go from the observed four-dimensional x to a four-dimensional z, where we evaluate the base density p_z(z) = p_z(g^{-1}(x)) that enters the likelihood p_x(x). For this, we rely on

x_j = g_j(z1 , z2 , ..., z_j) = b_j(z1 , z2 , ..., z_{j−1}) + exp(α_j(z1 , z2 , ..., z_{j−1})) ⋅ z_j      Equation 6.7 (repeated)

We then need to solve equation 6.7 for z_j , yielding:

z_j = (x_j − b_j(z1 , z2 , ..., z_{j−1})) / exp(α_j(z1 , z2 , ..., z_{j−1}))

This is a sequential process (z_j can only be computed once z1 , ..., z_{j−1} are known), and thus, training can’t be parallelized and is rather slow. However, during the test phase, it’s fast.4
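The difference between the two directions can be sketched in a few lines of plain Python. This is an illustration under assumptions: the function shift_and_log_scale is a hypothetical stand-in for the masked network (it maps the already known components z1, ..., z_{j−1} to b_j and α_j), and forward and inverse are names chosen for this example. Going from z to x, all inputs are available up front; going from x to z, each z_j needs the previously recovered components.

```python
import math


def shift_and_log_scale(prev):
    """Hypothetical stand-in for the network: (b_j, alpha_j) from z_1..z_{j-1}."""
    s = sum(prev)
    return math.tanh(s), 0.5 * math.sin(s)


def forward(z):
    """z -> x following equation 6.7; also returns log|det Jacobian|."""
    x, log_det = [], 0.0
    for j in range(len(z)):
        b, alpha = shift_and_log_scale(z[:j])   # all of z is known up front
        x.append(b + math.exp(alpha) * z[j])
        log_det += alpha                        # log of the diagonal entry exp(alpha_j)
    return x, log_det


def inverse(x):
    """x -> z; sequential because z_j needs the already recovered z_1..z_{j-1}."""
    z = []
    for j in range(len(x)):
        b, alpha = shift_and_log_scale(z)
        z.append((x[j] - b) * math.exp(-alpha))
    return z


z = [0.5, -1.0, 2.0, 0.1]
x, log_det = forward(z)
z_back = inverse(x)
print(x, log_det)
print(z_back)
```

The round trip recovers z up to floating-point error, and the log-determinant is simply the sum of the α_j outputs, so no matrix determinant has to be computed explicitly.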

Laurent Dinh, et al., introduced a somewhat different approach to build an invertible flow in a paper called “Density Estimation using Real NVP,” which is available at https://arxiv.org/abs/1605.08803 . In this paper, they proposed a flow called a real non-volume preserving flow or Real NVP. The name non-volume preserving states that this method (like the triangular flows) can have a Jacobian determinant unequal to one and can thus change the volume. Compared to the design in equation 6.6, their Real NVP design is much simpler (shown in figure 6.15). When comparing figure 6.15 to equation 6.6, you can see that the Real NVP architecture is a simplified and sparse version of the triangular flow. If you use a triangular flow and set b = 0 and α = 0 (that is, a slope of exp(0) = 1) for the first d dimensions and then let α and b of the remaining dimensions depend only on the first d components of z, then you end up with a Real NVP model. The Real NVP architecture isn’t as flexible as a fully triangular bijector, but it allows for fast computations.

Figure 6.15 The architecture of a Real NVP model. The first d components, z1 , z2 , ..., z_d , stay untransformed, yielding x1 = z1 , x2 = z2 , ..., x_d = z_d . The remaining components of x, x_{d+1} , ..., x_D , depend only on the first d components of z (z1 , z2 , ..., z_d) and are transformed as x_i = g_i(z1 , z2 , ..., z_d , z_i) = b_i(z1 , z2 , ..., z_d) + exp(α_i(z1 , z2 , ..., z_d)) ⋅ z_i for i = d + 1 , ..., D ; this multiplication is marked by a symbol in the figure.

In a Real NVP model, the first d components are passed through directly from z to x (see figure 6.15 and also the following example) and further used as input to an NN, which outputs α_i and b_i for the remaining coordinates x_i for i = d + 1 , ..., D .

The idea in Real NVP is to first choose a d between 1 and the dimensionality D of your problem (the dimensionality of z and x). To make the discussion simpler, let’s choose D = 5 and d = 2. But, of course, the results are also valid for the general case. The flow goes from z (following a simple distribution) to x (following a complex distribution). The first d components (here d = 2) are passed through directly from z to x (see the first two lines of equation 6.8). Now these d components (here two), z1 and z2, are the input to an NN computing α_i and b_i (see equation 6.7), yielding the slope a_i = exp(α_i) and the shift b_i of the linear transformation x_i = b_i + a_i ⋅ z_i for i ∈ {3, 4, 5} (see lines 3-5 in equation 6.8). The NN has two heads as an outcome. Both have D − d (here, 5 − 2 = 3) nodes. One head is b3(z1 , z2), b4(z1 , z2), b5(z1 , z2) and the other head is α3(z1 , z2), α4(z1 , z2), α5(z1 , z2). The next three transformed variables are determined using the network via:

x1 = ɡ1(z1) = z1

x2 = ɡ2(z2) = z2

x3 = ɡ3(z1 , z2 , z3) = b3(z1 , z2) + exp(α3(z1 , z2)) ⋅ z3

x4 = ɡ4(z1 , z2 , z4) = b4(z1 , z2) + exp(α4(z1 , z2)) ⋅ z4

x5 = ɡ5(z1 , z2 , z5) = b5(z1 , z2) + exp(α5(z1 , z2)) ⋅ z5

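It’s an affine transformation like the one we used before, and the scale and shift terms are controlled by an NN, but this time, the NN only gets the first d = 2 components of z as input. Does this network fulfill the requirements to be bijective and triangular? Let’s invert the flow and go from x to z. Because x1 = z1 and x2 = z2, the inputs of the network are also available on the x side, and this yields

z1 = x1

z2 = x2

zi = (xi − bi(x1 , x2)) / exp(αi(x1 , x2)) for i = 3, 4, 5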

Also, the Jacobi matrix has the desired triangular form; in fact, all off-diagonal elements in columns > d are zero. You only need to compute the diagonal elements to determine the determinant, which is required for the NF method.
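The Jacobi matrix itself is not reproduced in this copy of the text. As a reconstruction under that caveat, differentiating equation 6.8 with respect to z for D = 5 and d = 2 gives a matrix of the following form. The entries written as ∗ (the lower-left block) are generally non-zero, but they never enter the determinant:

```latex
\frac{\partial x}{\partial z} =
\begin{pmatrix}
1    & 0    & 0                        & 0                        & 0 \\
0    & 1    & 0                        & 0                        & 0 \\
\ast & \ast & e^{\alpha_3(z_1, z_2)}   & 0                        & 0 \\
\ast & \ast & 0                        & e^{\alpha_4(z_1, z_2)}   & 0 \\
\ast & \ast & 0                        & 0                        & e^{\alpha_5(z_1, z_2)}
\end{pmatrix},
\qquad
\det \frac{\partial x}{\partial z}
  = \prod_{i=3}^{5} e^{\alpha_i(z_1, z_2)}
  = \exp\Big( \sum_{i=3}^{5} \alpha_i(z_1, z_2) \Big)
```

The log-determinant is therefore just the sum of the network's α outputs, which is what makes this design cheap to evaluate.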

The non-zero, off-diagonal elements are gray-shaded in the equation because we don’t need them. This part of the flow is called a coupling layer.
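To make the coupling layer concrete, here is a minimal, dependency-free sketch for the D = 5, d = 2 example. It is not the TFP implementation: `toy_net` is a hypothetical stand-in for the NN (any function of z1 and z2 would do), and the names `forward` and `inverse` are chosen for illustration only.

```python
import math

D, d = 5, 2  # dimensions of the running example


def toy_net(head):
    """Hypothetical stand-in for the NN: maps (z1, z2) to alpha_3..5 and b_3..5."""
    z1, z2 = head
    alpha = [0.5 * z1, -0.3 * z2, 0.1 * (z1 + z2)]  # log-slopes
    b = [z1 + z2, z1 * z2, math.sin(z1)]            # shifts
    return alpha, b


def forward(z):
    """z -> x as in equation 6.8; also returns log|det J| = sum of the alphas."""
    alpha, b = toy_net(z[:d])
    tail = [b_i + math.exp(a_i) * z_i for a_i, b_i, z_i in zip(alpha, b, z[d:])]
    return z[:d] + tail, sum(alpha)


def inverse(x):
    """x -> z; the net is evaluated on x[:d], which equals z[:d]."""
    alpha, b = toy_net(x[:d])
    tail = [(x_i - b_i) * math.exp(-a_i) for a_i, b_i, x_i in zip(alpha, b, x[d:])]
    return x[:d] + tail


z = [0.3, -1.2, 0.7, 0.0, 2.0]
x, log_det = forward(z)
z_back = inverse(x)  # recovers z up to floating-point error
```

Note that inverting the layer never requires inverting the network itself; the network is only ever evaluated in the forward direction on the untouched components.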

It’s a bit strange in a Real NVP that the first d dimensions aren’t affected by the flow. But we can bring them into play in additional layers. Because we want to stack more layers anyway, let’s reshuffle the zi components before we get to the next layer. Reshuffling is invertible, and the absolute value of the determinant of its Jacobi matrix is 1. We can use the TFP bijector tfb.Permute() to reshuffle. In listing 6.7, which shows the relevant code, we use five pairs of coupling layers and permutations (see also the following notebook).
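The stacking idea can be sketched without TFP for the 2D case (d = 1). Everything here is an assumption made for illustration: the per-block "network" is a fixed toy formula indexed by k, and `swap` plays the role of the permutation [1, 0]. As in listing 6.7, the permutation after the last block is left out.

```python
import math


def coupling_forward(z, k):
    """2D coupling layer with d = 1; k selects a hypothetical toy 'network'."""
    alpha = math.tanh(0.5 * (k + 1) * z[0])  # log-slope, depends on z1 only
    b = 0.1 * (k + 1) * z[0] ** 2            # shift, depends on z1 only
    return [z[0], b + math.exp(alpha) * z[1]], alpha


def coupling_inverse(x, k):
    alpha = math.tanh(0.5 * (k + 1) * x[0])
    b = 0.1 * (k + 1) * x[0] ** 2
    return [x[0], (x[1] - b) * math.exp(-alpha)]


def swap(v):
    """Permutation [1, 0]: its own inverse, contributes 0 to log|det J|."""
    return [v[1], v[0]]


def flow_forward(z, num_blocks=5):
    log_det = 0.0
    for k in range(num_blocks):
        z, ld = coupling_forward(z, k)
        log_det += ld
        if k < num_blocks - 1:  # no permutation after the last block
            z = swap(z)
    return z, log_det


def flow_inverse(x, num_blocks=5):
    for k in reversed(range(num_blocks)):
        if k < num_blocks - 1:
            x = swap(x)  # undo the permutation first
        x = coupling_inverse(x, k)
    return x


z0 = [0.4, -1.1]
x0, total_log_det = flow_forward(z0)
z0_back = flow_inverse(x0)
```

Because of the swaps, each coordinate is transformed in some blocks and serves as network input in the others, while the total log-determinant stays a simple sum.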

Hands-on time Open http://mng.bz/RArK . The notebook contains the code to show how to use a Real NVP flow of a banana-shaped, 2D distribution on a toy data set.

  • Execute the code and try to understand it

  • Play with the number of hidden layers

Listing 6.7 A simple example of a Real NVP flow in TFP

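# Excerpt from the model's constructor. Assumes tfd = tfp.distributions and
# tfb = tfp.bijectors, and that num_masked and self.nets are defined earlier.
bijectors = []                                                  #1
num_blocks = 5                                                  #2
h = 32                                                          #3
for i in range(num_blocks):
    net = tfb.real_nvp_default_template([h, h])                 #4
    bijectors.append(
        tfb.RealNVP(shift_and_log_scale_fn=net,
                    num_masked=num_masked))                     #5
    bijectors.append(tfb.Permute([1, 0]))                       #6
    self.nets.append(net)
# Drop the trailing permutation and chain the remaining bijectors
bijector = tfb.Chain(list(reversed(bijectors[:-1])))

self.flow = tfd.TransformedDistribution(                        #7
    distribution=tfd.MultivariateNormalDiag(loc=[0., 0.]),
    bijector=bijector)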

#1 Adds num_blocks pairs of coupling layers and permutations to the list of bijectors

#2 Number of hidden layers in the NF model

#3 Size of the hidden layers

#4 Defines the network

#5 A shift-and-scale flow with parameters from the network

#6 Permutation of coordinates

#7 Distribution of z with two independent Gaussians

So now you’ve seen how to construct flows using networks. The trick is to keep everything invertible and to construct the flows in a way that the determinant of the Jacobi matrix can be easily calculated. Finally, let’s have a look at the Glow architecture and have some fun sampling real-looking facial images from a normalizing flow.

6.3.7 Fun with flows: Sampling faces

Now you come to the fun part. OpenAI did some great work in developing an NF model that they call the Glow model. You can use it to create realistic-looking faces and other images. The Glow model is similar to the Real NVP model with some tweaks. The main change is that the permutation is replaced by a 1 × 1 convolution.
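To see why a 1 × 1 convolution generalizes a permutation, consider the following sketch. It applies one channel-mixing matrix W to every pixel of a tensor stored as nested lists of shape (h, w, c). The function name `conv1x1` and the matrices are illustrative assumptions, not the Glow code. With a permutation matrix as W, the operation simply reorders the channels; a learned invertible W can mix them. In the Glow paper, such a layer contributes h · w · log|det W| to the log-determinant, which is zero for a permutation matrix.

```python
def conv1x1(x, W):
    """Apply the same c-by-c matrix W to the channel vector of every pixel.

    x is a nested list of shape (h, w, c).
    """
    return [[[sum(W[o][i] * pixel[i] for i in range(len(pixel)))
              for o in range(len(W))]
             for pixel in row]
            for row in x]


x_img = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]   # shape (1, 2, 3)
P = [[0, 0, 1],                                 # permutation matrix:
     [1, 0, 0],                                 # out = (in3, in1, in2)
     [0, 1, 0]]
P_inv = [[0, 1, 0],                             # transpose of P, its inverse
         [0, 0, 1],
         [1, 0, 0]]

y_img = conv1x1(x_img, P)         # channels reordered per pixel
x_back = conv1x1(y_img, P_inv)    # original tensor again
```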

In this section, we now work with image data. In the examples up to section 6.3.5, we worked with 1D scalar data. In sections 6.3.5 and 6.3.6, we used D-dimensional data (for both z and x), but still the data were simple vectors. If we want to operate on images, we need to work with tensors to take their 2D structure into account. Therefore, we now have to operate on tensors x and z of shape (h, w, n), which define the height (h), width (w), and number of color channels (n), instead of vectors.

How do we apply a Real NVP-like flow on the tensors? Recall the Real NVP architecture for vectors (see figure 6.15 and equation 6.8). In the case of tensors, the first d channels (now d two-dimensional slices) aren’t affected by the transformation but serve as input to a CNN. That CNN defines the transformation of the remaining channels of the input.

As in a regular CNN architecture, the height and width are reduced, and the number of channels is increased when going deeper into the network. The idea behind this is to find more abstract representations. But in an NF model, the input and output need to have the same dimensions. Therefore, the number of channels is increased by a factor of four if height and width are each reduced by a factor of two.
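The trade of spatial size for channels can be sketched as a "squeeze" operation on nested lists of shape (h, w, c). The channel ordering used below is an arbitrary choice for illustration and is not claimed to match the ordering in the Glow code; what matters is that no value is lost and the operation is invertible.

```python
def squeeze(x):
    """(h, w, c) -> (h/2, w/2, 4c): move each 2x2 spatial block into channels."""
    h, w = len(x), len(x[0])
    return [[x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def unsqueeze(y):
    """Inverse of squeeze: (h/2, w/2, 4c) -> (h, w, c)."""
    h2, w2, c = len(y), len(y[0]), len(y[0][0]) // 4
    x = [[None] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            v = y[i][j]
            x[2 * i][2 * j], x[2 * i][2 * j + 1] = v[:c], v[c:2 * c]
            x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1] = v[2 * c:3 * c], v[3 * c:]
    return x


# A small (4, 4, 3) tensor with distinct entries
x_small = [[[float(100 * i + 10 * j + k) for k in range(3)]
            for j in range(4)]
           for i in range(4)]
y_small = squeeze(x_small)       # shape (2, 2, 12): same 48 values
x_restored = unsqueeze(y_small)  # identical to x_small
```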

In the output layer, the height and width are one, and the depth is given by the number of values of the input, h ⋅ w ⋅ n (height × width × number of channels). For more details, see the paper “Glow: Generative Flow with Invertible 1x1 Convolutions” by Kingma and Dhariwal, which you can find at https://arxiv.org/abs/1807.03039 or have a look at the official GitHub repository at https://github.com/openai/glow .

The bottom line is that an image x with dimensions (h, w, n), typically (256, 256, 3), is transformed into a vector z of length h ⋅ w ⋅ n, typically 196,608, where each dimension comes from an independent N(0,1) distributed Gaussian. This vector can again be reshaped into a color image of dimensions (256 × 256 × 3).
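The bookkeeping behind the number 196,608 can be checked with a few lines. This is only the shape arithmetic implied by the text (halve height and width, quadruple the channels, repeat until height and width are one); the real Glow network is organized in a multi-scale fashion, so treat the loop as a simplification.

```python
h, w, n = 256, 256, 3      # typical image shape from the text
total = h * w * n          # number of values in the image

steps = 0
while h > 1:
    h, w, n = h // 2, w // 2, n * 4   # one halving of height and width
    steps += 1

# After 8 halvings the shape is (1, 1, 196608): same number of values as before
final_shape = (h, w, n)
```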

The network has been trained on 30,000 images of celebrities. The training took quite some time, but fortunately one can download the pre-trained weights. Let’s play with this. Open the following notebook and follow it while reading the text in this section.

Hands-on time Open http://mng.bz/2XR0 . The notebook contains code to download the weights of a pre-trained Glow model. It’s highly recommended to use the Colab version because the weights are approximately 1 GB. Further, because the weights are stored in TensorFlow 1, we use a TF 1 version of Colab. With the notebook opened:

  • Sample random faces

  • Manipulate a face

  • Morph between two faces

  • Make Leonardo smile

First, sample a random face from the learned distribution of facial images. You can do this by sampling a vector z containing 196,608 independent Gaussian draws and then transforming it into a vector x = g(z) that can be reshaped to a facial image. Doing so, one usually finds artifacts. To avoid this and get more realistic “normal” face images, one doesn’t draw from N(0,1) but from a Gaussian with reduced variance such as N(0,0.7) to get closer to the center. This reduces the risk of getting unusual-looking facial images.
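A minimal sketch of this reduced-spread sampling, using only the standard library. Two assumptions are made here: the value 0.7 in N(0,0.7) is interpreted as the standard deviation (the text does not say whether it is the standard deviation or the variance), and the hypothetical helper `sample_latent` only produces z; turning z into an image would additionally require the trained flow g, which is not part of this sketch.

```python
import random
import statistics


def sample_latent(size, scale=1.0, seed=0):
    """Draw `size` independent Gaussian values with mean 0 and std `scale`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, scale) for _ in range(size)]


size = 256 * 256 * 3                       # 196,608 latent dimensions
z_full = sample_latent(size, scale=1.0)    # plain N(0, 1) draw
z_cool = sample_latent(size, scale=0.7)    # reduced spread (assumed std 0.7)

# The reduced-spread sample lies closer to the center of the z distribution
norm_full = sum(v * v for v in z_full) ** 0.5
norm_cool = sum(v * v for v in z_cool) ** 0.5
spread_cool = statistics.pstdev(z_cool)
```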

Another interesting application is mixing faces. Figure 6.17 shows the basic idea. You start with the first image, say an image x1 of Beyoncé. You then use the flow to calculate the corresponding vector z1 = ɡ−1(x1) . Then take a second image, say of Leonardo DiCaprio, and calculate the corresponding z2. Now let’s mix the two vectors. We have a variable c in the range 0 to 1 that describes the DiCaprio content. For c = 1, it’s DiCaprio; for c = 0, it’s Beyoncé. In the z space, the mixture is given by zc = c ⋅ z2 + (1 − c) ⋅ z1 . An alternative view on the formula is to rearrange it: zc = c ⋅ z2 + (1 − c) ⋅ z1 = z1 + c ⋅ (z2 − z1) = z1 + c ⋅ Δ . Here, Δ is the difference between z2 and z1 . See figure 6.16 for this interpretation.

Figure 6.16 Schematic sketch of the mixture in the space z. Note that the z space is high-dimensional and not 2D as in the figure. We move in a linear interpolation from Beyoncé in the direction Δ = z2 − z1 toward DiCaprio.

We start from c = 0 (Beyoncé) and move from there in the direction Δ toward DiCaprio. You then use the NF xc = ɡ(zc) to go from the z space to the x space. For some values of c, the resulting images, xc, are shown in figure 6.17.
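The latent-space mixing step is plain linear interpolation. The sketch below uses short toy vectors in place of the 196,608-dimensional latents, and the function name `mix` is chosen for illustration. Mapping the result back to an image would require the trained flow g, which is not included here.

```python
def mix(z1, z2, c):
    """Return z_c = z1 + c * (z2 - z1) for a mixing weight c in [0, 1]."""
    delta = [b - a for a, b in zip(z1, z2)]        # direction from z1 to z2
    return [a + c * d for a, d in zip(z1, delta)]


z_first = [0.0, 2.0, -1.0]    # toy stand-in for the latent of the first image
z_second = [1.0, 0.0, 3.0]    # toy stand-in for the latent of the second image

# Same grid of c values as in the morphing figure
path = [mix(z_first, z_second, c) for c in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The endpoints of `path` reproduce the two input latents, and every intermediate point lies on the straight line between them.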

Figure 6.17 Morphing from Beyoncé to Leonardo DiCaprio. The values are from left to right, c = 0 (100% Beyoncé), c = 0.25, c = 0.5, c = 0.75, and c = 1 (100% DiCaprio). An animated version is available at https://youtu.be/JTtW_nhjIYA .

What’s nice about figure 6.17 is that for all the intermediate xc’s, the faces look somewhat realistic. The question is, can we find other interesting directions in the high-dimensional space? It turns out, yes.

The CelebA data set is annotated with 40 categories, such as having a goatee, a big nose, a double chin, smiling, and so on. Could we use these to find a goatee direction? We take the average position of all images flagged as goatee and call it z1, then we take the average of all images flagged as no goatee and call it z2. Now let’s hope that Δ = z1 - z2 is indeed a direction for having a goatee. These directions have been calculated by OpenAI, and we can use them in the notebook. Let’s give it a try and grow DiCaprio a goatee. The result is shown in figure 6.18.
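Computing such an attribute direction amounts to a difference of two mean vectors. The following sketch uses tiny made-up latents and flags purely for illustration; the helper names are hypothetical, and in the notebook the precomputed directions are used instead.

```python
def mean_vector(vectors):
    """Component-wise average of a list of equally long vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def attribute_direction(latents, flags):
    """Mean latent of flagged images minus mean latent of unflagged images."""
    with_attr = [z for z, f in zip(latents, flags) if f]
    without_attr = [z for z, f in zip(latents, flags) if not f]
    m1, m2 = mean_vector(with_attr), mean_vector(without_attr)
    return [a - b for a, b in zip(m1, m2)]


def shift(z, delta, c):
    """Move the latent z by c times the attribute direction."""
    return [a + c * d for a, d in zip(z, delta)]


# Made-up 2D latents; True marks images flagged with the attribute
latents = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [0.0, 2.0]]
flags = [True, True, False, False]

delta = attribute_direction(latents, flags)
z_shifted = shift([0.5, 0.5], delta, c=0.5)
```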

Figure 6.18 Growing Leonardo DiCaprio a goatee. The values are from left to right, c = 0 (original, no goatee), c = 0.25, c = 0.5, c = 0.75, and c = 1. You can find an animated version at https://youtu.be/OwMRY9MdCMc .

That’s quite fascinating because the goatee information hasn’t been used during the training of the flow. Only after the training happened was the direction of the goatee found in the latent space. Let’s try to understand why moving in the latent z space produces valid images in the x space. Look at the example of morphing from Beyoncé to Leonardo DiCaprio. These are two points in the 196,608-dimensional x space. For a better understanding, let’s take a look at the 2D example discussed in section 6.3.5. Feel free to open the notebook http://mng.bz/RArK again and scroll down to the cell, Understanding the Mixture. The z distribution in the 2D example is produced by two independent Gaussians (see the left side of figure 6.19), and the x distribution looks like a boomerang (see the right side of figure 6.19).

Figure 6.19 A synthetic 2D example of a complex x distribution (shown on the right) and the latent z distribution (on the left). A learned Real NVP flow transforms from latent space to x space. The straight line in the z space corresponds to moving along the curved line in the x space.

We start with two points in the x space: in the high-dimensional example, that’s Beyoncé and DiCaprio. In our 2D example, these are the points (0.6, 0.25) and (0.6, -0.25), labeled with stars in figure 6.19. We then use the inverse flow to determine the corresponding points z1 and z2 in the z space, labeled by stars on the left side of figure 6.19. In the z space, we then move along a straight line from z1 to z2. (This is what you saw in figure 6.16.) In figure 6.19, you can see that the line on the left is completely inside the distribution. We don’t move into regions where there’s no training data (gray points). Now, we transform the line back to the x space of the real data. You can see on the right side of figure 6.19 that the resulting line is now curved and also stays in regions where there’s data. The same happens in the high-dimensional space, which is the reason that all the points between Beyoncé and DiCaprio look like real faces.

What would happen if we connected the two points in the x space directly? We’d leave the region of known points (see the dashed line in figure 6.19). The same would happen in the high-dimensional space, producing images that wouldn’t look like real images. How about the goatee? Again, we move along a straight direction in the latent space z without leaving the distribution. We start from a valid point (DiCaprio) and move along a certain direction (in our case, the goatee) without leaving the z distribution.
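The contrast between the two paths can be reproduced with a hand-made toy flow. The flow below is an assumption chosen for illustration, not the learned Real NVP from the notebook: g(z) = (S · z1 + z2², z2) bends a 2D Gaussian into a thin curved band. The sketch compares the log-density of the midpoint reached via the z space with that of the midpoint of the direct line in the x space.

```python
import math

S = 0.1  # thickness of the curved band (hypothetical toy parameter)


def g(z):
    """Toy flow from z space to x space; Jacobian determinant is S."""
    return [S * z[0] + z[1] ** 2, z[1]]


def g_inv(x):
    """Inverse of the toy flow."""
    return [(x[0] - x[1] ** 2) / S, x[1]]


def log_prob_x(x):
    """Change of variables: log p_x(x) = log p_z(g_inv(x)) - log|det dg/dz|."""
    z = g_inv(x)
    log_pz = -math.log(2 * math.pi) - 0.5 * (z[0] ** 2 + z[1] ** 2)
    return log_pz - math.log(S)


# Two points on the band, mirrored around the horizontal axis
x_a, x_b = g([0.0, 1.5]), g([0.0, -1.5])
z_a, z_b = g_inv(x_a), g_inv(x_b)

# Midpoint of the straight line in z space, mapped back to x space
z_mid = [(p + q) / 2 for p, q in zip(z_a, z_b)]
x_via_z = g(z_mid)

# Midpoint of the straight line drawn directly in x space
x_direct = [(p + q) / 2 for p, q in zip(x_a, x_b)]

lp_via_z = log_prob_x(x_via_z)    # stays where the density is high
lp_direct = log_prob_x(x_direct)  # far lower: the line leaves the band
```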

Summary

  • Real-world data needs complex distributions.
  • For categorical data, the multinomial distribution offers maximal flexibility with a drawback of having many parameters.
  • For discrete data with many possible values (like count data), multinomial distributions are ineffective.
  • For simple count data, Poisson distributions are fine.
  • For complex discrete data including count data, mixtures of discretized logistic distributions are successfully used in the wild, like in PixelCNN++ and parallel WaveNet.
  • Normalizing flows (NF) are an alternative approach to model complex distributions.
  • NFs are based on learning a transformation function that leads from simple base distributions to the complex real-world distribution of interest.
  • A powerful NF can be realized with an NN.
  • You can use an NN-powered NF to model high-dimensional complex distributions like faces.
  • You can also use NF models to sample data from the learned distribution.
  • TFP offers the bijector package that’s centered around an NF.
  • As in chapters 4 and 5, the maximum likelihood (MaxLike) principle does the trick in learning NFs.

1.In case you’re interested, the preprocessing was done using the statistics software R; the script is available at http://mng.bz/lGg6 . We’d like to thank Sandra Siegfried and Torsten Hothorn from the University of Zurich for providing us with help and the initial version of the R script.

2.More precisely, it needs to be strictly monotone.

3.We take the absolute values (|dz| and |dx|) because dz and dx could be negative.

4.In fact, autoregressive flows also exist. With the different trade-offs, these nets are fast in training but slow in prediction. It turns out that WaveNet is such an autoregressive flow.
