This chapter covers
- A fascinating research area that precedes GANs and has an interwoven history
- Deep learning approaches in a computer vision setting
- Our own adversarial examples with real images and noise
Over the course of this book, you have come to understand GANs as an intuitive concept. However, in 2014, GANs seemed like a massive leap of faith, especially for those unfamiliar with the emerging field of adversarial examples, including Ian Goodfellow’s and others’ work in this field.[1] This chapter dives into adversarial examples—specially constructed examples that make other classification algorithms fail catastrophically.
1 See “Intriguing Properties of Neural Networks,” by Christian Szegedy et al., 2014, https://arxiv.org/pdf/1312.6199.pdf.
We also talk about their connections to GANs and how and why adversarial learning is still largely an unsolved problem in ML—an important but rarely discussed flaw of the current approaches. That is true even though adversarial examples have an important role to play in ML robustness, fairness, and (cyber)security.
There is no denying we have made substantial progress in machine learning's capacity to match and surpass human-level performance over the last five years, for example, in computer vision (CV) classification tasks or the ability to play games.[2] However, looking only at metrics and ROC curves[3] is insufficient for us to understand (a) why neural networks make the decisions they do (how they work) and (b) what errors they are prone to making. This chapter touches on the first and dives into the second. Before we begin, it should be said that although this chapter deals almost exclusively with CV problems, adversarial examples have been identified in diverse areas such as text or even in humans.[4]
2 What constitutes human-level performance in vision-classification tasks is a complicated topic. However, at least in, for example, Dota 2 and Go, AI has beaten human experts by a substantial margin.
3 A receiver operating characteristic (ROC) curve explains the trade-off between false positives and negatives. We also encountered them in chapter 2. For more details, Wikipedia has an excellent explanation.
4 See “Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey,” by Wei Emma Zhang et al., 2019, http://arxiv.org/abs/1901.06796. See also “Adversarial Examples That Fool Both Computer Vision and Time-Limited Humans,” by Gamaleldin F. Elsayed et al., 2018, http://arxiv.org/abs/1802.08195.
First of all, when we speak about neural networks' performance, we frequently hear that their error rate is lower than that of humans on the large ImageNet dataset. This often-cited statistic (which started more as an academic joke than anything else) belies the performance differences hidden underneath this average. While humans' error rate tends to be driven mostly by their inability to distinguish between the different breeds of dogs that appear prominently in this dataset, the machine learning failures are much more ominous. Upon further investigation, adversarial examples were born.
Unlike humans, CV algorithms struggle with problems that are very different in nature and can be close to the training data. Because the algorithm has to make predictions for every possible picture, it has to extrapolate between the isolated and far-apart individual instances it has seen in the training data, even if we have lots of them.
When we have trained networks such as Inception V3 and VGG-19, we have found an amazing way of making image classification work on a tiny manifold around the training data. But when people tried to poke holes in the classification ability of these algorithms, they discovered a cosmic terror: current machine learning algorithms get easily fooled by even minor distortions. Virtually all major successful machine learning algorithms to date suffer from this flaw to some extent, and, indeed, some speculate that is why machine learning works at all.
Note
In supervised settings, think of our training set. We have a training manifold: just a fancy word describing a high-dimensional distribution in which our examples live. For example, our 300 × 300 pixel images live in a 270,000-dimensional space (300 × 300 × 3 colors). That makes training very complicated.
- With adversarial examples, we are typically trying to generate new examples that fool our existing systems into misclassifying the input. We usually do this either as live attackers or perhaps just as researchers to see how robustly our system will behave. Adversarial examples are about as closely related a topic to GANs as it gets, though important differences exist.
- This will give you a sense of why GANs can be so hard to train and why our existing systems are so fragile.
- Adversarial examples allow for a different set of applications from GANs, and we hope to give you at least the basics of their applicability.
In terms of applications, adversarial examples are interesting for several reasons:
- As discussed, adversarial examples can be used for malicious purposes, so it is important to test for robustness in critical systems. What if an attacker could easily fool a facial-recognition system to gain access to your phone?
- They help us understand machine learning fairness, which is a topic of growing importance. We can use adversarially learned representations that are useful for classification but do not allow an attacker to recover protected facts, which is probably one of the best ways of ensuring that our ML is not discriminating against anyone.
- In a similar vein, we can use adversarial learning to protect the privacy of sensitive (perhaps medical or financial) information about individuals. In this case, we are simply focusing on information about individuals not being recoverable.
As current research stands, learning about adversarial examples is the only way to start to understand adversarial defenses, as most papers begin with a description of the types of attacks they defend against and only then try to solve them. At the time of writing this book, no universal defenses work against all types of attack. But whether this is a good reason to study them depends on your view on adversarial examples. We decided not to cover defenses in detail, beyond the high-level asides toward the end of this chapter, because anything more is beyond the scope of this book.
To truly understand adversarial examples, we must come back to the domain of CV classification tasks, partially to understand how difficult a task it is. Recall that going from raw pixels to ultimately being able to classify sets of images is challenging.
This is in part because, in order to have a truly generalizable algorithm, we have to make sensible predictions on data nowhere near anything that we have seen in the training set. Moreover, the pixel-level differences between the image at hand and the closest image of the same class in the training set are large, even when we slightly change the angle at which the picture was taken.
When we have our training set of 100,000 examples of 300 × 300 images in RGB space, we have to somehow deal with 270,000 dimensions. When we consider all possible images (not the ones that we actually observed, but the ones that could happen), the pixel value of each dimension is independent of the other dimensions, because we can always generate a valid picture by rolling a hypothetical 256-sided die 270,000 times. Therefore, at 8-bit color depth, we theoretically have 256^270,000 possible examples (a number that is 650,225 digits long).
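If you want to sanity-check that digit count without computing the astronomically large number itself, one line of Python does it: the number of decimal digits of 256^270,000 is floor(270,000 × log10 256) + 1.

import math

# Decimal digits of 256**270000, computed without materializing the huge integer
print(math.floor(270_000 * math.log10(256)) + 1)   # prints 650225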
We would need a lot of examples to cover even 1% of this space. Of course, most of these images would not make any sense. Frequently, our training set is a lot sparser than that, so we need our algorithms to train using this relatively limited data and extrapolate even into regions they have not seen at all. This is because the algorithm most likely has seen nothing near these regions in the training set.
Note
Having 100,000 examples is frequently cited as a minimum at which deep learning algorithms should really start to shine.
We understand that algorithms have to meaningfully generalize; they have to be able to meaningfully fill in the huge part of space where they have not seen any example. Computer vision algorithms work mostly because they can come up with good guesses for the vast swaths of missing probability, but their strength is also their greatest weakness.
In this section, we introduce two ways of thinking about adversarial examples: one from first principles and the other by analogy. The first way to think about adversarial examples is to start from the way machine learning classification is trained. Remember that these are networks with tens of millions of parameters. Throughout training, we update some of them so that the class matches the label as provided in the training set. We need to find just the right parameter updates, which is what stochastic gradient descent (SGD) allows us to do.
Now think back to the simple classifier days, before you knew a lot about GANs. Here we have some sort of learnable classification function fθ(x) (for example, a deep neural network, or DNN), which is parametrized by θ (the parameters of the DNN) and takes x (for example, an image) as input and produces a classification ŷ. At training time, we then take ŷ and compare it with the true y, which is how we get our loss (L). We then update the parameters of fθ(x) such that the loss is minimized. Equations 10.1, 10.2, and 10.3 summarize.[5]
5 Please remember, this is just a quick summary, and we have to skip over some details, so if you can point them out, great. If not, we suggest picking up a book such as Deep Learning with Python by François Chollet (Manning, 2017) to brush up on the specifics.
equation 10.1.

$\hat{y} = f_\theta(x)$

equation 10.2.

$L = \lvert \hat{y} - y \rvert$

equation 10.3.

$\min_\theta \, \lvert f_\theta(x) - y \rvert$
Jn cseesen, wx booc ndiefed prediction sc oqr uotptu lx oyr aeurnl nxr aetrf ebngi kul ns aeplemx (equation 10.1). Loss aj xemc lxmt xl rvg ffrineceed tenweeb xqr orpt nqz tidcperde aellb (equation 10.2). Bxb aerovll plbreom jz rdnv eshadpr cz rnigyt rv mmieniiz oru ceenfifrde wenteeb xrg rtxp gcn cpdeetrid labels xxet rkp srrepameta kl xry QGD, hciwh ynor ttnsiteuoc rpk prediction inevg nz plmeaex (equation 10.3).
This is all working great, but how do we actually minimize our classification loss? How do we solve the optimization problem as phrased in equation 10.3? We usually use an SGD-based method: we take batches of x, then take the derivative of the loss function with respect to the current parameters (θ^t), multiplied by the learning rate (α), which constitutes our new parameters (θ^(t+1)). See equation 10.4.
equation 10.4.

$\theta^{t+1} = \theta^{t} - \alpha \, \nabla_{\theta^{t}} L\big(y, f_{\theta^{t}}(x)\big)$
This was the quickest introduction to deep learning you will ever find. But now that you have this context, think about whether this powerful tool (SGD) could be used for other purposes as well. For instance, what happens when we take a step up the loss space rather than down? Turns out, maximizing the error rather than minimizing it is much easier, but also important. And like many great discoveries, it started as a seeming bug that turned into a hack: what if we start updating the pixels rather than the weights? If we update them maliciously, adversarial examples happen.
Some of you may be confused by this quick recap of SGD, so let's remind ourselves what a typical loss space could look like in figure 10.1.
Figure 10.1. A typical loss space: this is the kind of loss surface we can feasibly get with our deep learning algorithms. On the left, you have 2D contour lines of equal loss, and on the right, you have a 3D rendering of what a loss space may look like. Remember the mountaineering analogy from chapter 6?

(Source: “Visualizing the Loss Landscape of Neural Nets,” by Tom Goldstein et al., 2018, https://github.com/tomgoldstein/loss-landscape.)
The second useful (though imperfect) mental model for thinking about adversarial examples is by analogy. You may think of adversarial examples as Conditional GANs like those we encountered in the preceding two chapters. With adversarial examples, we are conditioning on an entire image and trying to produce a domain-transferred or similar image, except in a domain that fools the classifier. The "generator" can be simple stochastic gradient ascent that adjusts the image to fool some other classifier.
Whichever of the two ways makes sense to you, let's now dive straight into adversarial examples and what they look like. They were discovered with an observation of how easy it is to misclassify these altered images. One of the first methods to achieve this is the fast gradient sign method (FGSM), which is as simple as our previous description.
You start with the gradient update (equation 10.4), look at the sign, and then make a small step in the opposite direction. In fact, frequently the images come out looking (almost) identical! A picture is worth a thousand words to show you how little noise is needed; see figure 10.2.
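Written out, the attack looks like the following; this compact form is how FGSM is usually stated in the literature (it is not one of this chapter's numbered equations). Note that the gradient is taken with respect to the input x rather than the weights θ, and we step up the loss instead of down:

$x_{\text{adv}} = x + \epsilon \cdot \text{sign}\big(\nabla_x L(\theta, x, y)\big)$

Here ε is the tiny per-pixel budget that keeps the perturbation imperceptible.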
Figure 10.2. A bit of noise makes a lot of difference. The picture in the middle has the noise (difference) applied to it (the picture to the right). Of course, the right picture is heavily amplified—approximately 300 times—and shifted so that it can create a meaningful image.

Now we run a ResNet-50 pretrained classifier on this unmodified vacation image and check out the top three predictions, shown in table 10.1; drumroll, please.
Table 10.1. Original image predictions

Order | Class | Confidence
---|---|---
First | mountain_tent | 0.6873
Second | promontory | 0.0736
Third | valley | 0.0717
The top three are all sensible, with mountain_tent taking the top spot, as it should. Table 10.2 shows the adversarial image predictions. The top three miss mountain_tent completely, with some suggestions that at least match the outdoors, but even the modified image is clearly not a suspension bridge.
Table 10.2. Adversarial image predictions

Order | Class | Confidence
---|---|---
First | volcano | 0.5914
Second | suspension_bridge | 0.1685
Third | valley | 0.0869
This is how much we can distort the prediction with a budget of only approximately 200 pixel values (the equivalent of taking a single almost-black pixel and turning it into an almost-white pixel) spread across the whole image.
A somewhat scary thing is how little code it takes to create this whole example. In this chapter, we'll use an amazing library called foolbox, which provides many great convenience methods to create adversarial examples. Without further ado, let's dive into it. We start with our well-known imports, plus foolbox, which is a library designed specifically to make adversarial attacks easier.
Listing 10.1. Our trusty imports
import numpy as np
from keras.applications.resnet50 import ResNet50
from foolbox.criteria import Misclassification, ConfidentMisclassification
from keras.preprocessing import image as img
from keras.applications.resnet50 import preprocess_input, decode_predictions
import matplotlib.pyplot as plt
import foolbox
import pprint as pp
import keras
%matplotlib inline
Next, we define a convenience function to load in more images.
Listing 10.2. Helper function
def load_image(img_path: str):
    image = img.load_img(img_path, target_size=(224, 224))
    plt.imshow(image)
    x = img.img_to_array(image)
    return x

image = load_image('DSC_0897.jpg')
Next, we have to set Keras to register our model and download ResNet-50 from the Keras convenience function.
Listing 10.3. Creating tables 10.1 and 10.2
keras.backend.set_learning_phase(0)                              #1 Inference mode: no dropout or batch-norm updates
kmodel = ResNet50(weights='imagenet')
preprocessing = (np.array([104, 116, 123]), 1)
fmodel = foolbox.models.KerasModel(kmodel, bounds=(0, 255),      #2 Wraps the Keras model so foolbox can attack it
                                   preprocessing=preprocessing)
to_classify = np.expand_dims(image, axis=0)                      #3 Adds a batch dimension
preds = kmodel.predict(to_classify)                              #4 Predictions for the original image (table 10.1)
print('Predicted:', pp.pprint(decode_predictions(preds, top=20)[0]))
label = np.argmax(preds)                                         #5 The most likely class becomes the label to attack
image = image[:, :, ::-1]                                        #6 Flips RGB to BGR, the channel order the wrapped model expects
attack = foolbox.attacks.FGSM(fmodel, threshold=.9,              #7 FGSM attack aiming for a confident (90%) misclassification
                              criterion=ConfidentMisclassification(.9))
adversarial = attack(image, label)                               #8 Runs the attack
new_preds = kmodel.predict(np.expand_dims(adversarial, axis=0))  #9 Predictions for the adversarial image (table 10.2)
print('Predicted:', pp.pprint(decode_predictions(new_preds, top=20)[0]))
That's how easy it is to use these examples! Now you may be thinking, maybe it's just ResNet-50 that suffers from these examples. Well, we have some bad news for you. ResNet not only proved to be the hardest classifier to break as we were testing various code setups for this chapter, but also is an uncontested winner on DAWNBench in every ImageNet category (which is the most challenging task in the CV category on DAWNBench), as shown in figure 10.3.[6]
6 See “Image Classification on ImageNet,” at DAWNBench, https://dawn.cs.stanford.edu/benchmark/#imagenet.
Figure 10.3. DAWNBench is a great place to see the current state-of-the-art models and ResNet-50 dominance, at least as of early July 2019.

But the biggest problem with adversarial examples is their pervasiveness. Adversarial examples generalize beyond deep learning and transfer to different ML techniques. If we generate an adversarial example against one technique, there is a reasonable chance it will work even on another model we are trying to attack, as illustrated in figure 10.4.
Figure 10.4. The numbers here denote the percentage of adversarial examples crafted to fool the classifier in that row that also fooled that column’s classifier. The methods are deep neural networks (DNNs), logistic regression (LR), support-vector machine (SVM), decision trees (DT), nearest neighbors (kNN), and ensembles (Ens.).

(Source: “Transferability in Machine Learning: from Phenomena to Black-Box Attacks Using Adversarial Samples,” by Nicolas Papernot et al., 2016, https://arxiv.org/pdf/1605.07277.pdf.)
Worse yet, many of the adversarial examples are so easy to construct that we can just as easily fool the classifier with Gaussian noise that we can sample from np.random.normal. On the other hand, and to support our earlier point about ResNet-50 being a fairly robust architecture, we will show you that other architectures suffer from this issue much more.
Figure 10.5 shows the result of running ResNet-50 on pure Gaussian noise. However, we can use an adversarial attack on the noise itself to see how misclassified our image can get, rather quickly.
Figure 10.5. It is clear that we do not get a confident classification as a wrong class in most cases on just naively sampled noise, which is a point in ResNet-50's favor. On the left, we include the mean and variance we used so that you can see their impact.

In listing 10.4, we'll use a projected gradient descent (PGD) attack, illustrated in figure 10.6. Although this is still a simple attack, it warrants a high-level explanation. Unlike with the previous attacks, we are now taking a step regardless of where it may lead us, even to "invalid" pixel values, and then projecting back onto the feasible space. Now let's apply the PGD attack to our Gaussian noise in figure 10.7 and run ResNet-50 to see how we do.
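To make the "step, then project" idea concrete, here is a minimal NumPy sketch of one untargeted PGD iteration (our own illustration, not the foolbox internals; grad stands for the loss gradient with respect to the input, and the step size and budget are illustrative):

import numpy as np

def pgd_step(x, grad, x_orig, step_size=2.0, eps=8.0):
    x_new = x + step_size * np.sign(grad)               # take the step, wherever it may lead
    x_new = np.clip(x_new, x_orig - eps, x_orig + eps)  # project back into the eps-ball around the original image
    return np.clip(x_new, 0., 255.)                     # project back onto valid pixel values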
Figure 10.6. Projected gradient descent takes a step in the optimal direction, wherever it may be, and then uses projection to find the nearest equivalent point in the set of points. In this case, we are trying to ensure that we still end up with a valid picture: we take an example x(k) and take the optimal step to y(k + 1) to then project it to a valid set of images as x(k + 1).

Figure 10.7. When we run ResNet-50 on adversarial noise, we get a different story: most of the items are misclassified after applying a PGD attack—still a simple attack.

To demonstrate that most architectures are even worse, we'll look into Inception V3, an architecture that has earned fame in the CV community. Indeed, this network has been deemed so reliable that we touched on it in chapter 5. In figure 10.8, you can see that even something that gave birth to the inception score still fails on trivial examples. To dispel any doubts, Inception V3 is still one of the better pretrained networks out there and does have superhuman accuracy.
Figure 10.8. Inception V3 applied to Gaussian noise. Notice that we are not using any attacks; this noise is just sampled from the distribution.

Note
This was just regular Gaussian noise. You can see in the code for yourself that no adversarial step was applied. Sure, you could argue that the noise could have been preprocessed better. But even that is a massive adversarial weakness.
If you are anything like us, you are thinking, no way, I want to see for myself. Well, now we give you the code to reproduce those figures. Because the code for each is similar, we go through it only once and for next time promise DRYer code.
Note
For an explanation of don't repeat yourself (DRY) code, see Wikipedia at https://en.wikipedia.org/wiki/Don%27t_repeat_yourself.
Listing 10.4. Gaussian noise
fig = plt.figure(figsize=(20, 20))
sigma_list = list(max_vals.sigma)    #1 Grid of standard deviations and means to try (max_vals is defined earlier in the chapter)
mu_list = list(max_vals.mu)
conf_list = []

def make_subplot(x, y, z, new_row=False):    #2 Draws one noise sample and its prediction into the grid
    rand_noise = np.random.normal(loc=mu, scale=sigma,
                                  size=(224, 224, 3))    #3 Samples Gaussian noise with the current mu and sigma
    rand_noise = np.clip(rand_noise, 0, 255.)            #4 Clips to valid pixel values
    noise_preds = kmodel.predict(np.expand_dims(rand_noise, axis=0))        #5 Classifies the noise image
    prediction, num = decode_predictions(noise_preds, top=20)[0][0][1:3]    #6 Top class name and its confidence
    num = round(num * 100, 2)
    conf_list.append(num)
    ax = fig.add_subplot(x, y, z)    #7 Annotates the subplot with the prediction and confidence
    ax.annotate(prediction, xy=(0.1, 0.6), xycoords=ax.transAxes,
                fontsize=16, color='yellow')
    ax.annotate(f'{num}%', xy=(0.1, 0.4), xycoords=ax.transAxes,
                fontsize=20, color='orange')
    if new_row:
        ax.annotate(f'$\mu$:{mu}, $\sigma$:{sigma}', xy=(-.2, 0.8),
                    xycoords=ax.transAxes, rotation=90,
                    fontsize=16, color='black')
    ax.imshow(rand_noise / 255)    #8 Shows the noise image itself
    ax.axis('off')

for i in range(1, 101):    #9 Fills a 10 x 10 grid, moving to new mu and sigma every row
    if (i - 1) % 10 == 0:
        mu = mu_list.pop(0)
        sigma = sigma_list.pop(0)
        make_subplot(10, 10, i, new_row=True)
    else:
        make_subplot(10, 10, i)

plt.show()
Some people now start to worry about the security implications of adversarial examples. However, it is important to keep this in the meaningful perspective of a hypothetical attacker. If the attacker can change every pixel slightly, why not change the whole image?[7] Why not just feed in another one that is completely different? Why does the passed-in example have to be imperceptibly, rather than visibly, different?
7 See “Motivating the Rules of the Game for Adversarial Example Research,” by Justin Gilmer et al., 2018, http://arxiv.org/abs/1807.06732.
Some people give the example of self-driving cars and adversarially perturbing stop signs. But if we can do that, why wouldn't the attackers completely spray-paint over the stop signs or simply physically obscure the stop sign with a high speed-limit sign for a little while? Because these "traditional attacks," unlike adversarial examples, will work 100% of the time, whereas an adversarial attack works only when it transfers well and manages not to get distorted by the preprocessing.
This does not mean that when you have a mission-critical ML application, you can just ignore this problem. However, in most cases, adversarial attacks require far more effort than more commonplace vectors of attack, so bearing that in mind is worthwhile.
Yet, as with most security implications, adversarial attacks also have adversarial defenses that attempt to defend against the many types of attacks. The attacks covered in this chapter have been some of the easier ones, but even simpler ones exist, such as drawing a single line through MNIST. Even that is sufficient to fool most classifiers.
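That "single line" attack really is as crude as it sounds. A hypothetical helper like the following, which just overwrites one bright row of a 28 × 28 MNIST digit, is the entire perturbation:

import numpy as np

def draw_line(digit: np.ndarray) -> np.ndarray:
    # Returns a copy of a 28x28 MNIST digit with one bright horizontal
    # line through the middle: a trivial change that is nevertheless
    # enough to fool many classifiers.
    out = digit.copy()
    out[14, :] = 255
    return out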
Adversarial defenses are an ever-evolving game, in which many good defenses are available against some types of attacks, but not all. The turnaround can be so quick that just three days after the submission deadline for ICLR 2018, seven of the eight proposed and examined defenses were broken.[8]
8 ICLR is the International Conference on Learning Representations, one of the smaller but excellent machine learning conferences. See Anish Athalye on Twitter in 2018, http://mng.bz/ad77. It should be noted that there were three more defenses unexamined by the author.
To make the connection with GANs even clearer, imagine one system generating adversarial examples, and another one saying how good that example is, depending on whether the example managed to fool the system or not. Doesn't that remind you of a Generator (adversary) and a Discriminator (classification algorithm)? These two algorithms are again competing: the adversary is trying to fool the classifier with slight perturbations of the image, and the classifier is trying not to get fooled. Indeed, one way to think of GANs is almost as ML-in-the-loop adversarial examples that eventually come up with images.
On the other hand, you can think of iterated adversarial attacks as if you took a GAN and, rather than specifying that the objective is to generate the most realistic examples, you specified that the objective is to generate examples that will fool the classifier. Of course, you have to always remember that important differences exist, and typically you have a fixed classifier in deployed systems. But that does not preclude us from using this idea in adversarial training, in which some implementations even include repeated retraining of the classifier based on the adversarial examples that fooled it. These techniques are then moving closer to a typical GAN setup.
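As a rough sketch of that retraining idea (our own illustration, not code from this chapter; attack can be any per-image attack callable, such as the FGSM attack object from listing 10.3):

import numpy as np

def adversarial_training_step(kmodel, attack, images, labels, labels_onehot):
    # Craft an adversarial version of each training image, then train on
    # the perturbed images with the original labels, so the classifier
    # learns to resist exactly the perturbations that fooled it.
    adv_images = np.stack([attack(x, y) for x, y in zip(images, labels)])
    return kmodel.train_on_batch(adv_images, labels_onehot)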
To give you an example, let's take a look at one technique that has held its ground for a while as a viable defense. In the Robust Manifold Defense, we take the following steps to defend against adversarial examples (a code sketch follows the list):[9]
9 See “The Robust Manifold Defense: Adversarial Training Using Generative Models,” by Ajil Jalal et al., 2019, https://arxiv.org/pdf/1712.09196.pdf.
- We take an image x (adversarial or regular) and
- Project it back to the latent space z.
- Use the generator G to generate a similar example to x, called x*, by G(z).
- We use the classifier C to classify this example, C(x*), which generally already tends to misclassify way less than running the classification directly on x.
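Here is a minimal sketch of those four steps, under simplifying assumptions: G and C are plain callables, and a dependency-free finite-difference gradient stands in for the autodiff you would use in practice:

import numpy as np

def project_to_manifold(x, G, z_dim=100, steps=200, lr=0.05, eps=1e-3):
    # Steps 1-2: find the latent z whose generated image G(z) is closest to x.
    z = np.random.randn(z_dim)
    loss = lambda z_: np.sum((G(z_) - x) ** 2)
    for _ in range(steps):
        grad = np.zeros(z_dim)
        for i in range(z_dim):  # finite-difference gradient of the reconstruction loss
            dz = np.zeros(z_dim)
            dz[i] = eps
            grad[i] = (loss(z + dz) - loss(z - dz)) / (2 * eps)
        z -= lr * grad
    return G(z)  # Step 3: x* = G(z), the on-manifold reconstruction of x

def robust_classify(x, G, C):
    x_star = project_to_manifold(x, G)
    return C(x_star)  # Step 4: classify the reconstruction instead of the raw input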
However, the authors of this defense found that there are still some ambiguous cases in which the classifier does get fooled by minor perturbations. Still, we encourage you to check out their paper, as these cases tend to be unclear to humans as well, which is a sign of a robust model. To fix this, we apply adversarial training on the manifold: we get some of these adversarial cases into the training set so the classifier learns to distinguish those from the real training data.
This paper demonstrates that using GANs can give us classifiers that do not completely break down after minor perturbations, even against some of the most sophisticated methods. Performance of the downstream classifier does hurt, as with most of these defenses, because our classifier now has to be trained to implicitly deal with these adversarial cases. But even despite this setback, it is not a universal defense.
Adversarial training, of course, has some interesting applications. For example, for a while, the best results (state of the art) in semi-supervised learning were achieved by using adversarial training.[10] This was subsequently challenged by GANs (remember chapter 7?) and other approaches, but that does not mean that by the time you are reading these lines, adversarial training will not be the state of the art again.
10 See “Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning,” by Takeru Miyato et al., 2018, https://arxiv.org/pdf/1704.03976.pdf.
Hopefully, this gives you another reason to study GANs and adversarial examples, partially because in mission-critical classification tasks, GANs may be the best defense going forward, or because of other applications beyond the scope of this book.[11] That is best left for a hypothetical Adversarial Examples in Action.
11 This was a hotly debated topic at ICLR 2019. Though most of these conversations were informal, using (pseudo-)invertible generative models as a way to classify the "out-of-sample"-ness of an image seems like a fruitful avenue.
To sum up, we have laid out the notion of adversarial examples and made the connection to GANs even more specific. This is an underappreciated connection, but one that can solidify your understanding of this challenging subject. Furthermore, one of the defenses against adversarial examples is GANs themselves![12] So GANs also have the potential to solve this bug that likely led to their existence in the first place.
12 See Jalal et al., 2019, https://arxiv.org/pdf/1712.09196.pdf.
Adversarial examples are an important field, because even commercial computer vision products have suffered from this shortcoming and can still be easily fooled by academics.[13] Beyond security and machine learning explainability applications, many practical uses remain in fairness and robustness.
13 See “Black-Box Adversarial Attacks with Limited Queries and Information,” by Andrew Ilyas et al., 2018, https://arxiv.org/abs/1804.08598.
Furthermore, adversarial examples are an excellent way of solidifying your own understanding of deep learning and GANs. Adversarial examples take advantage of the difficulty of training classifiers in general and the relative ease of fooling the classifier in one particular case. The classifier has to make predictions for many images, and crafting a special offset to fool the classifier exactly right is easy because of the many degrees of freedom. As a result, we can easily get adversarial noise that completely changes the label of a picture without changing the image perceptibly.
Adversarial examples can be found in many domains and many areas of AI, not just deep learning or computer vision. But as you saw in this book, creating the ones in computer vision is not challenging. Defenses against these examples exist, and you saw one using GANs, but adversarial examples are far from being solved completely.
- Adversarial examples, which come from abusing the dimensionality of the problem space, are an important aspect of machine learning because they show us why GANs work and why some classifiers can be easily broken.
- We can easily generate our own adversarial examples with real images and noise.
- Few meaningful attack vectors can be used with adversarial examples.
- Applications of adversarial examples include cybersecurity and machine learning fairness, and we can defend against them by using GANs.