7 Machine Learning

In this chapter:

  • Training a machine learning model
  • Using Azure Machine Learning
  • DevOps for machine learning
  • Orchestrating machine learning pipelines

This chapter focuses on the final major workload of a data platform: machine learning. Machine learning is becoming increasingly important as more and more scenarios are supported by artificial intelligence. We will talk about running machine learning in production, reliably and at scale. Figure 7.1 highlights our current focus area.

Figure 7.1 Running machine learning at scale is the other major workload any data platform needs to support, besides data processing and analytics.

We’ll start with a machine learning model that a data scientist might develop on their laptop. This is a model that predicts whether a user is going to be a high spender or not, based on their web telemetry. The model is very simple, as the main focus is not its implementation but rather how we can take it and run it in the cloud.

The next section introduces Azure Machine Learning (AML), an Azure service for running ML workloads. We’ll spin up an instance, configure it, then take our model and run it in this environment. We’ll talk about the benefits of using Azure Machine Learning for training models.

Next, we’ll implement DevOps for this workload, like we did for all other components of our platform. We’ll see how we can track everything in Git and deploy using Azure DevOps Pipelines. Machine learning combined with DevOps is also known as MLOps.

Finally, we’ll touch on how we can orchestrate ML runs using our existing orchestration solution, Azure Data Factory. We’ll build a pipeline that covers the 3 main steps: copy the input data, run an Azure Machine Learning workload, then copy the output data.

Let’s get started with our high spender model.

7.1      Training a machine learning model

This model will predict whether a user is likely to be a “high spender” based on the number of sessions and page views on our website. A session is a website visit in which the user views one or more pages. Let’s assume that the amount of money a user spends on our products is correlated to their number of sessions and page views. We’ll consider a user a high spender if they spend $30 or more.
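The labeling rule is easy to state in code; here is a minimal sketch (the $30 threshold comes straight from the text):

```python
def is_high_spender(total_spend):
    """Labeling rule: a user who spends $30 or more is a high spender."""
    return "Yes" if total_spend >= 30 else "No"

# Spot-check a few spend amounts from the sample data
print([is_high_spender(s) for s in (100, 30, 10, 0)])  # → ['Yes', 'Yes', 'No', 'No']
```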

The table shows our input data: the ID of the user, the number of sessions, the number of page views, the amount of dollars spent, and whether we consider the user a high spender.

UserId | Sessions | PageViews | TotalSpend | HighSpender
-------|----------|-----------|------------|------------
1      | 10       | 45        | 100        | Yes
2      | 5        | 10        | 30         | Yes
3      | 1        | 5         | 10         | No
4      | 2        | 2         | 0          | No
5      | 9        | 33        | 95         | Yes
6      | 7        | 5         | 5          | No
7      | 19       | 31        | 95         | Yes
8      | 1        | 20        | 0          | No
9      | 2        | 17        | 0          | No
10     | 8        | 25        | 40         | Yes

Listing 7.1 shows the input CSV file corresponding to the table, which we’ll use for training.

Listing 7.1 input.csv
UserId,Sessions,PageViews,TotalSpend,HighSpender
1,10,45,100,Yes
2,5,10,30,Yes
3,1,5,10,No
4,2,2,0,No
5,9,33,95,Yes
6,7,5,5,No
7,19,31,95,Yes
8,1,20,0,No
9,2,17,0,No
10,8,25,40,Yes

You will have to create this file on your machine as input.csv (or grab it from the book’s Git repository).

We are working with simple input and a simple model, since our focus is taking a model and putting it in “production”, not model development itself. There are plenty of great books covering model development and machine learning if you are interested in the topic.

Assuming you already have Python on your machine, let’s start by installing the 2 packages we need for our model, Pandas and Scikit-learn. Listing 7.2 shows the command to install the 2 packages using the Python package manager. If you don’t have Python, you can install it from https://www.python.org/downloads/.

Listing 7.2 Installing pandas and scikit-learn
pip install pandas scikit-learn

Now that we have our input file and the packages we need, let’s look at the high spender model itself.

Don’t worry if you haven’t implemented an ML model before, our model has only a few lines of code and is very basic. We’ll walk through the steps, which should give you at least a high-level understanding.

If you do have experience with ML, feel free to skip down to section 7.1.2, High spender model implementation.

7.1.1   Training a model using Scikit-learn

The model will take an --input <file> argument representing the input CSV. It will read this file into a Pandas DataFrame. A DataFrame is a fancy table data structure offered by the Pandas library, which offers various useful ways to slice and dice the data.

We’ll split the data into the features used to train the model (X) and what we are trying to predict (y). In our case, we will take the Sessions and PageViews columns from the input as X, and the HighSpender column as y. The model doesn’t care about the user ID and doesn’t care about the exact amount spent, so we ignore those columns.

We will split our input data so we take 80% of it to train the model and use the remaining 20% to test our model. For the 20%, we will use the model to predict whether the user is a high spender or not and see how our prediction compares with the actual data. This is a common practice for measuring model prediction accuracy.
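The 80/20 split can be pictured with a small stdlib sketch (Scikit-learn’s train_test_split does this for us, with more options such as stratification):

```python
import random

def split(rows, test_fraction=0.2, seed=1):
    """Shuffle rows, then split them into train and test subsets."""
    rows = rows[:]                      # copy so we don't mutate the caller's list
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split reproducible
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

train, test = split(list(range(10)))
print(len(train), len(test))  # → 8 2
```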

We will use the KNeighborsClassifier from Scikit-learn. This implements a well-known classification algorithm, the k-nearest neighbors vote. We are using a classification algorithm since we are trying to classify our users into high spenders and non-high spenders. We won’t cover the details of the algorithm here, but the good news is that it is fully encapsulated in the Scikit-learn library, so we can create it with one line of code and train it with a second line of code.
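To build some intuition for what k-nearest neighbors does, here is a minimal, self-contained sketch of the idea: classify a point by majority vote among its k closest training points. This is for illustration only; Scikit-learn’s implementation is far more optimized (and defaults to k=5):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, point, k=3):
    """Classify a point by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point
    distances = sorted(
        (math.dist(row, point), label) for row, label in zip(train_X, train_y))
    # Vote among the labels of the k closest points
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# (Sessions, PageViews) pairs and labels taken from our input data
train_X = [(10, 45), (5, 10), (1, 5), (2, 2), (9, 33), (7, 5)]
train_y = ["Yes", "Yes", "No", "No", "Yes", "No"]

print(knn_predict(train_X, train_y, (8, 30)))  # → Yes (close to the high spenders)
```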

We will use the training data to train the model, then try to predict on the test data, printing the predictions.

Figure 7.2 shows these steps.

Figure 7.2 Steps for training our model: 1. Extract features and target value from the input; 2. Split into train and test data; 3. Train the model on the training data; 4. Use the model to predict on the test data, compare predictions with actuals.

Finally, we will save the model on disk as outputs/highspender.pkl. The idea is that once we have a trained model, another system will pick it up and use it to predict on new data. For example, as users visit our website, we will predict who is likely to be a high spender and maybe offer them a discount. Or maybe we want to encourage non-high spenders to spend more time on the website, hoping it will convert them to high spenders. Either way, some other service will have to load this model and feed it never before seen data, and the model will predict whether the user is likely to be a high spender or not.
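The save/load handoff can be illustrated with a small stdlib sketch. Note this uses pickle and a stand-in model class purely for illustration; the actual script uses joblib’s dump, and a real consuming service would load the trained Scikit-learn object instead:

```python
import os
import pickle
import tempfile

class StubModel:
    """Stand-in for a trained model, just to illustrate the save/load round trip."""
    def predict(self, rows):
        # Hypothetical rule echoing our data: many sessions ~ high spender
        return ["Yes" if sessions >= 8 else "No" for sessions, _ in rows]

path = os.path.join(tempfile.mkdtemp(), "highspender.pkl")
with open(path, "wb") as f:
    pickle.dump(StubModel(), f)   # conceptually what dump(knn, model_path) does

with open(path, "rb") as f:
    model = pickle.load(f)        # what a downstream service would do

print(model.predict([(10, 45), (2, 2)]))  # → ['Yes', 'No']
```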

7.1.2   High spender model implementation

Training a model might sound like a lot, but it is only 25 lines of Python code, as shown in listing 7.3.

Listing 7.3 highspenders.py
import argparse
from joblib import dump
import os
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.model_selection import train_test_split
 
parser = argparse.ArgumentParser()    #A
parser.add_argument('--input', type=str, dest='model_input')    #A
 
args = parser.parse_args()
model_input = args.model_input    #B
df = pd.read_csv(model_input)    #B
 
X = df[["Sessions", "PageViews"]]    #C
y = df["HighSpender"]    #D
 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)    #E
 
knn = KNeighborsClassifier()    #F
knn.fit(X_train, y_train)    #G
 
score = knn.predict(X_test)    #H
 
predictions = X_test.copy(deep=True)    #I
predictions["Prediction"] = score    #I
predictions["Actual"] = y_test    #I
 
print(predictions)    #J
 
if not os.path.isdir('outputs'):    #K
    os.mkdir('outputs')
 
model_path = os.path.join('outputs', 'highspender.pkl')    #L 
dump(knn, model_path)    #L

Let’s run the script and check out the output. Listing 7.4 shows the console command for running the model.

Listing 7.4 Run the High Spender model script
python highspenders.py --input input.csv

You should see the test predictions and actuals printed to the console. You should also now see the outputs/highspender.pkl model file.

Strictly speaking, we didn’t need the prediction and printing part, but it should help if we want to play with the model. Obviously, we’re using a very small input size. The larger the input dataset, the better the accuracy, but again, our focus here is taking this Python script and running it in the cloud using DevOps. The good news is that our approach to DevOps (or MLOps) will scale to more complex models and larger inputs.
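If you want a single accuracy number instead of eyeballing the printed predictions, comparing predictions against actuals is simple arithmetic. A minimal sketch (Scikit-learn’s accuracy_score computes the same thing):

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual labels."""
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(actuals)

# Example with a hypothetical 20% test split (2 of our 10 rows)
print(accuracy(["Yes", "No"], ["Yes", "Yes"]))  # → 0.5
```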

Let’s start by introducing Azure Machine Learning, the Azure platform-as-a-service offering for running ML in the cloud.

7.2      Introducing Azure Machine Learning

Azure Machine Learning (AML) is Microsoft’s Azure offering for creating and managing machine learning solutions in the cloud. An instance of Azure Machine Learning is called a workspace.

workspace

A workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning.

In this section, we’ll create and configure a workspace, then we’ll look at everything needed for taking our high spender model from our local machine and running it on Azure. We will also learn about the Azure Machine Learning SDK. Azure Machine Learning provides an SDK for setting up various resources within a workspace.

Since the main languages used in machine learning are Python and R, Azure Machine Learning provides rich Python and R SDKs for better integration with solutions written in those languages.

7.2.1   Creating a workspace

We’ll start by using Azure CLI to create a workspace. First, we’ll install the azure-cli-ml extension, then we’ll create a new resource group called aml-rg to host our ML workloads, and, finally, we’ll create a workspace in the new resource group. Listing 7.5 shows the steps.

Listing 7.5 Creating an Azure Machine Learning workspace
az extension add -n azure-cli-ml    #A
 
az group create `    #B
--location "Central US" `
--name aml-rg
 
az ml workspace create `    #C
--workspace-name aml `    #D
--location "Central US" `    #E
--resource-group aml-rg    #F

The same way Azure Data Explorer has a web UI accessible at https://dataexplorer.azure.com (as we saw in chapter 2) and Azure Data Factory has a web UI accessible at https://adf.azure.com (as we saw in chapter 4), Azure Machine Learning also has a web UI, which you can find at https://ml.azure.com.

We will again stick to Azure CLI and the Python SDK to provision resources, but I encourage you to try out the web UI. Especially as we create more artifacts during this section, you can use the web UI to see how they are represented there.

If you visit the web UI, you will see a navigation bar with 3 sections: Author, Assets, and Manage. Figure 7.3 shows the navigation bar.

Figure 7.3 The Azure Machine Learning UI navigation bar has 3 sections: Author, Assets, and Manage.

The Author section has Notebooks, Automated ML, and Designer. We won’t focus on these, but here is a quick walkthrough: Notebooks enables users to store Jupyter notebooks and other files directly in the workspace; Automated ML is a codeless solution for implementing ML; the Designer is a visual, drag & drop editor for ML. We won’t be focusing on these features since they facilitate model development. We’re looking at the DevOps aspects of machine learning, using our existing Python model as an example, so this is less relevant for us. Of course, we could’ve built our model in Azure Machine Learning directly, but this way we learn how we can onboard a model that wasn’t created specifically to run on Azure Machine Learning.

We will touch on most of the items in the Assets and Manage sections. Assets are some of the concepts Azure Machine Learning deals with, like Experiments and Models. We’ll cover these soon. The Manage section deals with the compute and storage resources for AML. Let’s zoom in on these.

7.2.2   Azure Machine Learning compute

One of the great features of Azure Machine Learning is that it can automatically scale compute to train models. Remember, “compute” in the cloud means CPU and RAM resources. A virtual machine in the cloud provides you CPU and RAM, but it incurs costs as long as it is running. This is especially relevant for machine learning workloads, which might need a lot of resources during training, but training might not happen continuously.

For example, maybe our high spender model needs to be trained every month to predict next month’s marketing campaign targets. It would be wasteful to keep a VM running all the time if we only need it one day of the month. Of course, we could manually turn it on or off, but Azure Machine Learning gives us an even better option: compute targets.

compute target

A compute target specifies a compute resource on which we want to run our ML. This includes the maximum number of nodes and the VM size.

As a reminder, Azure has a set of defined VM sizes, each with different performance characteristics and associated costs[1]. A compute target specifies which VM type we need and how many instances, but it won’t provision the resources until we run a model and request this target. Once the model run is finished, the resources are deprovisioned.

This makes Azure Machine Learning compute elastic: resources are allocated when needed, then freed up automatically. We only pay for what we use, and the service takes care of all the underlying infrastructure.
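To see why this elasticity matters, some rough arithmetic helps. The hourly rate below is an assumed illustrative number, not a quoted Azure price:

```python
# Assumed hourly rate for a small training VM (hypothetical; check Azure pricing)
hourly_rate = 0.06      # dollars per hour
hours_in_month = 730    # roughly 24 * 365 / 12

always_on = hourly_rate * hours_in_month   # VM left running the whole month
on_demand = hourly_rate * 2                # ~2 hours of training, once a month

print(f"always on: ${always_on:.2f}/month, on demand: ${on_demand:.2f}/month")
```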

Let’s specify a compute target for our example. We’ll request at most 1 node, use the cheap STANDARD_D1_V2 VM size (1 CPU, 3.5 GiB memory), and name it d1compute. Listing 7.6 shows the corresponding Azure CLI command.

Listing 7.6 Creating a compute target
az ml computetarget create `    #A
amlcompute `    #B
--max-nodes 1 `    #C
--name "d1compute" `    #D
--vm-size STANDARD_D1_V2 `    #E
--workspace-name aml    `    #F
--resource-group aml-rg    #F

This won’t cost us anything until we actually run an ML workload. If you click through the UI to the Compute section and navigate to Compute clusters, you should see the new definition.

Other compute options in Azure Machine Learning are compute instances, which are VMs pre-imaged with common ML tools and libraries; inference clusters, where we can package and deploy models on Kubernetes and expose them as REST endpoints; and attached compute, which enables us to target compute resources not managed by Azure Machine Learning, like Databricks.

Let’s move on to storage and see how we can make our input available to Azure Machine Learning.

7.2.3   Azure Machine Learning storage

We’ll start by uploading our input.csv file from the previous section to the Azure Data Lake Storage we provisioned in chapter 2. We created an "adls$suffix" Data Lake (where $suffix is your unique ID) containing one file system named fs1.

We’ll use the Azure CLI upload command to upload our input file under the models/highspenders/input.csv path. Listing 7.7 shows the command.

Listing 7.7 Uploading input.csv to Azure
az storage fs file upload `
--file-system fs1 `
--path "models/highspenders/input.csv" `
--source input.csv `
--account-name "adls$suffix"

In practice, we would have various Azure Data Factory pipelines copying datasets to our storage layer. From there, we need to make these datasets available to Azure Machine Learning. We’ll do this by attaching a datastore.

datastore

A datastore in Azure Machine Learning enables us to connect an external storage account like Blob Storage, Data Lake, SQL, Databricks etc., making it available to our ML models.

First, we need to provision a service principal Azure Machine Learning can use to authenticate. We will create a new service principal in Azure Active Directory and grant it “Storage Blob Data Contributor” rights on the Data Lake. This will allow the service principal to read and write data in the Data Lake. Listing 7.8 shows the steps.

Listing 7.8 Creating a service principal for ADLS
$sp = az ad sp create-for-rbac | ConvertFrom-Json    #A
$acc = az storage account show --name "adls$suffix" | ConvertFrom-Json    #B
 
az role assignment create `    #C
--role "Storage Blob Data Contributor" `    #D
--assignee $sp.appId `    #E
--scope $acc.id    #F

Rkb rcevsei aipcplrin sns xnw ssacce rbcz nj ryk gtoaser atoncuc. Rvu nork vcrg aj rx aatcth dkr ccoaunt rx Bstob Wehnaic Fgirnnea, iingvg Tboct Wehcani Pnrnaige rvg recsvie plnciapri JG ncy cerest cx jr zzn vga qrxm er oennctc rk qxr canutco. Vngiist 7.9 sowsh wqv xr xb jrda.

Listing 7.9 Attaching a datastore to Azure Machine Learning
az ml datastore attach-adls-gen2 `    #A
--account-name "adls$suffix" `    #B
--client-id $sp.appId `    #C
--client-secret $sp.password `    #C
--tenant-id $sp.tenant `    #C
--file-system fs1 `    #D
--name MLData `    #E
--workspace-name aml `    #F
--resource-group aml-rg    #F

If you navigate to the Storage section in the UI, you should see the newly created MLData datastore. In fact, you should see a couple more datastores, which are created by default with the workspace. These are used within the workspace. In practice, we do need to connect to external storage, and datastores are the way of doing it.

We now configured our workspace with both a compute target and an attached datastore. Let’s grant our service principal Contributor rights on the Azure Machine Learning workspace too, so we can use it for deployment. Note that in a production environment we would have separate service principals for better security: if one of the principals gets compromised, it has access to fewer resources. We’ll reuse our $sp service principal though, to keep things brief. Listing 7.10 shows how to grant the rights.

Listing 7.10 Granting Contributor rights on Azure Machine Learning
$aml = az ml workspace show `    #A
--workspace-name aml `    #B
--resource-group aml-rg `    #B
| ConvertFrom-Json    #C
 
az role assignment create `    #D
--role "Contributor" `    #E
--assignee $sp.appId `    #F
--scope $aml.id    #G

Mk’ff kzsf rstoe kbr ircesve nppacilir’z sdsrwpoa jn cn moevnnrenti lvbeaiar zx xw sna xuct jr thotwui vahgni vr debme rj jern rpv xskp. Vtgnisi 7.11 wsohs vuw re ora ns treoeivmnnn rebiaval nj c EtwxxSkfuf ssineso. Bzdj nwk’r vyr psitdeser csoars esisnsos, zx epesal zmvk c vron vl $sp.password.

Listing 7.11 Storing password in an environment variable
$env:SP_PASSWORD = $sp.password

The name “password” is a bit misleading; this is an auto-generated client secret, which was created when we ran az ad sp create-for-rbac (which stands for Azure Active Directory service principal create for role-based access control).
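One caveat: os.environ.get('SP_PASSWORD') silently returns None when the variable is missing, which only surfaces later as a confusing authentication failure. A fail-fast helper is a cheap safeguard; this is a sketch, and the helper name is ours, not part of any SDK:

```python
import os

def get_required_env(name):
    """Return the environment variable's value, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; "
                           "run $env:SP_PASSWORD = $sp.password first")
    return value

# Usage in pipeline.py would then look like:
# sp_password = get_required_env('SP_PASSWORD')
```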

We are all set; the next step is to publish our Python code and run it in the cloud.

7.2.4   Running ML in the cloud

We will use the Python Azure Machine Learning SDK for this, so the first step is to install it using the Python package manager. First, make sure pip is up to date. If there is a newer pip version, you should see a message printed to the console suggesting you upgrade whenever you run a pip command. You can update it by running python -m pip install --upgrade pip as an administrator.

Once pip is up to date, install the Azure Machine Learning SDK by running the command in listing 7.12.

Listing 7.12 Installing the Azure ML Python SDK
pip install azureml-sdk

Let’s now write a Python script to publish our original ML model to the cloud, with all the required configuration. We’ll call this pipeline.py. Instead of showing the whole script, let’s do it step-by-step in the following listings, explaining things as we go. Do keep in mind that the following listings would all appear one after the other in pipeline.py.

First, listing 7.13 shows the imports we need and additional parameters.

Listing 7.13 Imports and parameters
from azureml.core import Workspace, Datastore, Dataset, Model
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.compute import AmlCompute
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration    #A
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps.python_script_step import PythonScriptStep
import os
 
tenant_id = '<your tenant ID>'    #B 
subscription_id = '<your Azure subscription GUID>'    #C
service_principal_id = '<your service principal ID>'    #D
resource_group  = 'aml-rg'    #E
workspace_name  = 'aml'    #F

Next, we will connect to the workspace using the service principal and get the datastore (MLData) and compute target (d1compute) needed by our model. Listing 7.14 shows the steps.

Listing 7.14 Connecting to the workspace, getting the datastore and compute target
...
 
# Auth
auth = ServicePrincipalAuthentication(    #A
    tenant_id,
    service_principal_id,
    os.environ.get('SP_PASSWORD'))    #B
 
# Workspace
workspace = Workspace(
    subscription_id = subscription_id,    #C
    resource_group = resource_group,    #C
    workspace_name = workspace_name,    #C
    auth=auth)    #C
 
# Datastore
datastore = Datastore.get(workspace, 'MLData')    #D
 
# Compute target
compute_target = AmlCompute(workspace, 'd1compute')    #E

We need these to set up our deployment: the datastore is where we have our input, while the compute target is where the model will train. Listing 7.15 shows how we can specify the model input.

Listing 7.15 Specifying model input
...
 
# Input
model_input = Dataset.File.from_files([(datastore, '/models/highspenders/input.csv')]).as_mount()    #A

The from_files() method takes a list of files. Each element of the list is a tuple consisting of a datastore and a path. The as_mount() call ensures the file is mounted and made available to the compute which will be training the model.

dataset

A dataset in Azure Machine Learning is a reference to a data source location, along with a copy of its metadata. This allows models to seamlessly access data during training.

Next, we’ll specify the Python packages required by our model, from which we will initialize a run configuration. If you remember from the previous section, we used pandas and scikit-learn. We’ll also need azureml-core and azureml-dataprep, required by the runtime.

Listing 7.16 shows how we can create the run configuration.

Listing 7.16 Creating the run configuration
...
 
# Python package configuration 
conda_deps = CondaDependencies.create(pip_packages=['pandas', 'scikit-learn', 'azureml-core', 'azureml-dataprep'])    #A
run_config = RunConfiguration(conda_dependencies=conda_deps)    #B

“Conda” stands for Anaconda, a Python and R open source distribution of common data science packages. Anaconda simplifies package management and dependencies, and it is commonly used in data science projects as it provides a stable environment for this type of workload. Azure Machine Learning is also using it “under the hood”.

Next, let’s create a step for training our model. In our case, this is a PythonScriptStep, a step which executes Python code. We’ll provide the name of the script (from our previous section), the command line arguments it takes, the inputs, the run configuration, and the compute target. Listing 7.17 shows the details.

Listing 7.17 Defining a model training step
...
 
# Train step
trainStep = PythonScriptStep(
    script_name='highspenders.py',    #A
    arguments=['--input', model_input],    #B
    inputs=[model_input],    #C
    runconfig=run_config,    #D
    compute_target=compute_target)    #E

We can chain multiple steps together, but we only need one in our case. One or more steps form a machine learning pipeline.

pipelines

Azure Machine Learning pipelines simplify building machine learning workflows, including data preparation, training, validation, scoring, and deployment.

Pipelines are an important concept in Azure Machine Learning; they capture all the information needed to run an ML workflow. Listing 7.18 shows how we can create and submit a pipeline to our workspace.

Listing 7.18 Creating and submitting a pipeline
...
 
# Submit pipeline
pipeline = Pipeline(workspace=workspace, steps=[trainStep])    #A
published_pipeline = pipeline.publish(    #B
    name='HighSpenders',    #C
    description='High spenders model',    #C
    continue_on_step_failure=False)    #D
 
open('highspenders.id', 'w').write(published_pipeline.id)    #E

We’re saving the GUID of the published pipeline into the highspenders.id file. Our pipeline automation is complete.

We’re almost ready. Before calling this script to create the pipeline, let’s make one small addition to our high spender model. While we could do all of the above steps without touching our original model code, this final step we will add to the model code itself.

Remember that once the model is trained, we save it to disk as outputs/highspender.pkl. We’ll make one Azure Machine Learning-specific addition: taking the trained model and storing it in the workspace. Add the lines in listing 7.19 to highspenders.py. Important: make sure you add this to highspenders.py (the model code), not to pipeline.py, the pipeline automation we just put together.

Listing 7.19 Uploading the trained model to the Azure Machine Learning workspace
...
 
# Register model
from azureml.core import Model
from azureml.core.run import Run
 
run = Run.get_context()    #A
workspace = run.experiment.workspace    #B
model = Model.register(    #C
    workspace=workspace,    #D
    model_name='highspender',    #E
    model_path=model_path)    #F

Note the call to Run.get_context() and how we use this context to retrieve the workspace. In pipeline.py, we provided the subscription ID, resource group, and workspace name. That is how we can get a workspace from outside Azure Machine Learning. In this case though, the code will be running in Azure Machine Learning, as part of our pipeline. This gives us additional context which we can use to retrieve the workspace at runtime.

Every run of a pipeline in Azure Machine Learning is called an experiment.

experiment

An experiment in Azure Machine Learning represents one execution of a pipeline. Whenever we re-run a pipeline, we have a new experiment.

We are all set! Let’s run the pipeline.py script to publish our pipeline to the workspace (listing 7.20).

Listing 7.20 Publishing the pipeline
python pipeline.py

The GUID matters because if we re-run the script, it will register another pipeline with the same name but a different GUID. Azure Machine Learning does not update pipelines in place. We have the option to disable pipelines, so they don’t clutter the workspace, but not to update them.

Let’s kick off the pipeline using Azure CLI. Listing 7.21 shows how to do this.

Listing 7.21 Running a pipeline
$pipelineId = Get-Content -Path highspenders.id    #A
 
az ml run submit-pipeline `    #B
--pipeline-id $pipelineId `    #C
--workspace-name aml `    #D
--resource-group aml-rg    #E

Check the UI at https://ml.azure.com. You should be able to see the pipeline under the Pipelines section, the run we just kicked off under the Experiments section, and, once the model is trained, the model output under the Models section.

We accomplished quite a lot in this section. Let’s pause for a quick recap before moving on.

7.2.5   Azure Machine Learning recap

We started with provisioning a workspace, which is the top-level container for all Azure Machine Learning-related artifacts.

Next, we created a compute target, which specifies the type of compute our model will run on. We can define as many compute targets as needed: some models require more resources than others, some require GPUs, etc. Azure provides many types of VM images suited to all these workloads. A main advantage of using compute targets in Azure Machine Learning is that compute is provisioned on-demand when we are running a pipeline. Once the pipeline is done running, compute gets deprovisioned. This allows us to scale elastically and only pay for what we need.

We then attached a datastore. Datastores are an abstraction over existing storage services, and they allow Azure Machine Learning to connect to these and read data. The main advantage of using datastores is that they abstract away access control, so data scientists don’t need to worry about authenticating against the storage service.

With the infrastructure in place, we proceeded to set up a pipeline for our model. A pipeline specifies all the requirements and steps our execution needs to take. There are many pipelines in Azure: Azure DevOps Pipelines are focused on DevOps, provisioning resources, and in general providing automation around Git; Azure Data Factory pipelines are focused on ETL, data movement, and orchestration; Azure Machine Learning pipelines are meant for machine learning workflows, where we set up the environment then execute a set of steps to train, validate, and publish a model.

The pipeline included a dataset (our input), a compute target, a set of Python package dependencies, a run configuration, and a step to run a Python script.

Mx cfxc daenehcn pkt gronliai elodm vepz kr psihulb rdx ldmoe jn CWE. Bzdj aktes yvr serutl lv xht iratngin nty cpn aksme jr aaiebvlla nj roy aewropksc.

We published the pipeline to our Azure Machine Learning workspace, and submitted a run, which in Azure Machine Learning is called an experiment.

For now, we ran everything from our machine. Let’s see how we can apply DevOps to machine learning, put everything in Git, and deploy with Azure DevOps Pipelines. That is the focus of the next section.

7.3      MLOps

We have a couple of Python scripts: our simple high spender model, and our pipeline.py script which sets up an Azure Machine Learning pipeline. Let’s start tracking them in Git and create an Azure DevOps Pipeline to run the pipeline.py script in Azure DevOps. That would be our automated deployment.

Once we have this running, we’ll talk a bit about scaling this out to multiple models.

7.3.1   Deploying from Git

First, let’s add both Python scripts to our DevOps DE Git repository. By now, we should have several folders there. As a reminder, we should have:

  • ADF (DevOps for Azure Data Factory)
  • ADX (DevOps for Azure Data Explorer analytics)
  • ARM (DevOps for Azure Resource Manager templates)
  • Docs (Documentation for self-serve analytics)
  • Scripts (DevOps scripts; this contains our Azure Data Factory pull request validation)
  • YML (the Azure DevOps Pipeline definitions)

Let’s create a new subfolder, ML, for storing our machine learning scripts. At a high level, our DevOps pipeline will pick up the model code from Git and deploy it to our Azure Machine Learning workspace using an Azure DevOps Pipeline, as shown in figure 7.4.

Figure 7.4 We store ML model code in Git and deploy it automatically to our Azure Machine Learning workspace using Azure DevOps Pipelines.

We’ll put both highspenders.py and pipeline.py under ML/highspenders, as shown in listing 7.22. Note that if we have a branch policy on master, we won’t be allowed to push directly to master; rather, we need to create a new branch and submit a pull request. We won’t cover the details of this here. We applied a branch policy in chapter 6 but reverted it for convenience.

Listing 7.22 Machine learning scripts in Git
mkdir -p ML\highspenders
 
...    #A
 
git add *
git commit -m "Highspender model"
git push

Now let’s look at the Azure DevOps Pipeline that will run pipeline.py to deploy to our Azure Machine Learning workspace. This is straightforward; we just need to execute a Python script in the pipeline, just like we did in chapter 6 for the validation script.

Listing 7.23 shows the pipeline definition.

Listing 7.23 YML/deploy-model-highspenders.yml
trigger:
  branches:
    include:
    - master
  paths:
    include:
    - ML/highspenders/*    #A
 
jobs:
  - job:
    displayName: Deploy High Spenders model
    steps:
      - task: UsePythonVersion@0    #B
        inputs:
          versionSpec: '3.x'    #B
      - script: python -m pip install azureml-sdk    #C
      - task: PythonScript@0
        inputs:
          workingDirectory: $(Build.SourcesDirectory)/ML/highspenders    #D
          scriptPath: $(Build.SourcesDirectory)/ML/highspenders/pipeline.py    #D
        env:
          SP_PASSWORD: $(SP_PASSWORD)    #E

This pipeline has multiple steps:

  • First, we want to ensure we are running Python 3 on the build agent. We use the UsePythonVersion task for this.
  • Next, we need to install the Python dependencies. We use a script task to run pip install.
  • Finally, we run the Python script. Remember, it needs an SP_PASSWORD environment variable. We use env to map that. More on this below.
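As a minimal Python sketch of how a script like pipeline.py could pick up the secret from the environment (the helper name and error message are ours, not the book's actual code), failing fast with a clear message is friendlier than an opaque authentication error deep inside the Azure ML SDK:

```python
import os

def get_sp_password() -> str:
    """Read the service principal secret from the environment.

    Hypothetical helper: the real pipeline.py may structure this
    differently, but the env-var hand-off works the same way.
    """
    password = os.environ.get("SP_PASSWORD")
    if not password:
        raise RuntimeError(
            "SP_PASSWORD is not set; export it locally, or map the secret "
            "pipeline variable through the task's env block in Azure DevOps."
        )
    return password
```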

Zrv’c qdag jzrq iipelpen otinnieifd xr Qrj nhc arctee nc Yvsqt UkvGcb Lleipien bsead kn jr. Pratj, ow’ff syq drk CBWF tfniedniio rx Djr, rndx cterea vdr ielpinep gsuin az piplines create zc sohnw jn ilgitsn 7.24.

Listing 7.24 Creating the model deployment pipeline
git add *
git commit -m "Deploy High Spenders model pipeline definition"
git push
 
az pipelines create `    #A
--name "Deploy High Spenders model" `    #B
--repository DE `    #C
--repository-type tfsgit `    #C
--yml-path YML/deploy-model-highspenders.yml `    #D
--skip-run    #E

Mk’tx tloams uxne. Xkd eon nthig kw tsill xbxn xr vxcr vstc el cj mginak rqk escrevi lircpinpa dsawprso bavlleaai kr rpx iieplnpe. Ceemmrbe, wk aro nz nnrnteimevo evraiabl SP_PASSWORD sny pipeline.py tepxces xr ervereit dro dwasosrp mltx htere. Mujfk aryj kowsr oyclall, ow unkx rv rueens ord udibl ateng nx hchwi xbt KkeQau Liiepeln ffjw ndt jn vrq could cfez cgz xqr mozs enermoivnnt aelarivb.

Jr’c kcfa opntmrati rk rmereemb yjra owsasdrp aj rsteec – lj jr sakle, nz kattacer oulcd hax rj kr zmxx aecsnhg xr ytx Rtsdo Wnhiace Pegiarnn ewkpsaroc. Xurc msaen wk anz’r oestr jr nj Njr. Vyucilk, Xktqs OooDbc czu s srpoviino xtl cxlyate jrcu krdd el noacirses. Mv ssn ceerat z vaailber nsugi Xavpt YEJ, etzm rj sz tcrsee, ncp fcenreree rj nj rdo lpipneie. Ztnigsi 7.25 wohss bwk rv vp ajrq.

Listing 7.25 Creating a secret variable
az pipelines variable create `    #A
--name SP_PASSWORD `    #B
--pipeline-name "Deploy High Spenders model" `    #C
--project DE `    #C
--secret true `    #D
--value $env:SP_PASSWORD    #E

We should be good to go. Let's kick off the pipeline, and it should update our Azure Machine Learning workspace. Listing 7.26 shows the command.

Listing 7.26 Run pipeline
az pipelines run --name "Deploy High Spenders model"

Gork, ofr’z oxa yrwz vw nzc qv obaut drk ehlpusibd Xvtps Wnhecai Vannergi Vleiepni JQ.

7.3.2   Storing pipeline IDs

Remember that whenever we publish an Azure Machine Learning Pipeline to the workspace, it generates a new ID. Our pipeline.py script stores this in the highspenders.id file. If we want to enable end-to-end automation, from deployment to execution, we need a way to hand off the pipeline ID from the DevOps deployment to the orchestration service which will run the ML. For example, if we want to run our High Spenders model on a monthly cadence, how will each monthly execution know which Azure Machine Learning pipeline ID to run?

We will extend our Azure DevOps Pipeline with an additional task to publish the pipeline ID. We can store this ID in any storage solution: a SQL database, pass it to some API, etc. In our case, let's keep it simple and upload it to our Azure Data Lake. We already have one set up, so we just need to move highspenders.id from the build agent to the Data Lake filesystem. Figure 7.5 shows our extended DevOps pipeline, which captures the machine learning pipeline ID in Azure Data Lake.
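The hand-off file itself is trivial: one GUID in a text file. A sketch of the write and read sides in Python (illustrative helper names, under the file layout described above):

```python
from pathlib import Path

def save_pipeline_id(pipeline_id: str, path: str = "highspenders.id") -> None:
    # pipeline.py writes this right after publishing; the DevOps pipeline
    # then uploads the file to the Data Lake under pipelines/.
    Path(path).write_text(pipeline_id)

def load_pipeline_id(path: str = "highspenders.id") -> str:
    # The consumer (in our case Azure Data Factory, via a Lookup activity
    # rather than Python) reads it back to know which pipeline to run.
    return Path(path).read_text().strip()
```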

Figure 7.5 We deploy models automatically from Git using an Azure DevOps Pipeline. The pipeline ID generated by Azure Machine Learning is saved to a file in Azure Data Lake Storage so we can reference it later.

Mv’ff cxy nz Bxqat YZJ rcoc qcn ikonve ogr az storage fs file upload cmdmano, rkb zxms kkn wo bkad eirlrea jn jzrq phaetcr rk upaold bet input.csv jfxl. Ypu obr cntntoe lv ltgsini 7.27 rv rvq qvn lk YML/deploy-model-highspenders.yml, afret rxd PythonScript cxzr.

Listing 7.27 YML/deploy-model-highspenders.yml
...
     
      - task: AzureCLI@2    #A
        inputs:
          azureSubscription: 'ARM'    #B
          scriptType: 'pscore'    #C
          scriptLocation: 'inlineScript'    #D
          inlineScript: 'az storage fs file upload --file-system fs1 --path "pipelines/highspenders.id" --overwrite --source "ML/highspenders/highspenders.id" --account-name "<your ADLS account>"'    #E

Now, whenever we deploy, highspenders.id gets updated in our Azure Data Lake, and other automation can pick it up from there. We'll look at that in the next section, but first, let's do a quick recap.

7.3.3   DevOps for Azure Machine Learning

In this section we took our pipeline.py automation script and hooked it up to Azure DevOps: it is stored in Git, next to the model code, and whenever the model code gets updated, an Azure DevOps Pipeline is invoked which will run the Python script.

The updated model code will be uploaded to the Azure Machine Learning workspace, and the ID of the new Azure Machine Learning Pipeline will be saved in our Data Lake so other tools can look it up there.

This might seem like quite a lot of steps just to automate deployment of a simple Python script, but note this benefits from economies of scale: most of pipeline.py can be extracted into a common module reused to deploy multiple ML models. The few things that will be different across models, like name, input datasets, and compute target, can be read from a configuration file.
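A minimal sketch of that configuration-driven split could look like this; the file name and keys are illustrative, not from the book's code:

```python
import json
from pathlib import Path

# Hypothetical per-model config, e.g. ML/highspenders/model.json:
# {"name": "highspenders",
#  "entry_script": "highspenders.py",
#  "compute_target": "d1compute"}

REQUIRED_KEYS = {"name", "entry_script", "compute_target"}

def load_model_config(path: str) -> dict:
    """Load and validate the model-specific settings that a shared deploy
    module would consume instead of hardcoding them in pipeline.py."""
    config = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"model config missing keys: {sorted(missing)}")
    return config
```

The shared module would then take this dict and run the same publish logic for every model.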

We will want to keep separate pipelines for each model, since an update to one model shouldn't have to trigger updates to all the others. While we won't cover the details here, Azure DevOps does support templates for pipelines[2], so we can create a shared template for the deployment steps, then create lightweight pipeline-specific YAML files based on that.
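As a sketch of what such a template setup could look like (the file names and parameter are ours), the shared steps move into a template, and each per-model pipeline shrinks to a reference plus a parameter:

```yaml
# YML/templates/deploy-model.yml -- hypothetical shared steps template
parameters:
- name: modelFolder
  type: string

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.x'
- script: python -m pip install azureml-sdk
- task: PythonScript@0
  inputs:
    workingDirectory: $(Build.SourcesDirectory)/ML/${{ parameters.modelFolder }}
    scriptPath: $(Build.SourcesDirectory)/ML/${{ parameters.modelFolder }}/pipeline.py
  env:
    SP_PASSWORD: $(SP_PASSWORD)

# YML/deploy-model-highspenders.yml -- per-model pipeline, now tiny
# (trigger section omitted; it stays as shown in listing 7.23)
# jobs:
# - job:
#   displayName: Deploy High Spenders model
#   steps:
#   - template: templates/deploy-model.yml
#     parameters:
#       modelFolder: highspenders
```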

Finally, let's look at the end-to-end: we'll use Azure Data Factory to run an Azure Machine Learning experiment.


7.4      Orchestrating machine learning

Our orchestration solution from chapter 4 is Azure Data Factory. We will use it to submit an Azure Machine Learning Pipeline run (which creates an experiment). Azure Data Factory has a connector for Azure Machine Learning, so this type of flow is supported natively.

In a real-world context, our workflow would also include ETL for the machine learning inputs. We would load and transform the input data for the model as a first step and, only after all inputs are available, train the model. Figure 7.6 shows a generic machine learning workflow as orchestrated by Azure Data Factory.

Figure 7.6 A generic machine learning workflow orchestrated by Azure Data Factory: first we perform the required ETL to get the data ready to run our model code; then we run the ML code using Azure Machine Learning; finally, we copy the outputs to their final destinations – a trained model or datasets (if we use Azure Machine Learning to do batch scoring).

To keep things simple, we will skip the input ETL part, since we already know how to implement it from chapter 4. We'll focus on the new parts: integrating with Azure Machine Learning and reading the pipeline ID from our Data Lake.

7.4.1   Connecting Azure Data Factory with Azure Machine Learning

As a quick reminder, Azure Data Factory uses linked services to connect to other Azure services. We'll want to set up linked services for our Data Lake and Azure Machine Learning workspace.

We already saw this in chapter 4, where we created a couple of linked services to connect to the Bing COVID-19 open dataset HTTP service and our Azure Data Explorer instance. We used the az datafactory linked-service create command. This won't work anymore, since now our Data Factory is connected to Git. Remember, once backed by Git, the UI will load the details from Git instead of the Data Factory instance itself, while Azure CLI will still talk directly to the service.

This time around, we'll set up the linked services differently. First, let's set up the Data Lake connection. We'll need to grant our Azure Data Factory access. The Data Factory itself comes with its own identity, called a managed identity. Listing 7.28 shows how we can retrieve it and grant it access to the Data Lake.

Listing 7.28 Granting Data Lake permissions to the Data Factory
$adf = az datafactory factory show --name "adf$suffix"  --resource-group adf-rg `
| ConvertFrom-Json    #A
 
$acc = az storage account show --name "adls$suffix" | ConvertFrom-Json    #B
 
az role assignment create `    #C
--role "Storage Blob Data Contributor" `    #D
--assignee $adf.identity.principalId `    #E
--scope $acc.id    #F

Now the Data Factory has access to the storage account from where it can read the IDs of the Azure Machine Learning Pipelines deployed through DevOps. We can create a linked service based on the JSON in listing 7.29. Now that our Data Factory is synced with Git, we can add this to Git under /ADF/linkedService, and the UI should pick it up. Remember, you don't need to memorize the JSON schema; you can also set this up through the Azure Data Factory UI.

Listing 7.29 /ADF/linkedService/adls.json
{
    "name": "adls",    #A
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "type": "AzureBlobFS",    #B
        "typeProperties": {
            "url": "https://adls<use $suffix>.dfs.core.windows.net"    #C
        }
    }
}

Push this to Git, and you should see the new linked service in the Data Factory UI.

Pte Rqktc Wacihne Finangre, rz ruk jmvr xl iwgtnir, yro lneidk ecervis zeqx enr spprout z daameng ydietitn. Mv ffwj ysex rk kdz z rveceis ipiprnlca. Mx’ff xfr Nzzr Poytrca tecnypr yor secvier caliprpni cstere feerbo rtgsoin jn Orj (isnce wv npx’r ncrw rj dkeela), ce wo’ff yzek er ecuniorfg rujc inelkd icersve hugrhto dro QJ. Eruige 7.7 shsow dwv rk bk ajgr.

Figure 7.7 Configuring a new "aml" linked service: on the Manage tab, under Linked Services, click "+ New", then fill in your subscription ID, service principal tenant, application ID, and password.

Go to the Manage tab, then select Linked Services and click on + New. Fill in the subscription ID and service principal details.

Mv’ff axh $sp, nscei wk eyarlad arengdt jr ccaess xr yfoimd rop Botqs Wehcian Vegnnria scneitna yrb gania, jn s pctrnuooid tnnromivene phx luwod ye lvt otanilosi ngs reegetna nkw viresce plnrisapci ntdasie lx grlcneyic pmrx. Czyr’a cebsaue jl kvn kl rqv nilpsprcia xzbr sicpmodorem, rj xwn’r sileft oosp csesca re milteupl secrseuor, ck c etloantpi taerckat luowd njhs cfoa sacces. Tnrsotat parj wrjg mlpysi usngi s gelins cresvie pipilcran cssoar ffz tqk syetssm – lj srrg vcbr mmrpcidesoo, zn eatratkc csn vry daobr scaesc.

As a side note, there are other options we could have used instead of the UI, but we're trying to keep things brief. The recommended approach is to store the principal key in an Azure Key Vault and read it from there. We haven't linked our Data Factory to a Key Vault though, so we'll skip this.

The reason we went through the UI this time is that before storing the service principal key in the linked service JSON in Git, Data Factory encrypts it. Since we don't know what the encrypted password looks like, we can't create the linked service JSON "manually". Azure Key Vault would have helped, since we would only need to specify the name of the secret in Key Vault. Again, we didn't do this just so we can keep things short.

7.4.2   Machine Learning Orchestration

Now that we can both read the Azure Machine Learning Pipeline ID and submit a run to Azure Machine Learning with the new linked services we created, the final step is to build a Data Factory pipeline for this.

Figure 7.8 shows the steps. First, we'll look up the latest pipeline ID from where we uploaded it in Azure Data Lake Storage. Once we have the ID, we will submit a run to Azure Machine Learning. In the previous section, we created linked services for both Data Lake and Azure Machine Learning, so now it's just a matter of stitching things together.

Figure 7.8 Data Factory pipeline for running ML. The first activity gets the pipeline ID, the second activity submits a run to Azure Machine Learning.

Prjtc, wo’ff efidne z datsate tvl kur leomd JU. Etiinsg 7.30 owssh pvr gersprdninooc ISKQ ditfoninei hchwi nj Nrj jfwf kg rnedu /ADF/dataset.

Listing 7.30 /ADF/dataset/HighSpendersId.json
{
    "name": "HighSpendersId",
    "properties": {
        "linkedServiceName": {
            "referenceName": "adls",    #A
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "DelimitedText",    #B
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",    #C
                "fileName": "highspenders.id",    #D
                "folderPath": "pipelines",    #D
                "fileSystem": "fs1"    #D
            },
            "columnDelimiter": ",",    #E
            "escapeChar": "\\",    #E
            "quoteChar": "\""    #E
        },
        "schema": []    #F
    }
}

Ukw krf’a exxf zr drk neieppli tiinndifoe. Rpja wffj cxy 2 tcavisetii ow ahenv’r ohag reeofb, c Lookup iiavtcty nsg z Machine Learning Execute Pipeline ctytviai.

The Lookup activity allows us to read from a dataset and makes the read data available in the pipeline. In our case, we'll read the ID of the latest deployed High Spenders Azure Machine Learning Pipeline.

The Machine Learning Execute Pipeline activity, as the name implies, executes an Azure Machine Learning Pipeline. We will use dynamic content to read the ID from the previous activity and pass it to Azure Machine Learning.

Dynamic content

in Azure Data Factory is an expression language which enables us to build flexible, parameterized pipelines.
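A few representative expressions give the flavor of the language. The first is the one used in listing 7.31; the others are illustrative (the parameter name is hypothetical):

```
@activity('Get ID').output.firstRow.Prop_0
    (the first column of the first row returned by the "Get ID" Lookup activity)
@pipeline().parameters.modelName
    (the value of a pipeline parameter)
@concat('pipelines/', pipeline().parameters.modelName, '.id')
    (string concatenation, for example to build a file path)
```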

We briefly mentioned dynamic content in chapter 4, but now we get to use it. Listing 7.31 shows our pipeline JSON, which would be in Git under /ADF/pipeline.

Listing 7.31 /ADF/pipeline/runhighspenders.json
{
    "name": "runhighspenders",
    "properties": {
        "activities": [
            {
                "name": "Get ID",
                "type": "Lookup",    #A
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",    #B
                    "retry": 0,    #B
                    "retryIntervalInSeconds": 30,    #B
                    "secureOutput": false,    #B
                    "secureInput": false    #B
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {    #C
                        "type": "DelimitedTextSource",
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "recursive": true
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "dataset": {
                        "referenceName": "HighSpendersId",    #C
                        "type": "DatasetReference"
                    }
                }
            },
            {
                "name": "Execute Pipeline",
                "type": "AzureMLExecutePipeline",    #D
                "dependsOn": [
                    {
                        "activity": "Get ID",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "7.00:00:00",    #E
                    "retry": 0,    #E
                    "retryIntervalInSeconds": 30,    #E
                    "secureOutput": false,    #E
                    "secureInput": false    #E
                },
                "userProperties": [],
                "typeProperties": {
                    "mlPipelineId": {
                        "value": "@activity('Get ID').output.firstRow.Prop_0",    #F
                        "type": "Expression"
                    }
                },
                "linkedServiceName": {
                    "referenceName": "aml",    #G
                    "type": "LinkedServiceReference"
                }
            }
        ],
        "annotations": []    #H
    }
}

Xun rcrd’c jr. Muxn jrau itiayvtc gctn, rj ipskc hy xgr lastte JK ltem rxp Gcrs Vkso, whhci cj duptaed by vtb cihemna rganline OoxUdc tpymndeloe. Jr ropn subsmit gjar kr Ygxst Wiheacn Perninag. Mo ssn cetrea z tigrger znq nbt dajr nx rheweatv hecsudle vw ncwr.

In practice, we would likely have additional ETL activities around this, but we're keeping things simple for this pipeline since we're focusing on the machine learning part.

7.4.3   Orchestrating machine learning recap

In this section we looked at integrating Azure Machine Learning with Azure Data Factory for orchestrating machine learning runs.

We saw how we can connect to all the required services and how to consume the pipeline ID from the Data Lake, to make sure we always execute the latest version of an Azure Machine Learning Pipeline.

There are several advantages with this integration: we already have a solid DevOps infrastructure in place for Azure Data Factory, including monitoring. If the model run fails in Azure Machine Learning, the Data Factory activity fails and our monitoring triggers an alert.

We also defined some standards in chapter 6 around who needs to review the code and what additional documentation is required, and we enforced them with branch policies. Those policies would apply here too; since we are relying on Azure Data Factory for orchestration, we can reuse the setup.

Also, training the model is just part of the story. We also need to gather all the required input data, clean it up, etc. This can be done using Azure Data Factory, which we already use for all our other data movement workloads.

Figure 7.9 shows our complete DevOps setup, with automated deployment from Git for both machine learning code and orchestration.

Figure 7.9 We deploy ML code from the /ML folder to our Azure Machine Learning workspace using a DevOps pipeline. We also deploy Azure Data Factory pipelines from our adf_publish branch (we discussed this in chapter 6). The Data Factory orchestrates end-to-end machine learning workflows, using our Azure Machine Learning instance.

One DevOps aspect we omitted in this chapter for brevity, but which we would also automate in a real production scenario, is deployment of the Azure Machine Learning workspace itself. In section 7.2 we created one using Azure CLI, which we used throughout the chapter. Once created, we can export its ARM template, store it in Git, and deploy it from there, the same way we do with our Azure Data Explorer cluster. We won't go through the steps here because they are no different than the ones we saw in chapter 3, section 3.3 Deploying infrastructure.


7.5      Summary

  • Machine learning models are usually developed in Python or R.
  • Azure Machine Learning is the Azure PaaS offering for running machine learning in the cloud.
  • An Azure Machine Learning instance is a workspace, which manages compute targets on which ML runs and data stores for input (and output) data.
  • An Azure Machine Learning Pipeline defines a machine learning pipeline containing one or more steps. Running a pipeline is called an experiment, which produces a trained model.
  • Azure Machine Learning provides an SDK which makes it easy to deploy models using Python or R code.
  • We can build an Azure DevOps Pipeline to execute this code and deploy models to Azure Machine Learning from Git.
  • Each Azure Machine Learning Pipeline deployment gets a unique GUID. We must keep track of the latest version we deployed. We can do that by storing it after deployment succeeds.
  • Using Azure Data Factory to orchestrate machine learning enables us to leverage the infrastructure we already built to operationalize ML.

This is the final chapter on workloads. We covered data modeling, analytics, and now, ML. For the remaining chapters of this book, we’ll switch our focus to governance, and ensure the platform we run all these workloads on is reliable, compliant, and secure.
