9 K-d Trees: Multi-dimensional Data Indexing


This chapter covers

  • Efficiently indexing a 2-D (and in general k-D) dataset
  • Implementing nearest neighbor search with k-d trees
  • Discussing k-d trees’ strengths and flaws

This chapter is structured slightly differently from our book’s standard, simply because we continue here a discussion started in chapter 8. There we introduced a problem: searching multidimensional data for the nearest neighbor(s) of a generic point (possibly not in the dataset itself); in this chapter, we introduce k-d trees, a data structure specifically invented to solve this problem.

We follow up on those topics here, and so we won’t introduce a new problem, but pick up the “closest hub” example from chapter 8 and show a different option to solve it; in the next chapter we will also look at an evolution of k-d trees, SS-trees[39].


9.1   Right Where We Left

Let’s recap where we left off in previous chapters: we are designing software for an e-commerce company, an application to find the closest warehouse selling a given product, for any point on a very large map: see figure 9.1 to visualize it. To get a ballpark idea of the kind of scale we need to handle, we want to serve millions of clients per day across the country, taking products from thousands of warehouses, also spread across the whole map.

In section 8.2, we have already established that a brute-force solution, where we skim through the whole list of points and compare distances, can’t work for a live solution.

We have also seen how the multidimensional structure of the data prevents us from using the basic solutions we have seen in the first part of the book, from heaps to hash maps.

Figure 9.1 Our example map, showing cities (real and imaginary) and warehouses (all imaginary!) on the East coast. In a typical application for k-d trees, given a point on the map, we would like to find the closest warehouse, or the closest city, to that point.

Viable solutions, however, do exist: in this chapter we will first explain the issues we face in moving to multi-dimensional spaces, then in this and the next chapter we will delve into a few alternatives to efficiently tackle those challenges.


9.2   Moving to k-D Spaces: Cycle Through Dimensions

You might have the impression that we hit a dead end, and even in the scientific community it certainly seemed so for a long time.

The answer came in the form of a heuristic, by the hand (and brain) of Jon Louis Bentley[40].

The idea is as brilliant as it is simple, and stems from the considerations that led us this far in the chapter: if we restrict to 2-D spaces, instead of splitting each region in 4 sub-regions for each point, we only perform a 2-way split, alternating splits along vertical lines with splits along horizontal lines.

For each split, we partition the points in a region in 2 groups: then, at the next split, when we choose the next pivot in each of the two sub-regions, we will use a perpendicular direction to do the next partitioning.

Figure 9.2 shows a few steps of this algorithm: you can see how for the first pivot chosen we draw a vertical line passing through the point, then for the second one we draw a horizontal semi-line (we are splitting a semi-plane now, not the whole plane again!), and then a vertical (semi) line again.

Notice that in the Cartesian plane a vertical line through a point P=(Px,Py) has a peculiar characteristic: it is parallel to the y axis, and all points on the line have the same value for their x coordinate, Px. Likewise, a horizontal line passing through P is made of all the points in the plane for which y=Py.
Figure 9.2 Partitioning points in the 2-D Cartesian space by cycling through the directions along which we split. For the first split (left) we choose point R and draw a vertical line (parallel to the y axis, x coordinate is constant) passing through it. We have thus created 2 half-spaces, on the left (yellow) and right (blue) of this line, grouping points W, P, O, Q, U on one side, and S, T on the other. Point R is the pivot of this partitioning. Next, we choose point W in the yellow partition: this time, we draw a horizontal line (parallel to the x axis, y coordinate is constant): it splits the yellow partition into 2 new partitions, one in the top-left area of the plane (red), containing P, O and U, and one in the bottom-left area (green), with just Q. If we further split the red area at point P, we need to use again a vertical line, as shown in the right-most part of the figure.

So, when we split the plane along a vertical line through P=(Px,Py), what we really mean is that we create two partitions, one for the points L in the plane for which Lx<Px, and one for those points R for which Rx>Px. And similarly for horizontal lines, using the y coordinates of the points.

This binary partitioning allows us to use binary trees to index our points: each node in the tree is the pivot we chose to partition the remaining region of space, and its left and right subtrees gather all points in the two partitions, and represent the two sub-regions resulting from the split we perform along the pivot (check out figure 9.2 and figure 9.5).

Can this algorithm be generalized to higher dimensions? Yes, it naturally allows a generalization to higher dimensions, because we split on a single coordinate for each point, but we can do a round-robin through all the coordinates of a k-dimensional space, and the i-th level of the binary tree we build will correspond to a split along the (i mod k)-th dimension. Meaning that in a 2-D space, the root will split the plane along the x axis, its children will split each of the two semi-planes along the y axis, then their children again along the x axis, and so on and so forth. In a k-dimensional space, with k > 2, we start with the first coordinate at level 0 (the root), then move to the second at height 1, the third at height 2, etc.
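To make the cycling rule concrete, here is a minimal Python sketch (our own illustration, not code from the book’s repo), assuming points are stored as plain tuples:

# Sketch: the coordinate compared at each level of a k-d tree cycles
# through the k dimensions (round-robin), as described above.
def split_coordinate(level, k):
    return level % k

# In a 2-D tree: level 0 compares x, level 1 compares y, level 2 x again...
assert [split_coordinate(level, 2) for level in range(5)] == [0, 1, 0, 1, 0]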

This way, we partition the plane in rectangular areas. With respect to our initial idea of splitting points with vertical lines only, we have fewer areas extending to infinity (while if we always used the same coordinate all areas would be infinite!) and avoid keeping distant points in the same partition.

At the same time, each node has just 2 children, and we can maintain all the advantages and guarantees of binary partitioning and binary search.

Figure 9.3 Constructing a BST from the pivots of a 1-D dataset.
(Left) We add the first pivot, which is going to be the root of the tree. The pivot creates an implicit division along the x axis, partitioning the remaining points into left and right subsets. The region covered by a node is the union of the partitions its pivot creates, so the root covers the whole dataset.
(Center) Each of the two sub-regions implied by the root is further partitioned, by selecting a point in each region as pivot. As shown by the highlighting above the horizontal axis, each of these nodes at level 1 now covers half of the space (while the root still covers the whole space).
(Right) Level 2 nodes are added, further partitioning the space. Notice how some regions are still only covered by nodes at level 1, because these intermediate nodes only have one child.

9.2.1   Constructing the BST

So far, we have just hinted at the construction of a binary search tree, implying that there is a direct translation of the partitions, or rather of the pivots we choose, into a BST.

We have also implied that the BST we are going to construct is an important part of the k-d tree. Let’s give a more formal definition to clarify the relation between these two data structures.

 

A k-d tree is a binary search tree whose elements are points taken from a k-dimensional space, i.e. tuples with k elements, whose coordinates can be compared (to simplify, let’s assume each coordinate’s values can be translated into real numbers). In addition to that, in a k-d tree, at each level i we only compare the i-th (modulo k) coordinate of points, to decide which branch of the tree will be traversed.

 

In fewer words: a k-d tree can be described in terms of a binary search tree with a fancy comparison method on its keys. The added value is given by the algorithms for search that can be performed on this kind of tree much more efficiently than on other, simpler, data structures.

Figure 9.3 shows the construction of an example tree for the unidimensional case. This is an edge case, because singletons (tuples with dimension 1) will result in always using the x coordinate in points (i.e. the whole singleton).

Notice how each node of the tree “covers” a region of the dataset, in a hierarchical way: the root covers the whole dataset, level 1 nodes cover the left and right partitions created using the root as a pivot, and level 2 nodes cover the even smaller partitions created using level 1 nodes as pivots.

Here, when we use the term “cover”, we mean that, given an entry X in the search space (in 1-D, given a real number X), if we query the tree (as a binary search tree) for X, then all nodes we traverse during the search for X (which form a path from the root to a node N) cover X. In particular, the node on which search stops is the one covering the smallest region for X.

In other words, each node in the tree has associated a range of values for which it will be traversed, when searching the tree for those values; that range is the region covered by the node.

Make sure to go through the example in figure 9.3 and to understand every step, maybe even try to run an example yourself (for instance by just changing the pivots or their order, and checking how the tree changes). It is important to understand the unidimensional case, because it will make it simpler to understand the 2d-tree construction.

Figure 9.4 A richer version of the map in figure 9.1, with a coordinate system.

In order to better show the steps of the construction of a 2d-tree, and highlight the advantages of cycling through the dimensions used to partition, we add a few cities to figure 9.1, in order to have a more uniform 2-D spatial distribution; in figure 9.4 we have also added a coordinate system: both the origin and the scale are arbitrary, and completely marginal for the results of the algorithm (we can always apply any translation and scale operation, since they preserve the Euclidean distances).

Figure 9.5 shows the results of the first couple of steps of the algorithm that builds the tree. In this figure you can notice a different scale than in the previous picture: while the other one was more realistic, with distances between points closer to the real distance in miles between cities, it could also generate some unnecessary confusion in the drawings; moreover, as we mentioned, the coordinate system is not really important for the algorithm, as long as it’s consistent.

The first point we choose as pivot is “Opal City”[41]: since we first split using x coordinates, all cities on its left will go into one partition, and all cities on its right will go to the other partition. Then, as a second step, let’s focus on the right partition: we choose, for instance, “Civic City” as a pivot, and this time we have to use y coordinates to split, so all points in the top-right region will go in one sub-partition of the right one, and all points in the bottom-right region will go into the other. We now have 3 partitions, and we can further split any of them.

Figure 9.5 The first 2 steps of constructing a k-d tree for our map of DC cities.
(Left)     First we split vertically with pivot “Opal City”, which becomes the root of the k-d tree. Vertical splits are drawn in red, as well as edges corresponding to vertical splits. The background of each tree node is filled with the colors of the regions the node covers, so the root is half green and half yellow, because it covers both partitions created with its pivot.
(Right)    The right partition (created by the root) is further split, along a horizontal line passing through our second pivot, “Civic City”, so in the BST we add a right child to the root. This node corresponds to another split into a top and bottom sub-region. Horizontal splits are drawn in blue, as well as corresponding edges in the tree.

In figure 9.6, we show the resulting tree after inserting all the cities (without the warehouses). Notice how edges have the same color (red or blue, meaning vertical or horizontal split) on each level, and red and blue edges are alternated in any path.

Splits now define clearly separated regions, and if you look at how warehouses are distributed, you can get an idea, with just a glance at the region they are in, of what they might be close to.

There is not, however, a direct match, and looking at regions is not enough to determine the closest point: for instance, if you look at B-8 in the picture, it’s not clear if Buffalo, Pittsburgh or Harrisburg is the closest city, and C-6 looks closer to Syracuse than Harrisburg, despite being “covered” by the latter and not the former.
Figure 9.6 The k-d tree resulting after adding all cities in figure 9.4 (we haven’t added warehouses to the tree, both for the sake of clarity, and because it makes more sense to create a tree with just one kind of entries, either cities or warehouses, and search the other kind on it).

Determining the closest point(s) requires a few more steps than in regular binary search trees (the unidimensional case), and we will defer the description of the full algorithm to the next section.

Ca otdmneein, qjra oiuttcnnrsoc mhdote let e-u reste tauanlrly einszgleaer rx iehrhg soiinsndem, tolguhah rj emsbeoc temo icdifutfl rx ziauvlsei xrq rtsee snb uro apssce etislf.

For k=3, we can still imagine the space divided in parallelepipeds, as shown in figure 9.7, but for 4-D and further, we lack an immediate geometric interpretation; that said, as long as we treat k-D points as tuples, it is possible to follow the same steps we have seen for a 2d-tree, without any change.

Figure 9.7 An example of a 3-D tree (aka k-d tree with dimension 3). For the sake of clarity, regions are not highlighted, and nodes are filled with the same color of their split planes.

9.2.2   Invariants

We could sum up the definition of k-d trees in a few invariants.

A k-d tree is then defined as a binary search tree, whose elements are k-dimensional points, and that abides by the following invariants:

1. All points in the tree have dimension k.
2. Each level has a “split coordinate” index j, such that 0 ≤ j < k.
3. If a node N’s split coordinate index is j, then N’s children have split coordinate equal to (j+1) mod k.
4. For each node N, with split coordinate index j, all nodes L in its left subtree have a smaller value for N’s split coordinate, L[j] < N[j], and all nodes R in N’s right subtree have a larger or equal value for N’s split coordinate, R[j] ≥ N[j].
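These invariants translate almost one-to-one into a checking routine. The following Python sketch is our own illustration (it assumes a hypothetical node class with point, level, left and right fields, like the KdNode model introduced later in this chapter):

# Sketch: verify the four k-d tree invariants on a node-based tree.
def points_of(node):
    if node is None:
        return []
    return [node.point] + points_of(node.left) + points_of(node.right)

def check_invariants(node, k):
    if node is None:
        return True
    j = node.level % k                       # invariant 2: split coordinate index
    ok = len(node.point) == k                # invariant 1: every point has dimension k
    for child in (node.left, node.right):    # invariant 3: children cycle the coordinates
        if child is not None:
            ok = ok and child.level % k == (j + 1) % k
    # invariant 4: left subtree strictly smaller, right subtree larger or equal
    ok = ok and all(p[j] < node.point[j] for p in points_of(node.left))
    ok = ok and all(p[j] >= node.point[j] for p in points_of(node.right))
    return ok and check_invariants(node.left, k) and check_invariants(node.right, k)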

9.2.3   The Importance of Being Balanced

So far in this section we consciously ignored a detail, a special characteristic we look for when it comes to binary search trees: whether the tree is balanced or not. As for regular BSTs, you might have already figured out that the order in which we insert elements in our k-d tree determines the shape of the tree. Even having finite areas is not a synonym of having small areas, nor good partitioning: if you look at the example in figure 9.3, we have carefully chosen points “by hand” in order to create a balanced tree, but it is easy to provide an example of a terrible insertion sequence that would create an imbalanced tree (just insert points starting from the top-left going in order towards the bottom-right corner, for instance).

And here as well, as always, we consider a binary partitioning “good” only if it manages to split the whole set being partitioned in two subsets of approximately the same size.

If we manage to obtain a good partitioning at each step, we will have a balanced tree with logarithmic height. Given a sequence of points it is, however, possible to choose an order of insertion for the points that will produce a skewed tree with linear height.

We’ll see in a few pages how we can prevent this from happening, under certain conditions. Re-balancing on insertion is not, unfortunately, a viable option for k-d trees, as instead it was for binary search trees.

Before worrying about balancing trees, however, let’s see in detail how methods like insert, remove, and all the queries work for k-d trees.


9.3   Methods

We have now seen a few examples of k-d trees, and how we can construct them: by now, it should be clear what a k-d tree looks like, the main ideas behind this data structure, and how we can leverage it.

It’s time to delve into the main methods for k-d trees, to see how they work and how they can be implemented; in this section we will present pseudo-code for these methods, and you can find an actual implementation on our repo.

Figure 9.8 shows a pre-constructed k-d tree that we are going to use as a starting point for this section. In order to focus on the details of the algorithm, we are going to use a simplified view, showing points on a cartesian plane where axes are omitted. Points are just denoted with capital letters, leaving out city names for now, to abstract out the context and focus on the algorithms. Vertical (red) and horizontal (blue) splits are still shown, while we won’t highlight the regions as we have done in figure 9.6; as a consequence, we can fill tree nodes (instead of edges) with red or blue to show that vertical or horizontal splits will be used at a certain level.

While in figure 9.8 coordinates are shown next to both nodes and points, we might sometimes omit them in the next figures, for the sake of cleanness.

Figure 9.8 An example k-d tree for a 2-D dataset. On the left, the tree representation. On the right, a visualization of (a portion of) the 2-D cartesian plane, where the origin is at the center of the square.

Mo fjwf rstta wujr yxr “hzkc” todemhs stifr: aershc qnc srtnie kosrw ltoams aylxtec sz jn sbaic BSTz; kw jfwf tlisl bsrceeid romg znb veirdpo irthe soddpcueoe, rdq lj qxg sto dryalea mialfair qrjw yrainb hsacre srete, lfxk tolx rx meja hoturhg vrg nkvr lwv qzu-itocenss.

Listing 9.1 The KdTree class
class KdNode
  #type tuple(k)
  point
  #type KdNode
  left
  #type KdNode
  right
  #type integer
  level
  function KdNode(point, left, right, level)
  
class KdTree
  #type KdNode
  root
  #type integer
  k
  function KdTree(points=[])

But before that, we need to define a model for the k-d tree and its nodes: check listing 9.1.

A KdTree just contains a root node, and a constructor method, taking an optional array of points as input. We’ll take a look at how a k-d tree can be constructed later, after we introduce the insertion method. For now, it will suffice to say that it will set the root to either a void entry (be it null, a special instance of KdNode, or whatever is more appropriate for the language used). For the sake of convenience, let’s assume a tree also stores the value k, the dimension of the space on which the tree is defined.

The root is, as said, an instance of KdNode: this data structure models a node of a BST, with its left and right children, and its value, a point in the k-dimensional space. We will use the special value null to model an empty node (and thus an empty tree).
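In a concrete language, this model stays just as small. Here is a Python sketch mirroring listing 9.1 (our illustration, not the book’s repo code; field names follow the listing):

# Sketch of the model in listing 9.1.
class KdNode:
    def __init__(self, point, left, right, level):
        self.point = point      # a tuple with k coordinates
        self.left = left        # KdNode, or None for an empty child
        self.right = right      # KdNode, or None
        self.level = level      # depth of the node: determines the split coordinate

class KdTree:
    def __init__(self, k):
        self.k = k              # dimension of the space
        self.root = None        # None plays the role of the null value discussed above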

9.3.1   Search

In section 9.2 we have implicitly described how search works on k-d trees. There is nothing fancy in this algorithm, it’s just a regular search on a binary search tree storing tuples, with the caveat that instead of comparing the whole tuple at each step, we only use one coordinate: at level i, we compare the i-th coordinate (or i mod k, if i≥k).
Listing 9.2 Helper functions
function getNodeKey(node)                                               #1
  return getPointKey(node.point, node.level)                            #2
 
function getPointKey(point, level)                                      #3
  j ← level % k                                                         #4
  return point[j]                                                       #5
 
function compare(point, node)                                           #6
  return sign(getPointKey(point, node.level) - getNodeKey(node))        #7
 
function splitDistance(point, node)                                     #8
  return abs(getPointKey(point, node.level) - getNodeKey(node))         #9
 

Listing 9.2 shows a few helper functions that will help us keep our code clean. We encapsulate in these functions the logic of cycling through split coordinates while traversing the tree, instead of duplicating it across all the methods: this way the other methods will be more readable, and if we ever have to change the way this comparison is done (i.e. because we find a bug or we want to implement a fancier algorithm) we just need to touch one single place in the code base.

Figure 9.9 An example of an unsuccessful search on a k-d tree (2-D). The searched point, P, would ideally lie in the region highlighted in red, which corresponds to the left subtree of node D.

Listing 9.3, instead, shows the pseudocode for the search method. The pseudo-code for all these methods will assume that we are providing an internal version that takes a KdNode as argument. The public API for KdTree, instead, will provide adapter methods that will call the internal ones, for instance KdTree::search(target) will call search(root, target).

This way we have more flexibility in reusing these methods (for instance we can run search on just a subtree, not the whole tree).

Notice that this recursive implementation is eligible for tail-recursion optimization, on those languages and compilers supporting it (check appendix E for an explanation of tail-recursion).
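In languages without tail-call optimization (Python among them), the same search can be written as a simple loop; a sketch of ours, assuming the KdNode fields from listing 9.1 and a known dimension k:

# Sketch: iterative version of listing 9.3's search, for points stored as tuples.
def search(node, target, k):
    while node is not None:
        if node.point == target:
            return node                   # successful search
        j = node.level % k                # split coordinate at this level
        if target[j] < node.point[j]:
            node = node.left
        else:
            node = node.right
    return None                           # unsuccessful search: point not in tree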

Let’s follow step by step the example on figure 9.9.

Listing 9.3 The search method
function search(node, target)                                           #1
  if node == null then                                                  #2
    return null
  elsif node.point == target then                                       #3
    return node
  elsif compare(target, node) < 0 then                                  #4
    return search(node.left, target)
  else                                                                  #5
    return search(node.right, target)
 

How fast is search on a k-d tree? Like for regular BSTs, its running time is proportional to the height of the tree; if we keep the tree balanced, therefore, the running time will be O(log(n)) for a tree holding n points.

Figure 9.10 An example of a successful search on a k-d tree (2-D). Here points P and E are coincident, and the highlighted region corresponds to the subtree rooted at node E.

9.3.2   Insert

As we have seen on BSTs, insertion can be implemented in two steps; the first step is running a search for the point to add, which will either find the point already in the tree, or stop at its parent-to-be, the node to which the new point should be added as a child. If the point is already in the tree, then what we do next depends on the policy we adopt on duplicates: if duplicates are not allowed, we might ignore the new point or fail; otherwise we have a wide range of solutions, from using a counter on nodes to keep track of how many times a point was added, up to consistently adding duplicates to either branch of a node (for instance, always in the left sub-branch: although, as we have seen in appendix C, this leads to slightly unbalanced trees on average).

If, instead, the point was not in the tree, search will fail on the node that should have the new point as its child, and the second step will consist in creating a new node for the point and adding it in the correct branch of its parent.

Listing 9.4 shows an approach that doesn’t reuse the search method. This approach, while not DRY[42], allows us to simplify both methods: to be reusable in insert, search should also return the parent of the node found (and even more when it’s not found), which is not of any particular use for search itself, which would thus become unnecessarily complicated. Moreover, this way we can write insert in a more elegant way, using a pattern that naturally supports immutability[43].

Listing 9.4 The insert method
function insert(node, newPoint, level=0)                                #1
  if node == null then                                                  #2
    return new KdNode(newPoint, null, null, level)
  elsif node.point == newPoint then                                     #3
    return node
  elsif compare(newPoint, node) < 0 then                                #4
    node.left ← insert(node.left, newPoint, node.level + 1)            #5
    return node
  else                                                                  #6
    node.right ← insert(node.right, newPoint, node.level + 1)          #7
    return node
 

Figures 9.11 and 9.12 show two examples of insertion of new points on the k-d tree in figure 9.10.

Let’s follow the first example step by step: it starts with a call to insert(A, (-1.5, -2)), where we don’t pass any value for level, thus defaulting it to the right value for the root (as defined in the function signature, this value is 0). A <> null, so the condition at line #2 won’t match; A.point <> (-1.5, -2), so the condition at line #3 is false as well. When we get to line #4, instead, -1.5 < 0, so compare will return -1 and we traverse the left subtree, calling insert(C, (-1.5, -2), 1).

The next few calls will proceed similarly (like we have seen for search), and in turn we call insert(D, (-1.5, -2), 2) and insert(null, (-1.5, -2), 3). For the latter, the condition at line #2 will be true, so we create a new node, KdNode((-1.5, -2), null, null, 3), and return it to the previous call in the stack trace: there, at line #5, we set D.left to this new KdNode we created, and then return D.
Figure 9.11 Inserting a new point on a k-d tree (2-D). To the k-d tree in figure 9.10 we add a new point G: insertion, like for BSTs, consists of an (unsuccessful) search, after which we can identify the node that will be the parent of the node to add. In case the new point already exists in the tree, your conflict-resolution policy might lead you to ignore the duplicate, or handle it properly.

Notice how, at line #4, we are breaking ties by partitioning coordinates with the same value as the current node on its right: this decision might seem of little importance, but we will see that it’s a big deal when it comes to deleting nodes.

If we take a look at the stack trace for the example in figure 9.12, we can notice how it is entirely similar to the one for the previous case:

insert(A, (2.5, -3))
  insert(B, (2.5, -3), 1)
    insert(E, (2.5, -3), 2)
      insert(null, (2.5, -3), 3)
      return new KdNode((2.5, -3), null, null, 3)
    return E
  return B
return A
Figure 9.12 Another example of inserting a new point on a k-d tree (2-D). The starting point is the k-d tree resulting after the insertion in figure 9.11.

9.3.3   Balanced Tree

Before moving on to the most advanced methods designed for k-d trees, let’s take a step back. In section 9.2.3 we have already described one key point of k-d trees: we need our tree to be balanced. In the previous sections we have indeed seen how search and insertion have running time O(h), proportional to the height of the tree, and thus having a balanced tree will mean that h = log(n) and in turn that all these methods will run in logarithmic time.

Unfortunately, a k-d tree is not a self-balancing tree, like RB-trees or 2-3-trees, to name a few: this means that, if we perform a great number of insertions and deletions on a tree, on average the tree will be tendentially balanced, but we have no guarantees on that. Moreover, if we resolve ties on coordinate comparisons by always going to the same side, then it is proven that we will break the balance, over many operations.

To solve this problem, we can slightly change the compare method, as we had defined it in listing 9.2, so that it will never return 0: whenever we have a match, half of the times it will return -1, and half of the times +1, therefore achieving a better balance for the tree. The choice needs to be consistent, so we can’t use randomness and have a perfect balance; instead, a possible solution to implement this correction, as shown in listing 9.5, is to return -1 when the node’s level is even, and +1 when it’s odd (or vice versa, it doesn’t really matter, as long as it’s consistent).

Listing 9.5 Revised compare
function compare(point, node)                                           #1
  s ← sign(getPointKey(point, node.level) - getNodeKey(node))          #2
  if s == 0 then                                                                #3
    return node.level % 2 == 0 ? -1 : +1
  else
    return s

This helps achieve a better-balanced tree on average, but still doesn’t provide us any guarantee.

To date, there is no solution to easily keep a k-d tree balanced while maintaining O(h) running time for insertion. Nevertheless, if we know the set of points to insert in advance (when we create the k-d tree), then we can find an optimal order of insertion that allows us to construct a balanced tree. If the tree is not changed after construction, or if the number of elements inserted/deleted is negligible in comparison to the size of the tree, then we can have a worst-case balanced k-d tree, and insert and search will run in worst-case logarithmic time.
Listing 9.6 Balanced construction
function constructKdTree(points, level=0)                               #1
  if size(points) == 0 then                                             #2
    return null
  elsif size(points) == 1 then                                          #3
    return new KdNode(points[0], null, null, level)
  else
    (median, left, right) ← partition(points, level)                   #4
    leftTree ← constructKdTree(left, level + 1)                        #5
    rightTree ← constructKdTree(right, level + 1)                      #6
    return new KdNode(median, leftTree, rightTree, level)               #7

Listing 9.6 shows the algorithm to create a balanced k-d tree from a set of points. The principle is simple: the tree has to hold all the points in the set and, ideally, we would like the left and right subtrees of the root to have the same number of elements. To achieve that, we can find the median in the set of points with respect to the first coordinate of the points, and use it as pivot for the root, having half of the remaining points that will end up on the root’s left branch, and half on its right branch. But each of the branches of the root is a k-d tree itself, so it is possible to repeat the same step for the root’s left and right subtrees, with the caveat that we need to find the medians comparing the second coordinates of each point, instead. And so on and so forth for each level of the tree: we just need to keep track of the depth of the recursion, which tells us in what level of the tree we currently are.

The key point in listing 9.6 is the call to the partition method at line #4: we need to pass level as an argument because it will tell us which coordinate we need to use to compare points. The result of this call will be a tuple with the median of the points array and two new arrays with (n-1)/2 elements each, if size(points) == n. Each point in left will be “smaller” (with respect to the coordinate at index level % k) than median, and each point in right will be larger than median: we can therefore recursively construct both (balanced) subtrees using these two sets.

Notice that we can’t just sort the array once and chunk it up in halves recursively, because at each level the sorting criteria changes!
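Here is a compact Python sketch of this construction (our illustration). For simplicity it re-sorts each sub-array on the current coordinate instead of running a linear-time median selection, which makes it O(n*log(n)^2) rather than O(n*log(n)), but keeps the change of sorting criteria explicit:

class KdNode:                                     # minimal node, as sketched earlier
    def __init__(self, point, left, right, level):
        self.point, self.left, self.right, self.level = point, left, right, level

def construct_kd_tree(points, k, level=0):
    # Sketch of listing 9.6; sorting stands in for linear-time median selection.
    if not points:
        return None
    j = level % k                                 # the sorting criteria changes per level!
    points = sorted(points, key=lambda p: p[j])
    mid = len(points) // 2
    return KdNode(points[mid],
                  construct_kd_tree(points[:mid], k, level + 1),
                  construct_kd_tree(points[mid + 1:], k, level + 1),
                  level)

# Example from the text: builds a balanced 2-d tree out of 8 points.
root = construct_kd_tree([(0,5), (1,-1), (-1,6), (-0.5,0),
                          (2,5), (2.5,3), (-1,1), (-1.5,-2)], k=2)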

To understand how it works, let’s consider a call to constructKdTree([(0,5),(1,-1),(-1,6),(-0.5,0),(2,5),(2.5,3),(-1,1),(-1.5,-2)]). The median for this set (with respect to the first coordinate, i.e. the median of all the first values of the tuples) is either -0.5 or 0: there is an even number of elements, so there are technically 2 medians – you can double check the values by sorting the array. Say we choose -0.5 as the median, then we have:

(median, left, right) ← ((-0.5,0), [(-1,1),(-1.5,-2),(-1,6)], [(1,-1),(2.5,3),(2,5),(0,5)])

So, at line #5 we call constructKdTree([(-1,1),(-1.5,-2),(-1,6)], 1) to create the root’s left subtree. This in turn will partition again the sub-array, but comparing the second coordinates of each tuple: the median of the y coordinates is 1, so we have:

(median, left, right) ← ((-1,1), [(-1.5,-2)], [(-1,6)])

Xnh cv en, rxy mtedoh luodw iaysilmrl tng nx rvb toehr iaiptstron eretcda vn qkr ntaiiil ryara.

What’s the running time of method constructKdTree? We will use Tk(n) to denote the running time, for a k-d tree of dimension k, on an array with n elements; let’s check the method step by step: lines #1 to #3 only require a constant amount of time, as well as line #7, which is just creating a new node. Lines #5 and #6 are recursive calls, and they will be called on sets of points with at most n/2 elements, so we know they will take Tk(n/2) steps each.

Finally line #4, where we call partition: it’s possible to find a median in linear time, and we can as well partition an array of n elements around a pivot with O(n) swaps (or create two new arrays, also with O(n) total assignments).

So, summing up, we have this formula for the running time:

Tk(n) = 2 * Tk(n/2) + O(n)

There are a few ways to solve this equation, for example the substitution method or telescoping, but the easiest is probably using the master theorem[44]. All of these methods are out of the scope of this book, so we will just provide you with the solution, leaving to the curious reader the task of working out the details:

Tk(n) = O(n * log(n))

In other words, the balanced construction method takes linearithmic[45] time.
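For readers who want to verify the master theorem step themselves, this is the instance being solved (a sketch of the computation, in LaTeX notation):

  T_k(n) = 2\,T_k(n/2) + O(n), \qquad a = 2,\quad b = 2,\quad f(n) = \Theta(n)

  n^{\log_b a} = n^{\log_2 2} = n = \Theta(f(n)) \;\Rightarrow\; \text{(case 2)} \;\Rightarrow\; T_k(n) = \Theta(n \log n)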

To complete our analysis, if we look at the extra memory needed by this method, it will of course require O(n) memory for the tree. There is, however, more: if we don’t partition the array in-place, and create a copy for each left and right sub-array, then each call to partition would use O(n) extra memory, and deriving a formula similar to the one for the running time, we could find out that also Mk(n) = O(n * log(n)).

Conversely, by partitioning the array in place, we can obtain the best possible result:

Mk(n) = O(n)

That’s because we would need only a constant amount of memory for the internals of the function, plus the O(n) memory needed for the tree.

9.3.4   Remove

After search and insert, the natural next step is the third basic operation on a container, remove. This is despite the fact that on a k-d tree, delete is not such a common operation, and some implementations don’t even offer this method: as we discussed in the previous section, k-d trees are not self-balancing, and so they perform best when they are created with a static set of points, avoiding frequent insertions and removals.

Figure 9.13 Deleting point D from our example k-d tree. Method remove, like for BSTs, consists of a (successful) search to find the node to remove, followed, in case the node to remove is an internal node with at least one subtree, by a traversal to find an element with which it can be replaced. In this example, an internal node is removed.

Nevertheless, in any real-world application you’ll likely need to be able to update your dataset, so we are going to describe how to remove elements: figures 9.13 and 9.14 show the remove method in action on our example k-d tree, and the result of removing point D from it. Similarly to insert and search, this method is based on binary search tree removal. There are, however, two issues that make the k-d tree’s version considerably more complicated, and they are both connected to how we find a replacement in the tree for the node we are going to remove.

To see what these issues are, we need to take a step back. In binary search trees, when we delete a node, we can face one of three situations (see figure 9.15):

Figure 9.14 The k-d tree resulting after deleting point D.

A.  The node we need to remove is a leaf: in this situation, we can just safely remove the node from the tree, and we are done.

B.  The node N to-be-removed has only one child. Here simply removing the node would disconnect the tree, but we can instead bypass it by connecting N’s parent to its child (independently of it being in the left or right subtree of N). This won’t violate any invariant for the BST: for instance, in the case shown in figure 9.15B, N is a left child of its parent P, so it’s smaller or equal than P, but likewise, all elements in the subtree rooted at N will be smaller or equal than P, including N’s child L.

C.  If the node N that we need to remove has both children, we can’t just replace it with one of its children (for instance, if we were to replace the root in figure 9.15C with its right child R, then we would have to merge R’s left child with N’s, and that would require worst-case linear time, and also be a waste). Instead, we can find the successor[46] of N: by construction, it will be the minimum node in its right subtree, let’s call it M, which in turn means the leftmost node of its right subtree. Once we have found it, we can delete M and replace N’s value with M’s value. No invariant will be violated, because by definition M will be no smaller than any node in N’s left branch (because it was no smaller than N, which was no smaller than any node in its left subtree), and no bigger than any node in N’s right branch (because it was its minimum).

Moreover, M will certainly fall in either case (A) or case (B): being the left-most node, it won’t have a left child. This means that deleting M will be easy and recursion stops at M.

Figure 9.15 Possible cases when deleting a node from a binary search tree.
(A)         Deleting a leaf.
(B)         Deleting a node with a single child. (Works symmetrically if the child is in the right branch).
(C)         Deleting a node with both children.

When we move from regular BSTs to k-d trees, the first difference is caused by the fact that at each level we only use a single coordinate to partition points in the two branches. If we have to replace a node N at level i, at that level we are using coordinate j = i mod k, so we know its successor (for coordinate j) will be in N’s right subtree. However, when we move to N’s child, that node will use another coordinate, j1 = (i+1) mod k, to partition its children: as you can see in figure 9.16, that means that the successor of N doesn’t have to be in its child’s left subtree.

That’s bad news, because it means that while for BSTs we could quickly traverse N’s right subtree to its leftmost node, in order to find N’s successor, now we can only do that for levels l where l mod k == i mod k. In all the other levels, we will have to traverse both subtrees to look for the minimum.

Figure 9.16 An example of how the successor of a node, or more in general the minimum of a subtree with respect to a certain coordinate, can lie anywhere in the subtree. The results of a few calls to findMin on the whole tree and on node B are explicitly shown.

Listing 9.7 shows the pseudocode for the findMin method: the first difference you can see with respect to the BST’s version is that we need to pass the index of the coordinate for which we look for the minimum (for instance, we could call findMin(root, 2) to search the node in the whole tree with the minimum value for the 3rd coordinate – assuming k>=3).

This greater complexity in the findMin method unfortunately reflects on its running time: it can’t be logarithmic like for BSTs, because we are traversing all branches for all levels but the ones matching coordinateIndex, so in (k-1)/k cases.

And, in fact, it turns out that the running time for findMin is O(n^((k-1)/k)): if k==2, this means O(n^(1/2)) = O(√n), which is not as good as logarithmic, but still noticeably better than a full scan; as k grows, though, this value gets closer and closer to O(n).

The enhanced version of findMin above solves the issue with the coordinates; now you might hope that plugging it into the regular BST’s remove is enough, but that’s unfortunately not the case: there is another issue that complicates things a bit further.

If you go back to figure 9.15B, for BSTs there were two lucky cases for which deleting a node was easy: deleting a leaf, but also deleting a node with only one child.

For k-d trees, only leaves can be deleted easily: unfortunately, even if the node N to be deleted has only one child C, we can’t just replace N with its child, because this would change the direction of the splits for C and all of its subtree, as shown in figure 9.17.
Listing 9.7 The findMin method
function findMin(node, coordinateIndex)                                 #1
  if node == null then                                                  #2
    return null
  elsif node.level % k == coordinateIndex then                          #3
    if node.left == null then                                           #4
      return node
    else                                                                #5
      return findMin(node.left, coordinateIndex)
  else
    leftMin ← findMin(node.left, coordinateIndex)                      #6
    rightMin ← findMin(node.right, coordinateIndex)                    #7
    return min(node, leftMin, rightMin)                                 #8
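A Python sketch of the same method (ours, under the same assumptions as the earlier snippets) makes the asymmetry explicit: the left-branch-only shortcut is available just on the levels that split on the requested coordinate:

class KdNode:                                     # minimal node, as sketched earlier
    def __init__(self, point, left, right, level):
        self.point, self.left, self.right, self.level = point, left, right, level

def find_min(node, coordinate_index, k):
    # Minimum of a subtree with respect to a single coordinate (listing 9.7).
    if node is None:
        return None
    if node.level % k == coordinate_index:
        # This level splits on the requested coordinate: the minimum is
        # either this node or a node somewhere down its left branch.
        return find_min(node.left, coordinate_index, k) or node
    # Any other level: the minimum can hide in either branch, or be the node.
    candidates = [node,
                  find_min(node.left, coordinate_index, k),
                  find_min(node.right, coordinate_index, k)]
    return min((c for c in candidates if c is not None),
               key=lambda c: c.point[coordinate_index])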
 
Figure 9.17 An example of a k-d tree deletion for which replacing the deleted node with its only child wouldn’t work.
(A)        The initial tree, from which we remove node B. This node has only one child, so in a BST it would just be replaced by its child.
 (B)        However, in a k-d tree this might cause violating the k-d tree invariants, since moving a node one level up changes the coordinate on which the split is performed, when using it as a pivot.

And indeed, figure 9.18 shows an example where this becomes relevant. The problem is, when we break ties on insert and search by going right, we implicitly assume an invariant[47]: that for any internal node N, no node in its left branch will have the same value for the coordinate used to partition N’s subtree. In figure 9.18, this means that no node in the left branch of node B has a y coordinate equal to B’s.

If we replace N with the max of its left branch, however, it is possible that in N’s old left branch there was another node with the same y coordinate: in our example, that would be the case, since there are two nodes with the same, maximal value for y, node I and node H. By moving node I to replace B, we would therefore break search, because node H would never ever be found by the search method in listing 9.3.
Figure 9.18 An example showing why, when we delete a node with only a left child, we can’t replace current node with the minimum of the left branch. On the right we can see how H causes the 4th invariant of k-d trees to be violated, because it is on the left branch of node I, but has the same value for I’s split coordinate.
Luckily the solution is not too complicated: we can instead run findMin on the left branch, replace N’s point with the node M found by findMin, and set N’s old left branch as the right branch of this new node we are creating, like we show in figure 9.19. Then we just need to remove the old node M from that right branch: notice that, differently from what happened with binary search trees, we can make no assumptions on M here, so we might need to repeat these steps in the recursive call deleting M.

If we look at the running time for remove, the cost of the calls to findMin drives up the total cost, which thus can’t be logarithmic anymore (like it was for BSTs). To perform a more rigorous analysis, let’s again denote as Tk(n) the running time for this method, where k is the dimensionality of the points’ space and n is the number of nodes in the tree; when we look more closely at each conditional fork, assuming that we are working on a balanced tree, then:

  • Each of the conditionals at lines #13 and #14 would trigger a recursive call on approximately half the nodes, and so require time Tk(n/2).
  • If we enter the code blocks after the conditionals at lines #2 or #12, those only require constant time.
  • The conditionals at lines #4 and #8 both will run code blocks that require creating a new node, O(1), running findMin, O(n^(1-1/k)), and recursively calling remove.
Figure 9.19 Correct steps for deleting a node N with only a left child. In figure 9.18 we have seen that finding the max of the left branch won’t work. Instead, we need to find the min M, use it to replace the deleted node, and then set N’s old left branch as the new node’s right branch. Then we only need to remove M from that branch, which requires a new call to remove (as you can see, unfortunately we can make no assumptions on this call, it might cascade and require more recursive calls to remove).
The worst case here is the latter: we don’t know where the minimum node will be in its branch, it could be far down the tree, or just the root of the branch; thus, if we end up in a case similar to figure 9.17 (either with missing left or right branch, indifferently), we might have to call remove recursively on n-1 nodes, in the absolute worst case.

Hreevwo, vw mssudea cyrr xtp v-y rxxt zj cabledna. Gvtnb rjuz spatouinms, rxlf nsh hgtir nabrsceh hdlsuo dozv evmt et fccx rvq azmo emrnbu xl denos, unz trrheefoe lj ryo hrgit rbanch kl c kegn cj tepym, qrx ibialopbytr rsry lfxr bcnhra ucc 1 yxxn aj tlils dbjy, urcr rj gzs 2 nosed jc ydrelaa focz yilkle, pnz jr vxya wnye wjrp 3 nsoed s…xr, kc ursr txl s inceatr noatscnt, klt mlepxea 5, ow ssn dsa ursr nj s siounitta jfxx grk xvn jn griefu 9.17, rhewe z vxny cuc z elngis nhacrb, vnrq jr cj ghliyh ulenykil rdrs cabnhr cdz tmkx crnb c aocnnstt rmenbu el soend (dcs 5). Rgn ow acn ycrd msesua rrbc, jn c ncadeabl txrx, zsbp cn aledmbncia ossz ssn aheppn rs kmar s nstntoca mbuner vl miets grinud c ffas xr remove. Qt, ktxm elcypesri, nk z elgra brmuen kl ovmeaslr, wx szn meassu rsqr prx doietramz erac lv ugrnnni brv sxvy cslkbo nrgtiast cr islne #4 cbn #8 wloud kg Tx(n/2).

Listing 9.8 The remove method
function remove(node, point)                                            #1
  if node == null then                                                  #2
    return null
  elsif node.point == point then                                                #3
    if node.right != null then                                          #4
      minNode ← findMin(node.right, node.level)                        #5
      newRight ← remove(node.right, minNode.point)                     #6
      return new KdNode(minNode.point, node.left, newRight, node.level) #7
    elsif node.left != null then                                                #8
      minNode ← findMin(node.left, node.level)                         #9
      newRight ← remove(node.left, minNode.point)                      #10
      return new KdNode(minNode.point, null, newRight, node.level)      #11
    else
      return null                                                               #12
  elsif compare(point, node) < 0 then                                   #13
    node.left ← remove(node.left, point)
    return node
  else                                                          #14
    node.right ← remove(node.right, point)
    return node

Our recurrence would therefore become:

Tk(n) = Tk(n/2) + O(n^(1-1/k))

Using the master theorem’s third case, since 1-1/k > log2(1) = 0, and (n/2)^(1-1/k) ≤ c·n^(1-1/k) for a constant c < 1, we can then conclude that the amortized time required by remove on a balanced k-d tree is:

Tk(n) = O(n^(1-1/k))

In other words, remove’s running time is dominated by findMin; this also means that in a 2-D space the amortized running time for remove would be O(√n).

9.3.5   Nearest Neighbor

We are now ready to study the most interesting operation provided by k-d trees, the nearest neighbor (NN) search. First, we are going to restrict to the case where we search for the single closest point in the dataset with respect to a target point (which, in general, doesn’t have to be contained in the same dataset); later we will generalize this operation to return an arbitrary number m[48] of points such that no other point in the dataset is closer to the target.

Jn c rbetu-ofcer aseornic, wv lwuod oxsq rk mopreca ocay ptoni jn oyr tsdtaea kr xtp tgtear, emcpuot teihr itarelev dcsatnise, sbn vkde rcatk xl rdv mletssla vvn: etxyalc our usw ow erchsa elt zn tmenlee nj zn eousrndt rraay.

Figure 9.20 The first few steps of nearest neighbor search. The first phase of nearest neighbor search consists of a search on the tree, during which we keep track of the distance of each node (aka point) we traverse: more precisely, if we need to find N nearest neighbors for P, we need to keep track of the N smallest distances we find. In this case we are showing the query for N=1. Thus, if the search is successful, then we have for sure the nearest neighbor, at distance 0. Otherwise, when search ends, we are not yet sure we have found the actual nearest neighbor: in the example above, the point at the minimum distance we have found during traversal is D but, as you can see from the figure, we can’t be sure that another branch of the tree doesn’t have a point within the radius given by dist(D).

However, a k-d tree, much like a sorted array, has structural information about the relative distance and position of its elements, and we can leverage that information to perform our search more efficiently.

How does nearest neighbor search work? We start from the consideration that each tree node covers one of the rectangular regions in which the space was partitioned, as we have shown in figures 9.5 and 9.6. So first, we want to find which region contains our target point P: that’s a good starting point for our search, because likely the point stored in the leaf covering that region, G in this example, will be among the closest points to P. Can we be sure that G will be the closest point to P, though? That would have been ideal, but unfortunately that’s not the case. Figure 9.20 shows this first step in the algorithm, traversing a path in the tree from the root to a leaf, to find the smallest region containing P.
Listing 9.9 The nearestNeighbor method
function nearestNeighbor(node, target, (nnDist, nn)=(inf, null))        #1
  if node == null then                                                  #2
    return (nnDist, nn)
  else                                                          #3
    dist ← distance(node.point, target)                                #4
    if dist < nnDist then                                               #5
      (nnDist, nn) ← (dist, node.point)
    if compare(target, node) < 0 then                                   #6
      closeBranch ← node.left
      farBranch ← node.right
    else                                                                #7
      closeBranch ← node.right
      farBranch ← node.left
    (nnDist, nn) ← nearestNeighbor(closeBranch, target, (nnDist, nn)) #8
    if splitDistance(target, node) < nnDist then                        #9
      (nnDist, nn) ← nearestNeighbor(farBranch, target, (nnDist, nn))  #10
    return (nnDist, nn)                                                 #11
Figure 9.21 The second step, after an unsuccessful search, is to backtrack to check other branches of the tree for closer points. It is possible to have such points because when we traverse the tree, we cycle through the coordinates that we compare at each level, so we don’t always go in the direction of the closest point, but we are forced to go on one side of the pivot, depending only on a single coordinate: so it is possible that the nearest neighbor is on the wrong side of the pivot with respect to our target point P. In this example, when we reached D, since it creates a vertical split, we needed to move to the left, as shown in figure 9.20: unfortunately the target point P and its nearest neighbor lie on opposite sides of the split line for D.

So, once we reach the end of the path, we need to backtrack to check other branches.

But even that is not enough: after finding the region containing P, we can’t be sure that in neighboring regions there isn’t one or more points even closer than the closest point we have found during this first traversal. Figure 9.21 exemplifies this situation perfectly: so far, we have found that D is the closest point (among those visited) to P, so the real nearest neighbor can’t be at a distance larger than the one between D and P: we can thus trace a circle (a hyper-sphere, in higher dimensions) centered at P, and with radius equal to dist(D, P). If this circle intersects other partitions, then those regions might contain a point closer than D, and we need to get to them.

How do we know if a region intersects our current nearest neighbor’s hyper-sphere? That’s simple: each region stems from the partitioning created by split lines; when traversing a path, we go on one side of the split (the one on the same side as P), but if the distance between a split line and P is less than the distance to our current nearest neighbor, then the hyper-sphere intersects the other partition as well.

To make sure we visit all partitions still intersecting the NN hyper-sphere, we need to back-track our traversal of the tree; in our example, we go back to node D, then check the distance between P and the vertical split line passing through D (which, in turn, is just the difference of the x coordinates of the two points), and since this is smaller than the distance to D, our current NN, we need to visit the other branch of D as well. When we say visit here, we mean traversing the tree in a path from D to the closest leaf to P. While we do so, we visit node F and we discover that it’s closer than D, so we update our current NN (and its distance: you can see that we shrink the radius of our nearest neighbor perimeter, the circle marking the region where possible nearest neighbors can be found).

Are we done now? Not yet, we need to keep back-tracking to the root. We go back to node C, but its split line is further away than our NN perimeter (and it didn’t have a right branch anyway), so we go back to node A, as shown in figure 9.22.
Figure 9.22 We need to backtrack towards the root. If we had to check every possible branch of the tree, however, this would be no better than scanning the whole list of points. Instead, we can use all the info we have to prune the search: in the geometric representation on the right, we show a visual hint of why it would be useless to check A’s right sub-branch. If you check back figure 9.21, you can notice that, instead, we knew that we couldn’t rule out D’s right sub-branch without traversing it.
At node A we took the left branch during search, meaning P is on the left semi-plane. If we were to traverse the right subtree of A, all the points in that subtree would have their x coordinate greater than or equal to A’s. Therefore, the minimum distance between P and any point in the right sub-tree of A is at least the distance between P and its projection on the vertical line passing through A (i.e. A’s split line): in other words, the minimum distance for any point right of A will at least be the absolute value of the difference between P’s and A’s x coordinates.
We can therefore prune the search on A’s right branch, and since A was the root, we are finally done. Notice how the higher we climb back up the tree, the larger are the branches (and the regions) that we can prune – and so the larger the saving.
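
To make this prune test concrete, here is a minimal sketch in Python of the splitDistance check used above (the snake_case name mirrors the splitDistance call in the listings below; the node.point field and the explicit depth parameter are our own assumptions about the tree’s representation):

def split_distance(target, node, depth, k=2):
    # Distance between the target point and the split line (hyper-plane)
    # through this node: only the coordinate this level splits on matters.
    axis = depth % k
    return abs(target[axis] - node.point[axis])

# Prune test used while backtracking: the far branch can be skipped only
# when the split line is further away than the current NN distance, i.e.
# when split_distance(target, node, depth) >= nn_dist.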

We can generalize this method to find an arbitrarily large set of the closest points in the dataset, also known as the n-nearest-neighbor[49] search method.

The only differences are:

·    Instead of the distance of a single point, we need to keep track of the m shortest distances, if we want the m closest points;
·    At each step, we use the distance of the m-th closest point to draw our NN perimeter, and prune search;
·    To keep track of these m distances, we can use a bounded priority queue, similar to the one described in chapter 2, section 2.7.3, for a method finding the m largest values in a stream of numbers (a minimal sketch follows below).
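
As a concrete reference, here is a minimal Python sketch of such a bounded priority queue (our own illustration, not the book’s library code): a max-heap capped at n entries, whose top is always the furthest of the points kept so far.

import heapq

class BoundedPriorityQueue:
    """Keeps the n smallest (distance, point) pairs seen so far.

    Internally a max-heap (via negated distances), so peek() returns
    the largest of the stored distances in O(1).
    """
    def __init__(self, n):
        self.n = n
        self._heap = []          # entries are (-distance, point)

    def insert(self, entry):
        dist, point = entry
        heapq.heappush(self._heap, (-dist, point))
        if len(self._heap) > self.n:     # over capacity: drop the furthest entry
            heapq.heappop(self._heap)

    def peek(self):
        dist, point = self._heap[0]      # the furthest of the stored points
        return (-dist, point)

    def top(self):                       # extract (and remove) the furthest entry
        dist, point = heapq.heappop(self._heap)
        return (-dist, point)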
Listing 9.10 details the pseudo-code for the nNearestNeighbor method.
Listing 9.10 The nNearestNeighbor method
function nNearestNeighbor(node, target, n)                              # public method
  pq ← new BoundedPriorityQueue(n)                                     # holds the n closest points found so far
  pq.insert((inf, null))                                                # sentinel entry, so peek() is always defined
  pq ← nNearestNeighbor(node, target, pq)                              # delegate to the recursive version below
  (nnnDist, _) ← pq.peek()                                             # largest distance currently stored
  if nnnDist == inf then                                                # the tree had fewer than n points:
    pq.top()                                                            # extract and discard the sentinel
  return pq
 
function nNearestNeighbor(node, target, pq)                             # recursive version
  if node == null then                                                  # empty branch: nothing to add
    return pq
  else
    dist ← distance(node.point, target)
    pq.insert((dist, node.point))                                       # kept only if among the n closest so far
    if compare(target, node) < 0 then                                   # target is on the left of the split:
      closeBranch ← node.left                                          # explore the target’s side first
      farBranch ← node.right
    else                                                                # target is on the right of the split
      closeBranch ← node.right
      farBranch ← node.left
    pq ← nNearestNeighbor(closeBranch, target, pq)                     # search the closest branch first
    (nnnDist, _) ← pq.peek()                                           # distance of the n-th best match so far
    if splitDistance(target, node) < nnnDist then                       # the NN hyper-sphere crosses the split:
      pq ← nNearestNeighbor(farBranch, target, pq)                     # the far branch may hold closer points
    return pq
 

What’s the running time for nearest neighbor search? Let’s start with the bad news: in the worst-case scenario, even for a balanced tree, we might have to traverse the whole tree before finding a point’s nearest neighbor(s).

Figure 9.23 shows a couple of examples of such a degenerate case: while the second example is artificially constructed as a literal edge case, with all the points lying on a circle, the one in figure 9.23A, showing the same tree we used in our examples above, demonstrates how even on random, balanced, trees it is possible to find counter-examples where the method behaves poorly, by just carefully choosing the target point for the search.

Figure 9.23 Edge cases for nearest neighbor search, which require traversing the whole tree.
(A) An example built on the k-d tree from figures 9.20-9.22: by carefully choosing the target point, we can force the algorithm to search the whole tree – as shown on the tree representation.
(B) We show the spatial representation only of an edge case where all the points lie in a circle, and we choose the center of the circle as target of the search. Notice how the distance from P to a split line will always be shorter than the radius of the circle (which, in turn, is the distance to the nearest neighbor).
So, unfortunately, there isn’t much we can do about it: the worst-case running time for this kind of query is O(n).

That’s it for the bad news; luckily, there is also a silver lining…

Turns out that the average running time for nearest neighbor search on a balanced k-d tree is O(2^k + log(n)). The proof for this probabilistic bound is particularly complex and would require too much space to properly cover it here – you can find it in the original paper by Jon Bentley, where k-d trees were first introduced.

Nevertheless, to give you an intuition of why it works this way, consider a two-dimensional space: one point divides it into two halves, two points into three regions, three points create four regions, and so on; in general, n points will create n+1 regions; if the tree is balanced and n is sufficiently big, we can assume these regions are approximately equally-sized.

Now suppose the dataset covers a unitary area[50]. When we run a nearest neighbor search we first traverse the tree from the root to the closest leaf[51], and this, for a balanced tree, means traversing O(log(n)) nodes. Since we hypothesized that each region is approximately of the same size, and we have n+1 of them, the area covered by the closest leaf will also be a rectangular region of area approximately equal to 1/n. That means there is a reasonably high probability that the distance between the target point and the nearest neighbor we have found during this traversal is no larger than half the diagonal of the region, which in turn is smaller than the square root of the region’s area, or in other words, √(1/n).

Figure 9.24 A perfectly regular k-d tree partitioning with square cells. On the left, we show what can be considered the average case, with the tree’s leaf point at the center of the region: then the furthest the target could be, inside the region, is at half the distance of the square’s diagonal. If we draw the circle circumscribed to the square, it intersects just the 4 regions adjacent to the sides of the current one.
On the right, an example of another, more generic and less optimistic case: even if the distance is larger than the average, the hyper-sphere centered at the target node only intersects 4 other regions. Of course, in the worst-case scenario, it might intersect up to 8 other regions.

The next step in the algorithm is back-tracking to visit all the regions that are within that distance from the target point. If all the regions are equally sized and regularly shaped, this means that any region within that distance must be one of the rectangles neighboring our leaf’s region, and from geometry we know that, in such a situation, with regular equally-sized rectangles that can be approximated to a rectangular grid, each rectangle has 8 neighbors. From figure 9.24, however, it is possible to see how, on average, even if potentially we would have to traverse at most 8 more branches, it’s likely we only have to check 4 of them, because only the ones adjacent to the current region’s sides will be within distance √(1/n). Hence, this makes the total average running time O(4*log(n)).

If we move to 3 dimensions, the possible neighbors for each cube are 26, but the minimum distance becomes ∛(1/n); with similar considerations we can infer that only fewer than 8 regions will be close enough to have points within the minimum distance found so far; likewise if we move to 4 dimensions, and so on…

So, in summary, after finding the region where the target point of our search lies, we need to examine another O(2^k) points, making the total running time O(2^k + log(n)). While k is a constant for a given dataset, and we could in theory ignore it in big-O analysis, in the general case k is considered a parameter, as we measure how the performance of this method changes with respect to both the size and the dimension of the k-d tree; it’s also apparent that, for large values of k, 2^k becomes so big that it dominates the running time: for the logarithmic term to prevail we would need

log(n) > 2^k  ⟺  n > 2^(2^k)

and in practice for k ≥ 7 there is already no chance to have a dataset big enough to satisfy the inequality above (it would require more than 2^128 points).

For n > 2^k, however, this method still has an advantage over brute-force search.
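
To get a feeling for these orders of magnitude, here is a quick back-of-the-envelope check (a sketch based on the average-case bound stated above) of when the k-d tree beats the n distance computations of brute-force:

import math

def kd_tree_pays_off(n, k):
    # Average-case cost of k-d tree NN-search, 2^k + log2(n),
    # versus the n distance computations of brute-force search.
    return 2**k + math.log2(n) < n

print(kd_tree_pays_off(10**6, 2))    # True: in 2-D the speed-up is huge
print(kd_tree_pays_off(10**6, 30))   # False: 2^30 already exceeds a million points
print(kd_tree_pays_off(10**10, 30))  # True, but only with ~10 billion points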

A final consideration: pruning heavily depends on the “quality” of the nearest neighbor we have found so far. The shorter the distance to the current nearest neighbor, the more branches we can prune, and in turn the higher the speed-up we can get. Therefore, it is important to update this value as soon as possible (we do this, in our code, for each node on the first time we visit it) and to traverse the most promising branches first. It is, of course, not easy to determine what the most promising branch is at any point; a good (although not perfect) indicator could be the distance between the target point and the split line of a branch: the closer the target, the larger the expected intersection between the nearest neighbor perimeter (the hyper-sphere within which it’s still possible to find a closer point to the target) and the region on the other side of the split line. That’s the reason why we traverse the tree using a depth-first search: we back-track on the smallest branches first, so, as mentioned, hopefully when we reach larger branches close to the top of the tree, we can prune them.

9.3.6   Region Search

While k-d trees were mainly designed to perform nearest neighbor search, they turned out to be particularly efficient for another kind of operation as well: querying the intersection of our dataset with a given region in the k-dimensional space.

In theory this region could be of any shape, but this operation becomes meaningful only if we can efficiently prune the branches we traverse during the query, and this heavily depends on the region’s morphology.

Jn aipctcer, eerth tvc wrv nmcj saecs kw ztx tieerdetsn jn:

  • Hyper-spherical regions, as shown in figure 9.25; the geometric interpretation is to query points within a certain distance from a certain point of interest.
  • Hyper-rectangular regions, cf. figure 9.26; here the geometric interpretation is to query points whose values are within certain ranges.

Mukn vw pfsx jwur cirshealp ensrogi, vw ctv nj c ralisim ntosuiita ca kry snertae roniehbg ehasrc: xw obno rx ceindlu fzf rqx tpoisn hinwti c rtcnaie dacstnie mtxl c vneig tnipo. Jr’c fvjv gomnefrrip s DQ-ascerh, ewehr wk rnvee uetadp rxg edicatsn xr rxg senerat roehgibn, nsu neiatds kl ipekgen ckrta lk kfnh vnx toinp (bor NN), kw hagtre fcf ipnost slcero rnsu rspr ndstiace.

In figure 9.25 you can see how we are going to prune branches: when we are at a split, we will certainly have to traverse the branch on the same side of the center of the search region P; for the other branch, we check the distance between P and its projection on the split line: if that distance is lower than or equal to the search region’s radius, it means that there is still an area of intersection between that branch and the search region, and so we need to traverse the branch; otherwise, we can prune it.
Figure 9.25 Region Search on a k-d tree: returns all the points in a k-d tree within a given hyper-sphere. This means looking for points within a given Euclidean distance from the sphere’s center. We start search from the root: it’s not within the sphere. The point is on A’s left branch, so we need to traverse it; but even if A is not within the sphere, the split line through it intersects the sphere, so there is a portion of the sphere intersecting the right branch of A as well (highlighted in the top-left figure).
For the following steps, we are showing in parallel the execution on all branches at a given level, for the sake of space (this is also a good hint that this process could be executed in parallel…).

Listing 9.11 shows the pseudo-code for this method: as you can see, it’s pretty similar to the regular NN-search.

Listing 9.11 The pointsInSphere method
function pointsInSphere(node, center, radius)
  if node == null then                                                  # empty branch: no points to return
    return []
  else
    points ← []                                                        # collects the points found in this subtree
    dist ← distance(node.point, center)
    if dist ≤ radius then                                              # the node’s point is inside the sphere
      points.insert(node.point)
    if compare(center, node) < 0 then                                   # the sphere’s center is on the left of the split:
      closeBranch ← node.left                                          # that side always intersects the sphere
      farBranch ← node.right
    else
      closeBranch ← node.right
      farBranch ← node.left
    points.insertAll(pointsInSphere(closeBranch, center, radius))
    if splitDistance(center, node) ≤ radius then                        # the sphere crosses the split line:
      points.insertAll(pointsInSphere(farBranch, center, radius))       # the far branch intersects the region too
    return points
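
For readers who prefer running code, here is a compact Python translation of listing 9.11 (a sketch under our own assumptions: the Node tuple, the build_kdtree constructor and the explicit depth parameter are illustrative, not part of the book’s codebase):

import math
from collections import namedtuple

Node = namedtuple("Node", "point left right")

def build_kdtree(points, depth=0, k=2):
    # Builds a balanced k-d tree by splitting on the median point,
    # cycling through the k coordinates level by level.
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build_kdtree(points[:mid], depth + 1, k),
                build_kdtree(points[mid + 1:], depth + 1, k))

def points_in_sphere(node, center, radius, depth=0, k=2):
    if node is None:
        return []
    points = []
    if math.dist(node.point, center) <= radius:   # node’s point is inside the sphere
        points.append(node.point)
    axis = depth % k
    close, far = ((node.left, node.right) if center[axis] < node.point[axis]
                  else (node.right, node.left))
    points += points_in_sphere(close, center, radius, depth + 1, k)
    if abs(center[axis] - node.point[axis]) <= radius:  # sphere crosses the split line
        points += points_in_sphere(far, center, radius, depth + 1, k)
    return points

tree = build_kdtree([(0, 5), (1, -1), (-0.5, 0), (2, -5), (3.5, 2)])
print(points_in_sphere(tree, (0, 0), 2))   # [(1, -1), (-0.5, 0)]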

The other region we are going to use for these queries is a rectangular shape. As you can imagine, the only difference with respect to the other method is the way we check if we have to prune a branch. If we assume that the rectangle is oriented along the cartesian axes used for splits, then pruning might even be considered easier, as shown in figure 9.26: suppose we are at a horizontal split; then we need to understand if the split line intersects the search region, in which case we traverse both branches, or otherwise whether it lies above or below the region, which tells us which branch we can prune. This can be checked by simply comparing the y coordinate of the current node’s point, call it N_y, with the top (R_t) and bottom (R_b) y coordinates of the rectangular region, as we can have three cases:

  • R_b ≤ N_y ≤ R_t: the split line crosses the search region, and we need to traverse both branches;
  • N_y > R_t: the region lies entirely below the split line, so we can prune the branch holding the points with larger y coordinates (the right branch, with the conventions used in our listings);
  • R_b > N_y: the region lies entirely above the split line, so we can prune the other branch (the left one).

And similarly for vertical splits, checking the x coordinates; likewise, this can be generalized to k-dimensional spaces, cycling through dimensions.
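
A possible implementation of these checks, generalized to whatever coordinate a node splits on, could look like the sketch below (these are plausible bodies for the intersectLeft/intersectRight helpers used by listing 9.12; the Range tuple and the depth parameter are our own assumptions):

from collections import namedtuple

Range = namedtuple("Range", "min max")   # one (min, max) interval per coordinate

def intersect_left(rectangle, node, depth, k=2):
    # The left branch holds points whose split coordinate is smaller than the
    # node’s: it can intersect the region only if the region starts at or
    # below the split line on that axis.
    axis = depth % k
    return rectangle[axis].min <= node.point[axis]

def intersect_right(rectangle, node, depth, k=2):
    # Symmetrically, the right branch (split coordinate greater than or equal
    # to the node’s) can intersect the region only if the region extends to
    # or past the split line.
    axis = depth % k
    return rectangle[axis].max >= node.point[axis]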
Figure 9.26 Region Search on a k-d tree: returns all the points in a k-d tree within a given hyper-rectangle. This means looking for points that, for each coordinate, satisfy two inequalities: each coordinate must be within a range; for instance, in this 2-D example, points’ x coordinates need to be between -2 and 3.5, and y coordinates between -4 and 2.

This method is particularly useful when we need to search values within simple boundaries for each feature in our data. For instance, if we have a dataset containing tenure and salary of employees, we might want to search all employees who worked for the company between 2 and 4 years and have a salary between 40K and 80K… and give them a raise[52]!
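
With the hypothetical Range representation sketched above, that query could translate into something like this (the employee data is made up for illustration):

# Each point is (tenure_in_years, salary): two independent range conditions.
employees = build_kdtree([(1, 35000), (3, 55000), (2.5, 70000), (6, 90000)])
query = [Range(2, 4), Range(40000, 80000)]   # 2-4 years, 40K-80K salary
# pointsInRectangle(employees, query) would return [(3, 55000), (2.5, 70000)]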

This search translates into a rectangle with boundaries parallel to the dataset feature axes, meaning that in our boundaries each feature is independent of any other feature. If, instead, we had conditions mixing more than one feature (e.g., salary lower than 15K for each year of tenure), then the boundaries of the search region would be segments of generic lines, not parallel to any axis: in that case the problem becomes harder to solve, and we might need something more complex, like the simplex algorithm[53], to search the points satisfying our range query.

What’s the performance of both region searches? Well, as you can imagine, it heavily depends on the regions we search: we span a full range between two edge cases:

·    For very small regions intersecting only a single branch corresponding to a leaf, we will prune every other branch, and just follow a path to the leaf: so the running time will be O(h), where h is the height of the tree – O(log(n)) for a balanced tree with n points.
·    When the region is large enough to intersect all points, then we will have to traverse the whole tree, and the running time will be O(n).
Listing 9.12 The pointsInRectangle method
function pointsInRectangle(node, rectangle)
  if node == null then                                                  # empty branch: no points to return
    return []
  else
    points ← []                                                        # collects the points found in this subtree
    if rectangle[i].min ≤ node.point[i] ≤ rectangle[i].max ∀ 0≤i<k then # the point is inside the rectangle on every coordinate
      points.insert(node.point)
    if intersectLeft(rectangle, node) then                              # the region crosses into the left branch’s half-space
      points.insertAll(pointsInRectangle(node.left, rectangle))
    if intersectRight(rectangle, node) then                             # …and/or into the right branch’s half-space
      points.insertAll(pointsInRectangle(node.right, rectangle))
    return points

9.3.7   A Recap of all Methods

As we have seen, k-d trees provide a speed-up over brute-force search on the whole dataset: while the worst-case running time for nearest neighbor search (and removal) is still linear (exactly as for brute-force), in practice the amortized performance on balanced k-d trees is slightly-to-consistently better. The improvement is higher in low-dimensional spaces, and still consistent in medium-dimensional spaces.

In high-dimensional spaces, the exponential term on k for nearest neighbor search becomes dominant, and makes it not worth supporting the extra complexity of such a data structure.
Table 9.1 Operations provided by k-d trees, and their cost on a balanced k-d tree with n elements

Operation          Running Time         Extra Space
Search             O(log(n))            O(1)
Insert             O(log(n))            O(1)
Remove             O(n^(1-1/k))*        O(1)
findMin            O(n^(1-1/k))*        O(1)
nearestNeighbor    O(2^k + log(n))*     O(m)§
pointsInRegion     O(n)                 O(n)

(*) Amortized, for a k-d tree holding k-dimensional points.
(§) When searching for the m nearest neighbors, with m constant (not a function of n).

9.4   Limits and Possible Improvements

If we look back at our “find the closest hub” problem, we had started this chapter with basically nothing better than brute-force search to find the nearest neighbor of a point in a multi-dimensional dataset; then, going through a few less-than-ideal attempts, we built up our understanding of the traps and challenges in such a task, and finally introduced k-d trees.

K-d trees are great, and they offer a major speed-up over linear search; yet, they still have potential issues:

  • K-d trees might be hard to implement efficiently;
  • K-d trees are not self-balancing, so they perform best when they are constructed from a stable set of points, and the number of inserts and removes is limited with respect to the total number of elements. Unfortunately, static datasets are not the norm in the big-data era.
  • When we deal with high-dimensional spaces, k-d trees become inefficient: as we have seen, the time needed for removal and nearest neighbor search is exponential in k, the dimension of the dataset, and for sufficiently large values of k this hinders any performance benefit over brute-force search; we have seen that nearest neighbor search performs better than naïve search only if n > 2^k, so starting at k ≈ 30 we would need an ideal dataset (one with a regular distribution of points) with billions of elements in order for a k-d tree to outperform brute-force.
  • K-d trees don’t work well with paged memory; they are not memory locally-efficient, as points are stored in tree nodes, so close-by points won’t lie in close-by memory areas.
  • While they handle points well, k-d trees can’t handle non-punctiform objects, like shapes or any object with non-zero measure.

The latter point stems from the fact that in high-dimensional datasets data becomes very sparse; at the same time, when we traverse a k-d tree during NN-search, we can only prune a branch when it doesn’t intersect the hyper-sphere centered at the target point and with radius equal to the minimum distance found so far: and it is highly likely that this sphere will intersect many of the branches’ hyper-cubes in at least one dimension.

To overcome these limitations, we could try a few approaches:

  • We could use different criteria to decide where to partition the k-dimensional space, using different heuristics:
  • Don’t use splitting lines passing through points, but just divide a region into two balanced halves (either with respect to the number of points or to the sub-regions’ size: basically, choose the mean instead of the median);
  • Instead of cycling through dimensions, choose at every step the dimension with the greatest spread or variance, and store the choice made in each tree node.
  • Instead of storing points in nodes, each node could describe a region of space and link (directly or indirectly) to an array containing the actual elements.
  • We could approximate nearest neighbor search: for instance, we could use locality sensitive hashing (see the sketch after this list).
  • Or, we could find new ways to partition the k-dimensional space, ideally trying to reduce sparsity.
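
As a taste of the approximate approach mentioned in the list above, here is a minimal sketch of random-hyperplane locality sensitive hashing (entirely illustrative: real LSH schemes use several hash tables and carefully tuned parameters). Points whose bit signatures collide land in the same bucket, so candidate neighbors are searched only within the target’s bucket.

import random
from collections import defaultdict

def random_hyperplanes(k, n_planes, seed=42):
    # n_planes random directions in k-dimensional space.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n_planes)]

def signature(point, planes):
    # One bit per hyperplane: which side of it the point falls on.
    # Nearby points are likely to fall on the same side of most planes.
    return tuple(sum(p * w for p, w in zip(point, plane)) >= 0
                 for plane in planes)

planes = random_hyperplanes(k=2, n_planes=4)
buckets = defaultdict(list)
for point in [(1, 2), (1.1, 2.1), (-5, -3), (4, -1)]:
    buckets[signature(point, planes)].append(point)

# Approximate NN search: compare the target only against its own bucket.
target = (1.05, 2.0)
candidates = buckets[signature(target, planes)]   # likely [(1, 2), (1.1, 2.1)]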

Heuristics help on some datasets, but in general they won’t solve the issue with higher dimensional spaces.

The approximate approach doesn’t produce an exact solution, but there are many cases where we can settle for a sub-optimal result, or where we can’t even define a perfect metric (think, for example, of retrieving the closest document to an article, or the closest item to something you wanted to buy but was out of stock). We won’t go down this path for now.

In the next chapter, instead, we will delve into the latter approach, with SS-trees.

We will also defer to chapter 11 the discussion about applications of nearest neighbor search.


9.5   Summary

In this chapter we have moved from unidimensional datasets to multidimensional, spatial data. A few of the most relevant points tackled in this chapter include:

  • When the number of dimensions of a dataset grows, this usually brings an exponential increase in either the complexity or the memory needed.
  • We need to carefully design our data structures to avoid or limit this exponential burst, but we can’t remove it altogether: when the number of dimensions is large, it’s hard, if even possible, to maintain good performance.
  • K-d trees are an advanced data structure that helps perform spatial queries (nearest neighbor search and intersections with spherical or rectangular regions) more efficiently.
  • K-d trees are great with low- and medium-dimensional spaces, but suffer from the sparsity of high-dimensional spaces.
  • K-d trees work better on static datasets, because we can build balanced trees on construction, but insert and remove are not self-balancing operations.

[39] Similarity Search trees.

[40] “Multidimensional binary search trees used for associative searching”. Communications of the ACM.

[41] Here we choose points arbitrarily, to obtain a clearer visualization. In section 9.3.3 we will explain how to make this choice programmatically to obtain a balanced tree.

[42] Don’t Repeat Yourself: in this case we have some code duplication, which makes the code slightly less maintainable.

[43] Immutability of data structures is a key point of functional programming. It has several advantages, from being intrinsically thread-safe to easier debugging.

[45] n*log(n) is often referred to as linearithmic, as a crasis of linear and logarithmic.

[46] Note that we could also use N’s predecessor, in a symmetrical way.

[47] This is implied by invariant 4, as described in section 9.2.2.

[48] This method is usually denoted as k-nearest-neighbor search, but IMHO the use of k here could cause confusion with the dimension of the tree, hence we will just use m or n to indicate the number of points we look for.

[49] As mentioned, this is also referred to as k-nearest-neighbor in the literature.

[50] Since we can define an ad-hoc unit measure for the area, it is always possible to imagine so.

[51] The one leaf that is closest to the target point, as shown in listing 9.9.

[52] In a perfect world…

[53] The simplex algorithm is an ingenious optimization method. It is not related to nor helped by k-d trees and, as such, is out of scope for this chapter, but it’s an interesting read; you can learn more, for example, here: https://en.wikipedia.org/wiki/Simplex_algorithm
