concept geofilt in category solr

Solr in Action

This is an excerpt from Manning's book Solr in Action.

The higher the cost for a filter, the later it will execute. In the example, the category:technology filter is executed first because it has the lowest cost. It makes sense for this filter to have the lowest cost because it can be executed quickly and is likely to greatly reduce the number of documents by restricting them to a single category. The next filter (with a cost of 2) limits all results to those with a date in the last year. The next most expensive filter is the geofilt operation, which must calculate distances and limit results to those found within a 50-mile radius (an expensive operation). The final filter is both expensive (due to the math) and highly variable (because it accepts keywords), so it has both cost=100 and cache=false specified. It may seem odd that we would jump from a cost of 3 to a cost of 100. Although the costs do not have to be sequential (they’re applied in relative order to each other), a cost greater than or equal to 100 invokes a special feature in Solr called post filtering.
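The ordering rule can be modeled in a few lines of Python. This is only a toy sketch of the behavior described above, not Solr's implementation: the filter names are descriptive labels rather than real filter queries, the cost of 1 for the category filter is an assumption (the text says only that it is the lowest), and the sketch treats a filter as a post-filter candidate when it has both cost >= 100 and cache=false, matching the combination used in the example.

```python
POST_FILTER_COST = 100  # threshold named in the text above


def execution_plan(filters):
    """Order filters by ascending cost and flag post-filter candidates."""
    ordered = sorted(filters, key=lambda f: f["cost"])
    return [
        (f["name"], f["cost"], f["cost"] >= POST_FILTER_COST and not f["cache"])
        for f in ordered
    ]


# Hypothetical stand-ins for the four filters discussed in the text.
filters = [
    {"name": "keyword math filter", "cost": 100, "cache": False},
    {"name": "geofilt", "cost": 3, "cache": True},
    {"name": "category:technology", "cost": 1, "cache": True},  # cost assumed
    {"name": "date in last year", "cost": 2, "cache": True},
]

for name, cost, post in execution_plan(filters):
    print(f"cost={cost:<3} post_filter={post}  {name}")
```

Only the relative order of the costs matters here, which is why the gap between 3 and 100 changes nothing except the post-filter flag.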

Such a request is wasteful, however, as it requires you to repeat the same input parameters multiple times. Thankfully, the geofilt, bbox, and geodist implementations are all able to pull their parameters from dereferenced query string values on the request. As such, you can simplify the previous query to the following:
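The simplified request itself is not reproduced in this excerpt, so the following Python sketch only illustrates the idea: pt, sfield, and d are given once as top-level request parameters, and geofilt and geodist() pick them up from there. The field name location, the match-all main query, and the distance pseudo-field in fl are assumptions made for illustration; the point and radius are borrowed from the examples later in this section.

```python
from urllib.parse import urlencode, parse_qs


def dereferenced_geo_params(pt, sfield, d):
    """Build Solr request parameters that state pt, sfield, and d only once."""
    return {
        "q": "*:*",                      # assumed match-all main query
        "fq": "{!geofilt}",              # no local params: read from the request
        "sort": "geodist() asc",         # geodist() reads the same pt and sfield
        "fl": "id,distance:geodist()",
        "pt": pt,
        "sfield": sfield,
        "d": str(d),
    }


params = dereferenced_geo_params("37.775,-122.419", "location", 5)
query_string = urlencode(params)
print(query_string)
```

Because the spatial inputs live in one place, changing the search point or radius means editing a single parameter rather than every function that uses it.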

You saw in section 15.2.1 how to perform radius and bounding box searches using a LatLonType field and Solr’s simple, single-point geospatial implementation. You can perform the same type of search on SpatialRecursivePrefixTreeFieldType fields by using the same geofilt and bbox query parsers:
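As a rough illustration, the two filters differ only in the parser name. The helper spatial_filter below is hypothetical, and the field name location_rpt, point, and distance are taken from the later examples in this section rather than from the original listing, which is not part of this excerpt.

```python
def spatial_filter(parser, pt, sfield, d):
    """Format a geofilt or bbox local-params filter for use in fq."""
    if parser not in ("geofilt", "bbox"):
        raise ValueError(f"unsupported spatial parser: {parser}")
    return f"{{!{parser} pt={pt} sfield={sfield} d={d}}}"


radius_fq = spatial_filter("geofilt", "37.775,-122.419", "location_rpt", 5)
box_fq = spatial_filter("bbox", "37.775,-122.419", "location_rpt", 5)
print(radius_fq)
print(box_fq)
```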

Sorting on distance with geofilt

It’s possible to simulate the geodist() calculation by having the distance returned from an applied geofilt operation. If you recall from figure 15.2, the geofilt query parser contains a performance optimization in that it first applies a bounding box filter and then calculates the distance for the remaining points inside that box. Because the geofilt query parser calculates the distance between the point in the query and every document under consideration, it’s possible to have that distance returned as the score for a document:

http://localhost:8983/solr/geospatial/select?
  sort=score asc&
  q={!geofilt pt=37.775,-122.419 sfield=location_rpt d=5 score=distance}

The score=distance parameter in the geofilt operation causes geofilt to internally return the distance between the point in the query and the (closest) point in the geo field for each document. Because this geofilt is part of the main query (q parameter), the score for the document will be equal to the distance calculated for the document. In addition to score=distance, you can also request score=recipdistance, which will return the reciprocal of the distance such that closer documents rank higher. If you do not pass in a score parameter, the distance is not necessarily calculated (for SpatialRecursivePrefixTreeFieldType fields), so every document receives the same constant score for the requested geofilt.
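To make the mechanics concrete, here is a conceptual Python sketch of what a geofilt with score=distance does: a cheap bounding box check first, then an exact distance for the survivors, with that distance used as the score. It is not Solr's implementation. The haversine formula, the Earth radius constant, the sample documents, and the apply_filter flag are assumptions of the sketch, and it ignores the poles and the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geofilt_sketch(docs, pt, d_km, apply_filter=True):
    """Return (doc_id, distance_km) pairs sorted by distance ascending."""
    lat, lon = pt
    r = d_km / EARTH_RADIUS_KM  # angular radius in radians
    box_lat = math.degrees(r)
    box_lon = math.degrees(
        math.asin(min(1.0, math.sin(r) / math.cos(math.radians(lat)))))
    scored = []
    for doc_id, (doc_lat, doc_lon) in docs.items():
        # Step 1: cheap bounding box rejection.
        if apply_filter and (abs(doc_lat - lat) > box_lat
                             or abs(doc_lon - lon) > box_lon):
            continue
        # Step 2: exact distance, which doubles as the score.
        dist = haversine_km(lat, lon, doc_lat, doc_lon)
        if apply_filter and dist > d_km:
            continue
        scored.append((doc_id, dist))
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical documents: id -> (latitude, longitude).
docs = {
    "c": (37.8044, -122.2712),
    "a": (37.775, -122.419),
    "b": (37.80, -122.41),
}

print(geofilt_sketch(docs, (37.775, -122.419), 5))
print(geofilt_sketch(docs, (37.775, -122.419), 5, apply_filter=False))
```

The first call keeps only the documents inside the 5 km radius. The second scores every document, which is the behavior you get by disabling filtering on the query parser.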

If you want the score calculated for all documents (even those falling outside of the geofilt), you can also turn filtering off for the geofilt query parser:

http://localhost:8983/solr/geospatial/select?
  sort=score asc&
  q={!geofilt pt=37.775,-122.419 sfield=location_rpt d=5
      score=distance filter=false}

By turning filtering off on the geofilt (yes, that does sound bizarre, but it works), you can effectively cause geofilt to behave similarly to the geodist function, in that it returns the distance calculation, but does not apply a filter. Unfortunately, because the geofilt is not a function query, if you want to use it in a function for purposes of sorting separately from the score of the main query, you will have to wrap the geofilt query in a query function as follows:

http://localhost:8983/solr/geospatial/select?
  sort=$distance asc&
  fl=id,distance:$distance&
  q=*:*&
  distance=query($distFilter)&
  distFilter={!geofilt pt=37.775,-122.419 sfield=location_rpt d=5
      score=distance filter=true}

This request passes the geofilt into the query function, which ultimately returns the score of the distFilter operation (in this case, the shortest distance from the point specified in the geofilt). This dereferenced distance parameter can then be passed to the sort and fl parameters and used just as the geodist() function would be with a LatLonType field. As you can see, there are many options for calculating the distance of the value(s) in a field from a given point.
