Advanced Filter Caching in Solr

10 02 2012

Note: my blog has moved. Please see: Advanced Filter Caching in Solr

Ranges over Functions in Solr 1.4

6 07 2009

Solr 1.4 contains a new feature that allows range queries or range filters over arbitrary functions. It’s implemented as a standard Solr QParser plugin, so it can be used anywhere the standard Solr query syntax is accepted by specifying the frange query type. Here’s an example of a filter that sets the lower (l) and upper (u) bounds for a function:

fq={!frange l=0 u=2.2}log(sum(user_ranking,editor_ranking))
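When the filter is sent from client code, the braces, spaces, and parentheses have to be URL-encoded. The following is a minimal sketch of how that might look in Python; the helper name frange_filter and the match-all q=*:* parameter are assumptions made for illustration, not part of Solr or of the original example.

```python
from urllib.parse import urlencode


def frange_filter(function, lower, upper):
    """Return an fq value of the form {!frange l=... u=...}function.

    Hypothetical helper for illustration; it only formats the string.
    """
    return "{!frange l=%s u=%s}%s" % (lower, upper, function)


# Same filter as in the example above.
fq = frange_filter("log(sum(user_ranking,editor_ranking))", 0, 2.2)

# urlencode percent-escapes the braces, '!', '=', ',' and parentheses,
# and turns spaces into '+', so the value survives as a query string.
# The q=*:* (match all documents) parameter is an assumption.
query_string = urlencode({"q": "*:*", "fq": fq})
print(query_string)
```

The resulting string would be appended to the select URL of whatever Solr instance is being queried.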

Another interesting use for frange is to trade memory for speed when doing range queries on any type of single-valued field. For example, frange can be used on a string field, provided that each document has only one value for the field and that numeric functions are avoided.

For example, here is a filter that only allows authors between martin and rowling, specified using a standard range query:
fq=author_last_name:[martin TO rowling]

And the same filter using a function range query (frange):
fq={!frange l=martin u=rowling}author_last_name

This can lead to significant performance improvements for range queries with many terms between the endpoints, at the cost of the memory needed to hold the un-inverted form of the field (i.e. a FieldCache entry, the same as would be used for sorting). If the field in question is already being used for sorting or other function queries, there won’t be any additional memory overhead.
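To make the trade-off concrete, here is a toy model in Python of the two strategies. It is a simplified sketch and not how Solr or Lucene are actually implemented: the sample documents, the dictionary-based inverted index, and the function names are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy corpus: exactly one author_last_name per document (single-valued).
docs = ["adams", "martin", "pratchett", "rowling", "tolkien", "martin", "novik"]

# Inverted form: each term maps to the list of document ids containing it.
postings = {}
for doc_id, term in enumerate(docs):
    postings.setdefault(term, []).append(doc_id)
terms = sorted(postings)


def range_query(lower, upper):
    """Visit every term in [lower, upper] and union its postings.

    The work grows with the number of terms between the endpoints.
    """
    lo = bisect_left(terms, lower)
    hi = bisect_right(terms, upper)
    matched = set()
    for term in terms[lo:hi]:
        matched.update(postings[term])
    return matched


# Un-inverted form: an array indexed by document id, held in memory.
uninverted = list(docs)


def function_range_query(lower, upper):
    """Check each document's value directly against the bounds.

    The work is one comparison per document, however many terms match.
    """
    return {
        doc_id
        for doc_id, value in enumerate(uninverted)
        if lower <= value <= upper
    }


print(sorted(range_query("martin", "rowling")))
print(sorted(function_range_query("martin", "rowling")))
```

Both functions return the same documents. In this model the first one slows down as more terms fall inside the range, while the second does a fixed amount of work per document, which is one way to see why a wide range favors frange and a narrow one favors the standard range query.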

The following table shows the results of a test of frange queries vs. standard range queries on a string field with 200,000 unique values. For example, frange was about 14 times faster when executing a range query / range filter that covered 20% of the terms in the field. For narrower ranges that matched less than 5% of the values, the traditional range query performed better.

Percent of terms covered   Fastest implementation   Speedup (how many times faster)
100%                       frange                   43.32
20%                        frange                   14.25
10%                        frange                   8.07
5%                         frange                   1.337
1%                         normal range query       3.59

Of course, Solr 1.4 also contains the new TrieRange functionality that will generally have the best time/space profile for range queries over numeric fields.

