<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Solr 'n Stuff</title>
	<atom:link href="http://yonik.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://yonik.wordpress.com</link>
	<description></description>
	<lastBuildDate>Thu, 15 Sep 2011 19:42:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='yonik.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Solr 'n Stuff</title>
		<link>http://yonik.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://yonik.wordpress.com/osd.xml" title="Solr &#039;n Stuff" />
	<atom:link rel='hub' href='http://yonik.wordpress.com/?pushpress=hub'/>
		<item>
		<title>MurmurHash3 for Java</title>
		<link>http://yonik.wordpress.com/2011/09/15/murmurhash3-for-java/</link>
		<comments>http://yonik.wordpress.com/2011/09/15/murmurhash3-for-java/#comments</comments>
		<pubDate>Thu, 15 Sep 2011 19:42:31 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[hash]]></category>
		<category><![CDATA[hashcode]]></category>
		<category><![CDATA[MurmurHash3]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=152</guid>
		<description><![CDATA[Background I needed a really good hash function for the distributed indexing we&#8217;re implementing for Solr. Since it will be used for partitioning documents, it needed to be really high quality (well distributed) since we don&#8217;t want uneven shards. It also needs to be cross-platform, so a client could calculate this hash value themselves if [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=yonik.wordpress.com&amp;blog=1995971&amp;post=152&amp;subd=yonik&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<h2>Background</h2>
<p>I needed a really good hash function for the distributed indexing we&#8217;re implementing for Solr. Because it will be used for partitioning documents, it needs to be of really high quality (well distributed), as we don&#8217;t want uneven shards. It also needs to be cross-platform, so that clients can calculate the hash value themselves if desired, to predict which node has a given document.</p>
<h2>MurmurHash3</h2>
<p>MurmurHash3 is one of the most popular new hash functions these days, being both really fast and of high quality. Unfortunately it&#8217;s written in C++, and a quick Google search did not yield any suitable high-quality port. So I took 15 minutes (it&#8217;s small!) to port the 32-bit version, since it should be faster than the other versions for small keys like document ids.  It works in 32-bit chunks and produces a 32-bit hash &#8211; more than enough for partitioning documents by hash code.</p>
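<p>For readers who want to see what the 32-bit variant does, here is a minimal Python sketch of MurmurHash3 x86_32, written from the public reference algorithm rather than copied from the Java port. The function names, the default seed of 0, and the modulo-based <strong>pick_shard</strong> helper are assumptions for illustration only; nothing here is meant to describe how Solr actually maps a hash to a shard.</p>

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86_32 variant) of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK32
    length = len(data)
    body_end = length - (length % 4)

    # Body: mix one little-endian 32-bit chunk at a time.
    for i in range(0, body_end, 4):
        k1 = int.from_bytes(data[i:i + 4], "little")
        k1 = (k1 * c1) & MASK32
        k1 = rotl32(k1, 15)
        k1 = (k1 * c2) & MASK32
        h1 ^= k1
        h1 = rotl32(h1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k1 = int.from_bytes(tail, "little")
        k1 = (k1 * c1) & MASK32
        k1 = rotl32(k1, 15)
        k1 = (k1 * c2) & MASK32
        h1 ^= k1

    # Finalization: fold in the length and force the bits to avalanche.
    h1 ^= length
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK32
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK32
    h1 ^= h1 >> 16
    return h1


def pick_shard(doc_id: str, num_shards: int) -> int:
    """Hypothetical partitioning rule: hash the id, then take it modulo the shard count."""
    return murmurhash3_x86_32(doc_id.encode("utf-8")) % num_shards
```

<p>A client on any platform that implements the same function and the same (assumed) partitioning rule would compute the same shard number for a given document id, which is the point of choosing a well-specified hash.</p>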
<h2>MurmurHash3-java</h2>
<p>It would be nice to save others from having to do the same thing. Since code like this is small enough, I simply placed it in the public domain and uploaded it to GitHub.  This way anyone can just copy the file or the function into their project and avoid extra dependencies and license hassles.</p>
<p><a href="http://github.com/yonik/java_util/blob/master/src/util/hash/MurmurHash3.java">Here&#8217;s the code</a>, copy away!</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2011/09/15/murmurhash3-for-java/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr&#8217;s Realtime Get</title>
		<link>http://yonik.wordpress.com/2011/09/07/realtime-get/</link>
		<comments>http://yonik.wordpress.com/2011/09/07/realtime-get/#comments</comments>
		<pubDate>Wed, 07 Sep 2011 17:04:37 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[solr 4.0]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=141</guid>
		<description><![CDATA[Solr took another step toward increasing its NoSQL datastore capabilities, with the addition of realtime get. Background As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new Near Real Time softCommit) needs to be done before those changes [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=yonik.wordpress.com&amp;blog=1995971&amp;post=141&amp;subd=yonik&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Solr took another step toward increasing its NoSQL datastore capabilities, with the addition of <strong>realtime get</strong>.</p>
<h2>Background</h2>
<p>As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new <strong>Near Real Time softCommit</strong>) needs to be done before those changes are visible. Even with Solr&#8217;s new NRT (Near Real Time) capabilities, it&#8217;s probably not advisable to reopen the searcher more than once a second. However, there are some use cases that require the absolute latest version of a document, as opposed to just a very recent version. This is where Solr&#8217;s new <strong>realtime get</strong> comes to the rescue: the latest version of a document can be retrieved <strong>without</strong> reopening the searcher and risking disruption to other normal search traffic.</p>
<h2>The Realtime-Get API</h2>
<p>The realtime get handler is registered at the <strong>/get</strong> URL. As an example, a request like<br />
<a href="http://localhost:8983/solr/get?id=SOLR1000&amp;fl=id,name&amp;wt=json"> http://localhost:8983/solr/get?id=SOLR1000&amp;fl=id,name&amp;wt=json</a><br />
returns a response like</p>
<pre>{"doc":{"id":"SOLR1000","name":"Solr, the Enterprise Search Server"}}</pre>
<p>Notice that the optional <strong>fl </strong>(<strong>f</strong>ield <strong>l</strong>ist) parameter works as normal, allowing you to select the fields you want returned.</p>
<p>There&#8217;s also a realtime get component that can be inserted into any request handler, including the standard request handler.</p>
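<p>As a rough illustration of calling the handler from a script, the following Python sketch builds the same kind of <strong>/get</strong> URL and pulls the document out of a response body. The helper names and the default base URL are assumptions for this example, and the sample response is simply the one shown above.</p>

```python
import json
from urllib.parse import urlencode


def realtime_get_url(doc_id, fields=None, base="http://localhost:8983/solr"):
    """Build a URL for the /get handler; fields maps to the optional fl parameter."""
    params = {"id": doc_id}
    if fields:
        params["fl"] = ",".join(fields)
    params["wt"] = "json"
    # safe="," keeps the comma in the field list readable instead of %2C.
    return base + "/get?" + urlencode(params, safe=",")


def parse_doc(body: str):
    """Return the value under "doc" in a realtime get response, or None if absent."""
    return json.loads(body).get("doc")


url = realtime_get_url("SOLR1000", fields=["id", "name"])
sample = '{"doc":{"id":"SOLR1000","name":"Solr, the Enterprise Search Server"}}'
doc = parse_doc(sample)
print(url)
print(doc["name"])
```

<p>Against a running server, the URL could be fetched with <strong>urllib.request.urlopen</strong> and the body passed to <strong>parse_doc</strong>.</p>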
<h2>How it works</h2>
<p>The realtime get feature uses transaction logging to keep track of uncommitted updates to the index.  When a get request for a document is received, this log is checked first, and the document is retrieved from there if found.  If it&#8217;s not found, then the latest opened searcher is used to retrieve the document.  Checking the log is super fast, and IO reads from the log are fully concurrent for maximum scalability.</p>
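<p>The lookup order described above can be modeled in a few lines of Python. This is a toy sketch only: the class and method names are invented for illustration, plain dictionaries stand in for both the transaction log and the searcher&#8217;s snapshot, and none of it reflects Solr&#8217;s actual data structures.</p>

```python
class TinyIndex:
    """Toy model: a committed snapshot plus a log of uncommitted updates."""

    def __init__(self):
        self.snapshot = {}  # what the currently open searcher can see
        self.pending = {}   # id -> latest uncommitted document (stand-in for the log)

    def add(self, doc):
        """Record an update; it is not visible to search until commit()."""
        self.pending[doc["id"]] = doc

    def commit(self):
        """Make pending updates visible to search and clear the log."""
        self.snapshot = {**self.snapshot, **self.pending}
        self.pending = {}

    def search_by_id(self, doc_id):
        """Regular search: only sees the committed snapshot."""
        return self.snapshot.get(doc_id)

    def realtime_get(self, doc_id):
        """Check the log first, then fall back to the committed snapshot."""
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.snapshot.get(doc_id)


index = TinyIndex()
index.add({"id": "SOLR1000", "name": "v1"})
index.commit()
index.add({"id": "SOLR1000", "name": "v2"})  # updated, but not yet committed
print(index.search_by_id("SOLR1000"))
print(index.realtime_get("SOLR1000"))
```

<p>Before the second commit, the regular search still returns the older version while the realtime get returns the newest one, which is the behavior the feature is designed to provide.</p>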
<h2>Try it out</h2>
<p>Download a recent <a href="http://wiki.apache.org/solr/NightlyBuilds">nightly build</a> of Solr 4.0-dev and follow the <a href="http://wiki.apache.org/solr/RealTimeGet">Quick Start</a> guide  on the Solr wiki.  Feedback on the <a href="http://lucene.apache.org/solr/mailing_lists.html">solr-user</a> mailing list is always appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2011/09/07/realtime-get/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr relevancy function queries</title>
		<link>http://yonik.wordpress.com/2011/03/10/solr-relevancy-function-queries/</link>
		<comments>http://yonik.wordpress.com/2011/03/10/solr-relevancy-function-queries/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 22:33:30 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[function query]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[Similarity]]></category>
		<category><![CDATA[solr 4.0]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=131</guid>
		<description><![CDATA[Lucene&#8217;s default ranking function uses factors such as tf, idf, and norm to help calculate relevancy scores. Solr has now exposed these factors as function queries. docfreq(field,term) returns the number of documents that contain the term in the field. termfreq(field,term) returns the number of times the term appears in the field for that document. idf(field,term) [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=yonik.wordpress.com&amp;blog=1995971&amp;post=131&amp;subd=yonik&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Lucene&#8217;s default ranking function uses factors such as <strong>tf</strong>, <strong>idf</strong>, and <strong>norm</strong> to help calculate relevancy scores.<br />
Solr has now exposed these factors as function queries.</p>
<ul>
<li><strong>docfreq(field,term)</strong> returns the number of documents that contain the term in the field.</li>
<li><strong>termfreq(field,term) </strong>returns the number of times the term appears in the field for that document.</li>
<li><strong>idf(field,term)</strong> returns the inverse document frequency for the given  term, using the <a href="http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html">Similarity</a> for the field.</li>
<li><strong>tf(field,term)</strong> returns the term frequency factor for the given term,  using the <a href="http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html">Similarity</a> for the field.</li>
<li><strong>norm(field)</strong> returns the &#8220;norm&#8221; stored in the index, the product of the index time boost and the length normalization factor.</li>
<li><strong>maxdoc()</strong> returns the number of documents in the index, including those  that are marked as deleted but have not yet been purged.</li>
<li><strong>numdocs()</strong> returns the number of documents in the index, not including  those that are marked as deleted but have not yet been purged.</li>
</ul>
<p>We can use these new functions to develop and test custom ranking functions!  For example, if we wanted simple <strong>tf*idf</strong> for a given term, we could issue the following function query (if you have Solr&#8217;s example server running with exampledocs indexed, just click on the following link):</p>
<p><a href="http://localhost:8983/solr/select/?fl=score,id&amp;defType=func&amp;q=mul(tf(text,memory),idf(text,memory))">http://localhost:8983/solr/select/?fl=score,id&amp;defType=func&amp;q=<strong>mul(tf(text,memory),idf(text,memory))</strong></a></p>
<p>To avoid repeating the field and term (text,memory), we can pull them out into separate query parameters and reference them as <strong>$f</strong> and <strong>$t</strong>:</p>
<p><a href="http://localhost:8983/solr/select/?fl=score,id&amp;defType=func&amp;q=mul(tf($f,$t),idf($f,$t))&amp;f=text&amp;t=memory">http://localhost:8983/solr/select/?fl=score,id&amp;defType=func&amp;q=mul(tf($f,$t),idf($f,$t))<strong>&amp;f=text&amp;t=memory</strong></a></p>
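<p>To make the <strong>tf*idf</strong> arithmetic concrete, here is a small Python sketch over an in-memory list of tokenized documents. It assumes the formulas of Lucene&#8217;s DefaultSimilarity (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1))); a field configured with a different Similarity would give different numbers, and the toy corpus below is invented.</p>

```python
import math


def tf(freq: float) -> float:
    """Term frequency factor, assuming DefaultSimilarity: sqrt(freq)."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency, assuming DefaultSimilarity."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def tf_idf_scores(term, docs):
    """Return one tf*idf score per document, like mul(tf(f,t),idf(f,t))."""
    doc_freq = sum(1 for tokens in docs if term in tokens)  # docfreq(field,term)
    weight = idf(doc_freq, len(docs))
    # tokens.count(term) plays the role of termfreq(field,term).
    return [tf(tokens.count(term)) * weight for tokens in docs]


corpus = [
    ["memory", "ddr", "memory", "memory", "memory"],
    ["ddr", "sdram"],
    ["cpu"],
]
scores = tf_idf_scores("memory", corpus)
print(scores)
```

<p>Documents that do not contain the term get a score of zero, because their term frequency factor is zero.</p>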
<p style="text-align:justify;">Utilizing Solr&#8217;s new ability to sort by arbitrary function queries, we could now sort a query by the number of times a specific term appears in each document.  The following query searches for documents matching &#8220;DDR&#8221;, but then sorts by the number of times &#8220;memory&#8221; appears in the text field.</p>
<p><a href="http://localhost:8983/solr/select/?fl=score,id&amp;q=DDR&amp;sort=termfreq(text,memory) desc">http://localhost:8983/solr/select/?fl=score,id&amp;q=DDR&amp;<strong>sort=termfreq(text,memory) desc</strong></a></p>
<p>We could also utilize the &#8220;norm&#8221; function to sort by the longest field first.  This assumes there were no index time boosts and thus the norm is just the standard length normalization factor.  With the default Similarity, that factor shrinks as a field gets longer, so the longest fields have the smallest norms and we sort in ascending order.</p>
<p><a href="http://localhost:8983/solr/select/?fl=score,id&amp;q=DDR&amp;sort=norm(text)%20asc">http://localhost:8983/solr/select/?fl=score,id&amp;q=DDR&amp;<strong>sort=norm(text) asc</strong></a></p>
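<p>The relationship between field length and norm can be sketched as follows. This assumes DefaultSimilarity&#8217;s length normalization of 1/sqrt(number of terms) with no index time boost, and it ignores the lossy single-byte encoding Lucene applies when storing norms; the field lengths are made up.</p>

```python
import math


def length_norm(num_terms: int) -> float:
    """Length normalization, assuming DefaultSimilarity: 1 / sqrt(num_terms)."""
    return 1.0 / math.sqrt(num_terms)


def longest_first(term_counts: dict) -> list:
    """Order document ids by ascending norm, which puts the longest fields first."""
    return sorted(term_counts, key=lambda doc_id: length_norm(term_counts[doc_id]))


# Hypothetical documents mapped to the number of terms in their text field.
lengths = {"short": 4, "medium": 16, "long": 100}
order = longest_first(lengths)
print(order)
```

<p>Sorting by the norm in ascending order therefore lists the longest field first, and sorting in descending order would list the shortest first.</p>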
<p>Given Solr&#8217;s plethora of <a href="http://wiki.apache.org/solr/FunctionQuery">function queries</a> (including the new spatial queries that return distance between points), the possibilities are almost endless.  To try this out,  you’ll need a recent <a href="http://wiki.apache.org/solr/FrontPage#solr_development">nightly build</a> of Solr 4.0-dev, or <a href="http://www.lucidimagination.com/enterprise-search-solutions/lucidworks">LucidWorks Enterprise</a>, our commercial version of Solr.</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2011/03/10/solr-relevancy-function-queries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr Result Grouping / Field Collapsing Improvements</title>
		<link>http://yonik.wordpress.com/2010/12/17/solr-result-grouping-field-collapsing-improvements/</link>
		<comments>http://yonik.wordpress.com/2010/12/17/solr-result-grouping-field-collapsing-improvements/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 20:37:38 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[field collapsing]]></category>
		<category><![CDATA[lucidworks]]></category>
		<category><![CDATA[result grouping]]></category>
		<category><![CDATA[solr 4.0]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=103</guid>
		<description><![CDATA[I previously introduced Solr&#8217;s Result Grouping, also called Field Collapsing, that limits the number of documents shown for each “group”, normally defined as the unique values in a field or function query. Since then, there have been a number of bug fixes, performance improvements, and feature enhancements. You&#8217;ll need a recent nightly build of Solr [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=yonik.wordpress.com&amp;blog=1995971&amp;post=103&amp;subd=yonik&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I previously introduced <a href="http://yonik.wordpress.com/2010/09/16/solr-result-grouping-field-collapsing/">Solr&#8217;s Result Grouping</a>, also called Field Collapsing, which limits the number of documents shown for each “group”, normally defined as the unique values in a field or function query.</p>
<p>Since then, there have been a number of bug fixes, performance improvements, and feature enhancements.  You&#8217;ll need a recent <a href="http://wiki.apache.org/solr/FrontPage#solr_development">nightly build</a> of Solr 4.0-dev, or the newly released <a href="http://www.lucidimagination.com/enterprise-search-solutions/lucidworks">LucidWorks Enterprise</a> v1.6, our commercial version of Solr.</p>
<h3>Feature Enhancements</h3>
<p>One improvement is the ability to group by query via the <strong>group.query</strong> parameter.  This functionality is very similar to <strong>facet.query</strong>, except that it retrieves the top documents that match the query, not just the count.  This has many potential uses, including always getting the top documents for specific groups, or defining custom groups such as price ranges.</p>
<p>Another useful capability is the addition of the <strong>group.main</strong> parameter.  Setting this to true causes the results of the first grouping command to be used as the main result list in a flattened response format that legacy clients will be able to handle.</p>
<p>For example, the grouped response format normally returns highly structured results under &#8220;grouped&#8221;.<br />
<a href="http://localhost:8983/solr/select?wt=json&amp;indent=true&amp;fl=id,name,manu&amp;q=solr+memory&amp;group=true&amp;group.field=manu_exact">&#8230;&amp;q=solr+memory&amp;group=true&amp;group.field=manu_exact</a></p>
<p><code><br />
&nbsp;"grouped":{<br />
&nbsp;&nbsp;"manu_exact":{<br />
&nbsp;&nbsp;&nbsp;"matches":6,<br />
&nbsp;&nbsp;&nbsp;"groups":[{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"groupValue":"Apache Software Foundation",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"doclist":{"numFound":1,"start":0,"docs":[<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"id":"SOLR1000",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"name":"Solr, the Enterprise Search Server",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"manu":"Apache Software Foundation"}]<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}},<br />
&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"groupValue":"Corsair Microsystems Inc.",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"doclist":{"numFound":2,"start":0,"docs":[<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"id":"VS1GB400C3",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"manu":"Corsair Microsystems Inc."}]<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}},<br />
[...]<br />
</code></p>
<p>If we add <strong>group.main=true</strong> to the request, then we get back a much more familiar looking response (i.e. it looks like a normal non-grouped response):<br />
<a href="http://localhost:8983/solr/select?wt=json&amp;indent=true&amp;fl=id,name,manu&amp;q=solr+memory&amp;group=true&amp;group.field=manu_exact&amp;group.main=true">&#8230;&amp;q=solr+memory&amp;group=true&amp;group.field=manu_exact&amp;group.main=true</a></p>
<p><code><br />
&nbsp;"response":{"numFound":6,"start":0,"docs":[<br />
&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;"id":"SOLR1000",<br />
&nbsp;&nbsp;&nbsp;&nbsp;"name":"Solr, the Enterprise Search Server",<br />
&nbsp;&nbsp;&nbsp;&nbsp;"manu":"Apache Software Foundation"},<br />
&nbsp;&nbsp;&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;"id":"VS1GB400C3",<br />
&nbsp;&nbsp;&nbsp;&nbsp;"name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",<br />
&nbsp;&nbsp;&nbsp;&nbsp;"manu":"Corsair Microsystems Inc."},<br />
</code></p>
<p>One can also use the <strong>group.format=simple</strong> parameter to select this simplified flattened response within the normal &#8220;grouped&#8221; section of the response.</p>
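<p>For a client that receives the structured &#8220;grouped&#8221; format and wants the flat shape anyway, the transformation can be approximated on the client side. The Python sketch below is an illustration based only on the two response samples shown above; the function name is invented, and it is not a description of how Solr builds the <strong>group.main</strong> response internally.</p>

```python
def flatten_grouped(grouped: dict, field: str) -> dict:
    """Flatten one grouping command into a response-like document list."""
    command = grouped[field]
    docs = []
    for group in command["groups"]:
        # Keep the groups in their returned order and append each group's top documents.
        docs.extend(group["doclist"]["docs"])
    return {"numFound": command["matches"], "start": 0, "docs": docs}


# Trimmed version of the grouped sample above.
grouped = {
    "manu_exact": {
        "matches": 6,
        "groups": [
            {"groupValue": "Apache Software Foundation",
             "doclist": {"numFound": 1, "start": 0,
                         "docs": [{"id": "SOLR1000"}]}},
            {"groupValue": "Corsair Microsystems Inc.",
             "doclist": {"numFound": 2, "start": 0,
                         "docs": [{"id": "VS1GB400C3"}]}},
        ],
    }
}
flat = flatten_grouped(grouped, "manu_exact")
print([doc["id"] for doc in flat["docs"]])
```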
<p>Other recent enhancements include support for debugging explain, highlighting, faceting, and the ability to handle missing values in the grouping field by treating all documents without a value as being in the &#8220;null&#8221; group.</p>
<h3>Performance Enhancements</h3>
<p>There have been a number of performance enhancements, including an improvement to the short circuiting logic&#8230; cutting off low ranking documents earlier in the process.  This important optimization resulted in a speedup of about 9x for collapsing on certain fields!</p>
<p>Collapsing on string fields was further optimized with specialized code that worked on ord values instead of the string values.  This doubled the performance yet again!</p>
<p>Please see the <a href="http://wiki.apache.org/solr/FieldCollapsing">Solr Wiki</a> for further documentation on all of result grouping&#8217;s capabilities and parameters.</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2010/12/17/solr-result-grouping-field-collapsing-improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Indexing JSON in Solr 3.1</title>
		<link>http://yonik.wordpress.com/2010/12/08/indexing-json-in-solr-3-1/</link>
		<comments>http://yonik.wordpress.com/2010/12/08/indexing-json-in-solr-3-1/#comments</comments>
		<pubDate>Thu, 09 Dec 2010 04:16:21 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=83</guid>
		<description><![CDATA[Solr has been able to produce JSON results for a long time, by adding wt=json to any query. A new capability has recently been added to allow indexing in JSON, as well as issuing other update commands such as deletes and commits. All of the functionality that was available through XML update commands can now [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=yonik.wordpress.com&amp;blog=1995971&amp;post=83&amp;subd=yonik&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Solr has been able to produce JSON results for a long time, by adding <strong>wt=json</strong> to any query.  A new capability has recently been added to allow indexing in JSON, as well as issuing other update commands such as deletes and commits.</p>
<p>All of the functionality that was available through XML update commands can now be given in JSON.<br />
For example, you can index a document like so:</p>
<p><code><br />
$ curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '<br />
{<br />
&nbsp;&nbsp;"add": {<br />
&nbsp;&nbsp;&nbsp;&nbsp;"doc": {<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"id" : "ISBN:978-0641723445",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"title" : "The Lightning Thief",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"author" : "Rick Riordan",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"series_t" : "Percy Jackson and the Olympians",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"cat" : ["book","hardcover"],<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"genre_s" : "fantasy",<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"pages_i" : 384,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"price" : 12.50,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"inStock" : true,<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"popularity" : 10<br />
&nbsp;&nbsp;&nbsp;&nbsp;}<br />
&nbsp;&nbsp;}<br />
}'<br />
</code></p>
<p>Of course, if you want the doc to be visible, you must do a commit.  This could have been done by adding a <strong>commit=true</strong> parameter to the URL in the previous command, or we could have added a <strong>commit</strong> command within the JSON itself.  This time we&#8217;ll issue a separate commit command.</p>
<p><code><br />
curl "http://localhost:8983/solr/update/json?commit=true"<br />
</code></p>
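<p>The same update can be prepared from a script. The Python sketch below builds (but does not send) a POST request for the JSON update handler, optionally adding <strong>commit=true</strong> to the URL as described above. The helper name and the default base URL are assumptions for illustration.</p>

```python
import json
from urllib.request import Request


def build_update_request(doc, commit=False, base="http://localhost:8983/solr"):
    """Build a POST request that adds one document via the JSON update handler."""
    url = base + "/update/json"
    if commit:
        url += "?commit=true"  # commit as part of the same request
    body = json.dumps({"add": {"doc": doc}}).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


book = {
    "id": "ISBN:978-0641723445",
    "title": "The Lightning Thief",
    "author": "Rick Riordan",
    "cat": ["book", "hardcover"],
    "pages_i": 384,
    "price": 12.50,
    "inStock": True,
}
request = build_update_request(book, commit=True)
print(request.full_url)
```

<p>Generating the body with <strong>json.dumps</strong> avoids hand-written JSON mistakes such as a missing comma. With a Solr server running, the request could be sent with <strong>urllib.request.urlopen(request)</strong>.</p>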
<p>And now, we can query the Solr index and verify the document has been correctly added (requesting the results in JSON of course!)<br />
<a href="http://localhost:8983/solr/select?wt=json&amp;indent=true&amp;q=title:lightning">http://localhost:8983/solr/select?wt=json&amp;indent=true&amp;q=title:lightning</a></p>
<p>There&#8217;s more documentation on the <a href="http://wiki.apache.org/solr/UpdateJSON">Solr Wiki</a>.<br />
To use this functionality, you&#8217;ll need to use <a href="http://www.lucidimagination.com/enterprise-search-solutions/lucidworks">LucidWorks Enterprise</a> (our commercial version of Solr), or a recent Solr 3.1-dev or 4.0-dev <a href="http://wiki.apache.org/solr/FrontPage#Solr_Development">nightly build</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2010/12/08/indexing-json-in-solr-3-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr Result Grouping / Field Collapsing</title>
		<link>http://yonik.wordpress.com/2010/09/16/solr-result-grouping-field-collapsing/</link>
		<comments>http://yonik.wordpress.com/2010/09/16/solr-result-grouping-field-collapsing/#comments</comments>
		<pubDate>Fri, 17 Sep 2010 01:54:16 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[field collapsing]]></category>
		<category><![CDATA[geo search]]></category>
		<category><![CDATA[result grouping]]></category>
		<category><![CDATA[solr 4.0]]></category>
		<category><![CDATA[spatial search]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=72</guid>
		<description><![CDATA[Result Grouping, also called Field Collapsing, has been committed to Solr! This functionality limits the number of documents for each &#8220;group&#8221;, usually defined by the unique values in a field (just like field faceting). You can think of it like faceted search, except instead of just getting a count, you get the top documents for [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Result Grouping</strong>, also called <strong>Field Collapsing</strong>, has been committed to Solr!<br />
This functionality limits the number of documents for each &#8220;group&#8221;, usually defined by the unique values in a field (just like field faceting).</p>
<p>You can think of it like faceted search, except instead of just getting a count, you get the top documents for that constraint or category.  There are tons of potential use cases:</p>
<ul>
<li>For web search, only show 1 or 2 results for a given website by collapsing on a site field.</li>
<li>For email search, only show 1 or 2 results for a given email thread.</li>
<li>For e-commerce, show the top 3 products for each store category (e.g. &#8220;electronics&#8221;, &#8220;housewares&#8221;).</li>
<li>Hiding duplicate documents at query time.</li>
</ul>
<p>In addition to being able to group by the values of a field, you can also group by the values of a function query.  Given that geo search works as a function query, this also opens up possibilities for showing top query matches within 1 mile, between 1 and 2 miles, etc.</p>
<p>Just like faceting, we&#8217;ll be adding new functionality and making continual improvements.<br />
Result Grouping is documented on the <a href="http://wiki.apache.org/solr/FieldCollapsing">Solr Wiki</a>, and you will need a recent<br />
<a href="http://wiki.apache.org/solr/FrontPage#solr_development">nightly build</a> of Solr 4.0-dev to try it out (just make sure it&#8217;s dated after this post).</p>
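<p>As a rough sketch of what a grouped request could look like, the snippet below assembles a query URL in Java. The <code>group</code>, <code>group.field</code>, and <code>group.limit</code> parameter names follow the Solr Wiki page linked above; the host, the <code>site</code> field, and the class and method names are assumptions made purely for illustration.</p>

```java
/** Sketch only: builds a Solr request URL that asks for grouped results. */
final class GroupingUrlSketch {

    /**
     * @param baseUrl    hypothetical Solr select handler URL
     * @param query      main query (assumed to need no URL escaping here)
     * @param groupField field whose unique values define the groups
     * @param perGroup   maximum number of documents returned per group
     */
    static String groupedQueryUrl(String baseUrl, String query, String groupField, int perGroup) {
        return baseUrl
                + "?q=" + query
                + "&group=true"
                + "&group.field=" + groupField
                + "&group.limit=" + perGroup;
    }

    public static void main(String[] args) {
        // Web-search style use case: at most 2 results per site.
        System.out.println(groupedQueryUrl("http://localhost:8983/solr/select", "ipod", "site", 2));
    }
}
```

<p>Grouping by a function query instead of a field would presumably swap <code>group.field</code> for another parameter; check the wiki for the exact name before relying on it.</p>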
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2010/09/16/solr-result-grouping-field-collapsing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>CSV output for Solr</title>
		<link>http://yonik.wordpress.com/2010/07/29/csv-output-for-solr/</link>
		<comments>http://yonik.wordpress.com/2010/07/29/csv-output-for-solr/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 17:49:25 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[CSV]]></category>
		<category><![CDATA[solr 3.1]]></category>
		<category><![CDATA[solr 4.0]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=61</guid>
		<description><![CDATA[Solr has been able to slurp in CSV for quite some time, and now I&#8217;ve finally got around to adding the ability to output query results in CSV also. The output format matches what the CSV loader can slurp. Adding a simple wt=csv to a query request will cause the docs to be written in [...]]]></description>
			<content:encoded><![CDATA[<p>Solr has been able to <a href="http://wiki.apache.org/solr/UpdateCSV">slurp in CSV</a> for quite some time, and now I&#8217;ve finally got around to adding the ability to output query results in CSV also.  The output format matches what the CSV loader can slurp.</p>
<p>Adding a simple <strong>wt=csv</strong> to a query request will cause the docs to be written in a CSV format that can be loaded into something like Excel.</p>
<p><a href="http://localhost:8983/solr/select?q=ipod&amp;fl=id,cat,name,popularity,price,score&amp;wt=csv">http://localhost:8983/solr/select?q=ipod&amp;fl=id,cat,name,popularity,price,score&amp;wt=csv</a></p>
<pre>id,cat,name,popularity,price,score
IW-02,"electronics,connector",iPod &amp; iPod Mini USB 2.0 Cable,1,11.5,0.98867977
F8V7067-APL-KIT,"electronics,connector",Belkin Mobile Power Cord for iPod w/ Dock,1,19.95,0.6523595
MA147LL/A,"electronics,music",Apple 60 GB iPod with Video Playback Black,10,399.0,0.2446348
</pre>
<p>CSV formats tend to vary, so there are a number of parameters that allow you to customize the output.  For example, setting <strong>csv.escape=\</strong> and <strong>csv.separator=%09</strong> (a URL-encoded tab character) produces tab-separated output with backslash escaping, matching the default CSV format that MySQL uses.</p>
<p><a href="http://localhost:8983/solr/select?q=ipod&amp;fl=score,id&amp;wt=csv&amp;csv.escape=\&amp;csv.separator=%09">http://localhost:8983/solr/select?q=ipod&amp;fl=score,id&amp;wt=csv&amp;csv.escape=\&amp;csv.separator=%09</a></p>
<pre>score	id
0.98867977	IW-02
0.6523595	F8V7067-APL-KIT
0.2446348	MA147LL/A
</pre>
<p>The CSVResponseWriter is documented on the <a href="http://wiki.apache.org/solr/CSVResponseWriter">Solr Wiki</a>, but you will need a recent<br />
<a href="http://wiki.apache.org/solr/FrontPage#solr_development">nightly build</a> (Solr 3.1-dev or Solr 4.0-dev) to try it out.</p>
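<p>If you build such requests from code, the special characters need URL encoding. The following is a minimal sketch, assuming Java and the <code>csv.escape</code> and <code>csv.separator</code> parameters shown above; the class and method names are made up for this example. Unlike the hand-written URL above, it also percent-encodes the backslash.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch only: URL-encodes the CSV options before appending them to a request. */
final class CsvParamsSketch {

    /** Percent-encodes a parameter value as UTF-8 (a tab becomes %09, a backslash %5C). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Appends wt=csv plus tab-separated, backslash-escaped options to a query URL. */
    static String mysqlStyleCsvUrl(String queryUrl) {
        return queryUrl
                + "&wt=csv"
                + "&csv.escape=" + encode("\\")
                + "&csv.separator=" + encode("\t");
    }

    public static void main(String[] args) {
        System.out.println(mysqlStyleCsvUrl("http://localhost:8983/solr/select?q=ipod&fl=score,id"));
    }
}
```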
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2010/07/29/csv-output-for-solr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Ranges over Functions in Solr 1.4</title>
		<link>http://yonik.wordpress.com/2009/07/06/ranges-over-functions-in-solr-1-4/</link>
		<comments>http://yonik.wordpress.com/2009/07/06/ranges-over-functions-in-solr-1-4/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 01:37:42 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[frange]]></category>
		<category><![CDATA[function query]]></category>
		<category><![CDATA[qparser]]></category>
		<category><![CDATA[query syntax]]></category>
		<category><![CDATA[range query]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=37</guid>
		<description><![CDATA[Solr 1.4 contains a new feature that allows range queries or range filters over arbitrary functions.  It&#8217;s implemented as a standard Solr QParser plugin, and thus easily available for use any place that accepts the standard Solr Query Syntax by specifying the frange query type.  Here&#8217;s an example of a filter specifying the lower and [...]]]></description>
			<content:encoded><![CDATA[<p>Solr 1.4 contains a new feature that allows range queries or range filters over arbitrary functions.  It&#8217;s implemented as a standard <a href="http://lucene.apache.org/solr/api/org/apache/solr/search/FunctionRangeQParserPlugin.html">Solr QParser plugin</a>, and thus easily available for use any place that accepts the standard <a href="http://wiki.apache.org/solr/SolrQuerySyntax">Solr Query Syntax</a> by specifying the <strong>frange</strong> query type.  Here&#8217;s an example of a filter specifying the lower and upper bounds for a function:</p>
<p><code>fq={!frange l=0 u=2.2}log(sum(user_ranking,editor_ranking))</code></p>
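<p>Conceptually, the filter keeps a document when the function value falls between <code>l</code> and <code>u</code>. The sketch below mimics that check for a single document in plain Java. It assumes that <code>log()</code> is a base-10 logarithm and that both bounds are inclusive, which is my understanding of the defaults and worth verifying against the QParser documentation. The class and method names are hypothetical, and this is not how Solr implements it.</p>

```java
/** Sketch only: mimics {!frange l=0 u=2.2}log(sum(user_ranking,editor_ranking)) for one document. */
final class FrangeSketch {

    /** Assumed equivalent of log(sum(a, b)): base-10 logarithm of the sum. */
    static double logOfSum(double userRanking, double editorRanking) {
        return Math.log10(userRanking + editorRanking);
    }

    /** True when the value lies within [lower, upper]; inclusive bounds are an assumption. */
    static boolean inRange(double value, double lower, double upper) {
        return value >= lower && value <= upper;
    }

    /** Applies the example filter (l=0, u=2.2) to one document's two ranking values. */
    static boolean matches(double userRanking, double editorRanking) {
        return inRange(logOfSum(userRanking, editorRanking), 0.0, 2.2);
    }

    public static void main(String[] args) {
        System.out.println(matches(50, 50));   // log10(100) = 2.0, inside the range
        System.out.println(matches(100, 100)); // log10(200) is about 2.30, outside the range
    }
}
```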
<p>The other interesting use for frange is to trade off memory for speed when doing range queries on any type of single-valued field.  For example, one can use <strong>frange</strong> on a string field, provided that each document has at most one value for that field and that numeric functions are avoided.</p>
<p>For example, here is a filter that only allows authors between martin and rowling, specified using a standard range query:<br />
<code>fq=author_last_name:[martin TO rowling]</code></p>
<p>And the same filter using a function range query (<strong>frange</strong>):<br />
<code>fq={!frange l=martin u=rowling}author_last_name</code></p>
<p>This can lead to significant performance improvements for range queries with many terms between the endpoints, at the cost of the memory needed to hold the un-inverted form of the field (i.e. a FieldCache entry &#8211; same as would be used for sorting).  If the field in question is already being used for sorting or other function queries, there won&#8217;t be any additional memory overhead.</p>
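<p>To make the trade-off concrete, here is a simplified sketch of range filtering over an un-inverted field: an array indexed by document id that holds each document&#8217;s single value. The work is one bounds check per document, regardless of how many distinct terms fall between the endpoints. The array layout and all names are assumptions for illustration and do not reflect the actual FieldCache data structures.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: range filtering over an un-inverted (docId -> value) array. */
final class UninvertedRangeSketch {

    /**
     * Returns the ids of documents whose value lies in [lower, upper].
     * valuesByDoc[docId] holds the single field value of that document, or null if absent.
     */
    static List<Integer> matchingDocs(String[] valuesByDoc, String lower, String upper) {
        List<Integer> matches = new ArrayList<>();
        for (int docId = 0; docId < valuesByDoc.length; docId++) {
            String value = valuesByDoc[docId];
            // One comparison pair per document, no matter how many distinct terms exist.
            if (value != null && value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0) {
                matches.add(docId);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] authorLastName = {"adams", "martin", "pratchett", "rowling", "tolkien"};
        System.out.println(matchingDocs(authorLastName, "martin", "rowling"));
    }
}
```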
<p>The following table shows the results of a test of frange queries vs standard range queries on a string field with 200,000 unique values.  For example, frange was 14 times faster when executing a range query / range filter that covered 20% of the terms in the field.  For narrower ranges that matched fewer than 5% of the values, the traditional range query performed better.</p>
<table border="1">
<tbody>
<tr>
<th>Percent of terms covered</th>
<th>Fastest implementation</th>
<th>Speedup (how many times faster)</th>
</tr>
<tr>
<td>100%</td>
<td>frange</td>
<td>43.32</td>
</tr>
<tr>
<td>20%</td>
<td>frange</td>
<td>14.25</td>
</tr>
<tr>
<td>10%</td>
<td>frange</td>
<td>8.07</td>
</tr>
<tr>
<td>5%</td>
<td>frange</td>
<td>1.337</td>
</tr>
<tr>
<td>1%</td>
<td>normal range query</td>
<td>3.59</td>
</tr>
</tbody>
</table>
<p>Of course, Solr 1.4 also contains the new <a href="http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/">TrieRange</a> functionality that will generally have the best time/space profile for range queries over numeric fields.</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2009/07/06/ranges-over-functions-in-solr-1-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Filtered query performance increases for Solr 1.4</title>
		<link>http://yonik.wordpress.com/2009/05/27/filtered-query-performance-increases-for-solr-1-4/</link>
		<comments>http://yonik.wordpress.com/2009/05/27/filtered-query-performance-increases-for-solr-1-4/#comments</comments>
		<pubDate>Wed, 27 May 2009 18:12:49 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[filtered query]]></category>
		<category><![CDATA[solr 1.4]]></category>
		<category><![CDATA[solr performance]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=30</guid>
		<description><![CDATA[One of the many performance improvements in the upcoming Solr 1.4 release involves improved filtering performance. Solr 1.4 filters are faster (anywhere from 30% to 80% faster to calculate intersections, depending on configuration), take less memory (40% smaller), and are more efficiently applied to the query during a search. In previous Solr releases, filters [...]]]></description>
			<content:encoded><![CDATA[<p>One of the many performance improvements in the upcoming Solr 1.4 release is faster filtering.  Solr 1.4 filters are faster (anywhere from 30% to 80% faster to calculate intersections, depending on configuration), take less memory (40% smaller), and are more efficiently applied to the query during a search.</p>
<p>In previous Solr releases, filters were applied after the main query and thus had little impact on overall query performance.  Filters are now checked in parallel with the query, so the fewer documents match the filters, the greater the speedup.</p>
<p>Example: Adding a filter that matched 10% of a large index resulted in a 300% performance increase for a dismax query consisting of three words on a single field with proximity boost.</p>
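<p>One way to picture checking a filter alongside the query is a lock-step intersection of two sorted lists of document ids, where a document that the filter rejects is skipped before any scoring work is spent on it. The sketch below illustrates only that general idea in Java; the names are hypothetical and it is not taken from the Solr or Lucene source.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: lock-step intersection of sorted doc-id lists (query matches and filter matches). */
final class FilterIntersectionSketch {

    static List<Integer> intersect(int[] queryDocs, int[] filterDocs) {
        List<Integer> result = new ArrayList<>();
        int q = 0;
        int f = 0;
        while (q < queryDocs.length && f < filterDocs.length) {
            if (queryDocs[q] == filterDocs[f]) {
                result.add(queryDocs[q]); // accepted by both the query and the filter
                q++;
                f++;
            } else if (queryDocs[q] < filterDocs[f]) {
                q++; // not in the filter, so this doc is skipped without being scored
            } else {
                f++; // the filter side is behind and catches up
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] queryDocs = {1, 4, 7, 9, 12};
        int[] filterDocs = {4, 9, 10};
        System.out.println(intersect(queryDocs, filterDocs));
    }
}
```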
<p>Related issues:</p>
<p><a href="https://issues.apache.org/jira/browse/SOLR-1169">https://issues.apache.org/jira/browse/SOLR-1169</a></p>
<p><a href="https://issues.apache.org/jira/browse/SOLR-1179">https://issues.apache.org/jira/browse/SOLR-1179</a></p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2009/05/27/filtered-query-performance-increases-for-solr-1-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
		<item>
		<title>Solr scalability improvements</title>
		<link>http://yonik.wordpress.com/2008/12/01/solr-scalability-improvements/</link>
		<comments>http://yonik.wordpress.com/2008/12/01/solr-scalability-improvements/#comments</comments>
		<pubDate>Tue, 02 Dec 2008 02:53:48 +0000</pubDate>
		<dc:creator>yonik</dc:creator>
				<category><![CDATA[java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[LRU]]></category>
		<category><![CDATA[NIO]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://yonik.wordpress.com/?p=20</guid>
		<description><![CDATA[With CPU cores constantly increasing, there has been some major work done in Lucene/Solr to increase the scalability under multi-threaded load. Read-only IndexReaders One bottleneck was synchronization around the checking of deleted docs in a Lucene IndexReader.  Since another thread could delete a document at any time, the IndexReader.isDeleted() call was synchronized.  It&#8217;s a very [...]]]></description>
			<content:encoded><![CDATA[<p>With CPU cores constantly increasing, there has been some major work done in <a href="http://lucene.apache.org/solr/">Lucene/Solr</a> to increase the scalability under multi-threaded load.</p>
<h2>Read-only IndexReaders</h2>
<p>One bottleneck was synchronization around the checking of deleted docs in a Lucene IndexReader.  Since another thread could delete a document at any time, the IndexReader.isDeleted() call was <em>synchronized</em>.  It&#8217;s a very quick call, simply checking if a bit is set in a BitVector, but the problem was that it can be called millions of times in the process of satisfying a single query. The Read-only IndexReader feature allowed for the removal of this synchronization by prohibiting deletion.</p>
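<p>The difference can be illustrated with two toy readers. This is a simplified sketch, not Lucene code: it uses <code>java.util.BitSet</code> as a stand-in for Lucene&#8217;s BitVector, and the class names are invented for the example.</p>

```java
import java.util.BitSet;

/** Sketch only: why a read-only reader can skip locking in isDeleted(). */
final class DeletedDocsSketch {

    /** Deletions may arrive from other threads, so reads and writes share a lock. */
    static final class MutableReader {
        private final BitSet deleted = new BitSet();

        synchronized void delete(int docId) {
            deleted.set(docId);
        }

        synchronized boolean isDeleted(int docId) {
            return deleted.get(docId);
        }
    }

    /** The deleted set is frozen at construction, so isDeleted() needs no lock. */
    static final class ReadOnlyReader {
        private final BitSet deleted;

        ReadOnlyReader(BitSet deletedSnapshot) {
            this.deleted = (BitSet) deletedSnapshot.clone();
        }

        void delete(int docId) {
            throw new UnsupportedOperationException("read-only reader");
        }

        boolean isDeleted(int docId) {
            return deleted.get(docId);
        }
    }

    public static void main(String[] args) {
        BitSet snapshot = new BitSet();
        snapshot.set(3);
        ReadOnlyReader reader = new ReadOnlyReader(snapshot);
        System.out.println(reader.isDeleted(3));
        System.out.println(reader.isDeleted(4));
    }
}
```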
<h2>Use of NIO to read index files</h2>
<p>The standard method for Lucene to read index files is via Java&#8217;s RandomAccessFile.  Reading a part of the file involves two calls, a <strong>seek()</strong> to position the file pointer followed by a <strong>read()</strong> to get the data.  For multiple threads to share the same RandomAccessFile instance, this obviously involves synchronization to avoid one thread changing the file pointer before another thread gets to read at the file position it set.   If the data to be read isn&#8217;t in the operating system cache, it&#8217;s even worse news&#8230; the synchronization causes all other reads to block while the data is retrieved from disk, even if some of those reads could have been quickly satisfied.</p>
<p>The preferred solution would be to have a method on RandomAccessFile that accepted an offset to read from.  This could easily be implemented by the JVM via a <strong>pread()</strong> system call.  But since Sun has not provided this functionality, we need to use something else.  NIO&#8217;s FileChannel <em>does</em> have the type of method we are looking for:  <strong>FileChannel.read(ByteBuffer dst, long position)</strong></p>
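<p>Here is a small, self-contained sketch of a positional read with that method. The helper names and the temporary file are assumptions for the demo; this is not the NIOFSDirectory code. Because the offset is passed with every call, threads sharing one channel have no common file pointer to protect.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch only: positional reads through FileChannel, with no shared file pointer to guard. */
final class PositionalReadSketch {

    /** Reads up to length bytes starting at position, without moving the channel's own position. */
    static byte[] readAt(FileChannel channel, long position, int length) throws IOException {
        ByteBuffer dst = ByteBuffer.allocate(length);
        long offset = position;
        while (dst.hasRemaining()) {
            int n = channel.read(dst, offset); // FileChannel.read(ByteBuffer dst, long position)
            if (n < 0) {
                break; // reached end of file
            }
            offset += n;
        }
        byte[] out = new byte[dst.position()];
        dst.flip();
        dst.get(out);
        return out;
    }

    /** Writes a small temporary file and reads five bytes starting at offset 6. */
    static String demo() {
        try {
            Path file = Files.createTempFile("positional-read", ".bin");
            try {
                Files.write(file, "hello world".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                    return new String(readAt(channel, 6, 5), StandardCharsets.US_ASCII);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```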
<p>Solr now uses the non-synchronizing NIO method of reading index files (via Lucene&#8217;s NIOFSDirectory) by default if you are on a non-Windows platform.  Windows systems default to the older method since it turns out to be faster than the new method &#8211; the reason being a long-standing &#8220;bug&#8221; in Java that still synchronizes internally even when using FileChannel.read().</p>
<h2>Non-blocking caches</h2>
<p>Solr&#8217;s standard LRU cache implementation uses a synchronized LinkedHashMap.  A single cache could be checked hundreds or thousands of times during the course of a single request that involves faceting.  A non-blocking ConcurrentLRUCache was developed as an alternative implementation, and is now the default for Solr&#8217;s filter cache.  One user indicated that this has doubled their query throughput under ideal circumstances.</p>
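<p>For reference, the synchronized approach can be sketched as follows. This is a generic illustration of an LRU cache built on <code>LinkedHashMap</code>, with invented names, and is not Solr&#8217;s cache class. Note that in an access-ordered map even <code>get()</code> reorders entries, so lookups have to take the lock too, which is where the contention comes from.</p>

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: an LRU cache built from a synchronized, access-ordered LinkedHashMap. */
final class SynchronizedLruSketch {

    static <K, V> Map<K, V> create(final int maxSize) {
        // accessOrder=true moves an entry to the end of the iteration order on every get() or put().
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
        // Every call, including get(), takes the same lock; this is the contention point.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = create(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet());
    }
}
```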
<h2>Where to find this scalability goodness?</h2>
<p><a href="http://www.apache.org/dyn/closer.cgi/lucene/solr">Solr 1.3</a> has read-only IndexReaders, but for the other scalability improvements, including the improved faceting, you&#8217;ll have to grab a <a href="http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/dist/">nightly Solr build</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://yonik.wordpress.com/2008/12/01/solr-scalability-improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/d826abbc3ebe028c7db08a03a159503f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">yonik</media:title>
		</media:content>
	</item>
	</channel>
</rss>
