MurmurHash3 for Java

15 09 2011

Note: my blog has moved: See MurmurHash3 for Java


Solr scalability improvements

1 12 2008

With CPU cores constantly increasing, there has been some major work done in Lucene/Solr to increase the scalability under multi-threaded load.

Read-only IndexReaders

One bottleneck was synchronization around the checking of deleted docs in a Lucene IndexReader.  Since another thread could delete a document at any time, the IndexReader.isDeleted() call was synchronized.  It’s a very quick call, simply checking if a bit is set in a BitVector, but the problem was that it can be called millions of times in the process of satisfying a single query. The Read-only IndexReader feature allowed for the removal of this synchronization by prohibiting deletion.

Use of NIO to read index files

The standard method for Lucene to read index files is via Java’s RandomAccessFile.  Reading a part of the file involves two calls, a seek() to position the file pointer followed by a read() to get the data.  For multiple threads to share the same RandomAccessFile instance, this obviously involves synchronization to avoid one thread changing the file pointer before another thread gets to read at the file position it set.   If the data to be read isn’t in the operating system cache, it’s even worse news… the synchronization causes all other reads to block while the data is retrieved from disk, even if some of those reads could have been quickly satisified.

The preferred solution would be to have a method on RandomAccessFile that accepted an offset to read from.  This could easily be implemented by the JVM via a pread() system call.  But since Sun has not provided this functionality, we need to use something else.  NIO’s FileChannel does have the type of method we are looking for: dst, long position)

Solr now uses the non-synchronizing NIO method of reading index files (via Lucene’s NIOFSDirectory) by default if you are on a non-Windows platform.  Windows systems default to the older method since it turns out to be faster than the new method – the reason being a long standing “bug” in Java that still synchronizes internally even when using

Non blocking caches

Solr’s standard LRU cache implementation use a synchronized LinkedHashMap.  A single cache could be checked hundreds or thousands of times during the course of a single request that involves faceting.  A non-blocking ConcurrentLRUCache was developed as an alternative implementation, and is now the default for Solr’s filter cache.  One user indicated that this has doubled their query throughput under ideal circumstances.

Where to find this scalability goodness?

Solr 1.3 has read-only IndexReaders, but for the other scalability improvements, including the improved faceting, you’ll have to grab a nightly Solr build.

Solr Faceted Search Performance Improvements

25 11 2008

See facet performance benchmarks on my new blog for the latest performance benchmarks.

lookup3ycs : a standard high performance string hash

14 06 2008

I was surprised to discovered that there isn’t a good cross-platform hash function defined for strings. MD5, SHA, FVN, etc, all define hash functions over bytes, meaning that it’s under-specified for strings.

So I set out to create a standard 32 bit string hash that would be well defined for implementation in all languages, have very high performance, and have very good hash properties such as distribution. After evaluating all the options, I settled on using Bob Jenkins’ lookup3 as a base. It’s a well studied and very fast hash function, and the hashword variant can work with 32 bits at a time (perfect for hashing unicode code points). It’s also even faster on the latest JVMs which can translate pairs of shifts into native rotate instructions.

The only problem with using lookup3 hashword is that it includes a length in the initial value. This would suck some performance out since directly hashing a UTF8 or UTF16 string (Java) would require a pre-scan to get the actual number of unicode code points. The solution was to simply remove the length factor, which is equivalent to biasing initVal by -(numCodePoints*4). This slightly modified lookup3 I define as lookup3ycs.

So the definition of the cross-platform string hash lookup3ycs is:

The hash value of a character sequence (a string) is defined to be the hash of it’s unicode code points, according to lookup3 hashword, with the initval biased by -(length*4).

So by definition

lookup3ycs(k,offset,length,initval) == lookup3(k,offset,length,initval-(length*4))


lookup3ycs(k,offset,length,initval+(length*4)) == lookup3(k,offset,length,initval)

An obvious advantage of this relationship is that you can use lookup3 if you don’t have an implementation of lookup3ycs.

Here’s my optimized version for Java

Update: I’ve also included a 64 bit version called lookup3ycs64

Distributed Search for Solr

27 02 2008

A new chapter in Solr scalability has been opened with the addition of distributed search!

Distributed Search splits an index into multiple shards, and queries across all the shards, combining the results and presenting a single merged response that looks like it came from a single server.

Solr’s current implementation uses SolrJ (the solr java client) to talk to other Solr servers via HTTP, in two main phases. The first phase collects matching document ids and scores, as well as doing any requested faceting. The second phase retrieves the stored fields for selected documents, does highlighting, and may include additional faceting requests to nail down exact facet counts.