Apache SOLR Search – Find Everything

Recently, I had a chance to build a search application for one of our clients. The client had an existing Google Search Appliance integrated with multiple Plone CMS sites. Because the Google Search Appliance has been discontinued, they were looking for a replacement. UDig recommended Apache SOLR. We were able to use a single search schema for multiple sites and consolidate their search into a single-page application that uses AJAX Solr to provide comprehensive search for end users. The open source community has embraced SOLR as a search solution, and there is a Plone plugin that we used to provide near real-time indexing of new documents as they are added to the Plone CMS. With the rich set of APIs available in SOLR, the existing content, no matter how old, was also indexed and made searchable. The client also wanted the ability to define custom rank orders for documents that they considered highly relevant. With SOLR, we easily changed the ranking order based on the criteria supplied by the client.

What is SOLR?

Apache SOLR is an enterprise search platform built on Apache Lucene. Lucene is a search engine library packaged as a set of jar files. SOLR takes the Lucene API and builds features on top of it to make that API available through a web server, which makes building a search application much easier. The project describes SOLR as “highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. SOLR powers the search and navigation features of many of the world’s largest internet sites” (http://lucene.apache.org/solr/). This standalone search server provides multiple mechanisms to get your data into SOLR and a query language to retrieve your documents from it.

So, how do you begin?

Well, first download and install SOLR using Apache’s easy-to-follow quickstart. Once you have a working SOLR demo site, you can begin to figure out how to use it in your environment. Let’s say you have a website and you want to index the web pages for a custom search. You can use Apache Nutch, a mature web crawler that integrates directly with SOLR. How about my .NET Forms application? Well, there are APIs for that too: SolrNet is a .NET client for SOLR. Maybe you have a Java application? SolrJ to the rescue! How about that file system that has hundreds of documents? Are you constantly trying to find a Word document from two years ago? You can index that file system into SOLR and then search the index for that document. In our case, we used the Plone CMS SOLR plugin to index documents. The plugin supported both HTML documents and attachments such as Excel, Word and PDF files. This met our needs for indexing, and we ended up with an index that we could use to build our search application.
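If none of the ready-made clients fit, documents can also be pushed into the index over plain HTTP. The following is only a minimal sketch in Python: the host, the core name "documents" and the document fields are assumptions made for illustration, not details from our project. It builds a JSON request for SOLR's update handler but does not send it.

```python
import json
import urllib.request

# Assumed for illustration: a local SOLR server with a core named "documents".
SOLR_UPDATE_URL = "http://localhost:8983/solr/documents/update?commit=true"


def build_index_request(docs):
    """Build (but do not send) a POST request that adds docs to the index."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; "id" and "searchableText" must exist in your schema.
docs = [{"id": "doc-1", "searchableText": "Quarterly report for the search project"}]
request = build_index_request(docs)

# With a running server, urllib.request.urlopen(request) would submit the document.
print(request.get_method(), request.full_url)
```

The request is built separately from sending it so the payload can be inspected before anything touches a live index.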

Building a Search Application

We chose AJAX Solr to build out the search application. AJAX Solr is a JavaScript library that can be extended to provide custom search results. This choice gives the users a single search application that covers all of the different locations where data is stored. The result is a cohesive application that the users will come to rely on. We built out the search application to include some of SOLR’s wonderful features such as Faceted Search, Filtering, Query Suggestions, Spell Check and Auto-complete. We also ranked the results so that relevant information appears higher up in the search results. Let’s break down some of the search features of SOLR.

To send a query to the SOLR server, you construct a URL and send it to the server.
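As a rough sketch of what constructing such a URL can look like, the Python snippet below builds a request for SOLR's standard /select handler. The host, port and core name "documents" are assumptions for illustration, so substitute the values for your own installation.

```python
from urllib.parse import urlencode

# Assumed for illustration: SOLR on localhost:8983 with a core named "documents".
SOLR_BASE = "http://localhost:8983/solr/documents"


def build_select_url(query, rows=10):
    """Return a URL for the /select request handler with an encoded query."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


url = build_select_url("searchableText:solr")
print(url)
# http://localhost:8983/solr/documents/select?q=searchableText%3Asolr&rows=10&wt=json
```

Letting urlencode handle the escaping means colons, quotes and spaces in the query do not have to be encoded by hand.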

Basic Search
To search for a term in a field of your index called searchableText, put the field name, a colon, and then the term in the q parameter of the search URL.

To search for a phrase, enclose the phrase in double quotes.
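A small Python sketch of both forms follows. The helper name is hypothetical, and it assumes the search text contains no characters that are special in the query syntax (such as quotes or colons).

```python
from urllib.parse import urlencode


def field_query(field, text):
    """Return field:term for one word, or field:"phrase" for several words."""
    if " " in text:
        return f'{field}:"{text}"'
    return f"{field}:{text}"


term_query = field_query("searchableText", "solr")
phrase_query = field_query("searchableText", "fast search")

print(term_query)    # searchableText:solr
print(phrase_query)  # searchableText:"fast search"

# The value of q as it would appear on the search URL.
encoded = urlencode({"q": phrase_query})
print(encoded)       # q=searchableText%3A%22fast+search%22
```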

Sloppy Phrases Search
A proximity query, also called a sloppy phrase search, matches the words of a phrase when they appear within a given distance of each other. Append a tilde (~) and a number to the quoted phrase to tell SOLR how many word positions of slack to allow. For example, “fast search”~1 will match both “fast search” and “fast solr search” in the searchableText field, because ~1 tells SOLR to search within 1 word of our search phrase.
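As a sketch, the proximity query described above could be assembled like this in Python. The helper name is made up for illustration.

```python
def proximity_query(field, phrase, slop):
    """Return a sloppy phrase query: the quoted phrase followed by ~slop."""
    return f'{field}:"{phrase}"~{slop}'


query = proximity_query("searchableText", "fast search", 1)
print(query)  # searchableText:"fast search"~1
```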

Boost Queries
Any query clause can be boosted with the ^ operator. The boost is multiplied into the normal score for the clause and affects its importance relative to other clauses. For example, if a clause matching “UDig” in the searchableText field is boosted by 10 and combined with an unboosted clause matching “blog”, documents containing “UDig” will have that clause’s score multiplied by 10, which places them higher in the results than documents whose searchableText field contains only the word “blog”.
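One way to write that boosted query, sketched in Python with a hypothetical helper. Joining the two clauses with OR is an assumption about how they are combined.

```python
def boosted_clause(field, term, boost=None):
    """Return field:term, with ^boost appended when a boost is given."""
    clause = f"{field}:{term}"
    return f"{clause}^{boost}" if boost is not None else clause


query = " OR ".join([
    boosted_clause("searchableText", "UDig", 10),  # this clause counts ten times as much
    boosted_clause("searchableText", "blog"),      # unboosted clause
])
print(query)  # searchableText:UDig^10 OR searchableText:blog
```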

Range Queries
A range query selects documents with values between a specified lower and upper bound. Range queries work on numeric fields, date fields, and even string and text fields.

  • Square brackets [ ] denote an inclusive range query that matches values including the upper and lower bound.
  • Curly brackets { } denote an exclusive range query that matches values between the upper and lower bounds, but excluding the upper and lower bounds themselves.
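The two bracket styles can be sketched in Python as follows. The field names price and modified are hypothetical and stand in for whatever numeric and date fields your schema defines.

```python
def range_query(field, lower, upper, inclusive=True):
    """Return field:[lower TO upper] (inclusive) or field:{lower TO upper} (exclusive)."""
    open_bracket, close_bracket = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_bracket}{lower} TO {upper}{close_bracket}"


inclusive_query = range_query("price", 10, 100)
exclusive_query = range_query("price", 10, 100, inclusive=False)
date_query = range_query("modified", "2016-01-01T00:00:00Z", "NOW")

print(inclusive_query)  # price:[10 TO 100]
print(exclusive_query)  # price:{10 TO 100}
print(date_query)       # modified:[2016-01-01T00:00:00Z TO NOW]
```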

There are many, many ways to slice and dice your search index. SOLR has a very rich API that can be used to give users the best search results possible. From internal sites and databases to externally facing websites, search helps users find everything. In fact, our recent project actually helped users save lives.