 Published on ONJava.com (http://www.onjava.com/)


Using the Lucene Query Parser Without Lucene

by Marcin Maciukiewicz, Daniel Owsiański
05/24/2007

Lucene is a search engine project developed under the Apache Software Foundation. It is widely regarded as one of the best options when you need search capability in your application. There are, however, plenty of legacy applications whose existing database cannot be replaced with Lucene but which still need user-friendly searching.

As the application grows more complex, the design of the search tool becomes crucial. The most common design approach is a search based on forms. Users are given a set of fields to express search criteria. One of the most common examples is Google's advanced search page, shown in Figure 1.

Figure 1. Google advanced search

But we all know this is not what Google is famous for. The "single search field" is the recognizable Google brand. Even though it is a simple interface, it has plenty of power, and most people aren't even aware there is a real query structure behind it. Entering specific phrases gives you a chance to express a whole range of restrictions as if you were using an advanced search form. Lucene has almost the same kind of search capabilities.

Idea

Stop creating sophisticated, hard-to-maintain search forms. Technologies like Ajax let you build user-friendly interfaces with features such as "suggest" or "type ahead," and a simpler interface keeps your users from feeling lost in a huge set of search options. Remember, all users want is to find the information they are looking for quickly. Instead of a form, you can offer a search based on Lucene query syntax and satisfy your users with a single search field, as Google does (see Figure 2).

Figure 2. Google simple search

By the end of this article, you will see how easy it is.

Parse the Query!

Plenty of articles cover Lucene as a whole, so this one concentrates on a single part of the project. First, though, a very quick look at indexing, which is the heart of Lucene. Each document has to be indexed in advance before it can be searched. Lucene analyzes the input and creates Document objects. Each Document is a composition of Field objects, and each Field is a name-value pair. You may think of these as "properties" associated with the document. When you want to find a document, all you need is a set of name-value pairs to use as search criteria. Lucene will find all documents that meet these criteria.
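To make the name-value idea concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The class FieldMatchSketch and its methods are hypothetical names introduced only for illustration, and a Map stands in for a document's fields. Real Lucene looks terms up in an index rather than scanning every document, so treat this as a picture of the matching rule, not of how Lucene is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each "document" is a map of field name -> value.
class FieldMatchSketch {

    // Builds a document from alternating name, value arguments.
    static Map<String, String> doc(String... pairs) {
        Map<String, String> fields = new HashMap<String, String>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            fields.put(pairs[i], pairs[i + 1]);
        }
        return fields;
    }

    // Returns the documents that contain every name-value pair in criteria.
    static List<Map<String, String>> find(List<Map<String, String>> docs,
                                          Map<String, String> criteria) {
        List<Map<String, String>> hits = new ArrayList<Map<String, String>>();
        for (Map<String, String> d : docs) {
            if (d.entrySet().containsAll(criteria.entrySet())) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
                doc("name", "John", "phoneNumber", "1234"),
                doc("name", "Jane", "phoneNumber", "5678"));
        // Only the first document has name=John, so this prints 1.
        System.out.println(find(docs, doc("name", "John")).size());
    }
}
```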

Let's look at an example. Say we have an address book application. Each entry is represented with the Friend class, as shown in Figure 3.

Figure 3. Friend class

Here are a few example search queries you might want to run. All are written in the query language supported by Lucene's QueryParser:

Query                             Meaning
name:John                         Find all entries where attribute "name" is equal to "John".
name:J*                           Find all entries where attribute "name" starts with "J".
name:John AND phoneNumber:1234    Find all entries where attribute "name" is equal to "John" and attribute "phoneNumber" is equal to "1234".
name:J* NOT name:John             Find all entries where attribute "name" starts with "J" but is not equal to "John".
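The meaning of the four queries above can be spelled out as plain Java predicates over an in-memory list. This is only a sketch of the semantics: the nested Friend class is a cut-down stand-in with the two attributes the queries mention, and QueryMeaningSketch, select, and sample are hypothetical names. Nothing here calls Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class QueryMeaningSketch {

    // Minimal stand-in for the article's Friend class (only two attributes assumed).
    static class Friend {
        final String name;
        final String phoneNumber;

        Friend(String name, String phoneNumber) {
            this.name = name;
            this.phoneNumber = phoneNumber;
        }
    }

    // name:John
    static final Predicate<Friend> NAME_IS_JOHN = f -> f.name.equals("John");
    // name:J*
    static final Predicate<Friend> NAME_STARTS_WITH_J = f -> f.name.startsWith("J");
    // name:John AND phoneNumber:1234
    static final Predicate<Friend> JOHN_WITH_1234 =
            NAME_IS_JOHN.and(f -> f.phoneNumber.equals("1234"));
    // name:J* NOT name:John
    static final Predicate<Friend> J_BUT_NOT_JOHN =
            NAME_STARTS_WITH_J.and(NAME_IS_JOHN.negate());

    // Returns the names of all friends accepted by the predicate, in input order.
    static List<String> select(List<Friend> friends, Predicate<Friend> query) {
        List<String> names = new ArrayList<String>();
        for (Friend f : friends) {
            if (query.test(f)) {
                names.add(f.name);
            }
        }
        return names;
    }

    // A small, made-up address book.
    static List<Friend> sample() {
        return Arrays.asList(
                new Friend("John", "1234"),
                new Friend("John", "9999"),
                new Friend("Jane", "5678"),
                new Friend("Bob", "1234"));
    }

    public static void main(String[] args) {
        List<Friend> friends = sample();
        System.out.println(select(friends, NAME_IS_JOHN));       // [John, John]
        System.out.println(select(friends, NAME_STARTS_WITH_J)); // [John, John, Jane]
        System.out.println(select(friends, JOHN_WITH_1234));     // [John]
        System.out.println(select(friends, J_BUT_NOT_JOHN));     // [Jane]
    }
}
```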

A Plan

We assume all entries in the address book application are stored in a database accessed with Hibernate. This is a typical example, used in plenty of real-world applications. All we need is some code between the search form and the Hibernate API that can understand the query language and produce Hibernate criteria. Figure 4 shows what we want to achieve.

Figure 4. Solution overview

For each query string entered by the user, we invoke QueryParser to build an object representation. Next, QueryInterpreter walks through the object tree and uses HibernateQueryBuilder to create the Hibernate criteria. Hibernate then executes the query and returns the result. Simple, isn't it? We are only missing the amber parts shown in the diagram.

Query Interpreter

Lucene parses the query string and produces an object representation; this is the input for QueryInterpreter. The object representation is a tree of objects whose types extend Lucene's Query class, and each object reflects one part of the query string. The following table shows selected query types. Some represent restrictions; others are logical conditions.

Query Type       Description
TermQuery        Matches documents containing a specific term.
BooleanQuery     Represents combinations of other queries.
WildcardQuery    Criteria with wildcards like * or ?.
PhraseQuery      Sequence of terms.
PrefixQuery      Terms with a prefix.
RangeQuery       Property value range.

For example, the parser reads "name:John" and creates TermQuery. Now you know everything that is necessary to start implementing QueryInterpreter.

First, we create a skeleton method to see what information the TermQuery type carries:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class QueryInterpreter {

    void parse(Query query) {
        if (query instanceof TermQuery) {
            // getTerm() is declared on TermQuery, not on Query, so a cast is needed.
            Term term = ((TermQuery) query).getTerm();
            String fieldName = term.field();
            String fieldText = term.text();
            System.out.println("TermQuery ["+fieldName+":"+fieldText+"]");
        } else {
            // Every other query type is rejected for now.
            throw new IllegalArgumentException("Unsupported Query type [" + query.getClass() + "]");
        }
    }
}

As you can see, a TermQuery consists of a field name and the expected value. In the preceding code, QueryInterpreter recognizes only simple queries such as "name:John". More complicated restrictions are not supported in this example; the class could be extended to handle them, but that is beyond the scope of this article.
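Although a full treatment is out of scope, the general shape of such an extension is a recursive walk over the query tree. The sketch below is self-contained and deliberately avoids Lucene: Node, TermNode, Clause, and BoolNode are hypothetical stand-ins for Lucene's Query, TermQuery, BooleanClause, and BooleanQuery, and the output is just a readable condition string. As a simplifying assumption, every clause that is not prohibited is treated as required, so optional (OR) clauses are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BooleanWalkSketch {

    // Hypothetical stand-in for a node of the parsed query tree.
    interface Node {}

    // Stand-in for a single field:text restriction.
    static class TermNode implements Node {
        final String field;
        final String text;

        TermNode(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // One member of a boolean combination; prohibited corresponds to NOT.
    static class Clause {
        final Node node;
        final boolean prohibited;

        Clause(Node node, boolean prohibited) {
            this.node = node;
            this.prohibited = prohibited;
        }
    }

    // Stand-in for a combination of other queries.
    static class BoolNode implements Node {
        final List<Clause> clauses;

        BoolNode(Clause... clauses) {
            this.clauses = Arrays.asList(clauses);
        }
    }

    // Walks the tree recursively and renders it as a readable condition.
    static String interpret(Node node) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            return t.field + " = '" + t.text + "'";
        } else if (node instanceof BoolNode) {
            List<String> parts = new ArrayList<String>();
            for (Clause c : ((BoolNode) node).clauses) {
                String inner = interpret(c.node);
                parts.add(c.prohibited ? "not (" + inner + ")" : inner);
            }
            return "(" + String.join(" and ", parts) + ")";
        }
        throw new IllegalArgumentException("Unsupported node type [" + node.getClass() + "]");
    }

    public static void main(String[] args) {
        // Models the query: name:John NOT phoneNumber:1234
        Node query = new BoolNode(
                new Clause(new TermNode("name", "John"), false),
                new Clause(new TermNode("phoneNumber", "1234"), true));
        // Prints: (name = 'John' and not (phoneNumber = '1234'))
        System.out.println(interpret(query));
    }
}
```

In a real interpreter the two branches would call builder methods instead of concatenating strings, which keeps the tree walk independent of the target data store.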

Our next step is to have QueryInterpreter invoke a query builder. We start with an interface, followed by a version of QueryInterpreter that delegates to it:

import org.apache.lucene.search.TermQuery;

public interface IQueryBuilder {
    void termQuery(TermQuery query);
}

import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryInterpreter {

    private final IQueryBuilder queryBuilder;

    public QueryInterpreter(IQueryBuilder queryBuilder) {
        this.queryBuilder = queryBuilder;
    }

    private void interpret(TermQuery query) {
        this.queryBuilder.termQuery(query);
    }

    public void parse(Query query) {
        if (query instanceof TermQuery) {
            System.out.println("TermQuery");
            interpret((TermQuery) query);
        } else {
            throw new IllegalArgumentException("Unsupported Query type [" + query.getClass() + "]");
        }
    }
}

For every TermQuery occurrence, QueryInterpreter invokes the IQueryBuilder.termQuery method. This is somewhat similar to how a SAX parser reports XML input events.

Hibernate Query Builder

Now we can develop the query builder. As mentioned before, this example handles only TermQuery to keep things manageable, and a TermQuery is very easy to translate into Hibernate criteria.

Here is the implementation:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.hibernate.Criteria;
import org.hibernate.criterion.Criterion;
import org.hibernate.criterion.Restrictions;

public class HibernateQueryBuilder implements IQueryBuilder {

    private final Criteria criteria;

    public HibernateQueryBuilder(Criteria criteria) {
        this.criteria = criteria;
    }

    public void termQuery(TermQuery query){
        Term term = query.getTerm();
        String field = term.field();
        Criterion crit;
        crit = Restrictions.eq(field, term.text());
        this.criteria.add(crit);
    }
}

The Hibernate query is based on Criteria. This is why we require a Criteria instance to create HibernateQueryBuilder. Please check the Hibernate documentation for details on how Criteria works.

Building restrictions based on TermQuery is very easy with the pieces we have built. The sample query "name:John" corresponds to the HQL (Hibernate Query Language) condition name = 'John' on the Friend entity. That is all. Our simple tool now handles any TermQuery and builds the proper Hibernate criteria. Things get more complicated if you want to support the other query types available in the Lucene API.
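For readers who think in HQL rather than Criteria, the following sketch shows how the same term restrictions might be collected into an HQL string with named parameters. It is not how the Criteria API works internally, and it is not part of the article's code: HqlTermSketch, the alias f, and the parameter names p0, p1, and so on are all assumptions made for illustration. The user's text is kept out of the query string and passed as a parameter value, and the field name is checked against a simple identifier pattern before it is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it only builds strings and never touches Hibernate.
class HqlTermSketch {

    private final StringBuilder where = new StringBuilder();
    private final Map<String, Object> params = new LinkedHashMap<String, Object>();

    // Adds "f.<field> = :pN" to the where clause and stores the value under pN.
    void term(String field, String text) {
        if (!field.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unexpected field name [" + field + "]");
        }
        String param = "p" + params.size();
        if (where.length() > 0) {
            where.append(" and ");
        }
        where.append("f.").append(field).append(" = :").append(param);
        params.put(param, text);
    }

    // Returns the HQL text; the entity name Friend comes from the article's example.
    String hql() {
        return where.length() == 0 ? "from Friend f" : "from Friend f where " + where;
    }

    // Returns the named parameter values in the order they were added.
    Map<String, Object> parameters() {
        return params;
    }

    public static void main(String[] args) {
        HqlTermSketch builder = new HqlTermSketch();
        builder.term("name", "John");
        System.out.println(builder.hql());        // from Friend f where f.name = :p0
        System.out.println(builder.parameters()); // {p0=John}
    }
}
```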

Lights, Camera, Action!

Let's put this all together into an example that uses the code developed so far:

import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.hibernate.Criteria;

public class Test {

    public static void main(String[] args) throws Exception {
        // Setup Hibernate
        Criteria criteria=...;

        // Parse the query
        Analyzer analyzer = new WhitespaceAnalyzer();
        QueryParser luceneParser = new QueryParser("name", analyzer);
        String queryString = "name:John";
        Query luceneQuery = luceneParser.parse(queryString);

        // Build Hibernate criteria based on the Lucene query.
        IQueryBuilder queryBuilder=new HibernateQueryBuilder(criteria);
        QueryInterpreter qi=new QueryInterpreter(queryBuilder);
        // Add criteria
        qi.parse(luceneQuery);

        // Retrieve list of objects that fulfill criteria
        List list = criteria.list();            
        // ... 
    }
}

The preceding code is missing Criteria creation, but that is well described in the Hibernate manual, and we leave it as an exercise for the reader.

The example is straightforward. First we need the Lucene query parser and analyzer to "read" the query. WhitespaceAnalyzer is one of many implementations. QueryParser uses Analyzer to properly understand the query string. With Query in hand, we create a new HibernateQueryBuilder. The builder requires valid Criteria to work. Then we construct the QueryInterpreter. All we have to do now is to use QueryInterpreter to add the criteria. This is what qi.parse(luceneQuery) does. The last line gives us a list of objects conforming to search constraints.

Conclusion

We hope you see how easy it is to use the Lucene query parser on its own, without the rest of Lucene, alongside an existing search solution in a legacy application. The ideas in this article were born during the development of a real application and have been tested in a production environment.

In our environment, we have found great flexibility in expressing search criteria, simplicity in implementation, and an added benefit of being able to cache search results. We have created a demo application to let you play with the ideas in this article and take them further. The demo is built with Lucene Query API, Hibernate, and the HSQLDB database.

What Next?

Thanks to the decoupled design, this same technique can be used for other data stores. Why not adapt your existing search to LDAP or the Java Persistence API? (See Figure 5.)
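As a rough illustration of what an LDAP variant could look like, the sketch below turns term restrictions into an LDAP search filter string. It is an assumption about one possible design, not the project's actual code: LdapFilterSketch and its methods are hypothetical, the attribute names are simply reused from the address book example (a real directory schema would likely use different ones), and multiple terms are combined with AND. Values are escaped following the rules for LDAP filter strings so that characters such as * and parentheses are treated literally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder that produces an LDAP filter string from term restrictions.
class LdapFilterSketch {

    private final List<String> parts = new ArrayList<String>();

    // Escapes the characters that have special meaning inside an LDAP filter value.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\u0000': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Adds an equality restriction such as (name=John).
    void term(String attribute, String text) {
        parts.add("(" + attribute + "=" + escape(text) + ")");
    }

    // Returns a single filter, wrapping several restrictions in an AND group.
    String filter() {
        if (parts.isEmpty()) {
            throw new IllegalStateException("No restrictions added");
        }
        if (parts.size() == 1) {
            return parts.get(0);
        }
        return "(&" + String.join("", parts) + ")";
    }

    public static void main(String[] args) {
        LdapFilterSketch builder = new LdapFilterSketch();
        builder.term("name", "John");
        builder.term("phoneNumber", "1234");
        System.out.println(builder.filter()); // (&(name=John)(phoneNumber=1234))
    }
}
```

Because the interpreter only talks to a builder interface, swapping a Hibernate builder for something like this would not require changes to the code that walks the query tree.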

Figure 5. Future implementations

Development work on this is already in progress. You are welcome to join this growing project and help improve it.


Marcin Maciukiewicz is an experienced architect and developer. He has worked on a wide range of projects, from small web applications to high-availability enterprise solutions.

Daniel Owsiański is an independent IT consultant and author of webankieta.pl, a leading Polish online survey system.



Copyright © 2009 O'Reilly Media, Inc.