SearchAssist: A Portable Search Engine in Java
Pages: 1, 2
Search Engine Applet Implementation
Under the Hood
The ternary search tree is the heart of the applet. The index is
deserialized from the file and loaded into memory in the applet's
init() method. To navigate the tree (i.e., to query the index), the
text content of the JTextField is used. It is wired to listen to
keyPressed and keyReleased events using the
addKeyListener(KeyListener) method. For every character pressed,
the current position in the index moves deeper into the tree, provided the
prefix exists.
To provide the auto-completion functionality, the JTextField is
customized through inheritance. The AutoCompleteTextField has an
anonymous inner class that extends the KeyAdapter to listen to
keyPressed and keyReleased events. The AutoCompleteTextField uses
an AutoCompleteDictionary to do a
lookup(textFieldContents) of the typed text. The
AutoCompleteDictionary is an interface, so that different
implementations can be plugged in. The lookup(textFieldContents)
method returns a string that is the auto-completion text suggested by the
AutoCompleteDictionary. If the string is empty,
textField remains unchanged, indicating that the search word does
not exist. See AutoCompleteTextField.java, excerpted here:
protected void setUp()
{
addKeyListener(new KeyAdapter()
{
public void keyPressed(KeyEvent e)
{
isTextSelected =
getSelectionStart()
!= getSelectionEnd();
}
public void keyReleased(KeyEvent e)
{
char charPressed = e.getKeyChar();
int charCodePressed = e.getKeyCode();
if (charCodePressed == KeyEvent.VK_DELETE
|| charPressed == KeyEvent.CHAR_UNDEFINED)
{
return;
}
if (getSelectionStart()
!= getSelectionEnd())
{
setText(
getText().substring(
0,
getSelectionStart()));
}
String input = getText();
if (lookup(input) != null)
{
setText(lookup(input));
setSelectionStart(input.length());
setSelectionEnd(getText().length());
isTextSelected = true;
}
else
{
isTextSelected = false;
}
if (charCodePressed
== KeyEvent.VK_BACK_SPACE
&& isTextSelected
&& input.length() > 0)
{
setText(
input.substring(0, input.length()));
}
}
});
}
Figure 7 shows the package structure of the applet.

Figure 7. The package structure
Because the ternary search tree is the dictionary here, it can be used
directly to provide the auto-completion text. The TSTDictionary
implements the AutoCompleteDictionary's
lookup(textFieldContents) method by invoking the
matchPrefixString(textFieldContents) on the ternary search tree it
contains internally. If the tree has a keyword whose prefix matches or
completely matches the textFieldContents, the keyword is returned.
Otherwise, if there are no matches it returns null. Apart from returning the
closest match, the TSTDictionary populates the
JTextArea with the remaining near-matches. The size()
of the lists of keywords contained is displayed alongside the keywords in
square brackets indicating the number of hits. See
TSTDictionary.java, excerpted here:
public String lookup(String s)
{
String textFieldContents = textField.getText();
if (textFieldContents == ""
|| textFieldContents == null)
{
return null;
}
String matchingKeys = tst.matchPrefixString(textFieldContents);
textArea.setText(matchingKeys);
String match = tst.matchPrefixString(textFieldContents, 1);
if (match != null)
{
if (match == "")
{
match = s;
}
else
{
int markerLocation =
match.indexOf(
TernarySearchTree.MARKER_HIT_OPEN)
- 1;
if (markerLocation > 0)
{
match = match.substring(0, markerLocation);
}
}
}
return match;
}
The Custom JTextField
When the user presses the Enter key in the
AutoCompleteTextField, the ActionListener's
actionPerformed() method will be invoked. This method retrieves
the list of unique IDs from the ternary search tree using the
textField contents. If the textField contains a
keyword that exists in the tree, it will be auto-completed. The list of unique
IDs thus obtained is rendered on the right-hand side. See
TSTApplet.java, excerpted here:
public void init()
{
super.init();
final String serTSTFile =
getParameter(PARAM_SERIALIZED_TST_FILE);
final String idRendererImpl =
getParameter(PARAM_ID_RENDERER_IMPL);
try
{
ObjectInputStream ois =
new ObjectInputStream(
getClass().getResourceAsStream(
serTSTFile));
tst = (TernarySearchTree) ois.readObject();
Class clazz =
getClass().getClassLoader().loadClass(
idRendererImpl);
idRenderer = (IDRenderer) clazz.newInstance();
// Anonymous inner class implementing
// the Private Interface pattern.
Attributes attrs = new Attributes()
{
public String getParameter(String paramName)
{
return TSTApplet.this.getParameter(
paramName);
}
public JSObject getWindow()
{
return JSObject.getWindow(
TSTApplet.this);
}
};
idRenderer.init(attrs);
renderingComponent =
idRenderer.getRenderingComponent();
}
...
initComponents();
dictionary = new TSTDictionary(tst, textField, textArea);
textField.setDictionary(dictionary);
}
private void textFieldActionPerformed(
java.awt.event.ActionEvent evt)
{
String str = textField.getText();
if (str.equals(""))
{
return;
}
List list = tst.get(str);
if (list == null)
{
return;
}
idRenderer.renderNodes(list);
}
Customizable Search Result Renderer
From the beginning, one of the main goals has been to keep the rendering
customizable. To facilitate that, the TSTApplet uses an interface
on which it invokes the renderNodes(List ids) method. This
IDRenderer interface has two lifecycle methods.
The init(Attributes params) method initializes the component
with the Attributes instance, and the
getRenderingComponent() method returns the actual Swing component
that renders the search results. This component is added to the
JPanel on the right-hand side. To TSTApplet is made
aware of the class that implements IDRenderer by specifying its
fully qualified class name as the IDRendererImpl Applet parameter.
Using this, the applet, in the init() method, dynamically loads
the IDRenderer implementation class. See
IDRenderer.java (shown below), TST.html, and
TSTApplet.java.
/**
* Order of invocation: (setApplet) - Once
* (getRenderingComponent) - Once
* (renderNodes) - Zero or more
*/
public interface IDRenderer
{
public void init(Attributes params);
public JComponent getRenderingComponent();
public void renderNodes(List ids);
}
Rendering HTML Using JEditorPane
The index, in this case, is a map of keywords present in the
Title tags of Categorys and Items to
their unique IDs. Consequently, the IDRenderer implementation
must display the XML fragment marked by the uniqueID. In order to
do that, the XML file containing only the unique IDs (without the keywords) must be available to the Applet. Although the XML file with the
IDs and keywords will serve the purpose, the additional keyword tags increase
the file size and the memory required to load it. The IDRenderer
that does this is XPathNodeAccessHTMLRenderer.
On instantiation, the XML file is loaded as a DOM document. To allow the
XML file to be switched or replaced with an updated file, the name and location
are not hardcoded. The XMLWithIdsFile Applet parameter points to
the actual location. Even the index file's location is not hardcoded. It is
instead fetched using the SerializedTSTFile Applet parameter.
These two input files are compressed and placed in a separate .zip file called
DataFiles.zip that should be present in the Applet's
classpath. See XPathNodeAccessHTMLRenderer (shown here) and
TST.html.
public void init(Attributes attrs)
{
...
// Create a builder factory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
DocumentBuilder docBuilder = factory.newDocumentBuilder();
// Create the builder and parse the file
final String xmlIDsFile = attrs.getParameter(PARAM_XML_WITH_IDS_FILE);
Document inDoc = docBuilder.parse(
getClass().getResourceAsStream( xmlIDsFile ));
doc = inDoc;
...
postInit(attrs);
}
protected void preRenderNodes(
List ids,
StringBuffer strBuf)
{
editorPane.setCursor(waitCursor);
strBuf.append(
"<head><base></head><body>\n");
}
public void renderNodes(List ids)
{
StringBuffer stringBuffer = new StringBuffer();
preRenderNodes(ids, stringBuffer);
for (Iterator iter = ids.iterator(); iter.hasNext();)
{
String id = (String) iter.next();
try
{
String part = "[@uniqueID='" + id + "']";
renderNode(
"//category"
+ part
+ " | //item"
+ part,
stringBuffer);
}
catch (TransformerException e)
{
e.printStackTrace();
}
}
postRenderNodes(ids, stringBuffer);
}
protected void postRenderNodes(
List ids,
StringBuffer strBuf)
{
strBuf.append("</body></html>");
editorPane.setText(strBuf.toString());
editorPane.setCursor(cursor);
}
XPath to the Rescue
The XPathNodeAccessHTMLRenderer uses a JEditorPane
to render the XML fragment as HTML with links. That's why
getRenderingComponent() returns a JEditorPane. When
the user presses the Enter key in the textField to request the
search results, the AutoCompleteTextField's
ActionListener fetches the list of unique IDs for the search word,
if it exists, sending it to the IDRenderer's
renderNodes(List ids).
For each uniqueID in the list, the
renderNodes(...) method uses XPath to retrieve the XML fragment
pointed to by the ID. Since the elements can only be either
Category or Item tags, the XPath query is fine-tuned
to improve search speeds. With the Element thus obtained, simple
HTML lists are generated by iterating through it recursively. The
Title tags become the List items in bold, and the
Links become HTML anchor tags displayed as nested HTML lists. See
XPathNodeAccessHTMLRenderer.java.
Java-JavaScript Communication
When a Link is clicked in the JEditorPane, a new
browser window opens to display the URL. This requires Netscape LiveConnect to
do the Java-JavaScript communication. See LinkClickListener.java,
excerpted here:
public class LinkClickListener implements HyperlinkListener
{
private JSObject window;
public LinkClickListener(Attributes attrs)
{
window = attrs.getWindow();
}
public void hyperlinkUpdate(HyperlinkEvent event)
{
if (event.getEventType()
== HyperlinkEvent.EventType.ACTIVATED)
{
String[] args =
new String[] {
event.getURL().toExternalForm(),
"",
"" };
window.call("open", args);
}
}
}
HTML Renderer Using the Mozilla Sidebar
Because the search result rendering is designed to be customizable,
transforming the applet into a Mozilla Sidebar is quite simple. The
XPathNodeAccessHTMLRenderer is extended by another class, the
MozillaSidebar, to reuse most of the XML to HTML transformation.
Instead of rendering the resulting HTML in the right-hand-side
JEditorPane, it is rendered in a small browser window using
JavaScript through LiveConnect, as shown in Figure 8.
Figure 8. SearchAssist in the Mozilla Sidebar
The getRenderingComponent() returns a JPanel of
zero size. So the right-hand side is almost invisible, leaving only the
JTextField and the JTextArea visible, which fits into
the Sidebar panel on the left-hand side of the browser. See
MozillaSidebar.java and TST v1.1.html, excerpted
here:
public class MozillaSidebar
extends XPathNodeAccessHTMLRenderer
{
private static final boolean DEBUG = true;
private JSObject window;
public MozillaSidebar()
{
super();
}
public JComponent getRenderingComponent()
{
JPanel panel = new JPanel(null);
panel.setSize(0, 0);
return panel;
}
protected void postInit(Attributes attrs)
{
window = attrs.getWindow();
}
protected void preRenderNodes(
List ids,
StringBuffer strBuf)
{
strBuf.append(
"<head><title>Search results</title>"
+ "<base></head><body>\n");
}
protected void postRenderNodes(
List ids,
StringBuffer strBuf)
{
strBuf.append("</body></html>");
window.call(
"renderHTML",
new Object[] { strBuf.toString()});
}
}
<!-- JavaScript code accompanying the Applet to render the search results
in a Browser pop-up window -->
<script language="JavaScript">
function renderHTML(str)
{
newWindow = window.open("","SearchResults",
"height=300,width=350,resizable");
newWindow.document.write(str);
newWindow.document.close();
newWindow.focus();
return true;
}
</script>
The .jar file must be signed, because the XML parser reads some
System properties that are not available to unsigned
Applets.
Customization Through Applet Parameters
The SerializedTSTFile, IDRendererImpl, and the
XMLWithIdsFileApplet parameters are provided to make the necessary
customizations. The Archive parameter is used to supply DataFiles.zip, which contains all the data files, especially the serialized index and the XML file.
<!-- Applet code using XPathNodeAccessHTMLRenderer -->
<APPLET CODE = "searchassist.ui.TSTApplet"
CODEBASE = "."
ARCHIVE = "SSearchAssist.jar, DataFiles.zip"
WIDTH = "590" HEIGHT = "600"
ALIGN = "baseline">
<PARAM NAME = SerializedTSTFile
VALUE ="/resources/tst.ser">
<PARAM NAME = IDRendererImpl
VALUE ="searchassist.ui.XPathNodeAccessHTMLRenderer">
<PARAM NAME = XMLWithIdsFile
VALUE ="/test/output_index_mod.xml">
</APPLET>
<!-- Applet code using MozillaSidebar -->
<APPLET CODE = "searchassist.ui.TSTApplet"
CODEBASE = "."
ARCHIVE = "SSearchAssist.jar, DataFiles.zip"
WIDTH = "163" HEIGHT = "450"
ALIGN = "baseline">
<PARAM NAME = SerializedTSTFile
VALUE ="/resources/tst.ser">
<PARAM NAME = IDRendererImpl
VALUE ="searchassist.ui.MozillaSidebar">
<PARAM NAME = XMLWithIdsFile
VALUE ="/test/output_index_mod.xml">
</APPLET>
Conclusion
SearchAssist is limited by memory consumption, as the whole index is loaded all at once. The XML file also takes up memory when loaded using DOM. More mature index systems like JavaHelp should be used for larger indices.
Resources
- Ashwin Jayaprakash's web site, where the complete code and binaries of SearchAssist and other hobby projects are available.
- The web site of Matt Welsh, whose AutoComplete
JTextFieldis used in SearchAssist. - Wally Flint's Ternary Search Tree library. I have modified Wally Flint's code to hold
Lists instead ofObjects. - Ashoka Polpitiya's list of stop words.
- Sun's JAXP 1.1 tutorials were very helpful.
- James Newkirk's Private Interface.
Requirements
- JDK 1.4, for built-in XML parsers.
- JAWS.jar, for LiveConnect.
Ashwin Jayaprakash is a software engineer at BEA Systems R & D Centre, Bangalore, using Java/J2EE to develop WebLogic Integration products.
Return to ONJava.com.