ONJava.com -- The Independent Source for Enterprise Java

Putting XML in LDAP with LDAPHttp

Using SAX to Get What You Want

Parsing XML can be an expensive task. In scanning a file like an RSS feed over the network to obtain only a small subset of the information therein, we would prefer to read once as a stream and retain only what we need. Hence, SAX is the right tool for the job. To do the work, we will nest a subclass of a SAX DefaultHandler inside of our rep class. This handler will tap the RDF of RSS v1.0, the rdf:about attribute for item elements in particular, to identify the precise post to which Charlie wants to respond. In other words, Charlie submits an article ID, a rep class method converts this to an appropriate substring, the substring is passed to the handler, and the handler looks for the matching item as it parses.



public void startElement(String namespaceURI,
	String localName, String qName, Attributes atts)
	throws SAXException {

	// store the current element name
	this.current_element = localName;

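	// indicate if within a relevant parent tag
	if ( localName.equalsIgnoreCase("item") ) {
		// skip items that lack rdf:about; a match at index 0 also counts
		String about = atts.getValue("rdf:about");
		if (about != null && about.indexOf(this.item_substring) >= 0) {
			this.item_found = true;
			this.in_item    = true;
		}
	}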
}

Once inside the appropriate item, the handler will fill buffers with the bits we want.

public void characters(char[] ch, int start, int length)
	throws SAXException {

	// store information from the relevant item
	if (in_item) {
		if ( this.current_element.equalsIgnoreCase("title") ) {
			this.title_buffer.append( new String(ch, start, length) );

		} else if ( this.current_element.equalsIgnoreCase("link") ) {
			this.link_buffer.append( new String(ch, start, length) );

		} else if ( this.current_element.equalsIgnoreCase("description") ) {
			this.description_buffer.append( new String(ch, start, length) );

		} else if ( this.current_element.equalsIgnoreCase("creator") ) {
			this.creator_buffer.append( new String(ch, start, length) );
		}
	}
}

The values are then available with standard get methods that convert the buffers to trimmed strings and return them individually or as a composite. Another method indicates whether the specific item was in fact found in the feed by returning item_found. The class is rounded out with the usual error methods. A lot of work for a little data? Perhaps, but SAX doesn't get much simpler and we will only need one handler.
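
To see the pieces working together, here is a minimal, self-contained sketch of the same idea. It is not the handler shipped with the article's code: the class name ItemHandlerSketch and its parse() helper are hypothetical, it collects only the title, and it assumes a namespace-aware parser obtained through JAXP. It also adds an endElement() method, which the excerpts above do not show, so that collection stops once the matching item closes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; names are not from the article's code.
class ItemHandlerSketch extends DefaultHandler {
	private static final String RDF_NS =
		"http://www.w3.org/1999/02/22-rdf-syntax-ns#";

	private final String itemSubstring;
	private final StringBuilder title = new StringBuilder();
	private String currentElement = "";
	private boolean inItem = false;
	private boolean itemFound = false;

	ItemHandlerSketch(String itemSubstring) {
		this.itemSubstring = itemSubstring;
	}

	@Override
	public void startElement(String uri, String localName,
			String qName, Attributes atts) {
		currentElement = localName;
		if (localName.equals("item")) {
			// look the attribute up by namespace URI rather than by prefix
			String about = atts.getValue(RDF_NS, "about");
			if (about != null && about.contains(itemSubstring)) {
				inItem = true;
				itemFound = true;
			}
		}
	}

	@Override
	public void characters(char[] ch, int start, int length) {
		// characters() may be called several times per element, so append
		if (inItem && currentElement.equals("title")) {
			title.append(ch, start, length);
		}
	}

	@Override
	public void endElement(String uri, String localName, String qName) {
		currentElement = "";
		if (localName.equals("item")) {
			inItem = false;
		}
	}

	boolean itemFound() { return itemFound; }

	String getTitle() { return title.toString().trim(); }

	// Convenience helper: parse an in-memory document and return the handler.
	static ItemHandlerSketch parse(String xml, String itemSubstring) {
		try {
			SAXParserFactory factory = SAXParserFactory.newInstance();
			factory.setNamespaceAware(true);
			XMLReader reader = factory.newSAXParser().getXMLReader();
			ItemHandlerSketch handler = new ItemHandlerSketch(itemSubstring);
			reader.setContentHandler(handler);
			reader.parse(new InputSource(new StringReader(xml)));
			return handler;
		} catch (Exception e) {
			throw new IllegalStateException("Unable to parse the feed", e);
		}
	}
}
```

Looking up rdf:about by namespace URI instead of by the literal prefix is a design choice of this sketch; it keeps working if a feed binds the RDF namespace to a different prefix.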

The Urge to preCreate

The code for the gateway create servlet primarily constructs a new entry in memory from attribute values submitted with the request, then attempts to add it to the directory. Between these two steps is a call to the object's preCreate() method, which does nothing by default. We override this method in our rep class to perform the RSS parsing and to populate additional attributes with derived values.

public void preCreate() throws LDAPHttpException {

	// perform standard comment precreation
	super.preCreate();

	// retrieve and parse the RSS feed
	String uri = getFeed();
	RSS1Handler rss_handler = new RSS1Handler();
	rss_handler.setItemSubstring( getItemSubstring() );

	try {
		XMLReader reader = XMLReaderFactory.createXMLReader(PARSER_CLASS);
		reader.setContentHandler(rss_handler);
		reader.setErrorHandler(rss_handler);
		InputSource input_source = new InputSource(uri);
		reader.parse(input_source);

	// handle feed retrieval exceptions
	} catch(IOException e) {
		throw new LDAPHttpException(
			"Unable to fetch the RSS feed: " + e.getMessage() );

	// handle feed parsing exceptions
	} catch(SAXException e) {
		throw new LDAPHttpException(
			"Unable to parse the RSS feed: " + e.getMessage() );
	}

	// update attribute values with those found in the feed
	if ( rss_handler.itemFound() ) {
		resetAttribute("cn", new String[] {rss_handler.getTitle()} );
		resetAttribute("description", new String[] {
			rss_handler.getDescription()} );
		resetAttribute("link", new String[] {
			rss_handler.getLink()} );

	// handle the case where the item wasn't found in the feed
	} else {
		throw new LDAPHttpException("An item matching <i>"
			+ getItemSubstring()
			+ "</i> was not found in the current feed: " + getFeed() );
	}
}

The feed URI is defined in the constructor for the particular subclass, along with some information used in retrieval. For instance:

public class registerrep extends rep {
	public registerrep() throws LDAPHttpException {

		setCategory("register");
		setLabel("The Register news reply");
		setFeed("http://www.theregister.co.uk/tonys/slashdot.rdf");
	}
}
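
The hook arrangement described in this section (build the entry from the request, call preCreate(), then add) can be sketched in isolation. The classes below are hypothetical stand-ins, not LDAPHttp's actual API: EntrySketch plays the role of the gateway's entry class, a plain map stands in for the directory add, and the articleid attribute is invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a gateway entry class; not LDAPHttp's real API.
class EntrySketch {
	protected final Map<String, String> attributes = new LinkedHashMap<>();

	// Hook called between construction and the directory add; no-op by default.
	void preCreate() { }

	// Populate from the request, run the hook, then "add" the entry.
	final Map<String, String> create(Map<String, String> submitted) {
		attributes.putAll(submitted);
		preCreate();
		return attributes; // a real gateway would add the entry to the directory here
	}
}

// Hypothetical subclass that derives an extra attribute before the add.
class FeedEntrySketch extends EntrySketch {
	private final String feed;

	FeedEntrySketch(String feed) {
		this.feed = feed;
	}

	@Override
	void preCreate() {
		// derive a value from what the user submitted plus subclass configuration
		attributes.put("link", feed + "#" + attributes.get("articleid"));
	}
}
```

Because the hook throws before the add when something goes wrong (as the real preCreate() does with LDAPHttpException), a failed feed lookup never leaves a half-populated entry in the directory.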

An Entry You Can Count On

Looking back at the DIT diagram and the Rep LDIF, you are probably still wondering about c=CRAcounter and uid=CRA42. While most directory servers manage a few attributes at the system level for things like timestamps, there is nothing in LDAP that will auto-increment or define a primary key for your entries when you add them; the server expects this information from you at creation time. As I've learned, the best practice for solving this problem is to store and manage an incrementing identifier value in the directory itself. Because it's small, standard, and doesn't appear in this picture, we'll hijack the country object class for our counter entry.

dn: c=CRAcounter, ou=Reps, ou=Comments, ou=Expressions, o=mentata.com
objectclass: top
objectclass: country
c: CRAcounter
description: 0

With this loaded, LDAPHttp and the create servlet will automatically grab and set a unique uid value for each new post per this line in the rep constructor:

setIncremental("CRA", "c=CRAcounter, ou=Reps, ou=Comments, " 
	+ "ou=Expressions, o=mentata.com", "description");

And "Bob's your uncle," as they say in Australia.
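
How setIncremental() does its work internally is not shown here, so the following is only a sketch of one common way to bump a counter stored in a directory entry using JNDI. It assumes the counter attribute holds the last value issued. The class and method names are hypothetical. The trick is to delete the exact value you read and add its successor in a single modify request: LDAP applies the modifications in one request atomically, and deleting a value that is no longer present fails, so a second client racing on the same counter gets an error and can retry instead of reusing an identifier.

```java
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.ModificationItem;

// Hypothetical helper for illustration; not part of LDAPHttp.
class CounterSketch {

	// Build the identifier that follows the value read from the counter entry.
	static String nextUid(String prefix, long current) {
		return prefix + (current + 1);
	}

	// Build a delete-old-value / add-new-value pair for one modify request.
	static ModificationItem[] incrementMods(String attribute, long current) {
		return new ModificationItem[] {
			new ModificationItem(DirContext.REMOVE_ATTRIBUTE,
				new BasicAttribute(attribute, Long.toString(current))),
			new ModificationItem(DirContext.ADD_ATTRIBUTE,
				new BasicAttribute(attribute, Long.toString(current + 1)))
		};
	}
}
```

With a connected DirContext, the array would be passed to modifyAttributes() along with the counter entry's dn; a NamingException from that call would be the signal to reread the counter and try again.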

But Why, Daddy?

I'm sure to some this may look like a profound waste of time. I've provided motivation for LDAP and LDAPHttp elsewhere, but this example raises the question: why would you take data from one available format and store it in another for use in closed little web applications?

XML is an excellent way to express simple or complex textual information openly, and it has become every bit the de facto standard for general data representation that I predicted last year. On the other hand, what do we always say about silver bullets? Although there are mechanisms for indexing XML content, if you want to search your data by fields or dynamically re-express it, odds are you want it in a database of some sort. Relational database systems are more than powerful enough, but can be overkill for some needs, as they require lots of administration and a potentially stilted process of partitioning data into two-dimensional views. Directory databases are simpler, excel at search, and are well suited to hosting information that is available live but doesn't change frequently once created. You can make use of identities and sophisticated schemes for access control without new software, and the results are accessible to any client or API that speaks LDAP.

The question of whether to use LDAPHttp and my gateway here may be more to the point. LDAP is a mature standard, so you can bet there are and will be plenty of ways to communicate with directory servers; promising new open source apps are being released with increasing frequency. With LDAPHttp, my own goal has been to deliver a platform that plays to the specific strengths of LDAP, servlets, and HTTP to do useful and interesting things without regard to what those things are. The framework may be non-standard, but it's as extensible as Java itself. LDAPHttp is clearly not a panacea, but it will provide elegant solutions for appropriate problems. Good ideas can organically bubble up from contexts to app libraries or the core packages. Someday, this software may serve as a competitive advantage in some vertical market of my choosing, but for now, I've deferred the question of what in the hope that others can use my work to prototype, demonstrate, and deliver unique services of their own. Think about Charlie.

The Larger Conversation

If I had to pick a space today, I'd say I am particularly interested in supporting transactions that involve the exchange of text (e.g., news and weblogs). Since a Rep is a solicitation, the HTML page returned by the gateway retrieve servlet will include a form for creating a new comment under the ou=Anonymous,ou=Comments,ou=Expressions,o=mentata.com branch of the DIT. Hence, Charlie's earlier Rep could provoke an entry like this:

dn: uid=CMA217,ou=Anonymous,ou=Comments,ou=Expressions,o=mentata.com
uid: CMA217
cn: India's endorsement?
businesscategory: rep
dnqualifier: 20030622123345Z
description: anonymously
content: I don't know what Bombay developers would think of all this,
	but you're going to have to do better if you want them
	to read your posts.
parent: uid=CRA3,ou=Reps,ou=Comments,ou=Expressions,o=mentata.com

That final parent attribute is a special type used by LDAP to relate entries by dn value. Think of it as a pointer or foreign key. One of the features of LDAPHttp is to allow you to trace dn attributes in either direction, providing (among other things) links to comments made on a retrieved Rep. This all works well because a request for an entry by its dn or a request for entries with values for an indexed dn attribute matching a given distinguished name will both run like streaks of greased lightning through LDAP. To me, a good candidate application for LDAPHttp should involve lots of dn attributes. So is Charlie's app a good one? It depends on how much dialogue his posts generate! Even so ...

<comment>Much like open source developers, reporters and columnists frequently exchange and co-opt the ideas of their fellows without much concern for abstract notions of property. In fact, James Joyce goes so far as to make incest the central metaphor for journalism in the Aeolus episode of Ulysses. On the Internet today, with all of that fast and easy communication facilitated between people worldwide in real time, commentary as a profession may soon be overwhelmed by commentary as a diversion. RSS in its many flavors makes it a snap to generate your own syndicated feed, blasting your observations and perspective far and wide. Along for the ride are the expressions of others as they have influenced you. This is all a boon for an open, democratic society, but the real value is not in the posting, but in the responding. The best place for that is in the original conversation. Sorry Charlie, but I don't think anybody should be an island.</comment>

That doesn't mean it was such a bad example for this article; we covered a lot. And with this journey at an end, I will employ yet another class from my forum package to start new conversations, asking my perennial favorite question: suggestions?

Jon Roberts is an independent software developer and sole proprietor of Mentata Systems.

