ONJava.com -- The Independent Source for Enterprise Java

XML Parsing in a Producer-Consumer Model

by Prabu Arumugam
10/08/2003

XML plays a vital role in integrating business-to-business applications. To parse XML files, these applications use either a Simple API for XML (SAX) parser or a Document Object Model (DOM) parser. Parsing in single-threaded applications is straightforward. It is considerably more complex in a multithreaded application such as an application server, where a dedicated thread often parses the XML and must serve the parsed data to many concurrently running threads. This article describes one implementation of parsing XML in concurrent applications.

Design Approach

Based on producer-consumer concurrent programming concepts, this design uses a dedicated thread as the producer, which parses the XML, and a group of threads as consumers. As the producer thread parses XML data, it stores the data in a shared data structure from which the consumer threads pick it up for further processing. To maximize throughput and minimize memory usage, the producer stores parsed data in, and the consumers retrieve it from, a special fixed-size queue.
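The overall architecture can be sketched with the standard library's bounded queue. This is an illustrative sketch, not the article's code: it swaps the SmartQueue for java.util.concurrent.ArrayBlockingQueue (available since Java 5; the lambda needs Java 8), and the class name ProducerConsumerSketch, the string records, and the END marker are hypothetical. One producer puts records into a queue that holds at most two items, a group of consumer threads takes them, and an end marker tells the consumers to stop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerSketch {
    // Hypothetical end-of-stream marker; it must never collide with a real record.
    private static final String END = "END";

    public static List<String> run(int records, int consumers) throws InterruptedException {
        // Bounded buffer: put() blocks while 2 items are queued, take() blocks while empty.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        List<String> consumed = Collections.synchronizedList(new ArrayList<>());

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take();
                        if (END.equals(item)) {
                            queue.put(END); // pass the marker on so the other consumers stop too
                            return;
                        }
                        consumed.add(item); // "process" the record
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }

        // The producer; in the article this role is played by the SAX parsing thread.
        for (int i = 0; i < records; i++) {
            queue.put("record-" + i);
        }
        queue.put(END);

        for (Thread t : workers) {
            t.join();
        }
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 3).size() + " records consumed");
    }
}
```

Each consumer puts the marker back before exiting, so a single marker is enough to stop the whole group; the article's SmartQueue reaches the same result with an end-of-document flag that the endDocument() callback sets.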

Smart Queuing

The SmartQueue class provides the producer and consumer threads with queuing functionality. Its primary responsibility is to keep the size of the queue within bounds, preventing overflow and underflow. In other words, the SmartQueue enforces a fixed-length queue policy so that resources are used efficiently. It does this by holding and waking up the appropriate threads at the right time. For instance, if there is no room to add data, the queue holds the producer thread until a consumer thread removes an item; likewise, if the queue is empty, it holds consumer threads until the producer adds an item.

The following code snippet from the SmartQueue shows the implementation of this strategy:

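public synchronized void put(Object data) {
    // block the producer while the queue already holds 2 items (fixed capacity)
    while (list.size() >= 2) {
        try {
            System.out.println("Waiting to put data");
            wait();
        }
        catch (InterruptedException ex) {
            // ignored in this excerpt; the loop re-checks the condition
        }
    }

    list.add(data);
    notifyAll();   // wake consumers waiting in take()
}

public synchronized Object take() {
    // wait until there is data to get,
    // or stop waiting once the end of the document has been signaled
    while (list.isEmpty() && !eof) {
        try {
            System.out.println("Waiting to consume data");
            wait();
        } catch (InterruptedException ex) {
            // ignored in this excerpt; the loop re-checks the condition
        }
    }

    Object obj = null;

    if (!list.isEmpty()) {
        obj = list.remove(0);
    } else {
        // the queue is empty and eof is set: no more data will arrive
        System.out.println("Woke up because end of document");
    }

    notifyAll();   // wake the producer if it is waiting for room
    return obj;
}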

XML Parsing

This design uses the SAX API to parse the XML file for the following reasons:

  • The API reads XML data quickly and efficiently.
  • It does not construct any internal data representation of XML. Instead, it simply delivers data to the application as it encounters XML elements.
  • The SAX API fits very well with the producer-consumer model: each callback can hand data to the consumers as soon as it has been parsed.
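To make the second point concrete, here is a small sketch using the JAXP SAXParserFactory and the SAX DefaultHandler class. The class SaxEventsSketch and the sample XML are hypothetical. The handler only records which callbacks fire: no tree is built, and each event is delivered as soon as the parser reaches the corresponding tag, which is what lets the parsing thread act as a producer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventsSketch {
    // Parses the XML and returns the callbacks in the order the parser fired them.
    public static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<orders><order><id>1</id></order></orders>"));
    }
}
```

For the sample input, the list reads start orders, start order, start id, end id, end order, end orders, and finally endDocument.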

The XMLParserHandler class is a SAX content handler: it implements the callback methods through which the parser delivers XML data. As XMLParserHandler receives data from the parser, it stores each element's text in a hashtable. At the end of each document (each occurrence of the record element named by the handler's elemmark field), XMLParserHandler puts the hashtable in the SmartQueue. The handler goes into a wait state if there is no room in the SmartQueue; the call to put() completes once a consumer thread removes an item from the queue. After the entire XML file has been parsed, XMLParserHandler notifies the consumer threads to stop looking for more documents.

Let's look at the callback methods that store data in the SmartQueue and notify waiting consumer threads. The startElement method creates a new hashtable each time a document (a record element whose name matches elemmark) begins, and it remembers the name of the current element.

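public void startElement( String namespaceURI, String localName,
        String qName, Attributes atts )
    throws SAXException {
    System.out.println(
        " startElement local names............." +
        localName + " " + qName);

    // a new document begins: start a fresh hashtable for its elements
    if (qName.equalsIgnoreCase(elemmark)) {
        doc = new Hashtable();
    }
    elem = qName;

    // discard whitespace collected between elements so it is not
    // prepended to this element's text
    sbData = new StringBuffer();
}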

The endElement method stores the text of each element in the current hashtable. When the record element itself closes, endElement adds the hashtable to the SmartQueue. As mentioned earlier, the SmartQueue holds this thread until there is room for the new item.

public void endElement( String namespaceURI, String localName,
        String qName )
    throws SAXException {
    String s = sbData.toString();

    System.out.println("element " + elem + " character " + s);

    // store the element's text, skipping elements that contain only whitespace
    // (&& short-circuits, unlike the bitwise & operator)
    if (doc != null && s.trim().length() > 0)
        doc.put(elem, s);

    sbData = new StringBuffer();
    System.out.println(" endElement ending element............." + qName);

    // the record element closed: hand the finished hashtable to the consumers
    if (qName.equalsIgnoreCase(elemmark)) {
        System.out.println(
            " endElement ending element............." + localName);

        smartQueue.put(doc);   // blocks while the queue is full
        doc = null;
    }
}
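The excerpts read sbData but do not show where it is filled. In SAX, element text arrives through the characters() callback, and a parser may split one text node across several calls, so the handler has to buffer it. A minimal sketch, assuming sbData is a StringBuffer field that characters() appends to; CharactersSketch and its values table are hypothetical stand-ins for the article's XMLParserHandler and per-document hashtable:

```java
import java.io.StringReader;
import java.util.Hashtable;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharactersSketch extends DefaultHandler {
    // Text of the current element; SAX may deliver it in several chunks.
    private StringBuffer sbData = new StringBuffer();
    private final Hashtable<String, String> values = new Hashtable<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        sbData.setLength(0); // drop whitespace seen before this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // only ch[start .. start+length) belongs to this event
        sbData.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String s = sbData.toString().trim();
        if (s.length() > 0) {
            values.put(qName, s);
        }
        sbData.setLength(0);
    }

    public static Hashtable<String, String> parse(String xml) throws Exception {
        CharactersSketch handler = new CharactersSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<order>\n  <id>42</id>\n  <item>widget</item>\n</order>"));
    }
}
```

Note that the char array passed to characters() may be reused by the parser, which is why the sketch copies the slice into the buffer instead of keeping a reference to the array.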

Finally, the endDocument callback method notifies the consumer threads that the end of the XML document has been reached, so they stop waiting for more data once the queue is empty.

public void endDocument() throws SAXException {
    smartQueue.end();
    System.out.println("End Document.............");
}
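The excerpts call smartQueue.end(), queue.isEmpty(), and queue.onEnd() without showing them. The sketch below is one plausible way to write them, not the article's source: SmartQueueSketch is a hypothetical name, the capacity is a constructor parameter instead of the hard-coded 2, and, unlike the excerpt, put() and take() let InterruptedException propagate to the caller instead of swallowing it. The key point is that end() sets the flag and calls notifyAll(), so consumers blocked in take() wake up, see that no more data will arrive, and return null.

```java
import java.util.LinkedList;

public class SmartQueueSketch {
    private final LinkedList<Object> list = new LinkedList<>();
    private final int capacity;   // the article's excerpt hard-codes 2
    private boolean eof = false;  // set once by the producer via end()

    public SmartQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    public synchronized void put(Object data) throws InterruptedException {
        while (list.size() >= capacity) {
            wait();               // queue full: hold the producer
        }
        list.add(data);
        notifyAll();              // wake consumers waiting for data
    }

    // Returns null only after end() has been called and the queue is drained.
    public synchronized Object take() throws InterruptedException {
        while (list.isEmpty() && !eof) {
            wait();               // queue empty and more data may come
        }
        Object obj = list.isEmpty() ? null : list.removeFirst();
        notifyAll();              // wake the producer if it waits for room
        return obj;
    }

    // Called by the producer: no more data will arrive; wake every waiting thread.
    public synchronized void end() {
        eof = true;
        notifyAll();
    }

    public synchronized boolean onEnd() {
        return eof;
    }

    public synchronized boolean isEmpty() {
        return list.isEmpty();
    }

    public static void main(String[] args) throws InterruptedException {
        SmartQueueSketch q = new SmartQueueSketch(2);
        Thread consumer = new Thread(() -> {
            try {
                Object item;
                while ((item = q.take()) != null) {
                    System.out.println("took " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 0; i < 5; i++) {
            q.put("doc-" + i);    // blocks whenever 2 items are waiting
        }
        q.end();
        consumer.join();
    }
}
```

Because every state change happens inside a synchronized method followed by notifyAll(), a consumer can never miss the end signal, whether it is already waiting or arrives afterward.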

Consumer Threads

Consumer threads remove items from the SmartQueue once the producer thread puts items in the SmartQueue. Each consumer thread will go into a wait state if the SmartQueue is empty. The consumer threads run until the producer thread signals the end of document processing and there are no more items in the SmartQueue.

Here is an example of a consumer thread implementation. It keeps taking data from the SmartQueue until the producer has signaled the end of the document and the queue is empty; in that situation take() returns null.

public void run() {
    Hashtable val;

    // take() blocks while the queue is empty and returns null only
    // after the producer has called end() and the queue is drained
    while ((val = (Hashtable) queue.take()) != null) {
        System.out.println("Obtained by " + this.getName() + " " + val);

        // try {
        //     System.out.println("Simulate lengthy processing...........");
        //     Thread.sleep(2000);
        // }
        // catch(Exception ex){}
    }
}

Benefits

This design provides the following benefits:

  • Parsing and data consumption can occur in parallel.
  • Large XML files can be parsed with a small memory footprint, because only a fixed number of parsed documents are held in the queue at any time.

Extending the Design

The SmartQueue implements a fixed-length queue policy to use memory efficiently. By changing the implementation of its take() and put() methods, you can enforce a different policy. As mentioned earlier, the XMLParserHandler creates a hashtable of XML elements and values; however, this class can be customized to build application-specific objects instead.
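As an illustration of the second customization, the sketch below builds a hypothetical Order object per record instead of a hashtable. The element names order, id, and item, the Order class, and the List that stands in for SmartQueue.put() are all assumptions made for the example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OrderHandlerSketch extends DefaultHandler {
    // Hypothetical domain object replacing the per-document Hashtable.
    public static class Order {
        String id;
        String item;

        @Override
        public String toString() {
            return "Order[id=" + id + ", item=" + item + "]";
        }
    }

    private final List<Order> out;   // stands in for the SmartQueue
    private final StringBuilder text = new StringBuilder();
    private Order current;

    public OrderHandlerSketch(List<Order> out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("order")) {
            current = new Order();   // a new record begins
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (current != null) {
            if (qName.equals("id")) {
                current.id = value;
            } else if (qName.equals("item")) {
                current.item = value;
            } else if (qName.equals("order")) {
                out.add(current);    // the real design would call smartQueue.put(current)
                current = null;
            }
        }
        text.setLength(0);
    }

    public static List<Order> parse(String xml) throws Exception {
        List<Order> orders = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new OrderHandlerSketch(orders));
        return orders;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<orders><order><id>1</id><item>pen</item></order>"
            + "<order><id>2</id><item>ink</item></order></orders>"));
    }
}
```

With typed objects in the queue, consumers no longer need to cast a hashtable and look up values by element name.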

Sample Application

The source.zip file contains a TestProducerConsumerForXML class that takes the XML file as a parameter and runs the application. Follow the instructions below to run the application:

  • Unzip the source.zip file.
  • Run the program TestProducerConsumerForXML with order.xml.

For example:

c:\testarea>java -classpath c:\testarea prodcons.TestProducerConsumerForXML c:\testarea\prodcons\order.xml

Conclusion

This article has presented a method of parsing XML documents with concurrent programming. It has also explained the ideas behind the producer-consumer model, as well as thread coordination.

Prabu Arumugam is a software architect and senior Java developer at Forest Express, LLC.

