ONJava.com -- The Independent Source for Enterprise Java



Parsing and Processing Large XML Documents with Digester Rules

Implementing Custom Rules

Download the complete source code for DBUnitRuleSet and DBUnitFlatRuleSet, with an accompanying Maven project. Below are the implementations of four rules from DBUnitRuleSet: TableRule, TableColumnRule, TableRowRule, and TableRowValueRule. For convenience, the concrete rules can be coded as static inner classes of the RuleSet implementation.

Each rule may handle any combination of the following events:

  • The opening tag of the selected XML element, in begin().
  • The closing tag of the selected XML element, in end().
  • The textual body of the selected XML element, in body().
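To make the three callbacks concrete without pulling in Digester itself, here is a minimal sketch built only on the JDK's SAX parser. MiniRule and MiniDispatcher are hypothetical names and are not Digester's API. Rules are matched by bare element name rather than by Digester's full patterns. Body text is kept on a stack with one buffer per open element, so a parent's text is not lost when a child element starts.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for Digester's Rule: same three callbacks, no-ops by default.
abstract class MiniRule {
  void begin(String name, Attributes att) {}
  void body(String name, String text) {}
  void end(String name) {}
}

// Hypothetical SAX handler that routes parser events to rules by element name.
class MiniDispatcher extends DefaultHandler {
  private final Map<String, MiniRule> rules = new HashMap<>();
  // One body-text buffer per currently open element
  private final Deque<StringBuilder> texts = new ArrayDeque<>();

  void addRule(String element, MiniRule rule) {
    rules.put(element, rule);
  }

  @Override
  public void startElement(String uri, String local, String qName, Attributes att) {
    texts.push(new StringBuilder());
    MiniRule rule = rules.get(qName);
    if (rule != null) rule.begin(qName, att);   // opening tag
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (!texts.isEmpty()) texts.peek().append(ch, start, length);  // innermost element
  }

  @Override
  public void endElement(String uri, String local, String qName) {
    String text = texts.pop().toString().trim();
    MiniRule rule = rules.get(qName);
    if (rule != null) {
      rule.body(qName, text);   // element body
      rule.end(qName);          // closing tag
    }
  }

  // Parses an XML string; parser errors are rethrown unchecked to keep the sketch short.
  void parse(String xml) {
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), this);
    } catch (Exception e) {
      throw new IllegalArgumentException("cannot parse XML", e);
    }
  }
}
```

For example, register a rule for table that logs in begin() and end(), and a rule for column that logs in body(). Parsing `<table name="T1"><column>col1</column></table>` then produces the sequence begin table, body column col1, end table.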

In this example, when handling the opening <table> element, TableRule pushes a child copy of the parent context onto the Digester stack, sets its TABLE_NAME entry from the name attribute, and creates a new TABLE_COLUMNS List for the column names. At the closing </table> element, it pops the current child context off the Digester stack.

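// Imports needed by the enclosing RuleSet class
import java.sql.*;
import java.util.*;
import org.apache.commons.digester.Rule;
import org.xml.sax.Attributes;

private static class TableRule extends Rule {

  // <table name="..."> : push a child copy of the parent context
  public void begin( String ns, String name, 
        Attributes att) {
    Map parentCtx = (Map) getDigester().peek();
    Map ctx = new HashMap(parentCtx);
    ctx.put("TABLE_NAME", att.getValue("name"));
    ctx.put("TABLE_COLUMNS", new ArrayList());
    ctx.put("TABLE_ROWS", new ArrayList());
    getDigester().push( ctx);
  }

  // </table> : drop this table's context from the stack
  public void end( String ns, String name) {
    getDigester().pop();
  }
}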
TableColumnRule adds a single column name to the TABLE_COLUMNS List in the current context.

private static class TableColumnRule 
      extends Rule {
  // body text of a column element: one column name
  public void body( String ns, String name, 
        String text) {
    Map ctx = ( Map) getDigester().peek();
    ((List) ctx.get("TABLE_COLUMNS")).add(text);
  }
}

At the opening <row> element, TableRowRule initializes a TABLE_ROW List that will store the values of the current table row.

This rule also executes SQL to insert the data of the current row when the closing </row> element is handled. Because each row is inserted as soon as it has been read, the entire XML document is never loaded into memory. The actual SQL is constructed in the getStatement() method.

private static class TableRowRule extends Rule {
  // <row> : start collecting values for a new row
  public void begin( String ns, String name, 
        Attributes att) {
    Map ctx = (Map) getDigester().peek();
    ctx.put("TABLE_ROW", new ArrayList());
  }

  // </row> : insert the collected row right away
  public void end( String ns, String name) 
        throws SQLException {
    Map ctx = (Map) getDigester().peek();
    execute(ctx, getStatement(ctx));
  }

  private int execute( Map ctx, 
      PreparedStatement st) throws SQLException {
    List values = (List) ctx.get("TABLE_ROW");
    // no statement (no columns) or no values: nothing to insert
    if( st==null || values.size()==0) return 0;

    // JDBC parameter indexes are 1-based
    for( int i = 0; i<values.size(); i++) {
      st.setObject(i+1, values.get(i));
    }
    return st.executeUpdate();
  }

  // Builds INSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...)
  private PreparedStatement getStatement( Map ctx) 
        throws SQLException {
    List cols = (List) ctx.get("TABLE_COLUMNS");
    if(cols.size()==0) return null;

    String tableName = getTableName(ctx);
    StringBuffer sql = new StringBuffer()
        .append("INSERT INTO ")
        .append(tableName).append(" (")
        .append(cols.get(0));
    StringBuffer values = new StringBuffer("?");
    for( int i = 1; i<cols.size(); i++) {
      sql.append(", ").append(cols.get(i));
      values.append(", ?");
    }
    sql.append(") VALUES (")
        .append(values).append(")");

    Connection conn = getConnection(ctx);
    return conn.prepareStatement(sql.toString());
  }

  private Connection getConnection( Map ctx) {
    return (Connection) ctx.get("CONNECTION");
  }

  private String getTableName(Map ctx) {
    return (String) ctx.get("TABLE_NAME");
  }
}

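The string-building part of getStatement() can be isolated into a pure function, which makes the generated SQL easy to inspect and unit-test. This is a sketch only. InsertSql is a hypothetical helper, and it uses String.join (Java 8+) instead of StringBuffer. Table and column names are concatenated verbatim, because JDBC can only bind values, not identifiers. They must therefore come from trusted XML.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: the SQL text produced by getStatement(), without JDBC.
final class InsertSql {
  private InsertSql() {}

  // One "?" placeholder per column; identifiers are inserted verbatim.
  static String build(String table, List<String> columns) {
    if (columns.isEmpty()) {
      throw new IllegalArgumentException("no columns for " + table);
    }
    String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
    return "INSERT INTO " + table + " (" + String.join(", ", columns)
        + ") VALUES (" + placeholders + ")";
  }

  public static void main(String[] args) {
    System.out.println(build("TABLE1", Arrays.asList("col1", "col2")));
    // prints: INSERT INTO TABLE1 (col1, col2) VALUES (?, ?)
  }
}
```

Throwing on an empty column list mirrors the null return in getStatement(), but makes the failure explicit instead of silently skipping the row.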
TableRowValueRule takes the body of each <value> element and adds it, as a column value of the current row, to the TABLE_ROW List of the current context.

private static class TableRowValueRule 
      extends Rule {
  // <value>...</value> : one column value of the current row
  public void body( String ns, String name, 
      String text) {
    Map ctx = (Map) getDigester().peek();
    ((List) ctx.get("TABLE_ROW")).add(text);
  }
}

The code above does not cache the PreparedStatement instances it creates. Instead, it prepares a new one for every row, which may hurt performance. However, if this code runs inside a J2EE container, the connection is obtained from a container-managed DataSource, so prepared statements are most likely already cached automatically. If not, the getStatement() method can be extended to save the created PreparedStatement instances in the processing context. Note that these statements must then be closed explicitly at the end of processing, for example in the end() method of TableRule.
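One way to extend getStatement() as suggested is a small cache keyed by SQL text, kept in the processing context and closed when processing ends. The class below is a hypothetical sketch. StatementCache, its Factory interface and the idea of storing it under a context key are assumptions, not part of the article's source. It is generic over AutoCloseable so it can be exercised without a database. In the loader, S would be PreparedStatement (an AutoCloseable since Java 7), and the factory would be sql -> conn.prepareStatement(sql).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-load cache of prepared statements, keyed by SQL text.
final class StatementCache<S extends AutoCloseable> {
  // Creates a statement for the given SQL, e.g. sql -> conn.prepareStatement(sql)
  interface Factory<T> {
    T create(String sql) throws Exception;
  }

  private final Map<String, S> cache = new LinkedHashMap<>();
  private final Factory<S> factory;

  StatementCache(Factory<S> factory) {
    this.factory = factory;
  }

  // Returns the cached statement for this SQL, preparing it on first use only
  S get(String sql) throws Exception {
    S st = cache.get(sql);
    if (st == null) {
      st = factory.create(sql);
      cache.put(sql, st);
    }
    return st;
  }

  // Closes every cached statement (e.g. from TableRule.end()); the first
  // failure is rethrown after all statements have been attempted
  void closeAll() throws Exception {
    Exception first = null;
    for (S st : cache.values()) {
      try {
        st.close();
      } catch (Exception e) {
        if (first == null) first = e;
      }
    }
    cache.clear();
    if (first != null) throw first;
  }

  int size() {
    return cache.size();
  }
}
```

Because each table produces a single INSERT text, the cache holds at most one statement per table. Closing it in TableRule.end() before popping the context releases them table by table.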


For event-driven code, testing is twice as important as it is for other applications, because it is not always possible to see clearly which events the event generator will fire. In our case, the events are generated by the SAX XML parser, so we build XML test data to drive them. It does not make much sense to test each rule independently, because the rules depend on each other. On the other hand, a first execution-sequence test for DBLoader does not need a real database connection and can use a mocked environment instead. Such a test is easy to implement with the jMock dynamic mock testing framework. A mocked Connection and PreparedStatement can verify that the rules are executed in the proper order and that they pass all of the data from the XML to JDBC. Here is a simple test suite.

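import java.io.*;
import java.sql.*;
import junit.framework.*;
import org.apache.commons.digester.RuleSet;
import org.jmock.Mock;
import org.jmock.core.Constraint;
import org.jmock.core.constraint.IsEqual;
import org.jmock.core.matcher.*;
import org.jmock.core.stub.ReturnStub;

public class DBLoaderTest extends TestCase {
  // Nested DBUnit layout: table, column, row and value elements
  private static final String DBUNIT_DATA = 
    "<dataset>\n"+
    "  <table name=\"TABLE1\">\n"+
    "    <column>col1</column>\n"+
    "    <column>col2</column>\n"+
    "    <row><value>1</value><value>11</value></row>\n"+
    "    <row><value>2</value><value>22</value></row>\n"+
    "  </table>\n"+
    "</dataset>\n";
  // Flat DBUnit layout: one element per row, columns as attributes
  private static final String DBUNIT_FDATA = 
    "<dataset>\n"+
    "  <TABLE1 col1=\"1\" col2=\"11\"/>\n"+
    "  <TABLE1 col1=\"2\" col2=\"22\"/>\n"+
    "</dataset>\n";

  public static Test suite() {
    String name = DBLoaderTest.class.getName();
    TestSuite suite = new TestSuite(name);
    suite.addTest( new DBLoaderTest( 
        new DBUnitRuleSet(), DBUNIT_DATA));
    suite.addTest( new DBLoaderTest( 
        new DBUnitFlatRuleSet(), DBUNIT_FDATA));
    return suite;
  }

  private final RuleSet ruleSet;
  private final String xml;

  private DBLoaderTest( RuleSet ruleSet, 
      String xml) {
    super("testDBLoader");  // JUnit runs the test method with this name
    this.ruleSet = ruleSet;
    this.xml = xml;
  }

  public void testDBLoader() throws Exception {
    Mock ps = new Mock(PreparedStatement.class);

    // Expected (parameter index, value) pairs: 2 rows x 2 columns
    Object[][] params = new Object[][] {
        { new Integer(1), "1"},
        { new Integer(2), "11"},
        { new Integer(1), "2"},
        { new Integer(2), "22"}};
    for( int i = 0; i<params.length; i++) {
      ps.expects(new InvokeOnceMatcher())
        .method(new IsSetter())
        .with(new IsEqual(params[i][0]), 
              new IsEqual(params[i][1]));
    }
    // One executeUpdate() per row
    ps.expects(new InvokeCountMatcher(2))
      .method("executeUpdate")
      .will(new ReturnStub(new Integer(1)));

    Mock conn = new Mock(Connection.class);
    conn.expects(new InvokedRecorder())
        .method("prepareStatement")
        .will(new ReturnStub(ps.proxy()));
    Reader r = new StringReader( xml);

    DBLoader loader = new DBLoader(ruleSet);
    loader.load((Connection) conn.proxy(), r);

    ps.verify();
    conn.verify();
  }

  public String getName() {
    String name = ruleSet.getClass().getName();
    return super.getName()+" "+name;
  }

  // Matches any method whose name starts with "set"
  public class IsSetter implements Constraint {
    public boolean eval( Object o) {
      return ((String) o).startsWith("set");
    }
    public StringBuffer describeTo( StringBuffer buf) {
      return buf.append("a setXxx() method");
    }
  }
}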

The same test case can be used to test both layouts, because for the same data the sequence of JDBC calls is the same in both cases. The testDBLoader() method creates a Mock for PreparedStatement and sets its expectations based on the structure of the source XML. It expects one setter call per value, such as setObject() or setString() (matched by the IsSetter constraint), and one executeUpdate() call per row. After DBLoader has run, the test method calls verify() on all mocks to ensure that their expectations were met.
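A stdlib-only model of the two formats shows why the same expectations fit both layouts. The handlers below are hypothetical simplifications, not DBUnitRuleSet or DBUnitFlatRuleSet. Instead of calling JDBC, each one emits a line per row containing the INSERT statement and the values that would be bound to it. The flat handler relies on the parser reporting attributes in document order, which SAX does not guarantee in general.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical model of both layouts: one "SQL | values" line per row, no JDBC.
final class LayoutCheck {
  private LayoutCheck() {}

  // Nested layout: <table name=".."><column>..</column><row><value>..</value></row></table>
  private static final class Nested extends DefaultHandler {
    final List<String> out = new ArrayList<>();
    private final List<String> cols = new ArrayList<>();
    private final List<String> row = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String table;

    @Override
    public void startElement(String uri, String local, String qName, Attributes att) {
      text.setLength(0);  // column and value are leaf elements
      if (qName.equals("table")) {
        table = att.getValue("name");
        cols.clear();
      } else if (qName.equals("row")) {
        row.clear();
      }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
      if (qName.equals("column")) cols.add(text.toString().trim());
      else if (qName.equals("value")) row.add(text.toString().trim());
      else if (qName.equals("row")) out.add(line(table, cols, row));  // emitted at </row>
    }
  }

  // Flat layout: <TABLE1 col1=".." col2=".."/>, one element per row
  private static final class Flat extends DefaultHandler {
    final List<String> out = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes att) {
      if (qName.equals("dataset")) return;  // root element, not a row
      List<String> cols = new ArrayList<>();
      List<String> vals = new ArrayList<>();
      for (int i = 0; i < att.getLength(); i++) {
        cols.add(att.getQName(i));
        vals.add(att.getValue(i));
      }
      out.add(line(qName, cols, vals));
    }
  }

  private static String line(String table, List<String> cols, List<String> vals) {
    String marks = String.join(", ", Collections.nCopies(cols.size(), "?"));
    return "INSERT INTO " + table + " (" + String.join(", ", cols)
        + ") VALUES (" + marks + ") | " + vals;
  }

  private static List<String> parse(String xml, DefaultHandler h, List<String> out) {
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), h);
    } catch (Exception e) {
      throw new IllegalArgumentException("cannot parse XML", e);
    }
    return out;
  }

  static List<String> nested(String xml) {
    Nested h = new Nested();
    return parse(xml, h, h.out);
  }

  static List<String> flat(String xml) {
    Flat h = new Flat();
    return parse(xml, h, h.out);
  }
}
```

For the two test documents, both methods return the same two lines. The first is INSERT INTO TABLE1 (col1, col2) VALUES (?, ?) | [1, 11].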


As shown above, Digester can help isolate XML processing logic in maintainable rules while keeping the advantages of stream-based XML processing. The resulting code is easy to understand and test.


Eugene Kuleshov is an independent consultant with over 15 years of experience in software design and development.
