ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)
 See this if you're having trouble printing code examples


Using Castor JDO for SQL Mapping

by Jeff Lowery
10/02/2002

Castor is a multifaceted software tool being developed under the auspices of exolab.org, an informal organization involved in the development of open source, enterprise software projects based on Java and XML.

The primary function of Castor is to perform data binding. Data binding is a process that facilitates the representation of one data model in another. For example, an XML data model, described by an XML schema document, can be approximately represented by Java classes. Castor helps by generating these classes from the XML schema document. Object instances of these classes are then able to store XML document data, so long as such documents conform to the XML schema.

Such binding works both ways, of course: object instances of the generated classes can be easily transformed back into XML documents through a process known as marshalling. Castor's marshalling engine can introspect the Java data objects and generate corresponding XML document elements. Such marshalling can be refined through the use of user-defined mapping files, which Castor also supports.

Related Reading

Java & XML Data Binding
By Brett McLaughlin

While XML data binding is very useful, this article will focus on another aspect of Castor: data binding of Java objects to tables, columns, and rows in a SQL database. This functionality falls under the heading of Castor JDO. The intent of the Java Data Objects (JDO) standard is to transparently persist Java objects. Although similar in name and intent to Sun's efforts, Castor JDO was developed largely independently of Sun's efforts, but aims roughly in the same direction. Be that as it may, the two technologies have different feature sets, and should not be assumed to be interoperable.

Castor is able to employ a mapping file as a simple way to bind Java objects to SQL database tables. Instead of writing complex procedural code in Java to manage the database queries and updates via JDBC, Castor hides this complexity by using the mapping file entries and transparently performing the proper queries and updates in the background. This declarative mechanism of descibing how the objects and the database are linked makes maintenance easier, because mapping files are relatively easy to understand and can be changed without recompiling code. It's also much simpler in that it operates at a higher level of abstraction than functionally equivalent JDBC mechanisms allow.

Writing a Mapping File

Unlike Castor's XML marshalling -- where there is a default mapping of a Java object to XML elements and attributes -- no default mapping exists for binding Java objects to SQL database tables: you must use a mapping file to enable this functionality. The mapping file contains explicit information on how Castor should represent a set of Java objects in a relational database.

Mapping files are written in XML, and can be validated against either a DTD or an XML schema supplied by the Castor group. This type of validation will catch syntax errors, but it does take some practice to understand the concepts behind mapping files and understanding Castor's JDO behavior. If Castor JDO appears not to be storing data in the database, it's likely due to missing entries in an otherwise syntactically valid mapping file.

Before going into the details of mapping, let's first present an overview of the mapping elements. For every Java class whose instances are to be stored in the database, a class element is required in the mapping file. These class elements reside one level below the mapping root element:

<mapping>
   <description>Optional description of mapping file</description>
   <!-- you can include other mapping files -->
   <include href="other/mapping_file.xml"/>  
   <class name="Class1" map-to="db_table1">
      <description>Optional description of this class mapping</description> 
      <field name="field>  <!-- mapping of a class data member --> 
         <sql> ... </sql>  <!-- maps Java field to database column --> 
      </field>
      <field> ... </field>
      ...  
   </class> 
   <class name="Class2" map-to="db-table2">
     ...
   </class>
</mapping>

For each class to be mapped to a database table (the map-to attribute indicates which table), the data members of the class to be persisted (referred to as fields by Castor) will have corresponding field elements. Each field element in turn has a sql child element, which describes how such fields are stored in the database.

Once a mapping has been established, Castor takes care of the queries and updates that are needed to fetch and store data as Java objects. It also handles conversions between SQL and Java datatypes seamlessly.

Overview

A mapping file is written from the point of view of the Java class and describes how the properties of each object are to be stored in database tables, columns, and rows. These class properties are referred to as fields in the mapping file.

The general rules for mapping Java objects to database tables are:

  1. Each Java class maps to no more than one database table.
  2. Each Java object must have a unique identifier (which may be autogenerated).
  3. A database column must be identified for every field that is to be stored in the database.
  4. Fields have the option of being fetched or stored directly (if they are public), or through their class' get/set methods.
  5. Classes that exist only as part of a larger composite class should be indicated as dependent upon that composite class in the mapping file.
  6. Classes that extend another mapped class should be indicated as such in the mapping file.

Database Persistence

Castor uses the XML-format mapping file to determine:

Every element and attribute of a primitive type in Java (int, boolean, double, etc.) can be mapped to a table column that is of a comparable SQL type. Some nonprimitive types (classes), such as java.lang.String, java.sql.Date, java.lang.Long, etc., can be mapped to a single table column, as well. All other Java classes must have a class element definition in the mapping file to be persisted in a database.

Remember, Castor will not introspect a class and apply a set of default rules to guess the fields and attempt to persist them. This is different behavior from that of the XML marshaller.

There may be cases where a class instance only exists as a member of a containing parent object (in other words, it makes an appearance only as a child of some other object). The mapping file conveniently allows such dependent objects to be denoted as such. The advantages of recording dependencies in the mapping file will be explained later.

Fetching Java Objects From a SQL Database

When reconstructing an object from a SQL database, Castor uses the mapping information to determine where the class' data members are stored in the database.

It is possible to set up the mapping file so that an object doesn't have to be pulled from the database in its entirety, but can be lazily fetched. This means that certain members of the class instance will only be retrieved when the appropriate get method is called. The result is that large objects can be fetched incrementally -- avoiding the performance hit of fetching a load of data from the database that might not be needed by the program at that time.

Describing the Data Model

We're now ready to start in on the mechanics of how a set of Java objects are mapped to a SQL database. For this example, we assume the following environment:

The main variable here is the database. Not all databases support the same features, and some support similar features differently. I will point out cases where the examples show a Postgres-specific way of doing things.

Class Relationships

For purposes of illustration, it's best to have a set of classes that exhibit the following relations between objects:

  1. one-to-one
  2. one-to-many
  3. many-to-many
  4. class dependent on another (weak entity)
  5. class related to another
  6. class type extending another

The first three are self-explanatory cardinality relationships among class instances. Number 4 has already been covered. The fifth item, class related to another, is the case where a class has a data member reference to another, nondependent class. The referenced class in this case is a strong entity, whose instances exist independently of the parent class instance.

The sixth type of relation is that where a class extends (is derived from) another Java class. The extended class type may contain additional data members (fields) not defined in the base class. If the base class and its data members are already mapped to a database table, then it would be desirable to be able to reuse that mapping for a derived class. In this case, a new table is created to hold the extended class' data members, and the members that are inherited are stored in the table used for the base class. In other words, a class that extends another class can be mapped in way that, when instances of the extended class are saved, data member values of that class are stored in two tables: the base type table, and the extended type table.

Example: Shipping Line

I'll use for an example a description of a merchant shipping line. At the top level, the shipping line has ships, and each ship has ports-of-call. This gives us three main domain objects:

Each of these domain objects in turn will have properties, both simple and complex. The easiest way to show the relation between line, ship, and port, along with their respective properties, is to see the object marshalled into an XML document that contains sample information for all of the above:

<shipping-line name="Burble Bros. Shipping">
<ship name="Bubbles" registry="Liberian">
      <port-of-call name="Seattle" pier="41"> 
         <arrival-time day="2002-12-03" time="02:35" zone="GMT"/>
         <departure-time day="2002-12-05" time="22:45" zone="GMT"/>
      </port-of-call> 
      <port-of-call name="San Fransisco" pier="1"> 
         ... 
      </port-of-call> 
    ...
   </ship> 
   <ship name="Spring Surprise" registry="Sierra Leone"> 
     <port-of-call ...> 
   ... 
</shipping-line>

As you can see in this XML document, each shipping line has multiple ships, and each ship has multiple ports-of-call. In addition, each element above has one or more attributes. In Java, such attributes are represented as class properties. Later, I'll show how these same attributes correspond to columns in a database table.

The ShippingLine Class

Each shipping line has a name and several ships. An interface to the ShippingLine class might look like the following:

public interface IShippingLine {
   public String getName();
   public Ship getShip( int i ); 
   public ArrayList getShips();
   public void addShip( Ship s ); 
   public void setName( String name );
   // more methods... 
};

The ShippingLine class has a Java-bean-type interface, with get/set methods for every data member of the class. Because a ShippingLine object can contain many Ship instances, the methods for adding and retrieving Ship objects are different, because they manipulate objects in a collection.

The actual data members of ShippingLine are as follows:

public class ShippingLine {
   private String _name;
   private ArrayList _ships; 
   private int _id; 
   // methods... 
};

You'll note the addition of an _id data member that was not mentioned in the interface methods shown previously. The _id member will be used only by the JDO persistence engine as a table row identifier. It holds, in effect, the value of the primary key of the row in database table that contains ShippingLine object data. The _id key value will be autogenerated (as you shall see), and will therefore be unique.

The shipping_line Table

We will use the following SQL statement to create a table for holding ShippingLine information in our Postgres database:

CREATE TABLE shipping_line ( id SERIAL, name VARCHAR(80) );

Since most SQL databases (including Postgres) are case-insensitive, we've named the table shipping_line rather than shippingline to make the table name easier to read. For the definition itself, the language keywords have been capitalized, but again, that is only for legibility. Since SQL does not have an array type, there isn't a ships column in the table definition. In order to incorporate ships in shipping line, we will have to use two tables. The Mapping Extended Classes section describes how this is done. For now, I'll concentrate on binding just this one table's information in the section to follow.

Before moving on to explain how to map a Java class to a table, let's first explain the table column definitions above in more detail.

Natural Keys vs. Surrogate Keys

It may be that the name of a shipping line is sufficient to identify it from any other. It may be likely that such is the case, and if so, then shipping_line.name is sufficient as a row identifier in our table. The shipping line name would then serve as a natural key. To be safe, though, I'm using a generated sequential id, known as a surrogate key. Surrogate keys have no semantics associated with them -- they merely identify each row in the shipping_line table, and therefore each shipping line. With surrogate keys, it is easy to ensure that no identity collisions take place.

For table joins performed during a SQL query, integer-based surrogate key lookups will prove faster than natural keys that require string comparisons. Therefore, our foreign-key/primary-key columns in the database will use these surrogate keys instead of the natural ones. However, it will be necessary to place a unique index on the natural key as well as the surrogate, so that the two are always mapped one-to-one (and the id is associated with one and only one name):

CREATE TABLE shipping_line ( id SERIAL, name VARCHAR(20) 
CONSTRAINT uniq_name UNIQUE (name));

Note that it was not necessary to define id as unique, since the SERIAL datatype covers that base already.

Let's now proceed to write the XML mapping file.

Mapping One Class to One Database Table

As mentioned earlier, unlike Castor's XML marshalling, which has a default mapping based on Java introspection of the class definitions, a class-to-table mapping file must be provided explicitly in order for Castor to persist objects in a database. There are some tools, such as Raccoon, that can generate a Castor mapping file from Java beans. For the examples to follow, I'll author the mapping file by hand. By default, Castor expects a mapping file named mapping.xml.

SQL Mapping Basics

If you've used Castor mapping files in the past to marshal XML, then the format for binding Java data objects to databases is will seem familiar. Here is the mapping between the ShippingLine Java class and the shipping_line database table:

<class name="ShippingLine" identity="id">
   <map-to table="shipping_line"/>
   <field name="id" type="integer">
      <sql name="id" type="integer"/>
   </field>
   <field name="name" type="string" required="true">
      <sql name="name" type="varchar"/>
   </field>
</class>

What follows is an explanation of the elements and their values:

  1. The class element names the class to be mapped and its identity field. In cases where there are multiple fields that make up the identifier for the class (and table), a space-separated list of id fields can be specified. Say, for instance, that ShippingLine id was not unique, but that id + name would be. In that case, the identity attribute of the class element would have this value:

        <class name="ShippingLine" identity="id name">
  2. The map-to element identifies the database table that will be used to persist the ShippingLine class data.

  3. The first field to be mapped is the id field. Its datatype is specified as integer, which is compatible with both the Java primitive datatype intand class java.lang.Integer. A type specification is not required, but by providing one, we add an extra bit of type checking during the persistence process.

  4. Values for the class data member id are to be stored in the id column of the ShippingLine table, according to the sql child element of field. The SQL type integer is compatible with that of the integer datatype used for the Java field.

  5. The next field to be mapped is the name field. One of the attributes of the field element indicates that a value for the name is required. When it comes time to persist a ShippingLine object, that object must have a name value or an exception will be thrown.

  6. The sql element maps the name field to a like-named column in shipping_line. The type of the column is declared to be varchar, which is compatible with the java.lang.String type of the field.

Note: Field names generally refer to methods, not the names of data members of a class.

Let's take a look at the ShippingLine class once more:

public class ShippingLine { 
  private String _name; 
  private ArrayList _ships; 
  private int _id;

  // methods... 
  public int getId() { return _id; }
  public void setId( int id ) { _id = id; }
  //...
};

You see that the name of id data member actually has a preceding underscore: _id. Note that we didn't see this underscore in the field name of the class mapping. That's because Castor is not accessing the _id variable directly. Instead, it is inferring the names of the getter/setter methods for the _id data member from the field name provided in the class mapping. Hence, getId() and not get_id().

It is possible, however, to access data members directly. The section "Binding Through Class Data Members" will show you how this is done.

Key Generators

In order to ensure that the id column of the shipping-line table is unique, we defined the column to be of type SERIAL. The Postgres database supports this nonstandard (non-SQL) type as a convenient way to specify columns as unique identifiers for a table row. It's really more of a function than a type. When a column of type SERIAL is declared in Postgres SQL, the Postgres database management system creates a SEQUENCE table for that column.

A SEQUENCE table is a special table designed specifically for the autogeneration of incremental values. These values are most frequently used as identifiers, or primary keys, of table rows. When the shipping_line table was created, Postgres's SQL interpreter sees the SERIAL type of the id column and creates a separate SEQUENCE table named shipping_line_id_seq .

Note: Just about every database has support for SEQUENCE tables, although the syntax for using them may vary. Consult your database manual for specifics. What follows is an explanation of how to use SEQUENCE tables in Postgres; different databases may require slight alterations in the syntax used below.

Having a SERIAL datatype for a column is only half of the process of the auto-generation of keys. The remaining work to be done is to call the auto-generating function of the sequence table prior to commiting a row insertion. This is done by declaring a key-generator element in the mapping.xml file. In this element, Castor has to know the name of the SEQUENCE table in order to be able to call the proper SEQUENCE table method for getting the next id value.

The way to specify the SEQUENCE table name in the key-generator element is shown below.

<key-generator name="SEQUENCE" alias="seqgen">
   <param name="sequence" value="{0}_{1}_seq"/>
</key-generator>

The key-generator element has two attributes. The name attribute describes the type of key being used. Castor supports various types of keys; SEQUENCE keys are the most common. The alias attribute gives us a handle on the key generator. It is possible to refer to the key generator by its name, but an alias gives us the option of having two different SEQUENCE generators in the mapping file, each under a separate alias.

The param child element indicates the SEQUENCE table name. Here we are using replacable parameters, {0} and {1}, to specify the SEQUENCE table to be used. Every occurrence of the SERIAL column type requires a new sequence table to be created. The name of the table is generated from the table name where SERIAL occurs, along with the column name. Recall that the sequence table name for the shipping_line id column was: shipping_line_id_seq. By using the replaceable paramters {0} (which resolves to the table name containing the SERIAL column) and {1} (the SERIAL column name), each time the key-generator is called, the correct SEQUENCE table is used.

With my key-generator defined, I can next hook up a key-generator to the shipping_line table mapping definition by adding a reference to the key-generator via a class element attribute:

<class name="ShippingLine" identity="id">
   <map-to table="shipping_line"/> 
   <field name="id" type="integer" key-generator="seqgen"> 
      <sql name="id" type="integer"/> 
   </field>
   <field name="name" type="string" required="true"> 
      <sql name="name" type="varchar"/> 
   </field> 
</class>

Now, every new shipping-line row added to our Postgres database will have a unique, sequential value as an identifier.

Multifield Identifiers

Not all SQL tables have single-column keys; some keys are compound keys. A compound key uniquely identifies a row in a database table, but uses two or more columns to do so.

Let's say a ship's name uniquely identifies a ship within a registry. That's not to say that a ship is uniquely identified by name; it is instead identified by the compound identifier registry.name::ship.id. This means that each ship must have a registry attribute, and that the combination of registry attribute and ship id must be unique within the XML document (regardless of which shipping line the ships are in). In XML schema, it is possible to define a unique constraint that will enforce this for all ship elements in the document root:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="document_root"> 
    <xs:complexType>
      <xs:sequence> 
         <xs:element name="shippingLine" maxOccurs="unbounded">
            <xs:complexType> 
              <xs:sequence> 
                <xs:element name="ship" maxOccurs="unbounded"> 
                  <xs:complexType> 
                     <xs:attribute name="name" type="xs:string" 
                                      use="required"/> 
                    <xs:attribute name="registry" type="xs:string"
                                     use="required"/>
                  </xs:complexType>
                </xs:element> 
              </xs:sequence>
              <xs:attribute name="name" type="xs:string"/>
            </xs:complexType>
         </xs:element>
       </xs:sequence>
     </xs:complexType>

     <xs:unique name="shipUniq">
         <xs:selector xpath="shippingLine/ship"/>
         <xs:field xpath="@name"/>
         <xs:field xpath="@registry"/>
     </xs:unique>  
   </xs:element>
</xs:schema>

This same uniqueness constraint can be implemented in the database table definition for ship:

CREATE TABLE ship (
  registry VARCHAR(80) NOT NULL, 
  name VARCHAR(80) NOT NULL 
  -- other column definitions here --
  CONSTRAINT id_ship UNIQUE (registry, name) 
);

The database mapping file used by Castor can enforce this constraint for us, instead:

<class name="Ship" identity="registry name">
   <map-to table="ship"/> 
   <field name="name" type="string" required="true"> 
      <sql name="name" type="varchar"/>
   </field> 
   <field name="registry" type="string" required="true">
      <sql name="registry" type="varchar"/>
   </field> <!-- rest of fields mapped here -->

</class>

The value of the identity attribute of the class element is, in this case, a space-delimited list of fields that together uniquely identify each ship. With the identity constraint defined here, there is no need to have a unique constraint defined in the database table definition. Castor will handle it.

Binding Through Class Methods

There are two basic ways of binding the properties of a class to columns in a SQL table. The first method is to use public class methods to access the properties. By default, Castor uses this type of access, inferring the name of the public accessor/mutator methods from the name of the field being mapped. As was shown earlier, if the field name is id, then the accessor method is assumed to be getId().

What if the method names are different from what Castor assumes? In that case, you have to inform Castor of the proper methods by specifying them in the field mapping element:

<class name="ShippingLine" identity="id"> 
  <map-to table="shipping_line"/> 
  <field name="id" type="integer" key-generator="seqgen" 
       get-method="getID" set-method="setID"> 
    <sql name="id" type="integer"/> 
  </field> 
  <field name="name" type="string" required="true"> 
    <sql name="name" type="varchar"/> 
  </field> 
</class>

Normally, Castor would be looking for the Java bean methods getId() and setId() associated with the id property. The class author (me) prefers to capitalize both letters of ID when it occurs in method names. Because the capitalization is different from what Castor would expect, the method names must be given explicitly in the mapping file.

Binding Through Class Data Members

A second way to bind a class property to a SQL table column is to access the class data member directly, rather than through methods. The data member must be public for this mechanism to work. To specify direct access, the following attribute of the field element is used:

<class name="ShippingLine" identity="id">
   <map-to table="shipping_line"/>
   <field name="_id"type="integer" key-generator="seqgen" direct="true">
      <sql name="id" type="integer"/>
   </field> 
   <field name="name" type="string" required="true"> 
      <sql name="name" type="varchar"/>
   </field>
</class>

By accessing the data member directly in this way, you can reduce the number of public methods in the class definition. Notice that I've had to add the underscore character, since I'm now referring to the name of the data member directly, and it has an underscore in its name. The drawback to this mechanism is that, because of the need for a public data member, I consequently allow direct access by outside class methods in the application. In any case, you really don't have much protection from the wicked, but at least with public get/set methods you can deprecate their use where advisable through Javadoc comments.

Mapping Extended Classes

In Java, a class definition can extend another (base) class' definition through type-derivation. The new class will inherit all of the data members and methods of the base class, plus it might add a few of its own.

In the case where only the extended class is stored in the database, the mapping file entry for that class will be the same as for any other class; however, if the base class is by itself persisted in the database already, then there will be an entry in the Castor mapping file for that class. It would be silly to have to copy all of those field mappings for the base class in the new mapping entry of the derived class. It would also be a maintenance worry: the derived class definition may change, and you'd wind up with two mapping entries to update.

Better would be to store the new data members of the extended class in a new table in the database using a new mapping, and store the existing members of the base class in the table already set aside for it. Then Castor can perform a join operation between the two tables, generating a row that has all of the values of the data members (both inherited and new) for the extended class. Two joined table rows form one class instance.

The extends Attribute

A mapping for a class extends another class mapping through use of the extends attribute. Let's say I want to subtype the ship class so that there are cargo ships and passenger ships. I then have the following Java classes:

public class Ship
{
   String name;
   String registry;

   public Ship() { ... }

   //getters/setters...
}

public class CargoShip extends Ship
{
   int cargoTonnage;

   public CargoShip() { ... }
   //getters/setters...
}

public class OilTanker extends Ship
{
   int barrels;

   public OilTanker() { ... }

   //getters/setters...
}

To store these new classes, I can map them as follows:

<class name="Ship" identity="registry name"> 
   <map-to table="ship"/> 
   <field name="name" type="string" required="true"> 
      <sql name="name" type="varchar"/> 
   </field> 
   <field name="registry" type="string" required="true">
      <sql name="registry" type="varchar"/> 
   </field> 
   <!-- rest offields mapped here --> 
</class>

<class name="CargoShip" extends="ship" identity="id" key-generator="seqgen">
   <description>Cargo is measured in net tonnage</description>
   <map-to table="cargo_ship"/>
   <field name="id" type="integer">
      <sql name="id" type="integer" />
   </field>
   <field name="cargoTonnage" type="integer">
      <sql name="cargo_tonnage" type="integer"/>
   </field>
</class>

<class name="OilTanker" extends="ship" identity="id" key-generator="seqgen">
  <description>Oil is measured in barrels</description>
  <map-to table="oil_tanker"/>
  <field name="id" type="integer">
    <sql name="id" type="integer" />
  </field>
  <field name="barrels" type="integer">
    <sql name="barrels" type="integer"/>
  </field>
</class>

Because new tables are required to store the additional data members of the extended classes, each extended class must have its own id, in addition to the identifying name/registry it already inherits from the base Ship class. If we were dealing only with Java classes, having these extra identifiers in the extended classes would be silly. But since we store the extended class data in two relational tables, each row in the second (extension) table must have a unique identifier; otherwise, the table would not be in First Normal Form (where each row must be distinct). The unique identifer in each row ensures that such is the case.

Mapping Multiple Classes to One Table

There can be cases where you might not want to create a new database table for each class to be persisted. It's possible in Castor to store data from a parent object and its child in one table.

The ships of the line, in addition to ferrying ubiquitous shipping containers across the seas, have to transport an odd-sized crate now and then. The dimensions of each crate are maintained in the database. The unit of measure of a dimension might be in either meters or feet.

public class Dimension
{
   public double measurement;
   public UOMType uom;
}

public class UOMType
{
   private static String _feet = "ft";
   private static String _meters = "m";
   private String _unit;

   public static UOMType FEET = new UOMType( _feet );
   public static UOMType METERS = new UOMType( _meters );

   private UOMType( String type ) { _unit = type;}
   public void setUnit( String type ) {_unit = type; }
   public String getUnit() {return _unit;}
}

The last class maintains public instances that denote two types of measurement. Each Dimension instance will refer to one of these UOMType objects. Since UOMType is a separate class from Dimension, it would seem that I'd have to have two class descriptions in the Castor mapping file. That's overkill for the tiny UOMType class, with its single primitive value.

An alternative approach is to store UOMType data in the same table that contains the Dimension rows. To accomplish this, I use a dot path in the class mapping for Dimension, like so:

<mapping> 
   <!--  Mapping for class Dimension  --> 
   <class name="Dimension" identity="id"> 
      <map-to table="dimension"/>
      <field name="id" type="integer"> 
         <sql name="id" type="integer"/> 
      </field> 
      <field name="measurement" type="double">
         <sql name="measurement" type="decimal"/>
      </field>
      <field name="uom.unit" type="string">
         <sql name="uom" type="varchar"/> 
      </field>   
   </class>
</mapping>

A dot path denotes data that is nested at a lower level in a container class than the class that is being mapped. By referring to nested data in this way, both the Dimension class data and the UOMType class data are stored in one table. With this approach, it's easy to look at the database row values and see what is being measured.

Mapping Object Hierarchies to Relational Tables

I use the term object hierarchy to refer to the parent-child hierarchy of class instances where an object has one or more objects as data members. In the ShippingLine example, the children of ShippingLine objects are instances of class Ship; Ship instances, in turn, have PortOfCall child objects as data members.

There are several ways to map such hierarchies to a relational database table. Each type of mapping is determined by factors such as the cardinality of parent-child relationship (one-to-one, one-to-many, many-to-many), and if children represent strong or weak entities.

Entities Both Strong and Weak

One of the primary constraints to be determined in any business model is the relative independence of data entities (represented as classes and their instances in Java). Some entities come into and out of existence of their own accord. For instance, new shipping lines are founded and others go out of business as time passes. We want to be able to create new ShippingLine classes at the drop of a hat. Shipping lines are strong entities in our data model. When a new shipping line comes into existence, I can construct a ShippingLine object to contain information about it.

There are some entities in our model that don't have such a carefree existence. Such an entity is so dependent upon another that it couldn't exist without the existence of the referenced entity. When an entity instance depends upon another for its very existence, it is a weak entity. Ships are weak entities and must be associated with a shipping line before we can incorporate ship information in our model.

Note that I'm referring to business rules in the abstract, and not to rules applicable to specific physical data models. There are two physical models that this article focuses upon: one in Java and one in a SQL database. If we tossed in XML documents, that would be a third. Each physical model has different mechanisms for enforcement of weak entity dependencies. In Java, a Ship's dependency upon a ShippingLine would usually be enforced by requiring a non-null ShippingLine.object parameter passed to the Ship class constructor. In the database, such a dependency would be enforced by having a required foreign key column in the ship table that has a value that matches a primary key in the ShippingLine table. In XML, the shippingLine element might be the root element of a document, and ships are child elements.

In fact, there may be different constraints for any one of these physical data models. For example, an XML document might report on all shipping lines in our database, and therefore would require a different root element. That's not a business constraint, however, so much as a constraint enforced by the physical data model representation. Those types of dependencies are best expressed locally; in this instance, the reporting module would be sure that all shippingLine elements are within the document's root element (whatever that may be).

I can go ahead and use the particular mechanisms that each physical model provides to enforce strong-strong and strong-weak entity constraints. There are drawbacks to this approach, however. The obvious one is that I must make sure that each mechanism is in fact enforcing these constraints in a manner compatible with the others. Also, changes in the strong-weak relationship of entities requires changes to schema and class definitions at separate locations. A more subtle drawback is that constraint mechanisms of a particular data model may be incompatible with the application's mechanisms for data storage and class construction. In fact, Castor's method of data binding often runs into trouble when columns in a database are defined with sophisticated column constraints.

Fortunately, the Castor mapping file has all of the appropriate features to ensure that entity relationships, both strong-strong and strong-weak, can be defined and enforced.

Dependent Classes

Castor uses the term dependent class to denote a class representing a weak entity. Instances of a dependent class have a parent master object that they are dependent upon. Through the use of a depends attribute in the class mapping element, the JDO persistence engine is able to ensure that a dependent class instance is only saved as part of its master object. In other words, you can't save a dependent class by itself.

There are other restrictions noted in the Castor documentation:

Before moving on, let's note two further advantages of enforcing dependencies through the mapping file.

  1. All depenencies are noted in one location. This makes maintenance easier when dependency relationships change.
  2. Dependent class objects do not require a separate save operation. When the dependent object's parent is saved, all of its depenencies will be saved, too.

Related Classes

Not all parent-child object relationships are strong-weak. A class representing a strong entity in the data model can refer to another strong entity class. A shipping line might be owned by a shipping magnate, or by a holding company. Both magnates and holding companies might be strong entities in our data model (they're certainly not dependent upon shipping lines for their existence (although, arguably, the magnate is)). Conversely, a shipping line may be owned by a magnate, or by a holding company, or it might be a wholly independent corporation -- so its existence is not dependent upon magnates or holding companies.

In such circumstances, even though one class might refer to the other, there isn't a strong-weak dependency there; it's possible to create magnates independently of shipping lines and vice versa. Both are strong entities in our model. For these cases, Castor declares them to be related classes. Related classes are not treated specially by Castor. This means that they come and go out of existence without regard for each other, and (most importantly) they must be saved independently.

One-to-many Relationship

A one-to-many relationship in Java is simply a class instance with a collection of objects of the same type. In a database, the one-to-many relationship is denoted by the appearance of identical foreign key values in a table (each row with the same foreign key value points to a single row in different table).

In the mapping file, fields that are collections of same-type objects are denoted by the collection attribute. The collection attribute can take one of several values:

Mapping Name java.util.collection
arraylist ArrayList
hashtable Hashtable
map Map (HashMap)
set Set (HashSet)
vector Vector

(Note: The classes in parenthesis are the default implementations of the interface shown. Defaults are used where collection instances are null.)

One consequence of a shipping line having one or more ships is that the ShippingLine class must have a collection whose elements are Ships. Let's say the collection is a Java ArrayList. The mapping file below highlights the way collections are mapped in class elements:

<class name="ShippingLine" identity="id" key-generator="seqgen">
  <map-to table="shipping_line"/> 
  <field name="id" type="integer" get-method="getID" set-method="setID"> 
    <sql name="id" type="integer"/> 
  </field> 
  <field name="name" type="string" required="true"> 
    <sql name="name" type="varchar"/>
  </field>
  <field name="ships" type="Ship" collection="arraylist>
    <sql many-key="id_shipping_line"/>
  </field> 
</class>

<class name="Ship" identity="registry name" depends="ShippingLine">
  <map-to table="ship"/> 
  <field name="registry" type="string" required="true">
    <sql name="registry" type="varchar"/> 
  </field> 
  <field name="name" type="string" required="true"> 
     <sql name="name" type="varchar"/> 
  </field> 
  <field name="shippingLine" type="ShippingLine" required="true"> 
    <sql name="id_shipping_line"/>
  </field> 

  <!-- rest of fields mapped here --> 
</class>

Here, we have a dependent one-to-many relationship between ShippingLine and Ship. This means that the collection of Ships will be saved automatically at the time ShippingLine is saved. When mapping the collection to a database table, the many_key attribute is used in the sql element. The many_key value refers to the column in the ship table that contains the foreign key to shipping_line. Let me restate that for emphasis: the shipping_line mapping refers to a foreign key column in the ship table. This value must match that of the name attribute in the sql element for the Ship class mapping (highlighted).

The foreign key column type of ship must match the type of the identifier (primary key) it refers to. Since the shipping_line table has a primary key on the id column (an integer), the foreign key id_shipping_line must also be of that type. However, note that this type is not shown in the field mapping for shippingLine. The type is inferred from the field's declared type. That type is not integer, but ShippingLine. This is the one case where the type of the field does not correlate with the type of the table column.

It's not enough, however, that I declare an ArrayList in ShippingLine to store Ship objects. I have to maintain a backlink to the master object from the dependent object. This is simple enough to do. First, I add a ShippingLine class instance to the Ship class, with get/set methods:

class Ship
{
  private ShippingLine _shippingLine;
  ...
  public ShippingLine getShippingLine() 
  { 
    return _shippingLine;
  }

  public void setShippingLine(ShippingLine shippingLine)
  { 
    _shippingLine = shippingLine;
  }
  ...
}

Next, I need to be sure that the reference to the master object is populated at the appropriate time. The logical place to do this is in ShippingLine's addShip() method:

public void addShip(Ship vShip) 
  throws java.lang.IndexOutOfBoundsException
{ 
  _shipList.add(vShip); 
  vShip.setShippingLine(this); //set the master object reference
}

What about the removeShip() method? Should it delete the master object reference from Ship when an instance is removed from the ShippingLine? It depends on the application, but as far as we're concerned, since Ship objects can only be persisted in the database through a ShippingLine, none of the Ship instances will ever be saved with an invalid master object reference (all instances will have gone through the ShippingLine.addShip() method).

Many-to-many Relationship

In order to properly represent many-to-many relationships in a database, an intersection table is required. An intersection table consists of two foreign key columns, one for each table in the many-to-many relation. Each column may have the same key value repeated in the table rows, although identical foreign key pairings should not appear in more than one row of the table.

Let's say that each PortOfCall keeps track of the Ships that visit it, in addition to each Ship maintaining its own PortOfCall list. Each PortOfCall sees many Ships, each Ship enters many PortsOfCall. This is our many-to-many relation.

I might only have two Java classes, but I will have three database tables to store these two classes. First, an outline of the Java classes is in order:

public class Ship
{
   String name;
  // String registry;  // commented out 
   ArrayList portsOfCall = new ArrayList();

   public Ship() { ... }

   public void addPortOfCall( PortOfCall poc )
   {
     portsOfCall.add( poc );
     poc.addShip( this );
   }

   //other methods...
} 

public class PortOfCall
{
   String name;
   ArrayList ships = new ArrayList;

   public PortOfCall() { ... }

   public void addShip( Ship ship )
   {
     portsOfCall.add( ship );
   }
 
   // other methods...
}

Note: Since they're not pertininent to the present discussion, I haven't shown how a ship's arrival and departure times figure into these two classes. Assume they are somehow associated with the Ship's PortOfCall instances.

Due to a limitation in the current version of Castor, I've had to rely on name alone to identify a Ship. The mapping files for both Ship and PortOfCall are as follows:

<class name="Ship" identity="name" depends="ShippingLine">
  <map-to table="ship"/> 
  <field name="registry" type="string" required="true">
    <sql name="registry" type="varchar"/>
  </field> 
  <field name="name" type="string" required="true"> 
    <sql name="name" type="varchar"/>
  </field> 
<field name="shippingLine" type="ShippingLine" required="true"> 
    <sql name="id_shipping_line"/> 
  </field>
  <field name="portOfCall" type="PortOfCall" collection="arraylist">
     <sql name="poc_name" many-table="poc_ship" many-key="ship_name"/>
  </field>
</class>

<class name="com.example.shipping.PortOfCall" identity="name">
   <map-to table="port_of_call"/>
   <field name="name" type="string"> 
     <sql name="name" type="varchar"/> 
   </field>  
   <field name="ship" type="Ship" collection="arraylist">
     <sql name="ship_name" many-table="poc_ship" many-key="poc_name"/> 
   </field> 
</class>

Nothing need be changed in the ship table, and the port_of_call table is simple enough:

CREATE TABLE port_of_call ( 
   name VARCHAR(80) 
); 

I now have to add a third join table, poc_ship, that links many port_of_call rows to many ship rows through their respective foreign keys (as described in the mapping file):

CREATE TABLE poc_ship ( 
   poc_name VARCHAR(80),    -- port_of_call fkey 
   ship_name VARCHAR(80)    -- ship fkey
 );

It's now possible to construct a test case, part of which is shown below:

...
ship[0].setName("Seven Seas");
ship[1].setName("Sea Six");

_shippingLine.addShip( ship[0] );
_shippingLine.addShip( ship[1] );  

PortOfCall pocs[] = new PortOfCall[]
{
   new PortOfCall("Pago Pago"), 
   new PortOfCall("Walla Walla")
}; 

ship[0].addPortOfCall( _poc[0] ); 
ship[0].addPortOfCall( _poc[1] );

ship[1].addPortOfCall( _poc[0] );
...
PersistenceFactory.getInstance().store(pocs[0], true); 
// true means create, not update
PersistenceFactory.getInstance().store(pocs[1], true);
//stores all Ships instances as well
PersistenceFactory.getInstance().store(_shippingLine, true); 

Now, if all went according to plan, I should see entries in the poc_ship table when I execute a SELECT * from poc_ship; SQL statement from the DBMS's command line:

poc_name     | ship_name
----------------------------
Pago Pago    | Sea Six
Walla Walla  | Sea Six
Pago Pago    | Seven Seas

If I see these entries, then I know I've successfully recorded a many-to-many join in the database using Castor JDO.

Making the Connection

In order to be able to access the database, we need to establish a connection to it. Castor maintains connection information in a separate file from the mapping file. This file is typically named database.xml.

The database.xml file contains information about the database being used and its location. It also indicates the JDBC driver to be used (JDBC stands for Java Database Connectivity, an vendor-neutral API for communicating with relational database management systems).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE databases PUBLIC "-//EXOLAB/Castor JDO Configuration DTD
     Version 1.0//EN" "http://castor.exolab.org/jdo-conf.dtd">
<database name="shippingDB" engine="postgresql"> 
   <driver class-name="org.postgresql.Driver" url="jdbc:postgresql:shippingDB">  
      <param name="user" value="Administrator"/>
      <param name="password" value=""/> 
   </driver>    
   <mapping href="d:/shipping/mapping.xml" />
</database>

The database to connect to is indicated by the value of the name attribute in the database element. The engine attribute of that element tells Castor that it's a Postgres database (formerly known as PostgreSQL, hence the name of the Castor database engine).

The driver element has a class-name attribute that indicates the class in the Java classpath that functions as the JDBC driver. The url attribute is the connection string that the JDBC driver uses to make a connection to the Postgres database. A JDBC driver is often supplied by the database vendor or distributor. For Postgres, an up-to-date JDBC driver can be found at http://jdbc.postgresql.org.

Database Programming with JDBC & Java

Related Reading

Database Programming with JDBC & Java
By George Reese

The param elements have user and password information that the driver needs to log in to the database. Other JDBC drivers may have more (or possibly fewer) parameters, so additional param elements can be provided as children of the driver element.

Finally, the location of the JDO mapping file is given in href format. Once the JDBC connection is established, Castor reads in the mapping.xml file and is ready to start storing and retrieving data.

Storing and Retrieving Objects From the Database

The storing of class instances in the relational database is straightforward, thanks to the JDO mapping file and Castor's support of Object Query Language (OQL). OQL is a query language developed by the Object Database Management Group (ODMG), a consortium of object database vendors and users. OQL is a lot like SQL with objects thrown in.

Unlike SQL, where a myriad of statements must be used to store even a simple Java object (and requiring an inate understanding by the programmer as to how each object is mapped to the various database tables), OQL can store entire hierarchies of objects in a single statement.

Opening the Database Connection

Castor relies primarily on connection information in the database.xml file. To make the connection, first a JDO instance is created using a default constructor. The JDO is then configured with the database name to open and provide the location of the database.xml file. A database.xml file may contain connection information for more than one database, hence the need to specify a database name.

Optionally, database log information can be sent to an output stream of your choosing. The log information is useful for debugging, since it records all SQL commands and return statuses that Castor sends to and receives from the database during transaction commits.

package com.example.shipping; 

import org.exolab.castor.jdo.*; 

import java.io.PrintWriter; 
import java.io.ByteArrayOutputStream; 

class PersistenceFactory
{
   private static String _databaseName = "shippingDB"; 
   private static String _databaseConfiguration = "d:/shipping/database.xml";

   public PersistenceFactory() throws PersistenceException 
   {
      super();
      _jdoLogWriter = new PrintWriter(new ByteArrayOutputStream(400));
      open();
   }

   private void open() throws PersistenceException 
   { 
     JDO jdo; // Define the JDO object 
     jdo = new JDO(); 
     jdo.setLogWriter(_jdoLogWriter);
     jdo.setDatabaseName(_databaseName); 
     jdo.setConfiguration(_databaseConfiguration);  // Obtain a new database 
     _db = jdo.getDatabase();
   } 

   // other persistence methods... 
}

The core of the connection procedure is highlighted above. Once the connection is made, all subsequent database transaction go through Castor's Database class instance.

Storing Objects in the Database

As mentioned earlier, Castor uses OQL to store and retrieve data from the database. OQL is an object-oriented extension of SQL, so it is possible to use SQL commands, although for object manipulation, it's not the preferred choice.

Before doing anything with the database, you must first start a transaction using the begin() method of org.exolab.castor.jdo.Database. Changes are not made to the database until the transaction is committed using the commit() method. Both methods' calls are highlighted below:

public void store(Object obj, boolean create)
throws PersistStoreException
{  
  try {
     _db.begin();
     if (create)
     { 
        _db.create(obj);  
     }
     else 
     { 
        _db.update(obj); 
     } 
     _db.commit();
  } 
  catch (ClassNotPersistenceCapableException ex) 
    { throw new PersistStoreException(ex.toString()); } 
  catch (DuplicateIdentityException ex) 
    { throw new PersistStoreException(ex.toString()); } 
  catch (TransactionNotInProgressException ex) 
    { throw new PersistStoreException(ex.toString()); } 
  catch (TransactionAbortedException ex) 
    { throw new PersistStoreException(ex.toString()); } 
  catch (PersistenceException ex) 
    { throw new PersistStoreException(ex.toString()); } 
}

Once a transaction is started, you may save a new object in the database or, if changes were made to an object previously fetched from the database (as it could have been from a prior transaction), you may "update" it. If the Database create() method is called, then an object with the same identifier must not already exist in the database, or an exception will occur. Updates, on the other hand, require that the object be in the database, inasmuch as it has the same identifier.

Dependent objects are not created or updated directly, as you may recall. What counts is whether the master object exists previously in the database or not.

In the code above, I've elected to collapse all of the possible exceptions that Castor can throw during the transaction into one exception that I've defined. I'm assuming that any exception that gets thrown here is essentially fatal, but another (more clever?) person might be able to program a recovery under certain circumstances. Also, while having the transaction begin/commit calls inside of this one method simplifies the storage and update of single objects, you would normally want to commit multiple objects at one time in a single transaction for better efficiency. In that case, you'd begin your transaction before the first object is stored and commit after the last object is stored.

Retrieving the ShippingLine Objects

OQL queries for fetching object data look very much like SQL queries, except that you don't get a row cursor returned (where each row then has to be manhandled into a class instance); instead, you receive a cursor to a list of fully-formed objects. Very nice.

public Object fetch(String objName, String id) 
   throws PersistFetchException 
{ 
   Object object = null;  
   try { 
     _db.begin();  
     Query oql; 
     QueryResults results;  
     oql = _db.getOQLQuery(  
         "SELECT o" + 
         " FROM com.example.shipping." + objName + " o" +
         " WHERE o.id=\"" + id + "\"" );
     results = oql.execute();
  
     while (results.hasMore() ) 
     { 
        object = results.next();

     }  
     _db.commit(); 
  }
  catch ( TransactionAbortedException ex )
    { throw new PersistFetchException( ex.toString() ); } 
  catch ( TransactionNotInProgressException ex ) 
    { throw new PersistFetchException( ex.toString() ); } 
  catch ( PersistenceException ex ) 
    { throw new PersistFetchException( ex.toString() ); }

  return object;  
}

To fetch an object from the database, we first construct an OQL query using the getOQLQuery() method of org.exolab.castor.oql.Database. The query in this case might be:

SELECT o 
FROM com.example.shipping.Ship o
WHERE o.id="Waterdown"

Once the query object is constructed, it can be executed. The results are returned in a QueryResults object. The QueryResults object functions as an array of Object instances. I'm assuming in this code that I have only one result (bad programmer am I). However, when I originally wrote this example, I called the undocumented results.size() method and checked for a value equal to 1. I was unpleasantly surprised to find that size() always returned 0! In this case, it's a safe assumption that I'm only going to get one object per identifier, but it's not very defensive coding.

The returned object(s) can be cast to the expected class type by the caller.

Conclusion

Castor JDO provides a lot of capability for simplifying the storage of Java data objects in relational tables. The use of a mapping file to direct the Castor JDO engine on how to store object member data to table rows and columns really reduces the amount of grunge code involved in query/update procedures. Mapping files maintain information about object dependencies and primary/foreign keys, also; the result is that you don't have to futz with such dependencies in either the application code or database schema.

There are many additional features of Castor JDO that I was not able to cover in this article. A lot of these are not yet well-documented; however, there's usually some example code that you can download from the Castor Web site, either as a source repository snapshot (taken daily and for each release build) or by using WebCVS.

As of this writing, Castor sits at pre-release version 0.9.3.21. A call for contributors recently went out on the the Castor mailing list, so if storing objects in databases is your kind of gig, you might check out the possibilities available in getting Castor 1.0 out the door. Hey, it just might save you a lot of work!

Jeff Lowery is a JDO expert and advocate and is experienced at using Exolab's Castor, Jakarta Ant, and more.


Return to ONJava.com.

Copyright © 2009 O'Reilly Media, Inc.