Castor is a multifaceted software tool being developed under the auspices of exolab.org, an informal organization involved in the development of open source, enterprise software projects based on Java and XML.
The primary function of Castor is to perform data binding. Data binding is a process that facilitates the representation of one data model in another. For example, an XML data model, described by an XML schema document, can be approximately represented by Java classes. Castor helps by generating these classes from the XML schema document. Object instances of these classes are then able to store XML document data, so long as such documents conform to the XML schema.
Such binding works both ways, of course: object instances of the generated classes can be easily transformed back into XML documents through a process known as marshalling. Castor's marshalling engine can introspect the Java data objects and generate corresponding XML document elements. Such marshalling can be refined through the use of user-defined mapping files, which Castor also supports.
|
Related Reading
Java & XML Data Binding |
While XML data binding is very useful, this article will focus on another aspect of Castor: data binding of Java objects to tables, columns, and rows in a SQL database. This functionality falls under the heading of Castor JDO. The intent of the Java Data Objects (JDO) standard is to transparently persist Java objects. Although similar in name and intent to Sun's efforts, Castor JDO was developed largely independently of Sun's efforts, but aims roughly in the same direction. Be that as it may, the two technologies have different feature sets, and should not be assumed to be interoperable.
Castor is able to employ a mapping file as a simple way to bind Java objects to SQL database tables. Instead of writing complex procedural code in Java to manage the database queries and updates via JDBC, Castor hides this complexity by using the mapping file entries and transparently performing the proper queries and updates in the background. This declarative mechanism of descibing how the objects and the database are linked makes maintenance easier, because mapping files are relatively easy to understand and can be changed without recompiling code. It's also much simpler in that it operates at a higher level of abstraction than functionally equivalent JDBC mechanisms allow.
Unlike Castor's XML marshalling -- where there is a default mapping of a Java object to XML elements and attributes -- no default mapping exists for binding Java objects to SQL database tables: you must use a mapping file to enable this functionality. The mapping file contains explicit information on how Castor should represent a set of Java objects in a relational database.
Mapping files are written in XML, and can be validated against either a DTD or an XML schema supplied by the Castor group. This type of validation will catch syntax errors, but it does take some practice to understand the concepts behind mapping files and understanding Castor's JDO behavior. If Castor JDO appears not to be storing data in the database, it's likely due to missing entries in an otherwise syntactically valid mapping file.
Before going into the details of mapping, let's first present an overview
of the mapping elements. For every Java class whose instances are to be
stored in the database, a class element is required in the
mapping file. These class elements reside one level below
the mapping root element:
<mapping>
<description>Optional description of mapping file</description>
<!-- you can include other mapping files -->
<include href="other/mapping_file.xml"/>
<class name="Class1" map-to="db_table1">
<description>Optional description of this class mapping</description>
<field name="field> <!-- mapping of a class data member -->
<sql> ... </sql> <!-- maps Java field to database column -->
</field>
<field> ... </field>
...
</class>
<class name="Class2" map-to="db-table2">
...
</class>
</mapping>
For each class to be mapped to a database table (the
map-to attribute indicates which table), the data members of
the class to be persisted (referred to as fields by Castor) will
have corresponding field elements. Each field element in turn has a sql child
element, which describes how such fields are stored in the database.
Once a mapping has been established, Castor takes care of the queries and updates that are needed to fetch and store data as Java objects. It also handles conversions between SQL and Java datatypes seamlessly.
A mapping file is written from the point of view of the Java class and describes how the properties of each object are to be stored in database tables, columns, and rows. These class properties are referred to as fields in the mapping file.
The general rules for mapping Java objects to database tables are:
get/set methods.Castor uses the XML-format mapping file to determine:
Every element and attribute of a primitive type in Java (int, boolean,
double, etc.) can be mapped to a table column that is of a comparable SQL
type. Some nonprimitive types (classes), such as java.lang.String,
java.sql.Date, java.lang.Long, etc., can be mapped to a single table column, as
well. All other Java classes must have a class element definition in
the mapping file to be persisted in a database.
Remember, Castor will not introspect a class and apply a set of default rules to guess the fields and attempt to persist them. This is different behavior from that of the XML marshaller.
There may be cases where a class instance only exists as a member of a containing parent object (in other words, it makes an appearance only as a child of some other object). The mapping file conveniently allows such dependent objects to be denoted as such. The advantages of recording dependencies in the mapping file will be explained later.
When reconstructing an object from a SQL database, Castor uses the mapping information to determine where the class' data members are stored in the database.
It is possible to set up the mapping file so that an object doesn't have
to be pulled from the database in its entirety, but can be lazily fetched.
This means that certain members of the class instance will only be retrieved
when the appropriate get method is called. The result is that large
objects can be fetched incrementally -- avoiding the performance hit of
fetching a load of data from the database that might not be needed by the
program at that time.
We're now ready to start in on the mechanics of how a set of Java objects are mapped to a SQL database. For this example, we assume the following environment:
The main variable here is the database. Not all databases support the same features, and some support similar features differently. I will point out cases where the examples show a Postgres-specific way of doing things.
|
For purposes of illustration, it's best to have a set of classes that exhibit the following relations between objects:
The first three are self-explanatory cardinality relationships among class instances. Number 4 has already been covered. The fifth item, class related to another, is the case where a class has a data member reference to another, nondependent class. The referenced class in this case is a strong entity, whose instances exist independently of the parent class instance.
The sixth type of relation is that where a class extends (is derived from) another Java class. The extended class type may contain additional data members (fields) not defined in the base class. If the base class and its data members are already mapped to a database table, then it would be desirable to be able to reuse that mapping for a derived class. In this case, a new table is created to hold the extended class' data members, and the members that are inherited are stored in the table used for the base class. In other words, a class that extends another class can be mapped in way that, when instances of the extended class are saved, data member values of that class are stored in two tables: the base type table, and the extended type table.
I'll use for an example a description of a merchant shipping line. At the top level, the shipping line has ships, and each ship has ports-of-call. This gives us three main domain objects:
Each of these domain objects in turn will have properties, both simple and complex. The easiest way to show the relation between line, ship, and port, along with their respective properties, is to see the object marshalled into an XML document that contains sample information for all of the above:
<shipping-line name="Burble Bros. Shipping">
<ship name="Bubbles" registry="Liberian">
<port-of-call name="Seattle" pier="41">
<arrival-time day="2002-12-03" time="02:35" zone="GMT"/>
<departure-time day="2002-12-05" time="22:45" zone="GMT"/>
</port-of-call>
<port-of-call name="San Fransisco" pier="1">
...
</port-of-call>
...
</ship>
<ship name="Spring Surprise" registry="Sierra Leone">
<port-of-call ...>
...
</shipping-line>
As you can see in this XML document, each shipping line has multiple ships, and each ship has multiple ports-of-call. In addition, each element above has one or more attributes. In Java, such attributes are represented as class properties. Later, I'll show how these same attributes correspond to columns in a database table.
ShippingLine ClassEach shipping line has a name and several ships. An interface to the
ShippingLine class might look like the following:
public interface IShippingLine {
public String getName();
public Ship getShip( int i );
public ArrayList getShips();
public void addShip( Ship s );
public void setName( String name );
// more methods...
};
The ShippingLine class has a Java-bean-type interface, with get/set
methods for every data member of the class. Because a ShippingLine object can
contain many Ship instances, the methods for adding and retrieving Ship
objects are different, because they manipulate objects in a collection.
The actual data members of ShippingLine are as follows:
public class ShippingLine {
private String _name;
private ArrayList _ships;
private int _id;
// methods...
};
You'll note the addition of an _id data member that was not mentioned in
the interface methods shown previously. The _id member will be used only by the
JDO persistence engine as a table row identifier. It holds, in effect, the
value of the primary key of the row in database table that contains
ShippingLine object data. The _id key value will be
autogenerated (as you shall see), and will therefore be unique.
shipping_line TableWe will use the following SQL statement to create a table for holding
ShippingLine information in our Postgres database:
CREATE TABLE shipping_line ( id SERIAL, name VARCHAR(80) );
Since most SQL databases (including Postgres) are case-insensitive, we've
named the table shipping_line rather than shippingline to
make the table name easier to read. For the definition itself, the language
keywords have been capitalized, but again, that is only for legibility. Since
SQL does not have an array type, there isn't a ships column in the
table definition. In order to incorporate ships in shipping line, we will
have to use two tables. The Mapping Extended Classes section describes how this is done. For now, I'll concentrate on binding just this one table's information in the section to follow.
Before moving on to explain how to map a Java class to a table, let's first explain the table column definitions above in more detail.
id: The id column is defined as SERIAL, which is a Postgres notational shortcut for defining an auto-incrementing integer field. Such
auto-incrementing fields are useful as primary keys; in other words, the
columns used to uniquely identify rows in a table. When defining a column
of type SERIAL, a SEQUENCE table is generated in the background. A
sequence table is a special database table that holds the last generated
sequence number and has methods for retrieving the next sequence value.
More on this in the key generators section to follow.
name: The name column is defined as being of type VARCHAR
(a.k.a. CHARACTER VARYING) of maximum length 80. Postgres doesn't
actually require a maximum VARCHAR length to be specified, but most
database systems do. VARCHAR columns vary the number of bytes being
stored based on the length of the string values, and are therefore more
storage efficient than fixed-lenghth columns.
It may be that the name of a shipping line is sufficient to identify it
from any other. It may be likely that such is the case, and if so, then
shipping_line.name is sufficient as a row identifier in our table. The
shipping line name would then serve as a natural key. To be safe,
though, I'm using a generated sequential id, known as a surrogate
key. Surrogate keys have no semantics associated with them -- they merely
identify each row in the shipping_line table, and therefore each shipping
line. With surrogate keys, it is easy to ensure that no identity collisions
take place.
For table joins performed during a SQL query, integer-based surrogate key
lookups will prove faster than natural keys that require string comparisons.
Therefore, our foreign-key/primary-key columns in the database will use these
surrogate keys instead of the natural ones. However, it will be necessary to
place a unique index on the natural key as well as the surrogate, so that the
two are always mapped one-to-one (and the id is associated with one and only one
name):
CREATE TABLE shipping_line ( id SERIAL, name VARCHAR(20)
CONSTRAINT uniq_name UNIQUE (name));
Note that it was not necessary to define id as unique, since the SERIAL
datatype covers that base already.
Let's now proceed to write the XML mapping file.
|
As mentioned earlier, unlike Castor's XML marshalling, which has a default mapping based on Java introspection of the class definitions, a class-to-table mapping file must be provided explicitly in order for Castor to persist objects in a database. There are some tools, such as Raccoon, that can generate a Castor mapping file from Java beans. For the examples to follow, I'll author the mapping file by hand. By default, Castor expects a mapping file named mapping.xml.
If you've used Castor mapping files in the past to marshal XML, then the
format for binding Java data objects to databases is will seem familiar. Here
is the mapping between the ShippingLine Java class and the shipping_line database table:
<class name="ShippingLine" identity="id">
<map-to table="shipping_line"/>
<field name="id" type="integer">
<sql name="id" type="integer"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
</class>
What follows is an explanation of the elements and their values:
The class element names the class to be mapped and its
identity field. In cases where there are multiple fields that make up the
identifier for the class (and table), a space-separated list of id fields
can be specified. Say, for instance, that ShippingLine id was not unique,
but that id + name would be. In that case, the identity attribute of the
class element would have this value:
<class name="ShippingLine" identity="id name">
The map-to element identifies the database table that
will be used to persist the ShippingLine class data.
The first field to be mapped is the id field.
Its datatype is specified as integer, which is compatible with both the
Java primitive datatype intand class java.lang.Integer. A type
specification is not required, but by providing one, we add an extra bit
of type checking during the persistence process.
Values for the class data member id are to be stored in the id
column of the ShippingLine table, according to the sql
child element of field. The SQL type integer is compatible with that of the integer datatype used for the Java field.
The next field to be mapped is the name field. One of
the attributes of the field element indicates that a value for the name
is required. When it comes time to persist a ShippingLine object, that
object must have a name value or an exception will be thrown.
The sql element maps the name field to a like-named
column in shipping_line. The type of the column is declared to be
varchar, which is compatible with the java.lang.String type of the
field.
Note: Field names generally refer to methods, not the names of data members of a class.
Let's take a look at the ShippingLine class once more:
public class ShippingLine {
private String _name;
private ArrayList _ships;
private int _id;
// methods...
public int getId() { return _id; }
public void setId( int id ) { _id = id; }
//...
};
You see that the name of id data member actually has a preceding
underscore: _id. Note that we didn't see this underscore in the field name
of the class mapping. That's because Castor is not accessing the _id variable
directly. Instead, it is inferring the names of the getter/setter methods for
the _id data member from the field name provided in the class mapping. Hence,
getId() and not get_id().
It is possible, however, to access data members directly. The section "Binding Through Class Data Members" will show you how this is done.
In order to ensure that the id column of the shipping-line table
is unique, we defined the column to be of type SERIAL. The Postgres database
supports this nonstandard (non-SQL) type as a convenient way to specify
columns as unique identifiers for a table row. It's really more of a function
than a type. When a column of type SERIAL is declared in Postgres SQL, the
Postgres database management system creates a SEQUENCE table for that
column.
A SEQUENCE table is a special table designed specifically for the
autogeneration of incremental values. These values are most frequently used
as identifiers, or primary keys, of table rows. When the shipping_line table
was created, Postgres's SQL interpreter sees the SERIAL type of the id column
and creates a separate SEQUENCE table named shipping_line_id_seq .
Note: Just about every database has support for SEQUENCE tables,
although the syntax for using them may vary. Consult your database manual for
specifics. What follows is an explanation of how to use SEQUENCE tables in
Postgres; different databases may require slight alterations in the syntax
used below.
Having a SERIAL datatype for a column is only half of the process of the
auto-generation of keys. The remaining work to be done is to call the
auto-generating function of the sequence table prior to commiting a row
insertion. This is done by declaring a key-generator element
in the mapping.xml file. In this element, Castor has to know the name of the
SEQUENCE table in order to be able to call the proper SEQUENCE table method
for getting the next id value.
The way to specify the SEQUENCE table
name in the key-generator element is
shown below.
<key-generator name="SEQUENCE" alias="seqgen">
<param name="sequence" value="{0}_{1}_seq"/>
</key-generator>
The key-generator element has two attributes. The name attribute describes
the type of key being used. Castor supports various types of keys; SEQUENCE
keys are the most common. The alias attribute gives us a handle on the key
generator. It is possible to refer to the key generator by its name, but an
alias gives us the option of having two different SEQUENCE generators
in the mapping file, each under a separate alias.
The param child element indicates the SEQUENCE table name. Here we are
using replacable parameters, {0} and {1}, to specify the SEQUENCE table to be used. Every occurrence of the SERIAL column type requires a new sequence table to be
created. The name of the table is generated from the table name where SERIAL
occurs, along with the column name. Recall that the sequence table name for
the shipping_line id column was: shipping_line_id_seq. By using
the replaceable paramters {0} (which resolves to the table name containing
the SERIAL column) and {1} (the SERIAL column name), each time the
key-generator is called, the correct SEQUENCE table is used.
With my key-generator defined, I can next hook up a key-generator to the
shipping_line table mapping definition by adding a reference to the
key-generator via a class element attribute:
<class name="ShippingLine" identity="id">
<map-to table="shipping_line"/>
<field name="id" type="integer" key-generator="seqgen">
<sql name="id" type="integer"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
</class>
Now, every new shipping-line row added to our Postgres database will have
a unique, sequential value as an identifier.
Not all SQL tables have single-column keys; some keys are compound keys. A compound key uniquely identifies a row in a database table, but uses two or more columns to do so.
Let's say a ship's name uniquely identifies a ship within a
registry. That's not to say that a ship is uniquely identified by name; it is
instead identified by the compound identifier registry.name::ship.id. This
means that each ship must have a registry attribute, and that the combination of registry attribute and ship id must be unique within the XML document (regardless of which shipping line the ships are in). In XML schema, it is possible to define a unique constraint that
will enforce this for all ship elements in the document root:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="document_root">
<xs:complexType>
<xs:sequence>
<xs:element name="shippingLine" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="ship" maxOccurs="unbounded">
<xs:complexType>
<xs:attribute name="name" type="xs:string"
use="required"/>
<xs:attribute name="registry" type="xs:string"
use="required"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="name" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
<xs:unique name="shipUniq">
<xs:selector xpath="shippingLine/ship"/>
<xs:field xpath="@name"/>
<xs:field xpath="@registry"/>
</xs:unique>
</xs:element>
</xs:schema>
This same uniqueness constraint can be implemented in the database table
definition for ship:
CREATE TABLE ship (
registry VARCHAR(80) NOT NULL,
name VARCHAR(80) NOT NULL
-- other column definitions here --
CONSTRAINT id_ship UNIQUE (registry, name)
);
The database mapping file used by Castor can enforce this constraint for us, instead:
<class name="Ship" identity="registry name">
<map-to table="ship"/>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
<field name="registry" type="string" required="true">
<sql name="registry" type="varchar"/>
</field> <!-- rest of fields mapped here -->
</class>
The value of the identity attribute of the class element is, in this case, a
space-delimited list of fields that together uniquely identify each ship.
With the identity constraint defined here, there is no need to have a unique
constraint defined in the database table definition. Castor will handle
it.
|
There are two basic ways of binding the properties of a class to columns
in a SQL table. The first method is to use public class methods to access the
properties. By default, Castor uses this type of access, inferring the name
of the public accessor/mutator methods from the name of the field being
mapped. As was shown earlier, if the field name is id, then the
accessor method is assumed to be getId().
What if the method names are different from what Castor assumes? In that case, you have to inform Castor of the proper methods by specifying them in the field mapping element:
<class name="ShippingLine" identity="id">
<map-to table="shipping_line"/>
<field name="id" type="integer" key-generator="seqgen"
get-method="getID" set-method="setID">
<sql name="id" type="integer"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
</class>
Normally, Castor would be looking for the Java bean methods
getId() and setId() associated with the id
property. The class author (me) prefers to capitalize both letters of ID when
it occurs in method names. Because the capitalization is different from what
Castor would expect, the method names must be given explicitly in the mapping
file.
A second way to bind a class property to a SQL table column is to access the class data member directly, rather than through methods. The data member must be public for this mechanism to work. To specify direct access, the following attribute of the field element is used:
<class name="ShippingLine" identity="id">
<map-to table="shipping_line"/>
<field name="_id"type="integer" key-generator="seqgen" direct="true">
<sql name="id" type="integer"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
</class>
By accessing the data member directly in this way, you can reduce the
number of public methods in the class definition. Notice that I've had to add
the underscore character, since I'm now referring to the name of the data
member directly, and it has an underscore in its name. The drawback to this
mechanism is that, because of the need for a public data member, I
consequently allow direct access by outside class methods in the application.
In any case, you really don't have much protection from the wicked, but at
least with public get/set methods you can deprecate their use where advisable
through Javadoc comments.
In Java, a class definition can extend another (base) class' definition through type-derivation. The new class will inherit all of the data members and methods of the base class, plus it might add a few of its own.
In the case where only the extended class is stored in the database, the mapping file entry for that class will be the same as for any other class; however, if the base class is by itself persisted in the database already, then there will be an entry in the Castor mapping file for that class. It would be silly to have to copy all of those field mappings for the base class in the new mapping entry of the derived class. It would also be a maintenance worry: the derived class definition may change, and you'd wind up with two mapping entries to update.
Better would be to store the new data members of the extended class in a
new table in the database using a new mapping, and store the existing members
of the base class in the table already set aside for it. Then Castor can
perform a join operation between the two tables, generating a row that has
all of the values of the data members (both inherited and new) for the extended
class. Two joined table rows form one class instance.
extends AttributeA mapping for a class extends another class mapping through use of the
extends attribute. Let's say I want to subtype the ship
class so that there are cargo ships and passenger ships. I then have the
following Java classes:
public class Ship
{
String name;
String registry;
public Ship() { ... }
//getters/setters...
}
public class CargoShip extends Ship
{
int cargoTonnage;
public CargoShip() { ... }
//getters/setters...
}
public class OilTanker extends Ship
{
int barrels;
public OilTanker() { ... }
//getters/setters...
}
To store these new classes, I can map them as follows:
<class name="Ship" identity="registry name">
<map-to table="ship"/>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
<field name="registry" type="string" required="true">
<sql name="registry" type="varchar"/>
</field>
<!-- rest offields mapped here -->
</class>
<class name="CargoShip" extends="ship" identity="id" key-generator="seqgen">
<description>Cargo is measured in net tonnage</description>
<map-to table="cargo_ship"/>
<field name="id" type="integer">
<sql name="id" type="integer" />
</field>
<field name="cargoTonnage" type="integer">
<sql name="cargo_tonnage" type="integer"/>
</field>
</class>
<class name="OilTanker" extends="ship" identity="id" key-generator="seqgen">
<description>Oil is measured in barrels</description>
<map-to table="oil_tanker"/>
<field name="id" type="integer">
<sql name="id" type="integer" />
</field>
<field name="barrels" type="integer">
<sql name="barrels" type="integer"/>
</field>
</class>
Because new tables are required to store the additional data members of
the extended classes, each extended class must have its own id, in addition to
the identifying name/registry it already inherits from the base Ship class.
If we were dealing only with Java classes, having these extra identifiers in
the extended classes would be silly. But since we store the extended class
data in two relational tables, each row in the second (extension) table must
have a unique identifier; otherwise, the table would not be in First Normal
Form (where each row must be distinct). The unique identifer in each row
ensures that such is the case.
There can be cases where you might not want to create a new database table for each class to be persisted. It's possible in Castor to store data from a parent object and its child in one table.
The ships of the line, in addition to ferrying ubiquitous shipping containers across the seas, have to transport an odd-sized crate now and then. The dimensions of each crate are maintained in the database. The unit of measure of a dimension might be in either meters or feet.
public class Dimension
{
public double measurement;
public UOMType uom;
}
public class UOMType
{
private static String _feet = "ft";
private static String _meters = "m";
private String _unit;
public static UOMType FEET = new UOMType( _feet );
public static UOMType METERS = new UOMType( _meters );
private UOMType( String type ) { _unit = type;}
public void setUnit( String type ) {_unit = type; }
public String getUnit() {return _unit;}
}
The last class maintains public instances that denote two types of
measurement. Each Dimension instance will refer to one of
these UOMType objects. Since UOMType is a separate class
from Dimension, it would seem that
I'd have to have two class descriptions in the Castor mapping file. That's
overkill for the tiny UOMType class, with its single primitive value.
An alternative approach is to store UOMType data in the same table that
contains the Dimension rows. To accomplish this, I use a dot path in the
class mapping for Dimension, like so:
<mapping>
<!-- Mapping for class Dimension -->
<class name="Dimension" identity="id">
<map-to table="dimension"/>
<field name="id" type="integer">
<sql name="id" type="integer"/>
</field>
<field name="measurement" type="double">
<sql name="measurement" type="decimal"/>
</field>
<field name="uom.unit" type="string">
<sql name="uom" type="varchar"/>
</field>
</class>
</mapping>
A dot path denotes data that is nested at a lower level in a container
class than the class that is being mapped. By referring to nested data in
this way, both the Dimension class data and the
UOMType class data are stored
in one table. With this approach, it's easy to look at the database row
values and see what is being measured.
|
I use the term object hierarchy to refer to the parent-child
hierarchy of class instances where an object has one or more objects as data
members. In the ShippingLine example, the children of
ShippingLine objects
are instances of class Ship; Ship
instances, in turn, have PortOfCall child
objects as data members.
There are several ways to map such hierarchies to a relational database table. Each type of mapping is determined by factors such as the cardinality of parent-child relationship (one-to-one, one-to-many, many-to-many), and if children represent strong or weak entities.
One of the primary constraints to be determined in any business model is
the relative independence of data entities (represented as classes and their
instances in Java). Some entities come into and out of existence of their own
accord. For instance, new shipping lines are founded and others go out of
business as time passes. We want to be able to create new ShippingLine
classes at the drop of a hat. Shipping lines are strong entities in
our data model. When a new shipping line comes into existence, I can
construct a ShippingLine object to contain information about it.
There are some entities in our model that don't have such a carefree existence. Such an entity is so dependent upon another that it couldn't exist without the existence of the referenced entity. When an entity instance depends upon another for its very existence, it is a weak entity. Ships are weak entities and must be associated with a shipping line before we can incorporate ship information in our model.
Note that I'm referring to business rules in the abstract, and not to
rules applicable to specific physical data models. There are two physical
models that this article focuses upon: one in Java and one in a SQL database.
If we tossed in XML documents, that would be a third. Each physical model has
different mechanisms for enforcement of weak entity dependencies. In Java, a
Ship's dependency upon a ShippingLine would
usually be enforced by requiring
a non-null ShippingLine.object parameter passed to the Ship class
constructor. In the database, such a dependency would be enforced by having a
required foreign key column in the ship table that has a value that matches a
primary key in the ShippingLine table. In XML, the shippingLine
element might be the root element of a document, and ships are child elements.
In fact, there may be different constraints for any one of these physical
data models. For example, an XML document might report on all shipping lines
in our database, and therefore would require a different root element. That's
not a business constraint, however, so much as a constraint enforced by the physical
data model representation. Those types of dependencies are best
expressed locally; in this instance, the reporting module would be sure that
all shippingLine elements are within the document's root element (whatever
that may be).
I can go ahead and use the particular mechanisms that each physical model provides to enforce strong-strong and strong-weak entity constraints. There are drawbacks to this approach, however. The obvious one is that I must make sure that each mechanism is in fact enforcing these constraints in a manner compatible with the others. Also, changes in the strong-weak relationship of entities requires changes to schema and class definitions at separate locations. A more subtle drawback is that constraint mechanisms of a particular data model may be incompatible with the application's mechanisms for data storage and class construction. In fact, Castor's method of data binding often runs into trouble when columns in a database are defined with sophisticated column constraints.
Fortunately, the Castor mapping file has all of the appropriate features to ensure that entity relationships, both strong-strong and strong-weak, can be defined and enforced.
Castor uses the term dependent class to denote a class
representing a weak entity. Instances of a dependent class have a parent
master object that they are dependent upon. Through the use of a
depends attribute in the class mapping element, the JDO
persistence engine is able to ensure that a dependent class instance is only
saved as part of its master object. In other words, you can't save a
dependent class by itself.
There are other restrictions noted in the Castor documentation:
key-generators.Before moving on, let's note two further advantages of enforcing dependencies through the mapping file.
Not all parent-child object relationships are strong-weak. A class representing a strong entity in the data model can refer to another strong entity class. A shipping line might be owned by a shipping magnate, or by a holding company. Both magnates and holding companies might be strong entities in our data model (they're certainly not dependent upon shipping lines for their existence (although, arguably, the magnate is)). Conversely, a shipping line may be owned by a magnate, or by a holding company, or it might be a wholly independent corporation -- so its existence is not dependent upon magnates or holding companies.
In such circumstances, even though one class might refer to the other, there isn't a strong-weak dependency there; it's possible to create magnates independently of shipping lines and vice versa. Both are strong entities in our model. For these cases, Castor declares them to be related classes. Related classes are not treated specially by Castor. This means that they come and go out of existence without regard for each other, and (most importantly) they must be saved independently.
A one-to-many relationship in Java is simply a class instance with a collection of objects of the same type. In a database, the one-to-many relationship is denoted by the appearance of identical foreign key values in a table (each row with the same foreign key value points to a single row in different table).
In the mapping file, fields that are collections of same-type objects are
denoted by the collection attribute. The collection
attribute can take one of several values:
| Mapping Name | java.util.collection |
arraylist |
ArrayList |
hashtable |
Hashtable |
map |
Map (HashMap) |
set |
Set (HashSet) |
vector |
Vector |
(Note: The classes in parenthesis are the default
implementations of the interface shown. Defaults are used where collection
instances are null.)
One consequence of a shipping line having one or more ships is that the
ShippingLine class must have a collection whose elements are Ships. Let's
say the collection is a Java ArrayList. The mapping file below highlights
the way collections are mapped in class elements:
<class name="ShippingLine" identity="id" key-generator="seqgen">
<map-to table="shipping_line"/>
<field name="id" type="integer" get-method="getID" set-method="setID">
<sql name="id" type="integer"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
<field name="ships" type="Ship" collection="arraylist>
<sql many-key="id_shipping_line"/>
</field>
</class>
<class name="Ship" identity="registry name" depends="ShippingLine">
<map-to table="ship"/>
<field name="registry" type="string" required="true">
<sql name="registry" type="varchar"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
<field name="shippingLine" type="ShippingLine" required="true">
<sql name="id_shipping_line"/>
</field>
<!-- rest of fields mapped here -->
</class>
Here, we have a dependent one-to-many relationship between ShippingLine
and Ship. This means that the collection of Ships will be saved automatically
at the time ShippingLine is saved. When mapping the collection to a database table, the many_key attribute is used in the sql element. The many_key value refers to the column in the ship table that contains the foreign key to shipping_line. Let me restate that for emphasis: the shipping_line mapping refers to a foreign key column in the ship table. This value must match that of the name attribute in the sql element for the Ship class mapping (highlighted).
The foreign key column type of ship must match the type
of the identifier (primary key) it refers to. Since the
shipping_line table has a primary key on the
id column (an integer), the foreign key
id_shipping_line must also be of that type. However, note
that this type is not shown in the field mapping for shippingLine. The type
is inferred from the field's declared type. That type is not integer, but
ShippingLine. This is the one case where the type of the field does not
correlate with the type of the table column.
It's not enough, however, that I declare an ArrayList in ShippingLine to
store Ship objects. I have to maintain a backlink to the master object from
the dependent object. This is simple enough to do. First, I add a ShippingLine class instance to the Ship class, with get/set methods:
class Ship
{
private ShippingLine _shippingLine;
...
public ShippingLine getShippingLine()
{
return _shippingLine;
}
public void setShippingLine(ShippingLine shippingLine)
{
_shippingLine = shippingLine;
}
...
}
Next, I need to be sure that the reference to the master object is populated
at the appropriate time. The logical place to do this is in ShippingLine's
addShip() method:
public void addShip(Ship vShip)
throws java.lang.IndexOutOfBoundsException
{
_shipList.add(vShip);
vShip.setShippingLine(this); //set the master object reference
}
What about the removeShip() method? Should it delete the master object
reference from Ship when an instance is removed from the ShippingLine? It
depends on the application, but as far as we're concerned, since Ship objects
can only be persisted in the database through a ShippingLine, none of the
Ship instances will ever be saved with an invalid master object reference
(all instances will have gone through the ShippingLine.addShip() method).
|
In order to properly represent many-to-many relationships in a database, an intersection table is required. An intersection table consists of two foreign key columns, one for each table in the many-to-many relation. Each column may have the same key value repeated in the table rows, although identical foreign key pairings should not appear in more than one row of the table.
Let's say that each PortOfCall keeps track of the Ships that visit it, in
addition to each Ship maintaining its own PortOfCall list.
Each PortOfCall sees many Ships, each Ship enters many PortsOfCall. This is our many-to-many relation.
I might only have two Java classes, but I will have three database tables to store these two classes. First, an outline of the Java classes is in order:
public class Ship
{
String name;
// String registry; // commented out
ArrayList portsOfCall = new ArrayList();
public Ship() { ... }
public void addPortOfCall( PortOfCall poc )
{
portsOfCall.add( poc );
poc.addShip( this );
}
//other methods...
}
public class PortOfCall
{
String name;
ArrayList ships = new ArrayList;
public PortOfCall() { ... }
public void addShip( Ship ship )
{
portsOfCall.add( ship );
}
// other methods...
}
Note: Since they're not pertininent to the present discussion, I haven't shown how a ship's arrival and departure times figure into these two classes. Assume they are somehow associated with the Ship's PortOfCall instances.
Due to a limitation in the current version of Castor, I've had to rely on name alone to identify a Ship. The mapping files for both Ship and PortOfCall are as follows:
<class name="Ship" identity="name" depends="ShippingLine">
<map-to table="ship"/>
<field name="registry" type="string" required="true">
<sql name="registry" type="varchar"/>
</field>
<field name="name" type="string" required="true">
<sql name="name" type="varchar"/>
</field>
<field name="shippingLine" type="ShippingLine" required="true">
<sql name="id_shipping_line"/>
</field>
<field name="portOfCall" type="PortOfCall" collection="arraylist">
<sql name="poc_name" many-table="poc_ship" many-key="ship_name"/>
</field>
</class>
<class name="com.example.shipping.PortOfCall" identity="name">
<map-to table="port_of_call"/>
<field name="name" type="string">
<sql name="name" type="varchar"/>
</field>
<field name="ship" type="Ship" collection="arraylist">
<sql name="ship_name" many-table="poc_ship" many-key="poc_name"/>
</field>
</class>
Nothing need be changed in the ship table, and the port_of_call table is
simple enough:
CREATE TABLE port_of_call (
name VARCHAR(80)
);
I now have to add a third join table, poc_ship, that links many
port_of_call rows to many ship rows through their respective foreign
keys (as described in the mapping file):
CREATE TABLE poc_ship (
poc_name VARCHAR(80), -- port_of_call fkey
ship_name VARCHAR(80) -- ship fkey
);
It's now possible to construct a test case, part of which is shown below:
...
ship[0].setName("Seven Seas");
ship[1].setName("Sea Six");
_shippingLine.addShip( ship[0] );
_shippingLine.addShip( ship[1] );
PortOfCall pocs[] = new PortOfCall[]
{
new PortOfCall("Pago Pago"),
new PortOfCall("Walla Walla")
};
ship[0].addPortOfCall( _poc[0] );
ship[0].addPortOfCall( _poc[1] );
ship[1].addPortOfCall( _poc[0] );
...
PersistenceFactory.getInstance().store(pocs[0], true);
// true means create, not update
PersistenceFactory.getInstance().store(pocs[1], true);
//stores all Ships instances as well
PersistenceFactory.getInstance().store(_shippingLine, true);
Now, if all went according to plan, I should see entries in the poc_ship table when I execute a SELECT * from poc_ship; SQL statement from the DBMS's command line:
poc_name | ship_name
----------------------------
Pago Pago | Sea Six
Walla Walla | Sea Six
Pago Pago | Seven Seas
If I see these entries, then I know I've successfully recorded a many-to-many join in the database using Castor JDO.
In order to be able to access the database, we need to establish a connection to it. Castor maintains connection information in a separate file from the mapping file. This file is typically named database.xml.
The database.xml file contains information about the database being used and its location. It also indicates the JDBC driver to be used (JDBC stands for Java Database Connectivity, an vendor-neutral API for communicating with relational database management systems).
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE databases PUBLIC "-//EXOLAB/Castor JDO Configuration DTD
Version 1.0//EN" "http://castor.exolab.org/jdo-conf.dtd">
<database name="shippingDB" engine="postgresql">
<driver class-name="org.postgresql.Driver" url="jdbc:postgresql:shippingDB">
<param name="user" value="Administrator"/>
<param name="password" value=""/>
</driver>
<mapping href="d:/shipping/mapping.xml" />
</database>
The database to connect to is indicated by the value of the name attribute
in the database element. The engine attribute of that element tells Castor
that it's a Postgres database (formerly known as PostgreSQL, hence the name
of the Castor database engine).
The driver element has a class-name attribute that indicates the class in
the Java classpath that functions as the JDBC driver. The url attribute is
the connection string that the JDBC driver uses to make a connection to the
Postgres database. A JDBC driver is often supplied by the database vendor or
distributor. For Postgres, an up-to-date JDBC driver can be found at http://jdbc.postgresql.org.
|
Related Reading Database Programming with JDBC & Java |
The param elements have user and password
information that the driver needs to log in to the database. Other JDBC
drivers may have more (or possibly fewer) parameters, so additional param
elements can be provided as children of the driver element.
Finally, the location of the JDO mapping file is given in href format.
Once the JDBC connection is established, Castor reads in the mapping.xml file
and is ready to start storing and retrieving data.
|
The storing of class instances in the relational database is straightforward, thanks to the JDO mapping file and Castor's support of Object Query Language (OQL). OQL is a query language developed by the Object Database Management Group (ODMG), a consortium of object database vendors and users. OQL is a lot like SQL with objects thrown in.
Unlike SQL, where a myriad of statements must be used to store even a simple Java object (and requiring an inate understanding by the programmer as to how each object is mapped to the various database tables), OQL can store entire hierarchies of objects in a single statement.
Castor relies primarily on connection information in the database.xml file. To make the connection, first a JDO instance is created using a default constructor. The JDO is then configured with the database name to open and provide the location of the database.xml file. A database.xml file may contain connection information for more than one database, hence the need to specify a database name.
Optionally, database log information can be sent to an output stream of your choosing. The log information is useful for debugging, since it records all SQL commands and return statuses that Castor sends to and receives from the database during transaction commits.
package com.example.shipping;
import org.exolab.castor.jdo.*;
import java.io.PrintWriter;
import java.io.ByteArrayOutputStream;
class PersistenceFactory
{
private static String _databaseName = "shippingDB";
private static String _databaseConfiguration = "d:/shipping/database.xml";
public PersistenceFactory() throws PersistenceException
{
super();
_jdoLogWriter = new PrintWriter(new ByteArrayOutputStream(400));
open();
}
private void open() throws PersistenceException
{
JDO jdo; // Define the JDO object
jdo = new JDO();
jdo.setLogWriter(_jdoLogWriter);
jdo.setDatabaseName(_databaseName);
jdo.setConfiguration(_databaseConfiguration); // Obtain a new database
_db = jdo.getDatabase();
}
// other persistence methods...
}
The core of the connection procedure is highlighted above. Once the
connection is made, all subsequent database transaction go through Castor's
Database class instance.
As mentioned earlier, Castor uses OQL to store and retrieve data from the database. OQL is an object-oriented extension of SQL, so it is possible to use SQL commands, although for object manipulation, it's not the preferred choice.
Before doing anything with the database, you must first start a
transaction using the begin() method of org.exolab.castor.jdo.Database.
Changes are not made to the database until the transaction is committed using
the commit() method. Both methods' calls are highlighted below:
public void store(Object obj, boolean create)
throws PersistStoreException
{
try {
_db.begin();
if (create)
{
_db.create(obj);
}
else
{
_db.update(obj);
}
_db.commit();
}
catch (ClassNotPersistenceCapableException ex)
{ throw new PersistStoreException(ex.toString()); }
catch (DuplicateIdentityException ex)
{ throw new PersistStoreException(ex.toString()); }
catch (TransactionNotInProgressException ex)
{ throw new PersistStoreException(ex.toString()); }
catch (TransactionAbortedException ex)
{ throw new PersistStoreException(ex.toString()); }
catch (PersistenceException ex)
{ throw new PersistStoreException(ex.toString()); }
}
Once a transaction is started, you may save a new object in the database
or, if changes were made to an object previously fetched from the database
(as it could have been from a prior transaction), you may "update" it. If the
Database create() method is called, then an object with the same identifier
must not already exist in the database, or an exception will occur. Updates,
on the other hand, require that the object be in the database, inasmuch as it
has the same identifier.
Dependent objects are not created or updated directly, as you may recall. What counts is whether the master object exists previously in the database or not.
In the code above, I've elected to collapse all of the possible exceptions
that Castor can throw during the transaction into one exception that I've
defined. I'm assuming that any exception that gets thrown here is essentially
fatal, but another (more clever?) person might be able to program a recovery
under certain circumstances. Also, while having
the transaction begin/commit calls inside of this one method simplifies the
storage and update of single objects, you would normally want to commit
multiple objects at one time in a single transaction for better efficiency.
In that case, you'd begin your transaction before the first object is stored
and commit after the last object is stored.
ShippingLine ObjectsOQL queries for fetching object data look very much like SQL queries, except that you don't get a row cursor returned (where each row then has to be manhandled into a class instance); instead, you receive a cursor to a list of fully-formed objects. Very nice.
public Object fetch(String objName, String id)
throws PersistFetchException
{
Object object = null;
try {
_db.begin();
Query oql;
QueryResults results;
oql = _db.getOQLQuery(
"SELECT o" +
" FROM com.example.shipping." + objName + " o" +
" WHERE o.id=\"" + id + "\"" );
results = oql.execute();
while (results.hasMore() )
{
object = results.next();
}
_db.commit();
}
catch ( TransactionAbortedException ex )
{ throw new PersistFetchException( ex.toString() ); }
catch ( TransactionNotInProgressException ex )
{ throw new PersistFetchException( ex.toString() ); }
catch ( PersistenceException ex )
{ throw new PersistFetchException( ex.toString() ); }
return object;
}
To fetch an object from the database, we first construct an OQL query
using the getOQLQuery() method of org.exolab.castor.oql.Database. The query
in this case might be:
SELECT o
FROM com.example.shipping.Ship o
WHERE o.id="Waterdown"
Once the query object is constructed, it can be executed. The results are
returned in a QueryResults object. The QueryResults object
functions as an array of Object instances. I'm assuming in this code that I have only one
result (bad programmer am I). However, when I originally wrote this example,
I called the undocumented results.size() method and checked for a value equal
to 1. I was unpleasantly surprised to find that size() always returned 0! In
this case, it's a safe assumption that I'm only going to get one object per
identifier, but it's not very defensive coding.
The returned object(s) can be cast to the expected class type by the caller.
Castor JDO provides a lot of capability for simplifying the storage of Java data objects in relational tables. The use of a mapping file to direct the Castor JDO engine on how to store object member data to table rows and columns really reduces the amount of grunge code involved in query/update procedures. Mapping files maintain information about object dependencies and primary/foreign keys, also; the result is that you don't have to futz with such dependencies in either the application code or database schema.
There are many additional features of Castor JDO that I was not able to cover in this article. A lot of these are not yet well-documented; however, there's usually some example code that you can download from the Castor Web site, either as a source repository snapshot (taken daily and for each release build) or by using WebCVS.
As of this writing, Castor sits at pre-release version 0.9.3.21. A call for contributors recently went out on the the Castor mailing list, so if storing objects in databases is your kind of gig, you might check out the possibilities available in getting Castor 1.0 out the door. Hey, it just might save you a lot of work!
Jeff Lowery is a JDO expert and advocate and is experienced at using Exolab's Castor, Jakarta Ant, and more.
Return to ONJava.com.
Copyright © 2009 O'Reilly Media, Inc.