ONJava.com    
 Published on ONJava.com (http://www.onjava.com/)


Don't Let Hibernate Steal Your Identity

by James Brundege
09/13/2006

Enterprise Java applications often move data back and forth between Java objects and relational databases. There are several ways to do this, ranging from manually coded SQL to sophisticated object-relational mapping (ORM) solutions such as Hibernate. Regardless of what technique you use, once you start persisting Java objects to a database, object identity becomes a complex and difficult-to-manage topic. The possibility arises that you will instantiate two different objects that represent the same row in the database. To handle this, you must properly implement the equals() and hashCode() methods on your persistent objects, but a proper implementation of these methods may be trickier than it first appears. To make matters worse, the conventional wisdom (as espoused in the official Hibernate documentation) may not lead you to the most practical solution for new projects.

The problem stems from differences between object identity in the virtual machine (VM) and object identity in the database. In the VM you do not get an ID for an object; you simply hold direct references to the object. Behind the scenes, the VM identifies each object by an internal handle or address (typically four or eight bytes, depending on the VM), which is what a reference to an object really is. The problems start when you persist an object in a database. Say you create a Person object and save it to a database (person1). Somewhere else in your code you read in the Person data and instantiate a new Person object (person2). You now have two objects in memory that are mapped to the same row in the database. An object reference can only point to one or the other, but we need a way to show that these are really the same entity. This is where object identity comes in.

In Java, this kind of object identity (as opposed to reference identity, which is tested with ==) is defined by the equals() method (and the related hashCode() method) that is present on every object. The equals() method should determine whether two objects represent the same entity, regardless of whether they are the same instance. The hashCode() method is related because objects that are equal must return the same hashCode. By default, equals() compares object references: an object is equal to itself and not equal to any other instance. For persistent objects, it is important to override these methods so that objects representing the same row in the database are always considered equal. This is particularly important for Java Collections (Sets, Maps, and Lists) to work correctly.

To illustrate the different ways to implement equals() and hashCode(), let's consider a simple Person object that we want to persist to the database.

public class Person {
    private Long id;
    private Integer version;

    public Long getId() {
        return id;
    }
    public void setId(Long id) {
        this.id = id;
    }
    public Integer getVersion() {
        return version;
    }
    public void setVersion(Integer version) {
        this.version = version;
    }

    // person-specific properties and behavior

}

In this example, we've followed best practices by having both an id field and a version field. The id holds the value used as the primary key in the database, and the version starts at 0 and is incremented each time the object is updated (this helps us avoid concurrent update problems). For clarity, let's also look at the Hibernate mapping file that allows Hibernate to persist this object to a database:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping SYSTEM
    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

<hibernate-mapping package="my.package">

  <class name="Person" table="PERSON">

    <id name="id" column="ID"
        unsaved-value="null">
      <generator class="sequence">
        <param name="sequence">PERSON_SEQ</param>
      </generator>
    </id>

    <version name="version" column="VERSION" />

    <!-- Map Person-specific properties here. -->

  </class>

</hibernate-mapping>

The Hibernate mapping file indicates that the id field on Person is the database ID (i.e., it is the primary key in the PERSON table). Within the id tag is an attribute, unsaved-value="null", that tells Hibernate to use the id field to determine whether a Person object has been previously saved or not. ORM frameworks must make this distinction to know whether they should save the object with a SQL INSERT or UPDATE statement. In this case, Hibernate assumes that the id field starts out null on new objects and is assigned when they are first saved. There is also a generator tag that tells Hibernate where to get an id to assign to the object the first time it is saved. In this case, Hibernate is using a database sequence as a source of unique IDs. Finally, the version tag tells Hibernate to use the Person object's version field for concurrency control. Hibernate will enforce an optimistic locking scheme, whereby it checks the object's version number against the database's version number before saving changes to the object.
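The optimistic locking check can be pictured with a small in-memory sketch. This is not Hibernate's code: the VersionedTable class and its methods are hypothetical, and they only mimic the effect of an UPDATE statement that includes the expected version in its WHERE clause. (Hibernate itself reports a failed check by throwing a StaleObjectStateException.)

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the ID and VERSION columns of the PERSON table.
class VersionedTable {
    private final Map<Long, Integer> versions = new HashMap<Long, Integer>();

    // A new row starts at version 0.
    int insert(Long id) {
        versions.put(id, 0);
        return 0;
    }

    // Mimics: UPDATE PERSON SET VERSION = VERSION + 1 WHERE ID = ? AND VERSION = ?
    // Returns the new version, or throws if another writer changed the row first.
    int update(Long id, int expectedVersion) {
        Integer current = versions.get(id);
        if (current == null || current.intValue() != expectedVersion) {
            throw new IllegalStateException("stale update for id " + id);
        }
        int next = expectedVersion + 1;
        versions.put(id, next);
        return next;
    }
}
```

If two sessions both read a row at version 0, the first update moves it to version 1, so the second update, still expecting 0, is rejected instead of silently overwriting the first change.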

What's missing from our Person object is an implementation of equals() and hashCode(). Since this is a persistent object, we don't want to rely on the default implementation, which can't distinguish between two different instances that represent the same row in the database. A simple and obvious way to implement these methods is to use the id field for the equals() comparison and to generate the hashCode().

public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || !(o instanceof Person))
        return false;

    Person other = (Person)o;

    if (id == other.getId()) return true;
    if (id == null) return false;

    // equivalence by id
    return id.equals(other.getId());
}

public int hashCode() {
    if (id != null) {
        return id.hashCode();
    } else {
        return super.hashCode();
    }
}

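Unfortunately, there is a problem with this implementation. When we first create a Person object the id is null, which means any two Person objects are considered equal if they haven't been saved yet. At the same time, hashCode() falls back to the default, instance-specific value for unsaved objects, so two "equal" objects usually return different hash codes, which breaks the equals()/hashCode() contract. If we were to create a Person and put it in a Set, then create a completely different Person and try to add it to the same Set, the Set may treat the second Person as a duplicate, because as far as equals() is concerned all unsaved objects are the same. With a HashSet the outcome also depends on whether the two default hash codes happen to collide, so the behavior is not even predictable.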
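To make the contract violation concrete, here is a small sketch that copies the first implementation above into a stripped-down class. The class name NullIdPerson is hypothetical and used only for this example.

```java
// Copy of the first id-based equals()/hashCode(), trimmed to the id field.
class NullIdPerson {
    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NullIdPerson)) return false;
        NullIdPerson other = (NullIdPerson) o;
        if (id == other.getId()) return true; // reference comparison: true when both ids are null
        if (id == null) return false;
        return id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        // While id is null this is the default identity hash code.
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

class ContractCheck {
    public static void main(String[] args) {
        NullIdPerson a = new NullIdPerson();
        NullIdPerson b = new NullIdPerson();
        System.out.println("a.equals(b): " + a.equals(b)); // true: both ids are null
        // Equal objects must share a hash code; identity hash codes almost never match.
        System.out.println("same hashCode: " + (a.hashCode() == b.hashCode()));
    }
}
```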

You may be tempted to implement an equals() method that uses the id only if the id is set. After all, if two objects haven't been saved yet, we can assume they're different objects since they will be assigned different primary keys when they're saved to the database.

public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || !(o instanceof Person))
        return false;

    Person other = (Person)o;

    // unsaved objects are never equal
    if (id == null || other.getId() == null)
        return false;

    return id.equals(other.getId());
}

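There is a hidden problem here. Hash-based collections such as HashSet and HashMap require that the results of equals() and hashCode() do not change while an object is stored in them, which in practice means basing these methods on fields that stay fixed for the lifetime of the Collection. In other words, you can't change the value of equals() or hashCode() while the object is in a Collection, yet here (assuming the same hashCode() as before) both change the moment the id is assigned. For example, this program: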

Person p = new Person();
Set<Person> set = new HashSet<Person>();
set.add(p);
System.out.println(set.contains(p));
p.setId(Long.valueOf(5L));
System.out.println(set.contains(p));

Prints true, then false.

The second call to set.contains(p) returns false because the Set can no longer find p. The Set has literally lost our object! That's because we changed the value for hashCode() while the object was inside the set.
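The following self-contained sketch reproduces the lost object. MutableIdPerson is a hypothetical trimmed copy of the "unsaved objects are never equal" variant, and calling setId() stands in for Hibernate assigning the id on save. The object is still physically in the set; contains() just looks for it under the wrong hash code.

```java
import java.util.HashSet;
import java.util.Set;

// Trimmed copy of the "unsaved objects are never equal" variant.
class MutableIdPerson {
    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MutableIdPerson)) return false;
        MutableIdPerson other = (MutableIdPerson) o;
        if (id == null || other.getId() == null) return false; // unsaved objects are never equal
        return id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

class LostInSet {
    public static void main(String[] args) {
        MutableIdPerson p = new MutableIdPerson();
        Set<MutableIdPerson> set = new HashSet<MutableIdPerson>();
        set.add(p);
        System.out.println(set.contains(p)); // true
        p.setId(5L);                          // simulates the id being assigned on save
        System.out.println(set.contains(p)); // false: p is filed under its old hash code
        System.out.println(set.size());       // 1: the entry is still there, just unreachable
    }
}
```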

This is a problem when you want to create domain objects that hold other domain objects in Sets, Maps, or Lists. To do so, you must provide an implementation of equals() and hashCode() for each of your objects that is valid both before and after saving the objects and does not change while the objects are in memory. The Hibernate Reference Documentation (v. 3) provides this suggestion:

"Never use the database identifier to implement equality; use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient object is made persistent. If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set. Attributes for business keys don't have to be as stable as database primary keys, you only have to guarantee stability as long as the objects are in the same Set." (Hibernate Reference Documentation v. 3.1.1).
"We recommend implementing equals() and hashCode() using Business key equality. Business key equality means that the equals() method compares only the properties that form the business key, a key that would identify our instance in the real world (a natural candidate key)" (Hibernate Reference Documentation v. 3.1.1).

In other words, use a natural key for equals() and hashCode(), and use a Hibernate-generated surrogate key for the object's id. This works as long as you have a relatively immutable natural key for each of your objects. However, you may not have such a key for every object type, and you may be tempted to use fields that don't change often, but could change. This is consistent with the idea that business keys don't have to be as stable as database primary keys. It is "good enough" if they don't change for the lifespan of the collection the object is in. This is a dangerous proposition, as it means your application probably won't break, but it could break if someone updates the right field under the right circumstances. There really should be a better solution, and there is.
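Here is a minimal sketch of business-key equality and of the risk just described. It assumes a hypothetical email property as the natural key (Person in this article has no such field); the surrogate id is left out because it plays no part in equals().

```java
import java.util.HashSet;
import java.util.Set;

// Business-key equality using a hypothetical "email" property.
class EmailKeyedPerson {
    private String email; // business key: "usually immutable", but it can change

    EmailKeyedPerson(String email) { this.email = email; }

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmailKeyedPerson)) return false;
        EmailKeyedPerson other = (EmailKeyedPerson) o;
        return email != null && email.equals(other.getEmail());
    }

    @Override
    public int hashCode() {
        return (email != null) ? email.hashCode() : 0;
    }
}

class BusinessKeyDemo {
    public static void main(String[] args) {
        EmailKeyedPerson p = new EmailKeyedPerson("ann@example.com");
        Set<EmailKeyedPerson> people = new HashSet<EmailKeyedPerson>();
        people.add(p);
        // Works before and after saving, and matches a separately loaded copy:
        System.out.println(people.contains(new EmailKeyedPerson("ann@example.com"))); // true
        p.setEmail("ann@example.org");        // the "rarely changing" field changes
        System.out.println(people.contains(p)); // false: lost, just like the id-based case
    }
}
```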

Don't let Hibernate manage your ids.

All of the problems discussed so far derive from trying to create and maintain separate definitions of identity for objects and database rows. These problems all go away if we unify all forms of identity. That is, instead of having a database-centric ID, or an object-centric ID, we should create one universal entity-specific ID that represents the data entity and is created when the data is first entered. This universal ID can identify one unique data entity regardless of whether it is stored in a database, as an object in memory, or in any other format or medium. By using entity IDs that are assigned when the data entity is first created, we can safely return to our original definition of equals() and hashCode() that simply uses the id:

public class Person {
    // assign an id as soon as possible
    private String id = IdGenerator.createId();
    private Integer version;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }

    public Integer getVersion() {
        return version;
    }
    public void setVersion(Integer version) {
        this.version = version;
    }

    // Person-specific fields and behavior here

    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || !(o instanceof Person))
            return false;

        Person other = (Person)o;

        if (id == null) return false;
        return id.equals(other.getId());
    }

    public int hashCode() {
        if (id != null) {
            return id.hashCode();
        } else {
            return super.hashCode();
        }
    }
}

This example uses the object id as the definition of equals() and to derive hashCode(). This is much simpler. However, to make this work we need two things. First, we need a way to ensure every object has an id even before it is saved. This example assigns the id in the field initializer, so every Person receives an id the moment it is constructed. Second, we need a way to determine if this is a newly created object, or a previously saved object. In our original example, Hibernate checked whether the id field was null to determine if the object was new. Obviously this won't work anymore since our object id is never null. We can easily solve this by configuring Hibernate to check whether the version field, rather than the id field, is null. The version field is a much more appropriate indicator of whether your object has been previously saved.
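A quick way to check the claim is to put two new objects in a HashSet and then "save" one of them. In this sketch, AssignedIdPerson is a hypothetical trimmed copy of the improved Person, IdGenerator is assumed to be backed by java.util.UUID, and setVersion(0) is only a rough stand-in for what the first save does.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Assumed IdGenerator, backed by java.util.UUID.
class IdGenerator {
    static String createId() {
        return UUID.randomUUID().toString();
    }
}

// Trimmed copy of the improved Person: the id exists from construction onward.
class AssignedIdPerson {
    private String id = IdGenerator.createId();
    private Integer version; // null means "not saved yet"

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Integer getVersion() { return version; }
    public void setVersion(Integer version) { this.version = version; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AssignedIdPerson)) return false;
        AssignedIdPerson other = (AssignedIdPerson) o;
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

class AssignedIdDemo {
    public static void main(String[] args) {
        AssignedIdPerson a = new AssignedIdPerson();
        AssignedIdPerson b = new AssignedIdPerson();
        Set<AssignedIdPerson> set = new HashSet<AssignedIdPerson>();
        set.add(a);
        set.add(b);        // two unsaved objects, both kept
        a.setVersion(0);   // "saving" does not touch the id, so the hash code is stable
        System.out.println(set.size() + " " + set.contains(a)); // prints "2 true"
    }
}
```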

Here is the Hibernate mapping document for our improved Person class.

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping SYSTEM
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

<hibernate-mapping package="my.package">

  <class name="Person" table="PERSON">

    <id name="id" column="ID">
      <generator class="assigned" />
    </id>

    <version name="version" column="VERSION"
        unsaved-value="null" />

    <!-- Map Person-specific properties here. -->

  </class>

</hibernate-mapping>

Note that the generator tag under the id has the attribute class="assigned". This tells Hibernate that we're assigning the id in our code, rather than letting it assign the id from the database. Hibernate will simply expect the id to be there even for new, unsaved objects. We've also added a new attribute to the version tag: unsaved-value="null". This tells Hibernate to look for a null version (rather than a null id) as an indicator that the object is new. We could just as easily tell Hibernate to look for a negative value as an unsaved indicator, which is useful if you prefer to use an int for your version field instead of an Integer.
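As a sketch of the int variant, assuming the version element is mapped with unsaved-value="negative" (Hibernate 3 accepts "null", "negative", and "undefined" for version properties), the class could start its version at -1 so that a new object is recognizable as unsaved. IntVersionPerson is a hypothetical name for this example.

```java
import java.util.UUID;

// Assumed mapping for this variant:
//   <version name="version" column="VERSION" unsaved-value="negative" />
class IntVersionPerson {
    private String id = UUID.randomUUID().toString(); // assigned at construction
    private int version = -1;                         // negative = never saved

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public int getVersion() { return version; }
    public void setVersion(int version) { this.version = version; }
}
```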

We've gained several benefits by moving to a pure object id. Our implementation of equals() and hashCode() is much simpler and easier to read. These methods are no longer error-prone and are guaranteed to work with Collections both before and after the objects are saved. Hibernate is also a bit faster, since it no longer needs to read a sequence value from the database before saving a new object. Furthermore, the new definition of equals() and hashCode() is universal for all objects that contain an object id. That means we can move those methods to an abstract parent class. We no longer need to re-implement equals() and hashCode() for every domain object, and we no longer need to think through which combination of fields is both unique and immutable for each class. Instead, we simply extend the abstract parent class. Of course, we don't want to force our domain objects to extend from a parent class, so we'll also define an interface to keep things flexible.

public interface PersistentObject {
    public String getId();
    public void setId(String id);

    public Integer getVersion();
    public void setVersion(Integer version);
}

public abstract class AbstractPersistentObject
        implements PersistentObject {

    private String id = IdGenerator.createId();
    private Integer version;

    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }

    public Integer getVersion() {
        return version;
    }
    public void setVersion(Integer version) {
        this.version = version;
    }

    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null ||
            !(o instanceof PersistentObject)) {

            return false;
        }

        PersistentObject other
            = (PersistentObject)o;

        // if the id is missing, return false
        if (id == null) return false;

        // equivalence by id
        return id.equals(other.getId());
    }

    public int hashCode() {
        if (id != null) {
            return id.hashCode();
        } else {
            return super.hashCode();
        }
    }

    public String toString() {
        return this.getClass().getName()
            + "[id=" + id + "]";
    }
}

We now have a simple and effective way to create domain objects. They extend AbstractPersistentObject, which automatically gives them an id when they're first created and properly implements equals() and hashCode(). They also get a reasonable default implementation of toString() that they can optionally override. If this is a test object or an example object for a query-by-example, the id can be changed or set to null. Otherwise, it should not be altered. If for some reason we need to create a domain object that extends some other class, it can implement the PersistentObject interface rather than extend the abstract class.
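For the interface-only route, a class has to repeat the id handling and the id-based equals()/hashCode() itself. In this sketch LegacyEntity and Invoice are hypothetical names, and the id is generated inline with java.util.UUID rather than through IdGenerator to keep the example self-contained.

```java
import java.util.UUID;

// Same interface as in the article.
interface PersistentObject {
    String getId();
    void setId(String id);
    Integer getVersion();
    void setVersion(Integer version);
}

// Hypothetical superclass the domain object is forced to extend.
class LegacyEntity {
    protected String auditNote;
}

// Cannot extend AbstractPersistentObject, so it implements the interface directly.
class Invoice extends LegacyEntity implements PersistentObject {
    private String id = UUID.randomUUID().toString(); // assigned at construction
    private Integer version;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Integer getVersion() { return version; }
    public void setVersion(Integer version) { this.version = version; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersistentObject)) return false;
        // Go through the getter, not the field, of the other object.
        return id != null && id.equals(((PersistentObject) o).getId());
    }

    @Override
    public int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}
```

Comparing with instanceof and calling getId() on the other object, as the abstract class also does, tends to be safer than a getClass() check or direct field access when Hibernate hands you a lazy-loading proxy, which is a generated subclass whose own fields are not populated.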

Our Person class is now much simpler:

public class Person
    extends AbstractPersistentObject {

    // Person-specific fields and behavior here

}

The Hibernate mapping document has not changed since the last example. We don't bother to tell Hibernate about the abstract parent class, and instead just make sure that every PersistentObject mapping file includes an id (with an "assigned" generator) and a version tag with unsaved-value="null". Astute readers may have noticed that an id gets assigned every time a persistent object is instantiated, which means that even when Hibernate is reading an existing object from the database it will get a new id when Hibernate creates an in-memory instance of the saved object. That's OK; Hibernate will then call setId() on the object to replace the newly assigned id with the saved id. The extra id generation is not a problem as long as our id generation algorithm is cheap (i.e., it does not need to contact the database).
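The sketch below mimics, very roughly, the load sequence just described: instantiate through the no-arg constructor (which generates a throwaway UUID), then overwrite the id with the stored one through setId(). LoadSimulation is a hypothetical helper, not part of Hibernate, and the abstract class is trimmed (no PersistentObject interface, no toString()).

```java
import java.util.UUID;

// Assumed IdGenerator, backed by java.util.UUID (cheap: no database round trip).
class IdGenerator {
    static String createId() { return UUID.randomUUID().toString(); }
}

// Trimmed version of AbstractPersistentObject.
abstract class AbstractPersistentObject {
    private String id = IdGenerator.createId();
    private Integer version;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public Integer getVersion() { return version; }
    public void setVersion(Integer version) { this.version = version; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AbstractPersistentObject)) return false;
        return id != null && id.equals(((AbstractPersistentObject) o).getId());
    }

    @Override
    public int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}

class Person extends AbstractPersistentObject { }

// Hypothetical helper imitating how an ORM rebuilds an object from a stored row.
class LoadSimulation {
    static Person load(String storedId, Integer storedVersion) {
        Person p = new Person();      // a fresh UUID is generated here...
        p.setId(storedId);            // ...and immediately replaced by the saved id
        p.setVersion(storedVersion);
        return p;
    }
}
```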

So far so good, but we've left out one critical detail: how to implement IdGenerator.createId(). We can define some criteria for our ideal key-generation algorithm:

What we need is a universally unique identifier (UUID). UUIDs are 16-byte (128-bit) numbers that adhere to a standard format. The String version of a UUID looks like this:

2cdb8cee-9134-453f-9d7a-14c0ae8184c6

The characters are simple hexadecimal representations of the bytes, while the dashes delimit different portions of the number. This format is simple and easy to work with, but at 36 characters it is a bit long. The dashes are always located in the same place and can be removed to reduce the size to 32 characters. For a more compact representation you can create an object that holds the number as either a byte[16] array, or as two eight-byte longs. If you're using Java 1.5 or later, you can use the UUID class directly, though this is not the most compact in-memory format. For more information, see the Wikipedia UUID entry and the JavaDoc entry for the UUID class.
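With java.util.UUID, the representations mentioned above can be produced and converted back as follows. UuidForms is a hypothetical helper class; the methods it calls on UUID (toString, fromString, getMostSignificantBits, getLeastSignificantBits, and the two-long constructor) are part of the standard class.

```java
import java.util.UUID;

class UuidForms {
    // 36-character canonical form, e.g. 2cdb8cee-9134-453f-9d7a-14c0ae8184c6
    static String canonical(UUID u) {
        return u.toString();
    }

    // 32 characters: the dashes are always in the same places, so they can be dropped.
    static String compact(UUID u) {
        return u.toString().replace("-", "");
    }

    // Rebuild from the 32-character form by re-inserting the dashes (8-4-4-4-12).
    static UUID fromCompact(String hex) {
        return UUID.fromString(hex.substring(0, 8) + "-" + hex.substring(8, 12) + "-"
                + hex.substring(12, 16) + "-" + hex.substring(16, 20) + "-"
                + hex.substring(20));
    }

    // Two eight-byte longs hold all 128 bits.
    static long[] toLongs(UUID u) {
        return new long[] { u.getMostSignificantBits(), u.getLeastSignificantBits() };
    }

    static UUID fromLongs(long msb, long lsb) {
        return new UUID(msb, lsb);
    }
}
```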

There are several implementations of UUID-producing algorithms. Since the final UUID is in a standard format, it doesn't matter which one we use in our IdGenerator class. We can even change the implementation at any time or mix and match different implementations: the standard reserves version bits that keep UUIDs from different algorithms from overlapping, and each algorithm is designed to make collisions vanishingly unlikely. If you're using Java 1.5 or later, the most convenient implementation is the java.util.UUID class:

import java.util.UUID;

public class IdGenerator {
    public static String createId() {
        // random-based (version 4) UUID in its 36-character form
        UUID uuid = UUID.randomUUID();
        return uuid.toString();
    }
}

For those not using Java 1.5+, there are at least two external libraries that implement UUIDs and are compatible with earlier versions of Java: the Apache Commons ID project and the Java UUID Generator (JUG) project. Both are available under the Apache License (JUG is also available under the LGPL).

Here's an example implementation of IdGenerator using the JUG library:

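import org.safehaus.uuid.UUID;
import org.safehaus.uuid.UUIDGenerator;

public class IdGenerator {

    // JUG's generator is a singleton obtained via getInstance()
    public static final UUIDGenerator uuidGen
        = UUIDGenerator.getInstance();

    public static String createId() {
        // JUG's own UUID class, not java.util.UUID
        UUID uuid
            = uuidGen.generateRandomBasedUUID();
        return uuid.toString();
    }
}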

What about the UUID generator algorithm that is built into Hibernate? Is this a suitable way to get UUIDs for object identity? Not if you want object identity to be independent of object persistence. While Hibernate does have the option of generating UUIDs for you, we're back to the same old problem: your objects will not receive an ID at the time of creation, and will have to wait until they're saved.

The major drawback of using UUIDs as database primary keys is their size in the database (rather than in memory), where indexes and foreign keys compound the size increase. Here you have to make trade-offs. With a String representation, the database primary keys are 32 or 36 bytes (assuming a single-byte character encoding). The number can also be stored directly as 16 raw bytes, which halves the size of the 32-character form but is harder to understand when directly querying the database. Whether these approaches are feasible for your project depends on your requirements.
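If you choose the 16-byte form, converting between java.util.UUID and a byte array is straightforward with java.nio.ByteBuffer. UuidBytes is a hypothetical helper; the matching column type (for example BINARY(16) in MySQL or RAW(16) in Oracle) and how your Hibernate mapping binds it are left to your setup.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBytes {
    // 128 bits as 16 bytes, most significant half first (ByteBuffer is big-endian by default).
    static byte[] toBytes(UUID u) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(u.getMostSignificantBits());
        buf.putLong(u.getLeastSignificantBits());
        return buf.array();
    }

    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }
}
```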

If using UUIDs as primary keys is not acceptable for your database, you may want to consider using a database sequence, but always assign IDs at the time new objects are created rather than letting Hibernate manage your IDs. In this case, your business objects that create new domain objects can call a service that uses a data access object (DAO) to retrieve the id from a database sequence. If you can use a Long datatype for your object ids, a single database sequence (and service method) will suffice for all your domain objects.
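A minimal sketch of that arrangement follows. SequenceDao, InMemorySequenceDao, IdService, and Order are all hypothetical names; the in-memory DAO only stands in for a real one that would query a database sequence, and Order would be mapped with an "assigned" generator and a version with unsaved-value="null", just like Person.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical DAO over a database sequence; a real one might run something like
// "SELECT ID_SEQ.NEXTVAL FROM DUAL" (Oracle syntax).
interface SequenceDao {
    long nextValue();
}

// In-memory stand-in so the sketch runs without a database.
class InMemorySequenceDao implements SequenceDao {
    private final AtomicLong counter = new AtomicLong();
    public long nextValue() { return counter.incrementAndGet(); }
}

// Hypothetical service shared by all domain objects (one sequence for everything).
class IdService {
    private final SequenceDao dao;
    IdService(SequenceDao dao) { this.dao = dao; }
    Long nextId() { return Long.valueOf(dao.nextValue()); }
}

// Domain object whose id is assigned by business code when it is created.
class Order {
    private Long id;
    private Integer version; // null until saved

    protected Order() { }            // used by Hibernate when loading rows
    Order(Long id) { this.id = id; } // used by business code for new orders

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public Integer getVersion() { return version; }
    public void setVersion(Integer version) { this.version = version; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Order)) return false;
        return id != null && id.equals(((Order) o).getId());
    }

    @Override
    public int hashCode() {
        return (id != null) ? id.hashCode() : super.hashCode();
    }
}
```

Business code would then create objects with something like new Order(idService.nextId()), so every Order has its final id before it ever enters a collection.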

Conclusion

Object identity is deceptively hard to implement correctly when objects are persisted to a database. However, the problems stem entirely from allowing objects to exist without an id before they are saved. We can solve these problems by taking the responsibility of assigning object IDs away from object-relational mapping frameworks such as Hibernate. Instead, object IDs can be assigned as soon as the object is instantiated. This makes object identity simple and error-free, and reduces the amount of code needed in the domain model.


James Brundege currently works as an independent contractor and consultant through his company Synaptocode Software LLC.



Copyright © 2009 O'Reilly Media, Inc.