Your data model was near perfect when your application was first written. Since then, it has evolved. You've hacked, you've denormalized, and, as a result, you've spent countless hours in meetings ranting about the fixes you need to put in place.
Yet, you're ambivalent. Despite your cogent arguments, you're loath to put together the "change-all-your-data-all-at-once" plan. It's just too risky. There are countless applications that directly read from and write to your database--you can't change all of them at once! If only you could fix your data model one piece at a time, and one application at a time.
It's a typical scenario, really. Over time, IT organizations at small, medium, and large enterprises create disparate applications that access vital data stored in a centralized database. And slowly, moderately ill-designed data models start dragging down performance, scalability, and the overall efficiency of an organization.
In this article, we will show readers how to upgrade their faulty schemas and data models without affecting existing applications or processes. By using the latest technology from Hibernate (version 3.0 and up)--along with a combination of database views, stored procedures, and standard design patterns--application developers and data architects can repair a faulty data model, one piece at a time.
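Here's how we'll do it: define normalized views on top of the existing tables, back those views with stored procedures or INSTEAD OF triggers, map the views with Hibernate, and test the result thoroughly.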
Now let's dive into the details of each one of these steps. But first, let's present the example at hand.
Our example is an overly denormalized order system. Instead of dividing the orders into an ORDER table and an ORDER_ITEM table, the original data designer decided to put all order information into one table, CUST_ORDER. We'd love to split this table into two, but how?
Figure 1 shows the original design.

Figure 1. Our data model before the DMA solution
Ok, let's get fixin'!
We decided that we could split this table up fairly easily. It'd be great to achieve something like the design in Figure 2.

Figure 2. Our data model including the views that improve the overall design
By dividing the order data into two tables, we avoid data repetition and have a generally more sustainable data model. But how can we arrive at this model given our existing data structure?
To achieve our desired structure, we can define database views on top of our existing schema that use the current data in our overly denormalized table(s). Our views, however, will present this data in a normalized way. The ORDER_V view is really just a grouped and simplified version of the CUST_ORDER table (removing specific order item information and grouping by the order_id). Here's the definition:
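CREATE VIEW dma_example.order_v
AS select
    dma_example.cust_order.order_id AS order_id,
    -- order_cust is the same on every row of an order; max() keeps the
    -- query valid when MySQL's ONLY_FULL_GROUP_BY mode is enabled
    max(dma_example.cust_order.order_cust) AS order_cust,
    max(dma_example.cust_order.order_date) AS order_date
from dma_example.cust_order
group by dma_example.cust_order.order_id;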
The ORDER_ITEM_V view captures only the order item details, ignoring the customer ID and the date (information that can be obtained from the ORDER_V view). Here's ORDER_ITEM_V's definition:
CREATE VIEW dma_example.order_item_v
AS select
    dma_example.cust_order.order_id AS oi_order,
    dma_example.cust_order.order_item AS oi_item,
    dma_example.cust_order.order_item_price AS oi_price,
    dma_example.cust_order.order_item_qty AS oi_qty
from dma_example.cust_order
where (dma_example.cust_order.order_item is not null);
So what we've basically done is split one table into two.
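To make the split concrete, here is a small in-memory sketch in Java of what the two views compute. It is an illustration only, not part of the DMA solution: the class ViewSketch and its CustOrderRow type are hypothetical names, and dates are simplified to epoch milliseconds. In the real system this work is done by the database views themselves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration only; the real work is done by the SQL views. */
class ViewSketch {

    /** One row of the denormalized CUST_ORDER table (hypothetical type). */
    static final class CustOrderRow {
        final int orderId;
        final int orderCust;
        final long orderDate;     // epoch millis, to keep the sketch simple
        final Integer orderItem;  // null when the row carries no item
        final double itemPrice;
        final int itemQty;

        CustOrderRow(int orderId, int orderCust, long orderDate,
                     Integer orderItem, double itemPrice, int itemQty) {
            this.orderId = orderId;
            this.orderCust = orderCust;
            this.orderDate = orderDate;
            this.orderItem = orderItem;
            this.itemPrice = itemPrice;
            this.itemQty = itemQty;
        }
    }

    /** Mirrors ORDER_V: one entry per order_id, keeping max(order_date). */
    static Map<Integer, Long> orderView(List<CustOrderRow> rows) {
        Map<Integer, Long> latestDateByOrder = new LinkedHashMap<>();
        for (CustOrderRow row : rows) {
            latestDateByOrder.merge(row.orderId, row.orderDate, Long::max);
        }
        return latestDateByOrder;
    }

    /** Mirrors ORDER_ITEM_V: only rows whose order_item is not null. */
    static List<CustOrderRow> orderItemView(List<CustOrderRow> rows) {
        List<CustOrderRow> items = new ArrayList<>();
        for (CustOrderRow row : rows) {
            if (row.orderItem != null) {
                items.add(row);
            }
        }
        return items;
    }
}
```

Given three CUST_ORDER rows, two items for order 1 and an item-less row for order 2, orderView yields two entries and orderItemView yields two items.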
We now want to be able to treat our new views as if they were tables--inserting, updating, and deleting to our hearts' content without worrying about what is going on behind the scenes. Although some views may be directly updatable without any further intervention on the part of the database designer, our views are a little more complex, and we want to control exactly how each change affects the underlying CUST_ORDER table. The best way to do this is to define code in the database that runs every time we attempt a create, update, or delete (CUD) operation against our views.
In most databases (MS SQL Server, Sybase, Oracle, DB2), we can define INSTEAD OF triggers (PostgreSQL uses "rules" that behave similarly) that will be responsible for inserting, updating, and deleting records in the underlying table on which the view is defined. MySQL, however, does not currently support INSTEAD OF triggers. In their place, we can create stored procedures and, through careful configuration of the Hibernate mapping files, call these stored procedures every time a CUD operation is triggered in our code (and persisted by Hibernate). Be it stored procedures or INSTEAD OF triggers, the code is very similar.
Since our example uses MySQL, we will demonstrate our solution using stored procedures.
Our stored procedures for inserting, updating, and deleting into our denormalized table must take into account all aspects of the denormalization: repetitive rows, additional fields, superfluous values, etc. When we use these stored procedures, the data model we created with the definition of nice, normalized views is turned back into the flawed, denormalized structure. Why? Because the rest of our applications are expecting the data to be presented in this way. In addition, our view definitions rely on the data to exist in the current structure.
So what does one of these procedures look like? Here's an example that adds an item to an order:
create procedure insert_order_item
    (in itemprice FLOAT, in itemqty INT, in orderid INT, in itemid INT)
LANGUAGE SQL
BEGIN
    DECLARE p_cust_id INT;
    DECLARE p_itemprice FLOAT;
    -- use the product's current price when the caller supplies none
    if itemprice is null then
        select prod_price into p_itemprice from product where prod_id=itemid;
    else
        set p_itemprice = itemprice;
    end if;
    -- get the customer id from an existing row of the same order
    select order_cust into p_cust_id
    from cust_order where order_id=orderid limit 1;
    -- write the item back in the denormalized, one-row-per-item form
    insert into cust_order
        (order_id, order_cust, order_date,
         order_item, order_item_price, order_item_qty)
    values
        (orderid, p_cust_id, now(), itemid, p_itemprice, itemqty);
END
Notice that whatever data is missing from the ORDER_ITEM_V view has to be looked up and inserted into the underlying CUST_ORDER table. If the insert into CUST_ORDER succeeds, the procedure reports one affected row. This matters because Hibernate expects either 1 or 0 as the result of these stored procedures, since it treats each call as an operation on a single row of a table (even though our "tables" are really views).
To ensure that this happens, we might have to throw little tricks into our stored procedures. For instance, the stored procedure to update an order may affect several rows in the CUST_ORDER table (one row for every order item). If we were to simply update all the rows with the given order ID, the rows-affected value returned would be greater than 1. Since this would present a problem for Hibernate, we follow the update to CUST_ORDER with an update to a small helper table, help_table. Because that final update touches only one row, the stored procedure reports 1 as the number of affected rows. Here is what the stored procedure looks like:
create procedure update_order
    (in ordercust INT, in orderdate DATETIME, in orderid INT)
LANGUAGE SQL
BEGIN
    update cust_order set order_cust=ordercust,
        order_date=orderdate
    where order_id=orderid;
    if row_count() > 0 then
        update help_table set i=i+1;
    end if;
END
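The effect of the helper-table trick can be modeled in a few lines of Java. This is only a sketch of the counting logic, under two assumptions that the procedure above relies on: the count a caller sees for a stored procedure call is that of the last statement the procedure executed, and help_table holds exactly one row. The class and method names are hypothetical.

```java
/** Models the row count reported by update_order; not database code. */
class RowCountSketch {

    /**
     * @param matchingRows number of CUST_ORDER rows with the given order id
     * @return the affected-row count the caller would see
     */
    static int updateOrder(int matchingRows) {
        // update cust_order ... where order_id = ?  (one row per order item)
        int lastStatementCount = matchingRows;

        if (lastStatementCount > 0) {
            // update help_table set i = i + 1  (assumed to touch a single row)
            lastStatementCount = 1;
        }
        return lastStatementCount;
    }
}
```

An order with three items therefore reports 1 rather than 3, and an unknown order ID still reports 0, which is the pair of outcomes Hibernate is prepared for.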
Creating the POJOs and Hibernate mappings for your new, view-based data model is fairly straightforward. There are, however, a couple of gotchas to keep in mind here.
Although database views do not have foreign and primary keys, you should still map these in your solution's mapping file. This allows other developers to treat this new data model as if it were a true physical model. Furthermore, mapping these elements will ensure an almost seamless transition when you move on to a final solution based on real tables.
When using stored procedures (you do not need to do this if your solution uses INSTEAD OF triggers), you must override the insert, update, and delete calls with calls to your stored procedures. This is done by adding <sql-insert>, <sql-update>, and <sql-delete> elements to the mapping. These elements tell Hibernate to call the given procedures instead of issuing its own INSERT, UPDATE, and DELETE statements against the view. Here is the ORDER_V mapping:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
    <class name="org.onjava.shared.vo.Order" table="order_v" catalog="dma_example">
        <id name="orderId" type="int" column="order_id" />
        <property name="orderCust" type="int" column="order_cust" />
        <property name="orderDate" type="timestamp" column="order_date" length="19"/>
        <set name="items" inverse="true" cascade="all-delete-orphan" lazy="true">
            <key column="oi_order"/>
            <one-to-many class="org.onjava.shared.vo.OrderItem" />
        </set>
        <sql-insert callable="true">{call insert_order(?, ?, ?)}</sql-insert>
        <sql-update callable="true">{call update_order(?, ?, ?)}</sql-update>
        <sql-delete callable="true">{call delete_order(?)}</sql-delete>
    </class>
</hibernate-mapping>
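The parameter order is important here: Hibernate binds the values positionally, so each stored procedure must declare its parameters in the order Hibernate supplies them. Refer to the custom SQL reference of the Hibernate manual to determine that order. As the manual notes, enabling debug logging for org.hibernate.persister.entity makes Hibernate print the static SQL it generates, which shows the expected sequence (temporarily remove the custom SQL elements from the mapping to see it).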
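To see why the order matters, it can help to look at the equivalent call written by hand in plain JDBC. Hibernate performs this binding for you; the sketch below only illustrates the positional contract for update_order, whose signature is (ordercust, orderdate, orderid). The class name UpdateOrderCall is hypothetical, and the connection is assumed to point at the dma_example database.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Hand-written equivalent of the <sql-update> call, for illustration. */
class UpdateOrderCall {

    /** Calls update_order and returns the affected-row count it reports. */
    static int updateOrder(Connection conn, int orderCust,
                           Timestamp orderDate, int orderId) throws SQLException {
        try (CallableStatement call = conn.prepareCall("{call update_order(?, ?, ?)}")) {
            // Positions must match the procedure's declared parameters:
            call.setInt(1, orderCust);        // in ordercust INT
            call.setTimestamp(2, orderDate);  // in orderdate DATETIME
            call.setInt(3, orderId);          // in orderid INT (the identifier comes last)
            return call.executeUpdate();
        }
    }
}
```

Swapping any two positions would still execute but would silently write the wrong values (for example, the order ID into order_cust), which is why the procedure signature and the mapping have to be checked against each other.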
Once the right mapping is in place, the data access objects for the view-based data model are identical to table-based models. Hibernate takes care of executing the stored procedures and treats the views much like tables. See this article's sample DMA solution for complete data access classes for the ORDER_V and ORDER_ITEM_V views.
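For orientation, the value objects behind such mappings can be ordinary JavaBeans. The following is a minimal sketch, not the article's sample code: Order follows the property names in the ORDER_V mapping, while the OrderItem fields are assumptions derived from the ORDER_ITEM_V columns, since that mapping is not shown here. In a real project each class would be public, live in its own file, and belong to the org.onjava.shared.vo package.

```java
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

/** Value object for a row of ORDER_V; property names follow the mapping. */
class Order {
    private int orderId;
    private int orderCust;
    private Date orderDate;
    private Set<OrderItem> items = new HashSet<>();

    public Order() { }  // Hibernate needs a no-argument constructor

    public int getOrderId() { return orderId; }
    public void setOrderId(int orderId) { this.orderId = orderId; }
    public int getOrderCust() { return orderCust; }
    public void setOrderCust(int orderCust) { this.orderCust = orderCust; }
    public Date getOrderDate() { return orderDate; }
    public void setOrderDate(Date orderDate) { this.orderDate = orderDate; }
    public Set<OrderItem> getItems() { return items; }
    public void setItems(Set<OrderItem> items) { this.items = items; }

    /** Sets both sides; with inverse="true" the item side owns the link. */
    public void addItem(OrderItem item) {
        item.setOrder(this);
        items.add(item);
    }
}

/** Value object for a row of ORDER_ITEM_V; field names are assumptions. */
class OrderItem {
    private Order order;   // oi_order
    private int item;      // oi_item
    private double price;  // oi_price
    private int qty;       // oi_qty

    public OrderItem() { }

    public Order getOrder() { return order; }
    public void setOrder(Order order) { this.order = order; }
    public int getItem() { return item; }
    public void setItem(int item) { this.item = item; }
    public double getPrice() { return price; }
    public void setPrice(double price) { this.price = price; }
    public int getQty() { return qty; }
    public void setQty(int qty) { this.qty = qty; }
}
```

The addItem helper is a design choice rather than a requirement: because the set is mapped with inverse="true", Hibernate persists the association from the OrderItem side, so setting the item's order is what actually matters, and adding it to the set keeps the in-memory objects consistent.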
Extensive testing is one of the most important activities during the creation of a DMA solution. Only thorough testing can ensure a correctly functioning view-based (logical) data model. All aspects of the new data model must be explored in tests. And, of course, it is imperative to test both working cases and failing cases.
A great aid in automating testing is DBUnit. Although we won't go into great detail on how DBUnit works (a great OnJava article by Andrew Glover already does that), a couple of important pointers should be noted:
As for the tests themselves, think of them as a way to exercise your DAOs and value objects. Here are some of the kinds of tests we recommend. For a closer look at the implementation, see the sample code included with this article.
Once you have completed all of these tests, you can be confident that your new data model is fairly bomb-proof.
Now that you have upgraded your application to use a sane data model, keep in mind that other applications will be accessing the same data using slightly different points of contact. This shouldn't worry you; it's just something to be aware of. Figure 3 demonstrates how your new and improved application lives peacefully alongside the existing legacy applications.

Figure 3. A legacy application and the DMA solution coexisting peacefully and manipulating the same data set albeit through different models
So you've implemented this fancy solution to fix your data model. As the months go by, developers update their applications and begin to use this new view-based data model. But the underlying denormalization (or whatever faulty design exists) is still there, and you want to get rid of it. But how? It's actually simpler than you might think. Here is the step-by-step guide:
Voila! If you've done things right, not a single line of Java code needs to be modified, and your applications will behave exactly the same. This is where the true beauty of this kind of solution is evident. By abstracting the data model through Hibernate and database procedures, you can achieve an impressive change with little effort. Of course, this doesn't mean that you shouldn't retest everything thoroughly--the good news is that your tests are still totally valid as well (if you are using XML data sets, make sure to replace the view name with the table name).
Using some of the latest and greatest Hibernate technologies, Java testing methodologies, and smart use of your database's resources, we have shown you that iterative change is possible. What is magical about solving data model problems this way is that the solution is mutually inclusive. This means that while you have solved the problem for yourself (in your application), other applications accessing the same data can continue to operate as before until they wise up and jump on your corrected bandwagon. It's a really friendly approach to data model migration.
As a final note, we would like to remind you to keep the following in mind when implementing your solution:
Gilad Buzi has been involved in data-driven application development for over ten years. He is currently a Principal Software Engineer with The Code Works Inc.
Kelley Glenn has worked in the software development industry for more than 10 years, with experience in telephony billing and enterprise application integration.
Jonathan Novich is co-founder and partner at The Code Works, Inc. with more than 10 years of experience in software consulting and development.
Copyright © 2007 O'Reilly Media, Inc.