ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.


AddThis Social Bookmark Button

Agile Database Refactoring with Hibernate

by Gilad Buzi, Kelley Glenn, Jonathan Novich

Your data model was near perfect when your application was first written. Since then, it has evolved. You've hacked, you've denormalized, and, as a result, you've spent countless hours in meetings ranting about the fixes you need to put in place.

Yet, you're ambivalent. Despite your cogent arguments, you're loath to putting together the "change-all-your-data-all-at-once" plan. It's just too risky. There are countless applications that directly read from and write to your database--you can't change all of them at once! If only you could only fix your data model one piece at a time, and one application at a time.

It's a typical scenario, really. Over time, IT organizations at small, medium, and large enterprises create disparate applications that access vital data stored in a centralized database. And slowly, moderately ill-designed data models start dragging down performance, scalability, and the overall efficiency of an organization.

In this article, we will show readers how to upgrade their faulty schemas and data models without affecting existing applications or processes. By using the latest technology from Hibernate (version 3.0 and up)--along with a combination of database views, stored procedures, and standard design patterns--application developers and data architects can repair a faulty data model, one piece at a time.

Steps to Follow

Here's how we'll do it:

  1. Dream up an improved data model: Agree on what's wrong with the current model and how you could fix it
  2. Develop database views: Based on the current (faulty) model, these views reflect how you would like your data model to be
  3. Develop stored procedures or "instead of" triggers: These will replace a standard table insert
  4. Develop POJOs, Hibernate mappings, and DAOs: Represent your new data model and tie it to your views
  5. Test, test, test: Prove your data model right

Now let's dive into the details of each one of these steps. But first, let's present the example at hand.

Our example is an overly denormalized order system. Instead of dividing the orders into an ORDER table and an ORDER_ITEM table, the original data designer decided to put all order information into one table, CUST_ORDER. We'd love to split this table into two, but how?

Figure 1 shows the original design.

Figure 1. Our data model before the DMA solution

Ok, let's get fixin'!

Dream Up an Improved Data Model

We decided that we could really split this up fairly easily. It'd be great to achieve something like in Figure 2.

Figure 2. Our data model including the views that improve the overall design

By dividing the order data into two tables, we avoid data repetition and have a generally more sustainable data model. But how can we arrive at this model given our existing data structure?

Communication is key. Although this has little to do with coding and testing, it is an important point. Make sure to involve all stakeholders in the new design of your data model. This includes developers working on other applications that access this data, database administrators who will have to maintain the new data model, and finally technical managers and technical business analysts who may have their own ideas about where the data model should go. We really can't say enough about how important communication is.

Develop Database Views

To achieve our desired structure, we can define database views on top of our existing schema that use the current data in our overly denormalized table(s). Our views, however, will present this data in a normalized way. The ORDER_V view is really just a grouped and simplified version of the CUST_ORDER table (removing specific order item information and grouping by the order_id). Here's the definition:

CREATE VIEW dma_example.order_v 
        AS select 
        dma_example.cust_order.order_id AS order_id,
        dma_example.cust_order.order_cust AS order_cust,
        max(dma_example.cust_order.order_date) AS order_date 
     from dma_example.cust_order 
     group by dma_example.cust_order.order_id;

The ORDER_ITEM_V view captures only the order item details, ignoring the customer id and the date (information that can be obtained from the ORDER_V view). Here's the ORDER_ITEM_V's definition:

CREATE VIEW dma_example.order_item_v 
        AS select 
        dma_example.cust_order.order_id AS oi_order,
        dma_example.cust_order.order_item AS oi_item,
        dma_example.cust_order.order_item_price AS oi_price,
        dma_example.cust_order.order_item_qty AS oi_qty 
     from dma_example.cust_order 
     where (dma_example.cust_order.order_item is not null);

So what we've basically done is split one table into two.

Pages: 1, 2, 3

Next Pagearrow