Agile Database Refactoring with Hibernateby Gilad Buzi, Kelley Glenn, Jonathan Novich
Your data model was near perfect when your application was first written. Since then, it has evolved. You've hacked, you've denormalized, and, as a result, you've spent countless hours in meetings ranting about the fixes you need to put in place.
Yet, you're ambivalent. Despite your cogent arguments, you're loath to putting together the "change-all-your-data-all-at-once" plan. It's just too risky. There are countless applications that directly read from and write to your database--you can't change all of them at once! If only you could only fix your data model one piece at a time, and one application at a time.
It's a typical scenario, really. Over time, IT organizations at small, medium, and large enterprises create disparate applications that access vital data stored in a centralized database. And slowly, moderately ill-designed data models start dragging down performance, scalability, and the overall efficiency of an organization.
In this article, we will show readers how to upgrade their faulty schemas and data models without affecting existing applications or processes. By using the latest technology from Hibernate (version 3.0 and up)--along with a combination of database views, stored procedures, and standard design patterns--application developers and data architects can repair a faulty data model, one piece at a time.
Steps to Follow
Here's how we'll do it:
- Dream up an improved data model: Agree on what's wrong with the current model and how you could fix it
- Develop database views: Based on the current (faulty) model, these views reflect how you would like your data model to be
- Develop stored procedures or "instead of" triggers: These will replace a standard table insert
- Develop POJOs, Hibernate mappings, and DAOs: Represent your new data model and tie it to your views
- Test, test, test: Prove your data model right
Now let's dive into the details of each one of these steps. But first, let's present the example at hand.
Our example is an overly denormalized order system. Instead of dividing the orders into an ORDER table and an ORDER_ITEM table, the original data designer decided to put all order information into one table, CUST_ORDER. We'd love to split this table into two, but how?
Figure 1 shows the original design.
Figure 1. Our data model before the DMA solution
Ok, let's get fixin'!
Dream Up an Improved Data Model
We decided that we could really split this up fairly easily. It'd be great to achieve something like in Figure 2.
Figure 2. Our data model including the views that improve the overall design
By dividing the order data into two tables, we avoid data repetition and have a generally more sustainable data model. But how can we arrive at this model given our existing data structure?
Develop Database Views
To achieve our desired structure, we can define database views on top of our existing schema that use the current data in our overly denormalized table(s). Our views, however, will present this data in a normalized way. The ORDER_V view is really just a grouped and simplified version of the CUST_ORDER table (removing specific order item information and grouping by the
order_id). Here's the definition:
CREATE VIEW dma_example.order_v AS select dma_example.cust_order.order_id AS order_id, dma_example.cust_order.order_cust AS order_cust, max(dma_example.cust_order.order_date) AS order_date from dma_example.cust_order group by dma_example.cust_order.order_id;
ORDER_ITEM_V view captures only the order item details, ignoring the customer id and the date (information that can be obtained from the
ORDER_V view). Here's the
CREATE VIEW dma_example.order_item_v AS select dma_example.cust_order.order_id AS oi_order, dma_example.cust_order.order_item AS oi_item, dma_example.cust_order.order_item_price AS oi_price, dma_example.cust_order.order_item_qty AS oi_qty from dma_example.cust_order where (dma_example.cust_order.order_item is not null);
So what we've basically done is split one table into two.