A Hibernate Snafu
2006-08-11
When configuring Hibernate, it is critical to know which relationships to make eager and which to make lazy. One wrong mistake can be fatal for your application.
I have been working on this large web application, a job site for the Canadian government. We were experiencing severe performance problems.
There are several tables used to represent a job advertisement. There is the summary information, another table with more detailed information, tables used to do one-to-many joins for job ads involving several geographic locations, and so on. We thought that the performance problems were simply the result of having (by necessity) all these tables.
My colleague, who designed the database, also wrote the db access layer of the application using Hibernate. In case you are not familiar with it, is a framework that provides the glue between Java classes and database tables. You give it configuration files telling it how to map Java class properties to database columns and it takes care of all the rest. You can represent relationships between classes without having to explicitly deal with foreign keys. It turned out that the performance problems were the result of a little problem with the Hibernate configuation.
Let's say that we have a primary table called 'primary' and a secondary table, containing more detail about an entity, in a table called 'secondary'. primary has a field secondary_id which is a foreign key to secondary. In the Hibernate config, we created a bidirectional relation so that in Java we can write:
Secondary sec = primary.getSecondary();
or
Primary prim = sec.getPrimary();
You can also configure Hibernate to do 'eager fetching'. This will cause it to preload entities to avoid doing an extra database lookup. Now, what was happening was this:
1. We would load a primary from the database.
2. We would load the secondary using Secondary sec = primary.getSecondary();
3. Because of the (unnecessary) relation from sec to primary, and because of eager fetching, this caused sec to try to find its primary (even though we had already loaded it.).
So it would execute: select * from primary where secondary_id = ? and because there was no index on that field, it would cause a full table scan on the primary table, which right now contains 13427 rows. This site gets about 120,000 page views a day, and most of those are looking at job advertisements, so that meant tens of thousands of full table scans every day.
Once we discovered that, we removed that relation, and all our performance problems disappeared.
The moral of that story is: if use use Hibernate, have a good look at every sql statement it executes.