Chapter 6. Test-driving data access
Information is the currency of democracy.
Thomas Jefferson
Data is everywhere. Think about it. How many systems have you worked with in your professional career where the business logic you implemented did not operate on persistent data? Exceptions do exist, of course, but the majority of systems in today’s enterprises deal with persistent data—and with lots of it. As such, data has a central role in the development of these systems and in the day-to-day lives of software developers. Furthermore, if you had to pick one component of an enterprise system as the single most important, it would have to be the data. Yet software developers have a long tradition of relying on manual, visual inspection for testing their data-access code. Visual inspection is slow and error-prone—not how we as professional craftsmen want to work.
So far, we’ve ignored data as an element in our tests by faking where it’s coming from. In other words, although we have seen how we can implement business logic that needs persistence services using test-driven development and test doubles, we haven’t seen how to implement the persistence layer itself. In this chapter, we’ll pick up the slack and give the data layer our full attention.
After finishing this chapter, we’ll have all the tools necessary for using test-driven development on code that accesses and manipulates data, be it on the file system or in a relational database. We will focus on facilitating our test-driven process with testability-aware design and with technology that helps us get the job done. Along the way, we will also discuss the relative advantages and disadvantages of test-driving with unit tests and integration tests and of different strategies for managing our test data.
Ready. Set. Go.
Let’s begin by exploring the problem space before getting grease on our hands. First of all, we need to establish what supposedly makes test-driving data access code somehow different from test-driving “regular” code (for whatever definition of “regular”).
The main difference between data-access code and most other code is that its behavior typically spans multiple layers of a system’s architecture. Another common difference is that data-access code tends to make use of third-party APIs that are, to put it nicely, sometimes not as testing-friendly as they could be. In order to test without all of that third-party code constantly hitting the disk or a database server, we need to be able to separate these concerns.
Let’s continue our journey by identifying what kinds of boundaries data-access code typically crosses in Java and Java EE applications. Then let’s briefly study a design pattern that helps us implement our data access code using a testable design.
Data-access code almost by definition crosses boundaries in a system. Our application logic talks to our persistence logic, and the persistence logic talks to a persistence framework such as Hibernate or the EJB 3.0 EntityManager interface, or directly to the JDBC API, which in turn is implemented by a JDBC driver, which in turn connects to a relational database. Figure 6.1 illustrates this stack of layers on top of layers.
Figure 6.1. A pile of logical layers of a system. If we want to test-drive the persistence logic layer, which of the underlying layers should our tests exercise?

Tip
Test-driving EJB 3.0 code is covered in the bonus chapter, available online.
Ideally, we’d want to test-drive the code we’re working on without involving the layer below. We want our tests to be independent and running fast. Including too much from the layers below into our unit test increases the coupling in our tests and inevitably makes our tests run slower—and setting up the other layers would most definitely be more work we’d rather not do. Furthermore, the layer below might not exist yet.
With this in mind, let’s look at our options of test-driving the code inside the layers in figure 6.1. Working on the application logic layer, it’s easy to stub out the persistence logic. We verify that the chunk of business logic we’re testing invokes the appropriate methods on a persistence logic layer interface with the appropriate parameters. But looking at how to test-drive the persistence logic layer, we face a slightly more difficult decision—should we stub or mock the JDBC API (or the selected persistence framework’s interfaces, if applicable), or should we hit a database? Furthermore, should we hit a real database or a lightweight, in-memory database that is faster but not exactly identical to the real thing?
Later in this chapter, we will look at each of these different strategies one at a time, always implementing the same example scenario so that we can do a subjective comparison between them. First, though, let’s familiarize ourselves with a design pattern that helps us separate our code base into appropriate layers.
It might have something to do with the Java EE community’s long-time affection for design patterns and architecture blueprints (or it might be just that this stuff makes sense), but for one reason or another, most of the Java EE systems I’ve seen during the past five years have applied some kind of a variation of the Data Access Object (DAO) pattern, described in Core J2EE Patterns: Best Practices and Design Strategies (Deepak Alur, Dan Malks, and John Crupi; Addison-Wesley, 2003).
The core of the pattern is that for each persistent domain object, there is a Data Access Object interface (and implementation) for persisting the said type of object to a database, be it a relational database, an object database, or a bunch of XML files on the file system. The point is that the object itself doesn’t know and whoever is using the DAO doesn’t know where (and whether) the object was persisted or where it came from. As a result, we can swap the actual persistence technology without the application noticing. Figure 6.2 illustrates this with boxes and lines.
Figure 6.2. Data Access Object pattern: Domain objects being created, retrieved, and persisted through a Data Access Object interface without knowledge of the underlying database technology

In the following sections, we’ll reuse this pattern to illustrate the process of test-driving data-access-related code using different technologies. To give you a better idea of what a DAO interface might look like and to give us a basis for later examples, listing 6.1 shows a simple PersonDao interface for persisting Person objects.
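In essence, the PersonDao interface boils down to something like the following sketch. The exact set of methods here is an assumption; findByLastname and save are the two methods we’ll rely on later in this chapter.

import java.util.List;

public interface PersonDao {

    // returns all persons whose last name matches the given string
    List<Person> findByLastname(String lastname);

    // persists the given (possibly transient) Person object
    void save(Person person);
}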
The DAO pattern is enormously helpful compared to scattering data access all around the business logic. Also, although the DAO pattern typically reduces the absolute volume of data access code we need to write, adopting freely available open source persistence frameworks like Hibernate or more trivial support libraries like the JdbcTemplate facility from the Spring Framework can radically reduce this work. Essentially, a good framework can do most of the plumbing work for you and leave you to worry about the persistence logic rather than the gritty details of the persistence technology.
Tip
The implementation for the Person class referenced by the PersonDao interface is available as part of the source code download at the book’s website.
Without further ado, let’s roll up our sleeves again and start test-driving code that needs data and, therefore, needs to use a DAO to get at that data. We’ll start from a plain JDBC implementation of the PersonDao interface in listing 6.1, acknowledge its main problems, and then introduce Spring Framework’s JdbcTemplate and Hibernate as alternative solutions that simplify our work quite a bit.
Traditionally, Java developers have written raw JDBC code for accessing data in relational databases. Especially during the late nineties, when the IT bubble was at its peak and everyone became a Java programmer, we could see JDBC calls scattered throughout a code base—with the result that renaming a database column, for example, meant changes all over the place. This was obviously a bad thing, and developers today are much more aware of the need to avoid such mistakes.
In this section, we’ll look at two frameworks that simplify our data-access code and the way we can test-drive such code with our unit tests. Those two frameworks are the Spring Framework and its JdbcTemplate and the Hibernate persistence framework. We’ll start, however, by looking at the foundation of all Java database access—the JDBC API—and how to test-drive JDBC-based data-access code.
Before we begin, perhaps we should clarify what we mean by unit tests in this context. I’ve seen people utter the words unit test when talking about a wide variety of things, including JUnit test classes running external system commands that fork JVM instances to run Java classes’ main methods. Those might be called developer tests, but they’re most certainly not unit tests according to my definition. I personally subscribe to Michael Feathers’ definition for unit tests,[1] which says that a test is not a unit test if:
1 Working Effectively with Legacy Code (Prentice Hall, 2004)
- It talks to the database.
- It communicates across the network.
- It touches the file system.
- It can’t run at the same time as any of your other unit tests.
- You have to do special things to your environment to run it.
These things are costly to do and make our tests run slower than necessary—sometimes up to several magnitudes slower than they would using a test double. With this in mind, we don’t want to hit a database repeatedly in our unit tests but rather want to fake it somehow. This section will show us how to test-drive our data access objects effectively by using test doubles in the context of the aforementioned persistence frameworks and APIs.
JDBC could be considered one of the most important factors in the adoption of Java—accessing relational databases in a standard way is important, after all—but the API could’ve been made slightly easier to use. We say this because JDBC requires the developer to write a lot of plumbing code to catch checked exceptions and make sure to close various resources. Furthermore, the depth of the JDBC API makes it somewhat cumbersome to test. Let’s see this by test-driving a piece of a pure JDBC-based implementation of the PersonDao interface from listing 6.1.
Let’s say we’ve decided to start by implementing a findByLastname method from the PersonDao interface. Because we’re implementing a pure JDBC DAO, we’re going to fake the JDBC connection and return hard-coded data when the DAO performs a query. The way we want our DAO to work is by passing it a DataSource from which to obtain JDBC connections. In our test, we’re obviously going to pass a test double as the DataSource. In production mode, we’ll use dependency injection to obtain a real DataSource. In code, what we want is this:
The problem is, of course, that we don’t yet have a DataSource object, and we don’t want to connect to a real database from our unit tests.
Because we don’t have a DataSource object yet, we’ll need to create one prior to proceeding to the previous block of code invoking our production code. We’ve decided to use EasyMock for creating a mock implementation for the DataSource. Not counting the static import statement, creating the test double with EasyMock is a one-liner:
Now, when we execute findByLastname, we expect the DAO implementation to ask the DataSource for a Connection. That means we need a test double for the Connection interface. Then, we expect the DAO to ask the connection object to prepare a statement with an SQL template string and to populate the statement with the last name given as argument. So, we also need a test double for the PreparedStatement interface. Let’s look at what we’ve got so far for creating all of these mock objects and expectations:
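A sketch of those expectations with EasyMock might read as follows (the exact SQL string and table name are assumptions):

Connection connection = createMock(Connection.class);
expect(datasource.getConnection()).andReturn(connection);

PreparedStatement statement = createMock(PreparedStatement.class);
String sql = "SELECT * FROM employee WHERE last_name = ?";
expect(connection.prepareStatement(sql)).andReturn(statement);
statement.setString(1, "Smith");   // recording a void call is just a matter of invoking it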
That’s a lot of mock objects already, and there are more to come. Next, we’d like the DAO to execute the statement object. The executeQuery method returns a java.sql.ResultSet interface for which we still need one more test double.
This time, we’ll decide to not use EasyMock but rather a fake implementation provided by the MockObjects.com library. This is because (especially for large result sets) setting expectations for each and every method call would quickly explode the volume of our test code, and we’re not interested in how the DAO pulls out the pieces of data from the ResultSet—we’re interested in knowing that it builds and returns the right kind of list of people from the result set.
The following snippet shows how we configure a MockMultiRowResultSet object and make our mock PreparedStatement object return it when executeQuery is called:
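The configuration might look roughly like this sketch, assuming the MockObjects.com library’s setupColumnNames and setupRows methods:

MockMultiRowResultSet resultset = new MockMultiRowResultSet();
resultset.setupColumnNames(new String[] { "first_name", "last_name" });
resultset.setupRows(new Object[][] {
        { "Alice", "Smith" },
        { "Billy", "Smith" },
        { "Clark", "Smith" }
});
expect(statement.executeQuery()).andReturn(resultset);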
The MockMultiRowResultSet object expects the data to be given to it as a two-dimensional array. In the end, we’d like to be able to compare the List of Person objects returned by the DAO’s findByLastname method with something. In order to minimize duplication of data, we want to create the list of people just once, convert the list into a two-dimensional object array for the MockMultiRowResultSet, and use the same list in comparing the DAO’s output.
After everything else, we still want our DAO implementation to do something—release (close) all of the JDBC resources:
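With EasyMock, a void method such as close is recorded simply by calling it while the mock is still in record mode. A sketch of those remaining expectations, followed by the switch to replay mode:

statement.close();
connection.close();

replay(datasource, connection, statement);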
Listing 6.2 shows our test in its final form, specifying the wanted behavior for our DAO implementation—that is, when everything goes according to plan and the database contains three matching people.

That’s what I call a big test. First, we create and configure mock implementations for the DataSource, Connection, and PreparedStatement JDBC interfaces by using EasyMock. Then, we
populate a test double for the ResultSet JDBC interface using a static mock object from the MockObjects.com library. Finally, after
recording the expected release of all JDBC resources, we
inject our mock DataSource into a JdbcPersonDao instance, invoke the finder method, and compare the returned list of Person objects to the one with which we populated the mock ResultSet. Not the most compact test method we’ve seen (although I have seen worse), but this is to be expected when working with the JDBC interfaces.
For the sake of completeness, listing 6.3 shows an implementation of the findByLastname method that passes the previous test with flying colors.
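An implementation along the following lines would do the trick; the Person constructor, table name, and column names are assumptions carried over from the earlier sketches.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class JdbcPersonDao implements PersonDao {

    private DataSource datasource;

    public void setDatasource(DataSource datasource) {
        this.datasource = datasource;
    }

    public List<Person> findByLastname(String lastname) {
        try {
            Connection connection = datasource.getConnection();
            PreparedStatement statement = connection.prepareStatement(
                    "SELECT * FROM employee WHERE last_name = ?");
            statement.setString(1, lastname);
            ResultSet resultset = statement.executeQuery();

            List<Person> people = new ArrayList<Person>();
            while (resultset.next()) {
                people.add(new Person(resultset.getString("first_name"),
                        resultset.getString("last_name")));
            }

            resultset.close();
            statement.close();
            connection.close();
            return people;
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    // the remaining PersonDao methods are omitted from this sketch
}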
Note that our implementation only closes the JDBC resources if everything goes fine and none of the prior method invocations on the JDBC API throw exceptions. In other words, we’d still need to write a couple of tests to verify that the proper close calls take place even when, for example, a query throws an exception.
Instead of doing that, let’s look at how the use of a good framework can alleviate the pain of test-driving JDBC code. There are plenty of open source frameworks that attempt to simplify data-access code in Java and Java EE applications, but we’ll look at only two of those: the JDBC support provided by the almost ubiquitous Spring Framework, and the Hibernate persistence framework that has led object-relational mapping in Java to the mainstream.
Let’s begin with a tour of how Spring’s JdbcTemplate can significantly reduce the pain of working with the JDBC API.
The Spring Framework includes a facility named JdbcTemplate, which purports to simplify data-access code to only the essential. The fundamental idea behind the implementation is that most raw JDBC code follows the same pattern with certain boilerplate code always present: nested try-catch structures with conditional close calls for making sure all resources are released is a prime example. In practice, JdbcTemplate employs something similar to the well-known Template Method design pattern[2] and achieves a structure where the developer only needs to worry about the varying part; the boilerplate is taken care of by the framework. Figure 6.3 illustrates the structure with boxes and lines.
2 See Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1994).
Figure 6.3. Collaboration between the template code provided by the Spring Framework and the DAO implementation provided by the developer

Spring implements this division of responsibilities through an abstract base class named JdbcDaoSupport, which our specific data-access object implementations can extend. Figure 6.4 depicts this structure.
Figure 6.4. DAO implementation extends from Spring’s JdbcDaoSupport class that provides JdbcTemplate functionality

What the abstract base class in figure 6.4—JdbcDaoSupport—provides is essentially one getter method for obtaining an instance of the JdbcTemplate class. The JdbcTemplate class encapsulates the gritty details of the raw JDBC interfaces, instead offering higher-level methods at the DAO implementor’s disposal. The signature of one of these higher-level methods is shown here as an example:
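In the pre-generics Spring API, one of the query variants looks roughly like this:

public List query(String sql, Object[] args, RowMapper rowMapper)
        throws DataAccessException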
Notice how there’s no reference to a Connection, a PreparedStatement, or anything else JDBC. We pass in a SQL statement, the argument values with which to populate any parameter placeholders, and something named RowMapper. The RowMapper is an interface provided by Spring and basically represents an object that knows how to convert a row in a database table into an object. The RowMapper interface is presented here in full:
The RowMapper interface has a dependency to the JDBC ResultSet interface, but this single dependency doesn’t pose a problem when testing DAO implementations because it’s hidden inside a RowMapper implementation—which can be test-driven separately. Furthermore, the same RowMapper implementation can be reused for practically all queries dealing with the table in question.
Although the JdbcTemplate class provides a horde of other methods in addition to the query method we just saw, we’ll ignore those others for now and get our hands dirty again, implementing a JdbcTemplate-based DAO class test-first. We’ll again implement the findByLastname method from the PersonDao interface introduced in listing 6.1. For that purpose, we’re going to need a RowMapper implementation that converts rows into Person objects, so why not start from there?
Let’s assume for the sake of this example that a Person object consists of only two properties—first name and last name—and that these properties map to columns first_name and last_name in the database. With these assumptions, we could write the test in listing 6.4.
In listing 6.4, we’re creating a mock ResultSet object representing a single row and then
comparing the return value from the RowMapper implementation to a Person instance populated with the expected values. The matching implementation is shown in listing 6.5.
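A sketch of that implementation:

import java.sql.ResultSet;
import java.sql.SQLException;
import org.springframework.jdbc.core.RowMapper;

public class PersonRowMapper implements RowMapper {

    public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
        return new Person(rs.getString("first_name"),
                          rs.getString("last_name"));
    }
}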
This is the basic scenario—straightforward mapping of a row to an object—but there can be other kinds of scenarios as well. For example, we might want the PersonRowMapper to populate several fields of the Person object based on a single column value. Null values, conversions between Booleans and numeric values, and so forth are other scenarios where a more elaborate implementation—and the associated tests—would be necessary.
But now that we have a PersonRowMapper implementation, let’s move on and test some functionality in the JdbcTemplate DAO implementation.
In addition to implementing the PersonDao interface, our JdbcTemplate-based DAO implementation should also extend JdbcDaoSupport from the Spring Framework. The JdbcDaoSupport provides all the plumbing for our DAO class and exposes access to JdbcTemplate through a simple getter method, getJdbcTemplate. The DAO can then invoke JdbcTemplate’s methods and the framework takes care of cleaning up resources afterwards.
In our case, we want to test-drive an implementation for the findByLastname method and decide that the DAO should use the query method that takes the SQL template, an array of query parameters, and a RowMapper implementation as arguments. We also need to stub the JdbcTemplate class itself in our test and make it return our hard-coded list of Person objects when the query method is invoked.
Listing 6.6 shows our test in its full glory.
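A sketch of what such a test might look like follows. Note that mocking a concrete class such as JdbcTemplate requires the EasyMock class extension (org.easymock.classextension.EasyMock) or a newer EasyMock release that supports class mocking out of the box; the DAO class name JdbcTemplatePersonDao is an assumption.

@Test
public void findsPersonsByLastname() throws Exception {
    String sql = "SELECT * FROM employee WHERE last_name = ?";
    List<Person> smiths = Collections.singletonList(new Person("Alice", "Smith"));

    JdbcTemplate template = createMock(JdbcTemplate.class);
    expect(template.query(eq(sql), aryEq(new Object[] { "Smith" }),
            isA(PersonRowMapper.class))).andReturn(smiths);
    replay(template);

    JdbcTemplatePersonDao dao = new JdbcTemplatePersonDao();
    dao.setJdbcTemplate(template);
    assertEquals(smiths, dao.findByLastname("Smith"));

    verify(template);
}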

Our test in listing 6.6 creates a mock implementation of the JdbcTemplate class and configures it to return a hard-coded list of Person objects for the SELECT * FROM employee WHERE last_name = ? query. Then we inject our mock JdbcTemplate into the DAO and invoke the finder method, verifying that it returns the list received from the query method.
Listing 6.6 also uses EasyMock’s custom argument-matching functionality because we need to make our test double expect a method call that involves arrays, and the default matching logic doesn’t work with arrays. As you might not be familiar with custom matchers just yet, perhaps a word about them is called for. Let’s take a closer look at what we instructed EasyMock to do regarding the query call.
The common way of recording expectations with EasyMock is to invoke the method on the mock and then specify a return value with andReturn, or to ask EasyMock for an interface through which to specify the return value for the previous method call. In replay mode, EasyMock then records the actual method calls as they take place. Finally, when we ask EasyMock to verify that our mock objects’ expected collaboration did take place, EasyMock takes the list of expected method calls and the list of actual method calls and performs a simple comparison, using the objects’ equals method to compare argument values.
In some cases, the equals method is not a valid strategy for comparing the expected and the actual arguments. Arrays are a good example: two distinct array objects report that they’re not equal even if they contain equal elements in the same order. Furthermore, in some cases we’re not interested in the exact arguments being passed. For example, we might only be interested in knowing that a String starting with certain characters is passed or that the argument is an instance of a certain class. For these cases, EasyMock gives us a way to override the default behavior of using equals for argument matching.
In listing 6.6 we executed the following statement:
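As sketched earlier, that statement was along these lines:

expect(template.query(eq(sql), aryEq(new Object[] { "Smith" }),
        isA(PersonRowMapper.class))).andReturn(smiths);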
The eq, aryEq, and isA methods were statically imported from the EasyMock class. The implementations of these methods do exactly one thing: register an implementation of the IArgumentMatcher interface with the current execution context. The eq method registers an implementation that uses the equals method. The aryEq method registers an implementation that first compares the lengths of the expected and actual array and then (if the array lengths match) loops through the arrays, comparing each entry with its equals method. The isA method registers an implementation that only checks that the actual argument is an instance of the expected class.
Because of Java’s language semantics, the eq, aryEq, and isA methods are executed before the query method. As a result, when the mock object receives the query invocation, EasyMock notices it has custom argument-matchers registered and knows to use those matcher implementations when the method is later called when the mock is in replay mode.
You can learn more about the advanced features of the EasyMock framework from the project’s website. Now, let’s get back to the PersonDao implementation.
In our test, we injected a JdbcTemplate into the DAO, expecting the template object’s query method to be called with certain arguments. In fact, that’s pretty much all we need to do in our DAO because JdbcTemplate takes full care of handling the JDBC resources. Listing 6.7 shows the complete DAO implementation.
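A sketch of such an implementation (the class name is an assumption, and the unchecked conversion comes from the pre-generics query API):

import java.util.List;
import org.springframework.jdbc.core.support.JdbcDaoSupport;

public class JdbcTemplatePersonDao extends JdbcDaoSupport implements PersonDao {

    @SuppressWarnings("unchecked")
    public List<Person> findByLastname(String lastname) {
        return getJdbcTemplate().query(
                "SELECT * FROM employee WHERE last_name = ?",
                new Object[] { lastname },
                new PersonRowMapper());
    }

    // the remaining PersonDao methods are omitted from this sketch
}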
As we can see, JdbcTemplate takes away all the boilerplate from our data-access objects and leaves us to worry about the query logic alone—making our tests more compact and maintainable in the process.
As much as I appreciate the band-aid JdbcTemplate brings to both writing and testing data-access code, I personally like our next subject—the Hibernate framework—even more. Enough that the section’s title includes the word nirvana!
Although Spring’s JdbcTemplate and the RowMapper interface make JDBC code less of a pain to work with, the code we write for mapping our domain objects into database columns is still boilerplate we’d rather not write (and maintain). After all, because we’re going to add a new field and accessor methods to our domain object anyway, why should we need to tell our RowMapper implementation about the new field? Wouldn’t it be nice if our persistence infrastructure would pick up the new field automatically?
This is what modern object-relational mapping (ORM) frameworks like Hibernate do for us. Hibernate lets us define our persistent fields in exactly one place—in the code with simple annotations. The framework takes care of mapping annotated fields on a domain object into a database table and columns. The key is having smart defaults. Most of the time, we can add a field, and it is by default considered persistent. On those occasions when this default doesn’t suit our purposes, we can declare the field as being transient either by adding the @javax.persistence.Transient annotation on the field or by using Java’s transient keyword.
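For illustration, an annotated domain object might look like this sketch (the field names are assumptions):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Transient;

@Entity
public class Person {

    @Id @GeneratedValue
    private Long id;

    private String firstname;    // persistent by default
    private String lastname;     // persistent by default

    @Transient
    private String displayName;  // explicitly excluded from persistence

    // constructors, accessors, equals, and hashCode omitted
}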
These advantages have made Hibernate one of the most popular Java persistence frameworks. Considering how Hibernate has spread all over the place, we’d better know how to work with it and, especially, how to test and test-drive the data-access objects of a Hibernate-based application. Before looking at how Hibernate changes the way we write our unit tests and how it simplifies our data-access code, let’s take a brief tour of some of the main concepts in the Hibernate API.
The Hibernate API has been designed to give its users complete independence of the JDBC API—if that’s what they want—while still leaving enough hooks in place to bypass Hibernate’s abstractions when necessary, exposing access to the underlying JDBC connections. Furthermore, Hibernate gives us a specialized query language named Hibernate Query Language (HQL) while also supporting SQL-based queries for situations where proprietary database features are required.
For the majority of users, the Hibernate API’s most important abstractions are the SessionFactory, Session, and Query interfaces; their collaboration is described on a high level in figure 6.5.
Figure 6.5. The SessionFactory, Session, and Query interfaces represent the main concepts of the Hibernate API.

The SessionFactory acts as the starting point for everything related to Hibernate. The SessionFactory is configured to wrap the DataSource and is responsible for keeping up a façade, handing out Session objects when asked. We could consider sessions the rough equivalent of connections in the JDBC API, except that they’re responsible for some serious persistence and optimization magic that leaves the JDBC API spinning its wheels. With the Session object, we perform basic CRUD operations directly by using methods defined in the Session interface, or more elaborate queries by asking the Session object to create a Query. The Query interface gives us ways to build dynamic queries at runtime.
These three interfaces are—most of the time—all we need to worry about when developing data access objects using Hibernate. So, if they’re all interfaces and the SessionFactory is our access point to the triad, where does the SessionFactory come from? You might’ve guessed already that we typically pass the SessionFactory to our data-access objects using dependency injection, but we haven’t yet discussed how to get an instance to inject in the first place. There are two answers to this question—both involving the Configuration class.
The first and more common way to obtain a SessionFactory instance is to define an external configuration file as either a properties file or an XML document (named hibernate.properties or hibernate.cfg.xml, respectively) in our class path and let the Configuration class load the configuration from there, constructing an implementation of the SessionFactory on top of the configured database, using the configured connection details, pooling, caching strategy, and so on.
The other, less frequently encountered means of obtaining a SessionFactory instance is to tweak a Configuration object programmatically in Java and thus get a dynamically configured SessionFactory object. The programmatic configuration can come in extremely handy when building a suite of automated integration tests with a framework like JUnit. On the other hand, an external file representing a test configuration might be easier to keep in sync with the production configuration.
Because this is a chapter about test-driven development and not Hibernate, this is as far as we’ll go with regard to Hibernate details. What we know about the responsibilities of the SessionFactory, Session, and Query interfaces is enough for us to be able to start test-driving Hibernate DAOs. In fact, because we don’t want our unit tests to touch the database, we’re going to mock everything! Let’s redo the PersonDao one more time, Hibernate-style.
Let’s build up our first test for a Hibernate-based DAO in small steps—the API is pretty simple and the patterns are easy to pick up, leading to a somewhat routine process (which is good news because it means we can move the little plumbing we need into a common setup shared between tests and test classes).
Say we want to start again by implementing the findByLastname method into our DAO. We know we’re going to need a SessionFactory as well as a Session. We also know that we’re going to use the Query interface for implementing this specific finder, so we’ll need one of those as well. Our first step in writing a test is thus creating dynamic mock objects for these three interfaces using EasyMock, as shown in listing 6.8.
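A sketch of that plumbing, assuming JUnit 4 and a static import of org.easymock.EasyMock.createMock:

private SessionFactory sessionFactory;
private Session session;
private Query query;

@Before
public void createMocks() {
    sessionFactory = createMock(SessionFactory.class);
    session = createMock(Session.class);
    query = createMock(Query.class);
}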
With the plumbing in place, we are ready to proceed to defining our test. I’ve developed a habit of defining the expected HQL statement and any parameters for the query in the beginning of the test method. In the case of looking up people by their last name, we might type something like this:
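The exact HQL and the parameter name are assumptions here, but the idea is this:

String hql = "from Person p where p.lastname = :lastname";
String lastname = "Smith";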
We also need to make our mock objects return a bunch of Person objects when asked for people named Smith, so let’s quickly create a list of Smiths:
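Something like this will do, again assuming a two-argument Person constructor:

List<Person> theSmiths = new ArrayList<Person>();
theSmiths.add(new Person("Alice", "Smith"));
theSmiths.add(new Person("Billy", "Smith"));
theSmiths.add(new Person("Clark", "Smith"));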
That should do it. Now, what remains is the meat of our test—that is, defining the way we expect our DAO to interact with the Hibernate API.
Let’s assume for a while that we’re using an architecture where the opening and closing of sessions is performed in a crosscutting way using something like a Servlet filter—a component that executes around the Servlet itself, freeing the Servlet from worrying about the session getting closed.
In other words, all of our DAO classes should use the current session rather than ask the SessionFactory to create a new one. Next, having a reference to a Session object, we want our DAO to ask the Session for a Query, passing the HQL statement as an argument. Finally, after populating any query parameters in the Query object, we expect our DAO to ask the Query to list all matching Person objects (and to return that list). Listing 6.9 shows how these expectations translate to the EasyMock syntax.
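A sketch of those expectations, continuing with the hql, lastname, and theSmiths variables defined above:

expect(sessionFactory.getCurrentSession()).andReturn(session);
expect(session.createQuery(hql)).andReturn(query);
expect(query.setParameter("lastname", lastname)).andReturn(query);
expect(query.list()).andReturn(theSmiths);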
Having configured the SessionFactory’s getCurrentSession method to return our mock Session, which in turn is configured to return our mock Query, we’re all set for switching our mocks into replay mode and invoking the production code. Listing 6.10 shows our complete test for the findByLastname method.

Recording the expected interaction with the Hibernate API using EasyMock is easy compared to a similar test for a JDBC-based implementation. This is both because there are not many method calls to make to the Hibernate interfaces in the first place and, especially with more complex queries, because the Query API happens to have almost no void methods, thus making it possible to use the compact expect(...).andReturn(...) syntax. With a dependency injection-based design, it’s also straightforward to wire our DAO implementation with our small graph of test doubles.
Enough praise for the Hibernate API; let’s look at what kind of implementation would satisfy our test in listing 6.10.
The implementations for most finder methods in Hibernate-based DAO classes are almost trivial, and this is certainly the case with our findByLastname method, shown in listing 6.11. We store the injected SessionFactory into a private field, ask for the current session when a method call comes in, create and populate a query object, and return the resulting List directly. This is all listing 6.11 does.
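A sketch of such an implementation (the HQL string matches the one assumed in our test sketch, and the unchecked conversion comes from the pre-generics list method):

import java.util.List;
import org.hibernate.SessionFactory;

public class HibernatePersonDao implements PersonDao {

    private SessionFactory sessionFactory;

    public void setSessionFactory(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @SuppressWarnings("unchecked")
    public List<Person> findByLastname(String lastname) {
        return sessionFactory.getCurrentSession()
                .createQuery("from Person p where p.lastname = :lastname")
                .setParameter("lastname", lastname)
                .list();
    }

    // the remaining PersonDao methods are omitted from this sketch
}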
The implementation is straightforward—too straightforward, actually. Our test hasn’t pushed our implementation to handle exceptions properly yet. Before skipping the exception cases becomes a habit, let’s look at what such a variation might look like for our Hibernate-based DAO.
Let’s say we’d like our DAO to throw an unchecked exception from the findByLastname method when there’s an exception thrown from the Hibernate API. Further, let’s say we want the original exception to be chained as the cause for the unchecked exception. For that behavior, we might write a test like that in listing 6.12.
Instead of configuring our mock Query to return a list of people from the list method, we now make it throw a HibernateException. Also, instead of expecting a List to be returned, we expect to see a RuntimeException that wraps the original HibernateException we’ve configured our mock to raise.
Now, let’s see what kind of changes this test implies for our production code. Our tests are expecting a RuntimeException when the Query object’s list method fails. Although we could be sneaky with our implementation and only wrap the list call with a try-finally block, forcing us to write further tests that throw the exception from, say, SessionFactory#getCurrentSession, we won’t do that. Instead, we’ll go for the correct, obvious implementation directly. The final implementation is shown in listing 6.13.
Our production code got a bit more verbose, but it’s also a lot more robust because we’re now explicitly dealing with situations where, for example, the database connection dies suddenly in the middle of a transaction. Similarly, if we had wanted our DAO methods to ask the SessionFactory for a brand-new Session, our test would’ve probably wanted to check that the Session object also gets closed if there’s an exception, not leaving the sessions and their associated database connections hanging open indefinitely.
Reducing duplication with fuzzy matching and named queries
By configuring the expected query string in our test, we are effectively duplicating data between our test and production code. We could make our test reference a constant defined in our production code, but that’s not the cleanest of solutions either, at least if you share my taste for code aesthetics.
One way of reducing this duplication is to use fuzzy matching in our tests, leaving the specific query string unverified. In other words, we would be relying on integration tests to verify that the query does work. EasyMock, for example, allows custom argument matching, which makes the use of regular expressions and similar techniques almost trivial. Following this approach, our mock objects would be only checking that the production code passes the right arguments to the persistence API.
Another useful technique for reducing duplication is the concept of named queries. Hibernate, among other modern ORM tools, supports externally configured, named queries in its API, which makes it possible to write unit tests using test doubles without dictating the query syntax. Again, the correctness of the queries would need to be verified with integration tests.
Speaking of database connections, let’s pull away from test-driving data-access objects with test doubles for a moment and think about the big picture. So far, we’ve executed our code only against test doubles, and we’ve configured them to expect certain queries with certain parameters, making them return hard-coded data. How do we know that the query we expected would work against the real database schema?
The answer is, we don’t. Knowing our tools well does make for a good probability that we’ve gotten a simple parameterized query correct, but we don’t really know until we’ve tested that it does work with the real thing. And this leads us to our next topic—driving our data-access code with integration tests rather than unit tests.
So far, we’ve written unit tests for our data-access objects that have led to an implementation that works correctly against our test doubles—mock objects we’ve configured expectations for based on our understanding of what the underlying database schema is (or will be) or what kind of a query the data-access code should use. Mock objects are a powerful tool for testing exception situations and avoiding the burden of having to maintain a test database. On the other hand, as already noted, we can’t be sure that we got our expectations correct until we execute our data-access code and our selected persistence framework against a real database.
Thus, we might also want to consider test-driving our data-access code with integration tests.
Integration tests for data-access code are tests that connect the code under test to a database. In other words, looking back at figure 6.1, we’re testing everything from the persistence logic layer down to the relational database. Or, at least, down to the JDBC driver. What exactly do we mean by “at least”? Let me explain.
In other words, we’re testing the end-to-end integration of our data-access components, the persistence framework employed, and the relational database. Having said that, we might stretch the definition of “end to end” a bit by swapping a different relational database product for our tests than we have in production. This is where we need to make informed decisions about what are acceptable differences between the environment used for integration testing and the production environment.
Our production environment might be composed of a cluster of application servers, connecting to a cluster of database servers running Oracle, for example, and we might have configured our application server with a cluster-aware DataSource. In production, our persistence logic would connect to our persistence framework, which would talk to a JDBC connection, which would talk to a cluster-aware JDBC driver, which would talk to a whole bunch of other stuff before our request would hit the database node. All of this might be awesome for the system’s high availability or scalability, but it’s also probably at least a couple of magnitudes slower than a test double returning hard-coded data. And speed of test execution is crucial for test-driven development.
Not all systems require a heavy, clustered, load-balanced infrastructure for persistence. Some systems are perfectly suitable for a standalone relational database product running locally to the application server. Furthermore, some systems might deploy an embedded relational database. Even with all this variance between different systems, we can be reasonably confident about our persistence logic by using some kind of a relational database.
With that in mind, let’s think about what we’re interested in testing when it comes to data-access code. That is, what problems might’ve we missed with our unit tests?
We want fast tests. Blazing fast. If we can’t get that, we want almost blazing fast. What we don’t need is the absolute best, identical representation of the production environment for our integration tests. Thinking about the kinds of assumptions we make while writing pure unit tests for data-access code, we can identify three aspects that are more or less the biggest risks:
- Assumptions about the database schema or domain model
- Assumptions about the query statements used
- Assumptions about the object-relational mappings, if applicable
We have already eliminated things like correct use of the persistence API, because that’s what our unit tests are verifying. We have also eliminated the JDBC driver as a risk (as well as the underlying database), because we should be able to expect third-party code to function properly.[3] Let’s take a closer look at the three risks that remain on our list.
3 There are valid reasons for having tests for third-party software, including learning tests, but none directly related to test-driving data-access code.
The assumptions we make about the database schema or domain model are largely about the existence of and the naming of fields and columns. This is a risk because the references to fields or columns in our data-access code are often literal, expressed as strings rather than as part of our programming language syntax, which our compiler would be able to verify for us.
Similarly, to the compiler, our query statements are just strings and nothing more. Our ability to get them right is solely dependent on our knowledge of the query language syntax. In some cases, it’s also dependent on the support for the syntax features used by the underlying database engine. In other words, we need to know our tools and, believe it or not, sometimes we make mistakes anyway.
The assumptions about our object-relational mappings are effectively an extension of the first class of assumptions. For example, when we wrote unit tests for our Hibernate-backed DAO, we configured our mock objects to expect a certain kind of query statement and to return data upon that query. In doing that, we made assumptions about the mapping of the fields used in the query. Naming conventions do help, but still we inevitably sometimes leave a typo in, only to be discovered later when someone happens to use the functionality in a real deployment environment—finding out that lastName should’ve been lastname.
So the question becomes, how do we respond to the risks that are built in to these assumptions?
These are all things that might go wrong even though we’ve built up a thorough suite of unit tests as part of our test-driven development. The good news is that these are all things that we can verify well enough with any standards-compliant relational database, even if it wouldn’t be the exact same product with which we’re going to deploy to production. After all, what we need is to execute our JDBC code against a database schema on pretty much any database that supports the SQL features we’re using; or, if we’re using an ORM framework like Hibernate, we need to execute our persistence logic and the framework code against any database supported by the framework.
Saying “any database” sounds perhaps a bit too grandiose to be accurate, though. Let’s talk a bit more about our options and their relative pros and cons in order to help us grasp the limits of this flexibility and what properties to consider when selecting the database for use in our integration tests.
When it comes to deciding which database to use in our integration tests, there are a few factors to consider:
- How closely the setup matches our production environment
- How easy it is to share and manage the configuration within the team
- How easy it is to access and control the database
Starting from the first...
Even though the JDBC API is a standard and seemingly common denominator among any databases we connect to from a Java application, there are differences that matter. These differences can make our system incompatible with the production database, somewhat dysfunctional, or behave seemingly erratically in production, even though all of our integration tests are passing against the test database.
The most obvious differences are in support for identity and uniqueness-related database features. For example, the concept of database sequences is a feature commonly used in high-end commercial databases such as Oracle and DB2 but not too widespread among popular open source databases.[4] This can cause some gray hairs for developers, for example, when the production database is MySQL, which doesn’t support SEQUENCE objects but rather uses its own AUTO_INCREMENT column type, which practically no other database supports. Furthermore, although two database products might have support for the same concept, the syntax for using the concept might be different. Sequence or identity generators and stored procedures are prime examples.
4 I’m certainly hoping this will change, because the SQL:2003 standard introduced sequence generators.
Because we need the same SQL syntax to work for both the production database and for whatever database product we use in our integration tests, a common solution is to pull the responsibility for generating unique identifiers from the database and inside the application’s persistence layer. Such an approach also tackles another related problem—how to figure out the assigned identifier for the newly persisted object/row—which also requires the use of proprietary database features. Generating identifiers in the application is not necessarily a mere workaround, however, because it does give our code base full control of the identity information for our domain objects.
Effectively, we’re talking about a trade-off between architectural complexity (having a portion of our application logic reside in the database) and code complexity (having to implement a sequence generator). Luckily, many persistence frameworks (including Hibernate) provide such functionality out of the box. Hibernate supports primary-key generators based on standalone algorithms as well as database concepts like sequences and identity columns, for example.
Another common problem with compatibility between two database products is the use of reserved words. For example, we might have a legacy system running on MySQL for which we’d like to start writing automated integration tests using a more lightweight database, HSQLDB.[5] Now, there’s a table that stores items that belong to ordered lists. For each row, there’s one column referring to the list it belongs to and another column indicating the absolute position of that item in the list. I’d wager that there’s a fifty-fifty chance that the developers have named the positional column “position” rather than something like “indx” or “ordinal.” The bad news is that although “position” is a perfect name for that column, it’s also a reserved word in HSQLDB. The good news is that columns and tables are easy to rename, so this is mostly a small annoyance rather than a showstopper for using a lighter-weight database.
There can also be more subtle differences between databases, such as the semantics of UNIQUE constraints for multiple columns. Although in most databases a multi-column UNIQUE constraint allows duplicates if at least one of the constituent values is NULL, this might not be the case with the database you’re planning to use for production or testing.
Even after all of these gotchas, it still makes sense to see if there’s a way to use a substitute for the real database in our integration tests. After all, we’re interested in improving our productivity, and getting to use a database in our integration tests that’s easier and faster to set up can be a huge improvement—huge enough that it might well be worth the trouble of changing our code base to accommodate for differences between the database products.
And speaking of ease of use, it’s not just about one developer, but a whole team. In other words, our setup needs to be easily shared and maintainable.
Another important aspect to consider when deciding which database to run our integration tests against is the amount of work involved in getting a fresh development environment set up and the tests running. Ideally, everything should work out of the box with no software to install manually. In practice, it’s often acceptable to be required to install software such as a database server as long as the installation doesn’t require too much configuration.
The problem with non-default configurations is that it suddenly becomes necessary to have developers maintain a custom configuration file somewhere, specifying, for example, which port the database server is listening on and what username and password to use for connecting to it. A much better setup is one where all configuration information is maintained in the configuration management system, versioned along with revisions of the software itself, and doesn’t require a new developer to perform any manual work in order to get going.
The ultimate ease of maintenance is reached when the database server is downloaded from version control and installed by automated build scripts, if necessary. Such a scenario is possible with many embedded or otherwise extremely lightweight database servers such as HSQLDB or McKoi.[6] We’ll soon see how HSQLDB can be embedded into our test suite, but before we go there let’s talk about the third factor that might affect our selection of the database for our integration tests.
Having a database server for our integration tests to run against successfully is one thing. It’s another thing to find out what went wrong when a test fails. Depending on the database server, the JDBC driver, and our persistence framework, the reason for a given failure might be blatantly obvious from the error message, or it might require an elaborate debugging session. That is especially true when there’s no exception telling us that we’ve tried to use invalid syntax or that there’s “no such column” in the table we attempted to query. Analyzing our way down to the root cause of a discrepancy between what a query returns and what we expected it to return can be extremely taxing unless we can easily browse the contents of the database.
Having a local installation of a standalone database server listening for incoming connections on a network interface is probably the easiest of our options when it comes to access. This is because we can choose to use pretty much any graphical or command-line tool for browsing the database, executing SQL statements against it, and so forth. Embedded databases fare worse in this regard, because we typically need to access them through the Java API in-process rather than be able to connect remotely with our favorite graphical database browser. Then there’s the middle ground of an embedded database to which the application still connects over a network connection.
My personal preference is toward using a pure embedded database like HSQLDB because of its ease of integration with the test suite and zero installation. There are some downsides to using HSQLDB in embedded mode as we’ve just learned, of course, but these can be mitigated to a large degree by building simple tools for dumping the contents of the database to an external file, for example. Furthermore, there’s hardly anything faster than HSQLDB for integration-testing data-access objects written in Java.
We’re now going to look more closely at integrating HSQLDB into our integration test suite and test-drive a bit of data-access code. It’s still not rocket science, though, and we’ll proceed in small steps as usual.
In this section, we’ll add more functionality into the Hibernate-based PersonDao implementation we started developing earlier in this chapter. The ultimate purpose in this is to see first-hand what kind of infrastructure we might have for integration testing and what kind of obstacles we need to overcome. Even though we’re using Hibernate here as our persistence framework, after we’re through with our examples, you’ll be well-equipped to establish a similar setup for your specific technology stack.
We’ll begin by writing a simple integration test for persisting a transient object into the database. Once we’ve sketched an outline for the test, we’ll look into using an in-memory database and learn how to create the database schema for a test database built from scratch. After we’ve gotten that test implemented and running green, we’ll move on to discuss how we can keep our test data intact from one test to another using transactional fixtures.
That’s a lot to cover, so let’s get started!
We’re about to write our first integration test for a DAO dealing with Person objects. Our persistence architecture is Hibernate, but the whole HibernatePersonDao class we now want to test-drive doesn’t exist yet. The first thing we want the DAO to be able to do is persist a Person object. Persisting an object takes us straight into the beef with integration tests because it’s a simple piece of functionality and because it’s a write operation. Write operations are generally trickier than read operations and, thus, bring up interesting issues regarding how our tests maintain the database in a known state.
Don’t overdo it
Integration tests that hit a database (or a web service, for example) are a must-have. With that said, it’s worth noting that it is generally not necessary to test all persistence-related operations end-to-end with a database if most of them are similar. In other words, try to write integration tests against a database only for a representative subset of all read and write operations (or for learning purposes). The rest of the cases can then be covered with unit tests using test doubles, which generally cuts down the overall test run times significantly.
This is our starting point. What do we do first? How about programming by intention a little and writing a little test that verifies that a Person object we persist using the DAO can be found from the database afterward? Sounds good to me, of course, so let’s do it.
Wanting our DAO to be able to persist Person objects, we proceed to writing the outline of a test, shown in listing 6.14.
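A sketch of that outline follows, assuming getters on Person, a value-based equals, and JUnit 4:

public class HibernatePersonDaoIntegrationTest {

    @Test
    public void persistedPersonCanBeFoundById() throws Exception {
        SessionFactory sessionFactory = getSessionFactory();
        HibernatePersonDao dao = new HibernatePersonDao();
        dao.setSessionFactory(sessionFactory);

        Person person = new Person("John", "Doe");
        dao.save(person);
        assertNotNull("the saved object should have received an id", person.getId());

        // use a separate Session for verifying what actually hit the database
        Session anotherSession = sessionFactory.openSession();
        try {
            Person found = (Person) anotherSession.get(Person.class, person.getId());
            assertEquals(person, found);
        } finally {
            anotherSession.close();
        }
    }

    private SessionFactory getSessionFactory() throws Exception {
        // programming by intention: implemented in listing 6.15
        throw new UnsupportedOperationException("not implemented yet");
    }
}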
We first create an instance of the HibernatePersonDao class and configure it with a Hibernate SessionFactory that knows of the Person class. We’ve omitted the implementation of the getSessionFactory method so far—we’re just programming by intention for now.
Next we proceed to the “act” portion of our test in listing 6.14. We create a new, transient Person object and ask the HibernatePersonDao to save it. We then verify that the object received an ID and that an identical Person object can be found from the database with that specific ID.
Beware of reusing a Session in tests
Notice how in listing 6.14 we obtain a new Session for verifying that the given Person object was indeed persisted. We could’ve just asked the SessionFactory for the current Session instead of creating a new one, but using an entirely separate Session is a more reliable technique. Using the same Session can sometimes hide a nasty bug when there is transactional behavior involved because all changes are visible within a given Session, regardless of whether or not those changes have been committed as part of the transaction.
That seems like a sufficient test for verifying that the DAO successfully persists the object through the Hibernate API into the database. Now, let’s look at how we could implement the missing getSessionFactory method.
The problem we’re facing is that we need to have two different Hibernate configurations—one for production and one for integration testing. And we need to be able to swap in the testing configuration transparently and, preferably, without duplicating data unnecessarily between the two configurations.
One solution that I’ve been using is to maintain the production configuration in the standard hibernate.properties file in the class path and the differing properties in a testing-only configuration file named hibernate.test.properties. The trick is to have the testing configuration override the production configuration where there is overlap. This helps us minimize the risk of finding a configuration issue only after deploying the application to a testing environment—we’ll catch most of our Hibernate-related integration problems with data access already in the developer’s workspace.
Now, let’s look at listing 6.15 to see how we can implement the missing getSessionFactory method from listing 6.14 and realize the approach I’ve described.

In order to obtain a SessionFactory, we need a Configuration object that knows how to create one. In listing 6.15, we first load a Configuration object based on the default production configuration from the hibernate.properties file sitting somewhere in our class path. We then pass the production configuration object to another method that loads the test configuration on top of the production configuration, overriding any preexisting properties.
That’s quite a bit of code just to get Hibernate to connect to our test database, but the good news is that we’ll only need to write this code once. Speaking of connecting to the database, we should probably specify the test configuration for Hibernate at this point so that we’re not letting our integration tests hit a slow remote database or (gasp) the production database.
The next question is, if our test configuration file should override database connection-related properties, what exactly are those properties? In our case, we want the integration tests to run against an in-memory HSQLDB database. Assuming that HSQLDB isn’t also our production database, that means we need to override at least the SQL dialect and a number of properties prefixed with hibernate.connection. Listing 6.16 lists a sample properties file that does the trick.
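A test configuration along these lines does the trick (the connection values follow HSQLDB’s in-memory defaults: user sa with an empty password):

hibernate.dialect=org.hibernate.dialect.HSQLDialect
hibernate.connection.driver_class=org.hsqldb.jdbcDriver
hibernate.connection.url=jdbc:hsqldb:mem:testdb
hibernate.connection.username=sa
hibernate.connection.password=
hibernate.show_sql=true
hibernate.hbm2ddl.auto=create-drop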
The configurations in listing 6.16 aren’t much different than for a production setup—we just need to tell Hibernate which product-specific dialect of SQL to generate, where the database is located (the JDBC URL), which JDBC driver to use for connecting to the database, and what user account to use. The last configuration, hibernate.show_sql, isn’t required but is useful for testing purposes. By enabling the show_sql feature, Hibernate will print out the generated SQL statements as it executes them, which can be extremely helpful in figuring out why things don’t work the way we expect.
The hibernate.connection.url property’s value, jdbc:hsqldb:mem:testdb, is basically saying that we want to connect to an in-memory HSQLDB database named “testdb.” If such a database doesn’t exist already in the running JVM, HSQLDB creates one. Using an in-memory database also means that, because we’re creating the database from scratch, there are no tables to begin with. Somehow, we need to initialize our database schema when the database is created.
Finally, with the hibernate.hbm2ddl.auto value of create-drop, we’re instructing Hibernate to automatically create and drop the database schema when the SessionFactory is created and closed, respectively. That’s not the only way to go about creating the database schema, though. Let’s see what our options are.
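Putting these pieces together, a hibernate.test.properties file along these lines would do the trick; the dialect, driver, and account values below are typical HSQLDB settings and may need adjusting for your setup.

# Test-only overrides for an in-memory HSQLDB database
hibernate.dialect=org.hibernate.dialect.HSQLDialect
hibernate.connection.driver_class=org.hsqldb.jdbcDriver
hibernate.connection.url=jdbc:hsqldb:mem:testdb
hibernate.connection.username=sa
hibernate.connection.password=
hibernate.show_sql=true
hibernate.hbm2ddl.auto=create-drop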
The database schema is part of the code base. With this in mind, and considering that the database schema should be versioned along with the rest of the system, we quickly come to the conclusion that we need to have a single representation of the schema in a file (or files) that’s under version control.
The traditional solution has been to maintain the DDL (data definition language) statements used for creating the schema in an SQL script file (or a set of SQL scripts). This has many benefits: it's easy to put under version control, and it uses SQL statements that both the database administrator and the developers already know.
The downside with external SQL scripts is two-fold. First, we need some way to execute such scripts from Java code in order to set up the schema for our integration tests. Second, if we’d like to use a different database for integration tests than in production, the DDL statements must also be compatible between the two products. Otherwise, we’ll have to maintain two schema-definition files, and that’s just asking for trouble.
The first of these issues can be tackled easily by hacking together a utility that can parse SQL statements from a file and feed them to the JDBC API. The second, however, is trickier. Unless you happen to be using Hibernate, that is.
Fortunately for those of us who are using Hibernate, it offers a way to automatically export the schema to the configured database when the SessionFactory is created. Hibernate accomplishes this based on the mapping documents. The good news is that the only thing required from the developer is to add the hibernate.hbm2ddl.auto=create-drop property to our integration test configuration. The not-so-good thing about it is that the schema derived from mapping files is typically not identical to what we'd create by hand.
For example, Hibernate by default might create a VARCHAR(255) column for a field of type java.lang.String, whereas we might prefer a data type that isn't limited to 255 characters. This has caused more than a few gray hairs in the past: some database products silently crop the incoming data when it doesn't fit into the allocated column length. The defaults can be overridden in the mapping documents by defining the sql-type attribute for each column whose type we want to declare explicitly. Whether this is a problem for you depends on whether your production database and your test database both support the same data type.
Before we continue with our integration test cycle by implementing the missing production code, there’s one more thing about the SQL scripts I feel you should know. I call it incremental DDL scripts.
If you’ve ever been elbow-deep in grease on a production deployment gone wrong, you’ve probably witnessed a database rollback. It’s relatively simple to roll back application code—just take the current stuff out and put something else back in. It’s not that simple with the database, however, because the database is not just the schema. It’s also the data stored in that schema, and it’s that data that makes things more difficult—we can’t just throw away all of the data and create the schema again from scratch. This is where incremental DDL scripts come into play.
The fundamental idea of incremental DDL scripts is that every big schema can be created in small steps, adding one column or one table at a time—and that these little steps are often reversible! If we define the database schema as an ordered sequence of small DDL scripts “patching” the current version of the schema in a database, we can easily re-create an arbitrary version of the schema, and we can revert from a newer version back to an older version of the schema without losing all of our data.
Probably the first public incremental DDL script implementation I’ve seen is migrations in the Ruby on Rails web framework.[7] This is a Java-oriented book, however, so let’s not talk about Rails migrations. The closest thing to the incremental DDL scripts idea I’ve seen in the Java world is an open source project named dbdeploy.[8]
The dbdeploy tool is a database change-management tool that generates an SQL script from a set of smaller SQL scripts. The small patch files are sorted by their numerical prefix that identifies the patch file’s version. For instance, we might have a directory structure of delta scripts such as that shown in figure 6.6.
Figure 6.6. Example directory structure with a set of delta change scripts for creating the database schema incrementally. The four scripts in the directory src/db/deltas will be applied in order according to the numerical prefix.

In our example in figure 6.6, we have four delta scripts that dbdeploy will apply in the order specified by the numerical prefix. Each of the four delta files would then contain plain ol’ SQL statements and, optionally, an undo block that dbdeploy applies when rolling back to an older version of the schema. For instance, the delta file named 00002_create_order_items_table.sql might look like listing 6.17.
The dbdeploy tool would read the SQL statements in listing 6.17 until it encounters a row containing the magic string --//@UNDO, which effectively says, “that’s all there is for updating the schema—the rest of the file is rollback statements.”
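For illustration, such a delta file might contain something like the following; the table layout is invented, while the --//@UNDO marker works exactly as just described.

CREATE TABLE order_items (
    id INTEGER NOT NULL PRIMARY KEY,
    order_id INTEGER NOT NULL,
    product_code VARCHAR(32) NOT NULL,
    quantity INTEGER NOT NULL
);
--//@UNDO
DROP TABLE order_items;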
Now, dbdeploy can be pointed to a database and a set of delta scripts, and it will generate migration scripts for that specific database. But how does dbdeploy know which delta files have already been applied? For that purpose, we need to maintain a dedicated table in the database that contains information about the deltas applied to that specific database. We need to create the table manually, but the dbdeploy distribution contains the necessary SQL scripts for doing that, so we just need to execute the appropriate script once when creating a database.
An incremental DDL-script solution such as dbdeploy is definitely worth considering. It is yet another tool to learn, but it also gives a number of benefits that are hard to argue against. For this chapter's purposes, however (and especially for our integration test example, which we still haven't finished!), we'll go with the Hibernate solution, letting it drop and create our database schema behind the scenes. Now, let's pick up our integration test again and finally make it pass.
Let’s recap what our test back in listing 6.14 was expecting of us. We wanted the HibernatePersonDao class, configured with a proper SessionFactory object, to save the given Person object so that it is indeed persisted into the underlying database behind the SessionFactory. Listing 6.18 presents a sufficient implementation in all of its glory.
All that our DAO implementation needs to do to make the test pass is store the SessionFactory it was given in a private field and use it for the persistence operation. In the save method, we
obtain an existing Session from the SessionFactory (or let it create a new one if there’s no existing session),
begin a transaction, invoke save(person) on the Session object, and commit the transaction.
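A sketch of such an implementation, assuming the DAO receives its SessionFactory through a setter and that a current-session lookup (for example the thread-bound session context) has been configured, could look like this:

public class HibernatePersonDao {

    private SessionFactory sessionFactory;

    public void setSessionFactory(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void save(Person person) {
        // getCurrentSession() returns the session bound to the current context,
        // creating a new one if no session exists yet
        Session session = sessionFactory.getCurrentSession();
        Transaction tx = session.beginTransaction();
        session.save(person);
        tx.commit();
    }
}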
Speaking of transactions, let’s look at a concept called transactional fixtures and why we should consider using such things.
Read operations are a piece of cake to test. They're straightforward: we have some data in the database, and the read operation returns objects that represent that data (or a subset of it). Write operations are trickier. Not only do we need to check that the expected change happened, but that change also leaves the database in a different state for the next test. And if we don't make sure that the database's state is retained from one test to another, the read operations quickly become part of this non-trivial problem as well. So, how do we keep the state of the database from changing?
We could, of course, implement a set of undo operations into our teardown methods for each test, essentially undoing all changes made to the integration test database during the execution of the previous test method. The complexity of such undo-logic would soon explode in our faces, however, so we need another solution. Possibly our best option is something we call a transactional fixture.
A transactional fixture is a fixture—the state upon which our test methods operate—that gets rolled back into its original state automatically at teardown. As the name implies, we use the persistence framework’s and the database’s transactional capabilities for implementing this. In other words, we literally begin a database transaction in our setup method and perform a rollback in our teardown method. Listing 6.19 shows how this can be done with our Hibernate example.
We begin each test execution by starting a Transaction for the Session inside the setUp method. From this point onward, all database operations through the associated Session object are part of the database transaction and will be rolled back when we later
invoke rollback on the Transaction object in our tearDown method.
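In code, a transactional fixture along those lines boils down to very little; the field names and the way the Session is obtained are assumptions for the sake of illustration.

private Session session;
private Transaction transaction;

protected void setUp() throws Exception {
    session = getSessionFactory().openSession();
    // Everything done through this Session from now on happens inside
    // a transaction we never intend to commit
    transaction = session.beginTransaction();
}

protected void tearDown() throws Exception {
    transaction.rollback();
    session.close();
}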
The fundamental advantage of using transactional fixtures over having our test suite constantly populate the database into a known state before each and every test is speed. Importing data from an external file takes time, and even with small data sets the cost adds up as the number of tests grows. With transactional fixtures, we can choose to populate our test database just once before running our tests, knowing that our tests will always leave the database exactly as they found it.
The only significant disadvantage of transactional fixtures is that they cannot be used for testing transactional behavior. That is, if our test is supposed to test that a certain piece of code commits the ongoing transaction, it’s too late for us to cry for a rollback in teardown.
If transactional fixtures help us retain the state of the database throughout our test run, how can we populate the known state into the database in the first place? Let’s talk about populating test data into a database for our integration tests.
Let’s say we’re writing an integration test for a read operation such as findByLastname. Because we’re no longer faking the results coming from the persistence framework, we need some data in the database. A somewhat classic but brutal way of accomplishing this has been to execute SQL scripts with a bunch of INSERT statements. Although technically a feasible solution, SQL scripts are hardly the most user-friendly and far from the most portable way of populating our test database.
We’d very much like some alternatives, and, luckily, we have at least a couple: Hibernate, if applicable, and an open source framework named DbUnit. Let’s take a closer look at these two.
If we happen to be using an object-relational mapping framework like Hibernate, why not take advantage of the capability of creating object graphs in plain Java and of having them automatically persisted into the database? In many aspects, representing the state of the database in the same language—and same location—as the rest of our test assets makes perfect sense. Let’s look at an example in listing 6.20.
The test for the findByLastname method on our familiar HibernatePersonDao class starts by creating a bunch of Person objects and then
using the Hibernate API for persisting them into the database, flushing the session to make sure the objects are saved right away. Then, it uses the DAO to find the subset of those Person objects—now records in a database table—that have the last name Smith.
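A test of that shape might read as follows, assuming fixture fields named session and dao on the surrounding test class; this is a sketch consistent with the description, not the book's listing 6.20.

public void testFindByLastname() throws Exception {
    Person alice = new Person("Alice", "Smith");
    Person billy = new Person("Billy", "Smith");
    Person clark = new Person("Clark", "Jones");
    session.save(alice);
    session.save(billy);
    session.save(clark);
    session.flush(); // make sure the INSERTs hit the database right away

    List smiths = dao.findByLastname("Smith");
    assertEquals(2, smiths.size());
    assertTrue(smiths.contains(alice));
    assertTrue(smiths.contains(billy));
}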
Notice how the test data in this test sits right next to the production code we're invoking. Contrast this with a test that simply calls the finder method and expects a list containing Alice and Billy Smith. Why those two? Because an external SQL file associated with that test class happens to insert exactly those two Smiths into the database before the test runs. When test data and test logic are separated like that, it's much harder to figure out what's going on when a test fails.
Note
Although we’re populating the database inside a test method in listing 6.20, it is possible to use the same Hibernate API for populating the database just once in a suite or class-level setup.
Having said all that, there are also some downsides to using the API of an object-relational mapping tool to populate our test data. For instance, by relying solely on the object-relational mappings and the persistence framework, we might miss problems that would be blatantly obvious if we populated the data using raw SQL or some other lower-level format. Also, not all projects use ORM frameworks like Hibernate, so another alternative to plain SQL scripts is still warmly welcome. Yet another potential downside is that, when test-driving, we might not yet have a suitable data-access object to use for populating the data we need!
For a couple of years, and for many projects, that other alternative has been a tool named DbUnit.[9]
DbUnit is an all-around workhorse for testing database code with externally defined test data. It can be used for populating the contents of a database from data files in various formats—most notably two alternative forms of XML—as well as for exporting the contents of a database into a data file. Furthermore, DbUnit lets us perform comparisons between two data sets, typically an external data file and the current contents of a database. We’re not going too deep into using DbUnit (the project’s website offers sufficient documentation and examples for getting started), but let’s take a brief glance at how populating test data from external files works.
DbUnit provides a base class named DatabaseTestCase for us to extend. This base class makes sure to populate the appointed database with the appointed data set in the setup method. The developer’s responsibility is to provide the base class with the database connection and the data set through a pair of abstract methods.
In practice, though, it makes sense to create an abstract base class of our own for our DbUnit tests, isolating the database connection and data-loading details into a single place rather than dealing with them everywhere. With that in mind, listing 6.21 illustrates what such a class might look like.

The two responsibilities of the developer have been fulfilled by overriding two abstract methods from DatabaseTestCase. In listing 6.21, we first create a JDBC connection using whatever means necessary (using java.sql.DriverManager, for example) and
wrap the connection object with a DbUnit-specific DatabaseConnection object. The second responsibility, appointing the initial data set for DbUnit to populate to our database, is implemented by
loading an XML file from the class path based on the test class’s name and
parsing the file as a flat XML data set.
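Pulling those responsibilities together, such a base class might look roughly like the following; the HSQLDB connection details and the file-naming convention are assumptions based on the description above.

public abstract class DbUnitIntegrationTestCase extends DatabaseTestCase {

    protected IDatabaseConnection getConnection() throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        Connection connection =
                DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
        return new DatabaseConnection(connection);
    }

    protected IDataSet getDataSet() throws Exception {
        // By convention, each test class gets its own initial data file,
        // e.g. PersonDaoDbUnitIntegrationTest.initial.xml
        String resource = getClass().getSimpleName() + ".initial.xml";
        return new FlatXmlDataSet(getClass().getResourceAsStream(resource));
    }
}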
With this base class, we can extend from DbUnitIntegrationTestCase, create an XML file to describe the data set our tests should begin with, and start writing the Java code for executing our data-access code against the same database connection that we gave to DbUnit. Listing 6.22 shows an example of a concrete test class that extends our abstract base class.
Listing 6.22. Concrete test class extending the base class from listing 6.21

After letting DbUnit populate the database from an XML file named after the test class, we
create a List of Person objects named expectedList, then
configure our data-access code to connect to the same database our base class appointed to DbUnit, and
assert that the List returned by the findAll method matches our expectations. We’re basically assuming (correctly) that DbUnit will have populated “Al Freeman”, “Bill Brewster”, and “Juan Alvarez” into the database before our test method runs. The assumption is correct as long as we’ve placed a file named PersonDaoDbUnitIntegrationTest.initial.xml somewhere in our class path where the class loader can find it. How about peeking at exactly what the data file should contain?
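A concrete test class of that shape could be sketched roughly as follows; the createPersonDao helper is purely hypothetical, standing in for whatever wiring points the DAO at the same test database, and a value-based equals on Person is assumed.

public class PersonDaoDbUnitIntegrationTest extends DbUnitIntegrationTestCase {

    public void testFindAll() throws Exception {
        List expectedList = new ArrayList();
        expectedList.add(new Person("Al", "Freeman"));
        expectedList.add(new Person("Bill", "Brewster"));
        expectedList.add(new Person("Juan", "Alvarez"));

        PersonDao dao = createPersonDao(); // hypothetical helper: connects the DAO
                                           // to the same database DbUnit populated
        assertEquals(expectedList, dao.findAll());
    }
}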
Listing 6.23 illustrates how we would tell DbUnit to populate Al, Bill, and Juan into a table named “employees” and their salaries into a table named “salaries” using DbUnit’s so-called “flat” XML format.
The flat XML format works by encoding each row to be populated into a single XML element under the dataset root element. The element’s name identifies the name of the table into which DbUnit should insert the record, and each attribute for the element represents a column name-value pair. DbUnit is smart enough to figure out basic type conversions based on the data type information provided by the JDBC connection. For example, the attribute values for the start_date column are nicely parsed into java.sql.Date objects if the column’s data type has been defined as DATE.
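In the flat XML format, such a data file might look like this; the column names and values are illustrative.

<?xml version="1.0" encoding="UTF-8"?>
<dataset>
    <employees id="1" first_name="Al" last_name="Freeman" start_date="2006-01-15"/>
    <employees id="2" first_name="Bill" last_name="Brewster" start_date="2006-03-01"/>
    <employees id="3" first_name="Juan" last_name="Alvarez" start_date="2006-05-20"/>
    <salaries employee_id="1" annual_salary="50000"/>
    <salaries employee_id="2" annual_salary="55000"/>
    <salaries employee_id="3" annual_salary="60000"/>
</dataset>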
Defining our test data in an XML format such as that shown in listing 6.23 translates to easier maintenance, as we don’t need to change our test data when switching our test database, for example. Furthermore, the XML format is easier to read than SQL statements. The main downside to this approach of defining our test data in external files the way we’ve seen in listings 6.22 and 6.23 is that we’ve duplicated our test data in two places. One way to get rid of this kind of annoyance when testing write operations is to use DbUnit’s provided data set assertions, which we’ll look at next.
We can make effective use of DbUnit’s ability to export the contents of a database as an IDataSet object in the context of comparing two data sets with each other. Compared to going back and forth between an XML document and Java source code, it is somewhat more natural to express the expected contents of the database in the same format we use to define the initial data.
There are three steps to accomplishing this—the first of which is to export the current contents of the database. That’s as simple as taking the same connection object we’ve given DbUnit for populating initial data and asking it to create an IDataSet for us:
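The following one-liner sketches that step, reusing the getConnection method from our DbUnit base class.

// Export the database's current contents as a data set
IDataSet databaseContents = getConnection().createDataSet();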
Next, we need to get another IDataSet to compare the data set with. We can employ the same code we added earlier to our base class, after a small refactoring:
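A sketch of the refactored methods, keeping the file-naming convention from before, might read as follows.

protected IDataSet getDataSet() throws Exception {
    return getDataSet("initial");
}

protected IDataSet getDataSet(String logicalName) throws Exception {
    // e.g. PersonDaoDbUnitIntegrationTest.expected.xml for logicalName "expected"
    String resource = getClass().getSimpleName() + "." + logicalName + ".xml";
    return new FlatXmlDataSet(getClass().getResourceAsStream(resource));
}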
By extracting the logical name of the data-set file into a method parameter, we can use the overloaded getDataSet(String) method in our test methods for reading an arbitrary data file from the disk and parse it into an IDataSet.
Now that we know how to get two data set objects to compare—the actual and the expected—we just need to compare them somehow. DbUnit offers a helping hand in the form of the Assertion class, which defines two overloaded assertion methods:
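Slightly simplified, and with the exception declarations omitted, their signatures are along these lines.

public static void assertEquals(IDataSet expectedDataSet, IDataSet actualDataSet);
public static void assertEquals(ITable expectedTable, ITable actualTable);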
The former compares two data sets; the latter compares two individual tables, which is useful when we're only interested in the contents of a specific table. By comparing a single table, our "expected" data-set file only needs to describe the part of the database contents we actually care about. A perhaps more common way of narrowing the comparison to what's relevant, though, is to ask DbUnit to export only certain tables rather than every table in the database.
With these two assertion methods at our disposal, and with the code snippets for exporting the current contents of the database and for reading a data set from the file system, we could now proceed to write a test such as that shown in listing 6.24.
After configuring and invoking some data access code, we read the expected contents of tables “employee” and “salary” from a file,
ask DbUnit to extract the current contents of tables “employee” and “salary” from the database, and
compare the two data sets to find out any discrepancies. Pretty simple, eh?
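Put together, the body of such a test might look roughly like this; the test method name is made up, and the code under test is elided.

public void testSavePersistsEmployeeAndSalary() throws Exception {
    // ...configure and invoke the data-access code under test here...

    IDataSet expected = getDataSet("expected");
    IDataSet actual = getConnection().createDataSet(
            new String[] { "employee", "salary" });
    Assertion.assertEquals(expected, actual);
}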
That’s about all we need to know about DbUnit to get started. It’s a simple tool that can make our lives a lot easier. It is, of course, up to you to decide whether to use DbUnit or not. In fact, we’ve now seen a lot of unit-testing and integration-testing techniques for data-access code, but we haven’t yet discussed how exactly these two mix in our development process. The main question is whether we should drive the development of our data-access code with unit or integration tests.
In a traditional development process, we proceed through stages of testing from unit-level tests to integration tests to system-level tests. But test-driven development is not a traditional development process. To start with, we don’t have anything to test until after we’ve written our first test. From the perspective of getting production code implemented, there’s little difference between driving with unit tests using test doubles and driving with integration tests that hit a database. Having said that, there are differences in what we can do with each type of test and what kind of a rhythm our development follows.
Let's first see what our TDD cycle would look like if we were to use integration tests to drive development; then we'll discuss how integration tests and unit tests mesh in our overall process.
Perhaps you’ve noticed that integration tests require some plumbing before we can get down to business. Building enough plumbing to get the first test written can take time—especially when using a specific tool or technology for the first time—but once the basic plumbing is in place, its evolution follows the same path as the rest of our test code through small refactorings and removing duplication by moving shared functionality toward utility classes or higher up in the inheritance tree.
Integration tests with plain Java don’t differ from regular unit test–based TDD, except that the tests are somewhat slower to run. Integration tests making use of DbUnit, however, are more interesting. What happens once we have the necessary plumbing in place for DbUnit is akin to the following sequence:
1. Write a test method.
2. Run the test, getting a complaint about missing data file(s).
3. Edit the data file(s).
4. Run the test, getting a complaint about missing table, column, and so on (if applicable).
5. Implement a change to the database schema (if applicable).
6. Run the test, getting the expected complaint about missing functionality.
7. Implement functionality.
8. Run the test, getting a green bar.
This isn’t radically different from regular unit test–based TDD, but it includes steps that imply a longer round-trip between two green states. In other words, our cycle becomes longer. Also worth noting is that, because we might need to make changes to the database schema, we might need to do some code as well as database refactorings[10] before getting to a form where it’s straightforward to introduce the new functionality.
10 Scott Ambler’s and Pramodkumar Sadalage’s Refactoring Databases (Addison-Wesley, 2006) is the prime resource on the topic of evolving a database schema in a disciplined manner.
In my experience, integration tests can be used to test-drive data-access code effectively as long as the toolset is in place—a persistence framework that does most of the plumbing, a test harness that facilitates adding new tests easily, and a database that’s fast to work with. That’s not to say that we should start running around with just one hammer in hand, though.
As you know, integration tests can yield valuable information about our data-access code working properly with a database—something that we verify in unit tests only indirectly, relying on our knowledge of how the persistence framework’s query language works, how we should use the API, and so forth.
One of the biggest downsides with integration tests is that the tests are slower to execute. The slowness starts to show with only a handful of tests; and with the full integration test suite taking up to 10 minutes to execute, the temptation to commit changes before knowing their full effect on the health of the code base increases.
Another downside of integration tests as a driver for development is that they’re incapable of simulating certain situations, and writing a unit test using test doubles is often easier than setting up the test data for an integration test. For instance, it’s pretty darn difficult to make a database connection fail temporarily just at the right time, although it’s trivial to simulate the symptoms (an exception) with test doubles. Further, it’s generally easier to wire a test double to return a set of domain objects than it is to populate a database with equivalent data.
With these trade-offs in mind, we don’t want to make an either-or decision; instead, we want to find some kind of golden middle ground that mixes the best of both worlds. Although it makes sense to have a degree of consistency with regard to the kinds of tests our team writes for data-access code, it also makes sense to keep both tools in our bag and be able to pick the one most suitable for the task at hand.
Before summing up what we’ve gone through in this chapter so far, let’s take a quick look at a domain of data access we haven’t yet touched—the good ol’ file system.
Remember Michael Feathers’ definition of a unit test, which we saw earlier in section 6.2? Among other things, the definition says that a unit test does not touch the file system; a unit test can run in parallel with other unit tests; and a unit test doesn’t require you to do special things to your environment to run it. The main reasons for having these rules for unit tests are speed and maintainability. Now, keeping the goal of fast test execution and maintainable test code in mind, what should we do with file-system access?
In short, we should avoid file-system access as far as possible, instead abstracting away files with Java's streams or the java.io.Writer and java.io.Reader abstractions. After all, most of the time we're interested in the data inside the file, not the file itself, and streams are well suited for feeding data to our classes.
The question remains, then, how should we test that thin slice of our code base responsible for adapting the File API to our application’s needs? We’re going to answer that exact question in this section. Let’s start with a little story from a past project of mine.
I recently faced a requirement for downloading content updates in the background over the network from another server. The background download component was supposed to first download the new packages into a temporary file, apply some integrity checks, and then copy the downloaded package over the previous version of the package. This led to the introduction of a ContentStorage interface, which the ContentDownload class used for writing files and moving them around inside the system’s content directory structure.
By faking the ContentStorage interface in unit tests, we avoided file access and the trouble of configuring file paths for the real storage implementation and cleaning up afterward. Furthermore, we only needed to deal with the file system in a handful of focused tests for the real, disk-based ContentStorage implementation, which limited the number of slow tests performing file I/O and the need to clean up the file system after each test.
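The exact interface isn't important, but as an illustration it might have been something as narrow as this; the method names are invented for the sake of the example.

public interface ContentStorage {
    // Open a temporary file for the download in progress
    OutputStream openTemporaryFile(String name) throws IOException;

    // Copy a verified download over the previous version of the package
    void replacePackage(String temporaryName, String packageName) throws IOException;
}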
Realizing that file access can be made to support testing is the first step toward improved testability. In the next section, before wrapping up this chapter, we’ll look at a handful of practical tips for improving our file-access code’s testability.
Following good design principles generally leads to testable designs. There’s something about file access, however, that somehow seems to make us drop our design awareness when it comes to reading from or writing to the file system. For some reason, we Java developers have traditionally not given a second thought to, for example, using the java.io.File* API directly from pretty much everywhere in our code base. OK, maybe it’s just what I used to do, but I’d like to think that I wasn’t alone. In any case, what follows is a handful of design tips for simplifying our life with file access and, specifically, making the test-driving of file access-related functionality as easy as possible.
Java’s File API is good for the most part, but that’s no reason to scatter file access all around the code base. The problem is perhaps most visible when we need to add or change functionality in legacy code, which does direct file access. If you’re as lucky as me, you might need to create a whole directory structure in your test in order to successfully invoke the code you want to change. By encapsulating file access behind a custom interface and having the rest of the code base use that interface, we can usually get away with less setup code in our tests.
When we have a piece of code that needs to, say, perform a diff operation on two files, the code easily ends up taking in two java.io.File objects. At first that seems to make perfect sense: we are, after all, comparing two files. As it turns out, though, it doesn't, because the diff operation doesn't care about the files as such; it cares about their contents. In other words, the code needs to open a FileInputStream in order to read the files' contents, which in turn means that the File objects must refer to actual, existing files in the file system. That tends to require more setup code than a situation where we can feed the file contents directly from Java code. Passing around stream objects rather than file handles removes this problem nicely for the majority of cases.
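For example, a diff-like comparison can be written purely against Readers, letting a unit test feed in content with StringReader objects while production code hands over FileReader objects for real files. A minimal sketch of the idea:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class ContentDiff {

    // Compares two content streams line by line without knowing
    // (or caring) whether they originate from the file system
    public boolean identical(Reader left, Reader right) throws IOException {
        BufferedReader a = new BufferedReader(left);
        BufferedReader b = new BufferedReader(right);
        String lineA, lineB;
        do {
            lineA = a.readLine();
            lineB = b.readLine();
            if (lineA == null || lineB == null) {
                return lineA == null && lineB == null; // identical only if both ended
            }
        } while (lineA.equals(lineB));
        return false;
    }
}

A test can now exercise the comparison with new StringReader(...) arguments and never touch the disk.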
We’re better off passing around stream objects than file handles in the majority of—but not all—cases. The exception is situations where the code needs more than just the contents of the file: for example, information about the file’s location, size, and so forth. Even in these situations, however, we shouldn’t blindly fall back on passing around java.io.File objects, because the testability hindrances are still there.
Instead, it might make sense to introduce a custom file interface, which would be a façade for the java.io.File API, offering access both to metadata such as the file’s path as well as to the file’s contents. With this kind of an interface, code dealing with files becomes easier to test, because we can truly simulate files without messing around with temporary files and directories.
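Such a façade doesn't need to be big. A sketch of the idea, with an invented name and method set, might be as simple as this:

public interface StoredFile {
    String getPath();
    long getSizeInBytes();
    InputStream openContent() throws IOException;
}

The production implementation delegates to java.io.File and FileInputStream, whereas a test can implement the same interface on top of a String or a byte array.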
A good candidate for implementing a custom API for the file system is the Jakarta Commons VFS project.[11] VFS stands for virtual file system, and the project does just that: it allows us to swap in a purely virtual, in-memory file system for our tests while running with the regular, disk-based file system in production.
If we are facing a situation where we need to generate actual, physical files on the file system, it makes sense to use a dedicated temporary directory for all of our output files as well as for dynamically generated input files. By isolating all test files under a single directory, we simplify our test cleanup procedures into wiping a single directory.
Speaking of cleaning up, sometimes it might make sense to perform cleanup only before each test and not clean up generated files afterward. The reason for this is that sometimes the generated data might be non-trivial in size. With large data sets, it’s useful to be able to study discrepancies between the expected and actual data manually by using our chosen tools (a diff tool, for example) rather than have the failing test wipe out all of our evidence.
These are simple practices—not silver bullets—but they can make our lives a lot easier when we need to work with file-access-related functionality. But now, let’s sum up the chapter and move on to yet other interesting and challenging topics.
In this chapter, we took a dive into the world of test-driving data-access code. And we came out alive and well! It turns out that test-driving data-access code isn’t all that complicated after all. We tackled this on two fronts—using both unit tests and integration tests to drive our production code.
We started off by discussing how data-access code differs from general application or business logic and how data access spans several layers. From there, we moved on to refresh our knowledge of the DAO pattern and how it can make testing our code a far less taxing task than it could be.
Next, we rolled up our sleeves and went on to test-drive a simple DAO class using three different persistence technologies: the plain JDBC API, JdbcTemplate from the Spring Framework, and the Hibernate API. With each of these technologies, we saw the same piece of functionality build up and watched how the technologies facilitated test-driven development through their respective flavors of testability.
After learning how to test-drive data-access code with unit tests, we switched gears and grabbed a larger piece of the puzzle: writing integration tests for a DAO class against an in-memory HSQLDB database. We discussed issues to consider when picking a lightweight database for integration tests, including the importance of being able to put the full configuration under version control.
We learned to build the necessary plumbing for making integration tests a breeze to write, discussed ways of creating the database schema with SQL scripts as well as using the Hibernate API, and touched on the use of transactional fixtures for cleaning up after our tests. To complete our discussion of integration tests, we introduced DbUnit as a tool for populating a database and performing comparisons based on externally defined XML data files.
After seeing two different strategies for test-driving data access code, we concluded that they both have their sweet spots and should both be kept in our toolbox. We should use the best tool for the job.
To finish the chapter, we briefly discussed the nuances with file access and how to design our code to simplify dealing with the file system.
In the next chapter, we’ll move a notch closer to the underlying platform and set out to conquer more unpredictable genres of programming tasks with our test-driven method: time-dependent functionality and concurrent programming.