Chapter 13. Database programming with Groovy

This chapter covers

  • Using the low-level Groovy API for interacting with relational databases
  • Transactions, batching operations, and retrieving database metadata
  • DataSets for performing CRUD operations
  • Architecture and design of a data access layer
  • Groovy and NoSQL databases

As far as the laws of mathematics refer to reality, they are not certain, and as far as they are certain, they do not refer to reality.

Albert Einstein

Databases are stores of structured data. If your Groovy application needs persistent data, you most likely will need to talk to one. There are many different kinds, each with particular advantages, disadvantages, and characteristics; luckily Groovy makes it easy to talk to whichever kind you need. Perhaps the most well-known kind of database is the family of relational[1] databases, but other kinds include object-oriented, key–value stores, document-centric, and graph. We’ll concentrate on relational databases first and briefly cover some of the NoSQL[2] forms later in the chapter.

1 An Introduction to Database Systems, by C. J. Date (Addison-Wesley, 2003).

2 Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, by E. Redmond and J. R. Wilson (Pragmatic Programmers, 2012).

13.1. Groovy SQL: a better JDBC

Relational databases are data stores that are based on a relational model. It’s this model that makes them so powerful. Its mathematical foundation allows you to reason about the results of operations and lets database engines perform appropriate optimizations.

Database access is also highly standardized, allowing multiple applications to coordinate by sharing their data even if these applications are built with different technologies. The standard that incorporates the relational algebra is SQL.

Because using SQL and connecting to relational databases is such an important task, any programming language worth talking about provides a way of doing it. Scripting languages—notably PHP, Python, and Ruby—provide simple and immediate access, whereas Java comes with the JDBC API, which isn’t as simple but offers improved consistency across different database systems and better integration with Java objects.

Now comes Groovy. The Groovy database connectivity support (Groovy SQL for short) is plain JDBC with sugar from Groovy’s groovy.sql library package. It takes only a handful of classes (the main four being Sql, DataSet, GroovyResultSet, and GroovyRowResult) to make database work short and sweet. Figure 13.1 shows where Groovy SQL fits into the API stack.

Figure 13.1. Groovy SQL builds on plain JDBC
Note for Module Users

Groovy 2 introduced modularization (see section B.3 of appendix B) and groovy-sql is one of the available modules. If you’re using the groovy-all JAR file or one of the Groovy distributions, you’ll automatically have the groovy-sql module available to you. If you’re integrating directly with the core groovy.jar file, then you might need to @Grab the groovy-sql module (contained in a groovy-sql JAR file) or place the JAR file on your classpath. The listings for this chapter include the necessary module @Grab statements, but they’re commented out and not needed for most users.

Groovy SQL lifts JDBC to a level of user-friendliness that’s comparable to, and in some respects better than, that offered by other scripting languages. But it also plays nicely at the object level. JDBC is often used with database-related design patterns that evolved around it. In this chapter, you’ll see some of them in the form of data transfer objects (DTOs) and data access objects (DAOs). You’ll witness how Groovy SQL reduces the need for creating such extra classes, sometimes eliminating the extra work.

Database systems and SQL make a topic of their own, and many books have been written about them. You need this knowledge for our examples, but explaining it here would exceed the scope of this book.[3] In return for your understanding, we keep the SQL examples reasonably basic.

3 See An Introduction to Database Systems, by C. J. Date (Addison Wesley, 2003) for a good introduction.

For the remainder of this section, it’s assumed that you have some basic knowledge about SQL and how to work with relational databases in Java.

When you’ve finished this section, you’ll be able to work on your databases through Groovy for any kind of administration task, automated or ad hoc reporting, persisting your business objects, and leveraging the power of the relational data model—all in a simple yet organized manner.

13.1.1. Setting up for database access

It’s fairly obvious that you cannot do anything before you have a database system that you can use. Groovy has no database in its distribution. If you have a database system that comes with a JDBC driver, you can go with it. You might want to check if one was installed along with your JDK installation (many distributions bundle Java DB[4]). Otherwise, you’ll need to install one, where install can mean totally different things for different database products.

4 Java DB is Oracle’s supported distribution of the Apache Derby open source database. It is included in the JDK. See www.oracle.com/technetwork/java/javadb/.

Installing a database

The examples in this chapter should work with most vendor and open source databases that support JDBC. We used the popular HyperSQL Database (HSQLDB), which you can download from http://hsqldb.org. We used version 2.3.2 but any recent version should work. HSQLDB is an easily embeddable, Java-only database engine with a small footprint but is still fairly feature complete. Installing HSQLDB means putting the hsqldb.jar file on your classpath when executing this chapter’s examples, or using the appropriate @Grab statement as per the sample listings. See section 7.2.3 for details of how to add JAR files to the classpath. Remember that you can drop a JAR file into your <user.home>/.groovy/lib directory to have it on your classpath whenever you start any of Groovy’s built-in tools.

If you decide to use a different database system, follow its installation instructions. Typically, you’ll also have a JAR file that needs to be added to the classpath, because at least the implementation-specific driver or data source class needs to be found there. You’ll also need to adjust the URL to suit your selected database engine.

Note for Windows Users

The JdbcOdbcDriver is on the classpath by default because it ships with the JDK. It allows connections to database systems that implement the Open Database Connectivity (ODBC) standard over the so-called JDBC–ODBC bridge. Popular ODBC data sources are Microsoft Excel and Microsoft Access. This driver isn’t intended for production use, however. It’s an easy way to explore a database exposed by ODBC, but a dedicated JDBC driver is usually a more stable and better-performing long-term solution.

Database products also differ in the SQL they accept. Every system has its own dialect.[5] Because our examples use HSQLDB, the SQL that you’ll see in the examples is in HSQLDB dialect. Should you choose to use a different database, the examples should be very similar, but see the manual of your database product for possible deviations.

5 “The wonderful thing about standards is: there are so many to choose from.”—Prof. Andrew Tanenbaum.

Basic relational database operations are supported in Groovy according to the design guideline that simple tasks should be easy and advanced tasks should be possible. This section is solely about simple tasks. That means you can expect an easy introduction into the topic. We’ll go through:

  • Connecting to the database
  • Creating the database schema
  • Working with sample data

Working with data is done through four operations: create, read, update, and delete, together called CRUD operations.

At the end of this section, you’ll be able to do standard database work with Groovy. The knowledge in this section will be sufficient to write whole applications that utilize databases. The remainder of the chapter will expand your design choices to more elaborate solutions.

First contact

Regardless of your technology, you must provide four pieces of information to access a database:

  • The database URL
  • Username
  • Password
  • Driver class name (which can sometimes be derived automatically)

The database URL needs a short explanation. A database URL (a JDBC URL in our context) is a platform-independent way of addressing a database. It always starts with jdbc: followed by a vendor-specific subprotocol. You need to refer to your database system’s documentation for possible subprotocols.

HSQLDB supports several subprotocols, and the main ones are listed in table 13.1.[6]

6 But see the relevant documentation (http://hsqldb.org/doc/2.0/guide/dbproperties-chapt.html) for reading from read-only resource databases, turning on security, or a whole host of other database properties that can be set as part of the URL.

Table 13.1. HSQLDB subprotocols

URL pattern                       Purpose
jdbc:hsqldb:hsql://server/dbname  Connects to a HSQLDB server process; use when multiple clients or processes need to share the database
jdbc:hsqldb:file:/path/dbname     Connects to a single-client HSQLDB instance with file-based persistence; multiple files starting with dbname will be created if the database doesn’t yet exist
jdbc:hsqldb:mem:dbname            Connects to a nonpersistent in-memory database

When using the HSQLDB in-memory database, for example, our database URL will be jdbc:hsqldb:mem:GinA. Changing to the server or file-based version is as easy as changing the URL accordingly.

We’ll use standard username/password settings: sa for sysadmin and an empty password string. It goes without saying that this is acceptable only for experimental purposes.

The driver class name will be org.hsqldb.jdbcDriver. If you use a different vendor, this name will also be different.

Where do you put this information? In Groovy, you access the database through an object of type groovy.sql.Sql.[7] There are a few ways to get such an object. The most common is through Sql’s newInstance factory method, passing the preceding information as parameters. The following listing shows typical use.

7 If you think this naming is questionable, we wouldn’t disagree. If it helps, you can think of it as a database API that for the most part exposes SQL statements to the programmer.

Listing 13.1. Connecting to a database
import groovy.sql.Sql

def url = 'jdbc:hsqldb:mem:GinA'
def user = 'sa'
def password = ''
def driver = 'org.hsqldb.jdbcDriver'
def sql = Sql.newInstance(url, user, password, driver)

// use 'sql' instance ...

sql.close()

Congratulations; you’ve successfully connected to the database!

When you look into Sql’s API documentation, you’ll find more variants of the newInstance factory method. It might seem like there are many variants to choose from, but you’ll quickly find that there are typically only one or two that are appropriate for any given scenario. We’ll cover some of the common scenarios to make it a bit easier for you.

Listing 13.1 assumes that the JAR file containing the required JDBC driver is on the classpath as we alluded to earlier in this section. That typically involves declaring or defining a dependency to the required JAR file in your IDE or build system. To support the creation of self-contained scripts, Groovy’s @Grab annotation can be used. For our example we could tweak the earlier listing as follows:

@Grab('org.hsqldb:hsqldb:2.3.2')
@GrabConfig(systemClassLoader=true)
import groovy.sql.Sql
// ... as before ...

This adds the required JAR file (downloading if needed) to the classpath on the fly. You’d need to change the artifact reference if you aren’t using the particular version of HSQLDB that we used in our examples. Don’t worry too much about the @GrabConfig statement. Groovy has some simple class-loading infrastructure under the covers that you normally don’t need to even know about; here we’re telling that infrastructure to load the driver in a way that allows the JDK’s DriverManager class to see your driver class.

The map variant of the newInstance method allows Groovy’s named-parameter convention to be applied and also makes it easy to set additional parameters when needed. Our call to newInstance using that variant could look as follows:

def sql = Sql.newInstance(
    url: 'jdbc:hsqldb:mem:GinA',
    user: 'sa',
    password: '',
    driver: 'org.hsqldb.jdbcDriver',
    cacheStatements: true,
    resultSetConcurrency: java.sql.ResultSet.CONCUR_READ_ONLY)

While discussing variants, we should also cover the alternative withInstance method. It has the same variants as the newInstance method and automatically calls the close method. Using it would look like this:

Sql.withInstance(url, user, password, driver) { sql ->
  // use 'sql' instance ...
}

The other scenario, which is quite common, is using a DataSource. We cover that next.

DriverManager vs. DataSource

If you look back to figure 13.1, you’ll notice two concepts below the JDBC API: DriverManager and DataSource. The Sql.newInstance methods always go through the DriverManager facility, which can be seen as the classic low-level way of connecting. Since JDK 1.4, there has been a second way that uses the DataSource concept.

Although the DriverManager facility is still supported for backward compatibility, using DataSource is generally preferable. In addition to providing a connection to the database, it may optionally manage a connection pool and support distributed transactions (not explained here). Because obtaining connections to a database is a time-consuming operation, it’s common to reuse them. The pool is the storage facility that provides you with a connection. You have to pass the connection back after use so that others can reuse it. If you forget to return it, the pool becomes pointless. To avoid that, Groovy SQL transparently returns the connection for you.
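
To give a concrete flavor, here’s a sketch of plugging in a pooling DataSource. It assumes the Apache Commons DBCP2 library (not otherwise used in this chapter) and reuses the HSQLDB settings from listing 13.1; any pooling DataSource implementation would work the same way with the Sql constructor.

@Grab('org.apache.commons:commons-dbcp2:2.1')
@Grab('org.hsqldb:hsqldb:2.3.2')
import groovy.sql.Sql
import org.apache.commons.dbcp2.BasicDataSource

// a pooling DataSource; connections are borrowed from and returned to the pool
def ds = new BasicDataSource(
    driverClassName: 'org.hsqldb.jdbcDriver',
    url: 'jdbc:hsqldb:mem:GinA',
    username: 'sa',
    password: '')

def sql = new Sql(ds)
// use 'sql' as usual; Groovy SQL returns each connection to the pool for you
sql.close()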

DataSources become even more important when running in a managed environment such as within an application server. A managed environment provides its applications with DataSource objects to make its special features (such as connection pooling) available. In this scenario, DataSource objects are often retrieved through the Java Naming and Directory Interface (JNDI).

Now that you’ve heard about the merits of DataSources, how do you use them in Groovy? Your database vendor provides its own implementation of the javax.sql.DataSource interface. HSQLDB, for example, provides the class org.hsqldb.jdbc.JDBCDataSource for that purpose. To obtain a Sql instance for a DataSource, you need to create it, optionally set its properties, and pass it to the Sql constructor, as can be seen in the following listing.

Listing 13.2. Connecting using a DataSource
import groovy.sql.Sql
import org.hsqldb.jdbc.JDBCDataSource

def dataSource = new JDBCDataSource(
    database: 'jdbc:hsqldb:mem:marathon', user: 'sa', password: '')
def sql = new Sql(dataSource)

// use 'sql' instance ...

sql.close()
Note for Enterprise Developers

If you’re using an application server, you might retrieve the DataSource using JNDI as previously mentioned. The advantage of this approach is that it allows administration of the database to be more independent from your program. Your program doesn’t need to mention specific database drivers or DataSource classes, and you could migrate from one database to another with reduced effort. But we did mention the dialect differences, didn’t we?
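
As a sketch, assuming your container binds a DataSource under the hypothetical JNDI name java:comp/env/jdbc/marathonDS, the lookup and Sql creation could look like this:

import groovy.sql.Sql
import javax.naming.InitialContext
import javax.sql.DataSource

// look up the container-managed DataSource; the JNDI name is an assumption
def ctx = new InitialContext()
def dataSource = (DataSource) ctx.lookup('java:comp/env/jdbc/marathonDS')
def sql = new Sql(dataSource)

// use 'sql' instance ...

sql.close()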

No matter whether you use a DataSource in the Sql constructor or the DriverManager facility through Sql.newInstance, in the end you have a reference to a Sql instance (as the value of the sql variable). You can work with this reference regardless of how it was constructed.

These are the recommended ways of connecting to the database in Groovy. In situations when you already have a database connection and you’d like to work on it through Groovy, you can use new Sql(connection). But beware that in this case, Groovy SQL cannot manage that connection and you have to take care of properly closing it yourself.

If you have a Sql instance and you need a second one with the same characteristics (a clone), you can use new Sql(sql).
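
Here’s a small sketch of both constructors; it assumes the HSQLDB driver is already on the classpath and registered, as in the earlier listings:

import groovy.sql.Sql
import java.sql.DriverManager

// wrap a connection you obtained elsewhere; closing it remains your job
def connection = DriverManager.getConnection('jdbc:hsqldb:mem:GinA', 'sa', '')
def sql = new Sql(connection)

// a second Sql instance with the same characteristics as the first
def sql2 = new Sql(sql)

// use 'sql' and 'sql2' ...

connection.close()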

Now that you have a Sql instance that represents your connection to the database, you’ll use it to execute SQL statements.

13.1.2. Executing SQL

Once you have a Sql instance in the sql reference, executing an SQL statement on the database is as easy as

sql.execute(statement)

Groovy SQL carries out all the management work around that call: getting a connection (possibly from the DataSource connection pool), constructing and configuring the statement, sending it, logging encountered exceptions, and closing resources (statement and connection) properly even if exceptions have been thrown. It even does a bit more, as you’ll see in the course of this chapter.

Creating the database schema

The first thing you can use the execute method for is creating the database schema. Let’s assume we’re going to store data about marathon athletes and their performances. To identify an athlete, we need the first name, last name, and date of birth. A first attempt might be

sql.execute '''
  CREATE TABLE Athlete (
    firstname   VARCHAR(64),
    lastname    VARCHAR(64),
    dateOfBirth DATE
  );
'''

This does the job but isn’t very realistic because we’ll need a primary key to look up athletes and we didn’t define one. It’s obvious that none of these fields listed is unique in itself. A combination of all three is unlikely to have duplicates, but such a compound key is always tricky to deal with and is still not guaranteed to be unique.

It’s conventional to use an artificial key (also known as a surrogate key) in such cases, so we’ll introduce one. Because we’re lazy, we’ll let the database figure out how to create one as shown in the following listing.

Listing 13.3. Creating a table in a database
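
A minimal version of this listing, consistent with the schema used by the DbUtil helper in listing 13.4, looks like this:

sql.execute '''
  CREATE TABLE Athlete (
    athleteId   INTEGER GENERATED BY DEFAULT AS IDENTITY,
    firstname   VARCHAR(64),
    lastname    VARCHAR(64),
    dateOfBirth DATE,
    UNIQUE(athleteId)
  );
'''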

That’s the minimal schema we’ll start with. We’ll work with it in an agile way; the schema will grow over time. Reconstructing the schema programmatically every time we need it makes this agile database programming possible. But wait. If we issue the preceding statement to a database instance that already has an Athlete table (maybe from our last run), it will throw an SQLException. We need to drop the old one, but only if an old one exists:

sql.execute '''
    DROP   TABLE Athlete    IF EXISTS;
'''


As the SQL boilerplate code grows, it starts to bury the interesting information. For our purposes, in the following listing we’ll refactor this boilerplate code into a static helper method, create(), within a utility class, DbUtil, and call that in future examples.[8]

8 See the book’s sample code for the full listing.

Listing 13.4. A static helper method, create(), within a utility class, DbUtil
import groovy.sql.Sql

class DbUtil {
    static Sql create() {
        def url = 'jdbc:hsqldb:mem:GinA'
        def user = 'sa'
        def password = ''
        def driver = 'org.hsqldb.jdbcDriver'
        def sql = Sql.newInstance(url, user, password, driver)

        sql.execute """
            DROP TABLE Athlete IF EXISTS cascade;
            DROP TABLE Record IF EXISTS;
        """

        sql.execute """
            CREATE TABLE Athlete (
              athleteId   INTEGER GENERATED BY DEFAULT AS IDENTITY,
              firstname   VARCHAR(64),
              lastname    VARCHAR(64),
              dateOfBirth DATE,
              UNIQUE(athleteId)
            );
        """

        // additional set up will be added in future examples

        sql
    }

    // additional utility methods will be added in future examples
}

Note that this could be refactored into a template if you find yourself writing repeated code for multiple tables.

Inserting data

With the schema defined, you can start entering data. You can use the execute method for this purpose. Let’s add a marathon runner:

sql.execute '''
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
               VALUES ('Paul',    'Tergat', '1969-06-17');
'''


We were once in a project where we used this approach to insert a thousand records of carefully hand-managed test data. But this approach is difficult to read and manage, because it contains a lot of duplication. You can make the execute method produce what is called a prepared statement, a SQL statement with occurrences of values replaced by placeholders (question marks).

You can reuse the same statement for a possibly large sequence of calls with different values per call. The JDBC driver has to do its per-statement work only once instead of numerous times. The work per statement includes parsing the SQL, validating, optimizing access paths, and constructing an execution plan. The more complex the statement, the more time-consuming this work becomes. In other words, using a prepared statement is always a good move. In Java, prepared statements are represented using the java.sql.PreparedStatement interface.

The following example separates the SQL from the data used:

def athleteInsert = '''
    INSERT INTO Athlete (firstname, lastname, dateOfBirth)
                VALUES  (?, ?, ?);
'''
sql.execute athleteInsert, ['Khalid',  'Khannouchi', '1971-12-22']


The execute method is smart enough to know when it needs to work with a prepared statement. The preceding construction also better supports reading the list of fields from an external source such as a file and populating the database with it. The approach also avoids a nasty security vulnerability known as SQL injection.

Note

In SQL, string values are placed in single quotes like 'Paul'. But with a prepared statement, these single quotes must not be used. They aren’t present in the prepared statement, nor are they part of the string data passed in the list of values. (In other words, the single quotes in those values are for Groovy, not for SQL.) Similarly, even though dates have been represented here as strings, they really are dates in the database. You could have passed an instance of java.util.Date to the execute method, and in production code this would be more likely, but the sample code in this chapter is clearer using simple string representations.
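
As a quick sketch (reusing the athleteInsert prepared statement from above), passing a date object instead of a string looks like this:

// a java.util.Date (built here via Groovy's Date.parse) instead of a string;
// the value is used only to illustrate the parameter type
def dob = Date.parse('yyyy-MM-dd', '1969-06-17')
sql.execute athleteInsert, ['Paul', 'Tergat', dob]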

When the statement gets more complicated, the mapping between each question mark and the corresponding list entry can become difficult to follow. In the course of development, the statement or the list may change, and the task of keeping both in sync is a likely source of errors.

It would be nicer if you could use a placeholder that better reveals its purpose and goes around the rigid sequence constraint. Toward that end, execute can also produce a prepared statement from a GString. We show this with a map for the athlete’s data but you could just as easily use a full-blown Athlete object instead—with the additional work of creating an Athlete class to start with, of course:

def data = [first: 'Ronaldo', last: 'da Costa',   birth: '1970-06-07']
sql.execute """
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
    VALUES (${data.first}, ${data.last}, ${data.birth});
"""


Pay attention to the tripled double quotes around the statement, and remember that this construction produces a prepared statement and will therefore be just as efficient and safe on the database as the question-mark version (indeed that’s what it’s turned into under the covers).

This might sound like magic to you and might leave you with some doubts, because after all you cannot see whether we’re telling the truth. But we can enable logging and assess our claim. Use the following lines to see what happens behind the curtain:

import java.util.logging.*

Logger.getLogger('groovy.sql').level = Level.FINE
// your execute(GString)

For the previous example this produces:[9]

9 If you have trouble, see troubleshooting comments in the sample code for the book.

19/04/2015 7:23:28 PM groovy.sql.Sql getStatement
FINE: SELECT * FROM Athlete
19/04/2015 7:23:28 PM groovy.sql.Sql getPreparedStatement
FINE:
        INSERT INTO Athlete (firstname, lastname, dateOfBirth)
                     VALUES (?,?,?);
     | [Ronaldo, da Costa, 1970-06-07]

It goes without saying that logging the SQL that’s eventually executed is always a good practice during development. Also note that because we have a real prepared statement, the SQL expression uses no single quotes around the placeholder. The special use of GStrings as SQL statements limits the use of placeholders to places where a question mark would otherwise be allowed in a prepared statement.[10]

10 There’s an escape mechanism. See the GroovyDoc for Sql#expand for more details.

Before moving on, we should summarize options for execute that we’ve discussed so far. There are three main variants of the execute statement as shown in table 13.2.

Table 13.2. Main versions of the execute method

Returns   Method name   Parameters
boolean   execute       String statement
boolean   execute       String prepStmt, List values
boolean   execute       GString prepStmt

If you look at the GroovyDoc for the Sql class you’ll note a number of other variants of the execute method. These variants support less common calling scenarios such as using varargs-style parameters instead of a list, using named parameters (discussed in section 13.2), and variants taking an additional closure to support SQL statements that might return zero or more results. See the GroovyDoc for additional details and examples of these other variants.

Before completing our discussion of inserting data, we should mention the closely related executeInsert method. It supports similar parameter variants as execute but expects the provided SQL to represent an INSERT statement and can provide additional information via the return value. The normal execute statement returns a Boolean (which is frequently ignored) indicating whether the statement returned a ResultSet. The executeInsert statement instead returns a list of any autogenerated key values. So in the example you could find out what athleteId was automatically assigned for any inserted row. There are also variants of executeInsert that take a list of key column names of interest, allowing you more control over which autogenerated keys might interest you. Let’s put together the execute and executeInsert examples into a complete listing.

Listing 13.5. Inserting athletes into our table
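
A sketch of this listing, combining the insert snippets shown above with an executeInsert call and using the DbUtil.create() helper from listing 13.4, might read:

import util.DbUtil

def sql = DbUtil.create()

sql.execute '''
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
               VALUES ('Paul', 'Tergat', '1969-06-17');
'''

def athleteInsert = '''
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
              VALUES  (?, ?, ?);
'''
sql.execute athleteInsert, ['Khalid', 'Khannouchi', '1971-12-22']

def data = [first: 'Ronaldo', last: 'da Costa', birth: '1970-06-07']
def keys = sql.executeInsert """
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
    VALUES (${data.first}, ${data.last}, ${data.birth});
"""
// HSQLDB starts identity values at 0; this is the third inserted row
assert keys[0][0] == 2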

HSQLDB starts autogenerated values from zero and increments by one, so the second row will have the value 1 and the third row the value 2. If you’re using a different database, these values may be different, so the related assertions may need to change.

You now have three rows in your Athlete table. Because you’re using an in-memory database, it will turn out to be handy to populate some rows like this for many of the future examples. For convenience in future examples, we’ll move this logic into two helper methods: the insertAthlete method will insert one athlete, the populate method will call insertAthlete for three sample athletes. Both methods are added to our DbUtil class.[11]

11 See the book’s sample code for the full listing.

The first important steps have been done: you connected to the database, created the schema, and inserted data. In other words, you’ve covered the C in CRUD. Still to come are read, update, and delete. Let’s look at read next.

Reading data

Usually the most frequently used operation is reading data.[12] Reading has different aspects, depending on whether you look for a single row or multiple rows, what query information is available, and how you intend to process the retrieved data. The Sql class provides a range of methods, as listed in table 13.3, to cover these cases. The variants for handling of plain and prepared statements are the same as for execute.

12 This isn’t necessarily true in all databases—when a database is used essentially for audit logging, for instance, it may be read very rarely. But most databases are more frequently read than changed.

Table 13.3. Main methods for reading data from the database

Returns   Method     Parameters
void      eachRow    String statement { row -> code }
void      eachRow    String prepStmt, List values { row -> code }
void      eachRow    GString prepStmt { row -> code }
void      query      String statement { resultSet -> code }
void      query      String prepStmt, List values { resultSet -> code }
void      query      GString prepStmt { resultSet -> code }
List      rows       String statement
List      rows       String prepStmt, List values
List      rows       GString prepStmt
Object    firstRow   String statement
Object    firstRow   String prepStmt, List values
Object    firstRow   GString prepStmt

You’ll again find additional variants when looking at the GroovyDoc. Some of these extra variants are needed for paging and named parameters which we’ll cover in section 13.2. See the GroovyDoc for further details and examples if the standard variants don’t meet your needs.

The methods eachRow and query use a closure for processing the result. query calls the given closure once and passes the full java.sql.ResultSet into it; eachRow calls the closure for each row of the result, thus relieving the programmer from the usual iteration work.

Fetching a row at a time with eachRow

Suppose you’d like to print a report about all known athletes that should look like this:

----- Athlete Info ------
Paul Tergat
born on 17. Jun 1969 (Tue)
-------------------------
Khalid Khannouchi
born on 22. Dec 1971 (Wed)
-------------------------
Ronaldo da Costa
born on 07. Jun 1970 (Sun)
-------------------------

You can achieve this by using eachRow and a simple selection statement. The row that’s passed into the closure is an interesting object. You can use the column names as if they were property names of that object:

println ' Athlete Info '.center(25,'-')
def fmt = new java.text.SimpleDateFormat('dd. MMM yyyy (E)',
                                         Locale.US)
db.eachRow('SELECT * FROM Athlete'){ athlete ->
    println athlete.firstname + ' ' + athlete.lastname
    println 'born on '+ fmt.format(athlete.dateOfBirth)
    println '-' * 25
}


Note how you’re using the row as if it were an Athlete object, which it isn’t. But you can also use the row as if it were a list (which it isn’t either) and call the subscript operator on it. To print

Paul Tergat
Khalid Khannouchi
Ronaldo da Costa

you could call

db.eachRow('SELECT firstname, lastname FROM Athlete'){ row ->
    println row[0] + ' ' + row[1]
}


Note

When working with column indexes, it’s always safer to explicitly specify the sequence of column names in the select statement. 'SELECT *' may sometimes return the columns in the expected order (for example, the order in which they were defined in CREATE TABLE), but this isn’t guaranteed for all database management systems.

So what’s that row object, after all? It’s of type groovy.sql.GroovyResultSet, which is a decorator around the underlying java.sql.ResultSet. Being a Groovy object, it can pretend to have properties and provide Groovy-friendly indexing (starting from zero, allowing negative indexes that count from the end).
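
As a quick sketch, all three access styles work on the same row object:

db.eachRow('SELECT firstname, lastname FROM Athlete'){ row ->
    assert row.lastname == row[1]    // property name vs. 0-based index
    assert row[-1] == row[1]         // negative indexes count from the end
}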

Fetching a ResultSet with query

The query method allows you to customize the iteration over the query results at the expense of convenience, because you can only work with the good-old java.sql.ResultSet. Suppose you’re only interested in the first athlete, and don’t want to go through all results for that purpose. You can use query like this:

db.query('SELECT firstname, lastname FROM Athlete'){ resultSet ->
    if (resultSet.next()){
        print   resultSet.getString(1)
        print   ' '
        println resultSet.getString('lastname')
    }
}


Just like the eachRow method, the query method manages your resources (the connection and the statement). The downside is that the ResultSet that gets passed into the closure is less convenient to work with. You need to call next to move the cursor forward, you need to call type-specific getters (getString, getDate, and so on), and—most annoyingly—indexes start at one instead of zero.

Fetching all rows at once

As shown in table 13.3, it’s also possible to fetch all rows at once into a (possibly long) list with the rows method. Each list item can be used with an index or a property name (just like in eachRow). Suppose you have a simple requirement, like printing the following:

There are 3 Athletes:
Paul Tergat, Khalid Khannouchi, Ronaldo da Costa

You can use a simple database call like

List athletes = db.rows('SELECT firstname, lastname FROM Athlete')
println "There are ${athletes.size()} Athletes:"
println athletes.collect{"${it[0]} ${it.lastname}"}.join(", ")


Having the selection results in a list makes them eligible to be put in GPath expressions. The example shows this with the collect method, but you can imagine find, findAll, grep, any, every, and so forth in its place.

Note

The list items are implemented as GroovyRowResult objects, the equivalent of the GroovyResultSet objects used with eachRow.

The firstRow(stmt) method returns the equivalent of rows(stmt)[0] but, if your database supports it, only requests the first row (see section 13.2.3 for more details on how this trick is supported). Let’s put all of this together in a complete listing.

Listing 13.6. Reading athlete information from our table
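
A sketch of this listing, based on the preceding snippets and the DbUtil create and populate helpers, might read:

import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

// query: you drive the JDBC ResultSet yourself
sql.query('SELECT firstname, lastname FROM Athlete') { resultSet ->
    while (resultSet.next()) {
        println resultSet.getString(1) + ' ' + resultSet.getString('lastname')
    }
}

// eachRow: the closure is called once per row with a GroovyResultSet
sql.eachRow('SELECT * FROM Athlete') { athlete ->
    println "$athlete.firstname $athlete.lastname"
}

// firstRow and rows deliver GroovyRowResult objects
assert sql.firstRow('SELECT * FROM Athlete').lastname == 'Tergat'
assert sql.rows('SELECT lastname FROM Athlete')*.lastname ==
       ['Tergat', 'Khannouchi', 'da Costa']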

This listing provides an excellent summary of your options for reading data. You can use the query method, in which case you need to handle iteration of the result set yourself and use JDBC API calls to access column values. You can use the internal-iterator-style eachRow method, in which case a closure is called with a GroovyResultSet for each row returned by the query; the row supports accessing column values using 0-based ordinal syntax (with Groovy’s normal conventions around negative index values) or (case-insensitive) named property syntax. You can use firstRow for efficient access to a single row or rows for all rows; both of these methods use a GroovyRowResult to represent a row. These methods provide many options for reading and manipulating database information. You’ll soon be working with database data with the same ease and efficiency that Groovy offers for Java’s collection structures. Do remember, though, to make the database do work when appropriate (for example, by using firstRow rather than fetching all rows and discarding most of them).

That’s it for reading data. The next CRUD operation is updating.

Updating data

The update operation works with the execute method in the same way you’ve seen so far. Suppose we initially insert only the last name of a marathon runner:

sql.execute '''
  INSERT INTO Athlete (lastname) VALUES ('da Costa')
'''

Now suppose we want to update the row to also include the athlete’s first name:

sql.execute '''
  UPDATE Athlete SET firstname='Ronaldo' where lastname='da Costa'
'''

Our update here used a plain statement but, just like for inserting, we could have provided a list of parameters or used a GString to enforce the use of a prepared statement.

As you saw for inserting, there’s a closely related method for updating with some special features. The executeUpdate method works the same way as execute but provides a different return value. execute returns a Boolean indicating whether the statement returned a ResultSet, and executeUpdate returns the number of rows that were changed by the update. So, to change the athlete again, this time entering the date of birth, you’d use:

def updateCount = sql.executeUpdate '''
  UPDATE Athlete SET dateOfBirth='1970-06-07' where lastname='da Costa'
'''
assert updateCount == 1

Putting that all together gives you the following complete listing.

Listing 13.7. Updating a table row
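
Assembled from the snippets above (a sketch using the DbUtil.create() helper), the listing might read:

import util.DbUtil

def sql = DbUtil.create()

sql.execute '''
  INSERT INTO Athlete (lastname) VALUES ('da Costa')
'''

sql.execute '''
  UPDATE Athlete SET firstname='Ronaldo' where lastname='da Costa'
'''

def updateCount = sql.executeUpdate '''
  UPDATE Athlete SET dateOfBirth='1970-06-07' where lastname='da Costa'
'''
assert updateCount == 1

// verify the row now carries the first name set by the update
assert sql.firstRow('SELECT * FROM Athlete').firstname == 'Ronaldo'
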
Deleting data

So far, you’ve created tables and inserted, read, and updated. The last CRUD operation you need to examine is delete. You’ll use the execute method and pass in the appropriate SQL statement. A complete listing follows.

Listing 13.8. Deleting a table row
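
A sketch of such a listing, again using the DbUtil helpers and deleting one of the three sample athletes, might read:

import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == 3

sql.execute '''
  DELETE FROM Athlete WHERE lastname = 'Tergat'
'''

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == 2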

You’ve seen how easy it is to execute SQL with Groovy to perform basic CRUD operations, but many applications need more advanced processing. That’s the topic of our next section.

13.2. Advanced Groovy SQL

Groovy’s SQL features provide a higher-level API above JDBC. But that doesn’t mean Groovy’s SQL library gets in your way when you need to do something a little out of the ordinary. You can always work directly at the JDBC level if you need to, but as you’ll discover in the rest of this section, many other common database tasks also have special support from Groovy SQL. We’ll start by looking at database transactions.

13.2.1. Performing transactional updates

In many scenarios, for data integrity reasons, it’s important to update multiple pieces of information all at the same time or, if that isn’t possible, change none of the information. Such systems are said to support transactions and preserve ACID properties. When using a single database, JDBC provides such functionality out of the box, presuming, of course, your database supports this feature.

By default, every JDBC update is treated as a transaction. If you want to perform multiple updates, that normally involves turning off the autocommit behavior, performing your changes, and either committing, if everything went according to plan, or rolling back the transaction if there was a problem and then resetting the autocommit status (if needed). Thankfully Groovy SQL provides a nice little shortcut for these steps.

Suppose you wanted to add two new athletes to the database and suppose it’s important that they both are added in one transaction. It would be as simple as including the operations within a withTransaction block as shown in the following listing.

Listing 13.9. Invoking a transaction
import static util.DbUtil.*

def sql = create()
populate(sql)

sql.withTransaction {
    insertAthlete(sql, 'Haile', 'Gebrselassie', '1973-04-18')
    insertAthlete(sql, 'Patrick', 'Makau', '1985-03-02')
}

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == 5

This isn’t only significantly shorter but much less error-prone.
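
It also gives you rollback for free. In the following sketch, which continues from the listing above, the deliberately failing INSERT against a nonexistent table causes the whole block to be rolled back, so the athlete count is unchanged:

def before = sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num
try {
    sql.withTransaction {
        insertAthlete(sql, 'Paula', 'Radcliffe', '1973-12-17')
        sql.execute 'INSERT INTO NoSuchTable VALUES (1)'   // fails, triggers rollback
    }
} catch (java.sql.SQLException expected) {
    // the exception is rethrown after the rollback
}

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == before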

There are times when data integrity isn’t your primary concern; instead, efficiency is. That’s where batch processing comes into play. We cover that next.

13.2.2. Working with batches

Whenever you send commands or queries to a database server, there will be some communication overhead. When large volumes of information must be sent, this overhead can become significant. Batch processing is a standard mechanism to minimize this overhead and it comes in two flavors.

The first variant allows arbitrary commands to be sent to the database. An optional parameter (not used here) allows you to set a batch size. If you don’t set a batch size, all commands end up in a single batch unless you manually call stmt.executeBatch(), which allows you to chunk the batch into arbitrary-sized pieces. If you wanted to combine an insert into the Athlete table and one for a related Record table as a batch, you’d execute the following withBatch statement:

sql.withBatch { stmt ->
    stmt.addBatch '''
    INSERT INTO Athlete (firstname, lastname, dateOfBirth)
    VALUES ('Paula', 'Radcliffe', '1973-12-17')'''
    stmt.addBatch """
    INSERT INTO Record (time, venue, whenRun, fkAthlete)
      SELECT ${2*60*60+15*60+25}, 'London', '2003-04-13',
      athleteId FROM Athlete WHERE lastname='Radcliffe'"""
}

If logging is turned on, this produces:

22/04/2013 6:34:59 AM groovy.sql.BatchingStatementWrapper processResult
FINE: Successfully executed batch with 2 command(s)

Before proceeding further, we should dive under the covers and see what this Record table looks like. It’s created with the following SQL:

CREATE TABLE Record (
  runId       INTEGER GENERATED BY DEFAULT AS IDENTITY,
  time        INTEGER,    -- in seconds
  venue       VARCHAR(64),
  whenRun     DATE,
  fkAthlete   INTEGER,
  CONSTRAINT fk FOREIGN KEY (fkAthlete)
    REFERENCES Athlete (athleteId) ON DELETE CASCADE
);

We modify our DbUtil create and populate methods to create the Record table and fill in four sample Record rows, respectively.[13]

13 It follows the same pattern as we saw earlier for the Athlete table. Full details can be seen in the DbUtil class within the sample code.

The second batch variant uses prepared statements and is used when all the commands in the batch involve the same kind of operation. For example, if we want to enter multiple athletes as a batch operation, and we wanted to chunk the batch into pieces of size 3, we’d use the following code:

def qry = '''
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
  VALUES (?,?,?)
'''
sql.withBatch(3, qry) { ps ->
  ps.addBatch('Paula',     'Radcliffe',   '1973-12-17')
  ps.addBatch('Catherine', 'Ndereba',     '1972-07-21')
  ps.addBatch('Naoko',     'Takahashi',   '1972-05-06')
  ps.addBatch('Tegla',     'Loroupe',     '1973-05-09')
  ps.addBatch('Ingrid',    'Kristiansen', '1956-03-21')
}

If logging is turned on, this produces:

20/04/2015 2:18:10 AM groovy.sql.BatchingStatementWrapper processResult
FINE: Successfully executed batch with 3 command(s)
20/04/2015 2:18:10 AM groovy.sql.BatchingStatementWrapper processResult
FINE: Successfully executed batch with 2 command(s)

Let’s see the complete listing (including some assertions that our additional rows were added).

Listing 13.10. Batching operations
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)
DbUtil.enableLogging()

sql.withBatch { stmt ->
    stmt.addBatch '''
    INSERT INTO Athlete (firstname, lastname, dateOfBirth)
    VALUES ('Paula', 'Radcliffe', '1973-12-17')'''
    stmt.addBatch """
    INSERT INTO Record (time, venue, whenRun, fkAthlete)
      SELECT ${2*60*60+15*60+25}, 'London', '2003-04-13',
      athleteId FROM Athlete WHERE lastname='Radcliffe'"""
}

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == 4
assert sql.firstRow('SELECT COUNT(*) as num FROM Record').num == 5

def qry = '''
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
  VALUES (?,?,?)
'''
sql.withBatch(3, qry) { ps ->
    ps.addBatch('Catherine', 'Ndereba', '1972-07-21')
    ps.addBatch('Naoko', 'Takahashi', '1972-05-06')
    ps.addBatch('Tegla', 'Loroupe', '1973-05-09')
    ps.addBatch('Ingrid', 'Kristiansen', '1956-03-21')
}

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == 8

Sometimes you might have a different kind of performance problem. Your queries may produce too much data. To combat this problem let’s look at pagination.

13.2.3. Working with pagination

When working with large databases, it’s sometimes useful to only work with a subset of the returned information. This can be useful when you need a small subset of sample data or when working with the information in chunks (for example, displaying a page of information at a time on a website). Many Groovy SQL commands contain variants that allow subsets to be worked with; for example, the rows method takes an optional offset and size parameter as shown in the following listing, which returns athletes in chunks of two.

Listing 13.11. Pagination operations
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

def qry = 'SELECT * FROM Athlete'
assert sql.rows(qry, 1, 2)*.lastname == ['Tergat', 'Khannouchi']
assert sql.rows(qry, 3, 2)*.lastname == ['da Costa']

Relational database systems reveal information about themselves in so-called metadata. This is “data about the data”—in its simplest terms, information like the types and names of columns, tables, and so forth. A look at accessing such information is next.

13.2.4. Fetching metadata

Earlier we looked at methods to print out parts of our Athlete table. Let’s now consider writing a helper method that should dump the content of any given table. The table name is provided as a method parameter. If you call the method as dump(sql, 'Athlete'), it should print

------- CONTENT OF TABLE Athlete -------
0: ATHLETEID        0
1: FIRSTNAME        Paul
2: LASTNAME         Tergat
3: DATEOFBIRTH      1969-06-17
----------------------------------------
    ... other rows ...

For proper display, you need additional questions answered:

  • How many columns should we display?
  • What are the column names?

Luckily, ResultSet (and thus also the GroovyResultSet) provides a method called getMetaData that returns a ResultSetMetaData object. This contains all the necessary information; see its API documentation for details. The following listing shows it in action.

Listing 13.12. Accessing metadata
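
A sketch of this listing, reconstructed from the description that follows (the exact formatting widths are assumptions), might read:

import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

def dump(db, tablename) {
    println " CONTENT OF TABLE ${tablename} ".center(40, '-')
    db.eachRow('SELECT * FROM ' + tablename) { row ->
        def meta = row.getMetaData()
        (0..<meta.columnCount).each { i ->
            // ResultSetMetaData indexes start at 1, hence i + 1;
            // ?. protects against a null value at the given index
            println "$i: " + meta.getColumnLabel(i + 1).padRight(16) +
                    row[i]?.toString()
        }
        println '-' * 40
    }
}

// capture standard output so the printed result can be checked
def baos = new ByteArrayOutputStream()
System.setOut(new PrintStream(baos))

dump(sql, 'Athlete')
assert baos.toString().contains('FIRSTNAME')
assert baos.toString().contains('Paul')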

Like all the classes from the java.sql package, ResultSetMetaData works with indexes starting at 1. Therefore, you need to call getColumnLabel with (i+1). You also use the safe-dereferencing operator (see section 7.1.3) in case the value at the given index is null. Finally, you override standard output and capture it into a byte array so you can check that the method prints the correct values.

Making use of metadata is so common that Groovy SQL provides additional variants of some of its methods that take a Closure that works exclusively with the metadata. You saw earlier that an eachRow method was available that took an SQL query and a closure:

eachRow(String sql, Closure rowClosure)

which is typically called using the trailing closure syntax. The metadata variant has an extra parameter:

eachRow(String sql, Closure metaClosure, Closure rowClosure)

The metaClosure is called once and passed the ResultSetMetaData object from the query result. The rowClosure is then called once for each row. Here’s how you might use it to rework the dump method:

def dump2(sql, tablename) {
  def printColNames = { meta ->
    (1..meta.columnCount).each {
      print meta.getColumnLabel(it).padRight(12)
    }
    println()
  }
  def printRow = { row ->
    row.toRowResult().values().each {
      print it.toString().padRight(12) }
    println()
  }
  sql.eachRow('SELECT * FROM ' + tablename, printColNames, printRow)
}


One thing you might notice here is that we didn’t use the trailing closure syntax. We certainly could have chosen to do so, though we often avoid that style when calling methods with multiple closure parameters. Let’s add formatting characters and checks that we produce the correct output (for both the Athlete and Record tables) as shown in the following complete listing.

Listing 13.13. Using a metadata closure
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

def dump2(sql, tablename) {
  def printColNames = { meta ->
    def width = meta.columnCount * 12
    println " CONTENT OF TABLE ${tablename} ".center(width, '-')
    (1..meta.columnCount).each {
      print meta.getColumnLabel(it).padRight(12)
    }
    println()
    println '-' * width
  }
  def printRow = { row ->
    row.toRowResult().values().each {
      print it.toString().padRight(12)
    }
    println()
  }
  sql.eachRow('SELECT * FROM ' + tablename, printColNames, printRow)
}

def baos = new ByteArrayOutputStream()
System.setOut(new PrintStream(baos))

dump2(sql, 'Athlete')
assert baos.toString().readLines()*.trim().join('\n') == '''\
----------- CONTENT OF TABLE Athlete -----------
ATHLETEID   FIRSTNAME   LASTNAME    DATEOFBIRTH
------------------------------------------------
0           Paul        Tergat      1969-06-17
1           Khalid      Khannouchi  1971-12-22
2           Ronaldo     da Costa    1970-06-07\
'''

baos.reset()
dump2(sql, 'Record')
assert baos.toString().readLines()*.trim().join('\n') == '''\
----------------- CONTENT OF TABLE Record ------------------
RUNID       TIME        VENUE       WHENRUN     FKATHLETE
------------------------------------------------------------
0           7495        Berlin      2003-09-28  0
1           7538        London      2002-04-14  1
2           7542        Chicago     1999-10-24  1
3           7565        Berlin      1998-09-20  2\
'''

You’ve seen that using GStrings provides for succinct and clear expression of our SQL commands, but there’s an alternative syntax using named and named-ordinal parameters. We’ll look at that next.

13.2.5. Working with named and named-ordinal parameters

You saw earlier that many Groovy SQL methods have multiple variants—String, String plus a list of parameters, and GString variants, to name three. For most users these variants will be quite sufficient, but there are a few scenarios where an alternative syntax is useful: in particular, when integrating with Java, where GStrings might be less convenient to use, or when handling multiple objects containing properties to be fed into the SQL. Sometimes, too, you might wish to mix your queries with a templating solution where GStrings may not be convenient. In such circumstances, named and named-ordinal parameters are useful.

Named parameters

With this style, you use placeholders similar to what we showed earlier for PreparedStatements; however, instead of a series of question marks and a list of parameters, you have one parameter that could be a domain object or a map and each placeholder references the relevant property from the parameter. Two placeholder syntax styles are supported: using either a colon (:) before the property name or a question mark–dot (?.) before the property name.

First, let’s assume you have the following SQL insert fragment that will begin your SQL examples in this section:

def insertPrefix = '''
  INSERT INTO Athlete (firstname, lastname, dateOfBirth)
  VALUES '''

and the following map:

def loroupe = [first: 'Tegla', last: 'Loroupe', dob: '1973-05-09']

Then you can insert runner Loroupe into the database as follows using the colon form:

db.execute insertPrefix + '(:first,:last,:dob)', loroupe

Alternatively, you can use Groovy’s named parameter style method call to include the map inline (for runner Kristiansen):

db.execute insertPrefix + '(:first,:last,:dob)',
    first: 'Ingrid', last: 'Kristiansen', dob: '1956-03-21'

If you wanted to use domain objects, you might declare one like so:

@Canonical class Athlete { String first, last, dob }
def ndereba = new Athlete('Catherine', 'Ndereba', '1972-07-21')

You could add this athlete as follows using the question mark form:

db.execute insertPrefix + '(?.first,?.last,?.dob)', ndereba
Named-ordinal parameters

As a final example, perhaps the information you have is spread across multiple objects (in this case one domain object and one map). In that case, use an alternative to the question mark–dot form that also includes the numeric index of the parameter being referred to. Here’s an example:

def takahashi = new Athlete('Naoko', 'Takahashi')
def takahashiExtra = [dob: '1972-05-06']

db.execute insertPrefix + '(?1.first,?1.last,?2.dob)',
    takahashi, takahashiExtra

Putting these snippets together gives us the following listing.

Listing 13.14. Using named and named-ordinal parameters
import groovy.transform.Canonical
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

def insertPrefix = '''
INSERT INTO Athlete (firstname, lastname, dateOfBirth) VALUES
'''

sql.execute insertPrefix + '(:first,:last,:dob)', first: 'Ingrid',
        last: 'Kristiansen', dob: '1956-03-21'

def loroupe = [first: 'Tegla', last: 'Loroupe', dob: '1973-05-09']
sql.execute insertPrefix + '(:first,:last,:dob)', loroupe

@Canonical class Athlete { String first, last, dob }

def ndereba = new Athlete('Catherine', 'Ndereba', '1972-07-21')
sql.execute insertPrefix + '(?.first,?.last,?.dob)', ndereba

def takahashi = new Athlete('Naoko', 'Takahashi')
def takahashiExtra = [dob: '1972-05-06']
def namedOrdinalSuffix = '(?1.first,?1.last,?2.dob)'
sql.execute insertPrefix + namedOrdinalSuffix, takahashi, takahashiExtra

assert sql.firstRow('SELECT COUNT(*) as num FROM Athlete').num == 7

That wraps up our discussion on this special parameter syntax. Next we’ll look at Groovy’s special support for stored procedures.

13.2.6. Using stored procedures

One feature that many databases support is the ability to store code or functions in the database itself. We won’t argue whether this is always an ideal practice but we acknowledge that there are times when we’ve needed to support use of such procedures. JDBC has support for such scenarios, but it can be a little cumbersome. Groovy SQL streamlines the process a little bit for you. We’ll also note that the details of defining stored procedures and functions may vary slightly depending on your database[14] but how you call equivalent procedures across varying databases should remain the same.

14 And to be honest there isn’t much Groovy SQL can do to assist you with these differences!

We’ll look at creating and then using stored procedures. First you’ll create a SELECT_ATHLETE_RECORD stored function, which returns a table of data. Then you’ll process the returned table using eachRow as shown in the following listing.

Listing 13.15. Working with a stored procedure
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

sql.execute '''
  CREATE FUNCTION SELECT_ATHLETE_RECORD ()
  RETURNS TABLE (lastname VARCHAR(64), venue VARCHAR(64), whenRun DATE)
  READS SQL DATA
  RETURN TABLE (
    SELECT Athlete.lastname, Record.venue, Record.whenRun
    FROM Athlete, Record
    WHERE Athlete.athleteId = Record.fkAthlete
    ORDER BY whenRun
  )
'''
def result = []
sql.eachRow('CALL SELECT_ATHLETE_RECORD()') {
  result << "$it.lastname $it.venue $it.whenRun"
}
assert result == [
    'da Costa Berlin 1998-09-20',
    'Khannouchi Chicago 1999-10-24',
    'Khannouchi London 2002-04-14',
    'Tergat Berlin 2003-09-28'
]

Creating the stored function involved executing the appropriate SQL. The stored function joins information from the Athlete and Record tables and returns the resulting rows. You used eachRow to process those rows but could have used any of the methods for processing rows.

Consider now a FULL_NAME stored function that takes a parameter. If you give it an athlete’s last name, it will return the full name. The following listing shows how you could create and then use such a function.

Listing 13.16. Working with a stored procedure with simple parameters
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

sql.execute '''
  CREATE FUNCTION FULL_NAME (p_lastname VARCHAR(64))
  RETURNS VARCHAR(100)
  READS SQL DATA
  BEGIN ATOMIC
    DECLARE ans VARCHAR(100);
    SELECT CONCAT(firstname, ' ', lastname) INTO ans
    FROM Athlete WHERE lastname = p_lastname;
    RETURN ans;
  END
'''

assert sql.firstRow("{? = CALL FULL_NAME(?)}",
    ['Tergat'])[0] == 'Paul Tergat'

The creation of the stored function is again a simple execute statement, and once more you can use any of the reading methods. It makes most sense to use firstRow here because you’ll always get back exactly one row. As before, our SQL statement uses the SQL CALL syntax, but this time with one placeholder to indicate that a parameter is required and another to indicate that a result is returned.

So far our stored functions have returned single or multiple rows. JDBC also supports what are known as IN, OUT, and INOUT parameters for stored procedures. This mechanism allows multiple unrelated return values. We’ll illustrate this mechanism by creating a stored procedure to concatenate two strings. Groovy and Java both support native string concatenation but we’ll ignore that fact for the purposes of this example. We’ll start by defining our procedure in the following listing.

Listing 13.17. A stored procedure with IN and OUT parameters
import groovy.sql.Sql
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

sql.execute '''
  CREATE PROCEDURE CONCAT_NAME (OUT fullname VARCHAR(100),
    IN first VARCHAR(50), IN last VARCHAR(50))
  BEGIN ATOMIC
    SET fullname = CONCAT(first, ' ', last);
  END
'''

sql.call("{call CONCAT_NAME(?, ?, ?)}",
    [Sql.VARCHAR, 'Paul', 'Tergat']) {
  fullname -> assert fullname == 'Paul Tergat'
}

When defining the stored procedure we use two input parameters (first and last) and one output parameter (fullname), but we could have any combination of IN, OUT, and INOUT parameters if needed. To invoke our stored procedure we use Groovy’s special call method. This method supports a number of special conventions. The first and last names are passed in the normal way, but for the OUT parameter we instead pass in the type that the stored procedure will produce, which for us is a VARCHAR. Because we have an OUT parameter, we’ll use the Closure variant of the method. It calls our closure passing in the OUT parameter from the stored procedure, and for this example we’ll check its value with an assertion.

You’ve seen how easy it is to execute SQL with Groovy including advanced use techniques. Wouldn’t it be nice not to have to worry about the SQL at all? Unlikely as that concept sounds, it’s the topic of our next section.

13.3. DataSets for SQL without SQL

We demanded that simple tasks should be easy. So far, you’ve seen simple SQL and easy ways for sending it to the database. It’s hard to believe that database programming can be any simpler, but it can.

Groovy provides a basic way of working with the database that doesn’t even work with SQL. This approach is based on the concept of a DataSet, and we’ll look at each of the operations it supports:

  • Adding a row to a table
  • Working through all rows of a table or a view
  • Selecting rows of a table or a view by simple expressions

You cannot define a schema that way or use delete or update operations. But you can mix the use of DataSets with other Groovy SQL operations and use whatever seems most appropriate for the task at hand.

A groovy.sql.DataSet is a subclass of and a decorator around groovy.sql.Sql. Figure 13.2 shows the UML class diagram.

Figure 13.2. UML class diagram of groovy.sql.DataSet decorating groovy.sql.Sql

The conventional way of retrieving a DataSet instance is to call Sql’s factory method dataSet. You pass it the name of the table that this DataSet should work with. For more alternatives, see the API documentation of Sql and DataSet:

// assuming sql refers to an instance of Sql
athletes = sql.dataSet('Athlete')


Let’s explore what you can do with such an instance.

13.3.1. Using DataSet operations

With an instance of a DataSet, you can call its methods, as listed in figure 13.2. We can add a new row to the Athlete table with

athletes.add(
    firstname:   'Paula',
    lastname:    'Radcliffe',
    dateOfBirth: '1973-12-17')


That’s all we need to do. A SQL insert statement will be created behind the scenes and executed immediately. If we omit any of the fields, a null value will be inserted instead.

We can also use the athletes DataSet to work with what’s currently in the table. The code

athletes.each {
    println it.firstname
}


would print

Paul
Khalid
Ronaldo
Paula

Let’s see this in action in a complete listing.

Listing 13.18. Using DataSets for SQL-free code
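(The listing body here is a minimal sketch assembled from the preceding snippets; it assumes the util.DbUtil helper used by the chapter’s other listings, so the printed listing may differ in detail.)

import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

def athletes = sql.dataSet('Athlete')          // DataSet bound to the Athlete table

athletes.add(                                  // INSERT generated and executed immediately
    firstname:   'Paula',
    lastname:    'Radcliffe',
    dateOfBirth: '1973-12-17')

athletes.each { println it.firstname }         // prints Paul, Khalid, Ronaldo, Paula

def youngsters = athletes.findAll { it.dateOfBirth > '1970-1-1' }
assert youngsters.rows().size() == 3           // Khalid, Ronaldo, and Paula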

When processing each athlete (note the use of the implicit it variable), it works analogously to the GroovyResultSet you saw before: you can use fieldnames as if they were properties and use positive or negative indexes. The goal of the abstraction provided by the DataSet methods (each, add, and findAll) is that a DataSet can be used in much the same way as any other collection, with no SQL appearing in your code. The abstraction has been kept simple, so other methods you might expect to find on collections (for example, collect) aren’t currently found within the DataSet class. Instead there are two hooks, the firstRow and rows methods, whose results you can then call collect on if required.

Now comes the findAll method, which looks simple at first but turns out to be very sophisticated. Let’s start with trying

athletes.findAll{ it.dateOfBirth > '1970-1-1' }


This method call returns a new DataSet, which can in turn be used with the each method to work over the filtered result:

youngsters = athletes.findAll{ it.dateOfBirth > '1970-1-1' }
youngsters.each { println it.firstname }


What’s behind this construction? At first sight, you might guess that the findAll method fetches all the rows from the table, applying the closure and adding rows that pass the filter to a list internally. This would be far too time-consuming for large tables. Instead, findAll produces a SQL statement that reflects the expression within the closure. This generated statement is encapsulated in the returned youngsters DataSet.

It’s hard to believe that Groovy can do that,[15] but proof is available. Any DataSet encapsulates a statement in its sql property, and because that’s the SQL of a prepared statement, it also needs parameters, which are stored in the parameters property. Let’s find out what these properties are for our sample code:

15 It may be slightly easier to believe if you’ve looked at Microsoft’s LINQ project or similar projects. Groovy has had this feature since before LINQ was generally available but now the technique is widely used.

youngsters = athletes.findAll{ it.dateOfBirth > '1970-1-1' }
println youngsters.sql
println youngsters.parameters
youngsters.each { println it.firstname }


These lines print

select * from Athlete where dateOfBirth > ?
["1970-1-1"]
Khalid
Ronaldo
Paula

So take note:

  • findAll only creates a new DataSet (with the enclosed prepared statement).
  • findAll doesn’t even access the database.
  • Only the trailing each triggers the database call.

To prove this to yourself, you can add logging to the program in the same way we did in section 10.1.2. Logging is useful during development to see when the database is accessed, as well as how it’s accessed.

But the buck doesn’t stop here. Because the findAll method returns a DataSet that can be interpreted as a filtered selection of the original DataSet (which was the whole Athlete table in our example), it would be surprising if it weren’t possible to combine filters. And yes, you can. The lines

youngsters = athletes.findAll{ it.dateOfBirth > '1970-1-1' }
paula      = youngsters.findAll{ it.firstname == 'Paula' }
println paula.sql
println paula.parameters


print

select * from Athlete where dateOfBirth > ? and firstname = ?
[1970-1-1, Paula]


Interestingly enough, we can achieve the same effect by providing a combined filter expression in the findAll closure:

youngsters = athletes.findAll{
    it.dateOfBirth > '1970-1-1' && it.firstname == 'Paula'
}


You can legitimately ask how this could possibly work. Here is the answer: the expression in the findAll closure is never executed! Instead, the DataSet implementation fetches Groovy’s internal representation of the closure’s code. This internal representation is the AST and was generated by the Groovy parser. By walking over the AST (with a Visitor pattern), the DataSet implementation emits the SQL equivalent of each AST node. The mapping is listed in table 13.4.

Table 13.4. Mapping of Groovy AST nodes to their SQL equivalents

AST node                 SQL equivalent
&&                       And
||                       Or
==                       =
Other operators          Themselves, literally
it.propertyname          propertyname
Constant expression      ? (Expression is added to the parameters list)

This also means that the following restrictions apply for expressions inside the findAll closure:

  • They must be legal Groovy code (otherwise, the Groovy parser fails).
  • They must contain only expressions as listed in table 13.4, excluding variables and method calls.

These restrictions limit the possibilities of filtering DataSets. Conversely, this approach brings a new quality to database programming: using the parser of your programming language for checking your selection expression at compile time.

If you put syntactically invalid SQL into a string and pass it to Sql’s execute method, you won’t notice the error until the database is accessed and throws a SqlException.

If you put a syntactically invalid expression into a findAll closure and choose to compile your code, the compiler fails without accessing the database. You also get better error messages that way, because the compiler can point you to the offending code. With good IDE support, your IDE can open the editor on such failing code or even highlight the error while editing.

That’s quite a lot to absorb. Let’s look at what we’ve discussed in a complete listing.

Listing 13.19. Using DataSets with filtering
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)
DbUtil.enableLogging()

def athletes = sql.dataSet('Athlete')

athletes.add(
    firstname: 'Paula',
    lastname: 'Radcliffe',
    dateOfBirth: '1973-12-17'
)

def query = athletes.findAll { it.firstname >= 'P' }
query = query.findAll { it.dateOfBirth > '1970-01-01' }
query = query.sort { it.dateOfBirth }
query = query.reverse()
assert query.sql == 'select * from Athlete where firstname >= ? and ' +
    'dateOfBirth > ? order by dateOfBirth DESC'
assert query.parameters == ['P', '1970-01-01']
assert query.rows()*.firstname == ['Paula', 'Ronaldo']

Now might be a good time to have a cup of coffee. Let the last couple of pages sink in. Read them again. Try a few example queries for yourself. This ability to view the code within the closure as data and transform it into another type of code (SQL) rather than a block to be executed may be one of the most important concepts in ushering in a new era of database application development.

So far you’ve seen DataSets working on a single table only. We’ll next explore how to use this concept more generally.

13.3.2. DataSets on database views

DataSets are a convenient way to work on a single table. But working on a single table is usually not of much value in a relational model.

You saw earlier that we stored the marathon world records for our athletes in a separate table. Each row in the Record table captures how many seconds a particular marathoner took, and when and where it happened. To relate such a row to the corresponding athlete, we refer to the athlete’s unique ID, the athleteId, through the foreign key fkAthlete. Figure 13.3 shows the relationship. Note that we also introduce a recordId[16] to give this performance a unique handle.

16 There’s no pressing need for the recordId. We introduce it because that’s our usual working pattern when creating tables.

Figure 13.3. Entity-relationship diagram of athletes and multiple records

For filling the Record table with example data, we unfortunately cannot easily use a DataSet; we’d need to know the corresponding athleteId, which we cannot foresee because it’s dynamically generated by the database. The next best solution is a helper method that executes an insert statement, retrieving the athleteId from a subselect. Here’s some sample code, which passes the values as parameters and assumes there will be only one athlete with a given last name, something we wouldn’t do in real-life code:

def insertRecord(h, m, s, venue, date, lastname){
    def time = h*60*60 + m*60 + s
    db.execute """
        INSERT INTO Record (time, venue, whenRun, fkAthlete)
            SELECT $time, $venue, $date,
                athleteId FROM Athlete WHERE lastname=$lastname;
    """
}


We can now call the insertRecord method with example data:

insertRecord(2,4,55, 'Berlin',  '2003-09-28', 'Tergat')
insertRecord(2,5,38, 'London',  '2002-04-14', 'Khannouchi')
insertRecord(2,5,42, 'Chicago', '1999-10-24', 'Khannouchi')
insertRecord(2,6,05, 'Berlin',  '1998-09-20', 'da Costa')

After this preparation, how can we use DataSets to list the records for a given athlete? We need to join the information from the Record table with the information from the Athlete table to retrieve the names.

Of course, we could read both tables and do the join programmatically, but that wouldn’t leverage the power of the relational model and wouldn’t perform well because of the overhead of each database call.

The trick is to create a database view that behaves like a read-only table made up from an arbitrary selection.

Here’s how to create a view named AthleteRecord that combines athletes with their records as if we have a combined table that contains both tables but only for athletes for whom we have record information:

DROP   VIEW AthleteRecord IF EXISTS;
CREATE VIEW AthleteRecord AS
    SELECT * FROM Athlete INNER JOIN Record
        ON fkAthlete=athleteId;


With this view, we can create a DataSet and work with it as if it were one big table.[17] To find where Khalid Khannouchi performed his records, we can use

17 You may wish to compare this approach with the SELECT_ATHLETE_RECORD stored function earlier in this chapter.

record = db.dataSet('AthleteRecord').findAll{ it.firstname=='Khalid' }
record.each{ println it.lastname + ' ' + it.venue }


which prints

Khannouchi London
Khannouchi Chicago

Let’s have a look at these snippets as a complete listing.

Listing 13.20. Using DataSets with views
import util.DbUtil

def sql = DbUtil.create()
DbUtil.populate(sql)

sql.execute '''
    DROP   VIEW AthleteRecord IF EXISTS;
    CREATE VIEW AthleteRecord AS
      SELECT * FROM Athlete LEFT OUTER JOIN Record
        ON fkAthlete=athleteId;
'''

def records = sql.dataSet('AthleteRecord').findAll {
  it.firstname == 'Khalid'
}
def result = records.rows().collect { "$it.lastname $it.venue" }
assert ['Khannouchi London', 'Khannouchi Chicago'] == result

What you’ve done here is remove SQL-specific knowledge, such as how to join two tables, from the application. This makes the code more portable across database vendors, as well as making it readable to developers who may not be particularly skilled in SQL. This comes at the expense of putting it into the infrastructure (the database setup code). This requires the database structure to be under your control. In large organizations, where the database is maintained by an entirely different set of people, the challenge is to get these administrators on board for efficient collaboration and for leveraging their database knowledge in your project.

You now have the tools you need to access a database. Giving someone a chisel doesn’t make them a carpenter. How the tools are used is as important as the tools themselves.

13.4. Organizing database work

Knowing the technical details of database programming is one thing, but organizing a whole application for database use takes more than that. You have to take care of design considerations such as separation of concerns, assigning responsibility, and keeping the codebase manageable and maintainable—free from duplication.

This section will give you insight into how Groovy SQL fits into the overall architecture of a database application. We’ll plan the architecture, define what the application has to be capable of, and then implement the application in a layered fashion, examining how Groovy makes things easier at every level. No single and authoritative solution fits all needs. Instead, you need to use your imagination and creativity to find out how to relate the presented rules, structures, and patterns to the situation at hand.

13.4.1. Architectural overview

Today’s architectural patterns usually call for a layered architecture, as depicted in figure 13.4. The lowest layer is the infrastructure that shields all upper layers from SQL specifics. It presents DAOs to the domain model layer above it. There often is a one-to-one relationship between business objects in the domain model layer and DAOs. Classically, DAOs and business objects pass DTOs back and forth for communication.

Figure 13.4. Layered architecture for database programming

Above the domain model layer is the application layer, which makes use of the business objects in its workflow and presents them within the UI.

Layering also means that any layer may call the layer below it, but never the one above. Strict layering also forbids calling layers deeper than the one directly below; for example, calls from the application layer to the infrastructure layer would be forbidden.

With the advent of Groovy SQL, things can be done more easily. First, custom-built DTOs become obsolete, due to the dynamic nature of Groovy’s classes. There’s no more need to create special classes for each DTO type. A DAO can exchange information with transparent types—types that are independent of any DAO or business object specifics. Good candidates for transparent DTOs are GroovyRowResult, Map, List, and Expando. For DTOs that should encapsulate a collection of business objects, a list of these DTOs or a DataSet may be used.

Note

With layering as in figure 13.4, DAOs aren’t allowed to directly return business objects, because calling their constructor would mean calling into the upper domain model layer. As a trick, they can pass back a map of properties and let the caller object do the construction, such as new MyBusinessObject(map).

For simple read-only data, business objects can also be replaced by transparently using a GroovyRowResult, a Map, or an Expando. Suppose the following line exists in the application code:

out << athlete.firstname

To a reader of this code, everything looks like athlete is a business object. But you cannot tell whether it’s really of type Athlete. It could just as well be a GroovyRowResult, a Map, or an Expando. From the code, it all looks the same.
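As a small illustration (the variable names here are ours, not from the example application), a plain map and an Expando both satisfy that line of application code just as well as a dedicated class would:

def fromRow  = [firstname: 'Paul', lastname: 'Tergat']             // a Map, e.g. handed back by a DAO
def fromCode = new Expando(firstname: 'Paul', lastname: 'Tergat')  // an Expando built ad hoc

[fromRow, fromCode].each { athlete ->
    assert athlete.firstname == 'Paul'    // identical property access, no Athlete class in sight
}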

Of course, this works only in simple scenarios. If you go for domain-driven design,[18] you’ll want to implement your business objects explicitly (most often with the help of GroovyBeans).

18 Domain-Driven Design: Tackling Complexity in the Heart of Software, by Eric Evans (Addison Wesley, 2003).

DAOs can sometimes be replaced by transparently using a DataSet, as you saw in the previous section. There’s a crucial point about DataSets that makes this possible: the way they handle findAll. DAOs shouldn’t expose SQL specifics to their caller, because that makes the infrastructure layer leaky. Conventional DAOs often break this constraint by allowing the caller to pass parts of the WHERE clause; or they end up with a plethora of methods like

findByFirstName(firstname)
findByLastName(lastname)
findByFirstAndLastName(firstname, lastname)
findByBirthdateBefore(date)
...
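
With a DataSet, those combinations collapse into ordinary findAll calls, so no new finder method is needed per combination of criteria. A minimal sketch, assuming an athletes DataSet as in the previous section:

def byFirst        = athletes.findAll { it.firstname == 'Paula' }
def byFirstAndLast = athletes.findAll { it.firstname == 'Paula' && it.lastname == 'Radcliffe' }
def bornBefore     = athletes.findAll { it.dateOfBirth < '1970-01-01' }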

You’ve also seen that DataSets can replace DAOs, which represent sophisticated relations by providing the appropriate view in the database schema.

All this is interesting in theory, but it’s what it looks like in practice that counts. In the next section, we’ll examine some real code.

13.4.2. Specifying the application behavior

Thinking through the architecture is nice, but only the code tells the truth. So let’s go for a full example of managing our athletes.

We’ll use a layered architecture similar to figure 13.4, albeit not a strict version. Our general approach is bottom-up. We begin at the infrastructure layer, starting with helpers and deciding what DAOs we’re going to provide. DTOs will all be transparent. From our decisions about DAOs, the business objects will fall into place almost automatically. Finally, we have to implement the application. Because our current focus is on database programming, we’ll keep the UI and workflow basic and provide a small command-line interface.

Here is how the application should work. The application should start by creating the database schema. With logging enabled, we should see the following output when the application starts:

DROP   TABLE Athlete    IF EXISTS;
CREATE TABLE Athlete (
    athleteId    INTEGER GENERATED BY DEFAULT AS IDENTITY,
    dateOfBirth  DATE,
    firstname    VARCHAR(64),
    lastname     VARCHAR(64)
);

Entering athletes should be like in this transcript (input in bold):

create Paul Tergat 1969-06-17
1 Athlete(s) in DB:
id firstname  lastname     dateOfBirth
0: Paul       Tergat       1969-06-17
create Khalid Khannouchi
2 Athlete(s) in DB:
id firstname  lastname     dateOfBirth
0: Paul       Tergat       1969-06-17
1: Khalid     Khannouchi   null


Note that we use the create operation and pass parameters in a well-known sequence. Missing parameters result in null values. The current list of athletes is displayed after the operation, sorted by the automatically generated ID.

The update operation should work for a given ID, fieldname, and new value:

update 1 dateOfBirth 1971-12-22
1 row(s) updated
2 Athlete(s) in DB:
id firstname  lastname     dateOfBirth
0: Paul       Tergat       1969-06-17
1: Khalid     Khannouchi   1971-12-22


The list of athletes should be sortable, where the sort is performed by the database, not in the application code. It needs to support multiple-column sorts:

sort firstname
2 Athlete(s) in DB:
id firstname  lastname     dateOfBirth
1: Khalid     Khannouchi   1971-12-22
0: Paul       Tergat       1969-06-17


The delete operation should accept an ID and delete the corresponding row:

delete 1
1 row(s) deleted
1 Athlete(s) in DB:
id firstname  lastname     dateOfBirth
0: Paul       Tergat       1969-06-17


The application is to be terminated with the exit operation.

No validation of user input needs to be implemented; we also don’t need to gracefully handle database errors resulting from bad user input.

Let’s see how to design and implement the infrastructure, domain model, and application layer to make this functionality work.

13.4.3. Implementing the infrastructure

The infrastructure contains helpers and DAOs. For our example, we have a single helper class DbHelper, an AthleteDAO, and a general abstract DataAccessObject as depicted in figure 13.5.

Figure 13.5. UML class diagram of the athlete example’s infrastructure layer

The DbHelper is responsible for providing access to an instance of groovy.sql.Sql through its db property and setting it to a default value. The second responsibility is to support automatic schema creation by executing the DDL for a given DataAccessObject.

The DataAccessObject is a general implementation of the basic CRUD operations. The AthleteDAO is a specialization of a DataAccessObject providing the least possible information for accessing an Athlete table: the fieldnames and their types.

We’ll next go through the classes to see how they implement their responsibilities.

Implementing DbHelper

The implementation of DbHelper as in listing 13.21 yields no surprises. It contains the code for a database connection via the Sql class and the SQL template for creating a table. Unlike in previously presented variants, we now use a SimpleTemplateEngine for separation of concerns.

The template contains the structure of a simple table definition in SQL; the DataAccessObject as passed into executeDdl is used for getting details about the table name and other schema details, such as fieldnames and their SQL types.

Listing 13.21. Athlete example infrastructure: DbHelper
package layering

import groovy.sql.Sql
import groovy.text.SimpleTemplateEngine as STE

import org.hsqldb.jdbc.JDBCDataSource

class DbHelper {
  Sql db

  DbHelper() {
    db = new Sql(new JDBCDataSource(
        database: 'jdbc:hsqldb:mem:GinA', user: 'sa', password: ''))
  }

  def simpleTemplate = new STE().createTemplate('''
DROP   TABLE $name    IF EXISTS;
CREATE TABLE $name (
    ${lowname}Id    INTEGER GENERATED BY DEFAULT AS IDENTITY,
$fields
);
''')

  def executeDdl(DataAccessObject dao) {
    def template = simpleTemplate
    def binding = [
        name: dao.tablename,
        lowname: dao.tablename.toLowerCase(),
        fields: dao.schema.collect { key, val ->
          "    ${key.padRight(12)} $val" }.join(",\n")
    ]
    def stmt = template.make(binding).toString()
    db.execute stmt
  }
}

At first glance, this may look like an oversimplification of SQL table definitions, because we don’t have to deal with foreign keys or other constraints, views, joins, and so forth. But it would be easy to expand DbHelper to also cover those scenarios by providing correspondingly amended templates.

Because this class works in collaboration with a DataAccessObject, that’s the next class to implement.

Implementing DataAccessObject

DAOs encapsulate the knowledge of how to do basic CRUD operations with the database, and DataAccessObject is the general superclass that collects common functionality for DAOs. With Groovy SQL, so many operations can be done generally that this superclass grows large in comparison to its subclasses.

In addition to the CRUD operations, DataAccessObject uses the structural information that its subclasses provide through their class names and the getFields method to build the DAOs’ meta-information in a general way.

Subclasses are expected to follow the naming convention of MyTableDAO for a table of name MyTable. Their getFields method is expected to return a list of strings, alternating between the fieldnames and their SQL type descriptions.

The following listing shows how DataAccessObject uses this information to expose the table name, fieldnames, schema, and so forth.

Listing 13.22. Athlete example infrastructure: DataAccessObject
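(What follows is a minimal sketch of the class, written to match the responsibilities described below; the exact code of the printed listing may differ.)

package layering

import groovy.sql.Sql

abstract class DataAccessObject {
  Sql db                                     // injected, for example by the code that owns DbHelper

  /** Subclasses return [fieldname, sqlType, fieldname, sqlType, ...]. */
  abstract List getFields()

  String getTablename() {                    // naming convention: MyTableDAO -> MyTable
    getClass().simpleName - 'DAO'
  }

  String getIdField() {                      // matches the ${lowname}Id column DbHelper generates
    tablename.toLowerCase() + 'Id'
  }

  Map getSchema() {                          // for example [firstname: 'VARCHAR(64)', ...]
    fields.collate(2).collectEntries { name, type -> [name, type] }
  }

  List getFieldnames() {
    schema.keySet().toList()
  }

  def create(List values) {                  // values are assumed to be in getFields order;
    def map = [fieldnames, values].transpose().collectEntries { it }  // missing ones stay null
    def cols = map.keySet().join(', ')
    def marks = map.keySet().collect { '?' }.join(', ')
    db.execute("INSERT INTO $tablename ($cols) VALUES ($marks)".toString(),
        map.values().toList())
  }

  int update(id, field, value) {             // WHERE clause derived from the id property
    db.executeUpdate("UPDATE $tablename SET $field = ? WHERE $idField = ?".toString(),
        [value, id])
  }

  int delete(id) {
    db.executeUpdate("DELETE FROM $tablename WHERE $idField = ?".toString(), [id])
  }

  List all(String orderBy = null) {          // rows come back transparently as maps
    db.rows("SELECT * FROM $tablename ORDER BY ${orderBy ?: idField}".toString())
  }
}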

Note that the CRUD operations work with prepared statements. The update and delete statements both use the id column to identify a row, deriving the appropriate WHERE clause from the DAO’s properties. The creation operation takes a list of values, which it converts into a map by assuming they’re in the same order as the field list provided by the subclass via the getFields method. A single read operation is provided, but because db is available as a property, callers can easily provide their own queries. For this particular application, we don’t need any other read operations anyway.

The all method returns business objects transparently as maps.

Implementing AthleteDAO

With all the hard work already done in DataAccessObject, implementing the AthleteDAO is a breeze. It’s hardly worth an object.

The following listing shows how AthleteDAO needs to do nothing else but subclass DataAccessObject and provide the field information.

Listing 13.23. Athlete example infrastructure: AthleteDAO
package layering

class AthleteDAO extends DataAccessObject {
  List getFields() {
    return [
        'firstname',   'VARCHAR(64)',
        'lastname',    'VARCHAR(64)',
        'dateOfBirth', 'DATE'
    ]
  }
}

If you ever need specialized versions of CRUD operations or elaborate finder methods, such a DAO provides the place to put it in. For simple applications, a DAO is overkill and you can get by without one, as you’ll see in the next section.

13.4.4. Using a transparent domain model

Our application uses transparent business objects, implemented as maps. Contrary to what you might expect, there is no Athlete class.

Of course, if we ever needed one, we could easily create it like this:

class Athlete {
    def firstname
    def lastname
    def dateOfBirth
}

Inside the application, we could create these objects, for example, from an AthleteDAO call like

athletes = athleteDAO.all('firstname').collect{ new Athlete(it) }

The reason for not introducing such business objects is that they currently add no value. All their information (the fieldnames) is already available in the DAO.

The point at which to start using such business objects is when they begin to depend on other objects in the domain layer or when they provide additional behavior, such as specialized methods.

In the next section, you’ll see that simple applications are even easier when using transparent business objects.

13.4.5. Implementing the application layer

The application layer is implemented in the AthleteApplication class. Listing 13.24 reveals that it does little more than call the infrastructure and display the transparent business objects.

The mainLoop method reads the user input from the console, interpreting the first word as the operation and any additional input as parameters. It passes this information to invokeMethod, which automatically dispatches to the corresponding method call. Each keyword is implemented by a method of the same name.

Listing 13.24. Athlete example application layer: AthleteApplication
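(Again, a minimal sketch rather than the printed listing: the dispatch and display code below follows the behavior specified in section 13.4.2 and assumes the DbHelper and AthleteDAO classes shown earlier.)

package layering

class AthleteApplication {
  def helper = new DbHelper()
  def dao = new AthleteDAO(db: helper.db)
  def orderBy

  def init() {
    orderBy = dao.idField                    // sort by the generated ID until told otherwise
    helper.executeDdl(dao)
  }

  def mainLoop() {
    def reader = System.in.newReader()
    def line
    while ((line = reader.readLine()) != null) {
      def words = line.tokenize()
      if (!words) continue
      if (words[0] == 'exit') break
      invokeMethod(words[0], words.tail() as Object[])   // first word picks the method below
      list()
    }
  }

  def create(Object[] args) {
    dao.create(args as List)                 // missing trailing values become null columns
  }

  def update(Object[] args) {
    def count = dao.update(args[0] as Integer, args[1], args[2])
    println "$count row(s) updated"
  }

  def delete(Object[] args) {
    def count = dao.delete(args[0] as Integer)
    println "$count row(s) deleted"
  }

  def sort(Object[] args) {
    orderBy = args.join(', ')                // supports multiple-column sorts
  }

  def list() {
    def athletes = dao.all(orderBy)
    println "${athletes.size()} Athlete(s) in DB:"
    println 'id firstname  lastname     dateOfBirth'
    athletes.each { row ->
      println "${row[0]}: " +                // the generated id is the first column
          "$row.firstname".padRight(11) + "$row.lastname".padRight(13) + "$row.dateOfBirth"
    }
  }
}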

To use the application, you first need to initialize the database before calling the main loop of the class. Because the commands are provided as the method name followed by the arguments, you can tokenize each line and treat it as a method call. Of course, you’d have lots of validation in a real system, but it’s amazing how a functional console interface can be implemented with so little code.

You can see this in action in the following listing, which shows a script that fires up the application, ready for input from the user at the console prompt.

Listing 13.25. Running the Athlete example application
import layering.*

def app = new AthleteApplication()
app.init()
app.mainLoop()

It wasn’t intended originally, but this little application effectively implements a domain-specific language: a simple line-oriented command language for manipulating the Athlete table. This example provides a good way to learn Groovy SQL. It’s worth playing with the given code and expanding it in multiple dimensions: more DAOs, relationships between DAOs (one-to-one, one-to-many), views, more operations, and a more sophisticated UI.

You might wonder how to test our AthleteApplication class. Because it takes input from the console, you can certainly perform manual testing and we’d encourage you to do so. It always pays to do some exploratory testing[19] in addition to running your test suites. But we can create a test for our regression suites too by intercepting the standard input and output streams as shown in the following listing.

19 See www.kaner.com/pdfs/QAIExploring.pdf and James Bach, “What is Exploratory Testing? And How it Differs from Scripted Testing,” Satisfice Inc., http://www.satisfice.com/articles/what_is_et.shtml.

Listing 13.26. Testing the Athlete example application
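(A sketch of such a test, assuming the AthleteApplication from listing 13.24; the assertions rely only on the output format specified in section 13.4.2 and on the first generated ID being 0, as in the earlier transcript.)

package layering

class AthleteApplicationTest extends GroovyTestCase {

  void testCreateAndDelete() {
    def commands = 'create Paul Tergat 1969-06-17\ndelete 0\nexit\n'
    def savedIn = System.in
    def savedOut = System.out
    def captured = new ByteArrayOutputStream()
    try {
      System.setIn(new ByteArrayInputStream(commands.bytes))   // feed scripted user input
      System.setOut(new PrintStream(captured))                 // capture everything printed
      def app = new AthleteApplication()
      app.init()
      app.mainLoop()
    } finally {
      System.setIn(savedIn)                                    // always restore the real streams
      System.setOut(savedOut)
    }
    def output = captured.toString()
    assert output.contains('1 Athlete(s) in DB:')
    assert output.contains('Tergat')
    assert output.contains('1 row(s) deleted')
  }
}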

By now, you should have a good idea of how to possibly organize your code around Groovy SQL to work with relational databases. It has become increasingly popular, however, to also consider nonrelational data stores. Groovy shines here as well.

13.5. Groovy and NoSQL

We don’t have sufficient space to cover the topic of NoSQL databases at an in-depth level. Such information is covered elsewhere.[20] Instead we plan to whet your appetite by diving straight into showing you how easy it is to use a couple of the popular NoSQL databases: MongoDB, a document database, and Neo4j, a graph database. We’ll start with MongoDB.

20 See Seven Databases in Seven Weeks, by E. Redmond and J. R. Wilson (Pragmatic Programmers, 2012) or “NoSQL,” https://en.wikipedia.org/wiki/NoSQL.

13.5.1. MongoDB: A document-style database

Relational database systems like the ones you saw earlier in this chapter store data in (typically) highly normalized two-dimensional tables with rows and columns. Data values are (typically) strictly typed, and tables can be joined into more complex ones. If you can predict your data use ahead of time, it’s relatively straightforward to design appropriate database schemas to allow efficient queries to be carried out.

Document databases take a different tack. Instead of aiming for high levels of normalization, document databases aim to keep related material together in a document or nested map of information. This allows more ad hoc data structures to be used and lends itself to being able to scale in large distributed environments. We’ll examine MongoDB[21] (www.mongodb.org) as our exemplar document database.

21 Another good source is Kyle Banker, MongoDB in Action (Manning Publications, 2011), www.manning.com/banker/. A second edition is to be published this year.

Installing MongoDB

So let’s get started. MongoDB runs as a separate server process. You’ll need to download and install it by following the instructions on the MongoDB website. Once it’s installed, run the mongod executable from the command line. You now have a database server you can talk to. You might want to optionally play with the MongoDB shell to familiarize yourself with some of the database’s features but we’ll move straight on to accessing it from Groovy via its API.

Using MongoDB

MongoDB has a Java API. Thanks to Groovy’s great Java integration, you could use that API directly to interact with your MongoDB server from Groovy, but there’s also a special-purpose Groovy API called GMongo (https://github.com/poiati/gmongo). That’s what we’ll use.

Suppose now we wanted to store our original athlete information in MongoDB. We certainly could mirror the Athlete and Record tables that we used with our relational database, but a more document-style approach is to keep an athlete and his or her runs together as a single document. So we’ll choose to store an athlete as a map of properties. One of those properties is runs, a list of maps, with each map representing one run. Let’s work on a script for entering the athlete information.

Entering information

First, we’ll need to access the GMongo library. We’ll do that using @Grab. We’ll use a field in our script (hence the @Field annotation) to store our reference to the database. For good measure we’ll clean out any earlier versions of an athlete collection before adding Paul Tergat and his Berlin run into an athlete collection.

@Grab('com.gmongo:gmongo:1.3')
import com.gmongo.GMongo
import groovy.transform.Field

@Field db = new GMongo().getDB('athletes')
db.athletes.drop()
db.athletes << [first: 'Paul', last: 'Tergat', dob: '1969-06-17', runs: [
    [distance: 42195, time: 2*60*60 + 4*60 + 55,
        venue: 'Berlin', when: '2003-09-28']
]]

If we want to add lots of athletes and runs (perhaps at different times) it might be useful to define a couple of helper methods:

def insertAthlete(first, last, dob) {
  db.athletes << [first: first, last: last, dob: dob]
}

def insertRun(h, m, s, venue, date, lastname) {
  db.athletes.update(
      [last: lastname],
      [$addToSet: [runs: [distance: 42195,
          time: h * 60 * 60 + m * 60 + s,
          venue: venue, when: date]]]
  )
}

Note that because we’re storing an athlete’s runs with the athlete (all one document, remember) we use an update method and a special $addToSet operator, which together will accumulate any runs we add into the runs property. Here’s how we might use those methods:

insertAthlete('Khalid', 'Khannouchi', '1971-12-22')
insertAthlete('Ronaldo', 'da Costa', '1970-06-07')

insertRun(2,5,38, 'London',  '2002-04-14', 'Khannouchi')
insertRun(2,5,42, 'Chicago', '1999-10-24', 'Khannouchi')
insertRun(2,6,05, 'Berlin',  '1998-09-20', 'da Costa')

This should look very familiar to you. The fact that we split out the two helper methods means that we’ve normalized data entry to some extent.

Given that we might want to add any arbitrary document, it should come as no surprise that GMongo supports mechanisms to enter whole documents not just rows. We’ll look at entering Paula Radcliffe’s information as a JSON document:

import com.mongodb.util.JSON

def radcliffe = """{
  first: 'Paula',
  last: 'Radcliffe',
  dob: '1973-12-17',
  runs: [
      {distance: 42195, time: ${2*60*60+15*60+25},
          venue: 'London', when: '2003-04-13'}
  ]
}"""

db.athletes << JSON.parse(radcliffe)

Now let’s look at forms for querying our athlete information.

Querying our database

First, we’ll check how many athletes we have and then list each athlete we find in the collection:

assert db.athletes.count == 4
db.athletes.find().each {
  println "$it._id $it.last ${it.runs.size()}"
}

For our system this prints out

516b15fc2b10a15fa09331f2 Tergat 1
516b15fc2b10a15fa09331f3 Khannouchi 2
516b15fc2b10a15fa09331f4 da Costa 1
516b15fc2b10a15fa09331f5 Radcliffe 1

If you run this yourself, you’ll no doubt see something similar but with your own unique IDs. Next, let’s confirm the athletes who have run in London:

def londonAthletes = db.athletes.find('runs.venue': 'London')*.first
assert londonAthletes == ['Khalid', 'Paula']

Now we’ll retrieve the first names of the athletes born after 1970 (sorted from youngest to oldest):

def youngAthletes = db.athletes.aggregate(
    [$project: [first: 1, dob: 1]],
    [$match: [dob: [$gte: '1970-01-01']]],
    [$sort: [dob: -1]]
)
assert youngAthletes.results()*.first == ['Paula', 'Khalid', 'Ronaldo']

Let’s look at those steps as a single listing.

Listing 13.27. Athletes stored in MongoDB
@Grab('com.gmongo:gmongo:1.3')
import com.gmongo.GMongo
import com.mongodb.util.JSON
import groovy.transform.Field

@Field db = new GMongo().getDB('athletes')
db.athletes.drop()
db.athletes << [first: 'Paul', last: 'Tergat', dob: '1969-06-17', records: [
        [time: 2 * 60 * 60 + 4 * 60 + 55,
         venue: 'Berlin', when: '2003-09-28']
]]

def insertAthlete(first, last, dob) {
    db.athletes << [first: first, last: last, dob: dob]
}

def insertRecord(h, m, s, venue, date, lastname) {
    db.athletes.update(
            [last: lastname],
            [$addToSet: [records: [time: h * 60 * 60 + m * 60 + s,
                                venue: venue, when: date]]]
    )
}

insertAthlete('Khalid', 'Khannouchi', '1971-12-22')
insertAthlete('Ronaldo', 'da Costa', '1970-06-07')

insertRecord(2, 5, 38, 'London', '2002-04-14', 'Khannouchi')
insertRecord(2, 5, 42, 'Chicago', '1999-10-24', 'Khannouchi')
insertRecord(2, 6, 05, 'Berlin', '1998-09-20', 'da Costa')

def radcliffe = """{
    first: 'Paula',
    last: 'Radcliffe',
    dob: '1973-12-17',
    records: [
        {time: ${2 * 60 * 60 + 15 * 60 + 25},
            venue: 'London', when: '2003-04-13'}
    ]
}"""

db.athletes << JSON.parse(radcliffe)

assert db.athletes.count == 4
db.athletes.find().each {
    println "$it._id $it.last ${it.records.size()}"
}

def londonAthletes = db.athletes.find('records.venue': 'London')*.first
assert londonAthletes == ['Khalid', 'Paula']

def youngAthletes = db.athletes.aggregate(
        [$project: [first: 1, dob: 1]],
        [$match: [dob: [$gte: '1970-01-01']]],
        [$sort: [dob: -1]]
)

assert youngAthletes.results()*.first == ['Paula', 'Khalid', 'Ronaldo']

That sure was a whirlwind tour of MongoDB, but we hope we’ve given you a flavor for the document style of NoSQL database. Next, we’ll look at a graph database.

13.5.2. Neo4J: A graph database

Graph databases store structured data in terms of nodes, edges, and properties. They’re ideally suited to data that’s highly interrelated. Individual nodes use edges to point to data they’re related to. When deep interconnections exist within the data, the extra overheads of storing this additional edge information yield extremely fast query and traversal times compared to more traditional approaches.

We’ll use the popular Neo4j (www.neo4j.org/) database to illustrate a graph database.[22] Let’s start by defining our athlete information. To make it more interesting from a graph point of view we’ll add information about the relationship between marathon records, such as when one record broke an earlier record. Let’s have a look at how we might code this in a script. We’ll cover adding each piece of information before showing you a complete listing.

22 Another good source is Aleksa Vukotic et al., Neo4j in Action (Manning Publications, 2014), www.manning.com/partner/.

Entering information

First we define the necessary @Grab to make the Neo4j library available. We define a variable in our script to point to the database. We’re using Neo4j in its embedded mode so there’s no need for any other kind of installation:

@Grab('org.neo4j:neo4j-kernel:2.1.6')
import org.neo4j.graphdb.*
import org.neo4j.graphdb.factory.GraphDatabaseFactory

def factory = new GraphDatabaseFactory()
def db = factory.newEmbeddedDatabase("marathon")

We also declare an enum to capture the relationships that we’ll be representing within our data:

enum MyRelationshipTypes implements RelationshipType { set, broke }

We’ve chosen to use a bit of metaprogramming magic here to make our remaining code be more succinct:

Node.metaClass {
  propertyMissing { String name, val -> delegate.setProperty(name, val) }
  propertyMissing { String name -> delegate.getProperty(name) }
  methodMissing { String name, args ->
    delegate.createRelationshipTo(args[0], MyRelationshipTypes."$name")
  }
}

With these definitions in place, here’s how we define Paul Tergat and his Berlin record:

def tx = db.beginTx()
def athlete1, record1
try {
  athlete1 = db.createNode()
  athlete1.first = 'Paul'
  athlete1.last = 'Tergat'
  athlete1.dob = '1969-06-17'
  record1 = db.createNode()
  record1.distance = 42195
  record1.time = 2*60*60+4*60+55
  record1.venue = 'Berlin'
  record1.when = '2003-09-28'
  athlete1.set(record1)

  def venue = record1.venue
  def when = record1.when
  println "$athlete1.first $athlete1.last won the $venue marathon on $when"

  tx.success()
} finally {
  tx.close()
  db.shutdown()
}

As in previous examples, we might wish to define helper methods to make additional athlete and run definitions easier. First a little bit more metaprogramming will be useful:

GraphDatabaseService.metaClass {
  createNode { Map properties ->
    def n = delegate.createNode()
    properties.each{ k, v -> n[k] = v }
    n
  }
}
Relationship.metaClass {
  propertyMissing { String name, val -> delegate.setProperty(name, val) }
  propertyMissing { String name -> delegate.getProperty(name) }
}

These will let us use a Map-flavored variant for createNode. Now our helper methods are:

def athlete(db, first, last, dob) {
  db.createNode(first: first, last: last, dob: dob)
}

def record(db, h, m, s, venue, when, athlete) {
  def secs = h * 60 * 60 + m * 60 + s
  def record = db.createNode(time: secs, venue: venue, when: when)
  athlete.set(record)
  record
}

With these definitions in place, we can add the additional athletes:

athlete2 = athlete(db, 'Khalid', 'Khannouchi', '1971-12-22')
record2a = record(db, 2, 5, 38, 'London', '2002-04-14', athlete2)
record2b = record(db, 2, 5, 42, 'Chicago', '1999-10-24', athlete2)

athlete3 = athlete(db, 'Ronaldo', 'da Costa', '1970-06-07')
record3 = record(db, 2, 6, 5, 'Berlin', '1998-09-20', athlete3)

athlete4 = athlete(db, 'Paula', 'Radcliffe', '1973-12-17')
record4a = record(db, 2, 17, 18, 'Chicago', '2002-10-13', athlete4)
record4b = record(db, 2, 15, 25, 'London', '2003-04-13', athlete4)

Querying our database

We can perform queries on the athletes and their relationships as follows:

def allAthletes = [athlete1, athlete2, athlete3, athlete4]
def londonAthletes = allAthletes.findAll { athlete ->
  athlete.getRelationships(MyRelationshipTypes.set).any {
    rel -> rel.getOtherNode(athlete).venue == 'London'
  }
}
assert londonAthletes*.last == ['Khannouchi', 'Radcliffe']

The real power of graph databases comes into play when we have more interesting relationships. Let’s add a broke relationship to indicate that one marathon world record broke an earlier one:

record2b.broke(record3)
record2a.broke(record2b)
record1.broke(record2a)
record4b.broke(record4a)

Figure 13.6 illustrates these relationships graphically.

Figure 13.6. Marathon world records showing edge relationships

With these relationships in place, we can now perform more interesting queries that rely on the graph structure of the data. We could use the same API as earlier, when we determined which athletes had set a record in London, but that API can become a little cumbersome for complex graph algorithms. Fortunately, Neo4j comes with special traversal facilities for exactly this kind of work. Let’s use them to find all records that superseded Ronaldo da Costa’s Berlin world record:

import org.neo4j.graphdb.traversal.*

println "World records following $record3.venue $record3.when:"
for (Path p in db.traversalDescription().breadthFirst().
    relationships(MyRelationshipTypes.broke, Direction.INCOMING).
    evaluator(Evaluators.fromDepth(1)).
    uniqueness(Uniqueness.NONE).
    traverse(record3)) {
  def newRecord = p.endNode()
  println "$newRecord.venue $newRecord.when"
}

Running this yields the following output:

World records following Berlin 1998-09-20:
Chicago 1999-10-24
London 2002-04-14
Berlin 2003-09-28

That took quite a few steps. Let’s see it all together in a single listing.

Listing 13.28. Neo4J marathon database
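(The listing body is a minimal sketch that assembles the snippets above into one script; the printed listing may differ in detail.)

@Grab('org.neo4j:neo4j-kernel:2.1.6')
import org.neo4j.graphdb.*
import org.neo4j.graphdb.factory.GraphDatabaseFactory
import org.neo4j.graphdb.traversal.*

enum MyRelationshipTypes implements RelationshipType { set, broke }

Node.metaClass {
  propertyMissing { String name, val -> delegate.setProperty(name, val) }
  propertyMissing { String name -> delegate.getProperty(name) }
  methodMissing { String name, args ->
    delegate.createRelationshipTo(args[0], MyRelationshipTypes."$name")
  }
}
GraphDatabaseService.metaClass {
  createNode { Map properties ->
    def n = delegate.createNode()
    properties.each { k, v -> n[k] = v }
    n
  }
}
Relationship.metaClass {
  propertyMissing { String name, val -> delegate.setProperty(name, val) }
  propertyMissing { String name -> delegate.getProperty(name) }
}

def athlete(db, first, last, dob) {
  db.createNode(first: first, last: last, dob: dob)
}

def record(db, h, m, s, venue, when, athlete) {
  def rec = db.createNode(time: h * 60 * 60 + m * 60 + s, venue: venue, when: when)
  athlete.set(rec)
  rec
}

def db = new GraphDatabaseFactory().newEmbeddedDatabase('marathon')
def tx = db.beginTx()
try {
  def athlete1 = athlete(db, 'Paul', 'Tergat', '1969-06-17')
  def record1 = record(db, 2, 4, 55, 'Berlin', '2003-09-28', athlete1)

  def athlete2 = athlete(db, 'Khalid', 'Khannouchi', '1971-12-22')
  def record2a = record(db, 2, 5, 38, 'London', '2002-04-14', athlete2)
  def record2b = record(db, 2, 5, 42, 'Chicago', '1999-10-24', athlete2)

  def athlete3 = athlete(db, 'Ronaldo', 'da Costa', '1970-06-07')
  def record3 = record(db, 2, 6, 5, 'Berlin', '1998-09-20', athlete3)

  def athlete4 = athlete(db, 'Paula', 'Radcliffe', '1973-12-17')
  def record4a = record(db, 2, 17, 18, 'Chicago', '2002-10-13', athlete4)
  def record4b = record(db, 2, 15, 25, 'London', '2003-04-13', athlete4)

  record2b.broke(record3)                     // each newer world record breaks an earlier one
  record2a.broke(record2b)
  record1.broke(record2a)
  record4b.broke(record4a)

  def allAthletes = [athlete1, athlete2, athlete3, athlete4]
  def londonAthletes = allAthletes.findAll { a ->
    a.getRelationships(MyRelationshipTypes.set).any {
      it.getOtherNode(a).venue == 'London'
    }
  }
  assert londonAthletes*.last == ['Khannouchi', 'Radcliffe']

  println "World records following $record3.venue $record3.when:"
  for (Path p in db.traversalDescription().breadthFirst().
      relationships(MyRelationshipTypes.broke, Direction.INCOMING).
      evaluator(Evaluators.fromDepth(1)).
      uniqueness(Uniqueness.NONE).
      traverse(record3)) {
    def newRecord = p.endNode()
    println "$newRecord.venue $newRecord.when"
  }
  tx.success()
} finally {
  tx.close()
  db.shutdown()
}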

Such graph-based queries are so common for graph databases that a special Groovy-based DSL called Gremlin[23] has been devised to make writing such queries a bit easier.

23 Gremlin is a DSL for traversing property graphs. See https://github.com/tinkerpop/gremlin.

Using Gremlin

Let’s use Gremlin and perform some similar queries again. First, we add the necessary @Grab commands to load the Gremlin library and the needed import statements:

@Grab('com.tinkerpop.gremlin:gremlin-groovy:2.6.0')
@Grab('com.tinkerpop.blueprints:blueprints-neo4j-graph:2.6.0')
@Grab('com.tinkerpop.blueprints:blueprints-core:2.6.0')
import com.tinkerpop.blueprints.Graph
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph
import com.tinkerpop.gremlin.groovy.Gremlin

Next we need to initialize Gremlin (it enables similar metaprogramming to what we’ve done manually earlier in this section) and then create a Gremlin Graph object, which will let us do our queries:

Gremlin.load()

Graph g = new Neo4jGraph(db)

The expression g.V('venue', 'London') finds all vertices in our graph that have their venue property set to London. We can use that expression to find all the records set in London:

def pretty = { it.collect{ "$it.venue $it.when" }.join(', ') }
def results = []
g.V('venue', 'London').fill(results)
println 'London world records: ' + pretty(results)

For our data this will produce

London world records: London 2002-04-14, London 2003-04-13

We can also find all world records set immediately after (that is, breaking) a world record set in London:

results = []
g.V('venue', 'London').in('broke').fill(results)
println 'World records after London: ' + pretty(results)

For our data, this produces

World records after London: Berlin 2003-09-28

And to see traversal in action, we can find all world records after Ronaldo da Costa’s Berlin world record:

results = []
def berlin98 = { it.venue == 'Berlin' && it.when.startsWith('1998') }
def emitAll = { true }
def forever = { true }
g.V.filter(berlin98).in('broke').
  loop(1, forever, emitAll).fill(results)
println 'World records after Berlin 1998: ' + pretty(results)

For our data, this produces

World records after Berlin 1998: Chicago 1999-10-24, London 2002-04-14, Berlin 2003-09-28

Let’s see that one more time as a complete listing.

Listing 13.29. Neo4J with Gremlin
@Grab('org.neo4j:neo4j-kernel:2.1.6')
@Grab('org.neo4j:neo4j-management:2.1.6')
@Grab('org.neo4j:neo4j-cypher:2.1.6;transitive=false')
@Grab('org.neo4j:neo4j-cypher-commons:2.1.6;transitive=false')
@Grab('org.neo4j:neo4j-cypher-compiler-1.9:2.0.4;transitive=false')
@Grab('org.neo4j:neo4j-cypher-compiler-2.0:2.0.4;transitive=false')
@Grab('org.neo4j:neo4j-cypher-compiler-2.1:2.1.6;transitive=false')
@Grab('org.neo4j:neo4j-lucene-index:2.1.6;transitive=false')
@Grab('org.apache.lucene:lucene-core:3.6.2')
@Grab('com.tinkerpop.gremlin:gremlin-groovy:2.6.0;transitive=false')
@Grab('com.tinkerpop.gremlin:gremlin-java:2.6.0;transitive=false')
@Grab('com.tinkerpop.blueprints:\
blueprints-neo4j2-graph:2.6.0;transitive=false')
@Grab('commons-configuration:commons-configuration:1.6')
@Grab('com.tinkerpop.blueprints:blueprints-core:2.6.0;transitive=false')
@Grab('com.tinkerpop:pipes:2.6.0;transitive=false')
@Grab('org.parboiled:parboiled-scala_2.10:1.1.6;transitive=false')
@Grab('org.parboiled:parboiled-core:1.1.6')
@Grab('org.scala-lang:scala-library:2.10.4')
@Grab('com.googlecode.concurrentlinkedhashmap:\
concurrentlinkedhashmap-lru:1.4.1')
@GrabExclude('junit:junit')
@GrabExclude('org.hamcrest:hamcrest-all')
@GrabExclude('org.mockito:mockito-core')

import com.tinkerpop.blueprints.Graph
import com.tinkerpop.blueprints.impls.neo4j2.Neo4j2Graph
import com.tinkerpop.gremlin.groovy.Gremlin
import static util.Neo4jUtil.*

def db = create()
def tx = null
def athlete1, athlete2, athlete3, athlete4
def record1, record2a, record2b, record3, record4a, record4b

Gremlin.load()

try {
    tx = db.beginTx()

    // create athlete1 .. athlete4
    athlete1 = insertAthlete(db, 'Paul', 'Tergat', '1969-06-17')
    record1 = insertRecord(
        db, 2, 4, 55, 'Berlin', '2003-09-28', athlete1)

    athlete2 = insertAthlete(db, 'Khalid', 'Khannouchi', '1971-12-22')
    record2a = insertRecord(
        db, 2, 5, 38, 'London', '2002-04-14', athlete2)
    record2b = insertRecord(
        db, 2, 5, 42, 'Chicago', '1999-10-24', athlete2)

    athlete3 = insertAthlete(db, 'Ronaldo', 'da Costa', '1970-06-07')
    record3 = insertRecord(
        db, 2, 6, 5, 'Berlin', '1998-09-20', athlete3)

    athlete4 = insertAthlete(db, 'Paula', 'Radcliffe', '1973-12-17')
    record4a = insertRecord(
        db, 2, 17, 18, 'Chicago', '2002-10-13', athlete4)
    record4b = insertRecord(
        db, 2, 15, 25, 'London', '2003-04-13', athlete4)

    record2b.broke(record3)
    record2a.broke(record2b)
    record1.broke(record2a)
    record4b.broke(record4a)

    Graph g = new Neo4j2Graph(db)

    def pretty = { it.collect { "$it.venue $it.when" }.join(', ') }
    def results = []
    g.V('venue', 'London').fill(results)
    println 'London world records: ' + pretty(results)

    results = []
    g.V('venue', 'London').in('broke').fill(results)
    println 'World records after London: ' + pretty(results)

    results = []
    def emitAll = { true }
    def forever = { true }
    def berlin98 = { it.venue == 'Berlin' &&
            it.when.startsWith('1998') }
    g.V.filter(berlin98).in('broke').
            loop(1, forever, emitAll).fill(results)
    println 'World records after Berlin 1998: ' + pretty(results)
    tx.success()
} finally {
    tx?.close()
}

That wraps up our brief tour of graph databases and Neo4j. We’re almost done. On the final pages in this chapter, we want to cover other approaches you might find when persisting data.

13.6. Other approaches

For some time now, language and library providers have been trying to make databases easier to use. There have been many approaches, including several along the lines of object-relational mapping (ORM). In the most general terms, ORM frameworks allow developers to describe their data models, including the relationships, for use in an object-oriented language. The idea is to retrieve data from the database as objects using an object-oriented search facility, manipulate the objects, and then persist any changes back to the database. The ORM system takes care of adding and deleting records in the right order to satisfy constraints, datatype conversions, and similar concerns.

This sounds wonderful, but reality is more complicated than theory, as always. In particular, new databases can often be designed to be “ORM-friendly,” but existing databases are sometimes significantly harder to work with. The situation can become sufficiently complex that the author Ted Neward has referred to ORM as “the Vietnam of computer science.”[24]

24 “The Vietnam of Computer Science,” June 26, 2006, http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx.

There are many different approaches and libraries, both free and commercial, for many different platforms. In the Java world, two of the best-known players in the field are the Java Data Objects (JDO) specification and Hibernate. The latest Enterprise JavaBeans (EJB) specification, via the Java Persistence API (JPA), includes ORM to allow implementation-independent expression of relationships. It remains to be seen how well this independence will work in practice.

As you’ve seen, Groovy provides more object-oriented database access than good-old JDBC, but it doesn’t implement a full-blown ORM solution. Of course, because it integrates seamlessly with Java, any of the solutions available in Java can be used in Groovy too.

Even within the Groovy library, more can be done without crossing the line into full ORM. We expect future versions of Groovy to ship with DataSets that support all CRUD operations, a general DAO implementation, and possibly ready-made Active-Record support.

Beyond the Groovy library, work has gone into a dedicated Groovy ORM: GORM. It builds on Hibernate but relieves the programmer of most of the configuration work by relying on code and naming conventions. GORM is developed as part of the Grails project.
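To give a flavor of what convention over configuration buys you, here is a sketch of a GORM domain class as it might appear in a Grails application. There is no mapping file and no annotations: the table, the columns, and the id and version fields are all derived from the class by convention. The class and constraints below are assumptions for illustration only.

// Sketch of a Grails/GORM domain class (placed under grails-app/domain).
class Athlete {
    String firstname
    String lastname
    Date   dateOfBirth

    static constraints = {        // declarative validation, also by convention
        firstname blank: false
        lastname  blank: false
    }
}

// Elsewhere in the application GORM supplies dynamic finders, for example:
// def paula = Athlete.findByLastname('Radcliffe')
// def all   = Athlete.list()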

Finally, we’d like to emphasize that it would be a misconception to see ORM as the final solution to database programming and to dismiss all other approaches. ORM is targeted at providing object persistence and transaction support. It tries to shield you from the relational model (to some extent). When selecting an ORM solution, make sure it allows you to exploit the relational model. Otherwise, you’re losing most of the power that you paid your database vendor for.

We find the Groovy SQL approach appealing: it provides a good means of working with the relational model, with an almost ORM-like feeling for simple cases, while keeping all statements under programmatic control.
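As a small reminder of what that programmatic control looks like, consider the following sketch. The SQL stays visible in the code, yet the GString parameter is turned into a prepared-statement placeholder rather than being concatenated into the query. The in-memory HSQLDB URL and the athlete table mirror this chapter's examples and are assumptions here.

import groovy.sql.Sql

def db = Sql.newInstance('jdbc:hsqldb:mem:athletes', 'sa', '',
                         'org.hsqldb.jdbcDriver')              // assumed connection details
def lastname = 'Radcliffe'
// The GString value becomes a PreparedStatement parameter, not string concatenation.
def row = db.firstRow("SELECT firstname, lastname FROM athlete WHERE lastname = $lastname")
println "${row.firstname} ${row.lastname}"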

Before leaving this section, we would be remiss if we didn't mention the Spring Data project (www.springsource.org/spring-data). This project aims to provide an overarching framework that makes it easier for Spring-based applications to use relational and nonrelational databases, MapReduce frameworks, and cloud-based data services. Groovy integrates well with Spring, so you may wish to consider using Spring Data from your Groovy or Grails applications to smooth over some of the differences between the various persistence options.


13.7. Summary

In this chapter, we've shown you that Groovy has considerable support for database programming within its standard library. Groovy SQL is available wherever Groovy is; you don't need to install any additional modules. It's also easy to integrate with the many available NoSQL databases, either via their Java support or, in some cases, via special Groovy functionality.

Groovy SQL is made from a small set of classes that build on JDBC and make it Groovy-friendly. Important features are as follows:

  • Minimal setup for database access
  • Simple execution of SQL statements
  • Improved reliability through automatic, transparent resource handling (DataSource, Connection, Statement, ResultSet)
  • Easy transparent use of prepared statements with GStrings
  • Convenience with DataSets (adding, nested filtering with expressions)
  • Transparent DTOs
  • Optionally transparent DAOs and business objects

The filtering available in the DataSet class is particularly interesting because closures are understood not only as blocks of code but also as abstract syntax trees. This allows query logic to be expressed in a manner familiar to the developer without the potentially huge inefficiency of retrieving all the data from the database and filtering it within the application, as the sketch below shows.
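Here is a compressed sketch of that mechanism; the connection details and the athlete table mirror this chapter's examples and are assumptions here. The findAll closure is never run against in-memory rows: its syntax tree is translated into a WHERE clause that executes inside the database.

import groovy.sql.Sql

def db = Sql.newInstance('jdbc:hsqldb:mem:athletes', 'sa', '',
                         'org.hsqldb.jdbcDriver')              // assumed connection details
def athletes = db.dataSet('athlete')

// The closure below is inspected as an AST and becomes "... WHERE lastname = ?".
def radcliffes = athletes.findAll { it.lastname == 'Radcliffe' }
radcliffes.each { println "$it.firstname $it.lastname" }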

You’ve seen how an example application can be written with the help of Groovy SQL so that the code organization fits into architectural layers and database programming patterns with little work.

Although Groovy doesn’t provide any true ORM facilities, it integrates well with existing solutions; and where the full complexities of ORM aren’t required, the facilities provided above and beyond straight JDBC can help tremendously.

Groovy doesn't come with NoSQL drivers out of the box, but appropriate drivers, and often dedicated Groovy support, are readily available for many of the popular NoSQL databases.
