Mondrian in Action

This is an excerpt from Manning's book Mondrian in Action.

Although the performance gains vary significantly based on the structure of the data warehouse, the use of aggregate tables and in-memory caching can dramatically increase performance. For example, one user had a fact table with several hundred million rows and eight dimension tables with up to 25 million rows. Running reports directly using SQL took about 10 minutes each. Adding Mondrian with aggregations dropped the time to just over 8 seconds, and with caching these queries dropped to 2.4 seconds on average. Figure 1.10 illustrates the significant gains that can be made by using Mondrian (more than 100 times faster in this example).

Once you have the database running as fast as possible, the next step is to tune Mondrian’s performance as shown in figure 7.4. There are two major ways to improve Mondrian performance. The first is to use aggregate tables, which is covered here. The second is to use in-memory caching, which is covered in the next section.

Figure 7.4. Evaluate Mondrian

Analytics databases often contain millions of records, because you want to store data at the lowest grain that might yield useful analysis. But an analyst will likely be interested in higher-level analysis. For example, analysts for Adventure Works might generally want to see how parts are selling at the monthly level, but they still want the ability to drill down to lower levels of detail to see specific days or customer orders. This section will describe aggregate tables and show how they’re implemented in Mondrian 4.

Aggregating data across millions of records can be slow for even the fastest hardware and database, so Mondrian allows you to specify aggregate tables that hold data precalculated at a higher level of aggregation. Then, when analysis is being done, Mondrian can answer higher-level queries from the aggregate table and get the finer details from the detailed fact table. All of the data is available at the level needed, but the performance is much better. Figure 7.5 shows the relationship between the aggregate and detailed fact tables.
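To make that relationship concrete, here is a rough SQL sketch; the table and column names (sales_fact, time_by_day, agg_sales_month, and so on) are illustrative assumptions, not definitions from the book. Both queries return the same month-level totals, but the second reads a much smaller, precomputed table instead of scanning every detail row at query time.

  -- Month-level totals computed from the detailed fact table:
  -- every underlying row is scanned and grouped at query time.
  SELECT t.the_year, t.the_month, SUM(f.sales_amount) AS sales_amount
  FROM sales_fact f
  JOIN time_by_day t ON t.time_id = f.time_id
  GROUP BY t.the_year, t.the_month;

  -- The same totals read from an aggregate table that was
  -- precalculated at the month grain during ETL.
  SELECT the_year, the_month, sales_amount
  FROM agg_sales_month;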

The preceding simple example shows that many different types of aggregate tables can be created. If you have a large fact table with many dimensions and facts, you could create hundreds or thousands of aggregate tables. So how should you decide which aggregate tables to create?

It’s tempting to create as many aggregate tables as possible, but this is not recommended, for several reasons. For one, the ETL process grows longer as more aggregate tables are created and need to be populated. One of the reasons for using a ROLAP tool is to shorten the time needed to move data from OLTP systems to OLAP. Aggregate tables are a step in the direction of “pure” OLAP, where intersections are precalculated, and that precalculation can take hours before data is available for analysis.

Aggregate tables also take up space in the analytics database and possibly in backups of the database. The additional storage in backups can be avoided by storing only the detailed facts and re-creating the aggregate tables during restoration, but this slows down the restoration process and makes it more complex.

Because it’s undesirable to create all the possible aggregates, careful performance testing can help you determine which ones will be most useful. If you have a good performance test environment and you know the common queries, you can find out which queries could use some performance help. Start with the queries, create an aggregate, and see what impact that has on performance. There are two nice benefits to this approach. First, it’s fairly easy to create and populate aggregate tables. Second, aggregates can be added after the fact to speed up slow queries, so it’s not essential to create all the aggregates you might eventually want the first time around.
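One simple way to run this kind of test is to time the candidate rollup query directly in the database before and after the aggregate exists. The sketch below assumes PostgreSQL’s EXPLAIN ANALYZE and reuses the illustrative table names from the earlier example; any database’s query-timing facility would serve the same purpose.

  -- Time the slow rollup against the detailed fact table.
  EXPLAIN ANALYZE
  SELECT t.the_year, t.the_month, SUM(f.sales_amount)
  FROM sales_fact f
  JOIN time_by_day t ON t.time_id = f.time_id
  GROUP BY t.the_year, t.the_month;

  -- After building agg_sales_month (see section 7.3.1), time the
  -- equivalent read from the aggregate and compare the two plans.
  EXPLAIN ANALYZE
  SELECT the_year, the_month, sales_amount
  FROM agg_sales_month;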

Now let’s turn to Mondrian’s second significant performance tuning feature: caching. Whereas aggregate tables reduce the amount of data read from the database, caching can eliminate reads entirely by storing data in RAM. Because even the fastest read from the disk or the network is going to be thousands of times slower than in-memory reads, this can lead to another order-of-magnitude gain in performance.

7.3.1. Creating aggregate tables

The physical aggregate tables are created in the database and populated as part of the ETL process. There’s nothing special about the data in aggregate tables for Mondrian. Mondrian simply uses the data in the aggregate table when it has been configured to do so and the query can be answered by the aggregate table. It’s up to the ETL creator and database designer to properly populate the aggregate table.
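As a concrete illustration, the following sketch shows the kind of DDL and load statement an ETL job might run to build a month-grain aggregate. The table and column names (agg_sales_month, sales_fact, time_by_day) are assumptions carried over from the earlier examples, not definitions from the book; the fact_count column is included because aggregate tables conventionally carry a count of the detail rows rolled into each aggregate row so that average-type measures can still be computed correctly.

  -- Create the aggregate table at the month grain.
  CREATE TABLE agg_sales_month (
      the_year      INTEGER NOT NULL,
      the_month     VARCHAR(12) NOT NULL,
      sales_amount  DECIMAL(18,2) NOT NULL,
      fact_count    INTEGER NOT NULL  -- number of detail rows rolled into each row
  );

  -- Populate (or refresh) it from the detailed fact table as an ETL step.
  INSERT INTO agg_sales_month (the_year, the_month, sales_amount, fact_count)
  SELECT t.the_year, t.the_month, SUM(f.sales_amount), COUNT(*)
  FROM sales_fact f
  JOIN time_by_day t ON t.time_id = f.time_id
  GROUP BY t.the_year, t.the_month;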

Enabling aggregate tables