Chapter 10. Analytics-on-read
This chapter covers
- Analytics-on-read versus analytics-on-write
- Amazon Redshift, a horizontally scalable columnar database
- Techniques for storing and widening events in Redshift
- Some example analytics-on-read queries
Up to this point, this book has largely focused on the operational mechanics of a unified log. When we have performed any analysis on our event streams, this has been primarily to put technology such as Apache Spark or Samza through its paces, or to highlight the various use cases for a unified log.
Part 3 of this book sees a change of focus: we will take an analysis-first look at the unified log, leading with the two main methodologies for unified log analytics and then applying various database and stream processing technologies to analyze our event streams.
What do we mean by unified log analytics? Simply put, unified log analytics is the examination of one or more of our unified log’s event streams to drive business value. It covers everything from detection of customer fraud, through KPI dashboards for busy executives, to predicting breakdowns of fleet vehicles or plant machinery. Often the consumer of this analysis will be a human or humans, but not necessarily: unified log analytics can just as easily drive an automated machine-to-machine response.