chapter seventeen

17 Design a dashboard of top 10 products on Amazon by sales volume

This chapter covers

Scaling an aggregation operation on a large data stream
Using a Lambda architecture for fast approximate results and slow accurate results
Using Kappa architecture as an alternative to Lambda architecture
Approximating an aggregation operation for greater speed

Analytics is a common discussion topic in a system design interview. We will always log certain network requests and user interactions, and we will perform analytics based on the data we collect.

A dashboard that solves the Top K Problem (Heavy Hitters) is a common type of analytics dashboard. Based on the popularity or lack thereof of certain products, we can make decisions to promote or discontinue them. Such decisions may not be straightforward. For example, if a product is unpopular, we may decide either to discontinue it to save the costs of selling it or to spend more resources promoting it to increase its sales.

The Top K Problem often comes up in an interview when discussing analytics, or it may be a standalone interview question. It can take many forms. Some examples of the Top K Problem include

17.1 Requirements

17.2 Initial thoughts

17.3 Initial high-level architecture

17.4 Aggregation service

17.4.1 Aggregating by product ID

17.4.2 Matching host IDs and product IDs

17.4.3 Storing timestamps

17.4.4 Aggregation process on a host

17.5 Batch pipeline

17.6 Streaming pipeline

17.6.1 Hash table and max-heap with a single host

17.6.2 Horizontal scaling to multiple hosts and multi-tier aggregation

17.7 Approximation

17.7.1 Count-min sketch

17.8 Dashboard with Lambda architecture

17.9 Kappa architecture approach

17.9.1 Lambda vs. Kappa architecture

17.9.2 Kappa architecture for our dashboard

17.10 Logging, monitoring, and alerting