17 Design a dashboard of top 10 products on Amazon by sales volume
- Scaling an aggregation operation on a large data stream
- Using a Lambda architecture for fast approximate results and slow accurate results
- Using Kappa architecture as an alternative to Lambda architecture
- Approximating an aggregation operation for faster speed
Analytics is a common discussion topic in system design interviews. We will always log certain network requests and user interactions, and we will perform analytics on the data we collect.
A dashboard that solves the Top K Problem (also called Heavy Hitters) is a common type of analytics dashboard. Based on how popular or unpopular certain products are, we can decide to promote or discontinue them. Such decisions may not be straightforward. For example, if a product is unpopular, we may decide either to discontinue it to save the costs of selling it or to spend more resources promoting it to increase its sales.
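To make the problem concrete, here is a minimal sketch of an exact top K computation over a small in-memory list of sales events. The event shape (`product_id`, `quantity`), the function name `top_k_products`, and the tie-breaking rule are assumptions made for illustration. The sketch ignores the scale concerns that the rest of the chapter addresses.

```python
from collections import Counter
import heapq


def top_k_products(sales_events, k=10):
    """Return the k products with the highest total sales volume.

    sales_events: iterable of (product_id, quantity) pairs (assumed format).
    """
    # Hash table mapping each product ID to its total quantity sold.
    counts = Counter()
    for product_id, quantity in sales_events:
        counts[product_id] += quantity

    # Select the k largest totals with a heap; ties are broken by product ID
    # so that the result is deterministic.
    return heapq.nlargest(k, counts.items(), key=lambda item: (item[1], item[0]))
```

With a single host and data that fits in memory, this approach is exact and simple. The difficulty discussed in this chapter arises when the event stream is too large for one host.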
The Top K Problem often comes up in an interview when discussing analytics, or it may be a standalone interview question. It can take many forms. Some examples of the Top K Problem include
17.1 Requirements
17.2 Initial thoughts
17.3 Initial high-level architecture
17.4 Aggregation service
17.4.1 Aggregating by product ID
17.4.2 Matching host IDs and product IDs
17.4.3 Storing timestamps
17.4.4 Aggregation process on a host
17.5 Batch pipeline
17.6 Streaming pipeline
17.6.1 Hash table and max-heap with a single host
17.6.2 Horizontal scaling to multiple hosts and multi-tier aggregation
17.7 Approximation
17.7.1 Count-min sketch
17.8 Dashboard with Lambda architecture
17.9 Kappa architecture approach
17.9.1 Lambda vs. Kappa architecture
17.9.2 Kappa architecture for our dashboard
17.10 Logging, monitoring, and alerting