8 Approximate quantiles on data streams

 

This chapter covers

  • Reviewing the concept of exact quantiles and understanding constraints imposed by streaming data context
  • Understanding different types of errors for approximate quantiles
  • Applying t-digest and q-digest algorithms to a data stream
  • Comparing t-digest and q-digest on realistic data on length of visits to a website

Different algorithms presented in the previous chapter allow us to select an (un)biased sample from all data-tuples that have arrived up to the current moment. In a way, a sample is a very flexible datasketch: you form it once, and you can then use it to claim that its mean, or any other feature, is a good estimate of that same feature of all the data from the stream so far. Remember why we moved from Bernoulli sampling to sampling procedures that keep a sample of fixed size: because the Bernoulli sample grows with the number of elements seen and in combination with streaming data, it is just not practical.

8.1 Exact quantiles

 
 
 

8.2 Approximate quantiles

 
 
 
 

8.2.1 Additive error

 
 

8.2.2 Relative error

 
 

8.2.3 Relative error in the data domain

 
 
 
 

8.3 T-digest: How it works

 
 
 

8.3.1 Digest

 
 
 

8.3.2 Scale functions

 
 
 

8.3.3 Merging t-digests

 

8.3.4 Space bounds for t-digest

 
 
 

8.4 Q-digest

 
 

8.4.1 Constructing a q-digest from scratch

 
 

8.4.2 Merging q-digests

 
 

8.4.3 Error and space considerations in q-digests

 
 

8.4.4 Quantile queries with q-digests

 
sitemap

Unable to load book!

The book could not be loaded.

(try again in a couple of minutes)

manning.com homepage
test yourself with a liveTest