chapter ten

10 Your data under a different lens: window functions

 

This chapter covers

  • Window functions and what kind of data transformation they enable.
  • Summarizing, ranking, and analyzing data using the different classes of window functions.
  • Building static, growing, and unbounded windows to your functions.
  • Apply UDF to windows as custom window functions.

When performing data analysis or feature engineering (which is my favorite part of machine learning!—see chapter 12), nothing gets me happy quite like window functions. When you take a first glance at them, they look like a watered-down version of the split-apply-combine introduced in chapter 9. Then you open the blinds and bam! powerful manipulations in a short, expressive body of code.

Those who don’t know window functions are bound to reimplement its functionality, poorly. This has been my experience coaching data analysts, scientists, and engineers. If you find yourself struggling to

  1. Rank records
  2. Identify the top/bottom record according to a predicate
  3. Get a value from a previous observation
  4. Build trended features (meaning features that summarize past observations, such as the average of the observations for the previous week).

you will find that window functions will multiply your productivity and simplify your code.

10.1 Growing and using a simple window function

10.1.1 Identifying the coldest day of each year, the long way

10.1.2 Creating and using simple window function to get the coldest days

10.1.3 Comparing both approaches

10.2 Beyond summarizing: using ranking and analytical functions

10.2.1 Ranking functions: quick, who’s first?

10.2.2 Analytic functions: looking back, peeking ahead

10.3 Flex those windows!; using row and range boundaries

10.3.1 Counting, window-style: static, growing, unbounded

10.3.2 What you are vs. where you are: range versus rows

10.4 Going full circle: using UDF within windows

10.5 Summary