9 Case study 2 solution


This section covers

  • Measuring statistical significance
  • Permutation testing
  • Manipulating tables using Pandas

We’ve been asked to analyze the online ad-click data collected by our buddy Fred. His advertising data table monitors ad clicks across 30 different colors. Our aim is to discover an ad color that generates significantly more clicks than blue. We will do so by following these steps:

  1. Load and clean our advertising data using Pandas.
  2. Run a permutation test between blue and the other recorded colors.
  3. Check the computed p-values for statistical significance using a properly determined significance level.

Warning

Spoiler alert! The solution to case study 2 is about to be revealed. I strongly encourage you to try to solve the problem before reading the solution. The original problem statement is available for reference at the beginning of the case study.
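Steps 2 and 3 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's own implementation. `permutation_test` is a hypothetical helper that pools the daily click counts of two colors, shuffles them many times, and reports how often a shuffled split produces an absolute difference in means at least as large as the observed one. That fraction is a two-sided p-value. For step 3, the sketch assumes a Bonferroni correction (the desired level divided by the number of tests) as the way of setting the significance level. It also assumes that blue is one of the 30 colors, which leaves 29 comparisons. The click counts in the usage example are invented for illustration only.

```python
import numpy as np


def permutation_test(sample_a, sample_b, num_permutations=10_000, seed=0):
    """Hypothetical helper: two-sided permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    observed = abs(sample_a.mean() - sample_b.mean())

    # Pool both samples; under the null hypothesis the color labels are exchangeable.
    pooled = np.concatenate([sample_a, sample_b])
    n_a = len(sample_a)

    count = 0
    for _ in range(num_permutations):
        shuffled = rng.permutation(pooled)
        # Split the shuffled pool back into two groups of the original sizes.
        diff = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
        if diff >= observed:
            count += 1
    # Fraction of shuffles at least as extreme as the observed difference.
    return count / num_permutations


def bonferroni_level(alpha, num_tests):
    """Assumed step 3: Bonferroni-corrected significance level."""
    return alpha / num_tests


if __name__ == "__main__":
    # Invented daily click counts, for illustration only.
    blue_clicks = [30, 32, 29, 35, 31]
    other_clicks = [33, 36, 34, 38, 35]
    p_value = permutation_test(blue_clicks, other_clicks)
    # Assumes blue is one of 30 colors, so 29 comparisons against blue.
    threshold = bonferroni_level(0.05, num_tests=29)
    print(f"p-value: {p_value:.4f}, threshold: {threshold:.5f}")
    print("significant" if p_value < threshold else "not significant")
```

Because the question asks whether a color beats blue, a one-sided variant that compares the signed difference (other color minus blue) is also reasonable. The two-sided version shown here is the more conservative choice.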

9.1 Processing the ad-click table in Pandas

Let’s begin by loading our ad-click table into Pandas. Then we check the number of rows and columns in the table.

Listing 9.1 Loading the ad-click table into Pandas
import pandas as pd

# Load the ad-click table and report its dimensions
df = pd.read_csv('colored_ad_click_table.csv')
num_rows, num_cols = df.shape
print(f"Table contains {num_rows} rows and {num_cols} columns")

Table contains 30 rows and 41 columns

Our table contains 30 rows and 41 columns. Each row should correspond to an individual color, with the columns holding that color’s clicks per day and views per day. Let’s confirm by checking the column names.
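Here is a minimal sketch of that column check. It runs on a small stand-in table because the real CSV is not reproduced here. The three colors, the two days, and the column names `Color`, `Click Count: Day N`, and `View Count: Day N` are hypothetical assumptions chosen for illustration. Substitute whatever names `df.columns` actually reports. `DataFrame.filter(regex=...)` then separates the click columns from the view columns.

```python
import pandas as pd

# Hypothetical stand-in for colored_ad_click_table.csv: 3 colors x 2 days.
# The column names below are assumptions, not the real file's headers.
df = pd.DataFrame({
    "Color": ["Blue", "Red", "Green"],
    "Click Count: Day 1": [10, 12, 9],
    "Click Count: Day 2": [11, 14, 8],
    "View Count: Day 1": [100, 100, 100],
    "View Count: Day 2": [100, 100, 100],
})

# Inspect the column names to see what each column holds.
print(df.columns.tolist())

# Each row should be a distinct color.
print("One row per color:", df["Color"].is_unique)

# Index by color, then split the daily click and view columns by name prefix.
by_color = df.set_index("Color")
clicks = by_color.filter(regex=r"^Click Count")
views = by_color.filter(regex=r"^View Count")

# Daily clicks for a single color, e.g. blue.
print(clicks.loc["Blue"].tolist())
```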

9.2 Computing p-values from differences in means

9.3 Determining statistical significance

9.4 41 shades of blue: A real-life cautionary tale

Summary