8 Practical considerations


This chapter covers

  • Dealing with data that does not match statistical assumptions
  • Identifying biases that may creep into experiments
  • Avoiding behaviors that generate false positives
  • Replicating experiments to validate that their results are robust

The experimentation methods presented in this book are powerful tools for improving your engineered system, but they are not foolproof. Subtle or simple mistakes can cause them to fail.

This chapter discusses the various ways in which your author, along with colleagues kind enough to sit for interviews for this book, has seen these methods fail. You could read this chapter as a set of warning labels for experimental optimization.

Section 8.1 shows how the analysis of an experiment can fail if the measurements do not meet the assumptions of the analysis. Perhaps you’ve heard the phrase “garbage in, garbage out.” This is that. In sections 8.2 and 8.3, we look at early stopping and family-wise error, two sources of increased false positives. Section 8.4 discusses common psychological and methodological biases to watch out for. Finally, section 8.5 explains how replicating experiments boosts confidence in their results.

8.1 Violations of statistical assumptions

8.1.1 Violation of the iid assumption

8.1.2 Nonstationarity

8.2 Don’t stop early

8.3 Control family-wise error

8.3.1 Cherry-picking increases the false-positive rate

8.3.2 Control false positives with the Bonferroni correction

8.4 Be aware of common biases

8.4.1 Confounder bias

8.4.2 Small-sample bias

8.4.3 Optimism bias

8.4.4 Experimenter bias

8.5 Replicate to validate results

8.5.1 Validate complex experiments