
5 The birth of information theory: Shannon and the mathematics of uncertainty

 

This chapter covers

  • Claude Shannon’s A Mathematical Theory of Communication (1948), which defined information as measurable uncertainty and founded modern communication theory
  • His introduction of entropy, expressed in bits, as a precise measure of uncertainty and information
  • His intended contributions to communication—capacity, redundancy, coding, and noise—that solved urgent engineering problems
  • The unintended impact of entropy and mutual information on statistics, data science, and artificial intelligence
  • Modern applications of entropy: decision trees, random forests, feature selection, clustering, neural networks, and representation learning

So far, we have followed the problem of uncertainty through the minds of some of its greatest interpreters. Bayes showed how belief could be updated as new evidence arrived, giving us a rule for reasoning under incomplete information. Fisher reframed uncertainty in terms of estimation, developing maximum likelihood as a principled way to identify the parameters that best explain observed data. Neyman and Pearson formalized hypothesis testing, offering a framework for separating real signals from random noise while balancing the risks of error. Each step extended our ability to reason reliably when certainty was out of reach.

5.1 Primers on information and entropy

5.1.1 Information: from meaning to uncertainty

5.1.2 Entropy: quantifying uncertainty

5.2 Shannon’s framework of communication (intended contribution)

5.2.1 Channel capacity

5.2.2 Noisy channel coding

5.2.3 Source coding (compression)

5.2.4 Signal processing and modulation

5.2.5 Why it mattered—and where it led

5.3 Entropy and information gain in data partitioning

5.3.1 Decision trees

5.3.2 Random forests

5.3.3 Feature selection

5.3.4 Clustering

5.4 Entropy and uncertainty reduction in deep learning

5.4.1 Neural networks

5.4.2 Representation learning

5.5 From communication to universal uncertainty

5.6 Summary