
3 Inside machine brains


This chapter covers

  • How neural networks turn data into usable structure
  • The difference between learning and memorization in practice
  • How architecture shapes model capability
  • Why language resists simple machine learning solutions

AI systems now produce coherent outputs across a wide range of tasks, even when those tasks differ significantly in structure. This raises an immediate question: what kind of system can behave coherently and adaptively across so many different problems? Earlier machine learning approaches could already learn from data: they could detect patterns, classify inputs, and make predictions from past examples. But their ability to combine many interacting signals into coherent structure was limited. As problems grew more complex, involving high-dimensional data and relationships that unfold across multiple steps, these models struggled to capture the structure required.

Neural networks emerged to address this limitation. Instead of learning a single set of relationships, they allow patterns to be transformed and recombined through successive stages of computation, making it possible to build internal representations that reflect increasingly complex structure in the data. What distinguishes them is not that they learn from data, but how much structure they can accumulate through these transformations, combining many interacting signals into a single response.
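The idea of successive transformations can be sketched in a few lines of code. This is a minimal illustration, not a trained model: the weights are random, and the sizes (4 inputs, 8 hidden signals, 3 outputs) are arbitrary choices made for the example. Each stage recombines the signals from the previous stage and passes them through a simple nonlinearity, so the second representation is built on top of the first.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Nonlinearity: keep positive signals, zero out the rest
    return np.maximum(0.0, x)

x = rng.normal(size=4)        # input: 4 interacting signals
W1 = rng.normal(size=(8, 4))  # first stage: recombine into 8 signals
W2 = rng.normal(size=(3, 8))  # second stage: recombine into 3 signals

h = relu(W1 @ x)  # first internal representation
y = relu(W2 @ h)  # structure built on top of structure

print(y.shape)
```

Stacking more stages of the same operation is what lets the network accumulate increasingly complex structure; the following sections examine each ingredient of this computation in turn.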

3.1 What is a neural network?

3.1.1 Neurons as a web of mathematics

3.1.2 Layers that shape information

3.1.3 Weighing what is important

3.1.4 Shaping the neuron’s response

3.1.5 Tracing the journey

3.1.6 Becoming the dominant paradigm

3.2 How do machines learn?

3.2.1 Learning from mistakes

3.2.2 Tracing the path of error

3.2.3 Nudging in the right direction

3.2.4 How rules emerge from examples

3.2.5 What the network learns

3.3 Why learning is not memorizing

3.3.1 When training becomes memorization

3.3.2 Learning under constraint

3.3.3 Knowing when to stop

3.3.4 Augmenting the data

3.3.5 Intelligence in abstraction

3.4 One concept, many architectures

3.4.1 The limits of dense networks

3.4.2 Convolutional networks: structure in space

3.4.3 Recurrent networks: structure in sequences

3.4.4 LSTMs and GRUs: unlocking long-term memory

3.4.5 Autoencoders: compressing the world

3.4.6 Generative architectures: learning to create

3.4.7 Architecture as a form of intelligence

3.5 Specific approaches to language