
7 Function Approximation: How Neural Networks Model the World


Computing to date has been dominated by the von Neumann architecture, in which the processor and the program are separate: the program sits in memory, from where it is fetched and executed by the processor. The advantage is that completely different programs, solving totally unrelated problems, can be loaded into memory and executed by the same processor. Neural networks have a fundamentally different architecture. There are no separate processors and programs; there is a single entity called, well, the neural network. In this chapter we will study this paradigm in detail.
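To make the contrast concrete, consider the perceptron that Section 7.2 introduces: a weighted sum of the inputs passed through the Heaviside step function. The following is a minimal sketch in Python with NumPy, with weights hand-picked purely for illustration. It shows that the same fixed computation realizes logical AND or logical OR depending only on the weights loaded into it; the weights play the role that the program plays in a von Neumann machine.

    import numpy as np

    def perceptron(x, w, b):
        # Weighted sum followed by the Heaviside step (Section 7.2.1):
        # output 1 if w . x + b >= 0, else 0.
        return 1 if np.dot(w, x) + b >= 0 else 0

    # The computation above never changes; only the weights do.
    w_and, b_and = np.array([1.0, 1.0]), -1.5  # hand-picked: behaves as AND
    w_or,  b_or  = np.array([1.0, 1.0]), -0.5  # hand-picked: behaves as OR

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(x, w_and, b_and), perceptron(x, w_or, b_or))

Loading different weights is the neural-network analogue of loading a different program; Section 7.2.4 works through such gate constructions in detail.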

In Section 1.7 we saw an overview of neural networks. There we indicated that most intelligent tasks performed by humans can be expressed in terms of mathematical functions. While that gives us hope of developing automated solutions, we are hobbled by two serious difficulties.

  • In addition to being arbitrarily complicated, the functions underlying different problems are completely different; hardly any common pattern exists.
  • For most problems, we do not know the underlying function.

The first issue makes it challenging to come up with a mechanized, repeatable solution for performing generic intelligent tasks: if we had to start from scratch and estimate the underlying function every time we needed to solve a problem, there would be little hope of imparting human-like intelligence to a machine. The second issue compounds the first: since the function is unknown, it cannot simply be written down; it has to be approximated. Neural networks, as the rest of this chapter shows, address both difficulties with a single generic architecture that can approximate virtually any function.
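As a preview of Sections 7.3 and 7.3.1, here is a minimal sketch (again Python with NumPy, weights hand-picked for illustration, not learned) of a two-layer network of perceptrons that computes logical XOR, a function that, as we will see, no single perceptron can represent.

    import numpy as np

    def step(z):
        # Heaviside step activation, applied element-wise (Section 7.2.1).
        return (z >= 0).astype(int)

    def mlp_xor(x):
        # Hidden layer: two perceptrons computing OR and AND of the inputs.
        W1 = np.array([[1.0, 1.0],   # OR unit
                       [1.0, 1.0]])  # AND unit
        b1 = np.array([-0.5, -1.5])
        h = step(W1 @ x + b1)
        # Output layer: OR-but-not-AND, which is exactly XOR.
        return step(np.array([1.0, -1.0]) @ h - 0.5)

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, "->", mlp_xor(np.array(x)))
    # Prints 1 exactly for (0, 1) and (1, 0).

Stacking perceptrons into layers in this way is what gives multi-layer perceptrons their expressive power; Cybenko's universal approximation theorem (Section 7.4.3) makes that claim precise.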

7.1 Most real-world problems can be expressed as functions

7.1.1 Logical functions in real-world problems

7.1.2 Classifier functions in real-world problems

7.1.3 General functions in real-world problems

7.2 The basic building block aka the neuron: the perceptron

7.2.1 Heaviside Step Function

7.2.2 Hyperplanes

7.2.3 Perceptrons and Classification

7.2.4 Modeling common logic gates with perceptrons

7.3 Towards more expressive power: Multi-Layer Perceptrons (MLPs)

7.3.1 MLP for logical XOR

7.4 Layered networks of Perceptrons aka MLPs aka Neural Networks

7.4.1 Layering

7.4.2 All logical functions can be modeled with MLPs

7.4.3 Cybenko’s Universal Approximation Theorem

7.4.4 MLPs for polygonal decision boundaries

7.5 Chapter Summary