16 Gradient descent: Optimization problems (not just) on graphs


This chapter covers

  • Developing a randomized heuristic to find the minimum crossing number
  • Introducing cost functions to show how the heuristic works
  • Explaining gradient descent and implementing a generic version
  • Discussing strengths and pitfalls of gradient descent
  • Applying gradient descent to the graph embedding problem

If I mention a technique called gradient descent, does it ring a bell? You might not have heard of it, or maybe you recognize the name but can’t quite recall how it works. If so, that’s fine. If, however, I ask you about machine learning, classification problems, or neural networks, chances are that you know exactly what I’m talking about, and I bet those terms piqued your interest quite a bit more.

Well, gradient descent (or a variation on the theme) is the optimization technique that is used behind the scenes by many machine-learning algorithms. But did you know that long before being used as the backbone of supervised learning, this technique was designed to solve optimization problems, like some of the ones we have seen on graphs?

If you didn’t, or if you’d like to delve into this topic and learn more about how gradient descent works, this should be the perfect chapter for you.
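Before we dive in, here is a minimal sketch of the core idea, just to give you a feel for what the chapter builds toward: starting from a guess, repeatedly step in the direction opposite to the gradient of a cost function until the updates become negligible. This is a simplified, one-dimensional illustration in Python, not the generic implementation developed later in the chapter, and the function and parameter names (gradient_descent, learning_rate, tolerance) are my own choices for this example.

    def gradient_descent(gradient, start, learning_rate=0.1, max_steps=100, tolerance=1e-6):
        """Step against the gradient until the update is smaller than the tolerance."""
        point = start
        for _ in range(max_steps):
            step = learning_rate * gradient(point)
            if abs(step) < tolerance:
                break
            point -= step
        return point

    # Example: minimize the cost function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    # The minimum lies at x = 3, and the descent converges there from any starting point.
    minimum = gradient_descent(gradient=lambda x: 2 * (x - 3), start=0.0)
    print(minimum)  # ~3.0

The two knobs in this sketch, the learning rate and the stopping tolerance, are exactly the kinds of choices we will examine when we discuss step functions, local minima, and the pitfalls of gradient descent.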

16.1 Heuristics for the crossing number

16.1.1 Did you just say heuristics?

16.1.2 Extending to curve-line edges

16.2 How optimization works

16.2.1 Cost functions

16.2.2 Step functions and local minima

16.2.3 Optimizing random sampling

16.3 Gradient descent

16.3.1 The math of gradient descent

16.3.2 Geometrical interpretation

16.3.3 When is gradient descent applicable?

16.3.4 Problems with gradient descent

16.4 Applications of gradient descent

16.4.1 An example: Linear regression
