Chapter 14. Deep learning on Spark with H2O

This chapter covers

  • Introduction to H2O
  • Introduction to deep learning
  • Starting an H2O cluster on Spark
  • Building and evaluating a regression deep-learning model using Sparkling Water
  • Building and evaluating a classification deep-learning model using Sparkling Water

Deep learning is a hot topic in the machine-learning world today. We could say that there’s a deep-learning revolution going on. Deep learning is a general term denoting a family of machine-learning methods characterized by the use of multiple processing layers of nonlinear transformations. These layers are almost universally implemented as neural networks.

Although the core principles aren’t new, a lack of computing power and of efficient algorithms kept them from being developed further in previous decades. This has changed in recent years, with many advances in deep-learning algorithms and their successful applications. One of the many recent breakthroughs is the DeepID system for learning high-level features,[1] which is capable of recognizing tens of thousands of faces with a close-to-human accuracy of 97.45% (unlike its accuracy, its capacity is obviously superhuman).

[1] Yi Sun et al., “Deep Learning Face Representation from Predicting 10,000 Classes,”

14.1. What is deep learning?

14.2. Using H2O with Spark

14.3. Performing regression with H2O’s deep learning

14.4. Performing classification with H2O’s deep learning

14.5. Summary