This chapter covers
In October 2006, Netflix announced a $1 million prize for the team that could improve movie recommendations by 10% over Netflix’s own proprietary recommendation system, CineMatch. The Netflix Grand Prize was one of the first-ever open data science competitions and attracted tens of thousands of teams.
The training set consisted of 100 million ratings that 480,000 users had given to 17,000 movies. Within three weeks, 40 teams had already beaten CineMatch’s results. By September 2007, more than 40,000 teams had entered the contest, and a team from AT&T Labs took the 2007 Progress Prize by improving upon CineMatch by 8.42%.
As the competition progressed with the 10% mark remaining elusive, a curious phenomenon emerged among the competitors. Teams began to collaborate and share knowledge about effective feature engineering, algorithms, and techniques. Inevitably, they began combining their models, blending individual approaches into powerful and sophisticated ensembles of many models. These ensembles combined the strengths of many diverse models and features, and they proved to be far more effective than any individual model.
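The intuition behind blending can be sketched in a few lines of code. The snippet below is a toy illustration, not the competitors' actual method: all model names, predicted ratings, and true ratings are made up. It shows how averaging the predictions of models that make *different* errors can cancel those errors out, producing a lower root-mean-squared error (RMSE, the Netflix Prize's evaluation metric) than any single model.

```python
import numpy as np

# Hypothetical predicted ratings (1-5 stars) for five user-movie pairs,
# from three illustrative models: a neighborhood model, a matrix
# factorization, and a regression baseline (all values are invented).
preds_knn = np.array([4.5, 1.5, 5.2, 2.6, 4.6])
preds_mf  = np.array([3.6, 2.6, 4.6, 3.5, 5.3])
preds_reg = np.array([4.2, 1.8, 5.4, 2.8, 4.8])

true_ratings = np.array([4.0, 2.0, 5.0, 3.0, 5.0])

def rmse(pred, truth):
    """Root-mean-squared error: the Netflix Prize's scoring metric."""
    return np.sqrt(np.mean((pred - truth) ** 2))

# Blend by simple averaging: where one model overshoots and another
# undershoots, the errors partially cancel.
blend = (preds_knn + preds_mf + preds_reg) / 3

for name, p in [("knn", preds_knn), ("mf", preds_mf),
                ("reg", preds_reg), ("blend", blend)]:
    print(f"{name:6s} RMSE = {rmse(p, true_ratings):.3f}")
```

On these made-up numbers, the blended predictions score a noticeably lower RMSE than any of the three component models, because their individual errors point in different directions. Real Netflix Prize ensembles used far more elaborate weighted and stacked blends of hundreds of models, but the underlying principle is the same.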