It was like the machine learning world’s version of “American Idol” Monday in New York: A seven-man multinational group dubbed BellKor’s Pragmatic Chaos was awarded $1 million by Netflix (NFLX) CEO Reed Hastings as the winner of the Netflix Prize.
Cooked up three years ago by Hastings, the Netflix Prize challenged all comers to improve the recommendation technology used by the online movie rental company by 10%.
As they clutched the Ed McMahon-style check and slung enormous gold medallions stamped with “Netflix Prize 2009” around their necks, the members of the winning team – a collection of computer scientists, electrical engineers and statisticians – were physically in the same place at the same time for the first time.
The basic gist of the problem to be solved was this: If you love this handful of movies, here are others you ought to adore. Simple enough when you are talking to the video clerk behind the counter (though hit or miss even then), but to break movie preference down into essentially mathematical components and do it even 10% better than Netflix was able to do proved an incredibly tough goal to achieve.
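For readers who want to see what “10% better” means in numbers, here is a minimal sketch. The contest scored entries by root-mean-square error (RMSE) on held-out ratings, so a 10% improvement means an error 10% lower than the baseline’s. The ratings below are invented for illustration, and the function names are hypothetical rather than anything from Netflix.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("lists must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_error, new_error):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical star ratings, invented for illustration only.
actual = [4, 3, 5, 2, 4]
baseline_preds = [3.0, 4.0, 4.0, 3.0, 3.0]  # every guess is off by 1 star
new_preds = [3.1, 3.9, 4.1, 2.9, 3.1]       # every guess is off by 0.9 stars

baseline_rmse = rmse(baseline_preds, actual)
new_rmse = rmse(new_preds, actual)
gain = improvement_pct(baseline_rmse, new_rmse)
```

In this toy case, shaving a tenth of a star off every miss is what a 10% gain looks like; on real data the misses are uneven, which is part of what made the target so hard to reach.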
Bob Bell, Martin Chabbert, Michael Jahrer, Yehuda Koren, Martin Piotte, Andreas Töscher and Chris Volinsky had combined teams and brain power to fend off 40,000 other teams from 186 countries and take the grand prize back to home bases in Austria, Canada, Israel and the United States.
Hastings had an inkling of how hard it was going to be. During one Christmas vacation skiing in Park City, Utah, a few years back, Hastings figured he could make some headway after his days on the slopes. “I’d stay up at night writing computer models, and I thought I had it all figured out,” Hastings says with a grin. “I didn’t get anywhere. One of the biggest surprises for me in all this has been the enormous progress this area of mathematics has made.”
BellKor’s Pragmatic Chaos managed to do what Hastings and the crack engineers at Netflix couldn’t by blending a variety of mathematical models, a tack that ultimately all the serious contenders used.
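To give a feel for why blending helps, here is a minimal sketch that mixes just two models with a single weight chosen to minimize squared error. It is an illustration under simple assumptions, not the team’s actual blending method, which combined hundreds of models. All names and numbers are hypothetical.

```python
def best_blend_weight(preds_a, preds_b, actual):
    """Weight w that minimizes the squared error of w*a + (1 - w)*b."""
    num = sum((y - b) * (a - b) for a, b, y in zip(preds_a, preds_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    if den == 0:
        return 0.5  # the models agree everywhere, so any weight is equivalent
    return num / den


def blend(preds_a, preds_b, w):
    """Weighted average of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def sum_squared_error(preds, actual):
    return sum((p - y) ** 2 for p, y in zip(preds, actual))


# Contrived data: the two models err in opposite directions.
actual = [4, 3, 5, 2]
model_a = [4.5, 3.5, 4.5, 2.5]
model_b = [3.5, 2.5, 5.5, 1.5]

w = best_blend_weight(model_a, model_b, actual)
blended = blend(model_a, model_b, w)
```

The example is rigged so that the two models’ mistakes cancel, but the principle carries over: models that are wrong in different ways can be combined into something better than either one.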
If you liked French Connection will you love Singin’ in the Rain?
Some of the BellKor team’s algorithms examined movies as bunches of elements. These elements might include genre, a specific actor, and then drill down into more detail – blood but no car chases, or car chases but no gunfights.
Movies that had combinations of similar elements would get lumped together into groups. The theory is that if you like one in a particular group you are more inclined to like the others.
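The grouping idea can be sketched in a few lines. This version treats each movie as a set of elements and scores overlap with Jaccard similarity, a standard measure chosen here for illustration; the article does not say how the team measured similarity. The movie titles and elements are made up.

```python
def jaccard(a, b):
    """Share of elements two sets have in common, from 0.0 to 1.0."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical catalog: each movie is described by a set of elements.
movies = {
    "Movie A": {"crime", "car chase", "gunfight"},
    "Movie B": {"crime", "car chase"},
    "Movie C": {"musical", "romance", "dance"},
}


def most_similar(title, catalog):
    """Return the other movie whose elements overlap most with `title`."""
    target = catalog[title]
    scored = [
        (jaccard(target, feats), other)
        for other, feats in catalog.items()
        if other != title
    ]
    return max(scored)[1]


suggestion = most_similar("Movie A", movies)
```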
But that kind of approach, similar to what Amazon (AMZN) and other online retailers use, couldn’t squeeze out enough improvement to take home the victory.
To do that, the BellKor team deployed a variety of other models that were far from obvious. One looked at which movies the anonymous Netflix customers in the supplied data had rated, rather than how they rated them (the movies they chose not to rate also provided so-called negative information).
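A minimal sketch of that “which, not how” signal: throw away the star values and keep only a yes-or-no record of whether each customer rated each movie. The data and names below are hypothetical, and this shows only how such a signal could be extracted, not how the team fed it into its models.

```python
# Hypothetical (user, movie, stars) records, invented for illustration.
ratings = [
    ("u1", "m1", 5),
    ("u1", "m2", 1),
    ("u2", "m1", 4),
    ("u2", "m3", 2),
]


def rated_sets(records):
    """Map each user to the set of movies they rated, ignoring the stars."""
    by_user = {}
    for user, movie, _stars in records:
        by_user.setdefault(user, set()).add(movie)
    return by_user


def implicit_vector(user, by_user, all_movies):
    """1 where the user rated the movie, 0 where they did not."""
    rated = by_user.get(user, set())
    return [1 if movie in rated else 0 for movie in all_movies]


all_movies = sorted({movie for _user, movie, _stars in ratings})
by_user = rated_sets(ratings)
u1_vector = implicit_vector("u1", by_user, all_movies)
u2_vector = implicit_vector("u2", by_user, all_movies)
```

Note that u1’s one-star pan of m2 still counts as a 1 here: the customer cared enough to rate it. The zeros are the “negative information” the article mentions.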
Other models looked at when the ratings were done. It turns out that the longer it has been since a movie was viewed, the more generous the ratings are. BellKor’s model, therefore, discounted ancient movie reviews when suggesting a movie you might want to watch tonight.
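One simple way to discount old ratings is to let each rating’s weight shrink with its age. The sketch below uses an exponential half-life, which is an assumption made for illustration; the article does not describe the form of BellKor’s time model, and the one-year half-life is an arbitrary choice.

```python
def decay_weight(age_days, half_life_days=365.0):
    """Weight that halves every `half_life_days` (assumed decay form)."""
    return 0.5 ** (age_days / half_life_days)


def time_weighted_mean(ratings, half_life_days=365.0):
    """Average of (stars, age_days) pairs, with older ratings counting less."""
    if not ratings:
        raise ValueError("need at least one rating")
    weights = [decay_weight(age, half_life_days) for _stars, age in ratings]
    weighted_sum = sum(w * stars for w, (stars, _age) in zip(weights, ratings))
    return weighted_sum / sum(weights)


# Hypothetical customer: a generous 5 stars two years ago, 3 stars today.
history = [(5, 730), (3, 0)]
plain_mean = sum(stars for stars, _age in history) / len(history)
discounted_mean = time_weighted_mean(history)
```

With these made-up numbers the plain average sits at 4 stars, while the discounted average lands closer to the recent 3-star rating.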
Still unanswered: The Miss Congeniality question.
How to blend hundreds of algorithms together into a useable model turned out to be the final problem BellKor’s Pragmatic Chaos had to lick, one that ultimately required combining four leading teams into one (thus the mouthful of a name representing the members of each team).
“You need to think outside the box, and the only way to do that is find someone else’s box,” says Chris Volinsky, BellKor’s team manager and the head of statistical research for AT&T Research. “When we combined with other engineers, we found they approached the problem from a different perspective and we were able to use that to our advantage.”
(One thing Volinsky and his team couldn’t figure out is why the Sandra Bullock vehicle “Miss Congeniality” – savaged by critics – was the most frequently rated movie among Netflix users. “I still have no explanation for that,” Volinsky says, laughing.)
Indeed, one of the chief lessons of the lengthy contest was the necessity of collaboration. And whether it was behind (virtual) closed doors, or by publicly sharing information on contest forums, the teams and people involved pushed everyone further by sharing information and methods.
Neck and neck in the homestretch
It was also a requirement of the contest that the winners of so-called progress awards during the course of the competition (all won by BellKor’s members) had to publish their methods for all to see.
At the close of the three-year contest, after BellKor had already passed the 10% mark, the rest of the contestants were given a final 30 days to beat the BellKor score. In that all-or-nothing push, essentially all of the other top-notch teams ganged up to topple the leader.
They practically pulled it off.
After three years of effort, the second-place team, called simply The Ensemble, a combination of dozens of teams, ended up with the same final score as BellKor’s Pragmatic Chaos on the final day of competition. But BellKor had submitted its final top-scoring model 24 minutes earlier, thus taking the victory according to the contest rules.
Since Volinsky and fellow BellKor team member Bob Bell worked full-time at AT&T Research during the contest, and spent company time on the problem, AT&T gets a chunk of the winnings, Volinsky says. The remainder gets split between the three other teams. For its part, Netflix got much more than its money’s worth, says Hastings, who estimates that by using the prize-winning approach Netflix ought to be twice as good in its recommendations (not just 10% better – it’s a logarithmic thing, naturally). Since the winning team owns its solution, Netflix is licensing it, but there is nothing to prevent the likes of Amazon, Apple (AAPL) or anyone else from doing the same or licensing The Ensemble’s models.
Hastings is clearly pleased with the way the contest went, though it may have dragged on a bit long. That is why, on the same day the winner of the first Netflix Prize was announced, Hastings outlined a second $1 million prize.
The next contest also focuses on movie recommendations, clearly at the core of the Netflix business, but rather than picking movies for people whose movie ratings are known, it tackles the tougher problem of recommending movies for Netflix customers who don’t rate movies at all.
Rather than millions of ratings, the dataset will include more implicit information about preference, such as which movies have been watched, which haven’t, and demographic information. Rather than an open-ended contest to hit some numeric goal, it will be a limited 18-month sprint to the best solution.