The One Huuuuuge Thing Donald Trump Taught Nate Silver About Big Data

Inside The South By Southwest (SXSW) Interactive Festival
Nate Silver. Photo courtesy of Bloomberg via Getty Images

This Wednesday, esteemed data journalist Nate Silver published a lengthy postmortem of his failure to predict the Republican primary triumph of Donald Trump, whom he and his team at FiveThirtyEight initially gave a 2% chance and compared to previous flimsy candidates like Herman Cain and Michele Bachmann. Even in early January, Silver was still giving Trump odds of just 12 or 13%.

Silver has been facing heat for ‘getting it wrong.’ That backlash reflects our rising faith in, and expectations of, the full spectrum of data-driven analytics, “big data,” and artificial intelligence in business and beyond. Silver’s self-analysis offers several important, basic lessons for how we should use and understand those new tools—but one insight looms largest.

In concrete terms, Silver says the biggest problem with FiveThirtyEight’s early primary predictions was that they didn’t use formal statistical models to arrive at them, but instead translated their “subjective odds” into percentage terms. That, as Silver acknowledges, was a major lapse in presentation, giving the impression of a precision that just wasn’t there.

But the more interesting insight is why FiveThirtyEight wasn’t using a formal model for those early predictions—Silver says there just wasn’t enough historical data to support one. The party nomination process we have now, he says, only dates back to 1972, and “the data availability is spotty.”

To understand why shying away from limited or inconsistent data sets is crucial to being smart about analytics, we only need to look at another recent, much graver failure of data science: the mortgage crisis. In the run-up to 2008, incomplete historical data sets were a large part of what led banks and rating agencies to overvalue subprime mortgage derivatives, with consequences we’re still living with. Silver, in fact, directly compares Trump’s candidacy to a financial bubble, whose feedback loops and collapse points are still impossible to reliably interpret.

In other words, had Silver relied more heavily on his limited historical data set, he might not have done much better than he did.

Silver’s experience offers plenty of other insights. One is that the primary system (like the mortgage market) is extremely complex—in Silver’s words, “among the most complex systems that I’ve studied.” Predictive modeling is much, much better in more controlled systems like sports, where Silver got his start.

Another point is even more basic: Unlikely events like the Trump nomination are, by their very nature, impossible to predict. We live in an extremely complex world, many parts of which are simply unquantifiable—at least, for now.