Is Donald Trump’s Surprise Win a Failure of Big Data? Not Really
Hillary Clinton’s campaign for the presidency was famously, proudly, data-driven. For months, a trail of reporters chronicled the magic of the Clinton team’s “digital strategy” with dizzied wonderment. A data chief who scribbles on walls in erasable marker like Russell Crowe in A Beautiful Mind! Subtle but telling changes to landing page design! Something called “cost per flippable delegate!”
Now that Clinton has failed, the revenge against data has been swift: Since Tuesday’s election, we’ve been told that Trump’s surprise victory undercuts the belief that analyzing reams of data can accurately predict events; that it explodes the received wisdom about the value of data-driven campaigning; that data doesn’t matter.
But Tuesday was not a failure of data; it was a failure of forecasting and analysis — by humans. The data was as good as it could be, but the analysis of it lacked depth. If anything, the forecasters’ spectacular and almost unanimous collective failure to see Trump’s win coming provides an opening for a more productive conversation between numbers and words, statisticians and analysts, data and message.
The Great Data Debate
Much of the Great Data Debate has focused on two claims: that the polls “got it wrong,” and that polling data, no matter its quality, was powerless to grasp the hidden electoral momentum generated by Trump’s populist appeal to the bruised pride of working-class whites.
Yes, many polls underestimated the strength of Trump’s support. Yes, Tuesday was another blow for a polling industry already winded by several recent big misses and facing numerous structural obstacles. But polls were never designed to be forecasts. They are simply one basket of data points among many others.
The real problem is that we haven’t done enough work to look beyond the polls and find new data sets that can improve political analysis — an especially urgent task in an age of volatile electoral moods.
The data is out there. We just need to get more creative in looking for it.
The firm I work for, Predata, is engaged in this very search for alternative ways of understanding politics. For the election, working off the theory that political campaigning increasingly takes place online and voters are increasingly inaccessible to polling firms, we developed signals to capture shifts in the digital conversation around the race. To produce these signals, we gathered and analyzed hundreds of thousands of data points every day.
Humans failed, not Big Data
Having had some success with our Brexit forecast earlier in the year, on this occasion Predata, like practically everyone else, got the call wrong and predicted that Clinton would win. There was nothing fundamentally wrong with the data; the data was good. It’s just that the humans (well, human: me) curating and analyzing the data underperformed.
Influenced by the percussion of polls and punditry heavily suggestive of a Clinton win, I allowed myself to ignore signs in the data that Trump was ahead both in the battleground states overall and in Florida. That was a mistake. But it was a fundamentally human mistake. The data was blameless.
All data sets and data-driven forecasting models, even those that claim to run off artificial intelligence, are, to some extent, a reflection of their creators’ own biases. There is a subjectivity embedded in every curatorial choice that goes into the creation of a poll, or a set of signals to monitor debate online, or a prediction model. The interpretation of data, too, is necessarily subjective. But one mistake does not mean we should forfeit the game. Gather data, crunch data, interpret data: there is nothing fundamentally unsound or stupid about this basic exercise. It’s still worth doing. But we need to get better at understanding what the data can tell us (its potential and its limitations) and how it fits into a broader analytical picture.
Need to bridge the geek divide
There’s still a cultural divide that separates the geeks (the data scientists and statisticians) from the poets (the reporters, the color writers) in coverage of political campaigns. Neither has a monopoly on the truth, as Tuesday showed. And each can offer useful information in our ongoing quest to make sense of messy reality.
To get better at forecasting big political events, we need both better data and sharper reporting, a clearer read on the numbers and a more penetrating portrait of on-the-ground realities — and a more active exploration of the intersection between the two. That means more words informed by data, and more data worked on by words: the marriage of techies and fuzzies to which good technology always tends.
In our exploration of this blossoming new age of data, we’re still no better than Monsieur Hulot in his new kitchen. The epistemological blunders of the last few weeks shouldn’t impel us to give up on data. They’re an invitation to keep blundering on, keep making mistakes, and hopefully — with flexible minds and a better sense of the limits of what is possible — make data great again.
Aaron Timms is the Director of Content at Predata, a New York-based predictive analytics firm.