The hotly disputed black magic of data breach cost estimates

A few weeks ago Fortune visited a law firm where one partner lamented the quality of cost estimates for big companies suffering data breaches, a vital consideration for businesses seeking to manage their risk and score reasonably priced insurance policies. (Who and where are unimportant for the purposes of the story.) Prompted by a recent analysis of 10-K filings that concluded that the impact of breaches on corporate bottom lines is trivial, the conversation stirred the lawyer’s excitement, and vexation. There are no good estimates, the lawyer rued.

“It’s black magic,” the partner told Fortune. “No one actually knows the costs.”

Nowhere is this fact more apparent than in the latest data breach investigations report compiled by Verizon (VZ), a touchstone annual study for the cyber security industry. Now in its eighth year, Verizon’s report ventures for the first time to determine the cost of a stolen record—the amount of money a company loses for each pilfered payment, personal, or medical record (see pages 27-30). (“Better eight than never,” right? the researchers write.)

“One of the questions we get a lot is, ‘Do you have any impact information?’ We’ve always had to say, ‘Unfortunately we don’t,’” says Marc Spitler, senior analyst and co-author of the report, during a briefing call. He notes that producing a reliable estimate is no easy feat.

The reason, as Verizon data scientist Jay Jacobs later clarifies when sitting down with Fortune at the RSA Conference this week, is that whenever the company’s forensics team went back to ask clients about the financial impact, the clients would say that the engagement was done. Sharing over. As a result, Verizon, like many others in the industry, has struggled to get quality follow-up data. Add to that the fact that the quantity of data isn’t very good either, he says. “It’s not just bad data,” he adds. “It’s lack of data.”

Yet in order for executives to make reasonably informed decisions about security investments, they need to be able to understand the costs and benefits. So estimates have been drummed up. The reigning schema, aka the “cost-per-record” model, determines data breach costs by dividing the sum of estimated losses by total records lost, a straightforward formula that yields $201 for 2014 and $188 for the year prior to that. It’s a linear relationship. These figures, which have known limitations (primarily, underestimating the cost of small breaches and overestimating the cost of large breaches), are the result of annual surveys and field observations conducted by The Ponemon Institute, a digital security research center. According to the Verizon report, there has never been a better model available—until now.
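To make the arithmetic concrete, here is a minimal sketch of the cost-per-record approach as the article describes it: divide total estimated losses by total records lost, then multiply that rate by the size of any new breach. The function names are hypothetical, chosen for illustration only, and the $201 rate is the 2014 figure cited above.

```python
def cost_per_record(total_losses: float, total_records: int) -> float:
    """Average loss per record: estimated losses divided by records lost."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return total_losses / total_records


def linear_breach_cost(records_lost: int, per_record_cost: float) -> float:
    """Linear estimate: cost grows in direct proportion to records lost."""
    return records_lost * per_record_cost


PER_RECORD_2014 = 201.0  # dollars per record, the 2014 figure cited in the article

# Hypothetical breach sizes, used only to show the linear scaling.
small = linear_breach_cost(10_000, PER_RECORD_2014)
large = linear_breach_cost(20_000, PER_RECORD_2014)

print(f"10,000 records: ${small:,.0f}")
print(f"20,000 records: ${large:,.0f}")
```

Because the relationship is linear, doubling the records lost always doubles the estimate, whatever the size of the breach. That property is exactly what comes under fire later in the piece.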

This year Verizon teamed up with the cyber risk assessment firm NetDiligence, which has data about cyber liability insurance claims. By analyzing nearly 200 cyber insurance payouts, the researchers were able to get tangible data linking breaches to damages, the report says. What they found should surprise anyone who follows the information security industry. In Verizon’s analysis, the average cost-per-record is radically lower than the prevailing estimate: it’s $0.58, stupendously less than the Ponemon figure. How could that be?

“I think that this impact section is going to be the most talked about section,” Spitler says, anticipating its significance. “It’s probably the one that’s going to get a lot of buzz and questions back to us going forward.” He’s right.

Verizon’s cost-per-record coup

Once the embargo on Verizon’s report lifted last week, Fortune phoned Larry Ponemon, founder and chairman of the eponymous institute, to hear how he might account for the jarringly conflicting estimates. When informed of Verizon’s figure, Ponemon reacted as though he had been blindsided. He had not known of the result, he says, until Fortune brought it to his attention. “It’s very disturbing,” he says, mentioning that, were it not so early in the morning, he could use a glass of wine. “As you can tell from my voice, I’m very upset about this.”

Ponemon’s distress is understandable. His institute spends 10 months per year putting together its annual “cost of a data breach” study, which analyzes more than 1,600 companies in a dozen countries. It’s no small task. (Verizon’s report, which is by all accounts one of the papers of record for the cyber intrusion business, encompasses, in total, research into more than 2,000 data breaches in more than 60 countries.) Despite the Ponemon Institute having produced a cost report every year for the past decade, Verizon chose neither to contact nor to consult him, Ponemon says. And, he adds, he feels snubbed. (Verizon, by the way, is a sponsoring company of the institute, he says.)

“We contemplated reaching out to Ponemon on this and talking through it,” Jacobs later tells Fortune at the aforementioned security conference, “but we really didn’t get anything from him. We just simply took his published material, referenced it, cited it. We’ve got links to his reports on one of these pages.” [Editor’s note: see page 30 of the report.] “There was nothing we had a question about,” he adds. “There wasn’t any sort of question or ambiguity about what he had done that we needed input on.”

“We’re not trying to be an adversary to Ponemon,” Spitler earlier tells Fortune during the briefing call. “We were just able to get really excellent, tangible data and to use it in such a way that we were able to build something that will improve upon the cost-per-record model.”

So the two sides, Verizon and Ponemon, seem to have gotten off on the wrong foot. But questions of collaboration aside: whose model is right? Or at the very least more accurate? How could these two organizations, both of which have taken great pains to assess the damages inflicted on corporations at the hands of keyboard-clacking hackers, arrive at such glaringly different conclusions? What gives?

Why the difference

The issue has partly to do with how each team collects its data and calculates its numbers. First off, Ponemon’s model excludes breaches above a certain size. It does not take into account companies that have lost more than 100,000 records; above a certain point the damages don’t quite scale, despite more records being lost. Consider the case of Target, for example. Take the number of payment card records (as in credit and debit card numbers) the retailer lost, about 40 million, and multiply that by the roughly $200 Ponemon cost-per-record number. The result? $8 billion. Now compare that to reality…

Target certainly did not wind up spending $8 billion to cover its breach-related expenses. For a company with revenues of around $70 billion, that would be roughly 10% of its top line! A crippling blow for any business. In fact, after insurance compensation and tax deductions, the retailer’s damages actually come out to something more on the order of $100 million. That’s about 0.1% of total sales, a much more manageable hit. (Omitting insurance and deductions, Target has been nailed with around $250 million in breach-related expenses so far.) Plugging in Verizon’s roughly $0.50 cost-per-record figure, on the other hand, yields $20 million, which is far too low.
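The Target comparison is simple enough to check by hand, but a short sketch lays the numbers side by side. It uses only the rounded figures quoted above; the variable and function names are hypothetical.

```python
TARGET_RECORDS = 40_000_000     # payment card records lost, per the article
TARGET_REVENUE = 70e9           # approximate annual revenue, in dollars
PONEMON_RATE = 200.0            # rounded Ponemon cost per record
VERIZON_RATE = 0.50             # rounded Verizon cost per record
REPORTED_GROSS = 250e6          # expenses before insurance and deductions
REPORTED_NET = 100e6            # expenses after insurance and deductions

# Both estimates are plain linear multiplications of records by a flat rate.
ponemon_estimate = TARGET_RECORDS * PONEMON_RATE
verizon_estimate = TARGET_RECORDS * VERIZON_RATE


def share_of_revenue(cost: float) -> float:
    """Express a cost as a fraction of Target's approximate revenue."""
    return cost / TARGET_REVENUE


rows = [
    ("Ponemon-style estimate", ponemon_estimate),
    ("Verizon-style estimate", verizon_estimate),
    ("Reported gross expenses", REPORTED_GROSS),
    ("Reported net expenses", REPORTED_NET),
]
for label, cost in rows:
    print(f"{label:<24} ${cost:>15,.0f}  {share_of_revenue(cost):.2%} of revenue")
```

Against the roughly $250 million in gross expenses, the $200 rate overshoots by a factor of 32, while the $0.50 rate covers only 8% of the bill. Neither flat rate lands anywhere near the reported figure.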

By the same token, the couple hundred insurance claims analyzed by Verizon have caps, too. Since the NetDiligence data is based on insurance payouts—and since all insurance policies have limits (and sub-limits and exclusions), as a blog post on the Ponemon Institute’s website explains—it is highly likely that NetDiligence’s numbers do not represent the full costs companies incur. Jacobs acknowledges this when sitting down with Fortune, and he stresses that the point has more to do with data collection than data analysis. In the report and in person, he strongly advocates for better data collection, but stands by his team’s analysis.

Further, Ponemon’s model purports to include so-called soft costs, which are indirect. These might include business partners deciding to take their business to more secure partners in the wake of a breach. They also might comprise customers losing trust in a brand and choosing to shop at a competitor instead. It makes sense then that Ponemon’s numbers are higher; Verizon’s analysis, in contrast, does not factor in intangible costs. Still, is that enough to justify the huge disparity of $0.58 versus $201?

Here’s the rub: Verizon’s report readily admits that its $0.58 cost-per-record estimate is no good. But neither is the $201, the report says. “Both the $0.58 and $201 cost-per-record models,” asserts the study, “create very poor estimators.”

On the briefing call, Spitler says much the same. “That $0.58 cent model is not a good way to go about it either,” he tells Fortune, pointing to the report’s ensuing discussion as containing the better model. That’s why Verizon’s report scraps the linear cost-per-record model promoted by Ponemon, and proposes a new logarithmic regression model as a better predictor of real-world impacts. Indeed, the entire reason Verizon presents a $0.58 estimate is to debunk the traditional engine of estimation. While Ponemon’s researchers rely on a simpler metric, cost (Y) per record (X), Verizon’s researchers use parametric statistics to plot the relationship between records lost and estimated data breach cost.
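The article does not spell out Verizon’s regression, so what follows is only a sketch of one common form such a model can take: an ordinary least-squares fit of log10(cost) against log10(records). The function names are hypothetical, and the data are synthetic, generated from a made-up power law (cost = 1000 × records^0.5) purely so the fit has a known answer. They are not breach figures from either report.

```python
import math


def fit_log_log(records, costs):
    """Least-squares fit of log10(cost) = intercept + slope * log10(records)."""
    xs = [math.log10(r) for r in records]
    ys = [math.log10(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_cost(records_lost, intercept, slope):
    """Convert the fitted log-log line back into a dollar estimate."""
    return 10 ** (intercept + slope * math.log10(records_lost))


# Synthetic data from an assumed power law, NOT real breach data.
records = [100, 1_000, 10_000, 100_000, 1_000_000]
costs = [1000 * r ** 0.5 for r in records]

intercept, slope = fit_log_log(records, costs)

# Implied cost per record at each breach size under the fitted curve.
per_record = [predict_cost(r, intercept, slope) / r for r in records]
for r, rate in zip(records, per_record):
    print(f"{r:>9,} records: ${rate:,.2f} per record")
```

With a fitted slope below 1, the implied cost per record shrinks as breaches grow. No single flat rate can follow that curve, which is consistent with the known limitation that a linear model underestimates small breaches and overestimates large ones.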

Spitler qualifies: “We built a better model, but it is far from perfect.”

The middle road

Figure 23 from Verizon’s 2015 data breach investigations report tabulates a range of data breach cost estimates based on the number of records lost by a company.

Looking at this table with its wide ranges, there is definitely some opportunity for improving the estimate of loss from breaches. But at least we have improved on the oversimplified cost-per-record approach, and we’ve discovered that technical efforts should focus on preventing or minimizing compromised records.

Though it lacks a concise or catchy name, the above—figure 23 in the Verizon report—represents the company’s alternative to the conventional cost-per-record model.

For all the apparent dispute between the institute and Verizon, their models agree on certain points. “It is ironic that after all the criticism, our estimate of a total cost of data breach falls within DBIR’s confidence interval shown in Figure 23 of the report,” Ponemon writes, referencing the common acronym of the data breach investigations report, in his post titled “Why Ponemon Institute’s Cost of Data Breach Methodology Is Sound and Endures.” He continues, “DBIR’s own prediction model for a data breach involving between 10,000 and 100,000 records fits our global total cost of data breach.”

Given that kernel of accord buried beneath the surface-level strife, it seems that Ponemon’s consternation arises more so from the way Verizon presents its findings than the findings themselves. “Apparently their single-minded goal was to ‘bust the myth’ of our annual cost of data breach research,” he writes on the institute’s blog.

“We stand by our results,” Ponemon tells Fortune. “We work very hard to be accurate. We are not simpletons. We work to provide meaningful data.” Of course, Ponemon also owns up to the limitations of the institute’s model. “I’m the first to say it’s not perfect—it has possible errors,” he says. “But to the best of my knowledge, there’s no better way to collect the kind of data we collect.”

Jacobs does not dispute that, per se: “I don’t think Ponemon’s data collection methodology is bad, but I think there’s an opportunity in analysis to do that better.” And both Spitler and Jacobs believe there is still a lot of room for improvement in Verizon’s newly proposed model, too. They draw attention to the new logarithmic regression model’s wide margins of uncertainty as evidence. (See that grey ballpark range between “upper” and “lower” described in the chart above.) Collecting more and better data should narrow that gap, they say.

“This is the Holy Grail in security,” Jacobs says. “People can talk about that there’s vulnerabilities, or that, hey, this has been exploited, but really the question is, So what? How much does it matter to me? How much does this affect my bottom line?”

Corporate data breaches are no doubt high-stakes matters. And understanding their impact has shot to the top of mind for C-suites spanning the Fortune 500 and beyond. Though the jury is still out when it comes to confidently and accurately estimating data breach costs, the recommendations for businesses could not be clearer: Defend and protect your data. That’s the best way for corporate stewards to curtail the consequences of a potential compromise.

The reality is that the financial impacts of data breaches depend on a variety of factors, at least half of which remain unknown, according to the Verizon report. The most important of them is indeed the number of records stolen. As such, Verizon’s framework still rests upon the cost-per-record foundation, though it ditches the notion that it is a strictly linear relationship. Perhaps the best option for predicting impact then would be a combination of the two approaches, utilizing the data collection methodology of Ponemon and the statistical analysis employed by Verizon. The best of both worlds.

Spitler and Jacobs, for their part, urge companies to begin collecting more data around data breaches. That’s the only way, they say, that the current models will improve. “Who wants a weak model that spits out a number that is all but guaranteed to be wrong?” the Verizon researchers ask in the report, before supplying a tongue-in-cheek rejoinder. “For that, you can just use a pair of D20 risk dice.”

Instruments of fantasy role-playing board games aside, there is still much black magic to the art of data breach cost estimation. As the actuarial wizards continue to hone their models, executives (and underwriters) will have to satisfy themselves with some combination of the aforementioned approaches. “I think the most important thing is that this is hopefully a step in what will hopefully someday be a long history on the impact of data breaches,” Jacobs says, projecting iterations of improved models to come. “There’s a whole lot of opportunity here.”

In the meantime, let’s not leave these numbers up to the imagination of a dungeon master.