At Coca-Cola Bottling, flash memory energizes big data efforts

June 27, 2014, 7:25 PM UTC
Coca-Cola soda bottles sit on a delivery truck in Mexico City.
Susana Gonzalez/Bloomberg—Getty Images

Coca-Cola may have once declared itself “the official soft drink of summer,” but as summer approached in 2013, clouds of uncertainty hung over its largest independent U.S. bottler. Mounting volumes of data were testing the technology in place there, challenging the company’s ability to keep Coke products on store shelves.

At the time, Charlotte, North Carolina-based Coca-Cola Bottling Co. Consolidated (COKE) had recently begun upgrading its supply chain software. With new capabilities, the company hoped to improve its demand forecasting—specifically by narrowing the focus from warehouses down to individual customers to better predict how much of each product was needed on any given day at any particular store or vending machine.

It’s a formidable challenge. CCBCC has five production centers and 47 distribution centers and serves 11 states, mostly in the southeastern U.S. The company rolls out some 18,000 cases of beverage products every hour. Making sure the right ones get to the right places has never been a trivial matter, but the decision to narrow forecasting brought with it a gigantic leap in the amount of data that had to be processed each day, from 100,000 to 3.5 million data points—“demand forecasting units,” in supply chain speak.
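The scale of that jump is worth pausing on. A quick back-of-envelope calculation using the figures above (the figures come from the article; everything else here is illustrative):

```python
# Demand forecasting granularity, before and after the change.
# Figures are from the article; this arithmetic is purely illustrative.
warehouse_level_units = 100_000    # daily forecasting units at warehouse level
customer_level_units = 3_500_000   # daily units at store/vending-machine level

growth_factor = customer_level_units / warehouse_level_units
print(f"Forecasting workload grew {growth_factor:.0f}x per day")  # 35x
```

A 35-fold increase in daily forecasting units is what pushed the existing batch windows past their limits.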

The problems piled up: Nightly batch schedules began taking four to five hours longer to process than they had before. That caused delays in the next day’s route planning and deliveries. And that meant the company was missing targets for its service-level agreements (SLAs), leading to higher costs.

“We were going to have to renegotiate our SLAs if we went live,” said Tom DeJuneas, the company’s IT infrastructure manager. “There was also going to be no time to address issues during the night.”

CCBCC’s solution? Flash storage, the memory technology commonly found in mobile phones, tablet PCs, and USB drives, which is increasingly playing a key role for processing big data in the enterprise.

As a result, CCBCC saw a 75 percent reduction in processing time without having to replace any servers. Processing jobs that took 45 minutes were reduced to just six, DeJuneas said. Most important, forecasts are better, SLAs are being met, and products are on the shelves at the right place and the right time.

‘Flash is a thousand times faster’

Hard-disk drive (HDD) technology has long been the storage mainstay in most corporate settings, but flash—which is used in solid-state drives (SSDs) such as those found in Apple’s MacBook Air—is an increasingly attractive solution for enterprises struggling under the weight of big data, according to a recent Gartner report.

“When you think of big data, think of ‘hot’ data and ‘cold’ data,” said Joseph Unsworth, vice president at Gartner and the author of the report.

Flash-based SSD technology, while more expensive, is great at accessing data quickly—ideal for the “hot” scenario where speed is essential, he explained. HDD technology, on the other hand, is better for “cold” data, where speed is less important than the ability to store it cheaply.
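The hot/cold split Unsworth describes is, in essence, a placement policy: route frequently accessed data to fast-but-expensive flash, and rarely touched data to cheap-but-slow disk. A minimal sketch of that idea (the threshold, names, and data model here are assumptions for illustration, not any vendor's actual policy engine):

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    reads_per_day: int  # rough measure of how "hot" the data is

def choose_tier(ds: DataSet, hot_threshold: int = 1_000) -> str:
    """Place frequently read ('hot') data on flash, the rest on HDD.

    The threshold is an illustrative assumption; real tiering systems
    weigh access recency, latency targets, and cost per gigabyte.
    """
    return "flash (SSD)" if ds.reads_per_day >= hot_threshold else "disk (HDD)"

print(choose_tier(DataSet("nightly demand forecast", 50_000)))  # flash (SSD)
print(choose_tier(DataSet("seven-year sales archive", 3)))      # disk (HDD)
```

In CCBCC's case, the nightly forecasting workload was squarely in the "hot" category, which is why moving it to flash paid off.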

“We continue to see strong growth for SSD in both PC and data centers and the emerging market for solid-state arrays—storage arrays that are 100% based on SSD technology,” Unsworth said.

Indeed, CCBCC’s solution was IBM’s FlashSystem 840, an all-flash storage array that stems from its 2012 purchase of Texas Memory Systems. The bottler has since migrated its data warehouse onto the system as well.

“The faster you can complete a huge analysis, the better, and flash is a thousand times faster” than hard-disk drives are, said Eric Burgener, research director for IDC’s Storage group.

Flash has long been prohibitively expensive for all but the most critical applications, Burgener said. Now, however, those costs are coming down, making it more attractive for a much wider array of applications, he said.

Just a few years ago, flash memory was priced on the order of $35 per gigabyte; today, enterprise flash is priced per gigabyte more in the $6 to $10 range. Hard disks, by contrast, are more like 50 cents to a dollar per gigabyte, he said—considerably cheaper, but also significantly slower.
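Those per-gigabyte figures make the price gap concrete. Taking the midpoints of the ranges Burgener cites, for a hypothetical 10 TB working set (the capacity is an assumption chosen purely for illustration):

```python
# Cost comparison using the 2014 per-gigabyte figures quoted above:
# enterprise flash at $6-$10/GB, hard disk at $0.50-$1.00/GB.
# The 10 TB working set is a hypothetical example, not from the article.
capacity_gb = 10_000

flash_cost = capacity_gb * 8.0   # midpoint of the $6-$10 range
hdd_cost = capacity_gb * 0.75    # midpoint of the $0.50-$1.00 range

print(f"Flash: ${flash_cost:,.0f}   HDD: ${hdd_cost:,.0f}")
print(f"Flash premium: {flash_cost / hdd_cost:.1f}x")
```

At roughly a tenfold premium, flash only makes sense for data that is read intensively, which is exactly the "hot" workload case described above.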

“We see flash dropping to $2 to $3 per gigabyte for raw flash capacity in the next two to three years,” Burgener said. “Absolutely it will be used more for application workloads as costs come down because it will be easier to justify the cost.”

‘That’s what makes big data big’

IBM (IBM) leads the $667 million worldwide market for solid-state storage arrays in terms of revenue, according to Gartner, but it’s by no means the only vendor competing in this space. Pure Storage ranked No. 2 on Gartner’s list, for example, and it offers several all-flash storage options of its own.

“Traditional enterprise processes were designed to reduce data,” said John Hayes, a cofounder and chief architect with Pure Storage. “It’s a very human-centric approach—people can’t fathom keeping all of their data.

“But ‘big data’ is about considering everything simultaneously to learn outside of preconceived boundaries. In the consumer world, big data is used to determine relevance, not aggregates. Relevance requires keeping facts about everything in your database; it’s a multiplier—that’s what makes ‘big data’ big.”

It can take weeks for data scientists to discover a meaningful insight, months to translate it into operation, and even longer to determine if there is a positive result at the business operations level, Hayes said. Flash enables “instant data availability, delivering intensive I/O required to run data science within operations,” he added.

“If you really want to follow data where it leads you, flash is an ideal technology,” Hayes said. “If you want to do it instantaneously, flash is essential.”