IBM stakes its claim in ‘scale-out’ storage for big data

May 14, 2014, 4:20 PM UTC

FORTUNE — Get more stuff and you need more storage — that rule applies equally to data and the forgotten items piling up in your garage. The era of big data is upon us, as so many technologists proclaim, but the trouble for large companies is that obtaining extra storage space has traditionally meant buying more refrigerator-sized storage arrays — which can run more than a hundred thousand dollars each — for additional petabytes of data.

In addition to the major capital investment required, buying more arrays can actually slow the system down. It’s something like adding more boxcars to a train without adding any locomotives, said Eric Burgener, a director at the market research firm IDC.

“Every time you add another car, you don’t add power, so it runs slower,” Burgener said. “There’s a certain number of storage controllers, and you can’t increase those.”

That’s the legacy of the traditional “scale up” model of storage that’s been around for about 15 years. But an increasingly common alternative, known as “scale out” technology, can offer companies a much better deal. This week, IBM (IBM) threw its hat into the ring by announcing a new technology code-named “Elastic Storage” that offers some compelling benefits, including a potential reduction in storage costs of up to 90 percent.

‘Thousands of yottabytes’

IBM says Elastic Storage is based on technology used in its Watson supercomputer. The technology is part of IBM’s new portfolio for “software-defined storage,” a frequently used term that refers to data-storage technologies in which the hardware is separated from the software that manages the storage infrastructure.

With Elastic Storage, IBM says it can load five terabytes of Watson’s “knowledge” (roughly 200 million pages of data) into the computer’s memory in mere minutes. Elastic Storage can scan 10 billion files on a single cluster in 43 minutes, IBM claims, and the architectural limits to that scalability “stretch into the thousands of ‘yottabytes,’” it says. (A yottabyte is one billion petabytes, or the equivalent of a data center the size of one million city blocks, large enough to fill Delaware and Rhode Island combined.)
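For readers who want to check the arithmetic behind those figures, a quick back-of-the-envelope calculation follows. It assumes decimal (SI) units, which the article does not specify, and the variable names are purely illustrative. Under that assumption, the scan claim works out to roughly 3.9 million files per second, and the Watson figure to about 25 kilobytes per page.

```python
# Decimal (SI) storage units, expressed in bytes. Assumption: the article
# does not say whether it means decimal or binary units.
TERABYTE = 10**12
PETABYTE = 10**15
YOTTABYTE = 10**24

# One yottabyte expressed in petabytes: 10**24 / 10**15 = 10**9 (one billion).
petabytes_per_yottabyte = YOTTABYTE // PETABYTE

# IBM's claim: 10 billion files scanned on a single cluster in 43 minutes.
files_scanned = 10 * 10**9
scan_seconds = 43 * 60
files_per_second = files_scanned / scan_seconds

# Watson figure: five terabytes for roughly 200 million pages.
bytes_per_page = 5 * TERABYTE / (200 * 10**6)

print(f"{petabytes_per_yottabyte:,} petabytes per yottabyte")
print(f"{files_per_second:,.0f} files scanned per second")
print(f"{bytes_per_page:,.0f} bytes per page")
```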

“What this architecture does is let you buy a whole bunch of cheap, x86-based servers for $4,000 or $5,000 each, and put some storage in them — flash drives or spinning drives or both,” Burgener said. “You can get in at $10,000 to $15,000, you never incur the cost of buying this refrigerator-sized array, and you can scale up to a petabyte or more by adding boxes one at a time.”

And the “locomotive” problem? “What happens is that since you’re adding another x86 server, you get some CPUs with each one,” he said. “You’re adding performance to help you deal with the additional capacity. You get more balanced scalability — that’s why the IT guys are liking this a lot.”

‘More information, more economically’

Elastic Storage taps IBM’s existing global file system software to provide online storage management, scalable access, and integrated data governance tools capable of managing vast amounts of data and billions of files. Perhaps the technology’s most attractive feature, though, is its ability to automatically move data onto the most economical storage device, potentially leading to a dramatic reduction in costs. By recognizing when a server has flash storage and automatically using it as cache memory, for example, it can improve performance by as much as six times over standard SAS disks, IBM says.

“Elastic Storage is built to allow for the redistribution of data based on economic information as well as availability, which means that companies would be able to store more information, more economically over time,” Stephen O’Grady, co-founder and principal analyst with RedMonk, told Fortune.

The new software targets data-intensive applications requiring high-speed access to massive volumes of information generated by countless devices, sensors, business processes, and social networks; examples include seismic data processing, risk management and financial analysis, weather modeling, and scientific research.

It features native encryption and secure erase, thereby ensuring compliance with regulations such as HIPAA, the health information law, and Sarbanes-Oxley, the accountability law for public companies.

‘The potential to become disruptive’

“IBM’s Elastic Storage offering has the potential to become a disruptive product, and is built around proven technologies such as IBM’s GPFS [General Parallel File System –Ed.], which has been used in [high-performance computing] environments for several years,” said Henry Baltazar, a senior analyst with Forrester Research.

Essentially, the company is taking its core GPFS and adding features such as server-side caching to accelerate storage performance, Baltazar said. “IBM has also added tiering and storage virtualization capabilities to move stale, infrequently accessed data to less expensive storage mediums such as tape.”
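The tiering idea Baltazar describes can be illustrated with a small sketch. This is not IBM's policy engine: the tier names, the idle-time thresholds, and the `FileRecord` and `choose_tier` names are all hypothetical, chosen only to show how data might be routed by how recently it was used.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    days_since_access: int


# Hypothetical tiers, fastest and costliest first. Each entry is
# (tier name, maximum idle days allowed on that tier). Illustrative values only.
TIERS = [("flash", 7), ("disk", 90)]
ARCHIVE_TIER = "tape"  # cheapest medium, used for anything staler than the tiers above


def choose_tier(record: FileRecord) -> str:
    """Return the first tier whose idle-time limit the file still satisfies."""
    for tier, max_idle_days in TIERS:
        if record.days_since_access <= max_idle_days:
            return tier
    return ARCHIVE_TIER


files = [
    FileRecord("daily_report.csv", 1),
    FileRecord("q1_results.xlsx", 45),
    FileRecord("2009_survey_raw.dat", 1200),
]
placement = {f.name: choose_tier(f) for f in files}
print(placement)
```

A real system would weigh more than access time, such as file size, cost per gigabyte, and availability requirements, but the principle of matching data to the cheapest adequate medium is the same.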

Meanwhile, patented “deduplication” capabilities — a technique focused on eliminating duplicate copies of repeating data — “would help boost the storage efficiency of the platform,” he added.
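To make the concept concrete, here is a minimal sketch of one common approach to deduplication: storing each unique chunk of data once, keyed by a hash of its contents. It is not IBM's patented method, whose details the company has not described here. The function names and sample data are hypothetical.

```python
import hashlib


def deduplicate(chunks):
    """Store each distinct chunk once and record the order needed to rebuild."""
    store = {}   # content hash -> chunk bytes, kept only once
    recipe = []  # ordered list of hashes describing the original data
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # no-op if this content was seen before
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original data from the stored chunks."""
    return b"".join(store[digest] for digest in recipe)


# Four chunks arrive, but only two distinct contents need to be stored.
chunks = [b"alpha", b"beta", b"alpha", b"alpha"]
store, recipe = deduplicate(chunks)

print(len(chunks), "chunks received,", len(store), "chunks stored")
```

The saving depends entirely on how repetitive the data is: backups and virtual machine images tend to deduplicate well, while already-compressed or encrypted data generally does not.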

“One of the challenges facing many enterprises is how to store increasing volumes of data cost-effectively,” Simon Robinson, research vice president for storage and information management with 451 Research, said. “Deduplicating data is one way of achieving this. It also speaks to another pain point: ‘How do I ensure my data is on the most cost-optimal platform over its lifespan?’ ”

‘Many view it as a marketing gimmick’

Scale-out storage is relatively new, Burgener said, but IBM is far from the first to tackle it. Storage industry granddaddy EMC (EMC) entered the game by buying the firm ScaleIO in July 2013, and there are a number of competing startups, including Nexenta, Maxta, and Nutanix, as well as Red Hat Gluster and Ceph.

“From a competitive viewpoint, ‘software defined storage’ has a lot of attention right now,” Robinson said. “Though our research indicates that many IT professionals view it as a marketing gimmick, it does speak to some long-standing issues in storage: essentially, how do I store and manage my data more cost effectively as volumes explode, and as I move to more cloud-like architectures?”

He added: “Storage giant EMC has gone big on SDS with ViPR [its software-defined storage product], and lots of startups are playing on this theme, although there’s no real consensus on what the term actually means.”

Looking ahead, such scale-out architectures will likely replace today’s monolithic storage arrays, Burgener predicted. “It could easily take five to seven years,” he said. “They’re still new, and still lack a lot of the functionality you get with an array.”

‘This changes the economics’

Ultimately, the financial and operational benefits of such technology may be too substantial to ignore.

“This hugely changes the economics of creating large storage configurations,” Burgener said. It also gives enterprises “the flexibility to modularly and very granularly increase storage capacity as needed.”

Traditional storage systems must move data to separate designated systems for transaction processing and analytics, Baltazar said, but “Elastic Storage can support analytics platforms such as Hadoop and will also integrate with cloud environments such as OpenStack. The ability to simultaneously handle analytics, cloud and virtualization workloads will allow customers to reduce storage silos within their environment, and should accelerate storage provisioning. Consolidation is a great benefit.”
