Even people who are not data scientists know that Hadoop is a big deal, even if they don’t know exactly what it is. Hadoop in many people’s eyes equals big data. And everyone wants big data, right?
The sales pitch is that Hadoop can take reams of information in different formats, crunch it, and provide the fodder for better business decisions. Companies, for example, can use Hadoop and associated analytics to dig into social networks to find out what people are saying about their products on Twitter, Facebook, etc., and use that information to change course if necessary.
That sort of thing can be valuable, which is why vendors like Cloudera and Hortonworks are built on that Hadoop foundation.
And yet … adoption isn’t setting the world on fire, according to a new Gartner survey that shows fewer than half of its 284 respondents have invested in Hadoop technology or even plan to do so.
Just over a quarter (26%) of respondents said they are deploying or experimenting with Hadoop, and only 11% said they plan to invest in Hadoop within 12 months. In a press release, Gartner analysts cited two possible reasons: some respondents did not feel Hadoop was a priority, and others felt it was “overkill.” Ouch.
The survey also confirmed the notion that a persistent shortage of Hadoop skills is hindering adoption.
The respondents work in big companies—the average annual revenue for the member companies is $3.4 billion and mean head count is just under 7,900 employees. They hear that Hadoop is this sexy new thing but they don’t know what to make of it, said Gartner Research Director Nick Heudecker.
“We get lots of questions from clients about what this thing is good for,” he said. “It doesn’t help that Hadoop is not a thing but several dozen software components. That is challenging for an enterprise that has lots of priorities.”
And, for those who aren’t sure what Hadoop is, it basically consists of two important subsystems: Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Together they take enormous quantities of data and process them relatively quickly. They do this by splitting jobs into small pieces and spreading those pieces across multiple servers. Because the data is kept on the servers themselves rather than on dedicated storage systems, jobs are processed faster and more cheaply. But there are still compromises. A Hadoop job takes time and runs in batches, so it’s not well suited to real-time data or jobs that see a continuous inflow of data.
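For readers who want to see the split-and-combine idea in miniature, here is a rough sketch in plain Python. It is not Hadoop's API and runs in a single process; the function names (`split_into_chunks`, `map_chunk`, `shuffle`, `reduce_counts`, `word_count`) and the sample posts are made up for illustration. It counts words across a handful of social media posts the way a MapReduce job might: break the input into chunks, map each chunk to (word, 1) pairs, group the pairs by word, then sum them.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, chunk_size):
    """Split the input into small pieces, as a cluster would spread them over servers."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, chunk_size=2):
    chunks = split_into_chunks(records, chunk_size)
    # On a real cluster each chunk would be mapped on a different server;
    # here the chunks are simply processed one after another.
    mapped = chain.from_iterable(map_chunk(chunk) for chunk in chunks)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical sample data standing in for social media posts.
    posts = [
        "love the new phone",
        "the phone battery is weak",
        "love the camera",
    ]
    print(word_count(posts))
```

Note that the whole input has to be on hand before the job starts and nothing comes out until the reduce step finishes, which is the batch behavior described above.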
But back to the survey. The upshot seems to be that while Hadoop can handle huge data sets and make them usable, the skills needed to set up and run Hadoop remain scarce and expensive. And, for at least a subset of the corporate population, the perceived advantages do not yet outweigh the cost and complications.