
How a little open source project came to dominate big data

By Katherine Noyes
June 30, 2014, 5:49 PM ET
Hadoop allows some of the world's largest companies to store and process datasets on clusters of commodity hardware. Photograph by Jetta Productions/Getty Images

There are countless open source projects with crazy names in the software world today, but the vast majority of them never make it onto enterprises’ collective radar. Hadoop is an exception of pachydermic proportions.

Named after a child’s toy elephant, Hadoop is now powering big data applications at companies such as Yahoo (YHOO) and Facebook (FB); more than half of the Fortune 50 use it, providers say.

The software’s “refreshingly unique approach to data management is transforming how companies store, process, analyze and share big data,” according to Forrester analyst Mike Gualtieri. “Forrester believes that Hadoop will become must-have infrastructure for large enterprises.”

Globally, the Hadoop market was valued at $1.5 billion in 2012; by 2020, it is expected to reach $50.2 billion.

It’s not often a grassroots open source project becomes a de facto standard in industry. So how did it happen?

‘A market that was in desperate need’

“Hadoop was a happy coincidence of a fundamentally differentiated technology, a permissively licensed open source codebase and a market that was in desperate need of a solution for exploding volumes of data,” said RedMonk cofounder and principal analyst Stephen O’Grady. “Its success in that respect is no surprise.”

Created by Doug Cutting and Mike Cafarella, the software—like so many other inventions—was born of necessity. In 2002, the pair were working on an open source search engine called Nutch. “We were making progress and running it on a small cluster, but it was hard to imagine how we’d scale it up to running on thousands of machines the way we suspected Google was,” Cutting said.

Shortly thereafter, Google (GOOG) published a series of academic papers describing its in-house Google File System and MapReduce systems, and “it was immediately clear that we needed some similar infrastructure for Nutch,” Cafarella said.

“The way Google was approaching things was different and powerful,” Cutting explained. Until then, “you had to build a special-purpose system for each distributed thing you wanted to do.” Google’s approach instead offered a general-purpose, automated framework for distributed computing. “It took care of the hard part of distributed computing so you could focus just on your application,” Cutting said.
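To make the idea concrete, here is a minimal sketch of the map, shuffle and reduce pattern behind MapReduce, assuming a simple word-count task and running every step sequentially in a single Python process. The function names (map_words, shuffle, reduce_counts, run_word_count) are hypothetical illustrations, not Hadoop's actual API. What Hadoop adds, and what this sketch does not attempt, is the "hard part" Cutting describes: spreading these phases across thousands of machines and recovering when some of them fail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Map phase: emit a (word, 1) pair for every word in one input record.
def map_words(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


# Shuffle phase: group every emitted value under its key.
def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce phase: collapse all values for one key into a single result.
def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)


def run_word_count(lines: Iterable[str]) -> Dict[str, int]:
    # A real framework would schedule map and reduce tasks on many machines;
    # here all three phases run one after another in this process.
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_word_count(docs))
```

The application programmer writes only the small map and reduce functions; in a framework like Hadoop, deciding where each task runs and moving data between the phases is the framework's job, not the application's.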

Both Cutting and Cafarella (who are now chief architect at Cloudera and University of Michigan assistant professor of computer science and engineering, respectively) knew they wanted to make a version of their own—not just for Nutch, but for the benefit of others as well—and they knew they wanted to make it open source.

“I don’t enjoy the business aspects,” Cutting said. “I’m a technical guy. I enjoy working on the code, tackling the problems with peers and trying to improve it, not trying to sell it. I’d much rather tell people, ‘It’s kind of OK at this; it’s terrible at that; maybe we can make it better.’ To be able to be brutally honest is really nice—it’s much harder to be that way in a commercial setting.”

But the pair knew that the potential upside of success could be staggering. “If I was right and it was useful technology that lots of people wanted to use, I’d be able to pay my rent—and without having to risk my shirt on a startup,” Cutting said.

For Cafarella, “Making Nutch open source was part of a desire to see search engine technology outside the control of a few companies, but also a tactical decision that would maximize the likelihood of getting contributions from engineers at big companies. We specifically chose an open source license that made it easy for a company to contribute.”

It was a good decision. “Hadoop would not have become a big success without large investments from Yahoo and other firms,” Cafarella said.

‘How would you compete with open source?’

So Hadoop borrowed an idea from Google, made the concept open source, and both encouraged and got investment from powerhouses like Yahoo. But that wasn’t all that drove its success. Luck—in the form of sheer, unanticipated market demand—also played a key role.

“I knew other people would probably have similar problems, but I had no idea just how many other people,” Cutting said. “I thought it would be mostly people building text search engines. I didn’t see it being used by folks in insurance, banking, oil discovery—all these places where it’s being used today.”

Looking back, “my conjecture is that we were early enough, and that the combination of being first movers and being open source and being a substantial effort kept there from being a lot of competitors early on,” he said. “Mike and I got so far, but it took tens of engineers from Yahoo several more years to make it stable.”

And even if a competitor did manage to catch up, “how would you compete with something open source?” Cutting said. “Competing against open source is a tough game—everybody else is collaborating on it; the cost is zero. It’s easier to join than to fight.”

IBM (IBM), Microsoft (MSFT), and Oracle (ORCL) are among the large companies that chose to collaborate on Hadoop rather than compete with it.

Though Cafarella isn’t surprised that Web companies use Hadoop, he is astonished at “how many people now have data management problems that 12 years ago were exceedingly rare,” he said. “Everyone now has the problems that used to belong to just Yahoo and Google.”

Hadoop represents “somewhat of a turning point in the primary drivers of open source software technology,” said Jay Lyman, a senior analyst for enterprise software with 451 Research. Previously, open source software such as the Linux operating system was best known for offering a cost-effective alternative to proprietary software like Microsoft’s Windows. “Cost savings and efficiency drove much of the enterprise use,” Lyman said.

With the advent of NoSQL databases and Hadoop, however, “we saw innovation among the primary drivers of adoption and use,” Lyman said. “When it comes to NoSQL or Hadoop technology, there is not really a proprietary alternative.”

Hadoop’s success has come as a pleasant surprise to its creators. “I didn’t expect an open source project would ever take over an industry like this,” Cutting said. “I’m overjoyed.”

And it’s still on a roll. “Hadoop is now much bigger than the original components,” Cafarella said. “It’s an entire stack of tools, and the stack keeps growing. Individual components might have some competition—mainly MapReduce—but I don’t see any strong alternative to the overall Hadoop ecosystem.”

The project’s adaptability “argues for its continued success,” RedMonk’s O’Grady said. “Hadoop today is a very different, and more versatile, project than it was even a year or two ago.”

But there’s plenty of work to be done. Looking ahead, Cutting—with the support of Cloudera—has begun to focus on the policy needed to accommodate big data technology.

“Now that we have this technology and so much digitization of just about every aspect of commerce and government and we have these tools to process all this digital data, we need to make sure we’re using it in ways we think are in the interests of society,” he said. “In many ways, the policy needs to catch up with the technology.

“One way or other, we are going to end up with laws. We want them to be the right ones.”
