• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia
TechAI

If data is the new oil, these companies are the new Baker Hughes

Jeremy Kahn
By
Jeremy Kahn
Jeremy Kahn
Editor, AI
Down Arrow Button Icon
February 4, 2020, 7:00 AM ET

Artificial intelligence runs on data. And today, in most cases, that data needs to be labeled by humans.

This is particularly true when using computer vision to identify tumors on medical scans, spot roof damage from aerial photography or figure out whether an object crossing in front of your self-driving car is a plastic bag or a mother pushing a stroller. But it’s also true for speech recognition: To train the software, someone must provide an accurate transcript to match an audio recording.

Data labeling for machine learning has spawned an entirely new industry, and the companies springing up to help businesses label their data are among the hottest “picks and shovels” investment plays for venture capitalists hoping to cash in on the current A.I. gold rush.

The latest datapoint in this data labeling boom: Labelbox, a San Francisco startup that operates a software platform for helping companies manage their data labeling tasks, on Tuesday announced it had received $25 million in additional venture capital funding.

The money is from prominent Silicon Valley venture capital firm Andreessen Horowitz, whose managing partner Peter Levine, is joining Labelbox’s board; Google’s A.I.-focused venture capital fund, Gradient Ventures; and Kleiner Perkins, another of the Valley’s best-known firms.

The investment, which is Labelbox’s Series B, or second round of institutional financing, brings the total that the not-quite-two-year-old startup has raised to $39 million.

Labelbox competes with a number of other labeling companies: there’s Scale AI, another San Francisco data labeling platform that has raised $122 million since its founding three years ago, as well as companies that specialize in running teams of human data labelers on a project basis, such as Hive, Cloudfactory, and Samasource, the startup founded by Leila Janah, who died last month at age 37, but who saw data labeling as a way to bring decent wages and skilled work to people in the developing world.

Alexandr Wang, the 23-year-old founder and CEO of Scale AI, which has worked with a number of self-driving car companies, says that the “dirty secret” of artificial intelligence is that getting the software to work well in the real world requires a large amount of high-quality data.

“Where the rubber hits the road is what does the data these A.I. systems are trained on look like?” he says. “Is that data biased? Is that data high quality? Does that data have noise? Is that data comprehensive?”

Providing labels can be relatively low-skilled work (identifying “cats” in videos) performed by thousands of contractors in traditional outsourcing hubs such as India, Romania, or the Philippines, or it can be much higher-skilled work performed by radiologists (outline the exact contours of a tumor on a medical scan) or lawyers (identify a non-compete clause in a contract). Often companies have a need for both general and more expert labeling and employ a combination of outsourcing firms, freelancers, and in-house experts to affix these annotations. The labels can be in the form of bounding boxes around objects, tagging items visually or with text labels in photographs, or entering a classification into a separate text-based database that accompanies the original data.

Wang says that with such complex work flows, data governance—how companies track what data they are using, who’s using it, and what they are doing with it— is critical. “It isn’t sexy, but it really matters,” he says. Companies trying to deploy machine learning are often slowed because they don’t have systems in place to manage data labeling efficiently, he says.

Both Scale AI and Labelbox provide tools to help companies’ machine learning and data science teams analyze the data once it is labeled, allowing them to identify blindspots and biases. For example, are men overrepresented in your X-ray data (bias)? Or did you have too few examples of cats running across the road in order to train your self-driving algorithm to brake for them (a blindspot)? “Every A.I. company needs tools to edit, manage, and review labels,” Manu Sharma, Labelbox’s co-founder and CEO, says.

Michael Phillippi, vice president of technology at Lytx, a San Diego company that sells systems that allow trucking businesses to assess and track drivers’ behavior through cameras and sensor data, says it takes about 10,000 hours of labeled 20-second video clips to train a prototype A.I. system to detect something like driver distraction. To put that system into actual production, though, requires four to five million hours of video, he says. That is a lot of labeling.

John-Isaac Clark is the CEO of Arturo.ai, a spin out from American Family Insurance that specializes in machine learning software to analyze images, including satellite and aerial photography, for the insurance industry. He says that large, well-labeled data sets are especially important for training A.I. software to correctly identify “edge cases”—unusual or rare situations.

Humans can often use common sense to deal with these situations, even when they haven’t encountered them before. Most A.I. systems, in contrast, need to have seen multiple examples during training to correctly handle them.

Both Arturo and Lytx are Labelbox customers. Clark says Labelbox enabled Arturo to reduce the number of employees it needed to supervise its data labeling contractors from four to just one.

Sharma and his co-founder Brian Rieger, who is now the Labelbox’s chief operating officer, met when they both worked in aeronautics industry, helping to design and test flight control systems. Sharma later worked for Planet Labs, a company that analyzes gigantic datasets of satellite images, where he realized the difficulty companies had with managing labeling tasks for A.I. training data and began thinking of creating a company to address this problem. His other co-founder, Dan Rasmuson, now Labelbox’s chief technology officer, had encountered similar problems working at a company that sold drone imagery.

Labelbox’s software supplies a set of labeling tools for both images and text, as well as a way to distribute data to labelers in such a way that multiple labelers can work on the same data simultaneously without duplicating any labels.

Some companies in the labeling space, such as Scale AI and Hive, provide labeling services themselves. In fact, Scale AI uses its own A.I. software to automatically generate labels for certain kinds of data. These labels are then checked by humans to ensure accuracy, Wang says.

Automatic labeling, he says, allows Scale AI’s customers to benefit from the work Scale AI has done in the past—if it has already built a system to detect cars in videos, for instance, customers may not need to train their own system from scratch. Even in cases where customers want to build their own models, he says, automatic labeling makes the process more efficient.

Labelbox, meanwhile, has taken a different approach. It doesn’t perform any labeling itself. Instead, it’s a tool for managing labeling projects and data across different contract labelers, who often work for large outsourcing firms. The software also allows Labelbox’s customers to audit the quality of labeling contractors. Labelbox gets paid based on how much data a customer runs through the software.

Andreessen Horowitz’s Levine compares Labelbox to Github, the software code repository that many companies use to manage their code. Acquired by Microsoft for $7.5 billion in 2018, it was an Andreessen Horowitz investment. “Labelbox has the potential to fill a similar role for data in the AI/ML world,” Levine writes in response to emailed questions, using shorthand for artificial intelligence and machine learning. He says the platform can serve as “a single source of truth” for training data across an organization.

This story has been updated to correct the spelling of Labelbox chief technology officer Dan Rasmuson’s last name.

More must-read stories from Fortune:

—The long ocean voyage that helped find the flaws in GPS
—Global companies enter lockdown mode as coronavirus rocks China
—3 key takeaways from Tesla’s blockbuster fourth-quarter earnings
—Facebook says its ad machine is being weakened by privacy changes
—Predicting the biggest tech headlines of 2020

Catch up with Data Sheet, Fortune’s daily digest on the business of tech.

About the Author
Jeremy Kahn
By Jeremy KahnEditor, AI
LinkedIn iconTwitter icon

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

See full bioRight Arrow Button Icon

Latest in Tech

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • Future 50
  • World’s Most Admired Companies
  • See All Rankings
Sections
  • Finance
  • Leadership
  • Success
  • Tech
  • Asia
  • Europe
  • Environment
  • Fortune Crypto
  • Health
  • Retail
  • Lifestyle
  • Politics
  • Newsletters
  • Magazine
  • Features
  • Commentary
  • Mpw
  • CEO Initiative
  • Conferences
  • Personal Finance
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
About Us
  • About Us
  • Editorial Calendar
  • Press Center
  • Work At Fortune
  • Diversity And Inclusion
  • Terms And Conditions
  • Site Map

© 2025 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.


Most Popular

placeholder alt text
Future of Work
Meet a 55-year-old automotive technician in Arkansas who didn’t care if his kids went to college: ‘There are options’
By Muskaan ArshadDecember 21, 2025
1 day ago
placeholder alt text
Future of Work
A Walmart employee nearly doubled her pay after entering its pipeline for skilled tradespeople. 'I was able to move out of my parents' house'
By Anne D'Innocenzio and The Associated PressDecember 20, 2025
1 day ago
placeholder alt text
Success
Multimillionaire musician Will.i.am says work-life balance is for people 'working on someone else’s dream'—he grinds from 5-to-9 after his 9-to-5
By Orianna Rosa RoyleDecember 21, 2025
19 hours ago
placeholder alt text
Economy
For the first time since Trump’s tariff rollout, import tax revenue has fallen, threatening his lofty plans to slash the $38 trillion national debt
By Sasha RogelbergDecember 12, 2025
10 days ago
placeholder alt text
Economy
Even if the Supreme Court rules Trump's global tariffs are illegal, refunds are unlikely because that would be 'very complicated,' Hassett says
By Jason MaDecember 21, 2025
13 hours ago
placeholder alt text
Success
The scientist who helped create AI says it’s only ‘a matter of time’ before every single job is wiped out—even safer trade jobs like plumbing
By Orianna Rosa RoyleDecember 19, 2025
3 days ago

Latest in Tech

A Waymo robotaxi unable to detect traffic lights after a major power outage in San Francisco, California on December 20, 2025. (Photo: Tayfun Coskun/Anadolu/Getty Images)
NewslettersFortune Tech
What happened when Waymo robotaxis met a San Francisco blackout
By Andrew NuscaDecember 22, 2025
39 minutes ago
AIdesign thinking
A top global design alliance is embracing AI to ‘let designers focus more on empathy and creativity’
By Angelica AngDecember 22, 2025
3 hours ago
AIOpenAI
OpenAI sees better margins on business sales, report says
By Mark Bergen and BloombergDecember 21, 2025
12 hours ago
Innovationautonomy
Waymos froze, blocked traffic during San Francisco power outage
By Maria Paula Mijares Torres and BloombergDecember 21, 2025
12 hours ago
Successwork-life balance
Multimillionaire musician Will.i.am says work-life balance is for people ‘working on someone else’s dream’—he grinds from 5-to-9 after his 9-to-5
By Orianna Rosa RoyleDecember 21, 2025
19 hours ago
Young banker
SuccessCareers
Is AI really killing finance and banking jobs? Experts say Wall Street’s layoffs may be more hype than takeover—for now
By Emma BurleighDecember 21, 2025
23 hours ago