• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia
TechAI

Israeli startup raises $18.5 million to train A.I. with fake data

Jeremy Kahn
By
Jeremy Kahn
Jeremy Kahn
Editor, AI
Down Arrow Button Icon
March 16, 2021, 3:00 AM ET

Subscribe to Eye on A.I. for expert weekly analysis on the intersection of artificial intelligence and industry, delivered free to your inbox.

Companies interested in using artificial intelligence face a big obstacle: Having enough of the right kind of data to train their systems.

Companies need large amounts of labelled, historical examples to train A.I. systems, particularly those that work with images and videos. The demand has spawned a whole sub-industry of companies that specialize in helping other businesses annotate their data. Among them are Scale AI, which was valued at $3.5 billion in a December 2020 funding round, Hive, Sama,  Labelbox, Cloudfactory, and a division of A.I. company Clarifai, among others.

But there is another way to produce enough data to train A.I. systems: Fabricate it.

Fake it till you make it

That’s essentially what a fast-growing Israeli startup called DataGen specializes in doing. The company uses its own machine learning systems to create what’s known as “synthetic data”—in this case, artificially generated still and video images—that DataGen’s customers then use to train their own A.I.

DataGen can produce a bespoke synthetic dataset for its customers in just a few hours, says Ofir Chakon, DataGen’s founder and chief executive officer. Compare that to the months it typically takes a data labelling company to curate an equivalent real world video or image library.

Synthetic data has other advantages too, in addition to speed. With synthetic data, companies don’t need to worry about any personal identifying information in the dataset, nor need they worry about ethical considerations around how the data was collected. This feature gains significance as more and more of the world’s population is covered by data protection laws. Gartner, the technology analytics firm, says that by 2023, 65% of the world’s population will have their personal data covered by some sort of privacy regulation, up from just 10% last year.

Data bias can still be a problem though. A synthetic dataset can, in some cases, simply replicate the same biases found in a real dataset. But DataGen has ways to potentially eliminate bias. The company can shape the dataset it generates however it wishes, allowing the company to create a lot more examples of unusual or rare cases to ensure that an A.I. system will know how to handle these. For instance, what will happen to a robot that uses a video camera to “see” as it navigates around a warehouse if there is a power cut and the warehouse’s low-level emergency lighting switches on? Acquiring enough examples of these rarer cases is far more difficult with real world datasets.

Courtesy of DataGen

“Our customers have full control over all the parameters that go into the data they create,” Chakon says. “The real-world implication is that, once deployed, you can be sure it’s going to work well in different domains, with different ethnicities, in different geographic locations or any environment you can imagine.”

An enabler for the whole A.I. industry

DataGen has attracted some big name investors.

On Tuesday, the company announced a $18.5 million early stage funding round lead by Israeli venture capital funds TLV Partners and Viola Ventures. The round also includes an impressive list of machine learning luminaries. They include Michael Black, a computer vision pioneer who is now director of the Max Planck Institute for Intelligent Systems; Gal Chechik, director of A.I. research at computer chip giant Nvidia; Anthony Goldbloom, the chief executive officer and cofounder of machine learning competition site Kaggle; and Trevor Darrell, a computer science professor at the University of California at Berkeley. Existing investor Spider Capital is participating in the new funding round too.

Rona Segev, a founding partner at TLV, says that simulated data “addresses problems which are just unsolvable without it.” She says that synthetic data is “an enabler for the whole A.I. industry. Without simulated data, the industry will slow.”

DataGen said it would use the funds to hire more machine learning experts and engineers, expanding from the 30 employees it has currently, most of them based in Israel. Chakon said the company would also expand its focus from creating training sets for machine learning to data that is also designed to test those A.I. systems once they have been trained.

The future product plans aim to address a major problem with a lot of A.I. systems: quality assurance. Oftentimes, only a small subset of available data is reserved for testing an A.I. As a result, it may be hard for a company to test enough rare situations to know how well an A.I. will perform if it encounters the same or similar situations in the real world.

Courtesy of DataGen

DataGen’s cofounders Ofir Chakon, CEO, (left) and Gil Elbaz, tech chief (right) create so-called synthetic data to train A.I. systems.

The startup, which was founded in 2018, has about 10 paying customers so far, “all of them big companies,” Chakon says, although he says contractual agreements prevent him from naming them. DataGen’s data has been used to train warehouse robots to pick items off a conveyor belt, help with factory operations for a home appliance manufacturer, and for a number of physical security applications, such as identifying shop lifters in a retail store.

Just like the real thing

“We are experts in everything that has to do with indoor environments and human perceptions,” Chakon says, adding that the company can also simulate the way people move in indoor environments. “We generate data that looks exactly like the target domain.”

In other words, a set of DataGen-created images of various household items in a crate—a scene used to train a robot picking arm in a logistics warehouse—looks just like the real overhead video images taken of those objects in a real crate on a real warehouse conveyor. A fabricated scene of a kitchen looks as if the company had gone out and commissioned photography of a real kitchen. And a simulation of a person’s face displays all the same movement points, textures, and skin tones as would be found in a real photograph or video.

DataGen uses software that represents objects and people as a kind of three-dimensional mesh, allowing the user to easily edit and adjust their size and shape. The company pairs that visual meshwork with a physics simulator to create realistic scenes of how objects moves. By doing so, the company can easily depict what happens when one object moves on top of or in front of another, potentially obscuring a clear view of that object from a certain angle.

DataGen uses a machine learning technique called a GAN, short for “generative adversarial network,” to create its realistic simulations. GANs also underpin the creation of so-called Deepfakes, which are a kind of synthetic data, but Deepfakes exist only in a two-dimensional representation of a person’s face, not a three-dimensional one.

Chakon says he thinks DataGen’s use of 3-D simulation gives it an advantage over other companies that are trying to use 2-D photos and videos to create synthetic data. He said it is much more difficult to simulate the interaction of objects—particularly when one object obscures or occludes another or two objects collide—accurately with just two-dimensional data.

More must-read tech coverage from Fortune:

  • After its IPO, Coupang eyes South Korea domination
  • The pandemic’s edtech boom won’t slow down anytime soon
  • One year later: 15 ways life has changed since the onset of the COVID pandemic
  • Facebook reveals A.I. that is already improving Instagram video recommendations
  • HBO Max will offer cheaper subscriptions—for people who don’t mind watching ads
About the Author
Jeremy Kahn
By Jeremy KahnEditor, AI
LinkedIn iconTwitter icon

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

See full bioRight Arrow Button Icon

Latest in Tech

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • Future 50
  • World’s Most Admired Companies
  • See All Rankings
Sections
  • Finance
  • Leadership
  • Success
  • Tech
  • Asia
  • Europe
  • Environment
  • Fortune Crypto
  • Health
  • Retail
  • Lifestyle
  • Politics
  • Newsletters
  • Magazine
  • Features
  • Commentary
  • Mpw
  • CEO Initiative
  • Conferences
  • Personal Finance
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
About Us
  • About Us
  • Editorial Calendar
  • Press Center
  • Work At Fortune
  • Diversity And Inclusion
  • Terms And Conditions
  • Site Map

Latest in Tech

Luigi
CybersecurityCrime
‘It seemed preposterous on its face’: Altoona cop’s supervisor said he’d buy his favorite hoagie moments before Luigi Mangione arrest
By Michael R. Sisak, Jennifer Peltz and The Associated PressDecember 18, 2025
7 hours ago
Bill Gates
CybersecurityJeffrey Epstein
House Democrats release more Epstein photos, including Bill Gates and a dinner full of wealthy philanthropists
By Stephen Groves and The Associated PressDecember 18, 2025
7 hours ago
The Trump Media & Technology Group said Dec. 18 it would merge in a $6 billion deal with the TAE Technologies fusion energy developer.
EnvironmentDonald Trump
CEO of nuclear fusion firm Trump Media is merging with in $6 billion deal: High-velocity capital is ‘critical’ and concerns are secondary
By Jordan BlumDecember 18, 2025
8 hours ago
Lovable CEO
AICoding
Lovable hits $6.6 billion valuation as its CEO says it wants to be ‘the last piece of software’ companies ever buy
By Beatrice NolanDecember 18, 2025
9 hours ago
unemployed
CommentaryLayoffs
The AI efficiency illusion: why cutting 1.1 million jobs will stifle, not scale, your strategy
By Katica RoyDecember 18, 2025
11 hours ago
AIFintech
How Salient, an AI loan processing startup valued at $500 million, grew ARR to $25 million in two years
By Lily Mae LazarusDecember 18, 2025
12 hours ago

Most Popular

placeholder alt text
Economy
The $38 trillion national debt is to blame for over $1 trillion in annual interest payments from here on out, CRFB says
By Nick LichtenbergDecember 17, 2025
2 days ago
placeholder alt text
AI
'Robots are going to be amongst us': Qualcomm exec says buckle up for the next 5 years. Your car is going to be the first shoe to drop
By Nino PaoliDecember 17, 2025
2 days ago
placeholder alt text
C-Suite
Red Lobster CEO Damola Adamolekun says the key to being a better leader is being a better person: ‘Leadership is self-improvement’
By Sydney LakeDecember 17, 2025
2 days ago
placeholder alt text
Success
As millions of Gen Zers face unemployment, McDonald's CEO dishes out some tough love career advice for navigating the market: ‘You've got to make things happen for yourself’
By Preston ForeDecember 16, 2025
3 days ago
placeholder alt text
Success
Britain’s defense chief calls on Gen Z grads leaving university to skip corporate jobs and join the military as war with Russia becomes a growing risk
By Emma BurleighDecember 17, 2025
2 days ago
placeholder alt text
AI
Amazon CEO Andy Jassy announces departure of AI exec Rohit Prasad in leadership shake-up
By Sharon GoldmanDecember 17, 2025
1 day ago

© 2025 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.