
The AI gold rush is hitting a ‘bottleneck’ that could spell disaster for Google and Meta

By Matt O'Brien and The Associated Press
June 6, 2024, 12:56 PM ET
An AI 'bottleneck' could hurt Sam Altman's OpenAI in the near future, a new study found. (Photo: Andrew Caballero-Reynolds/AFP via Getty Images)

Artificial intelligence systems like ChatGPT could soon run out of what keeps making them smarter—the tens of trillions of words people have written and shared online.

A new study released Thursday by research group Epoch AI projects that tech companies will exhaust the supply of publicly available training data for AI language models by roughly the turn of the decade—sometime between 2026 and 2032.

Comparing it to a “literal gold rush” that depletes finite natural resources, Tamay Besiroglu, an author of the study, said the AI field might face challenges in maintaining its current pace of progress once it drains the reserves of human-generated writing.

In the short term, tech companies like ChatGPT-maker OpenAI and Google are racing to secure and sometimes pay for high-quality data sources to train their large language models, for instance by signing deals to tap into the steady flow of sentences coming out of Reddit forums and news media outlets.

In the longer term, there won’t be enough new blogs, news articles and social media commentary to sustain the current trajectory of AI development. That puts pressure on companies to tap into sensitive data now considered private, such as emails or text messages, or to rely on less-reliable “synthetic data” spit out by the chatbots themselves.

“There is a serious bottleneck here,” Besiroglu said. “If you start hitting those constraints about how much data you have, then you can’t really scale up your models efficiently anymore. And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output.”

A 2- to 8-year cliff

The researchers first made their projections two years ago—shortly before ChatGPT’s debut—in a working paper that forecast a more imminent 2026 cutoff of high-quality text data. Much has changed since then, including new techniques that enabled AI researchers to make better use of the data they already have and sometimes “overtrain” on the same sources multiple times.

But there are limits, and after further research, Epoch now foresees running out of public text data sometime in the next two to eight years.

The team’s latest study is peer-reviewed and due to be presented at this summer’s International Conference on Machine Learning in Vienna, Austria. Epoch is a nonprofit institute hosted by San Francisco-based Rethink Priorities and funded by proponents of effective altruism—a philanthropic movement that has poured money into mitigating AI’s worst-case risks.

Besiroglu said AI researchers realized more than a decade ago that aggressively expanding two key ingredients—computing power and vast stores of internet data—could significantly improve the performance of AI systems.

The amount of text data fed into AI language models has been growing about 2.5 times per year, while computing has grown about 4 times per year, according to the Epoch study. Facebook parent company Meta Platforms recently claimed that the largest version of its upcoming Llama 3 model, which has not yet been released, has been trained on up to 15 trillion tokens, each of which can represent a piece of a word.
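To get a feel for how quickly that kind of growth eats into a fixed supply, here is a rough back-of-the-envelope sketch. The 15 trillion tokens and the 2.5-times-per-year growth rate are the figures cited above; the total stock of usable public text (`ASSUMED_STOCK`) is a purely hypothetical number chosen for illustration, and the function name is likewise made up here. It is not Epoch's model, which also accounts for factors such as data quality and training on the same sources more than once.

```python
import math


def years_until_exhaustion(current_tokens, stock_tokens, growth_per_year):
    """Years until a dataset growing by a constant yearly factor reaches the stock.

    Solves current_tokens * growth_per_year ** n == stock_tokens for n.
    """
    if current_tokens >= stock_tokens:
        return 0.0
    return math.log(stock_tokens / current_tokens) / math.log(growth_per_year)


CURRENT_TOKENS = 15e12   # from the article: Meta's reported Llama 3 training set
GROWTH = 2.5             # from the article: yearly growth in training text
ASSUMED_STOCK = 300e12   # ASSUMPTION: illustrative stock of public text, not from the article

years = years_until_exhaustion(CURRENT_TOKENS, ASSUMED_STOCK, GROWTH)
print(f"Roughly {years:.1f} years until training sets match the assumed stock")
```

Under those assumptions the answer comes out to a little over three years. Because the growth is exponential, even a much larger assumed stock pushes the date back only modestly: multiplying the stock by 2.5 buys just one extra year.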

But how much it’s worth worrying about the data bottleneck is debatable.

“I think it’s important to keep in mind that we don’t necessarily need to train larger and larger models,” said Nicolas Papernot, an assistant professor of computer engineering at the University of Toronto and researcher at the nonprofit Vector Institute for Artificial Intelligence.

‘You photocopy the photocopy’

Papernot, who was not involved in the Epoch study, said building more skilled AI systems can also come from training models that are more specialized for specific tasks. But he has concerns about training generative AI systems on the same outputs they’re producing, leading to degraded performance known as “model collapse.”

Training on AI-generated data is “like what happens when you photocopy a piece of paper and then you photocopy the photocopy. You lose some of the information,” Papernot said. Not only that, but Papernot’s research has also found it can further encode the mistakes, bias and unfairness that are already baked into the information ecosystem.
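The photocopy effect can be imitated with a toy simulation. This is only a sketch under strong assumptions, not the method used by Papernot or by any AI lab: the "model" here is just a bell curve described by an average and a spread, and each generation is fitted to a small sample drawn from the previous generation instead of from the original data. The function name, sample size and number of generations are all hypothetical choices.

```python
import random
import statistics


def photocopy_generations(generations=200, sample_size=10, seed=0):
    """Repeatedly refit a bell curve to samples drawn from its previous fit.

    Returns the spread (standard deviation) recorded at each generation,
    starting with the original spread of 1.0.
    """
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    mu, sigma = 0.0, 1.0           # generation 0: the "human-written" data
    history = [sigma]
    for _ in range(generations):
        # Draw a small batch of "synthetic" data from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...and train the next model only on that batch.
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        history.append(sigma)
    return history


history = photocopy_generations()
print(f"spread at start: {history[0]:.3f}")
print(f"spread after {len(history) - 1} copies: {history[-1]:.6f}")
```

In runs like this the spread tends to shrink toward zero over many generations: each copy loses a little of the variety in the original, and the losses compound. Real language models are vastly more complex, so this illustrates the intuition without predicting how any actual system behaves.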

If real human-crafted sentences remain a critical AI data source, those who are stewards of the most sought-after troves—websites like Reddit and Wikipedia, as well as news and book publishers—have been forced to think hard about how they’re being used.

“Maybe you don’t lop off the tops of every mountain,” joked Selena Deckelmann, chief product and technology officer at the Wikimedia Foundation, which runs Wikipedia. “It’s an interesting problem right now that we’re having natural resource conversations about human-created data. I shouldn’t laugh about it, but I do find it kind of amazing.”

While some have sought to close off their data from AI training—often after it’s already been taken without compensation—Wikipedia has placed few restrictions on how AI companies use its volunteer-written entries. Still, Deckelmann said she hopes there continue to be incentives for people to keep contributing, especially as a flood of cheap and automatically generated “garbage content” starts polluting the internet.

AI companies should be “concerned about how human-generated content continues to exist and continues to be accessible,” she said.

From the perspective of AI developers, Epoch’s study says paying millions of humans to generate the text that AI models will need “is unlikely to be an economical way” to drive better technical performance.

As OpenAI begins work on training the next generation of its GPT large language models, CEO Sam Altman told the audience at a United Nations event last month that the company has already experimented with “generating lots of synthetic data” for training.

“I think what you need is high-quality data. There is low-quality synthetic data. There’s low-quality human data,” Altman said. But he also expressed reservations about relying too heavily on synthetic data over other technical methods to improve AI models.

“There’d be something very strange if the best way to train a model was to just generate, like, a quadrillion tokens of synthetic data and feed that back in,” Altman said. “Somehow that seems inefficient.”




© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information