
Cerebras hopes planned IPO will supercharge its race against Nvidia and fellow chip startups for the fastest generative AI

By Sharon Goldman, AI Reporter
October 1, 2024, 3:13 PM ET
Andrew Feldman, CEO of Cerebras Systems. Ramsey Cardy—Sportsfile for Collision via Getty Images

Hello and welcome to Eye on AI! In this edition…Governor Newsom vetoes SB 1047; ByteDance plans new AI model based on Huawei chips; Microsoft announces AI models will improve Windows search; and the U.S. Commerce Department sets a new rule that eases restrictions on AI chip shipments to the Middle East.


Cerebras has a need for speed. In a bid to take on Nvidia, the AI chip startup is rapidly moving toward an IPO after filing for one yesterday. At the same time, the company is locked in a fierce race with fellow AI chip startups Groq and SambaNova for the title of “fastest generative AI.” All three are pushing the boundaries of their highly specialized hardware and software to deliver ultra-fast inference, at speeds they claim outperform Nvidia GPUs. 

Here’s what that means: When you ask an AI assistant a question, it must sift through all of the knowledge in its AI model to quickly come up with an answer. In industry parlance, that process is known as “inference.” But large language models don’t sift through words during the inference process. When you ask a question or give a chatbot a prompt, the AI breaks that into smaller pieces called “tokens”—which could represent a word, or a chunk of a word—to process its answer and respond. 
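To get an intuition for token counts without running a real tokenizer, a common rule of thumb is that one token corresponds to roughly four characters of English text. The function below is a hypothetical back-of-the-envelope estimator, not any vendor's actual tokenizer (real systems use learned subword schemes like BPE, so true counts vary):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token-count estimate using the ~4-characters-per-token
    rule of thumb for English text. Actual tokenizers split text into
    learned subword pieces, so real counts will differ."""
    return max(1, round(len(text) / chars_per_token))

prompt = "Explain why inference speed matters for AI chatbots."
print(estimate_tokens(prompt))  # ballpark estimate, not an exact count
```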

Pushing for faster and faster output

So what does “ultra-fast” inference mean? If you’ve tried chatbots like OpenAI’s ChatGPT, Anthropic’s Claude, or Google’s Gemini, you probably think the output of your prompts arrives at a perfectly reasonable pace. In fact, you may be impressed by how quickly it spits out answers to your queries. But in February 2024, demos of a Groq chatbot based on a Mistral model produced answers far faster than people could read. It went viral. The setup served up 500 tokens per second to produce answers that were nearly instantaneous. By April, Groq delivered an even speedier 800 tokens per second, and by May SambaNova boasted it had broken the 1,000 tokens per second barrier. 

Today, Cerebras, SambaNova, and Groq are all delivering over 1,000 tokens per second, and the “token wars” have revved up considerably. At the end of August, Cerebras claimed it had launched the “world’s fastest AI inference” at 1,800 tokens per second, and last week Cerebras said it had beaten that record and become the “first hardware of any kind” to exceed 2,000 tokens per second on one of Meta’s Llama models. 
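To put these throughput figures in perspective, the time to stream a complete answer is simply its length in tokens divided by the decode rate. A minimal sketch (ignoring time-to-first-token and network latency, which also matter in practice):

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady decode rate
    (ignores time-to-first-token and network overhead)."""
    return num_tokens / tokens_per_second

# A ~300-token answer at the rates mentioned above:
for rate in (50, 500, 1000, 2000):
    print(f"{rate:>5} tok/s -> {generation_time_s(300, rate):.2f} s")
```

At 2,000 tokens per second a 300-token answer streams in 0.15 seconds, versus six full seconds at 50 tokens per second, which is why the difference is so visible in demos.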

When will fast be fast enough?

This led me to ask: Why would anyone need generative AI output to be that fast? When will fast be fast enough?

According to Cerebras CEO Andrew Feldman, generative AI speed is essential since search results will increasingly be powered by generative AI, as well as new capabilities like streaming video. Those are two areas where latency, or the delay between an action and a response, is particularly annoying. 

“Nobody’s going to build a business on an application that makes you sit around and wait,” he told Fortune. 

In addition, AI models are quickly being used to power far more complex applications than just chat. One rapidly growing area of interest is developing application workflows based on AI agents, in which a user asks a question or prompts an action that doesn’t simply involve one query to one model. Instead it leads to multiple queries to multiple models that can go off and do things like search the web or a database. 

“Then the performance really matters,” said Feldman, explaining that a reasonably slow output today could quickly become painfully slow. 
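Feldman’s point about agent workflows compounds arithmetically: when one user request fans out into a chain of sequential model calls, per-call latency multiplies. A hypothetical sketch (the call counts and token sizes below are illustrative, and intermediate steps like web or database lookups are ignored for simplicity):

```python
def chain_latency_s(calls: int, tokens_per_call: int,
                    tokens_per_second: float) -> float:
    """Total decode time for `calls` sequential model invocations,
    each emitting tokens_per_call tokens. Tool-use steps such as
    web or database lookups are ignored for simplicity."""
    return calls * tokens_per_call / tokens_per_second

# Ten chained 200-token calls: tolerable at 2,000 tok/s, painful at 50.
print(chain_latency_s(10, 200, 2000))  # 1.0 second total
print(chain_latency_s(10, 200, 50))    # 40.0 seconds total
```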

Unlocking AI potential with speed

The bottom line is that speed matters because faster inference unlocks greater potential in applications built with AI, Mark Heaps, chief technology evangelist at Groq, told Fortune. That is especially true for data-heavy applications in fields like financial trading, traffic monitoring, and cybersecurity: “You need insights in real time, a form of instant intelligence that keeps up with the moment,” he said. “The race to increase speed…will provide better quality, accuracy, and potential for greater ROI.” 

It’s worth noting, he pointed out, that AI models still have nowhere near as many neural connections as the human brain. “As the models get more advanced, bigger, or layered with lots of agents using smaller models, it will require more speed to keep the application useful,” he explained, adding that this has been an issue throughout history. “Why do we need cars to get beyond 50 mph? Was it so we could go fast? Or producing an engine that could do 100 mph enabled the ability to carry more weight at 50 mph?” 

Rodrigo Liang, CEO and cofounder of SambaNova, agreed. Inference speed, he told Fortune, “is where the rubber hits the road—where all the training, the building of models, gets put to work to deliver real business value.” That’s particularly true now that the AI industry is shifting its focus from training AI models to putting them into production. “The world is looking for the most efficient way to produce tokens so you can support an ever-growing number of users,” he said. “Speed allows you to service many customers concurrently.” 

Sharon Goldman
sharon.goldman@fortune.com

AI IN THE NEWS

Governor Newsom vetoes California’s SB-1047. On Sunday, news spread quickly through Silicon Valley that Governor Newsom had vetoed SB-1047, a widely debated and ambitious AI regulatory proposal. The bill, if enacted, would have required developers to conduct safety testing on large AI models before public release, the New York Times reported. Critics, however, raised concerns over provisions granting the state’s attorney general the authority to sue companies for harm caused by their technologies. The bill also mandated a “kill switch” to shut down AI systems in the event of potential threats like biowarfare, mass casualties, or significant property damage. “I do not believe this is the best approach to protecting the public from real threats posed by the technology,” Newsom said in a statement. “Instead, the bill applies stringent standards to even the most basic functions—so long as a large system deploys it.”

Sources say ByteDance plans new AI model trained with Huawei chips. Reuters reported that TikTok's Chinese parent ByteDance plans to develop an AI model trained primarily with chips from China’s Huawei Technologies. It's a response to U.S. moves since 2022 to restrict exports of advanced AI chips, particularly from market leader Nvidia. The article claimed that sources said ByteDance's next step in the AI race is to use Huawei's Ascend 910B chip to train a large language model, but ByteDance denied a new model is being developed.

Microsoft announces AI models will improve Windows search on Copilot Plus PCs. Microsoft said today its new Copilot Plus PCs will use AI models to improve Windows search, available starting in November, including a new Click to Do feature that is similar to Google’s Circle to Search function. “AI-powered search makes it dramatically easier to find virtually anything,” said Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, as reported by the Verge. “You no longer need to remember file names and document locations, nor even specific names of words. Windows will better understand your intent and match the right document, image, file, or email.”

U.S. Commerce Department sets new rule that eases restrictions on AI chip shipments to Middle East. According to Reuters, yesterday the U.S. Commerce Department unveiled a rule that could ease shipments of AI chips like those from Nvidia to Middle East data centers. Since October 2023, U.S. exporters have been required to obtain licenses before shipping advanced chips to parts of the Middle East and Central Asia. But now, data centers will be able to apply for status that will allow them to receive chips, rather than requiring their suppliers to obtain individual licenses to ship to them.

FORTUNE ON AI

Before Mira Murati’s surprise exit from OpenAI, staff grumbled its o1 model had been released prematurely—by Jeremy Kahn, Kali Hays and Sharon Goldman

Why investors want startup founders to own equity—including OpenAI’s Sam Altman—by Sharon Goldman, Kali Hays and Verne Kopytoff

Nvidia shares fall and its Chinese rivals soar after Beijing urges AI companies to look elsewhere for chips—by David Meyer

Mark Cuban warns the U.S. must win the AI race ‘or we lose everything’—by Jason Ma

AI CALENDAR

Oct. 22-23: TedAI, San Francisco

Oct. 28-30: Voice & AI, Arlington, Va.

Nov. 19-22: Microsoft Ignite, Chicago

Dec. 2-6: AWS re:Invent, Las Vegas

Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia

Dec. 9-10: Fortune Brainstorm AI San Francisco (register here)

EYE ON AI RESEARCH

Could generative AI chatbots help reduce belief in conspiracy theories? New research published in Science by Thomas Costello of American University and Gordon Pennycook of Cornell found that discussions with AI chatbots could reduce individuals’ beliefs in conspiracy theories. Using OpenAI’s GPT-4 Turbo, human participants described a conspiracy theory they subscribed to, and the AI then engaged in a back-and-forth, refuting their beliefs with evidence-based, persuasive arguments. According to the research, “the AI chatbot’s ability to sustain tailored counterarguments and personalized in-depth conversations reduced their beliefs in conspiracies for months, challenging research suggesting that such beliefs are impervious to change.”

BRAIN FOOD

Want a glimpse of your future self using generative AI? If you’ve ever wanted to receive a visit from your future self like in Back to the Future, you may be interested in new research from MIT that created a chatbot for users to have a conversation with an “AI-generated simulation of their potential future self.” The tool, called “Future You,” uses a large language model and information provided by the user to help young people “improve their sense of future self-continuity, a psychological concept that describes how connected a person feels with their future self.” What if Future You offers negative predictions, causing young people to freak out? The researchers explained that the tool cautions users that its results are only one potential version of their future self, and they can still change their lives. “This is not a prophecy, but rather a possibility,” the lead researcher said. 

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.
About the Author
Sharon Goldman, AI Reporter

Sharon Goldman is an AI reporter at Fortune and co-authors Eye on AI, Fortune’s flagship AI newsletter. She has written about digital and enterprise tech for over a decade.
