
Cerebras hopes planned IPO will supercharge its race against Nvidia and fellow chip startups for the fastest generative AI

By Sharon Goldman
AI Reporter
October 1, 2024, 3:13 PM ET
Andrew Feldman, CEO of Cerebras Systems. Ramsey Cardy/Sportsfile for Collision via Getty Images

Hello and welcome to Eye on AI! In this edition…Governor Newsom vetoes SB 1047; ByteDance plans new AI model based on Huawei chips; Microsoft announces AI models will improve Windows search; and the U.S. Commerce Department sets a new rule that eases restrictions on AI chip shipments to the Middle East.

Cerebras has a need for speed. In a bid to take on Nvidia, the AI chip startup is moving rapidly toward an IPO after announcing yesterday that it had filed for one. At the same time, the company is in a fierce race with fellow AI chip startups Groq and SambaNova for the title of ‘fastest generative AI.’ All three are pushing the boundaries of their highly specialized hardware and software so that AI models can generate responses at speeds that outpace even Nvidia GPUs.

Here’s what that means: When you ask an AI assistant a question, it must sift through all of the knowledge in its AI model to quickly come up with an answer. In industry parlance, that process is known as “inference.” But large language models don’t sift through words during the inference process. When you ask a question or give a chatbot a prompt, the AI breaks that into smaller pieces called “tokens”—which could represent a word, or a chunk of a word—to process its answer and respond. 

Pushing for faster and faster output

So what does “ultra-fast” inference mean? If you’ve tried chatbots like OpenAI’s ChatGPT, Anthropic’s Claude, or Google’s Gemini, you probably think the output of your prompts arrives at a perfectly reasonable pace. In fact, you may be impressed by how quickly it spits out answers to your queries. But in February 2024, demos of a Groq chatbot based on a Mistral model produced answers far faster than people could read. It went viral. The setup served up 500 tokens per second to produce answers that were nearly instantaneous. By April, Groq delivered an even speedier 800 tokens per second, and by May SambaNova boasted it had broken the 1,000 tokens per second barrier. 

Today, Cerebras, SambaNova, and Groq are all delivering over 1,000 tokens per second, and the “token wars” have revved up considerably. At the end of August, Cerebras claimed it had launched the “world’s fastest AI inference” at 1,800 tokens per second, and last week Cerebras said it had beaten that record and become the “first hardware of any kind” to exceed 2,000 tokens per second on one of Meta’s Llama models. 
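To put those figures in perspective, here is a rough back-of-the-envelope sketch of how long a single answer would take to generate at each quoted rate. The 500-token answer length is a hypothetical chosen for illustration, the rates are the round numbers cited above (some of which the companies described as floors rather than exact measurements), and the calculation ignores network delay and the time before the first token appears.

```python
# Tokens-per-second figures quoted in this article (round numbers only).
QUOTED_RATES = {
    "Groq, February": 500,
    "Groq, April": 800,
    "SambaNova, May": 1000,
    "Cerebras, end of August": 1800,
    "Cerebras, last week": 2000,
}

# Hypothetical answer length; this number does not come from the article.
RESPONSE_TOKENS = 500


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    for label, rate in QUOTED_RATES.items():
        seconds = generation_seconds(RESPONSE_TOKENS, rate)
        print(f"{label}: {seconds:.2f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, the same answer that takes one second at 500 tokens per second takes a quarter of a second at 2,000.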

When will fast be fast enough?

This led me to ask: Why would anyone need generative AI output to be that fast? When will fast be fast enough?

According to Cerebras CEO Andrew Feldman, speed is essential because generative AI will increasingly power search results as well as new capabilities like streaming video. Those are two areas where latency, or the delay between an action and a response, is particularly annoying.

“Nobody’s going to build a business on an application that makes you sit around and wait,” he told Fortune. 

In addition, AI models are quickly being used to power far more complex applications than just chat. One rapidly growing area of interest is developing application workflows based on AI agents, in which a user asks a question or prompts an action that doesn’t simply involve one query to one model. Instead it leads to multiple queries to multiple models that can go off and do things like search the web or a database. 

“Then the performance really matters,” said Feldman, explaining that a reasonably slow output today could quickly become painfully slow. 
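A simple way to see why is to add up the generation time when one request triggers a chain of model calls. The sketch below is illustrative only: the ten-step workflow, the 300 tokens per step, and the 50-tokens-per-second "slow" rate are all hypothetical values, not figures from Feldman or Cerebras, and the function assumes the calls run one after another.

```python
from typing import Sequence


def workflow_seconds(
    step_tokens: Sequence[int],
    tokens_per_second: float,
    overhead_per_step: float = 0.0,
) -> float:
    """Total time for model calls that run strictly one after another.

    step_tokens lists how many tokens each call generates;
    overhead_per_step is any fixed per-call delay in seconds.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    generation = sum(step_tokens) / tokens_per_second
    return generation + overhead_per_step * len(step_tokens)


if __name__ == "__main__":
    # Hypothetical agent workflow: 10 sequential calls of 300 tokens each.
    steps = [300] * 10
    for rate in (50, 1000):  # 50 is an assumed "slow" rate for contrast
        print(f"{rate} tokens/s: {workflow_seconds(steps, rate):.1f} s")
```

With these made-up numbers, a six-second wait for a single answer grows to a full minute across the chain at the slow rate, while the same chain finishes in about three seconds at 1,000 tokens per second.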

Unlocking AI potential with speed

The bottom line is that speed matters because faster inference unlocks greater potential in applications built with AI, Mark Heaps, chief technology evangelist at Groq, told Fortune. That is especially true for data-heavy applications in fields like financial trading, traffic monitoring, and cybersecurity: “You need insights in real time, a form of instant intelligence that keeps up with the moment,” he said. “The race to increase speed…will provide better quality, accuracy, and potential for greater ROI.” 

It’s worth noting, he pointed out, that AI models still have nowhere near as many neural connections as the human brain. “As the models get more advanced, bigger, or layered with lots of agents using smaller models, it will require more speed to keep the application useful,” he explained, adding that this has been an issue throughout history. “Why do we need cars to get beyond 50 mph? Was it so we could go fast? Or producing an engine that could do 100 mph enabled the ability to carry more weight at 50 mph?” 

Rodrigo Liang, CEO and cofounder of SambaNova, agreed. Inference speed, he told Fortune, “is where the rubber hits the road, where all the training, the building of models, gets put to work to deliver real business value.” That’s particularly true now that the AI industry is shifting more of its focus from training AI models to putting them into production. “The world is looking for the most efficient way to produce tokens so you can support an ever-growing number of users,” he said. “Speed allows you to service many customers concurrently.”

Sharon Goldman
sharon.goldman@fortune.com

AI IN THE NEWS

Governor Newsom vetoes California’s SB-1047. On Sunday, news spread quickly through Silicon Valley that Governor Newsom had vetoed SB-1047, a widely debated and ambitious AI regulatory proposal. The bill, if enacted, would have required developers to conduct safety testing on large AI models before public release, the New York Times reported. Critics, however, raised concerns over provisions granting the state’s attorney general the authority to sue companies for harm caused by their technologies. The bill also mandated a “kill switch” to shut down AI systems in the event of potential threats like biowarfare, mass casualties, or significant property damage. “I do not believe this is the best approach to protecting the public from real threats posed by the technology,” Newsom said in a statement. “Instead, the bill applies stringent standards to even the most basic functions—so long as a large system deploys it.”

Sources say ByteDance plans new AI model trained with Huawei chips. Reuters reported that TikTok's Chinese parent ByteDance plans to develop an AI model trained primarily with chips from China’s Huawei Technologies. It's a response to U.S. moves since 2022 to restrict exports of advanced AI chips, particularly from market leader Nvidia. According to the sources Reuters cited, ByteDance's next step in the AI race is to use Huawei's Ascend 910B chip to train a large language model, but ByteDance denied that a new model is being developed.

Microsoft announces AI models will improve Windows search on Copilot Plus PCs. Microsoft said today its new Copilot Plus PCs will use AI models to improve Windows search, available starting in November, including a new Click to Do feature that is similar to Google’s Circle to Search function. “AI-powered search makes it dramatically easier to find virtually anything,” said Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, as reported by the Verge. “You no longer need to remember file names and document locations, nor even specific names of words. Windows will better understand your intent and match the right document, image, file, or email.”

U.S. Commerce Department sets new rule that eases restrictions on AI chip shipments to Middle East. According to Reuters, yesterday the U.S. Commerce Department unveiled a rule that could ease shipments of AI chips like those from Nvidia to Middle East data centers. Since October 2023, U.S. exporters have been required to obtain licenses before shipping advanced chips to parts of the Middle East and Central Asia. But now, data centers will be able to apply for status that will allow them to receive chips, rather than requiring their suppliers to obtain individual licenses to ship to them.

FORTUNE ON AI

Before Mira Murati’s surprise exit from OpenAI, staff grumbled its o1 model had been released prematurely—by Jeremy Kahn, Kali Hays and Sharon Goldman

Why investors want startup founders to own equity—including OpenAI’s Sam Altman—by Sharon Goldman, Kali Hays and Verne Kopytoff

Nvidia shares fall and its Chinese rivals soar after Beijing urges AI companies to look elsewhere for chips—by David Meyer

Mark Cuban warns the U.S. must win the AI race ‘or we lose everything’—by Jason Ma

AI CALENDAR

Oct. 22-23: TedAI, San Francisco

Oct. 28-30: Voice & AI, Arlington, Va.

Nov. 19-22: Microsoft Ignite, Chicago

Dec. 2-6: AWS re:Invent, Las Vegas

Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia

Dec. 9-10: Fortune Brainstorm AI San Francisco (register here)

EYE ON AI RESEARCH

Could generative AI chatbots help reduce belief in conspiracy theories? New research published in Science by Thomas Costello of American University and Gordon Pennycook of Cornell found that discussions with AI chatbots could reduce individuals’ beliefs in conspiracy theories. In the study, human participants described a conspiracy theory that they subscribed to, and OpenAI’s GPT-4 Turbo then engaged them in a back-and-forth conversation, offering persuasive arguments that refuted their beliefs with evidence. According to the research, “the AI chatbot’s ability to sustain tailored counterarguments and personalized in-depth conversations reduced their beliefs in conspiracies for months, challenging research suggesting that such beliefs are impervious to change.”

BRAIN FOOD

Want a glimpse of your future self using generative AI? If you’ve ever wanted to receive a visit from your future self like in Back to the Future, you may be interested in new research from MIT that created a chatbot for users to have a conversation with an “AI-generated simulation of their potential future self.” The tool, called “Future You,” uses a large language model and information provided by the user to help young people “improve their sense of future self-continuity, a psychological concept that describes how connected a person feels with their future self.” What if Future You offers negative predictions, causing young people to freak out? The researchers explained that the tool cautions users that its results are only one potential version of their future self, and they can still change their lives. “This is not a prophecy, but rather a possibility,” the lead researcher said.

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.
About the Author
Sharon Goldman is an AI reporter at Fortune and co-authors Eye on AI, Fortune’s flagship AI newsletter. She has written about digital and enterprise tech for over a decade.




© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.