Cerebras hopes planned IPO will supercharge its race against Nvidia and fellow chip startups for the fastest generative AI

By Sharon Goldman, AI Reporter

Sharon Goldman is an AI reporter at Fortune and co-authors Eye on AI, Fortune’s flagship AI newsletter. She has written about digital and enterprise tech for over a decade.

Andrew Feldman, CEO of Cerebras Systems.
Ramsey Cardy—Sportsfile for Collision via Getty Images

Hello and welcome to Eye on AI! In this edition…Governor Newsom vetoes SB 1047; ByteDance plans new AI model based on Huawei chips; Microsoft announces AI models will improve Windows search; and the U.S. Commerce Department sets a new rule that eases restrictions on AI chip shipments to the Middle East.

Cerebras has a need for speed. In a bid to take on Nvidia, the AI chip startup is rapidly moving toward an IPO after announcing its filing yesterday. At the same time, the company is locked in a fierce race with fellow AI chip startups Groq and SambaNova for the title of ‘fastest generative AI.’ All three are pushing the boundaries of their highly specialized hardware and software to serve AI model responses at speeds that outpace even Nvidia’s GPUs. 

Here’s what that means: When you ask an AI assistant a question, it must draw on all of the knowledge encoded in its AI model to quickly come up with an answer. In industry parlance, that process is known as “inference.” But large language models don’t process words during inference. When you ask a question or give a chatbot a prompt, the AI breaks that text into smaller pieces called “tokens,” each representing a word or a chunk of a word. The model then generates its answer the same way, one token at a time, which is why inference speed is measured in tokens per second. 
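To make that concrete, here is a minimal sketch of tokenization using tiktoken, OpenAI’s open-source tokenizer library (the sample sentence is illustrative, and other model families use different tokenizers):

    import tiktoken

    # Load the encoding used by GPT-4-era OpenAI models.
    enc = tiktoken.get_encoding("cl100k_base")

    token_ids = enc.encode("Why does inference speed matter?")
    print(token_ids)  # a list of integer token IDs
    print([enc.decode([t]) for t in token_ids])  # pieces like 'Why', ' does', ' inference'

Each integer stands for a word or word fragment; the model reads and writes these IDs, not raw text.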

Pushing for faster and faster output

So what does “ultra-fast” inference mean? If you’ve tried chatbots like OpenAI’s ChatGPT, Anthropic’s Claude, or Google’s Gemini, you probably think the output of your prompts arrives at a perfectly reasonable pace. In fact, you may be impressed by how quickly it spits out answers to your queries. But in February 2024, demos of a Groq chatbot based on a Mistral model produced answers far faster than people could read. It went viral. The setup served up 500 tokens per second to produce answers that were nearly instantaneous. By April, Groq delivered an even speedier 800 tokens per second, and by May SambaNova boasted it had broken the 1,000 tokens per second barrier. 

Today, Cerebras, SambaNova, and Groq are all delivering over 1,000 tokens per second, and the “token wars” have revved up considerably. At the end of August, Cerebras claimed it had launched the “world’s fastest AI inference” at 1,800 tokens per second, and last week Cerebras said it had beaten that record and become the “first hardware of any kind” to exceed 2,000 tokens per second on one of Meta’s Llama models. 
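Some back-of-the-envelope arithmetic puts those numbers in perspective (a rough sketch: the 0.75 words-per-token figure is a common approximation for English text, not a fixed constant):

    WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text
    answer_tokens = 300 / WORDS_PER_TOKEN  # a ~300-word answer is ~400 tokens

    for tokens_per_sec in (50, 500, 1000, 2000):
        seconds = answer_tokens / tokens_per_sec
        print(f"{tokens_per_sec:>5} tok/s -> {seconds:5.2f} s")

At 50 tokens per second, that answer takes 8 seconds to arrive; at 2,000 tokens per second, it lands in 0.2 seconds, faster than a reader can finish the first line.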

When will fast be fast enough?

This led me to ask: Why would anyone need generative AI output to be that fast? When will fast be fast enough?

According to Cerebras CEO Andrew Feldman, generative AI speed is essential because search results will increasingly be powered by generative AI, as will new capabilities like streaming video. Those are two areas where latency, or the delay between an action and a response, is particularly annoying. 

“Nobody’s going to build a business on an application that makes you sit around and wait,” he told Fortune.

In addition, AI models are increasingly being used to power far more complex applications than simple chat. One rapidly growing area of interest is building application workflows around AI agents, in which a user’s question or prompt doesn’t involve just one query to one model. Instead, it triggers multiple queries to multiple models that can go off and do things like search the web or a database. 

“Then the performance really matters,” said Feldman, explaining that an output speed that feels reasonable for a single query today could quickly become painfully slow once it is multiplied across an agent’s chain of calls. 
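A toy simulation makes that compounding effect visible (entirely hypothetical: call_model just sleeps to mimic generation time, and the three-step workflow and token counts are illustrative):

    import time

    def call_model(prompt: str, tokens_out: int, tokens_per_sec: float) -> str:
        # Stand-in for a real LLM API call; sleeping simulates generation time.
        time.sleep(tokens_out / tokens_per_sec)
        return f"response to: {prompt[:40]}"

    def agent_workflow(question: str, tokens_per_sec: float) -> float:
        # One user request fans out into three sequential model calls.
        start = time.time()
        plan = call_model(f"Plan steps to answer: {question}", 200, tokens_per_sec)
        query = call_model(f"Write a search query for: {plan}", 100, tokens_per_sec)
        call_model(f"Answer the question using: {query}", 400, tokens_per_sec)
        return time.time() - start

    print(f"{agent_workflow('Cheapest flight to SFO?', 50):.2f} s")    # ~14 s at 50 tok/s
    print(f"{agent_workflow('Cheapest flight to SFO?', 2000):.2f} s")  # ~0.35 s at 2,000 tok/s

Each call’s latency stacks on the last, which is why a delay that feels tolerable in a single chat exchange becomes unusable once an agent strings several calls together.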

Unlocking AI potential with speed

The bottom line is that speed matters because faster inference unlocks greater potential in applications built with AI, Mark Heaps, chief technology evangelist at Groq, told Fortune. That is especially true for data-heavy applications in fields like financial trading, traffic monitoring, and cybersecurity: “You need insights in real time, a form of instant intelligence that keeps up with the moment,” he said. “The race to increase speed…will provide better quality, accuracy, and potential for greater ROI.” 

It’s worth noting, he pointed out, that AI models still have nowhere near as many neural connections as the human brain. “As the models get more advanced, bigger, or layered with lots of agents using smaller models, it will require more speed to keep the application useful,” he explained, adding that this has been an issue throughout history. “Why do we need cars to get beyond 50 mph? Was it so we could go fast? Or was it that producing an engine that could do 100 mph enabled the ability to carry more weight at 50 mph?” 

Rodrigo Liang, CEO and cofounder of SambaNova, agreed. Inference speed, he told Fortune, “is where the rubber hits the road—where all the training, the building of models, gets put to work to deliver real business value.” That’s particularly true now that the AI industry is shifting its focus from training AI models to putting them into production. “The world is looking for the most efficient way to produce tokens so you can support an ever-growing number of users,” he said. “Speed allows you to service many customers concurrently.” 

Sharon Goldman
sharon.goldman@fortune.com

AI IN THE NEWS

Governor Newsom vetoes California’s SB-1047. On Sunday, news spread quickly through Silicon Valley that Governor Newsom had vetoed SB-1047, a widely debated and ambitious AI regulatory proposal. The bill, if enacted, would have required developers to conduct safety testing on large AI models before public release, the New York Times reported. Critics, however, raised concerns over provisions granting the state’s attorney general the authority to sue companies for harm caused by their technologies. The bill also mandated a “kill switch” to shut down AI systems in the event of potential threats like biowarfare, mass casualties, or significant property damage. “I do not believe this is the best approach to protecting the public from real threats posed by the technology,” Newsom said in a statement. “Instead, the bill applies stringent standards to even the most basic functions—so long as a large system deploys it.”

Sources say ByteDance plans new AI model trained with Huawei chips. Reuters reported that TikTok's Chinese parent ByteDance plans to develop an AI model trained primarily with chips from China’s Huawei Technologies. It's a response to U.S. moves since 2022 to restrict exports of advanced AI chips, particularly from market leader Nvidia. According to the article's sources, ByteDance's next step in the AI race is to use Huawei's Ascend 910B chip to train a large language model, but ByteDance denied that a new model is in development.

Microsoft announces AI models will improve Windows search on Copilot Plus PCs. Microsoft said today that its new Copilot Plus PCs will use AI models to improve Windows search starting in November, including a new Click to Do feature similar to Google’s Circle to Search function. “AI-powered search makes it dramatically easier to find virtually anything,” said Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, as reported by the Verge. “You no longer need to remember file names and document locations, nor even specific names of words. Windows will better understand your intent and match the right document, image, file, or email.”

U.S. Commerce Department sets new rule that eases restrictions on AI chip shipments to Middle East. According to Reuters, yesterday the U.S. Commerce Department unveiled a rule that could ease shipments of AI chips like those from Nvidia to Middle East data centers. Since October 2023, U.S. exporters have been required to obtain licenses before shipping advanced chips to parts of the Middle East and Central Asia. But now, data centers will be able to apply for status that will allow them to receive chips, rather than requiring their suppliers to obtain individual licenses to ship to them.

FORTUNE ON AI

Before Mira Murati’s surprise exit from OpenAI, staff grumbled its o1 model had been released prematurely—by Jeremy Kahn, Kali Hays and Sharon Goldman

Why investors want startup founders to own equity—including OpenAI’s Sam Altman—by Sharon Goldman, Kali Hays and Verne Kopytoff

Nvidia shares fall and its Chinese rivals soar after Beijing urges AI companies to look elsewhere for chips—by David Meyer

Mark Cuban warns the U.S. must win the AI race ‘or we lose everything’—by Jason Ma

AI CALENDAR

Oct. 22-23: TedAI, San Francisco

Oct. 28-30: Voice & AI, Arlington, Va.

Nov. 19-22: Microsoft Ignite, Chicago

Dec. 2-6: AWS re:Invent, Las Vegas

Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia

Dec. 9-10: Fortune Brainstorm AI San Francisco (register here)

EYE ON AI RESEARCH

Could generative AI chatbots help reduce belief in conspiracy theories? New research published in Science by Thomas Costello of American University and Gordon Pennycook of Cornell found that discussions with AI chatbots could reduce individuals’ beliefs in conspiracy theories. Using OpenAI’s GPT-4 Turbo, human participants described a conspiracy theory they subscribed to, and the AI then engaged them in a back-and-forth, offering persuasive arguments that refuted their beliefs with evidence. According to the research, “the AI chatbot’s ability to sustain tailored counterarguments and personalized in-depth conversations reduced their beliefs in conspiracies for months, challenging research suggesting that such beliefs are impervious to change.”

BRAIN FOOD

Want a glimpse of your future self using generative AI? If you’ve ever wanted to receive a visit from your future self like in Back to the Future, you may be interested in new research from MIT that created a chatbot for users to have a conversation with an “AI-generated simulation of their potential future self.” The tool, called “Future You,” uses a large language model and information provided by the user to help young people “improve their sense of future self-continuity, a psychological concept that describes how connected a person feels with their future self.” What if Future You offers negative predictions, causing young people to freak out? The researchers explained that the tool cautions users that its results are only one potential version of their future self, and that they can still change their lives. “This is not a prophesy, but rather a possibility,” the lead researcher said. 

This is the online version of Eye on AI, Fortune's weekly newsletter on how AI is shaping the future of business. Sign up for free.