The $19.6 billion pivot: How OpenAI’s 2-year struggle to launch GPT-5 revealed that its core AI strategy has stopped working

By Jeremy Kahn, Editor, AI

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

OpenAI CEO and cofounder Sam Altman
Sebastian Gollnow/picture alliance via Getty Images

From the moment OpenAI released GPT-4 in March 2023, the company has faced one persistent question: When would it release GPT-5? 

OpenAI ignited the generative AI craze with its large language models, and anticipation for its next big launch can rival the buzz around a new sneaker drop or a pop star’s latest single. 

But the GPT-5 rollout has not gone according to script. Two years after the launch of GPT-4, the world is still waiting for its successor – and wondering what’s been holding it up. 

Earlier this month, OpenAI CEO and cofounder Sam Altman provided some guidance, announcing in a tweet that GPT-4.5 and GPT-5 would finally be released in a matter of “weeks/months.” The 39-year-old CEO was mum on the specific causes of GPT-5’s lengthy development, even as he promised that OpenAI would do a “better job of sharing our intended roadmap.” (Rumors on social media have put the release of GPT-4.5 as soon as this week.)


As it turns out, the big news in Altman’s “roadmap” was not just the timeline for GPT-5’s launch, but the revelation of a deeper change within the company. Embedded in jargon about “non-chain-of-thought” models and project names like “Orion” was the admission that OpenAI has quietly made an about-face on its core strategy. After years of preaching a bigger-is-better approach that calls for pre-training models with ever more data, Altman effectively acknowledged the scaling technique was no longer producing a big enough performance boost. The formula that had yielded ChatGPT and propelled OpenAI to a $157 billion valuation, in other words, had run out of juice, and the company was changing its playbook to get GPT-5 out the door.

While the full picture of OpenAI's new approach is not yet in clear focus, the implications of the pivot are almost certain to extend beyond the confines of the buzzy San Francisco startup and across the web of businesses and valuations engaged in the race to build ever more powerful AI models, from specialized chips to data centers and energy providers. In January, for instance, Altman appeared alongside President Trump, Oracle CEO Larry Ellison, and SoftBank CEO Masayoshi Son to announce the massive, $500 billion Stargate data center project, a project that may be less crucial if it turns out AI can be built with fewer resources. Microsoft has pumped $13 billion into OpenAI (in addition to the at least $6.6 billion OpenAI has raised from other investors), in part on the premise that a close partnership with the AI startup would put Microsoft at the forefront of AI technology for the foreseeable future, and that the path toward ever more capable AI models would follow a predictable, step-wise progression.

Now that path seems far less certain. Fortune spoke to AI industry insiders and experts to better understand what led to OpenAI’s change of plans, how its approach to building AI models is shifting, and what it means for the company and the industry.

The $500 billion Stargate project, announced by President Trump in January, will build giant data centers to run OpenAI’s models.
Jabin Botsford/The Washington Post via Getty Images

The Orion question

Two days after he turned 39 last April, Sam Altman sat down for a talk at Stanford University, the school he had dropped out of many years earlier to embark on his career as a startup founder.

The talk was with students in the entrepreneurship program, and naturally, the question of GPT-5 came up. Altman responded with an observation about the unremitting power of OpenAI’s approach to building GPT models. “I think that’s like among the most remarkable facts in human history – that we can just do something, and we can say right now with a high degree of scientific certainty that GPT-5 is going to be a lot smarter than GPT-4, GPT-6 is going to be a lot smarter than GPT-5. And we are not near the top of this curve and we kind of know what to do.”

Whether Altman knew it at the time or not, however, the scaling approach OpenAI had used since 2018 to produce its GPT models — building successively larger models and feeding them successively larger amounts of data during an initial "pre-training" stage (the P in GPT stands for "pre-trained") — had reached the top of the curve. By the time Altman shared the roadmap in his tweet this month, this had become a reality that OpenAI could not ignore or deny.

“We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model,” Altman wrote in the tweet. The GPT-5 model, which Altman said would come after Orion, will feature a mixture of the company’s traditional pre-training as well as the newer “chain of thought” method.

Chain of thought is a technique in which an AI model, after having been pre-trained, is taught through a "post-training" process to answer a question in a deliberate, step-by-step manner, taking time to consider whether an alternate set of steps might produce a better result. This is the method behind the current wave of "reasoning" AI models. These models (which still involve some degree of pre-training) respond more slowly than purely pre-trained GPT models, but they produce better answers for certain logic-oriented tasks like coding and math.
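To make the distinction concrete, here is a minimal sketch of chain-of-thought prompting, the inference-time face of the technique described above. The call_model function is a hypothetical stand-in for any LLM completion API, not OpenAI's actual post-training pipeline.

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call; returns canned text for illustration."""
    return "(model output)"

def answer_directly(question: str) -> str:
    # GPT-style fast answer: no intermediate steps requested.
    return call_model(f"Answer concisely: {question}")

def answer_with_chain_of_thought(question: str) -> str:
    # Reasoning-style answer: ask for explicit intermediate steps and a
    # self-check before committing to a final answer.
    return call_model(
        f"Question: {question}\n"
        "Work through this step by step, consider whether an alternative "
        "approach gives a different result, then state the final answer."
    )

print(answer_directly("What is 17 * 24?"))
print(answer_with_chain_of_thought("What is 17 * 24?"))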

OpenAI itself has been at the forefront of the reasoning wave, with its o1 and o3 models (China’s DeepSeek is another leading provider of reasoning models). But OpenAI’s “o-series” models are distinct from the company’s main bloodline of GPT models – and making those GPT models bigger was no longer making them better across the board, a problem that came to a head with Orion.

Orion, now destined to be the last of the pre-trained GPT species, was in fact initially supposed to be the long-awaited GPT-5, according to two former OpenAI employees who were granted anonymity because they were not authorized to discuss internal company matters, as well as past media reports citing anonymous OpenAI insiders. OpenAI struggled, however, to get Orion to deliver significant performance gains over GPT-4, according to a November report in The Information and a December report in The Wall Street Journal.

These media reports indicated that Orion was better at some language tasks than GPT-4, but that its abilities at coding and mathematics were not significantly improved over the previous model's. According to the Wall Street Journal report, OpenAI conducted at least two large training runs for Orion, each lasting several months, but the results fell short of what company researchers were hoping for.

Gary Marcus, an emeritus professor of cognitive science at New York University and an expert on artificial intelligence who has emerged as a leading skeptic of current approaches to AI, was never a believer in the so-called scaling laws. He says OpenAI has discovered it cannot overcome fundamental limitations of GPT-style models. Among other problems, these models have seemingly little ability to tell fact from fiction and are prone to a phenomenon known as "hallucination," in which they confidently output invented or otherwise erroneous information.

Marcus says that OpenAI built up expectations that GPT-5 would solve these challenges and be markedly better than GPT-4. “My impression is that they have tried and tried hard, but not come up with anything that they find satisfactory,” Marcus says.

In response to questions about whether OpenAI was acknowledging that it was unable to produce a GPT model with a large enough performance increase to warrant the name GPT-5, an OpenAI spokesperson said, “we remain focused on building and improving both our GPT and o-series models and we’ll share more when we’re ready.” The statement also pointed to advancements OpenAI has announced since December—such as its o1 and o3-mini models, as well as its computer-using Operator agent AI system, and Deep Research, which produces in-depth reports. All of these advancements, however, are either chain of thought reasoning models or applications built on such models.

In a blog post this month, Altman wrote that “scaling laws” that deliver “continuous and predictable gains” in AI models are “accurate over many orders of magnitude.” Yet Altman’s description of scaling notably referred to the resources to “train and run” an AI model — a phrasing that appears to expand the concept of scaling to include the new chain of thought approach.
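To make "continuous and predictable" concrete: the original Kaplan et al. (2020) scaling-laws paper fit test loss as a power law in parameter count. A sketch using their published constants, which are illustrative here and specific to that paper's setup:

def kaplan_loss(n_params: float) -> float:
    # L(N) = (N_c / N) ** alpha_N, with Kaplan et al.'s published estimates.
    N_C, ALPHA_N = 8.8e13, 0.076
    return (N_C / n_params) ** ALPHA_N

for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss {kaplan_loss(n):.2f}")

Each tenfold increase in model size cuts the predicted loss by the same roughly 16% factor: predictable, as Altman says, but with shrinking absolute returns at each step.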

Training advanced LLMs involves data centers packed with tens of thousands of GPU chips.
Dhiraj Singh/Bloomberg via Getty Images

Warning signs

Warning signs that the pre-training scaling approach OpenAI used for Orion was nearing its limits have been growing for months – including from some prominent OpenAI alumni. 

Ilya Sutskever, OpenAI’s former chief scientist who is now running his own startup, Safe SuperIntelligence (SSI), told an audience at a leading AI conference in December that “pre-training as we know it will unquestionably end.” The reason, he said, is that the world was running out of human-created data to feed larger and larger GPT models. “The data is not growing because we have but one internet,” he said.

OpenAI has never disclosed exactly how large GPT-4 is or exactly what, or how much, data it was trained on. But outside experts have estimated that the model might have as many as 1.8 trillion parameters—essentially tunable nodes—in its neural network. These experts also assume the model was trained on pretty much the entirety of the publicly accessible internet, plus perhaps a number of large private datasets as well.
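To give a rough sense of what a parameter count that size implies structurally, here is a back-of-envelope sketch using the standard approximation for decoder-only Transformers (parameters ≈ 12 × layers × width²). The configuration below is purely hypothetical, chosen only to land near the 1.8 trillion estimate; it is not a disclosed GPT-4 architecture.

def transformer_params(n_layers: int, d_model: int) -> int:
    # Standard approximation for a decoder-only Transformer: each layer holds
    # roughly 12 * d_model**2 weights across its attention and MLP blocks.
    return 12 * n_layers * d_model ** 2

# Hypothetical configuration, not GPT-4's actual shape:
print(f"{transformer_params(n_layers=120, d_model=35_000):,}")  # ~1.76 trillion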

It’s not hard to see the problem for a company like OpenAI as it attempts to build the next big foundation model — and each one that follows. We may feel like we’re downing in oceans of data, but in the age of AI, data is a scarce commodity. So-called synthetic data (data created by AI that’s designed to mimic human-created data) is one way to get around the limits of human-created data, but with potential drawbacks. Researchers have shown that if you use too much synthetic data, it can lead to a phenomenon known as “model collapse,” in which the AI systems’ performance dramatically plummets. 

And shuttling the data between racks of tens of thousands of GPUs in hulking data centers during training runs is another inherent challenge, with some engineers suggesting we are approaching the physical limits of what current networking and switching technologies will allow. 

Of course, the costs mushroom as these data centers get bigger. Altman has said that training runs for models of GPT-5 size or beyond could cost as much as $1 billion. And that is just the cost of renting the cloud computing time to train the model, a figure largely determined by the energy cost of running so many GPUs round-the-clock for weeks and months; it does not include the capital costs of constructing the data centers. OpenAI’s first Stargate supercomputing cluster is expected to contain as many as 2 million GPUs, cost $100 billion to construct, and draw a gigawatt of power, roughly as much as a large American city.
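A back-of-envelope calculation shows how training bills reach that scale. Every input below is an illustrative assumption, not a disclosed OpenAI figure.

gpus = 100_000           # accelerators in a large training cluster (assumed)
days = 120               # a months-long training run (assumed)
usd_per_gpu_hour = 3.00  # rough cloud rental rate for an H100-class GPU (assumed)

gpu_hours = gpus * 24 * days
print(f"GPU-hours: {gpu_hours:,}, rental cost: ${gpu_hours * usd_per_gpu_hour / 1e9:.2f}B")
# 288,000,000 GPU-hours and roughly $0.86B, in the neighborhood of the
# $1 billion figure Altman cites, before any data center construction costs.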

One of the largest known data centers today for AI training belongs to Elon Musk’s xAI, with 100,000 GPUs housed in a single location in Tennessee. And yet even with this GPU megacluster, xAI has not yet managed to release a model more capable than GPT-4.

Thomas Wolf, cofounder and chief science officer at Hugging Face, the open source AI company, says that while it is possible OpenAI will use Stargate to keep building colossal base LLMs, most AI companies now realize that “there is lower hanging fruit” with the reasoning models. “There are a lot of things we can grab on these reasoning models without having to train GPT-5,” he said.

Wolf predicted that the focus of the entire industry for the next year would likely be on reasoning models and chain-of-thought. DeepSeek’s work with R1 has also shown that powerful reasoning models don’t have to be particularly large and can be run in smaller, less expensive data centers. And a report on Friday from financial firm TD Cowen, claiming that OpenAI partner Microsoft has cancelled some U.S. leases for data center capacity, stoked speculation that the industry is rethinking its approach to building AI models.

A Microsoft spokesperson said that the company is “well positioned to meet our current and increasing customer demand,” noting that last year the company added more data center capacity than in any prior year in the company’s history. “While we may strategically pace or adjust our infrastructure in some areas, we will continue to grow strongly in all regions,” the spokesperson said. They also reiterated Microsoft’s plans to invest $80 billion in data center infrastructure this fiscal year.

A model that mimics the brain

Anthropic, the OpenAI rival whose cofounders Dario Amodei and Jared Kaplan were among those who wrote the first paper on GPT scaling laws, said in a statement that it is not quite ready to give up on GPTs yet. 

“Synthetic data remains highly promising,” the company said in a statement in response to questions Fortune posed about the limits of GPT scaling. “Even a look at recent public progress in the AI industry should show us that claims of ‘model collapse’ are very overstated.”

The company added, “that’s not to say that this is a straightforward problem: data quality and quantity present real challenges that need to be addressed. But Anthropic views this as a solvable problem rather than a fundamental limitation.”

Anthropic CEO Dario Amodei, a former OpenAI exec who co-wrote the original paper on GPT scaling laws, believes the scaling approach still has room to grow.
Kimberly White/Getty Images for TechCrunch

Some people compare the fast responses that GPT models provide to what psychologist Daniel Kahneman called System 1 thinking in humans. System 1 is fast, intuitive thinking, in which a person simply produces an answer or an action, often without even being aware they are thinking at all. When a person really needs to think through a problem before answering, they use a different cognitive pathway that Kahneman called System 2. System 2 is much slower, deliberative thinking, in which a person is aware of their own brain searching for the right way to answer. This is much more akin to the chain-of-thought, test-time compute approach that so-called reasoning models use.

The trick for what Altman says will be OpenAI’s GPT-5 model—which will combine the ability to respond quickly across a vast range of tasks, like GPT-4.5, with the long, chain-of-thought reasoning of o3—will be to figure out how to marry these two approaches, System 1 and System 2, in the same model, with the model itself being able to determine correctly which system to use.
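In code, the routing problem looks roughly like the sketch below. Everything here is hypothetical scaffolding rather than OpenAI's actual design; the hard research problem is that needs_deliberation, hand-coded here as a keyword check, must itself be learned by the model.

def fast_path(prompt: str) -> str:
    # System 1 analogue: a single pre-trained forward pass, instant but shallow.
    return f"[instant answer to {prompt!r}]"

def reasoning_path(prompt: str, budget_steps: int = 32) -> str:
    # System 2 analogue: chain-of-thought with extra test-time compute.
    return f"[deliberated answer to {prompt!r} after {budget_steps} steps]"

def needs_deliberation(prompt: str) -> bool:
    # Crude keyword heuristic for illustration only; in a real hybrid model
    # this judgment is part of what training has to produce.
    return any(w in prompt.lower() for w in ("prove", "solve", "debug", "derive"))

def respond(prompt: str) -> str:
    return reasoning_path(prompt) if needs_deliberation(prompt) else fast_path(prompt)

print(respond("Write a haiku about spring."))
print(respond("Solve for x: 3x + 7 = 19."))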

Earlier this week, Anthropic released its Claude 3.7 Sonnet model, the first AI model to take this hybrid approach: the model itself determines whether it can give an instant answer or needs to use chain of thought and spend longer before providing an output. Anthropic also allows users to set a budget for how much compute time they want Claude 3.7 to use in providing an answer. More compute time for reasoning questions tends to produce better answers, but also runs up the meter on costs.
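In practice, that budget is a parameter on the API call. A minimal sketch, based on the extended-thinking interface Anthropic documented at the Claude 3.7 launch (treat the exact parameter names as subject to change):

import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4096,
    # Cap how many tokens the model may spend "thinking" before answering;
    # larger budgets tend to buy better answers on hard problems, at a price.
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Is 1,000,003 prime? Explain briefly."}],
)
for block in response.content:
    print(block.type)  # "thinking" blocks first, then the final "text" answer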

GPT-5’s ability to match Claude 3.7 in seamlessly combining System 1 and System 2 “thinking” will be crucial.

That’s because reasoning models offer impressively better performance on some tasks, particularly tough math problems and coding tasks, than GPT-4. They are better at logic puzzles. But they are not better across the board than GPT-4. They do not write significantly better. They also are not significantly better at summarization or answering questions.

There are many AI enthusiasts who hope that GPT-5 will bring us a major step closer to artificial general intelligence (known in the field by the acronym AGI). AGI has been the Holy Grail of AI since the field’s establishment in the mid-20th century, and it is the explicit founding goal of OpenAI. The company defines the technology as a single AI system that can do most of the economically valuable cognitive work that people currently perform.

And Altman has increasingly hinted that AGI is near. On his personal blog in early January, the OpenAI CEO wrote, “we are now confident we know how to build AGI as we have traditionally understood it.” In a separate blog post on February 9, Altman wrote, “Systems that start to point to AGI are coming into view.” Around the same time, in an interview during a trip to Japan, Altman suggested that GPT-5 or GPT-6 would reach a threshold of raw intelligence so high that users would stop asking for further capability improvements and instead focus on improvements to the models’ user interface and integrations with other applications. Although Altman was careful not to say so explicitly, his remarks certainly gave the impression that GPT-5 or perhaps GPT-6 would qualify as AGI.

But many outside experts doubt that GPT-5 will be able to overcome some fundamental challenges that afflict all LLMs. Even the AI reasoning models cannot reliably tell fact from fiction. They still hallucinate, and even on mathematics questions and non-verbal reasoning tasks they make errors that an intelligent person wouldn’t be expected to make.

They also can’t reliably play chess—something conventional, deterministic software programs have been able to do for five decades. “This is an easy task, the bar is ‘don’t make illegal moves,’” Marcus says. “It’s absurd to call something that can’t understand the rules of chess after seeing millions of explicit explanations of the rules in its training artificial general intelligence.”

Marcus says that what AI companies are calling reasoning is nothing of the sort. “It is a kind of naive pattern recognition. It is not the abstract reasoning that logicians have thought about for centuries,” he said.

That’s why Marcus thinks ultimately the field will need to turn to some other underlying design for AI, not based on using the neural network architecture called Transformers that all of today’s generative AI systems use.

Marcus is not alone in this. Meta’s chief scientist Yann LeCun, a renowned AI pioneer who has sparred with Marcus over the utility of neural networks in the past, also says that Transformers and GPT-style models will never be able to deliver human-level AGI. (Never mind that Meta has released some of the most capable open-source GPT-style models available.) LeCun says that these models lack any capacity to understand the physical world, cannot reliably remember and retrieve information, have no persistent memory, and have only limited planning and reasoning abilities.

Whatever GPT-5 is, LeCun is certain it won’t be AGI.
