OpenAI launches long-awaited GPT-4.5—but its capabilities already lag competitors'

OpenAI announced the debut of its GPT-4.5 model on Thursday, unveiling one of the most highly anticipated products in the booming generative AI market. But the launch, coming two years after GPT-4 was introduced, only served to highlight how the high-flying AI company is struggling to stay at the front of the race it helped kick off.

OpenAI CEO Sam Altman touted the latest model’s advances, saying in a tweet Thursday that GPT-4.5 was the first AI that “felt like talking to a thoughtful person,” and that he has been “astonished” by the “good advice” it provides. A blog post the company published said that testers of GPT-4.5 judged the model to have more “EQ,” or emotional intelligence, than previous OpenAI models. And GPT-4.5 is less prone to inventing information, a phenomenon known as “hallucination,” the company said.

But Altman and co. also sought to tamp down expectations. “This isn’t a reasoning model and won’t crush benchmarks,” Altman warned, describing a “different kind of intelligence.” OpenAI’s blog post emphasized softer, more qualitative measures of GPT-4.5’s improvements over previous models, such as output that “feels more natural” and the model’s “improved ability to follow user intent.”

OpenAI’s ambivalence about its latest model was evident even in its description of GPT-4.5. OpenAI noted that “GPT-4.5 is not a frontier model” in the technical paper it released alongside the new model on Thursday (the term “frontier model” refers to AI systems at the leading edge of technological capability). Hours later, for reasons that remain unclear, the company deleted that line from its paper.

And the company noted that it was still deciding whether it would even offer GPT-4.5 in the “long term” as an API for partners to connect to their systems because of how expensive it is to run. The new model is currently being offered at prices that are between 15 and 30 times more costly than OpenAI’s GPT-4o model.
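To make the 15-to-30-times range concrete, here is a minimal sketch of the arithmetic. The per-million-token prices in the table are assumptions for illustration (they are not stated in this article), and the helper names are hypothetical.

```python
# Assumed list prices in USD per 1 million tokens (illustrative, not from the article).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4.5": {"input": 75.00, "output": 150.00},
}


def price_multiple(new_model: str, old_model: str, kind: str) -> float:
    """How many times more the new model costs per token of the given kind."""
    return PRICES[new_model][kind] / PRICES[old_model][kind]


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API request under the assumed prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


print(price_multiple("gpt-4.5", "gpt-4o", "input"))   # 30.0
print(price_multiple("gpt-4.5", "gpt-4o", "output"))  # 15.0
```

Under these assumed prices, text sent to the model costs 30 times more and text generated by it costs 15 times more, so the overall multiple for a given request falls somewhere in between, depending on its mix of input and output.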

In many ways, GPT-4.5 represents the end of an era for OpenAI. As Altman announced earlier this month, GPT-4.5, or Orion, as the company called the model internally, is the last that will be built using the same “pre-training” method that the company used to create the technology behind its breakout hit, ChatGPT (the P in GPT stands for “pretrained”). The method involves building ever bigger models and using ever increasing amounts of data for each successive version, an expensive and complex approach that in theory allows the models to become more powerful.

OpenAI said that GPT-4.5 would be available on Thursday to users of the $200-a-month ChatGPT Pro service, but would not be available to other users until next week because, Altman noted, the company did not currently have enough computing capacity on hand.

OpenAI did not say how large the new GPT-4.5 model is. Outside experts have estimated that GPT-4 might have as many as 1.8 trillion parameters (essentially the tunable weights in its neural network), and that GPT-4.5 could have as many as 4 trillion or 5 trillion.

Mind the benchmarks

While the new model outperforms OpenAI’s GPT-4o by a significant margin on a number of benchmark tests, especially those that involve accurately answering general knowledge questions, its performance on other tests, including those that involve solving problems across different languages, was only slightly improved. What’s more, in questions involving mathematics, coding, and logic, many early users said that GPT-4.5 underperforms OpenAI’s already released “reasoning” models such as o1 and o3-mini, as well as the R1 model from the Chinese AI startup DeepSeek.

GPT-4.5 also appears to lag Anthropic’s Claude 3.7 Sonnet model, which the rival AI shop unveiled earlier this week, according to benchmark scores users have posted to social media. Claude 3.7 Sonnet is the first AI model to be released that combines the instant, “intuitive” answers that GPT-style models produce with the slower, more deliberative, but often more accurate, answers that the reasoning models produce.

Claude 3.7 Sonnet decides, based on the user’s prompt, whether it can answer quickly, based only on what it has learned in its initial training, or whether it needs to spend more time producing a series of sequential steps and reflecting on those steps—a process known as a “chain of thought”—to arrive at the answer. OpenAI’s GPT-4.5 does not have this ability.

The lack of a clear, across-the-board leap in performance led Gary Marcus, the emeritus New York University cognitive scientist and AI expert who has emerged as a leading skeptic of today’s generative AI methods, to label OpenAI’s GPT-4.5 “a nothing burger.” Some disappointed users posted relatively weak benchmark data for the new model on social media along with captions like “tell me I’m not seeing this.”

The shift to reasoning

Two former OpenAI employees told Fortune that the Orion model was originally intended to be GPT-5, an AI system that would show a much more significant increase in capabilities over OpenAI’s GPT-4, which launched in March 2023. But the model was never able to demonstrate this across-the-board step change in performance. As a result, OpenAI appears to have released it under a name that marks it as merely an incremental improvement on GPT-4o, not an order-of-magnitude jump in capabilities.

In a February 12 tweet, Altman said OpenAI will debut a model it will call GPT-5 in “weeks/months.” He noted that this model will combine the fast, instant answers of the GPT series models with the more deliberative, step-by-step logic of new “reasoning” models, making it more akin to Anthropic’s Claude 3.7 Sonnet.

Reasoning models start out with pretrained models but then use a method called reinforcement learning (where an AI model learns by trial and error to maximize some goal) to teach the model to output a sequence of logical steps that will lead to a correct answer. This “chain of thought,” as AI researchers refer to it, can often include the model engaging in what is essentially “self-reflection” to see where it can improve its process to arrive at the best answer.

GPT models adhered to so-called “scaling laws.” More empirical observations than anything akin to the laws of physics, the scaling laws were the supposition that the larger an AI model is (as measured by the number of parameters), the more data it is fed, and the more computing power applied to this pre-training process, the better the resulting AI model would be. What’s more, the scaling laws asserted that this improvement in capabilities was predictable, tracking a smooth mathematical relationship with the increase in model size, data, and computing power applied during pretraining.
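In the research literature, this kind of relationship is usually written as a power law: the model’s error (its “loss”) shrinks by the same fraction every time the model grows by the same multiple. The sketch below illustrates that shape for model size alone. The function names and the constants `n_c` and `alpha` are illustrative assumptions, not figures from OpenAI or from this article.

```python
def power_law_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Illustrative loss as a power law in parameter count: (n_c / N) ** alpha.

    n_c and alpha are assumed constants chosen only to show the curve's shape.
    """
    return (n_c / n_params) ** alpha


def loss_ratio(scale_factor: float, alpha: float = 0.076) -> float:
    """Fraction of the loss that remains after multiplying model size by scale_factor."""
    return scale_factor ** (-alpha)


# Each tenfold increase in size cuts the loss by the same fraction,
# regardless of how large the model already is.
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} parameters -> loss {power_law_loss(n):.3f}")
print(f"loss remaining after a 10x scale-up: {loss_ratio(10):.3f}")
```

The point of the sketch is the predictability: under a curve like this, the payoff from a bigger model can be forecast before it is trained, but each further gain requires multiplying the model’s size, which is why the approach becomes so expensive.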

The reasoning models, by contrast, derive much of their capability from the amount of computing power applied at the time they are asked to answer a prompt. This is what AI researchers call “test-time compute,” and OpenAI has claimed it has found a new set of scaling laws that suggest that these reasoning models produce improved answers proportional to the amount of test-time compute applied. But even more than the original AI scaling laws, these new test-time compute scaling correlations have yet to be proven.
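A toy way to see why extra computing power at answer time can help is majority voting: ask a model the same yes/no question several times and take the most common answer. This is a much simpler technique than the chain-of-thought reasoning described here, and it is not OpenAI’s method. The function name and the 60% single-answer accuracy are assumptions for illustration.

```python
from math import comb


def majority_vote_accuracy(p_single: float, n_samples: int) -> float:
    """Probability that a strict majority of n independent yes/no answers is correct.

    Assumes each answer is right with probability p_single, independently of the others.
    """
    if n_samples % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    needed = n_samples // 2 + 1  # smallest number of correct answers that wins the vote
    return sum(
        comb(n_samples, k) * p_single**k * (1 - p_single) ** (n_samples - k)
        for k in range(needed, n_samples + 1)
    )


# More samples means more compute spent per question.
for n in (1, 3, 5, 11, 21):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

In this toy setup, accuracy rises as more answers are sampled, but each additional sample buys less than the one before it, which is one reason claims about how test-time compute scales need empirical backing.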

What’s clear with GPT-4.5’s release is that OpenAI no longer has the clear lead in the AI race it once did. To use a bike racing analogy, OpenAI remains in the peloton, but, for now at least, the yellow jersey has passed to Anthropic, and there are other companies, including China’s DeepSeek, Google, and Meta, all capable of winning the tour.
