Hello and welcome to Eye on AI. In this edition…OpenAI's o3 model blows the AI industry's collective mind; Trump AI policy advisor becomes a flashpoint for internecine battle over immigration policy; DOE confirms AI's big impact on electricity demand; some predictions for 2025.
The year drew to a close with a mic drop from OpenAI. On Dec. 20, the company unveiled a new AI model called o3 that showed off-the-charts performance on a series of benchmark tests, including one specifically designed to gauge whether AI models are capable of human-like abstract reasoning and generalization. The new o3 model scored 75.7% on this ARC-AGI benchmark, when restricted to less than $10,000 in computing expense, and 87.5% with an unrestricted compute budget. OpenAI’s relatively capable GPT-4o model had scored just 5% on the same test.
The result led some AI enthusiasts to wonder out loud whether OpenAI had just achieved the field’s long-sought Holy Grail, artificial general intelligence (or AGI)—which OpenAI defines as a single software system able to perform most economically valuable cognitive tasks as well as or better than a human.
Meanwhile, some deep learning skeptics, most notably Gary Marcus, blasted the o3 result as wildly misleading, arguing that OpenAI had specifically trained o3 to do well on the ARC-AGI benchmark (even though the benchmark was designed to make this sort of "training to the test" difficult); that the ARC-AGI benchmark was a poor judge of how well a model will perform on more open-ended and ambiguous real-world tasks; and that the high cost of o3 would make it economically unfeasible to use for most practical applications.
o3 still can't do some things humans find easy
Francois Chollet, the former Google AI researcher who designed the ARC-AGI benchmark and ran the tests of o3 for OpenAI, acknowledged that his benchmark was not designed to be the definitive marker of whether AGI has been achieved. But it was supposed to be a yardstick for the kind of learning efficiency and conceptual extrapolation humans exhibit, abilities that might indicate a system is getting closer to AGI. Chollet and Marcus both agreed, however, that o3's average performance score on ARC-AGI could give people a false impression of the model's capabilities: while the model scored well on the test overall, it flubbed some visual reasoning tasks that humans find fairly trivial to solve.
Software suddenly has a marginal cost
Chollet made a critical point about o3: It upends a fundamental truism of the software business, that the marginal cost of software trends toward zero. With earlier kinds of software, once the program is coded or the AI model is trained, the cost of deploying additional copies is essentially nothing. But o3 works very differently. Like other new "reasoning models," it produces better results the more computing power it uses at the point of inference (i.e., when it is asked to perform a task). This means the marginal cost of running additional copies does not trend toward zero. (The cost of getting o3 to score 87.5% on ARC-AGI was not revealed, but it was estimated at hundreds of thousands of dollars.) This will change how AI companies need to think about pricing their models and, more importantly, radically alter the way companies buying AI models like o3 will need to budget.
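To make the economics concrete, here's a toy Python sketch. All the numbers are hypothetical, chosen only to illustrate the contrast Chollet is drawing, not OpenAI's actual pricing:

```python
# A toy model (hypothetical numbers): classic software amortizes a one-time
# build cost, while a reasoning model burns real compute on every query,
# so its marginal cost per query never trends toward zero.
import math

def classic_marginal_cost(dev_cost: float, n_queries: int) -> float:
    """Amortized cost per query; shrinks toward zero as volume grows."""
    return dev_cost / n_queries

def reasoning_quality(spend_per_query: float) -> float:
    """Toy assumption: benchmark score rises with the log of inference spend."""
    return min(1.0, 0.5 + 0.1 * math.log10(1 + spend_per_query))

for spend in (1, 10, 100, 1_000):
    # For a reasoning model, the marginal cost per query IS the inference spend.
    print(f"${spend:>5}/query -> score ~{reasoning_quality(spend):.0%}, "
          f"marginal cost ${spend}/query, flat")

print(f"Classic software ($5M build, 1M queries): "
      f"${classic_marginal_cost(5_000_000, 1_000_000):.2f}/query and falling")
```

The upshot of the toy model: buyers of reasoning models will have to budget per task, deciding how much accuracy each task is worth, rather than treating software as a fixed cost to be amortized.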
AGI will come sooner, but matter less
o3 also means that OpenAI CEO Sam Altman is probably correct when he predicts that "we will hit AGI much sooner than most people in the world think and it will matter much less." When I first heard Altman say this, I was skeptical. I thought it was simply a way of moving the goalposts on AGI, while also trying to avoid regulation. But now I think Altman is on to something. At the current rate of progress, it is quite possible a successor model to o3 will, within the next year or two, do well enough across every tough benchmark we can conceive that its creators will be able to claim it is as good as or better than humans at every cognitive task.
But, at the same time, the cost of running such a model may be so high that, as Marcus argues, it won’t be economically feasible to use it for most real world applications, at least not initially. While those costs will no doubt come down rapidly, as they have for LLMs, they may not drop far enough or fast enough to make AGI adoption seem anything other than gradual. In many cases, it will remain easier and more economically efficient for businesses to keep employing people to do tasks.
The pace of AI adoption is already lagging the pace of AI model development, and this seems unlikely to change. AI outputs still differ enough from what we expect from humans, and remain unreliable and inconsistent enough, that in most cases we’ll need to keep at least some humans in the loop to double-check AI outputs. Changes to workflows and jobs will occur gradually, more like a creeping tide of automation than an exploding bomb. This is good news, as it should give us all more time to adapt.
With that, here’s more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
AI IN THE NEWS
OpenAI confirms it is changing to a for-profit entity. OpenAI formally announced plans to transition into a more traditional for-profit company in 2025. The company acknowledged in a blog post that its current nonprofit-controlled structure had become an impediment to its ability to raise venture capital funding and said that it would transition to a Delaware-incorporated public benefit corporation (PBC), in which its board would have fiduciary duties that extend to a social purpose as well as earning profits for investors. OpenAI said it would continue to have a nonprofit arm, which would own a significant number of shares in the PBC and exert some influence through them, but that the nonprofit would no longer have exclusive control of the for-profit OpenAI. The company also said in its blog post that, despite having just raised an additional $6.6 billion in the largest venture capital funding deal in history, it would need to raise even more capital going forward to achieve its goal of developing AGI.
OpenAI and Microsoft have a contractual AGI definition that safeguards Microsoft’s investment. That’s according to reporting from The Information, which says that AGI is currently defined between them as a technology that has the capability of producing $100 billion in profits for OpenAI. Given that OpenAI is currently losing billions per year, it may take OpenAI a long time to be in a position to meet that contractual definition. However, The Information also reports that OpenAI and Microsoft are in the process of trying to renegotiate their financial arrangements, with Microsoft perhaps gaining further access to OpenAI’s technology in exchange for OpenAI gaining more freedom to work with other tech companies and data center providers.
David Sacks's "AI and Crypto Czar" role scaled back. That's a scoop from my Fortune colleague Kali Hays, who reports that Sacks, whom U.S. President-elect Donald Trump initially named as his "AI and Crypto Czar," will serve only in a general advisory position. The venture capitalist, former PayPal exec, and close confidant of Elon Musk was reluctant to resign from his position at his investment firm, Craft Ventures, in order to take a formal government job, according to sources familiar with Trump's transition plans. Instead, Michael Kratsios, a former Scale AI executive who served as U.S. Chief Technology Officer during Donald Trump's previous presidential administration, will oversee day-to-day tech policy as director of the White House Office of Science and Technology Policy (OSTP), with former a16z venture capitalist Sriram Krishnan serving as senior policy advisor for AI within the OSTP. On X.com, Sacks disputed that his role had changed, posting that he had requested a position that would allow him to split his time between Washington and Silicon Valley.
MAGA activists target AI policy advisor Krishnan over skilled immigrant visas. Internecine warfare broke out among Donald Trump supporters after far-right activist Laura Loomer attacked Trump's appointment of Sriram Krishnan to be a White House policy advisor on AI, arguing that his past support for expanded H-1B visas and for the removal of caps on the number of green cards for skilled workers was the opposite of what Trump stood for. Krishnan's Silicon Valley allies, including Trump supporters Elon Musk, David Sacks, and Vivek Ramaswamy, rushed to his defense, accusing Loomer and her supporters of racism, while arguing that U.S. culture did not produce enough highly skilled native-born engineers to ensure the country remains at the forefront of technological innovation. Meanwhile, Loomer and her crew claimed censorship by Musk's X.com social media platform. Trump weighed in to calm the dispute, claiming (perhaps somewhat misleadingly) that his businesses have made frequent use of H-1B visas in the past. You can read more from the Washington Post here.
DeepSeek unveils powerful new AI model trained with far fewer computer chips. Chinese start-up DeepSeek launched its V3 AI model, which outperforms top models from Meta and Alibaba and matches results from OpenAI's GPT-4o and Anthropic's Claude 3.5, despite being developed at a fraction of the cost of these rival AI systems. V3, which is smaller than these other companies' models, was trained in just two months at a cost of $5.58 million, the South China Morning Post reported. The development highlights the gains Chinese companies are making in optimizing AI training despite U.S. export controls that limit the access Chinese companies have to top-of-the-line graphics processing units (GPUs), the specialized computer chips used for AI applications.
EYE ON AI RESEARCH
U.S. Energy Department study confirms AI driving massive electricity demand. A new study from the Department of Energy's Lawrence Berkeley National Laboratory shows the soaring electricity required by data centers in the U.S. This demand began to accelerate in 2017, according to the report. In 2018, data centers accounted for 1.9% of total U.S. electricity use. But by 2028, the research predicts, they will consume between 6% and 12% of U.S. electric power, with the large range due to the difficulty of estimating exactly how fast AI adoption will take place. The research also notes that this demand, particularly after 2028, will come on top of other big electricity demand spikes from electric vehicle adoption, the return of manufacturing to the U.S., the increased use of hydrogen gas as a fuel source (which requires large amounts of electricity to produce), and the electrification of other industries and buildings. You can read the full report here.
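For a rough sense of what those figures imply, here's a back-of-the-envelope calculation (mine, not the report's):

```python
# Implied average annual growth in data centers' share of U.S. electricity
# use, from 1.9% in 2018 to the report's projected 6%-12% range in 2028.
share_2018 = 0.019
years = 2028 - 2018

for share_2028 in (0.06, 0.12):
    cagr = (share_2028 / share_2018) ** (1 / years) - 1
    print(f"Reaching {share_2028:.0%} by 2028 implies the share growing "
          f"~{cagr:.0%} per year on average")
```

That works out to roughly 12% to 20% annual growth in share alone. And since total U.S. electricity demand is itself expected to rise, data centers' absolute consumption would grow even faster.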
FORTUNE ON AI
Google CEO urges employees to move faster and ‘stay scrappy’ ahead of pivotal year in AI, report says —by Jason Ma
Scientist says the one thing everyone hates about AI is ultimately what helped him win a Nobel Prize —by Dave Smith
Sam Altman calls Elon Musk a ‘bully’ who enjoys getting into fights with fellow billionaires —by Sydney Lake
Accenture boss Julie Sweet met 30 global CEOs in the past 2 months, and nearly all of them are scrambling to roll out more AI —by Ryan Hogg
Matt Garman rose from intern to AWS CEO. Inside the ‘unflappable’ exec’s plan to defend Amazon’s $100 billion profit machine in the age of AI —by Jason Del Rey
AI CALENDAR
Jan. 7-10: CES, Las Vegas
Jan. 16-18: DLD Conference, Munich
Jan. 20-25: World Economic Forum, Davos, Switzerland
Feb. 10-11: AI Action Summit, Paris, France
March 3-6: MWC, Barcelona
March 7-15: SXSW, Austin
March 10-13: Human [X] conference, Las Vegas
March 17-20: Nvidia GTC, San Jose
April 9-11: Google Cloud Next, Las Vegas
BRAIN FOOD
AI Predictions for 2025
We'll end the year with a few predictions for 2025 to mull over.
The Trump Administration will launch a "Manhattan Project" to race towards AGI. Trump will announce a significant government-backed effort to ensure the U.S. achieves AGI before China does. This may include forcing AI rivals, such as OpenAI, Anthropic, and Google DeepMind, to work together on a single AI model intended for the government's own use, and the construction of one or more massive data centers.
AMD and a new crop of AI computer chip startups will begin to noticeably erode Nvidia's market dominance in GPUs. All the AI trends point to the increased importance of inference relative to training in the running of high-end AI models. But Nvidia's top-end graphics processing units are more optimized for training ultra-large AI models than they are for running inference. That is likely to mean rivals that have focused on making chips specifically for inference will begin to chip away at Nvidia's stranglehold.
Reasoning will replace "agent" as the AI buzzword of 2025. Microsoft and Salesforce tried to make 2024 all about AI agents. But next year will be all about reasoning. OpenAI made its o1 model publicly available in early December and previewed its o3 model. Its fellow AI companies are likely to follow suit with reasoning models of their own in the coming months. And while reasoning and agents are aligned concepts (in fact, reasoning abilities are necessary to make AI agents useful and trustworthy), I predict you are going to be hearing much more hype about reasoning and less about agents this coming year.
Mistral or its founding team will end up in the arms of a tech giant. One of the tech giants will scoop up French AI startup Mistral, or will hire away its founding team in a deal similar to the ones that took out the founding teams of Inflection, Adept, and Character.ai. The cost of training cutting-edge AI models is simply too great for the Paris-based Mistral to keep going as a completely venture-backed, independent AI startup.
OK, those are my predictions for 2025. What are yours?