Why OpenAI CEO Sam Altman and AI skeptic Gary Marcus are both wrong about today's AI

Hello and welcome to Eye on AI. In this edition…What both the true believers and the doubters get wrong about today’s AI…Nvidia wants to dominate ‘embodied AI’ just as it has datacenter-based AI…The innovations behind DeepSeek V3’s impressive performance…Can AI give the little guy a leg up?

Among AI industry insiders, opinion about AI progress tends to bifurcate. In one camp are folks such as Sam Altman. The OpenAI CEO wrote a blog post on his personal website over the weekend reflecting on OpenAI’s trajectory, especially over the past two years. In the post, Altman stated that artificial general intelligence—which OpenAI defines as a single AI software system that can perform as well as or better than people at most economically useful cognitive tasks—was essentially a solved problem. “We are now confident that we know how to build AGI as we have traditionally understood it,” he wrote. And Altman predicted that in 2025 “we may see the first AI agents ‘join the workforce’ and materially change the output of companies.”

The other camp is deeply skeptical of the value of today’s AI software. Gary Marcus, the AI authority and emeritus NYU professor of cognitive psychology, may be the best example of these doubters. Marcus has recently used his blog to point out several key challenges facing today’s AI that he doesn’t think will be solved anytime soon. These include common sense reasoning, but also AI’s continued unreliability, its inability to generalize to data different from what it encountered during training, and its difficulty with understanding compositionality (how parts constitute a whole). Given these limitations, and the high cost of the most advanced AI systems, Marcus has often wondered whether AI will ever find much use in business outside a few niche settings.

But several conversations I had in the closing weeks of last year with business executives who are using AI at scale at their companies made it clear to me that, when it comes to business applications of AI, we should neither be as optimistic as Altman, nor as pessimistic as Marcus. The business executives I spoke to all reported finding ways to wring significant business value from today’s AI software, despite the shortcomings Marcus has highlighted.

But how they did this was far more complicated and involved far more human engineering work—and, often, a good deal more expense—than what one might think from Altman’s pronouncements about how close AGI is. In no case could one simply use one of the foundation models straight out of the box and have it reliably solve business problems.

Prosus’s Toqan System

One of the people I spoke to was Euro Beinat, executive vice president and global head of AI and data at Prosus, the Netherlands-based technology investment firm whose portfolio includes dozens of tech startups worldwide. I like talking to Beinat because Prosus’s diverse portfolio gives him a good vantage point from which to gauge how AI is being adopted across different kinds of companies—from food delivery apps to ecommerce plays to fintechs—and across different job functions. I also like speaking to him because he is unusually candid about what has worked, and what hasn’t.

Beinat said that in the past year, Prosus has rolled out its Toqan AI system to 25,000 employees across its various portfolio companies. Employees can use Toqan to do everything from answering questions about employment policies to drafting marketing surveys to assisting human customer support agents in finding the right documentation to answer customer queries. One of Prosus’s companies, OLX Poland, has created an agentic AI system, called OLX Magic, that helps walk sellers through the process of posting a listing or, in the case of a potential buyer, helps them shop, letting them specify what they are looking for in natural language and have a “conversation” about the options with an AI chatbot, rather than using a traditional search.

Using multiple models and an “agentic workflow”

Of course, one of the things that has held back AI adoption in business, as Marcus rightly points out, is reliability. Few business use cases can tolerate the 10% to 25% error rates that many large language models (LLMs) produce when used without any other interventions. Through an iterative process of improvement—including building better guardrails and updating the AI models it was using—Prosus gradually brought Toqan’s hallucination rate down from 10% in 2022 to 2.5%. But to get it down further, Beinat says, Prosus had to change how the entire system is engineered to build a more “agentic workflow.”

That process involves having an AI model reason about the nature of the question it’s being asked and decide whether an LLM can answer it directly or whether it requires the agentic workflow. If it does, the model breaks the task into discrete parts and assigns each part to a different AI “agent” (either a model that has been fine-tuned for a specific task or an LLM that has been prompted to play a particular role and perhaps given a specific software tool to help complete that task). Then comes a “reflection phase,” in which an AI model checks the overall result of the workflow for errors, repeating the entire process if it finds any. Using this system, Prosus has reduced hallucinations to 1.5%.
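To make that control flow concrete, here is a minimal Python sketch of the routing, decomposition, and reflection loop described above. It is not Prosus’s code: every component (the router, the decomposer, the specialist agents, the combiner, and the reflection check) is a hypothetical stand-in that a real system would back with model calls, and the retry cap `max_attempts` is an assumption added so the loop always terminates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# An "agent" here is anything that takes a task and returns text.
# In a real system it would wrap an LLM call; these are hypothetical stubs.
Agent = Callable[[str], str]


@dataclass
class AgenticPipeline:
    needs_workflow: Callable[[str], bool]   # router: can an LLM answer directly?
    direct_llm: Agent                       # single-call path for simple questions
    decompose: Callable[[str], list[str]]   # splits a complex task into parts
    pick_agent: Callable[[str], Agent]      # chooses a specialist agent per part
    combine: Callable[[list[str]], str]     # merges the partial results
    reflect: Callable[[str, str], bool]     # True if the result passes the error check
    max_attempts: int = 3                   # assumed cap on reflection-driven retries

    def answer(self, question: str) -> str:
        if not self.needs_workflow(question):
            return self.direct_llm(question)
        result = ""
        for _ in range(self.max_attempts):
            parts = self.decompose(question)
            partials = [self.pick_agent(part)(part) for part in parts]
            result = self.combine(partials)
            if self.reflect(question, result):  # reflection phase
                return result
            # Errors found: rerun the whole workflow on the next iteration.
        return result  # hypothetical fallback: return the last attempt


# Toy usage with string functions standing in for models.
pipeline = AgenticPipeline(
    needs_workflow=lambda q: " and " in q,
    direct_llm=lambda q: f"direct answer to: {q}",
    decompose=lambda q: q.split(" and "),
    pick_agent=lambda part: str.upper,
    combine=lambda partials: " | ".join(partials),
    reflect=lambda q, r: bool(r),
)
print(pipeline.answer("vacation policy"))
print(pipeline.answer("summarize Q3 and draft survey"))
```

The bounded retry is the design choice worth noting: each pass through the workflow consumes more tokens, so capping reflection-triggered repeats keeps the cost of any single query predictable.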

But, Beinat warns, “it is slower to do this and a lot more expensive in terms of token usage” than simply giving the question straight to an LLM and having it answer. Overall, the number of tokens used per query has risen 2.5-fold. Meanwhile, thanks to price wars among cloud providers, the average price per token has slightly more than halved. So, on average, the system is only about 10% more expensive today than it was in early 2023.
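The cost claim is simple arithmetic: cost per query equals tokens per query times price per token. The sketch below uses only the figures quoted above to back out the per-token price change implied by a 2.5-fold token increase and a roughly 10% overall cost rise.

```python
def relative_cost(token_multiplier: float, price_multiplier: float) -> float:
    """Cost per query relative to a baseline: tokens used times price per token."""
    return token_multiplier * price_multiplier


def implied_price_multiplier(token_multiplier: float, cost_multiplier: float) -> float:
    """Per-token price change implied by a token increase and an overall cost change."""
    return cost_multiplier / token_multiplier


# Figures from the article: 2.5x tokens per query, overall cost about 10% higher.
price = implied_price_multiplier(2.5, 1.10)
print(f"implied price per token: {price:.2f}x baseline ({1 - price:.0%} cheaper)")
print(f"check: {relative_cost(2.5, price):.2f}x baseline cost per query")
print(f"if prices had exactly halved: {relative_cost(2.5, 0.5):.2f}x")
```

The implied price per token is 0.44 times the early-2023 level, a drop of about 56%, which is what “slightly more than halved” means here. Had prices fallen by exactly half, the system would cost 1.25 times as much, about 25% more rather than 10%.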

Measuring ROI

The lower hallucination rate is probably worth the cost, he says. When Toqan was initially rolled out, it was embraced mostly by engineers, while people in other domains, such as human resources and legal, were reluctant to use it. Beinat says he thinks this was because engineers, due to the nature of their work, often had an intuitive sense of when they could trust the model’s output, whereas in other areas, detecting hallucinations was more difficult and the chance of errors made people hesitant to use Toqan. Now, with the lower hallucination rate, the majority of Toqan users are from non-engineering roles. Still, Beinat warns, managers should not expect AI’s impacts to be apparent immediately after a system is introduced. Prosus has found that on average it takes six months of learning and experimentation for users to figure out how to use these new AI tools most effectively in their particular role, he says.

And, even then, Beinat acknowledges that figuring out the return on investment from AI is difficult. So far, he says, Prosus data shows that Toqan saves about 48 minutes on average per user per day. That’s not nothing, but the problem, he says, is that those 48 minutes “are spread all over the place. There are all these microbursts of productivity.” And the value of those saved minutes varies a lot depending on the use case. Prosus has calculated that, right now, the cost of those 48 saved minutes per day is about $12 per user per month, which he says is definitely worth it.
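One way to put the $12 figure in perspective is cost per hour saved. The sketch below assumes 20 working days per month, which is not a Prosus figure, and it treats every saved minute as equally valuable, which is exactly what Beinat cautions against.

```python
def cost_per_saved_hour(monthly_cost_usd: float, minutes_saved_per_day: float,
                        working_days_per_month: int = 20) -> float:
    """Dollars spent per hour of employee time saved.

    working_days_per_month is an assumption for illustration, not a Prosus figure.
    """
    hours_saved_per_month = minutes_saved_per_day * working_days_per_month / 60
    return monthly_cost_usd / hours_saved_per_month


print(f"${cost_per_saved_hour(12, 48):.2f} per saved hour (20 working days)")
print(f"${cost_per_saved_hour(12, 48, 22):.2f} per saved hour (22 working days)")
```

Under those assumptions, 48 minutes a day adds up to 16 hours a month, or roughly $0.75 per saved hour; with 22 working days the figure falls to about $0.68. The calculation says nothing about what those scattered minutes are actually worth to the business.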

Reducing the cost of growth

Still, 48 minutes each day doesn’t seem like a game changer. That’s why, he says, he often likes to highlight individual use cases where AI’s transformative impact is more apparent. He points to iFood, a Brazil-based food delivery app Prosus owns. iFood told its employees that if they had a data analytics question, they should try asking Toqan before sending it on to a human data analyst. The company discovered that Toqan could answer 70% of these questions. iFood still employs plenty of data analysts who handle the questions Toqan can’t, but their backlog has shrunk and their capacity is less of a bottleneck. And, of course, savings such as this mean that iFood can grow without hiring as many new employees: in essence, AI reduces the cost per dollar of revenue generated.
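The iFood example is a deflection calculation. In the hypothetical sketch below, the weekly volume of 1,000 questions is invented for illustration, and the capacity figure assumes that questions Toqan answers take no analyst time at all.

```python
def analyst_load(total_questions: int, deflection_rate: float) -> int:
    """Questions still reaching human analysts when the AI gets the first try."""
    return round(total_questions * (1 - deflection_rate))


def capacity_multiplier(deflection_rate: float) -> float:
    """How much more question volume the same analyst team can absorb,
    assuming AI-answered questions take no analyst time."""
    return 1 / (1 - deflection_rate)


weekly_questions = 1_000  # hypothetical volume, not an iFood figure
print(analyst_load(weekly_questions, 0.70))
print(f"{capacity_multiplier(0.70):.1f}x")
```

If Toqan resolves 70% of questions, analysts see only 30% of the original volume (300 of the hypothetical 1,000), so the same team could in principle absorb roughly 3.3 times as many questions before becoming a bottleneck again.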

It’s this kind of business logic that is too often lost in Marcus’s pessimistic takes on today’s AI—while the significant effort it takes to deliver that cost reduction is often glossed over in Altman’s rosy statements about AGI being solved.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

***
Before we get to the news: at the Fortune Brainstorm Tech dinner last night at CES in Las Vegas, legendary investor Mark Cuban treated the audience to his pearls of wisdom on everything from shaking up how pharmaceuticals are sold in America to the impact AI is having on his companies. You can check out the video of his talk on Fortune’s website here.

***
Also, a correction: Last Thursday’s (Jan. 2, 2025) edition of the newsletter on corporate cybersecurity training erroneously stated that none of the training courses reviewed addressed the emergence of deepfakes in live video calls. One training video, from Ninjio, did address this threat. We regret the error.

AI IN THE NEWS

Nvidia rolls out ‘world model’ Cosmos as it makes a play to dominate embodied AI. Nvidia is looking to grab a big piece of the expanding market for computer chips that will power robots, drones, self-driving cars, and other forms of “embodied AI” with the introduction of Cosmos, an open-source AI platform that generates realistic synthetic data for training AI “world models.” Announced by Nvidia CEO Jensen Huang at CES, Cosmos uses generative AI models trained on 20 million hours of real-world videos to create lifelike simulations, enabling breakthroughs in areas like self-driving cars and humanoid robots. The company has already signed partnerships with a number of leading humanoid robot startups and companies working on self-driving car software, Fortune’s Sharon Goldman reports.

OpenAI is losing money on its $200-per-month ChatGPT Pro plan. CEO Sam Altman revealed in a series of social media posts that, because of unexpectedly high usage, the company is losing money on the expensive service, which gives users access to a souped-up version of its o1 reasoning model, among other benefits. Altman said he was surprised, since he had set the price with the intent to generate a profit. OpenAI, which has raised $20 billion but remains unprofitable, faces significant expenditures, including AI infrastructure costs, and is exploring options such as price hikes and usage-based pricing to reach profitability, TechCrunch reports.

Texas lawmaker introduces tough AI bill. Republican State Rep. Giovanni Capriglione introduced House Bill 1709, which would regulate the development and use of high-risk AI systems and represents one of the most comprehensive efforts to regulate AI by any state so far. The 44-page bill proposes measures such as mandatory AI use disclosures, protections against bias and data misuse, safeguards for free speech, and restrictions on AI technologies that pose “unacceptable risks,” like behavioral manipulation or deepfakes. It also establishes an AI Regulatory Sandbox Program for research and testing, amends the state’s Data Privacy and Security Act to include AI-specific rules, and offers workforce grants and guidance through a new AI council. The bill is almost certain to draw opposition from the same group of AI companies, venture capitalists, and Big Tech associations that helped defeat California’s AI bill, SB 1047, last year. You can read more here from the Austin American-Statesman.

EYE ON AI RESEARCH

The innovations behind DeepSeek’s V3. In last week’s edition of Eye on AI, I mentioned the impressive performance of DeepSeek’s V3 open-source AI model, which has topped a number of leaderboards for open-source AI. Now, I want to highlight some of the technical tricks behind that performance that have caught the attention of other AI researchers.

One is multi-token prediction: during training, the model is asked to predict not just the next token but the next several tokens. This made its training process much more efficient.
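A toy Python illustration of the general idea: for each position in a training sequence, multi-token prediction asks for the next `depth` tokens instead of only the next one, so each position contributes several training signals. The token IDs and probability tables are made up, and this sketches the general technique rather than DeepSeek’s architecture, which its technical report describes as adding extra sequential prediction modules on top of the main model.

```python
from __future__ import annotations

import math


def mtp_targets(tokens: list[int], depth: int) -> list[tuple[int, list[int]]]:
    """For each position t, the next `depth` tokens the model must predict.

    depth=1 is ordinary next-token prediction. Positions too close to the end
    to have `depth` future tokens are skipped to keep this toy example simple.
    """
    return [(t, tokens[t + 1 : t + 1 + depth]) for t in range(len(tokens) - depth)]


def mtp_loss(probs: list[list[dict[int, float]]], tokens: list[int], depth: int) -> float:
    """Average cross-entropy over every position and every prediction depth.

    probs[t][k] is a hypothetical predicted distribution (token -> probability)
    for the token k + 1 steps after position t.
    """
    total, count = 0.0, 0
    for t, targets in mtp_targets(tokens, depth):
        for k, target in enumerate(targets):
            total += -math.log(probs[t][k][target])
            count += 1
    return total / count


tokens = [5, 8, 2, 9]
print(mtp_targets(tokens, 1))  # [(0, [8]), (1, [2]), (2, [9])]
print(mtp_targets(tokens, 2))  # [(0, [8, 2]), (1, [2, 9])]

# Made-up predictions that put probability 0.5 on every correct token.
probs = [[{8: 0.5, 3: 0.5}, {2: 0.5, 7: 0.5}],
         [{2: 0.5, 4: 0.5}, {9: 0.5, 1: 0.5}]]
print(round(mtp_loss(probs, tokens, 2), 3))  # 0.693
```

With `depth=1` this reduces to ordinary next-token prediction; with `depth=2` every position is also scored on the token after next, which is where the denser training signal comes from.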

Like many LLMs, V3 is what is called a “mixture of experts” model. Different parts of the network, called experts, are trained to specialize in certain kinds of inputs, and a routing mechanism sends each token to only a few of them, so only those parts of the neural network need to be activated. This makes the model more efficient. But it creates issues during training, because the team building the model must make sure all the experts are adequately used so that the whole model functions well. This is called “load balancing.” In the past, researchers used a separate bit of mathematics, an extra “auxiliary loss” term, to force the model to balance work among its experts. DeepSeek largely dispensed with this extra step, which can degrade the model’s performance, and instead came up with a number of clever ways to route tokens to the right set of experts in order to ensure both high-quality answers and load balancing.
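DeepSeek’s V3 technical report describes one of these routing tricks as a per-expert bias that is added to routing scores only when selecting experts, then nudged down or up after each batch depending on whether that expert was over- or underloaded. The sketch below is a simplified, hypothetical rendering of that idea in plain Python: the two-expert batch, the affinity values, and the step size `gamma` are invented, and the real model uses learned affinities plus details omitted here.

```python
from __future__ import annotations


def route(affinities: list[float], bias: list[float], k: int) -> list[int]:
    """Choose the top-k experts for one token by affinity plus per-expert bias.

    The bias only affects which experts are selected; the weights used to mix
    the chosen experts' outputs would still come from the raw affinities.
    """
    scores = [a + b for a, b in zip(affinities, bias)]
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]


def update_bias(bias: list[float], loads: list[int], gamma: float) -> list[float]:
    """After a batch, lower the bias of overloaded experts and raise underloaded ones."""
    mean_load = sum(loads) / len(loads)
    new_bias = []
    for b, load in zip(bias, loads):
        if load > mean_load:
            b -= gamma
        elif load < mean_load:
            b += gamma
        new_bias.append(b)
    return new_bias


def simulate(token_affinities: list[list[float]], k: int, gamma: float,
             steps: int) -> list[list[int]]:
    """Route the same batch repeatedly, adjusting biases; return per-step expert loads."""
    num_experts = len(token_affinities[0])
    bias = [0.0] * num_experts
    history = []
    for _ in range(steps):
        loads = [0] * num_experts
        for affinities in token_affinities:
            for expert in route(affinities, bias, k):
                loads[expert] += 1
        history.append(loads)
        bias = update_bias(bias, loads, gamma)
    return history


# Toy batch: every token prefers expert 0, so plain top-1 routing overloads it.
batch = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.55, 0.45]]
for step, loads in enumerate(simulate(batch, k=1, gamma=0.03, steps=6), start=1):
    print(step, loads)
```

In this toy run, top-1 routing initially sends all four tokens to expert 0; as its bias drops and expert 1’s rises, the two least-committed tokens shift over, and the load settles at two tokens per expert without any extra loss term in training.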

The Association of Data Scientists has a good, not-too-technical breakdown of DeepSeek V3’s various innovations on its blog here. Expect to see these methods copied by other leading AI labs in the coming months.

FORTUNE ON AI

Nvidia’s Jensen Huang says AI agents are ‘a multi-trillion-dollar opportunity’ and ‘the age of AI Agentics is here’ —by Brooke Seipel

An AI-powered robot vacuum that can pick up your dirty socks and take pictures of your dog is one of the early stars of CES 2025 —by Chris Morris

Klarna CEO says he feels ‘gloomy’ because AI is developing so quickly it’ll soon be able to do his entire job —by Sydney Lake

Sam Altman says Airbnb’s CEO and a ‘legendary’ VC saved OpenAI and ‘stopped me from making several mistakes’ after he was briefly fired in 2023 —by Sasha Rogelberg

Sam Altman says Elon Musk won’t abuse his political power to target competitors, but cautions he ‘may turn out to be proven wrong’ —by Christiaan Hetzner

AI CALENDAR

Jan. 7-10: CES, Las Vegas

Jan. 16-18: DLD Conference, Munich

Jan. 20-25: World Economic Forum, Davos, Switzerland

Feb. 10-11: AI Action Summit, Paris, France

March 3-6: MWC, Barcelona

March 7-15: SXSW, Austin

March 10-13: Human [X] conference, Las Vegas

March 17-20: Nvidia GTC, San Jose

April 9-11: Google Cloud Next, Las Vegas

BRAIN FOOD

Using AI to level the playing field between citizens, government, and business. One of the more optimistic takes on AI is that it will help average citizens even the odds. Currently, many of the avenues citizens have for seeking redress (the courts or various ombudsmen, tribunals, or arbitration processes) are so complicated and time-consuming that citizens struggle to obtain justice. AI could handle much of this work, possibly at a fairly low cost, helping to create a more equal power dynamic between individuals, businesses, and the state.

One good example of this is DoNotPay, the startup that uses AI to help people in the U.S. and U.K. dispute parking fines or seek refunds for airline or hotel bookings. But DoNotPay is itself a paid service. A better example may be a new chatbot, called Roxanne, launched today by the New York City tenants’ rights organization Housing Court Answers in partnership with New York University and legal tech company Josef. The chatbot helps tenants answer questions about their rights vis-à-vis landlords when it comes to repairs to the property they rent. “Renting law and regulations in New York are notoriously complicated and hard to digest, so with Roxanne, we've made rental repairs guidance both easy to access and understand,” Jenny Laurie, executive director of Housing Court Answers, said in a statement.

I no longer live in New York City. But I used to. I tried out Roxanne on a series of hypothetical issues and found it was good, but limited. (You can play around with it yourself here.) It only helps with answers concerning rental repair issues. (Still, given that is all it was designed to do, I was actually glad to see that it refused to stray into areas beyond that expertise, such as whether a renter could sublet their property, rather than producing possibly erroneous answers.) When I asked if I could hypothetically withhold rent until my landlord carried out a repair, it correctly advised that this was a bad idea, one that could lead to eviction, and suggested instead documenting the problem and any attempts I’d made to get the landlord to remedy it. It suggested that I could use services provided by another nonprofit, justfix.nyc, to send the landlord a certified letter or prepare a Housing Court filing called an HP Action that could result in a court order that the landlord carry out the repair.

That’s a start. But it would have been nice to see tighter integration between Roxanne and these justfix.nyc services, so that the AI assistant could carry all of this out. It would also be good to see the service expanded beyond repairs to other housing law issues. Still, Roxanne is a promising sign that AI just might help tilt the balance of power, at least in some cases, in favor of the little guy.

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.
