Some AI agent customers say reality doesn’t match the hype

By Sage Lazzaro, Contributing writer

The viral Chinese AI app Manus bills itself as a "general AI agent"—but it has been plagued by reliability issues. It's just the latest example of how AI agents are failing to live up to the hype.

Lam Yik—Bloomberg via Getty Images

Hello and welcome to Eye on AI. In today’s edition…Companies experimenting with AI agents say the tech falls short of expectations; Nvidia announces its new chips and positions itself for the post-DeepSeek landscape; Elon Musk and Nvidia join the Microsoft-BlackRock AI fund; AI spammers are “brute forcing” the internet; and Foxconn emerges as a key player in the global AI race.

Hardly a day goes by without a tech company announcing a new AI “agent” it says will revolutionize workflows and unlock unprecedented efficiencies. But while the makers of these agents—companies like Salesforce, Amazon, Oracle, and tons of startups—are hyping them, some of their customers are growing skeptical that these tools can deliver, at least right now.

“Many customers report a gap between marketing and reality,” reads a new report from CB Insights, which analyzes the main pain points surrounding these products.

Throughout March, CB Insights surveyed over 40 customers of AI agents and found that they’re running into issues with reliability, integration, and security. Other recent headlines have highlighted some of the same issues. For instance, there was a surge of excitement over Manus, which was billed as the first fully autonomous “general agent” and lauded by some as another DeepSeek moment for China, until user tests revealed unreliable performance and questionable outputs.

The idea of an AI tool that can autonomously and accurately orchestrate and complete complex tasks makes sense as a goal to strive for, and it’s possible it can be achieved. But the current reality is that customers are navigating uncertain waters, and the hype cycle and muddled use of the term “agent” are causing confusion about what users can actually expect.

(Un)reliability is top-of-mind

DeepMind cofounder and CEO Demis Hassabis recently offered an insightful description of the reliability issues surrounding AI agents, comparing them to compound interest.

“If your AI model has a 1% error rate and you plan over 5,000 steps, that 1% compounds like compound interest,” he said this week at a Google event, according to Computer Weekly. He went on to describe how by the time those 5,000 steps have been worked through the possibility of the answer being correct is “random.”
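To make that arithmetic concrete, here is a minimal sketch under a simplifying assumption that is ours, not Hassabis's: each step succeeds independently with the same probability, and a single mistake spoils the final answer. The function name `chance_all_steps_correct` is purely illustrative.

```python
def chance_all_steps_correct(error_rate: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds.

    Assumes each step fails with the same probability `error_rate`
    and that one failure is enough to ruin the final result.
    """
    return (1.0 - error_rate) ** steps


if __name__ == "__main__":
    # A 1% per-step error rate, as in the quote, over longer and longer plans.
    for steps in (10, 100, 1_000, 5_000):
        p = chance_all_steps_correct(0.01, steps)
        print(f"{steps:>5} steps: chance of a fully correct run = {p:.3g}")
```

Under these assumptions, a 1% per-step error rate leaves roughly a 90% chance of a clean run over 10 steps and roughly 37% over 100 steps, and the chance is effectively zero by 5,000 steps. Real agents can sometimes catch and correct their own mistakes, so this is an illustration of the compounding effect rather than a prediction.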

For companies that need to deliver accurate information and serve their own customers, a random possibility of accuracy is not usually acceptable. CB Insights reported reliability as the top concern among customers using AI agents, with nearly half citing it as an issue. One customer described getting partially processed information and hallucinations from an AI agent it deployed, for example.

Customers are also running into issues with integrating AI agents into their existing systems. A lack of interoperability has long caused headaches in the world of enterprise software, but with AI agents, integration is kind of the whole point. “It was a bit of a gamble that we were signing up for a product where they didn’t have quite all the integrations that we wanted,” one customer told CB Insights.

A new swath of security risks

Security is also high on the list of customer concerns, and for good reason. Having a technology connect to various systems that contain sensitive information and take action autonomously opens up huge risks. Gartner predicts that by 2028, 25% of enterprise breaches will be traced back to AI agent abuse from both external and malicious internal actors.

“Without proper governance, AI agents can and will inadvertently expose sensitive data, make unauthorized decisions, or create compliance blind spots,” Dimitri Sirota, CEO of data intelligence and compliance company BigID, told Eye on AI.

He said the best way companies can experiment with AI agents safely is by avoiding products that aren’t transparent about how the AI agent makes decisions. Companies should also pilot AI agents in controlled environments so they can uncover risks and adjust as necessary before scaling.

What even is an AI “agent”?

The market for AI agents is becoming saturated, especially in specific niches like customer support and coding. At the same time, “no one knows what the hell an AI agent is,” as TechCrunch bluntly put it in a story published last Friday, arguing that the term has become “diluted to the point of meaninglessness.”

Every company is defining “AI agent” a little differently. Some generally use the term to refer to fully autonomous AI systems that can execute tasks independently, while others use it to refer to systems that follow predefined workflows. Some offer yet other definitions. And some—such as OpenAI—seem to frequently change and contradict their own prior definitions. A lot of tools that were previously called “AI assistants” are now also being referred to as “agents.”

For IT leaders, this definitional chaos creates confusion and deployment headaches. Not only is it difficult to understand what the products do and how they work, but it’s also impossible to compare benchmarks and performance metrics.

None of this is to say companies aren’t starting to see some benefits from AI agents. But it is a reminder that these are still very early days for this technology, and the hype is running well ahead of reality.

And with that, here’s more AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

AI IN THE NEWS

Nvidia announces its new chips and positions itself for the post-DeepSeek AI landscape. Fortune’s Sharon Goldman reported from Nvidia’s GTC developer conference this week, where the company unveiled Blackwell Ultra, a family of chips shipping later this year, and Vera Rubin, the next-generation GPUs it plans to ship in 2026. Beyond just its product lineup, the event showed how the company is responding to a shift in the AI landscape where some are questioning just how much compute, and how many of its powerful chips, will really be needed to create and run future AI models. This skepticism of the “go big or go home” ethos that had prevailed in AI until recently was sparked by DeepSeek’s claim that it trained its R1 model for a fraction of the cost and computing power of competing models. On stage, Nvidia CEO Jensen Huang argued that the new, emerging reasoning models, like R1, will actually require more computing power, not less, as these systems use many more tokens per query than previous kinds of LLMs. “The amount of computation we need as a result of agentic AI, as a result of reasoning, is easily 100 times more than we thought we needed this time last year,” he said.

SoftBank acquires Ampere for $6.5 billion. Masayoshi Son’s Japanese conglomerate will buy the U.S. semiconductor company Ampere Computing for $6.5 billion in cash and will operate it as a wholly owned SoftBank subsidiary. Son framed the purchase of the seven-year-old Ampere as a play on the future of “artificial superintelligence,” which he said would require the kind of hardware breakthroughs Ampere has been working on. Ampere was founded by Intel veteran Renée James and was funded by the Carlyle Group and Oracle (which, perhaps not coincidentally, is also a partner with SoftBank in the new Stargate Project that has committed to investing $500 billion in U.S. AI infrastructure in the next four years). Reuters has more about the acquisition here.

Elon Musk’s xAI and Nvidia join the Microsoft-BlackRock AI fund. The fund, also backed by Abu Dhabi, aims to raise $30 billion to build AI infrastructure including data centers and energy plants. It’s just one of several funds aiming to raise gargantuan sums for AI infrastructure, much like the “Stargate” project announced by SoftBank, Oracle, and OpenAI, which starts with $100 billion and targets $500 billion over four years. What is interesting here, however, is seeing Musk team up with Microsoft, the main backer of his rival, OpenAI. You can read more from Semafor.

AI spammers are “brute forcing” the internet. 404 Media dove into the impact AI-generated content is having on the internet and people’s sense of reality as AI spammers create content tailored for social media algorithms and search engines. Human creators have always sought to hijack these algorithms, but the nature of AI means that so-called “AI slop” can be produced at unprecedented speed and scale. “A human running an AI can generate dozens of images, photos, or articles in a matter of seconds. This allows a creator using AI to not necessarily have to worry about the quality of their videos, because these metrics (or any metric on any social media platform) can be brute forced. If a video fails it does not matter, because you can make 10 more of them in a matter of seconds,” says 404 Media, adding that human-created content is getting almost entirely drowned out by AI-generated content because of the sheer amount of it.

FORTUNE ON AI

Exclusive: Pluralis raises $7.6 million from prominent investors to take on OpenAI with big decentralized models —by Ben Weiss

Booking Holdings CEO Glenn Fogel wants to use AI to make it “easier for everybody to experience the world” —by Fortune editors

Why Goldman Sachs’ CIO is taking a measured approach to rolling out AI across the business —by John Kell

AI CALENDAR

April 9-11: Google Cloud Next, Las Vegas

April 24-28: International Conference on Learning Representations (ICLR), Singapore

May 6-7: Fortune Brainstorm AI London. Apply to attend here.

May 20-21: Google IO, Mountain View, Calif.

July 13-19: International Conference on Machine Learning (ICML), Vancouver

July 22-23: Fortune Brainstorm AI Singapore. Apply to attend here.

EYE ON AI NUMBERS

78%

That’s how much Foxconn’s revenue from server assembly grew in Q4 2024, thanks to increased demand for AI. Overall, more than a quarter of the firm’s Q4 revenue came from its cloud and networking division, which includes manufacturing servers for AI leaders such as Nvidia. The company, which is the world’s largest contract electronics manufacturer and is best known for helping to assemble the iPhone, forecasts its server revenue to surpass NT$1 trillion (New Taiwan dollars) this year, making up almost half of its revenue. At the same time, as a Taiwanese company with extensive manufacturing operations in China, Foxconn is increasingly finding itself in the geopolitical crosshairs as AI competition between the U.S. and China heats up and Trump’s tariffs take effect across the globe.

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.