OpenAI has said that some attack methods against AI browsers like ChatGPT Atlas are likely here to stay, raising questions about whether AI agents can ever safely operate across the open web.
The main issue is a type of attack called “prompt injection,” where hackers hide malicious instructions in websites, documents, or emails that can trick the AI agent into doing something harmful. For example, an attacker could embed hidden commands in a webpage—perhaps in text that is invisible to the human eye but looks legitimate to an AI—that override a user’s instructions and tell an agent to share a user’s emails, or drain someone’s bank account.
Following the launch of OpenAI’s ChatGPT Atlas browser in October, several security researchers demonstrated how a few words hidden in a Google Doc or clipboard link could manipulate the AI agent’s behavior. Brave, an open-source browser company that previously disclosed a flaw in Perplexity’s Comet browser, also published research warning that all AI-powered browsers are vulnerable to attacks like indirect prompt injection.
“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,'” OpenAI wrote in a blog post Monday, adding that “agent mode” in ChatGPT Atlas “expands the security threat surface.”
OpenAI said that the aim was for users to “be able to trust a ChatGPT agent,” with Chief Information Security Officer Dane Stuckey adding that the way the company hopes to get there is by “investing heavily in automated red teaming, reinforcement learning, and rapid response loops to stay ahead of our adversaries.”
“We’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time,” the company said.
Fighting AI with AI
OpenAI’s approach to the problem is to use an AI-powered attacker of its own—essentially a bot trained through reinforcement learning to act like a hacker seeking ways to sneak malicious instructions to AI agents. The bot can test attacks in simulation, observe how the target AI would respond, then refine its approach and try again repeatedly.
“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”
However, some cybersecurity experts are skeptical that OpenAI’s approach can address the fundamental problem.
“What concerns me is that we’re trying to retrofit one of the most security-sensitive pieces of consumer software with a technology that’s still probabilistic, opaque, and easy to steer in subtle ways,” Charlie Eriksen, a security researcher at Aikido Security, told Fortune.
“Red-teaming and AI-based vulnerability hunting can catch obvious failures, but they don’t change the underlying dynamic. Until we have much clearer boundaries around what these systems are allowed to do and whose instructions they should listen to, it’s reasonable to be skeptical that the tradeoff makes sense for everyday users right now,” he said. “I think prompt injection will remain a long-term problem … You could even argue that this is a feature, not a bug.”
A cat-and-mouse game
Security researchers also previously told Fortune that while a lot of cybersecurity risks were essentially a continuous cat-and-mouse game, the deep access that AI agents need—such as users’ passwords and permission to take actions on a user’s behalf—posed such a vulnerable threat opportunity it was unclear if their advantages were worth the risk.
George Chalhoub, assistant professor at UCL Interaction Centre, said that the risk is severe because prompt injection “collapses the boundary between the data and the instructions,” potentially turning an AI agent “from a helpful tool to a potential attack vector against the user” that could extract emails, steal personal data, or access passwords.
“That’s what makes AI browsers fundamentally risky,” Eriksen said. “We’re delegating authority to a system that wasn’t designed with strong isolation or a clear permission model. Traditional browsers treat the web as untrusted by default. Agentic browsers blur that line by allowing content to shape behavior, not just be displayed.”
OpenAI recommends users give agents specific instructions rather than providing broad access with vague directions like “take whatever action is needed.” The browser also has extra security features such as “logged out mode”— which allow a users to use it without sharing passwords— and “Watch mode”—which is a security feature that requires a user to explicitly confirm sensitive actions such as sending messages or making payments.
“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI said in the blogpost.











