Google DeepMind unveils plan to protect itself from its own rogue AI agents

By Jeremy Kahn, Editor, AI
June 18, 2026, 9:00 AM ET
Gary Lockwood talks to Keir Dullea in a scene from the 1968 Stanley Kubrick film '2001: A Space Odyssey.' The movie features what is perhaps the most famous rogue AI in science fiction, a computer called HAL 9000. Now Google DeepMind is preparing a plan to prevent real-life HALs from wreaking havoc within the company. Metro-Goldwyn-Mayer/Getty Images

Google has developed a new plan to police the increasingly capable AI agents it uses within its own AI research organization, and the company is publishing the so-called roadmap to help other AI labs counter the potential threat of rogue AI agents.

The Google DeepMind security plan pivots away from the AI safety community’s typical focus on “the alignment problem”: the challenge of training an AI system so that its actions reliably match the intentions, values, and ethics of the humans who are managing it.

While continuing to say that alignment is one key safety component, Google’s roadmap acknowledges that the alignment problem may never be fully solved, and instead creates a layered security system that treats AI agents as potential rogue insiders within an organization. The 35-page technical report maps out a series of steps and procedures that are designed to catch potential adversarial behavior by AI agents. 

“If the first line of defense, alignment, fails, how can we mitigate harm anyway?” Rohin Shah, who leads the AGI Safety & Alignment team at Google DeepMind, told Fortune in an interview.

The AI agent framework borrows heavily from traditional cybersecurity, especially insider-threat prevention. “We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting,” Shah said. But, he noted, “AI is systematically different from humans.”

For one thing, AI agents might be able to act far faster and at greater scale than an individual rogue employee ever could. So there need to be systems that control which tools and data an AI agent can access, as well as systems that monitor AI agent behavior and spot potentially aberrant patterns in real time.

There are other differences too. Many access controls and permissions systems for human employees are based on a particular employee’s role within the company: a systems administrator has certain privileges that a receptionist does not. Some companies, and even some software vendors selling AI agent “orchestration layer” software, which often includes access controls for AI agents, are attempting to port this idea over to a world of AI agents, stipulating, for instance, that an AI agent should have the same access and permissions as the human employee on whose behalf it is working.

But that idea can quickly break down, John “Four” Flynn, Google DeepMind’s vice president of security and privacy, told Fortune. For one thing, as workflows become increasingly automated, exactly who in the organization an agent is working for can become blurry. For another, in a fully automated workflow, a single AI agent may perform tasks associated with many different roles, not just one. Finally, as AI systems become smarter, organizations may want the same AI agent to perform tasks across many different workflows and processes. In such an environment, static role-based and even static process-based permissions and access control systems might not work well, Flynn said.

Instead, what’s needed is a far more dynamic access control system, one that might change in real time based on the task an AI agent is trying to perform in that moment and how that task fits into the particular workflow. Flynn said he can imagine scenarios in which the same AI agent might have permission to perform a certain action or access a particular database for one task in a particular workflow, but would, maybe minutes later, not be allowed the same access in a different workflow. Flynn said this requires a sophisticated AI agent monitoring system that knows what the normal behavior of the agent should look like for any given task in any workflow, spots deviations from that pattern in real time, and acts quickly to cut off access if it suspects the AI agent is attempting to do something it shouldn’t be doing.
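To make the idea concrete, here is a minimal Python sketch of task-scoped permissions. It is not DeepMind’s system, and every name in it (TaskContext, DynamicAccessController, the workflow and resource names) is hypothetical. It assumes a simple allowlist keyed by workflow and task, and it treats any out-of-policy request as a deviation that cuts off the agent, a crude stand-in for the far richer behavioral monitoring Flynn describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskContext:
    """Hypothetical description of what an agent is doing right now."""
    workflow: str
    task: str


@dataclass
class DynamicAccessController:
    """Illustrative only: permissions depend on the current task, not a fixed role."""
    # Maps (workflow, task) to the set of resources that task may touch.
    policy: dict[tuple[str, str], set[str]]
    # Agents whose access has been cut off after a suspected deviation.
    revoked_agents: set[str] = field(default_factory=set)

    def is_allowed(self, agent_id: str, ctx: TaskContext, resource: str) -> bool:
        if agent_id in self.revoked_agents:
            return False
        allowed = self.policy.get((ctx.workflow, ctx.task), set())
        return resource in allowed

    def request(self, agent_id: str, ctx: TaskContext, resource: str) -> bool:
        """Grant the request if it fits the task; otherwise revoke the agent.

        Assumption: a single out-of-policy request counts as a deviation.
        A real monitor would weigh much more evidence before cutting access.
        """
        if self.is_allowed(agent_id, ctx, resource):
            return True
        self.revoked_agents.add(agent_id)
        return False


if __name__ == "__main__":
    controller = DynamicAccessController(
        policy={
            ("billing", "generate_invoice"): {"customers_db"},
            ("support", "summarize_ticket"): {"tickets_db"},
        }
    )
    invoice = TaskContext("billing", "generate_invoice")
    ticket = TaskContext("support", "summarize_ticket")
    print(controller.request("agent-7", invoice, "customers_db"))  # allowed for this task
    print(controller.request("agent-7", ticket, "customers_db"))   # same agent, different task: denied
    print(controller.request("agent-7", invoice, "customers_db"))  # denied: access was cut off
```

In this sketch the same agent can read a database in one workflow and be refused it in another, because the decision is keyed to the task rather than to the agent’s identity or a human’s role.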

According to a blog post on its new AI agent security roadmap, DeepMind has already built an internal prototype that asynchronously monitors coding agent trajectories and escalates suspicious findings for human review. This system has analyzed roughly one million coding agent tasks, DeepMind said. That work, the post says, has been “instrumental in building a live monitor for the Gemini Spark agent,” allowing the company to respond in real time to issues such as unintentional data deletion.

Notably, DeepMind says most of what its monitors flag is not malicious. The “majority of flagged events do not stem from adversarial intent,” the blog post says; they more often result from “agent misinterpretation or overeagerness to achieve a user’s goal.”

This kind of real-time behavior monitoring is just one of some 15 different ways to mitigate the risk of rogue AI agents that Google DeepMind proposes in its roadmap. Others range from simple network activity logs, which could be either reviewed in real time or audited later, to using a less sophisticated but trusted AI model to monitor the “reasoning traces” of an AI agent. Reasoning traces are the kind of explicit, textual “chain of thought” that an AI model outputs when it is reasoning about what actions to take. But researchers have found that these reasoning traces do not always accurately reflect what the model is actually “thinking.” So the roadmap’s authors also propose monitoring the activation patterns of the digital neurons inside the large neural networks that power today’s AI models in order to pick up signs that the model is acting deceptively. Google DeepMind compares this to using “fMRI” scans of a person’s brain to attempt to read their emotional state or thoughts in real time.

In the technical paper on the AI agent security roadmap, Google DeepMind develops a framework it calls TRAIT&R, short for Taxonomy of Rogue AI Tactics and Routines. It is closely modeled on ATT&CK, a knowledge base of adversarial behaviors in cybersecurity maintained by MITRE, the U.S. non-profit research and systems engineering consultancy that often does work for federal agencies. The taxonomy maps out three categories of threats that Google DeepMind says it wants to defend against: “loss of control” (such as a model setting up a hidden, unmonitored internal deployment), “work sabotage” (such as agents quietly degrading alignment or safety research), and “direct harm” (such as exfiltrating model weights or destroying critical assets).
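For readers who think in code, the three categories and the examples named above could be represented roughly as follows. This Python sketch is illustrative only and does not reproduce the structure of the TRAIT&R paper: the identifiers (ThreatCategory, EXAMPLE_BEHAVIORS, group_by_category) and the behavior keys are hypothetical, and only the category names and example behaviors come from the article.

```python
from enum import Enum


class ThreatCategory(Enum):
    """The three threat categories named in the roadmap, per the article."""
    LOSS_OF_CONTROL = "loss of control"
    WORK_SABOTAGE = "work sabotage"
    DIRECT_HARM = "direct harm"


# Example behaviors mentioned in the article; the keys are hypothetical labels.
EXAMPLE_BEHAVIORS = {
    "hidden_unmonitored_internal_deployment": ThreatCategory.LOSS_OF_CONTROL,
    "degrading_alignment_or_safety_research": ThreatCategory.WORK_SABOTAGE,
    "exfiltrating_model_weights": ThreatCategory.DIRECT_HARM,
    "destroying_critical_assets": ThreatCategory.DIRECT_HARM,
}


def group_by_category(behaviors):
    """Group behavior labels under their threat category, keeping input order."""
    grouped = {category: [] for category in ThreatCategory}
    for label, category in behaviors.items():
        grouped[category].append(label)
    return grouped


if __name__ == "__main__":
    for category, labels in group_by_category(EXAMPLE_BEHAVIORS).items():
        print(f"{category.value}: {', '.join(labels)}")
```

A shared vocabulary like this is what lets a monitoring system tag a flagged event with a category, much as security teams tag incidents with ATT&CK techniques.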

Flynn, the security VP, said in the interview that the company is already “pretty far underway” implementing its roadmap. “A lot of this is well underway and some of this is in production—a lot on access control, detection, chain-of-thought monitoring. A significant amount of it is in place and underway,” he said.

Shah said the framework is also designed to catch more subtle forms of harm. Work sabotage, he noted, “could be achieved by persuasion—presenting flawed results and hiding the flaws” so users “come to incorrect conclusions”—a category the paper acknowledges is among the hardest to detect.

The roadmap, which DeepMind has labeled “v0.1,” is described as a work in progress that the company hopes to fold into its broader Frontier Safety Framework once it matures.

About the Author

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

© 2026 Fortune Media IP Limited. All Rights Reserved.