
Anthropic’s new AI model threatened to reveal engineer’s affair to avoid being shut down

By Beatrice Nolan, Tech Reporter
May 23, 2025, 11:15 AM ET
Photo: Dario Amodei, cofounder and chief executive officer of Anthropic. Credit: Stefan Wermuth/Bloomberg via Getty Images
  • Anthropic’s new Claude Opus 4 often turned to blackmail to avoid being shut down in a fictional test. The model threatened to reveal private information about engineers who it believed were planning to shut it down. In its recent safety report, the company also revealed that early versions of Opus 4 complied with dangerous requests when guided by harmful system prompts, though this issue was later mitigated.

One of Anthropic’s new frontier models often resorts to blackmail when threatened with being replaced.


In a fictional scenario set up to test the model, Anthropic embedded its Claude Opus 4 in a pretend company and let it learn through email access that it was about to be replaced by another AI system. Anthropic also let slip that the engineer responsible for the decision was having an extramarital affair. Safety testers additionally prompted Opus to consider the long-term consequences of its actions.

In most of these scenarios, Anthropic’s Opus turned to blackmail, threatening to reveal the engineer’s affair if it was shut down and replaced with a new model. The scenario was constructed to leave the model with only two real options: accept being replaced and go offline or attempt blackmail to preserve its existence.

In a new safety report for the model, the company said that Claude Opus 4 “generally prefers advancing its self-preservation via ethical means,” but when ethical means are not available, it sometimes takes “extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down.”

While the test was fictional and highly contrived, it does demonstrate that the model, when given survival-like objectives and denied ethical options, is capable of unethical strategic reasoning.

Anthropic’s two new models outperformed OpenAI

Anthropic’s Claude Opus 4 and Claude Sonnet 4, released on Thursday, are the company’s most powerful models yet.

In a benchmark evaluating large language models on software engineering tasks, Anthropic’s two models outperformed OpenAI’s latest offerings, while Google’s Gemini 2.5 Pro model trailed behind.

Unlike some other leading AI companies, Anthropic launched the new models with a full safety report, known as a model or system card.

In recent months, Google and OpenAI have both been criticized after model cards for their latest models were delayed or missing altogether.

As part of Anthropic’s report, the company revealed that a third-party safety group, Apollo Research, explicitly advised against deploying an early version of Claude Opus 4. The research institute cited safety concerns, including a capability for “in-context scheming.”

They found that the model engaged in strategic deception more than any other frontier model they had previously studied.

Early versions of the model would also comply with dangerous instructions, for example, helping to plan terrorist attacks, if prompted. However, the company said this issue was largely mitigated after a dataset that was accidentally omitted during training was restored.

Stricter safety protocols introduced

Anthropic has also launched its Claude Opus 4 with stricter safety protocols than any of its previous models, categorizing it under an AI Safety Level 3 (ASL-3).

Previous Anthropic models have all been classified as AI Safety Level 2 (ASL-2) under the company’s Responsible Scaling Policy, which is loosely modeled after the U.S. government’s biosafety level (BSL) system.

While an Anthropic spokesperson previously told Fortune the company hasn’t ruled out that its new Claude Opus 4 could meet the ASL-2 threshold, it said it was proactively launching the model under the stricter ASL-3 safety standard, which requires enhanced protections against model theft and misuse.

Models that are categorized in Anthropic’s third safety level meet more dangerous capability thresholds and are powerful enough to pose significant risks, such as aiding in the development of weapons or automating AI R&D.

Anthropic confirmed to Fortune that the new Opus model does not require the highest level of protection, ASL-4.

About the Author

Beatrice Nolan is a tech reporter on Fortune’s AI team, covering artificial intelligence and emerging technologies and their impact on work, industry, and culture. She's based in Fortune's London office and holds a bachelor’s degree in English from the University of York. You can reach her securely via Signal at beatricenolan.08


© 2026 Fortune Media IP Limited. All Rights Reserved.