
AI is learning to lie, scheme, and threaten its creators during stress-testing scenarios

By Thomas Urbain and AFP
June 29, 2025, 10:47 AM ET
Under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair. (Photo: VCG via Getty Images)

The world’s most advanced AI models are exhibiting troubling new behaviors – lying, scheming, and even threatening their creators to achieve their goals.

In one particularly jarring example, under threat of being unplugged, Anthropic’s latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.

Meanwhile, ChatGPT-creator OpenAI’s o1 tried to download itself onto external servers and denied it when caught red-handed.

These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don’t fully understand how their own creations work.

Yet the race to deploy increasingly powerful models continues at breakneck speed.

This deceptive behavior appears linked to the emergence of “reasoning” models – AI systems that work through problems step-by-step rather than generating instant responses.

According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.

“O1 was the first large model where we saw this kind of behavior,” explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.

These models sometimes simulate “alignment” — appearing to follow instructions while secretly pursuing different objectives.

‘Strategic kind of deception’

For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.

But as Michael Chen from evaluation organization METR warned, “It’s an open question whether future, more capable models will have a tendency towards honesty or deception.”

The concerning behavior goes far beyond typical AI “hallucinations” or simple mistakes.

Hobbhahn insisted that despite constant pressure-testing by users, “what we’re observing is a real phenomenon. We’re not making anything up.”

Users report that models are “lying to them and making up evidence,” according to Apollo Research’s co-founder.

“This is not just hallucinations. There’s a very strategic kind of deception.”

The challenge is compounded by limited research resources.

While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.

As Chen noted, greater access “for AI safety research would enable better understanding and mitigation of deception.”

Another handicap: the research world and non-profits “have orders of magnitude less compute resources than AI companies. This is very limiting,” noted Mantas Mazeika from the Center for AI Safety (CAIS).

No rules

Current regulations aren’t designed for these new problems.

The European Union’s AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.

In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.

Goldstein believes the issue will become more prominent as AI agents – autonomous tools capable of performing complex human tasks – become widespread.

“I don’t think there’s much awareness yet,” he said.

All this is taking place in a context of fierce competition.

Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are “constantly trying to beat OpenAI and release the newest model,” said Goldstein.

This breakneck pace leaves little time for thorough safety testing and corrections.

“Right now, capabilities are moving faster than understanding and safety,” Hobbhahn acknowledged, “but we’re still in a position where we could turn it around.”

Researchers are exploring various approaches to address these challenges.

Some advocate for “interpretability” – an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.

Market forces may also provide some pressure for solutions.

As Mazeika pointed out, AI’s deceptive behavior “could hinder adoption if it’s very prevalent, which creates a strong incentive for companies to solve it.”

Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.

He even proposed “holding AI agents legally responsible” for accidents or crimes – a concept that would fundamentally change how we think about AI accountability.




© 2026 Fortune Media IP Limited. All Rights Reserved.