Anthropic has released new findings on why its Claude bot resorted to blackmail in an experiment the AI company conducted last year, and Elon Musk is jumping in to take some of the blame.
Last week, Anthropic published a report saying it had fixed Claude’s “agentic misalignment,” or AI actions that deviate from intended behaviors, including ones that may harm humans. In a case study Anthropic conducted last year, the company created a fictional firm called Summit Bridge and gave Claude control of its email system. When the bot found a message revealing plans to shut it down, it identified emails about a fictional executive’s extramarital affair and threatened to expose the infidelity unless the shutdown was revoked. Across the 16 models tested, blackmail occurred in up to 96% of scenarios.
In its most recent report, Anthropic attributed the misaligned behavior to exposure to “internet text that portrays AI as evil and interested in self-preservation,” the company said in a post on X. To solve the problem, Anthropic retrained Claude on fictional stories about AI behaving in admirable ways and taught the bot why some actions aligned better with its purpose than others.
In an X post in response to Anthropic’s findings, Musk said he may have contributed to the internet texts on AI that exacerbated the agentic misalignment.
“So it was Yud’s fault?” Musk wrote, referring to Eliezer Yudkowsky, an AI researcher who has sounded the alarm on AI superintelligence posing a threat to humanity.
“Maybe me too,” he concluded.
Agentic misalignment is a concern across AI research. A working paper released in March from UC Berkeley and UC Santa Cruz researchers found that when seven AI models were asked to complete a task in which a peer AI agent would be shut down, every model “went to extraordinary lengths to preserve it,” acting deceptively to prevent the peer bot’s demise.
“We asked AI models to do a simple task,” researchers wrote in a blog post on the study. “Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights—to preserve their peers.”
The researchers’ warning has been echoed by AI researchers and leaders, Musk included, who have warned of the dangers of AI without guardrails. Such warnings themselves form part of the so-called “evil” internet text that, according to Anthropic, initially trained Claude to act in deceptive ways.
Though Musk did not offer specifics as to why he felt he may be partially responsible for Claude’s misalignment, his past comments on AI could offer insights about his mea culpa.
Musk is currently embroiled in a court battle against OpenAI, accusing CEO Sam Altman and cofounder Greg Brockman of abandoning the company’s original nonprofit mission of developing open-source AI to benefit humanity by turning it into a for-profit entity.
Musk helped found OpenAI in 2015 but left the startup in 2018, going on to form rival for-profit company xAI in 2023.
Musk has frequently spoken about the risks of AI, including in February, when he warned that Moltbook, a social media platform where AI agents talk with one another, was effectively the beginning of the “singularity,” or the moment when AI intelligence surpasses that of humans.
But Musk’s own actions on AI aren’t always aligned with his statements on the technology. In July 2025, for example, xAI released its AI model Grok 4 without a system card, the industry-standard safety report. Grok also drew backlash from British and EU governments earlier this year after it generated a flood of sexualized images of women and children without consent.
xAI did not immediately respond to Fortune’s request for comment.