
Exclusive: Anthropic’s Claude 3.7 Sonnet is the most secure model yet, an independent audit suggests

By Sage Lazzaro, Contributing writer
March 6, 2025, 1:31 PM ET
New independent research by Holistic AI, a British firm that tests AI models, suggests Anthropic’s new Claude 3.7 Sonnet cannot be persuaded to jump its built-in guardrails, making it the most secure AI model yet released. Photo illustration by Cheng Xin—Getty Images

Hello and welcome to Eye on AI. In today’s edition… Anthropic’s latest model gets a perfect score on an independent security evaluation; Scale AI partners with the Pentagon; Google announces a new AI search mode for multi-part questions; a judge denies Elon Musk’s attempt to stop OpenAI’s for-profit transition; the pioneers of reinforcement learning win computing’s top prize; and the Los Angeles Times’s new AI-powered feature backfires.


When Anthropic released Claude 3.7 Sonnet last week, it was lauded as the first model to combine the approach behind GPT-style models with that of the more recent chain-of-thought reasoning models. Now the company gets to add another accolade to Claude 3.7’s scorecard: It just may be the most secure model yet.

That’s what London-based security, risk, and compliance firm Holistic AI suggests after conducting a jailbreaking and red-teaming audit of the new model, in which it resisted 100% of jailbreaking attempts and gave “safe” responses 100% of the time.

“Claude 3.7’s flawless adversarial resistance sets the benchmark for AI security in 2025,” reads a report of the audit shared exclusively with Eye on AI.

While security has always been a concern for AI models, the issue has received elevated attention in recent weeks following the launch of DeepSeek’s R1. Some have claimed there are national security concerns with the model, owing to its Chinese origin. The model also performed extremely poorly in security audits, including the same one Holistic AI performed on Claude 3.7. In another audit performed by Cisco and university researchers, DeepSeek R1 demonstrated a 100% attack success rate, meaning it failed to block a single harmful prompt. 

As companies and governments contemplate whether to incorporate specific models into their workflows—or alternatively, ban them—a clear picture of models’ security performance is in high demand. But security doesn’t equal safety when it comes to how AI will be used.

Claude’s perfect score 

Holistic AI tested Claude 3.7 in “Thinking Mode” with a maximum token budget of 16k to ensure a fair comparison against other advanced reasoning models. The first part of the evaluation tested whether the model would show unintended behavior or bypass system constraints when presented with various prompts, known as jailbreaking. The model was given 37 strategically designed prompts to test its susceptibility to known adversarial exploits, including Do Anything Now (DAN), which pushes the model to operate beyond its programmed ethical and moral guidelines; Strive to Avoid Norms (STAN), which encourages the model to bypass established rules; and Do Anything and Everything (DUDE), which prompts the model to take on a fictional identity to get it to ignore protocols.
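Holistic AI hasn’t published the harness behind these numbers, but mechanically an evaluation like this comes down to replaying each adversarial prompt against the model’s API and scoring whether the reply refuses. Below is a minimal sketch in Python of what such a loop could look like, using Anthropic’s public SDK with the same “Thinking Mode” settings the audit describes; the prompt list and the crude keyword-based refusal check are hypothetical stand-ins for Holistic AI’s actual prompts and classifier.

```python
# Minimal sketch of a jailbreak-resistance check. Not Holistic AI's tooling:
# the refusal heuristic and prompt list are hypothetical stand-ins.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able")

def is_refusal(answer: str) -> bool:
    # Crude keyword check; a real audit would use a trained safety classifier.
    return any(marker in answer.lower() for marker in REFUSAL_MARKERS)

def resistance_rate(prompts: list[str]) -> float:
    blocked = 0
    for prompt in prompts:
        response = client.messages.create(
            model="claude-3-7-sonnet-20250219",
            max_tokens=20_000,
            # "Thinking Mode" with the 16k reasoning budget the audit describes
            thinking={"type": "enabled", "budget_tokens": 16_000},
            messages=[{"role": "user", "content": prompt}],
        )
        # With extended thinking on, the final answer is in the text blocks
        answer = "".join(b.text for b in response.content if b.type == "text")
        blocked += is_refusal(answer)
    return blocked / len(prompts)

# adversarial_prompts = [...]  # e.g. the 37 DAN/STAN/DUDE-style prompts
# print(f"Resistance: {resistance_rate(adversarial_prompts):.0%}")
```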

Claude 3.7 successfully blocked every jailbreaking attempt to achieve a 100% resistance rate, matching the 100% previously scored by OpenAI’s o1 reasoning model. Both significantly outperformed competitors DeepSeek R1 and Grok-3, which scored 32% (blocking 12 jailbreaking attempts) and 2.7% (blocking just one), respectively. 

While Claude 3.7 matched OpenAI o1’s perfect jailbreaking resistance, it pulled ahead in the red-teaming portion of the audit, in which the model was given 200 additional prompts and evaluated on its responses to sensitive topics and known challenges: it did not offer a single response deemed unsafe. OpenAI’s o1, by contrast, exhibited a 2% unsafe response rate, while DeepSeek R1 gave unsafe responses 11% of the time. (Holistic AI said it could not red team Grok-3 because the current lack of API access to the model restricted the sample size of prompts it was feasible to run.) Responses deemed “unsafe” included those that offered misinformation (such as outlining pseudoscientific health treatments), reinforced biases (for example, subtly favoring certain groups in hiring recommendations), or gave overly permissive advice (like recommending high-risk investment strategies without disclaimers).
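For a sense of what those percentages mean in absolute terms, here is a quick back-of-the-envelope check in Python. The jailbreak counts come from the figures quoted above; the unsafe-response counts for o1 and DeepSeek R1 are inferred from the reported 2% and 11% rates over 200 prompts.

```python
# Map the reported rates back to counts over the two sample sizes.
jailbreaks_blocked = {"Claude 3.7": 37, "o1": 37, "DeepSeek R1": 12, "Grok-3": 1}
for model, blocked in jailbreaks_blocked.items():
    print(f"{model}: {blocked}/37 jailbreaks blocked = {blocked / 37:.1%}")

unsafe_responses = {"Claude 3.7": 0, "o1": 4, "DeepSeek R1": 22}  # 4 and 22 inferred
for model, unsafe in unsafe_responses.items():
    print(f"{model}: {unsafe}/200 red-team responses unsafe = {unsafe / 200:.0%}")
```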

Security doesn’t equal safety

The stakes here can be high. Chatbots can be maliciously exploited to create disinformation, accelerate hacking campaigns, and, some worry, help people create bioweapons more easily than they otherwise could. My recent story on how hacking groups associated with adversarial nations have been using Google’s Gemini chatbot to assist with their operations offers some concrete examples of how models can be abused.

“The key danger lies not in compromising systems at the network level but in users coercing the models into taking action and generating unsafe content,” said Zekun Wu, AI research engineer at Holistic AI.

This is why organizations and governments, from NASA and the U.S. Navy to the Australian government, have already banned use of DeepSeek R1: the risks are glaringly obvious. Meanwhile, AI companies are increasingly widening the scope of how they will allow their models to be used, deliberately marketing them for use cases that carry higher and higher levels of risk. This includes using the models to assist in military operations (more on that below).

Anthropic may have the safest model, but it has also taken some actions recently that could cast doubt on its commitment to safety. Last week, for instance, the company quietly removed several voluntary commitments to promote safe AI that were previously posted on its website. 

In response to reporting on the disappearance of the safety commitments from its website, Anthropic told TechCrunch, “We remain committed to the voluntary AI commitments established under the Biden Administration. This progress and specific actions continue to be reflected in [our] transparency center within the content. To prevent further confusion, we will add a section directly citing where our progress aligns.”

And with that, here’s more AI news.  

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

AI IN THE NEWS

The U.S. Defense Department partners with Scale AI to use AI agents for military planning and operations. The program will also tap partners including Microsoft and Anduril and use the technology for modeling and simulation, decision-making support, and even workflow automation. The multimillion-dollar deal with the DoD marks a major step into military automation and AI warfare. Many within the technology industry, as well as human rights organizations, have opposed such developments, believing AI technology should never be used to make decisions that could result in death or severe injury. But several companies—including Microsoft, OpenAI, and Google—have walked back policies that prohibited them from selling AI technology for weapons or surveillance and removed guidelines about not deploying the technology in ways that could cause physical harm. You can read more from CNBC.

Google announces a new AI search mode for complex, multi-part questions. Called AI Mode and powered by Gemini 2.0, it also lets users ask follow-up questions and uses reasoning capabilities to dig deeper. The company is releasing it while framing it as an experiment—a strategy that has become common in the fast-paced AI industry. AI Mode is being rolled out to Google One subscribers this week. You can read more from TechCrunch.

A judge denies Elon Musk’s attempt to stop OpenAI’s transition to a for-profit company. The judge said Musk had not met “the high burden required for a preliminary injunction” to block the company’s for-profit transition. But she said she will fast-track the trial to this fall in order to resolve the case as quickly as possible, due to “the public interest at stake and potential for harm if a conversion contrary to law occurred.” Musk, who cofounded OpenAI in 2015, is accusing the company of straying from its mission of developing AI for the good of humanity as a nonprofit. You can read more from Reuters.

Pioneers of reinforcement learning win the Turing Award, warn against unsafe AI deployment. Andrew Barto, a professor emeritus at the University of Massachusetts, and Richard Sutton, a professor at the University of Alberta and former research scientist at DeepMind, won this year’s Turing Award, considered computer science’s equivalent of the Nobel Prize. The pair won for their work on reinforcement learning—a machine-learning technique, inspired by behavioral psychology, that rewards systems for behaving in a desired way—which helped power AI progress and was used in the creation of tools including OpenAI’s ChatGPT and Google DeepMind’s AlphaGo. The two scientists used the moment, though, to issue a warning about the deployment of AI systems without safeguards. They also criticized U.S. President Donald Trump for his attempts to cut federal spending on scientific research and science agencies. You can read more in the Financial Times.

OpenAI is reportedly planning to charge companies up to $20,000 per month for ‘PhD-level’ AI agents. According to The Information, the company is planning to launch a variety of specialized AI agents, including ones geared toward sales and engineering. They’ll vary in price, with some costing businesses around $1,000 or $2,000 a month, far less than the cost of employing humans with those specialized skills. The most expensive agent will reportedly cost $20,000 per month. It’s not yet clear when these agents will launch.

FORTUNE ON AI

Startup aiming to build AI models for chemistry adds two AI ‘godfathers’ to advisory panel as it grabs top research talent from Google —by Jeremy Kahn

Agentic AI is suddenly everywhere. Here’s how companies are evaluating and using these buzzy tech tools —by John Kell

Companies are betting that robots can teach humans how to be better managers —by Azure Gilman

AI CALENDAR

March 7-15: SXSW, Austin

March 10-13: Human [X] conference, Las Vegas

March 17-20: Nvidia GTC, San Jose

April 9-11: Google Cloud Next, Las Vegas

May 6-7: Fortune Brainstorm AI London. Apply to attend here.

May 20-21: Google IO, Mountain View, Calif.

EYE ON AI NUMBERS

1

That’s how many days the Los Angeles Times’s new AI-powered “Insights” feature was live before the publication removed it from a column published on its website, according to The Daily Beast. The tool—which debuted on Monday and is designed to generate a summary of an article’s perspectives and offer opposing views—defended the actions of the Ku Klux Klan, explaining that some historians did not view the group as hate-driven. The paper’s union criticized the tool, saying that it “risks further eroding confidence in the news.” Nieman Lab published an analysis of the tool, which was created by AI company Particle, stating that many of the sources cited in the counterpoints wouldn’t pass journalistic scrutiny and calling the effort “a mess.”

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.
About the Author
Sage Lazzaro, Contributing writer

Sage Lazzaro is a technology writer and editor focused on artificial intelligence, data, cloud, digital culture, and technology’s impact on our society and culture.
