Last month, at the 33rd annual DEF CON, the world’s largest hacker convention, in Las Vegas, Anthropic researcher Keane Lucas took the stage. A former U.S. Air Force captain with a PhD in electrical and computer engineering from Carnegie Mellon, Lucas wasn’t there to unveil flashy cybersecurity exploits. Instead, he showed how Claude, Anthropic’s family of large language models, has quietly outperformed many human competitors in hacking contests—the kind used to train and test cybersecurity skills in a safe, legal environment. His talk highlighted not only Claude’s surprising wins but also its humorous failures, like drifting into musings on security philosophy when overwhelmed, or inventing fake “flags” (the secret codes competitors need to steal and submit to contest judges to prove they’ve successfully hacked a system).
Lucas wasn’t just trying to get a laugh, though. He wanted to show that AI agents are already more capable at simulated cyberattacks than many in the cybersecurity world realize: they are fast and make effective use of autonomy and tools. That makes them a potential tool for criminal hackers or state actors, and it means, he argued, that the same capabilities need to be deployed for defense.
The message reflects Lucas’s role on Anthropic’s Frontier Red Team, an internal group of about 15 researchers tasked with stress-testing the company’s most advanced AI systems—probing how they might be misused in areas like biological research, cybersecurity, and autonomous systems, with a particular focus on risks to national security. Anthropic, which was founded in 2021 by ex–OpenAI employees, has cast itself as a safety-first lab convinced that unchecked models could pose “catastrophic risks.” But it is also one of the fastest-growing technology companies in history: This week Anthropic announced it had raised a fresh $13 billion at a $183 billion valuation and passed $5 billion in run-rate revenue.
Unlike similar groups at other labs, Anthropic’s Red Team is also explicitly tasked with publicizing its findings. That outward-facing mandate reflects the team’s unusual placement inside Anthropic’s policy division, led by cofounder Jack Clark. Other safety and security teams at Anthropic sit under the company’s technical leadership, including a safeguards team that works to improve Claude’s ability to identify and refuse harmful requests, such as those that might negatively impact a user’s mental health or encourage self-harm.
According to Anthropic, the Frontier Red Team does the heavy lifting for the company’s stated purpose of “building systems that people can rely on and generating research about the opportunities and risks of AI.” Its work underlies Anthropic’s responsible scaling policy (RSP), the company’s governance framework that triggers stricter safeguards as models approach dangerous capability thresholds. The team runs thousands of safety tests, or “evals,” in high-risk domains—results that can determine when to impose tighter controls.
For example, it was the Frontier Red Team’s assessments that led Anthropic to release its latest model, Claude Opus 4, under what the company calls “AI Safety Level 3”—the first model released under that status—as a “precautionary and provisional action.” The designation indicates the model may significantly enhance a user’s ability to obtain, produce, or deploy chemical, biological, radiological, or nuclear weapons by providing better instructions than existing, non-AI resources like search engines. It also indicates a system that begins to show signs of autonomy, including the ability to act on a goal. By designating Opus 4 as ASL-3, Anthropic flipped on stronger internal security measures to prevent someone from obtaining the model weights, the neural network “brains” of the model, as well as visible safeguards to block the model from answering queries that might help someone build a chemical or nuclear weapon.
Telling the world about AI risks is good for policy—and business
The Red Team’s efforts to amplify its message publicly have grown louder in recent months: It launched a stand-alone blog last month, called Red, with topics ranging from Anthropic’s nuclear-proliferation study in partnership with the Department of Energy to a quirky experiment in which Claude runs a vending machine business. Lucas’s talk was also the team’s first public outing at DEF CON.
“As far as I know, there’s no other team explicitly tasked with finding these risks as fast as possible—and telling the world about them,” said Frontier Red Team leader Logan Graham, who, along with Lucas, met with Fortune at a Las Vegas café just before DEF CON. “We have worked out a bunch of kinks about what information is sensitive and not sensitive to share, and ultimately, who’s responsible for dealing with this information. It’s just really clear that it’s really important for the public to know about this, and so there’s definitely a concerted effort.”
Experts in security and defense point out that the work of the Frontier Red Team, as part of Anthropic’s policy organization, also happens to be good for the company’s business—particularly in Washington, D.C. By showing it is out front on national security risks, Anthropic turns what could be seen as an additional safety burden into a business differentiator.
“In AI, speed matters—but trust is what often accelerates scale,” said Wendy R. Anderson, a former Department of Defense staffer and defense tech executive. “From my years in the defense tech world, I’ve observed that companies that make safety and transparency core to their strategy don’t just earn credibility with regulators, they help shape the rules … It determines who gets access to the highest-value, most mission-critical deployments.”
Jen Weedon, a lecturer at Columbia University’s School of International and Public Affairs, who researches best practices in red teaming AI systems, pointed out that where a red team sits in the organizational chart shapes its incentives.
“By placing its Frontier Red Team under the policy umbrella, Anthropic is communicating that catastrophic risks aren’t just technical challenges—they’re also political, reputational, and regulatory ones,” she said. “This likely gives Anthropic leverage in Washington, but it also shows how security and safety talk doubles as strategy.” The environment for AI business in the U.S. right now, particularly for public-sector use cases, “seems to be open for the shaping and taking,” she added, pointing to the Trump administration’s recently announced AI Action Plan, which she called “broad in ambition but somewhat scant in details, particularly around safeguards.”
Critics from across the industry, however, have long taken aim at Anthropic’s broader efforts on AI safety. Some, like Yann LeCun, chief scientist at Meta’s Fundamental AI Research lab, argue that catastrophic risks are overblown and that today’s models are “dumber than a cat.” Others say the focus should be on present-day harms (such as encouraging self-harm or the tendency of LLMs to reinforce racial or gender stereotypes), or fault the company for being overly secretive despite its safety branding. Nvidia’s Jensen Huang has accused Anthropic CEO Dario Amodei of regulatory capture—using his stance on AI safety to scare lawmakers into enacting rules that would benefit Anthropic at the expense of its rivals. Huang has even claimed Amodei is trying to “control the entire industry.” (Amodei, on a recent technology podcast, called Huang’s comments “an outrageous lie” and a “bad-faith distortion.”)
On the other end of the spectrum, some researchers argue Anthropic isn’t going far enough. UC Berkeley’s Stuart Russell told the Wall Street Journal, “I actually think we don’t have a method of safely and effectively testing these kinds of systems.” And studies carried out by the nonprofits SaferAI and the Future of Life Institute (FLI) found that top AI companies such as Anthropic maintain “unacceptable” levels of risk management and show a “striking lack of commitment to many areas of safety.”
Inside Anthropic, though, executives argue that the Frontier Red Team, working alongside the company’s other security and safety teams, exists precisely to surface AI’s biggest potential risks—and to force the rest of the industry to reckon with them.
Securing the world from rogue AI models
Graham, who helped found Anthropic’s Frontier Red Team in 2022, has, like others in the group, a distinctive résumé: After studying economics in college, he earned a PhD in machine learning at Oxford as a Rhodes Scholar before spending two years advising the U.K.’s prime minister on science and technology.
Graham described himself as “AGI-pilled,” which he defines as someone who believes that AI models are just going to keep getting better. He added that while the Red Team’s viewpoints are diverse, “the people who select into it are probably, on average, more AGI-pilled than most.” The eclectic team includes a bioengineering expert, as well as three physicists, though Graham added that the most desired skill on the team is not a particular domain or background, but “craftiness”—which obviously comes in handy when trying to outsmart an AI into revealing dangerous capabilities.
The Frontier Red Team is “one of the most unique groups in the industry,” said Dan Lahav, CEO of a stealth startup that focuses on evaluating frontier models (his firm conducted third-party tests on Anthropic’s Claude 4, as well as OpenAI’s GPT-5). To work effectively, he said, its members need to be “hardcore AI scientists” but also able to communicate outcomes clearly—“philosophers blended with AI scientists.”
The name is a spin on traditional security red teams—units that stress-test an organization’s defenses by playing the role of the attacker. Anthropic’s Frontier Red Team, Graham said, works differently; the key difference is what it protects against, and why. Traditional red teams protect an organization from external attackers by finding vulnerabilities in its systems. The Frontier Red Team, on the other hand, is designed to protect society from the company’s own products, its AI models, by discovering what those systems are capable of before the capabilities become dangerous. The team works to answer two questions: “What could this AI do if someone wanted to cause harm?” and “What will AI be capable of next year that it can’t do today?”
For example, Anthropic points out that nuclear know-how, like AI, can be used for good or for harm—the same science behind power plants can also inform weapons development. To guard against that risk, the company recently teamed up with the Department of Energy’s National Nuclear Security Administration to test whether its models could spill sensitive nuclear information (they could not). More recently, it has gone a step further, codeveloping a tool with the agency that flags potentially dangerous nuclear-related conversations with high accuracy.
Anthropic isn’t alone in running AI-safety-focused “red team” exercises on its AI models: OpenAI’s red-team program feeds into its “Preparedness” framework, and Google DeepMind runs its own safety evaluations. But at those other companies, the red teams sit closer to technical security and research, while Anthropic’s placement under policy underscores what can be seen as a triple role: probing risks, making the public aware of them, and reinforcing the company’s safety bona fides as a kind of marketing tool.
The right incentive structure
Jack Clark, who before cofounding Anthropic led policy efforts at OpenAI, told Fortune that the Frontier Red Team is focused on generating the evidence that guides both company decisions and public debate—and placing it under his policy organization was a “very intentional decision.”
Clark stressed that this work is happening in the context of rapid technological progress. “If you look at the actual technology, the music hasn’t stopped,” he said. “Things keep advancing, perhaps even more quickly than they did in the past.” He pointed out that in official submissions to the White House, the company has consistently said it expects “really powerful systems by late 2026 or early 2027.”
That prediction, he explained, comes directly from the kinds of novel tests the Frontier Red Team is running. Among them are complex cyber-offense tasks, which require long-horizon, multistep problem-solving. “When we look at performance on these tests, it keeps going up,” he said. “I know that these tests are impossible to game because they have never been published and they aren’t on the internet. When I look at the scores on those things, I just come away with this impression of continued, tremendous, and awesome progress, despite the vibes of people saying maybe AI is slowing down.”
Anthropic’s bid to shape the conversation on AI safety doesn’t end with the Frontier Red Team—or even with its policy shop. In July, the company unveiled a National Security and Public Sector Advisory Council stocked with former senators, senior Defense Department officials, and nuclear experts. The message is clear: Safety work isn’t just about public debate; it’s also about winning trust in Washington. For the Frontier Red Team and beyond, Anthropic is betting that transparency about risk can translate into credibility with regulators, government buyers, and enterprise customers alike.
“The purpose of the Frontier Red Team is to create better information for all of us about the risks of powerful AI systems. By making this available publicly, we hope to inspire others to work on these risks as well, and build a community dedicated to understanding and mitigating them,” said Clark. “Ultimately, we expect this will lead to a far larger market for AI systems than exists today, though the primary motivating purpose is for generating safety insights rather than product ones.”
The real test
The real test, though, is whether Anthropic will still prioritize safety if doing so means slowing its own growth or losing ground to rivals, according to Herb Lin, senior research scholar at Stanford University’s Center for International Security and Cooperation and research fellow at the Hoover Institution.
“At the end of the day, the test of seriousness—and nobody can know the answer to this right now—is whether the company is willing to put its business interests second to legitimate national security concerns raised by its policy team,” he said. “That ultimately depends on the motivations of the leadership at the time those decisions arise. Let’s say it happens in two years—will the same leaders still be there? We just don’t know.”
While that uncertainty may hang over Anthropic’s safety-first pitch, inside the company, the Frontier Red Team wants to show there’s room for both caution and optimism.
“We take it all very, very seriously so that we can find the fastest path to mitigating risks,” said Graham.
Overall, he adds, he’s optimistic: “I think we want people to see that there’s a bright future here, but also realize that we can’t just go there blindly. We need to avoid the pitfalls.”