AI security risks are in the spotlight—but hackers say models are still alarmingly easy to attack

Hello and welcome to Eye on AI! In today’s edition…Elon Musk’s xAI releases Grok 3 AI chatbot; OpenAI CEO teases future open-source AI project; South Korea suspends DeepSeek AI chatbot; andPerplexity offers its own Deep Research tool similar to OpenAI’s.

One of the biggest AI vibe shifts of 2025 so far is the sudden, massive pivot from AI “safety” to AI “security.”

Since the release of ChatGPT in November 2022, AI safety advocates, who typically focus on broad, long-term and often theoretical risks, have held the spotlight. There have been daily headlines about concerns that humans could lose control of AI systems that seek to harm humanity, or that rogue nations could use AI to develop genetically modified pandemics that then cause human extinction. There was the May 2023 open letter that called on all AI labs to “immediately pause for at least 6 months the training of AI systems more powerful than GPT-4”—signed by 30,000, including Elon Musk. The Biden Administration spun out the AI Safety Institute as part of the small NIST agency (the National Institute of Standards and Technology), while the U.K. launched its own AI Safety Institute and held the first of three high-profile AI Safety Summits.

Oh, how times have changed: The head of the U.S. AI Safety Institute, Elizabeth Kelly, has departed, a move seen by many as a sign that the Trump administration was shifting course on AI policy. The third AI Safety Summit held in Paris earlier this month was renamed the AI Action Summit. There, the French government announced a national institute to “assess and secure AI,” while U.S. Vice President JD Vance focused squarely on AI and national security, saying “we will safeguard American AI and chip technologies from theft and misuse.”

AI security risks are significant

Focusing on keeping AI models secure from those seeking to break in may seem more immediate and actionable than tackling the potential for all-powerful AI that could conceivably go off the rails. However, the world’s best ethical hackers, or those who test systems in order to find and fix weaknesses before malicious hackers can exploit them, say AI security—like traditional cybersecurity—is far from easy.

AI security risks are no joke: A user could trick an LLM into generating detailed instructions for conducting cyberattacks or harmful activities. An AI model could be manipulated to reveal sensitive or private data in its training set. Meanwhile, self-driving cars could be subtly modified; deepfake videos could spread misinformation; and chatbots could impersonate real people as part of scams.

More than two years since OpenAI’s ChatGPT burst onto the scene, hackers from the Def Con security conference, the largest annual gathering for ethical hackers, have warned that it is still far too easy to break into AI systems and tools. In a recent report called the Hackers’ Almanack published in partnership with the University of Chicago, they said that AI vulnerabilities would continue to pose serious risks without a fundamental overhaul of current security practices.

Hackers say ‘red-teaming’ is ‘BS’

At the moment, most companies focus on “red teaming” their AI models. Red teaming means stress-testing an AI model by simulating attacks, probing for vulnerabilities, and identifying weaknesses. The goal is to uncover security issues like the potential for jailbreaks, misinformation and hallucinations, privacy leaks, and “prompt injection”—that is, when malicious users trick the model into disobeying its own rules.

But in the Hackers’ Almanack, Sven Cattell, founder of Def Con’s AI Village and AI security startup nbdh.ai, said red teaming is “B.S.” The problem, he wrote, is that the processes created to monitor the flaws and vulnerabilities of AI models are themselves flawed. With a technology as powerful as LLMs there will always be “unknown unknowns” that stress-testing and evaluations miss, Cattell said.

Even the largest companies can’t imagine and protect against every possible use and restriction that could ever be projected onto generative AI, he explained. “For a small team at Microsoft, Stanford, NIST or the EU, there will always be a use or edge case that they didn’t think of,” he wrote.

AI security requires cooperation and collaboration

The only way for AI security to succeed is for security organizations to cooperate and collaborate, he emphasized, including creating versions of time-tested cybersecurity programs that let companies and developers disclose, share, and fix AI “bugs,” or vulnerabilities. As Fortune reported after the Def Con conference last August, there is currently no way to report vulnerabilities related to the unexpected behavior of an AI model, and no public database of LLM vulnerabilities, as there has been for other types of software for decades.

“If we want to have a model that we can confidently say ‘does not output toxic content’ or ‘helps with programming tasks in Javascript, but also does not help produce malicious payloads for bad actors’ we need to work together,” Cattell wrote.

And with that, here’s more AI news.

Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

AI IN THE NEWS

Elon Musk’s Xai debuts its latest AI model and chatbot, Grok 3. Xai claims it outperforms competitor models from OpenAI, DeepSeek, and others. As per CNBC, Musk streamed a demonstration on his social media platform X, saying “We’re very excited to present Grok 3, which is, we think, an order of magnitude more capable than Grok 2 in a very short period of time.” Former OpenAI cofounder Andrej Karpathy, who was also former director of AI at Tesla, said on X that his early testing “feels somewhere around the state of the art territory of OpenAI’s strongest models,” which he said is “quite incredible considering that the team started from scratch” less than a year ago.

Ilya Sutskever's AI startup Safe Superintelligence is fundraising at a $30 billion-plus valuation, and Mira Murati's long-awaited startup reveal. Bloomberg has reported that Safe Superintelligence, the AI startup from former OpenAI cofounder and chief scientist Ilya Sutskever, is raising funds at a $30 billion-plus valuation. The $1 billion fundraise includes investment from San Francisco VC firm Greenoaks Capital Partners. The startup, which currently has no revenue, was cofounded by Sutskever, Daniel Gross, and Daniel Levy in June last year, a month after Sutskever parted ways with OpenAI. Meanwhile, former OpenAI CTO Mira Murati, who departed in September 2024, officially announced the leadership team and other details of her startup, Thinking Machines Lab.

South Korea suspends DeepSeek AI chatbot. According to the New York Times, on Monday the South Korean government announced that it had temporarily halted new downloads of the AI chatbot from DeepSeek, the Chinese tech sensation making waves worldwide. By Monday night, the app was unavailable in South Korea’s Apple and Google app stores, though it remained accessible via a web browser. Regulators stated that the app’s service would resume once they confirmed its compliance with the country’s personal data protection laws.

Perplexity offers its own in-depth research tool similar to OpenAI and Google. AI startup Perplexity, which has sought to disrupt Google Search, announced a free, in-depth research tool called Deep Research that is similar to tools from OpenAI and Google with the same name. In a blog post, Perplexity wrote that when you ask Deep Research a question, Perplexity “performs dozens of searches, reads hundreds of sources, and reasons through the material to autonomously deliver a comprehensive report.” According to the blog post the feature “excels at a range of expert-level tasks—from finance and marketing to product research.” (Fortune has a business relationship with Perplexity.)

OpenAI CEO Sam Altman teases future open-source AI project. In a post on X yesterday, OpenAI CEO Sam Altman appeared to tease the possibility of a new open-source AI project—one that would be freely accessible for developers to use and build upon. His comments come just weeks after the release of DeepSeek’s R1 model, which made waves by claiming to rival leading models like OpenAI’s o1 while costing far less to develop, requiring significantly fewer Nvidia chips—and being given away for free. In response to DeepSeek’s release, Altman admitted that OpenAI had been “on the wrong side of history” on open-source AI and that the company needed “to figure out a different open-source strategy.”

FORTUNE ON AI

For Shiv Rao, practicing doctor and founder of $2.75 billion AI startup Abridge, innovation is an art form —by Allie Garfinkle

Cybersecurity pros are preparing for a new adversary: AI agents —Christian Vasquez

San Francisco police officially rule OpenAI whistleblower Suchir Balaji’s death a suicide in long-awaited report —by Leo Schwartz and Allie Garfinkle

Sam Altman lays out plans for OpenAI’s much-anticipated GPT-5, promising the end of ‘hated’ model picker —by Beatrice Nolan

AI CALENDAR

March 3-6: MWC, Barcelona

March 7-15: SXSW, Austin

March 10-13: Human [X] conference, Las Vegas

March 17-20: Nvidia GTC, San Jose

April 9-11: Google Cloud Next, Las Vegas

May 6-7: Fortune Brainstorm AI London. Apply to attend here

EYE ON AI RESEARCH

A new way to make AI-generated text more accurate and less likely to "hallucinate." A new research paper from Chinese researchers offers a new twist on generating AI text with a technique used in generating images called Diffusion. Most AI text models, like ChatGPT, generate words based on the words that came before. But this can lead to errors that keep getting worse, because if the AI gets something wrong early on, it can get stuck in “doom loops” that repeat the same mistake over and over.

Diffusion models work totally differently. Instead of generating the text starting with the words that came before, they begin with randomness and then gradually improve the output as they continue. The new research paper introduces LLaDA (Large Language Diffusion Models) and shows that the diffusion process allows the text model to consider the entire context simultaneously, potentially reducing errors.

The researchers created a small LLaDA model that demonstrated performance that was on par with leading large language models of similar sizes, such as one LLaMA model from Meta. The researchers said that the model excelled in tasks where understanding the broader context was critical, such as following complex instructions and engaging in back-and-forth dialogue. It also outperformed models like OpenAI’s GPT-4o in understanding sequences in reverse order, such as completing a poem in reverse.

BRAIN FOOD

Could AI models experience cognitive decline? People are increasingly turning to AI for medical diagnoses because these tools can quickly and efficiently detect health issues in medical records, X-rays, and other data—often spotting problems before doctors can see them. But a recent study published in medical journal The BMJ raises concerns that AI models, like human brains, may decline in performance over time.

The researchers found that popular AI chatbots, including OpenAI's ChatGPT, Anthropic's Sonnet, and Alphabet's Gemini, showed signs of "cognitive impairment" as they aged. This suggests that AI might not be as reliable for medical diagnoses as many had hoped.

“These findings challenge the idea that AI will soon replace human doctors,” the study’s authors wrote. “If chatbots experience cognitive decline, their accuracy in medical settings could suffer, shaking patients' trust in AI-driven healthcare.”

Scientists tested the theory by using the Montreal Cognitive Assessment (MoCA)—a test typically given to humans to measure attention, memory, language skills, and problem-solving ability. The results suggest that AI may need regular "check-ups" to maintain its reliability in medical applications.

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.