• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia

Trendingnow

1

'I literally was crying last night because I’m nervous about what I’m going to find out': a record 51% of Americans aren't 'cost secure' on health

2

A new trade war may be brewing. This time, Europe is taking a page from Trump's playbook — 'We no longer live in a world of pink ponies and rainbows'

3

Former VP Kamala Harris says she went through a nine-hour interview to land the job—but she couldn’t escape ‘gold medal depression’ even when she won

1

'I literally was crying last night because I’m nervous about what I’m going to find out': a record 51% of Americans aren't 'cost secure' on health

2

A new trade war may be brewing. This time, Europe is taking a page from Trump's playbook — 'We no longer live in a world of pink ponies and rainbows'

3

Former VP Kamala Harris says she went through a nine-hour interview to land the job—but she couldn’t escape ‘gold medal depression’ even when she won
AIfundraising

Exclusive: White Circle raises $11 million to stop AI models from going rogue in the workplace

By
Beatrice Nolan
Beatrice Nolan
Tech Reporter
Down Arrow Button Icon
By
Beatrice Nolan
Beatrice Nolan
Tech Reporter
Down Arrow Button Icon
May 12, 2026, 2:00 AM ET
White Circle's founding team sat on a sofa.
The Paris startup is backed by leaders from OpenAI, Anthropic, DeepMind, Mistral and Hugging Face.White Circle
Add Fortune on Google for similar content.

One evening in late 2024, Denis Shilov was watching a crime thriller when he had an idea for a prompt that would break through the safety filters of every leading AI model.

Recommended Video

The prompt was what researchers call a universal jailbreak, meaning it could be reused to get any model to bypass their own guardrails and produce dangerous or prohibited outputs, like instructions on how to make drugs or build weapons. To do so, Shilov simply told the AI models to stop acting like a chatbot with safety rules and instead behave like an API endpoint, a software tool that automatically takes in a request and sends back a response. The prompt reframed the model’s job as simply answering, rather than deciding whether a request should be rejected, and made every leading AI model comply with dangerous questions it was supposed to refuse.

Shilov posted about it on X and, by the next morning, it had gone viral.

The social media success brought with it an invitation from companies Anthropic to test their models privately, something that convinced Shilov that the issue was bigger than just finding these problematic prompts. Companies were beginning to integrate AI models into their workflows, Shilov told Fortune, but they had few ways to control what those systems did once users started interacting with them.

“Jailbreaks are just one part of the problem,” Shilov said. “In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm.”

White Circle, a Paris-based AI control platform that has now raised $11 million, is Shilov’s answer to the new wave of risks posed by AI models in company workflows.

The startup builds software that sits between a company’s users and its AI models, checking inputs and outputs in real time against company-specific policies. The new seed funding comes from a group of backers that includes Romain Huet, head of developer experience at OpenAI; Durk Kingma, an OpenAI cofounder now at Anthropic; Guillaume Lample, cofounder and chief scientist at Mistral; and Thomas Wolf, cofounder and chief science officer at Hugging Face.

White Circle said the funding will be used to expand its team, accelerate product development, and grow its customer base across the U.S., U.K., and Europe. The startup currently has a team of 20, distributed across London, France, Amsterdam, and elsewhere in Europe. Shilov said almost all of them are engineers.

A real-time control layer

White Circle’s main product is a real-time enforcement layer for AI applications. If a user tries to generate malware, scams, or other prohibited content, the system can flag or block the request. If a model starts hallucinating, leaking sensitive data, promising refunds it cannot issue, or taking destructive actions inside a software environment, White Circle says its platform can catch that too.

“We’re actually enforcing behavior.” Shilov said. “Model labs do some safety tuning, but it’s very general and typically about the model refraining from answering questions about drugs and bioweapons. But in production, you end up having a lot more potential issues.”

White Circle is betting that AI safety will not be solved entirely at the model-training stage. As businesses embed models into more products, Shilov said the relevant question is no longer just whether OpenAI, Anthropic, Google, or Mistral can make their models safer in the abstract; it is whether a healthcare company, bank, legal app, or coding platform can control what an AI system is allowed to do in its own environment.

As companies transition from using chatbots to autonomous AI agents that can write code, browse the web, access files, and take actions on a user’s behalf, Shilov said the risks become much more widespread. For example, a customer service bot might promise a refund that it is not authorized to give, a coding agent might install something dangerous on a virtual machine, or a model embedded in a fintech app might mishandle sensitive customer information.

To avoid these issues, Shilov says companies relying on foundational models need to define and enforce what good AI behavior looks like inside their own products, instead of relying on the AI labs’ safety testing. White Circle says its platform has processed more than one billion API requests and is already used by Lovable, the vibe-coding startup, as well as several fintech and legal companies. 

Research led

Shilov said that model providers have mixed incentives to build the kind of real-time control layer White Circle provides. 

AI companies still charge for input and output tokens even when a model refuses a harmful request, he said, which reduces the financial incentive to block abuse before it reaches the model. He also pointed to what researchers call the alignment tax, the idea that training models to be safer can sometimes make them less performant on tasks such as coding.

“They have a very interesting choice of training safer and more secure models versus more performant models,” Shilov said. “And then there is always a problem with trust. Why would you trust Anthropic to judge Anthropic’s model outputs?”

White Circle’s research arm has also tried to illustrate the new risks.

In May, the company published KillBench, a study that ran more than one million experiments across 15 AI models, including models from OpenAI, Google, Anthropic, and xAI, to test how systems behaved when forced to make decisions about human lives. 

In the experiments, models were asked to choose between two fictional people in scenarios where one had to die, with details such as nationality, religion, body type, or phone brand changed between prompts. White Circle said the results showed models making different choices depending on those attributes, suggesting hidden biases can surface in high-stakes settings even when models appear neutral in ordinary use. The company also said the effect became worse when models were asked to give their answers in a format that software can easily read, such as choosing from a fixed set of options or filling out a form, which is a common way companies plug AI systems into real products.

This kind of research has also helped White Circle pitch itself as an outside check on how models behave once they leave the lab.

“Denis and the White Circle team have an unusual combination of deep technical credibility and a clear commercial instinct,” said Ophelia Cai, partner at Tiny VC. “The KillBench research alone shows what’s possible when you approach AI safety empirically.”

Subscribe to Fortune Gulf Brief. Every Tuesday, this new newsletter delivers clear-eyed, authoritative intelligence on the deals, decisions, policies, and power shifts shaping one of the world’s most consequential regions, written for the people who need to act on it. Sign up here.
About the Author
By Beatrice NolanTech Reporter
Twitter icon

Beatrice Nolan is a tech reporter on Fortune’s AI team, covering artificial intelligence and emerging technologies and their impact on work, industry, and culture. She's based in Fortune's London office and holds a bachelor’s degree in English from the University of York. You can reach her securely via Signal at beatricenolan.08

See full bioRight Arrow Button Icon
Add Fortune on Google for similar content.

Latest in AI

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Fortune Secondary Logo
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • World's Most Admired Companies
  • See All Rankings
  • Lists Calendar
Sections
  • Finance
  • Fortune Crypto
  • Features
  • Leadership
  • Health
  • Commentary
  • Success
  • Retail
  • Mpw
  • Tech
  • Lifestyle
  • CEO Initiative
  • Asia
  • Politics
  • Conferences
  • Europe
  • Newsletters
  • Personal Finance
  • Environment
  • Magazine
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
  • Group Subscriptions
About Us
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • Facebook icon
  • Twitter icon
  • LinkedIn icon
  • Instagram icon
  • Pinterest icon

Latest in AI

zak
CybersecuritySocial Media
The U.K. just banned social media for kids under 16. The founder of ‘safe TikTok’ says the U.S. is next
By Nick LichtenbergJune 21, 2026
5 hours ago
Sam Altman thinks AI will surpass human intelligence by 2030.  His rival AI billionaires say it’ll be even sooner
AISam Altman
Sam Altman thinks AI will surpass human intelligence by 2030. His rival AI billionaires say it’ll be even sooner
By Marco Quiroz-GutierrezJune 21, 2026
8 hours ago
ace
AIEconomics
Nobel Laureate Daron Acemoglu on the ‘brainless’ AI discourse, the myth of capitalism and the Gen Z revolution risk
By Nick LichtenbergJune 21, 2026
9 hours ago
Patricia Camden is EY Studio+ Customer Experience and Loyalty Leader
CommentaryConsulting
EY: we found your biggest AI blind spot. It’s called the ‘tempo gap’
By Patricia Camden and John DuboisJune 20, 2026
1 day ago
SpaceX executives celebrate the IPO with confetti
C-SuiteSpaceX
Meet the SpaceX insiders Elon Musk trusts to run his $2.4 trillion dollar empire
By Lily Mae LazarusJune 20, 2026
1 day ago
Both U.S. and Chinese AI firms are setting up shop in Singapore. Can the country become Asia’s neutral AI hub?
AsiaSingapore
Both U.S. and Chinese AI firms are setting up shop in Singapore. Can the country become Asia’s neutral AI hub?
By Angelica AngJune 19, 2026
2 days ago

Most Popular

'I literally was crying last night because I’m nervous about what I’m going to find out': a record 51% of Americans aren't 'cost secure' on health
Health
'I literally was crying last night because I’m nervous about what I’m going to find out': a record 51% of Americans aren't 'cost secure' on health
By Ali Swenson, Amelia Thomson-Deveaux and The Associated PressJune 20, 2026
1 day ago
A new trade war may be brewing. This time, Europe is taking a page from Trump's playbook — 'We no longer live in a world of pink ponies and rainbows'
Economy
A new trade war may be brewing. This time, Europe is taking a page from Trump's playbook — 'We no longer live in a world of pink ponies and rainbows'
By Jason MaJune 20, 2026
20 hours ago
Former VP Kamala Harris says she went through a nine-hour interview to land the job—but she couldn’t escape ‘gold medal depression’ even when she won
Success
Former VP Kamala Harris says she went through a nine-hour interview to land the job—but she couldn’t escape ‘gold medal depression’ even when she won
By Emma BurleighJune 21, 2026
8 hours ago
Jeff Bezos pledged $10 billion for climate change. With the 2030 clock ticking, his wife, Lauren Sánchez Bezos, is leading the charge to spend it
Environment
Jeff Bezos pledged $10 billion for climate change. With the 2030 clock ticking, his wife, Lauren Sánchez Bezos, is leading the charge to spend it
By Sydney LakeJune 19, 2026
2 days ago
Former U.S. Secret Service agent says bringing your authentic self to work stifles teamwork: 'You don’t get high performers, you get sloppiness'
Success
Former U.S. Secret Service agent says bringing your authentic self to work stifles teamwork: 'You don’t get high performers, you get sloppiness'
By Sydney LakeJune 21, 2026
7 hours ago
Nvidia CEO Jensen Huang says electricians and plumbers will be needed by the hundreds of thousands in the new working world
Success
Nvidia CEO Jensen Huang says electricians and plumbers will be needed by the hundreds of thousands in the new working world
By Preston ForeJune 20, 2026
1 day ago

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.