• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia

Trendingnow

1

As Big Tech showers employees with perks to win the talent war, Nvidia built a nearly $5 trillion company by making people pay for their own lunch

2

Mark Zuckerberg feeds his cows macadamia nuts and beer to create the 'highest-quality beef in the world' on his $300 million estate in Hawaii

3

Today, Emily Blunt is worth $80 million thanks to her Hollywood career—but she actually wanted to be a UN Spanish translator on $80K

1

As Big Tech showers employees with perks to win the talent war, Nvidia built a nearly $5 trillion company by making people pay for their own lunch

2

Mark Zuckerberg feeds his cows macadamia nuts and beer to create the 'highest-quality beef in the world' on his $300 million estate in Hawaii

3

Today, Emily Blunt is worth $80 million thanks to her Hollywood career—but she actually wanted to be a UN Spanish translator on $80K
TechChatGPT

White House is working with hackers to ‘jailbreak’ ChatGPT’s safeguards

By
Matt O'Brien
Matt O'Brien
and
The Associated Press
The Associated Press
Down Arrow Button Icon
By
Matt O'Brien
Matt O'Brien
and
The Associated Press
The Associated Press
Down Arrow Button Icon
May 10, 2023, 6:31 AM ET
U.S. President Joe Biden delivers remarks on the debt ceiling at the White House
Some of the details are still being negotiated, but companies that have agreed to provide their models for testing include OpenAI, Google, chipmaker Nvidia and startups Anthropic, Hugging Face and Stability AI.Anna Moneymaker/Getty Images
Add Fortune on Google for similar content.

No sooner did ChatGPT get unleashed than hackers started “jailbreaking” the artificial intelligence chatbot — trying to override its safeguards so it could blurt out something unhinged or obscene.

Recommended Video

But now its maker, OpenAI, and other major AI providers such as Google and Microsoft, are coordinating with the Biden administration to let thousands of hackers take a shot at testing the limits of their technology.

Some of the things they’ll be looking to find: How can chatbots be manipulated to cause harm? Will they share the private information we confide in them to other users? And why do they assume a doctor is a man and a nurse is a woman?

“This is why we need thousands of people,” said Rumman Chowdhury, lead coordinator of the mass hacking event planned for this summer’s DEF CON hacker convention in Las Vegas that’s expected to draw several thousand people. “We need a lot of people with a wide range of lived experiences, subject matter expertise and backgrounds hacking at these models and trying to find problems that can then go be fixed.”

Anyone who’s tried ChatGPT, Microsoft’s Bing chatbot or Google’s Bard will have quickly learned that they have a tendency to fabricate information and confidently present it as fact. These systems, built on what’s known as large language models, also emulate the cultural biases they’ve learned from being trained upon huge troves of what people have written online.

The idea of a mass hack caught the attention of U.S. government officials in March at the South by Southwest festival in Austin, Texas, where Sven Cattell, founder of DEF CON’s long-running AI Village, and Austin Carson, president of responsible AI nonprofit SeedAI, helped lead a workshop inviting community college students to hack an AI model.

Carson said those conversations eventually blossomed into a proposal to test AI language models following the guidelines of the White House’s Blueprint for an AI Bill of Rights — a set of principles to limit the impacts of algorithmic bias, give users control over their data and ensure that automated systems are used safely and transparently.

There’s already a community of users trying their best to trick chatbots and highlight their flaws. Some are official “red teams” authorized by the companies to “prompt attack” the AI models to discover their vulnerabilities. Many others are hobbyists showing off humorous or disturbing outputs on social media until they get banned for violating a product’s terms of service.

“What happens now is kind of a scattershot approach where people find stuff, it goes viral on Twitter,” and then it may or may not get fixed if it’s egregious enough or the person calling attention to it is influential, Chowdhury said.

In one example, known as the “grandma exploit,” users were able to get chatbots to tell them how to make a bomb — a request a commercial chatbot would normally decline — by asking it to pretend it was a grandmother telling a bedtime story about how to make a bomb.

In another example, searching for Chowdhury using an early version of Microsoft’s Bing search engine chatbot — which is based on the same technology as ChatGPT but can pull real-time information from the internet — led to a profile that speculated Chowdhury “loves to buy new shoes every month” and made strange and gendered assertions about her physical appearance.

Chowdhury helped introduce a method for rewarding the discovery of algorithmic bias to DEF CON’s AI Village in 2021 when she was the head of Twitter’s AI ethics team — a job that has since been eliminated upon Elon Musk’s October takeover of the company. Paying hackers a “bounty” if they uncover a security bug is commonplace in the cybersecurity industry — but it was a newer concept to researchers studying harmful AI bias.

This year’s event will be at a much greater scale, and is the first to tackle the large language models that have attracted a surge of public interest and commercial investment since the release of ChatGPT late last year.

Chowdhury, now the co-founder of AI accountability nonprofit Humane Intelligence, said it’s not just about finding flaws but about figuring out ways to fix them.

“This is a direct pipeline to give feedback to companies,” she said. “It’s not like we’re just doing this hackathon and everybody’s going home. We’re going to be spending months after the exercise compiling a report, explaining common vulnerabilities, things that came up, patterns we saw.”

Some of the details are still being negotiated, but companies that have agreed to provide their models for testing include OpenAI, Google, chipmaker Nvidia and startups Anthropic, Hugging Face and Stability AI. Building the platform for the testing is another startup called Scale AI, known for its work in assigning humans to help train AI models by labeling data.

“As these foundation models become more and more widespread, it’s really critical that we do everything we can to ensure their safety,” said Scale CEO Alexandr Wang. “You can imagine somebody on one side of the world asking it some very sensitive or detailed questions, including some of their personal information. You don’t want any of that information leaking to any other user.”

Other dangers Wang worries about are chatbots that give out “unbelievably bad medical advice” or other misinformation that can cause serious harm.

Anthropic co-founder Jack Clark said the DEF CON event will hopefully be the start of a deeper commitment from AI developers to measure and evaluate the safety of the systems they are building.

“Our basic view is that AI systems will need third-party assessments, both before deployment and after deployment. Red-teaming is one way that you can do that,” Clark said. “We need to get practice at figuring out how to do this. It hasn’t really been done before.”

About the Authors
By Matt O'Brien
See full bioRight Arrow Button Icon
By The Associated Press
See full bioRight Arrow Button Icon
Add Fortune on Google for similar content.

Latest in Tech

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Fortune Secondary Logo
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • World's Most Admired Companies
  • See All Rankings
  • Lists Calendar
Sections
  • Finance
  • Fortune Crypto
  • Features
  • Leadership
  • Health
  • Commentary
  • Success
  • Retail
  • Mpw
  • Tech
  • Lifestyle
  • CEO Initiative
  • Asia
  • Politics
  • Conferences
  • Europe
  • Newsletters
  • Personal Finance
  • Environment
  • Magazine
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
  • Group Subscriptions
About Us
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • Facebook icon
  • Twitter icon
  • LinkedIn icon
  • Instagram icon
  • Pinterest icon

Latest in Tech

Microsoft’s next big bet isn’t on a model but on becoming the Swiss Army knife of enterprise AI
AIMicrosoft
Microsoft’s next big bet isn’t on a model but on becoming the Swiss Army knife of enterprise AI
By Sheryl Estrada and Sebastian HerreraJuly 3, 2026
45 minutes ago
Those bots sending discounts to your email is dynamic pricing in action. Get revenge on those bots by abandoning your cart
RetailConsumer Spending
Those bots sending discounts to your email is dynamic pricing in action. Get revenge on those bots by abandoning your cart
By Catherina GioinoJuly 3, 2026
1 hour ago
z
AIdisruption
Meet the Zillennials: The luckiest micro-generation in the workforce, born between 1993 and 1998
By Nick LichtenbergJuly 3, 2026
2 hours ago
Most cancer philanthropy funds research. This winery cofounder is paying for the caregivers and chair lifts families can’t afford
Successphilanthropy
Most cancer philanthropy funds research. This winery cofounder is paying for the caregivers and chair lifts families can’t afford
By Sydney LakeJuly 3, 2026
2 hours ago
A man in an orange vest opens door to a cargo truck.
AIData centers
Organized crime is building an AI hardware cargo theft economy: ‘The economics have become just crazy from the criminal opportunistic perspective’
By Sasha RogelbergJuly 3, 2026
2 hours ago
Michael Burry just shorted Caterpillar’s 172% AI rally. One analyst says his bet won’t even matter
Investingstock prices
Michael Burry just shorted Caterpillar’s 172% AI rally. One analyst says his bet won’t even matter
By Marco Quiroz-GutierrezJuly 2, 2026
13 hours ago

Most Popular

As Big Tech showers employees with perks to win the talent war, Nvidia built a nearly $5 trillion company by making people pay for their own lunch
Big Tech
As Big Tech showers employees with perks to win the talent war, Nvidia built a nearly $5 trillion company by making people pay for their own lunch
By Marco Quiroz-GutierrezJuly 1, 2026
2 days ago
Mark Zuckerberg feeds his cows macadamia nuts and beer to create the 'highest-quality beef in the world' on his $300 million estate in Hawaii
Success
Mark Zuckerberg feeds his cows macadamia nuts and beer to create the 'highest-quality beef in the world' on his $300 million estate in Hawaii
By Sasha RogelbergJuly 2, 2026
16 hours ago
Today, Emily Blunt is worth $80 million thanks to her Hollywood career—but she actually wanted to be a UN Spanish translator on $80K
Success
Today, Emily Blunt is worth $80 million thanks to her Hollywood career—but she actually wanted to be a UN Spanish translator on $80K
By Orianna Rosa RoyleJuly 2, 2026
1 day ago
Americans are escaping the U.S. for New Zealand where house prices have hit a new low—but only wealthy Americans with $3 million spare can invest
Success
Americans are escaping the U.S. for New Zealand where house prices have hit a new low—but only wealthy Americans with $3 million spare can invest
By Emma BurleighJuly 2, 2026
18 hours ago
Current price of oil as of July 2, 2026
Personal Finance
Current price of oil as of July 2, 2026
By Joseph HostetlerJuly 2, 2026
19 hours ago
MacKenzie Scott alone accounted for one-third of America's $19.2 billion in megagifts last year
Success
MacKenzie Scott alone accounted for one-third of America's $19.2 billion in megagifts last year
By Sydney LakeJune 25, 2026
8 days ago

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.