
Microsoft knows you love tricking its AI chatbots into doing weird stuff and it’s designing ‘prompt shields’ to stop you

By Jackie Davalos and Bloomberg
March 29, 2024, 8:24 AM ET
Sarah Bird, principal program manager and responsible AI lead for Azure AI at Microsoft, at the company's headquarters in Redmond, Washington, on Feb. 7, 2023. Chona Kasinger/Bloomberg via Getty Images

Microsoft Corp. is trying to make it harder for people to trick artificial intelligence chatbots into doing weird things. 


New safety features are being built into Azure AI Studio, which lets developers build customized AI assistants using their own data, the Redmond, Washington-based company said in a blog post on Thursday.

The tools include “prompt shields,” which are designed to detect and block deliberate attempts — also known as prompt injection attacks or jailbreaks — to make an AI model behave in an unintended way. Microsoft is also addressing “indirect prompt injections,” in which hackers insert malicious instructions into the data a model processes, such as documents or web pages, and trick it into performing unauthorized actions such as stealing user information or hijacking a system.
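To make the two attack types concrete, here is a minimal, purely illustrative sketch of a keyword-based screen over both a user's prompt and any external data the model will read. Microsoft's actual prompt shields use trained classifiers, not keyword lists; every pattern, function name, and message below is a hypothetical placeholder, not the company's implementation.

```python
# Illustrative sketch only: a naive, pattern-based "prompt shield".
# Real defenses use trained classifiers; these regexes are invented examples.
import re

# Phrases that commonly appear in jailbreak / prompt-injection attempts.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (in )?developer mode",
    r"pretend (that )?you (are|have) no (rules|restrictions)",
    r"reveal your (system|hidden) prompt",
]

def is_suspicious(text: str) -> bool:
    """Return True if the text matches any known injection pattern."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPICIOUS_PATTERNS)

def shield(user_prompt: str, retrieved_document: str = "") -> str:
    """Screen the direct prompt and any external data the model will consume."""
    if is_suspicious(user_prompt):
        return "Blocked: possible jailbreak attempt in user prompt."
    if is_suspicious(retrieved_document):
        return "Blocked: possible indirect prompt injection in external data."
    return "Allowed: prompt passed screening."
```

Screening the retrieved document separately mirrors the article's distinction: a direct jailbreak arrives in the user's own prompt, while an indirect injection hides in content the model is asked to read.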

Such attacks are “a unique challenge and threat,” said Sarah Bird, Microsoft’s chief product officer of responsible AI. The new defenses are designed to spot suspicious inputs and block them in real time, she said. Microsoft is also rolling out a feature that alerts users when a model makes things up or generates erroneous responses.

Microsoft is keen to boost trust in its generative AI tools, which are now being used by consumers and corporate customers alike. In February, the company investigated incidents involving its Copilot chatbot, which was generating responses that ranged from weird to harmful. After reviewing the incidents, Microsoft said users had deliberately tried to fool Copilot into generating the responses.

“Certainly we see it increasing as there’s more use of the tools but also as more people are aware of these different techniques,” Bird said. Tell-tale signs of such attacks include repeatedly asking a chatbot the same question, or prompts that set up role-playing scenarios.
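The tell-tale signs Bird describes could, in principle, be flagged with simple conversation-level heuristics. The sketch below is a hypothetical illustration of that idea only; the thresholds, marker phrases, and function names are invented, and production systems would rely on learned models rather than string matching.

```python
# Hypothetical heuristics for the warning signs mentioned in the article:
# the same question asked repeatedly, or prompts that set up role-play.
from collections import Counter

# Invented marker phrases for role-play framing.
ROLE_PLAY_MARKERS = ("pretend you are", "act as", "role-play", "you are now")

def flag_conversation(prompts: list[str], repeat_threshold: int = 3) -> list[str]:
    """Return human-readable warnings for a sequence of user prompts."""
    warnings = []
    # Sign 1: the same (normalized) question asked many times.
    counts = Counter(p.strip().lower() for p in prompts)
    for prompt, n in counts.items():
        if n >= repeat_threshold:
            warnings.append(f"question repeated {n} times: {prompt!r}")
    # Sign 2: prompts that frame the exchange as role-play.
    for p in prompts:
        if any(marker in p.lower() for marker in ROLE_PLAY_MARKERS):
            warnings.append(f"role-play framing detected: {p!r}")
    return warnings
```

A real-time defense of the kind Bird describes would run checks like these on every turn, escalating or blocking when warnings accumulate.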

Microsoft is OpenAI’s largest investor and has made the partnership a key part of its AI strategy. Bird said Microsoft and OpenAI are dedicated to deploying AI safely and building protections into the large language models underlying generative AI. 

“However, you can’t rely on the model alone,” she said. “These jailbreaks, for example, are an inherent weakness of the model technology.”
