Google wasn’t the only one to make errors in its A.I. demo. Analysis finds Microsoft’s Bing flubbed a string of financial figures

By Nicholas Gordon, Asia Editor

Nicholas Gordon is an Asia editor based in Hong Kong, where he helps to drive Fortune’s coverage of Asian business and economics news.

Microsoft's demo of its Bing A.I. displayed incorrect financial data from Gap's earnings report. Above, Microsoft executive Yusuf Mehdi next to a presentation showing OpenAI's logo. Jason Redmon—AFP via Getty Images

After a single mistake in Google's demo of its new A.I. program erased billions of dollars in market value, employees complained that the company had rushed out the news to get ahead of Microsoft's own A.I. announcement a day later.

Except it turns out that Microsoft’s demo also contained answers that were incomplete, confusingly sourced, and, at worst, entirely incorrect, according to an analysis published by independent A.I. researcher Dmitri Brereton.

These errors highlight a growing problem with A.I. chatbots like OpenAI's ChatGPT and Google's Bard: The employees demoing these bots and the users testing them alike seem to trust that their well-written, conversational answers are accurate, which means mistakes made at a public event can go unnoticed for days.

Wrong answers

During Microsoft's demo of its new chatbot, aired last week, the company's corporate vice president for search, Yusuf Mehdi, asked the program for "key takeaways" from Gap's most recent earnings report. The bot obliged with a series of bullet points listing the company's key financial data, and even offered a comparison with fellow clothing company Lululemon's most recent earnings.

Yet Brereton found that Microsoft’s bot gave incorrect figures. For example, Bing’s A.I. said that Gap’s operating margin, adjusted for impairment costs, was 5.9%. According to Gap’s earnings report, the company’s adjusted operating margin for the recent quarter was 3.9%. The unadjusted margin was 4.6%. 

Bing A.I. also said that Gap was forecasting sales growth in the low double digits in the coming quarter. In fact, Gap is projecting a decline in net sales in the “mid-single digits.” 

The demo also made mistakes when it came to Lululemon’s financial data. For example, Bing’s A.I. reported that the clothing company’s operating margin in the most recent quarter was 20.7%. The company’s earnings report shows an adjusted operating margin of 19.4%.

Further analysis from CNN uncovered that Bing’s A.I. would attribute its answers to sources that did not contain the information in question.

Microsoft hopes that programs like ChatGPT, which can generate conversational answers in response to text prompts, can undercut Google's dominance in search. The company is reportedly investing $10 billion in ChatGPT's developer, OpenAI.

“We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better,” said a Microsoft spokesperson in a statement.

Mistakes with Google’s Bard

Microsoft competitor Google was recently hit hard by the revelation that its A.I. chatbot, called Bard, made a factual error in its demo.

In a blog post, Google included a video of a user asking Bard for interesting facts about the James Webb Space Telescope, launched in 2021. Google's A.I. claimed the telescope took the very first pictures of a planet outside our solar system. In fact, the first image of an exoplanet was captured in 2004 by the European Southern Observatory's Very Large Telescope in Chile.

Shares in Alphabet, Google’s parent company, crashed 8% after the error was first reported by Reuters, wiping out around $100 billion of the company’s market capitalization. 

At a conference Monday, Alphabet chairman John Hennessy said that these kinds of mistakes were why the company hesitated to announce its own ChatGPT competitor. “You don’t want to put a system out that either says wrong things or sometimes says toxic things,” he said, according to CNBC.

Hallucinated answers

People testing these chatbots are discovering that certain prompts can lead to strange results, including instances where the bot responds with argumentative or "unhinged" answers. Users of Bing's A.I. are now compiling a repository of "failure cases" to help with "further study."

Tech leaders like Apple cofounder Steve Wozniak and businessman Mark Cuban have warned that generative A.I. can make mistakes and spread misinformation if used incorrectly. Those working in the field even have a term for plausible-sounding A.I.-generated answers that are entirely made up: "hallucinations."

Even Vint Cerf, an early pioneer of the internet and a current vice president at Google, revealed that a chatbot got details of his biography wrong when asked. "We know it doesn't always work the way we would like it to," he said at a conference on Tuesday, according to CNBC.

Update, February 16, 2023: This article has been updated with a comment from Microsoft.
