We asked Google’s A.I. chatbot ‘Bard’ basic SAT questions and it would flunk a real exam

Bard has already cost Google for its mistakes—but it's learning every day.
Jonathan Raa—NurPhoto/Getty Images

Google has been pretty open about the fact that Bard isn’t perfect.

Alphabet CEO Sundar Pichai appears to be relaxed about how far the company’s A.I. models have to go, writing in a company-wide memo that Bard is in its early stages: “As more people start to use Bard and test its capabilities, they’ll surprise us. Things will go wrong.” 

The public has now been invited to test Bard; previously, the 80,000 users putting it through its paces were mainly Google employees.

Fortune’s spot on the wait list was finally called up, so we put Bard through its paces ahead of the upcoming SATs American teenagers will be facing this spring.

The SAT is a globally recognized test used for U.S. college admissions, covering skills including reading, writing, and math.

Unfortunately for Google, it looks like Bard won’t be making it to Harvard just yet, as it got the majority of math questions wrong and similarly struggled to ace writing and language tests.

Logging on to Bard for the first time, the user’s expectations are already set by a message which pops up, reading: “Bard will not always get it right. Bard may give inaccurate or inappropriate responses. When in doubt, use the ‘Google it’ button to check Bard’s responses. Bard will get better with your feedback. Please rate responses and flag anything that may be offensive or unsafe.”

How did Bard do?

On to the questions.

Fortune sourced practice SAT math questions from online learning resources and found that Bard got anywhere from 50% to 75% of them wrong—even when multiple-choice answers were provided.

Bard often gave answers that were not even among the multiple-choice options, though it sometimes got them correct when asked the same question again.

The A.I.’s inaccuracy has already cost Google somewhere in the region of $100 billion in market value.

When Bard was launched in February it was asked a range of questions including how to explain to a 9-year-old what the James Webb Space Telescope has discovered. 

Bard responded that the telescope took the “very first pictures of a planet outside of our own solar system,” even though, according to NASA, the first image of an exoplanet was captured in 2004 by the Very Large Telescope, a ground-based array in Chile, and the object was confirmed as an exoplanet in 2005.

Science and math aren’t Bard’s strong points, although the A.I. did fare better when it came to reading and writing exercises.

Bard’s first written language test with Fortune came back with around 30% correct answers, and the A.I. often needed to be asked a question twice before it understood.

Even when it was wrong, Bard’s tone was confident, and it frequently framed responses with: “The correct answer is.” That confidence is a common feature of large language models.

Bizarrely, Bard’s best result across both math and writing tests came on a passage that focused on Harry Potter writer J.K. Rowling.

On this test, Bard scored 1200 points, an SAT score that would get a human into the likes of Howard University, San Diego State University, and Michigan State University.

The more Bard was asked language-based questions by Fortune—around 45 in total—the less frequently it struggled to understand or needed the question to be repeated.

On reading tests, Bard similarly performed better than it did in math—getting around half the answers correct on average.

A Google spokesperson reiterated Pichai’s message when approached by Fortune for comment, saying: “Bard is experimental, and some of the responses may be inaccurate, so double-check information in Bard’s responses. With your feedback, Bard is getting better every day. Before Bard launched publicly, thousands of testers were involved to provide feedback to help Bard improve its quality, safety, and accuracy.

“Accelerating people’s ideas with generative A.I. is truly exciting, but it’s still early days, and Bard is an experiment. While Bard has built-in safety controls and clear mechanisms for feedback in line with our A.I. Principles, be aware that it may display inaccurate information.”

Over a couple of days of questioning, Bard did show signs of improving accuracy. On the speed of its development, the large language model noted: “I would say that I am improving at a rapid pace.

“I am able to do things that I was not able to do just a few months ago. I am excited to see what the future holds for me. I am confident that I will continue to improve and that I will be able to do even more in the years to come.”
