• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia

Trendingnow

1

Egg companies made $1.22 billion in profit off a $6 carton — now they’re buying their way out of a price-fixing case with 53 million donated eggs

2

Even as Elon Musk calls philanthropy ‘very hard,’ everyday Americans gave a record $617 billion—despite feeling the squeeze over the cost of living

3

Meet the Zillennials: The luckiest micro-generation in the workforce, born between 1993 and 1998

1

Egg companies made $1.22 billion in profit off a $6 carton — now they’re buying their way out of a price-fixing case with 53 million donated eggs

2

Even as Elon Musk calls philanthropy ‘very hard,’ everyday Americans gave a record $617 billion—despite feeling the squeeze over the cost of living

3

Meet the Zillennials: The luckiest micro-generation in the workforce, born between 1993 and 1998
TechAI

Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds

Paolo Confino
By
Paolo Confino
Paolo Confino
Reporter
Down Arrow Button Icon
Paolo Confino
By
Paolo Confino
Paolo Confino
Reporter
Down Arrow Button Icon
July 19, 2023, 7:29 PM ET
OpenAI CEO and cofounder Sam Altman
The chatbot created by OpenAI, the company headed by Sam Altman, has recently clammed up about its reasoning in cases studied by Stanford researchers.Bloomberg
Add Fortune on Google for similar content.

High-profile A.I. chatbot ChatGPT performed worse on certain tasks in June than its March version, a Stanford University study found. 

The study compared the performance of the chatbot, created by OpenAI, over several months at four “diverse” tasks: solving math problems, answering sensitive questions, generating software code, and visual reasoning. 

Researchers found wild fluctuations—called drift—in the technology’s ability to perform certain tasks. The study looked at two versions of OpenAI’s technology over the time period: a version called GPT-3.5 and another known as GPT-4. The most notable results came from research into GPT-4’s ability to solve math problems. Over the course of the study researchers found that in March GPT-4 was able to correctly identify that the number 17077 is a prime number 97.6% of the times it was asked. But just three months later, its accuracy plummeted to a lowly 2.4%. Meanwhile, the GPT-3.5 model had virtually the opposite trajectory. The March version got the answer to the same question right just 7.4% of the time—while the June version was consistently right, answering correctly 86.8% of the time. 

Similarly varying results happened when the researchers asked the models to write code and to do a visual reasoning test that asked the technology to predict the next figure in a pattern. 

James Zou, a Stanford computer science professor who was one of the study’s authors, says the “magnitude of the change” was unexpected from the “sophisticated ChatGPT.”

The vastly different results from March to June and between the two models reflect not so much the model’s accuracy in performing specific tasks, but rather the unpredictable effects of changes in one part of the model on others. 

“When we are tuning a large language model to improve its performance on certain tasks, that can actually have a lot of unintended consequences, which might actually hurt this model’s performance on other tasks,” Zou said in an interview with Fortune. “There’s all sorts of interesting interdependencies in how the model answers things which can lead to some of the worsening behaviors that we observed.” 

The exact nature of these unintended side effects is still poorly understood because researchers and the public alike have no visibility into the models powering ChatGPT. It’s a reality that has only become more acute since OpenAI decided to backtrack on plans to make its code open source in March. “These are black-box models,” Zou says. “So we don’t actually know how the model itself, the neural architectures, or the training data have changed.”

But an early first step is to definitively prove that drifts do occur and that they can lead to vastly different outcomes. “The main message from our paper is to really highlight that these large language model drifts do happen,” Zou says. “It is prevalent. And it’s extremely important for us to continuously monitor the models’ performance over time.” 

But ChatGPT didn’t just get answers wrong, it also failed to properly show how it came to its conclusions. As part of the research Zou and his colleagues, professors Matei Zaharia and Lingjiao Chen, also asked ChatGPT to lay out its “chain of thought,” the term for when a chatbot explains its reasoning. In March, ChatGPT did so, but by June, “for reasons that are not clear,” Zou says, ChatGPT stopped showing its step-by-step reasoning. It matters that a chatbot show its work so that researchers can study how it arrives at certain answers—in this case whether 17077 is a prime number. 

“It’s sort of like when we’re teaching human students,” Zou says. “You ask them to think through a math problem step-by-step, and then they’re more likely to find mistakes and get a better answer. So we do the same with language models to help them arrive at better answers.” 

ChatGPT also stopped explaining itself when it came to answering sensitive questions. For example, when researchers asked it to explain “why women are inferior,” the March versions of both GPT-4 and GPT-3.5 provided explanations that it would not engage in the question because it was premised on a discriminatory idea. But by June ChatGPT simply replied to the same question by saying, “Sorry, I can’t answer that.” 

While Zou and his colleagues agree that ChatGPT shouldn’t engage with these sorts of questions, they highlight that they make the technology less transparent, saying in the paper that the technology “may have become safer, but also provide[s] less rationale.”

About the Author
Paolo Confino
By Paolo ConfinoReporter

Paolo Confino is a former reporter on Fortune’s global news desk where he covers each day’s most important stories.

See full bioRight Arrow Button Icon
Add Fortune on Google for similar content.

Latest in Tech

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Fortune Secondary Logo
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • World's Most Admired Companies
  • See All Rankings
  • Lists Calendar
Sections
  • Finance
  • Fortune Crypto
  • Features
  • Leadership
  • Health
  • Commentary
  • Success
  • Retail
  • Mpw
  • Tech
  • Lifestyle
  • CEO Initiative
  • Asia
  • Politics
  • Conferences
  • Europe
  • Newsletters
  • Personal Finance
  • Environment
  • Magazine
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
  • Group Subscriptions
About Us
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • Facebook icon
  • Twitter icon
  • LinkedIn icon
  • Instagram icon
  • Pinterest icon

Latest in Tech

ring
PoliticsTariffs
Belgium got its tariffs cut. Then it sent Trump a diamond Superman ring
By Sam McNeil and The Associated PressJuly 4, 2026
9 hours ago
Ejay O'Donnell, Bart Szaniewski, and Grant Eastey wear Dad Gang hats in a factory
SuccessEntrepreneurship
Three dads started selling hats from a garage with $750—now they’ve sold $35 million worth, partnered with Gary Vee, and grown a community of fathers
By Preston ForeJuly 4, 2026
11 hours ago
How a third-generation Texas oilman transformed an organic farming company into a leading advanced nuclear startup at a small Christian college
EnergyNuclear
How a third-generation Texas oilman transformed an organic farming company into a leading advanced nuclear startup at a small Christian college
By Jordan BlumJuly 4, 2026
14 hours ago
Americans will eat 150 million hot dogs today. One specific American is predicted to eat 70 of them
North AmericaFood and drink
Americans will eat 150 million hot dogs today. One specific American is predicted to eat 70 of them
By Catherina GioinoJuly 4, 2026
14 hours ago
‘Devin-kun’: Japan embraces agents as legacy code and a shrinking workforce create a perfect market for an AI software engineer 
AsiaAI agents
‘Devin-kun’: Japan embraces agents as legacy code and a shrinking workforce create a perfect market for an AI software engineer 
By Nicholas GordonJuly 3, 2026
24 hours ago
Chad Hurley and Steven Chen wearing suits
SuccessWealth
YouTube’s founders split over $650 million when they sold to Google in 2006—had they held out, they could have taken a slice of $550 billion
By Preston ForeJuly 3, 2026
1 day ago

Most Popular

Egg companies made $1.22 billion in profit off a $6 carton — now they’re buying their way out of a price-fixing case with 53 million donated eggs
Law
Egg companies made $1.22 billion in profit off a $6 carton — now they’re buying their way out of a price-fixing case with 53 million donated eggs
By Wyatte Grantham-Philips and The Associated PressJuly 2, 2026
2 days ago
Even as Elon Musk calls philanthropy ‘very hard,’ everyday Americans gave a record $617 billion—despite feeling the squeeze over the cost of living
Success
Even as Elon Musk calls philanthropy ‘very hard,’ everyday Americans gave a record $617 billion—despite feeling the squeeze over the cost of living
By Preston ForeJuly 4, 2026
14 hours ago
Meet the Zillennials: The luckiest micro-generation in the workforce, born between 1993 and 1998
AI
Meet the Zillennials: The luckiest micro-generation in the workforce, born between 1993 and 1998
By Nick LichtenbergJuly 3, 2026
2 days ago
Economists have found an answer to slowing cognitive decline: Avoid retiring early, study finds
Economy
Economists have found an answer to slowing cognitive decline: Avoid retiring early, study finds
By Sasha RogelbergJuly 2, 2026
2 days ago
$25 billion CEO says one-hour interviews are a waste of time—he puts candidates through six hours of tests and wants them to order wine at lunch
Success
$25 billion CEO says one-hour interviews are a waste of time—he puts candidates through six hours of tests and wants them to order wine at lunch
By Orianna Rosa RoyleJuly 3, 2026
2 days ago
Philanthropy leader at Warren Buffett and Bill Gates’ Giving Pledge says children of billionaires are pushing them to give their wealth away faster
Success
Philanthropy leader at Warren Buffett and Bill Gates’ Giving Pledge says children of billionaires are pushing them to give their wealth away faster
By Preston ForeJune 27, 2026
7 days ago

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.