
Google researchers claim new breakthrough in getting AI to solve tough high school math problems

By Jeremy Kahn, Editor, AI
July 25, 2024, 11:30 AM ET
Google researchers have just announced a breakthrough in creating an AI system that can solve tough math problems. Photo illustration by Getty Images

Google DeepMind says it has achieved a breakthrough in building an AI system that can handle complex mathematical problems.

The research division, which is part of Alphabet-owned Google, announced Thursday that it has created a software system that combines multiple AI models and can score well enough on the International Mathematical Olympiad (IMO), a global test of high school students’ mathematical talents, to place in the top quartile of contestants. That would be good enough to earn a silver medal in the competition.

A boastworthy milestone in the machine-versus-mathlete contest, the feat also opens new possibilities for combining different approaches to AI in order to create more capable hybrid AI systems—something that Google said could eventually make its way into commercial products like its line-up of Gemini AI tools.

The new system builds on AlphaGeometry, which the AI research lab unveiled in January and which could solve geometry problems from the IMO about as well as top high school students. The new system, which combines a new model called AlphaProof with an updated and improved AlphaGeometry 2, can tackle all kinds of mathematical problems and develop sophisticated answers.

The new system managed to answer the most difficult IMO question, which only five of the 609 human contestants in last week’s competition managed to solve. That said, the new system is not perfect: on two of the six problems in the IMO, it did not manage to find a solution, and on one problem it took three days to reach the correct answer. Human competitors have to solve three questions in four-and-a-half hours, so they can average no more than 90 minutes per question.

Google DeepMind researchers said the new system is a step towards more powerful AI models that will be able to plan and reason about complex tasks, although they cautioned that the method would work best in situations where there was a clear way to determine if an output was valid. This is the case, for example, in software coding, where the code will only compile and run if it is valid. David Silver, one of the Google DeepMind researchers who worked on the new system, said it might also work in areas where humans could provide unambiguous feedback about whether the solution the AI produced was a good one.
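
To make the coding example concrete, here is a minimal Python sketch of an automatic validity check. It is an illustration only, not anything Google DeepMind described: the function name `is_valid_python` and the sample candidates are hypothetical, and the check relies on Python's built-in `compile()`, which only confirms that source text parses, not that the program is correct.

```python
def is_valid_python(source: str) -> bool:
    """Return True if the candidate source at least compiles (parses) as Python."""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


# Two hypothetical outputs from a code-generating model; the second is missing a colon.
candidates = [
    "def add(a, b): return a + b",
    "def add(a, b) return a + b",
]

# Keep only the candidates that pass the automatic check.
accepted = [c for c in candidates if is_valid_python(c)]
print(accepted)
```

The point of the sketch is that the check needs no human judgment: every candidate either passes or fails, which is the property the researchers said their method depends on.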

Google DeepMind said it would incorporate insights from the new system into future versions of its Gemini AI models, although it did not say exactly how this would be done or how soon Gemini might see these upgraded mathematical abilities.

Silver acknowledged that in many real-world situations the validity of an answer is highly subjective, or the soundness of a solution can only be determined after a long time period. He said this would make it harder to apply the methods Google DeepMind used for the IMO problems to these kinds of real-world problems.

AlphaZero to the rescue

Unlike other well-known AI models that consist of a single large neural network—a kind of AI software loosely based on the human brain—the AlphaProof system involves multiple neural networks, each performing different functions.

A large language model (LLM), in this case Google’s Gemini model, is used as one part of the process. But the LLM does not itself do the mathematical reasoning. LLMs, which underpin popular AI chatbots such as Gemini, OpenAI’s ChatGPT, Anthropic’s Claude, and Meta’s AI chatbot, have struggled with solving math problems unless given access to outside tools, such as calculators or specialized math software.

Instead, the LLM is fine-tuned to translate text-based mathematical problems into a formal mathematical language. It then passes the problem to a different AI model, Google DeepMind’s AlphaZero, which was developed in 2017 and originally used to learn to play the strategy board games chess, go, and shogi at superhuman levels. But it turns out that AlphaZero can be used to puzzle out all kinds of problems in systems with clear rules and an easy way of keeping score.
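
As a rough illustration of what translating a text-based problem into a formal mathematical language can look like, here is a deliberately simple statement written in Lean 4 syntax. The theorem name `add_in_either_order` is hypothetical, the example is far easier than any IMO problem, and it is not taken from Google DeepMind's system.

```lean
-- Informal statement: "Adding two natural numbers in either order gives the same result."
-- Formal statement and proof in Lean 4, using only the core library.
theorem add_in_either_order (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

Once a problem is in this form, a proof checker can accept or reject each proposed step mechanically, which is what gives the search component a clear way of keeping score.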

In this case, the AlphaZero component is trained to suggest proof steps for the problem in Lean, a mathematical programming language. If the proof step is valid, it will compile correctly in Lean. If it isn’t, it won’t. This provides a reward signal, much like points in a video game, to AlphaZero. In this way, the AlphaZero component of AlphaProof learns by trial and error to take steps that are more likely to result in valid solutions. AlphaProof was trained on about one million examples of IMO problems in the weeks leading up to the competition, Google DeepMind said, and it continued to improve while working on the IMO contest problems.
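
The trial-and-error idea can be sketched with a toy Python example. This is not AlphaZero or AlphaProof, which rely on neural networks and search; it is a much simpler stand-in in which every name (`CANDIDATE_STEPS`, `verifier`, `train`) is hypothetical, and the verifier is hard-coded rather than a real Lean checker. It only shows how a pass-or-fail signal can shift a learner toward steps that check out.

```python
import random

# Hypothetical candidate "proof steps" for the goal a + b = b + a; only one is valid.
CANDIDATE_STEPS = ["apply add_comm", "apply mul_comm", "apply add_assoc"]


def verifier(step: str) -> bool:
    """Stand-in for a proof checker: accepts only the step that proves the goal."""
    return step == "apply add_comm"


def train(episodes: int = 200, seed: int = 0) -> dict[str, float]:
    """Sample steps in proportion to their weights and reinforce accepted ones."""
    rng = random.Random(seed)
    # Start with equal preference for every candidate step.
    weights = {step: 1.0 for step in CANDIDATE_STEPS}
    for _ in range(episodes):
        steps = list(weights)
        step = rng.choices(steps, weights=[weights[s] for s in steps])[0]
        # Reward is 1 when the verifier accepts the step and 0 otherwise.
        reward = 1.0 if verifier(step) else 0.0
        weights[step] += reward
    return weights


weights = train()
best_step = max(weights, key=weights.get)
print(best_step, weights)
```

Because rejected steps never gain weight, the learner samples the accepted step more and more often as training goes on, a crude analogue of learning to take steps that are more likely to result in valid solutions.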

In cases where the problem involved geometry, the problem was given instead to AlphaGeometry 2. AlphaGeometry 2 is also a hybrid system, combining an LLM component with a component that uses symbolic reasoning. The new AlphaGeometry could solve 83% of IMO geometry problems compared to just 53% for its predecessor. In one case, AlphaGeometry was able to solve a highly complex geometry problem in just 19 seconds, a feat more akin to a flash of inspiration than a brute-force approach based on endless trial and error. In another case, the proof AlphaGeometry offered initially confused some mathematicians who examined it, but they determined it was actually an elegant and highly unusual way of solving the problem.

Impact on human mathematicians

Pushmeet Kohli, who heads Google DeepMind’s AI for science division, said he saw AlphaProof and AlphaGeometry 2 primarily as tools for helping mathematicians in their work. Silver said he did not see these new mathematical AIs challenging the relevance of academic mathematicians.

But Timothy Gowers, a director of research in mathematics at the University of Cambridge and a past winner of the Fields Medal, reviewed the proofs AlphaProof and AlphaGeometry 2 produced and said he came away impressed. (The Fields Medal is awarded only once every four years to two to four mathematicians under the age of 40 who have contributed the most to the field.) “I could recognize familiar-looking arguments that had come out of the system,” he said.

He also said that some of the problems required him, as a human mathematician, “to dig quite deep” and come up with what he called “a sort of magic key” that suddenly turns a problem that looks unsolvable into one that is eminently solvable. Gowers said he was surprised that the system had discovered a few of these magic keys because his intuition is that they would be difficult to stumble upon by naïve trial and error without any underlying understanding of the mathematical principles involved. But he said he reserved judgment as to whether this meant AlphaProof had actually developed something akin to mathematical intuition. He said more research would be needed to understand exactly how the system managed to puzzle out answers to the IMO problems.

Gowers noted that IMO problems were much simpler than what research mathematicians work on. But, compared to Kohli and Silver, Gowers was far less sanguine about what the future would hold if AI models kept improving at the current clip. “I actually think that when computers become really good at finding extremely hard proofs, that’s more or less game over for mathematical research,” he said. “I’m not trying to suggest that we’re all that close to that at the moment, but I just, I’m thinking a long way ahead, but how long ahead that really is, is very hard to say.”

Correction, July 25: An earlier version of this story misidentified the name of Google DeepMind’s new AI system for solving complex mathematical proofs. It is called AlphaProof, not AlphaSolver.

About the Author

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

