
Can AI be used to control safety-critical systems? A U.K.-funded research program aims to find out

By Jeremy Kahn, Editor, AI
June 10, 2025, 1:24 PM ET
Image: cooling towers of a nuclear power plant and electric transmission lines.
The U.K.'s Advanced Research and Invention Agency (ARIA) is funding a project to use frontier AI models to design and test new control algorithms for safety-critical systems, such as nuclear power plants and power grids. Milan Jaros—Bloomberg via Getty Images

Hello and welcome to Eye on AI. In this edition…Meta hires Scale AI founder for new ‘superintelligence’ drive…OpenAI on track for $10 billion in annual recurring revenue…Study says ‘reasoning models’ can’t really reason.

Today’s most advanced AI models are relatively useful for lots of things—writing software code, conducting research, summarizing complex documents, drafting business correspondence, editing, generating images and music, role-playing human interactions; the list goes on. But relatively is the key word here. As anyone who uses these models soon discovers, they remain frustratingly error-prone and erratic. So how could anyone think these systems could be used to run critical infrastructure, such as electrical grids, air traffic control, communications networks, or transportation systems?

Yet that is exactly what a project funded by the U.K.’s Advanced Research and Invention Agency (ARIA) hopes to do. ARIA was designed to be somewhat similar to the U.S. Defense Advanced Research Projects Agency (DARPA), providing government funding for moonshot research with potential governmental or strategic applications. The £59 million ($80 million) ARIA project, called the Safeguarded AI Program, aims to find a way to combine AI “world-models” with mathematical proofs that can guarantee that a system’s outputs are valid.

David Dalrymple, the machine learning researcher who is leading the ARIA effort, told me that the idea was to use advanced AI models to create a “production facility” that would churn out domain-specific control algorithms for critical infrastructure. These algorithms would be mathematically tested to ensure that they meet the required performance specifications. If the control algorithms pass this test, the controllers—but not the frontier AI models that developed them—would be deployed to help run critical infrastructure more efficiently.
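
To make that workflow concrete, here is a minimal, self-contained Python sketch of the pattern Dalrymple describes: a generator proposes candidate controllers, a deterministic checker tests each one against a safety specification, and only candidates that pass are deployed. The controller functions and the toy bounds-checking spec are my illustrative assumptions, not ARIA’s design; the real program would substitute mathematical proof for the finite testing used here.

```python
# Toy version of the "production facility" loop: propose controllers,
# check each against the spec, deploy only what passes. All names and
# the bounds-based spec below are hypothetical illustrations.

def spec_holds(controller, test_inputs, lo=-100.0, hi=100.0):
    """Deterministic stand-in for verification: output stays within safe bounds."""
    return all(lo <= controller(x) <= hi for x in test_inputs)

def candidate_a(error):
    return 2.0 * error                           # unbounded output: unsafe

def candidate_b(error):
    return max(-100.0, min(100.0, 2.0 * error))  # clamped output: safe

test_inputs = [i * 10.0 for i in range(-20, 21)]
deployable = [c.__name__ for c in (candidate_a, candidate_b)
              if spec_holds(c, test_inputs)]
print(deployable)  # ['candidate_b'] -- only the checked controller ships
```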

Dalrymple (who is known by his social media handle Davidad) gives the example of the U.K.’s electricity grid. The grid’s operator acknowledges that if it could balance supply and demand on the grid more optimally, it could save the £3 billion ($4 billion) it spends each year essentially paying to keep excess generation capacity up and running to avoid the possibility of a sudden blackout, he says. Better control algorithms could reduce those costs.

Besides the energy sector, ARIA is also looking at applications in supply chain logistics, biopharmaceutical manufacturing, self-driving vehicles, clinical trial design, and electric vehicle battery management.

AI to develop new control algorithms

Frontier AI models may now be reaching the point where they can automate algorithmic research and development, Davidad says. “The idea is, let’s take that capability and turn it to narrow AI R&D,” he tells me. Narrow AI usually refers to AI systems designed to perform one particular, narrowly defined task at superhuman levels, rather than systems that can perform many different kinds of tasks.

The challenge, even with these narrow AI systems, is then coming up with mathematical proofs to guarantee that their outputs will always meet the required technical specification. There’s an entire field known as “formal verification” that involves mathematically proving that software will always provide valid outputs under given conditions—but it’s notoriously difficult to apply to neural network-based AI systems. “Verifying even a narrow AI system is something that’s very labor intensive in terms of a cognitive effort required,” Davidad says. “And so it hasn’t been worthwhile historically to do that work of verifying except for really, really specialized applications like passenger aviation autopilots or nuclear power plant control.”
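
For a flavor of what formal verification looks like in practice, here is a small sketch using the off-the-shelf Z3 SMT solver (via the real z3-solver Python package). Unlike ordinary testing, the solver reasons over all possible real-valued inputs at once: the toy clamped controller is my own invention, but the proof obligation (show that no input whatsoever can push the output outside its safety bound) is the genuine shape of the problem.

```python
# pip install z3-solver
from z3 import Real, If, Or, Solver, unsat

demand = Real("demand")
supply = Real("supply")
error = demand - supply

# Hypothetical controller: clamp the correction signal to +/-100 units.
output = If(error > 100, 100, If(error < -100, -100, error))

solver = Solver()
# Ask the solver for ANY pair of inputs that violates the safety bound.
solver.add(Or(output > 100, output < -100))

if solver.check() == unsat:
    print("Proved: the output is bounded for every possible input.")
else:
    print("Counterexample found:", solver.model())
```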

This kind of formally verified software won’t fail because a bug causes an erroneous output. It can sometimes break down because it encounters conditions that fall outside its design specifications—for instance, a load-balancing algorithm for an electrical grid might not be able to handle an extreme solar storm that shorts out all of the grid’s transformers simultaneously. But even then, the software is usually designed to “fail safe” and revert to manual control.
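
A sketch of that fail-safe pattern, under my own assumed design envelope (the numbers are arbitrary): the verified controller only acts inside the conditions it was proved against, and anything outside them hands control back to human operators.

```python
# Illustrative fail-safe wrapper; the envelope and gain are made-up values.
DESIGN_ENVELOPE = (-500.0, 500.0)   # input range the controller was verified for

def safe_controller(grid_imbalance):
    lo, hi = DESIGN_ENVELOPE
    if not lo <= grid_imbalance <= hi:
        return "REVERT_TO_MANUAL"   # outside verified conditions: fail safe
    return max(-100.0, min(100.0, 0.5 * grid_imbalance))

print(safe_controller(200.0))    # normal operation: automatic response
print(safe_controller(9999.0))   # extreme event: hand back control
```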

ARIA is hoping to show that frontier AI models can be used both to do the laborious formal verification of the narrow AI controller and to develop the controller in the first place.

But will AI models cheat the verification tests?

This raises another challenge, though. There’s a growing body of evidence that frontier AI models are very good at “reward hacking”—essentially finding ways to cheat to accomplish a goal—as well as at lying to their users about what they’ve actually done. The AI safety nonprofit METR (short for Model Evaluation & Threat Research) recently published a blog post detailing the ways OpenAI’s o3 model tried to cheat on various tasks.

ARIA says it is hoping to find a way around this issue too. “The frontier model needs to submit a proof certificate, which is something that is written in a formal language that we’re defining in another part of the program,” Davidad says. This “new language for proofs will hopefully be easy for frontier models to generate and then also easy for a deterministic, human audited algorithm to check.” ARIA has already awarded grants for work on this formal verification process.
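
The logic behind proof certificates is an asymmetry: producing a solution may take an enormous, untrusted search, but checking a submitted certificate can be cheap, deterministic, and auditable. Here is a minimal Python illustration of that asymmetry using subset-sum as a stand-in problem; the problem choice is mine, not the program’s.

```python
from itertools import combinations

def find_certificate(numbers, target):
    """Expensive search -- the untrusted 'frontier model' role."""
    for r in range(1, len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return subset
    return None

def check_certificate(numbers, target, subset):
    """Cheap, deterministic check -- needs no trust in the searcher."""
    pool = list(numbers)
    for x in subset:
        if x not in pool:
            return False
        pool.remove(x)
    return sum(subset) == target

nums, target = [3, 9, 8, 4, 5, 7], 15
cert = find_certificate(nums, target)
print(cert, check_certificate(nums, target, cert))  # (8, 7) True
```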

Models for how this might work are starting to come into view. Google DeepMind recently developed an AI model called AlphaEvolve that is trained to search for new algorithms for applications such as managing data centers, designing new computer chips, and even figuring out ways to optimize the training of frontier AI models. Google DeepMind has also developed a system called AlphaProof that is trained to develop mathematical proofs and write them in a programming language called Lean, in which an incorrect proof simply won’t compile.
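
For a taste of what “won’t compile if the proof is incorrect” means, here is a one-line Lean 4 theorem; if the justification were wrong, Lean’s checker would reject the file. This is a minimal illustration of the language itself, not anything produced by AlphaProof.

```lean
-- Commutativity of addition on the natural numbers, proved by appeal
-- to a lemma in Lean's standard library. A bogus proof term here would
-- fail to type-check, so the file simply would not compile.
theorem add_comm' (a b : Nat) : a + b = b + a := Nat.add_comm a b
```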

ARIA is currently accepting applications from teams that want to run the core “AI production facility,” with the winner of the £18 million grant to be announced on October 1. The facility, whose location is yet to be determined, is supposed to be running by January 2026. ARIA is asking applicants to propose a new legal entity and governance structure for the facility. Davidad says ARIA does not want an existing university or a private company to run it. But the new organization, which might be a nonprofit, would partner with private entities in areas like energy, pharmaceuticals, and healthcare on specific controller algorithms. He says that in addition to the initial ARIA grant, the production facility could fund itself by charging industry for its work developing domain-specific algorithms.

It’s not clear if this plan will work. For every transformational DARPA project, many more fail. But ARIA’s bold bet here looks like one worth watching.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Want to know more about how to use AI to transform your business? Interested in what AI will mean for the fate of companies, and countries? Why not join me in Singapore on July 22 and 23 for Fortune Brainstorm AI Singapore. We will dive deep into the latest on AI agents, examine the data center build-out in Asia, and talk to top leaders from government, boardrooms, and academia in the region and beyond. You can apply to attend here.

AI IN THE NEWS

Meta hires Scale AI CEO Alexandr Wang to create new AI “superintelligence” lab. That’s according to the New York Times, which cited four unnamed sources it said were familiar with Meta’s plans. The 28-year-old Wang, who cofounded Scale, would head the new Meta unit, joined by other Scale employees. Meanwhile, Meta would invest billions of dollars into Scale, which specializes in providing training data to AI companies. The new Meta unit devoted to “artificial superintelligence,” a theoretical kind of AI that would be more intelligent than all of humanity combined, will sit alongside the existing Meta divisions responsible for building its Llama AI models, as well as its Fundamental AI Research lab (FAIR). FAIR is still headed by Meta chief scientist Yann LeCun, who has been pursuing new kinds of AI models and has said that current techniques cannot deliver artificial general intelligence (AI as capable as most humans at most tasks), let alone superintelligence.

U.K. announces “sovereign AI” push. British Prime Minister Keir Starmer said the country would invest £1 billion to build new AI data centers, increasing the computing power available in the country 20-fold. He said the U.K. government would begin using an AI assistant called “Extract” based on Google’s Gemini AI model. He announced plans to create a new “UK Sovereign AI Industry Forum” to accelerate AI adoption by British companies, with initial participation from BAE Systems, BT, and Standard Chartered. He also said that the U.K. government would help fund a new open-source data project on how molecules bind to proteins, a key consideration for drug discovery research. But Nvidia CEO Jensen Huang, who appeared alongside Starmer at a conference, noted that the country has so far lagged in building enough AI data centers. You can read more from The Guardian here and the Financial Times here.

Apple to let third-party developers access its AI models. At its WWDC developer conference, the tech giant said it would allow third-party developers to build applications that tap the abilities of its on-device AI models. But the company did not announce any updates to its long-awaited “Apple Intelligence” version of Siri. You can read more from TechCrunch here and here.

OpenAI on track for $10 billion in annual recurring revenue. The figure has doubled in the past year, driven by strong growth in the company’s consumer, business, and API products; it excludes Microsoft licensing and large one-time deals. Despite losing $5 billion last year, the company is targeting $125 billion in revenue by 2029, CNBC reported, citing an anonymous source it said was familiar with OpenAI’s figures.

EYE ON AI RESEARCH

“Reasoning” models don’t seem to actually reason. That is the conclusion of a bombshell paper called “The Illusion of Thinking” from researchers at Apple. They tested reasoning models from OpenAI (o1 and o3), DeepSeek (R1), and Anthropic (Claude 3.7 Sonnet) on a series of logic puzzles. These included the Tower of Hanoi, a game that involves moving a stack of different-sized disks across three pegs in such a way that a larger disk never sits atop a smaller one.

They found that on simple versions of the games, standard large language models (LLMs) that don’t use reasoning performed better and were far more cost-effective. The reasoning models (which the paper calls large reasoning models, or LRMs) tended to overthink the problem and hit upon spurious strategies. At medium complexity, the reasoning models did better. But at high complexity, the LRMs failed entirely. Rather than thinking longer to solve the problem, as they are supposedly designed to do, the reasoning models often thought for less time than on the medium-complexity problems and then simply abandoned the search for a correct solution. The most damning finding of the paper was that even when researchers provided the LRMs with an algorithm for solving the puzzle, the LRMs failed to apply it.
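
For reference, the algorithm in question is the classic recursion taught in introductory programming courses; the Python sketch below (my own, not from the paper) solves the puzzle in the provably minimal 2**n − 1 moves.

```python
def hanoi(n, source, target, spare, moves):
    """Move n disks from source to target, never placing a larger disk on a smaller one."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # park n-1 disks on the spare peg
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # restack the n-1 disks on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves), moves)  # 7 moves (2**3 - 1), each one legal
```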

The paper adds to a growing body of research—such as this Anthropic study—that indicates that LRMs are not actually using logic to arrive at their answers. Instead, they seem to be conducting longer, deeper searches for examples in their training data that match the problem at hand. But they don’t seem able to generalize logical rules for solving the puzzles. 

FORTUNE ON AI

Chipotle CEO says they will open a new restaurant almost every 24 hours this year, thanks to AI —by Preston Fore

Duolingo’s CEO outlined his plan to become an ‘AI-first’ company. He didn’t expect the human backlash that followed —by Sara Braun

Commentary: ‘Sovereign AI’ is political branding. The reality is closer to digital colonialism —by Nathan Benaich

AI CALENDAR

June 12: AMD Advancing AI, San Jose

July 8-11: AI for Good Global Summit, Geneva

July 13-19: International Conference on Machine Learning (ICML), Vancouver

July 22-23: Fortune Brainstorm AI Singapore. Apply to attend here.

July 26-28: World Artificial Intelligence Conference (WAIC), Shanghai. 

Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.

Oct. 6-10: World AI Week, Amsterdam

Dec. 2-7: NeurIPS, San Diego

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

BRAIN FOOD

Should college students be made to use AI? Ohio State University has announced that starting this fall, every undergraduate student will be asked to use AI in all of their coursework. In my book, Mastering AI: A Survival Guide to Our Superpowered Future, I argue that education is one area where AI will ultimately have a profoundly positive effect, despite the initial moral panic over the debut of ChatGPT. The university has said it is offering assistance to faculty to help them rework curricula and develop teaching methods to ensure that students are still learning fundamental skills in each subject area, while also learning how to use AI effectively. I am convinced that there are thoughtful ways to do this. That said, I wonder whether a single summer is enough time to implement these changes effectively. The fact that one professor quoted in this NBC affiliate Channel 4 piece on the new AI mandate said students “did not always feel like the work was really theirs” when they used AI suggests that in some cases students are not being asked to do enough critical thinking and problem-solving. The risk that students won’t learn the basics is real. Yes, teaching students how to use AI is vital to prepare them for the workforce of tomorrow. But it shouldn’t come at the expense of fundamental reasoning, writing, scientific, and research skills.

This is the online version of Eye on AI, Fortune's biweekly newsletter on how AI is shaping the future of business. Sign up for free.