AI ‘godfather’ Yoshua Bengio believes he’s found a technical fix for AI’s biggest risks

For the past several years, Yoshua Bengio, a professor at the Université de Montréal whose work helped lay the foundations of modern deep learning, has been one of the AI industry’s most alarmed voices, warning that superintelligent systems could pose an existential threat to humanity—particularly because of their potential for self-preservation and deception.

In a new interview with Fortune, however, the deep-learning pioneer says his latest research points to a technical solution for AI’s biggest safety risks. As a result, his optimism has risen “by a big margin” over the past year, he said.

Bengio’s nonprofit, LawZero, which launched in June, was created to develop new technical approaches to AI safety based on research led by Bengio. Today, the organization, backed by the Gates Foundation and existential-risk funders such as Coefficient Giving (formerly Open Philanthropy) and the Future of Life Institute, announced that it has appointed a high-profile board and global advisory council to guide Bengio’s research and advance what he calls a “moral mission” to develop AI as a global public good.

The board includes NIKE Foundation founder Maria Eitel as chair, along with Mariano-Florentino Cuellar, president of the Carnegie Endowment for International Peace, and historian Yuval Noah Harari. Bengio himself will also serve.

Bengio felt ‘desperate’

Bengio’s shift to a more optimistic outlook is striking. Bengio shared the Turing Award, computer science’s equivalent of the Nobel Prize, with fellow AI ‘godfathers’ Geoff Hinton and Yann LeCun in 2019. But like Hinton, he grew increasingly concerned about the risks of ever more powerful AI systems in the wake of ChatGPT’s launch in November 2022. LeCun, by contrast, has said he does not think today’s AI systems pose catastrophic risks to humanity.

Three years ago, Bengio felt “desperate” about where AI was headed, he said. “I had no notion of how we could fix the problem,” Bengio recalled. “That’s roughly when I started to understand the possibility of catastrophic risks coming from very powerful AIs,” including the loss of control over superintelligent systems.

What changed was not a single breakthrough, but a line of thinking that led him to believe there is a path forward.

“Because of the work I’ve been doing at LawZero, especially since we created it, I’m now very confident that it is possible to build AI systems that don’t have hidden goals, hidden agendas,” he says.

At the heart of that confidence is an idea Bengio calls “Scientist AI.” Rather than racing to build ever-more-autonomous agents—systems designed to book flights, write code, negotiate with other software, or replace human workers—Bengio wants to do the opposite. His team is researching how to build AI that exists primarily to understand the world, not to act in it.

A Scientist AI trained to give truthful answers

A Scientist AI would be trained to give truthful answers based on transparent, probabilistic reasoning, essentially using the scientific method or other reasoning grounded in formal logic to arrive at predictions. The AI system would not have goals of its own. It would not optimize for user satisfaction or outcomes, and it would not try to persuade, flatter, or please. Because it would have no goals, Bengio argues, it would be far less prone to manipulation, hidden agendas, or strategic deception.

Today’s frontier models are trained to pursue objectives: to be helpful, effective, or engaging. But systems that optimize for outcomes can develop hidden objectives, learn to mislead users, or resist shutdown, said Bengio. In recent experiments, models have already shown early forms of self-preserving behavior. For instance, AI lab Anthropic famously found that its Claude AI model would, in some scenarios used to test its capabilities, attempt to blackmail the human engineers overseeing it to prevent itself from being shut down.

In Bengio’s methodology, the core model would have no agenda at all, only the ability to make honest predictions about how the world works. In his vision, more capable systems can be safely built, audited, and constrained on top of that “honest,” trusted foundation.

Such a system could accelerate scientific discovery, Bengio says. It could also serve as an independent layer of oversight for more powerful agentic AIs. But the approach stands in sharp contrast to the direction most frontier labs are taking. At the World Economic Forum in Davos last year, Bengio said companies were pouring resources into AI agents. “That’s where they can make the fast buck,” he said. The pressure to automate work and reduce costs, he added, is “irresistible.”

He is not surprised by what has followed since then. “I did expect the agentic capabilities of AI systems would progress,” he says. “They have progressed in an exponential way.” What worries him is that as these systems grow more autonomous, their behavior may become less predictable, less interpretable, and potentially far more dangerous.

Preventing Bengio’s new AI from becoming a “tool of domination”

That is where governance enters the picture. Bengio does not believe a technical solution alone is sufficient. Even a safe methodology, he argues, could be misused “in the wrong hands for political reasons.” That is why LawZero is pairing its research agenda with a heavyweight board.

“We’re going to have difficult decisions to take that are not just technical,” he says—about who to collaborate with, how to share the work, and how to prevent it from becoming “a tool of domination.” The board, he says, is meant to help ensure that LawZero’s mission remains grounded in democratic values and human rights.

Bengio says he has spoken with leaders across the major AI labs, and many share his concerns. But, he adds, companies like OpenAI and Anthropic believe they must remain at the frontier to do anything positive with AI. Competitive pressure pushes them towards building ever more powerful AI systems—and towards a self-image in which their work and their organizations are inherently beneficial.

“Psychologists call it motivated cognition,” Bengio said. “We don’t even allow certain thoughts to arise if they threaten who we think we are.” That is how he experienced his AI research, he pointed out. “Until it kind of exploded in my face thinking about my children, whether they would have a future.”

For an AI leader who once feared that advanced AI might be uncontrollable by design, Bengio’s newfound hopefulness seems like a positive signal, though he admits that his view is not widely shared among researchers and organizations focused on the potential catastrophic risks of AI.

But he does not back down from his belief that a technical solution does exist. “I’m more and more confident that it can be done in a reasonable number of years,” he said, “so that we might be able to actually have an impact before these guys get so powerful that their misalignment causes terrible problems.”
