Reliable ‘reasoning’ AI agents may be just around the corner thanks to DeepSeek’s innovations, say researchers

By David Meyer
February 6, 2025, 11:40 AM ET
The DeepSeek AI app (Jaap Arriens/NurPhoto via Getty Images)

Innovations made by China’s DeepSeek could soon lead to the creation of AI agents that have strong reasoning skills but are also small enough to run directly on people’s computers and mobile devices, according to a researcher at the open-source AI organization Hugging Face.

Starting with OpenAI’s o1 in September 2024, the past several months have seen the emergence of AI models that can “reason” in a sense, by doing step-by-step thinking. DeepSeek astonished the sector two weeks ago by releasing a reasoning model called R1 that could match o1’s performance in many tasks, even though it cost a fraction as much to train.

DeepSeek achieved this through a combination of clever algorithmic advances and optimization of the hardware used in the training. DeepSeek also showed that it was fairly easy to transfer reasoning capabilities from a big model like R1 into a much smaller model like Meta’s Llama-8B, in a process called distillation. What’s more, it open-sourced much of its work—big models and smaller, distilled versions—allowing others to freely build on its achievements.

Within days, the team at Hugging Face kick-started a new community project called Open-R1 that aims to replicate what DeepSeek did. Hugging Face researcher Lewis Tunstall told Fortune on Tuesday that the work was “going quite fast” and would shortly have a big impact in the red-hot field of “agentic” AI, which essentially involves AI systems that can autonomously perform tasks on the user’s behalf.

“One of the big bottlenecks [with AI agents] has always been reliability—how do you make sure these agents don’t hallucinate the wrong decision and, for example, delete all your emails?” Tunstall said. “One big advantage of these reasoning models is they seem to be far more capable of detecting their own errors and therefore potentially being more reliable. So what I expect will happen in the coming months is that people will use these methods that were pioneered by R1 to try and create reliable agents which then can run on many different devices.”

Tunstall said some of these agents would be free for people to download and use, as is the way with open-source technology, though proprietary models based on DeepSeek’s advances were also likely. “I expect many AI agent companies are looking for ways to distill the reasoning traces from DeepSeek-R1 into smaller models that can power their products,” he said, referring to the record of logical steps, as well as the model’s own internal commentary on the strategies it is trying, that the model outputs.
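To make the distillation idea concrete, here is a minimal Python sketch of how reasoning traces from a large “teacher” model might be turned into training examples for a smaller one. It is an illustration under stated assumptions, not DeepSeek’s or Hugging Face’s actual pipeline: the `TeacherSample` fields, the `build_distillation_set` helper, the toy data, and the choice to keep only traces whose final answer matches a reference are all hypothetical. The `<think>` tags mirror the way R1 delimits its reasoning in its output.

```python
from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One output sampled from the large teacher model (hypothetical schema)."""
    question: str
    reasoning: str   # the step-by-step trace the teacher produced
    answer: str      # the teacher's final answer
    reference: str   # known-correct answer, used only for filtering


def build_distillation_set(samples: list[TeacherSample]) -> list[dict[str, str]]:
    """Turn teacher outputs into prompt/completion pairs for fine-tuning a small model.

    Assumption of this sketch: traces that end in a wrong answer are dropped,
    so the student only imitates reasoning that reached the right result.
    """
    dataset = []
    for s in samples:
        if s.answer.strip() != s.reference.strip():
            continue  # skip traces that ended in a wrong answer
        completion = f"<think>\n{s.reasoning}\n</think>\n{s.answer}"
        dataset.append({"prompt": s.question, "completion": completion})
    return dataset


# Toy data, invented purely for illustration.
samples = [
    TeacherSample("What is 12 * 3?", "12 * 3 = 12 + 12 + 12 = 36.", "36", "36"),
    TeacherSample("What is 7 + 8?", "7 + 8 is about 16.", "16", "15"),
]
distillation_set = build_distillation_set(samples)
```

The resulting list could then be passed to any standard supervised fine-tuning routine for the smaller model; the fine-tuning step itself is outside the scope of this sketch.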

Opening the rest

Much like Meta with its open-source Llama models, DeepSeek’s open-sourcing of R1 and its underlying base model, V3, came with limits. It released the models themselves, several distilled versions of R1, and the “weights” that allow developers to customize DeepSeek’s models. But although it outlined the algorithmic “recipe” it used to train its reasoning models, it released neither the training code that implements that recipe nor the datasets used in that training.

The Open-R1 project is essentially about filling in those blanks, primarily so anyone can replicate the “post-training” method that DeepSeek used to refine R1 out of V3. Ultimately, the project may also make it possible to replicate the “pre-training” method that DeepSeek used to make V3, but because Hugging Face already hosts nearly a million pre-trained models in its repository, it’s focusing on the post-training aspect first.

“We will have in a few weeks a first end-to-end demonstration of the post-training method from all the datasets all the way to the final models,” said Tunstall, adding that the next big question would be how easy it is to scale that recipe to the larger V3 model.

The project has already figured out how to implement DeepSeek’s novel reinforcement-learning algorithm, which the Chinese firm named Group Relative Policy Optimization (GRPO).

Reinforcement learning is a popular technique used to make an AI model perform better on a certain task—you give the model a problem of some kind, then give it a positive or negative signal based on whether it generates the right answer or not, and then you repeat the process until the model is very good at performing the task correctly. DeepSeek’s big innovation on this front was to remove the need for human evaluators to provide that positive or negative signal, thus making the training process much more efficient.
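The loop described above can be sketched in a few lines of Python. This is a simplified illustration, not DeepSeek’s code or the Open-R1 training script: it shows only how an automatic, rule-based check can stand in for a human evaluator, and how GRPO-style training scores each sampled answer relative to the other answers in its group (reward minus the group mean, divided by the group’s standard deviation). The function names, the “last number in the text” answer check, and the sample completions are assumptions made for the example; the step that actually updates the model is omitted.

```python
import re
import statistics


def rule_based_reward(completion: str, expected_answer: str) -> float:
    """Automatic signal: 1.0 if the last number in the text matches the expected answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == expected_answer else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sample against its own group: (reward - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of answers sampled from a model for the prompt "What is 2 + 2?"
completions = [
    "2 + 2 = 5",
    "Adding 2 and 2 gives 4",
    "Step 1: 2 + 2. The answer is 4",
    "I think it is 3",
]
rewards = [rule_based_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for text, adv in zip(completions, advantages):
    print(f"{adv:+.2f}  {text}")
```

Answers that beat the group average receive a positive score and would be reinforced, while the rest receive a negative one. Because the comparison is made within the group, no separately trained scoring model is needed in this sketch.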

“We put together the training script for the community to immediately start playing with [GRPO], and we’ve already seen lots of very nice examples of people taking this code and then showing that, if you apply it to a whole range of different models, it actually works,” said Tunstall. “You can take a model like Llama and show and teach it how to do mathematics almost from scratch.”

Tunstall said he had already seen people taking models that are small enough to run in a browser and instilling so-called reasoning capabilities into them, using the script that Open-R1 has already released.

Now the project is working on creating datasets, mostly of math problems, that AI engineers can use to train new reasoning models with DeepSeek’s techniques. (As for DeepSeek’s own dataset, OpenAI has alleged that the Chinese company used o1’s outputs in contravention of its terms and conditions.) Other open-source projects, such as Open Thoughts, are trying to do the same thing.

“In the coming months we’re going to see this explosion of both datasets for reasoning and insights in how to actually train these models effectively, for multiple groups,” said Tunstall. “That to me is the exciting thing about open-source. It’s not a zero-sum thing. Our hope is that collectively we can decipher the secrets of R1.”

Open-source AI is a controversial subject, as such models tend to be less safe and secure than their proprietary, closed-source rivals. In fact, recent independent evaluations of DeepSeek’s R1 have found the model’s guardrails can be easily overcome through common “jailbreaking” methods—which involve designing prompts that trick the model into bypassing its guardrails. Once overcome, the model can generate responses that might be harmful, including generating potential malware and offering people help in potentially dangerous activities, from financial fraud to bioterrorism.

But, as Open-R1 is making clear, there’s little chance of going back. Meanwhile, open-source proponents argue that this collaborative approach could prove beneficial for the democratization and advancement of AI.
