
Reliable ‘reasoning’ AI agents may be just around the corner thanks to DeepSeek’s innovations, say researchers

By David Meyer
February 6, 2025, 11:40 AM ET
The DeepSeek AI app. (Photo: Jaap Arriens/NurPhoto/Getty Images)

Innovations made by China’s DeepSeek could soon lead to the creation of AI agents that have strong reasoning skills but are also small enough to run directly on people’s computers and mobile devices, according to a researcher at the open-source AI organization Hugging Face.


Starting with OpenAI’s o1 last September, the past several months have seen the emergence of AI models that can “reason” in a sense, by doing step-by-step thinking. DeepSeek astonished the sector two weeks ago by releasing a reasoning model called R1 that could match o1’s performance in many tasks, despite the fact that it cost a fraction as much to train.

DeepSeek achieved this through a combination of clever algorithmic advances and optimization of the hardware used in the training. DeepSeek also showed that it was fairly easy to transfer reasoning capabilities from a big model like R1 into a much smaller model like Meta’s Llama-8B, in a process called distillation. What’s more, it open-sourced much of its work—big models and smaller, distilled versions—allowing others to freely build on its achievements.

Within days, the team at Hugging Face kick-started a new community project called Open-R1 that aims to replicate what DeepSeek did. Hugging Face researcher Lewis Tunstall told Fortune on Tuesday that the work was “going quite fast” and would shortly have a big impact in the red-hot field of “agentic” AI, which essentially involves AI systems that can autonomously perform tasks on the user’s behalf.

“One of the big bottlenecks [with AI agents] has always been reliability—how do you make sure these agents don’t hallucinate the wrong decision and, for example, delete all your emails?” Tunstall said. “One big advantage of these reasoning models is they seem to be far more capable of detecting their own errors and therefore potentially being more reliable. So what I expect will happen in the coming months is that people will use these methods that were pioneered by R1 to try and create reliable agents which then can run on many different devices.”

Tunstall said some of these agents would be free for people to download and use, as is the way with open-source technology, though proprietary models based on DeepSeek’s advances were also likely. “I expect many AI agent companies are looking for ways to distill the reasoning traces from DeepSeek-R1 into smaller models that can power their products,” he said, referring to the record of logical steps, as well as the model’s own internal commentary on the strategies it is trying, that the model outputs.
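As a rough illustration of what distilling reasoning traces could look like in practice, the sketch below turns a large model's outputs into fine-tuning examples for a smaller one. It is a minimal sketch under stated assumptions, not DeepSeek's or Hugging Face's pipeline: the `TeacherSample` record, the `build_distillation_set` helper, the rule of keeping only traces that end in the correct answer, and the `<think>` wrapper around the trace are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One output from the large 'teacher' model (hypothetical record layout)."""
    prompt: str
    reasoning_trace: str   # step-by-step text the teacher produced
    final_answer: str      # the answer the teacher settled on
    reference_answer: str  # the known-correct answer for the prompt


def build_distillation_set(samples: list[TeacherSample]) -> list[dict[str, str]]:
    """Keep traces that end in the correct answer and format them as prompt/completion pairs."""
    dataset = []
    for s in samples:
        if s.final_answer.strip() != s.reference_answer.strip():
            continue  # assumption: discard traces that reach a wrong answer
        # Assumption: the trace is wrapped in <think> tags ahead of the final answer.
        completion = f"<think>\n{s.reasoning_trace}\n</think>\n{s.final_answer}"
        dataset.append({"prompt": s.prompt, "completion": completion})
    return dataset


# Two hypothetical teacher outputs for the same prompt; only the first is correct.
samples = [
    TeacherSample("What is 12 * 7?", "10 * 7 = 70 and 2 * 7 = 14, so 70 + 14 = 84.", "84", "84"),
    TeacherSample("What is 12 * 7?", "12 * 7 is about 74.", "74", "84"),
]
distilled = build_distillation_set(samples)
```

The resulting pairs would then be used to fine-tune the smaller model with ordinary supervised training, so that it learns to write out similar reasoning before answering.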

Opening the rest

Much like Meta with its open-source Llama models, DeepSeek’s open-sourcing of R1 and its underlying base model, V3, came with limits. It released the models themselves, including several distilled versions of R1, along with the “weights” that allow developers to customize them. But although it outlined the algorithmic “recipe” it used to train its reasoning models, it did not release the training code that implements that recipe, nor the datasets used in that training.

The Open-R1 project is essentially about filling in those blanks, primarily so anyone can replicate the “post-training” method that DeepSeek used to refine R1 out of V3. Ultimately, the project may also make it possible to replicate the “pre-training” method that DeepSeek used to make V3, though as Hugging Face already hosts nearly a million pre-trained models in its repository, it’s focusing on the post-training aspect first.  

“We will have in a few weeks a first end-to-end demonstration of the post-training method from all the datasets all the way to the final models,” said Tunstall, adding that the next big question would be how easy it is to scale that recipe to the larger V3 model.

The project has already figured out how to implement DeepSeek’s novel reinforcement-learning algorithm, which the Chinese firm named Group Relative Policy Optimization (GRPO).

Reinforcement learning is a popular technique used to make an AI model perform better on a certain task—you give the model a problem of some kind, then give it a positive or negative signal based on whether it generates the right answer or not, and then you repeat the process until the model is very good at performing the task correctly. DeepSeek’s big innovation on this front was to remove the need for human evaluators to provide that positive or negative signal, thus making the training process much more efficient.
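The automated scoring described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Open-R1 training script: the `rule_based_reward` and `group_relative_advantages` helpers, the "Answer:" marker, and the sample completions are hypothetical. The normalization step follows the publicly described idea behind GRPO, in which each sampled answer is scored against the other answers drawn for the same problem; the population standard deviation is used here for simplicity, and the policy update that consumes these scores is omitted.

```python
from statistics import mean, pstdev


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Return 1.0 if the text after the last 'Answer:' marker matches the reference, else 0.0."""
    marker = "Answer:"  # assumption: completions end with 'Answer: <value>'
    if marker not in completion:
        return 0.0
    predicted = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if predicted == reference_answer.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sample relative to the other samples drawn for the same prompt."""
    baseline = mean(rewards)   # the group average acts as the baseline
    spread = pstdev(rewards)   # population standard deviation of the group
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four hypothetical completions sampled for the prompt "What is 12 * 7?"
completions = [
    "12 * 7 = 84. Answer: 84",
    "12 * 7 = 74. Answer: 74",
    "Let me check: 10 * 7 + 2 * 7 = 84. Answer: 84",
    "I am not sure.",
]
rewards = [rule_based_reward(c, "84") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions end up with positive scores and incorrect ones with negative scores, with no human grader involved. In a full training loop, those scores would be used to nudge the model toward the kinds of answers that scored above the group average.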

“We put together the training script for the community to immediately start playing with [GRPO], and we’ve already seen lots of very nice examples of people taking this code and then showing that, if you apply it to a whole range of different models, it actually works,” said Tunstall. “You can take a model like Llama and show and teach it how to do mathematics almost from scratch.”

Tunstall said he had already seen people taking models that are small enough to run in a browser and instilling so-called reasoning capabilities into them, using the script that Open-R1 has already released.

Now the project is working on creating datasets, mostly of math problems, that AI engineers can use to train new reasoning models with DeepSeek’s techniques. (As for DeepSeek’s own dataset, OpenAI has alleged that the Chinese company used o1’s outputs in contravention of its terms and conditions.) Other open-source projects, such as Open Thoughts, are trying to do the same thing.

“In the coming months we’re going to see this explosion of both datasets for reasoning and insights in how to actually train these models effectively, for multiple groups,” said Tunstall. “That to me is the exciting thing about open-source. It’s not a zero-sum thing. Our hope is that collectively we can decipher the secrets of R1.”

Open-source AI is a controversial subject, as such models tend to be less safe and secure than their proprietary, closed-source rivals. In fact, recent independent evaluations of DeepSeek’s R1 have found that the model’s guardrails can be easily overcome through common “jailbreaking” methods, which involve designing prompts that trick the model into bypassing its guardrails. Once the guardrails are overcome, the model can generate responses that might be harmful, including potential malware and help with potentially dangerous activities, from financial fraud to bioterrorism.

But, as Open-R1 is making clear, there’s little chance of going back. Meanwhile, open-source proponents argue that this collaborative approach could prove beneficial for the democratization and advancement of AI.





