
Reliable ‘reasoning’ AI agents may be just around the corner thanks to DeepSeek’s innovations, say researchers

By David Meyer
February 6, 2025, 11:40 AM ET
The DeepSeek AI app. (Jaap Arriens, NurPhoto/Getty Images)

Innovations made by China’s DeepSeek could soon lead to the creation of AI agents that have strong reasoning skills but are also small enough to run directly on people’s computers and mobile devices, according to a researcher at the open-source AI organization Hugging Face.

Starting with OpenAI’s o1 last September, the past several months have seen the emergence of AI models that can “reason” in a sense, by doing step-by-step thinking. DeepSeek astonished the sector two weeks ago by releasing a reasoning model called R1 that could match o1’s performance in many tasks despite costing a fraction as much to train.

DeepSeek achieved this through a combination of clever algorithmic advances and optimization of the hardware used in the training. DeepSeek also showed that it was fairly easy to transfer reasoning capabilities from a big model like R1 into a much smaller model like Meta’s Llama-8B, in a process called distillation. What’s more, it open-sourced much of its work—big models and smaller, distilled versions—allowing others to freely build on its achievements.

Within days, the team at Hugging Face kick-started a new community project called Open-R1 that aims to replicate what DeepSeek did. Hugging Face researcher Lewis Tunstall told Fortune on Tuesday that the work was “going quite fast” and would shortly have a big impact in the red-hot field of “agentic” AI, which essentially involves AI systems that can autonomously perform tasks on the user’s behalf.

“One of the big bottlenecks [with AI agents] has always been reliability—how do you make sure these agents don’t hallucinate the wrong decision and, for example, delete all your emails?” Tunstall said. “One big advantage of these reasoning models is they seem to be far more capable of detecting their own errors and therefore potentially being more reliable. So what I expect will happen in the coming months is that people will use these methods that were pioneered by R1 to try and create reliable agents which then can run on many different devices.”

Tunstall said some of these agents would be free for people to download and use, as is the way with open-source technology, though proprietary models based on DeepSeek’s advances were also likely. “I expect many AI agent companies are looking for ways to distill the reasoning traces from DeepSeek-R1 into smaller models that can power their products,” he said, referring to the record of logical steps, as well as the model’s own internal commentary on the strategies it is trying, that the model outputs.
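To make the idea of distilling reasoning traces concrete, here is a minimal sketch of how a larger model's traces might be turned into training examples for a smaller one. It is not DeepSeek's or Hugging Face's actual pipeline: the `TeacherTrace` class, the function names, the `<think>` tag layout, and the choice to keep only traces that pass a correctness check are all assumptions made for illustration.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class TeacherTrace:
    """One sample from the large 'teacher' model (hypothetical structure)."""
    prompt: str
    reasoning: str  # the step-by-step trace the teacher wrote out
    answer: str     # the teacher's final answer


def to_sft_example(trace: TeacherTrace) -> dict[str, str]:
    """Format one trace as a prompt/completion pair for supervised fine-tuning.

    The <think>...</think> wrapper is an assumed layout, used here only so the
    smaller model learns to emit its reasoning before its answer.
    """
    completion = (
        f"<think>\n{trace.reasoning.strip()}\n</think>\n{trace.answer.strip()}"
    )
    return {"prompt": trace.prompt.strip(), "completion": completion}


def build_distillation_set(
    traces: Iterable[TeacherTrace],
    is_correct: Callable[[TeacherTrace], bool],
) -> list[dict[str, str]]:
    """Keep only traces that pass the check, then format them for training."""
    return [to_sft_example(t) for t in traces if is_correct(t)]


# Usage: two teacher samples for the same question, one of them wrong.
traces = [
    TeacherTrace("What is 2 + 2?", "2 plus 2 is 4.", "4"),
    TeacherTrace("What is 2 + 2?", "2 plus 2 is 5.", "5"),
]
dataset = build_distillation_set(traces, is_correct=lambda t: t.answer == "4")
```

The resulting list of prompt/completion pairs is the kind of dataset a smaller model could then be fine-tuned on with an ordinary supervised training loop.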

Opening the rest

Much like Meta with its open-source Llama models, DeepSeek’s open-sourcing of R1 and its underlying base model, V3, came with limits. It released the models themselves, along with several distilled versions of R1 and the “weights” that allow developers to customize DeepSeek’s models. But although it outlined the algorithmic “recipe” it used to train its reasoning models, it released neither the code that implements that recipe nor the datasets used in that training.

The Open-R1 project is essentially about filling in those blanks, primarily so anyone can replicate the “post-training” method that DeepSeek used to refine R1 out of V3. Ultimately, the project may also make it possible to replicate the “pre-training” method that DeepSeek used to make V3, though as Hugging Face already hosts nearly a million pre-trained models in its repository, it’s focusing on the post-training aspect first.  

“We will have in a few weeks a first end-to-end demonstration of the post-training method from all the datasets all the way to the final models,” said Tunstall, adding that the next big question would be how easy it is to scale that recipe to the larger V3 model.

The project has already figured out how to implement DeepSeek’s novel reinforcement-learning algorithm, which the Chinese firm named Group Relative Policy Optimization (GRPO).

Reinforcement learning is a popular technique used to make an AI model perform better on a certain task—you give the model a problem of some kind, then give it a positive or negative signal based on whether it generates the right answer or not, and then you repeat the process until the model is very good at performing the task correctly. DeepSeek’s big innovation on this front was to remove the need for human evaluators to provide that positive or negative signal, thus making the training process much more efficient.
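The feedback loop described above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not Open-R1's training script: it scores each sampled answer with a rule-based check instead of a human rating, then compares each score with the average of its group, which is the "group relative" part of GRPO. The function names, the `\boxed{...}` answer format, and the use of the population standard deviation are assumptions for this sketch.

```python
import re
from statistics import mean, pstdev


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Return 1.0 if the final \\boxed{...} answer matches the reference, else 0.0.

    No human evaluator is involved: the positive or negative signal comes
    from a simple automatic check (assumed answer format: \\boxed{...}).
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0


def group_relative_advantages(
    rewards: list[float], eps: float = 1e-6
) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    Samples that beat the group average get a positive advantage and are
    reinforced; samples below the average get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four sampled answers to the same problem, whose correct answer is 4.
completions = [
    "2 + 2 = 4, so \\boxed{4}",
    "I think it is \\boxed{5}",
    "Not sure.",
    "Adding gives \\boxed{ 4 }",
]
rewards = [rule_based_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

In a full training run, these advantages would weight the policy-gradient update for each sampled answer. That step needs a model and an optimizer and is omitted here.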

“We put together the training script for the community to immediately start playing with [GRPO], and we’ve already seen lots of very nice examples of people taking this code and then showing that, if you apply it to a whole range of different models, it actually works,” said Tunstall. “You can take a model like Llama and show and teach it how to do mathematics almost from scratch.”

Tunstall said he had already seen people taking models that are small enough to run in a browser and instilling so-called reasoning capabilities into them, using the script that Open-R1 has already released.

Now the project is working on creating datasets, mostly of math problems, that AI engineers can use to train new reasoning models with DeepSeek’s techniques. (As for DeepSeek’s own dataset, OpenAI has alleged that the Chinese company used o1’s outputs in contravention of its terms and conditions.) Other open-source projects, such as Open Thoughts, are trying to do the same thing.

“In the coming months we’re going to see this explosion of both datasets for reasoning and insights in how to actually train these models effectively, for multiple groups,” said Tunstall. “That to me is the exciting thing about open-source. It’s not a zero-sum thing. Our hope is that collectively we can decipher the secrets of R1.”

Open-source AI is a controversial subject, as such models tend to be less safe and secure than their proprietary, closed-source rivals. In fact, recent independent evaluations of DeepSeek’s R1 have found the model’s guardrails can be easily overcome through common “jailbreaking” methods, which involve designing prompts that trick the model into bypassing its guardrails. Once the guardrails are overcome, the model can generate responses that might be harmful, including potential malware and help with potentially dangerous activities, from financial fraud to bioterrorism.

But, as Open-R1 is making clear, there’s little chance of going back. Meanwhile, open-source proponents argue that this collaborative approach could prove beneficial for the democratization and advancement of AI.



