Reliable ‘reasoning’ AI agents may be just around the corner thanks to DeepSeek’s innovations, say researchers

By David Meyer
February 6, 2025, 11:40 AM ET
The DeepSeek AI app (Jaap Arriens, NurPhoto/Getty Images)

Innovations made by China’s DeepSeek could soon lead to the creation of AI agents that have strong reasoning skills but are also small enough to run directly on people’s computers and mobile devices, according to a researcher at the open-source AI organization Hugging Face.

Starting with OpenAI’s o1 last September, the past several months have seen the emergence of AI models that can “reason” in a sense, by doing step-by-step thinking. DeepSeek astonished the sector two weeks ago by releasing a reasoning model called R1 that could match o1’s performance in many tasks, despite the fact that it cost a fraction as much to train.

DeepSeek achieved this through a combination of clever algorithmic advances and optimization of the hardware used in the training. DeepSeek also showed that it was fairly easy to transfer reasoning capabilities from a big model like R1 into a much smaller model like Meta’s Llama-8B, in a process called distillation. What’s more, it open-sourced much of its work—big models and smaller, distilled versions—allowing others to freely build on its achievements.

Within days, the team at Hugging Face kick-started a new community project called Open-R1 that aims to replicate what DeepSeek did. Hugging Face researcher Lewis Tunstall told Fortune on Tuesday that the work was “going quite fast” and would shortly have a big impact in the red-hot field of “agentic” AI, which essentially involves AI systems that can autonomously perform tasks on the user’s behalf.

“One of the big bottlenecks [with AI agents] has always been reliability—how do you make sure these agents don’t hallucinate the wrong decision and, for example, delete all your emails?” Tunstall said. “One big advantage of these reasoning models is they seem to be far more capable of detecting their own errors and therefore potentially being more reliable. So what I expect will happen in the coming months is that people will use these methods that were pioneered by R1 to try and create reliable agents which then can run on many different devices.”

Tunstall said some of these agents would be free for people to download and use, as is the way with open-source technology, though proprietary models based on DeepSeek’s advances were also likely. “I expect many AI agent companies are looking for ways to distill the reasoning traces from DeepSeek-R1 into smaller models that can power their products,” he said, referring to the record of logical steps, as well as the model’s own internal commentary on the strategies it is trying, that the model outputs.
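As a rough illustration of what distilling reasoning traces can involve, the sketch below turns a larger model's recorded outputs into training text for a smaller one. Everything in it is an assumption made for illustration: the `TeacherSample` structure, the `<think>` delimiters, and the choice to keep only traces whose final answer matches a reference are hypothetical, and are not taken from DeepSeek's or Hugging Face's code.

```python
from dataclasses import dataclass


@dataclass
class TeacherSample:
    """One recorded output from the larger 'teacher' model (hypothetical structure)."""
    prompt: str
    reasoning: str  # the step-by-step trace the teacher wrote out
    answer: str     # the teacher's final answer


def to_training_text(sample: TeacherSample) -> str:
    # The <think> delimiters are an assumed convention for this sketch.
    return (
        f"{sample.prompt}\n"
        f"<think>\n{sample.reasoning}\n</think>\n"
        f"{sample.answer}"
    )


def build_distillation_set(
    samples: list[TeacherSample], references: dict[str, str]
) -> list[str]:
    """Keep only traces whose final answer matches the reference answer."""
    kept = []
    for s in samples:
        if references.get(s.prompt) == s.answer:
            kept.append(to_training_text(s))
    return kept


# Made-up teacher outputs: the second trace reaches a wrong answer and is dropped.
samples = [
    TeacherSample("What is 7 + 5?", "7 + 5 = 12.", "12"),
    TeacherSample("What is 9 * 3?", "9 * 3 = 28.", "28"),
]
references = {"What is 7 + 5?": "12", "What is 9 * 3?": "27"}
dataset = build_distillation_set(samples, references)
```

The resulting strings would then serve as ordinary supervised fine-tuning examples for the smaller model. That training step is omitted here.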

Opening the rest

Much like Meta with its open-source Llama models, DeepSeek’s open-sourcing of R1 and its underlying base model, V3, came with limits. It released the models themselves, several distilled versions of R1, and the “weights” that allow developers to customize DeepSeek’s models. But although it outlined the algorithmic “recipe” it used to train its reasoning models, it did not release the recipe itself or the datasets used in that training.

The Open-R1 project is essentially about filling in those blanks, primarily so anyone can replicate the “post-training” method that DeepSeek used to refine R1 out of V3. Ultimately, the project may also make it possible to replicate the “pre-training” method that DeepSeek used to make V3, though as Hugging Face already hosts nearly a million pre-trained models in its repository, it’s focusing on the post-training aspect first.  

“We will have in a few weeks a first end-to-end demonstration of the post-training method from all the datasets all the way to the final models,” said Tunstall, adding that the next big question would be how easy it is to scale that recipe to the larger V3 model.

The project has already figured out how to implement DeepSeek’s novel reinforcement-learning algorithm, which the Chinese firm named Group Relative Policy Optimization (GRPO).

Reinforcement learning is a popular technique used to make an AI model perform better on a certain task—you give the model a problem of some kind, then give it a positive or negative signal based on whether it generates the right answer or not, and then you repeat the process until the model is very good at performing the task correctly. DeepSeek’s big innovation on this front was to remove the need for human evaluators to provide that positive or negative signal, thus making the training process much more efficient.
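The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DeepSeek's or Open-R1's implementation: it assumes a rule-based reward that simply checks the final answer, and it scores each sampled answer relative to the rest of its group, which is how GRPO's "group relative" signal is commonly described. The function names and the sample answers are hypothetical, and the model update itself is left out.

```python
from statistics import mean, pstdev


def reward(answer: str, expected: str) -> float:
    """Rule-based check (assumed): 1.0 for a correct final answer, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled answer relative to the others in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every answer gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of answers sampled from a model for "What is 12 * 12?"
samples = ["144", "124", "144", "142"]
rewards = [reward(s, "144") for s in samples]
advantages = group_relative_advantages(rewards)

# Correct answers end up with a positive signal, wrong ones with a negative signal.
for s, a in zip(samples, advantages):
    print(s, round(a, 3))
```

In a full training run, these per-answer scores would weight the update that makes correct answers more likely. Because the check is automatic, no human has to grade each answer.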

“We put together the training script for the community to immediately start playing with [GRPO], and we’ve already seen lots of very nice examples of people taking this code and then showing that, if you apply it to a whole range of different models, it actually works,” said Tunstall. “You can take a model like Llama and show and teach it how to do mathematics almost from scratch.”

Tunstall said he had already seen people taking models that are small enough to run in a browser and instilling so-called reasoning capabilities into them, using the script that Open-R1 has already released.

Now the project is working on creating datasets, mostly of math problems, that AI engineers can use to train new reasoning models with DeepSeek’s techniques. (As for DeepSeek’s own dataset, OpenAI has alleged that the Chinese company used o1’s outputs in contravention of its terms and conditions.) Other open-source projects, such as Open Thoughts, are trying to do the same thing.

“In the coming months we’re going to see this explosion of both datasets for reasoning and insights in how to actually train these models effectively, for multiple groups,” said Tunstall. “That to me is the exciting thing about open-source. It’s not a zero-sum thing. Our hope is that collectively we can decipher the secrets of R1.”

Open-source AI is a controversial subject, as such models tend to be less safe and secure than their proprietary, closed-source rivals. In fact, recent independent evaluations of DeepSeek’s R1 have found the model’s guardrails can be easily overcome through common “jailbreaking” methods—which involve designing prompts that trick the model into bypassing its guardrails. Once overcome, the model can generate responses that might be harmful, including generating potential malware and offering people help in potentially dangerous activities, from financial fraud to bioterrorism.

But, as Open-R1 is making clear, there’s little chance of going back. Meanwhile, open-source proponents argue that this collaborative approach could prove beneficial for the democratization and advancement of AI.


© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.