
It’s getting harder to tell which company is winning the AI race, Hugging Face co-founder says

By Beatrice Nolan, Tech Reporter
May 7, 2025, 5:49 AM ET
Hugging Face cofounder and chief science officer Thomas Wolf on stage at Brainstorm AI.
An early pioneer of large language models, Hugging Face is best known for its vast repository of open-source and “open-weight” AI models. (Fortune)
  • Hugging Face’s Thomas Wolf says it’s getting harder to tell which AI model is best as traditional AI benchmarks become saturated. Going forward, Wolf said, the AI industry could rely on two new benchmarking approaches: agency‑based and use‑case‑specific.

Thomas Wolf, co‑founder and chief scientist at Hugging Face, thinks we may need new ways to measure AI models.

Wolf told the audience at Brainstorm AI in London that as AI models get more advanced, it’s becoming increasingly difficult to tell which one is performing the best.

“It’s getting hard to tell what the best model is,” he said, pointing to the nominal differences between recent releases from OpenAI and Google. “They all seem to be, actually, very close.”

“The world of benchmarks has evolved a lot. We used to have this very academic benchmark that we mostly measured the knowledge of the model on—I think the most famous was MMLU (Massive Multitask Language Understanding), which was basically a set of graduate‑level or PhD‑level questions that the model had to answer,” he said. “These benchmarks are mostly all saturated right now.”

Over the past year, there has been a growing chorus of voices from academia, industry, and policy claiming that common AI benchmarks, such as MMLU, GLUE, and HellaSwag, have reached saturation, can be gamed, and no longer reflect real‑world utility.

In February, researchers at the European Commission’s Joint Research Centre published a paper called “Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation” that found “systemic flaws in current benchmarking practices,” including misaligned incentives, construct‑validity failures, gaming of results, and data contamination.

Wolf said that in 2025 the AI industry should rely on two main types of benchmarks: one that assesses the agency of models, where LLMs are expected to carry out tasks, and another tailored to each specific use case.

Hugging Face is already working on the latter.

The company’s new program, “YourBench,” aims to help users determine which model to use for a specific task. Users feed a few documents into the program, which automatically generates a benchmark tailored to that type of work. Users can then run that benchmark against different models to see which one performs best for their use case.
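
To make the idea of a use‑case‑specific benchmark concrete, here is a minimal Python sketch. It is not Hugging Face’s implementation and does not use the program’s API. The question‑and‑answer pairs, the exact‑match scoring rule, and the names `score`, `rank_models`, `model_a`, and `model_b` are all assumptions made for illustration. The toy “models” are plain functions standing in for real systems.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical benchmark items: (question, expected answer) pairs.
# In a real workflow these would be derived from the user's own documents.
Benchmark = List[Tuple[str, str]]
Model = Callable[[str], str]


def score(model: Model, benchmark: Benchmark) -> float:
    """Return the fraction of questions answered with an exact match."""
    if not benchmark:
        return 0.0
    correct = sum(
        1
        for question, expected in benchmark
        if model(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(benchmark)


def rank_models(models: Dict[str, Model], benchmark: Benchmark) -> List[Tuple[str, float]]:
    """Score every model on the same benchmark and sort best first."""
    results = [(name, score(model, benchmark)) for name, model in models.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy stand-ins for real models (assumptions, for illustration only).
def model_a(question: str) -> str:
    answers = {
        "What is the refund window?": "30 days",
        "Who approves refunds?": "the finance team",
    }
    return answers.get(question, "unknown")


def model_b(question: str) -> str:
    return "30 days"  # always gives the same answer


benchmark = [
    ("What is the refund window?", "30 days"),
    ("Who approves refunds?", "the finance team"),
]

ranking = rank_models({"model_a": model_a, "model_b": model_b}, benchmark)
print(ranking)  # [('model_a', 1.0), ('model_b', 0.5)]
```

Two models that look identical on a general academic test can separate clearly once the questions come from a specific domain, which is the point Wolf makes next. Exact‑match scoring is the simplest possible rule; a real benchmark would likely need a more forgiving way to judge answers.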

“Just because these models are all working the same on this academic benchmark doesn’t really mean that they’re all exactly the same,” Wolf said.

Open‑source’s ‘ChatGPT moment’

Founded by Wolf, Clément Delangue, and Julien Chaumond in 2016, Hugging Face has long been a champion of open‑source AI.

Often referred to as the GitHub of machine learning, the company provides an open‑source platform that enables developers, researchers, and enterprises to build, share, and deploy machine‑learning models, datasets, and applications at scale. Users can also browse models and datasets that others have uploaded.

Wolf told the Brainstorm AI audience that Hugging Face’s “business model is really aligned with open source” and the company’s “goal is to have the maximum number of people participating in this kind of open community and sharing models.”

Wolf predicted that open‑source AI would continue to thrive, especially after the success of DeepSeek earlier this year.

When the Chinese‑made AI model DeepSeek R1 launched in January, it sent shockwaves through the AI world as testers found that it matched or even outperformed American closed‑source AI models.

Wolf said DeepSeek was a “ChatGPT moment” for open‑source AI.

“Just like ChatGPT was the moment the whole world discovered AI, DeepSeek was the moment the whole world discovered there was kind of this open society,” he said.

About the Author

Beatrice Nolan is a tech reporter on Fortune’s AI team, covering artificial intelligence and emerging technologies and their impact on work, industry, and culture. She's based in Fortune's London office and holds a bachelor’s degree in English from the University of York. You can reach her securely via Signal at beatricenolan.08
