Yahoo Opens a Treasure Trove of Research Data

By Barb Darrow
January 14, 2016, 5:03 PM ET

In the era of big data, where researchers truly need massive amounts of information from many sources, more really is more. Extremely big data sets are needed to test out new academic theories and to replicate the results of already-proposed theories.

So Thursday’s announcement that Yahoo (YHOO) Labs is releasing 13.5 terabytes of data culled from 20 million readers of Yahoo News, Finance, Sports, and other sites over four months was a big deal for academics and big data heads, who will now be able to slice and dice it.

But this data can also bring advantages to mere mortals who don’t care about big data or machine learning, a technology that enables computers to recognize patterns and use algorithms to “learn” from the data they examine.

For example, research using this data could lead to a news page perfectly tailored to users’ own interests: one that shows their team’s scores and injury reports, reviews of their favorite author’s new book, and real estate postings for areas they’re interested in.

As Suju Rajan, Yahoo Labs’ director of research for personalization science, puts it, making content more personally appealing is a good thing.

“I’m in Austin and a Longhorn fan, my husband likes the Houston Rockets. When he goes to Yahoo he wants to see what is most useful to him, I want to see what’s most useful to me,” Rajan said.

Yahoo News is already somewhat customized, but that customization is based on a combination of user-provided preferences and inferred preferences gleaned from the user’s reading behavior.

Because Yahoo hosts so many big sites (Yahoo News, Sports, Finance, and more), it has lots of content and many users viewing that content, and that’s a valuable combo. Rajan is careful to note that users had to opt in to participate in the data-gathering process and that all personally identifiable information (PII) was stripped out.

In her post, Rajan called the data trove the “largest-ever machine learning data set” offered to researchers. It comprises 110 billion “events,” or records, culled from reader interactions with Yahoo sites from February to May 2015.

Yahoo Labs would like this data set to become the benchmark for gauging the performance of machine learning algorithms going forward, she said.

The data is offered as part of Yahoo Labs’ existing Webscope program, which releases anonymized user data for non-commercial use, according to the post.

Companies like Yahoo, Facebook (FB), and Google (GOOG) all collect massive amounts of user data. Being able to claim leadership by providing the biggest and best public data set at the very least gives Yahoo bragging rights.

Two years ago, for example, Google offered up the GDELT data set, comprising a quarter of a billion records, to anyone wanting to run queries on it using Google’s BigQuery tool. When that happened, Google billed GDELT as the world’s largest data set.

For all its business problems, which chief executive Marissa Mayer is trying to solve, Yahoo has always had ambitious and cool technology. For instance, the company contributed mightily to Hadoop, the popular open-source framework for storing and processing distributed data.

Projects and contributions like this data set may be one way for Yahoo to prove it still has the wherewithal to do great work.

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.