• Home
  • News
  • Fortune 500
  • Tech
  • Finance
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia
TechOpenAI

What if OpenAI trained ChatGPT with illegal data scraping? The New York Times is reportedly considering suing to put that to the test

Irina Ivanova
By
Irina Ivanova
Irina Ivanova
Deputy US News Editor
Irina Ivanova
By
Irina Ivanova
Irina Ivanova
Deputy US News Editor
August 17, 2023, 5:08 PM ET
Sam Altman
OpenAI CEO Sam AltmanTomohiro Ohsumi—Getty Images

Legal woes are piling up for OpenAI, the startup behind ultra-popular ChatGPT. NPR reports that the New York Times is considering suing OpenAI after attempts to reach a deal in which OpenAI would license news content to train its algorithms failed to progress.

Recommended Video

If the lawsuit materializes, it would be the highest-profile attempt yet to bring to heel ChatGPT,  a tool whose hype has taken the world by storm. And a successful lawsuit could even go further than that, forcing OpenAI to retrain ChatGPT at great expense, as it would essentially remove much of the language on which the large language model has been trained.

Of note is that the Times was part of a group collectively lobbying for regulations on A.I., until it suddenly removed itself, according to Semafor. The Times’ lawsuit also is not alone in arguing that OpenAI has illegally scraped training data. Comedian Sarah Silverman and authors Paul Tremblay, Mona Awad, and Christopher Golden, sued OpenAI last month, alleging the company committed “indus­trial-strength” plagiarism when it trained ChatGPT on their work. 

In January, a trio of commercial artists sued the creators of the popular image-creating engine Midjourney, accusing it of stealing their work to create knockoffs, preventing artists from making a living off their work. The artists’ lawyers called the technology “a par­a­site that, if allowed to pro­lif­er­ate, will cause irrepara­ble harm to artists.” And Getty, the image-licensing service, has sued Stability AI, accusing it of illegally copying 12 million Getty-owned images in a bid to create a competing service. Meanwhile, earlier on Thursday, the Associated Press came up with a set of A.I. standards for staff that encourages them to experiment with the technology but forbids using it to create any content or images that would be published.

Even Elon Musk, who famously left OpenAI’s board in 2018, claimed in July of this year that “extreme levels of data scraping” were happening on Twitter at the hands of A.I. companies. “Almost every company doing A.I., from startups to some of the biggest corporations on earth, was scraping vast amounts of data. It is rather galling to have to bring large numbers of servers online on an emergency basis just to facilitate some A.I. startup’s outrageous valuation.”

The Times’ concern, according to NPR, is that OpenAI would create a direct competitor to its reporting “by creating text that answers questions based on the original reporting and writing of the paper’s staff.”

Neither the Times nor OpenAI immediately replied to a request for comment. However, the Times has a good reason to fear competition from ChatGPT. Small businesses that rely on web traffic have seen it destroyed by a more basic piece of technology—Google’s search box, which presents the answer to a typed question as a paragraph at the top of search results. 

The niche site CelebrityNetWorth used to do decent business as a source for people curious about celebs’ financial dealings, but after Google started presenting celebrities’ net worth in its search box, traffic to CelebrityNetWorth plunged by two-thirds, and the site had to lay off half its staff, its founder told The Outline.

“If it happens, this lawsuit will be about the value of gathering information and who gets to use it for their customers,” Jeremy Gilbert, Knight professor in digital media strategy at Northwestern University’s Medill School, told Fortune.

The search engine Bing (whose owner, Microsoft, has invested billions in OpenAI) is now using ChatGPT to power its searches. If a person were to ask Bing a question, the search engine could instantly produce a long, detailed answer based on New York Times reporting, eliminating the person’s need to visit the Times’ website (and cheating the paper of revenue). 

“Publishers feel most comfortable with direct traffic to news,” Gilbert said. But a large language model like ChatGPT’s “may not send you to the news website at all. 

“If [audiences] get everything they need without clicking through to the New York Times, how does the New York Times fund its reporting? Even if that’s much more satisfying for the consumer, it’s fundamentally untenable,” he said.

A group of media outlets, led by IAC, have formed a coalition to pressure OpenAI into paying them “billions” for the use of their work as training material.

OpenAI is copying everything—but is it legal? 

It’s no secret that OpenAI has been trained on a vast sea of data—novels, web forums, conversations, news articles, photos, and illustrations—scraped from the public web. 

What’s not clear yet is whether this scraping is legal. And a growing number of writers and artists say it isn’t, with lawsuits mounting against OpenAI and other generative-A.I. creators accusing them of copyright infringement. 

Even OpenAI’s users are creeped out by the thought of being training material: In response to user backlash, OpenAI this spring changed its terms to clarify that prompts submitted to ChatGPT would not be used to train the bot. 

Generative A.I. “is a minefield for copyright law,” a group of lawyers and media scholars recently wrote. The courts’ views of what, exactly, the technology does will be a key deciding factor in these cases. 

 If judges believe that the materials A.I. spits out are new creations, or that they significantly transform the works they’re based on, they’re likely to see its treatment of copyrighted works as fair use.

If, on the other hand, they believe the A.I. is simply copying and regurgitating others’ works, they could find its use illegal, and force OpenAI to destroy all copies of those works in its dataset.

Regardless of how the courts rule, the Times seems set to get its share of the A.I. pie. Speaking at a Cannes Lions event this spring, Times CEO Meredith Kopit Levien said: “There must be fair value exchange for the content that’s already been used, and the content that will continue to be used, to train models.”

Fortune Brainstorm AI returns to San Francisco Dec. 8–9 to convene the smartest people we know—technologists, entrepreneurs, Fortune Global 500 executives, investors, policymakers, and the brilliant minds in between—to explore and interrogate the most pressing questions about AI at another pivotal moment. Register here.
About the Author
Irina Ivanova
By Irina IvanovaDeputy US News Editor

Irina Ivanova is the former deputy U.S. news editor at Fortune.

 

See full bioRight Arrow Button Icon
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • Future 50
  • World’s Most Admired Companies
  • See All Rankings
Sections
  • Finance
  • Leadership
  • Success
  • Tech
  • Asia
  • Europe
  • Environment
  • Fortune Crypto
  • Health
  • Retail
  • Lifestyle
  • Politics
  • Newsletters
  • Magazine
  • Features
  • Commentary
  • Mpw
  • CEO Initiative
  • Conferences
  • Personal Finance
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
About Us
  • About Us
  • Editorial Calendar
  • Press Center
  • Work At Fortune
  • Diversity And Inclusion
  • Terms And Conditions
  • Site Map

© 2025 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.