George R.R. Martin, Jodi Picoult and other top authors take on OpenAI

"Game of Thrones" author George R. R. Martin, pictured at a March 2023 screening of HBO's "House of the Dragon" in Los Angeles, opposes AI being trained on copyrighted material without prior approval. (Photo: Amy Sussman—GA/The Hollywood Reporter via Getty Images)

Here comes another major lawsuit over generative AI and copyright, with OpenAI as the target. This class action comes courtesy of the Authors Guild, with a starry list of names among the plaintiffs: George R.R. Martin, Jodi Picoult, Jonathan Franzen, John Grisham…these are the big guns.

As with previous suits in this vein, the core issue is Big AI allegedly training its systems on pirated, copyrighted material. The filing accuses OpenAI of evading “the Copyright Act altogether to publish their lucrative commercial endeavor, taking whatever datasets of relatively recent books they could get their hands on without authorization.”

I particularly enjoyed this bit, which riffs off OpenAI CEO Sam Altman’s recent testimonies on Capitol Hill: “Altman has told Congress that he shares Plaintiffs’ concerns. According to Altman, ‘ensuring that the creator economy continues to be vibrant is an important priority for OpenAI…OpenAI does not want to replace creators. We want our systems to be used to empower creativity, and to support and augment the essential humanity of artists and creators.’ Altman testified that OpenAI ‘think[s] that creators deserve control over how their creations are used’ and that ‘content creators, content owners, need to benefit from this technology.’”

There’s control, and there’s control. For the authors’ version, here’s Franzen on their demands: “Authors should have the right to decide when their works are used to ‘train’ AI. If they choose to opt in, they should be appropriately compensated.”

As I wrote last month regarding Google’s submission to an Australian consultation on AI regulation, the tech firms don’t want opt-in systems—they want creators to have to opt out of having their works used as large language model (LLM) training fodder.

This is problematic, and not just because it puts the onus on creators to protect their works from unwanted exploitation. For an example of why, let’s look at the “artist and creative content owner opt-out” form that OpenAI just published alongside the release of its DALL-E 3 image generator, which features higher-quality images, more plausible hands, and ChatGPT-aided prompting (sorry, “prompt engineers”), and which will be integrated into Microsoft’s Bing Chat.

Apart from telling artists they can tweak their websites to disallow OpenAI’s GPTBot web crawler, the form promises that those who send it in will have their images removed “from future training datasets.” As for the models that are already out there, that ship has sailed: you can’t make a trained model unlearn something, even if you insist, as OpenAI does in its form, that the models “no longer have access to the data” after training.
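For reference, the website tweak in question uses the standard robots.txt mechanism, for which OpenAI has published the crawler’s user-agent token. A minimal sketch (the file sits at a site’s root, e.g. example.com/robots.txt, where example.com is just a placeholder):

    # Tell OpenAI's GPTBot crawler not to fetch any pages on this site
    User-agent: GPTBot
    Disallow: /

As the form itself implies, this only governs future crawls; it does nothing about material already baked into a trained model.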

The Authors Guild notes that OpenAI’s GPT is “already being used to generate books that mimic human authors’ work, such as the recent attempt to generate volumes 6 and 7 of plaintiff George R.R. Martin’s Game of Thrones series A Song of Ice and Fire, as well as the numerous AI-generated books that have been posted on Amazon that attempt to pass themselves off as human-generated and seek to profit off a human author’s hard-earned reputation.”

The ability to tell OpenAI to stay away in the future won’t change that situation. And incidentally, the impossibility of retrospectively altering an existing LLM’s past training dataset could become a big problem for AI companies in areas other than copyright.

TechCrunch reports that the Polish privacy authority has decided to launch an investigation into OpenAI, following a General Data Protection Regulation complaint from privacy and security researcher Lukasz Olejnik. The EU law gives people the right to demand that companies fix incorrect personal information about them. Olejnik found inaccuracies in a ChatGPT-generated biography of himself, which may be par for the course when it comes to genAI, but in the EU Olejnik has the right to ask for rectification—and when he did so, OpenAI told him this was impossible.

It’s all about control. And if OpenAI can’t let people exercise the control afforded to them by the law, it’s in trouble. More news below.

Want to send thoughts or suggestions to Data Sheet? Drop a line here.

David Meyer

NEWSWORTHY

Cisco buys Splunk. Networking giant Cisco will shell out a cool $28 billion to buy the worst-named cybersecurity company in history. As Reuters reports, the deal announcement boosted Splunk’s share price by 23% and trimmed Cisco’s by 5%. From the announcement: “Combined, Cisco and Splunk will become one of the world's largest software companies and will accelerate Cisco's business transformation to more recurring revenue.”

Google-Broadcom debate. Google execs have been seriously discussing dropping Broadcom as a design partner for its AI chips, according to The Information. Google is reportedly annoyed at how much Broadcom is charging for the “tensor processing units” and may bring the design process in-house. The report knocked Broadcom’s share price by 6%. (Bonus read: Staying with silicon, do read the Wall Street Journal’s piece on Apple’s inability to stop using Qualcomm modems.)

Musk’s monkeys. A Wired report suggests Elon Musk was not being truthful when he claimed monkeys that died during Neuralink’s brain-implant trials did not die because of the implants themselves (he said they were terminally ill). UC Davis veterinary records show some of the deaths were very much the result of those implants, and the details do not make for pleasant reading. Neuralink is about to start recruiting human test subjects.

ON OUR FEED

"I believe many of our competitors will learn from our smartphone innovations and I welcome them to do so.”

William Li, the CEO of Chinese automaker Nio, hails the launch of his company’s first smartphone. The Nio Phone is specifically for Nio drivers and can be used to tell the firm’s EVs to drive themselves to the user’s location, which is kosher in some places in China. It has over 30 car-specific functions in total.

IN CASE YOU MISSED IT

Elon Musk is attempting to enlist Taylor Swift’s help to boost traffic on his X platform, by Christiaan Hetzner

Google ‘Zooglers’ might be sending housing costs in one European hub higher than London and New York, by Ryan Hogg

Indeed CEO: ‘AI is changing the way we find jobs and how we work. People like me should not be alone in making decisions that affect millions of people’, by Chris Hyams

Cathie Wood steered clear of Arm IPO frenzy because there was ‘too much emphasis on AI’, by Chloe Taylor

‘After careful consideration,’ Amazon scraps an extra 2% fee for some merchants as regulators plot an antitrust lawsuit, by Bloomberg

A man allegedly drove off a collapsed bridge because of Google Maps. Now his family is suing the company over his drowning, by the Associated Press

BEFORE YOU GO

Deepfake influencers. MIT Technology Review has a fascinating piece on how smaller Chinese brands are addressing the now-essential livestreaming element of e-commerce by using AI-generated hosts to present products and deals 24/7. Companies apparently need to pay only around $1,100, and hand over just a couple of minutes of sample video, to get a cloned presenter with a year’s maintenance thrown in.

From the piece: “While the scripts were once pre-written by humans, companies are now using large language models to generate them too. Now, all the human workers have to do is input basic information such as the name and price of the product being sold, proofread the generated script, and watch the digital influencer go live.”
