Hi there. It’s Rachyl Jones, the tech reporting fellow. OpenAI has a little weasel scurrying around the internet collecting up-to-date information to train the large language models behind ChatGPT. But it has run into a few locked doors recently.
A series of news organizations, including the New York Times, CNN, Reuters, and the Chicago Tribune, have blocked ChatGPT from accessing their content, the Guardian reported on Thursday. The Australian Broadcasting Corporation and Australian Community Media, which owns 100 local publications, have also sealed their websites, according to the Guardian.
The weasel I’m referring to is actually a web crawler called GPTBot, which OpenAI launched earlier this month. “Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” OpenAI says on its website.
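For the technically curious: OpenAI's documentation says site owners can turn GPTBot away with a rule in their robots.txt file, the standard place where websites tell crawlers what is off limits. Below is a minimal sketch of how such a rule could be checked using Python's standard library. The robots.txt text, the news.example.com address, the is_crawler_allowed helper, and the "SomeOtherBot" name are all made up for illustration and are not taken from any publisher named here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# "GPTBot" is the user-agent token OpenAI documents for its crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story_url = "https://news.example.com/story"  # placeholder address
    for agent in ("GPTBot", "SomeOtherBot"):
        allowed = is_crawler_allowed(SAMPLE_ROBOTS_TXT, agent, story_url)
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this sample file, GPTBot is blocked from the whole site while other crawlers are still let in. A robots.txt rule is a request rather than a lock, so it only works if the crawler chooses to honor it.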
When ChatGPT launched, it had only been trained on information up until September 2021. Stuck in the year Joe Biden became president and the U.S. military exited Afghanistan, ChatGPT had to learn new things to remain relevant in the heated artificial intelligence race. OpenAI has been working to address this issue, from temporarily launching a Browse with Bing feature to setting its GPTBot free to scour the internet. But “do not enter” signs from major media organizations—which often have the most recent and relevant news—could present a problem for the chatbot.
The terms of service for the Times, Reuters, and the Tribune all explicitly state users may not scrape their data. The Times’ terms specifically say its content cannot be used to train A.I. programs. Publishers are in the business of selling information, whether through providing subscriptions or showing advertisements. Either way, they need people to visit their websites to make money. Freely providing their site’s content to chatbots—which might negate the need to visit publishers’ websites—could hurt their revenue. News outlets have already been struggling to revise their business models after the rise of social media drove advertising dollars away from traditional media. The knife is in publishers’ backs, and they’re trying to keep A.I. from twisting it.
By blocking GPTBot, these media organizations could be pressuring OpenAI to pay for access. Last month, OpenAI struck a deal with the Associated Press to license its news stories for A.I. training purposes. It is unclear how much OpenAI paid, but other publishers might be interested in a similar arrangement. Google, which is also scraping publishers’ sites to train its large language model, could make similar deals with news publishers that have locked it out.
Here’s what else is going on in tech today.
Rachyl Jones
NEWSWORTHY
Alibaba’s new A.I. The Chinese e-commerce company launched two new artificial intelligence models on Friday, CNBC reported. One can reportedly understand images, with the ability to respond to basic questions about pictures and write captions. The other can engage in complex conversation, according to the company. Both models are open-source.
Long-awaited regulations for crypto. The U.S. Treasury Department proposed rules on Friday that would treat cryptocurrency platforms much like stock brokers, forcing them to comply with tax law, according to the Wall Street Journal. The proposed rules are part of a larger push by Congress to crack down on crypto traders who don’t pay taxes on their earnings.
China’s chip imports. China’s imports of semiconductor materials totaled $5 billion in June and July, up 70% from the same period last year, the Financial Times reported. The record-high imports of tools to build chips come as the U.S. and its allies restrict chip exports to China.
ON OUR FEED
“It was time for the EU to set our own rules. A safer Internet for everyone.”
—Thierry Breton, the EU Commissioner for Internal Market, in a post on X about how the Digital Services Act is becoming legally enforceable on Friday for the biggest online platforms. The law regulates sites that connect people with goods, services, and content, and it aims to add greater protections for internet users. It is part of a larger sweep of regulations limiting the power of Big Tech.
IN CASE YOU MISSED IT
Donald Trump returns to rebranded Twitter for the first time since he was banned in 2021—to post his mugshot, by Chloe Taylor
Meet Apollo, your new humanoid coworker, by Sage Lazzaro
Why India is becoming a space force to be reckoned with, by Rachyl Jones
Women in tech can get out of the mid-career limbo by being themselves–and using this one superpower to get ahead, by Samantha Dewalt
Hollywood shouldn’t entirely reject A.I.–it’s already delivering a new era of movie magic, by Howard Wright
BEFORE YOU GO
Stroke survivors could speak again with A.I. The science of translating brain activity to computer-generated speech is moving rapidly, the MIT Technology Review reported, citing two research papers published this week. As part of one experiment, researchers built a digital avatar for a woman who lost her ability to speak as a result of a stroke 18 years ago. Using brain implants and artificial intelligence technologies, she is able to speak at 60 to 70 words per minute and make a limited range of facial expressions through her avatar.
The current technologies aren’t perfect. There is an error rate of 24%, and the speed of speech is far slower than it is for someone without a brain injury. But the advances do show proof of concept, and they point to a huge opportunity to bring speech back to those who have lost it.
This is the web version of Data Sheet, a daily newsletter on the business of tech. Sign up to get it delivered free to your inbox.