Hello and welcome to Eye on AI. In today’s edition…Meta admits only regulation will stop it from scraping user data to train its AI products; AI models get more multimodal (and more emotional); big tech splurges big on AI equipment.
During a government inquiry earlier this week in Australia, Meta’s global privacy director Melinda Claybaugh admitted that the company scraped the public photos and posts of every Australian adult user to train its AI models without providing an option to opt out—because it can. More specifically, because regulators haven’t stopped it from doing so.
According to the Australian Broadcasting Corporation, Claybaugh initially denied the scraping of user data for AI models, explicitly stating “we have not done that” when a senator asked whether Meta used posts from Australian users going back as far as 2007 to train its AI products. After being pressed, she said that yes, in fact, the company has done that.
“The truth of the matter is that unless you have consciously set those posts to private since 2007, Meta has just decided that you will scrape all of the photos and all of the texts from every public post on Instagram or Facebook since 2007, unless there was a conscious decision to set them on private. That’s the reality, isn’t it?” another senator asked.
“Correct,” replied Claybaugh.
Meta equates ‘public’ profiles and posts with consent to train AI
Claybaugh went on to make clear that the company has not scraped data from users under 18, but she could not say what happens to posts from users who are now adults but opened their accounts when they were under 18. She also said that public photos of children posted by adults would have been scraped. Additionally, she said that setting an account to private now would prevent future scraping but would not let users regain control over content that has already been used.
One could argue that these posts are public and open for the taking, but there’s a world of difference between keeping a profile public so other users can interact with you and giving a trillion-dollar company unfettered permission to take your content and use it to build commercial AI products as it wishes. Yes, Meta may have legally obtained permission by writing its right to do so into its terms and conditions, but that is nothing close to explicit, meaningful consent. Overall, it’s a bleak picture of user privacy.
Questionable data practices are nothing new for Meta, and it was already known that the company trains its models on user data. What struck me as interesting, however, was the part of the conversation about why Australians were not offered an option to opt out.
Meta says EU regulation means EU users get unique opt-out
After being questioned about why Australian users don’t have the option to opt out as users in the EU do, Claybaugh cited the Union’s strict data laws.
“In Europe there is an ongoing legal question around what is the interpretation of existing privacy law with respect to AI training. We have paused launching our AI products in Europe while there is a lack of certainty. So you are correct that we are offering an opt-out to users in Europe. I will say that the ongoing conversation in Europe is the direct result of the existing regulatory landscape,” she said, referring to the European Union’s General Data Protection Regulation (GDPR). That law has created obstacles for Meta’s plans to train on EU user data and also caused the company in June to announce it was pausing plans to roll out its AI models to EU users.
Basically, Claybaugh admitted that Meta will only take a privacy-first approach if regulators force its hand. In response to a request for comment about not offering an opt-out option in all jurisdictions, a Meta spokesperson further made clear that users are at the whim of their local regulators. “While we don’t currently have an opt-out feature, we’ve built in-platform tools that allow people to delete their personal information from chats with Meta AI across our apps. Depending on where people live, they can also object to the use of their personal information being used to build and train AI consistent with local privacy laws,” the spokesperson told Eye on AI.
Why not mandate opt-in rather than opt-out?
Many would argue that the entire discussion is misguided and that the process should be entirely opt-in. And, overall, the saga highlights the importance of both data privacy and AI regulations.
Critics of regulating AI often argue that doing so would inadvertently hurt smaller AI startups, handing dominance to the big tech players. But those big tech players already dominate, and they use their access to massive amounts of user data to advance their position further. Regulation around data privacy might actually give smaller companies a fighting chance, especially if they found a less ethically challenged way to collect data.
What’s more, Meta’s comments underline the urgent role regulators have in protecting consumers amid AI’s rapid proliferation.
And with that, here’s more AI news.
Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com
AI IN THE NEWS
Mistral unveils its first multimodal model. Called Pixtral, the model boasts 12 billion parameters and is the first from the French AI startup able to process both text and images. The model is available on GitHub and Hugging Face and can be fine-tuned. You can read more about Pixtral in TechCrunch.
OpenAI releases “Strawberry” AI model. The company says its new o1 model can answer more complex questions that stump its GPT-4 models and solve harder problems in science, coding, and math.
Hume AI launches an API to make popular LLMs more emotional. Called Empathetic Voice Interface 2, the voice-to-voice model can be hooked up to models from OpenAI, Anthropic, Meta, and others to allow users to give the models more emotionally expressive voices. The company says the model can understand a user’s tone of voice, generate any tone of voice, and emulate a wide range of personalities, accents, and speaking styles. It’s also actively being trained on new languages, the company says. Wired tested the technology and said its output is similar to ChatGPT’s Advanced Voice Mode, and that like OpenAI’s offering, it’s far more emotionally expressive than most conventional voice interfaces.
Taylor Swift cites AI in her endorsement of Kamala Harris. “Recently I was made aware that AI of ‘me’ falsely endorsing Donald Trump’s presidential run was posted to his site. It really conjured up my fears around AI, and the dangers of spreading misinformation. It brought me to the conclusion that I need to be very transparent about my actual plans for this election as a voter. The simplest way to combat misinformation is with the truth,” she wrote in an Instagram post endorsing Harris, shared shortly after Tuesday’s presidential debate. The series of images—at least 15 of which are confirmed to have been generated by AI—were part of a campaign by pro-Trump accounts pushing a false narrative that fans of Swift are turning their support toward the former president. Trump himself re-shared several of the images, furthering the false claims. You can read more about this from Fortune’s Jenn Brice.
FORTUNE ON AI
In the AI boom, incumbents are better positioned to compete with startups than in previous cycles, VCs say —by Allie Garfinkle
Kamala Harris attacks Donald Trump’s tech trade plan during the debate: ‘He basically sold us out’ to China —by Jenn Brice
Nvidia CEO Jensen Huang says AI chip shortage is making his customers tense and emotional —by Christiaan Hetzner
AI CALENDAR
Sept. 17-19: Dreamforce, San Francisco
Sept. 25-26: Meta Connect, Menlo Park, Calif.
Oct. 22-23: TedAI, San Francisco
Oct. 28-30: Voice & AI, Arlington, Va.
Nov. 19-22: Microsoft Ignite, Chicago, Ill.
Dec. 2-6: AWS re:Invent, Las Vegas, Nev.
Dec. 8-12: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia
Dec. 9-10: Fortune Brainstorm AI San Francisco (register here)
EYE ON AI NUMBERS
$52.9 billion
That’s how much Alphabet, Amazon, Meta, and Microsoft spent combined on purchases of AI “property and equipment” over the past quarter, according to the Wall Street Journal.
The estimated number of data centers has skyrocketed since the generative AI boom began in 2022, reaching an all-time high this quarter with just under 1,000 data centers among the four tech giants, according to the data. In the series of charts published by the Journal, the only one that goes up and to the right more sharply is the one depicting Nvidia’s quarterly revenue, which has become one of the biggest stories of the recent AI boom. Nvidia’s H100 chips are, of course, the most sought-after for AI computing and represent a significant chunk of those equipment purchases.