Smart speakers can mistakenly record audio nearly 20 times per day on average, study finds
Hey, Google! Alexa! Are you recording my private conversations? If you ask your smart speaker that question, the voice-enabled assistants will deny invading your privacy. But researchers now have a scientifically proven answer: Yes they are.
A new study released last week reveals just how often smart speakers, equipped with voice assistants from Google, Amazon, Apple, and Microsoft, are activating and recording audio clips without users’ permission.
According to the research, smart speakers activate without user permission between 1.5 and 19 times a day on average, depending on how much dialogue is spoken around the devices. During these activations, more than half the devices recorded for 6 seconds or more, with the longest recording lasting 43 seconds. The findings come from a study conducted by professors at Northeastern University and Imperial College London, who have spent the last six months studying smart speakers.
“If you’re worried about your conversations being recorded, it’s founded,” says David Choffnes, a Northeastern University assistant professor who co-authored the study. “We found it through our own controlled experiments.”
The study comes as speculation rises about smart speakers invading the privacy of the consumers who buy them. Meanwhile, regulators, politicians, and consumers are increasingly scrutinizing Big Tech companies like Google and Amazon for their failure to be transparent about the data they’re collecting across their services, and what that data is being used for. Companies are also getting hammered for not giving users clear information about what is and isn’t in their control, and how to adjust those settings.
The researchers tested five speakers that use one of four voice assistants: Amazon’s Alexa, Apple’s Siri, Google Assistant, and Microsoft’s Cortana. The study used a set of popular TV shows on Netflix, like Gilmore Girls and The Office, to simulate private conversations within a user’s home. The researchers played the audio nonstop for 125 hours and repeated the experiment at least four times. Once they had the results, they tried to recreate the situations in which the smart speakers activated erroneously to see if there were consistencies.
The good news is that the researchers “found no evidence” the speakers were constantly recording conversations in their environments. And most of the time the devices erroneously activated, it was due to words that sounded similar to the wake word. For example, a character might have said “seriously,” which sounds similar to “Hey Siri.”
But the problem is that the devices are regularly activating and recording conversations that were not intended to be captured to begin with. “If you have a device in your home and you’re not interacting with it, and it’s potentially capturing data… it adds to this whole new sentiment to privacy,” says Drew Schuil, president and chief operating officer of privacy tech provider Integris Software. “It hits home.”
Consumers are getting increasingly accustomed to having companies collect small amounts of data on them for the sake of better service, says Rehan Jalil, CEO of Securiti.ai, a security compliance software company. But that can quickly become dangerous, as technology continues to rapidly progress.
“It’s a very slippery slope,” he says. “You get comfortable and then it goes to the next advancement, and the next—privacy is at stake.”
For this story, Fortune reviewed what a Google Home, one of Google’s smart speakers, had captured. While all of the “hey Google” commands were tracked, not all had available audio recordings attached. Google allows users to delete the audio clips and configure their privacy settings for automatic deletion after three months. But within the recordings Fortune examined, there was no information given about whether recordings had been sent to human or machine reviewers, which Google is using to improve the assistant’s natural language processing.
Over the previous two years of regular use, numerous cases of “unknown voice command” were listed, apparently triggered when the speaker had been activated but seemingly did not record anything. In a few cases, there were recordings of television audio, when characters on screen mentioned “Google” or said words that sound like Google. But there were also a couple of recordings—one 8 seconds long, another 4 seconds—that included private conversations.
“I Googled a bunch of apartment buildings,” a person says in one recording, followed by the name of a specific property.
It’s that type of data captured from smart speakers that has alarmed researchers. Choffnes says this is just the beginning of the team’s work, as it plans to delve into questions like: What happens to the data collected by these devices, regardless of whether it was recorded with permission?
“If you ask your smart speaker about a politician of a certain persuasion, does that change anything?” Choffnes asks. “It depends on how that information is shared within the organization and how it’s shared outside. On the web, we know there are data brokers selling and sharing information. There’s no reason to believe this would be different; it’s just that no one has looked yet.”
Jules Polonetsky, a former privacy exec and current CEO of the nonprofit Future of Privacy Forum, says he’s not surprised at any of the smart speaker study’s findings. But it does rehash a growing concern he has in this space: legal protection of personal data that’s stored in the cloud.
“The law protects data housed in servers differently than it protects data in my home,” he says. “If the government wants data from my home, they need a warrant. But if that data exists on another server… the standard to get that data is lower.”
And while a five-second snippet of information about an apartment complex may seem rather benign, combined with other data it could reveal a lot about a user’s identity, Schuil says. While the majority of consumer data is leveraged for advertising purposes, he believes there could be more nefarious outcomes from what he calls a toxic combination of data.
“It’s the other use cases where this gets out into the dark web where hackers can get ahold of it and create a convincing phishing email or phone fraud,” Schuil warns. “What are the other use cases that can be used with this data if it gets in the wrong hands?”