AI Has a Big Privacy Problem, and Europe’s GDPR Is About to Expose It

“Artificial intelligence” technology is on a roll these days, but it’s about to hit a major blockage: Europe’s new General Data Protection Regulation (GDPR).

The privacy law, which came into effect across the EU on Friday, has several elements that will make life very difficult for companies building machine learning systems, according to a leading Internet law academic.

“Big data is completely opposed to the basis of data protection,” said Lilian Edwards, a law professor at the University of Strathclyde in Glasgow, Scotland. “I think people have been very glib about saying we can make the two reconcilable, because it’s very difficult.”

Here’s the issue. Machine learning—the basis of what we call AI—involves algorithms that progressively improve themselves. They do this by feasting on data. The more they consume, the better they get at spotting patterns: speech patterns that make it easier for a bot to sound like a human; visual patterns that help an autonomous car system recognize objects on the road; customer behavior patterns that train a bank’s AI systems to better spot fraud.

All the while, the algorithms evolve themselves beyond the understanding of the people who created them, and the data gets combined with other data in new and mysterious ways.

Now let’s go over to the GDPR, which says:

When they collect personal data, companies have to say what it will be used for, and not use it for anything else.
Companies are supposed to minimize the amount of data they collect and keep, limiting it to what is strictly necessary for those purposes—they’re supposed to put limits on how long they hold that data, too.
Companies have to be able to tell people what data they hold on them, and what’s being done with it.
Companies should be able to alter or get rid of people’s personal data if requested.
If personal data is used to make automated decisions about people, companies must be able to explain the logic behind the decision-making process.
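To make the first two rules concrete, here is a minimal Python sketch of a system that tags each piece of personal data with its declared purposes and a retention period, then refuses any use outside them. The `PersonalRecord` class, its fields, and the `may_use` helper are hypothetical names invented for illustration; nothing in the GDPR or any particular library prescribes this design.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PersonalRecord:
    """Hypothetical record: a value plus the terms it was collected under."""
    subject_id: str
    value: str
    purposes: frozenset      # purposes declared at collection time
    collected_on: date
    retention: timedelta     # how long the data may be kept


def may_use(record: PersonalRecord, purpose: str, today: date) -> bool:
    """Allow a use only if it matches a declared purpose and the
    retention period has not run out (illustrative logic only)."""
    expired = today > record.collected_on + record.retention
    return purpose in record.purposes and not expired


record = PersonalRecord(
    subject_id="subject-001",
    value="card transactions",
    purposes=frozenset({"fraud_detection"}),
    collected_on=date(2018, 5, 25),
    retention=timedelta(days=365),
)

allowed = may_use(record, "fraud_detection", date(2018, 6, 1))
other_purpose = may_use(record, "ad_targeting", date(2018, 6, 1))
after_expiry = may_use(record, "fraud_detection", date(2019, 6, 1))
```

In this sketch only the first call is permitted: the second asks for a purpose that was never declared, and the third comes after the retention period has ended.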

See the problem?

“Big data challenges purpose limitation, data minimization and data retention; most people never get rid of it with big data,” said Edwards. “It challenges transparency and the notion of consent, since you can’t consent lawfully without knowing to what purposes you’re consenting… Algorithmic transparency means you can see how the decision is reached, but you can’t with [machine-learning] systems because it’s not rule-based software.”

According to the professor, the issue becomes even more fraught where companies use people’s data to infer things about them—sensitive personal data, which includes things like sexuality and political and religious beliefs, gets even stronger protections under the GDPR.

Normally, the GDPR gives companies other legal justifications that they can use to process people’s data, such as the need to use that data in order to provide core services. But where this kind of sensitive data is concerned, people have to give their explicit consent to its processing.

“That ‘get out of jail free’ card goes if it’s sensitive data. The other way people try and get out of it is to say, ‘We’re not processing personal data because we’re anonymizing it,’” said Edwards. “But if you take enough big data and combine it with other bits of big data, you can re-identify almost everybody. It doesn’t fit the definition of anonymized data in the GDPR. The definition now is if you can single people out with the data you have, then you have personal data.”
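The "singling out" idea Edwards describes can be illustrated with a small Python sketch. It assumes a made-up dataset with names already stripped and counts how many rows become unique once several harmless-looking fields are combined. The records, the field names, and the `singled_out` helper are all hypothetical, and this is a toy illustration, not a legal test for anonymization.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, other fields kept.
records = [
    {"area": "North", "birth_year": 1980, "gender": "F"},
    {"area": "North", "birth_year": 1980, "gender": "F"},
    {"area": "North", "birth_year": 1975, "gender": "M"},
    {"area": "South", "birth_year": 1990, "gender": "F"},
]


def singled_out(rows, keys):
    """Return the rows whose combination of `keys` values appears only once."""
    def combo(row):
        return tuple(row[k] for k in keys)

    counts = Counter(combo(row) for row in rows)
    return [row for row in rows if counts[combo(row)] == 1]


# One field alone isolates few people; combining fields isolates more.
unique_by_area = singled_out(records, ["area"])
unique_by_all = singled_out(records, ["area", "birth_year", "gender"])
```

Here one record stands alone when only `area` is considered, and two stand alone once all three fields are combined. Every extra field or joined dataset tends to push that number up.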

And if your machine-learning systems are full of personal data, then under the GDPR it must be possible to pull that information out of the mix, alter it, limit what’s done with it, and explain how the system works.

“To be fair I am working with people who are trying to produce explainable systems, but at the moment it’s still very academic,” Edwards added.