Given the reliance of artificial intelligence on machine learning, and of machine learning on data, the conventional wisdom is that privacy and A.I. are fundamentally at odds—progress in one must come at the expense of the other.
Advancing both A.I. and privacy, however, is not a zero-sum game.
Researchers from academia and industry have been marrying ideas from cryptography and machine learning to provide the seemingly paradoxical ability to learn from data without seeing it. This month, the Conference on Neural Information Processing Systems—the largest artificial intelligence conference of the year—showcased some of the promising privacy-preserving technologies that have emerged so far, all of which are also researched at Intel Corporation:
Federated learning

Suppose that a group of hospitals wants to build a machine learning system that analyzes patient data and estimates the likelihood of disease. In theory, by sharing their data, they could build a more accurate model, but privacy rules forbid this. Federated learning offers a solution. Here's how it works: First, each hospital uses its private data to adjust and improve the performance of a shared model, without any of its private data leaving the hospital. Next, a trusted central server collects all the adjustments, averages them, and sends this aggregated improvement back to all the hospitals.
This process repeats until the model is fully trained, preserving privacy while giving all of the participating hospitals the benefit of a large, diverse data set. In 2017, Google started testing federated learning in the keyboards of its Android devices.
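The round described above can be sketched in a few lines of Python. This is a minimal illustration, not any production system: the one-parameter linear model, learning rate, and hospital data are hypothetical stand-ins.

```python
def local_update(weights, data, lr=0.1):
    """One round of local training: a gradient step for a toy linear
    model y ~ w * x, computed only on this hospital's private data."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]

def aggregate(updates):
    """Trusted server averages the locally trained weights."""
    return [sum(ws[i] for ws in updates) / len(updates)
            for i in range(len(updates[0]))]

# Each hospital holds its own (x, y) pairs; the raw pairs never leave
# these lists -- only the updated weights are shared with the server.
hospital_data = [
    [(1.0, 2.1), (2.0, 4.2)],
    [(1.5, 2.9), (3.0, 6.1)],
]

weights = [0.0]
for _ in range(50):
    updates = [local_update(weights, d) for d in hospital_data]
    weights = aggregate(updates)  # aggregated improvement sent back to all
```

After a few dozen rounds the shared weight settles near the slope that fits both hospitals' data combined, even though neither hospital ever saw the other's records.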
Differential privacy (DP)
Apps collect our data to understand usage patterns and improve services such as traffic predictions and film recommendations. In theory, aggregated data shouldn't expose the individual contributions behind it. In practice, however, it is sometimes possible to re-identify individuals from an aggregated data set.
DP makes it harder to do so by, for example, introducing a bit of randomness to each user’s data to hide the contributions of any one user. Apple uses this privacy-preserving technique when it collects data from your devices, for instance.
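The "bit of randomness" can be made precise. A standard construction, sketched below under hypothetical data, is the Laplace mechanism: a count query changes by at most 1 when any one user is added or removed, so adding Laplace noise of scale 1/epsilon hides each individual's contribution.

```python
import random

def dp_count(values, epsilon, rng):
    """Count query with Laplace(1/epsilon) noise added.
    A count has sensitivity 1, so this satisfies epsilon-DP."""
    scale = 1.0 / epsilon
    # A Laplace sample is the difference of two exponential samples.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(values) + noise

rng = random.Random(0)
flu_flags = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical per-user bits
noisy_total = dp_count(flu_flags, epsilon=0.5, rng=rng)
```

Smaller values of epsilon mean more noise and stronger privacy; the analyst sees an answer close to the true count, but can no longer be confident about any single user's bit.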
Homomorphic encryption (HE)
This technique allows machine learning algorithms to operate on data while it is still encrypted, that is, without access to the underlying sensitive data. Using HE, a hospital could lock sensitive medical data; send it for analysis on a remote, untrusted system; receive back encrypted results; then use its key to decrypt the results—all without ever revealing the underlying data.
Using HE also provides ongoing protection where a solution like DP would not: Even if a machine stores your data without authorization, for example, the data remains protected by encryption. Intel is building HE tools to help researchers develop neural network models that can operate on encrypted data.
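As a concrete illustration of the idea, here is a toy version of the Paillier cryptosystem, a classic additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is a sketch only, not Intel's HE tooling; the primes are far too small for real security, and the randomizer r is fixed for reproducibility where a real implementation must draw it freshly at random.

```python
from math import gcd

def keygen(p=293, q=433):                      # tiny demo primes (insecure)
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                        # valid because g = n + 1
    return (n,), (n, lam, mu)

def encrypt(pub, m, r):
    (n,) = pub
    n2 = n * n
    g = n + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    n, lam, mu = priv
    n2 = n * n
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n              # L(x) = (x - 1) / n

pub, priv = keygen()
a = encrypt(pub, 17, r=99)
b = encrypt(pub, 25, r=57)
total = (a * b) % (pub[0] ** 2)                # multiply ciphertexts...
assert decrypt(priv, total) == 17 + 25         # ...to add the plaintexts
```

The party doing the multiplication never learns 17, 25, or 42; only the key holder can decrypt the result, which is the property that lets an untrusted server compute on locked data.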
Challenges remain, of course.
On the technical front, while the underlying cryptography tools are mature, their applications to privacy-preserving machine learning still have room for improvement. For example, these techniques require additional computing resources compared to non-private methods, although researchers are working on reducing this added cost.
On the policy front, it is unclear to what extent these privacy-preserving machine learning methods would satisfy legislators and regulators. Notably, the EU’s General Data Protection Regulation (GDPR) does not apply to anonymous information. But since encrypted data can be unlocked, it appears unlikely to be considered anonymous—though this remains to be tested in a court of law.
Policymakers need to stay abreast of ongoing technical breakthroughs and the ways they may shift the privacy landscape. As more governments pass laws to regulate privacy, there is a risk of reactive legislation that has not been fully considered or harmonized with technical realities. Ill-informed privacy concerns can lead to silos or cultures of secrecy inside tech companies that have the unintended consequence of limiting creative problem solving in a nascent A.I. industry.
These concerns underscore the importance of strong partnerships between technologists and policymakers to reach joint regulatory and technical solutions, fueled by research and enabled by industry. By working together, we can ensure that A.I. reaches its full potential, without forcing us to choose between privacy and innovation.
Casimir Wierzynski is senior director of A.I. research and Abigail Hing Wen is managing counsel for the office of the CTO in the Artificial Intelligence Products Group at Intel Corporation.