How privacy-preserving A.I. can help feed data-hungry algorithms

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.

Creators of A.I. software can be forgiven for feeling a bit like Seymour from the film Little Shop of Horrors, their hungry algorithms constantly crying out “feed me, feed me!” Luckily, the software doesn’t have a taste for human flesh. But it does need a constant diet of data, the fresher the better. The more data the A.I. systems are trained on, the more accurately they tend to perform.

But finding enough data is a constant problem. And the quest to obtain data to train A.I. software has sometimes led companies into ethically dubious territory: collecting and using people’s information without clear, explicit consent. In other cases, data privacy and data governance considerations have meant that companies simply can’t obtain large enough datasets to train good machine learning systems.

Riddhiman Das encountered this problem firsthand when he worked as a researcher and product architect at a biometric security company that used machine learning to try to identify people based on the pattern of blood vessels in their eyes. (The company, originally called EyeVerify, later changed its name to Zoloz and was sold to Chinese financial giant Ant Group for a reported $100 million.) The problem, Das says, is that it was extremely difficult to obtain a dataset that covered a diverse enough set of people to be sure that the product would work correctly for every demographic group around the world. “We had a huge problem getting access to the right training data from the right places,” he tells me. “In a new country, we got a higher false negative rate until we adjusted to that local population.”

Later, Das joined Ant Group, where he helped it and its former parent company, Alibaba, search for technology investments worldwide. There, he says, he found that moving financial data across national and regional boundaries was fraught due to different privacy rules and regulatory concerns. This made it difficult to train A.I. systems that would work well globally. (In fact, earlier this year, Ant announced it was divesting from Zoloz because of American concerns about Chinese companies processing the data of U.S. citizens.)

Das left Ant and Alibaba and went on to co-found TripleBlind, a startup based in Kansas City that is seeking to address this problem. It offers users a simple way to use several different privacy-preserving machine-learning techniques and ensure they are complying with data privacy and data governance rules. Among the methods TripleBlind employs is “blind learning,” an approach the company has pioneered but which is based on a previous technology called split learning. It involves taking a neural network and divvying it up among different parties, with each party training only a portion of the network. TripleBlind also uses other techniques in which the data being processed is encrypted and broken up between different computers in such a way that no one machine can decode it.
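To make the idea of data "broken up between different computers" concrete, here is a minimal Python sketch of additive secret sharing, one well-known technique that fits that description. It is an illustration only: the article does not say which scheme TripleBlind uses, and the modulus, the function names, and the salary figures are all assumptions made up for this example.

```python
import secrets

# Assumption: a public modulus that every party agrees on in advance.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split an integer into n shares that sum to value modulo MODULUS."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    # All but one share are uniformly random, so any single share
    # (or any incomplete set of shares) reveals nothing about the value.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the value; this requires every share."""
    return sum(shares) % MODULUS


def add_shared(shares_a, shares_b):
    """Each party adds the two shares it holds, without seeing either input."""
    return [(a + b) % MODULUS for a, b in zip(shares_a, shares_b)]


# Hypothetical inputs: two private numbers, each split across three machines.
salary_a = split_into_shares(52_000, 3)
salary_b = split_into_shares(61_000, 3)
total = reconstruct(add_shared(salary_a, salary_b))
print(total)  # 113000
```

Each machine only ever holds random-looking numbers, yet the parties can still compute a sum together. Real systems build far more elaborate computations, including neural network training, on top of primitives like this one.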

The startup is marketing its technology to users in health care and financial services. Those are two industries where the need for the best A.I. systems is paramount—the consequences of a few percentage points of weaker performance can be measured in dollars and lives. And yet those are two industries where there are, for good reason, the most restrictive rules around personal data and data sharing both within and between companies.

So it’s not surprising that TripleBlind has started with these sectors. Nor is it surprising the startup counts both Accenture and the Mayo Clinic among its early investors. (The hospital group has been trialing its technology too.) And the company says it is starting to see some real-world successes: One European genetics company has used the software to expand the amount of genetic data it can feed an A.I. system to help in drug discovery. A financial services company is using it to create an “innovation sandbox” that will let startups use the bank’s own information to train A.I. systems without allowing them access to the actual data.

TripleBlind’s “blind learning” is hardly the only privacy-preserving machine-learning method out there. Intel, Microsoft, and IBM have all invested heavily in research into the area and have rolled out some privacy-preserving machine-learning services through their respective cloud service offerings.

What’s clear is that for A.I. to truly live up to its potential, the ability to learn from larger, more diverse datasets without compromising privacy or violating rules about data handling will be essential.

And with that, here’s the rest of this week’s A.I. news.

Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com

A.I. IN THE NEWS

If we make this really easy for you, will you let a stranger stay in your house? Airbnb says it plans to use A.I. to encourage more people to list property for overnight rental on its popular accommodation booking site, according to a story in The Financial Times. The technology will be used to help automatically generate listing criteria and arrange photos in an appealing manner in ads for properties. Airbnb says it is concerned that the number of properties listed for booking on its site has remained flat for the past six months, as more people, able to work from anywhere due to the coronavirus, have chosen to live in second homes that they would otherwise have rented out. Meanwhile, the number of people seeking accommodation is set to soar, the company predicts, as the pandemic eases and leisure and business travel resume.

Amazon extends its moratorium on selling facial recognition technology to law enforcement. The company told Reuters that it was extending the ban on selling facial recognition technology to police departments indefinitely. The company had initially instituted what it said would be a one-year moratorium on providing the technology to law enforcement in June 2020 amid protests that followed the murder of George Floyd and concerns that the technology functioned poorly in identifying people with darker skin. There were also worries that the police would deploy the technology in discriminatory ways.

Google unveils A.I. health tool to help users check themselves for possible skin conditions. The Internet giant launched its first major consumer-oriented health product last week: computer vision software that can analyze pictures users take of their skin in order to identify possible conditions and diseases ranging from acne to skin cancer. The Financial Times reported that the company will launch the product, which it calls Derm Assist, in Europe this year before rolling it out globally. The product will be free to use, with Google saying that users' data will only be stored to further train the A.I. system and will not be used for advertising or marketing. A study published last year found that the A.I., which was trained on 16,000 real-world dermatology cases, was about as accurate as human dermatologists in making correct diagnoses. That said, Google cautioned that Derm Assist is "not intended to provide a diagnosis . . . rather we hope it gives you access to authoritative information so you can make a more informed decision about your next step." It said the tool and its suggestions of possible conditions might help non-specialist doctors, such as general practitioners, make diagnoses or refer patients to specialists.

Pinterest credits A.I. with rapid user and ad revenue growth. That's according to a story in The Wall Street Journal that says the company's neural network-driven recommendation engines have gotten very good at figuring out what is in the photos a user is "pinning" on the site and then suggesting ads that are relevant to that user. These large neural networks, which can be trained without labeled data, simply by analyzing all the pixels in an image and then predicting what ads might best match that image, can make up to 30 million predictions per second, according to the story. That has helped Pinterest increase its user numbers by 30% year over year and double its ad sales revenue in two years, growth that is higher than that seen at Google's YouTube and Facebook.

DeepMind fails to win more independence from Alphabet. The London-based A.I. research company, which Google bought in 2014 for more than $600 million and which has been run as an independent company under the Alphabet umbrella, had been negotiating with its parent for more autonomy, including a legal structure more similar to that used for non-profits that would have guaranteed its research would be independent and free of control or interference, according to a story in The Wall Street Journal. But, the Journal reported, DeepMind told staff late last month that those long-running negotiations, which have not been reported previously, had been discontinued. Instead, the company's research will be reviewed by a Google ethics board whose membership will include Lila Ibrahim, DeepMind's chief operating officer, and Mustafa Suleyman, DeepMind's co-founder who moved to a role within Google last year. Since purchasing DeepMind, Google and then Alphabet have bankrolled the research company's losses, which have totaled billions of dollars. In 2019, the last year for which financial figures are publicly available, DeepMind lost about $660 million and Alphabet also forgave about $1 billion in loans it had extended to the London firm.

EYE ON A.I. TALENT

Scale AI, the San Francisco-based data labeling and data management startup, has hired Mark Valentine to be its new head of federal, according to a company blog post. Valentine, who is a former U.S. Air Force officer and senior military advisor to the Federal Emergency Management Agency, has most recently served as GM of National Security at Microsoft.

DataRobot, the A.I. and data management company based in Boston, has hired Damon Fletcher to be its new chief financial officer, Business Insider reported. Fletcher had previously been CFO at Tableau Software, helping to lead that company through its acquisition by Salesforce.

Law firm Baker McKenzie has named Ben Allgrove to a new role heading a research and development team that is partnering with New York City-based A.I. company SparkBeyond to work on projects that combine legal and A.I. expertise, including developing technology for clients as well as using machine learning to help research cases, especially for the firm's pro bono work, according to a story in The Global Legal Post. Allgrove had been a lawyer in the firm's London office focused on technology-related matters.

EYE ON A.I. RESEARCH

It's getting easier to train a speech recognition system. Speech recognition technology is one of the most ubiquitous and important A.I. developments, helping to power digital assistants such as Google Assistant, Apple's Siri, and Amazon's Alexa. But one drawback of these systems is that they require large sets of human-transcribed data to train on. And this has been difficult to obtain, particularly when one considers all of the different languages and regional accents that such systems must be able to handle accurately. Most companies that have developed digital assistants have wound up employing large armies of contractors in different parts of the world to essentially eavesdrop on snippets of conversation recorded by these digital assistants and transcribe them, with disturbing implications for privacy rights.

But now researchers at Facebook say they have managed to train a speech recognition system using unlabeled data, going directly from the sound waves the machine captures to transcription, and that, in English, it can equal the performance of some systems trained on large amounts of labeled data. The method was much more accurate than any previous unsupervised technique. It was not, however, quite as good as the best supervised methods (those that use labeled data) in English. The researchers have also used the method to train speech recognition systems that work relatively well for less common languages where obtaining enough labeled data presents a challenge, including Kyrgyz, Swahili, and Tatar. This is where the researchers think the unsupervised method could pay real dividends.

The method Facebook came up with relies on generative adversarial networks (GANs), the same technology that lies behind deepfakes. Here's how it works according to a Facebook blog post on the research: First, an unsupervised machine learning system learns to parse raw audio into individual sounds using a clustering technique. A neural network is then used to predict how these sounds fit together to form phonemes. The first network then passes this phonetic transcript to a second neural network that has been trained on natural language to assess whether a phonetic sequence seems realistic. If the sequence doesn't, the first network must try again until it eventually fools the second network.
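The first step in that description, grouping snippets of audio into sound-like units by clustering, can be illustrated with a toy k-means routine in Python. This is a sketch under heavy assumptions, not Facebook's code: the real system works on far richer audio features than the single made-up number per frame used here, and the function name, starting centers, and data are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalar values; returns (centers, labels)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest cluster center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, labels


# Made-up "audio frame" features that fall into three obvious groups.
frame_features = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8, 9.1, 8.9]
centers, labels = kmeans_1d(frame_features, centers=[0.0, 5.0, 10.0])
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

No transcripts are involved at any point: the frames are grouped purely by similarity, which is what makes the step unsupervised. The later adversarial stages, in which one network proposes phoneme sequences and another judges them, are not shown here.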

FORTUNE ON A.I.

Giving traffic jams the heave-ho with the help of A.I. and data—by Fortune Editors

Startup Tenstorrent aims to build A.I. chips that beat Nvidia’s best—by Aaron Pressman

Google just revealed 3 new features. Here’s everything you need to know—by Danielle Abril

This startup just landed more funding to bolster its fight against phishing—by Jonathan Vanian

BRAIN FOOD

The road to self-driving cars keeps getting longer. The New York Times' ace A.I. reporter Cade Metz takes a long, hard look at why self-driving cars have failed to live up to their hype, and why many of those working on the technology are now trying to reset expectations, predicting that it may actually be decades more before the technology becomes mainstream. Metz examines why so many tech and ride-sharing companies, which have poured billions into the idea of autonomous vehicles, misjudged the pace of progress so badly. (Many had predicted fleets of self-driving cars would be mainstream by now.)

One theory, he explains, is that nothing really went wrong: It is just that scientific journeys are full of unpredictable twists and turns, and there have been far more of those than people anticipated. But, as Metz explains, it didn't help that prominent tech gurus, such as Elon Musk, hyped the technology or that Silicon Valley venture capital firms were willing to throw billions into the pursuit of the self-driving dream, driven as much, it seems, by FOMO as anything else. I guess they needn't really have worried about missing out on a windfall: Teaching A.I. software to drive in a variety of road and weather conditions has proved much more difficult than anyone ever imagined. Even Waymo, which has a fully autonomous taxi service operating in one part of Phoenix, says the driving conditions there don't easily translate to other cities in different geographies, and it is signaling that its next move may actually be in long-haul trucking, where the driving is in many ways far easier than in a complex urban environment.

Nathaniel Fairfield, a Waymo software engineer, describes the problem as being like the layers of an onion: a company can get its A.I. systems to master certain scenarios, but that only allows it to get its car into even more complicated situations, such as trying to drive in a crowded city, that reveal yet more complexity. And here's how Metz summarizes the views of Dave Ferguson, a former Waymo engineer who is now president of Nuro, a company working on robots that can deliver goods like groceries or pizzas:

Mr. Ferguson said many thought self-driving technology would improve like an internet service or a smartphone app. But robotics is a lot more challenging. It was wrong to claim anything else.

“If you look at almost every industry that is trying to solve really, really difficult technical challenges, the folks that tend to be involved are a little bit crazy and little bit optimistic,” he said. “You need to have that optimism to get up every day and bang your head against the wall to try to solve a problem that has never been solved, and it’s not guaranteed that it ever will be solved.”

I, for one, had been hoping self-driving cars would arrive imminently, if for no other reason than it would spare me the anxiety and stress of watching my children learn to drive. But I guess I'll just have to steel myself after all. (On the other hand, if the pace of progress really is uncertain, maybe some unforeseen breakthrough will yet spare me. I've got a few years left until my eldest can apply for her learner's permit.)
