
A.I. engineers should spend time training not just algorithms, but also the humans who use them

May 5, 2020, 1:40 PM UTC

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.

Last month in this newsletter, I interviewed Ahmer Inam, the chief A.I. officer at technology services firm Pactera Edge, who offered some advice for how companies can build machine learning systems that can cope with the changes the pandemic has caused to normal business and consumption patterns.

Inam argued that the coronavirus pandemic is pushing many businesses to accelerate the adoption of more sophisticated artificial intelligence.

Abhishek Gupta, a machine learning engineer at Microsoft and founder of the Montreal AI Ethics Institute, got in touch over Twitter to say that I should have highlighted some important safety issues to bear in mind when considering Inam’s suggestions.

Last week, I caught up with Gupta by video call and asked him to elaborate on his very valid concerns.

One of the suggestions Inam made was for the A.I. systems to always be designed with a “human in the loop,” who is able to intervene when necessary.

Gupta says that in principle, this sounds good, but in practice, there’s too often a tendency towards what he calls “the token human.”

At worst, this is especially dangerous because it provides the illusion of safety. It can just be a check-the-box exercise where a human is given some nominal oversight over the algorithm, but actually has no real understanding of how the A.I. works, whether the data analyzed looks anything like the data used to train the system, and whether its output is valid.

If an A.I. system performs well in 99% of cases, humans tend to become complacent, even in systems where the human is more empowered. They stop scrutinizing the A.I. systems they are supposed to be supervising. And when things go wrong, these humans-in-the-loop can become especially confused and struggle to regain control: a phenomenon known as “automation surprise.”

This is arguably part of what went wrong when an Uber self-driving car struck and killed pedestrian Elaine Herzberg in 2018; the car’s safety driver was looking at her phone at the moment of the collision. It was also a factor in the two fatal crashes of Boeing 737 Max airliners, in which the pilots struggled to figure out what was happening and how to disengage the automatic pilot.

Gupta thinks there’s a fundamental problem with the way most A.I. engineers work: They spend a lot of time worrying about how to train their algorithms and little time thinking about how to train the humans who will use them.

Most machine learning systems are probabilistic—there is a degree of uncertainty to every prediction they make. But a lot of A.I. software has user interfaces that mask this uncertainty.  

It doesn’t help, Gupta says, that most humans aren’t very good at probabilities. “It is hard for most people to distinguish between 65% confidence and 75% confidence,” he says.

More A.I. software, he says, should be designed to show a user its confidence in its own predictions. Better yet, if that confidence drops below a certain pre-defined threshold, the software should alert the user that it simply can’t make a prediction. Users should also be told exactly what data was used to train the software and what its strengths and weaknesses are.
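A minimal Python sketch of that idea follows. It is an illustration only: the 80% cutoff, the `Prediction` type, and the function name are hypothetical, and a real system would tune its threshold on validation data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical cutoff chosen for illustration; tune on validation data in practice.
CONFIDENCE_THRESHOLD = 0.80


@dataclass
class Prediction:
    label: Optional[str]  # None means the software declined to predict
    confidence: float     # highest class probability reported by the model
    message: str          # text shown to the human reviewer


def predict_with_abstention(
    probabilities: Sequence[float],
    labels: Sequence[str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Prediction:
    """Return the top label with its confidence, or abstain below the threshold."""
    if not labels or len(probabilities) != len(labels):
        raise ValueError("need exactly one probability per label")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    if confidence < threshold:
        # Surface the uncertainty instead of hiding it behind a confident-looking answer.
        return Prediction(
            None,
            confidence,
            f"No prediction: confidence {confidence:.0%} is below the "
            f"{threshold:.0%} threshold. Please review manually.",
        )
    return Prediction(labels[best], confidence, f"{labels[best]} ({confidence:.0%} confident)")
```

Calling `predict_with_abstention([0.65, 0.35], ["approve", "deny"])` returns no label and a message asking for manual review, while `[0.9, 0.1]` returns "approve" along with its confidence, so the user always sees how sure the software is.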

Pactera Edge’s Inam said that the pandemic was also leading more companies to experiment with reinforcement learning, the kind of A.I. that learns from experience (usually in a simulator) rather than from historical data. Gupta said that while reinforcement learning can indeed be very powerful, it can also be particularly dangerous.

The biggest challenge with reinforcement learning is specifying the A.I.’s goal in such a way that it will learn to do what you want it to do—without doing something dangerous or harmful in the process.

A.I. software is remarkably adept at “specification gaming”—finding shortcuts through data that allow it to achieve its objective, but not in the way or spirit that its creators intended. Programmers have to be extremely careful about how they state the algorithm’s objective and how they design positive reinforcement during its training.
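To make the idea concrete, here is a toy Python sketch. Everything in it is invented for illustration (the five-tile track, the reward values, and the 13-step limit) and it is not drawn from any real system. The designer wants the agent to reach the finish line, but also pays a small bonus each time the agent steps onto one tile. An exhaustive search over every possible plan shows that the highest-scoring behavior is to shuttle back and forth over the bonus tile and never finish.

```python
from itertools import product

# All values below are made up for this toy example.
TRACK_LENGTH = 5    # positions 0..4; position 4 is the finish line
BONUS_TILE = 1      # stepping onto this tile pays out every single time
BONUS_REWARD = 3
FINISH_REWARD = 5
MAX_STEPS = 13


def run_episode(actions):
    """Play a sequence of moves (+1 right, -1 left); return (total reward, finished?)."""
    position, total = 0, 0
    for move in actions:
        # Clamp so the agent cannot walk off either end of the track.
        position = min(max(position + move, 0), TRACK_LENGTH - 1)
        if position == BONUS_TILE:
            total += BONUS_REWARD
        if position == TRACK_LENGTH - 1:
            return total + FINISH_REWARD, True  # episode ends at the finish line
    return total, False


def best_plan():
    """Search all 2**MAX_STEPS move sequences for the one with the highest stated reward."""
    return max(product((1, -1), repeat=MAX_STEPS), key=lambda plan: run_episode(plan)[0])


intended = run_episode([1] * MAX_STEPS)  # what the designer had in mind: head straight for the finish
gamed = run_episode(best_plan())         # what maximizing the stated reward actually produces
print("intended:", intended)
print("gamed:   ", gamed)
```

Under these assumed numbers, the straight run to the finish earns 8 points, while the reward-maximizing plan earns 21 without ever crossing the line. The objective as written was satisfied; the objective as intended was not.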

DeepMind, one of the world’s foremost practitioners of reinforcement learning, recently published an excellent compilation of about 60 examples of A.I. systems running amok due to inaptly specified objectives and rules. (They come from lots of different researchers—not just DeepMind—and span about 30 years of experiments.)

Even an A.I. that performs fantastically in a simulator may not be safe when it is transferred to the real world, since simulations are never perfect. Even with thousands of human years of training in a simulator, it’s impossible to know if the A.I. will actually be able to cope with everything the real world might throw at it.

“Reinforcement learning is opening up new avenues for us, there’s no doubt,” Gupta says. “And in Go and chess, it’s fine. But we don’t have disposable humans to use to test autonomous vehicles.”

Good points all. Now here’s the rest of this week’s news in A.I.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

A.I. IN THE NEWS

Utah suspends work with A.I. startup Banjo over CEO's white supremacist past. The State of Utah has suspended its contract with computer vision startup Banjo, which sold software to help law enforcement agencies track people through traffic and CCTV cameras, following revelations that its co-founder and chief executive officer, Damien Patton, had once been a member of a Ku Klux Klan splinter group and participated in a gun attack against a synagogue, The Salt Lake Tribune reported. Patton was 17 at the time of the synagogue attack, which damaged the building in Nashville, Tennessee. He was convicted of juvenile delinquency and testified against fellow gang members. OneZero, which first uncovered his past, found evidence that Patton's affiliation with white supremacists continued after the incident, and, in court documents, Patton admitted he associated with skinheads after joining the U.S. Navy at age 18. Patton, now 47, told OneZero that his actions were "indefensibly wrong" and that he felt "extreme remorse" for "this shameful period in my life." Banjo has raised more than $220 million in venture capital to date, from investors that include SoftBank.

Your A.I. may be a genius but it can't legally be an inventor, U.S. government says. The U.S. Patent and Trademark Office has ruled that only "natural persons" can be listed as the legal inventor on a valid patent filing. The decision followed two patent applications (one for a flashing light and another for a food container) that were conceived by an A.I. system called DABUS, which was created by a group called the Artificial Inventor Project, The Verge reported. The Financial Times had earlier reported that the project was a deliberate attempt by an international group of A.I. and legal scholars to test the limits of the U.S., British and European patent systems.

Senator Tom Cotton promises "hard look" at Chinese science students in the U.S., but there's little evidence of the brain drain he alleges. Senator Tom Cotton, the Arkansas Republican who sits on a number of intelligence and defense-related committees, stirred controversy with his remarks on Fox News last week suggesting it was scandalous that U.S. universities trained a large number of Chinese students in technical subjects, including artificial intelligence and quantum computing, only to have these students return to China. Cotton threatened to introduce legislation to prevent any Chinese students from coming to the U.S. to study STEM subjects. A number of leading U.S. technologists, from both industry and academia, took to social media to condemn Cotton's remarks as xenophobic. And a Washington Post story looking at the available data found that most PhD students from China wanted to remain in the U.S. after their training and that there was no evidence of a sustained brain drain. In fact, the data showed quite the opposite. The story noted that no good data was available for those receiving bachelor's and master's degrees.

BenevolentAI helps find a possible coronavirus treatment. The London-based A.I. startup BenevolentAI used its software, which is designed to find novel drug research avenues lurking unnoticed in the medical literature, to pinpoint "a possible treatment with speed that surprised both the company that makes the drug and many doctors who had spent years exploring its effect on other viruses," according to a story in The New York Times. After just two days, the A.I. alighted on baricitinib, a drug that was originally designed to treat rheumatoid arthritis. It will soon be tested on coronavirus patients in an accelerated clinical trial overseen by the U.S. National Institutes of Health.

Facebook creates the most human-like chatbot yet. Facebook has created an open-domain chatbot, one that can hold a conversation on virtually any subject, that comes tantalizingly close to passing the Turing Test. In blind tests, 49% of human evaluators said they would prefer to keep chatting with Facebook's A.I. software, which it calls Blender, instead of an actual person. The system also scored far better than the previous champion chatbot, a system called Meena, created earlier this year by Google. Facebook's chatbot is designed to blend (hence the name) essential conversational skills such as knowledge, personality and empathy. It was created using one of the largest language models to date, with some 9 billion variables, pre-trained on a data set that included 1.5 billion Reddit comment thread posts and then fine-tuned on four separate training sets that helped it hone each of the conversational skill sets and learn how to mix them together in a human-like manner. Here's my story in Fortune on the chatbot breakthrough.

Google uses an A.I. to successfully predict gene expression in yeast. A team of Google A.I. researchers, working with a group from sister company Calico Life Sciences, has built a machine learning model that looks at an entire genome and tries to accurately predict which genes will be expressed at any given point in time, Google says in a blog post. The researchers were only looking at yeast, a single-celled organism, but they said their work might have important implications for the study of aging, since yeast cells tend to self-destruct after they bud, or self-replicate, about 30 times. In one experiment, the researchers tried to see if their machine learning model could accurately predict which genes would serve as intermediate regulators of the expression of other genes. The researchers were only able to experimentally verify three out of 10 predictions the machine learning system made. But, interestingly, one of those predictions involved a protein scientists had not previously identified as a regulator. Another involved a protein that turned out to be a very active regulator; it had been known to scientists previously, but its significance had never been proven experimentally.

How Eric Schmidt became the liaison between Silicon Valley and the Pentagon. The New York Times has an interesting look at how ex-Google chairman Eric Schmidt has reinvented himself as the guy evangelizing state-of-the-art software, particularly A.I., to the U.S. military. Schmidt serves on several civilian boards that advise the military on innovation and A.I. Following one tour, he told a four-star general in charge of special operations that "you absolutely suck at machine learning," according to the story, and then made it his mission to help the military improve. Outside observers tell the newspaper that Schmidt may wind up like many other proponents of a revolution in military technology: frustrated by the slow pace of progress, bureaucratic inertia and the internal and external politics that stymie change within the defense establishment. The story also details potential conflicts of interest Schmidt has due to his past role at Google and his current investment in a tech startup, Rebellion Defense, that wants to sell its software to the Pentagon.

EYE ON A.I. TALENT

Uber Chief Technology Officer Thuan Pham has resigned from the company amid widespread layoffs, The Information reported. Pham, who joined Uber in 2013, was the company's longest-serving senior executive.

AiCure, a New York-based company that uses A.I. to improve clinical trials, has appointed Ed Ikeguchi as its new chief executive officer. Ikeguchi, who had been serving as AiCure's president and chief medical officer, had previously co-founded and served in senior executive roles at Medidata Solutions.  

Braid Health, a San Francisco startup that uses computer vision software to create A.I.-powered diagnostic tools for radiology, has hired Rajni Natesan as its chief medical officer. Natesan had been a professor of diagnostic radiology at MD Anderson Cancer Center.  

Clari, the Sunnyvale, California, technology company that uses A.I. to help companies better forecast and improve their revenue collection, has appointed Laura MacKinnon as its chief people officer. MacKinnon was most recently in the same role at SignalEx.

EYE ON A.I. RESEARCH

Baidu's researchers top "city of the future" challenge. The AI City Challenge is a contest created by an international group of A.I. researchers "to accelerate intelligent video analysis that helps make cities smarter and safer." It challenges teams to create A.I. systems that can perform a variety of tasks related to monitoring vehicle traffic in a large city based on camera data.

This year's competition attracted 315 teams from 37 countries, which had to design systems that could handle four separate tasks: automatic vehicle counting, vehicle re-identification, vehicle tracking, and detecting traffic anomalies. Teams from Baidu, the Chinese technology giant, took the top positions in all of these tasks except vehicle tracking, which was won by a team from Carnegie Mellon University.

Some of the tasks were clearly harder for A.I. systems to handle than others: Less than one hundredth of a point separated Baidu from the second- and third-place teams on the vehicle counting task. But on the vehicle re-identification task, Baidu won more convincingly, with 84% accuracy compared to 78% for the runner-up. On traffic anomaly detection, Baidu achieved a remarkable 96% accuracy compared to just 57% for the second-place algorithm. Vehicle tracking across multiple cameras remains a tough challenge: The winning CMU team achieved a score of just 0.4589 (or about 46% correct identifications).

The contest shows that A.I.-empowered traffic management systems for large cities may not be too far away. You can read more about the contest and the various contenders here.

FORTUNE ON A.I.

Samsung, Nationwide, and GE just invested in A.I. startup Nexar—by Jonathan Vanian

Cybercriminals adapt to coronavirus faster than the A.I. cops hunting them—by Jeremy Kahn

The Coronavirus Economy: The startup founder in India striving to improve mass transit—by Maithreyi Seetharaman

Facebook creates the most ‘human’ chatbot yet—by Jeremy Kahn

Coronavirus patient data stored in electronic health records found difficult to study at scale—by Fred Schulte and Kaiser Health News

Some of these stories require a subscription to access. Thank you for supporting our journalism.

BRAIN FOOD

What do A.I., Jay Z and Fat Elvis all have in common? Now there's a question. The answer: They all present a thicket of intellectual property rights issues. Several stories this week touched on the challenges for artists and entertainers, A.I. developers, and the legal system.

First came the news that Jay Z was using copyright challenges to combat audio deepfakes. Apparently, the hip hop mogul's signature raspy voice and distinctive rap cadence have become popular material for those creating humorous audio impersonations using deepfake technology. The impersonations, which include what sounds like Jay Z rapping the Billy Joel song "We Didn't Start the Fire," Hamlet's soliloquy, and the Navy SEAL copypasta, have drawn the ire of the star's entertainment company, Roc Nation LLC, which has asked YouTube to remove the material for copyright violation, according to The Verge.

But, after initially taking down the recordings, YouTube subsequently reinstated them, saying Roc Nation's requests were incomplete. Technologist Andy Baio, co-founder of XOXO and former Kickstarter CTO, argues in a lengthy blog post that using deepfakes in this way may not actually violate U.S. copyright laws. They may be covered by fair use, he says, especially as VocalSynthesis, the creator of the deepfake audio, does not seem to be profiting from them or attempting to defame Jay Z.

Baio raises a number of interesting points. While copyright law has previously dealt with the issue of sampling, deepfakes are different: They don't use existing audio recordings except as training data. Instead, the audio output the system generates is completely novel. As the technologist writes, deepfakes like this are clearly "derivative works" under copyright law, but they may not constitute a copyright violation. It may even be the case that the deepfakes themselves can be copyrighted by VocalSynthesis as new pieces. (Although courts may well find that deepfakes are not copyrightable because they are not created by a "natural person." This has huge implications for anyone using A.I. to create art. The artist may be able to claim that the A.I. code is copyrightable, but not the output of that code, which is interesting.)

OpenAI may be wading into the same copyright issues with the debut last week of its new music-making A.I. called Jukebox. The San Francisco-based A.I. research company has created software that can invent novel music, including sung lyrics, in the style of almost any artist in any genre. The system was trained on 1.2 million songs. A separate engine is used to generate lyrics, which are then fed into the music maker that uses a combination of state-of-the-art machine learning techniques to create the new songs. You can read more about it on OpenAI's blog, including a full research paper, and also listen to samples of Jukebox's output.

Part of the issue, as Baio also points out, has less to do with copyright than with artists' publicity rights: the right to use their own likeness, image, voice and style to promote their work. Deepfakes and other synthetic "in the style of" compositions (such as those produced by OpenAI's Jukebox) might infringe on these. Well, it turns out there's a fascinating set of case law on the issue of publicity rights and the fine line between fair impersonation and illegal rip-off, and much of it revolves around the battle that Elvis Presley's estate has fought over the years to police all those Fat Elvis impersonators. For a great overview of that case law, I highly recommend this entertainment law review article by David Wall, a professor at the University of Leeds.