This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.
We’ve written before about the struggle companies are having realizing financial gains from artificial intelligence. Too often A.I. projects disappoint. Maybe part of the problem is one of expectations—many A.I. projects are too ambitious. Sometimes it is the simplest, most mundane uses of machine learning that have the biggest impact.
Recently, I spoke to Michael Veselinovski, a senior account supervisor at the advertising firm Campbell Ewald, in Detroit. He says he used to spend a large portion of his time analyzing data from digital ad campaigns to figure out what was working and what wasn’t, as well as trying to gather statistics that could prove that campaigns Campbell Ewald had designed for its clients were actually delivering a return on investment. “For us it was a lot of manual work,” Veselinovski says. “We had to come up with a hypothesis and then test it out.”
Then Campbell Ewald started using software from a California-based A.I. company called inPowered that specializes in A.I. that helps brands better position what’s known as native advertising—that is, advertising that looks like journalism, but is designed to help create a positive impression of a particular brand or at least make it more likely the reader will buy a particular product or service. William Lever, the British industrialist who founded the company that would eventually become consumer goods giant Unilever, “once famously said, ‘half my advertising doesn’t work, I just don’t know which half.’ What we wanted to do is actually build a way to figure that out and make sure you’re only spending money on the half that works,” Peyman Nilforoush, inPowered’s co-founder and CEO, says.
InPowered’s software takes in data from past ad campaigns as a baseline, but then it generates new data by automatically running a series of small A/B tests. It uses these to figure out the best pieces of marketing content to run on a given website. It also figures out the best time and place within that website to position a “call to action”—such as a pop-up asking the reader to click through to another piece of content or a different site where the advertiser may be able to make a sale. Critically, inPowered’s “content intelligence” platform figures out the best wording to use in the headline of that content and in those pop-ups to drive clickthrough rates.
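inPowered hasn’t disclosed how its testing engine actually works, but the kind of continuous, automated A/B testing described above is commonly implemented as a multi-armed bandit. Here is a minimal sketch in Python—the headline variants and their click-through rates are invented for illustration—using Thompson sampling, which steers most impressions toward whichever headline is performing best:

```python
import random

random.seed(0)

# Hypothetical headline variants and their true (unknown) click-through rates.
HEADLINES = {
    "How much life insurance do you really need?": 0.12,
    "Calculate your coverage in 60 seconds": 0.08,
    "Life insurance basics": 0.02,
}

def thompson_sample(trials=5000):
    """Serve impressions via Thompson sampling on Beta posteriors."""
    stats = {h: [1, 1] for h in HEADLINES}  # [clicks+1, misses+1] per headline
    for _ in range(trials):
        # Draw a plausible CTR for each headline from its posterior,
        # then serve the headline with the highest draw.
        choice = max(stats, key=lambda h: random.betavariate(*stats[h]))
        clicked = random.random() < HEADLINES[choice]  # simulate the impression
        stats[choice][0 if clicked else 1] += 1
    return stats

results = thompson_sample()
for headline, (clicks, misses) in results.items():
    print(f"{headline!r}: served {clicks + misses - 2} times")
```

Unlike a fixed 50/50 A/B split, a bandit shifts traffic toward the winning variant while the test is still running, which is what makes running many small tests continuously affordable.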
For one Campbell Ewald client, an insurer, the system increased the share of people who clicked through to its life insurance calculator from less than 0.1% to 12%. “That’s 6,000 incremental clicks we’ve driven to the insurance calculator,” Veselinovski says. And those clicks are more likely to result in sales because people who have already landed on the insurance calculator are 40% more likely to engage with the next piece of content from the insurer, such as an online quote.
InPowered is not the only company making software like this. Competitors include BrightEdge, MarketMuse, and Concured. The sophistication of what each of these A.I. systems does varies. inPowered, for example, uses some natural language processing, the kind of A.I. that can understand and manipulate words, to figure out the best audience for a given piece of content. NLP is an area where A.I. has been making rapid progress, and inPowered has experimented with some cutting-edge NLP techniques, according to Nilforoush. But for the most part, the kind of machine learning inPowered and its competitors are using wouldn’t wow A.I. researchers.
But for the people using these systems, the effect is nonetheless transformative. Not only has it resulted in far better returns on their marketing spend for Campbell Ewald’s clients, it has changed how Veselinovski, the senior account supervisor, works. “The thing I love about A.I. and data is the discussion that happens afterwards,” Veselinovski says. Rather than spending his day analyzing data, he now spends more time talking to clients about their brand positioning and overall marketing strategy. “When you can allocate more time to that discussion and the strategy, that is when the good ideas come,” he says.
In other words, this is a case where A.I. frees humans up to do the critical thinking and creative work they’re best at. And isn’t that what we really want from A.I.—that it will do the boring stuff for us?
Here’s the rest of this week’s A.I. news.
A.I. IN THE NEWS
Banks are turning to A.I. to help them deal with the demise of Libor. That's according to a story in The Financial Times. The paper reports that the phasing out of the scandal-plagued London Interbank Offered Rate (Libor), a key interest rate benchmark used as a critical reference point in many financial contracts, has led to a massive "repapering" exercise. This refers to banks and asset managers having to replace all references to Libor in existing contracts. The task is monumental, so the banks' legal departments and outside law firms are deploying legal A.I. software, from vendors such as Eigen Technologies and Heretik, to help. The software can automatically find references to Libor—or other clauses that might refer to the benchmark without using that specific term—across entire databases of contracts and in multiple languages.
Barclays and Amazon team up to use A.I. to make credit decisions. British bank Barclays has struck a deal with Amazon to share data that will allow Barclays credit card customers to pay for items on Amazon under installment plans in Germany. The FT's Gillian Tett argues that the deal is all about A.I., which is what Barclays is using in the background to analyze customer data and make decisions about whether to offer the customer credit and on what terms. Barclays CEO Jes Staley tells Tett the deal is "one of the most important things to have happened to Barclays in the past five years." But Tett worries that the opacity of A.I.-based credit decisions is problematic, presenting a challenge to regulators and possibly even presenting systemic risks to the financial system.
And speaking of using A.I. to make credit decisions...Zest A.I. strikes a deal with Freddie Mac. The A.I. startup, which I profiled for Fortune a few weeks ago, has struck a deal with the home mortgage lending corporation to use Zest's A.I. to improve its credit risk modeling, according to a story in trade publication A.I. Authority. It's yet another example of banks trying to use A.I. to make both better and fairer credit decisions.
Using A.I. to help save the elephants. Smart Parks, a Dutch wildlife conservation organization, teamed up with Hackster.io, an open-source community owned by Avnet, and several leading tech companies—including Microsoft, u-blox, Taoglas, Nordic Semiconductor, Western Digital and Edge Impulse—to help fund developers around the world who collaborated on a better tracking collar for monitoring endangered elephant populations in the wild. The collar they came up with—called ElephantEdge and built by engineering firm Irnas—will, according to a story in TechCrunch, use open-source machine learning models to better determine when elephants are on the move and what activities they are engaged in. Auditory detection software in the trackers will also alert park rangers to the presence of human voices near the elephants, a possible warning sign of poachers.
Facebook, awash with problematic content, is ramping up its use of A.I. to police its social networks. But its human moderators say the tech isn't working. Facebook has revealed that it was inundated with disinformation during the run-up to November's U.S. presidential election, as my Fortune colleague Danielle Abril has reported. At the same time, the company has made big strides in using A.I. to police content that violates its policies and has now put machine learning algorithms in charge of triaging the content that is brought to the attention of the 15,000 human content moderators it employs around the world. But at least some moderators feel the system isn't working and they are still being exposed to too much graphic and violent content, inflicting potential psychological damage and distress, according to a story in tech publication The Register.
In fact, they claim failures in the A.I. system mean that they are being exposed to more potentially harmful content than ever before. Two hundred of them have signed an open letter to the company, backed by non-profit tech advocacy group Foxglove, saying, "it is important to explain that the reason you have chosen to risk our lives is that this year Facebook tried using ‘AI’ to moderate content—and failed." Meanwhile, Danielle also reports on the news that the Anti-Defamation League is also not happy with Facebook's efforts to combat hate speech. A.I. notwithstanding, Facebook users are still seeing half a trillion posts annually that contain hate speech, according to an ADL-commissioned report. What's more, as I noted in Fortune last week, even if the A.I. works perfectly it won't solve many of the problems Facebook is facing, which stem as much from bad human judgments and policy decisions as any technological failure.
Researchers are having success using A.I. and drones to look out for great white sharks swimming close to Southern California beaches. Douglas McCauley, a marine science professor at the University of California, Santa Barbara, and the director of the Benioff Ocean Initiative, is working on the technology with A.I. researchers from Salesforce, the company founded and run by Marc Benioff, and San Diego State University. They have tested the system, which is called SharkEye, at Padaro Beach in Santa Barbara County which, according to a story in The New York Times, is both a popular place to learn to surf and a nursery for juvenile great whites. Currently, a human drone pilot and an A.I. system jointly work to spot any sharks and then send text messages to lifeguards and surf school instructors. The researchers are hoping to eventually train an A.I. system that will be able to predict, based on ocean conditions and temperature, how likely it is that sharks will show up at a particular beach.
In India, A.I. is being used to combat tuberculosis. Several computer vision algorithms, which have been created by a variety of different A.I. companies, can detect signs of tuberculosis from chest X-rays. They can be deployed on a mobile app, offering hope for patients in parts of rural India where TB is rampant and yet people tend to have access only to general medicine doctors, if they have access to a doctor at all, The New York Times reports. “Most chest X-rays for people who are suspected of having tuberculosis are read by people who are not remotely expert at interpreting them,” Dr. Richard E. Chaisson, a TB expert at Johns Hopkins University, told The Times. “If there were an A.I. package that could read the X-rays and the CT scans for you in some remote emergency room, that would be a huge, huge advance.”
The non-profit Stop TB Partnership tested a number of different algorithms trained to detect signs of TB from chest X-rays and found they all outperformed expert human radiologists, but The Times noted that their performance varied under real-world conditions—which is a problem with a lot of A.I. software trained to be used in medical settings. Also, the apps are being used to assess the X-rays of children, even though the apps were not specifically trained on pediatric data, where signs of TB can be different from those found in adult scans. But the story notes that India is also severely lacking human radiologists who are comfortable interpreting pediatric X-rays and that several of the A.I. app makers are now trying to validate their technology for use with children's X-rays.
V.A. turns to A.I. to try to prevent suicide among veterans. Doctors with the U.S. Department of Veterans Affairs have begun using machine learning to try to identify patients who may be at risk of committing suicide, according to a story in The New York Times. A psychiatrist interviewed by the paper said that even trained psychiatrists were often poor judges of which patients were most at risk of killing themselves. While similar systems have been piloted in the U.K. National Health Service, the U.S. Army and elsewhere, the V.A. effort is the first one to use the system in daily clinical practice and is being closely watched by other medical organizations and researchers, according to The Times.
Roboticists worry our robots might be racist and there's controversy about what, exactly, to do about it. As awareness has grown that many A.I. algorithms incorporate racial biases, due to having been trained on explicitly racist past practices or due to an unintentional lack of diversity in training data, researchers and engineers who work on robots have grown particularly alarmed: After all, these are machines that are trained to take physical actions in the world, where the consequences of making a wrong decision can be severe. The issue has become particularly fraught when it comes to police departments using robots, The New York Times says.
Ayanna Howard, a Georgia Tech roboticist and head of the group Black in Robotics, and Jason Borenstein, a colleague from Georgia Tech's public policy department, wrote in a 2017 research paper: “Given the current tensions arising from police shootings of African-American men from Ferguson to Baton Rouge, it is disconcerting that robot peacekeepers, including police and military robots, will, at some point, be given increased freedom to decide whether to take a human life, especially if problems related to bias have not been resolved.”
This summer, after more police shootings of unarmed Black people, several roboticists penned open letters and manifestos urging their fellow scientists and engineers to stop working with U.S. police departments entirely until major reforms to combat institutional racism are undertaken. But that stance has proven controversial, with at least one roboticist, Cindy Bethel, from Mississippi State University, telling The Times it would be better to work with police departments to develop robots—for instance, to help conduct surveillance inside an apartment so police don't have to enter, guns drawn, not knowing what's inside.
EYE ON A.I. TALENT
Trading firm Liquidnet, which runs one of the largest markets where securities' prices are not publicly listed, known as a "dark pool," has named Steven Nichols as head of natural language processing and unstructured data, according to financial publication The Trade. Nichols has been at Liquidnet since 2019, when the trading firm acquired Prattle, an A.I. company specializing in NLP, where Nichols had been head of data science. Liquidnet also announced the hiring of three new data scientists: Nicholas Burtch, Anthony Schramm, and Yusong Liu.
ringDNA, a Los Angeles-based company that sells several A.I.-enabled software tools to help companies increase their revenues and improve the performance of their sales teams, has hired Christine Hill to be vice president of customer success, according to a story in trade publication AiAuthority. Hill was previously senior director of worldwide sales at cloud data management company Rubrik.
EYE ON A.I. RESEARCH
DeepMind and Liverpool FC team up to explore how A.I. can be used in football (soccer to us Americans). In a recent research paper published on the research repository arxiv.org, scientists from London-based A.I. company DeepMind (which is part of Google-owner Alphabet) and sports data boffins from Premier League football club Liverpool FC explore the ways deep learning might contribute to the sport in the future. The team of researchers looked at ways to combine statistical learning, computer vision and game theory in different combinations.
In one example of what A.I. methods could do, the researchers combined statistical analysis of penalty kicks with game theory. The researchers found that by clustering players by playing style (a measure they derived from analysis of how they moved on the field in game video) it was possible to recommend slightly different penalty kick strategies for different groups of players. In another example they used computer vision and statistical analysis to compute the best predicted trajectories for players to take as they ran down the field and looked for situations in which a player's actual movements deviated significantly from this predicted trajectory.
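The paper doesn't include its clustering code, but the idea of grouping players by movement features can be sketched with a toy k-means in Python. The player names, features, and numbers here are all invented for illustration:

```python
# Hypothetical movement features per player, derived in the real paper from
# game video: here, (avg. sprint speed in m/s, avg. run length in m).
players = {
    "P1": (7.1, 18.0), "P2": (7.3, 17.5), "P3": (4.9, 30.2),
    "P4": (5.1, 29.8), "P5": (7.0, 18.4), "P6": (5.0, 31.0),
}

def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])))

def kmeans(points, centers, iters=10):
    """Plain k-means; `centers` seeds the clusters deterministically."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if cluster is empty).
        centers = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers

points = list(players.values())
# Seed with two dissimilar players so the sketch converges deterministically.
centers = kmeans(points, centers=[points[0], points[-1]])
groups = {name: nearest(f, centers) for name, f in players.items()}
print(groups)
```

Each resulting group of players could then be assigned its own recommended penalty-kick strategy, which is the kind of per-cluster recommendation the researchers describe.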
The paper's authors say the ultimate goal would be to create an Automated Video Assistant Coach (AVAC) that could take unlabeled video feeds and analyze them to both suggest specific improvements that individual players can make to their technique, both during training and in the course of a game, as well as analyzing video feeds of opposing teams and then making recommendation's to the team's coach about what strategies or tactics might work well against that particular opponent.
Does this paper mean that DeepMind, which has previously created A.I. systems that have beaten top human players at numerous games including Go and Starcraft 2, is actually going to try to conquer football or create the A.I. coaching assistant the researchers postulate? I have no idea, but DeepMind has occasionally telegraphed the next grand challenges it plans to take on with papers of this sort. So watch this space....
FORTUNE ON A.I.
Who’s liable when a self-driving car collides with another vehicle?—by David Z. Morris
Marissa Mayer launches her first startup—by Lucinda Shen
Facebook’s A.I. is getting better at finding malicious content—but it won’t solve the company’s problems—by Jeremy Kahn
Facebook’s latest efforts to combat hate speech aren’t enough, ADL says—by Danielle Abril
Why Biden must rely on innovation to rejuvenate the economy—by Joe Crowley
Google finds a massive problem with the way most modern A.I. systems are designed and trained. Many modern deep learning A.I. systems are trained on historical data and then tested on a subset of this training data that they have not seen before. But, despite often achieving excellent performance on such tests, these machine learning systems frequently fail when deployed in real world environments. This failure is usually attributed to a mismatch between the real world data and the training data which the A.I. system is unable to cope with. But now a major study undertaken by 40 Google researchers and A.I. experts across seven different departments at the company has found there is a far more fundamental and insidious problem that is often at the root of these failures.
The issue is a statistical phenomenon called "underspecification" and it has to do with the fact that while deep learning systems are very good at figuring out correlations, they cannot determine causation or even take known causal factors into consideration. As a result, it is possible to have two different machine learning systems that, by placing different weights on different datapoints, achieve identical test set performance but which will have wildly different performances when deployed in the real world.
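The Google study concerns large deep learning models, but the core phenomenon can be shown with a toy sketch in Python (all data and "models" here are invented): two predictors that are indistinguishable on a held-out test set, because a spurious correlation holds there, diverge sharply once that correlation breaks after deployment.

```python
import random

random.seed(1)

def make_data(n, spurious_corr):
    """y is caused by x1; x2 merely co-occurs with y at rate `spurious_corr`."""
    rows = []
    for _ in range(n):
        x1 = random.random()
        y = int(x1 > 0.5)                       # the true causal rule
        x2 = y if random.random() < spurious_corr else 1 - y
        rows.append((x1, x2, y))
    return rows

def model_causal(x1, x2):
    return int(x1 > 0.5)                        # relies on the causal feature

def model_spurious(x1, x2):
    return x2                                   # relies on the correlate

def accuracy(model, rows):
    return sum(model(x1, x2) == y for x1, x2, y in rows) / len(rows)

test_set = make_data(2000, spurious_corr=1.0)   # correlation holds at test time
deployed = make_data(2000, spurious_corr=0.5)   # correlation breaks "in the wild"

# Both models look perfect on the test set; only one survives deployment.
print(accuracy(model_causal, test_set), accuracy(model_spurious, test_set))
print(accuracy(model_causal, deployed), accuracy(model_spurious, deployed))
```

Nothing in the test-set scores distinguishes the two models, which is exactly the sense in which the training-and-testing procedure "underspecifies" which model you've actually got.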
Google found these problems in a huge number of A.I. systems deployed in the real world, including some where underspecification could have life-and-death consequences, such as three different medical A.I. systems that are supposed to predict eye disease from retinal scans, cancer from skin lesions, and kidney failure from patient records.
“We are asking more of machine-learning models than we are able to guarantee with our current approach,” Alex D’Amour, who led the study, tells M.I.T. Technology Review. This means that in practice, given current training and testing methods, it is actually impossible to tell if you've found the best machine learning model or to have any idea how well that model is actually going to perform in the real world.
What's more, it's not clear that there is any way to solve the problem with most existing deep learning models and training methods. To do so, one would have to endow the deep learning model with a lot more causal structure. But the whole reason that people often want to use a deep learning model in the first place is that they don't actually know or understand the causal relationship between the data and the phenomenon they are trying to predict. As Brandon Rohrer, a machine-learning engineer at iRobot, told MIT Tech Review, the Google paper is "a wrecking ball" for the entire A.I. field.
For the moment, D'Amour suggests that anyone hoping to deploy a deep learning model do a lot more testing, including tests that are specifically designed to fool or break the system. He also suggested that people build many different models for every task they wish to accomplish and then have these compete against one another on the real-world task until it becomes clearer which is actually going to perform best in this live environment.