HireVue stops using facial expressions to assess job candidates amid audit of its A.I. algorithms

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.

The journalist Malcolm Gladwell, on his podcast, “Revisionist History,” devoted a recent episode to his theory of “hiring nihilism.” It is Gladwell’s belief that people are so bad at predicting who will perform well at a given role, especially based on traditional screening criteria such as CVs and candidate interviews, that one should simply concede that all hiring is essentially arbitrary. Gladwell explained that when it came time to find a new assistant or hire an accountant, he did so in explicitly arbitrary ways: picking whoever an acquaintance recommended or someone he met on the street, with only the most cursory of face-to-face conversations. Why waste time on a process that would ultimately produce a result no better than throwing darts?

For decades, a segment of the tech industry has grown based on an acceptance of Gladwell’s premise, that humans are terrible at forecasting job performance, but an emphatic rejection of his resort to nihilism. Instead, these tech companies argue, with better screening tools (which, not coincidentally, these same companies happen to sell), this problem can be fixed. Increasingly, artificial intelligence has been a part of what these firms are selling.

A.I. offers the promise that there exists some hidden constellation of data, too complex or subtle for H.R. executives or hiring managers to ever discern, that can predict which candidate will excel at a given role. In theory, the technology offers businesses the prospect of radically expanding the diversity of their candidate pool. In practice, though, critics warn, such software runs a high risk of reinforcing existing biases, making it harder for women, Black people and others from non-traditional backgrounds to get hired. What’s worse, it may cloak in an ever more pseudoscientific veneer a process that remains as fundamentally arbitrary and biased as Gladwell argues.

HireVue is one of the leading companies in “hiretech.” Its software allows companies to record videos of candidates answering a standard set of interview questions and then sort candidates based on those responses, and it has been the target of such criticism. In 2019, the nonprofit Electronic Privacy Information Center filed a complaint against the company with the Federal Trade Commission alleging that HireVue’s use of A.I. to assess job candidates’ video interviews constituted “unfair and deceptive trade practices.” The company says it has not done anything illegal. But, partly in response to the criticism, HireVue announced last year that it had stopped using a candidate’s facial expressions in the video interviews as a factor its algorithms considered.

This past week, the company also revealed the results of a third-party audit of its algorithms. The audit mostly gave HireVue good marks for its efforts to eliminate potential bias in its A.I. systems. But it also recommended several areas where the company could do more. For instance, it suggested the company investigate potential bias in the way the system assesses candidates with different accents. It also turns out that minority candidates are more likely to give very short answers to questions (one-word responses or saying things such as “I don’t know”), which the system had difficulty scoring, resulting in these candidate interviews being disproportionately flagged for human reviewers.

Lindsey Zuloaga, the company’s chief data scientist, told me that the most important factor in predicting whether a job candidate would succeed was the content of their answers to the interview questions. Nonverbal data didn’t provide much predictive power compared to the content of a candidate’s answers—in fact, in most cases, it contributed about 0.25% to a model’s predictive power, she says. Even when trying to assess candidates for a role with a lot of customer interaction, nonverbal attributes contributed just 4% to the model’s predictive accuracy. “When you put that in the context of the concerns people were having [about potential bias], it wasn’t worth the incremental value we might have been getting from it,” Kevin Parker, HireVue’s CEO, says.

Parker says the company is “always looking for bias in the data that goes into the model” and that it had a policy of discarding datasets if using them led to a disparity in outcomes between groups based on things such as race, gender or age. He also notes that only about 20% of HireVue’s customers currently opt to use the predictive analytics feature of the software—the rest use humans to review the candidates’ videos—but that it’s becoming increasingly popular.

HireVue’s audit was conducted by O’Neil Risk Consulting and Algorithmic Auditing (ORCAA), a firm founded by Cathy O’Neil, the mathematician best known for her 2016 book about algorithmic decision-making, Weapons of Math Destruction. ORCAA is one of a growing handful of companies specializing in these kinds of assessments.

Zuloaga says she was struck by the extent to which the ORCAA auditors sought out different types of people HireVue’s algorithms touched, from the job seekers themselves to the customers using the software to the data scientists helping to build the predictive models. One of the things that came across in the audit, she says, is that certain groups of job candidates may be more comfortable than others with the entire idea of being interviewed by a piece of software and having a machine assess that interview, and so there may be some hidden selection bias built into all of HireVue’s data currently.

ORCAA recommended HireVue do more to communicate to candidates exactly what the interview process will involve and how their answers will be screened. Zuloaga says that HireVue is also learning that minority candidates may need more explicit encouragement from the software in order to keep going through the interview process. She and Parker say the company is looking at ways to provide that.

HireVue is among the first to engage a third party to conduct an algorithmic bias audit. And while PR damage control might have been part of the motivation (“this is a continuation of our focus on transparency,” Parker insists), it does make the company a pioneer. As A.I. gets adopted by more and more businesses, it is likely that such audits will become more commonplace. At least the audit reveals that HireVue is thinking hard about issues around A.I. ethics and bias and seems sincere in seeking to address them. It’s an example other businesses should follow. It is also worth remembering that the alternative to using technology such as HireVue’s is not some utopian vision of rationality, empiricism and fairness; it is Gladwell’s hiring nihilism.

And with that, here is the rest of this week’s A.I. news.

Jeremy Kahn
@jeremyakahn
jeremy.kahn@fortune.com

***

The societal reckoning over systemic racism continues to underscore the importance businesses must place on responsible A.I. All leaders are wrestling with thorny questions around liability and bias, exploring best practices for their company, and learning how to set effective industry guidelines on how to use the technology. Join us for our second interactive Fortune Brainstorm A.I. community conversation, presented by Accenture, on Tuesday, January 26, 2021 at 1:00–2:00 p.m. ET.

A.I. IN THE NEWS

U.S. creates National Artificial Intelligence office. In one of the Trump Administration's final acts, it has established the National Artificial Intelligence Initiative Office, which is part of the Office of Science and Technology Policy. The new office is charged with coordinating efforts across academia, government and industry to ensure that the U.S. remains a leader in A.I. development and that this development is in line with U.S. strategy. The announcement spends as much time talking about the symbolism of the office's new insignia (which includes an eagle superimposed on an image meant to represent a neural network and which garnered a lot of comments on Twitter) as it does about anything concrete the new office will do. You can read the announcement and see the new office's badge here.

FAA approves first fully autonomous drone flights. The aviation regulator has granted approval for Massachusetts-based American Robotics Inc. to fly drones without a pilot on the ground overseeing their actions and beyond an operator's visual line of sight, a first that could pave the way for the use of such aircraft in farming, mining and the energy sector, The Wall Street Journal reported. The drones will have to operate in rural areas and below 400 feet.

Social media app Parler, popular with right-wing groups, plans a comeback with help from A.I. After Parler was banned by app stores and dropped by its web hosting service for allowing its users to incite violence and promote hate speech, CEO John Matze told Fox News that he plans to relaunch the messaging service with help from A.I. He said the company would use the technology for content moderation. But, as other social media sites have discovered, A.I. is no easy fix for difficult problems such as hate speech and online bullying. An army of human content moderators is almost always still necessary to review the A.I.'s decisions, handle tricky edge cases, and adapt rapidly to users' ability to game the A.I.-based filters.

Biden names Lander to the Office of Science and Technology Policy. U.S. President-elect Joe Biden named Eric Lander, a geneticist and mathematician who served as co-chair of President Barack Obama's Council of Advisors on Science and Technology, to serve as director of the White House Office of Science and Technology Policy, which Biden also said he is elevating to a Cabinet-level position. You can read more about the announcement here.

EYE ON A.I. TALENT

Signal AI, an A.I.-enabled media intelligence company, has appointed Clancy Childs as chief product officer. Childs previously held a number of senior roles at both Google and Dow Jones, most recently serving as general manager of Dow Jones' innovation business unit.

British telecom company BT has hired Harmeen Mehta to be its chief digital and innovation officer. Mehta will lead a new BT Digital division and join BT's executive committee. She had previously been chief information officer at Indian telecom group Bharti Airtel.

EYE ON A.I. RESEARCH

Google trains a 1.6 trillion parameter language model. Google has unveiled a new ultra-large language model architecture, which it calls a Switch Transformer, that it says has achieved state-of-the-art performance on a number of natural language processing tasks while also being more efficient in its use of computing power. It says the architecture can support a much larger model (in this case, one with 1.6 trillion parameters, far larger than any model ever trained before) but only needs to reference parts of its massive neural network at a time, making the training more efficient. This is important because a major criticism of these ultra-large language models is how expensive they are to train, in terms of both data center time and power consumption. But it's also interesting that even though the model is almost ten times larger than even OpenAI's GPT-3 language model, its performance doesn't seem to be ten times better, which suggests that this approach to natural language processing has significant diminishing returns.

FORTUNE ON A.I.

A.I. in the beauty industry: How the pandemic finally made consumers care about it—by Gaby Shacknai

Business could be on the precipice of an automation explosion—by Chris Morris

A Facebook case in Belgium could open the floodgates for GDPR privacy suits—by David Meyer

How patents help us invent the future—by Dario Gil

BRAIN FOOD

Axios, the fast-growing news service known for its bullet-point style reports, announced this week that it is expanding into local news. As part of that push, it published what it called an "audience Bill of Rights" that included, as its first principle, a pledge that "every item will be written or produced by a real person with a real identity. There will be NO AI-written stories. NO bots. NO fake accounts." (The all-caps emphasis is Axios'.)

I think a "no fake accounts and no fake bylines" policy is fair enough. News organizations should never mislead readers about who or what is producing the information they are consuming. If A.I. is being used to write a news story, then readers should know it. I also think that local communities deserve dedicated, locally-based reporters who can ask tough questions of officials and dig into important issues. If Axios is planning to hire a whole bunch of such folks, that's great. All that said, I can't help but wonder whether Axios' pledge goes too far, and is, by implication, too disparaging of the value A.I.-produced news can actually provide to local communities.

In a previous job, I worked on a podcast episode about something called Project Radar (it stood for Reporters and Data and Robots) that was a joint effort of the British news wire the Press Association and a startup called Urbs Media. The idea was to take large government data sets and use them to generate dozens or even hundreds of tailored stories for local newspapers. The way it worked, a human journalist still had to find the data set and write a template that would then be automatically populated with data for each local area, with natural language processing software adjusting the language used depending on whether the trend was up or down, etc.

I went to interview journalists at The Bournemouth Echo, one of the local papers that was using the system. The Echo's editor at the time, Andy Martin, told me he saw the system as a godsend. It wasn't replacing human reporters per se. The Echo already had too few of those anyway, thanks to endless rounds of job cuts that had nothing to do with automation and everything to do with the way the Internet destroyed local news. The deployment of Radar didn't affect Martin's staff headcount one way or another. But what it did do was allow the few reporters Martin did have to spend more of their time doing legwork and digging, the stuff local reporters should be doing to hold public officials to account, but which these days, thanks to staff cuts, is often not possible. Similarly, the Associated Press uses A.I. software to write simple local sports news summaries. Using A.I. in this way doesn't hurt local readers. If anything, it helps them by freeing media organizations to use the few reporters they do have to do the investigative work that, for now, only humans can do.

I think rather than making a big deal about all of its news being written by humans, Axios would be better off pledging that no matter how it produces the news, its stories will serve the public interest.
