Google's Hate Speech Detection A.I. Has a Racial Bias Problem

A Google-created tool that uses artificial intelligence to police hate speech in online comments on sites like the New York Times has become racially biased, according to a new study.

The tool, developed by Google and a subsidiary of its parent company, often classified comments written in the African-American vernacular as toxic, researchers from the University of Washington, Carnegie Mellon, and the Allen Institute for Artificial Intelligence said in a paper presented in early August at the Association for Computational Linguistics conference in Florence, Italy.

The findings underscore machine learning’s susceptibility to bias and its unintended consequences for groups that are underrepresented in the data used to train machine-learning systems. A.I. researchers who are studying the problem worry that the technology, which learns to recognize patterns within vast quantities of data, could perpetuate inequality.

Google has dealt with machine learning bias problems in the past, like in 2015 when one of its tools used to search photos tagged African Americans as gorillas. Another tool, released in 2018, that matched people’s selfies with popular artworks inadvertently correlated the faces of African Americans with artwork depicting slaves, “perhaps because of an overreliance on Western art,” journalist Vauhini Vara wrote in Fortune.

The study also highlights the limitations of machine learning as a tool to filter and screen content. Despite its ability to automate tasks, thus reducing the load on human moderators, it often fails at understanding context, like deciphering whether a biting joke may be funny to one person and yet upsetting to another.

Jigsaw, the subsidiary that helped create the tool, which is called Perspective, pitched it at its debut in 2017 as a way for publishers like The Economist and The Guardian to automatically filter offensive comments, leading to more civil discourse. The idea, similar to initiatives by companies like Facebook and Twitter, is that machine-learning-powered software can help filter the deluge of abusive and hateful comments that are often left online without being deleted.

But this type of content-moderation software can stumble when trying to understand the nuances of human language. Natural language processing technology, or NLP—a subset of A.I. that helps computers understand language—often fails to understand context.

Maarten Sap, a University of Washington graduate student who is studying NLP and who co-authored the paper, said that his team was interested in studying social inequality and how it’s represented through language. The researchers had read previous reports of hate-speech detection tools inadvertently flagging African-American English, raising concerns that technology was silencing the voices of African-Americans in online comments or on social media.

For their research, the academics first gathered two datasets of over 100,000 Twitter comments that are typically used by other researchers for hate-speech detection projects. Human annotators from the data-labeling company Figure Eight (now owned by data company Appen) inspected the tweets and labeled them as being offensive, hateful, or benign.

After inspecting the datasets, the researchers noticed that the human annotators would often label tweets commonly associated with African-American vernacular as being offensive or hateful, despite the phrases being typical to that particular dialect. These phrases might contain certain words that other social groups may find offensive, like the N-word, “ass,” or “bitch.”

The researchers then used the Twitter data to train a neural network (software that learns) to recognize offensive or hateful phrases. The trained neural network then analyzed two other datasets containing tweets in which the Twitter users’ racial identities were either known or inferred and, like the human annotators, flagged common African-American phrases as being offensive.

When the researchers tested the Perspective tool in the same manner, they found that it also labeled tweets in African-American English as being toxic, Sap said.
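The comparison described above comes down to scoring the same kind of text from different groups and checking how often each group gets flagged. What follows is only a rough sketch of that idea, not the researchers' code and not the Perspective API: the scoring function `toy_score`, the 0.5 threshold, the group names, and the example sentences are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict


def flag_rate_by_group(examples, score_fn, threshold=0.5):
    """Return, for each group, the share of texts scored at or above the threshold."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for text, group in examples:
        total[group] += 1
        if score_fn(text) >= threshold:
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


def toy_score(text):
    """Hypothetical scorer: high score if a listed word appears, low otherwise.

    This is a stand-in for a real toxicity model, which it does not resemble.
    """
    trigger_words = {"ass"}
    words = text.lower().split()
    return 0.9 if any(word in trigger_words for word in words) else 0.1


# Invented examples; "group_a" and "group_b" are placeholder dialect labels.
examples = [
    ("that movie was good", "group_a"),
    ("i am tired today", "group_a"),
    ("he a funny ass dude", "group_b"),
    ("see you later", "group_b"),
]

rates = flag_rate_by_group(examples, toy_score)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} flagged")
```

A scorer that keys on individual words, as this toy one does, flags a benign sentence in one group and none in the other, which is the kind of gap a group-by-group comparison is meant to surface.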

“That’s a pretty big problem,” said Sap, pointing to the impact that biased content filters would have on online comments.

Dan Keyserling, Jigsaw’s chief operating officer, said that Jigsaw representatives are now working with the researchers who presented the paper to improve its technology.

“We really welcome this kind of research,” Keyserling said. “We are constantly attending these conferences and try to support the research community.”

The researchers also conducted a small, related experiment to learn more about how human annotators label data. They hired temporary human annotators through Amazon’s Mechanical Turk service and sent them a collection of tweets by African-Americans to label as offensive or not.

In this case, because the human annotators knew that African-Americans wrote the tweets, they were significantly less likely to classify certain phrases as being offensive. Sap speculated that these crowdsourced workers found the N-word to be less offensive when used by African-Americans, but that it is “probably more offensive if it is said by a white person.” In the case of the original labeled datasets, human annotators “just didn’t know the context,” he said.
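One common way to check whether a drop in "offensive" labels between two annotation conditions is larger than chance would explain is a two-proportion z-test. The sketch below assumes that approach purely for illustration; the article does not say which test the researchers used, and the counts are invented.

```python
import math


def two_proportion_z(flagged_a, n_a, flagged_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = flagged_a / n_a
    p_b = flagged_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (flagged_a + flagged_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: tweets labeled offensive out of 100 in each condition.
without_context = (50, 100)  # annotators not told who wrote the tweets
with_context = (30, 100)     # annotators told the authors' likely background

z, p_value = two_proportion_z(*without_context, *with_context)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up numbers the difference would count as statistically significant at the usual 0.05 level; real annotation data would need the actual counts from the study.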

Keyserling acknowledged that the Perspective tool, like other machine-learning systems, is susceptible to bias. In fact, journalists and researchers have called out Perspective in the past for similar instances of bias against certain groups.

But Keyserling contends that Jigsaw has made great strides in reducing biases in its technology, pointing to previous blog posts and research papers that the company has released detailing its challenges with biases.

Keyserling did not identify where the human annotators Jigsaw uses to label data for training Perspective come from, or whether Jigsaw would specifically use any of the techniques outlined in the paper to reduce future instances of bias. Those techniques could include giving human annotators more context about people’s race when reading their comments.

However, Keyserling noted that “context really matters” and “that’s absolutely something we are paying attention to when structuring our data.”

In a follow-up email, Keyserling pointed to a technique Jigsaw uses during the data training process to mitigate bias, which involves so-called model cards that detail how the machine-learning models should be used and the ethical considerations to keep in mind. In a previous blog post, Jigsaw said the technique helped reduce unintended bias in its Perspective tool that may impact people who identify as gay, homosexual, or black.

“We are always working on ways to improve our models and mitigate unintended bias on an on-going basis,” Keyserling said.

This likely won’t be the last time Perspective stumbles and develops a bias. As Jigsaw expands the use of its technology in more languages—French newspaper Le Monde recently began using a French version of the tool, for example—it will likely face similar problems with other dialects.

But Keyserling contends that Perspective is “delivering impact” and is “incredibly helpful to platforms” that are otherwise overwhelmed with hateful online comments. Consistently updating the technology to eliminate biases is a never-ending task.

“The technology will never be perfect,” Keyserling said. “That is the nature of machine-learning research.”