Will big data help end discrimination—or make it worse?

January 15, 2015, 8:16 PM UTC

With only a tiny bit of data—the lowly ZIP code—it’s possible for marketers to infer a world of information about any given U.S. consumer.

With a ZIP code, a marketer can make a reasonable guess at a person’s income. With tools such as Prizm and Esri, they can probe deeper to determine education level and family composition, lifestyle and spending patterns, even hopes and dreams.

The rise of big data technology allows marketers to collect a tremendous amount of information about an individual with very little to start with. The challenge with having that kind of power? Keeping discrimination out of the picture.

“We use big data to create what we call propensity models,” says Jennifer Barrett Glasgow, chief privacy officer for Acxiom. “That’s where we’re trying to take a population and say, for example, are they more or less likely to be in the market for a new SUV?”

To an automotive company, a family with young children and a car that’s seven years old could be a good candidate. “It doesn’t take a rocket scientist to say that they’re probably looking for a new car,” Glasgow says. “We study the lifestyle and demographic data and rank order it, so when an SUV manufacturer wants to do a promotion, it’s one more piece of intelligence they can use.”
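
To make the idea of rank ordering concrete, here is a minimal sketch in Python. It is not Acxiom's method: the household records, the field names, and the hand-picked weights in `suv_propensity` are all hypothetical, chosen only to mirror the example of a family with young children and an older car.

```python
# Hypothetical households; field names and values are invented for illustration.
HOUSEHOLDS = [
    {"id": "h1", "young_children": True, "car_age_years": 7},
    {"id": "h2", "young_children": False, "car_age_years": 2},
    {"id": "h3", "young_children": True, "car_age_years": 1},
    {"id": "h4", "young_children": False, "car_age_years": 9},
]


def suv_propensity(household):
    """Toy score built from hand-picked weights, not a fitted model."""
    score = 0.0
    if household["young_children"]:
        score += 0.5
    # Older cars score higher; the contribution is capped at 10 years.
    score += 0.05 * min(household["car_age_years"], 10)
    return score


# Rank order the population from most to least likely under the toy score.
ranked = sorted(HOUSEHOLDS, key=suv_propensity, reverse=True)
print([h["id"] for h in ranked])
```

A production model would presumably learn its weights from data rather than have them set by hand, but the output has the same shape: an ordered list a marketer can cut off at any point.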

Acxiom (ACXM) has information about at least 95 percent of the U.S. population; the company typically has hundreds of data points on an individual. There is more data available than ever before, Glasgow says, and the modeling is getting far more sophisticated with improved accuracy and faster results. “More and more, the modeling and creation of new intelligence is being done in real time and on the fly,” she says. “It’s no longer static. In general, the velocity at which we can make and use predictions is really speeding up.”

For consumers, that means more relevant advertising and—in the big picture—continued free access to much of the Internet. “For individuals, big data is the tool that powers more personalized advertising and a free Internet,” says Mark Torrance, CTO of artificial intelligence-based media-buying platform Rocket Fuel, which claims tens of petabytes of anonymous consumer data. “It allows advertising to be more effective, delivering more relevant and interesting ads and offers to each person as compared with the random ads that would be shown online otherwise.”

But that granularity can cause significant repercussions. Case in point: A study published in early 2013 found that Google’s results for searches on “black-sounding” names (“Trevon Jones” was one example) were more likely to be accompanied by text suggesting that the person had an arrest record, regardless of whether he or she really did.

Similarly, a 2012 Wall Street Journal report found that online vendors were altering prices and offerings based on shoppers’ locations.

A U.S. Senate report released last December recognized the potential for discrimination explicitly; so did the Obama administration’s Big Data and Privacy Working Group Review earlier this year. More recently, the Federal Trade Commission hosted a public workshop entitled “Big Data: A Tool for Inclusion or Exclusion?”

The American Civil Liberties Union is one of several consumer-protection organizations eager to make their voices heard. “One of the very tricky things about all this is that quite a lot of what’s happening is behind the scenes, at an algorithmic level, and it’s often proprietary,” ACLU attorney Rachel Goodman tells Fortune. “We know enough about what’s out there and the way the world has worked historically to really strongly suspect that [discrimination] is happening. Over and over in history, we’ve seen how credit and mortgage lending and other kinds of lending end up being apportioned unequally.”

Legislators have enacted laws to try to keep things fair, particularly in housing (Fair Housing Act) and financial services (Equal Credit Opportunity Act). U.S. anti-discrimination law also prohibits using certain characteristics, known as “protected classes,” as a basis for discrimination. It’s not always that straightforward, and marketers must walk a fine line. For example, ethnicity could be used for inclusionary purposes, Acxiom’s Glasgow says. “Maybe I have a financial product best-suited for that ethnicity, like a low-interest credit card that could help Hispanic kids in college,” she says. “That’s not discriminatory. It brings value to the person.”

It’s when marketers use ethnicity to keep attractive offers out of reach of certain population segments that it becomes discrimination. “It gets to be very subjective,” Glasgow adds.

A big part of what makes it so slippery is something known as a proxy, says Solon Barocas, a postdoctoral research associate at Princeton University’s Center for Information Technology Policy. The reason ZIP codes can be so useful for marketers is that they are proxies, or close representatives, of other factors and offer more insight than mere location.

Race is unfortunately one of those correlated factors, Barocas notes. By relying on a proxy like the ZIP code, marketers can end up discriminating racially—even if they don’t mean to. And removing the proxy from the model isn’t always an option. In real estate, for example, location is too essential a factor to eliminate from consideration.
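
A small sketch in Python shows how a proxy can work. Everything in it is made up: the two ZIP codes, the groups labeled "A" and "B", and the rule that shows an offer in only one ZIP code are hypothetical. The point is that the rule never sees group membership, yet the two groups end up with very different offer rates.

```python
# Hypothetical, made-up records of (zip_code, group). The group label is
# never passed to the targeting rule below.
PEOPLE = (
    [("11111", "A")] * 80 + [("11111", "B")] * 20
    + [("22222", "A")] * 20 + [("22222", "B")] * 80
)

# Hypothetical rule: show the offer only in ZIP code 11111.
TARGET_ZIPS = {"11111"}


def gets_offer(zip_code):
    """Decide using location alone; group is not an input."""
    return zip_code in TARGET_ZIPS


def offer_rate_by_group(people):
    """Return the share of each group that receives the offer."""
    shown, total = {}, {}
    for zip_code, group in people:
        total[group] = total.get(group, 0) + 1
        shown[group] = shown.get(group, 0) + (1 if gets_offer(zip_code) else 0)
    return {group: shown[group] / total[group] for group in total}


rates = offer_rate_by_group(PEOPLE)
print(rates)
```

In this toy population, 80 percent of group A sees the offer and only 20 percent of group B does, purely because of where each group lives.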

It’s even more complicated online, where the rules of the discrimination game are less apparent. “At least if you walk into a subprime lender in your segregated neighborhood, you have some information about who is being served,” the ACLU’s Goodman says. “Online, it’s all so hidden. I don’t think people think about it, and that makes the notion of predation so much easier.”

It doesn’t have to be like this. The Internet has long been viewed by some as the ideal place to remove many of the biases that plague human interactions offline. “That could be so tremendously powerful,” Goodman says. “The Internet could do that.”

Torrance agrees. “This technology can be used completely without regard to race, ethnicity, or other protected categories to identify people as good prospects on the basis of their online activity and behavior,” he says.

But that’s not what’s happening most of the time. The ACLU, for instance, has asked the Federal Trade Commission to investigate the issue. “We’ve already created all these institutions and systems to force best practices in banking and housing,” Goodman explains. “The entities that do that regulation need to consider how the existing law allows them to require that transparency online, especially in marketing.” Transparency is particularly important when it comes to marketers’ algorithms, she says.

Technology could also help. “If machine learning algorithms working on big data result in racial discrimination, then other algorithms can measure the effect of discrimination,” says Gregory Piatetsky-Shapiro, president and editor of KDnuggets.com, a website about data mining. “Once the effect has been measured, then society and government can decide if the discrimination is intentional or not, and what kind of compensation or remedial action can be taken.”
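
One simple way to measure such an effect, sketched here in Python, is to compare the rate at which each group receives an offer. This is an illustration under stated assumptions, not a method Piatetsky-Shapiro describes: the log of (group, received_offer) records is invented, and the helper names `selection_rates` and `rate_ratio` are hypothetical.

```python
# Hypothetical log of (group, received_offer) records; the counts are invented.
LOG = (
    [("A", True)] * 50 + [("A", False)] * 50
    + [("B", True)] * 25 + [("B", False)] * 75
)


def selection_rates(records):
    """Return the fraction of each group that received the offer."""
    offered, total = {}, {}
    for group, received_offer in records:
        total[group] = total.get(group, 0) + 1
        offered[group] = offered.get(group, 0) + (1 if received_offer else 0)
    return {group: offered[group] / total[group] for group in total}


def rate_ratio(rates, group, reference):
    """Ratio of one group's offer rate to a reference group's rate."""
    return rates[group] / rates[reference]


rates = selection_rates(LOG)
ratio = rate_ratio(rates, "B", "A")
print(rates, ratio)
```

A ratio well below 1 means the second group is receiving the offer less often. The number alone does not say whether the gap is intentional or justified, which is the judgment Piatetsky-Shapiro leaves to society and government.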

In the meantime, companies like Acxiom and Rocket Fuel (FUEL) have no choice but to tread carefully. “We have invested significantly in tools that give our customers insights into where, when, and to whom our machines are serving their ads so that they have a chance to give us feedback on what makes sense,” Torrance says.

For Acxiom’s part, the company says it limits how its data can be used. (It refuses to sell data to adult entertainment businesses, for example.) Internally, a training program sensitizes employees to the issues involved, and regular privacy impact assessments keep things in line.

“The technology is moving so fast that we want people to stop and think, how would I feel if it was my data being used in that way?” Glasgow says. “Just because you can do it doesn’t mean that you should.”