The reporter Julia Angwin is at the vanguard of a new type of journalism that uses data to pick apart the secrets of large technology companies. Her work at the Wall Street Journal and ProPublica has exposed unsettling tracking practices by apps and websites, and revealed how Facebook ads promote racial bias and extremist content.
Backed by a $20 million grant from Craig Newmark, the founder of Craigslist, Angwin is now doubling down on that approach with a new venture called The Markup. The project aims to hire dozens of reporters to extract data from the likes of Facebook, Google, and Amazon in order to explain how their all-powerful algorithms affect our lives.
The tech giants, for their part, have little desire for Angwin to expose their inner workings to the public. And while they already rely on public-relations spin to deflect negative media attention, they may have a more powerful arrow in their quiver—namely, anti-hacking laws that they could use to block reporters and activists from obtaining data in the first place.
‘Wouldn’t be surprised if they went after journalists’
Angwin uses the scientific method to conduct journalism. She and her partners form a hypothesis about how the tech companies operate, then test to see if it’s true. Collecting the data to do this isn’t easy, not least because services offered by Google and Facebook show different things to different people—you and I don’t see the same ads, for instance—and the content they display quickly disappears. Angwin says she encountered this problem while evaluating Facebook’s election ads.
“There was no way to know how these ads affected people because they disappeared,” she says. “It’s all ephemeral, which is a problem because if there’s anything the public should be able to see and fact-check, it’s political ads.”
Their solution was to ask volunteers to install a Web browser extension that collected information about the Facebook ads they saw, then anonymously submit their findings to ProPublica. In their other investigations of Big Tech, Angwin and her colleagues have turned to automated tools to collect public data wholesale from company websites.
The problem with most of these approaches, however, is that companies may treat them as a form of law-breaking.
As the New York Times noted in its profile of The Markup: “Some of Ms. Angwin and [her colleague] Mr. Larson’s reporting tactics may violate tech platform terms of service agreements, which ban people from performing automated collection of public information and prohibit them from creating temporary research accounts.”
The consequences of violating a website’s terms of service are rarely severe, and can simply result in the site barring the violator from using the site. But in some cases a company will treat a terms violation as a form of hacking, and seek damages or call law enforcement—a concern that has crossed Angwin’s mind in relation to “scraping” or automated data collection.
“Facebook has been very aggressive about scraping on its platforms,” she says. “I wouldn’t be surprised if they went after journalists.”
For now, the only lawsuits over scraping have arisen when a would-be competitor obtained information from a website without permission. These include cases in which Facebook and LinkedIn, which is owned by Microsoft, sent cease-and-desist letters to companies that violated their policies against gathering information, even though that information was public or users had agreed to share it. In the case of Facebook, the company says those who disregard such letters are hacking.
The response of the courts so far has been mixed. In a Washington, D.C., case, a judge sided with the defendant on the grounds that so-called scraping is nothing more than a new way of gathering information.
“Scraping is merely a technological advance that makes information collection easier; it is not meaningfully different from using a tape recorder instead of taking written notes,” wrote the judge. In the Facebook case, however, the 9th Circuit Court of Appeals in California agreed with the social network that lawful scraping can become illegal hacking simply because a company says so.
Jamie Williams, an attorney at the Electronic Frontier Foundation, fears the decision could embolden Facebook and others to use the anti-hacking law to shut down public data collections by reporters. Worse, the law could discourage media outlets—many of which can’t afford lawyers—from attempting to do such reporting in the first place.
“The biggest concern here is chilling research,” Williams says. “Julia Angwin has a ton of money behind her, but there’s a lot of newsrooms across the country that may just tell reporters, ‘Don’t do this.’”
A call for a public interest ‘safe harbor’
Angwin is not the only one probing the tech giants to find out how they use our data. The journalist Kashmir Hill, for instance, revealed this week in Gizmodo that Facebook takes phone numbers it obtains for security purposes (so-called “two-factor authentication”) and shares them with advertisers. Hill only discovered this by posing as an advertiser herself—an act that Facebook could in theory treat as a misrepresentation and a violation of its terms of service.
While the tactics employed by Angwin and Hill may run afoul of tech company rules, they are often the only way to discover how exactly these firms are using our data. A legal pushback by Big Tech could shut down their work altogether.
“The investigations that The Markup has announced it will pursue are crucial to public understanding of the algorithmic forces shaping our society,” says Alex Abdo, an attorney with the Knight First Amendment Institute at Columbia University. “The outlet’s journalists shouldn’t have to worry about the threat of legal liability, but the social media companies have drafted their terms of service in a way that appears designed to cause journalists to think twice before conducting digital investigations of the platforms.”
Such fears are why the Knight Institute and several reporters sent a letter in August calling on Facebook to change its terms of service and grant a safe harbor to researchers and journalists whose work focuses on its platform. Abdo says there is little to report as yet, but that Facebook representatives have so far been “thoughtful” in the discussions.
“There is still work to be done to figure out how we advance transparency while protecting the information that people choose to share on Facebook,” said a spokesperson for the company, who added discussions are ongoing.
Fortune also sent inquiries to Google and Twitter, two more companies whose powerful algorithms are under scrutiny by journalists, asking whether they supported a safe harbor for public interest research on their platforms. Neither company responded.
It is also unclear whether the tech giants would actually sue journalists at a time when they are under intense scrutiny from the media, Congress, and the Justice Department. The potential public relations disaster may, for now at least, deter them from taking such a step.
That may be beside the point, however. In addition to the chilling effects described by Williams, it is not only Facebook that could turn terms of service into a censorship weapon.
“It’s hard to predict what legal challenges we’ll face,” Angwin says. “It’s usually not the big players who end up suing. It’s someone you never heard of.”