Researchers Caused an Uproar By Publishing Data From 70,000 OkCupid Users

Screenshot of OkCupid website homepage

Earlier this month, Danish researchers published data from the online profiles of nearly 70,000 OkCupid users—including usernames, political leanings, drug usage, and intimate sexual details—creating a privacy firestorm.

The researchers, Emil Kirkegaard and Julius Daugbjerg Bjerrekær, used data scraping software developed by a third contributor, Oliver Nordbjerg, to collect the information for a study that explored, among other things, the thinking of people on the site. They posted the database along with a draft paper on Open Science Framework, a site that encourages open source science research and collaboration.

Unlike recent incidents at Ashley Madison, a site for people seeking extramarital affairs, as well as some adult networks that cater to people with fetishes, the OkCupid research did not involve a security breach. That didn’t stop the ensuing controversy.

“Some may object to the ethics of gathering and releasing this data,” the authors wrote in the draft paper, which has since been pulled. “However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”

Online commenters, OkCupid users, the site’s operators, and academics attacked (and, in some cases, threatened) the researchers for making user information public. Some questioned whether such data harvesting, bundling, and broadcasting is justifiable for academic research and whether it crosses ethical and legal lines.

Although the researchers did not release the real names and pictures of the OkCupid users, critics noted that their identities could easily be uncovered from the details provided, such as the usernames. “Your private life is a few big leaks away from being an inescapable matter of public record, once a statistician with BitTorrent gets bored,” Scott Weingart, a digital humanities specialist at Carnegie Mellon University, mused in a post on Twitter (TWTR). He added that it would be easy to identify more than 10,000 of the people in the data dump and link them to their sexual inclinations.

Kirkegaard said that his group posted people’s usernames because it found the data on these self-selected pseudonyms to be scientifically interesting. (What does use of the word “hot” in an alias say about its subject, for example?) He also argued that retaining the usernames in the dataset would allow certain missing details, like height, profile text, or photos, to be added later.

The data, collected from November 2014 to March 2015, is indeed public, sort of. Some of it, such as bios, photos, age, gender, and sexual orientation, is easily accessible through basic Google (GOOG) searches. But answers to some 2,600 of the service’s most popular dating survey questions are restricted to people who are logged into the site and who have answered the same questions.

The site’s users can also set certain answers to “private,” which makes the responses inaccessible to others. In this case, the researchers scraped and presented both the data accessible through Google and the Q&A responses from individual profiles.

“We thought this was an obvious case of public data scraping so that it would not be a legal problem,” Kirkegaard wrote to Fortune.

Last week, after the dataset began inciting an uproar, Open Science Framework, the site that hosted the data, placed it behind a password-protected wall. OkCupid then filed a copyright claim on Friday demanding that the site take it down altogether. The page where the data initially appeared was first changed to read: “Unavailable for legal reasons.” Now it simply states “Content removed.”

The editorial board at Open Differential Psychology, the journal to which the researchers submitted the accompanying paper (and where Kirkegaard is the editor), is currently reviewing the submission, Kirkegaard told the science blog Retraction Watch. “If the journal does not take the paper, we will probably publish it elsewhere,” he said.

OkCupid, owned by InterActivCorp’s (IAC) Match Group (MTCH), released a statement that complained about the published data. “This was a violation of our terms of service and we sent a take-down notice,” Mathew Traub, a spokesperson for OkCupid, told Fortune in an email. “They appear to have complied.”

Kirkegaard said in a Twitter post that he did not ask the company for permission to collect or publish the data beforehand. Some commenters have also argued that the researchers breached research ethics by failing to obtain the OkCupid users’ consent before gathering and republishing their information. They cite, among other things, the “code of conduct” guidelines of the American Psychological Association.

Aarhus University in Denmark, the school at which Kirkegaard is a graduate student, distanced itself from the team of students, who undertook the project in their spare time. “The views and actions by student Emil Kirkegaard is not on behalf of AU,” the university said in a statement posted to Twitter. “[H]is actions are entirely his own responsibility.”

This is not the first time someone has scraped the profile data of OkCupid users, of course. At least one individual cleverly “hacked” the dating system to get more romantic matches several years ago. And the site’s co-founder, Christian Rudder, published a treatise on data science that analyzed information from the data-rich dating network. These cases are different, however, from the latest instance of scraping, packaging and releasing profile information publicly.

A better comparison would be a 2008 study out of Harvard University that relied on information culled from Facebook (FB) profiles. The researchers did use some anonymizing techniques, but critics said the protections were not strong enough. The scientists ultimately took down the data.

In a message sent to Fortune, Kirkegaard wrote that he did not rule out republishing the data his team collected, with more effort put into obscuring the identities of the OkCupid users. Given OkCupid’s interpretation of its terms of service agreement, and its copyright claim, it’s unlikely that the company will sign off on the proposed compromise. As with the Harvard Facebook study, the data may very well remain in limbo.

It’s no surprise that people are sensitive to having their romantic and other interests neatly presented for others to rifle through online, even if done in the name of science. In addition to raising questions about the ethics of certain data science practices, the boundaries of open science research, and the ease of identifying the members of a given dataset, the incident reveals something else, too: People continue to hand over vast quantities of personal data to websites while still expecting privacy.