An American University Is Spying On Students to Predict Dropouts. Here’s What That Says About Big Data in the U.S.
Our locations are constantly being tracked, largely through our cellphones, but also by sensors and cameras that track our trajectories through our day-to-day physical environments. It’s just a fact of life these days. But what are the consequences?
What we don’t see are the “big data” machinations behind the scenes that, given details of our locations at certain times, can use them to infer things about us. And we’re not just talking about the level of “Person X is passing our store and might want to buy Product Y”—the classic business use case for location data. We’re talking about more personal attributes.
The University of Arizona (UA) has just provided a splendid example of this phenomenon. In order to figure out whether freshman students are likely to drop out after their first year, researchers there retroactively spied on their movements by tapping into the past logs of their CatCard student ID cards.
Students use the cards to access parts of the campus and to use vending machines. Here’s how UA described the research in a press release:
“If Student A, on multiple occasions, uses her CatCard at the same location at roughly the same time as Student B, it would suggest a social interaction between the two… [The researchers] additionally used the CatCard data to look at the regularity of students’ routines and whether or not they had fairly established patterns of activity during the school week…
“Considered together with demographic information and other predictive measures of freshman retention, an analysis of students’ social interactions and routines was able to accurately predict 85 to 90 percent of the freshmen who would not return for a second year at the UA, with those having less-established routines and fewer social interactions most at-risk for leaving.”
And here’s a key quote from UA professor Sudha Ram, who led the research: “[The card is] really not designed to track their social interactions, but you can, because you have a timestamp and location information.”
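To make the inference described in the press release concrete, here is a minimal sketch of how "same location, roughly the same time, on multiple occasions" could be turned into a list of presumed social ties. This is not UA's code. The record layout, the five-minute window and the two-occasion threshold are all assumptions made for illustration; the press release does not specify any of them.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical swipe records: (student_id, location, timestamp).
# The layout and the values are invented for this sketch.
SWIPES = [
    ("A", "library", datetime(2018, 3, 5, 9, 0)),
    ("B", "library", datetime(2018, 3, 5, 9, 2)),
    ("A", "cafe", datetime(2018, 3, 6, 12, 30)),
    ("B", "cafe", datetime(2018, 3, 6, 12, 31)),
    ("C", "cafe", datetime(2018, 3, 6, 15, 0)),
    ("A", "gym", datetime(2018, 3, 7, 18, 0)),
    ("C", "gym", datetime(2018, 3, 7, 18, 4)),
]


def co_occurrences(swipes, window=timedelta(minutes=5)):
    """Count, per pair of students, swipes at the same place within `window`.

    The window length is an assumption. Note that repeated swipes by the
    same pair inside one window would each be counted separately.
    """
    by_location = defaultdict(list)
    for student, location, ts in swipes:
        by_location[location].append((ts, student))

    counts = defaultdict(int)
    for events in by_location.values():
        events.sort()  # chronological order, so t2 >= t1 below
        for (t1, s1), (t2, s2) in combinations(events, 2):
            if s1 != s2 and t2 - t1 <= window:
                counts[tuple(sorted((s1, s2)))] += 1
    return dict(counts)


def inferred_ties(swipes, min_occasions=2, window=timedelta(minutes=5)):
    """Keep only pairs seen together at least `min_occasions` times (assumed threshold)."""
    return {
        pair
        for pair, n in co_occurrences(swipes, window).items()
        if n >= min_occasions
    }
```

On this toy data, `inferred_ties(SWIPES)` returns `{("A", "B")}`: students A and B swiped close together twice, while A and C did so only once. Nothing in the records says the students know each other. The "tie" is purely an inference from timestamps and locations, which is exactly the point Ram makes.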
Now, there’s no doubting that the researchers were targeting a real problem here—too many people drop out after their freshman year, and it’s worth finding new ways to stop that happening. UA is also not actually using the researchers’ algorithms in its predictive analytics (though it does correlate 800 data points to figure out who the most at-risk students are).
Ram also stressed that the CatCard data she collected was “anonymized” to protect students’ privacy and to stop her from being able to identify particular students. (That said, the identifying names and ID numbers are clearly still there, as UA noted that identifying attributes “ultimately would be shared only with the students’ adviser” if the system went into use.)
But even if some protections are in play here, the end result is still that the researchers repurposed people’s personal data in order to figure out if they’re behaving more aimlessly than average and judge whether or not they’ve made enough friends.
If this system were to be used, it wouldn’t quite be China’s dystopian “social credit” framework, but it follows much the same principles: follow people all the time and correlate various data points about them, in order to predict what they might do in the future, and stop them from doing it.
UA’s research also points to the gulf in attitudes towards surveillance, big data and privacy in the U.S. and in Europe. This is something that ought to concern American companies hoping to deploy these sorts of tactics across the Atlantic.
In a couple of months, the European Union will get a new privacy regime in the shape of the General Data Protection Regulation (GDPR). Comparing UA’s research with the GDPR’s rules is an educational exercise in itself.
Under the new EU law, it is illegal to collect personal data—any data that can be connected to an identifiable person—for one purpose and then use it for another. The GDPR also allows people to block organizations or companies from profiling them using big data systems. And it’s very strict on anonymization techniques: if it really isn’t possible to re-identify an individual, then anonymization takes data out of the GDPR’s scope, but that’s a hard thing to guarantee, and it doesn’t seem to be what’s happening in the system being researched.
These new-fangled analytics techniques may hold a lot of promise, but they also clash with privacy rights that are far more entrenched in law in Europe than they are in the U.S.
The EU approach isn’t merely legalistic—as UA’s experiment demonstrates so ably, something as banal as location data can really tell you a lot about a person, and the decisions that stem from that knowledge can have real effects on people and their futures. And what’s more, the EU has become expert at exporting its privacy norms to other countries.
U.S. companies building their own futures on big data technology had better pay heed to the problems they will face in rolling out those techniques internationally. Other countries are getting wise to the consequences.
This article was updated to clarify that Ram’s research is not yet being used in UA’s predictive analytics, and that it only involved past data.
David Meyer is a Berlin-based writer for Fortune, the author of Control Shift: How Technology Affects You and Your Rights, and a privacy consultant.