Coming Soon: Ethics Training for Data Scientists

December 4, 2015, 6:00 PM UTC
BERLIN, GERMANY - JANUARY 29: Symbolic photo for data protection, reflection of a fingerprint in a computer hard drive on January 29, 2015 in Berlin, Germany. (Photo by Thomas Trutschel/Photothek via Getty Images)

The bright side of the confluence of cloud computing and big data is that it enables medical and other scientific breakthroughs. The dark side is that it also enables invasion of our personal privacy. (Cue Edward Snowden disclosures.)

But wait! There may be a bright side of the dark side soon. A Microsoft (MSFT) researcher who specializes in this field expects that data scientists will soon be getting schooled in the propriety of doing their jobs.

That tidbit comes from a new post outlining predictions by top Microsoft Technology & Research personnel. The post is Microsoft’s version of IBM’s (IBM) annual 5 in 5 technology extravaganza.

In any case, Microsoft principal analyst Kate Crawford, who researches the social impact of big data, said that in 2016:

“The key breakthrough will be that every data science program will have a data ethics curriculum, giving greater understanding [of] the human implications of large-scale data collection and experimentation (and ideally producing greater fairness and protection from forms of data discrimination).”

This call strikes a nerve in an era of ever-growing data mining, whether by retailers, insurance companies, the government, or terrorists. And Crawford, who is also a visiting professor at MIT’s Center for Civic Media and a senior fellow at NYU’s Information Law Institute, has been on the case for a while.

Speaking at an MIT conference two years ago, Crawford warned that people who were comforted that most government surveillance involved “just” anonymized metadata—or data about their data with their identity information stripped out—were kidding themselves.

“First it was only about metadata and you wonder, should you care? But what’s interesting is that studies show that metadata is incredibly sensitive. We need to do an enormous catchup of where this data is going and how it’s being used,” she said at the time.

Her point was that with the sheer amount of data being vacuumed up and the tools available to examine it, anonymized information doesn’t stay anonymous very long. She said that with just four geospatial data points taken from phone records, an aggregator can identify 95% of people.
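To see why a handful of points can be enough, here is a minimal Python sketch. It is not Crawford's method or the study she cited. It runs on made-up, uniformly random data, and every name and parameter in it (make_traces, unicity, the number of users, cells and hours) is a hypothetical chosen for illustration. It simply measures how often a few known place-and-time points match exactly one person in a dataset.

```python
import random


def make_traces(n_users, n_points, n_cells, n_hours, rng):
    """Build synthetic traces: each user is a set of (cell, hour) points.

    Assumption: points are drawn uniformly at random, which real
    phone records are not. This is toy data only.
    """
    traces = []
    for _ in range(n_users):
        trace = set()
        while len(trace) < n_points:
            trace.add((rng.randrange(n_cells), rng.randrange(n_hours)))
        traces.append(trace)
    return traces


def unicity(traces, k, rng):
    """Fraction of users singled out by k points taken from their own trace."""
    unique = 0
    for trace in traces:
        # Pretend an outsider knows k points about this user.
        known = set(rng.sample(sorted(trace), k))
        # Count every trace in the dataset that contains all k points
        # (the user's own trace always matches).
        matches = sum(1 for other in traces if known <= other)
        if matches == 1:
            unique += 1
    return unique / len(traces)


rng = random.Random(0)  # fixed seed so runs are repeatable
traces = make_traces(n_users=500, n_points=30, n_cells=20, n_hours=24, rng=rng)
for k in (1, 2, 3, 4):
    print(f"{k} known point(s): {unicity(traces, k, rng):.0%} of users are unique")
```

On toy data like this, the share of uniquely identifiable users should climb steeply as the number of known points grows. The exact percentages depend entirely on the invented parameters and say nothing about real phone records.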

So much for data privacy.

Of course, just making ethics classes available to the future data scientists of the world does not mean they, or whoever is calling the shots, will act only for the greater good. But it would be a step in the right direction.

I guess.

For more from Barb, follow her on Twitter at @gigabarb, read her coverage at or subscribe via this RSS feed.

