This article first appeared in Data Sheet, Fortune’s daily newsletter on the top tech news.
If I tracked everywhere you traveled in a taxi, what could I learn about you? Aaron in for Adam today, contemplating some of the small data in a big data set.
A few years ago, the New York City Taxi & Limousine Commission released a fascinating set of data listing 1.1 billion separate taxi trips taken from 2009 through 2015. The data includes the GPS coordinates of the beginning and end of every trip, offering a detailed picture of where people traveled. Software developer and part-time data cruncher Todd Schneider did some cool analysis when the set was released (including reality-checking Die Hard 3: “could Bruce Willis and Samuel L. Jackson have made it from 72nd and Broadway to Wall Street in less than 30 minutes?”).
Now others are sifting through the data in search of answers to all kinds of questions. University of Chicago grad student David Andrew Finer realized that the data could shed light on how Wall Street interacts with the Federal Reserve, especially around the critical times when the central bank votes on whether to raise or lower interest rates. The Fed’s decisions can move markets worth trillions of dollars, so Wall Street has a lot riding on the outcome of each meeting. But the Fed is supposed to deliberate in secret and not leak its moves in advance.
Finer decided to look at cab rides that traveled to and from the New York Fed and the headquarters of major banks. Sure enough, he found evidence that the number of trips jumped in the days around a meeting. The data could even reveal when a rider picked up near the Fed was dropped off at the same time and place as another rider who had been picked up near one of the banks. The occurrence of these “coincidental drop-offs” at lunchtime also jumped in the days around Fed meetings. “The timing and locations of the rides imply unofficial or discreet interactions, though this certainly need not imply any impropriety,” Finer concluded. The full study is available as a PDF.
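The article doesn't spell out Finer’s exact matching rules, so what follows is only a minimal Python sketch of how a “coincidental drop-off” check might work, assuming each trip record carries pickup and drop-off coordinates plus a drop-off timestamp. The `Trip` type, the 100-meter pickup radius, the 50-meter and five-minute matching thresholds, and the coordinates in the usage example are all hypothetical choices for illustration, not parameters taken from the study.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trip:
    # Hypothetical trip record; field names are illustrative, not the TLC schema.
    pickup_lat: float
    pickup_lon: float
    dropoff_lat: float
    dropoff_lon: float
    dropoff_time: datetime


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def picked_up_near(trip, site, radius_m):
    """True if the trip's pickup point lies within radius_m of site (lat, lon)."""
    return haversine_m(trip.pickup_lat, trip.pickup_lon, site[0], site[1]) <= radius_m


def coincidental_dropoffs(trips, fed_site, bank_sites,
                          pickup_radius_m=100.0, dropoff_radius_m=50.0,
                          window=timedelta(minutes=5)):
    """Pair Fed-origin trips with bank-origin trips that end close together in time and space.

    All thresholds are assumed values for illustration only.
    """
    fed_trips = [t for t in trips if picked_up_near(t, fed_site, pickup_radius_m)]
    bank_trips = [t for t in trips
                  if any(picked_up_near(t, s, pickup_radius_m) for s in bank_sites)]
    matches = []
    for f in fed_trips:
        for b in bank_trips:
            if f is b:  # never pair a trip with itself
                continue
            close_in_time = abs(f.dropoff_time - b.dropoff_time) <= window
            close_in_space = haversine_m(f.dropoff_lat, f.dropoff_lon,
                                         b.dropoff_lat, b.dropoff_lon) <= dropoff_radius_m
            if close_in_time and close_in_space:
                matches.append((f, b))
    return matches


if __name__ == "__main__":
    # Illustrative coordinates and synthetic trips only, not real data.
    fed = (40.7083, -74.0087)
    banks = [(40.7150, -74.0140)]
    lunch = datetime(2014, 3, 5, 12, 30)
    trips = [
        Trip(40.7083, -74.0087, 40.7400, -73.9900, lunch),                         # from "Fed"
        Trip(40.7150, -74.0140, 40.7401, -73.9901, lunch + timedelta(minutes=2)),  # from "bank", same spot
        Trip(40.7150, -74.0140, 40.7600, -73.9800, lunch),                         # from "bank", elsewhere
    ]
    print(len(coincidental_dropoffs(trips, fed, banks)))  # 1
```

The nested loop compares every Fed-origin trip with every bank-origin trip, which is fine for a toy example; at the scale of a billion records, one would more likely bucket drop-offs by time slot and map grid cell first and compare only trips that share a bucket.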
Correlation isn’t causation. Further investigation, perhaps of emails and texts between people at the meetings, would be needed to show what, if anything, was discussed. Finer cites an earlier study that examined stock returns around Fed meetings and suggested that information was leaking.
Finer doesn’t name names, either. Did Goldman Sachs (GS) or J.P. Morgan (JPM) have the most Fed-related taxi trips? He doesn’t say, though the data would certainly let him. It is so detailed that others have reverse-engineered the trips to show how easily the supposedly anonymous data can be de-anonymized. Some people tried to track celebrities based on known sightings and home addresses, while another attempt even purported to identify which taxi drivers were Muslim based on their activity during the five daily prayer times. Looks like it’s not just Facebook and Google that have collected worrisome data on us.