The Big Apple’s Big Data advantage

Aug 20, 2012

The new Microsoft (MSFT) research lab in Manhattan is the latest addition to New York City’s increasingly buzzy tech scene. Headed by mathematical physicist Jennifer Chayes, the lab will focus on Big Data analysis, parsing the vast troves of information created by the world’s digital denizens. Chayes spoke with Fortune about the lab, the world-changing potential of Big Data, and why New York City has an edge when it comes to tech’s next big thing.

You have a new lab in New York City. What are you working on?
We’re pretty much centered on Big Data questions: how Big Data can help answer social science questions, how it can help answer economic questions, and what it means for the interaction of the social sciences with technology. If you’re in that field, these are very, very exciting times.

What are some of the projects the lab is taking on?
Well, one example is that we have researchers in economics and prediction engines looking at the bets people make. Generally speaking, when people place bets on something, they are much more invested in the outcome. So when large numbers of people bet on who’s going to win, say, a primary election, that’s a very good way of getting data. Some of our researchers have achieved really amazing precision in predicting who’s going to win the primaries, based on the bets people are making. You can also do this with sports events. You can do this with all kinds of events. The future of interactive entertainment will have a prediction engine component to it.

At the lab we also have leaders in machine learning. John Langford is building a machine learning platform that people all over the world are using. It’s called VW, which stands for Vowpal Wabbit. [Editor’s note: The name is Elmer Fudd-speak for “Vorpal Rabbit.”] It gives us an incredibly fast way to analyze huge data sets. And a few of our guys are setting up a large-scale experimental platform, using Amazon’s (AMZN) Mechanical Turk to find participants online. In the past, a lot of experiments would use 30 or 40 student volunteers. This is on a much larger scale.

In general, the magnitude of the data that we are able to take in now dwarfs anything that we were able to do in the past, and so you need a new scientific method to answer new kinds of questions. It’s really a new age of social science.

How is the lab settling in so far?
The lab opened on May 3 and now has 15 researchers, and we’re actually hiring more. We expect to have very strong relationships with the universities in the city and with the startups. A lot of the guys in our lab have close colleagues in the startup scene. They’re very involved because, you know, all these scientists know each other. They go to the same conferences, and they went to the same graduate schools. It’s a very tight community.

We also have strong relationships with all the major universities in the area. New York University’s CUSP program, the new Center for Urban Science & Progress, is hiring 50 faculty and researchers and is headed by Steve Koonin, who was the Energy Department’s undersecretary for science. We’re going to be pursuing joint projects with them in mining urban data.

Some of our computational social scientists are teaching Columbia University courses. And we’re in touch with the Cornell Technion campus planners. The new campus opens on Roosevelt Island in about five years and will have a couple thousand graduate students and some 250 faculty. It will have three themes, all centered on Big Data. So it’s a really, really exciting time in the city right now.

What’s your take on New York City’s tech scene at the moment?
Silicon Alley is really becoming a hub for these data-intensive startups in Web 2.0 and beyond Web 2.0. New universities and new companies are coming in. Mayor Michael Bloomberg has really taken the lead here with the Applied Sciences Initiative, which has played a part in the NYU CUSP program, the Cornell Technion campus, and others. The initiative is really helping to provide the training for the people who are going to populate these companies.

Can New York compete with Silicon Valley in Big Data?
There are certain things that give New York an edge. The ad agencies are centered in the city, on Madison Avenue, and a lot of the Big Data questions are being driven by the fact that a lot of high-tech business models have to do with ads. There’s tons of creativity around this, but there’s also always the question of what’s going to fund all of these new products and initiatives. A lot of that is centered on ads. So it’s very natural that Big Data has gotten to be so strong in the city.

There’s an intersection of the design world, the ad industry, and Big Data that’s really happening in the city. There are obviously other companies that have that, like Apple (AAPL), but you really aren’t seeing the nexus that New York has anywhere else.

So are Silicon Valley and New York neck-and-neck in this field?
Well, there aren’t yet as many people in high-tech in the city as in Silicon Valley, but the rate of growth is certainly higher. And if you look at the boundary of social sciences and technology, some of the most interesting things there are happening in the city. It’s a different cut of high-tech due to the design influence and the ad agency influence, and there are different kinds of advances being made.

So, Big Data is really the next big thing?
I think that’s certainly true, both scientifically and in terms of where our advances are going to come from. In the past, we had to build models of how we thought people interacted; now we just let the machine tell us the answer by finding patterns in these huge amounts of data. This research will power everything from online purchase suggestions, to the ads on social networks, to search engines that take your terms, look at a trillion websites, and come up with the right ones to give back to you.

When you look at the exciting new companies out there, so many of them are really data-driven businesses. And of course cloud computing is going to be our infrastructure for dealing with Big Data. Everything from social media to physical infrastructure to biotech will be data-driven. Look at the Genome Project; it relies on these same algorithms. I would say this is the age of Big Data, certainly.

Sounds like a shift is underway in New York.
You’ve got to understand that it’s the age of Big Data and it’s the age of the geek. Years ago, nobody wanted to say they were a geek, and now it’s kind of cool. It’s New York City, it’s filled with geeks, and they’re giving geeky names to their projects.

