Meet Fortune’s 2014 Big Data All-Stars

Updated: Aug 14, 2014 2:02 PM UTC | Originally published: Aug 04, 2014

Andrea Burbank, Data Scientist, Pinterest

At the image-focused social network Pinterest, data scientist Andrea Burbank leads A/B testing, evaluating how changes in the look or functionality of its website, mobile application, or communications impact the behavior of its roughly 60 million users worldwide. When a small module convinces you to invite a friend to the service, or a recommendation email induces you to follow more boards, it’s likely that Burbank and her team have their invisible hands on it. “I’ve watched hundreds of experiments go by on millions of users for a combination of billions of user days,” she said at an industry event in March. The testing extends to the company’s own processes, too. One of Burbank’s big wins was democratizing the ability for other Pinterest employees to run experiments. “Before, there was a single point of failure, but also a single point of knowledge,” she said. Not anymore. —Andrew Nusca

Arno Candel, Physicist and Hacker, 0xdata

Arno Candel caught the science bug early. He grew up in Untersiggenthal, Switzerland, a small village wedged between a top particle accelerator lab at the Paul Scherrer Institute and ETH Zürich, continental Europe’s most prestigious technical university. Studying particle physics and supercomputing, Candel coded models of the universe on computers. After moving to California to work at the SLAC National Accelerator Laboratory, he moved to the startup world, joining Skytree as a founding engineer and designing high-performance machine learning algorithms. At 0xdata he is a core developer on the data science platform known as h2o, which has been ranked the number one open-source Java machine learning project by members of the coding community GitHub. The platform enables deep learning and is compatible with the popular statistical programming language R. His title at the company? “Physicist & Hacker,” of course. —Robert Hackett

Arun Murthy, Co-founder, Hortonworks

Arun Murthy started off at Yahoo (yhoo) when Hadoop, the open-source storage and processing software that powers much of the web’s big data, was an early prototype. His team’s mission was to scale it for Yahoo’s web search. Murthy helped develop a resource and workload management system called YARN that acts as a sort of operating system for Hadoop. “Hadoop ‘One’ looked like Microsoft Windows with notepad,” Murthy says. “What you really want is Windows with PowerPoint, Word, Excel and so on.” That’s what YARN enabled: It let users plug many applications into Hadoop to store all sorts of data. “I have two kids at home,” Murthy says. “YARN is sort of my third.” —Robert Hackett

Barry Morris, Chief Executive Officer, NuoDB

Many technology companies promise to spark a revolution, but very few have the backing of the leaders of the previous one. Cambridge, Mass.-based NuoDB counts three of the four horsemen of the previous database wave—former Ingres Corp. CEO Gary Morgenthaler, former Sybase Inc. CEO Mitchell Kertzman, and former Informix Software CEO Roger Sippl, with only a certain Oracle chief executive abstaining—as investors. Why? Because NuoDB’s technology solves a problem that many industry veterans long considered to be the Holy Grail: run a database on multiple servers. “It’s about more machines, not bigger machines,” CEO Barry Morris says. “That problem, simple as it sounds, was as yet unsolved.” First funded in 2010, Morris’ company recently landed a massive customer—Dassault Systèmes, the second-largest software vendor in Europe—and is hurtling toward what Morris calls “a new convergence point.” He’s convinced NuoDB will be at the center of it. “It’s not about size or speed of the data. It’s about being data-driven,” he says. “Continuous improvement—that’s the revolution.” —Andrew Nusca

Brian Rogosky, Director of Big Data Engineering, Beats Music

It should be no surprise that Brian Rogosky was largely unable to discuss in detail his work at Beats Music, which was recently acquired by ultra-secretive Apple (aapl). (The deal, announced in May, closed on Friday.) But he was happy to talk shop all the same. Companies are interested in getting closer and closer to real-time analysis and processing of data, he says, and they are interested in making those data more shareable within their respective organizations. What's more, companies want to use those data to power increasingly personalized experiences within applications. How might Rogosky approach these trends at Beats? You'll have to connect the dots yourself. “I can’t say too much about that in my current role," he says. "Let’s just say general trends for now.” —Robert Hackett

Daniele Quercia, Researcher, Yahoo! Labs

As a child Daniele Quercia wanted to be a police officer, and had a toy motorbike to prove it. Today at Yahoo Labs, he knows cities inside and out—but on a digital level. With a Ph.D. in computer science in hand and post-doctoral research at MIT’s school of Urban Studies on his C.V., Quercia focuses on urban studies on a large scale. To wit: he created a game that asks people which cityscapes they prefer, then shares the scores on Facebook in an effort to go viral. Quercia analyzes the results to learn what people like and don’t like in an effort to ultimately design better and more appealing cities. “Computer science is all about building tools,” he says. “I wanted to do something new, something that would make an impact. More than half of the world’s population lives in cities.” —Shalene Gupta

Drew Purves, Head of the Computational Ecology and Environmental Science Group, Microsoft Research

When Stephen Emmott, head of computational science at Microsoft Research’s Cambridge Labs, pitched the idea of having the lab sponsor an internal ecology group at one of the company’s “Bill Reviews”—where employees present in front of founder and chairman Bill Gates—it was a “famously bad meeting,” says Drew Purves, then a Princeton University ecologist. “He thought it was the most ridiculous thing.” But Gates ended up changing his mind, and soon Microsoft (msft) hired Purves to lead a group tasked with building predictive models of Earth’s systems. Since landing in Microsoft’s blue skies research division, Purves has helped develop “The Madingley Model,” a simulation of all life on Earth. The project is ambitious—quixotic, even—but it may eventually generate practical applications. “Everything that happens within the economy fits within an environmental context,” Purves says, rattling off some of the world’s most daunting challenges, including aging populations, cancer, food security, climate change and alternative sources of energy. Of ecology and biology, he added, “these things will be the key driver of 21st century economy.” —Robert Hackett

Florian Pinel, Senior Software Engineer, IBM's Watson & Cognitive Cooking group

After IBM’s “intelligent” computer system Watson trounced its opponents on Jeopardy!, the company wanted to see how else it could push the boundaries of cognitive computing. Since Watson withstood the pressure so well—it brought plenty of its own heat on its human competition—IBM (ibm) decided to bring the system into the kitchen. (Literally.) A professional chef by training, Florian Pinel is a member of the “Cognitive Cooking” team unleashing Watson’s creative potential. “We focused on food because it’s something everyone cares about and we could easily create prototypes,” Pinel says. “I was extremely pleasantly surprised to combine both my passion for food and computer science.” The team begins with a set of fundamental ingredients and compounds, but their combinations grow exponentially into vast numbers of potentially tasty recipes. At an IBM food truck at the SxSW conference in Austin, Texas, this year, Chef Watson invented an Austrian chocolate burrito that contained chocolate, ground beef, edamame, and apricot. Sounds awful, and yet: “It worked. It was really good,” Pinel says. “We’re here to inspire users and to help them discover combinations they would never have thought of.” —Robert Hackett

Jeff Hammerbacher, Chief Scientist, Cloudera; Assistant Professor of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai

Jeff Hammerbacher didn’t need a Ph.D. to land an assistant professorship in genetics and genomic sciences at the Icahn School of Medicine at Mount Sinai, but he had plenty of experience to justify the academic appointment. One of Facebook’s (fb) first data scientists, Hammerbacher ditched the ad-soaked world of social networking to become the chief scientist at Cloudera, the Apache Hadoop-based enterprise software powerhouse. Now he’s working with Eric Schadt, Mount Sinai’s chair of genetics and genomic sciences, to bring big data analytics to bear on healthcare. “He’s defining a field they’ll give Ph.D.s in,” Schadt says. “There were no programs to train him in what he did.” Hammerbacher is working to construct the right infrastructure to manage and compute health data to create better predictive models in medicine. “Now is the right time for healthcare and medical centers to turn themselves into big data analytic engines,” Schadt says. “That we’re able to embrace a guy like Jeff just speaks volumes about where medicine is going to go in the future.” —Robert Hackett

Michelle Zatlyn, Co-founder, CloudFlare

If the Internet had its own bouncer, it would probably be CloudFlare, a San Francisco company that sees 5% of the world’s Internet traffic. Co-founder Michelle Zatlyn met her partners Matthew Prince and Lee Holloway at Harvard Business School and started the company in 2009. It acts as a buffer between websites and malicious users: if CloudFlare identifies a user as friendly, it speeds up their service. If it determines that user to be a spammer or a bot, it slows their service or introduces a CAPTCHA as a hurdle. Zatlyn focuses on making CloudFlare accessible and is working to expand its reach. “Ten years ago, I knew I wanted to be part of a team that was big and important,” she says. “I didn’t know what that meant, but I feel really lucky that I found CloudFlare. I can’t imagine doing anything better than helping customers run their business better.” —Shalene Gupta

Monica Rogati, Vice President of Data, Jawbone

At Jawbone, Monica Rogati has two jobs. First, make sense of the data created by Up, the San Francisco accessory company’s wearable, sensor-laden wristband. Second, build new things that use that data in smart ways. “We uncover all sorts of interesting things about how we sleep, move, and eat that you weren’t able to find before,” she says. “We used to have sleep studies with 100 people; now we can study 100,000 people.” Which means Rogati and her team can see what people literally lose sleep over—for Washington, D.C. residents, a presidential inauguration; for Istanbul residents, ongoing protests in the Middle East; for ultra-Catholic Providence, Rhode Island, the Pope’s resignation in February 2013—then feed that knowledge back into the Up to nudge wearers into adjusting their behavior. “We’re taking all kinds of insights in the data and using those to encourage people to be at their best,” she says. —Andrew Nusca

Onno Zoeter, Research Scientist, Xerox Research Centre Europe

As a child, Onno Zoeter wanted to be a Lego designer. Then he got his first computer at the age of 8. It sparked an interest in artificial intelligence. Today Zoeter works at Xerox Labs Europe (xrx), where he focuses on reducing traffic congestion in Los Angeles. “We know very little about parking because it takes a lot of time to observe,” Zoeter says. Zoeter’s team installed sensors in parking spaces across the sprawling city in an effort to bridge that knowledge gap. The data feed into a smartphone app that drivers can use to determine in real time which spaces are full and which ones are empty. Even better? With that data, the city is able to change the price of those parking spaces to reduce traffic in highly congested areas. Since the project went live in 2012, congestion in Los Angeles has dropped by 10%. —Shalene Gupta

Patrick Poels, Vice President of Engineering, Eventbrite

Patrick Poels left the tech industry for five years to be a professional poker player. The poker market dried up in 2010, though, and he decided to return to technology. He doesn’t regret it. “Analyzing data is a lot like poker,” he says. “You play thousands of hands, you learn about people, you process data, you look for things that stick out. The same things are applicable.” At the online ticketing company Eventbrite, Poels and his team created a recommendation system that suggests events to users based on what other people also read. It’s working: A million people a week buy tickets on Eventbrite, and nearly half of them are return buyers. His next project? Figuring out best practices for reserved seating events. —Shalene Gupta

Silvanus Lee, Lead Scientist, Dropbox

Wunderkind Silvanus Lee graduated from Stanford University in only two-and-a-half years with a double major in computer science and mathematics. He went straight to the financial industry after graduation, but the allure of the tech industry was too strong, and he joined Dropbox in 2012. There, he used his business background and technical know-how to start a team dedicated to data science. One of his projects is called Project Harmony, communication software to let Dropbox users discuss changes to documents as they work on them. Another marketing-related project is focused on figuring out if users from the same company are on Dropbox so that the company can offer them a premium package. “He has an incredible background that spans the technical, math and real world context,” his boss ChenLi Wang says. “It sets his data science apart.” —Shalene Gupta

Surabhi Gupta, Software Engineer, Airbnb

Surabhi Gupta has always loved to travel, and regularly plans trips for her friends and family. As a graduate student in computer science at Stanford University, she became fascinated by the art of summarization: extracting meaning from text without actually reading the text. Gupta was working at Google when she started researching a trip using Airbnb, the room rental service. Fascinated by the possibilities their data offered, she contacted them and landed a job. Four months later, she had overhauled and improved their search algorithms. Today, she's working on condensing all of Airbnb’s listings to create summaries so users can quickly understand different cities’ vibes. “The overall goal," she says, "is: how do we get people to come to Airbnb when they travel? And how do we get them information when they want to travel?" —Shalene Gupta

Swatee Singh, Vice President of GMS IM Platforms & Big Data Capabilities, American Express

Swatee Singh’s technical background is impeccable, and includes a doctorate in Machine Learning from Duke University. But she’s all about making business personal. She’s the brains behind American Express MyOffers, which aims to give AmEx (axp) members what they want when they need it. If it’s noon and you have a taste for Mexican food, AmEx can present you with a coupon to a nearby burrito place. She’s also responsible for a tool that allows merchants to compare their annual performance. “She’s very energetic, and a visionary, she’ll go places,” says her boss Sastry Durvasula. “She has a strong technical background but she can also speak to leaders like a leader. When we talk about data she’s there.” —Shalene Gupta

Tamara Gaffney, Principal Analyst, Adobe Digital Index

Tamara Gaffney uses data to see the future. At Adobe Systems (adbe) she leads a team that mines data from the companies that use Adobe’s cloud to make predictions about who will win an Academy Award, which summer blockbusters will be profitable, and how much people will spend online during the winter holiday season. She’s on the mark: during the last holiday buying rush, her team’s predictions were off by only 1%. “Her unique blend of understanding technology and her interest in why people do things makes her a standout,” says her manager Julie McEntee. “She’s curious, she loves determining patterns from data, chasing down leads, and hypothesizing why things might be.” For her next project, Gaffney is working on putting together a prediction about mobile shopping applications. —Shalene Gupta

Vijay Subramanian, Chief Analytics Officer, Rent the Runway

At first glance, it might look like Vijay Subramanian has a less-than-glamorous job at a decidedly high-wattage company. But as Rent the Runway’s chief analytics officer, no one is plugged into women’s style trends quite like he is. Not long after he joined in 2010, Subramanian built a model to estimate missed demand, product longevity, and occasion usage for the company’s inventory—a huge cost-saver for a company that buys truckloads of dresses and accessories from fashion designers every season to rent to customers. “If you bring the three data sources together, you can create a framework for how you’re buying,” he says. “It tells us what kind of styles to look for that give us the highest probability of being a star”—and which combinations to avoid, of course. His next mission? Incorporate new types of data from Unlimited, the company’s major new expansion into everyday wear. “Our classic model is all driven by the event you’re going to. You may have an edgy streak, but if you’re going to a black tie wedding, it doesn’t matter what your true style is. It matters within context,” he says. “Unlimited is the beginning of understanding the user’s style DNA.” —Andrew Nusca

Yan Qu, Vice President of Data Science, ShareThis

It may be hard to believe, but Yan Qu’s work impacts 95% of American readers. At the social web company ShareThis, she developed a Social Quality Index that measures the social activity around online content and helps advertisers and publishers target the right audiences. “The technical side isn’t hard,” she says. “What’s hard is identifying a business problem to apply the technical side to.” Qu is a graduate of Carnegie Mellon University, where she received her doctorate in natural language processing. Before ShareThis, she led the Advance Research team at AOL’s Advertising.com. Today, she's excited about tackling challenges presented by mobile phones, which don’t allow sites to plant cookies to identify repeat visitors. It's all part of an effort to collect—you guessed it—more data. —Shalene Gupta

Zachary Bogue, Co-Managing Partner, Data Collective

It was certainly unusual for Zack Bogue and Matt Ocko to launch Data Collective, a San Francisco Bay Area venture capital fund focused on big data companies, in 2011. But the pair has been riding a surging wave ever since. “Cost curves are rapidly being crossed everywhere,” Bogue says. “Plunging costs are enabling all of these brand new ways to attack these old line industries.” This year, the firm launched its third fund, bringing its total raised to nearly a quarter of a billion dollars. It’s making bets on companies like LendUp, a nimble payday lending startup, and MemSQL, an in-memory database that is “orders of magnitude” cheaper and faster than what’s on the market. “It’s a little bit of a Cambrian explosion,” Bogue says. “It’s opening up massive markets or industries for investment that previously weren’t available. One of our theses is that every single vertical, industry, SIC code will be fundamentally disrupted by technology. It’s really exciting.” —Andrew Nusca