It’s no secret that, these days, a single post on social media can have a dramatic impact. Consider two days in January 2013 when a series of damning—but false—tweets sent two stocks plunging. Some of the posts claimed that a company called Audience was being criminally investigated for “rumored fraud.” A second set claimed that the FDA had seized clinical-trial records of Sarepta Therapeutics (SRPT) on suspicions the results had been “doctored.” Only later did many readers notice that the authors were not in fact the well-known short-selling firms Muddy Waters and Citron Research, but rather two fake accounts using similar names with misspellings: @Mudd1waters and @Citreonresearc. The stocks fell 28% and 16%, respectively.
But there’s a surprising coda to this anecdote. Despite fooling the market, the perpetrator managed to net only $97 from his deception. By the time he bought shares of each company, a mere 10 minutes after his tweets moved the market, it was too late. Other investors—some, no doubt, using their own social media tools—had figured out the ruse, and the share prices almost instantly bounced back. So says the Securities and Exchange Commission, which filed suit against the alleged social media scammer, a Scottish man, in November. The SEC’s case, which was accompanied by a federal criminal indictment, is the first to charge market manipulation via Twitter. (The accused has reportedly denied any wrongdoing.)
The episode reveals a lot about the ways investors are using Twitter (TWTR) and the like to guide trading decisions. A new technological race is under way, in which hedge funds and other well-heeled investors armed with big-data analytics instantly analyze millions of Twitter messages and other nontraditional information sources to buy and sell stocks faster than smaller investors can hit “retweet.”
Irish research firm Eagle Alpha, for example, digested 7,416 comments on a Reddit gaming thread in October to predict that Electronic Arts (EA) would sell more of its new Star Wars videogame than it had projected; Electronic Arts soon raised its sales forecast, citing “excitement” over the game. Monitoring web traffic on Alexa.com this spring, the quant team at Goldman Sachs Asset Management noticed a spike in visits to HomeDepot.com (HD) and loaded up on the home-improvement stock months before the company increased its outlook and shares surged. “You have this explosion of other independent real-time sources. It’s a lot easier to get to [on-the-ground] truth,” says Matthew Granade, chief of the newly created data-crunching team at Point72, the reincarnation of Steven Cohen’s hedge fund SAC Capital Advisors. “Overall, I think this is a golden age for new investment data sources.”
That has meant a wave of demand for services such as Dataminr, which applies advanced analytics to the entire Twitter “fire hose” to detect events likely to move the market. Founded in 2009 by three former Yale roommates, the company now has roughly 75 financial clients—up from 50 a year ago—including the majority of big investment banks and at least half the top hedge funds, overseeing a collective $1 trillion in assets. (Dataminr’s customers also include Fortune 500 companies, media outlets, and government entities.)
“Dataminr feeds are like table stakes right now: Most hedge funds need to have it,” says Santo Politi, a founder of Spark Capital, a venture capital firm that was an early backer of Twitter and has a majority stake in a two-year-old hedge fund, Tashtego, that trades on signals from social media and other nontraditional data.
Whether or not they use Dataminr, hedgies are increasingly paying attention to Twitter and its ilk. “They’re just now starting to take advantage of what’s available through social media and this wisdom-of-crowds concept,” says Divya Narendra, founder of SumZero, a social network for pro investors. “That’s a new phenomenon.” But how best to take advantage of it, or even whether to do so, is a subject of sharp dispute.
When Dataminr launched six years ago, a billion tweets had been posted over Twitter’s history. Now that quantity is produced every two days. Dataminr founder and CEO Ted Bailey saw opportunity. His company became one of the first to buy direct access to the entire stream of tweets. Today it remains one of few companies that still have it; Twitter cut off the full feed for some companies in 2015 after acquiring Gnip, which resells social media data to analytics businesses and other clients. Access to the complete Twitter stream costs about $30,000 a month, with fees based on usage that can take the charges up to $1.5 million a year, according to people who have used the data.
A key reason for Dataminr’s prime position: Twitter owns a stake in it, according to Tom Glocer, another investor in Dataminr and a board member of Morgan Stanley (MS) (which is itself a client of Dataminr’s). That’s one of the service’s “big differentiating advantages,” he says. Bailey would only shrug when asked about the investment—or any of his company’s finances. Twitter declined to discuss its agreements or prices, but it says a growing number of hedge funds are buying its feeds directly.
Matthew Granade, who heads the data-crunching team at hedge fund Point72, the reincarnation of SAC Capital, calls this a “golden age for new investment data sources.” (Photograph by Christopher Lane for Fortune Magazine)
At 34, Bailey doesn’t lack self-assurance, though he may lack sleep—he’s got dark circles under deep-set eyes. Tall and with a grin just short of a smirk, he’s sitting in Dataminr’s headquarters in Manhattan’s Koreatown. Bailey expects to double his staff—currently about 200 employees divided among offices in New York City, Washington, D.C., London, and Bozeman, Mont.—over the next year. “At this time we are the leader, if not the only [company] that really has built these products and been successful,” he says. Bailey has lofty ambitions. He wants Dataminr to be “an industry-defining company that’s around for a long time, that’s a very significant, large company.”
Bailey sees almost limitless uses for his technology. “It’s pretty hard to come up with industries that would be happy knowing later, less, and not everything,” he says. “So if you think of it that way, the opportunity for Dataminr really is as big as the need for early information, more information, and full context on information and events around the world.”
Today, Dataminr specializes in identifying black-swan events, or “unknown unknowns,” as Bailey calls them, before the market reacts. It uses machine learning and cross-references 30 other data sets, ranging from maps (to triangulate users’ locations) to patent data and stock movements, to identify tweets and trends with market impact, based on unusual patterns and “clusters” of similar tweets.
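For readers curious what spotting “unusual patterns” can mean in practice, here is a minimal sketch of one textbook approach: a rolling z-score spike detector over per-minute counts of tweets mentioning a term. It is an illustrative assumption only, not a description of Dataminr’s system, whose details are not given here; the function name `detect_bursts` and its `window` and `threshold` parameters are hypothetical.

```python
from statistics import mean, stdev


def detect_bursts(counts, window=10, threshold=3.0):
    """Return indices of time buckets whose count is unusually high.

    counts: tweets per minute mentioning a given term (hypothetical input).
    A bucket is flagged when it exceeds the mean of the preceding `window`
    buckets by more than `threshold` standard deviations.
    """
    bursts = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]  # trailing history only, no lookahead
        mu = mean(baseline)
        sigma = stdev(baseline) or 1.0  # avoid dividing by zero on a flat baseline
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Usage: a quiet stream of chatter, then a sudden spike at minute 10.
counts = [2, 3, 2, 4, 3, 2, 3, 2, 3, 2, 40, 38, 5]
print(detect_bursts(counts))  # [10]
```

Only minute 10 is flagged: once the spike enters the trailing window it inflates the baseline, so the follow-on minute at 38 no longer looks unusual. A real service would need much more than this, such as grouping similar tweets into clusters and checking them against other data sets.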
Using this method, Dataminr says it delivers early intel to its clients. Translating tweets from French and German, the service began alerting clients to the terrorist attacks in Paris five minutes after the first occurred outside the Stade de France, more than 45 minutes before the Associated Press tweeted the news. Dataminr revealed preliminary reports of Volkswagen’s emissions scandal three days before its stock price plummeted 30%. Oil and gas traders, Bailey says, received alerts about the death of the King of Saudi Arabia more than four hours before crude prices spiked on the news.
Dataminr is the biggest player in a nascent industry (call it alternative big data for big finance) that has exploded in the past six months. In March, Dataminr raised $130 million from Fidelity as well as other investors, including former Citigroup (C) CEO Vikram Pandit, valuing the company at $700 million. (Since then, Fidelity, which owns 10% of Dataminr, marked down the value of its stake by more than a third. Dataminr declined to comment, but a person close to the startup contends the move reflects a change in general market conditions and not Dataminr’s prospects.)
It’s tricky to determine how much funding is flowing to companies like Dataminr, but the amount appears to be rising. Financial technology (“fintech”) startups have received more than $11 billion in venture capital funding so far this year, 83% more than all of last year, according to CB Insights. Banks like Citigroup and Goldman Sachs (GS) (another Dataminr investor) have backed 15 of those companies in 2015, compared with nine last year. CB Insights doesn’t specify what portion of those investments went to data analytics but says it’s a “hot” and growing category.
Twitter is only one of many new hoses from which investors are guzzling. Whereas hedge funds once might have sent an analyst to count cars in retailers’ parking lots to inform their earnings models, they’re now deploying web-crawling bots to vacuum info from online job-listing sites, Amazon (AMZN) reviews, Wikipedia, Zillow (Z) home-value records, FDA patient complaints, and the remotest reaches of the Internet. Investors are “scraping” retailers’ online stores for prices and inventory, or nearly buying (then quickly canceling) tickets on Expedia (EXPE) to figure out how many seats are left on every airplane.
Even fintech startups that don’t specialize in analytics, such as SumZero, StockTwits, and Scutify, have begun fielding requests from hedge funds wanting to buy their data, such as what stocks their users are searching for, which is seen as a potential proxy for bullishness.
Dataminr doesn’t sell a product to individual investors. But social media input is starting to seep into even mainstream websites. For example, Fidelity in November added a “social sentiment” score from Dataminr rival Social Market Analytics to its stock research pages.
Not everyone is a believer in the investing value of social media. “There certainly is a lot of skepticism,” says Franklin Gold, senior vice president for research and education at Fidelity. Shanta Puchtler, co–chief investment officer at Man Numeric, a division of Man Group, one of the world’s largest hedge fund firms, with $77 billion under management, says his team hasn’t been able to glean actionable insights from social media. “There’s this romantic notion that Twitter tweets are investable and you can make lots of money if you jump on them,” Puchtler says. “You do have to ask yourself the question, ‘Where is the value?’ ”
For all their reach, social media analytics can misfire. One large quant hedge fund got stung when its algorithm confused sarcastic tweets about Lululemon’s (LULU) see-through pants debacle with positive sentiment, buying shares in the yoga-apparel retailer when it should have been selling. “At some point in time, everyone has gotten burned by something that happened on Twitter,” says Joe Gits, CEO of Social Market Analytics.
Early damage to the concept was done by a short-lived British hedge fund, Derwent Capital Markets, which announced it was launching the world’s first “Twitter Fund” to much fanfare in 2011, only to shut it down a month later. Derwent’s founder, Paul Hawtin, popped up again in the Caribbean in 2013 promoting a new firm with a similar strategy but fell off the radar a year ago.
Dataminr’s Bailey says his company is getting better at separating wheat from chaff. But one challenge never goes away: the ever-changing makeup of the crowd on social media. Twitter users join, then deactivate their accounts. They delete tweets. A detected pattern can abruptly disappear. That means analytics companies must constantly adjust their algorithms and models.
Some early believers fear that Twitter data has become so pervasive that it no longer offers an edge. David Lewis, head Americas trader at $800 billion asset manager Franklin Templeton, for example, found out that Russia had invaded Crimea last year well before the news hit major media because he’d been monitoring Twitter. Since then, though, he says, the social media head starts have narrowed, and he dropped a trial of a Twitter-based alert system. “News just travels a heck of a lot faster,” says Lewis. “I think those opportunities are becoming more difficult because this data is more widely shared or information is easier to get. No one really has a monopoly on information anymore.”
That’s naive, counters Gene Ekster, a consultant who helps hedge funds implement alternative data strategies. “There is no way you’re going to arbitrage the alpha out of Twitter data,” he says. “It’s crazy to think. You can analyze it in an infinite number of ways.” Still, aware that investors might see its service as losing its leg up, Dataminr just launched a custom product that funds can use to blend its algorithms with their own to get alerts their rivals won’t have.
How should an investor make use of social media data? Here, too, there is disagreement. Those who invest for the long term argue vehemently that, at most, Twitter data is one small piece of a much bigger mosaic. But plenty of momentum-oriented short-term traders seem to be buying and selling purely on social media.
Witness, for example, the way tweets from the real Citron Research recently pounded shares of pharma firms Valeant (VRX) and Mallinckrodt (MNK). (Says Citron’s Andrew Left: “It’s like I’m a Kardashian. People are actually following my tweets. Crazy.”) And within a span of six weeks this fall, Hillary Clinton caused a drop in biotech stocks with a tweet calling for greater regulation of drug prices, then single-handedly tanked stocks of private-corrections companies when she tweeted about prison reform. Then there was the “hash crash” of 2013, when the Dow dropped 145 points in two minutes after someone hacked the Associated Press’s Twitter account and posted, falsely, that explosions in the White House had injured President Obama. “We’re starting to call it the dumb money, because these algos are reacting on the basis of some really stupid stuff,” says Leigh Drogen, CEO of Estimize, which crowdsources corporate earnings estimates.
Still, the studies that have examined this new field so far have largely supported the notion that analyzing social media can be useful in investing. Sentiment as surmised from social media correctly predicted which way the Dow would move three days later 87% of the time, according to a 2011 study by Johan Bollen, an associate professor at the Indiana University School of Informatics and Computing. In July, Eli Bartov, a professor at New York University’s Stern School of Business, and two other researchers found that “aggregate opinion” from tweets before earnings announcements could predict earnings surprises as well as market reactions for individual stocks, leading to outperformance of 5% to 10% per year.
Bollen recently started a company, Guidewave Consulting, to sell the patented signals he describes as “sensing the zeitgeist among investors.” He says those signals have produced the highest returns when computers trade on them automatically, without the second-guessing of a human being.
Web scraping and other alternative data collection practices are already fueling debate over what constitutes nonpublic information and insider trading, and whether investors can misuse information even when it’s public and legally obtained. SumZero, for one, discovered that one of its users had attempted to scrape all the research published by investors on its site, violating the startup’s policies.
Certainly, using technology to gather web data seems innocent compared with, say, cybertheft. But data miners and analytics companies say it’s a finer line than investors often realize; consultant Ekster, for example, says investors can receive cease-and-desist orders for scraping publicly available data. The website for Harvest Exchange, another online investor community, slowed because an algorithm in Florida was checking multiple hedge fund managers’ profiles every 10 seconds to see if they had posted any potentially market-moving ideas. (The site banned the offending IP addresses.)
The SEC set up its own data-mining unit a few years ago to help it catch market fraud. Social media has complicated that task. “In a world where information travels very, very fast and through different media, figuring out whether information is public or not is challenging,” says Daniel Hawke, the former chief of the SEC’s market abuse unit, who recently joined Washington, D.C., law firm Arnold & Porter.
Perhaps the most poetic example of the power of social media mining occurred in April. That’s when a startup called Selerity, which plumbs the web for earnings reports, detected results for Twitter itself. The information had accidentally been posted an hour early. Within three seconds, Selerity’s bots synthesized the report into fewer than 140 characters and tweeted it out; four seconds after that the market was moving. Many people accused Selerity of hacking or having been leaked the data (which, according to Nasdaq, was available for only 45 seconds before it was taken down). But Selerity insisted it had merely visited Twitter’s website. CEO Ryan Terpstra points out that anyone with a web browser could have accomplished that: “Just keep clicking ‘Refresh.’ ”
A version of this article appears in the December 15, 2015 issue of Fortune with the headline “Trading on Tweets.”
Editor’s note: This article has been updated to reflect the debate surrounding public vs. nonpublic information more completely, and to more precisely describe the website issues Harvest Exchange experienced due to a scraping algorithm.