Can an A.I. hedge fund beat the market?
This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.
Hello everyone. Last week, I wrote about recent breakthroughs in natural language processing. This week, I want to talk about an example of what those developments make possible.
John “Kanen” Flowers is a tech industry archetype. A self-styled hacker and coder, he worked for Microsoft in the early 1990s before becoming a serial entrepreneur. He created one of the Internet era’s first personalized news sites, a network security company, a company that did computer-generated special effects, and another—this time in Kansas City, not Silicon Valley—that used natural language processing to answer questions through a chatbot-like interface.
That last company flamed out, but Flowers says his interest in NLP didn’t die with it. For his current act, he’s moved to New York City and combined his fascination with NLP with what he calls “an extreme interest in the stock market.” Only this time, he’s not running a tech company. He’s running a hedge fund.
“I believed if you could capture every blog, every analyst report, every chart, and feed it into a machine learning system, if you could get all of that, you could measure the emotional index and the velocity of interest people have in different stocks, and build a successful trading system,” he says.
The idea of mining both news reports and social media posts and producing “sentiment analysis” isn’t new. Plenty of other financial firms have invested in such systems over the past decade. But what sets Flowers’s system apart is the amount of data it ingests, the sophistication of its language analysis, and the degree to which he’s automated his fund’s entire strategy. Some financial firms use A.I. models to generate investing ideas, but these are often reviewed by humans. Plenty more now use A.I. to figure out the best way to execute a trade. Very few firms, however, have taken the human out of the loop to the extent Flowers says he has: his system analyzes data, generates trading ideas, and executes trades without human intervention.
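To make the general idea concrete, here is a deliberately tiny sketch of sentiment-driven signal generation. The word lists, scoring rule, and threshold are all hypothetical illustrations, not Next Alpha's actual model; a real system would use trained NLP models over vastly more data.

```python
# Toy illustration of turning text sentiment into trade signals.
# The lexicons and threshold below are invented for demonstration only.

POSITIVE = {"beat", "surge", "record", "upgrade", "growth"}
NEGATIVE = {"miss", "plunge", "downgrade", "loss", "recall"}

def sentiment_score(text: str) -> float:
    """Net positive-word fraction of a headline, in [-1, 1]."""
    words = [w.strip(".,") for w in text.lower().split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def signal(headlines: list[str], threshold: float = 0.05) -> str:
    """Aggregate sentiment across headlines into a buy/sell/hold signal."""
    avg = sum(sentiment_score(h) for h in headlines) / len(headlines)
    if avg > threshold:
        return "buy"
    if avg < -threshold:
        return "sell"
    return "hold"
```

For example, `signal(["Company posts record growth, shares surge"])` yields `"buy"`, while a batch of crash-and-downgrade headlines would tip the average below the threshold and yield `"sell"`.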
Not that Flowers had an auspicious start: he says that he and his partners began with $1 million of their own money in 2017 and, over about six months, lost every penny of it. But he kept at it. In 2018, he says, his automated system made about a 60% return. In January 2019, he formally set up a hedge fund called Next Alpha, which today, he says, has about $30 million under management. That’s small for a hedge fund—but, so far, Next Alpha’s returns are impressive. It netted 40% between April 2019 and April this year, a period during which the S&P 500 lost 4.9%.
Next Alpha’s numbers are also a lot better than most other A.I.-focused hedge funds. The Eurekahedge A.I. Hedge Fund Index, which tracks some 20 hedge funds that say they use A.I. to determine their investing strategies, returned only 6.3% in 2019 and is up only 2% in 2020 so far, compared to 5.3% for the S&P 500. Most A.I. systems, it seems, have had a hard time dealing with the market’s massive pandemic-driven plunge and equally massive recovery. The index’s return for the past three months is a paltry 1.5% compared to more than 12% for the S&P 500.
There are a couple of things that helped Next Alpha. One is that it teamed up with a company called Accern. Accern runs what’s known as a “no code” platform—software that lets you build A.I. models without having to know how to program. The platform lets data scientists create and train models on vast quantities of unstructured text as well as other kinds of data. Accern provides its customers access to a lot of news and social media feeds, but they can also upload whatever text they want to the system. Kumesh Aroomoogan, Accern’s co-founder and CEO, tells me that Accern is being used by banks to help with lending decisions and by insurance companies to help with underwriting. Flowers’s Next Alpha used Accern to help with the sentiment analysis portion of its own trading model.
Aroomoogan says the advent of ultra-large language models in the past few years has enabled Accern to rapidly accelerate the capabilities of its own NLP tools. For instance, it has been one of the early access customers for OpenAI’s GPT-3 model and is working to integrate it into its own systems. “We saved years and capital and a lot of resources there,” he says.
Still, he says that one of the biggest mistakes he sees companies make with A.I. is taking pre-trained models, such as GPT-3, and assuming they will work perfectly without additional training on data that is specific to that business or use case. He says Accern spent years making sure its NLP was fine-tuned for the vocabulary used in the financial industry. “A general model will not give you the best accuracy,” he says. What’s more, honing a model on your own data takes much more time than many people assume. “Adjust your timelines because it will take six months to nine months to actually train this model for the best accuracy,” he says.
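Aroomoogan's point about domain fine-tuning can be illustrated with a deliberately simplified word-count classifier. This is a hypothetical toy, not Accern's NLP stack: the idea is just that a model trained only on general-purpose text has never seen finance jargon like "bearish," and a little domain-specific training data fills the gap.

```python
# Toy illustration of why pre-trained general models need domain fine-tuning.
# A hypothetical word-count sentiment model -- not Accern's actual system.

from collections import Counter

class TinySentimentModel:
    def __init__(self):
        self.counts = {"pos": Counter(), "neg": Counter()}

    def train(self, examples):
        """Accumulate word counts per label; callable repeatedly,
        so later calls act as fine-tuning on top of earlier training."""
        for text, label in examples:
            self.counts[label].update(text.lower().split())

    def predict(self, text):
        score = 0
        for w in text.lower().split():
            score += self.counts["pos"][w] - self.counts["neg"][w]
        return "pos" if score >= 0 else "neg"

# "Pre-training" on general-domain examples.
model = TinySentimentModel()
model.train([("great wonderful happy", "pos"),
             ("terrible awful sad", "neg")])

# The general model has never seen "bearish", so the headline scores
# zero and falls through to the default label.
print(model.predict("analysts turn bearish"))   # "pos" -- wrong for finance

# Fine-tuning on a little domain-specific data fixes the vocabulary gap.
model.train([("bearish downgrade sell-off", "neg"),
             ("bullish upgrade rally", "pos")])
print(model.predict("analysts turn bearish"))   # "neg"
```

Real fine-tuning of a large language model is far more involved, which is exactly why Aroomoogan warns it can take six to nine months.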
As for Flowers, he says the biggest A.I. pitfall, especially in the investing world, is an over-reliance on supervised learning from historical data. People, he says, are too limited by their own experience and imaginations. They spend too much time thinking about the worst scenario they can remember, rather than the worst scenario that could possibly happen. As a result, they aren’t well prepared for black swan events—such as the 2008 financial crisis or the recent pandemic. Flowers says he prefers instead to use approaches like reinforcement learning, where an A.I. model can learn from experience in a simulated environment. In the simulator, you can test the model against all kinds of extremely unlikely scenarios—including ones that have never occurred in history.
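The contrast Flowers draws can be sketched with a minimal reinforcement-learning loop: rather than fitting to history, an agent acts in a simulator whose scenarios deliberately include crashes at a rate history alone would not suggest. Everything here (the crash probability, the returns, the two actions) is an invented toy, not Flowers's system.

```python
# Minimal sketch of learning from a simulator that includes synthetic
# black-swan scenarios. Hypothetical toy numbers throughout.

import random

random.seed(0)

def simulate_return(position, crash_prob=0.05):
    """One simulated period. 'long' gains slightly in normal times but is
    exposed to rare synthetic crashes; 'flat' earns and loses nothing."""
    if position == "flat":
        return 0.0
    if random.random() < crash_prob:      # synthetic black-swan day
        return -0.30
    return random.gauss(0.001, 0.01)

# Tabular action values: running average of observed reward per action.
q = {"long": 0.0, "flat": 0.0}
n = {"long": 0, "flat": 0}

for step in range(10_000):
    # Epsilon-greedy: mostly exploit the better action, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    reward = simulate_return(action)
    n[action] += 1
    q[action] += (reward - q[action]) / n[action]   # running-mean update

# With 5% crash days, the expected 'long' reward per period is negative
# (0.95 * 0.001 + 0.05 * -0.30), so the agent learns to stay flat.
print(q)
```

The point of the exercise is Flowers's: because the simulator can contain scenarios that have never occurred, the learned policy is shaped by the worst that could happen, not merely the worst that has happened.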
Still, Flowers says simulators have limitations too. “There’s nothing like live ammo to determine if the system is going to work,” he says. “Until you watch the system actually interact with the market you have no idea how it will perform.”
Before we get to the rest of this week’s A.I. news, I wanted to correct something from last week’s “Eye on A.I.” In that newsletter, I described my experiments in a new online lab environment that San Francisco startup Primer has created to showcase its own NLP capabilities.
In the newsletter, I faulted Primer’s question-answering tool for not being able to find something in a 100,000-word SEC document that was pretty trivial to find just using a web browser’s search function. John Bohannon, Primer’s head of science, got in touch to say that the problem was not that Primer’s tool didn’t work, but that the lab environment restricted text uploads to 50,000 characters. He admitted that Primer’s user interface should have made this character limit more evident. So I want to apologize to John and to Primer for faulting their NLP tool.
Also, in the same essay, I incorrectly stated that the workload of the average U.S. intelligence analyst, which stood at 200,000 words per day in 2016, was only 20,000 words per day in 1956. It turns out it was that low as recently as 1995. Again, my apologies for the error.
And now, here’s the rest of this week’s news.
A.I. IN THE NEWS
No points for second place. An A.I. system creamed a top U.S. Air Force fighter pilot in simulated dog fighting, winning all five match-ups in the finals of a competition held by the U.S. Defense Advanced Research Projects Agency (DARPA). The A.I., created by Washington, D.C.-area defense contractor Heron Systems, had earlier in the week bested eight similar software programs created by a range of defense contractors and academic research teams. While getting A.I. software to successfully transfer skills from a simulated environment to the real world is always a bit tricky, the days of air-to-air combat between human pilots may be rapidly drawing to a close. Still, Col. Daniel “Animal” Javorsek, who runs the A.I. piloting program at DARPA, said that rather than replacing fighter pilots completely, he foresaw A.I. software acting as a cockpit assistant, helping human pilots manage systems during combat or select the best tactics and react faster in a dogfight. He also said the systems could be used to create autonomous drones more capable of defending themselves while carrying out missions. My Fortune colleague Aaron Pressman chronicled the contest for us here.
Nvidia reports stronger-than-expected earnings growth on booming datacenter and gaming business. The chipmaker, whose graphics processing units remain the market-leading computing chips for A.I. applications, posted record sales for the second quarter, with revenues jumping 50% to $3.87 billion, beating Wall Street expectations. The company said its datacenter business saw revenues double to $1.75 billion while gaming revenue jumped 26% to $1.65 billion. Despite this, the company’s stock—which has hit all-time highs this year, allowing Nvidia to surge past Intel in market cap to become the world’s most valuable semiconductor firm—slid after chief financial officer Colette Kress forecast that datacenter sales would grow in the “low-to-mid single digit” percentage range in the third quarter, well off the 50%-plus quarter-over-quarter pace the company has been experiencing, Bloomberg News reported.
Microsoft and Department of Energy develop A.I. tools for natural disasters. The computing giant and the government department have formed a partnership called The First Five Consortium, named for the importance of the first five minutes to the outcomes of emergency response. Together they will try to build between 10 and 30 different A.I. systems that can aid those dealing with emergencies and natural disasters ranging from earthquakes to wildfires to floods, according to The Wall Street Journal. It said the consortium already has two A.I. tools in early-stage development: one for mapping the boundaries of an active wildfire and a similar tool for flood control.
Traditional banks are under pressure on A.I. development from fintechs, Big Tech. That's according to a story in The Wall Street Journal. The story quotes Gaia Bellone, senior vice president and head of data science at KeyBank Corp., telling a recent webinar panel that she thinks banks like hers will increasingly turn to fintech companies as sources of innovation in areas such as artificial intelligence, and simply incorporate those solutions into the banks' services rather than building their own systems. One of the problems with big banks trying to implement A.I. systems on their own? Too much internal red tape, she said.
EYE ON A.I. TALENT
Payments company iBanFirst, headquartered in Brussels, Belgium, has appointed Yann Stadnicki as chief data officer, the company announced. Stadnicki had been a research engineer at Microsoft. At iBanFirst he will be responsible for integrating data, machine learning and A.I. across departments including sales, finance, and research and development, the company said.
Aptitude Software, which makes financial software, has appointed Nick Nesbitt as executive vice president, international, according to a report in trade publication AiAuthority. He was previously a managing director at Vuelta.
A.I. company DataRobot has appointed Parm Uppal as chief revenue officer, the company said. He was previously vice president of sales at AppDynamics.
EYE ON A.I. RESEARCH
Sometimes the simple, old tools really are best. Correctly labeling action in video sequences is an important task that people are increasingly automating using A.I. And computer vision algorithms for doing this have been getting more and more sophisticated. But, in a paper published this past week on the research repository arxiv.org, a group of M.I.T. researchers complain that these algorithms have often been tested against video test sets seemingly cherry-picked to give that particular algorithm an advantage, and have often been presented without key information about exactly how the algorithm was trained and tuned, so that other researchers cannot attempt to reproduce the results. So the M.I.T. team set out to rectify that. They tested 14 different techniques for action labeling in videos, using an experimental setup that allowed for clear apples-to-apples comparisons. They found that older, simpler, two-dimensional convolutional neural networks—a type of neural network architecture that has been around for quite a while now—worked better than much more complex models that try to analyze the videos in three dimensions. What's more, these simpler models were much faster—and thus less expensive—to train. One of the researchers' other key findings was that the depth of the model—how many layers the neural network architecture has—was more important to its performance than how many different input variables it used. In other words, deeper models were better than bigger models.
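A back-of-the-envelope calculation shows part of why the 2D approach is cheaper to train: a 3D convolution adds a temporal dimension to every kernel, multiplying its parameter count by the temporal kernel size. The layer sizes below are arbitrary illustrative numbers, not the architectures from the M.I.T. paper.

```python
# Rough parameter-count comparison of 2D vs. 3D convolutional layers.
# Toy channel and kernel sizes, chosen only for illustration.

def conv2d_params(in_ch, out_ch, k):
    return in_ch * out_ch * k * k + out_ch          # weights + biases

def conv3d_params(in_ch, out_ch, k, t):
    return in_ch * out_ch * k * k * t + out_ch      # extra temporal axis t

p2d = conv2d_params(64, 128, 3)        # one 3x3 layer:        73,856 params
p3d = conv3d_params(64, 128, 3, 3)     # same with 3-frame kernel: 221,312
print(p2d, p3d)                        # the 3D layer is nearly 3x larger
```

Multiply that gap across every layer of a deep network and the training-cost difference the researchers observed becomes unsurprising.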
FORTUNE ON A.I.
Fitbit debuts new smartwatches that track how stressed their owners are—by Aaron Pressman
Here’s how much these tech giants are making in profit per employee—by Jonathan Vanian
Will India’s Jio be the next tech giant?—by Vivienne Walt
"Collective Punishment By Statistics." The fallout from the British government's misguided attempt to use an algorithm to award high school students their final grades—on which their university admissions hinge—continues. To recap, the U.K. education system places enormous weight on subject-specific exams called A-Levels, taken at the conclusion of the last two years of secondary school. They are the sole assessment of what students have learned during the previous two years and also the key determinant of university admissions. Only this year, due to Covid-19, the exams were cancelled. Instead, the government asked teachers to submit grades for students. But, concerned about grade inflation after it became apparent that the teacher-submitted grades skewed heavily towards higher marks, the government decided to adjust the grades based on an algorithm that would take into account factors including how well that student's particular school performed historically. This, not surprisingly, meant the algorithm severely disadvantaged students from poorer backgrounds who tend to go to worse-performing schools, while boosting those at already privileged elite private schools that on average have much better exam results. Nearly 40% of students had their grades reduced. After public outcry, the government reversed course and decided to just give students their teacher-awarded grades after all.
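To see why moderating grades against a school's past results punishes individuals, consider a deliberately simplified version of the idea: rank a school's students by teacher grade, then hand out the school's historical grade distribution in that order. This is an invented toy, not Ofqual's actual model, but it reproduces the core problem.

```python
# Toy sketch of distribution-matching grade moderation -- a simplified
# invention for illustration, not the U.K. government's actual algorithm.

def moderate(teacher_grades, historical_grades):
    """Re-assign the school's historical grades to its current students:
    best historical grade to the highest-ranked student, and so on."""
    ranked = sorted(range(len(teacher_grades)),
                    key=lambda i: teacher_grades[i], reverse=True)
    hist = sorted(historical_grades, reverse=True)
    out = [None] * len(teacher_grades)
    for rank, student in enumerate(ranked):
        out[student] = hist[rank]
    return out

# A standout student (teacher grade 9) at a historically weak school is
# capped at the school's best past result, regardless of individual merit.
print(moderate([9, 6, 5], [7, 5, 4]))   # -> [7, 5, 4]
```

No student at this school can score above its historical best, which is exactly how strong students at weak schools were pulled down while elite schools' historical averages protected theirs.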
But now the problem is that universities, which had already made admissions offers based on the algorithm's results, say they cannot accommodate all the additional students who qualify for admission this year, especially given the need to enforce social-distancing due to the ongoing pandemic. So they are encouraging students to defer their places until next year—which means that students taking their exams and applying for admission in 2021 will be disadvantaged because they will be competing for far fewer available university places than normal. Meanwhile, results for the exams that 15-year-olds take, called GCSEs, which were also moderated by an algorithm, wound up being unusually high, with some schools complaining that weaker students' teacher-submitted grades were bizarrely upgraded. In short, it's all a complete mess. One father who helped uncover the issue quipped to The Guardian that it was like "collective punishment by statistics."
And this insightful piece in The New York Times warns that the episode is a harbinger of the kinds of problems we may all encounter soon enough as governments, infatuated with technology that seems to offer silver-bullet solutions to intractable problems, increasingly apply complex and often opaque algorithmic models to life-altering decisions:
“There is an idea that if it has an algorithm attached to it, it’s novel and interesting and different and innovative, without understanding what those things could be doing,” said Rachel Coldicutt, a technology policy expert in London who is working on a book about responsible innovation.
A lawyer who has brought several challenges against the British government for its use of biased algorithms says: "There has been a tendency to compute first and ask questions later."
And as this trenchant analysis in M.I.T. Technology Review on the subject points out, those questions were fundamental. There was no way to have an algorithm that was both fair to individual students and matched historic statistical averages. And in trying to decide which goal to prioritize, government officials never asked such vital first questions as: what are grades for in the first place?
"If they just looked one step past their immediate problem and looked at what are the purpose of grades—to go to university, to be able to get jobs—they could have flexibly worked with universities and with workplaces," Hye Jung Han, a researcher at Human Rights Watch in the US, who focuses on children’s rights and technology, told the publication.
That is a vital lesson for anyone trying to implement A.I.-based systems. Before trying to automate something, it is important to ask what the purpose of that process is. No amount of fancy technology can overcome muddled thinking about first principles.