A New A.I. Is Running the Table Against Poker Pros. Is Business Strategy Next?
It knows when to hold ’em and when to fold ’em. And, unlike in the old Kenny Rogers ballad, it didn’t need a grizzled cowboy gambler to teach it a trick or two.
A poker bot has beaten a table full of pros at six-player, no-limit Texas Hold ’em, the version of the game used by most tournaments, over the course of 10,000 hands of play.
To master poker at this level, the A.I. learned entirely by playing millions of hands against itself, with no guidance from human card sharks. The bot, which is called Pluribus, beat players including four-time World Poker Tour champion Darren Elias and World Series of Poker Main Event champions Chris “Jesus” Ferguson and Greg Merson.
The feat, which was announced today in conjunction with the publication of a research paper on the experiment in the journal Science, marks an important milestone in the development of artificial intelligence. The research was jointly undertaken by Facebook and Carnegie Mellon University in Pittsburgh, Pennsylvania.
The development has potential implications for the business world, where, as in poker, people need to make strategic choices amid great uncertainty and where outcomes are often not zero-sum.
“Most real-world strategic interactions involve hidden information or multiple opponents or both,” Noam Brown, one of the researchers who designed Pluribus, says. Brown is a researcher at Carnegie Mellon and also holds a position at Facebook’s A.I. Research division. “This could be deployed across the board to countless scenarios,” he says.
Business negotiations are one area where the techniques behind Pluribus might be useful, Brown says. Another area, he says, is cybersecurity, where adversaries have imperfect information about one another’s capabilities and intentions. He adds, however, that neither he nor Facebook has any immediate intention of commercializing the technology.
Games have long been used as benchmarks of A.I. progress. Games test reasoning ability and simulate, in simplified form, some of the decision-making dilemmas found in the real world. Computer scientists have also favored games for another reason: they have point systems and clearly defined winners and losers. This makes them ideal environments for reinforcement learning, a technique in which software learns from experience instead of existing data. In order for such software to judge whether a particular action is likely to be beneficial, points serve as a convenient reward signal, in much the way a dog trainer doles out a treat if Fido sits on command.
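To make the idea of points as a reward signal concrete, here is a minimal sketch in Python. The three-option game, its payout odds and the function name `learn_by_reward` are invented for illustration and are not drawn from any of the systems described in this article. The learner starts out knowing nothing, tries options, and keeps a running average of the points each one earns.

```python
import random

def learn_by_reward(payout_probs, rounds=5000, explore=0.1, seed=0):
    """Toy learner: estimate the value of each action purely from points earned."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)
    values = [0.0] * len(payout_probs)  # running average reward per action
    for _ in range(rounds):
        if rng.random() < explore:
            # Occasionally try a random action to keep learning about all of them.
            action = rng.randrange(len(payout_probs))
        else:
            # Otherwise pick the action that has paid best so far.
            action = max(range(len(payout_probs)), key=lambda a: values[a])
        # The "treat": one point if the action pays off, zero otherwise.
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values

# Hypothetical game: the third option pays off most often, but the learner is not told that.
estimates = learn_by_reward([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

After a few thousand rounds the learner settles on the third option, having been told nothing except how many points each attempt earned.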
Chess was long considered the epitome of human strategic thought, a symbol of calculating rationality and intellect. It, of course, succumbed to artificial intelligence in 1997 when IBM’s Deep Blue beat grandmaster Garry Kasparov. After chess came Go. In 2016, AlphaGo, an algorithm created by DeepMind, the London-based A.I. research shop owned by Google-parent Alphabet Inc., beat Lee Sedol, one of the world’s best players at the game. With a larger board than chess, Go is a far more difficult challenge: there are more possible move combinations than there are atoms in the universe, and players select moves as much by instinct as by brute calculation. In ancient China, where the game originated, Go was considered one of the four essential arts a scholar needed to master.
Poker, meanwhile, enjoys a sleazier, less noble reputation. In poker, deception, luck and human psychology can play as large a role as pure intellect and reason. Well, guess what? Poker is a lot closer to most real-world decision making than either Go or chess. Multiplayer games also more closely mirror the complexity of many situations in life, which are not winner-take-all.
Pluribus builds on the techniques Brown and his Carnegie Mellon doctoral advisor, Tuomas Sandholm, used to create Libratus, another poker-playing A.I. that in January 2017 beat four human poker pros over the course of 120,000 hands. But that experiment involved one-on-one competition, not the more usual six-player tournament version of the game.
In such two-sided, zero-sum games, it is always possible in principle to compute an optimal strategy, known as a Nash equilibrium, that will result in the A.I. player at least breaking even on average. In non-team, multi-player games, playing a Nash equilibrium no longer carries that guarantee, and the equilibrium is often too difficult to calculate.
For this reason, Brown says six-player poker represents a harder challenge than even StarCraft II or Dota 2, two video games where A.I. agents, designed by DeepMind and A.I. research firm OpenAI respectively, have beaten human opponents over the past two years. Those games are also complex and involve imperfect information and multiple players. But the players are grouped into two teams that face off in a winner-take-all contest, meaning an algorithm can still try to find the Nash equilibrium.
StarCraft II and Dota 2 also involve tactical elements: arcade-style shoot-’em-up battles. If an A.I. can master these tactics at super-human levels, it can win without having to use particularly innovative strategies. That’s not the case with poker. “In poker, you have to address imperfect information head-on,” Brown says. There’s no way to sidestep the problem by, for instance, learning to stack your chips better than your opponent. Being able to deal with unknown information is the key to effective bluffing and betting, he says.
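The break-even guarantee of a Nash equilibrium in a two-sided game can be seen in miniature with rock-paper-scissors. The Python sketch below uses regret matching, a standard textbook technique for self-play, on that toy game. It is an illustration chosen for this example, not a description of how Libratus or Pluribus were built, and the function names and starting habits are made up. Each player begins with a lopsided habit and repeatedly shifts toward the moves it regrets not having played.

```python
# Payoff to the row player; rows and columns are rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def strategy_from_regrets(regrets):
    """Play each move in proportion to its positive regret (uniform if there is none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]

def self_play(iterations=20000):
    """Return each player's average strategy after repeated play against the other."""
    # Hypothetical starting habits: row favors rock, column favors paper.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        row, col = (strategy_from_regrets(r) for r in regrets)
        # Expected payoff of each pure move against the opponent's current mix.
        row_values = [sum(PAYOFF[a][b] * col[b] for b in range(3)) for a in range(3)]
        col_values = [sum(-PAYOFF[a][b] * row[a] for a in range(3)) for b in range(3)]
        for strat, values, regret, total in (
            (row, row_values, regrets[0], strategy_sums[0]),
            (col, col_values, regrets[1], strategy_sums[1]),
        ):
            expected = sum(s * v for s, v in zip(strat, values))
            for a in range(3):
                regret[a] += values[a] - expected  # how much better move a would have done
                total[a] += strat[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

row_average, col_average = self_play()
```

Both average strategies drift toward playing each move about a third of the time, the mix that no opponent can beat in the long run. With three or more independent players, as the article notes, that kind of guarantee no longer holds.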
Super-Human Performance, On A Laptop
In designing Pluribus, Brown and Sandholm made substantial changes from Libratus, the earlier poker-playing A.I., so that the new bot requires far less computing power to both train and deploy. Libratus had used about 15 million core hours on a supercomputer to train. Pluribus used just 12,400 core hours on a machine with 512 gigabytes of working memory, or about what a souped-up gaming laptop might have.
This is also vastly less computing power than that needed to train other A.I.s for game-playing breakthroughs: AlphaZero, the latest version of DeepMind’s Go-playing algorithm, was trained on more than 5,000 of Google’s own highly specialized computing processors. OpenAI’s Dota 2 bots required more than 128,000 cores for every hour of training, and they trained for days.
The cost of all that data-crunching power can easily reach into the hundreds of thousands or even many millions of dollars. Brown and Sandholm estimate that at current cloud computing prices, it would cost less than $150 to train Pluribus. And, once trained, the algorithm is so lightweight that Brown and Sandholm could run it on a conventional laptop with 128 GB of memory.
The secret to Pluribus’ efficiency is a simple but elegant way of strategizing. Libratus and many other game-playing A.I.s “look ahead” to see how a strategy is likely to play out through to the end of a game, but this is too computationally difficult for a six-player game, especially given that each opponent can change their own strategy in response to what every other player around the table is betting. Brown and Sandholm found that Pluribus could achieve super-human performance by simply exploring the possibilities two or three rounds into the future and assuming the other players chose one of four possible strategies each round.
This finding may also have big implications for real-world A.I. applications: it may turn out to be easier and less expensive to create algorithms capable of advising human decision-makers under conditions of uncertainty than previously assumed.
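The shortcut described above, looking only a few rounds ahead and limiting opponents to a handful of possible strategies, can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the single opponent, the bet sizes, the four opponent strategies and their payoffs are invented to make the example runnable, and the pessimistic assumption that the opponent picks whichever strategy hurts most is this example's simplification rather than a detail from the research.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical bet sizes available to us each round.
BETS = (1, 2, 3)

# Each opponent strategy maps (our chip lead, our bet) to our new chip lead.
OpponentStrategy = Callable[[int, int], int]

OPPONENT_STRATEGIES: Sequence[OpponentStrategy] = (
    lambda lead, bet: lead + 1,                              # always folds
    lambda lead, bet: lead + bet,                            # calls and loses
    lambda lead, bet: lead - bet if bet >= 3 else lead + 1,  # calls only big bets, and wins
    lambda lead, bet: lead - 2 if bet == 1 else lead + 1,    # raises over small bets
)

def lookahead(lead: int, depth: int) -> Tuple[int, Optional[int]]:
    """Return (chip lead we can guarantee, bet to make now), searching `depth` rounds ahead."""
    if depth == 0:
        # Stop searching here and score the position by the current lead.
        return lead, None
    best_value, best_bet = None, None
    for bet in BETS:
        # Assume the opponent answers with whichever of the four strategies is worst for us.
        worst = min(
            lookahead(strategy(lead, bet), depth - 1)[0]
            for strategy in OPPONENT_STRATEGIES
        )
        if best_value is None or worst > best_value:
            best_value, best_bet = worst, bet
    return best_value, best_bet

value, bet = lookahead(lead=0, depth=2)
print(value, bet)  # 2 2
```

In this toy setup the middle-sized bet is the only one that none of the four opponent strategies can punish, so the search picks it. The work grows with the number of rounds searched, which is why cutting the search off after two or three rounds keeps the cost so low.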
A New Style of Hold ’Em
The most immediate impact of Pluribus, though, is likely to be in the world of poker itself: Since the algorithm learned entirely from self-play, it can discover strategies and tactics beyond those found in poker lore.
For instance, conventional poker wisdom holds that if a player has been conservative on a betting round and merely checked, meaning the player declines to bet, or called, meaning the player matches the bets of the others, that player should not start the next betting round by raising. Yet, in its games against the human pros, Pluribus found this tactic, which is known as “donk betting,” could actually be effective. Pluribus also makes far more aggressive bets than human players tend to. And it plays a far more balanced game than most human players, varying whether to bluff or fold with a bad hand and whether to bet aggressively or conservatively when holding a good hand. That makes it difficult for opponents to gain much information about Pluribus’ hand from its betting strategy.
Brown says the human pros who played Pluribus are already planning on adopting such strategies in their own future games.
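Why does balanced play hide information? A short calculation with Bayes’ rule shows the effect. The numbers below are made up for illustration (a strong hand 30 percent of the time, and two imaginary players with different betting habits), and treating “balanced” as betting big equally often with strong and weak hands is a deliberate simplification.

```python
def posterior_strong(prior_strong, bet_if_strong, bet_if_weak):
    """Probability the hand is strong, given that the player just made a big bet."""
    strong_and_bet = prior_strong * bet_if_strong
    weak_and_bet = (1 - prior_strong) * bet_if_weak
    return strong_and_bet / (strong_and_bet + weak_and_bet)

# Hypothetical predictable player: bets big with 90% of strong hands, 10% of weak ones.
predictable = posterior_strong(0.3, bet_if_strong=0.9, bet_if_weak=0.1)

# Hypothetical balanced player: bets big half the time regardless of hand strength.
balanced = posterior_strong(0.3, bet_if_strong=0.5, bet_if_weak=0.5)

print(f"predictable player: {predictable:.2f}")  # 0.79
print(f"balanced player:    {balanced:.2f}")     # 0.30
```

A big bet from the predictable player lifts the odds of a strong hand from 30 percent to roughly 79 percent. The same bet from the balanced player leaves opponents at 30 percent, exactly where they started.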
So, while an A.I. is never going to bequeath you an ace that you can keep, like Rogers’ grizzled gambler, it might just give you something far more valuable: wisdom.