An Era in AI Ends with DeepMind’s Conquest of Go

Image courtesy of Bijan Saniee (screenshot)

Bijan Sainee
Staff Writer

Google DeepMind’s Go-playing computer program, called AlphaGo, engaged Lee Sedol, one of the top Go players in the world, in a five-game series, winning 4-1 in a display of remarkable progress in AI.

Go is an ancient strategy board game that originated in China at least 2,500 years ago and remains a popular pastime, predominantly in East Asia. Although computer Go had been discussed for decades, programmers were more interested in solving chess until IBM Deep Blue’s controversial victory over then-World Champion Garry Kasparov in 1997.

Over the past twenty years, however, computer Go has evolved rapidly and, as AlphaGo demonstrated in the DeepMind challenge match, has finally caught up to the level of top human players. DeepMind was founded in 2010 by Demis Hassabis, an AI researcher with a deep interest in strategy board games and a talent for chess, having reached master level at the age of 13.

Like any finite game of pure strategy, Go is in principle solvable: with perfect play, the outcome, whether a first-player win, a second-player win, or a draw, is predetermined. Games such as tic-tac-toe and checkers, for instance, end in a tie when played perfectly. The problem with mastering Go, however, is its sheer complexity.

To put Go’s complexity into perspective, checkers has roughly 5 × 10^20 positions and chess an estimated 10^47, while Go has about 2 × 10^170 legal positions, with a number of possible games that is vastly larger still. No existing computer can even begin to process that volume of variations in search of a solution.
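
To get a feel for the scale, a quick back-of-the-envelope calculation helps. The sketch below generously assumes a machine that could check one quintillion (10^18) positions per second, roughly an exascale supercomputer; the conclusion survives any realistic choice of hardware:

```python
# Back-of-the-envelope: time to enumerate every legal Go position
# at a (very optimistic) rate of 10^18 positions per second.
LEGAL_POSITIONS = 2e170        # ~2 x 10^170 legal Go positions
POSITIONS_PER_SECOND = 1e18    # assumed exascale-class machine
SECONDS_PER_YEAR = 3.15e7

years = LEGAL_POSITIONS / POSITIONS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.1e} years")    # ~6.3e+144 years; the universe is ~1.4e+10 years old
```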

Go’s complexity thus rules out a “brute force” solution, the method by which game-playing computers search through millions of variations to find the best possible line. Instead, AlphaGo relies on machine learning built on convolutional neural networks, which allows it not only to learn strong moves but also to improve with each game it plays.
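
As an illustration of what a convolutional neural network over a Go board can look like, here is a toy policy network sketched in PyTorch. It is a minimal sketch under assumed dimensions (four input feature planes, two small convolutional layers), not DeepMind’s architecture, whose published networks were far deeper and used dozens of input planes:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Toy convolutional policy network for a 19x19 Go board."""
    def __init__(self, planes=4, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(planes, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),  # one score per board point
        )

    def forward(self, board):                 # board: (N, planes, 19, 19)
        return self.body(board).flatten(1)    # raw scores, shape (N, 361)

# A position goes in; a probability for each of the 361 points comes out.
net = PolicyNet()
probs = net(torch.zeros(1, 4, 19, 19)).softmax(dim=1)
print(probs.shape)  # torch.Size([1, 361])
```

Convolutions suit Go because the same local patterns, such as eyes and ladders, matter wherever they appear on the board.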

The DeepMind team initially trained AlphaGo on a database of high-quality recorded human games, about 30 million moves in all, so that the program could learn to mimic strong players’ moves. Most of AlphaGo’s improvement came in the second stage of training, in which the program played millions of games against itself and used the outcomes to refine its judgment of moves and positions.
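
A greatly simplified sketch of such a two-stage pipeline appears below. The names (`policy_net`, the data arguments) are hypothetical stand-ins for illustration, not DeepMind’s code, and the second stage uses the basic REINFORCE policy-gradient idea where the real system was considerably more elaborate:

```python
import torch.nn.functional as F

# Stage 1 -- supervised imitation: nudge the network's move scores toward
# the move a strong human actually played in each recorded position.
def supervised_step(policy_net, optimizer, boards, expert_moves):
    scores = policy_net(boards)                    # (N, 361) raw move scores
    loss = F.cross_entropy(scores, expert_moves)   # penalize missing the human move
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Stage 2 -- self-play reinforcement: replay a finished self-play game and
# reinforce the moves of whichever side won (REINFORCE, greatly simplified).
def self_play_step(policy_net, optimizer, game):
    loss = 0.0
    for board, move, reward in game:               # reward: +1 for a win, -1 for a loss
        log_probs = policy_net(board).log_softmax(dim=1)
        loss = loss - reward * log_probs[0, move]  # push winning moves up, losing ones down
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```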

With the exception of AlphaGo, top Go computers tend to hit a wall below human professional level. They lack AlphaGo’s self-learning component and instead rely on Monte Carlo tree search alone: the program samples the most promising moves, plays out possible continuations, and chooses the move that fares best across those playouts. (AlphaGo uses Monte Carlo tree search as well, but lets its neural networks guide the search toward plausible moves.)
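
The sketch below illustrates that select-expand-simulate-backpropagate loop on a deliberately tiny game (players alternately take one or two stones, and whoever takes the last stone wins), since a full Go implementation would run to thousands of lines. It is plain UCT-style search with random playouts, the pre-AlphaGo approach in miniature:

```python
import math, random

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}                     # move -> child Node
        self.untried = legal_moves(stones)
        self.visits, self.wins = 0, 0.0        # wins for the player who moved INTO this node

    def select_child(self, c=1.4):
        # UCB1: exploit good win rates, but keep revisiting rarely tried children.
        return max(self.children.values(),
                   key=lambda n: n.wins / n.visits
                                 + c * math.sqrt(math.log(self.visits) / n.visits))

def rollout_reward(node):
    # Random playout; return 1 if the player who moved into `node` wins, else 0.
    stones, to_move_wins = node.stones, False
    while stones:
        stones -= random.choice(legal_moves(stones))
        to_move_wins = not to_move_wins
    return 0 if to_move_wins else 1

def mcts(stones, iterations=5000):
    root = Node(stones)
    for _ in range(iterations):
        node = root
        while not node.untried and node.children:    # 1. selection
            node = node.select_child()
        if node.untried:                             # 2. expansion
            move = node.untried.pop()
            node.children[move] = Node(node.stones - move, parent=node)
            node = node.children[move]
        reward = rollout_reward(node)                # 3. simulation
        while node is not None:                      # 4. backpropagation
            node.visits += 1
            node.wins += reward
            reward = 1 - reward                      # credit alternates between players
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(10))  # from 10 stones the winning move is to take 1, leaving a multiple of 3
```

AlphaGo keeps this same search skeleton but replaces the random playouts and uniform move selection with evaluations from its neural networks.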

Although AlphaGo soundly defeated Fan Hui, the reigning European Go Champion, before taking on Lee Sedol, most members of the Go community felt the program would be unable to take a game from the Korean champion. While Fan Hui and Lee Sedol are both professional Go players, there is a stark difference in their playing strength, as evidenced by their respective ranks of 2P and 9P.

Go uses a traditional rank promotion system that, for professionals, begins at 1P (short for 1-dan professional) and ends at 9P (9-dan professional, the highest achievable rank).

For the match against Sedol, the DeepMind team ran a distributed version of AlphaGo on 1,920 CPUs and 280 GPUs. The machine took the first three games, producing a novelty in the second with move 37, a move top Go analysts agreed no human professional would think of making.

Despite having already lost the match, Sedol responded in the fourth game with a win, confounding AlphaGo with move 78. AlphaGo had estimated the chances of Sedol playing that move at one in ten thousand; unable to cope with the development, it immediately lost its advantage on the board. Once AlphaGo recognized its mistake, it proceeded to play moves that even a novice would judge objectively poor. Because AlphaGo went on to win game five, game four stands as the only example of the program trying to play on from a losing position.

“To be honest we are a bit stunned and speechless,” Hassabis said after the game. “AlphaGo can compute tens of thousands of positions a second, but it’s amazing that Lee Sedol is able to compete with that and push AlphaGo to the limit.”

The poor play AlphaGo demonstrated in game four is a known weakness of Go programs built on Monte Carlo tree search: once the search judges every continuation to be losing, it has little basis for preferring one bad move over another. In this sense, even AlphaGo is unable to fully emulate the human approach to Go, and room for improvement is evident.

Go is by far the most complex popular board game ever devised, and its conquest by AlphaGo can be seen as the end of a programming era in which victory at complicated board games was the benchmark of progress in AI. Hassabis says DeepMind’s goal is eventually to apply its general learning algorithms to socially beneficial problems in health care and science.