DeepMind is a company that develops artificial intelligence. The company was bought by Google in 2014. One of its major developments, the deep Q-network (DQN), is a neural network that learns to play video games directly from screen pixels. So far, the company has tested it on Atari games such as Space Invaders, Breakout, and more.

Go
DeepMind began research on a program called AlphaGo in 2014. Go is considered to be more complex than chess, as “the number of possible configurations of the board is more than the number of atoms in the universe” (DeepMind, 2016). For reference, Garry Kasparov, the reigning world champion in chess, was defeated by IBM's Deep Blue in 1997; at that time, the best Go programs could only beat amateurs (Wikipedia).

However, by 2015 things had changed. AlphaGo defeated Fan Hui, the European Go champion, with a score of 5-0. Some months later, the program challenged Lee Sedol, a world champion in Go for over ten years (AlphaGo, DeepMind.com), and beat him 4-1. DeepMind has since released a paper on how its deep neural networks function.

The paper on the inner workings of AlphaGo is quite long, so I recommend checking it out for yourself. Here's a link: https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

Below are a few points from the first couple of pages.

Go provides both players with perfect information, i.e. both players can see where every piece is and no information is hidden. In principle, perfect play could be found by exhaustively expanding a search tree containing roughly b^d possible sequences of moves, where b (breadth) is the number of legal moves available in any position and d (depth) is the number of moves the game lasts. For Go, b is approximately 250 and d is approximately 150. If you type 250^150 into your calculator, you'll probably get an overflow error! That's quite a lot of states to have in a board game.
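To get a feel for that number, here's a quick back-of-the-envelope check in Python (the values of b and d are just the approximations quoted above; unlike a pocket calculator, Python's integers don't overflow):

# Rough size of Go's game tree using the b^d estimate from the AlphaGo paper.
import math

b, d = 250, 150          # approximate breadth and depth for Go

sequences = b ** d       # Python handles arbitrarily large integers
print(f"b^d has {len(str(sequences))} digits")             # about 360 digits
print(f"log10(b^d) is roughly {d * math.log10(b):.1f}")    # ~359.7, i.e. ~10^360

For comparison, the number of atoms in the observable universe is usually estimated at around 10^80, which is where the quote above comes from.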
This massive number of states makes hand-coded, heuristic behavior painstakingly laborious to write, so the programmers at DeepMind used deep convolutional neural networks. Convolutional layers “use many layers of neurons, each arranged in overlapping tiles, to construct increasingly abstract, localized representations of an image” (Mastering the game of Go with deep neural networks and tree search, Silver et al., p. 1). Essentially, these networks reduce the effective depth and breadth of the search tree the program has to sift through: positions can be evaluated without playing them out to the end of the game, and only the most promising moves need to be considered.
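To make that concrete, here's a toy convolutional policy network written in PyTorch. This is my own minimal sketch, not AlphaGo's actual architecture (the real policy network is much deeper and takes many more input feature planes); it only shows how stacked convolutions turn a board position into a probability for every point on the 19x19 grid.

# Toy Go policy network: board planes in, one move probability per point out.
import torch
import torch.nn as nn

class TinyGoPolicy(nn.Module):
    def __init__(self, in_planes=3, filters=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, filters, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(filters, filters, kernel_size=3, padding=1),
            nn.ReLU(),
            # a final 1x1 convolution gives one logit per board point
            nn.Conv2d(filters, 1, kernel_size=1),
        )

    def forward(self, board):
        # board: (batch, planes, 19, 19), e.g. own stones / opponent stones / empty
        logits = self.body(board).flatten(1)     # (batch, 361)
        return torch.softmax(logits, dim=1)      # probability for each board point

policy = TinyGoPolicy()
empty_board = torch.zeros(1, 3, 19, 19)
move_probs = policy(empty_board)
print(move_probs.shape)                          # torch.Size([1, 361])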
The program uses two policy networks: a supervised learning policy network and a reinforcement learning policy network. The supervised learning policy network learns directly from records of how expert players moved their pieces. Alongside it, a much smaller, faster rollout policy is trained to sample plausible moves quickly during search. The reinforcement learning policy network then refines the supervised one by playing games against itself and learning from the outcomes. Finally, a value network is trained on those self-played games to predict which player will win from a given position.
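The ordering of those stages is easy to lose track of, so here's a heavily simplified sketch of the pipeline. Every function below is a placeholder I made up to show the flow of training; none of this is DeepMind's actual code.

# Sketch of the AlphaGo training pipeline described above (placeholders only).

def train_sl_policy(expert_games):
    """Stage 1: supervised policy network, trained to imitate expert moves."""
    return "sl_policy"

def train_fast_rollout_policy(expert_games):
    """Also trained on expert moves: a smaller, faster policy used to sample
    moves quickly during Monte Carlo rollouts."""
    return "rollout_policy"

def train_rl_policy(sl_policy):
    """Stage 2: start from the SL policy, then improve it with reinforcement
    learning on self-played games."""
    return "rl_policy"

def train_value_network(rl_policy):
    """Stage 3: learn to predict the winner of positions taken from the RL
    policy's self-play games."""
    return "value_network"

expert_games = []                        # stand-in for the expert game records
sl_policy = train_sl_policy(expert_games)
rollout_policy = train_fast_rollout_policy(expert_games)
rl_policy = train_rl_policy(sl_policy)
value_network = train_value_network(rl_policy)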
[Image: Visualization of how AlphaGo processes its optimal value function.]
The article contains much more information and a lot of things that I personally don't understand, so I encourage you to take a look at it for yourself (and add on to this article if I missed anything interesting).

StarCraft II

After defeating the world champion in Go, DeepMind decided to tackle an even tougher problem: it is currently pursuing research to create an AI that can compete in StarCraft II. StarCraft II is a real-time strategy game with three unique factions. The objective of the game is to gather resources, construct a base, train troops, and destroy the enemy's base. The game involves strategic build orders, army positioning and control, resource management, and asymmetric information - all in real time.

Asymmetric information and the real-time nature of the game make programming an AI for StarCraft II very different from AlphaGo. The game has fog of war, which means a player cannot see the enemy's troops or buildings unless one of the player's own units is nearby. To play optimally, the program may have to manage hundreds of units while also controlling where the camera is positioned, as the game does not display the entire battlefield at once. Because the game runs in real time, executing orders quickly matters; issuing commands fast is fairly trivial for a computer, but choosing the correct action for a specific situation is much more demanding. Compounded together, these requirements make the StarCraft II program a different beast from AlphaGo.

Much like with AlphaGo, the team at DeepMind intends to eventually compete with the best StarCraft II players in the world. The video below shows DeepMind's progress so far.

https://www.youtube.com/watch?v=5iZlrBqDYPM

The program will read the game's graphics as a low-resolution grid of pixels and will use different image layers, each depicting a specific kind of information to the program (DeepMind and Blizzard to release StarCraft II as an AI research environment, DeepMind.com).
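As an illustration of what those image layers mean, here's a toy example in NumPy. The layer names and resolution below are my own assumptions for the sake of the sketch, not the layers DeepMind and Blizzard will actually expose; the point is that the agent sees a stack of low-resolution 2D planes rather than a raw rendered frame.

# Toy "feature layer" observation: several 2D planes stacked into one array.
import numpy as np

H = W = 64                                         # resolution of each layer

visibility = np.zeros((H, W), dtype=np.float32)    # 1 = visible, 0 = fog of war
unit_owner = np.zeros((H, W), dtype=np.float32)    # 1 = own unit, 2 = enemy unit
unit_hp    = np.zeros((H, W), dtype=np.float32)    # hit points of the unit, if any

# Pretend a scouting unit at (10, 12) reveals a small area and spots an enemy.
visibility[5:15, 7:17] = 1.0
unit_owner[10, 12] = 1.0
unit_owner[8, 14] = 2.0
unit_hp[8, 14] = 45.0

# The observation the agent consumes: (channels, height, width), ready to feed
# into a convolutional network like the Go sketch earlier in this article.
observation = np.stack([visibility, unit_owner, unit_hp])
print(observation.shape)                           # (3, 64, 64)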
Currently, StarCraft II ships with in-game AI programmed by the game's developer, Blizzard; however, that AI is not proficient enough to compete with high-level players.

Sources

https://www.youtube.com/watch?v=SUbqykXVx0A - Video on AlphaGo

https://en.wikipedia.org/wiki/AlphaGo - Wikipedia article on AlphaGo

https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf - DeepMind paper on AlphaGo

https://www.youtube.com/watch?v=5iZlrBqDYPM - DeepMind's AI for StarCraft II

https://deepmind.com/blog/deepmind-and-blizzard-release-starcraft-ii-ai-research-environment/ - DeepMind blog post on the StarCraft II research environment