Monte Carlo Tree Search:
Insights and Applications
BCS Real AI Event
Simon Lucas
Game Intelligence Group
University of Essex
• General machine intelligence: the ingredients
• Monte Carlo Tree Search
– A quick overview and tutorial
• Example application: Mapello
– Note: Game AI is Real AI !!!
• Example test problem: Physical TSP
• Results of open competitions
• Challenges and future directions
General Machine Intelligence:
the ingredients
• Evolution
• Reinforcement Learning
• Function approximation
– Neural nets, N-Tuples etc
• Selective search / Sample
based planning / Monte
Carlo Tree Search
Conventional Game Tree Search
• Minimax with alpha-beta
pruning, transposition tables
• Works well when:
– A good heuristic value function is available
– The branching factor is modest
• E.g. Chess: Deep Blue, Rybka
– Super-human on a smartphone!
• Tree grows exponentially with
search depth
• Much tougher when:
• High branching factor
• No good heuristic value function
• MCTS to the rescue!
“Although progress has been
steady, it will take many decades
of research and development
before world-championship–
calibre go programs exist”.
Jonathan Schaeffer, 2001
Monte Carlo Tree Search (MCTS)
Upper Confidence Bounds for Trees (UCT)
Further reading:
Attractive Features
• Anytime
• Scalable
– Tackle complex games and planning problems better
than before
– Performance may improve logarithmically with increased CPU time
• No need for heuristic function
– Though usually better with one
• Next we’ll look at:
– General MCTS
– UCT in particular
MCTS: the main idea
• Tree policy: choose which node to expand (not necessarily a leaf of the tree)
• Default (simulation) policy: random playout until end of game
MCTS Algorithm
• Decompose into 6 parts:
• MCTS main algorithm
– Tree policy
• Expand
• Best Child (UCT Formula)
– Default Policy
– Back-propagate
• We’ll run through these then show demos
MCTS Main Algorithm
BestChild simply picks the best child node of the root according to some
criterion: e.g. best mean value
In our pseudo-code BestChild is called from TreePolicy and from MctsSearch, but
different versions can be used
– E.g. final selection can be the max value child or the most frequently visited one
• Note that node selected for expansion does not
need to be a leaf of the tree
• But it must have at least one untried action
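The pseudo-code itself did not survive extraction; as a rough illustration of the MctsSearch / TreePolicy / Expand / BestChild decomposition described above, here is a minimal runnable sketch in Python, applied to the toy game of Nim. The game class, the node fields, and all names are illustrative, not the talk's actual code:

```python
import math
import random

class Nim:
    """Toy game: take 1-3 counters; whoever takes the last counter wins."""
    def __init__(self, counters):
        self.counters = counters
    def actions(self):
        return [a for a in (1, 2, 3) if a <= self.counters]
    def apply(self, a):
        return Nim(self.counters - a)
    def is_terminal(self):
        return self.counters == 0
    def reward(self):
        return 1.0  # from the perspective of the player who just moved

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.untried = list(state.actions())
        self.visits, self.total = 0, 0.0

def best_child(node, c):
    # Standard UCT: mean value plus exploration bonus
    return max(node.children,
               key=lambda ch: ch.total / ch.visits
               + c * math.sqrt(2 * math.log(node.visits) / ch.visits))

def tree_policy(node, c, rng):
    # Descend until a node with an untried action (need not be a leaf)
    while not node.state.is_terminal():
        if node.untried:
            # Expand: attach one new child for a previously untried action
            a = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.state.apply(a), parent=node, action=a)
            node.children.append(child)
            return child
        node = best_child(node, c)
    return node

def default_policy(state, rng):
    # Uniform random playout; sign tracks whose perspective the reward is from
    sign = 1.0
    while not state.is_terminal():
        state = state.apply(rng.choice(state.actions()))
        sign = -sign
    return sign * state.reward()

def backup(node, reward):
    # Negate at each level (negamax convention for two-player zero-sum games)
    while node is not None:
        node.visits += 1
        node.total += reward
        reward = -reward
        node = node.parent

def mcts_search(root_state, iterations=5000, c=1.0, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        v = tree_policy(root, c, rng)
        backup(v, default_policy(v.state, rng))
    # Final selection: the most-visited child of the root
    return max(root.children, key=lambda ch: ch.visits).action
```

From a pile of 5, taking 1 counter leaves the opponent facing a multiple of 4, the losing position in this Nim variant, so the search should converge on that move.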
Best Child (UCT)
• This is the standard UCT equation
– Used in the tree
• Higher values of c lead to more exploration
• Other terms can be added, and usually are
– More on this later
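The equation itself was lost in extraction; the standard UCT rule selects the child v' of v maximising Q(v')/N(v') + c * sqrt(2 ln N(v) / N(v')). A minimal sketch, with c = 1/sqrt(2), a common default when rewards lie in [0, 1]:

```python
import math

def uct_value(q, n, parent_n, c=1 / math.sqrt(2)):
    """Standard UCT score for a child with total reward q and n visits,
    whose parent has parent_n visits. Larger c means more exploration."""
    return q / n + c * math.sqrt(2 * math.log(parent_n) / n)
```

With c = 0 only the mean value counts; with c > 0 a less-visited child receives a larger exploration bonus, which is what drives the exploration-exploitation balance.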
• Each time a new node is added to the tree, the
default policy randomly rolls out from the current
state until a terminal state of the game is reached
• The standard is to do this uniformly randomly
– But better performance may be obtained by biasing
with knowledge
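A uniform random default policy can be sketched as below; the toy game class and state methods are assumptions for illustration, not part of the talk:

```python
import random

def default_policy(state, rng):
    """Roll out uniformly at random from state until a terminal state
    is reached, then return that state's reward."""
    while not state.is_terminal():
        state = state.apply(rng.choice(state.actions()))
    return state.reward()

class Countdown:
    """Tiny illustrative game: subtract 1 or 2 until the count reaches 0."""
    def __init__(self, n):
        self.n = n
    def actions(self):
        return [1, 2] if self.n >= 2 else [1]
    def apply(self, a):
        return Countdown(self.n - a)
    def is_terminal(self):
        return self.n == 0
    def reward(self):
        return 1.0
```

Biasing the rollout with domain knowledge amounts to replacing `rng.choice` with a weighted pick over the legal actions.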
• Note that v is the new node added to the tree by the tree policy
• Back up the values from the added node up the tree to the root
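A sketch of this back-propagation step, negating the reward at each level as is usual in two-player zero-sum games; the node fields are assumptions, not the talk's actual code:

```python
class Node:
    """Minimal node holding just the statistics back-propagation touches."""
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total = 0.0

def backup(v, reward):
    """Walk from the newly added node v up to the root, updating the visit
    count and total reward; the sign flips at each level because the
    players alternate (negamax convention)."""
    while v is not None:
        v.visits += 1
        v.total += reward
        reward = -reward
        v = v.parent
```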
MCTS Builds Asymmetric Trees (demo)
All Moves As First (AMAF),
Rapid Value Action Estimates (RAVE)
• Additional term in UCT equation:
– Treat actions / moves the same independently of
where they occur in the move sequence
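One common way to add this term (following Gelly and Silver's RAVE work) is to blend the AMAF statistics with the ordinary UCT mean using a weight beta that decays as the node accumulates real visits; the exact schedule below is one published choice, not necessarily the one used in the talk:

```python
import math

def rave_value(q, n, q_amaf, n_amaf, parent_n, c=1 / math.sqrt(2), k=1000):
    """UCT score with a RAVE term: beta -> 1 for rarely visited nodes
    (trust the AMAF estimate), beta -> 0 as real visits accumulate
    (trust the node's own statistics). k is the equivalence parameter."""
    beta = math.sqrt(k / (3 * n + k))
    blended = (1 - beta) * (q / n) + beta * (q_amaf / max(n_amaf, 1))
    return blended + c * math.sqrt(2 * math.log(parent_n) / n)
```

A rarely visited node with a strong AMAF mean is scored close to that mean, while a heavily visited node is scored almost entirely by its own statistics.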
Using MCTS for a new problem:
Implement the State interface
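The talk's framework is in Java; a Python analogue of what such a State interface typically requires is sketched below (the method names are illustrative, not the actual framework API):

```python
from abc import ABC, abstractmethod

class State(ABC):
    """What a generic MCTS engine needs from a game state (illustrative)."""

    @abstractmethod
    def actions(self):
        """Legal actions available in this state."""

    @abstractmethod
    def apply(self, action):
        """Return the successor state after playing action."""

    @abstractmethod
    def is_terminal(self):
        """True when the game is over."""

    @abstractmethod
    def reward(self):
        """Outcome of a finished game, e.g. from the last mover's view."""

# Example: hooking up a trivial one-move game to the interface
class OneMove(State):
    def __init__(self, done=False):
        self.done = done
    def actions(self):
        return [] if self.done else ["play"]
    def apply(self, action):
        return OneMove(done=True)
    def is_terminal(self):
        return self.done
    def reward(self):
        return 1.0
```

Once a game implements these four methods, the generic search code never needs to know anything else about its rules.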
Example Application: Mapello
• Each move you must
Pincer one or more
opponent counters
between the one you
place and an existing
one of your colour
• Pincered counters are
flipped to your own
• Winner is player with
most pieces at the end
Basics of Good Game Design
• Simple
• Balance
• Sense of
• Outcome
not be
Othello Example – white leads: -58
Black wins with score of 16
• Take the counter-flipping drama of Othello
• Apply it to novel situations
– Obstacles
– Power-ups (e.g. triple square score)
– Large maps with power-plays e.g. line fill
• Novel games
– Allow users to design maps that they are expert in
– The map design is part of the game
• Research bonus: large set of games to experiment with
Example Initial Maps
Or how about this?
Need Rapidly Smart AI
• Give players a challenging game
– Even when the game map can be new each time
• Obvious easy to apply approaches
– TD Learning
– Monte Carlo Tree Search (MCTS)
– Combinations of these …
• E.g. Silver et al, ICML 2008
• Robles et al, CIG 2011
MCTS (see Browne et al, TCIAIG 2012)
• Simple
• Anytime
• No need for
a heuristic
• Exploration-exploitation balance
• Works well across a range of problems
• TDL learns reasonable weights
• How well will this play at 1-ply versus limited roll-out?
For Strong Play …
• Combine MCTS, TDL, N-Tuples
Where to play / buy
• Coming to Android (November 2012)
• Nestorgames
MCTS in Real-Time Games: PTSP
• Hard to get long-term
planning without good
Optimal TSP order != PTSP Order
MCTS: Challenges and
Future Directions
• Better handling of problems with continuous
action spaces
– Some work already done on this
• Better understanding of handling real-time games
– Use of approximations and macro-actions
• Stochastic and partially observable problems /
games of incomplete and imperfect information
• Hybridisation:
– with evolution
– with other tree search algorithms
• MCTS: a major new approach to AI
• Works well across a range of problems
– Good performance even with vanilla UCT
– Best performance requires tuning and heuristics
– Sometimes the UCT formula is modified or discarded
• Can be used in conjunction with RL
– Self tuning
• And with evolution
– E.g. evolving macro-actions
Further reading and links
