Adrien Couëtoux, Mario Milone, Matthias Brendel, Hassen Doghmen,
Michèle Sebag and Olivier Teytaud, ACML, 2011.
In the last decade, Monte-Carlo Tree Search (MCTS) has revolutionized the domain of
large-scale Markov Decision Process problems. MCTS most often uses the Upper Confidence
Tree algorithm to handle the exploration versus exploitation trade-off, while a few
heuristics are used to guide the exploration in large search spaces. Among these heuristics
is the Rapid Action Value Estimate (RAVE). This paper is concerned with extending the
RAVE heuristic to continuous action and state spaces. The approach is experimentally
validated on two benchmark problems: the artificial treasure hunt game and a real-world
energy management problem.
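The continuous extension is the paper's contribution; as a point of reference, here is a minimal sketch of the standard discrete RAVE blending it builds on. All names (the Node fields, the EXPLORATION and RAVE_K constants, the beta schedule) are illustrative assumptions, not taken from the paper.

    import math

    EXPLORATION = 0.4   # UCB exploration constant (assumed value)
    RAVE_K = 1000.0     # RAVE equivalence parameter (assumed value)

    def rave_beta(n, n_rave):
        # Weight of the RAVE estimate; vanishes as direct statistics accumulate.
        denom = n + n_rave + n * n_rave / RAVE_K
        return 1.0 if denom == 0 else n_rave / denom

    def select_action(node):
        # node.stats[a] = (visits, mean reward) from regular UCT updates
        # node.rave[a]  = (visits, mean reward) from all-moves-as-first updates
        total = sum(n for n, _ in node.stats.values())
        def score(a):
            n, q = node.stats[a]
            n_r, q_r = node.rave.get(a, (0, 0.0))
            beta = rave_beta(n, n_r)
            blended = (1.0 - beta) * q + beta * q_r
            return blended + EXPLORATION * math.sqrt(math.log(total + 1) / (n + 1))
        return max(node.stats, key=score)

In a continuous action space the exact-match lookup node.rave[a] is no longer meaningful; the paper's setting would instead require sharing statistics across nearby actions.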
Paper:
Adding Double Progressive Widening to Upper Confidence Tree to Cope with Uncertainty
in Planning Problems,
Hassen Doghmen and Adrien Couëtoux, EWRL, 2011. [submitted]
Current state-of-the-art methods in energy policy planning only approximate the problem
(Linear Programming on a finite sample of scenarios, Dynamic Programming on an approximation of
the problem, etc.). Monte-Carlo Tree Search (MCTS) seems to be a potential candidate
to converge to an exact solution of these problems. But how fast it converges, and how key
parameters (simple/double progressive widening) influence the rate of convergence (or even convergence itself),
remain open questions. Also, MCTS completely ignores the features of the problem, including the
scale of the objective function.
In this paper, we present MCTS and its extension to continuous/stochastic domains.
We show that on problems with continuous
action spaces and infinite support of random variables, the "vanilla" version of MCTS fails. We also
show how the success of the double progressive widening technique relies on its widening coefficient.
We also study the impact of an unknown variance of the random variables, to see whether it affects
the optimal choice of the widening coefficients.
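For concreteness, here is a minimal sketch of double progressive widening at a tree node, assuming a generative model: sample_action(state) draws a fresh (hashable) continuous action and simulate(state, action) returns a sampled next state. Both helpers, and the values of the widening constants C and ALPHA, are illustrative assumptions, not the paper's.

    import random

    C, ALPHA = 1.0, 0.5   # widening coefficient and exponent (assumed values)

    class Node:
        def __init__(self, state):
            self.state = state
            self.visits = 0
            self.children = {}   # action -> {next_state: Node}

    def may_widen(n_children, n_visits):
        # A new child is allowed while the count stays below C * visits^ALPHA.
        return n_children < C * max(n_visits, 1) ** ALPHA

    def descend(node, sample_action, simulate):
        node.visits += 1
        # First widening, on actions: needed for continuous action spaces.
        if may_widen(len(node.children), node.visits):
            node.children.setdefault(sample_action(node.state), {})
        action = random.choice(list(node.children))  # bandit selection elided
        outcomes = node.children[action]
        # Second widening, on next states: needed when random variables have
        # infinite support, otherwise no sampled state is ever visited twice.
        if may_widen(len(outcomes), node.visits):
            s = simulate(node.state, action)
            outcomes.setdefault(s, Node(s))
        return random.choice(list(outcomes.values()))

With simple progressive widening only the first test is applied; the point made in the abstract is that the second one is what keeps the tree useful when the noise has infinite support.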
Paper:
Consistency Modifications for Automatically Tuned Monte-Carlo Tree Search,
Vincent Berthier, Hassen Doghmen and Olivier Teytaud, LION, 2010.
Monte-Carlo Tree Search algorithms (MCTS), including Upper Confidence Trees (UCT), are known for their impressive ability in high-dimensional control problems. Whilst the main testbed is the game of Go, there are increasingly many applications; these algorithms are now widely accepted as strong candidates for high-dimensional control applications. Unfortunately, it is known that for optimal performance on a given problem, MCTS requires some tuning; this tuning is often handcrafted or automated, in some cases at the price of a loss of consistency, i.e., bad asymptotic behavior as the computational power grows. This highly undesirable property led to poor behavior of our main MCTS program MoGo in a real-world situation described in Section 3. This is a serious problem for our work on automatic parameter tuning and on the genetic programming of new features in MoGo. We will see in this paper: (i) a theoretical analysis of MCTS consistency; (ii) detailed examples of consistent and inconsistent known algorithms; (iii) how to modify an MCTS implementation in order to ensure consistency, independently of the modifications to the scoring module (the module which is automatically tuned and genetically programmed in MoGo). As a by-product of this work, we will see the interesting property that some heavily tuned MCTS implementations are better than UCT in the sense that they do not visit the complete tree (whereas UCT asymptotically does), whilst preserving consistency, at least if the consistency modifications above have been made.
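A hedged illustration of the kind of modification meant in point (iii): whatever the automatically tuned scoring module does, force uniform exploration at a slowly vanishing rate, so that every legal move is tried infinitely often. The schedule below, visits^(-1/2), is one illustrative choice, not the paper's; tuned_score stands for an arbitrary tuned or genetically programmed scorer.

    import random

    def consistent_select(node, tuned_score):
        # Since the sum over n of n^(-1/2) diverges, every move is explored
        # infinitely often, no matter how tuned_score ranks the moves.
        eps = 1.0 if node.visits == 0 else node.visits ** -0.5
        if random.random() < eps:
            return random.choice(node.moves)   # forced uniform exploration
        return max(node.moves, key=lambda m: tuned_score(node, m))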
Paper:
Scalability and Parallelization of Monte-Carlo Tree Search,
Amine Bourki, Guillaume Chaslot, Matthieu Coulm, Vincent Danjean, Hassen Doghmen, Thomas Herault,
Jean-Baptiste Hoock, Arpad Rimmel, Fabien Teytaud, Olivier Teytaud, Paul Vayssiere, Ziqin Yu.
Monte-Carlo Tree Search is now a well-established algorithm, in games and beyond. We analyze its scalability, and in particular its limitations and their implications for parallelization, notably for our Go program MoGo but also for our Havannah program Shakti. We obtain good efficiency for the parallel versions, both on multicore machines and on message-passing machines; but in spite of promising results in self-play, there are situations for which increasing the time per move does not help, and therefore parallelization is not the solution either. Nonetheless, for problems on which the Monte-Carlo part is less biased than in Go, parallelization should be very efficient even without shared memory.
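As an illustration of parallelization without shared memory, here is a minimal sketch of root parallelization: each worker grows an independent tree from the same position, and only root visit counts are merged. run_mcts is an assumed placeholder returning per-move visit counts, not MoGo's actual interface.

    import multiprocessing as mp
    from collections import Counter

    def worker(args):
        state, budget, seed = args
        # run_mcts is an assumed placeholder: {move: visit count} at the root.
        return Counter(run_mcts(state, budget, seed))

    def parallel_best_move(state, budget, n_workers=4):
        jobs = [(state, budget, seed) for seed in range(n_workers)]
        with mp.Pool(n_workers) as pool:
            counts = pool.map(worker, jobs)
        merged = sum(counts, Counter())     # merge root statistics only
        return merged.most_common(1)[0][0]  # move with the most total visits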
Paper:
Computational and Human Intelligence in Blind Go, H. Doghmen, F. Teytaud, O. Teytaud, S. J. Yen, C. S. Lee, M. H. Wang, IEEE CIG 2011
In this paper, we consider questions related to blindfolded play: (i) the impact of playing
blindfolded on the level of human players in 9x9 Go; (ii) the influence of visual support (a board
with no stones); (iii) which modifications are required to make a program strong in the blind variant of the
game (somewhat surprisingly, implementing a program for playing
blind Go is not equivalent to implementing a program for playing Go);
(iv) some conclusions on the rules of blind Go for making it interesting and
pedagogically efficient.