The planner integrates two algorithms: one based on offline policy
computation and one based on online search. It chooses between them
depending on the size of the state space of the given task. For small
instances, the offline algorithm, SARSOP [1], is executed. SARSOP is
an efficient point-based algorithm that backs up from belief points
which are close to "optimally reachable" from the given initial
belief. For large instances, the online algorithm, derived from POMCP
[2], is triggered. It uses particle filters to represent beliefs and
UCT, a Monte-Carlo tree search method, to estimate the value of
candidate actions.
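The two core mechanisms of the online variant can be sketched as follows. This is an illustrative sketch, not the planner's actual implementation: the `Belief` class and `ucb_select` function are hypothetical names, and the UCB1 formula is the standard one used by UCT, Q(a) + c * sqrt(ln N / N(a)).

```python
import math
import random

class Belief:
    """Belief approximated by an unweighted set of sampled state particles,
    as in a particle filter. (Illustrative; not the planner's real code.)"""
    def __init__(self, particles):
        self.particles = list(particles)

    def sample(self):
        # Each Monte-Carlo simulation starts from a state drawn from the belief.
        return random.choice(self.particles)

def ucb_select(actions, visit_counts, value_estimates, total_visits, c=1.0):
    """Pick the action maximizing the UCB1 score Q(a) + c*sqrt(ln N / N(a)),
    the rule UCT applies at each tree node to balance exploration and
    exploitation."""
    best_action, best_score = None, -math.inf
    for a in actions:
        n = visit_counts[a]
        if n == 0:
            return a  # try every action at least once before comparing scores
        score = value_estimates[a] + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

In a full planner these pieces would be wrapped in a simulation loop: sample a state from the belief, descend the search tree using `ucb_select`, roll out to estimate a return, and back the value up the visited nodes.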
[1] H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based
POMDP planning by approximating optimally reachable belief spaces. In
Proc. Robotics: Science and Systems, 2008.
[2] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In
Proc. NIPS, 2010.