The PROST planning system is based on Kocsis and Szepesv\'ari's \emph{UCT} algorithm, a Monte-Carlo Tree Search procedure where the action selection in each sample (called a \emph{rollout}) depends on all previous rollouts. Even though UCT is provably optimal in the limit, it converges slowly, since the initial rollouts correspond to random walks. To improve this behaviour, we use, among several other improvements to the plain UCT algorithm, a \emph{most likely} single-outcome determinization to compute a two-step lookahead heuristic and \emph{initialize} nodes in the search space with these reward estimates. Furthermore, as the time limit in the competition is quite tight for a planner that computes single decisions rather than policies ($< 1$s), PROST additionally caches all computed state-action pairs for future reuse.
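
To make the interplay of these components concrete, the following minimal sketch shows how a UCT-style planner might initialize decision-node statistics from a two-step lookahead on the most-likely determinization, select actions with the UCB1 rule, and memoize state-action estimates for reuse across decisions. All class and function names (e.g.\ \texttt{determinized\_step}, \texttt{reward\_fn}) are illustrative assumptions, not PROST's actual interface.

\begin{verbatim}
import math

class ActionStats:
    """Visit count and accumulated reward for one action in a decision node."""
    def __init__(self, heuristic_value):
        # Assumption: nodes start from the lookahead estimate instead of zero,
        # so early rollouts are biased towards promising actions.
        self.visits = 1
        self.total_reward = heuristic_value

def lookahead_value(state, action, cache, determinized_step, reward_fn, actions):
    """Two-step lookahead on the most-likely determinization, memoized per
    (state, action) pair so the result can be reused in later decisions."""
    key = (state, action)
    if key in cache:
        return cache[key]
    next_state = determinized_step(state, action)   # single most likely outcome
    value = reward_fn(state, action) + max(
        (reward_fn(next_state, a) for a in actions(next_state)), default=0.0)
    cache[key] = value
    return value

def ucb1_select(stats, exploration=math.sqrt(2.0)):
    """Pick the action maximizing average reward plus an exploration bonus."""
    parent_visits = sum(s.visits for s in stats.values())
    best_action, best_value = None, -math.inf
    for action, s in stats.items():
        bonus = exploration * math.sqrt(math.log(parent_visits) / s.visits)
        value = s.total_reward / s.visits + bonus
        if value > best_value:
            best_action, best_value = action, value
    return best_action
\end{verbatim}

In this sketch the cache plays the role of the stored state-action pairs mentioned above: values computed while deciding on one action remain available when later decisions revisit the same states.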