Glutton is an anytime MDP solver based on the RTDP algorithm that solves planning scenarios expressed in PPDDL optimally if given enough resources. The PPDDL versions of the IPPC-2011 problems differ subtly but fundamentally from the benchmarks of previous competitions: the number of action outcomes in them is often comparable to the size of the problem's state space. Since RTDP and other Value Iteration-based approaches enumerate action outcomes explicitly during Bellman backups, they fail to scale under these circumstances. Glutton addresses this issue by sampling just a few outcomes of each action in a given state (typically no more than 50) and performing Bellman backups using the sampled outcomes only.

Moreover, we observe that the effects of the "noop" action, which describes the so-called \emph{natural dynamics} of the given problem, are compiled into all other actions in every IPPC-2011 problem. For extra efficiency, Glutton separates the natural dynamics effects out of all actions in a given state, samples these effects once, and then recombines the generated outcomes with samples of each action's \emph{proper} effects. Thus, Glutton avoids resampling the noop effects for each action separately. However, sampling actions' effects in this way on every visit to a state would still be too slow. Instead, for many state-action pairs Glutton \emph{caches} the sampled action outcomes. With its size adjusted automatically, the cache can devour nearly all of the available RAM, giving Glutton both its name and speed gains that can reach several orders of magnitude.

To provide good anytime performance, Glutton runs RTDP in an iterative-deepening fashion. Given a problem with a finite horizon of $H$ steps, it first solves a 1-step version of the problem, then uses the obtained solution to produce a 2-step version, and so on for as long as time remains. Once time runs out, Glutton considers the resulting policy as well as one additional candidate policy per action, consisting of repeating that action at every step until the horizon is reached. The motivation for these action-specific candidate policies is the observation that in many finite-horizon scenarios good policies tend to be cyclic, i.e., they repeat the same sequence of actions over and over again; Glutton's candidate policies are the simplest examples of this class. Glutton compares the quality of all these policies by sampling, chooses the one with the highest estimated expected reward, and submits it to the server. On a significant number of large problems, some of the primitive cyclic policies are appreciably better than the policy produced by RTDP under a tight time constraint, providing Glutton with a reasonable fall-back solution.

A final component of Glutton's tactics is a mechanism that learns which problem domains appear "hard", dynamically allocates more time to the hard problems, and schedules the instances expected to be easiest to be attempted first.
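
To make the backup machinery concrete, the following Python sketch shows one way the sampled Bellman backup, the separation of natural dynamics from proper effects, and the outcome cache described above could fit together. It is only an illustration of the technique, not Glutton's actual implementation: the MDP interface (\texttt{sample\_natural\_dynamics}, \texttt{sample\_proper\_effects}, \texttt{applicable\_actions}, \texttt{reward}), the \texttt{NUM\_SAMPLES} constant, and the dictionary-based caches are assumed names, and the steps-to-go index of the finite-horizon value function as well as the automatic cache-size adjustment are omitted for brevity.

\begin{verbatim}
NUM_SAMPLES = 50       # illustrative cap on sampled outcomes per action
nd_cache = {}          # state -> sampled natural-dynamics successors
outcome_cache = {}     # (state, action) -> sampled successor states


def natural_dynamics_samples(state, mdp):
    """Sample the noop (natural dynamics) effects once per state and reuse them."""
    if state not in nd_cache:
        nd_cache[state] = [mdp.sample_natural_dynamics(state)
                           for _ in range(NUM_SAMPLES)]
    return nd_cache[state]


def sampled_outcomes(state, action, mdp):
    """Recombine the shared natural-dynamics samples with samples of the
    action's proper effects, caching the result per state-action pair."""
    key = (state, action)
    if key not in outcome_cache:
        outcome_cache[key] = [mdp.sample_proper_effects(s, action)
                              for s in natural_dynamics_samples(state, mdp)]
    return outcome_cache[key]


def sampled_bellman_backup(state, value, mdp):
    """Approximate Bellman backup: average over the sampled outcomes instead
    of enumerating every outcome of every applicable action."""
    best = float("-inf")
    for action in mdp.applicable_actions(state):
        succs = sampled_outcomes(state, action, mdp)
        q = mdp.reward(state, action) + sum(value.get(s, 0.0) for s in succs) / len(succs)
        best = max(best, q)
    return best
\end{verbatim}

The final policy-selection step can likewise be sketched as a Monte-Carlo comparison between the RTDP policy and the primitive cyclic candidates. Again, \texttt{sample\_successor}, \texttt{actions}, and the rollout count are illustrative assumptions rather than names taken from Glutton.

\begin{verbatim}
def evaluate_by_sampling(policy, mdp, start_state, horizon, num_rollouts=30):
    """Monte-Carlo estimate of a policy's expected reward over the horizon."""
    total = 0.0
    for _ in range(num_rollouts):
        state, accrued = start_state, 0.0
        for steps_to_go in range(horizon, 0, -1):
            action = policy(state, steps_to_go)
            accrued += mdp.reward(state, action)
            state = mdp.sample_successor(state, action)
        total += accrued
    return total / num_rollouts


def choose_final_policy(rtdp_policy, mdp, start_state, horizon):
    """Pick the best of the RTDP policy and the per-action 'repeat this action
    until the horizon' candidates, judged by sampled expected reward."""
    candidates = [rtdp_policy]
    for a in mdp.actions():
        candidates.append(lambda state, steps_to_go, a=a: a)  # primitive cyclic policy
    return max(candidates,
               key=lambda pi: evaluate_by_sampling(pi, mdp, start_state, horizon))
\end{verbatim}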