Hybrid POMDP Planner -- Combining Offline and Online Techniques

Eddy C. Borera

Our planner takes advantage of both offline and online techniques. First, a point-based technique is used offline to compute value functions for a limited number of belief states, and the resulting value functions are used to guide an online sample-based search. In addition, our technique learns values for sampled belief states over time to improve the value functions. We apply Symbolic Perseus by Poupart et al., which is based on the point-based technique of Spaan and Vlassis (Spaan and Vlassis 2005), to compute initial approximate action values. Our technique is similar to the RTDP-Bel online technique (Bonet and Geffner 1998), except that belief states are discretized differently: a k-nearest-neighbor search is used to find a suitable belief state, whose value is then updated accordingly. During a trial, values for encountered belief states are stored in hash tables and can be reused in future trials; this learning should improve the action values over time. Also, instead of computing P(z | b, a) exactly at each time step, observation sampling is used to compute approximate values, which reduces the online computation time, especially for large problems. During the computation of a Q-value, if a belief state has been visited before, its stored value is used; if it has not been seen before, it is initialized with the average value of its k closest stored belief states. For large problems, computing a policy offline, even for a small set of belief states, is infeasible. Instead, our technique learns belief state values online through trials, and the resulting best action is used for the current belief state.
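The online phase is only described at a high level above; the following Python sketch shows how the described pieces could fit together. The HybridPlanner class, the discretization by rounding, the L1 distance, and the POMDP interface it assumes (actions, sample_obs, update_belief, reward) are illustrative assumptions rather than the authors' implementation. The sketch covers only the stated flow: reuse stored values from a hash table, initialize an unseen belief from its k nearest stored neighbors, estimate Q-values with sampled observations instead of enumerating P(z | b, a), and back up the greedy value in the style of RTDP-Bel.

    class HybridPlanner:
        """Sketch of the online phase under the assumptions noted above."""

        def __init__(self, pomdp, offline_values, k=3, gamma=0.95, n_obs_samples=20):
            self.pomdp = pomdp                    # assumed interface: actions, sample_obs, update_belief, reward
            self.offline_values = offline_values  # offline point-based results as (belief, value) pairs
            self.values = {}                      # hash table: discretized belief -> learned value
            self.k = k
            self.gamma = gamma
            self.n_obs_samples = n_obs_samples

        def discretize(self, belief):
            # Round probabilities so that nearby beliefs hash to the same key
            return tuple(round(p, 2) for p in belief)

        @staticmethod
        def l1(b1, b2):
            return sum(abs(p - q) for p, q in zip(b1, b2))

        def value(self, belief):
            key = self.discretize(belief)
            if key in self.values:                # visited before: reuse the stored value
                return self.values[key]
            # Unseen belief: average the values of the k closest stored beliefs,
            # falling back to the offline point-based values if nothing is stored yet
            candidates = list(self.values.items()) or [
                (self.discretize(b), v) for b, v in self.offline_values]
            nearest = sorted(candidates, key=lambda kv: self.l1(kv[0], key))[:self.k]
            return sum(v for _, v in nearest) / len(nearest)

        def q_value(self, belief, action):
            # Approximate the expectation over observations by sampling
            # rather than computing P(z | b, a) exactly
            total = 0.0
            for _ in range(self.n_obs_samples):
                z = self.pomdp.sample_obs(belief, action)
                next_belief = self.pomdp.update_belief(belief, action, z)
                total += self.value(next_belief)
            expected_next = total / self.n_obs_samples
            return self.pomdp.reward(belief, action) + self.gamma * expected_next

        def best_action(self, belief):
            # Greedy action; store the backed-up value for reuse in future trials
            q = {a: self.q_value(belief, a) for a in self.pomdp.actions}
            best = max(q, key=q.get)
            self.values[self.discretize(belief)] = q[best]
            return best

A trial would then repeatedly call best_action on the current belief and execute the result; each call also refines the stored belief values, which is the learning effect described above.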