We adopted a value-based reinforcement learning approach to solve MDPs. The value function is represented with linear function approximation, where the features are logical functions of binary state dimensions. More specifically, the state dimensions constitute the initial feature set, while incremental Feature Dependency Discovery (iFDD) [Geramifard et al., 2011] expands the feature set in areas of the state space where the temporal-difference error persists. Linear, gradient-descent Sarsa(0) [see Chapter 8 of Sutton and Barto, 1998] updates the approximation parameters, and the agent selects actions using an ε-greedy policy. A minimal sketch of this pipeline follows the references.

A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. P. How. "Incremental Feature Dependency Discovery." In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
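The sketch below illustrates the described pipeline: linear gradient-descent Sarsa(0) over binary features, an ε-greedy policy, and an iFDD-style discovery step that promotes a conjunction of two co-active features once its accumulated TD error exceeds a threshold. The toy environment, the hyperparameters, the pairwise discovery rule, and the threshold `XI` are illustrative assumptions, not taken from the original text; in particular, the full iFDD algorithm uses a sparsified feature activation that this simplified version omits.

```python
# Minimal sketch (assumed setup): linear Sarsa(0) + epsilon-greedy + iFDD-style
# feature discovery over binary state dimensions. Not the authors' code.
import random
from itertools import combinations

N_DIMS = 4                    # binary state dimensions (initial features)
ACTIONS = (0, 1)              # hypothetical action set
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
XI = 1.0                      # assumed relevance threshold for adding a conjunction

features = [frozenset([i]) for i in range(N_DIMS)]   # start from unit features
weights = {a: [0.0] * len(features) for a in ACTIONS}
relevance = {}                # candidate conjunction -> accumulated |TD error|

def phi(state):
    """Indices of active features: a feature fires iff every dimension
    in its conjunction is 1 in the binary state vector."""
    return [i for i, f in enumerate(features) if all(state[d] for d in f)]

def q(state, a):
    """Linear value estimate: sum of weights over active features."""
    return sum(weights[a][i] for i in phi(state))

def act(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))

def discover(state, delta):
    """iFDD-style step: accumulate |delta| on pairwise conjunctions of
    active features; promote a pair once its total exceeds XI."""
    active = phi(state)
    for i, j in combinations(active, 2):
        cand = features[i] | features[j]
        if cand in features:
            continue
        relevance[cand] = relevance.get(cand, 0.0) + abs(delta)
        if relevance[cand] > XI:
            features.append(cand)
            for a in ACTIONS:
                weights[a].append(0.0)

def step(state, a):
    """Toy dynamics (assumed): flip a random bit; reward 1 when all bits are 1."""
    s2 = list(state)
    s2[random.randrange(N_DIMS)] ^= 1
    return tuple(s2), float(all(s2))

for episode in range(200):
    s = tuple(random.randint(0, 1) for _ in range(N_DIMS))
    a = act(s)
    for _ in range(50):
        s2, r = step(s, a)
        a2 = act(s2)
        delta = r + GAMMA * q(s2, a2) - q(s, a)      # TD error
        for i in phi(s):                             # gradient-descent update
            weights[a][i] += ALPHA * delta
        discover(s, delta)                           # expand where error persists
        s, a = s2, a2

print(f"{len(features)} features after training (started with {N_DIMS})")
```

Running the sketch shows the feature set growing beyond the initial unit features in regions where the TD error keeps accumulating, which is the behavior the paragraph above describes.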