Mahmud, M.M.H., Lloyd, J.W.: Learning Deterministic-Probabilistic Models for Partially Observable Reinforcement Learning Problems. Technical Report (submitted to JMLR); contains consistency results and other proofs for the ICML paper.
Mahmud, M.M.H.: Constructing States for Reinforcement Learning. In Proceedings of the 27th International Conference on Machine Learning (2010).
Mahmud, M.M.H.: On Universal Transfer Learning. Theoretical Computer Science 410 (2009), pp. 1826-1846.
Mahmud, M.M.H., Ray, S.: Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations. In Proceedings of the 20th Neural Information Processing Systems Conference, 2007.
Mahmud, M.M.H.: On Universal Transfer Learning. In Proceedings of the 18th International Conference on Algorithmic Learning Theory, 2007. Lecture Notes in Artificial Intelligence, LNAI 4754, pp. 135-149. Springer, Berlin, 2007.
Mahmud, M.M.H., Ray, S.: Functional Similarity in Markov Environments. In Workshop on Inductive Transfer, 18th Neural Information Processing Systems Conference.
Swarup, S., Mahmud, M.M.H., Lakkaraju, K., Ray, S.: Cumulative Learning: Towards Designing Cognitive Architectures for Artificial Agents that Have a Lifetime. Technical Report, University of Illinois at Urbana-Champaign, 2005.
Mahmud, M.M.H., Ray, S.: Using Functional Similarity to Transfer Information in Markov Environments. Technical Report, University of Illinois at Urbana-Champaign, 2005.
Mahmud, M.M.H., Ray, S.: A Novel Forward Model for Markov Environments. Technical Report, University of Illinois at Urbana-Champaign, 2005.
Mahmud, M.M.H.: Universal Transfer Learning. Ph.D. Thesis, University of Illinois at Urbana-Champaign, 2008.
In addition to the material in the NIPS and ALT papers, the dissertation contains the full development of parallel transfer, competitive optimality of universal priors, Kolmogorov complexity of functions, and many more experiments.
M.M. Hassan: Explanation-Based Policy Adaptation. Master's Thesis, University of Illinois at Urbana-Champaign, 2002.
We derived a method that uses prior knowledge to adapt a policy learned in an idealized setting so that it works in the actual setting, thereby requiring fewer real examples to learn the policy. We applied the method to a simulated Air Hockey robot problem (simulated using equations derived for an actual robot), an example of a complex non-linear dynamic control problem.
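To illustrate the general idea (not the thesis' actual algorithm), here is a minimal sketch under simplified assumptions: a linear controller is first derived from an idealized dynamics model, then adapted to the real dynamics using only a handful of real transitions. All names, constants, and the gradient-descent adaptation rule are hypothetical, chosen only to make the ideal-vs-actual adaptation pattern concrete.

```python
# Hypothetical sketch: adapt a policy learned under idealized dynamics
# so it works under the real dynamics, using few real-system samples.

def next_state(x, u, a, b):
    """Linear dynamics x' = a*x + b*u."""
    return a * x + b * u

# Ideal model (prior knowledge) vs. the actual system (assumed values).
A_IDEAL, B_IDEAL = 1.0, 1.0
A_REAL, B_REAL = 1.2, 0.8

# Deadbeat gain for u = -k*x under the *ideal* model: k = a/b drives
# the next state to exactly zero if the model were correct.
k_ideal = A_IDEAL / B_IDEAL

def adapt(k, n_samples=20, lr=0.1):
    """Fine-tune the gain k on a few real transitions by gradient
    descent on the squared next state (we want x' driven to 0)."""
    states = [(-1.0) ** i * (1.0 + 0.1 * i) for i in range(n_samples)]
    for x in states:
        u = -k * x
        x_next = next_state(x, u, A_REAL, B_REAL)
        # x' = (a - b*k)*x, so d(x'^2)/dk = -2 * x' * b * x.
        grad = -2.0 * x_next * B_REAL * x
        k -= lr * grad
    return k

k_adapted = adapt(k_ideal)
# k_adapted converges toward the real deadbeat gain A_REAL / B_REAL.
```

The point of the sketch is the sample budget: the ideal-model gain is computed analytically from prior knowledge, so only a small number of real transitions are needed to correct it, rather than learning the controller from scratch on the real system.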