A Comparative Study of 2D and 3D Lip Tracking Methods for AV ASR
Authors: Roland Göcke and Akshay Asthana
Presented by Roland Goecke at the International Conference on
Auditory-Visual Speech Processing AVSP 2008, Moreton Island,
Australia, 26-29 September 2008. AVISA.
Abstract
Over the past two decades, many algorithms have been proposed to detect
and track a human face and its facial features. Of particular interest to
the Automatic Speech Recognition (ASR) community are algorithms that can
track the shape of the lips, as such visual speech input can then be used
in an auditory-visual (AV) ASR system to improve the recognition accuracy
of traditional audio-only ASR systems, particularly in the presence of
acoustic noise. Despite the large number of face and lip tracking
algorithms that have been proposed over the years, there is a lack of a
comparative study that evaluates such algorithms in the context of AV ASR
performance. In this paper, the performance of various 2D and 3D lip
tracking algorithms is compared from the point of view of AV ASR. In
particular, the focus of this study is on algorithms that use explicit lip
models. A number of variants of the recently popular Active Appearance
Models (AAMs) are compared with a 3D lip tracking algorithm that uses
stereo vision. All performance evaluations are made using the AVOZES data
corpus.
Bibtex Entry
@INPROCEEDINGS{goecke_asthana2008,
AUTHOR = {R. Goecke and A. Asthana},
TITLE = {{A Comparative Study of 2D and 3D Lip Tracking Methods for AV ASR}},
BOOKTITLE = {{Proceedings of the International Conference on
Auditory-Visual Speech Processing AVSP 2008}},
PUBLISHER = {AVISA},
ADDRESS = {Moreton Island, Australia},
PAGES = {235--240},
MONTH = sep,
YEAR = 2008}
(c) Roland Göcke
Last modified: Sat May 30 23:25:42 AUS Eastern Standard Time 2009