Stereo Vision Lip-Tracking for Audio-Video Speech Processing
Authors: Roland Göcke, J. Bruce Millar, Alexander Zelinsky, and Jordi Robert-Ribes
Presented by Roland Göcke at the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), Salt Lake City, USA, 7-11 May 2001
This paper was presented as a poster in the student forum, for which only abstracts, not full papers, were published in the conference proceedings.
Abstract
We present the first results from applying a recently proposed algorithm for the robust and reliable automatic extraction of lip feature points to an audio-video speech data corpus. This corpus comprises 10 native speakers uttering sequences that cover the range of phonemes and visemes in Australian English. The lip-tracking algorithm is based on stereo vision, which has the advantage that measurements are made in real-world (3D) coordinates rather than image (2D) coordinates. Certain lip feature points on the inner lip contour, such as the lip corners and the mid-points of the upper and lower lip, are tracked automatically, and parameters describing the shape of the mouth are derived from these points. The results obtained so far show a correlation between the width and height of the mouth opening, as well as between the protrusion parameters of the upper and lower lips.
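To illustrate the kind of computation involved, the sketch below derives width and height parameters from four tracked 3D feature points and correlates them across frames. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the use of Euclidean distances between lip corners and lip mid-points is an assumed parameterisation, not necessarily the paper's exact one.

```python
import numpy as np

def mouth_parameters(left_corner, right_corner, upper_mid, lower_mid):
    """Derive mouth-shape parameters from four 3D lip feature points.

    Inputs are (x, y, z) coordinates in real-world units, as a
    stereo-vision lip tracker would provide. Hypothetical helper,
    not taken from the paper.
    """
    left, right = np.asarray(left_corner), np.asarray(right_corner)
    upper, lower = np.asarray(upper_mid), np.asarray(lower_mid)
    width = np.linalg.norm(right - left)    # distance between lip corners
    height = np.linalg.norm(upper - lower)  # distance between lip mid-points
    return width, height

def width_height_correlation(widths, heights):
    """Pearson correlation between width and height over a sequence of frames."""
    return np.corrcoef(widths, heights)[0, 1]
```

A value of `width_height_correlation` near 1 over an utterance would correspond to the correlated width/height behaviour reported in the abstract.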
(c) Roland Göcke
Last modified: 25/9/01