Analysis of Audio-Video Correlation in Vowels in Australian
English
Authors: Roland Göcke, J. Bruce Millar, Alexander Zelinsky, and
Jordi Robert-Ribes
Presented by Roland Göcke at the International Conference on
Auditory-Visual Speech Processing AVSP 2001,
Aalborg, Denmark, 7-9 September 2001
Abstract
This paper investigates the statistical relationship between
acoustic and visual speech features for vowels. We extract such
features from our stereo vision AV speech data corpus of Australian
English. A principal component analysis is performed to determine
which data points of the parameter curve for each feature are most
important for representing the shape of that curve. This is
followed by a canonical correlation analysis to determine which
principal components, and hence which data points of which features,
correlate most across the two modalities. Several strong
correlations are reported between acoustic and visual features. In
particular, the formants F1 and F2 were strongly correlated with
mouth height.
Knowledge about the correlation of acoustic and visual features can
be used to predict the presence of acoustic features from visual
features in order to improve the recognition rate of automatic
speech recognition systems in environments with acoustic noise.
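
The two-stage analysis described in the abstract (a principal component
analysis per feature, followed by a canonical correlation analysis across
the two modalities) can be illustrated with a minimal sketch. This is not
the authors' implementation: the data are synthetic, the function names
pca_scores and canonical_correlations are hypothetical, and the choice of
10 data points per curve and 3 retained components is an arbitrary
assumption made only for illustration.

```python
import numpy as np

def pca_scores(curves, n_components):
    """Project mean-centred curves (rows = tokens) onto leading principal axes."""
    centred = curves - curves.mean(axis=0)
    # Rows of vt are the principal axes, i.e. loadings over the curve's data points.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components]
    return centred @ loadings.T, loadings

def canonical_correlations(x, y):
    """Canonical correlations between two score matrices (rows = tokens)."""
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two column spaces, returned in descending order.
    return np.clip(np.linalg.svd(qx.T @ qy, compute_uv=False), 0.0, 1.0)

# Synthetic stand-in data (assumption): 200 vowel tokens, 10 points per curve.
rng = np.random.default_rng(0)
n_tokens, n_points = 200, 10
t = np.linspace(0.0, 1.0, n_points)
latent = rng.normal(size=(n_tokens, 1))  # shared articulatory factor

# Hypothetical acoustic curve (e.g. a formant track) and visual curve
# (e.g. mouth height), both driven by the same latent factor plus noise.
acoustic = latent * np.sin(np.pi * t) + 0.1 * rng.normal(size=(n_tokens, n_points))
visual = latent * t + 0.1 * rng.normal(size=(n_tokens, n_points))

ac_scores, ac_loadings = pca_scores(acoustic, n_components=3)
vi_scores, vi_loadings = pca_scores(visual, n_components=3)
rho = canonical_correlations(ac_scores, vi_scores)

print("canonical correlations:", np.round(rho, 3))
# Large absolute loadings mark the data points that shape each component most.
print("acoustic PC1 loadings:", np.round(ac_loadings[0], 2))
```

In this sketch the loadings of each principal component indicate which
data points of a curve contribute most to it, and the canonical
correlations indicate how strongly the retained components co-vary across
the acoustic and visual sets. With the shared latent factor built into
the synthetic data, the first canonical correlation comes out high by
construction; real corpus data would of course behave differently.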
(c) Roland Göcke
Last modified: Wed Nov 24 13:29:10 AUS Eastern Daylight Time 2004