The Audio-Video Australian English Speech Data Corpus

What is it?

AVOZES is an audio-video (or auditory-visual) speech data corpus for Australian English. The AVOZES data corpus was designed and recorded with two major goals in mind.

Firstly, a new framework for the design of comprehensive, well-structured, multiple-use AV speech data corpora was proposed and followed in the production of the AVOZES data corpus. Secondly, the first publicly available, comprehensive AV speech data corpus for Australian English (AuE) was produced. In addition, it is the first AV speech data corpus to use a stereo vision system.

What does AVOZES mean?

AVOZES stands for Audio-Video OZstralian English Speech.

What does AVOZES contain?

AVOZES has a modular structure. A modular approach, where each module contains certain sequences, allows for extensibility in terms of the various design factors that need to be addressed at the time of corpus creation. For more details on the design of AVOZES click here.

AVOZES contains six modules. These are:

  1. Recording setup without speaker
  2. Recording setup with speaker
  3. 'Calibration' sequences
  4. Short words in a carrier phrase covering the phonemes and visemes of Australian English
  5. Application-driven sequences - Digits 0-9 in a carrier phrase
  6. Application-driven sequences - Continuous speech

As module 1 is speaker-independent, it needed to be recorded only once per recording session. Modules 2-6 are speaker-dependent and thus needed to be recorded for each speaker. Speaker-dependent recordings were made once for each speaker, i.e. no repetitions were recorded. For more details on the contents of AVOZES click here (including example sequences) or read the AVOZES documentation.

AVOZES contains recordings from 20 native speakers of Australian English and 4 non-native speakers. Only the recordings of the native speakers are currently made available. The recordings of the non-native speakers might be published in the future.

Video recordings were made using a calibrated stereo camera system. Video frames are stored as DV-AVI files in the NTSC format (29.97 Hz frame rate, 720x480 pixel resolution). Audio recordings were made using a mono microphone. Audio data are stored both in the DV-AVI files and in separate WAV files as 48 kHz, 16-bit linearly encoded samples.
Example of a stereo video frame in AVOZES
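
As a minimal sketch of how the separate WAV files could be checked against the audio format described above (48 kHz, 16-bit linear samples), the following uses Python's standard wave module. The function names and the file path are hypothetical. The expectation of a single channel is an assumption based on the mono microphone, not something the AVOZES documentation is quoted on here.

```python
import wave

# Format stated on this page: 48 kHz, 16-bit linear samples.
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2
# Assumption: a mono microphone suggests one channel in the WAV file.
EXPECTED_CHANNELS = 1


def describe_wav(path):
    """Read the header of a WAV file and return its basic parameters."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def matches_documented_format(info):
    """Return True if the parameters agree with the format described above."""
    return (
        info["rate_hz"] == EXPECTED_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
        and info["channels"] == EXPECTED_CHANNELS
    )
```

A call such as matches_documented_format(describe_wav("some_sequence.wav")), where the file name is a placeholder, would then flag files that were resampled or converted after delivery.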

Why stereo video?

A stereo vision system has the advantage over monocular systems that 3D coordinates can be recovered accurately. Thus, 3D distances can be measured, not just distances in 2D image coordinates, which makes the measurements robust against rotations of the face.
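
The robustness argument can be illustrated with a small sketch. It assumes an idealised, rectified pinhole stereo pair. The focal length, baseline and the two "mouth corner" points are made-up values for illustration, not AVOZES calibration data, and the triangulation shown is the textbook disparity formula rather than the method used for AVOZES.

```python
import numpy as np

# Hypothetical rectified stereo rig (illustrative values only).
FOCAL_PX = 800.0    # focal length in pixels
BASELINE_M = 0.10   # distance between the two camera centres in metres


def project(point, cam_x):
    """Pinhole projection of a 3D point into a camera centred at (cam_x, 0, 0)."""
    x, y, z = point
    return np.array([FOCAL_PX * (x - cam_x) / z, FOCAL_PX * y / z])


def triangulate(left_px, right_px):
    """Recover a 3D point from a rectified left/right pixel pair via disparity."""
    disparity = left_px[0] - right_px[0]
    z = FOCAL_PX * BASELINE_M / disparity
    return np.array([left_px[0] * z / FOCAL_PX, left_px[1] * z / FOCAL_PX, z])


def rotate_y(point, centre, angle_rad):
    """Rotate a point about a vertical axis through centre (a head turn)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rot @ (point - centre) + centre


def measure(p, q):
    """Return (2D pixel distance in the left image, 3D distance in metres)."""
    lp, lq = project(p, 0.0), project(q, 0.0)
    rp, rq = project(p, BASELINE_M), project(q, BASELINE_M)
    dist_2d = np.linalg.norm(lp - lq)
    dist_3d = np.linalg.norm(triangulate(lp, rp) - triangulate(lq, rq))
    return dist_2d, dist_3d


# Two hypothetical mouth corners, 5 cm apart, 60 cm in front of the left camera.
centre = np.array([0.0, 0.0, 0.6])
left_corner = np.array([-0.025, 0.0, 0.6])
right_corner = np.array([0.025, 0.0, 0.6])
angle = np.radians(30.0)

frontal_2d, frontal_3d = measure(left_corner, right_corner)
rotated_2d, rotated_3d = measure(
    rotate_y(left_corner, centre, angle),
    rotate_y(right_corner, centre, angle),
)
print("2D pixel distance, frontal vs rotated:", frontal_2d, rotated_2d)
print("3D distance in metres, frontal vs rotated:", frontal_3d, rotated_3d)
```

In this idealised setting the pixel distance shrinks when the face turns, while the triangulated 3D distance stays at 5 cm. With real, noisy image measurements the 3D distance would only be approximately preserved.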

The output of the stereo cameras was multiplexed into one video signal using field multiplexing. In this technique, a device containing a video switching integrated circuit selects the signal from one video stream as the odd field of the video output, while the signal from the other video stream becomes the even field. This requires first de-interleaving the odd and even fields of the video frames from each camera. Multiplexing video signals in the analogue phase has the advantage that it can be applied to virtually any video hardware system. Images from two cameras can be stored in a single video frame, albeit at reduced vertical resolution. Stereo image processing can be performed within the computer's memory using only one image processing board.
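
On the processing side, the two camera images can be recovered from a stored frame by separating its alternating rows again. The following is a minimal sketch, assuming a frame has already been decoded into a NumPy array of 480 rows by 720 columns. Which row parity belongs to which camera is an assumption that would have to be checked against the AVOZES documentation, and the function name is hypothetical.

```python
import numpy as np


def split_fields(frame):
    """Split a field-multiplexed frame into its two fields.

    Returns (rows 0, 2, 4, ..., rows 1, 3, 5, ...). Each field has half
    the vertical resolution of the input frame.
    """
    even_rows = frame[0::2]
    odd_rows = frame[1::2]
    return even_rows, odd_rows


# Synthetic 480x720 greyscale frame: one "camera" writes the value 10,
# the other writes 200, on alternating rows.
frame = np.zeros((480, 720), dtype=np.uint8)
frame[0::2] = 10
frame[1::2] = 200

camera_a, camera_b = split_fields(frame)
print(camera_a.shape, camera_b.shape)
```

Each recovered image has 240 rows instead of 480, which is the reduced vertical resolution mentioned above. The slices are views into the original array, so no pixel data is copied.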

Can I get a copy of AVOZES?

Yes, you can! AVOZES is publicly available. Send me an email if you are interested. AVOZES comes on 3-4 DVDs, depending on whether you want the edited sequences or simply the complete recordings.

If you want a copy of AVOZES, you need to acquire a licence. There are two licences: a non-commercial (academic) licence and a commercial licence. The non-commercial licence is available for as little as AUD100 (plus postage)! This is basically just to cover my costs, as I am interested in making the data corpus as widely available as possible. Please make sure you check the wording of the licence agreement before ordering a copy of AVOZES.

© Roland Göcke
Last modified: Tue Mar 22 15:36:52 AUS Eastern Daylight Time 2005