A Gestural Interface to Virtual Environments
Three-dimensional virtual environments present new challenges for human-computer interaction. Current input devices provide little more than 3D "point and click" interaction whilst tethering the user to the system by restrictive cabling or gloves. In contrast, video-tracked hand gestures provide a natural and intuitive means of interacting with the environment in an accoutrement-free manner. In this project, we are investigating the use of vision-based systems to track the hand in a gesture-based interface to 3D immersive environments, with the aim of providing a more natural, less restrictive interface for manipulating objects in 3D.
System Overview
We have developed a stereo vision-based system
for real-time tracking of the position and orientation of the user's hand
and classification of gestures. The system uses a combination of model-based and feature-based methods to acquire, track and classify the hand within the video images. Model-based template matching is used to track features of the hand in real time, while skin colour detection is used to locate the hand blob within the image on startup and whenever tracking fails. Features extracted from the hand blob are also used in classifying gestures.
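The control flow is straightforward: track while confidence is high, and fall back to skin-colour redetection when it is not. The sketch below illustrates this loop; the callable parameters and the confidence threshold are illustrative assumptions rather than the system's actual code.

```python
# A minimal sketch of the acquire / track / classify loop described above.
# The tracker, detector and classifier are passed in as callables so the
# control flow is self-contained; their internals are sketched in the
# sections that follow.

def hand_interface_loop(get_frame, track, redetect, classify, conf_threshold=0.7):
    """Per-frame loop: track while confident, fall back to redetection when lost."""
    state = None                                # tracker state (feature positions, templates, ...)
    while True:
        frame = get_frame()
        if frame is None:                       # end of video stream
            break
        if state is not None:
            state, confidence = track(frame, state)
            if confidence < conf_threshold:     # tracking lost
                state = None
        if state is None:
            state = redetect(frame)             # locate the hand blob, reinitialise features
        gesture = classify(frame, state) if state is not None else None
        yield state, gesture
```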
The gesture interface system was developed as an interface to virtual
environments and has been used to control navigation and manipulation of
3D objects. The system is used in conjunction with the joint CSIRO/ANU Virtual Environments lab. The environment consists of a Barco Baron projection
table for 3D graphics display with CrystalEyes stereo shutter glasses for
stereoscopic viewing. The environment is powered by an SGI Onyx2.
Polhemus FastTrak sensors and stylus are available for non-gestural input.
Robust & Real-Time Tracking in 3D
The tracking system tracks multiple features on the user's hand at frame rate (30 Hz). When tracking fails, the system relocates the hand in the image within 2-3 frames.
The images below show examples of the system tracking a hand. The
white squares depict the features being tracked with the size of the square
indicating the certainty of tracking for that feature. The larger the square,
the more confident the system is in the tracking result.
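As a hedged illustration of this kind of per-feature tracking and its confidence display, the sketch below uses OpenCV's normalised cross-correlation template matching and draws a square whose size grows with the match score. The search-window size and the scaling of the squares are assumptions, not the system's actual parameters.

```python
import cv2

def track_and_draw(gray, display, features, templates, search=32):
    """Track each feature template in a local window and draw a confidence square."""
    updated = []
    for (x, y), tmpl in zip(features, templates):
        th, tw = tmpl.shape
        # Search window around the previous position (assumed to stay inside the image).
        x0, y0 = max(0, x - search), max(0, y - search)
        roi = gray[y0:y0 + th + 2 * search, x0:x0 + tw + 2 * search]
        res = cv2.matchTemplate(roi, tmpl, cv2.TM_CCOEFF_NORMED)
        _, score, _, (dx, dy) = cv2.minMaxLoc(res)   # best match and its score
        nx, ny = x0 + dx, y0 + dy
        updated.append(((nx, ny), score))
        # White square whose side grows with tracking confidence, as in the images above.
        side = int(5 + 20 * max(score, 0.0))
        cv2.rectangle(display, (nx, ny), (nx + side, ny + side), (255, 255, 255), 1)
    return updated
```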
Skin Colour Detection
In order to start tracking when the user's hand enters the working volume, or to recover when tracking fails, some method of locating the hand within the images is needed. We use skin colour detection to locate skin-coloured blobs within the images, and further image processing to detect the hand. Once the hand is found, its location can be used to restart the tracking of the model.
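A minimal sketch of such a detector is given below, assuming a simple fixed HSV threshold followed by morphological clean-up and selection of the largest blob; the actual colour model, thresholds and verification steps in the system may differ.

```python
import cv2
import numpy as np

# Rough skin range in HSV; these bounds are an illustrative assumption and
# would normally be calibrated to the camera and lighting.
SKIN_LOW = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 180, 255], dtype=np.uint8)

def locate_hand(bgr, min_area=2000):
    """Return the bounding box of the largest skin-coloured blob, or None."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    # Morphological clean-up removes speckle and fills small holes.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = [c for c in contours if cv2.contourArea(c) >= min_area]
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))
```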
Classification
Classification of the hand shape is required to identify when the user
is showing different gestures, and thus wishes to perform a different action.
Classification of gesture is possible using a variety of methods, including hidden Markov models, neural networks and probabilistic models. We use a statistical model to determine which gesture (if any) in the gesture set is being displayed. Image features, including moments, are used to create a feature vector from which a classification is made.
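The sketch below illustrates one way such a moment-based statistical classifier could look: a feature vector of log-scaled Hu moments compared against per-gesture Gaussian models by Mahalanobis distance. The specific moments, the Gaussian model and the rejection threshold are assumptions for illustration, not necessarily the model used in the system.

```python
import cv2
import numpy as np

def moment_features(mask):
    """Feature vector from the hand silhouette: seven log-scaled Hu moments."""
    hu = cv2.HuMoments(cv2.moments(mask)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

class GestureClassifier:
    """One mean/covariance per gesture; classify by nearest Mahalanobis distance."""

    def __init__(self, reject_threshold=6.0):
        self.models = {}                       # gesture name -> (mean, inverse covariance)
        self.reject_threshold = reject_threshold

    def fit(self, name, feature_vectors):
        x = np.asarray(feature_vectors)
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        self.models[name] = (mean, np.linalg.inv(cov))

    def classify(self, features):
        """Return the closest gesture, or None if nothing matches well enough."""
        best, best_d = None, np.inf
        for name, (mean, inv_cov) in self.models.items():
            diff = features - mean
            d = float(np.sqrt(diff @ inv_cov @ diff))
            if d < best_d:
                best, best_d = name, d
        return best if best_d < self.reject_threshold else None
```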
Applications
Navigation Control - Terrain Flythrough
A common task in 3D interaction is user control of the viewpoint within a scene or virtual world; the user should be able to move easily through the scene. As a demonstration of using gesture to control the user's viewpoint, we constructed a terrain flythrough. The user controls their direction within the flythrough by tilting the hand, as in the image below. Forward and backward motion is controlled by the location of the user's hand in space - moving the hand forward moves the viewpoint forwards, and moving it back moves backwards.
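A minimal sketch of this mapping is given below, under assumed conventions: hand roll and pitch steer the view, and the hand's displacement from a rest position sets forward or backward speed. The gains, dead zone and axis conventions are illustrative assumptions.

```python
import numpy as np

DEAD_ZONE = 0.02      # metres of hand travel ignored around the rest position
SPEED_GAIN = 4.0      # flythrough speed per metre of hand displacement
TURN_GAIN = 1.5       # turn rate per radian of hand tilt

def flythrough_command(hand_position, hand_roll, hand_pitch, rest_z=0.0):
    """Map hand pose (position in metres, tilt in radians) to (speed, yaw_rate, pitch_rate)."""
    dz = hand_position[2] - rest_z
    if abs(dz) < DEAD_ZONE:
        dz = 0.0                     # small movements do not move the viewpoint
    speed = SPEED_GAIN * dz          # forward when the hand is pushed forward
    yaw_rate = TURN_GAIN * hand_roll
    pitch_rate = TURN_GAIN * hand_pitch
    return speed, yaw_rate, pitch_rate

# Example: hand pushed 10 cm forward and rolled slightly to the right.
print(flythrough_command(np.array([0.0, 0.0, 0.10]), hand_roll=0.2, hand_pitch=0.0))
```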
3D Object Manipulation - Blocks
Along with viewpoint control, object manipulation is a fundamental interaction
requirement in 3D virtual environments. Object manipulations include
selection, translation, rotation and scaling.
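As a hedged sketch of how these manipulations could be driven by gestures, the example below attaches a block to the hand while a "grab" gesture is held and scales it during a "pinch"; the gesture names and the simple block state are assumptions, not the Blocks demo's actual implementation.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Block:
    position: np.ndarray = field(default_factory=lambda: np.zeros(3))
    yaw: float = 0.0
    scale: float = 1.0

def manipulate(block, gesture, hand_delta_pos, hand_delta_yaw, pinch_spread=0.0):
    """Apply one frame of manipulation based on the currently classified gesture."""
    if gesture == "grab":                     # translate and rotate with the hand
        block.position = block.position + hand_delta_pos
        block.yaw += hand_delta_yaw
    elif gesture == "pinch":                  # scale with the change in finger spread
        block.scale = max(0.1, block.scale * (1.0 + pinch_spread))
    return block
```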
Multidimensional Input - Sound Space Exploration
An advantage of gesture over other 3D trackers is the ability to provide multidimensional input. While a Polhemus stylus or similar device provides position and orientation information for a single point in space (the stylus tip), a gesture interface can input many positions simultaneously, since the system tracks multiple features. To demonstrate this ability, we developed "HandSynth" - a tool for exploring multidimensional sound synthesis algorithms. In HandSynth, the position of each fingertip is tied to different parameters within the sound generator. Moving the hand about in space and changing its orientation generates a variety of different sounds, synthesised by changing up to 15 parameters at the same time. We used HandSynth to simultaneously control 5 FM synthesisers, each with 3 parameters - carrier frequency, modulation frequency and modulation depth. Exploring the sound space this way is much quicker and easier than the conventional approach of twiddling individual knobs and sliders for many tedious hours to understand the complex perceptual interactions between parameters.
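The sketch below illustrates the idea: five tracked fingertip positions give fifteen values that drive five two-operator FM voices. The mapping ranges and the simple FM formula are assumptions for illustration, not the original synthesiser.

```python
import numpy as np

SAMPLE_RATE = 44100

def fm_voice(carrier_hz, mod_hz, mod_depth, duration=0.5):
    """Basic two-operator FM: sin(2*pi*fc*t + depth*sin(2*pi*fm*t))."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    return np.sin(2 * np.pi * carrier_hz * t + mod_depth * np.sin(2 * np.pi * mod_hz * t))

def handsynth(fingertips):
    """Map five fingertip (x, y, z) positions in a unit working volume to a mix of FM voices."""
    mix = np.zeros(int(SAMPLE_RATE * 0.5))
    for x, y, z in fingertips:                 # each coordinate assumed normalised to [0, 1]
        carrier = 110.0 + 880.0 * x            # carrier frequency from left-right position
        mod = 20.0 + 400.0 * y                 # modulation frequency from height
        depth = 10.0 * z                       # modulation depth from depth in the volume
        mix += fm_voice(carrier, mod, depth)
    return mix / len(fingertips)

# Example: five fingertips at arbitrary points in the working volume.
audio = handsynth(np.random.rand(5, 3))
```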
The HandSynth was also used as an interface for non-linear navigation of a 3-minute sampled sound file. Movement from left to right acted as fast forward or rewind, movement front to back provided normal playback speed from the current position, and up-and-down movement allowed slow-motion playback forward and back. This interface allows the user to quickly hear an overview with left-right hand movement, to zoom in on detail, and to have random access into the file based on the position of the hand.
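A hedged sketch of this navigation mapping follows; the axis conventions, dead zones and playback rates are illustrative assumptions.

```python
FILE_LENGTH_S = 180.0     # the 3-minute sampled sound file described above

def playback_command(hand_x, hand_y, current_pos_s):
    """Map hand position (each axis roughly in [-1, 1]) to (position_s, rate)."""
    if abs(hand_x) > 0.2:
        # Left-right: fast forward / rewind, with random access by hand position.
        position = 0.5 * (hand_x + 1.0) * FILE_LENGTH_S
        rate = 8.0 if hand_x > 0 else -8.0
    elif abs(hand_y) > 0.2:
        # Up-down: slow motion forward or backward from the current position.
        position = current_pos_s
        rate = 0.25 if hand_y > 0 else -0.25
    else:
        # Front-back (or neutral): normal playback speed from the current position.
        position = current_pos_s
        rate = 1.0
    return position, rate
```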
The synthesis and navigation were combined to create a compositional tool in which 3 parameters of reverb and flanging effects were controlled by the spatial position of two fingertips, while the other 3 fingertips accessed samples from the sound file to be processed through the effects. Finally,
we also used the sounds from the HandSynth as input to a music visualisation
based on a flock of 'boid' artificial lifeforms that respond to different
frequencies in the sound. The motion of the boids is displayed graphically
providing a visual display of auditory information. With HandSynth
providing the audio input, the boids now respond to the user's hand movements.
Publications
R. O'Hagan, A. Zelinsky and S. Rougeaux, "Visual Gesture Interfaces to Virtual Environments", Interacting with Computers, Special Issue (invited paper), to appear, 2001.

R. O'Hagan and A. Zelinsky, "Vision-based Gesture Interfaces to Virtual Environments", Proceedings of the 1st Australasian User Interfaces Conference (AUIC2000), Canberra, Australia, pp. 73-80, January 2000.

R. O'Hagan and A. Zelinsky, "Finger Track - A Robust and Real-Time Gesture Interface", Advanced Topics in Artificial Intelligence: Proceedings of the Tenth Australian Joint Conference on Artificial Intelligence (AI'97), Perth, Australia, pp. 475-484, December 1997.