Gaze Point Detection
The gaze vector of a person conveys much information about what that
person is interested in or is referring to.
To calculate the 3-D gaze vector, which originates between the eyes and
points in the direction the person is looking, both the gaze direction
relative to the head and the 3-D pose of the head are required.
If the 3-D gaze vector can be determined, it can be intersected with a
world model to calculate the gaze point.
Since the gaze point is of significant interest in applications ranging
from VR interfaces to aviation, and from safety systems to the evaluation of
advertisements, many researchers are interested in developing such
systems.
These systems usually consist of two parts: a gyro mounted on the
headgear and cameras pointing at the eyes from a small distance.
Recently, less intrusive systems have been reported that shine a light
spot at the eyeball and compare the distance and orientation between the
reflection and the pupil.
However, reflection-based systems require heavily controlled illumination
of the environment to prevent undesired reflections in the eyes.
The system we propose in this paper is non-intrusive: it requires only a
monocular camera and is able to cope with facial motion in any
direction, including changes in depth.
It does not require extreme close-up images of the eyes, so
head motion can be compensated without an active camera as long as the
face does not leave the field of view.
The use of an active camera would only extend these capabilities.
The computation of the gaze vector consists of two stages.
The block diagram shown below illustrates the calculation
mechanism.
First, the 3-D gaze direction relative to the facial normal is
determined.
For this, the locations of the iris and the inner and outer corners of the
eyes have to be tracked, as indicated in the diagram.
The convergence of the eyes cannot be measured reliably due to noise
in the feature tracking.
An estimate of the gaze point distance is therefore not feasible, and the
gaze direction could be determined from the orientation of either eye alone.
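As an illustration of how the relative gaze direction can be recovered from the tracked features, the sketch below approximates the horizontal gaze angle of one eye from the iris position between the two eye corners. The function name, the linear mapping, and the maximum angle are our own assumptions for illustration, not the exact method used in the system:

```python
def relative_gaze_angle(iris_x, inner_x, outer_x, max_angle_deg=45.0):
    """Approximate the horizontal gaze angle (degrees) relative to the
    facial normal from the iris position between the eye corners.

    A centred iris gives 0 degrees; an iris at either corner gives
    +/- max_angle_deg. The linear mapping is a simplifying assumption.
    """
    center = 0.5 * (inner_x + outer_x)          # midpoint of the eye
    half_width = 0.5 * abs(outer_x - inner_x)   # half the eye width
    # Normalised offset in [-1, 1], clamped to tolerate tracking noise.
    offset = max(-1.0, min(1.0, (iris_x - center) / half_width))
    return offset * max_angle_deg

# Iris exactly between the corners: looking along the facial normal.
print(relative_gaze_angle(50.0, 40.0, 60.0))   # 0.0
```

In practice the per-feature tracking confidences would also be carried along, since they are needed when the two eyes are merged.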
Better robustness and lower noise levels are achieved by merging the
results of both eyes, weighted by confidence values computed from those
of the three tracked features.
The merged orientation is converted to a gaze vector whose origin can be
regarded as located between the eyes.
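A minimal sketch of such a confidence-weighted merge, assuming the per-eye orientations are represented as unit direction vectors; the weighting scheme and fallback behaviour are illustrative assumptions:

```python
import math

def merge_gaze_directions(dir_left, dir_right, conf_left, conf_right):
    """Merge two unit gaze-direction vectors, weighting each eye by its
    confidence, then renormalise the result to unit length.
    Falls back to equal weights if both confidences are zero."""
    total = conf_left + conf_right
    wl, wr = (conf_left / total, conf_right / total) if total > 0 else (0.5, 0.5)
    merged = [wl * l + wr * r for l, r in zip(dir_left, dir_right)]
    norm = math.sqrt(sum(c * c for c in merged))
    return [c / norm for c in merged]

# Example: the right eye is tracked with higher confidence, so the
# merged direction lies closer to the right eye's estimate.
left = [0.10, 0.0, 0.995]
right = [0.00, 0.0, 1.0]
print(merge_gaze_directions(left, right, 0.2, 0.8))
```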
Based on the pose estimate of the head tracker described on the
previous page, the 3-D gaze vector can be determined in
camera coordinates by a simple homogeneous coordinate transformation.
Intersecting the gaze vector with a world model then yields the gaze
point.
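These last two steps can be sketched as follows. The 4x4 head-pose matrix and the planar world model are illustrative assumptions; any world model that supports ray intersection would do:

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T (row-major nested lists)
    to a 3-D point p."""
    x, y, z = p
    return [sum(row[i] * v for i, v in enumerate((x, y, z, 1.0))) for row in T[:3]]

def transform_direction(T, d):
    """Apply only the rotational part of T to a direction vector d."""
    return [sum(T[r][c] * d[c] for c in range(3)) for r in range(3)]

def intersect_plane(origin, direction, plane_normal, plane_d):
    """Intersect the ray origin + t * direction with the plane
    n . x = d; returns the gaze point, or None if the ray is parallel."""
    denom = sum(n * c for n, c in zip(plane_normal, direction))
    if abs(denom) < 1e-9:
        return None
    t = (plane_d - sum(n * o for n, o in zip(plane_normal, origin))) / denom
    return [o + t * c for o, c in zip(origin, direction)]

# Assumed head pose: identity rotation, head 1 m in front of the camera.
T_head = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1.0], [0, 0, 0, 1]]
origin = transform_point(T_head, [0.0, 0.0, 0.0])         # between the eyes
direction = transform_direction(T_head, [0.0, 0.0, 1.0])  # looking straight ahead
# Assumed world model: a wall at z = 3 m in camera coordinates.
print(intersect_plane(origin, direction, [0.0, 0.0, 1.0], 3.0))  # [0.0, 0.0, 3.0]
```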
Feedback & Queries: Jochen Heinzmann
Date Last Modified: Thursday, 24th Oct 1997