Gaze Point Detection
The gaze vector of a person conveys much information about what that
person is interested in or is referring to.
To calculate the 3-D gaze vector, which originates between the eyes and
points in the direction the person is looking, both the gaze direction
relative to the head and the 3-D pose of the head are required.
If the 3-D gaze vector can be determined, it can be intersected with a
world model to calculate the gaze point.
Since the gaze point is of significant interest in applications ranging
from VR interfaces to aviation, and from safety systems to the evaluation of
advertisements, many researchers are interested in developing such
systems.
These systems usually consist of two parts: a gyro mounted on the
headgear and cameras pointing at the eyes from a small distance.
Recently, less intrusive systems have been reported that shine a light
spot at the eyeball and compare the distance and orientation between the
reflection and the pupil.
However, reflection-based systems require heavily controlled illumination
of the environment to prevent undesired reflections in the eyes.
The system we propose in this paper is non-intrusive: it requires only a
monocular camera and is able to cope with facial motion in any
direction, including changes in depth.
It does not require extreme close-up images of the eyes, so
head motion can be compensated without an active camera as long as the
face does not leave the field of view.
The use of an active camera would only extend these capabilities.
The computation of the gaze vector consists of two stages.
The block diagram shown below illustrates the calculation
mechanism.
First, the 3-D gaze direction relative to the facial normal is
determined.
For this, the locations of the iris and the inner and outer corners of the
eyes have to be tracked, as indicated in the diagram.
The convergence of the eyes cannot be measured reliably due to noise
in the feature tracking.
An estimate of the gaze point distance is therefore not feasible, and the
gaze direction could be determined from the orientation of either eye alone.
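As an illustration of how the relative gaze direction can be recovered from the tracked features, the sketch below approximates the horizontal gaze angle of one eye from the iris position between the two eye corners. The function name, the linear mapping, and the maximum angle are our own assumptions for illustration, not the exact method used in the system:

```python
def relative_gaze_angle(iris_x, inner_x, outer_x, max_angle_deg=45.0):
    """Approximate the horizontal gaze angle (degrees) relative to the
    facial normal from the iris position between the eye corners.

    A centred iris gives 0 degrees; an iris at either corner gives
    +/- max_angle_deg. The linear mapping is a simplifying assumption.
    """
    center = 0.5 * (inner_x + outer_x)          # midpoint of the eye
    half_width = 0.5 * abs(outer_x - inner_x)   # half the eye width
    # Normalised offset in [-1, 1], clamped to tolerate tracking noise.
    offset = max(-1.0, min(1.0, (iris_x - center) / half_width))
    return offset * max_angle_deg

# Iris exactly between the corners: looking along the facial normal.
print(relative_gaze_angle(50.0, 40.0, 60.0))   # 0.0
```

In practice the per-feature tracking confidences would also be carried along, since they are needed when the two eyes are merged.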
Better robustness and lower noise levels are achieved by merging the
results of both eyes, weighted by confidence values computed from those
of the three tracked features.
The merged orientation is converted to a gaze vector whose origin can be
regarded as located between the eyes.
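A minimal sketch of such a confidence-weighted merge, assuming the per-eye orientations are represented as unit direction vectors; the weighting scheme and fallback behaviour are illustrative assumptions:

```python
import math

def merge_gaze_directions(dir_left, dir_right, conf_left, conf_right):
    """Merge two unit gaze-direction vectors, weighting each eye by its
    confidence, then renormalise the result to unit length.
    Falls back to equal weights if both confidences are zero."""
    total = conf_left + conf_right
    wl, wr = (conf_left / total, conf_right / total) if total > 0 else (0.5, 0.5)
    merged = [wl * l + wr * r for l, r in zip(dir_left, dir_right)]
    norm = math.sqrt(sum(c * c for c in merged))
    return [c / norm for c in merged]

# Example: the right eye is tracked with higher confidence, so the
# merged direction lies closer to the right eye's estimate.
left = [0.10, 0.0, 0.995]
right = [0.00, 0.0, 1.0]
print(merge_gaze_directions(left, right, 0.2, 0.8))
```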
Based on the pose estimate of the head tracker described on the
previous page, the 3-D gaze vector can be determined in
camera coordinates by a simple homogeneous coordinate transformation.
Intersecting the gaze vector with a world model then yields the gaze
point.
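These last two steps can be sketched as follows. The 4x4 head-pose matrix and the planar world model are illustrative assumptions; any world model that supports ray intersection would do:

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T (row-major nested lists)
    to a 3-D point p."""
    x, y, z = p
    return [sum(row[i] * v for i, v in enumerate((x, y, z, 1.0))) for row in T[:3]]

def transform_direction(T, d):
    """Apply only the rotational part of T to a direction vector d."""
    return [sum(T[r][c] * d[c] for c in range(3)) for r in range(3)]

def intersect_plane(origin, direction, plane_normal, plane_d):
    """Intersect the ray origin + t * direction with the plane
    n . x = d; returns the gaze point, or None if the ray is parallel."""
    denom = sum(n * c for n, c in zip(plane_normal, direction))
    if abs(denom) < 1e-9:
        return None
    t = (plane_d - sum(n * o for n, o in zip(plane_normal, origin))) / denom
    return [o + t * c for o, c in zip(origin, direction)]

# Assumed head pose: identity rotation, head 1 m in front of the camera.
T_head = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1.0], [0, 0, 0, 1]]
origin = transform_point(T_head, [0.0, 0.0, 0.0])         # between the eyes
direction = transform_direction(T_head, [0.0, 0.0, 1.0])  # looking straight ahead
# Assumed world model: a wall at z = 3 m in camera coordinates.
print(intersect_plane(origin, direction, [0.0, 0.0, 1.0], 3.0))  # [0.0, 0.0, 3.0]
```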
Feedback & Queries: Jochen Heinzmann
Date Last Modified: Thursday, 24th Oct 1997