Real-Time Vision for Human Face Tracking
An important aspect of the overall system is the visual interface that
allows the human operator to control the robot through facial gestures
and the gaze point.
The approach we use is a three-layered system, shown in the figure below.
At the lowest level, the vision system performs bitmap correlation in
hardware.
The results are measured feature positions, which may contain tracking
errors.
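As an illustration, a minimal software sketch of this correlation step
is given below, assuming grayscale images stored as NumPy arrays; in the
actual system this search runs on dedicated correlation hardware, and
the window format (top, left, height, width) is an assumption made for
the example.

    import numpy as np

    def correlate_template(image, template, window):
        """Search `window` = (top, left, height, width) of `image` for
        the best match to `template`; returns the (row, col) of the
        match and its sum-of-absolute-differences score (lower is
        better)."""
        top, left, h, w = window
        th, tw = template.shape
        best_pos, best_score = None, float("inf")
        for r in range(top, top + h - th + 1):
            for c in range(left, left + w - tw + 1):
                patch = image[r:r + th, c:c + tw]
                score = np.abs(patch.astype(float) - template).sum()
                if score < best_score:
                    best_pos, best_score = (r, c), score
        return best_pos, best_score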
The measured positions are forwarded to the 2-D model, which takes
geometric constraints in the image plane and the correlation distortion
into account to generate estimates of the feature positions.
This layer is implemented as a network of Kalman filters.
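As a rough illustration, one node of such a network might look like the
constant-velocity Kalman filter sketched below; the state layout, motion
model, and noise values are illustrative assumptions, not the published
filter design.

    import numpy as np

    class FeatureFilter:
        def __init__(self, x, y, q=1.0, r=4.0):
            self.s = np.array([x, y, 0.0, 0.0])    # state: x, y, vx, vy
            self.P = np.eye(4) * 100.0             # state covariance
            self.F = np.eye(4)                     # constant-velocity model
            self.F[0, 2] = self.F[1, 3] = 1.0
            self.H = np.eye(2, 4)                  # we observe x, y only
            self.Q = np.eye(4) * q                 # process noise
            self.R = np.eye(2) * r                 # measurement noise

        def predict(self):
            self.s = self.F @ self.s
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.s[:2]                      # predicted position

        def update(self, z):
            y = np.asarray(z) - self.H @ self.s    # innovation
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
            self.s = self.s + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P
            return self.s[:2]                      # estimated position

One way the geometric constraints could enter such a node is as a gate:
a measurement whose innovation is implausibly large is rejected before
the update step, so a mistracked feature does not corrupt the estimate.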
The estimated feature positions determine the locations of the hardware
search windows in the next image frame.
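A hypothetical helper shows how an estimated position could place the
next search window (window format as in the correlation sketch above):

    def next_search_window(est_x, est_y, size=32):
        """Center a size x size search window on the estimated
        position; returns (top, left, height, width)."""
        half = size // 2
        return (int(est_y) - half, int(est_x) - half, size, size)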
The 2-D image positions of the features are transferred to the 3-D model
of the feature locations.
Using multiple feature triplets, the 3-D pose of the head can be
determined and used for further calculations such as gesture recognition
or gaze-point detection.
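The system's own pose computation is not reproduced here, but the sketch
below shows one standard way to recover a pose from a single triplet,
assuming 3-D positions of the measured features are available: the rigid
transform aligning the head-model triplet to the measured triplet, found
with the Kabsch/Procrustes method.

    import numpy as np

    def pose_from_triplet(model_pts, measured_pts):
        """model_pts, measured_pts: 3x3 arrays, one 3-D point per row.
        Returns rotation R and translation t such that
        measured ~= R @ model + t."""
        mc, sc = model_pts.mean(axis=0), measured_pts.mean(axis=0)
        H = (model_pts - mc).T @ (measured_pts - sc)   # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))         # reflection guard
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # proper rotation
        t = sc - R @ mc
        return R, t

Pose estimates from several triplets can then be combined, for example
by averaging, to reduce the influence of any single mistracked feature.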
The 3-D model is also projected back into the image plane to adapt the
constraints in the 2-D model.
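A minimal pinhole-projection sketch of this back-projection step is
given below; the focal length and principal point are illustrative
assumptions, not calibration values from the system.

    import numpy as np

    def project(points_3d, f=500.0, cx=320.0, cy=240.0):
        """points_3d: Nx3 array in camera coordinates (Z > 0).
        Returns an Nx2 array of image positions."""
        X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
        return np.stack([f * X / Z + cx, f * Y / Z + cy], axis=1)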
All three layers run at 30 Hz.
Feedback & Queries: Jochen Heinzmann
Date Last Modified: Thursday, 24th Oct 1997