Head Pose Estimation
Estimating the pose of the head is a basic function of
visual face trackers and has been investigated by many researchers.
In most systems, pose estimation is done in the layer directly above the
feature tracking layer.
In our system, a 2-D layer sits
between hardware feature tracking and 3-D pose estimation.
As stated in the previous section, the purpose of this layer is to increase
robustness by detecting and recovering from tracking errors.
Two different transformations may be used for pose estimation from
monocular data: perspective or affine transformation.
The perspective transformation precisely models the actual projection of a 3-D
scene onto the image plane.
However, the required calculations are complex and time-consuming, and can yield
up to a fourfold ambiguity in the pose estimate.
Perspective transformations are used only in systems running off-line, in
combination with computationally expensive least-squares model fitting.
Real-time systems usually use the affine transformation because its calculations
are simpler and it has only a twofold ambiguity.
This so-called weak perspective is a parallel projection without depth
foreshortening.
It is a good approximation of perspective projection as long as the
depth of the object does not exceed one tenth of the distance
between camera and object, which is usually the case in face tracking
applications.
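The one-tenth rule can be checked numerically. The following is a minimal sketch, not the system's implementation: it compares a pinhole perspective projection with a weak perspective projection that uses one shared reference depth. The function names, the unit focal length, and the feature coordinates are all assumptions made for illustration.

```python
import math


def perspective_project(point, focal_length=1.0):
    """Full perspective: each point is divided by its own depth z."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def weak_perspective_project(point, reference_depth, focal_length=1.0):
    """Weak perspective: every point is scaled by one shared reference depth."""
    x, y, _ = point
    scale = focal_length / reference_depth
    return (scale * x, scale * y)


def max_relative_error(points, reference_depth):
    """Largest image-position error of weak perspective, relative to perspective."""
    worst = 0.0
    for p in points:
        px, py = perspective_project(p)
        wx, wy = weak_perspective_project(p, reference_depth)
        norm = math.hypot(px, py)
        if norm > 0.0:
            worst = max(worst, math.hypot(wx - px, wy - py) / norm)
    return worst


# Hypothetical feature points (x, y, z) in millimetres. The object is
# 100 mm deep and centred 1000 mm from the camera, so its depth is
# exactly one tenth of the camera distance.
camera_distance = 1000.0
points = [(50.0, 30.0, 950.0), (-40.0, 20.0, 1000.0), (60.0, -35.0, 1050.0)]
error = max_relative_error(points, camera_distance)
print(f"max relative error: {error:.3f}")
```

With these assumed values the worst-case image position error is about 5 percent, because a point at depth z is mis-scaled by the factor z divided by the reference depth.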
Most pose estimators are based on the well-known three-point model
fitting algorithms proposed by Huttenlocher and Ullman or
by Grimson, Huttenlocher and Alter.
The existing ways of obtaining a unique pose each have a drawback:
least-squares fitting algorithms are too slow; selecting the
three best tracking points is unreliable in near-frontal views;
and simple heuristics, such as checking which side of the image
the centre of the feature set lies in, restrict head
translation to a minimum.
We chose instead to use the interference of the angle sets of multiple
triplets, that is, the angles common to all sets, to resolve the
twofold ambiguity of the pose.
The figure above shows a typical tracking
situation where the head is rotated slightly to the side, in this case
about 15 degrees.
The two triplets 1 and 2 are located in two different planes which
intersect at an angle of 45 degrees.
The local results of the inverse affine projection algorithm are the
angles between each triplet's plane and the optical axis of the camera.
The corresponding head rotation angles are then easily calculated.
Interference shows that only one head rotation angle satisfies the
solutions for both triplets.
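The interference step can be illustrated with a minimal sketch, restricted to rotation about a single axis. It is not the system's implementation. It assumes that each triplet yields only the magnitude of its plane's tilt, with the sign ambiguous, and that the angle of each triplet plane relative to the head's frontal plane is known. The function names, the tolerance, and the offsets of -22.5 and +22.5 degrees (chosen so the planes intersect at 45 degrees) are hypothetical.

```python
def head_angle_candidates(plane_tilt_deg, plane_offset_deg):
    """Return both head rotations consistent with one triplet.

    plane_tilt_deg:   recovered tilt magnitude; its sign is ambiguous.
    plane_offset_deg: assumed known angle of the triplet plane relative
                      to the head's frontal plane.
    """
    return [sign * plane_tilt_deg - plane_offset_deg for sign in (+1, -1)]


def resolve_by_interference(candidate_sets, tolerance_deg=2.0):
    """Keep the head angles that every triplet supports within a tolerance."""
    first, *rest = candidate_sets
    agreed = []
    for angle in first:
        matches = [angle]
        for other in rest:
            close = [a for a in other if abs(a - angle) <= tolerance_deg]
            if not close:
                break  # this candidate is contradicted by another triplet
            matches.append(min(close, key=lambda a: abs(a - angle)))
        else:
            # Every triplet agreed: report the mean of the matching angles.
            agreed.append(sum(matches) / len(matches))
    return agreed


# Hypothetical measurements for a head rotated by 15 degrees.
triplet_1 = head_angle_candidates(plane_tilt_deg=7.5, plane_offset_deg=-22.5)
triplet_2 = head_angle_candidates(plane_tilt_deg=37.5, plane_offset_deg=22.5)
resolved = resolve_by_interference([triplet_1, triplet_2])
print(triplet_1, triplet_2, resolved)
```

Under these assumptions the first triplet allows 30 or 15 degrees and the second allows 15 or -60 degrees, so only 15 degrees survives. The tolerance stands in for measurement noise; a real tracker would need a value matched to its tracking accuracy.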
The following diagram shows the result from a pose estimation test sequence.
Feedback & Queries: Jochen Heinzmann
Date Last Modified: Thursday, 24th Oct 1997