Facial Motion Gesture Recognition
Robust real-time face tracking gives rise to the possibility of
recognising gestures based on motions of the head.
Gesture recognition and face tracking are implemented as
independent processes.
Both processes run at the NTSC video frame rate (30 Hz).
The design of the algorithms for gesture recognition is determined
by hard real-time constraints, since a dozen gestures must be
compared against the data stream from the face tracking module.
Gesture recognition must also be flexible with respect to how a
gesture is performed, including its timing and parameter amplitudes.
The system should assign confidence values to the gestures that it
recognises.
This allows higher-level processes to use the gestures as input and
to interpret them appropriately.
To avoid computationally expensive time warping, we developed a
recursive method that takes only the current motion vector of the
face into account.
A set of finite state machines implicitly stores the previous
tracking information.
Gesture recognition is implemented as a two-layer system based on
the decomposition of gestures into atomic actions.
The lower layer recognises basic motion and state primitives,
called atomic actions.
The output of an atomic action is its activation, a measure of the
similarity between the recent motion vectors and the atomic action
definition.
The system incorporates 22 predefined atomic actions for which the
activation is calculated in each frame cycle.
The upper layer is concerned with the recognition of patterns in the
activation of the set of atomic actions.
A gesture is defined by a sequence of atomic actions and time
constraints for the occurrence of each of them.
Each time the first atomic action of a gesture definition is
activated, an instance of a finite state machine is generated
dynamically.
This instance then observes the activation of the next atomic action
in the gesture definition and makes the transition to the next state
if the activation occurs within a given time frame.
If no activation occurs within the time frame, the finite state
machine instance is deleted from the system.
When the state machine reaches its final state, the atomic action
sequence is completely recognised and the gesture is sent to the
output together with a confidence measure of how well the observed
pattern matched the gesture definition.
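The instance lifecycle described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the class and method names, the representation of a gesture as (action, deadline-in-frames) steps, the 0.5 activation threshold, and the use of the mean activation as the confidence measure are all assumptions.

```python
THRESHOLD = 0.5  # assumed activation level that counts as "activated"

class GestureInstance:
    """One dynamically created finite state machine tracking a single
    attempt at a gesture. Per-frame activations arrive as a dict
    mapping atomic action names to values in [0, 1]."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)   # remaining (action, deadline) steps
        self.frames_waited = 0
        self.seen = []             # activations of matched steps

    def update(self, frame_activations):
        """Advance one frame; return 'done', 'dead', or 'waiting'."""
        action, deadline = self.steps[0]
        a = frame_activations.get(action, 0.0)
        if a >= THRESHOLD:
            self.seen.append(a)
            self.steps = self.steps[1:]
            self.frames_waited = 0
            return "done" if not self.steps else "waiting"
        self.frames_waited += 1
        if self.frames_waited > deadline:
            return "dead"          # deadline missed: delete instance
        return "waiting"

    def confidence(self):
        """Assumed confidence: mean activation over matched steps."""
        return sum(self.seen) / len(self.seen)
```

A controller would spawn a `GestureInstance` whenever the first atomic action of a gesture definition fires, call `update` on every surviving instance each frame cycle, discard instances that return `"dead"`, and report the gesture name and `confidence()` for those that return `"done"`.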
The figure shows the result of a gesture recognition test sequence.
Feedback & Queries: Jochen Heinzmann
Date Last Modified: Thursday, 24th Oct 1997