Automatic perception of human posture and gestures from input video plays an important role in the development of intelligent vision systems. To achieve the robustness and real-time performance required for practical applications, the idea is to break the exponentially large upper-body pose problem into two steps: first, the 3-D movements of the upper-body extremities (head and hands) are tracked. Then, using knowledge of upper-body model constraints, these limb movements are used to infer the full 3-D motion of the body as an inverse kinematics problem. Since the head and hand regions are typically well defined and suffer less occlusion, their tracking is more reliable and allows a robust estimate of the upper-body pose. In addition, breaking the upper-body pose tracking problem into two steps greatly reduces its complexity. Using the tracked pose, gesture recognition is performed with a longest-common-subsequence similarity measure on the dynamics of the upper-body joint angles. In our experiments, we provide extensive validation of the proposed upper-body pose tracking from 3-D limb movements, which shows good results across various subjects in different settings. As for gesture recognition based on joint-angle dynamics, our experimental evaluation on five subjects performing six upper-body gestures, with an average classification accuracy above 90%, indicates the promise and viability of the proposed system.
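
The longest-common-subsequence similarity measure mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes joint-angle trajectories are first quantized into discrete bins so that exact symbol matching applies, and the function names (`quantize`, `similarity`) and the bin size are illustrative choices.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via standard dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def quantize(frame_angles, bin_size=10.0):
    """Map one frame of continuous joint angles (degrees) to discrete bins."""
    return tuple(int(round(a / bin_size)) for a in frame_angles)

def similarity(seq_a, seq_b, bin_size=10.0):
    """Normalized LCS similarity in [0, 1] between two joint-angle trajectories.

    Each sequence is a list of frames; each frame is a list of joint angles.
    """
    qa = [quantize(f, bin_size) for f in seq_a]
    qb = [quantize(f, bin_size) for f in seq_b]
    return lcs_length(qa, qb) / max(len(qa), len(qb))
```

A gesture would then be classified by comparing its observed joint-angle sequence against stored templates and picking the one with the highest similarity; LCS tolerates dropped or inserted frames, which makes it robust to variations in gesture speed.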