15-12-2012, 02:19 PM
Kinect
INTRODUCTION
The Xbox 360 is the second video game console produced by Microsoft and the successor to the Xbox. Kinect is a "controller-free gaming and entertainment experience" for the Xbox 360. It was first announced on June 1, 2009 at the Electronic Entertainment Expo under the codename Project Natal. Built around a webcam-style add-on peripheral, it lets users control and interact with the Xbox 360 without touching a game controller, through a natural user interface based on gestures, spoken commands, and presented objects and images. The Kinect accessory is compatible with all Xbox 360 models, connecting to newer models via a custom connector and to older ones via a USB and mains power adapter. It aims to broaden the Xbox 360's audience beyond its typical gamer base. Kinect holds the Guinness World Record for the "fastest selling consumer electronics device": it sold 8 million units in its first 60 days, an average of 133,333 units per day.
Kinect is based on software technology developed internally by Rare, a subsidiary of Microsoft Game Studios, and on range camera technology by Israeli developer PrimeSense, which interprets 3D scene information from continuously projected infrared structured light. The device features an "RGB camera, depth sensor and multi-array microphone running proprietary software", which provides full-body 3D motion capture, facial recognition and voice recognition capabilities. Kinect software can automatically calibrate the sensor based on game play and the player's physical environment, accommodating the presence of furniture or other obstacles. Described by Microsoft personnel as the primary innovation of Kinect, the software technology enables advanced gesture recognition, facial recognition and voice recognition. Kinect can simultaneously track up to six people, including two active players. Reverse engineering efforts have determined that the Kinect sensor outputs video at a frame rate of 30 Hz.
AN RGB COLOR SPACE
An RGB color space is any additive color space based on the RGB color model. A particular RGB color space is defined by the three chromaticities of the red, green, and blue additive primaries, and can produce any chromaticity that lies within the triangle defined by those primary colors. The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve.
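The triangle rule can be checked numerically. The following is a minimal Python sketch, not part of the original report: it tests whether a CIE xy chromaticity falls inside the triangle spanned by three primaries. The sRGB primaries and the D65 white point are used only as a familiar example; the function names `in_gamut` and `_cross` are made up for this illustration.

```python
def _cross(o, a, b):
    """z-component of the 2D cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_gamut(xy, red, green, blue):
    """Return True if chromaticity xy lies inside (or on) the primary triangle."""
    d1 = _cross(red, green, xy)
    d2 = _cross(green, blue, xy)
    d3 = _cross(blue, red, xy)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # The point is inside when it sits on the same side of all three edges.
    return not (has_negative and has_positive)


# sRGB primaries and D65 white point in CIE xy coordinates (example values).
SRGB_RED = (0.64, 0.33)
SRGB_GREEN = (0.30, 0.60)
SRGB_BLUE = (0.15, 0.06)
D65_WHITE = (0.3127, 0.3290)

white_ok = in_gamut(D65_WHITE, SRGB_RED, SRGB_GREEN, SRGB_BLUE)      # inside
outside_ok = in_gamut((0.80, 0.10), SRGB_RED, SRGB_GREEN, SRGB_BLUE)  # outside
```

The white point lies inside the triangle, as it must for any usable RGB space, while a chromaticity such as (0.80, 0.10) falls outside and cannot be reproduced by mixing these primaries.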
3D SENSOR
A 3D sensor is a device that analyzes a real-world object or environment to collect data on its shape and possibly its appearance (e.g. color). The collected data can then be used to construct digital, three-dimensional models useful for a wide variety of applications.
Multiple scans, even hundreds, from many different directions are usually performed to obtain the required information about all sides of the subject. These scans have to be brought into a common reference system, a process that is usually called alignment or registration, and then merged to create a complete model. This whole process, going from the single range map to the whole model, is usually known as the 3D scanning pipeline.
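The alignment-and-merge step can be sketched in a few lines of Python. This is an illustration under a strong assumption: the pose of each scan (here a rotation about the vertical axis plus a translation) is already known. Real pipelines have to estimate these poses, for example with an iterative closest point method. The function names and the sample points are hypothetical.

```python
import math


def rigid_transform(points, yaw_rad, translation):
    """Map sensor-frame points into the common frame (rotation about z, then shift)."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def merge_scans(scans):
    """Bring every scan into the common reference system and concatenate them.

    Each scan is a tuple (points, yaw_rad, translation) with an assumed known pose.
    """
    merged = []
    for points, yaw_rad, translation in scans:
        merged.extend(rigid_transform(points, yaw_rad, translation))
    return merged


# The same physical point seen from two viewpoints (hypothetical data):
# scan A is already in the common frame, scan B was taken after a 90 degree turn.
scan_a = ([(1.0, 0.0, 0.0)], 0.0, (0.0, 0.0, 0.0))
scan_b = ([(0.0, -1.0, 0.0)], math.pi / 2, (0.0, 0.0, 0.0))
model = merge_scans([scan_a, scan_b])
```

After registration both observations land on (approximately) the same coordinates, which is what allows overlapping range maps to be fused into one model.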
MULTIPLE MICROPHONE ARRAY
A microphone array is any number of microphones operating in tandem. Typically, an array is made up of omnidirectional microphones distributed about the perimeter of a space, linked to a computer that records and interprets the results into a coherent form. Arrays may also be formed from a number of very closely spaced microphones. In Kinect, the microphone array features four microphone capsules, with each channel processing 16-bit audio at a sampling rate of 16 kHz.
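As a rough worked example, the figures above give the raw data rate of the array, and a simple far-field model shows why microphones "in tandem" carry direction information: a sound arriving at an angle reaches neighbouring capsules at slightly different times. The Python sketch below is illustrative only. The 0.1 m capsule spacing and the 343 m/s speed of sound are assumptions, not Kinect specifications.

```python
import math

CHANNELS = 4             # microphone capsules (from the text)
BITS_PER_SAMPLE = 16     # bits per sample (from the text)
SAMPLE_RATE_HZ = 16_000  # samples per second per channel (from the text)


def raw_audio_rate(channels, bits_per_sample, sample_rate_hz):
    """Uncompressed data rate of the whole array in bytes per second."""
    return channels * bits_per_sample * sample_rate_hz // 8


def arrival_delay_samples(spacing_m, angle_rad, sample_rate_hz, speed_of_sound=343.0):
    """Delay, in samples, between two capsules for a far-field source.

    angle_rad is measured from the direction straight ahead of the array.
    spacing_m and speed_of_sound are assumed values for illustration.
    """
    return spacing_m * math.sin(angle_rad) / speed_of_sound * sample_rate_hz


rate = raw_audio_rate(CHANNELS, BITS_PER_SAMPLE, SAMPLE_RATE_HZ)
print(rate)  # 128000 bytes per second

# Hypothetical 0.1 m spacing, source 30 degrees off axis.
delay = arrival_delay_samples(0.1, math.radians(30), SAMPLE_RATE_HZ)
```

A source straight ahead produces zero delay between capsules; the delay grows as the source moves to the side, which is the cue an array can use to estimate where a voice is coming from.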
TECHNOLOGIES USED
The depth sensor consists of the IR projector combined with the IR camera, which is a monochrome complementary metal oxide semiconductor (CMOS) sensor. The depth-sensing technology is licensed from the Israeli company PrimeSense. Although the exact technology is not disclosed, it is based on the structured light principle. The IR projector is an IR laser that passes through a diffraction grating and turns into a set of IR dots. The relative geometry between the IR projector and the IR camera as well as the projected IR dot pattern are known. If we can match a dot observed in an image with a dot in the projector pattern, we can reconstruct it in 3D using triangulation. Because the dot pattern is relatively random, the matching between the IR image and the projector pattern can be done in a straightforward way by comparing small neighborhoods using, for example, normalized cross correlation.
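The two steps described above, matching by normalized cross correlation and reconstruction by triangulation, can be sketched as follows. Since the exact Kinect technology is not disclosed, this is only a simplified illustration: matching is done along a single image row, and depth uses the textbook relation z = f * b / d for a rectified camera-projector pair. The focal length, baseline and intensity values are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross correlation of two equal-length intensity lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch carries no pattern to match
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(patch, reference_row):
    """Slide the patch along the reference row; return (offset, score) of the best NCC."""
    width = len(patch)
    best_offset, best_score = 0, -2.0
    for offset in range(len(reference_row) - width + 1):
        score = ncc(patch, reference_row[offset:offset + width])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth in metres for a rectified pair: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical projector pattern row and an observed patch that is a brighter,
# offset copy of pattern[3:6]; NCC ignores such gain and bias changes.
pattern_row = [0, 9, 1, 7, 3, 8, 0, 2, 6, 4]
observed_patch = [15, 7, 17]
offset, score = best_match(observed_patch, pattern_row)

# Assumed focal length (pixels) and baseline (metres), for illustration only.
z = depth_from_disparity(disparity_px=20.0, focal_px=580.0, baseline_m=0.075)
```

Because the dot pattern is close to random, a small neighborhood tends to have a single strong correlation peak, which is why the local comparison is enough to identify which projected dots are being observed.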
The depth values produced by the Kinect sensor are sometimes inaccurate because the calibration between the IR projector and the IR camera becomes invalid. This can be caused by heat, by vibration during transportation, or by drift in the IR laser. To address this problem, the Kinect team developed a recalibration technique using the card that is shipped with the Kinect sensor. If users find that the Kinect is not responding accurately to their actions, they can recalibrate the sensor by showing it the card. The idea is an adaptation of an earlier camera calibration technique.
KINECT SKELETAL TRACKING
The innovation behind Kinect hinges on advances in skeletal tracking. The operational envelope demands for commercially viable skeletal tracking are enormous. Simply put, skeletal tracking must ideally work for every person on the planet, in every household, without any calibration. A dauntingly high number of dimensions describe this envelope, such as the distance from the Kinect sensor and the sensor tilt angle. Entire sets of dimensions are necessary to describe unique individuals, including size, shape, hair, clothing, motions, and poses. Household environment dimensions are also necessary for lighting, furniture and other household furnishings, and pets.
In skeletal tracking, a human body is represented by a number of joints corresponding to body parts such as the head, neck, shoulders, and arms. Each joint is represented by its 3D coordinates. The goal is to determine all the 3D parameters of these joints in real time, to allow fluent interactivity, while using only the limited computation resources allocated on the Xbox 360 so as not to impact gaming performance.
Rather than trying to determine the body pose directly in this high-dimensional space, Jamie Shotton and his team met the challenge by proposing per-pixel body-part recognition as an intermediate step. Shotton's team treats the segmentation of a depth image as a per-pixel classification task (no pairwise terms or conditional random field are necessary). Evaluating each pixel separately avoids a combinatorial search over the different body joints. For further speedup, the classifier can be run in parallel on each pixel on a graphics processing unit (GPU). Finally, spatial modes of the inferred per-pixel distributions are computed using mean shift, resulting in the 3D joint proposals. An optimized implementation of their algorithm runs in under 5 ms per frame (200 frames per second) on the Xbox 360 GPU. It works frame by frame across dramatically differing body shapes and sizes, and the learned discriminative approach naturally handles self-occlusions and poses cropped by the image frame.
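The final mode-finding step can be illustrated with a small weighted mean shift in Python. This is a generic sketch of the technique, not the implementation by Shotton's team: the Gaussian kernel, the bandwidth, the per-point weights (standing in for per-pixel body-part probabilities) and the sample points are all assumptions made for the example.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, iterations=50, tol=1e-6):
    """Climb to a local density mode of weighted 3D points with a Gaussian kernel."""
    current = tuple(start)
    for _ in range(iterations):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((pi - ci) ** 2 for pi, ci in zip(p, current))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += k
            for i in range(3):
                num[i] += k * p[i]
        if den == 0.0:
            break  # no point is close enough to pull the estimate
        new = tuple(n / den for n in num)
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, current)))
        current = new
        if shift < tol:
            break
    return current


# Hypothetical 3D points (metres) that a classifier labelled "head", with one
# stray misclassified point far away; weights stand in for class probabilities.
head_points = [(0.0, 0.0, 2.0), (0.02, 0.0, 2.0), (-0.02, 0.0, 2.0), (1.0, 1.0, 2.0)]
head_weights = [1.0, 1.0, 1.0, 1.0]
joint = mean_shift_mode(head_points, head_weights, start=(0.01, 0.0, 2.0), bandwidth=0.05)
```

The estimate settles at the centre of the tight cluster and is essentially unaffected by the stray point, which is the property that makes mode finding more robust here than taking a plain average of all pixels with a given label.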
Head-Pose and Facial-Expression Tracking
Head-pose and facial-expression tracking has been an active research area in computer vision for several decades. It has many applications including human-computer interaction, performance-driven facial animation, and face recognition. Most previous approaches focus on 2D images, so they must exploit some appearance and shape models because there are few distinct facial features. They might still suffer from lighting and texture variations, occlusion of profile poses, and so forth. Related research has also focused on fitting morphable models to 3D facial scans.
CONCLUSION
Kinect is a "controller-free gaming and entertainment experience" for the Xbox 360. By integrating all these techniques into a single device, Kinect acts as a well-suited platform for creating a virtual reality for the user. Several research projects now use Kinect as their main tracking device, and some have already shown that Kinect is not just a gaming accessory but also an eye for the computer.