02-11-2012, 05:08 PM
Object Identification and Object Tracking
Introduction
Automatic detection and tracking of moving objects is an important task in human-computer interfaces, video communication/expression, security and surveillance systems, and similar applications. Researchers have proposed various imaging techniques for detecting, tracking and identifying moving objects. Object detection can be divided into at least five conventional approaches: frame difference, background subtraction, optical flow, skin color extraction and probability-based approaches. Object tracking methods fall into four categories: region-based tracking, active-contour-based tracking, feature-based tracking and model-based tracking. Object identification is performed to evaluate the effectiveness of the tracking, especially when object occlusion happens. It is done by measuring the similarity between the object model and the tracked object.
Many researchers have developed their own object detection methods. (Liu et al., 2001) proposed background subtraction, which detects moving regions in an image by taking the pixel-by-pixel difference between the current image and a reference background image. This approach is extremely sensitive to changes in dynamic scenes caused by lighting, extraneous events and so on. In another work, (Stauffer & Grimson, 1997) proposed a background model based on a Gaussian mixture model to detect the object. (Lipton et al., 1998) proposed frame difference, which uses the pixel-wise differences between two frame images to extract the moving regions. This method adapts well to dynamic environments, but it generally does a poor job of extracting all the relevant pixels, e.g., holes may be left inside moving entities. To overcome this disadvantage of two-frame differencing, three-frame differencing is used in some cases. For instance, (Collins et al., 2000) developed a hybrid method that combines three-frame differencing with an adaptive background subtraction model for their VSAM (Video Surveillance and Monitoring) project. The hybrid algorithm successfully segments moving regions in video without the defects of temporal differencing and background subtraction. (Desa & Salih, 2004) proposed a combination of background subtraction and frame difference that improved on the results of either method used alone.
Tracking is then performed at three levels of abstraction: regions, people, and groups. Each region has a bounding box, and regions can merge and split. A human is composed of one or more regions grouped together under geometric structure constraints on the human body, and a human group consists of one or more people grouped together. Object detection is achieved by applying frame difference to a low resolution image (Sugandi et al., 2007), object tracking by a block matching algorithm based on the PISC image (Satoh et al., 2001), and object identification by using the color and spatial information of the tracked object (Cheng & Chen, 2006).
Nearly every tracking system starts with motion detection. Motion detection aims at separating the regions of the moving objects from the background image. The first step in motion detection is capturing the image information using a video camera. The motion detection stage includes several image processing steps: gray-scaling and smoothing, reducing the image resolution using the low resolution image technique, frame difference, morphological operations and labeling. Smoothing is performed with a median filter. The low resolution technique is applied to three successive frames to remove small or spurious motion in the background. Frame difference is then performed on those frames to detect the moving objects emerging in the scene. Next, morphological operations such as dilation and erosion are applied as filters to reduce the noise that remains in the moving object. Finally, connected component labeling is performed to give each moving object a different label.
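The low resolution and three-frame difference steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the text does not say how the resolution is reduced or how the three frames are combined, so the block averaging, the logical AND of the two successive differences, the threshold of 20 and the function names are all assumptions made for this example. Frames are plain lists of rows of gray values.

```python
def downsample(img, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.
    (Block averaging is an assumption; the text does not name the method.)"""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h - h % factor, factor):
        row = []
        for bx in range(0, w - w % factor, factor):
            total = sum(img[y][x]
                        for y in range(by, by + factor)
                        for x in range(bx, bx + factor))
            row.append(total // (factor * factor))
        out.append(row)
    return out


def abs_diff_mask(a, b, threshold):
    """Binary mask: 1 where the two images differ by more than threshold."""
    return [[1 if abs(pa - pb) > threshold else 0 for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def three_frame_difference(prev, curr, nxt, threshold=20, factor=2):
    """Motion mask for the middle frame, computed at low resolution.
    A pixel is kept only if it changed in BOTH successive differences
    (the AND combination is an assumption for this sketch)."""
    p, c, n = (downsample(f, factor) for f in (prev, curr, nxt))
    d1 = abs_diff_mask(c, p, threshold)
    d2 = abs_diff_mask(n, c, threshold)
    return [[m1 & m2 for m1, m2 in zip(r1, r2)] for r1, r2 in zip(d1, d2)]
```

With this combination, a region that changed in only one of the two differences (for example the place the object has just left) is dropped, and only the object's position in the middle frame remains in the mask.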
The second stage is tracking the moving object. In this stage, we apply a block matching technique to track only the moving object of interest among the moving objects emerging in the background. The blocks are defined by dividing the image frame into non-overlapping square parts. The blocks are built on the PISC image, which considers the brightness change in all the pixels of a block relative to the pixel under consideration.
Subsequent actions, such as tracking, analyzing the motion or identifying objects, require an accurate extraction of the foreground objects, which makes moving object detection a crucial part of the system. Our object detection method consists of two main steps. The first step is pre-processing, which includes gray-scaling, smoothing, reducing the image resolution and so on. The second step is filtering to remove the image noise contained in the object. The filtering is performed by applying morphological filters such as dilation and erosion. Finally, connected component labeling is performed on the filtered image. The entire process of moving object detection is illustrated in Fig. 2.
Pre-processing
The first step in the moving object detection process is capturing the image information using a video camera. The image is captured as a 24-bit RGB (red, green, blue) image in which each color is specified by an 8-bit unsigned integer (0 through 255) representing its intensity. The size of the captured image is 320x240 pixels. This RGB image is used as the input image for the next stage. To reduce the processing time, a gray-scale image is used in the entire process instead of the color image. The gray-scale image has only one 8-bit channel, while the RGB image has three color channels. Image smoothing is performed to reduce the noise in the input image in order to achieve high accuracy in detecting the moving objects.
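A minimal sketch of this pre-processing step is given below. The text does not state which gray-scale conversion or which filter window is used, so the ITU-R BT.601 luma weights, the 3x3 window and the function names are assumptions chosen for illustration; the median filter itself is the smoothing method named in the post. Border pixels are simply copied unchanged in this sketch.

```python
def rgb_to_gray(rgb):
    """Convert rows of (r, g, b) tuples (0..255 each) to 8-bit gray values.
    Uses BT.601 luma weights in integer arithmetic (an assumption here)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb]


def median_filter3(gray):
    """Smooth a gray image with a 3x3 median filter.
    Border pixels are left as they are to keep the sketch short."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(gray[y + dy][x + dx]
                            for dy in (-1, 0, 1)
                            for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out
```

The median filter removes isolated outliers (salt-and-pepper noise) because a single extreme value never ends up in the middle of the sorted window.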
To detect the moving object against the background using image subtraction, there are generally three approaches: (i) background subtraction, (ii) frame difference and (iii) a combination of background subtraction and frame difference. Background subtraction computes the pixel-by-pixel difference between the current image and the reference background image. Frame difference computes the difference image between successive frames.
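The three approaches can be compared with a small sketch. The thresholding of the absolute difference and the threshold value of 25 are assumptions, and the way the two masks are merged in approach (iii) is not given in the text, so the logical AND used in `combined_detection` is only one plausible choice. All function names are hypothetical.

```python
def difference_mask(image, reference, threshold=25):
    """Binary mask: 1 where |image - reference| exceeds the threshold."""
    return [[1 if abs(p - q) > threshold else 0 for p, q in zip(ri, rr)]
            for ri, rr in zip(image, reference)]


def background_subtraction(current, background, threshold=25):
    """(i) Compare the current frame with a fixed reference background."""
    return difference_mask(current, background, threshold)


def frame_difference(current, previous, threshold=25):
    """(ii) Compare the current frame with the previous frame."""
    return difference_mask(current, previous, threshold)


def combined_detection(current, previous, background, threshold=25):
    """(iii) Keep pixels flagged by both methods (AND is an assumption)."""
    bg = background_subtraction(current, background, threshold)
    fd = frame_difference(current, previous, threshold)
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(bg, fd)]
```

On a one-row example with a bright object moving from the first to the second pixel, frame difference flags both the old and the new position, background subtraction flags only the new one, and an object that stops moving disappears from the frame difference mask but stays in the background subtraction mask.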
Filtering
To fuse narrow breaks and long thin gulfs, eliminate small holes and fill gaps in the contour, a morphological operation is applied to the image. As a result, small gaps between isolated segments are erased and the regions are merged. Connected component analysis is then used to extract the bounding boxes of the detected objects. The morphological operation eliminates background noise and fills small gaps inside an object. This property makes it well suited to our objective, since we are interested in generating masks that preserve the object boundary.
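The filtering and labeling steps can be sketched as below. The 3x3 structuring element, the dilation-then-erosion order (a morphological closing), the 8-connectivity and all function names are assumptions for illustration, since the post does not specify them. Masks are lists of rows of 0/1 values, and pixels outside the image are ignored rather than treated as background.

```python
from collections import deque

# Offsets of the 3x3 neighbourhood, including the centre pixel.
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def _neighbourhood(mask, y, x):
    """Values of the in-bounds pixels in the 3x3 window around (y, x)."""
    h, w = len(mask), len(mask[0])
    return [mask[y + dy][x + dx] for dy, dx in OFFSETS
            if 0 <= y + dy < h and 0 <= x + dx < w]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[1 if any(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if every in-bounds pixel in its window is 1."""
    return [[1 if all(_neighbourhood(mask, y, x)) else 0
             for x in range(len(mask[0]))] for y in range(len(mask))]


def close_mask(mask):
    """Closing = dilation followed by erosion; fills small holes and gaps."""
    return erode(dilate(mask))


def label_components(mask):
    """8-connected labeling by breadth-first search.
    Returns (labels, boxes); boxes maps label -> (min_x, min_y, max_x, max_y)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                min_x = max_x = x
                min_y = max_y = y
                while queue:
                    cy, cx = queue.popleft()
                    min_x, max_x = min(min_x, cx), max(max_x, cx)
                    min_y, max_y = min(min_y, cy), max(max_y, cy)
                    for dy, dx in OFFSETS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                boxes[next_label] = (min_x, min_y, max_x, max_y)
    return labels, boxes
```

Calling `label_components(close_mask(mask))` on a motion mask gives one label and one bounding box per detected object, which is the form the tracking stage needs.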
Object tracking
After object detection is achieved, the problem arises of establishing a correspondence between object masks in consecutive frames. Indeed, initializing a track, updating it robustly and ending it are important problems of object mask association during tracking. Obtaining the correct track information is crucial for subsequent actions, such as object identification and activity recognition. The tracking process can be considered as associating region masks between temporally consecutive frames and estimating the trajectory of an object in the image plane as it moves around a scene.
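To make the association problem concrete, here is a minimal sketch that matches existing tracks to new detections by bounding-box overlap. Note that this is a simpler, swapped-in technique and not the block matching on the PISC image used in the post: the intersection-over-union score, the greedy matching, the `min_iou` value of 0.3 and all names are assumptions for illustration. Boxes are (min_x, min_y, max_x, max_y) in inclusive pixel coordinates.

```python
def iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    ix = min(a[2], b[2]) - max(a[0], b[0]) + 1
    iy = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if ix <= 0 or iy <= 0:
        return 0.0
    inter = ix * iy
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def associate(tracks, detections, min_iou=0.3):
    """Greedily match tracks (dict: track id -> box) to detections (list of boxes).
    Returns (matches, new, ended):
      matches: track id -> index of the detection that updates the track
      new:     detection indices with no track (candidates to start a track)
      ended:   track ids with no detection (candidates to end the track)"""
    candidates = sorted(((iou(box, det), tid, i)
                         for tid, box in tracks.items()
                         for i, det in enumerate(detections)),
                        reverse=True)
    matches, used = {}, set()
    for score, tid, i in candidates:
        if score < min_iou:
            break  # all remaining pairs overlap even less
        if tid in matches or i in used:
            continue
        matches[tid] = i
        used.add(i)
    new = [i for i in range(len(detections)) if i not in used]
    ended = [tid for tid in tracks if tid not in matches]
    return matches, new, ended
```

The three return values correspond to the three problems named above: updating a track, initializing a track and ending a track. A real system would usually wait a few frames before ending a track, so that a briefly occluded object is not lost.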