04-05-2011, 12:14 PM
Abstract
We discuss and present a computer vision system that monitors a sitting baby aged six months to one year. The system tracks the baby's mouth and hands and alerts the guardians if it finds an object in the baby's hand moving toward his or her mouth. We use skin detection, grayscale-based Sum of Squared Differences (SSD), and template matching to track the hands and head of the baby.
1. Introduction
It is important that babies are never left unattended. However, this is sometimes unavoidable, and in these cases a computer vision surveillance system could be very helpful to the baby's guardians.

This system would monitor a sitting, unattended child when a caregiver has left the room or turned away to retrieve something, and alert them if it finds the baby putting things in its mouth. The system could also provide statistical data to psychologists interested in studying child behavior.

The purpose of the system is not to replace guardians but to aid them. To this end, it tracks a baby's mouth and hands using computer vision methods. The system relies on only one camera. We assume that the baby is wearing a short-sleeved shirt and short pants. Moreover, the baby should be upright, stationary, and sitting facing the camera when the system starts, so that the system can detect the necessary features such as the eyes, hands, legs, and mouth. No hands are tracked until the system finds at least one frame in which the baby's hands are not occluded. A plain, dark, non-skin-colored background is assumed as well.
2. Previous work
Work specifically on recognizing a baby's gestures has not, to our knowledge, been investigated. However, a large amount of research has been performed on gesture recognition in general, using various methods. These methods include motion-based recognition [3], Hidden Markov Models (HMMs) [2], and neural networks [5]. Cutler and Turk [7] have used optical flow for gesture recognition.

Our goal also involves hand tracking and head/mouth tracking. A lot of research has been devoted to this difficult task of body-part tracking. Martin, Devin, and Crowley [6] use an architecture in which a supervisor selects and activates visual processes; they use image differencing and normalized histogram matching for hand tracking. To track hands, Heap and Samaria [8] used 2D deformable models and genetic algorithms. 3D information [9] has also been commonly used to track hands. Many of these approaches have been adapted for head tracking as well. Template matching [4] has been a common approach for tracking and action recognition. Texture models were also used for the head-tracking task by Schödl, Haro, and Essa [10].

Most of these approaches use images containing only the head and hands. The problem we are trying to solve is harder because we simultaneously track the head and hands in an image that contains the whole body of a baby (Figure 2).
3. The finite state machine
The system is based on a finite state machine (Figure 1). The following is a brief discussion of the states and conditions involved in the finite state automaton.

The system begins by tracking the hands of the baby being monitored. As soon as it finds one of the hands occluding the head region, it checks the hand motion history. If the distance between one of the hands and the mouth/head region is decreasing and the same hand occludes the head/mouth region, the system then checks whether an object or a hand has covered the mouth. If the mouth is covered, the system raises an alarm; in our prototype it displays the text ">>>>!DANGER!<<<<" directly above the mouth region in the corresponding image (see the section on Results - V). Otherwise the system returns to the original "hand-mouth occlusion" state. Whenever a hand occludes the head region before the alert fires, that is, when two of the three conditions required for the system alert are met, the system displays the text "Possible Danger" on the left side of that image (see Results - IV), alerting the caregiver to the situation.
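The per-frame transitions described above can be sketched as a small state machine. This is only an illustrative sketch: the state names and the latching behavior of the alarm state are our own choices, not taken from the paper's Figure 1.

```python
# Sketch of the alert state machine. Each frame yields three boolean
# observations: hand occludes head region, hand-to-mouth distance is
# decreasing, and the mouth is covered. State names are assumptions.

def next_state(state, occludes_head, distance_decreasing, mouth_covered):
    """Advance the alert FSM by one frame of observations."""
    if state == "TRACKING":
        # First condition met: a hand entered the head region.
        return "POSSIBLE_DANGER" if occludes_head else "TRACKING"
    if state == "POSSIBLE_DANGER":
        if not occludes_head:
            return "TRACKING"          # occlusion ended, resume tracking
        if distance_decreasing and mouth_covered:
            return "DANGER"            # all three conditions met -> alarm
        return "POSSIBLE_DANGER"       # two of three conditions met
    if state == "DANGER":
        # Stay in the alarm state until the mouth is clear again.
        return "DANGER" if mouth_covered else "TRACKING"
    raise ValueError(f"unknown state: {state}")

# Example frame sequence: hand approaches the mouth and covers it.
state = "TRACKING"
for frame in [(False, False, False), (True, False, False), (True, True, True)]:
    state = next_state(state, *frame)
print(state)  # -> DANGER
```

In this sketch, "POSSIBLE_DANGER" corresponds to the prototype's "Possible Danger" overlay and "DANGER" to the ">>>>!DANGER!<<<<" alarm.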
4. Stepwise discussion of the algorithm
The following are the five main steps involved in the system:
1. Find the skin regions of the image.
2. Find the connected components of these regions: the hands, legs, and head.
3. Find the eyes and the mouth.
4. Track the hands.
5. Alert the guardians if the three gesture-recognition conditions are met.

Step 1: Find the skin regions of the image
The first step in our approach is to subject the input image to a skin detection algorithm, originally developed by Kjeldsen et al. [1], to find the skin regions (Figure 3) of the baby in the image. The idea is to manually mark a subset of the RGB color space during a training phase, so that the algorithm can recognize skin-colored regions in future test images.
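A lookup-table skin detector in the spirit of Step 1 can be sketched as follows. This is a minimal illustration, not the exact algorithm of Kjeldsen et al. [1]: the bin size, the training samples, and the hard boolean membership test are all our own simplifying assumptions.

```python
import numpy as np

# Illustrative lookup-table skin detection: hand-marked skin pixels from
# training images populate bins of a quantized RGB color space; a test
# pixel is classified as skin if its bin was marked. BINS is an assumption.

BINS = 32                                   # quantize each channel into 32 bins

def rgb_to_bin(pixels):
    """Map an (N, 3) uint8 RGB array to flat color-space bin indices."""
    q = pixels.astype(np.int64) // (256 // BINS)     # per-channel bin, 0..BINS-1
    return (q[:, 0] * BINS + q[:, 1]) * BINS + q[:, 2]

def train(skin_pixels):
    """Build a boolean lookup table from manually marked skin pixels."""
    table = np.zeros(BINS ** 3, dtype=bool)
    table[rgb_to_bin(skin_pixels)] = True
    return table

def detect(table, image):
    """Return a boolean skin mask for an (H, W, 3) uint8 image."""
    flat = image.reshape(-1, 3)
    return table[rgb_to_bin(flat)].reshape(image.shape[:2])

# Toy example: train on two skin-tone samples, classify a 1x2 image
# containing one similar pixel and one dark-background pixel.
skin = np.array([[200, 150, 120], [190, 140, 110]], dtype=np.uint8)
table = train(skin)
img = np.array([[[201, 151, 121], [10, 10, 10]]], dtype=np.uint8)
print(detect(table, img))  # -> [[ True False]]
```

The coarse quantization gives the detector some tolerance to lighting variation, which matters here because the marked training colors will never exactly match the test pixels.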
Download full report
http://www.cse.unr.edu/~bebis/326_Bhatt_J.doc