03-10-2012, 04:44 PM
EE368 Digital Image Processing Project - Automatic Face Detection Using Color Based Segmentation and Template/Energy Thresholding
Digital Image Processing.pdf (Size: 1.18 MB / Downloads: 156)
INTRODUCTION
The purpose of this project has been to replicate on a
computer what human beings do effortlessly every moment
of their lives: detect the presence or absence of faces in their
field of vision. While this appears trivial to a layman,
implementing the steps that lead to its successful execution
in an algorithm is difficult, and face detection remains an
unsolved problem in computer vision.
In EE368 we have been given the task of using a collection
of seven digital images to train and develop a system for doing
just this in a competitive format. The only real limitation is
that it run in under seven minutes on a single file. In deriving a
method of our own, we began by reviewing various
articles on the topic as well as the material covered in lecture.
We explored the possibility of using some of the methods that
have been explored by researchers thus far, such as neural
networks, statistical methods, machine learning algorithms
(SVM, FLD), PCA (such as Eigenfaces and the concept of
a "face space"), as well as a newer methodology called Maximum
Rejection Classification (MRC). We initially attempted
to devise a system that linked an Eigenface-based front-end with
a neural-network-based back-end, but the neural network
machinery proved difficult to train and develop in a
manner that let us understand the inner workings of
our system. We were unable to generalize from the training
data to unseen images, and the black-box nature of the neural
network prevented us from grasping the particular shortcomings
of our system and what could be done
to improve it. Hence we decided to abandon that approach
and pursue a method based on color segmentation followed
by template/energy matching. This system has proven
reasonably fast, taking on average 80 to 120 seconds
to run, depending on the internal downsampling rate applied
to the input image and various other adjustable parameters.
With the final parameter values that we settled on, it
runs in approximately 100 seconds on a Dell 1.8 GHz Pentium
IV laptop.
COLOR BASED SEGMENTATION
Assuming that a person framed in any random photograph
is not an attendee at the Renaissance Fair or Mardi Gras,
the face can be taken to be neither white, green, red,
nor any similarly unnatural color. While different ethnic
groups have different levels of melanin and pigmentation, the
range of colors that human facial skin takes on is clearly a
subspace of the total color space. Under the assumption of a
typical photographic scenario, it is clearly wise to take
advantage of face-color correlations to limit our face search
to areas of an input image that have at least the correct color
components.
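The report does not list the segmentation code or the exact color bounds used. As a minimal sketch of the idea, the following NumPy function thresholds the chrominance (Cb/Cr) channels of an RGB image; the numeric bounds are illustrative values commonly quoted for skin in the literature, not the project's actual parameters.

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean mask of pixels whose color falls in a
    plausible skin-tone subspace of YCbCr.

    rgb: H x W x 3 array of 8-bit RGB values.
    The Cb/Cr bounds are illustrative, not the report's values.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Full-range RGB -> YCbCr chrominance (JPEG convention).
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Keep only pixels inside the (assumed) skin chrominance box.
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```

Because luminance (Y) is ignored, the mask is fairly robust to lighting differences across ethnic groups, which is the property the section above relies on.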
LOWER PLANE MASKING
While in general it would reduce the generality of a
detector, in our case we believe it is reasonable to take
advantage of a priori knowledge of where faces are, and are
not, likely to appear in order to remove "noise". We observed
that in the training images no faces ever appeared in the
lower third of the image field. Since we know that the
testing images were taken under conditions identical to the
training images, it is very likely that the same will hold in
the scenarios where our system will be used. Hence, we
removed the lower portion of the image from consideration
to eliminate false alarms originating from this region.
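The masking step itself is simple; a minimal sketch, assuming the candidate map is a 2-D NumPy array (the function name is hypothetical, not from the report):

```python
import numpy as np

def mask_lower_third(mask):
    """Discard candidate pixels in the lower third of the frame.

    mask: 2-D array (boolean or numeric) of face-candidate pixels.
    Returns a copy with the bottom third zeroed out.
    """
    out = mask.copy()
    h = out.shape[0]
    # Rows from 2h/3 down are the lower third of the image field.
    out[2 * h // 3:, ...] = 0
    return out
```

Applying this to the skin-color mask before template matching removes any false alarms that would otherwise originate in that region.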
RESULTS AND CONCLUSION
The algorithm described in the previous section has
produced rather reasonable results when applied to the various
training images. For the particular image examined throughout
this report, we were able to accurately detect all 22
faces present, with no false alarms or misses. The results
may be viewed in figure 19.
Our detection rates on the other training images ranged
from approximately 85% to 100%. We look forward
to seeing how the system performs on future test images.
Our algorithm is reasonably fast, typically running in
approximately 100 seconds, and is sufficiently accurate
given the difficulty of the problem.
We will note that the distribution of work was even between
the two team members, with both contributing to all
aspects of the project in fairly equal amounts.