REAL-TIME FACE RECOGNITION ON A MIXED SIMD VLIW ARCHITECTURE
Abstract
There is a rapidly growing demand for intelligent cameras in surveillance and identification applications. Most of these applications have real-time demands and require huge processing capacity. Face recognition is one such application in high demand. In this paper we show that face recognition can run in real time by implementing the algorithm on an architecture that combines a massively parallel processor with a high-performance Digital Signal Processor.
In this project we focus on the INCA+ intelligent camera. It contains a CMOS sensor, a Single Instruction Multiple Data (SIMD) processor [1] and a Very Long Instruction Word (VLIW) processor. The SIMD processor enables high-performance pixel processing and detects the interesting (face) regions in the video. It sends the regions of interest to the VLIW processor, which performs the actual face recognition using a neural network. With this architecture we perform face recognition against a 5-person database at more than 200 faces per second. This performance is better than that of high-end professional systems in use today [2].
INTRODUCTION
Recently, face detection and recognition have become important applications for intelligent cameras. Face detection and recognition require considerable processing performance when real-time constraints are taken into account [3]. Recognition is difficult because, for robust results, the face needs to be at a proper angle and completely facing the camera. The face must also span an appropriate range of pixels: if the face portion of the image does not contain enough pixels, a reliable detection cannot be made.
What we want to show in this publication is that good, real-time face recognition results can be achieved with a well-designed smart camera architecture. A smart camera is here defined as a stand-alone programmable device with a size equal to or smaller than a typical video surveillance camera.
ARCHITECTURE
Face recognition consists of a face detection part and a face recognition part. In the detection part, faces are detected in the scene and the relevant parts of the image are forwarded to the face recognition process, where each found face is matched against a database of stored faces in order to recognize it and attach an identification to it.
These two parts of the algorithm work on different data structures. The detection part works on all pixels of the captured video and is pixel oriented (low-level image processing), whereas the recognition part operates on the extracted face regions (higher-level processing).
FACE DETECTION
In face detection we take an image from the sensor and detect and localize an unknown number (if any) of faces. Before detection and localization, the image is segmented into regions that are likely to contain a face. This is done by colour-specific selection. By removing regions that are too small and enforcing a certain aspect ratio on the selected region of interest (ROI), the detection becomes more reliable. We detect faces in the image by searching for the presence of skin-tone coloured pixels or groups of pixels. The pixels delivered by the colour interpolation routines from the CMOS sensor image are in RGB form, which is not very suitable for characterizing skin colour. The components in RGB space represent not only colour but also luminance, which varies from situation to situation: when the lighting changes, the apparent skin tone changes colour as well, and the detection becomes less reliable. Moving to a normalized colour domain minimizes this effect; what is needed is a colour domain that separates luminance from chrominance.
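As a sketch of this idea (not code from the paper; the threshold ranges below are illustrative assumptions, not the camera's actual values), the following snippet classifies pixels as skin in the normalized rg chromaticity domain, where dividing each channel by R+G+B removes most of the luminance component:

```python
import numpy as np

def skin_mask(rgb, r_range=(0.36, 0.465), g_range=(0.28, 0.363)):
    """Classify pixels as skin using normalized rg chromaticity.

    Dividing each channel by R+G+B removes most of the luminance,
    so the thresholds are far less sensitive to lighting changes
    than thresholds in raw RGB space would be.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0          # avoid division by zero on black pixels
    norm = rgb / total               # per pixel: r + g + b == 1
    r, g = norm[..., 0], norm[..., 1]
    return ((r >= r_range[0]) & (r <= r_range[1]) &
            (g >= g_range[0]) & (g <= g_range[1]))
```

The resulting boolean mask would then be cleaned up by the size and aspect-ratio filtering described above before an ROI is forwarded to the recognition stage.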
Architecture of RBF Neural Network
An RBF neural network structure is shown in Figure 7. Its architecture is similar to that of a traditional three-layer feed-forward neural network. The input layer of this network is a set of n units, which accepts the elements of an n-dimensional input feature vector (here, the RBF neural network input is the face obtained from the face detection part; since it is normalized to a 64×72-pixel face, it follows that n = 4608). The input units are fully connected to the hidden layer with m hidden nodes. Connections between the input and the hidden layers have fixed unit weights and consequently do not need to be trained. The purpose of the hidden layer is to cluster the data and decrease its dimensionality. The RBF hidden nodes are also fully connected to the output layer. The number of outputs depends on the number of people to be recognized (for example, for 100 persons o = 100). The output layer provides the response to the activation pattern applied to the input layer. The mapping from the input space to the RBF hidden unit space is nonlinear, whereas the mapping from the RBF hidden unit space to the output space is linear.
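The forward pass just described can be sketched as follows. The paper does not specify the basis function, so the common Gaussian choice and the parameter names (`centers`, `widths`, `out_weights`) are assumptions for illustration:

```python
import numpy as np

def rbf_forward(x, centers, widths, out_weights):
    """One forward pass through a three-layer RBF network.

    x           : (n,)   input feature vector (flattened 64x72 face, n = 4608)
    centers     : (m, n) hidden-unit centres; input connections have fixed
                         unit weights, so this layer is not trained
    widths      : (m,)   per-unit Gaussian widths (sigma)
    out_weights : (o, m) trained linear weights, o = number of persons
    """
    # Nonlinear stage: Gaussian activation on the distance to each centre.
    d2 = ((centers - x) ** 2).sum(axis=1)          # squared distances, shape (m,)
    hidden = np.exp(-d2 / (2.0 * widths ** 2))     # hidden activations, shape (m,)
    # Linear stage: map the hidden-unit space to the o person scores.
    return out_weights @ hidden                    # shape (o,)
```

Note the split the text describes: the hidden activations depend nonlinearly on the input, while the final scores are a plain matrix-vector product.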
FACE RECOGNITION
The main goal of this section is to introduce the face recognition process. Through this process, the face detected in the previous section is identified against the face database. For this purpose a Radial Basis Function (RBF) neural network is used [6]. The reason for using an RBF neural network is its ability to cluster similar images before classifying them; RBF-based clustering has received wide attention in the neural networks community.
Using RBF neural network
The first step in face recognition is normalizing the region of interest (Figure 8) to the size of the faces stored in the identification database (64×72 pixels) and then feeding it to the neural network input. Subsequently, we calculate the output for each person, take the maximum value among the outputs, and report that as the recognized person. Figure 9 shows the main kernel for using the RBF neural network.
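A minimal sketch of this step follows; the helper names are hypothetical, and simple nearest-neighbour resampling stands in for whatever resizing the camera actually performs:

```python
import numpy as np

FACE_H, FACE_W = 72, 64   # database face size: 64x72 pixels

def recognize(roi, rbf_net, names):
    """Resize a detected face ROI to the database size, run the network,
    and report the person with the maximum output.

    `rbf_net` is any callable mapping a flattened 4608-element vector to
    one score per enrolled person; `names` labels those outputs.
    """
    # Nearest-neighbour resize to 72 rows x 64 columns.
    rows = (np.arange(FACE_H) * roi.shape[0] / FACE_H).astype(int)
    cols = (np.arange(FACE_W) * roi.shape[1] / FACE_W).astype(int)
    face = roi[np.ix_(rows, cols)].astype(np.float64).ravel()  # n = 4608
    scores = rbf_net(face)
    best = int(np.argmax(scores))      # maximum output wins
    return names[best], scores[best]
```

In practice a minimum-score threshold would also be needed to reject faces of people not in the database, which a bare argmax cannot do.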
CONCLUSIONS AND FUTURE WORK
Face recognition is becoming an important application for smart cameras. However, until now the processing required for real-time detection has prohibited integration of the whole application into a small-sized, consumer type of camera. This paper showed that by:
1. Proper selection of algorithms, both for face detection
and recognition,
2. Adequate choice of processing architecture, supporting
both SIMD and ILP types of parallelism,
3. Tuning the mapping of algorithms to the selected
architecture,