09-08-2012, 10:23 AM
Text detection and recognition in images and video frames
Abstract
This paper presents a new method for detecting and recognizing text in complex images and video frames. Text detection
is performed in a two-step approach that combines the speed of a text localization step, enabling text size normalization,
with the strength of a machine learning text verification step applied on background-independent features. Text recognition,
applied on the detected text lines, is addressed by a text segmentation step followed by a traditional OCR algorithm within
a multi-hypotheses framework relying on multiple segments, language modeling and OCR statistics. Experiments conducted
on large databases of real broadcast documents demonstrate the validity of our approach.
© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Introduction
Content-based multimedia database indexing and retrieval
tasks require automatic extraction of descriptive features
that are relevant to the subject materials (images, video,
etc.). The typical low-level features that are extracted in images
and video include measures of color [1], texture [2], or
shape [3]. Although these features can easily be obtained,
they do not give a precise idea of the image content. Extracting
more descriptive features and higher level entities,
such as text [4] and human faces [5], has recently attracted
significant research interest. Text embedded in images and
video, especially captions, provides brief and important content
information, such as the names of players or speakers,
the title, location, date of an event, etc. This text can be a
keyword resource as powerful as the information provided
by speech recognizers. Besides, text-based search has been
successfully applied in many applications, while feature matching
algorithms based on other high-level features are neither robust
enough nor computationally efficient enough to
be applied to large databases.
Related work
In this section, we review existing methods for text detection
and text recognition. These two problems are often
addressed separately in the literature.
Text detection
Text can be detected by exploiting the discriminative properties
of text characters such as the vertical edge density,
the texture or the edge orientation variance. One early approach
for localizing text in covers of journals or CDs [8]
assumed that text characters were contained in regions of
high horizontal variance satisfying certain spatial properties
that could be exploited in a connected component analysis
process. Smith et al. [14] localized text by first detecting vertical
edges with a predefined template, then grouping vertical
edges into text regions using a smoothing process. These
two methods are fast but also produce many false alarms
because many background regions may also have strong
horizontal contrast. The method of Wu et al. [9] for text
localization is based on texture segmentation. Texture features
are computed at each pixel from the derivatives of the
image at different scales. The pixels are then clustered in this
feature space using a K-means algorithm.
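
The edge-based idea reviewed above can be illustrated with a minimal sketch. It is not the method of Smith et al. [14] or of any other cited work: the function names, the simple horizontal difference used in place of a predefined edge template, and both thresholds are assumptions chosen for illustration. The image is assumed to be a list of rows of gray-scale values.

```python
def vertical_edge_map(image, threshold=50):
    """Mark pixels where the intensity jump to the right-hand neighbour is large.

    A large horizontal jump indicates a vertical edge (hypothetical threshold).
    """
    edges = []
    for row in image:
        edge_row = [0] * len(row)
        for x in range(len(row) - 1):
            if abs(row[x + 1] - row[x]) > threshold:
                edge_row[x] = 1
        edges.append(edge_row)
    return edges


def candidate_text_rows(edges, min_density=0.2):
    """Group consecutive rows with a high vertical-edge density into bands.

    Returns a list of (top, bottom) row indices, both inclusive.
    """
    bands = []
    start = None
    for y, edge_row in enumerate(edges):
        density = sum(edge_row) / len(edge_row) if edge_row else 0.0
        if density >= min_density:
            if start is None:
                start = y
        elif start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(edges) - 1))
    return bands
```

As the text notes, a detector of this kind is fast but also fires on any background region with strong horizontal contrast, which is why a later verification step is useful.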
Text detection
There are two problems in obtaining efficient and robust
text detection using machine learning tools. One is how to
avoid performing computationally intensive classification on
the whole image; the other is how to reduce the variance
of character size and gray scale in the feature space before
training. In this paper, we address these problems by proposing
a localization/verification scheme that quickly extracts
text blocks in images with a low rejection rate. This localization
process allows us to further extract individual text
lines and normalize the size of the text. We then perform
precise verification in a set of feature spaces that are invariant
to gray-scale changes.
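
The two normalizations mentioned above (text size and gray scale) can be sketched as follows. This excerpt does not describe the features actually used, so the code only shows one simple possibility: nearest-neighbour rescaling to a fixed height, and zero-mean, unit-variance normalization, which is invariant to a positive gain and an offset applied to the gray levels. The function names and the default height of 16 pixels are assumptions.

```python
def resize_to_height(line, target_height=16):
    """Rescale a text-line image (list of rows) to a fixed height.

    Nearest-neighbour sampling; the aspect ratio is preserved.
    """
    src_h = len(line)
    src_w = len(line[0])
    scale = target_height / src_h
    target_width = max(1, round(src_w * scale))
    out = []
    for y in range(target_height):
        src_y = min(src_h - 1, int(y / scale))
        out.append([line[src_y][min(src_w - 1, int(x / scale))]
                    for x in range(target_width)])
    return out


def normalize_gray(line):
    """Map gray levels to zero mean and unit variance.

    The result does not change if every pixel p is replaced by a * p + b
    with a > 0, which is the invariance illustrated here.
    """
    pixels = [p for row in line for p in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    if std == 0:
        # A flat image carries no contrast information.
        return [[0.0] * len(row) for row in line]
    return [[(p - mean) / std for p in row] for row in line]
```

After both steps, every candidate line presented to the classifier has the same height and a comparable gray-level range, which reduces the variance the classifier must learn.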
Discussion and conclusions
This paper presents a general scheme for extracting and
recognizing embedded text of any gray-scale value in images
and videos. The method is split into two main parts: the
detection of text lines, followed by the recognition of text
in these lines.
Applying machine learning methods to text detection
is difficult because of character size and gray-scale
variations and high computational cost. To overcome these
problems, we proposed a two-step localization/verification
scheme. The first step aims at quickly locating candidate
text lines, enabling the normalization of characters into a
unique size. In the verification step, a trained SVM or MLP
is applied on background-independent features to remove the
false alarms. Experiments showed that the proposed scheme
improves the detection result at a lower cost in comparison
with the same machine learning tools applied without size
normalization, and that an SVM was more appropriate than
an MLP to address the text texture verification problem.
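
The structure of the two-step scheme summarized above can be sketched as a small cascade. This is only an outline under stated assumptions: `localize` and `verify` are hypothetical callables standing in for the fast localization step and the trained SVM or MLP, and the box format (top, bottom, left, right) is chosen for illustration.

```python
def detect_text_lines(image, localize, verify):
    """Run a cheap localizer, then keep only the candidates the verifier accepts.

    localize(image) -> list of candidate boxes (fast, may contain false alarms)
    verify(image, box) -> bool (costly classifier, called once per candidate)
    """
    candidates = localize(image)
    return [box for box in candidates if verify(image, box)]


def contrast_verifier(min_range=100):
    """Toy stand-in for a trained classifier: accept boxes with high contrast."""
    def verify(image, box):
        top, bottom, left, right = box
        pixels = [image[y][x]
                  for y in range(top, bottom + 1)
                  for x in range(left, right + 1)]
        return max(pixels) - min(pixels) >= min_range
    return verify
```

The point of the cascade is that the costly classifier runs only on the few regions proposed by the localizer rather than on every position of the image.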