03-04-2012, 01:06 PM
Text Information Extraction in Images and Video: A Survey
Abstract
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make the problem of automatic text extraction extremely challenging. While comprehensive surveys of related problems such as face detection, document analysis, and image and video indexing can be found, the problem of text information extraction has not been well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, to discuss benchmark data and performance evaluation, and to point out promising directions for future research.
Keywords: Text information extraction, text detection, text localization, text tracking, text enhancement, OCR
1 Introduction
Content-based image indexing refers to the process of attaching labels to images based on their content. Image content can be divided into two main categories: perceptual content and semantic content [1]. Perceptual content includes attributes such as color, intensity, shape, texture, and their temporal changes, whereas semantic content refers to objects, events, and their relations. A number of studies on the use of relatively low-level perceptual content [2-6] for image and video indexing have already been reported. Studies on semantic image content in the form of text, faces, vehicles, and human actions have also attracted some recent interest [7-16]. Among them, text within an image is of particular interest because (i) it is very useful for describing the contents of an image; (ii) it can be extracted more easily than other kinds of semantic content; and (iii) it enables applications such as keyword-based image search, automatic video logging, and text-based image indexing.
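To make the notion of low-level perceptual content concrete, the following is a minimal sketch of one such attribute: a normalized intensity histogram used as an index feature, with an L1 distance for retrieval. The list-of-pixels representation, the bin count, and the function names are illustrative assumptions, not details from the survey.

```python
# Sketch of a perceptual-content feature for content-based indexing:
# a normalized grayscale intensity histogram (one of the low-level
# attributes -- color, intensity, texture -- mentioned above).

def intensity_histogram(pixels, bins=8):
    """Return a normalized histogram of pixel intensities in [0, 255]."""
    counts = [0] * bins
    width = 256 / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two histograms; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

An index built this way simply stores one histogram per image and ranks candidates by `histogram_distance` at query time; note that such features carry no semantic information, which is precisely the gap that text extraction aims to fill.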
1.1 Text in images
A variety of approaches to text information extraction (TIE) from images and video have been proposed for specific applications including page segmentation [17, 18], address block location [19], license plate location [9, 20], and content-based image/video indexing [5, 21]. In spite of such extensive studies, it is still not easy to design a general-purpose TIE system. This is because there are so many possible sources of variation when extracting text from a shaded or textured background, from low-contrast or complex images, or from images having variations in font size, style, color, orientation, and alignment. These variations make the problem of automatic TIE extremely difficult.
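The stages named in the abstract (detection, localization, extraction, followed by enhancement and recognition) can be sketched as a toy pipeline. This is purely illustrative: the 2-D list representation, the fixed binarization threshold, and the assumption that text pixels are simply "bright pixels on a dark background" are placeholder choices, and real TIE systems replace each stage with far more robust algorithms.

```python
# Toy sketch of the first TIE stages: detection, localization, extraction.
# "Text" is crudely approximated as pixels brighter than a fixed threshold.

THRESHOLD = 128  # illustrative binarization threshold (an assumption)

def detect_text(image):
    """Detection: decide whether the image contains any candidate text pixels."""
    return any(p > THRESHOLD for row in image for p in row)

def localize_text(image):
    """Localization: bounding box (top, left, bottom, right) of candidate pixels."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, p in enumerate(row) if p > THRESHOLD]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

def extract_text_region(image, box):
    """Extraction: crop the localized region for later enhancement and OCR."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

The difficulty the survey describes is exactly that no single threshold or rule of this kind survives variations in font, color, orientation, contrast, and background.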
Figures 1-4 show some examples of text in images. Page layout analysis usually deals with document images1 (Fig. 1). Readers may refer to papers on document segmentation/analysis [17, 18] for more examples of document images. Although images acquired by scanning book covers, CD covers, or other multi-colored documents have characteristics similar to those of document images (Fig. 2), they cannot be directly dealt with using conventional document image analysis techniques. Accordingly, this survey distinguishes this category of images as multi-color document images from other document images. Text in video images can be further classified into caption text (Fig. 3), which is artificially overlaid on the image, and scene text (Fig. 4), which exists naturally in the image. Some researchers use the term ‘graphics text’ for scene text, and ‘superimposed text’ or ‘artificial text’ for caption text [22, 23]. It is well known that scene text is more difficult to detect, and very little work has been done in this area. In contrast to caption text, scene text can have any orientation and may be distorted by perspective projection. Moreover, it is often affected by variations in scene and camera parameters such as illumination, focus, motion, etc.

1 The distinction between document images and other scanned images is not very clear. In this paper, we refer to images with text contained in a homogeneous background as document images.

Fig. 1. Grayscale document images: (a) single-column text from a book, (b) a two-column page from a journal (IEEE Transactions on PAMI), and (c) an electrical drawing (courtesy of Lu [24]).

Fig. 2. Multi-color document images: each text line may or may not be of the same color.

Fig. 3. Images with caption text: (a) shows captions overlaid directly on the background; (b) and (c) contain text in frames for better contrast; (c) contains a text string that is polychrome.

Fig. 4. Scene text images: images with variations in skew, perspective, blur, illumination, and alignment.
Fig. 4. Scene text images: Images with variations in skew, perspective, blur, illumination, and alignment.
Before we attempt to classify the various techniques used in TIE, it is important to define the commonly used terms and summarize the characteristics2 of text that can be used for TIE algorithms. Table 1 shows a list of properties that have been utilized in recently published algorithms [25-30]. Text in images can exhibit many variations with respect to the following properties:
1. Geometry: