10-12-2012, 01:08 PM
A Novel Method for Super Imposed Text Extraction in a Sports Video
ABSTRACT
Video is one of the richest sources of valuable
information: it combines a sequence of images, audio, and
text. Text present in video carries useful information for
automatic annotation, structuring, mining, indexing, and
retrieval of video. Nowadays, mechanically added
(superimposed) text in video sequences provides useful
information about their content; it supplies supplemental but
important cues for video indexing and retrieval. A large
number of techniques have been proposed to address this
problem. This paper presents a novel method for detecting
video text regions containing player information and scores in
sports videos, and proposes an improved algorithm for the
automatic extraction of superimposed text in sports video.
First, we identify key frames in the video using a color
histogram technique to minimize the number of video frames.
The key frames are then converted to grayscale images for
efficient text detection. Since superimposed text in sports
video is generally displayed in the bottom part of the frame,
we crop the region of the grayscale image that contains the
text information. We then apply the Canny edge detection
algorithm to detect text edges. ESPN cricket video data was
used for our experiments, and the superimposed text region
was extracted from the sports video. Using an OCR tool, the
text-region image was converted to ASCII text and the result
was verified.
INTRODUCTION
The increasing availability of video data has rekindled interest in
how to index video information automatically and how to
browse and manipulate it efficiently. Traditionally, images and
video sequences have been manually annotated with a small
number of keyword descriptors after visual inspection by a
human reviewer, a process that can be very time consuming.
Text information retrieval from video images (video
annotation) has become an increasingly important research
area in recent years for video information retrieval and video
mining applications. Detection and recognition of text
captions embedded in video frames is an important step for
information retrieval and indexing in video images [1].
Recognizing text information directly from video provides
unique benefits.
BACKGROUND
Observation of TV programs shows that superimposed text is
normally displayed as a single line. Superimposed text falls
into two categories: (i) moving text in the video frame, e.g.,
advertisement messages and flash news that may scroll from
left to right or right to left. In general, text motion can be
divided into three classes: static, simple linear motion (for
example, scrolling movie credits), and complex nonlinear
motion (for example, zooming in/out, rotation, or free
movement of scene text). (ii) Non-moving text in a video
frame, e.g., a sports scoreboard that updates the score display
without moving.
Algorithms for text detection can be classified into two
categories [7], [8]: those working in the compressed domain
and those working in the spatial domain. The compressed-domain
approach covers both fully compressed and
semi-compressed data. It is based on localizing static characters
over a moving background by examining the macroblocks
belonging to P-frames. Moreover, it assumes that the text
has horizontal geometry, that it does not occupy the whole
frame, and that it appears in at least three frames. These
three constraints allow the algorithm to isolate macroblocks and
to determine whether they are candidates to contain
text.
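The temporal-persistence constraint described above (text must appear in at least three frames) can be illustrated in isolation. The following is a minimal sketch, not the cited algorithm itself: it assumes we already have, per frame, a boolean grid marking which macroblock positions look like text candidates, and keeps only positions flagged in some run of consecutive frames.

```python
import numpy as np

def persistent_text_blocks(candidate_masks, min_frames=3):
    """Keep only macroblock positions flagged as text candidates in at
    least `min_frames` consecutive frames.

    candidate_masks: boolean array of shape (n_frames, rows, cols), one
    entry per macroblock per frame. Illustrative sketch of the
    persistence constraint only; candidate generation (P-frame
    analysis) is out of scope here.
    """
    masks = np.asarray(candidate_masks, dtype=bool)
    n = masks.shape[0]
    result = np.zeros(masks.shape[1:], dtype=bool)
    if n < min_frames:
        return result
    # A block qualifies if some window of `min_frames` consecutive
    # frames all flag it as a candidate.
    for start in range(n - min_frames + 1):
        window = masks[start:start + min_frames]
        result |= window.all(axis=0)
    return result
```

A block that flickers in and out (e.g., scene text visible in a single frame) is rejected, while a static overlay that persists across the window survives.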
METHODOLOGY
In this paper, we provide a robust detection method for
superimposed text in sports videos and present an efficient
algorithm for extracting the superimposed text region from
video. The extraction of superimposed text from a video
frame consists of six steps: 1. Video Frame
Extraction; 2. Key Frame Extraction; 3. Grayscale
Conversion; 4. Cropping the Video Image; 5. Canny Edge
Detection; 6. Text Region Retrieval.
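Steps 4 and 5 above can be sketched as follows. This is a minimal numpy-only illustration: the crop fraction of 0.25 is an assumed value (the paper does not state one), and a Sobel gradient-magnitude threshold stands in for the full Canny detector (which adds smoothing, non-maximum suppression, and hysteresis).

```python
import numpy as np

def crop_bottom_region(gray, fraction=0.25):
    """Keep the bottom `fraction` of the frame, where sports overlays
    (score, player information) are usually rendered. The fraction is
    an illustrative default, not a value from the paper."""
    h = gray.shape[0]
    return gray[int(h * (1 - fraction)):, :]

def sobel_edges(gray, threshold=50.0):
    """Small stand-in for Canny edge detection: gradient magnitude via
    Sobel kernels, then a single global threshold."""
    g = gray.astype(float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(g, 1, mode="edge")
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy) > threshold
```

Superimposed captions have sharp, high-contrast strokes, so their pixels produce dense edge responses in the cropped region, which is what the later text-region retrieval step exploits.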
A video consists of a sequence of images (video frames). In
the first step, we extract all frames from the video and save
them as JPEG images. In the second step, key-frame selection
is performed to reduce the number of frames. We apply a
histogram technique to select the key frames: a frame is kept
as a key frame when its histogram difference from the
previous frame, computed by adjacent-frame subtraction,
exceeds a given threshold. The threshold value is chosen
depending on the video domain. The next step is grayscale
conversion, which uses the equation gs =
red*0.5 + green*0.3 + blue*0.2 to convert each key frame into a
grayscale image. Grayscale images are easier to process:
converting the color picture to gray raises processing speed
and reduces resource consumption.
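The key-frame selection and grayscale conversion described above can be sketched as follows. The grayscale weights are the ones quoted in the paper (note they differ from the common ITU-R 601 weights 0.299/0.587/0.114); the histogram bin count and the threshold of 0.3 are illustrative assumptions, since the paper says only that the threshold is chosen per video domain.

```python
import numpy as np

def to_grayscale(frame):
    """Grayscale conversion with the paper's weights:
    gs = 0.5*R + 0.3*G + 0.2*B."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return 0.5 * r + 0.3 * g + 0.2 * b

def histogram_difference(a, b, bins=64):
    """Sum of absolute differences between the normalized intensity
    histograms of two grayscale frames."""
    ha, _ = np.histogram(a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(b, bins=bins, range=(0, 256))
    return np.abs(ha / ha.sum() - hb / hb.sum()).sum()

def select_key_frames(frames, threshold=0.3):
    """Keep the first frame, then every frame whose histogram differs
    from its immediate predecessor by more than `threshold`
    (adjacent-frame subtraction, as in the paper). The threshold is an
    assumed value; the paper selects it per video domain."""
    gray = [to_grayscale(f) for f in frames]
    keys = [0]
    for i in range(1, len(gray)):
        if histogram_difference(gray[i - 1], gray[i]) > threshold:
            keys.append(i)
    return keys
```

For a scoreboard overlay, most consecutive frames are near-identical, so this filter discards them and keeps only frames where the picture (and possibly the score) changes noticeably.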
CONCLUSION
There are numerous applications of a video text information
extraction system, including vehicle license plate extraction
(surveillance video data), text-based video indexing, video
content analysis, and video event identification. In this paper,
we proposed a robust method for extracting text information
from sports videos. Our proposed algorithm can only detect
video text near the boundary of the image. In general, there is
little valuable information in non-play shots, because these
shots cover periods when the game is temporarily halted by
scoring, a no-run play, or a time-out, and do not carry any
meaning related to the content of the game. In future
research, we intend to extract only the play shots, and to
combine audio information such as applause and cheering
to generate more exciting sports summaries. However, the text
extraction results are still inappropriate for general OCR software:
text enhancement is needed for low-quality video images, and
more adaptability is required for general cases (e.g., inverse
characters, 2D or 3D deformed characters, polychrome
characters, and so on).