Parallel Genetic Algorithm for Document Image Compression Optimization
Abstract
This work proposes a parallel genetic algorithm for
compressing scanned document images. A fitness
function based on the Hausdorff distance determines
the terminating condition. The algorithm also helps to
locate text lines. A higher compression ratio is
achieved with less distortion.
INTRODUCTION
Document image processing plays an important role in our
day-to-day life. Even though many offices are computerized,
piles of old documents still remain to be entered into the
digital world: survey and land records, documents in
registration offices, patent information, valuable manuscripts
on palm leaves, and so on. Document image processing is a
shortcut for bringing such documents into the digital world.
Document imaging helps to capture, store, search, and
retrieve images easily. Document image processing involves
segmentation, analysis, enhancement, compression,
reconstruction, and transmission of document images.
EARLIER WORKS AND PRESENT STATUS IN DOCUMENT
IMAGE COMPRESSION
Document image compression lets us use less storage
space and access data more easily. There are two types of
compression techniques: lossless and lossy. Lossy
compression is acceptable for ordinary digital images
because of the limitations of our eyes: even though certain
pixel portions are lost, the human eye can still interpret the
image. But this is not fully applicable to document image
compression. If some text portions are lost, the meaning may
change or the reader may be unable to deduce it.
Conventional approaches include Huffman coding, techniques
that exploit interpixel and psychovisual redundancy, and the
JPEG and JPEG2000 standards.
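To make the first of these concrete, a minimal Huffman coder can be sketched in a few lines. The `huffman_codes` helper and the sample string below are purely illustrative; they are not part of any of the cited systems.

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a Huffman code table for the symbols in `data`."""
    freq = Counter(data)
    # Heap entries are (frequency, tiebreaker, tree), where tree is a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: one distinct symbol
        return {heap[0][2]: "0"}
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # Merge the two least-frequent subtrees.
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2])
    return codes

text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
# More frequent symbols receive shorter (or equal-length) codes.
assert len(codes["a"]) <= len(codes["b"]) <= len(codes["c"])
```

The variable-length codes shrink frequent symbols: the seven-character sample encodes in 10 bits instead of 56 at one byte per character.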
B. F. Wu, C. C. Chin and Y. L. Chan [5] provide a method to
compress the text plane using a pattern-matching technique
called JB2; wavelet transform and zero-tree coding are used
to compress the background and the text's color plane in
their paper "Algorithm for compressing compound document
images with large text/background overlap". In their paper
"Binary image compression using identity mapping
backpropagation neural network" [1], Nabeel A. Murshed,
Flavio Bortolozzi and Robert Sabourin describe the
compression of handwritten signatures and their
reconstruction; they observed that the lowest and highest
reconstruction errors were 3.05 × 10⁻³ % and 0.01%,
respectively. Patrice Y. Simard, Henrique S. Malvar,
James Rinker and Erin Renshaw proposed a system known as
SLIm (Segmented Layered Image) in their paper "A
Foreground/Background Separation Algorithm for Image
Compression" [2] for separating text and line drawings from
background images, in order to compress both more
effectively.
COMPRESSION OPTIMIZATION USING PARALLEL
GENETIC ALGORITHM
For a long period, compressing document images was
relevant mainly for managing disk space. Now that storage
costs have fallen, the bottleneck lies in transmission time.
Advances in technology make it easy to retrieve information
and store files without much delay, but considering internet
traffic it is still relevant to reduce file sizes so that
information can be transferred at high speed; moreover, most
internet service providers price their plans per gigabyte of
downloadable data. In such situations, efficient lossless
compression algorithms that significantly compress document
images would help reduce congestion in the network. The
structure of a document image varies from language to
language, context to context, and content to content. Even
though document images share certain common features such
as line spacing, word spacing, and column spacing, there are
difficulties in identifying regions: boundaries may be
inexact, and edges may be blurred or incomplete.
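The paper's exact fitness formula is not reproduced here, but the idea of scoring a candidate compression by its compression ratio penalized by Hausdorff distortion can be sketched as follows. The `fitness` weighting and the point-set representation of binary images are assumptions for illustration, not the authors' actual formulation.

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets,
    each a set of (x, y) foreground-pixel coordinates."""
    def directed(P, Q):
        # Farthest that any point of P lies from its nearest point in Q.
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

def fitness(ratio, distortion):
    """Illustrative fitness: reward a high compression ratio,
    penalize Hausdorff distortion between original and
    reconstructed images (the weighting is an assumption)."""
    return ratio / (1.0 + distortion)

original = {(0, 0), (1, 0), (2, 0)}       # a short "text line"
reconstructed = {(0, 0), (1, 0), (2, 1)}  # one pixel displaced
d = hausdorff(original, reconstructed)
```

A genetic algorithm would evaluate each candidate encoding this way and stop once the Hausdorff distortion of the best individual falls below a threshold, matching the abstract's use of the distance as a terminating condition.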