07-09-2016, 11:34 AM
Abstract- Data compression refers to reducing the quantity of data used to represent a file, image
or video content without excessively reducing the quality of the original data. It is of key
importance in today’s technological world for the efficient transmission and storage of various
types of data, and it is one of the major technologies behind today’s information revolution.
Evolving technologies like 3G, 4G and LTE have made it necessary to deliver large volumes of data
to every end user.
Delivering data at this scale would not be possible without efficient and reliable data compression
techniques. Files, images, audio and video clips are compressed before being transmitted through a
communication channel. This saves storage space, and the reduced file sizes occupy less
bandwidth, so resources are used more efficiently.
In this thesis, we compared two major entropy coding techniques, namely arithmetic coding and
Huffman coding, which we implemented in MATLAB. Experimental results on text files show that
arithmetic coding achieves a higher compression ratio than Huffman coding, whereas Huffman
coding is simpler, easier to implement and faster, because arithmetic coding involves more
complex operations and a longer execution time.
Introduction:
Coding is the process of converting data of any kind into bits so that it can be transmitted
efficiently and effectively through a communication channel. The present era can fairly be
called the “Digital Era”, in which virtually every kind of data is digitized before transmission.
Coding represents or converts source data into binary form using different algorithms, while
preserving data integrity, privacy, authentication, security and so on.
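As a minimal illustration of converting source data into binary form, the Python sketch below writes each character as a fixed-length 8-bit ASCII code word. The choice of 8-bit ASCII as the uncoded baseline and the function name `to_bits` are assumptions for illustration; this fixed-length form is what source coding later tries to shorten.

```python
def to_bits(text: str) -> str:
    """Return the fixed-length 8-bit ASCII binary representation of text (hypothetical helper)."""
    # Every character becomes one 8-bit code word, no matter how often it occurs.
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


bits = to_bits("MISSISSIPI")
print(len(bits))   # 80 (10 characters x 8 bits)
print(bits[:16])   # the code words for 'M' and 'I'
```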
Many different coding algorithms are currently in use across the globe. In this thesis, we are
concerned with source coding, though we will touch on channel coding as well. Some well-known
source coding techniques are arithmetic coding, Huffman coding and LZW coding.
Source coding is also referred to as data compression. The basic purpose of source coding is to
compress the data as much as possible in order to reduce the file size and the traffic or load on
the communication link. Each source coding technique has its own advantages and disadvantages,
depending on the type of data being compressed. For example, Lempel-Ziv-Welch (LZW) coding is
effective only when the data contains repeated sequences, so that the basic dictionary used to
encode the characters in a file can be extended with longer entries.
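To show how LZW extends its dictionary while reading the input, here is a minimal encoder sketch in Python. It assumes the initial dictionary holds the 256 single-byte characters and leaves output codes as plain integers (no fixed bit width or code packing); `lzw_encode` is a hypothetical name, not taken from this thesis.

```python
def lzw_encode(text: str) -> list[int]:
    """Minimal LZW encoder: returns a list of dictionary indices (assumes 8-bit characters)."""
    # Initial dictionary: every single 8-bit character maps to its byte value.
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the current match while it is still in the dictionary.
            current = candidate
        else:
            # Emit the code of the longest match, then extend the dictionary.
            output.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


print(lzw_encode("ABABABA"))  # [65, 66, 256, 258]
print(lzw_encode("ABC"))      # [65, 66, 67]
```

On "ABABABA" the encoder emits four codes for seven characters, because the entries 256 ("AB") and 258 ("ABA") were added during encoding. On "ABC", which has no repeated sequences, it emits one code per character and gains nothing.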
Data Compression:
Data compression is one of the most important topics in communications. It reduces the number
of bits required to store or transmit information. Transmitting data using fewer bits reduces the
size of the file sent over a communication channel, so the data can be sent faster, and the
receiver can download it in much less time than if it had not been compressed.
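The reduction in bits can be measured directly. The sketch below uses Python's standard `zlib` module (DEFLATE, a general-purpose lossless compressor chosen here only for illustration, not one of the coders studied in this thesis) and reports the compression ratio as original size divided by compressed size. The helper name `compression_ratio` and the sample text are hypothetical.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return original size / compressed size using zlib (DEFLATE) at the given level."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Highly repetitive text compresses well, so the ratio is well above 1.
sample = b"data compression " * 100
print(len(sample), "bytes before,", len(zlib.compress(sample, 9)), "bytes after")
print(round(compression_ratio(sample), 1))
```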
In this chapter we discuss various compression techniques used for the efficient transmission of
data over a communication channel. We also discuss the two major classes of data compression,
namely lossless and lossy data compression.
Data compression uses various algorithms to reduce the number of bits required to store or
transmit information. Sending data over a communication channel without compression has
several disadvantages: more channel bandwidth is consumed, more time is required for processing
and transmission, and more resources are wasted. Thus, it has become standard practice to
compress almost every kind of data before transmitting it.
Compression is used everywhere. The images we receive over the internet are compressed in
JPEG, GIF or PNG formats. Similarly, the academic and non-academic files we send to our friends
are often compressed, for example into ZIP archives.
Data compression is divided into two major categories: Lossless and Lossy compression.
Lossless Data Compression
As the name implies, in this type of compression no information is lost at all: the original data
can be exactly recovered from the compressed data. This type of compression is used for
applications that cannot tolerate any difference between the original and the decompressed data.
Entropy encoding is a typical example of lossless data compression.
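Entropy encoders are bounded by the Shannon entropy of the source, H = -Σ p_i log2(p_i) bits per symbol: no uniquely decodable code that encodes one symbol at a time can have a shorter average code length. The minimal Python sketch below estimates H from the symbol frequencies of a message, under the simplifying assumption that symbols are independent and identically distributed; `entropy_bits_per_symbol` is a hypothetical name.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy (bits/symbol) of the empirical symbol distribution of message."""
    counts = Counter(message)
    n = len(message)
    # H = -sum(p * log2(p)) over every symbol that actually occurs.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


h = entropy_bits_per_symbol("MISSISSIPI")
print(f"{h:.3f} bits/symbol")  # 1.722 bits/symbol, versus 8 bits/character in ASCII
```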
Lossless data compression is mostly used for text, because information loss in text is
unacceptable. It is used in many applications, e.g. the ZIP file format.
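The ZIP format is lossless: extracting an archive returns byte-for-byte identical content. A minimal in-memory check using Python's standard `zipfile` module with DEFLATE compression is sketched below; the member name `sample.txt` and the sample text are arbitrary choices for illustration.

```python
import io
import zipfile

original = ("Lossless compression must reproduce text exactly. " * 50).encode("utf-8")

# Write the text into an in-memory ZIP archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.txt", original)

# Read it back and confirm that nothing was lost.
with zipfile.ZipFile(buffer, "r") as archive:
    restored = archive.read("sample.txt")
    info = archive.getinfo("sample.txt")

print(restored == original)                # True
print(info.file_size, info.compress_size)  # original vs compressed size in bytes
```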
Lossy Data Compression
It is a type of data compression in which less important parts of the message are discarded and
more important parts are retained, so some of the information is lost upon decompression.
Multimedia compression (audio, video and images) is mostly lossy; typical applications include
streaming media and internet telephony.
The JPEG image format and the MP3 audio format are examples of lossy compression.
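Lossy schemes deliberately discard detail. The Python sketch below is only a toy illustration, not how JPEG or MP3 actually work (they rely on transforms and perceptual models): it uniformly quantizes 8-bit sample values with a coarser step, so fewer distinct levels need to be coded, after which the exact originals cannot be recovered. The step size of 16 and the function names are assumptions.

```python
def quantize(samples: list[int], step: int = 16) -> list[int]:
    """Map each sample to the index of its quantization bin (this discards information)."""
    return [s // step for s in samples]


def dequantize(indices: list[int], step: int = 16) -> list[int]:
    """Reconstruct each sample as the centre of its bin."""
    return [i * step + step // 2 for i in indices]


original = [12, 13, 200, 201, 202, 90]
indices = quantize(original)   # 8-bit values (0..255) become 4-bit bin indices (0..15)
restored = dequantize(indices)
print(indices)   # [0, 0, 12, 12, 12, 5]
print(restored)  # [8, 8, 200, 200, 200, 88]
```

Each sample now needs only 4 bits and the reconstruction error is at most half a step (8 here), but 12 and 13 both come back as 8: that difference is lost for good.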
Data compression has many advantages, some of which are listed below:
1. Cost saving
2. Reduced storage requirements
3. Bandwidth saving
4. Lower probability of transmission errors
5. Security assurance
Huffman Coding
Huffman coding is a variable-length entropy encoding technique used in lossless data
compression. It was developed by David Huffman as a class assignment while he was a Ph.D.
student at MIT. In this type of coding, more frequently occurring letters are assigned shorter
code words and less frequently occurring letters are assigned longer code words.
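The standard Huffman construction repeatedly merges the two least frequent symbols (or subtrees) until a single tree remains; each symbol's code word is its path from the root. Below is a minimal Python sketch using the standard `heapq` module. The function name `huffman_code`, the labelling of left/right branches as 0/1, and the tie-breaking order are our own choices; a different tie-break (for example in a hand-worked table) can change individual code words but not the total encoded length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a Huffman code (symbol -> bit string) from the symbol frequencies in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit code word.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


text = "MISSISSIPI"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits instead of", 8 * len(text))  # 18 bits instead of 80
```

For this text, one of 'I' and 'S' receives a 1-bit code word and the other a 2-bit one, while the rare letters 'M' and 'P' receive 3 bits each, so the frequent letters get the short code words.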
We now illustrate Huffman coding with the help of an example.
Suppose we want to compress the text “MISSISSIPI” using Huffman coding.
First, each character is written down together with its frequency of occurrence in the text.