Abstract- Data compression refers to reducing the quantity of data used to represent a file, image or video without excessively reducing the quality of the original data. It is of key importance in today's technological world for efficient transmission and storage of various types of data, and it is one of the major technologies behind today's information revolution. Evolving technologies like 3G, 4G and LTE have made it necessary to deliver large volumes of data to every end user.
None of this would be possible without efficient and reliable data compression techniques. Files, images, audio and video clips are compressed before being transmitted through a communication channel. This saves a great deal of space, and the reduced file sizes occupy less bandwidth, so resources are used efficiently.
In this thesis, we compare two major entropy coding techniques, namely Arithmetic and Huffman coding, and implement both in Matlab. The experimental results show that the compression ratio of Arithmetic coding, for text files as well as images, is better than that of Huffman coding, but that Arithmetic coding requires more complex operations and a longer execution time. Huffman coding is simple, easy to implement, and performs better in terms of speed.
Introduction:
Coding is a process in which any sort of data is converted into bits so that it can be transmitted efficiently and effectively through a communication channel. The present era can unambiguously be called the "Digital Era", in which every sort of data is digitized before transmission. Coding is a scheme for representing or converting source data into binary form using different algorithms, without sacrificing data integrity, privacy, authentication or security.
There are different coding algorithms currently in use across the globe. In this thesis we are concerned with source coding, though we will touch on channel coding as well. Some of the best-known source coding techniques are Arithmetic coding, Huffman coding and LZW coding.
Source coding is also referred to as data compression. The basic purpose of source coding is to compress the data as much as possible in order to reduce the file size and the traffic or load on the communication link and, most importantly, to protect the security of the data. Numerous source coding techniques are currently in use all over the world, such as Huffman coding, Arithmetic coding and LZW coding. Each of these techniques has its own advantages and disadvantages, which depend upon the type of data being compressed. For example, Lempel-Ziv-Welch (LZW) coding is useful only if we can extend the basic dictionary used to codify the characters in a file, as the sketch below illustrates.
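As an illustration of that dictionary extension, here is a minimal LZW encoder sketch in Python (our own illustrative code, not taken from the thesis implementation): the table starts from single characters and gains a new entry for every new phrase encountered.

def lzw_encode(data):
    # Base dictionary: one entry per single character (codes 0-255).
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in data:
        if w + ch in table:
            w += ch                      # keep growing the current phrase
        else:
            out.append(table[w])         # emit the code of the longest match
            table[w + ch] = len(table)   # extend the dictionary
            w = ch
    if w:
        out.append(table[w])
    return out

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))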
Data Compression:
Data compression is one of the most important topics in communications. It reduces the number of bits required to store or transmit information. Transmitting data using fewer bits reduces the size of the file being sent over a communication channel, so the data can be sent more quickly, and the receiver can download it in much less time than if it had not been compressed.
In this chapter we are going to discuss various compression techniques that are used for secure transmission of data over a communication channel. Furthermore, we will discuss the two major classes of data compression, namely lossless and lossy data compression.
Data compression uses several algorithms to reduce the number of bits required to store or transmit information. Sending data or information over any communication channel without compression has several disadvantages: more channel bandwidth is consumed, more time is required for processing and sending, and more resources are wasted. It has therefore become essential to compress every kind of data before transmitting it.
Compression is used everywhere. The images we receive over the internet are compressed in the JPEG, GIF or PNG formats, and the academic and non-academic files we send to our friends are usually compressed as well.
Data compression is divided into two major categories: Lossless and Lossy compression.
Lossless Data Compression
As the name implies, in this type of compression no information is lost at all: the original data can be exactly recovered from the compressed data. This type of compression is used for applications that cannot tolerate any difference between the original and the decompressed data. Entropy coding is the best example of lossless data compression.
Lossless data compression is used mostly for text, because information loss in text is unacceptable. It is used in many applications, e.g. the ZIP file format.
Lossy Data Compression
It is a type of data compression in which the less important part of the message is discarded and the more important part is retained, so some of the information is lost upon decompression. Multimedia compression (audio, video and images) is mostly lossy, and some of its applications are streaming media and internet telephony.
The JPEG image format and the MP3 audio format are examples of lossy compression.
Data compression has many advantages. Some of them are listed below:
1. Cost saving
2. Reduced storage requirements
3. Bandwidth saving
4. Lower probability of transmission errors, since fewer bits are sent
5. Security assurance
Huffman Coding
Huffman coding is one of the variable-length entropy coding techniques used in lossless data compression. The technique was developed by David Huffman as a class assignment when he was a Ph.D. student at MIT. In this type of coding, more frequently occurring symbols are assigned shorter codewords and less frequently occurring symbols are assigned longer codewords.
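To make this concrete, here is a minimal sketch of Huffman codebook construction in Python (our own illustration, not the Matlab code used in this thesis):

import heapq
from collections import Counter

def huffman_codes(text):
    # Count symbol frequencies; frequent symbols end up with shorter codewords.
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate one-symbol case
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: codeword so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}          # left branch: 0
        merged.update({s: "1" + c for s, c in right.items()})   # right branch: 1
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
print(codes)                                # 'a' receives the shortest codeword
print("".join(codes[s] for s in "abracadabra"))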
Arithmetic Coding
Arithmetic coding, like Huffman coding, is a form of entropy coding used for lossless data compression. It differs from other entropy coding techniques such as Huffman coding in that, rather than encoding each symbol separately with its own codeword, it encodes the entire message as a single unit.
Arithmetic coding provides an effective way to remove redundancy. We decompose the data into a sequence of events and encode the events using as few bits as possible: frequently occurring events are assigned shorter descriptions than less frequently occurring ones. Let e1, e2, ..., en be mutually exclusive events with probability distribution P. Shannon proved that the smallest possible expected number of bits needed to encode an event is the entropy of P, which can be written as
H(P) = −∑_(k=1)^n P{ek} log2 P{ek}
P{ek} is the probability of event ek.
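As a quick check of the formula, the following few lines of Python (our own illustration) compute H(P) for a small example distribution:

import math

def entropy(probs):
    # H(P) = -sum over k of P{ek} * log2 P{ek}: the minimum expected
    # number of bits per event.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits per event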
Algorithm for Arithmetic coding
The following algorithm is used for encoding a file with Arithmetic coding:
1. Begin with a current interval [L, H) initialized to [0, 1).
2. For each event in the file, perform two steps:
(a) Subdivide the current interval into subintervals, one for each possible event, making the size of each subinterval proportional to the probability of its event.
(b) Select the subinterval corresponding to the event that actually occurs next, and make it the new current interval.
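Below is a minimal sketch of this interval narrowing in Python, using exact fractions so that no precision is lost (illustrative code with our own names, not the thesis implementation):

from fractions import Fraction

def narrow_interval(probs, message):
    # Cumulative bounds per symbol, e.g. a -> [0, 1/2), b -> [1/2, 3/4), ...
    bounds, c = {}, Fraction(0)
    for sym, p in probs.items():
        bounds[sym] = (c, c + p)
        c += p
    low, high = Fraction(0), Fraction(1)
    for sym in message:                  # one narrowing step per event
        width = high - low
        lo_s, hi_s = bounds[sym]
        low, high = low + width * lo_s, low + width * hi_s
    return low, high                     # any number in [low, high) encodes the message

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
print(narrow_interval(probs, "abca"))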
As discussed above, Arithmetic coding assigns a sequence of bits to a message and treats the whole string as a single unit. In Arithmetic coding we do not necessarily assign an integer number of bits to each symbol, as we do in Huffman coding; rather, the number of bits follows the probabilities of the symbols in the alphabet. More bits are required for low-probability symbols and fewer bits for high-probability ones, as in Huffman coding. The main concept of Arithmetic coding is to assign each symbol an interval: we start with the interval [0, 1), divide the current interval into subintervals, take the chosen subinterval as the current interval for the next symbol, and finally output the interval of the last symbol.
Arithmetic Coding vs Huffman Coding
The main difference between Arithmetic and Huffman coding is that in Arithmetic coding we encode a sequence of symbols together, whereas in Huffman coding each symbol is encoded separately with a whole number of bits. Arithmetic coding can therefore spend, in effect, a fractional number of bits per symbol, and so achieves higher compression than Huffman coding.
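As an illustrative calculation, a symbol with probability 0.9 ideally needs only −log2(0.9) ≈ 0.152 bits, but a Huffman codeword for that symbol can never be shorter than 1 bit; arithmetic coding can effectively spend the fractional amount.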
Integer Implementation
Arithmetic coding works with fractional values, which is why it achieves a higher compression ratio than Huffman coding; the disadvantage is that the operations and computations become complex. An easier way is to use integer values instead of fractional values, which makes the computations simpler at the cost of some reduction in the compression ratio. In general this approach is not recommended when we need to compress the data as much as possible.
Arithmetic Encoding
Given below is the detail of the Arithmetic encoding procedure/algorithm.
Step 1: Initialize l = 00...0 and u = 11...1 (all zeros and all ones in m-bit registers, i.e. l = 0 and u = 2^m − 1) and Scale3 = 0.
Step 2: Get the next symbol x.
Step 3: Apply the following formulas to find the new sub-interval (both right-hand sides use the old values of l and u):
l ← l + ⌊((u − l + 1) × Cum_Count(x−1)) / Total_Count⌋
u ← l + ⌊((u − l + 1) × Cum_Count(x)) / Total_Count⌋ − 1
Step 4: Renormalize. The E3 condition holds when the interval straddles the midpoint, i.e. the MSB of l is 0 with its second bit 1, while the MSB of u is 1 with its second bit 0.
while (MSB(u) = MSB(l) or the E3 condition holds)
{
    if (MSB(u) = MSB(l))
    {
        Send MSB.
        Shift l to the left by 1 bit and shift 0 into the LSB.
        Shift u to the left by 1 bit and shift 1 into the LSB.
        while (Scale3 > 0)
        {
            Send the complement of the MSB just sent.
            Decrement Scale3.
        }
    }
    if (the E3 condition holds)
    {
        Shift l to the left by 1 bit and shift 0 into the LSB.
        Shift u to the left by 1 bit and shift 1 into the LSB.
        Complement the (new) MSB of l and u.
        Increment Scale3.
    }
}
Repeat from Step 2 until all the symbols in the file have been encoded. (Note that in the E3 branch no bit is sent; the deferred bits are sent later, as complements, by the MSB branch.)
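For concreteness, the following is a runnable Python sketch of the encoder above. It is our own illustrative translation of the pseudocode: symbols are integer indices, Cum_Count is kept as a cumulative list, the E3 test is written out explicitly, and the final flushing of the interval (which the pseudocode leaves implicit) follows the common Witten-Neal-Cleary termination.

M = 16                        # register width m in bits
FULL = (1 << M) - 1           # u's initial value: 11...1
HALF = 1 << (M - 1)           # weight of the MSB
QUARTER = 1 << (M - 2)        # weight of the second MSB

def arith_encode(seq, counts):
    # Encode a list of symbol indices using the integer algorithm above.
    total = sum(counts)
    assert total <= QUARTER, "Total_Count must fit in m-2 bits"
    cum = [0]
    for c in counts:              # cum[k+1] = Cum_Count(k)
        cum.append(cum[-1] + c)
    l, u, scale3, out = 0, FULL, 0, []

    def emit(bit):                # send a bit, then any complements
        nonlocal scale3           # deferred by earlier E3 mappings
        out.append(bit)
        while scale3 > 0:
            out.append(1 - bit)
            scale3 -= 1

    for x in seq:
        width = u - l + 1         # Step 3: both updates use the old l and u
        u = l + (width * cum[x + 1]) // total - 1
        l = l + (width * cum[x]) // total
        while True:               # Step 4: renormalize
            if u < HALF:          # MSBs of l and u are both 0: send 0
                emit(0)
            elif l >= HALF:       # MSBs are both 1: send 1 and drop it
                emit(1)
                l -= HALF
                u -= HALF
            elif l >= QUARTER and u < HALF + QUARTER:   # E3 condition
                scale3 += 1       # defer the bit and complement the MSBs
                l -= QUARTER
                u -= QUARTER
            else:
                break
            l = 2 * l             # shift 0 into the LSB of l
            u = 2 * u + 1         # shift 1 into the LSB of u
    scale3 += 1                   # termination: two more bits (plus any
    emit(0 if l < QUARTER else 1) # deferred ones) pin down the final interval
    return out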
Arithmetic Decoding
Given below is the Arithmetic Decoding algorithm:
Step 1: Initialize l = 00...0 and u = 11...1 (m-bit registers, i.e. l = 0 and u = 2^m − 1).
Step 2: Read the first m bits of the received bitstream into the tag t.
Step 3: k = 0
while ( ⌊((t − l + 1) × Total_Count − 1) / (u − l + 1)⌋ ≥ Cum_Count(k) )
    k ← k + 1
Decode symbol x = k.
Step 4: Apply the following formulas to find the new sub-interval (both right-hand sides use the old values of l and u):
l ← l + ⌊((u − l + 1) × Cum_Count(x−1)) / Total_Count⌋
u ← l + ⌊((u − l + 1) × Cum_Count(x)) / Total_Count⌋ − 1
Step 5:
while (MSB(u) = MSB(l) or the E3 condition holds)
{
    if (MSB(u) = MSB(l))
    {
        Shift l to the left by 1 bit and shift 0 into the LSB.
        Shift u to the left by 1 bit and shift 1 into the LSB.
        Shift t to the left by 1 bit and read the next bit of the received bitstream into the LSB.
    }
    if (the E3 condition holds)
    {
        Shift l to the left by 1 bit and shift 0 into the LSB.
        Shift u to the left by 1 bit and shift 1 into the LSB.
        Shift t to the left by 1 bit and read the next bit of the received bitstream into the LSB.
        Complement the (new) MSB of l, u and t.
    }
}
Repeat from Step 3 until all the symbols have been decoded.
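A matching Python sketch of the decoder, under the same illustrative conventions and constants as the encoder sketch above; bits requested past the end of the received stream are read as 0, which is safe with the termination used by the encoder. A small round trip follows.

def arith_decode(bits, counts, n):
    # Decode n symbol indices from the output of arith_encode.
    total = sum(counts)
    cum = [0]
    for c in counts:              # cum[k+1] = Cum_Count(k)
        cum.append(cum[-1] + c)
    pos = 0

    def next_bit():               # bits past the end are read as 0
        nonlocal pos
        b = bits[pos] if pos < len(bits) else 0
        pos += 1
        return b

    l, u, t = 0, FULL, 0
    for _ in range(M):            # Step 2: read the first m bits into the tag t
        t = 2 * t + next_bit()
    out = []
    for _ in range(n):
        width = u - l + 1
        value = ((t - l + 1) * total - 1) // width
        k = 0                     # Step 3: smallest k with value < Cum_Count(k)
        while cum[k + 1] <= value:
            k += 1
        out.append(k)
        u = l + (width * cum[k + 1]) // total - 1   # Step 4
        l = l + (width * cum[k]) // total
        while True:               # Step 5: mirror the encoder's renormalization
            if u < HALF:
                pass                              # MSBs both 0
            elif l >= HALF:                       # MSBs both 1
                l -= HALF; u -= HALF; t -= HALF
            elif l >= QUARTER and u < HALF + QUARTER:   # E3 condition
                l -= QUARTER; u -= QUARTER; t -= QUARTER
            else:
                break
            l, u = 2 * l, 2 * u + 1
            t = 2 * t + next_bit()
    return out

# Round trip with hypothetical counts for a three-symbol alphabet:
counts = [5, 2, 1]
msg = [0, 1, 0, 2, 0, 0, 1, 0]
bits = arith_encode(msg, counts)
print(arith_decode(bits, counts, len(msg)) == msg)   # True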
Experimental Results
Testing different images with Huffman and Arithmetic coding yields different results. The table below shows that the compression ratio of Arithmetic coding for the test images is higher than that of Huffman coding, but the processing time of Arithmetic coding is also longer. In other words, Arithmetic coding achieves a higher compression ratio at the expense of a longer execution time.
The table also shows that as the image size increases, the compression ratio of Arithmetic coding improves much more than that of Huffman coding. For example, the compression ratio of Huffman coding increases from 4.38 bits/sample to just 6.37 bits/sample as the image size grows from 128×128 to 2048×2048, whereas Arithmetic coding improves from 4.65 bits/sample to 12.02 bits/sample over the same range.
Similarly, the execution times of both Huffman and Arithmetic compression are evident from the table: Arithmetic compression requires much more execution time than Huffman compression. Arithmetic coding is therefore considered much more complex and time-consuming than Huffman coding, which is one of its disadvantages.