DATA COMPRESSION TECHNIQUES
Introduction
Data compression is the process of encoding data so that it takes less storage space or less transmission time than it would if it were not compressed.
Compression is possible because most real-world data is highly redundant.
Compression Techniques
LOSSLESS COMPRESSION
In lossless data compression, the integrity of the data is preserved: the original data and the data after compression and decompression are exactly the same.
The compression and decompression algorithms are exact inverses of each other: no part of the data is lost in the process.
Redundant data is removed during compression and added back during decompression.
Lossless compression methods are normally used when we cannot afford to lose any data.
Run-length encoding
It is probably the simplest method of compression. It can be used to compress data made of any combination of symbols.
The general idea behind this method is to replace consecutive repeating occurrences of a symbol by one occurrence of the symbol followed by the number of occurrences.
The method can be even more efficient if the data uses only two symbols (for example 0 and 1) in its bit pattern and one symbol is more frequent than the other.
Run-length encoding example
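As a minimal sketch of the idea (the function names and the sample string are illustrative assumptions, not taken from the slides), each run of a repeated symbol can be replaced by the symbol and its count:

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run of a repeated symbol with a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


sample = "AAAABBBCCD"          # illustrative input
encoded = rle_encode(sample)   # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
restored = rle_decode(encoded)
```

Decoding the pairs gives back the input exactly, which is what makes the method lossless. Note that data with few repeated symbols can grow rather than shrink under this scheme.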
Run-length encoding for two symbols
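One way to sketch the two-symbol case, assuming 0 is the more frequent symbol, is to record only how many 0s occur before each 1. The function names and the handling of trailing 0s are assumptions made for this illustration.

```python
def encode_zero_runs(bits):
    """Return (runs, trailing): the number of 0s before each 1,
    plus the number of 0s left over after the last 1."""
    runs = []
    count = 0
    for bit in bits:
        if bit == "0":
            count += 1
        else:
            runs.append(count)  # a 1 closes the current run of 0s
            count = 0
    return runs, count


def decode_zero_runs(runs, trailing):
    """Rebuild the bit pattern from the run lengths."""
    return "".join("0" * n + "1" for n in runs) + "0" * trailing


pattern = "000100001100"                   # illustrative bit pattern
runs, trailing = encode_zero_runs(pattern)  # ([3, 4, 0], 2)
```

A run length of 0 means two 1s are adjacent. In a real encoder each count would then be written with a fixed number of bits.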
Huffman coding
It assigns shorter codes to symbols that occur more frequently and longer codes to those that occur less frequently.
For example, imagine we have a text file that uses only five characters (A, B, C, D, E).
Before we can assign bit patterns to each character, we assign each character a weight based on its frequency of use.
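The weighting step can be sketched as follows. The weights below are assumed values chosen only for illustration, and `huffman_codes` is a hypothetical helper that repeatedly merges the two lightest nodes, which is the usual way a Huffman tree is built.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Build a prefix code from a {symbol: weight} mapping."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest node
        w2, _, right = heapq.heappop(heap)  # second lightest node
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: a symbol
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


# Assumed frequencies for the five characters (illustrative only).
weights = {"A": 17, "B": 12, "C": 12, "D": 27, "E": 32}
codes = huffman_codes(weights)
```

With these assumed weights, A, D and E receive 2-bit codes while the less frequent B and C receive 3-bit codes, and no code is a prefix of another.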
Lempel Ziv encoding
It is an example of a category of algorithms called dictionary-based encoding.
The idea is to create a dictionary (a table) of strings used during the communication session.
If both the sender and the receiver have a copy of the dictionary, then previously encountered strings can be substituted by their index in the dictionary to reduce the amount of information transmitted.
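A minimal sketch of dictionary-based encoding, assuming an LZ78-style variant in which each output item is a pair (index of the longest known string, next character). The function names and the sample text are illustrative assumptions. The receiver rebuilds the same dictionary from the pairs, so the dictionary itself never has to be sent.

```python
def lz_compress(text):
    """Emit (dictionary index, next character) pairs, LZ78 style."""
    dictionary = {"": 0}  # index 0 stands for the empty string
    output = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                     # keep extending a known string
        else:
            output.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover input that is already in the dictionary.
        output.append((dictionary[current[:-1]], current[-1]))
    return output


def lz_decompress(pairs):
    """Rebuild the text, reconstructing the dictionary on the fly."""
    entries = [""]
    pieces = []
    for index, ch in pairs:
        phrase = entries[index] + ch
        entries.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


message = "BAABABBBAABBBBAA"   # illustrative input
pairs = lz_compress(message)
```

For this input the encoder parses the text into the strings B, A, AB, ABB, BA, ABBB and BAA, each expressed as an earlier entry plus one new character.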
LOSSY COMPRESSION METHODS
Our eyes and ears cannot distinguish subtle changes. In such cases, we can use a lossy data compression method.
These methods are cheaper: they take less time and space when it comes to sending millions of bits per second for images and video.
Several methods have been developed using lossy compression techniques.
JPEG (Joint Photographic Experts Group) encoding is used to compress pictures and graphics.
MPEG (Moving Picture Experts Group) encoding is used to compress video.
MP3 (MPEG audio layer 3) encoding is used to compress audio.
JPEG encoding
An image can be represented by a two-dimensional array (table) of picture elements (pixels).
A grayscale picture of 307,200 pixels, at 8 bits per pixel, is represented by 2,457,600 bits, and a color picture, at 24 bits per pixel, is represented by 7,372,800 bits.
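The figures above can be checked with a few lines of arithmetic. The 640 × 480 dimensions are an assumption (they give exactly 307,200 pixels), as are the 8 and 24 bits per pixel for grayscale and color.

```python
# Assumed image dimensions: 640 x 480 = 307,200 pixels.
width, height = 640, 480
pixels = width * height

grayscale_bits = pixels * 8    # assumed 8 bits per grayscale pixel
color_bits = pixels * 24       # assumed 24 bits per color pixel (8 each for R, G, B)
```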
In JPEG, a grayscale picture is divided into blocks of 8 × 8 pixels to decrease the number of calculations.
The whole idea of JPEG is to change the picture into a linear (vector) set of numbers that reveals the redundancies. The redundancies (lack of changes) can then be removed using one of the lossless compression methods we studied previously. A simplified version of the process is shown in Figure 15.11.
Discrete cosine transform (DCT)
In this step, each block of 64 pixels goes through a transformation called the discrete cosine transform (DCT).
The transformation changes the 64 values so that the relative relationships between pixels are kept but the redundancies are revealed.
The formula is given in Appendix G. P(x, y) defines one value in the block, while T(m, n) defines the value in the transformed block.
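Since the appendix is not reproduced here, the sketch below assumes the standard two-dimensional DCT (type II) on an 8 × 8 block, using the same names P(x, y) and T(m, n). It is a direct, unoptimized illustration rather than the way production encoders compute it.

```python
import math

SIZE = 8  # JPEG block size


def dct_8x8(p):
    """Transform an 8 x 8 block P(x, y) into T(m, n) with the 2-D DCT-II."""

    def c(i):
        # Normalization factor for the first row and column.
        return 1 / math.sqrt(2) if i == 0 else 1.0

    t = [[0.0] * SIZE for _ in range(SIZE)]
    for m in range(SIZE):
        for n in range(SIZE):
            total = 0.0
            for x in range(SIZE):
                for y in range(SIZE):
                    total += (
                        p[x][y]
                        * math.cos((2 * x + 1) * m * math.pi / 16)
                        * math.cos((2 * y + 1) * n * math.pi / 16)
                    )
            t[m][n] = 0.25 * c(m) * c(n) * total
    return t


# A uniform block (every pixel has the value 20): all redundancy, no detail.
uniform = [[20] * SIZE for _ in range(SIZE)]
transformed = dct_8x8(uniform)
```

For the uniform block, T(0, 0) comes out as 160 and every other entry is zero up to floating-point rounding, which is how the transform exposes redundancy.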
Quantization
After the T table is created, the values are quantized to reduce the number of bits needed for encoding.
Quantization divides each value by a constant and then drops the fraction. This reduces the required number of bits even more.
In most implementations, a quantizing table (8 by 8) defines how to quantize each value.
This is done to optimize the number of bits and the number of 0s for each particular application.
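A minimal sketch of table-driven quantization follows. The quantizing table here is hypothetical, chosen only so that the divisor grows toward the bottom-right corner; real applications use their own tables.

```python
import math

SIZE = 8

# Hypothetical quantizing table: larger divisors for higher-frequency entries.
q_table = [[8 + 4 * (m + n) for n in range(SIZE)] for m in range(SIZE)]


def quantize(t, q):
    """Divide each T(m, n) by Q(m, n) and drop the fraction."""
    return [
        [math.trunc(t[m][n] / q[m][n]) for n in range(SIZE)]
        for m in range(SIZE)
    ]


# Illustrative transformed block: a few non-zero values, the rest zero.
t_table = [[0.0] * SIZE for _ in range(SIZE)]
t_table[0][0] = 160.0
t_table[0][1] = -30.0
t_table[7][7] = 5.0

quantized = quantize(t_table, q_table)
```

Small values divided by a large constant become 0, which is where the loss of information occurs: the dropped fraction cannot be recovered on decompression.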
Compression
After quantization, the values are read from the table, and redundant 0s are removed.
To cluster the 0s together, the process reads the table diagonally in a zigzag fashion rather than row by row or column by column.
JPEG usually uses run-length encoding at the compression phase to compress the bit pattern resulting from the zigzag linearization.
Zigzag linearization
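The diagonal reading order can be sketched as below. The function names are illustrative, and the traversal direction follows the conventional JPEG zigzag, which is assumed here.

```python
def zigzag_order(size=8):
    """Return (row, column) positions in zigzag order for a size x size table."""

    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left (row increasing);
        # even diagonals run bottom-left to top-right (column increasing).
        return (diagonal, row if diagonal % 2 else col)

    positions = [(r, c) for r in range(size) for c in range(size)]
    return sorted(positions, key=key)


def zigzag_read(table):
    """Flatten a square table into a list by reading it in zigzag order."""
    return [table[r][c] for r, c in zigzag_order(len(table))]


# Label each cell with its row-major index to make the order visible.
labels = [[r * 8 + c for c in range(8)] for r in range(8)]
linear = zigzag_read(labels)   # starts 0, 1, 8, 16, 9, 2, 3, 10, ...
```

Because quantization tends to zero out the values toward the bottom-right of the table, this order places most of the 0s in one long run at the end of the list, where run-length encoding handles them efficiently.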
MPEG encoding
The Moving Picture Experts Group (MPEG) method is used to compress video.
A motion picture is a rapid sequence of a set of frames in which each frame is a picture.
In other words, a frame is a spatial combination of pixels, and a video is a temporal combination of frames that are sent one after another.
MPEG frames
MP3 compression
Audio compression can be used for speech or music. For speech we need to compress a 64 kbps digitized signal, while for music we need to compress a 1.411 Mbps signal.
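The 64 and 1.411 figures above match the bit rates (in kbps and Mbps) obtained from common digitization parameters. The sampling rates and sample sizes below are assumptions based on telephone-quality speech and CD-quality stereo music.

```python
# Speech: assumed 8,000 samples per second, 8 bits per sample.
speech_bps = 8_000 * 8           # 64,000 bits per second

# Music: assumed 44,100 samples per second, 16 bits per sample, 2 channels.
music_bps = 44_100 * 16 * 2      # 1,411,200 bits per second
```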
Two categories of techniques are used for audio compression:
1. Predictive encoding
2. Perceptual encoding