Huffman encoding compression

Abstract:

Data Compression is also known as a source coding. Encoding the information by using less bits is called Huffman encoding compression. Huffman encoding compression is the process of reducing the quantity to represent a file which could be image or video content without effecting the quality of a file.

Introduction:

The Huffman encoding is used to for lossless data compression. The number of bits required to transmit or store the files also less compared to the original storage. Huffman encoding compression is a technique where is no data loss. Huffman coding is purely based on the frequency count of pixel of a file. It reduces the count of frequently occurring pixels. The aim of the Huffman encoding compression is to reduce the number of bits that frequently occurs.

Basic Elements of Compression:

1. Digital data representation:

In the Digital data representation, the data representation is done in the form of 0’s and 1’s .The information could be images, text, video or audio. It uses the same number of bits to represent the characters to compress the data. The number of bits used to compress the data is always less than the bits used to represent the original bits.

2. Color representation:

Color representation is based on RGB, brightness and intensities. All the colors in this process is set to the maximum intensities to get the white color perception. The color detection is predicted by the how the human eye process the light. The data is represented in two dimensional format which represents the RGB (Red, yellow, Green).

3. Digitization:

The image which is captured by a light sensor must be digitized .There are different digitization techniques which are spatial sampling, Temporal sampling, Quantization. In the spatial sampling, the two dimensional sampling points are converted into single dimensional process through a rastar scanning process. In temporal sampling the videos are used to digitize. The digitization is done in24/7 process since the human perception is slow in video frames. In the quantization process the data can be compressed which has continuous values that cannot be digitized since the data might lose in such cases.

Huffman algorithm

It is a lossless data compression is used to reduce the number of bits to represent the data. The variable length code table is used to encode the data. The variable length code table is made by using probability of occurrence of each value of the symbol. It uses a specific method to represent the data while data compressed is called prefix free code. It is a technique to represent most frequently used characters using fewer bits. There is no other separate symbols are used to represent the same data so that it produces the smaller output size. For a given set of data with a uniform distribution and a number of members which is a power of two, Huffman is similar to the ASCII.

For a give set of data and the frequency occurrences, crate a binary tree based on the probability of occurance, which is the less frequency characters from left to right tree putting them together to form another equivalent symbol having a probability that equals the sum of the two symbols. Follow the same process till it becomes single character.

This is basically implemented by using binary tree and stored in the form of an array. The number of symbols in the array represents the size of an array. A node in the binary tree could be an internal node or a leaf node. In the beginning of a binary tree all nodes are leaf node that has only symbols and the internal nodes contains the symbols with weights. The internal nodes have two child nodes as well parent link. The left child is ‘0’ and the right child is ‘1’.Complete binary tree has N leaf nodes and the leaf node has N-1 internal nodes. Linear Huffman coding can be implemented using two queues since there are two weights have to be stored. The two weights are initial weights and combined weights.

Steps to create a binary tree:

1) Make the number of leaves as the number of symbols present in the data.

2) Enqueue all leaf nodes in the queue.

3) If more than one node present in the tree, dequeue the least weight node. Make a new internal node with the recently deleted two children. And move to the next queue which is second queue. The last node is the root node.

Let us consider an example, take five letters with the different frequencies like {a, b, c, d, e} and respective frequencies are {20, 15, 5, 15, 45}.

Solution: take the least weights 5 and 15 as n1 and then a and b as n2

Here n1=20,n2=35 and e=45

The next step is merge n1 and n2 that is n3=55, e=45.

Properties:

1) Prefix property is very useful for decoding and it is unambiguous.

2) Accurate

3) The power value of two should be negative to get the efficient encoding.

Applications:

1) Huffman coding is used in the arithmetic coding since it uses the binary coding.

2) It is used in the different domain because of its accuracy and it is very simple.

3) It is used in the multimedia compression models.

Advantages:

1) Easy to use

2) No loss of data

Disadvantages:

1) The compression model not only depends on the algorithm what we use but also on the data files.

2) If an image contains same number of pixels with the same frequency count then there is going to be data loss.

3) It is slow process since there are so many procedures we need to follow to compress the data.

4) The size of the data is one more constrain while compressing.

5) Huffman encoding table is required to decode.

Conclusion:

The Huffman coding is used to get the optimal results and it is efficiently used to compress the data compared to the other algorithms. It is very easy to implement and used in wide range of applications. The data loss is also less compared to the other algorithms. Especially this algorithm is used in multimedia like images, videos applications.

Reference:

1) Huffman coding, http://paper.ijcsns.org/07_book/201005/20100520.pdf

2) Analysis of Data Compression Techniques using Huffman Coding and Arithmetic Coding,http://home.cse.ust.hk/faculty/golin/COMP271Sp03/Notes/MyL17.pdf

3) Compression using Huffman coding,http://ijarcsse.com/Before_August_2017/docs/papers/Volume_6/5_May2016/V6I5-0357.pdf