CREATE OWN LIBRARY

Huffman Coding

Back to Programming

Description

Overview


         Huffman Coding: It is a lossless data compression method which uses variable length character coding to encode data based on the probability or frequency of each character. Huffman coding uses prefix codes to encode data.

 

         Prefix code: Assignment of codewords to characters such that no codeword is a prefix of another. Prefix codes can be described as a binary tree where the codewords are the leaf nodes of a tree and the left branch means“0”and the right branch means “1” as shown in figure below. 



The prefix code can be decoded for a character by traversing from the root node to the node with that character. For example, in the figure above the prefix code of b can be obtained traversing from root to b. The prefix code is thus, 001

Prerequisites:

 ●     Heap tree 


Procedure:

Let us consider the following characters and their frequencies as shown in table below,



To generate huffman codes,

(1) build the mean heap of characters based on their frequencies.

(2) Extract two minimum nodes from the heap

(3) Build the huffman tree by creating a new internal node with the first extracted node as its left child and second one as its right child such that the of the internal node is equal to the sum of the frequency of the child nodes. 

(4) Finally, push the internal node into the min heap.  

 

Step 1:The table represents the min heap in tabular format where d is the root node.


We extract the two minimum nodes with minimum frequency from the min heap which are cand d. We create a tree with a new internal node represented by a dummy character$1 such that frequency($1) = frequency(c) + frequency(d)= 5 + 5 = 10. Next, we push $1 to the min heap.





Step 2:

The new min heap after step 1 is shown below. We extract the two minimum nodes with minimum frequency from the min heap which are band $1. We create a new internal node represented by a dummy character $2 such that frequency($2) = frequency($1)+frequency(b)= 10 + 40 = 50. Next, we push $2 to the min heap.




Step 3: 

The new min heap after step 2 is shown below. We extract the two minimum nodes with minimum frequency from the min heap which are a and $2. We create a new internal node represented by a dummy character ‘$3 such that frequency($3) = frequency($2) + frequency(a) = 50 + 50 = 100. Next, we push $2 to the min heap.




Step 4:

The new min heap after step 1 is shown below. Since, there is only one node in the min heap i.e the heap size is 1, we terminate the process. 



Next, we assign prefix code to the huffman tree. We start from the root node, assign 0 to every left edge and 1 to every right edge as shown in figure below.



The prefix codes of each character are the values of edges traversed to reach the character from the root node. The prefix codes are,



To find the average bits per character we use the following formula,

 Total Number of bits used in encoding the characters = frequency of characters x length of prefix codes = 50 x 1 + 40 x 2 + 5 x 3 + 5 x 3 = 160. 


Total number of characters = sum of the frequencies of the characters = 50 + 40 + 5 + 5 = 100. 

Average bits per character = Total Number of bits used in encoding the characters / Total number of characters = 160 / 100 = 1.6


Algorithm

Algorithm:


Input:

1.     An array of unique characters along with their frequency of occurrences


Output:

1.     Displays the prefix code of each character

2.     Displays the average bits per character 


Algo: 


Step 1: Start 


Step 2: Define user defined datatype Node consisting of char data, intfreq and pointers left & right of type Node 


Step 3: Defining user defined datatypeMinHeap consisting of int size & capacity and an array nodes of Node 


Step 4: Set inttotalBits     0


Step 5:Defining heapify function

           voidheapify(structMinHeap* minHeap, int i)

                       Set smallest     i

                       Set left      (2 * i) + 1

                       Set right     (2 * i) + 2

                       Set temp     Null

                       Check if (left <minHeap.size AND minHeap.nodes[left].freq<                                             minHeap.nodes[smallest].freq)

                                   Set smallest     left

                       Check if (right <minHeap.size AND minHeap.nodes[right].freq<                                        minHeap.nodes[smallest].freq)

                                   Set smallest     right

                               Check if (smallest ≠ i)

                                   Set temp     minHeap.nodes[smallest]

                                   Set minHeap.nodes[smallest]     minHeap.nodes[i]

                                   Set minHeap.nodes[i]     temp

                                   Call heapify(minHeap, smallest)

 

Step 6:extractMin function

           struct Node* extractMin(structMinHeap* minHeap)

                       Set temp     minHeap.nodes[0]

                       Set minHeap.nodes[0]     minHeap.nodes[minHeap.size – 1]

                       Set minHeap.sizeminHeap.size – 1

                       Call heapify(minHeap, 0)

                       Return temp

 

Step 7: Defining insertMinHeap function

           voidinsertMinHeap(structMinHeap* minHeap, struct Node* parent)

                       Set minHeap.sizeminHeap.size + 1

                       Set i     minHeap.size – 1

                       While (parent.freq<minHeap.nodes[(i - 1) / 2].freq AND i > 0) do

                                   Set minHeap.nodes[i]     minHeap.nodes[(i - 1) / 2]

                                   Set i     (i – 1) / 2

                       End While

                       Set minHeap.nodes[i]     parent

 

Step 8:Defining buildMinHeap function

           voidbuildMinHeap(structminHeap *minHeap)

                       Set n     minHeap.size – 1

                       for i = (n - 10) / 2 to i >= 0 repeat

                                   Call heapify(minHeap, i)

                                   Set i     i – 1

 

Step 9: Defining createMinHeap function

           structMinHeap* createMinHeap(char chars[], intfreq[], intlen)

                       Set minHeapGetNode(MinHeap)

                       Set minHeap.size     0

                       Set minHeap.capacitylen

                       Set minHeap.nodesGetNode(minHeap.capacity * Node)

                       for i = 0 to i <len repeat

                                   Set minHeap.nodes[i]     GetNode(Node)

                                   Set minHeap.nodes[i].left     Null

                                   Set minHeap.nodes[i].right     Null

                                   Set minHeap.nodes[i].freqfreq[i]

                                   Set minHeap.nodes[i].data     chars[i]

                                   Set i     i + 1

                       minHeap.sizelen

                       returnminHeap

 

Step 10: Defining buildHuffmanTree function

           struct Node* buildHuffmanTree(char chars[], intfreq[], intlen)

                       Set Node *left     Null

                       Set Node *right     Null

                       Set Node *parent     Null

                       Set Node *root     Null

                       Set MinHeap *minHeap     call createMinHeap(chars, freq, len)

                      Call buildMinHeap(minHeap)

                       While (minHeap.size ≠ 1) do

                                   Set left     extractMin(minHeap)

                                   Set right     extractMin(minHeap)

                                   Set parent     GetNode(Node)

                                   Set parent.left     left

                                   Set parent.right     right

                                   Set parent.data     ‘$’

                                   Set parent.freqleft.freq + right.freq

                                   Call insertMinHeap(minHeap, parent)

                       End While

                       Set root     extractMin(minHeap)

                       Return root

 

Step 11:Defining displayPrefixCode function

           voiddisplayPrefixCode(struct Node* root, int prefix[], int top)

                       Check if (root.left ≠ Null)

                                   Set prefix[top]     0

                                   Call displayPrefixCode(root.left, prefix, top + 1)

                       Check if (root.right ≠ Null)

                                   Set prefix[top]     1

                                   Call displayPrefixCode(root.right, prefix, top + 1)

                       Check if ((root.left = Null) AND (root.right = Null))

                                   Display “Prefix code of ”root.data “ = ”

                                   for i = 0 to i < top repeat

                                               Display prefix[i]

                                               Set i = i + 1

                                   Set totalBitstotalBits + (root.freq * i)

 

Step 12: Defining main function

           Display “Enter Number Of Characters”

           Read len

           Set char chars[len]     Null

           Set intfreq[len]     Null

           Set intprefix[height]     Null

           Set int top     0

           Set Node *root     Null

           for i = 0 to i <len repeat

                       Display “Character ” (i + 1)

                       Read chard[i]

                       Display “Frequency ” (i+1)

                       Read freq[i]

                       Set totalFreqtotalFreq + freq[i]

                       Set i     i + 1

           Set root     Call buildHuffmanTree(chars, freq, len)

           Call displayPrefixCode(root, prefix, top)

           Set avgtotalBits / totalFreq

           Display “Average Bits Per Character =” avg

 

Step 13: Stop

 

Code

Characteristics of Huffman Coding:

 ●     Huffman coding algorithm requires a priority queue of a heap tree to store and extract the characters with least frequency

●     It requires an extra space for storing the characters and their frequencies in an array to form the heap. The space complexity is O(n) where n is the number of characters.

●     Huffman coding uses a form of optimal prefix codes to encode data.


Huffman coding Applications:

 ●     Arithmetic coding

●     Lossless data compression (for example, DEFLATE

 

Huffman Coding Advantage: 

●     Huffman coding results in lossless data compression.

●     Huffman coding assigns shorter prefix code to words with higher frequencies which reduces storage space.

 

Huffmand Coding Disadvantage:

●     Huffman coding algorithm needs to know in advance the frequency of each character.


Complexity of Huffman Coding:

 

Time Complexity of Huffman Coding:

         Let n be the number of character. In the buildHuffmanTree() function, the buildMinHeap() function is called one time which takes a time complexity of O(n). The extractMin() function takes a time complexity of O(log n) . Inside the buildHuffmanTree() function, the extractMin() function is called 2*(n-1) time which takes a time complexity of 2*(n-1)*O(log n) = O(n log n). Hence, the total time complexity of the algorithm is O(n) + O(nlogn) = O(nlogn).


 Space Complexity of Huffman Coding:

         The input characters are stored in heap tree of size n where n is the number of characters. Hence, the space complexity of the algorithm is O(n).