Digital Image
Processing
Data Compression-1
Dr. Muhammad Irfan
Material Reference
Images and Material
From
Rafael C. Gonzalez and Richard E. Wood,
Digital Image Processing, 2nd Edition.& Internet Resources
2
Contents
Recap
Data Compression
Lossless Compression
Run Length Encoding
Huffman Encoding
3
OBJECTIVES
After reading this chapter, the reader should
be able to:
Realize the need for data compression.
Differentiate between lossless and lossy
compression.
Understand three lossless compression
encoding techniques: Run length, Huffman,
and Lempel Ziv
Understand two lossy compression
methods: JPEG and
MPEG.
4
Why Data Compression?
Storage:
Data compression reduces the size of a file
to reduce the storage space required to
store that particular file.
Data Transmission:
It saves the time that is required in
transmitting a file.
5
What is Data Compression?
The Art of representing information in a
compact form is called Data Compression.
Data Compression is a reduction in the
number of bits needed to represent data.
Compressing data can save storage capacity,
speed file transfer, and decrease costs for
storage hardware and network bandwidth.
Data Compression refers to the reducing the
number of bits that need to be transmitted
over communication channel.
6
What is Data Compression?
Data Compression becomes particularly
important when we send data with high size
such as audio & video
Even with very fast transmission speed of
data we need to send data in short time. We
need to Compress data for this purpose.
Virtually all form of data contain redundancy
i.e. it is the amount of wasted "space" used to
transmit certain data.
By making use of more efficient data
representation methods, redundancy can be
reduced.
7
Compression Ratio
Data compression ratio, also known as
compression power, is a term used to
quantify the reduction in data-representation
size produced by a data compression
algorithm.
Data compression ratio is defined as the ratio
between the uncompressed size and
compressed size.
8
Compression Ratio
Thus a representation that compresses a
10 MB file to 2MB has a compression ratio
of 10/2 = 5, often noted as an exploit
ratio , 5:1 (read “five” to “one”) or as an
implicit ratio 5/1
Sometimes the space saving is given
instead, which is defined as the reduction
in size relative to the uncompressed size.
Thus a representation that compresses a 10
MB file to 2 MB would yield a space saving of
1-2/10 = 0.8 often noted as a percentage,809 %
Components of Data
Compression
Data compression involves two main
components:
Encoding Algorithm
Decoding Algorithm
10
Components of Data
Compression
Encoding Algorithm Decoding Algorithm
This algorithm takes a This algorithm
message and reconstructs the
generates a original message or
compressed some approximation of
representation of that it from the
message. compressed
representation.
11
Data Compression Methods
Data compression means sending or
storing a smaller number of bits.
12
Data Compression Methods
LOSSLESS
COMPRESSION
METHODS
13
Lossless Compression
In lossless data compression, the integrity of
the data is preserved.
The lossless compression refers to data
compression techniques in which no data is
lost.
For most types of data, lossless compression
techniques can reduce the space needed by
only about 50%.
For greater compression, one must use a
lossy compression technique. Note, however,
that only certain types of data -graphics,
audio, and video -- can tolerate lossy
14
compression.
Lossless Compression
You must use a lossless compression
technique when compressing data and
programs.
The PKZIP compression technology is an
example of lossless compression.
With lossless compression, every single bit of
data that was originally in the file remains
after the file is uncompressed. All of the
information is completely restored.
This is generally the technique of choice for
text or spreadsheet files, where losing words
or financial data could pose a problem.
15
Lossless Compression
The Graphics Interchange File (GIF) is an image
format used on the Web that provides lossless
compression.
The original data and the data after compression
and decompression are exactly the same because
the compression and decompression algorithms
are exactly the inverse of each other. Nothing is
lost
Example:
Run-length encoding
Huffman encoding
Lempel Ziv (L Z) encoding (dictionary-based
16
encoding)
Run Length Encoding(RLE)
When data contain strings of repeated
symbols (such as bits or characters), the
strings can be replaced by a special marker,
followed by the repeated symbol, followed by
the number of occurrences.
For example, in given Figure, the symbol # is
the marker. The symbol being repeated (the
run symbol) follows the marker.
After the run symbol, the number of
occurrences (length) is shown by a two-digit
number.
Example : This run-length encoding method
17
can be used in audio (silence is a run of 0s)
and video (run of a picture element having the
Run Length Encoding(RLE)
18
Run Length Encoding(RLE)
It does not need knowledge of the frequency of
occurrence of symbols and can be very
efficient if data are represented as 0s and 1s.
For example:
19
Run Length Encoding for Two
Symbols
We can encode one symbol which is more
frequent than the other.
This example only encode 0’s between 1’s.
There is no 0
between 1’s
20
Huffman Coding Example-1
Cont…
In Huffman coding, you assign shorter codes
to symbols that occur more frequently and
longer codes to those that occur less
frequently.
For example:
Character A B C D E
---------------------------------------------------------------------
Frequency 17 12 12 27 32
Frequency of characters
21
Huffman Coding Example-1
Cont…
22
Huffman Coding Example-1
Cont…
Final tree and
code
23
Huffman Coding Example-1
Cont…
24
Huffman Coding Example-1
Cont…
25
Huffman Coding Example-1
Cont…
Example: Relative frequencies of eight selected
letters given below:
Letter C D E K L M U D
Freque
32 42 120 7 42 24 37 2
ncy
26
Huffman Coding Example-2
Letters and frequencies:
32 42 120 7 42 24 37 2
C D E K L M U Z
27
Huffman Coding Example-2
Cont…
2 7 24 32 37 42 42 120
Z K M C U D L E
28
Huffman Coding Example-2
Cont…
2 7 24 32 37 42 42 120
Z K M C U D L E
29
Huffman Coding Example-2
Cont…
24 32 37 42 42 120
9
M C U D L E
2 7
Z K
30
Huffman Coding Example-2
Cont…
24 32 37 42 42 120
9
M C U D L E
2 7
Z K
31
Huffman Coding Example-2
Cont…
32 37 42 42 120
33
33 L E
C U D
24
9
M
2 7
Z K
32
Huffman Coding Example-2
Cont…
32 37 42 42 120
33
C U D L E
24
9
M
2 7
Z K
33
Huffman Coding Example-2
Cont…
37 42 42 120
65
U D L E
32
33
C
9 24
M
2 7
Z K
34
Huffman Coding Example-2
Cont…
37 42 42 120
U D L 65
E
32 33
C
9 24
M
2 7
Z K
35
Huffman Coding Example-2
Cont…
42 120
65 79
L E
42 37
32 D U
33
C
9 24
M
2 7
Z K
36
Huffman Coding Example-2
Cont…
42 120
65 79
L E
32 42 37
33 D U
C
9 24
9 M
2 7
Z K
37
Huffman Coding Example-2
Cont…
79 107 120
79 E
42 37 42
D U L 65
32 33
C
24
9
M
2 7
Z K 38
Huffman Coding Example-2
Cont…
79 107 120
79 E
42 37 42
D U L 65
32 33
C
24
9 M
2 7
Z K 39
Huffman Coding Example-2
Cont…
120
186
E
107
79
79
42 37 42 65
D U L
32 33
C
24
9
M
2 7
Z K 40
Huffman Coding Example-2
Cont…
120
186
E
107
79
79
42 37 42 65
D U L
32 33
C
24
9
M
2 7
Z K 41
Huffman Coding Example-2
Cont…
306
120 186
E
107
79
42
42 37 65
L
D U
32
33
C
9 24
M
2 7
Z K 42
Huffman Coding Example-2
Cont…
306
1
0
120 186
E 0 1
107
79 1
0
0 1
42
65
42 37 L 1
D U 0
32 33
C
0 1
9 24
M
0 1
2 7
Z K 43
Huffman Coding Example-2
Cont…
Characte Code
Encod r
C: 1110 er
D: 101 E: 0 C 1110
K:111101 L: 110 M: D 101
11111 U:100 E 0
Z:111100
K 111101
L 110
M 11111
U 100
Z 111100
44
Huffman Coding Example-2
Cont…
Encod
C: 1110 erD: 101 E: 0
K:111101 L: 110 M: EDCUZLD
11111 U:100
Z:111100
010111101001111001
10101
Huffman Code
45
Huffman Coding Example-2
Cont…
Decod Huffman
1110: Cer101: D
0: E Code
01011110100111100 Receive
111101: K 110: L 110101
d
11111: M
100: U 11100: Z
EDCUZLD
46
Huffman Coding Task
Compress and Decompress the string
BDKFGBDKGFDKB by using Huffman Coding.
The characters and their frequency is given
below:
Character B D F G K
-------------------------------------------------------
---
Frequency 20 15 10 5 40
47
Huffman Coding Summary
The beauty of Huffman coding is that no code
in the prefix of another code.
There is no ambiguity in encoding.
The receiver can decode the received data
without ambiguity.
Huffman code is called instantaneous code
because the decoder can unambiguously
decode the bits instantaneously with the
minimum number of bits.
48
Summary
Objectives
Data Compression
Lossless Compression
Run Length Encoding
Huffman Encoding
49
Slide Credits and
References
Wilhelm Burger and Mark J. Burge, Digital Image
Processing, Springer, 2008
University of Utah, CS 4640: Image Processing Basics,
Spring 2012
Rutgers University, CS 334, Introduction to Imaging and
Multimedia, Fall 2012
[Link]
-compression
[Link]
[Link]
[Link]
data-compression/
THANK YOU
51