0% found this document useful (0 votes)
5 views16 pages

Unit 3

Data compression reduces the size of digital data by encoding it with fewer bits, improving efficiency in storage and transmission. It can be categorized into lossless and lossy compression, each serving different purposes based on data integrity and quality requirements. Techniques like Huffman coding, LZ77, and arithmetic coding are commonly used, with applications spanning multimedia streaming, file storage, and communication systems.

Uploaded by

Suraj Mahto
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views16 pages

Unit 3

Data compression reduces the size of digital data by encoding it with fewer bits, improving efficiency in storage and transmission. It can be categorized into lossless and lossy compression, each serving different purposes based on data integrity and quality requirements. Techniques like Huffman coding, LZ77, and arithmetic coding are commonly used, with applications spanning multimedia streaming, file storage, and communication systems.

Uploaded by

Suraj Mahto
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Introduction to Data Compression

Overview
Data compression is the process of reducing the size of digital data by encoding information
using fewer bits than the original representation. It is an essential technique in multimedia,
networking, storage systems, and communication because it helps save storage space, reduce
transmission time, and improve system efficiency.

With the rapid growth of digital media such as images, audio, and video, large volumes of
data must be handled efficiently. Compression makes it possible to store and transmit this
data without excessive resource usage while maintaining acceptable quality.

Objectives of Data Compression


 Reduce storage requirements
 Minimize transmission bandwidth
 Speed up data transfer
 Lower cost of storage and communication
 Improve system performance

Basic Principle of Compression


Compression works by eliminating redundancy in data. Redundancy refers to repeated or
predictable patterns that can be represented more efficiently.

Types of Redundancy Removed

1. Spatial Redundancy – Repetition within data (e.g., repeated pixels)


2. Temporal Redundancy – Repetition over time (e.g., video frames)
3. Coding Redundancy – Inefficient data representation

By identifying these patterns, algorithms encode the information more compactly.

Compression Model
A typical compression system consists of:

 Encoder (Compressor)
Converts original data into compressed form
 Decoder (Decompressor)
Reconstructs data from compressed form

Original Data → Encoder → Compressed Data → Decoder → Reconstructed Data

Categories of Compression
Lossless Compression

 No information is lost
 Original data perfectly restored
 Used for text, documents, software

Lossy Compression

 Some data removed permanently


 Higher compression ratio
 Used for audio, image, video

Applications of Data Compression


 Multimedia streaming
 File storage and archiving
 Internet data transmission
 Mobile communication
 Medical imaging
 Satellite communication

Advantages of Data Compression


 Saves storage space
 Reduces bandwidth usage
 Improves transfer speed
 Enables efficient multimedia distribution

Limitations
 Compression requires processing power
 Lossy methods reduce quality
 Some algorithms are complex to implement

Lossless & Lossy Compression


Data compression techniques are broadly classified into two main categories based on
whether the original data can be perfectly reconstructed after decompression:

[Link] Compression
[Link] Compression

Each method serves different purposes depending on the type of data and required accuracy.

Lossless Compression
Overview

Lossless compression is a technique in which the original data can be reconstructed exactly
from the compressed data without any loss of information. Every bit of data remains intact
after decompression.

This type of compression is essential when accuracy and data integrity are critical.

Working Principle

Lossless methods remove redundancy by:

 Identifying repeating patterns


 Encoding frequently occurring symbols efficiently
 Replacing repeated sequences with shorter representations

No data is discarded — only reorganized.

Common Techniques

 Huffman Coding
 Run-Length Encoding
 Lempel–Ziv methods (LZ77, LZW)

(These will be discussed later in detail)

Applications

 Text files
 Program files
 Databases
 Medical images
 Legal or financial documents

Advantages

 Perfect reconstruction
 No quality degradation
 Suitable for critical data

Limitations

 Lower compression ratio compared to lossy methods


 Less efficient for multimedia data

Lossy Compression
Overview
Lossy compression reduces file size by permanently removing less important or less
noticeable data. The reconstructed data is an approximation of the original.

This method is widely used where slight quality loss is acceptable.

Working Principle

Lossy algorithms:

 Analyze human perception limits


 Remove inaudible or invisible details
 Approximate original signals

Once removed, the discarded information cannot be recovered.

Common Techniques

 Transform coding
 Quantization
 Perceptual encoding

Applications

 Audio streaming
 Image sharing
 Video broadcasting
 Multimedia storage

Advantages

 High compression ratio


 Reduced storage and bandwidth use
 Efficient multimedia delivery

Limitations

 Quality loss
 Irreversible process
 May introduce distortion

Comparison Between Lossless and Lossy Compression


Feature Lossless Lossy
Data Recovery Exact Approximate
Quality Preserved Reduced
Compression Ratio Lower Higher
Usage Text, software Audio, video
Reversibility Yes No
Huffman Coding & Algorithm — Detailed
Notes
Overview
Huffman Coding is a widely used lossless data compression technique developed by David
Huffman. It is based on assigning variable-length binary codes to symbols depending on their
frequency of occurrence. Symbols that appear more frequently are assigned shorter codes,
while less frequent symbols receive longer codes. This approach reduces the overall number
of bits required to represent data.

Huffman coding is used in many compression standards such as image, audio, and file
compression systems because it efficiently removes coding redundancy without losing
information.

Basic Principle
The algorithm works on the idea that:

 Frequently occurring symbols should use fewer bits


 Rare symbols should use more bits
 No code should be a prefix of another code (Prefix Property)

This ensures unambiguous decoding.

Steps in Huffman Coding Algorithm


1. List Symbols and Frequencies
Determine how often each symbol appears in the data.
2. Create Leaf Nodes
Treat each symbol as a node labeled with its frequency.
3. Build the Huffman Tree
Select two nodes with the smallest frequencies
Combine them into a new node
Assign the sum of their frequencies
Repeat until one tree remains
4. Assign Binary Codes

Traverse tree from root


Assign:

Left branch → 0

Right branch → 1

The path to each symbol forms its code

5. Encode Data
Replace symbols with corresponding binary codes.

Example Concept (Illustrative)


If symbol frequencies are:

Symbol Frequency
A High
B Medium
C Low

Then codes might be:

 A → Short code
 B → Medium length
 C → Longer code

This reduces total storage compared to fixed-length coding.

Decoding Process
 Read encoded bits sequentially
 Traverse Huffman tree accordingly
 Output symbol when leaf node reached

Since codes follow prefix property, decoding is straightforward and accurate.

Characteristics of Huffman Coding


 Variable-length coding
 Prefix-free representation
 Optimal for symbol-by-symbol coding
 Based on probability distribution

Advantages
 No loss of information
 Efficient for data with unequal symbol frequencies
 Simple decoding mechanism
 Widely used in compression systems

Limitations
 Requires frequency table storage
 Less effective if symbol frequencies are uniform
 Static Huffman needs preprocessing
 Tree construction adds computation

Applications
 File compression utilities
 Image compression
 Multimedia encoding
 Data transmission systems

Shannon–Fano Algorithm — Detailed Notes


Overview
The Shannon–Fano algorithm is an early lossless data compression technique developed by
Claude Shannon and Robert Fano. It is based on assigning variable-length binary codes to
symbols according to their probabilities or frequencies of occurrence. Similar to Huffman
coding, symbols that appear more frequently receive shorter codes, while less frequent
symbols receive longer codes.

Although it is not always optimal like Huffman coding, the Shannon–Fano method
introduced the fundamental idea of entropy-based coding and laid the foundation for modern
compression techniques.

Basic Principle
The algorithm follows these principles:

 Arrange symbols based on decreasing probability (or frequency).


 Divide the set of symbols into two groups with nearly equal total probabilities.
 Assign binary digits to each group:
o First group → 0
o Second group → 1
 Repeat the division process recursively for each group until each symbol has a unique
code.

This results in prefix-free variable-length codes that enable correct decoding.


Steps of the Shannon–Fano Algorithm
[Link] Symbol Frequencies
Determine how often each symbol appears in the input data.

[Link] Symbols
Arrange symbols in descending order of frequency.

[Link] Symbols
Divide the list into two parts such that the total frequency of each part is as close as possible.

[Link] Codes

 First group receives binary digit 0


 Second group receives binary digit 1

[Link] Recursively
Apply the same partitioning and coding to each subgroup until each symbol has a distinct
code.

Encoding Process
 Replace each symbol with its assigned binary code
 Store or transmit encoded data

This reduces the total number of bits required compared to fixed-length encoding.

Decoding Process
 Read binary sequence
 Match prefix codes
 Identify corresponding symbols

Since codes are prefix-free, decoding remains unambiguous.

Characteristics
 Variable-length coding
 Based on symbol probability
 Prefix property maintained
 Simpler conceptually than Huffman

Advantages
 Reduces coding redundancy
 Easy to understand and implement
 Efficient for uneven symbol distributions
 Historically important for compression theory

Limitations
 Not always optimal (may produce longer codes than Huffman)
 Efficiency depends on partition accuracy
 Less commonly used in modern systems

Comparison with Huffman Coding


Feature Shannon–Fano Huffman
Optimal coding Not guaranteed Guaranteed optimal
Tree construction Top-down partitioning Bottom-up merging
Efficiency Slightly lower Higher
Usage Historical/educational Widely used

Adaptive & Arithmetic Coding


Overview
Adaptive coding and arithmetic coding are advanced techniques used in lossless data
compression. Unlike simple variable-length coding methods, these approaches improve
compression efficiency by dynamically adjusting symbol representation or encoding entire
messages as numerical ranges. They are widely applied in modern multimedia and
communication systems.

Adaptive Coding
Introduction

Adaptive coding refers to compression methods that adjust coding parameters dynamically
while processing data, rather than relying on predefined symbol frequencies. The encoder and
decoder update their models continuously based on the data being processed.

This makes adaptive coding suitable when symbol probabilities are unknown or change over
time.

Working Principle

 Initially assume equal probability for symbols


 As symbols are processed, update frequency statistics
 Modify codes according to updated probabilities
 Decoder performs identical updates to maintain synchronization
This allows the coding scheme to “learn” patterns from incoming data.

Characteristics

 No prior knowledge of data distribution required


 Model updates during encoding
 Responds to changing data patterns
 Often used with adaptive Huffman coding

Advantages

 Better performance for dynamic data streams


 Eliminates need to transmit frequency tables
 Flexible and efficient for real-time processing

Limitations

 Higher computational complexity


 Encoder and decoder must remain synchronized
 Implementation more complex than static methods

Arithmetic Coding
Introduction

Arithmetic coding is a powerful lossless compression technique that encodes an entire


message into a single fractional number between 0 and 1. Instead of assigning codes to
individual symbols, it represents the sequence as a range that becomes narrower as more
symbols are processed.

This method often achieves higher compression efficiency than Huffman coding.

Working Principle

[Link] probability ranges to symbols


[Link] with interval [0,1)
[Link] each symbol:

 Subdivide interval according to probabilities


 Select sub-interval corresponding to symbol
[Link] interval uniquely represents message

Any number within the final interval can represent the encoded data.

Example Concept

If symbols A, B, and C have assigned probability ranges:

 A → first portion
 B → middle portion
 C → last portion

As symbols are processed, the interval becomes smaller and more precise.

Characteristics

 Encodes entire message as one value


 Uses probability modeling
 Produces highly efficient compression
 Works well with adaptive probability updates

Advantages

 Higher compression efficiency


 Handles fractional bit representation
 Effective for skewed probability distributions

Limitations

 Complex implementation
 Slower processing than simpler methods
 Requires high numerical precision

Comparison: Adaptive vs Arithmetic Coding


Feature Adaptive Coding Arithmetic Coding
Approach Updates symbol model Encodes message range
Complexity Moderate High
Efficiency Improved over static Very high
Usage Dynamic data streams Advanced compression systems

Dictionary-Based Compression (LZ77,


LZW)
Overview
Dictionary-based compression techniques are a class of lossless data compression methods
that reduce data size by replacing repeated patterns with shorter references stored in a
dictionary. Instead of encoding symbols individually, these methods identify recurring strings
or sequences and substitute them with references pointing to earlier occurrences.

These techniques are widely used because they are efficient, adaptable, and suitable for
different types of data such as text, images, and executable files.
Two well-known dictionary-based algorithms are:

 LZ77 (Lempel–Ziv 1977)


 LZW (Lempel–Ziv–Welch)

LZ77 Compression
Introduction

LZ77 is a sliding-window compression algorithm that replaces repeated data sequences with
references to previous occurrences within a fixed-size window of recently processed data.

It does not maintain a permanent dictionary; instead, it uses previously encoded data as its
reference source.

Working Principle

The algorithm divides data into two buffers:

[Link] Buffer

 Contains recently encoded data


 Used to find matching patterns

[Link]-Ahead Buffer

 Contains upcoming data to encode

Process:

 Compare look-ahead data with search buffer


 Find longest matching sequence
 Encode reference as a triple:

(offset, length, next symbol)

 Offset → Distance to match


 Length → Size of match
 Next symbol → Following unmatched character

Characteristics

 Uses sliding window


 No explicit dictionary storage
 Effective for repetitive data
 Basis for many modern compressors

Advantages
 Good compression for repeated patterns
 No prior knowledge required
 Adaptive to input data

Limitations

 Performance depends on window size


 Searching for matches can be slow
 Less effective on non-repetitive data

LZW Compression
Introduction

LZW improves dictionary compression by building a dictionary dynamically during


encoding. Unlike LZ77, it stores patterns explicitly in a dictionary table and assigns codes to
sequences.

It is widely used in formats such as GIF images and file compression utilities.

Working Principle

[Link] dictionary with basic symbols


[Link] input sequence
[Link] longest sequence in dictionary
[Link] dictionary code
[Link] new sequence to dictionary
[Link] until input ends

The decoder rebuilds the same dictionary using identical logic.

Characteristics

 Dictionary grows dynamically


 Encodes sequences instead of single symbols
 No need to transmit dictionary

Advantages

 High compression efficiency


 Fast encoding and decoding
 Works well with structured data

Limitations

 Dictionary size management required


 Less efficient for random data
 Memory usage increases
Comparison: LZ77 vs LZW
Feature LZ77 LZW
Dictionary Sliding window Explicit table
Encoding Offset-length reference Dictionary codes
Memory use Lower Higher
Speed Slower search Faster lookup
Efficiency Good Often better

Compression Ratio & Comparison


Compression ratio is an important metric used to evaluate the effectiveness of a data
compression technique. It measures how much the size of data has been reduced after
compression. By comparing different compression methods using this metric and other
performance factors, multimedia designers and system engineers can choose the most
appropriate algorithm for their needs.

Compression Ratio
Definition

Compression ratio expresses the relationship between the size of original data and the size of
compressed data.

Formula

Compression Ratio=Original SizeCompressed Size\text{Compression Ratio} = \frac{\


text{Original Size}}{\text{Compressed
Size}}Compression Ratio=Compressed SizeOriginal Size

Interpretation

 Higher ratio → Better compression efficiency


 Lower ratio → Less size reduction

Example

If a file of 100 MB is compressed to 25 MB:

Compression Ratio=10025=4:1Compression\ Ratio = \frac{100}{25} =


4:1Compression Ratio=25100=4:1

This means the compressed file is four times smaller than the original.

Compression Efficiency Measures


• Bit Rate Reduction

Indicates how much data transmission requirement has been reduced.

• Space Saving Percentage

Shows percentage of storage saved.

Space Saving=Original−CompressedOriginal×100\text{Space Saving} = \frac{Original -


Compressed}{Original} \times 100Space Saving=OriginalOriginal−Compressed×100

• Quality Retention

Important in lossy compression to evaluate how much original quality is preserved.

Factors Affecting Compression Ratio


 Nature of data (text, audio, video)
 Redundancy level in data
 Algorithm used
 Compression settings
 Acceptable quality loss

Highly repetitive data typically compresses better than random data.

Comparison of Compression Techniques


Technique Type Compression Efficiency Complexity Typical Use
Huffman Lossless Moderate Low Text/data files
Shannon–Fano Lossless Moderate Low Educational use
Arithmetic Coding Lossless High High Advanced systems
LZ77 Lossless Good Moderate File compression
LZW Lossless Good/High Moderate Images/files
Lossy Methods Lossy Very High Moderate Multimedia

Trade-Offs in Compression
Choosing a compression method often involves balancing:

 Compression ratio
 Processing speed
 Memory usage
 Reconstruction quality
 Application requirements

For example:
 Text documents require lossless accuracy
 Video streaming prioritizes higher compression ratios

Importance in Multimedia Systems


Compression ratio analysis helps:

 Optimize storage resources


 Improve streaming performance
 Reduce network bandwidth usage
 Enhance user experience

It plays a crucial role in multimedia design, communication systems, and cloud storage.

You might also like