0% found this document useful (0 votes)
5 views47 pages

Notes Data Compression Unit 4

The document discusses data compression, focusing on lossy methods, quantization, and distortion criteria. It covers various quantization types, including uniform, adaptive, and non-uniform quantization, along with their respective challenges and models. Additionally, it highlights the importance of distortion measures and rate-distortion theory in assessing the quality of compression algorithms.

Uploaded by

d3660301
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views47 pages

Notes Data Compression Unit 4

The document discusses data compression, focusing on lossy methods, quantization, and distortion criteria. It covers various quantization types, including uniform, adaptive, and non-uniform quantization, along with their respective challenges and models. Additionally, it highlights the importance of distortion measures and rate-distortion theory in assessing the quality of compression algorithms.

Uploaded by

d3660301
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Data Compression

(KCS-064)
Unit 4 Mr. Yusuf Parvez
Unit 4
▪ Distortion criteria, Models
▪ Quantization:
▪ The Quantization problem
▪ Uniform Quantizer,
▪ Adaptive Quantization,
▪ Non uniform Quantization.
Background
•Math Foundations: Introducing key mathematical background for lossy compression,
similar to what we did before lossless schemes.
•Lossless focuses on rate (bits per sample); lossy adds worry about information
loss/distortion.
We assess the ways to measure the impact of distortion in lossy methods.
•Information Theory Recap: Revisits theory, emphasizing rate-distortion theory—trade-
offs between lower bit rates and introduced distortion.
•Compression Models: Overview of models used to build lossy compression algorithms.
Distortion Criteria

▪ Distortion measures fidelity between original X and reconstructed X.


▪ Subjective methods use human judges but are biased and impractical for design.
▪ Objective difference measures are preferred for math tractability.
▪ Difference Distortion Measures.
Average measures summarize errors:
• Mean Squared Error (MSE): It measures the average squared difference between the orginal
data and the reconstructed signal.

(time average for ergodic processes). Assumes zero-mean additive error.


Root Mean squared Error : The square root of MSE, providing error in the same unit as the orginal
data
RMSE = 𝑴𝑺𝑬
• Signal-to-Noise Ratio (SNR): Compares the level of desired signal to the level of background
noise

• Peak Signal to Noise Ration (PSNR): Measures the ratio between the maximum possible
signal power and the power of distortion noise.
• PSNR =
Average measures summarize errors:
• Mean Absolute Difference:

• Maximum Error:

• Perceptual models transform to human sensory space before measuring difference, though
complex.
Models
Model plays an important role in the design of lossy compression algorithm.
Lossy compression removes same data to reduce file size while keeping perceptual quality acceptable.

Probability models are used for the design and analysis of lossy compression schemes differ from those used in
the design and analysis of lossless compression schemes.

When developing models in the lossless compression we try for an exact match but in lossy compression schemes ,
we look more to general rather than exact correspondence.

Some models used are

1. The uniform,
2. Gaussian,
3. Laplacian,
4. gamma distributions
Uniform Distribution:

A uniform distribution model assumes equal probability for all values in


a range.

It is simple and useful in initial modelling, random data or uniform


quantization.
Gaussian Distribution:
The Gaussian distribution is a bell shaped, symmetric curve representing data clustered around
a mean. It is widely used in lossy compression to model, natural signals, noise and transform
coefficient enabling efficient quantization and encoding.

The Gaussian distribution is one of the most commonly used probability models for two reasons:
it is mathematically tractable and, by virtue of the central limit theorem,
it can be argued that in the limit the distribution of interest goes to a Gaussian distribution. The
probability density function for a random variable with a Gaussian distribution and mean and
variance is
Laplacian Distribution:
Many sources that we deal with have distributions that have a sharp peak at zero. For example,
speech consists mainly of silence. Therefore, samples of speech will be zero or close to zero
with high probability.

Image pixels themselves do not have any attraction to small values. However, there is a high
degree of correlation among pixels. Therefore, a large number of the pixel-to-pixel differences
will have values close to zero. In these situations, a Gaussian distribution is not a very close
match to the data. A closer match is the Laplacian distribution, which is peaked at zero. The
distribution function for a zero mean random variable with Laplacian distribution and variance
is
Gamma Distribution:
A distribution that is even more peaked, though considerably less tractable, than the Laplacian distribution
is the gamma distribution. It is useful in lossy compression, particularly for modeling non-negative data.

The distribution function for a gamma distributed random variable with zero mean and variance is given by
Quantization

Quantization is a lossy process that reduces the precision of data to achieve


compression.
It maps the large set of input values to smaller set of output values, discarding less
important information.

Quantization are of 2 types

1. Scalar quantization (The set of Input and output of a quantizer are scalar in nature.)

2. Vector quantization (The set of input and output of a quantizer are vector in nature.)
The Quantization problem

The quantization problem is the challenge of how to represent a continuous range of values (analog signal) using a
finite number of discrete levels.
In real-world signals:

•Input values can take any value (infinite possibilities)


•But digital systems can store only limited values (finite levels)

So we, Map infinite values → limited set of levels


Key Issues in the Quantization Problem

[Link] Levels
1. Only a fixed number of output values available

[Link]
1. Output is not exactly equal to input

[Link]-off
1. More levels → less error → more bits needed
2. Fewer levels → more error → less storage
Quantizer consists of two types of mapping

1. Encoder mapping
2. Decoder mapping

Encoder Mapping:
• Encoder divides the different range values into no of intervals.
• Each interval is represented by a distinct codeword.

• Mapping for 3-bit encoder


Decoder Mapping:
• There is now way of knowing which value was actually generated by space or source.
• To solve this problem we use the midpoint of the interval.

INPUT CODES OUTPUT


000 -3.5
001 -2.5
010 -1.5
011 -0.5
100 0.5
101 1.5
110 2.5
111 3.5

Mapping for a 3-bit D/A converter


Quantizer input output Mapping

▪ Construction of interval can be viewed as part of the design of encoder.


▪ Selection of reconstruction values is part of design of the decoder.
▪ Quality of construction value form Encoder-Decoder pair quantizers.
Input model for a X (random variables)
Internal
The fidelity of the reconstruction depends on both the interval and the
reconstruction values. Therefore, when designing or analyzing encoder
and decoder, it is reasonable to view them as a pair.

There are two types of codewords to represent the quantizer output:


i) Fixed length codewords
ii) Variable length codewords

Fixed length codewords: If we used fixed length codewords to represent the quantizer
output , then the size of the output alphabet immediately specified the rate. If the number of
quantizer output is M, then the rate is given by

R = 𝒍𝒐𝒈𝟐 𝑴
Variable length codewords: We can use the variable length codes to represent the quantizer
output.
Codeword assignment for an eight level quantizer.

y1 1110
Y2 1100
Y3 100
y4 00
Y5 01
Y6 101
Y7 1101
y8 1111
Uniform Quantization

Uniform quantization divides the signal into equal-sized intervals. Each interval has the
same width, called the step size. Here all the reconstruction levels are equally spaced,

𝒓𝒊+𝟏 − 𝒓𝒊 = 𝚫 𝒐𝒓 𝜽 𝟏≤𝒊≤𝑳−𝟏

Easy Explanation
All parts of the signal are treated equally. The quantizer does not consider where most values occur.
Example

> 3-bit quantizer → L = 8


> Range: [-4, 4]

Levels:
-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5

Input: x = 1.7
Falls in [1,2) → Output = 1.5

The input is rounded to the nearest level.


Adaptive Quantization
Adaptive quantization refers to adjusting the decision and reconstruction levels according to the
local statistics of the prediction error.

Adaptive quantization = changing the step size depending on the signal


•Small changes → use fine steps (more accuracy)
•Big changes → use big steps (save space)
Forward Adaptive Quantizer (Offline adaptive approach) :

In the forward adaptive quantization, each block is analyzed before quantization and the quantizer
parameter are set accordingly. The setting of the quantizer are then transmitted to the receiver as side
information.

1
Estimation of variance locally → 𝜎 𝑞2 = σ𝑁 −1
𝑖=0 𝑥 𝑛2 + 𝑖
𝑁
The selection of block size is a critical issue:

• If the size is small, the adaption to the local statistics will be effective, but the side information needs to

be sent frequently (More bits are used for sending side information).

• If the size is large, the bits used for side information decrease, the adaptation becomes less sensitive to

changing statistics, and both processing delay and storage required increase.

• A compromise between quality of side information and effectiveness of adaptation produces a good

selection of the block size.


Backward Adaptive Quantizer (Online adaptive approach) :

In backward adaptive quantization, only the post quantized samples are available for use in adapting the
quantizer.

Backward adaptive quantization

Δ 𝑛 = 𝑀𝑙(𝑛−1)Δ 𝑛−1

Where l(n-1) is the quantization interval at time (n-1)


Non-Uniform Quantization

A quantizer that has non-uniform intervals is called a non uniform quantizer. In an analogus
fashion, in order to decrease the average distortion, we can try to approximate the input better
in regions of high probability perhaps at the cost of worse approximation in regions of lower
probability. We can do this by making the quantization interval smaller in those regions that
have more probability mass.

Step size 𝜟 is not constant / equal.


In order, In order to decrease the average distortion, we can try to approximate the input better in that region high
probability.
• Low probability → cost of worse approximate .
• Smaller interval in those regions that
have more probability mass.

• Interval near the origin are smaller, while the


Interval away from the origin are larger.
“How do we decide where to keep small steps and where to keep large steps?”

Approaches to design non-uniform quantizers are:

1. Pdf optimized quantization (Lloyd max algorithms )

2. Compounded quantization

PDF-Optimized Quantization:

A non-uniform quantization method where the quantization intervals are designed based on the Probability
Density Function (PDF) of the input signal, so that overall distortion is minimized.

PDF (Probability Density Function) tells that “Where values occur more often”
Example:
Speech → values near 0 are very common
So PDF is high near 0

“Not all errors are equal — errors in frequent values matter more, so we reduce them.”
If we do not use the pdf-optimization then:

• Uniform quantizer wastes levels on rare values.

• High distortion happens.

Whereas by using PDF optimization:


More levels where data is frequent
Less levels where data is rare
Minimum average distortion
PYQ

1. What are distortion criteria in lossy compression? Explain various distortion measures.
2. What is quantization? Explain the quantization problem with suitable example.
3. Differentiate between uniform and non-uniform quantization.
4. Explain adaptive quantization and its various approaches.
5. What is rate–distortion theory? Explain its significance in lossy compression.
6. Explain the rate–distortion function for a binary source.
7. Explain the additive noise model of a quantizer.
8. Differentiate between fidelity and quality in lossy compression systems.
9. Explain quantization and the quantization problem.
10. Differentiate between uniform and non-uniform quantization with examples.
11. Explain adaptive quantization.
12. What are distortion criteria used in lossy compression?
13. Explain rate-distortion theory.
14. Explain additive noise model of quantizer.
15. Differentiate between fidelity and quality.
EXAM-ORIENTED 7-MARK QUESTIONS
1. Explain the quantization problem. Describe uniform and non-uniform quantization in detail.
2. What is rate-distortion theory? Explain the rate-distortion function for a binary source.
3. Discuss various distortion criteria used in lossy data compression.
4. Explain adaptive quantization and methods of adapting quantizer parameters.
5. Explain additive noise model of a quantizer with suitable explanation.

You might also like