Gaussian Classifier and Bayesian Decision Theory

The document discusses the Gaussian classifier and its foundation in Bayesian decision theory, emphasizing the Bayes decision rule and the maximum a-posteriori (MAP) rule for minimizing error in classification tasks. It explains the implications of class probabilities and how they affect decision boundaries, particularly in cases where classes have different covariance structures. The Gaussian classifier is presented as a multivariate extension, leading to linear discriminant functions when class covariances are equal.

The Gaussian classifier

Nuno Vasconcelos
ECE Department, UCSD
Bayesian decision theory
recall that we have
• Y – state of the world
• X – observations
• g(x) – decision function
• L[g(x),y] – loss of predicting g(x) when the true state is y
Bayes decision rule is the rule that minimizes the risk
$$\text{Risk} = E_{X,Y}\{L[g(X), Y]\}$$

given x, it consists of picking the prediction of minimum conditional risk

$$g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i|x)\, L[g(x), i]$$
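A minimal sketch of the minimum conditional risk rule for a single observation, assuming the posterior P_{Y|X}(i|x) is already available as a list. The function name `bayes_decision`, the loss tables and the posterior values are hypothetical, chosen only for illustration.

```python
def bayes_decision(posterior, loss):
    """Return (best prediction, conditional risks).

    posterior[i] is P(Y = i | x); loss[g][i] is the loss of
    predicting g when the true state is i.
    """
    num_classes = len(posterior)
    # conditional risk of each candidate prediction g
    risks = [
        sum(posterior[i] * loss[g][i] for i in range(num_classes))
        for g in range(len(loss))
    ]
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


# hypothetical posterior for one observation x
posterior = [0.7, 0.3]

# "0-1" loss: every error costs 1
zero_one = [[0, 1],
            [1, 0]]

# hypothetical asymmetric loss: saying 0 when the state is 1 costs 5
asymmetric = [[0, 5],
              [1, 0]]

print(bayes_decision(posterior, zero_one)[0])
print(bayes_decision(posterior, asymmetric)[0])
```

With the symmetric loss the most probable state wins; with the asymmetric loss the decision can flip even though the posterior is unchanged.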
MAP rule
for the “0-1” loss

$$L[g(x), y] = \begin{cases} 1, & g(x) \neq y \\ 0, & g(x) = y \end{cases}$$

the optimal decision rule is the maximum a-posteriori probability rule

$$g^*(x) = \arg\max_i P_{Y|X}(i|x)$$

the associated risk is the probability of error of this rule (Bayes error)
there is no other decision function with lower error
MAP rule
by application of simple mathematical laws (Bayes
rule, monotonicity of the log)
we have shown that the following three decision
rules are optimal and equivalent
• 1) $i^*(x) = \arg\max_i P_{Y|X}(i|x)$
• 2) $i^*(x) = \arg\max_i P_{X|Y}(x|i)\, P_Y(i)$
• 3) $i^*(x) = \arg\max_i \left[\log P_{X|Y}(x|i) + \log P_Y(i)\right]$
• 1) is usually hard to use, 3) is frequently easier than 2)
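A small check that the three forms of the rule pick the same class. The discrete likelihood table and the prior below are hypothetical numbers, not taken from the slides; any valid model could be substituted.

```python
import math

# hypothetical model: two classes, observation x takes values 0, 1 or 2
likelihood = [[0.6, 0.3, 0.1],   # P(x | Y=0)
              [0.1, 0.3, 0.6]]   # P(x | Y=1)
prior = [0.8, 0.2]               # P(Y=0), P(Y=1)


def rule_posterior(x):
    """Rule 1: arg max of the posterior, obtained through Bayes rule."""
    joint = [likelihood[i][x] * prior[i] for i in range(2)]
    evidence = sum(joint)
    posterior = [j / evidence for j in joint]
    return max(range(2), key=lambda i: posterior[i])


def rule_joint(x):
    """Rule 2: arg max of likelihood times prior."""
    return max(range(2), key=lambda i: likelihood[i][x] * prior[i])


def rule_log(x):
    """Rule 3: arg max of log-likelihood plus log-prior."""
    return max(range(2),
               key=lambda i: math.log(likelihood[i][x]) + math.log(prior[i]))


decisions = [(rule_posterior(x), rule_joint(x), rule_log(x)) for x in range(3)]
print(decisions)
```

Rule 2 drops the evidence term, which does not depend on i, and rule 3 applies the log, which is monotonic, so neither step changes the arg max.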
Example
the Bayes decision rule is usually highly intuitive
we have used an example from communications
• a bit is transmitted by a source, corrupted by noise, and
received by a decoder
[diagram: Y → channel → X]

• Q: what should the optimal decoder do to recover Y?
Example
this was modeled as a classification problem with
Gaussian classes

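$$P_{X|Y}(x|0) = G(x, \mu_0, \sigma)$$
$$P_{X|Y}(x|1) = G(x, \mu_1, \sigma)$$
$$G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
$$P_Y(0) = P_Y(1) = \frac{1}{2}$$

• or, graphically,

[figure: the two class-conditional densities, with the means μ0 and μ1 marked on the x axis]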
BDR
for which the optimal decision boundary is a
threshold
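• pick “0” if

$$x < \frac{\mu_1 + \mu_0}{2}$$

[figure: x axis with μ0 and μ1 marked; "pick 0" on the μ0 side of the threshold, "pick 1" on the μ1 side]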
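A sketch of the threshold decoder with a seeded simulation of its error rate. The closed form used for comparison, Q((μ1 − μ0)/(2σ)), is the standard Gaussian tail expression for equal priors and equal variances; it is not derived in the slides. The function names and the numeric values are hypothetical.

```python
import math
import random


def decode(x, mu0, mu1):
    """Threshold decoder for equal priors and variances (assumes mu0 < mu1)."""
    return 0 if x < (mu0 + mu1) / 2 else 1


def bayes_error(mu0, mu1, sigma):
    """Error probability of the threshold rule: Q(|mu1 - mu0| / (2 sigma))."""
    z = abs(mu1 - mu0) / (2 * sigma)
    return 0.5 * math.erfc(z / math.sqrt(2))


def simulate(mu0, mu1, sigma, trials=100_000, seed=0):
    """Empirical error rate of the decoder on simulated transmissions."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        y = rng.randrange(2)                      # uniform class probabilities
        x = rng.gauss(mu1 if y else mu0, sigma)   # Gaussian class-conditional
        errors += decode(x, mu0, mu1) != y
    return errors / trials


print(bayes_error(0.0, 1.0, 0.5))
print(simulate(0.0, 1.0, 0.5))
```

The two printed numbers should be close, which is what "the threshold is optimal in the minimum probability of error sense" looks like numerically.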
BDR
what is the point of going through all the math?
• now we know that the intuitive threshold is actually optimal, and in which sense it is optimal (minimum probability of error)
• the Bayesian solution keeps us honest.
• it forces us to make all our assumptions explicit
• assumptions we have made
• uniform class probabilities $P_Y(0) = P_Y(1) = 1/2$
• Gaussianity $P_{X|Y}(x|i) = G(x, \mu_i, \sigma_i)$
• the variance is the same under the two states $\sigma_i = \sigma, \forall i$
• noise is additive $X = Y + \varepsilon$
• even for a trivial problem, we have made lots of assumptions
BDR
what if the class probabilities are not the same?
• e.g. coding scheme 7 = 11111110
• in this case PY(1) >> PY(0)
• how does this change the optimal decision rule?

$$i^*(x) = \arg\max_i \left[\log P_{X|Y}(x|i) + \log P_Y(i)\right]$$
$$= \arg\max_i \left[\log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu_i)^2}{2\sigma^2}}\right) + \log P_Y(i)\right]$$
$$= \arg\max_i \left[-\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu_i)^2}{2\sigma^2} + \log P_Y(i)\right]$$
$$= \arg\min_i \left[\frac{(x-\mu_i)^2}{2\sigma^2} - \log P_Y(i)\right]$$
BDR
• or

$$i^* = \arg\min_i \left[\frac{(x-\mu_i)^2}{2\sigma^2} - \log P_Y(i)\right]$$
$$= \arg\min_i \left(x^2 - 2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i)\right)$$
$$= \arg\min_i \left(-2x\mu_i + \mu_i^2 - 2\sigma^2 \log P_Y(i)\right)$$

• the optimal decision is, therefore
• pick 0 if

$$-2x\mu_0 + \mu_0^2 - 2\sigma^2 \log P_Y(0) < -2x\mu_1 + \mu_1^2 - 2\sigma^2 \log P_Y(1)$$
$$2x(\mu_1 - \mu_0) < \mu_1^2 - \mu_0^2 + 2\sigma^2 \log\frac{P_Y(0)}{P_Y(1)}$$

• or, assuming μ1 > μ0, pick 0 if

$$x < \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log\frac{P_Y(0)}{P_Y(1)}$$
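A sketch that compares the closed-form threshold with a direct evaluation of the arg min, for two classes with μ0 < μ1. The names `threshold` and `map_decision` and the numeric values are hypothetical, chosen so that P_Y(1) is much larger than P_Y(0).

```python
import math


def threshold(mu0, mu1, sigma, p0):
    """Pick 0 when x is below this value (assumes mu0 < mu1, 0 < p0 < 1)."""
    return (mu0 + mu1) / 2 + sigma ** 2 / (mu1 - mu0) * math.log(p0 / (1 - p0))


def map_decision(x, mu0, mu1, sigma, p0):
    """Direct arg min of (x - mu_i)^2 / (2 sigma^2) - log P_Y(i)."""
    cost0 = (x - mu0) ** 2 / (2 * sigma ** 2) - math.log(p0)
    cost1 = (x - mu1) ** 2 / (2 * sigma ** 2) - math.log(1 - p0)
    return 0 if cost0 < cost1 else 1


# hypothetical values: class 1 is nine times more likely than class 0
mu0, mu1, sigma, p0 = 0.0, 1.0, 0.5, 0.1
t = threshold(mu0, mu1, sigma, p0)

# the two forms of the rule should agree on every test point
agree = all(
    (0 if x < t else 1) == map_decision(x, mu0, mu1, sigma, p0)
    for x in [k / 100 for k in range(-200, 300)]
)
print(t, agree)
```

Here the threshold falls below the midpoint (μ0 + μ1)/2, so the decoder says 1 over a wider range of x.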
BDR
what is the role of the prior for class probabilities?
$$x < \frac{\mu_1 + \mu_0}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log\frac{P_Y(0)}{P_Y(1)}$$

• the prior moves the threshold up or down, in an intuitive way
• PY(0)>PY(1) : threshold increases
• since 0 has higher probability, we care more about errors on the 0 side
• by using a higher threshold we are making it more likely to pick 0
• if PY(0)=1, all we care about is Y=0, the threshold becomes infinite
• we never say 1
• how relevant is the prior?
• it is weighed by

$$\frac{1}{(\mu_1 - \mu_0)/\sigma^2}$$
BDR
how relevant is the prior?
• it is weighed by the inverse of the normalized distance
between the means
$$\frac{1}{(\mu_1 - \mu_0)/\sigma^2}$$

where $(\mu_1 - \mu_0)/\sigma^2$ is the distance between the means in units of variance
• if the classes are very far apart, the prior makes no
difference
• this is the easy situation, the observations are very clear, Bayes
says “forget the prior knowledge”
• if the classes are exactly equal (same mean) the prior gets
infinite weight
• in this case the observations do not say anything about the
class, Bayes says “forget about the data, just use the
knowledge that you started with”
• even if that means “always say 0” or “always say 1”
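A short numerical illustration of how the weight of the prior shrinks as the means move apart. The helper `prior_shift` and the chosen values (σ = 1, P_Y(0) = 0.9) are hypothetical; the quantity computed is the displacement of the threshold from the midpoint between the means.

```python
import math


def prior_shift(mu0, mu1, sigma, p0):
    """Displacement of the threshold from the midpoint (assumes mu0 < mu1)."""
    return sigma ** 2 / (mu1 - mu0) * math.log(p0 / (1 - p0))


# hypothetical setting: fixed prior and variance, means moving apart
for gap in (0.1, 1.0, 10.0, 100.0):
    shift = prior_shift(0.0, gap, 1.0, 0.9)
    print(f"mu1 - mu0 = {gap:6.1f}   threshold shift = {shift:8.4f}")
```

For a small gap the shift dwarfs the gap itself, so the prior dominates; for a large gap the shift is negligible and the observation decides.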
The Gaussian classifier
this is one example of a Gaussian classifier
• in practice we rarely have only one variable
• typically X = (X1, …, Xd) is a vector of d observations
the BDR for this case is equivalent, but more
interesting
the main difference is in the class-conditional
distributions, which are multivariate Gaussian
$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left\{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right\}$$
The Gaussian classifier
in this case

$$P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left\{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right\}$$

• the BDR

$$i^*(x) = \arg\max_i \left[\log P_{X|Y}(x|i) + \log P_Y(i)\right]$$

• becomes

$$i^*(x) = \arg\max_i \left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log (2\pi)^d |\Sigma_i| + \log P_Y(i)\right]$$
The Gaussian classifier

$$i^*(x) = \arg\max_i \left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log (2\pi)^d |\Sigma_i| + \log P_Y(i)\right]$$

this can be written as

$$i^*(x) = \arg\min_i \left[d_i(x, \mu_i) + \alpha_i\right]$$

with

$$d_i(x, y) = (x-y)^T \Sigma_i^{-1} (x-y)$$
$$\alpha_i = \log (2\pi)^d |\Sigma_i| - 2\log P_Y(i)$$

[figure: the discriminant is the surface where P_{Y|X}(1|x) = 0.5]

the optimal rule is to assign x to the closest class
closest is measured with the Mahalanobis distance d_i(x,y)
to which a constant is added to account for class prior
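A minimal sketch of the rule arg min_i [d_i(x, μ_i) + α_i], written in plain Python for d = 2 so that the matrix inverse and determinant can be spelled out by hand. The helper names, means, covariances and priors are all hypothetical.

```python
import math


def inverse_2x2(s):
    """Return (inverse, determinant) of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def mahalanobis_sq(x, mu, sigma_inv):
    """(x - mu)^T Sigma^{-1} (x - mu) for 2-D vectors."""
    dx = [x[0] - mu[0], x[1] - mu[1]]
    return sum(dx[r] * sigma_inv[r][c] * dx[c]
               for r in range(2) for c in range(2))


def classify(x, means, covariances, priors):
    """arg min_i d_i(x, mu_i) + alpha_i for 2-D Gaussian classes."""
    scores = []
    for mu, sigma, p in zip(means, covariances, priors):
        sigma_inv, det = inverse_2x2(sigma)
        # alpha_i = log((2 pi)^d |Sigma_i|) - 2 log P_Y(i), with d = 2
        alpha = math.log((2 * math.pi) ** 2 * det) - 2 * math.log(p)
        scores.append(mahalanobis_sq(x, mu, sigma_inv) + alpha)
    return min(range(len(scores)), key=scores.__getitem__)


# hypothetical classes: class 1 is much more spread out than class 0
means = [(0.0, 0.0), (3.0, 0.0)]
covariances = [[[1.0, 0.0], [0.0, 1.0]],
               [[9.0, 0.0], [0.0, 9.0]]]
priors = [0.5, 0.5]

for point in [(0.0, 0.0), (3.0, 0.0), (-4.0, 0.0)]:
    print(point, classify(point, means, covariances, priors))
```

The last point is nearer to μ0 in Euclidean terms but is assigned to class 1, because each class measures distance with its own covariance.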
The Gaussian classifier
first special case of interest:
• classes have the same covariance,

$$\Sigma_i = \Sigma, \forall i$$

the BDR becomes

$$i^*(x) = \arg\min_i \left[d(x, \mu_i) + \alpha_i\right]$$

• with

$$d(x, y) = (x-y)^T \Sigma^{-1} (x-y)$$

(same metric for all classes)

$$\alpha_i = \log (2\pi)^d |\Sigma| - 2\log P_Y(i)$$

(the first term is constant, not a function of i, and can be dropped)
The Gaussian classifier
in detail

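$$i^*(x) = \arg\min_i \left[(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - 2\log P_Y(i)\right]$$
$$= \arg\min_i \left[x^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu_i - \mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2\log P_Y(i)\right]$$
$$= \arg\min_i \left[x^T \Sigma^{-1} x - 2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2\log P_Y(i)\right]$$
$$= \arg\max_i \Big[\underbrace{\mu_i^T \Sigma^{-1}}_{w_i^T}\, x \; \underbrace{-\, \frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)}_{w_{i0}}\Big]$$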
The Gaussian classifier

$$i^*(x) = \arg\max_i \Big[\underbrace{\mu_i^T \Sigma^{-1}}_{w_i^T}\, x \; \underbrace{-\, \frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)}_{w_{i0}}\Big]$$

in summary, when classes have equal covariance,

$$i^*(x) = \arg\max_i g_i(x)$$

[figure: the discriminant is the surface where P_{Y|X}(1|x) = 0.5]

• with

$$g_i(x) = w_i^T x + w_{i0}$$
$$w_i = \Sigma^{-1} \mu_i$$
$$w_{i0} = -\frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)$$

• the BDR is a linear function or a linear discriminant
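A sketch of the equal-covariance case for d = 2 in plain Python: it builds w_i and w_i0 from the formulas above and checks, on a grid of points, that the linear rule agrees with the quadratic form it was derived from. All names and numeric values are hypothetical, and Σ is assumed symmetric.

```python
import math


def solve_2x2(s, v):
    """Return S^{-1} v for a 2x2 matrix S and a 2-vector v."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def linear_discriminants(means, sigma, priors):
    """Per class: w_i = S^{-1} mu_i, w_i0 = -mu_i^T S^{-1} mu_i / 2 + log P_Y(i)."""
    params = []
    for mu, p in zip(means, priors):
        w = solve_2x2(sigma, mu)
        w0 = -0.5 * (mu[0] * w[0] + mu[1] * w[1]) + math.log(p)
        params.append((w, w0))
    return params


def classify_linear(x, params):
    """arg max_i g_i(x) with g_i(x) = w_i^T x + w_i0."""
    scores = [w[0] * x[0] + w[1] * x[1] + w0 for w, w0 in params]
    return max(range(len(scores)), key=scores.__getitem__)


def classify_quadratic(x, means, sigma, priors):
    """Reference: arg min_i (x - mu_i)^T S^{-1} (x - mu_i) - 2 log P_Y(i)."""
    scores = []
    for mu, p in zip(means, priors):
        dx = [x[0] - mu[0], x[1] - mu[1]]
        s_inv_dx = solve_2x2(sigma, dx)
        scores.append(dx[0] * s_inv_dx[0] + dx[1] * s_inv_dx[1] - 2 * math.log(p))
    return min(range(len(scores)), key=scores.__getitem__)


# hypothetical two-class problem with a shared covariance
means = [(0.0, 0.0), (2.0, 1.0)]
sigma = [[2.0, 0.5], [0.5, 1.0]]
priors = [0.7, 0.3]

params = linear_discriminants(means, sigma, priors)
grid = [(i / 2, j / 2) for i in range(-8, 9) for j in range(-8, 9)]
agree = all(
    classify_linear(x, params) == classify_quadratic(x, means, sigma, priors)
    for x in grid
)
print(agree)
```

The term x^T Σ^{-1} x is the same for every class, which is why it can be dropped and the remaining score is linear in x.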
