0% found this document useful (0 votes)

15 views42 pages

Parametric Classification and Bias in ML

Uploaded by

Tuğba Can

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views42 pages

Parametric Classification and Bias in ML

Uploaded by

Tuğba Can

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

BLG 527E Machine Learning

FALL 2021-2022
Assoc. Prof. Yusuf Yaslan & Assist. Prof. Ayşe Tosun

Parametric Methods
Parametric Estimation
• X = { xt }t where xt ~ p (x)
• Parametric estimation:
Assume a form for p (x |q ) and estimate q , its sufficient statistics,
using X
e.g., N ( μ, σ2) where q = { μ, σ2}

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 2
Maximum Likelihood Estimation
• Likelihood of q given the sample X
l (θ|X) = p (X |θ) = ∏t p (xt|θ)

• Log likelihood
L(θ|X) = log l (θ|X) = ∑t log p (xt|θ)

• Maximum likelihood estimator (MLE)

θ* = argmaxθ L(θ|X)
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 3
Examples: Bernoulli/Multinomial
• Bernoulli: Two states, failure/success, x in {0,1}
P (x) = pox (1 – po ) (1 – x)
L (po|X) = log ∏t poxt (1 – po ) (1 – xt)
MLE: po = ∑t xt / N

• Multinomial: K>2 states, xi in {0,1}

P (x1,x2,...,xK) = ∏i pixi
L(p1,p2,...,pK|X) = log ∏t ∏i pixit
MLE: pi = ∑t xit / N

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 4
Examples: Bernoulli (Derivation)
• Bernoulli: Two states, failure/success, x in {0,1}

P (x) = pox (1 – po ) (1 – x)
t t
L (po|X) = log ∏t pox (1 – po ) (1 – x )

dL( p0 | X ) N t d N
d
 x log( p0 )   (1  x )
t
log(1  p0 )
dp0 t 1 dp0 t 1 dp0
N N
1 1

p0
x 
t 1
t
 (1  x t )
t 1 1  p0
0

5
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Bernoulli (Derivation)
N N N
(1  p0 ) x t  p0  1  p0  x t 0
t 1 t 1 t 1
N N
1
 x  p0 N 0  p0 
t
 x t

t 1 N t 1

MLE: po = ∑t xt / N
Gaussian (Normal) Distribution
• p(x) = N ( μ, σ2)
p x  
1  x   2 
1  x   2 
px  
exp - 
exp 
2
2  2 

2   2  2


μ σ

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 7
Gaussian (Normal) Distribution
• Given that X = { xt }t with xt ~ N ( μ, σ2)

𝐿 ¿
 x t

MLE for μ and σ2: m t

N
 x  m
t 2

s2  t

N
Bias and Variance
Let X be a sample from a population specified up to a
parameter q

To evaluate the quality of this estimator we can measure

how much it is different from q
That is (d (X)-q )2

But since it is random variable (it depends on the

sample) we need to average over all possible X and
consider meas square error of the estimator

Remember the properties of expectation

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 9
Bias and Variance
Unknown parameter q
Estimator di = d (Xi) on sample Xi

Bias: bq(d) = E [d] – q

Variance: E [(d–E [d])2]

Mean square error:

r (d,q) = E [(d–q)2]=E [(d–E[d]+E[d]-θ)2]

= (E[d]-θ)2+E[(d–E[d])2+2 (d–E[d])(E[d]-θ)]

Remember the properties of expectation

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 10
= E[(E[d]-θ)2]+E [(d–E [d])2] +2 E[(d–E[d])(E[d]-θ)]

= E[(E[d]-θ)2]+E [(d–E [d])2] +2 (E[d]–E[d])(E[d]-θ)

= (E [d] – θ)2 +E [(d–E [d])2]

= (E [d] – θ)2 + E [(d–E [d])2]

= Bias2 + Variance
Bayes’ Estimator
• Sometimes before looking at a sample we may have some prior
information on the possible value range that a parameter , θ, may
take.
• This information is quite useful especially when the sample is small.
• Treat θ as a random variable with prior p (θ)

• Bayes’ rule: p (θ|X) = p(X|θ) p(θ) / p(X)

• Density at x: p(x|X) = ∫ p(x|θ, X) p(θ|X) dθ= ∫ p(x|θ) p(θ|X) dθ

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 12
Bayes’ Estimator
• Evaluating the p(x|X) integrals may be quite difficult except in cases
where the posterior has a nice form
• Maximum a Posteriori (MAP): θMAP = argmaxθ p(θ|X)
• Maximum Likelihood (ML): θML = argmaxθ p(X|θ)
• Bayes’: θBayes’ = E[θ|X] = ∫ θ p(θ|X) dθ
• If we have no prior reason to favor some values of θ then the prior
density is flat and the posterior will have the same form as the
likelihood p(X|θ)

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 13
Bayes’ Estimator: Example
• xt ~ N (θ, σo2) and θ ~ N ( μ, σ2) where are μ, σ2, σo2
known
• θML = m

• p(θ|X) ∝ p(X| θ)p(θ)

• Take the derivative with respect to θ

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 14
1  x   2 
P( X |  ) exp   
2  0  2 0 
2

1     2 
P( )  exp   
2   2 
2

Likelihood :
1     2  N 1 
 xt    
2

l ( ) exp    exp  
2   2  2
 t 1 2  0  2 2
0 
Loglikelihood :

 1        N   1   x t  
2
   
2

L( ) log       log   

 2    2 2
 t 1   2  0   2 2
0  
 1       
2 N   1   xt      
2

L( ) log        log   


 2    2 2
 t 1   2  0   2 2
0  
L( )

 

    N xt    0
  2
t 1 02

N
xt N
     N N x t N
  

t 1  0
2
 2
t 1  0  2
 2 
 0 t 1 N

t 1  0
2
 2 2 
 
N  N 
m 2  2  2
0 2
 0 

N /2
1/  2
E  | X   0
m 
N /  0  1/ 
2 2
N /  0 1/ 
2 2
Bayes’ Estimator: Example
• xt ~ N (θ, σo2) and θ ~ N ( μ, σ2)
• θML = m
• θMAP = θBayes’ =

N / 2
1/  2
E  | X   m 0

N /  0  1/ 
2 2
N /  0 1/ 
2 2

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 17
Bayesian Learning for Coin Model
• Bayesian Learning procedure:
• Given data x1, x2,…, xN write down expression for likelihood p(X| θ)

Specify a prior P (θ )

Compute posterior p(θ |X) = p(X| θ)p(θ)/ p(X)

p(θ |X) ∝ p(X| θ)p(θ)

• This example is obtained from Nando Freitas lecture notes

Bayesian Learning for Coin Model
• For coin model likelihood of data (i.i.d. in our case)

P(x1, x2,…, xN| θ ) = ∏t θ xt (1 – θ ) (1 – xt) = θ m (1 – θ ) (N – m)

Where xt ϵ {0,1} and m is the number of 1’s

• Specify a prior on θ. For this we need to introduce Beta distribution

Beta Distribution
(   )   1  ,  are hyperparameters
p ( )   (1   )   1
( )(  )

( z ) e x x z  1dx
0

p( )d 1
(   )   1
( )( )
1
 (1   ) d 1

( )(  ) 
 (1   ) d  (   )
1 1
E[ ] 
 
• The figure is obtained from wikipedia
Bayesian Learning for Coin Model
Compute Posterior: p(θ |X) ∝ p(X| θ)p(θ)

P(x1, x2,…, xN| θ ) = ∏t θ xt (1 – θ ) (1 – xt) = θ m (1 – θ ) (N – m)

(   )   1 1 1 1

p ( )   (1   )   (1   )   1
( )(  ) const
1 1 ( ' ' )  ' 1
p ( | X )   (1   )   1 m (1   ) N  m  p ( | X )   (1   )  ' 1
const ( ' )(  ' )
 ' m   ,  ' N  m  
Conjugate Prior
• Conjugate priors: A likelihood-prior pair is said to be conjugate if they
result in a posterior which is of the same form as the prior.
• This enables us to compute the posterior density analytically without
having to worry about computing the denominator in Bayes' rule, the
marginal likelihood.

Prior Likelihood
Gaussian Gaussian
Beta Binomial
Dirichlet Multinomial
Gamma Gaussian
Example
• Suppose that we observe X= {1,1,1,1,1,1} where each xt comes
from Bernoulli distribution θML = 1
• We can compute posterior and use its mean as the estimate

1 1 ( ' ' )  ' 1 '

p ( | X )   (1   )   1 m (1   ) N  m  p ( | X )   (1   )  ' 1 E[ ] 
const ( ' )(  ' )  ' '
 ' m   ,  ' N  m  

• Using Beta(2,2) prior 8

B 
10
Parametric Classification
gi x  px |C i P C i 
or
gi x  log px |C i  log P C i 

px |C i  
1 
exp  
x  i  
2

2 
2  i  2  i 
1
gi x   log 2  log  i 
x   i 2
 log P C i 
2
2 2 i

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 24
• Given the sample X {x t ,r t }tN1
t
1 if x Ci
x t
ri  t
0 if x C j , j i
• ML estimates are

• Discriminant becomes

1
gi x   log 2  log si 
x  mi 
 log ˆC i 
P
2

2 2 si 2
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 25
Equal variances

Single boundary at
halfway between means

Two boundaries

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 27
Probabilistic Interpretation of Linear Regression

r  f x  
estimator : gx | 
 ~ N 0, 2 
pr | x  ~ N gx | , 2 
N
L  |X  log  px t , r t 
t 1
N N
log  pr t | x t  log  px t 
t 1 t 1
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 28
Regression: From LogL to Error

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 29
Linear Regression
g x t |w1 , w 0 w1 x t  w 0 Take derivative of E

 r
t
t
Nw 0  w1 x t

t
…wrto w0

r x t t
w 0  x  w1  xt

t 2 …wrto w1
t t t

 N t   w0 
x t
   r t

A  w  y  t 
  t x    w1   t x 
t 2    t t
x t
r
t

w A  1y
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 30
Polynomial Regression
g x |w k , , w 2 , w1 , w 0 w k x
t
    w x   w x
t k
2
t 2
1
t
 w0

1 x 1

x 1 2
 x  

 1 k
r 1

 2
D 
1 x 2
x 2 2
 x  
2 k
r r 
   
   N
 1 x N
x N 2
 x  
N 2
 r 

w D D DT r T 1

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 31
Other Error Measures
N 2
1
• Square Error: 
E  |X    r t  g x t | 
2 t 1

N 2

 r t
 g x t |  
E  |X   t 1 N 2
• Relative Square Error:
 r
t 1
t
r 
• Absolute Error: E (θ |X) = ∑t |rt – g(xt| θ)|
• ε-sensitive Error:
E (θ |X) = ∑ t 1(|rt – g(xt| θ)|>ε) (|rt – g(xt|θ)| – ε)

   
E r  gx 2 | x E r  E r | x 2 | x  E r | x   gx 2
noise squared error

 
E X E r | x   g x  | x  E r | x   E X g x   E X g x   E X g x 
2 2
 2

bias variance

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 33
Estimating Bias and Variance

• M samples Xi={xti , rti}, i=1,...,M

are used to fit gi (x), i =1,...,M and t=1,…,N

 g x  f x 
1
Bias g  
2 t t 2

N t

  g x  g x 
1
Variance g   t t 2
i
NM t i

1
g x   g i x 
 
M i

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 34
Bias/Variance Dilemma
• Example: gi(x)=2 has no variance and high bias

gi(x)= ∑t rti/N has lower bias with variance

• As we increase complexity,
bias decreases (a better fit to data) and
variance increases (fit varies more with data)
• Bias/Variance dilemma: (Geman et al., 1992)

variance

Bias Variance Image is obtained from: [Link] 36

Polynomial Regression

Best fit “min error”

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 38
Regression example
Coefficients increase in magnitude as order
increases:

1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]

Idea: Penalize large coefficients

Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) 39
Regularization
• New Cost Function
2

E w | X     y t  g x t | w    i wi2
N
1
2 t 1
• Ridge RegressionR( w)w 2  w2
i i
• LASSO: R( w) w 1 wi
i

)+
 i wi2 wT w
• = -)+λ

• -) +λ = 0  w +λ 

• +λ)

• =
• Image is obtained from Nando Freitas’ lecture notes

Machine Learning: Parametric Methods
No ratings yet
Machine Learning: Parametric Methods
27 pages
Lecture 13 Merged
No ratings yet
Lecture 13 Merged
83 pages
ML, MAP, and Bayesian Estimation Methods
No ratings yet
ML, MAP, and Bayesian Estimation Methods
34 pages
I2ml3e Chap4
No ratings yet
I2ml3e Chap4
28 pages
Parametric Methods in Machine Learning
No ratings yet
Parametric Methods in Machine Learning
14 pages
MLE Exercises and Solutions
No ratings yet
MLE Exercises and Solutions
49 pages
ML vs MAP Estimation in Bayesian Inference
No ratings yet
ML vs MAP Estimation in Bayesian Inference
35 pages
SMAI 2026 Class Material
No ratings yet
SMAI 2026 Class Material
22 pages
Parametric Methods in Machine Learning
No ratings yet
Parametric Methods in Machine Learning
26 pages
Parameter Estimation in Bayesian Classifiers
No ratings yet
Parameter Estimation in Bayesian Classifiers
19 pages
Parameter Estimation in Bayesian Theory
No ratings yet
Parameter Estimation in Bayesian Theory
6 pages
Lec 03
No ratings yet
Lec 03
25 pages
MLE vs MAP in Machine Learning
No ratings yet
MLE vs MAP in Machine Learning
29 pages
Parameter Estimation & MLE Basics
No ratings yet
Parameter Estimation & MLE Basics
15 pages
02 Statistical Schools
No ratings yet
02 Statistical Schools
29 pages
Parametric vs Non-parametric Estimation
No ratings yet
Parametric vs Non-parametric Estimation
47 pages
Probabilistic Models in Supervised Learning
No ratings yet
Probabilistic Models in Supervised Learning
32 pages
Maximum Likelihood in Machine Learning
No ratings yet
Maximum Likelihood in Machine Learning
3 pages
4-Parametric Methods
No ratings yet
4-Parametric Methods
27 pages
Maximum Likelihood vs Bayesian Estimation
No ratings yet
Maximum Likelihood vs Bayesian Estimation
11 pages
Parametric Methods in Machine Learning
No ratings yet
Parametric Methods in Machine Learning
33 pages
Linear Models in Machine Learning
No ratings yet
Linear Models in Machine Learning
86 pages
Machine Learning: Parametric Methods Guide
No ratings yet
Machine Learning: Parametric Methods Guide
74 pages
ML CH-04
No ratings yet
ML CH-04
26 pages
Parametric Methods in Estimation
No ratings yet
Parametric Methods in Estimation
13 pages
Pattern Recognition: ML & Bayesian Estimation
No ratings yet
Pattern Recognition: ML & Bayesian Estimation
51 pages
Parametric Methods in Machine Learning
No ratings yet
Parametric Methods in Machine Learning
26 pages
Understanding Maximum Likelihood Estimation
No ratings yet
Understanding Maximum Likelihood Estimation
6 pages
MAP vs MLE in Probability Estimation
No ratings yet
MAP vs MLE in Probability Estimation
7 pages
Understanding Parameter Estimation
No ratings yet
Understanding Parameter Estimation
6 pages
Statistical Machine Learning Overview
No ratings yet
Statistical Machine Learning Overview
5 pages
Machine Learning Parameter Estimation Techniques
No ratings yet
Machine Learning Parameter Estimation Techniques
43 pages
Understanding Maximum Likelihood Estimation
No ratings yet
Understanding Maximum Likelihood Estimation
46 pages
Gaussian Parameter Estimation Techniques
No ratings yet
Gaussian Parameter Estimation Techniques
31 pages
Optimization Techniques in ML by Dinesh
No ratings yet
Optimization Techniques in ML by Dinesh
32 pages
Pattern Recognition Parameter Estimation
No ratings yet
Pattern Recognition Parameter Estimation
15 pages
Parameter Estimation in Bayesian Analysis
No ratings yet
Parameter Estimation in Bayesian Analysis
171 pages
Vorlesung Causal Inference-3
No ratings yet
Vorlesung Causal Inference-3
221 pages
Probabilistic Machine Learning Note
No ratings yet
Probabilistic Machine Learning Note
35 pages
Lec 9B - Bayesian Learning II
No ratings yet
Lec 9B - Bayesian Learning II
35 pages
SciML Lecture 4
No ratings yet
SciML Lecture 4
36 pages
MLE vs Bayesian Estimation Explained
No ratings yet
MLE vs Bayesian Estimation Explained
35 pages
Maximum-Likelihood vs Bayesian Estimation
No ratings yet
Maximum-Likelihood vs Bayesian Estimation
9 pages
Parametric Models: MLE & Bayesian Estimation
No ratings yet
Parametric Models: MLE & Bayesian Estimation
33 pages
PCML 2014 Exam Sample Questions
No ratings yet
PCML 2014 Exam Sample Questions
6 pages
Lec 07
No ratings yet
Lec 07
28 pages
1 Intro
No ratings yet
1 Intro
27 pages
Maximum Likelihood for Gaussian Models
No ratings yet
Maximum Likelihood for Gaussian Models
520 pages
Probabilistic Supervised Learning Overview
No ratings yet
Probabilistic Supervised Learning Overview
31 pages
Module 2 Probability Distributions
No ratings yet
Module 2 Probability Distributions
15 pages
Bayesian Classifiers Overview and Theory
No ratings yet
Bayesian Classifiers Overview and Theory
5 pages
Maximum Likelihood Estimation Techniques
No ratings yet
Maximum Likelihood Estimation Techniques
110 pages
Understanding Maximum Likelihood Estimation
No ratings yet
Understanding Maximum Likelihood Estimation
4 pages
Maximum Likelihood Estimation Techniques
No ratings yet
Maximum Likelihood Estimation Techniques
110 pages
Probabilistic Models in AI & ML
No ratings yet
Probabilistic Models in AI & ML
43 pages
Z-Test Hypothesis Analysis Report
No ratings yet
Z-Test Hypothesis Analysis Report
28 pages
Solved Exercises in Statistical Inference
No ratings yet
Solved Exercises in Statistical Inference
229 pages
Model Estimasi Volume Pohon Keruing
No ratings yet
Model Estimasi Volume Pohon Keruing
66 pages
Improvements On CVBootstrap
No ratings yet
Improvements On CVBootstrap
14 pages
Biometry: Statistics in Biology
No ratings yet
Biometry: Statistics in Biology
11 pages
Understanding Statistics: Definitions & Methods
No ratings yet
Understanding Statistics: Definitions & Methods
7 pages
Welding Cycle Time Reduction Project
100% (1)
Welding Cycle Time Reduction Project
50 pages
ANOVA: Effects of Noise on Memory
No ratings yet
ANOVA: Effects of Noise on Memory
35 pages
Sales Forecasting with Exponential Smoothing
No ratings yet
Sales Forecasting with Exponential Smoothing
20 pages
Statistical Analysis and Calculations Guide
No ratings yet
Statistical Analysis and Calculations Guide
55 pages
Residual Plot Analysis in Regression
No ratings yet
Residual Plot Analysis in Regression
117 pages
Paired vs Unpaired T-Test Explained
No ratings yet
Paired vs Unpaired T-Test Explained
17 pages
Kruskal-Wallis Test Overview and Examples
No ratings yet
Kruskal-Wallis Test Overview and Examples
11 pages
5 Steps in Hypothesis Testing
No ratings yet
5 Steps in Hypothesis Testing
99 pages
Applied Mathematics for Engineers CE 207
No ratings yet
Applied Mathematics for Engineers CE 207
23 pages
CS1 Actuarial Statistics Study Guide
No ratings yet
CS1 Actuarial Statistics Study Guide
34 pages
STAT 121 Confidence Interval Assignment
No ratings yet
STAT 121 Confidence Interval Assignment
3 pages
R Programming: Data Types & Operations
No ratings yet
R Programming: Data Types & Operations
23 pages
South Africa VAR Model: Inflation & Rates
No ratings yet
South Africa VAR Model: Inflation & Rates
20 pages
Inovasi dan Citra Merek Vespa Surabaya
No ratings yet
Inovasi dan Citra Merek Vespa Surabaya
14 pages
R Program Evaluation Summary
No ratings yet
R Program Evaluation Summary
659 pages
Hypothesis Testing Concepts and Activities
No ratings yet
Hypothesis Testing Concepts and Activities
33 pages
Math 105 Syllabus: Engineering Data Analysis
No ratings yet
Math 105 Syllabus: Engineering Data Analysis
18 pages
Inventory Control's Impact on Productivity
No ratings yet
Inventory Control's Impact on Productivity
36 pages
Complete Business Statistics (7th Edition) PDF
No ratings yet
Complete Business Statistics (7th Edition) PDF
10 pages
Understanding Simple Linear Regression
No ratings yet
Understanding Simple Linear Regression
7 pages
Understanding UMVUE in Statistics
No ratings yet
Understanding UMVUE in Statistics
27 pages
Hypergraph Inference from Dynamics
No ratings yet
Hypergraph Inference from Dynamics
9 pages
Chi-Square Tests: Goodness-of-Fit Explained
No ratings yet
Chi-Square Tests: Goodness-of-Fit Explained
16 pages
Influence Curve in Robust Estimation
No ratings yet
Influence Curve in Robust Estimation
12 pages

Parametric Classification and Bias in ML

Uploaded by

Parametric Classification and Bias in ML

Uploaded by

BLG 527E Machine Learning

• Maximum likelihood estimator (MLE)

• Multinomial: K>2 states, xi in {0,1}

MLE for μ and σ2: m t

To evaluate the quality of this estimator we can measure

But since it is random variable (it depends on the

Remember the properties of expectation

Bias: bq(d) = E [d] – q

Mean square error:

r (d,q) = E [(d–q)2]=E [(d–E[d]+E[d]-θ)2]

Remember the properties of expectation

= E[(E[d]-θ)2]+E [(d–E [d])2] +2 (E[d]–E[d])(E[d]-θ)

= (E [d] – θ)2 +E [(d–E [d])2]

= (E [d] – θ)2 + E [(d–E [d])2]

• Bayes’ rule: p (θ|X) = p(X|θ) p(θ) / p(X)

• p(θ|X) ∝ p(X| θ)p(θ)

• Take the derivative with respect to θ

L( ) log       log   

L( ) log        log   

Compute posterior p(θ |X) = p(X| θ)p(θ)/ p(X)

p(θ |X) ∝ p(X| θ)p(θ)

• This example is obtained from Nando Freitas lecture notes

P(x1, x2,…, xN| θ ) = ∏t θ xt (1 – θ ) (1 – xt) = θ m (1 – θ ) (N – m)

Where xt ϵ {0,1} and m is the number of 1’s

• Specify a prior on θ. For this we need to introduce Beta distribution

P(x1, x2,…, xN| θ ) = ∏t θ xt (1 – θ ) (1 – xt) = θ m (1 – θ ) (N – m)

(   )   1 1 1 1

1 1 ( ' ' )  ' 1 '

• Using Beta(2,2) prior 8

• M samples Xi={xti , rti}, i=1,...,M

gi(x)= ∑t rti/N has lower bias with variance

Bias Variance Image is obtained from: [Link] 36

Best fit “min error”

Idea: Penalize large coefficients

You might also like