Chapter 7. Statistical Estimation: 7.2: Maximum Likelihood Examples

Chapter 7.2 discusses maximum likelihood estimation (MLE) with examples from Poisson, Exponential, and Uniform distributions. It explains the process of computing likelihoods, taking derivatives, and finding estimates, highlighting the importance of MLE in statistics and machine learning. The chapter emphasizes that while most MLE problems follow a straightforward approach, the Uniform distribution presents unique challenges due to its dependency on the maximum sample value.

Chapter 7.

Statistical Estimation
7.2: Maximum Likelihood Examples
(From “Probability & Statistics with Applications to Computing” by Alex Tsun)

We spend an entire section just doing examples because maximum likelihood is such a fundamental concept
used everywhere (especially machine learning). I promise that the idea is simple: find θ that maximizes the
likelihood of the data. The computation and notation can be confusing at first though.

7.2.1 MLE Example (Poisson)

Example(s)

Let’s say $x_1, x_2, \dots, x_n$ are iid samples from $\text{Poi}(\theta)$. (These values might look like $x_1 = 13$, $x_2 = 5$, $x_3 = 6$, etc.) What is the MLE of $\theta$?

Solution Remember that we discussed that the sample mean might be a good estimate of θ. If we observed
20 events over 5 units of time, a good estimate for λ (the average number of events per unit of time, which is our θ here) would
be 20/5 = 4. This turns out to be the maximum likelihood estimate!
Let’s follow the recipe provided in 7.1.

1. Compute the likelihood and log-likelihood of data. To do this, we take the following product
of the Poisson PMFs at each sample $x_i$, over all the data points:

$$L(x \mid \theta) = \prod_{i=1}^{n} p_X(x_i \mid \theta) = \prod_{i=1}^{n} e^{-\theta} \frac{\theta^{x_i}}{x_i!}$$

Again, this is the probability of seeing $x_1$, then $x_2$, and so on. This function is pretty hard to differentiate,
so to make it easier, let’s compute the log-likelihood instead, using the following identities:

$$\log(ab) = \log(a) + \log(b) \qquad \log(a/b) = \log(a) - \log(b) \qquad \log(a^b) = b \log(a)$$

In most cases, we’ll want to optimize the log-likelihood instead of the likelihood (since we don’t want
to use the product rule of calculus)!
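$$\begin{aligned}
\ln L(x \mid \theta) &= \ln\left(\prod_{i=1}^{n} e^{-\theta} \frac{\theta^{x_i}}{x_i!}\right) && \text{[def of likelihood]} \\
&= \sum_{i=1}^{n} \ln\left(e^{-\theta} \frac{\theta^{x_i}}{x_i!}\right) && \text{[log of product is sum of logs]} \\
&= \sum_{i=1}^{n} \left[\ln(e^{-\theta}) + \ln(\theta^{x_i}) - \ln(x_i!)\right] && \text{[log of product is sum of logs]} \\
&= \sum_{i=1}^{n} \left[-\theta + x_i \ln\theta - \ln(x_i!)\right] && \text{[other log properties]}
\end{aligned}$$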
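To see that the sum really is the log of the product, here is a small Python sketch (not part of the original text). The function names and the candidate value θ = 4 are illustrative assumptions; the samples are the example values mentioned earlier.

```python
import math

def poisson_likelihood(xs, theta):
    """Direct product of the Poisson PMFs e^(-theta) * theta^x / x!."""
    return math.prod(math.exp(-theta) * theta**x / math.factorial(x) for x in xs)

def poisson_log_likelihood(xs, theta):
    """Sum over samples of -theta + x*ln(theta) - ln(x!)."""
    # math.lgamma(x + 1) equals ln(x!) for a non-negative integer x
    return sum(-theta + x * math.log(theta) - math.lgamma(x + 1) for x in xs)

samples = [13, 5, 6]  # example values from the text
theta = 4.0           # arbitrary candidate, chosen only for illustration

log_of_product = math.log(poisson_likelihood(samples, theta))
sum_of_logs = poisson_log_likelihood(samples, theta)
print(log_of_product, sum_of_logs)  # the two numbers should agree
```

Working with the sum also avoids multiplying many tiny probabilities together, which can underflow to 0 for large n.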


2. Take the partial derivative(s) with respect to θ and set to 0. Solve the equation(s).
Now we want to take the derivative of the log-likelihood with respect to θ. The derivative of $-\theta$ is
just $-1$, and the derivative of $x_i \ln\theta$ is just $x_i/\theta$, because $x_i$ is a constant with respect to θ.
The $\ln(x_i!)$ term doesn’t involve θ at all, so its derivative is 0.

$$\frac{\partial}{\partial\theta} \ln L(x \mid \theta) = \sum_{i=1}^{n} \left[-1 + \frac{x_i}{\theta}\right]$$

Now we set the derivative equal to 0 and solve for θ; the solution is our estimate $\hat\theta$.
We do some algebra and get $\frac{1}{n}\sum_{i=1}^{n} x_i$, which is just the sample mean!

$$\sum_{i=1}^{n} \left[-1 + \frac{x_i}{\hat\theta}\right] = 0 \;\rightarrow\; -n + \frac{1}{\hat\theta}\sum_{i=1}^{n} x_i = 0 \;\rightarrow\; \hat\theta = \frac{1}{n}\sum_{i=1}^{n} x_i$$
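As a sanity check on the algebra, the sketch below evaluates the Poisson log-likelihood on a grid of candidate θ values and picks the best one. The grid range and spacing are arbitrary assumptions made for illustration, and the samples are the example values from the problem statement.

```python
import math

def log_likelihood(xs, theta):
    # The -ln(x_i!) term is omitted: it does not depend on theta,
    # so it cannot change where the maximum is.
    return sum(-theta + x * math.log(theta) for x in xs)

samples = [13, 5, 6]
sample_mean = sum(samples) / len(samples)

# Candidate thetas 0.01, 0.02, ..., 20.00 (an arbitrary grid)
grid = [k / 100 for k in range(1, 2001)]
best_theta = max(grid, key=lambda t: log_likelihood(samples, t))

print("sample mean:", sample_mean)
print("best theta on grid:", best_theta)
```

A grid search is only a rough numerical check, but here the best grid point should coincide with the sample mean.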

3. Optionally, verify $\hat\theta_{MLE}$ is indeed a (local) maximizer by checking that the second derivative
at $\hat\theta_{MLE}$ is negative (if θ is a single parameter), or the Hessian (matrix of second
partial derivatives) is negative semi-definite (if θ is a vector of parameters).
We want to take the second derivative as well, because otherwise we don’t know whether this is a maximum or
a minimum. We differentiate the first derivative $\sum_{i=1}^{n} [-1 + x_i/\theta]$ again with respect to θ. Because
$\theta^2$ is always positive and the $x_i$ are non-negative counts, each term $-x_i/\theta^2$ is at most 0, so the second
derivative is less than 0 as long as at least one $x_i$ is positive. That means the log-likelihood is concave down everywhere,
so anywhere the derivative is zero is a global maximum. We’ve successfully found the global maximum of our
likelihood.

$$\frac{\partial^2}{\partial\theta^2} \ln L(x \mid \theta) = \sum_{i=1}^{n} \left[-\frac{x_i}{\theta^2}\right] < 0 \;\rightarrow\; \text{concave down everywhere}$$
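The concavity claim can also be checked numerically. This sketch compares the exact second derivative $-\sum_i x_i/\theta^2$ with a central finite-difference approximation at a few θ values; the test points and the step size h are illustrative assumptions, not part of the original text.

```python
import math

def log_likelihood(xs, theta):
    """Poisson log-likelihood: sum of -theta + x*ln(theta) - ln(x!)."""
    return sum(-theta + x * math.log(theta) - math.lgamma(x + 1) for x in xs)

def second_derivative_exact(xs, theta):
    """Closed form from the derivation: -(sum of x_i) / theta^2."""
    return -sum(xs) / theta**2

def second_derivative_numeric(f, theta, h=1e-3):
    """Central finite-difference approximation of f''(theta)."""
    return (f(theta + h) - 2 * f(theta) + f(theta - h)) / h**2

samples = [13, 5, 6]
rows = []
for theta in [0.5, 2.0, 8.0, 30.0]:
    exact = second_derivative_exact(samples, theta)
    approx = second_derivative_numeric(lambda t: log_likelihood(samples, t), theta)
    rows.append((theta, exact, approx))
    print(theta, exact, approx)
```

Every row should show a negative value, consistent with the log-likelihood being concave down at each tested θ.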

7.2.2 MLE Example (Exponential)

Example(s)

Let’s say $x_1, x_2, \dots, x_n$ are iid samples from $\text{Exp}(\theta)$. (These values might look like $x_1 = 1.354$, $x_2 = 3.198$, $x_3 = 4.312$, etc.) What is the MLE of $\theta$?

Solution Now that we’ve seen one example, we’ll just follow the procedure given in the previous section.

1. Compute the likelihood and log-likelihood of data.

Since we have a continuous distribution, our likelihood is the product of the PDFs:
$$L(x \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta) = \prod_{i=1}^{n} \theta e^{-\theta x_i}$$

The log-likelihood is
$$\ln L(x \mid \theta) = \sum_{i=1}^{n} \ln\left(\theta e^{-\theta x_i}\right) = \sum_{i=1}^{n} \left[\ln(\theta) - \theta x_i\right]$$
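Here is a short Python sketch of the two expressions above, checking that the log of the product of densities matches the sum of logs. The function names and the candidate value θ = 0.5 are assumptions for illustration; the samples are the example values from the problem statement.

```python
import math

def exp_density(x, theta):
    """PDF of Exp(theta) at x >= 0: theta * e^(-theta * x)."""
    return theta * math.exp(-theta * x)

def exp_log_likelihood(xs, theta):
    """Sum over samples of ln(theta) - theta * x_i."""
    return sum(math.log(theta) - theta * x for x in xs)

samples = [1.354, 3.198, 4.312]
theta = 0.5  # arbitrary candidate, chosen only for illustration

log_of_product = math.log(math.prod(exp_density(x, theta) for x in samples))
sum_of_logs = exp_log_likelihood(samples, theta)
print(log_of_product, sum_of_logs)  # the two numbers should agree
```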

2. Take the partial derivative(s) with respect to θ and set to 0. Solve the equation(s).
$$\frac{\partial}{\partial\theta} \ln L(x \mid \theta) = \sum_{i=1}^{n} \left[\frac{1}{\theta} - x_i\right]$$

Now, we set the derivative to 0 and solve (here we replace θ with θ̂):
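$$\sum_{i=1}^{n} \left[\frac{1}{\hat\theta} - x_i\right] = 0 \;\rightarrow\; \frac{n}{\hat\theta} - \sum_{i=1}^{n} x_i = 0 \;\rightarrow\; \hat\theta = \frac{n}{\sum_{i=1}^{n} x_i}$$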

This is just the inverse of the sample mean! This makes sense because if the average waiting time was
1/2 hours, then the average rate per unit of time λ should be $\frac{1}{1/2} = 2$ per hour!

Since the second derivative, $-n/\theta^2$, is negative everywhere, the function is concave down, and any critical point
is a global maximum!
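A quick way to check this estimate is to evaluate the derivative of the log-likelihood at $\hat\theta = n / \sum_i x_i$ and on either side of it. The sketch below does that for the example samples; the function name `score` and the factors 0.5 and 2 used for the side points are illustrative choices.

```python
def score(xs, theta):
    """Derivative of the exponential log-likelihood: sum of (1/theta - x_i)."""
    return sum(1 / theta - x for x in xs)

samples = [1.354, 3.198, 4.312]
theta_hat = len(samples) / sum(samples)  # inverse of the sample mean

at_estimate = score(samples, theta_hat)      # should be (numerically) zero
below = score(samples, 0.5 * theta_hat)      # positive: still climbing
above = score(samples, 2.0 * theta_hat)      # negative: past the peak

print("theta_hat:", theta_hat)
print("score below, at, above:", below, at_estimate, above)
```

The sign change from positive to negative around $\hat\theta$ is what we expect at a maximum of a concave function.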

7.2.3 MLE Example (Uniform)

Example(s)

Let’s say $x_1, x_2, \dots, x_n$ are iid samples from (continuous) $\text{Unif}(0, \theta)$. (These values might look like
$x_1 = 2.325$, $x_2 = 1.1242$, $x_3 = 9.262$, etc.) What is the MLE of $\theta$?

Solution It turns out our usual procedure won’t work on this example, unfortunately. We’ll explain why
once we run into the problem!
To compute the likelihood, we first need the individual density functions. Recall

$$f_X(x \mid \theta) = \begin{cases} \frac{1}{\theta} & 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

Let’s actually define an indicator function for whether or not some boolean condition A is true or false:
$$I_A = \begin{cases} 1 & A \text{ is true} \\ 0 & A \text{ is false} \end{cases}$$

This way, we can rewrite the uniform density in one line as (1/θ for 0 ≤ x ≤ θ and 0 otherwise):
$$f_X(x \mid \theta) = \frac{1}{\theta} I_{\{0 \le x \le \theta\}}$$
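The one-line density translates almost directly into code. This is a minimal sketch with hypothetical function names (`indicator`, `uniform_density`) and made-up test points, just to show the indicator doing the work of the "otherwise" case.

```python
def indicator(condition):
    """Return 1 if the boolean condition is true, 0 if it is false."""
    return 1 if condition else 0

def uniform_density(x, theta):
    """Density of Unif(0, theta) at x, written as (1/theta) * indicator."""
    return (1 / theta) * indicator(0 <= x <= theta)

print(uniform_density(2.0, 4.0))   # inside [0, 4]
print(uniform_density(5.0, 4.0))   # above theta
print(uniform_density(-1.0, 4.0))  # below 0
```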

First, we take the product over all data points of the density at that data point, and plug in the density of
the uniform distribution. How do we simplify this? First of all, we notice that in every term in the product,
there is still a $\frac{1}{\theta}$, so multiply it by itself n times and get $\frac{1}{\theta^n}$. How do we multiply indicators? If we want the
product of 1’s and 0’s to be 1, they ALL have to be 1. So,

$$I_{\{0 \le x_1 \le \theta\}} \cdot I_{\{0 \le x_2 \le \theta\}} \cdots I_{\{0 \le x_n \le \theta\}} = I_{\{0 \le x_1, \dots, x_n \le \theta\}}$$

and our likelihood is

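$$L(x \mid \theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta) = \prod_{i=1}^{n} \frac{1}{\theta} I_{\{0 \le x_i \le \theta\}} = \frac{1}{\theta^n} I_{\{0 \le x_1, \dots, x_n \le \theta\}}$$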
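To check the simplification, the sketch below computes the likelihood both ways: as a product of per-sample densities, and in the collapsed form with a single indicator. The function names and the three candidate θ values are assumptions for illustration; the samples are the example values from the problem statement.

```python
import math

def likelihood_product(xs, theta):
    """Product over samples of (1/theta) * I{0 <= x_i <= theta}."""
    return math.prod((1 / theta) * (1 if 0 <= x <= theta else 0) for x in xs)

def likelihood_collapsed(xs, theta):
    """(1/theta^n) * I{all samples lie in [0, theta]}."""
    all_inside = all(0 <= x <= theta for x in xs)
    return (1 / theta ** len(xs)) * (1 if all_inside else 0)

samples = [2.325, 1.1242, 9.262]
results = []
for theta in [5.0, 9.262, 12.0]:
    pair = (likelihood_product(samples, theta), likelihood_collapsed(samples, theta))
    results.append(pair)
    print(theta, pair)
```

For θ = 5 a single sample outside [0, θ] is enough to zero out the whole product, which is exactly what the combined indicator expresses.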

We could take the log-likelihood before differentiating, but this function isn’t too bad-looking, so let’s take
the derivative of it directly. The indicator $I_{\{0 \le x_1, \dots, x_n \le \theta\}}$ just says the function is $\frac{1}{\theta^n}$ when the condition is true and 0
otherwise. So our derivative will just be the derivative of $\frac{1}{\theta^n}$ when that condition is true and 0 otherwise.

$$\frac{d}{d\theta} L(x \mid \theta) = -\frac{n}{\theta^{n+1}} I_{\{0 \le x_1, \dots, x_n \le \theta\}}$$

Now, let’s set the derivative equal to 0 and solve for θ.

$$-\frac{n}{\theta^{n+1}} = 0 \;\rightarrow\; \theta = \,???$$

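There seems to be no value of θ that solves this, so what’s going on? Let’s plot the likelihood. First, we plot
just $\frac{1}{\theta^n}$ (not quite the likelihood) where θ is on the x-axis:

[Figure: graph of $\frac{1}{\theta^n}$ against θ]

Above is a graph of $\frac{1}{\theta^n}$, and so if we wanted to maximize this function, we should choose θ as close to 0 as possible. But
remember that the likelihood was $\frac{1}{\theta^n} I_{\{0 \le x_1, \dots, x_n \le \theta\}}$, which can also be written as $\frac{1}{\theta^n} I_{\{x_{max} \le \theta\}}$, because all
the samples are ≤ θ if and only if the maximum is. Below is the graph of the actual likelihood:

[Figure: graph of the likelihood $\frac{1}{\theta^n} I_{\{x_{max} \le \theta\}}$ against θ]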

Notice that multiplying by the indicator function just kept the function as is when the condition
$x_{max} \le \theta$ was true, but zeroed it out otherwise. So now we can see that our maximum likelihood estimator should be
$\hat\theta_{MLE} = x_{max} = \max\{x_1, x_2, \dots, x_n\}$, since it achieves the highest value.

Why? Remember $x_1, \dots, x_n \sim \text{Unif}(0, \theta)$, so θ has to be at least as large as the biggest $x_i$, because if it’s
not as large as the biggest $x_i$, then it would have been impossible for that uniform to produce that largest
$x_i$. For example, if our samples were $x_1 = 2.53$, $x_2 = 8.55$, $x_3 = 4.12$, our θ had to be at least 8.55 (the
maximum sample), because if it were 7 for example, then Unif(0, 7) could not possibly generate the sample
8.55.

So the $\frac{1}{\theta^n}$ factor in our likelihood would have preferred as small a θ as possible, but subject to
$\theta \ge x_{max}$. Therefore the “compromise” was reached by making them equal!
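The same argument can be seen numerically. This sketch evaluates the uniform likelihood at a handful of candidate θ values for the samples 2.53, 8.55, 4.12 used in the explanation; the candidate values and the function name are hypothetical choices for illustration.

```python
def uniform_likelihood(xs, theta):
    """(1 / theta^n) when every sample lies in [0, theta], else 0."""
    if theta <= 0 or min(xs) < 0 or max(xs) > theta:
        return 0.0
    return 1.0 / theta ** len(xs)

samples = [2.53, 8.55, 4.12]
x_max = max(samples)

# Hypothetical candidates: two below x_max, x_max itself, two above
candidates = [7.0, 8.0, x_max, 9.0, 12.0]
values = {theta: uniform_likelihood(samples, theta) for theta in candidates}
for theta, value in values.items():
    print(f"theta = {theta:5.2f}  likelihood = {value:.6f}")

best_theta = max(candidates, key=lambda t: uniform_likelihood(samples, t))
print("best candidate:", best_theta)
```

Candidates below the largest sample get likelihood 0, and above it the likelihood shrinks as θ grows, so the best candidate is the largest sample itself.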

I’d like to point out this is a special case because the range of the uniform distribution depends on its
parameter(s) a, b (the range of Unif(a, b) is [a, b]). On the other hand, most of our distributions like Poisson
or Exponential have the same range no matter what the values of their parameters are. For example, the
range of Poi(λ) is always {0, 1, 2, . . . } and the range of Exp(λ) is always [0, ∞), independent of λ.

Therefore, most MLE problems will be similar to the first two examples rather than this complicated one!
