Maximum Likelihood Estimation Explained

The document discusses Maximum Likelihood Estimation (MLE) as a method for estimating parameters of a model to best fit data, particularly in the context of Gaussian distributions and linear regression. It highlights the process of maximizing the likelihood function and the computational convenience of minimizing the negative log-likelihood instead. Additionally, it contrasts MLE with Maximum A Posteriori (MAP) estimation, which incorporates prior knowledge to mitigate risks associated with relying solely on sample data.

Uploaded by

motherpanda06


Week 4 Lecture

Parameter Estimation: The Maximum Likelihood Estimation (MLE)
Content

o Maximum Likelihood Estimation

o The MLE and The Gaussian
o MLE in Linear Regression
o The Bayesian Way
Likelihood
o For the data given below (i.e. the histogram), which model fits the data best? How
do we choose?

o The MLE aims to answer that question. It gives estimates of the parameters of the
true (but unknown) underlying distribution by using the available data (in this case,
the histogram).
Likelihood
o Let us denote by $w$ the vector of parameters that govern a parametric probability
distribution.
o For instance, the Normal distribution has 2 parameters, $\mu$ and $\sigma$; hence
$w = (\mu, \sigma)$.

o We saw in Week 1 how a model could be optimised on the training set by
minimising a loss function. However, given the data above, we do not have pairs of
$x$ and $y$, but only data on $x$ with no corresponding $y$ to train an algorithm.
o Given such a parametric model for a probability density function $p(x \mid w)$, e.g. a
Gaussian distribution, and some sample data points (a training set without labels $y$),
how are the parameters optimised?
Likelihood
o Aim: We seek to find the values of $w$ that will maximise $p(x \mid w)$ over the observed
data. In other words, we want to find the $w$ that gives the model that will fit our data
the best.
o The MLE aims to optimise the Likelihood function, defined for $N$ independent data
points $x_1, \dots, x_N$ by:

$$L(w) = \prod_{i=1}^{N} p(x_i \mid w)$$

o The likelihood gives the probability (for continuous data, the probability density) of
observing the data that we have (i.e. the histogram above), given a certain value of $w$.
Maximum Likelihood Estimation aims to find the $w$ that will maximise the likelihood $L(w)$.
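To make the definition concrete, the sketch below evaluates the likelihood of a small sample under two candidate Gaussian models. The data values, the two candidate parameter pairs, and the helper names `gaussian_pdf` and `likelihood` are all made up for illustration; the points are assumed to be independent.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def likelihood(data, mu, sigma):
    """Product of the densities of all points (independence assumed)."""
    result = 1.0
    for x in data:
        result *= gaussian_pdf(x, mu, sigma)
    return result

# Hypothetical sample, invented for this example.
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 4.7]

# Two candidate models w = (mu, sigma).
candidate_a = (5.0, 0.3)
candidate_b = (3.0, 0.3)

like_a = likelihood(data, *candidate_a)
like_b = likelihood(data, *candidate_b)

# The candidate centred on the data receives the larger likelihood.
print("L(candidate_a) =", like_a)
print("L(candidate_b) =", like_b)
```

Comparing the two printed values is the "which model fits best" question in miniature: MLE simply carries this comparison out over all possible values of $w$.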
Maximum Likelihood
o We thus want to find the $\hat{w}$ such that:

$$\hat{w} = \arg\max_{w} L(w)$$

o In practice, instead of maximising the likelihood, we choose to minimise the negative
log-likelihood (either way gives rise to the same $\hat{w}$)

o Why?
o 1) The likelihood is a product of many small probabilities and can give rise to
numerical instabilities (underflow), so we take the logarithm to turn the product into a sum.
o 2) Optimisation routines are conventionally written as minimisers, so the negated
log-likelihood can be minimised directly with common algorithms such as Gradient Descent
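The numerical issue in point 1) is easy to reproduce. The sketch below is an illustration under assumed settings (2000 synthetic points drawn from a standard Normal, fixed random seed); it multiplies the densities directly and then sums their logarithms instead.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def gaussian_log_pdf(x, mu, sigma):
    """Logarithm of the Gaussian density, computed without calling exp."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - ((x - mu) ** 2) / (2.0 * sigma ** 2)

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic sample

# Naive likelihood: every factor is below 0.4, so the product of 2000 factors
# falls below the smallest representable float and becomes exactly 0.0.
naive_likelihood = 1.0
for x in data:
    naive_likelihood *= gaussian_pdf(x, 0.0, 1.0)

# Negative log-likelihood: a sum of moderate numbers, which stays finite.
nll = -sum(gaussian_log_pdf(x, 0.0, 1.0) for x in data)

print("naive likelihood:", naive_likelihood)  # prints 0.0
print("negative log-likelihood:", nll)
```

Once the product has collapsed to zero, every candidate $w$ looks equally bad, so nothing can be optimised; the sum of logs keeps the information.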
Maximum Likelihood
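o We thus seek to find $\hat{w}$, the $w$ that minimises the negative log-likelihood (NLL),
such that:

$$\hat{w} = \arg\min_{w} \mathrm{NLL}(w), \qquad \mathrm{NLL}(w) = -\ln L(w) = -\sum_{i=1}^{N} \ln p(x_i \mid w)$$

o The logarithm (commonly in base e) is a monotonically increasing function. Hence, the
value of $w$ which gives a maximum of $L(w)$ is identical to that which minimises the NLL.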
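The equivalence of the two problems can be checked numerically. The sketch below assumes a small made-up sample, a Gaussian model with known $\sigma = 1$, and a simple grid of candidate means; it is a brute-force check, not how the estimate would be computed in practice.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

data = [1.2, 0.7, 1.9, 1.4, 0.8]   # hypothetical sample
sigma = 1.0                        # assumed known for this example
candidates = [i / 100.0 for i in range(301)]  # candidate means 0.00 .. 3.00

def likelihood(mu):
    """Product of densities for a candidate mean."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in data)

def nll(mu):
    """Negative sum of log-densities for a candidate mean."""
    return -sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data)

best_by_likelihood = max(candidates, key=likelihood)
best_by_nll = min(candidates, key=nll)

# Both searches select the same candidate on the grid.
print(best_by_likelihood, best_by_nll)
```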
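Maximum Likelihood and the Gaussian distribution

o The Gaussian p.d.f. in one dimension for data point $x_i$ is

$$p(x_i \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

o The negative log-likelihood function is therefore:

$$\mathrm{NLL}(\mu, \sigma) = \frac{N}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i - \mu)^2$$

Maximum Likelihood and the Gaussian distribution

o The maximum likelihood estimate (M.L.E.) for $\mu$ can be determined directly by
differentiation of the NLL:

$$\frac{\partial\,\mathrm{NLL}}{\partial \mu} = -\frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i - \mu) = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$$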
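For a Gaussian with known $\sigma$, setting the derivative of the NLL with respect to the mean to zero gives the sample mean. The sketch below compares that closed-form value with plain gradient descent on the same NLL. The synthetic data, the seed, the starting point, the learning rate and the iteration count are all choices made for this illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
data = [random.gauss(10.0, 2.0) for _ in range(500)]  # synthetic sample
sigma = 2.0   # assumed known; only the mean is estimated here
n = len(data)

# Closed form: the sample mean.
mu_closed_form = sum(data) / n

def nll_gradient(mu):
    """d(NLL)/d(mu) = -(1 / sigma^2) * sum(x_i - mu)."""
    return -sum(x - mu for x in data) / sigma ** 2

# Gradient descent starting far from the answer.
mu = 0.0
learning_rate = 0.001
for _ in range(2000):
    mu -= learning_rate * nll_gradient(mu)

print("closed form     :", mu_closed_form)
print("gradient descent:", mu)
```

The two values agree to many decimal places, which is a useful sanity check whenever a closed form exists alongside an iterative solver.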
MLE in Linear Regression
o Regression setting: we are given $N$ pairs $(x_i, y_i)$, $i = 1, \dots, N$.

o Assumption 1: The data can be modelled by a line: $y_i = w x_i + \epsilon_i$

o Assumption 2: Noise of each point is modelled by a Gaussian: $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$
MLE in Linear Regression

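o Again, we are interested in predicting $y$ given data on the features $x$.

o For a single data point $(x_i, y_i)$ we are interested in $p(y_i \mid x_i, w)$, where $w$ is
the slope of our linear model.
o Since $p(y_i \mid x_i, w)$ is Normally distributed, we have

$$p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - w x_i)^2}{2\sigma^2}\right)$$

o How do we find the $w$ that best explains our data?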

o MLE!
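The step from the Gaussian noise model to a squared-error objective can be written out explicitly. The derivation below is a sketch that assumes independent data points, a known noise variance $\sigma^2$, and a line through the origin with slope $w$.

```latex
\begin{align*}
\mathrm{NLL}(w)
  &= -\sum_{i=1}^{N} \ln p(y_i \mid x_i, w) \\
  &= -\sum_{i=1}^{N} \ln\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\!\left( -\frac{(y_i - w x_i)^2}{2\sigma^2} \right) \right] \\
  &= \frac{N}{2}\ln(2\pi\sigma^2)
     + \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - w x_i)^2 .
\end{align*}
% The first term and the factor 1/(2 sigma^2) do not depend on w, hence
\[
\hat{w} = \arg\min_{w} \mathrm{NLL}(w)
        = \arg\min_{w} \sum_{i=1}^{N} (y_i - w x_i)^2 .
\]
```

Minimising the NLL under Gaussian noise is therefore the same problem as minimising the sum of squared residuals.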
MLE in Linear Regression
o We introduce the loss function

$$\ell(w) = \sum_{i=1}^{N} (y_i - w x_i)^2$$

o $\ell(w)$ is a convex parabola in $w$ → easy to minimise!

o This specific loss function is also known as the Squared Loss, or the Ordinary Least
Squares (OLS) loss
o The OLS loss can be minimised using gradient descent, Newton's method, or in closed
form
o To be completely accurate, the loss function for data point $i$ is $(y_i - w x_i)^2$. The
cost function is the averaged sum of the losses, $J(w) = \frac{1}{N}\sum_{i=1}^{N}(y_i - w x_i)^2$
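Two of the minimisation routes mentioned above can be compared on synthetic data. The sketch below assumes a line through the origin, for which the closed-form minimiser of the squared loss is $\hat{w} = \sum_i x_i y_i / \sum_i x_i^2$; the true slope, noise level, seed, learning rate and iteration count are made-up values for illustration.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable
true_slope = 3.0     # hypothetical ground truth used to generate data
noise_sigma = 0.5    # hypothetical noise standard deviation
xs = [i / 10.0 for i in range(1, 51)]  # features 0.1 .. 5.0
ys = [true_slope * x + random.gauss(0.0, noise_sigma) for x in xs]
n = len(xs)

def cost(w):
    """Averaged squared loss for slope w."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / n

# Closed form: set d(cost)/dw = 0 and solve for w.
w_closed_form = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Gradient descent on the same cost.
w = 0.0
learning_rate = 0.01
for _ in range(1000):
    gradient = -2.0 / n * sum(x * (y - w * x) for x, y in zip(xs, ys))
    w -= learning_rate * gradient

print("closed form     :", w_closed_form)
print("gradient descent:", w)
print("cost at minimum :", cost(w_closed_form))
```

Because the cost is a convex parabola in $w$, gradient descent with a small enough step converges to the same slope as the closed form.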
MLE Summary
o The MLE method aims to find the best parameters for a model to fit our data
o It does so by finding the parameters that maximise the probability of observing the
data that we have
o For computational purposes, it is often easier to minimise the Negative Log-
Likelihood instead
o In Linear Regression, the estimate of $y$ can be modelled by a Gaussian, where the
mean defines the line, and the variance defines the noise (or the spread of the data)
around that line. The MLE for the mean gives us the Squared Loss function

o What might be a drawback of only relying on the MLE to obtain a model that will be
used for prediction?
MLE Summary
o The Maximum Likelihood Estimator (MLE) gives an estimate based on the sample
only (observed data) by maximising the Likelihood function

o Risk: Since the MLE only considers the data given to us (sample), if that sample is too
small or non-representative of the population, then the model built using the MLE will
do great on that sample, but not so great on the population (which is what we care
about when making predictions on new data).
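One way to see this risk is to repeat the estimation on many small samples and on many large ones, then look at how much the estimates scatter. The sketch below assumes a standard Normal population and uses the sample mean as the MLE of the mean; the sample sizes, the number of repeats and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable
true_mu, true_sigma = 0.0, 1.0  # hypothetical population parameters

def mle_mean(sample):
    """MLE of the Gaussian mean: the sample mean."""
    return sum(sample) / len(sample)

def spread_of_estimates(sample_size, repeats=500):
    """Standard deviation of the MLE across many independent samples."""
    estimates = []
    for _ in range(repeats):
        sample = [random.gauss(true_mu, true_sigma) for _ in range(sample_size)]
        estimates.append(mle_mean(sample))
    return statistics.pstdev(estimates)

spread_small = spread_of_estimates(3)
spread_large = spread_of_estimates(300)

# Estimates from tiny samples scatter far more around the population mean.
print("spread with n=3  :", spread_small)
print("spread with n=300:", spread_large)
```

Each individual estimate is the best fit to its own sample, yet with only three points it can sit far from the population value.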
MLE and MAP
o The Maximum A Posteriori estimate (MAP) uses the observed data (i.e. through the
likelihood function) but also incorporates our prior knowledge of what the distribution
of the parameter that we try to estimate might be

o This prior knowledge aims to address the possible risk of the MLE described above by
bringing information that is not dictated by the sample

o Posterior ∝ Likelihood × Prior

o Risk: If our prior knowledge is wrong, we might make things worse.
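As a concrete case of Posterior ∝ Likelihood × Prior, consider estimating a Gaussian mean with known noise standard deviation and a Gaussian prior on the mean. Under those assumptions (which are choices made for this sketch, not taken from the slides) the MAP estimate has the closed form $\hat{\mu}_{MAP} = \left(\mu_0/\tau^2 + \sum_i x_i/\sigma^2\right) / \left(1/\tau^2 + N/\sigma^2\right)$, where $\mu_0$ and $\tau$ are the prior mean and standard deviation. All numbers below are invented.

```python
# Hypothetical small sample and assumed model settings.
data = [2.9, 3.4, 3.1]
sigma = 1.0        # assumed known noise standard deviation
prior_mu = 0.0     # assumed prior mean for the parameter
prior_tau = 1.0    # assumed prior standard deviation

n = len(data)

# MLE: uses the sample only.
mu_mle = sum(data) / n

# MAP: precision-weighted combination of the prior mean and the data.
prior_precision = 1.0 / prior_tau ** 2
data_precision = n / sigma ** 2
mu_map = (prior_precision * prior_mu + sum(data) / sigma ** 2) / (
    prior_precision + data_precision
)

# With only three points, the prior pulls the estimate away from the MLE.
print("MLE:", mu_mle)
print("MAP:", mu_map)
```

The MAP estimate lands between the prior mean and the MLE. If the prior mean is far from the truth, that pull is exactly the "might make things worse" risk noted above; as $N$ grows, the data term dominates and the MAP estimate approaches the MLE.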
The Bayesian Way
Questions?
