
Point Estimation in Statistics

Chapter 2 of MATH 3423 focuses on point estimation, detailing methods for estimating unknown parameters using statistics, particularly through Maximum Likelihood Estimation (MLE). It introduces the concept of likelihood and explains how to find MLE, emphasizing its desirable properties such as being asymptotically unbiased and normally distributed. The chapter also discusses the importance of understanding the parameter space and provides examples of finding MLE in different statistical contexts.

Uploaded by

Zexi Tang


MATH 3423 Statistical Inference | Dr. CW YU

Chapter 2: Point Estimation

This chapter has two parts:
Part I: Finding estimators in general cases and Evaluating Estimators, and
Part II: Best Unbiased Estimator (UMVUE).

1 PART I: INTRODUCTION
In Part I, we first study how to estimate one or more parameters in general cases using a methodical
estimation principle, and then discuss how to evaluate the resulting estimators.

Point Estimation:
The idea of point estimation is simple: we use a statistic 𝑇(𝒙) to estimate the unknown parameter, say 𝜃, where 𝒙 = (𝑥1 , … , 𝑥𝑛 )′ is a realization of the random sample 𝑿 = (𝑋1 , … , 𝑋𝑛 )′ (or {𝑋𝑖 : 𝑖 = 1, … , 𝑛}) of size 𝑛 from a population with a pdf 𝑓(⋅ |𝜃) or pmf 𝑝(⋅ |𝜃), and 𝜃 lies in the parameter space Θ.

In some cases, there is an obvious or natural point estimator of an unknown parameter. For
instance, the sample mean of a random sample is a natural point estimator of the population mean.
However, beyond such simple cases we need a more methodical estimation technique that will at
least give us a reasonable candidate for consideration. In Section 2, we study the most commonly
used estimation approach in statistics: Maximum Likelihood Estimation.

Remark:

[Parameter of interest] Most often, the parameter(s) we want to estimate (called the
estimand) is a function of the unknown distribution parameter(s) 𝜃, say 𝑔(𝜃). For instance, we
may be interested in $\mu^2$ instead of 𝜇, or 𝜎/𝜇 instead of 𝜇 or 𝜎 alone.
[Estimator? Estimate?] An estimator is a function of the random sample 𝑿, while an estimate is
the realized value of the estimator that is obtained when a sample of data is actually taken.

[Why is it called ‘point’ estimation?] Note that the statistic 𝑇 is indeed a ‘point’ in $\mathbb{R}^k$, where
𝑘 ≥ 1 is the number of unknown parameters to be estimated. We use it to estimate
𝑔(𝜃), which is also a ‘point’ in $\mathbb{R}^k$. This is why 𝑇(𝒙) is called a point estimate of 𝑔(𝜃).
Caution: We ONLY estimate UNKNOWN parameters. There is no point in estimating a KNOWN
parameter!


2 GENERAL METHOD OF FINDING ESTIMATORS

In practice, many estimation techniques can be used to estimate unknown parameters, but we
detail only one: maximum likelihood estimation. This is mainly because it is the most popular
method in statistics and data science, and it has desirable properties such as asymptotic
unbiasedness and asymptotic normality.

2.1 MAXIMUM LIKELIHOOD ESTIMATION (MLE)

The method of maximum likelihood was popularized in mathematical statistics by Ronald
Aylmer Fisher in 1922. Nowadays, there is still a great deal of research on its properties.

Ronald Aylmer Fisher (1890-1962)

Fisher is one of the most prominent statisticians of the 19th-20th century. Other examples of his
contributions are sufficiency, consistency, efficiency, Fisher information, genetical statistics, etc.

More details about him can be found in the article “How Ronald Fisher became a mathematical
statistician” by Stephen M. Stigler.

Before showing how to find MLE, let’s first understand what the ‘likelihood’ is.

WHAT IS ‘LIKELIHOOD’?
The likelihood function quantifies how likely our observed data are to occur, as a function of the parameter.
Definition: Consider a r.s. of size 𝑛 from a population with a pdf 𝑓(⋅ |𝜃) or pmf 𝑝(⋅ |𝜃). After
collection, we have the realization 𝒙 = (𝑥1 , … , 𝑥𝑛 )′. The likelihood function is defined by
$$L(\theta) = L(\theta_1, \theta_2, \ldots, \theta_k \mid \boldsymbol{x}) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
for continuous cases, and
$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$
for discrete cases.
Remark that 𝐿(𝜃) is a function of 𝜃, with 𝒙 held fixed.
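To make the definition concrete, here is a minimal Python sketch. All new examples in this chapter use Python; the notes themselves use R in Section 2.2. The sketch assumes a Poisson(𝜆) model and a hypothetical realization 𝒙 of size 5, and evaluates 𝐿(𝜆) as the product of pmf values with 𝒙 held fixed. The names `poisson_pmf` and `likelihood` are illustrative only.

```python
import math

def poisson_pmf(x, lam):
    """p(x | lam) = exp(-lam) * lam**x / x! for a Poisson(lam) model."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def likelihood(lam, data):
    """L(lam) = prod_i p(x_i | lam): a function of lam with the data held fixed."""
    value = 1.0
    for x in data:
        value *= poisson_pmf(x, lam)
    return value

data = [2, 0, 3, 1, 4]   # hypothetical realization x = (x1, ..., x5)'
for lam in (1.0, 2.0, 3.0):
    print(f"L({lam}) = {likelihood(lam, data):.6f}")
```

Here ∑𝑥𝑖 = 10 and ∏𝑥𝑖! = 288, so 𝐿(𝜆) = 𝑒^(−5𝜆) 𝜆^10 / 288. Among the three values tried, 𝜆 = 2 (the sample mean of these data) gives the largest likelihood.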

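Why this definition? In the discrete case, the likelihood is exactly the joint probability of getting the observed data. By independence,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta),$$
where 𝑝 is the common pmf of the 𝑋𝑖. In the continuous case, $P(X_1 = x_1, \ldots, X_n = x_n \mid \theta) = 0$, so we cannot use this probability directly. Instead, we consider the probability that each 𝑋𝑖 falls in a small interval around 𝑥𝑖:
$$P(x_i < X_i \le x_i + dx_i,\ i = 1, \ldots, n \mid \theta) = \prod_{i=1}^{n} P(x_i < X_i \le x_i + dx_i \mid \theta) \approx \prod_{i=1}^{n} f(x_i \mid \theta)\, dx_i.$$
The equality uses independence and the approximation uses the identical distribution of the 𝑋𝑖. Up to the factor $\prod dx_i$, which does not depend on 𝜃, this is 𝐿(𝜃).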

The basic principle of the maximum likelihood estimation:

It comes from the statistical belief that the data we actually observed are the data with the highest
chance of occurring. For each realization 𝒙, we look for a value of 𝜃 in Θ, denoted by 𝜃̂, at
which 𝐿(𝜃) attains its maximum. In other words, this value, the maximum likelihood estimate
(MLE), makes our observed data most likely to occur.

More formally, we have the following definition of MLE.

Definition (MLE): The maximum likelihood estimate is $\hat\theta = \arg\max_{\theta \in \Theta} L(\theta)$, which means
$$L(\hat\theta) = \max_{\theta \in \Theta} L(\theta),$$
where $\max_{\theta \in \Theta}$ means the maximum over the parameter space Θ.

We also use the abbreviation MLE for the maximum likelihood estimator when we study the
properties of this “maximum likelihood” estimation method.
In some cases, especially when differentiation is used, it is easier to work with the natural
logarithm of 𝐿(𝜃), i.e. 𝑙(𝜃) = log 𝐿(𝜃), called the log likelihood, than with 𝐿(𝜃)
directly. This is possible because the log function is strictly increasing, which implies that the
maxima of 𝐿(𝜃) and 𝑙(𝜃) coincide.
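The claim that 𝐿(𝜃) and 𝑙(𝜃) share the same maximizer can be checked numerically. The sketch below assumes the 𝑁(𝜃, 1) model and a hypothetical sample of size 20 drawn from 𝑁(1.5, 1). It maximizes both functions over a fine grid of candidate values. The grid search is only an illustration, not a general-purpose maximizer.

```python
import math
import random

# Hypothetical data: n = 20 draws from N(theta, 1) with theta = 1.5.
rng = random.Random(3423)
data = [rng.gauss(1.5, 1.0) for _ in range(20)]

def normal_pdf(x, theta):
    """pdf of N(theta, 1) evaluated at x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

def likelihood(theta):
    """L(theta): product of f(x_i | theta) over the sample."""
    return math.prod(normal_pdf(x, theta) for x in data)

def log_likelihood(theta):
    """l(theta) = log L(theta): sum of log f(x_i | theta)."""
    return sum(math.log(normal_pdf(x, theta)) for x in data)

grid = [i / 1000 for i in range(-1000, 4001)]   # candidate values -1.000, ..., 4.000
argmax_L = max(grid, key=likelihood)
argmax_l = max(grid, key=log_likelihood)
print(argmax_L, argmax_l, sum(data) / len(data))   # same grid point, next to xbar
```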

Remark:
MLE may not exist or may not be unique in Θ.
The MLE is often biased, and most often there is no closed form of the MLE.
[Invariance property] If 𝜃̂𝑖 is the MLE for 𝜃𝑖 for 𝑖 = 1, … , 𝑘, then ℎ(𝜃̂1 , 𝜃̂2 , … , 𝜃̂𝑘 ) is the
MLE for ℎ(𝜃1 , 𝜃2 , … , 𝜃𝑘 ), where ℎ is a known function.

For $\theta \in \mathbb{R}^k$, under regularity assumptions, the MLE $\hat\theta_n$ is consistent, asymptotically unbiased, asymptotically efficient and
asymptotically normally distributed. To be more precise, we have
$$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N_k\big(\mathbf{0},\, I_X^{-1}(\theta)\big),$$
where $I_X(\theta)$ is known as the Fisher information matrix (more details about this matrix will
be discussed in Part II). It is a 𝑘 × 𝑘 matrix with the (𝑖, 𝑗)th entry defined as
$$E\left[\left(\frac{\partial}{\partial\theta_i}\log f_X(X \mid \theta)\right)\left(\frac{\partial}{\partial\theta_j}\log f_X(X \mid \theta)\right)\right]$$
for 𝑖 = 1, … , 𝑘 and 𝑗 = 1, … , 𝑘.
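As a sketch of the Fisher information definition in the one-parameter case (𝑘 = 1), the code below computes the Fisher information of a Poisson(𝜆) model. It sums the squared score against the pmf, with the pmf playing the role of 𝑓. The Poisson model and the 200-term truncation of the sum are assumptions for illustration. Analytically the score is 𝑥/𝜆 − 1 and 𝐼(𝜆) = 1/𝜆, so the asymptotic variance of √𝑛(𝜆̂𝑛 − 𝜆) would be 𝜆.

```python
import math

def poisson_pmf(x, lam):
    """Poisson pmf computed on the log scale to avoid overflow for large x."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def score(x, lam):
    """d/d(lam) log p(x | lam) = x / lam - 1 for the Poisson pmf."""
    return x / lam - 1.0

def fisher_information(lam, terms=200):
    """E[score^2], summing over x = 0, 1, ..., terms - 1 (truncated sum)."""
    return sum(score(x, lam) ** 2 * poisson_pmf(x, lam) for x in range(terms))

for lam in (0.5, 2.0, 10.0):
    print(lam, fisher_information(lam), 1 / lam)   # numerical vs analytic 1/lam
```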


There are three standard approaches to finding the MLE. Our job is to find a global maximum!!!
(i) If the parameter space Θ contains finitely many points, then an MLE can always be
obtained by simply comparing the finitely many values of (log) 𝐿(𝜃), 𝜃 ∈ Θ.

(ii) If 𝐿(𝜃) is differentiable on the interior of Θ, then one possible way of finding an MLE
is to consider the values of 𝜃 = (𝜃1 , 𝜃2 , … , 𝜃𝑘 )′ in the interior that solve the
first-order (likelihood, or log likelihood) equations
$$\frac{\partial}{\partial\theta_i} L(\theta) = 0 \quad \text{or} \quad \frac{\partial}{\partial\theta_i} l(\theta) = 0, \qquad i = 1, \ldots, k.$$
However, this is only a necessary condition for a maximum (or minimum), not a
sufficient one. To be more precise, the solutions to the above equations are
just the critical points, which may or may not be extrema. Furthermore, the zeros of
the first derivative only locate the critical points in the interior of the domain of
(log) 𝐿(𝜃). If the maximum occurs on the boundary, the first derivative need not be
zero there. Thus, the boundary must be checked separately for the MLE.
[A special case for one-parameter cases] When 𝑘 = 1, there is a case in which we can
get a global maximum easily. If there is a unique critical point and the second
derivative of (log) 𝐿(𝜃) is negative there, then it must be a global maximum. Note that in
this case we do not have to check any boundary point!!

Example: Consider a random sample of size 𝑛 from 𝑁(𝜃, 1). Then,
$$L(\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x_i-\theta)^2}{2}} = \frac{1}{(2\pi)^{n/2}} e^{-\frac{\sum (x_i-\theta)^2}{2}}.$$
Setting the first derivative of log 𝐿(𝜃) to 0 gives
$$0 = \frac{d}{d\theta} l(\theta) = \frac{d}{d\theta}\left[-\frac{n}{2}\log(2\pi)\right] + \frac{d}{d\theta}\left[-\frac{\sum (x_i-\theta)^2}{2}\right] = \sum (x_i - \theta),$$
which yields the solution 𝜃̂ = 𝑥̅. To verify that it is, in fact, a global
maximum of log 𝐿(𝜃) (or 𝐿(𝜃)), we first note that it is the unique solution
to the first-order equation. Second, we can check that
$$l''(\bar{x}) = \left.\frac{d^2}{d\theta^2} l(\theta)\right|_{\theta=\bar{x}} = -n < 0.$$
Therefore, 𝜃̂ = 𝑥̅ is a global maximum, i.e. the MLE.

Note that the parameter space here is Θ = (−∞, ∞), i.e. 𝜃 can be any real value, so there is no boundary point to check.

(iii) Another way to find an MLE is to abandon differentiation and proceed with a direct
maximization. One general technique is to find a global upper bound on (log) 𝐿(𝜃)
and then establish that there is a unique point for which the upper bound is
attained.

Example (cont’): Instead of using calculus, we can also show algebraically that 𝜃̂ = 𝑥̅ is the MLE.
Since
$$\sum (x_i-\theta)^2 = \sum (x_i-\bar{x})^2 + n(\bar{x}-\theta)^2 \ge \sum (x_i-\bar{x})^2$$
for any 𝜃, with equality if and only if 𝜃 = 𝑥̅, we have, for any 𝜃 ∈ Θ,
$$L(\theta) \le L(\bar{x})$$
with equality if and only if 𝜃 = 𝑥̅. Hence, the MLE for 𝜃 is 𝑥̅.
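Both routes for the 𝑁(𝜃, 1) example can be checked numerically. This minimal sketch uses a hypothetical sample of size 25 and does three things:

- verifies that 𝑙′(𝑥̅) = 0;
- approximates 𝑙″(𝑥̅) by a central difference, which should give −𝑛;
- checks the decomposition used in approach (iii) at a few values of 𝜃.

```python
import math
import random

rng = random.Random(1)
x = [rng.gauss(0.7, 1.0) for _ in range(25)]   # hypothetical r.s. from N(theta, 1)
n = len(x)
xbar = sum(x) / n

def l(theta):
    """Log likelihood of N(theta, 1): -(n/2) log(2 pi) - sum (x_i - theta)^2 / 2."""
    return -n / 2 * math.log(2 * math.pi) - sum((xi - theta) ** 2 for xi in x) / 2

def dl(theta):
    """First derivative: l'(theta) = sum (x_i - theta)."""
    return sum(xi - theta for xi in x)

h = 1e-4
second = (l(xbar + h) - 2 * l(xbar) + l(xbar - h)) / h ** 2   # central-difference l''(xbar)
print(dl(xbar), second)        # approximately 0 and -n = -25

# Approach (iii): sum (x_i - theta)^2 = sum (x_i - xbar)^2 + n (xbar - theta)^2,
# hence L(theta) <= L(xbar) for every theta.
for theta in (-1.0, 0.0, 2.0):
    lhs = sum((xi - theta) ** 2 for xi in x)
    rhs = sum((xi - xbar) ** 2 for xi in x) + n * (xbar - theta) ** 2
    print(theta, lhs, rhs, l(theta) < l(xbar))
```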

Remark that the global maximization problem in the above case can also be solved in much more
general situations when some regularity conditions hold. More details can be found in the book
“Theory of Point Estimation” written by E. L. Lehmann and George Casella.

Examples for finding MLE

1. Consider a r.s. of size 𝑛 from $N(\mu_0, \sigma^2)$, where $\sigma^2 \in (0, \infty)$ is unknown and $\mu_0$ is known.
Use the MLE to estimate $\sigma^2$.

2. Consider a r.s. with size 𝑛 of 𝑋 ∼ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆), where 𝜆 ∈ (0, ∞). Find the MLE of 𝜆.

3. Consider a r.s. of size 𝑛 of 𝑋 ∼ 𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(1, 𝜃), where 𝜃 ∈ [0,1]. Find the MLE of 𝜃,
and then get the MLE when 𝑛 = 10 and $\sum_{i=1}^{n} x_i = 4$.

4. Consider a r.s. of size 𝑛 of 𝑋 from 𝐺𝑎𝑚𝑚𝑎(𝛼, 𝛽), where 0 < 𝛼 < ∞ and 0 < 𝛽 < ∞.
Find the MLE when
i. 𝛽 is unknown but 𝛼 is known, say 𝛼0.
ii. 𝛼 is unknown but 𝛽 is known, say 𝛽0.
iii. Both are unknown.
iv. Both are unknown, but the mean is known to be 𝐾0.

5. Consider a r.s. of size 𝑛 from $N(\mu, \sigma^2)$, where $\sigma^2 \in (0, \infty)$ and $\mu \in (-\infty, \infty)$ are both unknown. Find the MLE of $\theta = (\mu, \sigma^2)'$.

6. Suppose 𝑋 is from 𝑈[0, 𝜃], where 0 < 𝜃 < ∞. Find the MLE of 𝜃 if a r.s. of size 𝑛 of 𝑋
is considered.

7. Suppose 𝑋 is from 𝑈[𝜃 − 1, 𝜃 + 1], where −∞ < 𝜃 < ∞. Find the MLE of 𝜃 if a r.s. of
size 𝑛 of 𝑋 is considered.

8. Consider a r.s. of size 𝑛 from 𝑁(𝜇, 1), where 0 ≤ 𝜇 < ∞ is unknown. Find the MLE of 𝜇.

Solution to Example 1. Since $\mu_0$ is known, we do not need to estimate it. The likelihood is
$$L(\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu_0)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2} e^{-\frac{\sum_{i=1}^{n}(x_i-\mu_0)^2}{2\sigma^2}}.$$
Taking logs gives
$$l(\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{\sum_{i=1}^{n}(x_i-\mu_0)^2}{2\sigma^2}.$$
Setting the first derivative with respect to $\sigma^2$ to 0 gives
$$-\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n}(x_i-\mu_0)^2}{2\sigma^4} = 0 \;\Longrightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu_0)^2,$$
which is the unique critical point. The second derivative there equals $-n/(2\hat\sigma^4) < 0$. So $\hat\sigma^2$ is the MLE, provided $\hat\sigma^2 \in (0, \infty)$, i.e. not all $x_i$ are equal to $\mu_0$ (an event of probability zero).

Solution to Example 2. The likelihood is $L(\lambda) = \prod_{i=1}^{n} e^{-\lambda}\lambda^{x_i}/x_i! = e^{-n\lambda}\lambda^{\sum x_i}/\prod x_i!$.
If $\sum x_i > 0$, then $l(\lambda) = -n\lambda + (\sum x_i)\log\lambda - \sum \log(x_i!)$. Setting $l'(\lambda) = -n + \sum x_i/\lambda = 0$ gives the unique critical point $\hat\lambda = \bar{x}$, and $l''(\lambda) = -\sum x_i/\lambda^2 < 0$. Hence $\hat\lambda = \bar{x}$ is the MLE.
If all $x_i = 0$, then $L(\lambda) = e^{-n\lambda}$ is strictly decreasing on Θ = (0, ∞). Its supremum 1 is approached as λ → 0 but is never attained, so the MLE does not exist in (0, ∞). Taking $\hat\lambda = 0$ is not reasonable, because 0 is outside the parameter space.

Solution to Example 3. The likelihood is $L(\theta) = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}$, θ ∈ [0, 1].
(i) If $0 < \sum x_i < n$, then L(0) = L(1) = 0. On (0, 1), setting $l'(\theta) = \sum x_i/\theta - (n-\sum x_i)/(1-\theta) = 0$ gives the unique critical point $\hat\theta = \bar{x}$, where $l''(\theta) = -\sum x_i/\theta^2 - (n-\sum x_i)/(1-\theta)^2 < 0$.
(ii) If $\sum x_i = 0$, then $L(\theta) = (1-\theta)^n$ is maximized at $\hat\theta = 0 = \bar{x}$.
(iii) If $\sum x_i = n$, then $L(\theta) = \theta^n$ is maximized at $\hat\theta = 1 = \bar{x}$.
Thus the MLE of θ ∈ [0, 1] is $\bar{x}$ in all cases. When 𝑛 = 10 and $\sum x_i = 4$, the MLE is 4/10 = 0.4.
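The MLEs for Examples 1 to 3 translate directly into code:

- Example 1: $\hat\sigma^2 = \frac{1}{n}\sum(x_i-\mu_0)^2$.
- Example 2: $\hat\lambda = \bar{x}$ when $\sum x_i > 0$.
- Example 3: $\hat\theta = \bar{x}$.

This sketch uses hypothetical data, and the function names are illustrative. The Poisson function returns `None` to flag the case in which all $x_i = 0$, where the MLE does not exist in (0, ∞). A grid search over 𝑙(𝜃) serves as a cross-check for Example 3.

```python
import math

def mle_normal_var_known_mean(x, mu0):
    """Example 1: sigma^2-hat = (1/n) * sum (x_i - mu0)^2, with mu0 known."""
    return sum((xi - mu0) ** 2 for xi in x) / len(x)

def mle_poisson(x):
    """Example 2: lambda-hat = xbar; None when all x_i = 0, since then
    L(lambda) = exp(-n * lambda) has no maximizer in (0, inf)."""
    total = sum(x)
    return None if total == 0 else total / len(x)

def mle_bernoulli(n, total):
    """Example 3: theta-hat = xbar = total / n (also covers total = 0 and total = n)."""
    return total / n

print(mle_normal_var_known_mean([1.0, 3.0, 2.0, 6.0], mu0=2.0))   # 4.5
print(mle_poisson([0, 0, 0]))                                       # None
print(mle_bernoulli(10, 4))                                         # 0.4

# Cross-check Example 3 by a grid search of l(theta) = 4 log(theta) + 6 log(1 - theta).
grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=lambda t: 4 * math.log(t) + 6 * math.log(1 - t)))   # 0.4
```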

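Solution to Example 4. We use the parameterization $f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}$, 𝑥 > 0, whose mean is 𝛼𝛽.
(i) With 𝛼 = 𝛼0 known,
$$l(\beta) = -n\log\Gamma(\alpha_0) - n\alpha_0\log\beta + (\alpha_0-1)\sum_{i=1}^{n}\log x_i - \frac{\sum_{i=1}^{n} x_i}{\beta}.$$
Setting $l'(\beta) = -n\alpha_0/\beta + \sum x_i/\beta^2 = 0$ gives the unique critical point $\hat\beta = \bar{x}/\alpha_0$, where $l''(\hat\beta) = -n\alpha_0/\hat\beta^2 < 0$. So the MLE of 𝛽 is $\bar{x}/\alpha_0$.
(ii) With 𝛽 = 𝛽0 known, similarly,
$$l(\alpha) = -n\log\Gamma(\alpha) - n\alpha\log\beta_0 + (\alpha-1)\sum_{i=1}^{n}\log x_i - \frac{\sum_{i=1}^{n} x_i}{\beta_0},$$
and $l'(\alpha) = -n\,\psi(\alpha) - n\log\beta_0 + \sum \log x_i$, where $\psi = \Gamma'/\Gamma$ is the digamma function. However, there is no closed form for the solution of $l'(\alpha) = 0$. Thus we can only get the estimate by numerical methods.
(iii) and (iv): The same holds; the MLE has no closed form and must be found numerically. In (iv), the constraint 𝛼𝛽 = 𝐾0 gives 𝛽 = 𝐾0/𝛼, which reduces the problem to the single unknown 𝛼.

Solution to Example 5. The log likelihood is
$$l(\mu, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,$$
which we want to maximize over μ ∈ (−∞, ∞) and σ² ∈ (0, ∞). Recall that $\sum (x_i-\mu)^2 \ge \sum (x_i-\bar{x})^2$ for any μ. Hence $l(\bar{x}, \sigma^2) \ge l(\mu, \sigma^2)$ for any μ and any σ² ∈ (0, ∞). Moreover,
$$l(\bar{x}, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{n S_n^2}{2\sigma^2}, \qquad S_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2.$$
By the same argument as in Example 1, this is maximized at $\sigma^2 = S_n^2$, provided not all $x_i$ are equal (an event of probability zero). Therefore $l(\bar{x}, S_n^2) \ge l(\mu, \sigma^2)$ for all μ ∈ (−∞, ∞) and σ² ∈ (0, ∞), and the MLE is $\hat\theta = (\bar{x}, S_n^2)'$.

Solution to Example 6. The likelihood is
$$L(\theta) = \prod_{i=1}^{n} \frac{1}{\theta} I(0 \le x_i \le \theta) = \theta^{-n}\, I(x_{(n)} \le \theta)\, I(x_{(1)} \ge 0),$$
where $x_{(1)}$ and $x_{(n)}$ are the sample minimum and maximum. Setting a derivative to zero does not work here, since $\frac{d}{d\theta}\theta^{-n} = -n\theta^{-n-1} \ne 0$. Instead, note that L(θ) = 0 for $\theta < x_{(n)}$, and $\theta^{-n}$ is strictly decreasing for $\theta \ge x_{(n)}$. So the MLE is $\hat\theta = x_{(n)}$, which lies in (0, ∞) with probability one.

Solution to Example 7. Since $f(x \mid \theta) = \frac{1}{2} I(\theta-1 \le x \le \theta+1)$,
$$L(\theta) = 2^{-n}\, I(\theta - 1 \le x_{(1)})\, I(x_{(n)} \le \theta + 1) = 2^{-n}\, I(x_{(n)} - 1 \le \theta \le x_{(1)} + 1).$$
L(θ) attains its maximum $2^{-n}$ at every θ in $[x_{(n)} - 1, x_{(1)} + 1]$, so any such θ is an MLE. With probability one $x_{(n)} - x_{(1)} < 2$, so this interval has positive length and there are infinitely many MLEs, for instance $(x_{(1)} + x_{(n)})/2$.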
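Examples 4(i), 6 and 7 can be checked on a small hypothetical data set. The sketch assumes the gamma scale parameterization $f(x) = x^{\alpha-1}e^{-x/\beta}/(\Gamma(\alpha)\beta^{\alpha})$ with 𝛼0 = 2, for which $\hat\beta = \bar{x}/\alpha_0$. It evaluates the uniform likelihoods directly to show two things:

- In Example 6 (𝑈[0, 𝜃]), the maximum is at $x_{(n)}$.
- In Example 7 (𝑈[𝜃 − 1, 𝜃 + 1]), the likelihood is flat on $[x_{(n)} - 1, x_{(1)} + 1]$.

```python
import math

def lik_uniform_0_theta(theta, x):
    """Example 6: L(theta) = theta^(-n) if 0 <= x_(1) and x_(n) <= theta, else 0."""
    return theta ** (-len(x)) if min(x) >= 0 and max(x) <= theta else 0.0

def lik_uniform_shift(theta, x):
    """Example 7: L(theta) = 2^(-n) if x_(n) - 1 <= theta <= x_(1) + 1, else 0."""
    return 0.5 ** len(x) if max(x) - 1 <= theta <= min(x) + 1 else 0.0

def gamma_loglik_beta(beta, x, alpha0):
    """Example 4(i) log likelihood in beta, assuming the scale form
    f(x) = x^(alpha-1) exp(-x/beta) / (Gamma(alpha) beta^alpha)."""
    return sum((alpha0 - 1) * math.log(xi) - xi / beta
               - math.lgamma(alpha0) - alpha0 * math.log(beta) for xi in x)

x = [0.3, 1.2, 0.8, 1.6]               # hypothetical positive data
theta_hat6 = max(x)                    # Example 6: MLE is the sample maximum x_(n)
mle_set7 = (max(x) - 1, min(x) + 1)    # Example 7: every theta in this interval is an MLE
beta_hat = sum(x) / len(x) / 2.0       # Example 4(i) with alpha0 = 2: xbar / alpha0

print(theta_hat6, lik_uniform_0_theta(theta_hat6, x), lik_uniform_0_theta(theta_hat6 + 0.1, x))
print(mle_set7, [lik_uniform_shift(t, x) for t in (0.5, 0.7, 1.0, 1.2, 1.4)])
print(beta_hat, gamma_loglik_beta(beta_hat, x, 2.0))
```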

2.2 FINDING MLE WITH R

Except in a few cases, we can typically write down (log) 𝐿(𝜃) but cannot maximize it
analytically, because the likelihood equations have no explicit solution. However, we can still
maximize it numerically with R or other statistical packages and, hence, find the MLE. Note that
when this is done, the question always remains of whether a local or a global maximum has been
found.

PRINCIPLE OF THE NUMERICAL SOLUTION TO LIKELIHOOD EQUATIONS

Example: Consider a r.s. with size 𝑛 of 𝑋 ∼ 𝐶𝑎𝑢𝑐ℎ𝑦(𝜃). Find an MLE of 𝜃.

First, we find (log) 𝐿(𝜃). Since the pdf of 𝑋 is $f_X(x \mid \theta) = \pi^{-1}[1 + (x-\theta)^2]^{-1}$, the likelihood
is $L(\theta) = \pi^{-n}\prod_{i=1}^{n}[1 + (x_i-\theta)^2]^{-1}$ and
$$l(\theta) = -n\log\pi - \sum_{i=1}^{n}\log[1 + (x_i-\theta)^2].$$

Setting
$$l'(\theta) = \frac{d}{d\theta} l(\theta) = \sum_{i=1}^{n} \frac{2(x_i-\theta)}{1 + (x_i-\theta)^2} = 0$$
yields the candidate MLE. (Again, we then also have to check whether it is a global maximum.) Note that the
MLE cannot be solved for explicitly in this case, but we can approximate it by a
numerical method such as the Newton-Raphson algorithm.
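By a first-order Taylor expansion of 𝑙′ about 𝜃, we have the following result:
$$0 = \frac{1}{n} l'(\hat\theta) \approx \frac{1}{n} l'(\theta) + (\hat\theta - \theta)\,\frac{1}{n} l''(\theta).$$
Thus,
$$\hat\theta \approx \theta - l'(\theta)\,[l''(\theta)]^{-1}.$$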
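Newton-Raphson Algorithm: starting from an initial guess $\theta_0$, iterate
$$\theta_{j+1} = \theta_j - l'(\theta_j)\,[l''(\theta_j)]^{-1}, \qquad j = 0, 1, 2, \ldots,$$
until successive values $\theta_j$ converge.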
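The Newton-Raphson iteration for the Cauchy example can be sketched in Python; the notes do this in R. The sketch rests on these assumptions:

- The data are a hypothetical Cauchy(2) sample of size 50, generated by the inverse-cdf method.
- The sample median serves as 𝜃0.
- A step-halving safeguard, with a gradient step when 𝑙″ ≥ 0, guards against steps that decrease 𝑙(𝜃). This safeguard is our addition and not part of the plain algorithm.

As noted above, converging to a critical point does not by itself guarantee a global maximum.

```python
import math
import random

def score(theta, x):
    """l'(theta) = sum 2 (x_i - theta) / (1 + (x_i - theta)^2)."""
    return sum(2 * (xi - theta) / (1 + (xi - theta) ** 2) for xi in x)

def second_derivative(theta, x):
    """l''(theta) = sum 2 ((x_i - theta)^2 - 1) / (1 + (x_i - theta)^2)^2."""
    return sum(2 * ((xi - theta) ** 2 - 1) / (1 + (xi - theta) ** 2) ** 2 for xi in x)

def log_lik(theta, x):
    """l(theta) = -n log(pi) - sum log(1 + (x_i - theta)^2)."""
    return -len(x) * math.log(math.pi) - sum(math.log1p((xi - theta) ** 2) for xi in x)

def newton_raphson(x, theta0, tol=1e-10, max_iter=200):
    """theta_{j+1} = theta_j - l'(theta_j) / l''(theta_j), with a step-halving safeguard."""
    theta = theta0
    for _ in range(max_iter):
        g, h = score(theta, x), second_derivative(theta, x)
        step = -g / h if h < 0 else g      # Newton step; gradient step if l'' >= 0
        while log_lik(theta + step, x) < log_lik(theta, x) and abs(step) > tol:
            step /= 2                      # halve the step until l does not decrease
        theta += step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge; try another initial guess")

rng = random.Random(2024)
x = [2.0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(50)]   # Cauchy(2) data
theta0 = sorted(x)[len(x) // 2]            # sample median as the initial guess theta_0
theta_hat = newton_raphson(x, theta0)
print(theta0, theta_hat, score(theta_hat, x), second_derivative(theta_hat, x))
```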

R CORNER
We use the R package maxLik to maximize (log) 𝐿(𝜃) below. Other R functions, such as optim,
can also be used.
[Case 1: One unknown parameter]


[Case 2: Multiple unknown parameters]

Next, we give an example of a normal distribution with two unknown parameters.

Note that for this case par is defined to be a vector.
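As a rough Python analogue of this case, the sketch below maximizes 𝑙(𝜇, 𝜎²) over the vector par = (𝜇, 𝜎²) with a two-dimensional Newton-Raphson step. It falls back to a gradient step when the Hessian is not negative definite, and it halves any step that would decrease 𝑙 or leave the parameter space. The data, the starting value (0, 1) and all function names are hypothetical. The result can be compared with the closed-form MLE (𝑥̅, 𝑆𝑛²).

```python
import math
import random

def log_lik(par, x):
    """l(mu, sigma2) for a r.s. from N(mu, sigma2); par = (mu, sigma2) is a vector."""
    mu, s2 = par
    if s2 <= 0:
        return -math.inf               # outside the parameter space
    n = len(x)
    ss = sum((xi - mu) ** 2 for xi in x)
    return -n / 2 * math.log(2 * math.pi) - n / 2 * math.log(s2) - ss / (2 * s2)

def grad_hess(par, x):
    """Analytic gradient and 2 x 2 Hessian of l at par = (mu, sigma2)."""
    mu, s2 = par
    n = len(x)
    r = sum(xi - mu for xi in x)
    ss = sum((xi - mu) ** 2 for xi in x)
    g = (r / s2, -n / (2 * s2) + ss / (2 * s2 ** 2))
    h = ((-n / s2, -r / s2 ** 2),
         (-r / s2 ** 2, n / (2 * s2 ** 2) - ss / s2 ** 3))
    return g, h

def maximize(x, par0, tol=1e-10, max_iter=500):
    """Newton-Raphson on (mu, sigma2) with a gradient fallback and step halving."""
    par = par0
    for _ in range(max_iter):
        (g1, g2), ((a, b), (c, d)) = grad_hess(par, x)
        det = a * d - b * c
        if a < 0 and det > 0:          # Hessian negative definite: step = -H^{-1} g
            step = (-(d * g1 - b * g2) / det, -(a * g2 - c * g1) / det)
        else:                          # otherwise move uphill along the gradient
            step = (g1, g2)
        t = 1.0
        while (log_lik((par[0] + t * step[0], par[1] + t * step[1]), x) < log_lik(par, x)
               and t > 1e-12):
            t /= 2
        par = (par[0] + t * step[0], par[1] + t * step[1])
        if max(abs(t * step[0]), abs(t * step[1])) < tol:
            return par
    raise RuntimeError("no convergence; try another initial guess")

rng = random.Random(7)
x = [rng.gauss(5.0, 2.0) for _ in range(100)]   # hypothetical data, true (mu, sigma2) = (5, 4)
mu_hat, s2_hat = maximize(x, par0=(0.0, 1.0))
xbar = sum(x) / len(x)
print(mu_hat, xbar)
print(s2_hat, sum((xi - xbar) ** 2 for xi in x) / len(x))
```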


3 ESTIMATOR EVALUATION
Besides the MLE, there are many other estimators of the parameter(s) of interest. Thus, our next
problem in point estimation is how to evaluate the goodness of an estimator, so that we can compare
different estimators and then find the best estimator within a class of estimators under consideration.

3.1 MEAN SQUARE ERROR (MSE)
For the evaluation of the goodness of an estimator, we consider the “closeness” of an estimator
𝜃̂(𝑿), or simply 𝜃̂, to the true unknown parameter 𝜃. (Note that 𝜃̂ in Section 3 denotes any
estimator, not necessarily the MLE.)

So, it is reasonable to use a distance function to measure the closeness. Here we consider the
squared error norm (or $L_2$ norm), $(\hat\theta - \theta)^2$, because it is easy to calculate and has nice properties.

Note that $(\hat\theta - \theta)^2$ is random, so we need a way to remove its randomness to get a
numerical quantity for comparing different estimators. Conventionally, we do this by
taking the expectation.

More precisely, we have

Definition (MSE): The mean squared error (MSE) of 𝜃̂ for 𝜃 is defined by $E\big(\hat\theta(\boldsymbol{X}) - \theta\big)^2$.

Note that the MSE is a function of 𝜃. For any two estimators, say 𝜃̂1 and 𝜃̂2, if for all 𝜃 ∈ Θ,
$$E\big(\hat\theta_1(\boldsymbol{X}) - \theta\big)^2 \le E\big(\hat\theta_2(\boldsymbol{X}) - \theta\big)^2,$$
and the inequality is strict for at least one 𝜃, then 𝜃̂1 is uniformly better than 𝜃̂2. Consider a
class 𝑀 of all estimators for 𝜃. If there exists an estimator $\hat\theta^{**}$ in 𝑀 that is uniformly better than
any other estimator in 𝑀, then $\hat\theta^{**}$ is said to be a uniform minimum MSE estimator for 𝜃 in 𝑀.

However, such an estimator $\hat\theta^{**}$ does not exist in general, because (i) we are too Greedy in wanting a
uniformly ‘best’ estimator over all 𝜃, and (ii) we are too Generous in considering too many (all)
estimators for 𝜃, even though some of them are poor or unreasonable (like 𝜃̂ = 3423).


For (i), to remove the dependence of MSE on 𝜃, we can

1) Replace MSE by its maximum, and then compare estimators by looking at their respective
maximum MSE, naturally preferring the one with the smallest maximum MSE over 𝑀.
Such an estimator is said to be minimax.

2) Average out 𝜃, just as we average out the dependence on samples when going from
$(\hat\theta - \theta)^2$ to $E\big(\hat\theta(\boldsymbol{X}) - \theta\big)^2$. Then, a natural question is
“How should 𝜃 be averaged out?”
The answer is based on “Bayesian statistics”.

For (ii), we can restrict ourselves to a particular class of estimators. Is that reasonable?

Yes. Sometimes a “very poor” estimator can be a locally best estimator. For
instance, 𝜃̂ = 3423 is undoubtedly a poor estimator because it uses no information from the data,
i.e. 3423 is always used to estimate the unknown parameter 𝜃 no matter what the observed
data are. However, it is the best if the true value of 𝜃 really is 3423. Thus, at the very least, we
have to shrink the class of estimators to rule out such poor estimators.
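The 𝜃̂ = 3423 argument can be made concrete. Assume, for illustration, a r.s. of size 𝑛 = 10 from a population with variance 𝜎² = 1. The MSE of 𝑋̅ is then 𝜎²/𝑛 for every 𝜃, whereas the MSE of the constant estimator is (3423 − 𝜃)². The sketch tabulates both at a few values of 𝜃, and neither estimator turns out to be uniformly better.

```python
def mse_constant(theta, c=3423.0):
    """MSE of the constant estimator theta-hat = c, which ignores the data: (c - theta)^2."""
    return (c - theta) ** 2

def mse_sample_mean(theta, sigma2=1.0, n=10):
    """MSE of Xbar for a r.s. of size n with variance sigma2.
    Xbar is unbiased, so its MSE is sigma2 / n whatever theta is."""
    return sigma2 / n

for theta in (0.0, 3422.9, 3423.0, 3500.0):
    better = "3423" if mse_constant(theta) < mse_sample_mean(theta) else "Xbar"
    print(f"theta = {theta}: MSE(3423) = {mse_constant(theta):.4g}, "
          f"MSE(Xbar) = {mse_sample_mean(theta):.4g}, smaller: {better}")
```

The constant estimator wins only when |𝜃 − 3423| < √(𝜎²/𝑛) ≈ 0.32, and it loses badly everywhere else.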

Obviously, we want to keep estimators with some nice properties. In this course, we keep
mean-unbiased estimators.

Definition (Unbiasedness): If an estimator 𝜃̂ satisfies 𝐸(𝜃̂) = 𝜃 for all 𝜃 ∈ Θ, then it is said to
be mean-unbiased or unbiased for 𝜃; otherwise, it is biased.

Interpretation:
The statement “𝜃̂ is unbiased for 𝜃” means that in repeated sampling, 𝜃̂ equals 𝜃 on average.
That is, in the long run, the amounts by which 𝜃̂ overestimates and underestimates 𝜃 balance
out.

Note that $E\big(\hat\theta(\boldsymbol{X}) - \theta\big)^2 = Var\big(\hat\theta(\boldsymbol{X})\big) + bias^2$, where $bias = E\big(\hat\theta(\boldsymbol{X})\big) - \theta$.
So, if 𝜃̂ is unbiased for 𝜃, then its MSE is just its variance! In other words, we fix the bias at
zero and then look for an estimator with the smallest variance (i.e. the most efficient one!!). Such an
estimator is called a UMVUE, a uniformly minimum variance unbiased estimator.
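A small Monte Carlo sketch illustrates two points: the decomposition MSE = Var + bias², and the fact that a biased estimator can have the smaller MSE. It assumes normal samples of size 𝑛 = 10 from 𝑁(0, 1) and compares $S_{n-1}^2$ (unbiased) with $S_n^2$ (biased). For normal samples the exact MSEs are $2\sigma^4/(n-1)$ and $(2n-1)\sigma^4/n^2$, i.e. about 0.222 and 0.19 here. The seed and replication count are arbitrary choices.

```python
import random

def mc_mse(n=10, sigma2=1.0, reps=20000, seed=3423):
    """Monte Carlo bias, variance and MSE of S_{n-1}^2 and S_n^2 for N(0, sigma2) samples."""
    rng = random.Random(seed)
    draws = {"S2_n_minus_1": [], "S2_n": []}
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        xbar = sum(x) / n
        ss = sum((xi - xbar) ** 2 for xi in x)
        draws["S2_n_minus_1"].append(ss / (n - 1))   # unbiased for sigma2
        draws["S2_n"].append(ss / n)                 # the MLE of sigma2; biased
    summary = {}
    for name, values in draws.items():
        mean = sum(values) / reps
        var = sum((v - mean) ** 2 for v in values) / reps
        mse = sum((v - sigma2) ** 2 for v in values) / reps
        summary[name] = {"bias": mean - sigma2, "var": var, "mse": mse}
    return summary

results = mc_mse()
for name, r in results.items():
    # var + bias^2 reproduces the Monte Carlo MSE (up to rounding)
    print(name, r, r["var"] + r["bias"] ** 2)
```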

In Part II, we will learn how to ‘catch’ the UMVUE in some special cases.

Two issues arise for the UMVUE: (i) existence, since it may not exist because an unbiased estimator may not exist; and (ii) uniqueness.

Remarks:
1) According to Lemma 1 in Chapter 1, the $k$th sample moment (about 0) is unbiased for the
$k$th population moment (about 0), and the sample variance $S_{n-1}^2$ is unbiased for $\sigma^2$,
but $S_n^2$ is not.

2) MLE is often biased.

3) A biased estimator is NOT always bad, because a biased estimator can have a smaller
MSE than an unbiased estimator.

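4) It is possible to have infinitely many different unbiased estimators for 𝜃, or NONE.

a. [Infinitely many] Consider a r.s. of size 𝑛 from a distribution with a finite mean 𝜃.
All estimators of the form $\sum_{i=1}^{n} a_i X_i \big/ \sum_{i=1}^{n} a_i$ are unbiased for 𝜃, where $a_1, \ldots, a_n \in \mathbb{R}$ and
$\sum_{i=1}^{n} a_i \ne 0$.
b. [None] Suppose that we have a r.s. from Binomial(1, 𝜃), with $g(\theta) = \frac{\theta}{1-\theta}$ as the
parameter to be estimated. Note that there does not exist an unbiased estimator
for 𝑔(𝜃): the expectation of any estimator is a polynomial in 𝜃 of degree at most 𝑛, which
cannot equal 𝜃/(1 − 𝜃), since the latter is unbounded as 𝜃 → 1.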

5) An unbiased estimator may be a poor estimator.

For instance, consider a situation in which a telephonist has to leave the switchboard
for a short time. Let 𝑋 be the number of telephone calls per 10 minutes. Suppose that
𝑋 ∼ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜃) and the telephonist is absent for 20 minutes. We want to estimate the
probability that there are no calls during his absence, i.e. $e^{-2\theta}$.

Suppose we can only observe the number of calls received in the preceding 10 minutes, say 𝑋1,
and want to use a function ℎ of 𝑋1 as an unbiased estimator for $e^{-2\theta}$. Then
$\sum_{k} h(k)\, e^{-\theta}\theta^k/k! = e^{-2\theta}$, i.e. $\sum_k h(k)\,\theta^k/k! = e^{-\theta} = \sum_k (-1)^k \theta^k/k!$ for all 𝜃, which forces $h(X_1) = (-1)^{X_1}$.
Note that if 𝑥1 is any odd integer, then −1 is used as the estimate of the probability $e^{-2\theta}$.
It is very poor!!!

6) Unbiasedness does not have an invariance property. That is, if 𝜃̂ is unbiased for 𝜃, then
ℎ(𝜃̂) may NOT be unbiased for ℎ(𝜃). For instance, 𝑋̅ is unbiased for 𝜇, but $\bar{X}^2$ is not
unbiased for $\mu^2$ when 𝜎 > 0, since
$$E(\bar{X}^2) = Var(\bar{X}) + [E(\bar{X})]^2 = \frac{\sigma^2}{n} + \mu^2.$$
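Remarks 5 and 6 can both be checked numerically. The sketch below first sums $(-1)^k p(k \mid \theta)$ over the Poisson pmf (truncated at 200 terms) and compares the result with $e^{-2\theta}$. It then estimates $E(\bar{X}^2)$ by simulation for the hypothetical values 𝜇 = 2, 𝜎 = 3, 𝑛 = 5. For these values $E(\bar{X}^2) = \mu^2 + \sigma^2/n = 5.8$, rather than $\mu^2 = 4$.

```python
import math
import random

def expect_minus_one_power(theta, terms=200):
    """Remark 5: E[(-1)^X] for X ~ Poisson(theta), summing (-1)^k p(k | theta) for k < terms."""
    return sum((-1) ** k * math.exp(k * math.log(theta) - theta - math.lgamma(k + 1))
               for k in range(terms))

for theta in (0.5, 1.5, 4.0):
    print(theta, expect_minus_one_power(theta), math.exp(-2 * theta))

# Remark 6: Monte Carlo estimate of E(Xbar^2) for a r.s. of size n from N(mu, sigma^2).
rng = random.Random(6)
mu, sigma, n, reps = 2.0, 3.0, 5, 50000
xbar_sq = [(sum(rng.gauss(mu, sigma) for _ in range(n)) / n) ** 2 for _ in range(reps)]
mc = sum(xbar_sq) / reps
print(mc, mu ** 2 + sigma ** 2 / n, mu ** 2)   # MC estimate, mu^2 + sigma^2/n, mu^2
```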

Summary:
1) An estimator is a statistic 𝑇(𝑿) taking values in $\mathbb{R}^k$, where 𝑘 is the number of unknown parameters to be estimated. Its realized value 𝑇(𝒙), or 𝜃̂(𝒙), is an estimate (a number).
2) The goodness of 𝜃̂ is judged by its closeness to 𝜃, measured by $(\hat\theta(\boldsymbol{x}) - \theta)^2$. This depends on the data, and taking the expectation gives the MSE.
3) The best-MSE estimator over the class 𝑀 of all estimators for 𝜃 does NOT exist in general. Remedy: use a smaller class of estimators satisfying some criterion, such as unbiasedness, $E(\hat\theta) = \theta$ for every possible value of 𝜃 in Θ.
4) Since $MSE = Var(\hat\theta) + bias^2 = Var(\hat\theta)$ when unbiasedness is imposed, the best-MSE estimator becomes the minimum-variance estimator over the class of unbiased estimators, for every 𝜃 ∈ Θ. This is the uniformly minimum variance unbiased estimator (UMVUE) of Part II.
5) Large-sample properties of a sequence $\hat\theta_n = \hat\theta(X_1, \ldots, X_n)$, $n = 1, 2, \ldots$:
(i) asymptotic unbiasedness, $E(\hat\theta_n) \to \theta$ as $n \to \infty$ for any 𝜃;
(ii) consistency, $\hat\theta_n \xrightarrow{p} \theta$ as $n \to \infty$;
(iii) asymptotic normality, $\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{d} N(0, \sigma_\theta^2)$, where $\sigma_\theta^2/n$ is the asymptotic variance of $\hat\theta_n$.

3.2 LARGE-SAMPLE PROPERTIES OF A POINT ESTIMATOR

We consider the large-sample (large-𝑛, or asymptotic) properties of a sequence of
estimators when it is difficult (or impossible) to evaluate the estimators in the finite-sample case.

1. Asymptotic Unbiasedness

Unbiasedness is nice, but in many cases our estimator, say the MLE, may not be unbiased. Luckily,
when we assess the performance of a sequence of estimators asymptotically, we find that the
biases of many biased estimators disappear. If the bias of an estimator tends to 0 as $n \to \infty$,
then it is said to be asymptotically unbiased. More formally, we have

Definition (Asymptotic Unbiasedness): A sequence of estimators, $\{\hat\theta_n : n = 1, 2, \ldots\}$, based on a
r.s. of size 𝑛 is said to be asymptotically unbiased if $\lim_{n\to\infty} E(\hat\theta_n) = \theta$ for all $\theta \in \Theta$.

Note that $S_n^2$, the MLE of $\sigma^2$, is asymptotically unbiased, although it is biased in the finite-sample
case: $E(S_n^2) = \frac{n-1}{n}\sigma^2 \to \sigma^2$. That is, in the asymptotic sense, $S_n^2$ also enjoys a nice property.

2. Consistency ---- convergence in probability

For consistency, $\hat\theta_n$ has to be arbitrarily close to 𝜃 with high probability, i.e. $\hat\theta_n \xrightarrow{p} \theta$.

Definition (Consistency): A sequence of estimators, $\{\hat\theta_n : n = 1, 2, \ldots\}$, based on a r.s. of size 𝑛 is
said to be consistent if, for any $\epsilon > 0$, $\lim_{n\to\infty} P(|\hat\theta_n - \theta| \le \epsilon) = 1$ for all $\theta \in \Theta$.

Note that although asymptotic unbiasedness and consistency look similar, neither implies the other
in general. However, an estimator is consistent if it is asymptotically
unbiased AND its variance $Var(\hat\theta_n)$ tends to 0 as $n \to \infty$, since its MSE then tends to 0.

3. Asymptotic Normality ---- convergence in distribution to normal

This property is important when we want to use $\hat\theta_n$ to make further statistical inference about 𝜃,
such as constructing a confidence interval or performing a hypothesis test.

Definition (Asymptotic Normality): A sequence of estimators, $\{\hat\theta_n : n = 1, 2, \ldots\}$, based on a r.s.
of size 𝑛 is said to be asymptotically normal if $\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{d} N(0, \sigma_\theta^2)$.
Note that the asymptotic variance of $\hat\theta_n$ is $\sigma_\theta^2/n$.

Recall that, under regularity conditions, the MLE is consistent, asymptotically unbiased, and asymptotically normal. The proofs
will be covered in Part II.
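The three large-sample properties can be illustrated by simulation. This sketch assumes a Poisson(𝜆 = 3) population, for which the MLE is $\hat\lambda_n = \bar{X}$ and 𝐼(𝜆) = 1/𝜆. So $n \cdot Var(\hat\lambda_n)$ should be close to 𝜆 = 3. For large 𝑛, about 95% of the standardized values $\sqrt{n}(\hat\lambda_n - \lambda)/\sqrt{\lambda}$ should also lie within ±1.96. The sample sizes, 𝜀 = 0.2, replication count and seed are arbitrary. Knuth's multiplication method generates the Poisson draws so that the sketch stays within Python's standard library.

```python
import math
import random

def poisson_draw(rng, lam):
    """One Poisson(lam) draw by Knuth's multiplication method (suitable for small lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate(n, lam=3.0, reps=1000, eps=0.2, seed=11):
    """Monte Carlo summary of the MLE lambda-hat_n = Xbar over `reps` samples of size n."""
    rng = random.Random(seed)
    est = [sum(poisson_draw(rng, lam) for _ in range(n)) / n for _ in range(reps)]
    mean = sum(est) / reps                                           # asymptotic unbiasedness
    var = sum((e - mean) ** 2 for e in est) / (reps - 1)
    close = sum(abs(e - lam) <= eps for e in est) / reps             # consistency
    z = [math.sqrt(n) * (e - lam) / math.sqrt(lam) for e in est]     # standardized values
    within = sum(abs(v) <= 1.96 for v in z) / reps                   # about 0.95 if normal
    return {"mean": mean, "n_var": n * var, "P_close": close, "P_abs_z_le_1.96": within}

results = {n: simulate(n) for n in (10, 100, 400)}
for n, r in results.items():
    print(n, r)
```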
