Properties of Good Estimators Explained
STATISTICAL INFERENCE
Indira Gandhi National Open University
School of Sciences
Block 2
PROPERTIES OF GOOD ESTIMATOR
UNIT 6
Unbiasedness
UNIT 7
Consistency
UNIT 8
Efficiency and Mean Squared Error
UNIT 9
Sufficiency and Minimal Sufficiency
BLOCK 1: Sampling Distributions
Unit 1: Basic Concepts of Sampling Distribution
Unit 7: Consistency
BLOCK 2: PROPERTIES OF GOOD ESTIMATOR
In Block 1 of this course, you have studied the sampling distributions of various statistics
such as sample mean, difference of two sample means, sample proportion, difference of two
sample proportions, sample variance, and ratio of two sample variances. You have also studied
some standard sampling distributions such as chi-square, t, and F-distributions that provide a
platform to draw inferences about the population parameters on the basis of the samples.
Estimation admits two problems; the first is to select some criteria or properties such that if
an estimator possesses these properties it is said to be the best estimator among all possible
estimators and the second is to derive some methods or techniques through which we obtain
an estimator which possesses such properties. This block is devoted to explaining the criteria
of a good estimator.
This block comprises four units.
Unit 6: Unbiasedness is devoted to explaining the concept of estimation (point and interval
estimation) with the first property of a good estimator, i.e. unbiasedness. The properties of an
unbiased estimator are also described in this unit.
Unbiasedness property is defined for a fixed sample size. In Unit 7: Consistency, you will
learn about consistency which is defined for increasing sample size. Here, we describe the
concept of consistency and its asymptotic distribution with a suitable normalisation with
examples. We also explain the properties of the consistent estimator.
There may exist more than one unbiased estimator of a parameter, therefore, to check which
one is better, we explain the concept of efficiency. Unit 8: Efficiency and Mean Squared
Error is devoted to explaining the concept of efficiency, mean squared error and minimum
variance unbiased estimator which help us to compare estimators and make the decision which
one is better.
In the continuation of finding the best estimator, we introduce the concept of sufficiency in
Unit 9: Sufficiency and Minimal Sufficiency. In this unit, you will study the concept of
sufficient and minimal sufficient estimators with examples. The Fisher-Neyman Factorization
theorem for finding sufficient estimators is also explained in this unit.
Notations and Symbols
SAQ/TQ : Self Assessment Question/Terminal Question
Fig./Figs. : Figure/Figures
X1, X2, …, Xn : A random sample of size n
X̄ : Sample mean
S² : Sample variance
µ and σ² : Mean and variance of a population
E(X) and Var(X) : Mathematical expectation and variance of X
Z ~ N(0, 1) : Standard normal variate
P and p : Population and sample proportion
B(a, b) = Γ(a)Γ(b)/Γ(a + b) : Beta function
Γ(a) : Gamma function
Θ : Parametric space
T = t(X1, X2, ..., Xn) : Estimator
UNIT 6
UNBIASEDNESS
Structure
6.1 Introduction
    Expected Learning Outcomes
6.2 Basic Terminology
6.3 Properties of Good Estimator
6.4 Unbiasedness
6.5 Properties of Unbiased Estimator
6.6 Summary
6.7 Terminal Questions
6.8 Solutions/Answers
6.1 INTRODUCTION
Tools You Will Need
The following terms are considered essential background material for this Unit. If you
doubt your knowledge of any of these terms, you should review the appropriate Unit or
section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Probability distributions (MST-012).
In many real-life problems, the population parameter (characteristic of the
population) is not known and someone is interested in obtaining the value of
the parameter. But, if
• the whole population is too large to study,
• the units of the population are destructive in nature,
• there are limited resources and manpower available, etc.
then it is not practically convenient to examine each unit of the population to
find the value of the parameter. For example, many of us use Facebook, and suppose
you are interested in knowing the average age of the people who use Facebook.
However, the true value (average age) of Facebook users is not known. The only way
to know the true average age of Facebook users is to survey each and every person
in the world who uses Facebook. But it is not possible to survey everyone in the
world. In such a situation, one can randomly select some persons who use Facebook
and note their ages. Suppose we randomly selected 20 Facebook users and obtained
the following data on their age (in years):
20 42 36 30 20 52 32 18 70 22
45 18 40 14 18 20 30 19 41 20
If we use the sample average age to estimate the unknown average age of the
Facebook users, then we get an estimate of the same as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{607}{20} = 30.35$$
Unit Writer: Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
This unit is divided into eight sections. Section 6.1 is introductory in nature. The
basic terms used in estimation are defined in Section 6.2. Section 6.3 is
devoted to explaining the criteria of a good estimator. Section 6.4 explores the
concept of unbiasedness with examples. The properties of an unbiased
estimator are described in Section 6.5. The unit ends by providing a summary
of what we have discussed in this unit in Section 6.6. The terminal questions
and the solution of the SAQs/TQs are given in Sections 6.7 and 6.8,
respectively.
In the next unit, we shall discuss the second characteristic of a good estimator,
that is, consistency.
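S. No. | Distribution | Probability function | Parameter(s) | Mean | Variance
2 | Binomial (discrete) | $P[X = x] = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x};\ x = 0, 1, \ldots, n$ | n and p | $np$ | $np(1 - p)$
3 | Poisson (discrete) | $P[X = x] = \dfrac{e^{-\lambda} \lambda^{x}}{x!};\ x = 0, 1, \ldots;\ \lambda > 0$ | λ | $\lambda$ | $\lambda$
4 | Uniform (discrete) | $P[X = x] = \dfrac{1}{n};\ x = 1, 2, \ldots, n$ | n | $\dfrac{n + 1}{2}$ | $\dfrac{n^{2} - 1}{12}$
5 | Hypergeometric (discrete) | $P[X = x] = \dfrac{{}^{M}C_{x}\; {}^{N - M}C_{n - x}}{{}^{N}C_{n}};\ x = 0, 1, \ldots, \min\{M, n\}$ | N, M and n | $\dfrac{nM}{N}$ | $\dfrac{nM(N - M)(N - n)}{N^{2}(N - 1)}$
6 | Geometric (discrete) | $P[X = x] = p (1 - p)^{x};\ x = 0, 1, 2, \ldots$ | p | $\dfrac{1 - p}{p}$ | $\dfrac{1 - p}{p^{2}}$
7 | Negative Binomial (discrete) | $P[X = x] = {}^{x + r - 1}C_{r - 1}\, p^{r} (1 - p)^{x};\ x = 0, 1, 2, \ldots$ | r and p | $\dfrac{r(1 - p)}{p}$ | $\dfrac{r(1 - p)}{p^{2}}$
8 | Normal (continuous) | $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}};\ -\infty < x < \infty;\ \sigma > 0,\ -\infty < \mu < \infty$ | µ and σ² | $\mu$ | $\sigma^{2}$
10 | Uniform (continuous) | $f(x) = \dfrac{1}{b - a};\ a < x < b,\ b > a$ | a and b | $\dfrac{a + b}{2}$ | $\dfrac{(b - a)^{2}}{12}$
11 | Exponential (continuous) | $f(x) = \theta e^{-\theta x};\ x \ge 0;\ \theta > 0$ | θ | $\dfrac{1}{\theta}$ | $\dfrac{1}{\theta^{2}}$
 | Negative Exponential or simply exponential (continuous) | $f(x) = \dfrac{1}{\theta}\, e^{-x/\theta};\ x \ge 0;\ \theta > 0$ | θ | $\theta$ | $\theta^{2}$
12 | Gamma (continuous) | $f(x) = \dfrac{b^{a}}{\Gamma(a)}\, e^{-bx} x^{a - 1};\ x > 0;\ a, b > 0$ | a and b | $\dfrac{a}{b}$ | $\dfrac{a}{b^{2}}$
13 | Beta First Kind (continuous) | $f(x) = \dfrac{1}{B(a, b)}\, x^{a - 1} (1 - x)^{b - 1};\ 0 < x < 1;\ a > 0,\ b > 0$ | a and b | $\dfrac{a}{a + b}$ | $\dfrac{ab}{(a + b)^{2} (a + b + 1)}$
15 | Standard Cauchy | $f(x) = \dfrac{1}{\pi (1 + x^{2})};\ -\infty < x < \infty$ | --- | Does not exist | Does not exist
16 | Laplace | $f(x) = \dfrac{1}{2b}\, e^{-\frac{|x - \mu|}{b}};\ -\infty < x < \infty$ | µ and b | $\mu$ | $2b^{2}$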
Parameter Space
The set of all possible values that the parameter θ or parameters θ1, θ2, …, θk
can assume is called the parameter space. It is denoted by Θ and is read as
“big theta”. To find the parameter space of a parameter, we have to consider all
the values the parameter can possibly take, even those whose chance of occurring is
very small.
For example, suppose the parameter θ represents the average life of electric
bulbs manufactured by a company. Since a bulb can fuse at the initial
time 0 or at 1, 2, 2.3, 3 hours, and so on, the average life lies between 0 and ∞.
Hence, the parameter space of the average life of the bulbs, that is, of θ, is
Θ = {θ : θ ≥ 0}. It means that the parameter average life θ can take any value
greater than or equal to 0. Similarly, in a normal distribution N(μ, σ²), the
parameter space of the parameters μ and σ² is
Θ = {(μ, σ²) : −∞ < μ < ∞; 0 < σ < ∞}.
Mathematical Expectation
If X is a continuous random variable having the probability density function
f ( x ) , then the expected value of X (mean) is defined as
$$E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx \quad \text{and} \quad E(X^{r}) = \int_{-\infty}^{\infty} x^{r} f(x)\, dx$$
• E(aX) = aE(X)
• E(aX ± bY) = aE(X) ± bE(Y)
Variance
If X is a random variable then the variance of X in terms of expectation is
defined as
$$\mathrm{Var}(X) = E\left[X - E(X)\right]^{2} = E(X^{2}) - \left[E(X)\right]^{2}$$
• Var(aX) = a² Var(X)
• Var(aX ± bY) = a² Var(X) + b² Var(Y), if X and Y are independent
SAQ 1
If θ represents the average marks (out of 50) of the learner in the Term-End-
Exam paper of the MST-016 course, then find the parameter space of θ.
You now know the basic definitions and terminology needed to understand the
properties of a good estimator. We discuss these properties in the next section.
• Sample median X̃ = (22 + 30)/2 = 26
• Sample mode X0 = 20
• Average of extreme users = (max + min)/2 = (70 + 14)/2 = 42
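The candidate estimates can be recomputed with a few lines of Python. This is only a sketch: it assumes that the smallest age in the sample is 14, the value implied by the sum 607 and by the average of extremes (70 + 14)/2 used in the text, and the variable names are illustrative.

```python
from statistics import mean, median, mode

# Ages of the 20 sampled users. ASSUMPTION: the smallest age is taken as 14,
# the value implied by the sum 607 and the extremes (70 + 14)/2 used in the text.
ages = [20, 42, 36, 30, 20, 52, 32, 18, 70, 22,
        45, 18, 40, 14, 18, 20, 30, 19, 41, 20]

sample_mean = mean(ages)                  # arithmetic average of the ages
sample_median = median(ages)              # middle value of the sorted ages
sample_mode = mode(ages)                  # most frequent age
midrange = (max(ages) + min(ages)) / 2    # average of the two extreme ages

print(sample_mean, sample_median, sample_mode, midrange)
```

Each of the four statistics is a different estimator of the same unknown average age, which is why criteria for choosing among them are needed.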
Now, the questions arise:
• Which estimator should you use, that is, which is likely to give estimates
closer to the true (but unknown) population value?
• Are some of the possible estimators better, in some sense, than the
others?
In general, an estimator whose sampling distribution concentrates as closely
as possible near the true value of the parameter may be regarded as a good
estimator. To answer the above questions, Prof. Ronald A. Fisher
gave some properties of a good estimator which are as follows:
• Unbiasedness
• Consistency
• Efficiency
• Sufficiency
We shall discuss these properties one by one in the subsequent Units.
Now, answer the following Self Assessment Question.
SAQ 2
Write properties of a good estimator.
We now discuss the first characteristic of a good estimator in the next section.
6.4 UNBIASEDNESS
In the previous units, you have studied that any statistic such as sample mean,
sample variance, sample proportion, etc. which is used to estimate an
unknown population parameter is known as an estimator. You also saw that
the value of any estimator changes from sample to sample, therefore, we
consider the estimator as a random variable and we can find the mean and
variance of the estimator. We can therefore define an unbiased estimator as follows:
An estimator is said to be unbiased for a population parameter if and only if
the average or mean of the sampling distribution of the estimator is equal to
the true value of the parameter. This property of the estimator is called
unbiasedness.
Let us see some examples,
• In Unit 1, you have seen that the mean of the sampling distribution of the
sample mean of monthly salary of the employees is equal to the mean
salary of all employees of the industry. So sample mean is an unbiased
estimate of the population mean.
• Similarly, in Unit 3, we saw that the mean of the sample proportions of the
children who like to dance is equal to the population proportion. Therefore,
sample proportion is an unbiased estimate of the population proportion.
In general, we denote any population parameter such as a population mean,
population standard deviation, population proportion, and so on by the Greek
letter theta θ, and its estimator such as the sample mean, sample standard
deviation, and sample proportion by T or θ̂ (pronounced as “theta-hat”).
Mathematically,
If X1, X2, ..., Xn is a random sample of size n taken from a population whose
probability density (mass) function is f(x, θ), where θ is the population
parameter, then an estimator T = t(X1, X2, ..., Xn) is said to be an unbiased
estimator of the parameter θ if and only if
E(T) = θ
for all possible values of the parameter θ.
However, if the expected value of the estimator is not equal to the true
value of the parameter, then the estimator is said to be a “biased estimator”,
that is, if
E(T) ≠ θ
The amount of bias is b(θ) = E(T) − θ.
• If b(θ) > 0 or E(T) > θ, then the estimator T is said to be positively biased
for the parameter θ.
• If b(θ) < 0 or E(T) < θ, then the estimator T is said to be negatively biased
for the parameter θ.
• If E(T) → θ as n → ∞, that is, if an estimator T is unbiased only for large
samples, then the estimator T is said to be asymptotically unbiased for
θ. For example, suppose E(T) = θ + 1/n; then E(T) → θ as n → ∞.
Now, we explain the procedure to show whether an estimator is unbiased or
not for a parameter with the help of some examples.
Example 1: Show that the sample mean (X̄) is an unbiased estimator of the
population mean (µ) if it exists.
Solution: Let X1, X2, …, Xn be a random sample of size n taken from any
population with mean µ. We have to show that the sample mean X̄ is an
unbiased estimator for µ, therefore, we have to find E(X̄) and check whether
it is equal to µ or not. That is,
$$E(\bar{X}) = \mu$$
Consider,
$$E(\bar{X}) = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \qquad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$
Since X1, X2, …, Xn are randomly drawn from the same population with mean μ
and variance σ², therefore,
$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \mu$$
Thus,
$$E(\bar{X}) = \frac{1}{n}\Big(\underbrace{\mu + \mu + \cdots + \mu}_{n\text{ times}}\Big) = \frac{1}{n}(n\mu) = \mu$$
Hence, the sample mean (X̄) is an unbiased estimator of the population mean
μ. Also, if x1, x2, ..., xn are the observed values of the random sample
X1, X2, ..., Xn, then $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is an unbiased estimate of the population mean.
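The result E(X̄) = µ can also be checked numerically. The sketch below is an illustration only: it assumes an exponential population with mean µ = 5, a sample size of 10 and 20,000 repeated samples, none of which come from the text.

```python
import random

random.seed(0)   # fixed seed so that the run is repeatable

mu = 5.0         # hypothetical population mean
n = 10           # hypothetical sample size
reps = 20_000    # number of repeated samples


def sample_mean(size):
    """Mean of one random sample from an exponential population with mean mu."""
    draws = [random.expovariate(1 / mu) for _ in range(size)]
    return sum(draws) / size


# The average of many sample means approximates E(X-bar).
average_of_means = sum(sample_mean(n) for _ in range(reps)) / reps
print(round(average_of_means, 2))
```

The average of the simulated sample means should lie close to the assumed µ, even though the population is far from normal, because the derivation uses only the linearity of expectation.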
Vehicle        | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
Speed (in kph) | 62 | 70 | 65 | 68 | 64 | 65 | 70 | 64 | 55 | 60
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{62 + 70 + 65 + 68 + 64 + 65 + 70 + 64 + 55 + 60}{10} = 64.3$$
Now, we have to show that the sample average speed is an unbiased estimate
of the average speed of all lightweight vehicles on the roadway.
Since the speed of the vehicles is normally distributed and standard deviation
(σ) is known, therefore, the sample average speed also follows a normal
distribution with mean µ and variance σ²/n.
Thus, $E(\bar{X}) = \mu$
Hence, the point estimate of the proportion of all defective water bottles is
0.05.
Example 4: A furniture company manufactures square tables with side length
µ. Thus, the area of a table is µ² (unknown). Based on n independent
measurements X1, . . . , Xn of the length, estimate the area of the table. Assume
that the measurements of the length have mean µ and variance σ².
Solution: Since the measurements of the length have mean µ and variance
σ², therefore, the sampling distribution of the mean has mean and variance as
follows:
$$E(\bar{X}) = \mu \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n}$$
A natural estimator of the area µ² is X̄². By the formula of variance, we have
$$\mathrm{Var}(\bar{X}) = E(\bar{X}^{2}) - \left[E(\bar{X})\right]^{2}$$
Therefore,
$$E(\bar{X}^{2}) = \mathrm{Var}(\bar{X}) + \left[E(\bar{X})\right]^{2} = \frac{\sigma^{2}}{n} + \mu^{2} \ne \mu^{2}$$
Hence, X̄² is not an unbiased estimator of the area of the table, that is, µ²,
and we can calculate the bias of the estimator as
$$E(\bar{X}^{2}) - \mu^{2} = \frac{\sigma^{2}}{n} \ \Rightarrow\ E(\bar{X}^{2} - \mu^{2}) = \frac{\sigma^{2}}{n} \qquad [\because\ E(a) = a]$$
We now find the value of k such that the estimator X̄² − kS² is unbiased for µ².
$$E(\bar{X}^{2} - kS^{2}) = \mu^{2}$$
$$E(\bar{X}^{2}) - kE(S^{2}) = \mu^{2} \qquad [\because\ E(aX \pm bY) = aE(X) \pm bE(Y)]$$
$$\mu^{2} + \frac{\sigma^{2}}{n} - k\sigma^{2} = \mu^{2} \qquad [\text{since } S^{2} \text{ is an unbiased estimator of } \sigma^{2}, \text{ i.e. } E(S^{2}) = \sigma^{2}]$$
Therefore,
$$k = \frac{1}{n}$$
Hence, for k = 1/n, the estimator X̄² − kS² is unbiased for µ².
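The bias σ²/n of X̄² and the effect of the correction with k = 1/n can be illustrated by simulation. The sketch below assumes normally distributed measurements with µ = 2, σ = 1 and n = 5; these values and the variable names are hypothetical and are not part of the example.

```python
import random

random.seed(1)                              # repeatable run
mu, sigma, n, reps = 2.0, 1.0, 5, 40_000    # hypothetical values

naive_total = 0.0       # running sum of X-bar squared
corrected_total = 0.0   # running sum of X-bar squared minus S^2 / n

for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # unbiased sample variance
    naive_total += xbar ** 2
    corrected_total += xbar ** 2 - s2 / n

naive_mean = naive_total / reps           # should be near mu^2 + sigma^2 / n
corrected_mean = corrected_total / reps   # should be near mu^2

print(round(naive_mean, 2), round(corrected_mean, 2))
```

With these assumed values, the first average should be close to µ² + σ²/n = 4.2 and the second close to µ² = 4, in line with the derivation.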
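Example 5: If X1, X2, ..., Xn is a random sample taken from a population with a
mean μ and variance σ², then
$$S'^{2} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^{2}$$
is a biased estimator of σ², whereas
$$S^{2} = \frac{n}{n - 1}\, S'^{2}, \quad \text{i.e.} \quad S^{2} = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^{2}$$
is an unbiased estimator of σ².
Solution: We have to show that S′² is not an unbiased estimator for σ²,
therefore, we have to find E(S′²) and check whether it is equal to σ² or not.
Therefore, we consider:
$$E(S'^{2}) = E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^{2}\right] = \frac{1}{n} E\left[\sum_{i=1}^{n} (X_i - \bar{X})^{2}\right] \qquad [\because\ E(aX) = aE(X)]$$
$$= \frac{1}{n} E\left[\sum_{i=1}^{n} \left(X_i^{2} + \bar{X}^{2} - 2X_i\bar{X}\right)\right] = \frac{1}{n} E\left[\sum_{i=1}^{n} X_i^{2} + \sum_{i=1}^{n} \bar{X}^{2} - 2\bar{X}\sum_{i=1}^{n} X_i\right]$$
$$= \frac{1}{n} E\left[\sum_{i=1}^{n} X_i^{2} + n\bar{X}^{2} - 2\bar{X}\, n\bar{X}\right] \qquad \left[\because\ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \Rightarrow \sum_{i=1}^{n} X_i = n\bar{X}\right]$$
$$= \frac{1}{n} E\left[\sum_{i=1}^{n} X_i^{2} - n\bar{X}^{2}\right] = \frac{1}{n}\left[E\left(\sum_{i=1}^{n} X_i^{2}\right) - nE(\bar{X}^{2})\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{n} E(X_i^{2}) - nE(\bar{X}^{2})\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{n} (\mu^{2} + \sigma^{2}) - n\left(\mu^{2} + \frac{\sigma^{2}}{n}\right)\right] \qquad [\text{as discussed in Example 4}]$$
$$= \frac{1}{n}\left[n(\mu^{2} + \sigma^{2}) - n\left(\mu^{2} + \frac{\sigma^{2}}{n}\right)\right]$$
$$= \mu^{2} + \sigma^{2} - \mu^{2} - \frac{\sigma^{2}}{n} = \frac{n - 1}{n}\,\sigma^{2}$$
$$E(S'^{2}) = \frac{n - 1}{n}\,\sigma^{2} \ne \sigma^{2}$$
Hence, S′² is not an unbiased estimator for σ².
Now, we check whether S² is unbiased for σ² or not, therefore, we consider:
$$E(S^{2}) = \frac{n}{n - 1}\, E(S'^{2}) = \frac{n}{n - 1} \cdot \frac{n - 1}{n}\,\sigma^{2} = \sigma^{2}$$
Hence, S² is an unbiased estimator of σ².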
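A short simulation makes the difference between the divisors n and n − 1 visible. It is a sketch under assumed values (a normal population with µ = 10, σ = 2 and samples of size n = 5), chosen only for illustration.

```python
import random

random.seed(2)                               # repeatable run
mu, sigma, n, reps = 10.0, 2.0, 5, 40_000    # hypothetical values

s_prime_total = 0.0   # running sum of S'^2 (divisor n)
s_total = 0.0         # running sum of S^2  (divisor n - 1)

for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)    # sum of squared deviations
    s_prime_total += ss / n
    s_total += ss / (n - 1)

mean_s_prime = s_prime_total / reps   # expected near (n - 1) / n * sigma^2
mean_s = s_total / reps               # expected near sigma^2

print(round(mean_s_prime, 2), round(mean_s, 2))
```

Under these assumptions the first average should be near (n − 1)σ²/n = 3.2 and the second near σ² = 4.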
SAQ 3
a. A company produces batteries for laptops and wants to estimate the
average life of the battery. For that, the statistician of the company
selected 5 batteries from the production and measured their lives. He
suggests two estimators for estimating the average life of the battery:
where X1, X2, X3, X4 and X5 represent the lives of the selected batteries. It is
known that the life of batteries has mean μ and variance σ2. Are both
estimators unbiased?
b. Show that the sample mean and sample median are both unbiased
estimators for mean (μ) of a normal distribution.
1. Unbiased estimators may not be unique. For example, the sample mean
and sample median are unbiased estimators of the population mean of a
normal population.
2. Unbiased estimators do not always exist for all parameters. For example,
for a Bernoulli distribution (θ), there is no unbiased estimator for θ2.
Similarly, for a Poisson distribution (λ), there exists no unbiased estimator
for 1/ λ.
aT+ (1−a) T*
is also an unbiased estimator of θ where ‘a’ (0 ≤ a ≤ 1) is any constant.
SAQ 4
Some patients with high blood pressure are randomly assigned to a placebo
group and a treatment group. The placebo patients receive an inactive pill, and
the treatment patients receive a new drug that is expected to lower blood
pressure. After the patients are treated for two months, the high blood
pressures of the patients of both groups are measured and given as follows:
Placebo Group (X)   | 140 | 165 | 170 | 140 | 135 | 170 | 165 | 150 | 140
Treatment Group (Y) | 130 | 135 | 140 | 130 | 120 | 120 | 132 | 118 | 120
If E(X) = μ1, Var(X) = σ1² and E(Y) = μ2, Var(Y) = σ2², then
(i) Show that the statistic (X̄ − Ȳ) is an unbiased estimator of the parameter
(μ1 − μ2). Also, find the estimate of the same using the given data.
(ii) Calculate the variance and standard deviation of the estimator in Part (i).
Also, find the estimate of standard error.
(iii) Calculate an estimate of the ratio σ1/σ2.
We now end this unit by giving a summary of what we have covered in it.
6.6 SUMMARY
In this unit, we have covered the following points:
• If we estimate an unknown parameter by a single statistic then this
technique is known as point estimation whereas if we determine an
interval (using sample values) that contains the true value of the unknown
parameter then this technique is known as interval estimation.
• The set of all possible values that the parameter θ or parameters θ1, θ2,
…, θk can assume is called the parameter space. It is denoted by Θ .
E(Xi) = μ and Var(Xi) = σ² for all i = 1, 2, ..., 5
We now consider
$$E(T_1) = E\left[\frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}\right]$$
$$= \frac{1}{5}\left[E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5)\right]$$
$$= \frac{1}{5}\left[\mu + \mu + \mu + \mu + \mu\right]$$
$$E(T_1) = \mu$$
Similarly,
$$E(T_2) = \mu$$
$$E(\bar{X}) = \mu \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n}$$
4. We have to show that (X̄ − Ȳ) is an unbiased estimator for (μ1 − μ2),
therefore, we have to find E(X̄ − Ȳ) and check whether it is equal to
(μ1 − μ2) or not. Thus, we consider
$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2 \qquad [\because\ E(aX - bY) = aE(X) - bE(Y)]$$
Hence, the estimator (X̄ − Ȳ) is an unbiased estimator for (μ1 − μ2).
We now find the estimate of the same using the given data. Since
estimate is the value of the estimator, therefore, we find the mean of
each group as
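Placebo Group (X) | Treatment Group (Y) | X²     | Y²
140               | 130                 | 19600  | 16900
165               | 135                 | 27225  | 18225
170               | 140                 | 28900  | 19600
140               | 130                 | 19600  | 16900
135               | 120                 | 18225  | 14400
170               | 120                 | 28900  | 14400
165               | 132                 | 27225  | 17424
150               | 118                 | 22500  | 13924
140               | 120                 | 19600  | 14400
Total: 1375       | 1145                | 211775 | 146173
$$\bar{X} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i = \frac{1375}{9} = 152.78$$
$$\bar{Y} = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_i = \frac{1145}{9} = 127.22$$
$$\bar{X} - \bar{Y} = 152.78 - 127.22 = 25.56$$
$$SD(\bar{X} - \bar{Y}) = \sqrt{\mathrm{Var}(\bar{X} - \bar{Y})} = \sqrt{\frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}}$$
$$SE(\bar{X} - \bar{Y}) = SD(\bar{X} - \bar{Y}) = \sqrt{\frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}}$$
$$\widehat{SE}(\bar{X} - \bar{Y}) = \sqrt{\frac{\hat{\sigma}_1^{2}}{n_1} + \frac{\hat{\sigma}_2^{2}}{n_2}} = \sqrt{\frac{S_1^{2}}{n_1} + \frac{S_2^{2}}{n_2}}$$
$$\frac{S_1}{S_2} = \sqrt{\frac{S_1^{2}}{S_2^{2}}} = \sqrt{\frac{212.43}{63.58}} = 1.83$$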
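The estimates in this solution can be recomputed directly from the two groups. The sketch below uses Python's `statistics` module with unrounded means, so the sample variances come out as about 213.19 and 62.94 and the ratio as about 1.84; the slightly different figures in the text (212.43, 63.58 and 1.83) appear to result from rounding the means to two decimal places before squaring.

```python
from math import sqrt
from statistics import mean, variance

placebo = [140, 165, 170, 140, 135, 170, 165, 150, 140]      # group X
treatment = [130, 135, 140, 130, 120, 120, 132, 118, 120]    # group Y

diff = mean(placebo) - mean(treatment)   # estimate of mu1 - mu2
s1_sq = variance(placebo)                # sample variance with divisor n1 - 1
s2_sq = variance(treatment)              # sample variance with divisor n2 - 1

# Estimated standard error of X-bar minus Y-bar.
se_hat = sqrt(s1_sq / len(placebo) + s2_sq / len(treatment))

# Estimate of the ratio sigma1 / sigma2.
ratio = sqrt(s1_sq / s2_sq)

print(round(diff, 2), round(s1_sq, 2), round(s2_sq, 2),
      round(se_hat, 2), round(ratio, 2))
```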
Since X1, X2, …, Xn are randomly drawn from the same population having
mean θ, therefore,
$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \theta$$
Therefore, we consider
$$E(\bar{X}) = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \qquad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\Big(\underbrace{\theta + \theta + \cdots + \theta}_{n\text{ times}}\Big) = \frac{1}{n}(n\theta) = \theta$$
N = 3 and n = 2
Therefore, the possible number of samples (with replacement) that can
be drawn from this population is N^n = 3² = 9. For each of these 9
samples, we calculate the values of X̄, S′² and S² by the formulae
given below:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad S'^{2} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^{2} \quad \text{and} \quad S^{2} = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^{2}$$
and the necessary calculations for these results are shown in the
following table:
Calculation for X̄, S′² and S²
Sample | Sample Observations | X̄  | Σ(Xi − X̄)² | S′² | S²
1      | 8, 8                | 8  | 0           | 0   | 0
2      | 8, 6                | 7  | 2           | 1   | 2
3      | 8, 10               | 9  | 2           | 1   | 2
4      | 6, 8                | 7  | 2           | 1   | 2
5      | 6, 6                | 6  | 0           | 0   | 0
6      | 6, 10               | 8  | 8           | 4   | 8
7      | 10, 8               | 9  | 2           | 1   | 2
8      | 10, 6               | 8  | 8           | 4   | 8
9      | 10, 10              | 10 | 0           | 0   | 0
Total  |                     | 72 |             | 12  | 24
$$\bar{X}_2 = \frac{1}{2}(8 + 6) = 7,\ \ldots,\ \bar{X}_9 = \frac{1}{2}(10 + 10) = 10$$
$$S_1'^{2} = \frac{1}{2}\left[(8 - 8)^{2} + (8 - 8)^{2}\right] = 0,\quad S_2'^{2} = \frac{1}{2}\left[(8 - 7)^{2} + (6 - 7)^{2}\right] = 1,\ \ldots,\ S_9'^{2} = \frac{1}{2}\left[(10 - 10)^{2} + (10 - 10)^{2}\right] = 0$$
$$S_1^{2} = \frac{1}{2 - 1}\left[(8 - 8)^{2} + (8 - 8)^{2}\right] = 0,\quad S_2^{2} = \frac{1}{2 - 1}\left[(8 - 7)^{2} + (6 - 7)^{2}\right] = 2,\ \ldots,\ S_9^{2} = \frac{1}{2 - 1}\left[(10 - 10)^{2} + (10 - 10)^{2}\right] = 0$$
From the above table, we have
$$E(\bar{X}) = \frac{1}{k}\sum_{i=1}^{k} \bar{X}_i = \frac{1}{9} \times 72 = 8 = \mu$$
$$E(S'^{2}) = \frac{1}{k}\sum_{i=1}^{k} S_i'^{2} = \frac{1}{9} \times 12 = 1.33 \ne \sigma^{2}$$
Therefore, S′² is not an unbiased estimator of σ² whereas,
$$E(S^{2}) = \frac{1}{k}\sum_{i=1}^{k} S_i^{2} = \frac{1}{9} \times 24 = 2.67 = \sigma^{2}$$
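The table can be reproduced by enumerating all samples of size 2 drawn with replacement. The sketch below assumes that the population consists of the three values 8, 6 and 10, as inferred from the listed samples, and uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [8, 6, 10]   # ASSUMPTION: inferred from the nine listed samples
n = 2                     # sample size

# All N^n = 9 ordered samples drawn with replacement.
samples = list(product(population, repeat=n))


def sample_stats(sample):
    """Return (mean, S'^2 with divisor n, S^2 with divisor n - 1) for one sample."""
    xbar = Fraction(sum(sample), n)
    ss = sum((x - xbar) ** 2 for x in sample)
    return xbar, ss / n, ss / (n - 1)


rows = [sample_stats(s) for s in samples]
k = len(rows)

e_xbar = sum(r[0] for r in rows) / k      # average of the sample means
e_s_prime = sum(r[1] for r in rows) / k   # average of S'^2 over all samples
e_s2 = sum(r[2] for r in rows) / k        # average of S^2 over all samples

pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

print(e_xbar, e_s_prime, e_s2, pop_mean, pop_var)
```

Because every possible sample is listed, the averages are exact expectations: the mean of X̄ equals the population mean, the mean of S² equals the population variance 8/3, and the mean of S′² is only 4/3.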
UNIT 7
CONSISTENCY
Structure
7.1 Introduction
    Expected Learning Outcomes
7.2 Consistency
7.3 Properties of Consistent Estimator
7.4 Consistent Asymptotically Normal Estimator
7.5 Summary
7.6 Terminal Questions
7.7 Solutions/Answers
7.1 INTRODUCTION
Tools You Will Need
The following terms are considered essential background material for this Unit. If you
doubt your knowledge of any of these terms, you should review the appropriate Unit or
section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Basic terms of estimation (Unit 6).
• Probability distributions (MST-012).
In the previous unit, you have seen that there exists more than one estimator
for an unknown parameter. For example, for estimating unknown population
mean, we may use sample mean, sample median, sample mode, average of
extreme observations, etc. Now, the questions may arise:
• Which estimator should we use, that is, which is likely to give estimates
closer to the true (but unknown) population value?
• Are some of the possible estimators better, in some sense, than the
others?
To answer the above questions, Prof. Ronald A. Fisher gave some
characteristics of a good estimator which are as follows:
• Unbiasedness
• Consistency
• Efficiency
• Sufficiency
In the previous unit, you studied one of the characteristics of a good estimator,
that is, unbiasedness.
An estimator is said to be unbiased for the population parameter if and only if
the average or mean of the sampling distribution of the estimator is equal to
the true value of the parameter. In other words, an estimator is said to be
unbiased if the expected value of the estimator is equal to the true value
of the parameter being estimated, that is,
E(T) = θ
This concept was defined for a fixed sample size. In this unit, you will learn
about consistency which is defined for increasing sample size.
This unit is divided into seven sections. Section 7.1 is introductory in nature.
Consistency is described with examples in Section 7.2. Section 7.3 is devoted
to describing various properties of the consistent estimator. Section 7.4 is
devoted to the study of an additional property of a consistent estimator, which
involves its asymptotic distribution. The unit ends by providing a summary of
what we have discussed in this unit in Section 7.5. The terminal questions and
the solution of the SAQs/TQs are given in Sections 7.6 and 7.7, respectively.
In the next unit, we shall discuss the third characteristic of a good estimator,
that is, efficiency.
7.2 CONSISTENCY
In the previous unit, we have learnt about unbiasedness. An estimator T is
said to be an unbiased estimator of a parameter, say, θ if the mean of the
sampling distribution of estimator T is equal to the true value of the parameter
θ, that is,
E(T) = θ
This concept was defined for a fixed sample size. In this section, we will learn
about consistency which is defined for increasing sample size. In general, we
construct an estimator as a function of an available sample of size n for a
parameter. Suppose we are able to keep collecting data and expanding the
sample. In this way, we would obtain a sequence of estimates such as
T1 = t1(X1), T2 = t2(X1, X2), T3 = t3(X1, X2, X3), …, Tn = tn(X1, X2, ..., Xn), …. Here,
we denote an estimator as Tn (indexed by n) to represent the estimator based
on sample size n, instead of T as used in the previous unit. Consistency is
a property of what occurs as the sample size “grows to infinity”.
In statistics, a consistent estimator is an estimator that converges to the true
value of the parameter as the sample size increases. It means that the
estimation becomes more and more accurate as more data is collected.
If X1, X2, …, Xn is a random sample of size n taken from a population whose
probability density (mass) function is f(x, θ), where θ is the population
parameter, then consider a sequence of estimators, say, {T1, T2, …, Tn}. A
sequence of estimators {T1, T2, …, Tn} is said to be a consistent estimator for a
parameter θ if the deviation/difference of the values of an estimator from the
parameter tends to zero as the sample size increases. This indicates that as
sample size increases, the estimator values tend to approach the parameter.
In other words, we can say that as the sample size approaches infinity, the
sampling distribution of a consistent estimator becomes concentrated on the
value of the parameter. It means that the standard error of the estimator
declines to 0 and the sampling distribution concentrates around the population
parameter.
For example, suppose {T1, T2, T3, …} is a sequence of estimators for
parameter θ whose true value is 5. As the sample size increases, the
sampling distributions of these estimators (as shown in Fig. 7.1) are getting
more and more concentrated near the true value θ = 5 (even though the estimators
are biased) and the density is more tightly distributed around the true value.
As the sample size becomes infinite, the sampling distribution of the sequence
collapses to a spike at the true value. Therefore, we can say that this
sequence is consistent.
Fig. 7.1: Sampling distribution of a consistent estimator Tn with increasing sample size.
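Mathematically, Tn is a consistent estimator of θ if, for every ε > 0,
P[|Tn − θ| < ε] → 1 as n → ∞,
i.e. for every ε > 0 and η > 0, there exists m such that
P[|Tn − θ| < ε] > 1 − η for all n ≥ m,
where m is some very large value of n. The above expressions mean
the same thing.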
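To find this probability, we convert it to the standard form. Recall from Unit 1
that the sample mean (X̄) has mean µ and finite variance σ²/n, therefore, we
convert X̄ − μ in the standard form by dividing it by σ/√n as
$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| < \frac{\varepsilon\sqrt{n}}{\sigma}\right]$$
By the central limit theorem (described in Unit 1 of this course), you know that
the variate $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a standard normal variate for large sample size n.
Therefore,
$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\,|Z| < \frac{\varepsilon\sqrt{n}}{\sigma}\right]$$
$$= \lim_{n \to \infty} P\left[-\frac{\varepsilon\sqrt{n}}{\sigma} < Z < \frac{\varepsilon\sqrt{n}}{\sigma}\right] \qquad [\because\ |X| < a \Rightarrow -a < X < a]$$
$$= \lim_{n \to \infty} \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma} f(z)\, dz \qquad \left[\because\ P[a < U < b] = \int_{a}^{b} f(u)\, du\right]$$
$$= \lim_{n \to \infty} \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz \qquad \left[\because\ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\right]$$
$$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz$$
Since $\frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$ is the pdf of a standard normal variate Z, therefore, the
integration of this in the whole range −∞ to ∞ is unity.
Thus,
$$\lim_{n \to \infty} P\left[\,|\bar{X} - \mu| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = 1$$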
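The limit above can be watched numerically. Under the normal approximation, P[|X̄ − μ| < ε] = 2Φ(ε√n/σ) − 1, which equals erf(ε√n/(σ√2)). The sketch below evaluates this for a few sample sizes; the values ε = 0.5 and σ = 2 and the function name are assumptions made for illustration.

```python
from math import erf, sqrt


def prob_within(eps, sigma, n):
    """Normal approximation to P(|X-bar - mu| < eps) for sample size n."""
    z = eps * sqrt(n) / sigma
    # For a standard normal Z, P(-z < Z < z) = 2 * Phi(z) - 1 = erf(z / sqrt(2)).
    return erf(z / sqrt(2))


eps, sigma = 0.5, 2.0              # hypothetical tolerance and population SD
sizes = [10, 100, 1000, 10000]
probs = [prob_within(eps, sigma, n) for n in sizes]

for n, p in zip(sizes, probs):
    print(n, round(p, 4))
```

The probabilities increase with n and approach 1, which is exactly what consistency of the sample mean requires.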
$$E(\bar{X}) = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \qquad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$
Since X1, X2, …, Xn are randomly drawn from the same population, therefore,
they also have the same mean and variance. Therefore,
$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \mu$$
Thus,
$$E(\bar{X}) = \frac{1}{n}\Big(\underbrace{\mu + \mu + \cdots + \mu}_{n\text{ times}}\Big) = \frac{1}{n}(n\mu) = \mu$$
$$E(\bar{X}) = \mu$$
Hence, the sample mean (X̄) is an unbiased estimator of the population mean
µ.
Now, we consider the variance of the sample mean (X̄) and check whether it
converges to 0 or not as n → ∞.
Thus, we consider,
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right]$$
$$= \frac{1}{n^{2}}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right]$$
[If X and Y are two independent random variables, then Var(aX ± bY) = a² Var(X) + b² Var(Y).]
$$= \frac{1}{n^{2}}\Big(\underbrace{\sigma^{2} + \sigma^{2} + \cdots + \sigma^{2}}_{n\text{ times}}\Big) = \frac{1}{n^{2}}(n\sigma^{2})$$
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^{2}}{n} \to 0 \text{ as } n \to \infty$$
Hence, by the sufficient conditions of consistency, we can say that the sample
mean (X̄) is a consistent estimator of the population mean.
Number of Accidents | 0  | 1  | 2  | 3 | 4 | 5 | 6
Frequency           | 10 | 12 | 12 | 9 | 5 | 3 | 1
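Since X1, X2, ..., Xn are independent and come from the same Poisson
distribution, therefore,
E(Xi) = E(X) = λ and Var(Xi) = Var(X) = λ for all i = 1, 2, …, n
We now consider
$$E(\bar{X}) = \frac{1}{n}\, E(X_1 + X_2 + \cdots + X_n) \qquad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\Big(\underbrace{\lambda + \lambda + \cdots + \lambda}_{n\text{ times}}\Big) = \frac{1}{n}(n\lambda) = \lambda$$
$$E(\bar{X}) = \lambda$$
Similarly, for the variance,
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right]$$
$$= \frac{1}{n^{2}}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right] \qquad [\because\ \mathrm{Var}(aX \pm bY) = a^{2}\mathrm{Var}(X) + b^{2}\mathrm{Var}(Y)]$$
$$= \frac{1}{n^{2}}\Big(\underbrace{\lambda + \lambda + \cdots + \lambda}_{n\text{ times}}\Big) = \frac{1}{n^{2}}(n\lambda)$$
$$\mathrm{Var}(\bar{X}) = \frac{\lambda}{n} \to 0 \text{ as } n \to \infty$$
Hence, by the sufficient conditions of consistency, we can say that the sample
mean is a consistent estimator of the parameter λ of the Poisson distribution.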
Since the mean of the Poisson distribution is λ, therefore, we estimate the
population mean by the sample mean. Thus, we calculate the sample mean of
the given data as follows:
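S. No. | Number of Accidents (X) | Frequency (f) | fX
1      | 0                       | 10            | 0
2      | 1                       | 12            | 12
3      | 2                       | 12            | 24
4      | 3                       | 9             | 27
5      | 4                       | 5             | 20
6      | 5                       | 3             | 15
7      | 6                       | 1             | 6
Total  |                         | N = 52        | ∑fX = 104
$$\bar{X} = \frac{1}{N}\sum fX = \frac{1}{52} \times 104 = 2$$
Hence, the estimate of the parameter λ is 2.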
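The estimate of λ from the frequency table is a weighted mean, and it can be recomputed as follows. The variable names are illustrative only.

```python
# Frequency distribution from Example 2.
accidents = [0, 1, 2, 3, 4, 5, 6]       # number of accidents (X)
frequency = [10, 12, 12, 9, 5, 3, 1]    # how often each value occurred (f)

total_frequency = sum(frequency)                                  # N
total_accidents = sum(x * f for x, f in zip(accidents, frequency))  # sum of fX

# The sample mean of grouped data estimates the Poisson parameter lambda.
lambda_hat = total_accidents / total_frequency

print(total_frequency, total_accidents, lambda_hat)
```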
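If the actual variation in the diameter of the ball bearing is σ² then show that
S′² is a consistent estimator.
Solution: To show that the proposed estimator S′² is consistent, we have to
show that
• It is asymptotically unbiased, that is,
E(S′²) → σ² as n → ∞ and
• Its variance tends to zero, that is,
Var(S′²) → 0 as n → ∞
Therefore, we consider
$$E(S'^{2}) = E\left[\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^{2}\right] = \frac{1}{n} E\left[\sum_{i=1}^{n} (X_i - \bar{X})^{2}\right] \qquad [\because\ E(aX) = aE(X)]$$
$$= \frac{1}{n} E\left[(n - 1) S^{2}\right] \qquad \left[\because\ S^{2} = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^{2} \Rightarrow \sum_{i=1}^{n} (X_i - \bar{X})^{2} = (n - 1) S^{2}\right]$$
$$= \frac{1}{n} E\left[\sigma^{2}\, \frac{(n - 1) S^{2}}{\sigma^{2}}\right]$$
For a normal population, the variate (n − 1)S²/σ² follows a chi-square distribution
with (n − 1) degrees of freedom, whose mean is (n − 1) and variance is 2(n − 1).
Therefore, we have
$$E(S'^{2}) = \frac{\sigma^{2}}{n}\, E\left[\frac{(n - 1) S^{2}}{\sigma^{2}}\right] = \frac{\sigma^{2}}{n}(n - 1) = \sigma^{2} - \frac{\sigma^{2}}{n}$$
Therefore,
$$\lim_{n \to \infty} E(S'^{2}) = \lim_{n \to \infty}\left(\sigma^{2} - \frac{\sigma^{2}}{n}\right) = \sigma^{2} \qquad \left[\because\ \lim_{n \to \infty} \frac{\sigma^{2}}{n} = 0\right]$$
Similarly,
$$\mathrm{Var}(S'^{2}) = \frac{1}{n^{2}}\, \mathrm{Var}\left[\sigma^{2}\, \frac{(n - 1) S^{2}}{\sigma^{2}}\right] = \frac{(\sigma^{2})^{2}}{n^{2}}\, \mathrm{Var}\left(\chi^{2}_{n - 1}\right) = \frac{(\sigma^{2})^{2}}{n^{2}} \times 2(n - 1)$$
$$\lim_{n \to \infty} \mathrm{Var}(S'^{2}) = 2\sigma^{4} \lim_{n \to \infty}\left(\frac{1}{n} - \frac{1}{n^{2}}\right) = 2\sigma^{4} \times 0 = 0$$
Hence, the estimator $S'^{2} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^{2}$ is a consistent estimator for σ².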
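The two sufficient conditions can be inspected numerically by evaluating the closed-form mean σ²(n − 1)/n and variance 2σ⁴(n − 1)/n² of S′² obtained from the chi-square result for a normal population. The sketch below assumes σ² = 4 purely for illustration; the function name is hypothetical.

```python
def s_prime_moments(sigma2, n):
    """Mean and variance of S'^2 for a normal population with variance sigma2."""
    mean = sigma2 * (n - 1) / n                   # sigma^2 - sigma^2 / n
    var = 2 * sigma2 ** 2 * (n - 1) / n ** 2      # 2 sigma^4 (1/n - 1/n^2)
    return mean, var


sigma2 = 4.0   # hypothetical population variance
for n in (10, 100, 1000, 10 ** 6):
    mean, var = s_prime_moments(sigma2, n)
    print(n, round(mean, 4), round(var, 6))
```

As n grows, the mean approaches σ² and the variance approaches 0, which are the two conditions checked in the solution.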
You should now understand what a consistent estimator is and how to check
whether an estimator is consistent or not. Let us study the properties of a
consistent estimator in the next section.
1. Consistent estimators may not be unique. For example, the sample mean
and the sample median both are consistent estimators of the population
mean of a normal population (see Example 1 and SAQ 1(i)). Also, the
sample variances S′² and S² are both consistent estimators for
population variance (see Example 3 and TQ 2).
Thus,
$$E(T) = E(X_1) = \mu$$
Therefore, the estimator $T = X_1$ is an unbiased estimator.
However, $\mathrm{Var}(T) = \mathrm{Var}(X_1) = \sigma^2$ does not depend on n, so it does not
converge to zero as n tends to infinity. The distribution of T therefore does not
concentrate around μ as the sample size increases, and T is not a
consistent estimator.
Example 5: Consider Example 2. Find the consistent estimator of λ(λ − 1).
Also, find the estimate of it.
Solution: We proceed in the following steps:
(i) First, we find the consistent estimator of the parameter (here λ).
(ii) After that, we check whether the given function is continuous or not.
(iii) Finally, we use the invariance property of the consistent estimator to find
the consistent estimator of the given function.
In Example 2, we showed that the sample mean ($\bar{X}$) is a consistent estimator
of the parameter λ. Therefore, we move to the second point and check whether
λ(λ − 1) is a continuous function or not. Since λ(λ − 1) is a polynomial and
every polynomial is a continuous function, λ(λ − 1) is a continuous function.
Since the sample mean is a consistent estimator of λ and λ(λ − 1) is a
continuous function of λ, by the invariance property of consistency,
$\bar{X}(\bar{X} - 1)$ will be the consistent estimator of λ(λ − 1).
Since the sample mean of the data in Example 2 is 2, the estimate of λ(λ − 1)
is 2(2 − 1) = 2.
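The three steps above can be sketched in a few lines of Python. This is only an illustration: the dictionary `freq` restates the frequency table of Example 2 under the assumption that its columns are the observed value X and its frequency f, and the helper names `sample_mean` and `plug_in` are hypothetical.

```python
# Frequency table from Example 2 (assumed layout: value x -> frequency f).
freq = {0: 10, 1: 12, 2: 12, 3: 9, 4: 5, 5: 3, 6: 1}

def sample_mean(freq):
    """Mean of grouped data: sum(f * x) / sum(f)."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n

def plug_in(g, freq):
    """Apply a continuous function g to the consistent estimator X-bar."""
    return g(sample_mean(freq))

x_bar = sample_mean(freq)                              # estimate of lambda
estimate = plug_in(lambda lam: lam * (lam - 1), freq)  # estimate of lambda(lambda - 1)
print(x_bar, estimate)
```

The same `plug_in` call works for any other continuous function of λ, which is the point of the invariance property.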
It is now time for you to try the following Self Assessment Question.
SAQ 2
Suppose it is known that the probability that a certain company experiences a
network failure in a given week is θ and the distribution of the number of
weeks the company does not experience a network failure follows a geometric
distribution with parameter θ, then show that the sample mean $\bar{X}$ is a
consistent estimator of 1/θ. Also, find a consistent estimator of $e^{1/\theta}$.
consistent estimator for the population mean. To show the impact of the
sample size on the sampling distribution of the consistent estimator, we plot
the sampling distribution of the estimator (the sample mean for the population
mean), say, θ = 5 for sample sizes 100, 300, 500, 1000… as shown in
Fig. 7.2.
Fig. 7.2: Sampling distribution of sample mean with increasing sample size.
From Fig. 7.2, you can see that as the sample size increases, the sampling
distribution becomes increasingly concentrated around the true value of the
population mean. Also, in Unit 15 of the course
MST-012: Probability and Probability Distributions, you have studied that
convergence in probability implies convergence in law (distribution), therefore,
the estimator Tn converges to the parameter θ = 5 in distribution as sample
size approaches infinity. It means that the asymptotic distribution of the
consistent estimator Tn is degenerate at θ (in our case θ = 5, as shown in
Fig. 7.2). It indicates that the estimator will take one value θ with probability 1.
Such a degenerate distribution is not helpful to find the rate of convergence or
to find an interval estimator of θ. If we know the sampling distribution of our
estimator for every sample size, we could use it to draw inferences using this
finite-sample distribution. Hence, we aim to find an estimator whose sampling
distribution does not change for a large sample size. For that, we use the
concept of consistent asymptotically normal distribution.
It is observed that if we re-centre and re-scale the estimator, then the form of
the sampling distribution of the new version of the estimator does not change
with the sample size and remains non-degenerate as the sample size tends to
infinity. Also, the shape of the sampling distribution gets arbitrarily close to a
normal distribution as the sample size increases. For illustration purposes,
instead of looking at the distribution of the estimator $T_n = \bar{X}$ for sample size n,
let's look at the distribution of $\sqrt{n}\,(T_n - \theta_0)$, where $\theta_0$ is the true value of the
population mean (parameter) for which the estimator $T_n$ is consistent. We again
plot the sampling distributions, as shown in Fig. 7.3.
Fig. 7.3: Sampling distribution of the re-centred and re-scaled sample mean with
increasing sample size
From the above figure, you can observe that the sampling distributions of
$\sqrt{n}\,(T_n - \theta_0) = \sqrt{n}\,(\bar{X} - 5.0)$ (for sample sizes 100, 300, 500 and 1000) are
almost indistinguishable from each other. They are also close to the normal
distribution with mean 0 and a constant variance $\sigma^2$ (the population variance). In
other words, we can say that the distribution of $\sqrt{n}\,(T_n - \theta_0) = \sqrt{n}\,(\bar{X} - 5.0)$ gets
arbitrarily close to a $N(0, \sigma^2)$ distribution as n → ∞. Therefore, we can define
the consistent asymptotically normal estimator as follows:
An estimator $T_n$ is said to be a consistent asymptotically normal estimator for
the parameter θ if the sampling distribution of $\sqrt{n}\,(T_n - \theta_0)$, where $\theta_0$ is the true
value of θ, approaches a normal distribution with mean 0 and constant
variance $\sigma^2$ as n → ∞.
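The effect of re-centring and re-scaling can be checked with a small simulation. The sketch below assumes a normal population with mean θ = 5 (as in the figures) and a standard deviation of 2, which is a hypothetical value chosen only for illustration; the function name `simulate` is likewise an assumption.

```python
import random
import statistics

def simulate(n, theta=5.0, sigma=2.0, reps=500, seed=1):
    """Return (sd of X-bar, sd of sqrt(n) * (X-bar - theta)) over many samples."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
             for _ in range(reps)]
    scaled = [(n ** 0.5) * (m - theta) for m in means]
    return statistics.pstdev(means), statistics.pstdev(scaled)

for n in (100, 300, 500, 1000):
    sd_mean, sd_scaled = simulate(n)
    # sd_mean shrinks with n, while sd_scaled stays near sigma.
    print(n, round(sd_mean, 3), round(sd_scaled, 3))
```

The spread of the sample mean itself shrinks towards zero (the degenerate limit), whereas the spread of the re-scaled quantity stays roughly constant, which is why the re-scaled version is the one used for asymptotic inference.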
We now end this unit by giving a summary of what we have covered in it.
7.5 SUMMARY
In this unit, we have covered the following points:
Var ( Tn ) → 0 as n → ∞ .
1. (i) To check the consistency, first, we have to show that the sample
median ($\tilde{X}$) is asymptotically unbiased or simply an unbiased
estimator for the population mean µ. Therefore, we have to find $E(\tilde{X})$ and
check whether it is equal to µ or not as n → ∞.
$$\mathrm{SD}(\tilde{X}) = 1.253\,\frac{\sigma}{\sqrt{n}}$$
Since $E(\tilde{X}) = \mu$, therefore, the sample median is an unbiased
estimator of the population mean µ.
We now check whether the variance of the sample median converges to 0 or
not as n → ∞. Therefore, we consider,
$$\mathrm{Var}(\tilde{X}) = (\mathrm{SD})^2 = \left(1.253\,\frac{\sigma}{\sqrt{n}}\right)^2 = 1.57\,\frac{\sigma^2}{n}$$
$$\mathrm{Var}(\tilde{X}) \to 0 \text{ as } n \to \infty$$
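Consider,
$$E(T_1) = E\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right] = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right]$$
$$= \frac{1}{n}\big[\underbrace{\theta + \theta + \dots + \theta}_{n\text{-times}}\big] = \frac{1}{n}(n\theta) = \theta$$
$$E(T_1) = \theta$$
Similarly,
$$E(T_2) = E\left[\frac{1}{n+1}\sum_{i=1}^{n}X_i\right] = \frac{1}{n+1}\big[\underbrace{\theta + \theta + \dots + \theta}_{n\text{-times}}\big] = \frac{1}{n+1}(n\theta) = \frac{n}{n+1}\theta$$
Since $E(T_2) = \frac{n}{n+1}\theta \neq \theta$, the estimator $T_2$ is not an unbiased
estimator of the parameter θ.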
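We now check the consistency of both estimators. For that, we check
whether the variances of these estimators converge to 0 or not as
n → ∞. Therefore, we consider
$$\mathrm{Var}(T_1) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{n^2}\big[\underbrace{\theta^2 + \theta^2 + \dots + \theta^2}_{n\text{-times}}\big] = \frac{1}{n^2}(n\theta^2)$$
$$\mathrm{Var}(T_1) = \frac{\theta^2}{n} \to 0 \text{ as } n \to \infty$$
We consider
$$\mathrm{Var}(T_2) = \mathrm{Var}\left[\frac{1}{n+1}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{(n+1)^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{(n+1)^2}\big[\underbrace{\theta^2 + \theta^2 + \dots + \theta^2}_{n\text{-times}}\big] = \frac{1}{(n+1)^2}(n\theta^2)$$
$$\mathrm{Var}(T_2) = \frac{n\theta^2}{(n+1)^2} \to 0 \text{ as } n \to \infty$$
Also,
$$E(T_2) = \frac{1}{1 + \frac{1}{n}}\,\theta \to \theta \text{ as } n \to \infty$$
Hence, by the sufficient conditions of consistency, both $T_1$ and $T_2$ are
consistent estimators of θ.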
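(iii) As we know, the pdf of the uniform distribution with parameters ‘a’ and
‘b’ is as follows:
$$f(x) = \frac{1}{b-a};\quad a \le x \le b$$
The mean and variance of the uniform distribution are given as follows:
$$E(X) = \frac{a+b}{2} \quad\text{and}\quad \mathrm{Var}(X) = \frac{(b-a)^2}{12}$$
Here, a = θ and b = θ + 6, therefore,
$$f(x) = \frac{1}{\theta + 6 - \theta} = \frac{1}{6};\quad \theta \le x \le \theta + 6$$
$$E(X) = \frac{\theta + \theta + 6}{2} = \theta + 3 \quad\text{and}\quad \mathrm{Var}(X) = \frac{(\theta + 6 - \theta)^2}{12} = 3$$
Let $X_1, X_2, \dots, X_n$ denote the weight of the person on n random winter
days, therefore,
$$E(X_i) = E(X) = \theta + 3 \quad\text{and}\quad \mathrm{Var}(X_i) = \mathrm{Var}(X) = 3 \quad\text{for all } i = 1, 2, \dots, n$$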
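$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] \qquad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] \qquad [\because E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\big[\underbrace{(\theta+3) + (\theta+3) + \dots + (\theta+3)}_{n\text{-times}}\big] = \frac{1}{n}\,n(\theta+3) = \theta + 3$$
Therefore, the sample mean is an unbiased estimator of (θ + 3).
For consistency, we need
$$E(\bar{X}) \to \theta + 3 \quad\text{and}\quad \mathrm{Var}(\bar{X}) \to 0 \text{ as } n \to \infty$$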
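Therefore, we consider,
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{n^2}\big[\underbrace{3 + 3 + \dots + 3}_{n\text{-times}}\big] = \frac{1}{n^2}(3n) = \frac{3}{n}$$
$$\mathrm{Var}(\bar{X}) = \frac{3}{n} \to 0 \text{ as } n \to \infty$$
Thus, $E(\bar{X}) = \theta + 3$ and $\mathrm{Var}(\bar{X}) \to 0$ as n → ∞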
$$E(X) = \frac{1}{\theta} \quad\text{and}\quad \mathrm{Var}(X) = \frac{1-\theta}{\theta^2}$$
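Since $X_1, X_2, \dots, X_n$ are independent and come from the same geometric
distribution, therefore,
$$E(X_i) = E(X) \quad\text{and}\quad \mathrm{Var}(X_i) = \mathrm{Var}(X) \quad\text{for all } i = 1, 2, \dots, n$$
Therefore, we consider,
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] \qquad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] \qquad [\because E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\Big[\underbrace{\frac{1}{\theta} + \frac{1}{\theta} + \dots + \frac{1}{\theta}}_{n\text{-times}}\Big] = \frac{1}{n}\cdot\frac{n}{\theta} = \frac{1}{\theta}$$
$$E(\bar{X}) = \frac{1}{\theta}$$
Thus, the sample mean is an unbiased estimator of 1/θ.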
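We now consider
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{n^2}\Big[\underbrace{\frac{1-\theta}{\theta^2} + \frac{1-\theta}{\theta^2} + \dots + \frac{1-\theta}{\theta^2}}_{n\text{-times}}\Big] = \frac{1}{n^2}\cdot n\,\frac{1-\theta}{\theta^2} = \frac{1}{n}\cdot\frac{1-\theta}{\theta^2}$$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n}\cdot\frac{1-\theta}{\theta^2} \to 0 \text{ as } n \to \infty$$
Since $E(\bar{X}) = \frac{1}{\theta}$ and $\mathrm{Var}(\bar{X}) \to 0$ as n → ∞, the sample mean is a
consistent estimator of 1/θ.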
Terminal Questions (TQs)
1. Refer to Section 7.2.
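$E(S^2) \to \sigma^2$ as $n \to \infty$ and
• The variance of $S^2$ tends to zero as n tends to infinity, that is,
$\mathrm{Var}(S^2) \to 0$ as $n \to \infty$
Therefore, we consider
$$E(S^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n-1}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$$
$$= \frac{1}{n-1}E\left[(n-1)S^2\right] \qquad \left[\because S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \Rightarrow \sum_{i=1}^{n}(X_i - \bar{X})^2 = (n-1)S^2\right]$$
Since $(n-1)S^2/\sigma^2$ follows a chi-square distribution with (n − 1) degrees of
freedom, we have
$$E\left[\frac{(n-1)S^2}{\sigma^2}\right] = n-1$$
and
$$\mathrm{Var}\left[\frac{(n-1)S^2}{\sigma^2}\right] = \mathrm{Var}\left(\chi^2_{(n-1)}\right) = 2(n-1)$$
Therefore, we have
$$E(S^2) = \frac{\sigma^2}{n-1}E\left[\frac{(n-1)S^2}{\sigma^2}\right] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2$$
$$\mathrm{Var}(S^2) = \mathrm{Var}\left[\frac{\sigma^2}{n-1}\cdot\frac{(n-1)S^2}{\sigma^2}\right] = \frac{(\sigma^2)^2}{(n-1)^2}\mathrm{Var}\left[\frac{(n-1)S^2}{\sigma^2}\right]$$
$$= \frac{(\sigma^2)^2}{(n-1)^2}\mathrm{Var}\left(\chi^2_{(n-1)}\right) = \frac{(\sigma^2)^2}{(n-1)^2}\times 2(n-1) = \frac{2\sigma^4}{n-1}$$
$$\lim_{n\to\infty}\mathrm{Var}(S^2) = 2\sigma^4\lim_{n\to\infty}\frac{1}{n-1} = 2\sigma^4 \times 0 = 0$$
Hence, the sample variance $S^2$ is a consistent estimator for the population
variance $\sigma^2$.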
UNIT 8
EFFICIENCY AND
MEAN SQUARED ERROR
Structure
8.1 Introduction
    Expected Learning Outcomes
8.2 Concept of Efficiency
8.3 Most Efficient Estimator
8.4 Properties of Efficient Estimator
8.5 Mean Squared Error
8.6 Minimum Variance Unbiased Estimator
8.7 Summary
8.8 Terminal Questions
8.9 Solutions/Answers
8.1 INTRODUCTION
In the previous Units 6 and 7, we discussed two characteristics: unbiasedness
and consistency of a good estimator with various examples. I hope you
understand both properties. You have also seen that the sample mean and
sample median both are unbiased and consistent for the population mean µ
when sampling is done from a normal population with mean µ and variance $\sigma^2$.
Now, the question may arise: Are they as “good” as one another, or is there
some reason to prefer one over another? This means that we need to consider
other characteristics of a good estimator to check which one is better in
comparison to another. Thus, this unit is devoted to explaining the concepts of
efficiency, mean squared error and minimum variance unbiased estimator,
which help us to compare estimators and decide which one is better.
This unit is divided into nine sections. Section 8.1 is introductory in nature.
There may exist more than one unbiased estimator of a parameter, therefore,
to check which one is better, we explain the concept of efficiency in Section
8.2. If we have a class of unbiased estimators of a parameter, then to compare
them, we use the concept of the most efficient estimator which is explained in
Section 8.3. Section 8.4 is devoted to discussing the properties of efficient
estimators. Section 8.5 explains the concept of the mean squared error.
Section 8.6 describes the minimum variance unbiased estimator. The unit
ends by providing a summary of what we have discussed in this unit in Section
8.7. The terminal questions and the solutions of the SAQs/TQs are given in
Sections 8.8 and 8.9, respectively.

Tools You Will Need
The following terms are considered essential background material for this
Unit. If you doubt your knowledge of any of these terms, you should review
the appropriate Unit or section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Basic terms of estimation (Unit 6).
• Unbiasedness and consistency (Units 6 and 7).
• Probability distributions (MST-012).

Unit Writer- Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
From Fig. 8.1, you can observe that the centre of both sampling distributions is
θ so both estimators are unbiased, however, the sampling distribution of the
estimator T2 is more spread than the estimator T1, therefore, we can conclude
that the variance (spread) of the estimator T1 is smaller than the estimator T2.
However, it is clear that we would also desire an estimator whose sampling
distribution is not too spread out around the true value of the parameter,
because if it is too spread out, there is a high probability that an estimate
will lie at a significant distance from the true value of the
parameter. Therefore, there is a necessity for some further criterion which will
statistical inference is important in comparing the performance of various
estimators. The efficiency of an estimator can also be treated as the precision
of the estimate. If an estimator is more efficient then we can say that it is the
more precise estimator of the parameter.
efficient at a larger sample size than the sample size of your data.
Let us look at an example to see how this definition works.
Example 1: A company produces batteries for laptops and wants to estimate
the average life of the batteries. For that, the statistician of the company
selected 5 batteries from the production and measured their lives. He
suggests two unbiased estimators for estimating the average life of the
batteries:
$$T_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} \quad\text{and}\quad T_2 = \frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5}{15}$$
where X1, X2, X3, X4 and X5 represent the life of the selected batteries. If it is
known that the life of batteries has mean µ and variance σ2 then which one is
more efficient?
Solution: We have to check which one of these proposed unbiased
estimators T1 and T2 is more efficient. Therefore, we have to find the variances
of both estimators and check which one is smaller. Since X1, X2, X3, X4 and X5
are independent and taken from the same population with a mean µ and
variance σ2, therefore,
E ( Xi ) = μ and Var ( Xi ) = σ 2 for all i = 1,2,...,5
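So we consider,
[Note: If X and Y are two independent random variables and a & b are two
constants, then $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$.]
$$\mathrm{Var}(T_1) = \mathrm{Var}\left[\frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}\right]$$
$$= \frac{1}{25}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4) + \mathrm{Var}(X_5)\right]$$
$$= \frac{1}{25}\left[\sigma^2 + \sigma^2 + \sigma^2 + \sigma^2 + \sigma^2\right] \qquad [\because \mathrm{Var}(X_i) = \sigma^2]$$
$$= \frac{1}{25}(5\sigma^2)$$
$$\mathrm{Var}(T_1) = \frac{1}{5}\sigma^2$$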
Similarly,
$$\mathrm{Var}(T_2) = \mathrm{Var}\left[\frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5}{15}\right]$$
$$= \frac{1}{225}\left[\mathrm{Var}(X_1) + 4\mathrm{Var}(X_2) + 9\mathrm{Var}(X_3) + 16\mathrm{Var}(X_4) + 25\mathrm{Var}(X_5)\right]$$
$$= \frac{1}{225}\left(\sigma^2 + 4\sigma^2 + 9\sigma^2 + 16\sigma^2 + 25\sigma^2\right) \qquad [\because \mathrm{Var}(X_i) = \sigma^2]$$
$$= \frac{55\sigma^2}{225}$$
$$\mathrm{Var}(T_2) = \frac{11\sigma^2}{45}$$
Since, Var ( T1 ) < Var ( T2 ) , therefore, we conclude that the estimator T1 is more
efficient than T2.
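The comparison in Example 1 can be checked with exact fractions. The sketch below relies on the rule that, for independent observations with common variance σ², a linear estimator T = Σ wᵢXᵢ has Var(T) = σ² Σ wᵢ²; the helper names `variance_factor` and `mean_factor` are hypothetical.

```python
from fractions import Fraction

def variance_factor(weights):
    """Multiplier of sigma^2 in Var(T) for T = sum(w_i * X_i), X_i independent."""
    return sum(Fraction(w) ** 2 for w in weights)

def mean_factor(weights):
    """Multiplier of mu in E(T); T is unbiased for mu when this equals 1."""
    return sum(Fraction(w) for w in weights)

t1 = [Fraction(1, 5)] * 5                      # weights of T1
t2 = [Fraction(k, 15) for k in range(1, 6)]    # weights of T2

print(mean_factor(t1), mean_factor(t2))          # both equal 1: unbiased
print(variance_factor(t1), variance_factor(t2))  # 1/5 and 11/45
```

Because the weights of both estimators sum to 1, both are unbiased, and the estimator with the smaller sum of squared weights is the more efficient one.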
Example 2: Show that the sample mean is a more efficient estimator than the
sample median for estimating the mean of the normal population.
Solution: To show that the sample mean is a more efficient estimator than the
sample median for estimating the mean of the normal population, we have to
compare the variance of the sample mean with the variance of the sample
median.
Let $X_1, X_2, \dots, X_n$ be a random sample taken from a normal population with
mean μ and variance $\sigma^2$. Also, let $\bar{X}$ and $\tilde{X}$ be the sample mean and sample
median, respectively. We have seen in Unit 2 that the sampling distribution of
the mean from a normal population follows a normal distribution with mean µ and
variance $\sigma^2/n$. Similarly, for large samples, the sampling distribution of the
median from a normal population approximately follows a normal distribution
with mean µ and variance $\frac{\pi}{2}\cdot\frac{\sigma^2}{n}$.
Therefore,
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
$$\mathrm{Var}(\tilde{X}) = \frac{\pi\sigma^2}{2n}$$
Since $\frac{\pi}{2} > 1$, we have $\frac{\sigma^2}{n} < \frac{\pi\sigma^2}{2n}$, therefore, $\mathrm{Var}(\bar{X}) < \mathrm{Var}(\tilde{X})$. Thus, we conclude
that the sample mean is a more efficient estimator than the sample median.
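The conclusion of Example 2 can be illustrated by simulation. The sketch below draws repeated samples from a standard normal population (the sample size 25, the number of replications and the seed are arbitrary assumptions) and compares the empirical variances of the sample mean and the sample median.

```python
import random
import statistics

def compare_mean_median(n=25, reps=4000, mu=0.0, sigma=1.0, seed=7):
    """Empirical variances of the sample mean and sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = compare_mean_median()
# The ratio should be in the neighbourhood of pi/2 for moderately large n.
print(var_mean, var_median, var_median / var_mean)
```

The ratio of the two variances is only approximately π/2 for a finite sample size, since the variance formula for the median is a large-sample result.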
I hope you understood the concept of efficiency and how to check which one is
more efficient between the two estimators. Therefore, before going to the next
section, you should assess yourself by answering the following Self
Assessment Question.
SAQ 1
A company manufactures fruit juice packets. Suppose the weight of juice
packets follows a normal distribution with mean weight µ ml and standard
deviation σ ml. To estimate the average weight of the fruit juice packets, the
quality control inspector measured the weight of three selected fruit juice
packets X1, X2, and X3 ml and proposed two estimators for estimating the
average weight of fruit juice packets µ as follows:
$$T_1 = \frac{X_1 + X_2 + X_3}{3} \quad\text{and}\quad T_2 = \frac{X_1 + X_2}{4} + \frac{X_3}{2}$$
Are both estimators unbiased for µ? Which one of them is more efficient?
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)} < 1$$
Since X1, X2, X3, and X4 are independent and taken from the same group of
the LED bulbs (population) with a mean µ and variance σ2, therefore,
E ( Xi ) = μ and Var ( Xi ) = σ 2 for all i = 1,2,..., 4
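Therefore,
$$\frac{2\mu + 3\mu + \alpha\mu}{10} = \mu \Rightarrow \alpha = 5$$
Similarly, to find the value of β, we consider
$$E(T_3) = \mu$$
$$E\left[\frac{X_1 + X_2 + \beta X_3}{5}\right] = \mu \Rightarrow \frac{E(X_1) + E(X_2) + \beta E(X_3)}{5} = \mu$$
$$\frac{\mu + \mu + \beta\mu}{5} = \mu \Rightarrow \beta = 3$$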
To check which one of these proposed estimators $T_1$, $T_2$ and $T_3$ is most
efficient, we have to find the variances of the estimators and check which one is
the smallest. Therefore, we consider
[Note: If X and Y are two independent random variables and a & b are two
constants, then $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$.]
$$\mathrm{Var}(T_1) = \mathrm{Var}\left[\frac{X_1 + X_2 + X_3 + X_4}{4}\right] = \frac{1}{16}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4)\right]$$
$$= \frac{1}{16}\left[\sigma^2 + \sigma^2 + \sigma^2 + \sigma^2\right] = \frac{1}{16}(4\sigma^2)$$
$$\mathrm{Var}(T_1) = \frac{\sigma^2}{4}$$
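Similarly,
$$\mathrm{Var}(T_2) = \mathrm{Var}\left[\frac{2X_1 + 3X_2 + 5X_4}{10}\right] = \frac{1}{100}\left[4\mathrm{Var}(X_1) + 9\mathrm{Var}(X_2) + 25\mathrm{Var}(X_4)\right]$$
$$= \frac{1}{100}\left[4\sigma^2 + 9\sigma^2 + 25\sigma^2\right] = \frac{38}{100}\sigma^2$$
Similarly,
$$\mathrm{Var}(T_3) = \mathrm{Var}\left[\frac{X_1 + X_2 + 3X_3}{5}\right] = \frac{1}{25}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 9\mathrm{Var}(X_3)\right]$$
$$= \frac{1}{25}\left[\sigma^2 + \sigma^2 + 9\sigma^2\right] = \frac{11}{25}\sigma^2$$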
Since the variance of the estimator T1 is minimum, therefore, by the definition
of the most efficient estimator, we conclude that the estimator T1 is the most
efficient estimator in the class of three unbiased estimators.
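We now come to part (iii). We can calculate the efficiency of an unbiased
estimator as
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$
Since $T_1$ is the most efficient estimator, we take $T_1$ in place of $T^*$ and compute
the efficiency of estimator $T_2$ as follows:
$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_2)} = \frac{\sigma^2/4}{38\sigma^2/100} = \frac{100}{38 \times 4} = 0.658$$
Similarly, we can compute the efficiency of estimator $T_3$ as follows:
$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_3)} = \frac{\sigma^2/4}{11\sigma^2/25} = \frac{25}{11 \times 4} = 0.568$$
Hence, we conclude that estimator $T_2$ is more efficient than
estimator $T_3$.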
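The efficiency calculation above can be reproduced with exact fractions. In this sketch, each estimator is described by its weights on the observations it uses (with α = 5 and β = 3 as found above); the dictionary layout and the name `variance_factor` are assumptions made for illustration.

```python
from fractions import Fraction

def variance_factor(weights):
    """Multiplier of sigma^2 in Var(T) for a linear estimator with these weights."""
    return sum(Fraction(w) ** 2 for w in weights)

estimators = {
    "T1": [Fraction(1, 4)] * 4,
    "T2": [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)],
    "T3": [Fraction(1, 5), Fraction(1, 5), Fraction(3, 5)],
}
variances = {name: variance_factor(w) for name, w in estimators.items()}
best = min(variances, key=variances.get)          # most efficient estimator
efficiency = {name: variances[best] / v for name, v in variances.items()}

for name in estimators:
    print(name, variances[name], float(efficiency[name]))
```

The most efficient estimator always has efficiency 1, and the others have efficiency below 1, matching the values 0.658 and 0.568 obtained by hand.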
Note 1: An unbiased estimator is usually preferred over a biased one. However,
there are situations in which a biased estimator with higher efficiency
can be more valuable than an unbiased estimator with lower efficiency.
Note 2: The relative efficiency of two estimators may depend on the
distribution involved. For example, the mean is more efficient than the median
for a normal distribution; however, this may not be the case for a highly skewed
distribution.
I think you have a curiosity to find the efficiency of an estimator. Therefore,
you can try the following Self Assessment Question.
SAQ 2
Consider the question of the manufacturing fruit juice packets discussed in
SAQ 1. Suppose the quality control inspector proposed third estimator for
estimating the average weight of fruit juice packets µ as follows:
$$T_3 = \frac{X_1 + 2X_2 + 3X_3}{6}$$
(i) Is estimator $T_3$ unbiased for µ?
(ii) Which one is the most efficient estimator among the three?
(iii) Calculate the efficiency of the remaining estimators.
The first estimator T1 is the sample mean which is unbiased and its value
changes with the change of the samples. Therefore, it has a certain variance
greater than zero. But the second estimator T2 does not change with the
samples and always takes a single value so its variance is zero but it is highly
biased because not all young males may have the same height of 165 cm.
Similarly, an estimator that multiplies the sample mean by [n/(n+1)] will
underestimate the population mean (biased estimator) but have a smaller
variance. Therefore, the question may arise:
(i) Is a biased estimator with a smaller variance better than an unbiased
estimator with a larger variance?
(ii) How can we compare such estimators?
Think about that. To compare these estimators, we require a measuring
device that explicitly trades off biasedness with the variance of an estimator. A
simple approach is to compare estimators based on their mean squared error.
It permits us to compare biased and unbiased estimators.
In statistics, the mean squared error is an essential measure which is used to
assess the performance of a point estimator (biased or unbiased). It is also
necessary for relating the concepts of precision, bias and accuracy during
the statistical estimation. It is abbreviated as MSE. The mean squared error
measures the average squared difference between the estimator and the
parameter.
Therefore, we can define the mean squared error of an estimator T of a
parameter θ as
$$\mathrm{MSE} = E[T - \theta]^2$$
It is a function of parameter θ.
[Note: The mean squared error may be called a risk function, which corresponds
to the expected value of the squared error loss. This difference or loss may
arise due to randomness or because the estimator does not represent the true
unknown parameter.]
Writing $T - \theta = \{T - E(T)\} + \{E(T) - \theta\}$, we have
$$\mathrm{MSE} = E\left[\{T - E(T)\}^2 + 2\{T - E(T)\}\{E(T) - \theta\} + \{E(T) - \theta\}^2\right]$$
$$= E\{T - E(T)\}^2 + 2E\left[\{T - E(T)\}\{E(T) - \theta\}\right] + E\{E(T) - \theta\}^2$$
Since $E(T) - \theta$ is a constant and $E\{T - E(T)\} = 0$, the middle term vanishes, so
$$\mathrm{MSE} = \mathrm{Var}(T) + \{E(T) - \theta\}^2 = \mathrm{Var}(T) + (\mathrm{Bias})^2$$
If the estimator is unbiased then the mean squared error is equal to the
variance of the estimator.
For the unbiased estimator, the mean squared error is equal to the variance.
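The relation between mean squared error, variance and bias can be verified numerically. The sketch below simulates the estimator n/(n + 1) times the sample mean mentioned earlier in this section; the normal population with mean 10 and standard deviation 2, the sample size and the seed are assumptions chosen only for illustration.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (MSE, variance, bias) computed from a list of simulated estimates."""
    mse = statistics.fmean((t - theta) ** 2 for t in estimates)
    var = statistics.pvariance(estimates)
    bias = statistics.fmean(estimates) - theta
    return mse, var, bias

rng = random.Random(3)
theta, n = 10.0, 8
# Biased estimator: n/(n+1) times the sample mean of n observations.
estimates = [n / (n + 1) * statistics.fmean(rng.gauss(theta, 2.0) for _ in range(n))
             for _ in range(5000)]

mse, var, bias = mse_decomposition(estimates, theta)
print(mse, var + bias ** 2)   # the two numbers agree up to rounding error
```

The agreement is exact apart from floating-point rounding, because the decomposition is an algebraic identity and not only a large-sample approximation.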
Therefore, for comparing the estimators, we compare the mean squared error
regardless of whether they are biased or unbiased. If $T_1$ and $T_2$ are two
estimators (biased or unbiased) of a parameter θ based on samples of the same
size, then the estimator $T_1$ is said to be more efficient than the estimator $T_2$ if
$$\mathrm{MSE}(T_1) < \mathrm{MSE}(T_2) \quad\text{for all } n$$
We can also compare the mean squared errors of two estimators by using
relative efficiency. If $T_1$ and $T_2$ are two estimators, then the efficiency of $T_1$
relative to $T_2$ is
$$e(T_1, T_2) = \frac{\mathrm{MSE}(T_2)}{\mathrm{MSE}(T_1)}$$
$$E(T_1) = E\left[\frac{1}{2}X_1 + \frac{1}{4}X_2 + \frac{1}{4}X_3 + 2\right]$$
$$= \frac{1}{2}E(X_1) + \frac{1}{4}E(X_2) + \frac{1}{4}E(X_3) + E(2)$$
$$= \frac{1}{2}\mu + \frac{1}{4}\mu + \frac{1}{4}\mu + 2 = \mu + 2 \qquad [\because E(a) = a]$$
Since $E(T_1) = \mu + 2 \neq \mu$, the estimator $T_1$ is not an unbiased estimator of the
parameter µ.
Similarly, we consider
$$E(T_2) = E\left[\frac{1}{2}X_1 - X_2 + X_3 + \frac{1}{2}X_4\right]$$
$$= \frac{1}{2}E(X_1) - E(X_2) + E(X_3) + \frac{1}{2}E(X_4)$$
$$= \frac{1}{2}\mu - \mu + \mu + \frac{1}{2}\mu = \mu$$
$$E(T_2) = \mu$$
Since $\mathrm{MSE}(T_1) = 4.75 < \mathrm{MSE}(T_2) = 5$
Thus, we conclude that the estimator $T_1$ is more efficient than the estimator $T_2$.
We also know (from Unit 3) that the sample variance has a mean
$$E(S^2) = \sigma^2$$
and variance
$$\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}$$
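Since $E(S^2) = \sigma^2$, $S^2$ is unbiased, whereas
$$E(S'^2) = \frac{n-1}{n}E(S^2) = \frac{n-1}{n}\sigma^2 \qquad \left[\because S'^2 = \frac{n-1}{n}S^2\right]$$
Since $E(S'^2) \neq \sigma^2$, $S'^2$ is a biased estimator.
[Note: If X and Y are two independent random variables and a & b are two
constants, then $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$.]
$$\mathrm{Var}(S'^2) = \left(\frac{n-1}{n}\right)^2\mathrm{Var}(S^2) = \left(\frac{n-1}{n}\right)^2\frac{2\sigma^4}{n-1} = \frac{2(n-1)\sigma^4}{n^2}$$
$$\mathrm{MSE}(S^2) = \mathrm{Var}(S^2) + (\mathrm{Bias})^2 = \frac{2\sigma^4}{n-1} + 0 = \frac{2\sigma^4}{n-1}$$
$$\mathrm{MSE}(S'^2) = \mathrm{Var}(S'^2) + (\mathrm{Bias})^2 = \frac{2(n-1)\sigma^4}{n^2} + \left[\frac{n-1}{n}\sigma^2 - \sigma^2\right]^2$$
$$= \frac{2(n-1)\sigma^4}{n^2} + \left[\frac{n-1-n}{n}\right]^2\sigma^4$$
$$= \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2}$$
$$\mathrm{MSE}(S'^2) = \frac{(2n-1)\sigma^4}{n^2}$$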
We now consider
$$\mathrm{MSE}(S'^2) - \mathrm{MSE}(S^2) = \left[\frac{2n-1}{n^2} - \frac{2}{n-1}\right]\sigma^4$$
$$= \frac{(2n-1)(n-1) - 2n^2}{n^2(n-1)}\,\sigma^4$$
$$= \frac{2n^2 - 2n - n + 1 - 2n^2}{n^2(n-1)}\,\sigma^4$$
$$\mathrm{MSE}(S'^2) - \mathrm{MSE}(S^2) = \frac{1-3n}{n^2(n-1)}\,\sigma^4 < 0$$
Therefore, $\mathrm{MSE}(S'^2) < \mathrm{MSE}(S^2)$
Thus, we conclude that the sample variance $S'^2$ has a smaller mean squared error
than the sample variance $S^2$, even though $S'^2$ is a biased estimator.
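The algebra above can be checked for particular sample sizes using exact fractions. The sketch below codes the two mean squared errors in units of σ⁴, using the variance formula for S² quoted from Unit 3; the function names `mse_s2` and `mse_s2_prime` are hypothetical.

```python
from fractions import Fraction

def mse_s2(n):
    """MSE of S^2 (divisor n - 1) in units of sigma^4: variance only, zero bias."""
    return Fraction(2, n - 1)

def mse_s2_prime(n):
    """MSE of S'^2 (divisor n) in units of sigma^4: variance plus squared bias."""
    variance = Fraction(2 * (n - 1), n ** 2)
    bias = Fraction(n - 1, n) - 1          # equals -1/n
    return variance + bias ** 2

for n in (2, 5, 10, 50):
    diff = mse_s2_prime(n) - mse_s2(n)
    # The difference matches (1 - 3n) / (n^2 (n - 1)) and is negative.
    assert diff == Fraction(1 - 3 * n, n ** 2 * (n - 1))
    print(n, mse_s2_prime(n), mse_s2(n), diff)
```

The difference shrinks as n grows, so the advantage of the biased estimator in terms of mean squared error matters mainly for small samples.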
The above example does not suggest that $S^2$ should not be used as an
Fig. 8.3: Mean squared error of two estimators for various values of the parameter θ.
Therefore, we would have no basis for preferring one of the estimators over
the other on the basis of mean squared error.
It is now time for you to try the following Self Assessment Question to make
sure that you have understood the concept of mean squared error.
SAQ 3
The magnitude of earthquakes recorded in a region is modelled as an
exponential distribution with an unknown parameter θ whose pdf is given by
$$f(x, \theta) = \frac{1}{\theta}e^{-x/\theta};\quad x > 0,\ \theta > 0$$
A researcher considered the following two estimators for estimating the
parameter θ:
$$T_1 = \frac{1}{n}\sum_{i=1}^{n}X_i \quad\text{and}\quad T_2 = \frac{1}{n+1}\sum_{i=1}^{n}X_i$$
$$\mathrm{MSE} = E[T - \theta]^2$$
Also, you have studied that the unbiasedness criterion ensures only the
average or mean of the sampling distribution of the estimator is equal to the
true value of the parameter. However, it does not tell us the scatteredness
(variance) of the sampling distribution of the estimator. Graphically, we show
the sampling distribution of the two estimators T and T′ of the parameter θ in
Fig. 8.4.
From Fig. 8.4, you can observe that both estimators are unbiased; however,
the variance (spread) of the estimator T is smaller than that of the estimator T′.
It is clear that we would also desire an estimator whose sampling
distribution is not too spread out around the true value of the parameter,
because if it is too spread out, there is a high probability that an estimate
will lie at a significant distance from the true value of
the parameter. The foregoing considerations suggest that if one wishes to use
an unbiased estimator of the parameter θ, one should use the unbiased
estimator that also has minimum variance among all unbiased estimators of θ.
Such an estimator is called a minimum variance unbiased estimator (MVUE).
We can define it as follows:
8.7 SUMMARY
In this unit, we have covered the following points:
• If $T_1$ and $T_2$ are two unbiased estimators of a parameter θ based on samples
of the same size, then the estimator $T_1$ is said to be more efficient than the
estimator $T_2$ if
$$\mathrm{Var}(T_1) < \mathrm{Var}(T_2) \quad\text{for all } n$$
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$
where T* is the most efficient estimator.
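We now consider
$$E(T_2) = E\left[\frac{X_1 + X_2}{4} + \frac{X_3}{2}\right] = \frac{1}{4}\left[E(X_1) + E(X_2)\right] + \frac{1}{2}E(X_3)$$
$$= \frac{1}{4}[\mu + \mu] + \frac{\mu}{2} = \frac{\mu}{2} + \frac{\mu}{2} = \mu$$
Since $E(T_2) = \mu$, it is also unbiased.
$$\mathrm{Var}(T_2) = \mathrm{Var}\left[\frac{X_1 + X_2}{4} + \frac{X_3}{2}\right] = \frac{1}{16}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2)\right] + \frac{1}{4}\mathrm{Var}(X_3)$$
$$= \frac{1}{16}\left[\sigma^2 + \sigma^2\right] + \frac{1}{4}\sigma^2 = \frac{\sigma^2}{8} + \frac{\sigma^2}{4} = \frac{\sigma^2 + 2\sigma^2}{8}$$
$$\mathrm{Var}(T_2) = \frac{3\sigma^2}{8}$$
Since $\mathrm{Var}(T_1) < \mathrm{Var}(T_2)$, therefore, $T_1$ is a more efficient estimator of μ
than $T_2$.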
2. To check whether the estimator $T_3$ is unbiased or not, we find the
expectation of $T_3$ as
$$E(T_3) = E\left[\frac{X_1 + 2X_2 + 3X_3}{6}\right] = \frac{1}{6}\left[E(X_1) + 2E(X_2) + 3E(X_3)\right] \qquad [\because E(aX + bY) = aE(X) + bE(Y)]$$
$$E(T_3) = \frac{1}{6}[\mu + 2\mu + 3\mu] = \frac{1}{6}(6\mu) = \mu$$
Since $E(T_3) = \mu$, it is also unbiased.
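$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$
Since the estimator $T_1$ is the most efficient estimator in the class of three
unbiased estimators, therefore, for computing the efficiency of estimator
$T_2$, we take estimator $T_1$ in place of T*
$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_2)} = \frac{0.333\sigma^2}{0.375\sigma^2} \approx 0.89$$
Similarly, the efficiency of estimator $T_3$ is
$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_3)} = \frac{0.333\sigma^2}{0.389\sigma^2} \approx 0.86$$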
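We consider,
$$E(T_1) = E\left[\frac{1}{n}\sum_{i=1}^{n}X_i\right] = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] \qquad [\because E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\big[\underbrace{\theta + \theta + \dots + \theta}_{n\text{-times}}\big] = \frac{1}{n}(n\theta) = \theta$$
$$E(T_1) = \theta$$
Similarly,
$$E(T_2) = E\left[\frac{1}{n+1}\sum_{i=1}^{n}X_i\right] = \frac{1}{n+1}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right]$$
$$= \frac{1}{n+1}\big[\underbrace{\theta + \theta + \dots + \theta}_{n\text{-times}}\big] = \frac{1}{n+1}(n\theta) = \frac{n}{n+1}\theta$$
Since $E(T_2) = \frac{n}{n+1}\theta \neq \theta$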
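$$\mathrm{Var}(T_1) = \frac{1}{n^2}\big[\underbrace{\theta^2 + \theta^2 + \dots + \theta^2}_{n\text{-times}}\big] = \frac{1}{n^2}(n\theta^2)$$
$$\mathrm{Var}(T_1) = \frac{\theta^2}{n}$$
$$\mathrm{Var}(T_2) = \frac{1}{(n+1)^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{(n+1)^2}\big[\underbrace{\theta^2 + \theta^2 + \dots + \theta^2}_{n\text{-times}}\big] = \frac{n\theta^2}{(n+1)^2}$$
$$\mathrm{Var}(T_2) = \frac{n\theta^2}{(n+1)^2}$$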
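We now calculate the mean squared errors as
$$\mathrm{MSE}(T_1) = \mathrm{Var}(T_1) + (\mathrm{bias})^2 = \mathrm{Var}(T_1) + \left[E(T_1) - \theta\right]^2$$
$$= \frac{\theta^2}{n} + (\theta - \theta)^2 = \frac{\theta^2}{n}$$
$$\mathrm{MSE}(T_2) = \mathrm{Var}(T_2) + (\mathrm{bias})^2 = \mathrm{Var}(T_2) + \left[E(T_2) - \theta\right]^2$$
$$= \frac{n\theta^2}{(n+1)^2} + \left[\frac{n\theta}{n+1} - \theta\right]^2 = \frac{n\theta^2}{(n+1)^2} + \left[\frac{n\theta - n\theta - \theta}{n+1}\right]^2$$
$$= \frac{n\theta^2}{(n+1)^2} + \frac{\theta^2}{(n+1)^2} = \frac{n\theta^2 + \theta^2}{(n+1)^2} = \frac{\theta^2}{n+1}$$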
Since, MSE ( T2 ) < MSE ( T1 ) , therefore, we conclude that the estimator T2
is a more efficient estimator than the estimator T1 for estimating the
magnitude of the earthquake in the region.
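The two mean squared errors derived above, θ²/n and θ²/(n + 1), can be compared with a simulation. The sketch below assumes θ = 2 and n = 10 purely for illustration; `random.expovariate` takes the rate 1/θ, so the simulated observations have mean θ.

```python
import random
import statistics

def simulated_mse(theta=2.0, n=10, reps=20000, seed=11):
    """Monte Carlo estimates of MSE(T1) and MSE(T2) for exponential data."""
    rng = random.Random(seed)
    se1, se2 = [], []
    for _ in range(reps):
        total = sum(rng.expovariate(1.0 / theta) for _ in range(n))
        se1.append((total / n - theta) ** 2)         # squared error of T1
        se2.append((total / (n + 1) - theta) ** 2)   # squared error of T2
    return statistics.fmean(se1), statistics.fmean(se2)

theta, n = 2.0, 10
mse1, mse2 = simulated_mse(theta, n)
print(mse1, theta ** 2 / n)          # simulated vs theoretical MSE of T1
print(mse2, theta ** 2 / (n + 1))    # simulated vs theoretical MSE of T2
```

The simulated values only approximate the theoretical ones, and the gap between the two estimators narrows as n increases.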
Terminal Questions (TQs)
1. Refer to Sections 8.2 and 8.5.
UNIT 9
Structure
9.1 Introduction
    Expected Learning Outcomes
9.5 Properties of Sufficient Statistic
9.1 INTRODUCTION
In Units 6, 7 and 8, you have studied the characteristics of a good estimator
namely: unbiasedness, consistency and efficiency. Let us have a look at them.
• An estimator is said to be unbiased for a parameter θ if and only if the
average/mean of the sampling distribution of the estimator is equal to the
true value of the parameter. In other words, an estimator is said to be
unbiased if the expected value of the estimator is equal to the true
value of the parameter being estimated, that is, E(T) = θ
• An estimator $T_n$ is said to be a consistent estimator of θ if $T_n$ converges to
θ in probability.
• If $T_1$ and $T_2$ are two unbiased estimators of a parameter θ based on samples
of the same size, then the estimator $T_1$ is said to be more efficient than the
estimator $T_2$ if $\mathrm{Var}(T_1) < \mathrm{Var}(T_2)$ for all n.
In the continuation of finding the best estimator, we introduce the concept of
sufficiency in this unit.
This unit is divided into nine sections. Section 9.1 is introductory in nature. The
joint probability density (mass) function which is used to find a sufficient
statistic is defined in Section 9.2. Section 9.3 is devoted to explaining the
concept of sufficient statistic. Section 9.4 explores the Fisher-Neyman
factorization theorem. The properties of the sufficient statistic are described in
Section 9.5. The concept of minimal sufficient statistic is described in Section
9.6. The unit ends by providing a summary of what we have discussed in this
unit in Section 9.7. The terminal questions and the solutions of the SAQs/TQs
are given in Sections 9.8 and 9.9, respectively.

Tools You Will Need
The following terms are considered essential background material for this
Unit. If you doubt your knowledge of any of these terms, you should review
the appropriate Unit or section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Basic terms of estimation (Unit 6).
• Unbiasedness, consistency and efficiency (Units 6, 7 and 8).
• Probability distributions (MST-012).

Unit Writer- Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi
define the joint probability density (mass) function and how to compute it;
In this case, the function $f(x_1, x_2, \dots, x_n, \theta)$ represents the joint probability density
function of the random sample $X_1, X_2, \dots, X_n$.
Let us understand the process of finding the joint probability density (mass)
function by taking some examples.
$$f(x_1, x_2, \dots, x_n, \lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n}x_i}}{\prod_{i=1}^{n}x_i!}$$
where $\prod_{i=1}^{n}x_i!$ represents the product of the $x_i!$.
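The closed form of the joint probability mass function can be compared with the direct product of the individual Poisson probabilities. In the sketch below, the observed counts in `sample` and the value of `lam` are hypothetical numbers used only to illustrate the check.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def joint_pmf_product(xs, lam):
    """Joint pmf as the product of the marginal pmfs (independent observations)."""
    return math.prod(poisson_pmf(x, lam) for x in xs)

def joint_pmf_closed_form(xs, lam):
    """e^(-n*lam) * lam^(sum of x_i) / product of x_i!."""
    n = len(xs)
    denominator = math.prod(math.factorial(x) for x in xs)
    return math.exp(-n * lam) * lam ** sum(xs) / denominator

sample = [2, 0, 3, 1]   # hypothetical observed counts
lam = 1.5               # hypothetical parameter value
print(joint_pmf_product(sample, lam), joint_pmf_closed_form(sample, lam))
```

Both functions return the same value up to floating-point rounding, since the closed form is just the product with the exponential and power terms collected.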
Let us check your understanding of the above by answering the following Self
Assessment Question.
SAQ 1
To test the effectiveness of a new drug in controlling systolic blood pressure, a
medical scientist administered the drug to 10 systolic blood pressure patients. If
the number of patients who were cured of the disease follows a binomial
distribution with parameters 10 and p, then find the joint probability mass
function.
All estimators are statistics because they do not depend on the unknown
population parameter. Obviously, there are lots of functions of X1, X2, …, Xn
and so lots of statistics.
To understand when a statistic is said to be a sufficient statistic, we consider
an example. Suppose we have three coins, and if we toss all three coins
simultaneously, then the possible outcomes are:
(T, T, T), (T, T, H), (T, H, T), (H, T, T), (T, H, H), (H, T, H), (H, H, T), (H, H, H)
If we represent head (H) by 1 and tail (T) by 0, we can represent the outcomes
as
(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)
Suppose we are interested in estimating the number of heads on the basis of
samples, so we have to worry about all outcomes as mentioned above. We
can also estimate the number of heads using a statistic, T = X1 + X2 + X3 then
it takes the values 0, 1, 2, and 3 as shown in the following table:
Outcomes        (0, 0, 0)    (0, 0, 1), (0, 1, 0),    (0, 1, 1), (1, 0, 1),    (1, 1, 1)
                             (1, 0, 0)                (1, 1, 0)
Statistic (T)   0            1                        2                        3
It means that by using the statistic T instead of the sample, we condense the data
into four subgroups instead of eight outcomes, so we have to worry about only
4 subgroups rather than 8. Now some
questions may arise:
(i) Is some information lost by using a statistic instead of full data?
(ii) Does the random sample contain any more information about the
population than this?
To get the answers to such types of questions, we have to study the concept
of sufficient statistic.
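The condensation of eight outcomes into four values of T can be checked with a short enumeration. This is only an illustrative sketch; the variable names `outcomes` and `groups` are ours and do not appear in the unit.

```python
from itertools import product
from collections import defaultdict

# Enumerate all 2^3 outcomes of three coin tosses (1 = head, 0 = tail).
outcomes = list(product([0, 1], repeat=3))

# Group the outcomes by the value of the statistic T = X1 + X2 + X3.
groups = defaultdict(list)
for outcome in outcomes:
    groups[sum(outcome)].append(outcome)

# Print each value of T with the outcomes it summarises.
for t in sorted(groups):
    print(t, groups[t])
```

Each value of T labels one subgroup of outcomes, which is exactly the table given above.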
Suppose there are two researchers, say, Rajesh and Prabhat. Rajesh knows a
particular outcome (sample) of all possible cases (X1, X2, X3), say (1,0,1),
however, Prabhat only knows the value of a statistic T = X1 + X2 + X3 that is 2.
Since Prabhat knows that 2 heads come when three coins are tossed
simultaneously, therefore, he can generate a random sample (X1′, X2′, X3′) such
as (1, 1, 0) or (0, 1, 1) or (1, 0, 1) and he can use his random sample
(X1′, X2′, X3′) to compute whatever Rajesh computes using his random sample
(X1, X2, X3). Therefore, we can say that there is no information lost using the
statistic T = X1 + X2 + X3. In other words, we can say that Prabhat who knows
the value of T can do just as good a job of estimating the unknown parameter
θ as Rajesh who knows the entire random sample. Thus, we can say that
statistic T is a sufficient statistic. So, a statistic is sufficient if it is just as
informative as the full data. We can define the sufficient statistic as
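$$\frac{f(x_1, x_2, \ldots, x_n, \theta)}{f_T(t, \theta)} = g(x_1, x_2, \ldots, x_n)$$
where the numerator is the joint probability density (mass) function of the
sample values, the denominator is the probability density (mass) function of the
statistic T, and the function $g(x_1, x_2, \ldots, x_n)$ does not depend on the
parameter θ.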
Let us take an example to understand the above definition.
Example 3: Consider the example of tossing three coins simultaneously and
assume that the probability of getting head is p. Check whether the statistic
T = X1 + X2 + X3 is sufficient or not.
Solution: To check whether the statistic T = X1 + X2 + X3 is sufficient or not,
we have to find the conditional distribution of the sample observations given T.
For that, we have to use some concepts of probability distributions which you
have studied in Unit 9 of the course MST-012: Probability and Probability
Distributions.
The probability of getting a head on a single coin is p and of not getting a head
is 1 − p. Since we perform a random experiment (tossing each coin
independently) and the outcome of each toss has two categories, head and tail,
the probability distribution of a random variable which takes the value 1 if
the outcome is a head (success) and 0 if the outcome is a tail (failure) is
known as the Bernoulli distribution. Therefore, we can write the probability mass
function as
$$P[X = x] = p^{x}(1-p)^{1-x}; \quad x = 0, 1$$
Note: If each trial of a random experiment results in one of two possible
categories, traditionally known as a success or a failure, then such a trial is
known as a Bernoulli trial. If we perform a random experiment and the
realisation of a trial has only two categories, success or failure, then the
probability distribution of a random variable which takes the value 1 if the
outcome is a success and 0 if the outcome is a failure is known as the Bernoulli
distribution.
Since T = X1 + X2 + X3 is the sum of Bernoulli distributed random
variables, T follows a binomial distribution with parameters n = 3
and p whose probability mass function is given as
$$P[T = t] = {}^{n}C_{t}\, p^{t}(1-p)^{n-t}; \quad t = 0, 1, 2, 3 \ \&\ n = 3$$
We are now ready to find the conditional distribution of the sample
observations given T and check whether it is independent of the parameter p
or not. Therefore, we consider
$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid T = t] = \frac{P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t]}{P[T = t]}$$
Suppose we observed a random sample of size n = 3 in which X1 = 1, X2 = 0
and X3 = 1. In this case:
$$P[X_1 = 1, X_2 = 0, X_3 = 1, T = 0] = 0$$
For this sample the values T = 0, 1 and 3 are not possible, so such events are impossible
and their probability is 0, whereas only the event (X1 = 1, X2 = 0, X3 = 1, T
= 2) is possible. In this case, we have, by independence:
$$P[X_1 = 1, X_2 = 0, X_3 = 1, T = 2] = P[X_1 = 1]\,P[X_2 = 0]\,P[X_3 = 1] = p(1-p)p = p^{2}(1-p)^{3-2}$$
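So, in general
$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t] = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n] \quad \text{if } \sum_{i=1}^{n} x_i = t$$
and
$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t] = 0 \quad \text{if } \sum_{i=1}^{n} x_i \neq t$$
Therefore,
$$P[X_1 = 1, X_2 = 0, X_3 = 1 \mid T = 2] = \frac{p^{2}(1-p)^{3-2}}{{}^{3}C_{2}\, p^{2}(1-p)^{3-2}} = \frac{1}{{}^{3}C_{2}} = \frac{1}{3}$$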
We have just shown that the conditional distribution of the sample X1, X2, …,
Xn given T = t does not depend on the parameter p. Therefore, T is indeed
sufficient for p. That is, once the value of T is known neither the sample nor
other function of X1, X2, …, Xn will provide any additional information about
the possible value of p.
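The calculation in Example 3 can be checked numerically. The sketch below evaluates the conditional probability of the sample (1, 0, 1) given T = 2 for a few values of p; the function name `conditional_pmf` and the chosen values of p are illustrative assumptions only.

```python
from math import comb, isclose

def conditional_pmf(sample, p):
    """P[X1=x1, ..., Xn=xn | T=t] for independent Bernoulli(p) tosses, with t = sum(sample)."""
    n, t = len(sample), sum(sample)
    # Joint probability of the sample; the event T = t is automatically satisfied.
    joint = p**t * (1 - p)**(n - t)
    # Binomial probability mass function of T at t.
    marginal = comb(n, t) * p**t * (1 - p)**(n - t)
    return joint / marginal

# Evaluate the conditional probability for several (hypothetical) values of p.
values = [conditional_pmf((1, 0, 1), p) for p in (0.1, 0.3, 0.5, 0.9)]
print(values)
print(all(isclose(v, 1 / 3) for v in values))
```

The value is the same for every p tried, in line with the conclusion that the conditional distribution given T is free of p.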
Now, it is time for you to check your understanding of the conditional distribution
by solving the following Self Assessment Question.
SAQ 2
Suppose the time between customers who enter a certain shop follows an
exponential distribution with parameter θ whose pdf is given as follows:
$$f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta}; \quad x > 0,\ \theta > 0$$
To estimate the parameter θ, the market researcher proposed the
statistic/estimator $T = \bar{X}$ based on a sample X1, X2, …, Xn. Show that T is a
sufficient statistic for the parameter θ using the conditional distribution.
For applying the factorization theorem, we try to factor the joint probability
density (mass) function as the product of two functions, one of which is a
function of the parameter(s) and statistic and another independent of the
parameter(s).
Note 1: The factorization theorem should not be used to show that a given
statistic or estimator T is not sufficient.
Let us learn how to apply the factorization theorem to obtain a sufficient
statistic with the help of some examples.
Example 4: Suppose the number of visitors visiting a website per hour follows
Poisson distribution with parameter λ. Find a sufficient statistic for λ.
Solution: To find the sufficient statistic for λ, we can use the factorization
theorem. For that, we have to find the joint probability mass function of the
sample values. We know that the probability mass function of the Poisson
distribution with parameter λ is
$$P[X = x] = \frac{e^{-\lambda}\lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots \ \&\ \lambda > 0$$
Let X1 , X2 , ..., Xn be a random sample taken from the Poisson distribution with
parameter λ. We can obtain the joint probability mass function of X1 , X2 , ..., Xn
as
$$f(x_1, x_2, \ldots, x_n, \lambda) = P[X_1 = x_1] \cdot P[X_2 = x_2] \cdots P[X_n = x_n]$$
$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$
We now try to factor the above joint probability mass function as the product of
two functions, one of which is a function of the parameter (λ) and another is
independent of the parameter (λ). We can factor the joint probability mass
function as
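$$f(x_1, x_2, \ldots, x_n, \lambda) = e^{-n\lambda}\,\lambda^{\sum_{i=1}^{n} x_i} \times \frac{1}{\prod_{i=1}^{n} x_i!}$$
Here the first factor depends on λ and on the sample values only through
$\sum_{i=1}^{n} x_i$, while the second factor is independent of λ. Hence, by the
factorization theorem, $\sum_{i=1}^{n} X_i$ is a sufficient
statistic for λ.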
But, wait a second! We can also write the joint probability mass function as:
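$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{-n\lambda}\,\lambda^{n\bar{x}}}{\prod_{i=1}^{n} x_i!} \qquad \left[\text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\right]$$
And factor as
$$f(x_1, x_2, \ldots, x_n, \lambda) = \left(e^{-n\lambda}\,\lambda^{n\bar{x}}\right) \times \frac{1}{\prod_{i=1}^{n} x_i!} = g[t(x), \lambda] \cdot h(x_1, x_2, \ldots, x_n)$$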
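Hence $\bar{X}$ is also a sufficient statistic for λ. In fact, $\bar{X}$ and
$\sum_{i=1}^{n} X_i$ are equivalent as sufficient
statistics, because if we know $\bar{X}$, we can easily find $\sum_{i=1}^{n} X_i$, and if we
know $\sum_{i=1}^{n} X_i$, we can easily find $\bar{X}$. Also, both condense the data in the same way.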
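One practical consequence of the factorization in Example 4 is that two samples with the same total give joint probability mass functions whose ratio does not involve λ. The sketch below illustrates this; the two samples and the values of λ are hypothetical and chosen only for illustration.

```python
from math import exp, factorial, isclose, prod

def poisson_joint_pmf(sample, lam):
    """Joint pmf of an independent Poisson(lam) sample: e^{-n lam} lam^{sum x} / prod(x!)."""
    n = len(sample)
    return exp(-n * lam) * lam ** sum(sample) / prod(factorial(x) for x in sample)

# Two hypothetical samples with the same sum (6) but different individual values.
sample_a = (2, 0, 4)
sample_b = (1, 3, 2)

# The ratio of the two joint pmfs for several values of lambda.
ratios = [poisson_joint_pmf(sample_a, lam) / poisson_joint_pmf(sample_b, lam)
          for lam in (0.5, 1.0, 2.5, 7.0)]
print(ratios)
```

The ratio reduces to the ratio of the two h factors, so it stays the same whatever value of λ is used.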
Note 2: Since throughout the course we are using a capital letter for a statistic
or estimator, in the last line of the above example we use $\sum_{i=1}^{n} X_i$ in
place of $\sum_{i=1}^{n} x_i$. Thus, in all the examples and exercises relating to sufficient
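Solution: To find the sufficient statistic for the parameter µ, we use the
factorization theorem. We know that the probability density function of the
normal distribution with mean µ and known variance σ² = 25 is
$$f(x, \mu) = \frac{1}{\sqrt{2\pi \times 25}}\, e^{-\frac{1}{2 \times 25}(x-\mu)^2} = \frac{1}{\sqrt{50\pi}}\, e^{-\frac{1}{50}(x-\mu)^2}$$
We can obtain the joint probability density function of the sample observations
X1 , X2 , ..., Xn as
$$f(x_1, x_2, \ldots, x_n, \mu) = \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i-\mu)^2}$$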
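We now try to factor the joint probability density function as the product of two
functions, one of which is a function of the parameter µ and a statistic and the other is
independent of the parameter µ. But there is no separate term which is a
function of the xi's alone in $\sum_{i=1}^{n}(x_i-\mu)^2$. Therefore, we use a trick to factor the joint pdf:
we add and subtract $\bar{x}$ inside the square.
$$f(x_1, x_2, \ldots, x_n, \mu) = \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i-\bar{x}+\bar{x}-\mu)^2}$$
$$= \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}\left[(x_i-\bar{x})^2+(\bar{x}-\mu)^2+2(x_i-\bar{x})(\bar{x}-\mu)\right]}$$
$$= \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\left[\sum_{i=1}^{n}(x_i-\bar{x})^2+\sum_{i=1}^{n}(\bar{x}-\mu)^2\right]-\frac{2}{50}(\bar{x}-\mu)\sum_{i=1}^{n}(x_i-\bar{x})}$$
But the last term in the exponent vanishes because $\sum_{i=1}^{n}(x_i-\bar{x}) = 0$ (by property of mean), and the
second term can be added up n times because it does not depend on the
index i, therefore, we get
$$f(x_1, x_2, \ldots, x_n, \mu) = \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\left[\sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right]-0}$$
$$= \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\left[\sum_{i=1}^{n}(x_i-\bar{x})^2+n(\bar{x}-\mu)^2\right]}$$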
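$$f(x_1, x_2, \ldots, x_n, \mu) = \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i-\bar{x})^2-\frac{n}{50}(\bar{x}-\mu)^2}$$
Now, we can easily factor the joint probability density function as
$$f(x_1, x_2, \ldots, x_n, \mu) = e^{-\frac{n}{50}(\bar{x}-\mu)^2} \times \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i-\bar{x})^2} = g[t(x), \mu] \cdot h(x_1, x_2, \ldots, x_n)$$
where $g[t(x), \mu] = e^{-\frac{n}{50}(\bar{x}-\mu)^2}$ is a function of the parameter µ and of the
sample values only through $t(x) = \bar{x}$, whereas
$h(x_1, x_2, \ldots, x_n) = \left(\frac{1}{\sqrt{50\pi}}\right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i-\bar{x})^2}$
is independent of µ. Hence, by the factorization theorem of sufficiency, the
sample mean $\bar{X}$ is a sufficient statistic for µ when σ² is known.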
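As a numerical illustration of this result, two samples with the same mean should have log-likelihoods that differ by an amount that does not involve µ. The sketch below assumes the known variance 25 used in this example; the two samples and the trial values of µ are hypothetical.

```python
from math import log, pi, isclose

def normal_log_likelihood(sample, mu, variance=25.0):
    """Log of the joint normal pdf with known variance (25 in this example)."""
    n = len(sample)
    return (-0.5 * n * log(2 * pi * variance)
            - sum((x - mu) ** 2 for x in sample) / (2 * variance))

# Two hypothetical samples with the same mean (95) but different spread.
sample_a = (90.0, 100.0, 95.0)
sample_b = (94.0, 95.0, 96.0)

# Difference of the log-likelihoods for several trial values of mu.
diffs = [normal_log_likelihood(sample_a, mu) - normal_log_likelihood(sample_b, mu)
         for mu in (80.0, 95.0, 110.0)]
print(diffs)
```

The difference comes only from the h factors (the spread around the sample mean), so it is identical for every µ.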
We now check whether $\bar{X}^2$ and $\bar{X}^3$ are sufficient statistics for µ.
If we are given the value of $y = \bar{X}^3$, we can easily get the single value of $\bar{X}$
through the one-to-one function $y^{1/3}$. Therefore, $\bar{X}^3$ is also sufficient for µ.
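$$= \frac{b^{a}}{\Gamma(a)}\, e^{-bx_1} x_1^{a-1} \cdot \frac{b^{a}}{\Gamma(a)}\, e^{-bx_2} x_2^{a-1} \cdots \frac{b^{a}}{\Gamma(a)}\, e^{-bx_n} x_n^{a-1}$$
After simplification, we get
$$f(x_1, x_2, \ldots, x_n, a, b) = \frac{b^{na}}{\left(\Gamma(a)\right)^{n}}\, e^{-b\sum_{i=1}^{n} x_i} \prod_{i=1}^{n} x_i^{a-1}$$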
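Since there are two parameters (a, b), we consider the following cases:
Case I: When ‘b’ is known then we treat ‘b’ as a constant and find the
sufficient statistic for ‘a’.
We can factor the joint probability density function as
$$f(x_1, x_2, \ldots, x_n, a) = \frac{b^{na}}{\left(\Gamma(a)\right)^{n}} \prod_{i=1}^{n} x_i^{a-1} \times e^{-b\sum_{i=1}^{n} x_i}$$
Since ‘b’ is known, ‘b’ is treated as a constant. The first factor depends on ‘a’
and on the sample values only through $\prod_{i=1}^{n} x_i$, while the second factor is
independent of ‘a’. Hence, by the factorization theorem, $\prod_{i=1}^{n} X_i$ is a sufficient statistic
for ‘a’.
Case II: When ‘a’ is known then we treat ‘a’ as a constant and find the
sufficient statistic for ‘b’.
We can factor the joint probability density function as
$$f(x_1, x_2, \ldots, x_n, b) = b^{na}\, e^{-b\sum_{i=1}^{n} x_i} \times \frac{1}{\left(\Gamma(a)\right)^{n}} \prod_{i=1}^{n} x_i^{a-1}$$
Since ‘a’ is known, ‘a’ is treated as a constant. The first factor depends on ‘b’
and on the sample values only through $\sum_{i=1}^{n} x_i$, while the second factor is
independent of ‘b’. Hence, by the factorization theorem, $\sum_{i=1}^{n} X_i$ is a sufficient statistic
for ‘b’.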
Case III: When ‘a’ and ‘b’ are unknown then we find jointly sufficient statistics
for ‘a’ and ‘b’.
Since we cannot separate any term of the joint probability density function
which is independent of both ‘a’ and ‘b’, we can factor the joint
probability density function as
$$f(x_1, x_2, \ldots, x_n, a, b) = \frac{b^{na}}{\left(\Gamma(a)\right)^{n}}\, e^{-b\sum_{i=1}^{n} x_i} \prod_{i=1}^{n} x_i^{a-1} \cdot 1 = g[t_1(x), t_2(x), a, b] \cdot h(x_1, x_2, \ldots, x_n)$$
where $g[t_1(x), t_2(x), a, b] = \frac{b^{na}}{\left(\Gamma(a)\right)^{n}}\, e^{-b\sum_{i=1}^{n} x_i} \prod_{i=1}^{n} x_i^{a-1}$ is a function of the parameters
‘a’ & ‘b’ and the sample values $x_1, x_2, \ldots, x_n$ only through $t_1(x) = \prod_{i=1}^{n} x_i$ and
$t_2(x) = \sum_{i=1}^{n} x_i$, whereas $h(x_1, x_2, \ldots, x_n) = 1$ is independent of the parameters ‘a’
and ‘b’.
Hence, by the factorization theorem, $\prod_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i$ are jointly sufficient for the
parameters ‘a’ and ‘b’.
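Joint sufficiency in the gamma case can be illustrated with two different samples that share both the sum and the product of their values. The sketch below uses the hypothetical samples (1, 6, 6) and (2, 2, 9), which both have sum 13 and product 36, and compares their log-likelihoods over a small grid of (a, b) values chosen only for illustration.

```python
from math import lgamma, log, isclose

def gamma_log_likelihood(sample, a, b):
    """Log of the joint pdf b^{na} / Gamma(a)^n * exp(-b * sum x) * prod x^{a-1}."""
    n = len(sample)
    return (n * a * log(b) - n * lgamma(a)
            - b * sum(sample)
            + (a - 1) * sum(log(x) for x in sample))

# Hypothetical samples with equal sum (13) and equal product (36).
sample_a = (1.0, 6.0, 6.0)
sample_b = (2.0, 2.0, 9.0)

# Difference of the log-likelihoods over a grid of parameter values.
diffs = [gamma_log_likelihood(sample_a, a, b) - gamma_log_likelihood(sample_b, a, b)
         for a in (0.5, 2.0, 5.0) for b in (0.3, 1.0, 4.0)]
print(max(abs(d) for d in diffs))
```

Because h = 1 in this factorization, samples that agree on both statistics have the same likelihood for every (a, b), up to floating-point rounding.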
$$= \frac{1}{\beta-\alpha} \cdot \frac{1}{\beta-\alpha} \cdots \frac{1}{\beta-\alpha}$$
$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = \frac{1}{(\beta-\alpha)^{n}}$$
Note: The order statistics of a random sample are the sample values placed in
ascending order of magnitude. These are denoted by X(1), X(2),…, X(n).
Since the range of the variables depends upon the parameters, we consider the
order statistics X(1), X(2),…, X(n) instead of the sample observations X1, X2,…,
Xn. Therefore, we can write the joint probability density function as
$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = \frac{1}{(\beta-\alpha)^{n}}; \quad \alpha \le x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)} \le \beta$$
$$= \frac{1}{(\beta-\alpha)^{n}}\, I_1\!\left(x_{(1)}, \alpha\right) I_2\!\left(x_{(n)}, \beta\right) \cdot 1$$
where, x(1) and x(n) are the minimum and maximum sample observations,
respectively, and
$$I_1\!\left(x_{(1)}, \alpha\right) = \begin{cases} 1; & \text{if } x_{(1)} \ge \alpha \\ 0; & \text{otherwise} \end{cases}$$
$$I_2\!\left(x_{(n)}, \beta\right) = \begin{cases} 1; & \text{if } x_{(n)} \le \beta \\ 0; & \text{otherwise} \end{cases}$$
Therefore,
$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = g[t_1(x), t_2(x), \alpha, \beta] \cdot h(x_1, x_2, \ldots, x_n)$$
where, $g[t_1(x), t_2(x), \alpha, \beta] = \frac{1}{(\beta-\alpha)^{n}}\, I_1\!\left(x_{(1)}, \alpha\right) I_2\!\left(x_{(n)}, \beta\right)$ is a function of
parameters (α, β) and sample values $x_1, x_2, \ldots, x_n$ only through $t_1(x) = x_{(1)}$ and
$t_2(x) = x_{(n)}$, whereas $h(x_1, x_2, \ldots, x_n) = 1$ is independent of parameters ‘α’ and
‘β’.
Hence, by the factorization theorem of sufficiency, X(1) and X(n) are jointly
sufficient for α and β.
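The role of the indicator functions can be seen in a small sketch: the joint density of a uniform sample depends on the data only through the smallest and largest observations. The function name, the two samples and the (α, β) pairs below are hypothetical choices made for illustration.

```python
def uniform_joint_pdf(sample, alpha, beta):
    """Joint pdf of a U[alpha, beta] sample: (beta - alpha)^(-n) times the two indicators."""
    n = len(sample)
    # I1 checks the minimum against alpha, I2 checks the maximum against beta.
    inside = alpha <= min(sample) and max(sample) <= beta
    return 1.0 / (beta - alpha) ** n if inside else 0.0

# Two hypothetical samples with the same minimum (2.0) and maximum (7.0).
sample_a = (2.0, 3.5, 7.0)
sample_b = (2.0, 6.1, 7.0)

# Compare the joint pdfs for several parameter pairs, including ones that exclude the data.
pairs = [(0.0, 10.0), (1.5, 7.0), (2.5, 9.0), (2.0, 6.5)]
same = [uniform_joint_pdf(sample_a, al, be) == uniform_joint_pdf(sample_b, al, be)
        for al, be in pairs]
print(same)
```

The middle observation never enters the calculation, which is why the pair (X(1), X(n)) carries all the information about α and β.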
You will see more clearly how to factor the joint probability density
(mass) function and obtain the sufficient statistic when you try the following
Self Assessment Question.
SAQ 3
(i) If the time between two customers arriving in a bank follows an
exponential distribution with parameter θ, then find the sufficient statistic
for θ.
(ii) Consider Example 5 of the life of the lithium batteries used in cars. If the
life of the lithium battery follows a normal distribution with mean 95
months and variance σ², find the sufficient statistic for σ².
(iii) The time interval between two metro trains follows the uniform
distribution [0, θ]. Find the sufficient statistic for θ.
Let us discuss the properties of the sufficient statistic in the next section.
9.7 SUMMARY
In this unit, we have covered the following points:
• The joint probability mass function for discrete distribution sample values
is defined as
$$f(x_1, x_2, \ldots, x_n, \theta) = P[X_1 = x_1]\, P[X_2 = x_2] \cdots P[X_n = x_n]$$
• A statistic T = t(X1, X2,…, Xn) is a sufficient statistic if, for each t, the
conditional distribution of X1, X2,…, Xn given T = t and θ does not depend
on θ.
$$P[X = x] = \binom{n}{x} p^{x}(1-p)^{n-x}; \quad x = 0, 1, 2, \ldots, n \ \&\ 0 \le p \le 1$$
$$= \binom{10}{x} p^{x}(1-p)^{10-x}; \quad x = 0, 1, 2, \ldots, 10 \ (\text{since } n = 10)$$
If X1, X2,…, X10 denote the outcomes of the drug then by the definition of
the joint probability mass function of the sample observations
X1 , X2 , ..., X10 , we have
$$f(x_1, x_2, \ldots, x_{10}, p) = P[X_1 = x_1]\, P[X_2 = x_2] \cdots P[X_{10} = x_{10}]$$
We can obtain the joint probability mass function by putting X as
x1, x2, …, x10 in the probability mass function as mentioned above.
Therefore,
$$f(x_1, x_2, \ldots, x_{10}, p) = \binom{10}{x_1} p^{x_1}(1-p)^{10-x_1} \cdot \binom{10}{x_2} p^{x_2}(1-p)^{10-x_2} \cdots \binom{10}{x_{10}} p^{x_{10}}(1-p)^{10-x_{10}}$$
Collecting like terms, we get
$$f(x_1, x_2, \ldots, x_{10}, p) = \prod_{i=1}^{10}\binom{10}{x_i}\; p^{x_1+x_2+\ldots+x_{10}}\,(1-p)^{\overbrace{10+10+\ldots+10}^{10\ \text{times}}-(x_1+x_2+\ldots+x_{10})}$$
$$f(x_1, x_2, \ldots, x_{10}, p) = \prod_{i=1}^{10}\binom{10}{x_i}\; p^{\sum_{i=1}^{10} x_i}\,(1-p)^{100-\sum_{i=1}^{10} x_i}$$
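The step of collecting like terms can be verified numerically by comparing the product of the ten binomial probabilities with the collected form. The observed counts below are hypothetical values made up for the check, and the function names are ours.

```python
from math import comb, isclose, prod

def joint_pmf_product(sample, p, n_trials=10):
    """Product of the individual binomial(n_trials, p) probabilities."""
    return prod(comb(n_trials, x) * p**x * (1 - p)**(n_trials - x) for x in sample)

def joint_pmf_collected(sample, p, n_trials=10):
    """Collected form: prod C(n, x_i) * p^{sum x_i} * (1 - p)^{n * m - sum x_i}."""
    total, m = sum(sample), len(sample)
    return (prod(comb(n_trials, x) for x in sample)
            * p**total * (1 - p)**(n_trials * m - total))

# Ten hypothetical observations, each a count between 0 and 10.
sample = (3, 7, 5, 4, 6, 2, 8, 5, 5, 4)

checks = [isclose(joint_pmf_product(sample, p), joint_pmf_collected(sample, p), rel_tol=1e-9)
          for p in (0.2, 0.5, 0.8)]
print(checks)
```

With ten observations of ten trials each, the exponent of (1 − p) is 100 minus the total count, as in the last line of the solution.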
$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta)$$
$$= \frac{1}{\theta} e^{-x_1/\theta} \cdot \frac{1}{\theta} e^{-x_2/\theta} \cdots \frac{1}{\theta} e^{-x_n/\theta}$$
$$= \frac{1}{\theta^{n}}\, e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i}$$
$$f(x_1, x_2, \ldots, x_n, t) = \frac{1}{\theta^{n}}\, e^{-\frac{n}{\theta} t} \qquad \left[t = \frac{1}{n}\sum_{i=1}^{n} x_i\right]$$
We now find the distribution of T. Since the sample comes from the
exponential distribution, X1, X2, …, Xn independently follow the same
exponential distribution with parameter θ. Also, we know that if X1, X2, …,
Xn follow the exponential distribution then $\sum_{i=1}^{n} X_i$ will follow a gamma
distribution with parameters (n, 1/θ) and the statistic $T = \bar{X}$ will follow the
gamma distribution with parameters (n, n/θ). Therefore, the pdf of $T = \bar{X}$
is given as follows:
$$f(t) = \frac{\left(\frac{n}{\theta}\right)^{n} e^{-\frac{n}{\theta} t}\, t^{n-1}}{\Gamma(n)}; \quad t > 0$$
Therefore, we can find the conditional distribution of the sample
observations given T as
$$f(x_1, x_2, \ldots, x_n \mid t) = \frac{f(x_1, x_2, \ldots, x_n, t)}{f(t)} = \frac{\frac{1}{\theta^{n}}\, e^{-\frac{n}{\theta} t}}{\frac{\left(\frac{n}{\theta}\right)^{n} e^{-\frac{n}{\theta} t}\, t^{n-1}}{\Gamma(n)}} = \frac{\Gamma(n)}{n^{n}\, t^{n-1}}$$
Since the conditional distribution of X1, X2, …, Xn given $T = \bar{X}$ does not
depend on the parameter θ, T is indeed sufficient for the
parameter θ.
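The cancellation of θ in the ratio above can be confirmed with a quick numerical sketch. It follows the solution's own expressions for the numerator and for the gamma pdf of T; the sample size n = 4, the value t = 2.5 and the trial values of θ are hypothetical.

```python
from math import exp, gamma, isclose

def conditional_density(n, t, theta):
    """Ratio of the joint pdf (written in terms of t) to the gamma pdf of T = sample mean."""
    joint = theta ** (-n) * exp(-n * t / theta)
    marginal = (n / theta) ** n * exp(-n * t / theta) * t ** (n - 1) / gamma(n)
    return joint / marginal

n, t = 4, 2.5  # hypothetical sample size and observed sample mean
values = [conditional_density(n, t, theta) for theta in (0.5, 1.0, 3.0)]

# Closed form obtained in the solution: Gamma(n) / (n^n * t^(n-1)).
expected = gamma(n) / (n ** n * t ** (n - 1))
print(values, expected)
```

Every trial value of θ returns the same number, matching the closed form in which θ no longer appears.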
3(i) Here, we take random sample from exp (θ) whose probability density
function is given by
f ( x, θ ) = θ e−θx ; x > 0 & θ > 0
To find the sufficient statistic, first, we have to find the joint probability
density function of the sample values of the exponential distribution. Let
X1, X2, …, Xn be a random sample taken from the exponential distribution
with parameter θ. We can obtain the joint probability density function of
the exponential distribution as
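$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta)$$
$$= \theta e^{-\theta x_1} \cdot \theta e^{-\theta x_2} \cdots \theta e^{-\theta x_n} = \theta^{n} e^{-\theta\sum_{i=1}^{n} x_i}$$
$$f(x_1, x_2, \ldots, x_n, \theta) = \theta^{n} e^{-\theta n \bar{x}} \qquad \left[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\right]$$
$$f(x_1, x_2, \ldots, x_n, \theta) = \left(\theta^{n} e^{-\theta n \bar{x}}\right) \cdot 1 = g[t(x), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$
Here g depends on the sample values only through $t(x) = \bar{x}$ and h = 1 is
independent of θ, so by the factorization theorem $\bar{X}$ is a sufficient statistic for θ.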
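(ii) Here, the life of the lithium battery follows a normal distribution with mean 95
and variance σ², whose probability density function is
$$f(x, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-95)^2}$$
Let X1, X2, …, Xn be a random sample taken from the above normal
distribution. We can obtain the joint probability density function of the
sample observations X1, X2, …, Xn as
$$f(x_1, x_2, \ldots, x_n, \sigma^2) = f(x_1, \sigma^2) \cdot f(x_2, \sigma^2) \cdots f(x_n, \sigma^2)$$
$$= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_1-95)^2} \cdot \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_2-95)^2} \cdots \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_n-95)^2}$$
$$= \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-95)^2}$$
$$f(x_1, x_2, \ldots, x_n, \sigma^2) = \left(\frac{1}{\sqrt{\sigma^2}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-95)^2} \times \left(\frac{1}{\sqrt{2\pi}}\right)^{n} = g[t(x), \sigma^2] \cdot h(x_1, x_2, \ldots, x_n)$$
where $g[t(x), \sigma^2] = \left(\frac{1}{\sqrt{\sigma^2}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-95)^2}$ is a function of the parameter σ²
and the sample values $x_1, x_2, \ldots, x_n$ only through $t(x) = \sum_{i=1}^{n}(x_i-95)^2$, whereas
$h(x_1, x_2, \ldots, x_n) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n}$ is independent of σ². Hence, by the factorization
theorem of sufficiency, $\sum_{i=1}^{n}(X_i-95)^2$ is a sufficient estimator for σ² when µ
is known to be 95.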
To find the sufficient statistic, first, we have to find the joint probability
density function of the sample values. Let X1, X2, …, Xn be a random
sample taken from U[0, θ]. Therefore, we can obtain the joint probability
density function of the sample X1, X2,…, Xn as
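$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta) = \frac{1}{\theta} \cdot \frac{1}{\theta} \cdots \frac{1}{\theta} = \frac{1}{\theta^{n}}$$
$$f(x_1, x_2, \ldots, x_n, \theta) = \frac{1}{\theta^{n}}; \quad 0 \le x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)} \le \theta$$
$$= \frac{1}{\theta^{n}}\, I\!\left(x_{(n)}, \theta\right) \cdot 1$$
where,
$$I\!\left(x_{(n)}, \theta\right) = \begin{cases} 1; & \text{if } x_{(n)} \le \theta \\ 0; & \text{otherwise} \end{cases}$$
Therefore,
$$f(x_1, x_2, \ldots, x_n, \theta) = g[t(x), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$
where $g[t(x), \theta] = \frac{1}{\theta^{n}}\, I\!\left(x_{(n)}, \theta\right)$ is a function of θ and sample values only
through $t(x) = x_{(n)}$.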
$$P[X = x] = p^{x}(1-p)^{1-x}; \quad x = 0, 1,\ 0 < p < 1$$
We can obtain the joint probability mass function of the three coins as
$$f(x_1, x_2, x_3, p) = P[X_1 = x_1]\, P[X_2 = x_2]\, P[X_3 = x_3]$$
We can obtain the joint probability mass function by putting X as x1, x2, x3
in the probability mass function as mentioned above. Therefore,
$$f(x_1, x_2, x_3, p) = p^{x_1}(1-p)^{1-x_1} \cdot p^{x_2}(1-p)^{1-x_2} \cdot p^{x_3}(1-p)^{1-x_3}$$
$$f(x_1, x_2, x_3, p) = p^{x_1+x_2+x_3}(1-p)^{1+1+1-(x_1+x_2+x_3)}$$
$$f(x_1, x_2, x_3, p) = p^{\sum_{i=1}^{3} x_i}(1-p)^{3-\sum_{i=1}^{3} x_i}$$
We now try to write the joint mass function as the product of two
functions, but we cannot separate any term of the joint probability mass
function which is independent of p, therefore, we can factor the joint
probability mass function as
$$f(x_1, x_2, x_3, p) = p^{\sum_{i=1}^{3} x_i}(1-p)^{3-\sum_{i=1}^{3} x_i} \cdot 1$$