Properties of Good Estimators Explained

The document outlines the structure and content of a course on statistical inference, focusing on the properties of good estimators. It covers concepts such as unbiasedness, consistency, efficiency, and sufficiency, along with methods of estimation and hypothesis testing. The expected learning outcomes include defining parameter space, explaining properties of estimators, and applying the Fisher-Neyman Factorization theorem.

Uploaded by

Aaryan Arun
Copyright
© All Rights Reserved

MST-016
STATISTICAL INFERENCE
Indira Gandhi National Open University
School of Sciences

Block 2
PROPERTIES OF GOOD ESTIMATOR
UNIT 6 Unbiasedness
UNIT 7 Consistency
UNIT 8 Efficiency and Mean Squared Error
UNIT 9 Sufficiency and Minimal Sufficiency

BLOCK 1: Sampling Distributions
Unit 1: Basic Concepts of Sampling Distribution

Unit 2: Sampling Distributions of Sample Means

Unit 3: Sampling Distributions of Sample Proportions and Variances

Unit 4: Sampling Distributions Associated with Normal Populations-I

Unit 5: Sampling Distributions Associated with Normal Populations-II

BLOCK 2: Properties of Good Estimator


Unit 6: Unbiasedness

Unit 7: Consistency

Unit 8: Efficiency and Mean Squared Error

Unit 9: Sufficiency and Minimal Sufficiency

BLOCK 3: Methods of Estimation


Unit 10: Method of Maximum Likelihood Estimation

Unit 11: Other Methods of Point Estimation

Unit 12: Interval Estimation for Means

Unit 13: Interval Estimation for Proportions

Unit 14: Interval Estimation for Variances

BLOCK 4: Testing of Hypothesis: Parametric Tests


Unit 15: Basic Concepts of Testing of Hypothesis

Unit 16: Tests for Means

Unit 17: Tests for Proportions

Unit 18: Tests for Variances

BLOCK 2: PROPERTIES OF GOOD ESTIMATOR
In Block 1 of this course, you have studied the sampling distributions of various statistics
such as sample mean, difference of two sample means, sample proportion, difference of two
sample proportions, sample variance, and ratio of two sample variances. You have also studied
some standard sampling distributions such as chi-square, t, and F-distributions that provide a
platform to draw inferences about the population parameters on the basis of the samples.
Estimation admits two problems; the first is to select some criteria or properties such that if
an estimator possesses these properties it is said to be the best estimator among all possible
estimators and the second is to derive some methods or techniques through which we obtain
an estimator which possesses such properties. This block is devoted to explaining the criteria
of a good estimator.
This block comprises four units.
Unit 6: Unbiasedness is devoted to explaining the concept of estimation (point and interval
estimation) with the first property of a good estimator, i.e. unbiasedness. The properties of an
unbiased estimator are also described in this unit.
Unbiasedness property is defined for a fixed sample size. In Unit 7: Consistency, you will
learn about consistency which is defined for increasing sample size. Here, we describe the
concept of consistency and its asymptotic distribution with a suitable normalisation with
examples. We also explain the properties of the consistent estimator.
There may exist more than one unbiased estimator of a parameter, therefore, to check which
one is better, we explain the concept of efficiency. Unit 8: Efficiency and Mean Squared
Error is devoted to explaining the concept of efficiency, mean squared error and minimum
variance unbiased estimator which help us to compare estimators and make the decision which
one is better.
Continuing the search for the best estimator, we introduce the concept of sufficiency in
Unit 9: Sufficiency and Minimal Sufficiency. In this unit, you will study the concept of
sufficient and minimal sufficient estimators with examples. The Fisher-Neyman Factorization
theorem for finding sufficient estimators is also explained in this unit.

Expected Learning Outcomes


After completing this block, you should be able to:
• define the parameter space and describe the properties of a good estimator;
• explain the unbiasedness, consistency, efficiency and sufficiency properties of a good estimator;
• check whether an estimator is unbiased, consistent, efficient or sufficient;
• describe the properties of unbiased, consistent, efficient and sufficient estimators;
• describe the Fisher-Neyman Factorization theorem and how to use it to find a sufficient statistic; and
• explain the concepts of a consistent asymptotically normal estimator, mean squared error, most efficient estimator, minimum variance unbiased estimator, and minimal sufficient statistic.

Notations and Symbols
SAQ/TQ : Self Assessment Question/Terminal Question
Fig./Figs. : Figure/Figures
$X_1, X_2, \ldots, X_n$ : A random sample of size n
$\bar{X}$ : Sample mean
$S^2$ : Sample variance
$\mu$ and $\sigma^2$ : Mean and variance of a population
$E(X)$ and $Var(X)$ : Mathematical expectation and variance of X
$Z \sim N(0, 1)$ : Standard normal variate
P and p : Population and sample proportion
$B(a, b) = \dfrac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$ : Beta function
$\Gamma(a)$ : Gamma function
$\Theta$ : Parametric space
$T = t(X_1, X_2, \ldots, X_n)$ : Estimator
e and MSE : Efficiency and mean squared error
$g[t(x), \theta]$ : Non-negative function of the parameter θ and observed sample values
$h(x_1, x_2, \ldots, x_n)$ : Non-negative function of observed sample values

UNIT 6

UNBIASEDNESS

Structure
6.1 Introduction
    Expected Learning Outcomes
6.2 Basic Terminology
6.3 Properties of Good Estimator
6.4 Unbiasedness
6.5 Properties of Unbiased Estimator
6.6 Summary
6.7 Terminal Questions
6.8 Solutions/Answers
6.1 INTRODUCTION
Tools You Will Need
The following terms are considered essential background material for this Unit. If you doubt your knowledge of any of these terms, you should review the appropriate Unit or section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Probability distributions (MST-012).

In many real-life problems, the population parameter (a characteristic of the population) is not known and we are interested in obtaining its value. But, if
• the whole population is too large to study,
• examining a unit of the population destroys it (the units are destructive in nature),
• there are limited resources and manpower available, etc.
then it is not practically convenient to examine each unit of the population to find the value of the parameter. For example, many of us use Facebook, and suppose you are interested in knowing the average age of the people who use Facebook. The true value (average age) of Facebook users is not known. The only way to know the true average age of Facebook users is to survey each and every person in the world who uses Facebook, but it is not possible to survey everyone in the world. In such a situation, one can randomly select some persons who use Facebook and note their ages. Suppose we randomly selected 20 Facebook users and obtained the following data on their ages (in years):

20 42 36 30 20 52 32 18 70 22

45 18 40 14 18 20 30 19 41 20

If we use the sample average age to estimate the unknown average age of the
Facebook users, then we get the estimate

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{607}{20} = 30.35$$

Unit Writer: Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi

Estimating is not something new to us. Every one of us uses estimates in our day-to-day life. Some situations are as follows:
• At a metro station in Delhi, a guard may estimate the height of a child to be 3 feet or more.
• Lavnik estimates that the time to reach his school from home is about 20 minutes.
• A family estimates its monthly expenditure on the basis of particular needs.
• The distance between New Delhi and Gujarat is approximately 1112 km.
So, the question is, what is estimation?
The technique of finding an estimator to produce an estimate (approximate value) of the unknown parameter of the population on the basis of a sample is called estimation.
Any statistic used to estimate an unknown population parameter is known as an estimator, and the particular value of the estimator is known as an estimate of the parameter. The estimated values of the sample mean and sample variance are denoted by $\bar{x}$ and $S^2$, respectively.
As its name suggests, the objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
There are two methods of estimation:
1. Point Estimation
2. Interval Estimation
In point estimation, we determine an appropriate single statistic whose value is used to estimate the unknown parameter whereas, in interval estimation, we determine an interval that contains the true value of the unknown parameter with a certain confidence. For example, in the case of Facebook users, the point estimate is 30.35 years because we estimated the average age by a single value (30.35 years), whereas if we estimate that the average age lies in the interval (18, 34) years, then it is an interval estimate because we estimated it by an interval. Point estimation and interval estimation are described in Units 10-11 and Units 12-14, respectively.
Estimation admits two problems:
• The first is to select some criteria or properties such that if an estimator possesses these properties then it is called the best estimator among possible estimators, that is, properties of a good estimator, and
• The second is to derive some methods or techniques through which we obtain an estimator which possesses such properties, that is, methods of estimation.

Units 6, 7, 8 and 9 are devoted to describing the properties of a good estimator in detail, whereas Units 10, 11, 12, 13 and 14 explain the methods of estimation.

This unit is divided into eight sections. Section 6.1 is introductory in nature. The
basic terms used in estimation are defined in Section 6.2. Section 6.3 is
devoted to explaining the criteria of a good estimator. Section 6.4 explores the
concept of unbiasedness with examples. The properties of an unbiased
estimator are described in Section 6.5. The unit ends by providing a summary
of what we have discussed in this unit in Section 6.6. The terminal questions
and the solutions of the SAQs/TQs are given in Sections 6.7 and 6.8,
respectively.
In the next unit, we shall discuss the second characteristic of a good estimator,
that is, consistency.

Expected Learning Outcomes


After studying this unit, you should be able to:
• define the parameter space;
• describe the properties of a good estimator;
• explain the unbiasedness characteristic of an estimator;
• check whether an estimator is unbiased or not; and
• describe the properties of an unbiased estimator.

6.2 BASIC TERMINOLOGY


Before discussing the properties of a good estimator, we define some important
terms. These terms are very useful in understanding the fundamentals of the
theory of estimation.
Discrete and Continuous Distributions
In Units 9 to 16 of MST-012, we discussed standard discrete and continuous
distributions such as the binomial, Poisson, normal and exponential. Since a
population can be described with the help of a distribution, standard discrete
and continuous distributions are also used in statistical inference. The table
below summarises some of them. You should learn at least the mean and
variance of these distributions, which will help you solve estimation problems
easily.
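| S. No. | Distribution | Probability mass/density function | Parameter(s) | Mean | Variance |
|---|---|---|---|---|---|
| 1 | Bernoulli (discrete) | $P[X = x] = p^{x}(1-p)^{1-x};\ x = 0, 1$ | $p$ | $p$ | $p(1-p)$ |
| 2 | Binomial (discrete) | $P[X = x] = {}^{n}C_{x}\, p^{x}(1-p)^{n-x};\ x = 0, 1, \ldots, n$ | $n$ & $p$ | $np$ | $np(1-p)$ |
| 3 | Poisson (discrete) | $P[X = x] = \dfrac{e^{-\lambda}\lambda^{x}}{x!};\ x = 0, 1, \ldots;\ \lambda > 0$ | $\lambda$ | $\lambda$ | $\lambda$ |
| 4 | Uniform (discrete) | $P[X = x] = \dfrac{1}{n};\ x = 1, 2, \ldots, n$ | $n$ | $\dfrac{n+1}{2}$ | $\dfrac{n^{2}-1}{12}$ |
| 5 | Hypergeometric (discrete) | $P[X = x] = \dfrac{{}^{M}C_{x}\; {}^{N-M}C_{n-x}}{{}^{N}C_{n}};\ x = 0, 1, \ldots, \min\{M, n\}$ | $N$, $M$ & $n$ | $\dfrac{nM}{N}$ | $\dfrac{nM(N-M)(N-n)}{N^{2}(N-1)}$ |
| 6 | Geometric (discrete) | $P[X = x] = p(1-p)^{x};\ x = 0, 1, 2, \ldots$ | $p$ | $\dfrac{1-p}{p}$ | $\dfrac{1-p}{p^{2}}$ |
| 7 | Negative Binomial (discrete) | $P[X = x] = \dbinom{x+r-1}{r-1} p^{r}(1-p)^{x};\ x = 0, 1, 2, \ldots$ | $r$ & $p$ | $\dfrac{r(1-p)}{p}$ | $\dfrac{r(1-p)}{p^{2}}$ |
| 8 | Normal (continuous) | $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}};\ -\infty < x < \infty;\ \sigma > 0,\ -\infty < \mu < \infty$ | $\mu$ & $\sigma^{2}$ | $\mu$ | $\sigma^{2}$ |
| 9 | Standard Normal (continuous) | $f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^{2}};\ -\infty < x < \infty$ | -- | $0$ | $1$ |
| 10 | Uniform (continuous) | $f(x) = \dfrac{1}{b-a};\ a < x < b,\ b > a$ | $a$ & $b$ | $\dfrac{a+b}{2}$ | $\dfrac{(b-a)^{2}}{12}$ |
| 11 | Exponential (continuous) | $f(x) = \theta e^{-\theta x};\ x \ge 0;\ \theta > 0$ | $\theta$ | $\dfrac{1}{\theta}$ | $\dfrac{1}{\theta^{2}}$ |
|  | Negative Exponential or simply exponential (continuous) | $f(x) = \dfrac{1}{\theta}\, e^{-x/\theta};\ x \ge 0;\ \theta > 0$ | $\theta$ | $\theta$ | $\theta^{2}$ |
| 12 | Gamma (continuous) | $f(x) = \dfrac{b^{a}}{\Gamma(a)}\, e^{-bx} x^{a-1};\ x > 0;\ a, b > 0$ | $a$ & $b$ | $\dfrac{a}{b}$ | $\dfrac{a}{b^{2}}$ |
| 13 | Beta First Kind (continuous) | $f(x) = \dfrac{1}{B(a, b)}\, x^{a-1}(1-x)^{b-1};\ 0 < x < 1;\ a > 0,\ b > 0$ | $a$ & $b$ | $\dfrac{a}{a+b}$ | $\dfrac{ab}{(a+b)^{2}(a+b+1)}$ |
| 14 | Beta Second Kind (continuous) | $f(x) = \dfrac{1}{B(a, b)}\, \dfrac{x^{a-1}}{(1+x)^{a+b}};\ x > 0;\ a, b > 0$ | $a$ & $b$ | $\dfrac{a}{b-1}$ (for $b > 1$) | $\dfrac{a(a+b-1)}{(b-1)^{2}(b-2)}$ (for $b > 2$) |
| 15 | Standard Cauchy | $f(x) = \dfrac{1}{\pi(1+x^{2})};\ -\infty < x < \infty$ | --- | Does not exist | Does not exist |
| 16 | Laplace | $f(x) = \dfrac{1}{2b}\, e^{-\frac{\lvert x-\mu\rvert}{b}};\ -\infty < x < \infty$ | $\mu$ & $b$ | $\mu$ | $2b^{2}$ |
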

Parameter Space
The set of all possible values that the parameter θ or parameters θ1, θ2, …, θk
can assume is called the parameter space. It is denoted by Θ and is read as
“big theta”. To find the parameter space of a parameter, we have to consider
all possible values of the parameter, even those whose chance of occurring is
very small. For example, suppose the parameter θ represents the average life of
electric bulbs manufactured by a company. Since a bulb can fuse at the initial
time 0 or after 1, 2, 2.3, 3 hours, and so on, the average life lies between 0
and ∞. Hence, the parameter space of the average life of the bulbs is
$\Theta = \{\theta : \theta \ge 0\}$. It means that the parameter average life θ
can take all possible values greater than or equal to 0. Similarly, in a normal
distribution N(μ, σ²), the parameter space of the parameters μ and σ² is
$\Theta = \{(\mu, \sigma^{2}) : -\infty < \mu < \infty;\ 0 < \sigma < \infty\}$.

Mathematical Expectation
If X is a continuous random variable having the probability density function
f(x), then the expected value of X (mean) is defined as

$$E(X) = \int_{-\infty}^{\infty} x\, f(x)\, dx \quad \text{and} \quad E(X^{r}) = \int_{-\infty}^{\infty} x^{r} f(x)\, dx$$

If X is a discrete random variable having the probability mass function
p(x), then the expected value of X is defined as

$$E(X) = \sum_{i=1}^{n} x_i\, p(x_i) \quad \text{and} \quad E(X^{r}) = \sum_{i=1}^{n} x_i^{r}\, p(x_i)$$

Some properties of mathematical expectation are:
• $E(a) = a$, where ‘a’ is a constant
• $E(aX) = aE(X)$
• $E(aX \pm bY) = aE(X) \pm bE(Y)$
Variance
If X is a random variable then the variance of X in terms of expectation is
defined as

$$Var(X) = E\left[X - E(X)\right]^{2} = E(X^{2}) - \left[E(X)\right]^{2}$$

Some properties of variance are:
• $Var(a) = 0$
• $Var(aX) = a^{2}\, Var(X)$
• If random variables X and Y are independent, then
$$Var(aX \pm bY) = a^{2}\, Var(X) + b^{2}\, Var(Y)$$
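
The definitions and properties above can be checked numerically on a small discrete distribution. The following is a minimal sketch; the probability mass function `pmf` and the constant `a` are illustrative assumptions, not values from this unit, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical pmf of a discrete random variable X (illustrative values only).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over all values x of X."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return expectation(pmf, lambda x: x * x) - expectation(pmf) ** 2

a = 3                      # an arbitrary constant
mean_x = expectation(pmf)  # E(X)
var_x = variance(pmf)      # Var(X)

# pmf of aX: the same probabilities attached to the scaled values a*x.
pmf_ax = {a * x: p for x, p in pmf.items()}

print("E(X) =", mean_x, " Var(X) =", var_x)
print("E(aX) = a E(X):", expectation(pmf_ax) == a * mean_x)
print("Var(aX) = a^2 Var(X):", variance(pmf_ax) == a ** 2 * var_x)
```

Replacing `pmf` with any other finite probability mass function should leave both checks true, since they follow from the definitions alone.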

Now, try the following Self Assessment Question.

SAQ 1
If θ represents the average marks (out of 50) of the learner in the Term-End-
Exam paper of the MST-016 course, then find the parameter space of θ.

Having understood the basic definitions and terminology, we now discuss the
properties of a good estimator in the next section.

6.3 PROPERTIES OF GOOD ESTIMATOR


It is to be noted that a large number of estimators can be proposed for an
unknown parameter. For example, in our case of estimating the average age
of Facebook users, some possible estimators are:
• Sample mean $= \bar{X} = \dfrac{607}{20} = 30.35$
• Sample median $= \tilde{X} = \dfrac{22 + 30}{2} = 26$
• Sample mode $X_0 = 20$
• Average of extreme users $= \dfrac{\max + \min}{2} = \dfrac{70 + 14}{2} = 42$
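
These four candidate estimates can be reproduced with Python's standard `statistics` module. This is only a sketch: the list `ages` repeats the 20 ages from Section 6.1, under the assumption that the smallest age is 14, which is the minimum used in the average of extremes above.

```python
import statistics

# Ages (in years) of the 20 sampled Facebook users from Section 6.1.
# Assumption: the smallest age is taken as 14, the minimum used in the
# "average of extremes" calculation.
ages = [20, 42, 36, 30, 20, 52, 32, 18, 70, 22,
        45, 18, 40, 14, 18, 20, 30, 19, 41, 20]

sample_mean = statistics.mean(ages)        # arithmetic mean of the sample
sample_median = statistics.median(ages)    # middle value of the sorted sample
sample_mode = statistics.mode(ages)        # most frequent age
midrange = (max(ages) + min(ages)) / 2     # average of the two extremes

print(sample_mean, sample_median, sample_mode, midrange)
```

The four numbers differ noticeably, which is exactly why criteria for choosing among estimators are needed.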
Now, the questions arise:
• Which estimator should you use, that is, which is likely to give estimates
closer to the true (but unknown) population value?
• Are some of the possible estimators better, in some sense, than the
others?
In general, an estimator whose sampling distribution concentrates as closely
as possible near the true value of the parameter may be regarded as a good
estimator. To give the answer to the above questions, Prof. Ronald A. Fisher
gave some properties of a good estimator which are as follows:
• Unbiasedness
• Consistency
• Efficiency
• Sufficiency
[Figure: Characteristics of good estimator (unbiasedness, consistency, efficiency, sufficiency)]
We shall discuss these properties one by one in the subsequent Units.
Now, answer the following Self Assessment Question.

SAQ 2
Write properties of a good estimator.

We now discuss the first characteristic of a good estimator in the next section.

6.4 UNBIASEDNESS
In the previous units, you have studied that any statistic such as sample mean,
sample variance, sample proportion, etc. which is used to estimate an
unknown population parameter is known as an estimator. You also saw that
the value of any estimator changes from sample to sample, therefore, we
consider the estimator as a random variable and we can find the mean and
variance of the estimator. So we can define an unbiased estimator as follows:
An estimator is said to be unbiased for a population parameter if and only if
the average or mean of the sampling distribution of the estimator is equal to
the true value of the parameter. This property of the estimator is called
unbiasedness.
Let us see some examples,
• In Unit 1, you have seen that the mean of the sampling distribution of the
sample mean of monthly salary of the employees is equal to the mean
salary of all employees of the industry. So sample mean is an unbiased
estimate of the population mean.
• Similarly, in Unit 3, we saw that the mean of the sample proportions of the
children who like to dance is equal to the population proportion. Therefore,
sample proportion is an unbiased estimate of the population proportion.
In general, we denote any population parameter such as a population mean,
population standard deviation, population proportion, and so on by the Greek
letter theta θ, and its estimator such as the sample mean, sample standard
deviation, and sample proportion by T or θ̂ (pronounced as “theta-hat”).

Mathematically,
If $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from a population whose
probability density (mass) function is $f(x, \theta)$, where θ is the population
parameter, then an estimator $T = t(X_1, X_2, \ldots, X_n)$ is said to be an unbiased
estimator of the parameter θ if and only if

$$E(T) = \theta$$

for all possible values of the parameter θ.
However, if the expected value of the estimator is not equal to the true
value of the parameter, then the estimator is said to be a “biased estimator”,
that is, if

$$E(T) \ne \theta$$

then the estimator T is called a biased estimator of θ.


We can also define bias as follows:
The difference between the expected value of the estimator and the actual
value of the population parameter being estimated is called bias.
The amount of bias is given by

$$b(\theta) = E(T) - \theta$$

• If b(θ) > 0 or E(T) > θ, then the estimator T is said to be positively biased
for the parameter θ.
• If b(θ) < 0 or E(T) < θ, then the estimator T is said to be negatively biased
for the parameter θ.
• If E(T) → θ as n → ∞, that is, if an estimator T is unbiased only for a large
sample, then the estimator T is said to be asymptotically unbiased for
θ. For example, suppose $E(T) = \theta + \dfrac{1}{n}$; then as n → ∞, E(T) → θ.

An unbiased estimator is generally preferred in comparison to a biased
estimator.
Now, we explain the procedure to show whether an estimator is unbiased or
not for a parameter with the help of some examples.
Example 1: Show that the sample mean ($\bar{X}$) is an unbiased estimator of the
population mean (µ) if it exists.
Solution: Let $X_1, X_2, \ldots, X_n$ be a random sample of size n taken from any
population with mean µ. We have to show that the sample mean $\bar{X}$ is an
unbiased estimator for µ, therefore, we have to find $E(\bar{X})$ and check whether
it is equal to µ or not, that is, whether

$$E(\bar{X}) = \mu$$

Consider,

$$E(\bar{X}) = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \quad [\text{By definition of the sample mean}]$$

$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \quad [\because E(aX + bY) = aE(X) + bE(Y)]$$

Since $X_1, X_2, \ldots, X_n$ are randomly drawn from the same population with mean μ
and variance σ², therefore,

$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \mu$$

Thus,

$$E(\bar{X}) = \frac{1}{n}\Big[\underbrace{\mu + \mu + \cdots + \mu}_{n \text{ times}}\Big] = \frac{1}{n}(n\mu) = \mu$$

Hence, the sample mean ($\bar{X}$) is an unbiased estimator of the population mean
μ. Also, if $x_1, x_2, \ldots, x_n$ are the observed values of the random sample
$X_1, X_2, \ldots, X_n$, then $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ is an unbiased estimate of the population mean.

Example 2: Suppose the speed of lightweight vehicles on a particular stretch
of roadway is normally distributed with a known standard deviation of 5 kph. A
researcher measured the speed of 10 lightweight vehicles randomly and
obtained the following results:

| Vehicle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Speed (in kph) | 62 | 70 | 65 | 68 | 64 | 65 | 70 | 64 | 55 | 60 |

(i) Find the point estimate of the average speed.
(ii) Show that the sample average speed is an unbiased estimator of the
average speed of all the lightweight vehicles on the roadway.
Solution: Generally, to draw the inference about the population mean, we use
the sample mean, therefore, to find the point estimate of the average speed,
we use the sample mean.
We can obtain the point estimate of average speed as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{62 + 70 + 65 + 68 + 64 + 65 + 70 + 64 + 55 + 60}{10} = 64.3$$

Now, we have to show that the sample average speed is an unbiased estimate
of the average speed of all lightweight vehicles on the roadway.
Since the speed of the vehicles is normally distributed and standard deviation
(σ) is known, therefore, the sample average speed also follows a normal
distribution with mean µ and variance σ²/n.

Thus, $E(\bar{X}) = \mu$

Hence the sample average speed is an unbiased estimate of the average
speed of all lightweight vehicles on the roadway.
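
The arithmetic in Example 2 can be reproduced in a few lines. This is a small sketch using the ten speeds from the table; the variable names are illustrative, and the variance of the sample mean is computed from the stated formula σ²/n with the known σ = 5 kph.

```python
import statistics

# Speeds (in kph) of the 10 sampled vehicles from Example 2.
speeds = [62, 70, 65, 68, 64, 65, 70, 64, 55, 60]
sigma = 5  # known population standard deviation (kph), as stated in the example

n = len(speeds)
point_estimate = statistics.mean(speeds)  # sample mean, the point estimate of mu
variance_of_mean = sigma ** 2 / n         # Var(X-bar) = sigma^2 / n

print("Point estimate of average speed:", point_estimate)
print("Variance of the sample mean:", variance_of_mean)
```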
Example 3: A machine produces a large number of water bottles. A quality
inspector selected 40 water bottles randomly and found 2 defective water
bottles. Find the point estimate of the proportion of all defective water bottles.
Solution: To draw the inference about the population proportion, we use the
sample proportion, therefore, to find the point estimate of the proportion of all
defective water bottles, we use the sample proportion of defectives.
Therefore, we can obtain the point estimate of the proportion of all defective
water bottles as

$$p = \frac{X}{n} = \frac{2}{40} = 0.05$$

Hence, the point estimate of the proportion of all defective water bottles is
0.05.
Example 4: A furniture company manufactures square tables of side length
µ. Thus, the area of a table is µ² (unknown). Based on n independent
measurements $X_1, \ldots, X_n$ of the length, we wish to estimate the area of the
table. Assume that the measurements of the length have mean µ and variance σ².

(i) Show that $\bar{X}^{2}$ is not an unbiased estimator for µ².
(ii) For what value of k is the estimator $\bar{X}^{2} - kS^{2}$ unbiased for µ²?

Solution: Since the measurements of the length have mean µ and variance
σ², therefore, the sampling distribution of the mean has mean and variance as
follows:

$$E(\bar{X}) = \mu \quad \text{and} \quad Var(\bar{X}) = \frac{\sigma^{2}}{n}$$

We have to show that $\bar{X}^{2}$ is not an unbiased estimator for µ², therefore, we
have to find $E(\bar{X}^{2})$ and check whether it is equal to µ² or not. But the question
is how we find $E(\bar{X}^{2})$ without knowing the sampling distribution of $\bar{X}^{2}$.
However, we know that

$$Var(\bar{X}) = \frac{\sigma^{2}}{n}$$

and by the formula of variance, we have

$$Var(\bar{X}) = E(\bar{X}^{2}) - \left[E(\bar{X})\right]^{2}$$

Therefore,

$$E(\bar{X}^{2}) = Var(\bar{X}) + \left[E(\bar{X})\right]^{2} = \frac{\sigma^{2}}{n} + \mu^{2} \ne \mu^{2}$$

Hence, $\bar{X}^{2}$ is not an unbiased estimator of the area of the table, that is, µ²,
and we can calculate the bias of the estimator as

$$E(\bar{X}^{2}) - \mu^{2} = \frac{\sigma^{2}}{n} \;\Rightarrow\; E(\bar{X}^{2} - \mu^{2}) = \frac{\sigma^{2}}{n} \quad [\because E(a) = a]$$

We now find the value of k such that the estimator $\bar{X}^{2} - kS^{2}$ is unbiased for µ².
For the estimator $\bar{X}^{2} - kS^{2}$ to be unbiased for µ², we need

$$E(\bar{X}^{2} - kS^{2}) = \mu^{2}$$

$$E(\bar{X}^{2}) - kE(S^{2}) = \mu^{2} \quad [\because E(aX \pm bY) = aE(X) \pm bE(Y)]$$

$$\frac{\sigma^{2}}{n} + \mu^{2} - k\sigma^{2} = \mu^{2} \quad [\text{since } S^{2} \text{ is an unbiased estimator of } \sigma^{2}, \text{ i.e. } E(S^{2}) = \sigma^{2}]$$

Therefore,

$$k = \frac{1}{n}$$

Hence, for $k = \dfrac{1}{n}$, the estimator $\bar{X}^{2} - kS^{2}$ is unbiased for µ².
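
The two results of Example 4 can be verified exactly by listing every possible sample from a tiny population. The sketch below is only an illustration under stated assumptions: the three population values and the sample size are hypothetical, each value is equally likely, and sampling is with replacement so that the measurements are independent.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of equally likely length measurements
# (illustrative values, not taken from the unit).
population = [Fraction(v) for v in (9, 10, 12)]
n = 2  # sample size

N = len(population)
mu = sum(population) / N                              # population mean
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sample_mean(s):
    return sum(s) / len(s)

def sample_var(s):
    """S^2 with divisor (n - 1)."""
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)

# All N^n samples drawn with replacement; each is equally likely.
samples = list(product(population, repeat=n))

def expect(stat):
    """Exact expectation of a statistic over all equally likely samples."""
    return sum(stat(s) for s in samples) / len(samples)

e_xbar_sq = expect(lambda s: sample_mean(s) ** 2)
e_corrected = expect(lambda s: sample_mean(s) ** 2 - sample_var(s) / n)

print("Bias of X-bar^2 equals sigma^2/n:", e_xbar_sq - mu ** 2 == sigma2 / n)
print("X-bar^2 - S^2/n is unbiased for mu^2:", e_corrected == mu ** 2)
```

Because exact fractions are used, the comparisons are equalities rather than approximate agreement from simulation.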
Example 5: If $X_1, X_2, \ldots, X_n$ is a random sample taken from a population with
mean μ and variance σ², then show that

$$S'^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}$$

is a biased estimator of σ², whereas

$$S^{2} = \frac{n}{n-1}\,S'^{2}, \quad \text{i.e.} \quad S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}$$

is an unbiased estimator of σ².
Solution: We have to show that $S'^{2}$ is not an unbiased estimator for σ²,
therefore, we have to find $E(S'^{2})$ and check whether it is equal to σ² or not.
Therefore, we consider:

$$E(S'^{2}) = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}\right] \quad [\because E(aX) = aE(X)]$$

$$= \frac{1}{n}E\left[\sum_{i=1}^{n}\left(X_i^{2} + \bar{X}^{2} - 2X_i\bar{X}\right)\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^{2} + \sum_{i=1}^{n}\bar{X}^{2} - 2\bar{X}\sum_{i=1}^{n}X_i\right]$$

$$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^{2} + n\bar{X}^{2} - 2\bar{X}\,n\bar{X}\right] \quad \left[\because \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i \Rightarrow \sum_{i=1}^{n}X_i = n\bar{X}\right]$$

$$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^{2} - n\bar{X}^{2}\right] = \frac{1}{n}\left[E\left(\sum_{i=1}^{n}X_i^{2}\right) - nE(\bar{X}^{2})\right]$$

$$= \frac{1}{n}\left[\sum_{i=1}^{n}E(X_i^{2}) - nE(\bar{X}^{2})\right]$$

$$= \frac{1}{n}\left[\sum_{i=1}^{n}\left(\mu^{2} + \sigma^{2}\right) - n\left(\mu^{2} + \frac{\sigma^{2}}{n}\right)\right] \quad [\text{As discussed in Example 4}]$$

$$= \frac{1}{n}\left[n\left(\mu^{2} + \sigma^{2}\right) - n\left(\mu^{2} + \frac{\sigma^{2}}{n}\right)\right]$$

$$= \mu^{2} + \sigma^{2} - \mu^{2} - \frac{\sigma^{2}}{n} = \frac{n-1}{n}\,\sigma^{2}$$

$$E(S'^{2}) = \frac{n-1}{n}\,\sigma^{2} \ne \sigma^{2}$$

Hence, $S'^{2}$ is not an unbiased estimator for σ².
Now, we check whether $S^{2}$ is unbiased for σ² or not, therefore, we consider:

$$E(S^{2}) = \frac{n}{n-1}\,E(S'^{2}) = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^{2} = \sigma^{2}$$

Hence, the estimator $S^{2}$ is an unbiased estimator for σ².
This is the reason why we use $S^{2}$ in place of $S'^{2}$ for estimating the population
variance.
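
The result of Example 5 can also be confirmed by exact enumeration. The following sketch assumes a hypothetical three-value population in which each value is equally likely and samples are drawn with replacement; none of these numbers come from the unit.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (illustrative values); each value equally likely,
# samples drawn with replacement so the observations are independent.
population = [Fraction(v) for v in (2, 4, 9)]
n = 3  # sample size

N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sum_sq_dev(s):
    """Sum of squared deviations of the sample values from the sample mean."""
    m = sum(s) / len(s)
    return sum((x - m) ** 2 for x in s)

samples = list(product(population, repeat=n))  # all N^n equally likely samples

# Exact expectations of S'^2 (divisor n) and S^2 (divisor n - 1).
e_s_prime_sq = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
e_s_sq = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("E(S'^2) = (n-1)/n * sigma^2:", e_s_prime_sq == Fraction(n - 1, n) * sigma2)
print("E(S^2)  = sigma^2:", e_s_sq == sigma2)
```

The bias of $S'^{2}$ is $-\sigma^{2}/n$, which shrinks to zero as n grows, so $S'^{2}$ is asymptotically unbiased even though it is biased for every fixed n.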
Now, you can assess your understanding by answering the following Self
Assessment Question.

SAQ 3
a. A company produces batteries for laptops and wants to estimate the
average life of the battery. For that, the statistician of the company
selected 5 batteries from the production and measured their lives. He
suggests two estimators for estimating the average life of the battery:

$$T_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}, \quad T_2 = \frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5}{15}$$

where $X_1, X_2, X_3, X_4$ and $X_5$ represent the lives of the selected batteries. It is
known that the life of batteries has mean μ and variance σ². Are both
estimators unbiased?
b. Show that the sample mean and sample median are both unbiased
estimators for the mean (μ) of a normal distribution.

6.5 PROPERTIES OF UNBIASED ESTIMATOR


After understanding the concept of unbiasedness and how to check whether
an estimator is unbiased or not, we now discuss some properties of the
unbiased estimator as follows:

1. Unbiased estimators may not be unique. For example, the sample mean
and sample median are unbiased estimators of the population mean of a
normal population.

2. Unbiased estimators do not always exist for all parameters. For example,
based on a single observation from a Bernoulli distribution (θ), there is no
unbiased estimator for θ². Similarly, for a Poisson distribution (λ), there
exists no unbiased estimator for 1/λ.

3. If an estimator is unbiased for all types of distribution, then it is called an
absolutely unbiased estimator. For example, the sample mean is an
absolutely unbiased estimator of the population mean, if the population
mean exists.

4. If T and T* are two unbiased estimators of the parameter θ then

aT+ (1−a) T*
is also an unbiased estimator of θ where ‘a’ (0 ≤ a ≤ 1) is any constant.
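
Property 4 follows directly from the linearity of expectation. A short derivation, using only the assumption that $E(T) = E(T^{*}) = \theta$:

```latex
\begin{align*}
E\left[aT + (1-a)T^{*}\right] &= a\,E(T) + (1-a)\,E(T^{*}) \\
                              &= a\theta + (1-a)\theta \\
                              &= \theta
\end{align*}
```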

For a better understanding of unbiasedness, try the following Self Assessment
Question.

SAQ 4
Some patients with high blood pressure are randomly assigned to a placebo
group and a treatment group. The placebo patients receive an inactive pill, and
the treatment patients receive a new drug that is expected to lower blood
pressure. After the patients are treated for two months, the blood
pressures of the patients of both groups are measured and given as follows:

| Placebo Group (X) | 140 | 165 | 170 | 140 | 135 | 170 | 165 | 150 | 140 |
|---|---|---|---|---|---|---|---|---|---|
| Treatment Group (Y) | 130 | 135 | 140 | 130 | 120 | 120 | 132 | 118 | 120 |

If $E(X) = \mu_1$, $Var(X) = \sigma_1^{2}$ and $E(Y) = \mu_2$, $Var(Y) = \sigma_2^{2}$, then

(i) Show that the statistic $(\bar{X} - \bar{Y})$ is an unbiased estimator of the parameter
$(\mu_1 - \mu_2)$. Also, find the estimate of the same using the given data.
(ii) Calculate the variance and standard deviation of the estimator in Part (i).
Also, find the estimate of the standard error.
(iii) Calculate an estimate of the ratio $\sigma_1 / \sigma_2$.

We now end this unit by giving a summary of what we have covered in it.

6.6 SUMMARY
In this unit, we have covered the following points:
• If we estimate an unknown parameter by a single statistic then this
technique is known as point estimation whereas if we determine an
interval (using sample values) that contains the true value of the unknown
parameter with a certain confidence then it is known as interval
estimation.
• The set of all possible values that the parameter θ or parameters θ1, θ2,
…, θk can assume is called the parameter space. It is denoted by Θ.
• The properties of a good estimator are unbiasedness, consistency,
efficiency and sufficiency.
• An estimator is said to be unbiased if the expected value of the estimator
is equal to the true value of the parameter being estimated.
• The properties of an unbiased estimator.

6.7 TERMINAL QUESTIONS


1. A random sample $X_1, X_2, \ldots, X_n$ of size n is taken from a population whose
pdf is given by

$$f(x, \theta) = \frac{1}{\theta}\,e^{-x/\theta};\quad x > 0,\ \theta > 0$$

Show that the sample mean ($\bar{X}$) is an unbiased estimator of the parameter θ.

2. Consider a population comprising three LED televisions of a certain
company. If the lives of the LED televisions are 8, 6 and 10 years then
construct the sampling distribution of the average life of the LED
televisions by taking samples of size 2 and show that the sample mean
is an unbiased estimator of the population mean life. Also, show that $S'^{2}$
is not an unbiased estimator of population variance whereas $S^{2}$ is an
unbiased estimator of population variance where

$$S'^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2} \quad \text{and} \quad S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}$$

6.8 SOLUTIONS / ANSWERS


Self Assessment Questions (SAQs)
1. Since parameter θ represents the average marks in the paper of the
MST-016 course of 50 marks and there is no negative marking,
therefore, a learner can obtain a minimum of 0 marks and a maximum of 50
marks. Thus, the parameter space of θ is $\Theta = \{\theta : 0 \le \theta \le 50\}$.

2. Refer to Section 6.3.


3(a) We have to check whether the estimators T1 and T2 are unbiased or not.
Therefore, we have to find E(T1) and E(T2) and check whether each is equal
to µ or not. Since X1, X2, X3, X4 and X5 are independent and taken from
the same population with mean μ and variance σ², therefore,

$$E(X_i) = \mu \quad \text{and} \quad \mathrm{Var}(X_i) = \sigma^2 \quad \text{for all } i = 1, 2, \ldots, 5$$

We now consider

$$E(T_1) = E\left[\frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}\right]$$

$$= \frac{1}{5}\left[E(X_1) + E(X_2) + E(X_3) + E(X_4) + E(X_5)\right]$$

$$= \frac{1}{5}\left[\mu + \mu + \mu + \mu + \mu\right]$$

$$E(T_1) = \mu$$

Similarly,

$$E(T_2) = E\left[\frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5}{15}\right]$$

$$= \frac{1}{15}\left[E(X_1) + 2E(X_2) + 3E(X_3) + 4E(X_4) + 5E(X_5)\right]$$

$$= \frac{1}{15}\left[\mu + 2\mu + 3\mu + 4\mu + 5\mu\right] = \frac{1}{15}(15\mu)$$

$$E(T_2) = \mu$$

Hence, both estimators T1 and T2 are unbiased estimators of μ.
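The two calculations above follow one pattern: for a linear estimator Σ wᵢXᵢ with E(Xᵢ) = μ, the expectation is μ times the sum of the weights. The following is a minimal sketch of that check using exact fractions; the function name `expectation_coefficient` and the weight lists are illustrative names, not part of the course material.

```python
from fractions import Fraction

def expectation_coefficient(weights):
    """Return c such that E(sum of w_i * X_i) = c * mu when every E(X_i) = mu."""
    return sum(weights, Fraction(0))

# Weights of T1 = (X1 + ... + X5)/5 and T2 = (X1 + 2*X2 + ... + 5*X5)/15.
t1_weights = [Fraction(1, 5)] * 5
t2_weights = [Fraction(i, 15) for i in range(1, 6)]

# A coefficient of exactly 1 means the linear estimator is unbiased for mu.
print("coefficient of mu in E(T1):", expectation_coefficient(t1_weights))
print("coefficient of mu in E(T2):", expectation_coefficient(t2_weights))
```

Any other set of weights adding up to 1 would give an unbiased linear estimator of μ in the same way.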


3(b) Let X and X be the sample mean and sample median respectively.
We have seen in Unit 2 that if we draw samples from the population
whose mean is μ and variance σ2 then the sampling distribution of mean
has mean μ and variance σ2/n.
Therefore,

σ2
( )
E X = μ and Var ( X ) =
n

Hence, the sample mean is an unbiased estimator of the population


mean.
Similarly, the sampling distribution of the median has mean μ and
π σ2
variance . Therefore,
2 n
2
( )  = πσ
 = μ and Var X
E X ( )
2n

Hence, the sample median is also an unbiased estimator of the


population mean.

4. We have to show that (X̄ − Ȳ) is an unbiased estimator for (μ1 − μ2),
therefore, we have to find E(X̄ − Ȳ) and check whether it is equal to
(μ1 − μ2) or not. Thus, we consider

$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2 \qquad [\because\ E(aX - bY) = aE(X) - bE(Y)]$$

Hence, the estimator (X̄ − Ȳ) is an unbiased estimator for (μ1 − μ2).

We now find the estimate of the same using the given data. Since the
estimate is the value of the estimator, we find the mean of
each group as

| Placebo Group (X) | Treatment Group (Y) | X²     | Y²     |
|-------------------|---------------------|--------|--------|
| 140               | 130                 | 19600  | 16900  |
| 165               | 135                 | 27225  | 18225  |
| 170               | 140                 | 28900  | 19600  |
| 140               | 130                 | 19600  | 16900  |
| 135               | 120                 | 18225  | 14400  |
| 170               | 120                 | 28900  | 14400  |
| 165               | 132                 | 27225  | 17424  |
| 150               | 118                 | 22500  | 13924  |
| 140               | 120                 | 19600  | 14400  |
| Total: 1375       | 1145                | 211775 | 146173 |

$$\bar{X} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i = \frac{1375}{9} = 152.78$$

$$\bar{Y} = \frac{1}{n_2}\sum_{i=1}^{n_2} Y_i = \frac{1145}{9} = 127.22$$

Thus, we find the estimate of the parameter (μ1 − μ2) as

$$\bar{X} - \bar{Y} = 152.78 - 127.22 = 25.56$$

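We now find the variance and standard deviation of (X̄ − Ȳ) as

$$\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

And standard deviation

$$\mathrm{SD}(\bar{X} - \bar{Y}) = \sqrt{\mathrm{Var}(\bar{X} - \bar{Y})} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

By the definition of standard error, we can find it as

$$\mathrm{SE}(\bar{X} - \bar{Y}) = \mathrm{SD}(\bar{X} - \bar{Y}) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

We can compute the estimate of the standard error as

$$\widehat{\mathrm{SE}}(\bar{X} - \bar{Y}) = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$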

Therefore, we first calculate S1² and S2² as

$$S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar{X})^2 = \frac{1}{n_1 - 1}\left[\sum_{i=1}^{n_1} X_i^2 - n_1\bar{X}^2\right]$$

$$= \frac{1}{8}\left(211775 - 9 \times 152.78 \times 152.78\right) = 212.43$$

$$S_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(Y_i - \bar{Y})^2 = \frac{1}{n_2 - 1}\left[\sum_{i=1}^{n_2} Y_i^2 - n_2\bar{Y}^2\right]$$

$$= \frac{1}{8}\left(146173 - 9 \times 127.22 \times 127.22\right) = 63.58$$

Finally, we can compute the estimate of the standard error as

$$\widehat{\mathrm{SE}}(\bar{X} - \bar{Y}) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{212.43}{9} + \frac{63.58}{9}} = \sqrt{30.67} = 5.54$$

Note that S² is an unbiased estimator of σ²; however, S is not an
unbiased estimator of σ. Similarly, S1/S2 is not an unbiased estimator of
σ1/σ2.
An estimate of σ1/σ2 is S1/S2 (this is a biased estimate):

$$\frac{S_1}{S_2} = \sqrt{\frac{S_1^2}{S_2^2}} = \sqrt{\frac{212.43}{63.58}} = 1.83$$
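The arithmetic in this solution can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the helper name `difference_and_standard_error` is an illustrative choice. Because the script works with unrounded means, its sample variances (about 213.19 and 62.94) differ slightly from the hand values 212.43 and 63.58, which use the means rounded to two decimals; the final estimate and standard error agree to two decimals.

```python
import math
import statistics

# Data from the table above.
placebo = [140, 165, 170, 140, 135, 170, 165, 150, 140]
treatment = [130, 135, 140, 130, 120, 120, 132, 118, 120]

def difference_and_standard_error(x, y):
    """Return (x_bar - y_bar, estimated SE) for two independent samples."""
    diff = statistics.mean(x) - statistics.mean(y)
    # statistics.variance uses the (n - 1) divisor, i.e. the unbiased S^2.
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return diff, se

diff, se = difference_and_standard_error(placebo, treatment)
print("estimate of mu1 - mu2:", round(diff, 2))
print("estimated standard error:", round(se, 2))
```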

Terminal Questions (TQs)


1. We have to show that X̄ is an unbiased estimator for θ, therefore, we
have to find E(X̄) and check whether it is equal to θ or not. Here, we are
given that

$$f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta}; \quad x > 0,\ \theta > 0$$

It is the negative exponential distribution with parameter θ and we know
that the mean of this distribution is

$$E(X) = \theta$$

Since X1, X2, …, Xn are randomly drawn from the same population having
mean θ, therefore,

$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \theta$$

Therefore, we consider

$$E(\bar{X}) = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \qquad [\text{by definition of the sample mean}]$$

$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$

$$= \frac{1}{n}\underbrace{(\theta + \theta + \cdots + \theta)}_{n\text{-times}} = \frac{1}{n}(n\theta) = \theta$$

Thus, the sample mean is an unbiased estimator of the parameter θ.
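The fact E(X) = θ used in this solution can be verified directly from the pdf. The following short derivation is a sketch using the substitution u = x/θ (the symbol u is introduced here only for the calculation):

$$E(X) = \int_0^\infty x \cdot \frac{1}{\theta}\, e^{-x/\theta}\, dx = \theta \int_0^\infty u\, e^{-u}\, du = \theta\, \Gamma(2) = \theta$$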


2. Here, the population consists of three LED televisions whose lives are 8,
6 and 10 years so we can find the population mean and variance as

$$\mu = \frac{8 + 6 + 10}{3} = 8$$

$$\sigma^2 = \frac{1}{3}\left[(8 - 8)^2 + (6 - 8)^2 + (10 - 8)^2\right] = \frac{8}{3} = 2.67$$

Here, we are given that
N = 3 and n = 2
Therefore, the possible number of samples (with replacement) that can
be drawn from this population is Nⁿ = 3² = 9. For each of these 9
samples, we will calculate the values of X̄, S′² and S² by the formulae
given below:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \quad S'^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \quad \text{and} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

and the necessary calculations for these results are shown in the
following table:

Calculation for X̄, S′² and S²

| Sample | Sample Observations | X̄  | Σ(Xi − X̄)² | S′² | S² |
|--------|---------------------|----|-------------|-----|----|
| 1      | 8, 8                | 8  | 0           | 0   | 0  |
| 2      | 8, 6                | 7  | 2           | 1   | 2  |
| 3      | 8, 10               | 9  | 2           | 1   | 2  |
| 4      | 6, 8                | 7  | 2           | 1   | 2  |
| 5      | 6, 6                | 6  | 0           | 0   | 0  |
| 6      | 6, 10               | 8  | 8           | 4   | 8  |
| 7      | 10, 8               | 9  | 2           | 1   | 2  |
| 8      | 10, 6               | 8  | 8           | 4   | 8  |
| 9      | 10, 10              | 10 | 0           | 0   | 0  |
| Total  |                     | 72 |             | 12  | 24 |

We calculate X̄, S′² and S² for each sample as

$$\bar{X}_1 = \frac{1}{2}(8 + 8) = 8, \quad \bar{X}_2 = \frac{1}{2}(8 + 6) = 7, \ \ldots, \ \bar{X}_9 = \frac{1}{2}(10 + 10) = 10$$

$$S_1'^2 = \frac{1}{2}\left[(8 - 8)^2 + (8 - 8)^2\right] = 0, \quad S_2'^2 = \frac{1}{2}\left[(8 - 7)^2 + (6 - 7)^2\right] = 1, \ \ldots, \ S_9'^2 = \frac{1}{2}\left[(10 - 10)^2 + (10 - 10)^2\right] = 0$$

$$S_1^2 = \frac{1}{2 - 1}\left[(8 - 8)^2 + (8 - 8)^2\right] = 0, \quad S_2^2 = \frac{1}{2 - 1}\left[(8 - 7)^2 + (6 - 7)^2\right] = 2, \ \ldots, \ S_9^2 = \frac{1}{2 - 1}\left[(10 - 10)^2 + (10 - 10)^2\right] = 0$$
From the above table, with k = 9 denoting the number of possible samples, we have

$$E(\bar{X}) = \frac{1}{k}\sum_{i=1}^{k} \bar{X}_i = \frac{1}{9} \times 72 = 8 = \mu$$

Hence, the sample mean is an unbiased estimator of the population
mean.
Also

$$E(S'^2) = \frac{1}{k}\sum_{i=1}^{k} S_i'^2 = \frac{1}{9} \times 12 = 1.33 \neq \sigma^2$$

Therefore, S′² is not an unbiased estimator of σ² whereas,

$$E(S^2) = \frac{1}{k}\sum_{i=1}^{k} S_i^2 = \frac{1}{9} \times 24 = 2.67 = \sigma^2$$

Hence, the estimator S² is an unbiased estimator of parameter σ².
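The table-based calculation above can be reproduced by enumerating all nine samples. This is a minimal sketch with exact fractions; the names `sample_stats`, `expected_mean`, `expected_s_prime_sq` and `expected_s_sq` are illustrative and not taken from the course text.

```python
from fractions import Fraction
from itertools import product

population = [8, 6, 10]   # lives of the three LED televisions (years)
n = 2                     # sample size, sampling with replacement

def sample_stats(sample):
    """Return (mean, S'^2 with divisor n, S^2 with divisor n - 1)."""
    size = len(sample)
    mean = Fraction(sum(sample), size)
    ss = sum((x - mean) ** 2 for x in sample)
    return mean, ss / size, ss / (size - 1)

samples = list(product(population, repeat=n))   # all N**n ordered samples
k = len(samples)
stats = [sample_stats(s) for s in samples]

# Each sample is equally likely, so expectations are plain averages.
expected_mean = sum(m for m, _, _ in stats) / k
expected_s_prime_sq = sum(b for _, b, _ in stats) / k
expected_s_sq = sum(u for _, _, u in stats) / k

mu = Fraction(sum(population), len(population))
sigma_sq = sum((x - mu) ** 2 for x in population) / len(population)

print("E(X-bar) =", expected_mean, " mu =", mu)
print("E(S'^2)  =", expected_s_prime_sq, " sigma^2 =", sigma_sq)
print("E(S^2)   =", expected_s_sq)
```

Working with `Fraction` avoids the rounding in 1.33 and 2.67, so the comparison with σ² = 8/3 is exact.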

UNIT 7
CONSISTENCY

Structure
7.1 Introduction
    Expected Learning Outcomes
7.2 Consistency
7.3 Properties of Consistent Estimator
7.4 Consistent Asymptotically Normal Estimator
7.5 Summary
7.6 Terminal Questions
7.7 Solutions/Answers

7.1 INTRODUCTION
In the previous unit, you have seen that there exists more than one estimator
for an unknown parameter. For example, for estimating an unknown population
mean, we may use the sample mean, sample median, sample mode, average of
extreme observations, etc. Now, the questions may arise:
• Which estimator should we use, that is, which is likely to give estimates
closer to the true (but unknown) population value?
• Are some of the possible estimators better, in some sense, than the
others?
To answer the above questions, Prof. Ronald A. Fisher gave some
characteristics of a good estimator, namely unbiasedness, consistency,
efficiency and sufficiency.

Tools You Will Need
The following terms are considered essential background material for this
Unit. If you doubt your knowledge of any of these terms, you should review
the appropriate Unit or section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Basic terms of estimation (Unit 6).
• Unbiasedness (Unit 6).
• Probability distributions (MST-012).

In the previous unit, you studied one of the characteristics of a good estimator,
that is, unbiasedness.
An estimator is said to be unbiased for the population parameter if and only if
the average or mean of the sampling distribution of the estimator is equal to
the true value of the parameter. In other words, an estimator is said to be
unbiased if the expected value of the estimator is equal to the true value
of the parameter being estimated, that is,
E(T) = θ

Unit Writer: Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi

This concept was defined for a fixed sample size. In this unit, you will learn
about consistency which is defined for increasing sample size.
This unit is divided into seven sections. Section 7.1 is introductory in nature.
Consistency is described with examples in Section 7.2. Section 7.3 is devoted
to describing various properties of the consistent estimator. Section 7.4 is
devoted to the study of an additional property of a consistent estimator, which
involves its asymptotic distribution. The unit ends by providing a summary of
what we have discussed in this unit in Section 7.5. The terminal questions and
the solution of the SAQs/TQs are given in Sections 7.6 and 7.7, respectively.
In the next unit, we shall discuss the third characteristic of a good estimator,
that is, efficiency.

Expected Learning Outcomes


After studying this unit, you should be able to:
• comprehend the concept of consistency of an estimator;
• describe various properties of a consistent estimator;
• explain the concept of a consistent asymptotically normal estimator; and
• define consistent asymptotically normal estimator.

7.2 CONSISTENCY
In the previous unit, we have learnt about unbiasedness. An estimator T is
said to be an unbiased estimator of a parameter, say, θ if the mean of the
sampling distribution of estimator T is equal to the true value of the parameter
θ, that is,
E(T) = θ

This concept was defined for a fixed sample size. In this section, we will learn
about consistency which is defined for increasing sample size. In general, we
construct an estimator as a function of an available sample of size n for a
parameter. Suppose we are able to keep collecting data and expanding the
sample. In this way, we would obtain a sequence of estimators such as
T1 = t1(X1), T2 = t2(X1, X2), T3 = t3(X1, X2, X3), …, Tn = tn(X1, X2, ..., Xn), …. Here,
we denote an estimator as Tn (indexed by n) to represent the estimator based
on sample size n, instead of T as used in the previous unit. Consistency is
a property of what occurs as the sample size “grows to infinity”.

Note: In statistics, a consistent estimator is an estimator that converges to
the true value of the parameter as the sample size increases. It means that
the estimation becomes more and more accurate as more data is collected.

If X1, X2, …, Xn is a random sample of size n taken from a population whose
probability density (mass) function is f(x, θ), where θ is the population
parameter, then consider a sequence of estimators, say, {T1, T2, …, Tn}. A
sequence of estimators {T1, T2, …, Tn} is said to be a consistent estimator for a
parameter θ if the deviation/difference of the values of an estimator from the
parameter tends to zero as the sample size increases. This indicates that as
the sample size increases, the estimator values tend to approach the parameter.
In other words, we can say that as the sample size approaches infinity, the
sampling distribution of a consistent estimator becomes concentrated on the
value of the parameter. It means that the standard error of the estimator
declines to 0 and the sampling distribution concentrates around the population
parameter.
For example, suppose {T1, T2, T3, …} is a sequence of estimators for
parameter θ whose true value is 5. As the sample size increases, the
sampling distributions of these estimators (as shown in Fig. 7.1) get
more and more concentrated near the true value θ = 5 (even if the estimators
are biased) and the density is more tightly distributed around the true value.
As the sample size becomes infinite, the sampling distribution of the sequence
collapses to a spike at the true value. Therefore, we can say that this
sequence is consistent.

Fig. 7.1: Sampling distribution of a consistent estimator Tn with increasing sample size.

Formally, we can define a consistent estimator as follows:

A sequence {Tn} of estimators or simply an estimator Tn is said to be a
consistent estimator of θ if Tn converges to θ in probability, that is,

$$T_n \xrightarrow{\ p\ } \theta \quad \text{as } n \to \infty \text{ for every } \theta$$

i.e. for every ε > 0,

$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = 1$$

i.e. for every ε > 0 and η > 0, there exists a positive integer m (depending
on ε and η) such that

$$P\left[\,|T_n - \theta| < \varepsilon\,\right] > 1 - \eta \quad \text{for all } n \geq m$$

where m is some sufficiently large value of n. The above expressions all mean
the same thing.

The above definition says that an estimator is said to be a consistent estimator
if the probability of accurate estimates (estimates close to the value of the
population parameter) tends to 1 as the sample size increases.
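To see the definition at work, one can estimate P[|Tn − θ| < ε] by simulation for increasing n. The following is a minimal sketch under stated assumptions: the estimator is the sample mean, the population is exponential with mean θ = 2, and ε = 0.5; these values, the seed, and the function name `coverage_probability` are illustrative choices, not from the course text.

```python
import random

def coverage_probability(n, theta=2.0, eps=0.5, reps=500, seed=0):
    """Monte Carlo estimate of P[|sample mean - theta| < eps] for size n."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    hits = 0
    for _ in range(reps):
        # expovariate takes the rate 1/theta, so the population mean is theta.
        sample_mean = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        if abs(sample_mean - theta) < eps:
            hits += 1
    return hits / reps

for n in (10, 100, 1000):
    print(n, coverage_probability(n))
```

The printed proportions should climb towards 1 as n grows, which is what convergence in probability requires for this particular ε.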
Consistency as defined above is sometimes called weak consistency. If we
replace convergence in probability with almost sure convergence, then the
estimator is said to be strongly consistent. Therefore,
an estimator Tn of parameter θ is said to be strongly consistent if it converges
almost surely to the true value of the parameter, that is,

$$P\left[\lim_{n \to \infty} T_n = \theta\right] = 1$$

Note: A sequence of random variables {Xn} is said to converge almost surely to
a random variable X if {Xn} converges to X with probability 1. That is,

$$P\left[\lim_{n \to \infty} X_n = X\right] = 1$$

The definition of the consistent estimator could appear a little difficult to
understand. Furthermore, it is not always simple to verify whether an estimator
is consistent or not using this definition. For this reason, we will present a
more straightforward criterion (known as sufficient conditions for
consistency) through which you can easily check the
consistency of an estimator.

Sufficient conditions for consistency

Let {Tn} be a sequence of estimators, or simply an estimator Tn, of θ. Then
Tn is a consistent estimator of θ if, for all θ ∈ Θ,

(i) The estimator Tn is an asymptotically unbiased or simply unbiased
estimator of θ, that is,
E(Tn) → θ as n → ∞, and

(ii) The variance of estimator Tn decreases with increasing sample size. In
other words, we can say that the variance of the estimator approaches
zero as n → ∞, that is,
Var(Tn) → 0 as n → ∞.

Note: In particular, an unbiased estimator Tn for a parameter µ is consistent
if Var(Tn) approaches zero as n → ∞.

Note: The concept of consistency relates to a sequence of estimators {Tn},
n = 1, 2, …, but we usually say consistency of an estimator Tn for simplicity.
Further, consistency is a large sample property of an estimator.
Since consistency is a large sample property of an estimator, some
statisticians suggest that consistency should not be used alone for judging the
goodness of an estimator; rather it should be used along with other criteria.
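Why conditions (i) and (ii) are enough can be seen from Chebyshev's (Markov's) inequality applied to the squared deviation. The following is a short sketch of that standard argument; it is added here for illustration and is not part of the original unit. For every ε > 0,

$$P\left[\,|T_n - \theta| \geq \varepsilon\,\right] \leq \frac{E\left[(T_n - \theta)^2\right]}{\varepsilon^2} = \frac{\mathrm{Var}(T_n) + \left[E(T_n) - \theta\right]^2}{\varepsilon^2}$$

Under (i) the squared bias [E(Tn) − θ]² tends to 0 and under (ii) Var(Tn) tends to 0, so the right-hand side tends to 0 and hence P[|Tn − θ| < ε] → 1 as n → ∞.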

After understanding the concept of consistency, let us take some examples to


understand how the definition and sufficient conditions for a consistent
estimator are used.

Example 1: Prove that the sample mean is a consistent estimator of the
population mean (µ) provided that the population has finite variance.
Solution: Let X1, X2, ..., Xn be a random sample taken from a population
having mean µ and finite variance σ². By the definition of consistency, we
consider the limit of P[|Tn − θ| < ε] as n → ∞ and check whether it equals 1 or not.

$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\,|\bar{X} - \mu| < \varepsilon\,\right] \qquad [\text{here } T_n = \bar{X} \text{ and } \theta = \mu]$$

To find this probability, we convert it to the standard form. Recall from Unit 1
that the sample mean (X̄) has mean µ and finite variance σ²/n, therefore, we
convert X̄ − µ to the standard form by dividing it by σ/√n as

$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} < \frac{\varepsilon\sqrt{n}}{\sigma}\right]$$

By the central limit theorem (described in Unit 1 of this course), you know that
the variate Z = (X̄ − µ)/(σ/√n) is approximately a standard normal variate for
large sample size n. Therefore,

$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\,|Z| < \frac{\varepsilon\sqrt{n}}{\sigma}\right]$$

$$= \lim_{n \to \infty} P\left[-\frac{\varepsilon\sqrt{n}}{\sigma} < Z < \frac{\varepsilon\sqrt{n}}{\sigma}\right] \qquad [\because\ |X| < a \Rightarrow -a < X < a]$$

$$= \lim_{n \to \infty} \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma} f(z)\, dz \qquad \left[\because\ P[a < U < b] = \int_a^b f(u)\, du\right]$$

$$= \lim_{n \to \infty} \int_{-\varepsilon\sqrt{n}/\sigma}^{\varepsilon\sqrt{n}/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \qquad \left[\because\ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\right]$$

$$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

Since (1/√(2π)) e^{−z²/2} is the pdf of a standard normal variate Z, its
integral over the whole range −∞ to ∞ is unity.

Thus,

$$\lim_{n \to \infty} P\left[\,|T_n - \theta| < \varepsilon\,\right] = \lim_{n \to \infty} P\left[\,|\bar{X} - \mu| < \varepsilon\,\right] = 1$$

Hence, the sample mean is a consistent estimator of the population mean.
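The limiting step in this solution can be made concrete by evaluating the normal-approximation probability P[|Z| < ε√n/σ] for a few sample sizes. This is a minimal sketch; the values ε = 0.5 and σ = 2 and the function name `prob_within` are illustrative assumptions. It uses the identity 2Φ(a) − 1 = erf(a/√2) for the standard normal cdf Φ.

```python
import math

def prob_within(eps, sigma, n):
    """Normal-approximation value of P[|sample mean - mu| < eps]."""
    a = eps * math.sqrt(n) / sigma          # upper limit of the integral
    return math.erf(a / math.sqrt(2.0))     # equals 2*Phi(a) - 1

for n in (10, 100, 1000, 10000):
    print(n, prob_within(eps=0.5, sigma=2.0, n=n))
```

As n increases the limits ±ε√n/σ move out towards ±∞, so the computed probability approaches 1, in line with the conclusion of the example.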


You may find it a little difficult to show consistency directly from the
definition, and it is not always simple to verify whether an estimator is
consistent or not in this way. For this reason,
you can use sufficient conditions for consistency. Let us solve this example
with the help of sufficient conditions for consistency.
First, we have to show that the sample mean is asymptotically unbiased or
simply an unbiased estimator for parameter µ. Therefore, we have to find
E(X̄) and check whether it is equal to µ or not as n → ∞. Consider,

$$E(\bar{X}) = E\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] \qquad [\text{by definition of the sample mean}]$$

$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$

Since X1, X2, …, Xn are randomly drawn from the same population, therefore,
they also have the same mean and variance. Therefore,

$$E(X_1) = E(X_2) = \cdots = E(X_n) = E(X) = \mu$$

Thus,

$$E(\bar{X}) = \frac{1}{n}\underbrace{(\mu + \mu + \cdots + \mu)}_{n\text{-times}} = \frac{1}{n}(n\mu) = \mu$$

$$E(\bar{X}) = \mu$$

Hence, the sample mean (X̄) is an unbiased estimator of the population mean
µ.
Now, we consider the variance of the sample mean (X̄) and check whether it
converges to 0 or not as n → ∞.

Note: If X and Y are two independent random variables, then
Var(aX ± bY) = a² Var(X) + b² Var(Y).

Thus, we consider,

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right]$$

$$= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right]$$

$$= \frac{1}{n^2}\underbrace{(\sigma^2 + \sigma^2 + \cdots + \sigma^2)}_{n\text{-times}} = \frac{1}{n^2}(n\sigma^2)$$

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty$$

Hence, by the sufficient conditions of consistency, we can say that the sample
mean (X̄) is a consistent estimator of the population mean.

We have proved this result in general, so the sample mean is an unbiased
and consistent estimator of the population mean for every population with
finite variance, such as the normal, Poisson, binomial, exponential, etc.
Let us consider the next example.

Example 2: If the number of weekly accidents occurring on a mile stretch of a
particular road follows a Poisson distribution with parameter λ then show that
the sample mean (X̄) is a consistent estimator of λ. Also, find the estimate of
parameter λ on the basis of the following data:

| Number of Accidents | 0  | 1  | 2  | 3 | 4 | 5 | 6 |
|---------------------|----|----|----|---|---|---|---|
| Frequency           | 10 | 12 | 12 | 9 | 5 | 3 | 1 |

Solution: Here, the number of weekly accidents occurring on a mile stretch of
a particular road follows a Poisson distribution with parameter λ. You also
know that the mean and variance of the Poisson distribution (λ) are

$$E(X) = \lambda \quad \text{and} \quad \mathrm{Var}(X) = \lambda$$

Since X1, X2, ..., Xn are independent and come from the same Poisson
distribution, therefore,

$$E(X_i) = E(X) = \lambda \quad \text{and} \quad \mathrm{Var}(X_i) = \mathrm{Var}(X) = \lambda \quad \text{for all } i = 1, 2, \ldots, n$$

We now consider

$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] \qquad [\text{by definition of the sample mean}]$$

$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \qquad [\because\ E(aX + bY) = aE(X) + bE(Y)]$$

$$= \frac{1}{n}\underbrace{(\lambda + \lambda + \cdots + \lambda)}_{n\text{-times}} = \frac{1}{n}(n\lambda) = \lambda$$

$$E(\bar{X}) = \lambda$$

Thus, the sample mean is an unbiased estimator of the parameter λ.


Now we consider the variance of the sample mean and check whether it
converges to 0 or not as n → ∞.

Therefore, we consider,

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right]$$

$$= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right]$$

$$= \frac{1}{n^2}\underbrace{(\lambda + \lambda + \cdots + \lambda)}_{n\text{-times}} = \frac{1}{n^2}(n\lambda)$$

$$\mathrm{Var}(\bar{X}) = \frac{\lambda}{n} \to 0 \text{ as } n \to \infty$$

Hence, by the sufficient conditions of consistency, we can say that the sample
mean is a consistent estimator of the parameter λ of the Poisson distribution.
Since the mean of the Poisson distribution is λ, therefore, we estimate the
population mean by the sample mean. Thus, we calculate the sample mean of
the given data as follows:

| S. No. | Number of Accidents (X) | Frequency (f) | fX        |
|--------|-------------------------|---------------|-----------|
| 1      | 0                       | 10            | 0         |
| 2      | 1                       | 12            | 12        |
| 3      | 2                       | 12            | 24        |
| 4      | 3                       | 9             | 27        |
| 5      | 4                       | 5             | 20        |
| 6      | 5                       | 3             | 15        |
| 7      | 6                       | 1             | 6         |
| Total  |                         | N = 52        | ∑fX = 104 |

The formula for calculating the mean is

$$\bar{X} = \frac{1}{N}\sum fX$$

where N = ∑f is the total frequency, that is, the number of weeks observed.

$$\bar{X} = \frac{1}{52} \times 104 = 2$$

Hence, the estimate of the parameter λ is 2.
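The frequency-table calculation of the sample mean can be sketched in a few lines. The function name `grouped_mean` is an illustrative choice; the data are those of Example 2.

```python
accidents = [0, 1, 2, 3, 4, 5, 6]        # values of X
frequency = [10, 12, 12, 9, 5, 3, 1]     # number of weeks with that value

def grouped_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_freq = sum(freqs)                                   # N
    weighted_sum = sum(v * f for v, f in zip(values, freqs))  # sum of fX
    return weighted_sum / total_freq

lambda_hat = grouped_mean(accidents, frequency)
print("N =", sum(frequency), " estimate of lambda =", lambda_hat)
```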

Example 3: A company produces ball bearings. The quality control inspector
found that there is a variation in the diameters of the steel ball bearings. To
estimate the variation in the diameter, he randomly selected n ball bearings
from the production line and measured the diameter of each selected ball
bearing. Suppose the measured diameters of the ball bearings are
X1, X2, …, Xn. He proposed an estimator for the variance of all ball bearings
which is given as follows:

$$S'^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

If the actual variation in the diameter of the ball bearings is σ² then show that
S′² is a consistent estimator.
Solution: To show that the proposed estimator S′² is consistent, we have to
show that
• It is asymptotically unbiased, that is,
E(S′²) → σ² as n → ∞, and
• Its variance tends to zero as n tends to infinity, that is,
Var(S′²) → 0 as n → ∞

Therefore, we consider

$$E(S'^2) = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] = \frac{1}{n}\, E\left[\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] \qquad [\because\ E(aX) = aE(X)]$$

$$= \frac{1}{n}\, E\left[(n - 1)S^2\right] \qquad \left[\because\ S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \Rightarrow \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = (n - 1)S^2\right]$$

$$= \frac{\sigma^2}{n}\, E\left[\frac{(n - 1)S^2}{\sigma^2}\right] \qquad [\text{multiplying and dividing by } \sigma^2]$$

Recall from Units 3 and 4 that, for a sample from a normal population,
(n − 1)S²/σ² follows a chi-square distribution with
(n − 1) degrees of freedom and from the properties of the chi-square
distribution, we have the mean and variance of the chi-square distribution with
(n − 1) degrees of freedom as (n − 1) and 2(n − 1), therefore, we have

$$E\left[\frac{(n - 1)S^2}{\sigma^2}\right] = E\left[\chi^2_{(n-1)}\right] = n - 1$$

$$\text{and} \quad \mathrm{Var}\left[\frac{(n - 1)S^2}{\sigma^2}\right] = \mathrm{Var}\left[\chi^2_{(n-1)}\right] = 2(n - 1)$$

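Therefore, we have

$$E(S'^2) = \frac{\sigma^2}{n}\, E\left[\frac{(n - 1)S^2}{\sigma^2}\right] = \frac{\sigma^2}{n}(n - 1) = \sigma^2 - \frac{\sigma^2}{n}$$

Therefore,

$$\lim_{n \to \infty} E(S'^2) = \lim_{n \to \infty}\left(\sigma^2 - \frac{\sigma^2}{n}\right) = \sigma^2 \qquad \left[\because\ \lim_{n \to \infty}\frac{\sigma^2}{n} = 0\right]$$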

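Hence S′² is an asymptotically unbiased estimator for σ².

We now consider the variance of S′² as

$$\mathrm{Var}(S'^2) = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right]$$

$$= \frac{1}{n^2}\, \mathrm{Var}\left[\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] \qquad [\because\ \mathrm{Var}(aX) = a^2\, \mathrm{Var}(X)]$$

$$= \frac{1}{n^2}\, \mathrm{Var}\left[\sigma^2\, \frac{(n - 1)S^2}{\sigma^2}\right] = \frac{(\sigma^2)^2}{n^2}\, \mathrm{Var}\left[\chi^2_{(n-1)}\right] = \frac{(\sigma^2)^2}{n^2} \times 2(n - 1)$$

$$\lim_{n \to \infty} \mathrm{Var}(S'^2) = 2\sigma^4 \lim_{n \to \infty}\left(\frac{1}{n} - \frac{1}{n^2}\right) = 2\sigma^4 \times 0 = 0$$

Hence, the estimator

$$S'^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

is a consistent estimator for the population variance σ².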
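The two closed-form results of this example, E(S′²) = σ²(n − 1)/n and Var(S′²) = 2σ⁴(n − 1)/n², can be tabulated to watch the bias and the variance vanish together. This is a minimal sketch assuming a normal population with σ² = 1; the function name `moments_of_s_prime_sq` is an illustrative choice.

```python
def moments_of_s_prime_sq(n, sigma_sq):
    """Mean and variance of S'^2 for a normal sample of size n."""
    mean = sigma_sq * (n - 1) / n                     # sigma^2 - sigma^2 / n
    variance = 2 * sigma_sq ** 2 * (n - 1) / n ** 2   # 2*sigma^4*(n - 1)/n^2
    return mean, variance

sigma_sq = 1.0
for n in (10, 100, 1000, 10000):
    mean, variance = moments_of_s_prime_sq(n, sigma_sq)
    print(n, "bias =", mean - sigma_sq, "variance =", variance)
```

Both columns shrink towards 0 as n grows, which is exactly what the sufficient conditions for consistency ask for.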


It is now time for you to try the following Self Assessment Question to make
sure that you have understood consistency.
SAQ 1
(i) If the grades of the students in a paper of the MSCAST programme follow
a normal distribution with mean µ and variance σ² then show that the
sample median is a consistent estimator of the population mean (µ).
(ii) The magnitude of earthquakes recorded in a region is modelled as an
exponential distribution with an unknown parameter θ whose pdf is given
by

$$f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta}; \quad x > 0,\ \theta > 0$$

A researcher considered the following two estimators for the parameter θ:

$$T_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad T_2 = \frac{1}{n + 1}\sum_{i=1}^{n} X_i$$

Check whether both estimators are unbiased and consistent.

(iii) If the average weight gained by a person over the winter months is
uniformly distributed and ranges from θ to θ + 6 lbs, then show that the
sample mean is an unbiased as well as a consistent estimator of (θ + 3).

You should now understand what a consistent estimator is and how to check
whether an estimator is consistent or not. Let us study the properties of a
consistent estimator in the next section.

7.3 PROPERTIES OF CONSISTENT ESTIMATOR


After understanding the concept of consistency and how to check whether an
estimator is a consistent estimator or not, we now discuss some important
properties of a consistent estimator as follows:

1. Consistent estimators may not be unique. For example, the sample mean
and the sample median both are consistent estimators of the population
mean of a normal population (see Example 1 and SAQ 1(i)). Also, the
sample variances S′² and S² are both consistent estimators for
population variance (see Example 3 and TQ 2).

2. Consistent estimators need not be unbiased. For example, the sample
variance

$$S'^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

is not an unbiased estimator but is consistent (see Example 3).

3. An unbiased estimator need not be consistent (see Example 4).

4. If Tn is a consistent estimator of θ and f(θ) is a continuous function of θ
then f(Tn) will be the consistent estimator of f(θ). This property is known
as invariance property. For example, if X̄ is a consistent estimator of the
population mean θ then e^X̄ is also a consistent estimator of e^θ because
e^θ is a continuous function of θ.

Note: A function f(x) is said to be continuous at x = a if

$$\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = f(a)$$

In other words, we can say that a function is continuous at a point x = a if
L.H.L. at x = a = R.H.L. at x = a = value of the function at x = a.

5. If Tn and Sn are consistent estimators of the parameters α and β,
respectively, then
(i) Tn + Sn is a consistent estimator of α + β
(ii) TnSn is a consistent estimator of αβ
(iii) If β ≠ 0 then Tn/Sn is a consistent estimator of α/β

Let us study the use of the properties of a consistent estimator with the help of
an example.

Example 4: The heights of persons living in the hills follow a normal
distribution with mean µ inches and variance σ² square inches. To estimate the
average height, a researcher suggested an estimator T = X1 (the first
observation of a sample). Check whether it is unbiased and consistent.

Solution: To check whether the suggested estimator T is unbiased or not, we
have to find E(T). Since X1 is taken from the population with mean µ
and variance σ², therefore,
E(X1) = µ and Var(X1) = σ²

Thus,
E(T) = E(X1) = µ
Therefore, the estimator T = X1 is an unbiased estimator of µ.
However, Var(T) = Var(X1) = σ² does not converge to zero as n tends to
infinity, so the sufficient conditions for consistency are not satisfied. Since
these conditions are only sufficient, we go back to the definition: T = X1
follows the N(µ, σ²) distribution whatever the sample size, so
P[|T − µ| < ε] is the same constant, less than 1, for every n and does not
tend to 1 as n → ∞. Hence, T is not a consistent estimator of µ.
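The behaviour described in Example 4 can be illustrated by simulation: however large the sample, the chance that the first observation alone lands within ε of µ stays the same. This is a minimal sketch; the values µ = 5, σ = 2, ε = 0.5, the seed, and the function name `prob_first_obs_close` are illustrative assumptions.

```python
import random

def prob_first_obs_close(n, mu=5.0, sigma=2.0, eps=0.5, reps=1000, seed=1):
    """Monte Carlo estimate of P[|X1 - mu| < eps] using samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        t = sample[0]              # the estimator T = X1 ignores the rest
        if abs(t - mu) < eps:
            hits += 1
    return hits / reps

for n in (5, 50, 500):
    print(n, prob_first_obs_close(n))
```

The estimated probabilities hover around the same value (roughly 0.2 for these settings) for every n instead of rising towards 1, in contrast with the sample mean.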
Example 5: Consider Example 2. Find the consistent estimator of λ(λ − 1).
Also, find the estimate of it.

Solution: In such questions where we have to find a consistent estimator
for a function of a parameter, we solve the question as follows:
(i) First, we find the consistent estimator of the parameter (here λ).
(ii) After that, we check whether the given function is continuous or not.
(iii) Finally, we use the invariance property of the consistent estimator to find
the consistent estimator of the given function.
In Example 2, we showed that the sample mean (X̄) is a consistent estimator
of the parameter λ. Therefore, we move to the second step and check whether
λ(λ − 1) is a continuous function or not. Since λ(λ − 1) is a polynomial and we
know that every polynomial is a continuous function, therefore, λ(λ − 1) is a
continuous function. Since the sample mean is a consistent estimator of λ and
λ(λ − 1) is a continuous function of λ, therefore, by the invariance property of
consistency X̄(X̄ − 1) will be the consistent estimator of λ(λ − 1).

In Example 2, we also calculated the value of the sample mean as
X̄ = 2
Therefore, the estimate of λ(λ − 1) is X̄(X̄ − 1) = 2 × (2 − 1) = 2.
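The three-step recipe of this example amounts to plugging the consistent estimate into the continuous function. The following is a minimal sketch; the helper name `plug_in` is illustrative, and the totals 104 and 52 are those of the frequency table in Example 2.

```python
import math

def plug_in(g, estimate):
    """Evaluate a continuous function g at a consistent estimate."""
    return g(estimate)

x_bar = 104 / 52   # sample mean from Example 2: sum of fX divided by N

# Estimate of lambda * (lambda - 1) by the invariance property.
print(plug_in(lambda lam: lam * (lam - 1), x_bar))

# The same recipe works for any other continuous function, e.g. exp(lambda).
print(plug_in(math.exp, x_bar))
```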

It is now time for you to try the following Self Assessment Question.

SAQ 2
Suppose it is known that the probability that a certain company experiences a
network failure in a given week is θ and the distribution of the number of
weeks the company does not experience a network failure follows a geometric
distribution with parameter θ, then show that the sample mean X̄ is a
consistent estimator of 1/θ. Also, find a consistent estimator of e^(1/θ).

After studying the properties of a consistent estimator, the next section is


devoted to a study of an additional property of a consistent estimator, which
involves its asymptotic distribution with a suitable normalisation. These play a
key role in large sample inference theory.

7.4 CONSISTENT ASYMPTOTICALLY NORMAL ESTIMATOR
In the previous sections, you have studied that a sequence of estimators
{T1, T2, …, Tn} or simply an estimator Tn is said to be a consistent estimator for
a parameter θ if the sampling distribution of the estimator Tn becomes
concentrated on the value of the parameter as the sample size approaches
infinity. Consistency is a desirable property, but we face mainly two
problems with a consistent estimator, which are given as follows:
• The shape of the sampling distribution of most (consistent) estimators
is not known because the estimators are complicated non-linear
functions of random samples. If we knew the sampling distribution
of our estimator for every sample size, we could use it to draw inferences
using this finite-sample distribution.
• Even if the sampling distribution of the consistent estimator is known, its
shape changes with the sample size.
For example, in Example 1, you have seen that the sample mean is a
consistent estimator for the population mean. To show the impact of the
sample size on the sampling distribution of the consistent estimator, we plot
the sampling distribution of the estimator (the sample mean for the population
mean), say, θ = 5 for sample sizes 100, 300, 500, 1000, … as shown in
Fig. 7.2.

Fig. 7.2: Sampling distribution of sample mean with increasing sample size.

From Fig. 7.2, you can see that as the sample size increases the shape of the
sampling distribution changes and it becomes increasingly concentrated
around the true value of the population mean. Also, in Unit 15 of the course
MST-012: Probability and Probability Distributions, you have studied that
convergence in probability implies convergence in law (distribution), therefore,
the estimator Tn converges to the parameter θ = 5 in distribution as the sample
size approaches infinity. It means that the asymptotic distribution of the
consistent estimator Tn is degenerate at θ (in our case θ = 5, as shown in
Fig. 7.2). It indicates that, in the limit, the estimator takes the single value θ
with probability 1.
Such a degenerate distribution is not helpful to find the rate of convergence or
to find an interval estimator of θ. If we knew the sampling distribution of our
estimator for every sample size, we could use it to draw inferences using this
finite-sample distribution. Hence, we aim to find a version of the estimator whose
sampling distribution does not change for large sample sizes. For that, we use the
concept of a consistent asymptotically normal estimator.
It is observed that if we re-centre and re-scale the estimator, then the form of
the sampling distributions of the new version of the estimator does not change
with sample size and non-degenerate as the sample size tends to infinity.
Also, the shape of the sampling distribution gets arbitrarily close to a normal
distribution as the sample size increases. For illustration purposes, instead of
looking at the distribution of the estimator Tn = X for sample size n, let’s look
at the distribution of n ( Tn − θ0 ) , where θ0 is the true value of the population

192
mean (parameter) for which the estimator Tn is consistent. We plot again the
Unit 7 Consistency

sampling distribution of n ( Tn − θ0 ) =n(X − 5.0) instead of the sample mean


with the sample sizes 100, 300, 500 and 1000 in Fig. 7.3.

Fig. 7.3: Sampling distribution of a re-centre and re-scale sample mean with increasing
sample size

From the above figure, you can observe that the sampling distributions of
$\sqrt{n}\,(T_n - \theta_0) = \sqrt{n}\,(\bar{X} - 5.0)$ (for sample sizes 100, 300, 500 and 1000) are
almost indistinguishable from each other. They also closely follow a normal
distribution with mean 0 and a constant variance σ² (the population variance). In
other words, we can say that the distribution of $\sqrt{n}\,(T_n - \theta_0) = \sqrt{n}\,(\bar{X} - 5.0)$ gets
arbitrarily close to a N(0, σ²) distribution as n → ∞. Therefore, we can define
the consistent asymptotically normal estimator as follows:
An estimator Tn is said to be a consistent asymptotically normal estimator for
the parameter θ if the sampling distribution of $\sqrt{n}\,(T_n - \theta_0)$ approaches a normal
distribution with mean 0 and constant variance σ² as n → ∞.
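The behaviour described above can be checked with a small simulation. The following is only a sketch under stated assumptions: the population is taken to be normal with mean θ0 = 5 and an assumed standard deviation σ = 2 (this value is not given in the unit), and the helper name `scaled_mean_draws` is hypothetical. If the re-scaling works as described, the simulated values of √n(X̄ − θ0) should have mean near 0 and variance near σ² = 4 for every sample size.

```python
import math
import random
import statistics

THETA0 = 5.0   # true population mean used in the unit's illustration
SIGMA = 2.0    # assumed population standard deviation (illustrative only)


def scaled_mean_draws(n, reps, rng):
    """Return reps simulated values of sqrt(n) * (sample mean - THETA0)."""
    draws = []
    for _ in range(reps):
        sample = [rng.gauss(THETA0, SIGMA) for _ in range(n)]
        draws.append(math.sqrt(n) * (statistics.fmean(sample) - THETA0))
    return draws


rng = random.Random(2024)
results = {}
for n in (100, 300, 500, 1000):
    draws = scaled_mean_draws(n, reps=500, rng=rng)
    # Store the simulated mean and variance of the re-scaled estimator.
    results[n] = (statistics.fmean(draws), statistics.pvariance(draws))
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

The printed variances stay roughly constant across sample sizes, whereas the variance of X̄ itself would shrink like σ²/n.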
We now end this unit by giving a summary of what we have covered in it.

7.5 SUMMARY
In this unit, we have covered the following points:

• A consistent estimator is an estimator that converges to the true value of


the parameter as the sample size increases. This means that the estimate
becomes more and more accurate as more data is collected.
• A sequence {Tn} of estimators or simply an estimator Tn is said to be
consistent for a parameter θ, if Tn converges to θ in probability, that is,
$$T_n \xrightarrow{\;p\;} \theta \text{ as } n \to \infty \text{ for every } \theta$$

• If {Tn} is a sequence of estimators, then Tn is a consistent estimator of θ
if, for all θ ∈ Θ, the following two sufficient conditions hold:

(i) The estimator Tn is an unbiased or asymptotically unbiased estimator of θ, that is,
$$E(T_n) \to \theta \text{ as } n \to \infty, \text{ and}$$

(ii) The variance of the estimator Tn approaches zero as the sample size increases, that is,
$$\mathrm{Var}(T_n) \to 0 \text{ as } n \to \infty.$$

• If Tn is a consistent estimator of θ and f(θ) is a continuous function of θ,
then f(Tn) is a consistent estimator of f(θ). This property is known
as the invariance property.

• An estimator Tn is said to be a consistent asymptotically normal estimator
for the parameter θ if the sampling distribution of $\sqrt{n}\,(T_n - \theta_0)$ approaches a
normal distribution with mean 0 and constant variance σ² as n → ∞.

7.6 TERMINAL QUESTIONS


1. Define consistency.

2. Consider Example 2 and suppose the quality control inspector proposed
the estimator
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
for the population variance σ². Show that it is also consistent.

7.7 SOLUTIONS / ANSWERS


Self Assessment Questions (SAQs)

1. (i) To check the consistency, first, we have to show that the sample
median $\tilde{X}$ is an unbiased or asymptotically unbiased
estimator of the population mean µ. Therefore, we have to find $E(\tilde{X})$ and
check whether it is equal to µ or not as n → ∞.

Recall from Unit 2 that if we draw the samples from a normally/non-normally
distributed population with mean µ and variance σ², then the
sampling distribution of the median will be normally distributed, only for
large samples, with mean and standard deviation
$$E(\tilde{X}) = \mu \quad \text{and} \quad SD(\tilde{X}) = 1.253\,\frac{\sigma}{\sqrt{n}}$$
Since $E(\tilde{X}) = \mu$, therefore, the sample median is an unbiased
estimator of the population mean µ.

We now consider the variance of the sample median $\tilde{X}$ and check
whether it converges to 0 or not as n → ∞. Therefore, we consider,
$$\mathrm{Var}(\tilde{X}) = \left[SD(\tilde{X})\right]^2 = \left(1.253\,\frac{\sigma}{\sqrt{n}}\right)^2 = 1.57\,\frac{\sigma^2}{n}$$
$$\mathrm{Var}(\tilde{X}) \to 0 \text{ as } n \to \infty$$
Hence, by the sufficient conditions of consistency, we can say that the
sample median $\tilde{X}$ is a consistent estimator of the population mean.

(ii) Since the magnitude of earthquakes recorded in a region is modelled
by the exponential distribution, it has mean θ and variance θ².
We have to check whether the estimators T1 and T2 are unbiased and
consistent. Therefore, first, we check whether they are unbiased. For
that, we have to find E(T1) and E(T2) and check whether these are
equal to θ or not. Let X1, X2, …, Xn be a random sample of size n taken
from the exponential distribution. Since the sample observations are
independent and taken from the same population with mean θ and
variance θ², therefore,
$$E(X_i) = \theta \text{ and } \mathrm{Var}(X_i) = \theta^2 \text{ for all } i = 1, 2, \dots, n$$
Consider,
$$E(T_1) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right]$$
$$= \frac{1}{n}\big(\underbrace{\theta + \theta + \dots + \theta}_{n\text{-times}}\big) = \frac{1}{n}(n\theta) = \theta$$
$$E(T_1) = \theta$$
Hence, the estimator T1 is an unbiased estimator of θ.

Similarly,
$$E(T_2) = E\left(\frac{1}{n+1}\sum_{i=1}^{n} X_i\right) = \frac{1}{n+1}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right]$$
$$= \frac{1}{n+1}\big(\underbrace{\theta + \theta + \dots + \theta}_{n\text{-times}}\big) = \frac{1}{n+1}(n\theta) = \frac{n}{n+1}\,\theta$$
Since $E(T_2) = \dfrac{n}{n+1}\,\theta \neq \theta$, the estimator T2 is not an unbiased
estimator of the parameter θ.
We now check the consistency of both estimators. For that, we check
whether the variances of these estimators converge to 0 or not as
n → ∞. Therefore, we consider
$$\mathrm{Var}(T_1) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right]$$
$$= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{n^2}\big(\underbrace{\theta^2 + \theta^2 + \dots + \theta^2}_{n\text{-times}}\big) = \frac{1}{n^2}\left(n\theta^2\right)$$
$$\mathrm{Var}(T_1) = \frac{\theta^2}{n} \to 0 \text{ as } n \to \infty$$
Hence, E(T1) = θ and Var(T1) → 0 as n → ∞, therefore, by the
sufficient conditions of consistency, the estimator T1 is a consistent
estimator of the parameter θ.

We consider
$$\mathrm{Var}(T_2) = \mathrm{Var}\left[\frac{1}{n+1}(X_1 + X_2 + \dots + X_n)\right]$$
$$= \frac{1}{(n+1)^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{(n+1)^2}\big(\underbrace{\theta^2 + \theta^2 + \dots + \theta^2}_{n\text{-times}}\big) = \frac{1}{(n+1)^2}\left(n\theta^2\right)$$
$$\mathrm{Var}(T_2) = \frac{n\theta^2}{(n+1)^2} \to 0 \text{ as } n \to \infty$$
Also,
$$E(T_2) = \frac{n}{n+1}\,\theta = \frac{1}{\left(1 + \frac{1}{n}\right)}\,\theta \to \theta \text{ as } n \to \infty$$
Hence, E(T2) → θ and Var(T2) → 0 as n → ∞, therefore, by the
sufficient conditions of consistency, the estimator T2 is also a
consistent estimator of the parameter θ.

Hence, both estimators T1 and T2 are consistent estimators of the
parameter θ, but only T1 is unbiased; T2 is biased for every finite n
and is only asymptotically unbiased.
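The expressions derived in part (ii) can be evaluated exactly for growing n. The sketch below assumes θ = 2 purely for illustration (the SAQ does not fix a value), and the function name `moments` is hypothetical. It uses only the formulas E(T1) = θ, Var(T1) = θ²/n, E(T2) = nθ/(n + 1) and Var(T2) = nθ²/(n + 1)².

```python
from fractions import Fraction

THETA = Fraction(2)  # assumed value of the exponential mean, for illustration only


def moments(n):
    """Exact (mean, variance) of T1 = sum(X)/n and T2 = sum(X)/(n + 1)."""
    sum_mean = n * THETA          # E(X1 + ... + Xn)
    sum_var = n * THETA ** 2      # Var(X1 + ... + Xn), using independence
    t1 = (sum_mean / n, sum_var / n ** 2)
    t2 = (sum_mean / (n + 1), sum_var / (n + 1) ** 2)
    return t1, t2


for n in (10, 100, 1000, 10000):
    (m1, v1), (m2, v2) = moments(n)
    # Columns: n, Var(T1), bias of T2, Var(T2); the bias of T1 is always 0.
    print(n, float(v1), float(m2 - THETA), float(v2))
```

Both variances and the bias of T2 shrink towards zero as n grows, which is what the sufficient conditions of consistency require.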

(iii) We know that the pdf of the uniform distribution with parameters ‘a’ and
‘b’ is as follows:
$$f(x) = \frac{1}{b-a}; \quad a \le x \le b$$
The mean and variance of the uniform distribution are given as follows:
$$E(X) = \frac{a+b}{2} \text{ and } \mathrm{Var}(X) = \frac{(b-a)^2}{12}$$
In our case, a = θ and b = θ + 6, therefore, the pdf, mean and variance
are given as follows:
$$f(x) = \frac{1}{\theta + 6 - \theta} = \frac{1}{6}; \quad \theta \le x \le \theta + 6$$
$$E(X) = \frac{\theta + \theta + 6}{2} = \theta + 3 \text{ and } \mathrm{Var}(X) = \frac{(\theta + 6 - \theta)^2}{12} = 3$$
Let X1, X2, ..., Xn denote the weight of the person on n random winter
days, therefore,
$$E(X_i) = E(X) = \theta + 3 \text{ and } \mathrm{Var}(X_i) = \mathrm{Var}(X) = 3 \text{ for all } i = 1, 2, \dots, n$$

To show that $\bar{X}$ is an unbiased estimator of (θ + 3), we consider
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] \quad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] \quad [\because E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\big[\underbrace{(\theta + 3) + (\theta + 3) + \dots + (\theta + 3)}_{n\text{-times}}\big]$$
$$= \frac{1}{n}\,n(\theta + 3) = \theta + 3$$
Therefore, the sample mean is an unbiased estimator of (θ + 3).

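For consistency, we have to show that
$$E(\bar{X}) \to \theta + 3 \text{ and } \mathrm{Var}(\bar{X}) \to 0 \text{ as } n \to \infty$$
Therefore, we consider,
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right]$$
$$= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{n^2}\big(\underbrace{3 + 3 + \dots + 3}_{n\text{-times}}\big) = \frac{1}{n^2}(3n) = \frac{3}{n}$$
$$\mathrm{Var}(\bar{X}) = \frac{3}{n} \to 0 \text{ as } n \to \infty$$
Thus, $E(\bar{X}) = \theta + 3$ and $\mathrm{Var}(\bar{X}) \to 0$ as n → ∞.

Hence, by the sufficient conditions of consistency, the sample
mean is also a consistent estimator of (θ + 3).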

2. We know that the mean and variance of the geometric distribution with
parameter θ are given by
$$E(X) = \frac{1}{\theta} \text{ and } \mathrm{Var}(X) = \frac{1-\theta}{\theta^2}$$
Since X1, X2, ..., Xn are independent and come from the same geometric
distribution, therefore,
$$E(X_i) = E(X) \text{ and } \mathrm{Var}(X_i) = \mathrm{Var}(X) \text{ for all } i = 1, 2, \dots, n$$

First, we show that the sample mean $\bar{X}$ is a consistent estimator of 1/θ.

Therefore, we consider,
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] \quad [\text{by definition of the sample mean}]$$
$$= \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] \quad [\because E(aX + bY) = aE(X) + bE(Y)]$$
$$= \frac{1}{n}\Big(\underbrace{\frac{1}{\theta} + \frac{1}{\theta} + \dots + \frac{1}{\theta}}_{n\text{-times}}\Big) = \frac{1}{n}\left(\frac{n}{\theta}\right) = \frac{1}{\theta}$$
$$E(\bar{X}) = \frac{1}{\theta}$$
Thus, the sample mean is an unbiased estimator of 1/θ.
We now consider
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right]$$
$$= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right]$$
$$= \frac{1}{n^2}\Big[\underbrace{\frac{1-\theta}{\theta^2} + \frac{1-\theta}{\theta^2} + \dots + \frac{1-\theta}{\theta^2}}_{n\text{-times}}\Big] = \frac{1}{n^2}\left[n\left(\frac{1-\theta}{\theta^2}\right)\right] = \frac{1}{n}\left(\frac{1-\theta}{\theta^2}\right)$$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n}\left(\frac{1-\theta}{\theta^2}\right) \to 0 \text{ as } n \to \infty$$
Since $E(\bar{X}) = \dfrac{1}{\theta}$ and $\mathrm{Var}(\bar{X}) \to 0$ as n → ∞,
by the sufficient conditions of consistency, the sample
mean $\bar{X}$ is a consistent estimator of 1/θ.
Since $e^{1/\theta}$ is a continuous function of 1/θ, therefore, by the invariance
property of consistency, $e^{\bar{X}}$ is a consistent estimator of $e^{1/\theta}$.

Terminal Questions (TQs)
1. Refer to Section 7.2.

2. To show that the proposed estimator S² is a consistent estimator, we
have to show that
• S² is asymptotically unbiased, that is,
E(S²) → σ² as n → ∞, and
• The variance of S² tends to zero as n tends to infinity, that is,
Var(S²) → 0 as n → ∞

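Therefore, we consider
$$E(S^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] = \frac{1}{n-1}\,E\left[\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right]$$
$$= \frac{1}{n-1}\,E\left[(n-1)S^2\right] \quad \left[\because S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \Rightarrow \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = (n-1)S^2\right]$$
$$= \frac{\sigma^2}{n-1}\,E\left[\frac{(n-1)S^2}{\sigma^2}\right] \quad [\text{multiplying and dividing by } \sigma^2]$$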

Recall from Units 3 and 4 that, when the sample is drawn from a normal
population, $\dfrac{(n-1)S^2}{\sigma^2}$ follows the chi-square
distribution with (n − 1) degrees of freedom, and from the properties of
the chi-square distribution, we have the mean and variance of the
chi-square distribution with (n − 1) degrees of freedom as (n − 1) and
2(n − 1), respectively, therefore, we have
$$E\left[\frac{(n-1)S^2}{\sigma^2}\right] = E\left[\chi^2_{(n-1)}\right] = n - 1$$
and
$$\mathrm{Var}\left[\frac{(n-1)S^2}{\sigma^2}\right] = \mathrm{Var}\left[\chi^2_{(n-1)}\right] = 2(n-1)$$

Therefore, we have
$$E(S^2) = \frac{\sigma^2}{n-1}\,E\left[\frac{(n-1)S^2}{\sigma^2}\right] = \frac{\sigma^2}{n-1}\,(n-1) = \sigma^2$$
Hence S² is an unbiased estimator for σ².

We now consider the variance of S² as
$$\mathrm{Var}(S^2) = \mathrm{Var}\left[\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right]$$
$$= \frac{1}{(n-1)^2}\,\mathrm{Var}\left[\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right] \quad [\because \mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)]$$
$$= \frac{1}{(n-1)^2}\,\mathrm{Var}\left[\sigma^2\,\frac{(n-1)S^2}{\sigma^2}\right]$$
$$= \frac{(\sigma^2)^2}{(n-1)^2}\,\mathrm{Var}\left[\chi^2_{(n-1)}\right] = \frac{(\sigma^2)^2}{(n-1)^2} \times 2(n-1) = \frac{2\sigma^4}{n-1}$$
$$\lim_{n \to \infty} \mathrm{Var}(S^2) = 2\sigma^4 \lim_{n \to \infty}\left(\frac{1}{n-1}\right) = 2\sigma^4 \times 0 = 0$$
Hence, by the sufficient conditions of consistency, the estimator
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
is a consistent estimator for the population variance σ².
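The two results E(S²) = σ² and Var(S²) = 2σ⁴/(n − 1) can be checked by simulation. This is a minimal sketch, assuming a normal population with mean 0 and σ = 1 (values chosen only for illustration); the helper `sample_variance` is hypothetical and simply implements the divisor (n − 1) formula of the terminal question.

```python
import random
import statistics

SIGMA = 1.0  # assumed population standard deviation (illustrative only)


def sample_variance(xs):
    """S^2 with divisor n - 1, as in the terminal question."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


rng = random.Random(7)
summary = {}
for n in (5, 20, 80):
    s2_values = [sample_variance([rng.gauss(0.0, SIGMA) for _ in range(n)])
                 for _ in range(2000)]
    summary[n] = (statistics.fmean(s2_values),      # should be near sigma^2
                  statistics.pvariance(s2_values),  # should be near 2*sigma^4/(n-1)
                  2 * SIGMA ** 4 / (n - 1))         # theoretical variance
    print(n, [round(v, 4) for v in summary[n]])
```

The simulated mean of S² stays near σ² for every n, while its simulated variance falls roughly in line with 2σ⁴/(n − 1).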

UNIT 8
EFFICIENCY AND MEAN SQUARED ERROR

Structure
8.1 Introduction
    Expected Learning Outcomes
8.2 Concept of Efficiency
8.3 Most Efficient Estimator
8.4 Properties of Efficient Estimator
8.5 Mean Squared Error
8.6 Minimum Variance Unbiased Estimator
8.7 Summary
8.8 Terminal Questions
8.9 Solutions / Answers

8.1 INTRODUCTION
In the previous Units 6 and 7, we discussed two characteristics of a good
estimator, unbiasedness and consistency, with various examples. I hope you
understand both properties. You have also seen that the sample mean and
sample median both are unbiased and consistent for the population mean µ
when sampling is done from a normal population with mean µ and variance σ².
Now, the question may arise: Are they as “good” as one another, or is there
some reason to prefer one over another? This means that we need to consider
other characteristics of a good estimator to check which one is better in
comparison to another. Thus, this unit is devoted to explaining the concepts of
efficiency, mean squared error and minimum variance unbiased estimator,
which help us to compare estimators and decide which one is better.
This unit is divided into nine sections. Section 8.1 is introductory in nature.
There may exist more than one unbiased estimator of a parameter, therefore,
to check which one is better, we explain the concept of efficiency in Section
8.2. If we have a class of unbiased estimators of a parameter, then to compare
them, we use the concept of the most efficient estimator which is explained in
Section 8.3. Section 8.4 is devoted to discussing the properties of efficient
estimators. Section 8.5 explains the concept of the mean squared error.
Section 8.6 describes the minimum variance unbiased estimator. The unit
ends by providing a summary of what we have discussed in this unit in Section
8.7. The terminal questions and the solutions of the SAQs/TQs are given in
Sections 8.8 and 8.9, respectively.

Tools You Will Need
The following terms are considered essential background material for this
Unit. If you doubt your knowledge of any of these terms, you should review
the appropriate Unit or section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Basic terms of estimation (Unit 6).
• Unbiasedness and consistency (Units 6 and 7).
• Probability distributions (MST-012).

Unit Writer: Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi

Expected Learning Outcomes


After studying this unit, you should be able to:
• comprehend the concept of efficiency of an estimator;
• explain the concept of the most efficient estimator;
• describe various properties of an efficient estimator;
• define the mean squared error of an estimator; and
• describe the concept of minimum variance unbiased estimator.

8.2 CONCEPT OF EFFICIENCY


In some situations, we may see that there is more than one unbiased estimator
of the same parameter. For example, the sample mean and the
sample median are both unbiased estimators of the population
mean µ when the sampling is done from a normal population with mean µ and
variance σ². Now, the question may arise: Are they all as “good” as one
another, or is there some reason to prefer one over another? Let us assume
that we have two unbiased estimators, say, T1 and T2 (based on the same
sample size) for the same parameter θ and they have variances Var(T1) and
Var(T2), respectively. Suppose the shapes of the sampling distributions of both
estimators are as shown in Fig. 8.1.

Fig. 8.1: Sampling distributions of estimators T1 and T2.

From Fig. 8.1, you can observe that the centre of both sampling distributions is
θ, so both estimators are unbiased. However, the sampling distribution of the
estimator T2 is more spread out than that of the estimator T1; therefore, we can
conclude that the variance (spread) of the estimator T1 is smaller than that of the
estimator T2. Clearly, we desire an estimator whose sampling
distribution is not too spread out around the true value of the parameter,
because if it is too spread out, there is a high probability that an estimate
will lie at a significant distance from the true value of the
parameter. Therefore, there is a necessity for some further criterion which will
enable us to choose between the estimators with the common property of
unbiasedness. One way to compare estimators is by looking at their variance.
If one unbiased estimator has a lower variance than another unbiased
estimator, we say that the one with a lower variance is more efficient than the
one with a higher variance. Such a criterion, which is based on the variances of
the sampling distributions of the estimators, is usually known as efficiency. We
can define it as follows:
If T1 and T2 are two unbiased estimators of a parameter θ based on the same
sample size n, then the estimator T1 is said to be more efficient than the estimator
T2 if
Var(T1) < Var(T2) for all n
It means that if we want to compare (which one is better) two unbiased
estimators of a parameter based on the same sample size, then we can compare their
variances, and the one which has the smaller variance is said to be more efficient. An
estimator with a smaller variance is relatively more efficient because its values
are concentrated more closely around the true value of the parameter. Efficiency in
statistical inference is important in comparing the performance of various
estimators. The efficiency of an estimator can also be treated as the precision
of the estimate. If an estimator is more efficient then we can say that it is the
more precise estimator of the parameter.
Note: There are some textbooks in which equal sample sizes are not mentioned.
But this seems a bit unfair because, as you know, the variance of an
estimator decreases as the sample size increases. In practice, the sample size
is fixed. It is hard to imagine a situation where you would select an
estimator that is more efficient at a larger sample size than the sample size of
your data.
Let us look at an example to see how this definition works.
Example 1: A company produces batteries for laptops and wants to estimate
the average life of the batteries. For that, the statistician of the company
selected 5 batteries from the production and measured their lives. He
suggests two unbiased estimators for estimating the average life of the
batteries:
$$T_1 = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} \text{ and } T_2 = \frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5}{15}$$
where X1, X2, X3, X4 and X5 represent the lives of the selected batteries. If it is
known that the life of batteries has mean µ and variance σ², then which one is
more efficient?
Solution: We have to check which one of these proposed unbiased
estimators T1 and T2 is more efficient. Therefore, we have to find the variances
of both estimators and check which one is smaller. Since X1, X2, X3, X4 and X5
are independent and taken from the same population with mean µ and
variance σ², therefore,
$$E(X_i) = \mu \text{ and } \mathrm{Var}(X_i) = \sigma^2 \text{ for all } i = 1, 2, \dots, 5$$
So we consider,
$$\mathrm{Var}(T_1) = \mathrm{Var}\left(\frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}\right)$$
$$= \frac{1}{25}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4) + \mathrm{Var}(X_5)\right]$$
[∵ if X and Y are two independent random variables and a & b are two constants, then Var(aX + bY) = a²Var(X) + b²Var(Y)]
$$= \frac{1}{25}\left[\sigma^2 + \sigma^2 + \sigma^2 + \sigma^2 + \sigma^2\right] \quad [\because \mathrm{Var}(X_i) = \sigma^2]$$
$$= \frac{1}{25}\left(5\sigma^2\right)$$
$$\mathrm{Var}(T_1) = \frac{1}{5}\,\sigma^2$$
Similarly,
$$\mathrm{Var}(T_2) = \mathrm{Var}\left(\frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5}{15}\right)$$
$$= \frac{1}{225}\left[\mathrm{Var}(X_1) + 4\,\mathrm{Var}(X_2) + 9\,\mathrm{Var}(X_3) + 16\,\mathrm{Var}(X_4) + 25\,\mathrm{Var}(X_5)\right]$$
$$= \frac{1}{225}\left(\sigma^2 + 4\sigma^2 + 9\sigma^2 + 16\sigma^2 + 25\sigma^2\right) \quad [\because \mathrm{Var}(X_i) = \sigma^2]$$
$$= \frac{55\sigma^2}{225}$$
$$\mathrm{Var}(T_2) = \frac{11\sigma^2}{45}$$
Since Var(T1) < Var(T2), therefore, we conclude that the estimator T1 is more
efficient than T2.
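The arithmetic of Example 1 can be reproduced exactly. The sketch below relies on the fact used in the solution: for independent observations with common variance σ², the variance of a weighted sum Σ aᵢXᵢ is σ² Σ aᵢ². The function name `variance_factor` is hypothetical and is introduced only for this illustration.

```python
from fractions import Fraction


def variance_factor(coeffs):
    """Return c such that Var(sum a_i X_i) = c * sigma^2 for i.i.d. X_i."""
    return sum(Fraction(a) ** 2 for a in coeffs)


t1 = [Fraction(1, 5)] * 5                        # weights of T1
t2 = [Fraction(k, 15) for k in range(1, 6)]      # weights of T2: 1/15, ..., 5/15

# The weights of each estimator sum to 1, so both are unbiased for mu.
assert sum(t1) == 1 and sum(t2) == 1

print(variance_factor(t1), variance_factor(t2))  # 1/5 and 11/45
```

Since 1/5 = 9/45 is smaller than 11/45, the equally weighted estimator T1 has the smaller variance, in agreement with the solution.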
Example 2: Show that the sample mean is a more efficient estimator than the
sample median for estimating the mean of the normal population.
Solution: To show that the sample mean is a more efficient estimator than the
sample median for estimating the mean of the normal population, we have to
compare the variance of the sample mean with the variance of the sample
median.
Let X1, X2, …, Xn be a random sample taken from a normal population with
mean μ and variance σ². Also, let $\bar{X}$ and $\tilde{X}$ be the sample mean and sample
median, respectively. We have seen in Unit 2 that the sampling distribution of the
mean from a normal population follows a normal distribution with mean µ and
variance σ²/n. Similarly, for large samples, the sampling distribution of the median
from a normal population also follows a normal distribution with mean µ and
variance $\dfrac{\pi}{2}\,\dfrac{\sigma^2}{n}$.
Therefore,
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
$$\mathrm{Var}(\tilde{X}) = \frac{\pi\sigma^2}{2n}$$
Since $\dfrac{\sigma^2}{n} < \dfrac{\pi\sigma^2}{2n}$ $\left[\because \dfrac{\pi}{2} > 1\right]$, therefore, $\mathrm{Var}(\bar{X}) < \mathrm{Var}(\tilde{X})$. Thus, we conclude
that the sample mean is a more efficient estimator than the sample median.
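The comparison in Example 2 can also be seen empirically. The following is a rough simulation sketch, assuming a standard normal population (µ = 0, σ = 1) and a sample size of n = 101; both choices are illustrative assumptions and are not part of the example. For large n, the ratio Var(median)/Var(mean) is expected to be close to π/2 ≈ 1.57.

```python
import random
import statistics

rng = random.Random(11)
n, reps = 101, 3000          # assumed sample size and number of repeated samples

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)       # theory: sigma^2 / n
var_median = statistics.pvariance(medians)   # large-sample theory: pi * sigma^2 / (2n)
print(var_mean, var_median, var_median / var_mean)
```

A ratio above 1 means the sample median needs a larger sample than the sample mean to reach the same precision under normality.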

I hope you understood the concept of efficiency and how to check which one is
more efficient between the two estimators. Therefore, before going to the next
section, you should assess yourself by answering the following Self
Assessment Question.

SAQ 1
A company manufactures fruit juice packets. Suppose the weight of juice
packets follows a normal distribution with mean weight µ ml and standard
deviation σ ml. To estimate the average weight of the fruit juice packets, the
quality control inspector measured the weight of three selected fruit juice
packets X1, X2, and X3 ml and proposed two estimators for estimating the
average weight of fruit juice packets µ as follows:
$$T_1 = \frac{X_1 + X_2 + X_3}{3} \text{ and } T_2 = \frac{X_1 + X_2}{4} + \frac{X_3}{2}$$
Are both estimators unbiased for µ? Which one of them is more efficient?

8.3 MOST EFFICIENT ESTIMATOR


In the previous section, you studied the concept of efficiency. According to
this, if one unbiased estimator, say, T1 has lower variance than another
unbiased estimator, say, T2, based on the same sample size, then we say that
the estimator T1 is more efficient than the estimator T2. This concept is used when
we have to compare two unbiased estimators. Sometimes, we have a whole class of
unbiased estimators for a parameter; to compare the estimators in such a
class, we use the concept of the most efficient estimator. We can define
the most efficient estimator as follows:
In a class of unbiased estimators (based on the same sample size) of a
parameter, if there exists one estimator whose variance is minimum
(least) among the class, then it is said to be the most efficient estimator
of that parameter.
For example, suppose T1, T2 and T3 are three unbiased estimators of
parameter θ having variance 1/n, 1/(n+1) and 5/n, respectively. Since the
variance of estimator T2 is minimum, therefore, estimator T2 is the most
efficient estimator in that class.
Efficiency
The efficiency of an unbiased estimator measured relative to the most
efficient estimator is called “Absolute Efficiency”. If T* is the most efficient
estimator having variance Var(T*) and T is any other unbiased estimator
having variance Var(T), then the efficiency of T is defined as
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$
Since the variance of the most efficient estimator is minimum, therefore,
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)} \le 1$$
with equality only when T has the same variance as T*.
Let us take an example for illustration purposes.

Example 3: Suppose a market researcher proposed three unbiased
estimators for estimating the average life of LED bulbs produced by a
company on the basis of a sample of size 4 which are given as follows:
$$T_1 = \frac{X_1 + X_2 + X_3 + X_4}{4}, \quad T_2 = \frac{2X_1 + 3X_2 + \alpha X_4}{10}, \quad T_3 = \frac{X_1 + X_2 + \beta X_3}{5}$$
where X1, X2, X3, and X4 represent the lives of the selected LED bulbs in the
random sample. It is known that the life of the LED bulbs has mean µ and
variance σ².
(i) Find the values of α and β.
(ii) Which one is the most efficient estimator?
(iii) Calculate the efficiency of the remaining estimators.
Solution: Since it is given that the estimators are unbiased, therefore, by the
definition of the unbiased estimator, their expected values are equal to the
average life of the LED bulbs, that is,
$$E(T_1) = E(T_2) = E(T_3) = \mu$$
To find the value of α, we consider
$$E(T_2) = \mu$$
$$E\left(\frac{2X_1 + 3X_2 + \alpha X_4}{10}\right) = \mu \Rightarrow \frac{2E(X_1) + 3E(X_2) + \alpha E(X_4)}{10} = \mu \quad [\because E(aX + bY) = aE(X) + bE(Y)]$$
Since X1, X2, X3, and X4 are independent and taken from the same group of
the LED bulbs (population) with mean µ and variance σ², therefore,
$$E(X_i) = \mu \text{ and } \mathrm{Var}(X_i) = \sigma^2 \text{ for all } i = 1, 2, \dots, 4$$
Therefore,
$$\frac{2\mu + 3\mu + \alpha\mu}{10} = \mu \Rightarrow \alpha = 5$$
Similarly, to find the value of β, we consider
$$E(T_3) = \mu$$
$$E\left(\frac{X_1 + X_2 + \beta X_3}{5}\right) = \mu \Rightarrow \frac{E(X_1) + E(X_2) + \beta E(X_3)}{5} = \mu$$
$$\frac{\mu + \mu + \beta\mu}{5} = \mu \Rightarrow \beta = 3$$
To check which one of these proposed estimators T1, T2 and T3 is most
efficient, we have to find the variances of the estimators and check which one is
the smallest. Therefore, we consider
$$\mathrm{Var}(T_1) = \mathrm{Var}\left(\frac{X_1 + X_2 + X_3 + X_4}{4}\right)$$
$$= \frac{1}{16}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \mathrm{Var}(X_4)\right]$$
[∵ if X and Y are two independent random variables and a & b are two constants, then Var(aX + bY) = a²Var(X) + b²Var(Y)]
$$= \frac{1}{16}\left[\sigma^2 + \sigma^2 + \sigma^2 + \sigma^2\right] = \frac{1}{16}\left(4\sigma^2\right)$$
$$\mathrm{Var}(T_1) = \frac{\sigma^2}{4}$$
Similarly,
$$\mathrm{Var}(T_2) = \mathrm{Var}\left(\frac{2X_1 + 3X_2 + 5X_4}{10}\right)$$
$$= \frac{1}{100}\left[4\,\mathrm{Var}(X_1) + 9\,\mathrm{Var}(X_2) + 25\,\mathrm{Var}(X_4)\right]$$
$$= \frac{1}{100}\left[4\sigma^2 + 9\sigma^2 + 25\sigma^2\right] = \frac{38}{100}\,\sigma^2$$
Similarly,
$$\mathrm{Var}(T_3) = \mathrm{Var}\left(\frac{X_1 + X_2 + 3X_3}{5}\right) = \frac{1}{25}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + 9\,\mathrm{Var}(X_3)\right]$$
$$= \frac{1}{25}\left[\sigma^2 + \sigma^2 + 9\sigma^2\right] = \frac{11}{25}\,\sigma^2$$
Since the variance of the estimator T1 is minimum, therefore, by the definition
of the most efficient estimator, we conclude that the estimator T1 is the most
efficient estimator in the class of three unbiased estimators.
We now come to part (iii). We can calculate the efficiency of an unbiased
estimator as
$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$
where T* is the most efficient estimator.

Since the estimator T1 is the most efficient estimator in the class of three
unbiased estimators, therefore, for computing the efficiency of estimator T2, we
take estimator T1 in place of T*:
$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_2)} = \frac{\sigma^2/4}{38\sigma^2/100} = \frac{100}{38 \times 4} = 0.658$$
Similarly, we can compute the efficiency of estimator T3 as follows:
$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_3)} = \frac{\sigma^2/4}{11\sigma^2/25} = \frac{25}{11 \times 4} = 0.568$$
Hence, we conclude that estimator T2 is more efficient than
estimator T3.
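All three parts of Example 3 can be re-computed with exact fractions. The sketch below is illustrative only: it uses the facts from the solution that an unbiased weighted sum needs weights summing to 1 and that Var(Σ aᵢXᵢ) = σ² Σ aᵢ² for independent observations. The dictionary names `weights`, `var_factor` and `efficiency` are hypothetical.

```python
from fractions import Fraction

# Unbiasedness requires the weights of each estimator to sum to 1.
alpha = 10 - (2 + 3)        # from (2 + 3 + alpha) / 10 = 1
beta = 5 - (1 + 1)          # from (1 + 1 + beta) / 5 = 1

weights = {
    "T1": [Fraction(1, 4)] * 4,
    "T2": [Fraction(2, 10), Fraction(3, 10), Fraction(alpha, 10)],
    "T3": [Fraction(1, 5), Fraction(1, 5), Fraction(beta, 5)],
}

# Var(T) = (sum of squared weights) * sigma^2 for independent observations.
var_factor = {name: sum(w ** 2 for w in ws) for name, ws in weights.items()}
best = min(var_factor, key=var_factor.get)          # most efficient in the class
efficiency = {name: var_factor[best] / v for name, v in var_factor.items()}

print(alpha, beta, best)
for name in weights:
    print(name, var_factor[name], round(float(efficiency[name]), 3))
```

The variance factors come out as 1/4, 19/50 (= 38/100) and 11/25, so T1 is the most efficient and the efficiencies of T2 and T3 are 25/38 and 25/44.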
Note 1: An unbiased estimator is usually preferred over a biased
one. However, there are situations in which a biased estimator with higher efficiency
can be more valuable than an unbiased estimator with lower efficiency.
Note 2: The relative efficiency of two estimators may depend on the
distribution involved. For example, the mean is more efficient than the median
for the normal distribution; however, this is not necessarily the case for highly
skewed distributions.
I think you are now curious to compute the efficiency of an estimator yourself.
Therefore, you can try the following Self Assessment Question.

SAQ 2
Consider the question of manufacturing fruit juice packets discussed in
SAQ 1. Suppose the quality control inspector proposed a third estimator for
estimating the average weight of fruit juice packets µ as follows:
$$T_3 = \frac{X_1 + 2X_2 + 3X_3}{6}$$
(i) Is estimator T3 unbiased for µ?
(ii) Which one is the most efficient estimator among the three?
(iii) Calculate the efficiency of the remaining estimators.

After understanding the concept of the efficient estimator, we now discuss


some important properties of the same in the next section.

8.4 PROPERTIES OF EFFICIENT ESTIMATOR


After understanding the concept of efficiency and how to calculate it, we now
discuss some properties of the efficient estimator as follows:

1. Efficient estimators are not necessarily unbiased or consistent (see


Example 4).

2. The most efficient estimator is unique.


After understanding the concept of efficiency and how to check whether an
unbiased estimator is more efficient or not, we would like to indicate one
weakness of the concept of efficiency: it is restricted to unbiased
estimators and excludes biased estimators. Although an unbiased estimator is
usually preferred over a biased one, there are situations in which a
biased estimator with a smaller variance can be more valuable than an unbiased
estimator with a larger variance. Therefore, in such cases, we require some
other criterion which can compare such estimators. In
the next section, we introduce such a tool, known as the mean
squared error.

8.5 MEAN SQUARED ERROR


In the previous sections, you studied the concept of efficiency and the most
efficient estimators. With the help of efficiency, we can compare unbiased
estimators and judge which one is better or most efficient in the class of
unbiased estimators. You also noticed that the concept of efficiency is
restricted to unbiased estimators and excludes biased estimators. But there
exist so many situations where a biased estimator has a smaller variance in
comparison to the unbiased estimator for a parameter. For example, if an
investigator proposed two estimators for the average height of the young
males in a city as follows:
1 n
(i) T1 = Sample mean X = ∑ Xi
n i=1
208
Unit 8 Efficiency and Mean Squared Error
(ii) T2 = a constant = 165 cm

The first estimator T1 is the sample mean, which is unbiased, and its value
changes from sample to sample. Therefore, it has a certain variance
greater than zero. But the second estimator T2 does not change with the
samples and always takes a single value, so its variance is zero; however, it is
biased unless the true average height happens to be exactly 165 cm, and the
bias can be large.
Similarly, an estimator that multiplies the sample mean by [n/(n+1)] will
underestimate a positive population mean (biased estimator) but has a smaller
variance. Therefore, the question may arise:
(i) Is a biased estimator with a smaller variance better than an unbiased
estimator with a larger variance?
(ii) How can we compare such estimators?
Think about that. To compare these estimators, we require a measuring
device that explicitly trades off the bias against the variance of an estimator. A
simple approach is to compare estimators based on their mean squared error.
It permits us to compare biased and unbiased estimators.
In statistics, the mean squared error is an essential measure which is used to
assess the performance of a point estimator (biased or unbiased). It is also
necessary for relating the concepts of precision, bias and accuracy during
the statistical estimation. It is abbreviated as MSE. The mean squared error
measures the average squared difference between the estimator and the
parameter.
Therefore, we can define the mean squared error of an estimator T of a
parameter θ as
$$\mathrm{MSE} = E[T - \theta]^2$$
It is a function of parameter θ.
Note: The mean squared error may be called a risk function, namely the
expected value of the squared error loss. This loss may arise due to
randomness or because the estimator does not represent the true
unknown parameter.

We can also express the mean squared error as
$$\mathrm{MSE} = E[T - \theta]^2 = E\left[T - E(T) + E(T) - \theta\right]^2 \quad [\text{add and subtract } E(T)]$$
$$= E\left[\{T - E(T)\}^2 + 2\{T - E(T)\}\{E(T) - \theta\} + \{E(T) - \theta\}^2\right]$$
$$= E\{T - E(T)\}^2 + 2E\left[\{T - E(T)\}\{E(T) - \theta\}\right] + E\{E(T) - \theta\}^2$$
$$= \mathrm{Var}(T) + 2\{E(T) - E(T)\}\{E(T) - \theta\} + E\{\mathrm{Bias}(T, \theta)\}^2 \quad [\because E(E(T)) = E(T)]$$
$$\mathrm{MSE} = \mathrm{Var}(T) + \{\mathrm{Bias}(T, \theta)\}^2 \quad \left[\because E\{\mathrm{Bias}(T, \theta)\}^2 = \{\mathrm{Bias}(T, \theta)\}^2\right]$$
Thus, the mean squared error incorporates two components, one measuring
the variability of the estimator (precision) and the other measuring its bias
(accuracy). Therefore, a desirable property of a good estimator is not only that
it is unbiased but also that it has a small variance. An estimator which has a
smaller mean squared error is said to be better than the other, regardless of
whether they are biased or unbiased. In this sense, an estimator is efficient if
both its variance and its bias are small. You can easily
understand the same using an example of a dart board on which there are
several situations of hits which are shown in Fig. 8.2.
From Fig. 8.2, you can observe that the hitting of the target is very good if the
hits have less variation (closely packed) and are at the centre or near the
centre.
In a similar way, we can say that an estimator is very good if its variance is
small and it is unbiased or has a small bias.

Fig. 8.2: Several situations of hits on a dart board.
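The decomposition MSE = Var(T) + {Bias(T, θ)}² can be verified numerically. The sketch below is an illustration under assumed values: a normal population of heights with mean 165 cm and standard deviation 6 cm, and samples of size 10 (none of these numbers come from the unit). It compares the sample mean with the estimator that multiplies the sample mean by n/(n + 1); the helper `summarise` is hypothetical.

```python
import random
import statistics

MU, SIGMA, N = 165.0, 6.0, 10   # assumed population mean, SD and sample size
rng = random.Random(3)


def summarise(estimates, theta):
    """Return (bias, variance, mse) of simulated estimates of theta."""
    bias = statistics.fmean(estimates) - theta
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    return bias, variance, mse


sample_means = [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
                for _ in range(5000)]
shrunk = [N / (N + 1) * m for m in sample_means]   # the n/(n+1) * mean estimator

for name, est in (("mean", sample_means), ("shrunk", shrunk)):
    bias, variance, mse = summarise(est, MU)
    # The last two columns agree: mse equals variance + bias^2.
    print(name, round(bias, 3), round(variance, 3), round(mse, 3),
          round(variance + bias ** 2, 3))
```

Under these assumed values the shrunk estimator has the smaller variance but a large negative bias, so its mean squared error is much larger; whether a biased estimator wins on MSE depends on how the bias compares with the reduction in variance.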

If the estimator is unbiased, then its bias is zero and the mean squared error
is equal to the variance of the estimator:
$$\mathrm{MSE} = \mathrm{Var}(T) + \{0\}^2 = \mathrm{Var}(T)$$
Therefore, for comparing the estimators, we compare the mean squared error
regardless of whether they are biased or unbiased. If T1 and T2 are two
estimators (biased or unbiased) of a parameter θ based on the same sample
size n, then the estimator T1 is said to be more efficient than the estimator T2 if
$$\mathrm{MSE}(T_1) < \mathrm{MSE}(T_2) \text{ for all } n$$
We can also compare the mean squared errors of two estimators by using
relative efficiency. If T1 and T2 are two estimators, then the efficiency of T1
relative to T2 is
$$e(T_1, T_2) = \frac{\mathrm{MSE}(T_2)}{\mathrm{MSE}(T_1)}$$

Sometimes the mean squared error of an unbiased estimator is greater than
that of a biased estimator. In such a situation, we prefer the biased estimator.
For a better understanding of the concept of the mean squared error, we take
an example.
Example 4: Suppose the market researcher of Example 3 proposed the
following estimators for estimating the average life of LED bulbs produced by
the company:

$$T_1 = \frac{1}{2}X_1 + \frac{1}{4}X_2 + \frac{1}{4}X_3 + 2, \qquad T_2 = \frac{1}{2}X_1 - X_2 + X_3 + \frac{1}{2}X_4$$

where X1, X2, X3 and X4 represent the lives of the selected LED bulbs in the
random sample. It is known that the life of the LED bulbs has mean µ and
variance 2.
(i) Check whether the estimators are unbiased or not.
(ii) Find the bias, variance and mean squared error.
(iii) Which one is the more efficient estimator?
Solution: We have to check whether the estimators T1 and T2 are unbiased.
Therefore, we find E(T1) and E(T2) and check whether they are equal to µ.
Since X1, X2, X3 and X4 are independent and taken from the same population
with mean µ and variance σ², we have

$$E(X_i) = \mu \quad \text{and} \quad \mathrm{Var}(X_i) = \sigma^2 \quad \text{for all } i = 1, 2, 3, 4$$

Recall that if X and Y are two random variables and a & b are two constants,
then E(aX + bY) = aE(X) + bE(Y). So we consider

$$\begin{aligned}
E(T_1) &= E\left(\frac{1}{2}X_1 + \frac{1}{4}X_2 + \frac{1}{4}X_3 + 2\right) \\
&= \frac{1}{2}E(X_1) + \frac{1}{4}E(X_2) + \frac{1}{4}E(X_3) + E(2) \\
&= \frac{1}{2}\mu + \frac{1}{4}\mu + \frac{1}{4}\mu + 2 = \mu + 2 \qquad [\because E(a) = a]
\end{aligned}$$

Since E(T1) = µ + 2 ≠ µ, the estimator T1 is not an unbiased estimator of the
parameter µ.
Similarly, we consider

$$\begin{aligned}
E(T_2) &= E\left(\frac{1}{2}X_1 - X_2 + X_3 + \frac{1}{2}X_4\right) \\
&= \frac{1}{2}E(X_1) - E(X_2) + E(X_3) + \frac{1}{2}E(X_4) \\
&= \frac{1}{2}\mu - \mu + \mu + \frac{1}{2}\mu = \mu
\end{aligned}$$

Since E(T2) = µ, the estimator T2 is an unbiased estimator of the parameter µ.

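We now find the bias of the estimator that is not unbiased:

$$\mathrm{Bias}(T_1) = E(T_1) - \mu = \mu + 2 - \mu = 2$$

We now find the variance of both estimators. Recall that if X and Y are two
independent random variables and a & b are two constants, then
Var(aX + bY) = a²Var(X) + b²Var(Y). Therefore,

$$\begin{aligned}
\mathrm{Var}(T_1) &= \mathrm{Var}\left(\frac{1}{2}X_1 + \frac{1}{4}X_2 + \frac{1}{4}X_3 + 2\right) \\
&= \frac{1}{4}\mathrm{Var}(X_1) + \frac{1}{16}\mathrm{Var}(X_2) + \frac{1}{16}\mathrm{Var}(X_3) + \mathrm{Var}(2) \\
&= \frac{1}{4}\sigma^2 + \frac{1}{16}\sigma^2 + \frac{1}{16}\sigma^2 + 0 = \frac{6}{16}\sigma^2 \qquad [\because \mathrm{Var}(a) = 0]
\end{aligned}$$

$$\mathrm{Var}(T_1) = \frac{6}{16} \times 2 = 0.75$$

Similarly,

$$\begin{aligned}
\mathrm{Var}(T_2) &= \mathrm{Var}\left(\frac{1}{2}X_1 - X_2 + X_3 + \frac{1}{2}X_4\right) \\
&= \frac{1}{4}\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) + \frac{1}{4}\mathrm{Var}(X_4) \\
&= \frac{1}{4}\sigma^2 + \sigma^2 + \sigma^2 + \frac{1}{4}\sigma^2 = \frac{5}{2}\sigma^2
\end{aligned}$$

$$\mathrm{Var}(T_2) = \frac{5}{2} \times 2 = 5$$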
We can calculate the mean squared error of both estimators as

$$\mathrm{MSE}(T_1) = \mathrm{Var}(T_1) + \{\mathrm{Bias}(T_1, \mu)\}^2 = 0.75 + 4 = 4.75$$

$$\mathrm{MSE}(T_2) = \mathrm{Var}(T_2) + \{\mathrm{Bias}(T_2, \mu)\}^2 = 5 + 0 = 5$$

Since MSE(T1) = 4.75 < MSE(T2) = 5, we conclude that the estimator T1 is more
efficient than the estimator T2.
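
The arithmetic of Example 4 can be checked with a short script. This is only a sketch under stated assumptions: each estimator is treated as a linear combination of independent observations plus a constant, T2 is read as (1/2)X1 − X2 + X3 + (1/2)X4 (the form that agrees with the worked variance (5/2)σ²), and the helper name `linear_estimator_summary` is hypothetical, not part of the course material.

```python
from fractions import Fraction as F

def linear_estimator_summary(coeffs, const, sigma2):
    """Return (bias, variance, mse) of T = sum(a_i * X_i) + const as an estimator of mu.

    Assumes independent X_i with common mean mu and variance sigma2, and
    coefficients that sum to 1 so that the bias does not depend on mu.
    """
    if sum(coeffs) != 1:
        raise ValueError("coefficients must sum to 1 for a mu-free bias")
    bias = const                                     # E(T) - mu = const
    variance = sum(a * a for a in coeffs) * sigma2   # Var(aX) = a^2 Var(X)
    mse = variance + bias ** 2                       # MSE = Var + Bias^2
    return bias, variance, mse

sigma2 = 2  # population variance given in Example 4
t1 = linear_estimator_summary([F(1, 2), F(1, 4), F(1, 4)], 2, sigma2)
t2 = linear_estimator_summary([F(1, 2), -1, 1, F(1, 2)], 0, sigma2)
print("T1 (bias, variance, MSE):", t1)
print("T2 (bias, variance, MSE):", t2)
```

Exact fractions are used so that the comparison of the two mean squared errors is not affected by rounding.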

Example 5: Suppose the counsellor of the MST-016 course of the MSCAST
programme gave the problem of estimating the variation in the marks of the
cute play school children discussed in Unit 2 to two groups of learners.
Suppose the first group of learners estimates it using the sample variance S²
whereas the second group uses the sample variance S′². Then
(i) Check whether both estimators are unbiased.
(ii) Find the mean squared error of the estimators.
Solution: As we know,

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \quad \text{and} \quad S'^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{n-1}{n}S^2$$

We also know (from Unit 3) that the sample variance has mean

$$E(S^2) = \sigma^2$$

and variance

$$\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}$$
Since E(S²) = σ², S² is unbiased, whereas

$$E(S'^2) = \frac{n-1}{n}E(S^2) = \frac{n-1}{n}\sigma^2 \qquad \left[\because S'^2 = \frac{n-1}{n}S^2\right]$$

Since E(S′²) ≠ σ², S′² is a biased estimator.

We can also find the variance of S′² as

$$\mathrm{Var}(S'^2) = \mathrm{Var}\left(\frac{n-1}{n}S^2\right) = \left(\frac{n-1}{n}\right)^2 \mathrm{Var}(S^2) = \left(\frac{n-1}{n}\right)^2 \frac{2\sigma^4}{n-1} = \frac{2(n-1)\sigma^4}{n^2}$$

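We now calculate the mean squared errors as

$$\mathrm{MSE}(S^2) = \mathrm{Var}(S^2) + (\mathrm{Bias})^2 = \frac{2\sigma^4}{n-1} + 0 = \frac{2\sigma^4}{n-1}$$

$$\begin{aligned}
\mathrm{MSE}(S'^2) &= \mathrm{Var}(S'^2) + (\mathrm{Bias})^2 = \frac{2(n-1)\sigma^4}{n^2} + \left(\frac{n-1}{n}\sigma^2 - \sigma^2\right)^2 \\
&= \frac{2(n-1)\sigma^4}{n^2} + \left(\frac{n-1-n}{n}\sigma^2\right)^2 \\
&= \frac{2(n-1)\sigma^4}{n^2} + \frac{\sigma^4}{n^2}
\end{aligned}$$

$$\mathrm{MSE}(S'^2) = \frac{(2n-1)\sigma^4}{n^2}$$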
We now consider

$$\begin{aligned}
\mathrm{MSE}(S'^2) - \mathrm{MSE}(S^2) &= \left[\frac{2n-1}{n^2} - \frac{2}{n-1}\right]\sigma^4 \\
&= \left[\frac{(2n-1)(n-1) - 2n^2}{n^2(n-1)}\right]\sigma^4 \\
&= \left[\frac{2n^2 - 2n - n + 1 - 2n^2}{n^2(n-1)}\right]\sigma^4
\end{aligned}$$

$$\mathrm{MSE}(S'^2) - \mathrm{MSE}(S^2) = \left[\frac{1-3n}{n^2(n-1)}\right]\sigma^4 < 0$$

Therefore, MSE(S′²) < MSE(S²).

Thus, we conclude that the sample variance S′² has a smaller mean squared
error than the sample variance S², even though S′² is a biased estimator.
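
The inequality can also be checked numerically for a few sample sizes. The sketch below takes the two closed forms derived above as given (they rest on Var(S²) = 2σ⁴/(n − 1) quoted from Unit 3, which is the result for a normal population); the function names `mse_s2` and `mse_s2_prime` are hypothetical.

```python
from fractions import Fraction as F

def mse_s2(n, sigma2):
    """MSE of S^2 (divisor n - 1): 2*sigma^4 / (n - 1)."""
    return F(2 * sigma2 ** 2, n - 1)

def mse_s2_prime(n, sigma2):
    """MSE of S'^2 (divisor n): (2n - 1)*sigma^4 / n^2."""
    return F((2 * n - 1) * sigma2 ** 2, n ** 2)

for n in (2, 5, 10, 50):
    diff = mse_s2_prime(n, 1) - mse_s2(n, 1)
    # Closed form of the difference derived in Example 5 (with sigma^2 = 1).
    assert diff == F(1 - 3 * n, n ** 2 * (n - 1))
    print(f"n={n}: MSE(S^2)={mse_s2(n, 1)}, MSE(S'^2)={mse_s2_prime(n, 1)}, difference={diff}")
```

The difference is negative for every n ≥ 2, and it shrinks as n grows, so the two estimators behave almost alike in large samples.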
The above example does not suggest that S² should not be used as an
estimator of σ². The reasons are discussed in the remarks as follows:


Important Remarks
• From Example 5, you may be curious to know why we use S² in place of S′²
even though the mean squared error of S′² is less than that of S². One
reason is that the mean squared error is a fair criterion for location
parameters, but it is not appropriate for scale parameters. The mean
squared error penalizes overestimation and underestimation equally, which
is fine in the location case; in the scale case, however, the lower limit of
the scale parameter is 0, so the estimation problem is not symmetric.
• The second reason is that if we use the mean squared error as a
measure, then on average, S′² will be closer to σ² than S². However, S′²
is biased and will, on average, underestimate σ². This fact alone may
make us uncomfortable using S′² as an estimator of σ². In general, the
mean squared error is a function of the parameter; therefore, for some
parameter values one estimator is better, and for other values the other
is better. Suppose we have two estimators, say, T1 and T2, whose respective
mean squared errors MSE(T1) = t1(θ) and MSE(T2) = t2(θ) are functions
of the parameter θ and are likely to cross each other. For some values of
θ, estimator T1 has a smaller mean squared error, whereas for other values
of θ, estimator T2 has a smaller mean squared error, as shown in Fig. 8.3.

Fig. 8.3: Mean squared error of two estimators for various values of the parameter θ.

Therefore, we would have no basis for preferring one of the estimators over
the other on the basis of mean squared error.
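
Crossing mean squared error curves of this kind can be seen in a standard illustration that is not taken from this unit: for X following a Binomial(n, p) distribution, compare the unbiased estimator X/n with the biased estimator (X + 1)/(n + 2) of p. The sketch below assumes that setting; the function names are hypothetical.

```python
from fractions import Fraction as F

def mse_unbiased(n, p):
    """MSE of X/n for X ~ Binomial(n, p): variance p(1-p)/n, zero bias."""
    return p * (1 - p) / n

def mse_shrunk(n, p):
    """MSE of (X+1)/(n+2): variance n*p*(1-p)/(n+2)^2 plus squared bias ((1-2p)/(n+2))^2."""
    return (n * p * (1 - p) + (1 - 2 * p) ** 2) / F((n + 2) ** 2)

n = 10
for p in (F(0), F(1, 10), F(1, 2), F(9, 10), F(1)):
    a, b = mse_unbiased(n, p), mse_shrunk(n, p)
    smaller = "X/n" if a < b else "(X+1)/(n+2)"
    print(f"p={p}: MSE(X/n)={a}, MSE((X+1)/(n+2))={b}, smaller: {smaller}")
```

Near p = 0 or p = 1 the unbiased estimator has the smaller mean squared error, while around p = 1/2 the biased one does, so neither estimator is better for every value of the parameter.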
It is now time for you to try the following Self Assessment Question to make
sure that you have understood the concept of mean squared error.

SAQ 3
The magnitude of earthquakes recorded in a region is modelled as an
exponential distribution with an unknown parameter θ whose pdf is given by

$$f(x, \theta) = \frac{1}{\theta} e^{-x/\theta}; \quad x > 0, \ \theta > 0$$

A researcher considered the following two estimators for estimating the
parameter θ:

$$T_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad T_2 = \frac{1}{n+1}\sum_{i=1}^{n} X_i$$

Check which one is more efficient for θ.


Unit 8 Efficiency and Mean Squared Error

8.6 MINIMUM VARIANCE UNBIASED ESTIMATOR


In the previous section, you studied the concept of the mean squared error,
and we can define the mean squared error of an estimator T of a parameter θ
as

$$\mathrm{MSE}(T) = E\left[(T - \theta)^2\right]$$

It is a function of the parameter θ. Due to this, for some values of θ,
estimator T1 has a smaller mean squared error, whereas for other values of θ,
estimator T2 has a smaller mean squared error.

Also, you have studied that the unbiasedness criterion ensures only the
average or mean of the sampling distribution of the estimator is equal to the
true value of the parameter. However, it does not tell us the scatteredness
(variance) of the sampling distribution of the estimator. Graphically, we show
the sampling distribution of the two estimators T and T′ of the parameter θ in
Fig. 8.4.

Fig. 8.4: Sampling distributions of estimators T and T′ of parameter θ

From Fig. 8.4, you can observe that both estimators are unbiased; however,
the variance (spread) of the estimator T is smaller than that of the estimator
T′. Clearly, we would also like an estimator whose sampling distribution is
not too spread out around the true value of the parameter, because if it is
too spread out, there is a high probability of obtaining an estimate that lies
far from the true value of the parameter. The foregoing considerations
suggest that if one wishes to use an unbiased estimator of the parameter θ,
one should use the unbiased estimator that also has minimum variance among
all unbiased estimators of θ.
Such an estimator is called a minimum variance unbiased estimator (MVUE).
We can define it as follows:

An estimator T of the parameter θ is said to be a minimum variance unbiased
estimator of θ if and only if
(i) E(T) = θ, that is, the estimator T is an unbiased estimator of the
parameter θ; and
(ii) Var(T) ≤ Var(T′), where T′ is any other unbiased estimator of the
parameter θ.
The above definition implies that an estimator is a minimum variance unbiased
estimator (MVUE) if and only if the estimator is unbiased and if there is no
other unbiased estimator that has a smaller variance for any value of θ. Since
it is for all values of the parameter θ, therefore it is also called a uniformly
minimum variance unbiased estimator (UMVUE).
We now end this unit by giving a summary of what we have covered in it.

8.7 SUMMARY
In this unit, we have covered the following points:
• If T1 and T2 are two unbiased estimators of a parameter θ with the same
size, then the estimator T1 is said to be more efficient than the estimator
T2 if
Var ( T1 ) < Var ( T2 ) for all n

• In a class of estimators of a parameter, if there exists one estimator
whose variance is minimum (least) among the class, then it is said to be
the most efficient estimator of that parameter.
• The efficiency of an estimator T is defined as

$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$

where T* is the most efficient estimator.
• The mean squared error is defined as the average squared difference
between the estimator and the parameter.

$$\mathrm{MSE}(T) = E\left[(T - \theta)^2\right] = \mathrm{Var}(T) + \{\mathrm{Bias}(T, \theta)\}^2$$

• An estimator T of the parameter θ is said to be a minimum variance
unbiased estimator of θ if and only if
(i) E(T) = θ, that is, the estimator T is an unbiased estimator of the
parameter θ.
(ii) Var(T) ≤ Var(T′), where T′ is any other unbiased estimator of the
parameter θ.

8.8 TERMINAL QUESTIONS


1. Describe efficiency and mean squared error.
2. Define the most efficient estimator and minimum variance unbiased
estimator.

8.9 SOLUTIONS / ANSWERS


Self Assessment Questions (SAQs)
1. Since X1, X2, X3 are the weights of juice packets which are taken randomly
and independently from a normal population with mean µ and variance σ²,
we have

$$E(X_i) = \mu \quad \text{and} \quad \mathrm{Var}(X_i) = \sigma^2 \quad \text{for all } i = 1, 2, 3$$

To check whether the estimators T1 and T2 are unbiased, we find the
expectations of T1 and T2 as

$$\begin{aligned}
E(T_1) &= E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{3}\left[E(X_1) + E(X_2) + E(X_3)\right] \\
&= \frac{1}{3}[\mu + \mu + \mu] = \frac{1}{3}(3\mu) = \mu
\end{aligned}$$

Since E(T1) = µ, T1 is unbiased.

We now consider

$$\begin{aligned}
E(T_2) &= E\left(\frac{X_1 + X_2}{4} + \frac{X_3}{2}\right) = \frac{1}{4}\left[E(X_1) + E(X_2)\right] + \frac{1}{2}E(X_3) \\
&= \frac{1}{4}[\mu + \mu] + \frac{\mu}{2} = \frac{\mu}{2} + \frac{\mu}{2} = \mu
\end{aligned}$$

Since E(T2) = µ, T2 is also unbiased.

Hence, T1 and T2 are both unbiased estimators of µ.


For efficiency, we find the variances of T1 and T2 as

$$\begin{aligned}
\mathrm{Var}(T_1) &= \mathrm{Var}\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{9}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)\right] \\
&= \frac{1}{9}\left[\sigma^2 + \sigma^2 + \sigma^2\right] = \frac{3\sigma^2}{9}
\end{aligned}$$

$$\mathrm{Var}(T_1) = \frac{\sigma^2}{3}$$

We now consider

$$\begin{aligned}
\mathrm{Var}(T_2) &= \mathrm{Var}\left(\frac{X_1 + X_2}{4} + \frac{X_3}{2}\right) = \frac{1}{16}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2)\right] + \frac{1}{4}\mathrm{Var}(X_3) \\
&= \frac{1}{16}\left[\sigma^2 + \sigma^2\right] + \frac{1}{4}\sigma^2 = \frac{\sigma^2}{8} + \frac{\sigma^2}{4} = \frac{\sigma^2 + 2\sigma^2}{8}
\end{aligned}$$

$$\mathrm{Var}(T_2) = \frac{3\sigma^2}{8}$$

Since Var(T1) < Var(T2), T1 is a more efficient estimator of μ than T2.
2. To check whether the estimator T3 is unbiased, we find the expectation
of T3 as

$$\begin{aligned}
E(T_3) &= E\left(\frac{X_1 + 2X_2 + 3X_3}{6}\right) = \frac{1}{6}\left[E(X_1) + 2E(X_2) + 3E(X_3)\right] \\
&= \frac{1}{6}[\mu + 2\mu + 3\mu] = \frac{1}{6}(6\mu) = \mu
\end{aligned}$$

Since E(T3) = µ, T3 is also unbiased.

We now find the variance of the estimator T3 as

$$\begin{aligned}
\mathrm{Var}(T_3) &= \mathrm{Var}\left(\frac{X_1 + 2X_2 + 3X_3}{6}\right) = \frac{1}{36}\left[\mathrm{Var}(X_1) + 4\mathrm{Var}(X_2) + 9\mathrm{Var}(X_3)\right] \\
&= \frac{1}{36}\left[\sigma^2 + 4\sigma^2 + 9\sigma^2\right] = \frac{14\sigma^2}{36}
\end{aligned}$$

$$\mathrm{Var}(T_3) = \frac{7\sigma^2}{18}$$
To find the most efficient estimator among the three unbiased estimators,
we compare their variances. We have

$$\mathrm{Var}(T_1) = \frac{\sigma^2}{3} = 0.333\sigma^2, \quad \mathrm{Var}(T_2) = \frac{3\sigma^2}{8} = 0.375\sigma^2 \quad \text{and} \quad \mathrm{Var}(T_3) = \frac{7\sigma^2}{18} = 0.389\sigma^2$$

Since the variance of the estimator T1 is the minimum, by the definition of
the most efficient estimator, we conclude that the estimator T1 is the most
efficient estimator in the class of three unbiased estimators.
We can calculate the efficiency of an unbiased estimator as

$$e = \frac{\mathrm{Var}(T^*)}{\mathrm{Var}(T)}$$

where T* is the most efficient estimator.
Since the estimator T1 is the most efficient estimator in the class of three
unbiased estimators, for computing the efficiency of estimator T2, we take
estimator T1 in place of T*:

$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_2)} = \frac{\sigma^2/3}{3\sigma^2/8} = \frac{8}{9} \approx 0.89$$

Similarly, we can compute the efficiency of estimator T3 as follows:

$$e = \frac{\mathrm{Var}(T_1)}{\mathrm{Var}(T_3)} = \frac{\sigma^2/3}{7\sigma^2/18} = \frac{6}{7} \approx 0.86$$

Hence, we conclude that estimator T2 is more efficient than estimator T3.
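
The variances and efficiencies in SAQs 1 and 2 can be reproduced exactly with fractions. This is a minimal sketch, assuming each estimator is a weighted sum of independent observations with common variance σ²; the helper name `variance_factor` is hypothetical and returns Var(T)/σ².

```python
from fractions import Fraction as F

def variance_factor(weights):
    """Var(sum(w_i * X_i)) / sigma^2 for independent X_i with common variance sigma^2."""
    return sum(w * w for w in weights)

factors = {
    "T1": variance_factor([F(1, 3), F(1, 3), F(1, 3)]),   # (X1 + X2 + X3) / 3
    "T2": variance_factor([F(1, 4), F(1, 4), F(1, 2)]),   # (X1 + X2) / 4 + X3 / 2
    "T3": variance_factor([F(1, 6), F(2, 6), F(3, 6)]),   # (X1 + 2 X2 + 3 X3) / 6
}
best = min(factors.values())                               # variance factor of T*
efficiency = {name: best / v for name, v in factors.items()}

for name in factors:
    print(name, "Var/sigma^2 =", factors[name], "efficiency =", efficiency[name])
```

Working with exact fractions avoids the small discrepancies that appear when the rounded values 0.333, 0.375 and 0.389 are divided.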

3. Since the magnitude of the earthquakes recorded in the region is
modelled as an exponential distribution, it has mean θ and variance θ².
To check which estimator is more efficient, we first check whether the
estimators T1 and T2 are unbiased. For that, we find E(T1) and E(T2) and
check whether they are equal to θ. Let X1, X2, …, Xn be a random sample of
size n taken from the exponential distribution. Since the sample
observations are independent and taken from the same population with
mean θ and variance θ², we have

$$E(X_i) = \theta \quad \text{and} \quad \mathrm{Var}(X_i) = \theta^2 \quad \text{for all } i = 1, 2, \ldots, n$$

We consider

$$\begin{aligned}
E(T_1) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \\
&= \frac{1}{n}\Big[\underbrace{\theta + \theta + \cdots + \theta}_{n\text{-times}}\Big] = \frac{1}{n}(n\theta) = \theta
\end{aligned}$$

$$E(T_1) = \theta$$

Hence, the estimator T1 is an unbiased estimator of θ.

Similarly,

$$\begin{aligned}
E(T_2) &= E\left(\frac{1}{n+1}\sum_{i=1}^{n} X_i\right) = \frac{1}{n+1}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right] \\
&= \frac{1}{n+1}\Big[\underbrace{\theta + \theta + \cdots + \theta}_{n\text{-times}}\Big] = \frac{1}{n+1}(n\theta) = \frac{n}{n+1}\theta
\end{aligned}$$

Since

$$E(T_2) = \left(\frac{n}{n+1}\right)\theta \neq \theta$$

the estimator T2 is not an unbiased estimator of the parameter θ.

Since the two estimators are not both unbiased, we use the concept of the
mean squared error for judging which one is more efficient. Therefore, we
first find the variances of these estimators. We now consider

$$\begin{aligned}
\mathrm{Var}(T_1) &= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \mathrm{Var}\left(\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right) \\
&= \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right] \\
&= \frac{1}{n^2}\Big[\underbrace{\theta^2 + \theta^2 + \cdots + \theta^2}_{n\text{-times}}\Big] = \frac{1}{n^2}\left(n\theta^2\right)
\end{aligned}$$

$$\mathrm{Var}(T_1) = \frac{\theta^2}{n}$$

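We now calculate the variance of estimator T2 as

$$\begin{aligned}
\mathrm{Var}(T_2) &= \mathrm{Var}\left(\frac{1}{n+1}\sum_{i=1}^{n} X_i\right) = \frac{1}{(n+1)^2}\mathrm{Var}(X_1 + X_2 + \cdots + X_n) \\
&= \frac{1}{(n+1)^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right] \\
&= \frac{1}{(n+1)^2}\Big[\underbrace{\theta^2 + \theta^2 + \cdots + \theta^2}_{n\text{-times}}\Big] = \frac{n\theta^2}{(n+1)^2}
\end{aligned}$$

$$\mathrm{Var}(T_2) = \frac{n\theta^2}{(n+1)^2}$$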
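We now calculate the mean squared errors as

$$\mathrm{MSE}(T_1) = \mathrm{Var}(T_1) + (\mathrm{bias})^2 = \mathrm{Var}(T_1) + \left[E(T_1) - \theta\right]^2 = \frac{\theta^2}{n} + (\theta - \theta)^2 = \frac{\theta^2}{n}$$

$$\begin{aligned}
\mathrm{MSE}(T_2) &= \mathrm{Var}(T_2) + (\mathrm{bias})^2 = \mathrm{Var}(T_2) + \left[E(T_2) - \theta\right]^2 \\
&= \frac{n\theta^2}{(n+1)^2} + \left(\frac{n\theta}{n+1} - \theta\right)^2 = \frac{n\theta^2}{(n+1)^2} + \left(\frac{n\theta - n\theta - \theta}{n+1}\right)^2 \\
&= \frac{n\theta^2}{(n+1)^2} + \frac{\theta^2}{(n+1)^2} = \frac{n\theta^2 + \theta^2}{(n+1)^2} = \frac{\theta^2}{n+1}
\end{aligned}$$

Since MSE(T2) < MSE(T1), we conclude that the estimator T2 is a more
efficient estimator than the estimator T1 for estimating the parameter θ of
the earthquake magnitudes in the region.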
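
The two closed forms, θ²/n and θ²/(n + 1), can be confirmed with a short calculation. This is a sketch that assumes only what the solution uses (independent observations with mean θ and variance θ²); the helper name `mse_scaled_sum` is hypothetical.

```python
from fractions import Fraction as F

def mse_scaled_sum(n, divisor, theta=1):
    """MSE of T = (X_1 + ... + X_n) / divisor as an estimator of theta.

    Assumes independent X_i with mean theta and variance theta^2.
    """
    variance = F(n, divisor ** 2) * theta ** 2   # n * theta^2 / divisor^2
    bias = (F(n, divisor) - 1) * theta           # E(T) - theta
    return variance + bias ** 2

for n in (1, 5, 20):
    mse_t1 = mse_scaled_sum(n, n)        # T1 divides the sum by n
    mse_t2 = mse_scaled_sum(n, n + 1)    # T2 divides the sum by n + 1
    assert mse_t1 == F(1, n) and mse_t2 == F(1, n + 1)
    print(f"n={n}: MSE(T1)={mse_t1}, MSE(T2)={mse_t2}")
```

The gain from using T2 is largest for small n and becomes negligible as n increases.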
Terminal Questions (TQs)
1. Refer to Sections 8.2 and 8.5.

2. Refer to Sections 8.3 and 8.6.

UNIT 9

SUFFICIENCY AND MINIMAL SUFFICIENCY

Structure
9.1 Introduction
    Expected Learning Outcomes
9.2 Joint Probability Density (Mass) Function
9.3 Concept of Sufficiency
9.4 Fisher-Neyman Factorization Theorem
9.5 Properties of Sufficient Statistic
9.6 Minimal Sufficient Statistic
9.7 Summary
9.8 Terminal Questions
9.9 Solutions / Answers

9.1 INTRODUCTION
In Units 6, 7 and 8, you have studied the characteristics of a good estimator,
namely unbiasedness, consistency and efficiency. Let us have a look at them.
• An estimator is said to be unbiased for a parameter θ if and only if the
average/mean of the sampling distribution of the estimator is equal to the
true value of the parameter. In other words, an estimator is said to be
unbiased if the expected value of the estimator is equal to the true
value of the parameter being estimated, that is, E(T) = θ.
• An estimator Tn is said to be a consistent estimator of θ if Tn converges to
θ in probability.
• If T1 and T2 are two unbiased estimators of a parameter θ with the same
size, then the estimator T1 is said to be more efficient than the estimator T2
if Var(T1) < Var(T2) for all n.
Continuing the search for the best estimator, we introduce the concept of
sufficiency in this unit.
This unit is divided into nine sections. Section 9.1 is introductory in nature. The
joint probability density (mass) function, which is used to find a sufficient
statistic, is defined in Section 9.2. Section 9.3 is devoted to explaining the
concept of sufficient statistic. Section 9.4 explores the Fisher-Neyman
factorization theorem. The properties of the sufficient statistic are described in
Section 9.5. The concept of minimal sufficient statistic is described in Section
9.6. The unit ends by providing a summary of what we have discussed in this
unit in Section 9.7. The terminal questions and the solutions of the SAQs/TQs
are given in Sections 9.8 and 9.9, respectively.

Tools You Will Need
The following terms are considered essential background material for this
Unit. If you doubt your knowledge of any of these terms, you should review
the appropriate Unit or section before proceeding:
• Sampling distributions (Units 2, 3, 4 and 5).
• Basic terms of estimation (Unit 6).
• Unbiasedness, consistency and efficiency (Units 6, 7 and 8).
• Probability distributions (MST-012).

Unit Writer- Dr. Prabhat Kumar Sangal, School of Sciences, IGNOU, New Delhi

Expected Learning Outcomes


After studying this unit, you should be able to:
• define the joint probability density (mass) function and compute it;
• explain the concept of sufficient statistic and find a sufficient
statistic for a parameter;
• describe the Fisher-Neyman factorization theorem and use it to
find the sufficient statistic;
• explain the properties of sufficient statistic; and
• describe the concept of minimal sufficient statistic.

9.2 JOINT PROBABILITY DENSITY (MASS) FUNCTION
Before discussing sufficiency, we discuss the joint probability density (mass)
function. This term is very useful in understanding the concept of sufficiency.
If X1, X2, …, Xn is a random sample of size n taken from a population whose
probability density (mass) function is f(x, θ), where θ is the population
parameter, then the joint probability density (mass) function of the sample
values is denoted by f(x1, x2, ..., xn, θ) and it is a function of the sample
observations. We can define it as follows.
For discrete case:
The joint probability mass function is defined as

$$f(x_1, x_2, \ldots, x_n, \theta) = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n]$$

Since the sample observations X1, X2, …, Xn are independent, we can write
the above expression as

$$f(x_1, x_2, \ldots, x_n, \theta) = P[X_1 = x_1]\, P[X_2 = x_2] \cdots P[X_n = x_n]$$

(If A and B are two independent events, then P(A∩B) = P(A)P(B).)
In this case, the function f(x1, x2, ..., xn, θ) represents the probability that the
particular sample x1, x2, …, xn has been drawn for a fixed (given) value of
parameter θ.
For continuous case:
The joint probability density function is defined as

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta)$$

In this case, the function f(x1, x2, ..., xn, θ) represents the probability density
function of the random sample X1, X2, …, Xn.
Let us understand the process of finding the joint probability density (mass)
function by taking some examples.

Example 1: Suppose the number of weekly accidents occurring on a mile
stretch of a particular road follows a Poisson distribution with a parameter λ
whose probability mass function is given by

$$P[X = x] = \frac{e^{-\lambda}\lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots \ \&\ \lambda > 0$$

To estimate the number of road accidents, a transport officer randomly
selected a sample of weekly accident counts, say, X1, X2, …, Xn. Find the
joint probability mass function of X1, X2, …, Xn.

Solution: The probability mass function of the Poisson distribution is given by

$$P[X = x] = \frac{e^{-\lambda}\lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots \ \&\ \lambda > 0$$

Since the Poisson distribution is a discrete distribution, by the
definition of the joint probability mass function of the sample observations X1,
X2, …, Xn, we have

$$f(x_1, x_2, \ldots, x_n, \lambda) = P[X_1 = x_1]\, P[X_2 = x_2] \cdots P[X_n = x_n]$$

Since the number of weekly accidents (sample observations) follows the
Poisson distribution with parameter λ, we can obtain the joint
probability mass function by putting x as x1, x2, …, xn in the probability mass
function mentioned above. Therefore,

$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{-\lambda}\lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda}\lambda^{x_2}}{x_2!} \cdots \frac{e^{-\lambda}\lambda^{x_n}}{x_n!}$$

Collecting like terms, we get

$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{\overbrace{-\lambda - \lambda - \cdots - \lambda}^{n\text{-times}}}\ \lambda^{x_1 + x_2 + \cdots + x_n}}{x_1!\, x_2! \cdots x_n!}$$

Now, simplifying by adding up all the λs in the exponent of e as well as the
xi's in the power of λ, we get the required joint pmf of the sample
observations as

$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{-n\lambda}\ \lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$

where $\prod_{i=1}^{n} x_i!$ represents the product of the $x_i!$.
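
The closed form can be checked against the direct product of the individual probabilities. The following is a minimal sketch; the sample values and the value of λ are hypothetical and chosen only for illustration, and the function names are not from the course material.

```python
import math

def poisson_pmf(x, lam):
    """P[X = x] = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_joint_pmf(sample, lam):
    """Closed form of the joint pmf: e^(-n*lam) * lam^(sum x_i) / prod(x_i!)."""
    n = len(sample)
    denom = math.prod(math.factorial(x) for x in sample)
    return math.exp(-n * lam) * lam ** sum(sample) / denom

sample = [2, 0, 3, 1]   # hypothetical weekly accident counts
lam = 1.5               # hypothetical value of the parameter
product = math.prod(poisson_pmf(x, lam) for x in sample)

# The product of the marginal pmfs and the simplified expression agree.
assert math.isclose(product, poisson_joint_pmf(sample, lam))
print(product, poisson_joint_pmf(sample, lam))
```

Note that the sample enters the closed form only through the sum of the observations and the product of their factorials.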

Example 2: The magnitude of the earthquakes recorded in a region is
modelled as an exponential distribution with an unknown parameter θ whose
pdf is given by

$$f(x, \theta) = \theta e^{-\theta x}; \quad x > 0, \ \theta > 0$$

If a seismologist measured the magnitudes of n randomly selected
earthquakes in that region and denoted them by X1, X2, …, Xn, then find the
joint probability density function of the sample observations.
Solution: Since the exponential distribution is a continuous distribution,
by the definition of the joint probability density function of the sample
observations X1, X2, …, Xn, we have

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta)$$

Since the magnitude of the earthquakes follows the exponential distribution
with parameter θ, we can obtain the joint probability density function
by putting x as x1, x2, …, xn in the probability density function of the
exponential distribution as

$$f(x_1, x_2, \ldots, x_n, \theta) = \theta e^{-\theta x_1} \cdot \theta e^{-\theta x_2} \cdots \theta e^{-\theta x_n}$$

Collecting like terms, we get

$$f(x_1, x_2, \ldots, x_n, \theta) = \theta^{\overbrace{1 + 1 + \cdots + 1}^{n\text{-times}}}\ e^{-\theta(x_1 + x_2 + \cdots + x_n)}$$

Now, simplifying by adding up the powers of θ as well as the xi's in the
exponent, we get the required joint pdf of the sample observations as

$$f(x_1, x_2, \ldots, x_n, \theta) = \theta^{n} e^{-\theta \sum_{i=1}^{n} x_i}$$
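
As with Example 1, the simplified joint pdf can be compared with the product of the individual densities. This is a small sketch; the magnitudes and the value of θ below are hypothetical, and the function names are not from the course material.

```python
import math

def exp_pdf(x, theta):
    """f(x, theta) = theta * e^(-theta * x) for x > 0."""
    return theta * math.exp(-theta * x)

def exp_joint_pdf(sample, theta):
    """Closed form of the joint pdf: theta^n * e^(-theta * sum x_i)."""
    return theta ** len(sample) * math.exp(-theta * sum(sample))

sample = [0.8, 2.1, 1.3]   # hypothetical observed magnitudes
theta = 0.5                # hypothetical value of the parameter
product = math.prod(exp_pdf(x, theta) for x in sample)

# The product of the marginal densities and the simplified expression agree.
assert math.isclose(product, exp_joint_pdf(sample, theta))
print(product, exp_joint_pdf(sample, theta))
```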

Let us check your understanding of the above by answering the following Self
Assessment Question.

SAQ 1
To test the effectiveness of a new drug in controlling systolic blood pressure, a
medical scientist administered the drug to 10 patients with high systolic blood
pressure. If the number of patients who were cured follows a binomial
distribution with parameters 10 and p, then find the joint probability mass
function.

9.3 CONCEPT OF SUFFICIENCY


In statistical estimation, the aim of the statistician is to estimate the value of
the unknown parameter (θ) on the basis of an estimator/statistic. For example,
the sample mean is used to estimate the population mean, and the sample
variance is used to estimate the population variation. Also, an estimator or a
statistic T (function of sample values) is said to be the best estimator of a
population parameter θ if T is close to the parameter θ (unbiased and
efficient). In this way, a statistic which is used in estimation or drawing
inferences about the population parameter is a function of sample
observations, rather than the full data set. Obviously, there are lots of
functions of the sample observations, and so lots of statistics. For example,
suppose you are interested in knowing the average age of Facebook users on
the basis of a sample X1, X2, …, Xn of size n, then some functions of the
sample observations through which you can estimate/obtain the average age
of Facebook users are given as follows:
• T1(X1, X2, …, Xn) = Sample mean X̄ = (X1 + X2 + ... + Xn)/n
• T2(X1, X2, …, Xn) = Sample median X̃ = median(X1, X2, …, Xn)
• T3(X1, X2, …, Xn) = Sample mode X0 = mode(X1, X2, …, Xn)
• T4(X1, X2, …, Xn) = [max(X1, X2, …, Xn) + min(X1, X2, …, Xn)]/2

Note: Any quantity calculated from sample values that does not contain any
unknown parameter is known as a statistic.

All of these estimators are statistics because they do not depend on the
unknown population parameter.
To understand when a statistic is said to be a sufficient statistic, we consider
an example. Suppose we have three coins, and if we toss all three coins
simultaneously, then the possible outcomes are:
(T, T, T), (T, T, H), (T, H, T), (H, T, T), (T, H, H), (H, T, H), (H, H, T), (H, H, H)
If we represent head (H) by 1 and tail (T) by 0, we can represent the outcomes
as
(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)
Suppose we are interested in estimating the number of heads on the basis of
samples; then we have to worry about all the outcomes mentioned above. We
can also estimate the number of heads using the statistic T = X1 + X2 + X3,
which takes the values 0, 1, 2 and 3 as shown in the following table:

Outcomes                              Statistic (T)
(0, 0, 0)                             0
(0, 0, 1), (0, 1, 0), (1, 0, 0)       1
(0, 1, 1), (1, 0, 1), (1, 1, 0)       2
(1, 1, 1)                             3

It means that by using the statistic T instead of the sample, we condense the
data into four subgroups instead of eight outcomes, so we have to worry about
only 4 subgroups instead of 8. Now some questions may arise:
(i) Is some information lost by using a statistic instead of full data?
(ii) Does the random sample contain any more information about the
population than this?
To get the answers to such types of questions, we have to study the concept
of sufficient statistic.
Suppose there are two researchers, say, Rajesh and Prabhat. Rajesh knows a
particular outcome (sample) of all possible cases (X1, X2, X3), say (1, 0, 1);
however, Prabhat only knows the value of the statistic T = X1 + X2 + X3, that
is, 2. Since Prabhat knows that 2 heads came up when three coins were
tossed simultaneously, he can generate a random sample (X1′, X2′, X3′) such
as (1, 1, 0) or (0, 1, 1) or (1, 0, 1), and he can use his random sample
(X1′, X2′, X3′) to compute whatever Rajesh computes using his random sample
(X1, X2, X3). Therefore, we can say that no information is lost by using the
statistic T = X1 + X2 + X3. In other words, Prabhat, who knows only the value
of T, can do just as good a job of estimating the unknown parameter θ as
Rajesh, who knows the entire random sample. Thus, we can say that the
statistic T is a sufficient statistic. So, a statistic is sufficient if it is just as
informative as the full data. We can define the sufficient statistic as

A sufficient statistic is a particular kind of statistic which condenses the
data in such a way that no information about the parameter is lost.
The concept of sufficiency was introduced by Ronald A. Fisher in the 1920s,
and refined by Jerzy Neyman in the 1930s. We now formally define the
sufficient estimator/statistic as
“A statistic T = T(X) is said to be a sufficient statistic for a parameter θ if
it contains “all of the information” about θ that is available in the
sample.”
In other words, we do not lose any information about θ by reducing the sample
X to the statistic T. This property of an estimator is called sufficiency.
According to statistician Ronald A. Fisher,
“…no other statistic that can be calculated from the same sample
provides any additional information as to the value of the parameter.”
It means that a sufficient statistic contains all the information about θ that is
contained in the sample; if we know the value of the sufficient statistic, then
the sample values themselves are not needed and can tell you nothing more
about θ.
Now the question may arise:
How do we check whether a statistic contains all the information about θ that is
contained in the sample?
As you know, the actual data have a certain probability distribution, such as
Normal, Bernoulli, Binomial, etc., which in general involves the
parameter(s). A statistic T = t(X1, X2, …, Xn) is a sufficient statistic for a
parameter θ if, for each t, the conditional distribution of X1, X2, …, Xn given
T = t does not depend on θ.
Mathematically,
For discrete distribution

$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid T = t] = \frac{P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t]}{P[T = t]} = g(x_1, x_2, \ldots, x_n)$$

For continuous distribution

$$f(x_1, x_2, \ldots, x_n \mid t) = \frac{f(x_1, x_2, \ldots, x_n, t)}{f(t)} = g(x_1, x_2, \ldots, x_n)$$

where the numerator is the joint probability density (mass) function of the
sample values and the function g(x1, x2, ..., xn) does not depend on the
parameter θ.
Let us take an example to understand the above definition.
Example 3: Consider the example of tossing three coins simultaneously and
assume that the probability of getting a head is p. Check whether the statistic
T = X1 + X2 + X3 is sufficient or not.
Solution: To check whether the statistic T = X1 + X2 + X3 is sufficient or not,
we have to find the conditional distribution of the sample observations given T.
For that, we have to use some concepts of probability distributions which you
have studied in Unit 9 of the course MST-012: Probability and Probability
Distributions.

The probability of getting a head with a single coin is p and that of not getting
a head is 1 − p. Since we perform a random experiment (tossing coins
independently) and each outcome falls into one of two categories, head or
tail, the probability distribution of a random variable which takes the value 1 if
the outcome is a head (success) and 0 if the outcome is a tail (failure) is
known as the Bernoulli distribution. (A trial whose outcome falls into one of two
possible categories, traditionally known as a success or a failure, is known as
a Bernoulli trial.) Therefore, we can write the probability mass function as

$$P[X = x] = p^{x}(1 - p)^{1 - x}; \quad x = 0, 1$$

Since T = X1 + X2 + X3 is the sum of independent Bernoulli distributed random
variables, T follows a binomial distribution with parameters n = 3
and p, whose probability mass function is given as

$$P[T = t] = {}^{n}C_{t}\, p^{t}(1 - p)^{n - t}; \quad t = 0, 1, 2, 3 \ \&\ n = 3$$

We are now ready to find the conditional distribution of the sample
observations given T and check whether it is independent of the parameter p
or not. Therefore, we consider

$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid T = t] = \frac{P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t]}{P[T = t]}$$

Suppose we observed a random sample of size n = 3 in which X1 = 1, X2 = 0
and X3 = 1. In this case:
P [ X=
1 1, X=
2 0, X=
3 1,T ] 0
= 0=

P [ X1= 1, X2= 0, X3= 1,T= 1=


] 0
P [ X=
1 1, X=
2 0, X=
3 1,T ] 0
= 3=
This is because for X1 = 1, X2 = 0, X3 = 1 only $T = \sum_{i=1}^{3} X_i = 1 + 0 + 1 = 2$ is possible, whereas
T = 0, 1 and 3 are not possible. So the above events are impossible
and their probabilities are 0, whereas only the event (X1 = 1, X2 = 0, X3 = 1, T
= 2) is possible. In this case, we have, by independence:

$$P[X_1 = 1, X_2 = 0, X_3 = 1, T = 2] = P[X_1 = 1]\, P[X_2 = 0]\, P[X_3 = 1]$$
$$= p\,(1-p)\,p = p^{2}(1-p)^{3-2}$$

So, in general

$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t] = P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n] \quad \text{if } \sum_{i=1}^{n} x_i = t$$

and

$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t] = 0 \quad \text{if } \sum_{i=1}^{n} x_i \neq t$$

Therefore,

$$P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid T = t] = \frac{P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, T = t]}{P[T = t]}$$
$$= \frac{p^{2}(1-p)^{3-2}}{{}^{3}C_{2}\, p^{2}(1-p)^{3-2}} = \frac{1}{{}^{3}C_{2}} = \frac{1}{3}$$

We have just shown that the conditional distribution of the sample X1, X2, …,
Xn given T = t does not depend on the parameter p. Therefore, T is indeed
sufficient for p. That is, once the value of T is known neither the sample nor
other function of X1, X2, …, Xn will provide any additional information about
the possible value of p.
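
The calculation above can also be checked numerically. The following is a minimal sketch, not part of the course material: the function name `conditional_given_t` and the trial values of p are assumptions chosen for illustration. It evaluates the conditional probability for every possible outcome of the three coins and shows that the result equals 1/³C_t whatever value p takes.

```python
from itertools import product
from math import comb, isclose

def conditional_given_t(sample, p):
    """P[X1=x1, X2=x2, X3=x3 | T=t] for independent Bernoulli(p) coins, with t = sum(sample)."""
    n, t = len(sample), sum(sample)
    joint = p**t * (1 - p)**(n - t)             # P[X = sample]; this event already implies T = t
    p_t = comb(n, t) * p**t * (1 - p)**(n - t)  # binomial probability P[T = t]
    return joint / p_t

# Evaluate every possible outcome of three coins at several (arbitrary) values of p.
results = {
    sample: [conditional_given_t(sample, p) for p in (0.2, 0.5, 0.9)]
    for sample in product((0, 1), repeat=3)
}

for sample, values in results.items():
    print(sample, "T =", sum(sample), [round(v, 4) for v in values])
```

Each row prints the same number three times, which is the numerical counterpart of saying that the conditional distribution given T is free of p.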
Now, it is time for you to check your understanding of the conditional distribution
by solving the following Self Assessment Question.

SAQ 2
Suppose the time between customers who enter a certain shop follows
exponential distribution with parameter θ whose pdf is given as follows:

$$f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta}; \quad x > 0, \ \theta > 0$$

To estimate the parameter θ, the market researcher proposed the
statistic/estimator $T = \bar{X}$ based on the sample X1, X2, …, Xn. Show that T is a
sufficient statistic for the parameter θ using the conditional distribution.

9.4 FISHER-NEYMAN FACTORIZATION THEOREM

In the previous section, you studied the definition of a sufficient statistic for a
parameter θ: a statistic T = t(X1, X2, …, Xn) is said to be a sufficient
statistic if, for each t, the conditional distribution of X1, X2, …, Xn given T = t
does not depend on θ.
While this definition of a sufficient statistic is fairly simple, finding the
conditional distribution is the tough part and can be difficult in practice.
A slightly easier way to find a sufficient statistic, without deriving the conditional
distribution, is to use the factorization theorem. The factorization theorem
was given by Fisher and Neyman, so it is called the Fisher-Neyman
factorization theorem or simply the factorization theorem. Suppose we
would like to get information about the parameter value θ from our sample.
The factorization theorem allows us to separate the information
contained in the sample into two parts. One part contains all the valuable
information as far as the parameter θ is concerned, while the other part
contains pure noise in the sense that it has no valuable information about θ.
Thus, we can ignore the latter part.
Let us learn what the factorization theorem states.
Statement of Fisher-Neyman Factorization Theorem: Let X1, X2, …, Xn be
a random sample of size n taken from the probability density (mass) function
f(x, θ). A statistic or estimator T is sufficient for a parameter θ if and
only if the joint probability density (mass) function of sample observations X1,
X2, …, Xn can be factored as

$$f(x_1, x_2, \ldots, x_n, \theta) = g[t(x), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$

where the function g[t(x), θ] is a non-negative function of the parameter θ that
depends on the observed sample values (x1, x2, …, xn) only through the function t(x), and the
function h(x1, x2, …, xn) is a non-negative function of (x1, x2, …, xn) that does not
involve the parameter θ.

For applying the factorization theorem, we try to factor the joint probability
density (mass) function as the product of two functions, one of which is a
function of the parameter(s) and statistic and another independent of the
parameter(s).

The proof of this theorem is beyond the scope of this course.

Note 1: The factorization theorem should not be used to show that a given
statistic or estimator T is not sufficient.
Let us learn how to apply the factorization theorem to obtain a sufficient
statistic with the help of some examples.

Example 4: Suppose the number of visitors visiting a website per hour follows
Poisson distribution with parameter λ. Find a sufficient statistic for λ.

Solution: To find the sufficient statistic for λ, we can use the factorization
theorem. For that, we have to find the joint probability mass function of the
sample values. We know that the probability mass function of Poisson
distribution with parameter λ is

$$P[X = x] = \frac{e^{-\lambda} \lambda^{x}}{x!}; \quad x = 0, 1, 2, \ldots \ \& \ \lambda > 0$$

Let X1, X2, …, Xn be a random sample taken from the Poisson distribution with
parameter λ. We can obtain the joint probability mass function of X1, X2, …, Xn
as

$$f(x_1, x_2, \ldots, x_n, \lambda) = P[X_1 = x_1] \cdot P[X_2 = x_2] \cdots P[X_n = x_n]$$
$$= \frac{e^{-\lambda}\lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda}\lambda^{x_2}}{x_2!} \cdots \frac{e^{-\lambda}\lambda^{x_n}}{x_n!}$$
$$= \frac{e^{\overbrace{-\lambda-\lambda-\cdots-\lambda}^{n\text{-times}}}\; \lambda^{x_1 + x_2 + \cdots + x_n}}{x_1!\, x_2! \cdots x_n!}$$
$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}$$


We now try to factor the above joint probability mass function as the product of
two functions, one of which is a function of the parameter (λ) and another is
independent of the parameter (λ). We can factor the joint probability mass
function as

$$f(x_1, x_2, \ldots, x_n, \lambda) = \left( e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} x_i} \right) \times \left( \frac{1}{\prod_{i=1}^{n} x_i!} \right) = g[t(x), \lambda] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t(x), \lambda] = e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} x_i}$ is a function of the parameter λ and the observed
sample values x1, x2, …, xn only through $t(x) = \sum_{i=1}^{n} x_i$, and $h(x_1, x_2, \ldots, x_n) = 1 / \prod_{i=1}^{n} x_i!$
is a function of the observed sample values x1, x2, …, xn and is independent of the
parameter λ.
Hence, by the factorization theorem of sufficiency, $\sum_{i=1}^{n} X_i$ is a sufficient
statistic for λ.
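
A quick numerical illustration of this factorization is sketched below. It is only a check under stated assumptions: the two samples and the trial values of λ are hypothetical, and the helper names `poisson_joint_pmf`, `g` and `h` are introduced here for illustration. Two different samples with the same total are compared; because the parameter-dependent factor g depends on the data only through the total, the ratio of their joint probabilities should not change with λ.

```python
from math import exp, factorial, isclose, prod

def poisson_joint_pmf(xs, lam):
    """Joint pmf of an i.i.d. Poisson(lam) sample xs."""
    return prod(exp(-lam) * lam**x / factorial(x) for x in xs)

def g(t, n, lam):
    """Parameter-dependent factor e^{-n*lam} * lam^t, where t is the sample total."""
    return exp(-n * lam) * lam**t

def h(xs):
    """Parameter-free factor 1 / (x1! x2! ... xn!)."""
    return 1 / prod(factorial(x) for x in xs)

sample_a = [2, 0, 5, 1]   # hypothetical hourly visitor counts, total 8
sample_b = [3, 3, 1, 1]   # different counts with the same total 8

ratios = []
for lam in (0.5, 2.0, 7.5):
    fa = poisson_joint_pmf(sample_a, lam)
    fb = poisson_joint_pmf(sample_b, lam)
    # The product g * h reproduces the joint pmf.
    assert isclose(fa, g(sum(sample_a), len(sample_a), lam) * h(sample_a))
    ratios.append(fa / fb)   # equals h(sample_a) / h(sample_b), free of lam

print(ratios)
```

The ratio stays at h(sample_a)/h(sample_b) for every λ tried, so once the total is known the remaining detail of the sample says nothing further about λ.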
But, wait a second! We can also write the joint probability mass function as:

$$f(x_1, x_2, \ldots, x_n, \lambda) = \frac{e^{-n\lambda}\, \lambda^{n\bar{x}}}{\prod_{i=1}^{n} x_i!} \quad \left( \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \right)$$

And factor as

$$f(x_1, x_2, \ldots, x_n, \lambda) = \left( e^{-n\lambda}\, \lambda^{n\bar{x}} \right) \times \left( \frac{1}{\prod_{i=1}^{n} x_i!} \right) = g[t(x), \lambda] \cdot h(x_1, x_2, \ldots, x_n)$$

Therefore, the factorization theorem of sufficiency tells us that $\bar{X}$ is a
sufficient statistic for λ.
If you think about it, it makes sense that $\bar{X}$ and $\sum_{i=1}^{n} X_i$ are both sufficient
statistics, because if we know $\bar{X}$, we can easily find $\sum_{i=1}^{n} X_i$. And, if we
know $\sum_{i=1}^{n} X_i$, we can easily find $\bar{X}$. Also, both condense the data to the same extent.

Note 2: Since throughout the course we are using a capital letter for a statistic
or estimator, in the last line of the above example we use $\sum_{i=1}^{n} X_i$ in
place of $\sum_{i=1}^{n} x_i$. In all the examples and exercises relating to sufficient
statistics, we use a similar approach.


Note 3: It is easy to see that if f(t) is a one-to-one function and T is a sufficient
statistic for the parameter θ, then f(T) is also a sufficient statistic for θ. In
particular, we can multiply/divide a sufficient statistic by a non-zero constant
and get another sufficient statistic.
Example 5: If the life of the lithium batteries used in cars follows a normal
distribution with mean µ months and standard deviation 5 months, then show
that the sample mean $\bar{X}$ is a sufficient statistic for µ. Are $\bar{X}^2$ and $\bar{X}^3$ sufficient
statistics for µ?

Solution: To find the sufficient statistic for the parameter µ, we use the
factorization theorem. We know that the probability density function of the
normal distribution with mean µ and standard deviation σ is


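$$f(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}; \quad -\infty < x < \infty, \ -\infty < \mu < \infty$$

In our case, σ is known to be 5, therefore,

$$f(x, \mu) = \frac{1}{\sqrt{2\pi \times 25}}\, e^{-\frac{1}{2 \times 25}(x-\mu)^2} = \frac{1}{\sqrt{50\pi}}\, e^{-\frac{1}{50}(x-\mu)^2}$$
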
We can obtain the joint probability density function of the sample observations
X1 , X2 , ..., Xn as

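$$f(x_1, x_2, \ldots, x_n, \mu) = f(x_1, \mu) \cdot f(x_2, \mu) \cdots f(x_n, \mu)$$
$$= \frac{1}{\sqrt{50\pi}}\, e^{-\frac{1}{50}(x_1-\mu)^2} \cdot \frac{1}{\sqrt{50\pi}}\, e^{-\frac{1}{50}(x_2-\mu)^2} \cdots \frac{1}{\sqrt{50\pi}}\, e^{-\frac{1}{50}(x_n-\mu)^2}$$
$$f(x_1, x_2, \ldots, x_n, \mu) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i-\mu)^2}$$
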
We now try to factor the joint probability density function as the product of two
functions, one of which is a function of the parameter µ and a statistic and the other is
independent of the parameter µ. But there is no separate term which is a
function of the xi's alone in $\sum_{i=1}^{n}(x_i - \mu)^2$. Therefore, we use a trick to factor the joint pdf:
we add and subtract $\bar{x}$ inside the parentheses in the
summation. That is

$$f(x_1, x_2, \ldots, x_n, \mu) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i - \bar{x} + \bar{x} - \mu)^2}$$

Now, squaring the quantity in parentheses, we get

$$f(x_1, x_2, \ldots, x_n, \mu) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}\left[ (x_i - \bar{x})^2 + (\bar{x} - \mu)^2 + 2(x_i - \bar{x})(\bar{x} - \mu) \right]}$$

And then distributing the summation, we get

$$f(x_1, x_2, \ldots, x_n, \mu) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\left[ \sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(\bar{x} - \mu)^2 + 2(\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \bar{x}) \right]}$$

But the sum in the last term of the exponent is $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$ (by the property of the mean), and the
second term adds up to n times $(\bar{x} - \mu)^2$ because it does not depend on the
index i. Therefore, we get

$$f(x_1, x_2, \ldots, x_n, \mu) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\left[ \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 + 0 \right]}$$
$$= \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\left[ \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \right]}$$
$$f(x_1, x_2, \ldots, x_n, \mu) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i - \bar{x})^2 \, - \, \frac{n}{50}(\bar{x} - \mu)^2}$$
Now, we can easily factor the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, \mu) = \left( e^{-\frac{n}{50}(\bar{x} - \mu)^2} \right) \times \left( \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i - \bar{x})^2} \right) = g[t(x), \mu] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t(x), \mu] = e^{-\frac{n}{50}(\bar{x} - \mu)^2}$ is a function of the parameter µ and the sample values
x1, x2, …, xn only through $t(x) = \bar{x}$, whereas $h(x_1, x_2, \ldots, x_n) = \left( \frac{1}{\sqrt{50\pi}} \right)^{n} e^{-\frac{1}{50}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
is independent of µ. Hence, by the factorization theorem of sufficiency, the
sample mean $\bar{X}$ is a sufficient statistic for µ when σ² is known.
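
The key algebraic step in this example is the identity $\sum (x_i - \mu)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$. The short sketch below checks it numerically; the battery lives and the trial values of µ are hypothetical numbers chosen only for illustration, and the function names are not from the course text.

```python
from math import isclose

def sum_sq_about(xs, mu):
    """Left-hand side: sum of squared deviations of the data about mu."""
    return sum((x - mu) ** 2 for x in xs)

def decomposed(xs, mu):
    """Right-hand side: deviations about the sample mean plus n * (xbar - mu)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) + n * (xbar - mu) ** 2

lifetimes = [58.0, 61.5, 64.2, 55.9, 60.4]   # hypothetical battery lives in months

# Compare both sides of the identity for a few arbitrary values of mu.
checks = [(sum_sq_about(lifetimes, mu), decomposed(lifetimes, mu)) for mu in (50.0, 60.0, 72.5)]
for lhs, rhs in checks:
    print(round(lhs, 6), round(rhs, 6))
```

Only the second piece on the right-hand side involves µ, and it involves the data only through the sample mean, which is why the joint density splits into g and h as shown.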
We now check whether $\bar{X}^2$ and $\bar{X}^3$ are sufficient statistics for µ.

The function $y = \bar{X}^2$ is not a one-to-one function of $\bar{X}$, because a given value of y corresponds to two possible
values of $\bar{X}$, namely $-\sqrt{y}$ and $+\sqrt{y}$, so the sign of $\bar{X}$ is lost. Therefore, $\bar{X}^2$ is not a sufficient statistic for µ.

If we are given the value of $y = \bar{X}^3$, we can easily get the single value of $\bar{X}$
through the one-to-one function $y^{1/3}$. Therefore, $\bar{X}^3$ is also sufficient for µ.

Example 6: Suppose the time between medication doses follows a gamma
distribution whose pdf is given as follows:

$$f(x, a, b) = \frac{b^{a}}{\Gamma(a)}\, e^{-bx}\, x^{a-1}; \quad x > 0, \ a, b > 0$$

Obtain sufficient statistic for
(i) ‘a’ when ‘b’ is known
(ii) ‘b’ when ‘a’ is known
(iii) ‘a’ and ‘b’ both.
Solution: To find the sufficient statistic, first, we have to find the joint
probability density function of the sample values of the gamma distribution. Let
X1, X2, …, Xn be a random sample taken from the gamma distribution with
parameters ‘a’ and ‘b’.
We can obtain the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, a, b) = f(x_1, a, b)\, f(x_2, a, b) \cdots f(x_n, a, b)$$
$$= \frac{b^{a}}{\Gamma(a)}\, e^{-bx_1} x_1^{a-1} \cdot \frac{b^{a}}{\Gamma(a)}\, e^{-bx_2} x_2^{a-1} \cdots \frac{b^{a}}{\Gamma(a)}\, e^{-bx_n} x_n^{a-1}$$

After simplification, we get

$$f(x_1, x_2, \ldots, x_n, a, b) = \frac{b^{na}}{(\Gamma(a))^{n}}\, e^{-b\sum_{i=1}^{n} x_i} \left( \prod_{i=1}^{n} x_i \right)^{a-1}$$

Since there are two parameters (a, b), we consider the following cases:

Case I: When ‘b’ is known, we treat ‘b’ as a constant and find the
sufficient statistic for ‘a’.
We can factor the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, a) = \left( \frac{b^{na}}{(\Gamma(a))^{n}} \left( \prod_{i=1}^{n} x_i \right)^{a-1} \right) \left( e^{-b\sum_{i=1}^{n} x_i} \right) = g[t(x), a] \cdot h(x_1, x_2, \ldots, x_n)$$

(Since ‘b’ is known, ‘b’ is treated as a constant.)

where $g[t(x), a] = \frac{b^{na}}{(\Gamma(a))^{n}} \left( \prod_{i=1}^{n} x_i \right)^{a-1}$ is a function of the parameter ‘a’ and the
sample values x1, x2, …, xn only through $t(x) = \prod_{i=1}^{n} x_i$, and $h(x_1, x_2, \ldots, x_n) = e^{-b\sum_{i=1}^{n} x_i}$
is a function of the sample values x1, x2, …, xn (‘b’ treated as a constant) and is
independent of the parameter ‘a’.
Hence, by the factorization theorem of sufficiency, $\prod_{i=1}^{n} X_i$ is a sufficient statistic
for the parameter ‘a’.


Case II: When ‘a’ is known, we treat ‘a’ as a constant and find the
sufficient statistic for ‘b’.
We now factor the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, b) = \left( b^{na}\, e^{-b\sum_{i=1}^{n} x_i} \right) \left( \frac{1}{(\Gamma(a))^{n}} \left( \prod_{i=1}^{n} x_i \right)^{a-1} \right) = g[t(x), b] \cdot h(x_1, x_2, \ldots, x_n)$$

(Since ‘a’ is known, ‘a’ is treated as a constant.)

where $g[t(x), b] = b^{na}\, e^{-b\sum_{i=1}^{n} x_i}$ is a function of the parameter ‘b’ and the sample
values x1, x2, …, xn only through $t(x) = \sum_{i=1}^{n} x_i$, and $h(x_1, x_2, \ldots, x_n) = \frac{1}{(\Gamma(a))^{n}} \left( \prod_{i=1}^{n} x_i \right)^{a-1}$
is a function of the sample values x1, x2, …, xn (‘a’ treated as a constant) and is
independent of the parameter ‘b’.
Hence, by the factorization theorem of sufficiency, $\sum_{i=1}^{n} X_i$ is a sufficient statistic
for ‘b’.
Case III: When ‘a’ and ‘b’ are unknown, we find jointly sufficient statistics
for ‘a’ and ‘b’.
Since we cannot separate any term of the joint probability density function
which is independent of both ‘a’ and ‘b’, we can factor the joint
probability density function as

$$f(x_1, x_2, \ldots, x_n, a, b) = \left( \frac{b^{na}}{(\Gamma(a))^{n}}\, e^{-b\sum_{i=1}^{n} x_i} \left( \prod_{i=1}^{n} x_i \right)^{a-1} \right) \cdot 1 = g[t_1(x), t_2(x), a, b] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t_1(x), t_2(x), a, b] = \frac{b^{na}}{(\Gamma(a))^{n}}\, e^{-b\sum_{i=1}^{n} x_i} \left( \prod_{i=1}^{n} x_i \right)^{a-1}$ is a function of the parameters
‘a’ and ‘b’ and the sample values x1, x2, …, xn only through $t_1(x) = \prod_{i=1}^{n} x_i$ and
$t_2(x) = \sum_{i=1}^{n} x_i$, whereas $h(x_1, x_2, \ldots, x_n) = 1$ is independent of the parameters ‘a’
and ‘b’.
Hence, by the factorization theorem, $\prod_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i$ are jointly sufficient for the
parameters ‘a’ and ‘b’.
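
The three cases can be illustrated with a small numerical experiment. The sketch below is an illustration under assumptions, not part of the course text: the samples and parameter values are hypothetical, and `gamma_loglik` is a helper name introduced here. It uses the fact that (1, 6, 6) and (2, 2, 9) share both the same sum (13) and the same product (36), while (3, 4, 6) shares only the sum.

```python
from math import isclose, lgamma, log

def gamma_loglik(xs, a, b):
    """Log of the joint gamma pdf, b^a / Gamma(a) * e^{-b x} * x^{a-1}, over the sample xs."""
    n = len(xs)
    return (n * a * log(b) - n * lgamma(a)
            - b * sum(xs) + (a - 1) * sum(log(x) for x in xs))

sample_a = [1.0, 6.0, 6.0]   # sum 13, product 36
sample_b = [2.0, 2.0, 9.0]   # sum 13, product 36
sample_c = [3.0, 4.0, 6.0]   # sum 13, product 72

params = [(0.5, 0.3), (2.0, 1.0), (4.5, 2.2)]   # arbitrary (a, b) pairs

# Same sum and same product: the likelihood agrees for every (a, b).
same = [isclose(gamma_loglik(sample_a, a, b), gamma_loglik(sample_b, a, b)) for a, b in params]

# Same sum but different product: the gap equals (a - 1) * (log 36 - log 72), which changes with a.
diff = [gamma_loglik(sample_a, a, b) - gamma_loglik(sample_c, a, b) for a, b in params]

print(same)
print([round(d, 4) for d in diff])
```

The second list depends on ‘a’ but not on ‘b’, which matches Cases I and II: the sum alone is enough when only ‘b’ is unknown, whereas the product is needed as soon as ‘a’ is unknown.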


Example 7: If X1, X2, …, Xn is a random sample taken from a uniform
distribution U(α, β), find the sufficient statistics for α and β.

Solution: The probability density function of U(α, β) is given by

$$f(x, \alpha, \beta) = \frac{1}{\beta - \alpha}; \quad \alpha \le x \le \beta$$

We can obtain the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = f(x_1, \alpha, \beta) \cdot f(x_2, \alpha, \beta) \cdots f(x_n, \alpha, \beta)$$
$$= \frac{1}{\beta - \alpha} \cdot \frac{1}{\beta - \alpha} \cdots \frac{1}{\beta - \alpha}$$
$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = \frac{1}{(\beta - \alpha)^{n}}$$

Since the range of the variables depends upon the parameters, we consider the
order statistics X(1), X(2), …, X(n) instead of the sample observations X1, X2, …,
Xn. (Margin note: The order statistics of a random sample are the sample values placed in
ascending order of magnitude. These are denoted by X(1), X(2), …, X(n).)
Therefore, we can write the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = \frac{1}{(\beta - \alpha)^{n}}; \quad \alpha \le x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \le \beta$$
$$= \left( \frac{1}{(\beta - \alpha)^{n}}\, I_1\!\left(x_{(1)}, \alpha\right) I_2\!\left(x_{(n)}, \beta\right) \right) \cdot 1$$

where x(1) and x(n) are the minimum and maximum sample observations,
respectively, and

$$I_1\!\left(x_{(1)}, \alpha\right) = \begin{cases} 1; & \text{if } x_{(1)} \ge \alpha \\ 0; & \text{otherwise} \end{cases}$$

$$I_2\!\left(x_{(n)}, \beta\right) = \begin{cases} 1; & \text{if } x_{(n)} \le \beta \\ 0; & \text{otherwise} \end{cases}$$

Therefore,

$$f(x_1, x_2, \ldots, x_n, \alpha, \beta) = g[t_1(x), t_2(x), \alpha, \beta] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t_1(x), t_2(x), \alpha, \beta] = \frac{1}{(\beta - \alpha)^{n}}\, I_1\!\left(x_{(1)}, \alpha\right) I_2\!\left(x_{(n)}, \beta\right)$ is a function of the
parameters (α, β) and the sample values x1, x2, …, xn only through t1(x) = x(1) and
t2(x) = x(n), whereas h(x1, x2, …, xn) = 1 is independent of the parameters α and
β.
Hence, by the factorization theorem of sufficiency, X(1) and X(n) are jointly
sufficient for α and β.
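
The role of the two indicator functions can be seen in a short sketch. The data and the (α, β) pairs below are hypothetical, and `uniform_joint_pdf` is a helper name introduced only for this illustration. Two samples that share the same minimum and maximum but differ in their interior points are given the same joint density at every (α, β).

```python
from math import isclose

def uniform_joint_pdf(xs, alpha, beta):
    """Joint pdf of an i.i.d. U(alpha, beta) sample, written with the two indicators."""
    n = len(xs)
    i1 = 1 if min(xs) >= alpha else 0   # I1(x_(1), alpha)
    i2 = 1 if max(xs) <= beta else 0    # I2(x_(n), beta)
    return i1 * i2 / (beta - alpha) ** n

sample_a = [2.1, 4.7, 3.3, 2.9]   # hypothetical data: minimum 2.1, maximum 4.7
sample_b = [4.7, 2.1, 2.2, 4.6]   # different interior points, same minimum and maximum

grid = [(2.0, 5.0), (1.0, 4.7), (2.5, 5.0), (0.0, 4.0)]   # arbitrary (alpha, beta) pairs
values_a = [uniform_joint_pdf(sample_a, al, be) for al, be in grid]
values_b = [uniform_joint_pdf(sample_b, al, be) for al, be in grid]

print(values_a)
print(values_b)
```

The last two pairs give density 0 because either α exceeds the sample minimum or β falls below the sample maximum, which is exactly what the indicators I1 and I2 encode.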
You will understand more clearly how to factor the joint probability density
(mass) function and obtain the sufficient statistic when you try the following
Self Assessment Question.

SAQ 3
(i) If the time between two customers arriving in a bank follows an
exponential distribution with parameter θ, then find the sufficient statistic
for θ.

(ii) Consider Example 5 of the life of the lithium batteries used in cars. If the
life of the lithium battery follows a normal distribution with mean 95
months and variance σ², find the sufficient statistic for σ².

(iii) The time interval between two metro trains follows the uniform
distribution [0, θ]. Find the sufficient statistic for θ.

Let us discuss the properties of the sufficient statistic in the next section.

9.5 PROPERTIES OF SUFFICIENT STATISTIC


After understanding the concept of sufficient statistic, we now describe some
important properties of the sufficient statistic as follows:
1. A sufficient estimator/statistic is always a consistent estimator.

2. A sufficient estimator/statistic may be unbiased.

3. A sufficient estimator is the most efficient estimator if an efficient estimator
exists.

4. The random sample X1, X2, …, Xn and the order statistics X(1), X(2), …, X(n) are
always sufficient statistics because both contain all information about a
parameter of the population.

5. If T is a sufficient statistic for the parameter θ and f(T) is a one-to-one
function of T then f(T) is also sufficient for θ. For example, if $T = \sum X_i$ is a
sufficient statistic for the parameter θ then $\bar{X} = \frac{1}{n}\sum X_i = \frac{T}{n}$ is also sufficient
for θ because $\bar{X} = T/n$ is a one-to-one function of T.

After understanding sufficiency, we now discuss the concept of the minimal
sufficient statistic in the next section.


9.6 MINIMAL SUFFICIENT STATISTIC


In Sections 9.3 and 9.4, you studied the concept of a sufficient statistic and how
to use the Fisher-Neyman factorization theorem to find a sufficient statistic/
estimator, respectively. In Section 9.5, you studied the properties of the
sufficient statistic. One of these properties is that every one-to-one
function of a sufficient estimator is also a sufficient statistic for the same
parameter, and another is that the whole sample and the order statistics are sufficient for a
parameter. In Example 5, you have seen that the sample mean $\bar{X}$ and $\bar{X}^3$ are
sufficient statistics for the population mean when the sample is taken from a
normal distribution. Also, the whole sample X1, X2, …, Xn and the order statistics
X(1), X(2), …, X(n) are sufficient statistics for the population mean µ. Now
the question may arise:
(i) Are they all as “good” as one another?
(ii) Is there some reason to prefer one over another?
To answer these questions, we require something more, and the
minimal sufficient statistic does this job for us. Actually, we use the
concept of sufficient statistic to condense the data X1, X2, …, Xn using a
statistic in such a way that no information is lost. In general, if one statistic
condenses the data more than another then we prefer that statistic. A
sufficient statistic is said to be a minimal sufficient statistic if no other sufficient
statistic condenses the data more.
We can formally define the minimal sufficient statistic as follows:
A sufficient statistic is defined to be a minimal sufficient statistic if and
only if it is a function of every other sufficient statistic.
Like the definitions of the consistent estimator and the sufficient statistic, the
definition of the minimal sufficient statistic has little use in finding a minimal
sufficient statistic. It is noted that if the joint probability density function is
properly factored, then the Fisher-Neyman factorization criterion will give a
minimal sufficient statistic. So, we can say that the statistics found in Examples
3 to 7 are minimal.

9.7 SUMMARY
In this unit, we have covered the following points:

• The joint probability mass function of the sample values from a discrete distribution
is defined as

$$f(x_1, x_2, \ldots, x_n, \theta) = P[X_1 = x_1]\, P[X_2 = x_2] \cdots P[X_n = x_n]$$

• The joint probability density function of the sample values from a continuous distribution
is defined as

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta)$$

• A statistic T = T(X) is said to be a sufficient statistic for a parameter θ if it
contains “all of the information” about θ that is available in the sample. In
other words, we do not lose any information about θ by reducing the
sample X to the statistic T. This property of an estimator is called
sufficiency.

• A statistic T = t(X1, X2, …, Xn) is a sufficient statistic if, for each t, the
conditional distribution of X1, X2, …, Xn given T = t does not depend
on θ.

• A statistic or estimator T is sufficient for a parameter θ if and
only if the joint probability density (mass) function of X1, X2, …, Xn can be
factored as

$$f(x_1, x_2, \ldots, x_n, \theta) = g[t(x), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$

where the function g[t(x), θ] is a non-negative function of the parameter
θ that depends on the observed sample values (x1, x2, …, xn) only through the function t(x),
and the function h(x1, x2, …, xn) is a non-negative function of (x1, x2, …, xn)
that does not involve the parameter θ.
• We described the properties of the sufficient statistic, for example, that a
one-to-one function of a sufficient statistic is also sufficient.
• A sufficient statistic is defined to be a minimal sufficient statistic if and
only if it is a function of every other sufficient statistic.

9.8 TERMINAL QUESTIONS


1. Consider Example 3 and find the sufficient statistic for the parameter p
using the Fisher-Neyman factorization theorem. Is it a minimal sufficient
statistic?
2. Define sufficiency and minimal sufficient statistic.

9.9 SOLUTIONS / ANSWERS


Self Assessment Questions (SAQs)
1. Since the number of patients who were cured of the disease follows a
binomial distribution with parameters n = 10 and p, its
probability mass function is given by

$$P[X = x] = \binom{n}{x} p^{x}(1-p)^{n-x}; \quad x = 0, 1, 2, \ldots, n \ \& \ 0 \le p \le 1$$
$$= \binom{10}{x} p^{x}(1-p)^{10-x}; \quad x = 0, 1, 2, \ldots, 10 \quad (\text{since } n = 10)$$

If X1, X2, …, X10 denote the outcomes of the drug then by the definition of
the joint probability mass function of the sample observations
X1, X2, …, X10, we have

$$f(x_1, x_2, \ldots, x_{10}, p) = P[X_1 = x_1]\, P[X_2 = x_2] \cdots P[X_{10} = x_{10}]$$

We can obtain the joint probability mass function by putting X as
x1, x2, …, x10 in the probability mass function as mentioned above.
Therefore,

$$f(x_1, x_2, \ldots, x_{10}, p) = \binom{10}{x_1} p^{x_1}(1-p)^{10-x_1} \cdot \binom{10}{x_2} p^{x_2}(1-p)^{10-x_2} \cdots \binom{10}{x_{10}} p^{x_{10}}(1-p)^{10-x_{10}}$$

Collecting like terms, we get

$$f(x_1, x_2, \ldots, x_{10}, p) = \prod_{i=1}^{10} \binom{10}{x_i}\; p^{x_1 + x_2 + \cdots + x_{10}}\, (1-p)^{\overbrace{10 + 10 + \cdots + 10}^{10\text{-times}} \, - \, (x_1 + x_2 + \cdots + x_{10})}$$

On simplifying, we get the required joint probability mass function as

$$f(x_1, x_2, \ldots, x_{10}, p) = \prod_{i=1}^{10} \binom{10}{x_i}\; p^{\sum_{i=1}^{10} x_i}\, (1-p)^{100 - \sum_{i=1}^{10} x_i}$$

2. To show that the statistic $T = \bar{X}$ is a sufficient statistic, we have to find
the conditional distribution of the sample observations given T. Therefore,
we consider

$$f(x_1, x_2, \ldots, x_n \mid t) = \frac{f(x_1, x_2, \ldots, x_n, t)}{f(t)}$$

It means that we require $f(x_1, x_2, \ldots, x_n, t)$ and the distribution of the
statistic $T = \bar{X}$. We first find $f(x_1, x_2, \ldots, x_n, t)$ as discussed in Example 3,
that is, for $\bar{x} = t$,

$$f(x_1, x_2, \ldots, x_n, t) = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta)$$
$$= \frac{1}{\theta} e^{-x_1/\theta} \cdot \frac{1}{\theta} e^{-x_2/\theta} \cdots \frac{1}{\theta} e^{-x_n/\theta}$$
$$= \frac{1}{\theta^{n}}\, e^{-\frac{1}{\theta}\sum_{i=1}^{n} x_i}$$
$$f(x_1, x_2, \ldots, x_n, t) = \frac{1}{\theta^{n}}\, e^{-\frac{n}{\theta} t} \quad \left( \text{since } t = \frac{1}{n}\sum_{i=1}^{n} x_i \right)$$

We now find the distribution of T. Since the sample comes from the
exponential distribution, X1, X2, …, Xn follow the same
exponential distribution with parameter θ. Also, we know that if X1, X2, …,
Xn are independent and follow this exponential distribution then $\sum_{i=1}^{n} X_i$ follows a gamma
distribution with parameters (n, 1/θ) and the statistic $T = \bar{X}$ follows the
gamma distribution with parameters (n, n/θ). Therefore, the pdf of $T = \bar{X}$
is given as follows:

$$f(t) = \frac{\left( \frac{n}{\theta} \right)^{n} e^{-\frac{n}{\theta} t}\, t^{n-1}}{\Gamma(n)}; \quad t > 0$$

Therefore, we can find the conditional distribution of the sample
observations given T as

$$f(x_1, x_2, \ldots, x_n \mid t) = \frac{f(x_1, x_2, \ldots, x_n, t)}{f(t)} = \frac{\frac{1}{\theta^{n}}\, e^{-\frac{n}{\theta} t}}{\left( \frac{n}{\theta} \right)^{n} e^{-\frac{n}{\theta} t}\, t^{n-1} / \Gamma(n)} = \frac{\Gamma(n)}{n^{n}\, t^{n-1}}$$

Since the conditional distribution of X1, X2, …, Xn given $T = \bar{X}$ does not
depend on the parameter θ, T is indeed sufficient for the
parameter θ.
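
The final ratio can be checked numerically. This is a sketch under assumptions: the waiting times and the trial values of θ are hypothetical, and `joint_pdf` and `mean_pdf` are helper names introduced here. It divides the joint density by the gamma density of the sample mean for several values of θ and compares the result with Γ(n) / (nⁿ tⁿ⁻¹).

```python
from math import exp, gamma, isclose

def joint_pdf(xs, theta):
    """Joint pdf of an i.i.d. sample from f(x) = (1/theta) * exp(-x/theta)."""
    n = len(xs)
    return theta ** (-n) * exp(-sum(xs) / theta)

def mean_pdf(t, n, theta):
    """Gamma pdf of the sample mean: (n/theta)^n * exp(-n t/theta) * t^(n-1) / Gamma(n)."""
    return (n / theta) ** n * exp(-n * t / theta) * t ** (n - 1) / gamma(n)

waits = [1.2, 0.4, 3.1, 2.3]   # hypothetical times between customers
n = len(waits)
t = sum(waits) / n             # observed value of the sample mean

# The ratio should be the same for every theta.
ratios = [joint_pdf(waits, theta) / mean_pdf(t, n, theta) for theta in (0.5, 1.0, 4.0)]
expected = gamma(n) / (n ** n * t ** (n - 1))

print(ratios, expected)
```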
3(i) Here, we take a random sample from exp(θ) whose probability density
function is given by

$$f(x, \theta) = \theta\, e^{-\theta x}; \quad x > 0 \ \& \ \theta > 0$$

To find the sufficient statistic, first, we have to find the joint probability
density function of the sample values of the exponential distribution. Let
X1, X2, …, Xn be a random sample taken from the exponential distribution
with parameter θ. We can obtain the joint probability density function of
the exponential distribution as

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta)$$
$$= \theta e^{-\theta x_1} \cdot \theta e^{-\theta x_2} \cdots \theta e^{-\theta x_n} = \theta^{n}\, e^{-\theta \sum_{i=1}^{n} x_i}$$
$$f(x_1, x_2, \ldots, x_n, \theta) = \theta^{n}\, e^{-\theta n \bar{x}} \quad \left( \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \right)$$

Since we cannot separate any term of the joint probability density
function which is independent of θ, we can factor the joint
probability density function as

$$f(x_1, x_2, \ldots, x_n, \theta) = \left( \theta^{n}\, e^{-\theta n \bar{x}} \right) \cdot 1 = g[t(x), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t(x), \theta] = \theta^{n}\, e^{-\theta n \bar{x}}$ is a function of the parameter θ and the
sample values x1, x2, …, xn only through $t(x) = \bar{x}$, and $h(x_1, x_2, \ldots, x_n) = 1$ is
independent of θ. Hence, by the factorization theorem of sufficiency, $\bar{X}$ is
a sufficient statistic/estimator of θ.
3(ii) To find the sufficient statistic for the parameter σ², we use the
factorization theorem. We know that the probability density function of
the normal distribution with mean µ and variance σ² is

$$f(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}; \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma^2 > 0$$

In our case, µ is known to be 95, therefore,

$$f(x, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x-95)^2}$$

Let X1, X2, …, Xn be a random sample taken from the above normal
distribution. We can obtain the joint probability density function of the
sample observations X1, X2, …, Xn as

$$f(x_1, x_2, \ldots, x_n, \sigma^2) = f(x_1, \sigma^2) \cdot f(x_2, \sigma^2) \cdots f(x_n, \sigma^2)$$
$$= \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_1-95)^2} \cdot \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_2-95)^2} \cdots \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(x_n-95)^2}$$
$$= \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-95)^2}$$

We can factor the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, \sigma^2) = \left( \left( \frac{1}{\sqrt{\sigma^2}} \right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-95)^2} \right) \left( \frac{1}{\sqrt{2\pi}} \right)^{n} = g[t(x), \sigma^2] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t(x), \sigma^2] = \left( \frac{1}{\sqrt{\sigma^2}} \right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-95)^2}$ is a function of the parameter σ²
and the sample values x1, x2, …, xn only through $t(x) = \sum_{i=1}^{n}(x_i - 95)^2$, whereas
$h(x_1, x_2, \ldots, x_n) = \left( \frac{1}{\sqrt{2\pi}} \right)^{n}$ is independent of σ². Hence, by the factorization
theorem of sufficiency, $\sum_{i=1}^{n}(X_i - 95)^2$ is a sufficient estimator for σ² when µ
is known to be 95.

3(iii) The probability density function of the uniform distribution U[0, θ] is given as
follows:

$$f(x, \theta) = \frac{1}{\theta}; \quad 0 \le x \le \theta, \ \theta > 0$$

To find the sufficient statistic, first, we have to find the joint probability
density function of the sample values. Let X1, X2, …, Xn be a random
sample taken from U[0, θ]. Therefore, we can obtain the joint probability
density function of the sample X1, X2, …, Xn as

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) \cdot f(x_2, \theta) \cdots f(x_n, \theta)$$
$$= \frac{1}{\theta} \cdot \frac{1}{\theta} \cdots \frac{1}{\theta} = \frac{1}{\theta^{n}}$$

Since the range of the variable depends upon the parameter θ, we
consider the order statistics $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$.

Therefore, we can factor the joint probability density function as

$$f(x_1, x_2, \ldots, x_n, \theta) = \frac{1}{\theta^{n}}; \quad 0 \le x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \le \theta$$
$$= \left( \frac{1}{\theta^{n}}\, I\!\left(x_{(n)}, \theta\right) \right) \cdot 1$$

where

$$I\!\left(x_{(n)}, \theta\right) = \begin{cases} 1; & \text{if } x_{(n)} \le \theta \\ 0; & \text{otherwise} \end{cases}$$

Therefore,

$$f(x_1, x_2, \ldots, x_n, \theta) = g[t(x), \theta] \cdot h(x_1, x_2, \ldots, x_n)$$

where $g[t(x), \theta] = \frac{1}{\theta^{n}}\, I\!\left(x_{(n)}, \theta\right)$ is a function of θ and the sample values only
through $t(x) = x_{(n)}$, whereas $h(x_1, x_2, \ldots, x_n) = 1$ is independent of θ.

Hence, by the factorization theorem, X(n) is a sufficient statistic for the
parameter θ.

Terminal Questions (TQs)


1. If the outcome of each coin is represented by a random variable X then it
follows the Bernoulli distribution with the probability of getting a head p.
Therefore, the probability mass function of X is given as follows:

$$P[X = x] = p^{x}(1-p)^{1-x}; \quad x = 0, 1, \ 0 < p < 1$$

We can obtain the joint probability mass function of the three coins as

$$f(x_1, x_2, x_3, p) = P[X_1 = x_1]\, P[X_2 = x_2]\, P[X_3 = x_3]$$

We can obtain the joint probability mass function by putting X as x1, x2, x3
in the probability mass function as mentioned above. Therefore,

$$f(x_1, x_2, x_3, p) = p^{x_1}(1-p)^{1-x_1} \cdot p^{x_2}(1-p)^{1-x_2} \cdot p^{x_3}(1-p)^{1-x_3}$$

Collecting like terms, we get

$$f(x_1, x_2, x_3, p) = p^{x_1 + x_2 + x_3}\, (1-p)^{1 + 1 + 1 - (x_1 + x_2 + x_3)}$$

On simplifying, we get the required joint probability mass function as

$$f(x_1, x_2, x_3, p) = p^{\sum_{i=1}^{3} x_i}\, (1-p)^{3 - \sum_{i=1}^{3} x_i}$$

We now try to write the joint mass function as the product of two
functions, but we cannot separate any term of the joint probability mass
function which is independent of p; therefore, we can factor the joint
probability mass function as

$$f(x_1, x_2, x_3, p) = \left( p^{\sum_{i=1}^{3} x_i}\, (1-p)^{3 - \sum_{i=1}^{3} x_i} \right) \cdot 1 = g[t(x), p] \cdot h(x_1, x_2, x_3)$$
