Statistical Inference Course Outline
CHAPTER ONE
THEORY OF ESTIMATION
5.1 Introduction
Inferential statistics is concerned with techniques for drawing conclusions about an entire population from the results of a study of a sample drawn from that population. The conclusion may be an estimate of the value of some population parameter, a range of values that has a known probability of including the true value of the parameter, or an assessment of the probability of certain kinds of results under certain population conditions.
A population is defined in statistics as the entire body of items about which we want to obtain some information or reach an opinion. It is the totality of all units whose attribute is under investigation. When a population is under study, it may be impracticable to extract the required data from every item of that population, and we may be content with just a part of the population selected according to some rules. This part of the population, selected for study so that conclusions can be drawn about the entire population, is called a sample.
Any descriptive measure obtained (by calculation or otherwise) from sample values is called a statistic, while measures obtained using the entire population data are called parameters. Examples of sample statistics are the sample mean X̄, sample variance S², sample standard deviation S, sample median, sample size n, and every other measure calculated or obtained from sample data. Examples of population parameters are the population mean μ, population variance σ², population standard deviation σ, population correlation coefficient ρ, population size N, and indeed every other measure obtained by using the full population data.
Any statistic T derived from a random sample and used to give information about some unknown population parameter θ is called an estimator for θ. When an estimator T gives a single value as the estimate of θ, that value is called a point estimate. Interval estimation is concerned with locating a range of values within which a population parameter is expected to lie with a given degree of confidence or probability. This degree of confidence is usually expressed as a percentage or as a probability.
(a) Unbiasedness:
Let θ be an unknown population parameter and T an estimator for θ. Then T is said to be an unbiased estimator for θ if E(T) = θ, where E(T) denotes the mathematical expectation (or average value) of T. Note that for any population parameter θ, it is possible to have more than one unbiased estimator for θ.
Theorem 5.1
Let X1, X2, X3, …, Xn be a random sample of size n drawn from a population having unknown mean μ and variance σ². Then
(i) the sample mean X̄ is an unbiased estimator for the population mean μ;
(ii) nS²/(n − 1) is an unbiased estimator for σ².
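Proof:
(i)
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X_i}{n}$$
Now,
$$E(\bar{X}) = E\left[\frac{\sum X_i}{n}\right] = \frac{1}{n}\sum E(X_i) = \frac{1}{n}(n\mu) = \mu$$
(ii) Using the identity Σ(Xi − X̄)² = Σ(Xi − μ)² − n(X̄ − μ)², together with E((Xi − μ)²) = σ² and E((X̄ − μ)²) = Var(X̄) = σ²/n,
$$\begin{aligned}
E\left[\frac{n}{n-1}S^2\right] &= E\left[\frac{1}{n-1}\sum (X_i - \bar{X})^2\right] = \frac{1}{n-1}E\left(\sum (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right)\\
&= \frac{1}{n-1}\left\{\sum E\left((X_i - \mu)^2\right) - n\,E\left((\bar{X} - \mu)^2\right)\right\}\\
&= \frac{1}{n-1}\left[n\sigma^2 - n\,\frac{\sigma^2}{n}\right] = \frac{1}{n-1}\left(n\sigma^2 - \sigma^2\right)\\
&= \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2
\end{aligned}$$
Hence E[nS²/(n − 1)] = σ², so nS²/(n − 1) is an unbiased estimator for σ².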
Exercises
Show that Σ(Xi − X̄)² = ΣXi² − nX̄², and that S² is not an unbiased estimator for σ².
Example 5.1
The following random sample was obtained from a population:
12, 8, 11, 10, 8, 8, 13, 9, 11, 10.
Find 𝑋̅ and 𝑆 2 , and hence obtain unbiased estimates of μ and σ2.
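Solution:
Unbiased estimates of μ and σ² are given by X̄ and nS²/(n − 1), respectively. Here n = 10, ΣXi = 100 and Σ(Xi − X̄)² = 28, so X̄ = 100/10 = 10 and S² = 28/10 = 2.8. Hence
$$\hat\sigma^2 = \frac{nS^2}{n-1} = \frac{10 \times 2.8}{9} = 3.11$$
This is equivalent to
$$\hat\sigma^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$$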
NOTE
The sample variance is
$$S^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$$
The unbiased estimate of the population variance from sample data is
$$\hat\sigma^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$$
But the actual population variance is
$$\sigma^2 = \frac{1}{N}\sum (X_i - \mu)^2$$
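A minimal Python sketch (standard library only) contrasting the two divisors in Example 5.1 and the NOTE above. The helper name `variance_estimates` is ours, chosen for illustration; the data are the ten sample values of Example 5.1.

```python
def variance_estimates(sample):
    """Return (X̄, S² with divisor n, unbiased estimate with divisor n - 1)."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # Σ(Xi − X̄)²
    return xbar, ss / n, ss / (n - 1)

data = [12, 8, 11, 10, 8, 8, 13, 9, 11, 10]  # Example 5.1
xbar, s2, sigma2_hat = variance_estimates(data)
print(xbar, s2, round(sigma2_hat, 2))  # 10.0 2.8 3.11
```

Python's `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1) follow the same two conventions.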
(b) Efficiency
Let T1 and T2 be distinct unbiased estimators for θ, where θ is an unknown population parameter, i.e. E(T1) = E(T2) = θ.
Both T1 and T2 have well-defined distributions. If their variances are not equal and Var(T1) < Var(T2), then we say that T1 is a more efficient estimator of θ than T2. Efficiency is therefore concerned with comparing all unbiased estimators of a parameter θ. The one with the smallest variance is called the most efficient estimator of θ; it is also called the Minimum Variance Unbiased Estimator (MVUE).
Theorem 5.2
Given any random sample of size n from a population with mean μ and variance σ²:
(i) the most efficient estimator for μ is X̄;
(ii) the most efficient estimator for σ² is nS²/(n − 1).
Example 5.2
Given a random sample X1, X2, X3, …, Xn of size n, show that both (1/3)(X1 + X2 + X3) and (1/2)(X4 + X5) are unbiased estimators for μ. Which of the two is more efficient? Hence show that of all unbiased estimates derived from the sample, X̄ is the most efficient.
Solution:
Clearly
$$E\left\{\frac{1}{3}(X_1 + X_2 + X_3)\right\} = \frac{1}{3}\cdot 3\mu = \mu$$
(c) Consistency
This is concerned with the behaviour of an estimator as the sample size n becomes large. Given a sample X1, X2, …, Xn of size n from a population with unknown parameter θ, suppose T is an unbiased estimator of θ derived from the sample. If Var(T) → 0 as n → ∞, or, more generally, if
$$\lim_{n\to\infty} P\left(|T - \theta| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0,$$
then T is called a consistent estimator for θ. Simply put, as the sample size increases, a consistent estimator becomes more reliable.
For example,
$$\operatorname{Var}\left\{\tfrac{1}{2}(X_4 + X_5)\right\} = \frac{\sigma^2}{2},$$
which does not depend on n: as the sample size increases, it remains constant and does not tend to zero. Hence ½(X4 + X5), though unbiased, is not a consistent estimator of μ.
Theorem 5.3
i. X̄ is a consistent estimator for μ.
ii. nS²/(n − 1) is a consistent estimator for σ².
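Proof:
(i)
$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left\{\frac{1}{n}\sum X_i\right\} = \frac{1}{n^2}\sum \operatorname{Var}(X_i) = \frac{1}{n^2}\times n\sigma^2 = \frac{\sigma^2}{n}$$
$$\lim_{n\to\infty}\operatorname{Var}(\bar{X}) = \lim_{n\to\infty}\frac{\sigma^2}{n} = 0$$
(ii) If the population is normal, nS²/σ² = Σ(Xi − X̄)²/σ² has a chi-square distribution with n − 1 degrees of freedom, whose variance is 2(n − 1). Hence
$$\operatorname{Var}\left(\frac{n}{n-1}S^2\right) = \frac{\sigma^4}{(n-1)^2}\times 2(n-1) = \frac{2\sigma^4}{n-1}, \qquad \lim_{n\to\infty}\frac{2\sigma^4}{n-1} = 0$$
∴ nS²/(n − 1) is a consistent estimator for σ².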
(d) Sufficiency
An estimator is said to be sufficient if it extracts from the sample every bit of available
information relative to the parameter.
Example 5.3
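A random sample X1, X2, X3, …, Xn is drawn from a distribution with mean μ and variance σ², both assumed unknown. Consider the statistic
$$T = \frac{1}{n+1}\sum X_i.$$
Show that Var(T) < Var(X̄) for all values of n. Explain why this does not contradict the fact that X̄ is the most efficient estimator of μ.
Solution:
Now
$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \operatorname{Var}(T) = \operatorname{Var}\left[\frac{1}{n+1}\sum X_i\right] = \frac{1}{(n+1)^2}\sum \sigma^2 = \frac{n\sigma^2}{(n+1)^2}$$
To show that nσ²/(n + 1)² < σ²/n, divide both sides by σ² to get n/(n + 1)² < 1/n, i.e. n² < (n + 1)² = n² + 2n + 1, which holds for every n ≥ 1.
However,
$$E(T) = \frac{1}{n+1}\sum E(X_i) = \frac{1}{n+1}\,n\mu = \frac{n}{n+1}\,\mu \ne \mu$$
Hence T is not an unbiased estimator for μ. Thus the fact that X̄ is the most efficient unbiased estimator for μ has not been contradicted, and the bias of T also excludes it from being a consistent estimator for μ in the sense defined above, which applies to unbiased estimators.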
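A small exact check of Example 5.3, using Python's `fractions` module and expressing everything in units of σ² (for variances) and μ (for bias). The function names are hypothetical helpers for illustration only.

```python
from fractions import Fraction

# Units: variances in multiples of σ², bias in multiples of μ (i.e. σ² = μ = 1).
def var_xbar(n):
    return Fraction(1, n)              # Var(X̄) = σ²/n

def var_t(n):
    return Fraction(n, (n + 1) ** 2)   # Var(T) = nσ²/(n + 1)² for T = ΣXi/(n + 1)

def bias_t(n):
    return Fraction(n, n + 1) - 1      # E(T) − μ = nμ/(n + 1) − μ = −μ/(n + 1)

for n in (1, 2, 5, 10, 100):
    print(n, var_t(n), var_xbar(n), var_t(n) < var_xbar(n), bias_t(n))
```

The smaller variance of T is bought with a nonzero bias, which shrinks only as n grows.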
Proof
(a)
$$E\left[\frac{n\bar{X} + m\bar{Y}}{n+m}\right] = \frac{1}{n+m}E\left(n\bar{X} + m\bar{Y}\right) = \frac{1}{n+m}(n\mu + m\mu) = \mu$$
Exercise
Two random samples of sizes 8 and 7, respectively, are drawn from a population as follows:
X: 6, 8, 9, 10, 17, 14, 13, 11 and Y: 9, 10, 6, 12, 7, 11, 8.
From these two samples, calculate the unbiased estimates of the population mean and variance. (Answer: 10.07, 8.92)
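A minimal sketch for checking the exercise answer, assuming the pooled estimates (nX̄ + mȲ)/(n + m) for the mean (as in the proof above) and [Σ(Xi − X̄)² + Σ(Yj − Ȳ)²]/(n + m − 2) for the variance; the pooled-variance formula is our assumption here, chosen because it reproduces the stated answer. The name `pooled_estimates` is hypothetical.

```python
def pooled_estimates(x, y):
    """Pooled unbiased estimates of a common mean and variance from two samples."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    pooled_mean = (n * xbar + m * ybar) / (n + m)   # (nX̄ + mȲ)/(n + m)
    ssx = sum((v - xbar) ** 2 for v in x)           # Σ(Xi − X̄)²
    ssy = sum((v - ybar) ** 2 for v in y)           # Σ(Yj − Ȳ)²
    pooled_var = (ssx + ssy) / (n + m - 2)          # divisor n + m − 2
    return pooled_mean, pooled_var

x = [6, 8, 9, 10, 17, 14, 13, 11]
y = [9, 10, 6, 12, 7, 11, 8]
mean_hat, var_hat = pooled_estimates(x, y)
print(round(mean_hat, 2), round(var_hat, 2))  # 10.07 8.92
```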
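Let $Z_{\alpha/2}$ be the value of the standard normal variable to the right of which lies an area of α/2 under the density function (fig. 5.2). Then we can write
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha \qquad \dots (2)$$
Multiplying each term in the inequality of (2) by σ/√n, subtracting X̄ from each term and multiplying through by −1 (which reverses the inequalities), we get
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$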
[Fig. 5.2: standard normal density, with area α/2 in each tail beyond $-Z_{\alpha/2}$ and $Z_{\alpha/2}$]
Example 5.4
Suppose that a random sample of size n = 20 has mean X̄ = 64.3. If the variance of the population is known to be σ² = 225 (so σ = 15), obtain a 95% confidence interval for the mean of the population from which the sample was drawn.
Solution
A 100(1 − α)% CI for µ is given by
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
With α = 0.05, $Z_{0.025} = 1.96$, so the CI is given by
$$64.3 - 1.96 \times \frac{15}{\sqrt{20}} < \mu < 64.3 + 1.96 \times \frac{15}{\sqrt{20}}, \quad \text{i.e.} \quad 57.726 < \mu < 70.874$$
Hence, the 95% C.I. for µ is (57.7, 70.9).
Example 5.5
Find a 95% C.I. for the true population mean µ if a random sample of 12 from a population with variance 124 yielded X̄ = 51.2.
Solution: The CI is given by
$$51.2 \pm 1.96\sqrt{\frac{124}{12}} = (44.9,\ 57.5)$$
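The known-σ interval of Examples 5.4 and 5.5 can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` gives $Z_{\alpha/2}$ (≈ 1.96 for 95%). The function name `z_interval` is ours, for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """CI for µ when σ is known: X̄ ± Z_{α/2}·σ/√n."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper α/2 point of N(0, 1)
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

print(z_interval(64.3, 15, 20))          # Example 5.4 (σ² = 225)
print(z_interval(51.2, sqrt(124), 12))   # Example 5.5 (σ² = 124)
```

Using the exact quantile 1.95996… instead of the table value 1.96 changes the end points only in the third decimal place.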
For small samples selected from populations that do not satisfy the normality assumption, we cannot expect our stated degree of confidence to be accurate. However, for samples of size n ≥ 30, the central limit theorem generally ensures good results.
where S is the sample standard deviation. The procedure is the same as in (a) above, except that we use the t-distribution in place of the standard normal.
[Fig. 5.3: t density with n − 1 degrees of freedom, with area α/2 in each tail beyond $-t_{\alpha/2}$ and $t_{\alpha/2}$]
From the figure above (fig. 5.3)
$$P\left(-t_{\alpha/2} < T < t_{\alpha/2}\right) = 1 - \alpha \qquad \dots (2)$$
where $t_{\alpha/2}$ is the t-value with n − 1 degrees of freedom to the right of which there is an area of α/2. By symmetry, an equal area of α/2 falls to the left of $-t_{\alpha/2}$. Substituting for T gives
$$P\left(-t_{\alpha/2} < \frac{(\bar{X} - \mu)\sqrt{n}}{S} < t_{\alpha/2}\right) = 1 - \alpha$$
Thus for a small sample (n < 30) with σ² unknown, a 100(1 − α)% confidence interval for the mean µ is given by
$$\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
However, for large samples (n ≥ 30), a 100(1 − α)% confidence interval for µ is given by
$$\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}$$
where $Z_{\alpha/2}$ is as defined earlier. This is possible because for large values of n the t-distribution closely approximates the standard normal distribution. Moreover, when n is large, S² is a good estimator for σ². We showed that nS²/(n − 1) is an unbiased estimator for σ²; since n/(n − 1) → 1 as n → ∞, for large n we have E[S²] ≈ σ².
Example 5.6
Suppose a paint maker wishes to determine the true average drying time of a new paint. He tests twelve areas of equal size and gets a mean of 66.3 minutes and a standard deviation of 8.4 minutes. Based on this sample, construct a 95% CI for the actual drying time of the paint.
Solution
α = 0.05, n = 12, X̄ = 66.3, S = 8.4, and, since the sample is small, $t_{\alpha/2} = t_{0.025}(11) = 2.201$.
Therefore the CI for µ is given by
$$\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
which gives 66.3 ± 2.201 × 8.4/√12 = 66.3 ± 5.337, i.e. 60.963 < µ < 71.637 is the 95% C.I.
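A minimal sketch of the small-sample t interval of Example 5.6. Python's standard library has no t-distribution quantile function, so the critical value is passed in from a t-table; the name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """CI for µ when σ is unknown: X̄ ± t_{α/2}(n − 1)·S/√n.

    t_crit must be looked up in a t-table for n − 1 degrees of freedom.
    """
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

# Example 5.6: n = 12, X̄ = 66.3, S = 8.4, t_0.025(11) = 2.201 (table value from the text)
lo, hi = t_interval(66.3, 8.4, 12, 2.201)
print(round(lo, 3), round(hi, 3))
```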
5.6 Error in Estimating the Mean
The 100(1 − α)% C.I. provides an estimate of the accuracy of the point estimate X̄. In general X̄ will not be exactly equal to µ, so the point estimate X̄ will be in error. The size of this error is the absolute difference |X̄ − µ|, and we can be 100(1 − α)% confident that this difference will not exceed $Z_{\alpha/2}\,\sigma/\sqrt{n}$.
Furthermore, we may wish to know how large a sample is necessary to ensure that the error in estimating µ will not exceed a specified quantity K. Setting $Z_{\alpha/2}\,\sigma/\sqrt{n} \le K$, we can be 100(1 − α)% confident that the error will not exceed K when the sample size is
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{K}\right)^2$$
The error can be estimated reliably only when σ² is known or n ≥ 30; otherwise, the stated level of confidence may not be reliable.
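A short sketch of the sample-size formula, rounding any fractional n up to the next integer. The function name and the illustration values (σ = 15, as in Example 5.4, with a hypothetical tolerance K = 5) are ours.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, k, conf=0.95):
    """Smallest n with Z_{α/2}·σ/√n ≤ K, i.e. (Z_{α/2}·σ/K)² rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / k) ** 2)

# Hypothetical illustration: σ = 15 and an error tolerance of K = 5
print(sample_size_mean(15, 5))
```

Rounding up rather than to the nearest integer guarantees that the error bound is met, not merely approximated.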
that is, Z has the standard normal distribution and $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$.
Hence a 100(1 − α)% confidence interval for µx − µy when the population variances are known is given by
$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$$
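However, when the population variances are unknown and the samples are too small for the sample variances to stand in for them, we assume equal population variances and use a pooled sample estimate, resulting in the confidence interval
$$(\bar{X} - \bar{Y}) \pm t_{\alpha/2}\, S^{*}\sqrt{\frac{1}{n} + \frac{1}{m}} \qquad \dots (2)$$
where S* is the pooled standard deviation and $t_{\alpha/2}$ has n + m − 2 degrees of freedom, that is,
$$(\bar{X} - \bar{Y}) \pm t_{\alpha/2}\sqrt{\frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}\left(\frac{1}{n} + \frac{1}{m}\right)}$$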
Example 5.7
A sample of size 15 from a normal population with mean µx and variance 60 yields a sample mean X̄ = 70.1, while an independent sample of size 8 from another normal population with mean µy and variance 40 has a sample mean Ȳ = 75.3. Find a 95% CI and a 90% CI for µx − µy.
Solution
σx² = 60, σy² = 40, X̄ = 70.1, Ȳ = 75.3, n = 15, m = 8
The appropriate CI is given by
$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$$
(i) For a 95% CI, α = 0.05, $Z_{\alpha/2} = 1.96$:
$$\text{C.I.} = (70.1 - 75.3) \pm 1.96\sqrt{\frac{60}{15} + \frac{40}{8}} = -5.2 \pm 1.96 \times 3 = -5.2 \pm 5.88;$$
hence −11.08 < µx − µy < 0.68.
(ii) For a 90% CI, α = 0.1, $Z_{\alpha/2} = 1.645$:
$$\text{C.I.} = -5.2 \pm 1.645\sqrt{\frac{60}{15} + \frac{40}{8}} = -5.2 \pm 4.935;$$
hence −10.135 < µx − µy < −0.265.
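The known-variance interval for µx − µy in Example 5.7, sketched with the standard library; `diff_means_z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_z_interval(xbar, ybar, var_x, var_y, n, m, conf=0.95):
    """CI for µx − µy with known variances: (X̄ − Ȳ) ± Z_{α/2}·√(σx²/n + σy²/m)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(var_x / n + var_y / m)   # standard error of X̄ − Ȳ
    d = xbar - ybar
    return d - z * se, d + z * se

# Example 5.7 at 95% and 90% confidence
for conf in (0.95, 0.90):
    lo, hi = diff_means_z_interval(70.1, 75.3, 60, 40, 15, 8, conf)
    print(conf, round(lo, 3), round(hi, 3))
```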
It is difficult to manipulate the inequalities to obtain an interval whose end points are independent of θ. If n is large, we use the point estimate X/n for θ at the end points to get
$$\frac{X}{n} \pm Z_{\alpha/2}\sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}}$$
Therefore, a 95% CI for θ is given as:
$$0.35 \pm 1.96\sqrt{\frac{0.35(0.65)}{400}}$$
Solution
X = 208, n = 400
Let θ represent the true proportion of the population who intend to vote for A. A point estimate of θ is 208/400 = 0.52, and a 95% CI for θ is
$$0.52 \pm 1.96\sqrt{\frac{(0.52)(0.48)}{400}} = 0.52 \pm 0.049,$$
i.e. approximately (0.471, 0.569).
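A minimal sketch of the large-sample interval for a proportion, X/n ± $Z_{\alpha/2}$√[(X/n)(1 − X/n)/n], applied to X = 208 and n = 400; the helper name is ours.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf=0.95):
    """Large-sample CI for θ using the point estimate X/n at the end points."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = proportion_interval(208, 400)
print(round(lo, 3), round(hi, 3))
```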
∴ X/n ~ N(0.45, 0.00062). Since X/n is the sample vote proportion for A, the proportion for B will be 1 − X/n. Hence for A to have at least half of the voters in favour, we require P(X/n ≥ 1 − X/n), which is equivalent to P(X/n ≥ ½). Let
$$Z = \frac{0.5 - 0.45}{\sqrt{\frac{(0.45)(0.55)}{400}}} = 2.01$$
(i) the error will not exceed
$$Z_{\alpha/2}\sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}}$$
(ii) the error will not exceed a specified amount K when the sample size is
$$n = \frac{Z_{\alpha/2}^2\,\frac{X}{n}\left(1 - \frac{X}{n}\right)}{K^2}$$
In the last expression we note that
$$\frac{x}{n}\left(1 - \frac{x}{n}\right) \le \frac{1}{4}$$
With this we can be at least 100(1 − α)% confident that the error will not exceed a specified amount K when the sample size is
$$n = \frac{Z_{\alpha/2}^2}{4K^2}$$
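A sketch of both sample-size rules for a proportion, rounding up as the text requires. The margin K = 0.03 is a hypothetical value chosen only for illustration, and reusing 0.52 as a preliminary estimate is likewise an assumption.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(k, conf=0.95, p_hat=None):
    """n = Z²·p̂(1 − p̂)/K², or the conservative Z²/(4K²) when p̂ is not supplied."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)  # p(1 − p) ≤ 1/4
    return ceil(z * z * pq / (k * k))                     # round up to an integer

# Hypothetical illustration: margin K = 0.03 at 95% confidence
print(sample_size_proportion(0.03))              # conservative bound
print(sample_size_proportion(0.03, p_hat=0.52))  # using a preliminary estimate
```

The conservative rule never gives a smaller n than the estimate-based rule, because p(1 − p) is at most 1/4.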
When solving for the sample size n, any fractional value is rounded up to the next integer. When we are to estimate the difference between two proportions, a 100(1 − α)% CI for the difference between two binomial parameters θ1 and θ2 is given by
$$\left(\frac{X_1}{n} - \frac{X_2}{m}\right) \pm Z_{\alpha/2}\,S, \quad \text{where} \quad S = \sqrt{\frac{\frac{X_1}{n}\left(1 - \frac{X_1}{n}\right)}{n} + \frac{\frac{X_2}{m}\left(1 - \frac{X_2}{m}\right)}{m}}$$
and n is the size of the sample from the first binomial population, with parameter θ1, and m is the size of the sample from the second binomial population, with parameter θ2.
From the distribution, the probability that a random sample produces an X² value greater than some specified value is equal to the area under the curve to the right of this value.
[Fig. 5.4: chi-square density, with the area to the right of $X^2_\alpha$ shaded]
$X^2_\alpha$ is used to represent the X² value to the right of which we find an area of α (fig. 5.4).
[Fig. 5.5: chi-square density, with area α/2 to the left of $X^2_{1-\alpha/2}$ and area α/2 to the right of $X^2_{\alpha/2}$]
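From this figure we see that
$$P\left(X^2_{1-\alpha/2} < X^2 < X^2_{\alpha/2}\right) = 1 - \alpha \qquad \dots (2)$$
where $X^2_{1-\alpha/2}$ and $X^2_{\alpha/2}$ are values of the chi-square distribution with n − 1 degrees of freedom, leaving areas under the curve of 1 − α/2 and α/2, respectively, to the right.
From (1) and (2) we have
$$P\left(X^2_{1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < X^2_{\alpha/2}\right) = 1 - \alpha$$
Solving the inequalities for σ² gives the 100(1 − α)% confidence interval
$$\frac{(n-1)S^2}{X^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{X^2_{1-\alpha/2}}$$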
Example 5.10
Suppose that in 16 test runs, the gas consumption of an engine had s = 2.2 litres. Construct a 99% confidence interval for σ² as a true indication of the variability of the gas consumption.
Solution:
Assuming that the data come from a normal population,
n = 16, s = 2.2, S² = 4.84, α = 0.01, 1 − α/2 = 0.995,
$X^2_{\alpha/2}(15) = X^2_{0.005}(15) = 32.80$, $X^2_{1-\alpha/2}(15) = X^2_{0.995}(15) = 4.60$
The CI is given by
$$\frac{15(4.84)}{32.80} \le \sigma^2 \le \frac{15(4.84)}{4.60}, \quad \text{i.e.} \quad 2.213 \le \sigma^2 \le 15.783$$
Example 5.11
A random sample of 8 from an approximately normal population gave the following
values: 12, 11, 12, 8, 12, 10, 13, 10. Construct a 90% confidence interval
for the variance of the population.
Solution
X̄ = 11, S² = 2.57, α = 0.1, $X^2_{0.05}(7) = 14.067$, $X^2_{0.95}(7) = 2.167$
The CI is given by
$$\frac{7 \times 2.57}{14.067} < \sigma^2 < \frac{7 \times 2.57}{2.167}, \quad \text{i.e.} \quad 1.279 < \sigma^2 < 8.302.$$
Note that here S2 is the unbiased estimate of the population variance.
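A minimal sketch of the chi-square interval for σ² covering Examples 5.10 and 5.11. The chi-square critical values are passed in from tables (the standard library has no chi-square quantile function), and S² is the unbiased sample variance, as noted above. The helper name is hypothetical.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for σ²: (n − 1)S²/X²_{α/2} < σ² < (n − 1)S²/X²_{1−α/2}.

    chi2_upper = X²_{α/2}(n − 1) and chi2_lower = X²_{1−α/2}(n − 1), from tables.
    """
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower

print(variance_interval(2.2 ** 2, 16, 32.80, 4.60))  # Example 5.10
print(variance_interval(2.57, 8, 14.067, 2.167))     # Example 5.11 (S² = 2.57 as rounded in the text)
```

Note that the larger table value goes in the denominator of the lower limit, which is why the interval is not symmetric about S².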
To estimate the ratio of two variances σX² and σY², we use the point estimate SX²/SY², the ratio of the sample variances. If σX² and σY² are the variances of two normal populations, we can establish an interval estimate of σX²/σY² by using the statistic
$$F = \frac{S_x^2/\sigma_x^2}{S_y^2/\sigma_y^2} = \frac{\sigma_y^2}{\sigma_x^2}\cdot\frac{S_x^2}{S_y^2} \qquad \dots (5)$$
which has the F-distribution with v1 = n − 1 and v2 = m − 1 degrees of freedom; $f_{\alpha/2}(v_1, v_2)$ denotes the F-value with (v1, v2) degrees of freedom leaving an area of α/2 to the right.
CHAPTER TWO
TESTS OF HYPOTHESES
6.1 Introduction
A statistical hypothesis is a statement or conjecture about a given population. Hypothesis testing involves formulating a set of rules that enable us to make a decision (reject or accept a statement) about the given population. In a simple hypothesis, the functional form of the underlying distribution as well as the values of its parameters are stated, whereas in a composite hypothesis the functional form of the distribution may be stated without the exact value of the parameter. An example of a simple hypothesis is the statement “the population is binomially distributed with parameter θ = 0.48”, while an example of a composite hypothesis is “the population is binomially distributed with parameter θ > 0.48”.
Rejecting a hypothesis means concluding, on the basis of evidence provided by a test, that it is false. Accepting a hypothesis means that we have no evidence to believe otherwise. A null hypothesis is the statistical assumption we wish to verify or possibly disprove. In general it assumes no deviation from the norm (no effect or no difference), is usually stated in null form with an equality constraint, and is most often denoted by H0. It is the actual focus of a statistical test. The alternative hypothesis, denoted by H1 or HA, is the hypothesis that is accepted when H0 is rejected. It is the negation of the null hypothesis in a statistical sense and usually specifies a range of values. To test H0 against H1, we partition the sample space of outcomes into two mutually exclusive and exhaustive regions: the acceptance region for H0 and the rejection region for H0.
The rejection region is also called the critical region. The values separating the acceptance region from the rejection region are called critical values. The size of a critical region is the probability, when H0 is true, of obtaining an outcome that falls in that critical region.
A type I error is committed when we reject the null hypothesis although it is true. A type II error is committed when we accept the null hypothesis although it is false. Mathematically, we write
P(type I error) = P(rejecting H0 | H0 is true) = α
P(type II error) = P(accepting H0 | H0 is false) = β
The value of α is the level of significance of the test. The probabilities of both errors can be reduced together by increasing the sample size. The power of a test, 1 − β, is the probability of rejecting H0 when it is false; it indicates how well the test guards against a type II error. A null hypothesis concerning a population parameter is always stated so as to specify an exact value, whereas the alternative allows for several values of the parameter.
6.2 General Test Procedure
Performing a classical statistical test involves the following steps:
(i) Make a short summary of known facts, such as nature of the distribution,
population parameters, or sample statistics, etc.
(ii) Formulate the null hypothesis H0
(iii) Formulate the alternative hypothesis, H1, whose acceptance is implied by the
rejection of H0.
(iv) Choose or identify the level of significance (size of the CR)
(v) Determine the appropriate test statistic and the corresponding CR
(vi) Formulate a test criterion.
The decision is usually to Reject the null hypothesis if the value of the test statistic
falls in the CR, otherwise do not reject. A test of any statistical hypothesis where the
alternative is one sided is called a one-tailed test.
An example of such is H0: θ = θo versus H1: θ >θo
Or Ho: θ = θo versus H1: θ <θo .
The Critical Region for H1: θ >θo lies entirely in the right tail of the distribution, while
that of H1: θ <θo lies entirely in the left tail. A test of any statistical hypothesis where
the alternative is two sided is called a two tailed test. The Critical Region is split into
two equal parts, and located in each tail of the distribution of the test statistic. An
example is
Ho: θ = θo versus H1: θ ≠ θo
A test is said to be significant if H0 is rejected at α = 0.05, and highly significant if H0 is rejected at α = 0.01.
We present below the appropriate hypotheses, test statistic, and decision criteria (or critical region).
Case I: To test H0: µ = µ0 versus H1: µ < µ0
This is a left-sided one-tailed test whose test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
For an α level of significance, reject H0 if Z < −Zα.
[Fig. 6.1: LS one-tailed test (Case I), CR to the left of −Zα; Fig. 6.2: RS one-tailed test (Case II), CR to the right of Zα; a further panel shows the two-tailed CR beyond $-Z_{\alpha/2}$ and $Z_{\alpha/2}$]
Example 6.1
Suppose a certain type of 100-watt bulb has been standardized so that the mean life of the bulbs is 1000 hrs and the standard deviation is 128 hrs. A sample of 16 of these bulbs, from a population with mean µ, was tested and found to have a mean life of 967.8 hrs. Test at both the 1% and 5% levels of significance the hypothesis that the actual mean is less than 1000 hrs.
Solution:
X̄ = 967.8, µ0 = 1000, σ = 128, n = 16
H0: µ = 1000, H1: µ < 1000
$$Z = \frac{(967.8 - 1000)\sqrt{16}}{128} = -1.00625$$
Example 6.2
Suppose we know that the breaking strength of a certain type of steel bar has a normal distribution with mean µ and variance 25. The manufacturing process is changed as a result of research, and an observed sample of the breaking strengths of 100 steel bars had a mean of 77.8. However, the people who were involved in the decision to change the process have conjectured that µ is above 80. Test at the 5% level of significance the reliability of their proposition.
Solution:
H0: µ = 80, H1: µ ≠ 80, X̄ = 77.8, µ0 = 80, σ = 5
This is a two-tailed test, so the test statistic is
$$|Z| = \left|\frac{(\bar{X} - \mu_0)\sqrt{n}}{\sigma}\right| = \left|\frac{(77.8 - 80)\sqrt{100}}{5}\right| = 4.4$$
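A minimal sketch of the known-σ Z test applied to Examples 6.1 and 6.2, using `statistics.NormalDist` for the critical values. The decisions printed are what the sketch computes from the stated rejection rules; the helper names are ours.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """Z = (X̄ − µ0)√n / σ for a mean with known σ."""
    return (xbar - mu0) * sqrt(n) / sigma

def z_critical(alpha):
    """Upper-α point Zα of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha)

# Example 6.1: H1: µ < 1000 (left-tailed), reject H0 if Z < −Zα
z1 = z_statistic(967.8, 1000, 128, 16)
for alpha in (0.01, 0.05):
    print("6.1", alpha, round(z1, 5), "reject" if z1 < -z_critical(alpha) else "do not reject")

# Example 6.2: H1: µ ≠ 80 (two-tailed), reject H0 if |Z| > Z_{α/2}
z2 = z_statistic(77.8, 80, 5, 100)
print("6.2", round(abs(z2), 2), "reject" if abs(z2) > z_critical(0.05 / 2) else "do not reject")
```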
(b) When Population Variance is Unknown
The statistic
$$T = \frac{(\bar{X} - \mu_0)\sqrt{n}}{S}$$
has a t-distribution with n − 1 degrees of freedom. Hence for tests concerning the mean of a normal population whose variance is unknown, the procedure is as follows:
Case I
H0: µ = µ0 Versus H1: µ < µ0 , Reject H0 if T < -tα(n-1)
Case II
H0: µ = µ0 Versus H1: µ > µ0 , Reject H0 if T > tα(n-1)
Case III
H0: µ = µ0 versus H1: µ ≠ µ0 , Reject H0 if |T| > tα/2(n-1)
Example 6.3
Eight different determinations of the alcohol content of a bottle of wine yielded a sample mean of X̄ = 16.6% with s = 0.06%. If µ is the population mean of the determinations, test at both the 10% and 5% levels of significance the following hypotheses:
(a) H0: µ = 16.64% versus H1: µ < 16.64%
(b) H0: µ = 16.64% versus H1: µ ≠ 16.64%
Solution:
$$T = \frac{(16.6 - 16.64)\sqrt{8}}{0.06} = -1.886$$
For (b) at the 5% level, α/2 = 0.025 and $t_{0.025}(7) = 2.365$; since |T| = 1.886 < 2.365, we do not reject H0.
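A sketch that evaluates all four decisions of Example 6.3. The critical values $t_{0.10}(7) = 1.415$, $t_{0.05}(7) = 1.895$ and $t_{0.025}(7) = 2.365$ are standard t-table values supplied by hand, since the standard library has no t quantile function; the decisions printed are what the sketch computes, not results quoted from the text.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """T = (X̄ − µ0)√n / S, with n − 1 degrees of freedom."""
    return (xbar - mu0) * sqrt(n) / s

T = t_statistic(16.6, 16.64, 0.06, 8)

# (alpha, t_α(7) for the one-tailed test, t_{α/2}(7) for the two-tailed test), from a t-table
levels = [(0.10, 1.415, 1.895), (0.05, 1.895, 2.365)]
for alpha, t_one, t_two in levels:
    a = "reject" if T < -t_one else "do not reject"     # (a) H1: µ < 16.64
    b = "reject" if abs(T) > t_two else "do not reject" # (b) H1: µ ≠ 16.64
    print(alpha, round(T, 3), a, b)
```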
Hence the decision criteria become (at n + m − 2 degrees of freedom):
Case I: H0: µx = µy versus H1: µx < µy. Reject H0 if T < −tα.
Case II: H0: µx = µy versus H1: µx > µy. Reject H0 if T > tα.
Case III: H0: µx = µy versus H1: µx ≠ µy. Reject H0 if |T| > tα/2.
(c) When testing for a difference in means, H1 may sometimes take the form:
H1: µx − µy < d, for the LS one-tailed test, or
H1: µx − µy > d, for the RS one-tailed test, or
H1: µx − µy ≠ d, for the two-tailed test.
In that case, if σx² and σy² are both known, the test statistic is
$$Z = \frac{(\bar{X} - \bar{Y}) - d}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}} \qquad \dots (3)$$
$$X^2 = \frac{(n-1)S^2}{\sigma_0^2}$$
This is the chi-square value for testing H0: σ² = σ0² against a relevant alternative. The statistic above has a chi-square distribution with n − 1 degrees of freedom. The chi-square distribution is not symmetric; nevertheless, in a two-tailed test the critical region is split so that each tail has probability α/2.
[Fig. 6.4: chi-square distribution, with the acceptance region (AR) to the left and the critical region (CR) to the right of $X^2_\alpha(n-1)$]
The test procedure is as follows:
Case I: To test H0: σ² = σ0² versus H1: σ² < σ0², reject H0 if $X^2 < X^2_{1-\alpha}(n-1)$.
Case II: To test H0: σ² = σ0² versus H1: σ² > σ0², reject H0 if $X^2 > X^2_{\alpha}(n-1)$.
Case III: To test H0: σ² = σ0² versus H1: σ² ≠ σ0², reject H0 if $X^2 < X^2_{1-\alpha/2}(n-1)$ or if $X^2 > X^2_{\alpha/2}(n-1)$.
[Fig. 6.5: CR for σ² < σ0² (left tail); Fig. 6.6: CR for σ² > σ0² (right tail); Fig. 6.7: CR for σ² ≠ σ0², to the left of $X^2_{1-\alpha/2}$ and to the right of $X^2_{\alpha/2}$]
Example 6.4
A sample of size 9 has a variance of 8.01. Test at the 5% level whether the sample is likely to have been drawn from a normal population with variance 9.0.
Solution
This is a two-tailed test with H0: σ² = 9, H1: σ² ≠ 9, n = 9, S² = 8.01.
$$X^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{8 \times 8.01}{9} = 7.12$$
α = 0.05, $X^2_{0.025}(8) = 17.53$, $X^2_{0.975}(8) = 2.18$
Since 2.18 < 7.12 < 17.53, we do not reject H0, and we conclude that the sample may well have been drawn from the normal population with variance 9.0.
Example 6.5
A soft drink dispensing machine is said to be out of control if the variance of the contents exceeds 1.15 litres. If a random sample of 25 drinks from this machine has a variance of 2.03 litres, does this indicate, at the 5% level of significance, that the machine is out of control? (Assume that the contents are approximately normally distributed.)
Solution
σ0² = 1.15, S² = 2.03, n = 25, H0: σ² = 1.15 (the machine is not out of control)
H1: σ² > 1.15 (the machine is out of control)
$$X^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{24 \times 2.03}{1.15} = 42.365$$
The critical value is $X^2_{0.05}(24) = 36.415$. Since $X^2 > X^2_\alpha$, we reject H0 and conclude that the machine is out of control.
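A minimal sketch of the chi-square test for a single variance, applied to Examples 6.4 and 6.5. The critical values are standard chi-square table values entered by hand ($X^2_{0.05}(24) = 36.415$ is taken from such a table); the helper name is hypothetical.

```python
def chi2_variance_statistic(s2, n, sigma0_sq):
    """X² = (n − 1)S²/σ0², chi-square with n − 1 degrees of freedom under H0."""
    return (n - 1) * s2 / sigma0_sq

# Example 6.4 (two-tailed, α = 0.05): X²_0.975(8) = 2.18, X²_0.025(8) = 17.53
x2_a = chi2_variance_statistic(8.01, 9, 9.0)
print(round(x2_a, 2), "reject" if x2_a < 2.18 or x2_a > 17.53 else "do not reject")

# Example 6.5 (right-tailed, α = 0.05): X²_0.05(24) = 36.415
x2_b = chi2_variance_statistic(2.03, 25, 1.15)
print(round(x2_b, 3), "reject" if x2_b > 36.415 else "do not reject")
```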
H0: σx² = σy² versus H1: σx² < σy², σx² > σy² or σx² ≠ σy².
The test statistic for this test is F = Sx²/Sy², where Sx² and Sy² are the variances computed from the two samples. Under H0, F has the F-distribution with n − 1 and m − 1 degrees of freedom. The decision criteria at the α level of significance are:
(i) To test H0 against H1: σx² < σy², reject H0 if F < F1−α(n − 1, m − 1).
(ii) To test H0 against H1: σx² > σy², reject H0 if F > Fα(n − 1, m − 1).
(iii) To test H0 against H1: σx² ≠ σy², reject H0 if $F < F_{1-\alpha/2}(n-1, m-1)$ or $F > F_{\alpha/2}(n-1, m-1)$.
Example 6.6
A large automobile manufacturing company is trying to decide whether to purchase brand X or brand Y tyres for its new models. To help arrive at a decision, an experiment is conducted using 12 tyres of brand X and 15 of brand Y. The tyres are run until they wear out. The results are X̄ = 37,900 km, Sx = 5100 km, Ȳ = 39,800 km and Sy = 5900 km. At the 5% level of significance, and assuming that the populations are approximately normally distributed, test the hypothesis that
(i) there is no difference in the mean life of the tyres;
(ii) both brands of tyres have the same variance.
Solution
(i) α = 0.05, X̄ = 37,900, Sx = 5100, n = 12, Ȳ = 39,800, Sy = 5900, m = 15. H0: µx = µy, H1: µx ≠ µy.
Observe that the population variances are not known for either population, and we assume that they are equal. Hence the test statistic is, as in 6.4 (c),
$$T = \frac{|\bar{X} - \bar{Y}|}{S\sqrt{\frac{1}{n} + \frac{1}{m}}}, \qquad S = \sqrt{\frac{(n-1)S_x^2 + (m-1)S_y^2}{n+m-2}} = \sqrt{\frac{11 \times 5100^2 + 14 \times 5900^2}{25}} \approx 5562.2$$
$$T = \frac{|37900 - 39800|}{5562.2\sqrt{\frac{1}{12} + \frac{1}{15}}} = \frac{1900}{5562.2 \times 0.3873} \approx 0.882$$
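A sketch of the pooled two-sample T statistic for part (i) of Example 6.6, assuming equal population variances as the solution does. The comparison value $t_{0.025}(25) = 2.060$ comes from a standard t-table; the helper name is ours.

```python
from math import sqrt

def pooled_t(xbar, ybar, sx, sy, n, m):
    """Pooled two-sample T with n + m − 2 degrees of freedom; also returns S."""
    sp2 = ((n - 1) * sx ** 2 + (m - 1) * sy ** 2) / (n + m - 2)  # pooled variance S²
    sp = sqrt(sp2)
    return (xbar - ybar) / (sp * sqrt(1 / n + 1 / m)), sp

T, sp = pooled_t(37_900, 39_800, 5_100, 5_900, 12, 15)
print(round(sp, 1), round(T, 3), "reject" if abs(T) > 2.060 else "do not reject")
```

The pooled S must be built from the squared standard deviations (the variances), not from Sx and Sy themselves.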
Case I: To test H0: θ = θ0 versus H1: θ < θ0
The critical region of size α is given by X < Kα, where Kα is the largest integer for which
$$P(X < K_\alpha \mid \theta = \theta_0) = \sum_{x < K_\alpha} b(x; n, \theta_0) \le \alpha$$
Case II: To test H0: θ = θ0 versus H1: θ > θ0,
the critical region of size α is given by X > K′α, where K′α is the smallest integer for which
$$P(X > K'_\alpha \mid \theta = \theta_0) = \sum_{x > K'_\alpha} b(x; n, \theta_0) \le \alpha$$
Case III: To test H0: θ = θ0 versus H1: θ ≠ θ0, the critical region of size α is given by $X < K_{\alpha/2}$ or $X > K'_{\alpha/2}$, where K and K′ are as defined in Cases I and II above.
In all three cases, X is the number of successes, and H0 is rejected whenever X falls in the critical region.
Example 6.7
An official of the Delta State University Students Union claims that at least 60% of the student population prefer campus hostel accommodation to off-campus accommodation. What conclusion would you draw if only 11 in a sample of 20 students preferred the campus hostel? (Use the α = 0.05 level of significance.)
Solution
H0: θ = 0.6, H1: θ < 0.6, α = 0.05, n = 20, X = 11.
The critical region is X < Kα. From binomial tables, P(X ≤ 7 | θ = 0.6) ≈ 0.021 ≤ 0.05, while P(X ≤ 8 | θ = 0.6) ≈ 0.057 > 0.05, so Kα = 8.
∴ The CR is X < 8. But X = 11, hence we do not reject H0.
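A sketch of the Case I rule, finding the largest integer K with P(X < K | θ0) ≤ α by summing exact binomial probabilities with `math.comb` (Python 3.8+). For n = 20, θ0 = 0.6 and α = 0.05 it returns K = 8, so the critical region is X < 8. The function names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, theta):
    """P(X ≤ k) for X ~ b(n, θ)."""
    return sum(comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in range(k + 1))

def left_critical_value(n, theta0, alpha):
    """Largest integer K with P(X < K | θ = θ0) ≤ α; the critical region is X < K."""
    k = 0
    while binom_cdf(k, n, theta0) <= alpha:  # P(X < k + 1) = P(X ≤ k)
        k += 1
    return k

K = left_critical_value(20, 0.6, 0.05)
print(K, round(binom_cdf(K - 1, 20, 0.6), 4))  # K and the actual size P(X < K)
```

Because X is discrete, the actual size of the critical region is usually smaller than the nominal α.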
When the sample size n is large, the normal approximation with parameters µ = nθ0 and σ² = nθ0(1 − θ0) is used; it provides an accurate test provided θ0 is not too close to 0 or 1. The normal approximation gives
$$Z = \frac{X - n\theta_0}{\sqrt{n\theta_0(1 - \theta_0)}}$$
which is a value of the standard normal variable Z.
For H0: θ = θ0 versus H1: θ < θ0, reject H0 if Z < −Zα;
for H0: θ = θ0 versus H1: θ > θ0, reject H0 if Z > Zα;
and for H0: θ = θ0 versus H1: θ ≠ θ0, reject H0 if |Z| > Zα/2.
Example 6.8
An official of the Delta State University Students Union claims that at least 60% of the students prefer campus hostel accommodation to off-campus accommodation. If in a sample of 200 students, 110 of them preferred the campus hostel, test at the 5% level of significance whether this claim is exaggerated.
Solution
θ0 = 0.6, n = 200 (large), X = 110, α = 0.05,
H0: θ = 0.6, H1: θ < 0.6
Since n is large, the normal approximation gives
$$Z = \frac{110 - 200 \times 0.6}{\sqrt{200 \times 0.6 \times 0.4}} = -1.443$$
The table value is Z0.05 = 1.645, so −Z0.05 = −1.645.
Since Z > −Z0.05, we do not reject the claim: there is insufficient evidence at the 5% level that it is exaggerated.
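The normal-approximation test of Example 6.8, sketched with the standard library; `binomial_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def binomial_z(x, n, theta0):
    """Normal approximation Z = (X − nθ0)/√(nθ0(1 − θ0))."""
    return (x - n * theta0) / sqrt(n * theta0 * (1 - theta0))

z = binomial_z(110, 200, 0.6)
z_alpha = NormalDist().inv_cdf(0.95)  # Z_0.05 ≈ 1.645
print(round(z, 3), "reject" if z < -z_alpha else "do not reject")  # H1: θ < 0.6
```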
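When n and m are large,
$$Z = \frac{\hat\theta_x - \hat\theta_y}{\sqrt{\frac{\hat\theta_x(1 - \hat\theta_x)}{n} + \frac{\hat\theta_y(1 - \hat\theta_y)}{m}}}$$
where θ̂x = X/n and θ̂y = Y/m are the sample proportions, X and Y being the numbers of successes in the two samples. When H0: θx = θy is true, both proportions equal a common θ, which is estimated by the pooled proportion θ̂ = (X + Y)/(n + m). Therefore, in testing H0: θx = θy, the Z value becomes
$$Z = \frac{\hat\theta_x - \hat\theta_y}{\sqrt{\hat\theta(1 - \hat\theta)\left(\frac{1}{n} + \frac{1}{m}\right)}}$$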
Example 6.9
An opinion poll was conducted among secondary school students in a certain state to determine whether to continue with the November/December GCE or not. If 120 of 200 female students prefer the May/June SSCE, and 240 of 500 male students prefer the May/June SSCE to the November/December GCE, would you agree that the proportion of female students who favour scrapping the Nov./Dec. GCE is higher than the proportion of male students who favour the same? (Use the α = 0.025 level of significance.)
Solution
Let X and Y represent the populations of female and male students, respectively.
θ̂x = 120/200 = 0.6, θ̂y = 240/500 = 0.48, n = 200, m = 500, α = 0.025
H0: θx = θy, H1: θx > θy, and θ̂ = 360/700 = 0.514
$$Z = \frac{0.6 - 0.48}{\sqrt{(0.514)(0.486)\left(\frac{1}{200} + \frac{1}{500}\right)}} = 2.87, \qquad Z_{0.025} = 1.96$$
Since Z > Z0.025, we reject H0 and conclude that the proportion of female students who prefer the May/June SSCE is higher than the proportion of male students who prefer the same.
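A minimal sketch of the pooled two-proportion Z statistic used in Example 6.9; the function name is ours.

```python
from math import sqrt

def two_proportion_z(x, n, y, m):
    """Z = (θ̂x − θ̂y)/√(θ̂(1 − θ̂)(1/n + 1/m)) with pooled θ̂ = (X + Y)/(n + m)."""
    px, py = x / n, y / m
    pooled = (x + y) / (n + m)
    return (px - py) / sqrt(pooled * (1 - pooled) * (1 / n + 1 / m))

z = two_proportion_z(120, 200, 240, 500)
print(round(z, 2), "reject" if z > 1.96 else "do not reject")  # H1: θx > θy, Z_0.025 = 1.96
```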
CHAPTER THREE
GOODNESS – OF - FIT TESTS
7.1 Introduction
Goodness-of-fit tests determine whether a population has a specified theoretical distribution. Such a test measures how good a fit there is between the observed frequencies in a sample and the expected frequencies obtained from the hypothesized distribution.
Under the general goodness-of-fit test, the test statistic is
$$X^2 = \sum \frac{(O - E)^2}{E}$$
where O stands for the observed frequency and E for the expected frequency under H0; the expected frequencies are computed from the theoretical distribution specified by H0. If the observed frequencies differ considerably from the expected frequencies, the X² value will be large and the fit poor. A poor fit (large X² value) leads to the rejection of H0, whereas a good fit (small X² value) does not. Therefore, the critical region falls in the right tail of the chi-square distribution: for an α level of significance, reject H0 if $X^2 > X^2_\alpha$. The test requires each expected frequency to be at least 5; cells with expected frequencies below 5 should be combined with adjacent cells, which reduces the number of degrees of freedom. The number of degrees of freedom for a goodness-of-fit test equals the number of cells minus the number of quantities (or parameters) obtained from the observed data that are used in calculating the expected frequencies.
Example 7.1
A die was thrown 120 times and the following frequency distribution was obtained. We wish to test at α = 0.05 whether or not the die is biased.
Face         1    2    3    4    5    6
Frequency    15   12   20   18   30   25
Solution
H0: The die is not biased. Under H0 each face is equally likely, so the expected frequency of each face is 120/6 = 20. Thus we have:
Face          1     2    3    4    5    6
Obs. Freq.    15    12   20   18   30   25
Exp. Freq.    20    20   20   20   20   20
O − E         −5    −8   0    −2   10   5
(O − E)²      25    64   0    4    100  25
(O − E)²/E    1.25  3.2  0    0.2  5    1.25
$$X^2 = \sum \frac{(O - E)^2}{E} = 1.25 + 3.2 + 0 + 0.2 + 5 + 1.25 = 10.9$$
There are 6 cells and 1 restriction (the total frequency),
∴ degrees of freedom = 6 − 1 = 5, and $X^2_{0.05}(5) = 11.070$.
Since $X^2 < X^2_{0.05}(5)$, we do not reject H0; there is no significant evidence at the 5% level that the die is biased.
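A minimal sketch of the goodness-of-fit statistic for Example 7.1. The critical value 11.070 is the table value quoted in the text; `chi_square_stat` is a hypothetical helper name.

```python
def chi_square_stat(observed, expected):
    """X² = Σ (O − E)² / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 12, 20, 18, 30, 25]          # faces 1..6, Example 7.1
expected = [sum(observed) / 6] * 6           # 20 per face under H0 (fair die)
x2 = chi_square_stat(observed, expected)
df = len(observed) - 1                       # one restriction: the total frequency
print(round(x2, 2), df, "reject" if x2 > 11.070 else "do not reject")
```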
Example 7.2
The numbers of minor faults in steel plates produced by a machine were observed as follows:
No. of faults   0    1    2    3    4    5    6    7
No. of pieces   28   25   12   8    6    2    1    0
Under normal conditions, the expected distribution of faults, based on two restrictions, is as follows:
No. of faults   0    1    2    3    4    5    6    7
No. of pieces   26   22   15   8    5    3    2    1
Using an appropriate test, say whether the two distributions are the same at the 5% level of significance.
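Solution:
The expected frequencies for 5, 6 and 7 faults (3, 2 and 1) are below 5. These cells are therefore combined into a single "5 or more" cell, with E = 3 + 2 + 1 = 6 and O = 2 + 1 + 0 = 3:
Faults        0        1        2      3    4     5 or more
E             26       22       15     8    5     6
O             28       25       12     8    6     3
(O - E)²      4        9        9      0    1     9
(O - E)²/E    0.1538   0.4091   0.6    0    0.2   1.5
$X^2 = \sum \frac{(O-E)^2}{E} = 0.1538 + 0.4091 + 0.6 + 0 + 0.2 + 1.5 = 2.863$
There are 6 cells and 2 restrictions, so the number of degrees of freedom is 6 - 2 = 4, and $X^2_{0.05}(4) = 9.49$. Since $X^2 < X^2_{0.05}(4)$, we do not reject H0. At the 5% level, the observed distribution agrees with the expected distribution.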
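The pooling step in Example 7.2 and the resulting statistic can be reproduced as follows. `pool_tail` is a hypothetical helper introduced only for this sketch. The count of 2 restrictions comes from the statement of the example.

```python
# Example 7.2: pool the cells for 5, 6 and 7 faults, whose expected
# frequencies (3, 2, 1) are below 5.
observed = [28, 25, 12, 8, 6, 2, 1, 0]
expected = [26, 22, 15, 8, 5, 3, 2, 1]

def pool_tail(freqs, start):
    """Keep the cells before `start` and merge the rest into one 'start or more' cell."""
    return freqs[:start] + [sum(freqs[start:])]

obs = pool_tail(observed, 5)   # [28, 25, 12, 8, 6, 3]
exp = pool_tail(expected, 5)   # [26, 22, 15, 8, 5, 6]
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 2              # two restrictions, as stated in the example
critical = 9.49                # X2_0.05(4) from tables
print(f"X2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
# X2 = 2.863, df = 4, reject H0: False
```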
Example 7.3
Test whether the sampling distribution given below can be considered as binomial at
5% level of significance.
Score 1 2 3 4 5 6
Frequency 12 15 10 14 5 4
Solution:
$\sum f = 60$, $\sum fX = 177$, $\bar{X} = 177/60 = 2.95$, n = 6, $p = \bar{X}/n = 2.95/6 = 0.4917$.
The expected frequency for score 6 (0.85) is less than 5, so the cells for scores 5 and 6 are combined:
Score X            1       2        3         4        5 or 6
Obs. frequency     12      15       10        14       9
Exp. frequency     6.01    14.53    18.73     13.59    6.11
(O - E)²           35.88   0.2209   76.2129   0.1681   8.3521
(O - E)²/E         5.970   0.0152   4.069     0.0124   1.367
Hence, $X^2 = \sum \frac{(O-E)^2}{E} = 11.434$.
There are 5 cells and 2 restrictions (the total frequency, and p, which is estimated from the data). The number of degrees of freedom is therefore 5 - 2 = 3, and the critical value is $X^2_{0.05}(3) = 7.815$.
Since $X^2 = 11.434 > X^2_{0.05}(3) = 7.815$, the test is significant. Hence, we reject the hypothesis that the given sampling distribution is binomial.
NOTE:
The expected frequencies were obtained as follows, with N = 60 and p = 0.4917:
$E_f(1) = N \times P(X = 1) = 60 \times \binom{6}{1} p^1 (1-p)^5 = 6.01$
$E_f(2) = N \times P(X = 2) = 60 \times \binom{6}{2} p^2 (1-p)^4 = 14.53$
$E_f(3) = N \times P(X = 3) = 60 \times \binom{6}{3} p^3 (1-p)^3 = 18.73$
$E_f(4) = N \times P(X = 4) = 60 \times \binom{6}{4} p^4 (1-p)^2 = 13.59$
$E_f(5) = N \times P(X = 5) = 60 \times \binom{6}{5} p^5 (1-p)^1 = 5.26$
$E_f(6) = N \times P(X = 6) = 60 \times \binom{6}{6} p^6 (1-p)^0 = 0.85$
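The binomial fit in Example 7.3 can be reproduced with Python's `math.comb`. This sketch follows the notes' model: n = 6, p = X̄/6, and probabilities evaluated at the scores 1 to 6. It keeps p and the expected frequencies unrounded, so X² comes out slightly different from 11.434 through rounding alone. The decision is the same.

```python
from math import comb

# Example 7.3: binomial fit with n = 6 and p = mean / n, following the
# notes' computation, which evaluates P(X = x) at the scores x = 1, ..., 6.
scores = [1, 2, 3, 4, 5, 6]
freq = [12, 15, 10, 14, 5, 4]
N = sum(freq)                                        # 60
mean = sum(x * f for x, f in zip(scores, freq)) / N  # 2.95
n = 6
p = mean / n                                         # about 0.4917

def binom_pmf(x, n, p):
    """Binomial probability P(X = x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

expected = [N * binom_pmf(x, n, p) for x in scores]
# Note: these six values sum to about 59 rather than 60, because
# P(X = 0) is not assigned to any score under this model.

# Pool scores 5 and 6, since the expected frequency for 6 is below 5.
obs = freq[:4] + [freq[4] + freq[5]]
exp = expected[:4] + [expected[4] + expected[5]]
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 2            # total frequency and p both come from the data
critical = 7.815             # X2_0.05(3) from tables
print([round(e, 2) for e in exp])
print(f"X2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
```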
Example 7.4
Four coins were tossed 200 times with the following results.
No. of Heads 0 1 2 3 4
No. of Times 9 42 73 61 15
Using X2 goodness of fit test at 5% level of significance, decide whether the coins were
biased.
Solution:
H0: The coins are unbiased. H1: The coins are biased.
Under H0, p(H) = ½, so no parameter has to be estimated from the data. This means that there is only one restriction, and it is on the total frequency.
Expected frequency for:
X = 0 is $200\binom{4}{0}(1/2)^0(1/2)^4 = 12.5$
X = 1 is $200\binom{4}{1}(1/2)^1(1/2)^3 = 50$
X = 2 is $200\binom{4}{2}(1/2)^2(1/2)^2 = 75$
X = 3 is $200\binom{4}{3}(1/2)^3(1/2)^1 = 50$
X = 4 is $200\binom{4}{4}(1/2)^4(1/2)^0 = 12.5$
Thus we have
No. of heads      0      1      2       3      4
Observed freq.    9      42     73      61     15
Expected freq.    12.5   50     75      50     12.5
(O - E)²/E        0.98   1.28   0.053   2.42   0.5
$X^2 = 5.233$, df = 5 - 1 = 4, $X^2_{0.05}(4) = 9.49$.
Since $X^2 < X^2_{0.05}(4)$, the test is not significant at the 5% level, and the binomial distribution with p = ½ gives a good fit to the given sampling distribution. Hence, we conclude that the coins are unbiased.
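Example 7.4 takes only a few lines of Python (a sketch with illustrative names). Because H0 fixes p = ½, the expected frequencies are exact:

```python
from math import comb

# Example 7.4: four coins tossed 200 times; under H0, P(head) = 1/2 for each coin.
heads = [0, 1, 2, 3, 4]
observed = [9, 42, 73, 61, 15]
N = sum(observed)                                    # 200
expected = [N * comb(4, x) * 0.5 ** 4 for x in heads]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1       # p is fixed by H0, so only the total is a restriction
critical = 9.49              # X2_0.05(4) from tables
print(expected)              # [12.5, 50.0, 75.0, 50.0, 12.5]
print(f"X2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
# X2 = 5.233, df = 4, reject H0: False
```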
Exercise
Test whether the sampling distribution given below can be considered to be binomial
at either the 5% or the 10% level of significance.
X 0 1 2 3 4 5
Frequency 1 6 14 33 31 15
When a Poisson distribution is fitted, the parameter 𝜇 and the total frequency are required to generate the theoretical distribution. The parameter µ is usually estimated by the mean of the given distribution. Hence, there are usually two restrictions on the choice of the expected frequency values, and the number of degrees of freedom for this test is n - 2, where n is the number of cells (after any pooling).
Example 7.5
Test whether a Poisson distribution gives a good fit to the following frequency distribution at the 5% level of significance.
X   0    1    2    3    4    5
f   19   26   27   13   11   4
Solution:
H0: the distribution is Poisson. H1: the distribution is not Poisson.
$\bar{X} = 183/100 = 1.83$, hence $\mu = 1.83$.
The expected frequencies are $E(x) = N \frac{e^{-\mu}\mu^x}{x!} = 100 \frac{e^{-1.83}(1.83)^x}{x!}$. For x = 0, 1, ..., 5 this gives 16.04, 29.36, 26.86, 16.38, 7.50 and 2.74.
Since the last expected frequency is less than 5, we combine the last two cells to obtain the table below.
X            0       1       2        3       4 or 5
O            19      26      27       13      15
E            16.04   29.36   26.86    16.38   10.24
(O - E)²/E   0.546   0.385   0.0007   0.697   2.213
$X^2 = 0.546 + 0.385 + 0.0007 + 0.697 + 2.213 = 3.842$. There are 5 cells and 2 restrictions, so the number of degrees of freedom is 5 - 2 = 3, and $X^2_{0.05}(3) = 7.81$.
We do not reject H0, since $X^2 < X^2_{0.05}(3)$. We thus conclude that the Poisson distribution gives a good fit to the sampling distribution above.
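The following sketch of the Poisson fit in Example 7.5 uses only the standard library. It recomputes every expected frequency from µ = 1.83 and reproduces the pooled table, which gives X² of about 3.84. The variable names are illustrative.

```python
from math import exp, factorial

# Example 7.5: fit a Poisson distribution with mu estimated by the sample mean.
xs = [0, 1, 2, 3, 4, 5]
freq = [19, 26, 27, 13, 11, 4]
N = sum(freq)                                        # 100
mu = sum(x * f for x, f in zip(xs, freq)) / N        # 1.83

expected = [N * exp(-mu) * mu ** x / factorial(x) for x in xs]
# Pool the last two cells, as in the notes (E for x = 5 is below 5).
obs = freq[:4] + [freq[4] + freq[5]]
exp_pooled = expected[:4] + [expected[4] + expected[5]]
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp_pooled))
df = len(obs) - 2            # restrictions: total frequency and mu
critical = 7.81              # X2_0.05(3) from tables
print([round(e, 2) for e in exp_pooled])
print(f"X2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
```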
In testing for normality of a given sampling distribution, two sample statistics, $\bar{X}$ and S, are needed along with the total frequency. This implies that there will be three restrictions on the choice of the expected values. The procedure is as follows:
(a) Compute $\bar{X}$ and S from the given sample, where $S = \sqrt{\frac{1}{n-1}\sum f(X - \bar{X})^2}$.
(b) Use $\bar{X}$ and S as estimates of µ and σ, together with the total frequency, to set up a theoretical normal distribution.
(c) Compute the observed and expected frequencies as usual, and apply the X2 test statistic with three restrictions. Whenever µ and σ are known for the theoretical distribution, the estimates in (a) and (b) above are not needed, and the total frequency is the only restriction.
Example 7.6
Test whether a good fit is provided by the normal distribution to the frequency distribution given by the class midpoints X and the frequencies f in the first two columns of the table below.
Solution:
The first thing is to find $\bar{X}$ and S as estimates of µ and σ, where $d_i = X_i - \bar{X}$.
X       f    fX     di       f di²
12      3    36     -13.58   553.2492
17      7    119    -8.58    515.3148
22      15   330    -3.58    192.2460
27      20   540    1.42     40.3280
32      9    288    6.42     370.9476
37      6    222    11.42    782.4984
Total   60   1535            2454.5840
$\bar{X} = \frac{1535}{60} = 25.58$, $S = \sqrt{\frac{2454.5840}{59}} = 6.45$.
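Next we standardize the upper class boundaries as follows:
$z_1 = \frac{14.5 - 25.58}{6.45} = -1.718$, $z_2 = \frac{19.5 - 25.58}{6.45} = -0.943$,
$z_3 = \frac{24.5 - 25.58}{6.45} = -0.167$, $z_4 = \frac{29.5 - 25.58}{6.45} = 0.608$,
$z_5 = \frac{34.5 - 25.58}{6.45} = 1.383$, $z_6 = \infty$.
Each expected frequency is 60 times the area under the standard normal curve between successive z values. The expected frequency of the first class is less than 5, so we combine the first and second cells to bring the expected frequency to at least 5. Thus, we now have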
Class        10-19   20-24   25-29   30-34   35-39
Obs freq.    10      15      20      9       6
Exp freq.    10.41   15.53   17.80   11.23   5.03
(O - E)²/E   0.016   0.018   0.272   0.443   0.187
$X^2 = 0.936$, degrees of freedom = 5 - 3 = 2. At the 5% level of significance, $X^2_{0.05}(2) = 5.991$. Hence, the test is not significant, and we conclude that the given sampling distribution appears normal.
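The normal fit in Example 7.6 can be sketched with `math.erf`, using Φ(z) = ½[1 + erf(z/√2)]. The sketch makes two assumptions: the first class is treated as open below (from -∞), and the last class as open above (z₆ = ∞, as in the notes). Exact values of Φ give expected frequencies that differ slightly from the tabled ones, which depend on rounded z values and table lookups. X² is therefore close to 0.936 but not exactly equal to it, and the conclusion is unchanged.

```python
from math import erf, inf, sqrt

def phi(z):
    """Standard normal CDF, Phi(z), computed from math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example 7.6: class midpoints and frequencies from the solution table.
mids = [12, 17, 22, 27, 32, 37]
freq = [3, 7, 15, 20, 9, 6]
N = sum(freq)                                               # 60
mean = sum(m * f for m, f in zip(mids, freq)) / N           # about 25.58
s = sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freq)) / (N - 1))  # about 6.45

# Upper class boundaries; the last class is open-ended (z6 = infinity).
upper = [14.5, 19.5, 24.5, 29.5, 34.5, inf]
cdf = [phi((b - mean) / s) for b in upper]
probs = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]
expected = [N * p for p in probs]

# Combine the first two classes, as in the notes.
obs = [freq[0] + freq[1]] + freq[2:]
exp_pooled = [expected[0] + expected[1]] + expected[2:]
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp_pooled))
df = len(obs) - 3            # restrictions: total frequency, mean and s
critical = 5.991             # X2_0.05(2) from tables
print([round(e, 2) for e in exp_pooled])
print(f"X2 = {x2:.3f}, df = {df}, reject H0: {x2 > critical}")
```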
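An m × n contingency table for two factors, A (columns) and B (rows), has the form:
                        Factor A
                    A1     A2     ....   An     Totals
Factor B   B1       O11    O12    ....   O1n    R1
           B2       O21    O22    ....   O2n    R2
           ....     ....   ....   ....   ....   ....
           Bm       Om1    Om2    ....   Omn    Rm
           Totals   C1     C2     ....   Cn     T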
where A1, A2, ..., An and B1, B2, ..., Bm are the respective levels of factors A and B. The choice of levels for a factor depends on what the researcher considers important. Oij denotes the number of observations that possess the attributes Bi and Aj simultaneously. Ri and Cj are the row and column totals respectively, and T is the grand total of the observations.
Under H0 (the two factors are independent), the expected frequency for each entry Oij of the contingency table is
$E_{ij} = \frac{R_i \times C_j}{T}$
The test statistic remains $X^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$. The decision rule is that, at the α level of significance, we reject H0 if $X^2 > X^2_{\alpha}\{(m-1)(n-1)\}$.
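The expected-frequency formula and the test statistic for an m × n contingency table translate directly into code. This is an illustrative sketch. `contingency_chi_square` is a name chosen here, not a library function.

```python
def contingency_chi_square(table):
    """Return (X2, df, expected) for an m x n table of observed counts.

    E_ij = R_i * C_j / T, X2 = sum of (O_ij - E_ij)^2 / E_ij, df = (m - 1)(n - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected
```

For the 2 × 2 table [[10, 20], [20, 10]], every expected frequency is 30 × 30 / 60 = 15. This gives X² = 4 × 25/15 ≈ 6.67 with 1 degree of freedom.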
Example 7.7
A group of students was tested first in language and then in mathematics. The results were graded into three categories, A, B and C, for each test. A summary of the distribution of performance is given below:
                    Language
                A       B       C
Maths   A       55      72      12
        B       48      162     38
        C       14      42      85
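H0: performance in mathematics is independent of performance in language.
First, we compute the row totals (139, 248, 141), the column totals (117, 276, 135) and the grand total (528). Then, under H0, we draw up a table of expected frequencies using
$E_{ij} = \frac{R_i \times C_j}{T}$
to get the following table of expected frequencies:
                    Language
                A       B       C       Totals
Maths   A       30.8    72.7    35.5    139
        B       55.0    129.6   63.4    248
        C       31.2    73.7    36.1    141
        Totals  117     276     135     528
$X^2 = \frac{(55-30.8)^2}{30.8} + \frac{(72-72.7)^2}{72.7} + \frac{(12-35.5)^2}{35.5} + \frac{(48-55)^2}{55} + \frac{(162-129.6)^2}{129.6} + \frac{(38-63.4)^2}{63.4} + \frac{(14-31.2)^2}{31.2} + \frac{(42-73.7)^2}{73.7} + \frac{(85-36.1)^2}{36.1}$
$= 19.01 + 0.01 + 15.56 + 0.89 + 8.10 + 10.18 + 9.48 + 13.63 + 66.24 = 143.10$
The number of degrees of freedom is (3 - 1)(3 - 1) = 4, and $X^2_{0.05}(4) = 9.49$.
Since $X^2 = 143.10 > 9.49$, the result is highly significant, and hence we reject H0. Performance in mathematics and performance in language are associated.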
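The following sketch recomputes Example 7.7 without rounding the expected frequencies (variable names are illustrative). The notes round each E_ij to one decimal place, so the unrounded statistic is slightly larger than 143.10. H0 is still rejected by a wide margin.

```python
# Example 7.7: rows are Maths grades A, B, C; columns are Language grades A, B, C.
observed = [
    [55, 72, 12],
    [48, 162, 38],
    [14, 42, 85],
]
row_totals = [sum(r) for r in observed]            # [139, 248, 141]
col_totals = [sum(c) for c in zip(*observed)]      # [117, 276, 135]
T = sum(row_totals)                                # 528
expected = [[r * c / T for c in col_totals] for r in row_totals]
x2 = sum((o - e) ** 2 / e
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1)(3 - 1) = 4
critical = 9.49                                    # X2_0.05(4) from tables
print(f"X2 = {x2:.2f}, df = {df}, reject H0: {x2 > critical}")
```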
CHAPTER FOUR
CORRELATION ANALYSIS
Two variables are said to be positively correlated if they tend to increase or decrease together in the same direction; for this type, 0 < r ≤ 1. Two variables X and Y are said to be negatively correlated if X and Y change in opposite directions, that is, when X increases, Y decreases. Here the value of r satisfies -1 ≤ r < 0. Two variables are said to be uncorrelated when they change with no definite pattern with respect to each other. This is also called zero correlation; here r = 0. We present here a summary of the properties and the interpretation of the value of a correlation coefficient, r.
The correlation coefficient, usually denoted by r, satisfies the following conditions:
We will consider two common methods for calculating the coefficient of correlation.
These are:
(a) The Pearson’s Product Moment Correlation Coefficient and
(b) The Spearman’s Rank Correlation Coefficient.
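To derive r, let the variables be X and Y, and find their respective means $\bar{X}$ and $\bar{Y}$. Obtain the deviation of each $X_i$ and $Y_i$ from its mean:
Let $dX_i = X_i - \bar{X}$ and $dY_i = Y_i - \bar{Y}$.
Then $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum dX_i\, dY_i$, and
$S_{xy} = \frac{\sum dX_i\, dY_i}{n}$
This is the covariance between X and Y. Now divide $S_{xy}$ by the standard deviations of X and Y, computed with the same divisor n: $S_x = \sqrt{\sum dX_i^2 / n}$ and $S_y = \sqrt{\sum dY_i^2 / n}$. The result is the sample correlation coefficient r, given by
$r = \frac{\sum dX_i\, dY_i}{n S_x S_y} = \frac{S_{xy}}{S_x S_y}$
Equivalently,
$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}}$
or
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$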
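The deviation form of r and an equivalent raw-sum form (the one used in Example 4.8) can be written as small Python functions. The function names are illustrative. In the deviation form, the covariance and both standard deviations use the divisor n, so the n's cancel.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means: S_xy / (S_x * S_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    s_xy = sum(a * b for a, b in zip(dx, dy)) / n    # covariance, divisor n
    s_x = sqrt(sum(a * a for a in dx) / n)           # standard deviations, divisor n
    s_y = sqrt(sum(b * b for b in dy) / n)
    return s_xy / (s_x * s_y)


def pearson_r_raw(xs, ys):
    """Pearson's r from raw sums: (n*sum XY - sum X * sum Y) over the root term."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
```

For perfectly linear data such as X = [1, 2, 3], Y = [2, 4, 6], both functions return 1.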
Example 4.7
Calculate the Pearson’s Product Moment correlation coefficient for the data below:
X 2 3 6 4 7
Y 9 6 8 2 5
Solution:
X       Y    XY    X²    Y²
2       9    18    4     81
3       6    18    9     36
6       8    48    36    64
4       2    8     16    4
7       5    35    49    25
Total 22 30  127   114   210
$r = \frac{5 \times 127 - 22 \times 30}{\sqrt{[5 \times 114 - 22^2][5 \times 210 - 30^2]}} = \frac{-25}{\sqrt{86 \times 150}} = -0.2201$
Or
$r = \frac{127 - 5 \times 4.4 \times 6}{\sqrt{[114 - 5 \times 4.4^2][210 - 5 \times 6^2]}} = \frac{-5}{\sqrt{17.2 \times 30}} = -0.2201$
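Example 4.7 can be reproduced from the column totals, as in the following sketch:

```python
from math import sqrt

# Example 4.7: Pearson's r from the column totals of the solution table.
X = [2, 3, 6, 4, 7]
Y = [9, 6, 8, 2, 5]
n = len(X)
sum_x, sum_y = sum(X), sum(Y)                       # 22, 30
sum_xy = sum(x * y for x, y in zip(X, Y))           # 127
sum_x2 = sum(x * x for x in X)                      # 114
sum_y2 = sum(y * y for y in Y)                      # 210
r = (n * sum_xy - sum_x * sum_y) / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(f"r = {r:.4f}")                               # r = -0.2201
```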
Example 4.8
Use the Pearson’s product moment method to obtain the linear correlation coefficient
between price (X) and quantity (Y).
Time (n) 1 2 3 4 5 6 7 8 9 10
Quantity(Y) 10 20 50 40 50 60 80 90 90 120
Price(X) 2 4 6 8 10 12 14 16 18 20
Solution
N Y X XY X2 Y2
1 10 2 20 4 100
2 20 4 80 16 400
3 50 6 300 36 2500
4 40 8 320 64 1600
5 50 10 500 100 2500
6 60 12 720 144 3600
7 80 14 1120 196 6400
8 90 16 1440 256 8100
9 90 18 1620 324 8100
10 120 20 2400 400 14400
Total 610 110 8520 1540 47700
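Now
$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\{n\sum X^2 - (\sum X)^2\}\{n\sum Y^2 - (\sum Y)^2\}}}$
From the table above,
$r = \frac{10(8520) - (110)(610)}{\sqrt{\{(10)(1540) - (110)^2\}\{(10)(47700) - (610)^2\}}} = \frac{18100}{\sqrt{3300 \times 104900}} = 0.97282$
Hence, r = 0.973, which shows a very strong positive correlation between quantity and price.
Using
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{1810}{\sqrt{330 \times 10490}} = 0.97282$
gives the same value.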
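The following sketch redoes Example 4.8 with the deviation form, which confirms the sums 1810, 330 and 10490:

```python
from math import sqrt

# Example 4.8: price (X) and quantity (Y), using the deviation form of r.
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Y = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n                          # 11.0, 61.0
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))    # 1810
s_xx = sum((x - x_bar) ** 2 for x in X)                        # 330
s_yy = sum((y - y_bar) ** 2 for y in Y)                        # 10490
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.5f}")                                          # r = 0.97282
```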
are paired observations, rank each variable independently of the other and find the difference in ranks of each pair. Then square these differences in rank and use the formula
$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
where n is the number of pairs of observations, $d_i$ is the difference in ranks of the ith pair of observations, and r is the rank coefficient of correlation.
In using the rank correlation coefficient, observations are ranked, and the ranks are used in the computation instead of the actual observations. The observations are ranked in a specific sequence, e.g. in ascending or descending order. Tied observations are given the mean of the ranks they would otherwise occupy, as in Example 4.10.
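The ranking procedure and the formula can be sketched in Python. The helper `average_ranks` (a name chosen here) gives rank 1 to the largest value. As in Example 4.10, it gives tied observations the mean of the ranks they occupy. With ties, 1 - 6Σd²/(n(n² - 1)) is the formula the notes use, but it is then no longer exactly equal to Pearson's r computed on the ranks.

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1        # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(xs, ys):
    """Spearman's r = 1 - 6*sum(d^2) / (n(n^2 - 1)), using average ranks for ties."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
```

With the data of Example 4.9, `spearman_r([2, 3, 6, 4, 7], [9, 6, 8, 2, 5])` returns -0.5.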
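To derive this formula, let $R_i$ and $T_i$ denote the ranks of the ith pair of observations, assuming no ties. Each set of ranks is a permutation of 1, 2, ..., n, so
$\sum_{i=1}^{n} R_i = \sum_{i=1}^{n} T_i = \frac{n(n+1)}{2}$, $\bar{R} = \bar{T} = \frac{n+1}{2}$,
and
$\sum_{i=1}^{n} R_i^2 = \sum_{i=1}^{n} T_i^2 = \frac{n(n+1)(2n+1)}{6}$
Now
$\sum (R_i - \bar{R})^2 = \sum R_i^2 - n\bar{R}^2 = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = \frac{n(n^2-1)}{12}$
Similarly, $\sum (T_i - \bar{T})^2 = \sum T_i^2 - n\bar{T}^2 = \frac{n(n^2-1)}{12}$.
If the difference in rank for the ith pair of observations is denoted by $d_i$, then, since $\bar{R} = \bar{T}$,
$d_i = R_i - T_i = (R_i - \bar{R}) - (T_i - \bar{T})$
$d_i^2 = (R_i - \bar{R})^2 - 2(R_i - \bar{R})(T_i - \bar{T}) + (T_i - \bar{T})^2$
$\sum d_i^2 = \sum (R_i - \bar{R})^2 - 2\sum (R_i - \bar{R})(T_i - \bar{T}) + \sum (T_i - \bar{T})^2$
$\sum d_i^2 = \frac{n(n^2-1)}{12} - 2\sum (R_i - \bar{R})(T_i - \bar{T}) + \frac{n(n^2-1)}{12}$
$\sum d_i^2 = \frac{2n(n^2-1)}{12} - 2\sum (R_i - \bar{R})(T_i - \bar{T})$
$\sum (R_i - \bar{R})(T_i - \bar{T}) = \frac{n(n^2-1)}{12} - \frac{1}{2}\sum d_i^2$
But
$r = \frac{S_{XY}}{S_X S_Y} = \frac{\sum (R_i - \bar{R})(T_i - \bar{T})}{\sqrt{\sum (R_i - \bar{R})^2 \sum (T_i - \bar{T})^2}} = \frac{\frac{n(n^2-1)}{12} - \frac{1}{2}\sum d_i^2}{\frac{n(n^2-1)}{12}}$
$\therefore r = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$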
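The derivation assumes no ties. The sketch below is a numerical sanity check, not a proof. For every possible ranking of n = 5 items, it compares the shortcut formula with Pearson's r applied directly to the ranks. The names are illustrative.

```python
from itertools import permutations
from math import sqrt

def pearson(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

n = 5
R = list(range(1, n + 1))                       # ranks of X: 1, 2, ..., n
# Identity used in the derivation: sum (R_i - R_bar)^2 = n(n^2 - 1)/12.
r_bar = (n + 1) / 2
assert sum((r - r_bar) ** 2 for r in R) == n * (n ** 2 - 1) / 12

max_gap = 0.0
for T in permutations(R):                       # every possible ranking of Y (no ties)
    sum_d2 = sum((r - t) ** 2 for r, t in zip(R, T))
    shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    max_gap = max(max_gap, abs(shortcut - pearson(R, T)))
print(f"largest difference over all {n}! rankings: {max_gap:.1e}")
```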
Example 4.9
Calculate the Spearman’s rank correlation coefficient for the data below.
X 2 3 6 4 7
Y 9 6 8 2 5
Solution
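Ri and Ti are the ranks of X and Y (the largest value is ranked 1), and di = Ri - Ti.
X    Y    Ri   Ti   di   di²
2    9    5    1    4    16
3    6    4    3    1    1
6    8    2    2    0    0
4    2    3    5    -2   4
7    5    1    4    -3   9
Total                    30
n = 5 and $\sum d_i^2 = 30$. Hence, using $r = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, we get $r = 1 - \frac{6 \times 30}{5 \times 24} = -0.5$.
This implies a moderate negative correlation between X and Y.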
Example 4.10
There are ten finalists in a competition that has two judges, X and Y. The final scores awarded by the judges are as follows:
Competitor A B C D E F G H I J
Judge X 31 20 55 30 60 38 37 24 27 41
Judge Y 50 30 28 20 36 52 26 38 47 47
Calculate the rank correlation coefficient between the scores awarded by the judges.
Solution:
Rank the scores of X and Y separately, with the highest score ranked 1, to get the following table. Y's two tied scores of 47 share ranks 3 and 4, so each receives 3.5.
Competitor   X    Rank X   Y    Rank Y   di     di²
A            31   6        50   2        4      16
B            20   10       30   7        3      9
C            55   2        28   8        -6     36
D            30   7        20   10       -3     9
E            60   1        36   6        -5     25
F            38   4        52   1        3      9
G            37   5        26   9        -4     16
H            24   9        38   5        4      16
I            27   8        47   3.5      4.5    20.25
J            41   3        47   3.5      -0.5   0.25
Total                                           156.5
Now, n = 10 and
$r = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 156.5}{10(10^2 - 1)} = 1 - \frac{939}{990} = 0.05152$
There is very low correlation between the scores of the judges.
Exercise:
The class cumulative test scores (x) and the examination scores (y) of ten students are given in the table below. Calculate the coefficient of correlation using both Pearson's product moment method and Spearman's rank correlation method. In each case, comment on your result.
Student               A    B    C    D    E    F    G    H    I    J
Cumulative test (x)   12   14   8    8    7    6    4    5    8    2
Examination (y)       73   65   55   60   50   49   50   48   51   30
Example 4.11
𝑋 3 1 2 7 4 8
𝑌 4 5 2 9 6 7
Find the Spearman’s Rank Correlation Coefficient for the data above.
n = 6, $\sum d_i^2 = 8$
$r = 1 - \frac{6 \times 8}{6(6^2 - 1)} = 0.7714$
∴ r = 0.77, which implies a strong positive correlation between X and Y.
Example 4.12
𝑋 2 6 4 8 5 10
𝑌 20 11 18 6 14 5
Find Spearman's rank correlation coefficient using the data above.
n = 6, $\sum d_i^2 = 70$
$r = 1 - \frac{6 \times 70}{6(6^2 - 1)} = -1$, so there is perfect negative correlation between X and Y.
Example 4.13
X   3   1    6   3   5   4   9   7   8   5
Y   9   10   2   8   6   5   3   8   4   8
Find Spearman's rank correlation coefficient.
n = 10, $\sum d_i^2 = 283$ (tied values receive the mean of their ranks)
$r = 1 - \frac{6 \times 283}{10(10^2 - 1)} = 1 - \frac{1698}{990} = -0.715$
There is a strong negative correlation between X and Y.
Example 4.14
𝑋 2 3 4 5 6 8
𝑌 11 8 9 5 6 3
Find Spearman's rank correlation coefficient.
n = 6, $\sum d_i^2 = 66$
$r = 1 - \frac{6 \times 66}{6(6^2 - 1)} = 1 - \frac{396}{210} = -0.8857$
There is a strong negative correlation between X and Y.
Example 4.15
𝑋 1 3 4 6 8 9 11 14
𝑌 1 2 4 4 5 7 8 9
Find Spearman's rank correlation coefficient using the data above.
n = 8, $\sum d_i^2 = 0.5$, $r = 1 - \frac{6 \times 0.5}{8(8^2 - 1)} = 1 - 0.005952$
Hence, r = 0.994, so there is a strong positive correlation between X and Y.
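The rank computations in Examples 4.10 to 4.15 can be checked together with the sketch below (helper names are illustrative). This sketch ranks from the largest value. Ranking from the smallest, as some of the examples do, gives the same Σd², because reversing both rankings only changes the sign of each dᵢ.

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Return (sum of d_i^2, r) using r = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    return d2, 1 - 6 * d2 / (n * (n ** 2 - 1))

examples = {
    "4.10": ([31, 20, 55, 30, 60, 38, 37, 24, 27, 41],
             [50, 30, 28, 20, 36, 52, 26, 38, 47, 47]),
    "4.11": ([3, 1, 2, 7, 4, 8], [4, 5, 2, 9, 6, 7]),
    "4.12": ([2, 6, 4, 8, 5, 10], [20, 11, 18, 6, 14, 5]),
    "4.13": ([3, 1, 6, 3, 5, 4, 9, 7, 8, 5], [9, 10, 2, 8, 6, 5, 3, 8, 4, 8]),
    "4.14": ([2, 3, 4, 5, 6, 8], [11, 8, 9, 5, 6, 3]),
    "4.15": ([1, 3, 4, 6, 8, 9, 11, 14], [1, 2, 4, 4, 5, 7, 8, 9]),
}
results = {name: spearman(x, y) for name, (x, y) in examples.items()}
for name, (d2, r) in results.items():
    print(f"Example {name}: sum d^2 = {d2}, r = {r:.4f}")
```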
Example 4.16
Compute the correlation coefficient for the data below using the Pearson’s formula.
𝑋 3 5 6 8 9
𝑌 2 3 6 5 4
Solution
X       Y    XY    X²    Y²
3       2    6     9     4
5       3    15    25    9
6       6    36    36    36
8       5    40    64    25
9       4    36    81    16
Total 31 20  133   215   90
$r = \frac{5 \times 133 - 31 \times 20}{\sqrt{[5 \times 215 - (31)^2][5 \times 90 - (20)^2]}} = \frac{45}{\sqrt{114 \times 50}} = 0.596$
This implies a moderate positive correlation.
Example 4.17
Compute the correlation coefficient for the data below using both Spearman’s
and Pearson’s methods.
𝑋 1 3 5 2 5
𝑌 4 8 7 6 10
Solution
X       Y     RX    RY   di²    XY    X²    Y²
1       4     5     5    0      4     1     16
3       8     3     2    1      24    9     64
5       7     1.5   3    2.25   35    25    49
2       6     4     4    0      12    4     36
5       10    1.5   1    0.25   50    25    100
Total 16 35              3.50   125   64    265
Spearman's Method
$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 3.50}{5(25 - 1)} = 0.825$
Pearson's Method
$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} = \frac{5 \times 125 - 16 \times 35}{\sqrt{(5 \times 64 - 16^2)(5 \times 265 - 35^2)}} = \frac{65}{\sqrt{64 \times 100}} = 0.8125$
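Examples 4.16 and 4.17 can be reproduced in Python (a sketch with illustrative names). For Example 4.17, the ranks are copied from the solution table rather than recomputed.

```python
from math import sqrt

def pearson_raw(xs, ys):
    """r = (n*sum XY - sum X * sum Y) / sqrt([n*sum X^2 - (sum X)^2][n*sum Y^2 - (sum Y)^2])."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def spearman_from_ranks(rx, ry):
    """r = 1 - 6*sum(d^2) / (n(n^2 - 1)) for given ranks."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Example 4.16 (Pearson only).
r_416 = pearson_raw([3, 5, 6, 8, 9], [2, 3, 6, 5, 4])
# Example 4.17: largest value ranked 1; the two 5s in X share ranks 1 and 2 (1.5 each).
X, Y = [1, 3, 5, 2, 5], [4, 8, 7, 6, 10]
r_417_pearson = pearson_raw(X, Y)
r_417_spearman = spearman_from_ranks([5, 3, 1.5, 4, 1.5], [5, 2, 3, 4, 1])
print(f"4.16 Pearson r = {r_416:.3f}")
# 4.16 Pearson r = 0.596
print(f"4.17 Pearson r = {r_417_pearson:.4f}, Spearman r = {r_417_spearman:.3f}")
# 4.17 Pearson r = 0.8125, Spearman r = 0.825
```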
NOTE
Spearman's and Pearson's correlation coefficients do not, in general, give the same value. When the relationship is perfectly linear (Pearson's r = +1 or -1), Spearman's coefficient takes the same value. Spearman's method is quick to compute and is often used as an approximation. It belongs to the class of measures called non-parametric statistics, because it is calculated from ranks instead of the actual observations. Pearson's method is generally regarded as more reliable, because it makes use of the actual observations.
CHAPTER FIVE
SIMPLE LINEAR REGRESSION
CHAPTER SIX
ONE – WAY ANALYSIS OF VARIANCE