Statistical Inference Principles Guide
Lecture Notes
Course Outline
1. Introduction
2. Point Estimation
3. Interval Estimation
4. Test of Hypothesis
5. Inference on Proportions
6. Inference on Means
7. Analysis of Variance
8. Categorical Data Analysis
9. Regression and Correlation
10. Non-Parametric Inference
Books
1. H. J. Larson. Introduction to Probability Theory and Statistical Inference. 3rd ed.
Wiley, 1982.
2. I. Miller & M. Miller. John E. Freund's Mathematical Statistics with Applications. 7th ed.
Pearson Education, Prentice Hall, New Jersey, 2003.
3. Mik Wisniewski. Quantitative Methods for Decision Makers. 4th ed. Prentice Hall,
2006.
CHAPTER ONE
INTRODUCTION
A statistic is any quantity whose value can be calculated from sample data. Given a
sample of 𝑛 observations from a population, one can compute estimates of the
population mean, median, standard deviation, and various other population
characteristics (parameters).
The most commonly used sample statistics and the corresponding population
parameters are shown in the following table.
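Parameter            Population                                                Sample
Proportion           $P = \frac{X}{N}$                                         $\hat{P} = \frac{x}{n}$
Mean                 $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$                     $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Standard deviation   $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i-\mu)^2}$    $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$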
Since a statistic varies from one sample to the other it can be regarded as a random
variable. Prior to obtaining data, there is uncertainty as to which of all possible samples
will occur. Because of this, estimates such as the sample mean 𝑥̅ and sample standard
deviation 𝑠 will vary from one sample to another. The behaviour of such estimates in
repeated sampling is described by sampling distributions. Any particular sampling
distribution will give an indication of how close the estimate is likely to be to the value
of the parameter being estimated.
The probability distribution of any particular statistic depends not only on the
population distribution (normal, uniform, etc.) and the sample size n but also on the
method of sampling.
CHAPTER TWO
POINT ESTIMATION
Example 2.1
A random sample of 𝑛 = 3 batteries might yield observed lifetimes (hours) 𝑥1 =
5.0, 𝑥2 = 6.4, 𝑥3 = 5.9. The computed value of the sample mean lifetime is 𝑥̅ = 5.77,
and it is reasonable to regard 5.77 as a very plausible “best guess” for the value of µ
based on the available sample information.
Example 2.2
When $X$ is a binomial random variable with parameters $n$ and $p$, show that the sample
proportion $\hat{p} = X/n$ is an unbiased estimator of $p$.
Solution
$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}\cdot np = p$
Since $E(\hat{p}) = p$, $\hat{p}$ is an unbiased estimator of $p$.
Exercise 2.1
Show that the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ is an unbiased estimator of $\sigma^2$.
The standard error of an estimator $\hat{\theta}$ is its standard deviation, $\sigma_{\hat{\theta}} = \sqrt{Var(\hat{\theta})}$.
If the standard error itself involves unknown parameters whose values can be
estimated, substitution of these estimates into $\sigma_{\hat{\theta}}$ yields the estimated standard error
(estimated standard deviation) of the estimator. The estimated standard error is
denoted $s_{\hat{\theta}}$.
Example 2.3
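Find the standard error of the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Solution
Since the $x_i$ are independent observations, each with variance $\sigma^2$,
$Var(\bar{x}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$
Therefore the standard error of the sample mean is $\sigma/\sqrt{n}$.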
2.5 The Bootstrap (*Optional)
The form of the estimator 𝜃̂ may be sufficiently complicated so that standard statistical
theory cannot be applied to obtain an expression for 𝜎𝜃̂ .
In recent years, a new computer-intensive method called the bootstrap has been
introduced to address this problem. Suppose that the population pdf is f (x;θ), a
member of a particular parametric family, and that data x1, x2, . . .,xn gives 𝜃̂ = 21.7.
We now use the computer to obtain “bootstrap samples” from the pdf f (x; 21.7), and for
each sample we calculate a “bootstrap estimate” 𝜃̂ ∗ :
First bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_1^*$
Second bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_2^*$
.
.
.
$B$-th bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_B^*$
$B$ = 100 or 200 is often used. Now let $\bar{\theta}^* = \frac{1}{B}\sum_{i=1}^{B}\hat{\theta}_i^*$ be the sample mean of the bootstrap
estimates. The bootstrap estimate of the standard error of $\hat{\theta}$ is now just the sample
standard deviation of the $\hat{\theta}_i^*$'s:
$S_{\hat{\theta}} = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\theta}_i^* - \bar{\theta}^*\right)^2}$
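The bootstrap recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes do not say which parametric family f(x; θ) is, so the sketch assumes an exponential distribution with mean θ̂ = 21.7, a hypothetical sample size n = 25, and the sample mean as the estimator. The names `bootstrap_se` and `draw_sample` are made up for this example.

```python
import math
import random
import statistics


def bootstrap_se(estimator, sampler, n, B=200, seed=0):
    """Bootstrap estimate of the standard error of `estimator`.

    sampler(rng, n) must return one bootstrap sample of size n drawn
    from the fitted pdf f(x; theta_hat).
    """
    rng = random.Random(seed)
    estimates = [estimator(sampler(rng, n)) for _ in range(B)]
    # Sample standard deviation (divisor B - 1) of the B bootstrap estimates.
    return statistics.stdev(estimates)


# Assumption for illustration only: exponential model with mean theta_hat.
theta_hat = 21.7
n = 25  # hypothetical sample size


def draw_sample(rng, size):
    # expovariate takes the rate parameter, i.e. 1 / mean.
    return [rng.expovariate(1.0 / theta_hat) for _ in range(size)]


se = bootstrap_se(statistics.mean, draw_sample, n)
print(f"bootstrap SE of the sample mean: {se:.3f}")
print(f"theoretical sigma/sqrt(n):       {theta_hat / math.sqrt(n):.3f}")
```

For the sample mean the theoretical standard error is known, so the two printed values should be close; the bootstrap is most useful when no such closed form exists.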
Example 2.4
Let $x_1, x_2, \ldots, x_n$ represent a random sample of service times of $n$ customers at a certain
facility, where the underlying distribution is assumed exponential with parameter $\lambda$.
Since there is only one parameter to be estimated, the estimator is obtained by equating
$E(X)$ to $\bar{x}$. Since $E(X) = 1/\lambda$ for an exponential distribution, this gives $1/\lambda = \bar{x}$, or
$\lambda = 1/\bar{x}$. The moment estimator of $\lambda$ is then $\hat{\lambda} = 1/\bar{x}$.
Example 2.5
Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be a random sample from a gamma distribution with parameters 𝛼 and
𝛽 having 𝐸(𝑋) = 𝛼𝛽 and 𝑉𝑎𝑟(𝑋) = 𝛼𝛽 2 .
a) Obtain the moment estimators of 𝛼 and 𝛽
b) The data below show the survival times 𝑋 (in weeks) of 20 randomly selected male
mice exposed to 240 rads of gamma radiation. Assuming the survival time has a gamma
distribution, compute the estimates of 𝛼 and 𝛽.
152 115 109 94 88 137 152 77 160 165
125 40 128 123 136 101 62 153 83 69
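Solution
a)
$Var(X) = E(X^2) - [E(X)]^2$, so $E(X^2) = Var(X) + [E(X)]^2$
In this case
$E(X^2) = \alpha\beta^2 + \alpha^2\beta^2$
Equating the sample and population moments:
$E(X) = \bar{x}$, that is, $\alpha\beta = \bar{x}$    (i)
$E(X^2) = \frac{1}{n}\sum_{i=1}^{n} x_i^2$, that is, $\alpha\beta^2 + \alpha^2\beta^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$    (ii)
From (i) $\alpha\beta = \bar{x}$. Substituting this in (ii):
$\beta\bar{x} + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$
$\hat{\beta} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}{\bar{x}}$
From (i), $\alpha = \bar{x}/\beta$, so
$\hat{\alpha} = \frac{\bar{x}^2}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$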
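b)
From the data given
$\bar{x} = \frac{2269}{20} = 113.45$
$\frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{152^2 + 115^2 + \cdots + 69^2}{20} = \frac{281755}{20} = 14087.75$
$\hat{\alpha} = \frac{113.45^2}{14087.75 - 113.45^2} = \frac{12870.9025}{14087.75 - 12870.9025} = \frac{12870.9025}{1216.8475} = 10.577$
$\hat{\beta} = \frac{14087.75 - 113.45^2}{113.45} = \frac{1216.8475}{113.45} = 10.726$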
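As a check on the arithmetic in Example 2.5, the moment estimators can be computed directly from the 20 survival times. The function name `gamma_moment_estimates` is an illustrative choice, not part of the notes; the formulas are the ones derived in part (a).

```python
def gamma_moment_estimates(data):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma sample."""
    n = len(data)
    mean = sum(data) / n
    second_moment = sum(x * x for x in data) / n
    spread = second_moment - mean ** 2  # (1/n) * sum(x_i^2) - x_bar^2
    alpha_hat = mean ** 2 / spread
    beta_hat = spread / mean
    return alpha_hat, beta_hat


# Survival times (weeks) from Example 2.5.
survival_weeks = [
    152, 115, 109, 94, 88, 137, 152, 77, 160, 165,
    125, 40, 128, 123, 136, 101, 62, 153, 83, 69,
]

alpha_hat, beta_hat = gamma_moment_estimates(survival_weeks)
print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```

Note that the product of the two estimates reproduces the sample mean, as equation (i) requires.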
Note that since $\ln[g(x)]$ is a monotonically increasing function of $g(x)$, finding $x$ to maximize $\ln[g(x)]$
is equivalent to maximizing $g(x)$ itself. In statistics, taking the logarithm frequently
changes a product to a sum, which is easier to work with.
Example 2.6
Suppose 𝑥1 , 𝑥2 , … , 𝑥𝑛 is a random sample from an exponential distribution with
parameter 𝜆. Because of independence, the likelihood function is a product of the
individual pdf’s:
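$L = f(x_1, x_2, \ldots, x_n; \lambda) = (\lambda e^{-\lambda x_1})(\lambda e^{-\lambda x_2}) \cdots (\lambda e^{-\lambda x_n})$
The log-likelihood is
$\ln L = \ln(\lambda e^{-\lambda x_1}) + \ln(\lambda e^{-\lambda x_2}) + \cdots + \ln(\lambda e^{-\lambda x_n}) = n \ln\lambda - \lambda\sum_{i=1}^{n} x_i$
To maximize $\ln L$ with respect to $\lambda$, solve $\frac{d \ln L}{d\lambda} = 0$:
$\frac{d \ln L}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$
$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$
Thus the mle is $\hat{\lambda} = 1/\bar{x}$, which is identical to the method of moments estimator.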
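A quick numerical check can make the result of Example 2.6 concrete. The sketch below evaluates the log-likelihood n ln λ − λ Σ x_i at the closed-form estimate and at a few nearby values. The service times are hypothetical (the notes give no data for this example), and the function name is illustrative.

```python
import math


def exp_log_likelihood(lam, data):
    """Log-likelihood n*ln(lam) - lam*sum(x_i) of an exponential sample."""
    return len(data) * math.log(lam) - lam * sum(data)


# Hypothetical service times, used only to illustrate the calculation.
service_times = [2.1, 0.7, 3.4, 1.2, 0.5, 4.8, 1.9, 2.6]

# Closed-form mle: lambda_hat = n / sum(x_i) = 1 / x_bar.
lam_hat = len(service_times) / sum(service_times)

# Compare with a few alternative values of lambda on either side.
candidates = [lam_hat * factor for factor in (0.5, 0.8, 0.9, 1.1, 1.25, 2.0)]
best = max(candidates + [lam_hat],
           key=lambda lam: exp_log_likelihood(lam, service_times))

print(f"lambda_hat = {lam_hat:.4f}")
print(f"best candidate is lambda_hat: {best == lam_hat}")
```

Because the log-likelihood is concave in λ, every alternative value gives a smaller log-likelihood than the closed-form estimate.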
Example 2.7
Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be a random sample from a normal distribution. The likelihood function
is
$L = f(x_1, x_2, \ldots, x_n; \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i-\mu)^2/(2\sigma^2)}$
$\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$
To find the maximizing values of $\mu$ and $\sigma^2$, we must take the partial derivatives of $\ln L$
with respect to $\mu$ and $\sigma^2$, equate them to zero and solve the resulting two equations. The
resulting mle's are:
$\hat{\mu} = \bar{x}$
$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$
The mle of $\sigma^2$ is not the unbiased estimator, so two different principles of estimation
(unbiasedness and maximum likelihood) yield two different estimators.
Exercise 2.2
Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 represent a random sample from a Rayleigh distribution with pdf
$f(x; \theta) = \frac{x}{\theta}\, e^{-x^2/(2\theta)}$ for $x > 0$, and $0$ elsewhere.
It can be shown that the mean and variance are respectively $\sqrt{\frac{\pi\theta}{2}}$ and $\frac{4-\pi}{2}\theta$.
A random sample of ten yields the data
16.88, 10.23, 4.59, 6.66, 13.68, 14.23, 19.87, 9.40, 6.51, 10.95
a) Use the method of moments to obtain an estimate of θ and then compute the estimate
for this data.
b) Obtain the maximum likelihood estimator of θ, and then compute the estimate for the
given data.
CHAPTER THREE
INTERVAL ESTIMATION
A confidence level of 95% implies that 95% of all samples would give an interval that
includes µ, or whatever other parameter is being estimated, and only 5% of all samples
would yield an erroneous interval.
The most frequently used confidence levels are 95%, 99%, and 90%. The higher the
confidence level, the more strongly we believe that the value of the parameter being
estimated lies within the interval.
Information about the precision of an interval estimate is conveyed by the width of the
interval. If the confidence level is high and the resulting interval is quite narrow, our
knowledge of the value of the parameter is reasonably precise.
A very wide confidence interval, however, gives the message that there is a great deal
of uncertainty concerning the value of what we are estimating.
Solution
In this case $100(1-\alpha)\% = 90\%$, so $1-\alpha = 0.9$, $\alpha = 0.1$ and $z_{\alpha/2} = z_{0.05} = 1.645$. The
desired interval is:
$70.7 \pm 1.645 \cdot \frac{13}{\sqrt{40}} = 70.7 \pm 3.381 = (67.319, 74.081)$
With a 90% degree of confidence, we can say that $67.319 < \mu < 74.081$.
Example 3.2
An algebra placement test was used to determine placement in mathematics courses. A
sample of 50 students gave the following scores. Calculate the 95% confidence interval
of the population mean µ
29 21 23 24 22 24 22 23 15 21 22 17 15 23 17 18 23 18
19 17 14 19 16 22 23 14 19 19 22 16 21 12 28 20 17 24
12 18 18 10 21 22 26 24 14 27 15 24 28 13
Solution
From the data given $n = 50$, $\bar{x} = 19.82$ and $s = 4.50$. The 95% confidence interval is then:
$19.82 \pm 1.96 \cdot \frac{4.50}{\sqrt{50}} = 19.82 \pm 1.25 = (18.6, 21.1)$
Hence $18.6 < \mu < 21.1$ with a 95% confidence level. The interval has a reasonably
narrow width of 2.5, indicating a fairly precise estimation of $\mu$.
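The interval in Example 3.2 can be reproduced from the raw scores. The sketch below uses only the Python standard library; `large_sample_ci` is an illustrative name, and the normal critical value comes from `statistics.NormalDist` rather than from a table.

```python
import math
import statistics


def large_sample_ci(data, confidence=0.95):
    """Large-sample (z-based) confidence interval for the population mean."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin


# Algebra placement test scores from Example 3.2.
scores = [
    29, 21, 23, 24, 22, 24, 22, 23, 15, 21, 22, 17, 15, 23, 17, 18, 23, 18,
    19, 17, 14, 19, 16, 22, 23, 14, 19, 19, 22, 16, 21, 12, 28, 20, 17, 24,
    12, 18, 18, 10, 21, 22, 26, 24, 14, 27, 15, 24, 28, 13,
]

low, high = large_sample_ci(scores)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing `confidence=0.90` or `confidence=0.99` gives the other commonly used levels.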
Compute the 95% confidence interval for the population mean µ.
Solution
From the data given $n = 16$, $\bar{x} = 5.34$ and $s = 0.8483$. The 95% confidence interval is then:
$\bar{x} \pm t_{0.025,15} \cdot \frac{s}{\sqrt{n}} = 5.34 \pm 2.131 \cdot \frac{0.8483}{\sqrt{16}} = 5.34 \pm 0.45 = (4.89, 5.79)$
$(n-1)s^2/\chi^2_{1-\alpha/2,\,n-1}$
A confidence interval for σ has lower and upper limits that are the square roots of the
corresponding limits in the interval for 𝜎 2 .
Example 3.5
Recall the example of the weights of 16 three-month-old babies. Compute the 95%
confidence interval of the population variance $\sigma^2$.
Solution
$s^2 = 0.8483^2 = 0.7196$, $\chi^2_{0.025,15} = 27.488$, $\chi^2_{0.975,15} = 6.262$
The 95% confidence interval for $\sigma^2$ is
$\left(\frac{15 \times 0.7196}{27.488}, \frac{15 \times 0.7196}{6.262}\right) = (0.3927, 1.7237)$
Taking the square root of each endpoint yields (0.6267, 1.3129) as the 95% confidence
interval for $\sigma$.
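The calculation in Example 3.5 is easy to script once the two chi-squared critical values have been read from a table. The sketch below takes those tabulated values as inputs, since the Python standard library has no chi-squared quantile function; `variance_ci` is an illustrative name.

```python
import math


def variance_ci(s, n, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a normal sample.

    chi2_upper is the tabulated chi^2_{alpha/2, n-1} and
    chi2_lower is the tabulated chi^2_{1-alpha/2, n-1}.
    """
    sum_sq = (n - 1) * s ** 2
    return sum_sq / chi2_upper, sum_sq / chi2_lower


# Values from Example 3.5: s = 0.8483, n = 16, 95% confidence.
var_low, var_high = variance_ci(0.8483, 16, chi2_upper=27.488, chi2_lower=6.262)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"95% CI for sigma^2: ({var_low:.4f}, {var_high:.4f})")
print(f"95% CI for sigma:   ({sd_low:.4f}, {sd_high:.4f})")
```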
The method substitutes heavy computation for theory, and it has been feasible only
fairly recently with the availability of fast computers.
The bootstrap percentile interval with a confidence level of 100(1-α)% for a specified
parameter is obtained by first generating B bootstrap samples, for each one calculating
the value of some particular statistic that estimates the parameter, and sorting these
values from smallest to largest.
Then we compute k =αB/2 and choose the kth value from each end of the sorted list.
These two values form the confidence limits for the confidence interval. If k is not an
integer, then interpolation can be used, but this is not crucial.
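The percentile procedure just described can be sketched as follows. Two assumptions are made that the notes leave open: bootstrap samples are drawn by resampling the observed data with replacement, and k is rounded to the nearest integer instead of interpolating. The data values and the name `bootstrap_percentile_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(data, statistic, alpha=0.05, B=1000, seed=0):
    """Bootstrap percentile interval with confidence level 100(1 - alpha)%."""
    rng = random.Random(seed)
    n = len(data)
    # Assumption: each bootstrap sample resamples the data with replacement.
    estimates = sorted(statistic(rng.choices(data, k=n)) for _ in range(B))
    # k-th value from each end of the sorted list (rounded, no interpolation).
    k = max(1, round(alpha * B / 2))
    return estimates[k - 1], estimates[B - k]


# Hypothetical measurements, used only to illustrate the method.
sample = [5.1, 4.8, 6.3, 5.9, 4.4, 5.6, 6.8, 5.2, 4.9, 6.1, 5.5, 5.0]

low, high = bootstrap_percentile_ci(sample, statistics.mean)
print(f"95% bootstrap percentile interval for the mean: ({low:.2f}, {high:.2f})")
```

Any other statistic, for example `statistics.median`, can be passed in place of the mean without changing the rest of the code.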
CHAPTER FOUR
HYPOTHESIS TESTING
In each problem considered, the question of interest is simplified into two competing
claims / hypotheses between which we have a choice. These are:
a) Null Hypothesis
The null hypothesis H0 represents a theory that has been put forward, either because it
is believed to be true or because it is to be used as a basis for argument, but has not been
proved. For example, in the study of the effects of a new finance policy on the
performance of a company, the null hypothesis might be that the new policy is no
better, on average, than the current policy. We would write
𝐻0 : There is no difference between the two financial policies on average.
b) Alternative Hypothesis
The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set
up to establish. For example, in the study of the effects of a new finance policy on the
performance of a company, the alternative hypothesis might be that the new policy
has a different effect on average compared to the current policy. We would write
𝐻1 : The two policies have different effects on average.
The alternative hypothesis might also be that the new policy is better, on average, than
the current one. In this case we would write
𝐻1 : The new policy is better than the current one on average
The final conclusion once the test has been carried out is always given in terms of the
null hypothesis. The two possible conclusions are:
- Reject 𝐻0 in favour of 𝐻1
- Fail to reject 𝐻0
Concluding "Fail to reject 𝐻0" does not necessarily mean that the null hypothesis is true;
it only suggests that there is not sufficient evidence against 𝐻0 in favour of 𝐻1.
Rejecting the null hypothesis, then, suggests that the alternative hypothesis is likely to be
true.
When making a conclusion in hypothesis testing, two types of errors can be made.
Type I Error
A type I error occurs when the null hypothesis is rejected when it is in fact true; that is,
𝐻0 is wrongly rejected.
For example, in the study of the effects of a new finance policy on the performance of a
company, the null hypothesis might be that the new policy is no better, on average, than
the current policy. That is:
𝐻0 : There is no difference between the two financial policies on average.
A type I error would occur if we concluded that the two policies produced different
effects when in fact there was no difference between them.
Type II Error
A type II error occurs when the null hypothesis 𝐻0 , is not rejected when it is in fact false.
For example, in the study of the effects of a new finance policy on the performance of a
company, the null hypothesis might be that the new policy is no better, on average, than
the current policy. That is:
𝐻0 : There is no difference between the two financial policies on average.
A type II error would occur if it was concluded that the two policies produced the same
effect, i.e. there is no difference between the two policies on average, when in fact they
produced different ones.
The following table gives a summary of possible results of any hypothesis test:
                        Decision
Truth        Reject 𝑯𝟎          Don't reject 𝑯𝟎
𝑯𝟎 true      Type I Error       Right decision
𝑯𝟏 true      Right decision     Type II Error
A type I error is often considered to be more serious, and therefore more important to
avoid, than a type II error. The hypothesis test procedure is therefore adjusted so that
there is a guaranteed 'low' probability of rejecting the null hypothesis wrongly; this
probability is never 0. This probability of a type I error can be precisely computed as
𝑃(𝑡𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟) = 𝛼
If we do not reject the null hypothesis, it may still be false (a type II error) as the sample
may not be big enough to identify the falseness of the null hypothesis (especially if the
truth is very close to the hypothesised value).
For any given set of data, type I and type II errors are inversely related; the smaller the
risk of one, the higher the risk of the other.
A type I error can also be referred to as an error of the first kind.
Significance Level, 𝜶
The significance level of a statistical hypothesis test is a fixed probability of wrongly
rejecting the null hypothesis H0, if it is in fact true.
It is the probability of a type I error and is set by the investigator in relation to the
consequences of such an error. That is, we want to make the significance level as small
as possible in order to protect the null hypothesis and to prevent, as far as possible, the
investigator from inadvertently making false claims.
The significance level is usually denoted by 𝛼
𝑆𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑐𝑒 𝐿𝑒𝑣𝑒𝑙 = 𝑃 (𝑡𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟) = 𝛼
Usually, the significance level is chosen to be 0.05 (or equivalently, 5%).
P-Value
The probability value (p-value) of a statistical hypothesis test is the probability,
computed assuming the null hypothesis H0 is true, of getting a value of the test statistic
as extreme as or more extreme than that observed.
It is equal to the significance level of the test for which we would only just reject the null
hypothesis. The p-value is compared with the chosen significance level of our test and, if
it is smaller, the result is significant. That is, if the null hypothesis were to be rejected at
the 5% significance level, this would be reported as "p < 0.05".
Small p-values indicate that the observed data would be unlikely if the null hypothesis
were true. The smaller the p-value, the more convincing is the rejection of the null
hypothesis. It indicates the strength of evidence against H0, rather than simply
concluding "Reject H0" or "Fail to reject H0".
Power
The power of a statistical hypothesis test measures the test's ability to reject the null
hypothesis when it is actually false - that is, to make a correct decision.
In other words, the power of a hypothesis test is the probability of not committing
a type II error. It is calculated by subtracting the probability of a type II error from 1,
usually expressed as:
𝑃𝑜𝑤𝑒𝑟 = 1 − 𝑃(𝑡𝑦𝑝𝑒 𝐼𝐼 𝑒𝑟𝑟𝑜𝑟) = 1 − 𝛽
The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to
have high power, close to 1.
Test Statistic
A test statistic is a quantity calculated from our sample of data. Its value is used to
decide whether or not the null hypothesis should be rejected in our hypothesis test.
The choice of a test statistic will depend on the assumed probability model and the
hypotheses under question.
The critical region, or rejection region, is a set of values of the test statistic for which the
null hypothesis is rejected in a hypothesis test. That is, the sample space for the test
statistic is partitioned into two regions; one region (the critical region) will lead us to
reject the null hypothesis H0, the other will not. So, if the observed value of the test
statistic is a member of the critical region, we conclude "Reject H0"; if it is not a member
of the critical region then we conclude "Fail to reject H0".
One-sided Test
A one-sided test is a statistical hypothesis test in which the values for which we can
reject the null hypothesis, 𝐻0 are located entirely in one tail of the probability
distribution.
In other words, the critical region for a one-sided test is the set of values less than the
critical value of the test, or the set of values greater than the critical value of the test.
The choice between a one-sided and a two-sided test is determined by the purpose of
the investigation or prior reasons for using a one-sided test.
Example
Suppose we wanted to test a manufacturer’s claim that there are, on average, 50
matches in a box. We could set up the following hypotheses
𝐻0 : µ = 50,
against
𝐻1 : µ < 50 or 𝐻1 : µ > 50
Either of these two alternative hypotheses would lead to a one-sided test. Presumably,
we would want to test the null hypothesis against the first alternative hypothesis since
it would be useful to know if there is likely to be less than 50 matches, on average, in a
box (no one would complain if they get the correct number of matches in a box or
more).
Two-Sided Test
A two-sided test is a statistical hypothesis test in which the values for which we can
reject the null hypothesis, H0 are located in both tails of the probability distribution.
In other words, the critical region for a two-sided test is the set of values less than a first
critical value of the test and the set of values greater than a second critical value of the
test.
Example
Consider again the manufacturer's claim that there are, on average, 50 matches in a
box. The one-sided alternatives 𝐻1 : µ < 50 and 𝐻1 : µ > 50 were discussed in the
previous example. Yet another alternative hypothesis could be tested against the same
null, leading this time to a two-sided test:
𝐻0 : µ = 50
𝐻1 : µ ≠ 50
Here, nothing specific can be said about the average number of matches in a box; only
that, if we could reject the null hypothesis in our test, we would know that the average
number of matches in a box is likely to be less than or greater than 50.
Steps in Conducting Hypothesis Testing
i. Begin by stating the claim or hypothesis that is being tested. Also form a
statement for the case that the hypothesis is false. These are 𝐻0 and 𝐻1 .
ii. Choose the desired significance level 𝛼. The values 0.05 and 0.01 are commonly
used, but any value between 0 and 0.50 could be used as a significance level.
iii. Determine which statistic and distribution to use. The type of distribution is
dictated by features of the data. Common distributions include the standard
normal (𝑧), Student's 𝑡, chi-squared (𝜒²) and 𝐹.
iv. Compute the test statistic and the 𝑝 value for this statistic. Here we have to
consider whether we are conducting a two-tailed test (typically when the
alternative hypothesis contains an "is not equal to" symbol) or a one-tailed test
(typically when the alternative hypothesis contains a "less than" or "greater
than" symbol).
v. If the 𝑝 value is less than the set significance level 𝛼, we reject the null
hypothesis in favour of the alternative hypothesis. If the 𝑝 value is not less than
𝛼, we fail to reject the null hypothesis. This does not prove that the null
hypothesis is true; it only means that the data do not provide sufficient
evidence against it.
vi. We now state the results of the hypothesis test in such a way that the original
claim is addressed.
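The steps above can be sketched for the matchbox claim H0: µ = 50 against H1: µ < 50. Everything numerical here is hypothetical: a sample of 36 boxes averaging 49.2 matches, with the population standard deviation assumed known and equal to 2, so that a z statistic applies. The function name `z_test` is illustrative.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two-sided"):
    """One-sample z test; returns (z, p_value, decision)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf
    if tail == "less":        # H1: mu < mu0
        p_value = cdf(z)
    elif tail == "greater":   # H1: mu > mu0
        p_value = 1 - cdf(z)
    else:                     # H1: mu != mu0
        p_value = 2 * (1 - cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical summary data for the matchbox example.
z, p_value, decision = z_test(sample_mean=49.2, mu0=50, sigma=2.0, n=36,
                              alpha=0.05, tail="less")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

The `tail` argument corresponds to step iv: it selects between the one-sided and two-sided forms of the p-value.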
CHAPTER FIVE
INFERENCE ON PROPORTIONS
If an individual or object with the property is labelled a success (S), then 𝑝 is the
population proportion of successes.
Tests concerning 𝑝 are based on a random sample of size n from the population.
Null hypothesis, 𝐻0 : 𝑝 = 𝑝𝑜
Test statistic:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
Example 5.1
A plastics manufacturer has developed a new type of plastic trash can and proposes to
sell them with an unconditional 6-year warranty. To see whether this is economically
feasible, 20 prototype cans are subjected to an accelerated life test to simulate 6 years of
use. The proposed warranty will be modified only if the sample data strongly suggest
that fewer than 90% of such cans would survive the 6-year period. During the test 16
cans survive. Should the manufacturer implement the unconditional 6-year
warranty? Test at 𝛼 = 0.05.
Solution
Let 𝑝 denote the proportion of all cans that survive the accelerated test. The relevant
hypotheses are
𝐻0 : 𝑝 = 0.9
𝐻1 : 𝑝 < 0.9
The sample proportion is $\hat{p} = \frac{16}{20} = 0.8$, so the test statistic is
$z = \frac{0.8 - 0.9}{\sqrt{0.9(1-0.9)/20}} = \frac{-0.1}{0.0671} = -1.49$
The tabulated value for this one-sided test is $-z_{0.05} = -1.645$.
Since the computed 𝑧 value is smaller in absolute terms than the tabulated one, we fail
to reject 𝐻0: the data do not strongly suggest that fewer than 90% of cans survive the
6-year period, so the warranty can be implemented as proposed.
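The test statistic for Example 5.1 can be evaluated as below, using the sample proportion 0.8 from the solution and the hypothesised value p0 = 0.9. The function name is illustrative. With n = 20 the normal approximation is rough, so this is a sketch of the formula rather than a recommendation for samples this small.

```python
import math
from statistics import NormalDist


def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0 based on a sample proportion p_hat."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


z = one_proportion_z(p_hat=0.8, p0=0.9, n=20)

# Lower-tailed test at alpha = 0.05: reject H0 when z < -z_0.05.
critical = NormalDist().inv_cdf(0.05)
reject = z < critical

print(f"z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```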
Assume the availability of a sample of 𝑚 individuals from the first population and
𝑛 from the second.
The variables 𝑋 and 𝑌 represent the number of individuals in each sample possessing
the characteristic that defines 𝑝1 and 𝑝2 .
The two-sample proportion test compares the difference in sample proportions of two
independent populations. The test is as follows:
Null hypothesis, 𝐻0 : 𝑝1 = 𝑝2
Test statistic:
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$
where $\hat{p} = \frac{x+y}{m+n}$ and $\hat{q} = 1 - \hat{p}$
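A minimal sketch of the two-sample statistic follows. The counts are hypothetical and chosen only to show the arithmetic: x = 30 successes out of m = 100 in the first sample and y = 45 out of n = 100 in the second. The function name `two_proportion_z` is illustrative.

```python
import math


def two_proportion_z(x, m, y, n):
    """Pooled z statistic for H0: p1 = p2.

    x successes out of m in sample 1, y successes out of n in sample 2.
    """
    p1_hat = x / m
    p2_hat = y / n
    p_pooled = (x + y) / (m + n)
    q_pooled = 1 - p_pooled
    se = math.sqrt(p_pooled * q_pooled * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se


# Hypothetical counts, for illustration only.
z = two_proportion_z(x=30, m=100, y=45, n=100)
print(f"z = {z:.3f}")
```

For a lower-tailed alternative such as H1: p1 < p2 at 𝛼 = 0.01, this value would be compared with −2.326.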
Example 5.2
Is someone who switches brands because of a financial inducement less likely to remain
loyal than someone who switches without inducement?
Let 𝑝1 and 𝑝2 denote the true proportions of switchers to a certain brand with and
without inducement, respectively, who subsequently make a repeat purchase. Given
the data below, test the appropriate hypothesis at 𝛼 = 0.01 .
Solution
The null and alternative hypotheses are:
𝐻0 : 𝑝1 = 𝑝2
𝐻1 : 𝑝1 < 𝑝2
The test statistic is
Since the computed 𝑧 value is far less than the tabulated one, we reject 𝐻0 and conclude
that someone who switches brands because of a financial inducement is less likely to
remain loyal than someone who switches without inducement.
CHAPTER SIX
INFERENCE ON MEANS
Test statistic:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Example 6.1
A manufacturer claims that the average weight of a certain product is 3.3 kg. A random
sample of ten such products gave the following results
2.6 2.2 2.9 3.4 3.4 3.7 1.7 2.7 3.3 2.3
Test at 𝛼 = 0.05 the hypothesis that the mean weight of the product differs
from the claimed figure.
Solution
Let µ denote the mean weight of the product
The hypotheses to be tested are
𝐻0 : µ = 3.3
vs
𝐻1 : µ ≠ 3.3
This is a two-tailed test with 𝛼 = 0.05.
In this case
$\bar{x} = 2.82$
$\sum_{i=1}^{n}(x_i - \bar{x})^2 = 3.656$
$s = \sqrt{\frac{3.656}{9}} = 0.6374$
The deviations are tabulated below:
𝑖 𝑥𝑖 𝑥𝑖 − 𝑥̅ (𝑥𝑖 − 𝑥̅ )2
1 2.6 -0.22 0.0484
2 2.2 -0.62 0.3844
3 2.9 0.08 0.0064
4 3.4 0.58 0.3364
5 3.4 0.58 0.3364
6 3.7 0.88 0.7744
7 1.7 -1.12 1.2544
8 2.7 -0.12 0.0144
9 3.3 0.48 0.2304
10 2.3 -0.52 0.2704
Total 3.656
$t = \frac{2.82 - 3.3}{0.6374/\sqrt{10}} = \frac{-0.48}{0.2016} = -2.3810$
The tabulated t value for this test is $t_{0.05,9}$ (two tail) = 2.262.
Since the computed t value is greater than the tabulated one in absolute terms, we reject
𝐻0 and conclude that the mean weight of the product differs significantly
from the claimed figure.
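The calculation in Example 6.1 can be reproduced from the ten weights. The critical value 2.262 is taken from the t table, as in the solution, because the Python standard library has no t quantile function; `one_sample_t` is an illustrative name.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """t statistic (x_bar - mu0) / (s / sqrt(n)) for H0: mu = mu0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # divisor n - 1
    return (x_bar - mu0) / (s / math.sqrt(n))


weights = [2.6, 2.2, 2.9, 3.4, 3.4, 3.7, 1.7, 2.7, 3.3, 2.3]
t = one_sample_t(weights, mu0=3.3)

t_critical = 2.262  # tabulated two-tailed value, alpha = 0.05, 9 d.f.
reject = abs(t) > t_critical

print(f"t = {t:.4f}, reject H0: {reject}")
```

The small difference from the hand calculation in the last decimal places comes from rounding the standard error to 0.2016 there.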
Exercise 6.1
According to a recent report on accident claims by an insurance company, the average
number of accident claims reported per branch is 3 per day. A random sample of twenty
branches on a given day yielded the following data on claim incidents
3 1 2 1 2 3 5 2 5 1
3 2 3 1 3 3 4 3 1 8
Test at 𝛼 = 0.05 the hypothesis that the mean number of claim incidents in these
branches is less than the one in the report.
Null hypothesis, 𝐻0 : 𝜇1 = 𝜇2
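Test statistic:
$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}$
with $\nu = m + n - 2$ degrees of freedom, where
$s_1^2 = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})^2$
$s_2^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$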
Example 6.2
As an extension to Example 6.1, suppose the management of the company suspects that
there is a difference in weight of products between those produced during day shift and
those produced during night shift. A random sample of 18 products gave the following
results
Day    3.7 2.7 2.6 4.1 3.7 3.3 3.3 4.2 2.8 3.6
Night  3.2 2.3 3.3 2.6 3.3 3.4 3.1 3.6
Test at 𝛼 = 0.05 the hypothesis that the mean weight of day shift products is greater than
that of night shift products in the company.
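Solution
Let µ1 denote the mean weight of day shift products and µ2 denote the mean weight of
night shift products. The hypotheses to be tested are
𝐻0 : µ1 = µ2 vs 𝐻1 : µ1 > µ2
In this case
$\bar{x} = 3.4$, $\bar{y} = 3.1$
$s_1^2 = \frac{1}{m-1}\sum(x_i - \bar{x})^2 = \frac{2.86}{9} = 0.3178$
$s_2^2 = \frac{1}{n-1}\sum(y_i - \bar{y})^2 = \frac{1.32}{7} = 0.1886$
$t = \frac{3.4 - 3.1}{\sqrt{\frac{0.3178}{10} + \frac{0.1886}{8}}} = \frac{0.3}{0.2353} = 1.275$
The tabulated 𝑡 value for this test is $t_{0.05,16}$ (one tail) = 1.746.
Since the computed 𝑡 value is less than the tabulated one, we fail to reject 𝐻0 and
conclude that there is insufficient evidence that the mean weight of day shift products
is greater than that of night shift products.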
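The two-sample statistic for Example 6.2 can be computed from the shift data with the formula given earlier in this chapter. The function name is illustrative, and the critical value 1.746 is the tabulated one-tail value quoted in the solution.

```python
import math
import statistics


def two_sample_t(x, y):
    """t = (x_bar - y_bar) / sqrt(s1^2/m + s2^2/n), as in the notes."""
    m, n = len(x), len(y)
    s1_sq = statistics.variance(x)  # divisor m - 1
    s2_sq = statistics.variance(y)  # divisor n - 1
    se = math.sqrt(s1_sq / m + s2_sq / n)
    return (statistics.mean(x) - statistics.mean(y)) / se


day = [3.7, 2.7, 2.6, 4.1, 3.7, 3.3, 3.3, 4.2, 2.8, 3.6]
night = [3.2, 2.3, 3.3, 2.6, 3.3, 3.4, 3.1, 3.6]

t = two_sample_t(day, night)
t_critical = 1.746  # tabulated one-tail value, alpha = 0.05, 16 d.f.
reject = t > t_critical

print(f"t = {t:.3f}, reject H0: {reject}")
```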
Exercise 6.2
A study was taken to establish whether there is a difference in the mean sales between
the male marketers and female ones. The monthly sales in KES 100,000 for the
marketers grouped by gender are shown below. Test the appropriate hypotheses using
𝛼 = 0.05 significance level.
Male 27.4 25.4 28.5 31.1 30.4 31.5 23.4 27.5 30.2 25.8 26.6 24.6
24.3 26.3 26.5
Female 27.8 31.2 29.2 25.6 26.8 32.9 28.3 30.3 25.5 28.8 28.8 26.8
Test statistic:
$t = \frac{\bar{d}}{s_d/\sqrt{n}}$
$\bar{d}$ and $s_d$ are the sample mean and standard deviation respectively of the differences $d_i$
between the first and second observations within a pair.
Example 6.3
An investor tries out a new investment strategy. The following data represents the
monthly percentage returns for one year before and after the implementation of the
strategy. Did the new strategy work? Test at α=0.05.
Month 1 2 3 4 5 6 7 8 9 10 11 12
Before 8.5 7.8 11.2 1.1 7.5 3.9 8.2 3.1 10.3 10.2 4.5 11.3
After 8.2 9.8 10.2 10.5 14.2 12.4 11.8 15.5 6.1 11.9 8.6 17.6
Solution
Null hypothesis: H0 : µ1 = µ2
Alternative hypothesis: H1 : µ1 > µ2
Where 𝜇1 and 𝜇2 are the average monthly percentage returns after and before the
implementation of the strategy respectively, so that each difference is 𝑑𝑖 = After − Before.
Month   After   Before   $d_i$    $d_i - \bar{d}$   $(d_i - \bar{d})^2$
1       8.2     8.5      -0.3     -4.4              19.36
2       9.8     7.8      2.0      -2.1              4.41
3       10.2    11.2     -1.0     -5.1              26.01
4       10.5    1.1      9.4      5.3               28.09
5       14.2    7.5      6.7      2.6               6.76
6       12.4    3.9      8.5      4.4               19.36
7       11.8    8.2      3.6      -0.5              0.25
8       15.5    3.1      12.4     8.3               68.89
9       6.1     10.3     -4.2     -8.3              68.89
10      11.9    10.2     1.7      -2.4              5.76
11      8.6     4.5      4.1      0.0               0.00
12      17.6    11.3     6.3      2.2               4.84
TOTAL                                               252.62
$\bar{d} = \frac{1}{12}\sum d_i = \frac{49.2}{12} = 4.1$
$s_d = \sqrt{\frac{1}{n-1}\sum(d_i - \bar{d})^2} = \sqrt{\frac{252.62}{11}} = \sqrt{22.9655} = 4.7922$
$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{4.1}{4.7922/\sqrt{12}} = \frac{4.1}{1.3834} = 2.9637$
The tabulated 𝑡 value for this test is $t_{0.05,11}$ (one tail) = 1.796.
Since the computed 𝑡 value is greater than the tabulated one, we reject 𝐻0 and conclude
that the strategy worked in significantly increasing returns.
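The paired calculation in Example 6.3 can be checked with a short script that forms the differences After − Before and applies the test statistic above. The function name `paired_t` is illustrative; the critical value 1.796 is the tabulated one quoted in the solution.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic d_bar / (s_d / sqrt(n)) with d_i = first_i - second_i."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # divisor n - 1
    return d_bar / (s_d / math.sqrt(n))


before = [8.5, 7.8, 11.2, 1.1, 7.5, 3.9, 8.2, 3.1, 10.3, 10.2, 4.5, 11.3]
after = [8.2, 9.8, 10.2, 10.5, 14.2, 12.4, 11.8, 15.5, 6.1, 11.9, 8.6, 17.6]

t = paired_t(after, before)
t_critical = 1.796  # tabulated one-tail value, alpha = 0.05, 11 d.f.
reject = t > t_critical

print(f"t = {t:.4f}, reject H0: {reject}")
```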
Exercise 6.3
Compare the prices of 15 household goods in two supermarkets
A B
109 101
128 137
63 62
71 65
136 138
100 91
136 144
73 80
85 81
81 78
77 70
94 101
63 58
121 114
85 83
CHAPTER SEVEN
Analysis of Variance, popularly known as the ANOVA, is used to compare the means
in cases where there are more than two groups.
When we have only two samples we can use the t-test to compare the means of the
samples but it might become unreliable in case of more than two samples. If we only
compare two means, then the t-test (independent samples) will give the same results as
the ANOVA.
Let $y_{ij}$ denote the response for the $j$-th experimental unit in the $i$-th sample, and let $y_{i.}$ and
$\bar{y}_{i.}$ represent the total and mean of the $n_i$ responses in the $i$-th sample.
The overall mean is $\bar{y} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}$, where $n = \sum_{i=1}^{k} n_i$.
$H_0$ is rejected if $F = \frac{MST}{MSE} > F_{k-1,\,n-k,\,\alpha}$
Example 7
A microfinance institution has four main plans for recruiting customers. The data below show the
number of customers recruited under each of these plans by 23 assistants in six months. Do the
plans differ in mean achievement?
Plan             I       II      III     IV
                 59      65      75      94
                 78      87      69      89
                 67      73      83      80
                 62      79      81      88
                 83      81      72
                 76      69      79
                                 90
$y_{i.}$         425     454     549     351
$n_i$            6       6       7       4
$\bar{y}_{i.}$   70.83   75.67   78.43   87.75
Solution
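$k = 4$, $n = 23$, $\bar{y} = 1779/23 = 77.35$
$SSG = \sum_{i=1}^{k} n_i(\bar{y}_{i.} - \bar{y})^2 = 17.002 + 8.143 + 254.802 + 432.640 = 712.587$
Total SS $= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2 = 1909.2175$ (from the table of squared deviations below)
$SSE$ = Total SS $- SSG = 1909.2175 - 712.587 = 1196.6305$
$MSG = SSG/(k-1) = 712.587/3 = 237.529$
$MSE = SSE/(n-k) = 1196.6305/19 = 62.981$
$F = MSG/MSE = 237.529/62.981 = 3.771$
Here MSG is the between-groups mean square, written MST in the rejection rule. The tabulated value is $F_{3,19,0.05} = 3.13$.
Since the computed F value is greater than the tabulated one, we reject 𝐻0 and conclude
that there is a significant difference in mean achievement for the four plans.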
$i$     $j$     $y_{ij}$   $(y_{ij} - \bar{y})$   $(y_{ij} - \bar{y})^2$
1       1       59         -18.35                 336.7225
1       2       78         0.65                   0.4225
1       3       67         -10.35                 107.1225
1       4       62         -15.35                 235.6225
1       5       83         5.65                   31.9225
1       6       76         -1.35                  1.8225
2       1       65         -12.35                 152.5225
2       2       87         9.65                   93.1225
2       3       73         -4.35                  18.9225
2       4       79         1.65                   2.7225
2       5       81         3.65                   13.3225
2       6       69         -8.35                  69.7225
3       1       75         -2.35                  5.5225
3       2       69         -8.35                  69.7225
3       3       83         5.65                   31.9225
3       4       81         3.65                   13.3225
3       5       72         -5.35                  28.6225
3       6       79         1.65                   2.7225
3       7       90         12.65                  160.0225
4       1       94         16.65                  277.2225
4       2       89         11.65                  135.7225
4       3       80         2.65                   7.0225
4       4       88         10.65                  113.4225
TOTAL                                             1909.2175
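The F statistic for Example 7 can be recomputed directly from the four groups. In this sketch the error sum of squares is obtained within groups rather than by subtraction, which gives the same value; `one_way_anova` is an illustrative name, and the result still has to be compared with a tabulated F value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: sum of n_i * (group mean - grand mean)^2.
    ssg = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-groups (error) sum of squares.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg / mse, k - 1, n - k


plans = [
    [59, 78, 67, 62, 83, 76],      # Plan I
    [65, 87, 73, 79, 81, 69],      # Plan II
    [75, 69, 83, 81, 72, 79, 90],  # Plan III
    [94, 89, 80, 88],              # Plan IV
]

F, df_between, df_within = one_way_anova(plans)
print(f"F = {F:.3f} on ({df_between}, {df_within}) degrees of freedom")
```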
Exercise 7
A local bank has four branch offices. The bank has a liberal sick leave policy, and a
vice-president was concerned about employees taking advantage of this policy. She
thought that the tendency to take advantage depended on the branch at which the
employee worked. To see whether there were differences in the time employees took for
sick leave, she asked each branch manager to sample employees randomly and record
the number of days of sick leave taken during 2015. Twenty employees were chosen,
and the data are listed below:
Branch
A B C D
13 13 11 13
13 15 12 7
12 15 12 12
16 17 14 8
25 25 22 10
Does the data indicate a difference in branches? Use a level of significance of 0.05.
CHAPTER EIGHT
8.1 Introduction
A great deal of the data collected by scientists, medical statisticians and economists is in
the form of counts (whole numbers or integers). The numbers of individuals that died,
the number of firms going bankrupt, the number of days of frost, the number of red
blood cells on a microscope slide, or the number of craters in a sector of lunar landscape
are all potentially interesting variables for study.
ii. There is a single population of interest, with each individual in the population
categorized with respect to two different factors. There are 𝐼 categories associated
with the first factor, and 𝐽 categories associated with the second factor. A single
sample is taken, and the number of individuals belonging in both category 𝑖 of
factor 1 and category 𝑗 of factor 2 is entered in the cell in row 𝑖, column 𝑗 (𝑖 =
1, … , 𝐼 ; 𝑗 = 1, … , 𝐽). As an example, customers making a purchase might be
classified according to department in which the purchase was made, with 𝐼 = 6
departments, and according to method of payment, with 𝐽 = 5 as in (i) above.
Let 𝑛𝑖𝑗 denote the number of individuals in the sample falling in the (𝑖, 𝑗)th cell of the table.
The table displaying the 𝑛𝑖𝑗 ′𝑠 is called a two-way contingency table; a prototype is shown
below:
1 2 … j … J
1 𝑛11 𝑛12 … 𝑛1𝑗 … 𝑛1𝐽
2 𝑛21 𝑛22 … 𝑛2𝑗 … 𝑛2𝐽
. . . . . . .
. . . . . . .
. . . . . . .
i 𝑛𝑖1 𝑛𝑖2 … 𝑛𝑖𝑗 … 𝑛𝑖𝐽
. . . . . . .
. . . . . . .
. . . . . . .
𝐻0 is rejected if $\chi^2 > \chi^2_{\alpha,(I-1)(J-1)}$.
Example 8
Suppose you want to determine if certain types of products sell better in certain
geographic locations than others. Consider the accompanying data of number of sales of
three products in three regions. Test the hypothesis of independence between type of
product and region
Product
Region I II III Total
A 31 14 45 90
B 22 15 37 74
C 33 35 18 86
Total 86 64 100 250
Solution
We could now set up the following table:
Thus, we would reject the null hypothesis that there is no relationship between type of
product and region. Our data tell us there is a statistically significant relationship
between type of product and region.
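The test of independence for the sales table can be sketched as follows. This is an illustration under stated assumptions: expected counts are taken as (row total × column total) / grand total, the statistic is the usual sum of (observed − expected)² / expected, and the constant 9.488 is the tabulated chi-square value for α = 0.05 with 4 degrees of freedom. The function and variable names are hypothetical.

```python
# Observed sales counts: rows are regions A, B, C; columns are products I, II, III.
observed = [
    [31, 14, 45],
    [22, 15, 37],
    [33, 35, 18],
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of the two factors.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
CRITICAL_0_05_DF4 = 9.488  # tabulated value, alpha = 0.05, 4 degrees of freedom
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 > CRITICAL_0_05_DF4}")
```

The computed statistic lies well above the critical value, which agrees with the decision to reject independence stated in the solution.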
Exercise 8
A company packages a particular product in cans of three different sizes, each one
using a different production line. Most cans conform to specifications, but a quality
control engineer has identified the following reasons for non-conformance: (1) blemish
on can; (2) crack in can; (3) improper pull tab location; (4) pull tab missing; (5) other. A
sample of nonconforming units is selected from each of the three lines, and each unit is
categorized according to reason for nonconformity, resulting in the following
contingency table data:
Does the data suggest that the proportions falling in the various non-conformance
categories are not the same for the three lines?
CHAPTER NINE
9.1 Introduction
Regression analysis involves identifying the relationship between a dependent variable
and one or more independent variables. A model of the relationship is hypothesized,
and estimates of the parameter values are used to develop an estimated regression
equation. Various tests are then employed to determine if the model is satisfactory. If
the model is deemed satisfactory, the estimated regression equation can be used to
predict the value of the dependent variable given values for the independent variables.
The correlation is a measure of linear association between two variables. Values of the
correlation coefficient are always between -1 and +1. A correlation coefficient of +1
indicates that two variables are perfectly related in a positive linear sense; a correlation
coefficient of -1 indicates that two variables are perfectly related in a negative linear
sense, and a correlation coefficient of 0 indicates that there is no linear relationship
between the two variables.
The quantity r, called the Pearson product moment correlation coefficient, measures the
strength and the direction of a sample linear relationship between two variables. The
formula for computing r is:
𝑏0 = 𝑦̅ − 𝑏1 𝑥̅
Example 9.1
A study was made on the profitability of certain small ventures depending on amount
invested. The data were recorded as follows in KES 10000;
Invested amount, x 2.1 1.6 1.9 1.7 1.4 1.2 1.3 1.1 2.3 1.4
Profit, y 10.6 7.7 8.6 7.6 7.8 5.9 7.2 5.4 9.6 5.6
a) Determine the regression equation.
b) Compute the Pearson product moment correlation coefficient.
Solution
a) The regression equation
The least squares regression line is given by
𝑦̂ = 𝑏0 + 𝑏1 𝑥 where
𝑏0 = 𝑦̅ − 𝑏1 𝑥̅
𝑥 𝑦 𝑥𝑦 𝑥² 𝑦²
2.1 10.6 22.26 4.41 112.36
1.6 7.7 12.32 2.56 59.29
1.9 8.6 16.34 3.61 73.96
1.7 7.6 12.92 2.89 57.76
1.4 7.8 10.92 1.96 60.84
1.2 5.9 7.08 1.44 34.81
1.3 7.2 9.36 1.69 51.84
1.1 5.4 5.94 1.21 29.16
2.3 9.6 22.08 5.29 92.16
1.4 5.6 7.84 1.96 31.36
Total 16 76 127.06 27.02 603.54
Thus
𝑦̂ = 1.448 + 3.845𝑥
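b) The Pearson product moment correlation coefficient, 𝑟
$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$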
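Both parts of Example 9.1 can be reproduced from the column totals. The sketch below assumes the usual least-squares slope, b1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), which is not shown in the notes as extracted, together with b0 = ȳ − b1·x̄ and the formula for r given above. The function name `fit_line_and_r` is illustrative.

```python
import math

# Invested amount x and profit y from Example 9.1 (KES 10000).
x = [2.1, 1.6, 1.9, 1.7, 1.4, 1.2, 1.3, 1.1, 2.3, 1.4]
y = [10.6, 7.7, 8.6, 7.6, 7.8, 5.9, 7.2, 5.4, 9.6, 5.6]


def fit_line_and_r(x, y):
    """Return (b0, b1, r) from the raw sums used in the worked table."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    # Assumed least-squares slope; the intercept follows b0 = y_bar - b1 * x_bar.
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    # Pearson product moment correlation coefficient.
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b0, b1, r


b0, b1, r = fit_line_and_r(x, y)
print(f"y_hat = {b0:.3f} + {b1:.3f} x, r = {r:.4f}")
```

The fitted coefficients agree with the regression equation in part (a) to three decimal places.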
Exercise 9.1
The following are heights in cm and weights in kg of 10 men
Height 162 168 174 176 180 180 182 184 186 186
Weight 65 65 84 63 75 76 82 65 80 81
a) Draw the scatter diagram for the data
b) Find the regression equation
c) Compute the Pearson’s correlation coefficient
This is obtained by solving the normal equations. Consider the case where 𝑝 = 2. The
normal equations are:
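$$n b_0 + b_1\sum_{i=1}^{n} x_{i1} + b_2\sum_{i=1}^{n} x_{i2} = \sum_{i=1}^{n} y_i$$
$$b_0\sum_{i=1}^{n} x_{i1} + b_1\sum_{i=1}^{n} x_{i1}^2 + b_2\sum_{i=1}^{n} x_{i1}x_{i2} = \sum_{i=1}^{n} x_{i1}y_i$$
$$b_0\sum_{i=1}^{n} x_{i2} + b_1\sum_{i=1}^{n} x_{i1}x_{i2} + b_2\sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2}y_i$$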
The ANOVA table for the multiple linear regression model is:
Source of variation   Degrees of freedom   Sum of squares        Mean square              F
Regression            𝑝                    𝑆𝑆𝑅 = ∑(𝑦̂𝑖 − 𝑦̅)²     𝑀𝑆𝑅 = 𝑆𝑆𝑅/𝑝             𝐹 = 𝑀𝑆𝑅/𝑀𝑆𝐸
Error                 𝑛 − 𝑝 − 1            𝑆𝑆𝐸 = 𝑆𝑆𝑇 − 𝑆𝑆𝑅      𝑀𝑆𝐸 = 𝑆𝑆𝐸/(𝑛 − 𝑝 − 1)
Total                 𝑛 − 1                𝑆𝑆𝑇 = ∑(𝑦𝑖 − 𝑦̅)²
a) Overall test. Taken collectively, does the entire set of explanatory or independent
variables contribute significantly to the prediction of response?
The null hypothesis for this test may be stated as: ‘‘All 𝑝 independent variables
considered together do not explain the variation in the responses.’’ In other words,
𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝑝 = 0
The 𝐹 statistic can be used to test this global null hypothesis. 𝐻0 is rejected if the
computed 𝐹 statistic is greater than the tabulated one, $F_{\alpha,(p),(n-p-1)}$.
The R-squared (𝑅²) statistic provides a measure of how well the model fits the
actual data. It gives the proportion of the variance in the dependent variable that is
predictable from the independent variables. It is given by
$$R^2 = \frac{SSR}{SST}$$
In multiple regression settings, 𝑅² never decreases as more variables are
included in the model, even when they add no real explanatory value. That is why the
adjusted 𝑅² is the preferred measure: it adjusts for the number of variables considered.
It is defined as
$$\text{Adjusted } R^2 = R^2 - \frac{(1 - R^2)\,p}{n - p - 1}$$
b) Test for the value of a single factor. Does the addition of one particular variable of
interest add significantly to the prediction of response over and above that
achieved by other independent variables?
The null hypothesis for this test may be stated as:
‘‘Factor 𝑋𝑗 does not have any value added to the prediction of the response given that
other factors are already included in the model.’’ In other words,
𝐻0 : 𝛽𝑗 = 0
This can be tested using
$$t = \frac{b_j}{SE(b_j)}$$
where 𝑏𝑗 is the corresponding estimated regression coefficient and 𝑆𝐸(𝑏𝑗) is the estimate
of the standard error of 𝑏𝑗, given by
$$SE(b_j) = \sqrt{\frac{MSE}{\sum_{i=1}^{n}(x_{ji} - \bar{x}_j)^2}} = \sqrt{\frac{MSE}{\sum_{i=1}^{n} x_{ji}^2 - n\bar{x}_j^2}}$$
Example 9.2
Let 𝑦 be the sales at a fast-food outlet (KES 1000), 𝑥1 be the population within a
2-kilometre radius (1000’s of people) and 𝑥2 be the number (in hundreds) of competing
outlets within a 2-kilometre radius.
Fit a multiple linear regression model and test the significance of both the fitted model
and the two independent variables.
Solution
𝒚 𝒙𝟏 𝒙𝟐 𝒙𝟏² 𝒙𝟏𝒙𝟐 𝒙𝟐² 𝒙𝟏𝒚 𝒙𝟐𝒚
101 81.7 19.9 6674.89 1625.83 396.01 8251.7 2009.9
142 103.8 18.7 10774.44 1941.06 349.69 14739.6 2655.4
117 96.5 26.1 9312.25 2518.65 681.21 11290.5 3053.7
104 95.2 24.5 9063.04 2332.4 600.25 9900.8 2548.0
109 92.9 21.6 8630.41 2006.64 466.56 10126.1 2354.4
132 99.1 23.3 9820.81 2309.03 542.89 13081.2 3075.6
107 85.4 28.2 7293.16 2408.28 795.24 9137.8 3017.4
118 90.5 21.4 8190.25 1936.7 457.96 10679.0 2525.2
103 95.6 25.5 9139.36 2437.8 650.25 9846.8 2626.5
120 83.4 19.9 6955.56 1659.66 396.01 10008.0 2388.0
131 106.7 21.6 11384.89 2304.72 466.56 13977.7 2829.6
123 92.4 22.9 8537.76 2115.96 524.41 11365.2 2816.7
Total 1407 1123.2 273.6 105776.8 25596.73 6327.04 132404.4 31900.4
4681.478𝑏2 = −8745.75
𝑏2 = −1.868
Substituting this in (v)
𝑏1 = 1.065
From (i)
𝑏0 = 60.156
$$R^2 = \frac{SSR}{SST} = \frac{1090.996}{1876.250} = 0.5815$$
The model accounts for 58.15% of the variation in the response variable, which is fairly adequate.
$$\text{Adjusted } R^2 = R^2 - \frac{(1 - R^2)\,p}{n - p - 1} = 0.5815 - \frac{(1 - 0.5815) \times 2}{12 - 2 - 1} = 0.4885$$
𝒙𝟏 𝒙𝟐 (𝒙𝟏 − 𝒙̅𝟏)² (𝒙𝟐 − 𝒙̅𝟐)²
81.7 19.9 141.61 8.41
103.8 18.7 104.04 16.81
96.5 26.1 8.41 10.89
95.2 24.5 2.56 2.89
92.9 21.6 0.49 1.44
99.1 23.3 30.25 0.25
85.4 28.2 67.24 29.16
90.5 21.4 9.61 1.96
95.6 25.5 4.00 7.29
83.4 19.9 104.04 8.41
106.7 21.6 171.61 1.44
92.4 22.9 1.44 0.01
Total 645.3 88.96
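$$t = \frac{b_j}{SE(b_j)}; \qquad SE(b_j) = \sqrt{\frac{MSE}{\sum_{i=1}^{n}(x_{ji} - \bar{x}_j)^2}}$$
$$SE(b_1) = \sqrt{\frac{87.25}{645.3}} = 0.3677; \qquad SE(b_2) = \sqrt{\frac{87.25}{88.96}} = 0.9903$$
$$t_1 = \frac{1.065}{0.3677} = 2.8964; \qquad t_2 = \frac{-1.868}{0.9903} = -1.8863$$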
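The whole of Example 9.2 can be checked with a short script. This is a sketch under stated assumptions: the normal equations are solved in centred form (deviations from the means), which is algebraically equivalent to the raw-sum form, and the standard errors follow the formula used in these notes, which ignores any correlation between the two predictors. All names are illustrative. Because the script carries full precision, small differences from the hand-rounded figures are to be expected.

```python
import math

y = [101, 142, 117, 104, 109, 132, 107, 118, 103, 120, 131, 123]
x1 = [81.7, 103.8, 96.5, 95.2, 92.9, 99.1, 85.4, 90.5, 95.6, 83.4, 106.7, 92.4]
x2 = [19.9, 18.7, 26.1, 24.5, 21.6, 23.3, 28.2, 21.4, 25.5, 19.9, 21.6, 22.9]


def mean(v):
    return sum(v) / len(v)


def centred_cross(u, v):
    """Sum of (u_i - u_bar) * (v_i - v_bar)."""
    ub, vb = mean(u), mean(v)
    return sum((a - ub) * (b - vb) for a, b in zip(u, v))


n, p = len(y), 2
s11, s22, s12 = centred_cross(x1, x1), centred_cross(x2, x2), centred_cross(x1, x2)
s1y, s2y, sst = centred_cross(x1, y), centred_cross(x2, y), centred_cross(y, y)

# Solve the two centred normal equations by Cramer's rule, then recover b0.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# ANOVA quantities, R-squared and adjusted R-squared.
ssr = b1 * s1y + b2 * s2y
mse = (sst - ssr) / (n - p - 1)
f_stat = (ssr / p) / mse
r2 = ssr / sst
adj_r2 = r2 - (1 - r2) * p / (n - p - 1)

# t statistics with the standard errors as defined in these notes.
t1 = b1 / math.sqrt(mse / s11)
t2 = b2 / math.sqrt(mse / s22)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, F = {f_stat:.3f}")
print(f"t1 = {t1:.3f}, t2 = {t2:.3f}")
```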
Exercise 9.2
For the data given below, fit a multiple linear regression model and test the significance
of both the fitted model and the two independent variables.
𝑥1 𝑥2 𝑦
5.3 77.4 50.5
5.4 11.1 24.8
5.6 32.1 31.6
2.5 25.1 27.8
3.4 22.1 22.1
2.6 35.1 28.5
4.4 50.3 41.1
2.1 52.1 28.9
3.6 40.9 36.3
5.1 78.8 43.4
CHAPTER TEN
10.1 Introduction
Nonparametric tests are used in situations where the data come from a probability
distribution whose underlying form is not specified. That is, it will not be assumed that
the underlying distribution is normal, or exponential, or any other given type.
Because no particular parametric form for the underlying distribution is assumed, such
tests are called nonparametric.
The strength of a nonparametric test resides in the fact that it can be applied without
any assumption on the form of the underlying distribution.
Parametric procedures are all sensitive to extreme observations, a few very small or
very large—perhaps erroneous—data values.
The results of these nonparametric tests are much less affected by extreme observations.
If the null hypothesis is true, that is, 𝑚 = 𝑚0 , then N− and N+ both follow a binomial
distribution with parameters n and p = ½.
In that case, we should reject the null hypothesis if n− , the observed number of negative
signs, is too small, or alternatively, if the P-value as defined by:
𝑃 = 𝑃(N− ≤ n− )
is small, that is, less than or equal to α.
In that case, we should reject the null hypothesis if n+ , the observed number of positive
signs, is too small, or alternatively, if the P-value as defined by:
𝑃 = 𝑃(N+ ≤ n+ )
is small, that is, less than or equal to α.
Example 10.1
Recall Example 6.1
A manufacturer claims that the median weight of a certain product is 3.3 kg. A random
sample of ten such products gave the following results
2.6 2.2 2.9 3.4 3.4 3.7 1.7 2.7 3.3 2.3
Test at 𝛼 = 0.05 the hypothesis that the median weight of the sampled products differs
from the claimed figure.
Solution
The test of hypothesis is:
𝐻0 : 𝑚 = 3.3
𝐻1 : 𝑚 ≠ 3.3
𝑥𝑖 𝑥𝑖 − 𝑚0 Sign 𝑥𝑖 𝑥𝑖 − 𝑚0 Sign
2.6 -0.7 - 3.7 0.4 +
2.2 -1.1 - 1.7 -1.6 -
2.9 -0.4 - 2.7 -0.6 -
3.4 0.1 + 3.3 0.0
3.4 0.1 + 2.3 -1.0 -
n− = 6, n+ = 3, min(n− , n+ ) = 3
The p value is
Since the p value is greater than 0.05, we fail to reject the null hypothesis. There is
insufficient evidence that the median weight differs from the claimed figure of 3.3 kg.
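The sign-test p value for this example can be computed directly from the binomial distribution. The sketch below makes two assumptions that the notes as extracted do not state explicitly: observations equal to 𝑚0 are dropped, and the two-sided p value is taken as twice the smaller tail probability (capped at 1). The function name is illustrative.

```python
from math import comb

weights = [2.6, 2.2, 2.9, 3.4, 3.4, 3.7, 1.7, 2.7, 3.3, 2.3]
m0 = 3.3


def sign_test_two_sided(data, m0):
    """Return (n_minus, n_plus, two-sided p value) for the sign test."""
    n_plus = sum(1 for v in data if v > m0)
    n_minus = sum(1 for v in data if v < m0)
    n = n_plus + n_minus  # observations equal to m0 are dropped (assumption)
    k = min(n_plus, n_minus)
    # P(N <= k) for N ~ Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_minus, n_plus, min(1.0, 2 * tail)


n_minus, n_plus, p_value = sign_test_two_sided(weights, m0)
print(f"n- = {n_minus}, n+ = {n_plus}, p = {p_value:.4f}")
```

Under these assumptions the p value is far above 0.05, in line with the decision not to reject the null hypothesis.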
Exercise 10.1
According to a recent report on accident claims by an insurance company, the average
number of accident claims reported per branch is 3 per day. A random sample of twenty
branches on a given day yielded the following data of claim incidents
3 1 2 1 2 3 5 2 5 1
3 2 3 1 3 3 4 3 1 8
Test at 𝛼 = 0.05 the hypothesis that the median number of claim incidents in these
branches is less than the one in the report.
But unlike the t test, this test does not assume that the underlying populations are
normally distributed and is less affected by extreme observations.
The Wilcoxon rank-sum test evaluates the null hypothesis that the medians of the two
populations are identical.
Let n1 and n2 be the two sample sizes and R be the sum of the ranks from the sample
with size n1.
Under the null hypothesis that the two underlying populations have identical medians,
we would expect the averages of ranks to be approximately equal.
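We test the null hypothesis using the standardized test statistic
$$z = \frac{R - \mu_R}{\sigma_R}$$
where
$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2}$$
is the mean and
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$
is the standard deviation of R. Either rank sum may be used: switching samples only
changes the sign of 𝑧.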
For relatively large values of n1 and n2 the sampling distribution of this statistic is
approximately standard normal.
Thus the p-value for this statistic is 𝑃(𝑍 ≥ 𝑧), 𝑃(𝑍 ≤ 𝑧) and 2𝑃(𝑍 ≥ |𝑧|) for
the upper-tailed, lower-tailed and two-tailed tests respectively.
Example 10.2
Recall Example 6.2
As an extension to Example 6.1, suppose the management of the company suspects that
there is a difference in weight of products between those produced during day shift and
those produced during night shift. A random sample of 18 products gave the following
results
Day 3.7 2.7 2.6 4.1 3.7 3.3 3.3 4.2 2.8 3.6
Night 3.2 2.3 3.3 2.6 3.3 3.4 3.1 3.6
Use a nonparametric test at 𝛼 = 0.05 to determine whether the median weight of day shift
products is greater than that of night shift products in the company.
Solution
Day Night
Weight Rank Weight Rank
3.7 15.5 3.2 7
2.7 4 2.3 1
2.6 2.5 3.3 9.5
4.1 17 2.6 2.5
3.7 15.5 3.3 9.5
3.3 9.5 3.4 12
3.3 9.5 3.1 6
4.2 18 3.6 13.5
2.8 5
3.6 13.5
Total 110 61
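$$\mu_R = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{10(10 + 8 + 1)}{2} = \frac{190}{2} = 95$$
$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{10 \times 8 \times 19}{12}} = 11.2546$$
$$z = \frac{R - \mu_R}{\sigma_R} = \frac{110 - 95}{11.2546} = 1.3328$$
The upper tail p-value for this statistic is 𝑃(𝑍 ≥ 1.3328) = 0.0913. Since this exceeds 0.05,
we fail to reject 𝐻0. There is insufficient evidence that the median weight for the day
shift is greater than that for the night shift.
Using the night shift rank sum instead, with mean 8(10 + 8 + 1)/2 = 76, gives the same
result with the sign reversed:
$$z = \frac{R - \mu_R}{\sigma_R} = \frac{61 - 76}{11.2546} = -1.3328$$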
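The rank-sum calculation for the day and night shift data can be sketched as follows. It assumes average ranks for ties and the normal approximation described above; `statistics.NormalDist` from the Python standard library supplies the normal tail probability. The helper names are illustrative.

```python
import math
from statistics import NormalDist

day = [3.7, 2.7, 2.6, 4.1, 3.7, 3.3, 3.3, 4.2, 2.8, 3.6]
night = [3.2, 2.3, 3.3, 2.6, 3.3, 3.4, 3.1, 3.6]


def average_ranks(values):
    """Map each distinct value to its average rank in the pooled sample."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1, ..., j
        i = j
    return ranks


def rank_sum_z(sample1, sample2):
    """Return (rank sum of sample1, standardized z) for the rank-sum test."""
    ranks = average_ranks(sample1 + sample2)
    n1, n2 = len(sample1), len(sample2)
    r = sum(ranks[v] for v in sample1)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r, (r - mu) / sigma


r_day, z = rank_sum_z(day, night)
p_upper = 1 - NormalDist().cdf(z)  # upper-tailed p value
print(f"R = {r_day}, z = {z:.4f}, p = {p_upper:.4f}")
```

Calling `rank_sum_z(night, day)` instead returns the night shift rank sum and the same z with the opposite sign.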
Exercise 10.2
Recall Exercise 6.2.
A study was taken to establish whether there is a difference in the mean sales between
the male marketers and female ones. The monthly sales in KES 100,000 for the
marketers grouped by gender are shown below. Test the appropriate non parametric
hypotheses using α = 0.05 significance level.
Male 27.4 25.4 28.5 31.1 30.4 31.5 23.4 27.5 30.2 25.8 26.6 24.6
24.3 26.3 26.5
Female 27.8 31.2 29.2 25.6 26.8 32.9 28.3 30.3 25.5 28.8 28.8 26.8
As with the paired t test, we begin by forming differences. Then the absolute values of
the differences are assigned ranks; if there are ties in the differences, the average of the
appropriate ranks is assigned.
This is achieved by multiplying each rank by +1, -1, or 0 as the corresponding difference
is positive, negative, or zero. The results are n signed ranks, one for each pair of
observations; for example, if the difference is zero, its signed rank is zero.
The basic idea is that if the typical difference is positive, most differences will be
positive and larger in magnitude than the few negative ones. Most of the ranks,
especially the larger ones, will then carry a positive sign, so the sum of the positive
signed ranks will be large.
We can base the test on the sum R of the positive signed ranks. We test the null
hypothesis of no difference by calculating the standardized test statistic:
$$z = \frac{R - \mu_R}{\sigma_R}$$
where
$$\mu_R = \frac{n(n + 1)}{4}$$
is the mean and
$$\sigma_R = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$$
is the standard deviation of R.
Thus the p-value for this statistic is 𝑃(𝑍 ≥ 𝑧), 𝑃(𝑍 ≤ 𝑧) and 2𝑃(𝑍 ≥ |𝑧|) for
the upper-tailed, lower-tailed and two-tailed tests respectively.
Example 10.3
Recall Example 6.3.
An investor tries out a new investment strategy. The following data represents the
monthly percentage returns for one year before and after the implementation of the
strategy. Did the new strategy work? Test at α=0.05. Use the nonparametric method.
Month 1 2 3 4 5 6 7 8 9 10 11 12
Before 8.5 7.8 11.2 1.1 7.5 3.9 8.2 3.1 10.3 10.2 4.5 11.3
After 8.2 9.8 10.2 10.5 14.2 12.4 11.8 15.5 6.1 11.9 8.6 17.6
Solution
Month After Before 𝑑𝑖 |𝑑𝑖 | Rank Signed Rank
1 8.2 8.5 -0.3 0.3 1 -1
2 9.8 7.8 2 2 4 4
3 10.2 11.2 -1 1 2 -2
4 10.5 1.1 9.4 9.4 11 11
5 14.2 7.5 6.7 6.7 9 9
6 12.4 3.9 8.5 8.5 10 10
7 11.8 8.2 3.6 3.6 5 5
8 15.5 3.1 12.4 12.4 12 12
9 6.1 10.3 -4.2 4.2 7 -7
10 11.9 10.2 1.7 1.7 3 3
11 8.6 4.5 4.1 4.1 6 6
12 17.6 11.3 6.3 6.3 8 8
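R = 4 + 11 + 9 + 10 + 5 + 12 + 3 + 6 + 8 = 68
$$\mu_R = \frac{n(n + 1)}{4} = \frac{12 \times 13}{4} = 39$$
$$\sigma_R = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{\frac{12 \times 13 \times 25}{24}} = 12.7475$$
$$z = \frac{R - \mu_R}{\sigma_R} = \frac{68 - 39}{12.7475} = \frac{29}{12.7475} = 2.275$$
The upper tail p-value for this statistic is 𝑃(𝑍 ≥ 2.275) = 0.0115. We therefore reject 𝐻0
and conclude that the median return increased significantly.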
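The signed-rank calculation can be sketched as below. It takes the After and Before columns as labelled in the solution table, forms differences After − Before, and uses the normal approximation given earlier. Rounding the differences to one decimal place is an assumption that suits this data set (it removes floating-point noise before ranking); the names are illustrative.

```python
import math
from statistics import NormalDist

# Columns as labelled in the solution table.
after = [8.2, 9.8, 10.2, 10.5, 14.2, 12.4, 11.8, 15.5, 6.1, 11.9, 8.6, 17.6]
before = [8.5, 7.8, 11.2, 1.1, 7.5, 3.9, 8.2, 3.1, 10.3, 10.2, 4.5, 11.3]


def signed_rank_z(first, second):
    """Return (sum of positive signed ranks, standardized z)."""
    # One-decimal rounding matches the precision of this data set (assumption).
    diffs = [round(a - b, 1) for a, b in zip(first, second)]
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(v):
        # Average rank of v among the absolute differences (handles ties).
        return abs_sorted.index(v) + 1 + (abs_sorted.count(v) - 1) / 2

    r_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    n = len(diffs)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return r_plus, (r_plus - mu) / sigma


r_plus, z = signed_rank_z(after, before)
p_upper = 1 - NormalDist().cdf(z)  # upper-tailed p value
print(f"R = {r_plus}, z = {z:.3f}, p = {p_upper:.4f}")
```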
Exercise 10.3
Recall Exercise 6.3.
10.5 The Kruskal-Wallis Test
The Kruskal–Wallis one-way analysis of variance is a direct generalization of the
Wilcoxon Rank-Sum test to the case in which we have three or more independent groups.
To perform the Kruskal–Wallis test, we simply rank all scores without regard to group
membership.
The test statistic then is:
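$$K = (n - 1)\,\frac{\sum_{i=1}^{k} n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2}$$
where
𝑛𝑖 is the number of observations in the i-th group,
𝑟𝑖𝑗 is the rank among all observations of observation j in the i-th group,
$\bar{r}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} r_{ij}$ is the average rank in the i-th group, and
$\bar{r} = \frac{n + 1}{2}$ is the average of all 𝑟𝑖𝑗.
The p-value is approximated by $P(\chi^2_{k-1} \ge K)$.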
Example 10.4
Recall the Example 7.
A microfinance institution has four main plans for recruiting customers. The data below show the
number of customers recruited under each of these plans by 23 assistants in six months. Do the
plans differ in mean achievement? Use the nonparametric approach.
Plan
I     II    III   IV
59    65    75    94
78    87    69    89
67    73    83    80
62    79    81    88
83    81    72
76    69    79
            90
Solution
We have:
$$\bar{r} = \frac{276}{23} = 12$$
$$K = (n - 1)\,\frac{\sum_{i=1}^{k} n_i(\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(r_{ij} - \bar{r})^2} = 7.7909$$
The p-value for this statistic is $P(\chi^2_3 \ge 7.7909) = 0.0505$. Since the p-value is greater
than 0.05, we fail to reject 𝐻0.
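The Kruskal-Wallis statistic for the four plans can be reproduced with the sketch below. It assumes average ranks for tied scores and applies the formula for K given in this section. The helper `chi2_sf_3df` is a closed-form upper-tail probability that is valid only for 3 degrees of freedom; it is used here to avoid a dependency on an external library and is not a general chi-square routine. All names are illustrative.

```python
import math

plans = [
    [59, 78, 67, 62, 83, 76],      # Plan I
    [65, 87, 73, 79, 81, 69],      # Plan II
    [75, 69, 83, 81, 72, 79, 90],  # Plan III
    [94, 89, 80, 88],              # Plan IV
]


def kruskal_wallis(groups):
    """Return K computed from average ranks of the pooled observations."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    def avg_rank(v):
        # Average rank of v in the pooled sample (tied values share a rank).
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    r_bar = (n + 1) / 2
    between = sum(
        len(g) * (sum(avg_rank(v) for v in g) / len(g) - r_bar) ** 2 for g in groups
    )
    total = sum((avg_rank(v) - r_bar) ** 2 for g in groups for v in g)
    return (n - 1) * between / total


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


K = kruskal_wallis(plans)
p_value = chi2_sf_3df(K)
print(f"K = {K:.4f}, p = {p_value:.4f}")
```

Full-precision arithmetic can differ from the hand calculation in the later decimal places of K, but the p value stays just above 0.05.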
Exercise 10.4
Recall Exercise 7.
A local bank has four branch offices. The bank has a liberal sick leave policy, and a
vice-president was concerned about employees taking advantage of this policy. She
thought that the tendency to take advantage depended on the branch at which the
employee worked. To see whether there were differences in the time employees took for
sick leave, she asked each branch manager to sample employees randomly and record
the number of days of sick leave taken during 2015. Twenty employees were chosen,
and the data are listed below:
Branch
A B C D
13 13 11 13
13 15 12 7
12 15 12 12
16 17 14 8
25 25 22 10
Does the data indicate a difference in branches? Use a level of significance of 0.05.
Conduct a nonparametric test of hypothesis.
For a sample of size 𝑛, the 𝑛 raw scores 𝑥𝑖 , 𝑦𝑖 are converted to ranks 𝑟(𝑥𝑖 ), 𝑟(𝑦𝑖 ), and 𝑟𝑠 is
computed from:
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where $d_i = r(x_i) - r(y_i)$ is the difference between ranks.
Example 10.5
Recall Example 9.1. A study was made on the profitability of certain small ventures
depending on amount invested. The data were recorded as follows in KES 10000;
Invested amount, x 2.1 1.6 1.9 1.7 1.4 1.2 1.3 1.1 2.3 1.4
Profit, y 10.6 7.7 8.6 7.6 7.8 5.9 7.2 5.4 9.6 5.6
Solution
𝑥𝑖 𝑟(𝑥𝑖 ) 𝑦𝑖 𝑟(𝑦𝑖 ) 𝑑𝑖 𝑑𝑖2
2.1 2 10.6 1 1 1
1.6 5 7.7 5 0 0
1.9 3 8.6 3 0 0
1.7 4 7.6 6 -2 4
1.4 6.5 7.8 4 2.5 6.25
1.2 9 5.9 8 1 1
1.3 8 7.2 7 1 1
1.1 10 5.4 10 0 0
2.3 1 9.6 2 -1 1
1.4 6.5 5.6 9 -2.5 6.25
Total 20.5
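The rank table and the resulting coefficient can be checked with the sketch below. It ranks each variable in descending order, as the table does, gives tied values their average rank, and applies the formula for 𝑟𝑠 stated above. With tied ranks that formula is only approximate; the function names are illustrative.

```python
x = [2.1, 1.6, 1.9, 1.7, 1.4, 1.2, 1.3, 1.1, 2.3, 1.4]
y = [10.6, 7.7, 8.6, 7.6, 7.8, 5.9, 7.2, 5.4, 9.6, 5.6]


def descending_average_ranks(values):
    """Rank 1 goes to the largest value; tied values share their average rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rs(x, y):
    """Return (sum of squared rank differences, Spearman coefficient)."""
    rx, ry = descending_average_ranks(x), descending_average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return d2, 1 - 6 * d2 / (n * (n ** 2 - 1))


sum_d2, rs = spearman_rs(x, y)
print(f"sum of d^2 = {sum_d2}, r_s = {rs:.4f}")
```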
Exercise 10.5
Recall Exercise 9.1. The following are heights in cm and weights in kg of 10 men.
Compute the Spearman rank correlation coefficient
Height 162 168 174 176 180 180 182 184 186 186
Weight 65 65 84 63 75 76 82 65 80 81
The choice between a one-sided test and a two-sided test is determined by the purpose of the investigation or by prior reasons for expecting a particular direction. If a study seeks to establish that a parameter is greater than (or less than) a specific value, a one-sided test is appropriate. If the study aims to detect a deviation from that value in either direction, a two-sided test is more suitable.
The test statistic in hypothesis testing is a value calculated from the sample data and used to decide whether to reject the null hypothesis. For a one-sample t-test, the test statistic is t = (x̄ - µ₀) / (s/√n), where x̄ is the sample mean, µ₀ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size. This statistic measures how far the sample mean lies from the hypothesized mean, in units of its estimated standard error, under the assumption that the null hypothesis is true.
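As an illustration of the formula t = (x̄ - µ₀) / (s/√n), the sketch below reuses the ten product weights from Example 10.1 with µ₀ = 3.3. Reusing that data here is purely for illustration, and the function name is hypothetical; `statistics.stdev` computes the sample standard deviation with divisor n − 1.

```python
import math
from statistics import mean, stdev

weights = [2.6, 2.2, 2.9, 3.4, 3.4, 3.7, 1.7, 2.7, 3.3, 2.3]
mu0 = 3.3  # hypothesized mean


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)) with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))


t_stat = one_sample_t(weights, mu0)
print(f"t = {t_stat:.4f} on {len(weights) - 1} degrees of freedom")
```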
Choosing an appropriate significance level is crucial because it sets the threshold for decisions about the null hypothesis. A low significance level (α) reduces the risk of a Type I error, which is the incorrect rejection of a true null hypothesis. Commonly used significance levels are 0.05 and 0.01. The significance level determines the critical value(s) that define the boundaries of the rejection region.
When deciding whether to use a parametric or a nonparametric test, consider the level of measurement of the data (interval or ratio for parametric tests), the distributional assumptions (parametric tests such as the t test assume normality), the sample size (with small samples, departures from normality matter more for parametric tests), and the presence of outliers (nonparametric tests are much less affected by extreme observations). These factors influence the appropriateness and power of the respective tests.
The power of a statistical test is directly related to the Type II error probability, β. Power is the probability of correctly rejecting a false null hypothesis: Power = 1 - β. As power increases, the likelihood of committing a Type II error decreases. Ideally, a test has power close to 1, indicating a low chance of a Type II error.
The Kruskal-Wallis test differs from a one-way ANOVA in that it is a nonparametric test and so does not assume normally distributed data. It is used when the assumptions of the parametric one-way ANOVA are not met. The test ranks all observations without regard to group membership and uses these ranks to test whether the groups differ in location. It is particularly useful for ordinal data or for data containing extreme observations.
Contingency tables provide a structured way to display counts of categorical data. They organize the data into rows and columns so that the relationship between two categorical variables can be examined. Each cell represents the frequency of one combination of categories. Analyzing these tables helps identify significant associations between the variables.
The p-value is the probability of observing the given data, or something more extreme, under the assumption that the null hypothesis is true. It provides a measure of evidence against the null hypothesis. A p-value less than or equal to the chosen significance level α indicates a statistically significant result and leads to rejection of the null hypothesis. If the p-value is greater than α, we fail to reject the null hypothesis, implying insufficient evidence against it.
The critical region of a hypothesis test is determined by the critical values, which depend on the chosen significance level. It is the range of values of the test statistic that leads to rejection of the null hypothesis. For a one-sided test, the critical region lies either below or above a single critical value; for a two-sided test, it lies outside two critical values, one in each tail of the distribution.
Conducting a two-sample t-test involves several steps. First, state the null hypothesis, usually that the two population means are equal (H₀: μ₁ = μ₂). Second, select the significance level α. Third, calculate the test statistic t = (x̄ - ȳ) / √((s₁²/m) + (s₂²/n)), where x̄ and ȳ are the sample means, s₁² and s₂² are the sample variances, and m and n are the sample sizes. Fourth, determine the degrees of freedom and the corresponding critical t-value. Finally, compare the test statistic with the critical value: if it falls in the critical region, reject the null hypothesis and conclude that the means differ significantly.
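The third and fourth steps can be sketched as follows, reusing the day and night shift weights from Example 10.2 purely for illustration. The paragraph above does not say how the degrees of freedom are obtained; the Welch-Satterthwaite approximation used here is an assumption that is commonly paired with this unpooled statistic. The function name is hypothetical.

```python
import math
from statistics import mean, variance

day = [3.7, 2.7, 2.6, 4.1, 3.7, 3.3, 3.3, 4.2, 2.8, 3.6]
night = [3.2, 2.3, 3.3, 2.6, 3.3, 3.4, 3.1, 3.6]


def two_sample_t(x, y):
    """Return (t, df): unpooled t statistic and Welch-Satterthwaite df."""
    m, n = len(x), len(y)
    a, b = variance(x) / m, variance(y) / n  # s1^2/m and s2^2/n
    t = (mean(x) - mean(y)) / math.sqrt(a + b)
    # Assumed degrees-of-freedom approximation (Welch-Satterthwaite).
    df = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return t, df


t_stat2, df2 = two_sample_t(day, night)
print(f"t = {t_stat2:.4f}, approximate df = {df2:.2f}")
```

The statistic is then compared with the critical t-value for the chosen α and the computed degrees of freedom.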