Statistical Inference Principles Guide

Q: What influences the choice between a one-sided test and a two-sided test in hypothesis testing?

The choice between a one-sided and a two-sided test is determined by the purpose of the investigation or by prior reasons for expecting an effect in a particular direction. If a study seeks to determine whether a parameter is greater than (or less than) a specific value, a one-sided test is appropriate. If the study aims to detect any deviation from that value in either direction, a two-sided test is more suitable.

Q: What role does the test statistic play in hypothesis testing, and how is it calculated for a t-test?

The test statistic in hypothesis testing is a value calculated from the sample data and used to decide whether to reject the null hypothesis. For a one-sample t-test, the test statistic is t = (x̄ - µ₀) / (s/√n), where x̄ is the sample mean, µ₀ is the hypothesized population mean, s is the sample standard deviation, and n is the sample size. The statistic measures how far the sample mean lies from the hypothesized mean, in units of the estimated standard error, under the assumption that the null hypothesis is true.

Q: Why is it important to choose an appropriate significance level in hypothesis testing?

Choosing an appropriate significance level in hypothesis testing is crucial because it determines the threshold for making decisions about the null hypothesis. A low significance level (α) reduces the risk of a Type I error, which is the incorrect rejection of a true null hypothesis. Commonly used significance levels are 0.05 and 0.01. The significance level determines the critical value(s), which define the boundaries of the rejection region.

Q: What considerations should be made when deciding on the use of a parametric versus a non-parametric test?

When deciding whether to use a parametric or non-parametric test, several considerations should be made: the level of measurement of the data (parametric tests need interval or ratio data), distribution assumptions (many parametric tests assume normality), sample size (with large samples, parametric tests on means are fairly robust to non-normality, whereas with small samples the normality assumption matters more and is harder to check), and the presence of outliers (rank-based non-parametric tests are more robust to outliers). These factors influence the appropriateness and power of the respective tests in detecting significant effects.

Q: How does the concept of power relate to Type II error in statistical tests?

The power of a statistical test is directly related to the probability of a Type II error, denoted β. Power is the probability of correctly rejecting a false null hypothesis and is calculated as 1 minus the probability of a Type II error (Power = 1 - β). As power increases, the likelihood of committing a Type II error decreases. Ideally, a test would have high power, close to 1, indicating a low chance of making a Type II error.

Q: How does the Kruskal-Wallis test differ from a one-way ANOVA, and when is it used?

The Kruskal-Wallis test differs from a one-way ANOVA in that it is a non-parametric test, so it does not assume that the data are normally distributed. It is used when the assumptions necessary for a parametric one-way ANOVA are not met. Kruskal-Wallis ranks all data points across groups and uses these ranks to test whether the groups differ in central tendency. It is particularly useful when dealing with ordinal data or when the normality assumption is doubtful.

Q: What information do contingency tables provide in the analysis of categorical data?

Contingency tables provide a structured way to display counts of categorical data across different groups or categories. They allow for examination of the relationship between two or more categorical variables by organizing data into rows and columns. Each cell in a contingency table represents the frequency of occurrences for a specific category combination. Analyzing these tables helps identify any significant associations or differences between categorical variables.

Q: What does the p-value indicate in the context of hypothesis testing, and how is it interpreted with respect to the significance level?

The p-value in hypothesis testing is the probability of observing the given data, or something more extreme, under the assumption that the null hypothesis is true. It provides a measure of evidence against the null hypothesis. A p-value less than the chosen significance level α suggests that the observed result is statistically significant, leading to the rejection of the null hypothesis. If the p-value is greater than α, we fail to reject the null hypothesis, implying insufficient evidence against it.

Q: How is a critical region determined in a hypothesis test and what does it signify?

A critical region in a hypothesis test is determined by the critical values, which are based on the chosen significance level. This region is the range of values of the test statistic that lead to the rejection of the null hypothesis. For a one-sided test, the critical region is either below or above a single critical value, whereas for a two-sided test, the region lies outside two critical values, one in each tail of the distribution. These regions are essential for making decisions about the null hypothesis.

Q: Explain the procedure of conducting a two-sample t-test and its application in comparing two population means.

Conducting a two-sample t-test involves several steps. First, state the null hypothesis, usually that the two population means are equal (H₀: μ₁ = μ₂). Second, select the significance level α. Third, calculate the test statistic using the formula t = (x̄ - ȳ) / √((s₁²/m) + (s₂²/n)), where x̄ and ȳ are sample means, s₁² and s₂² are sample variances, and m and n are sample sizes. Fourth, determine the degrees of freedom and the corresponding critical t-value. Finally, compare the test statistic to the critical value to decide whether the null hypothesis can be rejected, which would indicate a significant difference between the means.
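
The third step above can be sketched in a few lines of Python. This is only an illustration: the function name `two_sample_t` and the two data lists are hypothetical and do not come from the notes, and the degrees of freedom and critical value would still have to be obtained separately.

```python
import math
import statistics

def two_sample_t(xs, ys):
    """Return t = (x_bar - y_bar) / sqrt(s1^2/m + s2^2/n) for two independent samples."""
    m, n = len(xs), len(ys)
    s1_sq = statistics.variance(xs)  # sample variance, divisor m - 1
    s2_sq = statistics.variance(ys)  # sample variance, divisor n - 1
    standard_error = math.sqrt(s1_sq / m + s2_sq / n)
    return (statistics.fmean(xs) - statistics.fmean(ys)) / standard_error

# Hypothetical measurements, for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 5.2]
group_b = [4.4, 4.7, 4.1, 4.9, 4.5]
t_stat = two_sample_t(group_a, group_b)
print(t_stat)
```

A positive value of `t_stat` simply reflects that the first sample mean is the larger one; the decision still depends on the critical value for the chosen α.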

The document outlines the course STAT 223 on Principles of Statistical Inference, covering topics such as point estimation, interval estimation, hypothesis testing, and various statistical analyses. It includes references to key textbooks and introduces fundamental concepts like statistics, sampling distributions, and methods of point estimation. The document also discusses unbiased estimators, mean squared error, and the bootstrap method for estimating standard errors.

Uploaded by

kissingerhenry68


STAT 223 – PRINCIPLES OF STATISTICAL INFERENCE

Lecture Notes

Course Outline

1. Introduction
2. Point Estimation
3. Interval Estimation
4. Test of Hypothesis
5. Inference on Proportions
6. Inference on Means
7. Analysis of Variance
8. Categorical Data Analysis
9. Regression and Correlation
10. Non-Parametric Inference

Books
1. H. J. Larson. Introduction to Probability Theory and Statistical Inference. 3rd ed. Wiley, 1982.
2. I. Miller & M. Miller. John E. Freund's Mathematical Statistics with Applications. 7th ed. Pearson Education, Prentice Hall, New Jersey, 2003.
3. Mik Wisniewski. Quantitative Methods for Decision Makers. 4th ed. Prentice Hall, 2006.

CHAPTER ONE

INTRODUCTION

A statistic is any quantity whose value can be calculated from sample data. Given a
sample of 𝑛 observations from a population, one can compute estimates of the
population mean, median, standard deviation, and various other population
characteristics (parameters).
The most commonly used sample statistics and the corresponding population
parameters are shown in the following table.
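| Parameter | Population | Sample |
| --- | --- | --- |
| Proportion | $P = \dfrac{X}{N}$ | $\hat{P} = \dfrac{x}{n}$ |
| Mean | $\mu = \dfrac{1}{N}\sum_{i=1}^{N} X_i$ | $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ |
| Standard deviation | $\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2}$ | $s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ |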

Since a statistic varies from one sample to the other it can be regarded as a random
variable. Prior to obtaining data, there is uncertainty as to which of all possible samples
will occur. Because of this, estimates such as the sample mean 𝑥̅ and sample standard
deviation 𝑠 will vary from one sample to another. The behaviour of such estimates in
repeated sampling is described by sampling distributions. Any particular sampling
distribution will give an indication of how close the estimate is likely to be to the value
of the parameter being estimated.

The probability distribution of any particular statistic depends not only on the
population distribution (normal, uniform, etc.) and the sample size n but also on the
method of sampling.

Statistical inference is frequently directed toward drawing some type of conclusion
about one or more parameters (population characteristics). To do so requires that an
investigator obtain sample data from each of the populations under study. Conclusions
can then be based on the computed values of various sample quantities.

The methods of inferential statistics are:

- Estimation of parameters.
- Testing of statistical hypotheses.

CHAPTER TWO

POINT ESTIMATION

Given a parameter of interest, such as a population mean µ or population proportion p,
the objective of point estimation is to use a sample to compute a number that represents
in some sense a good guess for the true value of the parameter. The resulting number is
called a point estimate.

A point estimate of a parameter θ is a single number that can be regarded as a sensible
value for θ. A point estimate is obtained by selecting a suitable statistic and computing
its value from the given sample data. The selected statistic is called the point estimator
of θ.

Example 2.1
A random sample of $n = 3$ batteries might yield observed lifetimes (hours) $x_1 = 5.0$, $x_2 = 6.4$, $x_3 = 5.9$. The computed value of the sample mean lifetime is $\bar{x} = 5.77$, and it is reasonable to regard 5.77 as a very plausible value, and a "best guess", for µ based on the available sample information.

2.1 Mean Squared Error

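In the best of situations, we could find an estimator $\hat{\theta}$ for which $\hat{\theta} = \theta$ always. However, $\hat{\theta}$ is a function of the sample $X_i$'s, so it is a random variable. For some samples, $\hat{\theta}$ will yield a value larger than $\theta$, whereas for other samples $\hat{\theta}$ will underestimate $\theta$. If we write $\hat{\theta} = \theta + \text{error of estimation}$, then an accurate estimator would be one resulting in small estimation errors, so that estimated values will be near the true value.
A popular way to quantify the idea of $\hat{\theta}$ being close to $\theta$ is to consider the squared error $(\hat{\theta} - \theta)^2$.
Recall that $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$. Letting $X = \hat{\theta} - \theta$,
$$\operatorname{Var}(\hat{\theta} - \theta) = E[(\hat{\theta} - \theta)^2] - [E(\hat{\theta} - \theta)]^2$$
Since $\theta$ is a constant, $\operatorname{Var}(\hat{\theta} - \theta) = \operatorname{Var}(\hat{\theta})$ and $E(\hat{\theta} - \theta) = E(\hat{\theta}) - \theta$, so
$$E[(\hat{\theta} - \theta)^2] = \operatorname{Var}(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2$$
$E[(\hat{\theta} - \theta)^2]$ is called the mean squared error (MSE) and $E(\hat{\theta}) - \theta$ the bias.
Thus
$$MSE = \operatorname{Var}(\hat{\theta}) + [\text{Bias}]^2$$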
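
The decomposition can be checked numerically. The sketch below is an illustration under assumed settings (a simulated normal population with variance 4, samples of size 10, and the divisor-$n$ variance as the estimator); none of these choices come from the notes. For any collection of estimates, the average squared error equals their variance (divisor equal to the number of estimates) plus the squared average error, so the two printed numbers should agree up to rounding.

```python
import random
import statistics

random.seed(1)
theta = 4.0          # true variance of the simulated N(0, 2^2) population (assumed)
n, reps = 10, 2000   # hypothetical sample size and number of repeated samples

estimates = []
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    estimates.append(statistics.pvariance(sample))  # divisor n: a biased estimator

mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
variance = statistics.pvariance(estimates)       # Var of the estimator over the simulation
bias = statistics.fmean(estimates) - theta       # average estimate minus the true value
print(mse, variance + bias ** 2)
```

The negative `bias` obtained here reflects the fact that the divisor-$n$ variance estimator underestimates $\sigma^2$ on average.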

2.2. Unbiased Estimators

A point estimator 𝜃̂ is said to be an unbiased estimator of 𝜃 if 𝐸(𝜃̂ ) = 𝜃 for every
possible value of 𝜃. If 𝜃̂ is not unbiased, the difference 𝐸(𝜃̂) − 𝜃 is called the bias of 𝜃̂

Example 2.2
When $X$ is a binomial random variable with parameters $n$ and $p$, show that the sample proportion $\hat{p} = x/n$ is an unbiased estimator of $p$.

Solution
$$E(\hat{p}) = E\left(\frac{x}{n}\right) = \frac{1}{n}E(x) = \frac{1}{n}\cdot np = p$$
Since $E(\hat{p}) = p$, $\hat{p}$ is an unbiased estimator of $p$.
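
The same result can be confirmed by direct computation. As a small sketch (the function name and the values of $n$ and $p$ are chosen only for illustration), the expectation of $\hat{p}$ is evaluated by summing $(x/n)\,P(X = x)$ over the binomial distribution:

```python
import math

def expected_sample_proportion(n, p):
    """E(p_hat) = sum over x of (x/n) * P(X = x) for X ~ Binomial(n, p)."""
    return sum(
        (x / n) * math.comb(n, x) * p**x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )

# Hypothetical n and p; the result should equal p up to floating-point rounding
print(expected_sample_proportion(12, 0.3))
```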

Exercise 2.1
Show that the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is an unbiased estimator of $\sigma^2$.

2.3 Estimators with Minimum Variance

Among all estimators of $\theta$ that are unbiased, choose the one that has minimum
variance. The resulting $\hat{\theta}$ is called the minimum variance unbiased estimator (MVUE)
of $\theta$. Since MSE = variance + (bias)$^2$, seeking an unbiased estimator with minimum
variance is the same as seeking an unbiased estimator that has minimum mean squared
error.

2.4 Reporting a Point Estimate: The Standard Error

Besides reporting the value of a point estimate, some indication of its precision should
be given. The usual measure of precision is the standard error of the estimator used.

The standard error of an estimator $\hat{\theta}$ is its standard deviation $\sigma_{\hat{\theta}} = \sqrt{\operatorname{Var}(\hat{\theta})}$.
If the standard error itself involves unknown parameters whose values can be
estimated, substitution of these estimates into $\sigma_{\hat{\theta}}$ yields the estimated standard error
(estimated standard deviation) of the estimator. The estimated standard error is
denoted $s_{\hat{\theta}}$.

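Example 2.3
Find the standard error for the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Solution
For a random sample (independent observations, each with variance $\sigma^2$),
$$\operatorname{Var}(\bar{x}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
$$\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

Therefore the standard error for the sample mean is $\sigma/\sqrt{n}$.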
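
A short simulation can make the result concrete. The sketch below assumes a normal population with hypothetical values of µ and σ, draws many samples of size $n$, and compares the standard deviation of the simulated sample means with $\sigma/\sqrt{n}$; the two printed numbers should be close but not identical, since the second is itself an estimate.

```python
import math
import random
import statistics

random.seed(42)
mu, sigma = 50.0, 2.0     # hypothetical population mean and standard deviation
n, reps = 25, 5000        # sample size and number of simulated samples

sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

theoretical_se = sigma / math.sqrt(n)           # sigma / sqrt(n)
empirical_se = statistics.stdev(sample_means)   # spread of the simulated sample means
print(theoretical_se, empirical_se)
```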

2.5 The Bootstrap (*Optional)
The form of the estimator $\hat{\theta}$ may be sufficiently complicated that standard statistical
theory cannot be applied to obtain an expression for $\sigma_{\hat{\theta}}$.
A computer-intensive method called the bootstrap has been introduced to address this
problem. Suppose that the population pdf is $f(x; \theta)$, a member of a particular
parametric family, and that data $x_1, x_2, \ldots, x_n$ give $\hat{\theta} = 21.7$.
We now use the computer to obtain "bootstrap samples" from the pdf $f(x; 21.7)$, and for
each sample we calculate a "bootstrap estimate" $\hat{\theta}^*$:
First bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_1^*$
Second bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_2^*$
...
$B$-th bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_B^*$
$B = 100$ or $200$ is often used. Now let $\bar{\theta}^* = \frac{1}{B}\sum_{i=1}^{B}\hat{\theta}_i^*$ be the sample mean of the bootstrap
estimates. The bootstrap estimate of the standard error of $\hat{\theta}$ is now just the sample
standard deviation of the $\hat{\theta}_i^*$'s:
$$S_{\hat{\theta}} = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\theta}_i^* - \bar{\theta}^*\right)^2}$$
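
The procedure can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes an exponential model with rate estimated by $1/\bar{x}$, and the function name and the data are hypothetical. Each bootstrap sample is drawn from the fitted density, the estimator is recomputed, and the standard deviation of the $B$ bootstrap estimates (divisor $B - 1$) is returned.

```python
import random
import statistics

def bootstrap_se_exponential_rate(data, B=200, seed=0):
    """Parametric bootstrap SE of rate_hat = 1 / mean under an assumed exponential model."""
    rng = random.Random(seed)
    n = len(data)
    rate_hat = 1.0 / statistics.fmean(data)       # estimate from the observed data
    boot_estimates = []
    for _ in range(B):
        # Draw a bootstrap sample of size n from the fitted density f(x; rate_hat)
        boot_sample = [rng.expovariate(rate_hat) for _ in range(n)]
        boot_estimates.append(1.0 / statistics.fmean(boot_sample))
    return statistics.stdev(boot_estimates)       # divisor B - 1, as in the formula

# Hypothetical observations, for illustration only
times = [0.8, 2.1, 0.3, 1.7, 4.2, 0.9, 1.1, 2.6, 0.5, 3.0]
print(bootstrap_se_exponential_rate(times))
```

Fixing the seed makes the result reproducible; a different seed or a larger $B$ would give a slightly different estimate of the standard error.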

2.6 Methods of Point Estimation

2.6.1 The Method of Moments

The basic idea of this method is to equate certain sample characteristics, such as the
mean, to the corresponding population expected values. Solving these equations for
unknown parameter values yields the estimators.

Let $x_1, x_2, \ldots, x_n$ be a random sample from a pmf or pdf $f(x)$. For $k = 1, 2, 3, \ldots$, the $k$-th
population moment, or $k$-th moment of the distribution $f(x)$, is $E(X^k)$.
The $k$-th sample moment is $\frac{1}{n}\sum_{i=1}^{n} x_i^k$.
Thus the first population moment is $E(X) = \mu$ and the first sample moment is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The second population and sample moments are $E(X^2)$ and $\frac{1}{n}\sum_{i=1}^{n} x_i^2$ respectively. The
population moments will be functions of any unknown parameters $\theta_1, \theta_2, \ldots$

Example 2.4
Let $x_1, x_2, \ldots, x_n$ represent a random sample of service times of $n$ customers at a certain
facility, where the underlying distribution is assumed exponential with parameter $\lambda$.
Since there is only one parameter to be estimated, the estimator is obtained by equating
$E(X)$ to $\bar{x}$. Since $E(X) = 1/\lambda$ for an exponential distribution, this gives $1/\lambda = \bar{x}$ or
$\lambda = 1/\bar{x}$. The moment estimator of $\lambda$ is then $\hat{\lambda} = 1/\bar{x}$.

Example 2.5
Let $x_1, x_2, \ldots, x_n$ be a random sample from a gamma distribution with parameters $\alpha$ and
$\beta$ having $E(X) = \alpha\beta$ and $\operatorname{Var}(X) = \alpha\beta^2$.
a) Obtain the moment estimators of $\alpha$ and $\beta$.
b) The data below show the survival times $X$ (in weeks) of 20 randomly selected male
mice exposed to 240 rads of gamma radiation. Assuming the survival time has a gamma
distribution, compute the estimates of $\alpha$ and $\beta$.
152 115 109 94 88 137 152 77 160 165
125 40 128 123 136 101 62 153 83 69

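Solution
a)
$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2$$
$$E(X^2) = \operatorname{Var}(X) + [E(X)]^2$$
In this case
$$E(X^2) = \alpha\beta^2 + \alpha^2\beta^2$$
Equating the sample and population moments:
$$E(X) = \bar{x}$$
$$\alpha\beta = \bar{x} \qquad \text{(i)}$$
$$E(X^2) = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$
$$\alpha\beta^2 + \alpha^2\beta^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \qquad \text{(ii)}$$
From (i), $\alpha\beta = \bar{x}$. Substituting this in (ii):
$$\beta\bar{x} + \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$$
$$\hat{\beta} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}{\bar{x}}$$
$$\alpha = \bar{x}/\beta$$
$$\hat{\alpha} = \frac{\bar{x}^2}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$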
b)
From the data given
$$\bar{x} = \frac{2269}{20} = 113.45$$
$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{152^2 + 115^2 + \cdots + 69^2}{20} = \frac{281755}{20} = 14087.75$$
$$\hat{\alpha} = \frac{113.45^2}{14087.75 - 113.45^2} = \frac{12870.9025}{14087.75 - 12870.9025} = \frac{12870.9025}{1216.8475} = 10.577$$
$$\hat{\beta} = \frac{14087.75 - 113.45^2}{113.45} = \frac{1216.8475}{113.45} = 10.726$$
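
The arithmetic of part (b) can be reproduced with a short script. The function name below is hypothetical; the formulas are the moment estimators derived in part (a), and the data are those given in the example. Rounded to three decimals, the results should match the worked values.

```python
import statistics

def gamma_moment_estimates(data):
    """Moment estimates (alpha_hat, beta_hat) when E(X) = alpha*beta and Var(X) = alpha*beta^2."""
    mean = statistics.fmean(data)
    second_moment = statistics.fmean([x * x for x in data])
    spread = second_moment - mean**2       # (1/n) * sum(x_i^2) - x_bar^2
    return mean**2 / spread, spread / mean

survival_weeks = [152, 115, 109, 94, 88, 137, 152, 77, 160, 165,
                  125, 40, 128, 123, 136, 101, 62, 153, 83, 69]
alpha_hat, beta_hat = gamma_moment_estimates(survival_weeks)
print(round(alpha_hat, 3), round(beta_hat, 3))
```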

2.6.2 Maximum Likelihood Estimation

Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 have joint 𝑝𝑚𝑓 or 𝑝𝑑𝑓 𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑛 ; 𝜃1 , 𝜃2 , … , 𝜃𝑚 ) where the parameters
𝜃1 , 𝜃2 , … , 𝜃𝑚 have unknown values. When 𝑥1 , 𝑥2 , … , 𝑥𝑛 are the observed sample values
and 𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑛 ; 𝜃1 , 𝜃2 , … , 𝜃𝑚 ) is regarded as a function of 𝜃1 , 𝜃2 , … , 𝜃𝑚 , it is called the
likelihood function. The maximum likelihood estimates 𝜃̂1 , 𝜃̂2 , … , 𝜃̂𝑚 are those values of
the 𝜃𝑖 ′𝑠 that maximize the likelihood function, so that
𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑛 ; 𝜃̂1 , 𝜃̂2 , … , 𝜃̂𝑚 ) ≥ 𝑓(𝑥1 , 𝑥2 , … , 𝑥𝑛 ; 𝜃1 , 𝜃2 , … , 𝜃𝑚 ) for all 𝜃1 , 𝜃2 , … , 𝜃𝑚 .

Note that since $\ln[g(x)]$ is a monotonically increasing function of $g(x)$, finding $x$ to maximize $\ln[g(x)]$
is equivalent to maximizing $g(x)$ itself. In statistics, taking the logarithm frequently
changes a product to a sum, which is easier to work with.

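Example 2.6
Suppose $x_1, x_2, \ldots, x_n$ is a random sample from an exponential distribution with
parameter $\lambda$. Because of independence, the likelihood function is a product of the
individual pdf's:
$$L = f(x_1, x_2, \ldots, x_n; \lambda) = (\lambda e^{-\lambda x_1})(\lambda e^{-\lambda x_2})\cdots(\lambda e^{-\lambda x_n})$$

The ln(likelihood) is
$$\ln L = \ln f(x_1, x_2, \ldots, x_n; \lambda) = \ln(\lambda e^{-\lambda x_1}) + \ln(\lambda e^{-\lambda x_2}) + \cdots + \ln(\lambda e^{-\lambda x_n})$$
$$\ln L = \ln\lambda - \lambda x_1 + \ln\lambda - \lambda x_2 + \cdots + \ln\lambda - \lambda x_n$$
$$\ln L = n\ln\lambda - \lambda\sum_{i=1}^{n} x_i$$
To maximize $\ln L$ with respect to $\lambda$, solve $\dfrac{d\ln L}{d\lambda} = 0$:
$$\frac{d\ln L}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$$
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$$

Thus the mle is $\hat{\lambda} = 1/\bar{x}$, which is identical to the method of moments estimator.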
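
As a quick numerical sanity check on this derivation, the log-likelihood can be evaluated at the closed-form estimate and at nearby rates. The data and the function name below are hypothetical, chosen only for illustration; the log-likelihood formula is the one obtained above.

```python
import math

def exp_log_likelihood(rate, data):
    """ln L = n ln(rate) - rate * sum(x_i) for an exponential sample."""
    return len(data) * math.log(rate) - rate * sum(data)

# Hypothetical service times, for illustration only
times = [0.8, 2.1, 0.3, 1.7, 4.2, 0.9, 1.1, 2.6, 0.5, 3.0]
rate_mle = len(times) / sum(times)    # closed form: n / sum(x_i) = 1 / x_bar

# The log-likelihood at the mle should be at least as large as at nearby rates
for nearby in (0.9 * rate_mle, 1.1 * rate_mle):
    print(exp_log_likelihood(rate_mle, times) >= exp_log_likelihood(nearby, times))
```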

Example 2.7
Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution. The likelihood function
is
$$L = f(x_1, x_2, \ldots, x_n; \mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x_1-\mu)^2/2\sigma^2}\right)\left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x_2-\mu)^2/2\sigma^2}\right)\cdots\left(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x_n-\mu)^2/2\sigma^2}\right)$$
$$\ln L = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

To find the maximizing values of $\mu$ and $\sigma^2$, we must take the partial derivatives of $\ln L$
with respect to $\mu$ and $\sigma^2$, equate them to zero and solve the resulting two equations. The
resulting mle's are:
$$\hat{\mu} = \bar{x}$$
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The mle of $\sigma^2$ is not the unbiased estimator, so two different principles of estimation
(unbiasedness and maximum likelihood) yield two different estimators.
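
The difference between the two variance estimators is easy to see numerically. The sketch below reuses the three battery lifetimes from Example 2.1 purely as illustrative input; in Python's standard library, `statistics.pvariance` uses the divisor $n$ (the mle form) and `statistics.variance` uses $n - 1$ (the unbiased form).

```python
import statistics

lifetimes = [5.0, 6.4, 5.9]                     # battery lifetimes from Example 2.1
n = len(lifetimes)
mu_hat = statistics.fmean(lifetimes)            # mle of the mean
sigma2_mle = statistics.pvariance(lifetimes)    # divisor n (maximum likelihood)
s2_unbiased = statistics.variance(lifetimes)    # divisor n - 1 (unbiased)
print(mu_hat, sigma2_mle, s2_unbiased)
```

The two estimates differ by the factor $(n-1)/n$, so the gap shrinks as the sample size grows.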

Exercise 2.2
Let $x_1, x_2, \ldots, x_n$ represent a random sample from a Rayleigh distribution with pdf
$$f(x; \theta) = \begin{cases} \dfrac{x}{\theta}\,e^{-x^2/2\theta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
It can be shown that the mean and variance are respectively $\sqrt{\dfrac{\pi\theta}{2}}$ and $\dfrac{4-\pi}{2}\theta$.
A random sample of ten yields the data
16.88, 10.23, 4.59, 6.66, 13.68, 14.23, 19.87, 9.40, 6.51, 10.95
a) Use the method of moments to obtain an estimator of θ and then compute the estimate
for this data.
b) Obtain the maximum likelihood estimator of θ, and then compute the estimate for the
given data.

CHAPTER THREE

INTERVAL ESTIMATION

A point estimate, because it is a single number, by itself provides no information about
the precision and reliability of estimation.
Consider, for example, the statistic $\bar{x}$. Because of sampling variability, it is virtually
never the case that $\bar{x} = \mu$. The point estimate says nothing about how close it might be to $\mu$.

An alternative to reporting a single sensible value for the parameter being estimated is
to calculate and report an entire interval of plausible values—an interval estimate or
confidence interval (CI).
A confidence interval is always calculated by first selecting a confidence level, which is a
measure of the degree of reliability of the interval.

A confidence level of 95% implies that 95% of all samples would give an interval that
includes µ, or whatever other parameter is being estimated, and only 5% of all samples
would yield an erroneous interval.
The most frequently used confidence levels are 95%, 99%, and 90%. The higher the
confidence level, the more strongly we believe that the value of the parameter being
estimated lies within the interval.

Information about the precision of an interval estimate is conveyed by the width of the
interval. If the confidence level is high and the resulting interval is quite narrow, our
knowledge of the value of the parameter is reasonably precise.
A very wide confidence interval, however, gives the message that there is a great deal
of uncertainty concerning the value of what we are estimating

3.1 Confidence Interval for the Population Mean

Case I: 𝝈 known
A $100(1 - \alpha)\%$ confidence interval for the mean µ of a normal population when the
value of σ is known is given by
$$\left(\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right)$$
Example 3.1
A sample of 40 students has a mean final exam score of 70.7. Past experience suggests
that final exam scores are normally distributed with standard
deviation 13. Calculate a confidence interval for the population mean using a
confidence level of 90%.

Solution
In this case $100(1 - \alpha)\% = 90\%$, so $1 - \alpha = 0.9$, $\alpha = 0.1$ and $z_{\alpha/2} = z_{0.05} = 1.645$. The
desired interval is:
$$70.7 \pm 1.645\cdot\frac{13}{\sqrt{40}} = 70.7 \pm 3.381 = (67.319, 74.081)$$
With a 90% degree of confidence, we can say that $67.319 < \mu < 74.081$.
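
The calculation in Example 3.1 can be reproduced with Python's standard library, which provides the normal quantile through `statistics.NormalDist`. The function name `z_interval` is a hypothetical helper introduced for this sketch; the inputs are those of the example.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence):
    """Confidence interval for a normal mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}, the upper alpha/2 point
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

lower, upper = z_interval(mean=70.7, sigma=13, n=40, confidence=0.90)
print(round(lower, 3), round(upper, 3))
```

Because the code uses the unrounded quantile rather than the tabulated 1.645, the limits may differ from the worked values in the last decimal place for other inputs.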

Case II: 𝝈 unknown, large 𝒏

If $n$ is sufficiently large,
$$\bar{x} \pm z_{\alpha/2}\cdot\frac{s}{\sqrt{n}}$$
is a large-sample confidence interval for µ with confidence level approximately
$100(1 - \alpha)\%$.

This formula is valid regardless of the shape of the population distribution.

Generally 𝑛 > 30 will be sufficient to justify the use of this interval.

Example 3.2
An algebra placement test was used to determine placement in mathematics courses. A
sample of 50 students gave the following scores. Calculate the 95% confidence interval
of the population mean µ
29 21 23 24 22 24 22 23 15 21 22 17 15 23 17 18 23 18
19 17 14 19 16 22 23 14 19 19 22 16 21 12 28 20 17 24
12 18 18 10 21 22 26 24 14 27 15 24 28 13

Solution
From the data given, $n = 50$, $\bar{x} = 19.82$ and $s = 4.50$. The 95% confidence interval is then:
$$19.82 \pm 1.96\cdot\frac{4.50}{\sqrt{50}} = 19.82 \pm 1.25 = (18.6, 21.1)$$
Hence $18.6 < \mu < 21.1$ with a 95% confidence level. The interval has a reasonably
narrow width of 2.5, indicating a fairly precise estimation of µ.

Case III: 𝝈 unknown, small 𝒏

Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the
results of a random sample from a normal population with mean µ.
Then a $100(1 - \alpha)\%$ confidence interval for µ is
$$\bar{x} \pm t_{\alpha/2,\,n-1}\cdot\frac{s}{\sqrt{n}}$$
Example 3.3
The weights of 16 three-month-old babies attending a clinic are given below:
4.68 4.13 4.80 4.63 5.08 5.79 6.29 6.79
4.93 4.25 5.70 4.74 5.88 6.77 6.04 4.95
Compute the 95% confidence interval for the population mean µ.

Solution
From the data given, $n = 16$, $\bar{x} = 5.34$ and $s = 0.8483$. The 95% confidence interval is then:
$$\bar{x} \pm t_{0.025,15}\cdot\frac{s}{\sqrt{n}} = 5.34 \pm 2.131\cdot\frac{0.8483}{\sqrt{16}} = 5.34 \pm 0.45 = (4.89, 5.79)$$

Hence $4.89 < \mu < 5.79$ with a 95% confidence level.
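
The summary statistics and the interval in Example 3.3 can be checked from the raw data. Python's standard library has no t quantile function, so in this sketch the critical value 2.131 is passed in as read from a t table, as in the example; the helper name `t_interval` is hypothetical.

```python
import math
import statistics

weights = [4.68, 4.13, 4.80, 4.63, 5.08, 5.79, 6.29, 6.79,
           4.93, 4.25, 5.70, 4.74, 5.88, 6.77, 6.04, 4.95]

def t_interval(data, t_critical):
    """CI for the mean; t_critical is t_{alpha/2, n-1} read from a t table."""
    n = len(data)
    mean = statistics.fmean(data)
    half_width = t_critical * statistics.stdev(data) / math.sqrt(n)
    return mean - half_width, mean + half_width

lower, upper = t_interval(weights, t_critical=2.131)   # t_{0.025,15} from the example
print(round(lower, 2), round(upper, 2))
```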

3.2 Confidence Interval for the Population Proportion

A confidence interval for a population proportion $p$ with confidence level
approximately $100(1 - \alpha)\%$ has confidence limits:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{q} = 1 - \hat{p}$.
Example 3.4
In $n = 48$ trials in a particular laboratory, 16 resulted in ignition of a particular type of
substrate by a lighted cigarette. Let $p$ denote the long-run proportion of all such trials
that would result in ignition. A point estimate for $p$ is $\hat{p} = 16/48 = 0.333$. A confidence
interval for $p$ with a confidence level of approximately 95% is:
$$0.333 \pm 1.96\sqrt{\frac{0.333 \times 0.667}{48}} = 0.333 \pm 0.133 = (0.200, 0.466)$$
Hence $0.200 < p < 0.466$ with a 95% confidence level.
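
A small script reproduces Example 3.4. The helper name `proportion_interval` is hypothetical. Note that the code keeps $\hat{p} = 16/48$ unrounded, so the upper limit can differ from the hand calculation (which rounds $\hat{p}$ to 0.333) in the third decimal place.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_interval(16, 48)
print(lower, upper)
```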

3.3 Confidence Interval for the Population Variance

A $100(1 - \alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population has
lower limit
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}$$
and upper limit
$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
A confidence interval for σ has lower and upper limits that are the square roots of the
corresponding limits in the interval for $\sigma^2$.

Example 3.5
Recall the example of the weights of 16 three-month-old babies. Compute the 95%
confidence interval for the population variance $\sigma^2$.

Solution
$$s^2 = 0.8483^2 = 0.7196,\quad \chi^2_{0.025,15} = 27.488,\quad \chi^2_{0.975,15} = 6.262$$
The 95% confidence interval for $\sigma^2$ is
$$\left(\frac{15 \times 0.7196}{27.488},\ \frac{15 \times 0.7196}{6.262}\right) = (0.3927, 1.7237)$$

Taking the square root of each endpoint yields (0.6267, 1.3129) as the 95% confidence
interval for σ.
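
The limits in Example 3.5 follow directly from the two formulas. In the sketch below the chi-square critical values are passed in as table look-ups, exactly as in the example, because Python's standard library does not provide chi-square quantiles; the helper name `variance_interval` is hypothetical.

```python
import math

def variance_interval(s_squared, n, chi2_upper, chi2_lower):
    """CI for sigma^2; chi2_upper = chi^2_{alpha/2, n-1}, chi2_lower = chi^2_{1-alpha/2, n-1}."""
    lower = (n - 1) * s_squared / chi2_upper
    upper = (n - 1) * s_squared / chi2_lower
    return lower, upper

var_lo, var_hi = variance_interval(0.7196, 16, chi2_upper=27.488, chi2_lower=6.262)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)   # limits for sigma
print(round(var_lo, 4), round(var_hi, 4), round(sd_lo, 4), round(sd_hi, 4))
```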

3.4 Bootstrap Confidence Intervals (*Optional)

The bootstrap, developed by Bradley Efron in the late 1970s, allows us to calculate
estimates in situations where there is no adequate statistical theory.

The method substitutes heavy computation for theory, and it has been feasible only
fairly recently with the availability of fast computers.

The bootstrap percentile interval with a confidence level of 100(1-α)% for a specified
parameter is obtained by first generating B bootstrap samples, for each one calculating
the value of some particular statistic that estimates the parameter, and sorting these
values from smallest to largest.
Then we compute k =αB/2 and choose the kth value from each end of the sorted list.

These two values form the confidence limits for the confidence interval. If k is not an
integer, then interpolation can be used, but this is not crucial.

As an example, if α =0.05 and B =1000 then k =αB/2 = (.05)(1000)/2 = 25.
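
The percentile interval can be sketched as below. The notes do not say how the bootstrap samples are generated in this section, so the sketch assumes resampling the observed data with replacement; the function name is hypothetical, the baby weights from Example 3.3 are reused only as illustrative input, and no interpolation is attempted when $k$ is not an integer.

```python
import random
import statistics

def bootstrap_percentile_interval(data, statistic, alpha=0.05, B=1000, seed=0):
    """Percentile interval: the k-th value from each end of the sorted bootstrap statistics."""
    rng = random.Random(seed)
    n = len(data)
    values = sorted(
        statistic(rng.choices(data, k=n))    # resample n observations with replacement
        for _ in range(B)
    )
    k = max(1, round(alpha * B / 2))         # e.g. (0.05)(1000)/2 = 25
    return values[k - 1], values[B - k]      # k-th smallest and k-th largest

weights = [4.68, 4.13, 4.80, 4.63, 5.08, 5.79, 6.29, 6.79,
           4.93, 4.25, 5.70, 4.74, 5.88, 6.77, 6.04, 4.95]
lower, upper = bootstrap_percentile_interval(weights, statistics.fmean)
print(lower, upper)
```

Any statistic that maps a list of observations to a number (a median or a trimmed mean, for instance) can be passed in place of `statistics.fmean`.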

CHAPTER FOUR

HYPOTHESIS TESTING

Setting up and testing hypotheses is an essential part of statistical inference. In order to
formulate such a test, usually some theory has been put forward, either because it is
believed to be true or because it is to be used as a basis for argument, but has not been
proved. An example is the claim that a new marketing strategy is better than the current
one for a particular product.

In each problem considered, the question of interest is simplified into two competing
claims / hypotheses between which we have a choice. These are:

a) Null Hypothesis
The null hypothesis H0 represents a theory that has been put forward, either because it
is believed to be true or because it is to be used as a basis for argument, but has not been
proved. For example, in the study of the effects of a new finance policy on the
performance of a company, the null hypothesis might be that the new policy is no
better, on average, than the current policy. We would write
𝐻0 : There is no difference between the two financial policies on average.

b) Alternative Hypothesis
The alternative hypothesis, H1, is a statement of what a statistical hypothesis test is set
up to establish. For example, in the study of the effects of a new finance policy on the
performance of a company, the alternative hypothesis might be that the new policy
has a different effect on average compared to the current policy. We would write
𝐻1 : The two policies have different effects on average.

The alternative hypothesis might also be that the new policy is better, on average, than
the current one. In this case we would write
𝐻1 : The new policy is better than the current one on average

The final conclusion once the test has been carried out is always given in terms of the
null hypothesis. The two possible conclusions are:
- Reject 𝐻0 in favour of 𝐻1
- Fail to reject 𝐻0

Concluding "Fail to reject H0" does not necessarily mean that the null hypothesis is true;
it only suggests that there is not sufficient evidence against 𝐻0 in favour of 𝐻1 .

Rejecting the null hypothesis, then, suggests that the alternative hypothesis is likely to be
true.

When making a conclusion in hypothesis testing, two types of errors can be made:

Type I Error
A type I error occurs when the null hypothesis is rejected when it is in fact true; that is,
𝐻0 is wrongly rejected.

For example, in the study of the effects of a new finance policy on the performance of a
company, the null hypothesis might be that the new policy is no better, on average, than
the current policy. That is:
𝐻0 : There is no difference between the two financial policies on average.

A type I error would occur if we concluded that the two policies produced different
effects when in fact there was no difference between them.

Type II Error
A type II error occurs when the null hypothesis 𝐻0 , is not rejected when it is in fact false.
For example, in the study of the effects of a new finance policy on the performance of a
company, the null hypothesis might be that the new policy is no better, on average, than
the current policy. That is:
𝐻0 : There is no difference between the two financial policies on average.

A type II error would occur if it was concluded that the two policies produced the same
effect, i.e. there is no difference between the two policies on average, when in fact they
produced different ones.

A type II error is frequently due to sample sizes being too small.

The following table gives a summary of possible results of any hypothesis test:

| Truth | Decision: Reject $H_0$ | Decision: Don't reject $H_0$ |
| --- | --- | --- |
| $H_0$ true | Type I error | Right decision |
| $H_1$ true | Right decision | Type II error |

A type I error is often considered to be more serious, and therefore more important to
avoid, than a type II error. The hypothesis test procedure is therefore adjusted so that
there is a guaranteed 'low' probability of rejecting the null hypothesis wrongly; this
probability is never 0. This probability of a type I error can be precisely computed as
𝑃(𝑡𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟) = 𝛼

The exact probability of a type II error is generally unknown.

If we do not reject the null hypothesis, it may still be false (a type II error) as the sample
may not be big enough to identify the falseness of the null hypothesis (especially if the
truth is very close to hypothesis).

For any given set of data, type I and type II errors are inversely related; the smaller the
risk of one, the higher the risk of the other.
A type I error can also be referred to as an error of the first kind.

The probability of a type II error is generally unknown, but is symbolised by β and
written
𝑃(𝑡𝑦𝑝𝑒 𝐼𝐼 𝑒𝑟𝑟𝑜𝑟) = 𝛽
A type II error can also be referred to as an error of the second kind.

Significance Level, 𝜶
The significance level of a statistical hypothesis test is a fixed probability of wrongly
rejecting the null hypothesis H0, if it is in fact true.
It is the probability of a type I error and is set by the investigator in relation to the
consequences of such an error. That is, we want to make the significance level as small
as possible in order to protect the null hypothesis and to prevent, as far as possible, the
investigator from inadvertently making false claims.
The significance level is usually denoted by 𝛼
𝑆𝑖𝑔𝑛𝑖𝑓𝑖𝑐𝑎𝑛𝑐𝑒 𝐿𝑒𝑣𝑒𝑙 = 𝑃 (𝑡𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟) = 𝛼
Usually, the significance level is chosen to be 0.05 (or equivalently, 5%).

P-Value
The probability value (p-value) of a statistical hypothesis test is the probability of
getting a value of the test statistic as extreme as or more extreme than that observed by
chance alone, if the null hypothesis H0 is true. Unlike the significance level α, which is
fixed in advance, the p-value is computed from the observed data.

It is equal to the significance level of the test for which we would only just reject the null
hypothesis. The p-value is compared with the actual significance level of our test and, if
it is smaller, the result is significant. That is, if the null hypothesis were to be rejected at
the 5% significance level, this would be reported as "p < 0.05".

Small p-values suggest that the observed data would be unlikely if the null hypothesis were true.
The smaller the p-value, the more convincing is the rejection of the null hypothesis. It indicates
the strength of evidence for, say, rejecting the null hypothesis H0, rather than simply concluding
"Reject H0" or "Fail to reject H0".
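
To make the comparison with the significance level concrete, here is a minimal sketch for the special case of a test statistic that is standard normal under H0 (a z statistic). That assumption, the function name, and the observed value 2.3 are illustrative choices and do not come from the notes.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative="two-sided"):
    """p-value for an observed z statistic, assuming a standard normal null distribution."""
    upper_tail = 1.0 - NormalDist().cdf(z)
    if alternative == "greater":
        return upper_tail
    if alternative == "less":
        return 1.0 - upper_tail
    return 2.0 * min(upper_tail, 1.0 - upper_tail)   # two-sided: both tails count as extreme

alpha = 0.05
p = z_test_p_value(2.3)              # hypothetical observed statistic
decision = "Reject H0" if p < alpha else "Fail to reject H0"
print(p, decision)
```

Reporting `p` itself, rather than only the decision, conveys how strong the evidence against H0 is.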

Power
The power of a statistical hypothesis test measures the test's ability to reject the null
hypothesis when it is actually false - that is, to make a correct decision.

In other words, the power of a hypothesis test is the probability of not committing
a type II error. It is calculated by subtracting the probability of a type II error from 1,
usually expressed as:
𝑃𝑜𝑤𝑒𝑟 = 1 − 𝑃(𝑡𝑦𝑝𝑒 𝐼𝐼 𝑒𝑟𝑟𝑜𝑟) = 1 − 𝛽
The maximum power a test can have is 1, the minimum is 0. Ideally we want a test to
have high power, close to 1.

Test Statistic
A test statistic is a quantity calculated from our sample of data. Its value is used to
decide whether or not the null hypothesis should be rejected in our hypothesis test.
The choice of a test statistic will depend on the assumed probability model and the
hypotheses under question.

Critical Value(s) and Region

The critical value(s) for a hypothesis test is a threshold to which the value of the test
statistic in a sample is compared to determine whether or not the null hypothesis is
rejected.
The critical value for any hypothesis test depends on the significance level at which the
test is carried out, and whether the test is one-sided or two-sided.

The critical region, or rejection region, is a set of values of the test statistic for which the
null hypothesis is rejected in a hypothesis test. That is, the sample space for the test
statistic is partitioned into two regions; one region (the critical region) will lead us to
reject the null hypothesis H0, the other will not. So, if the observed value of the test
statistic is a member of the critical region, we conclude "Reject H0"; if it is not a member
of the critical region then we conclude "Fail to reject H0".
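
The partition of the sample space into a critical region and its complement can be sketched in a few lines. The example below assumes a z statistic (standard normal under H0); the helper `critical_region` and its argument names are hypothetical and only illustrate the idea.

```python
from statistics import NormalDist


def critical_region(alpha: float, alternative: str):
    """Return a function that tells whether a z statistic lies in the critical region."""
    inv = NormalDist().inv_cdf
    if alternative == "greater":          # upper-tailed test
        c = inv(1 - alpha)
        return lambda z: z >= c
    if alternative == "less":             # lower-tailed test
        c = inv(alpha)
        return lambda z: z <= c
    c = inv(1 - alpha / 2)                # two-sided test: alpha/2 in each tail
    return lambda z: abs(z) >= c


reject = critical_region(0.05, "two-sided")
print("Reject H0" if reject(2.1) else "Fail to reject H0")
```

The critical value changes with both the significance level and the direction of the test, which is why the same observed statistic can fall inside the critical region of a one-sided test and outside that of a two-sided test.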

One-sided Test
A one-sided test is a statistical hypothesis test in which the values for which we can
reject the null hypothesis, 𝐻0 are located entirely in one tail of the probability
distribution.

In other words, the critical region for a one-sided test is the set of values less than the
critical value of the test, or the set of values greater than the critical value of the test.

A one-sided test is also referred to as a one-tailed test of significance.

The choice between a one-sided and a two-sided test is determined by the purpose of
the investigation or prior reasons for using a one-sided test.

Example
Suppose we wanted to test a manufacturer’s claim that there are, on average, 50
matches in a box. We could set up the following hypotheses

𝐻0 : µ = 50,
against
𝐻1 : µ < 50 or 𝐻1 : µ > 50
Either of these two alternative hypotheses would lead to a one-sided test. Presumably,
we would want to test the null hypothesis against the first alternative hypothesis since
it would be useful to know if there is likely to be less than 50 matches, on average, in a
box (no one would complain if they get the correct number of matches in a box or
more).

Two-Sided Test
A two-sided test is a statistical hypothesis test in which the values for which we can
reject the null hypothesis, H0 are located in both tails of the probability distribution.
In other words, the critical region for a two-sided test is the set of values less than a first
critical value of the test and the set of values greater than a second critical value of the
test.

A two-sided test is also referred to as a two-tailed test of significance.

The choice between a one-sided test and a two-sided test is determined by the purpose
of the investigation or prior reasons for using a one-sided test.

Example
Consider again the manufacturer's claim that there are, on average, 50 matches in a
box. Besides the two one-sided alternatives 𝐻1 : µ < 50 and 𝐻1 : µ > 50, a third
alternative hypothesis could be tested against the same null, leading this time to a
two-sided test:
𝐻0 : µ = 50
𝐻1 : µ ≠ 50

Here the alternative does not specify a direction: if we reject the null hypothesis, we
conclude only that the average number of matches in a box is likely to differ from 50,
in one direction or the other.

Steps in Conducting Hypothesis Testing
i. Begin by stating the claim or hypothesis that is being tested. Also form a
statement for the case that the hypothesis is false. These are 𝐻0 and 𝐻1 .

ii. Choose the desired significance level 𝛼. The values 0.05 and 0.01 are common
values used for alpha, but any positive number between 0 and 0.50 could be used
for a significance level.

iii. Determine which statistic and distribution to use. The type of distribution is
dictated by features of the data. Common choices are the 𝑧, 𝑡, chi-squared (𝜒²)
and 𝐹 distributions.

iv. Compute the test statistic and the 𝑝 value for this statistic. Here we have to
consider whether we are conducting a two-tailed test (typically when the
alternative hypothesis contains an "is not equal to" symbol) or a one-tailed test
(typically when a strict inequality, < or >, appears in the alternative hypothesis).

v. If the 𝑝 value is less than the set significance level 𝛼, we reject the null
hypothesis in favour of the alternative. If the 𝑝 value is not less than 𝛼, we fail
to reject the null hypothesis. Failing to reject does not prove that the null
hypothesis is true; it only means that the data do not provide sufficient evidence
against it.

vi. We now state the results of the hypothesis test in such a way that the original
claim is addressed.

CHAPTER FIVE

INFERENCE ON PROPORTIONS

Let 𝑝 denote the proportion of individuals or objects in a population who possess a
specified property (e.g., cars with manual transmissions or smokers who smoke a filter
cigarette).

If an individual or object with the property is labelled a success (S), then 𝑝 is the
population proportion of successes.

Tests concerning 𝑝 are based on a random sample of size n from the population.

5.1 One - Sample Proportion Test

The one-sample proportion test compares the proportion of a sample to a known value,
usually the population proportion. The test is as follows:

Null hypothesis, $H_0 : p = p_0$

Alternative hypothesis    P value            Rejection region for a level α test
$H_1 : p > p_0$           $P(Z \ge z)$       $z \ge z_{\alpha}$ (upper-tailed)
$H_1 : p < p_0$           $P(Z \le z)$       $z \le -z_{\alpha}$ (lower-tailed)
$H_1 : p \ne p_0$         $2P(Z \ge |z|)$    Either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed)

Test statistic:
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} \]
where $\hat{p}$ is the sample proportion of successes.

Example 5.1
A plastics manufacturer has developed a new type of plastic trash can and proposes to
sell them with an unconditional 6-year warranty. To see whether this is economically
feasible, 20 prototype cans are subjected to an accelerated life test to simulate 6 years of
use. The proposed warranty will be modified only if the sample data strongly suggests
that fewer than 90% of such cans would survive the 6-year period. During the test 12
cans survive the test. Should the manufacturer implement the unconditional 6-year
warranty? Test at 𝛼 = 0.05.

Solution
Let 𝑝 denote the proportion of all cans that survive the accelerated test. The relevant
hypotheses are

𝐻0 : 𝑝 = 0.9
𝐻1 : 𝑝 < 0.9

The sample proportion is
\[ \hat{p} = \frac{12}{20} = 0.6 \]

The test statistic is
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.6 - 0.9}{\sqrt{0.9(1 - 0.9)/20}} = \frac{-0.3}{0.0671} = -4.47 \]

Since this is a lower-tailed test, the critical value is $-z_{\alpha} = -z_{0.05} = -1.645$.

Since the computed $z$ value is less than $-1.645$, we reject $H_0$ and conclude that
fewer than 90% of such cans would survive the 6-year period, so the proposed warranty
should be modified.
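
The calculation in Example 5.1 can be checked with a short script. This is a sketch using only the Python standard library; it takes the counts stated in the example (12 survivors out of 20) and the function name `one_sample_proportion_z` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes: int, n: int, p0: float) -> float:
    """z = (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = one_sample_proportion_z(successes=12, n=20, p0=0.9)
p_value = NormalDist().cdf(z)           # lower-tailed test: P(Z <= z)
z_crit = -NormalDist().inv_cdf(0.95)    # -z_0.05, about -1.645
print(round(z, 2), p_value, "reject H0" if z <= z_crit else "fail to reject H0")
```

Note that with n = 20 and p0 = 0.9 the expected number of failures under H0 is only 2, so the normal approximation behind this z test is rough; the script reproduces the method of the notes rather than an exact binomial test.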

5.2 Two - Sample Proportion Test

We let 𝑝1 and 𝑝2 denote the proportions of individuals in populations 1 and 2,
respectively, who possess a particular characteristic.

Assume the availability of a sample of 𝑚 individuals from the first population and
𝑛 from the second.

The variables 𝑋 and 𝑌 represent the number of individuals in each sample possessing
the characteristic that defines 𝑝1 and 𝑝2 .

The obvious estimator for $p_1 - p_2$, the difference in population proportions, is the
corresponding difference in sample proportions $\hat{p}_1 - \hat{p}_2$, with $\hat{p}_1 = x/m$ and $\hat{p}_2 = y/n$.

The two-sample proportion test compares the sample proportions from two
independent populations. The test is as follows:

Null hypothesis, $H_0 : p_1 = p_2$

Alternative hypothesis    P value            Rejection region for a level α test
$H_1 : p_1 > p_2$         $P(Z \ge z)$       $z \ge z_{\alpha}$ (upper-tailed)
$H_1 : p_1 < p_2$         $P(Z \le z)$       $z \le -z_{\alpha}$ (lower-tailed)
$H_1 : p_1 \ne p_2$       $2P(Z \ge |z|)$    Either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed)

Test statistic:
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}} \]
where
\[ \hat{p} = \frac{x + y}{m + n}, \qquad \hat{q} = 1 - \hat{p} \]

Example 5.2
Is someone who switches brands because of a financial inducement less likely to remain
loyal than someone who switches without inducement?

Let 𝑝1 and 𝑝2 denote the true proportions of switchers to a certain brand with and
without inducement, respectively, who subsequently make a repeat purchase. Given
the data below, test the appropriate hypothesis at 𝛼 = 0.01 .

With inducement: 𝑚 = 200, number of successes 𝑥 = 30

Without inducement: 𝑛 = 600, number of successes 𝑦 = 180

Solution
The null and alternative hypotheses are:
𝐻0 : 𝑝1 = 𝑝2
𝐻1 : 𝑝1 < 𝑝2

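The sample proportions are:
\[ \hat{p}_1 = \frac{30}{200} = 0.15, \qquad \hat{p}_2 = \frac{180}{600} = 0.3 \]
\[ \hat{p} = \frac{x + y}{m + n} = \frac{30 + 180}{200 + 600} = \frac{210}{800} = 0.2625, \qquad \hat{q} = 1 - 0.2625 = 0.7375 \]

The test statistic is
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}} = \frac{0.15 - 0.3}{\sqrt{0.2625 \times 0.7375 \left(\frac{1}{200} + \frac{1}{600}\right)}} = \frac{-0.15}{0.0359} = -4.1783 \]

The critical value for this lower-tailed test is $-z_{\alpha} = -z_{0.01} = -2.33$.

Since the computed $z$ value is far less than $-2.33$, we reject $H_0$ and conclude
that someone who switches brands because of a financial inducement is less likely to
remain loyal than someone who switches without inducement.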
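
The pooled two-sample statistic of Example 5.2 can be reproduced as follows. This is a minimal sketch with the standard library only; the function name `two_sample_proportion_z` is hypothetical, and the unrounded result differs from the hand calculation in the third decimal because the standard error is not rounded to 0.0359.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_proportion_z(x: int, m: int, y: int, n: int) -> float:
    """Pooled z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)                      # pooled proportion
    se = sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se


z = two_sample_proportion_z(x=30, m=200, y=180, n=600)
z_crit = -NormalDist().inv_cdf(0.99)               # -z_0.01, about -2.33
print(round(z, 3), "reject H0" if z <= z_crit else "fail to reject H0")
```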

CHAPTER SIX

INFERENCE ON MEANS

6.1 One-Sample T-Test

The one-sample t-test compares the mean of a sample to a known value, usually
the population mean (the average for the outcome of some population of interest). The
basic idea of the test is a comparison of the average of the sample (observed average)
and the population (expected average), with an adjustment for the number of cases in
the sample and the sample standard deviation.

The test is as follows:

Null hypothesis, $H_0 : \mu = \mu_0$

Alternative hypothesis    P value                   Rejection region for a level α test
$H_1 : \mu > \mu_0$       $P(T_{n-1} \ge t)$        $t \ge t_{\alpha,n-1}$ (upper-tailed)
$H_1 : \mu < \mu_0$       $P(T_{n-1} \le t)$        $t \le -t_{\alpha,n-1}$ (lower-tailed)
$H_1 : \mu \ne \mu_0$     $2P(T_{n-1} \ge |t|)$     Either $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$ (two-tailed)

Test statistic:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]

Example 6.1
A manufacturer claims that the average weight of a certain product is 3.3 kg. A random
sample of ten such products gave the following results:
2.6 2.2 2.9 3.4 3.4 3.7 1.7 2.7 3.3 2.3

Test at 𝛼 = 0.05 the hypothesis that the mean weight of the sampled products differs
from the claimed figure.

Solution
Let $\mu$ denote the mean weight of the product.
The hypotheses to be tested are

$H_0 : \mu = 3.3$
vs
$H_1 : \mu \ne 3.3$

This is a two-tailed test at $\alpha = 0.05$.

In this case $n = 10$ and $\bar{x} = 28.2/10 = 2.82$. The deviations from the mean are:

𝑖       𝑥𝑖     𝑥𝑖 − 𝑥̅    (𝑥𝑖 − 𝑥̅)²
1       2.6    -0.22     0.0484
2       2.2    -0.62     0.3844
3       2.9     0.08     0.0064
4       3.4     0.58     0.3364
5       3.4     0.58     0.3364
6       3.7     0.88     0.7744
7       1.7    -1.12     1.2544
8       2.7    -0.12     0.0144
9       3.3     0.48     0.2304
10      2.3    -0.52     0.2704
Total                    3.656

\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = 3.656, \qquad s = \sqrt{\frac{3.656}{9}} = 0.6374 \]

\[ t = \frac{2.82 - 3.3}{0.6374/\sqrt{10}} = \frac{-0.48}{0.2016} = -2.3810 \]

The tabulated t value for this two-tailed test is $t_{\alpha/2,n-1} = t_{0.025,9} = 2.262$.

Since the computed t value is greater than the tabulated one in absolute terms, we reject
$H_0$ and conclude that the mean weight of the product differs significantly
from the claimed figure.
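
The arithmetic of Example 6.1 can be verified with the standard library. The sketch below computes only the test statistic; the critical value 2.262 is the tabulated figure quoted in the text, entered as a constant, and the function name `one_sample_t` is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


weights = [2.6, 2.2, 2.9, 3.4, 3.4, 3.7, 1.7, 2.7, 3.3, 2.3]
t = one_sample_t(weights, mu0=3.3)
T_CRIT = 2.262   # tabulated two-tailed value for 9 degrees of freedom (from the text)
print(round(t, 3), "reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```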

Exercise 6.1
According to a recent report on accident claims by an insurance company, the average
number of accident claims reported per branch is 3 per day. A random sample of twenty
branches on a given day yielded the following numbers of claim incidents:

3 1 2 1 2 3 5 2 5 1
3 2 3 1 3 3 4 3 1 8

Test at 𝛼 = 0.05 the hypothesis that the mean number of claim incidents in these
branches is less than the one in the report.

6.2 Two-Sample T-Test

We often want to know whether the means of two populations on some outcome differ.
For example, there are many questions in which we want to compare two categories of
some categorical variable (e.g., compare males and females) or two populations
receiving different treatments in context of an experiment. The two-sample t-test is a
hypothesis test for answering questions about the mean where the data are collected
from two random samples of independent observations.

The test is as follows:

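Null hypothesis, $H_0 : \mu_1 = \mu_2$

Alternative hypothesis      P value                  Rejection region for a level α test
$H_1 : \mu_1 > \mu_2$       $P(T_{\nu} \ge t)$       $t \ge t_{\alpha,\nu}$ (upper-tailed)
$H_1 : \mu_1 < \mu_2$       $P(T_{\nu} \le t)$       $t \le -t_{\alpha,\nu}$ (lower-tailed)
$H_1 : \mu_1 \ne \mu_2$     $2P(T_{\nu} \ge |t|)$    Either $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$ (two-tailed)

Test statistic:
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} \]
where
\[ \nu = m + n - 2 \]
\[ s_1^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})^2 \]
\[ s_2^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 \]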

Example 6.2
As an extension to Example 6.1, suppose the management of the company suspects that
there is a difference in weight of products between those produced during day shift and
those produced during night shift. A random sample of 18 products gave the following
results:
Day      3.7 2.7 2.6 4.1 3.7 3.3 3.3 4.2 2.8 3.6
Night    3.2 2.3 3.3 2.6 3.3 3.4 3.1 3.6

Test at 𝛼 = 0.05 the hypothesis that the mean weight of day shift products is greater
than that of night shift products in the company.

Solution
Let $\mu_1$ denote the mean weight of day shift products and $\mu_2$ the mean weight of
night shift products.

The hypotheses to be tested are

$H_0 : \mu_1 = \mu_2$
vs
$H_1 : \mu_1 > \mu_2$

This is a one-tailed test at $\alpha = 0.05$.

In this case $m = 10$, $n = 8$, $\bar{x} = 3.4$ and $\bar{y} = 3.1$.

𝑥𝑖     𝑥𝑖 − 𝑥̅    (𝑥𝑖 − 𝑥̅)²    𝑦𝑖     𝑦𝑖 − 𝑦̅    (𝑦𝑖 − 𝑦̅)²
3.7     0.3      0.09        3.2     0.1      0.01
2.7    -0.7      0.49        2.3    -0.8      0.64
2.6    -0.8      0.64        3.3     0.2      0.04
4.1     0.7      0.49        2.6    -0.5      0.25
3.7     0.3      0.09        3.3     0.2      0.04
3.3    -0.1      0.01        3.4     0.3      0.09
3.3    -0.1      0.01        3.1     0        0
4.2     0.8      0.64        3.6     0.5      0.25
2.8    -0.6      0.36
3.6     0.2      0.04
Total            2.86                         1.32

\[ s_1^2 = \frac{1}{m-1} \sum (x_i - \bar{x})^2 = \frac{2.86}{9} = 0.3178 \]
\[ s_2^2 = \frac{1}{n-1} \sum (y_i - \bar{y})^2 = \frac{1.32}{7} = 0.1886 \]
\[ t = \frac{3.4 - 3.1}{\sqrt{\frac{0.3178}{10} + \frac{0.1886}{8}}} = \frac{0.3}{\sqrt{0.0318 + 0.0236}} = \frac{0.3}{0.2354} = 1.2744 \]

The tabulated $t$ value for this one-tailed test is $t_{\alpha,\nu} = t_{0.05,16} = 1.746$.

Since the computed $t$ value is less than the tabulated one, we fail to reject $H_0$ and
conclude that there is insufficient evidence that the mean weight of day shift products
is greater than that of night shift products in the company.
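
Example 6.2 can be checked in the same way. The sketch below follows the statistic and degrees of freedom exactly as defined in this section (separate sample variances, 𝜈 = m + n − 2); the function name `two_sample_t` is hypothetical and the critical value 1.746 is the tabulated figure from the text.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(xs, ys):
    """Return (t, nu) with t = (x_bar - y_bar) / sqrt(s1^2/m + s2^2/n), nu = m + n - 2."""
    m, n = len(xs), len(ys)
    se = sqrt(variance(xs) / m + variance(ys) / n)   # variance() uses the n - 1 divisor
    return (mean(xs) - mean(ys)) / se, m + n - 2


day = [3.7, 2.7, 2.6, 4.1, 3.7, 3.3, 3.3, 4.2, 2.8, 3.6]
night = [3.2, 2.3, 3.3, 2.6, 3.3, 3.4, 3.1, 3.6]
t, nu = two_sample_t(day, night)
T_CRIT = 1.746   # tabulated one-tailed value for 16 degrees of freedom (from the text)
print(round(t, 3), nu, "reject H0" if t >= T_CRIT else "fail to reject H0")
```

The unrounded statistic is about 1.275; the hand calculation gives 1.2744 because intermediate values are rounded to four decimals.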

Exercise 6.2
A study was taken to establish whether there is a difference in the mean sales between
the male marketers and female ones. The monthly sales in KES 100,000 for the
marketers grouped by gender are shown below. Test the appropriate hypotheses using
𝛼 = 0.05 significance level.
Male      27.4 25.4 28.5 31.1 30.4 31.5 23.4 27.5 30.2 25.8 26.6 24.6 24.3 26.3 26.5
Female    27.8 31.2 29.2 25.6 26.8 32.9 28.3 30.3 25.5 28.8 28.8 26.8

6.3 Paired T Test

This test compares one set of measurements with a second set from the same sample. It
is often used to compare “before” and “after” scores in experiments to determine
whether significant change has occurred.

The test is as follows:

Null hypothesis, $H_0 : \mu_1 = \mu_2$

Alternative hypothesis      P value                   Rejection region for a level α test
$H_1 : \mu_1 > \mu_2$       $P(T_{n-1} \ge t)$        $t \ge t_{\alpha,n-1}$ (upper-tailed)
$H_1 : \mu_1 < \mu_2$       $P(T_{n-1} \le t)$        $t \le -t_{\alpha,n-1}$ (lower-tailed)
$H_1 : \mu_1 \ne \mu_2$     $2P(T_{n-1} \ge |t|)$     Either $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$ (two-tailed)

Test statistic:
\[ t = \frac{\bar{d}}{s_d/\sqrt{n}} \]
where $\bar{d}$ and $s_d$ are the sample mean and standard deviation respectively of the
differences $d_i$ between the first and second observations within a pair, and $n$ is the
number of pairs.

Example 6.3
An investor tries out a new investment strategy. The following data represents the
monthly percentage returns for one year before and after the implementation of the
strategy. Did the new strategy work? Test at α=0.05.

Month 1 2 3 4 5 6 7 8 9 10 11 12
Before 8.5 7.8 11.2 1.1 7.5 3.9 8.2 3.1 10.3 10.2 4.5 11.3
After 8.2 9.8 10.2 10.5 14.2 12.4 11.8 15.5 6.1 11.9 8.6 17.6

Solution
Null hypothesis: $H_0 : \mu_1 = \mu_2$
Alternative hypothesis: $H_1 : \mu_1 < \mu_2$
where $\mu_1$ and $\mu_2$ are the average monthly percentage returns before and after the
implementation of the strategy respectively. The differences are taken as
$d_i$ = After − Before, so the alternative corresponds to a positive mean difference and
the test is upper-tailed in $t$.

Month    After    Before    𝑑𝑖      𝑑𝑖 − 𝑑̅    (𝑑𝑖 − 𝑑̅)²
1         8.2      8.5     -0.3     -4.4      19.36
2         9.8      7.8      2.0     -2.1       4.41
3        10.2     11.2     -1.0     -5.1      26.01
4        10.5      1.1      9.4      5.3      28.09
5        14.2      7.5      6.7      2.6       6.76
6        12.4      3.9      8.5      4.4      19.36
7        11.8      8.2      3.6     -0.5       0.25
8        15.5      3.1     12.4      8.3      68.89
9         6.1     10.3     -4.2     -8.3      68.89
10       11.9     10.2      1.7     -2.4       5.76
11        8.6      4.5      4.1      0.0       0.00
12       17.6     11.3      6.3      2.2       4.84
TOTAL                                        252.62

\[ \bar{d} = \frac{1}{12} \sum d_i = \frac{49.2}{12} = 4.1 \]
\[ s_d = \sqrt{\frac{1}{n-1} \sum (d_i - \bar{d})^2} = \sqrt{\frac{252.62}{11}} = \sqrt{22.9655} = 4.7922 \]
\[ t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{4.1}{4.7922/\sqrt{12}} = \frac{4.1}{1.3834} = 2.9637 \]

The tabulated $t$ value for this one-tailed test is $t_{0.05,11} = 1.796$.

Since the computed $t$ value is greater than the tabulated one, we reject $H_0$ and conclude
that the strategy worked in significantly increasing returns.
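
A short script reproduces the paired calculation of Example 6.3. It is a sketch using the standard library; differences are taken as After − Before, the function name `paired_t` is illustrative, and 1.796 is the tabulated critical value quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic on the differences d_i = second_i - first_i."""
    d = [b - a for a, b in zip(first, second)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


before = [8.5, 7.8, 11.2, 1.1, 7.5, 3.9, 8.2, 3.1, 10.3, 10.2, 4.5, 11.3]
after = [8.2, 9.8, 10.2, 10.5, 14.2, 12.4, 11.8, 15.5, 6.1, 11.9, 8.6, 17.6]
t = paired_t(before, after)
T_CRIT = 1.796   # tabulated one-tailed value for 11 degrees of freedom (from the text)
print(round(t, 4), "reject H0" if t >= T_CRIT else "fail to reject H0")
```

Working on the differences rather than on the two columns separately is what distinguishes the paired test from the two-sample test of Section 6.2: month-to-month variation common to both measurements cancels out.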

Exercise 6.3
Compare the prices of 15 household goods in two supermarkets, A and B:
A B
109 101
128 137
63 62
71 65
136 138
100 91
136 144
73 80
85 81
81 78
77 70
94 101
63 58
121 114
85 83

CHAPTER SEVEN

ANALYSIS OF VARIANCE (ANOVA)

Analysis of Variance, popularly known as the ANOVA, is used to compare the means
in cases where there are more than two groups.

When we have only two samples we can use the t-test to compare the means of the
samples, but with more than two samples repeated pairwise t-tests become unreliable
because the chance of a type I error grows with the number of comparisons. If we only
compare two means, then the t-test (independent samples) will give the same results as
the ANOVA.

The test is as follows:

Assume that independent random samples have been drawn from k populations with
means 𝜇1 , 𝜇2 , … , 𝜇𝑘 , respectively. Let 𝑛𝑖 , 𝑖 = 1, 2, … , 𝑘, be the number of observations in
the sample drawn from the 𝑖 − 𝑡ℎ population.

The null and alternative hypotheses are

𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
𝐻1 : 𝜇𝑖 ≠ 𝜇𝑗
For at least one 𝑖, 𝑗

The total number of observations in the experiment is 𝑛 = 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘

Let $y_{ij}$ denote the response for the $j$-th experimental unit in the $i$-th sample, and
let $y_{i.}$ and $\bar{y}_{i.}$ represent the total and mean of the $n_i$ responses in the $i$-th sample.

The overall mean is
\[ \bar{y} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}}{n} \]

First generate the ANOVA table shown below:

Source of variation    Degrees of freedom    Sum of squares       Mean sum of squares    F              P value
Between groups         $k-1$                 SSG                  MSG = SSG/($k-1$)      F = MSG/MSE    $P(F_{k-1,n-k} > F)$
Within groups          $n-k$                 SSE = SST − SSG      MSE = SSE/($n-k$)
Total                  $n-1$                 SST

where
\[ SSG = \sum_{i=1}^{k} n_i (\bar{y}_{i.} - \bar{y})^2, \qquad SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 \]

$H_0$ is rejected if
\[ F = \frac{MSG}{MSE} > F_{k-1,n-k,\alpha} \]

Example 7
A microfinance institution has four main plans for recruiting customers. The data below
show the number of customers recruited under each of these plans by 23 assistants in six
months. Do the plans differ in mean achievement?

Plan      I        II       III      IV
          59       65       75       94
          78       87       69       89
          67       73       83       80
          62       79       81       88
          83       81       72
          76       69       79
                            90
𝑦𝑖.       425      454      549      351
𝑛𝑖        6        6        7        4
𝑦̅𝑖.       70.83    75.67    78.43    87.75

Solution
$k = 4$, $n = 23$

\[ \bar{y} = \frac{425 + 454 + 549 + 351}{6 + 6 + 7 + 4} = \frac{1779}{23} = 77.35 \]

Carrying the unrounded means through the arithmetic (the values shown are rounded to
two decimals):

SSG = 6(70.83 − 77.35)² + 6(75.67 − 77.35)² + 7(78.43 − 77.35)² + 4(87.75 − 77.35)²

SSG = 254.632 + 16.958 + 8.176 + 432.821

SSG = 712.587

\[ SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = 1909.218 \]

SSE = SST − SSG = 1909.218 − 712.587 = 1196.631

MSG = SSG/(k − 1) = 712.587/3 = 237.529

MSE = SSE/(n − k) = 1196.631/19 = 62.981

F = MSG/MSE = 237.529/62.981 = 3.771

From F tables, $F_{3,19,0.05} = 3.13$

Since the computed F value is greater than the tabulated one, we reject $H_0$ and conclude
that there is a significant difference in mean achievement among the four plans.

The squared deviations used to compute SST are:

𝒊     𝒋     𝒚𝒊𝒋    (𝒚𝒊𝒋 − 𝒚̅)    (𝒚𝒊𝒋 − 𝒚̅)²
1     1     59     -18.35      336.7225
1     2     78       0.65        0.4225
1     3     67     -10.35      107.1225
1     4     62     -15.35      235.6225
1     5     83       5.65       31.9225
1     6     76      -1.35        1.8225
2     1     65     -12.35      152.5225
2     2     87       9.65       93.1225
2     3     73      -4.35       18.9225
2     4     79       1.65        2.7225
2     5     81       3.65       13.3225
2     6     69      -8.35       69.7225
3     1     75      -2.35        5.5225
3     2     69      -8.35       69.7225
3     3     83       5.65       31.9225
3     4     81       3.65       13.3225
3     5     72      -5.35       28.6225
3     6     79       1.65        2.7225
3     7     90      12.65      160.0225
4     1     94      16.65      277.2225
4     2     89      11.65      135.7225
4     3     80       2.65        7.0225
4     4     88      10.65      113.4225
TOTAL                         1909.2175

The ANOVA table is:

Source        Degrees of freedom    SS          MSS        F
Treatments    3                     712.587     237.529    3.771
Error         19                    1196.631    62.981
Total         22                    1909.218
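
The ANOVA table of Example 7 can be regenerated programmatically. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the dictionary keys are illustrative, and the critical value 3.13 is the tabulated figure from the text.

```python
def one_way_anova(groups):
    """Return the sums of squares, mean squares and F ratio for a one-way layout."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # between-group sum of squares: n_i * (group mean - overall mean)^2
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # total sum of squares around the overall mean
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
    sse = sst - ssg
    msg, mse = ssg / (k - 1), sse / (n - k)
    return {"SSG": ssg, "SSE": sse, "SST": sst, "MSG": msg, "MSE": mse, "F": msg / mse}


plans = [
    [59, 78, 67, 62, 83, 76],        # Plan I
    [65, 87, 73, 79, 81, 69],        # Plan II
    [75, 69, 83, 81, 72, 79, 90],    # Plan III
    [94, 89, 80, 88],                # Plan IV
]
result = one_way_anova(plans)
F_CRIT = 3.13   # tabulated F value for (3, 19) degrees of freedom at 0.05 (from the text)
print({name: round(value, 3) for name, value in result.items()})
print("reject H0" if result["F"] > F_CRIT else "fail to reject H0")
```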

Exercise 7
A local bank has four branch offices. The bank has a liberal sick leave policy, and a
vice-president was concerned about employees taking advantage of this policy. She
thought that the tendency to take advantage depended on the branch at which the
employee worked. To see whether there were differences in the time employees took for
sick leave, she asked each branch manager to sample employees randomly and record
the number of days of sick leave taken during 2015. Twenty employees were chosen,
and the data are listed below:

Branch
A B C D
13 13 11 13
13 15 12 7
12 15 12 12
16 17 14 8
25 25 22 10

Does the data indicate a difference in branches? Use a level of significance of 0.05.

CHAPTER EIGHT

CATEGORICAL DATA ANALYSIS

8.1 Introduction
A great deal of the data collected by scientists, medical statisticians and economists is in
the form of counts (whole numbers or integers). The numbers of individuals that died,
the number of firms going bankrupt, the number of days of frost, the number of red
blood cells on a microscope slide, or the number of craters in a sector of lunar landscape
are all potentially interesting variables for study.

Categorical data can be considered to be of four types:

• Data on frequencies, where we count how many times something happened, but we
have no way of knowing how often it did not happen (e.g. lightning strikes,
bankruptcies, deaths, births, etc.)
• Data on proportions, where both the number doing a particular thing, and the total
group size are known (insects dying in an insecticide bioassay, sex ratios at birth,
proportions responding in a questionnaire)
• Grouped data, in which the response variable is a count distributed across a
categorical variable with two or more levels
• Binary response variables (dead or alive, solvent or insolvent, infected or immune)

8.2 Contingency Tables

In this case the data table has 𝐼 rows (𝐼 ≥ 2) and 𝐽 columns hence 𝐼𝐽 cells.
There are two commonly encountered situations in which such data arises:

i. There are 𝐼 populations of interest, each corresponding to a different row of the
table, and each population is divided into the same 𝐽 categories. A sample is
taken from the 𝑖𝑡ℎ population (𝑖 = 1, … , 𝐼 ), and the counts are entered in the cells
in the 𝑖𝑡ℎ row of the table. For example, customers of each of 𝐼 = 3 department
store chains might have available the same 𝐽 = 5 payment categories: cash,
check, store credit card, Visa, and MasterCard.

ii. There is a single population of interest, with each individual in the population
categorized with respect to two different factors. There are 𝐼 categories associated
with the first factor, and 𝐽 categories associated with the second factor. A single
sample is taken, and the number of individuals belonging in both category 𝑖 of
factor 1 and category 𝑗 of factor 2 is entered in the cell in row 𝑖, column 𝑗 (𝑖 =
1, … , 𝐼 ; 𝑗 = 1, … , 𝐽). As an example, customers making a purchase might be
classified according to department in which the purchase was made, with 𝐼 = 6
departments, and according to method of payment, with 𝐽 = 5 as in (i) above.

Let $n_{ij}$ denote the number of individuals in the sample falling in the $(i, j)$th cell of
the table.

The table displaying the $n_{ij}$'s is called a two-way contingency table; a prototype is
shown below:

       1           2           …     j           …     J
1      $n_{11}$    $n_{12}$    …     $n_{1j}$    …     $n_{1J}$
2      $n_{21}$    $n_{22}$    …     $n_{2j}$    …     $n_{2J}$
⋮      ⋮           ⋮                 ⋮                 ⋮
i      $n_{i1}$    $n_{i2}$    …     $n_{ij}$    …     $n_{iJ}$
⋮      ⋮           ⋮                 ⋮                 ⋮
I      $n_{I1}$    $n_{I2}$    …     $n_{Ij}$    …     $n_{IJ}$

8.3 Pearson’s Chi-Squared Tests of Independence

In two-way contingency tables for two response variables, the null hypothesis of
statistical independence is
𝐻0 : The two variables are independent
The estimated expected frequencies are
\[ \hat{\mu}_{ij} = \frac{n_{i+} \, n_{+j}}{n} \]
This is the row total for the cell multiplied by the column total for the cell, divided by
the overall sample size.

For testing independence in $I \times J$ contingency tables, the Pearson chi-squared
statistic is:
\[ \chi^2 = \sum \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} = \sum \frac{(O - E)^2}{E} \]
where the sum runs over all $IJ$ cells, $O$ is an observed count and $E$ the corresponding
expected count.
It has a large-sample chi-squared distribution with $(I - 1)(J - 1)$ degrees of freedom.

$H_0$ is rejected if $\chi^2 > \chi^2_{\alpha,(I-1)(J-1)}$

Example 8
Suppose you want to determine if certain types of products sell better in certain
geographic locations than others. Consider the accompanying data of number of sales of
three products in three regions. Test the hypothesis of independence between type of
product and region

Product
Region I II III Total
A 31 14 45 90
B 22 15 37 74
C 33 35 18 86
Total 86 64 100 250

Solution
We could now set up the following table:

Observed Expected (𝑂 − 𝐸) (𝑂 − 𝐸)2 (𝑂 − 𝐸)2 /𝐸

31 30.96 0.04 0.0016 0.0001
14 23.04 -9.04 81.7216 3.5469
45 36.00 9.00 81.0000 2.2500
22 25.46 -3.46 11.9716 0.4702
15 18.94 -3.94 15.5236 0.8196
37 29.60 7.40 54.7600 1.8500
33 29.58 3.42 11.6964 0.3954
35 22.02 12.98 168.4804 7.6512
18 34.40 -16.40 268.9600 7.8186
Total 24.8021

𝜒 2 = 24.8021. Degrees of Freedom = (𝐼 − 1)(𝐽 − 1) = 2(2) = 4

Reject 𝐻0 because 24.8021 is greater than 9.488 (for 𝛼 = 0.05)

Thus, we would reject the null hypothesis that there is no relationship between type of
product and region. Our data tell us there is a statistically significant relationship
between type of product and region.
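
The expected counts and the chi-squared statistic of Example 8 can be computed directly from the observed table. The sketch below uses plain Python; the function name `chi_squared_independence` is hypothetical and 9.488 is the tabulated critical value from the text. Because the expected counts are not rounded to two decimals, the result (about 24.81) differs slightly from the hand calculation.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an I x J table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # row total x column total / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


sales = [
    [31, 14, 45],   # Region A
    [22, 15, 37],   # Region B
    [33, 35, 18],   # Region C
]
chi2, df = chi_squared_independence(sales)
CHI2_CRIT = 9.488   # tabulated value for 4 degrees of freedom at 0.05 (from the text)
print(round(chi2, 4), df, "reject H0" if chi2 > CHI2_CRIT else "fail to reject H0")
```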

Exercise 8
A company packages a particular product in cans of three different sizes, each one
using a different production line. Most cans conform to specifications, but a quality
control engineer has identified the following reasons for non-conformance: (1) blemish
on can; (2) crack in can; (3) improper pull tab location; (4) pull tab missing; (5) other. A
sample of nonconforming units is selected from each of the three lines, and each unit is
categorized according to reason for nonconformity, resulting in the following
contingency table data:

Does the data suggest that the proportions falling in the various non-conformance
categories are not the same for the three lines?

CHAPTER NINE

REGRESSION AND CORRELATION

9.1 Introduction
Regression analysis involves identifying the relationship between a dependent variable
and one or more independent variables. A model of the relationship is hypothesized,
and estimates of the parameter values are used to develop an estimated regression
equation. Various tests are then employed to determine if the model is satisfactory. If
the model is deemed satisfactory, the estimated regression equation can be used to
predict the value of the dependent variable given values for the independent variables.

The correlation is a measure of linear association between two variables. Values of the
correlation coefficient are always between -1 and +1. A correlation coefficient of +1
indicates that two variables are perfectly related in a positive linear sense; a correlation
coefficient of -1 indicates that two variables are perfectly related in a negative linear
sense, and a correlation coefficient of 0 indicates that there is no linear relationship
between the two variables.

The quantity r, called the Pearson product moment correlation coefficient, measures the
strength and the direction of a sample linear relationship between two variables. The
formula for computing r is:

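\[ r = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \]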

9.2 Simple Linear Regression Model

Simple linear regression is a statistical method that allows us to summarize and study
relationships between two continuous (quantitative) variables. One variable, denoted 𝑋,
is regarded as the predictor, explanatory, or independent variable. The other variable,
denoted 𝑌, is regarded as the response, outcome or dependent variable.

The relationship between the two variables is :

𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝜀

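The least squares regression line is given by
\[ \hat{y} = b_0 + b_1 x \]
where
\[ b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]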

Example 9.1
A study was made on the profitability of certain small ventures depending on amount
invested. The data were recorded as follows in KES 10000;

Invested amount, x 2.1 1.6 1.9 1.7 1.4 1.2 1.3 1.1 2.3 1.4
Profit, y 10.6 7.7 8.6 7.6 7.8 5.9 7.2 5.4 9.6 5.6
a) Determine the regression equation.
b) Compute the Pearson product moment correlation coefficient.

Solution
a) The regression equation
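The least squares regression line is given by
\[ \hat{y} = b_0 + b_1 x \]
where
\[ b_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]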

         𝑥       𝑦       𝑥𝑦       𝑥²      𝑦²
         2.1     10.6    22.26    4.41    112.36
         1.6     7.7     12.32    2.56    59.29
         1.9     8.6     16.34    3.61    73.96
         1.7     7.6     12.92    2.89    57.76
         1.4     7.8     10.92    1.96    60.84
         1.2     5.9     7.08     1.44    34.81
         1.3     7.2     9.36     1.69    51.84
         1.1     5.4     5.94     1.21    29.16
         2.3     9.6     22.08    5.29    92.16
         1.4     5.6     7.84     1.96    31.36
Total    16      76      127.06   27.02   603.54

\[ b_1 = \frac{10 \times 127.06 - (16)(76)}{10 \times 27.02 - (16)^2} = \frac{1270.6 - 1216}{270.2 - 256} = \frac{54.6}{14.2} = 3.845 \]

\[ b_0 = 7.6 - (3.845)(1.6) = 7.6 - 6.152 = 1.448 \]

Thus
\[ \hat{y} = 1.448 + 3.845x \]

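b) The Pearson product moment correlation coefficient, $r$

\[ r = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \]

\[ r = \frac{10 \times 127.06 - (16)(76)}{\sqrt{[10 \times 27.02 - (16)^2][10 \times 603.54 - (76)^2]}} = \frac{54.6}{\sqrt{14.2 \times 259.4}} = 0.8996 \]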

This shows a strong positive correlation between 𝑦 and 𝑥.
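
The computing formulas for the slope, intercept and correlation coefficient translate directly into code. The sketch below applies them to the data of Example 9.1 using plain Python; the function name `simple_regression` is an illustrative choice.

```python
from math import sqrt


def simple_regression(x, y):
    """Return (b0, b1, r) from the sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy          # shared by b1 and r
    b1 = numerator / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n              # y_bar - b1 * x_bar
    r = numerator / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b0, b1, r


invested = [2.1, 1.6, 1.9, 1.7, 1.4, 1.2, 1.3, 1.1, 2.3, 1.4]
profit = [10.6, 7.7, 8.6, 7.6, 7.8, 5.9, 7.2, 5.4, 9.6, 5.6]
b0, b1, r = simple_regression(invested, profit)
print(round(b0, 3), round(b1, 3), round(r, 4))
```

Since the slope and the correlation coefficient share the same numerator, they always have the same sign.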

Exercise 9.1
The following are heights in cm and weights in kg of 10 men
Height 162 168 174 176 180 180 182 184 186 186
Weight 65 65 84 63 75 76 82 65 80 81
a) Draw the scatter diagram for the data
b) Find the regression equation
c) Compute the Pearson’s correlation coefficient

9.3 Multiple Linear Regression Model

The model used to describe the relationship between a single dependent variable $y$ and
$p$ independent variables $x_1, x_2, \ldots, x_p$ is:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon \]

The least squares estimate is given by

𝑦̂ = 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2 + ⋯ + 𝑏𝑝 𝑥𝑝

This is obtained by solving the normal equations. Consider the case where 𝑝 = 2. The
normal equations are:
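\[ n b_0 + b_1 \sum_{i=1}^{n} x_{i1} + b_2 \sum_{i=1}^{n} x_{i2} = \sum_{i=1}^{n} y_i \]

\[ b_0 \sum_{i=1}^{n} x_{i1} + b_1 \sum_{i=1}^{n} x_{i1}^2 + b_2 \sum_{i=1}^{n} x_{i1} x_{i2} = \sum_{i=1}^{n} x_{i1} y_i \]

\[ b_0 \sum_{i=1}^{n} x_{i2} + b_1 \sum_{i=1}^{n} x_{i1} x_{i2} + b_2 \sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2} y_i \]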
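
For 𝑝 = 2 the normal equations form a 3 × 3 linear system, which can be assembled from the data and solved by elimination. The following is a sketch in plain Python; the function names `solve_linear` and `fit_two_predictors` are hypothetical, and the small data set is synthetic (generated from y = 1 + 2x₁ + 3x₂) purely to illustrate the mechanics.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                      # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Build the three normal equations for p = 2 and return [b0, b1, b2]."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    n = len(y)
    a = [
        [n,       sum(x1),     sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    rhs = [sum(y), dot(x1, y), dot(x2, y)]
    return solve_linear(a, rhs)


# Synthetic illustration: y is generated exactly from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

The coefficient matrix is symmetric, which is why the same cross-product sum appears in both the second and third equations.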

The ANOVA table for the multiple linear regression model is:

Source of variation    Degrees of freedom    Sum of squares                          Mean squares           F value
Regression             $p$                   $SSR = \sum (\hat{y}_i - \bar{y})^2$    $MSR = SSR/p$          $F = MSR/MSE$
Error                  $n-p-1$               $SSE = SST - SSR$                       $MSE = SSE/(n-p-1)$
Total                  $n-1$                 $SST = \sum (y_i - \bar{y})^2$

Testing Hypotheses in Multiple Linear Regression

Once we have fit a multiple linear regression model and obtained estimates for the
various parameters of interest, we want to answer questions about the contributions of
various factors to the prediction of the response variable.
There are two types of such questions:

a) Overall test. Taken collectively, does the entire set of explanatory or independent
variables contribute significantly to the prediction of response?
The null hypothesis for this test may be stated as: ‘‘All 𝑝 independent variables
considered together do not explain the variation in the responses.’’ In other words,
𝐻0 : 𝛽1 = 𝛽2 = ⋯ = 𝛽𝑝 = 0
The $F$ statistic can be used to test this global null hypothesis. $H_0$ is rejected if the
computed $F$ statistic is greater than the tabulated one, $F_{\alpha,p,n-p-1}$.

The R-squared ($R^2$) statistic provides a measure of how well the model is fitting the
actual data. It gives the proportion of the variance in the dependent variable that is
predictable from the independent variables. It is given by
\[ R^2 = \frac{SSR}{SST} \]

In multiple regression settings, $R^2$ never decreases as more variables are
included in the model. That is why the adjusted $R^2$ is the preferred measure, as it adjusts
for the number of variables considered. It is defined as
\[ \text{Adjusted } R^2 = R^2 - \frac{(1 - R^2)\,p}{n - p - 1} \]

b) Test for the value of a single factor. Does the addition of one particular variable of
interest add significantly to the prediction of response over and above that
achieved by other independent variables?
The null hypothesis for this test may be stated as:
"Factor $X_j$ does not have any value added to the prediction of the response given that
other factors are already included in the model." In other words,
\[ H_0 : \beta_j = 0 \]
This can be tested using
\[ t = \frac{b_j}{SE(b_j)} \]
where $b_j$ is the corresponding estimated regression coefficient and $SE(b_j)$ is the
estimate of the standard error of $b_j$, with
\[ SE(b_j) = \sqrt{\frac{MSE}{\sum_{i=1}^{n} (x_{ji} - \bar{x}_j)^2}} = \sqrt{\frac{MSE}{\sum_{i=1}^{n} x_{ji}^2 - n\bar{x}_j^2}} \]

$H_0$ is rejected if the absolute value of the computed $t$ is greater than the tabulated
one, $t_{\alpha/2,n-p-1}$.

Example 9.2
Let 𝑦 be the sales at a fast-food outlet (KES 1000), 𝑥1 be the population within a 2-
kilometre radius (1000’s of people) and 𝑥2 be number (in hundreds) of competing
outlets within a 2 - kilometre radius.

Fit a multiple linear regression model and test the significance of both the fitted model
and the two independent variables.

Sales(𝒚) Population(𝒙𝟏 ) Competition (𝒙𝟐 )

101 81.7 19.9
142 103.8 18.7
117 96.5 26.1
104 95.2 24.5
109 92.9 21.6
132 99.1 23.3
107 85.4 28.2
118 90.5 21.4
103 95.6 25.5
120 83.4 19.9
131 106.7 21.6
123 92.4 22.9

Solution
𝒚 𝒙𝟏 𝒙𝟐 𝒙𝟐𝟏 𝒙𝟏 𝒙𝟐 𝒙𝟐𝟐 𝒙𝟏 𝒚 𝒙𝟐 𝒚
101 81.7 19.9 6674.89 1625.83 396.01 8251.7 2009.9
142 103.8 18.7 10774.44 1941.06 349.69 14739.6 2655.4
117 96.5 26.1 9312.25 2518.65 681.21 11290.5 3053.7
104 95.2 24.5 9063.04 2332.4 600.25 9900.8 2548.0
109 92.9 21.6 8630.41 2006.64 466.56 10126.1 2354.4
132 99.1 23.3 9820.81 2309.03 542.89 13081.2 3075.6
107 85.4 28.2 7293.16 2408.28 795.24 9137.8 3017.4
118 90.5 21.4 8190.25 1936.7 457.96 10679.0 2525.2
103 95.6 25.5 9139.36 2437.8 650.25 9846.8 2626.5
120 83.4 19.9 6955.56 1659.66 396.01 10008.0 2388.0
131 106.7 21.6 11384.89 2304.72 466.56 13977.7 2829.6
123 92.4 22.9 8537.76 2115.96 524.41 11365.2 2816.7
Total 1407 1123.2 273.6 105776.8 25596.73 6327.04 132404.4 31900.4

The normal equations are:

12𝑏0 + 1123.2𝑏1 + 273.6𝑏2 = 1407 (i)

1123.2𝑏0 + 105776.8𝑏1 + 25596.73𝑏2 = 132404.4 (ii)

273.6𝑏0 + 25596.73𝑏1 + 6327.04𝑏2 = 31900.4 (iii)

Multiplying (i) by 93.6 and subtracting from (ii)

645.28𝑏1 − 12.23𝑏2 = 709.2 (iv)

Multiplying (i) by 22.8 and subtracting from (iii)

−12.23𝑏1 + 88.96𝑏2 = −179.2 (v)

Multiplying (v) by 52.762 and adding (iv)

4681.478𝑏2 = −8745.75

𝑏2 = −1.868
Substituting this in (v)

−12.23𝑏1 − 166.177 = −179.2

𝑏1 = 1.065

From (i)

12𝑏0 + 1196.208 − 511.085 = 1407

𝑏0 = 60.156

The fitted multiple regression model is 𝑦̂ = 60.156 + 1.065𝑥1 − 1.868𝑥2
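The elimination steps above can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the helper names `solve_linear_system` and `dot` are illustrative and not part of the original notes. It builds the three normal equations from the raw data and solves them by Gaussian elimination.

```python
# Data from Example 9.2: sales y, population x1, competition x2.
y = [101, 142, 117, 104, 109, 132, 107, 118, 103, 120, 131, 123]
x1 = [81.7, 103.8, 96.5, 95.2, 92.9, 99.1, 85.4, 90.5, 95.6, 83.4, 106.7, 92.4]
x2 = [19.9, 18.7, 26.1, 24.5, 21.6, 23.3, 28.2, 21.4, 25.5, 19.9, 21.6, 22.9]


def dot(u, v):
    """Sum of products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


ones = [1.0] * len(y)
columns = [ones, x1, x2]
# Normal equations: each entry is a sum of products, as in (i) to (iii).
lhs = [[dot(u, v) for v in columns] for u in columns]
rhs = [dot(u, y) for u in columns]
b0, b1, b2 = solve_linear_system(lhs, rhs)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the code carries full precision through every step, its estimates differ slightly from the hand-worked values, which were rounded at each stage; the intercept is the most affected.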

Next, we perform tests of significance.

𝑦 𝑦̂ (𝑦 − 𝑦̅)2 (𝑦̂ − 𝑦̅)2

101 109.99 264.0625 52.6597
142 135.77 612.5625 343.0423
117 114.17 0.0625 9.4636
104 115.78 175.5625 2.1668
109 118.75 68.0625 2.2371
132 122.17 217.5625 24.2369
107 98.43 105.0625 354.2150
118 116.56 0.5625 0.4716
103 114.34 203.0625 8.4914
120 111.80 7.5625 29.6611
131 133.44 189.0625 262.2035
123 115.78 33.0625 2.1468
Total 1876.250 1090.996
𝑆𝑆𝑅 = ∑(𝑦̂𝑖 − 𝑦̅)2 = 1090.996

𝑆𝑆𝑇 = ∑(𝑦𝑖 − 𝑦̅)2 = 1876.250

𝑆𝑆𝐸 = 𝑆𝑆𝑇 − 𝑆𝑆𝑅 = 785.254

The ANOVA table for the multiple regression model becomes:

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F value   P value
Regression            2                    1090.996         545.498        6.252     0.0199
Error                 9                    785.254          87.250
Total                 11                   1876.250

𝐻0 : 𝛽1 = 𝛽2 = 0 is rejected since the p value of the computed 𝐹 is less than 0.05.
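The ANOVA quantities can be reproduced with a short script. This is a sketch only, assuming the rounded coefficients from the fitted model and following the notes in taking SSE = SST − SSR. The closed-form p-value used here is an added shortcut that holds only when the regression has 2 degrees of freedom, as in this example.

```python
# Data from Example 9.2 and the rounded coefficients of the fitted model.
y = [101, 142, 117, 104, 109, 132, 107, 118, 103, 120, 131, 123]
x1 = [81.7, 103.8, 96.5, 95.2, 92.9, 99.1, 85.4, 90.5, 95.6, 83.4, 106.7, 92.4]
x2 = [19.9, 18.7, 26.1, 24.5, 21.6, 23.3, 28.2, 21.4, 25.5, 19.9, 21.6, 22.9]
b0, b1, b2 = 60.156, 1.065, -1.868

n, p = len(y), 2
y_bar = sum(y) / n
y_hat = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]

sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)  # regression sum of squares
sse = sst - ssr                               # error sum of squares
msr = ssr / p
mse = sse / (n - p - 1)
f_stat = msr / mse

# Upper-tail probability of F with (2, n - p - 1) degrees of freedom.
# Assumption: this closed form is valid only for 2 numerator degrees of freedom.
df_error = n - p - 1
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"SSR = {ssr:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

For models with a different number of independent variables, the p-value would have to come from F tables or a statistics library instead.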

$$R^2 = \frac{SSR}{SST} = \frac{1090.996}{1876.250} = 0.5815$$

The model accounts for 58.15% of the variation in the response variable which is fairly
adequate.

$$\text{Adjusted } R^2 = R^2 - \frac{(1 - R^2)\,p}{n - p - 1} = 0.5815 - \frac{(1 - 0.5815) \times 2}{12 - 2 - 1} = 0.4885$$
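As a quick check of the two measures, the sketch below evaluates them from the sums of squares in the ANOVA table. The function names are illustrative. It also evaluates the more common textbook form 1 − (1 − 𝑅²)(𝑛 − 1)/(𝑛 − 𝑝 − 1), which is algebraically the same as the definition used in these notes.

```python
def r_squared(ssr, sst):
    """Proportion of total variation explained by the regression."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared in the form used in these notes."""
    return r2 - (1 - r2) * p / (n - p - 1)


n, p = 12, 2
r2 = r_squared(1090.996, 1876.250)
adj = adjusted_r_squared(r2, n, p)
# Equivalent textbook form, included only to show the two agree.
adj_alt = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"R2 = {r2:.4f}, adjusted R2 = {adj:.4f}, alternative form = {adj_alt:.4f}")
```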

Next, we perform tests on individual independent variables:

𝒙𝟏 𝒙𝟐 (𝒙𝟏 − 𝒙̅𝟏)² (𝒙𝟐 − 𝒙̅𝟐)²
81.7 19.9 141.61 8.41
103.8 18.7 104.04 16.81
96.5 26.1 8.41 10.89
95.2 24.5 2.56 2.89
92.9 21.6 0.49 1.44
99.1 23.3 30.25 0.25
85.4 28.2 67.24 29.16
90.5 21.4 9.61 1.96
95.6 25.5 4.00 7.29
83.4 19.9 104.04 8.41
106.7 21.6 171.61 1.44
92.4 22.9 1.44 0.01
Total 645.3 88.96

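$$t = \frac{b_j}{SE(b_j)}; \qquad SE(b_j) = \sqrt{\frac{MSE}{\sum_{i=1}^{n}(x_{ji} - \bar{x}_j)^2}}$$

$$SE(b_1) = \sqrt{\frac{87.25}{645.3}} = 0.3677; \qquad SE(b_2) = \sqrt{\frac{87.25}{88.96}} = 0.9903$$

$$t_1 = \frac{1.065}{0.3677} = 2.8964; \qquad t_2 = \frac{-1.868}{0.9903} = -1.8863$$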

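The tests for individual coefficients are tabulated below:

Parameter          Estimated coefficient   Standard error   𝑡         Two-sided P value   Tabulated 𝑡0.025,9   Conclusion
Population (𝛽1)    1.065                   0.3677           2.8964    0.0176              2.262                Reject 𝐻0 : 𝛽1 = 0
Competition (𝛽2)   -1.868                  0.9903           -1.8863   0.0918              2.262                Fail to reject 𝐻0 : 𝛽2 = 0

The P values shown are two-sided, that is, twice the one-tailed values 0.0088 and 0.0459,
so that they match the two-sided critical value. Since |−1.8863| < 2.262, competition is not
significant at the 5% level in a two-sided test. It would be significant in a one-sided test
of 𝐻1 : 𝛽2 < 0, for which the one-tailed P value is 0.0459.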
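The individual tests can be sketched as follows, using the standard-error formula given in these notes and the two-sided rule of rejecting when |𝑡| exceeds the tabulated 𝑡0.025,9 = 2.262. The dictionary layout and variable names are illustrative assumptions.

```python
import math

mse = 87.25     # mean square error from the ANOVA table
t_crit = 2.262  # tabulated t for alpha/2 = 0.025 and 9 degrees of freedom

# For each variable: (estimated coefficient, sum of squared deviations of x).
coefficients = {
    "population": (1.065, 645.3),
    "competition": (-1.868, 88.96),
}

results = {}
for name, (b, sxx) in coefficients.items():
    se = math.sqrt(mse / sxx)  # standard error as defined in these notes
    t = b / se
    reject = abs(t) > t_crit   # two-sided decision rule
    results[name] = (se, t, reject)
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{name}: SE = {se:.4f}, t = {t:.4f}, {decision}")
```

Comparing |𝑡| rather than 𝑡 with the critical value matters here, because the competition coefficient is negative.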

Exercise 9.2
For the data given below, fit a multiple linear regression model and test the significance
of both the fitted model and the two independent variables.
𝑥1 𝑥2 𝑦
5.3 77.4 50.5
5.4 11.1 24.8
5.6 32.1 31.6
2.5 25.1 27.8
3.4 22.1 22.1
2.6 35.1 28.5
4.4 50.3 41.1
2.1 52.1 28.9
3.6 40.9 36.3
5.1 78.8 43.4

CHAPTER TEN

NON PARAMETRIC INFERENCE

10.1 Introduction
Nonparametric tests are used in situations where the data come from a probability
distribution whose underlying form is not specified. That is, it will not be assumed that
the underlying distribution is normal, or exponential, or any other given type.

Because no particular parametric form for the underlying distribution is assumed, such
tests are called nonparametric.

The strength of a nonparametric test resides in the fact that it can be applied without
any assumption on the form of the underlying distribution.

Many parametric procedures are sensitive to extreme observations, that is, a few very
small or very large (perhaps erroneous) data values.

The results of nonparametric tests are much less affected by extreme observations.

10.2 The Sign Test

The sign test is a non-parametric alternative to the one sample 𝑡 test. It tests the
hypothesis that the median 𝑚 takes on a particular value 𝑚0 . The hypothesis to be
tested is 𝐻0 : 𝑚 = 𝑚0 . The alternative hypothesis can be 𝐻1 : 𝑚 > 𝑚0 , 𝐻1 : 𝑚 < 𝑚0
or 𝐻1 : 𝑚 ≠ 𝑚0 .

The test is conducted as follows:

Given a sample 𝑥1 , 𝑥2 , … , 𝑥𝑛
Calculate 𝑥𝑖 − 𝑚0 for 𝑖 = 1, 2, . . . , n.
Define N− = the number of negative signs obtained upon calculating 𝑥𝑖 − 𝑚0 for
𝑖 = 1, 2, . . . , n
Define N+ = the number of positive signs obtained upon calculating 𝑥𝑖 − 𝑚0 for
𝑖 = 1, 2, . . . , n

If the null hypothesis is true, that is, 𝑚 = 𝑚0 , then N− and N+ both follow a binomial
distribution with parameters n and p = ½

Suppose we are interested in testing the null hypothesis 𝐻0 : 𝑚 = 𝑚0 against the
alternative hypothesis 𝐻1 : 𝑚 > 𝑚0 . Then, if the alternative hypothesis were true, we
should expect 𝑥𝑖 − 𝑚0 to yield more positive (+) signs than would be expected if the
null hypothesis were true.

In that case, we should reject the null hypothesis if n− , the observed number of negative
signs, is too small, or alternatively, if the P-value as defined by:
𝑃 = 𝑃(N− ≤ n− )
is small, that is, less than or equal to α.

Suppose we are interested in testing the null hypothesis 𝐻0 : 𝑚 = 𝑚0 against the
alternative hypothesis 𝐻1 : 𝑚 < 𝑚0 . Then, if the alternative hypothesis were true, we
should expect 𝑥𝑖 − 𝑚0 to yield more negative (−) signs than would be expected if the
null hypothesis were true.

In that case, we should reject the null hypothesis if n+ , the observed number of positive
signs, is too small, or alternatively, if the P-value as defined by:
𝑃 = 𝑃(N+ ≤ n+ )
is small, that is, less than or equal to α.

If we are interested in testing the null hypothesis 𝐻0 : 𝑚 = 𝑚0 against the alternative
hypothesis 𝐻1 : 𝑚 ≠ 𝑚0 , it makes sense that we should reject the null hypothesis if we
have too few negative signs or too few positive signs. Formally, we reject if nmin , which
is defined as the smaller of n− and n+ , is too small. Alternatively, we reject if the P-value
as defined by:
𝑃 = 2𝑃(Nmin ≤ min(n− , n+ ))
is small, that is, less than or equal to α. The probability is computed from the binomial
distribution with parameters n and p = ½.

Example 10.1
Recall Example 6.1
A manufacturer claims that the median weight of a certain product is 3.3 kg. A random
sample of ten such products gave the following results
2.6 2.2 2.9 3.4 3.4 3.7 1.7 2.7 3.3 2.3

Test at 𝛼 = 0.05 the hypothesis that the median weight of the sampled products differs
from the claimed figure.

Solution
The test of hypothesis is:
𝐻0 : 𝑚 = 3.3
𝐻1 : 𝑚 ≠ 3.3

𝑥𝑖 𝑥𝑖 − 𝑚0 Sign 𝑥𝑖 𝑥𝑖 − 𝑚0 Sign
2.6 -0.7 - 3.7 0.4 +
2.2 -1.1 - 1.7 -1.6 -
2.9 -0.4 - 2.7 -0.6 -
3.4 0.1 + 3.3 0.0
3.4 0.1 + 2.3 -1.0 -

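n− = 6, n+ = 3, min(n− , n+ ) = 3

The observation equal to 3.3 carries no sign and is discarded, leaving n = n− + n+ = 9
signed differences. The signs are therefore compared with a binomial distribution with
parameters n = 9 and p = ½. The p value is

𝑃 = 2𝑃(Nmin ≤ 3) = 2{P(Nmin = 0) + P(Nmin = 1) + P(Nmin = 2) + P(Nmin = 3)}

𝑃 = 2{0.0020 + 0.0176 + 0.0703 + 0.1641} = 0.5078

Since the p value is greater than 0.05, we fail to reject the null hypothesis. There is no
evidence that the median weight of the products differs from the claimed figure of 3.3 kg.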
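The sign test procedure can be sketched in a few lines of Python. The function name `sign_test` and its `alternative` argument are illustrative. The sketch drops observations equal to 𝑚0 before counting, which is the usual convention and an assumption here, so for these data the binomial distribution has n = 9 trials.

```python
from math import comb


def sign_test(data, m0, alternative="two-sided"):
    """Return (n_neg, n_pos, p_value) for the sign test of H0: median = m0."""
    n_pos = sum(1 for x in data if x > m0)
    n_neg = sum(1 for x in data if x < m0)
    n = n_pos + n_neg  # observations equal to m0 are dropped (assumption)

    def binom_cdf(k):
        # P(N <= k) for N ~ Binomial(n, 1/2)
        return sum(comb(n, j) for j in range(k + 1)) / 2 ** n

    if alternative == "greater":   # H1: m > m0, few negative signs expected
        p_value = binom_cdf(n_neg)
    elif alternative == "less":    # H1: m < m0, few positive signs expected
        p_value = binom_cdf(n_pos)
    else:                          # H1: m != m0
        p_value = min(1.0, 2 * binom_cdf(min(n_neg, n_pos)))
    return n_neg, n_pos, p_value


weights = [2.6, 2.2, 2.9, 3.4, 3.4, 3.7, 1.7, 2.7, 3.3, 2.3]
n_neg, n_pos, p_value = sign_test(weights, 3.3)
print(n_neg, n_pos, round(p_value, 4))
```

The two-sided p-value is capped at 1, since doubling the smaller tail can exceed 1 when the two counts are nearly equal.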

Exercise 10.1
According to a recent report on accident claims by an insurance company, the average
number of accident claims reported per branch is 3 per day. A random sample of twenty
branches on a given day yielded the following data on claim incidents

3 1 2 1 2 3 5 2 5 1
3 2 3 1 3 3 4 3 1 8

Test at 𝛼 = 0.05 the hypothesis that the median number of claim incidents in these
branches is less than the one in the report.

10.3 The Wilcoxon Rank-Sum Test (Mann – Whitney U Test)

The Wilcoxon Rank-Sum test is a nonparametric counterpart of the two-sample t test.
It is used to compare two samples that have been drawn from independent populations.

But unlike the t test, this test does not assume that the underlying populations are
normally distributed and is less affected by extreme observations.

The Wilcoxon rank-sum test evaluates the null hypothesis that the medians of the two
populations are identical.

Let n1 and n2 be the two sample sizes and R be the sum of the ranks from the sample
with size n1.

Under the null hypothesis that the two underlying populations have identical medians,
we would expect the averages of ranks to be approximately equal.

We test this hypothesis by calculating the statistic

$$z = \frac{R - \mu_R}{\sigma_R}$$

where

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2}$$

is the mean and

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

is the standard deviation of R. It does not make any difference which rank sum we use.

For relatively large values of n1 and n2 the sampling distribution of this statistic is
approximately standard normal.

Thus the p-value for this statistic is 𝑝(𝑍 ≥ 𝑧), 𝑝(𝑍 ≤ 𝑧) and 2𝑝(𝑍 ≥ |𝑧|) for
the upper tailed, lower tailed and two tailed tests respectively.

Example 10.2
Recall Example 6.2

As an extension to Example 6.1, suppose the management of the company suspects that
there is a difference in weight of products between those produced during day shift and
those produced during night shift. A random sample of 18 products gave the following
results
Day 3.7 2.7 2.6 4.1 3.7 3.3 3.3 4.2 2.8 3.6
Night 3.2 2.3 3.3 2.6 3.3 3.4 3.1 3.6

Use a nonparametric test at 𝛼 = 0.05 to test whether the median weight of day shift
products is greater than that of night shift products in the company.

Solution
Day Night
Weight Rank Weight Rank
3.7 15.5 3.2 7
2.7 4 2.3 1
2.6 2.5 3.3 9.5
4.1 17 2.6 2.5
3.7 15.5 3.3 9.5
3.3 9.5 3.4 12
3.3 9.5 3.1 6
4.2 18 3.6 13.5
2.8 5
3.6 13.5
Total 110 61

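Let 𝑅 be the sum of the ranks for the day shift: 𝑅 = 110, with 𝑛1 = 10 and 𝑛2 = 8.

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2} = \frac{10(10 + 8 + 1)}{2} = \frac{190}{2} = 95$$

$$\sigma_R = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{10 \times 8 \times (10 + 8 + 1)}{12}} = \sqrt{\frac{1520}{12}} = 11.2546$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{110 - 95}{11.2546} = 1.3328$$

The upper tail p-value for this statistic is 𝑝(𝑍 ≥ 1.3328) = 0.0913. Since this is greater
than 0.05, we fail to reject 𝐻0. There is no significant evidence that the median weight for
the day shift is greater than that for the night shift.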

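If we let R be the sum of the ranks for the night shift, 𝑅 = 61, with 𝑛1 = 8 and 𝑛2 = 10:

$$\mu_R = \frac{n_1 (n_1 + n_2 + 1)}{2} = \frac{8(8 + 10 + 1)}{2} = \frac{152}{2} = 76$$

$$\sigma_R = 11.2546$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{61 - 76}{11.2546} = -1.3328$$

The test is now lower tailed, with p-value 𝑝(𝑍 ≤ −1.3328) = 0.0913. This is the same
p-value as before, hence the same conclusion.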
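The ranking and the normal approximation can be sketched as below. The helper `midranks` is an illustrative name; it assigns tied values the average of the ranks they occupy, as in the solution table. No tie correction is applied to the standard deviation, which matches the formula used in these notes.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Ranks with 1 for the smallest value; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


day = [3.7, 2.7, 2.6, 4.1, 3.7, 3.3, 3.3, 4.2, 2.8, 3.6]
night = [3.2, 2.3, 3.3, 2.6, 3.3, 3.4, 3.1, 3.6]

ranks = midranks(day + night)  # rank the pooled sample
n1, n2 = len(day), len(night)
r = sum(ranks[:n1])            # rank sum for the day shift

mu = n1 * (n1 + n2 + 1) / 2
sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (r - mu) / sigma
p_upper = 1 - NormalDist().cdf(z)  # upper-tailed test: day > night

print(f"R = {r}, z = {z:.4f}, p = {p_upper:.4f}")
```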

Exercise 10.2
Recall Exercise 6.2.
A study was taken to establish whether there is a difference in the mean sales between
the male marketers and female ones. The monthly sales in KES 100,000 for the
marketers grouped by gender are shown below. Test the appropriate non parametric
hypotheses using α = 0.05 significance level.
Male 27.4 25.4 28.5 31.1 30.4 31.5 23.4 27.5 30.2 25.8 26.6 24.6
24.3 26.3 26.5
Female 27.8 31.2 29.2 25.6 26.8 32.9 28.3 30.3 25.5 28.8 28.8 26.8

10.4 The Wilcoxon Signed - Rank Test

The idea of using ranks, instead of measured values, to form statistical tests to compare
population means applies to the analysis of pair-matched data as well.

As with the paired t test, we begin by forming differences. Then the absolute values of
the differences are assigned ranks; if there are ties in the differences, the average of the
appropriate ranks is assigned.

Next, we attach a + or a - sign back to each rank, depending on whether the
corresponding difference is positive or negative.

This is achieved by multiplying each rank by +1, -1, or 0 as the corresponding difference
is positive, negative, or zero. The results are n signed ranks, one for each pair of
observations; for example, if the difference is zero, its signed rank is zero.

The basic idea is that if the mean difference is positive, most differences will be
positive and larger in magnitude than the few negative differences. Most of the ranks,
especially the larger ones, will then be positively signed.

We can base the test on the sum R of the positive signed ranks. We test the null
hypothesis of no difference by calculating the standardized test statistic:

$$z = \frac{R - \mu_R}{\sigma_R}$$

where

$$\mu_R = \frac{n(n + 1)}{4}$$

is the mean and

$$\sigma_R = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$$

is the standard deviation of R under the null hypothesis.

This normal approximation applies for relatively large samples.

Thus the p-value for this statistic is 𝑝(𝑍 ≥ 𝑧), 𝑝(𝑍 ≤ 𝑧) and 2𝑝(𝑍 ≥ |𝑧|) for
the upper tailed, lower tailed and two tailed tests respectively.

This test is referred to as Wilcoxon’s signed-rank test.

Example 10.3
Recall Example 6.3.
An investor tries out a new investment strategy. The following data represents the
monthly percentage returns for one year before and after the implementation of the
strategy. Did the new strategy work? Test at α=0.05. Use the nonparametric method.

Month 1 2 3 4 5 6 7 8 9 10 11 12
Before 8.5 7.8 11.2 1.1 7.5 3.9 8.2 3.1 10.3 10.2 4.5 11.3
After 8.2 9.8 10.2 10.5 14.2 12.4 11.8 15.5 6.1 11.9 8.6 17.6

Solution
Month After Before 𝑑𝑖 |𝑑𝑖 | Rank Signed Rank
1 8.2 8.5 -0.3 0.3 1 -1
2 9.8 7.8 2 2 4 4
3 10.2 11.2 -1 1 2 -2
4 10.5 1.1 9.4 9.4 11 11
5 14.2 7.5 6.7 6.7 9 9
6 12.4 3.9 8.5 8.5 10 10
7 11.8 8.2 3.6 3.6 5 5
8 15.5 3.1 12.4 12.4 12 12
9 6.1 10.3 -4.2 4.2 7 -7
10 11.9 10.2 1.7 1.7 3 3
11 8.6 4.5 4.1 4.1 6 6
12 17.6 11.3 6.3 6.3 8 8

R=4+11+9+10+5+12+3+6+8=68

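$$\mu_R = \frac{n(n + 1)}{4} = \frac{12 \times 13}{4} = 39$$

$$\sigma_R = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{\frac{12(13)(25)}{24}} = \sqrt{\frac{3900}{24}} = 12.7475$$

$$z = \frac{R - \mu_R}{\sigma_R} = \frac{68 - 39}{12.7475} = \frac{29}{12.7475} = 2.275$$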

The upper tail p-value for this statistic is 𝑝(𝑍 ≥ 2.275) = 0.0115. We therefore reject Ho.
The median returns were significantly increased.
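The signed-rank calculation can be sketched as follows. The lists `after` and `before` follow the column labels of the solution table, with 𝑑𝑖 = after − before; the helper `midranks` is an illustrative name. Zero differences, if any, are dropped before ranking, which is an assumption and has no effect on these data.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Ranks with 1 for the smallest value; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


# Columns as labelled in the solution table.
after = [8.2, 9.8, 10.2, 10.5, 14.2, 12.4, 11.8, 15.5, 6.1, 11.9, 8.6, 17.6]
before = [8.5, 7.8, 11.2, 1.1, 7.5, 3.9, 8.2, 3.1, 10.3, 10.2, 4.5, 11.3]

diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
ranks = midranks([abs(d) for d in diffs])                  # rank absolute differences
r_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)

n = len(diffs)
mu = n * (n + 1) / 4
sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (r_plus - mu) / sigma
p_upper = 1 - NormalDist().cdf(z)  # upper-tailed test

print(f"R = {r_plus}, z = {z:.4f}, p = {p_upper:.4f}")
```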

Exercise 10.3
Recall Exercise 6.3.

Compare the prices of 15 household goods in two supermarkets using the
nonparametric method
A B A B A B
109 101 100 91 77 70
128 137 136 144 94 101
63 62 73 80 63 58
71 65 85 81 121 114
136 138 81 78 85 83

10.5 The Kruskal-Wallis Test
The Kruskal–Wallis one-way analysis of variance is a direct generalization of the
Wilcoxon Rank-Sum test to the case in which we have three or more independent groups.

As such it is the distribution-free equivalent of the one-way analysis of variance. It tests
the hypothesis that all samples were drawn from identical populations and is
particularly sensitive to differences in central tendency.

To perform the Kruskal–Wallis test, we simply rank all scores without regard to group
membership.
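The test statistic then is:

$$K = (n - 1)\,\frac{\sum_{i=1}^{k} n_i (\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (r_{ij} - \bar{r})^2}$$

where

𝑛𝑖 is the number of observations in the i-th group

𝑟𝑖𝑗 is the rank among all observations of observation j in the i-th group

𝑛 = ∑ 𝑛𝑖 = total sample size

$$\bar{r}_{i\cdot} = \frac{\sum_{j=1}^{n_i} r_{ij}}{n_i}$$

is the average rank in the i-th group, and

$$\bar{r} = \frac{n + 1}{2}$$

is the average of all 𝑟𝑖𝑗

The p-value is approximated by $P(\chi^2_{k-1} \ge K)$, where 𝑘 is the number of groups.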

Example 10.4
Recall the Example 7.
A microfinance institution has four main plans for recruiting customers. The data below
show the number of customers recruited under each of these plans by 23 assistants in six
months. Do the plans differ in mean achievement? Use the nonparametric approach
Plan
I     II    III   IV
59    65    75    94
78    87    69    89
67    73    83    80
62    79    81    88
83    81    72
76    69    79
            90

Solution
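We have:

$$\bar{r} = \frac{276}{23} = 12$$

$$K = (n - 1)\,\frac{\sum_{i=1}^{k} n_i (\bar{r}_{i\cdot} - \bar{r})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (r_{ij} - \bar{r})^2}$$

$$K = 22 \times \frac{6(2.0079) + 7(0.5098) + 6(19.5099) + 4(56.25)}{1010} = \frac{7868.8553}{1010} = 7.7909$$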

The p-value for this statistic is 𝑃(𝜒32 ≥ 7.7909) = 0.0505. Since the p-value is greater
than 0.05, we fail to reject Ho.

There is NO significant difference in the achievement by the four plans

Ranks by plan:

Plan    Number   Rank   𝑟𝑖𝑗 − 𝑟̅   (𝑟𝑖𝑗 − 𝑟̅)²
I       59       1      -11      121
        78       11     -1       1
        67       4      -8       64
        62       2      -10      100
        83       17.5   5.5      30.25
        76       10     -2       4
II      65       3      -9       81
        87       19     7        49
        73       8      -4       16
        79       12.5   0.5      0.25
        81       15.5   3.5      12.25
        69       5.5    -6.5     42.25
III     75       9      -3       9
        69       5.5    -6.5     42.25
        83       17.5   5.5      30.25
        81       15.5   3.5      12.25
        72       7      -5       25
        79       12.5   0.5      0.25
        90       22     10       100
IV      94       23     11       121
        89       21     9        81
        80       14     2        4
        88       20     8        64
Total            276             1010

Group summaries:

Plan    𝑛𝑖    𝑟𝑖.     𝑟̅𝑖.      𝑟̅𝑖. − 𝑟̅   (𝑟̅𝑖. − 𝑟̅)²
I       6     45.5    7.583    -4.417    19.5099
II      6     63.5    10.583   -1.417    2.0079
III     7     89      12.714   0.714     0.5098
IV      4     78      19.5     7.5       56.2500
Total   23    276                        78.2776
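The whole Kruskal-Wallis calculation can be sketched as below. The helper `midranks` is an illustrative name. The closed-form expression for the chi-square tail is an added shortcut that holds only for 3 degrees of freedom, as in this example with four plans. Because the code keeps full precision in the group mean ranks, its value of 𝐾 differs from the hand-worked one in the fourth decimal place.

```python
from math import erfc, exp, pi, sqrt


def midranks(values):
    """Ranks with 1 for the smallest value; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


plans = {
    "I": [59, 78, 67, 62, 83, 76],
    "II": [65, 87, 73, 79, 81, 69],
    "III": [75, 69, 83, 81, 72, 79, 90],
    "IV": [94, 89, 80, 88],
}

pooled = [value for group in plans.values() for value in group]
ranks = midranks(pooled)  # rank all scores without regard to group
n = len(pooled)
r_bar = (n + 1) / 2

between = 0.0
start = 0
for group in plans.values():
    group_ranks = ranks[start:start + len(group)]
    start += len(group)
    mean_rank = sum(group_ranks) / len(group)
    between += len(group) * (mean_rank - r_bar) ** 2

total = sum((rank - r_bar) ** 2 for rank in ranks)
k_stat = (n - 1) * between / total

# Upper-tail chi-square probability.
# Assumption: this closed form is valid for 3 degrees of freedom only.
p_value = erfc(sqrt(k_stat / 2)) + sqrt(2 * k_stat / pi) * exp(-k_stat / 2)

print(f"K = {k_stat:.4f}, p = {p_value:.4f}")
```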

Exercise 10.4
Recall Exercise 7.

A local bank has four branch offices. The bank has a liberal sick leave policy, and a
vice-president was concerned about employees taking advantage of this policy. She
thought that the tendency to take advantage depended on the branch at which the
employee worked. To see whether there were differences in the time employees took for
sick leave, she asked each branch manager to sample employees randomly and record
the number of days of sick leave taken during 2015. Twenty employees were chosen,
and the data are listed below:

Branch
A B C D
13 13 11 13
13 15 12 7
12 15 12 12
16 17 14 8
25 25 22 10

Does the data indicate a difference in branches? Use a level of significance of 0.05.
Conduct a non - parametric test of hypothesis.

10.6 Spearman Rank Correlation Coefficient

The Spearman correlation coefficient 𝑟𝑠 is defined as the correlation coefficient between
the ranked variables.

For a sample of size 𝑛, the 𝑛 raw scores 𝑥𝑖 , 𝑦𝑖 are converted to ranks 𝑟(𝑥𝑖 ), 𝑟(𝑦𝑖 ), and 𝑟𝑠 is
computed from:

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where 𝑑𝑖 = 𝑟(𝑥𝑖 ) − 𝑟(𝑦𝑖 ) is the difference between ranks. This formula is exact when there
are no tied ranks; with a few ties it is a close approximation to the correlation between
the ranks.

Example 10.5
Recall Example 9.1. A study was made on the profitability of certain small ventures
depending on amount invested. The data were recorded as follows in KES 10000;

Invested amount, x 2.1 1.6 1.9 1.7 1.4 1.2 1.3 1.1 2.3 1.4
Profit, y 10.6 7.7 8.6 7.6 7.8 5.9 7.2 5.4 9.6 5.6

Compute the Spearman rank correlation coefficient

Solution
𝑥𝑖 𝑟(𝑥𝑖 ) 𝑦𝑖 𝑟(𝑦𝑖 ) 𝑑𝑖 𝑑𝑖2
2.1 2 10.6 1 1 1
1.6 5 7.7 5 0 0
1.9 3 8.6 3 0 0
1.7 4 7.6 6 -2 4
1.4 6.5 7.8 4 2.5 6.25
1.2 9 5.9 8 1 1
1.3 8 7.2 7 1 1
1.1 10 5.4 10 0 0
2.3 1 9.6 2 -1 1
1.4 6.5 5.6 9 -2.5 6.25
Total 20.5

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 20.5}{10(10^2 - 1)} = 1 - \frac{123}{990} = 0.8758$$

This shows a strong positive correlation between the two variables.
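The calculation can be sketched in Python as below. The helper names are illustrative. Ranks are assigned with 1 for the largest value, as in the solution table. As a cross-check, the sketch also computes the ordinary correlation coefficient between the two sets of ranks, which is the definition of 𝑟𝑠; because the 𝑥 values contain one tie, it differs slightly from the shortcut formula.

```python
from math import sqrt


def midranks(values):
    """Ranks with 1 for the smallest value; ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(u, v):
    """Ordinary correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


x = [2.1, 1.6, 1.9, 1.7, 1.4, 1.2, 1.3, 1.1, 2.3, 1.4]
y = [10.6, 7.7, 8.6, 7.6, 7.8, 5.9, 7.2, 5.4, 9.6, 5.6]

# Negating the values ranks the largest observation as 1, as in the table.
rank_x = midranks([-value for value in x])
rank_y = midranks([-value for value in y])

n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # shortcut formula from the notes
r_ranks = pearson(rank_x, rank_y)           # correlation between the ranks

print(f"sum d^2 = {sum_d2}, r_s = {r_s:.4f}, correlation of ranks = {r_ranks:.4f}")
```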

Exercise 10.5
Recall Exercise 9.1. The following are heights in cm and weights in kg of 10 men.
Compute the Spearman rank correlation coefficient
Height 162 168 174 176 180 180 182 184 186 186
Weight 65 65 84 63 75 76 82 65 80 81
