Unit 3 Lecture Notes
Study Guide
A statistical hypothesis is an assumption about a population parameter. This assumption may or may not
be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject
statistical hypotheses.
The best way to determine whether a statistical hypothesis is true would be to examine the entire
population. Since that is often impractical, researchers typically examine a random sample from the
population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.
Parameter: The statistical constants of a population are called parameters. Greek letters are generally
used to denote population parameters, e.g. mean (𝜇) and standard deviation (𝜎); the population proportion
is denoted by P.
Statistic: The statistical constants for a sample drawn from the given population are called statistics.
Roman letters are used to denote sample statistics, e.g. mean (𝑥̅), standard deviation (s),
sample proportion (p).
Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample
observations result purely from chance.
Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample
observations are influenced by some non-random cause.
For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis
might be that half the flips would result in Heads and half, in Tails. The alternative hypothesis might be
that the number of Heads and Tails would be very different. Symbolically, these hypotheses would be
expressed as
H0: P = 0.5
H1: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be
inclined to reject the null hypothesis. We would conclude, based on the evidence, that the coin was
probably not fair and balanced.
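One way to put a number on how surprising 40 Heads in 50 flips would be for a fair coin is the
normal-approximation z-score for a proportion, the same Difference / S.E. form used later in these notes.
The sketch below is illustrative only; the helper name one_sample_proportion_z is an assumption, not
something defined in the notes.

```python
import math

def one_sample_proportion_z(successes, n, p0):
    """z-score of a sample proportion against a hypothesized proportion p0."""
    p_hat = successes / n                 # observed sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)     # standard error assuming H0 is true
    return (p_hat - p0) / se

# Coin example from the notes: 40 Heads in 50 flips, H0: P = 0.5
z_coin = one_sample_proportion_z(successes=40, n=50, p0=0.5)
print(round(z_coin, 2))
```

A z-score this far beyond the usual two-tailed cut-offs (1.96 at the 5% level, 2.58 at the 1% level) agrees
with the informal conclusion that the coin is probably not fair.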
Test Statistic:
After setting up the null hypothesis and the alternative hypothesis, a test statistic is calculated. It is
used to decide whether the null hypothesis should be accepted or rejected.
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample
data. This process, called hypothesis testing, consists of four steps.
State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are
stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.
Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null
hypothesis. The evaluation often focuses around a single test statistic.
Analyze sample data. Find the value of the test statistic (mean score, proportion, t statistic, z-score,
etc.) described in the analysis plan.
Interpret results. Apply the decision rule described in the analysis plan. If the value of the test statistic
is unlikely, based on the null hypothesis, reject the null hypothesis.
Level of significance: The level of significance is the maximum probability of making a Type I error and
is denoted by alpha (𝛼).
Two-tailed and one-tailed tests: When the region of rejection lies on both sides of the standard normal
curve, the test is called a two-tailed test, i.e. H0: µ = µ0, H1: µ ≠ µ0. A test of a statistical
hypothesis where the alternative hypothesis is one-sided is called a one-tailed test, i.e.
H0: µ ≤ µ0, H1: µ > µ0 (right-tailed) or H0: µ ≥ µ0, H1: µ < µ0 (left-tailed).
Decision Errors
Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The
probability of committing a Type I error is called the level of significance. This probability is also
called alpha, and is often denoted by α.
Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is
false. The probability of committing a Type II error is called beta, and is often denoted by β. The
probability of not committing a Type II error is called the power of the test.
Two tests: the z-test and the t-test
Z = Difference / S.E.
Assumptions: z-test
2. Compute M = ΣX/N.
3. Compute σM = σ/√N.
4. Compute z = (M − µ)/σM, where M is the sample mean and µ is the hypothesized value of the population mean.
6. If Zc > Zt (calculated value greater than table value), we reject the null hypothesis. Otherwise, we
accept the null hypothesis.
This section explains how to compute a significance test for the mean of a normally-distributed variable
for which the population standard deviation (σ) is known. In practice, the standard deviation is rarely
known. However, learning how to compute a significance test when the standard deviation is known is an
excellent introduction to how to compute a significance test in the more realistic situation in which the
standard deviation has to be estimated.
1. The first step in hypothesis testing is to specify the null hypothesis and the alternative hypothesis.
In testing hypotheses about µ, the null hypothesis is a hypothesized value of µ. Suppose the mean score
of all 10-year-old children on an anxiety scale were 7. If a researcher were interested in whether
10-year-old children with alcoholic parents had a different mean score on the anxiety scale, then the
null and alternative hypotheses would be:
H0: µalcoholic = 7
Ha: µalcoholic ≠ 7
2. The second step is to choose a significance level. Assume the 0.05 level is chosen.
5. The sample size (N) and the population standard deviation (σ) are needed to calculate σM. Assume
that N = 16 and σ = 2.0. Then, σM = σ/√N = 2.0/√16 = 0.5.
6. Since Zc > Zt, we reject the null hypothesis. It is concluded that the mean anxiety score of
10-year-old children with alcoholic parents is higher than the population mean.
The probability of not committing a Type II error is called the power of a hypothesis test.
Effect Size
To compute the power of the test, one offers an alternative view about the "true" value of the population
parameter, assuming that the null hypothesis is false. The effect size is the difference between the true
value and the value specified in the null hypothesis.
For example, suppose the null hypothesis states that a population mean is equal to 100. A researcher
might ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to
90? In this example, the effect size would be 90 - 100, which equals -10.
Sample size (n). Other things being equal, the greater the sample size, the greater the power of the
test.
Significance level (α). The higher the significance level, the higher the power of the test. If you
increase the significance level, you reduce the region of acceptance. As a result, you are more likely
to reject the null hypothesis. This means you are less likely to accept the null hypothesis when it is
false; i.e., less likely to make a Type II error. Hence, the power of the test is increased.
The "true" value of the parameter being tested. The greater the difference between the "true" value of
a parameter and the value specified in the null hypothesis, the greater the power of the test. That is,
the greater the effect size, the greater the power of the test.
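The three factors above can be checked numerically. The sketch below computes the power of a lower-tailed
z-test for the earlier example (null mean 100, true mean 90). The population standard deviation (20), the
sample sizes, and the function name power_lower_tail_z are assumptions made only for illustration; the
notes do not specify them.

```python
import math
from statistics import NormalDist

def power_lower_tail_z(mu0, mu_true, sigma, n, alpha):
    """Power of a lower-tailed z-test (H1: mu < mu0) when the true mean is mu_true."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)
    # H0 is rejected when the sample mean falls below this cut-off
    critical_mean = mu0 + std_normal.inv_cdf(alpha) * se
    # Probability that the sample mean falls below the cut-off under the true mean
    return std_normal.cdf((critical_mean - mu_true) / se)

# sigma = 20 and n = 16 are assumed values, not taken from the notes
base = power_lower_tail_z(100, 90, sigma=20, n=16, alpha=0.05)
larger_n = power_lower_tail_z(100, 90, sigma=20, n=64, alpha=0.05)
larger_alpha = power_lower_tail_z(100, 90, sigma=20, n=16, alpha=0.10)
larger_effect = power_lower_tail_z(100, 80, sigma=20, n=16, alpha=0.05)

print(round(base, 3), round(larger_n, 3), round(larger_alpha, 3), round(larger_effect, 3))
```

Under these assumed values, each change (larger sample, higher significance level, larger effect size)
gives a power above the baseline, which matches the three statements above.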
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above
Solution: The correct answer is (C). Increasing sample size makes the hypothesis test more sensitive -
more likely to reject the null hypothesis when it is, in fact, false. Increasing the significance level reduces
the region of acceptance, which makes the hypothesis test more likely to reject the null hypothesis, thus
increasing the power of the test. Since, by definition, power is equal to one minus beta, the power of a
test will get smaller as beta gets bigger.
Problem 2: Suppose a researcher conducts an experiment to test a hypothesis. If she doubles her sample
size, which of the following will increase?
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above
Solution: The correct answer is (A). Increasing sample size makes the hypothesis test more sensitive -
more likely to reject the null hypothesis when it is, in fact, false. Thus, it increases the power of the test.
The effect size is not affected by sample size. And the probability of making a Type II error gets smaller,
not bigger, as sample size increases.
(A) I only
(B) II only
(C) III only
Solution: The correct answer is (E). The P-value is the probability of observing a sample statistic as
extreme as the test statistic. It can be greater than the significance level, but it can also be smaller than the
significance level. It is not computed from the significance level, it is not the parameter in the null
hypothesis, and it is not a test statistic.
H1: P ≠ 1/2 (two-tailed test)
Difference = p − P = 280/500 − 1/2 = 0.06
S.E. of p = √(PQ/n) = √((1/2 × 1/2)/500) = 0.02236
Z = Difference / S.E. = 0.06 / 0.02236 = 2.68 > 2.58
Interpret Results If the sample findings are unlikely, given the null
hypothesis, the researcher rejects the null hypothesis. Typically,
this involves comparing the P-value to the significance level, and
rejecting the null hypothesis when the P-value is less than the
significance level.
Suppose the previous example is stated a little bit differently. Suppose the CEO claims that at least
80 percent of the company's 1,000,000 customers are very satisfied. Again, 100 customers are
surveyed using simple random sampling. The result: 73 percent are very satisfied. Based on these
results, should we accept or reject the CEO's hypothesis? Assume a significance level of 0.05. (5%)
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis
plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:
State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.
Null hypothesis: P ≥ 0.80
Alternative hypothesis: P < 0.80
Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected only if the
sample proportion is too small.
Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method, shown
in the next section, is a one-sample z-test.
Analyze sample data. Using sample data, we calculate the standard deviation (σ) and compute the z-score
test statistic (z).
σ = √(P(1 − P)/n) = √((0.80 × 0.20)/100) = 0.04
z = (p − P)/σ = (0.73 − 0.80)/0.04 = −1.75
where P is the hypothesized value of population proportion in the null hypothesis, p is the sample
proportion, and n is the sample size.
Since we have a one-tailed test, the P-value is the probability that the z-score is less than -1.75. We use
the Normal Distribution Calculator to find P(z < -1.75) = 0.04. Thus, the P-value = 0.04.
Interpret results. Since the P-value (0.04) is less than the significance level (0.05), we reject
the null hypothesis.
Note: If you use this approach on an exam, you may also want to mention why this approach is
appropriate. Specifically, the approach is appropriate because the sampling method was simple random
sampling, the sample included at least 10 successes and 10 failures, and the population size was at least
10 times the sample size.
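The CEO example can be reproduced with a few lines of Python. This is a minimal sketch that uses the
standard library's NormalDist in place of the Normal Distribution Calculator mentioned above; the variable
names are illustrative.

```python
import math
from statistics import NormalDist

P0 = 0.80      # hypothesized proportion (CEO's claim: at least 80%)
p_hat = 0.73   # sample proportion
n = 100        # sample size
alpha = 0.05   # significance level

sigma = math.sqrt(P0 * (1 - P0) / n)   # standard deviation of the sample proportion under H0
z = (p_hat - P0) / sigma               # z-score test statistic
p_value = NormalDist().cdf(z)          # lower-tail probability P(Z < z)
reject_h0 = p_value < alpha

print(round(sigma, 2), round(z, 2), round(p_value, 2), reject_h0)
```

The computed z-score of about -1.75 and P-value of about 0.04 match the figures in the worked solution.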
H0: µ= 500
Ha: µ 500
Ans:
H0: µ = 171.17
H1: µ ≠ 171.17
Difference = x̄ − µ = 171.38 − 171.17 = 0.21
S.E. of x̄ = σ/√n = 3.3/√400 = 0.165
Z = Diff./S.E. = 0.21/0.165 = 1.27 < 1.96
Therefore, H0 may be accepted at 5% level of significance.
Therefore, the sample may be regarded as a random sample from a population with mean 171.17.
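The same calculation can be sketched in Python. The function name one_sample_mean_z is a hypothetical
helper; the critical value is taken from the standard normal distribution rather than a printed table.

```python
import math
from statistics import NormalDist

def one_sample_mean_z(sample_mean, mu0, sigma, n):
    """z statistic for a sample mean when the population S.D. (sigma) is known."""
    se = sigma / math.sqrt(n)          # standard error of the mean
    return (sample_mean - mu0) / se

z_height = one_sample_mean_z(sample_mean=171.38, mu0=171.17, sigma=3.3, n=400)
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed 5% critical value, about 1.96

print(round(z_height, 2), round(z_critical, 2), abs(z_height) < z_critical)
```

Because |z| is below the critical value, the sketch reaches the same decision as the worked answer: H0 is
not rejected at the 5% level.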
4. A random sample of 100 students from a college of 1200 students gave mean and S.D. of heights
as 66 inches and 1.2 inches respectively. Test the hypothesis that the average height of all the
students of the college is 65.8 inches.
Ans:
H0: µ = 65.8
H1: µ ≠ 65.8
As the S.D. of the population is not known and the sampling fraction n/N = 100/1200 = 0.08 is more than
0.05, we use the following formula for S.E.
Z = Diff./S.E. = 0.2/0.12 = 1.67 < 1.96
Therefore, H0 may be accepted, i.e. the average height of all students of the college may be regarded as
65.8 inches.
1. A stenographer claims that he can write at an average speed of 120 words per minute. In 100
trials he obtained an average speed of 116 words per minute with an S.D. of 15 words. Is the claim
justified? (Use 5% level of significance) (summer 22-23)
2. A sample of 400 students has a mean height of 171.38 cm. Can it be reasonably regarded as a
random sample from a large population with mean height 171.17 cm and standard deviation 3.3
cm? (At 5% level of significance, z = 1.96)
3. A random sample of size 20 from a normal population has mean 42 and standard deviation of
Test the hypothesis that the population mean is 45. Use 5% level of significance. (𝑡0.05 =2.09)
4. A machinist is making engine parts with axle diameter of 0.7 cm. A random sample of 10 parts shows a
mean diameter of 0.742 cm with a standard deviation of 0.04 cm. Compute the statistic you would use to
test whether work is meeting the specification at 0.05 level of significance. (Ans: t=2.262, rejected.)
or simply µ(x̄1 − x̄2) = µ1 − µ2.
If µ1 = µ2, then µ(x̄1 − x̄2) = 0.
The standard deviation of the distribution of the difference between the sample means is called the
standard error of the difference between two means and is calculated by using this formula:
σd = √(σ1²/n1 + σ2²/n2)
where, σ1² = variance of population 1
σ2² = variance of population 2
n1 = size of sample from population 1
n2 = size of sample from population 2
d = x̄1 − x̄2
If the two population standard deviations are not known, we can estimate the standard error of the
difference between two means by using the formula
σ̂d = √(σ̂1²/n1 + σ̂2²/n2)
where, σ̂1² = estimated variance of population 1
σ̂2² = estimated variance of population 2
Solution. Let X1 and X2 denote the height (in inches) of athletic participants and non-athletic
participants respectively. In the usual notations, we are given:
H0: µ1 = µ2
H1: µ1 > µ2
s1 = 2.5, n1 = 50, x̄1 = 68.2, s2 = 2.8, n2 = 50, x̄2 = 67.5
σ̂d = √(s1²/n1 + s2²/n2) = √(6.25/50 + 7.84/50) = 0.53
Solution: H0: µ1 = µ2
H1: µ1 ≠ µ2
s1 = 6, n1 = 60, x̄1 = 86, s2 = 9, n2 = 75, x̄2 = 82
σ̂d = √(s1²/n1 + s2²/n2) = √(36/60 + 81/75) = 1.296
α = 0.01 (level of significance)
The limits of the acceptance region are z = ±2.58, or x̄1 − x̄2 = 0 ± z σ̂d
= ±2.58(1.296) = ±3.344
Because the observed z value = ((x̄1 − x̄2) − (µ1 − µ2)H0)/σ̂d = ((86 − 82) − 0)/1.296 = 3.09 > 2.58,
we reject H0.
It is reasonable to conclude that the two samples come from different populations.
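A short Python sketch of this two-sample z-test follows. The helper name two_sample_z is an assumption
for illustration, and the critical value comes from the standard normal distribution instead of a table.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """z statistic for the difference between two means, using estimated variances."""
    se_diff = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
    z = ((mean1 - mean2) - hypothesized_diff) / se_diff
    return z, se_diff

z_obs, se_d = two_sample_z(mean1=86, s1=6, n1=60, mean2=82, s2=9, n2=75)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)   # two-tailed critical value at alpha = 0.01

print(round(se_d, 3), round(z_obs, 2), round(z_crit, 2), abs(z_obs) > z_crit)
```

The standard error (about 1.296) and observed z (about 3.09) agree with the hand calculation, and the
observed z exceeds the critical value of about 2.58.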
The table below shows three sets of null and alternative hypotheses. Each makes a statement about the
difference d between the mean of one population μ1 and the mean of another population μ2. (In the table,
the symbol ≠ means " not equal to ".)
The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side
of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of
hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling
distribution would cause a researcher to reject the null hypothesis.
H0: μ1 = μ2
Ha: μ1 ≠ μ2
4. Sketch the distribution and mark the sample value and critical values.
Example 1. Samples of two types of electric bulbs were tested for length of life and the following data
were obtained.
                    Type I    Type II
Number of units        8         7
Mean (in hours)     1134      1024
S.D. (in hours)       35        40
Test at 5% level whether the difference in the sample means is significant.
Example 2. Below are given the gains in weight (in lbs) of cows fed on two diets X and Y.
Diet X: 25 32 30 32 24 14 32
Diet Y: 24 34 22 30 42 31 40 30 32 35
Test at 5% level whether the two diets differ as regards their effects on mean increase in weight.
Solution. H0: µ1 = µ2
H1: µ1 ≠ µ2
  x1     x2    x1 − x̄1   (x1 − x̄1)²   x2 − x̄2   (x2 − x̄2)²
  25     24      -2          4          -8         64
  32     34       5         25           2          4
  30     22       3          9         -10        100
  32     30       5         25          -2          4
  24     42      -3          9          10        100
  14     31     -13        169          -1          1
  32     40       5         25           8         64
         30                             -2          4
         32                              0          0
         35                              3          9
 189    320       0        266           0        350
x̄1 = Σx1/n1 = 189/7 = 27, x̄2 = Σx2/n2 = 320/10 = 32
S² = (1/(n1 + n2 − 2)) {n1S1² + n2S2²} = (1/(7 + 10 − 2)) {266 + 350} = 41.067,
where n1S1² = Σ(x1 − x̄1)² = 266 and n2S2² = Σ(x2 − x̄2)² = 350.
Therefore, S = √41.067 = 6.41
t = (|x̄1 − x̄2|/S) × √(n1n2/(n1 + n2)) = (|27 − 32|/6.41) × √((7 × 10)/(7 + 10)) = 1.58
D.f. = n1 + n2 − 2 = 15
Table value of t on 15 d.f. and at 5% level of significance = 2.131
As tcal < ttab,
H0 is accepted.
Hence, the diets do not differ significantly.
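The pooled-variance t calculation for the two diets can be sketched as follows. The function name
pooled_t is a hypothetical helper, and the table value 2.131 is simply copied from the notes rather than
computed.

```python
import math

diet_x = [25, 32, 30, 32, 24, 14, 32]
diet_y = [24, 34, 22, 30, 42, 31, 40, 30, 32, 35]

def pooled_t(sample1, sample2):
    """Two-sample t statistic with a pooled variance estimate (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)   # sum of squared deviations, sample 1
    ss2 = sum((x - mean2) ** 2 for x in sample2)   # sum of squared deviations, sample 2
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = abs(mean1 - mean2) / pooled_sd * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2

t_value, dof = pooled_t(diet_x, diet_y)
T_TABLE_15_DF = 2.131   # two-tailed 5% table value for 15 d.f., as quoted in the notes

print(round(t_value, 2), dof, t_value < T_TABLE_15_DF)
```

The computed t of about 1.58 on 15 degrees of freedom is below the table value, so H0 is not rejected,
in line with the worked solution.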
n
3) By considering the hypothesis and the given level of significance, find the value of t that bounds the
acceptance region.
4) Compute the observed t value by the formula t = (x̄ − µH0)/σ̂x̄.
5) Interpret the result.
Example 1. Sherri Welch is a quality control engineer with the windshield wiper manufacturing division
of Emsco, Inc. Emsco is currently considering two new synthetic rubbers for its wiper blades, and Sherri
was charged with seeing whether blades made with the two compounds wear equally well. She equipped
12 cars belonging to other Emsco employees with one blade made of each of the two compounds. On
cars 1 to 6, the right blade was made of compound A and the left blade was made of compound B; on
cars 7 to 12, compound A was used for the left blade. The cars were driven under normal operating
conditions until the blades no longer did a satisfactory job of clearing the windshield of rain. The data
below give the usable life (in days) of the blades. At 𝛼 = 0.05, do the two compounds wear equally well?
Car            1    2    3    4    5    6    7    8    9   10   11   12
Left blade   162  323  220  274  165  271  233  156  238  211  241  154
Right blade  183  347  247  269  189  257  224  178  263  199  263  148
Solution.
Car                   1    2    3    4    5    6    7    8    9   10   11   12
Left blade          162  323  220  274  165  271  233  156  238  211  241  154
Right blade         183  347  247  269  189  257  224  178  263  199  263  148
Difference (A − B)   21   24   27   -5   24  -14    9  -22  -25   12  -22    6
Here x denotes the difference (life with compound A minus life with compound B) for each car.
x̄ = Σx/n = 35/12 = 2.9167 days
s² = (1/(n − 1)) (Σx² − n x̄²) = (1/11)(4397 − 12(2.9167)²) = 390.45, s = √s² = 19.76 days
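The remaining steps of a paired t-test on these differences can be sketched in Python. This is an
illustrative sketch, not the notes' own worked solution: the two-tailed critical value 2.201 for 11 degrees
of freedom at α = 0.05 is taken from standard t tables and does not appear in the notes.

```python
import math
from statistics import mean, stdev

# Differences (compound A minus compound B) in usable life, in days, for the 12 cars
differences = [21, 24, 27, -5, 24, -14, 9, -22, -25, 12, -22, 6]

n_pairs = len(differences)
mean_diff = mean(differences)              # sample mean of the differences
sd_diff = stdev(differences)               # sample S.D. with divisor n - 1
se_mean = sd_diff / math.sqrt(n_pairs)     # estimated standard error of the mean difference
t_paired = (mean_diff - 0) / se_mean       # H0: mean difference = 0

T_CRITICAL_11_DF = 2.201   # assumed table value: two-tailed, alpha = 0.05, 11 d.f.

print(round(mean_diff, 4), round(sd_diff, 2), round(t_paired, 2), abs(t_paired) < T_CRITICAL_11_DF)
```

Under these assumptions the observed t (about 0.51) lies well inside the acceptance region, which would
suggest no significant difference in wear between the two compounds.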
Example 2. Nine computer-components dealers in major metropolitan areas were asked for their prices
on two similar color inkjet printers. The results of this survey are given below. At α = 0.05, is it
reasonable to assert that, on average, the Apson printer is less expensive than the Okaydata printer?
Dealer                          1    2    3    4    5    6    7    8    9
Apson price (in dollars)      250  319  285  260  305  295  289  309  275
Okaydata price (in dollars)   270  325  269  275  289  285  295  325  300
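Solution.
Dealer                            1    2    3    4    5    6    7    8    9
Apson price (in dollars)        250  319  285  260  305  295  289  309  275
Okaydata price (in dollars)     270  325  269  275  289  285  295  325  300
Difference (Okaydata − Apson)    20    6  -16   15  -16  -10    6   16   25
x̄ = Σx/n = 46/9 = 5.1111 dollars
s² = (1/(n − 1)) (Σx² − n x̄²) = (1/8)(2190 − 9(5.1111)²) = 244.36, s = √s² = 15.63 dollars
σ̂x̄ = s/√n = 15.63/√9 = 5.21 dollars
H0: µO = µA
H1: µO > µA
α = 0.05
The upper limit of the acceptance region is t = 1.860, or
x̄ = 0 + t σ̂x̄ = 1.860(5.21) = 9.69 dollars
Because the observed mean difference (5.11 dollars) is below this limit, i.e. t = 5.1111/5.21 = 0.98 < 1.860,
we do not reject H0. The sample does not support the assertion that the Apson printer is less expensive
on average.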
Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The
table below shows three sets of hypotheses. Each makes a statement about the difference d between two
population proportions, P1 and P2.
The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side
of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of
hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling
distribution would cause a researcher to reject the null hypothesis.
When the null hypothesis states that there is no difference between the two population proportions (i.e., d
= 0), the null and alternative hypothesis for a two-tailed test are often stated in the following form.
H0: P1 = P2
H1: P1 ≠ P2
The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should
specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any
value between 0 and 1 can be used.
Test method. Use the two-proportion z-test to determine whether the hypothesized difference between
population proportions differs significantly from the observed sample difference.
Pooled sample proportion. Since the null hypothesis states that P1 = P2, we use a pooled sample
proportion (P) to compute the standard error of the sampling distribution.
P = (n1P1 + n2P2)/(n1 + n2)
Standard error. Compute the standard error (SE) of the sampling distribution of the difference between
two proportions.
SE = √(PQ(1/n1 + 1/n2)), where Q = 1 − P
Test statistic. The test statistic is a z-score (z) defined by the following equation.
z = (P1 − P2) / SE
P = (n1P1 + n2P2)/(n1 + n2)
P2 = 0.434
Let H0: P1 = P2
H1: P1 ≠ P2
Q = 1 − P = 0.504
SE = √(PQ(1/n1 + 1/n2)) = √(0.496 × 0.504 × (1/956 + 1/450)) = 0.0286
z = (P1 − P2)/SE = 0.091/0.0286 = 3.18
Since z > 1.96, the null hypothesis is rejected at 5% level of significance, i.e. the data are inconsistent
with the hypothesis P1 = P2, and we conclude that there is a significant difference in the proportion of
male births in the towns A and B.
Example 2. In two large populations, there are 30 and 25 percent respectively of blue-eyed people. Is this
difference likely to be hidden in samples of 1200 and 900 respectively from the two populations? Take
5% level of significance.
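Let P1 be the proportion of blue-eyed people in the first population = 0.30
Let P2 be the proportion of blue-eyed people in the second population = 0.25
Let H0: P1 = P2
H1: P1 ≠ P2
P = (n1P1 + n2P2)/(n1 + n2) = (1200 × 0.30 + 900 × 0.25)/(1200 + 900) = 0.279
Q = 1 − P = 0.721
SE = √(PQ(1/n1 + 1/n2)) = √(0.279 × 0.721 × (1/1200 + 1/900)) = 0.0197
z = (P1 − P2)/SE = 0.05/0.0197 = 2.538
Since z > 1.96, the null hypothesis is rejected at the 5% level of significance, i.e. the data are
inconsistent with the hypothesis P1 = P2, so the difference is unlikely to be hidden in samples of these
sizes.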
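The two-proportion z-test for the blue-eyed example can be sketched as below. The function name
two_proportion_z is a hypothetical helper. Because the code carries full precision rather than rounding
intermediate values, its z (about 2.53) differs slightly from the hand-computed 2.538.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for H0: P1 = P2 using the pooled proportion."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)              # pooled proportion P
    q = 1 - pooled                                        # Q = 1 - P
    se = math.sqrt(pooled * q * (1 / n1 + 1 / n2))        # standard error of the difference
    return (p1 - p2) / se, pooled, se

z_eyes, pooled_p, se_eyes = two_proportion_z(p1=0.30, n1=1200, p2=0.25, n2=900)

print(round(pooled_p, 3), round(se_eyes, 4), round(z_eyes, 2), abs(z_eyes) > 1.96)
```

Either way, z exceeds 1.96, so the decision to reject H0 at the 5% level is unchanged.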
Exercises:
Samples of two types of electric bulbs were tested for length of life and the following data were
obtained. Is the difference in the means sufficient to warrant that type 1 bulbs are superior to type
2 bulbs? (Ans: t = 9.39, rejected)
            Size   Mean      SD
Sample 1      8    1234 hr   36 hr
Sample 2      7    1036 hr   40 hr
A company tests the battery life (in hours) of two different brands. Is there a significant difference between
the means of these two samples at 0.01 level of significance? The results are:
is given below. In the 𝑟 × 𝑐 contingency table the observed frequencies of different cells are shown below:
𝐵1 𝐵2 𝐵3 … 𝐵𝑐
𝐴1 𝑂11 𝑂12 𝑂13 … 𝑂1𝑐
𝐴2 𝑂21 𝑂22 𝑂23 … 𝑂2𝑐
… … … … … …
𝐴𝑟 𝑂𝑟1 𝑂𝑟2 𝑂𝑟3 … 𝑂𝑟𝑐
                  Performance
             Good       Not good    Total
Trained      100 (90)   50 (60)     150
Untrained     20 (30)   30 (20)      50
Total        120        80          200
Expected frequency of cell (1,1) = (150)(120)/200 = 90
The expected frequencies of different cells are indicated in brackets in the above table.
χ² = Σ (oi − ei)²/ei = (100 − 90)²/90 + (50 − 60)²/60 + (20 − 30)²/30 + (30 − 20)²/20
= 1.11 + 1.67 + 3.33 + 5 = 11.11
d.f. = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
On 1 d.f. and at 5% significance level, table value of χ² = 3.84
i.e. χ²cal > χ²tab
Hence H0 is rejected.
Thus performance depends upon training.
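The expected frequencies and the χ² statistic for a contingency table can be computed generically. The
sketch below uses a hypothetical helper, chi_square_independence, and takes the table value 3.84 from
the notes.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency of cell (i, j)
            chi2 += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Rows: Trained, Untrained; columns: Good, Not good
training_table = [[100, 50], [20, 30]]
chi2_training, dof_training = chi_square_independence(training_table)

print(round(chi2_training, 2), dof_training, chi2_training > 3.84)
```

The result (χ² of about 11.11 on 1 d.f.) matches the hand calculation and exceeds the table value.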
Example: The result in the last exam of a sample of 100 students is given below:
          1st class   2nd class   3rd class   Total
Boys      10 (15)     28 (25)     12 (10)      50
Girls     20 (15)     22 (25)      8 (10)      50
Total     30          50          20          100
Can it be said that the performance in the exam depends upon gender?
Expected frequency of cell (1,1) = (50)(30)/100 = 15
The expected frequencies of different cells are indicated in brackets in the above table.
χ² = Σ (oi − ei)²/ei
= (10 − 15)²/15 + (28 − 25)²/25 + (12 − 10)²/10 + (20 − 15)²/15 + (22 − 25)²/25 + (8 − 10)²/10
= 1.67 + 0.36 + 0.4 + 1.67 + 0.36 + 0.4 = 4.86
d.f. = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
On 2 d.f. and at 5% significance level, table value of χ² = 5.99
i.e. χ²cal < χ²tab
Hence H0 is accepted.
Thus performance does not depend upon gender.
Exercise: In a certain sample of 2000 families, 1400 families are consumers of tea. Out of 1800 Hindu
families, 1236 families consume tea. Use 𝜒 2 test and state whether there is any significant difference
between consumption of tea among Hindu and non-Hindu families.
Solution: H0: Die is unbiased, i.e. the probability of getting any number on the die is 1/6.
Face   Observed (oi)   Expected (ei)   (oi − ei)²/ei
3          49              50              0.02
4          53              50              0.18
5          57              50              0.98
6          56              50              0.72
χ² = Σ (oi − ei)²/ei = 4.24
d.f. = n − 1 = 6 − 1 = 5
The table value χ²tab on 5 d.f. and 5% significance level is
χ²tab = 11.07
Hence χ²cal < χ²tab
Thus H0 may be accepted.
Therefore the die may be regarded as unbiased.
Example: The number of road accidents on a highway during a week is given below. Can it be considered
that the proportion of accidents is equal for all days?
Day                   Mon  Tue  Wed  Thurs  Fri  Sat  Sun
Number of accidents    14   16    8     12   11    9   14
Solution: H0: the proportion of accidents is the same for all the days, i.e. the probability of an
accident on any day is 1/7.
Day                   Mon  Tue  Wed  Thurs  Fri  Sat  Sun  Total
Observed frequency     14   16    8     12   11    9   14     84
Expected frequency     12   12   12     12   12   12   12     84
χ² = Σ (oi − ei)²/ei = 4.17
d.f. = n − 1 = 7 − 1 = 6
Table value of χ² on 6 d.f. and at 5% significance level: χ²tab = 12.59
χ²cal < χ²tab
Hence H0 may be accepted. Thus the proportion of accidents is the same for all days.
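A goodness-of-fit calculation like the one above takes only a few lines. The helper name
chi_square_goodness_of_fit is an assumption for illustration; the table value 12.59 is the one quoted in
the notes.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Chi-square statistic comparing observed counts with expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

accidents = [14, 16, 8, 12, 11, 9, 14]            # Mon to Sun
total_accidents = sum(accidents)
# Under H0 every day has probability 1/7, so each expected count is total / 7
expected_per_day = [total_accidents / len(accidents)] * len(accidents)

chi2_accidents = chi_square_goodness_of_fit(accidents, expected_per_day)
dof_accidents = len(accidents) - 1

print(round(chi2_accidents, 2), dof_accidents, chi2_accidents < 12.59)
```

The statistic of about 4.17 on 6 d.f. is below the table value, consistent with accepting H0.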
Exercise:
1. The units produced by a plant are classified into four grades. The past performance of the plant
shows that the respective proportions are 8:4:2:1. To check the run of the plant, 600 parts are
examined and classified as follows. Is there any evidence of a change in production standards?
Grades   1st   2nd   3rd   4th   Total
Units    340   130   100    30    600
2. A store manager believes that customers prefer three types of payment methods (Cash, Credit
Card, Digital Payment) in a ratio of 40:35:25. A sample of 200 customers provided the following
actual preferences: Cash 85, Credit Card 70 and Digital Payment 45. Test at a 5%
significance level whether the observed data match the expected distribution. (Ans: Since χ² =
0.8125 < 5.99, we fail to reject the null hypothesis.)
3. A study was conducted to check if coffee consumption is independent of gender. The following
data were collected from 100 people:
          Drinks Coffee   Does Not Drink Coffee   Total
Male           30                  20               50
Female         35                  15               50
Total          65                  35              100
Test whether coffee consumption is independent of gender at a 5% level of significance using the
Chi-Square test for independence. (Ans: Since χ² = 1.0988 < 3.84, we fail to reject the null
hypothesis.)