
Unit 3 Lecture Notes

The document serves as a study guide on hypothesis testing, covering key concepts such as statistical hypotheses, types of errors, and various tests including Z-tests and t-tests. It outlines the formal process of hypothesis testing, including stating hypotheses, analyzing data, and interpreting results, while also discussing the significance level and power of tests. Additionally, it provides examples and exercises to reinforce understanding of the material.


Testing of Hypothesis

Study Guide

Yash Fichadiya/Hardik Gupta


Department Of Applied Science & Humanities,
PIET
Parul University
INDEX

1. Introduction to Hypothesis Testing: statistical hypothesis, null & alternative hypothesis, test statistic
2. Types of Errors & Significance: Type I & Type II errors, power of test, level of significance
3. One-tailed & Two-tailed Tests: rejection regions, critical values, examples
4. Z-Test Concepts: when to use, assumptions, formula, decision rules
5. Z-Test for Population Mean: one-sample mean test, two-sample mean test
6. Z-Test for Proportion: one-sample proportion test, two-proportion test
7. Solved Z-Test Examples: mean test, proportion test, real-life examples
8. t-Test Concepts: when to use, small sample conditions, formula
9. One-Sample t-Test: mean test example, procedure
10. Two-Sample t-Test: independent samples, equal variance assumption
11. Paired t-Test: differences, before-after studies, examples
12. Solved t-Test Examples: independent sample test, paired sample test
13. Test for Difference Between Proportions: theory, steps, examples
14. Chi-Square Test (Introduction): χ² definition, assumptions
15. Chi-Square Test for Independence: contingency table, expected frequency, examples
16. Chi-Square Goodness of Fit: observed vs expected frequencies, df, examples
17. Exercises (Mixed): Z-test, t-test, proportion test, chi-square problems

PROBABILITY, STATISTICS AND NUMERICAL METHODS (303191251)

Statistical Hypotheses:

A statistical hypothesis is an assumption about a population parameter. This assumption may or may not
be true. Hypothesis testing refers to the formal procedures used by statisticians to accept or reject
a statistical hypothesis.

The best way to determine whether a statistical hypothesis is true would be to examine the entire
population. Since that is often impractical, researchers typically examine a random sample from the
population. If sample data are not consistent with the statistical hypothesis, the hypothesis is rejected.

Terms related to tests of hypothesis:

Parameter: The statistical constants of a population are called parameters. Greek letters are usually used to
denote population parameters, e.g. mean (𝜇), standard deviation (𝜎), population proportion (P), etc.

Statistics: The statistical constants for a sample drawn from the given population are called
statistics. Roman letters are used to denote sample statistics, e.g. mean (𝑥̅), standard deviation (s),
sample proportion (p).

There are two types of statistical hypotheses:

• Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample
observations result purely from chance.

• Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample
observations are influenced by some non-random cause.

For example, suppose we wanted to determine whether a coin was fair and balanced. A null hypothesis
might be that half the flips would result in Heads and half, in Tails. The alternative hypothesis might be
that the number of Heads and Tails would be very different. Symbolically, these hypotheses would be
expressed as

H0: P = 0.5
H1: P ≠ 0.5

Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this result, we would be
inclined to reject the null hypothesis. We would conclude, based on the evidence, that the coin was
probably not fair and balanced.
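
The coin example can be checked numerically. The following is a minimal sketch, assuming a large-sample z-test for a proportion is an acceptable approximation here; the function names are illustrative and not part of the original notes.

```python
import math

def proportion_z(successes, n, p0):
    """z statistic for H0: P = p0, using the standard error implied by H0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

def two_tailed_p(z):
    """Two-tailed P-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

# 40 Heads in 50 flips, H0: P = 0.5
z = proportion_z(40, 50, 0.5)
p_value = two_tailed_p(z)
print(f"z = {z:.3f}, two-tailed P-value = {p_value:.2e}")
```

A z value this far from zero gives a very small P-value, which agrees with the informal conclusion that the coin is probably not fair.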

Test Statistics:

After setting up the null hypothesis and the alternative hypothesis, a test statistic is calculated. It is used to decide
whether the null hypothesis should be accepted or rejected.

Hypothesis Tests

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample
data. This process, called hypothesis testing, consists of four steps.

• State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are
stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.

• Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null
hypothesis. The evaluation often focuses on a single test statistic.

• Analyze sample data. Find the value of the test statistic (mean score, proportion, t statistic, z-score,
etc.) described in the analysis plan.

• Interpret results. Apply the decision rule described in the analysis plan. If the value of the test statistic
is unlikely, based on the null hypothesis, reject the null hypothesis.

Level of significance: The level of significance is the maximum probability of making a Type I error and
is denoted by 𝛼 (alpha).

Two-tailed and one-tailed tests: When the region of rejection lies on both sides (tails) of the standard
normal curve, the test is called a two-tailed test, e.g. H0: µ = µ0, H1: µ ≠ µ0. A test of a statistical
hypothesis in which the alternative hypothesis is one-sided is called a one-tailed test, e.g.
H0: µ ≤ µ0, H1: µ > µ0 (right-tailed) or H0: µ ≥ µ0, H1: µ < µ0 (left-tailed).

Decision Errors

Two types of errors can result from a hypothesis test.

• Type I error. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The
probability of committing a Type I error is called the level of significance. This probability is also
called alpha, and is often denoted by α.

• Type II error. A Type II error occurs when the researcher fails to reject a null hypothesis that is false.
The probability of committing a Type II error is called beta, and is often denoted by β. The
probability of not committing a Type II error (1 − β) is called the power of the test.

TABLE VALUES (Z-TEST)

Level of significance   Confidence (C)   Alpha (α)   One-tailed test   Two-tailed test
1%                      0.99             0.01        2.33              2.575
2%                      0.98             0.02        2.05              2.33
5%                      0.95             0.05        1.645             1.96
10%                     0.90             0.10        1.28              1.645
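
The table values can be reproduced from the inverse of the standard normal distribution function. This is a small sketch using Python's standard library; the helper name `z_critical` is an illustrative choice, not something defined in these notes.

```python
from statistics import NormalDist

def z_critical(alpha, tails):
    """Critical z value for significance level alpha and a 1- or 2-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Each rejection tail holds alpha / tails of the probability.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.01, 0.02, 0.05, 0.10):
    one = z_critical(alpha, 1)
    two = z_critical(alpha, 2)
    print(f"alpha = {alpha:.2f}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

The printed values agree with the table up to rounding (for example, 2.576 is commonly tabulated as 2.575 or 2.58).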

One Sample / Two Sample Hypothesis Tests

Applied to determine if the population mean is consistent with a specified value or standard.

Two tests: the z-test and the t-test.

$Z = \frac{\text{Difference}}{\text{S.E.}}$

Assumptions: z-test

• the underlying distribution is normal or the Central Limit Theorem can be assumed to hold

• the sample has been randomly selected

• the population standard deviation is known or the sample size is at least 25

Assumptions: t-test

• the underlying distribution is normal or the Central Limit Theorem can be assumed to hold

• the sample has been randomly selected

Decision Rules: The analysis plan includes decision rules for rejecting the null hypothesis. In practice,
statisticians describe these decision rules in two ways: with reference to a P-value or with reference to a
region of acceptance.

Summary of Computational Steps

1. Specify the null hypothesis and an alternative hypothesis.

2. Compute M = ΣX/N.

3. Compute $\sigma_M = \sigma/\sqrt{N}$.

4. Compute $Z_c = (M - \mu)/\sigma_M$, where M is the sample mean and µ is the hypothesized value of the population mean.

5. Write Zt using the table value.

6. If |Zc| > Zt, we reject the null hypothesis. Otherwise we accept the null hypothesis.

Testing of Means when the Population Standard Deviation is Known

This section explains how to compute a significance test for the mean of a normally-distributed variable
for which the population standard deviation (σ) is known. In practice, the standard deviation is rarely
known. However, learning how to compute a significance test when the standard deviation is known is an
excellent introduction to how to compute a significance test in the more realistic situation in which the
standard deviation has to be estimated.

1. The first step in hypothesis testing is to specify the null hypothesis and the alternate hypothesis. In
testing hypotheses about µ, the null hypothesis is a hypothesized value of µ. Suppose the mean score
of all 10-year old children on an anxiety scale were 7. If a researcher were interested in whether 10-
year old children with alcoholic parents had a different mean score on the anxiety scale, then the null
and alternative hypotheses would be:
H0: µalcoholic = 7
Ha: µalcoholic ≠ 7

2. The second step is to choose a significance level. Assume the 0.05 level is chosen.

3. The third step is to compute the mean. Assume M = 8.1.


4. The fourth step is to compute p, the probability (or probability value) of obtaining a difference
between M and the hypothesized value of µ (7.0) as large or larger than the difference obtained in the
experiment. Applying the general formula to this problem,

$Z_c = \frac{M - \mu}{\sigma_M} = \frac{8.1 - 7.0}{\sigma_M}$

5. The sample size (N) and the population standard deviation (σ) are needed to calculate σM. Assume
that N = 16 and σ = 2.0. Then,

$\sigma_M = \frac{\sigma}{\sqrt{N}} = \frac{2.0}{\sqrt{16}} = 0.50$

$Z_c = \frac{8.1 - 7}{0.50} = 2.2$

At the 0.05 level of significance, Zt = 1.96.

6. Since Zc > Zt, we reject the null hypothesis. It is concluded that the mean anxiety score of
10-year-old children with alcoholic parents is higher than the population mean.
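
The computational steps for the anxiety-scale example can be sketched as follows. The function name and return format are assumptions made for illustration; the numbers (M = 8.1, µ = 7, σ = 2.0, N = 16) come from the example.

```python
import math
from statistics import NormalDist

def z_test_known_sigma(sample_mean, mu0, sigma, n):
    """Two-tailed z-test for a mean when the population sigma is known."""
    sigma_m = sigma / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - mu0) / sigma_m         # calculated statistic Zc
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed probability value
    return z, p

z_c, p_value = z_test_known_sigma(8.1, 7.0, 2.0, 16)
z_t = 1.96  # table value at the 0.05 level, two-tailed
print(f"Zc = {z_c:.2f}, p = {p_value:.4f}, reject H0: {abs(z_c) > z_t}")
```

Comparing Zc with the table value and comparing p with the significance level lead to the same decision.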

Power of a Hypothesis Test

The probability of not committing a Type II error is called the power of a hypothesis test.

Effect Size
To compute the power of the test, one offers an alternative view about the "true" value of the population
parameter, assuming that the null hypothesis is false. The effect size is the difference between the true
value and the value specified in the null hypothesis.

Effect size = True value - Hypothesized value

For example, suppose the null hypothesis states that a population mean is equal to 100. A researcher
might ask: What is the probability of rejecting the null hypothesis if the true population mean is equal to
90? In this example, the effect size would be 90 - 100, which equals -10.

Factors That Affect Power


The power of a hypothesis test is affected by three factors.

• Sample size (n). Other things being equal, the greater the sample size, the greater the power of the
test.

• Significance level (α). The higher the significance level, the higher the power of the test. If you
increase the significance level, you reduce the region of acceptance. As a result, you are more likely
to reject the null hypothesis. This means you are less likely to accept the null hypothesis when it is
false; i.e., less likely to make a Type II error. Hence, the power of the test is increased.

• The "true" value of the parameter being tested. The greater the difference between the "true" value of
a parameter and the value specified in the null hypothesis, the greater the power of the test. That is,
the greater the effect size, the greater the power of the test.
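
To see the three factors at work, here is a hedged sketch of the power of a two-tailed one-sample z-test. The effect size of −10 comes from the example above; the population standard deviation (20) and sample size (16) are hypothetical values chosen only for illustration, since the notes do not give them.

```python
import math
from statistics import NormalDist

def z_test_power(effect_size, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample z-test (sigma assumed known)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Under the "true" mean the test statistic is centred at this shift.
    shift = effect_size * math.sqrt(n) / sigma
    lower_tail = nd.cdf(-z_crit - shift)      # P(Z < -z_crit)
    upper_tail = 1 - nd.cdf(z_crit - shift)   # P(Z > +z_crit)
    return lower_tail + upper_tail

# Hypothetical inputs: sigma = 20 and n = 16 are assumptions, not from the notes.
base = z_test_power(effect_size=-10, sigma=20, n=16)
print(f"power = {base:.3f}")
print(f"larger n:      {z_test_power(-10, 20, 64):.3f}")
print(f"larger alpha:  {z_test_power(-10, 20, 16, alpha=0.10):.3f}")
print(f"larger effect: {z_test_power(-20, 20, 16):.3f}")
```

Each of the last three calls changes one factor and gives a higher power than the baseline, in line with the list above.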

Problem 1: Other things being equal, which of the following actions will reduce the power of a
hypothesis test?

I. Increasing sample size.
II. Increasing significance level.
III. Increasing beta, the probability of a Type II error.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution: The correct answer is (C). Increasing sample size makes the hypothesis test more sensitive -
more likely to reject the null hypothesis when it is, in fact, false. Increasing the significance level reduces
the region of acceptance, which makes the hypothesis test more likely to reject the null hypothesis, thus
increasing the power of the test. Since, by definition, power is equal to one minus beta, the power of a
test will get smaller as beta gets bigger.

Problem 2: Suppose a researcher conducts an experiment to test a hypothesis. If she doubles her sample
size, which of the following will increase?

I. The power of the hypothesis test.
II. The effect size of the hypothesis test.
III. The probability of making a Type II error.

(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above

Solution: The correct answer is (A). Increasing sample size makes the hypothesis test more sensitive -
more likely to reject the null hypothesis when it is, in fact, false. Thus, it increases the power of the test.
The effect size is not affected by sample size. And the probability of making a Type II error gets smaller,
not bigger, as sample size increases.

Problem 3: In hypothesis testing, which of the following statements is always true?

I. The P-value is greater than the significance level.
II. The P-value is computed from the significance level.
III. The P-value is the parameter in the null hypothesis.
IV. The P-value is a test statistic.
V. The P-value is a probability.

(A) I only
(B) II only
(C) III only
(D) IV only
(E) V only

Solution: The correct answer is (E). The P-value is the probability, assuming the null hypothesis is true,
of observing a sample statistic at least as extreme as the test statistic. It can be greater than the
significance level, but it can also be smaller than the significance level. It is not computed from the
significance level, it is not the parameter in the null hypothesis, and it is not a test statistic.

Example for Hypothesis Test for a Proportion

1. In a hospital, out of 500 newborn babies, 280 are boys. Does this information support the hypothesis that the
births of boys and girls are in equal proportions? (Take 1% level of significance.)

H0: Proportion of boys P = 1/2
H1: P ≠ 1/2 (two-tailed test)

$\text{Difference} = p - P = \frac{280}{500} - \frac{1}{2} = 0.06$

$\text{S.E. of } p = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{\frac{1}{2} \times \frac{1}{2}}{500}} = 0.02236$

$Z = \frac{\text{Difference}}{\text{S.E.}} = \frac{0.06}{0.02236} = 2.68 > 2.58$

Therefore, H0 may be rejected at the 1% level of significance, i.e. the
proportion of births of boys and girls may not be regarded as equal.

Interpret Results If the sample findings are unlikely, given the null
hypothesis, the researcher rejects the null hypothesis. Typically,
this involves comparing the P-value to the significance level, and
rejecting the null hypothesis when the P-value is less than the
significance level.

Problem 1 One-Tailed Test

Suppose the previous example is stated a little bit differently. Suppose the CEO claims that at least
80 percent of the company's 1,000,000 customers are very satisfied. Again, 100 customers are
surveyed using simple random sampling. The result: 73 percent are very satisfied. Based on these
results, should we accept or reject the CEO's hypothesis? Assume a significance level of 0.05. (5%)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis
plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

• State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: P ≥ 0.80
Alternative hypothesis: P < 0.80 (one-tailed test)

Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected only if the
sample proportion is too small.

• Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method, shown
below, is a one-sample z-test.

• Analyze sample data. Using sample data, we calculate the standard error (S.E.) of the sample proportion
and compute the z-score test statistic (z).

$\text{S.E. of } p = \sqrt{P(1 - P)/n} = \sqrt{(0.8)(0.2)/100} = \sqrt{0.0016} = 0.04$

$z = (p - P)/\text{S.E.} = (0.73 - 0.80)/0.04 = -1.75$

where P is the hypothesized value of population proportion in the null hypothesis, p is the sample
proportion, and n is the sample size.

Since we have a one-tailed test, the P-value is the probability that the z-score is less than -1.75. We use
the Normal Distribution Calculator to find P(z < -1.75) = 0.04. Thus, the P-value = 0.04.

• Interpret results. Since the P-value (0.04) is less than the significance level (0.05), we cannot accept
the null hypothesis; the null hypothesis is rejected.

Note: If you use this approach on an exam, you may also want to mention why this approach is
appropriate. Specifically, the approach is appropriate because the sampling method was simple random
sampling, the sample included at least 10 successes and 10 failures, and the population size was at least
10 times the sample size.
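
The one-tailed calculation and the three conditions in the note can be sketched together. This is an illustrative outline rather than a prescribed method; the function names are assumptions, and the inputs (73 of 100 very satisfied, P = 0.80, 1,000,000 customers) come from the problem.

```python
import math
from statistics import NormalDist

def lower_tail_proportion_test(successes, n, p0, alpha=0.05):
    """Left-tailed z-test of H0: P >= p0 against H1: P < p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)   # P(Z < z) for a left-tailed test
    return z, p_value, p_value < alpha

def conditions_met(successes, n, population_size):
    """At least 10 successes and 10 failures, and population >= 10 * n."""
    failures = n - successes
    return successes >= 10 and failures >= 10 and population_size >= 10 * n

z, p_value, reject = lower_tail_proportion_test(73, 100, 0.80)
ok = conditions_met(73, 100, 1_000_000)
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject}, conditions met: {ok}")
```

Random sampling itself cannot be verified from the numbers; it has to be known from how the survey was carried out.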

Example: One Sample Hypothesis Test

Large sample: sample size n > 30

1. The scores on an aptitude test required for entry into a certain job position have a mean of 500
and a standard deviation of 120. If a random sample of 36 applicants has a mean of 546, is there
evidence that their mean score is different from the mean that is expected from all applicants?

Ans: Null and alternative hypotheses

H0: µ = 500

Ha: µ ≠ 500

Convert 546 to a z-score to compare it to the assumed population mean.

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{546 - 500}{120/\sqrt{36}} = \frac{46}{20} = 2.3$

Zt = 1.96 (5% level of significance)
Since Zc > Zt, we reject the null hypothesis and accept the alternative, concluding that the population
mean is not 500.

Let's construct a 95% confidence interval estimate of the population mean.

$546 \pm 1.96 \times \frac{120}{\sqrt{36}} = 546 \pm 39.2$

The lower limit of the interval is 546 - 39.2 = 506.8
The upper limit of the interval is 546 + 39.2 = 585.2
Thus, with 95% confidence, the actual mean score for the population from which this sample was drawn
falls between 507 and 585.

2. Repeat problem number 1 assuming that the sample size is 16.

Approach the problem the same way as in 1, using the t-distribution.

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{546 - 500}{120/\sqrt{16}} = \frac{46}{30} = 1.53$

The degrees of freedom are 16 - 1 = 15.
Using the t-table with 15 degrees of freedom, we find the closest t-value to 1.53 is 1.753. Since 1.53 is
below 1.753, and therefore also below the two-tailed 5% table value of 2.131 for 15 degrees of freedom,
the null hypothesis is not rejected at the 5% level.
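
The two aptitude-test calculations differ only in the sample size, which a short sketch makes visible. The helper names are illustrative assumptions; the table value 1.96 is taken from the notes rather than computed.

```python
import math

def standardized_mean(sample_mean, mu0, sd, n):
    """(sample mean - hypothesized mean) / standard error of the mean."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

def confidence_interval(sample_mean, sd, n, critical=1.96):
    """Interval sample_mean +/- critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

z_36 = standardized_mean(546, 500, 120, 36)   # problem 1, n = 36
t_16 = standardized_mean(546, 500, 120, 16)   # problem 2, n = 16
low, high = confidence_interval(546, 120, 36)
print(f"n = 36: z = {z_36:.2f}; n = 16: t = {t_16:.2f}")
print(f"95% confidence interval: ({low:.1f}, {high:.1f})")
```

With the smaller sample the standard error grows from 20 to 30, so the same 46-point difference yields a smaller statistic.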

3. A sample of 400 students has a mean height of 171.38 cms. Can it be reasonably regarded as a
random sample from a large population with mean height 171.17 and standard deviation 3.3 cms?

Ans:
H0: µ = 171.17
H1: µ ≠ 171.17

$\text{Difference} = \bar{x} - \mu = 171.38 - 171.17 = 0.21$

$\text{S.E. of } \bar{x} = \frac{\sigma}{\sqrt{n}} = \frac{3.3}{\sqrt{400}} = 0.165$

$Z = \frac{\text{Diff.}}{\text{S.E.}} = \frac{0.21}{0.165} = 1.27 < 1.96$

Therefore, H0 may be accepted at the 5% level of significance.
Therefore, the sample may be regarded as a random sample from a population with mean 171.17.

4. A random sample of 100 students from a college of 1200 students gave mean and S.D. of heights
as 66 inches and 1.2 inches respectively. Test the hypothesis that the average height of all the
students of the college is 65.8 inches.

Ans:
H0: µ = 65.8
H1: µ ≠ 65.8

$\text{Difference} = \bar{x} - \mu = 66 - 65.8 = 0.2$

As the S.D. of the population is not known and the sampling fraction $\frac{n}{N} = \frac{100}{1200} \approx 0.08$ is more than 0.05, we use
the following formula for the S.E.

$\text{S.E. of } \bar{x} = \frac{S}{\sqrt{n - 1}} \sqrt{\frac{N - n}{N - 1}} = \frac{1.2}{\sqrt{100 - 1}} \sqrt{\frac{1200 - 100}{1200 - 1}} \approx 0.12$

$Z = \frac{\text{Diff.}}{\text{S.E.}} = \frac{0.2}{0.12} = 1.67 < 1.96$

Therefore, H0 may be accepted, i.e. the average height of all students of the college may be regarded as 65.8
inches.
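
The finite-population standard error used in problem 4 can be sketched as below. The function name is an assumption for illustration. Note that carrying full precision gives an S.E. close to 0.1155 rather than the rounded 0.12, so the computed Z is slightly larger than 1.67, although it is still below 1.96.

```python
import math

def se_mean_finite_population(s, n, population_size):
    """S.E. of the mean with the finite population correction:
    s / sqrt(n - 1) * sqrt((N - n) / (N - 1))."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return s / math.sqrt(n - 1) * correction

se = se_mean_finite_population(s=1.2, n=100, population_size=1200)
z = (66 - 65.8) / se
print(f"S.E. = {se:.4f}, Z = {z:.2f}, accept H0 at 5%: {abs(z) < 1.96}")
```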

Exercises:

1. A stenographer claims that he can write at an average speed of 120 words per minute. In 100
trials he obtained an average speed of 116 words per minute with S.D. of 15 words. Is the claim
justified? (Use 5% level of significance) (summer 22-23)
2. A sample of 400 students has a mean height of 171.38 cms. Can it be reasonably regarded as a
random sample from a large population with mean height 171.17 and standard deviation 3.3
cm? (5% level of significance; table value = 1.96)
3. A random sample of size 20 from a normal population has mean 42 and standard deviation of
Test the hypothesis that the population mean is 45. Use 5% level of significance. (𝑡0.05 =2.09)
4. A machinist is making engine parts with axle diameter of 0.7 cm. A random sample of 10 parts shows a
mean diameter of 0.742 cm with a standard deviation of 0.04 cm. Compute the statistic you would use to
test whether work is meeting the specification at 0.05 level of significance. (Ans: t=2.262, rejected.)

Testing Hypothesis: Two-Sample Test

Hypothesis Testing for Differences between Means and Proportions

In many decision-making situations, people need to determine whether the parameters of two populations
are alike or different. For example (i) A company may want to test whether its female employees receive
lower salaries than its male employees for the same work. (ii) A drug manufacturer may need to know
whether a new drug causes one reaction in one group of experimental animals but a different reaction in
another group. Hence, decision makers are concerned with the parameters of two populations and
apply hypothesis testing procedures to them.

Difference between means

Suppose we take a random sample from the distribution of population 1 and another from population 2. If we
then subtract the two sample means, we get $\bar{x}_1 - \bar{x}_2$. This difference will be positive if $\bar{x}_1$ is larger than $\bar{x}_2$
and negative if $\bar{x}_2$ is larger than $\bar{x}_1$.

The mean of the sampling distribution of the difference between sample means is symbolized as $\mu_{\bar{x}_1 - \bar{x}_2}$ or
$\mu_{\bar{x}_1} - \mu_{\bar{x}_2}$, or simply $\mu_1 - \mu_2$.

If $\mu_1 = \mu_2$ then $\mu_{\bar{x}_1} - \mu_{\bar{x}_2} = 0$.

The standard deviation of the distribution of the difference between the sample means is called the
standard error of the difference between two means and is calculated by using this formula:

$\sigma_d = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

where, $\sigma_1^2$ = variance of population 1
$\sigma_2^2$ = variance of population 2
$n_1$ = size of sample from population 1
$n_2$ = size of sample from population 2
$d = \bar{x}_1 - \bar{x}_2$

If the two population standard deviations are not known, we can estimate the standard error of the difference
between two means by using the formula

$\hat{\sigma}_d = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}}$

where, $\hat{\sigma}_1^2$ = estimated variance of population 1
$\hat{\sigma}_2^2$ = estimated variance of population 2

Tests for difference between means: Large sample sizes
When both sample sizes are greater than 30, the normal distribution (z-test) is used to test a hypothesis
about the difference between two means. The test may be two-tailed or one-tailed.
Steps:
1. State your hypothesis, type of test and significance level.
(Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1
can be used)
2. Choose the appropriate distribution and the critical value.
3. Compute the standard error and standardize the sample statistic.
4. Sketch the distribution and mark the sample value and critical values.
5. Interpret the result by testing the difference between means when $\mu_1 - \mu_2 = 0$.

Example 1 The mean height of 50 male students who showed above average participation in college
athletics was 68.2 inches with a standard deviation of 2.5 inches; while 50 male students who showed no
interest in such participation had a mean height of 67.5 inches with a standard deviation of 2.8 inches.
(i) Test the hypothesis that male students who participate in college athletics are taller than other male
students.
(ii) By how much should the sample size of each of the two groups be increased in order that the observed
difference of 0.7 inches in mean heights be significant at the 5% level of significance?

Solution. Let $X_1$ and $X_2$ denote the height (in inches) of athletic participants and non-athletic
participants respectively. In the usual notation, we are given:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 > \mu_2$
$s_1 = 2.5$, $n_1 = 50$, $\bar{x}_1 = 68.2$, $s_2 = 2.8$, $n_2 = 50$, $\bar{x}_2 = 67.5$

$\hat{\sigma}_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{6.25}{50} + \frac{7.84}{50}} = 0.53$

$\alpha = 0.05$ (level of significance)

$z = \frac{\bar{x}_1 - \bar{x}_2}{\hat{\sigma}_d} = \frac{68.2 - 67.5}{0.53} = 1.32$

For a right-tailed test, the critical value of z at the 5% level of significance is 1.645.
(i) Since the calculated value of z (= 1.32) is less than the critical value (= 1.645), it is not significant at the 5%
level of significance. Hence, the null hypothesis is accepted: the data do not show that college athletes are
taller than other male students.
(ii) The difference between the mean heights of two groups, each of size n, will be significant at the 5% level
of significance if $z \geq 1.645$:

$\frac{68.2 - 67.5}{\sqrt{\frac{6.25}{n} + \frac{7.84}{n}}} \geq 1.645$, or $\frac{0.7}{\sqrt{\frac{14.09}{n}}} \geq 1.645$

$n \geq \left(\frac{1.645 \times 3.754}{0.7}\right)^2 \approx 78$

Hence the sample size of each of the two groups should be increased by at least 78 - 50 = 28 in order that the
difference between the mean heights of the two groups is significant.
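
Both parts of the height example can be reproduced with a short sketch. The function names are illustrative assumptions; the critical value 1.645 is the table value used in the notes.

```python
import math

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """z statistic for the difference between two large-sample means."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # estimated standard error
    return (mean1 - mean2) / se

def min_equal_sample_size(diff, s1, s2, z_crit):
    """Smallest common n with diff / sqrt((s1^2 + s2^2) / n) >= z_crit."""
    return math.ceil(z_crit**2 * (s1**2 + s2**2) / diff**2)

z = two_sample_z(68.2, 2.5, 50, 67.5, 2.8, 50)
n_needed = min_equal_sample_size(68.2 - 67.5, 2.5, 2.8, 1.645)
print(f"z = {z:.2f}, significant at 5% (right-tailed): {z >= 1.645}")
print(f"required n per group = {n_needed}, increase = {n_needed - 50}")
```

The sample-size formula is just the inequality in part (ii) solved for n and rounded up to a whole number.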

Example 2 Two independent samples of observations were collected. For the first sample of 60 elements,
the mean was 86 and the standard deviation 6. The second sample of 75 elements had a mean of 82 and a
standard deviation of 9.
(a) Compute the estimated standard error of the difference between the two means.
(b) Using $\alpha = 0.01$, test whether the two samples can reasonably be considered to have come from
populations with the same mean.

Solution: $H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
$s_1 = 6$, $n_1 = 60$, $\bar{x}_1 = 86$, $s_2 = 9$, $n_2 = 75$, $\bar{x}_2 = 82$

$\hat{\sigma}_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{36}{60} + \frac{81}{75}} = 1.296$

$\alpha = 0.01$ (level of significance)

The limits of the acceptance region are $z = \pm 2.58$, or
$\bar{x}_1 - \bar{x}_2 = 0 \pm z\,\hat{\sigma}_d = \pm 2.58(1.296) = \pm 3.344$

Because the observed z value

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_d} = \frac{(86 - 82) - 0}{1.296} = 3.09 > 2.58$

we reject $H_0$.
It is reasonable to conclude that the two samples come from different populations.
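
The acceptance-region form of the decision in Example 2 can be sketched directly on the scale of the difference in means. The function name is an assumption; 2.58 is the table value used in the solution.

```python
import math

def acceptance_limits(s1, n1, s2, n2, z_crit, hypothesized_diff=0.0):
    """Limits for x1_bar - x2_bar inside which H0 is not rejected."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return hypothesized_diff - z_crit * se, hypothesized_diff + z_crit * se, se

lower, upper, se = acceptance_limits(6, 60, 9, 75, z_crit=2.58)
observed_diff = 86 - 82
reject = not (lower <= observed_diff <= upper)
print(f"S.E. = {se:.3f}, limits = ({lower:.3f}, {upper:.3f}), reject H0: {reject}")
```

Checking whether the observed difference of 4 falls outside these limits is equivalent to checking whether the observed z exceeds 2.58.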

(t-TEST) Test for Differences between Means: Small Sample Sizes

Suppose two independent small samples of size $n_1$ and $n_2$ are drawn from two normal populations and
the means of the samples are $\bar{x}_1$ and $\bar{x}_2$ respectively. If we want to test the hypothesis that the population
means are equal, we can apply the t-test in the following way.
Steps:
1. State your hypothesis, type of test and significance level.

The table below shows three sets of null and alternative hypotheses. Each makes a statement about the
difference d between the mean of one population μ1 and the mean of another population μ2. (In the table,
the symbol ≠ means "not equal to".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ1 - μ2 = d       μ1 - μ2 ≠ d              2
2     μ1 - μ2 ≥ d       μ1 - μ2 < d              1
3     μ1 - μ2 ≤ d       μ1 - μ2 > d              1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side
of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of
hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling
distribution would cause a researcher to reject the null hypothesis.

When the null hypothesis states that there is no difference between the two population means (i.e., d = 0),
the null and alternative hypotheses are often stated in the following form.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

2. Choose the appropriate distribution and the critical value.

3. Compute the standard error and standardize the sample statistic.

Under the assumption that both populations have the same variance,

$t = \frac{|\bar{x}_1 - \bar{x}_2|}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{|\bar{x}_1 - \bar{x}_2|}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$

where, $S^2 = \frac{1}{n_1 + n_2 - 2}\left\{\sum(x_1 - \bar{x}_1)^2 + \sum(x_2 - \bar{x}_2)^2\right\} = \frac{1}{n_1 + n_2 - 2}\left\{n_1 S_1^2 + n_2 S_2^2\right\}$

where, $S_1^2 = \frac{1}{n_1}\sum(x_1 - \bar{x}_1)^2$ and $S_2^2 = \frac{1}{n_2}\sum(x_2 - \bar{x}_2)^2$

t is based on $n_1 + n_2 - 2$ degrees of freedom. For testing the null hypothesis $H_0: \mu_1 = \mu_2$
against $H_1: \mu_1 \neq \mu_2$, the value of t is computed from the given data and compared with the table value
of t for the appropriate degrees of freedom and the required level of significance. The decision regarding
acceptance or rejection of the hypothesis is then taken.

4. Sketch the distribution and mark the sample value and critical values.

5. Interpret the result by testing the difference between means when $\mu_1 - \mu_2 = 0$.

Example 1. Samples of two types of electric bulbs were tested for length of life and the following data were
obtained.

                   Type I   Type II
Number of units    8        7
Mean (in hours)    1134     1024
S.D. (in hours)    35       40

Test at the 5% level whether the difference in the sample means is significant.

Solution. Here, $n_1 = 8$, $\bar{x}_1 = 1134$, $S_1 = 35$
$n_2 = 7$, $\bar{x}_2 = 1024$, $S_2 = 40$

$S^2 = \frac{1}{n_1 + n_2 - 2}\left\{n_1 S_1^2 + n_2 S_2^2\right\} = \frac{1}{8 + 7 - 2}\left\{8(35)^2 + 7(40)^2\right\} = 1615.38$

Therefore, $S = \sqrt{1615.38} = 40.192$

$t = \frac{|\bar{x}_1 - \bar{x}_2|}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{|1134 - 1024|}{40.192} \times \sqrt{\frac{8 \times 7}{8 + 7}} = 5.288$

D.f. = $n_1 + n_2 - 2 = 13$
Table value of t for 13 d.f. at the 5% level of significance = 2.16
As $t_{cal} > t_{tab}$, $H_0$ is rejected.
Hence, the two types of bulbs differ significantly so far as their mean lives are concerned.
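
The bulb example uses only summary statistics, so the pooled t can be sketched from means, standard deviations and sample sizes. This follows the convention of these notes, in which each sample S.D. uses divisor n; the function name is an illustrative assumption.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t, with sample S.D.s computed using divisor n."""
    df = n1 + n2 - 2
    pooled_var = (n1 * sd1**2 + n2 * sd2**2) / df
    s = math.sqrt(pooled_var)
    t = abs(mean1 - mean2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, df, pooled_var

t_cal, df, pooled_var = pooled_t_from_summary(1134, 35, 8, 1024, 40, 7)
t_tab = 2.16  # table value for 13 d.f. at the 5% level, from the notes
print(f"S^2 = {pooled_var:.2f}, t = {t_cal:.3f}, d.f. = {df}, reject H0: {t_cal > t_tab}")
```

If the given standard deviations had been computed with divisor n − 1 instead, the pooled variance would use (n − 1) weights, so it is worth checking which convention a data source follows.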

Example 2. Below are given the gains in weight (in lbs) of cows fed on two diets X and Y.

Diet X: 25 32 30 32 24 14 32
Diet Y: 24 34 22 30 42 31 40 30 32 35

Test at the 5% level whether the two diets differ as regards their effects on mean increase in weight.
Solution. $H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$

x1     x2     x1 - x̄1   (x1 - x̄1)²   x2 - x̄2   (x2 - x̄2)²
25     24     -2         4             -8         64
32     34     5          25            2          4
30     22     3          9             -10        100
32     30     5          25            -2         4
24     42     -3         9             10         100
14     31     -13        169           -1         1
32     40     5          25            8          64
       30                              -2         4
       32                              0          0
       35                              3          9
189    320    0          266           0          350

$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{189}{7} = 27$, $\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{320}{10} = 32$

$S^2 = \frac{1}{n_1 + n_2 - 2}\left\{n_1 S_1^2 + n_2 S_2^2\right\} = \frac{1}{7 + 10 - 2}\left\{266 + 350\right\} = 41.067$

Therefore, $S = \sqrt{41.067} = 6.41$

$t = \frac{|\bar{x}_1 - \bar{x}_2|}{S} \times \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{|27 - 32|}{6.41} \times \sqrt{\frac{7 \times 10}{7 + 10}} = 1.58$

D.f. = $n_1 + n_2 - 2 = 15$
Table value of t for 15 d.f. at the 5% level of significance = 2.131
As $t_{cal} < t_{tab}$, $H_0$ is accepted.
Hence, the diets do not differ significantly.
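
When raw observations are available, the sums of squared deviations in the table can be computed directly. This is a minimal sketch with illustrative function names; the data are the diet X and diet Y gains from the example.

```python
import math

diet_x = [25, 32, 30, 32, 24, 14, 32]
diet_y = [24, 34, 22, 30, 42, 31, 40, 30, 32, 35]

def sum_sq_dev(values):
    """Sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def pooled_t(sample1, sample2):
    """Pooled two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    s = math.sqrt((sum_sq_dev(sample1) + sum_sq_dev(sample2)) / df)
    mean_diff = abs(sum(sample1) / n1 - sum(sample2) / n2)
    return mean_diff / s * math.sqrt(n1 * n2 / (n1 + n2)), df

t_cal, df = pooled_t(diet_x, diet_y)
print(f"sums of squares: {sum_sq_dev(diet_x):.0f}, {sum_sq_dev(diet_y):.0f}")
print(f"t = {t_cal:.2f}, d.f. = {df}, accept H0 at 5%: {t_cal < 2.131}")
```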

Testing Differences between Means with Dependent Samples
Sometimes, however, it makes sense to take samples that are not independent of each other. Often, the
use of such dependent (or paired) samples enables us to perform a more precise analysis, because they
allow us to control for extraneous factors. With dependent samples, we still follow the same basic
procedure of hypothesis testing. The only differences are that we use a different formula for the estimated
standard error of the sample differences and that we require both samples to be of the same size.
We compute by the following steps:

1) $\bar{x} = \frac{\sum x}{n}$
where $\sum x$ is the sum of the corresponding differences of the two samples and
n is the sample size (the number of pairs).

2) $s^2 = \frac{1}{n - 1}\left(\sum x^2 - n\bar{x}^2\right)$, $\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$

3) From the hypotheses and the given level of significance, find the table value of t that bounds the
acceptance region.

4) Compute the observed t value by the formula $t = \frac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}}$.

5) Interpret the result.

Example 1 Sherri Welch is a quality control engineer with the windshield wiper manufacturing division
of Emsco, Inc. Emsco is currently considering two new synthetic rubbers for its wiper blades, and Sherri
was charged with seeing whether blades made with the two compounds wear equally well. She equipped
12 cars belonging to other Emsco employees with one blade made of each of the two compounds. On
cars 1 to 6, the right blade was made of compound A and the left blade was made of compound B; on
cars 7 to 12, compound A was used for the left blade. The cars were driven under normal operating
conditions until the blades no longer did a satisfactory job of clearing the windshield of rain. The data
below give the usable life (in days) of the blades. At 𝛼 = 0.05, do the two compounds wear equally well?
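| Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Left blade | 162 | 323 | 220 | 274 | 165 | 271 | 233 | 156 | 238 | 211 | 241 | 154 |
| Right blade | 183 | 347 | 247 | 269 | 189 | 257 | 224 | 178 | 263 | 199 | 263 | 148 |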

Solution.

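| Car | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Left blade | 162 | 323 | 220 | 274 | 165 | 271 | 233 | 156 | 238 | 211 | 241 | 154 |
| Right blade | 183 | 347 | 247 | 269 | 189 | 257 | 224 | 178 | 263 | 199 | 263 | 148 |
| Difference (A − B) | 21 | 24 | 27 | -5 | 24 | -14 | 9 | -22 | -25 | 12 | -22 | 6 |

Each difference is the life of the compound A blade minus the life of the compound B blade: right minus left for cars 1 to 6, left minus right for cars 7 to 12.

$\bar{x} = \frac{\sum x}{n} = \frac{35}{12} = 2.9167$ days

$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right) = \frac{1}{11}\left(4397 - 12(2.9167)^2\right) = 390.45$, $s = \sqrt{s^2} = 19.76$ days

$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{19.76}{\sqrt{12}} = 5.7042$ days

$H_0: \mu_A = \mu_B$
$H_1: \mu_A \neq \mu_B$
$\alpha = 0.05$
The limits of the acceptance region are $t = \pm 2.201$ (11 d.f.), or

$\bar{x} = 0 \pm t\hat{\sigma}_{\bar{x}} = \pm 2.201(5.7042) = \pm 12.55$ days

Because the observed t value $= \frac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \frac{2.9167 - 0}{5.7042} = 0.511 < 2.201$
(or $\bar{x} = 2.9167 < 12.55$), we do not reject $H_0$. The two compounds are not significantly different with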
respect to usable life.
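
The five steps for dependent samples can be sketched in Python using the wiper-blade data of this example. This is an illustrative sketch, not part of the original notes: the helper name `paired_t` is hypothetical, the differences are taken as compound A minus compound B (A is the right blade on cars 1 to 6 and the left blade on cars 7 to 12), and the critical value is the table value quoted in the solution.

```python
import math

def paired_t(diffs, mu0=0.0):
    """Mean difference, estimated standard error and observed t (illustrative helper)."""
    n = len(diffs)
    mean = sum(diffs) / n                                       # step 1
    s2 = (sum(d * d for d in diffs) - n * mean ** 2) / (n - 1)  # step 2: sample variance
    se = math.sqrt(s2) / math.sqrt(n)                           # step 2: standard error
    t = (mean - mu0) / se                                       # step 4
    return mean, se, t

left = [162, 323, 220, 274, 165, 271, 233, 156, 238, 211, 241, 154]
right = [183, 347, 247, 269, 189, 257, 224, 178, 263, 199, 263, 148]

# A minus B: right minus left for cars 1-6, left minus right for cars 7-12
diffs = [r - l for l, r in zip(left[:6], right[:6])]
diffs += [l - r for l, r in zip(left[6:], right[6:])]

mean, se, t_obs = paired_t(diffs)
T_TAB = 2.201  # step 3: two-tailed table value for 11 d.f., alpha = 0.05 (from the solution)
print(f"mean = {mean:.4f}, SE = {se:.4f}, t = {t_obs:.3f}, reject H0: {abs(t_obs) > T_TAB}")
```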

Example 2. Nine computer-components dealers in major metropolitan areas were asked for their prices
on two similar color inkjet printers. The results of this survey are given below. At $\alpha = 0.05$, is it
reasonable to assert that, on average, the Apson printer is less expensive than the Okaydata printer?
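| Dealer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Apson price (in dollars) | 250 | 319 | 285 | 260 | 305 | 295 | 289 | 309 | 275 |
| Okaydata price (in dollars) | 270 | 325 | 269 | 275 | 289 | 285 | 295 | 325 | 300 |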

Solution.

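| Dealer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Apson price (in dollars) | 250 | 319 | 285 | 260 | 305 | 295 | 289 | 309 | 275 |
| Okaydata price (in dollars) | 270 | 325 | 269 | 275 | 289 | 285 | 295 | 325 | 300 |
| Difference (Okaydata − Apson) | 20 | 6 | -16 | 15 | -16 | -10 | 6 | 16 | 25 |

$\bar{x} = \frac{\sum x}{n} = \frac{46}{9} = 5.1111$ dollars

$s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right) = \frac{1}{8}\left(2190 - 9(5.1111)^2\right) = 244.36$, $s = \sqrt{s^2} = 15.63$ dollars

$\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{15.63}{\sqrt{9}} = 5.21$ dollars

$H_0: \mu_O = \mu_A$ (O = Okaydata, A = Apson)
$H_1: \mu_O > \mu_A$
$\alpha = 0.05$
The upper limit of the acceptance region is $t = 1.860$ (8 d.f., one-tailed), or

$\bar{x} = 0 + t\hat{\sigma}_{\bar{x}} = 1.860(5.21) = 9.69$ dollars

Because the observed t value $= \frac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \frac{5.1111 - 0}{5.21} = 0.981 < 1.860$
(or $\bar{x} = 5.1111 < 9.69$), we do not reject $H_0$. On average, the Apson inkjet printer is not significantly less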
expensive than the Okaydata inkjet printer.

Tests for difference between Proportions: Large sample sizes


This approach consists of four steps:

(1) State the hypothesis

(2) Formulate an analysis plan

(3) Analyze sample data

(4) Interpret results.

State the hypothesis

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The
table below shows three sets of hypotheses. Each makes a statement about the difference d between two
population proportions, P1 and P2.

| Set | Null hypothesis | Alternative hypothesis | Number of tails |
|---|---|---|---|
| 1 | P1 - P2 = 0 | P1 - P2 ≠ 0 | 2 |
| 2 | P1 - P2 ≥ 0 | P1 - P2 < 0 | 1 |
| 3 | P1 - P2 ≤ 0 | P1 - P2 > 0 | 1 |

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side
of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of
hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling
distribution would cause a researcher to reject the null hypothesis.
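
To make the difference between the two kinds of test concrete, the critical z values can be computed from the standard normal distribution. This is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is illustrative and not part of the notes.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical value of z for a one- or two-tailed test (illustrative helper)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # A two-tailed test splits alpha equally between the two tails
    tail_area = alpha / 2 if tails == 2 else alpha
    return std_normal.inv_cdf(1 - tail_area)

z_two = critical_z(0.05, 2)  # Set 1: reject H0 if |z| exceeds this value
z_one = critical_z(0.05, 1)  # Set 3: reject if z exceeds this; Set 2: reject if z < -z_one
print(round(z_two, 3), round(z_one, 3))
```

At the 5% level this gives approximately 1.96 for the two-tailed test and 1.645 for a one-tailed test, which is why a one-tailed test rejects on a less extreme value in its single direction.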

When the null hypothesis states that there is no difference between the two population proportions (i.e., d
= 0), the null and alternative hypotheses for a two-tailed test are often stated in the following form.

H0: P1 = P2
H1: P1 ≠ P2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should
specify the following elements.

• Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10, but any
value between 0 and 1 can be used.

• Test method. Use the two-proportion z-test to determine whether the hypothesized difference between
population proportions differs significantly from the observed sample difference.

Analyze Sample Data


Using sample data, complete the following computations to find the test statistic and its associated P-value.

• Pooled sample proportion. Since the null hypothesis states that P1 = P2, we use a pooled sample
proportion (P) to compute the standard error of the sampling distribution.

$P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$

where $P_1$ is the sample proportion from population 1,
$P_2$ is the sample proportion from population 2,
$n_1$ is the size of sample 1,
$n_2$ is the size of sample 2.

• Standard error. Compute the standard error (SE) of the sampling distribution of the difference between
two proportions.

$SE = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

where P is the pooled sample proportion, Q = 1 - P,
$n_1$ is the size of sample 1 and $n_2$ is the size of sample 2.

• Test statistic. The test statistic is a z-score (z) defined by the following equation.

z = (P1 - P2) / SE

where P1 is the proportion from sample 1,
P2 is the proportion from sample 2,
SE is the standard error of the sampling distribution.

Example 1 In a year there are 956 births in a town A, of which 52.5% were males, while in towns A and
B combined, this proportion in a total of 1,406 births was 0.496. Is there any significant difference in the
proportion of male births in the two towns? Take 5% level of significance.

Sol. $n_1 = 956$, $n_1 + n_2 = 1{,}406 \Rightarrow n_2 = 450$

Let $P_1$ be the proportion of males in the sample of town A = 0.525

and $P_2$ be the proportion of males in the sample of town B.

Combined proportion P = 0.496 (given)

$P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$

$0.496 = \frac{956 \times 0.525 + 450 \times P_2}{1{,}406}$

$P_2 = 0.434$

Let H0: P1 = P2
H1: P1 ≠ P2

$Q = 1 - P = 0.504$

$SE = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.496 \times 0.504\left(\frac{1}{956} + \frac{1}{450}\right)} = 0.0286$

$z = \frac{P_1 - P_2}{SE} = \frac{0.091}{0.0286} = 3.18$

Since z > 1.96, the null hypothesis is rejected at the 5% level of significance, i.e. the data are inconsistent
with the hypothesis P1 = P2, and we conclude that there is a significant difference in the proportion of male
births in the towns A and B.

Example 2. In two large populations, there are 30 and 25 percent respectively of blue-eyed people. Is this
difference likely to be hidden in samples of 1200 and 900 respectively from the two populations? Take
5% level of significance.

Solution. $n_1 = 1200$, $n_2 = 900$

Let $P_1$ be the proportion of blue-eyed people in the first population = 0.30

and $P_2$ be the proportion of blue-eyed people in the second population = 0.25

Combined proportion

$P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2} = \frac{1200 \times 0.30 + 900 \times 0.25}{2100} = 0.279$

Let H0: P1 = P2
H1: P1 ≠ P2

$Q = 1 - P = 0.721$

$SE = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.279 \times 0.721\left(\frac{1}{1200} + \frac{1}{900}\right)} = 0.0197$

$z = \frac{P_1 - P_2}{SE} = \frac{0.05}{0.0197} = 2.538$

Since z > 1.96, the null hypothesis is rejected at the 5% level of significance, i.e. the data are inconsistent
with the hypothesis P1 = P2, and we conclude that the difference in population proportions is unlikely to be
hidden in sampling.
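
The three computations of the two-proportion z-test (pooled proportion, standard error, test statistic) can be sketched as below for the blue-eyed example. The function name `two_proportion_z` is illustrative, and the critical value 1.96 is the two-tailed 5% value used in the solution. Because the script keeps full precision, its z may differ from the hand-computed figure in the second decimal place.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled proportion, standard error and z for two sample proportions (illustrative helper)."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)          # pooled sample proportion P
    q = 1 - p                                    # Q = 1 - P
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))    # standard error of the difference
    z = (p1 - p2) / se                           # test statistic
    return p, se, z

p, se, z = two_proportion_z(0.30, 1200, 0.25, 900)
Z_CRIT = 1.96  # two-tailed critical value at the 5% level
print(f"P = {p:.3f}, SE = {se:.4f}, z = {z:.2f}, reject H0: {abs(z) > Z_CRIT}")
```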

Exercises:

Samples of two types of electric bulbs were tested for length of life and the following data were
obtained. Is the difference in the means sufficient to warrant that type 1 bulbs are superior to type
2 bulbs? (Ans: t = 9.39, H0 rejected)

| | Size | Mean | SD |
|---|---|---|---|
| Sample 1 | 8 | 1234 hr | 36 hr |
| Sample 2 | 7 | 1036 hr | 40 hr |

A company tests the battery life (in hours) of two different brands. Is there a significant difference between
the means of these two samples at 0.01 level of significance? The results are:

| Brand | No. of samples | Mean | Sample variance |
|---|---|---|---|
| Brand A | 10 | 400 | 14400 |
| Brand B | 12 | 380 | 16900 |

(Ans: The battery lives do not show a significant difference)

Chi-square as a Test of Independence:
When the data are classified according to two attributes, Chi-square ($\chi^2$) can also be used to test the
hypothesis that the two attributes are independent. Suppose the data are classified into $r$ classes
$A_1, A_2, \ldots, A_r$ according to attribute $A$ and into $c$ classes $B_1, B_2, \ldots, B_c$ according to attribute $B$. The
representation of the data in a cross-classified table, known as a contingency table, is given below. In the
$r \times c$ contingency table the observed frequencies of the different cells are as follows:

| | $B_1$ | $B_2$ | $B_3$ | … | $B_c$ |
|---|---|---|---|---|---|
| $A_1$ | $O_{11}$ | $O_{12}$ | $O_{13}$ | … | $O_{1c}$ |
| $A_2$ | $O_{21}$ | $O_{22}$ | $O_{23}$ | … | $O_{2c}$ |
| … | … | … | … | … | … |
| $A_r$ | $O_{r1}$ | $O_{r2}$ | $O_{r3}$ | … | $O_{rc}$ |

$A_i$ = total of the $i$th row and $B_j$ = total of the $j$th column
$O_{ij}$ = frequency of cell $ij$ ($i$th row and $j$th column)
$N = \sum A_i = \sum B_j$ = total frequency
Under the null hypothesis that the two attributes A and B are independent, we shall find the expected
frequency of the $(i, j)$th cell.
The probability that any observation will fall in the $i$th row $= \frac{A_i}{N}$
Similarly, the probability that any observation will fall in the $j$th column $= \frac{B_j}{N}$
And the probability that any observation will fall in the $i$th row and $j$th column $= \frac{A_i}{N} \times \frac{B_j}{N}$
∴ Expected frequency of the $(i, j)$th cell $= e_{ij} = N \times \frac{A_i}{N} \times \frac{B_j}{N} = \frac{A_i B_j}{N}$
Thus we can find the expected frequencies of all the cells. From the observed frequencies $O_{ij}$ and expected
frequencies $e_{ij}$, the value of $\chi^2$ can be obtained by the following formula:

$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - e_{ij})^2}{e_{ij}}$

The number of independent cells in an $r \times c$ contingency table is $(r - 1)(c - 1)$. Hence the degrees of
freedom in an $r \times c$ table is $(r - 1)(c - 1)$.
For testing the hypothesis of independence of two attributes A and B, the value of $\chi^2$ is found and is
compared with the table value of $\chi^2$ on $(r - 1)(c - 1)$ d.f. at the required level of significance. If the
calculated value of $\chi^2$ is less than the table value of $\chi^2$, the hypothesis that the attributes are independent may be
accepted.

Example: In an industry, 200 workers employed for a specific job were classified according to their
performance and training received / not received. Test independence of training and performance. The data
are summarized as follows:
| | Performance: Good | Performance: Not good | Total |
|---|---|---|---|
| Trained | 100 | 50 | 150 |
| Untrained | 20 | 30 | 50 |
| Total | 120 | 80 | 200 |

Solution: 𝐻0 : Performance is independent of training.

| | Performance: Good | Performance: Not good | Total |
|---|---|---|---|
| Trained | 100 (90) | 50 (60) | 150 |
| Untrained | 20 (30) | 30 (20) | 50 |
| Total | 120 | 80 | 200 |

Expected frequency of cell (1,1) $= \frac{(150)(120)}{200} = 90$
The expected frequencies of the different cells are indicated in brackets in the above table.

$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i} = \frac{(100 - 90)^2}{90} + \frac{(50 - 60)^2}{60} + \frac{(20 - 30)^2}{30} + \frac{(30 - 20)^2}{20}$
$= 1.11 + 1.67 + 3.33 + 5 = 11.11$
$d.f. = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$
On 1 d.f. and at 5% significance level, table value of $\chi^2 = 3.84$
i.e. $\chi^2_{cal} > \chi^2_{tab}$
Hence $H_0$ is rejected.
Thus performance depends upon training.
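
The expected-frequency rule $e_{ij} = A_i B_j / N$ and the $\chi^2$ sum can be sketched for any $r \times c$ table as follows. This is an illustrative sketch applied to the training data above; the function name is hypothetical and the critical value is the table value quoted in the solution.

```python
def chi_square_independence(observed):
    """Expected frequencies, chi-square and d.f. for an r x c table (illustrative helper)."""
    row_totals = [sum(row) for row in observed]          # A_i
    col_totals = [sum(col) for col in zip(*observed)]    # B_j
    n = sum(row_totals)                                  # N
    # e_ij = A_i * B_j / N under the hypothesis of independence
    expected = [[a * b / n for b in col_totals] for a in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Rows: trained, untrained; columns: good, not good
table = [[100, 50], [20, 30]]
expected, chi2, df = chi_square_independence(table)
CHI2_TAB = 3.84  # table value for 1 d.f. at the 5% level, as quoted in the solution
print(expected, f"chi2 = {chi2:.2f}", f"d.f. = {df}", f"reject H0: {chi2 > CHI2_TAB}")
```

The same function accepts larger tables, for example a 2 × 3 table of class results by gender, because the row and column totals are computed from whatever shape is passed in.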

Example: The result in the last exam of a sample of 100 students is given below:

| | 1st class | 2nd class | 3rd class | Total |
|---|---|---|---|---|
| Boys | 10 | 28 | 12 | 50 |
| Girls | 20 | 22 | 8 | 50 |
| Total | 30 | 50 | 20 | 100 |

Can it be said that the performance in the exam depends upon gender?

Solution: $H_0$: Gender and performance in the exam are independent.

| | 1st class | 2nd class | 3rd class | Total |
|---|---|---|---|---|
| Boys | 10 (15) | 28 (25) | 12 (10) | 50 |
| Girls | 20 (15) | 22 (25) | 8 (10) | 50 |
| Total | 30 | 50 | 20 | 100 |

Expected frequency of cell (1,1) $= \frac{(50)(30)}{100} = 15$
The expected frequencies of the different cells are indicated in brackets in the above table.

$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$
$= \frac{(10 - 15)^2}{15} + \frac{(28 - 25)^2}{25} + \frac{(12 - 10)^2}{10} + \frac{(20 - 15)^2}{15} + \frac{(22 - 25)^2}{25} + \frac{(8 - 10)^2}{10}$
$= 1.67 + 0.36 + 0.4 + 1.67 + 0.36 + 0.4 = 4.86$
$d.f. = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$
On 2 d.f. and at 5% significance level, table value of $\chi^2 = 5.99$
i.e. $\chi^2_{cal} < \chi^2_{tab}$
Hence $H_0$ is accepted.
Thus performance does not depend upon gender.

Exercise: In a certain sample of 2000 families, 1400 families are consumers of tea. Out of 1800 Hindu
families, 1236 families consume tea. Use 𝜒 2 test and state whether there is any significant difference
between consumption of tea among Hindu and non-Hindu families.

Chi-square as a Test of Goodness of Fit: Testing the appropriateness of a distribution


Under the null hypothesis that there is no significant difference between observed and expected
frequencies, the value of $\chi^2$ is calculated by the formula:

$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$

If all observed frequencies and expected frequencies are equal, the value of $\chi^2$ will be zero. This
signifies perfect agreement of observations with expectations. The larger the value of $\chi^2$, the greater the
divergence between the observed and expected frequencies.
The value of $\chi^2$ is calculated from the given data and compared with the table value of $\chi^2$ on $n - 1$
degrees of freedom (d.f.) at a required significance level. If the calculated value is less than the table
value, i.e. $\chi^2_{cal} < \chi^2_{tab}$, the null hypothesis may be accepted and it can be concluded that the given
frequency distribution fits the hypothesis. If $\chi^2_{cal} > \chi^2_{tab}$, the null hypothesis may be rejected and it can
be concluded that the observed frequency distribution does not fit the hypothesis.
Note: In general, d.f. $= n - k - 1$, where $n$ is the number of classes and $k$ is the number of parameters estimated.

Example: A die is thrown 300 times and the following distribution is obtained. Can the die be regarded
as unbiased?

| Number on the die | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Frequency | 41 | 44 | 49 | 53 | 57 | 56 |

Solution: $H_0$: The die is unbiased, i.e. the probability of getting any number on the die is $\frac{1}{6}$.

| Number on die | Observed frequency $o_i$ | Expected frequency $e_i$ | $\frac{(o_i - e_i)^2}{e_i}$ |
|---|---|---|---|
| 1 | 41 | 50 | $\frac{81}{50} = 1.62$ |
| 2 | 44 | 50 | 0.72 |
| 3 | 49 | 50 | 0.02 |
| 4 | 53 | 50 | 0.18 |
| 5 | 57 | 50 | 0.98 |
| 6 | 56 | 50 | 0.72 |
| Total | 300 | 300 | 4.24 |

$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i} = 4.24$
$d.f. = n - 1 = 6 - 1 = 5$
The table value $\chi^2_{tab}$ on 5 d.f. and 5% significance level is
$\chi^2_{tab} = 11.07$
Hence $\chi^2_{cal} < \chi^2_{tab}$
Thus $H_0$ may be accepted.
Therefore the die may be regarded as unbiased.
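
The goodness-of-fit computation for the die can be sketched in a few lines. This is a minimal sketch, assuming the hypothesised distribution is supplied as a list of ratio weights (equal weights for a fair die); the function name is illustrative and the critical value is the table value quoted above.

```python
def chi_square_gof(observed, weights):
    """Expected frequencies, chi-square and d.f. for a goodness-of-fit test (illustrative helper)."""
    n = sum(observed)
    total_w = sum(weights)
    # Expected frequency of each class under H0: total count shared in the hypothesised ratio
    expected = [n * w / total_w for w in weights]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, len(observed) - 1  # d.f. = n - 1, no parameters estimated

observed = [41, 44, 49, 53, 57, 56]
expected, chi2, df = chi_square_gof(observed, [1, 1, 1, 1, 1, 1])
CHI2_TAB = 11.07  # table value for 5 d.f. at the 5% level, as quoted in the solution
print(expected, f"chi2 = {chi2:.2f}", f"d.f. = {df}", f"reject H0: {chi2 > CHI2_TAB}")
```

Passing unequal weights, such as a hypothesised ratio between grades or payment methods, changes only the `weights` argument.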

Example: The number of road accidents on a highway during a week is given below. Can it be considered
that the proportion of accidents is equal for all days?

| Day | Mon | Tue | Wed | Thurs | Fri | Sat | Sun |
|---|---|---|---|---|---|---|---|
| Number of accidents | 14 | 16 | 8 | 12 | 11 | 9 | 14 |

Solution: $H_0$: The proportion of accidents is the same for all the days, i.e. the probability of an accident on any
day is $\frac{1}{7}$.

| Day | Mon | Tue | Wed | Thurs | Fri | Sat | Sun | Total |
|---|---|---|---|---|---|---|---|---|
| Observed frequency | 14 | 16 | 8 | 12 | 11 | 9 | 14 | 84 |
| Expected frequency | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 84 |

$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$
$= \frac{(14 - 12)^2}{12} + \frac{(16 - 12)^2}{12} + \frac{(8 - 12)^2}{12} + \frac{(12 - 12)^2}{12} + \frac{(11 - 12)^2}{12} + \frac{(9 - 12)^2}{12} + \frac{(14 - 12)^2}{12}$
$= \frac{4 + 16 + 16 + 0 + 1 + 9 + 4}{12} = \frac{50}{12} = 4.17$
$d.f. = n - 1 = 7 - 1 = 6$
Table value of $\chi^2$ on 6 d.f. and at 5% significance level: $\chi^2_{tab} = 12.59$
$\chi^2_{cal} < \chi^2_{tab}$

Hence $H_0$ may be accepted. Thus the proportion of accidents is the same for all days.

Exercise:
1. The units produced by a plant are classified into four grades. The past performance of the plant
shows that the respective proportions are 8:4:2:1. To check the run of the plant, 600 parts are
examined and classified as follows. Is there any evidence of a change in production standards?

| Grades | 1st | 2nd | 3rd | 4th | Total |
|---|---|---|---|---|---|
| Units | 340 | 130 | 100 | 30 | 600 |

2. A store manager believes that customers prefer three types of payment methods (Cash, Credit
Card, Digital Payment) in a ratio of 40:35:25. A sample of 200 customers provided the following
actual preferences: Cash 85, Credit Card 70 and Digital Payment 45. Test at a 5%
significance level whether the observed data matches the expected distribution. (Ans: Since χ² =
0.8125 < 5.99, we fail to reject the null hypothesis.)
3. A study was conducted to check if coffee consumption is independent of gender. The following
data were collected from 100 people:

| | Drinks Coffee | Does Not Drink Coffee | Total |
|---|---|---|---|
| Male | 30 | 20 | 50 |
| Female | 35 | 15 | 50 |
| Total | 65 | 35 | 100 |

Test whether coffee consumption is independent of gender at a 5% level of significance using the
Chi-Square test for independence. (Ans: Since χ² = 1.0988 < 3.84, we fail to reject the null
hypothesis.)
