Chapter 9: Statistical Inference for Two Samples
Course Name: PROBABILITY & STATISTICS
Lecturer: Duong Thi Hong
Hanoi, 2022
1 / 38 Chapter 9: Statistical Inference for Two Samples
Content
1 Inference on the Difference in Means of Two Normal
Distributions, Variance Known
2 Inference on the Difference in Means of Two Normal
Distributions, Variance Unknown
3 Inference on Two Population Proportions
Inference on the Difference in Means of Two
Normal Distributions, Variance Known
We will assume that
- X11, X12, ..., X1n1 is a random sample from population 1;
- X21, X22, ..., X2n2 is a random sample from population 2;
- the two populations represented by X1 and X2 are independent;
- both populations are normal.
Then the quantity
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
has a N(0, 1) distribution.
Confidence Interval on the Difference in Means,
Variances Known
If x̄1 and x̄2 are the means of independent random samples of sizes n1 and n2 from two independent normal populations with known variances σ1² and σ2², respectively, a 100(1 − α)% confidence interval for µ1 − µ2 is
$$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where zα/2 is the upper α/2 percentage point of the standard normal distribution.
Confidence Interval on the Difference in Means,
Variances Known
Example 1
A product developer is interested in reducing the drying time of a primer paint. Two formulations of the paint are tested; formulation 1 is the standard chemistry, and formulation 2 has a new drying ingredient that should reduce the drying time. From experience, it is known that the standard deviation of drying time is 8 minutes, and this inherent variability should be unaffected by the addition of the new ingredient. Ten specimens are painted with formulation 1, and another 10 specimens are painted with formulation 2; the 20 specimens are painted in random order. The two sample average drying times are x̄1 = 121 minutes and x̄2 = 112 minutes.
Construct a 95% confidence interval on the difference in means µ1 − µ2.
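As a minimal sketch of the known-variance interval, the Python function below (its name `z_interval_diff_means` is our own, not from the course) uses only the standard library: `statistics.NormalDist().inv_cdf` supplies zα/2. It is applied to the Example 1 data with σ1 = σ2 = 8 minutes.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95):
    """Two-sided 100(1 - alpha)% CI on mu1 - mu2 when sigma1 and sigma2 are known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, upper alpha/2 point of N(0, 1)
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard deviation of xbar1 - xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


# Example 1: sigma1 = sigma2 = 8 minutes, n1 = n2 = 10, xbar1 = 121, xbar2 = 112
lo, hi = z_interval_diff_means(121, 112, 8, 8, 10, 10)
print(f"{lo:.2f} <= mu1 - mu2 <= {hi:.2f}")  # prints 1.99 <= mu1 - mu2 <= 16.01
```

The whole interval lies above 0, so the data suggest that formulation 2 has a smaller mean drying time than the standard formulation.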
Confidence Interval on the Difference in Means,
Variances Known
Sample Size for a Confidence Interval on the Difference in Means,
Variances Known
If the standard deviations σ1 and σ2 are known and the two sample sizes n1 and n2 are equal (n1 = n2 = n), we can determine the sample size required so that the error in estimating µ1 − µ2 by x̄1 − x̄2 will be less than E at 100(1 − α)% confidence. The required sample size from each population is
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (\sigma_1^2 + \sigma_2^2)$$
rounded up to the next integer if it is not already an integer.
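A short sketch of the sample-size formula, assuming a hypothetical target error of E = 5 minutes with the Example 1 standard deviations (σ1 = σ2 = 8) and 95% confidence; the function name is our own.

```python
import math
from statistics import NormalDist


def sample_size_diff_means(sigma1, sigma2, E, conf=0.95):
    """Common size n = n1 = n2 so that the estimation error is below E at level conf."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    n = (z / E) ** 2 * (sigma1**2 + sigma2**2)
    return math.ceil(n)                            # round up to the next integer


# Hypothetical target: sigma1 = sigma2 = 8, E = 5 minutes, 95% confidence
print(sample_size_diff_means(8, 8, 5))  # 19.67 rounds up to 20
```

Rounding up rather than to the nearest integer keeps the error bound at or below E.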
Confidence Interval on the Difference in Means,
Variances Known
One-Sided Confidence Bounds
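One-Sided Upper Confidence Bound
$$\mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
One-Sided Lower Confidence Bound
$$\bar{x}_1 - \bar{x}_2 - z_{\alpha}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2$$
where zα is the upper α percentage point of the standard normal distribution; each bound holds with confidence 100(1 − α)%.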
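A minimal sketch of the one-sided bounds (function name ours); note that it uses zα rather than zα/2. With the Example 1 data, the lower bound estimates how much formulation 2 shortens the mean drying time, at least, with 95% confidence.

```python
from math import sqrt
from statistics import NormalDist


def z_bound_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95, side="lower"):
    """One-sided 100(1 - alpha)% confidence bound on mu1 - mu2, variances known."""
    z = NormalDist().inv_cdf(conf)                 # z_alpha with alpha = 1 - conf
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    if side == "lower":
        return diff - z * se                       # diff - z_alpha * se <= mu1 - mu2
    if side == "upper":
        return diff + z * se                       # mu1 - mu2 <= diff + z_alpha * se
    raise ValueError("side must be 'lower' or 'upper'")


# Example 1 data: 95% lower confidence bound on mu1 - mu2
print(round(z_bound_diff_means(121, 112, 8, 8, 10, 10), 2))
```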
Hypothesis Tests on the Difference in Means,
Variances Known
Formally, we summarize these results in the following display.
Tests on the Difference in Means, Variances Known
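Under H0: µ1 − µ2 = ∆0, the quantity Z from the start of this section becomes the test statistic Z0 = (x̄1 − x̄2 − ∆0)/√(σ1²/n1 + σ2²/n2), which is standard normal, and the P-value depends on the alternative. A minimal standard-library sketch follows; the function name and the labels "two-sided", "greater" and "less" (for H1: µ1 − µ2 ≠, >, < ∆0) are our own choices.

```python
from math import sqrt
from statistics import NormalDist


def z_test_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0, alternative="two-sided"):
    """Return (z0, P-value) for H0: mu1 - mu2 = delta0 with known variances."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z0 = (xbar1 - xbar2 - delta0) / se
    Phi = NormalDist().cdf                 # standard normal CDF
    if alternative == "two-sided":         # H1: mu1 - mu2 != delta0
        p = 2 * (1 - Phi(abs(z0)))
    elif alternative == "greater":         # H1: mu1 - mu2 > delta0
        p = 1 - Phi(z0)
    elif alternative == "less":            # H1: mu1 - mu2 < delta0
        p = Phi(z0)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z0, p


# Example 1 data, testing H0: mu1 = mu2 against H1: mu1 > mu2
z0, p = z_test_diff_means(121, 112, 8, 8, 10, 10, alternative="greater")
print(f"z0 = {z0:.2f}, P-value = {p:.4f}")
```

Here z0 is about 2.52 and the P-value is about 0.006, so at α = 0.05 the sketch would reject H0 in favor of µ1 > µ2.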
Inference on the Difference in Means of Two
Normal Distributions, Variance Known
Exercise 1
Consider the hypothesis test H0 : µ1 = µ2 against H1 : µ1 ≠ µ2 with
known variances σ1 = 10 and σ2 = 5. Suppose that sample sizes n1 = 10
and n2 = 15 and that x̄1 = 4.7 and x̄2 = 7.8. Use α = 0.05
a) Test the hypothesis and find the P-value.
b) Explain how the test could be conducted with a confidence interval.
Exercise 2
Consider the hypothesis test H0 : µ1 = µ2 against H1 : µ1 < µ2 with
known variances σ1 = 10 and σ2 = 5. Suppose that sample sizes n1 = 10
and n2 = 15 and that x̄1 = 14.2 and x̄2 = 19.7. Use α = 0.05
a) Test the hypothesis and find the P-value.
b) Explain how the test could be conducted with a confidence interval.
Question 1
Question 2
Question 3
Question 4
Inference on the Difference in Means of Two
Normal Distributions, Variance Unknown
Case 1: σ1² = σ2² = σ².
Let
- X11, X12, ..., X1n1 be a random sample from population 1;
- X21, X22, ..., X2n2 be a random sample from population 2, the two populations being independent and normal;
- X̄1, X̄2, S1² and S2² be the sample means and sample variances, respectively.
Inference on the Difference in Means of Two
Normal Distributions, Variance Unknown and
Equal
Pooled Estimator of Variance
The pooled estimator of σ², denoted by Sp², is defined by
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
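The pooled estimator weights each sample variance by its degrees of freedom, ni − 1. A one-function sketch (name ours), evaluated for illustration at n1 = n2 = 15, s1² = 4, s2² = 6.25, the values of the exercise later in this section:

```python
def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate s_p^2: sample variances weighted by their degrees of freedom n_i - 1."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


print(pooled_variance(4, 6.25, 15, 15))  # 5.125
```

With equal sample sizes the pooled estimate reduces to the plain average of s1² and s2², here (4 + 6.25)/2 = 5.125.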
Inference on the Difference in Means of Two
Normal Distributions, Variance Unknown and
Equal
Given the assumptions of this section, the quantity
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_p^2}{n_1} + \dfrac{S_p^2}{n_2}}}$$
has a t distribution with n1 + n2 − 2 degrees of freedom.
Confidence Interval on the Difference in Means,
Variances Unknown and Equal
Confidence Interval on the Difference in Means, Variances
Unknown and Equal
If x̄1, x̄2, s1² and s2² are the sample means and variances of two random samples of sizes n1 and n2, respectively, from two independent normal populations with unknown but equal variances, then a 100(1 − α)% confidence interval on the difference in means µ1 − µ2 is
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,n_1+n_2-2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,n_1+n_2-2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
where sp² is the pooled estimate of the common variance and tα/2, n1+n2−2 is the upper α/2 percentage point of the t distribution with n1 + n2 − 2 degrees of freedom.
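A minimal sketch of the pooled interval. Python's standard library has no t distribution, so here the critical value tα/2, n1+n2−2 is passed in as read from a t table, mirroring the hand calculation; the function name is ours. The illustrative values are those of the exercise later in this section (n1 = n2 = 15, x̄1 = 4.7, x̄2 = 7.8, s1² = 4, s2² = 6.25), with t0.025,28 = 2.048.

```python
from math import sqrt


def pooled_t_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """Two-sided CI on mu1 - mu2, variances unknown but equal.

    t_crit is t_{alpha/2, n1+n2-2}, looked up in a t table by the user.
    """
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp_sq / n1 + sp_sq / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


# 95% confidence, 28 degrees of freedom: t_{0.025,28} = 2.048
lo, hi = pooled_t_interval(4.7, 7.8, 4, 6.25, 15, 15, t_crit=2.048)
print(f"{lo:.3f} <= mu1 - mu2 <= {hi:.3f}")
```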
Confidence Interval on the Difference in Means,
Variances Unknown and Equal
One-sided confidence bound on the difference in means
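One-Sided Upper Confidence Bound
$$\mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha,\,n_1+n_2-2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$
One-Sided Lower Confidence Bound
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha,\,n_1+n_2-2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} \le \mu_1 - \mu_2$$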
Hypothesis Tests on the Difference in Means,
Variances Unknown and Equal
Tests on the Difference in Means of Two Normal Distributions,
Variances Unknown and Equal
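Under H0: µ1 − µ2 = ∆0, the pooled statistic T0 = (x̄1 − x̄2 − ∆0)/√(sp²/n1 + sp²/n2) is compared with the t distribution with n1 + n2 − 2 degrees of freedom. Because Python's standard library has no t distribution, this sketch approximates the t CDF with its own helper `t_cdf`, which integrates the t density by Simpson's rule (a numerical approximation of our own, not a library call; SciPy users could use `scipy.stats.t` instead). The alternative labels are our own, and the data are the illustrative values of the exercise later in this section.

```python
import math


def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with df degrees of freedom (Simpson's rule; steps even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                          # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area   # the density is symmetric about 0


def pooled_t_test(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0, alternative="two-sided"):
    """Return (t0, df, P-value) for H0: mu1 - mu2 = delta0, variances unknown but equal."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t0 = (xbar1 - xbar2 - delta0) / math.sqrt(sp_sq / n1 + sp_sq / n2)
    if alternative == "two-sided":        # H1: mu1 - mu2 != delta0
        p = 2 * (1 - t_cdf(abs(t0), df))
    elif alternative == "greater":        # H1: mu1 - mu2 > delta0
        p = 1 - t_cdf(t0, df)
    elif alternative == "less":           # H1: mu1 - mu2 < delta0
        p = t_cdf(t0, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t0, df, p


t0, df, p = pooled_t_test(4.7, 7.8, 4, 6.25, 15, 15)
print(f"t0 = {t0:.3f}, df = {df}, P-value = {p:.4f}")
```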
Hypothesis Tests on the Difference in Means,
Variances Unknown and Not Assumed Equal
Case 2: σ1² ≠ σ2².
Test Statistic for the Difference in Means, Variances Unknown and
Not Assumed Equal
If H0 : µ1 − µ2 = ∆0 is true, the statistic
$$T_0^* = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
is distributed approximately as t with degrees of freedom given by
$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
If v is not an integer, round down to the nearest integer.
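A minimal sketch of T0* and its approximate degrees of freedom v (function name ours). For illustration it uses the summary statistics of the exercise later in this section (n1 = n2 = 15, x̄1 = 4.7, x̄2 = 7.8, s1² = 4, s2² = 6.25), this time without assuming equal variances.

```python
import math


def welch_statistic(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Return (T0*, v) for H0: mu1 - mu2 = delta0, variances not assumed equal."""
    a = s1_sq / n1                        # estimated variance of xbar1
    b = s2_sq / n2                        # estimated variance of xbar2
    t0 = (xbar1 - xbar2 - delta0) / math.sqrt(a + b)
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t0, math.floor(v)              # round v down when it is not an integer


t0, v = welch_statistic(4.7, 7.8, 4, 6.25, 15, 15)
print(f"T0* = {t0:.3f}, v = {v}")  # T0* = -3.750, v = 26
```

Here v = 26.71 is rounded down to 26. With n1 = n2, T0* has the same value as the pooled T statistic; only the degrees of freedom differ (26 instead of n1 + n2 − 2 = 28).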
Confidence Interval on the Difference in Means,
Variances Unknown and Not Assumed Equal
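Case 2: σ1² ≠ σ2².
Approximate Confidence Interval on the Difference in Means,
Variances Unknown and Not Assumed Equal
If x̄1, x̄2, s1² and s2² are the sample means and variances of two random samples of sizes n1 and n2, respectively, from two independent normal populations with unknown and unequal variances, an approximate 100(1 − α)% confidence interval on the difference in means µ1 − µ2 is
$$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,v}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where v is the degrees of freedom given above and tα/2,v is the upper α/2 percentage point of the t distribution with v degrees of freedom.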
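A minimal sketch of the approximate interval (function name ours). As with the pooled case, tα/2,v is passed in from a t table because the standard library has no t quantile function. The illustrative data are n1 = n2 = 15, x̄1 = 4.7, x̄2 = 7.8, s1² = 4, s2² = 6.25, for which v = 26.71 rounds down to 26 and t0.025,26 = 2.056.

```python
from math import sqrt


def welch_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """Approximate two-sided CI on mu1 - mu2, variances not assumed equal.

    t_crit is t_{alpha/2, v}, looked up in a t table for the rounded-down v.
    """
    se = sqrt(s1_sq / n1 + s2_sq / n2)    # unpooled standard error
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


# 95% confidence with v = 26: t_{0.025,26} = 2.056
lo, hi = welch_interval(4.7, 7.8, 4, 6.25, 15, 15, t_crit=2.056)
print(f"{lo:.3f} <= mu1 - mu2 <= {hi:.3f}")
```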
Inference on the Difference in Means of Two
Normal Distributions, Variance Unknown
Exercise 1
Consider the hypothesis test H0 : µ1 = µ2 against H1 : µ1 ≠ µ2. Suppose that the sample sizes are n1 = n2 = 15, the sample means are x̄1 = 4.7 and x̄2 = 7.8, and the sample variances are s1² = 4 and s2² = 6.25. Assume that σ1² = σ2² and that the data are drawn from normal distributions. Use α = 0.05.
a) Test the hypothesis and find the P-value.
b) Explain how the test could be conducted with a confidence interval.
Question 1
Question 2
Question 3
Question 4
Question 5
Inference on Two Population Proportions
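Two independent random samples of sizes n1 and n2 (large enough), in which x1 and x2 observations, respectively, belong to the class of interest.
Sample proportions: p̂1 = x1/n1, p̂2 = x2/n2.
p̂1 − p̂2 is a point estimator of p1 − p2.
If n1 and n2 are large enough, then approximately
$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$
Pooled proportion:
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$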
Confidence Interval on the Difference in
Population Proportions
Approximate Confidence Interval on the Difference in Population
Proportions
If p̂1 and p̂2 are the sample proportions of observations in two independent random samples of sizes n1 and n2 that belong to a class of interest, an approximate two-sided 100(1 − α)% confidence interval on the difference in the true proportions p1 − p2 is
$$\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \le p_1 - p_2 \le \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
where zα/2 is the upper α/2 percentage point of the standard normal distribution.
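A minimal sketch of this interval using only the standard library (function name ours). The counts are hypothetical, chosen only for illustration: 45 of 150 items in sample 1 and 30 of 150 in sample 2 belong to the class of interest.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_interval(x1, n1, x2, n2, conf=0.95):
    """Approximate (large-sample) two-sided CI on p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2                       # sample proportions
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)    # z_{alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


# Hypothetical counts: 45 of 150 versus 30 of 150
lo, hi = proportion_diff_interval(45, 150, 30, 150)
print(f"{lo:.4f} <= p1 - p2 <= {hi:.4f}")
```

Unlike the test below, the interval uses the separate sample proportions p̂1 and p̂2 in the standard error, not the pooled proportion.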
Large-Sample Tests on the Difference in
Population Proportions
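We are interested in testing the hypotheses
H0 : p1 = p2
H1 : p1 ≠ p2
Test Statistic: for large n1 and n2, the quantity
$$Z = \frac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
is approximately N(0, 1). Under H0, p1 = p2 = p, and estimating p by the pooled proportion P̂ gives the test statistic
$$Z_0 = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$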
Large-Sample Tests on the Difference in
Population Proportions
Approximate Tests on the Difference of Two Population
Proportions
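A minimal sketch of the large-sample test of H0: p1 = p2, which estimates the common proportion under H0 by the pooled proportion p̂ = (x1 + x2)/(n1 + n2) defined earlier in this section. The function name, the alternative labels, and the counts (45 of 150 versus 30 of 150) are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample test of H0: p1 = p2 using the pooled proportion; returns (z0, P-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion, estimates p under H0
    z0 = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    Phi = NormalDist().cdf
    if alternative == "two-sided":                 # H1: p1 != p2
        return z0, 2 * (1 - Phi(abs(z0)))
    if alternative == "greater":                   # H1: p1 > p2
        return z0, 1 - Phi(z0)
    if alternative == "less":                      # H1: p1 < p2
        return z0, Phi(z0)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical counts: 45 of 150 versus 30 of 150
z0, p_value = two_proportion_z_test(45, 150, 30, 150)
print(f"z0 = {z0:.2f}, P-value = {p_value:.4f}")
```

For these counts z0 = 2.00 and the P-value is about 0.046, so at α = 0.05 the sketch would reject H0: p1 = p2.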
Question 1
Question 2
Question 3
Question 4