Hypothesis tests for two
independent samples
• Compare mean values of two
populations
• Compare two proportions
Two independent samples
model
Model of two groups of objects with different
a) Intervention levels,
b) Individual properties
6 steps of hypothesis testing…
We follow the 6 steps to perform hypothesis
testing when comparing two populations:
1. State the null and alternative hypotheses
2. Specify the test statistic
3. Specify the significance level
4. State the decision rule
5. Calculate the value of the test statistic
6. Draw conclusions
Problem 1. Compare two mean values
Let $(X_1, X_2, \ldots, X_n)$ be a sample of n independent
observations from a variable X with expectation $\mu_1$ and
variance $\sigma_1^2$, and let
$(Y_1, Y_2, \ldots, Y_m)$ be a sample of m independent
observations from a variable Y with expectation $\mu_2$ and
variance $\sigma_2^2$.
Problem: Compare the two expectations $\mu_1$ and $\mu_2$.
Estimate and compare the two sample means $\bar{X}$ and $\bar{Y}$.
Case 1: Testing a hypothesis about μ1
– μ2 when the population variances
are known
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
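The Case 1 statistic can be sketched in a few lines of Python using only the standard library. The function name `z_test_known_variances`, the parameter `delta0` (the hypothesised difference, 0 by default) and the sample data are illustrative assumptions, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_known_variances(x, y, var_x, var_y, delta0=0.0):
    """Two-sided z-test of H0: mu1 - mu2 = delta0 with known population variances."""
    n1, n2 = len(x), len(y)
    # Standard error of the difference of the two sample means.
    se = sqrt(var_x / n1 + var_y / n2)
    z = (fmean(x) - fmean(y) - delta0) / se
    # Two-sided p-value from the standard Normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up illustrative data; the variances 0.16 and 0.09 are assumed to be known.
x = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]
y = [4.4, 4.8, 4.6, 5.0, 4.3, 4.7]
z, p = z_test_known_variances(x, y, var_x=0.16, var_y=0.09)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```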
Cases 2 - 3: Testing a hypothesis about μ1 –
μ2 when the population variances are
unknown
In practice, the z-statistic is rarely used, because the
population variances, $\sigma_1^2$ and $\sigma_2^2$, are usually not
known and must be estimated by the sample variances, $s_1^2$ and $s_2^2$:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Instead of a z-statistic, we construct a t-statistic using
the sample variances ($s_1^2$ and $s_2^2$).
Two cases are considered when constructing the t-statistic:
Case 2: The two unknown population variances
are equal.
Case 3: The two unknown population variances
are not equal.
Case 2: Unknown but equal
variances
Calculate the pooled variance estimate (the pooled variance estimator) by:
$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
Construct the equal-variances t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad d.f. = n_1 + n_2 - 2$$
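As a rough illustration of Case 2, the pooled variance and the equal-variances t-statistic can be computed with the Python standard library. The function name `pooled_t_statistic`, the parameter `delta0` (the hypothesised value of μ1 – μ2) and the data are assumptions made for this sketch.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t_statistic(x, y, delta0=0.0):
    """Return (t, degrees of freedom, pooled variance) for the equal-variances case."""
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = variance(x), variance(y)  # sample variances (divisor n - 1)
    # Pooled variance: weighted average of the two sample variances.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (fmean(x) - fmean(y) - delta0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, sp_sq


# Made-up illustrative data.
t, df, sp_sq = pooled_t_statistic([1, 2, 3], [2, 4, 6])
print(f"t = {t:.4f}, d.f. = {df}, pooled variance = {sp_sq:.3f}")
```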
Perform an equal-variances t-test of μ1 – μ2
H0: μ1 - μ2 = 0
HA: μ1 - μ2 ≠ 0;
The problem can be solved by using the following
Theorem:
Theorem. Let $(X_1, X_2, \ldots, X_n)$ and $(Y_1, Y_2, \ldots, Y_m)$ be two
samples of independent observations selected correspondingly from a
variable X with sample mean $\bar{X}$ and sample variance $S_X^2$ and
from a variable Y with sample mean $\bar{Y}$ and sample variance $S_Y^2$
(both variables are normally distributed with a common variance). If the
hypothesis H is true (µ1 = µ2) then the variable (statistic)
$$t = (\bar{X} - \bar{Y})\,\sqrt{\frac{n\,m}{n + m}}\,\sqrt{\frac{n + m - 2}{(n - 1)\,S_X^2 + (m - 1)\,S_Y^2}}$$
has the Student distribution with (n+m-2) degrees of freedom.
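The statistic in the Theorem is the equal-variances t-statistic of Case 2 written in a different form. A short derivation, taking $n_1 = n$, $n_2 = m$, $s_1^2 = S_X^2$, $s_2^2 = S_Y^2$ and $\mu_1 - \mu_2 = 0$ under H:

```latex
\[
S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2},
\qquad
\frac{1}{n} + \frac{1}{m} = \frac{n+m}{n\,m},
\]
\[
t = \frac{\bar{X} - \bar{Y}}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}}
  = (\bar{X} - \bar{Y})
    \sqrt{\frac{n\,m}{n+m}}
    \sqrt{\frac{n+m-2}{(n-1)S_X^2 + (m-1)S_Y^2}} .
\]
```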
Hypothesis Tests
Hypothesis
H: µ1 = µ2
Alternative Hypothesis
K: µ1 ≠ µ2
Steps of testing
Step 1. Estimate sample mean values
Mean(X) , Mean(Y) and sample variances
Var(X) , Var(Y)
Step 2. Calculate the quantity
$$t = \big(\mathrm{Mean}(X) - \mathrm{Mean}(Y)\big)\,\sqrt{\frac{n\,m}{n + m}}\,\sqrt{\frac{n + m - 2}{(n - 1)\,\mathrm{Var}(X) + (m - 1)\,\mathrm{Var}(Y)}}$$
Step 3 (p-value approach). Taking a
variable $T_{(n+m-2)}$ having the Student distribution
with (n + m - 2) degrees of freedom,
calculate the p-value (probability)
$$b = P\{\,|T_{(n+m-2)}| \ge |t|\,\}$$
Step 4. Compare the p-value b with a significance
level α chosen in advance (= 5%, 1%, 0.5%
or 0.1%):
+ If b ≥ α, accept (do not reject) Hypothesis H:
the data give no significant evidence against
µ1 = µ2
+ If b < α, reject Hypothesis H and
conclude
µ1 ≠ µ2
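Steps 1 to 4 can be sketched in Python without third-party packages. The standard library has no Student distribution, so this sketch approximates the p-value by integrating the Student density numerically with Simpson's rule. That numerical shortcut, the function names and the data are assumptions of this example, not a method given in the slides.

```python
from math import exp, lgamma, pi, sqrt
from statistics import fmean, variance


def t_density(x, df):
    """Density of the Student distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """b = P{|T| >= |t|}, using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = 2.0 * total * h / 3.0  # approximates P{|T| < |t|}
    return max(0.0, 1.0 - central_mass)


def equal_variance_t_test(x, y, alpha=0.05):
    """Steps 1-4: returns (t, p-value b, True if H is rejected)."""
    n, m = len(x), len(y)
    pooled_ss = (n - 1) * variance(x) + (m - 1) * variance(y)
    t = (fmean(x) - fmean(y)) * sqrt(n * m / (n + m)) * sqrt((n + m - 2) / pooled_ss)
    b = two_sided_p_value(t, n + m - 2)
    return t, b, b < alpha


# Made-up illustrative data.
t, b, reject = equal_variance_t_test([1, 2, 3], [2, 4, 6])
print(f"t = {t:.4f}, p-value = {b:.4f}, reject H: {reject}")
```

In practice a statistics package or the Excel functions for the t-distribution would normally replace the hand-written integration.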
Using Excel to Compute t - Distribution
• Excel has two functions for computing cumulative
probabilities and x values for any t - distribution:
• T.DIST is used to compute the cumulative
probability given an x value.
• T.INV is used to compute the x value given a
cumulative probability.
Version B. Using Student critical value
Calculate the critical value $T_{(n+m-2)}(1-\alpha/2)$, the
$(1-\alpha/2)$ quantile of the Student distribution with n+m-2
degrees of freedom (α is a significance level
chosen in advance, = 5%, 1% or 0.5%)
Decide
- Reject Hypothesis H if
$|t| \ge T_{(n+m-2)}(1-\alpha/2)$
- Accept Hypothesis H if
$|t| < T_{(n+m-2)}(1-\alpha/2)$
Version C. Using confidence intervals
When the degrees of freedom (sample sizes) are large,
the Student distribution is close to the Normal
distribution. Then we can use confidence
intervals (with confidence level 1 - α, e.g. 95%
for α = 5%) for testing:
$$\left(\mathrm{Mean}(X) - T_{(n-1)}(1-\alpha/2)\,\frac{SD(X)}{\sqrt{n}}\;;\;\; \mathrm{Mean}(X) + T_{(n-1)}(1-\alpha/2)\,\frac{SD(X)}{\sqrt{n}}\right),$$
$$\left(\mathrm{Mean}(Y) - T_{(m-1)}(1-\alpha/2)\,\frac{SD(Y)}{\sqrt{m}}\;;\;\; \mathrm{Mean}(Y) + T_{(m-1)}(1-\alpha/2)\,\frac{SD(Y)}{\sqrt{m}}\right)$$
Decide
Reject Hypothesis H if the two intervals
are disjoint
Accept Hypothesis H if the two intervals have a
nonempty intersection
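A minimal sketch of Version C for large samples, using only the Python standard library. Because the samples are assumed large, the Student quantile is replaced by the Normal quantile, as the text suggests. The function names and data are illustrative assumptions. Checking whether two intervals overlap is generally more conservative than the t-test itself, so the two approaches can disagree in borderline cases.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_interval(sample, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the mean of a large sample."""
    q = NormalDist().inv_cdf(1 - alpha / 2)  # Normal quantile used in place of Student's
    half_width = q * stdev(sample) / sqrt(len(sample))
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def intervals_disjoint(a, b):
    """True when intervals a = (lo, hi) and b = (lo, hi) do not intersect."""
    return a[1] < b[0] or b[1] < a[0]


# Made-up illustrative data: two samples of size 100.
x = [0, 2] * 50
y = [1, 3] * 50
ci_x, ci_y = mean_interval(x), mean_interval(y)
print(ci_x, ci_y, "reject H:", intervals_disjoint(ci_x, ci_y))
```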
Case 3: Unknown and unequal
variances
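Construct the unequal-variances t-statistic as follows:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
with
$$d.f. = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Then the hypothesis testing procedure remains the
same as in Case 2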
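The unequal-variances statistic and its degrees of freedom can be sketched as follows with the Python standard library. The function name `welch_t_statistic`, the parameter `delta0` and the data are assumptions for illustration. The degrees of freedom are usually not an integer.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t_statistic(x, y, delta0=0.0):
    """Return (t, d.f.) for the unequal-variances case."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # s1^2/n1 and s2^2/n2
    t = (fmean(x) - fmean(y) - delta0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up illustrative data.
t, df = welch_t_statistic([1, 2, 3], [2, 4, 6])
print(f"t = {t:.4f}, d.f. = {df:.3f}")
```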
Problem 2. Compare two
proportions – the case of large
sample sizes
Let $(X_1, X_2, \ldots, X_{n_1})$ be a sample of a binary variable X taking
value 1 with probability $p_1$ and value 0 with probability $(1 - p_1)$,
and $(Y_1, Y_2, \ldots, Y_{n_2})$ be a sample of a binary variable Y taking value 1
with probability $p_2$ and value 0 with probability $(1 - p_2)$;
$p_1, p_2 \in (0, 1)$.
Consider the Hypothesis H: $p_1 = p_2$
and Alternative Hypothesis K: $p_1 \ne p_2$
Note. Variable X has expectation $p_1$ and variance $p_1(1 - p_1)$.
Variable Y has expectation $p_2$ and variance $p_2(1 - p_2)$.
Therefore we can treat the testing problem as a special case
of comparing two mean values (expectations) $p_1$ and $p_2$.
By the de Moivre-Laplace theorem, for large sample
sizes, i.e. when
n1×p1 ≥ 5 and n1×(1-p1) ≥ 5,
n2×p2 ≥ 5 and n2×(1-p2) ≥ 5,
the sample proportions m(p1)/n1 and m(p2)/n2
of occurrences of the value 1 are
approximately normally distributed
with expectations p1, p2 and
variances p1×(1-p1)/n1, p2×(1-p2)/n2,
respectively. Denote m1 = m(p1) and m2 =
m(p2).
If the Hypothesis H is true, then treat the two samples $(X_1, X_2, \ldots, X_{n_1})$
and $(Y_1, Y_2, \ldots, Y_{n_2})$ as samples collected from one variable and estimate
the common variance of X and Y by
$$\frac{m_1 + m_2}{n_1 + n_2}\left(1 - \frac{m_1 + m_2}{n_1 + n_2}\right) = \frac{m_1 + m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2}$$
then compute the statistic
$$u = \left(\frac{m_1}{n_1} - \frac{m_2}{n_2}\right) \Bigg/ \sqrt{\frac{m_1 + m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2}{n_1\,n_2}}$$
for testing, where m1 and m2 are respectively the numbers
of values 1 that appear in the two samples.
By the Central Limit Theorem, when the sample sizes
are large, the difference Mean(X) - Mean(Y)
has a distribution very close to the Normal
distribution. Then the testing procedure can
be as follows:
Step 1. Calculate the value of the statistic
$$u = \left(\frac{m_1}{n_1} - \frac{m_2}{n_2}\right) \Bigg/ \sqrt{\frac{m_1 + m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2 - m_1 - m_2}{n_1 + n_2}\cdot\frac{n_1 + n_2}{n_1\,n_2}}$$
Step 2. Using the standard Normal distribution N(0,1), find
the probability (p-value)
b = P { | N(0,1) | > | u | }
Step 3. Compare the probability b (p-value) to
the significance level α chosen in advance
* If b ≥ α, accept (do not reject) Hypothesis H:
the two proportions do not differ significantly
* If b < α, reject Hypothesis H and
conclude that the two proportions are different
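Steps 1 to 3 can be sketched with the Python standard library, whose `statistics.NormalDist` class provides the Normal cumulative distribution function. The function name `two_proportion_z_test` and the counts are made-up illustrations.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(m1, n1, m2, n2, alpha=0.05):
    """Returns (u, p-value b, True if H: p1 = p2 is rejected)."""
    pooled = (m1 + m2) / (n1 + n2)  # common proportion estimated under H
    se = sqrt(pooled * (1 - pooled) * (n1 + n2) / (n1 * n2))
    u = (m1 / n1 - m2 / n2) / se
    b = 2.0 * (1.0 - NormalDist().cdf(abs(u)))  # P{|N(0,1)| > |u|}
    return u, b, b < alpha


# Made-up counts: 60 ones out of 100 in sample 1, 40 ones out of 100 in sample 2.
u, b, reject = two_proportion_z_test(60, 100, 40, 100)
print(f"u = {u:.4f}, p-value = {b:.5f}, reject H: {reject}")
```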
Version B. Using Normal critical value
Looking in the table of the Normal distribution,
find the critical value $u_{\alpha/2}$ of the standard Normal
distribution (the critical value for α = 5%
equals 1.96)
Decide
- Reject Hypothesis H if
$|u| \ge u_{\alpha/2}$
- Accept Hypothesis H if
$|u| < u_{\alpha/2}$
Version C. Using confidence intervals
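Use confidence intervals (with confidence
level 1 - α) of the estimated proportions for
testing:
$$\left(\frac{m_1}{n_1} - Z_{1-\alpha/2}\sqrt{\frac{m_1}{n_1}\left(1 - \frac{m_1}{n_1}\right)\Big/ n_1}\;;\;\; \frac{m_1}{n_1} + Z_{1-\alpha/2}\sqrt{\frac{m_1}{n_1}\left(1 - \frac{m_1}{n_1}\right)\Big/ n_1}\right)$$
$$\left(\frac{m_2}{n_2} - Z_{1-\alpha/2}\sqrt{\frac{m_2}{n_2}\left(1 - \frac{m_2}{n_2}\right)\Big/ n_2}\;;\;\; \frac{m_2}{n_2} + Z_{1-\alpha/2}\sqrt{\frac{m_2}{n_2}\left(1 - \frac{m_2}{n_2}\right)\Big/ n_2}\right)$$
Decide
Reject Hypothesis H if the two intervals
are disjoint
Accept Hypothesis H if the two intervals
have a nonempty intersection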
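The interval check for two proportions might look like this in Python (standard library only). The function names and the counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(m, n, alpha=0.05):
    """Normal-approximation (1 - alpha) confidence interval for a proportion m/n."""
    p_hat = m / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def reject_by_intervals(m1, n1, m2, n2, alpha=0.05):
    """True when the two confidence intervals are disjoint (reject H)."""
    lo1, hi1 = proportion_interval(m1, n1, alpha)
    lo2, hi2 = proportion_interval(m2, n2, alpha)
    return hi1 < lo2 or hi2 < lo1


# Made-up counts.
print(proportion_interval(60, 100), proportion_interval(40, 100))
print("reject H:", reject_by_intervals(60, 100, 40, 100))
```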
Compare several proportions
Let X be a binary variable taking the two values 0 and 1.
Collecting data from that variable under k different
conditions, we have a sample containing k groups of
observations, one group per condition.
Let $p_1, p_2, \ldots, p_k$
be the probabilities of occurrence of the value 1 of
variable X under each of the above k
conditions.
Hypothesis
H: $p_1 = p_2 = \cdots = p_k$
Alternative Hypothesis
K: there is some difference among $p_1, p_2, \ldots, p_k$
Data: Construct a 2×k table of 2 rows and k columns:
one column for each group, the 1st row for value 1,
the 2nd row for value 0 of the variable in the
observations:
Table 1. Observed frequency: the cell $n_{ij}$ is the number of observations with value i (i = 1, 0) in group j (j = 1, 2, ..., k), with totals
$$n_1 = n_{11} + n_{12} + \cdots + n_{1k}\,;\qquad n_0 = n_{01} + n_{02} + \cdots + n_{0k}$$
$$n^{(j)} = n_{1j} + n_{0j}\,;\quad j = 1, 2, \ldots, k\,;\qquad n = n_0 + n_1$$
• If the hypothesis is correct, the
proportion of occurrences of 1,
estimated jointly over all
columns (conditions), is equal to
$n_1 / n$
• The proportion of occurrences of
0, estimated jointly over all
columns, is equal to
$n_0 / n$
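Construct the table of expected (theoretical)
frequencies under the hypothesis:
Table 2. Predicted (expected) frequency: the expected frequency of value i (i = 0, 1) in group j is $n_i\,n^{(j)} / n$.
Compute the test statistic:
$$\chi^2 = \sum_{j=1}^{k}\sum_{i=0}^{1}\left(n_{ij} - \frac{n_i\,n^{(j)}}{n}\right)^2 \Bigg/ \left(\frac{n_i\,n^{(j)}}{n}\right)$$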
LEMMA. Suppose that hypothesis H is true.
Then the variable $\chi^2$ has a distribution approximately equal
to the Chi-square distribution $\chi^2_{(k-1)}$ with (k - 1)
degrees of freedom.
[Figure: density function of the Chi-squared distribution]
Using Excel to Compute Chi – squared
Distribution
• Excel has two functions for computing cumulative
probabilities and x values for any Chi - squared
distribution:
• CHISQ.DIST is used to compute the cumulative
probability given an x value (the right-tail p-value
is 1 minus this cumulative probability).
• CHISQ.INV is used to compute the x value given a
cumulative probability (used to find the critical value).
Method A (p-value):
Step 1. Taking a variable $\chi^2_{(k-1)}$ having the Chi-squared
distribution with (k-1) degrees of
freedom, calculate the probability (p-value)
$$b = P\{\chi^2_{(k-1)} > \chi^2\}$$
Step 2. Compare the probability b to the
significance level α chosen in advance:
* If b ≥ α, accept (do not reject) hypothesis H:
the proportions do not differ significantly
* If b < α, reject hypothesis H and conclude
that there is some difference between the
proportions.
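Method A can be sketched in Python with the standard library. Since the standard library has no Chi-squared distribution, the helper `chi2_sf` computes the upper-tail probability from the recurrence for the regularised incomplete gamma function, Q(a+1, x) = Q(a, x) + x^a e^(-x) / Γ(a+1), starting from Q(1/2, x) = erfc(√x) or Q(1, x) = e^(-x). This helper, the function names and the counts are assumptions of this sketch, not part of the slides.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """P{chi-squared with df degrees of freedom > x}; intended for moderate df."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, exp(-h)        # Q(1, h)
    else:
        a, q = 0.5, erfc(sqrt(h))  # Q(1/2, h)
    while a < df / 2:
        q += h ** a * exp(-h) / gamma(a + 1)
        a += 1.0
    return q


def chi2_proportions_test(ones, zeros, alpha=0.05):
    """ones[j], zeros[j]: observed counts of value 1 and value 0 in group j."""
    k = len(ones)
    n1, n0 = sum(ones), sum(zeros)
    n = n1 + n0
    stat = 0.0
    for j in range(k):
        column_total = ones[j] + zeros[j]
        for observed, row_total in ((ones[j], n1), (zeros[j], n0)):
            expected = row_total * column_total / n
            stat += (observed - expected) ** 2 / expected
    b = chi2_sf(stat, k - 1)
    return stat, b, b < alpha


# Made-up counts for k = 3 groups.
stat, b, reject = chi2_proportions_test([60, 40, 50], [40, 60, 50])
print(f"chi2 = {stat:.3f}, p-value = {b:.4f}, reject H: {reject}")
```

For k = 2 the statistic equals the square of the u statistic for two proportions, so both tests lead to the same decision.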
Method B. (Critical value)
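Looking in the table of the Chi-squared
distribution, find the critical value $\chi^2_{(k-1)}(\alpha)$
of the Chi-squared distribution with k-1
degrees of freedom, i.e. the value exceeded with
probability α (α is a significance level chosen
in advance, = 5%, 1% or 0.5%)
Decide
- Reject Hypothesis H: $p_1 = p_2 = \cdots = p_k$ if
$\chi^2 \ge \chi^2_{(k-1)}(\alpha)$
- Accept Hypothesis H: $p_1 = p_2 = \cdots = p_k$ if
$\chi^2 < \chi^2_{(k-1)}(\alpha)$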
Test for two related (paired)
samples
• Compare two mean
values
C. Model of two dependent
(paired) samples
• The two dependent samples model is used in a study when
• A) Each object in the first sample is chosen together with a similar
(paired) object in the second sample, or
• B) Each object in the second sample is the same as one in the first sample,
but the measurements in the two samples are taken under different
conditions.
Compare mean values of two
related samples
For related variables X and Y, the
comparison of mean values is equivalent to
the comparison of the mean value of the
difference variable X – Y to the value 0,
so the problem reduces to the one-sample
model.
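Hypothesis
H: $\mu_1 = \mu_2$
Alternative Hypothesis
K: $\mu_1 \ne \mu_2$
where $\mu_1$ and $\mu_2$ are the
expectations of X and Y
With the difference variable $D = X - Y$,
comparing the expectation $\mu_D = \mu_1 - \mu_2$ of D to
0:
Hypothesis
H: $\mu_D = 0$
Alternative Hypothesis
K: $\mu_D \ne 0$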
With the n paired differences $D_i = X_i - Y_i$, their sample mean $\bar{D}$ and
sample standard deviation $S_D$, the empirical value of the test statistic is
$$t = \frac{\bar{D}}{S_D / \sqrt{n}}$$
a) Compare the empirical value of the t-test
statistic with the critical value
$T_{(n-1)}(1-\alpha/2)$, which is the
$(1-\alpha/2)$ percentile of the Student
distribution with n-1 degrees of freedom:
• - If $|t| \ge T_{(n-1)}(1-\alpha/2)$, reject the hypothesis H,
• - If $|t| < T_{(n-1)}(1-\alpha/2)$, accept H.
b) Taking a random variable T having the
Student distribution with n-1 degrees of
freedom, calculate the probability of
significance
$$b = P\{\,|T| \ge |t|\,\}$$
Compare the probability of significance b
with the significance level α:
• - If b < α, reject H,
• - If b ≥ α, accept H.
c) Determine the 95% confidence
interval of the estimate $\bar{D}$:
$$\left(\bar{D} - T_{(n-1)}(0.975)\,\frac{S_D}{\sqrt{n}}\;;\;\; \bar{D} + T_{(n-1)}(0.975)\,\frac{S_D}{\sqrt{n}}\right)$$
Compare 0 to the confidence
interval:
• - If 0 lies outside the interval, reject H,
• - If 0 lies inside the interval, accept H.
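The paired procedure reduces to a one-sample computation on the differences, which can be sketched in Python with the standard library. The critical value is passed in by the caller (for example read from a Student table), because the standard library has no Student quantile function. The function name `paired_t`, the parameter `t_crit` and the data are illustrative assumptions.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(x, y, t_crit):
    """t_crit: the (1 - alpha/2) quantile of Student's distribution with n - 1 d.f."""
    d = [xi - yi for xi, yi in zip(x, y)]  # paired differences
    n = len(d)
    se = stdev(d) / sqrt(n)                # standard error of the mean difference
    t = fmean(d) / se
    interval = (fmean(d) - t_crit * se, fmean(d) + t_crit * se)
    reject = abs(t) >= t_crit
    return t, n - 1, interval, reject


# Made-up measurements on the same 5 objects under two conditions.
x = [5, 7, 9, 6, 8]
y = [4, 5, 8, 6, 5]
# 2.776 is the tabulated 0.975 quantile of Student's distribution with 4 d.f.
t, df, interval, reject = paired_t(x, y, t_crit=2.776)
print(f"t = {t:.4f}, d.f. = {df}, 95% CI = {interval}, reject H: {reject}")
```

Apart from the boundary case, approaches a) and c) give the same decision: |t| exceeds the critical value exactly when 0 falls outside the confidence interval.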