Comparing Two Population Means
Introductory Statistical
Methods
Preparatory Module
Chapter 8
The Comparison of
Two Populations
8 LEARNING OBJECTIVES
After studying this chapter you should be able to:
• Explain the need to compare two population parameters
• Conduct a paired difference test for the difference in
population means
• Conduct an independent samples test for the difference in
population means
• Describe why a paired difference test is better than an
independent samples test
• Conduct a test for difference in population proportions
• Test whether two population variances are equal
Paired-Observation Comparisons
of Means
Test statistic for the paired-observation t test:

$$t = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}}$$

where $\bar{D}$ is the sample average difference between each pair of observations, $s_D$ is the sample standard deviation of these differences, and the sample size, $n$, is the number of pairs of observations. The symbol $\mu_{D_0}$ is the population mean difference under the null hypothesis. When the null hypothesis is true and the population mean difference is $\mu_{D_0}$, the statistic has a t distribution with $(n - 1)$ degrees of freedom.
Example 8-1
A random sample of 16 viewers of Home Shopping Network was selected for an experiment. All viewers in the sample had recorded the amount of money they spent shopping during the holiday season of the previous year. The next year, these people were given access to the cable network and were asked to keep a record of their total purchases during the holiday season. Home Shopping Network managers want to test the null hypothesis that their service does not increase shopping volume, versus the alternative hypothesis that it does.
$$t = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}} = \frac{32.81 - 0}{55.75 / \sqrt{16}} = 2.354$$

t = 2.354 > 1.753, so $H_0$ is rejected and we conclude that there is evidence that shopping volume by network viewers has increased, with a p-value between 0.01 and 0.025. The Template output gives a more exact p-value of 0.0163. See the next slide for the output.
[Figure: t distribution with df = 15, showing the nonrejection and rejection regions. Marked points: $t_{0.05}$ = 1.753, $t_{0.025}$ = 2.131, $t_{0.01}$ = 2.602, and the test statistic 2.354.]
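The calculation in Example 8-1 can be sketched in a few lines of Python. This is only an illustration: the function name `paired_t_statistic` and its parameter names are made up for this sketch, and the critical value 1.753 is simply the $t_{0.05}$ value with 15 degrees of freedom quoted in the example, not something the code looks up.

```python
import math

def paired_t_statistic(d_bar, s_d, n, mu_d0=0.0):
    """Paired-difference t statistic; it has n - 1 degrees of freedom."""
    standard_error = s_d / math.sqrt(n)  # s_D / sqrt(n)
    return (d_bar - mu_d0) / standard_error

# Summary figures from Example 8-1: mean difference, standard deviation
# of the differences, and number of pairs.
t_stat = paired_t_statistic(d_bar=32.81, s_d=55.75, n=16)

# Right-tailed test at the 5% level; 1.753 is taken from the example.
T_CRIT_05 = 1.753
reject_h0 = t_stat > T_CRIT_05

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The function works from summary statistics only; with raw data one would first compute the pairwise differences and then their mean and standard deviation.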
Example 8-2
It has recently been asserted that returns on stocks may change once a story about a company appears in The Wall Street Journal column “Heard on the Street.” An investments analyst collects a random sample of 50 stocks that were recommended as winners by the editor of “Heard on the Street,” and proceeds to conduct a two-tailed test of whether or not the annualized return on stocks recommended in the column differs between the month before and the month after the recommendation. For each stock the analyst computes the return before and the return after the event, and computes the difference in the two return figures. He then computes the average and standard deviation of the differences.
$H_0: \mu_D = 0$
$H_1: \mu_D \neq 0$
n = 50

$$z = \frac{\bar{D} - \mu_{D_0}}{s_D / \sqrt{n}} = \frac{0.1 - 0}{0.05 / \sqrt{50}} = 14.14$$
A $(1 - \alpha)100\%$ confidence interval for the mean difference $\mu_D$:

$$\bar{D} \pm t_{\alpha/2} \frac{s_D}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the value of the t distribution with $(n - 1)$ degrees of freedom that cuts off an area of $\alpha/2$ to its right. When the sample size is large, we may approximate $t_{\alpha/2}$ with $z_{\alpha/2}$.
95% confidence interval for the data in Example 8-2:

$$\bar{D} \pm z_{\alpha/2} \frac{s_D}{\sqrt{n}} = 0.1 \pm 1.96 \frac{0.05}{\sqrt{50}} = 0.1 \pm (1.96)(0.0071) = 0.1 \pm 0.014 = [0.086, 0.114]$$

Note that this confidence interval does not include the value 0.
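As a rough check of the interval above, here is a minimal Python sketch using the large-sample z approximation. The function name `paired_mean_ci` is an assumption made for this illustration; `statistics.NormalDist` from the standard library supplies $z_{\alpha/2}$ (about 1.96 for 95% confidence).

```python
import math
from statistics import NormalDist

def paired_mean_ci(d_bar, s_d, n, confidence=0.95):
    """Large-sample confidence interval for the mean paired difference."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    margin = z * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Summary figures from Example 8-2
low, high = paired_mean_ci(d_bar=0.1, s_d=0.05, n=50)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

For small samples the z value would have to be replaced by the appropriate t value with n - 1 degrees of freedom, which this sketch does not compute.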
Large-sample test statistic for the difference between two population means:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The term $(\mu_1 - \mu_2)_0$ is the difference between $\mu_1$ and $\mu_2$ under the null hypothesis. It is equal to zero in situations I and II, and it is equal to the prespecified value D in situation III. The term in the denominator is the standard deviation of the difference between the two sample means (it relies on the assumption that the two samples are independent).
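Example 8-3

Population 1: Preferred Visa
$n_1 = 1200$, $\bar{x}_1 = 452$, $\sigma_1 = 212$

Population 2: Gold Card
$n_2 = 800$, $\bar{x}_2 = 523$, $\sigma_2 = 185$

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(452 - 523) - 0}{\sqrt{\dfrac{212^2}{1200} + \dfrac{185^2}{800}}} = \frac{-71}{\sqrt{80.2346}} = \frac{-71}{8.96} = -7.926$$

$H_0$ is rejected at any common level of significance.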
[Figure: standard normal distribution with rejection regions below $-z_{0.01} = -2.576$ and above $z_{0.01} = 2.576$ and the nonrejection region between them. Test statistic = -7.926.]

The null hypothesis is rejected, and we may conclude that there is a statistically significant difference between the average monthly charges of Gold Card and Preferred Visa cardholders.
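Population 1: Duracell
$n_1 = 100$, $\bar{x}_1 = 308$, $\sigma_1 = 84$

Population 2: Energizer
$n_2 = 100$, $\bar{x}_2 = 254$, $\sigma_2 = 67$

$H_0: \mu_1 - \mu_2 \le 45$
$H_1: \mu_1 - \mu_2 > 45$

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(308 - 254) - 45}{\sqrt{\dfrac{84^2}{100} + \dfrac{67^2}{100}}} = \frac{9}{\sqrt{115.45}} = \frac{9}{10.75} = 0.838$$

p-value: $P(z > 0.838) = 0.201$

$H_0$ may not be rejected at any common level of significance.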
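The two large-sample z tests above (the credit card comparison and the Duracell versus Energizer comparison) can be reproduced with a short Python sketch. The function name `two_sample_z` and the argument name `diff0` (standing for the hypothesized difference $(\mu_1 - \mu_2)_0$) are chosen for this illustration only.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, diff0=0.0):
    """z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1 - x2) - diff0) / standard_error

# Credit card example: two-tailed test of no difference in means.
z_cards = two_sample_z(452, 523, 212, 185, 1200, 800)
p_cards = 2 * NormalDist().cdf(-abs(z_cards))  # two-tailed p-value

# Battery example: right-tailed test with hypothesized difference 45.
z_batteries = two_sample_z(308, 254, 84, 67, 100, 100, diff0=45)
p_batteries = 1 - NormalDist().cdf(z_batteries)  # right-tail p-value

print(f"cards:     z = {z_cards:.3f}, p = {p_cards:.2e}")
print(f"batteries: z = {z_batteries:.3f}, p = {p_batteries:.3f}")
```

The same function covers all three situations described earlier: `diff0` stays at zero for a test of equality and takes the prespecified value D otherwise.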
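A 95% confidence interval using the data in Example 8-3 (here the difference is taken as the larger sample mean minus the smaller one):

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (523 - 452) \pm 1.96 \sqrt{\frac{212^2}{1200} + \frac{185^2}{800}} = [53.44, 88.56]$$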
If we might assume that the population variances $\sigma_1^2$ and $\sigma_2^2$ are equal (even though unknown), then the two sample variances, $s_1^2$ and $s_2^2$, provide two separate estimators of the common population variance. Combining the two separate estimates into a pooled estimate should give us a better estimate than either sample variance by itself.
[Figure: data points of Sample 1 and Sample 2 plotted around their sample means $\bar{x}_1$ and $\bar{x}_2$, with the deviations from the mean marked, one for each sample data point.]

From sample 1 we get the estimate $s_1^2$ with $(n_1 - 1)$ degrees of freedom.
From sample 2 we get the estimate $s_2^2$ with $(n_2 - 1)$ degrees of freedom.
From both samples together we get a pooled estimate, $s_p^2$, with $(n_1 - 1) + (n_2 - 1) = (n_1 + n_2 - 2)$ total degrees of freedom.
A pooled estimate of the common population variance, based on a sample variance $s_1^2$ from a sample of size $n_1$ and a sample variance $s_2^2$ from a sample of size $n_2$:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The degrees of freedom associated with this estimator is:

$$df = n_1 + n_2 - 2$$

The pooled estimate of the variance is a weighted average of the two individual sample variances, with weights proportional to the sizes of the two samples. That is, larger weight is given to the variance from the larger sample.
Test statistic for the difference between two population means, assuming equal population variances:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$

where $(\mu_1 - \mu_2)_0$ is the difference between the two population means under the null hypothesis (zero or some other number D).

The number of degrees of freedom of the test statistic is $df = n_1 + n_2 - 2$ (the number of degrees of freedom associated with $s_p^2$, the pooled estimate of the population variance).
Example 8-5
Do the data provide sufficient evidence to conclude that the average percentage increase in the CPI differs when oil sells at these two different prices?
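Population 1:
$n_1 = 14$, $\bar{x}_1 = 0.317\%$, $s_1 = 0.12\%$

Population 2: Oil price = $20.00
$n_2 = 9$, $\bar{x}_2 = 0.21\%$, $s_2 = 0.11\%$

$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$

$df = n_1 + n_2 - 2 = 14 + 9 - 2 = 21$

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\left[ \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \right] \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{0.107}{\sqrt{0.00247}} = \frac{0.107}{0.0497} = 2.154$$

Critical point: $t_{0.025} = 2.080$

$H_0$ may be rejected at the 5% level of significance.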
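A minimal Python sketch of the pooled-variance t test, applied to the summary figures of Example 8-5, is given below. The function names `pooled_variance` and `pooled_t_statistic` are hypothetical, and the critical value 2.080 is the $t_{0.025}$ value with 21 degrees of freedom quoted in the example rather than something the code derives.

```python
import math

def pooled_variance(s1, s2, n1, n2):
    """Weighted average of the two sample variances (weights n_i - 1)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_statistic(x1, x2, s1, s2, n1, n2, diff0=0.0):
    """t statistic assuming equal population variances, with its df."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ((x1 - x2) - diff0) / standard_error, n1 + n2 - 2

# Summary figures from Example 8-5 (all in percent)
t_stat, df = pooled_t_statistic(0.317, 0.21, 0.12, 0.11, 14, 9)

# Two-tailed test at the 5% level; 2.080 is taken from the example.
T_CRIT_025 = 2.080
reject_h0 = abs(t_stat) > T_CRIT_025

print(f"t = {t_stat:.3f} with {df} df, reject H0: {reject_h0}")
```

Note that the function takes standard deviations, not variances, and squares them internally.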
Example 8-6
[Figure: t distribution with df = 25, showing the nonrejection and rejection regions. Critical point $t_{0.10}$ = 1.316; test statistic = 0.91.]

Since the test statistic is less than $t_{0.10}$, the null hypothesis cannot be rejected at any reasonable level of significance. We conclude that the price reduction does not significantly affect sales.
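A $(1 - \alpha)100\%$ confidence interval for the difference between two population means, assuming equal population variances:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

A 95% confidence interval using the data in Example 8-6:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = (6870 - 6598) \pm 2.06 \sqrt{(595835)(0.15)} = [-343.85, 887.85]$$

The interval contains zero, which agrees with the failure to reject the null hypothesis in Example 8-6.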
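The interval for Example 8-6 can be checked with the following Python sketch. Because the individual sample sizes are not given here, the hypothetical function `pooled_mean_diff_ci` takes the pooled variance (595835), the sum $1/n_1 + 1/n_2$ (0.15), and the t value (2.06) directly, exactly as they appear in the example.

```python
import math

def pooled_mean_diff_ci(x1, x2, sp2, inv_n_sum, t_crit):
    """Confidence interval for mu1 - mu2 under equal variances.

    sp2       -- pooled variance estimate
    inv_n_sum -- the quantity 1/n1 + 1/n2
    t_crit    -- t value cutting off alpha/2 in the right tail
    """
    margin = t_crit * math.sqrt(sp2 * inv_n_sum)
    diff = x1 - x2
    return diff - margin, diff + margin

# Figures from Example 8-6
low, high = pooled_mean_diff_ci(6870, 6598, 595835, 0.15, 2.06)
contains_zero = low < 0 < high

print(f"95% CI: [{low:.2f}, {high:.2f}], contains zero: {contains_zero}")
```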
When the population proportions are hypothesized to be equal, then a pooled estimator of the proportion ($\hat{p}$) may be used in calculating the test statistic.

A large-sample test statistic for the difference between two population proportions, when the hypothesized difference is zero:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$

where $\hat{p}_1 = x_1 / n_1$ is the sample proportion in sample 1 and $\hat{p}_2 = x_2 / n_2$ is the sample proportion in sample 2. The symbol $\hat{p}$ stands for the combined sample proportion in both samples, considered as a single sample. That is:

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
Carry out a two-tailed test of the equality of banks’ share of the car loan market in 1980 and 1995.
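Population 1: 1980
$n_1 = 100$, $x_1 = 53$, $\hat{p}_1 = 0.53$

Population 2: 1995
$n_2 = 100$, $x_2 = 43$, $\hat{p}_2 = 0.43$

$H_0: p_1 - p_2 = 0$
$H_1: p_1 - p_2 \neq 0$

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{53 + 43}{100 + 100} = 0.48$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{0.53 - 0.43}{\sqrt{(0.48)(0.52) \left( \dfrac{1}{100} + \dfrac{1}{100} \right)}} = \frac{0.10}{\sqrt{0.004992}} = \frac{0.10}{0.07065} = 1.415$$

Critical point: $z_{0.05} = 1.645$

$H_0$ may not be rejected even at a 10% level of significance.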
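The pooled two-proportion test can be sketched in Python as follows, using the 1980 and 1995 car loan figures. The function name `two_proportion_z` is an assumption of this sketch; the two-tailed p-value is computed with `statistics.NormalDist` and is not a figure given in the example.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # both samples treated as one
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Car loan example: 53 of 100 in 1980, 43 of 100 in 1995
z_stat = two_proportion_z(53, 100, 43, 100)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # two-tailed

print(f"z = {z_stat:.3f}, two-tailed p-value = {p_value:.3f}")
```

The pooled standard error is appropriate only when the hypothesized difference is zero, as it is in this test.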
[Figure: standard normal distribution with rejection regions below $-z_{0.05} = -1.645$ and above $z_{0.05} = 1.645$ and the nonrejection region between them. Test statistic = 1.415.]

At the 10% level of significance, we may conclude that there is no statistically significant difference between banks’ shares of car loans in 1980 and 1995.
Carry out a one-tailed test to determine whether the population proportion of traveler’s check buyers who buy at least $2500 in checks when sweepstakes prizes are offered is at least 10% higher than the proportion of such buyers when no sweepstakes are on.
[Figure: standard normal distribution with the critical point $z_{0.001} = 3.09$ marked.]

... the null hypothesis may be rejected, and we may ...
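A $(1 - \alpha)100\%$ large-sample confidence interval for the difference between two population proportions:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

A 95% confidence interval using the data in Example 8-8:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.4 - 0.2) \pm 1.96 \sqrt{\frac{(0.4)(0.6)}{300} + \frac{(0.2)(0.8)}{700}}$$

$$= 0.2 \pm (1.96)(0.0321) = 0.2 \pm 0.063 = [0.137, 0.263]$$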
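The interval above can be reproduced with a short Python sketch. The function name `proportion_diff_ci` is made up for this illustration. Unlike the test statistic for equal proportions, the interval uses the two sample proportions separately rather than a pooled proportion.

```python
import math
from statistics import NormalDist

def proportion_diff_ci(p1, n1, p2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * standard_error
    diff = p1 - p2
    return diff - margin, diff + margin

# Sample proportions and sizes from Example 8-8
low, high = proportion_diff_ci(0.4, 300, 0.2, 700)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```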
The F distribution is the distribution of the ratio of two chi-square random variables that are independent of each other, each of which is divided by its own degrees of freedom.

An F random variable with $k_1$ and $k_2$ degrees of freedom:

$$F_{(k_1, k_2)} = \frac{\chi_1^2 / k_1}{\chi_2^2 / k_2}$$
The F Distribution
• The F random variable cannot be negative, so it is bound by zero on the left.
• The F distribution is skewed to the right.
• The F distribution is identified by the number of degrees of freedom in the numerator, $k_1$, and the number of degrees of freedom in the denominator, $k_2$.

[Figure: F distributions with different degrees of freedom: F(25,30), F(10,15), and F(5,6); density f(F) plotted against F.]
Critical points of the F distribution cutting off a right-tail area of 0.05 (columns: numerator degrees of freedom k1; rows: denominator degrees of freedom k2):

k2 \ k1      1      2      3      4      5      6      7      8      9
   1     161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5
   2     18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38
   3     10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81
   4      7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
   5      6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77
   6      5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10
   7      5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68
   8      5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39
   9      5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18
  10      4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02
  11      4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90
  12      4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80
  13      4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71
  14      4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65
  15      4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59

[Figure: density f(F) of an F distribution with the critical point $F_{0.05}$ = 3.01 marked (the table entry for k1 = 7, k2 = 11).]
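[Figure: density f(F) of the F(6,9) distribution, with the right-hand critical point $F_{0.05}$ = 3.37 and the left-hand critical point $F_{0.95}$ = 1/4.10 = 0.2439 marked.]

For the F(6,9) distribution, the right-hand critical point is $F_{0.05}$ = 3.37. The corresponding left-hand critical point is given by:

$$\frac{1}{F_{(9,6)}} = \frac{1}{4.10} = 0.2439$$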
I: Two-Tailed Test
• $\sigma_1 = \sigma_2$
• $H_0: \sigma_1^2 = \sigma_2^2$
• $H_1: \sigma_1^2 \neq \sigma_2^2$

II: One-Tailed Test
• $\sigma_1 \le \sigma_2$
• $H_0: \sigma_1^2 \le \sigma_2^2$
• $H_1: \sigma_1^2 > \sigma_2^2$
Example 8-9
The economist wants to test whether or not the event (interceptions and prosecution of insider
traders) has decreased the variance of prices of stocks.
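Population 1: Before
$n_1 = 25$, $s_1^2 = 9.3$

Population 2: After
$n_2 = 24$, $s_2^2 = 3.0$

$H_0: \sigma_1^2 \le \sigma_2^2$
$H_1: \sigma_1^2 > \sigma_2^2$

$$F_{(n_1 - 1,\, n_2 - 1)} = F_{(24, 23)} = \frac{s_1^2}{s_2^2} = \frac{9.3}{3.0} = 3.1$$

Critical points:
$\alpha = 0.05$: $F_{(24, 23)} = 2.01$
$\alpha = 0.01$: $F_{(24, 23)} = 2.70$

$H_0$ may be rejected at a 1% level of significance.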
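The variance-ratio calculation in Example 8-9 can be sketched in Python as below. The function name `variance_ratio_f` is hypothetical, and the critical points 2.01 and 2.70 are simply the F(24, 23) values quoted in the example; the code does not compute them.

```python
def variance_ratio_f(s1_sq, s2_sq, n1, n2):
    """F statistic s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    return s1_sq / s2_sq, (n1 - 1, n2 - 1)

# Sample variances and sizes from Example 8-9 (before vs. after)
f_stat, dfs = variance_ratio_f(9.3, 3.0, 25, 24)

# Right-tail critical points for F(24, 23), taken from the example
F_CRIT_05 = 2.01
F_CRIT_01 = 2.70

reject_at_5_percent = f_stat > F_CRIT_05
reject_at_1_percent = f_stat > F_CRIT_01

print(f"F{dfs} = {f_stat:.2f}")
print(f"reject at 5%: {reject_at_5_percent}, at 1%: {reject_at_1_percent}")
```

For a one-tailed test of this form, the sample variance that the alternative hypothesis claims is larger goes in the numerator, so that large values of F count against the null hypothesis.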
[Figure: density f(F) of the F distribution, with the critical point $F_{0.01}$ = 2.7 and the test statistic 3.1 marked.]

Since the value of the test statistic is above the critical point, even for a level of significance as small as 0.01, the null hypothesis may be rejected, and we may conclude that the variance of stock prices is reduced after the interception and prosecution of inside traders.
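Population 1: $n_1 = 14$, $s_1^2 = 0.12^2$
Population 2: $n_2 = 9$, $s_2^2 = 0.11^2$

$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$

$$F_{(n_1 - 1,\, n_2 - 1)} = F_{(13, 8)} = \frac{s_1^2}{s_2^2} = \frac{0.12^2}{0.11^2} = 1.19$$

Critical points:
Right-tail area 0.05: $F_{(13, 8)} = 3.28$
Right-tail area 0.10: $F_{(13, 8)} = 2.50$

$H_0$ may not be rejected at the 10% level of significance.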
[Figure: density f(F) of the F distribution, with tail areas of 0.10 on each side and a central area of 0.80. Points labeled on the slide: $F_{0.90}$ = (1/2.20) = 0.4545 and $F_{0.10}$ = 3.28; test statistic = 1.19.]

Since the value of the test statistic is between the critical points, even for a 20% level of significance, we cannot reject the null hypothesis. We conclude the two population variances are equal.
The p-value for the test is 0.8304, which is larger than 0.05. Thus the null hypothesis cannot be rejected at the 0.05 level of significance. That is, one can assume equal variance.