NCC501 Statistics for Management
Sample Size to Control Type II Error Probability
Hypothesis tests are usually designed with one goal: make sure that the Type I Error Probability is α
or lower. This does not give any specific protection against a Type II Error, “accepting the null
hypothesis when it is false.”
Calculating the Type II Error probability is a source of great confusion, and for this reason most
people treat it as an afterthought in hypothesis testing. As a consequence, if the null hypothesis is
not rejected, we are left in the uncomfortable situation of not knowing whether H0 is true, or whether
it is false but the sample was too small to give a reliable test. In the latter case, “accepting H0” would
be a Type II error.
The key to controlling Type II error is the sample size. This note presents formulas that
approximate how large the sample must be. To use these formulas you must make a judgment:
You must specify a value of a population parameter (either µ or p) that represents
the alternative hypothesis, one that is different enough from H0 to matter.
This is a practical issue and has nothing to do with statistics. For example, no one cares if a bottle-
filling machine is off by an average of 0.00002 ounces per sixteen-ounce bottle. However, if that gap
were 0.1 ounces someone might be very concerned. It is your job to decide how large the gap must
be to actually make a difference in the real situation.
The gap between H0 and Ha is the main determinant of sample size. If the gap is narrow, we need
a large sample to tell us which hypothesis is true. If the gap is wide, a much smaller sample will
suffice. Simply put, it takes more information to distinguish between very similar alternatives than
very different ones.
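To make the effect of the gap concrete, here is a minimal sketch that applies the one-sample formula for a mean (formula (1), presented later in this note) to three different gaps. The function name `n_one_mean`, the standard deviation of 0.06 ounces, and the 5% error probabilities are illustrative assumptions, not part of the course spreadsheet.

```python
from math import ceil
from statistics import NormalDist

def n_one_mean(gap, sigma, alpha=0.05, beta=0.05):
    """Approximate n* for a one-tail, one-sample test of a mean (formula (1))."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # z-value cutting off alpha in one tail
    z_beta = NormalDist().inv_cdf(1 - beta)    # z-value cutting off beta in one tail
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / gap ** 2

sigma = 0.06  # assumed process standard deviation, in ounces
for gap in (0.1, 0.03, 0.01):
    # Round up, as the note recommends, because n must be a whole number.
    print(f"gap = {gap} oz -> n* is about {ceil(n_one_mean(gap, sigma))}")
```

Under these assumptions the three gaps call for roughly 4, 44, and 390 observations. Because the gap is squared in the denominator, cutting the gap in half multiplies the required sample size by four.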
The formulas for calculating the appropriate sample size depend on knowing how much
variability there is in the population. This presents a chicken-and-egg dilemma: you can’t calculate
the necessary sample size until you have an estimate of variability, but you can’t get an estimate
until you collect a sample. An often-used solution to this dilemma is to do a small pilot study to
estimate those values. Another method is to make an educated guess. In either case, the resulting
sample-size calculation must be treated as an approximation.
The benefit derived from the sample-size calculation comes in the interpretation of the results:
If the null hypothesis is not rejected, you have strong evidence that “either H0 is
true, or it is false by an amount too small to matter from a practical standpoint.”
The sample-size formulas are based on the normal probability distribution, and certain other
assumptions that may be approximations. Use them to determine the “order of magnitude” of the
sample size, but do not treat them as exact. For example, if the recommended sample size is 34.5, do
not worry about whether to use 34 or 35. “Round up” to an integer, and consider increasing the
sample a bit more to be conservative.
One-Sample Test for a Mean (File: [Link], Sheet: 1-Mean)
µ is the population mean.
• µ0 is the value of µ under the null hypothesis.
• µa is a value of µ that makes Ha true and is “just different enough” from µ0 to matter.
• For a 2-tail test the value of µa could be on either side of µ0. Use either one in the n* formula.
• s is the standard deviation estimated from the sample. If the population value (σ) is available, use
it instead of s.
This formula for n* assumes that σ has the same value under Ha as under H0.
(1)  $$ n^* \approx \frac{(z_\alpha + z_\beta)^2 \, s^2}{(\mu_a - \mu_0)^2} $$
Use the Normal Distribution for zα and zβ. For a two-tail test use α/2.
Example:
A bottle-filling machine has been set to produce an average of 16.3 ounces per bottle, in the long
run. Legally, every bottle must contain at least 16 ounces. However, the filling process has a
standard deviation of 0.06 ounces. If the machine were set at 16, half of the bottles would be
underfilled. They set it at 16.3 to make sure that the rate of underfilled bottles is less than one in
a million.
Sometimes the machine goes out of adjustment. When the long-run average drops to 16.27
ounces per bottle, the rate of underfilled bottles exceeds 3 per million, a situation that
management considers serious enough to stop the filling process and readjust the machine,
although the lost production is a substantial cost.
Their current sampling plan is to test 36 bottles and stop the machine if the sample’s average
is below a given value. However, they are not sure that this plan gives error probabilities that are
low enough. Management wants to be 95% certain that the machine will be stopped when its
long-run average is 16.27 or lower, and to be 95% certain of NOT stopping the machine when its
long-run average is actually 16.3.
Is the sample size of 36 sufficient?
This is a one-tail test because they stop the machine only when the sample average is below a
specified level.
• H0 is that the machine is running properly, so µ0 = 16.3.
• Ha is represented by µa = 16.27, which is far enough below 16.3 to matter.
• The standard deviation is 0.06. Since it describes the process rather than a particular sample, it is
a population value, σ.
• To achieve 95% certainty of avoiding both kinds of errors, α=0.05 for a one-tail test, so use zα =
1.645, and β = 0.05, so use zβ = 1.645.
$$ n^* \approx \frac{(z_\alpha + z_\beta)^2 \, s^2}{(\mu_a - \mu_0)^2} = \frac{(1.645 + 1.645)^2 \, (0.06)^2}{(0.03)^2} = 43.3 $$
Conclusion: To achieve 5% probabilities for Type I and Type II errors when the difference between
“in adjustment” and “out of adjustment” is 0.03, the samples should include at least 44 bottles rather
than 36.
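The calculation above can be checked with a short script. This is a sketch under the example's assumptions (normal sampling distribution, known σ); the function names are hypothetical, and the z-values come from Python's `statistics.NormalDist` rather than a table. The second function is an addition to the note's method: it estimates the Type II error probability that the current plan of 36 bottles would actually deliver.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def sample_size_one_mean(mu0, mua, sigma, alpha, beta, two_tail=False):
    """Formula (1): approximate n* for a one-sample test of a mean."""
    tail_alpha = alpha / 2 if two_tail else alpha
    z_alpha = STD_NORMAL.inv_cdf(1 - tail_alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / (mua - mu0) ** 2

def type2_lower_tail(mu0, mua, sigma, alpha, n):
    """Type II error probability of a lower-tail z-test with n observations."""
    std_err = sigma / sqrt(n)
    # The machine is stopped when the sample mean falls below this cutoff.
    cutoff = mu0 - STD_NORMAL.inv_cdf(1 - alpha) * std_err
    # Type II error: the true mean is mua, yet the sample mean stays above the cutoff.
    return 1 - NormalDist(mua, std_err).cdf(cutoff)

n_star = sample_size_one_mean(16.3, 16.27, 0.06, alpha=0.05, beta=0.05)
print(round(n_star, 1), ceil(n_star))
print(round(type2_lower_tail(16.3, 16.27, 0.06, alpha=0.05, n=36), 3))
print(round(type2_lower_tail(16.3, 16.27, 0.06, alpha=0.05, n=44), 3))
```

The first line reproduces n* ≈ 43.3, rounded up to 44. With only 36 bottles the estimated Type II error probability is close to 9%, which is why the current plan falls short of the 5% target; with 44 bottles it drops just below 5%.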
Two-Sample Test for Means (File: [Link], Sheet: 2-Means)
µ1 − µ2 is the difference between two population means.
• d0 is the value of µ1 − µ2 under the null hypothesis (usually zero).
• da is a value of µ1 − µ2 that makes Ha true and is “just different enough” from d0 to matter.
• For a 2-tail test the value of da could be on either side of d0. Use either one in the n* formula.
• s1 and s2 are the standard deviations from each sample. If the population values (σ1 and σ2) are
available, use them instead of s1 and s2.
(2)  $$ n^* \approx \frac{(z_\alpha + z_\beta)^2 \, (s_1^2 + s_2^2)}{(d_a - d_0)^2} $$
Use the Normal Distribution for zα and zβ. For a two-tail test use α/2.
The recommended sample sizes are n1 = n* and n2 = n*.
Example:
The customer service department of a large corporation has begun a program to benchmark their
service quality against their competitors. Their first effort was to measure how long it takes to
reach a customer service representative at their “800” number. A pilot study was carried out.
First they looked at their own system. Based on a sample of 30 calls, the average time was 2.5
minutes and the standard deviation was 0.9 minute. Calling one of their competitors 30 times
resulted in an average of 2.7 minutes with a standard deviation of 1.1 minutes.
After careful consideration, the benchmarking team decided that they should guard against
making an error of more than 0.3 minute. That is, if the long-run difference in times were really
0.3 minute, they want to be 99% certain that their study will make the correct conclusion, which
would be that the companies really differ. However, they want to be equally careful to draw the
correct conclusion if the long-run average times do not differ.
This is a two-tail test because they are only asking if they “differ” from their competitor.
• The null hypothesis is “no difference” so d0 = 0.
• For the alternative hypothesis, a difference of 0.3 or more is important, so da = 0.3.
• Since they want 99% certainty of no error, both error probabilities are to be 1%. Thus, α = 0.01
for a two-tail test, so zα/2 = 2.576. Also, β = 0.01, zβ = 2.326.
• From the pilot study we have preliminary estimates to use in the sample size formula:
s1= 0.9 and s2 =1.1.
$$ n^* \approx \frac{(z_\alpha + z_\beta)^2 \, (s_1^2 + s_2^2)}{(d_a - d_0)^2} = \frac{(2.576 + 2.326)^2 \, (0.9^2 + 1.1^2)}{(0.3 - 0.0)^2} = 539.4 $$
Conclusion: To achieve 1% for both error probabilities they should collect samples of at least 540
from each population.
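Formula (2) is easy to script. The sketch below uses a hypothetical function name and takes its z-values from `statistics.NormalDist`; it also reruns the calculation with both pilot standard deviations inflated by 10% (an arbitrary illustrative figure) to show how sensitive n* is to the pilot estimates.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_means(d0, da, s1, s2, alpha, beta, two_tail=True):
    """Formula (2): approximate n* per sample for a two-sample test of means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - (alpha / 2 if two_tail else alpha))
    z_beta = z(1 - beta)
    return (z_alpha + z_beta) ** 2 * (s1 ** 2 + s2 ** 2) / (da - d0) ** 2

# Values from the customer-service example (pilot-study estimates).
n_star = sample_size_two_means(0.0, 0.3, 0.9, 1.1, alpha=0.01, beta=0.01)
print(round(n_star, 1), ceil(n_star))

# Sensitivity check: what if both standard deviations are really 10% larger?
n_high = sample_size_two_means(0.0, 0.3, 0.99, 1.21, alpha=0.01, beta=0.01)
print(ceil(n_high))
```

The first line reproduces n* ≈ 539.4, rounded up to 540 calls per company. Because the standard deviations enter the formula squared, a 10% understatement in the pilot estimates raises the requirement by 21%, which is one reason to treat n* as an approximation and lean toward a larger sample.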
One-Sample Test for a Proportion (File: [Link], Sheet: 1-Prop’n)
p is the population proportion.
• p0 is the value of p under the null hypothesis.
• pa is a value of p that makes Ha true and is “just different enough” from p0 to matter.
• For a 2-tail test the value of pa could be on either side of p0. Choose the one that is closer to 0.5.
• The formula relies on the normal approximation to the binomial, so after you use it, verify that
(n*)(p) ≥ 5 and (n*)(1 − p) ≥ 5 for both p = p0 and p = pa.
(3)  $$ n^* \approx \frac{\left( z_\alpha \sqrt{p_0 (1 - p_0)} + z_\beta \sqrt{p_a (1 - p_a)} \right)^2}{(p_a - p_0)^2} $$
Use the Normal Distribution for zα and zβ. For a two-tail test use α/2.
Example:
The president wants to be informed when her “true” approval rating differs by more than 2
percentage points from 60%. She does not want to be notified of a shift unless the evidence is
quite strong. However, she also does not want to be in the embarrassing situation of not having
been notified when a real shift has occurred. The staff is about to commission a new survey of
100 voters.
This is a two-tail test because notification is requested whenever there is a change in either direction.
• Unless notified otherwise, she assumes a 60% approval, so p0 = 0.6.
• Because the test is two-tailed, the alternative hypothesis could have either pa = 0.62 or pa = 0.58; we
use pa = 0.58, the one closer to 0.5.
• No values are given for error probabilities. We will use 5% so that zα/2 = 1.960 and zβ = 1.645.
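$$ n^* \approx \frac{\left( z_\alpha \sqrt{p_0 (1 - p_0)} + z_\beta \sqrt{p_a (1 - p_a)} \right)^2}{(p_a - p_0)^2} = \frac{\left( 1.960 \sqrt{0.6(0.4)} + 1.645 \sqrt{0.58(0.42)} \right)^2}{0.02^2} = 7850.1 $$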
Conclusion: To achieve 5% error probabilities when the difference between “current approval” and
“new approval” is 0.02, a sample of at least 7851 observations is needed.
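The following sketch applies formula (3) to the approval-rating example and then performs the normal-approximation check described in the bullets. The function names are hypothetical, and the z-values are taken from `statistics.NormalDist`.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_one_proportion(p0, pa, alpha, beta, two_tail=True):
    """Formula (3): approximate n* for a one-sample test of a proportion."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - (alpha / 2 if two_tail else alpha))
    z_beta = z(1 - beta)
    numerator = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(pa * (1 - pa))) ** 2
    return numerator / (pa - p0) ** 2

def normal_approx_ok(n, p):
    """Rule of thumb from the note: n*p and n*(1-p) must both be at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5

n_star = sample_size_one_proportion(0.60, 0.58, alpha=0.05, beta=0.05)
n_needed = ceil(n_star)
print(round(n_star, 1), n_needed)
print(normal_approx_ok(n_needed, 0.60), normal_approx_ok(n_needed, 0.58))

# The alternative on the other side of p0 gives a smaller n*.
print(ceil(sample_size_one_proportion(0.60, 0.62, alpha=0.05, beta=0.05)))
```

This reproduces n* ≈ 7850.1, rounded up to 7851, and both normal-approximation checks pass comfortably. The last line shows why the bullet above says to pick the alternative closer to 0.5: pa(1 − pa) is larger there, so pa = 0.58 yields the more conservative (larger) sample size.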
Two-Sample Test for Proportions (File: [Link], Sheet: 2-Prop’ns)
p1 − p2 is the difference between two population proportions.
• d0, the value of p1 − p2 under the null hypothesis, is assumed to be zero.
• da is a value of p1 − p2 that makes Ha true and is “just different enough” from zero to matter.
• p̂1 and p̂2 are the proportions calculated from the samples.
• p̄ = (n1 p̂1 + n2 p̂2) / (n1 + n2) is the pooled estimate. If you have no estimate of p̄, use an educated guess.
• The formula relies on the normal approximation to the binomial, so after you use it, verify that
(n*)(p̄) ≥ 5 and (n*)(1 − p̄) ≥ 5.
(4)  $$ n^* \approx \frac{\left( z_\alpha \sqrt{2\bar{p}(1 - \bar{p})} + z_\beta \sqrt{2\bar{p}(1 - \bar{p}) - 0.5\, d_a^2} \right)^2}{d_a^2} $$
Use the Normal Distribution for zα and zβ. For a two-tail test use α/2.
The recommended sample sizes are n1 = n* and n2 = n*.
Example:
Consumers United is testing to see whether there is a difference in the results from two pollsters.
Each pollster sampled 1000 voters, asking “would you vote for the current president if the
election were held today?” One pollster’s result was 37% and the other was 43%. Are the
samples large enough to maintain 5% or lower probabilities for both error types?
This is a two-tail test because they are only asking if the polling methods “differ”. The null
hypothesis is “no difference”, so we can use the method above. CU has specified that a difference of
0.03 or more is important, so that is the value of p1 − p2 under Ha (da = 0.03). For a two-tail test,
zα/2 = z0.025 = 1.96, and zβ = z0.05 = 1.645. First calculate the “pooled” value of the sample proportion, p̄:
$$ \bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{1000(0.37) + 1000(0.43)}{2000} = 0.4 $$
Then calculate the recommended sample size:
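$$ n^* \approx \frac{\left( z_\alpha \sqrt{2\bar{p}(1 - \bar{p})} + z_\beta \sqrt{2\bar{p}(1 - \bar{p}) - 0.5\, d_a^2} \right)^2}{d_a^2} = \frac{\left( 1.96 \sqrt{2(0.4)(0.6)} + 1.645 \sqrt{2(0.4)(0.6) - 0.5(0.03)^2} \right)^2}{0.03^2} = 6927.5 $$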
Conclusion: Since this is much larger than the published samples, if Consumers United uses the
samples of 1000 to test the difference between the two pollsters at α=0.05, they face a Type II error
probability that is much larger than 5%.
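The sketch below reproduces the formula (4) calculation and then puts a rough number on "much larger than 5%". The function names are hypothetical. The Type II error estimate is an addition to the note's method: it assumes the same normal approximation and the same variance terms that appear inside formula (4).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def sample_size_two_proportions(p_bar, da, alpha, beta, two_tail=True):
    """Formula (4): approximate n* per sample for a two-sample test of proportions."""
    z_alpha = STD_NORMAL.inv_cdf(1 - (alpha / 2 if two_tail else alpha))
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    var_h0 = 2 * p_bar * (1 - p_bar)   # variance term when p1 = p2 = p_bar
    var_ha = var_h0 - 0.5 * da ** 2    # variance term when p1 - p2 = da
    return (z_alpha * sqrt(var_h0) + z_beta * sqrt(var_ha)) ** 2 / da ** 2

def type2_two_proportions(p_bar, da, alpha, n):
    """Approximate Type II error probability of a two-tail test, n per sample."""
    se_h0 = sqrt(2 * p_bar * (1 - p_bar) / n)
    se_ha = sqrt((2 * p_bar * (1 - p_bar) - 0.5 * da ** 2) / n)
    cutoff = STD_NORMAL.inv_cdf(1 - alpha / 2) * se_h0
    diff = NormalDist(da, se_ha)  # sampling distribution of the difference under Ha
    # Type II error: the observed difference lands inside the non-rejection region.
    return diff.cdf(cutoff) - diff.cdf(-cutoff)

n_star = sample_size_two_proportions(0.4, 0.03, alpha=0.05, beta=0.05)
print(round(n_star, 1), ceil(n_star))
print(round(type2_two_proportions(0.4, 0.03, alpha=0.05, n=1000), 2))
```

The first line gives n* close to 6927.5, so about 6928 voters per pollster. Under these assumptions, samples of 1000 leave a Type II error probability of about 0.72 when the true difference is 0.03, which supports the conclusion above.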