Module No. and Title : Module 6 Elementary Statistics and Probability
Lesson No. and Title : Confidence Intervals
Learning Outcomes :
1. Find a confidence interval for a variance and a standard deviation.
2. Determine the minimum sample size for finding a confidence interval for the mean.
3. Determine the minimum sample size for finding a confidence interval for a
proportion.
Time Frame : 6 hours
Introduction:
My dear students, welcome to Module 6! One aspect of statistics is the process of estimating the
value of a parameter from information obtained from a sample. An important question in estimation is
that of sample size. How large should the sample be in order to make an accurate estimate? This
question is not easy to answer since the size of the sample depends on several factors, such as the
accuracy desired and the probability of making a correct estimate. The question of sample size will be
explained in this module.
Activity:
A survey by SWS found that 45% of the people who were offended by a television program would
change the channel, while 15% would turn off their television sets. The survey further stated that the
margin of error is 3 percentage points, and 4,000 adults were interviewed.
1. How do these estimates compare with the true population percentages?
2. What is meant by a margin of error of 3 percentage points?
3. Is the sample of 4,000 large enough to represent the population of all adults who watch television in
the Philippines?
Analysis:
From the activity, how were you able to determine the estimates and answer the questions?
Abstraction:
Confidence Intervals
[Figure: an interval of 4 plus or minus 2]
A Confidence Interval is a range of values we are fairly sure our true value lies in.
Example: Average Height
We measure the heights of 40 randomly chosen men and get a mean height of 175cm.
We also know the standard deviation of men's heights is 20cm.
The 95% Confidence Interval (we show how to calculate it later) is:
175cm ± 6.2cm
This says the true mean of ALL men (if we could measure all their heights) is likely to be
between 168.8cm and 181.2cm.
But it might not be!
The "95%" says that 95% of experiments like we just did will include the true mean,
but 5% won't.
So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true
mean.
Calculating the Confidence Interval
Step 1: start with
the number of observations n
the mean $\bar{X}$
and the standard deviation s
Note: we should use the standard deviation of the entire population, but in many cases we
won't know it.
We can use the standard deviation for the sample if we have enough observations (at least
n=30, hopefully more).
Using our example:
number of observations n = 40
mean $\bar{X}$ = 175
standard deviation s = 20
Step 2: decide what Confidence Interval we want: 95% or 99% are common choices.
Then find the "Z" value for that Confidence Interval here:
Confidence Interval    Z
80%                    1.282
85%                    1.440
90%                    1.645
95%                    1.960
99%                    2.576
99.5%                  2.807
99.9%                  3.291
For 95% the Z value is 1.960
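The Z values in the table do not have to be memorized; they can be recovered from the standard normal distribution. The sketch below is only an illustration, assuming Python 3.8 or later with the standard-library `statistics.NormalDist` class. The helper name `z_value` is made up for this example and is not part of the module.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided Z value for a confidence level given as a fraction (e.g. 0.95)."""
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so we need the point with (1 + confidence) / 2 of the area below it.
    return NormalDist().inv_cdf((1 + confidence) / 2)


# Reproduce the rows of the table above.
for level in (0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:.1%}  {z_value(level):.3f}")
```

The same helper can be used for a confidence level that is not in the table.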
Step 3: use that Z value in this formula for the Confidence Interval
$\bar{X} \pm Z \frac{s}{\sqrt{n}}$
Where:
$\bar{X}$ is the mean
Z is the chosen Z-value from the table above
s is the standard deviation
n is the number of observations
And we have:
$175 \pm 1.960 \times \frac{20}{\sqrt{40}}$
Which is:
175cm ± 6.20cm
In other words: from 168.8cm to 181.2cm
The value after the ± is called the margin of error. The margin of error in our example is
6.20cm
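The three steps can be checked with a few lines of Python. This is a minimal sketch, not a definitive implementation: the function name `z_confidence_interval` is hypothetical, and it assumes the Z value has already been looked up in the table.

```python
import math


def z_confidence_interval(mean, s, n, z):
    """Return (lower limit, upper limit, margin of error) for mean ± z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin, margin


# Average height example: n = 40, mean = 175 cm, s = 20 cm, 95% confidence.
low, high, margin = z_confidence_interval(mean=175, s=20, n=40, z=1.960)
print(f"175 ± {margin:.2f} cm -> from {low:.1f} cm to {high:.1f} cm")
```

Returning the margin of error separately makes it easy to report the interval in either form, as "mean ± margin" or as "from lower limit to upper limit".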
Another Example
Example: Apple Orchard
Are the apples big enough?
There are hundreds of apples on the trees, so you randomly choose just 46 apples and get:
a Mean of 86
a Standard Deviation of 6.2
So let's calculate:
$\bar{X} \pm Z \frac{s}{\sqrt{n}}$
We know:
$\bar{X}$ is the mean = 86
Z is the Z-value = 1.960 (from the table above for 95%)
s is the standard deviation = 6.2
n is the number of observations = 46
$86 \pm 1.960 \times \frac{6.2}{\sqrt{46}} = 86 \pm 1.79$
So the true mean (of all the hundreds of apples) is likely to be between 84.21 and 87.79
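For readers who want to see where the 1.79 comes from, the arithmetic can be written out step by step. The intermediate values below are rounded to three decimal places for display only.

```latex
\begin{align*}
\sqrt{n} &= \sqrt{46} \approx 6.782 \\
\frac{s}{\sqrt{n}} &= \frac{6.2}{6.782} \approx 0.914 \\
Z \cdot \frac{s}{\sqrt{n}} &= 1.960 \times 0.914 \approx 1.79 \\
\bar{X} \pm 1.79 &= 86 \pm 1.79 \quad\Rightarrow\quad \text{from } 84.21 \text{ to } 87.79
\end{align*}
```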
True Mean
Now imagine we get to pick ALL the apples straight away, and get them ALL measured
by the packing machine (this is a luxury not normally found in statistics!)
And the true mean turns out to be 84.9
Let's lay all the apples on the ground from smallest to largest:
[Figure: each apple is a green dot, except our observations, which are blue.]
Our result was not exact ... it is random after all ... but the true mean is inside our
confidence interval of 86 ± 1.79 (in other words 84.21 to 87.79)
Now the true mean might not be inside the confidence interval, but in 95% of the cases it
will be!
95% of all "95% Confidence Intervals" will include the true mean.
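This "95% of all intervals" idea can be illustrated with a small simulation. The sketch below rests on assumptions that are not in the orchard example: the apple sizes are treated as normally distributed, the population standard deviation is taken to be 6.2, and the seed and number of trials are arbitrary. The function name `coverage` is hypothetical.

```python
import math
import random
import statistics


def coverage(true_mean, true_sd, n, z, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one random sample of n apples (assumed normal population).
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation
        margin = z * s / math.sqrt(n)
        if mean - margin <= true_mean <= mean + margin:
            hits += 1
    return hits / trials


rate = coverage(true_mean=84.9, true_sd=6.2, n=46, z=1.960, trials=2000)
print(f"{rate:.1%} of the simulated intervals contain the true mean")
```

The printed rate should land close to 95%, but it will not be exactly 95% because the simulation itself is random.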
Maybe we had this sample, with a mean of 83.5:
[Figure: each apple is a green dot; our observations are marked purple.]
That does not include the true mean. Expect that to happen 5% of the time for a 95%
confidence interval.
So how do we know if the sample we took is one of the "lucky" 95% or the unlucky 5%?
Unless we get to measure the whole population like above we simply don't know.
This is the risk in sampling: we might have a bad sample.
Example in Research
Here is a Confidence Interval used in actual research on extra exercise for older people:
What is it saying? Looking at the "Male" line we see:
1,226 Men (47.6% of all people)
had a "HR" (see below) with a mean of 0.92,
and a 95% Confidence Interval (95% CI) of 0.88 to 0.97 (which is approximately 0.92±0.05)
"HR" is a measure of health benefit (lower is better), so that line says that the true benefit
of exercise (for the wider population of men) has a 95% chance of being between 0.88
and 0.97
* Note for the curious: "HR" is used a lot in health research and means "Hazard Ratio"
where lower is better, so an HR of 0.92 means the subjects were better off, and 1.03
means slightly worse off.
Standard Normal Distribution
It is all based on the idea of the Standard Normal Distribution, where the Z value is the
"Z-score"
For example the Z for 95% is 1.960, and here we see the range from -1.96 to +1.96
includes 95% of all values:
From -1.96 to +1.96 standard deviations is 95%
Applying that to our sample looks like this:
Also from -1.96 to +1.96 standard deviations, so includes 95%
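The 95% figure can be verified numerically. This short sketch assumes Python 3.8 or later and uses the standard-library `statistics.NormalDist` class to measure the area under the standard normal curve between -1.96 and +1.96.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -1.96 and +1.96 standard deviations.
area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(f"{area:.4f}")  # prints 0.9500
```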
Conclusion
The Confidence Interval is based on Mean and Standard Deviation. Its formula is:
$\bar{X} \pm Z \frac{s}{\sqrt{n}}$
Where:
$\bar{X}$ is the mean
Z is the Z-value from the table below
s is the standard deviation
n is the number of observations
Confidence Interval    Z
80%                    1.282
85%                    1.440
90%                    1.645
95%                    1.960
99%                    2.576
99.5%                  2.807
99.9%                  3.291
Minimum Sample Size for finding a confidence interval for the mean
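Sample size is closely related to statistical estimation. Quite often, you ask: how large a sample is
necessary to make an accurate estimate? The answer is not simple, since it depends on three things:
1. The maximum error of the estimate, E.
2. The population standard deviation, $\sigma$.
3. The degree of confidence.
The formula for the minimum sample size is
$n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$
where $z_{\alpha/2}$ is the Z value for the chosen degree of confidence.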
Example:
A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that the
estimate is accurate within 2 feet. From previous study, the standard deviation of the depths measured
was 4.38 feet.
Solution:
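Here $\alpha = 0.01$ (or $1 - 0.99$), so $z_{\alpha/2} = 2.58$ (the table value 2.576 rounded to two decimal
places); also $\sigma = 4.38$ and $E = 2$. Substituting in the formula $n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2$,
$n = \left(\frac{(2.58)(4.38)}{2}\right)^2 = 31.92$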
Round the value 31.92 up to 32. Therefore, to be 99% confident the estimate is within 2 feet of the true
mean depth, the scientist needs at least a sample of 32 measurements.
Note: Always round n up to the next whole number.
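The river-depth calculation, including the round-up rule, can be sketched in Python. The function name `min_sample_size` is hypothetical, and the Z value 2.58 is the rounded value that reproduces the 31.92 in the solution above.

```python
import math


def min_sample_size(z, sigma, max_error):
    """Minimum n so that the estimate is within max_error of the true mean."""
    n_exact = (z * sigma / max_error) ** 2
    # Always round UP: rounding down would give less than the required confidence.
    return n_exact, math.ceil(n_exact)


# River depth example: 99% confidence, sigma = 4.38 ft, E = 2 ft.
n_exact, n_needed = min_sample_size(z=2.58, sigma=4.38, max_error=2)
print(f"exact value: {n_exact:.2f}, sample size needed: {n_needed}")
```

Using `math.ceil` rather than ordinary rounding matters: a value such as 31.2 must still become 32, not 31.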
The formula for determining the sample size requires the use of the population standard deviation $\sigma$.
What happens when $\sigma$ is unknown? In this case, an attempt is made to estimate $\sigma$. One such way is to
use the standard deviation s obtained from a sample taken previously as an estimate for $\sigma$. The standard
deviation can also be estimated by dividing the range by 4.
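As a rough illustration of the range rule, the sketch below estimates the standard deviation from a range and then feeds it into the sample-size calculation. The numbers are invented for this example (depths between 3 ft and 21 ft), and both function names are hypothetical.

```python
import math


def sigma_from_range(smallest, largest):
    """Rough estimate of the standard deviation: range divided by 4."""
    return (largest - smallest) / 4


def min_sample_size(z, sigma, max_error):
    """Minimum sample size, always rounded up to the next whole number."""
    return math.ceil((z * sigma / max_error) ** 2)


# Hypothetical data: depths are believed to lie between 3 ft and 21 ft.
sigma_estimate = sigma_from_range(3, 21)  # (21 - 3) / 4 = 4.5
n_needed = min_sample_size(z=2.58, sigma=sigma_estimate, max_error=2)
print(f"estimated sigma: {sigma_estimate}, sample size needed: {n_needed}")
```

Because the range rule is only an approximation, the resulting sample size should be read as a planning figure rather than an exact requirement.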
Application
1. A sample of 30 randomly selected oranges was taken from a large population, and their diameters
were measured. The mean diameter of the sample was 91 mm and the standard deviation was 8 mm.
Assuming a normal distribution, calculate (correct to one decimal place) 85% confidence limits for the
mean diameter of the whole population of oranges.
2. A sample of 20 randomly chosen watermelons was taken from a large population, and their weights
were measured. The mean weight of the sample was 105 lb and the standard deviation was 15 lb.
Calculate (correct to one decimal place) 99.5% confidence limits for the mean weight of the whole
population of watermelons.
3. The times of 8 runners in a randomly selected heat of the 100 m sprint in the Olympic Games had a
mean time of 9.94 s and a standard deviation of 0.08 s. Calculate (correct to two decimal places) 99.9%
confidence limits for the mean time of all the 100 m runners at the Olympic Games.
4. A sample of nine is taken from a large population, with the following ages:
33, 40, 44, 45, 49, 52, 54, 57, 58
Calculate (correct to one decimal place) the 95% confidence limits for the mean of the whole
population.
5. If the variance of a national accounting examination is 900, how large a sample is needed to estimate
the true mean within 5 points with 99% confidence?
References:
Bluman, A.G. (2008). Elementary statistics: A step by step approach (A brief version). New York:
McGraw-Hill Companies, Inc.
Gibbons, J.D. & Chakraborti, S. (2010). Nonparametric statistical inference (5th ed.). Florida:
CRC Press.
Siegel, S. & Castellan, N.J. (1998). Nonparametric statistics for the behavioral sciences (2nd ed.).
New York: McGraw-Hill.
Sirug, W.S. (2011). Basic probability and statistics: A step by step approach. Manila,
Philippines: Mindshapers Co., Inc.
Walpole, R.E. (2003). Introduction to statistics (3rd ed.). New York: Macmillan.