Chapter 4A
Inferences Based on a Single Sample:
Confidence Intervals
Chapter outline
Appendix 1: the Normal Distribution
Appendix 2: Sampling Distribution
Confidence intervals for the population
parameter
Proper sample size for estimating a population
parameter (optional)
Test of Hypothesis
Appendix 1
The Normal Distribution
Importance of
Normal Distribution
Describes many random processes or
continuous phenomena
Basis for classical statistical inference
Normal Distribution
1. ‘Bell-shaped’ & symmetrical
2. Mean, median, mode are equal
[Figure: bell-shaped curve f(x) versus x, with the mean, median, and mode at the center]
Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
where
µ = Mean of the normal random variable x
σ = Standard deviation
π = 3.1415 . . .
e = 2.71828 . . .
P(x < a) is obtained from a table of normal
probabilities
Effect of Varying
Parameters (µ & σ)
Normal Distribution
Probability
Probability is area under the curve:
$P(c \le x \le d) = \int_c^d f(x)\,dx$
[Figure: curve f(x) versus x with the area between c and d shaded]
Standard Normal Distribution
The standard normal distribution is a normal
distribution with µ = 0 and σ = 1. A random
variable with a standard normal distribution,
denoted by the symbol z, is called a standard
normal random variable.
The Standard Normal Table:
P(0 < z < 1.96)
Standardized Normal Probability Table (Portion)
Z     .04     .05     .06
1.8   .4671   .4678   .4686
1.9   .4738   .4744   .4750
2.0   .4793   .4798   .4803
2.1   .4838   .4842   .4846
P(0 < z < 1.96) = .4750
[Figure: standard normal curve (µ = 0, σ = 1) with the area between 0 and 1.96 shaded; shaded area exaggerated]
The Standard Normal Table:
P(–1.26 ≤ z ≤ 1.26)
Standardized Normal Distribution (µ = 0, σ = 1)
P(–1.26 ≤ z ≤ 1.26)
= .3962 + .3962
= .7924
[Figure: area between –1.26 and 1.26 shaded, .3962 on each side of 0; shaded area exaggerated]
The Standard Normal Table:
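P(z > 1.26)
Standardized Normal Distribution (µ = 0, σ = 1)
P(z > 1.26)
= .5000 – .3962
= .1038
[Figure: area to the right of 1.26 shaded; the area between 0 and 1.26 is .3962 and the whole right half is .5000]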
The Standard Normal Table:
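P(–2.78 ≤ z ≤ –2.00)
Standardized Normal Distribution (µ = 0, σ = 1)
P(–2.78 ≤ z ≤ –2.00)
= .4973 – .4772
= .0201
[Figure: area between –2.78 and –2.00 shaded; the area between –2.78 and 0 is .4973 and the area between –2.00 and 0 is .4772; shaded area exaggerated]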
The Standard Normal Table:
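P(z > –2.13)
Standardized Normal Distribution (µ = 0, σ = 1)
P(z > –2.13)
= .4834 + .5000
= .9834
[Figure: area to the right of –2.13 shaded; the area between –2.13 and 0 is .4834 and the right half is .5000; shaded area exaggerated]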
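The table lookups above can be checked numerically. The sketch below is only an illustration: it assumes Python's standard-library statistics.NormalDist as a stand-in for the printed table, and the helper name area_0_to is hypothetical (it is not part of the course material).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_0_to(z):
    """Area under the standard normal curve between 0 and |z| (a table entry)."""
    return Z.cdf(abs(z)) - 0.5


# Each line mirrors one of the table calculations on the slides.
p_table = area_0_to(1.96)                        # P(0 < z < 1.96)
p_between = area_0_to(-1.26) + area_0_to(1.26)   # P(-1.26 <= z <= 1.26)
p_right = 0.5 - area_0_to(1.26)                  # P(z > 1.26)
p_band = area_0_to(-2.78) - area_0_to(-2.00)     # P(-2.78 <= z <= -2.00)
p_above = area_0_to(-2.13) + 0.5                 # P(z > -2.13)

for name, value in [("P(0<z<1.96)", p_table), ("P(-1.26<=z<=1.26)", p_between),
                    ("P(z>1.26)", p_right), ("P(-2.78<=z<=-2.00)", p_band),
                    ("P(z>-2.13)", p_above)]:
    print(f"{name}: {value:.4f}")
```

Small differences in the fourth decimal place relative to the slides come from rounding in the printed table.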
Non-standard Normal
Distribution
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
That’s an infinite number of tables!
[Figure: several normal curves f(x) versus x with different means and standard deviations]
Converting a Normal Distribution to
a Standard Normal Distribution
If x is a normal random variable with mean µ and
standard deviation σ, then the random variable z,
defined by the formula
$z = \frac{x-\mu}{\sigma}$
has a standard normal distribution. The value z
describes the number of standard deviations
between x and µ.
Standardize the
Normal Distribution
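$z = \frac{x-\mu}{\sigma}$
[Figure: a normal distribution with mean µ and standard deviation σ is converted to the standardized normal distribution with µ = 0 and σ = 1]
One table!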
Finding a Probability Corresponding to a
Normal Random Variable
1. Sketch normal distribution, indicate mean, and shade
the area corresponding to the probability you want.
2. Convert the boundaries of the shaded area from x
values to standard normal random variable z values using
$z = \frac{x-\mu}{\sigma}$
Show the z values under corresponding x values.
3. Use Table II in Appendix D to find the areas
corresponding to the z values. Use symmetry when
necessary.
Non-standard Normal μ = 5,
σ = 10: P(5 < x < 6.2)
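$z = \frac{x-\mu}{\sigma} = \frac{6.2-5}{10} = .12$
P(5 < x < 6.2) = P(0 < z < .12) = .0478
[Figure: normal distribution (µ = 5, σ = 10) with the area between 5 and 6.2 shaded, beside the standardized normal distribution (µ = 0, σ = 1) with the area between 0 and .12 shaded; shaded area exaggerated]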
Non-standard Normal μ = 5,
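σ = 10: P(3.8 ≤ x ≤ 5)
$z = \frac{x-\mu}{\sigma} = \frac{3.8-5}{10} = -.12$
P(3.8 ≤ x ≤ 5) = P(–.12 ≤ z ≤ 0) = .0478
[Figure: normal distribution (µ = 5, σ = 10) with the area between 3.8 and 5 shaded, beside the standardized normal distribution (µ = 0, σ = 1) with the area between –.12 and 0 shaded; shaded area exaggerated]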
Non-standard Normal μ = 5,
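σ = 10: P(2.9 ≤ x ≤ 7.1)
$z = \frac{x-\mu}{\sigma} = \frac{2.9-5}{10} = -.21$ and $z = \frac{x-\mu}{\sigma} = \frac{7.1-5}{10} = .21$
P(2.9 ≤ x ≤ 7.1) = P(–.21 ≤ z ≤ .21) = .0832 + .0832 = .1664
[Figure: normal distribution (µ = 5, σ = 10) with the area between 2.9 and 7.1 shaded, beside the standardized normal distribution (µ = 0, σ = 1) with the area between –.21 and .21 shaded; shaded area exaggerated]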
Non-standard Normal μ = 5,
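σ = 10: P(x ≥ 8)
$z = \frac{x-\mu}{\sigma} = \frac{8-5}{10} = .30$
P(x ≥ 8) = P(z ≥ .30) = .5000 – .1179 = .3821
[Figure: normal distribution (µ = 5, σ = 10) with the area to the right of 8 shaded, beside the standardized normal distribution (µ = 0, σ = 1) with the area to the right of .30 shaded; shaded area exaggerated]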
Non-standard Normal μ = 5,
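σ = 10: P(7.1 ≤ x ≤ 8)
$z = \frac{x-\mu}{\sigma} = \frac{7.1-5}{10} = .21$ and $z = \frac{x-\mu}{\sigma} = \frac{8-5}{10} = .30$
P(7.1 ≤ x ≤ 8) = P(.21 ≤ z ≤ .30) = .1179 – .0832 = .0347
[Figure: normal distribution (µ = 5, σ = 10) with the area between 7.1 and 8 shaded, beside the standardized normal distribution (µ = 0, σ = 1) with the area between .21 and .30 shaded; shaded area exaggerated]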
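The standardize-then-look-up procedure used in the examples with µ = 5 and σ = 10 can be sketched in a few lines. This is an illustration under assumptions: statistics.NormalDist replaces the printed table, and the function name normal_prob is hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def normal_prob(lo, hi, mu, sigma):
    """P(lo <= x <= hi) for a normal x, computed by converting to z-values."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return STD.cdf(z_hi) - STD.cdf(z_lo)


mu, sigma = 5, 10
p_a = normal_prob(5, 6.2, mu, sigma)      # slides give .0478
p_b = normal_prob(2.9, 7.1, mu, sigma)    # slides give .1664
p_c = normal_prob(7.1, 8, mu, sigma)      # slides give .0347
p_d = 1 - STD.cdf((8 - mu) / sigma)       # upper tail beyond x = 8; slides give .3821

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```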
Normal Distribution Thinking
Challenge
You work in Quality Control for
GE. Light bulb life has a
normal distribution with
µ = 2000 hours and σ = 200
hours. What’s the probability
that a bulb will last
A. between 2000 and 2400
hours?
B. less than 1470 hours?
Finding z-Values
for Known Probabilities
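What is z, given that the area between 0 and z is .1217?
Standardized Normal Probability Table (Portion)
Z     .00     .01     .02
0.0   .0000   .0040   .0080
0.1   .0398   .0438   .0478
0.2   .0793   .0832   .0871
0.3   .1179   .1217   .1255
The entry .1217 sits in row 0.3, column .01, so z = .31.
[Figure: standard normal curve (µ = 0, σ = 1) with area .1217 between 0 and z = .31; shaded area exaggerated]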
Finding x Values
for Known Probabilities
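$x = \mu + z\sigma = 5 + (.31)(10) = 8.1$
[Figure: normal distribution (µ = 5, σ = 10) with area .1217 between 5 and x = 8.1, beside the standardized normal distribution (µ = 0, σ = 1) with area .1217 between 0 and z = .31; shaded areas exaggerated]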
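The reverse lookup (from a known area to z, then to x) can also be sketched in code. This assumes the standard-library NormalDist.inv_cdf as a substitute for reading the table backwards; the variable names are illustrative only.

```python
from statistics import NormalDist

area_from_mean = 0.1217           # area between 0 and z, as on the slide
mu, sigma = 5, 10                 # parameters from the slide's example

# The table reports areas measured from 0, so add the left half (.5)
# to obtain the cumulative probability that inv_cdf expects.
z = NormalDist().inv_cdf(0.5 + area_from_mean)

# Undo the standardization z = (x - mu) / sigma.
x = mu + z * sigma

print(f"z = {z:.2f}, x = {x:.1f}")
```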
Descriptive Methods for
Assessing Normality (optional)
Determining Whether the Data
Are from an Approximately
Normal Distribution
1. Construct either a histogram or stem-and-leaf
display for the data and note the shape of the
graph. If the data are approximately normal,
the shape of the histogram or stem-and-leaf
display will be similar to the normal curve.
2. Compute the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s,
and determine the percentage of
measurements falling in each. If the data are
approximately normal, the percentages will be
approximately equal to 68%, 95%, and 100%,
respectively, as given by the Empirical Rule (68%,
95%, 99.7%).
3. Find the interquartile range, IQR, and standard
deviation, s, for the sample, then calculate the
ratio IQR/s. If the data are approximately
normal, then IQR/s ≈ 1.3.
$\frac{IQR}{s} = \frac{Q_3 - Q_1}{s}$
4. Examine a normal probability plot for the
data. If the data are approximately
normal, the points will fall (approximately)
on a straight line.
[Figure: normal probability plot, expected z-score versus observed value]
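Checks 2 and 3 in the list above are easy to automate. The sketch below is a hedged illustration: the function name normality_checks is hypothetical, and the data set is an idealized, artificially constructed sample (evenly spaced normal quantiles), not data from the course.

```python
from statistics import NormalDist, mean, stdev, quantiles


def normality_checks(data):
    """Return the share of data within 1, 2, 3 standard deviations of the mean, and IQR/s."""
    xbar, s = mean(data), stdev(data)
    within = {}
    for k in (1, 2, 3):
        lo, hi = xbar - k * s, xbar + k * s
        within[k] = sum(lo <= v <= hi for v in data) / len(data)
    q1, _, q3 = quantiles(data, n=4)   # sample quartiles Q1, median, Q3
    return within, (q3 - q1) / s


# Hypothetical sample: 200 evenly spaced quantiles of a normal(50, 10) distribution.
n = 200
data = [NormalDist(50, 10).inv_cdf((i + 0.5) / n) for i in range(n)]

within, iqr_ratio = normality_checks(data)
print(within)       # shares close to 68%, 95%, 100% suggest approximate normality
print(iqr_ratio)    # a value near 1.3 suggests approximate normality
```

With real data the shares and the ratio will only roughly match these benchmarks, so the checks are best read together with a histogram or normal probability plot.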
Normal Probability Plot
A normal probability plot for a data set is a
scatterplot with the ranked data values on one axis
and their corresponding expected z-scores from a
standard normal distribution on the other axis.
[Note: Computation of the expected standard
normal z-scores is beyond the scope of this text.
Therefore, we will rely on available statistical
software packages to generate a normal
probability plot.]
Appendix 2: Sampling
Distribution
Parameter & Statistic
A parameter is a numerical descriptive measure
of a population. Because it is based on all the
observations in the population, its value is almost
always unknown.
A sample statistic is a numerical descriptive
measure of a sample. It is calculated from the
observations in the sample.
Common Statistics &
Parameters
                      Sample Statistic   Population Parameter
Mean                  x̄                  µ
Standard Deviation    s                  σ
Variance              s²                 σ²
Binomial Proportion   p̂                  p
Sampling Distribution
The sampling distribution of a sample statistic calculated from a sample of
n measurements is the probability distribution of the statistic.
If we repeatedly draw random samples of n measurements from the population and compute the statistic for each sample,
the resulting values follow a distribution: the sampling distribution of the statistic.
Example
Sampling Distributions
Suppose There’s a Population ...
Population size, N = 4
Random variable, x
Values of x: 1, 2, 3, 4
Uniform distribution
© 1984-1994 T/Maker Co.
Population Characteristics
Summary Measure
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = 2.5$
Population Distribution
[Figure: bar chart of P(x) versus x for x = 1, 2, 3, 4]
All Possible Samples
of Size n = 2
16 Samples (1st observation in rows, 2nd observation in columns)
1st Obs   1     2     3     4
1         1,1   1,2   1,3   1,4
2         2,1   2,2   2,3   2,4
3         3,1   3,2   3,3   3,4
4         4,1   4,2   4,3   4,4
16 Sample Means (1st observation in rows, 2nd observation in columns)
1st Obs   1     2     3     4
1         1.0   1.5   2.0   2.5
2         1.5   2.0   2.5   3.0
3         2.0   2.5   3.0   3.5
4         2.5   3.0   3.5   4.0
Sample with replacement
Sampling Distribution
of All Sample Means
[Figure: the 16 sample means (1.0, 1.5, ..., 4.0, tabulated by 1st and 2nd observation) beside a bar chart of the sampling distribution of the sample mean, P(x̄) versus x̄ for x̄ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
Summary Measure of
All Sample Means
$\mu_{\bar{x}} = \frac{\sum_{i=1}^{N} \bar{x}_i}{N} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$
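The small-population example (values 1, 2, 3, 4; samples of size n = 2 drawn with replacement) can be enumerated exactly. The sketch below is an illustration only; the variable names are arbitrary, and exact fractions are used so that no rounding enters the comparison.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Sampling distribution of the sample mean: value -> probability.
dist = {m: Fraction(c, len(means)) for m, c in sorted(Counter(means).items())}

mu = Fraction(sum(population), len(population))                 # population mean
var = sum((x - mu) ** 2 for x in population) / len(population)  # population variance
mu_xbar = sum(m * p for m, p in dist.items())                   # mean of the sample means
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
print("mean of sample means:", float(mu_xbar), "population mean:", float(mu))
print("variance of sample means:", float(var_xbar), "population variance:", float(var))
```

The enumeration reproduces the value 2.5 for the mean of all sample means, and the variance of the sample means comes out to half the population variance, as expected for n = 2.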
Comparison
[Figure: population distribution P(x) for x = 1, 2, 3, 4 (µ = 2.5) beside the sampling distribution P(x̄) for x̄ = 1.0, 1.5, ..., 4.0 ($\mu_{\bar{x}}$ = 2.5)]
The Sampling Distribution
of a Sample Mean and the
Central Limit Theorem
Properties of the Sampling
Distribution of x̄
1. Mean of the sampling distribution equals mean
of sampled population, that is,
$\mu_{\bar{x}} = E(\bar{x}) = \mu$.
2. Standard deviation of the sampling distribution
equals (standard deviation of sampled population) /
(square root of sample size).
That is, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
Standard Error of the Mean
The standard deviation $\sigma_{\bar{x}}$ is often referred
to as the standard error of the mean.
Theorem
If a random sample of n observations is selected
from a population with a normal distribution, the
sampling distribution of x̄ will be a normal
distribution.
Sampling from
Normal Populations
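Central Tendency: $\mu_{\bar{x}} = \mu$
Dispersion: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (sampling with replacement)
[Figure: population distribution with µ = 50 and σ = 10, above sampling distributions of x̄ centered at $\mu_{\bar{x}}$ = 50 with $\sigma_{\bar{x}}$ = 5 for n = 4 and $\sigma_{\bar{x}}$ = 2.5 for n = 16]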
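Standardizing the Sampling
Distribution of x̄
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
[Figure: sampling distribution of x̄ (mean $\mu_{\bar{x}}$, standard deviation $\sigma_{\bar{x}}$) converted to the standardized normal distribution (µ = 0, σ = 1)]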
Central Limit Theorem
Consider a random sample of n observations
selected from a population (any probability
distribution) with mean µ and standard deviation σ.
Then, when n is sufficiently large, the sampling
distribution of x̄ will be approximately a normal
distribution with mean $\mu_{\bar{x}} = \mu$ and standard
deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The larger the sample size,
the better will be the normal approximation to the
sampling distribution of x̄.
Central Limit Theorem
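As sample size gets large enough (n ≥ 30), the sampling distribution becomes almost normal, with
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
[Figure: sampling distribution of x̄]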
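A quick simulation can make the Central Limit Theorem concrete. The sketch below is an illustration under assumptions chosen here, not taken from the slides: the population is exponential with mean 1 and standard deviation 1 (strongly skewed), the sample size is n = 36, and the random seed is fixed only for repeatability.

```python
import random
from statistics import mean, stdev

random.seed(0)            # fixed seed so the run is repeatable

n = 36                    # sample size (assumed for illustration)
reps = 4000               # number of repeated samples
mu, sigma = 1.0, 1.0      # mean and standard deviation of an exponential(1) population

# Draw many samples from the skewed population and record each sample mean.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

print("mean of sample means:", round(mean(sample_means), 3), "theory:", mu)
print("std of sample means: ", round(stdev(sample_means), 3),
      "theory:", round(sigma / n ** 0.5, 3))
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is far from normal.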
The Sampling Distribution
of the Sample Proportion
Sample Proportion
Just as the sample mean is a good estimator of the
population mean, the sample proportion, denoted p̂,
is a good estimator of the population
proportion p. How good the estimator is will
depend on the sampling distribution of the statistic.
This sampling distribution has properties similar to
those of the sampling distribution of x̄.
Sampling Distribution of p̂
1. Mean of the sampling distribution is equal to the
true binomial proportion, p; that is, $E(\hat{p}) = p$.
Consequently, p̂ is an unbiased estimator of p.
2. Standard deviation of the sampling distribution is
equal to $\sqrt{p(1-p)/n}$; that is, $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
3. For large samples, the sampling distribution is
approximately normal. (A sample is considered
large if $n\hat{p} \ge 15$ and $n(1-\hat{p}) \ge 15$.)
Example
Suppose you’re interested in the average amount of
money that students in this class (the population)
have on them. How would you find out?
Confidence intervals of
parameters
Target Parameter
The unknown population parameter (e.g., mean or
proportion) that we are interested in estimating is
called the target parameter.
The type of data (quantitative or qualitative)
collected is indicative of the target parameter. With
quantitative data, you are likely to be estimating the
mean or variance of the data. With qualitative data
with two outcomes (success or failure), the binomial
proportion of successes is likely to be the parameter
of interest.
Target Parameter
Determining the Target Parameter
Parameter   Key Words or Phrases                      Type of Data
µ           Mean; average                             Quantitative
p           Proportion; percentage; fraction; rate    Qualitative
Estimates
If the sampling distribution of a sample statistic
has a mean equal to the population parameter
the statistic is intended to estimate, the statistic
is said to be an unbiased estimate of the
parameter.
If the mean of the sampling distribution is not
equal to the parameter, the statistic is said to be
a biased estimate of the parameter.
Point Estimator
A point estimator of a population parameter is a
rule or formula that tells us how to use the sample
data to calculate a single number that can be used
as an estimate of the target parameter.
Point Estimation
Provides a single value: based on
observations from one sample
Gives no information about how close the
value is to the unknown population parameter
Example: Sample mean x̄ = 3 is the point
estimate of the unknown population mean µ
Interval Estimator
An interval estimator (or confidence
interval) is a formula that tells us how to use
the sample data to calculate an interval that
estimates the target parameter.
Interval Estimation
Provides a range of values
Based on observations from one sample
Gives information about closeness to unknown
population parameter
• Stated in terms of probability
– Knowing exact closeness requires knowing unknown
population parameter
Example: Unknown population mean lies between
50 and 70 with 95% confidence
Estimation Process
[Figure: a random sample with mean x̄ = 50 is drawn from a population whose mean, µ, is unknown; conclusion: "I am 95% confident that µ is between 40 & 60."]
Key Elements of
Interval Estimation
[Figure: a confidence interval centered on the sample statistic (point estimate), bounded by the lower confidence limit and the upper confidence limit]
A confidence interval provides a range of
plausible values for the population parameter.
Example-Overdue
Data set: OVRDUE
Confidence Interval
The Central Limit Theorem:
The sampling distribution of the sample mean is approximately normal for large samples.
The interval estimator:
$\bar{x} \pm 1.96\,\sigma_{\bar{x}} = \bar{x} \pm \frac{1.96\,\sigma}{\sqrt{n}}$
For large samples, the fact that σ is unknown poses little difficulty: the sample standard deviation s provides a very
good approximation to σ.
Confidence Interval
If sample measurements yield a value of x̄ that falls
between the two lines on either side of µ, then the
interval $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ will contain µ.
95% Confidence Level
If our confidence level is 95%, then in the long run,
95% of our confidence intervals will contain µ and
5% will not.
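The long-run reading of "95% confidence" can be illustrated by simulation. The sketch below rests on assumptions chosen here rather than taken from the slides: a normal population with µ = 50 and known σ = 10, samples of size n = 40, and a fixed random seed.

```python
import random
from statistics import mean

random.seed(1)                      # fixed seed so the run is repeatable

mu, sigma, n = 50.0, 10.0, 40       # hypothetical population and sample size
reps = 2000                         # number of intervals to construct
half_width = 1.96 * sigma / n ** 0.5

covered = 0
for _ in range(reps):
    xbar = mean([random.gauss(mu, sigma) for _ in range(n)])
    # The interval xbar +/- 1.96 * sigma / sqrt(n) either contains mu or it does not.
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

Any single interval either contains µ or misses it; the 95% describes the procedure across many repetitions.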
To choose a different confidence coefficient we
increase or decrease the area (call it α) assigned to
the tails. If we place α/2 in each tail
and $z_{\alpha/2}$ is the z-value, the
confidence interval with
coefficient (1 – α) is
$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$.
Large-Sample
100(1 – α)% Confidence Interval for µ
$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
where $z_{\alpha/2}$ is the z-value with an area α/2 to its right
in the standard normal distribution and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
The parameter σ is the standard deviation of the
sampled population, and n is the sample size.
Note: When σ is unknown and n is large (n ≥ 30),
the confidence interval is approximately equal to
$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
where s is the sample standard deviation.
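The large-sample interval can be sketched as a small function. This is an illustration only: the function name large_sample_ci is hypothetical, the z-value comes from the standard-library NormalDist rather than a table, and the numbers in the usage line (x̄ = 50, s = 10, n = 100) are made up.

```python
from statistics import NormalDist


def large_sample_ci(xbar, s, n, confidence=0.95):
    """Large-sample interval xbar +/- z_{alpha/2} * s / sqrt(n); intended for n >= 30."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value with area alpha/2 to its right
    half_width = z * s / n ** 0.5
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics, not taken from the course data sets.
lo, hi = large_sample_ci(xbar=50.0, s=10.0, n=100, confidence=0.95)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```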
Required Conditions
1. A random sample is selected from the target
population.
2. The sample size n is large (i.e., n ≥ 30). Due to
the Central Limit Theorem, this condition
guarantees that the sampling distribution of x̄
is approximately normal. Also, for large n, s will
be a good estimator of σ.
Thinking Challenge
You’re a Q/C inspector for
Gallo. The σ for 2-liter bottles
is .05 liters. A random sample
of 100 bottles showed x̄ =
1.99 liters. What is the 90%
confidence interval estimate
of the true mean amount in 2-
liter bottles?
Problem
Unoccupied seats on flights cause airlines to lose
revenue. Suppose a large airline wants to
estimate its average number of unoccupied seats
per flight over the past year. To accomplish this,
the records of 225 flights are randomly selected,
and the number of unoccupied seats is noted for
each of the sampled flights. (The data are saved
in the NOSHOW file.)
Example
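Small Sample, σ Unknown
Instead of using the standard normal statistic
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
use the t–statistic
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
in which the sample standard deviation, s, replaces
the population standard deviation, σ.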
Student’s t-Statistic
The t-statistic has a sampling distribution very much like
that of the z-statistic: mound-shaped, symmetric, with
mean 0.
The primary difference between the sampling distributions of t and z
is that the t-statistic is more variable than the z-statistic.
Degrees of Freedom
The actual amount of variability in the sampling
distribution of t depends on the sample size n. A
convenient way of expressing this dependence is to say
that the t-statistic has (n – 1) degrees of freedom (df).
Student’s t Distribution
Bell-Shaped
Symmetric
‘Fatter’ Tails
[Figure: standard normal curve (z) compared with t curves for df = 13 and df = 5, all centered at 0]
t - Table
t-value
If we want the t-value with an area of .025 to its right
and 4 df, we look in the table under the column t.025 for
the entry in the row corresponding to 4 df. This entry
is t.025 = 2.776. The corresponding standard normal z-
score is z.025 = 1.96.
Small-Sample
Confidence Interval for µ
$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
where $t_{\alpha/2}$ is based on (n – 1) degrees of freedom.
Required Conditions
1. A random sample is selected from the target
population.
2. The population has a relative frequency
distribution that is approximately normal.
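The small-sample interval can be sketched as follows. Python's standard library has no t-distribution quantile function, so this illustration assumes the critical value is read from a t-table and passed in; it reuses t.025 = 2.776 for 4 df quoted above. The function name small_sample_ci and the five data values are hypothetical.

```python
from statistics import mean, stdev


def small_sample_ci(data, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n); t_crit must match df = n - 1."""
    n = len(data)
    xbar, s = mean(data), stdev(data)   # stdev uses the (n - 1) divisor
    half_width = t_crit * s / n ** 0.5
    return xbar - half_width, xbar + half_width


# Hypothetical measurements (n = 5, so df = 4), assumed to come from a
# roughly normal population.
measurements = [10.1, 9.8, 10.4, 10.0, 9.7]

# t.025 with 4 df is 2.776, which gives a 95% interval.
lo, hi = small_sample_ci(measurements, t_crit=2.776)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```

Passing the table value explicitly keeps the sketch dependency-free; a statistics package that exposes t quantiles could supply t_crit instead.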
Thinking Challenge
You’re a time study analyst
in manufacturing. You’ve
recorded the following task
times (min.):
3.6, 4.2, 4.0, 3.5, 3.8, 3.1.
What is the 90% confidence
interval estimate of the
population mean task time?
Problem
Facial structure of CEOs. In Psychological Science (Vol. 22, 2011),
researchers reported that a chief executive officer’s facial structure
can be used to predict a firm’s financial performance. The study
involved measuring the facial width-to-height ratio (WHR) for each in
a sample of 55 CEOs at publicly traded Fortune 500 firms. These WHR
values (determined by a computer analyzing a photo of the CEO’s
face) had a mean of x̄ = 1.96 and a standard deviation of s = .15.
a. Find and interpret a 95% confidence interval for µ, the mean facial
WHR for all CEOs at publicly traded Fortune 500 firms.
b. The researchers found that CEOs with wider faces (relative to
height) tended to be associated with firms that had greater financial
performance. They based their inference on an equation that uses
facial WHR to predict financial performance. Suppose an analyst
wants to predict the financial performance of a Fortune 500 firm
based on the value of the true mean facial WHR of CEOs. The analyst
wants to use the value of µ = 2.2. Do you recommend he use this
value?
Large-Sample Confidence
Interval for a Population
Proportion
Problem
A food-products company conducted a market
study by randomly sampling and interviewing
1,000 consumers to determine which brand of
breakfast cereal they prefer. Suppose 313
consumers were found to prefer the company’s
brand. How would you estimate the true fraction
of all consumers who prefer the company’s cereal
brand?
Sampling Distribution of p̂
1. The mean of the sampling distribution of p̂ is p;
that is, p̂ is an unbiased estimator of p.
2. The standard deviation of the sampling
distribution of p̂ is $\sqrt{pq/n}$; that is, $\sigma_{\hat{p}} = \sqrt{pq/n}$,
where q = 1 – p.
3. For large samples, the sampling distribution of p̂
is approximately normal. A sample size is
considered large if both np̂ ≥ 15 and nq̂ ≥ 15.
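Large-Sample Confidence
Interval for p
$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
where $\hat{p} = \frac{x}{n}$ and $\hat{q} = 1 - \hat{p}$.
Note: When n is large, p̂ can approximate the
value of p in the formula for $\sigma_{\hat{p}}$.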
Conditions Required for a Valid
Large-Sample Confidence
Interval for p
1. A random sample is selected from the target population.
2. The sample size n is large. (This condition will be
satisfied if both np̂ ≥ 15 and nq̂ ≥ 15. Note that np̂
and nq̂ are simply the number of successes and
number of failures, respectively, in the sample.)
Estimation Example
Proportion
A random sample of 400 graduates showed 32
went to graduate school. Set up a 95% confidence
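interval estimate for p.
$\hat{p} = \frac{32}{400} = .08$
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
$.08 - 1.96\sqrt{\frac{(.08)(.92)}{400}} \le p \le .08 + 1.96\sqrt{\frac{(.08)(.92)}{400}}$
$.053 \le p \le .107$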
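The graduate-school example can be reproduced with a short function. This is a sketch under assumptions: the function name proportion_ci is hypothetical, the z-value comes from the standard-library NormalDist, and the size check simply encodes the "at least 15 successes and 15 failures" condition stated for large samples.

```python
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample interval p_hat +/- z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    if x < 15 or n - x < 15:
        # n * p_hat and n * q_hat are the counts of successes and failures.
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * (p_hat * q_hat / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


# Data from the example: 32 of 400 graduates went to graduate school.
lo, hi = proportion_ci(32, 400, confidence=0.95)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```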
Thinking Challenge
You’re a production
manager for a newspaper.
You want to find the %
defective. Of 200
newspapers, 35 had
defects. What is the 90%
confidence interval estimate
of the population
proportion defective?
Problem
Adjusted (1 – α)100% Confidence
Interval for a Population Proportion, p
$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$
where $\tilde{p} = \frac{x+2}{n+4}$ is the adjusted sample proportion of observations with the characteristic
of interest, x is the number of successes in the sample, and n is the sample size.
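The adjusted interval is typically of interest when the number of successes (or failures) is too small for the large-sample formula. The sketch below is illustrative: the function name adjusted_proportion_ci is hypothetical and the counts in the usage line (3 successes in 200 trials) are made up.

```python
from statistics import NormalDist


def adjusted_proportion_ci(x, n, confidence=0.95):
    """Adjusted interval built on p_tilde = (x + 2) / (n + 4)."""
    p_tilde = (x + 2) / (n + 4)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * (p_tilde * (1 - p_tilde) / (n + 4)) ** 0.5
    return p_tilde - half_width, p_tilde + half_width


# Hypothetical counts: 3 successes in a sample of 200.
lo, hi = adjusted_proportion_ci(3, 200, confidence=0.95)
print(f"adjusted 95% CI for p: ({lo:.4f}, {hi:.4f})")
```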
Determining the Sample Size
Sample size and C.I.
Sampling Error
In general, we express the reliability associated
with a confidence interval for the population mean
µ by specifying the sampling error within which
we want to estimate µ with 100(1 – α)% confidence.
The sampling error (denoted SE), then, is equal to
the half-width of the confidence interval.
Sample Size Determination for 100(1 – α)%
Confidence Interval for µ
In order to estimate µ with a sampling error (SE)
and with 100(1 – α)% confidence, the required
sample size is found as follows:
$z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = SE$
The solution for n is given by the equation
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{(SE)^2}$
Sample Size Example
What sample size is needed to be 90% confident
the mean is within 5? A pilot study suggested
that the standard deviation is 45.
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{(SE)^2} = \frac{(1.645)^2 (45)^2}{(5)^2} = 219.2 \approx 220$
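The sample-size calculation for a mean can be sketched directly from the formula. The function name sample_size_for_mean is hypothetical, and the z-value is computed with the standard-library NormalDist instead of being read from a table.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, se, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= se, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / se) ** 2)


# Values from the example: sigma = 45 (pilot study), SE = 5, 90% confidence.
n = sample_size_for_mean(sigma=45, se=5, confidence=0.90)
print(n)
```

Rounding up rather than to the nearest integer ensures the sampling error does not exceed the target.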
Sample Size Determination for 100(1 – α)%
Confidence Interval for p
In order to estimate p with a sampling error SE and
with 100(1 – α)% confidence, the required sample
size is found by solving the following equation for n:
$z_{\alpha/2}\sqrt{\frac{pq}{n}} = SE$
The solution for n can be written as follows:
$n = \frac{(z_{\alpha/2})^2\,pq}{(SE)^2}$
Note: Always round n up to the nearest
integer value.
Sample Size Example
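What sample size is needed to estimate p
with 90% confidence, using an interval of width .03?
$SE = \frac{\text{width}}{2} = \frac{.03}{2} = .015$
$n = \frac{(z_{\alpha/2})^2\,pq}{(SE)^2} = \frac{(1.645)^2 (.5)(.5)}{(.015)^2} = 3006.69 \approx 3007$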
Thinking Challenge
You work in Human Resources at Merrill Lynch.
You plan to survey employees to find their
average medical expenses. You want to be
95% confident that the sample mean is within ±
$50.
A pilot study showed that σ was about $400.
What sample size do you use?
Confidence Interval for a
Population Variance
Conditions Required for a Valid
Confidence Interval for σ²
1. A random sample is selected from the target
population.
2. The population of interest has a relative
frequency distribution that is approximately
normal.
Thinking Challenge
You’re a marketing manager for a 5K race. You
take a random sample of the times of 292 runners
from the last race, with mean of 28.5 minutes and
standard deviation of 8.3 minutes. What is the
95% confidence interval estimate of the
population variance?
Key Ideas
Commonly Used z-Values for a Large-Sample
Confidence Interval
90% CI: α = .10, z.05 = 1.645
95% CI: α = .05, z.025 = 1.96
98% CI: α = .02, z.01 = 2.326
99% CI: α = .01, z.005 = 2.575