27/10/2016
Inferences on Population
Proportions
Use Calculation from Sample to
Estimate Population Parameter
Population --(select)--> Sample
Parameter (describes the population): p = ?
Statistic (calculated from the sample): p̂ = 63%
The statistic is used to estimate the parameter.
Statistic                          Parameter
Describes a sample.                Describes a population.
Always known                       Usually unknown
Changes upon repeated sampling.    Is fixed
Examples: x̄, s², s, p̂              Examples: μ, σ², σ, p
A Statistic is a Random Variable
Upon repeated sampling of the same population, the value of a statistic changes: it is variable.
While we don’t know what the next sample will yield, we do know the overall pattern over many, many samplings: it is random.
The distribution of possible values of a
statistic for repeated samples of the same
size from a population is called the
sampling distribution of the statistic.
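The idea of a sampling distribution can be illustrated with a small simulation. This is only a sketch: the true proportion p = 0.63, the sample size n = 100, and the function name `simulate_p_hats` are assumptions chosen for illustration, not values from the slides.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the sample proportion of each."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Each observation is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

# Hypothetical population proportion and sample size.
p_hats = simulate_p_hats(p=0.63, n=100, num_samples=2000)
print("mean of p-hat values:", round(statistics.mean(p_hats), 3))
print("std. dev. of p-hat values:", round(statistics.stdev(p_hats), 3))
```

The individual values differ from sample to sample, but their overall pattern (center and spread) is stable.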
Proportion
We are interested in the distribution of $\hat{p} = y/n$.
Note, p̂ is cY where c = 1/n is a constant and
Y is a binomial random variable.
If Y is normally distributed, cY will also be
normally distributed.
If Y is normal, cY is normal
[Figure: normal density of y on an axis from 90 to 110, and of cy = 0.5y on an axis from 40 to 60]
For example: if Y is N(µ = 100, σ² = 4), then (0.5)Y is N(µ = 50, σ² = 1).
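The scaling example can be checked with Python's standard library, whose `NormalDist` class supports multiplication by a constant. A minimal sketch (the variable names are illustrative):

```python
from statistics import NormalDist

y = NormalDist(mu=100, sigma=2)   # variance 4 means standard deviation 2
half_y = y * 0.5                  # scaling by c multiplies mu by c and sigma by |c|

print(half_y.mean, half_y.variance)
```

The mean is multiplied by c and the variance by c², which is why σ² = 4 becomes σ² = 1 when c = 0.5.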
Distribution of Sample Proportions
A normal curve can be used to
approximate the distribution of sample
proportions if:
The size of the sample or number of repetitions is relatively large (say 30 or more).
The sample size is relatively small compared to the population size (say < 10%).
Sampling Distribution for p̂
The sampling distribution of p̂ based on
large n is approximately normal.
Sampling Distribution for p̂
To completely define the normal distribution of p̂, we need the mean (expected value) and variance.
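A short derivation of these two quantities, assuming (as stated earlier) that p̂ = Y/n with Y a binomial random variable with parameters n and p:

```latex
E(\hat{p}) = E\!\left(\frac{Y}{n}\right) = \frac{1}{n}\,E(Y) = \frac{np}{n} = p
\qquad
\mathrm{Var}(\hat{p}) = \frac{1}{n^{2}}\,\mathrm{Var}(Y) = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n}
```

So the standard deviation of p̂ is $\sqrt{p(1-p)/n}$.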
Sampling Distribution of p̂
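[Figure: normal curve centered at p, with central area 1-α between $p - z_{\alpha/2}\sqrt{p(1-p)/n}$ and $p + z_{\alpha/2}\sqrt{p(1-p)/n}$]
So, at most, p̂ will be $z_{\alpha/2}\sqrt{p(1-p)/n}$ away from p, (1-α)100% of the time. We call this (1-α)100% the level of confidence.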
Repeated Sampling of Size n
95% of the time our estimate will be within $1.96\sqrt{p(1-p)/n}$ of the truth.
Standard Error
We don’t know the value of p, so we will use p̂.
When we use p̂, we have an estimate of the standard deviation for the sampling distribution of p̂.
We call this estimate the standard error:
$SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
Confidence Interval for p
(1-α)100% CI for p: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
Example:
95% CI for p: $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$
Confidence Interval Based on
Normal Distribution
Point estimate ± (z value) × (standard error)
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
Standard error is our estimate of the standard
deviation for the distribution of the point
estimate.
Effect of Sample Size on a
Confidence Interval
At a given level of confidence, as one
increases the sample size what happens to
the width of the confidence interval?
So choose your sample size before you
estimate a proportion with a confidence
interval
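The effect of sample size on the width can be seen numerically. In this sketch the value p̂ = 0.5 and the sample sizes are illustrative assumptions; `ci_width` is a hypothetical helper name.

```python
import math

def ci_width(p_hat, n, z=1.96):
    """Full width of the normal-approximation interval p_hat +/- z * SE."""
    return 2 * z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical p_hat = 0.5; each sample size is four times the previous one.
widths = {n: ci_width(0.5, n) for n in (100, 400, 1600)}
for n, w in widths.items():
    print(n, round(w, 4))
```

Because n sits under a square root, the interval narrows as n grows, and quadrupling the sample size halves the width.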
Sample Size
Maximum error: $e = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
(solving for n)
$n = z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})/e^{2}$
Sample Size
If you can’t guess a value for p̂ :
$n = z_{\alpha/2}^{2}(.5)(.5)/e^{2} = z_{\alpha/2}^{2}/(4e^{2})$
What sample size would you recommend to estimate the proportion of exceedances with a 95% confidence interval and maximum error of 2%?
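One way to work this out is sketched below, using the no-guess formula n = z²/(4e²) with z = 1.96 and e = 0.02. Exact fractions are used so that rounding up is not disturbed by floating-point error; the function name is a hypothetical helper.

```python
from fractions import Fraction
import math

def conservative_sample_size(z, e):
    """n = z^2 / (4 e^2), rounded up; z and e are decimal strings for exact arithmetic."""
    z, e = Fraction(z), Fraction(e)
    return math.ceil(z**2 / (4 * e**2))

n = conservative_sample_size("1.96", "0.02")
print(n)  # 2401
```

So a sample of about 2401 would be recommended under these assumptions.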
Research Update!
Recent research shows that we get better coverage for (1-α)100% CIs for p if we alter the CI formula.
95% CI: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/(n+4)}$
where $\hat{p} = (x+2)/(n+4)$ (Agresti/Coull)
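A minimal sketch of the adjusted interval, assuming x successes in n trials; the counts x = 8 and n = 20 are hypothetical and the function name is illustrative.

```python
import math

def agresti_coull_95(x, n, z=1.96):
    """Adjusted 95% interval: add 2 successes and 4 trials, then use the usual formula."""
    p_adj = (x + 2) / (n + 4)
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - half_width, p_adj + half_width

# Hypothetical counts, not taken from the slides.
low, high = agresti_coull_95(x=8, n=20)
print(round(low, 3), round(high, 3))
```

The adjustment pulls the center of the interval toward 0.5, which matters most for small samples.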
Estimation Procedures: An Example
Basic Logic
In estimation procedures, statistics
calculated from random samples are
used to estimate the value of
population parameters.
Example:
If we know 42% of a random sample
drawn from a city are Republicans, we
can estimate the percentage of all city
residents who are Republicans.
Basic Logic
Information from samples is used to estimate information about the population.
Statistics are used to estimate parameters.
[Diagram: POPULATION and SAMPLE; PARAMETER and STATISTIC]
Basic Logic
The sampling distribution is the link between sample and population.
The values of the parameters are unknown, but the characteristics of the sampling distribution are defined by theorems.
[Diagram: POPULATION, SAMPLING DISTRIBUTION, SAMPLE]
Two Estimation Procedures
A point estimate is a sample statistic
used to estimate a population value.
A newspaper story reports that 74% of a
sample of randomly selected Americans
support capital punishment.
Confidence intervals consist of a range of values.
"Between 71% and 77% of Americans approve of capital punishment."
Constructing Confidence
Intervals For Means
Set the alpha (probability that the interval
will be wrong).
Setting alpha equal to 0.05, a 95% confidence
level, means the researcher is willing to be
wrong 5% of the time.
Find the Z score associated with alpha.
If alpha is equal to 0.05, we would place half
(0.025) of this probability in the lower tail and
half in the upper tail of the distribution.
Substitute values into equation.
Confidence Intervals For Means
For a random sample of 178
households, average TV viewing was
6 hours/day with s = 3. Alpha = .05.
N=178.
c.i. = 6.0 ±1.96(3/√177)
c.i. = 6.0 ±1.96(3/13.30)
c.i. = 6.0 ±1.96(.23)
c.i. = 6.0 ± .44
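The same arithmetic can be sketched in a few lines. The function name `mean_ci` is illustrative; it follows the formula used above, which divides s by √(N - 1).

```python
import math

def mean_ci(sample_mean, s, n, z=1.96):
    """Interval sample_mean +/- z * s / sqrt(n - 1), matching the formula above."""
    margin = z * s / math.sqrt(n - 1)
    return sample_mean - margin, sample_mean + margin

low, high = mean_ci(sample_mean=6.0, s=3, n=178)
print(round(low, 2), round(high, 2))  # 5.56 6.44
```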
Confidence Intervals For Means
We can estimate that households in this
community average 6.0±.44 hours of TV
watching each day.
Another way to state the interval:
5.56≤μ≤6.44
We estimate that the population mean is greater
than or equal to 5.56 and less than or equal to
6.44.
This interval has a .05 chance of being
wrong.
Confidence Intervals For Means
Even if the statistic is as much as
±1.96 standard deviations from the
mean of the sampling distribution the
confidence interval will still include
the value of μ.
Only rarely (5 times out of 100) will
the interval not include μ.
Other confidence levels
Confidence level    Alpha    Alpha/2    Z score
90%                 .10      .05        +/- 1.65
95%                 .05      .025       +/- 1.96
99%                 .01      .0050      +/- 2.58
99.9%               .001     .0005      +/- 3.29
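These Z scores can be reproduced from the standard normal distribution. A sketch using the standard library (the function name is illustrative):

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical value: the z with area alpha/2 in the upper tail."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99, 0.999):
    print(level, round(z_for_confidence(level), 3))
```

The computed value for 90% is about 1.645, which tables often show as 1.65.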
Constructing Confidence Intervals
For Proportions
Procedures:
Set alpha.
Find the associated Z score.
Substitute the sample information into the formula.
Confidence Intervals For
Proportions
If 42% of a random sample of 764 from a
Midwestern city are Republicans, what % of
the entire city are Republicans?
Don’t forget to change the % to a
proportion.
c.i. = .42 ± 1.96 √(.25/764)   (the term under the root is set to its largest possible value, (.5)(.5) = .25)
c.i. = .42 ± 1.96 √.00033
c.i. = .42 ± 1.96 (.018)
c.i. = .42 ± .04
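As a rough check, the margin can be computed two ways: with .25 under the root, as in the calculation above, and with the sample proportion plugged in, as in the standard-error formula from earlier in these notes. The variable names are illustrative.

```python
import math

n, p_s, z = 764, 0.42, 1.96

conservative = z * math.sqrt(0.25 / n)           # version above: P(1 - P) set to .25
plug_in = z * math.sqrt(p_s * (1 - p_s) / n)     # alternative: use the sample proportion

print(round(conservative, 3), round(plug_in, 3))
```

The two margins are close here because .42 is near .5; the version with .25 is never the smaller of the two.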
Confidence Intervals For
Proportions
Changing back to %s, we can estimate that
42% ± 4% of city residents are
Republicans.
Another way to state the interval:
38%≤Pu≤ 46%
We estimate the population value is greater than
or equal to 38% and less than or equal to 46%.
This interval has a .05 chance of being
wrong.
SUMMARY
In this situation, identify the
following:
Population
Sample
Statistic
Parameter
SUMMARY
Population = All residents of the
city.
Sample = the 764 people selected
for the sample and interviewed.
Statistic = Ps = .42 (or 42%)
Parameter = unknown. The % of all
residents of the city who are
Republicans.
Estimating Population Parameters:
Another Example
Estimating Population Proportion
Example: When surveying 500 people selected at random
from the general population we get 200 responses of “yes”
when asked if they like broccoli. Estimate the proportion of
the population that likes broccoli.
p’ = x/n = 200/500 = 0.40
q’ = 1 - p’ = 0.60
np’ = 200 > 25 and nq’ = 300 > 25, so the normal approx. is OK
At the 95% confidence level we have α/2 = 0.025 and $z_{\alpha/2}$ = 1.96
The margin of error is $E = z_{\alpha/2}\sqrt{(.4)(.6)/500} = 1.96(.022) = 0.043$
P(p’ - 0.043 < p < p’ + 0.043) = P(.357 < p < .443) = 0.95
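The broccoli calculation, including the rule-of-thumb check on np’ and nq’, can be sketched as follows. The function name and the `min_count` parameter are illustrative; 25 is the threshold used above.

```python
import math

def proportion_ci(x, n, z=1.96, min_count=25):
    """Normal-approximation interval for p; min_count is the rule of thumb used above."""
    p = x / n
    q = 1 - p
    if n * p <= min_count or n * q <= min_count:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

low, high = proportion_ci(x=200, n=500)
print(round(low, 3), round(high, 3))  # 0.357 0.443
```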
Estimating Population Proportion
There are two user controlled factors that determine the
margin of error
$E = z_{\alpha/2}\sqrt{p'q'/n}$ (1)
1. The confidence level 1 - α
The smaller α, the greater $z_{\alpha/2}$ and the greater E
2. The sample size n
The larger the value of n, the smaller E
If the experimenter wants to fix the width of the confidence interval (by setting E to a pre-determined constant) and set the confidence level (by selecting a particular α), then we can use equation (1) above to determine the sample size needed to achieve this level of precision.
Estimating Population Proportion
$E = z_{\alpha/2}\sqrt{p'q'/n}$ (1)
Set E and α to desired values, so that E and $z_{\alpha/2}$ are constants. Solve equation (1) for n:
$n = z_{\alpha/2}^{2}\,(p'q')/E^{2}$ (2)
In equation (2) we have not yet taken a sample from the population, so we cannot be sure what the proportion of successes might be. The value of p’ that we use in this equation may be an estimate that we make based upon some prior knowledge, or we may choose p’ = q’ = 0.5, which maximizes n for a particular choice of α and E.
Estimating Population Proportion
Returning to our previous example, suppose we choose α = 0.05, E = 0.025, and have no prior knowledge about p’.
Then from equation (2) on the previous slide we obtain
$n = z_{\alpha/2}^{2}\,(p'q')/E^{2}$
where $z_{\alpha/2}$ = 1.96 and p’q’ = 0.25:
$n = (1.96)^{2}(0.25)/0.025^{2}$
n = 1536.64, which is rounded up to 1537
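To see why p’ = q’ = 0.5 is the safe choice, equation (2) can be evaluated for several guesses of p’. This is a sketch with z = 1.96 and E = 0.025 as in the example; the function name and the list of guesses are illustrative.

```python
import math

def sample_size(p_guess, margin, z=1.96):
    """Equation (2): n = z^2 * p'q' / E^2, rounded up to a whole observation."""
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

sizes = {p: sample_size(p, margin=0.025) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
print(sizes)
```

The required n is largest at p’ = 0.5, so using 0.5 guarantees the requested margin whatever the true proportion turns out to be.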
Estimating the population mean, σ known
Assume:
•Sample size, n > 30
•Population standard deviation σ is known
Then from the Central Limit Theorem, we know that the sampling distribution for samples of size n
•Has mean $\mu_{x'} = \mu$, the mean of the original population
•Has standard deviation $\sigma_{x'} = \sigma/\sqrt{n}$
Estimating the population mean, σ known
Let x’ be the mean of the sample of size n. Then the confidence interval at confidence level 1 - α is given by
$P(x' - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < x' + z_{\alpha/2}\,\sigma/\sqrt{n}) = 1 - \alpha$
Let the margin of error $E = z_{\alpha/2}\,\sigma/\sqrt{n}$
Then with a probability of 1 - α the population mean μ lies between x’ - E and x’ + E
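A minimal sketch of this interval in code. The inputs (x’ = 50, σ = 6, n = 36) are hypothetical values chosen for illustration, and the function name is not from the slides.

```python
import math
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Return (x_bar - E, x_bar + E) with E = z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs, not taken from the slides.
low, high = mean_ci_known_sigma(x_bar=50.0, sigma=6.0, n=36)
print(round(low, 2), round(high, 2))
```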
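Estimating the population mean, σ known
[Figure: x’, the mean from a sample of size n, with a margin E on either side, reaching μ1 on one side and μ2 on the other]
If the mean were μ1, the probability of getting a sample mean as extreme as x’ would be α/2.
If the mean were μ2, the probability of getting a sample mean as extreme as x’ would be α/2.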
Estimating Population mean with σ unknown
If the population standard deviation is unknown, the sample will have to provide both an estimate of the population mean and an estimate of the standard deviation.
Estimation of the population mean will be similar to how it was done when σ is known, but the sample standard deviation will be used instead. Student’s t-distribution will be used to determine the margin of error, and the confidence interval will be somewhat wider than it would be for the same sample size if σ were known.
Estimating Population mean with σ unknown
Step 1. For a sample of size n, n 30, calculate the sample
mean x’, and sample standard deviation s
Recall: the sample variance $s^{2} = \left(\sum f_i x_i^{2} - (\sum f_i x_i)^{2}/n\right)/(n-1)$
Step 2. Convert to a standard t-score
$t = (x' - \mu)/(s/\sqrt{n})$
where μ is the (unknown) mean of both the original population and the sampling distribution
Step 3. Select a confidence level 1 - α, and determine $t_{\alpha/2}$
Step 4. Find the margin of error $E = t_{\alpha/2}\,(s/\sqrt{n})$
Then $P(x' - E < \mu < x' + E) = 1 - \alpha$
Estimating Population mean with σ unknown
Finding $t_{\alpha/2}$
Choose α = 0.05, and assume n = 30. From Table A-3:
Degrees of    Area in One Tail
freedom       0.005    0.01     0.025    0.05
...
29            2.756    2.462    2.045    1.699
Then $t_{\alpha/2}$ = 2.045 for n - 1 = 29 degrees of freedom
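Steps 1 through 4 can be sketched with the table value 2.045 for 29 degrees of freedom. The data below are a made-up sample of n = 30 values used only for illustration, and the function name is hypothetical. The critical value is passed in from the table because the Python standard library has no t-distribution.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Return (x_bar - E, x_bar + E) with E = t_crit * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 30 values; 2.045 is the table value for 29 d.f.
data = [10 + (i % 5) for i in range(30)]
low, high = t_interval(data, t_crit=2.045)
print(round(low, 3), round(high, 3))
```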
Estimating Population mean with σ unknown
Using the TI calculator to find confidence interval for a
statistic with t-distribution
Let n = 106, x’ = 98.2, s = 0.62
Construct the 95% confidence interval
Step 1: Select STAT > TESTS scroll down to 8: TInterval
Step 2: Select Stats if x’, n, and s are known
Select Data if these values are to be calculated from a
list
Step 3: (Stats) Use the arrow key to move to each prompt and
enter the values given above. Then Calculate <enter>
Answer: TInterval (98.081, 98.319)
Estimating Population Variance
Requirements:
1. The sample is a simple random sample
2. The population must have normally distributed
values
The sample variance, suitably scaled, has a χ² distribution with n - 1 degrees of freedom:
$\chi^{2} = (n-1)s^{2}/\sigma^{2}$
where
n = sample size
s² = sample variance
σ² = population variance (to be determined)
Estimating Population Standard Deviation
Confidence Interval for the Population Variance:
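$(n-1)s^{2}/\chi^{2}_{R} < \sigma^{2} < (n-1)s^{2}/\chi^{2}_{L}$
[Figure: χ² density starting at 0, with area α/2 to the left of χ²_L and area α/2 to the right of χ²_R]
The χ² distribution is skewed right and always positive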
Estimating Population Standard Deviation
Example: Given the following data, find the 95% confidence
interval for the population standard deviation
n = 41, x’ = 67200, s = 18277
$P\{(n-1)s^{2}/\chi^{2}_{R} < \sigma^{2} < (n-1)s^{2}/\chi^{2}_{L}\} = 0.95$
First find χ²_R and χ²_L when each tail of the distribution contains 2.5% of the area under the curve
From Table A-4 for the Chi-Square Distribution
Degrees of    Area to the right of the critical value
freedom       0.995    0.99     0.975    0.95     0.10     0.05     0.025
...
40            20.707   22.164   24.433   26.509   51.805   55.758   59.342
Estimating Population Standard Deviation
From the previous slide we have χ²_R = 59.342 and χ²_L = 24.433
And therefore:
$(n-1)s^{2}/\chi^{2}_{R} = 40(18277)^{2}/59.342 = 2.2516 \times 10^{8}$
and
$(n-1)s^{2}/\chi^{2}_{L} = 40(18277)^{2}/24.433 = 5.4688 \times 10^{8}$
Thus we have:
$P(2.2516 \times 10^{8} < \sigma^{2} < 5.4688 \times 10^{8}) = 0.95$
and for the standard deviation
P(15,005 < σ < 23,385) = 0.95
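The arithmetic of this example can be sketched in code. The function name is illustrative, and the two critical values are copied from the chi-square table for 40 degrees of freedom rather than computed.

```python
import math

def sigma_ci(n, s, chi2_left, chi2_right):
    """Interval for sigma from (n-1)s^2/chi2_right < sigma^2 < (n-1)s^2/chi2_left."""
    scaled = (n - 1) * s**2
    return math.sqrt(scaled / chi2_right), math.sqrt(scaled / chi2_left)

# Critical values for 40 d.f. are copied from the table, not computed here.
low, high = sigma_ci(n=41, s=18277, chi2_left=24.433, chi2_right=59.342)
print(int(low), int(high))
```

Note that the interval is not symmetric around s = 18277, which reflects the right skew of the χ² distribution.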