Introduction to Inference
Sampling Distributions
Confidence Intervals
Distribution of Means
(Sampling Distribution)
• It is a hypothetical distribution; however, we know its
characteristics
• Its mean is the same as the mean of the population of
individuals
• Its variance is the variance of the population divided
by the sample size
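A small simulation can make these two properties concrete. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10) and samples of size 25; these numbers are for illustration only and do not come from the slides.

```python
import random
import statistics

def simulate_sample_means(pop_mean, pop_sd, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical population values, chosen only for illustration.
POP_MEAN, POP_SD, N = 50.0, 10.0, 25

means = simulate_sample_means(POP_MEAN, POP_SD, N, reps=20_000)
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print(f"mean of sample means:     {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"variance of sample means: {var_of_means:.2f} (sigma^2 / n = {POP_SD**2 / N})")
```

The mean of the simulated sample means should sit close to the population mean, and their variance close to the population variance divided by n.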
NAEP Quantitative Scores
1. To estimate the unknown population mean μ,
we use x̄ = 272, n = 840.
2. The law of large numbers → x̄ will be close to μ,
but → some error in the estimate.
1. We’ll come back to this…
3. Sampling distribution of x̄ → Normal distribution
with mean μ and standard deviation
σ/√n = 60/√840 ≈ 2.1
Case Study
NAEP Quantitative Scores
Confidence Intervals (CI)
4. The 68-95-99.7 rule indicates that x̄ and μ are
within two standard deviations (2 × 2.1 = 4.2) of
each other in about 95% of all samples.
P rules
x̄ − 4.2 = 272 − 4.2 = 267.8
x̄ + 4.2 = 272 + 4.2 = 276.2
BPS - 5th Ed.
Case Study
NAEP Quantitative Scores
So, if we estimate that μ lies within 4.2 of x̄, we’ll
be right about 95% of the time.
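The arithmetic of this case study can be reproduced in a few lines. This is a minimal sketch using the values quoted on the slides (x̄ = 272, σ = 60, n = 840); the variable names are illustrative.

```python
import math

x_bar = 272   # sample mean from the slides
sigma = 60    # population standard deviation from the slides
n = 840       # sample size from the slides

sd_of_mean = sigma / math.sqrt(n)     # about 2.07, shown as 2.1 on the slide
margin = 2 * round(sd_of_mean, 1)     # two standard deviations, using the rounded 2.1
lower, upper = x_bar - margin, x_bar + margin

print(f"SD of x-bar: {sd_of_mean:.2f}")
print(f"approximate 95% interval: {lower:.1f} to {upper:.1f}")
```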
Confidence Interval
• How accurate is this estimate? What is the range of possible
means that actually includes the true population mean?
• It is an interval estimate, computed from the statistics of the
sample(s), that might contain the true value of an unknown
population parameter.
• Variation in the population
• Sample size
Confidence Intervals
• We want to generalize these results back to the
populations of interest.
• However, when we generalize these results back to
the population we will still have some error present
in our generalization.
• In order to account for the error, we make our
estimates using a confidence interval
– that is based on the estimated error that is present in our statistic.
Confidence Intervals
[Figure: confidence intervals for n = 10, n = 20, and n = 100]
1) As sample size increases, CI gets smaller (which is
good! More precise estimation)
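To see the effect of sample size numerically, the sketch below computes the margin of error z*·σ/√n for the three sample sizes on this slide. The population standard deviation (10) is a hypothetical value, and the 95% critical value 1.96 is assumed.

```python
import math

def margin_of_error(sigma, n, z_star=1.96):
    """Margin of error z* * sigma / sqrt(n) for a mean with known sigma."""
    return z_star * sigma / math.sqrt(n)

SIGMA = 10.0  # hypothetical population SD, not from the slides

margins = {n: margin_of_error(SIGMA, n) for n in (10, 20, 100)}
for n, m in margins.items():
    print(f"n = {n:3d}: 95% CI = estimate ± {m:.2f}")
```

Because n sits under a square root, the sample size must be multiplied by four to cut the margin of error in half.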
Confidence Interval
[Figure: intervals at 90%, 95%, and 99% confidence]
2) The more confident we wish to be, the larger the
confidence interval (CI) will be.
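The same comparison can be made for the confidence level. This sketch holds the data fixed (a hypothetical σ = 10 and n = 25) and only changes the confidence level; the critical value is obtained from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value: central area `confidence` under N(0, 1)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

SIGMA, N = 10.0, 25  # hypothetical values, not from the slides

widths = {}
for c in (0.90, 0.95, 0.99):
    margin = z_star(c) * SIGMA / math.sqrt(N)
    widths[c] = 2 * margin  # full width of the interval
    print(f"{c:.0%} confident: estimate ± {margin:.2f} (width {widths[c]:.2f})")
```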
Confidence Intervals
• Research Companies provide CI for elections
– Sonar, Konda, A&G, etc.
• They make money out of representative
sampling
– CI: the % for political party X is ±3%
– 95% of the time
– AKA: sample and population means are within 2
standard deviations of each other in about 95% of
all samples
Statistical Inference
• Provides methods for drawing conclusions about a
population by using sample data
• Your textbook: SRS (Simple Random Sample)
• Simple conditions for inference about a mean:
1. An SRS from the population of interest.
2. The SRS should be from a normal distribution, N(μ, σ).
3. We do not know the population mean μ, but we know the population
standard deviation σ.
Previous Example:
Example: Ch. 13, p. 232
• The most recent health report about Body Mass Index (BMI)
covers 654 women, aged 20 to 29 years.
• The mean BMI of these women, from an SRS (simple random
sample), is x̄ = 26.8. We want to estimate μ, the population
mean for 18 million women, by using this SRS.
• σ = 7.5
1. We do not expect that x̄ would be identical to μ, so we
want to tell how accurate our estimate is.
2. We know that the sampling distribution of x̄ (meaning: in
repeated samples) is normal, with mean μ and standard
deviation σ/√n. So x̄ for a sample of 654 women has a
standard deviation of σ/√n = 7.5/√654 ≈ 0.3 (rounded).
3. The 68-95-99.7 rule says that x̄ is within 0.6 (two SD) of
μ about 95% of the time. So our estimate is 26.8 ± 0.6.
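The three steps of this example can be checked with a short calculation. This is a sketch using the example's own numbers (x̄ = 26.8, σ = 7.5, n = 654); the variable names are illustrative.

```python
import math

x_bar, sigma, n = 26.8, 7.5, 654   # values from the BMI example

sd_of_mean = sigma / math.sqrt(n)  # 7.5 / sqrt(654), about 0.29
margin = 2 * sd_of_mean            # "two SD" from the 68-95-99.7 rule
lower, upper = x_bar - margin, x_bar + margin

print(f"SD of x-bar: {sd_of_mean:.3f}")
print(f"margin of error: {margin:.2f}")
print(f"interval: {lower:.1f} to {upper:.1f}")
```

Rounded to one decimal place, the result agrees with the 26.8 ± 0.6 given in the example.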
Confidence Interval &
Confidence Level
• We are 95% confident that the mean BMI of all young
women (μ) is some value in the interval from 26.2 to 27.4.
• Most confidence intervals (CI) have a similar form to
this:
Estimate ± Margin of error
• So, CI has two parts:
1. An interval, usually of the form estimate ± margin of error.
2. A Confidence Level C gives the probability that the interval
will capture the true parameter in repeated samples. So,
confidence level is the success rate.
• We usually choose a confidence level at or above 90%;
the most popular is 95%.
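The idea of the confidence level as a success rate in repeated samples can be illustrated by simulation. The sketch below assumes a hypothetical normal population (μ = 100, σ = 15) and samples of size 30, builds a 95% interval x̄ ± 1.96·σ/√n from each sample, and counts how often the interval captures μ.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, n, z_star=1.96, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

# Hypothetical population, chosen only for illustration.
rate = coverage_rate(mu=100.0, sigma=15.0, n=30)
print(f"intervals that captured mu: {rate:.1%}")
```

The printed rate should land near 95%. Any single interval either contains μ or it does not; the 95% describes the method over many samples.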
Confidence interval: the estimated range of error associated
with a treatment effect.
Confidence Intervals for a
Population Mean
• Central area C is marked by two points,
+z* and –z*.
• Numbers like z* that mark off specified areas
are called critical values.
• You can find all numbers for many
choices of C at the bottom of the Table C
at the back of your book, but here are the
most common ones:
* two-tailed
Confidence level C    90%     95%     99%
Critical value z*     1.645   1.960   2.576
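The critical values in the table can be checked numerically. This sketch uses Python's `statistics.NormalDist` to compute the central area between –z* and +z* under the standard normal curve; the helper name `central_area` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def central_area(z_star):
    """Area under the standard normal curve between -z* and +z*."""
    return std_normal.cdf(z_star) - std_normal.cdf(-z_star)

# Confidence level -> critical value, copied from the table above.
table = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

areas = {c: central_area(z) for c, z in table.items()}
for c, z in table.items():
    print(f"z* = {z:.3f} -> central area {areas[c]:.4f} (target {c:.2f})")
```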
Confidence Intervals for a
Population Mean
Confidence level C    90%     95%     99%
Critical value z*     1.645   1.960   2.576
• Notice: For C = 95%, the critical value is
z* = 1.96, which is more precise than the 2 based on the
68-95-99.7 rule.
• This is the formula for the CI for the mean
of a normal population:
x̄ ± z* · σ/√n
Example
• Here is an SRS of 18 scores:
29 27 34 40 22 28 14 35 26
35 12 30 23 18 11 22 23 33
• We will estimate the mean rate μ by providing a 95% CI.
• The CI formula that fits this situation is x̄ ± z* · σ/√n.
• x̄ = 25.67, and we know that σ = 8.
• For a 95% interval, the critical value is z* = 1.96.
• A 95% confidence interval for μ is therefore:
25.67 ± 1.96 · 8/√18 ≈ 25.67 ± 3.70,
or roughly 22.0 to 29.4.
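This example can be worked through in code. The sketch below uses the 18 scores, σ = 8, and z* = 1.96 from the slide and applies x̄ ± z*·σ/√n; the variable names are illustrative.

```python
import math
import statistics

scores = [29, 27, 34, 40, 22, 28, 14, 35, 26,
          35, 12, 30, 23, 18, 11, 22, 23, 33]
SIGMA = 8       # known population SD, from the slide
Z_STAR = 1.96   # critical value for 95% confidence

x_bar = statistics.fmean(scores)
margin = Z_STAR * SIGMA / math.sqrt(len(scores))
lower, upper = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.2f}, margin of error = {margin:.2f}")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

This calculation relies on the conditions listed earlier: an SRS, a normal population, and a known σ.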