Chapter Nine
Sampling Distributions
A sampling distribution is created by, as the name suggests, sampling.
There are two ways to create a sampling distribution:
1. Actually draw samples of the same size from a population, calculate the statistic of interest, and then use descriptive techniques to learn more about the sampling distribution.
2. Use the rules of probability and the laws of expected value and variance to derive the sampling distribution.
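A minimal Python sketch of the approach of actually drawing samples, assuming the population is the throw of a fair die (the running example used later in this chapter) and samples of size 2. The function name simulate_sample_means and the fixed seed are illustrative choices, not from the text; the relative frequencies it prints only approximate the exact sampling distribution.

```python
import random
from collections import Counter

def simulate_sample_means(n_samples, sample_size, seed=1):
    """Draw repeated samples of a fixed size from the population
    (throws of a fair die) and record the statistic of interest (the mean)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(n_samples):
        sample = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means

means = simulate_sample_means(n_samples=100_000, sample_size=2)

# Descriptive techniques: relative frequency of each observed sample mean
freq = Counter(means)
for value in sorted(freq):
    print(f"{value:.1f}  {freq[value] / len(means):.4f}")

avg = sum(means) / len(means)
print("average of the sample means:", round(avg, 3))
```

The simulated frequencies are noisy; deriving the distribution from the rules of probability gives exact values instead.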
We will employ the second method: the rules of probability and the laws of expected value and variance are used to derive the sampling distribution.
For example, consider the roll of one die and of two dice…
Population Mean (Expected Value)
The population mean is the weighted average of all of its values. The weights are the probabilities.
This parameter is also called the expected value of X and is represented by E(X):
E(X) = µ = Σ x P(x)
Population Variance
The population variance is calculated similarly. It is the weighted average of the squared deviations from the mean:
V(X) = σ² = Σ (x − µ)² P(x)
As before, there is a “short-cut” formulation:
V(X) = σ² = Σ x² P(x) − µ²
The standard deviation is the same as before:
σ = √σ²
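The mean, variance and standard deviation formulas for a discrete random variable translate directly into code. In the sketch below the distribution dist is a made-up example, not one from the text; it also checks that the definitional and short-cut variance formulas agree.

```python
from math import sqrt

def mean(dist):
    """E(X) = sum of x * P(x): the probability-weighted average of the values."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X) = sum of (x - mu)^2 * P(x): weighted average of squared deviations."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Short-cut form: V(X) = sum of x^2 * P(x) - mu^2."""
    mu = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2

# Hypothetical distribution used only for illustration: value -> probability
dist = {0: 0.2, 1: 0.5, 2: 0.3}
mu = mean(dist)
var = variance(dist)
sd = sqrt(var)
print("mean:", mu)
print("variance:", var, "short-cut:", variance_shortcut(dist))
print("std dev:", sd)
```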
Sampling Distribution of the Mean
The population is created by throwing a fair die infinitely many times, with the random variable X = number of spots on any throw.
The probability distribution of X is:
x 1 2 3 4 5 6
P(x) 1/6 1/6 1/6 1/6 1/6 1/6
…and the mean and variance are calculated as well:
µ = Σ x P(x) = 3.5
σ² = Σ (x − µ)² P(x) = 35/12 ≈ 2.92
σ = √2.92 ≈ 1.71
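A quick check of the die's mean, variance and standard deviation, assuming Python's fractions module so the first two come out exactly:

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of X = number of spots on one throw of a fair die
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in die.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in die.items())  # V(X)
sd = sqrt(var)

print("mean:", mu)                        # exact fraction
print("variance:", var, "=", float(var))
print("std dev:", round(sd, 2))
```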
Sampling Distribution of Two Dice
A sampling distribution is created by looking at all samples of size n = 2 (i.e., two dice) and their means…
While there are 36 possible samples of size 2, there are only 11 values for x̄, and some (e.g., x̄ = 3.5) occur more frequently than others (e.g., x̄ = 1).
Because the value of the sample mean varies randomly from sample to sample, we can regard x̄ as a new random variable created by sampling.
Since each sample is equally likely, the probability of any one sample being selected is 1/36.
However, x̄ can assume only 11 different possible values: 1.0, 1.5, 2.0, …, 6.0, with certain values of x̄ occurring more frequently than others.
The value x̄ = 1.0 occurs only once, so its probability is 1/36. The value x̄ = 1.5 can occur in 2 ways, (1,2) and (2,1), each having the same probability (1/36). Thus P(x̄ = 1.5) = 2/36. The probabilities of other values of x̄ are determined in the same fashion.
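The counting argument for two dice can be sketched in Python by enumerating all 36 ordered pairs. This sketch uses exact fractions and reports each probability as a count out of 36, matching the 1/36 weighting of each sample.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely samples of size n = 2: ordered pairs (first die, second die)
samples = list(product(range(1, 7), repeat=2))

# Count how many samples produce each value of the sample mean x-bar
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Each sample has probability 1/36, so P(x-bar) = count / 36
dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

for xbar in sorted(counts):
    print(f"x-bar = {float(xbar):.1f}   P(x-bar) = {counts[xbar]}/36")
```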
The sampling distribution of x̄ is shown below:
[Figure: bar chart of P(x̄) against x̄; vertical axis from 1/36 to 6/36]
The mean of the sampling distribution of x̄ is equal to the mean of the population of the toss of a die: µ_x̄ = 3.5 = µ.
The variance of the sampling distribution of x̄ is exactly half of the variance of the population of the toss of a die: σ²_x̄ = 35/24 ≈ 1.46 = σ²/2.
Compare…
Compare the distribution of X…
…with the sampling distribution of x̄.
The distribution of x̄ is different from the distribution of X. However, the two random variables are related.
Generalize…
We can generalize the mean and variance of the sampling distribution of two dice:
µ_x̄ = µ,  σ²_x̄ = σ²/2
…to n dice:
µ_x̄ = µ,  σ²_x̄ = σ²/n
The standard deviation of the sampling distribution is called the standard error:
σ_x̄ = σ/√n
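As a check of the generalization E(x̄) = µ and V(x̄) = σ²/n, the sketch below enumerates every possible sample of n fair dice for n = 1 to 4. There are 6ⁿ samples, so brute-force enumeration is only practical for small n. The helper sample_mean_distribution is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def sample_mean_distribution(n):
    """Exact distribution of x-bar for n fair dice, found by enumerating all 6**n samples."""
    p_each = Fraction(1, 6 ** n)          # every ordered sample is equally likely
    dist = {}
    for sample in product(range(1, 7), repeat=n):
        xbar = Fraction(sum(sample), n)
        dist[xbar] = dist.get(xbar, 0) + p_each
    return dist

mu, sigma2 = Fraction(7, 2), Fraction(35, 12)   # mean and variance of one die

summary = {}
for n in range(1, 5):
    dist = sample_mean_distribution(n)
    mean_xbar = sum(x * p for x, p in dist.items())
    var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in dist.items())
    summary[n] = (mean_xbar, var_xbar)
    # Compare with the formulas E(x-bar) = mu and V(x-bar) = sigma^2 / n
    print(f"n={n}: E={mean_xbar}, V={var_xbar}, sigma^2/n={sigma2 / n}, "
          f"standard error={sqrt(var_xbar):.4f}")
```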
Central Limit Theorem
The sampling distribution of the mean of a random sample
drawn from any population is approximately normal for a
sufficiently large sample size
The larger the sample size, the more closely the sampling distribution of x̄ will resemble a normal distribution.
If the population is normal, then x̄ is normally distributed for all values of n.
If the population is non-normal, then x̄ is approximately normal only for larger values of n.
In most practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of x̄.
However, if the population is extremely non-normal (e.g., bimodal and highly skewed distributions), the sampling distribution will also be non-normal, even for moderately large values of n.
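The effect of sample size can be illustrated by simulation. This sketch assumes a right-skewed population (exponential with mean 1, chosen only for illustration) and measures how the skewness of the simulated sample means shrinks as n grows; a skewness near 0 indicates a symmetric shape like the normal. The helpers skewness and sample_means are hypothetical names, and the exact numbers vary slightly with the seed.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an exponential population
    with mean 1 (an assumed, right-skewed population)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(42)
results = {}
for n in (1, 5, 30):
    means = sample_means(n, 20_000, rng)
    results[n] = (fmean(means), skewness(means))
    print(f"n={n:>2}: mean of x-bar={results[n][0]:.3f}, skewness={results[n][1]:.3f}")
```

The mean of x̄ stays near the population mean for every n, while the skewness falls toward 0, which is the pattern the Central Limit Theorem describes.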
Sampling Distribution of the Sample Mean
1. µ_x̄ = µ
2. σ²_x̄ = σ²/n and σ_x̄ = σ/√n
3. If X is normal, x̄ is normal. If X is non-normal, x̄ is approximately normal for sufficiently large sample sizes.
Note: the definition of “sufficiently large” depends on the extent of non-normality of X (e.g., heavily skewed; multimodal).
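We can express the sampling distribution of the mean simply as
x̄ ~ N(µ, σ/√n), i.e., x̄ is normal (or approximately normal) with mean µ and standard deviation σ/√n.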
Example 9.1(a)
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of .3 ounce
If a customer buys one bottle, what is the probability that
the bottle will contain more than 32 ounces?
We want to find P(X > 32), where the random variable X
(amount of soda in one bottle) is normally distributed and
µ = 32.2 and σ =.3
=1 - NORMDIST(32, 32.2, 0.3, TRUE) = 0.7475
OR
=1 - NORMSDIST(-0.6667) = 0.7475
“There is about a 75% chance that a single bottle of soda
contains more than 32oz”
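For readers working outside Excel, the same probability can be computed with Python's statistics.NormalDist (Python 3.8+), whose cdf method plays the role of NORMDIST(..., TRUE). A sketch showing both the direct route and the route through the standard normal:

```python
from statistics import NormalDist

mu, sigma = 32.2, 0.3          # population mean and standard deviation (ounces)
X = NormalDist(mu, sigma)      # amount of soda in one bottle

# P(X > 32) = 1 - P(X <= 32), the analogue of 1 - NORMDIST(32, 32.2, 0.3, TRUE)
p_direct = 1 - X.cdf(32)

# Equivalent route through the standard normal: z = (32 - 32.2) / 0.3
z = (32 - mu) / sigma
p_via_z = 1 - NormalDist().cdf(z)

print(round(z, 4), round(p_direct, 4), round(p_via_z, 4))
```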
Example 9.1(b)
The foreman of a bottling plant has observed that the
amount of soda in each “32-ounce” bottle is actually a
normally distributed random variable, with a mean of 32.2
ounces and a standard deviation of .3 ounce
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will
be greater than 32 ounces?
We want to find P(x̄ > 32), where X is normally distributed with µ = 32.2 and σ = .3.
Things we know:
1) X is normally distributed, therefore so will x̄ be.
2) µ_x̄ = µ = 32.2 oz
3) σ_x̄ = σ/√n = .3/√4 = .15 oz
=1 - NORMDIST(32, 32.2, 0.3/SQRT(4), TRUE) = 0.9088
OR
=1 - NORMSDIST(-1.33) = 0.9082 (using z = (32 − 32.2)/.15 = −1.33, rounded)
“There is about a 91% chance the mean of the four bottles will exceed 32 oz”
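A Python sketch of the carton calculation using statistics.NormalDist, with the standard error σ/√n in place of σ. It also shows why two slightly different answers appear: the exact calculation gives about 0.9088, while rounding z to −1.33 before looking it up gives about 0.9082.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4
std_error = sigma / sqrt(n)             # sigma_xbar = 0.3 / 2 = 0.15
xbar = NormalDist(mu, std_error)        # sampling distribution of the mean of 4 bottles

# Analogue of 1 - NORMDIST(32, 32.2, 0.3/SQRT(4), TRUE)
p_exact = 1 - xbar.cdf(32)

# Analogue of 1 - NORMSDIST(-1.33): z rounded to two decimals first
p_rounded_z = 1 - NormalDist().cdf(round((32 - mu) / std_error, 2))

print(round(std_error, 2), round(p_exact, 4), round(p_rounded_z, 4))
```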
Graphically Speaking
[Figure: two panels, each with mean = 32.2. Left: “What is the probability that one bottle will contain more than 32 ounces?” Right: “What is the probability that the mean of four bottles will exceed 32 oz?”]
Example 1
The marks of a statistics midterm test are normally
distributed with a mean of 78 and a standard deviation of 6
a. What proportion of the class has a midterm mark of less than 75?
b. What is the probability that a class of 50 has an average midterm mark that is less than 75?
a. We want to find P(X < 75), where the random variable X
(marks of a statistics midterm test) is normally
distributed and µ = 78 and σ = 6
NORMDIST(75, 78, 6, TRUE) = 0.3085
OR
NORMSDIST(-0.5) = 0.3085
b. We want to find P(x̄ < 75) for a class of 50, where x̄ is normally distributed with µ_x̄ = 78 and σ_x̄ = 6/√50 ≈ 0.849
NORMDIST(75, 78, 6/SQRT(50), TRUE) ≈ 0.0002
OR
NORMSDIST(-3.54) ≈ 0.0002
The probability is essentially zero.
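Both parts of Example 1 in Python, as a sketch using statistics.NormalDist. Part (b) shows that the probability is tiny but not exactly zero, about 0.0002.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 78, 6

# (a) one student's mark: P(X < 75)
p_a = NormalDist(mu, sigma).cdf(75)

# (b) mean of a class of n = 50: P(x-bar < 75), standard error = 6 / sqrt(50)
n = 50
p_b = NormalDist(mu, sigma / sqrt(n)).cdf(75)

print(f"(a) {p_a:.4f}   (b) {p_b:.6f}")
```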
Example 2
Salaries of a Business School’s Graduates
In the advertisements for a large university, the dean of the
School of Business claims that the average salary of the
school’s graduates one year after graduation is $800 per
week with a standard deviation of $100. The salaries are
distributed normally
A second-year student in the business school who has just
completed his statistics course would like to check
whether the claim about the mean is correct
He does a survey of 25 people who graduated one year ago and
determines their weekly salary
He discovers the sample mean to be $750
To interpret his finding, he needs to calculate the probability that
a sample of 25 graduates would have a mean of $750 or less when
the population mean is $800, and the standard deviation is $100
After calculating the probability, he needs to draw some
conclusions
We want to find the probability that the sample mean is less than $750. Thus, we seek P(x̄ < 750).
The distribution of X, the weekly income, is likely to be positively skewed, but not sufficiently so to make the distribution of x̄ non-normal. As a result, we may assume that x̄ is normal with mean µ_x̄ = µ = 800 and standard deviation σ_x̄ = σ/√n = 100/√25 = 20.
Thus,
P(x̄ < 750) = P(Z < (750 − 800)/20) = P(Z < −2.5)
NORMDIST(750, 800, 100/SQRT(25), TRUE) = 0.0062
OR
NORMSDIST(-2.5) = 0.0062
The probability of observing a sample mean as low as $750 when the population mean is $800 is extremely small. Because this event is quite unlikely, we would have to conclude that the dean's claim is not justified.
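The salary calculation as a Python sketch: the probability is computed under the dean's claimed mean of $800, which is what makes a small result evidence against the claim. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

claimed_mu, sigma, n = 800, 100, 25   # dean's claim and the stated standard deviation
observed_mean = 750                   # student's sample mean

std_error = sigma / sqrt(n)                               # 100 / 5 = 20
z = (observed_mean - claimed_mu) / std_error              # standardized sample mean
p = NormalDist(claimed_mu, std_error).cdf(observed_mean)  # P(x-bar < 750) if mu = 800

print(f"z = {z}, P = {p:.4f}")
```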
Example 3
A random sample of size 36 is drawn from a population with a mean of 2 and a standard deviation of 0.25. What is the probability that the sample mean will be greater than 2.1?
µ = 2
σ = 0.25
n = 36
P(x̄ > 2.1) = 1 – P(x̄ < 2.1)
= 1-NORMDIST(2.1, 2, 0.25/SQRT(36), TRUE)
= 0.0082
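Example 3 in Python, as a sketch: the standard error is 0.25/6, so 2.1 lies 2.4 standard errors above the mean and the upper-tail probability comes out at about 0.0082.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2, 0.25, 36
std_error = sigma / sqrt(n)      # 0.25 / 6

# P(x-bar > 2.1) = 1 - P(x-bar < 2.1)
p = 1 - NormalDist(mu, std_error).cdf(2.1)
z = (2.1 - mu) / std_error
print(f"z = {z:.2f}, P = {p:.4f}")
```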
Example 4
A report announced that the mean sale price of new houses sold in a city is Rs. 3,40,000 and the standard deviation of the price is Rs. 20,000. Answer the following questions based on this information:
a. If you select a random sample of 50, what is the probability that the sample mean will be within ±Rs. 2500 of the population mean?
b. If you select a random sample of 100, what is the probability that the sample mean will be within ±Rs. 2500 of the population mean?
c. If you select a random sample of 200, what is the probability that the sample mean will be within ±Rs. 2500 of the population mean?
Example 4a
µ = 3,40,000
σ = 20,000
n = 50
P(3,40,000 – 2500 < x̄ < 3,40,000 + 2500)
i.e., P(3,37,500 < x̄ < 3,42,500)
=NORMDIST(342500, 340000, 20000/SQRT(50), TRUE) –
NORMDIST(337500, 340000, 20000/SQRT(50), TRUE)
= 0.6232
Example 4b
µ = 3,40,000
σ = 20,000
n = 100
P(3,40,000 – 2500 < x̄ < 3,40,000 + 2500)
i.e., P(3,37,500 < x̄ < 3,42,500)
=NORMDIST(342500, 340000, 20000/SQRT(100), TRUE) –
NORMDIST(337500, 340000, 20000/SQRT(100), TRUE)
= 0.7887
Example 4c
µ = 3,40,000
σ = 20,000
n = 200
P(3,40,000 – 2500 < x̄ < 3,40,000 + 2500)
i.e., P(3,37,500 < x̄ < 3,42,500)
=NORMDIST(342500, 340000, 20000/SQRT(200), TRUE) –
NORMDIST(337500, 340000, 20000/SQRT(200), TRUE)
= 0.9229
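The three parts of Example 4 differ only in n, so a small loop makes the pattern visible: the larger the sample, the smaller the standard error and the more likely x̄ falls within ±Rs. 2500 of µ. A sketch; the helper prob_within is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, margin = 340_000, 20_000, 2_500   # Rs.

def prob_within(n):
    """P(mu - margin < x-bar < mu + margin) for a random sample of size n."""
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(mu + margin) - xbar.cdf(mu - margin)

results = {n: prob_within(n) for n in (50, 100, 200)}
for n, p in results.items():
    print(f"n = {n:>3}: {p:.4f}")
```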
Example 5
A researcher in a company has observed that the amount of soda in each “500 ml” bottle is actually normally distributed with a mean of 490 ml and a standard deviation of 20 ml. Find the probability that, if a customer buys a carton of 16 bottles, the mean amount of fill per bottle will be between 485 ml and 500 ml.
µ = 490
σ = 20
n = 16
P(485 < x̄ < 500)
= NORMDIST(500, 490, 20/SQRT(16), TRUE) – NORMDIST(485, 490, 20/SQRT(16), TRUE)
= 0.8186
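Example 5 as a Python sketch: with a standard error of 20/4 = 5 ml, the interval runs from one standard error below the mean to two above it.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 490, 20, 16
xbar = NormalDist(mu, sigma / sqrt(n))   # standard error = 20 / 4 = 5 ml

p = xbar.cdf(500) - xbar.cdf(485)        # P(485 < x-bar < 500)
print(f"P = {p:.4f}")
```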