Topic 6
Sampling and Sampling Distributions
Why Sample?
• Selecting a sample is less time-consuming than
selecting every item in the population (census).
• Selecting a sample is less costly than selecting every
item in the population.
• An analysis of a sample is less cumbersome and more
practical than an analysis of the entire population.
Selection of Class Representatives
[Figure: Unbiased sample: an unbiased, representative sample of male and female students drawn at random from the entire population. Biased sample: a biased, unrepresentative sample consisting of more female students than males.]
Sampling Process begins with a Sampling Frame
• The sampling frame is a listing of items that make up the population
• Frames are data sources such as population lists, directories, or maps
• Results can be inaccurate or biased if a frame excludes certain
portions of the population
• Using different frames to generate data can lead to dissimilar
conclusions
Types of Sampling
• Non-Probability Sampling: Convenience, Judgement, Quota, Snowball
• Probability Sampling: Simple Random, Stratified, Systematic, Cluster
Types of Sampling: Non-probability Sampling
In non-probability sampling, items included are chosen without regard
to their probability of occurrence.
• In convenience sampling, items are selected based only on the fact that they are
easy, inexpensive, or convenient to sample.
• In judgment sampling, one gets the opinions of pre-selected individuals or
experts in the subject matter.
• In quota sampling, individuals or items are selected on the basis of specific traits
or qualities. A fixed number of units is selected so that all the traits are included.
• In snowball sampling, research units are selected with the help of other research
units. It is used where potential participants are difficult to identify. For example,
customers in life insurance, network marketing, survey on ‘social evils’ etc.
Types of Sampling: Probability Sampling
In probability sampling, items in the sample are chosen on
the basis of known probabilities.
Probability Sampling: Simple Random, Stratified Random, Systematic, Cluster
Probability Sampling: Simple Random Sampling
• Every individual or item from the frame has an equal chance
of being selected
• Selection may be with replacement (selected individual is
returned to frame for possible reselection) or without
replacement (selected individual isn’t returned to the frame).
• Samples are obtained using the lottery method, random
number tables, or computer random number generators.
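As a rough illustration of the two selection modes, the sketch below uses Python's standard `random` module as the "computer random number generator". The frame of 850 numbered items and the sample size of 10 are assumptions made only for this example.

```python
import random

# Hypothetical frame: items numbered 1..850 (an assumed size for illustration).
frame = list(range(1, 851))
rng = random.Random(42)  # fixed seed so the draw can be repeated

# Without replacement: a selected item is not returned to the frame,
# so no item can appear twice.
without_replacement = rng.sample(frame, k=10)

# With replacement: a selected item is returned to the frame and
# may be selected again.
with_replacement = rng.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```

In both cases every item in the frame has the same chance of being picked on each draw.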
Selecting a Simple Random Sample using ‘Random Number Table’
Sampling Frame For Population With 850 Items

Item Name    Item #
Bev R.       001
Ulan X.      002
.            .
.            .
Joann P.     849
Paul F.      850

Portion Of A Random Number Table

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The first 12 numbers read from the table (an item is selected when the first 3 digits are between 001 and 850):

Item # 49280 - select    Item # 11100 - select
Item # 88924 - ignore    Item # 02340 - select
Item # 35779 - select    Item # 12860 - select
Item # 00283 - select    Item # 74697 - select
Item # 81163 - select    Item # 96644 - ignore
Item # 07275 - select    Item # 89439 - ignore
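The selection rule in this example can be sketched in a few lines of Python. The random numbers are copied from the first two rows of the table; skipping repeated item numbers is an added assumption (sampling without replacement), which the slide does not state.

```python
# First two rows of the random number table, read left to right.
table = ["49280", "88924", "35779", "00283", "81163", "07275",
         "11100", "02340", "12860", "74697", "96644", "89439"]
FRAME_SIZE = 850

selected = []
for number in table:
    item = int(number[:3])  # the first three digits give the item number
    # Keep the item only if it exists in the frame; the duplicate check
    # is an assumption (sampling without replacement).
    if 1 <= item <= FRAME_SIZE and item not in selected:
        selected.append(item)

print(selected)  # [492, 357, 2, 811, 72, 111, 23, 128, 746]
```

Nine of the twelve numbers fall inside the frame; 889, 966 and 894 are ignored because no item carries those numbers.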
Probability Sampling: Stratified Random Sampling
• Divide population into two or more subgroups (called strata) according to
some common characteristic
• A simple random sample is selected from each subgroup, with sample sizes
proportional to strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling population of voters, stratifying
across racial or socio-economic lines.
[Figure: Population divided into 4 strata]
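A minimal sketch of proportional allocation follows. The function name `stratified_sample`, the four strata and their sizes are hypothetical and chosen only so that the allocation comes out in whole numbers.

```python
import random

def stratified_sample(strata, n, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum. `strata` maps a stratum name to its list of items."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation; rounding may shift the total by a unit
        # or two when the shares are not whole numbers.
        k = round(n * len(items) / total)
        sample[name] = rng.sample(items, k)
    return sample

# Hypothetical population of 1000 items split into 4 strata.
strata = {
    "A": [f"A{i}" for i in range(400)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(200)],
    "D": [f"D{i}" for i in range(100)],
}
sample = stratified_sample(strata, n=50)
sizes = {name: len(items) for name, items in sample.items()}
print(sizes)  # {'A': 20, 'B': 15, 'C': 10, 'D': 5}
```

The per-stratum samples would then be combined into one sample of 50.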
Probability Sampling: Systematic Sampling
(Pseudo Random Sampling)
• Decide on sample size: n (say, 100)
• Divide frame of N (say, 5000) individuals into groups of k
individuals: (skip interval) k=N/n = 5000/100 = 50
• Randomly select one individual from the 1st group
• Select every kth individual thereafter
[Figure: Example with N = 40, n = 4, k = 10; one individual is selected at random from the first group of 10]
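The steps above can be sketched as follows, using the figure's values N = 40 and n = 4. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return n positions (1-based) from a frame of N individuals."""
    rng = random.Random(seed)
    k = N // n                     # skip interval
    start = rng.randint(1, k)      # random pick from the first group 1..k
    return [start + i * k for i in range(n)]

sample = systematic_sample(N=40, n=4, seed=1)
print(sample)
```

Only the starting position is random; every later position is fixed by the skip interval k.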
Probability Sampling: Cluster Sampling
• Population is divided into several “clusters,” each representative of the
population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a
cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where
certain election districts are selected and sampled.
[Figure: Population divided into 16 clusters; randomly selected clusters form the sample]
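A small sketch of one-stage cluster sampling, where all items in the selected clusters are used. The 16 clusters match the figure; the 5 items per cluster and the choice of 4 clusters are assumptions for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population: 16 clusters with 5 items each.
clusters = {c: [f"cluster{c}-item{i}" for i in range(1, 6)]
            for c in range(1, 17)}

# Simple random sample of clusters (not of individual items).
chosen = rng.sample(sorted(clusters), k=4)

# One-stage design: keep every item in each selected cluster.
sample = [item for c in chosen for item in clusters[c]]

print(chosen)
print(len(sample))  # 20
```

A two-stage design would replace the last step with another probability sample drawn inside each chosen cluster.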
Probability Sample: Comparing Sampling Methods
• Simple random sample and Systematic sample
Simple to use
May not be a good representation of the population’s
underlying characteristics
• Stratified sample
Ensures representation of individuals across the entire
population
• Cluster sample
More cost effective
Less efficient (need larger sample to acquire the same level
of precision)
Types of Survey Errors
• Coverage error or selection bias
Exists if some groups are excluded from the frame and have no chance
of being selected
• Non response error or bias
People who do not respond may be different from those who do
respond
• Sampling error
Variation from sample to sample will always exist
• Measurement error
Due to weaknesses in question design, respondent error, and
interviewer’s effects on the respondent (“Hawthorne effect”)
Types of Survey Errors
• Coverage error: excluded from frame
• Non response error: follow up on non-responses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
Sampling Distributions
• A sampling distribution is a distribution of all of the possible
values of a sample statistic (mean, std dev., proportion etc.) for
a given size of sample selected from a population.
• For example, suppose you sample 50 students from your
college regarding their mean GPA. If you obtain different
samples of size 50, you will compute a different mean for each
sample. We are interested in the distribution of all potential
mean GPAs we might calculate for all samples of 50 students.
Developing a Sampling Distribution
• Assume there is a population …
• Population size N = 4 (individuals A, B, C, D)
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
Developing a Sampling Distribution
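Summary Measures for the Population Distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

[Figure: Population distribution, P(x) plotted against x = 18, 20, 22, 24 (A, B, C, D); Uniform Distribution]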
Developing a Sampling Distribution
Now consider all possible samples of size n=2

16 possible samples (sampling with replacement)

1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24

16 Sample Means (statistic)

1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24
Developing a Sampling Distribution (continued)
Sampling Distribution of All Sample Means

X-bar    Freq.    Relative freq. (Prob.)
18       1        1/16 = 0.0625
19       2        2/16 = 0.125
20       3        3/16 = 0.1875
21       4        4/16 = 0.25
22       3        3/16 = 0.1875
23       2        2/16 = 0.125
24       1        1/16 = 0.0625
Total    16

[Figure: Sample Means Distribution, P(X-bar) plotted against X-bar = 18, 19, ..., 24 (no longer uniform)]
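The frequency table of sample means can be reproduced by brute-force enumeration. This is only a sketch for the four-person population used in these slides; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

ages = [18, 20, 22, 24]

# All ordered samples of size 2, drawn with replacement: 4 * 4 = 16.
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

counts = Counter(sample_means)
distribution = {m: counts[m] / len(sample_means) for m in sorted(counts)}

for m, p in distribution.items():
    print(m, counts[m], p)
```

The probabilities rise from 0.0625 at 18 to 0.25 at 21 and fall again, which is why the distribution of sample means is no longer uniform.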
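Developing a Sampling Distribution (continued)
Summary Measures of this Sampling Distribution:

$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$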
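These two summary measures can be checked numerically. The sketch below recomputes them from the 16 sample means and also compares the result with the population standard deviation divided by the square root of the sample size, using only the standard library.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

mu_xbar = mean(sample_means)        # mean of the 16 sample means
sigma_xbar = pstdev(sample_means)   # divides by 16, as in the formula above
sigma = pstdev(ages)                # population standard deviation

print(mu_xbar)                      # 21.0
print(round(sigma_xbar, 2))         # 1.58
print(round(sigma / sqrt(2), 2))    # 1.58
```

The mean of the sample means equals the population mean, while their spread is smaller than the population's.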
Comparing Population Distribution and
Sample Means Distribution
Population; N = 4: $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution; n = 2: $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: P(X) plotted against X = 18, 20, 22, 24 (A, B, C, D), beside P(X-bar) plotted against X-bar = 18, 19, ..., 24]
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same
population will yield different sample means
• A measure of the variability in the mean from sample to
sample is given by the Standard Error of the Mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

(This assumes that sampling is with replacement or
sampling is without replacement from an infinite population)
• Note that the standard error of the mean decreases as the
sample size increases
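To see how quickly the standard error shrinks, here is a short sketch. The value σ = 15 and the sample sizes are illustrative assumptions; the helper name `standard_error` is hypothetical.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / sqrt(n)

sigma = 15  # assumed population standard deviation
errors = {n: standard_error(sigma, n) for n in (25, 100, 400)}
print(errors)  # {25: 3.0, 100: 1.5, 400: 0.75}
```

Quadrupling the sample size only halves the standard error, because n enters through its square root.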
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normal with mean μ and standard
deviation σ, the sampling distribution of $\bar{X}$ is also
normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Z-value for Sampling Distribution of Mean
Z-value for the sampling distribution of $\bar{X}$:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties
$\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)

[Figure: Normal Population Distribution centered at μ, above the Normal Sampling Distribution centered at $\mu_{\bar{x}}$ (has the same mean)]
Sampling Distribution Properties
(continued)
As n increases, $\sigma_{\bar{x}}$ decreases

[Figure: Sampling distributions of $\bar{x}$ centered at μ for a smaller sample size and a larger sample size]
Determining An Interval Including A Fixed Proportion of the
Sample Means
Find a symmetrically distributed interval around µ that
will include 95% of the sample means when µ = 368, σ
= 15, and n = 25.
• Since the interval contains 95% of the sample means, 5%
of the sample means will be outside the interval
• Since the interval is symmetric, 2.5% will be above the
upper limit and 2.5% will be below the lower limit.
• From the standardized normal table, the Z score with
2.5% (0.025) below it is -1.96 and the Z score with 2.5%
(0.025) above it is 1.96.
Determining An Interval Including A Fixed Proportion of the
Sample Means (continued)
• Calculating the lower limit of the interval

$$\bar{X}_L = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (-1.96)\frac{15}{\sqrt{25}} = 362.12$$

• Calculating the upper limit of the interval

$$\bar{X}_U = \mu + Z\frac{\sigma}{\sqrt{n}} = 368 + (1.96)\frac{15}{\sqrt{25}} = 373.88$$

• 95% of all sample means of sample size 25 are between 362.12 and
373.88
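The same limits can be obtained without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library to look up the Z score, so the value is the unrounded 1.95996... rather than 1.96.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25
se = sigma / sqrt(n)                 # standard error of the mean = 3.0

# Z score with 97.5% of the standard normal below it (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower = mu - z * se
upper = mu + z * se
print(round(lower, 2), round(upper, 2))  # 362.12 373.88
```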
Sample Mean Sampling Distribution:
If the Population is not Normal
• We can apply the Central Limit Theorem:
• Even if the population is not normal,
• …sample means from the population will be
approximately normal as long as the sample size is large
enough.
Properties of the sampling distribution:

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of shape of population.

[Figure: Sampling distribution of $\bar{x}$ approaching a normal curve]
Sample Mean Sampling Distribution:
If the Population is not Normal
(continued)
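Sampling distribution properties:
• Central Tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

[Figure: Population Distribution centered at μ, above the Sampling Distribution centered at $\mu_{\bar{x}}$ (becomes normal as n increases), drawn for a smaller sample size and a larger sample size]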
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling
distribution of the mean is always normally
distributed
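A small simulation can make these rules of thumb concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1, chosen only for illustration) and uses the skewness of the simulated sample means as a rough indicator of how far their distribution is from the symmetric normal shape.

```python
import random

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def simulate_sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an exponential population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(0)
skew_small = skewness(simulate_sample_means(2, 5000, rng))
skew_large = skewness(simulate_sample_means(30, 5000, rng))

print(round(skew_small, 2), round(skew_large, 2))
```

The skewness for n = 30 should come out much closer to zero than for n = 2, in line with the sampling distribution becoming nearly normal as n grows.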
Example
• Suppose a population has mean μ = 8 and standard
deviation σ = 3. Suppose a random sample of size n
= 36 is selected.
• What is the standard error of sample mean?
• What is the probability that the sample mean is
between 7.8 and 8.2?
Example
Solution:
• Even if the population is not normally distributed,
the central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{x}$ is approximately
normal
• … with mean $\mu_{\bar{x}} = 8$
• …and standard error $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5$
Example
(continued)
Solution (continued):
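$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$

[Figure: Population Distribution (shape unknown, μ = 8) → Sample → Sampling Distribution of $\bar{X}$ ($\mu_{\bar{X}} = 8$, with 7.8 and 8.2 marked) → Standardize → Standard Normal Distribution ($\mu_z = 0$, area 0.1554 + 0.1554 between Z = -0.4 and Z = 0.4)]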
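As a cross-check on the table lookup, the probability can be computed directly. This sketch models the sampling distribution of the mean as a normal distribution with `statistics.NormalDist`, which rests on the central limit theorem approximation used in the solution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)                      # standard error = 0.5

# Approximate sampling distribution of the sample mean.
xbar_dist = NormalDist(mu, se)
prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)

print(se, round(prob, 4))  # 0.5 0.3108
```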
Practice Exercises
1. Mean expenditure of all the visitors in a restaurant is Rs.2000 with a std. deviation of
Rs.250. A random sample of 40 customers was taken, find the probability that
(a) mean expenditure of customers is more than Rs.1928, (b) mean expenditure of
customers is between Rs.1950 and Rs.2030.
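(a) $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{1928 - 2000}{250/\sqrt{40}} = -1.82$

$P(\bar{X} > 1928) = P(Z > -1.82)$
$= P(-1.82 < Z < 0) + P(0 < Z < \infty)$
$= 0.4656 + 0.5$
$= 0.9656$

[Figure: Standard normal curve with area 0.4656 between Z = -1.82 and Z = 0 and area 0.5 above Z = 0]

(b) $P(1950 < \bar{X} < 2030) = P(-1.26 < Z < 0.76) = 0.6726$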
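The answers to exercise 1 can be checked with a short sketch using `statistics.NormalDist`. Because the code keeps the Z values unrounded, its results differ slightly from the table-based figures in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 250, 40

# Approximate sampling distribution of the mean expenditure (n > 30).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

p_a = 1 - xbar_dist.cdf(1928)                     # P(mean > 1928)
p_b = xbar_dist.cdf(2030) - xbar_dist.cdf(1950)   # P(1950 < mean < 2030)

print(round(p_a, 4), round(p_b, 4))
```

Both values should land close to the table-based answers 0.9656 and 0.6726.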
2. The numerical population of grade point averages at a college has mean 2.61 and
standard deviation 0.5. If a random sample of size 100 is taken from the population,
what is the probability that the sample mean will be between 2.51 and 2.71?
3. A prototype automotive tire has a mean design life of 38,500 miles with a standard
deviation of 2,500 miles. Five such tires are manufactured and tested. Find the
probability that the sample mean will be less than 36,000 miles. Assume that the
distribution of lifetimes of such tires is normal.
4. An automobile battery manufacturer claims that its midgrade battery has a mean life
of 50 months with a standard deviation of 6 months. Suppose the distribution of battery
lives of this particular brand is approximately normal.
(a) On the assumption that the manufacturer’s claims are true, find the probability that
a randomly selected battery of this type will last less than 48 months. (Normal
distribution problem)
(b) On the same assumption, find the probability that the mean life of a random sample
of 36 such batteries will be less than 48 months. (Sampling distribution problem)
Population Proportions
π = the proportion of the population having
some characteristic
• Sample proportion (p) provides an estimate of π.
$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$
• 0≤ p≤1
• p is approximately distributed as a normal distribution when
n is large
Sampling Distribution of p
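• Approximated by a normal distribution if:

$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$

where

$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

(where π = population proportion)

[Figure: Sampling Distribution, P(p) plotted against p from 0 to 1]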
Z-Value for Proportions
Standardize p to a Z value with the formula:
$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
Example
• If the true proportion of voters who support
Proposition A is π = 0.4, what is the probability that
a sample of size 200 yields a sample proportion
between 0.40 and 0.45?
i.e.: if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
Example
(continued)
• if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
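Find $\sigma_p$:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standardized normal:

$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$$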
Example
(continued)
• if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
Use standardized normal table: P(0 ≤ Z ≤ 1.44) = 0.4251
[Figure: Sampling Distribution of p with the area between 0.40 and 0.45 shaded → Standardize → Standardized Normal Distribution with area 0.4251 between Z = 0 and Z = 1.44]
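The proportion example can be verified the same way. This sketch treats the sampling distribution of p as normal, which the example already assumes, and keeps Z unrounded, so the result differs from the table value 0.4251 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.4, 200   # `pi` here is the population proportion, not 3.14159...

# Conditions for the normal approximation.
normal_ok = n * pi >= 5 and n * (1 - pi) >= 5

sigma_p = sqrt(pi * (1 - pi) / n)          # standard error of the proportion
p_dist = NormalDist(pi, sigma_p)
prob = p_dist.cdf(0.45) - p_dist.cdf(0.40)

print(normal_ok, round(sigma_p, 5), round(prob, 3))
```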