
Understanding Sampling Methods and Distributions

Uploaded by

pari362022
Copyright
© All Rights Reserved

Topic 6

Sampling and Sampling Distributions


Why Sample?

• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
Selection of Class Representatives

[Figure: two samples drawn from a population of male and female students.
Unbiased sample: an unbiased, representative sample drawn at random from the entire population.
Biased sample: a biased, unrepresentative sample consisting of more female students than males.]
Sampling Process begins with a Sampling Frame

• The sampling frame is a listing of items that make up the population
• Frames are data sources such as population lists, directories, or maps
• Results can be inaccurate or biased if a frame excludes certain portions of the population
• Using different frames to generate data can lead to dissimilar conclusions
Types of Sampling

Sampling
• Non-Probability Sampling: Convenience, Judgment, Quota, Snowball
• Probability Sampling: Simple Random, Stratified, Systematic, Cluster
Types of Sampling: Non-probability Sampling
In non-probability sampling, items included are chosen without regard
to their probability of occurrence.
• In convenience sampling, items are selected based only on the fact that they are
easy, inexpensive, or convenient to sample.
• In judgment sampling, one gets the opinions of pre-selected individuals or experts in the subject matter.
• In quota sampling, individuals or items are selected on the basis of specific traits or qualities. A fixed number of units is selected so that all the traits are included.
• In snowball sampling, research units are selected with the help of other research units. It is used where potential participants are difficult to identify, for example customers in life insurance or network marketing, or respondents to a survey on ‘social evils’.
Types of Sampling: Probability Sampling

In probability sampling, items in the sample are chosen on the basis of known probabilities.

Probability Sampling: Simple Random, Stratified Random, Systematic, Cluster


Probability Sampling: Simple Random Sampling

• Every individual or item from the frame has an equal chance of being selected
• Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame).
• Samples are obtained using the lottery method, random number tables, or computer random number generators.
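
As an illustration of the last two bullets, here is a minimal Python sketch of drawing a simple random sample with a computer random number generator, both without and with replacement. The frame of 850 numbered items, the sample size of 12, and the seed are assumptions chosen for the example, not values prescribed by the slides.

```python
import random

# Hypothetical sampling frame: item numbers 1 to 850.
frame = list(range(1, 851))

# A fixed seed is used only so that the sketch gives repeatable output.
rng = random.Random(42)

# Without replacement: a selected item is not returned to the frame,
# so it can appear at most once in the sample.
without_replacement = rng.sample(frame, k=12)

# With replacement: a selected item is returned to the frame
# and may be selected again.
with_replacement = rng.choices(frame, k=12)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

In both cases every item in the frame has the same chance of being picked on each draw, which is the defining property of simple random sampling.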
Selecting a Simple Random Sample using ‘Random Number Table’

Sampling frame for a population with 850 items:

Item Name    Item #
Bev R.       001
Ulan X.      002
.            .
.            .
Joann P.     849
Paul F.      850

Portion of a random number table:

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

The first 12 entries of the table are used to build the simple random sample. The first 3 digits of each entry give the item number, which should be between 001 and 850:

Entry    First 3 digits    Decision
49280    492               select
88924    889               ignore
35779    357               select
00283    002               select
81163    811               select
07275    072               select
11100    111               select
02340    023               select
12860    128               select
74697    746               select
96644    966               ignore
89439    894               ignore
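
The selection rule above can be sketched in a few lines of Python. The function name `select_from_table` is hypothetical, and skipping repeated item numbers is an assumption that corresponds to sampling without replacement.

```python
def select_from_table(entries, frame_size, digits=3):
    """Keep the leading digits of each table entry if they form a valid item number."""
    selected = []
    for entry in entries:
        item = int(entry[:digits])
        # Ignore numbers outside the frame; skip repeats (assumed: without replacement).
        if 1 <= item <= frame_size and item not in selected:
            selected.append(item)
    return selected


# First two rows of the random number table shown above.
table = ["49280", "88924", "35779", "00283", "81163", "07275",
         "11100", "02340", "12860", "74697", "96644", "89439"]

selected = select_from_table(table, frame_size=850)
print(selected)  # [492, 357, 2, 811, 72, 111, 23, 128, 746]
```

Nine of the first 12 entries give valid item numbers, so more of the table would have to be read to reach a larger sample size.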
Probability Sampling: Stratified Random Sampling

• Divide population into two or more subgroups (called strata) according to some common characteristic
• A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes
• Samples from subgroups are combined into one
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.

[Figure: population divided into 4 strata]
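
A minimal Python sketch of proportional stratified sampling follows. The four strata, their sizes, and the function name `stratified_sample` are made up for illustration. The per-stratum size is rounded, so in general the combined sample can differ slightly from the requested n.

```python
import random


def stratified_sample(strata, n, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)  # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 1000 individuals divided into 4 strata.
strata = {
    "stratum_1": [f"s1_{i}" for i in range(400)],
    "stratum_2": [f"s2_{i}" for i in range(300)],
    "stratum_3": [f"s3_{i}" for i in range(200)],
    "stratum_4": [f"s4_{i}" for i in range(100)],
}

by_stratum = stratified_sample(strata, n=50, rng=random.Random(0))

# Combine the samples from the subgroups into one.
combined = [unit for units in by_stratum.values() for unit in units]
print({name: len(units) for name, units in by_stratum.items()})
```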

Probability Sampling: Systematic Sampling
(Pseudo Random Sampling)

• Decide on sample size: n (say, 100)
• Divide frame of N (say, 5000) individuals into groups of k individuals, where the skip interval is k = N/n = 5000/100 = 50
• Randomly select one individual from the 1st group
• Select every kth individual thereafter

[Figure: example with N = 40, n = 4, k = 10; the first group is marked]
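
The steps above can be sketched in Python as follows. The function name `systematic_sample` and the seed are assumptions for the example, and the sketch assumes that N is an exact multiple of n.

```python
import random


def systematic_sample(frame, n, rng):
    """Pick a random start in the first group, then every k-th individual after it."""
    k = len(frame) // n       # skip interval k = N / n
    start = rng.randrange(k)  # random position within the first group of k
    return [frame[start + i * k] for i in range(n)]


# Frame of N = 5000 individuals and sample size n = 100, as in the bullets above.
frame = list(range(1, 5001))
sample = systematic_sample(frame, n=100, rng=random.Random(1))

print(sample[:5])
```

Only the starting point is random. Every later selection is determined by it, which is why the method is also described as pseudo random sampling.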
Probability Sampling: Cluster Sampling

• Population is divided into several “clusters,” each representative of the population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.

[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]
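
A minimal Python sketch of cluster sampling, using all items in the selected clusters. The 16 clusters of 5 items each, the choice of 4 clusters, and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters by simple random sampling and keep every item in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items


# Hypothetical population divided into 16 clusters of 5 items each.
clusters = {
    f"cluster_{c:02d}": [f"c{c:02d}_item{i}" for i in range(5)]
    for c in range(1, 17)
}

chosen, items = cluster_sample(clusters, n_clusters=4, rng=random.Random(7))
print(chosen)
print(len(items), "items in the sample")
```

To sample within clusters instead, the list comprehension could draw a simple random sample from each chosen cluster rather than taking every item.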
Probability Sample: Comparing Sampling Methods

• Simple random sample and Systematic sample
  Simple to use
  May not be a good representation of the population’s underlying characteristics
• Stratified sample
  Ensures representation of individuals across the entire population
• Cluster sample
  More cost effective
  Less efficient (need larger sample to acquire the same level of precision)
Types of Survey Errors

• Coverage error or selection bias
  Exists if some groups are excluded from the frame and have no chance of being selected
• Non-response error or bias
  People who do not respond may be different from those who do respond
• Sampling error
  Variation from sample to sample will always exist
• Measurement error
  Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent (“Hawthorne effect”)
Types of Survey Errors

• Coverage error: excluded from frame
• Non-response error: follow up on non-responses
• Sampling error: random differences from sample to sample
• Measurement error: bad or leading question
Sampling Distributions

• A sampling distribution is a distribution of all of the possible values of a sample statistic (mean, std dev., proportion etc.) for a given size of sample selected from a population.
• For example, suppose you sample 50 students from your college regarding their mean GPA. If you obtain different samples of size 50, you will compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs we might calculate for all samples of 50 students.
Developing a Sampling Distribution

• Assume there is a population of size N = 4, with individuals A, B, C, and D
• Random variable, X, is age of individuals
• Values of X: 18, 20, 22, 24 (years)
Developing a Sampling Distribution

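Summary Measures for the Population Distribution:

$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$

[Figure: population distribution P(x) over x = 18, 20, 22, 24 (individuals A, B, C, D); a uniform distribution]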
Developing a Sampling Distribution

Now consider all possible samples of size n = 2

16 possible samples (sampling with replacement):

1st Obs    2nd Observation
           18       20       22       24
18         18,18    18,20    18,22    18,24
20         20,18    20,20    20,22    20,24
22         22,18    22,20    22,22    22,24
24         24,18    24,20    24,22    24,24

16 sample means (statistic):

1st Obs    2nd Observation
           18    20    22    24
18         18    19    20    21
20         19    20    21    22
22         20    21    22    23
24         21    22    23    24
Developing a Sampling Distribution (continued)

Sampling Distribution of All Sample Means

X-bar    Freq.    Relative freq. (Prob.)
18       1        1/16 = 0.0625
19       2        2/16 = 0.125
20       3        3/16 = 0.1875
21       4        4/16 = 0.25
22       3        3/16 = 0.1875
23       2        2/16 = 0.125
24       1        1/16 = 0.0625
Total    16

[Figure: sample means distribution, P(X-bar) against X-bar = 18, 19, ..., 24 (no longer uniform)]
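Developing a Sampling Distribution (continued)

Summary Measures of this Sampling Distribution:

$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + 19 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$

Here N = 16 is the number of possible samples, not the population size.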
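
The enumeration on these slides can be reproduced with a short Python sketch: it lists all 16 samples of size 2 drawn with replacement, computes each sample mean, and then summarizes the resulting sampling distribution. The variable names are chosen for this example only.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [18, 20, 22, 24]  # ages of A, B, C, D
n = 2                          # sample size

# Population summary measures.
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

# All possible samples of size n with replacement, and their means.
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

# Summary measures of the sampling distribution of the mean.
mu_xbar = sum(means) / len(means)
sigma_xbar = sqrt(sum((m - mu_xbar) ** 2 for m in means) / len(means))

freq = Counter(means)
for value in sorted(freq):
    print(value, freq[value], freq[value] / len(means))

print(mu, round(sigma, 3))            # population: 21.0 2.236
print(mu_xbar, round(sigma_xbar, 2))  # sample means: 21.0 1.58
```

The spread of the sample means, 1.58, equals the population standard deviation divided by the square root of the sample size, 2.236 / sqrt(2).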
Comparing Population Distribution and
Sample Means Distribution

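Population; N = 4: $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution; n = 2: $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: population distribution P(X) over X = 18, 20, 22, 24 (A, B, C, D), beside the sample means distribution P(X-bar) over X-bar = 18, 19, ..., 24]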
Sample Mean Sampling Distribution:
Standard Error of the Mean
• Different samples of the same size from the same
population will yield different sample means
• A measure of the variability in the mean from sample to
sample is given by the Standard Error of the Mean:
(This assumes that sampling is with replacement or
sampling is without replacement from an infinite population)

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• Note that the standard error of the mean decreases as the
sample size increases
Sample Mean Sampling Distribution:
If the Population is Normal

• If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Z-value for Sampling Distribution of Mean

Z-value for the sampling distribution of $\bar{X}$:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution Properties

• $\mu_{\bar{X}} = \mu$ (i.e. $\bar{X}$ is unbiased)

[Figure: normal population distribution centered at μ, and normal sampling distribution of the sample mean, which has the same mean]
Sampling Distribution Properties
(continued)

• As n increases, $\sigma_{\bar{X}}$ decreases

[Figure: sampling distributions of the sample mean centered at μ, for a larger sample size and a smaller sample size]
Determining An Interval Including A Fixed Proportion of the
Sample Means
Find a symmetrically distributed interval around µ that
will include 95% of the sample means when µ = 368, σ
= 15, and n = 25.

• Since the interval contains 95% of the sample means, 5% of the sample means will be outside the interval
• Since the interval is symmetric, 2.5% will be above the upper limit and 2.5% will be below the lower limit.
• From the standardized normal table, the Z score with
2.5% (0.025) below it is -1.96 and the Z score with 2.5%
(0.025) above it is 1.96.
Determining An Interval Including A Fixed Proportion of the
Sample Means (continued)

• Calculating the lower limit of the interval

$$\bar{X}_L = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (-1.96) \frac{15}{\sqrt{25}} = 362.12$$

• Calculating the upper limit of the interval

$$\bar{X}_U = \mu + Z \frac{\sigma}{\sqrt{n}} = 368 + (1.96) \frac{15}{\sqrt{25}} = 373.88$$

• 95% of all sample means of sample size 25 are between 362.12 and 373.88
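
The same interval can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library in place of the printed normal table, so the Z score is slightly more precise than 1.96. The variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 368, 15, 25

# Z score with 2.5% of the standard normal distribution above it (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Standard error of the mean.
se = sigma / sqrt(n)

lower = mu - z * se
upper = mu + z * se

print(round(z, 2), se)                    # 1.96 3.0
print(round(lower, 2), round(upper, 2))   # 362.12 373.88
```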
Sample Mean Sampling Distribution:
If the Population is not Normal

• We can apply the Central Limit Theorem:
• Even if the population is not normal, sample means from the population will be approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Central Limit Theorem

As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of shape of population.

[Figure: sampling distribution of the sample mean]
Sample Mean Sampling Distribution:
If the Population is not Normal
(continued)

Sampling distribution properties:
• Central Tendency: $\mu_{\bar{X}} = \mu$
• Variation: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

[Figure: population distribution with mean μ, and sampling distributions of the sample mean for a smaller and a larger sample size (becomes normal as n increases)]
How Large is Large Enough?

• For most distributions, n > 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling distribution of the mean is always normally distributed
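
A small simulation can make these rules of thumb concrete. The sketch below is only an illustration: the exponential population, the sample size of 30, the number of repetitions, and the seed are all assumptions chosen for the example. It draws many samples from a strongly skewed population and checks that the sample means behave roughly as the Central Limit Theorem suggests.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 30                  # size of each sample
num_samples = 5000      # number of repeated samples

# Assumed population: exponential with rate 1, which is strongly skewed
# and has mean 1 and standard deviation 1.
pop_mean, pop_sd = 1.0, 1.0

sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = mean(sample_means)    # should be close to the population mean
spread = pstdev(sample_means)  # should be close to pop_sd / sqrt(n)
se = pop_sd / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this would be about 0.95.
within = sum(abs(m - pop_mean) <= 1.96 * se for m in sample_means) / num_samples

print(round(center, 3), round(spread, 3), round(se, 3), round(within, 3))
```

Repeating the run with a smaller n, such as 5, should show a visibly larger gap between the observed share and 0.95 for a population this skewed.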
Example
• Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
• What is the standard error of sample mean?
• What is the probability that the sample mean is between 7.8 and 8.2?
Example
Solution:
• Even if the population is not normally distributed, the central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{X}$ is approximately normal
• … with mean $\mu_{\bar{X}} = 8$
• … and standard error $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Example
(continued)
Solution (continued):
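$$P(7.8 < \bar{X} < 8.2) = P\left( \frac{7.8 - 8}{3 / \sqrt{36}} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{8.2 - 8}{3 / \sqrt{36}} \right) = P(-0.4 < Z < 0.4) = 0.3108$$

[Figure: the population distribution (shape unknown, μ = 8) is sampled to give the sampling distribution of the sample mean (mean 8, interval 7.8 to 8.2), which is standardized to the standard normal distribution (mean 0, interval -0.4 to 0.4, area 0.1554 + 0.1554)]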
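
The worked example can be verified with a short Python sketch, using `statistics.NormalDist` instead of the printed table. The variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

# Standard error of the sample mean.
se = sigma / sqrt(n)

# By the Central Limit Theorem, the sample mean is approximately Normal(mu, se).
sampling_dist = NormalDist(mu, se)
prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(se)              # 0.5
print(round(prob, 4))  # 0.3108
```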
Practice Exercises
1. The mean expenditure of all the visitors in a restaurant is Rs.2000 with a std. deviation of Rs.250. A random sample of 40 customers is taken. Find the probability that
(a) the mean expenditure of the customers is more than Rs.1928, (b) the mean expenditure of the customers is between Rs.1950 and Rs.2030.

(a)

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{1928 - 2000}{250 / \sqrt{40}} = -1.82$$

$$P(\bar{X} > 1928) = P(Z > -1.82) = P(-1.82 < Z < 0) + P(0 < Z < \infty) = 0.4656 + 0.5 = 0.9656$$

[Figure: standard normal curve with area 0.4656 between Z = -1.82 and Z = 0, and area 0.5 above Z = 0]

(b)

$$P(1950 < \bar{X} < 2030) = P(-1.26 < Z < 0.76) = 0.6726$$
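
As a check on exercise 1, here is a minimal Python sketch using `statistics.NormalDist`. Because it does not round the Z scores to two decimals before looking them up, its answers differ from the table-based ones in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 250, 40

# Sampling distribution of the mean expenditure (approximately normal, n > 30).
se = sigma / sqrt(n)
sampling_dist = NormalDist(mu, se)

# (a) Probability that the sample mean is more than Rs.1928.
z_a = (1928 - mu) / se
prob_a = 1 - sampling_dist.cdf(1928)

# (b) Probability that the sample mean is between Rs.1950 and Rs.2030.
prob_b = sampling_dist.cdf(2030) - sampling_dist.cdf(1950)

print(round(z_a, 2))  # -1.82
print(round(prob_a, 3), round(prob_b, 3))
```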
2. The numerical population of grade point averages at a college has mean 2.61 and
standard deviation 0.5. If a random sample of size 100 is taken from the population,
what is the probability that the sample mean will be between 2.51 and 2.71?
3. A prototype automotive tire has a mean design life of 38,500 miles with a standard
deviation of 2,500 miles. Five such tires are manufactured and tested. Find the
probability that the sample mean will be less than 36,000 miles. Assume that the
distribution of lifetimes of such tires is normal.
4. An automobile battery manufacturer claims that its midgrade battery has a mean life
of 50 months with a standard deviation of 6 months. Suppose the distribution of battery
lives of this particular brand is approximately normal.
(a) On the assumption that the manufacturer’s claims are true, find the probability that
a randomly selected battery of this type will last less than 48 months. (Normal
distribution problem)
(b) On the same assumption, find the probability that the mean life of a random sample
of 36 such batteries will be less than 48 months. (Sampling distribution problem)
Population Proportions

• π = the proportion of the population having some characteristic
• Sample proportion (p) provides an estimate of π:

$$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$$

• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
Sampling Distribution of p

• Approximated by a normal distribution if:

$$n\pi \ge 5 \quad \text{and} \quad n(1 - \pi) \ge 5$$

where

$$\mu_p = \pi \quad \text{and} \quad \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

(where π = population proportion)

[Figure: sampling distribution P(p) against p from 0 to 1]
Z-Value for Proportions
Standardize p to a Z value with the formula:

$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
Example

• If the true proportion of voters who support Proposition A is π = 0.4, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45?
• i.e.: if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
Example
(continued)

• If π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Find $\sigma_p$:

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standardized normal:

$$P(0.40 \le p \le 0.45) = P\left( \frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464} \right) = P(0 \le Z \le 1.44)$$
Example
(continued)

• If π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Use standardized normal table: P(0 ≤ Z ≤ 1.44) = 0.4251

[Figure: sampling distribution of p over the interval 0.40 to 0.45, standardized to the standard normal distribution over Z = 0 to Z = 1.44 with area 0.4251]
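
The proportion example can be checked with a short Python sketch. It uses `statistics.NormalDist` rather than the printed table and keeps the unrounded Z score, so the probability comes out marginally higher than 0.4251. The variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.4, 200  # population proportion and sample size

# Conditions for the normal approximation to the sampling distribution of p.
normal_ok = n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

# Standard error of the sample proportion.
sigma_p = sqrt(pi_pop * (1 - pi_pop) / n)

# Standardize the two limits of the interval 0.40 to 0.45.
z_lower = (0.40 - pi_pop) / sigma_p
z_upper = (0.45 - pi_pop) / sigma_p

std_normal = NormalDist()
prob = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(normal_ok, round(sigma_p, 5), round(z_upper, 2))  # True 0.03464 1.44
print(round(prob, 3))
```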
