CHAPTER 2
SAMPLING AND ESTIMATION
PART 1
03/16/2025 1
In this chapter you will learn about
sampling methods including random and non-random sampling
how to simulate a random sample from a given distribution
the expectation and variance of the sample mean
the distribution of the sample mean
the use of the central limit theorem
the distribution of the sample proportion
estimates of population parameters:
mean
variance
proportion
confidence intervals for:
a population mean, involving the $z$-distribution
a population mean, involving the $t$-distribution
a population proportion
2.1 Sampling
2.1.1 Population
In a statistical enquiry you often need information about a particular group. This group is
known as the population or the target population, and it could be small, large or even infinite.
Note that the word 'population' does not necessarily mean 'people'. Here are some examples
of populations:
pupils in a class,
people in Sri Lanka in full-time employment,
hospitals in Sri Lanka,
cans of soft drink produced in a factory,
ferns in a wood,
rational numbers between 0 and 10.
2.1.2 Surveys
Information is collected by means of a survey. There are two types:
(a) a census,
(b) a sample survey.
(a) Census
In a census every member of the population is surveyed. When the population is small, this
could be straightforward.
When populations are large, taking a census can be very time-consuming and difficult to do
with accuracy.
(b) Sample survey
When a survey covers less than 100% of the population, it is known as a sample survey. In
many circumstances, taking a sample is preferable to carrying out a census. Sample data
can be obtained relatively cheaply and quickly and, if the sample is representative of the
population, a sample survey can give an accurate indication of the population characteristic
being studied. The size of the sample does not depend on the size of the population.
Sample design
Once the purpose of a survey has been stated precisely, the target population must be defined.
The sampling units must be defined clearly. These are the people or items to be sampled.
Bias
The purpose of sampling is to gain information about the whole population by selecting a
sample from that population. You want the sample to be representative of the population so
you must give every member of the population an equal chance of being included in the
sample. This should eliminate any bias in the selection of the sample.
Sources of bias include
(a) the lack of a good sampling frame:
- using the telephone directory misses all those who do not have a telephone or whose
number is ex-directory,
(b) the wrong choice of sampling unit:
- choosing an individual rather than a particular group such as 'household'.
(c) non-response by some of the chosen units:
- it might be difficult to locate the particular unit, or the cooperation of the respondent
might not have been obtained,
- the enquiry might not have been understood, for example, a questionnaire might have been
badly designed. Questionnaires should be clear, specific, unambiguous and easily
understood. Questions should be worded neutrally, especially in opinion surveys, to avoid
bias caused by pointing towards a particular response.
(d) bias introduced by the person conducting the survey:
- the interviewer might not question someone who appears uncooperative,
- the style of questioning may influence the response.
It should be noted that a sample can only be representative of the population from which it
is selected. If you select a sample of teachers from one school, the sample is representative
of the teachers in that school, not of all teachers in all schools.
SAMPLING METHODS
Once a sampling frame has been established, you can choose a method of sampling. These
fall into two categories:
random sampling e.g. simple, systematic, stratified;
non-random sampling e.g. quota, cluster
Simple random sampling
Suppose a population consists of $N$ sampling units and you require a sample of $n$ of these units.
A sample of size $n$ is called a simple random sample if all possible samples of size $n$ are equally
likely to be selected. Some form of random process must be used to make the selection.
If the unit selected at each draw is replaced into the population before the next draw, then it
can appear more than once in the sample. This is known as sampling with replacement.
If the unit selected at each draw is not replaced into the population before the next draw, this
is known as sampling without replacement. This second method, sampling without
replacement, is the one known as simple random sampling.
Two methods of simple random sampling are commonly used:
drawing lots,
random number sampling.
For each, make a list of all members of the population and give each member a different
number.
Calculator random number generator
You probably have a random number generator key, ran#, on your calculator.
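The difference between sampling with and without replacement can be sketched in Python, using the standard library in place of a calculator key. This is only an illustration: the population size, sample size and seed below are assumed values, not taken from the text.

```python
import random

# Hypothetical sampling frame: members numbered 1..N (illustrative values).
N = 50   # population size
n = 5    # sample size
frame = list(range(1, N + 1))

rng = random.Random(2025)  # fixed seed so that the run is repeatable

# Sampling without replacement: no unit can appear twice.
without_replacement = rng.sample(frame, n)

# Sampling with replacement: a unit may be drawn more than once.
with_replacement = rng.choices(frame, k=n)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Numbering the members first and then drawing random numbers mirrors random number sampling; `rng.sample` plays the role of drawing lots.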
Stratified sampling
Stratified sampling is used when the population is split into distinguishable layers, or strata,
which are different from each other and which together cover the whole population, for
example age groups, occupational groups or topographical regions. Separate random
samples are then taken from each stratum and put together to form the sample from the
population. It is usual to represent each stratum in the sample in proportion to its size in
the population.
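A minimal sketch of proportional allocation follows. The strata names, their sizes and the overall sample size are hypothetical, chosen so that the shares come out as whole numbers.

```python
import random

# Hypothetical strata: stratum name -> list of unit identifiers.
strata = {
    "under 30": [f"A{i}" for i in range(200)],
    "30 to 59": [f"B{i}" for i in range(500)],
    "60 plus": [f"C{i}" for i in range(300)],
}
sample_size = 50
population_size = sum(len(units) for units in strata.values())

rng = random.Random(1)
stratified_sample = {}
for name, units in strata.items():
    # Proportional allocation: a stratum's share of the sample
    # matches its share of the population.
    k = round(sample_size * len(units) / population_size)
    # Separate simple random sample within the stratum.
    stratified_sample[name] = rng.sample(units, k)

sizes = {name: len(chosen) for name, chosen in stratified_sample.items()}
print(sizes)  # {'under 30': 10, '30 to 59': 25, '60 plus': 15}
```

With less convenient stratum sizes the rounded allocations may not add up exactly to the intended sample size, so one allocation would need adjusting by hand.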
Non-random sampling
(a) Cluster sampling
Sometimes there is a natural sub-grouping of the population and these subgroups are
called clusters. For example, in a population consisting of all children in the country
attending state primary schools, the local education authorities form natural clusters. When
a sample survey is carried out on a population that can be broken into clusters it is often
more convenient to first choose a random sample of clusters and then to sample within
each cluster chosen. Unlike stratified sampling where the strata are as different from each
other as possible, each cluster should be as similar to other clusters as possible.
(b) Quota sampling
Quota sampling is widely used in market research where the population is divided into
groups in terms of age, sex, income level and so on. Then the interviewer is told how
many people to interview within each specified group, but is given no specific instructions
about how to locate them and fulfil the quota. This is the method generally used in street
interview surveys commonly carried out in shopping centers. It is quick to use,
complications are kept to a minimum and, unlike random sampling, any member of the
sample may be replaced by another member with the same characteristics.
2.2 Sample Statistics
When you are trying to find out information about a population it seems sensible to take random
samples and then consider the values obtained from them. It is therefore useful to know how these
sample values are distributed.
2.2.1 The Distribution of the Sample Mean
Imagine carrying out the following procedure:
Take a random sample of $n$ independent observations from a population. Note that from a finite
population, sampling should be with replacement to ensure that the observations are independent.
Calculate the mean of these sample values. This is known as the sample mean.
Now repeat the procedure until you have taken all possible samples of size $n$, calculating the sample
mean of each one.
Form a distribution of all the sample means.
The distribution that would be formed is called the sampling distribution of means.
The mean and variance of the sampling distribution of means
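It is possible to work out the mean and variance of this sampling distribution using
expectation algebra.
Consider a population $X$ in which $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
Take $n$ independent observations $X_1, X_2, \ldots, X_n$ of $X$.
Since $E(X_i) = \mu$ for each observation,
$$E(X_1 + X_2 + \cdots + X_n) = n\mu$$
Since the observations are independent and $\mathrm{Var}(X_i) = \sigma^2$ for each observation,
$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = n\sigma^2$$
The sample mean, $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$, therefore has
$$E(\bar{X}) = \frac{1}{n} \cdot n\mu = \mu$$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}$$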
The standard deviation of the sampling distribution is usually written $\sigma/\sqrt{n}$. This is known as the
standard error of the mean.
The mean of the sampling distribution is the same as the mean of the population. The standard
deviation of the sampling distribution is smaller than that of the population (for $n > 1$) since $\sigma$ has
been divided by $\sqrt{n}$. This implies that the sample means are more closely clustered around $\mu$ than
the population values are. In fact, the larger the sample size, the more clustered they are.
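The procedure of taking all possible samples can be carried out exactly for a very small population. The sketch below assumes a hypothetical population of six equally likely values and a sample size of 2; it uses exact fractions so that the comparison with the population mean and with the population variance divided by the sample size is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical small population
n = 2                            # sample size

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def variance(values):
    values = list(values)
    m = mean(values)
    return mean((v - m) ** 2 for v in values)

mu = mean(Fraction(x) for x in population)
sigma2 = variance(Fraction(x) for x in population)

# Every ordered sample of size n, drawn with replacement, is equally likely.
sample_means = [mean(Fraction(x) for x in s)
                for s in product(population, repeat=n)]

# Compare the sampling distribution with the population values.
print("mean of sample means equals mu:", mean(sample_means) == mu)
print("variance equals sigma^2 / n:   ", variance(sample_means) == sigma2 / n)
```

Changing `n` to 3 or 4 gives the same agreement, with the variance of the sample means shrinking as the sample size grows.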
The following diagrams help to illustrate the shape of the sampling distribution of means
resulting from different sized samples from given populations.
(a) The distribution of $\bar{X}$ when the population of $X$ is normal
From the diagrams, you can see that if samples are taken from a normal population, the
sampling distribution of means is normal for any sample size.
If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ for any sample size $n$.
Example 2.1
At a college the masses of the male students can be modelled by a normal distribution with
mean mass 70 kg and standard deviation 5 kg. Four male students are chosen at random.
Find the probability that their mean mass is less than 65 kg.
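One way to check this kind of calculation is sketched below. It assumes the mean of the four masses is normal with mean 70 and standard deviation $5/\sqrt{4} = 2.5$, so the required probability is the area to the left of $z = -2$.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 5, 4

# Sampling distribution of the mean: N(mu, sigma^2 / n).
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

probability = sample_mean_dist.cdf(65)
print(round(probability, 4))  # 0.0228
```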
Example 2.2
The distribution of the random variable $X$ is . The mean of a random sample of size $n$ drawn from
this distribution is $\bar{X}$. Find the value of $n$, correct to two significant figures, given that
is approximately equal to 0.0005.
(b) The distribution of $\bar{X}$ when $X$ is not normally distributed
The following diagrams illustrate the distribution of $\bar{X}$ for samples of different sizes taken
from a population that is not normally distributed.
Central Limit Theorem
From the diagrams you can see that when samples are taken from a population that is not
normally distributed, the sampling distribution takes on the characteristic normal shape as the
sample size increases. For large $n$ the distribution of the sample mean is approximately normal.
This result is known as the central limit theorem. It is somewhat surprising, since it holds when
the population of $X$ is discrete (as in the binomial and Poisson distributions) and when $X$ is
continuous (as in the uniform distribution).
For samples taken from a non-normal population with mean $\mu$ and variance $\sigma^2$, by the central limit
theorem, $\bar{X}$ is approximately normal and
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
provided that the sample size, $n$, is large ($n \geq 30$, say).
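The central limit theorem can be explored by simulation. The sketch below is an illustration under assumed settings (a uniform population on the interval from 0 to 1, samples of size 30, 20 000 repetitions and a fixed seed), none of which come from the text. It compares the simulated sample means with the theoretical mean and variance, and checks that roughly 95% of them fall within 1.96 standard errors of the mean, as a normal distribution would predict.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(42)
n = 30            # sample size
repeats = 20_000  # number of simulated samples

# Population: uniform on [0, 1], so mu = 1/2 and sigma^2 = 1/12.
mu, sigma2 = 0.5, 1 / 12

sample_means = [fmean(rng.random() for _ in range(n)) for _ in range(repeats)]

print("mean of sample means:    ", round(fmean(sample_means), 3), "theory:", mu)
print("variance of sample means:", round(pvariance(sample_means), 5),
      "theory:", round(sigma2 / n, 5))

# For a normal distribution about 95% of values lie within
# 1.96 standard deviations of the mean.
se = (sigma2 / n) ** 0.5
inside = sum(abs(m - mu) < 1.96 * se for m in sample_means) / repeats
print("share within 1.96 standard errors:", round(inside, 3))
```

Replacing `rng.random()` with draws from a skewed population and reducing `n` shows how the approximation weakens for small samples.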
Example 2.3
Thirty random observations are taken from each of the following distributions and the sample
mean calculated. Find, in each case, the probability that the sample mean exceeds .
(a) $X$ is the number of telephone calls made in an evening to a counselling service, where
(b) $X$ is the number of heads obtained when an unbiased coin is tossed nine times.
(c) $X$ is distributed uniformly throughout the range
Example 2.4
Independent observations are taken from a normal distribution with mean 30 and variance 5.
(a) Find the probability that the average of 10 observations exceeds 30.5.
(b) Find the probability that the average of 40 observations exceeds 30.5.
(c) Find the probability that the average of 100 observations exceeds 30.5.
(d) Find the least value of n such that the probability that the average of n observations
exceeds 30.5 is less than 1%
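The four parts of this example follow the same pattern, so they can be sketched with one helper function. The function name is illustrative. Because the population is normal, the mean of $n$ observations is taken to be exactly normal with mean 30 and variance $5/n$.

```python
from math import sqrt
from statistics import NormalDist

mu, variance, threshold = 30, 5, 30.5

def prob_mean_exceeds(n):
    """P(mean of n observations > threshold), with the mean ~ N(mu, variance / n)."""
    return 1 - NormalDist(mu, sqrt(variance / n)).cdf(threshold)

# Parts (a) to (c): the probability falls as the sample size grows.
for size in (10, 40, 100):
    print(size, round(prob_mean_exceeds(size), 4))

# Part (d): search for the smallest n with probability below 1%.
n = 1
while prob_mean_exceeds(n) >= 0.01:
    n += 1
print("least n:", n)
```

The search in part (d) agrees with solving $0.5\sqrt{n}/\sqrt{5} > 2.326$ algebraically, which gives $n > 108.2$ and hence a least value of 109.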
Example 2.5
Two red balls and 2 white balls are placed in a bag. Balls are drawn one by one, at random
and without replacement. The random variable $X$ is the number of white balls drawn before the
first red ball is drawn.
(a) Show that and find the rest of the probability distribution of $X$.
(b) Find $E(X)$ and show that $\mathrm{Var}(X) = \frac{5}{9}$.
(c) The sample mean for 80 independent observations of $X$ is denoted by $\bar{X}$. Using a suitable
approximation, find .
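The probability distribution in this example can be checked by listing every equally likely order in which the four balls could be drawn. The ball labels and the function name below are illustrative. Exact fractions are used so that the results can be compared directly with hand working.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "W1", "W2"]  # two red, two white (labels are illustrative)

# Each ordering of the four balls is equally likely when drawing
# at random without replacement.
orders = list(permutations(balls))

def whites_before_first_red(order):
    count = 0
    for ball in order:
        if ball.startswith("R"):
            break
        count += 1
    return count

distribution = {}
for order in orders:
    x = whites_before_first_red(order)
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(orders))

expectation = sum(x * p for x, p in distribution.items())
variance = sum(x**2 * p for x, p in distribution.items()) - expectation**2

print(distribution)
print("E(X) =", expectation, " Var(X) =", variance)
```

The enumeration gives probabilities 1/2, 1/3 and 1/6 for 0, 1 and 2 white balls, with mean 2/3 and variance 5/9. By the central limit theorem, the mean of 80 observations is then approximately normal with mean 2/3 and variance $(5/9)/80 = 1/144$.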
2.2.2 The Distribution of the Sample Proportion, P
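Suppose a random sample of $n$ observations is taken from a population in which the
proportion of successes is $p$ and the proportion of failures is $q = 1 - p$. If $X$ is the number of
successes in the sample, then $X$ follows a binomial distribution, i.e. $X \sim B(n, p)$. The random
variable for the proportion of successes in the sample is $P = \frac{X}{n}$.
This can be written $P = \frac{1}{n}X$ where $X \sim B(n, p)$. It is possible to work out the mean and the
variance of $P$ using expectation algebra, as follows:
$$E(P) = \frac{1}{n}E(X) = \frac{1}{n} \cdot np = p$$
$$\mathrm{Var}(P) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{npq}{n^2} = \frac{pq}{n}$$
The distribution of $P$ has mean $p$ and variance $\frac{pq}{n}$.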
When $n$ is large, the distribution of $P$ is approximately normal, and
$$P \sim N\left(p, \frac{pq}{n}\right)$$
The larger the sample size, $n$, the better the approximation.
The distribution of $P$ is known as the sampling distribution of proportions. The standard
deviation of the distribution, $\sqrt{pq/n}$, is known as the standard error of proportion.
Note: When considering the normal approximation to the binomial distribution, a continuity
correction of $\pm\frac{1}{2}$ is needed.
Since $P = \frac{X}{n}$, use a continuity correction of $\pm\frac{1}{2} \div n$, i.e. $\pm\frac{1}{2n}$.
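The mean and variance of the sample proportion can be confirmed exactly for a small case. The sketch below assumes a hypothetical sample size of 12 and a population proportion of 1/4; it builds the binomial probabilities from first principles and compares the results with the population proportion and with the product of the success and failure proportions divided by the sample size.

```python
from fractions import Fraction
from math import comb

n = 12              # hypothetical sample size
p = Fraction(1, 4)  # hypothetical population proportion of successes
q = 1 - p

# Binomial probabilities P(X = x) for x = 0, 1, ..., n.
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

# The sample proportion takes the values x / n with the same probabilities.
mean_prop = sum(Fraction(x, n) * prob for x, prob in pmf.items())
var_prop = sum(Fraction(x, n) ** 2 * prob for x, prob in pmf.items()) - mean_prop**2

print("mean equals p:         ", mean_prop == p)
print("variance equals pq / n:", var_prop == p * q / n)
```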
Example 2.6
It is known that 3% of frozen pies delivered to a canteen are broken. What is the probability
that, on a morning when 500 pies are delivered, 5% or more are broken?
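A sketch of this calculation follows, assuming the normal approximation to the sample proportion with a continuity correction. Since 5% of 500 pies is 25 pies, "5% or more" corresponds to 25 or more broken pies, so the boundary for the proportion moves from 0.05 down to $0.05 - \frac{1}{1000} = 0.049$.

```python
from math import sqrt
from statistics import NormalDist

n, p = 500, 0.03
q = 1 - p

# Sampling distribution of the proportion: approximately N(p, pq / n).
proportion_dist = NormalDist(p, sqrt(p * q / n))

# Continuity correction of 1 / (2n) applied to the boundary 0.05.
boundary = 0.05 - 1 / (2 * n)
probability = 1 - proportion_dist.cdf(boundary)
print(round(probability, 4))
```

The result is about 0.0064, so a delivery with 5% or more broken pies would be unusual if the long-run rate really is 3%.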
Example 2.7
Mr. Hand gained 48% of the votes in the District Council elections.
(a) Find the probability that a poll of 100 randomly selected voters would show over in
favour of Mr. Hand.
(b) Find the corresponding probability if the sample consists of 1000 randomly selected
voters.