
Sampling Distributions and Statistics

The lecture notes cover the concepts of random sampling and sampling distributions, emphasizing the importance of statistical inference when the joint distribution of random variables is unknown. They define random samples and sample statistics, and explain the sampling distribution of the sample mean, including the Central Limit Theorem, which states that as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution. Examples illustrate how to calculate probabilities using these concepts.

Uploaded by

Roman Andrews

Probability and statistics

Lecture notes

Thanos Mergoupis

9.3.25

Part 2.1: Random sampling and sampling distributions
We have seen that the investigation into how random variables relate to each other often focuses on how features of conditional distributions change as the values of the conditioning variable(s) change. If the joint distribution of two random variables is known, we have developed tools to carry out such an investigation. More often, however, we do not know the exact joint distribution of the variables we are interested in. We then have to infer this joint distribution using pieces of information drawn from it. Statistical inference gives us the tools to do this.

We start the study of statistical inference by studying how pieces of information, or samples, from a distribution are related to the distribution they are drawn from. To do this, we initially assume that we know the distribution, or population, from which these samples are drawn.

Random samples

Definition:

Let X1, X2, X3, …, Xn represent independent drawings from the distribution, or population, of the random variable X. The ordered set {X1, X2, X3, …, Xn} is called a random sample of size n on the random variable X.

Note

Every Xi, for i = 1, 2, 3, …, n, is a random variable (r.v.) because its values are determined by the random experiment of drawing a value from the distribution of X. In fact, the likelihood of each value of Xi equals the likelihood of the same value of X, since the draws are from the distribution of X. Therefore the distribution of each Xi is identical to the distribution of X. Moreover, because the random draws are independent of each other, the Xi are distributed independently of each other. Therefore the Xi are independently and identically distributed, or i.i.d. When a specific sample is drawn, we have a realisation of this set of variables.

Because a random sample is an ordered set, we can represent it using vector notation. We use bold to denote vectors, so that $\mathbf{X}$ denotes the random vector (X1, X2, X3, …, Xn) and $\mathbf{x}$ denotes the realisation of this random vector, i.e. (x1, x2, x3, …, xn).

Because by definition the Xi are independently and identically distributed, if they are drawn from the distribution f(x) of a random variable X, their joint distribution $g_n(\mathbf{x})$ is given by:

$$
\begin{aligned}
g_n(\mathbf{x}) = g_n(x_1, x_2, \ldots, x_n) &= f_1(x_1) \cdot f_2(x_2) \cdots f_n(x_n) \\
&= f(x_1) \cdot f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i)
\end{aligned}
$$

The first line uses the fact that the Xi are independently distributed, so that their joint distribution equals the product of the marginal distributions. The second line uses the fact that the Xi are all identically distributed, with their distribution identical to the distribution they are drawn from.

Example

The exponential parametric family is a one-parameter family of distributions. A random variable X follows an exponential distribution with parameter λ if its pdf is given by:

$$
f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}
$$

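Then a random sample of size n from this distribution has joint probability density function given by:

$$
\begin{aligned}
g_n(\mathbf{x}) &= f(x_1) \cdot f(x_2) \cdots f(x_n) = \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n} \\
&= \lambda^n e^{-\lambda x_1 - \lambda x_2 - \cdots - \lambda x_n} = \lambda^n e^{-\lambda (x_1 + x_2 + \cdots + x_n)} = \lambda^n e^{-\lambda \sum_i x_i}
\end{aligned}
$$

for $\mathbf{x}$ with all xi positive, and $g_n(\mathbf{x}) = 0$ otherwise.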
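As a numerical check of this example, the following minimal sketch evaluates the joint pdf in two ways: as the product of the marginal exponential pdfs, and through the closed form λ^n e^{−λ Σ xi}. The function names, the sample values and the choice λ = 0.5 are illustrative assumptions, not part of the notes.

```python
import math

def exp_pdf(x, lam):
    """pdf of the exponential distribution with parameter lam."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def joint_pdf_product(xs, lam):
    """Joint pdf of an i.i.d. sample as the product of the marginal pdfs."""
    return math.prod(exp_pdf(x, lam) for x in xs)

def joint_pdf_closed_form(xs, lam):
    """Closed form lam**n * exp(-lam * sum(xs)), valid when every x_i > 0."""
    if any(x <= 0 for x in xs):
        return 0.0
    return lam ** len(xs) * math.exp(-lam * sum(xs))

# Hypothetical realisation of a sample of size 4, with lam = 0.5 (assumed values).
sample = [0.8, 2.1, 0.3, 1.6]
lam = 0.5
g_product = joint_pdf_product(sample, lam)
g_closed = joint_pdf_closed_form(sample, lam)
print(g_product, g_closed)
```

The two printed values should agree up to floating-point rounding, and both functions return 0 as soon as any xi is not positive.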

Sample statistics

Let T = h(X1, X2, X3, …, Xn) = h($\mathbf{X}$). That is, the values of T are determined by a function h(·) of the random sample. Then T is called a sample statistic.

Because sample statistics are functions of random vectors, they are random variables themselves. Their values are determined by the n independent draws of a random sample.

Examples of sample statistics

Sample Mean

The sample mean is defined as the arithmetic average of a random sample. That is:

$$
\bar{X} = (X_1 + X_2 + X_3 + \cdots + X_n)/n = \frac{1}{n} \sum_{i=1}^{n} X_i
$$

Sample Variance

The sample variance is defined as:

$$
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
$$

Examples of other sample statistics are:

Sample maximum

Xmax = max(X1, X2, X3, …, Xn)

Sample minimum

Xmin = min(X1, X2, X3, …, Xn)

Sample range

Xmax-Xmin

Sample midrange

(Xmax+Xmin)/2
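The statistics listed above can each be evaluated on one realisation of a random sample. The sketch below does this with Python's standard library; the six data values are a hypothetical realisation chosen only for illustration.

```python
import statistics

# Hypothetical realisation x of a random sample of size n = 6 (assumed values).
x = [2.0, 3.5, 1.5, 4.0, 3.0, 4.0]

sample_mean = statistics.mean(x)          # (x1 + ... + xn) / n
sample_variance = statistics.variance(x)  # sum of squared deviations divided by n - 1
sample_max = max(x)
sample_min = min(x)
sample_range = sample_max - sample_min
sample_midrange = (sample_max + sample_min) / 2

print(sample_mean, sample_variance, sample_max, sample_min, sample_range, sample_midrange)
```

A different realisation of the sample would give different values, which is the sense in which each statistic is itself a random variable.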

Sampling distributions

The distribution of a sample statistic is called its sampling distribution. In general, a sampling distribution depends on:

1. The function h(·) that determines the values of the sample statistic.

2. The distribution the sample is drawn from, i.e. the pmf or the pdf f(x).

3. The size n of the random sample.

The derivation of sampling distributions has varying degrees of difficulty. Certain features of sampling distributions, however, can be derived quite easily.

The sampling distribution of the sample mean

The sample mean theorem

Given a random sample of size n, {X1, X2, X3, …, Xn}, from a population with E(X) = μ and V(X) = σ², the sampling distribution of the sample mean has:

$$
E(\bar{X}) = \mu
$$

$$
V(\bar{X}) = \sigma^2 / n
$$

That is, the expected value of the sample mean is equal to the expected value of the population from which the sample was drawn, and the variance of the sample mean is equal to the variance of the population divided by the sample size.
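A short derivation may help to show where these two results come from. It is a sketch that uses only the linearity of expectation and, for the variance, the independence of the Xi (so that the variance of the sum is the sum of the variances):

$$
E(\bar{X}) = E\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu
$$

$$
V(\bar{X}) = V\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{1}{n^2} \sum_{i=1}^{n} V(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}
$$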

Note that the mean of the sampling distribution of the sample mean does not vary with the sample size, but the variance of the sample mean does. As sample size increases, the distribution of the sample mean becomes more concentrated about its mean.
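The effect of the sample size can also be seen in a small simulation. The sketch below draws many random samples from an exponential population and compares the mean and variance of the resulting sample means with μ and σ²/n. The exponential population, λ = 2, the sample sizes, the number of replications and the seed are all assumptions made for illustration.

```python
import random
import statistics

def simulate_sample_means(lam, n, reps, seed=0):
    """Draw `reps` random samples of size n from Exp(lam); return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(lam) for _ in range(n))
        for _ in range(reps)
    ]

lam = 2.0        # population mean 1/lam = 0.5, population variance 1/lam**2 = 0.25
reps = 20_000    # number of simulated samples (assumed)
for n in (5, 25):
    means = simulate_sample_means(lam, n, reps)
    # Columns: n, simulated E(Xbar), simulated V(Xbar), theoretical sigma^2 / n
    print(n, statistics.fmean(means), statistics.variance(means), (1 / lam**2) / n)
```

For both sample sizes the simulated mean should stay close to 0.5, while the simulated variance should shrink roughly in proportion to 1/n.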

When sampling from some distributions, we can obtain more precise results on the sampling distribution of the sample mean.

Sampling from the normal distribution

When sampling, with sample size n, on the r.v. X with X ~ N(μ, σ²), then:

$$
\bar{X} \sim N(\mu, \sigma^2 / n)
$$

The fact that the sample mean has mean μ and variance σ²/n follows directly from the sample mean theorem. The fact that the sample mean is also a normal random variable follows from the property that linear combinations of independent normal random variables are also normally distributed.

Example

In a British city the (natural) logarithm of annual family income before taxes follows a normal distribution with mean 9.680 and variance 9.105. We draw a random sample of 10 families from this population and ask for their family income. What is the probability that the sample mean of log income is greater than log(26,000), the logarithm of an income of £26,000? Note that this concerns the mean of the log incomes, which is not the same as the log of the mean income.

Let Y = log(annual family income before taxes)

Then it is given that Y ~ N(9.680, 9.105)

We have that log(26,000)=10.166

Because the population the sample was drawn from is normal, the sample mean is distributed as:

$$
\bar{Y} \sim N(9.680,\ 9.105/10) \;\Rightarrow\; \bar{Y} \sim N(9.680,\ 0.9105)
$$

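Then, using the standard deviation of the sample mean, √0.9105 = 0.9542:

$$
\begin{aligned}
P(\bar{Y} > 10.166) &= P\left( \frac{\bar{Y} - 9.680}{0.9542} > \frac{10.166 - 9.680}{0.9542} \right) = P(Z > 0.5093) \\
&= 1 - P(Z \le 0.5093) = 1 - 0.695 = 0.305
\end{aligned}
$$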
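The same probability can be computed without normal tables. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and because it uses the unrounded value of log(26,000) the result may differ from the hand calculation in the last decimal place.

```python
import math
from statistics import NormalDist

mu, var, n = 9.680, 9.105, 10     # population mean and variance of log income; sample size
threshold = math.log(26_000)      # natural logarithm, about 10.166

# Sampling distribution of the sample mean: N(mu, var / n).
sample_mean_dist = NormalDist(mu=mu, sigma=math.sqrt(var / n))

z = (threshold - mu) / sample_mean_dist.stdev   # standardised threshold
p = 1 - sample_mean_dist.cdf(threshold)         # P(sample mean > threshold)
print(round(z, 4), round(p, 3))
```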

Clearly it is necessary to know the exact distribution of a sample mean in order to calculate probabilities like the one in this example. As suggested earlier, although the sample mean is a fairly simple function of the random sample, its distribution will vary with the population sampled. The distributions of sample means from some well-known parametric families are known, but in general one would have to work them out. This is true if we want to evaluate probabilities of a sample mean exactly. If, however, we are prepared to accept a small margin of error, then there is a remarkable result that identifies the approximate distribution of any sample mean, regardless of the population sampled, as long as we have a large enough sample. This is the Central Limit Theorem.

Central Limit Theorem

In random sampling of size n on any random variable X with finite E(X) = μ and V(X) = σ², as the size of the random sample increases, the sampling distribution of the sample mean approaches (in some sense we have not defined) a normal distribution. In particular, the standardised sample mean

$$
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
$$

approaches the standard normal distribution N(0,1).

The use of the expression “approaches in some sense” is vague, but it refers to something well defined mathematically. The sense in which “approaches” is used here is called “convergence in distribution”, but we do not need to define this concept here.
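Without defining convergence in distribution, a simulation can give a feel for what the theorem claims. The sketch below standardises sample means drawn from an exponential population, which is far from normal, and compares the fraction falling at or below a few points with the standard normal cdf. The population, λ = 1, n = 50, the number of replications and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean

def standardised_means(lam, n, reps, seed=1):
    """Standardised sample means Z = (Xbar - mu) / (sigma / sqrt(n)) for Exp(lam) samples."""
    rng = random.Random(seed)
    mu = sigma = 1 / lam          # Exp(lam) has mean and standard deviation 1/lam
    return [
        (fmean(rng.expovariate(lam) for _ in range(n)) - mu) / (sigma / math.sqrt(n))
        for _ in range(reps)
    ]

def empirical_cdf(values, z):
    """Fraction of simulated values that are at most z."""
    return sum(v <= z for v in values) / len(values)

zs = standardised_means(lam=1.0, n=50, reps=20_000)
for z in (-1.0, 0.0, 1.0):
    # Columns: z, simulated P(Z <= z), standard normal cdf at z
    print(z, round(empirical_cdf(zs, z), 3), round(NormalDist().cdf(z), 3))
```

The simulated and normal probabilities should be close but not identical: with n = 50 some of the skewness of the exponential population is still visible, and the agreement should improve as n grows.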

Example

Suppose it is known that the average distance a student travels to the University of Bath is 2 miles, with standard deviation 1.2 miles. You survey a random sample of 50 students on the distance they travelled to get to the University. What is the probability that the sample mean will be at most 1.75 miles?

Here we do not know the exact distribution of the population we are sampling from. The only things we know are that it has a mean of 2 miles and a standard deviation of 1.2 miles. But because the sample is reasonably large, we can appeal to the CLT. So:

Let X = distance travelled to the University of Bath.

Then:

$$
\begin{aligned}
P(\bar{X} \le 1.75) &= P\left( \frac{\bar{X} - 2.0}{1.2/\sqrt{50}} \le \frac{1.75 - 2.0}{1.2/\sqrt{50}} \right) = P(Z \le -1.473) \approx \Phi(-1.473) \\
&= 1 - 0.929 = 0.071
\end{aligned}
$$
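This CLT approximation can be reproduced with the standard normal cdf from Python's standard library. The variable names below are illustrative; because the code does not round to table values, its result may differ from 0.071 in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 2.0, 1.2, 50            # population mean, standard deviation, sample size
standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean

z = (1.75 - mu) / standard_error
p = NormalDist().cdf(z)                # CLT approximation to P(Xbar <= 1.75)
print(round(z, 3), round(p, 3))
```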
