Limit Theorems in Statistics Explained

Chapter 5 discusses limit theorems in statistics, focusing on the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). The LLN states that as the number of independent random variables increases, the sample mean converges to the expected value, while the CLT indicates that the distribution of the sample mean approaches a normal distribution as sample size increases. The chapter also includes exercises to apply these concepts in practical scenarios.

Chapter 5 Statistics 2A STA02A2

CHAPTER 5: Limit Theorems

Lecture notes sections                                          Content from corresponding textbook sections

1. Introduction                                                 5.1. Introduction
2. Law of large numbers                                         5.2. Law of large numbers
3. Convergence in distribution and the central limit theorem    5.3. Convergence in distribution and the central limit theorem

1. Introduction
This chapter is concerned with the limiting behaviour of the sum of independent random variables as the number
of the summands becomes large. The results are useful in statistics, since many commonly computed statistical
quantities, such as averages, can be represented as sums.

2. Law of Large Numbers


It is commonly believed that if a fair coin is tossed many times, it will land on heads approximately half
the time, so that the observed proportion of heads is close to the probability 0.5. John Kerrich, a South African
mathematician, tested this belief empirically while detained as a prisoner in World War II. He tossed a coin
10000 times and observed 5067 heads, a proportion of 0.5067.

The law of large numbers (LLN) is the mathematical formulation of this belief and has a central role in probability
and statistics. It states that if you repeat an experiment independently a large number of times and average the
result, what you obtain should be close to the expected value. Therefore, the LLN is a proposition that provides a
set of sufficient conditions for the convergence of the sample mean to a constant. Typically, the constant is the
expected value of the distribution from which the sample has been drawn.
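The coin-tossing idea can be tried out numerically. The following is a minimal sketch, not part of the original notes: it simulates fair-coin tosses with Python's standard `random` module and prints the running proportion of heads. The function name `running_proportion_of_heads` and the seed are illustrative assumptions.

```python
import random


def running_proportion_of_heads(n_tosses, seed=0):
    """Simulate fair-coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        proportions.append(heads / toss)
    return proportions


proportions = running_proportion_of_heads(10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: proportion of heads = {proportions[n - 1]:.4f}")
```

The printed proportions typically wander for small n and settle near 0.5 for large n, which is the behaviour the LLN formalises.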

First, we will define the concepts of closeness and convergence:


➢ Closeness
Let $\{X_n\}$ be a sequence of random variables defined on the sample space $\Omega$. Take a random variable $X$
and a strictly positive number $\varepsilon > 0$. Suppose we consider $X_n$ to be far away from $X$ if $|X_n - X| > \varepsilon$;
then $P(|X_n - X| > \varepsilon)$ is the probability that $X_n$ is far away from $X$.


➢ Convergence
If $\{X_n\}$ converges to $X$, then the probability that $X_n$ is far away from $X$ should become smaller and
smaller as $n$ increases, therefore

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0.$$

There are two main versions of the LLN, namely the weak law of large numbers (WLLN) and the strong law of
large numbers (SLLN). The difference between them is mostly theoretical. There are many different LLNs, but
we will focus on Chebyshev’s WLLN, and briefly discuss the SLLN formulation and how it compares with the WLLN.

2.1 Weak law of large numbers


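Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables with
$E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$ and sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then, for any $\varepsilon > 0$:

$$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0$$

$$\therefore \lim_{n \to \infty} P\left(|\bar{X}_n - \mu| < \varepsilon\right) = 1$$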

In other words, for any $\varepsilon > 0$, however small, the probability that the difference between the sample mean
and the population mean is smaller than $\varepsilon$ forms a sequence of probability values indexed by the sample
size $n$. This sequence converges to 1 as $n \to \infty$. Therefore, the sample mean converges in probability to the
expected value.
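Convergence in probability can be checked by simulation. The sketch below is an illustration only, under assumptions not in the notes: it uses throws of a fair die (expected value 3.5), a tolerance of 0.25, and a Monte Carlo estimate of the probability that the sample mean is far from the expected value. The helper name `prob_far_from_mean` is hypothetical.

```python
import random


def prob_far_from_mean(n, eps, n_reps=1_000, seed=1):
    """Monte Carlo estimate of P(|sample mean - mu| > eps) for n fair-die throws."""
    rng = random.Random(seed)
    mu = 3.5  # expected value of one throw of a fair die
    far = 0
    for _ in range(n_reps):
        sample_mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(sample_mean - mu) > eps:
            far += 1
    return far / n_reps


estimates = {n: prob_far_from_mean(n, eps=0.25) for n in (10, 50, 400)}
for n, estimate in estimates.items():
    print(f"n = {n:>3}: estimated P(|mean - 3.5| > 0.25) = {estimate:.3f}")
```

The estimated probabilities shrink towards 0 as n grows, in line with the WLLN statement. The estimates themselves carry simulation error because only 1000 replications are used.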

Proof/Derivation:
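Since the $X_i$'s are independent and identically distributed, it follows that:

$$E(\bar{X}_n) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{n\mu}{n} = \mu$$

$$\mathrm{Var}(\bar{X}_n) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$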

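The result follows from Chebyshev’s inequality, namely:

$$P\left(|\bar{X}_n - \mu| > \varepsilon\right) \le \frac{\mathrm{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty$$

$$\therefore \lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0$$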
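To see what the Chebyshev bound gives in numbers, here is a small sketch. The fair-die variance 35/12, the tolerance 0.5 and the target level 0.05 are illustrative assumptions, and the function names are hypothetical. It evaluates the bound and the smallest sample size for which the bound drops below a chosen level.

```python
import math


def chebyshev_bound(variance, n, eps):
    """Upper bound variance / (n * eps^2) on P(|sample mean - mu| > eps), capped at 1."""
    return min(1.0, variance / (n * eps ** 2))


def sample_size_for_bound(variance, eps, delta):
    """Smallest n for which the Chebyshev bound is at most delta."""
    return math.ceil(variance / (delta * eps ** 2))


die_variance = 35 / 12  # variance of one throw of a fair die

bound_100 = chebyshev_bound(die_variance, n=100, eps=0.5)
n_needed = sample_size_for_bound(die_variance, eps=0.5, delta=0.05)

print(f"Bound for n = 100, eps = 0.5: {bound_100:.4f}")
print(f"Smallest n with bound <= 0.05: {n_needed}")
```

The bound is usually far from tight. It guarantees convergence to 0 at rate 1/n, which is all the proof needs, but the true probabilities are typically much smaller.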

2.2 Strong law of large numbers


Let $X_i$ be a sequence of independent and identically distributed (i.i.d.) random variables with $E(X_i) = \mu$ and
$\mathrm{Var}(X_i) = \sigma^2$, for $i = 1, 2, \ldots$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then,

$$P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$$

$$\therefore P\left(\lim_{n \to \infty} |\bar{X}_n - \mu| = 0\right) = 1$$

In other words, a sequence of differences between a sample mean of increasing size and the population mean,
vanishes to 0 with probability 1 as n →  . Therefore, the sample mean converges almost surely to the expected
value.

2.3 SLLN vs. WLLN


➢ SLLN refers to the assurance that something does happen, almost surely.
➢ WLLN refers to the assurance that what we want to see, will happen with increasing probability.
➢ SLLN implies WLLN, WLLN does not imply SLLN

3. Convergence in Distribution and the Central Limit Theorem


In applications, we often want to find $P(a < X < b)$ when we do not know the cdf of $X$ precisely. It is sometimes
possible to do this by approximating $F_X$, often using some sort of limiting argument. The most famous limit
theorem in probability theory is the central limit theorem (CLT). The derivation of the CLT depends on the notion
of convergence in distribution and certain properties of the moment-generating function, namely that a
distribution is uniquely determined by its moment-generating function, and that convergence of a
moment-generating function implies convergence in distribution.

Convergence in distribution
Let $X_1, X_2, \ldots$ be a sequence of random variables with cumulative distribution functions $F_1, F_2, \ldots$, and let $X$ be
a random variable with cumulative distribution function $F$. Then $X_n$ converges in distribution to $X$ if:

$$\lim_{n \to \infty} F_n(x) = F(x)$$

at every point at which $F$ is continuous.
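A simple example, not taken from the notes, may help make this definition concrete. Assume $X_n$ is uniform on the grid $\{1/n, 2/n, \ldots, 1\}$, so that $F_n(x) = \lfloor nx \rfloor / n$ on $[0, 1)$, and let $X$ be Uniform(0, 1) with $F(x) = x$. The sketch below measures the largest gap between $F_n$ and $F$ on a grid of points; all function names are hypothetical.

```python
import math


def cdf_discrete_uniform(x, n):
    """CDF of X_n, assumed uniform on the grid {1/n, 2/n, ..., 1}."""
    if x < 0:
        return 0.0
    if x >= 1:
        return 1.0
    return math.floor(n * x) / n


def cdf_uniform(x):
    """CDF of the limiting Uniform(0, 1) random variable."""
    return min(1.0, max(0.0, x))


def max_gap(n, grid_size=1001):
    """Largest |F_n(x) - F(x)| over an evenly spaced grid on [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(cdf_discrete_uniform(x, n) - cdf_uniform(x)) for x in grid)


gaps = {n: max_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(f"n = {n:>4}: largest gap between F_n and F = {gap:.4f}")
```

Since the gap never exceeds 1/n, $F_n(x) \to F(x)$ at every $x$, even though each $X_n$ is discrete and $X$ is continuous.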


Continuity theorem
Let $F_n$ be a sequence of cumulative distribution functions with corresponding moment-generating functions $M_n$.
Let $F$ be a cumulative distribution function with moment-generating function $M$. If $M_n(t) \to M(t)$ for all $t$ in
an open interval containing zero, then $F_n(x) \to F(x)$ at all continuity points of $F$.
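As an illustration of how the continuity theorem is used (a worked sketch added here, not part of the original notes), consider a Poisson random variable $X_\lambda$ with parameter $\lambda$, whose moment-generating function is $\exp\{\lambda(e^{t} - 1)\}$. Standardizing and letting $\lambda \to \infty$ along a sequence gives the moment-generating function of the standard normal distribution:

```latex
\begin{align*}
M_{X_\lambda}(t) &= \exp\{\lambda(e^{t} - 1)\}, \qquad
  Z_\lambda = \frac{X_\lambda - \lambda}{\sqrt{\lambda}} \\
M_{Z_\lambda}(t) &= e^{-t\sqrt{\lambda}}\, M_{X_\lambda}\!\left(\frac{t}{\sqrt{\lambda}}\right)
  = \exp\left\{-t\sqrt{\lambda} + \lambda\left(e^{t/\sqrt{\lambda}} - 1\right)\right\} \\
e^{t/\sqrt{\lambda}} - 1 &= \frac{t}{\sqrt{\lambda}} + \frac{t^{2}}{2\lambda} + O\!\left(\lambda^{-3/2}\right) \\
\log M_{Z_\lambda}(t) &= -t\sqrt{\lambda} + t\sqrt{\lambda} + \frac{t^{2}}{2} + O\!\left(\lambda^{-1/2}\right)
  \;\longrightarrow\; \frac{t^{2}}{2} \quad \text{as } \lambda \to \infty
\end{align*}
```

Since $e^{t^2/2}$ is the moment-generating function of the standard normal distribution, the continuity theorem implies that the cdf of $Z_\lambda$ converges to $\Phi$ at every point.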

Central limit theorem


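Let $X_1, X_2, \ldots$ be a sequence of independent random variables with a mean of $\mu$, a variance of $\sigma^2$, and the common
distribution function $F$ and moment-generating function $M$ defined in a neighbourhood of zero.

Let $S_n = \sum_{i=1}^{n} X_i = n\bar{X}$, since $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. From the mean and variance of the sample mean derived in
Section 2.1 we know that:

• $E(\bar{X}) = \mu \;\Rightarrow\; E(S_n) = E(n\bar{X}) = n\mu$

• $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \;\Rightarrow\; \mathrm{Var}(S_n) = \mathrm{Var}(n\bar{X}) = n^2\,\mathrm{Var}(\bar{X}) = n^2 \cdot \frac{\sigma^2}{n} = n\sigma^2$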

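Note that the standardized form of $\bar{X}$ can be written as the standardized form of the sum (total) $S_n$:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \times \frac{n}{n} = \frac{S_n - n\mu}{\sigma\sqrt{n}}$$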

Let $G$ denote the cumulative distribution function of the standardized form of $\bar{X}$ (or $S_n$). Then, through
convergence in distribution and the continuity theorem, the limit of $G$ is the cumulative distribution function $\Phi$ of
the standard normal distribution, that is, the standardized form of $\bar{X}$ (or $S_n$) has a limiting standard normal
distribution.

$$\lim_{n \to \infty} G(x) = \lim_{n \to \infty} P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le x\right) = \Phi(x)$$

$$\lim_{n \to \infty} G(x) = \lim_{n \to \infty} P\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x)$$

The detailed proof of the conclusion is not for examination purposes.
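Although the proof is not examinable, the statement itself can be checked by simulation. The sketch below is an illustration under assumptions not in the notes: it draws samples from an Exponential(1) distribution (mean 1, variance 1, clearly non-normal), standardizes the sample means, and compares their empirical cdf with the standard normal cdf, computed from `math.erf`. The function names are hypothetical.

```python
import math
import random


def phi(x):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def standardized_means(n, n_reps, seed=2):
    """Standardized sample means of n Exponential(1) draws, repeated n_reps times."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    values = []
    for _ in range(n_reps):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        values.append((sample_mean - mu) / (sigma / math.sqrt(n)))
    return values


def empirical_cdf(values, x):
    """Proportion of simulated values that are at most x."""
    return sum(v <= x for v in values) / len(values)


zs = standardized_means(n=50, n_reps=4_000)
for x in (-1.0, 0.0, 1.0):
    print(f"x = {x:+.1f}: empirical = {empirical_cdf(zs, x):.3f}, Phi(x) = {phi(x):.3f}")
```

With n = 50 the two columns are usually close but not identical. The remaining difference reflects both the skewness of the exponential distribution and simulation error.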


The above theorem implies that $\bar{X}$ and $S_n$ are asymptotically normal, i.e., their distributions approach the normal
distribution as $n$ increases. Therefore, if $n$ is sufficiently large, approximately $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ and $S_n \sim N(n\mu, n\sigma^2)$. The CLT
is used in inferential statistics to derive the sampling distribution of a statistic and allows us to calculate
probabilities concerning the behaviour of such sample statistics. It can also be applied to approximate distributions
such as the Poisson (for large values of λ), the Binomial, and others.

Exercise 1
A harassed father, wishing to keep his son quiet, offers to pay him R100 if the average score of 100 throws of a
fair die exceeds 4. What is the probability that the father will have to pay out?
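One possible way to set up this calculation is sketched below. It is an illustration rather than an official solution. It assumes the CLT approximation without a continuity correction and computes the standard normal cdf from `math.erf`. The variable names are hypothetical.

```python
import math


def phi(x):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


faces = range(1, 7)
mu = sum(faces) / 6                               # expected value of one throw: 3.5
variance = sum((k - mu) ** 2 for k in faces) / 6  # variance of one throw: 35/12

n = 100
standard_error = math.sqrt(variance / n)          # standard deviation of the sample mean
z = (4 - mu) / standard_error                     # standardized threshold
prob_pay = 1 - phi(z)                             # approximate P(sample mean > 4)

print(f"z = {z:.3f}, approximate probability of paying out = {prob_pay:.4f}")
```

Under these assumptions the standardized threshold is close to 2.93, so the approximate probability of a payout is small, roughly 0.002.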


Exercise 2 (Question 13 in the textbook exercises)


A drunkard executes a “random walk” in the following way: Each minute he takes a step north or south, with
probability 0.5 each, and his successive step directions are independent. His step length is 50cm. Use the central
limit theorem to approximate the probability distribution of his location after 1 hour. Where is he most likely to be?
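A sketch of the set-up, offered as one possible approach rather than an official solution: treat each step as +0.5 m (north) or -0.5 m (south) with probability 0.5 each, so that the position after 60 minutes is a sum of 60 i.i.d. steps. Measuring position in metres with north as positive is an assumption made here for illustration.

```python
import math

step_length_m = 0.5   # 50 cm per step
n_steps = 60          # one step per minute for one hour

# Mean and variance of a single step of +0.5 m or -0.5 m, each with probability 0.5.
step_mean = 0.5 * step_length_m + 0.5 * (-step_length_m)
step_variance = 0.5 * step_length_m ** 2 + 0.5 * (-step_length_m) ** 2 - step_mean ** 2

# By the CLT the position is approximately normal with these parameters.
position_mean = n_steps * step_mean
position_variance = n_steps * step_variance
position_sd = math.sqrt(position_variance)

print(f"Approximate distribution: N({position_mean}, {position_variance}) in metres")
print(f"Standard deviation: {position_sd:.3f} m")
```

Because a normal density peaks at its mean, the approximation places the most likely location at the starting point, with a standard deviation of just under 4 m.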


Exercise 3 (Question 26 in the textbook exercises)


Suppose that a basketball player can score on a particular shot with probability 0.3. Use the central limit theorem
to find the approximate distribution of S, the number of successes out of 25 independent shots. Find the
approximate probabilities that S is less than or equal to 5, 7, 9, and 11, and compare these to the exact probabilities.
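The comparison asked for here lends itself to a short computation. The sketch below is one possible approach, not an official solution: it computes the exact Binomial(25, 0.3) probabilities with `math.comb` and the normal approximation with mean np and variance np(1 - p). It also shows the approximation with a continuity correction, which is an addition not mentioned in the exercise.

```python
import math


def phi(x):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binomial_cdf(k, n, p):
    """Exact P(S <= k) for S ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


n, p = 25, 0.3
mean = n * p                        # np = 7.5
sd = math.sqrt(n * p * (1 - p))     # sqrt(np(1 - p))

rows = []
for k in (5, 7, 9, 11):
    exact = binomial_cdf(k, n, p)
    plain = phi((k - mean) / sd)            # CLT approximation
    corrected = phi((k + 0.5 - mean) / sd)  # with continuity correction (assumption)
    rows.append((k, exact, plain, corrected))
    print(f"P(S <= {k:>2}): exact = {exact:.4f}, "
          f"CLT = {plain:.4f}, CLT with correction = {corrected:.4f}")
```

With n = 25 the uncorrected approximation is noticeably rough, while the continuity-corrected values track the exact probabilities more closely.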
