
Week 4

The document discusses Chebyshev's Theorem, which provides a way to estimate the probability of a random variable falling within a certain range around its mean based on its standard deviation. It also covers the Law of Large Numbers and methods for approximating the mean and variance of nonlinear functions using Taylor expansions. Additionally, it includes examples illustrating the application of these concepts in statistical analysis.

Uploaded by

qawsedrf010588

Week 4 : Math 230-02

1. Chebyshef’s Theorem
If a random variable $X$ has a small standard deviation $\sigma_X$, its distribution is concentrated around its mean value $\mu$. The smaller the standard deviation, the more the pdf is concentrated around the mean.
It is often useful to estimate quantitatively the probability that $X$ lies in a symmetric interval around the mean.
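Theorem 1.1 (Chebyshef’s Theorem). Let $X$ be any random variable with mean $\mu$ and standard deviation $\sigma$, and consider the symmetric interval $[\mu - k\sigma, \mu + k\sigma]$ for any number $k > 0$. Then we have
$$P(|X - \mu| < k\sigma) = P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}.$$
The first equality holds because $|X - \mu| < k\sigma$ is the same condition as $\mu - k\sigma < X < \mu + k\sigma$.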
Here is another form of Chebyshef’s theorem.
Corollary 1.2. Let $X$ be any random variable. Then we have
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Proof. Note that
$$1 = P(|X - \mu| < k\sigma) + P(|X - \mu| \ge k\sigma).$$
Therefore, by Theorem 1.1, we have
$$P(|X - \mu| \ge k\sigma) = 1 - P(|X - \mu| < k\sigma) \le 1 - \left(1 - \frac{1}{k^2}\right) = \frac{1}{k^2}.$$
This finishes the proof. □
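As a rough numerical illustration of Theorem 1.1, the following sketch compares the bound $1 - 1/k^2$ with simulated frequencies. The exponential distribution with rate 1 (mean 1, standard deviation 1), the sample size, and the function names are our own choices for illustration and do not come from the notes.

```python
import random

def chebyshev_lower_bound(k):
    """Lower bound 1 - 1/k^2 for P(|X - mu| < k*sigma), valid for k > 0."""
    return 1.0 - 1.0 / k**2

def empirical_within(samples, mu, sigma, k):
    """Fraction of samples strictly within k standard deviations of mu."""
    hits = sum(1 for x in samples if abs(x - mu) < k * sigma)
    return hits / len(samples)

random.seed(0)
# Assumed test distribution: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
samples = [random.expovariate(1.0) for _ in range(100_000)]

for k in (1.5, 2.0, 3.0):
    freq = empirical_within(samples, mu, sigma, k)
    print(f"k={k}: empirical {freq:.4f}, Chebyshev bound {chebyshev_lower_bound(k):.4f}")
```

The simulated frequencies should sit well above the bound, which reflects that Chebyshef’s estimate holds for every distribution and is therefore usually far from sharp for any particular one.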

Example 1.3. Let a random variable $X$ have mean $\mu = 8$ and standard deviation $\sigma = 3$ (or equivalently variance $\sigma^2 = 9$). Give a lower estimate of the probability
$$P(-4 < X < 20).$$
Solution: We first express the interval $[-4, 20]$ in the form
$$[\mu - k\sigma, \mu + k\sigma]$$
by setting
$$\begin{cases} \mu - k\sigma = -4 \\ \mu + k\sigma = 20. \end{cases}$$
Solving this system of equations (actually one equation is enough to determine $k$), we get $k = \frac{12}{\sigma} = \frac{12}{3} = 4$. Therefore the probability is at least $1 - \frac{1}{4^2} = \frac{15}{16}$.
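The computation in this example can be checked with exact fractions. The sketch below is only an illustration: the helper name `chebyshev_interval_bound` is hypothetical. With $\mu = 8$ and $\sigma = 3$, the equation $\mu + k\sigma = 20$ gives $k = 4$.

```python
from fractions import Fraction

def chebyshev_interval_bound(mu, sigma, lo, hi):
    """Return (k, bound) with bound = 1 - 1/k^2 for an interval symmetric about mu."""
    if mu - lo != hi - mu:
        raise ValueError("interval must be symmetric about the mean")
    k = Fraction(hi - mu) / Fraction(sigma)   # solve mu + k*sigma = hi for k
    return k, 1 - 1 / k**2

k, bound = chebyshev_interval_bound(mu=8, sigma=3, lo=-4, hi=20)
print(k, bound)  # 4 15/16
```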

Example 1.4. In a class of 50 students, the midterm exam scores have a mean of
70 points and a standard deviation of 10 points. Your score is 90 points. Assuming
that the scores are symmetrically distributed about the mean, are you among the
top 10 students?

Solution: Let $X$ be the random variable of the midterm exam scores. Then we have
$$\mu = 70, \quad \sigma = 10.$$
To answer the question, we estimate the probability $P(X > 90)$. We note $90 = 70 + 2 \times 10 = \mu + 2\sigma$. Since the two events on the right are disjoint, we have
$$P(|X - \mu| > 2\sigma) = P(X > \mu + 2\sigma) + P(X < \mu - 2\sigma).$$
By the symmetry about the mean $\mu = 70$, we also know
$$P(X > \mu + 2\sigma) = P(X < \mu - 2\sigma)$$
and hence
$$P(X > 90) = P(X > \mu + 2\sigma) = \frac{1}{2} P(|X - \mu| > 2\sigma) \le \frac{1}{2} \times \frac{1}{2^2} = \frac{1}{8}.$$
We compute $50 \times \frac{1}{8} = 6.25$, so at most 6 students scored higher than you, and hence you are among the top 10 students.
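The same arithmetic can be sketched in a few lines. The variable names are ours, and the halving step relies on the symmetry assumption stated in the example.

```python
from fractions import Fraction
import math

n_students, mu, sigma, score = 50, 70, 10, 90

k = Fraction(score - mu, sigma)      # standard deviations above the mean
two_sided = 1 / k**2                 # Chebyshev bound on P(|X - mu| >= k*sigma)
one_sided = two_sided / 2            # halved because of the symmetry assumption
max_above = math.floor(n_students * one_sided)

print(k, one_sided, max_above)  # 2 1/8 6
```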

Remark 1.5. Note that Chebyshef’s inequality is universal in that it applies to any random variable.
Here is an important application of Chebyshef’s theorem.
Theorem 1.6 (The Law of Large Numbers). Let $X_1, \dots, X_n$ be independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Consider the average random variable
$$\bar{X} := \frac{X_1 + \cdots + X_n}{n}.$$
Then for any given $\varepsilon > 0$, we have
$$\lim_{n \to \infty} P(|\bar{X} - \mu| \ge \varepsilon) = 0.$$

This theorem explains what the ‘statistical average’ means. An example of the circumstance appearing in the theorem is as follows: think about your laboratory, where you or your colleagues repeat the same experiment many times. The result of the $i$-th experiment is a random variable $X_i$, and these random variables are independent.

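Proof. Denote by $\bar{\mu}$ and $\bar{\sigma}^2$ the mean and the variance of the average $\bar{X} = (X_1 + \cdots + X_n)/n$ respectively. We compute
$$E(\bar{X}) = E\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n}\big(E(X_1) + \cdots + E(X_n)\big) = \frac{1}{n}\big(\underbrace{\mu + \cdots + \mu}_{n \text{ times}}\big) = \frac{1}{n}(n\mu) = \mu.$$
On the other hand, using the independence of the $X_i$, we compute
$$\bar{\sigma}^2 = \operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n^2}\big(\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\big) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$
from which we obtain $\sigma^2 = n\bar{\sigma}^2$. Now to apply Chebyshef’s theorem, we put $\varepsilon = k\sigma$, so that $k = \varepsilon/\sigma$ does not depend on $n$. We express $\varepsilon$ in terms of $\bar{\sigma}$:
$$\varepsilon = k\sigma = k\sqrt{n\bar{\sigma}^2} = (k\sqrt{n})\,\bar{\sigma}.$$
By Chebyshef’s inequality applied to $\bar{X}$, we obtain
$$P(|\bar{X} - \mu| \ge \varepsilon) = P\big(|\bar{X} - \mu| \ge (k\sqrt{n})\,\bar{\sigma}\big) \le \frac{1}{(k\sqrt{n})^2} = \frac{1}{k^2 n}.$$
This proves
$$\lim_{n \to \infty} P(|\bar{X} - \mu| \ge \varepsilon) \le \lim_{n \to \infty} \frac{1}{k^2 n} = 0.$$
This finishes the proof. □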
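The bound $1/(k^2 n) = \sigma^2/(n\varepsilon^2)$ obtained in the proof can be compared with a simulation. The sketch below is only illustrative: rolls of a fair die (mean 3.5, variance 35/12), the choice $\varepsilon = 0.5$, and the number of trials are assumptions made for this example.

```python
import random

def tail_frequency(n, eps, trials, rng):
    """Estimate P(|mean of n fair-die rolls - 3.5| >= eps) by simulation."""
    mu = 3.5
    count = 0
    for _ in range(trials):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - mu) >= eps:
            count += 1
    return count / trials

rng = random.Random(0)
var = 35 / 12      # variance of a single fair-die roll
eps = 0.5
estimates = {}
for n in (10, 100, 1000):
    estimates[n] = tail_frequency(n, eps, trials=1000, rng=rng)
    bound = var / (n * eps**2)   # sigma^2 / (n * eps^2) = 1 / (k^2 * n)
    print(f"n={n}: simulated {estimates[n]:.3f}, Chebyshev bound {min(1.0, bound):.3f}")
```

For small $n$ the bound exceeds 1 and says nothing, but it decreases like $1/n$, and the simulated frequencies fall toward 0 as $n$ grows, as the theorem predicts.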

2. Approximating the Mean and Variance of a Nonlinear Function g(X, Y)


First consider the linear function $g(x, y) = ax + by + c$. Then we have
$$E(g(X, Y)) = aE(X) + bE(Y) + c.$$
Theorem 2.1. We have
$$\sigma^2_{aX+bY+c} = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_{XY},$$
where $\sigma_{XY}$ denotes the covariance of $X$ and $Y$.
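Corollary 2.2. (1) $\sigma^2_{X+c} = \sigma^2_X$.
(2) If $X$ and $Y$ are independent, then $\sigma^2_{aX+bY} = a^2\sigma_X^2 + b^2\sigma_Y^2$.
(3) If $X_1, \dots, X_n$ are independent, then $\sigma^2_{a_1X_1+\cdots+a_nX_n} = a_1^2\sigma^2_{X_1} + \cdots + a_n^2\sigma^2_{X_n}$.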
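The variance formula for $aX + bY + c$, including the covariance term, can be checked on data, since the same identity holds for the population moments of any paired sample. In the sketch below the simulated data, the constants $a, b, c$, and the helper `pcovariance` are our own choices for illustration.

```python
import math
import random
from statistics import fmean, pvariance

def pcovariance(xs, ys):
    """Population covariance of two equally long samples."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Y is built from X, so the covariance term does not vanish.
ys = [0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

a, b, c = 2.0, -3.0, 5.0
zs = [a * x + b * y + c for x, y in zip(xs, ys)]

lhs = pvariance(zs)
rhs = a**2 * pvariance(xs) + b**2 * pvariance(ys) + 2 * a * b * pcovariance(xs, ys)
print(lhs, rhs)  # the two values agree up to rounding error
```

Dropping the covariance term from `rhs` would give a visibly different number here, which is why the simpler formulas for sums require independent (or at least uncorrelated) variables.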

When $g$ is not a linear function, even one as simple as $g(X, Y) = X/Y$, it is not simple to compute $E(X/Y)$ or $\sigma^2_{X/Y}$ in general, i.e., there is no simple formula as above. We need a way of finding a good approximate value. The most common and easiest way is to use linear approximations based on the Taylor expansion of $g(x, y)$ at the mean center $(\mu_X, \mu_Y)$.
Theorem 2.3 (Taylor’s formula; one variable case). Let $X$ be a random variable and let $c = \mu_X$ be given. Then we have
$$g(x) = g(c) + g_x(c)(x - c) + \frac{1}{2} g_{xx}(c)(x - c)^2 + \text{“higher order terms”}. \tag{2.1}$$
We commonly write the difference as
$$\Delta x = x - c.$$
Corollary 2.4. Let $X$ be a random variable. Then
(1) We have the approximate mean
$$E(g(X)) \approx g(\mu_X) + \frac{1}{2} g_{xx}(\mu_X)\,\sigma_X^2.$$
(2) We have the approximate variance
$$\operatorname{Var}(g(X)) \approx (g_x(\mu_X))^2\,\sigma_X^2.$$

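Proof. For (1), we put $c = \mu_X$, apply $E$ to the equation (2.1), and drop the higher order terms. Then we get
$$\begin{aligned}
E(g(X)) &\approx E\left(g(\mu_X) + g_x(\mu_X)(X - \mu_X) + \frac{1}{2} g_{xx}(\mu_X)(X - \mu_X)^2\right) \\
&= E(g(\mu_X)) + E\big(g_x(\mu_X)(X - \mu_X)\big) + E\left(\frac{1}{2} g_{xx}(\mu_X)(X - \mu_X)^2\right) \\
&= E(g(\mu_X)) + g_x(\mu_X)\,E(X - \mu_X) + \frac{1}{2} g_{xx}(\mu_X)\,E\big((X - \mu_X)^2\big).
\end{aligned}$$
Since $g(\mu_X)$ is a constant, $E(g(\mu_X)) = g(\mu_X)$. We also note $E(X - \mu_X) = 0$ and $E\big((X - \mu_X)^2\big) = \sigma_X^2$, which finishes the proof of (1).
For (2), we note
$$\operatorname{Var}(g(X)) = E\big((g(X) - \mu_{g(X)})^2\big).$$
Then we rewrite (2.1) as
$$g(x) - g(c) = g_x(c)(x - c) + \frac{1}{2} g_{xx}(c)(x - c)^2 + \text{“higher order terms”}.$$
Therefore, dropping the multiples of “higher order terms”, we obtain
$$\begin{aligned}
(g(x) - g(c))^2 &= \left(g_x(c)(x - c) + \frac{1}{2} g_{xx}(c)(x - c)^2 + \text{“higher order terms”}\right)^2 \\
&\approx (g_x(c)(x - c))^2 + 2\,(g_x(c)(x - c))\left(\frac{1}{2} g_{xx}(c)(x - c)^2\right) + \left(\frac{1}{2} g_{xx}(c)(x - c)^2\right)^2.
\end{aligned}$$
Further dropping the terms of order higher than quadratic in $(x - c)$, we get
$$(g(x) - g(c))^2 \approx (g_x(c)(x - c))^2 = (g_x(c))^2 (x - c)^2.$$
Therefore, with $c = \mu_X$,
$$(g(X) - g(\mu_X))^2 \approx (g_x(\mu_X))^2 (X - \mu_X)^2.$$
Combining the above, and using $\mu_{g(X)} \approx g(\mu_X)$ to leading order, we get
$$\sigma^2_{g(X)} = E\big((g(X) - \mu_{g(X)})^2\big) \approx E\big((g_x(\mu_X))^2 (X - \mu_X)^2\big) = (g_x(\mu_X))^2\,E\big((X - \mu_X)^2\big) = (g_x(\mu_X))^2\,\sigma_X^2.$$
This finishes the proof. □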

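Example 2.5. Given the random variable $X$ with mean $\mu_X = 1$ and variance $\sigma_X^2 = \frac{1}{2}$, estimate the mean and the variance of the random variable $g(X) = \frac{1}{X+1}$.
Solution: We have $c = \mu_X = 1$. Then we compute
$$g_x(x) = -\frac{1}{(1+x)^2}, \quad g_{xx}(x) = \frac{2}{(1+x)^3}.$$
Therefore
$$g(1) = \frac{1}{2}, \quad g_x(1) = -\frac{1}{4}, \quad g_{xx}(1) = \frac{1}{4}.$$
This gives rise to
$$E(g(X)) \approx g(1) + \frac{1}{2} g_{xx}(1)\,\sigma_X^2 = \frac{1}{2} + \frac{1}{2} \times \frac{1}{4} \times \frac{1}{2} = \frac{9}{16},$$
$$\sigma^2_{g(X)} \approx (g_x(\mu_X))^2\,\sigma_X^2 = (g_x(1))^2\,\sigma_X^2 = \left(-\frac{1}{4}\right)^2 \times \frac{1}{2} = \frac{1}{32}.$$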
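The two approximation formulas of Corollary 2.4 can be evaluated exactly for $g(x) = 1/(1+x)$ with $\mu_X = 1$ and $\sigma_X^2 = 1/2$. The sketch below hard-codes the derivatives computed by hand; the function names are ours.

```python
from fractions import Fraction

def g(x):
    return 1 / (1 + x)

def g_x(x):
    """First derivative of g."""
    return -1 / (1 + x) ** 2

def g_xx(x):
    """Second derivative of g."""
    return 2 / (1 + x) ** 3

mu = Fraction(1)
var = Fraction(1, 2)

approx_mean = g(mu) + Fraction(1, 2) * g_xx(mu) * var   # g(mu) + (1/2) g''(mu) sigma^2
approx_var = g_x(mu) ** 2 * var                         # (g'(mu))^2 sigma^2

print(approx_mean, approx_var)  # 9/16 1/32
```

The correction term $\frac{1}{2} g_{xx}(\mu_X)\sigma_X^2 = \frac{1}{16}$ is positive here because $g$ is convex near $x = 1$, so the approximate mean is slightly larger than $g(\mu_X) = \frac{1}{2}$.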
For several random variables, we similarly apply Taylor’s formula for multivariable functions $y = h(x_1, \dots, x_k)$.
Theorem 2.6. Let $X_1, \dots, X_k$ be independent random variables, and consider the random variable given by
$$Y = h(X_1, \dots, X_k).$$
Write $\mu_{X_i} = \mu_i$ and $\sigma_{X_i} = \sigma_i$. Then we have
$$E(Y) \approx h(\mu_1, \dots, \mu_k) + \sum_{i=1}^{k} \frac{\sigma_i^2}{2}\, h_{x_i x_i}(\mu_1, \dots, \mu_k),$$
$$\operatorname{Var}(Y) \approx \sum_{i=1}^{k} \big(h_{x_i}(\mu_1, \dots, \mu_k)\big)^2 \sigma_i^2.$$
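When the partial derivatives of $h$ are tedious to compute by hand, they can be replaced by central finite differences. The following is a minimal sketch under that assumption, not a method taken from the notes: the function `delta_method`, the step size, and the test values (the ratio $h(x, y) = x/y$ with means 10 and 5 and variances 1 and 0.25) are all hypothetical choices for illustration.

```python
def delta_method(h, means, variances, step=1e-4):
    """Approximate E(Y) and Var(Y) for Y = h(X_1, ..., X_k) with independent X_i.

    Partial derivatives at the means are estimated by central differences.
    """
    base = h(*means)
    mean_approx, var_approx = base, 0.0
    for i, (m, v) in enumerate(zip(means, variances)):
        d = step * max(1.0, abs(m))
        up = list(means)
        dn = list(means)
        up[i], dn[i] = m + d, m - d
        h_up, h_dn = h(*up), h(*dn)
        first = (h_up - h_dn) / (2 * d)              # estimate of h_{x_i}
        second = (h_up - 2 * base + h_dn) / d**2     # estimate of h_{x_i x_i}
        mean_approx += 0.5 * second * v
        var_approx += first**2 * v
    return mean_approx, var_approx

# Hypothetical inputs: ratio of two independent variables.
mean_y, var_y = delta_method(lambda x, y: x / y, means=[10.0, 5.0], variances=[1.0, 0.25])
print(round(mean_y, 4), round(var_y, 4))  # approximately 2.02 0.08
```

For this ratio the exact derivatives are $h_x = 1/y$, $h_y = -x/y^2$, $h_{xx} = 0$, and $h_{yy} = 2x/y^3$, which give $2 + \frac{1}{2}(0.16)(0.25) = 2.02$ and $0.04 \cdot 1 + 0.16 \cdot 0.25 = 0.08$, in agreement with the numerical estimates.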

The approximation for interdependent random variables becomes more complicated because of the appearance of the mixed terms in the second derivatives, that is, the off-diagonal entries of the Hessian matrix.
Example 2.7. In Newton’s theory of gravity, the force between two planets (or two stars) is given by the formula
$$F = -\frac{G m_1 m_2}{r^2}$$
where $G$ is the gravitational constant, which is universal, i.e., independent of the planets. The question is how to determine this constant. We can estimate $G$ by measuring the masses of many pairs of planets, the distances between them, and the forces between them. Let $M_1$ and $M_2$ be the random variables measuring the masses, $R$ the random variable measuring the distance $r$, and $F$ the one measuring the force. Suppose $E(M_1) = 100$, $E(M_2) = 500$, $\sigma_i^2 = 100$ for $i = 1, 2$, and $E(F) = 200$, $\sigma_F^2 = 5$, and $E(R) =$ 1,0000000, $\sigma_R^2 = 20$. Estimate the constant $G$ and the standard deviation of this estimation.
Solution: From the formula, we derive
$$G = -\frac{F r^2}{m_1 m_2}.$$
If we write $h(x_1, x_2, x_3, x_4) = \frac{x_4 x_3^2}{x_1 x_2}$, then we have $G = -h(m_1, m_2, r, F)$, and hence
$$G \approx E(G) = -E\left(\frac{F R^2}{M_1 M_2}\right) = -E(h(M_1, M_2, R, F)).$$
