0% found this document useful (0 votes)

5 views9 pages

Chernoff and Tail Bounds Explained

mt4

Uploaded by

Ankit Abhinav

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views9 pages

Chernoff and Tail Bounds Explained

mt4

Uploaded by

Ankit Abhinav

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

A Theorist’s Toolkit (CMU 18-859T, Fall 2013)

Lecture 03: Chernoff/Tail Bounds

September 16, 2013
Lecturer: Ryan O’Donnell Scribe: Elara Willett

1 INTRODUCTION
Define H ∼ Binomial(n, 12 ), i.e. H is the number of heads that occur when we
√
flip a coin n
n n
times. Recall that the mean of H is µ = 2 and the standard deviation is σ = 2 . Last time,
we applied the Berry-Esseen Theorem to get
√
n n φ(t) −t2
Pr H ≥ + t ≈ Φ(t) ≤ ≤e 2
2 2 t
where Φ(t) is the probability that the Gaussian distribution is at most t and t ≥ 1. This
tells us that the probability that we exceed the standard deviation decreases very rapidly.
√ h √ i
n
As an example, consider t = 10 ln n. Then we get P r H ≥ 2 + t 2 ≤ e−50 ln n = n150 .
n

However, the error term of our above approximation is ±O( √1n ), which is important in this
example because the error is so much larger than the bound. We will see in the analysis to
come that the bound does, indeed, hold in this example.
In this lecture, we will cover bounds on random variables given varying assumptions.
For a comprehensive reference on probabilistic techniques used in approximation algorithms,
including topics covered in this lecture and more, see Dubhashi and Panconesi’s book [DP+ ].
For another reference, see Mitzenmacher and Upfal’s book [MU05], which is an introduction
to probabilistic techniques used in algorithm analysis, including the topics covered in this
lecture.

2 BOUNDS BASED ONLY ON MEAN & VARIANCE

Say we have a random variable X. Generally speaking, the more we know about X, the
better bounds we can calculate for P [X ≥ t]. So we’ll begin by supposing we know only the
expectation E[X]. Assume X ≥ 0 always and X is not always 0.
Theorem 2.1. Markov Inequality. For all t > 0, P [X ≥ t · E[X]] ≤ 1t .
Proof. WOLOG assume E[X] = 1. (We can multiply X by a constant and we do not change
the probabilities.) For contradiction, suppose P [X ≥ t] > 1t . Then
E[X] ≥ t · P [X ≥ t] + 0 · P [X < t] > 1
which contradicts our assumption that E[X] = 1.

1
Here is an alternative proof, using the intuition from the depicted graph.

Proof. Define f (x) to be 0 for x < t and 1 for x ≥ t, so that P r[X ≥ t] = E[f (X)]. Define
g(x) = xt . Since g(x) ≥ f (x), we get

X 1
P r[X ≥ t] = E[f (X)] ≤ E[g(X)] = E =
t t

where the last equality holds because E[X] = 1.

We can prove another similar fact based only on expectation using a similar argument,
known as a ‘Markov-type’ or ‘average-type’ argument.

Fact 2.2. Suppose E[X] = and 0 ≤ X ≤ 1, then P r[X ≥ 2 ] ≥ 2 .

Proof. For contradiction, suppose P r[X ≥ 2 ] < 2 , then

h i h i
E[X] ≤ 1 · P r X ≥ + · Pr X < <1· + ·1=
2 2 2 2 2
which contradicts our assumption that E[X] = .
Now, we can move on to the scenerio where we know the expectation and the variance
of X. In this case we get the following bounds.

Theorem 2.3. Chebyshev’s Inequality. Let E[X] = µ and stddev[X] = σ 6= 0, then for all
t > 0, P [|X − µ| ≥ tσ] ≤ t12 .

Proof. WOLOG assume µ = 0. (We can subtract µ from the mean without changing the
standard deviation.) Also, WOLOG assume σ = 1. (We can multiply X by σ1 without
changing the probabilites.) Note that these assumptions imply E[X 2 ] = 1. Now, it is
sufficient to show P [|X| ≥ t] ≤ t12 . Well, the event |X| ≥ t is equivalent to the event
X 2 ≥ t2 . Since X 2 is nonnegative and t2 = t2 · E[X 2 ], we can apply Markov’s Inequality to
bound P [X 2 ≥ t2 ], which gives us the desired bound
1
P [|X| ≥ t] = P [X 2 ≥ t2 ] ≤
t2

2
Of course, as in the proof of Markov’s Inequality, we can alternatively prove this theorem
with a picture.

Proof. Again WOLOG assume µ = 0 and σ = 1. Now define f (x) to be 0 for x < t and 1
for x ≥ t and define g(x) = tx2 . As before, since g(x) ≤ f (x),
2
X 1
P r[|X| ≥ t] = E[f (X)] ≤ E[g(X)] = E 2 = 2
t t

3 BACK TO SUMS OF VARIABLES

Let X = X1 + X2 + · · · + Xn , and assume (for now) that the Xi ’s are independent. Given this
additional information about X, what bounds can we prove? In this section we’ll explore
Chernoff Bounds to answer this question.

3.1 Markov and Chebyshev on Sums

To start, let’s again consider H ∼ Binomial(n, 12 ). Recall that we are interested in bounding
n
√
n
√ n
√
n
P r[H ≥ 2 + t 2 ], where t = 10 ln n. We know that µ = 2 and σ = 2 , so we can apply
the inequalities form the previous section. Using only the mean, we can apply Markov’s
Inequality and get
" √ !!#
ln n 1 1
Pr H ≥ µ 1 + O √ ≤ √ =
n 1 + O( √lnnn ) e √1 )
1 + O( n

Using the mean and the standard deviation, we can apply Chebyshev’ Inequality and get
√ 1 1
P r[|X − µ| ≥ 10 ln n · σ] ≤ √ =
(10 ln n) 2 100 ln n
Chebyshev’s bound will still hold if we only have pairwise independence of the Wi ’s. Pairwise
independence means that for all pairs i 6= j, E[Xi Xj ] = E[Xi ]E[Xj ]. The mean will remain
the same as in the independence case because of linearity of independence. To see that the
variance will also remain unchanged, notice that pairwise independence gives us
" n # n n
X X X X X
E ( Xi )2 = E[Xi2 ] + 2 E[Xi Xj ] = E[Xi2 ] + 2 E[Xi ]E[Xj ]
i=1 i=1 i6=j i=1 i6=j

3
n n
Xi )2 ] − (E[ Xi ])2 , and by our calculation
P P
By definition, V ar[X1 + X2 + · · · + Xn ] = E[(
i=1 i=1
above this value is the same as in the case of independent variables, so variance is unchanged.
Since the mean and variance are the same in the cases of independence and pairwise inde-
pendence, Chebychev’s bound is the same for both. Therefore, we see that Chebychev’s
bound is strong in some cases and weak in others.

3.2 Fourth Moment Method

So far we have seen bounds based on the mean, e.g. the first moment, and the variance, e.g.
the second moment, so now we will look at bounds based on higher moments. The graph
below shows how a fourth-degree polynomial could more closely fit our function, f , than a
quadratic polynomial. An odd-degree polynomial could not upper bound f , so we do not
consider the third moment.

Let X = X1 + X2 + · · · + Xn , where the Xi ’s are independent identical random variables

and
+1 w.p. 1/2
Xi =
−1 w.p. 1/2
√ √
We will use the Fourth
√ Moment Method to upper bound P r[|X| ≥ 10 ln n · n].
4 4 2 4
Well, P r[|X| ≥ t n] = P [X ≥ t n ], and X is a nonnegative variable, so we can apply
Markov’s Inequality, namely,
√ E[X 4 ]
P r[|X| ≥ t n] = P [X 4 ≥ t4 n2 ] ≤ 4 2
tn
4
P 4
Now, let’s take a close look at X = Xi . When we multiple the variables out, we get
i
X X X X X
X4 = Xi4 + c Xi2 Xj2 + c0 Xi3 Xj + c00 Xi2 Xj Xk + c000 Xi Xj Xk Xl
i i6=j i6=j i6=j6=k i6=j6=k6=l

for some integers c, c0 , c00 , c000 . Since E[Xi ] = 0 and the variables are independent,
E[Xi Xj Xk Xl ] = E[Xi ]E[Xj ]E[Xk ]E[Xl ] = 0 · 0 · 0 · 0 = 0
Similarly, E[Xi3 Xj ] = E[Xi3 ]E[Xj ] = E[Xi3 ] · 0 = 0, and E[Xi2 Xj Xk ] = 0. Thus,
X X
E[X 4 ] = E[Xi4 ] + c E[Xi2 ]E[Xj2 ]
i i6=j

4
n
4
and if we have to calculate it, c = 2 2
= 3n2 − 3n. Thus,
√ E[X 4 ] 3n2 − 3n 3
P r[|X| ≥ t n] ≤ = ≤
tn2 t4 n2 t4
Furthermore, in our specific example,
√ √

3 1
P r[|X| ≥ 10 ln n · n] ≤ √ =Θ
(10 ln n)4 ln2 n
We could continue in√the same manner, applying the 2s Moment Method to get bounds
of the form P r[|X| ≥ t n] ≤ tC2ss for some constant t. Notice that as s goes to infinity, cs
also goes to infinity. Thus, to see which of these methods gives the best bound, we would
need to optimize over all possible s, geting s(t) for a given t. This would be quite difficult,
but it might give us a slightly better bound.

3.3 Chernoff Method

For this subsection, we assume X is the sum of independent identical variables.
√
Define c = eλ for some λ > 0. Let g(x) = cx /cu for a given u = t n. Looking at the
diagram below, it is clear that g is strictly increasing and g(u) = 1.

This tells us that the event that g ≥ 1 is equivalent to the event that X ≥ u, which
implies P [X ≥ u] = P r[g ≥ 1] = P r[eλX ≥ eλu ]. Since eλX is a nonegative random variable,
we can now apply Markov’s Inequality to get,
E[eλX ]
P [X ≥ u] = P r[eλX ≥ eλu ] ≤
eλu
This bound is a function of λ and is called the moment generating function of X. By
optimizing over λ > 0, we could determine the best bound achievable by this function for
any given u.
In order to use this strategy, we first need to get a function for E[eλX ]. Well,
E[exp(λX)] = E[exp(λX1 + λX2 + · · · + λXn )] = E[exp(λX1 )exp(λX2 ) · · · exp(λXn )]
By independence of the Xi ’s, we get
E[exp(λX1 )exp(λX2 ) · · · exp(λXn )] = E[exp(λX1 )] · E[exp(λX2 )] · · · E[exp(λXn )]

5
and since the Xi ’s and identical, we get

E[exp(λX1 )] · E[exp(λX2 )] · · · E[exp(λXn )] = (E[exp(λX1 )])n

Therefore, E[eλX ] = (E[exp(λX1 )])n . Now, E[exp(λX1 )] = 21 eλ + 21 e−λ , which is known as

the hyperbolic cosine. We can simplify this function by applying the Taylor Expansion for
2 3 2 3
ex . Namely, eλ = (1 + λ + λ2 + λ3! + · · ·) and e−λ = (1 − λ + λ2 − λ3! + · · ·). The odd terms
2 4 6
cancel each other out, so we get E[exp(λX1 )] = (1 + λ2 + λ4! + λ6! + · · ·). By Claim 3.1, this
2
series is upper bounded by the Taylor Expansion of ex for x = λ2 . Thus, we get
2
E[eλX ] eλ n/2 2
P [X ≥ u] ≤ λu
≤ λu
= eλ n/2−λu
e e
Now, we have an upper bound for P [X ≥ u] that we could optimize over λ for a given u.
For example, if we choose λ = nu then we get the bound:
u2
P [X ≥ u] ≤ e− 2n
√ √
Furthermore, for u = 10 ln n · n, we get P [X ≥ u] ≤ e−50 ln n = n150 . Recall from the
Introduction that this is precisely the bound we desired. For this particular random variable
and choice of t, we can deduce from the Berry-Esseen Theorem that this is about the best
bound we can get. In more general cases, this bound is still about the best we can do. See
[AS04] or [Slu77] for more details.
λ2 λ4 λ6 2 2 2
Claim 3.1. Let x = 1 + 2
+ 4!
+ 6!
+ · · · and y = 1 + ( λ2 ) + 12 ( λ2 )2 + 3!1 ( λ2 )2 + · · ·. then
x ≤ y.
∞ ∞ ∞
P (λ2 /2)i P (λ)2i P (λ)2i
Proof. Well, y = i!
= 2i i!
≥ (2i)!
=x, where the inequality follows from
i=0 i=0 i=0

(2i)! 2i(2i − 1) · · · (i + 1) 2i 2i − 1 i+1
≥ = ··· ≥ 2i
i! i(i − 1) · · · 1 i i−1 1

Note: Suppose our Xi ’s were not identical, but had some common structure, such as

1 w.p. pi
Xi =
0 w.p. 1 − pi

As long as the Xi ’s are independent, we could do a similar, yet somewhat more complicated,
analysis and get a good bound.
We will now formally state two variations on the bound we proved, the Hoeffding Bound
and the Chernoff [Link] bounds assume X = X1 + X2 + · · · + Xn with independent
Xi ’s and µ = E[X] = E[X]. These bounds are extremely useful and the Chernoff
i
Bound should be memorized.

6
Theorem 3.2. Hoeffding Bound. Suppose ai ≤ Xi ≤ bi always. Then for all t > 0,
2t2

P r[X ≥ µ + t] ≤ exp − P 2
i (bi − ai )

and the exact same bound holds for P r[X ≥ µ − t].

The Chernoff Bound, below, is a little stronger than the Hoeffding Bound, which is also
sometimes called the Chernoff Bound.
Theorem 3.3. Chernoff Bound. Suppose 0 ≤ Xi ≤ 1. Then for all > 0,
2
2

P r[X ≤ (1 − )µ] ≤ exp − µ , and P r[X ≥ (1 + )µ] ≤ exp − µ
2 2+
The Chernoff Bound is even useful in the case where you only know some range within
which µ falls. Suppose µL ≤ µ ≤ µH , then as you might expect,
2
2

P r[X ≤ (1 − )µL ] ≤ exp − µL , and P r[X ≥ (1 + )µH ] ≤ exp − µH
2 2+

4 RELAXING INDEPENDENCE
So far we have mostly considered only sums of independent variables. In this section we will
consider two special cases where we can prove good bounds about the sums of dependent
variables: negatively associated random variables and martingales.

4.1 Negatively Associated Random Variables

For intuition’s sake, negative association (NA) means that if one variable in a set of negatively
associated random variables is large than the others will be small. The following are examples
of negatively associated variables.

1. Uniformly at random, draw n points on the unit circle as depicted, and let X1 , X2 , ...Xn
be the arc lengths between these points. Notice that if you know that one arc length

is large, then the others must be small.

2. Independently throw n balls into n bins, and let Xi be the number of balls in bin i. If
there are a lot of balls in bin i, then there are few balls in the other bins.

7
3. Let x1 , x2 , ..., xn be any real numbers. Let X1 , ..., Xn be a random permutation of these
numbers. If we know Xi = xj for a high xj , than the other variables cannot take on
xj , so they will probably be comparatively low. Note, this setup can be used to model
sampling without replacement.

We will now formally define negative association.

Definition 4.1. Random variables X1 , X2 , ..., Xn are randomly associated if

E[f (Xi : i ∈ A)g(Xj : j ∈ B)] ≤ E[f (Xi : i ∈ A)]E[g(Xj : j ∈ B)]

for any disjoint sets A, B ⊆ {1, 2, ...n} and any nondecreasing functions, f, g.

This definition implies that E[Xi Xj ] ≤ E[Xi ]E[Xj ], or cov[Xi , Xj ] ≤ 0, but the converse
is not true. In general, Negative association is stronger than this claim about covariance.
See [JDP83] for more details.
By induction, one can show that negative association is enough to give,

E[exp(λX1 )exp(λX2 ) · · · exp(λXn )] ≤ E[exp(λX1 )] · E[exp(λX2 )] · · · E[exp(λXn )]

In our discussion of the Chernoff Method, we only needed independence to get this one fact.
Therefore, the Chernoff Bounds hold for negatively associated variables.
Here are a few interesting facts to ponder about negative association:

• Independence implies negative association.

• Negative association is closed under subsets and independent unions.

• Negative association is closed under sapplying nondecreasing functions to subsets of

variables.

4.2 Martingales
Suppose you bought a lottery ticket with numbers 12 33 5 60. Before the winning numbers
are revealed, you have an expected payoff of about 0, but after the first number is revealed
to be 12, your expected payoff is much higher. Your expected payoff becomes further refined
as more of the numbers are revealed. Finally, the last number is revealed and you know your
payoff exactly. We could model your series of guesses as a martingale.

Definition 4.2. Let X1 , X2 , ..., Xn be discrete random variables. Let f : Rn → R. Define

Yi = E[f (X1 , ..., Xn )|X1 , ..., Xi ]. Then Y0 , ..., Yn is a martingale with respect to X1 , ..., Xn .

So we have Y0 = E[f (X1 , ..., Xn )] = µ, Y1 = E[f (X1 , ..., Xn )|X1 ],..., Yn = f (X1 , ..., Xn ).
Given a property of the Yi ’s, i.e. bounded diffference, we can get a bound on f .

8
Theorem 4.3. Azuma’s Inequality or Method of Bounded Difference. Let X1 , ..., Xn be
independent. If f satisfies

f (X1 , ..., Xn ) − f (X1 , ..., Xi−1 , Xi0 , Xi+1 , ..., Xn ) ≤ ci

then
2t2

P r[f (X1 , ..., Xn ) ≥ µ + t] ≤ exp − P 2
i ci

and the same upper bound holds for P r[f (X1 , ..., Xn ) ≥ µ − t].

For more information on martingales see [McD89].

References
[AS04] N. Alon and J.H. Spencer. The Probabilistic Method, chapter Appendix A.2. Wiley-
Interscience series in discrete mathematics and optimization. Wiley, 2004.

[DP+ ] D Dubhashi, Alessandro Panconesi, et al. Concentration of measure for the analysis
of randomised algorithms.

[JDP83] Kumar Joag-Dev and Frank Proschan. Negative association of random variables
with applications. The Annals of Statistics, 11(1):286–295, 1983.

[McD89] Colin McDiarmid. On the method of bounded differences. Surveys in combina-

torics, 141(1):148–188, 1989.

[MU05] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algo-

rithms and Probabilistic Analysis. Cambridge University Press, 2005.

[Slu77] Eric V Slud. Distribution inequalities for the binomial law. The Annals of Proba-
bility, 5(3):404–412, 1977.

Chernoff Bounds in Random Variables
No ratings yet
Chernoff Bounds in Random Variables
4 pages
Bounds and Inequalities in Probability
No ratings yet
Bounds and Inequalities in Probability
25 pages
Probability Inequalities Overview
No ratings yet
Probability Inequalities Overview
11 pages
Chernoff Bounds for Binomial Distribution
No ratings yet
Chernoff Bounds for Binomial Distribution
11 pages
Understanding Markov's Inequality
No ratings yet
Understanding Markov's Inequality
11 pages
Randomized Algorithms Analysis Essentials
No ratings yet
Randomized Algorithms Analysis Essentials
8 pages
CS203 PartIII Lecture Notes
No ratings yet
CS203 PartIII Lecture Notes
14 pages
Markov and Chebyshev Inequalities
No ratings yet
Markov and Chebyshev Inequalities
25 pages
Subgaussian Random Variables Explained
No ratings yet
Subgaussian Random Variables Explained
37 pages
Overview of Random Graphs & Probabilistic Methods
No ratings yet
Overview of Random Graphs & Probabilistic Methods
24 pages
Union Bound in Probability Theory
No ratings yet
Union Bound in Probability Theory
19 pages
Random Graphs: Inequalities & Estimates
No ratings yet
Random Graphs: Inequalities & Estimates
9 pages
Stochastic Multi-Armed Bandits Overview
No ratings yet
Stochastic Multi-Armed Bandits Overview
15 pages
Hoeffding's Inequality Overview
No ratings yet
Hoeffding's Inequality Overview
8 pages
Gaussian Random Variables and Their PDFs
No ratings yet
Gaussian Random Variables and Their PDFs
38 pages
Markov's Inequality in Machine Learning
No ratings yet
Markov's Inequality in Machine Learning
5 pages
Limit Theorems in Probability Theory
No ratings yet
Limit Theorems in Probability Theory
36 pages
Probability: Concentration Inequalities & WLLN
No ratings yet
Probability: Concentration Inequalities & WLLN
42 pages
UC Berkeley CS 174 Problem Set 3 Solutions
No ratings yet
UC Berkeley CS 174 Problem Set 3 Solutions
7 pages
Chebyshev's Inequality Applications
No ratings yet
Chebyshev's Inequality Applications
6 pages
Understanding Randomised Algorithms
No ratings yet
Understanding Randomised Algorithms
30 pages
Probability and Statistics Concepts Explained
No ratings yet
Probability and Statistics Concepts Explained
6 pages
Randomised Algorithms Mid-Term Report
No ratings yet
Randomised Algorithms Mid-Term Report
19 pages
Fundamental Inequalities in Probability
No ratings yet
Fundamental Inequalities in Probability
6 pages
Probability Bounds in Learning Theory
No ratings yet
Probability Bounds in Learning Theory
14 pages
Concentration Inequalities: Stephane - Boucheron@
No ratings yet
Concentration Inequalities: Stephane - Boucheron@
33 pages
Randomized Algorithms in Intro To Algorithms
No ratings yet
Randomized Algorithms in Intro To Algorithms
9 pages
UMA401 A4 Solutions
No ratings yet
UMA401 A4 Solutions
7 pages
Markov and Chebyshev Inequalities Explained
No ratings yet
Markov and Chebyshev Inequalities Explained
4 pages
Variance in Binomial and Geometric RVs
No ratings yet
Variance in Binomial and Geometric RVs
28 pages
Probability Bounds and Proofs
No ratings yet
Probability Bounds and Proofs
3 pages
Chebyshev's Inequality Explained
No ratings yet
Chebyshev's Inequality Explained
5 pages
Lecture 2
No ratings yet
Lecture 2
7 pages
Multi-Armed Bandits Probability Proofs
No ratings yet
Multi-Armed Bandits Probability Proofs
15 pages
Mathematics of Data Science
No ratings yet
Mathematics of Data Science
12 pages
Probability and Stochastic Processes
No ratings yet
Probability and Stochastic Processes
53 pages
Central Limit Theorem Explained
No ratings yet
Central Limit Theorem Explained
9 pages
Understanding Limit Theorems in Probability
No ratings yet
Understanding Limit Theorems in Probability
19 pages
Chapter 7 8fhjg
No ratings yet
Chapter 7 8fhjg
9 pages
UMA401 A4 Solutions
No ratings yet
UMA401 A4 Solutions
7 pages
04 Chernoff
No ratings yet
04 Chernoff
10 pages
Homework 2 by Chirag Mahapatra
No ratings yet
Homework 2 by Chirag Mahapatra
13 pages
11 Tail Inequalities: 11.1 Markov's Inequality
No ratings yet
11 Tail Inequalities: 11.1 Markov's Inequality
5 pages
Chebyshev's Inequality Explained
No ratings yet
Chebyshev's Inequality Explained
21 pages
ES150 StudyGuide
No ratings yet
ES150 StudyGuide
6 pages
McDiarmid's Inequality Explained
No ratings yet
McDiarmid's Inequality Explained
9 pages
Probability and Stochastic Processes Overview
No ratings yet
Probability and Stochastic Processes Overview
4 pages
Understanding Joint Distribution Concepts
No ratings yet
Understanding Joint Distribution Concepts
12 pages
Machine Learning Risk Minimization Techniques
No ratings yet
Machine Learning Risk Minimization Techniques
7 pages
Statistical Estimation Overview
No ratings yet
Statistical Estimation Overview
19 pages
Moment Generating Functions Explained
No ratings yet
Moment Generating Functions Explained
2 pages
Markov and Cheybshev Inequalities and The Law of Large Numbers-Proofs and Illustrations
No ratings yet
Markov and Cheybshev Inequalities and The Law of Large Numbers-Proofs and Illustrations
18 pages
Limit Theorems and Convergence in Probability
No ratings yet
Limit Theorems and Convergence in Probability
11 pages
Intermediate Statistics: Tail Inequalities
No ratings yet
Intermediate Statistics: Tail Inequalities
7 pages
What Is The Factorial of Hundred - Google Search
No ratings yet
What Is The Factorial of Hundred - Google Search
1 page
Economic Analysis Optimization Exam Guide
No ratings yet
Economic Analysis Optimization Exam Guide
10 pages
Basic Simulation Lab: Amity School of Engineering Amity University - UTTAR PRADESH
No ratings yet
Basic Simulation Lab: Amity School of Engineering Amity University - UTTAR PRADESH
24 pages
LU-Factorization and Elementary Matrices
No ratings yet
LU-Factorization and Elementary Matrices
15 pages
Engineering Mathematics: Fractions & Decimals
No ratings yet
Engineering Mathematics: Fractions & Decimals
18 pages
Understanding Sets in Discrete Mathematics
No ratings yet
Understanding Sets in Discrete Mathematics
63 pages
2023 HKDSE S6 Math Mock Paper 1
No ratings yet
2023 HKDSE S6 Math Mock Paper 1
25 pages
Discrete Random Variables Overview
No ratings yet
Discrete Random Variables Overview
17 pages
100 Polynomial Practice Problems
No ratings yet
100 Polynomial Practice Problems
1 page
Chap 01 Relations and Functions Test
No ratings yet
Chap 01 Relations and Functions Test
3 pages
Scale 0399
No ratings yet
Scale 0399
1 page
Port Management Admission Questions 2019-20
No ratings yet
Port Management Admission Questions 2019-20
12 pages
C Programming Lab Question Bank
No ratings yet
C Programming Lab Question Bank
3 pages
IMU CET Differential Calculus Guide
No ratings yet
IMU CET Differential Calculus Guide
42 pages
Properties and Operations of Radicals
100% (2)
Properties and Operations of Radicals
4 pages
Teaching Strategies in Mathematics Module
75% (4)
Teaching Strategies in Mathematics Module
274 pages
Complex Analysis Exercises and Solutions
No ratings yet
Complex Analysis Exercises and Solutions
3 pages
Class 4 Math Practice Questions
No ratings yet
Class 4 Math Practice Questions
5 pages
JEE Advanced 2016 Part Test 01 Syllabus
No ratings yet
JEE Advanced 2016 Part Test 01 Syllabus
4 pages
Basic Mathematics Assignment 2026
No ratings yet
Basic Mathematics Assignment 2026
4 pages
Peano-Baker Series in ODEs Explained
No ratings yet
Peano-Baker Series in ODEs Explained
5 pages
Math-8-2024-2025-Learning Plan
No ratings yet
Math-8-2024-2025-Learning Plan
6 pages
Functions and Their Properties Analysis
No ratings yet
Functions and Their Properties Analysis
16 pages
Solving Quadratic Inequalities Guide
No ratings yet
Solving Quadratic Inequalities Guide
3 pages
Polar Map Projections in Cartography
No ratings yet
Polar Map Projections in Cartography
15 pages
Compactness in Metric Spaces Explained
No ratings yet
Compactness in Metric Spaces Explained
12 pages
Journals-5
No ratings yet
Journals-5
106 pages
Significant Digits and Rounding Rules
No ratings yet
Significant Digits and Rounding Rules
2 pages

Chernoff and Tail Bounds Explained

Uploaded by

Chernoff and Tail Bounds Explained

Uploaded by

A Theorist’s Toolkit (CMU 18-859T, Fall 2013)

Lecture 03: Chernoff/Tail Bounds

2 BOUNDS BASED ONLY ON MEAN & VARIANCE

where the last equality holds because E[X] = 1.

Fact 2.2. Suppose E[X] =  and 0 ≤ X ≤ 1, then P r[X ≥ 2 ] ≥ 2 .

Proof. For contradiction, suppose P r[X ≥ 2 ] < 2 , then

3 BACK TO SUMS OF VARIABLES

3.1 Markov and Chebyshev on Sums

3.2 Fourth Moment Method

Let X = X1 + X2 + · · · + Xn , where the Xi ’s are independent identical random variables

3.3 Chernoff Method

E[exp(λX1 )] · E[exp(λX2 )] · · · E[exp(λXn )] = (E[exp(λX1 )])n

Therefore, E[eλX ] = (E[exp(λX1 )])n . Now, E[exp(λX1 )] = 21 eλ + 21 e−λ , which is known as

and the exact same bound holds for P r[X ≥ µ − t].

4.1 Negatively Associated Random Variables

is large, then the others must be small.

We will now formally define negative association.

Definition 4.1. Random variables X1 , X2 , ..., Xn are randomly associated if

E[f (Xi : i ∈ A)g(Xj : j ∈ B)] ≤ E[f (Xi : i ∈ A)]E[g(Xj : j ∈ B)]

E[exp(λX1 )exp(λX2 ) · · · exp(λXn )] ≤ E[exp(λX1 )] · E[exp(λX2 )] · · · E[exp(λXn )]

• Independence implies negative association.

• Negative association is closed under subsets and independent unions.

• Negative association is closed under sapplying nondecreasing functions to subsets of

Definition 4.2. Let X1 , X2 , ..., Xn be discrete random variables. Let f : Rn → R. Define

f (X1 , ..., Xn ) − f (X1 , ..., Xi−1 , Xi0 , Xi+1 , ..., Xn ) ≤ ci

For more information on martingales see [McD89].

[McD89] Colin McDiarmid. On the method of bounded differences. Surveys in combina-

[MU05] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algo-

You might also like

Fact 2.2. Suppose E[X] = and 0 ≤ X ≤ 1, then P r[X ≥ 2 ] ≥ 2 .

Proof. For contradiction, suppose P r[X ≥ 2 ] < 2 , then