Monte Carlo Methods Overview and Applications
Monte Carlo Methods Overview and Applications
Contents
1 Monte Carlo Methods: A Broad Perspective 4
1.1 Monte Carlo as a computational philosophy . . . . . . . . . . . . . . . . . . . . . 4
1.2 Why randomness is useful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Major classes of Monte Carlo applications . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Monte Carlo methods as computational experiments . . . . . . . . . . . . . . . . 5
1.5 Strengths and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Scope of these notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
7 Conceptual Summary 10
1
10 Monte Carlo Integration and Random Variable Generation 12
10.1 Random variables as computational objects . . . . . . . . . . . . . . . . . . . . . 12
10.2 Random variable generation for Monte Carlo integration . . . . . . . . . . . . . . 13
10.3 The Monte Carlo estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.4 Unbiasedness: correctness on average . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.5 Convergence: consistency of the estimator . . . . . . . . . . . . . . . . . . . . . . 14
10.6 Variance and statistical error: understanding Big-O notation . . . . . . . . . . . 14
10.7 Uniform sampling as a special case . . . . . . . . . . . . . . . . . . . . . . . . . . 15
10.8 Monte Carlo integration as statistical measurement . . . . . . . . . . . . . . . . . 15
10.9 Conceptual takeaway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
13 Probability Foundations 16
13.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
13.2 Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2
21 Example 5: Inverse Transform Sampling (Exponential) 21
21.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
21.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3
1 Monte Carlo Methods: A Broad Perspective
Monte Carlo methods (also known as Monte Carlo experiments or Monte Carlo simulations)
constitute a broad class of computational techniques that rely on repeated random sampling to
obtain numerical results.
The defining feature of Monte Carlo methods is the deliberate use of randomness as a
computational tool. Rather than attempting to solve a problem exactly using deterministic
algorithms, Monte Carlo methods approximate solutions by performing many randomized trials
and analyzing their collective behavior.
The problems addressed by Monte Carlo methods are often deterministic in principle. How-
ever, they may be analytically intractable, computationally expensive, or impractical to solve
using classical deterministic techniques due to factors such as high dimensionality, complex
geometry, nonlinear interactions, or uncertainty in inputs.
The name “Monte Carlo” originates from the Monte Carlo Casino in Monaco. One of the
pioneers of the method, the mathematician Stanislaw Ulam, was inspired by games of chance
while thinking about problems involving randomness and large numbers of possible outcomes
during his work on nuclear physics in the mid-twentieth century.
4
1.4 Monte Carlo methods as computational experiments
Monte Carlo simulations are often implemented as computer experiments. Each run of a Monte
Carlo algorithm can be viewed as one realization of a randomized experiment. The final result
is obtained not from a single run, but from the statistical behavior observed across many
independent repetitions.
This perspective closely parallels experimental science, where repeated measurements are
used to estimate physical quantities and quantify uncertainty.
5
How can uniformly distributed random numbers be transformed into random vari-
ables with a desired probability distribution?
6
3.3 Law of Large Numbers (conceptual role)
The Law of Large Numbers is the theoretical foundation of Monte Carlo methods. It asserts
that sample averages converge to true expectations as the number of samples grows.
There are two main forms:
P
1. Weak Law of Large Numbers: µ̂N − → µ (convergence in probability)
a.s.
2. Strong Law of Large Numbers: µ̂N −−→ µ (almost sure convergence)
The strong form, which asserts that convergence occurs with probability 1, provides the
most reassuring guarantee for Monte Carlo methods.
Then E[I] = π/4. Repeating this experiment N times yields the estimator:
N
1 X
π̂N = 4 · Ii .
N
i=1
This illustrates how geometric quantities can be estimated using randomized sampling. As
N increases, π̂N converges to π.
Why this works: The indicator variable I is 1 when the point lands inside the quarter
circle and 0 otherwise. The proportion of points inside the circle approximates the ratio of
the circle’s area to the square’s area, which equals π/4. Since each point is chosen uniformly
at random, sampling many points and checking their positions gives a reliable estimate of this
ratio. This is the essence of Monte Carlo: random sampling reveals geometric and mathematical
properties through observation.
This demonstrates that probabilities are special cases of expectations. Any probabilistic
question can be formulated as the expectation of an indicator function, making it amenable to
Monte Carlo estimation.
Deeper intuition: A probability is just a long-run average. If you flip a fair coin infinitely
many times and count how many times heads appears, the fraction approaches 1/2. More
generally, the probability of any event is the average value of an indicator function that equals
7
1 when the event occurs. By running the experiment many times and computing this average,
we estimate the true probability. This universal principle—probability as an expectation of an
indicator—connects all Monte Carlo methods.
R = min{k : Sk ≥ T }.
The random variable R represents the number of throws required to reach or exceed T . The
goal is to estimate:
µ = E[R].
Monte Carlo estimation proceeds by repeating the entire experiment many times, recording
realizations R1 , R2 , . . . , RN , and computing:
N
1 X
µ̂N = Ri .
N
i=1
This example illustrates a more complex scenario: we are not averaging simple observations,
but rather the results of entire multi-step experiments. Each trial consists of multiple dice rolls
until a stopping condition is met. This demonstrates the flexibility of Monte Carlo methods to
handle hierarchical and sequential processes.
Why this demonstrates Monte Carlo’s power: This problem is difficult to solve ana-
lytically. The number of throws needed is random and depends on the cumulative sum, which
fluctuates. Instead of deriving a closed-form formula (which is tedious and error-prone), we
simply run the full experiment many times on a computer and average the outcomes. Each
run is a complete realization of the random process. This approach generalizes to vastly more
complex scenarios: financial simulations with correlated assets, particle transport through mat-
ter, weather prediction, and countless others. The computational cost scales simply with the
number of trials, not with the complexity of the system.
8
This can be formalized as:
Xn = Xn−1 + ξn , (1)
where each increment ξn is defined by:
(
1
+1 with probability 2
ξn = 1
(2)
−1 with probability 2
and
1 1
E[ξi ] = (+1) · + (−1) · = 0.
2 2
Thus
E[Xn ] = 0. (4)
On average, the walk has no preferred direction; it does not drift.
Using independence,
n
X
Var(Xn ) = Var(ξi ).
i=1
9
6.5 Significance of E[Xn2 ] and its interpretation
What does E[Xn2 ] mean? This is the mean squared displacement. After n steps, the particle
could be at any position (positive or negative). Squaring the position emphasizes how far away
it is, regardless of direction. The expectation E[Xn2 ] is the average of these squared distances
over all possible random walks.
Why calculate it? This quantity captures how much the walk has spread out. Unlike
the mean position E[Xn ] = 0 (which suggests no motion), the quantity E[Xn2 ] p = n reveals that
√
spreading grows linearly with time. A typical displacement from the origin is E[Xn2 ] = n.
√
Physical significance: This n scaling appears in diffusion processes everywhere. Smoke
spreads through a room, heat diffuses through material, pollutants disperse in water—all follow
this same scaling law. Knowing E[Xn2 ] lets us predict how far the effect has spread after time n.
More mathematically, E[Xn2 ] is related to the variance and controls how concentrated or spread
the distribution is, which directly determines how well we can estimate the true expectation in
Monte Carlo methods.
So, many small symmetric random steps produce an approximately normal distribution at
large scales. This is a concrete example of Gaussian behavior emerging from simple randomness.
\ 2 2
By the Law of Large Numbers, E[Xn ] → E[Xn ] = n as M → ∞.
7 Conceptual Summary
Monte Carlo methods:
• Use randomness as a computational tool
• Replace exact computation with statistical estimation
• Apply to discrete and continuous problems alike
• Naturally handle complexity and uncertainty
Key takeaway: Monte Carlo methods transform difficult deterministic problems into prob-
lems of empirical measurement. These foundations prepare us to study Monte Carlo integration
and sampling methods in greater depth.
10
8 Prelude: Why Monte Carlo?
Many problems in physics, statistics, and applied mathematics reduce to the evaluation of
integrals of the form: Z
I= f (x) p(x) dx, (7)
Ω
where p(x) is a probability density function (PDF) and f (x) is some quantity of interest (an
“observable”).
In principle, such integrals can be evaluated by standard calculus or by classical numerical
methods. However, in practice there are two major obstacles:
• High dimension: When x ∈ Rd with large d (e.g. d = 10, 100, 1000), grid-based methods
require an astronomically large number of points. The required work typically grows like
M d for M grid points per dimension. This is the curse of dimensionality.
• Complex structure: Even in low dimension, f (x) or the domain Ω may have complicated
shape, sharp peaks, or singularities, making classical quadrature inaccurate or unstable.
Monte Carlo offers a different viewpoint:
• Instead of covering the domain by a regular grid, it draws random samples from Ω.
• Instead of summing function values on a fixed pattern, it computes an average over random
samples.
• Instead of error depending exponentially on dimension, the Monte Carlo error typically
decays like N −1/2 , almost independent of d.
A useful physical picture is the following: imagine wanting to know the average temperature
in a room. One approach is to place many thermometers on a regular grid (deterministic).
Another approach is to walk around randomly and measure temperature at random locations,
then average. Monte Carlo is like the second: a randomized measurement process that estimates
an average.
The key advantage emerges in high dimensions. If you have a 100-dimensional domain
and want to place 10 grid points per dimension, a grid-based method needs 10100 evaluations—
impossible! But Monte Carlo does not care about dimension; it uses the same number of samples
regardless. This is why Monte Carlo dominates in high-dimensional problems.
The goal of this prelude is therefore to shift from:
“evaluate a deterministic integral” to “estimate an expectation by sampling”.
This conceptual shift will be formalized in the next section.
11
9.2 Introduction of probability densities
The key conceptual step in Monte Carlo integration is to reinterpret this integral as an expec-
tation. Choose a probability density function p(x) on Ω such that:
Z
p(x) ≥ 0 for all x ∈ Ω, p(x) dx = 1. (9)
Ω
We do not require p(x) to be related to f (x) at this stage; any valid density will do. Now
multiply and divide by p(x) inside the integral:
Z Z
f (x)
I= f (x) dx = p(x) dx. (10)
Ω Ω p(x)
This expression now has the form of an expectation of the random variable f (X)/p(X),
where X is distributed according to p.
Here Ep [·] denotes expectation with respect to the distribution p. This is not an approxi-
mation; it is a purely algebraic rewrite of the original integral.
Why this matters.
• Calculus tells us how to manipulate integrals symbolically.
• Probability theory tells us how to estimate expectations by sampling.
• By turning integrals into expectations, we gain access to the entire toolkit of statistics
and Monte Carlo.
Intuitively, one can think of X as a random point in Ω, chosen according to p(x). The
quantity f (X)/p(X) can then be viewed as a “random measurement” whose average over many
trials equals the integral I. This idea becomes concrete in the next section when we introduce
the Monte Carlo estimator.
12
10.2 Random variable generation for Monte Carlo integration
To transform the expectation identity (11) into a numerical method, we proceed as follows:
1. Fix a density p(x) on Ω for which we know how to generate samples.
2. Generate N independent random variables
X1 , X2 , . . . , XN ∼ p(x).
f (Xi )
Yi = , i = 1, . . . , N. (12)
p(Xi )
Interpretation. Each Yi is one noisy measurement of I. The estimator IˆN is the average
of N independent measurements. This mirrors experimental science: one never trusts a single
reading of an instrument; one repeats the measurement and averages.
E[IˆN ] = I. (14)
This means that if we could repeat the entire Monte Carlo experiment infinitely many times
(each time using fresh random numbers) and then average all resulting estimates, we would
recover the exact integral. Unbiasedness does not guarantee that any single run gives the
correct value, but it does guarantee that there is no systematic tendency to overestimate or
underestimate.
13
10.5 Convergence: consistency of the estimator
The Law of Large Numbers gives a strong statement about individual runs:
a.s.
IˆN −−→ I as N → ∞. (16)
This says that with probability 1, if we fix one infinite sequence of random samples and form
the partial averages IˆN , the sequence will converge to I.
Informally:
• Each IˆN is random and has some error.
• As N grows, these errors get smaller and smaller and do not persist.
• Eventually the estimator stays close to I and does not wander away.
Thus, Monte Carlo integration is consistent: more samples lead to better estimates.
Understanding Big-O notation: When we write error ∼ O(N −1/2 ), we mean the error
scales like N −1/2 in the limit of large N . More precisely, Big-O notation describes how a quantity
depends on some parameter. Here, O(N −1/2 ) means:
• If you quadruple N (multiply by 4), the error shrinks by a factor of about 2 (since 4−1/2 =
1/2).
• If you increase N by a factor of 100, the error decreases by a factor of 10 (since 100−1/2 =
1/10). √
• The error is always proportional to 1/ N , with some constant hidden in the O(·) notation.
Comparison with classical quadrature and dimensionality: Traditional methods like
Simpson’s rule or Gaussian quadrature use a grid : they place evaluation points on a regular
lattice covering the domain. To achieve a given error tolerance in dimension d, you typically
need M points per dimension, requiring a total of M d function evaluations. This is the curse
of dimensionality: exponential growth with dimension.
In contrast, Monte Carlo uses exactly N random samples, regardless of dimension. For high
d (say, 100 or 1000), Monte Carlo is vastly superior: it needs N ≈ 104 or 106 samples, whereas
a grid method would need M 100 or M 1000 evaluations (impossible!). This dimension-robustness
is the central reason Monte Carlo dominates in high-dimensional integration.
This important observation is the central strength of Monte Carlo:
√
• Doubling the number of samples N reduces the error only by a factor of about 2.
• But this N −1/2 rate does not degrade with the dimension d of Ω.
• Grid-based methods require exponentially more points as d increases.
• Monte Carlo requires the same number of samples regardless of d.
This explains why Monte Carlo is so useful for high-dimensional integrals: although conver-
gence is slow per sample, it is steady and dimension-robust, whereas deterministic methods fail
catastrophically in high dimensions.
14
10.7 Uniform sampling as a special case
If Ω = [a, b] and we choose the uniform density
1
p(x) = ,
b−a
then Z b
I= f (x) dx = (b − a)E[f (U )], (20)
a
where U ∼ Uniform(a, b). The estimator becomes
N
1 X
IˆN = (b − a) f (Ui ), (21)
N
i=1
15
12 Motivation for Physical Densities
Many physical processes require sampling from specific PDFs:
• Exponential distribution (radioactive decay, waiting times)
• Maxwell–Boltzmann distribution (thermal motion, molecular velocities)
• Power-law distributions (astrophysics, network formation)
• Boltzmann distribution (statistical mechanics at finite temperature)
In each case, the underlying physics dictates a natural probability distribution. Computing
expectations with respect to these distributions requires efficient sampling methods.
Thus we seek systematic methods to generate samples from arbitrary distributions. This is
the focus of the next sections.
13 Probability Foundations
13.1 Random Variables
A random variable X is a measurable map from a probability space (Ω, F , P ) to R. Its distri-
bution can be described by:
• Probability density function (PDF): pX (x)
• Cumulative distribution function (CDF): FX (x) = P (X ≤ x)
The CDF uniquely determines the distribution and is related to the PDF by:
Z x
FX (x) = pX (t) dt.
−∞
Variance quantifies the spread of f (X) around its mean and is a key measure of statistical
uncertainty in Monte Carlo estimation.
The CDF contains all information about the distribution and is defined for all random
variables (both continuous and discrete).
16
14.1 Properties
The cumulative distribution function has several fundamental properties:
1. F (x) is non-decreasing: if x1 < x2 , then F (x1 ) ≤ F (x2 )
2. Boundary conditions: F (−∞) = 0 and F (∞) = 1
3. If p(x) > 0 for all x in the support, then F (x) is strictly increasing and invertible
4. The derivative is the PDF: dF dx = p(x)
The invertibility property is crucial for inverse transform sampling. When the CDF is strictly
increasing and continuous, a unique inverse function F −1 exists.
15.3 Theorem
Let U ∼ Uniform(0, 1) and define:
X = F −1 (U ), (25)
where F is the CDF of a target distribution. Then X has PDF p(x).
17
This theorem provides a systematic method to transform uniform randomness into arbitrary
distributions.
15.4 Proof
We verify that X = F −1 (U ) has the correct distribution by computing its CDF:
P (X ≤ x) = P (F −1 (U ) ≤ x) (26)
= P (U ≤ F (x)) (27)
= F (x), (28)
which is precisely the CDF of the target distribution. The second equality holds because
F −1 (u) ≤ x if and only if u ≤ F (x), and the last equality uses that U ∼ Uniform(0, 1) so
P (U ≤ u) = u.
u = 1 − e−λx (31)
−λx
e =1−u (32)
−λx = ln(1 − u) (33)
1
x = − ln(1 − u). (34)
λ
Thus, to generate an exponential random variable:
1. Generate U ∼ Uniform(0, 1).
2. Set X = − λ1 ln(1 − U ).
Since 1 − U is also uniform on [0, 1], one often uses X = − λ1 ln U .
18
16.1 Why inverse transform sampling fails
Many important distributions, such as the normal distribution, have CDFs with no simple
closed-form inverse. While numerical inversion is possible, it may be computationally expensive
or unstable.
In multiple dimensions, the notion of a “CDF” becomes more complicated (involving inte-
grals over regions), and inverting it is generally not realistic.
p̃(x)
p(x) = ,
Z
where p̃(x) is known and Z (the normalizing constant) is unknown and hard to compute.
Without Z, we cannot construct the CDF F (x), so inverse transform sampling cannot be applied
directly.
19
Part II: Practical Coding Examples
17.2 Solution
18.2 Solution
20
19.2 Solution
20.2 Solution
21
21.2 Solution
Conclusion
Monte Carlo methods provide a powerful, flexible framework for:
• Estimating expectations and integrals
• Sampling from complex distributions
• Solving high-dimensional problems
• Handling uncertainty and randomness
The combination of rigorous theory and practical coding examples makes Monte Carlo
both intellectually satisfying and computationally practical. The dimension-robustness of the
O(N −1/2 ) error rate is the key advantage over deterministic methods, especially in modern
applications involving 100s or 1000s of dimensions.
22