0% found this document useful (0 votes)
14 views22 pages

Monte Carlo Methods Overview and Applications

Monte Carlo methods are computational techniques that use repeated random sampling to approximate solutions to problems that are often deterministic but analytically intractable. They transform complex deterministic problems into empirical estimation tasks, making them useful for high-dimensional and irregular systems. These methods find applications in numerical integration, probability distribution sampling, and optimization across various fields including science and engineering.

Uploaded by

abhinavjp002
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views22 pages

Monte Carlo Methods Overview and Applications

Monte Carlo methods are computational techniques that use repeated random sampling to approximate solutions to problems that are often deterministic but analytically intractable. They transform complex deterministic problems into empirical estimation tasks, making them useful for high-dimensional and irregular systems. These methods find applications in numerical integration, probability distribution sampling, and optimization across various fields including science and engineering.

Uploaded by

abhinavjp002
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Monte Carlo Methods

Foundations, Philosophy, and Illustrative Examples

Contents
1 Monte Carlo Methods: A Broad Perspective 4
1.1 Monte Carlo as a computational philosophy . . . . . . . . . . . . . . . . . . . . . 4
1.2 Why randomness is useful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Major classes of Monte Carlo applications . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Monte Carlo methods as computational experiments . . . . . . . . . . . . . . . . 5
1.5 Strengths and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Scope of these notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Random Variables as Computational Objects 5


2.1 Uniform random variables as the primitive source . . . . . . . . . . . . . . . . . . 5
2.2 Why computers generate Uniform(0, 1) . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Monte Carlo as Repeated Experiments 6


3.1 Repeated trials and empirical averages . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Expectations as measurable quantities . . . . . . . . . . . . . . . . . . . . . . . . 6
3.3 Law of Large Numbers (conceptual role) . . . . . . . . . . . . . . . . . . . . . . . 7

4 Simple Monte Carlo Experiments 7


4.1 Estimating the value of π . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4.2 Estimating probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

5 Monte Carlo Estimation of Complex Quantities 8


5.1 Stopping-time experiment with dice . . . . . . . . . . . . . . . . . . . . . . . . . 8

6 One-Dimensional Random Walk 8


6.1 Definition and conceptual motivation . . . . . . . . . . . . . . . . . . . . . . . . . 8
6.2 Position after n steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6.3 Expectation: no drift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6.4 Variance: measure of spreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6.5 Significance of E[Xn2 ] and its interpretation . . . . . . . . . . . . . . . . . . . . . 10
6.6 Distribution and Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . 10
6.7 Monte Carlo viewpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

7 Conceptual Summary 10

8 Prelude: Why Monte Carlo? 11

9 From Deterministic Integrals to Expectations 11


9.1 Deterministic integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
9.2 Introduction of probability densities . . . . . . . . . . . . . . . . . . . . . . . . . 12
9.3 Exact reformulation as expectations . . . . . . . . . . . . . . . . . . . . . . . . . 12

1
10 Monte Carlo Integration and Random Variable Generation 12
10.1 Random variables as computational objects . . . . . . . . . . . . . . . . . . . . . 12
10.2 Random variable generation for Monte Carlo integration . . . . . . . . . . . . . . 13
10.3 The Monte Carlo estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.4 Unbiasedness: correctness on average . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.5 Convergence: consistency of the estimator . . . . . . . . . . . . . . . . . . . . . . 14
10.6 Variance and statistical error: understanding Big-O notation . . . . . . . . . . . 14
10.7 Uniform sampling as a special case . . . . . . . . . . . . . . . . . . . . . . . . . . 15
10.8 Monte Carlo integration as statistical measurement . . . . . . . . . . . . . . . . . 15
10.9 Conceptual takeaway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

11 Failure of Uniform Sampling 15

12 Motivation for Physical Densities 16

13 Probability Foundations 16
13.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
13.2 Expectation and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

14 The Cumulative Distribution Function 16


14.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

15 Inverse Transform Sampling 17


15.1 Interpretation: geometric intuition before formalism . . . . . . . . . . . . . . . . 17
15.2 Intuition with a concrete example . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
15.3 Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
15.4 Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
15.5 Example: Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 18
15.6 Physical Meaning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

16 Limitations of Inverse Transform Sampling 18


16.1 Why inverse transform sampling fails . . . . . . . . . . . . . . . . . . . . . . . . . 19
16.2 High-dimensional distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
16.3 Unnormalized densities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
16.4 Motivation for advanced methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

17 Example 1: Estimating π via Buffon’s Method 20


17.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
17.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

18 Example 2: Estimating Rare Event Probabilities 20


18.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
18.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

19 Example 3: Random Walk Simulation 20


19.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
19.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

20 Example 4: Stopping Time Experiment (Dice) 21


20.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
20.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2
21 Example 5: Inverse Transform Sampling (Exponential) 21
21.1 Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
21.2 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

22 Summary of Coding Examples 22

3
1 Monte Carlo Methods: A Broad Perspective
Monte Carlo methods (also known as Monte Carlo experiments or Monte Carlo simulations)
constitute a broad class of computational techniques that rely on repeated random sampling to
obtain numerical results.
The defining feature of Monte Carlo methods is the deliberate use of randomness as a
computational tool. Rather than attempting to solve a problem exactly using deterministic
algorithms, Monte Carlo methods approximate solutions by performing many randomized trials
and analyzing their collective behavior.
The problems addressed by Monte Carlo methods are often deterministic in principle. How-
ever, they may be analytically intractable, computationally expensive, or impractical to solve
using classical deterministic techniques due to factors such as high dimensionality, complex
geometry, nonlinear interactions, or uncertainty in inputs.
The name “Monte Carlo” originates from the Monte Carlo Casino in Monaco. One of the
pioneers of the method, the mathematician Stanislaw Ulam, was inspired by games of chance
while thinking about problems involving randomness and large numbers of possible outcomes
during his work on nuclear physics in the mid-twentieth century.

1.1 Monte Carlo as a computational philosophy


Monte Carlo methods are not a single algorithm but a general computational philosophy. They
replace exhaustive or exact computation with statistical exploration. Instead of evaluating all
possible configurations of a system, Monte Carlo methods sample representative configurations
at random and infer global properties from these samples.
In this sense, Monte Carlo methods transform difficult deterministic problems into problems
of empirical estimation.

1.2 Why randomness is useful


Randomness becomes useful in computation when:
• The number of possible system configurations is extremely large
• The problem involves many degrees of freedom
• The system exhibits complex or irregular behavior
• Deterministic algorithms scale poorly with problem size
By randomly exploring the space of possible outcomes, Monte Carlo methods avoid the
need for systematic enumeration, which often becomes infeasible in large or high-dimensional
problems.

1.3 Major classes of Monte Carlo applications


Monte Carlo methods are widely used across science, engineering, and mathematics. Broadly,
they are applied to three major classes of problems:
1. Numerical integration
2. Generating samples from probability distributions
3. Optimization and search in complex spaces
Beyond these categories, Monte Carlo methods are also used to model systems with signif-
icant uncertainty in their inputs, such as reliability analysis, financial risk assessment, particle
transport, and large-scale physical simulations.

4
1.4 Monte Carlo methods as computational experiments
Monte Carlo simulations are often implemented as computer experiments. Each run of a Monte
Carlo algorithm can be viewed as one realization of a randomized experiment. The final result
is obtained not from a single run, but from the statistical behavior observed across many
independent repetitions.
This perspective closely parallels experimental science, where repeated measurements are
used to estimate physical quantities and quantify uncertainty.

1.5 Strengths and limitations


Monte Carlo methods offer several important advantages:
• Applicability to high-dimensional and complex problems
• Minimal assumptions about smoothness or structure
• Conceptual simplicity and algorithmic flexibility
• Natural handling of uncertainty
At the same time, they have inherent limitations:
• Results are approximate rather than exact
• Convergence can be slow
• Accuracy depends on the number of samples
• Statistical error must be carefully assessed

1.6 Scope of these notes


These notes introduce Monte Carlo methods from a broad and conceptual standpoint, empha-
sizing intuition, examples, and foundational principles. Specific Monte Carlo techniques will
be developed gradually, beginning with simple randomized experiments and progressing toward
more structured Monte Carlo algorithms.

2 Random Variables as Computational Objects


In probability theory, a random variable is an abstract mathematical object defined on a prob-
ability space. In Monte Carlo computation, however, random variables must be realized con-
cretely as numerical values produced by an algorithm.
Thus, Monte Carlo methods require mechanisms to:
• Generate realizations of random variables
• Ensure these realizations follow prescribed probability distributions
• Repeat this process reliably and reproducibly
The process of producing numerical realizations from a probability distribution is known
as random variable generation. This process is the statistical engine that drives all Monte
Carlo algorithms.

2.1 Uniform random variables as the primitive source


At the most fundamental level, Monte Carlo computation assumes access to a source of ran-
domness that produces samples from the uniform distribution on the interval [0, 1].
Let U ∼ Uniform(0, 1). Such a random variable serves as the primitive building block from
which more complex random variables are constructed.
Key idea: In Monte Carlo methods, all randomness ultimately originates from uniform
random variables.
Consequently, the problem of random variable generation can be rephrased as:

5
How can uniformly distributed random numbers be transformed into random vari-
ables with a desired probability distribution?

2.2 Why computers generate Uniform(0, 1)


Modern computers cannot truly generate randomness; instead, they rely on pseudo-random
number generators (PRNGs). These algorithms use deterministic sequences that, when properly
constructed, exhibit statistical properties indistinguishable from true randomness.
The standard basis is the uniform distribution on [0, 1] because:
1. It is the simplest to implement algorithmically
2. It requires minimal computational overhead
3. All other continuous distributions can be derived from it via transformation
4. Discrete distributions follow naturally from uniform samples
Common PRNG algorithms include the Linear Congruential Generator (LCG), the Mersenne
Twister, and more recent cryptographically secure methods. The period (the number of values
before repetition) must be sufficiently long for practical applications.

3 Monte Carlo as Repeated Experiments


A Monte Carlo computation can be viewed as a repeated experiment:
• Each trial produces a random outcome
• Outcomes are governed by prescribed rules
• The quantity of interest is inferred from repeated trials
The central object of interest is typically an average or long-run behavior emerging from
many repetitions.

3.1 Repeated trials and empirical averages


Suppose we perform N independent trials of a random experiment, obtaining outcomes X1 , X2 , . . . , XN .
The empirical average is:
N
1 X
XN = Xi .
N
i=1
This sample mean is an estimator of the true population mean µ = E[X]. The quality
of this estimator improves with N . As N → ∞, the empirical average converges to the true
expectation, a principle formalized by the Law of Large Numbers.

3.2 Expectations as measurable quantities


Let X be a random variable and f (X) a measurable function. Monte Carlo methods aim to
estimate quantities of the form:
E[f (X)].
If X1 , X2 , . . . , XN are independent realizations of X, a natural estimator is:
N
1 X
µ̂N = f (Xi ).
N
i=1

By the Law of Large Numbers,


a.s.
µ̂N −−→ E[f (X)].
This principle underpins all of Monte Carlo computation: expectations can be estimated by
averaging.

6
3.3 Law of Large Numbers (conceptual role)
The Law of Large Numbers is the theoretical foundation of Monte Carlo methods. It asserts
that sample averages converge to true expectations as the number of samples grows.
There are two main forms:
P
1. Weak Law of Large Numbers: µ̂N − → µ (convergence in probability)
a.s.
2. Strong Law of Large Numbers: µ̂N −−→ µ (almost sure convergence)
The strong form, which asserts that convergence occurs with probability 1, provides the
most reassuring guarantee for Monte Carlo methods.

4 Simple Monte Carlo Experiments


4.1 Estimating the value of π
Consider a unit square [0, 1]2 containing a quarter circle of radius 1. Rather than computing
the area analytically, we reformulate the problem probabilistically.
Let (X, Y ) be uniformly distributed in the unit square and define the indicator variable:
(
1, X 2 + Y 2 ≤ 1
I=
0, otherwise

Then E[I] = π/4. Repeating this experiment N times yields the estimator:
N
1 X
π̂N = 4 · Ii .
N
i=1

This illustrates how geometric quantities can be estimated using randomized sampling. As
N increases, π̂N converges to π.
Why this works: The indicator variable I is 1 when the point lands inside the quarter
circle and 0 otherwise. The proportion of points inside the circle approximates the ratio of
the circle’s area to the square’s area, which equals π/4. Since each point is chosen uniformly
at random, sampling many points and checking their positions gives a reliable estimate of this
ratio. This is the essence of Monte Carlo: random sampling reveals geometric and mathematical
properties through observation.

4.2 Estimating probabilities


Consider a random experiment with two outcomes: success or failure. Define a Bernoulli random
variable: (
1, if the event occurs
X=
0, otherwise
Then E[X] = P (event). Monte Carlo estimation proceeds by repeated trials and averaging:
N
1 X
p̂N = Xi .
N
i=1

This demonstrates that probabilities are special cases of expectations. Any probabilistic
question can be formulated as the expectation of an indicator function, making it amenable to
Monte Carlo estimation.
Deeper intuition: A probability is just a long-run average. If you flip a fair coin infinitely
many times and count how many times heads appears, the fraction approaches 1/2. More
generally, the probability of any event is the average value of an indicator function that equals

7
1 when the event occurs. By running the experiment many times and computing this average,
we estimate the true probability. This universal principle—probability as an expectation of an
indicator—connects all Monte Carlo methods.

5 Monte Carlo Estimation of Complex Quantities


5.1 Stopping-time experiment with dice
Consider repeatedly throwing three independent eight-sided dice. Let Yi denote the sum of the
dice on the i-th throw, and define the cumulative sum:
k
X
Sk = Yi .
i=1

Given a threshold T , define the stopping time:

R = min{k : Sk ≥ T }.

The random variable R represents the number of throws required to reach or exceed T . The
goal is to estimate:
µ = E[R].
Monte Carlo estimation proceeds by repeating the entire experiment many times, recording
realizations R1 , R2 , . . . , RN , and computing:
N
1 X
µ̂N = Ri .
N
i=1

This example illustrates a more complex scenario: we are not averaging simple observations,
but rather the results of entire multi-step experiments. Each trial consists of multiple dice rolls
until a stopping condition is met. This demonstrates the flexibility of Monte Carlo methods to
handle hierarchical and sequential processes.
Why this demonstrates Monte Carlo’s power: This problem is difficult to solve ana-
lytically. The number of throws needed is random and depends on the cumulative sum, which
fluctuates. Instead of deriving a closed-form formula (which is tedious and error-prone), we
simply run the full experiment many times on a computer and average the outcomes. Each
run is a complete realization of the random process. This approach generalizes to vastly more
complex scenarios: financial simulations with correlated assets, particle transport through mat-
ter, weather prediction, and countless others. The computational cost scales simply with the
number of trials, not with the complexity of the system.

6 One-Dimensional Random Walk


A one-dimensional random walk is a fundamental model in probability theory and stochastic
processes that describes a sequence of random steps taken in one dimension. This section
develops the theory rigorously and connects it to Monte Carlo ideas.

6.1 Definition and conceptual motivation


Consider a particle that starts at position X0 = 0 on a number line. At each discrete time step
n = 1, 2, 3, . . ., the particle takes a step that is:
• Right: with probability 12
• Left: with probability 21

8
This can be formalized as:
Xn = Xn−1 + ξn , (1)
where each increment ξn is defined by:
(
1
+1 with probability 2
ξn = 1
(2)
−1 with probability 2

The sequence {ξn }∞


n=1 consists of independent and identically distributed (i.i.d.) random
variables.

6.2 Position after n steps


After n steps, the position is:
n
X
Xn = ξi . (3)
i=1

This is a sum of n independent random variables taking values ±1.

6.3 Expectation: no drift


By linearity of expectation:
n
X
E[Xn ] = E[ξi ],
i=1

and
1 1
E[ξi ] = (+1) · + (−1) · = 0.
2 2
Thus
E[Xn ] = 0. (4)
On average, the walk has no preferred direction; it does not drift.

6.4 Variance: measure of spreading


The variance of Xn measures how far the walk typically is from the origin:
n
!
X
Var(Xn ) = Var ξi .
i=1

Using independence,
n
X
Var(Xn ) = Var(ξi ).
i=1

For a single step,


Var(ξi ) = E[ξi2 ] − (E[ξi ])2 = 1 − 0 = 1.
Hence
Var(Xn ) = n, E[Xn2 ] = n. (5)

The typical distance from the origin grows like n, which is called diffusive scaling.

9
6.5 Significance of E[Xn2 ] and its interpretation
What does E[Xn2 ] mean? This is the mean squared displacement. After n steps, the particle
could be at any position (positive or negative). Squaring the position emphasizes how far away
it is, regardless of direction. The expectation E[Xn2 ] is the average of these squared distances
over all possible random walks.
Why calculate it? This quantity captures how much the walk has spread out. Unlike
the mean position E[Xn ] = 0 (which suggests no motion), the quantity E[Xn2 ] p = n reveals that

spreading grows linearly with time. A typical displacement from the origin is E[Xn2 ] = n.

Physical significance: This n scaling appears in diffusion processes everywhere. Smoke
spreads through a room, heat diffuses through material, pollutants disperse in water—all follow
this same scaling law. Knowing E[Xn2 ] lets us predict how far the effect has spread after time n.
More mathematically, E[Xn2 ] is related to the variance and controls how concentrated or spread
the distribution is, which directly determines how well we can estimate the true expectation in
Monte Carlo methods.

6.6 Distribution and Central Limit Theorem


The distribution of Xn is discrete (only integer points), but for large n it becomes well approx-
imated by a Gaussian. The Central Limit Theorem states that
X d
√n −
→ N (0, 1) as n → ∞. (6)
n

So, many small symmetric random steps produce an approximately normal distribution at
large scales. This is a concrete example of Gaussian behavior emerging from simple randomness.

6.7 Monte Carlo viewpoint


To estimate E[Xn2 ] or other quantities when analytical formulas are unavailable or inconvenient,
we can simulate:
1. Generate M independent random walks, each of length n.
(j)
2. For each walk j, record the final position Xn .
3. Form the estimator
M
\ 2] =
1 X (j) 2
E[X n (Xn ) .
M
j=1

\ 2 2
By the Law of Large Numbers, E[Xn ] → E[Xn ] = n as M → ∞.

7 Conceptual Summary
Monte Carlo methods:
• Use randomness as a computational tool
• Replace exact computation with statistical estimation
• Apply to discrete and continuous problems alike
• Naturally handle complexity and uncertainty
Key takeaway: Monte Carlo methods transform difficult deterministic problems into prob-
lems of empirical measurement. These foundations prepare us to study Monte Carlo integration
and sampling methods in greater depth.

10
8 Prelude: Why Monte Carlo?
Many problems in physics, statistics, and applied mathematics reduce to the evaluation of
integrals of the form: Z
I= f (x) p(x) dx, (7)

where p(x) is a probability density function (PDF) and f (x) is some quantity of interest (an
“observable”).
In principle, such integrals can be evaluated by standard calculus or by classical numerical
methods. However, in practice there are two major obstacles:
• High dimension: When x ∈ Rd with large d (e.g. d = 10, 100, 1000), grid-based methods
require an astronomically large number of points. The required work typically grows like
M d for M grid points per dimension. This is the curse of dimensionality.
• Complex structure: Even in low dimension, f (x) or the domain Ω may have complicated
shape, sharp peaks, or singularities, making classical quadrature inaccurate or unstable.
Monte Carlo offers a different viewpoint:
• Instead of covering the domain by a regular grid, it draws random samples from Ω.
• Instead of summing function values on a fixed pattern, it computes an average over random
samples.
• Instead of error depending exponentially on dimension, the Monte Carlo error typically
decays like N −1/2 , almost independent of d.
A useful physical picture is the following: imagine wanting to know the average temperature
in a room. One approach is to place many thermometers on a regular grid (deterministic).
Another approach is to walk around randomly and measure temperature at random locations,
then average. Monte Carlo is like the second: a randomized measurement process that estimates
an average.
The key advantage emerges in high dimensions. If you have a 100-dimensional domain
and want to place 10 grid points per dimension, a grid-based method needs 10100 evaluations—
impossible! But Monte Carlo does not care about dimension; it uses the same number of samples
regardless. This is why Monte Carlo dominates in high-dimensional problems.
The goal of this prelude is therefore to shift from:
“evaluate a deterministic integral” to “estimate an expectation by sampling”.
This conceptual shift will be formalized in the next section.

9 From Deterministic Integrals to Expectations


The examples studied so far illustrate that Monte Carlo methods rely on repeated random
experiments and averaging. We now make precise the relationship between ordinary integrals
and expectations of random variables.

9.1 Deterministic integrals


Consider a deterministic integral of the form:
Z
I= f (x) dx, (8)

where Ω ⊂ Rd is a domain and f : Ω → R is integrable. This integral represents a purely
mathematical quantity: the signed area (or volume) under the graph of f (x).
Traditional numerical analysis tries to approximate I by evaluating f at a carefully chosen
set of points (a grid, or special quadrature nodes) and forming a weighted sum. The weight
pattern is fixed in advance and usually depends strongly on the dimension d.

11
9.2 Introduction of probability densities
The key conceptual step in Monte Carlo integration is to reinterpret this integral as an expec-
tation. Choose a probability density function p(x) on Ω such that:
Z
p(x) ≥ 0 for all x ∈ Ω, p(x) dx = 1. (9)

We do not require p(x) to be related to f (x) at this stage; any valid density will do. Now
multiply and divide by p(x) inside the integral:
Z Z
f (x)
I= f (x) dx = p(x) dx. (10)
Ω Ω p(x)

This expression now has the form of an expectation of the random variable f (X)/p(X),
where X is distributed according to p.

9.3 Exact reformulation as expectations


Define a random variable X with density p(x). Then the integral can be written exactly as:
 
f (X)
I = Ep . (11)
p(X)

Here Ep [·] denotes expectation with respect to the distribution p. This is not an approxi-
mation; it is a purely algebraic rewrite of the original integral.
Why this matters.
• Calculus tells us how to manipulate integrals symbolically.
• Probability theory tells us how to estimate expectations by sampling.
• By turning integrals into expectations, we gain access to the entire toolkit of statistics
and Monte Carlo.
Intuitively, one can think of X as a random point in Ω, chosen according to p(x). The
quantity f (X)/p(X) can then be viewed as a “random measurement” whose average over many
trials equals the integral I. This idea becomes concrete in the next section when we introduce
the Monte Carlo estimator.

10 Monte Carlo Integration and Random Variable Generation


10.1 Random variables as computational objects
In rigorous probability theory, a random variable X is defined as a measurable function from a
probability space (Ω, F , P ) to the real line R. This abstraction is needed for general theorems.
In Monte Carlo computation, however, what we actually need are:
• A function in code that, when called, returns a number (a sample).
• A guarantee that the numbers it produces follow a desired distribution.
• The ability to repeat this procedure many times.
Thus, for Monte Carlo, a random variable is effectively a black box sampler : each call gives
a new realization of X.
This is why random variable generation is so central: without a way to generate X with
the right distribution, we cannot hope to approximate E[g(X)] for any function g.

12
10.2 Random variable generation for Monte Carlo integration
To transform the expectation identity (11) into a numerical method, we proceed as follows:
1. Fix a density p(x) on Ω for which we know how to generate samples.
2. Generate N independent random variables

X1 , X2 , . . . , XN ∼ p(x).

3. Combine the samples into an estimator.


Each Xi is one “shot” or “experiment”—a single random point in Ω drawn according to p.
Monte Carlo integration is then simply the process of repeating this experiment many times
and averaging the outcomes in a careful way.

10.3 The Monte Carlo estimator


Given X1 , . . . , XN ∼ p, define

f (Xi )
Yi = , i = 1, . . . , N. (12)
p(Xi )

Each Yi is a random variable. Its expectation satisfies


 
f (X)
E[Yi ] = Ep = I.
p(X)

The Monte Carlo estimator of I is the sample mean of the Yi :


N N
1 X 1 X f (Xi )
IˆN = Yi = . (13)
N N p(Xi )
i=1 i=1

Interpretation. Each Yi is one noisy measurement of I. The estimator IˆN is the average
of N independent measurements. This mirrors experimental science: one never trusts a single
reading of an instrument; one repeats the measurement and averages.

10.4 Unbiasedness: correctness on average


A key property of IˆN is that it is unbiased:

E[IˆN ] = I. (14)

The proof is straightforward but important:


N N N
" #
1 X 1 X 1 X
E[IˆN ] = E Yi = E[Yi ] = I = I. (15)
N N N
i=1 i=1 i=1

This means that if we could repeat the entire Monte Carlo experiment infinitely many times
(each time using fresh random numbers) and then average all resulting estimates, we would
recover the exact integral. Unbiasedness does not guarantee that any single run gives the
correct value, but it does guarantee that there is no systematic tendency to overestimate or
underestimate.

13
10.5 Convergence: consistency of the estimator
The Law of Large Numbers gives a strong statement about individual runs:
a.s.
IˆN −−→ I as N → ∞. (16)
This says that with probability 1, if we fix one infinite sequence of random samples and form
the partial averages IˆN , the sequence will converge to I.
Informally:
• Each IˆN is random and has some error.
• As N grows, these errors get smaller and smaller and do not persist.
• Eventually the estimator stays close to I and does not wander away.
Thus, Monte Carlo integration is consistent: more samples lead to better estimates.

10.6 Variance and statistical error: understanding Big-O notation


To understand how quickly IˆN converges, we examine its variance. Using independence,
N N
!
1 X 1 X
Var(IˆN ) = Var Yi = 2 Var(Yi ) (17)
N N
i=1 i=1
 
1 1 f (X)
= Var(Y1 ) = Var . (18)
N N p(X)
Therefore, the standard deviation (a typical size of the error) behaves like
s  
1 f (X)
q
ˆ
error scale ≈ Var(IN ) = √ Var . (19)
N p(X)

Understanding Big-O notation: When we write error ∼ O(N −1/2 ), we mean the error
scales like N −1/2 in the limit of large N . More precisely, Big-O notation describes how a quantity
depends on some parameter. Here, O(N −1/2 ) means:
• If you quadruple N (multiply by 4), the error shrinks by a factor of about 2 (since 4−1/2 =
1/2).
• If you increase N by a factor of 100, the error decreases by a factor of 10 (since 100−1/2 =
1/10). √
• The error is always proportional to 1/ N , with some constant hidden in the O(·) notation.
Comparison with classical quadrature and dimensionality: Traditional methods like
Simpson’s rule or Gaussian quadrature use a grid : they place evaluation points on a regular
lattice covering the domain. To achieve a given error tolerance in dimension d, you typically
need M points per dimension, requiring a total of M d function evaluations. This is the curse
of dimensionality: exponential growth with dimension.
In contrast, Monte Carlo uses exactly N random samples, regardless of dimension. For high
d (say, 100 or 1000), Monte Carlo is vastly superior: it needs N ≈ 104 or 106 samples, whereas
a grid method would need M 100 or M 1000 evaluations (impossible!). This dimension-robustness
is the central reason Monte Carlo dominates in high-dimensional integration.
This important observation is the central strength of Monte Carlo:

• Doubling the number of samples N reduces the error only by a factor of about 2.
• But this N −1/2 rate does not degrade with the dimension d of Ω.
• Grid-based methods require exponentially more points as d increases.
• Monte Carlo requires the same number of samples regardless of d.
This explains why Monte Carlo is so useful for high-dimensional integrals: although conver-
gence is slow per sample, it is steady and dimension-robust, whereas deterministic methods fail
catastrophically in high dimensions.

14
10.7 Uniform sampling as a special case
If Ω = [a, b] and we choose the uniform density
1
p(x) = ,
b−a
then Z b
I= f (x) dx = (b − a)E[f (U )], (20)
a
where U ∼ Uniform(a, b). The estimator becomes
N
1 X
IˆN = (b − a) f (Ui ), (21)
N
i=1

which is the standard “random points in an interval” method.


Here the weighting factor 1/p(x) is constant, so each sample contributes equally. In more
general choices of p(x), the factors 1/p(Xi ) reweight the samples, emphasizing regions where
p(x) is small. Choosing p(x) intelligently to reduce variance is the principle behind importance
sampling.

10.8 Monte Carlo integration as statistical measurement


Monte Carlo integration can be viewed as a thought experiment in measurement:
• The true integral I is like an unknown physical constant.
• Each random sample Yi is one noisy measurement of I.
• The Monte Carlo estimate IˆN is the experimental mean of N measurements.
• The variance of Yi plays the role of √
experimental noise.
• The error bars on IˆN shrink like 1/ N as more measurements are taken.
This picture is helpful because it connects the abstract analysis to familiar ideas from lab-
oratory science and data analysis.

10.9 Conceptual takeaway


Monte Carlo integration rests on three interconnected ideas:
1. Rewriting integrals as expectations.
2. Estimating expectations by averaging independent samples.
3. Using random variable generators to produce those samples in practice.
Understanding all three components is essential for designing effective Monte Carlo algo-
rithms.

11 Failure of Uniform Sampling


Uniform sampling becomes inefficient when:
• The integrand is sharply peaked
• The target distribution is highly non-uniform
• Rare events dominate the expectation
When f (x) has a sharply localized peak, most uniform samples contribute negligibly to the
integral, while a few samples near the peak dominate. This leads to high variance and slow
convergence.
Consider estimating a rare event: if the probability of occurrence is 10−6 , uniform sampling
requires roughly one million trials to observe a single occurrence. The variance of the estimator
remains large until the sample size is enormous.

15
12 Motivation for Physical Densities
Many physical processes require sampling from specific PDFs:
• Exponential distribution (radioactive decay, waiting times)
• Maxwell–Boltzmann distribution (thermal motion, molecular velocities)
• Power-law distributions (astrophysics, network formation)
• Boltzmann distribution (statistical mechanics at finite temperature)
In each case, the underlying physics dictates a natural probability distribution. Computing
expectations with respect to these distributions requires efficient sampling methods.
Thus we seek systematic methods to generate samples from arbitrary distributions. This is
the focus of the next sections.

13 Probability Foundations
13.1 Random Variables
A random variable X is a measurable map from a probability space (Ω, F , P ) to R. Its distri-
bution can be described by:
• Probability density function (PDF): pX (x)
• Cumulative distribution function (CDF): FX (x) = P (X ≤ x)
The CDF uniquely determines the distribution and is related to the PDF by:
Z x
FX (x) = pX (t) dt.
−∞

13.2 Expectation and Variance


The expectation of a function f (X) is:
Z ∞
E[f (X)] = f (x) pX (x) dx. (22)
−∞

The variance is defined as:

Var(f (X)) = E[f (X)2 ] − E[f (X)]2 . (23)

Variance quantifies the spread of f (X) around its mean and is a key measure of statistical
uncertainty in Monte Carlo estimation.

14 The Cumulative Distribution Function


For a continuous random variable, the CDF is:
Z x
F (x) = p(t) dt. (24)
−∞

The CDF contains all information about the distribution and is defined for all random
variables (both continuous and discrete).

16
14.1 Properties
The cumulative distribution function has several fundamental properties:
1. F (x) is non-decreasing: if x1 < x2 , then F (x1 ) ≤ F (x2 )
2. Boundary conditions: F (−∞) = 0 and F (∞) = 1
3. If p(x) > 0 for all x in the support, then F (x) is strictly increasing and invertible
4. The derivative is the PDF: dF dx = p(x)
The invertibility property is crucial for inverse transform sampling. When the CDF is strictly
increasing and continuous, a unique inverse function F −1 exists.

15 Inverse Transform Sampling


15.1 Interpretation: geometric intuition before formalism
The big picture: We want to generate random samples from a target probability distribution
with density p(x). We have a perfect source of uniform random numbers U ∼ Uniform(0, 1)
(from a computer’s PRNG). Can we transform the uniform samples into samples from p(x)?
The key insight: The cumulative distribution function (CDF) F (x) is a monotone increas-
ing function that maps R to [0, 1]. This is exactly the right shape to bridge uniform randomness
and the target distribution.
Geometric picture: Imagine plotting the CDF F (x) as a curve from bottom-left (−∞, 0)
to top-right (∞, 1). This curve is a monotone ramp. If you:
1. Pick a uniform random point U in the interval [0, 1] (on the y-axis).
2. Draw a horizontal line from U until it hits the curve F (x).
3. Drop vertically down to the x-axis, landing at some point X.
Then X is a random sample from the distribution p(x)!
Why does this work? Points near tall parts of the curve land at x-values where the CDF
rises steeply, which correspond to high-probability regions (high density p(x)). Points near flat
parts of the curve land at low-probability regions. So the procedure naturally concentrates
samples where probability is high.
Mathematical formulation: This geometric procedure is exactly X = F −1 (U ). We invert
the CDF and apply it to a uniform random variable.

15.2 Intuition with a concrete example


Consider the exponential distribution, which describes waiting times in many natural processes.
Its CDF is F (x) = 1 − e−λx . The curve starts at (0, 0) and rises steeply at first, then levels off.
If U = 0.1 (low value), the horizontal line hits the curve early, giving a small X (short
waiting time). If U = 0.9 (high value), the line hits the curve late, giving a large X (long
waiting time).
The steep part at low x means many uniform points U land there, giving many samples at
small X. The flat part at high x means few uniform points land there, giving few samples at
large X.
This matches the exponential density: high probability for small values, decreasing expo-
nentially. The procedure automatically captures the shape of the distribution.

15.3 Theorem
Let U ∼ Uniform(0, 1) and define:
X = F −1 (U ), (25)
where F is the CDF of a target distribution. Then X has PDF p(x).

17
This theorem provides a systematic method to transform uniform randomness into arbitrary
distributions.

15.4 Proof
We verify that X = F −1 (U ) has the correct distribution by computing its CDF:

P (X ≤ x) = P (F −1 (U ) ≤ x) (26)
= P (U ≤ F (x)) (27)
= F (x), (28)

which is precisely the CDF of the target distribution. The second equality holds because
F −1 (u) ≤ x if and only if u ≤ F (x), and the last equality uses that U ∼ Uniform(0, 1) so
P (U ≤ u) = u.

15.5 Example: Exponential Distribution


Consider the exponential distribution:

p(x) = λe−λx , x ≥ 0. (29)

The CDF is:


F (x) = 1 − e−λx . (30)
To find the inverse, solve u = F (x) for x:

u = 1 − e−λx (31)
−λx
e =1−u (32)
−λx = ln(1 − u) (33)
1
x = − ln(1 − u). (34)
λ
Thus, to generate an exponential random variable:
1. Generate U ∼ Uniform(0, 1).
2. Set X = − λ1 ln(1 − U ).
Since 1 − U is also uniform on [0, 1], one often uses X = − λ1 ln U .

15.6 Physical Meaning


The exponential distribution arises naturally in many physical contexts:
• Radioactive decay: waiting time until the next decay event
• Poisson processes: inter-arrival times between events
• Lifetime of systems with constant failure rate
Inverse transform sampling shows concretely how a source of uniform random numbers (from
a PRNG) can be converted into physically meaningful waiting times.

16 Limitations of Inverse Transform Sampling


Inverse transform sampling becomes impractical when:
• The CDF cannot be inverted analytically
• The distribution is high-dimensional
• The PDF is known only up to a normalization constant

18
16.1 Why inverse transform sampling fails
Many important distributions, such as the normal distribution, have CDFs with no simple
closed-form inverse. While numerical inversion is possible, it may be computationally expensive
or unstable.
In multiple dimensions, the notion of a “CDF” becomes more complicated (involving inte-
grals over regions), and inverting it is generally not realistic.

16.2 High-dimensional distributions


In high-dimensional problems, for example:
• Bayesian posterior distributions with many parameters
• Molecular configurations with many particles
• High-dimensional latent variable models
inverse transform sampling is usually not an option, because computing and inverting a full
CDF is infeasible.

16.3 Unnormalized densities


Often we know only an unnormalized density

p̃(x)
p(x) = ,
Z
where p̃(x) is known and Z (the normalizing constant) is unknown and hard to compute.
Without Z, we cannot construct the CDF F (x), so inverse transform sampling cannot be applied
directly.

16.4 Motivation for advanced methods


These limitations motivate more advanced Monte Carlo techniques:
• Rejection sampling
• Importance sampling
• Markov Chain Monte Carlo (MCMC)
• Adaptive and sequential methods
All of these methods preserve the basic Monte Carlo philosophy—estimate expectations by
averaging—but use clever strategies to draw samples from difficult distributions.

19
Part II: Practical Coding Examples

17 Example 1: Estimating π via Buffon’s Method


17.1 Problem
Drop N = 5000 random points in a unit square [0, 1]2 and count how many fall inside a quarter
circle of radius 1. Estimate π.

17.2 Solution

18 Example 2: Estimating Rare Event Probabilities


18.1 Problem
Given 5 random cards from a deck of 52, what is the probability of being dealt all hearts?

18.2 Solution

19 Example 3: Random Walk Simulation


19.1 Problem
Simulate a 1D random walk with n = 100 steps and M = 100,000 independent walks. Verify
that E[Xn ] = 0 and E[Xn2 ] = n.

20
19.2 Solution

20 Example 4: Stopping Time Experiment (Dice)


20.1 Problem
Repeatedly throw three 8-sided dice. Let R be the number of throws required for the cumulative
sum to reach or exceed threshold T = 1000. Estimate E[R].

20.2 Solution

21 Example 5: Inverse Transform Sampling (Exponential)


21.1 Problem
Generate 10,000 samples from an exponential distribution with λ = 2 using inverse transform
sampling. Verify the mean and plot the histogram.

21
21.2 Solution

22 Summary of Coding Examples


• Example 1: Shows how to estimate π using random sampling in a geometric setting
• Example 2: Demonstrates Monte Carlo for rare event probability estimation
• Example 3: Verifies theoretical predictions (mean and variance) via simulation
• Example 4: Illustrates Monte Carlo for complex, sequential random experiments
• Example 5: Shows inverse transform sampling in practice

Conclusion
Monte Carlo methods provide a powerful, flexible framework for:
• Estimating expectations and integrals
• Sampling from complex distributions
• Solving high-dimensional problems
• Handling uncertainty and randomness
The combination of rigorous theory and practical coding examples makes Monte Carlo
both intellectually satisfying and computationally practical. The dimension-robustness of the
O(N −1/2 ) error rate is the key advantage over deterministic methods, especially in modern
applications involving 100s or 1000s of dimensions.

22

You might also like