0% found this document useful (0 votes)

3 views102 pages

Etics24 RQMC Tutorial B

The document introduces randomized quasi-Monte Carlo (RQMC) methods for simulation, focusing on estimating integrals using deterministic and randomized point sets. It discusses the concepts of discrepancies, error bounds, and variance reduction techniques in the context of lattice rules and Fourier expansions. Additionally, it highlights the importance of smoothness in functions for achieving faster convergence rates in RQMC applications.

Uploaded by

Said Saterih

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views102 pages

Etics24 RQMC Tutorial B

Uploaded by

Said Saterih

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

1

Introduction to randomized quasi-Monte Carlo

aft
methods in simulation

Part B

Pierre L’Ecuyer

Dr Université de Montréal, Canada

ETICS, Saissac, September 2024

aft
QMC/RQMC Theory

Dr
3

Short Recap
We want to estimate µ = ∫[0,1)s f (u)du where U = (U0 , . . . , Us−1 ) and the Uj are i.i.d. U(0, 1).

aft
Monte Carlo (MC) estimator:
1 n−1
µ̂n = ∑ f (Ui )
n i=0
where U0 , . . . , Un−1 i.i.d. uniform over [0, 1)s .
We have E[µ̂n ] = µ and Var[µ̂n ] = σ 2 /n = Var[f (U)]/n.
Quasi-Monte Carlo (QMC) approximation:

1 n−1
µ̄n = ∑ f (ui )
n i=0

Dr
where Pn = {u0 , . . . , un−1 } are deterministic points that cover [0, 1)s very evenly.
Randomized Quasi-Monte Carlo (RQMC) estimator:

µ̂n,rqmc =
1 n−1
∑ f (Ui ),
n i=0

where {U0 , . . . , Un−1 } ⊂ (0, 1)s are randomized versions of the points of Pn so that still cover the space
evenly while each Ui is uniform over [0, 1)s .
4

How should we measure the uniformity of Pn ?

We want a measure that can translate into error or variance bounds.

aft
For two arbitrary points a < b ∈ (0, 1)s , the rectangular box B = [a, b) with corners at a and
b is an s-dimensional interval. Let vol[a, b) denote the volume of this box and ∣Pn ∩ [a, b)∣/n
be the proportion of the points that it contains. We call the absolute difference
disc([a, b), Pn ) = ∣vol[a, b) − ∣Pn ∩ [a, b)∣/n∣ the local discrepancy for that interval.

1
u2
b2

Dr a2
0 a1 b1 1 u1

To measure the discrepancy between the distribution of Pn and the uniform distribution, we
can take the sup or average of disc([a, b), Pn ) over the intervals B.
5
Classical discrepancies
Extreme discrepancy of Pn : D(Pn ) = sup disc([a, b), Pn ).
[a,b)⊆[0,1)s

aft
Star discrepancy of Pn : D∗ (Pn ) = sup disc([0
0, b], Pn ).
b∈[0,1)s

These two discrepancies behave in the same way when n → ∞:

Theorem. D∗ (Pn ) ≤ D(Pn ) ≤ 2s D∗ (Pn ).

The sup can also be replaced by a more general Lp norm, for 1 ≤ p < ∞.
Lp star discrepancy of Pn :

∗
D(p) (Pn ) = (∫
Dr
[0,1)s
(disc([0
0, u), Pn ))p du)

and similarly for the Lp extreme discrepancy.

1/p

For large s and n, D∗ (Pn ) is too hard to compute, but D(2)

∗
≤ D∗ (Pn ) ≤ cs,p [D(p)
∗

(Pn ) can be computed in

O(sn ) operations for an arbitrary Pn via a formula from Warnok.
2
(Pn )]p/(p+s) ,
6

Is D∗ (Pn ) really meaningful?

Let P∞ = {u1 , u2 , . . . } be an infinite (deterministic) sequence of points and Pn denote the

aft
first n points of this sequence. P∞ is called uniformly distributed if D∗ (Pn ) → 0 as n → ∞.
Theorem (Weyl 1916). When n → ∞, ∣En ∣ → 0 for all bounded Riemann-integrable
functions f if and only if D∗ (Pn ) → 0.

Dr
6

Is D∗ (Pn ) really meaningful?

Let P∞ = {u1 , u2 , . . . } be an infinite (deterministic) sequence of points and Pn denote the

Classical Koksma-Hlawka inequality (Koksma(1942) for s = 1, Hlawka(1961) for s > 1):

∣En ∣ ≤ Vhk (f ) ⋅ D∗ (Pn )

Dr
where Vhk (f ) denotes the total variation of f in the sense of Hardy and Krause;
see Kuipers and Niederreiter (1974) for a definition.
Vhk (f ) < ∞ implies no discontinuity not aligned with the axes.

Can we easily bound the discrepancy of a Pn ?

Erdõs-Turán-Koksma Inequality. For each s, there is a constant cs such that for all Pn ,
R RRR
⎛ 1 1 RRRR 1 n−1 2πiht ui RRR⎞

aft
∗ RR ∑ e
D (Pn ) ≤ cs + ∑ ∏ RRR
⎝ m + 1 h∈Zs (m) j =/ 0 ∣hj ∣ RRRR n i=0 RRR⎠
R
√
for all integers m ≥ 1, where Zs (m) = {h = (h1 , . . . , hs ) ∈ {−m, . . . , m}s } and i = −1.
This inequality permits one to construct explicit low-discrepancy sequences by bounding the
exponential sum. It is a convenient tool.
It was used for example to prove that the digital sequence constructions by Sobol’, Faure,
Niederreiter, etc., have a discrepancy that converges as O(n−1 (log n)s ). This implies that

Dr
with these point sets, if Vhk (f ) < ∞, then

∣En ∣ ≤ Vhk (f ) ⋅ D∗ (Pn ) = O(n−1 (log n)s ) .

Erdõs-Turán-Koksma Inequality. For each s, there is a constant cs such that for all Pn ,
R RRR
⎛ 1 1 RRRR 1 n−1 2πiht ui RRR⎞

Dr
with these point sets, if Vhk (f ) < ∞, then

∣En ∣ ≤ Vhk (f ) ⋅ D∗ (Pn ) = O(n−1 (log n)s ) .

For lattice rules and digital nets, we will rather use other variants of this inequality that do
not involve D∗ (Pn ), but other discrepancies that are much easier to compute.
8

Error and variance expressions for lattice rules

Suppose f has Fourier expansion

aft
t
f (u) = ∑ fˆ(h)e 2πih u for u ∈ [0, 1)s ,
h∈Zs

with Fourier coefficients

fˆ(h) = ∫ f (u) exp(−2πi h ⋅ u) du.
(0,1)s

Note that
t
s s
e 2πih u = ∏ e 2πihj uj = ∏[cos(2πhj uj ) + i sin(2πhj uj )] (by Euler formula),
j=1

Dr j=1

so this expansion is in terms of one-periodic trigonometric functions over the real space.
It is defined everywhere, just by making copies of its definition over [0, 1)s .
When using standard MC to estimate µ = E[f (U)], the MC variance is
t
Var[f (U)] = ∑ ∣fˆ(h)∣2 Var[e 2πih U ] = ∑ ∣fˆ(h)∣2 .
h∈Zs h∈Zs
9

Suppose now that we estimate µ = E[f (U)] using a lattice rule with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual lattice to Ls is

aft
L∗s = {h ∈ Rs ∶ h t v ∈ Z for all v ∈ Ls } ⊆ Zs .

One can show that in this case,

⎧
1 n−1 2πiht ui ⎪
⎪1 if h ∈ L∗s ;
∑e =⎨
n i=0 ⎪
⎪
⎩0 otherwise.

For a deterministic lattice rule, if ∑ ∣fˆ(h)∣ < ∞, then En = ∑ fˆ(h).

Dr h∈Zs 0=
/ h∈L∗s
9

Suppose now that we estimate µ = E[f (U)] using a lattice rule with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual lattice to Ls is

aft
L∗s = {h ∈ Rs ∶ h t v ∈ Z for all v ∈ Ls } ⊆ Zs .

One can show that in this case,

⎧
1 n−1 2πiht ui ⎪
⎪1 if h ∈ L∗s ;
∑e =⎨
n i=0 ⎪
⎪
⎩0 otherwise.

For a deterministic lattice rule, if ∑ ∣fˆ(h)∣ < ∞, then En = ∑ fˆ(h).

Dr h∈Zs

For a randomly shifted lattice, Var[µ̂n,rqmc ] = ∑ ∣fˆ(h)∣2 .

0=
/ h∈L∗s
0=
/ h∈L∗s

From the viewpoint of variance reduction, an optimal lattice for f minimizes Var[µ̂n,rqmc ].
Roughly, the dual lattice should not contain the large (absolute) Fourier coefficients.
10

For a integer α > 0, a function f has smoothness α if it has square-integrable mixed partial
derivatives up to order α. If f has smoothness α and the periodic continuation of its

aft
derivatives up to order α − 1 is continuous across the unit cube boundaries, then

∣fˆ(h)∣ = O((max(1, h1 )⋯ max(1, hs ))−α ).

Moreover, there always exists a rank-1 lattice (a vector v1 = v1 (n) = a(n)/n) such that

Pα ∶= ∑ (max(1, h1 )⋯ max(1, hs ))−α = O(n−α+ϵ ),

0=
/ h∈L∗s

which gives Var[µ̂n,rqmc ] = O(n−2α+ϵ ), for any ϵ > 0.

Dr
Pα is the error and P2α is the variance for a (worst-case) function f for which

fˆ(h) = (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣))−α .

A larger α means a smoother f and a faster convergence rate.

11
When α is a positive integer, this worst-case f is
(2π)α
f ∗ (u) = ∑ ∏ Bα (uj )
α!

aft
u⊆{1,...,s} j∈u

and we have
∣u∣
1 n−1 −(−4π 2 )α
P2α = ∑ ∑[ ] ∏ B2α (ui,j )
∅≠u⊆{1,...,s} n i=0 (2α)! j∈u
where Bα is the Bernoulli polynomial of degree α.
In particular, B1 (u) = u − 1/2 and B2 (u) = u 2 − u + 1/6.
Can compute this P2α in O(sn) operations. Easy to search for good lattices in this case!
P2α has been proposed long ago as a figure of merit.

Dr
Beware: in many “older” papers, α is replaced by α/2.
Worst-case function may not be representative of what happens in applications.
Also, the hidden factor in O increases quickly with s.
To get a bound that is uniform in s, the Fourier coefficients must decrease fast enough
with the dimension and “size” of vectors h. This is typically what happens in applications
for which RQMC is effective!
12

Baker’s (or tent) transformation

In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2

.
1
12

Baker’s (or tent) transformation

In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2

.
1
12

Baker’s (or tent) transformation

In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2

.
1
12

Baker’s (or tent) transformation

In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2
1

For smooth f , it can reduce the variance from O(n−2+ϵ ) to O(n−4+ϵ ) (Hickernell 2002).
The resulting f˜ is symmetric with respect to u = 1/2.
In practice, we transform the points Ui instead of f .
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1

Dr
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1
U/n

Dr
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1

Dr
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1

Dr
Gives locally antithetic points in intervals of size 2/n.
This implies that linear pieces over these intervals are integrated exactly.
Intuition: when f is smooth, it is well-approximated by a piecewise linear function which is
integrated exactly, so the error is small.
14

High dimension: ANOVA decomposition

For large s, covering the hypercube [0, 1)s evenly requires an astronomical number of points.

aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.

Dr
14

High dimension: ANOVA decomposition

For large s, covering the hypercube [0, 1)s evenly requires an astronomical number of points.

Dr
14

High dimension: ANOVA decomposition

For large s, covering the hypercube [0, 1)s evenly requires an astronomical number of points.

aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.
Fortunately, RQMC can still be effective if f is well approximated by a sum of low-dimensional
functions, because it suffices to have low error or variance for these low-dimensional functions.
In general, we can write
s s
f (u) = ∑ fu (u) = µ + ∑ f{i} (ui ) + ∑ f{i,j} (ui , uj ) + ⋯
u⊆{1,...,s} i=1 i,j=1

in a way that the Monte Carlo variance decomposes as

Dr
Var[f (U)] = σ 2 = ∑
u⊆{1,...,s}
σu2 = ∑
u⊆{1,...,s}
Var[fu (U)].

The σu2 ’s can be estimated by MC or RQMC (I skip the details here). Idea: Make sure the
projections Pn (u) of Pn over u are very uniform for the important subsets u (those with a large σu2 ).
When they are mostly lower-dim. projections, then it is much easier to make RQMC effective.
15

Weighted Pγ,α with projection-dependent weights γu

Denote u(h) = u(h1 , . . . , hs ) the set of indices j for which hj =/ 0.

aft
Pγ,α = ∑ γu(h) (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣)) .
−α

/ h∈L∗s
0=

For α integer > 0, with ui = (ui,1 , . . . , ui,s ) = i v1 mod 1,

1 n−1 −(−4π 2 )α
∣u∣
Pγ,2α = ∑ γu ∑[ ] ∏ B2α (ui,j ),
∅≠u⊆{1,...,s} n i=0 (2α)! j∈u

and the corresponding variation is

Vγ,α
2

Dr
(f )=

Var[µ̂n,rqmc ] =
∑

∑
u⊆{1,...,s}
1
∫ ∣
∂ α∣u∣
2 α∣u∣ [0,1]∣u∣ ∂u α u
∅≠u⊆{1,...,s} γu (4π )

for f ∶ [0, 1)s → R smooth enough. Then, we have the variance bound
f (u)∣ du,

Var[µ̂n,rqmc (fu )] ≤ Vγ,α

2
2

(f ) ⋅ Pγ,2α .
16
This Pγ,2α is the RQMC variance for the worst-case function with Vγ,α
2
(f ) ≤ 1:

√ (2π)α
f ∗ (u) =

aft
∑ γu ∏ Bα (uj ),
u⊆{1,...,s} j∈u (α)!

whose Fourier coefficients are

fˆ∗ (h) = γu(h) (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣))−α .

For this worst-case function, we have

(4π 2 )α (4π 2 )α
∣u∣ ∣u∣

Dr
σu2 = γu [Var[Bα (U)]
((α)!)2
] = γu [∣B2α (0)∣

For α = 1, we should take γu = (3/π 2 )∣u∣ σu2 ≈ (0.30396)∣u∣ σu2 .

For α = 2, we should take γu = [45/π 4 ]∣u∣ σu2 ≈ (0.46197)∣u∣ σu2 .
For α → ∞, we have γu → (0.5)∣u∣ σu2 .
The ratios weight / variance should decrease exponentially with ∣u∣.
(2α)!
] .
17

Heuristics for choosing the weights

Firstly, note that all one-dimensional projections (before random shift) are the same. So

aft
the weights γu for ∣u∣ = 1 are irrelevant.

Dr
17

Heuristics for choosing the weights

Firstly, note that all one-dimensional projections (before random shift) are the same. So

aft
the weights γu for ∣u∣ = 1 are irrelevant.

For f ∗ , take γu = ρ∣u∣ σu2 for a constant ρ, but there are 2s − 1 subsets u to consider!

Dr
17

Heuristics for choosing the weights

Firstly, note that all one-dimensional projections (before random shift) are the same. So

aft
the weights γu for ∣u∣ = 1 are irrelevant.

For f ∗ , take γu = ρ∣u∣ σu2 for a constant ρ, but there are 2s − 1 subsets u to consider!

One could define a simple parametric model and estimate the parameters by matching the
ANOVA variances σu2 . Some popular model types for the weights:
Order-dependent weights: γu = Γ∣u∣ depends only on ∣u∣.
It suffices to select Γ1 , Γ2 , Γ3 , etc.

Dr
Often, one would take Γj = 0 for j > k for some integer k (k parameters to select).
Or Γj = ρj−1 for all j (a single parameter to select).
Product weights: γu = ∏j∈u γj where γj ≥ 0 is the weight for coordinate j.
Mixture: POD weights γu = Γ∣u∣ ∏j∈u γj , SPOD weights, etc.
18

Notion of effective dimension

aft
(Caflisch, Morokoff, and Owen 1997).
A function f has effective dimension d in proportion ρ in the superposition sense if

∑ σu ≥ ρσ .
2 2

∣u∣≤d

It has effective dimension d in proportion ρ in the truncation sense if

∑ σu2 ≥ ρσ 2 .

Dr u⊆{1,...,d}

High-dimensional functions with low effective dimension are frequent.

One may change f to make this happen.
19

ANOVA for estimator of P[T > x] in Stochastic Activity Network

aft
Stochastic Activity Network

x = 64

x = 100

CMC, x = 64

Dr
CMC, x = 100
0 20 40 60 80
% of total variance for each cardinality of u
100
20

Searching for good parameters

Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.

Dr
20

Searching for good parameters

Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.

Dr
20

Searching for good parameters

Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.
Component by component (CBC) construction. (Sloan, Kuo, etc.).
Let a1 = 1;
For j = 2, 3, . . . , s, find z ∈ {1, . . . , n − 1}, gcd(z , n) = 1, such that
(a1 , a2 , . . . , aj = z) minimizes D(Pn ({1, . . . , j})).

Dr
Fast CBC construction for Pγ,α : use FFT. (Nuyens, Cools).
20

Searching for good parameters

Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

Let a1 = 1; Dr
Fast CBC construction for Pγ,α : use FFT. (Nuyens, Cools).
Randomized CBC construction.

For j = 2, . . . , s, try r random z ∈ {1, . . . , n − 1}, gcd(z , n) = 1,

and retain (a1 , a2 , . . . , aj = z) that minimizes D(Pn ({1, . . . , j})).
21

Quantiles of figure of merit

We computed Pγ,2 for order-dependent weights with Γj = 0.3j/2 for all j, for n = 26 , ..., 219 , for all
admissible vectors a = (1, a2 , . . . , as ) with odd aj .

aft
For s = 2, a linear regression of log Pγ,2 vs log n for 212 ≤ n ≤ 219 gives a decreasing rate near
O(n−1.9 ) for the best and O(n−1 ) for the mean. The worst value is with a = (1, 1) for all n.
The median is close to the best, which means that the majority of vectors a are quite good!

worst
10−1 mean
90 %
median
merit value

10−4 10 %

10−7

10−10
2 6
Dr
s =2

28 210 212
n
214 216 218
best
22

For s = 3, a linear regression log Pγ,2 vs log n for 210 ≤ n ≤ 215 gives decreasing rates near
O(n−1.75 ) for the best and O(n−1 ) for the mean.

aft
Again, the median is better than the mean.

100 worst
mean
90 %
10−2 median
merit value

10 %
best
10−4

10−6

26
Dr
s =3

28 210
n
212 214 216
23

Example: Playing with the Weights

aft
To see the effect of weights selection on RQMC variance, when choosing a lattice rule, we
shall integrate the worst-case function
√ (2π)α
fα∗ (u) = ∑ vu ∏ Bα (uj ),
u⊆{1,...,s} j∈u (α)!

whose RQMC variance is Pγ,2α .

The ideal weights are γu = vu . In our experiments, we measure the inflation factor of RQMC

Dr
variance when we use a lattice rule constructed via fast CBC with different weights γu ≠ vu .
We start with s = 10 and vu = Γ∣u∣ for ∣u∣ ≤ k, for a given integer k and a given constant
Γ > 0. We select the weights as γu = Γ̃∣u∣ for ∣u∣ ≤ k̃, where Γ̃ and k̃ may differ from Γ and k.
24

Ratio of RQMC variances for modified vs ideal weights

aft
n A1 A2 B1 B2 C1 C2
28 1.11 1.21 1.13 4.08 3.82 6.80
29 1.21 1.10 1.42 10.5 2.93 7.25
210 1.36 1.38 2.04 4.64 2.86 5.94
211 1.24 1.43 2.40 6.18 2.15 5.14
212 1.42 1.66 3.79 13.2 2.47 5.94
213 1.30 2.38 5.51 9.09 2.66 5.97
214 1.51 2.54 30.5 8.66 9.11 29.1

Dr
215
216
1.46
1.80
1.93
2.55
25.6
3.13
13.3
12.9

A1: k = k̃ = s, Γ2 = 0.1 and Γ̃2 = 0.001. Not too bad.

A2: k = k̃ = s, Γ2 = 0.001 and Γ̃2 = 0.1. More impact.
3.52
2.73
9.71
10.2
25

aft
B1: Γ2 = Γ̃2 = 0.1, k = 4 and k̃ = 2. Criterion is blind on projections of order 3 and 4. This
has unpredictable (sometimes dramatic) impact on RQMC variance.

B2: Γ2 = Γ̃2 = 0.5, k = 2 and k̃ = 4. Gives weight to irrelevant projections. Stronger

degradation on average.

Dr
26

C1: Γ2 = Γ̃2 = 0.1 and k = k̃ = 4, but increase the variation of f by replacing vu2 with
vu2 + ṽu2 , with

aft
ṽu2 = 1.0 for u = {1, 3}, {3, 5}, {5, 7}, {7, 9},
ṽu2 = 0.5 for u = {2, 3, 4}, {4, 5, 6}, {6, 7, 8}, {8, 9, 10},
ṽu2 = 0.25 for u = {1, 2, 3, 4}, {4, 5, 6, 7}, {7, 8, 9, 10},
and ṽu = 0 for all other u.
Important projections are not given enough weight relative to others.

C2: Like C1, but vu2 is replaced with only ṽu2 , as defined above. Many irrelevant projections,
with ṽu = 0, now have nonzero weights.

Dr
In all cases, the variance ratio increases (non-monotonically) with n.

The worst degradations are usually observed when we give too much weight to irrelevant
projections!
27

Discrepancies and Bounds with Projection-Dependent Weights

There are many other ways of defining the variation and the discrepancy. This is usually

aft
done by selecting a Reproducing-Kernel Hilbert Space of functions. The choice of kernel
determines D and V.

In general, we can obtain RQMC variance bounds via Hölder-type inequality:

Var[µ̂n,rqmc ] ≤ (D(Pn ) ⋅ V(f ))2 where

1/p 1/q
⎛ ⎞ ⎛ ⎞
V(f ) = ∑

Dr
⎝∅/=u⊆{1,2,...,s}
(V(fu )/γu )p
⎠
, D(Pn ) = ∑
⎝∅/=u⊆{1,2,...,s}
(γu Du (Pn ))q

1/p + 1/q = 1, γu ≥ 0 is a weight assigned to the subset u, V(fu ) is the variation of fu , and
Du (Pn ) is the discrepancy of the projection of Pn (u).
Common choice: p = q = 2. Sometimes q = ∞ with p = 1.
⎠
,
28
Form of discrepancies commonly used in constructions
Many discrepancies that are typically used to search for good parameters efficiently (e.g., in

aft
LatNet Builder) are as in the previous slide, with

1 n−1
Duq (Pn ) = ∑ ∏ φ(ui,j )
n i=0 j∈u

for some function φ ∶ [0, 1) → R, usually with q = 2. When computing the products for all
subsets u, we can start with the smaller subsets u, then save and re-use the partial products.
For the weighted P2α , we take q = 2 and

Dr φ(ui,j ) = −(−4π 2 )α B2α (ui,j )/(2α)!

For digital nets, we have other choices of φ.

We need software to support (1) search for good point set parameters and
(2) to generate and randomize the points, for these various constructions.
29

LatNet Builder Software

A C++ library offering tools to search for good parameters for lattice rules, polynomial

aft
lattice rules, and digital nets.
Various choices of figures of merit, arbitrary weights, construction methods, etc. Easily
extensible.
For better run-time efficiency, uses static polymorphism, via templates, rather than dynamic
polymorphism. Several techniques to reduce computations and improve speed.
Offers a pre-compiled program with Unix-like command line interface.
Also a graphical interface implemented in Python.

Dr
Available for download on GitHub, with source code and documentation.

Show graphical interface

Error and variance expressions for digital nets

Similar development as for lattice rules, but instead of using trigonometric Fourier series,

aft
we use Walsh series for the expansion of f .

Dr
30

Error and variance expressions for digital nets

Similar development as for lattice rules, but instead of using trigonometric Fourier series,

aft
we use Walsh series for the expansion of f .
Let N0 = {0, 1, 2, . . . }. For k = (k1 , . . . , ks ) ∈ Ns0 and u = (u1 , . . . , us ) ∈ [0, 1)s , where
ℓj s ℓj
kj = ∑ kj,ℓ bℓ−1 ∈ N0 and uj = ∑ uj,ℓ b−ℓ ∈ [0, 1), define ⟨k, u⟩ = ∑ ∑ kj,ℓ uj,ℓ mod b.
ℓ=1 ℓ≥1 j=1 ℓ=1

The Walsh expansion in base b of f ∶ [0, 1)s → R is

f (u) = ∑ f˜(k)e 2πi⟨k,u⟩/b ,

with Walsh coefficients Dr f˜(k) = ∫

k∈Ns0

[0,1)s
f (u)e −2πi⟨k,u⟩/b du.
30

Error and variance expressions for digital nets

Similar development as for lattice rules, but instead of using trigonometric Fourier series,

The Walsh expansion in base b of f ∶ [0, 1)s → R is

f (u) = ∑ f˜(k)e 2πi⟨k,u⟩/b ,

with Walsh coefficients Dr f˜(k) = ∫

k∈Ns0

[0,1)s
f (u)e −2πi⟨k,u⟩/b du.

For b = 2, e −2πi⟨k,u⟩/b is either e 0 = 1 or e −πi = cos π = −1.

The Walsh functions are square waves that take values only in {−1, 1}.
31

aft
Dual net
Suppose we estimate µ = E[f (U)] using a digital net in base b with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual net is

Cs∗ = {k ∈ Ns0 ∶ ⟨k, ui ⟩ = 0 for all ui ∈ Pn }.

One can show that

⎧
1 n−1 2πi⟨k,ui ⟩/b ⎪⎪1 if k ∈ Cs∗ ;
∑ e = ⎨
⎪
⎪

Dr n i=0
⎩0 otherwise.
The nonzero digits kj,ℓ of a vector k select a specific set of digits ui,j,ℓ of the points. The
points are equidistributed for those specific digits iff the above sum is 0 for this k, iff k ∈/ Cs∗ .
32

aft
Error and variance expressions
For a deterministic digital net, if ∑ ∣f˜(k)∣ < ∞, then En = ∑ f˜(k).
k∈Ns0 0=
/ k∈Cs∗

Dr
32

aft
Error and variance expressions
For a deterministic digital net, if ∑ ∣f˜(k)∣ < ∞, then En = ∑ f˜(k).
k∈Ns0 0=
/ k∈Cs∗

For a digital net with a random digital shift, Var[µ̂n,rqmc ] = ∑ ∣f˜(k)∣2 .

0=
/ k∈Cs∗

From a variance reduction viewpoint, an optimal digital net for f minimizes this

Dr
Var[µ̂n,rqmc ]. Roughly, the dual net should not contain vectors k that correspond to large
square Walsh coefficients.
33
Bounds on Walsh coefficients
In the following, we assume for simplicity that b = 2.

aft
For a integer α > 0, a function f has smoothness α (roughly) if it has square-integrable
mixed partial derivatives up to order α.
For an integer k = 2a1 + ⋯ + 2aν > 0 with a1 > ⋅ ⋅ ⋅ > aν , we put
µα (k) = (a1 + 1) + ⋯ + (amin(α,ν) + 1), and µα (0) = 0.
ℓ min(α,ℓ )
Equivalently, if kj = ∑ℓ=1
j
kj,ℓ 2ℓ−1 , µα (kj ) = ∑ℓ=1 j ℓI[kj,ℓ > 0].
For a vector k = (k1 , . . . , ks ), let µα (k) = µα (k1 ) + ⋯ + µα (ks ).
Dick (2007, 2008) has shown that if f has smoothness α, then ∣fˆ(k)∣ ≤ K Ṽα2 (f )2−µα (k)

Dr
approximately, where K is a constant and Ṽα (f ) is the variation of f , which depends on the
integrals of the mixed partial derivatives of f of order up to α.
33
Bounds on Walsh coefficients
In the following, we assume for simplicity that b = 2.

Dr
approximately, where K is a constant and Ṽα (f ) is the variation of f , which depends on the
integrals of the mixed partial derivatives of f of order up to α.
This suggests using the FOM ∑0=/ k∈Cs∗ 2−µα (k) for QMC and ∑0=/ k∈Cs∗ 2−2µα (k) for RQMC,
for a given α. But these sums have an infinite number of terms!
Dick replaced the sum by a max and showed how to construct digital nets for which
max0=/ k∈Cs∗ 2−µα (k) = O(n−α (log n)αs ), for any integer α > 0, using an interlacing technique.
34
Figures of Merit for Digital Nets
A variety of discrepancies of the form

aft
1/q
⎛ ⎞ 1 n−1
D(Pn ) = ∑ (γu Du (Pn ))q with (Du (Pn ))q = ∑ ∏ φ(ui,j )
⎝∅/=u⊆{1,2,...,s} ⎠ n i=0 j∈u

for some function φ ∶ [0, 1) → R have been defined recently for digital nets, based on
various assumptions on the rate of decrease of the Walsh coefficients, which often follow
from an assumption on the smoothness level α of f .
Some of them can be seen as counterparts of the Pα used for lattices.

Dr
They can be used in the same way to make CBC constructions
One popular choice of φ for b = 2 and α = 1:

φ(u) = 2 − I[u > 0] ⋅ 6 ⋅ 2⌊log2 (u)⌋

where −⌊log2 u⌋ is the index of the first nonzero digit in the binary expansion of u.
Latnet Builder can make searches for good point sets based on such discrepancies.
35

Polynomial lattice rules (in base b = 2)

Similar to lattice rules, except that:

aft
Replace Z by F2 [z], the ring of polynomials over finite field F2 ≡ {0, 1};
Replace R by L2 = F2 ((z −1 )), the field of formal Laurent series over F2 , of the form
∑∞ −ℓ
ℓ=ω xℓ z , where xℓ ∈ F2 . Define ϕ ∶ L → R by

∞ ∞
ϕ ( ∑ xℓ z −ℓ ) = ∑ xℓ b−ℓ .
ℓ=ω ℓ=ω

We select a polynomial modulus Q(z) ∈ Z2 [z] of degree k and a generating vector

Pn = {(ϕ (
Q(z)
Dr
a(z) = (a1 (z), . . . , as (z)) ∈ Z2 [z]s , whose coordinates are polynomials of degrees less than
k having no common factor with Q(z). The point set of cardinality n = 2k is

h(z)a1 (z)
),...,ϕ(
h(z)as (z)
Q(z)
)) ∶ h(z) ∈ Z2 [z] and deg(h(z)) < k} .
36

aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.

Dr
36

A PLR is actually a special case of a digital net in base b. This can be used to generate
the points efficiently: compute the generating matrices and use the digital net
implementation. This is particularly fast in base b = 2.

Dr
36

Dr
Random shift in space of formal series: equivalent to a random digital shift in base b,
applied to all the points. It preserves equidistribution.
37

aft
More Examples

Dr
38

Example: Pricing a financial derivative.

aft
Market price of some asset (e.g., one share of a stock) evolves in time as stochastic
process {S(t), t ≥ 0} with (supposedly) known probability law (estimated from data).
A financial contract gives owner net payoff g(S(t1 ), . . . , S(td )) at time T = td , where
g ∶ Rd → R, and 0 ≤ t1 < ⋯ < td are fixed observation times.
Under a no-arbitrage assumption, present value of contract at time 0, when S(0) = s0 , is

v (s0 , T ) = E∗ [e −r T g(S(t1 ), . . . , S(td ))] ,

Dr
where E∗ is under a risk-neutral measure and e −r T is the discount factor.
This expectation can be written as an integral over [0, 1)s for some s, and estimated by the
average of n i.i.d. replicates of X = e −r T g(S(t1 ), . . . , S(td )).
39

A simple model for S: geometric Brownian motion (GBM):

aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),

Dr
39

A simple model for S: geometric Brownian motion (GBM):

aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),

r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.
Here, the vector Y = (X(t1 ), . . . , X(td )) has a multivariate normal distribution with some mean
vector µ and covariance matrix Σ .
To generate Y , we decompose Σ = AAt , and put Y = AZ where Z ∼ N (0
0, I).

Dr
Possible decompositions: Cholesky, Brownian bridge, PCA, etc.
Choice does not matter for MC but does matter for RQMC.
39

A simple model for S: geometric Brownian motion (GBM):

aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),

Dr
Possible decompositions: Cholesky, Brownian bridge, PCA, etc.
Choice does not matter for MC but does matter for RQMC.

More general: X(t) can also be a vector in c dimensions, so s = cd.

This can apply more generally to a function of a multinormal vector, in many areas.
40

aft
Example of contract: Discretely-monitored Asian call option:

⎛ 1 d ⎞
g(S(t1 ), . . . , S(td )) = max 0, ∑ S(tj ) − K .
⎝ d j=1 ⎠

Numerical illustration: s = d = 12, T = 1 (one year), tj = j/12 for j = 0, . . . , 12, K = 100,

s0 = 100, r = 0.05, σ = 0.5.

Dr
41

ANOVA Variances for ordinary Asian Option

Asian Option with S(0) = 100, K = 100, r = 0.05, σ = 0.5

aft
d = 3, seq.

d = 3, BB

d = 3, PCA

d = 6, seq.

d = 6, BB

d = 6, PCA

Dr
d = 12, seq.

d = 12, BB

d = 12, PCA
42

Variance with good lattices rules and Sobol points

aft
100 Asian Option (PCA) d = 12, S(0) = 100, K = 100, r = 0.05, σ = 0.5 MC
Sobol
Lattice (P2 ) + bake
variance

n−2
10−3

10−6
102
Dr 103
n
104
43

Example: Pricing an Asian basket financial option

aft
We have c assets, d observation times. Want to estimate E[f (U)], where
⎡ ⎤
⎢ 1 c d ⎥
f (U) = e −r T ⎢
max ⎢0, ∑ ∑ Si (tj ) − K ⎥⎥ .
⎢ cd i=1 j=1 ⎥
⎣ ⎦

Dr
43

Example: Pricing an Asian basket financial option

aft
We have c assets, d observation times. Want to estimate E[f (U)], where
⎡ ⎤
⎢ 1 c d ⎥
f (U) = e −r T ⎢
max ⎢0, ∑ ∑ Si (tj ) − K ⎥⎥ .
⎢ cd i=1 j=1 ⎥
⎣ ⎦

Suppose S(t) = (S1 (t), . . . , Sc (t)) obeys a geometric Brownian motion.

Dr
Then, f (U) = g(Y ) where Y = (Y1 , . . . , Ys ) ∼ N(0
0, Σ ) and s = cd.
44

Numerical experiment with c = 10 and d = 25

This gives a 250-dimensional integration problem.

aft
Let ρi,j = 0.4 for all i =/ j (correlations between assets), T = 1, σi = 0.1 + 0.4(i − 1)/9 for all
i , r = 0.04, S(0) = 100, and K = 100. (Imai and Tan 2002; see L. 2009 for more details).

Dr
44

Numerical experiment with c = 10 and d = 25

This gives a 250-dimensional integration problem.

aft
Let ρi,j = 0.4 for all i =/ j (correlations between assets), T = 1, σi = 0.1 + 0.4(i − 1)/9 for all
i , r = 0.04, S(0) = 100, and K = 100. (Imai and Tan 2002; see L. 2009 for more details).
Variance reduction factors for Cholesky (left) and PCA (right) (experiment from 2003):

Korobov Lattice Rules

n = 16381 n = 65521 n = 262139
a = 5693 a = 944 a = 21876
Lattice+shift 18 878 18 1504 9 2643

Dr
Lattice+shift+baker 50 4553 46 3657

Sobol+Shift
Sobol+LMS+Shift
Sobol’ Nets
n = 214
10 1299 17 3184
6 4232
n = 216

4 9219
43

32
7553

n = 218
6046
35 16557

Note: The payoff function is not smooth and also unbounded. So Koksma-Hlawka does not apply.
45

Many other types of successful applications of RQMC, for

example:

aft
Statistics: estimate and optimize the likelihood function for given data.
Computer graphics: estimate the color of a pixel.
Service systems with random arrival rates.
Stochastic differential equations (e.g., in finance).

Dr
PDEs with random coefficients.
Etc.
46

aft
RQMC for Markov chains

Dr
47

Array-RQMC for Markov Chains

Setting: A Markov chain with state space X ⊆ Rℓ , evolves as

aft
X 0 = x0 , Xj = ϕj (Xj−1 , Uj ), j ≥ 1,

where the Uj are i.i.d. uniform r.v.’s over (0, 1)d . Want to estimate
τ
µ = E[Y ] where Y = ∑ gj (Xj ).
j=1

Ordinary MC: n i.i.d. realizations of Y . Requires s = τ d uniforms.

Dr
47

Array-RQMC for Markov Chains

Setting: A Markov chain with state space X ⊆ Rℓ , evolves as

aft
X 0 = x0 , Xj = ϕj (Xj−1 , Uj ), j ≥ 1,

where the Uj are i.i.d. uniform r.v.’s over (0, 1)d . Want to estimate
τ
µ = E[Y ] where Y = ∑ gj (Xj ).
j=1

Ordinary MC: n i.i.d. realizations of Y . Requires s = τ d uniforms.

Dr
Array-RQMC: L., Lécot, Tuffin, et al. [2004, 2006, 2008, etc.]
Simulate an “array” (or population) of n chains in “parallel.”
Goal: Want small discrepancy between empirical distribution of states
Sn,j = {X0,j , . . . , Xn−1,j } and theoretical distribution of Xj , at each step j.
At each step, use RQMC point set to advance all the chains by one step.
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate

µj = E[gj (Xj )] = E[gj (ϕj (Xj−1 , U))] = ∫ gj (ϕj (x, u))dxdu by

aft
[0,1)ℓ+d

1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0

This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .

Dr
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate

µj = E[gj (Xj )] = E[gj (ϕj (Xj−1 , U))] = ∫ gj (ϕj (x, u))dxdu by

aft
[0,1)ℓ+d

1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0

This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .
However, we do not choose the Xi,j−1 ’s in Qn : they come from the simulation.

Dr
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate

µj = E[gj (Xj )] = E[gj (ϕj (Xj−1 , U))] = ∫ gj (ϕj (x, u))dxdu by

aft
[0,1)ℓ+d

1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0

Dr
Idea: select RQMC point set Q̃n = {(w0 , U0,j ), . . . , (wn−1 , Un−1,j )} , where the wi ∈ [0, 1)ℓ
are fixed and each Ui,j ∼ U(0, 1)d .
Permute the states Xi,j−1 so that Xπj (i),j−1 is “close” to wi for each i and compute
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ) for each i .
Example: If ℓ = 1, can take wi = (i + 0.5)/n and just sort the states.
For ℓ > 1, there are various ways to define the matching (multivariate sort).
49

Array-RQMC algorithm

aft
Xi,0 ← x0 (or Xi,0 ← xi,0 ) for i = 0, . . . , n − 1;
for j = 1, 2, . . . , τ do
Compute the permutation πj of the states (for matching);
Randomize afresh {U0,j , . . . , Un−1,j } in Q̃n ;
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ), for i = 0, . . . , n − 1;
i=0 g(Xi,j );
µ̂arqmc,j,n = Ȳn,j = n1 ∑n−1
Estimate µ by the average Ȳn = µ̂arqmc,n = ∑τj=1 µ̂arqmc,j,n .

Dr
49

Array-RQMC algorithm

Dr
Proposition: The average Ȳn is an unbiased estimator of µ.
Can estimate Var[Ȳn ] by making m independent replications.
50

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
s s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
51

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
s s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
52

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
s s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
52

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s sz
0.7 0.7 s
0.6 0.6
s
sz s
0.5 0.5 sz
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
sz s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
53

Hilbert curve sort

Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53

Hilbert curve sort

Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53

Hilbert curve sort

Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53

Hilbert curve sort

Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
54

Convergence results and proofs

aft
For ℓ = 1, O(n−3/2 ) variance has been proved under some conditions.

For ℓ > 1, worst-case error of O(n−1/(ℓ+1) ) has been proved in deterministic settings under
strong conditions on ϕj , using a batch sort (El Haddad, Lécot, L’Ecuyer 2008, 2010).

Gerber and Chopin (2015) proved o(n−1 ) variance, for Hilbert sort and digital net with
nested scrambling.

Dr
55

A very simple example, one-dimensional state

Let Y = θU + (1 − θ)V , where U, V indep. U(0, 1) and θ ∈ [0, 1). This Y has cdf Gθ .

aft
Markov chain is defined by

X0 = U0 ∼ U(0, 1);
Xj = ϕj (Xj−1 , Uj ) = Gθ (θXj−1 + (1 − θ)Uj ), j ≥ 1

where Uj ∼ U(0, 1). Then, Xj ∼ U(0, 1).

We consider various functions gj :

Dr
gj (x) = x − 1/2, gj (x) = x 2 − 1/3,
gj (x) = sin(2πx), gj (x) = e x − e + 1,
gj (x) = (x − 1/2)+ − 1/8, gj (x) = I[x ≤ 1/3] − 1/3.
They all have E[gj (Xj )] = 0.
Also discrepancies of states X0,j , . . . , Xn−1,j .
56

aft
For array-RQMC, we take Xi,0 = wi = (i − 1/2)/n.

We have
n−3/2 1−θ
E[Dj2 ] ≤ = n−3/2 .
4(1 − ρ) 4(1 − 2θ)

We tried different RQMC methods, for n = 29 to n = 221 .

We did m = 200 independent replications for each n.
We fitted a linear regression of log2 Var[Ȳn,j ] vs log2 n, for various gj

Dr
We also looked at E[Dj2 ] and E[Pα ] for α = 2, 4, 6.
57

aft
Some MC and RQMC point sets:

MC: Crude Monte Carlo

LHS: Latin hypercube sampling
SS: Stratified sampling
SSA: Stratified sampling with antithetic variates in each stratum
Sobol: Sobol’ points, left matrix scrambling + digital random shift
Sobol+baker: Add baker transformation
Sobol+NUS: Sobol’ points with Owen’s nested uniform scrambling
Korobov:
Korobov+baker:
Dr
Korobov lattice in 2 dim. with a random shift modulo 1
Add a baker transformation
58
slope vs log2 n log2 E[Dj2 ] log2 Var[Ȳn,j ]
Xj − 21 Xj2 − 31 (Xj − 21 )+ − 18 I[Xj ≤ 13 ] − 1
3
MC -1.01 -1.02 -1.01 -1.00 -1.02

aft
LHS -1.02 -0.99 -1.00 -1.00 -1.00
SS -1.50 -1.98 -2.00 -2.00 -1.49
SSA -1.50 -2.65 -2.56 -2.50 -1.50
Sobol -1.51 -3.22 -3.14 -2.52 -1.49
Sobol+baker -1.50 -3.41 -3.36 -2.54 -1.50
Sobol+NUS -1.50 -2.95 -2.95 -2.54 -1.52
Korobov -1.87 -2.00 -1.98 -1.98 -1.85
Korobov+baker -1.92 -2.01 -2.02 -2.01 -1.90

MC
LHS
SS
SSA
Sobol
Korobov
Dr
7.35
8.82
13.73
18.12
19.86
13.55
− log10 Var[Ȳn,j ] for n = 221
Xj2 − 13 (Xj − 12 )+ − 18 I[Xj ≤ 31 ] −
7.86
8.93
14.10
17.41
17.51
14.03
6.98
7.61
10.20
10.38
10.36
11.98
1
3
CPU time (sec)

270
992
2334
1576
443
359
59

Example: Pricing an Asian Call Option

S(0) = 100, K = 100, r = 0.05, σ = 0.15, tj = j/52, j = 0, . . . , τ = 13.
RQMC: Sobol’ points with linear scrambling + random digital shift.

aft
Similar results for randomly-shifted lattice + baker’s transform.
log2 Var[µ̂RQMC,n ]

-10
n−1
crude MC
-20
RQMC sequential
-30

-40

8
Dr10 12 14 16 18 20
array-RQMC, split sort
n−2
log2 n
60

Example: Asian Call Option

Var[Ȳn,j ] ≈ O(nα ).

aft
VRF compared with MC, for n = 220 .
CPU time for m = 100 replications.

Sort RQMC points α VRF CPU (sec)

Batch sort SS -1.38 2.0 × 102 744
(n1 = n2 ) Sobol -2.03 4.2 × 106 532
Sobol+NUS -2.03 2.8 × 106 1035
Korobov+baker -2.04 4.4 × 106 482

Dr
Hilbert sort
(logistic map)
SS
Sobol
Sobol+NUS
Korobov+baker
-1.55
-2.03
-2.02
-2.01
2.4 × 103
2.6 × 106
2.8 × 106
3.3 × 106
840
534
724
567
61

Array-RQMC for a system of biological and chemical reactions

aft
[Puchhammer, Ben Abdellah, and L. 2022]
We have d types of molecules and K types of reactions:
ck
α1,k S1 + ⋯ + αd,k Sd Ð→ β1,k S1 + ⋯ + βd,k Sd , αi,k , βi,k ∈ N0 , k = 1, . . . , K.

Let X(t) = (X1 (t), . . . , Xd (t)) where Xk (t) is the number of molecules of type k at time
t. The process {X(t), t ≥ 0} is modeled as a continuous-time Marlov chain (CTMC).
The propensity (or occurrence rate) of reaction k at time t is ak (X(t)) = ck hk (X(t)).

Dr
61

Array-RQMC for a system of biological and chemical reactions

a given time T . Dr
We discretize the time and simulate the discrete-time chain to estimate say E[g(X(T )] for

For Array-RQMC, finding good sorting methods is an important issue.

Example with 6 molecule types and 6 reactions types

aft
c1
Ð→
PKA + 2cAMP ←Ð PKA-cAMP2 ,
c2
c3
Ð→
PKA-cAMP2 + 2cAMP ←Ð PKA-cAMP4 ,
c4
c5
Ð→
PKA-cAMP4 ←Ð PKAr + 2PKAc.
c6

Dr
We simulate 256 time steps.
The state has 6 dimensions, and we need 6 random numbers per step.
Classical RQMC requires 256 × 6 = 1536-dimensional points.
Array-RQMC requires 7 or 12-dimensional RQMC points.
63

Example with 6 molecule types and 6 reactions types

aft
Here we estimate the expected number of PKAr molecules at time T .
Estimated convergence rate β̂ and variance reduction factor for n = 219 points:

Sort Sampling β̂ vrf19 eif19

MC 1.03 1 1
RQMC 1.17 39 45
Latice+shift 1.42 3634 1550
OSLAIF

Dr
Batch-sort-6
Sobol+LMS+ds
Latice+shift
Sobol+LMS+ds
1.47
1.62
1.46
2062
3026
2753
1059
1560
1749
64

Array-RQMC: Some generalizations

aft
L., Lécot, and Tuffin [2008]: τ can be a random stopping time w.r.t. the filtration
F{(j, Xj ), j ≥ 0}.

L., Demers, and Tuffin [2006, 2007]: Combination with splitting techniques (multilevel and
without levels), combination with importance sampling and weight windows. Covers particle
filters.

L. and Sanvido [2010]: Combination with coupling from the past for exact sampling.

stopping problems.
Dr
Dion and L. [2010]: Combination with approximate dynamic programming and for optimal

Gerber and Chopin [2015]: Sequential QMC.

Array-RQMC: Convergence results

aft
L., Lécot, and Tuffin [2006, 2008]: Special cases: convergence at MC rate, one-dimensional,
stratification, etc. O(n−3/2 ) variance.

Lécot and Tuffin [2004]: Deterministic, one-dimension, discrete state.

El Haddad, Lécot, L. [2008, 2010]: Deterministic, multidimensional. O(n−1/(ℓ+1) ) worst-case error

under some conditions.

Dr
Fakhererredine, El Haddad, Lécot [2012, 2013, 2014]: LHS, stratification, Sudoku sampling, ...

L., Lécot, Munger, and Tuffin [2016]: Survey, comparing sorts, and further examples, some with
O(n−3 ) empirical variance.
66

Array-RQMC: Some applications and examples

aft
L., Lécot, and Tuffin [2008]: Several examples.

Wächter and Keller [2008]: Applications in computer graphics.

Gerber and Chopin [2015]: Sequential QMC (particle filters), Owen nested scrambling and Hilbert
sort. o(n−1 ) variance.

Ben Abdellah, L., Puchhammer [2019]: Option Pricing Under Stochastic Volatility Models.

Dr
Puchhammer, Ben Abdellah, L. [2020]: Simulation of Biological and Chemical Reactions.

Arnold, Chen, Sha [2021]: Policy Gradient Methods for Reinforcement Learning.
67

Conclusion, discussion, etc.

▸ RQMC can improve the accuracy of estimators considerably in some applications.

aft
▸ Cleverly modifying the function f can often bring huge statistical efficiency
improvements in simulations with RQMC.
▸ There are often many possibilities for how to change f to make it smoother, periodic,
and reduce its effective dimension.
▸ Point set constructions should be based on discrepancies that take that into account.
▸ Nonlinear functions of expectations: RQMC also reduces the bias.
▸ RQMC for density estimation.

Dr
▸ RQMC for optimization.
▸ Array-RQMC and other QMC methods for Markov chains. Sequential RQMC.
▸ Still a lot to learn and do ...
▸ Software support still very limited in general!

Empirical Process Theory in Time Series Analysis
No ratings yet
Empirical Process Theory in Time Series Analysis
6 pages
High-Dimensional Numerical Integration Techniques
No ratings yet
High-Dimensional Numerical Integration Techniques
28 pages
Monte Carlo Integration Techniques
No ratings yet
Monte Carlo Integration Techniques
41 pages
Introduction to Uncertainty Quantification
No ratings yet
Introduction to Uncertainty Quantification
77 pages
Key Probability Inequalities and Theorems
No ratings yet
Key Probability Inequalities and Theorems
12 pages
Monte Carlo Integration with Subtraction
No ratings yet
Monte Carlo Integration with Subtraction
19 pages
Asymptotic Integral Approximations
No ratings yet
Asymptotic Integral Approximations
325 pages
Quasi Monte Carlo Integration Techniques
No ratings yet
Quasi Monte Carlo Integration Techniques
24 pages
Numerical Probability ch4
No ratings yet
Numerical Probability ch4
38 pages
Bounded Variation
No ratings yet
Bounded Variation
30 pages
On The Distribution of Integration Error by Randomly-Shifted Lattice Rules - L'Ecuyer (2010)
No ratings yet
On The Distribution of Integration Error by Randomly-Shifted Lattice Rules - L'Ecuyer (2010)
44 pages
Lattice Reductions: SIS and LWE Insights
No ratings yet
Lattice Reductions: SIS and LWE Insights
13 pages
Poisson Mean Estimation with Known Errors
No ratings yet
Poisson Mean Estimation with Known Errors
10 pages
Prague Summer School: Discrepancy Theory
No ratings yet
Prague Summer School: Discrepancy Theory
6 pages
Stein's Method for Markov Chains
No ratings yet
Stein's Method for Markov Chains
16 pages
Monte Carlo Simulation and Stratification
No ratings yet
Monte Carlo Simulation and Stratification
56 pages
Monte Carlo Simulation Techniques in Python
No ratings yet
Monte Carlo Simulation Techniques in Python
5 pages
Kolmogorov-Smirnov Distance Insights
No ratings yet
Kolmogorov-Smirnov Distance Insights
3 pages
Non-Uniform Riemann Itô Integral Methods
No ratings yet
Non-Uniform Riemann Itô Integral Methods
20 pages
Statistics 201A: Midterm Review Notes
No ratings yet
Statistics 201A: Midterm Review Notes
15 pages
Stein's Method for Gaussian Approximations
No ratings yet
Stein's Method for Gaussian Approximations
26 pages
Variance Reduction in Monte Carlo Methods
No ratings yet
Variance Reduction in Monte Carlo Methods
73 pages
Heat Kernels and Harnack Inequalities
No ratings yet
Heat Kernels and Harnack Inequalities
90 pages
Random Walks in One-Dimensional Diffusion
No ratings yet
Random Walks in One-Dimensional Diffusion
27 pages
Importance Sampling Techniques Explained
No ratings yet
Importance Sampling Techniques Explained
13 pages
Quasi Monte Carlo Integration Techniques
No ratings yet
Quasi Monte Carlo Integration Techniques
20 pages
Large Deviation Principles Explained
No ratings yet
Large Deviation Principles Explained
84 pages
Concentration Inequalities Overview
No ratings yet
Concentration Inequalities Overview
22 pages
Wavelet Monte Carlo for Integral Equations
No ratings yet
Wavelet Monte Carlo for Integral Equations
11 pages
1950 - Paper - Information Retrieval Viewed As Temporal Signalling
No ratings yet
1950 - Paper - Information Retrieval Viewed As Temporal Signalling
12 pages
Monte-Carlo Methods in Finance Explained
No ratings yet
Monte-Carlo Methods in Finance Explained
10 pages
Variance Inequality for U-Statistics
No ratings yet
Variance Inequality for U-Statistics
15 pages
Statistical Properties of Lorentz Gas
No ratings yet
Statistical Properties of Lorentz Gas
20 pages
Monte Carlo Variance Reduction Techniques
No ratings yet
Monte Carlo Variance Reduction Techniques
9 pages
CS203 PartIV Lecture Notes
No ratings yet
CS203 PartIV Lecture Notes
15 pages
Multilevel Monte Carlo Methods Explained
No ratings yet
Multilevel Monte Carlo Methods Explained
10 pages
Doob's Maximal Lp Inequality Explained
No ratings yet
Doob's Maximal Lp Inequality Explained
15 pages
Functional Central Limit Theorem Overview
No ratings yet
Functional Central Limit Theorem Overview
38 pages
Delta Method in Statistics Explained
No ratings yet
Delta Method in Statistics Explained
10 pages
KTTT Log
No ratings yet
KTTT Log
4 pages
MIT18 175S14 Lecture16
No ratings yet
MIT18 175S14 Lecture16
13 pages
High-Dimensional Inequalities Survey
No ratings yet
High-Dimensional Inequalities Survey
69 pages
Characterizing Weibull Geometric Distribution
No ratings yet
Characterizing Weibull Geometric Distribution
10 pages
Spectral Analysis of Quadratic Variation
No ratings yet
Spectral Analysis of Quadratic Variation
10 pages
Ilovepdf Merged
No ratings yet
Ilovepdf Merged
74 pages
Stochastic Approximation Algorithms Explained
No ratings yet
Stochastic Approximation Algorithms Explained
14 pages
Understanding Conditional Distributions
No ratings yet
Understanding Conditional Distributions
5 pages
BS Lec05
No ratings yet
BS Lec05
41 pages
Probability: Concentration Inequalities & WLLN
No ratings yet
Probability: Concentration Inequalities & WLLN
42 pages
Brownian Motion and Stochastic Calculus
No ratings yet
Brownian Motion and Stochastic Calculus
19 pages
Central Limit Theorem for Convex Sets
No ratings yet
Central Limit Theorem for Convex Sets
45 pages
Mogulskii's Theorem in Large Deviations
No ratings yet
Mogulskii's Theorem in Large Deviations
5 pages
Quasi-Monte Carlo Methods Explained
No ratings yet
Quasi-Monte Carlo Methods Explained
68 pages
Analytic Properties of Brownian Motion
No ratings yet
Analytic Properties of Brownian Motion
8 pages
High-Dimensional Semi-Supervised Learning Insights
No ratings yet
High-Dimensional Semi-Supervised Learning Insights
42 pages
Continuous Random Variables Overview
No ratings yet
Continuous Random Variables Overview
67 pages
Read Probability Statistics and Random Processes For Engineers 4th Edition Stark Solutions Manual Full Ebook Today
100% (4)
Read Probability Statistics and Random Processes For Engineers 4th Edition Stark Solutions Manual Full Ebook Today
193 pages
Markov Chain Monte Carlo Techniques
100% (1)
Markov Chain Monte Carlo Techniques
69 pages
Kendall
No ratings yet
Kendall
3 pages
McDonald's Total Rewards Strategy Overview
No ratings yet
McDonald's Total Rewards Strategy Overview
14 pages
Karnataka University Corporate Accounting Exam Guide
No ratings yet
Karnataka University Corporate Accounting Exam Guide
7 pages
Effective Engineering Presentation Techniques
No ratings yet
Effective Engineering Presentation Techniques
30 pages
5.56 Rifle Ammo Item Details
No ratings yet
5.56 Rifle Ammo Item Details
3 pages
Essence of Indian Traditional Knowledge
No ratings yet
Essence of Indian Traditional Knowledge
33 pages
Porn King The Autobiography of John C Holmes John Holmes Premium Version
100% (3)
Porn King The Autobiography of John C Holmes John Holmes Premium Version
183 pages
NRC 2025 Abstract Submission Call
No ratings yet
NRC 2025 Abstract Submission Call
5 pages
Bigeye Thresher Shark in Gulf of Mexico
No ratings yet
Bigeye Thresher Shark in Gulf of Mexico
6 pages
Dexamethasone Eye Drops - Monograph
No ratings yet
Dexamethasone Eye Drops - Monograph
4 pages
High-Yield Topics for UPSC EPFO/APFC Exam
No ratings yet
High-Yield Topics for UPSC EPFO/APFC Exam
3 pages
HCI Exam Questions and Answers Guide
No ratings yet
HCI Exam Questions and Answers Guide
4 pages
Robotics Overview by Billal Khan
No ratings yet
Robotics Overview by Billal Khan
43 pages
Bài Tập Câu Bị Động Lớp 10
No ratings yet
Bài Tập Câu Bị Động Lớp 10
12 pages
Acceptance Letter for DIT Electromobility Program
No ratings yet
Acceptance Letter for DIT Electromobility Program
12 pages
Gus Malzahn's Tulsa Offense Manual
93% (30)
Gus Malzahn's Tulsa Offense Manual
150 pages
Biochemistry Review Questions for CHE 212
No ratings yet
Biochemistry Review Questions for CHE 212
3 pages
Foundation Fine Arts Resource Guide
No ratings yet
Foundation Fine Arts Resource Guide
8 pages
Acne Scars: Classification & Treatment
100% (23)
Acne Scars: Classification & Treatment
16 pages
AI Transforming Smart Cities
No ratings yet
AI Transforming Smart Cities
2 pages
CCB 231 Materials Science Exam Paper
100% (1)
CCB 231 Materials Science Exam Paper
6 pages
Top Inventions That Changed Humanity
No ratings yet
Top Inventions That Changed Humanity
3 pages
Business Guide for ESPC Exhibitors in Ivory Coast
No ratings yet
Business Guide for ESPC Exhibitors in Ivory Coast
136 pages
Culturally Responsive Teaching Strategies
No ratings yet
Culturally Responsive Teaching Strategies
3 pages
Sheltech Ceramics Inventory Structure
No ratings yet
Sheltech Ceramics Inventory Structure
1 page
The Neural Basis of Improved Cognitive Performance by Threat of Shock
No ratings yet
The Neural Basis of Improved Cognitive Performance by Threat of Shock
10 pages
AEDT Icepak Model Building Guide
No ratings yet
AEDT Icepak Model Building Guide
50 pages
9702 Physics March 2026 Mark Scheme 33
No ratings yet
9702 Physics March 2026 Mark Scheme 33
13 pages
JL-15KW Portable Induction Heater
No ratings yet
JL-15KW Portable Induction Heater
2 pages
Level of Service Calculation Overview
No ratings yet
Level of Service Calculation Overview
7 pages
Question Tags Worksheet and Exercises
No ratings yet
Question Tags Worksheet and Exercises
2 pages