0% found this document useful (0 votes)
3 views102 pages

Etics24 RQMC Tutorial B

The document introduces randomized quasi-Monte Carlo (RQMC) methods for simulation, focusing on estimating integrals using deterministic and randomized point sets. It discusses the concepts of discrepancies, error bounds, and variance reduction techniques in the context of lattice rules and Fourier expansions. Additionally, it highlights the importance of smoothness in functions for achieving faster convergence rates in RQMC applications.

Uploaded by

Said Saterih
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views102 pages

Etics24 RQMC Tutorial B

The document introduces randomized quasi-Monte Carlo (RQMC) methods for simulation, focusing on estimating integrals using deterministic and randomized point sets. It discusses the concepts of discrepancies, error bounds, and variance reduction techniques in the context of lattice rules and Fourier expansions. Additionally, it highlights the importance of smoothness in functions for achieving faster convergence rates in RQMC applications.

Uploaded by

Said Saterih
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

1

Introduction to randomized quasi-Monte Carlo

aft
methods in simulation

Part B

Pierre L’Ecuyer

Dr Université de Montréal, Canada

ETICS, Saissac, September 2024


2

aft
QMC/RQMC Theory

Dr
3

Short Recap
We want to estimate µ = ∫[0,1)s f (u)du where U = (U0 , . . . , Us−1 ) and the Uj are i.i.d. U(0, 1).

aft
Monte Carlo (MC) estimator:
1 n−1
µ̂n = ∑ f (Ui )
n i=0
where U0 , . . . , Un−1 i.i.d. uniform over [0, 1)s .
We have E[µ̂n ] = µ and Var[µ̂n ] = σ 2 /n = Var[f (U)]/n.
Quasi-Monte Carlo (QMC) approximation:

1 n−1
µ̄n = ∑ f (ui )
n i=0

Dr
where Pn = {u0 , . . . , un−1 } are deterministic points that cover [0, 1)s very evenly.
Randomized Quasi-Monte Carlo (RQMC) estimator:

µ̂n,rqmc =
1 n−1
∑ f (Ui ),
n i=0

where {U0 , . . . , Un−1 } ⊂ (0, 1)s are randomized versions of the points of Pn so that still cover the space
evenly while each Ui is uniform over [0, 1)s .
4

How should we measure the uniformity of Pn ?


We want a measure that can translate into error or variance bounds.

aft
For two arbitrary points a < b ∈ (0, 1)s , the rectangular box B = [a, b) with corners at a and
b is an s-dimensional interval. Let vol[a, b) denote the volume of this box and ∣Pn ∩ [a, b)∣/n
be the proportion of the points that it contains. We call the absolute difference
disc([a, b), Pn ) = ∣vol[a, b) − ∣Pn ∩ [a, b)∣/n∣ the local discrepancy for that interval.

1
u2
b2

Dr a2
0 a1 b1 1 u1

To measure the discrepancy between the distribution of Pn and the uniform distribution, we
can take the sup or average of disc([a, b), Pn ) over the intervals B.
5
Classical discrepancies
Extreme discrepancy of Pn : D(Pn ) = sup disc([a, b), Pn ).
[a,b)⊆[0,1)s

aft
Star discrepancy of Pn : D∗ (Pn ) = sup disc([0
0, b], Pn ).
b∈[0,1)s

These two discrepancies behave in the same way when n → ∞:


Theorem. D∗ (Pn ) ≤ D(Pn ) ≤ 2s D∗ (Pn ).

The sup can also be replaced by a more general Lp norm, for 1 ≤ p < ∞.
Lp star discrepancy of Pn :


D(p) (Pn ) = (∫
Dr
[0,1)s
(disc([0
0, u), Pn ))p du)

and similarly for the Lp extreme discrepancy.


1/p

For large s and n, D∗ (Pn ) is too hard to compute, but D(2)



≤ D∗ (Pn ) ≤ cs,p [D(p)

(Pn ) can be computed in


O(sn ) operations for an arbitrary Pn via a formula from Warnok.
2
(Pn )]p/(p+s) ,
6

Is D∗ (Pn ) really meaningful?


Let P∞ = {u1 , u2 , . . . } be an infinite (deterministic) sequence of points and Pn denote the

aft
first n points of this sequence. P∞ is called uniformly distributed if D∗ (Pn ) → 0 as n → ∞.
Theorem (Weyl 1916). When n → ∞, ∣En ∣ → 0 for all bounded Riemann-integrable
functions f if and only if D∗ (Pn ) → 0.

Dr
6

Is D∗ (Pn ) really meaningful?


Let P∞ = {u1 , u2 , . . . } be an infinite (deterministic) sequence of points and Pn denote the

aft
first n points of this sequence. P∞ is called uniformly distributed if D∗ (Pn ) → 0 as n → ∞.
Theorem (Weyl 1916). When n → ∞, ∣En ∣ → 0 for all bounded Riemann-integrable
functions f if and only if D∗ (Pn ) → 0.

Classical Koksma-Hlawka inequality (Koksma(1942) for s = 1, Hlawka(1961) for s > 1):

∣En ∣ ≤ Vhk (f ) ⋅ D∗ (Pn )

Dr
where Vhk (f ) denotes the total variation of f in the sense of Hardy and Krause;
see Kuipers and Niederreiter (1974) for a definition.
Vhk (f ) < ∞ implies no discontinuity not aligned with the axes.

Can we easily bound the discrepancy of a Pn ?


7

Erdõs-Turán-Koksma Inequality. For each s, there is a constant cs such that for all Pn ,
R RRR
⎛ 1 1 RRRR 1 n−1 2πiht ui RRR⎞

aft
∗ RR ∑ e
D (Pn ) ≤ cs + ∑ ∏ RRR
⎝ m + 1 h∈Zs (m) j =/ 0 ∣hj ∣ RRRR n i=0 RRR⎠
R

for all integers m ≥ 1, where Zs (m) = {h = (h1 , . . . , hs ) ∈ {−m, . . . , m}s } and i = −1.
This inequality permits one to construct explicit low-discrepancy sequences by bounding the
exponential sum. It is a convenient tool.
It was used for example to prove that the digital sequence constructions by Sobol’, Faure,
Niederreiter, etc., have a discrepancy that converges as O(n−1 (log n)s ). This implies that

Dr
with these point sets, if Vhk (f ) < ∞, then

∣En ∣ ≤ Vhk (f ) ⋅ D∗ (Pn ) = O(n−1 (log n)s ) .


7

Erdõs-Turán-Koksma Inequality. For each s, there is a constant cs such that for all Pn ,
R RRR
⎛ 1 1 RRRR 1 n−1 2πiht ui RRR⎞

aft
∗ RR ∑ e
D (Pn ) ≤ cs + ∑ ∏ RRR
⎝ m + 1 h∈Zs (m) j =/ 0 ∣hj ∣ RRRR n i=0 RRR⎠
R

for all integers m ≥ 1, where Zs (m) = {h = (h1 , . . . , hs ) ∈ {−m, . . . , m}s } and i = −1.
This inequality permits one to construct explicit low-discrepancy sequences by bounding the
exponential sum. It is a convenient tool.
It was used for example to prove that the digital sequence constructions by Sobol’, Faure,
Niederreiter, etc., have a discrepancy that converges as O(n−1 (log n)s ). This implies that

Dr
with these point sets, if Vhk (f ) < ∞, then

∣En ∣ ≤ Vhk (f ) ⋅ D∗ (Pn ) = O(n−1 (log n)s ) .

For lattice rules and digital nets, we will rather use other variants of this inequality that do
not involve D∗ (Pn ), but other discrepancies that are much easier to compute.
8

Error and variance expressions for lattice rules


Suppose f has Fourier expansion

aft
t
f (u) = ∑ fˆ(h)e 2πih u for u ∈ [0, 1)s ,
h∈Zs

with Fourier coefficients


fˆ(h) = ∫ f (u) exp(−2πi h ⋅ u) du.
(0,1)s

Note that
t
s s
e 2πih u = ∏ e 2πihj uj = ∏[cos(2πhj uj ) + i sin(2πhj uj )] (by Euler formula),
j=1

Dr j=1

so this expansion is in terms of one-periodic trigonometric functions over the real space.
It is defined everywhere, just by making copies of its definition over [0, 1)s .
When using standard MC to estimate µ = E[f (U)], the MC variance is
t
Var[f (U)] = ∑ ∣fˆ(h)∣2 Var[e 2πih U ] = ∑ ∣fˆ(h)∣2 .
h∈Zs h∈Zs
9

Suppose now that we estimate µ = E[f (U)] using a lattice rule with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual lattice to Ls is

aft
L∗s = {h ∈ Rs ∶ h t v ∈ Z for all v ∈ Ls } ⊆ Zs .

One can show that in this case,



1 n−1 2πiht ui ⎪
⎪1 if h ∈ L∗s ;
∑e =⎨
n i=0 ⎪

⎩0 otherwise.

For a deterministic lattice rule, if ∑ ∣fˆ(h)∣ < ∞, then En = ∑ fˆ(h).

Dr h∈Zs 0=
/ h∈L∗s
9

Suppose now that we estimate µ = E[f (U)] using a lattice rule with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual lattice to Ls is

aft
L∗s = {h ∈ Rs ∶ h t v ∈ Z for all v ∈ Ls } ⊆ Zs .

One can show that in this case,



1 n−1 2πiht ui ⎪
⎪1 if h ∈ L∗s ;
∑e =⎨
n i=0 ⎪

⎩0 otherwise.

For a deterministic lattice rule, if ∑ ∣fˆ(h)∣ < ∞, then En = ∑ fˆ(h).

Dr h∈Zs

For a randomly shifted lattice, Var[µ̂n,rqmc ] = ∑ ∣fˆ(h)∣2 .


0=
/ h∈L∗s
0=
/ h∈L∗s

From the viewpoint of variance reduction, an optimal lattice for f minimizes Var[µ̂n,rqmc ].
Roughly, the dual lattice should not contain the large (absolute) Fourier coefficients.
10

For a integer α > 0, a function f has smoothness α if it has square-integrable mixed partial
derivatives up to order α. If f has smoothness α and the periodic continuation of its

aft
derivatives up to order α − 1 is continuous across the unit cube boundaries, then

∣fˆ(h)∣ = O((max(1, h1 )⋯ max(1, hs ))−α ).

Moreover, there always exists a rank-1 lattice (a vector v1 = v1 (n) = a(n)/n) such that

Pα ∶= ∑ (max(1, h1 )⋯ max(1, hs ))−α = O(n−α+ϵ ),


0=
/ h∈L∗s

which gives Var[µ̂n,rqmc ] = O(n−2α+ϵ ), for any ϵ > 0.

Dr
Pα is the error and P2α is the variance for a (worst-case) function f for which

fˆ(h) = (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣))−α .

A larger α means a smoother f and a faster convergence rate.


11
When α is a positive integer, this worst-case f is
(2π)α
f ∗ (u) = ∑ ∏ Bα (uj )
α!

aft
u⊆{1,...,s} j∈u

and we have
∣u∣
1 n−1 −(−4π 2 )α
P2α = ∑ ∑[ ] ∏ B2α (ui,j )
∅≠u⊆{1,...,s} n i=0 (2α)! j∈u
where Bα is the Bernoulli polynomial of degree α.
In particular, B1 (u) = u − 1/2 and B2 (u) = u 2 − u + 1/6.
Can compute this P2α in O(sn) operations. Easy to search for good lattices in this case!
P2α has been proposed long ago as a figure of merit.

Dr
Beware: in many “older” papers, α is replaced by α/2.
Worst-case function may not be representative of what happens in applications.
Also, the hidden factor in O increases quickly with s.
To get a bound that is uniform in s, the Fourier coefficients must decrease fast enough
with the dimension and “size” of vectors h. This is typically what happens in applications
for which RQMC is effective!
12

Baker’s (or tent) transformation


In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2

.
1
12

Baker’s (or tent) transformation


In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2

.
1
12

Baker’s (or tent) transformation


In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2

.
1
12

Baker’s (or tent) transformation


In applications, the periodic continuation of f is often not continuous at the boundary of
[0, 1)s (i.e., over the torus). A simple transformation can make it continuous:

aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).

Dr 0
1/2
1

For smooth f , it can reduce the variance from O(n−2+ϵ ) to O(n−4+ϵ ) (Hickernell 2002).
The resulting f˜ is symmetric with respect to u = 1/2.
In practice, we transform the points Ui instead of f .
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1

Dr
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1
U/n

Dr
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1

Dr
13

One-dimensional case

aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].

0 0.5 1

Dr
Gives locally antithetic points in intervals of size 2/n.
This implies that linear pieces over these intervals are integrated exactly.
Intuition: when f is smooth, it is well-approximated by a piecewise linear function which is
integrated exactly, so the error is small.
14

High dimension: ANOVA decomposition


For large s, covering the hypercube [0, 1)s evenly requires an astronomical number of points.

aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.

Dr
14

High dimension: ANOVA decomposition


For large s, covering the hypercube [0, 1)s evenly requires an astronomical number of points.

aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.
Fortunately, RQMC can still be effective if f is well approximated by a sum of low-dimensional
functions, because it suffices to have low error or variance for these low-dimensional functions.

Dr
14

High dimension: ANOVA decomposition


For large s, covering the hypercube [0, 1)s evenly requires an astronomical number of points.

aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.
Fortunately, RQMC can still be effective if f is well approximated by a sum of low-dimensional
functions, because it suffices to have low error or variance for these low-dimensional functions.
In general, we can write
s s
f (u) = ∑ fu (u) = µ + ∑ f{i} (ui ) + ∑ f{i,j} (ui , uj ) + ⋯
u⊆{1,...,s} i=1 i,j=1

in a way that the Monte Carlo variance decomposes as

Dr
Var[f (U)] = σ 2 = ∑
u⊆{1,...,s}
σu2 = ∑
u⊆{1,...,s}
Var[fu (U)].

The σu2 ’s can be estimated by MC or RQMC (I skip the details here). Idea: Make sure the
projections Pn (u) of Pn over u are very uniform for the important subsets u (those with a large σu2 ).
When they are mostly lower-dim. projections, then it is much easier to make RQMC effective.
15

Weighted Pγ,α with projection-dependent weights γu


Denote u(h) = u(h1 , . . . , hs ) the set of indices j for which hj =/ 0.

aft
Pγ,α = ∑ γu(h) (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣)) .
−α

/ h∈L∗s
0=

For α integer > 0, with ui = (ui,1 , . . . , ui,s ) = i v1 mod 1,

1 n−1 −(−4π 2 )α
∣u∣
Pγ,2α = ∑ γu ∑[ ] ∏ B2α (ui,j ),
∅≠u⊆{1,...,s} n i=0 (2α)! j∈u

and the corresponding variation is

Vγ,α
2

Dr
(f )=

Var[µ̂n,rqmc ] =


u⊆{1,...,s}
1
∫ ∣
∂ α∣u∣
2 α∣u∣ [0,1]∣u∣ ∂u α u
∅≠u⊆{1,...,s} γu (4π )

for f ∶ [0, 1)s → R smooth enough. Then, we have the variance bound
f (u)∣ du,

Var[µ̂n,rqmc (fu )] ≤ Vγ,α


2
2

(f ) ⋅ Pγ,2α .
16
This Pγ,2α is the RQMC variance for the worst-case function with Vγ,α
2
(f ) ≤ 1:

√ (2π)α
f ∗ (u) =

aft
∑ γu ∏ Bα (uj ),
u⊆{1,...,s} j∈u (α)!

whose Fourier coefficients are

fˆ∗ (h) = γu(h) (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣))−α .

For this worst-case function, we have

(4π 2 )α (4π 2 )α
∣u∣ ∣u∣

Dr
σu2 = γu [Var[Bα (U)]
((α)!)2
] = γu [∣B2α (0)∣

For α = 1, we should take γu = (3/π 2 )∣u∣ σu2 ≈ (0.30396)∣u∣ σu2 .


For α = 2, we should take γu = [45/π 4 ]∣u∣ σu2 ≈ (0.46197)∣u∣ σu2 .
For α → ∞, we have γu → (0.5)∣u∣ σu2 .
The ratios weight / variance should decrease exponentially with ∣u∣.
(2α)!
] .
17

Heuristics for choosing the weights


Firstly, note that all one-dimensional projections (before random shift) are the same. So

aft
the weights γu for ∣u∣ = 1 are irrelevant.

Dr
17

Heuristics for choosing the weights


Firstly, note that all one-dimensional projections (before random shift) are the same. So

aft
the weights γu for ∣u∣ = 1 are irrelevant.

For f ∗ , take γu = ρ∣u∣ σu2 for a constant ρ, but there are 2s − 1 subsets u to consider!

Dr
17

Heuristics for choosing the weights


Firstly, note that all one-dimensional projections (before random shift) are the same. So

aft
the weights γu for ∣u∣ = 1 are irrelevant.

For f ∗ , take γu = ρ∣u∣ σu2 for a constant ρ, but there are 2s − 1 subsets u to consider!

One could define a simple parametric model and estimate the parameters by matching the
ANOVA variances σu2 . Some popular model types for the weights:
Order-dependent weights: γu = Γ∣u∣ depends only on ∣u∣.
It suffices to select Γ1 , Γ2 , Γ3 , etc.

Dr
Often, one would take Γj = 0 for j > k for some integer k (k parameters to select).
Or Γj = ρj−1 for all j (a single parameter to select).
Product weights: γu = ∏j∈u γj where γj ≥ 0 is the weight for coordinate j.
Mixture: POD weights γu = Γ∣u∣ ∏j∈u γj , SPOD weights, etc.
18

Notion of effective dimension

aft
(Caflisch, Morokoff, and Owen 1997).
A function f has effective dimension d in proportion ρ in the superposition sense if

∑ σu ≥ ρσ .
2 2

∣u∣≤d

It has effective dimension d in proportion ρ in the truncation sense if

∑ σu2 ≥ ρσ 2 .

Dr u⊆{1,...,d}

High-dimensional functions with low effective dimension are frequent.


One may change f to make this happen.
19

ANOVA for estimator of P[T > x] in Stochastic Activity Network

aft
Stochastic Activity Network

x = 64

x = 100

CMC, x = 64

Dr
CMC, x = 100
0 20 40 60 80
% of total variance for each cardinality of u
100
20

Searching for good parameters


Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.

Dr
20

Searching for good parameters


Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.

Dr
20

Searching for good parameters


Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.
Component by component (CBC) construction. (Sloan, Kuo, etc.).
Let a1 = 1;
For j = 2, 3, . . . , s, find z ∈ {1, . . . , n − 1}, gcd(z , n) = 1, such that
(a1 , a2 , . . . , aj = z) minimizes D(Pn ({1, . . . , j})).

Dr
Fast CBC construction for Pγ,α : use FFT. (Nuyens, Cools).
20

Searching for good parameters


Korobov lattices. Search over all admissible a, for a = (1, a, a2 , . . . , ...).

aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.
Component by component (CBC) construction. (Sloan, Kuo, etc.).
Let a1 = 1;
For j = 2, 3, . . . , s, find z ∈ {1, . . . , n − 1}, gcd(z , n) = 1, such that
(a1 , a2 , . . . , aj = z) minimizes D(Pn ({1, . . . , j})).

Let a1 = 1; Dr
Fast CBC construction for Pγ,α : use FFT. (Nuyens, Cools).
Randomized CBC construction.

For j = 2, . . . , s, try r random z ∈ {1, . . . , n − 1}, gcd(z , n) = 1,


and retain (a1 , a2 , . . . , aj = z) that minimizes D(Pn ({1, . . . , j})).
21

Quantiles of figure of merit


We computed Pγ,2 for order-dependent weights with Γj = 0.3j/2 for all j, for n = 26 , ..., 219 , for all
admissible vectors a = (1, a2 , . . . , as ) with odd aj .

aft
For s = 2, a linear regression of log Pγ,2 vs log n for 212 ≤ n ≤ 219 gives a decreasing rate near
O(n−1.9 ) for the best and O(n−1 ) for the mean. The worst value is with a = (1, 1) for all n.
The median is close to the best, which means that the majority of vectors a are quite good!

worst
10−1 mean
90 %
median
merit value

10−4 10 %

10−7

10−10
2 6
Dr
s =2

28 210 212
n
214 216 218
best
22

For s = 3, a linear regression log Pγ,2 vs log n for 210 ≤ n ≤ 215 gives decreasing rates near
O(n−1.75 ) for the best and O(n−1 ) for the mean.

aft
Again, the median is better than the mean.

100 worst
mean
90 %
10−2 median
merit value

10 %
best
10−4

10−6

26
Dr
s =3

28 210
n
212 214 216
23

Example: Playing with the Weights

aft
To see the effect of weights selection on RQMC variance, when choosing a lattice rule, we
shall integrate the worst-case function
√ (2π)α
fα∗ (u) = ∑ vu ∏ Bα (uj ),
u⊆{1,...,s} j∈u (α)!

whose RQMC variance is Pγ,2α .


The ideal weights are γu = vu . In our experiments, we measure the inflation factor of RQMC

Dr
variance when we use a lattice rule constructed via fast CBC with different weights γu ≠ vu .
We start with s = 10 and vu = Γ∣u∣ for ∣u∣ ≤ k, for a given integer k and a given constant
Γ > 0. We select the weights as γu = Γ̃∣u∣ for ∣u∣ ≤ k̃, where Γ̃ and k̃ may differ from Γ and k.
24

Ratio of RQMC variances for modified vs ideal weights

aft
n A1 A2 B1 B2 C1 C2
28 1.11 1.21 1.13 4.08 3.82 6.80
29 1.21 1.10 1.42 10.5 2.93 7.25
210 1.36 1.38 2.04 4.64 2.86 5.94
211 1.24 1.43 2.40 6.18 2.15 5.14
212 1.42 1.66 3.79 13.2 2.47 5.94
213 1.30 2.38 5.51 9.09 2.66 5.97
214 1.51 2.54 30.5 8.66 9.11 29.1

Dr
215
216
1.46
1.80
1.93
2.55
25.6
3.13
13.3
12.9

A1: k = k̃ = s, Γ2 = 0.1 and Γ̃2 = 0.001. Not too bad.


A2: k = k̃ = s, Γ2 = 0.001 and Γ̃2 = 0.1. More impact.
3.52
2.73
9.71
10.2
25

aft
B1: Γ2 = Γ̃2 = 0.1, k = 4 and k̃ = 2. Criterion is blind on projections of order 3 and 4. This
has unpredictable (sometimes dramatic) impact on RQMC variance.

B2: Γ2 = Γ̃2 = 0.5, k = 2 and k̃ = 4. Gives weight to irrelevant projections. Stronger


degradation on average.

Dr
26

C1: Γ2 = Γ̃2 = 0.1 and k = k̃ = 4, but increase the variation of f by replacing vu2 with
vu2 + ṽu2 , with

aft
ṽu2 = 1.0 for u = {1, 3}, {3, 5}, {5, 7}, {7, 9},
ṽu2 = 0.5 for u = {2, 3, 4}, {4, 5, 6}, {6, 7, 8}, {8, 9, 10},
ṽu2 = 0.25 for u = {1, 2, 3, 4}, {4, 5, 6, 7}, {7, 8, 9, 10},
and ṽu = 0 for all other u.
Important projections are not given enough weight relative to others.

C2: Like C1, but vu2 is replaced with only ṽu2 , as defined above. Many irrelevant projections,
with ṽu = 0, now have nonzero weights.

Dr
In all cases, the variance ratio increases (non-monotonically) with n.

The worst degradations are usually observed when we give too much weight to irrelevant
projections!
27

Discrepancies and Bounds with Projection-Dependent Weights


There are many other ways of defining the variation and the discrepancy. This is usually

aft
done by selecting a Reproducing-Kernel Hilbert Space of functions. The choice of kernel
determines D and V.

In general, we can obtain RQMC variance bounds via Hölder-type inequality:

Var[µ̂n,rqmc ] ≤ (D(Pn ) ⋅ V(f ))2 where

1/p 1/q
⎛ ⎞ ⎛ ⎞
V(f ) = ∑

Dr
⎝∅/=u⊆{1,2,...,s}
(V(fu )/γu )p

, D(Pn ) = ∑
⎝∅/=u⊆{1,2,...,s}
(γu Du (Pn ))q

1/p + 1/q = 1, γu ≥ 0 is a weight assigned to the subset u, V(fu ) is the variation of fu , and
Du (Pn ) is the discrepancy of the projection of Pn (u).
Common choice: p = q = 2. Sometimes q = ∞ with p = 1.

,
28
Form of discrepancies commonly used in constructions
Many discrepancies that are typically used to search for good parameters efficiently (e.g., in

aft
LatNet Builder) are as in the previous slide, with

1 n−1
Duq (Pn ) = ∑ ∏ φ(ui,j )
n i=0 j∈u

for some function φ ∶ [0, 1) → R, usually with q = 2. When computing the products for all
subsets u, we can start with the smaller subsets u, then save and re-use the partial products.
For the weighted P2α , we take q = 2 and

Dr φ(ui,j ) = −(−4π 2 )α B2α (ui,j )/(2α)!

For digital nets, we have other choices of φ.

We need software to support (1) search for good point set parameters and
(2) to generate and randomize the points, for these various constructions.
29

LatNet Builder Software


A C++ library offering tools to search for good parameters for lattice rules, polynomial

aft
lattice rules, and digital nets.
Various choices of figures of merit, arbitrary weights, construction methods, etc. Easily
extensible.
For better run-time efficiency, uses static polymorphism, via templates, rather than dynamic
polymorphism. Several techniques to reduce computations and improve speed.
Offers a pre-compiled program with Unix-like command line interface.
Also a graphical interface implemented in Python.

Dr
Available for download on GitHub, with source code and documentation.

Show graphical interface


30

Error and variance expressions for digital nets


Similar development as for lattice rules, but instead of using trigonometric Fourier series,

aft
we use Walsh series for the expansion of f .

Dr
30

Error and variance expressions for digital nets


Similar development as for lattice rules, but instead of using trigonometric Fourier series,

aft
we use Walsh series for the expansion of f .
Let N0 = {0, 1, 2, . . . }. For k = (k1 , . . . , ks ) ∈ Ns0 and u = (u1 , . . . , us ) ∈ [0, 1)s , where
ℓj s ℓj
kj = ∑ kj,ℓ bℓ−1 ∈ N0 and uj = ∑ uj,ℓ b−ℓ ∈ [0, 1), define ⟨k, u⟩ = ∑ ∑ kj,ℓ uj,ℓ mod b.
ℓ=1 ℓ≥1 j=1 ℓ=1

The Walsh expansion in base b of f ∶ [0, 1)s → R is

f (u) = ∑ f˜(k)e 2πi⟨k,u⟩/b ,

with Walsh coefficients Dr f˜(k) = ∫


k∈Ns0

[0,1)s
f (u)e −2πi⟨k,u⟩/b du.
30

Error and variance expressions for digital nets


Similar development as for lattice rules, but instead of using trigonometric Fourier series,

aft
we use Walsh series for the expansion of f .
Let N0 = {0, 1, 2, . . . }. For k = (k1 , . . . , ks ) ∈ Ns0 and u = (u1 , . . . , us ) ∈ [0, 1)s , where
ℓj s ℓj
kj = ∑ kj,ℓ bℓ−1 ∈ N0 and uj = ∑ uj,ℓ b−ℓ ∈ [0, 1), define ⟨k, u⟩ = ∑ ∑ kj,ℓ uj,ℓ mod b.
ℓ=1 ℓ≥1 j=1 ℓ=1

The Walsh expansion in base b of f ∶ [0, 1)s → R is

f (u) = ∑ f˜(k)e 2πi⟨k,u⟩/b ,

with Walsh coefficients Dr f˜(k) = ∫


k∈Ns0

[0,1)s
f (u)e −2πi⟨k,u⟩/b du.

For b = 2, e −2πi⟨k,u⟩/b is either e 0 = 1 or e −πi = cos π = −1.


The Walsh functions are square waves that take values only in {−1, 1}.
31

aft
Dual net
Suppose we estimate µ = E[f (U)] using a digital net in base b with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual net is

Cs∗ = {k ∈ Ns0 ∶ ⟨k, ui ⟩ = 0 for all ui ∈ Pn }.

One can show that



1 n−1 2πi⟨k,ui ⟩/b ⎪⎪1 if k ∈ Cs∗ ;
∑ e = ⎨

Dr n i=0
⎩0 otherwise.
The nonzero digits kj,ℓ of a vector k select a specific set of digits ui,j,ℓ of the points. The
points are equidistributed for those specific digits iff the above sum is 0 for this k, iff k ∈/ Cs∗ .
32

aft
Error and variance expressions
For a deterministic digital net, if ∑ ∣f˜(k)∣ < ∞, then En = ∑ f˜(k).
k∈Ns0 0=
/ k∈Cs∗

Dr
32

aft
Error and variance expressions
For a deterministic digital net, if ∑ ∣f˜(k)∣ < ∞, then En = ∑ f˜(k).
k∈Ns0 0=
/ k∈Cs∗

For a digital net with a random digital shift, Var[µ̂n,rqmc ] = ∑ ∣f˜(k)∣2 .


0=
/ k∈Cs∗

From a variance reduction viewpoint, an optimal digital net for f minimizes this

Dr
Var[µ̂n,rqmc ]. Roughly, the dual net should not contain vectors k that correspond to large
square Walsh coefficients.
33
Bounds on Walsh coefficients
In the following, we assume for simplicity that b = 2.

aft
For a integer α > 0, a function f has smoothness α (roughly) if it has square-integrable
mixed partial derivatives up to order α.
For an integer k = 2a1 + ⋯ + 2aν > 0 with a1 > ⋅ ⋅ ⋅ > aν , we put
µα (k) = (a1 + 1) + ⋯ + (amin(α,ν) + 1), and µα (0) = 0.
ℓ min(α,ℓ )
Equivalently, if kj = ∑ℓ=1
j
kj,ℓ 2ℓ−1 , µα (kj ) = ∑ℓ=1 j ℓI[kj,ℓ > 0].
For a vector k = (k1 , . . . , ks ), let µα (k) = µα (k1 ) + ⋯ + µα (ks ).
Dick (2007, 2008) has shown that if f has smoothness α, then ∣fˆ(k)∣ ≤ K Ṽα2 (f )2−µα (k)

Dr
approximately, where K is a constant and Ṽα (f ) is the variation of f , which depends on the
integrals of the mixed partial derivatives of f of order up to α.
33
Bounds on Walsh coefficients
In the following, we assume for simplicity that b = 2.

aft
For a integer α > 0, a function f has smoothness α (roughly) if it has square-integrable
mixed partial derivatives up to order α.
For an integer k = 2a1 + ⋯ + 2aν > 0 with a1 > ⋅ ⋅ ⋅ > aν , we put
µα (k) = (a1 + 1) + ⋯ + (amin(α,ν) + 1), and µα (0) = 0.
ℓ min(α,ℓ )
Equivalently, if kj = ∑ℓ=1
j
kj,ℓ 2ℓ−1 , µα (kj ) = ∑ℓ=1 j ℓI[kj,ℓ > 0].
For a vector k = (k1 , . . . , ks ), let µα (k) = µα (k1 ) + ⋯ + µα (ks ).
Dick (2007, 2008) has shown that if f has smoothness α, then ∣fˆ(k)∣ ≤ K Ṽα2 (f )2−µα (k)

Dr
approximately, where K is a constant and Ṽα (f ) is the variation of f , which depends on the
integrals of the mixed partial derivatives of f of order up to α.
This suggests using the FOM ∑0=/ k∈Cs∗ 2−µα (k) for QMC and ∑0=/ k∈Cs∗ 2−2µα (k) for RQMC,
for a given α. But these sums have an infinite number of terms!
Dick replaced the sum by a max and showed how to construct digital nets for which
max0=/ k∈Cs∗ 2−µα (k) = O(n−α (log n)αs ), for any integer α > 0, using an interlacing technique.
34
Figures of Merit for Digital Nets
A variety of discrepancies of the form

aft
1/q
⎛ ⎞ 1 n−1
D(Pn ) = ∑ (γu Du (Pn ))q with (Du (Pn ))q = ∑ ∏ φ(ui,j )
⎝∅/=u⊆{1,2,...,s} ⎠ n i=0 j∈u

for some function φ ∶ [0, 1) → R have been defined recently for digital nets, based on
various assumptions on the rate of decrease of the Walsh coefficients, which often follow
from an assumption on the smoothness level α of f .
Some of them can be seen as counterparts of the Pα used for lattices.

Dr
They can be used in the same way to make CBC constructions
One popular choice of φ for b = 2 and α = 1:

φ(u) = 2 − I[u > 0] ⋅ 6 ⋅ 2⌊log2 (u)⌋

where −⌊log2 u⌋ is the index of the first nonzero digit in the binary expansion of u.
Latnet Builder can make searches for good point sets based on such discrepancies.
35

Polynomial lattice rules (in base b = 2)


Similar to lattice rules, except that:

aft
Replace Z by F2 [z], the ring of polynomials over finite field F2 ≡ {0, 1};
Replace R by L2 = F2 ((z −1 )), the field of formal Laurent series over F2 , of the form
∑∞ −ℓ
ℓ=ω xℓ z , where xℓ ∈ F2 . Define ϕ ∶ L → R by

∞ ∞
ϕ ( ∑ xℓ z −ℓ ) = ∑ xℓ b−ℓ .
ℓ=ω ℓ=ω

We select a polynomial modulus Q(z) ∈ Z2 [z] of degree k and a generating vector

Pn = {(ϕ (
Q(z)
Dr
a(z) = (a1 (z), . . . , as (z)) ∈ Z2 [z]s , whose coordinates are polynomials of degrees less than
k having no common factor with Q(z). The point set of cardinality n = 2k is

h(z)a1 (z)
),...,ϕ(
h(z)as (z)
Q(z)
)) ∶ h(z) ∈ Z2 [z] and deg(h(z)) < k} .
36

aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.

Dr
36

aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.

A PLR is actually a special case of a digital net in base b. This can be used to generate
the points efficiently: compute the generating matrices and use the digital net
implementation. This is particularly fast in base b = 2.

Dr
36

aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.

A PLR is actually a special case of a digital net in base b. This can be used to generate
the points efficiently: compute the generating matrices and use the digital net
implementation. This is particularly fast in base b = 2.

Dr
Random shift in space of formal series: equivalent to a random digital shift in base b,
applied to all the points. It preserves equidistribution.
37

aft
More Examples

Dr
38

Example: Pricing a financial derivative.

aft
Market price of some asset (e.g., one share of a stock) evolves in time as stochastic
process {S(t), t ≥ 0} with (supposedly) known probability law (estimated from data).
A financial contract gives owner net payoff g(S(t1 ), . . . , S(td )) at time T = td , where
g ∶ Rd → R, and 0 ≤ t1 < ⋯ < td are fixed observation times.
Under a no-arbitrage assumption, present value of contract at time 0, when S(0) = s0 , is

v (s0 , T ) = E∗ [e −r T g(S(t1 ), . . . , S(td ))] ,

Dr
where E∗ is under a risk-neutral measure and e −r T is the discount factor.
This expectation can be written as an integral over [0, 1)s for some s, and estimated by the
average of n i.i.d. replicates of X = e −r T g(S(t1 ), . . . , S(td )).
39

A simple model for S: geometric Brownian motion (GBM):

aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),

r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.

Dr
39

A simple model for S: geometric Brownian motion (GBM):

aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),

r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.
Here, the vector Y = (X(t1 ), . . . , X(td )) has a multivariate normal distribution with some mean
vector µ and covariance matrix Σ .
To generate Y , we decompose Σ = AAt , and put Y = AZ where Z ∼ N (0
0, I).

Dr
Possible decompositions: Cholesky, Brownian bridge, PCA, etc.
Choice does not matter for MC but does matter for RQMC.
39

A simple model for S: geometric Brownian motion (GBM):

aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),

r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.
Here, the vector Y = (X(t1 ), . . . , X(td )) has a multivariate normal distribution with some mean
vector µ and covariance matrix Σ .
To generate Y , we decompose Σ = AAt , and put Y = AZ where Z ∼ N (0
0, I).

Dr
Possible decompositions: Cholesky, Brownian bridge, PCA, etc.
Choice does not matter for MC but does matter for RQMC.

More general: X(t) can also be a vector in c dimensions, so s = cd.


This can apply more generally to a function of a multinormal vector, in many areas.
40

aft
Example of contract: Discretely-monitored Asian call option:

⎛ 1 d ⎞
g(S(t1 ), . . . , S(td )) = max 0, ∑ S(tj ) − K .
⎝ d j=1 ⎠

Numerical illustration: s = d = 12, T = 1 (one year), tj = j/12 for j = 0, . . . , 12, K = 100,


s0 = 100, r = 0.05, σ = 0.5.

Dr
41

ANOVA Variances for ordinary Asian Option


Asian Option with S(0) = 100, K = 100, r = 0.05, σ = 0.5

aft
d = 3, seq.

d = 3, BB

d = 3, PCA

d = 6, seq.

d = 6, BB

d = 6, PCA

Dr
d = 12, seq.

d = 12, BB

d = 12, PCA
42

Variance with good lattices rules and Sobol points

aft
100 Asian Option (PCA) d = 12, S(0) = 100, K = 100, r = 0.05, σ = 0.5 MC
Sobol
Lattice (P2 ) + bake
variance

n−2
10−3

10−6
102
Dr 103
n
104
43

Example: Pricing an Asian basket financial option

aft
We have c assets, d observation times. Want to estimate E[f (U)], where
⎡ ⎤
⎢ 1 c d ⎥
f (U) = e −r T ⎢
max ⎢0, ∑ ∑ Si (tj ) − K ⎥⎥ .
⎢ cd i=1 j=1 ⎥
⎣ ⎦

Dr
43

Example: Pricing an Asian basket financial option

aft
We have c assets, d observation times. Want to estimate E[f (U)], where
⎡ ⎤
⎢ 1 c d ⎥
f (U) = e −r T ⎢
max ⎢0, ∑ ∑ Si (tj ) − K ⎥⎥ .
⎢ cd i=1 j=1 ⎥
⎣ ⎦

Suppose S(t) = (S1 (t), . . . , Sc (t)) obeys a geometric Brownian motion.

Dr
Then, f (U) = g(Y ) where Y = (Y1 , . . . , Ys ) ∼ N(0
0, Σ ) and s = cd.
44

Numerical experiment with c = 10 and d = 25


This gives a 250-dimensional integration problem.

aft
Let ρi,j = 0.4 for all i =/ j (correlations between assets), T = 1, σi = 0.1 + 0.4(i − 1)/9 for all
i , r = 0.04, S(0) = 100, and K = 100. (Imai and Tan 2002; see L. 2009 for more details).

Dr
44

Numerical experiment with c = 10 and d = 25


This gives a 250-dimensional integration problem.

aft
Let ρi,j = 0.4 for all i =/ j (correlations between assets), T = 1, σi = 0.1 + 0.4(i − 1)/9 for all
i , r = 0.04, S(0) = 100, and K = 100. (Imai and Tan 2002; see L. 2009 for more details).
Variance reduction factors for Cholesky (left) and PCA (right) (experiment from 2003):

Korobov Lattice Rules


n = 16381 n = 65521 n = 262139
a = 5693 a = 944 a = 21876
Lattice+shift 18 878 18 1504 9 2643

Dr
Lattice+shift+baker 50 4553 46 3657

Sobol+Shift
Sobol+LMS+Shift
Sobol’ Nets
n = 214
10 1299 17 3184
6 4232
n = 216

4 9219
43

32
7553

n = 218
6046
35 16557

Note: The payoff function is not smooth and also unbounded. So Koksma-Hlawka does not apply.
45

Many other types of successful applications of RQMC, for


example:

aft
Statistics: estimate and optimize the likelihood function for given data.
Computer graphics: estimate the color of a pixel.
Service systems with random arrival rates.
Stochastic differential equations (e.g., in finance).

Dr
PDEs with random coefficients.
Etc.
46

aft
RQMC for Markov chains

Dr
47

Array-RQMC for Markov Chains


Setting: A Markov chain with state space X ⊆ Rℓ , evolves as

aft
X 0 = x0 , Xj = ϕj (Xj−1 , Uj ), j ≥ 1,

where the Uj are i.i.d. uniform r.v.’s over (0, 1)d . Want to estimate
τ
µ = E[Y ] where Y = ∑ gj (Xj ).
j=1

Ordinary MC: n i.i.d. realizations of Y . Requires s = τ d uniforms.

Dr
47

Array-RQMC for Markov Chains


Setting: A Markov chain with state space X ⊆ Rℓ , evolves as

aft
X 0 = x0 , Xj = ϕj (Xj−1 , Uj ), j ≥ 1,

where the Uj are i.i.d. uniform r.v.’s over (0, 1)d . Want to estimate
τ
µ = E[Y ] where Y = ∑ gj (Xj ).
j=1

Ordinary MC: n i.i.d. realizations of Y . Requires s = τ d uniforms.

Dr
Array-RQMC: L., Lécot, Tuffin, et al. [2004, 2006, 2008, etc.]
Simulate an “array” (or population) of n chains in “parallel.”
Goal: Want small discrepancy between empirical distribution of states
Sn,j = {X0,j , . . . , Xn−1,j } and theoretical distribution of Xj , at each step j.
At each step, use RQMC point set to advance all the chains by one step.
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate

µj = E[gj (Xj )] = E[gj (ϕj (Xj−1 , U))] = ∫ gj (ϕj (x, u))dxdu by

aft
[0,1)ℓ+d

1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0

This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .

Dr
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate

µj = E[gj (Xj )] = E[gj (ϕj (Xj−1 , U))] = ∫ gj (ϕj (x, u))dxdu by

aft
[0,1)ℓ+d

1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0

This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .
However, we do not choose the Xi,j−1 ’s in Qn : they come from the simulation.

Dr
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate

µj = E[gj (Xj )] = E[gj (ϕj (Xj−1 , U))] = ∫ gj (ϕj (x, u))dxdu by

aft
[0,1)ℓ+d

1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0

This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .
However, we do not choose the Xi,j−1 ’s in Qn : they come from the simulation.

Dr
Idea: select RQMC point set Q̃n = {(w0 , U0,j ), . . . , (wn−1 , Un−1,j )} , where the wi ∈ [0, 1)ℓ
are fixed and each Ui,j ∼ U(0, 1)d .
Permute the states Xi,j−1 so that Xπj (i),j−1 is “close” to wi for each i and compute
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ) for each i .
Example: If ℓ = 1, can take wi = (i + 0.5)/n and just sort the states.
For ℓ > 1, there are various ways to define the matching (multivariate sort).
49

Array-RQMC algorithm

aft
Xi,0 ← x0 (or Xi,0 ← xi,0 ) for i = 0, . . . , n − 1;
for j = 1, 2, . . . , τ do
Compute the permutation πj of the states (for matching);
Randomize afresh {U0,j , . . . , Un−1,j } in Q̃n ;
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ), for i = 0, . . . , n − 1;
i=0 g(Xi,j );
µ̂arqmc,j,n = Ȳn,j = n1 ∑n−1
Estimate µ by the average Ȳn = µ̂arqmc,n = ∑τj=1 µ̂arqmc,j,n .

Dr
49

Array-RQMC algorithm

aft
Xi,0 ← x0 (or Xi,0 ← xi,0 ) for i = 0, . . . , n − 1;
for j = 1, 2, . . . , τ do
Compute the permutation πj of the states (for matching);
Randomize afresh {U0,j , . . . , Un−1,j } in Q̃n ;
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ), for i = 0, . . . , n − 1;
i=0 g(Xi,j );
µ̂arqmc,j,n = Ȳn,j = n1 ∑n−1
Estimate µ by the average Ȳn = µ̂arqmc,n = ∑τj=1 µ̂arqmc,j,n .

Dr
Proposition: The average Ȳn is an unbiased estimator of µ.
Can estimate Var[Ȳn ] by making m independent replications.
50

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
s s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
51

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
s s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
52

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
s s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
52

Example of 2-dim batch sort: a (4,4) mapping

aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s sz
0.7 0.7 s
0.6 0.6
s
sz s
0.5 0.5 sz
s s
0.4

0.3

0.2

0.1

0.0
s s

s
Dr
s

s
sz s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4

0.3

0.2

0.1

0.0
s

s
s
s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
53

Hilbert curve sort


Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53

Hilbert curve sort


Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53

Hilbert curve sort


Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53

Hilbert curve sort


Map the states to [0, 1], then sort.

aft
States of the chains

1.0

0.9
s s
0.8
s s
s
0.7

0.6
s

Dr 0.5

0.4

0.3

0.2

0.1

0.0
s s

s
s

s
s
s

s s

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
54

Convergence results and proofs

aft
For ℓ = 1, O(n−3/2 ) variance has been proved under some conditions.

For ℓ > 1, worst-case error of O(n−1/(ℓ+1) ) has been proved in deterministic settings under
strong conditions on ϕj , using a batch sort (El Haddad, Lécot, L’Ecuyer 2008, 2010).

Gerber and Chopin (2015) proved o(n−1 ) variance, for Hilbert sort and digital net with
nested scrambling.

Dr
55

A very simple example, one-dimensional state


Let Y = θU + (1 − θ)V , where U, V indep. U(0, 1) and θ ∈ [0, 1). This Y has cdf Gθ .

aft
Markov chain is defined by

X0 = U0 ∼ U(0, 1);
Xj = ϕj (Xj−1 , Uj ) = Gθ (θXj−1 + (1 − θ)Uj ), j ≥ 1

where Uj ∼ U(0, 1). Then, Xj ∼ U(0, 1).

We consider various functions gj :

Dr
gj (x) = x − 1/2, gj (x) = x 2 − 1/3,
gj (x) = sin(2πx), gj (x) = e x − e + 1,
gj (x) = (x − 1/2)+ − 1/8, gj (x) = I[x ≤ 1/3] − 1/3.
They all have E[gj (Xj )] = 0.
Also discrepancies of states X0,j , . . . , Xn−1,j .
56

aft
For array-RQMC, we take Xi,0 = wi = (i − 1/2)/n.

We have
n−3/2 1−θ
E[Dj2 ] ≤ = n−3/2 .
4(1 − ρ) 4(1 − 2θ)

We tried different RQMC methods, for n = 29 to n = 221 .


We did m = 200 independent replications for each n.
We fitted a linear regression of log2 Var[Ȳn,j ] vs log2 n, for various gj

Dr
We also looked at E[Dj2 ] and E[Pα ] for α = 2, 4, 6.
57

aft
Some MC and RQMC point sets:

MC: Crude Monte Carlo


LHS: Latin hypercube sampling
SS: Stratified sampling
SSA: Stratified sampling with antithetic variates in each stratum
Sobol: Sobol’ points, left matrix scrambling + digital random shift
Sobol+baker: Add baker transformation
Sobol+NUS: Sobol’ points with Owen’s nested uniform scrambling
Korobov:
Korobov+baker:
Dr
Korobov lattice in 2 dim. with a random shift modulo 1
Add a baker transformation
58
slope vs log2 n log2 E[Dj2 ] log2 Var[Ȳn,j ]
Xj − 21 Xj2 − 31 (Xj − 21 )+ − 18 I[Xj ≤ 13 ] − 1
3
MC -1.01 -1.02 -1.01 -1.00 -1.02

aft
LHS -1.02 -0.99 -1.00 -1.00 -1.00
SS -1.50 -1.98 -2.00 -2.00 -1.49
SSA -1.50 -2.65 -2.56 -2.50 -1.50
Sobol -1.51 -3.22 -3.14 -2.52 -1.49
Sobol+baker -1.50 -3.41 -3.36 -2.54 -1.50
Sobol+NUS -1.50 -2.95 -2.95 -2.54 -1.52
Korobov -1.87 -2.00 -1.98 -1.98 -1.85
Korobov+baker -1.92 -2.01 -2.02 -2.01 -1.90

MC
LHS
SS
SSA
Sobol
Korobov
Dr
7.35
8.82
13.73
18.12
19.86
13.55
− log10 Var[Ȳn,j ] for n = 221
Xj2 − 13 (Xj − 12 )+ − 18 I[Xj ≤ 31 ] −
7.86
8.93
14.10
17.41
17.51
14.03
6.98
7.61
10.20
10.38
10.36
11.98
1
3
CPU time (sec)

270
992
2334
1576
443
359
59

Example: Pricing an Asian Call Option


S(0) = 100, K = 100, r = 0.05, σ = 0.15, tj = j/52, j = 0, . . . , τ = 13.
RQMC: Sobol’ points with linear scrambling + random digital shift.

aft
Similar results for randomly-shifted lattice + baker’s transform.
log2 Var[µ̂RQMC,n ]

-10
n−1
crude MC
-20
RQMC sequential
-30

-40

8
Dr10 12 14 16 18 20
array-RQMC, split sort
n−2
log2 n
60

Example: Asian Call Option

Var[Ȳn,j ] ≈ O(nα ).

aft
VRF compared with MC, for n = 220 .
CPU time for m = 100 replications.

Sort RQMC points α VRF CPU (sec)


Batch sort SS -1.38 2.0 × 102 744
(n1 = n2 ) Sobol -2.03 4.2 × 106 532
Sobol+NUS -2.03 2.8 × 106 1035
Korobov+baker -2.04 4.4 × 106 482

Dr
Hilbert sort
(logistic map)
SS
Sobol
Sobol+NUS
Korobov+baker
-1.55
-2.03
-2.02
-2.01
2.4 × 103
2.6 × 106
2.8 × 106
3.3 × 106
840
534
724
567
61

Array-RQMC for a system of biological and chemical reactions

aft
[Puchhammer, Ben Abdellah, and L. 2022]
We have d types of molecules and K types of reactions:
ck
α1,k S1 + ⋯ + αd,k Sd Ð→ β1,k S1 + ⋯ + βd,k Sd , αi,k , βi,k ∈ N0 , k = 1, . . . , K.

Let X(t) = (X1 (t), . . . , Xd (t)) where Xk (t) is the number of molecules of type k at time
t. The process {X(t), t ≥ 0} is modeled as a continuous-time Marlov chain (CTMC).
The propensity (or occurrence rate) of reaction k at time t is ak (X(t)) = ck hk (X(t)).

Dr
61

Array-RQMC for a system of biological and chemical reactions

aft
[Puchhammer, Ben Abdellah, and L. 2022]
We have d types of molecules and K types of reactions:
ck
α1,k S1 + ⋯ + αd,k Sd Ð→ β1,k S1 + ⋯ + βd,k Sd , αi,k , βi,k ∈ N0 , k = 1, . . . , K.

Let X(t) = (X1 (t), . . . , Xd (t)) where Xk (t) is the number of molecules of type k at time
t. The process {X(t), t ≥ 0} is modeled as a continuous-time Marlov chain (CTMC).
The propensity (or occurrence rate) of reaction k at time t is ak (X(t)) = ck hk (X(t)).

a given time T . Dr
We discretize the time and simulate the discrete-time chain to estimate say E[g(X(T )] for

For Array-RQMC, finding good sorting methods is an important issue.


62

Example with 6 molecule types and 6 reactions types

aft
c1
Ð→
PKA + 2cAMP ←Ð PKA-cAMP2 ,
c2
c3
Ð→
PKA-cAMP2 + 2cAMP ←Ð PKA-cAMP4 ,
c4
c5
Ð→
PKA-cAMP4 ←Ð PKAr + 2PKAc.
c6

Dr
We simulate 256 time steps.
The state has 6 dimensions, and we need 6 random numbers per step.
Classical RQMC requires 256 × 6 = 1536-dimensional points.
Array-RQMC requires 7 or 12-dimensional RQMC points.
63

Example with 6 molecule types and 6 reactions types

aft
Here we estimate the expected number of PKAr molecules at time T .
Estimated convergence rate β̂ and variance reduction factor for n = 219 points:

Sort Sampling β̂ vrf19 eif19


MC 1.03 1 1
RQMC 1.17 39 45
Latice+shift 1.42 3634 1550
OSLAIF

Dr
Batch-sort-6
Sobol+LMS+ds
Latice+shift
Sobol+LMS+ds
1.47
1.62
1.46
2062
3026
2753
1059
1560
1749
64

Array-RQMC: Some generalizations

aft
L., Lécot, and Tuffin [2008]: τ can be a random stopping time w.r.t. the filtration
F{(j, Xj ), j ≥ 0}.

L., Demers, and Tuffin [2006, 2007]: Combination with splitting techniques (multilevel and
without levels), combination with importance sampling and weight windows. Covers particle
filters.

L. and Sanvido [2010]: Combination with coupling from the past for exact sampling.

stopping problems.
Dr
Dion and L. [2010]: Combination with approximate dynamic programming and for optimal

Gerber and Chopin [2015]: Sequential QMC.


65

Array-RQMC: Convergence results

aft
L., Lécot, and Tuffin [2006, 2008]: Special cases: convergence at MC rate, one-dimensional,
stratification, etc. O(n−3/2 ) variance.

Lécot and Tuffin [2004]: Deterministic, one-dimension, discrete state.

El Haddad, Lécot, L. [2008, 2010]: Deterministic, multidimensional. O(n−1/(ℓ+1) ) worst-case error


under some conditions.

Dr
Fakhererredine, El Haddad, Lécot [2012, 2013, 2014]: LHS, stratification, Sudoku sampling, ...

L., Lécot, Munger, and Tuffin [2016]: Survey, comparing sorts, and further examples, some with
O(n−3 ) empirical variance.
66

Array-RQMC: Some applications and examples

aft
L., Lécot, and Tuffin [2008]: Several examples.

Wächter and Keller [2008]: Applications in computer graphics.

Gerber and Chopin [2015]: Sequential QMC (particle filters), Owen nested scrambling and Hilbert
sort. o(n−1 ) variance.

Ben Abdellah, L., Puchhammer [2019]: Option Pricing Under Stochastic Volatility Models.

Dr
Puchhammer, Ben Abdellah, L. [2020]: Simulation of Biological and Chemical Reactions.

Arnold, Chen, Sha [2021]: Policy Gradient Methods for Reinforcement Learning.
67

Conclusion, discussion, etc.


▸ RQMC can improve the accuracy of estimators considerably in some applications.

aft
▸ Cleverly modifying the function f can often bring huge statistical efficiency
improvements in simulations with RQMC.
▸ There are often many possibilities for how to change f to make it smoother, periodic,
and reduce its effective dimension.
▸ Point set constructions should be based on discrepancies that take that into account.
▸ Nonlinear functions of expectations: RQMC also reduces the bias.
▸ RQMC for density estimation.

Dr
▸ RQMC for optimization.
▸ Array-RQMC and other QMC methods for Markov chains. Sequential RQMC.
▸ Still a lot to learn and do ...
▸ Software support still very limited in general!

You might also like