Etics24 RQMC Tutorial B
Etics24 RQMC Tutorial B
aft
methods in simulation
Part B
Pierre L’Ecuyer
aft
QMC/RQMC Theory
Dr
3
Short Recap
We want to estimate µ = ∫[0,1)s f (u)du where U = (U0 , . . . , Us−1 ) and the Uj are i.i.d. U(0, 1).
aft
Monte Carlo (MC) estimator:
1 n−1
µ̂n = ∑ f (Ui )
n i=0
where U0 , . . . , Un−1 i.i.d. uniform over [0, 1)s .
We have E[µ̂n ] = µ and Var[µ̂n ] = σ 2 /n = Var[f (U)]/n.
Quasi-Monte Carlo (QMC) approximation:
1 n−1
µ̄n = ∑ f (ui )
n i=0
Dr
where Pn = {u0 , . . . , un−1 } are deterministic points that cover [0, 1)s very evenly.
Randomized Quasi-Monte Carlo (RQMC) estimator:
µ̂n,rqmc =
1 n−1
∑ f (Ui ),
n i=0
where {U0 , . . . , Un−1 } ⊂ (0, 1)s are randomized versions of the points of Pn so that still cover the space
evenly while each Ui is uniform over [0, 1)s .
4
aft
For two arbitrary points a < b ∈ (0, 1)s , the rectangular box B = [a, b) with corners at a and
b is an s-dimensional interval. Let vol[a, b) denote the volume of this box and ∣Pn ∩ [a, b)∣/n
be the proportion of the points that it contains. We call the absolute difference
disc([a, b), Pn ) = ∣vol[a, b) − ∣Pn ∩ [a, b)∣/n∣ the local discrepancy for that interval.
1
u2
b2
Dr a2
0 a1 b1 1 u1
To measure the discrepancy between the distribution of Pn and the uniform distribution, we
can take the sup or average of disc([a, b), Pn ) over the intervals B.
5
Classical discrepancies
Extreme discrepancy of Pn : D(Pn ) = sup disc([a, b), Pn ).
[a,b)⊆[0,1)s
aft
Star discrepancy of Pn : D∗ (Pn ) = sup disc([0
0, b], Pn ).
b∈[0,1)s
The sup can also be replaced by a more general Lp norm, for 1 ≤ p < ∞.
Lp star discrepancy of Pn :
∗
D(p) (Pn ) = (∫
Dr
[0,1)s
(disc([0
0, u), Pn ))p du)
aft
first n points of this sequence. P∞ is called uniformly distributed if D∗ (Pn ) → 0 as n → ∞.
Theorem (Weyl 1916). When n → ∞, ∣En ∣ → 0 for all bounded Riemann-integrable
functions f if and only if D∗ (Pn ) → 0.
Dr
6
aft
first n points of this sequence. P∞ is called uniformly distributed if D∗ (Pn ) → 0 as n → ∞.
Theorem (Weyl 1916). When n → ∞, ∣En ∣ → 0 for all bounded Riemann-integrable
functions f if and only if D∗ (Pn ) → 0.
Dr
where Vhk (f ) denotes the total variation of f in the sense of Hardy and Krause;
see Kuipers and Niederreiter (1974) for a definition.
Vhk (f ) < ∞ implies no discontinuity not aligned with the axes.
Erdõs-Turán-Koksma Inequality. For each s, there is a constant cs such that for all Pn ,
R RRR
⎛ 1 1 RRRR 1 n−1 2πiht ui RRR⎞
aft
∗ RR ∑ e
D (Pn ) ≤ cs + ∑ ∏ RRR
⎝ m + 1 h∈Zs (m) j =/ 0 ∣hj ∣ RRRR n i=0 RRR⎠
R
√
for all integers m ≥ 1, where Zs (m) = {h = (h1 , . . . , hs ) ∈ {−m, . . . , m}s } and i = −1.
This inequality permits one to construct explicit low-discrepancy sequences by bounding the
exponential sum. It is a convenient tool.
It was used for example to prove that the digital sequence constructions by Sobol’, Faure,
Niederreiter, etc., have a discrepancy that converges as O(n−1 (log n)s ). This implies that
Dr
with these point sets, if Vhk (f ) < ∞, then
Erdõs-Turán-Koksma Inequality. For each s, there is a constant cs such that for all Pn ,
R RRR
⎛ 1 1 RRRR 1 n−1 2πiht ui RRR⎞
aft
∗ RR ∑ e
D (Pn ) ≤ cs + ∑ ∏ RRR
⎝ m + 1 h∈Zs (m) j =/ 0 ∣hj ∣ RRRR n i=0 RRR⎠
R
√
for all integers m ≥ 1, where Zs (m) = {h = (h1 , . . . , hs ) ∈ {−m, . . . , m}s } and i = −1.
This inequality permits one to construct explicit low-discrepancy sequences by bounding the
exponential sum. It is a convenient tool.
It was used for example to prove that the digital sequence constructions by Sobol’, Faure,
Niederreiter, etc., have a discrepancy that converges as O(n−1 (log n)s ). This implies that
Dr
with these point sets, if Vhk (f ) < ∞, then
For lattice rules and digital nets, we will rather use other variants of this inequality that do
not involve D∗ (Pn ), but other discrepancies that are much easier to compute.
8
aft
t
f (u) = ∑ fˆ(h)e 2πih u for u ∈ [0, 1)s ,
h∈Zs
Note that
t
s s
e 2πih u = ∏ e 2πihj uj = ∏[cos(2πhj uj ) + i sin(2πhj uj )] (by Euler formula),
j=1
Dr j=1
so this expansion is in terms of one-periodic trigonometric functions over the real space.
It is defined everywhere, just by making copies of its definition over [0, 1)s .
When using standard MC to estimate µ = E[f (U)], the MC variance is
t
Var[f (U)] = ∑ ∣fˆ(h)∣2 Var[e 2πih U ] = ∑ ∣fˆ(h)∣2 .
h∈Zs h∈Zs
9
Suppose now that we estimate µ = E[f (U)] using a lattice rule with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual lattice to Ls is
aft
L∗s = {h ∈ Rs ∶ h t v ∈ Z for all v ∈ Ls } ⊆ Zs .
Dr h∈Zs 0=
/ h∈L∗s
9
Suppose now that we estimate µ = E[f (U)] using a lattice rule with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual lattice to Ls is
aft
L∗s = {h ∈ Rs ∶ h t v ∈ Z for all v ∈ Ls } ⊆ Zs .
Dr h∈Zs
From the viewpoint of variance reduction, an optimal lattice for f minimizes Var[µ̂n,rqmc ].
Roughly, the dual lattice should not contain the large (absolute) Fourier coefficients.
10
For a integer α > 0, a function f has smoothness α if it has square-integrable mixed partial
derivatives up to order α. If f has smoothness α and the periodic continuation of its
aft
derivatives up to order α − 1 is continuous across the unit cube boundaries, then
Moreover, there always exists a rank-1 lattice (a vector v1 = v1 (n) = a(n)/n) such that
Dr
Pα is the error and P2α is the variance for a (worst-case) function f for which
aft
u⊆{1,...,s} j∈u
and we have
∣u∣
1 n−1 −(−4π 2 )α
P2α = ∑ ∑[ ] ∏ B2α (ui,j )
∅≠u⊆{1,...,s} n i=0 (2α)! j∈u
where Bα is the Bernoulli polynomial of degree α.
In particular, B1 (u) = u − 1/2 and B2 (u) = u 2 − u + 1/6.
Can compute this P2α in O(sn) operations. Easy to search for good lattices in this case!
P2α has been proposed long ago as a figure of merit.
Dr
Beware: in many “older” papers, α is replaced by α/2.
Worst-case function may not be representative of what happens in applications.
Also, the hidden factor in O increases quickly with s.
To get a bound that is uniform in s, the Fourier coefficients must decrease fast enough
with the dimension and “size” of vectors h. This is typically what happens in applications
for which RQMC is effective!
12
aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).
Dr 0
1/2
.
1
12
aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).
Dr 0
1/2
.
1
12
aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).
Dr 0
1/2
.
1
12
aft
If f (0) =/ f (1), define f˜ by f˜(1 − u) = f˜(u) = f (2u) for 0 ≤ u ≤ 1/2.
This f˜ has the same integral as f and f˜(0) = f˜(1).
Dr 0
1/2
1
For smooth f , it can reduce the variance from O(n−2+ϵ ) to O(n−4+ϵ ) (Hickernell 2002).
The resulting f˜ is symmetric with respect to u = 1/2.
In practice, we transform the points Ui instead of f .
13
One-dimensional case
aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].
0 0.5 1
Dr
13
One-dimensional case
aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].
0 0.5 1
U/n
Dr
13
One-dimensional case
aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].
0 0.5 1
Dr
13
One-dimensional case
aft
Random shift mod 1 followed by baker’s transformation.
Along each coordinate, stretch everything by a factor of 2 and fold back.
Same as replacing Uj by min[2Uj , 2(1 − Uj )].
0 0.5 1
Dr
Gives locally antithetic points in intervals of size 2/n.
This implies that linear pieces over these intervals are integrated exactly.
Intuition: when f is smooth, it is well-approximated by a piecewise linear function which is
integrated exactly, so the error is small.
14
aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.
Dr
14
aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.
Fortunately, RQMC can still be effective if f is well approximated by a sum of low-dimensional
functions, because it suffices to have low error or variance for these low-dimensional functions.
Dr
14
aft
Just to have one point in each “quadrant”, we need 2s points, which is unrealistic.
Fortunately, RQMC can still be effective if f is well approximated by a sum of low-dimensional
functions, because it suffices to have low error or variance for these low-dimensional functions.
In general, we can write
s s
f (u) = ∑ fu (u) = µ + ∑ f{i} (ui ) + ∑ f{i,j} (ui , uj ) + ⋯
u⊆{1,...,s} i=1 i,j=1
Dr
Var[f (U)] = σ 2 = ∑
u⊆{1,...,s}
σu2 = ∑
u⊆{1,...,s}
Var[fu (U)].
The σu2 ’s can be estimated by MC or RQMC (I skip the details here). Idea: Make sure the
projections Pn (u) of Pn over u are very uniform for the important subsets u (those with a large σu2 ).
When they are mostly lower-dim. projections, then it is much easier to make RQMC effective.
15
aft
Pγ,α = ∑ γu(h) (max(1, ∣h1 ∣)⋯ max(1, ∣hs ∣)) .
−α
/ h∈L∗s
0=
1 n−1 −(−4π 2 )α
∣u∣
Pγ,2α = ∑ γu ∑[ ] ∏ B2α (ui,j ),
∅≠u⊆{1,...,s} n i=0 (2α)! j∈u
Vγ,α
2
Dr
(f )=
Var[µ̂n,rqmc ] =
∑
∑
u⊆{1,...,s}
1
∫ ∣
∂ α∣u∣
2 α∣u∣ [0,1]∣u∣ ∂u α u
∅≠u⊆{1,...,s} γu (4π )
for f ∶ [0, 1)s → R smooth enough. Then, we have the variance bound
f (u)∣ du,
(f ) ⋅ Pγ,2α .
16
This Pγ,2α is the RQMC variance for the worst-case function with Vγ,α
2
(f ) ≤ 1:
√ (2π)α
f ∗ (u) =
aft
∑ γu ∏ Bα (uj ),
u⊆{1,...,s} j∈u (α)!
(4π 2 )α (4π 2 )α
∣u∣ ∣u∣
Dr
σu2 = γu [Var[Bα (U)]
((α)!)2
] = γu [∣B2α (0)∣
aft
the weights γu for ∣u∣ = 1 are irrelevant.
Dr
17
aft
the weights γu for ∣u∣ = 1 are irrelevant.
For f ∗ , take γu = ρ∣u∣ σu2 for a constant ρ, but there are 2s − 1 subsets u to consider!
Dr
17
aft
the weights γu for ∣u∣ = 1 are irrelevant.
For f ∗ , take γu = ρ∣u∣ σu2 for a constant ρ, but there are 2s − 1 subsets u to consider!
One could define a simple parametric model and estimate the parameters by matching the
ANOVA variances σu2 . Some popular model types for the weights:
Order-dependent weights: γu = Γ∣u∣ depends only on ∣u∣.
It suffices to select Γ1 , Γ2 , Γ3 , etc.
Dr
Often, one would take Γj = 0 for j > k for some integer k (k parameters to select).
Or Γj = ρj−1 for all j (a single parameter to select).
Product weights: γu = ∏j∈u γj where γj ≥ 0 is the weight for coordinate j.
Mixture: POD weights γu = Γ∣u∣ ∏j∈u γj , SPOD weights, etc.
18
aft
(Caflisch, Morokoff, and Owen 1997).
A function f has effective dimension d in proportion ρ in the superposition sense if
∑ σu ≥ ρσ .
2 2
∣u∣≤d
∑ σu2 ≥ ρσ 2 .
Dr u⊆{1,...,d}
aft
Stochastic Activity Network
x = 64
x = 100
CMC, x = 64
Dr
CMC, x = 100
0 20 40 60 80
% of total variance for each cardinality of u
100
20
aft
Random Korobov. Try r random values of a.
Dr
20
aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.
Dr
20
aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.
Component by component (CBC) construction. (Sloan, Kuo, etc.).
Let a1 = 1;
For j = 2, 3, . . . , s, find z ∈ {1, . . . , n − 1}, gcd(z , n) = 1, such that
(a1 , a2 , . . . , aj = z) minimizes D(Pn ({1, . . . , j})).
Dr
Fast CBC construction for Pγ,α : use FFT. (Nuyens, Cools).
20
aft
Random Korobov. Try r random values of a.
Rank 1, exhaustive search. Try all possibilities for a = (1, a2 , . . . , as ).
Pure random search. Try admissible vectors a at random.
Component by component (CBC) construction. (Sloan, Kuo, etc.).
Let a1 = 1;
For j = 2, 3, . . . , s, find z ∈ {1, . . . , n − 1}, gcd(z , n) = 1, such that
(a1 , a2 , . . . , aj = z) minimizes D(Pn ({1, . . . , j})).
Let a1 = 1; Dr
Fast CBC construction for Pγ,α : use FFT. (Nuyens, Cools).
Randomized CBC construction.
aft
For s = 2, a linear regression of log Pγ,2 vs log n for 212 ≤ n ≤ 219 gives a decreasing rate near
O(n−1.9 ) for the best and O(n−1 ) for the mean. The worst value is with a = (1, 1) for all n.
The median is close to the best, which means that the majority of vectors a are quite good!
worst
10−1 mean
90 %
median
merit value
10−4 10 %
10−7
10−10
2 6
Dr
s =2
28 210 212
n
214 216 218
best
22
For s = 3, a linear regression log Pγ,2 vs log n for 210 ≤ n ≤ 215 gives decreasing rates near
O(n−1.75 ) for the best and O(n−1 ) for the mean.
aft
Again, the median is better than the mean.
100 worst
mean
90 %
10−2 median
merit value
10 %
best
10−4
10−6
26
Dr
s =3
28 210
n
212 214 216
23
aft
To see the effect of weights selection on RQMC variance, when choosing a lattice rule, we
shall integrate the worst-case function
√ (2π)α
fα∗ (u) = ∑ vu ∏ Bα (uj ),
u⊆{1,...,s} j∈u (α)!
Dr
variance when we use a lattice rule constructed via fast CBC with different weights γu ≠ vu .
We start with s = 10 and vu = Γ∣u∣ for ∣u∣ ≤ k, for a given integer k and a given constant
Γ > 0. We select the weights as γu = Γ̃∣u∣ for ∣u∣ ≤ k̃, where Γ̃ and k̃ may differ from Γ and k.
24
aft
n A1 A2 B1 B2 C1 C2
28 1.11 1.21 1.13 4.08 3.82 6.80
29 1.21 1.10 1.42 10.5 2.93 7.25
210 1.36 1.38 2.04 4.64 2.86 5.94
211 1.24 1.43 2.40 6.18 2.15 5.14
212 1.42 1.66 3.79 13.2 2.47 5.94
213 1.30 2.38 5.51 9.09 2.66 5.97
214 1.51 2.54 30.5 8.66 9.11 29.1
Dr
215
216
1.46
1.80
1.93
2.55
25.6
3.13
13.3
12.9
aft
B1: Γ2 = Γ̃2 = 0.1, k = 4 and k̃ = 2. Criterion is blind on projections of order 3 and 4. This
has unpredictable (sometimes dramatic) impact on RQMC variance.
Dr
26
C1: Γ2 = Γ̃2 = 0.1 and k = k̃ = 4, but increase the variation of f by replacing vu2 with
vu2 + ṽu2 , with
aft
ṽu2 = 1.0 for u = {1, 3}, {3, 5}, {5, 7}, {7, 9},
ṽu2 = 0.5 for u = {2, 3, 4}, {4, 5, 6}, {6, 7, 8}, {8, 9, 10},
ṽu2 = 0.25 for u = {1, 2, 3, 4}, {4, 5, 6, 7}, {7, 8, 9, 10},
and ṽu = 0 for all other u.
Important projections are not given enough weight relative to others.
C2: Like C1, but vu2 is replaced with only ṽu2 , as defined above. Many irrelevant projections,
with ṽu = 0, now have nonzero weights.
Dr
In all cases, the variance ratio increases (non-monotonically) with n.
The worst degradations are usually observed when we give too much weight to irrelevant
projections!
27
aft
done by selecting a Reproducing-Kernel Hilbert Space of functions. The choice of kernel
determines D and V.
1/p 1/q
⎛ ⎞ ⎛ ⎞
V(f ) = ∑
Dr
⎝∅/=u⊆{1,2,...,s}
(V(fu )/γu )p
⎠
, D(Pn ) = ∑
⎝∅/=u⊆{1,2,...,s}
(γu Du (Pn ))q
1/p + 1/q = 1, γu ≥ 0 is a weight assigned to the subset u, V(fu ) is the variation of fu , and
Du (Pn ) is the discrepancy of the projection of Pn (u).
Common choice: p = q = 2. Sometimes q = ∞ with p = 1.
⎠
,
28
Form of discrepancies commonly used in constructions
Many discrepancies that are typically used to search for good parameters efficiently (e.g., in
aft
LatNet Builder) are as in the previous slide, with
1 n−1
Duq (Pn ) = ∑ ∏ φ(ui,j )
n i=0 j∈u
for some function φ ∶ [0, 1) → R, usually with q = 2. When computing the products for all
subsets u, we can start with the smaller subsets u, then save and re-use the partial products.
For the weighted P2α , we take q = 2 and
We need software to support (1) search for good point set parameters and
(2) to generate and randomize the points, for these various constructions.
29
aft
lattice rules, and digital nets.
Various choices of figures of merit, arbitrary weights, construction methods, etc. Easily
extensible.
For better run-time efficiency, uses static polymorphism, via templates, rather than dynamic
polymorphism. Several techniques to reduce computations and improve speed.
Offers a pre-compiled program with Unix-like command line interface.
Also a graphical interface implemented in Python.
Dr
Available for download on GitHub, with source code and documentation.
aft
we use Walsh series for the expansion of f .
Dr
30
aft
we use Walsh series for the expansion of f .
Let N0 = {0, 1, 2, . . . }. For k = (k1 , . . . , ks ) ∈ Ns0 and u = (u1 , . . . , us ) ∈ [0, 1)s , where
ℓj s ℓj
kj = ∑ kj,ℓ bℓ−1 ∈ N0 and uj = ∑ uj,ℓ b−ℓ ∈ [0, 1), define ⟨k, u⟩ = ∑ ∑ kj,ℓ uj,ℓ mod b.
ℓ=1 ℓ≥1 j=1 ℓ=1
[0,1)s
f (u)e −2πi⟨k,u⟩/b du.
30
aft
we use Walsh series for the expansion of f .
Let N0 = {0, 1, 2, . . . }. For k = (k1 , . . . , ks ) ∈ Ns0 and u = (u1 , . . . , us ) ∈ [0, 1)s , where
ℓj s ℓj
kj = ∑ kj,ℓ bℓ−1 ∈ N0 and uj = ∑ uj,ℓ b−ℓ ∈ [0, 1), define ⟨k, u⟩ = ∑ ∑ kj,ℓ uj,ℓ mod b.
ℓ=1 ℓ≥1 j=1 ℓ=1
[0,1)s
f (u)e −2πi⟨k,u⟩/b du.
aft
Dual net
Suppose we estimate µ = E[f (U)] using a digital net in base b with point set
Pn = {ui , i = 0, . . . , n − 1}. The dual net is
Dr n i=0
⎩0 otherwise.
The nonzero digits kj,ℓ of a vector k select a specific set of digits ui,j,ℓ of the points. The
points are equidistributed for those specific digits iff the above sum is 0 for this k, iff k ∈/ Cs∗ .
32
aft
Error and variance expressions
For a deterministic digital net, if ∑ ∣f˜(k)∣ < ∞, then En = ∑ f˜(k).
k∈Ns0 0=
/ k∈Cs∗
Dr
32
aft
Error and variance expressions
For a deterministic digital net, if ∑ ∣f˜(k)∣ < ∞, then En = ∑ f˜(k).
k∈Ns0 0=
/ k∈Cs∗
From a variance reduction viewpoint, an optimal digital net for f minimizes this
Dr
Var[µ̂n,rqmc ]. Roughly, the dual net should not contain vectors k that correspond to large
square Walsh coefficients.
33
Bounds on Walsh coefficients
In the following, we assume for simplicity that b = 2.
aft
For a integer α > 0, a function f has smoothness α (roughly) if it has square-integrable
mixed partial derivatives up to order α.
For an integer k = 2a1 + ⋯ + 2aν > 0 with a1 > ⋅ ⋅ ⋅ > aν , we put
µα (k) = (a1 + 1) + ⋯ + (amin(α,ν) + 1), and µα (0) = 0.
ℓ min(α,ℓ )
Equivalently, if kj = ∑ℓ=1
j
kj,ℓ 2ℓ−1 , µα (kj ) = ∑ℓ=1 j ℓI[kj,ℓ > 0].
For a vector k = (k1 , . . . , ks ), let µα (k) = µα (k1 ) + ⋯ + µα (ks ).
Dick (2007, 2008) has shown that if f has smoothness α, then ∣fˆ(k)∣ ≤ K Ṽα2 (f )2−µα (k)
Dr
approximately, where K is a constant and Ṽα (f ) is the variation of f , which depends on the
integrals of the mixed partial derivatives of f of order up to α.
33
Bounds on Walsh coefficients
In the following, we assume for simplicity that b = 2.
aft
For a integer α > 0, a function f has smoothness α (roughly) if it has square-integrable
mixed partial derivatives up to order α.
For an integer k = 2a1 + ⋯ + 2aν > 0 with a1 > ⋅ ⋅ ⋅ > aν , we put
µα (k) = (a1 + 1) + ⋯ + (amin(α,ν) + 1), and µα (0) = 0.
ℓ min(α,ℓ )
Equivalently, if kj = ∑ℓ=1
j
kj,ℓ 2ℓ−1 , µα (kj ) = ∑ℓ=1 j ℓI[kj,ℓ > 0].
For a vector k = (k1 , . . . , ks ), let µα (k) = µα (k1 ) + ⋯ + µα (ks ).
Dick (2007, 2008) has shown that if f has smoothness α, then ∣fˆ(k)∣ ≤ K Ṽα2 (f )2−µα (k)
Dr
approximately, where K is a constant and Ṽα (f ) is the variation of f , which depends on the
integrals of the mixed partial derivatives of f of order up to α.
This suggests using the FOM ∑0=/ k∈Cs∗ 2−µα (k) for QMC and ∑0=/ k∈Cs∗ 2−2µα (k) for RQMC,
for a given α. But these sums have an infinite number of terms!
Dick replaced the sum by a max and showed how to construct digital nets for which
max0=/ k∈Cs∗ 2−µα (k) = O(n−α (log n)αs ), for any integer α > 0, using an interlacing technique.
34
Figures of Merit for Digital Nets
A variety of discrepancies of the form
aft
1/q
⎛ ⎞ 1 n−1
D(Pn ) = ∑ (γu Du (Pn ))q with (Du (Pn ))q = ∑ ∏ φ(ui,j )
⎝∅/=u⊆{1,2,...,s} ⎠ n i=0 j∈u
for some function φ ∶ [0, 1) → R have been defined recently for digital nets, based on
various assumptions on the rate of decrease of the Walsh coefficients, which often follow
from an assumption on the smoothness level α of f .
Some of them can be seen as counterparts of the Pα used for lattices.
Dr
They can be used in the same way to make CBC constructions
One popular choice of φ for b = 2 and α = 1:
where −⌊log2 u⌋ is the index of the first nonzero digit in the binary expansion of u.
Latnet Builder can make searches for good point sets based on such discrepancies.
35
aft
Replace Z by F2 [z], the ring of polynomials over finite field F2 ≡ {0, 1};
Replace R by L2 = F2 ((z −1 )), the field of formal Laurent series over F2 , of the form
∑∞ −ℓ
ℓ=ω xℓ z , where xℓ ∈ F2 . Define ϕ ∶ L → R by
∞ ∞
ϕ ( ∑ xℓ z −ℓ ) = ∑ xℓ b−ℓ .
ℓ=ω ℓ=ω
Pn = {(ϕ (
Q(z)
Dr
a(z) = (a1 (z), . . . , as (z)) ∈ Z2 [z]s , whose coordinates are polynomials of degrees less than
k having no common factor with Q(z). The point set of cardinality n = 2k is
h(z)a1 (z)
),...,ϕ(
h(z)as (z)
Q(z)
)) ∶ h(z) ∈ Z2 [z] and deg(h(z)) < k} .
36
aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.
Dr
36
aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.
A PLR is actually a special case of a digital net in base b. This can be used to generate
the points efficiently: compute the generating matrices and use the digital net
implementation. This is particularly fast in base b = 2.
Dr
36
aft
Most of the properties of ordinary lattice rules have counterparts for the polynomial rules.
The Fourier expansion is replaced by a Walsh expansion, the weighted Pγ,α has a
counterpart Pγ,α,PRL , CBC constructions can provide good parameters, fast CBC also
works, etc.
A PLR is actually a special case of a digital net in base b. This can be used to generate
the points efficiently: compute the generating matrices and use the digital net
implementation. This is particularly fast in base b = 2.
Dr
Random shift in space of formal series: equivalent to a random digital shift in base b,
applied to all the points. It preserves equidistribution.
37
aft
More Examples
Dr
38
aft
Market price of some asset (e.g., one share of a stock) evolves in time as stochastic
process {S(t), t ≥ 0} with (supposedly) known probability law (estimated from data).
A financial contract gives owner net payoff g(S(t1 ), . . . , S(td )) at time T = td , where
g ∶ Rd → R, and 0 ≤ t1 < ⋯ < td are fixed observation times.
Under a no-arbitrage assumption, present value of contract at time 0, when S(0) = s0 , is
Dr
where E∗ is under a risk-neutral measure and e −r T is the discount factor.
This expectation can be written as an integral over [0, 1)s for some s, and estimated by the
average of n i.i.d. replicates of X = e −r T g(S(t1 ), . . . , S(td )).
39
aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),
r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.
Dr
39
aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),
r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.
Here, the vector Y = (X(t1 ), . . . , X(td )) has a multivariate normal distribution with some mean
vector µ and covariance matrix Σ .
To generate Y , we decompose Σ = AAt , and put Y = AZ where Z ∼ N (0
0, I).
Dr
Possible decompositions: Cholesky, Brownian bridge, PCA, etc.
Choice does not matter for MC but does matter for RQMC.
39
aft
S(t) = s0 e X(t) where X(t) = (r − σ 2 /2)t + σB(t),
r is the interest rate, σ is the volatility, and B(⋅) is a standard Brownian motion: for any t2 > t1 ≥ 0,
B(t2 ) − B(t1 ) ∼ N(0, t2 − t1 ), and the increments over disjoint intervals are independent.
Here, the vector Y = (X(t1 ), . . . , X(td )) has a multivariate normal distribution with some mean
vector µ and covariance matrix Σ .
To generate Y , we decompose Σ = AAt , and put Y = AZ where Z ∼ N (0
0, I).
Dr
Possible decompositions: Cholesky, Brownian bridge, PCA, etc.
Choice does not matter for MC but does matter for RQMC.
aft
Example of contract: Discretely-monitored Asian call option:
⎛ 1 d ⎞
g(S(t1 ), . . . , S(td )) = max 0, ∑ S(tj ) − K .
⎝ d j=1 ⎠
Dr
41
aft
d = 3, seq.
d = 3, BB
d = 3, PCA
d = 6, seq.
d = 6, BB
d = 6, PCA
Dr
d = 12, seq.
d = 12, BB
d = 12, PCA
42
aft
100 Asian Option (PCA) d = 12, S(0) = 100, K = 100, r = 0.05, σ = 0.5 MC
Sobol
Lattice (P2 ) + bake
variance
n−2
10−3
10−6
102
Dr 103
n
104
43
aft
We have c assets, d observation times. Want to estimate E[f (U)], where
⎡ ⎤
⎢ 1 c d ⎥
f (U) = e −r T ⎢
max ⎢0, ∑ ∑ Si (tj ) − K ⎥⎥ .
⎢ cd i=1 j=1 ⎥
⎣ ⎦
Dr
43
aft
We have c assets, d observation times. Want to estimate E[f (U)], where
⎡ ⎤
⎢ 1 c d ⎥
f (U) = e −r T ⎢
max ⎢0, ∑ ∑ Si (tj ) − K ⎥⎥ .
⎢ cd i=1 j=1 ⎥
⎣ ⎦
Dr
Then, f (U) = g(Y ) where Y = (Y1 , . . . , Ys ) ∼ N(0
0, Σ ) and s = cd.
44
aft
Let ρi,j = 0.4 for all i =/ j (correlations between assets), T = 1, σi = 0.1 + 0.4(i − 1)/9 for all
i , r = 0.04, S(0) = 100, and K = 100. (Imai and Tan 2002; see L. 2009 for more details).
Dr
44
aft
Let ρi,j = 0.4 for all i =/ j (correlations between assets), T = 1, σi = 0.1 + 0.4(i − 1)/9 for all
i , r = 0.04, S(0) = 100, and K = 100. (Imai and Tan 2002; see L. 2009 for more details).
Variance reduction factors for Cholesky (left) and PCA (right) (experiment from 2003):
Dr
Lattice+shift+baker 50 4553 46 3657
Sobol+Shift
Sobol+LMS+Shift
Sobol’ Nets
n = 214
10 1299 17 3184
6 4232
n = 216
4 9219
43
32
7553
n = 218
6046
35 16557
Note: The payoff function is not smooth and also unbounded. So Koksma-Hlawka does not apply.
45
aft
Statistics: estimate and optimize the likelihood function for given data.
Computer graphics: estimate the color of a pixel.
Service systems with random arrival rates.
Stochastic differential equations (e.g., in finance).
Dr
PDEs with random coefficients.
Etc.
46
aft
RQMC for Markov chains
Dr
47
aft
X 0 = x0 , Xj = ϕj (Xj−1 , Uj ), j ≥ 1,
where the Uj are i.i.d. uniform r.v.’s over (0, 1)d . Want to estimate
τ
µ = E[Y ] where Y = ∑ gj (Xj ).
j=1
Dr
47
aft
X 0 = x0 , Xj = ϕj (Xj−1 , Uj ), j ≥ 1,
where the Uj are i.i.d. uniform r.v.’s over (0, 1)d . Want to estimate
τ
µ = E[Y ] where Y = ∑ gj (Xj ).
j=1
Dr
Array-RQMC: L., Lécot, Tuffin, et al. [2004, 2006, 2008, etc.]
Simulate an “array” (or population) of n chains in “parallel.”
Goal: Want small discrepancy between empirical distribution of states
Sn,j = {X0,j , . . . , Xn−1,j } and theoretical distribution of Xj , at each step j.
At each step, use RQMC point set to advance all the chains by one step.
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate
aft
[0,1)ℓ+d
1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0
This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .
Dr
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate
aft
[0,1)ℓ+d
1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0
This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .
However, we do not choose the Xi,j−1 ’s in Qn : they come from the simulation.
Dr
48
Array-RQMC insight: To simplify, suppose Xj ∼ U(0, 1)ℓ . We estimate
aft
[0,1)ℓ+d
1 n−1 1 n−1
µ̂arqmc,j,n = ∑ gj (Xi,j ) = ∑ gj (ϕj (Xi,j−1 , Ui,j )).
n i=0 n i=0
This is (roughly) RQMC with the point set Qn = {(Xi,j−1 , Ui,j ), 0 ≤ i < n} .
We want Qn to have low discrepancy (LD) over [0, 1)ℓ+d .
However, we do not choose the Xi,j−1 ’s in Qn : they come from the simulation.
Dr
Idea: select RQMC point set Q̃n = {(w0 , U0,j ), . . . , (wn−1 , Un−1,j )} , where the wi ∈ [0, 1)ℓ
are fixed and each Ui,j ∼ U(0, 1)d .
Permute the states Xi,j−1 so that Xπj (i),j−1 is “close” to wi for each i and compute
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ) for each i .
Example: If ℓ = 1, can take wi = (i + 0.5)/n and just sort the states.
For ℓ > 1, there are various ways to define the matching (multivariate sort).
49
Array-RQMC algorithm
aft
Xi,0 ← x0 (or Xi,0 ← xi,0 ) for i = 0, . . . , n − 1;
for j = 1, 2, . . . , τ do
Compute the permutation πj of the states (for matching);
Randomize afresh {U0,j , . . . , Un−1,j } in Q̃n ;
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ), for i = 0, . . . , n − 1;
i=0 g(Xi,j );
µ̂arqmc,j,n = Ȳn,j = n1 ∑n−1
Estimate µ by the average Ȳn = µ̂arqmc,n = ∑τj=1 µ̂arqmc,j,n .
Dr
49
Array-RQMC algorithm
aft
Xi,0 ← x0 (or Xi,0 ← xi,0 ) for i = 0, . . . , n − 1;
for j = 1, 2, . . . , τ do
Compute the permutation πj of the states (for matching);
Randomize afresh {U0,j , . . . , Un−1,j } in Q̃n ;
Xi,j = ϕj (Xπj (i),j−1 , Ui,j ), for i = 0, . . . , n − 1;
i=0 g(Xi,j );
µ̂arqmc,j,n = Ȳn,j = n1 ∑n−1
Estimate µ by the average Ȳn = µ̂arqmc,n = ∑τj=1 µ̂arqmc,j,n .
Dr
Proposition: The average Ȳn is an unbiased estimator of µ.
Can estimate Var[Ȳn ] by making m independent replications.
50
aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4
0.3
0.2
0.1
0.0
s s
s
Dr
s
s
s s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4
0.3
0.2
0.1
0.0
s
s
s
s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
51
aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4
0.3
0.2
0.1
0.0
s s
s
Dr
s
s
s s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4
0.3
0.2
0.1
0.0
s
s
s
s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
52
aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s s
0.7 0.7 s
0.6 0.6
s
s s
0.5 0.5 s
s s
0.4
0.3
0.2
0.1
0.0
s s
s
Dr
s
s
s s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4
0.3
0.2
0.1
0.0
s
s
s
s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
52
aft
Sobol’ net in 2 dimensions af-
States of the chains ter random digital shift
1.0 1.0 s
s s s
0.9 0.9
s
0.8
s s 0.8 s
s sz
0.7 0.7 s
0.6 0.6
s
sz s
0.5 0.5 sz
s s
0.4
0.3
0.2
0.1
0.0
s s
s
Dr
s
s
sz s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
0.4
0.3
0.2
0.1
0.0
s
s
s
s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
s
53
aft
States of the chains
1.0
0.9
s s
0.8
s s
s
0.7
0.6
s
Dr 0.5
0.4
0.3
0.2
0.1
0.0
s s
s
s
s
s
s
s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53
aft
States of the chains
1.0
0.9
s s
0.8
s s
s
0.7
0.6
s
Dr 0.5
0.4
0.3
0.2
0.1
0.0
s s
s
s
s
s
s
s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53
aft
States of the chains
1.0
0.9
s s
0.8
s s
s
0.7
0.6
s
Dr 0.5
0.4
0.3
0.2
0.1
0.0
s s
s
s
s
s
s
s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
53
aft
States of the chains
1.0
0.9
s s
0.8
s s
s
0.7
0.6
s
Dr 0.5
0.4
0.3
0.2
0.1
0.0
s s
s
s
s
s
s
s s
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
s
54
aft
For ℓ = 1, O(n−3/2 ) variance has been proved under some conditions.
For ℓ > 1, worst-case error of O(n−1/(ℓ+1) ) has been proved in deterministic settings under
strong conditions on ϕj , using a batch sort (El Haddad, Lécot, L’Ecuyer 2008, 2010).
Gerber and Chopin (2015) proved o(n−1 ) variance, for Hilbert sort and digital net with
nested scrambling.
Dr
55
aft
Markov chain is defined by
X0 = U0 ∼ U(0, 1);
Xj = ϕj (Xj−1 , Uj ) = Gθ (θXj−1 + (1 − θ)Uj ), j ≥ 1
Dr
gj (x) = x − 1/2, gj (x) = x 2 − 1/3,
gj (x) = sin(2πx), gj (x) = e x − e + 1,
gj (x) = (x − 1/2)+ − 1/8, gj (x) = I[x ≤ 1/3] − 1/3.
They all have E[gj (Xj )] = 0.
Also discrepancies of states X0,j , . . . , Xn−1,j .
56
aft
For array-RQMC, we take Xi,0 = wi = (i − 1/2)/n.
We have
n−3/2 1−θ
E[Dj2 ] ≤ = n−3/2 .
4(1 − ρ) 4(1 − 2θ)
Dr
We also looked at E[Dj2 ] and E[Pα ] for α = 2, 4, 6.
57
aft
Some MC and RQMC point sets:
aft
LHS -1.02 -0.99 -1.00 -1.00 -1.00
SS -1.50 -1.98 -2.00 -2.00 -1.49
SSA -1.50 -2.65 -2.56 -2.50 -1.50
Sobol -1.51 -3.22 -3.14 -2.52 -1.49
Sobol+baker -1.50 -3.41 -3.36 -2.54 -1.50
Sobol+NUS -1.50 -2.95 -2.95 -2.54 -1.52
Korobov -1.87 -2.00 -1.98 -1.98 -1.85
Korobov+baker -1.92 -2.01 -2.02 -2.01 -1.90
MC
LHS
SS
SSA
Sobol
Korobov
Dr
7.35
8.82
13.73
18.12
19.86
13.55
− log10 Var[Ȳn,j ] for n = 221
Xj2 − 13 (Xj − 12 )+ − 18 I[Xj ≤ 31 ] −
7.86
8.93
14.10
17.41
17.51
14.03
6.98
7.61
10.20
10.38
10.36
11.98
1
3
CPU time (sec)
270
992
2334
1576
443
359
59
aft
Similar results for randomly-shifted lattice + baker’s transform.
log2 Var[µ̂RQMC,n ]
-10
n−1
crude MC
-20
RQMC sequential
-30
-40
8
Dr10 12 14 16 18 20
array-RQMC, split sort
n−2
log2 n
60
Var[Ȳn,j ] ≈ O(nα ).
aft
VRF compared with MC, for n = 220 .
CPU time for m = 100 replications.
Dr
Hilbert sort
(logistic map)
SS
Sobol
Sobol+NUS
Korobov+baker
-1.55
-2.03
-2.02
-2.01
2.4 × 103
2.6 × 106
2.8 × 106
3.3 × 106
840
534
724
567
61
aft
[Puchhammer, Ben Abdellah, and L. 2022]
We have d types of molecules and K types of reactions:
ck
α1,k S1 + ⋯ + αd,k Sd Ð→ β1,k S1 + ⋯ + βd,k Sd , αi,k , βi,k ∈ N0 , k = 1, . . . , K.
Let X(t) = (X1 (t), . . . , Xd (t)) where Xk (t) is the number of molecules of type k at time
t. The process {X(t), t ≥ 0} is modeled as a continuous-time Marlov chain (CTMC).
The propensity (or occurrence rate) of reaction k at time t is ak (X(t)) = ck hk (X(t)).
Dr
61
aft
[Puchhammer, Ben Abdellah, and L. 2022]
We have d types of molecules and K types of reactions:
ck
α1,k S1 + ⋯ + αd,k Sd Ð→ β1,k S1 + ⋯ + βd,k Sd , αi,k , βi,k ∈ N0 , k = 1, . . . , K.
Let X(t) = (X1 (t), . . . , Xd (t)) where Xk (t) is the number of molecules of type k at time
t. The process {X(t), t ≥ 0} is modeled as a continuous-time Marlov chain (CTMC).
The propensity (or occurrence rate) of reaction k at time t is ak (X(t)) = ck hk (X(t)).
a given time T . Dr
We discretize the time and simulate the discrete-time chain to estimate say E[g(X(T )] for
aft
c1
Ð→
PKA + 2cAMP ←Ð PKA-cAMP2 ,
c2
c3
Ð→
PKA-cAMP2 + 2cAMP ←Ð PKA-cAMP4 ,
c4
c5
Ð→
PKA-cAMP4 ←Ð PKAr + 2PKAc.
c6
Dr
We simulate 256 time steps.
The state has 6 dimensions, and we need 6 random numbers per step.
Classical RQMC requires 256 × 6 = 1536-dimensional points.
Array-RQMC requires 7 or 12-dimensional RQMC points.
63
aft
Here we estimate the expected number of PKAr molecules at time T .
Estimated convergence rate β̂ and variance reduction factor for n = 219 points:
Dr
Batch-sort-6
Sobol+LMS+ds
Latice+shift
Sobol+LMS+ds
1.47
1.62
1.46
2062
3026
2753
1059
1560
1749
64
aft
L., Lécot, and Tuffin [2008]: τ can be a random stopping time w.r.t. the filtration
F{(j, Xj ), j ≥ 0}.
L., Demers, and Tuffin [2006, 2007]: Combination with splitting techniques (multilevel and
without levels), combination with importance sampling and weight windows. Covers particle
filters.
L. and Sanvido [2010]: Combination with coupling from the past for exact sampling.
stopping problems.
Dr
Dion and L. [2010]: Combination with approximate dynamic programming and for optimal
aft
L., Lécot, and Tuffin [2006, 2008]: Special cases: convergence at MC rate, one-dimensional,
stratification, etc. O(n−3/2 ) variance.
Dr
Fakhererredine, El Haddad, Lécot [2012, 2013, 2014]: LHS, stratification, Sudoku sampling, ...
L., Lécot, Munger, and Tuffin [2016]: Survey, comparing sorts, and further examples, some with
O(n−3 ) empirical variance.
66
aft
L., Lécot, and Tuffin [2008]: Several examples.
Gerber and Chopin [2015]: Sequential QMC (particle filters), Owen nested scrambling and Hilbert
sort. o(n−1 ) variance.
Ben Abdellah, L., Puchhammer [2019]: Option Pricing Under Stochastic Volatility Models.
Dr
Puchhammer, Ben Abdellah, L. [2020]: Simulation of Biological and Chemical Reactions.
Arnold, Chen, Sha [2021]: Policy Gradient Methods for Reinforcement Learning.
67
aft
▸ Cleverly modifying the function f can often bring huge statistical efficiency
improvements in simulations with RQMC.
▸ There are often many possibilities for how to change f to make it smoother, periodic,
and reduce its effective dimension.
▸ Point set constructions should be based on discrepancies that take that into account.
▸ Nonlinear functions of expectations: RQMC also reduces the bias.
▸ RQMC for density estimation.
Dr
▸ RQMC for optimization.
▸ Array-RQMC and other QMC methods for Markov chains. Sequential RQMC.
▸ Still a lot to learn and do ...
▸ Software support still very limited in general!