High-Performance
Computing in Finance
Problems, Methods,
and Solutions
Edited by
M. A. H. Dempster
Juho Kanniainen
John Keane
Erik Vynckier
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks
does not warrant the accuracy of the text or exercises in this book. This book’s use or discussion
of MATLAB® software or related products does not constitute endorsement or sponsorship by The
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
[Link] ([Link] or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Contents
Editors
Contributors
Introduction
14 Supercomputers
Peter Schober
Index
Editors
Contributors

Luca Capriotti
Quantitative Strategies
Investment Banking Division
and
Department of Mathematics
University College London
London, United Kingdom

Álvaro Cartea
Mathematical Institute
Oxford-Man Institute of Quantitative Finance
University of Oxford
Oxford, United Kingdom

Omar Andres Carmona Cortes
Computation Department
Instituto Federal do Maranhão
São Luis, Brazil

M. A. H. Dempster
Centre for Financial Research
University of Cambridge
and
Cambridge Systems Associates
Cambridge, United Kingdom

Jens Deussen
Department of Computer Science
RWTH Aachen University
Germany

Jacques du Toit
The Numerical Algorithms Group Ltd.
United Kingdom

Christina Erlwein-Sayer
OptiRisk Systems

Georgi Gaydadjiev

Mark Gibbs
Quantitative Research
FINCAD

Michael B. Giles
Mathematical Institute
University of Oxford
Oxford, United Kingdom

James B. Glattfelder
Department of Banking and Finance
University of Zurich
Zurich, Switzerland

Peter Goddard
1QBit
Vancouver, Canada

Russell Goyder
Quantitative Research
FINCAD

Jon Gregory

Jonathan Hüser
Department of Computer Science
RWTH Aachen University
Germany

Sergey Ivliev
Lykke Corporation
Switzerland
and
Laboratory of Cryptoeconomics and Blockchain Systems
Perm State University
Russia

Sebastian Jaimungal
Department of Statistical Sciences
University of Toronto
Canada

Mark Joshi
Department of Economics
University of Melbourne
Melbourne, Australia

Christian Kahl
Quantitative Research
FINCAD

Juho Kanniainen
Tampere University of Technology
Tampere, Finland

Alexander Lipton
Stronghold Labs
Chicago, Illinois
and
MIT Connection Science and Engineering
Cambridge, Massachusetts

Stefanus C. Maree
Centrum Wiskunde & Informatica
Amsterdam, The Netherlands

Juan Ivan Martin
International Air Transport Association

Douglas McLean
Moody’s Analytics
Edinburgh, Scotland, United Kingdom

Elena A. Medova
Centre for Financial Research
University of Cambridge
and
Cambridge Systems Associates
Cambridge, United Kingdom

Oskar Mencer

Andrew Milne
1QBit
Vancouver, Canada
As lessons are being learned from the recent financial crisis and unsuccessful
stress tests, the financial and insurance industries have shown a clear demand
for superior computing power to support reliable quantitative models and
methods and successful risk management and pricing. From a practitioner’s
viewpoint, the availability of high-performance computing (HPC)
resources allows the implementation of computationally challenging advanced
financial and insurance models for trading and risk management. Researchers,
on the other hand, can develop new models and methods to relax unrealis-
tic assumptions without being limited to achieving analytical tractability to
reduce computational burden. Although several topics treated in these pages
have been recently covered in specialist monographs (see, e.g., the references),
we believe this volume to be the first to provide a comprehensive up-to-date
account of the current and near-future state of HPC in finance.
The chapters of this book cover three interrelated parts: (i) Computation-
ally expensive financial problems, (ii) Numerical methods in financial HPC,
and (iii) HPC systems, software, and data with financial applications. They
consider applications which can be more efficiently solved with HPC, together
with topic reviews introducing approaches to reducing computational costs
and elaborating how different HPC platforms can be used for different finan-
cial problems.
Part I offers perspectives on computationally expensive problems in the
financial industry.
In Chapter 1, Jonathan Rosen, Christian Kahl, Russell Goyder, and Mark
Gibbs provide a concise overview of computational challenges in derivative
pricing, paying special attention to counterparty credit risk management. The
incorporation of counterparty risk in pricing generates a huge demand for com-
puting resources, even with vanilla derivative portfolios. They elaborate pos-
sibilities with different computing hardware platforms, including graphic pro-
cessing units (GPU) and field-programmable gate arrays (FPGA). To reduce
hardware requirements, they also discuss an algorithmic approach, called algo-
rithmic differentiation (AD), for calculating sensitivities.
In Chapter 2, Gautam Mitra, Christina Erlwein-Sayer, Cristiano Arbex
Valle, and Xiang Yu describe a method for generating daily trading signals
to construct second-order stochastic dominance (SSD) portfolios of exchange-
traded securities. They provide a solution for a computationally (NP) hard
optimization problem and illustrate it with real-world historical data for the
FTSE100 index over a 7-year back-testing period.
Acknowledgments
This book arose in part from the four-year EU Marie Curie project
High-Performance Computing in Finance (Grant Agreement Number 289032,
[Link]fi[Link]), which was recently completed. We would like to thank
all the partners and participants in the project and its several public events,
in particular its supported researchers, many of whom are represented in these
pages. We owe all our authors a debt of gratitude for their fine contributions
and for enduring a more drawn out path to publication than we had originally
envisioned. We would also like to express our gratitude to the referees and to
World Scientific and Emerald for permission to reprint Chapters 7 and 18,
respectively. Finally, without the expertise and support of the editors and staff
at Chapman & Hall/CRC and Taylor & Francis this volume would have been
impossible. We extend to them our warmest thanks.
Michael Dempster
Juho Kanniainen
John Keane
Erik Vynckier
Cambridge, Tampere, Manchester
August 2017
Bibliography
1. Foresight: The Future of Computer Trading in Financial Markets. 2012.
Final Project Report, Government Office for Science, London. https://
[Link]/government/publications/future-of-computer-trading-in-financial-
markets-an-international-perspective
2. De Schryver, C., ed. 2015. FPGA Based Accelerators for Financial Applications.
Springer.
Computationally
Expensive Problems in the
Financial Industry
Chapter 1
Computationally Expensive
Problems in Investment Banking
CONTENTS
1.1 Background
    1.1.1 Valuation requirements
        [Link] Derivatives pricing and risk
        [Link] Credit value adjustment/debit value adjustment
        [Link] Funding value adjustment
    1.1.2 Regulatory capital requirements
        [Link] Calculation of market risk capital
        [Link] Credit risk capital
        [Link] Capital value adjustment
1.2 Trading and Hedging
1.3 Technology
    1.3.1 Hardware
        [Link] Central processing unit/floating point unit
        [Link] Graphic processing unit
        [Link] Field programmable gate array
        [Link] In-memory data aggregation
    1.3.2 Algorithmic differentiation
        [Link] Implementation approaches
        [Link] Performance
        [Link] Coverage
1.4 Conclusion
References
1.1 Background
Financial instruments traded on markets are essentially contractual agree-
ments between two parties that involve the calculation and delivery of quanti-
ties of monetary currency or its economic equivalent. This wider definition of
financial investments is commonly known as financial derivatives or options,
and effectively includes everything from the familiar stocks and bonds to the
most complex payment agreements, which also include complicated mathe-
matical logic for determining payment amounts, the so-called payoff of the
derivative.
In the early days, derivatives were thought of more like traditional
investments and treated as such on the balance sheet of a business. Complexity
mainly arose in defining and calculating the option payoff, and in applying
theoretical considerations to price it. It was quickly discovered
that probabilistic models, for economic factors on which option payoffs were
calculated, had to be quite restrictive in order to produce computationally
straightforward problems in pricing the balance sheet fair mark-to-market
value. The development of log-normal models from Black and Scholes was
quite successful in demonstrating not only a prescient framework for deriva-
tive pricing, but also the importance of tractable models in the practical appli-
cation of risk-neutral pricing theory, at a time when computational facilities
were primitive by modern standards. Developments since then in quantitative
finance have been accompanied by simultaneous advancement in computing
power, and this has opened the door to alternative computational methods
such as Monte Carlo, PDE discretization, and Fourier methods, which have
greatly increased the ability to price derivatives with complex payoffs and
optionality.
Nevertheless, the crisis of 2008 revealed that the above complexities
are only part of the problem. The state of creditworthiness was eventually
to be revealed as a major influence on the business balance sheet in the event
that a contractual counterparty in a derivatives contract fails to meet the
terms for payment. It was recognized that market events could create such
a scenario due to clustering and tail events. In response to the explosion of
credit derivatives and subsequent global financial crisis, bilateral credit value
adjustments (CVAs) and funding cost adjustments were used to represent
the impact of credit events according to their likelihood in the accounting
balance sheet. Credit value adjustments and funding adjustments introduced
additional complexity into the business accounting for market participants.
While previously simple trades only required simple models and textbook
formulas to value, the CVA is a portfolio derivative problem requiring joint
modeling of many state variables and is often beyond the realm of simple
closed-form computation.
Meanwhile, the controversial decision to use taxpayer money to bail out
the financial institutions in the 2008 crisis ignited a strong political interest
to introduce regulation that requires the largest investors to maintain capital
holdings that meet appropriate thresholds commensurate with the financial
risk present in their balance sheet. In recent years, capital requirements
have been adopted globally, with regional laws determining the methods and
criteria for the calculation of regulatory capital holdings. The demand placed
on large market participants to apply additional value adjustments for tax
and capital funding costs requires modeling these effects over the lifetime of
Monte Carlo simulation: Monte Carlo methods offer a very generic tool
for approximating functionals of a stochastic process, and they deal
effectively with path-dependent payoff structures. The most general
approach to derivative pricing is based on pathwise simulation of time-
discretized stochastic dynamical equations for each underlying factor. The
advantage is in being able to handle any option payoff and exercise style,
as well as enabling models with many correlated factors. The disadvantage
is the overall time-consuming nature and very high level of complexity in
performing a full simulation.
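The pathwise approach described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: a single factor following geometric Brownian motion, an exact log-normal step on a uniform time grid, and an arithmetic-average (Asian) call as the path-dependent payoff. The function name `asian_call_mc` and all parameter values are hypothetical and do not come from the chapter.

```python
import math
import random


def asian_call_mc(s0, strike, rate, vol, maturity, n_steps, n_paths, seed=0):
    """Monte Carlo price and standard error of an arithmetic Asian call.

    Assumes geometric Brownian motion under the risk-neutral measure and
    a uniform time grid; both are illustrative simplifications.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    drift = (rate - 0.5 * vol * vol) * dt
    diffusion = vol * math.sqrt(dt)

    payoff_sum = 0.0
    payoff_sq_sum = 0.0
    for _ in range(n_paths):
        s = s0
        running_sum = 0.0
        for _ in range(n_steps):
            # Advance the discretized dynamics by one time step.
            s *= math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
            running_sum += s
        # The payoff depends on the whole path through its average.
        payoff = max(running_sum / n_steps - strike, 0.0)
        payoff_sum += payoff
        payoff_sq_sum += payoff * payoff

    discount = math.exp(-rate * maturity)
    mean = payoff_sum / n_paths
    variance = max(payoff_sq_sum / n_paths - mean * mean, 0.0)
    std_error = discount * math.sqrt(variance / n_paths)
    return discount * mean, std_error


price, std_error = asian_call_mc(100.0, 100.0, 0.02, 0.2, 1.0, 50, 10000)
print(f"price = {price:.3f} +/- {std_error:.3f}")
```

The standard error shrinks only with the square root of the number of paths, which is one source of the time-consuming nature noted above.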
Besides dealing with the complexity of the option payoff, the Black–Scholes
formula made use of a single risk-free interest rate and demonstrated that in
the theoretical economy, this rate had central importance for the time value
of money and the expected future growth of risk-neutral investment strategies
analogous to single currency derivatives. This means a single discount curve
per currency was all that was needed, which by modern standards led to
a fairly simple approach in the pricing of derivatives. For example, a spot-
starting interest rate swap, which has future cash flows that are calculated
using a floating rate that must be discounted to present value, would use
a single curve for the term structure of interest rates to both calculate the
risk-neutral implied floating rates and the discount rates for future cash flows.
However, the turmoil of 2008 revealed that collateral agreements were of
central importance in determining the relevant time value of money to use for
discounting future cash flows, and it quickly became important to separate
discounting from forward rate projection for collateralized derivatives. The
subsequent computational landscape required building multiple curves in a
single currency to account for institutional credit risk in lending at different
tenors.
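The difference between single-curve and multi-curve valuation can be made concrete with a small sketch. It assumes flat, continuously compounded curves, a unit notional, and semiannual payments; the function names and the rates used are hypothetical. In the single-curve case the floating leg telescopes to one minus the final discount factor, whereas with a separate discount curve the forward rates and discount factors no longer cancel.

```python
import math


def discount_factor(zero_rate, t):
    """Discount factor from a flat, continuously compounded zero rate."""
    return math.exp(-zero_rate * t)


def floating_leg_pv(projection_rate, discount_rate, maturity_years, freq=2):
    """PV of floating coupons: forwards from one curve, discounting on another."""
    tau = 1.0 / freq
    pv = 0.0
    for i in range(1, round(maturity_years * freq) + 1):
        t_start, t_end = (i - 1) * tau, i * tau
        # Simply compounded forward rate implied by the projection curve.
        forward = (discount_factor(projection_rate, t_start)
                   / discount_factor(projection_rate, t_end) - 1.0) / tau
        pv += forward * tau * discount_factor(discount_rate, t_end)
    return pv


def fixed_leg_pv(fixed_rate, discount_rate, maturity_years, freq=2):
    """PV of fixed coupons on a unit notional."""
    tau = 1.0 / freq
    return sum(fixed_rate * tau * discount_factor(discount_rate, i * tau)
               for i in range(1, round(maturity_years * freq) + 1))


def payer_swap_pv(fixed_rate, projection_rate, discount_rate, maturity_years):
    """Receive floating, pay fixed."""
    return (floating_leg_pv(projection_rate, discount_rate, maturity_years)
            - fixed_leg_pv(fixed_rate, discount_rate, maturity_years))


# Single curve: the same 3% curve projects forwards and discounts cash flows.
single_curve = payer_swap_pv(0.02, 0.03, 0.03, 5.0)
# Dual curve: forwards from the 3% curve, discounting on a 2.5% curve.
dual_curve = payer_swap_pv(0.02, 0.03, 0.025, 5.0)
print(single_curve, dual_curve)
```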
On top of this problem is the risk calculation, which requires the sensitivity
of the derivative price to various inputs and calculated quantities inside the
numerical calculation. A universal approach to this is to add a small amount
to each quantity of interest and approximate the risk with a finite difference
calculation, known as bumping. While this can be applied for all numeri-
cal and closed-form techniques, it does require additional calculations which
themselves can be time-consuming and computationally expensive. The modern
approach is to incorporate analytic risk at the software library level, a
technique known as algorithmic differentiation (AD). Some libraries, such as
our own, produce analytic risk for all numerical pricing calculations in this
way, as described in Section 1.3.2.
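As a minimal illustration of bumping, the sketch below compares a forward finite-difference delta with the closed-form delta of a Black–Scholes call. The choice of instrument, the bump size, and the function names are assumptions made for the example; the point is only that the bumped estimate needs an extra revaluation and carries a discretization error.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1(spot, strike, rate, vol, maturity):
    return ((math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity)
            / (vol * math.sqrt(maturity)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call."""
    a = d1(spot, strike, rate, vol, maturity)
    b = a - vol * math.sqrt(maturity)
    return spot * norm_cdf(a) - strike * math.exp(-rate * maturity) * norm_cdf(b)


def analytic_delta(spot, strike, rate, vol, maturity):
    """Closed-form sensitivity of the call price to the spot."""
    return norm_cdf(d1(spot, strike, rate, vol, maturity))


def bumped_delta(spot, strike, rate, vol, maturity, bump=0.01):
    """Bump-and-revalue: one extra pricing call per risk factor."""
    base = bs_call(spot, strike, rate, vol, maturity)
    bumped = bs_call(spot + bump, strike, rate, vol, maturity)
    return (bumped - base) / bump


args = (100.0, 100.0, 0.02, 0.2, 1.0)
print(analytic_delta(*args), bumped_delta(*args))
```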
where $R(t)$ is the recovery rate at time $t$, $\hat{V}(t)$ the value at time $t$ of the
remainder of the underlying portfolio whose value is $V(t)$, $D(t)$ the discount
factor at time $t$, $\tau$ the time of default, and $I_{\tau \in (t, t+dt)}$ is a default indicator,
evaluating to 1 if $\tau$ lies between time $t$ and $t + dt$ and 0 otherwise. Note
that Equation 1.3 does not account for default of the issuer, an important
distinction which is in particular relevant for regulatory capital calculation
purposes (Albanese and Andersen, 2014). Including issuer default is commonly
referred to as first-to-default CVA (FTDCVA).
The first step in proceeding with computation of CVA is to define a time
grid over periods in which the portfolio constituents expose either party to
credit risk. Next, the present value of the exposure upon counterparty default
is calculated at each point in time. Often this involves simplifying assump-
tions, such that the risk-neutral drift and volatility are sufficient to model
the evolution of the underlying market factors of the portfolio, and similarly
that counterparty credit, for example, is modeled as jump-to-default process
calibrated to observable market CDS quotes to obtain corresponding survival
probabilities.
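The steps just described can be sketched as a discretized sum over the time grid. This is a simplified illustration, not the chapter's Equation 1.3: it assumes a constant recovery rate, a flat discount rate, a constant hazard rate backed out of a single CDS spread through the common approximation hazard ≈ spread / (1 − recovery), and an expected positive exposure profile supplied as an input rather than simulated. All names and numbers are hypothetical.

```python
import math


def survival_probability(cds_spread, recovery, t):
    """Survival probability under a constant hazard rate.

    The hazard rate uses the approximation spread / (1 - recovery),
    which is an assumption of this sketch.
    """
    hazard = cds_spread / (1.0 - recovery)
    return math.exp(-hazard * t)


def cva(time_grid, expected_exposure, rate, cds_spread, recovery):
    """Discretized unilateral CVA on a given time grid."""
    total = 0.0
    previous_t = 0.0
    for t, exposure in zip(time_grid, expected_exposure):
        # Probability that the counterparty defaults in (previous_t, t].
        default_prob = (survival_probability(cds_spread, recovery, previous_t)
                        - survival_probability(cds_spread, recovery, t))
        total += math.exp(-rate * t) * exposure * default_prob
        previous_t = t
    return (1.0 - recovery) * total


# Hypothetical exposure profile on a semiannual grid out to five years.
grid = [0.5 * i for i in range(1, 11)]
exposure = [1.0, 1.6, 2.0, 2.2, 2.3, 2.2, 1.9, 1.5, 1.0, 0.4]
print(cva(grid, exposure, rate=0.02, cds_spread=0.015, recovery=0.4))
```

In practice the exposure profile is itself the output of a portfolio simulation, which is where most of the computational cost arises.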
However, in practically all realistic situations, the investment portfolio of
OTC derivatives will contain multiple trades with any given counterparty. In
these situations, the entities typically execute netting agreements, such as the
ISDA master agreement, which aims to consider the overall collection of trades
as a whole, such that gains and losses on individual positions are offset against
one another. In the case that either party defaults, the settlement agreement
considers the single net amount rather than the potentially many individual
losses and gains. One major consequence of this is the need to consider CVA
on a large portfolio composed of arbitrary groups of trades entered with a
given counterparty. As Equation 1.3 suggests, the CVA and DVA on such a
portfolio cannot be simply decomposed into CVA on individual investments,
thus it is a highly nonlinear problem which requires significant computational
facilities in order to proceed.
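A toy example shows why netting makes the problem nonlinear. The trade values below are hypothetical mark-to-market amounts with a single counterparty in one scenario; the exposure of the netted portfolio is the positive part of the sum, which generally differs from the sum of the positive parts.

```python
def exposure_without_netting(trade_values):
    """Each trade is settled separately: sum of individual positive values."""
    return sum(max(value, 0.0) for value in trade_values)


def exposure_with_netting(trade_values):
    """Gains and losses offset: positive part of the net portfolio value."""
    return max(sum(trade_values), 0.0)


# Hypothetical values of four trades with one counterparty.
trades = [5.0, -3.0, 2.0, -6.0]
print(exposure_without_netting(trades))  # 7.0
print(exposure_with_netting(trades))     # 0.0
```

Because the positive part is taken after aggregation, the exposure has to be evaluated on the whole netting set in every scenario and at every time point.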
Quantifying the extent of the risk mitigation benefit of collateral is impor-
tant, and this usually requires modeling assumptions to be made. Collateral
modeling often incorporates several mechanisms which in reality do not lead
to perfect removal of credit default risk, which can be summarized as follows
(Gregory, 2015).
Central to FVA is the idea that funding strategies are implicit in the
derivatives portfolio (essentially by replication arguments). However, the
funding policy is typically specific to each business, and so incorporating this in
a value adjustment removes the symmetry of one price for each transaction.
This leads to a general picture of a balance sheet value which differs from
the mutually agreed and traded price. This lack of symmetry in the busi-
ness accounting problem is one of the reasons that FVA has yet to formally
be required on the balance sheet through regulation, though IFRS standards
could change in the near future to account for the increasing importance of
FVA (Albanese and Andersen, 2014; Albanese et al., 2014).
The details of funding costs are also linked to collateral modeling, which
also has a strong impact on CVA/DVA; thus, in many cases it becomes
impossible to clearly separate the FVA and CVA/DVA calculations in an
additive manner. Modern FVA models are essentially exposure models which
derive closely from CVA, and fall into two broad classes, namely expectation
approaches using Monte Carlo simulation and replication approaches using
PDEs (Burgard and Kjaer, 2011; Brigo et al., 2013).
risk-sensitive approach for EAD calculation under the Basel framework. EAD
is calculated at the netting set level and allows full netting and collateral
modeling. However, gaining IMM approval takes significant time and requires
a history of backtesting, which can be very costly to implement.
During the financial crisis, roughly two-thirds of losses attributed to coun-
terparty credit were due to CVA losses and only about one-third were due
to actual defaults. Under Basel II, the risk of counterparty default and credit
migration risk was addressed, for example, in credit VaR, but mark-to-market
losses due to CVA were not. This has led to a Basel III proposal that considers
fairly severe capital charges against CVA, with relatively large multipliers on
risk factors used in the CVA RCR calculation, compared with similar formulas
used in the trading book RCR for market risk.
The added severity of CVA capital requirements demonstrates a lack of
alignment between the trading book and CVA in terms of RCR, which is
intentional to compensate for structural differences in the calculations. As
CVA is a nonlinear portfolio problem, the computational expense is much
greater, often requiring complex simulations, compared with the simpler val-
uations for instruments in the trading book. The standard calculations of risk
involve revaluation for bumping each risk factor. The relative expense of CVA
calculations means that in practice fewer risk factors are available, and thus
capital requirements must have larger multipliers to make up for the lack of
risk sensitivity.
However, one technology in particular has emerged to level the playing
field and facilitate better alignment between market risk and CVA capital
requirements. AD, described in Section 1.3.2, reduces the cost of risk fac-
tor calculations by eliminating the computational expense of adding addi-
tional risk factors. With AD applied to CVA pricing, the full risk sensi-
tivity of CVA capital requirements can be achieved without significantly
increasing the computational effort, although the practical use of AD can
involve significant challenges in the software implementation of CVA pric-
ing.
1.3 Technology
There are a multitude of technology elements supporting derivative pricing
ranging from hardware (see Section 1.3.1) supporting the calculation to soft-
ware technology such as algorithmic adjoint differentiation (see Section 1.3.2).
In the discussion below, we focus predominantly on the computational
demand related to derivative pricing and general accounting and regulatory
capital calculations, and less on the needs related to high-frequency
applications, including the colocation of hardware at exchanges.
1.3.1 Hardware
Choice of hardware is a critical element in supporting the pricing, risk
management, accounting and regulatory reporting requirements of financial
institutions. In the years prior to the financial crisis, pricing and risk
management were typically facilitated by in-house developed pricing libraries
using an object-oriented programming language (C++ being a popular choice)
on regular CPU hardware and respective distributed calculation grids.
of profiling tools to optimize at a low level, often very close to the hardware,
leveraging features such as larger processor caches and vectorization within
FPUs, and second, the choice of algorithms to perform the required calcu-
lations. We might consider a third category to be those concerns relevant
to distributed computation: network latency and bandwidth, and marshaling
between wire protocols and different execution environments in a heteroge-
neous system. For optimal performance, a holistic approach is required where
both local and global considerations are taken into account.
The impact of algorithm choice on performance is perhaps nowhere more
dramatic than in the context of calculating sensitivities, or greeks, for a port-
folio. Current proposed regulatory guidance in the form of ISDA’s Standard
Initial Margin Model (SIMM) and the market risk component of the FRTB
both require portfolio sensitivities as input parameters, and sensitivity calcu-
lation is a standard component of typical nightly risk runs for any trading
desk interested in hedging market risk. The widespread approach to calculat-
ing such sensitivities is that of finite difference, or “bumping,” in which each
risk factor (typically a quote for a liquid market instrument) is perturbed,
or bumped, by a small amount, often a basis point, and the portfolio reval-
ued in order to assess the effect of the change. This process is repeated for
each risk factor of interest, although in order to minimize computational cost,
it is common to bump collections of quotes together to measure the effect
of a parallel shift of an entire interest rate curve or volatility surface, for
example.
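The trade-off between bumping each quote and bumping a whole curve can be sketched as follows. The example assumes a fixed-coupon bond discounted on a zero curve with one pillar per cash flow, continuous compounding, and a one basis point bump; the instrument, the curve, and the function names are hypothetical. Bucketed risk needs one revaluation per pillar, while the parallel shift needs only one.

```python
import math

BASIS_POINT = 1e-4


def bond_price(zero_rates, pillar_times, coupon):
    """Unit-notional bond with one cash flow per curve pillar."""
    last = len(pillar_times) - 1
    price = 0.0
    for i, (t, r) in enumerate(zip(pillar_times, zero_rates)):
        cash_flow = coupon + (1.0 if i == last else 0.0)
        price += cash_flow * math.exp(-r * t)
    return price


def bucketed_bumps(zero_rates, pillar_times, coupon):
    """Price change for a 1bp bump of each pillar in turn (n revaluations)."""
    base = bond_price(zero_rates, pillar_times, coupon)
    changes = []
    for i in range(len(zero_rates)):
        bumped = list(zero_rates)
        bumped[i] += BASIS_POINT
        changes.append(bond_price(bumped, pillar_times, coupon) - base)
    return changes


def parallel_bump(zero_rates, pillar_times, coupon):
    """Price change for a 1bp shift of the entire curve (one revaluation)."""
    base = bond_price(zero_rates, pillar_times, coupon)
    bumped = [r + BASIS_POINT for r in zero_rates]
    return bond_price(bumped, pillar_times, coupon) - base


times = [1.0, 2.0, 3.0, 4.0, 5.0]
rates = [0.010, 0.013, 0.016, 0.018, 0.020]
buckets = bucketed_bumps(rates, times, coupon=0.02)
print(buckets, sum(buckets), parallel_bump(rates, times, coupon=0.02))
```

For this simple bond each pillar affects exactly one cash flow, so the bucketed changes add up to the parallel change; for nonlinear portfolios the two agree only to first order.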
Owing to the linear scaling of the computational burden of applying finite
difference to a diverse portfolio with exposure to many market risk factors,
sensitivity calculation has traditionally been expensive. This expense is mit-
igated by the embarrassingly parallel nature of the problem, which means
long-running calculations can be accelerated through investment in computa-
tional grid technology and the associated software, energy and infrastructure
required to support it.
With an alternative algorithmic approach, however, the need for portfolio
revaluation can be eliminated, along with the linear scaling of cost with num-
ber of risk factors. This approach is called algorithmic differentiation (AD).
While popularized in finance relatively recently (Sherif, 2015), AD is not new.
It has been applied in a wide range of scientific and engineering fields from
oceanography to geophysics since the 1970s (Ng and Char, 1979; Galanti and
Tziperman, 2003; Charpentier and Espindola, 2005). It is based on encod-
ing the operations of differential calculus for each fundamental mathemati-
cal operation used by a given computation, and then combining them using
the chain rule when the program runs (Griewank, 2003). Consider the function
$h : \mathbb{R}^m \to \mathbb{R}^n$ formed by composing the functions
$f : \mathbb{R}^p \to \mathbb{R}^n$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ such that
\[
y = h(x) = f(u), \qquad u = g(x). \tag{1.4}
\]
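For a composition of this form, the chain rule says that the Jacobian of $h$ is the matrix product of the Jacobian of $f$, evaluated at $u = g(x)$, and the Jacobian of $g$. The sketch below checks this numerically for one hypothetical choice of $f$ and $g$ with $m = p = 2$ and $n = 1$; the functions, the evaluation point, and the helper names are assumptions made purely for illustration.

```python
import math


def g(x):
    """Inner function g: R^2 -> R^2 (hypothetical example)."""
    return [x[0] * x[1], x[0] + x[1]]


def jacobian_g(x):
    return [[x[1], x[0]],
            [1.0, 1.0]]


def f(u):
    """Outer function f: R^2 -> R^1 (hypothetical example)."""
    return [math.sin(u[0]) + u[1] ** 2]


def jacobian_f(u):
    return [[math.cos(u[0]), 2.0 * u[1]]]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def jacobian_h_chain_rule(x):
    """Jacobian of h = f(g(x)) assembled from the two local Jacobians."""
    return matmul(jacobian_f(g(x)), jacobian_g(x))


def jacobian_h_finite_difference(x, step=1e-6):
    """Central differences, used here only to check the chain rule."""
    n_out = len(f(g(x)))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += step
        down[j] -= step
        f_up, f_down = f(g(up)), f(g(down))
        for i in range(n_out):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * step)
    return jac


point = [0.7, 1.3]
print(jacobian_h_chain_rule(point))
print(jacobian_h_finite_difference(point))
```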
[Link] Performance
Source code transformation is a disruptive technique that complicates build
processes and infrastructure. In practice, operator overloading is more popu-
lar, but it suffers from high storage costs and long run times, with the result
that numerous implementation techniques have been devised to mitigate the
performance challenges inherent with AD and it remains an active area of
research (e.g., Faure and Naumann, 2002). The fundamental barrier to per-
formance in AD is the granularity at which the chain rule is implemented.
When using tools to apply operator overloading to an existing codebase, the
granularity is defined by the operators that are overloaded, which is typically
very fine. Any per-operator overhead is multiplied by the complexity of the
calculation, and for even modestly sized vanilla portfolios, this complexity is
considerable. If instead the Jacobians can be formed at a higher level, so that
each one combines multiple functional operations (multiple links in the chain,
such as f and g together), then per-operator overhead is eliminated for each
group and performance scales differently (Gibbs and Goyder, 2013b).
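The storage cost of fine-grained operator overloading can be seen in a minimal tape-based sketch. This is a toy reverse-mode implementation written for illustration, not the API of any particular AD tool: every overloaded `+` and `*` (and every lifted constant) appends a node to a shared tape, and a reverse sweep over the tape accumulates adjoints.

```python
class Var:
    """Scalar that records every overloaded operation on a shared tape."""

    def __init__(self, value, tape, parents=()):
        self.value = value
        self.tape = tape
        # Pairs of (parent node, local partial derivative w.r.t. that parent).
        self.parents = parents
        self.adjoint = 0.0
        tape.append(self)

    def _lift(self, other):
        """Wrap plain numbers so they can take part in taped operations."""
        return other if isinstance(other, Var) else Var(float(other), self.tape)

    def __add__(self, other):
        other = self._lift(other)
        return Var(self.value + other.value, self.tape,
                   ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return Var(self.value * other.value, self.tape,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def backward(output):
    """Reverse sweep: apply the chain rule once per recorded operation."""
    output.adjoint = 1.0
    for node in reversed(output.tape):
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial


tape = []
x, y, z = Var(1.0, tape), Var(4.0, tape), Var(5.0, tape)
h = 2.0 * x + 3.0 * y * z
backward(h)
print(h.value, x.adjoint, y.adjoint, z.adjoint, len(tape))
```

Even this three-input expression leaves nine nodes on the tape, which gives a sense of how per-operator overhead grows with the size of a portfolio valuation.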
In addition, it is not necessary to construct an entire Jacobian at any level
in a calculation and hold it in memory as a single object. To illustrate, suppose
Equation 1.4 takes the concrete form
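\[
h(x, y, z) = \alpha x + \beta y z = f(x, g(y, z)) \tag{1.6}
\]
where $f(a, b) = \alpha a + \beta b$ and $g(a, b) = ab$, all variables are in $\mathbb{R}$, and $\alpha$
and $\beta$ are considered constants. By differentiating,
\[
\begin{aligned}
dh &= \frac{\partial h}{\partial x}\,dx + \frac{\partial h}{\partial y}\,dy + \frac{\partial h}{\partial z}\,dz \\
   &= \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial g}\frac{\partial g}{\partial y}\,dy + \frac{\partial f}{\partial g}\frac{\partial g}{\partial z}\,dz \\
   &= \alpha\,dx + \beta z\,dy + \beta y\,dz.
\end{aligned} \tag{1.7}
\]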
The Jacobians are A = (α, β) and B = (b, a), however, in the chain rule
above, only factors such as (∂f /∂g)(∂g/∂y) are required to determine the
sensitivity to a given variable, such as y. These factors comprise subsets of
each Jacobian and turn out to be exactly those subsets that reside on the
stack when typical implementations of hand-coded AD are calculating the
sensitivity to a given risk factor (Gibbs and Goyder, 2013b).
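A hand-coded version of this example might look like the sketch below. The function name and the test values are hypothetical; the point is that no Jacobian object is ever built, and only the local factors needed for each sensitivity exist as ordinary local variables.

```python
def h_with_sensitivities(x, y, z, alpha, beta):
    """Value and first-order sensitivities of h = alpha*x + beta*y*z."""
    # Forward pass.
    g = y * z
    h = alpha * x + beta * g

    # Reverse pass: only the chain rule factors that are needed.
    dh_dx = alpha        # df/da
    dh_dg = beta         # df/db
    dh_dy = dh_dg * z    # (df/dg)(dg/dy)
    dh_dz = dh_dg * y    # (df/dg)(dg/dz)
    return h, (dh_dx, dh_dy, dh_dz)


value, sensitivities = h_with_sensitivities(1.0, 4.0, 5.0, alpha=2.0, beta=3.0)
print(value, sensitivities)  # 62.0 (2.0, 15.0, 12.0)
```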
Implementing AD “by hand” in this way also enables a number of
optimizations which further reduce storage costs and increase speed of
computation. Deeply recursive structures arise naturally in a financial context,
in path-dependent payoffs such as those found in contracts with accumula-
tor features, and in curve bootstrapping algorithms where each maturity of,
say, expected Libor depends on earlier maturities. Naive implementations of
AD suffer from quadratic scaling in such cases, whereas linear performance is
possible by flattening each recursion.
In Monte Carlo simulations, only the payoff and state variable modeling
components of a portfolio valuation change on each path. Curves, volatility
surfaces, and other model parameters do not, and so it is beneficial to calcu-
late sensitivities per path to state variables and only continue the chain rule
through the relationship between model parameters and market data (based
on calibration) once. In contrast, for certain contracts such as barrier options,
it is common to evaluate a volatility surface many times, which would result
in a large Jacobian. Instead, performance will be improved if the chain rule
is extended on each volatility surface evaluation through the parameters on
which the surface depends. This set of parameters is typically much smaller
than the number of volatility surface evaluations.
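The first of these optimizations can be sketched for a single European call. The example assumes geometric Brownian motion with terminal sampling, a pathwise derivative with respect to the model volatility, and a deliberately artificial calibration in which the model volatility is the square root of a quoted variance. The calibration map, the function name, and the parameter values are all hypothetical. Sensitivities to the model parameter are accumulated on every path, while the chain rule through the calibration is applied a single time at the end.

```python
import math
import random


def price_and_quote_sensitivity(s0, strike, rate, quoted_variance,
                                maturity, n_paths, seed=0):
    """Call price and its sensitivity to a market quote via pathwise AD."""
    # Calibration step (hypothetical): model volatility from quoted variance.
    sigma = math.sqrt(quoted_variance)
    d_sigma_d_quote = 0.5 / sigma

    rng = random.Random(seed)
    discount = math.exp(-rate * maturity)
    sqrt_t = math.sqrt(maturity)
    sum_payoff = 0.0
    sum_d_sigma = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_t = s0 * math.exp((rate - 0.5 * sigma * sigma) * maturity
                            + sigma * sqrt_t * z)
        if s_t > strike:
            sum_payoff += discount * (s_t - strike)
            # Per-path sensitivity to the model parameter only.
            sum_d_sigma += discount * s_t * (-sigma * maturity + sqrt_t * z)

    price = sum_payoff / n_paths
    d_price_d_sigma = sum_d_sigma / n_paths
    # Chain rule through the calibration, applied once rather than per path.
    return price, d_price_d_sigma * d_sigma_d_quote


price, sensitivity = price_and_quote_sensitivity(
    100.0, 100.0, 0.02, 0.04, 1.0, n_paths=50000)
print(price, sensitivity)
```

With many market quotes feeding each model parameter, deferring the calibration step in this way avoids repeating the same matrix-vector work on every path.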
While the value of implementing AD by hand is high, so is the cost. A good
rule of thumb is that implementing AD costs, on average, approximately
the same as implementing the calculation whose sensitivities are desired.
The prospect of doubling investment in quantitative development resources
is unwelcome for almost any financial business, and consequently implemen-
tations of AD tend to be patchy in their coverage, restricted to specific types
of trade or other bespoke aspects of a given context.
[Link] Coverage
When evaluating value, risk, and similar metrics, it is important to have a
complete picture. Any hedging, portfolio rebalancing, or other related decision
making that is performed on incomplete information will be exposed to the
slippage due to the difference between the actual position and the reported
numbers used. Whilst some imperfections, such as those imposed by model
choice, are inevitable, there is no need or even excuse to introduce oth-
ers through unnecessary choices such as an incomplete risk implementation.
This can be easily avoided by adopting a generic approach that enables the
necessary computations to be easily implemented at every step of the risk
calculation.
For complex models, the chain rule of differentiation can and does lead
to the mixing of exposures to different types of market risk. For example,
volatility surfaces specified relative to the forward are quite common and
lead to contributions of the form (∂C/∂σ)(∂σ(F, K)/∂F) to the delta of
an option trade through the “sticky smile.” This mixing obviously increases
with model complexity and consequently any generic approach has to confront
it directly. Imposing the simple requirement that all first-order derivatives,
rather than just some targeted towards a specific subset of exposures, are
always calculated as part of the AD process allows an implementation to
avoid missing these risk contributions.
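As a worked illustration of this mixing, write the option value as C(F, σ(F, K)), where the smile σ depends on the forward F and the strike K. The total delta then follows from the chain rule. The notation is taken from the expression above, and the decomposition is a sketch rather than a statement about any particular model:

\[
\frac{dC}{dF} = \frac{\partial C}{\partial F} + \frac{\partial C}{\partial \sigma}\,\frac{\partial \sigma(F, K)}{\partial F}.
\]

The first term is the delta at fixed volatility; the second is the vega-weighted smile contribution that a calculation targeting delta alone would miss.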
It is possible to separate valuation and related calculations into a chain
of logical steps with a high level of commonality and reusability for each link
in the chain. In general, the steps are the calibration of model parameters
to input observables such as market data quotes, the setup of simulation or
similar calculations, specific pricing routines for classes of financial contracts,
and portfolio-level aggregation of the values calculated per-trade. These links
in the chain can be separately built up and independently maintained to form
a comprehensive analytic library that can then be used to cover essentially all
valuation, risk, and related calculations.
Within each of these steps, the dependency tree of calculations has to be
checked and propagated through the calculation. By carefully defining how
these steps interact with each other, a high level of reuse can be achieved.
As well as enabling the use of pricing algorithms within model calibration,
a key consideration when setting up complex models, the nested use of such
calculations is important for analyzing the important class of contracts with
embedded optionality. In particular, the presence of multiple exercise oppor-
tunities in American and Bermudan options necessitates some form of approx-
imation of future values when evaluating the choices that can be made at each
of these exercise points. Typically, this approximation is formed by regressing
these future values against explanatory variables known at the choice time.
Standard methods such as maximum likelihood determine the best fit through
minimizing some form of error function; the act of minimization means that
first-order derivatives are by construction zero. Note that this observation
applies on average to the particular values (Monte Carlo paths, or similar)
used in the regression. Subsequent calculations are exposed to numerical noise
when a different set of values are used in an actual calculation, and this noise
extends to the associated risk computations.
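The vanishing first-order derivatives can be checked on a toy regression. The sketch below uses ordinary least squares with a single explanatory variable as a stand-in for whichever fitting method is actually used; the data are hypothetical. The gradient of the squared error with respect to the fitted coefficients is zero on the paths used in the fit, but not on a different set of paths.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a + b*x via the normal equations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sse_gradient(a, b, xs, ys):
    """Gradient of the sum of squared errors with respect to (a, b)."""
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    return -2 * sum(res), -2 * sum(r * x for r, x in zip(res, xs))

# Hypothetical explanatory variable and realized future values on the regression paths.
x_fit = [0.8, 0.9, 1.0, 1.1, 1.2]
y_fit = [0.25, 0.12, 0.09, 0.02, 0.0]
a, b = fit_line(x_fit, y_fit)
grad_fit = sse_gradient(a, b, x_fit, y_fit)   # zero up to rounding, by construction

# A different set of paths: the same coefficients are no longer stationary.
x_new = [0.85, 0.95, 1.05, 1.15, 1.25]
y_new = [0.20, 0.15, 0.03, 0.04, 0.0]
grad_new = sse_gradient(a, b, x_new, y_new)
```

The nonzero gradient on the second sample is the source of the numerical noise described above.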
1.4 Conclusion
This chapter provides a comprehensive overview of computational
challenges in the history of derivative pricing. Before the 2008 credit crisis,
the main driver of computational expense was the complexity of sophisticated
contracts and associated market risk management. In the wake of the crisis,
counterparty credit risk came to the fore, while contracts simplified. Paradox-
ically, incorporating counterparty risk into pricing for even vanilla derivative
portfolios presents computational problems more challenging than for the most
complex precrisis contracts.
In response to this challenge, investment in specialized computing hard-
ware such as GPU and FPGA, along with the rising tide of commodity com-
pute power available in clouds, rose sharply in the last decade. However,
specialized hardware requires specialized software development to realize its
power, which in turn limits the flexibility of systems and the speed at which
they are able to cope with changing market and regulatory requirements.
Commodity hardware, particularly if managed in-house, can incur nontrivial
infrastructure and maintenance costs.
For calculating sensitivities, a key input to both market risk management
and regulatory calculations for initial margin and capital requirements, a soft-
ware technique called AD offers the potential to reduce hardware requirements
dramatically. While popularized only recently, AD is an old idea. Tools for
applying AD to existing codebases exist but suffer from performance draw-
backs such that the full potential of the technique, for both performance and
coverage of contract and calculation types, is only realized when AD is incor-
porated into a system’s architecture from the beginning.
References
Acerbi, C. and Szkely, B.: Back-testing expected shortfall. Risk Magazine (December
2014).
Albanese, C. and Andersen, L.: Accounting for OTC Derivatives: Funding Adjust-
ments and the Re-Hypothecation Option. Available at SSRN: [Link]
abstract=2482955 or [Link] accessed March
10, 2017. 2014.
Albanese, C., Andersen, L., and Stefano, I.: The FVA Puzzle: Accounting, Risk
Management and Collateral Trading. Available at SSRN: [Link]
abstract=2517301 or [Link] accessed March
10, 2017. 2014.
Brigo, D., Buescu, C., and Morini, M.: Counterparty risk pricing: Impact of close-
out and first-to-default times. International Journal of Theoretical and Applied
Finance, 15(6): 2012.
Brigo, D., Morini, M., and Pallavicini, A.: Counterparty Credit Risk, Collateral and
Funding. John Wiley & Sons, UK, 2013.
Elouerkhaoui, Y.: From FVA to KVA: Including cost of capital in derivatives pricing.
Risk Magazine (March 2016).
Faure, C. and Naumann, U.: Minimizing the tape size. In Corliss G., Faure C.,
Griewank A., Hascoët L., and Naumann U., editors, Automatic Differentiation
of Algorithms: From Simulation to Optimization, Computer and Information
Science, Chapter 34, pages 293–298. Springer, New York, NY, 2002.
Gibbs, M. and Goyder, R.: Automatic Numeraire Corrections for Generic Hybrid
Simulation. Available at SSRN: [Link] or http://
[Link]/10.2139/ssrn.2311740, accessed August 16. 2013a.
Green, A.: XVA: Credit, Funding and Capital Valuation Adjustments. John Wiley
& Sons, UK, 2016.
Gregory, J.: Being two faced over counterparty credit risk. Risk, 22(2):86–90, 2009.
Gregory, J.: The xVA Challenge. John Wiley & Sons, UK, 2015.
Ng, E. and Char, B. W.: Gradient and Jacobian computation for numerical appli-
cations. In Ellen Golden V., editor, Proceedings of the 1979 Macsyma User’s
Conference, pages 604–621. NASA, Washington, D.C., June 1979.
Ramirez, J.: Accounting for Derivatives: Advanced Hedging under IFRS 9. John
Wiley & Sons, UK, 2015.
Sherif, N.: Chips off the menu. Risk Magazine, 27(1): 12–17, 2015.
Watt, M.: Corporates rear CVA charge will make hedging too expensive. Risk,
October Issue 2011.
Chapter 2
Using Market Sentiment to Enhance
Second-Order Stochastic Dominance
Trading Models
CONTENTS
2.1 Introduction and Background
2.1.1 Enhanced indexation applying SSD criterion
2.1.2 Revising the reference distribution
2.1.3 Money management via “volatility pumping”
2.1.4 Solution methods for SIP models
2.1.5 Guided tour
2.2 Data
2.2.1 Market data
2.2.2 News meta data
2.3 Information Flow and Computational Architecture
2.4 Models
2.4.1 Impact measure for news
    Sentiment score
    Impact score
2.4.2 Long–short discrete optimization model based on SSD
2.4.3 Revision of reference distribution
2.5 Trading Strategies
2.5.1 Base strategy
    Strategy A: Full asset universe
2.5.2 Using relative strength index as a filter
    Strategy B: Asset filter relative strength index
2.5.3 Using relative strength index and impact as filters
    Strategy C: Asset filter relative strength index and impact
2.5.4 A dynamic strategy using money management
2.6 Solution Method and Processing Requirement
2.6.1 Solution of LP and SIP models
2.6.2 Scale-Up to process larger models
2.7 Results
2.8 Conclusion and Discussions
Acknowledgments
References
between the portfolio and index returns. Other methods have been proposed
(for a review of these methods, see Beasley et al. 2003; Canakgoz and Beasley
2008).
The passive portfolio strategy of index tracking is based on the
well-established “Efficient Market Hypothesis” (Fama, 1970), which implies
that financial indices achieve the best returns over time. Enhanced indexa-
tion models are related to index tracking in the sense that they also consider
the return distribution of an index as a reference or benchmark. However,
they aim to outperform the index by generating “excess” return (DiBar-
tolomeo, 2000; Scowcroft and Sefton, 2003). Enhanced indexation is a very
new area of research and there is no generally accepted portfolio construc-
tion method in this field (Canakgoz and Beasley, 2008). Although the idea of
enhanced indexation was formulated as early as 2000, only a few enhanced
indexation methods were proposed later in the research community; for
a review of this topic see Canakgoz and Beasley (2008). These methods
are predominantly concerned with overcoming the computational difficulty
that arises due to restriction on the cardinality of the constituent assets
in the portfolios. Little consideration is given to the question of whether
they attain their stated purpose, that is, achieve return in excess
of the index.
In an earlier paper (Roman, Mitra, and Zverovich, 2013), we presented
extensive computational results illustrating the effective use of the
SSD criterion to construct “models of enhanced indexation.” The SSD
criterion has long been recognized as a rational criterion of choice between
wealth distributions (Hadar and Russell 1969; Bawa 1975; Levy 1992). Empirical
tests for SSD portfolio efficiency have been proposed in Post (2003) and
Kuosmanen (2004). More recently, the SSD choice criterion has been proposed
for portfolio construction (Dentcheva and Ruszczynski 2003, 2006; Roman et al.
2006). The approach described
in Dentcheva and Ruszczynski (2003, 2006) first considers a reference (or
benchmark) distribution and then computes a portfolio, which dominates the
benchmark distribution by the SSD criterion. In Roman et al. (2006), a multi-
objective optimization model is introduced to achieve SSD dominance. This
model is both novel and usable since, when the benchmark solution itself is
SSD efficient or its dominance is unattainable, it finds an SSD-efficient port-
folio whose return distribution comes close to the benchmark in a satisfying
sense. The topic continues to be researched by academics who have strong
interest in this approach: Dentcheva and Ruszczynski (2010), Post and Kopa
(2013), Kopa and Post (2015), Post et al. (2015), Hodder, Jackwerth, and
Kolokolova (2015), Javanmardi and Lawryshyn (2016). Over the last decade,
we have proposed computational solutions and applications to large-scale
applied problems in finance (Fábián et al. 2011a, 2011b; Roman et al. 2013;
Valle et al. 2017).
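For equiprobable scenarios, SSD can be checked directly by comparing tails: one return sample dominates another if, for every s, the sum of its s worst outcomes is at least as large as the corresponding sum for the other sample. The sketch below applies this test to small hypothetical samples; the function names are illustrative and are not taken from the cited papers.

```python
def tails(outcomes):
    """Cumulative sums of the outcomes sorted from worst to best."""
    total, result = 0.0, []
    for value in sorted(outcomes):
        total += value
        result.append(total)
    return result

def ssd_dominates(x, y, tol=1e-12):
    """True if sample x weakly dominates sample y by SSD (equiprobable scenarios)."""
    if len(x) != len(y):
        raise ValueError("both samples must have the same number of scenarios")
    return all(tx >= ty - tol for tx, ty in zip(tails(x), tails(y)))

# Hypothetical returns: the same mean, but the benchmark is more spread out.
portfolio = [0.01, 0.01]
benchmark = [-0.02, 0.04]
portfolio_dominates = ssd_dominates(portfolio, benchmark)
benchmark_dominates = ssd_dominates(benchmark, portfolio)
```

Here the portfolio dominates the benchmark but not the other way round, as expected for a risk-averse ordering of two samples with equal means.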
From a theoretical perspective, enhanced indexation calls for further
justification. The efficient market hypothesis (EMH) is based on the key
assumption that security prices fully reflect all available information. This
hypothesis, however, has been continuously challenged; the simple fact that
academics and practitioners commonly use “active,” that is, non-index track-
ing strategies, vindicates this claim. An attempt to reconcile the advocates
and opponents of the EMH is the “adaptive market hypothesis” (AMH) put
forward by Lo (2004). AMH postulates that the market “adapts” to the infor-
mation received and is generally efficient, but there are periods of time when
it is not; these periods can be used by investors to make profit in excess of the
market index. From a theoretical point of view, this justifies the quest for tech-
niques that seek excess return over financial indices. In this sense, enhanced
indexation aims to discover and exploit market inefficiencies. As set out ear-
lier, a common problem with the index tracking and enhanced indexation
models is the computational difficulty which is due to cardinality constraints
that limit the number of stocks in the chosen portfolio. It is well known that
most index tracking models naturally select a very large number of stocks in
the composition of the tracking portfolio. Cardinality constraints overcome
this problem, but they require introduction of binary variables and thus the
resulting model becomes much more difficult to solve. Most of the literature
in the field is concerned with overcoming this computational difficulty. The
good in-sample properties of the return distribution of the chosen portfolios
have been underlined in previous papers: Roman, Darby-Dowman, and Mitra
(2006), using historical data; Fábián et al. (2011c), using scenarios generated
via geometric Brownian motion.
However, it is the actual historical performance of the chosen portfolios
(measured over time and compared with the historical performance of the
index) that provides empirical validation of whether the models achieved their
stated purpose of generating excess return.
We also investigate aspects related to the practical application of portfolio
models in which the asset universe is very large; this is usually the case in
index tracking and enhanced indexation models. It has been recently shown
that very large SSD-based models can be solved in seconds, using solution
methods which apply the cutting-plane approach, as proposed by Fábián et al.
(2011a). Imposing additional constraints that add realism (e.g., cardinality
constraints, normally required in index tracking) increases the computational
time dramatically.
market sentiment. For a full description of this method, see Valle, Roman, and
Mitra (2017; also Section 2.4.3) and Mitra, Erlwein-Sayer, and Roman (2017).
2.2 Data
Our modelling architecture uses two streams of time-series data: (i) market
data which is given on a daily frequency, and (ii) news metadata as supplied
by Thomson Reuters. A detailed description of these datasets is given below.
[Figure: flow diagram with boxes labeled Market data, News data, Impact measure, Revision of reference distribution, SSD static asset allocation, Money management, and Trade signals; the label FortSP also appears.]
2.4 Models
2.4.1 Impact measure for news
In our analytical model for news we introduce two concepts, namely, (i)
sentiment score and (ii) impact score. The sentiment score is a quantification
of the mood (of a typical investor) with respect to a news event. The impact
score takes into consideration the decay of the sentiment of one or more news
events and how after aggregation these impact the asset behavior.
where Sent denotes a single transformed sentiment score. We find that such
a derived single score provides a relatively better interpretation of the mood
of the news item. Thus, the news sentiment score is a relative number that
describes the degree of positivity and negativity in a piece of news. During
the trading day, as news arrives it is given a sentiment value. Given that
−50 ≤ Sent ≤ 50, for a given news item k at the time bucket tk , we define
P News(k, tk ) and N News(k, tk ) as the sentiments of the kth news (see the
following section).
a news item does not solely have an effect on the markets at the time of
release; the impact also persists over finite periods of time that follow. To
account for this prolonged impact, we have applied an attenuation technique
to reflect the instantaneous impact of news releases and the decay of this
impact over a subsequent period of time. The technique combines exponential
decay and accumulation of the sentiment score over a given time bucket under
observation. We take into consideration the attenuation of positive sentiment
to the neutral value and the rise of negative sentiment also to the neutral
value and accumulate (sum) these sentiment scores separately. The separation
of the positive and negative sentiment scores is only logical as this avoids
cancellation effects. For instance, cancellation reduces the news flow and an
exact cancellation leads to the misinterpretation of no news.
News arrives asynchronously; depending on the nature of the sentiment it
creates, we classify these into three categories, namely: positive, neutral, and
negative. For the purpose of deriving impact measures, we only consider the
positive and negative news items.
Let,
POS denote the set of news with positive sentiment value Sent > 0
NEG denote the set of news with negative sentiment value Sent < 0
P News(k, tk ) denote the sentiment value of the kth positive news arriving
at time bucket tk , 1 ≤ tk ≤ 630 and k ∈ POS; P News(k, tk ) > 0
N News(k, tk ) denote the sentiment value of the kth negative news arriving
at time bucket tk , 1 ≤ tk ≤ 630 and k ∈ NEG; N News(k, tk ) < 0
Let λ denote the exponent which determines the decay rate. We have
chosen λ such that the sentiment value decays to half the initial value in a 90
minute time span. The cumulated positive and negative sentiment scores for
one day are calculated as
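\[ \mathrm{PImpact}(t) = \sum_{\substack{k \in \mathrm{POS} \\ t_k \le t}} \mathrm{PNews}(k, t_k)\, e^{-\lambda (t-1)} \tag{2.2} \]
\[ \mathrm{NImpact}(t) = \sum_{\substack{k \in \mathrm{NEG} \\ t_k \le t}} \mathrm{NNews}(k, t_k)\, e^{-\lambda (t-1)} \tag{2.3} \]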
In Equations 2.2 and 2.3 for intraday P Impact and N Impact, t is in the range,
t = 1, . . . , 630. On the other hand for a given asset all the relevant news items
which arrived in the past, in principle, have an impact for the asset. Hence,
the range of t can be widened to consider past news, that is, news which are
2 or more days “old.”
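The decay and accumulation can be sketched as follows. Two assumptions are made that the text does not state explicitly: each time bucket is one minute long, and the decay is measured from each item's arrival bucket, so the exponent uses the elapsed time t − t_k and a score halves 90 minutes after arrival. The function name and the sample news items are hypothetical.

```python
import math

HALF_LIFE_MINUTES = 90.0
# Decay rate chosen so that exp(-LAMBDA * 90) equals one half.
LAMBDA = math.log(2.0) / HALF_LIFE_MINUTES

def impact_scores(news, t):
    """Return (positive impact, negative impact) at time bucket t.

    `news` is a list of (arrival_bucket, sentiment) pairs. Positive and
    negative sentiments are accumulated separately to avoid cancellation.
    """
    p_impact = n_impact = 0.0
    for t_k, sent in news:
        if t_k > t or sent == 0:
            continue  # not yet arrived, or neutral
        decayed = sent * math.exp(-LAMBDA * (t - t_k))
        if sent > 0:
            p_impact += decayed
        else:
            n_impact += decayed
    return p_impact, n_impact

# Hypothetical items: (arrival bucket, sentiment in [-50, 50]).
news = [(10, 40.0), (100, -20.0), (100, 30.0)]
at_100 = impact_scores(news, 100)   # first item has decayed for 90 minutes
at_190 = impact_scores(news, 190)   # all items have decayed further
```

Keeping the two sums separate means that a +30 and a −20 item arriving together register as impacts of 30 and −20, not as a single net score of 10.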
A bias exists for those companies with a bigger market capitalization
because they are covered more frequently in the news. For the smaller
companies within a stock market index, press coverage is lower and
there are therefore fewer data points to work with.
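\[ \sum_{i=1}^{n} x_i^{-} = \alpha \tag{2.6} \]
\[ x_i^{+} \le (1+\alpha)\, z_i^{+} \quad \forall i \in N \tag{2.7} \]
\[ x_i^{-} \le \alpha\, z_i^{-} \quad \forall i \in N \tag{2.8} \]
\[ z_i^{+} + z_i^{-} \le 1 \quad \forall i \in N \tag{2.9} \]
\[ \frac{s}{S} V + \hat{\tau}_s \le \frac{1}{S} \sum_{i=1}^{n} \sum_{j \in J_s} r_{ij} (x_i^{+} - x_i^{-}) \quad \forall J_s \subset \{1,\dots,S\},\ |J_s| = s,\ s = 1,\dots,S \tag{2.10} \]
\[ V \in \mathbb{R},\quad x_i^{+}, x_i^{-} \in \mathbb{R}_{+},\quad z_i^{+}, z_i^{-} \in \{0,1\} \quad \forall i \in N \tag{2.11} \]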
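Constraints (2.6) to (2.9) can be checked for a candidate portfolio with a few lines of code. The sketch below reads x+ and x− as long and short weights and z+ and z− as binary indicators of a long or short position, which is what the variable domains in (2.11) suggest; this reading, the function name, and the sample portfolios are assumptions made for illustration.

```python
def violated_constraints(x_plus, x_minus, z_plus, z_minus, alpha, tol=1e-9):
    """Return the sorted labels of constraints (2.6)-(2.9) that a candidate violates."""
    violated = set()
    if abs(sum(x_minus) - alpha) > tol:        # (2.6) total short weight equals alpha
        violated.add("2.6")
    for xp, xm, zp, zm in zip(x_plus, x_minus, z_plus, z_minus):
        if xp > (1.0 + alpha) * zp + tol:      # (2.7) long weight only if z+ = 1
            violated.add("2.7")
        if xm > alpha * zm + tol:              # (2.8) short weight only if z- = 1
            violated.add("2.8")
        if zp + zm > 1:                        # (2.9) not long and short at once
            violated.add("2.9")
    return sorted(violated)

alpha = 0.2
# Long two assets, short the third.
ok = violated_constraints([0.7, 0.5, 0.0], [0.0, 0.0, 0.2], [1, 1, 0], [0, 0, 1], alpha)
# The third asset is held both long and short.
bad = violated_constraints([0.7, 0.4, 0.1], [0.0, 0.0, 0.2], [1, 1, 1], [0, 0, 1], alpha)
```

The first candidate satisfies all four constraints; the second violates only (2.9).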
[Plot: densities of the improved and reference distributions; horizontal axis Values (−20 to 20), vertical axis Density (0.00 to 0.25).]
FIGURE 2.2: Density curves for the original and improved reference
distributions.
where $\alpha(i) = \lambda e^{-\lambda i}$ is the exponential weight, $X_t$ is the price of the asset, and
$L_t = (X_t - X_{t-\mathrm{days}})\, I_{\{X_t - X_{t-\mathrm{days}} < 0\}}$ is the loss at time $t$, with days an offset
of a chosen number of days. Analogously, we calculate the average gain as
\[ \mathrm{EMA}(\mathrm{Gain}_t) = \frac{1}{n} \sum_{i=0}^{n} \alpha(i)\, G_{t-i}, \]
where $G_t = (X_t - X_{t-\mathrm{days}})\, I_{\{X_t - X_{t-\mathrm{days}} \ge 0\}}$ is the gain at time $t$. Typical RSI
values are calculated with average gains and losses over a period of 14 days,
often called the lookback period. In the literature, the RSI is considered to
highlight overbought assets, that is, assets whose RSI is above the threshold
of 70, and oversold assets, that is, assets whose RSI is below the threshold of 30.
In our application, we compute the RSI value for each available asset and
flag assets as potential long candidates if the RSI is below 30 and as potential
short candidates if the RSI is above 70. If, on the other hand, the RSI value
is between 30 and 70, the asset is not considered to be a portfolio constituent.
By doing so, we restrict the available asset universe compared to the above
base-case strategy A.
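The filter can be sketched as below. As a simplification, the sketch uses the textbook RSI, 100 − 100/(1 + RS) with RS the ratio of average gain to average loss, computed from simple averages of one-day price changes rather than the exponentially weighted averages described above. The function names and price series are hypothetical.

```python
def rsi(prices, lookback=14):
    """RSI from simple average gains and losses over the last `lookback` changes."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])][-lookback:]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_loss == 0.0:
        # No losses in the window: fully overbought, or flat prices.
        return 100.0 if avg_gain > 0.0 else 50.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

def candidate_side(rsi_value, oversold=30.0, overbought=70.0):
    """Return 'long', 'short', or None (asset excluded from the universe)."""
    if rsi_value < oversold:
        return "long"
    if rsi_value > overbought:
        return "short"
    return None

rising = [100.0 + i for i in range(16)]      # only gains
falling = [100.0 - i for i in range(16)]     # only losses
choppy = [10.0, 11.0] * 8                    # gains and losses balance
sides = [candidate_side(rsi(p)) for p in (rising, falling, choppy)]
```

A steadily rising asset is flagged as a short candidate, a steadily falling one as a long candidate, and the balanced series is dropped from the universe.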
SSD criterion and also applying the principle of money management. Since
our SSD model has long–short positions in risky assets we also determine the
long and short exposures dynamically but staying within regulatory regimes
such as “Regulation T” stipulated by the US regulators. Thus if the strategy
is to have (100 + alpha)/alpha; that is, (100 + alpha) long and (alpha) short
exposures, then we simply control this adaptively by a limit (alpha-max) such
that (alpha) <= (alpha-max). The actual settings of the money management
parameter and the parameter “alpha-max” in our experiments are discussed
in Section 2.7 where we report results of our experiments.
a parallel node may have to run many executions of the separation algorithm
until it finds no violated cut for that linear relaxation solution. Perhaps, if
that node was executed after a previous one, the cut found before could have
prevented several of these executions.
Thus, in an efficient implementation of a branch-and-cut algorithm, parallel
processors share a collection of information comprising current bounds,
pending node priorities, and violated cuts. This is a suitable setting for the
classical master-slaves, or centralized control, strategy. In this case there would
be a dedicated master process handling the queue, fathoming nodes, updating
priorities and controlling cuts, all based on the arrival of asynchronous infor-
mation. Slave processes would be responsible for solving the highest priority
pending tree nodes. Thus, the master process maintains global knowledge and
controls the entire search, while slave processes receive pending nodes from
the master processor, solve their linear relaxations and attempt to find new
violated cuts. Finally, the slave returns information to the master processor.
The higher the number of cuts in the model, the slower is the resolution
of a linear relaxation in a particular node. In a branch-and-cut setting, the
master processor should also maintain a cut pool with a list of previously
found cuts. The master processor must handle a few tasks, listed below:
• Check the cut pool in order to identify “tight” and “slack” cuts, that
is, the master processor must identify cuts that are likely (unlikely) to
change the value of a linear relaxation solution.
• Receive newly identified cuts from slave processors and add them to the
pool.
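The two tasks above can be sketched with a small cut pool. The sketch makes several assumptions that the text does not: cuts are stored as inequalities of the form a·x ≤ b, a cut counts as "tight" when it holds with equality (or is violated) at the current relaxation solution and as "slack" otherwise, and duplicates reported by different workers are ignored. The class and method names are hypothetical.

```python
class CutPool:
    """Pool of cuts of the form sum(a[i] * x[i]) <= b kept by the master process."""

    def __init__(self):
        self._cuts = {}  # dict keys keep insertion order and reject duplicates

    def add(self, coeffs, rhs):
        """Store a cut reported by a worker; return False if it is already pooled."""
        key = (tuple(coeffs), rhs)
        if key in self._cuts:
            return False
        self._cuts[key] = True
        return True

    def partition(self, x, tol=1e-9):
        """Split pooled cuts into (tight, slack) at the relaxation solution x."""
        tight, slack = [], []
        for coeffs, rhs in self._cuts:
            activity = sum(a * v for a, v in zip(coeffs, x))
            (tight if activity >= rhs - tol else slack).append((coeffs, rhs))
        return tight, slack

pool = CutPool()
first = pool.add([1.0, 1.0], 1.0)
second = pool.add([1.0, 0.0], 0.8)
duplicate = pool.add([1.0, 1.0], 1.0)      # already known, not stored again
tight, slack = pool.partition([0.5, 0.5])
```

In this example the first cut is tight at x = (0.5, 0.5) and would be sent with the node, while the second is slack and could be left out of the linear relaxation.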
2.7 Results
We use real-world historical daily data (adjusted closing prices) taken
from the universe of assets defined by the Financial Times Stock Exchange
100 (FTSE100) index over the period October 9, 2008 to November 1, 2016
(1765 trading days). The data were collected from Thomson Reuters Data
Stream platform and adjusted to account for changes in index composition.
This means that our models use no more data than was available at the time,
removing susceptibility to the influence of survivor bias. For each asset, we
compute the corresponding daily rates of return. The original benchmark dis-
tribution is obtained by considering the historical daily rates of return of
FTSE100 during the same time period.
The methodology we adopt is successive rebalancing over time with recent
historical data as scenarios. We start from the beginning of our data set.
Given an in-sample duration of S days, we decide a portfolio using data taken
from an in-sample period corresponding to the first S + 1 days (yielding S
daily returns for each asset). The portfolio is then held unchanged for an
out-of-sample period of 5 days. We then rebalance (change) our portfolio, but
now using the most recent S returns as in-sample data. The decided portfolio
is then again held unchanged for an out-of-sample period of 5 days, and the
process repeats until we have exhausted all of the data. We set S = 1200; the
total out-of-sample period spans slightly more than 6 years (October 1, 2010
to October 26, 2016).
Once the data have been exhausted we have a time series of 1532 portfolio
return values for out-of-sample performance, here from period 1201 (the first
out-of-sample return value, corresponding to October 1, 2010) until the end
of the data.
Portfolios are rebalanced every 5 days; for each experiment we solve 307
instances of the long–short SSD formulation described in Section 2.4.2, each
corresponding to a single rebalance.
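The count of 307 rebalances follows from the window arithmetic: 1532 out-of-sample returns held in blocks of 5 give 306 full blocks and one final block of 2. A minimal sketch of the rolling scheme, using 0-based half-open index ranges (so the text's period 1201 is index 1200) and a hypothetical function name:

```python
def rebalance_windows(n_returns, in_sample, hold):
    """Yield ((in-sample start, end), (out-of-sample start, end)) per rebalance."""
    for start in range(in_sample, n_returns, hold):
        yield (start - in_sample, start), (start, min(start + hold, n_returns))

# S = 1200 in-sample returns followed by 1532 out-of-sample returns.
windows = list(rebalance_windows(n_returns=1200 + 1532, in_sample=1200, hold=5))
n_rebalances = len(windows)
first_window, last_window = windows[0], windows[-1]
```

Each in-sample range always contains the most recent 1200 returns, and only the last out-of-sample range is shorter than 5 days.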
For every experiment, we set α = 0.2, that is, portfolios can have a long–short
exposure of up to 120/20. The strategies below also apply a money
management technique: on every day, the percentage of the portfolio
mark-to-market value invested in risky assets is fixed at 75%, the remaining
25% being invested in a risk-free asset yielding 2% a year. The SSD
strategy itself is rebalanced every 5 days in order to bring the portfolio to
the desired proportions.
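The fixed-fraction rule can be sketched as a daily update of the portfolio value. The sketch assumes that the 2% annual rate compounds over 252 trading days and that the 75/25 split is restored at the start of every day; both conventions, and the function name, are assumptions made for illustration.

```python
def managed_path(risky_returns, risky_fraction=0.75, annual_rf=0.02, days_per_year=252):
    """Portfolio values when a fixed fraction is reinvested in the risky strategy daily."""
    daily_rf = (1.0 + annual_rf) ** (1.0 / days_per_year) - 1.0
    value, path = 1.0, [1.0]
    for r in risky_returns:
        # Blend of the risky return and the risk-free accrual for one day.
        value *= 1.0 + risky_fraction * r + (1.0 - risky_fraction) * daily_rf
        path.append(value)
    return path

# One day with a 10% risky return and no interest: only 75% of it is earned.
one_day = managed_path([0.10], annual_rf=0.0)
# A flat risky strategy for a year: the 25% cash holding earns about 0.5%.
flat_year = managed_path([0.0] * 252)
```

With a flat risky strategy the portfolio grows only through the cash holding, by roughly a quarter of the 2% annual rate.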
Figure 2.3 shows portfolio paths for the three different strategies A, B, and
C as well as the Financial Times Stock Exchange 100 Index (FTSE100) over
the period from October 1, 2010 to October 26, 2016. The strategies all invest
in a subset of the companies listed on the FTSE100, where the actual asset
universe is defined by the asset universe filter stated above. The FTSE100
index is shown in solid black; the strategies A, B, and C are shown in dashed
dark-grey, dashed light-grey and solid light-grey, respectively.
All strategies outperform the FTSE100 index in the period considered.
We can also see that strategies B and C, where we filter the asset universe,
outperformed strategy A. Table 2.3 shows selected performance statistics,
namely:
• Final value: Normalized final value of the portfolio at the end of the
out-of-sample period.
• Excess over RFR (%): Annualized excess return over the risk-free rate.
For FTSE100, we used a yearly risk-free rate of 2%.
Both strategies B and C had higher returns and quicker recovery rates
when compared to strategy A. Strategies B and C have very similar statistics.
Overall, reducing the asset universe via strategies B and C allows us to improve
returns and reduce our risk exposure. However, that comes at a cost of a higher
correlation to the market itself (a higher beta) and a higher average turnover.
The latter is due to the asset universe in two consecutive rebalances being
potentially different; in such cases we may need to liquidate current positions
in assets that are not included in the current asset universe.
Our backtesting results for NIKKEI 250 are in line with the FTSE results
stated above. Strategies B and C lead to improved performance measures:
combining technical analysis with news sentiment results in the
desired portfolio properties.
Acknowledgments
We gratefully acknowledge the contribution of Tilman Sayer to this
work. While working at OptiRisk, Tilman participated in this project and
References
Bawa, V. S. Optimal rules for ordering uncertain prospects. Journal of Financial
Economics, 2(1):95–121, 1975.
Beasley, J. E., Meade, N., and Chang, T. J. An evolutionary heuristic for the index
tracking problem. European Journal of Operational Research, 148(3):621–643,
2003.
Fábián, C. I., Mitra, G., and Roman, D. Processing second-order stochastic domi-
nance models using cutting-plane representations. Mathematical Programming,
130(1):33–57, 2011a.
Fábián, C. I., Mitra, G., and Roman, D. An enhanced model for portfolio choice with
SSD criteria: A constructive approach. Quantitative Finance, 11(10):1525–1534,
2011b.
Fábián, C. I., Mitra, G., and Roman, D. Portfolio choice models based on second-
order stochastic dominance measures: An overview and a computational study.
In: Bertocchi M., Consigli G., Dempster M. (eds). Stochastic Optimization Meth-
ods in Finance and Energy. International Series in Operations Research & Man-
agement Science, Springer, New York, NY. vol. 163 pp. 441–469, 2011c.
Fama, E. F. Efficient capital markets: A review of theory and empirical work. The
Journal of Finance, 25(2):383–417, 1970.
Hadar, J. and Russell, W. R. Rules for ordering uncertain prospects. The American
Economic Review, 59(1):25–34, 1969.
Hodder, J. E., Jackwerth, J. C., and Kolokolova, O. Improved portfolio choice using
second-order stochastic dominance. Review of Finance, 19(1):1623–1647, 2015.
Kopa, M. and Post, T. A general test for SSD portfolio efficiency. OR Spectrum,
37(1):703–734, 2015.
Levy, H. Stochastic dominance and expected utility: Survey and analysis. Manage-
ment Science, 38(4):555–593, 1992.
Lo, A.W. The adaptive markets hypothesis. The Journal of Portfolio Management,
30(5):15–29, 2004.
Patton, A. J. and Verardo, M. Does beta move with news? Firm-specific informa-
tion flows and learning about profitability. The Review of Financial Studies,
25(9):2789–2839, 2012.
Post, T. Empirical tests for stochastic dominance efficiency. The Journal of Finance,
58(5):1905–1931, 2003.
Post, T., Fang, Y., and Kopa, M. Linear tests for DARA stochastic dominance.
Management Science, 61(1):1615–1629, 2015.
Roman, D., Mitra, G., and Zverovich, V. Enhanced indexation based on second-order
stochastic dominance. European Journal of Operational Research, 228(1):273–
281, 2013.
Talbi, E. G. Parallel Combinatorial Optimization. John Wiley & Sons, USA, 2006.
Thorp, E. O. Beat the Dealer: A Winning Strategy for the Game of Twenty-One.
Random House, New York, NY, 1966.
Thorp, E. O. and Kassouf, S. T. Beat the Market: A Scientific Stock Market System.
Random House, New York, NY, 1967.
Valle, C. and Mitra, G. News Analytics Toolkit User Manual, OptiRisk Systems,
London, UK, 2004. available online: [Link]
NAToolkit User [Link].
Valle, C. A., Roman, D., and Mitra, G. Novel approaches for portfolio construction
using second order stochastic dominance. Computational Management Science,
DOI 10.1007/s10287-017-0274-9, 2017.
Welles Wilder, J. New Concepts in Technical Trading Systems. Trend Research, UK,
1978.
Chapter 3
The Alpha Engine: Designing an
Automated Trading Algorithm
CONTENTS
3.1 Introduction
3.1.1 Asset management
3.1.2 The foreign exchange market
3.1.3 The rewards and challenges of automated trading
3.1.4 The hallmarks of profitable trading
3.2 In a Nutshell: Trading Model Anatomy and Performance
3.3 Guided by an Event-Based Framework
3.3.1 The first step: Intrinsic time
3.3.2 The emergence of scaling laws
3.3.3 Trading models and complexity
3.3.4 Coastline trading
3.3.5 Novel insights from information theory
3.3.6 The final pieces of the puzzle
3.4 The Nuts and Bolts: A Summary of the Alpha Engine
3.5 Conclusion and Outlook
Appendix 3A A History of Ideas
Appendix 3B Supplementary Material
References
3.1 Introduction
The asset management industry is one of the largest industries in modern
society. Its relevance is documented by the astonishing amount of assets that
are managed. It is estimated that globally there are 64 trillion USD under
management [1]. This is nearly as big as the world product of 77 trillion
USD [2].
50 High-Performance Computing in Finance
rules. These markets are an order of magnitude bigger than futures or equity
markets [7].
In contrast to other financial markets, where asset prices are quoted in
reference to specific currencies, exchange rates are symmetric: quotes are
currencies in reference to other currencies. The symmetry of one currency against
another neutralizes the effects of trend, which is a significant driver in other
markets, such as stock markets. This property of symmetry makes currency
markets notoriously hard to trade profitably.
We focus on the foreign exchange market for the development of our trad-
ing model algorithm. Its high liquidity and long/short symmetry make it an
ideal environment for the research and development of fully automated and
algorithmic trading strategies. Indeed, any profitable trading algorithm for
this market should, in theory, also be applicable to other markets.
2 [Link]/news/press-release/2013-222.
The Alpha Engine 53
traders and are therefore well suited to research market behavior [9]. If
agent-based models are fractal, that is, behave in a self-similar manner across
time horizons and only differ with respect to the scaling of their parameters,
the short-term models are a filter for the validity of the long-term models. In
practice, this allows for the short-term agent-based models to be tested and
validated over a huge data sample with a multitude of events. As a result,
the scarcity of data available for the long-term model is not a hindrance to
its acceptance if it is self-similar with respect to the short-term models. In effect,
the validation of the model structure for short-term models also implies a
validation for the long-term models, by virtue of the scaling effects. In con-
trast, most standard modeling approaches are typically devised for one time
horizon only and hence there are no self-similar models that complement each
other.
Moreover, the modeling approach should be modular and enable developers
to combine smaller blocks to build bigger components. In other words, models
are built in a bottom-up spirit, where simple building blocks are assembled into
more complex units. This also implies an information flow between building
blocks.
To summarize, our aim is to develop trading models based on parsimo-
nious, self-similar, modular, and agent-based behavior, designed for multiple
time horizons and not purely driven by trend-following action. The intellectual
framework unifying these angles of attack is outlined in Section 3.3. The result
of this endeavor is interacting systems that are highly dynamic, robust, and
adaptive; in other words, a type of trading model that mirrors the dynamic
and complex nature of financial markets. The performance of this automated
trading algorithm is outlined in the following section.
In closing, it should be mentioned that transaction costs can represent real-
world stumbling blocks for trading models. Investment strategies that take
advantage of short-term price movements in order to achieve good performance
have higher transaction volumes than longer term strategies. This obviously
increases the impact of transaction costs on the profitability. As far as possible,
it is advisable to use limit orders to initiate trades. They have the advantage
that the trader does not have to cross the spread to get his order executed, thus
reducing or eliminating transaction costs. The disadvantage of limit orders is,
however, that execution is uncertain and depends on buy and sell interest.
• An endogenous time scale called intrinsic time that dissects the price
curve into directional changes and overshoots;
• Patterns called scaling laws that hold over several orders of magnitude,
providing an analytical relationship between price overshoots and direc-
tional change reversals;
• Coastline trading agents operating at intrinsic events, defined by the
event-based language;
The chosen time period is from the beginning of 2006 until the beginning
of 2014, that is, 8 years. The trading model yields an unlevered return of
21.3401%, with an annual Sharpe ratio of 3.06, and a maximum drawdown
(computed on a daily basis) of 0.7079%. This drawdown occurs at the beginning
of 2013 and lasts approximately 4 months, as the JPY weakens significantly
following the Quantitative Easing program (“three arrows” of fiscal stimulus)
launched by the Bank of Japan.
Figure 3.1 shows the performance of the trading model across all exchange
rates. Table 3A.1 reports the monthly and yearly returns. The difference in
returns among the various exchange rates is explained by volatility: the trading
model reacts only to occurrences of intrinsic time events, which are function-
ally dependent on volatility. Exchange rates with higher volatility will have a
greater number of intrinsic events and hence more opportunities for the model
to extract profits from the market. This behavior can be witnessed during the
[Figure 3.1 plot not reproduced. Axis ticks: P&L from 0% to 20% and from 0% to 3%, against time from 2006 to 2014. Legend: NZD/JPY, GBP/JPY, AUD/NZD, EUR/CAD, NZD/CAD, USD/CAD, EUR/NZD, AUD/USD, NZD/USD, USD/CHF, EUR/USD, AUD/JPY, USD/JPY, GBP/AUD, EUR/GBP, CAD/JPY, EUR/CHF, EUR/AUD, GBP/USD, CHF/JPY, EUR/JPY, GBP/CAD, GBP/CHF.]
FIGURE 3.1: Daily Profit & Loss of the Alpha Engine, across 23 currency
pairs, for 8 years. See details in the main text of this section and Section 3.4.
while yielding an average yearly profit of 10.05% for the last 4 years. This is
still far from realizing the coastline’s potential, but, in our opinion, a crucial
first step in the right direction.
Finally, we conclude this section by noting that, despite conventional wis-
dom, it is in fact possible to “beat” a random walk. The Alpha Engine pro-
duces profitable results even on time series generated by a random walk,
as seen in Figure 3B.1. This unexpected feature results from the fact that
the model is dissecting Brownian motion into intrinsic time events. Now
these directional changes and overshoots yield a novel context, where a cas-
cading event is more likely to be followed by a de-cascading event than
another cascading one. In detail, the probability of reaching the profitable
de-cascading event after a cascade is $1 - e^{-1} \approx 0.63$, while the probability
for an additional cascade is about 0.37. In effect, the procedure of trans-
lating a tick-by-tick time series into intrinsic time events skews the odds in
one’s favor—for empirical as well as synthetic time series. For details, see
Reference 11.
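The two probabilities quoted above can be checked numerically under a simplifying assumption. The sketch below is not the authors' procedure: it reads the 0.37 figure as the chance that a driftless random walk extends by a further threshold before it reverses by one threshold from its running extreme, and it measures the threshold in absolute lattice steps rather than in percent. The function names and the parameters `n_steps`, `trials`, and `seed` are illustrative.

```python
import math
import random


def extends_before_reversal(n_steps, rng):
    """Return True if a symmetric random walk, started at its running maximum,
    climbs n_steps above the start before it falls n_steps below its
    running maximum (a reversal of threshold size)."""
    price = 0
    running_max = 0
    while True:
        price += 1 if rng.random() < 0.5 else -1
        if price > running_max:
            running_max = price
            if running_max >= n_steps:
                return True          # the move extended by one threshold
        elif running_max - price >= n_steps:
            return False             # the move reversed by one threshold


def estimate_extension_probability(n_steps=20, trials=5000, seed=1):
    """Monte Carlo estimate of the probability of extension before reversal."""
    rng = random.Random(seed)
    hits = sum(extends_before_reversal(n_steps, rng) for _ in range(trials))
    return hits / trials


p_extend = estimate_extension_probability()
print(f"extension first: {p_extend:.3f}   (e^-1     = {math.exp(-1):.3f})")
print(f"reversal first:  {1 - p_extend:.3f}   (1 - e^-1 = {1 - math.exp(-1):.3f})")
```

On a lattice with a threshold of n steps, the exact extension probability is (n/(n+1))^n, which tends to e^{-1} as n grows, so the estimate sits slightly above 0.37 for small n.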
In the following section, we will embark on the journey that would ulti-
mately result in the trading model described above. For a prehistory of events,
see Appendix 3A.
2. An overshoot ω [8,13,15].
With these events, every price curve can be dissected into components that
represent a change in the price trend (directional change) and a trend com-
ponent (overshoot). For a directional change to be detected, first an initial
direction mode needs to be chosen. As an example, in an up mode an increas-
ing price move will result in the extremal price being updated and continuously
increased. If the price goes down, the difference between the extremal price
and the current price is evaluated. If this distance (in percent) exceeds the pre-
defined directional change threshold, a directional change is registered. Now
the mode is switched to down and the algorithm continues correspondingly.
If now the price continues to move in the same direction as the directional
change, for the size of the threshold, an overshoot event is registered. As long
as a trend persists, overshoot events will be registered. See Figure 3.2a for
an illustration. Note that two intrinsic time series will synchronize after one
directional change, regardless of the chosen starting direction.
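The dissection procedure described above can be sketched in a few lines. This is a minimal illustration, not the published Alpha Engine code: the function name `intrinsic_events`, the event labels, and the choice to measure overshoots from the price of the most recent event are assumptions made for the example.

```python
def intrinsic_events(prices, delta, mode="up"):
    """Dissect a price series into directional-change (DC) and overshoot (OS) events.

    prices: sequence of positive prices
    delta:  relative threshold, e.g. 0.01 for 1%
    mode:   assumed initial direction, "up" or "down"
    Returns a list of (index, label) tuples.
    """
    events = []
    extreme = reference = prices[0]  # running extremum, price at the last event
    for i in range(1, len(prices)):
        p = prices[i]
        if mode == "up":
            if p > extreme:
                extreme = p                      # trend continues: update the maximum
                if p >= reference * (1 + delta):
                    events.append((i, "OS up"))  # moved a further delta upward
                    reference = p
            elif p <= extreme * (1 - delta):
                events.append((i, "DC down"))    # fell delta below the maximum
                mode = "down"
                extreme = reference = p
        else:
            if p < extreme:
                extreme = p                      # trend continues: update the minimum
                if p <= reference * (1 - delta):
                    events.append((i, "OS down"))
                    reference = p
            elif p >= extreme * (1 + delta):
                events.append((i, "DC up"))      # rose delta above the minimum
                mode = "up"
                extreme = reference = p
    return events


prices = [100.0, 101.0, 103.0, 100.5, 98.0, 96.0, 99.0]
from_up = intrinsic_events(prices, delta=0.02, mode="up")
from_down = intrinsic_events(prices, delta=0.02, mode="down")
print(from_up)
print(from_down)
```

In this toy series the two runs differ only in their first event and agree from the first shared directional change onward, which illustrates the synchronization property mentioned in the text.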
As a result, a price curve is now comprised of segments, made up of a direc-
tional change event δ and one or more overshoots of size ω. This event-based
[Figure 3.2 graphics not reproduced. Panel (a): schematic of directional change (DC) and overshoot (OS) events. Panel (b): mid price (1.34 to 1.40) against number of events (0 to 70).]
FIGURE 3.2: (a) Directional change and overshoot events. (b) A coastline
representation of the EUR USD price curve (2008-12-14 22:10:56 to 2008-12-16
21:58:20) defined by a directional change threshold δ = 0.25%. The triangles
represent directional change and the bullets overshoot events.
[Figure graphic not reproduced; caption not recovered. Labels: physical time, event time, intrinsic time; price; directional change (DC) and overshoot (OS); threshold 0.2% and threshold 0.4%.]
time series is called the coastline, defined for a specific directional change
threshold. By measuring the various coastlines for an array of thresholds,
multiple levels of event activity can be considered. See Figures 3.2b and 3.3.
This transformed time series is now the raw material for further investigations
[8]. In particular, this price curve will be used as input for the trading model,
as described in Section 3.3.4. With the publication [17], the first decade came
to a close.
1. Scaling-law distributions;
2. Scale-free networks; and
ω ≈ δ. (3.1)
This justifies the procedure of dissecting the price curve into directional change
and overshoot segments of the same size, as seen in Figures 3.2 and 3.3. In
other words, the notion of the coastline is statistically validated.
Scaling laws are a hallmark of complexity and complex systems. They can
be viewed as a universal “law of nature” underlying complex behavior in all
its domains.
In other words, what looks like complex behavior from a distance turns out
to be the result of simple rules at closer inspection. The profundity of this
observation should not be underestimated, as echoed in the words of Stephen
Wolfram, when he was first struck by this realization [30, p. 9]:
Indeed, even some of the very simplest programs that I looked at had
behavior that was as complex as anything I had ever seen. It took
me more than a decade to come to terms with this result, and to
realize just how fundamental and far-reaching its consequences are.
[Figure 3.4 graphic not reproduced. Labels: exposure, take profit, 1%.]
FIGURE 3.4: Simple rules: The elements of coastline trading. Cascading
and de-cascading trades increase or decrease existing positions, respectively.
[Figure graphic not reproduced; caption not recovered. Labels: coastline trading; long trade, short trade, close long, close short; open long, open short, increase long, increase short, take profit short. Annotation: each number corresponds to an independent trading agent.]
the same fixed increments, coastline trading does not represent a Martingale
strategy. In Figures 3.4 and 3.5, examples of such trading rules are shown.
With these developments, the second decade drew to a close. Through the
introduction of event-based time and the uncovering of scaling-law relations,
the novel framework could be embedded in the larger paradigm related to the
study of complex systems. The resulting trading models were, by construction,
automated, agent-based, contrarian, parsimonious, adaptive, self-similar, and
modular. However, one crucial ingredient was still missing to render the models
robust and hence profitable in the long term. And so the journey continued.
which, as mentioned, is the point-wise entropy that is large when the
probability of transitioning from state $s_i$ to state $s_j$ is small and vice versa.
Consequently, the surprise of a price trajectory within a time interval $[0, T]$, that
has experienced $K$ transitions, is
$$\gamma_K^{[0,T]} = -\sum_{k=1}^{K} \log P(s_{i_k} \to s_{i_{k+1}}). \tag{3.3}$$
by virtue of the central limit theorem [38]. In other words, for large K, Δ con-
verges to a normal distribution. Equation 3.4 now allows for the introduction
of our probability indicator L, defined as
$$L = 1 - \Theta\left(\frac{\gamma_K^{[0,T]} - K \cdot H^{(1)}}{\sqrt{K \cdot H^{(2)}}}\right), \tag{3.6}$$
the price trajectory when the price moves by $\delta$ in the overshoots’ direction
after a directional change. In the context of the probability indicator, we
depart from this procedure and define the overshoots to occur when the price
moves by $2.525729 \cdot \delta$. This value comes from maximizing the second-order
informativeness $H^{(2)}$ and guarantees maximal variability of the probability
indicator $L$. For details, see Reference 11.
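As a rough numerical illustration of Equations 3.3 and 3.6, the sketch below computes the surprise of a sequence of transitions and the resulting indicator value. It rests on two assumptions that are not confirmed by the surviving text: that the function written as Θ is the standard normal cumulative distribution function, and that $H^{(1)}$ and $H^{(2)}$ are supplied as already-computed numbers. All function names and the sample values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def surprise(transition_probabilities):
    """Equation 3.3: sum of -log P over the K observed transitions."""
    return -sum(math.log(p) for p in transition_probabilities)


def probability_indicator(gamma, k, h1, h2):
    """Equation 3.6, with Theta read as the standard normal CDF (an assumption)."""
    return 1.0 - normal_cdf((gamma - k * h1) / math.sqrt(k * h2))


# Illustrative numbers only: h1, h2 and the transition probabilities are made up.
h1, h2 = 0.6, 0.3
typical = probability_indicator(gamma=10 * h1, k=10, h1=h1, h2=h2)
stressed = probability_indicator(gamma=surprise([0.1] * 10), k=10, h1=h1, h2=h2)
print(f"typical trajectory:  L = {typical:.3f}")
print(f"unlikely trajectory: L = {stressed:.3g}")
```

A trajectory whose surprise equals its expected value gives L = 0.5, while a run of improbable transitions pushes L toward zero.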
The probability indicator L can now be used to navigate the trading models
through times of severe market stress. In detail, by slowing down the
increase of the inventory of agents during price overshoots, the overall trading
model’s exposure experiences smaller drawdowns, and risk-adjusted performance
improves. As a simple example, when an agent cascades, that is, increases its
inventory, the unit size is reduced in times where L starts to approach zero.
For the trading model, the probability indicator is utilized as follows. The
default size for cascading is one unit (lot). If L is smaller than 0.5, this sizing
is reduced to 0.5, and finally if L is smaller than 0.1, then the size is set to 0.1.
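The sizing rule in the preceding paragraph amounts to a small step function. The sketch below is one possible encoding; the function name and the `default_size` parameter are illustrative rather than taken from the authors' code.

```python
def cascade_unit_size(indicator, default_size=1.0):
    """Unit size used when an agent cascades, given the probability indicator L."""
    if indicator < 0.1:
        return 0.1           # severe stress: smallest size
    if indicator < 0.5:
        return 0.5           # elevated stress: half size
    return default_size      # normal conditions: one unit (lot)


for value in (0.9, 0.5, 0.3, 0.05):
    print(value, cascade_unit_size(value))
```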
Implementing the above-mentioned measures allowed the trading model
to safely navigate treacherous terrain, where it derailed in the past. However,
there was still one crucial insight missing, before a successful version of the
Alpha Engine could be designed. This last insight revolves around a subtle
recasting of thresholds which has profound effects on the resulting trading
model performance.
[Figure 3.7 graphic not reproduced; caption not recovered. Three panels: no trend ($\mu/\sigma^2 = 0$), positive trend ($\mu/\sigma^2 \gg 0$), and negative trend ($\mu/\sigma^2 \ll 0$). Axes: $\delta_{up}$ and $\delta_{down}$, each from 0.1% to 0.5%.]
In Figure 3.7, the result of a Monte Carlo simulation is shown. For the situation
with no trend (left-hand panel), we see the contour lines being perfect circles.
In other words, by following any defined circle, the same number of directional
changes is found for the corresponding asymmetric thresholds. Details about
the analytical expressions and the Monte Carlo simulation regarding the num-
ber of directional changes can be found in Reference 39.
This opens up the space of possibilities, as up to now, only the 45-degree
line in all panels of Figure 3.7 was considered, corresponding to symmetric
thresholds $\delta = \delta_{up} = \delta_{down}$. For trending markets, one can observe a shift
in the contour lines, away from the circles. In a nutshell, for a positive trend
the expected number of directional changes is larger if $\delta_{up} > \delta_{down}$. This
reflects the fact that an upward trend is naturally comprised of longer up-move
segments. The contrary is true for down moves.
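Readers who want to explore the effect of asymmetric thresholds themselves can start from a sketch like the one below. It is not the simulation of Reference 39: the geometric random walk, its drift and volatility values, and the function names are assumptions chosen for illustration, and $\delta_{up}$ is taken to be the threshold that confirms an upward directional change.

```python
import math
import random


def count_directional_changes(prices, delta_up, delta_down, mode="up"):
    """Count directional changes with separate upward and downward thresholds."""
    count = 0
    extreme = prices[0]
    for p in prices[1:]:
        if mode == "up":
            if p > extreme:
                extreme = p
            elif p <= extreme * (1 - delta_down):
                count += 1               # downward directional change
                mode = "down"
                extreme = p
        else:
            if p < extreme:
                extreme = p
            elif p >= extreme * (1 + delta_up):
                count += 1               # upward directional change
                mode = "up"
                extreme = p
    return count


def simulate_prices(n_ticks, drift, vol, seed):
    """Geometric random walk with per-tick drift and volatility (illustrative)."""
    rng = random.Random(seed)
    log_price = 0.0
    prices = [1.0]
    for _ in range(n_ticks):
        log_price += drift + vol * rng.gauss(0.0, 1.0)
        prices.append(math.exp(log_price))
    return prices


trending = simulate_prices(50_000, drift=2e-6, vol=2e-4, seed=7)
for d_up, d_down in [(0.003, 0.003), (0.004, 0.002), (0.002, 0.004)]:
    n = count_directional_changes(trending, d_up, d_down)
    print(f"delta_up={d_up:.3f} delta_down={d_down:.3f} -> {n} directional changes")
```

Averaging the counts over many seeds and a grid of threshold pairs would give a rough analogue of the contour plots described in the text.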
Now it is possible to introduce the notion of invariance as a guiding princi-
ple. By rotating the 45-degree line in the correct manner for trending markets,
the number of directional changes will stay constant. In other words, if the
trend is known, the thresholds can be skewed accordingly to compensate. How-
ever, it is not trivial to construct a trend indicator that is predictive and not
only reactive.
A workaround is found by taking the inventory as a proxy for the trend.
In detail, the expected inventory size I for all agents in normal market con-
ditions can be used to gauge the trend: E[I(δup , δdown )] is now a measure of
trendiness and hence triggers threshold skewing. In other words, by taking the
and
$$U(\delta_{down}, \delta_{up}^{**}, I) = U(\delta_{down}^{**}, \delta_{up}, I - 1), \tag{3.11}$$
where $U$ represents a utility function. The thresholds $\delta_{up}^{*}$, $\delta_{down}^{*}$
and $\delta_{up}^{**}$, $\delta_{down}^{**}$ are “indifference” thresholds.
A pragmatic implementation of such an inventory-driven skewing of thresh-
olds is given by the following equation, corresponding to a long position
$$\frac{\delta_{down}}{\delta_{up}} = \begin{cases} 2 & \text{if } I \geq 15; \\ 4 & \text{if } I \geq 30. \end{cases} \tag{3.12}$$
[Figure 3.8, panels (a) and (b): graphic and caption not recovered.]
events. In Figure 3.8b, the same events are augmented by asymmetric thresholds.
Now $\omega_{up} = \omega_{down}/4$. As a result, each overshoot length is divided into
four segments. The new cascading regime is as follows: increase the position
by one-fourth of a (negative) unit (small arrow) at the directional change and
another fourth at the first, second, and third asymmetric overshoots each. In
effect, the cascading event is “smeared out” and happens in smaller unit sizes
over a longer period. For the cascading events at the first and second original
overshoots, this procedure is repeated.
This concludes the final chapter in the long history of the trading model
development. Many insights from diverse fields were consolidated and a unified
modeling framework emerged.
hallmark cascading and de-cascading events. In other words, the discrete price
curve with occurrences of intrinsic time events triggers an increase or decrease
in position sizes.
In detail, an intrinsic event is either a directional change or a move
of size $\delta$ in the direction of the overshoot. For each exchange rate, we
assign four coastline traders $CT_i[\delta_{up/down}(i)]$, $i = 1, 2, 3, 4$, that operate
at various scales, with upward and downward directional change thresholds
equaling $\delta_{up/down}(1) = 0.25\%$, $\delta_{up/down}(2) = 0.5\%$, $\delta_{up/down}(3) = 1.0\%$, and
$\delta_{up/down}(4) = 1.5\%$.
The default size for cascading and de-cascading a position is one unit (lot).
The probability indicator $L_i$, assigned to each coastline trader, is evaluated
on the fixed scale $\delta(i) = \delta_{up/down}(i)$. As a result, its states are directional
changes of size $\delta(i)$ or overshoot moves of size $2.525729 \cdot \delta(i)$. The default unit
size for cascading is reduced to 0.5 if $L_i$ is smaller than 0.5. Additionally, if
$L_i$ is smaller than 0.1, then the size is further reduced to 0.1.
In case a coastline trader accumulates an inventory with a long position
greater than 15 units, the upward directional change threshold $\delta_{up}(i)$
is increased to 1.5 times its original size, while the downward directional change
threshold $\delta_{down}(i)$ is decreased to 0.75 times its original size. In effect, the ratio
of the skewed thresholds is $\delta_{up}(i)/\delta_{down}(i) = 2$. The agent with the skewed
thresholds will cascade when the overshoot reaches 0.5 of the skewed threshold,
that is, half of the original threshold size. In case the inventory with a long
position is greater than 30, the upward directional change threshold $\delta_{up}(i)$
is increased to 2.0 times its original size and the downward directional change
threshold $\delta_{down}(i)$ is decreased to 0.5 times its original size. The ratio of the
skewed thresholds now equals $\delta_{up}(i)/\delta_{down}(i) = 4$. The agent with these skewed
thresholds will cascade when the overshoot extends by 0.25 of the original threshold,
with one-fourth of the specified unit size. This was illustrated in Figure 3.8b. The
changes in threshold lengths and sizing are analogous for short inventories.
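The threshold configuration and inventory-driven skewing summarized in this section can be written down compactly. The sketch below follows the multipliers stated in the text for long inventories and assumes a mirror-image rule for short inventories as its reading of "analogous"; the names `BASE_THRESHOLDS` and `skewed_thresholds` are illustrative and do not come from the published code.

```python
import math

# Symmetric thresholds of the four coastline traders: 0.25%, 0.5%, 1.0%, 1.5%.
BASE_THRESHOLDS = (0.0025, 0.005, 0.01, 0.015)


def skewed_thresholds(delta, inventory):
    """Return (delta_up, delta_down) for a signed inventory measured in units.

    Positive inventory is long, negative is short. The short-side rule is
    assumed to mirror the long-side rule described in the text.
    """
    size = abs(inventory)
    if size > 30:
        widened, narrowed = 2.0 * delta, 0.5 * delta
    elif size > 15:
        widened, narrowed = 1.5 * delta, 0.75 * delta
    else:
        return delta, delta                  # no skew for small inventories
    if inventory > 0:
        return widened, narrowed             # long: widen up, narrow down
    return narrowed, widened                 # short: mirrored (assumption)


for delta in BASE_THRESHOLDS:
    for inventory in (0, 20, 40, -20):
        up, down = skewed_thresholds(delta, inventory)
        print(f"delta={delta:.4f} inventory={inventory:+d} "
              f"-> up={up:.5f} down={down:.5f}")
```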
This concludes the description of the trading model algorithm and the
motivation of the chosen modeling framework. Recall that the interested
reader can download the code from GitHub [10].
sciences and their successful implementations in the real world. I argued that
the resilience of the economic and political systems depends on the underlying
economic and political models. Motivated to contribute to the well-being of
society, I wanted to work on enhancing economic theory and on applying
the models.
I first studied law at the University of Zurich and then, in 1979, moved
to Oxford to study philosophy, politics, and economics. In 1980, I attended a
course on growth models by James Mirrlees, who, in 1996, received a Nobel
prize in economics. In his first lecture, he discussed the shortcomings of the
models, such as [40]. He explained that the models are successful in explaining
growth as long as there are no large exogenous shocks. But unanticipated
events are inherent to our lives and the economy at large. I thus started
to search for a model framework that can both explain growth and handle
unexpected exogenous shocks. I spent one year studying the Encyclopedia
Britannica and found my inspiration in relativity theory.
In my 1981 PhD thesis, titled “Interaction Between Law and Society,” at
the University of Zurich, I developed a new model framework that describes
in an abstract language, how interactions in the economy occur. At the core of
the new approach are the concepts of object, system, environment, and event-
based intrinsic time. Every object has its system that comprises all the forces
that impact and influence the object. Outside the system is its environment
with all the forces that do not impact the object. Every object and system
has its own frame of reference with an event-based intrinsic time scale. Events
are interactions between different objects and their systems. I concluded that
there is no abstract and universal time scale applicable to every object. This
motivated me to think about the nature of time and how we use time in our
everyday economic models.
After finishing my studies, I joined a bank, working first in the legal department,
then in the research group, and finally on the foreign exchange trading
desk. My goal was to combine empirical work with academic research, but I
was disappointed with the pace of research at the bank. In the mid-80s, there
was the first buzz about start-ups in the United States. I came up with a busi-
ness idea: banks have a need for quality information to increase profitability,
so there should be a market for quality real-time information.
I launched a start-up with the name of Olsen & Associates. The goal was
to build an information system for financial markets with real-time forecasts
and trading recommendations using tick-by-tick market data. The product
idea combined my research interest with an information service, which would
both improve the quality of decision making in financial markets and generate
revenue to fund further research. The collection of tick market data began in
January 1986 from Reuters. We faced many business and technical obstacles,
where data storage cost was just one of the many issues. After many setbacks,
we successfully launched our information service and eventually acquired 60
big to mid-sized banks across Europe as customers.
In 1990, we published our first scientific paper [26] revealing the first scaling
law. The study showed that intraday prices have the same scaling law exponent
as longer term price movements. We had expected two different exponents:
one for intraday price movements, where technical factors dictate price dis-
covery, and another for longer term price movements that are influenced by
fundamentals. The result took us by surprise and was evidence that there are
universal laws that dictate price discovery at all scales. In 1995, we organized
the first high-frequency data conference in Zurich, where we made a large
sample of tick data available to the academic community. The conference was
a big success and boosted market microstructure research, which was in its
infancy at that time. In the following years, we conducted exhaustive research
testing all possible model approaches to build a reliable forecasting service and
trading models. Our research work is described in the book [17]. The book
covers data collection and filtering, basic stylized facts of financial market time
series, the modeling of 24-hour seasonal volatility, realized volatility dynam-
ics, volatility processes, forecasting return and risks, correlation, and trading
models. For many years, the book was a standard text for major hedge funds.
The actual performance of our forecasting and trading models was, however,
spurious and disappointing. Our models were best in class, but we had not
achieved a breakthrough.
Back in 1995, we were selling tick-by-tick market data to top banks and
created a spinoff under the name of OANDA to market a currency converter on
the emergent Internet and eventually build a foreign exchange market making
business. The OANDA currency converter was an instant success. At the start
of 2001, we were completing the first release of our trading platform. At the
same time, Olsen & Associates was a treasure store of information and risk
services, but did not have cash to market the products and was struggling for
funding. When the Internet bubble burst and markets froze, we could not pay
our bills and the company went into default. I was able to organize a bailout
with a new investor. He helped to salvage the core of Olsen & Associates with
the aim of building a hedge fund under the name of Olsen Ltd and buying up
the OANDA shares.
In 2001, the OANDA trading platform was a novelty in the financial indus-
try: straight through processing, one price for everyone, and second-by-second
interest payments. At the time, these were true firsts. At OANDA, a trader
could buy literally 1 EUR against USD at the same low spread as a buyer
of 1 million EUR against USD. The business was an instant success. More-
over, the OANDA trading platform was a research laboratory to analyze the
trades of 10,000 traders, all buying and selling at the same terms and condi-
tions, and observe their behavior patterns in different market environments.
I learned hands-on how financial markets really work and discovered that
basic assumptions of market efficiency that we had taken for granted at Olsen
& Associates were inappropriate. I was determined to make a fresh start in
model development.
2012 −0.08 0.19 0.29 0.08 −0.12 0.15 −0.20 0.23 0.10 0.13 0.12 0.11 0.86
2013 −0.17 −0.01 −0.10 −0.08 0.32 0.52 0.04 0.24 −0.10 0.01 −0.01 −0.16 0.77
Note: The P&L is given in percentages. All 23 currency pairs are aggregated.
[Figure 3B.1 plot not reproduced. Horizontal axis: time; vertical axis tick: 0%.]
FIGURE 3B.1: Profit & Loss for a time series, generated by a geometric
random walk of 10 million ticks with annualized volatility of 25%. The average
of 60 Monte Carlo simulations is shown. In the limiting case, the P&L curve
becomes a smooth increasing line.
References
1. Baghai, P., Erzan, O., and Kwek, J.-H. The $64 trillion question, convergence
in asset management, McKinsey & Company, 2015.
3. Chen, J.X. The evolution of computing: Alphago. Computing in Science & Engi-
neering 18(4), 2016, pp. 4–7.
4. Bouveret, A., Guillaumie, C., Roqueiro, C.A., Winkler, C., and Nauhaus, S.
High frequency trading activity in EU equity markets, European Securities and
Markets Authority, 2014.
5. Roseen, T. Are quant funds worth another look? Thomson Reuters, 2016.
10. The alpha engine: Designing an automated trading algorithm code. https://
[Link]/AntonVonGolub/Code/blob/master/[Link], 2017. Accessed:
2017-01-04.
11. Golub, A., Chliamovitch, G., Dupuis, A., and Chopard, B. Multi-scale represen-
tation of high frequency market liquidity. Algorithmic Finance, 5(1), 2016, pp.
3–19.
12. Müller, U.A., Dacorogna, M.M., Davé, R.D., Pictet, O.V., Olsen, R.B., and
Robert Ward, J. Fractals and intrinsic time: A challenge to econometricians. Pre-
sentation at the XXXIXth International AEA Conference on Real Time Econo-
metrics, 14–15 Oct 1993, Luxembourg, 1993.
13. Aloud, M., Tsang, E., Olsen, R.B., and Dupuis, A. A directional-change events
approach for studying financial time series. Economics Discussion Papers, (2012-
36), 6, 2012, pp. 1–17.
14. Ao, H. and Tsang, E. Capturing market movements with directional changes.
Working paper: Centre for Computational Finance and Economic Agents, Univ.
of Essex, 2013.
15. Bakhach, A., Tsang, E.P.K., and Ng, W. Lon. Forecasting directional changes
in financial markets. Working paper: Centre for Computational Finance and
Economic Agents, Univ. of Essex, 2015.
16. Guillaume, D.M., Dacorogna, M.M., Davé, R.D., Müller, U.A., Olsen, R.B.,
and Pictet, O.V. From the bird’s eye to the microscope: A survey of new stylized
facts of the intra-daily foreign exchange markets. Finance and Stochastics, 1(2),
1997, pp. 95–129.
17. Gençay, R., Dacorogna, M., Müller, U.A., Pictet, O., and Olsen, R. An
Introduction to High-Frequency Finance. Academic Press, New York, 2001.
19. Newman, M.E.J. Power laws, Pareto distributions and Zipf’s law. Contemporary
Physics 46(5), 2005, pp. 323–351.
21. West, G.B., Brown, J.H., and Enquist, B.J. A general model for the origin of
allometric scaling laws in biology. Science 276(5309), 1997, p. 122.
22. Zipf, G.K. Human Behavior and the Principle of Least Effort. Addison-Wesley,
Reading, MA, 1949.
23. Albert, R. and Barabási, A.L. Statistical mechanics of complex networks. Reviews
of Modern Physics 74(1), 2002, pp. 47–97.
24. Barabási, A.L. and Albert, R. Emergence of scaling in random networks. Science
1999, p. 509.
25. Newman, M.E.J. The structure and function of complex networks. SIAM Review
45(2), 2003, pp. 167–256.
26. Müller, U.A., Dacorogna, M.M., Olsen, R.B., Pictet, O.V., Schwarz, M., and
Morgenegg, C. Statistical study of foreign exchange rates, empirical evidence of
a price change scaling law, and intraday analysis. Journal of Banking & Finance
14(6), 1990, 1189–1208.
27. Hull, J.C. Options, Futures and Other Derivative Securities, 9th edition. Pear-
son, London, 2014.
28. Voit, J. The Statistical Mechanics of Financial Markets, 3rd edition. Springer,
Berlin, 2005.
34. Aloud, M., Tsang, E., Dupuis, A., and Olsen, R. Minimal agent-based model for
the origin of trading activity in foreign exchange market. In: 2011 IEEE Sympo-
sium on Computational Intelligence for Financial Engineering and Economics
(CIFEr). IEEE. 2011, pp. 1–8.
35. Dupuis, A. and Olsen, R.B. High Frequency Finance, Using Scaling Laws to
Build Trading Models, John Wiley & Sons, Inc., 2012, pp. 563–584.
36. Glattfelder, J.B., Bisig, T., and Olsen, R.B. R&D Strategy Document. Technical
report, A Paper by the Olsen Ltd. Research Group, 2010.
37. Cover, T.M. and Thomas, J.A. Elements of Information Theory. John Wiley &
Sons, Hoboken, 1991.
38. Pfister, H.D., Soriaga, J.B., and Siegel, P.H. On the achievable information rates
of finite state ISI channels. In Proc. IEEE Globecom. Eds. Kurlander, D., Brown,
M., and Rao, R. ACM Press, November 2001, pp. 41–50.
39. Golub, A., Glattfelder, J.B., Petrov, V., and Olsen, R.B. Waiting Times and
Number of Directional Changes in Intrinsic Time Framework, 2017. Lykke Corp
& University of Zurich Working Paper.
40. Kaldor, N. and Mirrlees, J.A. A new model of economic growth. The Review of
Economic Studies 29(3), 1962, pp. 174–192.
Chapter 4
Portfolio Liquidation and Ambiguity
Aversion
CONTENTS
4.1 Introduction
4.2 Reference Model
4.2.1 Dynamics
4.2.2 Optimal liquidation problem
4.2.3 Feedback controls
4.3 Ambiguity Aversion
4.4 Effects of Ambiguity on the Optimal Strategy
4.4.1 Arrival rate
4.4.2 Fill probability
4.4.3 Midprice drift
[Link] Equivalence to inventory penalization
4.5 Closed-Form Solutions
4.6 Inclusion of Market Orders
4.6.1 Feedback controls
4.6.2 The effects of ambiguity aversion on market order execution
4.7 Conclusion
Appendix 4A Proofs
References
4.1 Introduction
This work considers an optimal execution problem in which an agent is
tasked with liquidating a number of shares of an asset before the end of a
defined trading period. An algorithm that quantitatively specifies the
execution strategy helps ensure that the liquidation is performed with
the desired balance of risk and profits. Moreover, in the modern trading
environment, in which human trading activity is essentially obsolete, the
execution must be based on a set of predefined rules. However, the resulting set of rules
depends heavily upon modeling assumptions made by the agent. If the model
is misspecified, then the intended balance between risk and profits is not
achieved, and the strategy may be exposed to risks that the model does not capture.
The design of optimal execution algorithms began in the literature with
the seminal work by Almgren and Chriss (2001). This approach was general-
ized over time with models that include additional features. Kharroubi and
Pham (2010) take a different approach to modeling the price impact of the
agent’s trades depending on the amount of time between them. A more direct
approach in modeling features of the limit order book and an induced price
impact appears in Obizhaeva and Wang (2013).
The aforementioned papers allow the agent to employ only market orders in
their liquidation strategy. By allowing the agent to use limit orders instead (or
in addition), the profits accumulated by the portfolio liquidation can poten-
tially be increased. Intuitively this is possible due to acquiring better trade
prices by using limit orders instead of market orders. Guéant et al. (2012)
rephrase the problem as one in which all of the agent’s trades are performed
by using limit orders. This work is then continued in Guéant and Lehalle
(2015) where more general dynamics of transactions are considered. A com-
bination of limit order and market order strategy is developed in Cartea and
Jaimungal (2015a). For an overview of these and related problems in
algorithmic and high-frequency trading, see Cartea et al. (2015).
The present work is most related to Guéant et al. (2012) and Cartea and
Jaimungal (2015a). The underlying dynamics we use here are identical to
Guéant et al. (2012), but we use a different objective function. Our work con-
siders a risk-neutral agent, whereas the former considers one with exponential
utility. Since the purpose of this work is to investigate the effects of ambigu-
ity aversion, we consider the risk-neutral agent in order to eliminate possible
effects due to risk aversion. In addition, we also consider the inclusion of mar-
ket orders in the agent’s trading strategy toward the end of this work, similar
to what is done in Cartea and Jaimungal (2015a).
If the agent expresses equal levels of ambiguity toward the arrival rate and
fill probability of orders, then her optimal trading strategy can be found in
closed form. This is of benefit to the agent if the strategy is computed and
implemented in real time because there is no need to numerically solve a non-
linear PDE. Incorporating other relevant features in the framework makes the
model more realistic, but this may come at the cost of not having closed-form
solutions. An example is when the admissible strategies include market orders.
In this case, the optimal strategies are obtained by solving a quasi-variational
inequality. This requires sophisticated numerical methods and computer power
to implement the strategies in real time.
The rest of the work is presented as follows. In Section 4.2, we present
the dynamics of price changes and trade arrivals that our agent considers to
be her reference model. The solution to the optimal liquidation is presented
here in brief. In Section 4.3, we introduce the method by which our agent
expresses ambiguity aversion. This amounts to specifying a class of equivalent
candidate measures which she considers and a function which assigns to each
$$dq_t = -dN_t.$$
Wealth: Immediately after an MO lifts one of the agent's LOs, her wealth
increases by an amount equal to the trade value $S_t + \delta_t$. The wealth, $X_t$,
therefore, has dynamics
The form of the HJB equation 4.2 and the conditions 4.3 and 4.4 allow for
the ansatz $H(t, x, q, S) = x + q S + h_q(t)$. Substituting this expression into the
above HJB equation gives
$$\partial_t h_q + \alpha q + \sup_{\delta \ge 0} \left\{ \lambda e^{-\kappa \delta} \left(\delta + h_{q-1} - h_q\right) \right\} \mathbb{1}_{q \neq 0} = 0, \tag{4.6}$$
$$h_q(T) = -q\,\ell(q), \tag{4.7}$$
$$h_0(t) = 0. \tag{4.8}$$
[Figure: depth (δ) versus time (secs) for several inventory levels, with the q = 1 curve labeled and an arrow marking increasing q.]
that for times close to maturity and large levels of inventory, the agent posts
a depth of zero indicating that she is willing to make essentially no profit in
order to avoid the risk of terminal penalty. If it were not for the constraint
δt ≥ 0, the agent would post negative depths in this region if it would mean a
faster rate of order fill. This is an indication of the desire to submit a market
order. A proper formulation of optimal market order submission is provided
in Section 4.6.
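The system 4.6 to 4.8 can be integrated backward in time with very little code. The sketch below is illustrative only: it assumes the terminal penalty $\ell(q) = \theta q$ and the parameter values quoted in the figure captions of this chapter, uses the fact that the supremum in Equation 4.6 is attained at $\delta^* = (1/\kappa - h_{q-1} + h_q)_+$, and applies a plain explicit Euler scheme. The scheme and the function name `solve_reference_model` are choices made here, not the authors' implementation.

```python
import numpy as np

def solve_reference_model(kappa=15.0, lam=2.0, alpha=0.0, theta=0.01,
                          Q=10, T=15.0, n_steps=15000):
    """Explicit Euler, backward in time, for the ODE system (4.6)-(4.8).

    Assumes ell(q) = theta * q. Returns h[i, q] on the time grid and the
    feedback depth[i, q - 1] for q = 1..Q.
    """
    dt = T / n_steps
    q = np.arange(Q + 1)
    h = np.empty((n_steps + 1, Q + 1))
    h[-1] = -theta * q**2                      # h_q(T) = -q * ell(q)
    for i in range(n_steps, 0, -1):
        gap = h[i, :-1] - h[i, 1:]             # h_{q-1} - h_q for q = 1..Q
        delta = np.maximum(1.0 / kappa - gap, 0.0)
        sup_term = lam * np.exp(-kappa * delta) * (delta + gap)
        h[i - 1, 0] = 0.0                      # boundary condition h_0(t) = 0
        h[i - 1, 1:] = h[i, 1:] + dt * (alpha * q[1:] + sup_term)
    gap = h[:, :-1] - h[:, 1:]
    depth = np.maximum(1.0 / kappa - gap, 0.0)
    return h, depth

h, depth = solve_reference_model()
```

With these inputs the depth for the largest inventory level is zero at maturity, which is the region where the constraint $\delta_t \ge 0$ binds.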
On the other hand, if the agent lacks confidence in the reference measure, then
the cost should be small and even large deviations will incur a small penalty.
The optimization problem as originally posed in Equation 4.1 is altered to
reflect the consideration of new dynamics:
where the class of equivalent measures Q and the penalty H are specified
below.
Since the agent may exhibit more confidence in the specifications of the
model with respect to some aspects over others, she penalizes deviations of
the model with respect to different aspects with different magnitudes. In par-
ticular, the agent wants unique penalizations toward specifications of:
We consider the same class of candidate measures and penalty function intro-
duced in Cartea et al. (2014). Namely, any candidate measure is defined in
terms of a Radon–Nikodym derivative
and
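$$\frac{d\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta, g)}{d\mathbb{Q}^{\alpha}(\eta)} = \exp\left\{ -\int_0^T\!\!\int_0^\infty \left(e^{g_t(y)} - 1\right) \nu_{\mathbb{P}}(dy, dt) + \int_0^T\!\!\int_0^\infty g_t(y)\, \mu(dy, dt) \right\}. \tag{4.14}$$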
In denoting the candidate measure Qα,λ,κ , we are conveying that the drift,
arrival rate of MOs, and distribution of MOs have all been changed. The full
The constraints imposed on the first two expectations ensure that the Radon–
Nikodym derivatives in Equations 4.13 and 4.14 yield probability measures.
The inequality constraint ensures that in the candidate measure, the profits
earned by the agent have a finite variance. The set of candidate measures is
parameterized by the process $\eta$ and the random field $g$, and the dynamics
of the midprice and MOs can be stated in terms of these quantities. In the
candidate measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta, g)$, the drift of the midprice is no longer $\alpha$, but
is changed to $\eta_t$. Also in the candidate measure, the compensator of $\mu(dy, dt)$
becomes $\nu_{\mathbb{Q}}(dy, dt) = e^{g_t(y)}\, \nu_{\mathbb{P}}(dy, dt)$; see Jacod and Shiryaev (1987), Chapter
III.3c, Theorem 3.17.
Before introducing the penalty function H, we further decompose the mea-
sure change. First, note that according to the above discussion, if gt (y) does
not depend on y, then the fill probability has not changed in the candidate
measure, only the rate of MO arrival is different. Second, if gt (y) satisfies the
equality
$$\int_0^\infty e^{g_t(y)}\, F(dy) = 1, \tag{4.16}$$
for all t ∈ [0, T ], then only the fill probability has changed in the candidate
measure and the intensity remains the constant λ. We are then able to break
into two steps the measure change of Equation 4.14. Given a random field g,
we define two new random fields by
$$g_t^\lambda = \log \int_0^\infty e^{g_t(y)}\, F(dy), \tag{4.17}$$
$$g_t^\kappa(y) = g_t(y) - g_t^\lambda. \tag{4.18}$$
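The random field $g^\lambda$ does not depend on $y$, and it is easily shown that the
random field $g^\kappa$ satisfies Equation 4.16. Thus any measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta, g) \in \mathcal{Q}^{\alpha,\lambda,\kappa}$
allows us to uniquely define two measures $\mathbb{Q}^{\alpha,\lambda}(\eta, g)$ and $\mathbb{Q}^{\alpha,\kappa}(\eta, g)$ via
Radon–Nikodym derivatives:
$$\frac{d\mathbb{Q}^{\alpha,\lambda}(\eta, g)}{d\mathbb{Q}^{\alpha}(\eta)} = \exp\left\{ -\int_0^T\!\!\int_0^\infty \left(e^{g_t^\lambda} - 1\right) \nu_{\mathbb{P}}(dy, dt) + \int_0^T\!\!\int_0^\infty g_t^\lambda\, \mu(dy, dt) \right\},$$
$$\frac{d\mathbb{Q}^{\alpha,\kappa}(\eta, g)}{d\mathbb{Q}^{\alpha}(\eta)} = \exp\left\{ -\int_0^T\!\!\int_0^\infty \left(e^{g_t^\kappa(y)} - 1\right) \nu_{\mathbb{P}}(dy, dt) + \int_0^T\!\!\int_0^\infty g_t^\kappa(y)\, \mu(dy, dt) \right\}.$$
[Diagram: P → Qα via η; from Qα to Qα,λ,κ either directly via g, through Qα,λ via gλ and then gκ, or through Qα,κ via gκ and then gλ.]
FIGURE 4.2: Three natural alternative routes from the reference measure
P to a candidate measure Qα,λ,κ in which midprice drift, MO intensity, and
execution price distribution of MOs have been altered.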
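The two-step decomposition in Equations 4.17 and 4.18 is easy to check numerically. The following sketch is a hedged illustration: it assumes that $F$ is the exponential distribution with rate $\kappa$ (consistent with a fill probability of $e^{-\kappa\delta}$), picks an arbitrary smooth random-field value `g` purely for demonstration, and uses a hand-written trapezoid rule. All names are hypothetical.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoid rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def decompose(g_values, y, density):
    """Split g into an intensity part g_lam (4.17) and a shape part g_kap (4.18)."""
    g_lam = np.log(trapezoid(np.exp(g_values) * density, y))
    g_kap = g_values - g_lam
    return g_lam, g_kap

kappa, lam = 15.0, 2.0                     # illustrative reference parameters
y = np.linspace(0.0, 5.0, 200_001)         # truncated support of F
density = kappa * np.exp(-kappa * y)       # assumed exponential density of F
g = 0.3 * np.exp(-5.0 * y) - 0.1           # arbitrary example of g_t(y)

g_lam, g_kap = decompose(g, y, density)
normalization = trapezoid(np.exp(g_kap) * density, y)   # should equal 1 (Eq. 4.16)
effective_intensity = lam * np.exp(g_lam)               # MO intensity under the candidate measure
```

The shape part integrates to one against $F$, so it changes only the fill probability, while the constant `g_lam` rescales the arrival rate.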
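[Diagram: the routes of the measure change with their weights. P → Qα carries ϕα; Qα → Qα,λ carries ϕλ and Qα,λ → Qα,λ,κ carries ϕκ; Qα → Qα,κ carries ϕκ and Qα,κ → Qα,λ,κ carries ϕλ.]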
FIGURE 4.3: Ambiguity weights associated with each sequential step of the
full measure change.
with conditions
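$$K_{c,d}(a, b) = \frac{1}{c}\int_0^\infty \left[ -\left(e^{a(y)} - 1\right) + a(y)\, e^{a(y)+b(y)} \right] \lambda\, F(dy) + \frac{1}{d}\int_0^\infty \left[ -\left(e^{b(y)} - 1\right) e^{a(y)} + b(y)\, e^{a(y)+b(y)} \right] \lambda\, F(dy),$$
$$\mathcal{G} = \left\{ g : \int_0^\infty e^{g(y)}\, F(dy) = 1 \right\}. \tag{4.23}$$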
The feedback expressions above give the pointwise optimizer in the dif-
ferential equation 4.6. Since we have a classical solution to this equation, the
function H serves as a candidate value function. Also, by appropriately substi-
tuting the processes for the state variables in Equation 4.27, we get candidate
optimal controls. The verification theorem below guarantees that these can-
didates are indeed the value function and optimal controls we seek.
When the agent is ambiguity averse only to the market order arrival rate, we
expect g κ∗ = 0 and η ∗ = α. In other words, the agent is fully confident in her
model of fill probabilities and drift of the midprice.
In Figure 4.4, we show a comparison of the optimal posting level when
the agent is ambiguity averse to only market order arrival versus ambiguity
neutral. Also shown is the market order intensity induced by the optimal
candidate measure at various times and for various inventory levels ($\lambda e^{g_q^{\lambda*}(t)}$).
Of particular interest is how the effective market order intensities differ as
the strategy approaches maturity. For a fixed level of inventory, the effective
market order intensity decreases as maturity approaches because that is the
time in which such a change can impair the agent's performance the most.
Also note that the largest change in the optimal posting occurs for the largest
value of inventory: when the inventory is at its maximum, the agent's fear of
a misspecified arrival rate can have the most significant impact on her
trading performance.
[Figure: three panels. Market order arrival rate versus inventory (q); depth (δ) versus time (secs); change in depth (δ(ϕλ) − δ) versus time (secs), with the q = 1 curve labeled and arrows marking increasing q.]
[Figure: two panels. Depth (δ) versus time (secs); change in depth (δ − δ0, axis scaled by 10^−3) versus time (secs), with the q = 1 curves labeled and arrows marking increasing q.]
FIGURE 4.5: Optimal depth and change in depth due to ambiguity for an
agent who is ambiguity averse to fill probability (dashed lines are ambiguity
neutral depths). Parameter values are ϕκ = 3, κ = 15, λ = 2, σ = 0.01, α = 0,
ℓ(q) = θ q, θ = 0.01, Q = 10, and T = 15.
[Figure: two panels plotted against time (secs), with the q = 1 curve labeled and arrows marking increasing q.]
FIGURE 4.6: Optimal depth and change in depth due to ambiguity for an
agent who is ambiguity averse to midprice drift (dashed lines are ambiguity
neutral depths). Parameter values are ϕα = 5, κ = 15, λ = 2, σ = 0.01, α = 0,
ℓ(q) = θ q, θ = 0.01, Q = 10, and T = 15.
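After the usual ansatz, substitution into the corresponding HJB equation, and
solving for the optimal controls $\delta$, the resulting system of ODEs is
$$\partial_t h^{\phi}_q + \alpha q - \tfrac{1}{2} \phi\, \sigma^2 q^2 + \sup_{\delta \ge 0} \left\{ \lambda e^{-\kappa \delta} \left(\delta + h^{\phi}_{q-1} - h^{\phi}_q\right) \right\} \mathbb{1}_{q \neq 0} = 0.$$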
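$$= \sup_{(\delta_u)_{t \le u \le T} \in \mathcal{A}}\ \inf_{\mathbb{Q} \in \mathcal{Q}^{\alpha}} \mathbb{E}^{\mathbb{Q}}_{t,x,q,S}\left[ X_{\tau \wedge T} + q_{\tau \wedge T}\left(S_{\tau \wedge T} - \ell(q_{\tau \wedge T})\right) + \mathcal{H}_{t,\tau \wedge T}[\mathbb{Q}|\mathbb{P}] \right].$$
Since we are considering only ambiguity on the midprice drift, the infimum
is taken over equivalent measures in which MO intensity and fill probability
remain fixed at their values under the reference measure. Thus a cumulative
inventory penalization is equivalent to considering ambiguity on the drift of
the midprice of the asset.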
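The equivalence can be checked with a few lines of arithmetic. The sketch below assumes that, per unit time, the drift-dependent part of the objective is $\eta q$ plus the quadratic penalty $\frac{1}{2\varphi_\alpha}\left(\frac{\alpha - \eta}{\sigma}\right)^2$ that appears in the proof of Theorem 4.1, and uses the minimizer $\eta^* = \alpha - \varphi_\alpha \sigma^2 q$ from the proof of Proposition 4.2. Under those assumptions the minimized value is $\alpha q - \frac{1}{2}\varphi_\alpha \sigma^2 q^2$, which is a running inventory penalty with $\phi = \varphi_\alpha$. Function names and the grid search are illustrative only.

```python
import numpy as np

def drift_objective(eta, q, alpha, sigma, phi_alpha):
    """Drift contribution eta*q plus the quadratic penalty on the drift change."""
    return eta * q + ((alpha - eta) / sigma) ** 2 / (2.0 * phi_alpha)

def worst_case_drift(q, alpha, sigma, phi_alpha):
    """Minimizer eta* = alpha - phi_alpha * sigma^2 * q."""
    return alpha - phi_alpha * sigma**2 * q

def inventory_penalized_rate(q, alpha, sigma, phi_alpha):
    """Running reward alpha*q - 0.5*phi*sigma^2*q^2 with phi = phi_alpha."""
    return alpha * q - 0.5 * phi_alpha * sigma**2 * q**2

alpha, sigma, phi_alpha = 0.0, 0.01, 5.0   # values from the Figure 4.6 caption
results = []
for q in range(1, 11):
    eta_star = worst_case_drift(q, alpha, sigma, phi_alpha)
    grid = np.linspace(eta_star - 0.01, eta_star + 0.01, 2001)
    grid_min = float(drift_objective(grid, q, alpha, sigma, phi_alpha).min())
    at_star = drift_objective(eta_star, q, alpha, sigma, phi_alpha)
    results.append((q, at_star, grid_min,
                    inventory_penalized_rate(q, alpha, sigma, phi_alpha)))
```

Each tuple in `results` pairs the value at $\eta^*$ and a brute-force grid minimum with the inventory-penalized rate, so the three numbers can be compared directly.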
As shown in Figure 4.6, the change in the optimal depth is smallest when
the inventory is smallest. In addition, the change in optimal depth consistently
becomes more negative with time to maturity. Both of these behaviors make
sense when midprice ambiguity is interpreted as a cumulative inventory
penalization. The first characteristic is made clear by noting that the larger the
agent's inventory position is, the larger the accumulation of the inventory
penalty is, and the faster the agent desires to liquidate shares. The second
characteristic is explained by noting that the impact on the strategy's
performance due to a misspecified drift is approximately linear in time to maturity.
The effect of ambiguity on the drift of the midprice turns out to be more
significant than ambiguity on the other two factors in the case of the liqui-
dation problem. As shown in Section 4.5, specifically in Proposition 4.4, the
optimal limit order price grows without bound as time to maturity increases
when there is no ambiguity on drift. However, when ambiguity on drift is
considered, all of the optimal posting levels become finite for all time.
where
$$\xi = \left(1 + \frac{\varphi}{\kappa}\right)^{-\left(1 + \frac{\kappa}{\varphi}\right)} \lambda,$$
with terminal and boundary conditions $h_q(T) = -q\,\ell(q)$ and $h_0(t) = 0$.
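Proposition 4.3 (Solving for $h_q(t)$ in Equation 4.28). Let $K_q = \alpha \kappa q - \tfrac{1}{2} \varphi_\alpha \sigma^2 \kappa q^2$.
i. Suppose $K_q = 0$ for all $q$. Then
$$h_q(t) = \frac{1}{\kappa} \log\left( \sum_{n=0}^{q} C_{q,n} (T - t)^n \right),$$
where
$$C_{q,n} = \frac{\xi^n}{n!}\, e^{-\kappa (q-n)\, \ell(q-n)}.$$
ii. Suppose $K_q = 0$ only when $q = 0$. Then
$$h_q(t) = \frac{1}{\kappa} \log\left( \sum_{n=0}^{q} C_{q,n}\, e^{K_n (T - t)} \right),$$
where
$$C_{n+j,n} = (-\xi)^j\, C_{n,n} \prod_{p=1}^{j} \frac{1}{K_{n+p} - K_n},$$
$$C_{q,q} = -\sum_{n=0}^{q-1} C_{q,n} + e^{-\kappa q\, \ell(q)},$$
$$C_{0,0} = 1.$$
Proof. See Appendix.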
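Part (ii) of the proposition can be evaluated directly. The sketch below builds the coefficients through the one-step recursion $C_{q,n} = -\xi\, C_{q-1,n} / (K_q - K_n)$ for $n < q$, which is the product formula unrolled, and fixes $C_{q,q}$ from the terminal condition. It assumes $\ell(q) = \theta q$ and equal ambiguity levels $\varphi_\lambda = \varphi_\kappa = \varphi$; the parameter values (in particular $\sigma = 0.1$) are chosen here only to keep the $K_q$ well separated, and all names are hypothetical.

```python
import math

# Illustrative parameters (assumptions, not calibrated values).
kappa, lam, phi = 15.0, 2.0, 3.0
alpha, phi_alpha, sigma, theta = 0.0, 5.0, 0.1, 0.01
Q = 4

xi = lam * (1.0 + phi / kappa) ** (-(1.0 + kappa / phi))
K = [alpha * kappa * q - 0.5 * phi_alpha * sigma**2 * kappa * q**2
     for q in range(Q + 1)]
terminal = [math.exp(-kappa * theta * q * q) for q in range(Q + 1)]  # exp(-kappa*q*ell(q))

def build_coefficients(Q, K, xi, terminal):
    """C[q][n] such that omega_q(t) = sum_n C[q][n] * exp(K[n] * (T - t))."""
    C = [[0.0] * (Q + 1) for _ in range(Q + 1)]
    C[0][0] = 1.0
    for q in range(1, Q + 1):
        for n in range(q):
            C[q][n] = -xi * C[q - 1][n] / (K[q] - K[n])
        C[q][q] = terminal[q] - sum(C[q][:q])   # enforces omega_q(T) = terminal[q]
    return C

def omega(q, tau, C, K):
    """omega_q as a function of time to maturity tau = T - t."""
    return sum(C[q][n] * math.exp(K[n] * tau) for n in range(q + 1))

def h_value(q, tau, C, K, kappa):
    """h_q = log(omega_q) / kappa."""
    return math.log(omega(q, tau, C, K)) / kappa

C = build_coefficients(Q, K, xi, terminal)
h_two_units = h_value(2, 5.0, C, K, kappa)   # example evaluation
```

When some $K_q - K_n$ are very small, the divisions in the recursion can amplify rounding error, so a direct ODE solve may be the safer route in that regime.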
The two cases considered in Proposition 4.3 exclude the case in which $K_q = 0$ for
exactly two distinct values of $q$, namely $q = 0$ and one $q > 0$ (the proposition
only considers that either all $K_q$ are zero or only $K_0 = 0$). Due to the quadratic
dependence of $K_q$ on $q$, this is the only case omitted. However, it is very unlikely
that this case would arise in practice, for example, if the model were calibrated
to market data. Even if it did, an arbitrarily small adjustment could
be made to any of the parameters (the most reasonable choice would be $\varphi_\alpha$)
so that the ratio $\frac{\alpha}{\varphi_\alpha \sigma^2}$ is irrational.
Proposition 4.3 shows that the value function can have two different
functional forms depending on the possible values of $K_q = \alpha \kappa q - \tfrac{1}{2} \varphi_\alpha \sigma^2 \kappa q^2$.
How does this affect the optimal depth $\delta^*$? The proposition below shows
the difference in behavior of the optimal depths as time to maturity becomes
arbitrarily large when (i) $K_q = 0$, $\forall q > 0$, and (ii) $K_q < 0$, $\forall q > 0$.
respectively.
The agent’s additional control consists of the set of times at which the
process J increments.
The inclusion of market orders changes the equation satisfied by the value
function H(t, x, q, S). Rather than a standard HJB equation, H now satisfies
a quasi-variational inequality of the following form:
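$$\max\left\{ \partial_t H + \alpha\, \partial_S H + \tfrac{1}{2} \sigma^2 \partial_{SS} H + \sup_{\delta \ge 0} \left\{ \lambda e^{-\kappa \delta}\, DH \right\} \mathbb{1}_{q \neq 0}\ ;\ H\!\left(t, x + S - \tfrac{\Delta}{2}, q - 1, S\right) - H(t, x, q, S) \right\} = 0. \tag{4.29}$$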
From Equation 4.29, it is clear that one of the two terms must be equal to
zero, and the other term must be less than or equal to zero. This allows the
definition of a continuation region and execution region:
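$$C = \left\{ (t, x, q, S) : \partial_t H + \alpha\, \partial_S H + \tfrac{1}{2} \sigma^2 \partial_{SS} H + \sup_{\delta} \left\{ \lambda e^{-\kappa \delta}\, DH \right\} \mathbb{1}_{q \neq 0} = 0 \right\};$$
$$E = \left\{ (t, x, q, S) : H\!\left(t, x + S - \tfrac{\Delta}{2}, q - 1, S\right) - H(t, x, q, S) = 0 \right\}.$$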
It is clear that the feedback expression for the optimal depths is of the same
form as in Proposition 4.1. However, the numerical values of these depths
will be different because the values of the functions hq (t) will be different.
Also of importance is to note that after making the ansatz, the continuation
and execution regions can be redefined in terms of hq (t) and therefore will
not depend on the state variables x and S. The boundary between the two
regions is a curve in the (t, q) plane. Figure 4.7 illustrates the optimal depths
and market order execution boundary for an agent who liquidates a portfolio
of assets with both limit and market orders. The notable difference in the
optimal depths between this case and that of an agent who does not execute
market orders is that presently, they are bounded below by $\max\left(0, \frac{1}{\kappa} - \frac{\Delta}{2}\right)$,
while without market orders they are bounded below by 0. This is easily seen
from the feedback form of the depths, $\delta_q^*(t) = \left(\frac{1}{\kappa} - h_{q-1}(t) + h_q(t)\right)_+$, combined
with the inequality $h_{q-1}(t) - h_q(t) - \frac{\Delta}{2} \le 0$.
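One simple way to approximate the solution numerically is to alternate an explicit Euler step of the continuation equation with a projection onto the constraint $h_q \ge h_{q-1} - \Delta/2$. The sketch below does this for the ambiguity-neutral case, assuming $\ell(q) = \theta q$ and the parameter values quoted in the caption of Figure 4.7. The scheme, the function name, and the `execute` flag are choices made for illustration, not the numerical method used by the authors.

```python
import numpy as np

def solve_with_market_orders(kappa=15.0, lam=2.0, alpha=0.0, theta=0.01,
                             Delta=0.01, Q=60, T=25.0, n_steps=5000):
    """Explicit Euler step followed by projection onto h_q >= h_{q-1} - Delta/2."""
    dt = T / n_steps
    q = np.arange(Q + 1)
    h = np.empty((n_steps + 1, Q + 1))
    execute = np.zeros((n_steps + 1, Q + 1), dtype=bool)
    h[-1] = -theta * q**2                          # terminal condition
    for i in range(n_steps, 0, -1):
        gap = h[i, :-1] - h[i, 1:]                 # h_{q-1} - h_q
        delta = np.maximum(1.0 / kappa - gap, 0.0)
        cont = h[i, 1:] + dt * (alpha * q[1:]
                                + lam * np.exp(-kappa * delta) * (delta + gap))
        h[i - 1, 0] = 0.0
        for k in range(1, Q + 1):                  # sequential: allows several MOs at once
            obstacle = h[i - 1, k - 1] - 0.5 * Delta
            if cont[k - 1] <= obstacle:
                h[i - 1, k] = obstacle             # execution region
                execute[i - 1, k] = True
            else:
                h[i - 1, k] = cont[k - 1]          # continuation region
    gap = h[:, :-1] - h[:, 1:]
    depth = np.maximum(1.0 / kappa - gap, 0.0)
    return h, depth, execute

h, depth, execute = solve_with_market_orders()
```

The array `execute` marks the grid points in the execution region, so its boundary in the $(t, q)$ plane can be read off row by row.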
[Figure: two panels. Depth (δ) versus time (secs), with the q = 1 curve labeled and an arrow marking increasing q; inventory (q) versus time (secs).]
FIGURE 4.7: Optimal depth and market order execution boundary for
ambiguity neutral agent. Parameter values are κ = 15, λ = 2, σ = 0.01,
α = 0, ℓ(q) = θ q, θ = 0.01, Δ = 0.01, Q = 60, and T = 25.
$$h_{q-1} - h_q - \frac{\Delta}{2} \Bigg\} = 0, \tag{4.31}$$
$$h(T, q) = -q\,\ell(q),$$
$$h(t, 0) = 0.$$
[Figure: three panels of inventory (q) versus time (secs). (a) ϕλ = 0, 6, 12; (b) ϕκ = 0, 10, 30; (c) ϕα = 0, 1, 2.]
FIGURE 4.8: Effect on market order execution boundary for an agent who is
ambiguity averse to different factors of the reference model. Parameter values
are κ = 15, λ = 2, σ = 0.01, α = 0, ℓ(q) = θ q, θ = 0.01, Q = 60, and T = 25.
(a) MO arrival rate, (b) fill probability, and (c) midprice drift.
is strongly encouraged to sell shares very quickly with this type of penalty,
much more so than with only the terminal inventory penalty.
The relatively small change due to ambiguity on fill probability can be
understood as follows: for large inventory positions, the natural inclination of
the agent is to post small depths because she is in a hurry to liquidate the
inventory position before maturity. But as the agent posts smaller depths, the
change in the execution price distribution must be larger to have a significant
impact on the fill probability and hence also the performance of the strategy.
This type of large change is prevented by the entropic penalty. Essentially,
by naturally wanting to post small depths, the agent has already gained a
significant amount of protection against a misspecified fill probability.
Ambiguity with respect to market order arrival rate lies in a middle ground.
On one hand, the significance of this type of ambiguity is less than that of
the midprice drift because changing the arrival rate does not directly penal-
ize the holding of inventory. On the other hand, posting smaller depths for
4.7 Conclusion
We have shown how to incorporate ambiguity aversion into the context of
an optimal liquidation problem and have investigated the impact of ambiguity
with respect to different sources of randomness on the optimal trading
strategy. The primary mathematical procedure which allows for the computation
of the optimal strategy is solving a PDE for the agent's value function.
When the agent is only allowed to employ limit orders and when her ambi-
guity aversion levels satisfy a particular symmetry constraint, the solution to
this PDE is known in closed form. This allows any application of the strat-
egy to be implemented very efficiently by precomputing the optimal trading
strategy.
When the ability to submit market orders is added to the model, we no
longer have closed-form solutions for the optimal trading strategy. In this case,
the PDE must be solved numerically, a task which can become computation-
ally complex when the number of traded assets is increased or when additional
features are added to the model. Other additional features which could also
potentially cause a loss of closed-form solutions are the inclusion of a trade
signal which indicates favorable or unfavorable market conditions, or the pro-
cess of updating the agent’s reference model in real time through a learning
procedure.
Appendix 4A Proofs
Proof of Proposition 4.2. The minimization in η is independent of the opti-
mization in δ, g λ , and g κ and so can be done directly. First-order conditions
imply that η ∗ = α − ϕα σ 2 q, as desired. This value of η ∗ is easily seen to be
unique as it is a quadratic optimization. For the optimization over δ, g λ , and
g κ , first consider ϕλ > ϕκ . Then the term to be optimized is
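$$\begin{aligned}
G(\delta, g^\lambda, g^\kappa) &= \lambda \int_\delta^\infty e^{g^\lambda + g^\kappa(y)}\, F(dy)\, (\delta + h_{q-1} - h_q) + K_{\varphi_\lambda,\varphi_\kappa}(g^\lambda, g^\kappa) \\
&= \lambda \int_\delta^\infty e^{g^\lambda + g^\kappa(y)}\, F(dy)\, (\delta + h_{q-1} - h_q) \\
&\quad + \frac{1}{\varphi_\lambda}\, \lambda \int_0^\infty \left[ -\left(e^{g^\lambda} - 1\right) + g^\lambda e^{g^\lambda + g^\kappa(y)} \right] F(dy) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_0^\infty \left[ -\left(e^{g^\kappa(y)} - 1\right) e^{g^\lambda} + g^\kappa(y)\, e^{g^\lambda + g^\kappa(y)} \right] F(dy).
\end{aligned} \tag{A.1}$$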
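$$\lambda e^{g^\lambda + g^\kappa(y)} (\delta + h_{q-1} - h_q) + \frac{\lambda}{\varphi_\lambda}\left( -\left(e^{g^\lambda} - 1\right) + g^\lambda e^{g^\lambda + g^\kappa(y)} \right) + \frac{\lambda}{\varphi_\kappa}\left( -\left(e^{g^\kappa(y)} - 1\right) e^{g^\lambda} + g^\kappa(y)\, e^{g^\lambda + g^\kappa(y)} \right) + \gamma\left(e^{g^\kappa(y)} - 1\right). \tag{A.2}$$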
Part 3: Solving for γ: Substituting this expression into the integral constraint
and performing some computations give an expression for γ:
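$$\gamma = -\frac{\lambda}{\varphi_\lambda}\, g^\lambda e^{g^\lambda} + \frac{\lambda e^{g^\lambda}}{\varphi_\kappa} \log\left(1 - e^{-\kappa \delta} + e^{-\varphi_\kappa(\delta + h_{q-1} - h_q)}\, e^{-\kappa \delta}\right).$$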
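$$\begin{aligned}
&= \lambda \int_\delta^\infty e^{g^\lambda}\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) F(dy)\, (\delta + \Delta h_q) \\
&\quad + \frac{1}{\varphi_\lambda}\, \lambda \int_0^\delta \left[ -\left(e^{g^\lambda} - 1\right) + g^\lambda e^{g^\lambda}\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) \right] F(dy) \\
&\quad + \frac{1}{\varphi_\lambda}\, \lambda \int_\delta^\infty \left[ -\left(e^{g^\lambda} - 1\right) + g^\lambda e^{g^\lambda}\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) \right] F(dy) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_0^\delta \left[ -\left(e^{g^{\kappa*}(y)} + \epsilon k(y) - 1\right) e^{g^\lambda} + \log\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) e^{g^\lambda}\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) \right] F(dy) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_\delta^\infty \left[ -\left(e^{g^{\kappa*}(y)} + \epsilon k(y) - 1\right) e^{g^\lambda} + \log\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) e^{g^\lambda}\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) \right] F(dy),
\end{aligned}$$
$$\begin{aligned}
m'(\epsilon) &= \lambda \int_\delta^\infty e^{g^\lambda} k(y)\, F(dy)\, (\delta + \Delta h_q) + \frac{1}{\varphi_\lambda}\, \lambda \int_0^\infty g^\lambda e^{g^\lambda} k(y)\, F(dy) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_0^\delta \left[ -k(y) e^{g^\lambda} + k(y) e^{g^\lambda} + k(y) \log\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) e^{g^\lambda} \right] F(dy) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_\delta^\infty \left[ -k(y) e^{g^\lambda} + k(y) e^{g^\lambda} + k(y) \log\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) e^{g^\lambda} \right] F(dy) \\
&= \lambda \int_\delta^\infty e^{g^\lambda} k(y)\, F(dy)\, (\delta + \Delta h_q) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_0^\delta k(y) \log\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) e^{g^\lambda}\, F(dy) \\
&\quad + \frac{1}{\varphi_\kappa}\, \lambda \int_\delta^\infty k(y) \log\left(e^{g^{\kappa*}(y)} + \epsilon k(y)\right) e^{g^\lambda}\, F(dy).
\end{aligned}$$
$$\begin{aligned}
m''(\epsilon) &= \frac{1}{\varphi_\kappa}\, \lambda \int_0^\delta \frac{e^{g^\lambda} k^2(y)}{e^{g^{\kappa*}(y)} + \epsilon k(y)}\, F(dy) + \frac{1}{\varphi_\kappa}\, \lambda \int_\delta^\infty \frac{e^{g^\lambda} k^2(y)}{e^{g^{\kappa*}(y)} + \epsilon k(y)}\, F(dy) \\
&= \frac{1}{\varphi_\kappa}\, \lambda \int_0^\infty \frac{e^{g^\lambda} k^2(y)}{e^{g^{\kappa*}(y)} + \epsilon k(y)}\, F(dy) \\
&= \frac{1}{\varphi_\kappa}\, \lambda \int_0^\infty \frac{e^{g^\lambda} k^2(y)}{e^{f(y)}}\, F(dy).
\end{aligned}$$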
This expression is non-negative for all $\epsilon \in [0, 1]$, showing that indeed the
expression for $g^{\kappa*}(y)$ in Equation A.6 is a minimizer. This expression is strictly
positive unless $k \equiv 0$, showing that the inequality in Equation A.11 is strict
unless $f = g^{\kappa*}$; therefore $g^{\kappa*}$ is the unique minimizer.
Part 5: First-order conditions for g λ : After substituting the expression
A.6 into the term to be minimized (see Equation A.1) and performing some
tedious computations, we must minimize the following with respect to g λ :
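$$\begin{aligned}
&\lambda e^{g^\lambda} e^{\overline{g}} (\delta + \Delta h_q)\, e^{-\kappa \delta} \\
&\quad + \frac{\lambda}{\varphi_\lambda}\left( -\left(e^{g^\lambda} - 1\right) + g^\lambda e^{g^\lambda} e^{\overline{g}} \right) e^{-\kappa \delta} + \frac{\lambda}{\varphi_\kappa}\left( -\left(e^{\overline{g}} - 1\right) e^{g^\lambda} + \overline{g}\, e^{g^\lambda} e^{\overline{g}} \right) e^{-\kappa \delta} \\
&\quad + \frac{\lambda}{\varphi_\lambda}\left( -\left(e^{g^\lambda} - 1\right) + g^\lambda e^{g^\lambda} e^{\underline{g}} \right) \left(1 - e^{-\kappa \delta}\right) \\
&\quad + \frac{\lambda}{\varphi_\kappa}\left( -\left(e^{\underline{g}} - 1\right) e^{g^\lambda} + \underline{g}\, e^{g^\lambda} e^{\underline{g}} \right) \left(1 - e^{-\kappa \delta}\right),
\end{aligned} \tag{A.12}$$
where $\overline{g}$ and $\underline{g}$ denote the values taken by $g^{\kappa*}(y)$ on $y \ge \delta$ and $y < \delta$, respectively.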
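holds:
$$e^{-\varphi_\kappa (h_{q-1} - h_q)} \le \frac{\kappa}{\kappa + \varphi_\kappa}.$$
The first derivative of Equation A.14 with respect to $\delta$ is
$$\left(\kappa - (\kappa + \varphi_\kappa)\, e^{-\varphi_\kappa(\delta + h_{q-1} - h_q)}\right) e^{-\kappa \delta},$$
and the preceding inequality implies that this is non-negative for all $\delta \ge 0$,
implying that $\delta^* = 0$ is the minimizer of Equation A.14. Thus the value of $\delta$
which maximizes the original term of interest is
$$\delta^* = \left( \frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) - h_{q-1} + h_q \right)_+,$$
$$+\, e^{-\kappa \delta - \varphi_\kappa(\delta + h_{q-1} - h_q)}\Big)\, \mathbb{1}_{q \neq 0} = 0, \qquad h_q(T) = -q\,\ell(q). \tag{A.16}$$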
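This is a system of ODEs of the form $\partial_t \mathbf{h} = \mathbf{F}(\mathbf{h})$. To show existence and
uniqueness of the solution to this equation, the function $\mathbf{F}$ is shown to be
bounded and globally Lipschitz. It suffices to show that the function $f$ is
bounded and globally Lipschitz, where $f$ is given by
$$f(x, y) = \sup_{\delta \ge 0} \frac{\lambda}{\varphi_\lambda}\left( 1 - \exp\left\{ \frac{\varphi_\lambda}{\varphi_\kappa} \log\left(1 - e^{-\kappa \delta} + e^{-\kappa \delta - \varphi_\kappa(\delta + x - y)}\right) \right\} \right).$$
Boundedness and the global Lipschitz property of $f$ imply the same for
$\mathbf{F}$, and so existence and uniqueness follow from the Picard–Lindelöf
theorem. The global Lipschitz property is a result of showing that all directional
derivatives of $f$ exist and are bounded for all $(x, y) \in \mathbb{R}^2$.
The supremum is attained at $\delta^* = \left( \frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) - x + y \right)_+$. Thus two
separate domains for $f$ must be considered: $\frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) > x - y$ and
$\frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) \le x - y$. First consider $\frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) > x - y$ so that
$\delta^* = \frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) - x + y$. Substituting this into the expression for $f$ yields:
$$\begin{aligned}
f(x, y) &= \frac{\lambda}{\varphi_\lambda}\left( 1 - \exp\left\{ \frac{\varphi_\lambda}{\varphi_\kappa} \log\left( 1 - e^{-\frac{\kappa}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) + \kappa(x - y)} \left(1 - e^{-\log\left(1 + \frac{\varphi_\kappa}{\kappa}\right)}\right) \right) \right\} \right) \\
&= \frac{\lambda}{\varphi_\lambda}\left( 1 - \exp\left\{ \frac{\varphi_\lambda}{\varphi_\kappa} \log\left(1 - B e^{\kappa(x - y)}\right) \right\} \right),
\end{aligned}$$
where $B = \left(\frac{\kappa}{\varphi_\kappa + \kappa}\right)^{\kappa/\varphi_\kappa} \frac{\varphi_\kappa}{\varphi_\kappa + \kappa} > 0$. Letting $z = B e^{\kappa(x - y)}$, the inequality
$\frac{1}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right) > x - y$ implies
$$0 < z < \left(\frac{\kappa}{\varphi_\kappa + \kappa}\right)^{\kappa/\varphi_\kappa} \frac{\varphi_\kappa}{\varphi_\kappa + \kappa}\, e^{\frac{\kappa}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right)} = \left(\frac{\kappa}{\varphi_\kappa + \kappa}\right)^{\kappa/\varphi_\kappa} \frac{\varphi_\kappa}{\varphi_\kappa + \kappa} \left(\frac{\varphi_\kappa + \kappa}{\kappa}\right)^{\kappa/\varphi_\kappa} = \frac{\varphi_\kappa}{\varphi_\kappa + \kappa} < 1.$$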
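$$f(x, y) = \frac{\lambda}{\varphi_\lambda}\left(1 - e^{-\varphi_\lambda (x - y)}\right),$$
which is bounded by $\frac{\lambda}{\varphi_\lambda}\left(1 - e^{-\frac{\varphi_\lambda}{\varphi_\kappa} \log\left(1 + \frac{\varphi_\kappa}{\kappa}\right)}\right)$. Partial derivatives of $f$ are
given by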
where C is the curve that connects (x1 , y1 ) to (x2 , y2 ) in a straight line and
A is a uniform bound on the gradient of f . This proves that there exists a
unique solution h to Equation 4.24.
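Proof of Theorem 4.1. Let $h$ be the solution to Equation 4.24 with terminal
conditions $h_q(T) = -q\,\ell(q)$, and define a candidate value function by
$\hat{H}(t, x, q, S) = x + q S + h_q(t)$. From Ito's lemma we have
$$\begin{aligned}
\hat{H}(T, X^\delta_{T^-}, S_{T^-}, q^\delta_{T^-}) &= \hat{H}(t, x, S, q) + \int_t^T \partial_t h_{q_s}(s)\, ds + \alpha \int_t^T q_s\, ds + \sigma \int_t^T q_s\, dW_s \\
&\quad + \int_t^T\!\!\int_{\delta_s}^\infty \left(\delta_s + h_{q_{s^-} - 1}(s) - h_{q_{s^-}}(s)\right) \mu(dy, ds).
\end{aligned}$$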
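Note that for any admissible measure $\mathbb{Q}(\eta, g)$ and admissible control $\delta$, we
have
$$\begin{aligned}
\mathbb{E}^{\mathbb{Q}(\eta,g)}\left[ \int_0^T\!\!\int_{\delta_t}^\infty (\delta_t)^2\, \nu_{\mathbb{Q}(\eta,g)}(dy, dt) \right] &= \mathbb{E}^{\mathbb{Q}(\eta,g)}\left[ \int_0^T\!\!\int_{\delta_t}^\infty (\delta_t)^2 e^{g_t(y)}\, \nu_{\mathbb{P}}(dy, dt) \right] \\
&\le \mathbb{E}^{\mathbb{Q}(\eta,g)}\left[ \int_0^T\!\!\int_{\delta_t}^\infty y^2 e^{g_t(y)}\, \nu_{\mathbb{P}}(dy, dt) \right] \\
&\le \mathbb{E}^{\mathbb{Q}(\eta,g)}\left[ \int_0^T\!\!\int_0^\infty y^2 e^{g_t(y)}\, \nu_{\mathbb{P}}(dy, dt) \right] < \infty.
\end{aligned}$$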
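$$\eta_t(\delta) = \alpha - \varphi_\alpha \sigma^2 q_t,$$
$$g_t^\lambda(\delta) = \frac{\varphi_\lambda}{\varphi_\kappa} \log\left(1 - e^{-\kappa \delta_t}\left(1 - e^{-\varphi_\kappa(\delta_t + h_{q_t - 1}(t) - h_{q_t}(t))}\right)\right),$$
$$g_t^\kappa(y; \delta) = -\log\left(1 - e^{-\kappa \delta_t}\left(1 - e^{-\varphi_\kappa(\delta_t + h_{q_t - 1}(t) - h_{q_t}(t))}\right)\right) - \varphi_\kappa\left(\delta_t + h_{q_t - 1}(t) - h_{q_t}(t)\right) \mathbb{1}_{y \ge \delta_t}.$$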
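These expressions are straightforward to evaluate for a given depth and a given difference $h_{q-1} - h_q$. The sketch below is an illustration under the assumption that $F$ is exponential with rate $\kappa$, so that the mass above $\delta$ is $e^{-\kappa\delta}$. It computes $g^\lambda$ and the two values taken by $g^\kappa$, then reports the implied MO intensity and fill probability under the candidate measure. The numerical inputs and names are hypothetical.

```python
import math

def optimal_measure_change(delta, gap, kappa, phi_lam, phi_kap):
    """Pointwise minimizers for a posted depth delta and gap = h_{q-1} - h_q."""
    s = delta + gap
    log_term = math.log(1.0 - math.exp(-kappa * delta)
                        * (1.0 - math.exp(-phi_kap * s)))
    g_lam = phi_lam / phi_kap * log_term
    g_kap_below = -log_term                    # value of g_kappa for y < delta
    g_kap_above = -log_term - phi_kap * s      # value of g_kappa for y >= delta
    return g_lam, g_kap_below, g_kap_above

lam, kappa = 2.0, 15.0                         # reference arrival rate and fill decay
phi_lam, phi_kap = 6.0, 10.0                   # illustrative ambiguity weights
delta, gap = 0.05, 0.02                        # illustrative depth and h_{q-1} - h_q

g_lam, g_below, g_above = optimal_measure_change(delta, gap, kappa, phi_lam, phi_kap)
p_ref = math.exp(-kappa * delta)               # reference fill probability
normalization = math.exp(g_below) * (1.0 - p_ref) + math.exp(g_above) * p_ref
effective_intensity = lam * math.exp(g_lam)
effective_fill_probability = math.exp(g_above) * p_ref
```

For positive `delta + gap`, both the effective intensity and the effective fill probability fall below their reference values, which is the sense in which the candidate measure is adverse to the agent.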
These processes each have the same form as the pointwise minimizers found
in Proposition 4.2, and so for a given δ = (δt )0≤t≤T , these controls achieve the
pointwise infimum in Equation 4.24. Since h is a classical solution to Equation
4.24, it is bounded for t ∈ [0, T ] and 0 ≤ q ≤ Q. Using the boundedness of h, we
see that gtλ (0) is finite and bounded with respect to t, and limδ→∞ gtλ (δ) = 0,
therefore gtλ (δ) is bounded. It is also clear that ηt (δ) is bounded. However,
gtκ (y; δ) is only bounded from above, so it is possible that the pair (ηt (δ), gt (δ))
does not define an admissible measure as per the definition in Equation 4.15.
In order to proceed, we use a modification of gtκ :
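$$g^\kappa_{t,M}(y; \delta) = -\log\left(1 - e^{-\kappa \delta_t}\left(1 - e^{-\varphi_\kappa(\delta_t + h_{q_t - 1}(t) - h_{q_t}(t))}\right)\right) - \varphi_\kappa \min\left(\delta_t + h_{q_t - 1}(t) - h_{q_t}(t),\, M\right) \mathbb{1}_{y \ge \delta_t}.$$
Since $g^\kappa_{t,M}$ is bounded, letting $g_{t,M}(y; \delta) = g_t^\lambda(\delta) + g^\kappa_{t,M}(y; \delta)$, the pair
$(\eta_t(\delta), g_M(\delta))$ does define an admissible measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))$. Note
that for a fixed $t$ and $\delta_t$, $g^\kappa_{t,M}(y; \delta) \to g_t^\kappa(y; \delta)$ as $M \to \infty$ pointwise in $y$ and
in $L^1(F(dy))$.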
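Step 3: Showing pointwise $\epsilon$-optimality: As in the proof of Proposition 4.2,
consider the functional
$$G(t, \delta, g^\lambda, g^\kappa) = \lambda \int_\delta^\infty e^{g^\lambda + g^\kappa(y)}\, F(dy)\, \left(\delta + h_{q-1}(t) - h_q(t)\right) + K_{\varphi_\lambda,\varphi_\kappa}(g^\lambda, g^\kappa)\, \mathbb{1}_{\varphi_\lambda > \varphi_\kappa} + K_{\varphi_\kappa,\varphi_\lambda}(g^\kappa, g^\lambda)\, \mathbb{1}_{\varphi_\kappa > \varphi_\lambda}.$$
We now show
$$\lim_{M \to \infty} G(t, \delta_t, g_t^\lambda(\delta), g^\kappa_{t,M}(\cdot\,; \delta)) = G(t, \delta_t, g_t^\lambda(\delta), g_t^\kappa(\cdot\,; \delta))$$
uniformly in $t$ and $\delta$. Consider the first term only, and compute the difference
when evaluated at both $g^\kappa_{t,M}(\cdot\,; \delta_t)$ and $g_t^\kappa(\cdot\,; \delta_t)$, which we denote by
$J(t, \delta, M)$:
$$\begin{aligned}
J(t, \delta, M) &= \lambda e^{g_t^\lambda(\delta)} \left| \delta_t + h_{q-1}(t) - h_q(t) \right| \left| \int_{\delta_t}^\infty e^{g^\kappa_{t,M}(y;\delta)}\, F(dy) - \int_{\delta_t}^\infty e^{g_t^\kappa(y;\delta)}\, F(dy) \right| \\
&= \lambda e^{\left(1 - \frac{\varphi_\kappa}{\varphi_\lambda}\right) g_t^\lambda(\delta)} \left| \delta_t + h_{q-1}(t) - h_q(t) \right| e^{-\kappa \delta_t} \left| e^{-\varphi_\kappa \min(\delta_t + h_{q-1}(t) - h_q(t),\, M)} - e^{-\varphi_\kappa(\delta_t + h_{q-1}(t) - h_q(t))} \right| \\
&= \lambda e^{\left(1 - \frac{\varphi_\kappa}{\varphi_\lambda}\right) g_t^\lambda(\delta)} \left| \delta_t + h_{q-1}(t) - h_q(t) \right| e^{-\kappa \delta_t} e^{-\varphi_\kappa M} \left| 1 - e^{-\varphi_\kappa(\delta_t + h_{q-1}(t) - h_q(t) - M)} \right| \mathbb{1}_{\delta_t + h_{q-1}(t) - h_q(t) \ge M} \\
&\le \lambda e^{\left(1 - \frac{\varphi_\kappa}{\varphi_\lambda}\right) g_t^\lambda(\delta)} \left| \delta_t + h_{q-1}(t) - h_q(t) \right| e^{-\kappa \delta_t} e^{-\varphi_\kappa M}.
\end{aligned}$$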
Thus the measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))$ is pointwise (in $t$) $\epsilon$-optimal, uniformly
in $\delta$.
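Step 4: Showing $\hat{H}(t, x, S, q) \ge H(t, x, S, q)$: Taking an expectation of
$\hat{H}(T, X^\delta_{T^-}, S_{T^-}, q^\delta_{T^-})$ in the measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))$, and using Equation
A.19, gives
$$\begin{aligned}
&\mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}_{t,x,q,S}\left[ \hat{H}(T, X^\delta_{T^-}, S_{T^-}, q^\delta_{T^-}) \right] \\
&\quad = \hat{H}(t, x, S, q) + \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}_{t,x,q,S}\left[ \int_t^T \partial_t h_{q_s}(s)\, ds + \alpha \int_t^T q_s\, ds + \sigma \int_t^T q_s\, dW_s \right. \\
&\qquad \left. + \int_t^T\!\!\int_{\delta_s}^\infty \left(\delta_s + h_{q_s - 1}(s) - h_{q_s}(s)\right) \nu_{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}(dy, ds) \right] \\
&\quad \le \hat{H}(t, x, S, q) + \epsilon (T - t) + \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}_{t,x,q,S}\left[ -\int_t^T \frac{1}{2\varphi_\alpha}\left(\frac{\alpha - \eta_s(\delta)}{\sigma}\right)^2 ds \right. \\
&\qquad \left. - \int_t^T \left( K_{\varphi_\lambda,\varphi_\kappa}(g_s^\lambda(\delta), g^\kappa_{s,M}(\cdot\,; \delta))\, \mathbb{1}_{\varphi_\lambda \ge \varphi_\kappa} + K_{\varphi_\kappa,\varphi_\lambda}(g^\kappa_{s,M}(\cdot\,; \delta), g_s^\lambda(\delta))\, \mathbb{1}_{\varphi_\lambda < \varphi_\kappa} \right) ds \right].
\end{aligned}$$
$$\begin{aligned}
\hat{H}(t, x, S, q) + \epsilon (T - t) &\ge \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}_{t,x,q,S}\left[ \hat{H}(T, X^\delta_{T^-}, S_{T^-}, q^\delta_{T^-}) + \int_t^T \frac{1}{2\varphi_\alpha}\left(\frac{\alpha - \eta_s(\delta)}{\sigma}\right)^2 ds \right. \\
&\qquad \left. + \int_t^T \left( K_{\varphi_\lambda,\varphi_\kappa}(g_s^\lambda(\delta), g^\kappa_{s,M}(\cdot\,; \delta))\, \mathbb{1}_{\varphi_\lambda \ge \varphi_\kappa} + K_{\varphi_\kappa,\varphi_\lambda}(g^\kappa_{s,M}(\cdot\,; \delta), g_s^\lambda(\delta))\, \mathbb{1}_{\varphi_\lambda < \varphi_\kappa} \right) ds \right] \\
&= \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}_{t,x,q,S}\left[ \hat{H}(T, X^\delta_T, S_T, q^\delta_T) + \mathcal{H}_{t,T}\left(\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta)) \,|\, \mathbb{P}\right) \right] \\
&= \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))}_{t,x,q,S}\left[ X^\delta_T + q^\delta_T\left(S_T - \ell(q^\delta_T)\right) + \mathcal{H}_{t,T}\left(\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta)) \,|\, \mathbb{P}\right) \right].
\end{aligned}$$
Since this holds for one particular choice of admissible measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta(\delta), g_M(\delta))$, we have
$$\hat{H}(t, x, S, q) + \epsilon (T - t) \ge \inf_{\mathbb{Q} \in \mathcal{Q}} \mathbb{E}^{\mathbb{Q}}_{t,x,q,S}\left[ X^\delta_T + q^\delta_T\left(S_T - \ell(q^\delta_T)\right) + \mathcal{H}_{t,T}(\mathbb{Q}|\mathbb{P}) \right].$$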
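Step 5: Showing $\hat{H}(t, x, S, q) \le H(t, x, S, q)$: Now, let $\delta = (\delta_t)_{0 \le t \le T}$ be the
control process defined in the statement of the theorem, and let $\eta_t$, $g_t^\lambda$, and
$g_t^\kappa(y)$ be arbitrary such that they induce an admissible measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta, g) \in \mathcal{Q}^{\alpha,\lambda,\kappa}$.
Then from Ito's lemma and the fact that $h$ satisfies Equation 4.24
$$\begin{aligned}
&\mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta,g)}_{t,x,q,S}\left[ \hat{H}(T, X^\delta_{T^-}, S_{T^-}, q^\delta_{T^-}) \right] \\
&\quad = \hat{H}(t, x, S, q) + \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta,g)}_{t,x,q,S}\left[ \int_t^T \partial_t h_{q_s}(s)\, ds + \alpha \int_t^T q^\delta_s\, ds + \sigma \int_t^T q^\delta_s\, dW_s \right. \\
&\qquad \left. + \int_t^T\!\!\int_{\delta_s}^\infty \left(\delta_s + h_{q_s - 1}(s) - h_{q_s}(s)\right) \nu_{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta,g)}(dy, ds) \right] \\
&\quad \ge \hat{H}(t, x, S, q) + \mathbb{E}^{\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta,g)}_{t,x,q,S}\left[ -\int_t^T \frac{1}{2\varphi_\alpha}\left(\frac{\alpha - \eta_s}{\sigma}\right)^2 ds \right. \\
&\qquad \left. - \int_t^T \left( K_{\varphi_\lambda,\varphi_\kappa}(g_s^\lambda, g_s^\kappa)\, \mathbb{1}_{\varphi_\lambda \ge \varphi_\kappa} + K_{\varphi_\kappa,\varphi_\lambda}(g_s^\kappa, g_s^\lambda)\, \mathbb{1}_{\varphi_\lambda < \varphi_\kappa} \right) ds \right].
\end{aligned}$$
Since this holds for any arbitrary admissible measure $\mathbb{Q}^{\alpha,\lambda,\kappa}(\eta, g)$, we have
$$\hat{H}(t, x, S, q) \le \inf_{\mathbb{Q} \in \mathcal{Q}^{\alpha,\lambda,\kappa}} \mathbb{E}^{\mathbb{Q}}_{t,x,q,S}\left[ X^\delta_T + q^\delta_T\left(S_T - \ell(q^\delta_T)\right) + \mathcal{H}_{t,T}(\mathbb{Q}|\mathbb{P}) \right].$$
Therefore,
$$\begin{aligned}
\hat{H}(t, x, S, q) &\le \sup_{(\delta_s)_{t \le s \le T} \in \mathcal{A}}\ \inf_{\mathbb{Q} \in \mathcal{Q}^{\alpha,\lambda,\kappa}} \mathbb{E}^{\mathbb{Q}}_{t,x,q,S}\left[ X^\delta_T + q^\delta_T\left(S_T - \ell(q^\delta_T)\right) + \mathcal{H}_{t,T}(\mathbb{Q}|\mathbb{P}) \right] \\
&= H(t, x, S, q),
\end{aligned} \tag{A.21}$$
as desired.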
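Proof of Proposition 4.3. Let $\omega_q(t) = e^{\kappa h_q(t)}$, or equivalently, $h_q(t) = \frac{1}{\kappa} \log \omega_q(t)$.
Substituting this into Equation 4.28 gives
$$\frac{\partial_t \omega_q}{\kappa\, \omega_q} + \alpha q - \frac{1}{2} \varphi_\alpha \sigma^2 q^2 + \frac{\xi}{\kappa} \frac{\omega_{q-1}}{\omega_q}\, \mathbb{1}_{q \neq 0} = 0$$
$$\partial_t \omega_q + \alpha \kappa q\, \omega_q - \frac{1}{2} \varphi_\alpha \kappa \sigma^2 q^2\, \omega_q + \xi\, \omega_{q-1}\, \mathbb{1}_{q \neq 0} = 0$$
$$\partial_t \omega_q + K_q\, \omega_q + \xi\, \omega_{q-1}\, \mathbb{1}_{q \neq 0} = 0, \tag{A.22}$$
together with terminal conditions $\omega_q(T) = e^{-\kappa q\, \ell(q)}$. To prove Part (i), note that
$K_q = 0$ for all $q$, so Equation A.22 becomes
$$\partial_t \omega_q = -\xi\, \omega_{q-1}\, \mathbb{1}_{q \neq 0}.$$
For $q = 0$, this results in $\omega_0(t) = 1$. For $q > 0$, integrating both sides yields
$$\omega_q(T) - \omega_q(t) = -\xi \int_t^T \omega_{q-1}(u)\, du$$
$$\omega_q(t) = \xi \int_t^T \omega_{q-1}(u)\, du + \omega_q(T). \tag{A.23}$$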
Portfolio Liquidation and Ambiguity Aversion 111
Since ω0 (t) is a constant and each ωq results from the integral of ωq−1 , ωq (t)
can be written as
q
ωq (t) = Cq,n (T − t)n , (A.24)
n=0
where each Cq,n must be computed. Substituting Equation A.24 into Equation
A.23 gives
T
q−1
ωq (t) = ξ Cq−1,n (T − u)n + ωq (T )
t n=0
q−1 T
=ξ Cq−1,n (T − u)n + ωq (T )
n=0 t
q−1
(T − t)n+1
=ξ Cq−1,n + ωq (T )
n=0
n+1
q
(T − t)n
=ξ Cq−1,n−1 + ωq (T ).
n=1
n
ξ
Cq,n = Cq−1,n−1
n
and
Cq,0 = ωq (T ) .
ξn ξ n −κ(q−n)(q−n)
Cq,n = ωq−n (T ) = e .
n! n!
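As a numerical sanity check on Part (i), the following sketch evaluates the closed form for $\omega_q$ and compares it with Equation A.23 applied to $\omega_{q-1}$ by trapezoidal quadrature. It is only an illustration: the function names, the parameter values, and the linear penalty `ell(q) = 0.01 * q` are assumptions made here, not quantities taken from the chapter.

```python
import math

def omega_closed(q, tau, xi, kappa, ell):
    """omega_q at time-to-maturity tau = T - t, using
    C_{q,n} = xi^n / n! * exp(-kappa * (q - n) * ell(q - n))."""
    return sum(
        xi**n / math.factorial(n)
        * math.exp(-kappa * (q - n) * ell(q - n))
        * tau**n
        for n in range(q + 1)
    )

def omega_from_a23(q, tau, xi, kappa, ell, steps=2000):
    """omega_q from Equation A.23: xi * integral of omega_{q-1} plus omega_q(T).
    With s = T - u the integral over [t, T] becomes an integral over [0, tau]."""
    if q == 0:
        return 1.0
    h = tau / steps
    vals = [omega_closed(q - 1, i * h, xi, kappa, ell) for i in range(steps + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoid rule
    return xi * integral + math.exp(-kappa * q * ell(q))

if __name__ == "__main__":
    ell = lambda q: 0.01 * q          # hypothetical liquidation penalty
    for q in range(4):
        print(q, omega_closed(q, 2.0, 1.5, 0.1, ell),
              omega_from_a23(q, 2.0, 1.5, 0.1, ell))
```

The two columns should agree up to quadrature error, which is what the coefficient matching above implies.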
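To prove Part (ii), return to Equation A.22. Since $K_0 = 0$, it is easily seen that
$\omega_0(t) = 1$. For $q > 0$, a recursive solution to Equation A.22 can be written as
\[
\begin{aligned}
\partial_t \omega_q + K_q\,\omega_q + \xi\,\omega_{q-1} &= 0 \\
\partial_t\big(e^{K_q t}\,\omega_q(t)\big) &= -\xi\,e^{K_q t}\,\omega_{q-1} \\
e^{K_q T}\,\omega_q(T) - e^{K_q t}\,\omega_q(t) &= -\xi \int_t^T e^{K_q u}\,\omega_{q-1}(u)\,du \\
\omega_q(t) &= \xi\,e^{-K_q t} \int_t^T e^{K_q u}\,\omega_{q-1}(u)\,du + \omega_q(T)\,e^{K_q (T-t)}.
\end{aligned}
\tag{A.25}
\]
With $\omega_0(t) = 1$ and each $\omega_{q-1}(t)$ being integrated against $e^{K_q t}$, the general
form of $\omega_q(t)$ can be written as
\[
\omega_q(t) = \sum_{n=0}^{q} C_{q,n}\,e^{K_n (T-t)}, \tag{A.26}
\]
where each $C_{q,n}$ must be computed. Substituting Equation A.26 for $\omega_{q-1}$ into
Equation A.25 gives
\[
\begin{aligned}
\omega_q(t) &= \xi\,e^{-K_q t} \int_t^T e^{K_q u} \sum_{n=0}^{q-1} C_{q-1,n}\,e^{K_n (T-u)}\,du + \omega_q(T)\,e^{K_q (T-t)} \\
&= \xi\,e^{-K_q t} \int_t^T \sum_{n=0}^{q-1} C_{q-1,n}\,e^{K_n T}\,e^{(K_q-K_n)u}\,du + \omega_q(T)\,e^{K_q (T-t)} \\
&= \xi\,e^{-K_q t} \sum_{n=0}^{q-1} C_{q-1,n}\,e^{K_n T} \int_t^T e^{(K_q-K_n)u}\,du + \omega_q(T)\,e^{K_q (T-t)} \\
&= \xi\,e^{-K_q t} \sum_{n=0}^{q-1} C_{q-1,n}\,e^{K_n T}\,\frac{e^{(K_q-K_n)T} - e^{(K_q-K_n)t}}{K_q - K_n} + \omega_q(T)\,e^{K_q (T-t)} \\
&= \xi \sum_{n=0}^{q-1} C_{q-1,n}\,\frac{e^{K_q (T-t)} - e^{K_n (T-t)}}{K_q - K_n} + \omega_q(T)\,e^{K_q (T-t)} \\
&= -\xi \sum_{n=0}^{q-1} \frac{C_{q-1,n}}{K_q - K_n}\,e^{K_n (T-t)}
 + \left( \xi \sum_{n=0}^{q-1} \frac{C_{q-1,n}}{K_q - K_n} + \omega_q(T) \right) e^{K_q (T-t)}.
\end{aligned}
\tag{A.27}
\]
From the first summation term above, it is deduced that
$C_{q,n} = \dfrac{-\xi\,C_{q-1,n}}{K_q - K_n}$ for $q > n$. This recursive relation leads to
\[
C_{n+j,n} = (-\xi)^j\,C_{n,n} \prod_{p=1}^{j} \frac{1}{K_{n+p} - K_n}.
\]
The coefficient of $e^{K_q (T-t)}$ in Equation A.27 then gives
\[
C_{q,q} = -\sum_{n=0}^{q-1} C_{q,n} + \omega_q(T).
\]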
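The recursion for the coefficients in Part (ii) can be checked numerically. The sketch below builds $C_{q,n}$ from the two relations above and verifies, by central finite differences, that the resulting $\omega_q$ satisfies Equation A.22. The values of `K`, `xi` and the terminal values `omega_T` are illustrative assumptions (any distinct $K_n$ with $K_0 = 0$ would do), and the function names are hypothetical.

```python
import math

def coefficients(qmax, K, xi, omega_T):
    """C[q][n] such that omega_q(t) = sum_n C[q][n] * exp(K[n] * (T - t))."""
    C = [[0.0] * (qmax + 1) for _ in range(qmax + 1)]
    C[0][0] = omega_T[0]                      # omega_0 is constant because K[0] = 0
    for q in range(1, qmax + 1):
        for n in range(q):
            # C_{q,n} = -xi * C_{q-1,n} / (K_q - K_n) for q > n
            C[q][n] = -xi * C[q - 1][n] / (K[q] - K[n])
        # C_{q,q} = omega_q(T) - sum_{n<q} C_{q,n}
        C[q][q] = omega_T[q] - sum(C[q][:q])
    return C

def omega(q, tau, C, K):
    """omega_q evaluated at time-to-maturity tau = T - t."""
    return sum(C[q][n] * math.exp(K[n] * tau) for n in range(q + 1))

def ode_residual(q, tau, C, K, xi, h=1e-5):
    """Residual of d(omega_q)/dt + K_q omega_q + xi omega_{q-1}, using d/dt = -d/dtau."""
    d_dt = -(omega(q, tau + h, C, K) - omega(q, tau - h, C, K)) / (2.0 * h)
    return d_dt + K[q] * omega(q, tau, C, K) + xi * omega(q - 1, tau, C, K)

if __name__ == "__main__":
    K = [-0.05 * q * q for q in range(4)]              # assumed values, K[0] = 0
    omega_T = [math.exp(-0.001 * q * q) for q in range(4)]
    C = coefficients(3, K, 1.0, omega_T)
    print([ode_residual(q, 1.5, C, K, 1.0) for q in (1, 2, 3)])
```

The ratio `C[q][0] / C[q-1][0]` equals `-xi / K[q]`, which is the quantity that appears in the long-horizon limit used in the proof of Proposition 4.4.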
Proof of Proposition 4.4. Case (i): For case (i) with $K_q = 0$ for all $q$, the
value function is of the form:
\[
h_q(t) = \frac{1}{\kappa} \log \sum_{n=0}^{q} C_{q,n}\,(T-t)^n,
\]
where each $C_{q,n} > 0$, and the feedback form of the optimal depth is
\[
\delta_q^{*}(t) = \frac{1}{\varphi} \log\left(1 + \frac{\varphi}{\kappa}\right) - h_{q-1}(t) + h_q(t).
\]
Substituting the value function into the feedback expression gives
\[
\delta_q^{*}(t) = \frac{1}{\varphi} \log\left(1 + \frac{\varphi}{\kappa}\right)
 + \frac{1}{\kappa} \log \frac{\sum_{n=0}^{q} C_{q,n}\,(T-t)^n}{\sum_{n=0}^{q-1} C_{q-1,n}\,(T-t)^n}.
\]
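Since each $K_n$ is negative (except for $K_0$ which is equal to 0), this clearly
converges as $(T - t) \to \infty$ to
\[
\begin{aligned}
\delta_q^{*}(t) &\to \frac{1}{\varphi} \log\left(1 + \frac{\varphi}{\kappa}\right) + \frac{1}{\kappa} \log \frac{C_{q,0}}{C_{q-1,0}} \\
&= \frac{1}{\varphi} \log\left(1 + \frac{\varphi}{\kappa}\right) + \frac{1}{\kappa} \log \frac{-\xi}{K_q}.
\end{aligned}
\]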
References
Almgren, R. and Chriss, N., 2001. Optimal execution of portfolio transactions. Jour-
nal of Risk, 3, 5–40.
Avellaneda, M. and Stoikov, S., 2008. High-frequency trading in a limit order book.
Quantitative Finance, 8(3), 217–224.
Cartea, Á., Donnelly, R., and Jaimungal, S., 2017. Algorithmic trading with model
uncertainty. SIAM Journal on Financial Mathematics, 8(1), 635–671.
Cartea, Á. and Jaimungal, S., 2015a. Optimal execution with limit and market
orders. Quantitative Finance, 15(8), 1279–1291.
Cartea, Á. and Jaimungal, S., 2015b. Risk metrics and fine tuning of high-frequency
trading strategies. Mathematical Finance, 25(3), 576–611.
Cartea, Á., Jaimungal, S., and Penalva, J., 2015. Algorithmic and High-Frequency
Trading. Cambridge University Press, Cambridge, United Kingdom.
Guéant, O. and Lehalle, C.-A., 2015. General intensity shapes in optimal liquidation.
Mathematical Finance, 25(3), 457–495.
Guéant, O., Lehalle, C.-A., and Fernandez-Tapia, J., 2012. Optimal portfolio liquida-
tion with limit orders. SIAM Journal on Financial Mathematics, 3(1), 740–764.
Guéant, O., Lehalle, C.-A., and Fernandez-Tapia, J., 2013. Dealing with the inven-
tory risk: a solution to the market making problem. Mathematics and Financial
Economics, 7(4), 477–507.
Jacod, J. and Shiryaev, A. N., 1987. Limit Theorems for Stochastic Processes.
Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin,
Germany.
Kharroubi, I. and Pham, H., 2010. Optimal portfolio liquidation with execution cost
and risk. SIAM Journal on Financial Mathematics, 1(1), 897–931.
Obizhaeva, A. A. and Wang, J., 2013. Optimal trading strategy and supply/demand
dynamics. Journal of Financial Markets, 16(1), 1–32.
Chapter 5
Challenges in Scenario Generation:
Modeling Market and Non-Market
Risks in Insurance
Douglas McLean
CONTENTS
5.1 Introduction
  5.1.1 The challenge of negative nominal interest rates
  5.1.2 Objectives
  5.1.3 ESG and Solvency 2
  5.1.4 Layout
5.2 Economic Scenario Generators
  5.2.1 What does an ESG do?
  5.2.2 The yield curve
  5.2.3 Nominal interest-rate models
  5.2.4 Credit models
  5.2.5 Equity models
  5.2.6 Calibration issues
  5.2.7 High-performance computing in ESGs
5.3 Risk-Scenario Generators
  5.3.1 Co-dependency structures and simulation
  5.3.2 Mortality and longevity risk
  5.3.3 Lapse and surrender risk
  5.3.4 Operational risk
5.4 Examples of Challenges in Scenario Generation
  5.4.1 PEDCP representation
    Distribution function
    Density function
  5.4.2 Stochastic volatility and jump diffusion representation
    Distributional representation
    The SVJD model and the combined equity asset shock
    Unconditional equity asset shock distribution
5.5 Discussion
References
5.1 Introduction
5.1.1 The challenge of negative nominal interest rates
In August 2014, the euro-denominated German 1-year bond yield dropped
below zero and stubbornly stayed there until the time of writing (late November
2017). Together with many other rates, including, more recently, the German
10-year rate in June 2016, this heralded an unprecedented era of low interest rates.
Symptomatic of the policy of quantitative easing from the European Central
Bank, the bond market has been reacting to a surfeit of cash. Overnight
bank deposits now incur charges and borrowers may receive interest rather than
pay it. Such a policy was designed to stimulate lending and punish those who
might otherwise save. For economic scenario generation, this has exposed a
challenge: model risk. As discussed in the sequel, negative rates led
to a quandary over the log-normal basis in which the so-called market
implied volatilities are typically quoted. Data vendors could no longer supply
market quotes because the log-normal implied volatilities had ceased to exist
once rates turned negative, and so the Libor Market Model (e.g., Brigo and
Mercurio 2006; Wu and Zhang 2006) could not be calibrated.
The ramification for economic scenario generators (ESGs),
with a knock-on effect on insurance companies' asset and liability modeling
(ALM) systems, was that they would no longer be able to
make stochastic projections of the most fundamental quantity in the market:
the yield curve. Something needed to be done. The error had been a lack
of foresight: that nominal interest rates could go negative had been
considered inconceivable, yet it had happened. The obvious solution would be to
switch to a different model that could cope with negative rates. But which
one? Fortunately, the investment banking industry had already foreseen a
negative nominal rate scenario and had for some time been using Bachelier's
(1901) arithmetic Brownian motion as a model for the underlying interest rates.
Bachelier's model allowed negative nominal rates where Black's model did not.
Furthermore, the data providers had been offering absolute implied volatilities
to match. The switch was made and, for the time being at least, another ESG
challenge had been resolved.
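To make the contrast between the two quoting conventions concrete, here is a minimal sketch of the undiscounted call price on a forward rate under a log-normal (Black) and a normal (Bachelier) specification. The function names and the numbers in the usage lines are illustrative assumptions; the formulas are the standard textbook ones. The point is simply that the log-normal formula needs the logarithm of the forward-to-strike ratio and so is undefined once the forward is negative, whereas the normal formula is not.

```python
import math

def _norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def black_call(forward, strike, sigma, expiry):
    """Undiscounted log-normal (Black) call price; requires forward > 0 and strike > 0."""
    if forward <= 0.0 or strike <= 0.0:
        raise ValueError("log-normal model is undefined for a non-positive forward or strike")
    s = sigma * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    return forward * _norm_cdf(d1) - strike * _norm_cdf(d1 - s)

def bachelier_call(forward, strike, sigma_abs, expiry):
    """Undiscounted normal (Bachelier) call price; sigma_abs is an absolute volatility."""
    s = sigma_abs * math.sqrt(expiry)
    d = (forward - strike) / s
    return (forward - strike) * _norm_cdf(d) + s * _norm_pdf(d)

if __name__ == "__main__":
    # A forward rate of -0.2% (hypothetical) is handled by the normal model ...
    print(bachelier_call(-0.002, 0.0, 0.006, 1.0))
    # ... but not by the log-normal one.
    try:
        black_call(-0.002, 0.001, 0.30, 1.0)
    except ValueError as err:
        print("Black:", err)
```

Inverting either formula for the volatility that reproduces a market price gives the corresponding implied volatility, which is why only the absolute (normal) quotes survive when forwards are negative.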
5.1.2 Objectives
I purposely romanticized the situation of negative nominal interest rates
to emphasize the scale of the challenge that faces anyone who sets out to
construct an ESG. It is not easy. For one thing, it involves modeling each of the
major asset classes in the financial markets and then “hanging” them together
within a coherent co-dependency structure. The structure of choice is almost
always the Gaussian copula and this is almost never appropriate within a risk-
management system. It is not by accident that this is referred to as the formula
that killed Wall Street (e.g., Mackenzie and Spears 2012). Its misuse certainly
the Solvency 2 directive instructs all European insurers to work on this basis.
It is worth noting that regimes similar to Solvency 2 are in force elsewhere in
the world. The Swiss Solvency Test (SST) has been granted full equivalence to
Solvency 2 while regulatory regimes in Australia, Bermuda, Brazil, Canada,
Mexico, and the United States have been granted Solvency 2 equivalence for
a period of 10 years under Pillar 1. This is largely driven by necessity for
non-EU domiciled insurers who participate in EU markets.
This chapter focuses on the use of an ESG to generate stochastic scenario
sets for use within the context of the European Union’s Solvency 2 directive
under Pillar 1 (and potentially under Pillar 2). Note that by scenario I mean
the market dynamics that can be simulated from a given multi-variate proba-
bility distribution via coupled systems of stochastic differential or econometric
equations. Monte-Carlo simulation is the vehicle which permits any practical
number of simulations each representing a state of the world. Each scenario
unfolds differently given the same, current market conditions. Except to say
that they may be used within the context of ORSA (Pillar 2), I do not discuss
the careful crafting of specific scenarios which are aligned with stress test-
ing in the Basel II banking regulations. These scenario-set sizes are usually
small and are aimed at testing whether a business can withstand a highly
stylized specific shock such as a worldwide pandemic, financial crisis or ter-
rorist attack. Rather the scenarios which emerge from an ESG are randomly
generated from a well-defined stochastic process and are usually generated on
the order of thousands if not tens or even hundreds of thousands.
The Solvency 2 regulatory regime came into force on January 1, 2016,
following a period of extended industry consultation lasting more than a decade.
It replaces 13 individual regulatory regimes over 27 jurisdictions in Europe
(including the United Kingdom). There are three Pillars of Solvency 2 which
each assess fundamental components of good insurance practices. Pillar 1 is
by far the most prescriptive of the three and requires each insurance com-
pany to report its solvency capital requirement (SCR). This is a quantita-
tive measure of its ability to withstand a severe 1-in-200-year event over
a 1-year horizon: that is, it is the 1-year value-at-risk (VaR) at the 99.5th
percentile.
The local regulator (such as the Prudential Regulation Authority, PRA,
in the United Kingdom) is then at liberty to add a capital buffer if it judges
the SCR to be too low given the specific nature of an insurer's business. An
additional metric is computed: the minimum capital requirement (MCR),
which the PRA sets and which lies between 25% and 45% of the SCR (including
any capital buffer). If the regulatory capital held by an insurer falls below this
MCR then it is deemed to be insolvent and its business is put into admin-
istration. If, however, the regulatory capital held is more than the MCR yet
less than the SCR then the regulator issues a requirement to the insurer to
detail how it will increase its regulatory capital above the level required by
the SCR. This plan must be returned within one month after which the regu-
lator may directly intervene in the management of the insurance company if
it is not satisfied. The $t$-year VaR at level $\alpha$ is defined on the (stochastic) loss
variable $L_t$:
\[
\mathrm{VaR}^{\alpha}_{t} := \inf_{x\in\mathbb{R}} \{\, x : \Pr(L_t \le x) \ge \alpha \,\} \tag{5.1}
\]
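Applied to a finite sample of simulated losses, Equation 5.1 reduces to picking an order statistic. The sketch below is one simple way to do this; the function name, the tolerance guarding against floating-point rounding, and the standard normal toy loss sample are assumptions made for illustration only.

```python
import math
import random

def empirical_var(losses, alpha):
    """Smallest sample value x whose empirical Pr(L <= x) is at least alpha (Equation 5.1)."""
    xs = sorted(losses)
    # number of observations that must lie at or below x; the small tolerance
    # protects against alpha * n landing just above an integer through rounding
    k = math.ceil(alpha * len(xs) - 1e-12)
    return xs[max(k, 1) - 1]

if __name__ == "__main__":
    rng = random.Random(0)
    # toy 1-year loss sample (standard normal); a real SCR calculation would use
    # losses produced by the ALM system on real-world scenarios
    sample = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    print(empirical_var(sample, 0.995))   # close to the normal 99.5% quantile
```

Because the 99.5th percentile sits far in the tail, only about 0.5% of the sample lies beyond the estimate, which is one reason the scenario counts discussed later in this section become so large.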
It is worth noting the choice and the stability of the metric that is being
used to define the SCR: the 1-year VaR at 99.5%. The rationale for the 1-year
VaR measure is that it predicts the amount of extra capital that would have
to be raised now and invested at the risk-free rate so that in one year’s time
an insurer would be solvent (McNeil et al., 2015). This argument can be used
to show that the SCR is a quantile of the loss distribution. However, McNeil
et al. (2015) detail the properties a good risk measure should have and go
on to show that the VaR risk measure does not satisfy all of these. The
four properties of a coherent risk measure ρ̃(L) as a function of the loss L
(considered a random variable over some linear space M of losses) are:
which is arguably a better choice than the simple VaR measure as it aver-
ages each VaR value above the α-quantile of the loss distribution. In any
event, a risk measure is simply a gross summary statistic from an insurer’s
multivariate risk-driver distribution. Given the large dimensionality directly
implicated in this multivariate density (see Sections 5.2 and 5.3), taking such
a gross summary is a severe marginalization of a high-dimensional object.
Since some aspects of the density will be better estimated than others and
some will be based on very little information at all (if only by expert judge-
ment, e.g., losses incurred by operational risks have little available data), then
the sensitivity of the VaR measure to small perturbations ought to be given
scrutiny. A small change in assumptions leading to a slightly different model
parametrization, for example, has the potential to lead to an unpredictable
change in an insurer’s capital requirements, be it for better or for worse.
Compared to Pillar 1, Pillar 2 is a broad instruction to insurers to make
their own risk and solvency assessment (ORSA). The wording is deliberately
vague and is intended to instill a holistic risk management ethos. Insurers
must demonstrate that they have considered, understood, and quantified all
possible risks their businesses face over a long time horizon (beyond the 1-year
horizon of Pillar 1). It is also a chance for an insurer to use its own in-house
modeling approach that accurately represents the risks on its balance sheet.
One approach is to use an ESG in a multi-period projection of market and
non-market risks. This can be rather onerous on the necessary scenario budget,
however. Given the vague wording of Pillar 2, an insurer may opt for a small
number of stylized scenarios. These are like the Basel II stressed scenarios used
in banking regulation. Typically, they test the robustness of an institution to
specific outcomes in the future such as a second financial crisis. In any case,
insurers must demonstrate that they have implemented an appropriate risk
management strategy to mitigate their risks. Ultimately, the regulator reviews
their approach and can choose whether to accept or reject it. In the latter case,
an insurer would be required to update their risk management procedures to
a new acceptable level and resubmit for approval: a task which is certainly
costly.
Pillar 3 focuses on disclosure and transparency, and sets out regulatory
reporting standards. Under Pillar 3, insurers are required to submit two
reports annually: a Solvency and Financial Condition Report and a Regular
Supervisory Report; the former is made public and the latter is private.
Since scenario generation is primarily concerned with Pillar 1 (and increas-
ingly with Pillar 2) I refer the interested reader to the website of the Bank of
England5 for more information on Pillar 3 and continue with my discussion
of Pillar 1.
Computation of the SCR may follow one of two paths: either insurance
companies use a prescribed standard formula in their calculations or they
may define and use their own internal model. Being formulaic, the standard
formula model is easier to implement but has the downside of being relatively
insensitive to the specifics of an insurer’s business. For example, if a book
of business contains mainly plain vanilla products, such as standard
annuities or self-invested personal pension schemes, then the standard formula
may be a sufficient and cost-effective way of evaluating the SCR. On the other
hand, if the book of business contains defined benefit schemes or guaranteed
products with derivative styled payoffs, it will certainly be advantageous to
incur the often substantial cost of building one’s own internal model that will
give a more accurate (and hopefully less punitive) SCR.
The regulatory supervisor recognizes the use of stochastic models of mar-
ket dynamics and goes on further to say that these should be supplied by an
ESG. Market dynamics may be simulated under one of the two paradigms:
the real world P-measure or the risk-neutral/market consistent Q-measure.
Valuation of derivative styled optionality embedded in the insurer’s balance
sheet is then computed under the Q-measure, which (typically) guarantees
the absence of arbitrage.6 I say typically since it is not always possible to find
such a Q-measure. For example, when pricing long-dated bonds (or anything
derived from them such as long-dated interest rate hedges), the market may
be quite illiquid with only a few players on the buy and sell sides. Reliable
prices may not exist and so the arbitrage-free Q measure may be unreliable.
This represents a significant challenge in economic scenario generation and is
the subject of current research (Salahnejhad and Pelsser 2015). Forecasting
an insurer’s assets and liabilities over, for example, a 1-year horizon to estab-
lish a loss distribution must be done under the real-world P-measure. This
is the best forward-looking measure given the state of the world today and
where economic expectations may reasonably be expected to fall over time.
Under specific technical conditions (see, e.g., Baxter and Rennie 1996), the
real-world P- and risk-neutral/market-consistent Q-measures may be related
via Girsanov's theorem and the Radon–Nikodym derivative. It may suffice simply
to postulate the existence of a market price of risk as compensation for
bearing the risk in a world that is not arbitrage-free. This introduces a drift
correction to the growth rates to compensate. Having established the need for
two measures one must choose the most appropriate stochastic models accord-
ingly and calibrate their parameters. This represents a challenging problem in
scenario generation: parameter calibration; a challenge I will return to discuss
in Section 5.2. Typically, one calibrates models under the risk-neutral/market-
consistent Q-measure to a snapshot of the current market. Models under the
real-world P-measure are often fitted to historical data. The obvious diffi-
culty is then in reconciling the financial theory of measure changes between
the risk-neutral and the real world via a drift correction with the statistical
goodness-of-fit in models under each measure. If the fit is good under one
measure yet poor under the other one may opt for different models under
each measure which aren’t related by a measure change. There is no easy way
to reconcile this dichotomy but in practice one needs to be pragmatic.
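The drift correction can be illustrated with a single equity index following geometric Brownian motion, which is an assumption made here purely for simplicity (as are the function names and parameter values). Under the real-world measure the index grows at a rate mu; subtracting the market price of risk times the volatility from the drift recovers the risk-free rate r used for pricing.

```python
import math
import random

def gbm_terminal(s0, drift, sigma, horizon, z):
    """Exact terminal value of geometric Brownian motion for a standard normal draw z."""
    return s0 * math.exp((drift - 0.5 * sigma**2) * horizon
                         + sigma * math.sqrt(horizon) * z)

def compare_measures(n=100_000, seed=42, s0=100.0, mu=0.07, r=0.02,
                     sigma=0.2, horizon=1.0):
    """Return (real-world mean of S_T, discounted risk-neutral mean of S_T)."""
    rng = random.Random(seed)
    lam = (mu - r) / sigma                 # market price of risk
    mean_p = 0.0
    mean_q = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # forecasting: growth rate mu
        mean_p += gbm_terminal(s0, mu, sigma, horizon, z) / n
        # pricing: growth rate mu - lam * sigma, which equals r
        mean_q += gbm_terminal(s0, mu - lam * sigma, sigma, horizon, z) / n
    return mean_p, math.exp(-r * horizon) * mean_q

if __name__ == "__main__":
    real_world_mean, discounted_rn_mean = compare_measures()
    print(real_world_mean, discounted_rn_mean)
```

The first number should sit near s0 * exp(mu * horizon) and the second near s0: the discounted price is a martingale only under the pricing measure, which is why forecasts and valuations cannot share one set of drifts.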
ALM systems model, at the most granular level, the behavior of the pol-
icyholder schemes and the assets backing them. An essential ingredient of
each ALM system is its market model which is supplied by an ESG. Depen-
dent on the precise flavor of the insurance fund under consideration, these are
invariably sensitive to equity, foreign exchange and credit risk but are always
6 For an introduction to the idea of risk-neutral pricing, see Hull (2005) or Baxter and
Rennie (1996).
sensitive to interest rate and inflation risk. When an ALM is coupled to sce-
narios emerging from an ESG, it becomes a Monte-Carlo simulation engine
that is either used for pricing or forecasting. When pricing, any liability with
derivative styled payoffs may be valued consistently to the market by approx-
imating the expectation of the discounted payoff under the Q-measure, and
by taking the sample mean of Monte-Carlo-simulated market scenarios. It is
imperative that the simulations are run under the risk-neutral measure for
this approach to be appropriate. Indeed, market-consistent valuation occurs
on both sides of the balance sheet: assets as well as liabilities. The real-world
P measure is required under Pillar 1 of the Solvency 2 SCR calculation when
projecting assets and liabilities onto the 1-year horizon. This develops the loss
distribution from which the SCR’s 99.5% 1-year VaR may be taken. How-
ever, the loss distribution will only be completely defined if the variables
in the balance sheet appear at the 1-year horizon. This is not the case for
derivatives such as equity options and interest rate swaptions. Importantly,
if an insurer’s business incorporates policyholder guarantees then these, hav-
ing derivative styled characteristics, will not yet be valued either. For exam-
ple, a minimum money-back guarantee in a with-profits fund hides latent
value or moneyness at any point in time if the fund has policies in-force. One
must make a further projection of the fund but this time under the market-
consistent Q-measure and discount/deflate this value back to the 1-year hori-
zon. This allows a value to be established that considers the time value of
guarantees (TVOG) or the market-consistent embedded value (MCEV) of the
guarantees.
The need for a second projection under the market risk-neutral measure
beyond the 1-year horizon leads to what is known as the nested stochastic
problem and is a significant challenge in economic scenario generation. For
example, if an insurer chooses to build the loss distribution using a thousand
real-world forecasts at 1 year then, for each of these, a similar number (say
another one thousand7 ) of market-consistent scenarios are needed to capture
any latent value inherent in the derivative styled behavior of an insurer’s assets
and liabilities. In some instances, for derived securities such as equity options,
a Black–Scholes formula could theoretically replace the need for the second
(inner) set of market risk-neutral simulations. However, such formulas are not
available for more complicated equity models, or for certain interest rate and
credit models. Whenever derivative securities are on an insurer’s balance sheet
then they will need to be valued and one simple solution is to use the Monte-
Carlo simulation to a given time horizon, discount and take a sample mean. For
liabilities, there are certainly no closed-form solutions or numerical methods
for assessing their embedded value, and the Monte-Carlo simulation is the only
way they may be estimated.8 An open question is whether such estimators are
7 Even here, this number may not be enough and is highly problem specific.
8 With the possible exception of using a replicating portfolio of matching assets. However,
getting such a portfolio to match well enough is a somewhat intractable problem.
biased or if they have an unacceptably high variance. One may, for example,
settle for an estimator with a much lower variance if its bias is controllably
small somehow: for example, by using a weighted estimator where the weights
are chosen optimally by minimizing some out-of-sample metric. Opting for
a better estimator may reduce the number of scenarios required which, in
the naive case of the arithmetic sample mean estimator, is of the order of
a million. Such large numbers of scenarios produce a serious performance
bottleneck: ALM systems are currently not fast enough to process
this number of scenarios in a reasonable amount of time. Indeed, typically
scenarios from an ESG are written out to a csv (or equivalent) file and then
read into an ALM system. The I/O burden of processing such a large scenario
set file using, say, a monthly time step over multiple risk drivers in multiple
economies is, at least currently, prohibitive.
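The structure of the nested stochastic problem can be sketched in a few lines. The toy setting below is an assumption chosen so that a closed form exists for comparison: a single equity fund following geometric Brownian motion and a money-back guarantee that behaves like a European put. All names and parameter values are hypothetical. Each outer (real-world) scenario at the 1-year horizon spawns its own inner (risk-neutral) simulation, so the scenario budget is the product of the two counts.

```python
import math
import random

def bs_put(s, k, r, sigma, tau):
    """Black-Scholes put price, used here only as a benchmark for the inner simulation."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return k * math.exp(-r * tau) * cdf(-d2) - s * cdf(-d1)

def inner_value(s1, k, r, sigma, tau, n_inner, rng):
    """Market-consistent value at the 1-year horizon: discounted sample mean under Q."""
    total = 0.0
    for _ in range(n_inner):
        z = rng.gauss(0.0, 1.0)
        s_end = s1 * math.exp((r - 0.5 * sigma**2) * tau + sigma * math.sqrt(tau) * z)
        total += max(k - s_end, 0.0)
    return math.exp(-r * tau) * total / n_inner

def nested_guarantee_values(n_outer, n_inner, seed=0, s0=100.0, k=100.0,
                            mu=0.06, r=0.02, sigma=0.2, maturity=10.0):
    """Outer loop under P to year 1, inner loop under Q from year 1 to maturity."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_outer):
        z = rng.gauss(0.0, 1.0)
        s1 = s0 * math.exp(mu - 0.5 * sigma**2 + sigma * z)    # real-world 1-year state
        results.append((s1, inner_value(s1, k, r, sigma, maturity - 1.0, n_inner, rng)))
    return results

if __name__ == "__main__":
    for s1, value in nested_guarantee_values(n_outer=5, n_inner=4000):
        print(round(s1, 2), round(value, 2), round(bs_put(s1, 100.0, 0.02, 0.2, 9.0), 2))
```

With one thousand outer and one thousand inner scenarios the inner function would be evaluated a million times, and in practice each evaluation is a full ALM run rather than a one-line payoff.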
Mercifully, some solutions to the nested stochastic bottleneck exist. They
include the use of replicating portfolios, curve-fitting and least-squares Monte
Carlo (see Cathcart 2012). The latter technique approximates the asset and/or
liability values with regression functions that have been fitted to noisy ver-
sions of the nested stochastic output. To be more precise, instead of running
one thousand market-consistent simulations for every real-world scenario, a
very small number of market-consistent simulations is run per real-world sce-
nario. This brings the overall scenario budget down from a million to only a
few thousand: something which is much more manageable for an ALM sys-
tem. The challenge here is now to produce a few thousand market-consistent
calibrations for each of the few thousand states of the world emerging at the
1-year horizon: a non-trivial exercise if this is to be done in any reasonable
time. Functional approximations are typically made to produce calibrations
quickly rather than force them through time-consuming numerical optimiza-
tion algorithms. One should exercise caution as, yet again, the parameter
calibration problem appears and care must be taken to ensure it is realis-
tic (see Section 5.2). This scenario set is run through the ALM system and
a regression function (such as a multiple polynomial regression function, or
other, in the underlying variables) is fitted to the (discounted) desired asset
or liability payoffs. The idea behind the least-squares Monte-Carlo technique
is that the expectation of the discounted payoff may be approximated by the
linear predictor coming from the polynomial regression function. If one can
show that the conditions of the Gauss–Markov theorem for best linear unbiased
estimators (e.g., Zyskind and Martin 1969; DeGroot and Schervish 2013)
are satisfied, then this approximation is asymptotically rigorous. It is clear that
there will be assets and/or liabilities where this is not the case or where one
is not in the asymptotic limit and so one must use these estimators after they
have been validated for bias and variance. How easy or otherwise this is to
validate is problem specific but it seems that in many cases the simple linear
predictor from an ordinary least-squares regression is adequate. In problem
cases, for example, where the data display heteroskedasticity, simultaneously
modeling the mean and the variance with regression functions can help (see
the generalized additive models for location, shape, and scale9 ). For Pillar 2
ORSA, regression functions could be generated at each time period of interest.
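A minimal sketch of the least-squares Monte-Carlo idea follows, in a deliberately simple toy setting that is assumed here and not taken from the chapter: a single equity risk driver following geometric Brownian motion, a put-style guarantee, two inner paths per outer scenario, and a cubic polynomial in the 1-year log-return fitted by ordinary least squares. All names are hypothetical. The noisy inner estimates are individually poor, but the regression averages across outer scenarios.

```python
import math
import random

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Ordinary least squares through the normal equations (X'X) beta = X'y."""
    rows = [[x**k for k in range(degree + 1)] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)]
           for i in range(degree + 1)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(degree + 1)]
    return solve(xtx, xty)

def lsmc_proxy(n_outer=4000, n_inner=2, degree=3, seed=1, s0=100.0, k=100.0,
               mu=0.06, r=0.02, sigma=0.2, maturity=10.0):
    """Return a function s1 -> fitted guarantee value at the 1-year horizon."""
    rng = random.Random(seed)
    tau = maturity - 1.0
    xs, ys = [], []
    for _ in range(n_outer):
        s1 = s0 * math.exp(mu - 0.5 * sigma**2 + sigma * rng.gauss(0.0, 1.0))
        noisy = 0.0
        for _ in range(n_inner):            # deliberately tiny inner sample
            s_end = s1 * math.exp((r - 0.5 * sigma**2) * tau
                                  + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0))
            noisy += math.exp(-r * tau) * max(k - s_end, 0.0) / n_inner
        xs.append(math.log(s1 / s0))        # regressor: 1-year log-return
        ys.append(noisy)
    beta = fit_polynomial(xs, ys, degree)
    return lambda s1: sum(b * math.log(s1 / s0) ** j for j, b in enumerate(beta))

if __name__ == "__main__":
    proxy = lsmc_proxy()
    print([round(proxy(s), 2) for s in (80.0, 100.0, 120.0)])
```

In this toy case the fitted values can be compared with the Black–Scholes put price, which is roughly 14.3 at a fund value of 100; for real liabilities no such benchmark exists, which is why the bias and variance validation described above matters.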
Finally, truly high-performance solutions, such as those offered by Willis
Towers Watson's MoSes HPC or Aon Benfield's PathWise(TM) (Phillips
2016), have only very recently become available. It may take some time before
these solutions can be fully absorbed and accepted by the insurance industry
and so, for the moment at least, approximations are de rigueur.
5.1.4 Layout
This chapter is therefore laid out as follows. In Section 5.2, I discuss the
make-up of a typical ESG including the major asset classes that it models.
At a high level, I will illustrate some of the challenges that face the model-
ing exercise such as parameter calibration and discuss some high-performance
computing techniques that can be deployed to accelerate scenario set gener-
ation.10 In Section 5.3, I will discuss copula co-dependency structures intro-
duced by Sklar (1959) and go on to introduce the risk scenario generator
(RSG). This is a natural extension of an ESG to allow for non-market risks
such as policyholder lapse and operational risk. In Section 5.4, I will illus-
trate in some detail two problems that are typical of the degree of complexity
faced by ESGs. There I give a new method to represent the marginal distri-
bution functions of composite variables. Given this representation, one can
simulate correlated instances of these variables through the copula-marginal
factorization. I conclude with a discussion in Section 5.5.
not model a market observable quantity) and more limited in its ability to
describe market features such as implied volatility skew. It also predicts nega-
tive rates which, until recently, had been an undesirable property for nominal
interest rates. The Vasicek model does have far fewer parameters and as such
is a much simpler model than a Libor market model with stochastic volatility.
If after optimizing over the mean-squared error each model produces similar
qualities of fit for similar optimal objective values and diagnostic plots, or if
the simpler model provides a better fit then this calls into question the use
of the more complicated model. It is possible that the data simply doesn’t
support the more detailed model even if that is the true generation process. A
simple analogy might be when one tries to regress noisy univariate response
data Y on a predictor X. The true process may be quadratic in X, or perhaps
even cubic or quartic in X; but if the data are insufficient to pin down the
model’s parameters, only noise will be represented by the more complicated
model.
These risks serve to illustrate the complexity and challenges involved in
creating an effective scenario generator. The Moody’s Analytics ESG won
the 2015 [Link] award for the best ESG based on a holistic approach ser-
vicing the insurance sector. Its RiskIntegrity suite included, in addition to
the base ESG, an Automation Module permitting an automatic calibration
and scenario set generation ability to any number of economies and models.
This alleviates a significant burden from insurance companies. In addition,
the Enterprise-level Proxy Generator allows clients to directly address the
nested-stochastic problem inherent in the SCR calculation under Pillar 1 of
Solvency 2.
Before I move on to motivate the basic ESG engine, its asset classes and
challenges, I refer the interested actuarial reader to Varnell (2011) who dis-
cusses scenario generators in the context of Solvency 2 (and to the references
contained therein).
$$\frac{dS_t}{S_t} = \mu(S_t, t)\,dt + \sigma(S_t, t)\,dW_t, \qquad S_0 \text{ given} \tag{5.3}$$
Here μ is the drift which may depend on the price St and time t, and dt is
the deterministic differential in t. The second term on the right-hand side is
the purely random term. Like the drift μ, the volatility σ may depend on the
price St as well as on time t. The random Brownian component Wt is
written in terms of its stochastic differential dWt . Mathematically, Equation
5.3 is shorthand for a given integral representation that I omit here (but see,
e.g., Baxter and Rennie 1996). The properties of Wt are:
ESG designer is obliged to use some random number generator and several of
these exist: for example, Wichmann and Hill (1982, 1984) and the Mersenne
Twister (Matsumoto and Nishimura 1998). These are industry standard meth-
ods for random number generation and they work reasonably well. However,
one should always be cautious of random number generation: if an insuffi-
ciently elegant generator is used, the numbers emerging can eventually show
deterministic patterns.
Many introductory textbooks on financial engineering (e.g., Hull 2005;
Baxter and Rennie 1996) show, by means of the Girsanov theorem, how a
simple change-of-measure may relate the Brownian shocks dWtP under the
P-forecasting measure to those under the pricing measure Q:
where λ is necessarily (μ−r)/σ. This highlights the duality that exists between
the P and Q measures and illustrates the two paradigms under which an ESG
must be able to simulate, here in a simple setting. One observes the stock-
price process Equation 5.6 and is able, at least in principle, to find an estimate
for μ. For pricing purposes, however, one must use the growth rate r from the
numeraire asset and simulate using Equation 5.7. If the price today of a 3-
month equity put option is sought, then the numeraire asset is the 3-month
nominal interest rate government bond. I say it is possible to estimate μ “in
principle” because this can be a challenging exercise in robust statistical esti-
mation. One must appeal to data to find a suitable historical period and then
use the method of moments, maximum likelihood or use a Bayesian approach
to estimate μ. The question of precisely which historical period becomes pri-
mordial: too short a period and sampling error will dominate an estimate, too
long and the lack of stationarity may bias an estimate. A more robust approach
might be to make an economic assumption about the long-term trend in the
growth rates that can take a holistic approach in view of the entire market.
One may then observe r and compute a risk premium μ∗ = μ − r = λσ. A cal-
ibration approach in the real world is therefore to produce a consistent set of
long-term calibration targets in terms of assets’ risk premia. For more detailed
assets, such as real or nominal interest rates, or credit, the calculation of risk-
premia becomes more challenging as a term structure of targets emerges: one
is liable to make functional assumptions about risk-premia. When making
assumptions on risk premia, one must assess their sensitivity to economic
assumptions. This is a major challenge in economic scenario generation made
more difficult owing to the amount of analyst intervention that is needed to
estimate the risk-premia. Unlike a market-consistent calibration where market
data are clearly available, the situation in real-world modeling is
much more fluid and open to interpretation.
The requirement of assets’ growth to be fixed at the numeraire rate in the
market-consistent setting affords us a particularly useful yet simple valida-
tion technique that is essential for any well-functioning ESG. Specifically, one
may use a simple charting functionality to demonstrate how well a market-
consistent ESG is functioning. As another example, consider a 10-into-20-year
receiver swaption struck at par. This is a derivative contract giving the holder
the right but not the obligation to enter into a 10-into-20-year swap contract in
10 years’ time. If market swap rates fall below the current par yield 10 years
hence, then the holder may exercise his or her right to enter into the more
valuable swap contract at the option maturity date. Otherwise, the swaption
matures worthless, as a more valuable fixed rate can be received in the market at maturity.
The reference asset for this swaption is a 20-year fixed term deferred annuity
whose first payment is in 10 years’ time. Arbitrage theory coupled with the
Girsanov theorem tells us that unless the growth rates of the swaption and the
annuity are the same then a risk-free profit may be locked in by constructing
a portfolio which is short one of the assets and long the other in a way that is
known in advance. One way to validate the output from a risk-neutral ESG
is to check that such derivative securities are martingales. Recall that for a
process Mt to be a martingale with respect to a numeraire Bt :
$$E_Q\!\left[\left.\frac{M_t}{B_t}\,\right|\,\mathcal{F}_s\right] = \frac{M_s}{B_s}, \qquad 0 \le s \le t; \tag{5.8}$$
in the filtered probability space (Ω, F, {Ft }t≥0 , Q). So, one may reasonably
compute the value of the 10-into-20-year swaption Mt over simulated time and
plot the ratio of it with the deferred annuity Bt . If the resulting time-series
plot shows an absence of a trend (i.e., an absence of growth) and wanders
randomly around a value of 1 then one may have confidence that the pricing
measure is sound. This may be repeated for the term structure of swaptions
struck at par (or otherwise), with error bars overlaid, for a compact validation plot.
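The martingale check described above can be sketched in a few lines. The example below is a simplification and not the swaption test itself: it assumes a lognormal equity simulated under the pricing measure with a cash-account numeraire B_t = exp(rt), and all parameter values and function names are hypothetical. It averages the deflated price ratio across paths at each annual step; the averages should wander around 1 with no trend.

```python
import math
import random

def discounted_ratio_means(s0=100.0, r=0.03, sigma=0.2, horizon=10,
                           n_paths=20000, seed=1):
    """Mean of S_t / (S_0 * B_t) at each annual step under the pricing measure."""
    rng = random.Random(seed)
    sums = [0.0] * (horizon + 1)
    for _ in range(n_paths):
        log_s = math.log(s0)
        sums[0] += 1.0
        for t in range(1, horizon + 1):
            # Exact one-year lognormal step with risk-neutral drift r
            log_s += (r - 0.5 * sigma ** 2) + sigma * rng.gauss(0.0, 1.0)
            # Deflate by the cash account B_t = exp(r * t) and by S_0
            sums[t] += math.exp(log_s - r * t) / s0
    return [total / n_paths for total in sums]

ratios = discounted_ratio_means()
for t, mean_ratio in enumerate(ratios):
    print(f"t={t:2d}  mean deflated ratio={mean_ratio:.4f}")
```

In a production ESG the same loop would be applied to the swaption value deflated by the deferred annuity, with error bars taken from the cross-path standard error.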
20 Note that it does expose them to policyholder lapse and surrender risk should they need
$$P(T; c) = \sum_{i=1}^{n} (\delta_{in} + c)\exp\left(-\int_0^{t_i} f(t)\,dt\right), \qquad n = \omega T. \tag{5.9}$$
The forward rates are represented by a latent cubic spline modeling the
forward rates curve at the K + 1 a priori specified knot points tk for
k = 0, 1, . . . , K (tK = TM , the longest bond maturity) with a Nelson and
Siegel (1987) extrapolation in tK < t ≤ tmax (typically tmax = 120 years):
as
$$f(t\,|\,\theta_{1:K}) = \sum_{k=1}^{K} I(t_{k-1} \le t < t_k)\, s(t\,|\,\theta_k) + I(t_K \le t \le t_{\max})\, f_{ns}(t\,|\,\beta); \tag{5.11}$$
$$P(T\,|\,\Theta) = \exp(-\tau_T^\top \Theta) \tag{5.12}$$
where $\tau_T \in \mathbb{R}^{4K+3}$ contains the parameter-free knot point detail from the
cubic spline and $\Theta = \mathrm{vec}(\theta_{1:K}, \beta)^\top \in \mathbb{R}^{4K+3}$ contains the parameters from
each spline component and the extrapolant. If the modeled forward rates given
in Equation 5.11 evaluated at the knot points $t_{0:K}$ are $f = (f_0, f_1, \dots, f_K)^\top \in \mathbb{R}^{K+1}$,
then the relationship between the spline’s parameter vector Θ and the
21 The forward rate at the longest modeled maturity tmax .
and it is noted that the swap rate is a function of the forward rates at the
required knot points: S[Tj |Θ(f)]. The objective Equation 5.14 may be updated
by adding a term:
$$w_0 \sum_{j=1}^{M} \left(S_j - \hat{S}[T_j\,|\,\Theta(f)]\right)^2 \tag{5.16}$$
The challenge is then to find functions A and B consistent with the dynamics
for rt :
drt = μ(t, rt )dt + σ(t, rt )dWt
If one seeks affine transforms of the short rate for the drift and (squared)
volatility, then a coupled system of ordinary differential equations emerges
(maturity T is considered a parameter):
$$\frac{dA}{dt} - \beta(t)B + \frac{1}{2}\delta(t)B^2 = 0, \qquad A(T, T) = 0$$
$$\frac{dB}{dt} + \alpha(t)B - \frac{1}{2}\gamma(t)B^2 = -1, \qquad B(T, T) = 0$$
For the choice μ = b − ar with σ constant, so that α = −a, β = b, γ = 0 and
δ = σ², the Vasicek dynamic model emerges:
The conditional distribution for the short rate given any time horizon is normal
and the benefit of such simple dynamics is their analytical tractability: a
model such as the Vasicek model is easily implemented in an ESG. However, a
detraction of this model is precisely that its conditional distribution is normal:
this implies negative short rates and, until recently, nominal rates have not
been negative. The Vasicek model still has its place as a real interest rate
model where rates, measuring the purchasing power of a basket of goods, can
realistically be negative. Still, a single-factor model may not produce enough
dispersion; for this one may reasonably take a two-factor Vasicek model,
introducing a second SDE to model the long-term mean of the short rate.
Typically, one may work with
While this leads to a more flexible affine term-structure model, it does not
yet allow for the initial yield curve to be modeled. Rather, at time zero some
Vasicek implied yield curve emerges over which there is no control. To obviate
this challenge, one moves to the more flexible framework of the Hull and White
model (see Hull and White 1990; Hull 2005). The two-factor Hull and White
model is
where, crucially, introduction of the function φ(t) allows the initial yield curve
to be matched exactly. If it is chosen to be the Vasicek implied initial yield
curve, then φ(t) ≡ μ and the Vasicek and Hull and White models may then
be related by the simple change of state variables:
$$r_t = \mu + x_t^1 + x_t^2$$
$$m_t = \mu - \frac{\alpha_2 - \alpha_1}{\alpha_1}\, x_t^2$$
guaranteed to be positive since the solution of the CIR SDE is a scaled non-
central χ2 -variable. I defer discussion of the CIR process until the next section
on credit models. The final short rate model I will describe is the two-factor
Black–Karasinski model. Its dynamics are
bond prices is
$$F_{k,t} = \frac{1}{T_{k+1} - T_k}\left(\frac{P(t, T_k)}{P(t, T_{k+1})} - 1\right)$$
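As a small illustration of the forward-rate definition in terms of bond prices, the sketch below computes simply compounded forward rates between consecutive maturities. The maturity grid and bond prices are hypothetical values chosen only for the example, and the function name is my own.

```python
def simple_forward_rates(maturities, bond_prices):
    """Simply compounded forward rates F_k between consecutive maturities."""
    forwards = []
    for k in range(len(maturities) - 1):
        accrual = maturities[k + 1] - maturities[k]      # T_{k+1} - T_k
        ratio = bond_prices[k] / bond_prices[k + 1]      # P(t, T_k) / P(t, T_{k+1})
        forwards.append((ratio - 1.0) / accrual)
    return forwards

# Hypothetical zero-coupon bond prices P(t, T_k) on an annual grid
maturities = [1.0, 2.0, 3.0, 4.0]
bond_prices = [0.98, 0.95, 0.92, 0.90]
forwards = simple_forward_rates(maturities, bond_prices)
print([round(f, 5) for f in forwards])
```

A forward rate computed this way is positive whenever the bond price curve is decreasing in maturity, which is the situation the log-normal model presupposes.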
A first criticism of the basic BGM model is in its insensitivity to skew in
the market implied volatility data: for example, European-style payer or
receiver swaptions show variations in their implied volatility relative to those
struck at par. Market implied volatility data depend on three parameters:
the maturity of the swaption, the tenor of the underlying swap contract and
the strike relative to the par yield curve at contract inception. An insurer
whose liabilities are exposed to a fall in interest rates may purchase a receiver
swaptions portfolio to hedge this risk away. The insurer may like to strike
the swaptions in the hedge at the guaranteed rate of interest it has promised
to its policyholders. This is unlikely to be at the current par yield and so a
correct price is sought away-from-the-money. The BGM model is insensitive
to this strike; put another way, the BGM implied volatility (hyper-) surface
is constant across strike. This problem may be alleviated by the introduction
of a stochastic volatility process Vt modeled by, perhaps, a CIR process. The
downside here is that the analytical formula for the European swaption is
lost and one must appeal to semi-analytical formulae such as those found by
Wu and Zhang (2006).
A second criticism of the BGM model is in its fundamental assumption of
log-normality. This precludes rates ever becoming negative but as illustrated in
the introduction nominal rates can become negative. Market-implied volatili-
ties in the log-normal BGM model require the initial yield curve to be every-
where positive, however. Moreover, whenever the yield-curve approaches zero
it becomes computationally onerous to compute log-normal implied volatili-
ties from swaption price data: they become unstable. When the yield curve is
negative it is no longer possible to compute swaption prices using the stan-
dard Black’s formula. Log-normal implied volatilities cease to exist and the
consequences for parameter calibration are obvious. A solution is to use dis-
placed forward rates: that is, a constant term is added to the forward rates.
The problem then is how to set this displacement term reliably. Furthermore,
the BGM model is quite likely to achieve exponentially large forward rates in
finite time. While this is not an issue for market-consistent pricing per se, it
does mean that scenario sets created by an ESG will contain values that are
too large to be represented in floating-point arithmetic: there will be NaNs.
This is unacceptable for an insurer’s ALM system. A solution is to switch from
the log-normal BGM model to a normal equivalent model with a stochastic
CIR process Vt :
$$dF_{k,t} = \sqrt{V_t}\,\sigma^\top dW_t$$
One may calibrate this model using absolute implied volatility data that can
be obtained from market data providers such as Markit and SuperDerivatives.
Exponential blow-up is largely mitigated and rates may become negative,
which nowadays is a very real feature of the market.
model for the purposes of solving the issue of negative nominal interest rates.
For its part, the geometric model Equation 5.18 is lognormally distributed and
non-negative. Non-negativity is certainly sensible for an index or stock price,
but is log-normality always sensible? In the long term, one might envisage a
different unconditional distribution with more or less skew and kurtosis.
For market-consistent pricing of equity derivatives, the lognormal equity
asset model is a practical solution. Its main detraction, however, is that its
price predictions are at odds with the markets. Striking in- and out-of-the
money produces different prices not predicted by the lognormal model. The
volatility surface (Gatheral 2006) is sensitive to strike as well as option matu-
rity. Alternatives to the lognormal model are Merton’s jump diffusion model
(Merton 1976):
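$$\frac{dS_t}{S_t} = -\lambda\bar{\mu}\,dt + (\eta_t - 1)\,dN_t \tag{5.19}$$
$$\eta_t \sim \log\text{-}N(\bar{\mu}, \sigma^2)$$
and $dN_t \sim \lim_{\delta t \to 0^+} \mathrm{Po}(\lambda\,\delta t)$; and Heston’s stochastic volatility model (Heston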
1993):
$$\frac{dS_t}{S_t} = \mu\,dt + \sqrt{v_t}\,dW_t^{(1)} \tag{5.20}$$
$$dv_t = \alpha(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}$$
Jump-diffusions perform well when pricing derivatives of short maturities and
can produce volatility skew there, but at longer maturities they do not perform
so well. Stochastic volatility models fit longer maturities well but suffer at short
maturities. The combined stochastic volatility jump diffusion model of Bates (1996)
works well:
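$$\frac{dS_t}{S_t} = (\mu - \lambda\bar{\mu})\,dt + \sqrt{v_t}\,dW_t^{(1)} + (\eta_t - 1)\,dN_t \tag{5.21}$$
$$dv_t = \alpha(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}$$
$$\eta_t \sim \log\text{-}N(\bar{\mu}, \sigma^2)$$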
The drift terms above are each carefully set to ensure the zero drift of any
derived quantities of interest under the appropriate numeraire measure. The
remaining difficulty with the Bates model is in a relative inability to corre-
late Bates processes with other processes within an ESG. Although one can
correlate the Brownian shock by means of a Cholesky factorization of the cor-
relation matrix, it is less than evident how to correlate the combined equity
asset process which involves, in addition to the Brownian shock, the effect of
stochastic volatility and a compound Poisson process. I address how to do this
in Section 5.4.
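The part that is straightforward, correlating the Brownian shocks by a Cholesky factor, can be sketched for the stochastic volatility component alone. The example below is a minimal sketch and not the Bates model: jumps are omitted, the Euler discretization with "full truncation" of the variance is my own choice of scheme, and all parameter values and function names are hypothetical.

```python
import math
import random

def correlated_shocks(rho, rng):
    """Two standard normal shocks with correlation rho (2x2 Cholesky factor)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # The Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, sqrt(1 - rho^2)]]
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2

def simulate_heston_path(s0, v0, mu, alpha, theta, xi, rho, dt, n_steps, rng):
    """Euler scheme with full truncation; returns the terminal (S, v)."""
    log_s, v = math.log(s0), v0
    for _ in range(n_steps):
        w1, w2 = correlated_shocks(rho, rng)
        v_pos = max(v, 0.0)  # truncation keeps the square root real
        log_s += (mu - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * w1
        v += alpha * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * w2
    return math.exp(log_s), v

rng = random.Random(42)
s_T, v_T = simulate_heston_path(s0=100.0, v0=0.04, mu=0.05, alpha=1.5,
                                theta=0.04, xi=0.3, rho=-0.7,
                                dt=1.0 / 252, n_steps=252, rng=rng)
print(f"terminal price {s_T:.2f}, terminal variance {v_T:.4f}")
```

The sketch makes the difficulty visible: the correlation parameter acts only on the pair of Gaussian shocks, and says nothing about how the resulting price process co-moves with other risk drivers once stochastic volatility and jumps are layered on top.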
Other popular models that involve stochastic volatility are those of Duffie
et al. (2000), Levy processes (e.g., Carr et al. 2002) and the SABR mixed local
volatility and stochastic volatility models by Hagan et al. (2002).
Luckily there are some remedies to this problem. First, start the optimization
scheme repeatedly from as many different points in the parameter space as is
practical and record the ultimate destinations reached. Examine the index plots of the
optimal metric and if a large proportion of the solutions with the smallest
indices have roughly the same metric, one can be assured that a near global
solution has been reached. Now examine the ultimate parameter values using
the same ranking: do the most optimal calibrations correspond to similar val-
ues of the parameters? If so, this is the best outcome. On the other hand, if
the index plot of the parameters shows a high degree of variability, one may
deduce that the market data does not drive a unique solution. This may be due
to the manifestation of a limit cycle or to a strange attractor. Calculation of
the Lyapunov exponent22 for the system may serve to characterize the conver-
gent phenomena. If one does discover either of these phenomena, the current
market snapshot does not define a unique model. A solution is to complement
the snapshot with more information. If one naively inserts restrictive param-
eter limits, one runs the risk of finding solutions along the boundary. This is
problematic as a calibration found on the boundary is equivalent to reducing
the parameter count by one and grossly simplifies the calibration problem in
an arbitrary way that is difficult to control. This introduces a redundancy to
the difference equation system and one ends up solving an alternative opti-
mization program that was not intended. One needs to add more information
into the system sensibly. One way to achieve this is to weigh the current solu-
tion set with the empirical density function generated from the most recent
well-behaved solution set. This makes it possible to select a solution from the
current data in such a way that is close to the previous data (Hastie et al.
2009). At the very least, when one comes to value one’s liabilities, the change
in parameters will not be unusually profound and the value of the liabilities
today will be consistent with the last time a price was sought.
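The multi-start diagnostic described above can be sketched as follows. This is an illustration only: the objective is a hypothetical two-parameter function with two separated local minima standing in for a calibration error, and the local search is plain gradient descent with a finite-difference gradient rather than any particular production optimizer.

```python
import random

def objective(p):
    """Hypothetical calibration error with two separated local minima."""
    x, y = p
    return (x * x - 1.0) ** 2 + 0.3 * x + (y - 0.5) ** 2

def gradient(p, h=1e-6):
    """Central finite-difference gradient of the objective."""
    grad = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        grad.append((objective(up) - objective(down)) / (2.0 * h))
    return grad

def local_search(start, step=0.05, n_iter=2000):
    """Fixed-step gradient descent from a given starting point."""
    p = list(start)
    for _ in range(n_iter):
        p = [pi - step * gi for pi, gi in zip(p, gradient(p))]
    return p

def multi_start(n_starts=30, seed=0):
    """Run the local search from random starts and rank by the optimal metric."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
        end = local_search(start)
        results.append((objective(end), end))
    results.sort(key=lambda r: r[0])
    return results

results = multi_start()
for metric, params in results[:5]:
    print(f"metric={metric:.4f}  params=({params[0]:.3f}, {params[1]:.3f})")
```

If the best-ranked runs share both a similar metric and similar parameter values, the calibration is well identified; similar metrics with scattered parameters signal that the data do not pin down a unique solution.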
Having a robust calibration technique that does not change significantly
across consecutive time periods except by an amount that is, in some sense
“reasonable,” is certainly desirable. However, a model calibrated to one type
of market data need not necessarily lead to a similar calibration of the same
model under another. For example, receiver swaption interest rate data is
better defined where it is in-the-money for low interest rates. Payer swaption
prices tend to be smaller where receiver swaption prices are larger and conse-
quently, owing to a disparity in scales, there is less information available for
calibration under payers in this instance. One tail of the underlying density
may be less well defined when comparing across calibrations and hence lead
to different parametrizations. One may calibrate to market-implied volatil-
ities to obviate this risk, but while receiver and payer swaption generated
implied volatilities are theoretically equal, they aren’t in practice and one
finds oneself in the same quandary as with prices. If one replaces swaptions
22 The Lyapunov exponent measures the extent to which two solutions to a dynamical
system will tend to diverge after some period (Hale and Kocak 1991) given that they
start from nearly identical initial conditions.
with calibrations to interest rate derivatives such as caps and/or floors, then
the calibration may change yet again despite the theory implying that the
same risk-neutral density should pervade. One must be pragmatic, therefore,
and test the sensitivity of any calibration to data that is out-of-sample. This
is a rather serious issue as I note that the sensitivity to MCEV and TVOG,
and ultimately to the SCR under Solvency 2, depends upon it.
Finally, it is noted that the calibration of an equity SVJD model in eight
parameters necessarily requires the interest rate model to be calibrated first.
The equity model is coupled to the interest rate model through the stochastic
interest rate discount factor. Some approximations can usually be made but
this serves to illustrate the depth and difficulty of the general calibration
problem to market data.
23 [Link]
24 [Link]
25 [Link]
Codependencies and tail risk emerging from, for example, a systemic risk
shock, lead to a plethora of correlation structures. The copula-marginal dis-
tribution factorization (Sklar, 1959) is a key ingredient to any RSG. In the
multivariate risk setting it extends the idea of simple Gaussian correlation of
Brownian shocks via a Cholesky factorization to all risk factors. In what is
clearly unsatisfactory modeling behavior, the Gaussian correlation approach
still allows diversification of risks in times of crisis. During periods of relatively
quiescent and stable economic behavior, diversification seems reasonable, but
during concerted action such as a run on the banks, it is
not. Changing the co-dependency structure from a Gaussian to a T -copula with
few degrees of freedom can model the effects of herding in times of crisis yet
still allow for diversifiable behavior during periods of relative calm.
An RSG, therefore, needs to incorporate an ESG but must also provide a
coherent framework for modeling the non-market risks alongside the market
risks. Since the theory for non-market risks is not as well developed as for
market risks, one generally resorts to statistical models for these. RSGs need to
provide an insurer with a statistical toolbox of marginal distribution functions
and copulas to model the joint distribution of market and non-market risks. At
some point downstream of the RSG, the results will be aggregated to produce
capital charges which monetize the impact of these risks on the business. This
section provides some detail on co-dependency structures and non-market
risks.
Sklar showed that for every multivariate distribution function F there exists a
function C (the copula), unique when the variables are continuous and subject
to certain restrictions when they are discrete, such that for each of the
distribution function’s marginals F1:p :
$$f(x_1, \dots, x_p) = c(F_1(x_1), \dots, F_p(x_p)) \prod_{n=1}^{p} f_n(x_n)$$
If one specifies the copula, C(u1 , . . . , up ), then one may combine it with any
desired marginal distributions and the resulting multivariate distribution func-
tion is unique (see McNeil et al. 2015). The strength in this result for scenario
generators is that to simulate from a multivariate distribution, one need only
simulate a multivariate instance from the copula, $(u_1, \dots, u_p)^\top$, and then
evaluate the quantile functions of each of the marginal variables at these copula
values, $(q_1(u_1), \dots, q_p(u_p))^\top$, to obtain a desired (ranks-based) instance of the
joint distribution function. Since it is often easy to simulate from a copula (at
least from many of the most popular multivariate distributions: e.g., the nor-
mal, t-copula, or the grouped t-copula) then one may use whatever marginal
distributions desired: they need not be normal nor relate in any way to the
copula.
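The two-step recipe above (draw from the copula, then apply each marginal quantile function) can be sketched for a bivariate Gaussian copula. The choice of an exponential and a Pareto marginal, and all parameter values and function names, are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist

def clamp(u, eps=1e-12):
    """Keep copula values strictly inside (0, 1) before inverting."""
    return min(max(u, eps), 1.0 - eps)

def exponential_quantile(u, rate):
    return -math.log(1.0 - u) / rate

def pareto_quantile(u, x_m, alpha):
    return x_m * (1.0 - u) ** (-1.0 / alpha)

def sample_joint(n, rho, rate=2.0, x_m=0.1, alpha=3.0, seed=0):
    """Gaussian-copula draws paired with exponential and Pareto marginals."""
    rng = random.Random(seed)
    std = NormalDist()
    copula_draws, variables = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Step 1: an instance (u1, u2) from the copula
        u1, u2 = clamp(std.cdf(z1)), clamp(std.cdf(z2))
        copula_draws.append((u1, u2))
        # Step 2: marginal quantile functions evaluated at the copula values
        variables.append((exponential_quantile(u1, rate),
                          pareto_quantile(u2, x_m, alpha)))
    return copula_draws, variables

copula_draws, variables = sample_joint(n=20000, rho=0.8)
print(variables[:3])
```

Neither marginal is normal, yet the rank dependence between the two simulated variables is exactly that of the Gaussian copula, because the quantile functions are monotone.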
I note that it is not always possible to generate random variables with any
prescribed correlation under the Pearson product moment coefficient. However,
since the copula approach uses a theoretical ranks-based correlation
between variables, practically any correlation is possible, although it may
not be practically possible to obtain a good sample correlation estimate in the
presence of degenerate variables containing multiple repeated instances of the
same value (e.g., the PEDCP case to be discussed in Section 5.4).
Since my motivation was to introduce more realistic co-dependency struc-
tures than the simple Gaussian copula, I will now illustrate the benefits of
alternatives to it in modeling tail risk. For illustration, I will compare the
Gaussian and T -copula on low degrees of freedom in the bivariate case.
The procedure for representing a bivariate copula underlying given bivari-
ate data is easily done by plotting the normalized ranks of the first variable
against the normalized ranks of the second variable. In the theoretical setting
of a known normal or T -copula then this is achieved by applying the marginal
distribution functions to the realized bivariate data. McNeil et al. (2015) detail
how this is achieved. In this example, $n = 2.5 \times 10^4$ pairs of illustrative data
from a bivariate normal distribution were generated with zero mean
and covariance matrix:
$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$
[Figure 5.1. Legend: Normal, T(4). Horizontal axis: Rank of variable 1.]
where ρ = 0.8 (Σ is also the correlation matrix P and both marginals are standard
normal). A bivariate T -distribution is created using the same parameters
and an additional degrees-of-freedom parameter ν, here set to 4. The
T -distribution is created from the bivariate normal data by generating a vector
v of length $2.5 \times 10^4$ containing $\chi^2(4)$-variables. Scaling the i-th pair of
normal data by $\sqrt{\nu/v_i}$ generates bivariate T (ν)-data, which is easily
accomplished in any ESG. Since the marginal distributions of a multivariate T (ν)
distribution are univariate tν distributions, T (ν)-copula data is easily
obtained by applying the tν distribution function to each element of the pair.
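The construction just described can be sketched as below. The code is an illustration under stated assumptions: it uses the closed-form distribution function of the t distribution with exactly 4 degrees of freedom (so it does not generalize to other ν without a different distribution function), draws the χ²(4) variable as a gamma variable with shape 2 and scale 2, and all function names are my own. It also counts how often both copula values exceed a high threshold, which is one simple way to compare joint tail behavior.

```python
import math
import random
from statistics import NormalDist

def t4_cdf(t):
    """Closed-form distribution function of Student's t with 4 degrees of freedom."""
    w = t * t / (1.0 + t * t / 4.0)
    return 0.5 + 0.375 * (t / math.sqrt(1.0 + t * t / 4.0)) * (1.0 - w / 12.0)

def simulate_copulas(n, rho, seed=0):
    """Return (normal-copula pairs, T(4)-copula pairs) built from the same normals."""
    nu = 4.0
    rng = random.Random(seed)
    std = NormalDist()
    normal_u, t_u = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        normal_u.append((std.cdf(z1), std.cdf(z2)))
        # A chi-square(4) variable is a gamma variable with shape 2 and scale 2
        v = rng.gammavariate(2.0, 2.0)
        scale = math.sqrt(nu / v)  # one common scaling per pair
        t_u.append((t4_cdf(scale * z1), t4_cdf(scale * z2)))
    return normal_u, t_u

def joint_upper_tail(pairs, q):
    """Fraction of pairs with both coordinates above the level q."""
    return sum(1 for a, b in pairs if a > q and b > q) / len(pairs)

normal_u, t_u = simulate_copulas(n=25000, rho=0.8)
print("normal copula:", joint_upper_tail(normal_u, 0.99))
print("T(4) copula:  ", joint_upper_tail(t_u, 0.99))
```

Plotting the two sets of pairs against each other reproduces the kind of scatterplot comparison discussed in the text.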
Comparing scatterplots of the simulated copula data samples (see Figure 5.1)
serves to illustrate the point. The upper right-hand corner of the scatterplot
corresponds to large extremes of both variables: for example, times
of high financial stress. As variable pairs move further into this corner they
are coerced to move closer together under the T (ν) copula than
under the more dispersed normal copula. This goes some way in modeling
the so-called herding behavior of the markets in times of financial stress. The
T (ν)-copula can be extended to the more versatile grouped T -copula in which
sets of like variables in multivariate data are grouped and allocated a group
degrees-of-freedom parameter. Other copulas, such as the Archimedean and
Gumbel–Hougaard copulas (see McNeil et al. 2015), have been used in finance
but have not, to the best of my knowledge, been applied within the insurance
context.
In the context of ESG and RSG simulations, the copula approach is often
essential in effectively simulating appropriately correlated risk drivers. Finally,
I note that if one intends to simulate the effect of a T (ν)-copula (or other)
over more than one time step, say n steps, then one must decompose the
copula such that after the application of n steps the T (ν)-copula emerges. If
this decomposition is not made and a T (ν)-copula is simulated at each time
step, then a dilution effect occurs which is sometimes referred to as a copula
central limit theorem and a Gaussian copula emerges after n steps for large
enough n. This leads on to the discussion of α-stable distributions and the
reader is referred to McNeil et al. (2015) for more details.
In Section 5.4 where I illustrate challenges in scenario generation, it is as a
direct result of the copula/marginal factorization and Sklar’s converse that I
can simulate correlated instances of some rather challenging variables. In the
remainder of this section I cover some of the major non-market risk drivers.
As a consequence of the converse to Sklar’s theorem, they may be represented
by any univariate distribution function as long as it is possible to simulate
these using the probability integral transform. This is possible for variables
such as the normal, t, chi-squared, and Gamma whose densities are readily
available. For more challenging variables, one solution is to represent them in
some way such as that given for the examples in Section 5.4.
1. Annuitant mortality
3. Income protection
26 [Link]
27 [Link]
$$f_X(x) = f_{\mathrm{Po}}(0; \lambda)\, f_{X|N=0}(x) + [1 - f_{\mathrm{Po}}(0; \lambda)]\, f_{X|N>0}(x)$$
Since the distribution of X given N = 0 is degenerate, with density fX|N =0 (x) equal to δ(x),
from here on the distribution of X given N > 0 is sought and, to simplify
notation, I simply write X in place of X|N > 0.
The first step is to simulate instances of this variable in isolation for use
in estimating the parameters in a representation of the variable’s distribution
(and latterly, the density) function. To motivate an interesting example, the
following parameters are taken: λ = 1, α = 3, xm = 0.1, and x∗ = 0.15. First,
$5 \times 10^4$ Poisson variables Po(1) are simulated using R’s rpois() function giving
the following summary:
##     0     1    2    3   4   5  6 7
## 18560 18271 9024 3156 818 139 24 8
[Figure 5.2. Histogram; horizontal axis: X; vertical axis: Frequency.]
[Figure 5.3. Empirical distribution function; horizontal axis: X; vertical axis: ECDF(X).]
Note that, of the $N_{reps} = 5 \times 10^4$ Poisson variates simulated, some 31440
were non-zero, that is, 62.88% of the variates, which approximates well the
theoretical value $1 - e^{-\lambda} \approx 63.21\%$. Then, enough Pareto variables were simulated
by generating standard uniform variables u using R’s runif() function
and inserting them into the Pareto quantile function: $q(u) = x_m (1 - u)^{-1/\alpha}$.
After truncating the Pareto variables at x∗ = 0.15 the requisite numbers given
the Poisson counts N were summed to give X.
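The simulation steps above translate directly into a short sketch. The text uses R; the version below is a Python rendering under two assumptions that are mine rather than the text's: "truncating" is read as capping each Pareto draw at x∗, and the Poisson variates are drawn with Knuth's multiplication method because the standard library has no Poisson generator. Function names are hypothetical.

```python
import math
import random

def poisson_variate(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def pareto_quantile(u, x_m, alpha):
    """Pareto quantile function q(u) = x_m * (1 - u)^(-1/alpha)."""
    return x_m * (1.0 - u) ** (-1.0 / alpha)

def simulate_pedcp(n_reps, lam=1.0, alpha=3.0, x_m=0.1, x_cap=0.15, seed=0):
    """Return (Poisson counts, compound totals X) for n_reps replications."""
    rng = random.Random(seed)
    counts, totals = [], []
    for _ in range(n_reps):
        n_events = poisson_variate(lam, rng)
        # Assumption: each Pareto draw is capped at x_cap before summation
        total = sum(min(pareto_quantile(rng.random(), x_m, alpha), x_cap)
                    for _ in range(n_events))
        counts.append(n_events)
        totals.append(total)
    return counts, totals

counts, totals = simulate_pedcp(50000)
non_zero = sum(1 for c in counts if c > 0)
print("share of non-zero counts:", non_zero / len(counts))
print("theoretical value:       ", 1.0 - math.exp(-1.0))
```

Sorting the non-zero totals gives the empirical distribution function that the finite-element representation is later fitted to.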
A histogram of the non-zero results (and truncating all variates simulated
beyond the 99%-ile for exposition only) and its empirical distribution function,
as computed by R’s ecdf() function, are given, respectively, in Figures 5.2
and 5.3.
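$$
\begin{aligned}
F_X(x) &= \int_{x_m}^{x} f_X(x')\,dx' \\
&= \int_0^{\phi(x;\kappa)} f_X(\phi^{-1}(y';\kappa))\,d\phi^{-1}(y';\kappa) \\
&= \int_0^{\phi(x;\kappa)} \frac{f_X(\phi^{-1}(y';\kappa))}{\kappa(1-y')}\,dy' \\
&= \int_0^{y} g_Y(y';\kappa)\,dy' =: G_Y(y;\kappa), \qquad \text{where } g_Y(y;\kappa) := \frac{f_X(\phi^{-1}(y;\kappa))}{\kappa(1-y)}
\end{aligned}
$$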
Note that the steps above are possible since the inverse transformation of ϕ
is found via
$$x = x_m + \frac{1}{\kappa}\log\frac{1}{1-y} =: \phi^{-1}(y;\kappa)$$
and so
$$\frac{d}{dy}\phi^{-1}(y;\kappa) = \frac{1}{\kappa}\frac{d}{dy}\log\frac{1}{1-y} = \frac{1}{\kappa(1-y)}$$
In what follows, finite-element representations of both the PEDCP distribu-
tion and density functions are given. As will be seen, the latter leads to a more
satisfactory representation.
Each element of the basis is of the form of a “witch’s hat”: it is zero
everywhere except over three consecutive nodes, yk−1 , yk and yk+1 , where
it interpolates the values 0, 1, and 0, respectively. The representation is
$$G_Y(y) = \sum_{k=0}^{K} \tilde{G}_k \varphi_k(y), \qquad y \in [0, 1]$$
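$$\text{iff} \quad \sum_{k=0}^{K} A_{jk}\tilde{G}_k = \left. G_Y(y)\Phi_j(y)\right|_0^1 - \int_0^1 g_Y(y)\Phi_j(y)\,dy = \Phi_j(1) - E_Y[\Phi_j(Y)]$$
This is of the form $A\tilde{G} = \Phi(1) - E_Y[\Phi(Y)]$. The elements of A are, for
0 < j = k < K:
$$A_{jj} = \int_0^1 \varphi_j(y)^2\,dy = \frac{2}{3K}$$
with $A_{00} = A_{KK} = 1/(3K)$. Elsewhere, for 0 ≤ j, k ≤ K and |j − k| = 1:
$$A_{jk} = \int_0^1 \varphi_j(y)\varphi_k(y)\,dy = \frac{1}{6K}$$
while for |j − k| > 1, $A_{jk} = 0$. Finally, observe that Φj is an anti-derivative of
φj . Thus, for 0 < j < K:
$$\Phi_j(y) = \begin{cases} 0 & \text{if } y < y_{j-1} \\ K(y - y_{j-1})^2/2 & \text{if } y_{j-1} \le y < y_j \\ 1/K - K(y_{j+1} - y)^2/2 & \text{if } y_j \le y < y_{j+1} \\ 1/K & \text{if } y \ge y_{j+1} \end{cases}$$
while for j = 0:
$$\Phi_0(y) = \begin{cases} 1/(2K) - K(y_1 - y)^2/2 & \text{if } y_0 \le y < y_1 \\ 1/(2K) & \text{if } y \ge y_1 \end{cases}$$
and for j = K:
$$\Phi_K(y) = \begin{cases} 0 & \text{if } y < y_{K-1} \\ K(y - y_{K-1})^2/2 & \text{if } y_{K-1} \le y < y_K \end{cases}$$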
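The closed-form entries of A can be checked numerically. The sketch below assumes equally spaced nodes y_k = k/K on [0, 1], builds the tridiagonal matrix from the stated formulas, and compares a few entries with a midpoint-rule quadrature of the products of hat functions. Function names are my own.

```python
def hat(k, y, K):
    """Piecewise-linear hat basis function centred on the node y_k = k / K."""
    return max(0.0, 1.0 - abs(K * y - k))

def mass_matrix(K):
    """Tridiagonal matrix with entries A_jk = integral of hat_j * hat_k over [0, 1]."""
    A = [[0.0] * (K + 1) for _ in range(K + 1)]
    for j in range(K + 1):
        # Boundary hats are half-hats, so their diagonal entry is halved
        A[j][j] = 1.0 / (3.0 * K) if j in (0, K) else 2.0 / (3.0 * K)
        if j > 0:
            A[j][j - 1] = 1.0 / (6.0 * K)
        if j < K:
            A[j][j + 1] = 1.0 / (6.0 * K)
    return A

def quadrature_entry(j, k, K, n_sub=200):
    """Midpoint-rule approximation of A_jk using n_sub cells per element."""
    n = K * n_sub
    h = 1.0 / n
    return h * sum(hat(j, (i + 0.5) * h, K) * hat(k, (i + 0.5) * h, K)
                   for i in range(n))

K = 5
A = mass_matrix(K)
for j, k in [(0, 0), (2, 2), (2, 3), (1, 4)]:
    print(j, k, A[j][k], quadrature_entry(j, k, K))
```

Because the hat functions sum to one on [0, 1], each interior row of A sums to 1/K and each boundary row to 1/(2K), which gives a second quick check.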
Here, some K = 1000 finite elements were used to represent the information
carried in 31440 non-zero instances of the PEDCP variable that was simulated.
This is the number of doubles that would otherwise be held in memory if the
empirical distribution function were to be used for simulation. The estimated
distribution function is shown in Figure 5.4.
Unfortunately, the distribution function is non-monotone and so this rep-
resentation is unsatisfactory: it may not be used well with the probability
integral transform for simulating instances of the PEDCP variable. A repre-
sentation which is monotone is required. A good finite-element representation
of the density function, that is, one which was everywhere non-negative, would
ensure that the resulting distribution function (now composed of monotone
increasing quadratic segments) would be everywhere increasing. This is the
subject of the next section.
[Figure 5.4. Estimated distribution function, two panels; vertical axis: F_X; legend: ECDF, FE.]
and note that gY(yk) = g̃k since φk(yk) = 1 (the values in between nodes yk
are linearly interpolated). Note also that \(\int_{-1}^{1} g_Y(y)\,dy = 1\) and this can be
maintained by scaling the coefficients g̃k accordingly. If the integral's value is
currently a then:
\[ a = \int_{-1}^{1} \sum_{k=0}^{K} \tilde g_k\,\phi_k(y)\,dy = \frac{1}{K}\left(\frac{1}{2}\tilde g_0 + \sum_{k=1}^{K-1} \tilde g_k + \frac{1}{2}\tilde g_K\right) \]
One may then divide each coefficient by a to obtain a proper density function.
It is henceforth assumed that this is done. For j = 0, ..., K:
\[ \sum_{k=0}^{K} \tilde g_k \int_0^1 \phi_j(y)\,\phi_k(y)\,dy = \int_0^1 g_Y(y)\,\phi_j(y)\,dy
\quad\text{iff}\quad
\sum_{k=0}^{K} A_{jk}\,\tilde g_k = \int_0^1 g_Y(y)\,\phi_j(y)\,dy = E_Y[\phi_j(Y)] \]
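The linear system above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are taken to be already rescaled to [0, 1], the nodes are equally spaced, the expectation E_Y[φj(Y)] is replaced by a sample mean, and the function names are hypothetical rather than the author's.

```python
import numpy as np

def hat_basis(y, K):
    """Evaluate the K+1 piecewise-linear hat functions on [0, 1] at points y."""
    nodes = np.linspace(0.0, 1.0, K + 1)
    # phi_k(y) = max(0, 1 - K * |y - y_k|); result has shape (len(y), K + 1)
    return np.clip(1.0 - K * np.abs(y[:, None] - nodes[None, :]), 0.0, None)

def mass_matrix(K):
    """Tridiagonal matrix A_jk = integral of phi_j * phi_k over [0, 1]."""
    A = np.zeros((K + 1, K + 1))
    idx = np.arange(K + 1)
    A[idx, idx] = 2.0 / (3.0 * K)            # interior diagonal entries
    A[0, 0] = A[K, K] = 1.0 / (3.0 * K)      # boundary diagonal entries
    A[idx[:-1], idx[1:]] = 1.0 / (6.0 * K)   # super-diagonal
    A[idx[1:], idx[:-1]] = 1.0 / (6.0 * K)   # sub-diagonal
    return A

def fe_density_coefficients(samples, K):
    """Solve A g = e, where e_j is the sample estimate of E[phi_j(Y)]."""
    e = hat_basis(np.asarray(samples, dtype=float), K).mean(axis=0)
    return np.linalg.solve(mass_matrix(K), e)
```

For uniformly spread samples the recovered nodal values are all close to one, which is a convenient sanity check before applying the sketch to real data.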
[Figure: plot of the estimated density f_X.]
This solution is noisy and is negative valued for some values of the variable.
To mitigate these problems, seek instead a viscosity solution employing a
regularization technique. Note that the solution x = g of the linear system
described above and denoted by Ax = e is also the solution to a quadratic
programming problem. Let \(\|v\|_2 = (\sum_{i=1}^{n} v_i^2)^{1/2}\) be the familiar 2-norm over all
vectors v ∈ R^n. If \(h(x;\lambda) = \|Ax - e\|_2^2 + \lambda\|x\|_2^2\) (λ ∈ R+) then the equivalent
quadratic program is
\[ g = \arg\min_{x \in \mathbb{R}^{K+1}} h(x; 0) \]
The viscosity solution requires λ > 0 and is
\[ g_\lambda = \arg\min_{x \in \mathbb{R}^{K+1},\ \lambda > 0} h(x; \lambda) \]
Note that the objective may be written as
\begin{align*}
h(x;\lambda) &= x^\top A^\top A x - 2 e^\top A x + e^\top e + \lambda x^\top x \\
&= x^\top (A^\top A + \lambda I) x - 2 e^\top A x + e^\top e
\end{align*}
Let the matrix \(A^\top A + \lambda I\) have the Cholesky factorization \(L L^\top\) for the lower
triangular matrix L ∈ R^{n×n}. Then defining \(e^* := L^{-1} A^\top e\) (or equivalently,
\(e = A^{-\top} L e^*\)) gives
\begin{align*}
h(x;\lambda) &= x^\top L L^\top x - 2 (A^{-\top} L e^*)^\top A x + (A^{-\top} L e^*)^\top (A^{-\top} L e^*) \\
&= \|L^\top x - L^{-1} A^\top e\|_2^2 + \|e\|_2^2 - \|L^{-1} A^\top e\|_2^2
\end{align*}
To find the λ-viscosity solution one may not simply proceed by simultaneously
minimizing h over x and λ as the solution λ → 0 will be sought. Instead, one
observes that for fixed λ > 0 any solution satisfies:
\[ x = L^{-\top} L^{-1} A^\top e = (L L^\top)^{-1} A^\top e = (A^\top A + \lambda I)^{-1} A^\top e \]
The λ hyper-parameter is estimated by an out-of-sample-method using 10-fold
cross-validation (Figure 5.6).
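The closed-form solution for fixed λ can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `ridge_solve` is hypothetical, and general dense solves are used where a dedicated triangular solver would normally be preferred.

```python
import numpy as np

def ridge_solve(A, e, lam):
    """Minimize ||A x - e||^2 + lam * ||x||^2 via the Cholesky factor of A^T A + lam I."""
    n = A.shape[1]
    L = np.linalg.cholesky(A.T @ A + lam * np.eye(n))  # lower triangular, L @ L.T
    e_star = np.linalg.solve(L, A.T @ e)               # e* = L^{-1} A^T e
    return np.linalg.solve(L.T, e_star)                # x = L^{-T} e*
```

In a cross-validation loop one would call `ridge_solve` on each training fold for every candidate λ and score the fitted coefficients on the held-out fold.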
[Figure: optimization criterion plotted against lambda.]
\[ g_\infty = \frac{A^\top e}{a} \]
[Figure: density f_X and distribution function F_X; legend: ECDF, Finite element.]
The results are shown in Figure 5.7b. One may simulate from this distri-
bution function using the probability integral transform assured that it is
continuous, monotone increasing and smooth (being everywhere a quadratic
function). This is unlike the case of the representation of the PEDCP distri-
bution function which was non-monotone and only continuous. This is made
even more remarkable given that it was not necessary to solve a large linear
system in achieving this result.
\[ B = \{ T_n \in C^\infty(\mathbb{R}) : T_n(x) = \cos[n \cos^{-1} x],\ n \in \mathbb{Z}_+ \} \]
The support of X is the entire real line R and must be mapped into the
finite interval (−1, 1). Let the transformed variable be Y and the transforma-
tion: y = ϕ(x; κ) := tanh(κx), some κ ∈ R+ (to be chosen presently). The
distribution function FX (x) is then:
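\begin{align*}
F_X(x) &= \int_{x_m}^{x} f_X(x')\,dx' \\
&= \int_{-1}^{\varphi(x;\kappa)} f_X(\varphi^{-1}(y';\kappa))\,d\varphi^{-1}(y';\kappa) \\
&= \int_{-1}^{\varphi(x;\kappa)} \frac{f_X(\varphi^{-1}(y';\kappa))}{\kappa(1 - (y')^2)}\,dy' \\
&= \int_{-1}^{y} g_Y(y';\kappa)\,dy' =: G_Y(y;\kappa)
\quad\text{where}\quad g_Y(y;\kappa) := \frac{f_X(\varphi^{-1}(y;\kappa))}{\kappa(1 - y^2)}
\end{align*}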
Note that the steps above are possible since the inverse transformation of ϕ
is found via:
\[ x = x_m + \frac{1}{2\kappa} \log\frac{1+y}{1-y} =: \varphi^{-1}(y;\kappa)
\quad\text{iff}\quad
\frac{d}{dy}\varphi^{-1}(y;\kappa) = \frac{1}{\kappa(1 - y^2)} \]
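As a quick numerical illustration of this change of variable, the sketch below implements ϕ, its inverse, and the transformed density gY. It assumes x_m = 0 so that the inverse is consistent with y = tanh(κx); the function names are hypothetical and `f_X` is any user-supplied density on the real line.

```python
import math

def phi(x, kappa):
    """Map the real line into (-1, 1)."""
    return math.tanh(kappa * x)

def phi_inv(y, kappa, x_m=0.0):
    """Inverse map; x_m = 0 is assumed so that phi_inv(phi(x)) == x."""
    return x_m + math.log((1.0 + y) / (1.0 - y)) / (2.0 * kappa)

def g_Y(y, kappa, f_X):
    """Density of Y = phi(X): f_X at the pre-image times the Jacobian 1 / (kappa (1 - y^2))."""
    return f_X(phi_inv(y, kappa)) / (kappa * (1.0 - y * y))
```

Integrating `g_Y` over (−1, 1) should return a value close to one whenever `f_X` is a proper density, which gives a simple check on the choice of κ.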
Let the (transformed) distribution function GY be given by the representation
in terms of the orthogonal Chebyshev polynomial basis B (and truncated after
K ∈ N terms):
\[ G_Y(y) = \sum_{k=0}^{\infty} \tilde G_k T_k(y) \approx \sum_{k=0}^{K-1} \tilde G_k T_k(y) \]
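I now proceed directly to derive expressions for the coefficients G̃k. First,
observe the following intermediate result which holds for Chebyshev polynomials.
Since y ∈ [−1, 1], let y = cos u so that dy = −sin u du. Then, ∀k > 0:
\begin{align*}
\int T_k(y)\,w(y)\,dy &= \int \frac{\cos[k \cos^{-1} y]}{\sqrt{1 - y^2}}\,dy = -\frac{1}{k} \sin[k \cos^{-1} y] + c \\
&= -\frac{1}{k} T_k^{\#}(y) + c
\end{align*}
having defined \(T_k^{\#}(y) = \sin[k \cos^{-1} y]\). Note that \(T_k^{\#}(1) = \sin(0) = 0\) and
\(T_k^{\#}(-1) = \sin(k\pi) = 0\). If k = 0 then the above integral is \(-\cos^{-1} y + c\). The
coefficients G̃k may be obtained, for k = 0, as
\begin{align*}
a_{00}\,\tilde G_0 &= \int_{-1}^{1} G_Y(y)\,w(y)\,dy \\
&= -G_Y(y) \cos^{-1}(y) \Big|_{-1}^{1} + \int_{-1}^{1} g_Y(y) \cos^{-1} y\,dy \\
\text{iff}\quad \tilde G_0 &= \frac{1}{a_{00}} E_Y[\cos^{-1} Y]
\end{align*}
and, ∀k = 1, ..., K − 1, as:
\begin{align*}
a_{kk}\,\tilde G_k &= \int_{-1}^{1} G_Y(y)\,T_k(y)\,w(y)\,dy \\
&= -\frac{1}{k} G_Y(y)\,T_k^{\#}(y) \Big|_{-1}^{1} + \frac{1}{k} \int_{-1}^{1} g_Y(y)\,T_k^{\#}(y)\,dy \\
&= \frac{1}{k} \int_{-1}^{1} g_Y(y)\,T_k^{\#}(y)\,dy
\quad\text{iff}\quad \tilde G_k = \frac{1}{a_{kk}\,k} E_Y[T_k^{\#}(Y)]
\end{align*}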
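The coefficient formulas above translate directly into sample averages. The sketch below is an illustration only: it assumes the usual Chebyshev normalization a00 = π and akk = π/2 for k ≥ 1 (the norms under the weight w(y) = 1/√(1 − y²)), takes the samples to lie strictly inside (−1, 1), and uses hypothetical function names.

```python
import math

def chebyshev_cdf_coefficients(samples, K):
    """Estimate G~_0, ..., G~_{K-1} from samples y in (-1, 1)."""
    n = len(samples)
    u = [math.acos(y) for y in samples]                    # u = arccos(y)
    coeffs = [sum(u) / (n * math.pi)]                      # k = 0, with a_00 = pi
    for k in range(1, K):
        mean_sin = sum(math.sin(k * ui) for ui in u) / n   # estimate of E[T_k^#(Y)]
        coeffs.append(2.0 * mean_sin / (k * math.pi))      # a_kk = pi / 2
    return coeffs

def chebyshev_cdf(y, coeffs):
    """Evaluate the truncated expansion sum_k G~_k T_k(y)."""
    u = math.acos(y)
    return sum(c * math.cos(k * u) for k, c in enumerate(coeffs))
```

For samples spread uniformly over (−1, 1) the estimated coefficients are approximately (1/2, 1/2, 0, 0, ...), which recovers GY(y) = (1 + y)/2 as expected.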
In practice, one can now empirically estimate the coefficients G̃k by gener-
ating enough samples from the unconditional distribution of X, transforming
[Link] The SVJD model and the combined equity asset shock
The SVJD model has the continuous time SDE representation:
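\begin{align*}
\frac{dS_t}{S_t} &= (\mu - \lambda\bar\mu)\,dt + \sqrt{v_t}\,dW_t^{(1)} + (\eta_t - 1)\,dN_t \\
dv_t &= \alpha(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}
\end{align*}
or, separating out the equity shock term dXt:
\begin{align*}
\frac{dS_t}{S_t} &= (\mu - \lambda\bar\mu)\,dt + dX_t \\
dX_t &= \sqrt{v_t}\,dW_t^{(1)} + (\eta_t - 1)\,dN_t \\
dv_t &= \alpha(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^{(2)}
\end{align*}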
and it is about the distribution function of the variable δXt|Ft that I seek
a representation in terms of orthogonal polynomials. For the equity St and
equity shock term δXt, the Euler–Maruyama discretization from continuous
time to discrete time²⁹ t = iΔt, i ∈ N, is (S0 given):
\begin{align*}
\frac{\Delta S_i}{S_{i-1}} &= (\mu - \lambda\bar\mu)\,\Delta t + \Delta X_i \\
\Delta X_i &= \sqrt{v_{i-1}\,\Delta t}\left(\sqrt{1 - \rho^2}\,Z_i^{(1)} + \rho\,Z_i^{(2)}\right) + \left(e^{\bar\mu + \sigma Z_i^{(3)}} - 1\right)\Delta N_i \\
\Delta v_i &= \alpha(\theta - v_{i-1})\,\Delta t + \xi\sqrt{v_{i-1}\,\Delta t}\,Z_i^{(2)}
\end{align*}
Here, \(Z_i^{(1,3)} \sim_{iid} N(0, 1)\) and ΔNi ∼ Po(λΔt). Since the equity shock depends
on the stochastic variance at time index i − 1 and upon the current level of the
stochastic variance's normal shock \(Z_i^{(2)}\), the equity shock depends upon
28 The notation δXt, for some stochastic variable Xt indexed by continuous time t > 0,
is a short-hand for Xt − Xt−δt where δt ∈ R s.t. 0 < δt ≪ 1.
29 Again, but in discrete time I write ΔXi = Xi − Xi−1.
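distribution of (ΔX, v, Δv, Z(1), Z(3), ΔN) over the five-tuple (v, Δv,
Z(1), Z(3), ΔN):
\[ \Delta X = \frac{\rho}{\xi}\left(\Delta v - \alpha(\theta - v)\,\Delta t\right) + \sqrt{(1 - \rho^2)\,v\,\Delta t}\;Z^{(1)} + \left(e^{\bar\mu + \sigma Z^{(3)}} - 1\right)\Delta N \]
for every time horizon t of interest, that is, ti = iΔt for each i = 1, 2, 3, ..., Tmax =
360. At times t > s ≥ 0 the continuous-time CIR process conditioned on
information up until time s is
\[ v_t \mid \mathcal{F}_s \sim k(t - s)\,\chi^2\big(\nu, \lambda(t - s, v_s)\big) \]
where
\[ k(t) = \frac{\xi^2 (1 - e^{-\alpha t})}{4\alpha}, \qquad \nu = \frac{4\alpha\theta}{\xi^2}, \qquad \lambda(t, v) = \frac{4\alpha v}{\xi^2 (e^{\alpha t} - 1)} \]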
Using these results, variables vi ∼ k(ti)χ²(ν, λ(ti, v0)), vi|vi−1 ∼ k(Δt)χ²(ν,
λ(Δt, vi−1)), and Z(1:3) ∼iid N(0, 1) were generated and the variance shock set to Δvi =
(vi|vi−1) − vi−1. At each time-index i, orthogonal polynomial representations of
the unconditional equity asset shock were created for each variable ΔXi.
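The exact CIR transition used here can be sketched with NumPy's non-central chi-squared sampler. This is an illustration under assumptions: the function names and any parameter values passed in are hypothetical, and the code draws one step of the variance conditional on the previous level rather than reproducing the quasi-random construction described in the text.

```python
import numpy as np

def cir_scale(t, alpha, xi):
    """k(t) = xi^2 (1 - exp(-alpha t)) / (4 alpha)."""
    return xi**2 * (1.0 - np.exp(-alpha * t)) / (4.0 * alpha)

def cir_noncentrality(t, v, alpha, xi):
    """lambda(t, v) = 4 alpha v / (xi^2 (exp(alpha t) - 1))."""
    return 4.0 * alpha * v / (xi**2 * (np.exp(alpha * t) - 1.0))

def sample_cir(v_prev, dt, alpha, theta, xi, rng):
    """Draw v_i given v_{i-1} from the scaled non-central chi-squared law."""
    nu = 4.0 * alpha * theta / xi**2                 # degrees of freedom
    k = cir_scale(dt, alpha, xi)
    lam = cir_noncentrality(dt, v_prev, alpha, xi)
    return k * rng.noncentral_chisquare(nu, lam)
```

A useful check is that k(t)(ν + λ(t, v)) equals θ + (v − θ)e^{−αt}, the known conditional mean of the CIR process.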
To understand precisely how each ΔXi was represented, consider the following.
First, a 5 × 10^4 × 5 array composed of 5 × 10^4 instances of a five-dimensional
quasi-random uniform sequence was created using Rmetrics' R
package fOptions.31 Quasi-random variables were used rather than pseudo-random
variables to avoid clumping or clustering of variables. Their effect
is to span the support of the equity asset shocks more homogeneously, and
with fewer variables, than would have been possible with pseudo-random
numbers. I note that quasi-random rather than
pseudo-random numbers could have been used in the previous section on
PEDCP variables but this was not found to be necessary. It was advantageous
when modeling the equity asset process since an entire process of 360
distributions was sought, with much larger memory implications. The function
[Link]() was used with a seed value of 1983 and Owen and Faure-Tezuka
type scrambling. The remaining steps in generating the data were: (i)
to generate two 5 × 10^4-vectors of standard normal deviates and one 5 × 10^4-vector
of Poisson data using the third, fourth and fifth columns of the Sobol
array and the appropriate quantile functions; (ii) to loop over the 360 time-periods
constructing the scaled non-central chi-squared level and step-change
data using the first and second columns of the Sobol array and appropriate
quantile functions, and then marginalizing to determine the equity asset shock
data; and (iii) to determine the coefficients in the Chebyshev representations
using K = 100 terms.
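Step (i) can be illustrated with a small, dependency-free sketch. Note the substitution: a plain (unscrambled) Halton sequence stands in for the scrambled Sobol sequence used in the text, purely to keep the example self-contained. All function names are hypothetical, and `jump_mean` plays the role of λΔt.

```python
import math
from statistics import NormalDist

PRIMES = (2, 3, 5, 7, 11)  # one base per column of the five-dimensional sequence

def halton(index, base):
    """index-th term (index >= 1) of the van der Corput sequence in the given base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result

def poisson_quantile(u, mean):
    """Smallest n with P(N <= n) >= u for N ~ Poisson(mean)."""
    n, pmf = 0, math.exp(-mean)
    cdf = pmf
    while cdf < u:
        n += 1
        pmf *= mean / n
        cdf += pmf
    return n

def quasi_random_draws(n_points, jump_mean):
    """Return tuples (u1, u2, z1, z3, dN): columns 1-2 stay uniform, 3-5 are transformed."""
    normal = NormalDist()
    draws = []
    for i in range(1, n_points + 1):
        u = [halton(i, b) for b in PRIMES]
        z1 = normal.inv_cdf(u[2])                  # third column -> normal deviate
        z3 = normal.inv_cdf(u[3])                  # fourth column -> normal deviate
        dn = poisson_quantile(u[4], jump_mean)     # fifth column -> Poisson count
        draws.append((u[0], u[1], z1, z3, dn))
    return draws
```

The first two uniform columns are left untransformed here; in the procedure described above they would be pushed through the scaled non-central chi-squared quantile functions at each of the 360 time periods.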
It is important to emphasize that the derivation of the equity asset shock
representations is a one-off cost that was done upfront in relatively little time.
31 A package for “Pricing and Evaluating Basic Options” maintained on CRAN by the
Recovery of vi is now only attainable conditioned upon ΔXi and vi−1 . Using
the quasi-random five-dimensional Sobol set, one must represent the condi-
tional distribution of vi as a function of the tuple (ΔXi , vi−1 ) in a bivariate
Chebyshev expansion at each step of the simulation. Only in this way can the
stochastic variance process vi be consistently realized alongside its equity asset
shock process ΔXi at the time of simulation. Given that a relatively small
number of terms was necessary for an accurate representation of the equity asset
shock process (around 15 terms were found to be sufficient, see Figure 5.9),
one anticipates that roughly the square of this number of terms will be needed
to represent the variance process. It is stressed that this step is not necessary
in the projection of correlated equity assets. However, it must be done in this
way to produce a consistent stochastic variance process if that is desired.
The empirical distribution functions of the equity asset shock distributions
ΔXi at different time horizons are shown in Figure 5.8. The representations
in terms of Chebyshev polynomials are omitted from the plot as they are so
similar the differences are immaterial.
Bar plots of the coefficients at different timesteps: 1 month, 90 months
(7.5 years), 180 months (15 years), and 360 months (30 years) are shown
in Figure 5.9. Out of the K = 100 coefficients, only the first 15 are
non-negligible.
Two equity asset processes \(S_t^{(1)}\) and \(S_t^{(2)}\) with the same underlying
parametrization (as given above) are now simulated where the correlation32
between their marginal shocks is set to ρ̃ = 0.8. This is done by modifying
the standard uniform shock pair u = (u1, u2)⊤ to v = (v1, v2)⊤ that seeds the
equity asset shock quantile function. If the correlation matrix between the two
32 Note: this is different from the correlation between the equity asset and stochastic
variance shocks ρ.
[Figure: distribution function of the equity asset shock by timestep (1, 90, 180, 360).]
FIGURE 5.9: Bar plots of the coefficients from the Chebyshev polynomial
representation of the SVJD equity asset shock distribution.
5.5 Discussion
In this chapter I have described the role of scenario generators within the
context of insurance and have outlined some challenges that they present.
Motivating the chapter with a high-level overview of the Solvency 2 Euro-
pean directive, I illustrated the use of scenario generators particularly when
an insurer chooses to build its own internal model. This approach is costly but
the cost may be offset against the potentially punitive regulatory capital laid
down by the simpler yet prescriptive standard model. I described how Pillar
1 required a quantitative measure of downside risk: the 1 year VaR from an
insurer’s loss distribution. A scenario generator is then required to upload eco-
nomic and risk scenarios to its asset and liability system to enable modeling
of its market and non-market risks. Moreover, under Pillar 2, an insurer must
FIGURE 5.10: (a) Two correlated SVJD paths and (b) scatterplot of the
SVJD equity asset shocks (correlation = 0.8026).
References
Antonio, D. and Roseburgh, D. Fitting the Yield Curve: Cubic Spline Interpolation
and Smooth Extrapolation. Barrie & Hibbert Knowledge Base Article, Edin-
burgh, UK, 2010.
Berndt, A. R., Douglas, R., Duffie, D., Ferguson, F., and Schranz, D. Measuring
Default Risk in Premia from Default Swap Rates and EDFs. Preprint, Stanford
University. 2004.
Brace, A., Gatarek, D., and Musiela, M. The market model of interest rate dynamics.
Math. Fin., Vol 7(2), pp 127–154, 1997.
Brigo, D. and Mercurio, F. Interest Rates Models: Theory and Practice. Springer,
Berlin, 2006.
Carr, P., Geman, H., Madan, D. and Yor, M. The fine structure of asset returns: An
empirical investigation. J. Busin., Vol 75(2), pp 305–332, 2002.
Challenges in Scenario Generation 169
Duffie, D., Pan, J., and Singleton, K. Transform analysis and asset for affine jump-
diffusions. Econometric, Vol 68(6), pp 1343–1376, 2000.
Haberman, S. and Renshaw, A. 2008. Mortality, longevity and experiments with the
Lee-Carter model. Lifetime Data Anal., Vol 14, pp 286, doi: 10.1007/s10985-
008-9084-2
Hagan, P., Kumar, D., Lesniewski, A. E., and Woodward, D. Managing smile risk.
Wilmott Magazine, Vol 1, pp 84–108, 2002.
Hale, J. K. and Kocak, H. Dynamics and bifurcations. Texts Appl. Math., Vol 3, pp
444–494, Springer 1991.
Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning:
Data Mining, Inference and Prediction. (2nd ed.), Springer, New York, NY,
2009.
Heath, D., Jarrow, R., and Morton, A. Bond pricing and the term structure of
interest rates: A discrete time approximation. J. Fin. Quant. An., Vol 25, pp
419–440, 1990.
Heston, S. L. A closed-form solution for options with stochastic volatility with appli-
cations to bond and currency options. Rev. Fin. Studies., Vol 6(2), pp 327–343,
1993.
Hull, J. C. Options, Futures and Other Derivatives. (6th ed.), Prentice-Hall, Upper
Saddle River, NJ, 2005.
Hull, J. and White, A. Pricing interest-rate derivative securities. Rev. Fin. Stud.,
Vol 3(4), pp 573–592, 1990.
170 High-Performance Computing in Finance
Mackenzie, D. and Spears, T. The Formula that Killed Wall Street?, The Gaussian
Copula and the Material Cultures of Modelling. Working Paper, 2012.
McNeil, A. J., Frey, R., and Embrechts, P. Quantitative Risk Management: Concepts,
Techniques and Tools. (2nd ed.), Princeton University Press. Woodstock, UK;
New Jersey, USA, 2015.
Neil, C., Clark, D., Kent, J., and Verheugen, H. A Brief Overview of Current
Approaches to Operational Risk under Solvency II. Milliman White Paper series,
2012.
Varnell, E. M. Economic scenario generators and solvency II. BAJ, Vol 16, pp 121–
159, 2011. doi: 10.1017/S1357321711000079.
Challenges in Scenario Generation 171
Wu, L. and Zhang, F. Libor market model with stochastic volatility. J. Indust.
Mgmt. Opt., Vol 2(2), pp 199–227, 2006.
Numerical Methods in Financial High-Performance Computing (HPC)
Chapter 6
Finite Difference Methods for
Medium- and High-Dimensional
Derivative Pricing PDEs
CONTENTS
6.1 Introduction
6.2 Finite Difference Schemes
6.3 Decomposition Methods
6.3.1 Anchored-ANOVA decomposition
6.3.2 Constant coefficient PDEs
6.3.3 Variable coefficients: Full freezing
6.3.4 Partial freezing
6.3.5 Partial freezing and zero-correlation approximation
6.4 Theoretical Results
6.4.1 Constant coefficients
6.4.2 Variable coefficients
6.5 Numerical Examples
6.5.1 Time-dependent simple correlation
6.5.2 Time-dependent exponential correlation
6.5.3 Time-dependent volatilities, simple correlation
6.5.4 Time-dependent volatilities, exponential correlation
6.5.5 Asset-dependent correlation
6.6 Conclusion
References
6.1 Introduction
Many models in financial mathematics and financial engineering, particu-
larly in derivative pricing, can be formulated as partial differential equations
(PDEs). Specifically, for the most commonly used continuous-time models of
asset prices the value function of a derivative security, that is the option value
as a function of the underlying asset price, is given by a PDE. This opens
up the possibility to use accurate approximation schemes for PDEs for the
numerical computation of derivative prices.
As the computational domain is normally a box, or can be restricted to
one by truncation, the construction of tensor product meshes and spatial finite
difference stencils is straightforward [1]. Accurate and stable splitting methods
have become standard for efficient time integration [2].
Notwithstanding this, the most common approach in the financial industry
appears to be Monte Carlo methods. This is partly a result of the perception
that PDE schemes, although highly efficient for simple contracts, are less flex-
ible and harder to adapt to more exotic features. In particular, the widespread
belief is that PDE schemes become too slow for practical use if the number of
underlying variables exceeds 3.
Indeed, the increase in computational time and memory requirements of
standard mesh-based methods with the dimension is exponential and has
become known as the “curse of dimensionality.” Various methods, such as
sparse grids [3,4], radial basis functions [5], and tensor approaches ([6] for an
application to finance and [7] for a literature survey), have been proposed
to break this curse. These methods can perform remarkably well for special
cases, but have not been demonstrated to give accurate enough solutions for
truly high dimensions in applications (larger than, say, 5).
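The exponential growth is easy to quantify with a back-of-the-envelope calculation. The sketch below is illustrative only; it assumes a tensor-product mesh with the same number of points per axis and one 8-byte double per mesh point, and the function names are hypothetical.

```python
def mesh_points(n_per_axis, dim):
    """Number of nodes in a tensor-product mesh with n_per_axis points per coordinate."""
    return n_per_axis ** dim

def memory_gib(n_per_axis, dim, bytes_per_value=8):
    """Memory for one solution vector on that mesh, in GiB."""
    return mesh_points(n_per_axis, dim) * bytes_per_value / 2**30
```

With 100 points per axis, three dimensions need a million nodes, while five dimensions already need 10^10 nodes, or roughly 75 GiB for a single solution vector.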
In conversations about numerical methods for high-dimensional PDEs
inevitably the question comes up: “How high can you go?” This is a mean-
ingful question if one considers a specific type of PDE with closely defined
characteristics. But even within the fairly narrow class of linear second-order
parabolic PDEs which are most common in finance, the difficulty of solving
them varies vastly and depends on a number of factors: the input data (such
as volatilities and correlations), the boundary data (payoff), and the quantity
of interest (usually the solution of the PDE at a single point).
It is inherent to the methods presented in this chapter that it is not the
nominal dimension of a PDE which matters. A PDE which appears inaccessi-
ble to numerical methods in its raw form may be very easily approximated if
a more adapted coordinate system is chosen. This can be either because the
solution is already adequately described by a low number of principal com-
ponents (it has low “truncation dimension”), or because it can be accurately
represented as the sum of functions of a low number of variables (it has low
“superposition dimension”).
To exploit such features, we borrow ideas from data analysis to represent
the solutions by sums of functions which can be approximated by PDEs with
low effective dimension. More specifically, the method is a “dynamic” version
of the anchored-ANOVA decompositions which were applied to integration
problems in finance in Reference 8. A version which is equivalent in special
cases has been independently derived via PDE expansions in Reference 3; a
detailed error analysis is found in Reference 9 and also in Reference 10; an
efficient parallelization strategy is proposed in Reference 11; and the method
is extended to complex derivatives in Reference 12 and to CVA computations
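\[ \frac{\partial V}{\partial t} + \sum_{i=1}^{N} \mu_i \frac{\partial V}{\partial s_i} + \frac{1}{2} \sum_{i,j=1}^{N} \sigma_i \sigma_j \rho_{ij} \frac{\partial^2 V}{\partial s_i \partial s_j} - \alpha V = 0, \]
\[ V(s, T) = h(s). \]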
For simplicity, we consider functions defined on the whole of RN , but it will
become clear how to deal with bounded domains.
Let p(y, t; s, 0) be the transition density function of St at y given state s
at t = 0. Then if α does not depend on S, we can write
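\[ V(s, 0) = \exp\left(-\int_0^T \alpha(u)\,du\right) \int_{\mathbb{R}^N} p(y, T; s, 0)\,h(y)\,dy. \]

\[ -\frac{\partial p}{\partial t} - \sum_{i=1}^{N} \frac{\partial}{\partial y_i}(\mu_i p) + \frac{1}{2} \sum_{i,j=1}^{N} \frac{\partial^2}{\partial y_i \partial y_j}(\sigma_i \sigma_j \rho_{ij}\,p) = 0, \]
\[ p(y, 0; s, 0) = \delta(y - s), \]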
where δ is the Dirac distribution centered at 0.
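\[ \frac{\partial V}{\partial t} = \sum_{i=1}^{N} \mu_i \frac{\partial V}{\partial s_i} + \frac{1}{2} \sum_{i,j=1}^{N} \sigma_i \sigma_j \rho_{ij} \frac{\partial^2 V}{\partial s_i \partial s_j} - \alpha V, \tag{6.3} \]
\[ V(s, 0) = h(s), \tag{6.4} \]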
where we keep the symbols t and V for simplicity. We now transform the
PDE into a standard form by using a rotation and subsequent translation
of the spatial coordinates. For a given orthogonal matrix Q ∈ RN ×N , define
β : RN × [0, T ] → RN componentwise by
\[ \beta_i(x, t) \equiv \sum_{j=1}^{N} Q_{ji} \int_0^t \mu_j(x, T - u)\,du \tag{6.5} \]
and set
\[ a = Q^T s_0 + \beta(s_0, T). \tag{6.7} \]
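\[ \frac{\partial V}{\partial t} = LV := \sum_{k,l=1}^{N} \lambda_{kl} \frac{\partial^2 V}{\partial x_k \partial x_l} + \sum_{k=1}^{N} \kappa_k \frac{\partial V}{\partial x_k} - \alpha V, \tag{6.8} \]
\[ V(x, 0) = g(x) := h(s(x, 0)), \tag{6.9} \]
\[ \lambda_{kl}(x, t) = \frac{1}{2} \sum_{i,j=1}^{N} Q_{ik} Q_{jl}\,\sigma_i \sigma_j \rho_{ij}, \qquad
\kappa_k(x, t) = \sum_{i=1}^{N} Q_{ik} \left[\mu_i - \mu_i(s_0, T - t)\right], \tag{6.10} \]
\[ Q = (q_1, \ldots, q_N), \qquad \frac{1}{2} \Sigma q_i = \lambda_i q_i, \qquad \lambda_1 \ge \cdots \ge \lambda_N \ge 0, \tag{6.11} \]
and get \((\lambda_{kl})_{1 \le k,l \le N} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)\) as a constant diagonal matrix.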
If μ does not depend on the spatial coordinates x but only on t, then
the difference under the sum in Equation 6.10 vanishes identically and thus
κ(x, t) ≡ 0.
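The diagonalization in Equation 6.11 can be sketched numerically as follows. This assumes, as a labeled assumption, that Σ denotes the covariance matrix with entries σiσjρij, and that σ and ρ are constant; the function name is hypothetical.

```python
import numpy as np

def principal_axes(sigma, rho):
    """Eigen-decomposition of (1/2) * Sigma with eigenvalues sorted in decreasing order."""
    cov = np.outer(sigma, sigma) * rho        # Sigma_ij = sigma_i * sigma_j * rho_ij
    lam, Q = np.linalg.eigh(0.5 * cov)        # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]             # reorder so lambda_1 >= ... >= lambda_N
    return lam[order], Q[:, order]
```

The columns of the returned Q are the vectors qi, and Q.T @ (Sigma / 2) @ Q reproduces diag(λ1, ..., λN) up to rounding.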
Moreover, if α is also only a function of t, the zero order term can be
eliminated from Equation 6.8 by considering \(\exp\left(\int_0^t \alpha(T - u)\,du\right) V\).
If all this is satisfied, then L simplifies to the N -dimensional heat operator
in Equation 6.12. Keeping the symbol V for the transformed value function
and L for the operator for simplicity, we obtain
\[ \frac{\partial V}{\partial t} = LV = \sum_{k=1}^{N} \lambda_k \frac{\partial^2 V}{\partial x_k^2}, \tag{6.12} \]
\[ V(x, 0) = g(x), \tag{6.13} \]
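Δt > 0 and the spatial mesh sizes Δxi > 0, i = 1, . . . , d, where d is the
dimension of the PDE. We first define basic finite difference operators
\begin{align*}
\delta_t V_h(\cdot, t) &= \frac{V_h(\cdot, t + \Delta t) - V_h(\cdot, t)}{\Delta t}, \\
\delta_{x_i} V_h(\cdot, t) &= \frac{V_h(\cdot + \Delta x_i, t) - V_h(\cdot - \Delta x_i, t)}{2 \Delta x_i}, \\
\delta_{x_{i,i}} V_h(\cdot, t) &= \frac{V_h(\cdot + \Delta x_i, t) - 2 V_h(\cdot, t) + V_h(\cdot - \Delta x_i, t)}{\Delta x_i^2}, \\
\delta_{x_{i,j}} V_h &= \delta_{x_i} \delta_{x_j} V_h, \quad i \ne j,
\end{align*}
\[ L(t) = \sum_{i=1}^{d} \kappa_i(\cdot, t)\,\delta_{x_i} + \sum_{i,j=1}^{d} \lambda_{ij}(\cdot, t)\,\delta_{x_{i,j}} - \alpha(\cdot, t), \]
and, following Reference 2, we define one operator \(L_0^n\) which accounts for the
mixed derivative terms,
\[ L_0^n = \sum_{i \ne j} \lambda_{ij}(\cdot, t_n)\,\delta_{x_{i,j}}. \]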
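The spatial difference operators defined above can be sketched on a uniform mesh with NumPy slicing. This is a minimal illustration evaluated at interior nodes only; boundary treatment is omitted and the function names are hypothetical.

```python
import numpy as np

def delta_x(V, dx):
    """Central first difference (V[i+1] - V[i-1]) / (2 dx) at interior nodes."""
    return (V[2:] - V[:-2]) / (2.0 * dx)

def delta_xx(V, dx):
    """Second difference (V[i+1] - 2 V[i] + V[i-1]) / dx^2 at interior nodes."""
    return (V[2:] - 2.0 * V[1:-1] + V[:-2]) / dx**2

def delta_xy(V, dx, dy):
    """Mixed difference delta_x delta_y on a 2D array V[i, j] at interior nodes."""
    return (V[2:, 2:] - V[2:, :-2] - V[:-2, 2:] + V[:-2, :-2]) / (4.0 * dx * dy)
```

All three stencils are exact for quadratic polynomials, so applying them to x², or to the product xy in the mixed case, gives a quick correctness check.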
The scheme is unconditionally stable for all θ ≥ 1/2 and of second order in
time for θ = 1/2 (otherwise of first order, see Reference 20).
A second-order modification of the above scheme was proposed by Brian
[17], where the first two steps are as above with θ = 1 and step size Δt/2, and
the last step (6.15) is replaced by a Crank–Nicolson-type step
\[ \frac{U_n - U_{n-1}}{\Delta t} = \frac{1}{2} \sum_{j=1}^{d} \left(L_j^n + L_j^{n-1}\right) Y_j. \]
For \(L_0^n \ne 0\), that is, with cross-derivative terms present as in the general
case of Equation 6.14, second order gets lost and an iteration of the idea is
needed. The HV scheme [18],
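\[ g = \sum_{u \subseteq N} \Delta g_u = \sum_{k=0}^{N} \sum_{|u| = k} \Delta g_u. \tag{6.18} \]

\[ \frac{\partial V}{\partial t} = LV = \sum_{k=1}^{N} \lambda_k \frac{\partial^2 V}{\partial x_k^2}, \tag{6.19} \]
\[ V(\cdot, 0) = g, \tag{6.20} \]
with constant λ.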
Given an initial-value problem of the form 6.19 and 6.20, and an index set
u ⊆ N , define a differential operator
\[ L_u = \sum_{k \in u} \lambda_k \frac{\partial^2}{\partial x_k^2}, \]
where ck are integer constants that depend on the dimensions N and s. The
point to note is that Vu is essentially a |u|-dimensional function as it only
depends on the fixed anchor and |u| components of x.
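To make the decomposition concrete, the sketch below applies an anchored-ANOVA expansion to a plain function g rather than to a PDE solution. It assumes the standard inclusion-exclusion definition of the terms Δgu, with coordinates outside u frozen at the anchor a; the constants ck of the text are not reproduced, and all names are hypothetical.

```python
from itertools import combinations

def anchored_value(g, x, a, u):
    """Evaluate g at the anchor a with the coordinates listed in u taken from x."""
    point = [x[i] if i in u else a[i] for i in range(len(a))]
    return g(point)

def anova_term(g, x, a, u):
    """Delta g_u(x): alternating sum of anchored values over all subsets w of u."""
    total = 0.0
    for size in range(len(u) + 1):
        for w in combinations(u, size):
            total += (-1) ** (len(u) - size) * anchored_value(g, x, a, set(w))
    return total

def anova_approximation(g, x, a, max_order):
    """Sum of all terms Delta g_u with |u| <= max_order."""
    n = len(a)
    return sum(anova_term(g, x, a, u)
               for k in range(max_order + 1)
               for u in combinations(range(n), k))
```

For an additive function the first-order truncation is already exact, which mirrors the low superposition dimension discussed in the introduction; each term of order k only requires evaluations that vary k coordinates.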
In situations where one or several coordinates play a dominant role, it will
be useful to consider a generalization of Equation 6.23 to
\[ V_{r,s} = \sum_{k=0}^{s} c_k \sum_{|u| = k} V_{u \cup \{1, \ldots, r\}}, \qquad r + s \le N. \tag{6.24} \]
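\[ \frac{\partial V_u}{\partial t} = \sum_{i \in u} \lambda_{ii}(a, 0) \frac{\partial^2 V_u}{\partial x_i^2}, \qquad V_u(x, 0) = g(x). \]

\[ \frac{\partial V_u}{\partial t} = \sum_{i \in u} \kappa_i(a \backslash x_u, t) \frac{\partial V_u}{\partial x_i} + \sum_{i,j \in u} \lambda_{ij}(a \backslash x_u, t) \frac{\partial^2 V_u}{\partial x_i \partial x_j}, \qquad V_u(x, 0) = g(x). \]

\[ \frac{\partial V_u}{\partial t} = \sum_{i \in u} \kappa_i(a \backslash x_u, t) \frac{\partial V_u}{\partial x_i} + \sum_{i \in u} \lambda_{ii}(a \backslash x_u, t) \frac{\partial^2 V_u}{\partial x_i^2}, \qquad V_u(x, 0) = g(x). \]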
This extra approximation in addition to Section 6.3.4 does not give any
further dimension reduction, but simplifies the PDEs somewhat, that is, no
cross-derivative terms are present, which simplifies the construction of numer-
ical schemes.
The analysis in Reference 9 derives PDEs for the error itself, and then
makes use of standard maximum principle-type arguments to estimate the
size of the error.
For instance, by using the PDEs satisfied by V and V{1,...,r,i} for different
i, it can be shown that
\begin{align*}
\frac{\partial}{\partial t} \widehat{V}_{r,1} &= L_{\{1,\ldots,r\}} \widehat{V}_{r,1}
+ \sum_{i=r+1}^{N} \left(L_{\{1,\ldots,r,i\}} - L_{\{1,\ldots,r\}}\right) V_{\{1,\ldots,r,i\}} + \left(L_{\{1,\ldots,r\}} - L\right) V \\
&= \sum_{k=1}^{r} \lambda_k \frac{\partial^2}{\partial x_k^2} \widehat{V}_{r,1} + \sum_{k=r+1}^{N} \lambda_k \frac{\partial^2}{\partial x_k^2} \left(V_{\{1,\ldots,r,k\}} - V\right). \tag{6.27}
\end{align*}
This is an inhomogeneous heat equation for \(\widehat{V}_{r,1}\) with zero initial data and a
right-hand side which can be shown to be small. As a consequence, the solution
itself is small. Informally, the terms \(V_{\{1,\ldots,r,k\}} - V\) on the right-hand side
are of order \(O(\lambda_{r+1} + \cdots + \lambda_N - \lambda_k)\), and hence the right-hand side is of
order \(O(\sum_{r<i<j\le N} \lambda_i \lambda_j)\). A slightly more careful argument gives the precise
bound (Equation 6.25), and a similar but lengthier argument for Vr,2 gives
Equation 6.26.
A number of comments are in order regarding the smoothness requirements
dictated by the error bounds. First, most option payoffs are nonsmooth, have
kinks and discontinuities. This would appear to render Equation 6.25 and its
higher order versions meaningless. A reworking of the derivation shows that
g can actually be replaced by Vr,0 , which is the solution to
\[ \frac{\partial V_{r,0}}{\partial t} = \sum_{k=1}^{r} \lambda_k \frac{\partial^2 V_{r,0}}{\partial x_k^2}, \]
\[ V_{r,0}(x, 0) = g(x). \]
we can still derive a PDE for the expansion error even for nonconstant coeffi-
cients. Straightforward calculus yields an expression similar to Equation 6.27,
namely
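\begin{align*}
\frac{\partial}{\partial t} \widehat{V}_{r,1} &= \sum_{k,l=1}^{r} \lambda_{kl}(z, t) \frac{\partial^2}{\partial x_k \partial x_l} \widehat{V}_{r,1} \\
&\quad + \sum_{k=r+1}^{N} \left(\lambda_{kk} \frac{\partial^2}{\partial x_k^2} + \sum_{l=1}^{r} \lambda_{kl} \frac{\partial^2}{\partial x_k \partial x_l}\right) \left(V_{\{1,\ldots,r,k\}} - V\right) \tag{6.28} \\
&\quad - \sum_{k,l=r+1,\ k \ne l}^{N} \lambda_{kl} \frac{\partial^2}{\partial x_k \partial x_l} V \tag{6.29} \\
&\quad + \sum_{k=r+1}^{N} \kappa_k \frac{\partial}{\partial x_k} \left(V_{\{1,\ldots,r,k\}} - V\right) \tag{6.30}
\end{align*}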
This equation contains three source terms, which determine the error size:
• The first term, (see Equation 6.28), is similar to the source term appear-
ing in the constant coefficient case. It is essentially a restricted differential
operator applied to the difference between full and partial solutions.
• The second term, (see Equation 6.29), consists of the nondiagonal terms
not captured at all in the expansion applied to the full solution. It con-
tains the full solution rather than the difference between full and partial
ones, but the λkl involved are zero for t = 0 and x = a.
• The third term, (see Equation 6.30), where κk (a, 0) = 0, captures the
changes in κ and again acts on the differences between partial and full
solutions.
Away from these initial coordinates, the terms grow slowly and drive a nonzero
error.
Instead of investigating this further theoretically, we give quantitative
examples in the following section.
model for the underlying stock has variable coefficients. We list six “base”
cases of how the PDE coefficients can be varied in Table 6.1.
Consider assets whose dynamics for the prices \(S_t^1, \ldots, S_t^N\) are given by
\[ d(\log S_t^i) = -\frac{1}{2} \sigma_i^2(S_t, t)\,dt + \sigma_i(S_t, t)\,dW_t^i, \qquad 1 \le i \le N, \]
under the risk-neutral measure with zero interest rates. By considering log
prices as primitive variable in Equation 6.1, in a Black–Scholes setting, that
is, if σ and ρ are constant, the PDE coefficients are constant. Generally, the
Brownian motions W i are correlated according to the correlation matrix
\begin{align*}
\omega^1 &= (1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10, 1/10), \\
\omega^2 &= (4/30, 4/30, 4/30, 4/30, 4/30, 2/30, 2/30, 2/30, 2/30, 2/30), \\
\omega^3 &= (1/4, 1/4, 1/4, 1/4, 1/4, 1/4, 1/4, -1/4, -1/4, -1/4).
\end{align*}

\[ \rho(t) = \rho_{\text{simple}}\left(0.8 - 0.8 \cdot (t/T - 0.5)^2\right) \in [\rho_{\text{simple}}(0.6), \rho_{\text{simple}}(0.8)]. \]
PDE/ADI and PDE/HV results were almost identical and very close to the
MC results. Only in the third case of ω 3 did they even differ in a statistically
significant way, that is, relative to the standard error σMC , from the MC
computation. It is worth noting that the errors are even slightly larger in the
fully frozen case, implying that the variable coefficients present no particular
problem in this model.
heat equation is exact. This case is simple: it merely requires the solution of
a heat equation with time-dependent diffusion coefficients.
For time-dependent σi = σi (t), that is, the case where the volatilities vary
differently over time, the eigenvectors change with t. This in general leads to
the appearance of nonzero off-diagonal terms. With no dependency on the
asset values S, the initial PDE transformation means that those terms vanish
at time t = 0 and then grow over time for t > 0.
Table 6.4 shows results for ρ = ρsimple(0.7) and
\[ \sigma_i(t) = 0.1\,(1 + t/T) \left(1 + \frac{i - 1}{N - 1}\right) \in [0.1, 0.2] \left(1 + \frac{i - 1}{N - 1}\right). \]
Both the PDE/diagonal ADI and PDE/HV results are fairly accurate for
the first two test cases. They both struggle with the third one, producing
errors of 2.42% and 2.66%. Given that a similar error is present in the fully
localized case, that is, for the model with constant coefficients, we conclude
that this error is primarily due to the expansion method being applied to the
challenging payout direction ω 3 , rather than the nonconstant nature of the
coefficients.
6.6 Conclusion
This chapter describes a systematic approach to approximating medium-
to high-dimensional PDEs in derivative pricing by a sequence of lower dimen-
sional PDEs, which are then accessible to state-of-the-art finite difference
methods. The splitting is accurate especially in situations where the dynam-
ics of the underlying stochastic processes can be described well by a lower
References
1. Tavella, D. and Randall, C. Pricing Financial Instruments: The Finite Differ-
ence Method. Wiley, New York, 2000.
2. in ’t Hout, K. J. and Foulon, S. ADI finite difference schemes for option pricing.
Int. J. Numer. Anal. Mod., 7(2):303–320, 2010.
5. Pettersson, U., Larsson, E., Marcusson, G., and Persson, J. Improved radial
basis function methods for multi-dimensional option pricing. J. Comput. Appl.
Math., 222(1):82–93, 2008.
6. Kazeev, V., Reichmann, O., and Schwab, C. Low-rank tensor structure of lin-
ear diffusion operators in the TT and QTT formats. Linear Algebra Appl.,
438(11):4204–4221, 2013.
10. Hilber, N., Kehtari, S., Schwab, C., and Winter, C. Wavelet finite element
method for option pricing in high-dimensional diffusion market models. Techni-
cal Report 2010–01, SAM, ETH Zürich, 2010.
11. Schröder, P., Schober, P., and Wittum, G. Dimension-wise decompositions and
their efficient parallelization. Electronic version of an article published in Recent
Developments in Computational Finance, Interdisciplinary Mathematical Sci-
ences, 14:445–472, 2013.
13. de Graaf, C. S. L., Kandhai, D., and Reisinger, C. Efficient exposure computa-
tion by risk factor decomposition. arXiv preprint arXiv:1608.01197, 2016.
14. Reisinger, C. Asymptotic expansion around principal components and the
complexity of dimension adaptive algorithms. In Garcke, J. and Griebel, M., editors,
Sparse Grids and Applications, number 88 in Springer Lecture Notes in
Computational Science and Engineering, Springer-Verlag, Berlin, Heidelberg, pages
263–276, 2012.
15. Schröder, P., Gerstner, T., and Wittum, G. Taylor-like ANOVA-expansion for
high dimensional problems in finance. Working paper, 2012.
19. Douglas, J. Alternating direction methods for three space variables. Numer.
Math., 4(1):41–63, 1962.
21. Haentjens, T. and in ’t Hout, K. J. ADI finite difference schemes for the Heston–
Hull–White PDE. Journal of Computational Finance, 16(1):83–110, 2012.
Chapter 7
Multilevel Monte Carlo Methods
for Applications in Finance∗
CONTENTS
7.1 Introduction
7.2 Multilevel Monte Carlo
    7.2.1 Monte Carlo
    7.2.2 Multilevel Monte Carlo theorem
    7.2.3 Improved multilevel Monte Carlo
    7.2.4 Stochastic differential equations
    7.2.5 Euler and Milstein discretizations
    7.2.6 Multilevel Monte Carlo algorithm
7.3 Pricing with Multilevel Monte Carlo
    7.3.1 Euler–Maruyama scheme
    7.3.2 Milstein scheme
        7.3.2.1 Lookback options
    7.3.3 Conditional Monte Carlo
    7.3.4 Barrier options
    7.3.5 Digital options
7.4 Greeks with Multilevel Monte Carlo
    7.4.1 Monte Carlo greeks
    7.4.2 Multilevel Monte Carlo greeks
    7.4.3 European call
    7.4.4 Conditional Monte Carlo for pathwise sensitivity
    7.4.5 Split pathwise sensitivities
    7.4.6 Optimal number of samples
    7.4.7 Vibrato Monte Carlo
7.5 Multilevel Monte Carlo for Jump-Diffusion Processes
    7.5.1 A jump-adapted Milstein discretization
        7.5.1.1 Multilevel Monte Carlo for constant jump rate
Algorithms and Applications, Thomas Gerstner and Peter Kloeden, editors, Copyright © 2013 by World Scientific Publishing Co. Pte. Ltd.
198 High-Performance Computing in Finance
7.1 Introduction
In 2001, Heinrich [1] developed a multilevel Monte Carlo method for
parametric integration, in which one is interested in estimating the value of
$\mathbb{E}[f(x,\lambda)]$, where $x$ is a finite-dimensional random variable and $\lambda$ is a
parameter. In the simplest case in which $\lambda$ is a real variable in the range $[0,1]$, having
estimated the value of $\mathbb{E}[f(x,0)]$ and $\mathbb{E}[f(x,1)]$, one can use $\tfrac{1}{2}(f(x,0)+f(x,1))$
as a control variate when estimating the value of $\mathbb{E}[f(x,\tfrac{1}{2})]$, since the variance
of $f(x,\tfrac{1}{2})-\tfrac{1}{2}(f(x,0)+f(x,1))$ will usually be less than the variance of $f(x,\tfrac{1}{2})$.
This approach can then be applied recursively for other intermediate values
of $\lambda$, yielding large savings if $f(x,\lambda)$ is sufficiently smooth with respect to $\lambda$.
Giles’ multilevel Monte Carlo path simulation [2] is both similar and
different. There is no parametric integration, and the random variable is
infinite-dimensional, corresponding to a Brownian path in the original paper. However,
the control variate viewpoint is very similar. A coarse path simulation is used
as a control variate for a more refined fine path simulation, but since the
exact expectation for the coarse path is not known, this is in turn estimated
recursively using even coarser path simulations as control variates. The
coarsest path in the multilevel hierarchy may have only one timestep for the entire
interval of interest.
A similar two-level strategy was developed slightly earlier by Kebaier [3],
and a similar multilevel approach was under development at the same time
by Speight [4,5].
In this review article, we start by introducing the central ideas in multilevel
Monte Carlo (MLMC) simulation, and the key theorem from Reference 2
which gives the greatly improved computational cost if a number of conditions
Multilevel Monte Carlo Methods for Applications in Finance 199
simulations is given by
\[
\mathbb{E}\bigl[(\hat{Y} - \mathbb{E}[Y])^2\bigr] = O\!\left(\frac{1}{N}\right) + O(\Delta t^2).
\]
To ensure the root-mean-square error is proportional to $\varepsilon$, we must have
$MSE = O(\varepsilon^2)$ and therefore $1/N = O(\varepsilon^2)$ and $\Delta t^2 = O(\varepsilon^2)$, which means
$N = O(\varepsilon^{-2})$ and $\Delta t = O(\varepsilon)$. The computational cost of a standard Monte
Carlo simulation is proportional to the number of paths $N$ multiplied by the
cost of generating a path, that is, the number of timesteps in each sample
path. Therefore, the cost is $C = O(\varepsilon^{-3})$. In the following section, we will
show that using MLMC we can reduce the complexity of achieving root mean
square error $\varepsilon$ to $O(\varepsilon^{-2})$.
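The cost argument above can be checked numerically with a small sketch. The function name `standard_mc_cost`, the constants `c_var` and `c_bias`, and the equal split of the error budget between variance and squared bias are illustrative assumptions, not part of the original text.

```python
import math

def standard_mc_cost(eps, T=1.0, c_var=1.0, c_bias=1.0):
    """Paths, timesteps and total cost for a target RMS error eps.

    Assumes MSE = c_var / N + (c_bias * dt)**2 and gives half of the
    budget eps**2 to each term (an illustrative choice).
    """
    n_paths = math.ceil(2.0 * c_var / eps**2)     # variance term <= eps^2 / 2
    dt = eps / (math.sqrt(2.0) * c_bias)          # squared bias <= eps^2 / 2
    n_steps = math.ceil(T / dt)                   # timesteps per path
    return n_paths, n_steps, n_paths * n_steps

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, standard_mc_cost(eps))
```

Each tenfold reduction in $\varepsilon$ multiplies the cost by roughly $10^3$, which is the $O(\varepsilon^{-3})$ behavior described above.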
and
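\[
\mathbb{E}[\hat{Y}] = \sum_{\ell=0}^{L} \mathbb{E}[Y_\ell] = \mathbb{E}[P_0] + \sum_{\ell=1}^{L} \mathbb{E}[P_\ell - P_{\ell-1}] = \mathbb{E}[P_L].
\]
\[
Y = \sum_{\ell=0}^{L} Y_\ell
\]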
In the classical Monte Carlo setting, we are mainly interested in the weak
approximation of SDEs 7.4. Given a smooth payoff $P : \mathbb{R}^d \to \mathbb{R}$, we say that
$X^\ell_{2^\ell}$ converges to $x(T)$ in a weak sense with order $\alpha$ if
Rate $\alpha$ is required in condition (i) of Theorem 7.1. However, for MLMC
condition (iii) of Theorem 7.1 is crucial. We have
\[
V_\ell \equiv \mathrm{Var}(P_\ell - P_{\ell-1}) \le \mathbb{E}\bigl[(P_\ell - P_{\ell-1})^2\bigr]
\]
It is clear now that in order to estimate the variance of the MLMC, we need
to examine the strong convergence property. The classical strong convergence on
the finite time interval $[0, T]$ is defined as
\[
\Bigl(\mathbb{E}\,\bigl\|x(T) - X^\ell_T\bigr\|^p\Bigr)^{1/p} = O(\Delta t_\ell^{\xi}), \quad \text{for } p \ge 2.
\]
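Even in the case of a globally Lipschitz continuous payoff $P$, the EM scheme does not
achieve $\beta = 2\xi > 1$, which is optimal in Theorem 7.1. In order to improve
the convergence of the MLMC variance, the Milstein approximation $X^\ell_n \approx x(n\,\Delta t_\ell)$
is considered, with $i$th component of the form [12]
\[
X^\ell_{i,n+1} = X^\ell_{i,n} + f_i(X^\ell_n)\,\Delta t_\ell + \sum_{j=1}^{m} g_{ij}(X^\ell_n)\,\Delta w^\ell_{j,n+1}
+ \sum_{j,k=1}^{m} h_{ijk}(X^\ell_n)\bigl(\Delta w^\ell_{j,n+1}\,\Delta w^\ell_{k,n+1} - \Omega_{jk}\,\Delta t_\ell - A^\ell_{jk,n}\bigr) \tag{7.7}
\]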
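where $\Omega$ is the correlation matrix for the driving Brownian paths, and $A^\ell_{jk,n}$
is the Lévy area defined as
\[
A^\ell_{jk,n} = \int_{n\Delta t_\ell}^{(n+1)\Delta t_\ell} \Bigl( \bigl(w_j(t) - w_j(n\Delta t_\ell)\bigr)\,dw_k(t) - \bigl(w_k(t) - w_k(n\Delta t_\ell)\bigr)\,dw_j(t) \Bigr).
\]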
The rate of strong convergence ξ for the Milstein scheme is double the value
we have for the EM scheme and therefore the MLMC variance for Lipschitz
payoffs converges twice as fast. However, this gain does not come without a
price. There is no efficient method to simulate Lévy areas, apart from dimen-
sion 2 [13–15]. In some applications, the diffusion coefficient g(x) satisfies a
commutativity property which gives
hijk (x) = hikj (x) for all i, j, k.
In that case, because the Lévy areas are antisymmetric (i.e., $A^\ell_{jk,n} = -A^\ell_{kj,n}$),
it follows that $h_{ijk}(X^\ell_n)\,A^\ell_{jk,n} + h_{ikj}(X^\ell_n)\,A^\ell_{kj,n} = 0$ and therefore the terms
involving the Lévy areas cancel and so it is not necessary to simulate them.
However, this only happens in special cases. Clark and Cameron [16] proved
for a particular SDE that it is impossible to achieve a better order of strong
convergence than the EM discretization when using just the discrete incre-
ments of the underlying Brownian motion. The analysis was extended by
Müller-Gronbach [17] to general SDEs. As a consequence if we use the stan-
dard MLMC method with the Milstein scheme without simulating the Lévy
areas, the complexity will remain the same as for EM. Nevertheless, Giles and
Szpruch showed in Reference 18 that by constructing a suitable antithetic esti-
mator one can neglect the Lévy areas and still obtain a multilevel correction
estimator with a variance which decays at the same rate as the scalar Milstein
estimator.
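\[
\hat{Y} = \sum_{\ell=0}^{L} Y_\ell .
\]
\[
\mathbb{V}[Y] = \sum_{\ell=0}^{L} \mathbb{V}[Y_\ell] = \sum_{\ell=0}^{L} \frac{1}{N_\ell}\,V_\ell ,
\]
First-order conditions show that $N_\ell = \lambda^{-1/2}\sqrt{V_\ell\,\Delta t_\ell}$, therefore
\[
\mathbb{V}[Y] = \sum_{\ell=0}^{L} \frac{V_\ell}{N_\ell} = \sum_{\ell=0}^{L} \frac{\sqrt{\lambda}}{\sqrt{V_\ell\,\Delta t_\ell}}\,V_\ell .
\]
Since we want $\mathbb{V}[Y] \le \varepsilon^2/2$, we can show that
\[
\lambda^{-1/2} \ge 2\,\varepsilon^{-2} \sum_{\ell=0}^{L} \sqrt{V_\ell/\Delta t_\ell},
\]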
Assuming $O(\Delta t_\ell)$ weak convergence, the bias of the overall method is equal
to $c\,\Delta t_L = c\,T\,2^{-L}$. If we want the bias to be proportional to $\varepsilon/\sqrt{2}$, we set
\[
L_{\max} = \frac{\log\bigl(\varepsilon/(c\,T\sqrt{2})\bigr)^{-1}}{\log 2}.
\]
From here we can calculate the overall complexity. We can now outline the
algorithm
1. Begin with L = 0
Most numerical tests suggest that $L_{\max}$ is not optimal and that we can
substantially improve MLMC by determining the optimal $L$ from an estimate of the bias. For
more details, see Reference 2.
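The two formulas above (the finest level from the bias condition and the per-level sample sizes from the first-order conditions) can be sketched as follows. The function names, the constant `c`, and the variance model `V_l = 0.1 * dt_l` are hypothetical choices made only to produce numbers; in practice the $V_\ell$ would be estimated from pilot samples.

```python
import math

def max_level(eps, c, T):
    """Smallest integer L with bias c*T*2**(-L) <= eps/sqrt(2)."""
    L = math.log(math.sqrt(2.0) * c * T / eps) / math.log(2.0)
    return max(0, math.ceil(L))

def optimal_samples(V, dt, eps):
    """N_l = ceil(2 eps^-2 sqrt(V_l dt_l) * sum_k sqrt(V_k / dt_k))."""
    s = sum(math.sqrt(v / h) for v, h in zip(V, dt))
    return [math.ceil(2.0 * math.sqrt(v * h) * s / eps**2)
            for v, h in zip(V, dt)]

T, eps, c = 1.0, 1e-2, 1.0
L = max_level(eps, c, T)
dt = [T * 2.0 ** (-l) for l in range(L + 1)]
V = [0.1 * h for h in dt]            # hypothetical variances, V_l = O(dt_l)
N = optimal_samples(V, dt, eps)
total_var = sum(v / n for v, n in zip(V, N))
print(L, N, total_var)
```

With these sample sizes the estimator variance stays below $\varepsilon^2/2$ and the bias below $\varepsilon/\sqrt{2}$, so the mean-square error is at most $\varepsilon^2$.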
This is due to the fact that the Milstein scheme gives an improved rate of con-
vergence on the grid points, but this is insufficient for path-dependent options.
In many applications, the behavior of the numerical approximation between
grid points is crucial. The analysis of the Milstein scheme for complex payoffs
was carried out in Reference 20. To understand this problem better, we recall a
few facts from the theory of strong convergence of numerical approximations.
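We can define a piecewise linear interpolation of a numerical approximation
within the time interval $[n\Delta t_\ell, (n+1)\Delta t_\ell)$ as
\[
X^\ell(t) = X^\ell_n + \lambda_\ell\,(X^\ell_{n+1} - X^\ell_n), \quad \text{for } t \in [n\Delta t_\ell, (n+1)\Delta t_\ell), \tag{7.9}
\]
where $\lambda_\ell \equiv (t - n\Delta t_\ell)/\Delta t_\ell$. Müller-Gronbach [21] has shown that for the
Milstein scheme with the interpolation 7.9, we have
\[
\mathbb{E}\Bigl[\sup_{0\le t\le T} \bigl\|x(t) - X^\ell(t)\bigr\|^p\Bigr] = O\bigl(|\Delta t_\ell \log(\Delta t_\ell)|^{p/2}\bigr), \quad p \ge 2, \tag{7.10}
\]
that is the same as for the EM scheme. In order to maintain the strong order of
convergence, we use Brownian Bridge interpolation rather than basic piecewise
linear interpolation:
\[
\tilde{X}^\ell(t) = X^\ell_n + \lambda_\ell\,(X^\ell_{n+1} - X^\ell_n) + g(X^\ell_n)\bigl(w(t) - w(n\Delta t_\ell) - \lambda_\ell\,\Delta w^\ell_{n+1}\bigr), \tag{7.11}
\]
for $t \in [n\Delta t_\ell, (n+1)\Delta t_\ell)$. For the Milstein scheme interpolated with Brownian
bridges, we have [21]
\[
\mathbb{E}\Bigl[\sup_{0\le t\le T} \bigl\|x(t) - \tilde{X}^\ell(t)\bigr\|^p\Bigr] = O\bigl(|\Delta t_\ell \log(\Delta t_\ell)|^{p}\bigr).
\]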
Clearly X̃ (t) is not implementable, since in order to construct it, the knowl-
edge of the whole trajectory (w(t))0≤t≤T is required. However, we will demon-
strate that combining X̃ (t) with conditional Monte Carlo techniques can dra-
matically improve the convergence of the variance of the MLMC estimator.
This is due to the fact that for suitable MLMC estimators only distributional
knowledge of certain functionals of (w(t))0≤t≤T will be required.
Using the piecewise linear interpolation 7.9, one can obtain the following
approximation:
\[
P_\ell \equiv T^{-1}\int_0^T X^\ell(t)\,dt = T^{-1}\sum_{n=0}^{2^\ell-1} \tfrac{1}{2}\,\Delta t_\ell\,(X^\ell_n + X^\ell_{n+1}).
\]
\[
\cdots + \tfrac{1}{2}\,g'(X^\ell_n)\,g(X^\ell_n)\bigl((\Delta w^\ell_{n+1})^2 - \Delta t_\ell\bigr), \tag{7.12}
\]
where $g' \equiv \partial g/\partial x$. The analysis of Lipschitz European payoffs and Asian
options with the Milstein scheme is analogous to the EM scheme and it has
been proved in Reference 20 that in both these cases $V_\ell = O(\Delta t_\ell^2)$.
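As an illustration of the scalar Milstein update and of its improved strong convergence relative to EM, the sketch below applies both schemes to geometric Brownian motion, for which the exact solution driven by the same Brownian path is available. The test SDE, parameter values, and function names are assumptions chosen for the example.

```python
import math
import random

def euler_step(x, dt, dw, f, g):
    """One Euler-Maruyama step."""
    return x + f(x) * dt + g(x) * dw

def milstein_step(x, dt, dw, f, g, dg):
    """One scalar Milstein step; dg is the derivative g'."""
    return x + f(x) * dt + g(x) * dw + 0.5 * dg(x) * g(x) * (dw * dw - dt)

def rms_errors(n_steps=64, n_paths=2000, x0=1.0, r=0.05, sigma=0.4,
               T=1.0, seed=0):
    """RMS terminal errors of EM and Milstein against the exact GBM solution."""
    rng = random.Random(seed)
    f = lambda x: r * x
    g = lambda x: sigma * x
    dg = lambda x: sigma
    dt = T / n_steps
    se_euler = se_milstein = 0.0
    for _ in range(n_paths):
        xe = xm = x0
        w = 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            xe = euler_step(xe, dt, dw, f, g)
            xm = milstein_step(xm, dt, dw, f, g, dg)
            w += dw
        exact = x0 * math.exp((r - 0.5 * sigma**2) * T + sigma * w)
        se_euler += (xe - exact) ** 2
        se_milstein += (xm - exact) ** 2
    return math.sqrt(se_euler / n_paths), math.sqrt(se_milstein / n_paths)

err_euler, err_milstein = rms_errors()
print(err_euler, err_milstein)
```

Halving the timestep should reduce the Milstein error by about a factor of two and the EM error by about $\sqrt{2}$, in line with the strong orders discussed above.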
Here $\beta^* \approx 0.5826$ is a constant which corrects the $O(\Delta t_\ell^{1/2})$ leading order error
due to the discrete sampling of the path, and thereby restores $O(\Delta t_\ell)$ weak
convergence [23]. However, using this approximation, the difference between
the computed minimum values for the fine and coarse paths is $O(\Delta t_\ell^{1/2})$,
and hence the variance $V_\ell$ is $O(\Delta t_\ell)$, corresponding to $\beta = 1$. In the previous
section, this was acceptable because $\beta = 1$ was the best that could be achieved
in general with the Euler path discretization which was used, but we now aim
to achieve an improved convergence rate using the Milstein scheme.
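where the minimum of the fine approximation over the first half of the coarse
timestep is given by [24]
\[
X^\ell_{n,\min} = \tfrac{1}{2}\Bigl( X^\ell_n + X^\ell_{n+\frac12}
- \sqrt{\bigl(X^\ell_{n+\frac12} - X^\ell_n\bigr)^2 - 2\,g(X^\ell_n)^2\,\Delta t_\ell\,\log U_n}\,\Bigr), \tag{7.13}
\]
and the minimum of the fine approximation over the second half of the coarse
timestep is given by
\[
X^\ell_{n+\frac12,\min} = \tfrac{1}{2}\Bigl( X^\ell_{n+\frac12} + X^\ell_{n+1}
- \sqrt{\bigl(X^\ell_{n+1} - X^\ell_{n+\frac12}\bigr)^2 - 2\,g(X^\ell_{n+\frac12})^2\,\Delta t_\ell\,\log U_{n+\frac12}}\,\Bigr), \tag{7.14}
\]
where $U_n$, $U_{n+\frac12}$ are uniform random variables on the unit interval. For the
coarse path, in order to improve the MLMC variance a slightly different
estimator is used (see Equation 7.3). Using the same Brownian increments as we
used on the fine path (to guarantee that we stay on the same path),
Equation 7.11 is used to define $\tilde{X}^{\ell-1}_{n+\frac12} \equiv \tilde{X}^{\ell-1}((n+\tfrac12)\Delta t_{\ell-1})$. Given this
interpolated value, the minimum value over the interval $[n\Delta t_{\ell-1}, (n+1)\Delta t_{\ell-1}]$
can then be taken to be the smaller of the minima for the two intervals
$[n\Delta t_{\ell-1}, (n+\tfrac12)\Delta t_{\ell-1})$ and $[(n+\tfrac12)\Delta t_{\ell-1}, (n+1)\Delta t_{\ell-1})$,
\[
\begin{aligned}
X^{\ell-1}_{n,\min} &= \tfrac{1}{2}\Bigl( X^{\ell-1}_n + \tilde{X}^{\ell-1}_{n+\frac12}
- \sqrt{\bigl(\tilde{X}^{\ell-1}_{n+\frac12} - X^{\ell-1}_n\bigr)^2 - 2\,\bigl(g(X^{\ell-1}_n)\bigr)^2\,\tfrac{\Delta t_{\ell-1}}{2}\,\log U_n}\,\Bigr),\\
X^{\ell-1}_{n+\frac12,\min} &= \tfrac{1}{2}\Bigl( \tilde{X}^{\ell-1}_{n+\frac12} + X^{\ell-1}_{n+1}
- \sqrt{\bigl(X^{\ell-1}_{n+1} - \tilde{X}^{\ell-1}_{n+\frac12}\bigr)^2 - 2\,\bigl(g(X^{\ell-1}_n)\bigr)^2\,\tfrac{\Delta t_{\ell-1}}{2}\,\log U_{n+\frac12}}\,\Bigr).
\end{aligned} \tag{7.15}
\]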
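The minima in Equations 7.13 through 7.15 all use the same building block: a sample of the minimum of a Brownian bridge with frozen diffusion coefficient, driven by a uniform random number. A minimal sketch follows; the function names and the scalar setting are assumptions for illustration.

```python
import math
import random

def bridge_min(x_start, x_end, g, h, u):
    """Minimum of a Brownian bridge from x_start to x_end over an interval
    of length h with constant diffusion g, sampled from u in (0, 1]."""
    d = x_end - x_start
    return 0.5 * (x_start + x_end
                  - math.sqrt(d * d - 2.0 * g * g * h * math.log(u)))

def coarse_step_min(x_n, x_mid, x_np1, g_n, dt_coarse, u_first, u_second):
    """Coarse-step minimum in the style of Equation 7.15: both half-interval
    minima use the same g(X_n) and reuse the fine-path uniforms."""
    h = 0.5 * dt_coarse
    return min(bridge_min(x_n, x_mid, g_n, h, u_first),
               bridge_min(x_mid, x_np1, g_n, h, u_second))

rng = random.Random(1)
u1, u2 = 1.0 - rng.random(), 1.0 - rng.random()   # uniforms in (0, 1]
print(coarse_step_min(1.00, 1.02, 0.99, 0.2, 0.02, u1, u2))
```

Because $\log U \le 0$, the sampled minimum never exceeds either endpoint of the interval.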
Note that $g(X^{\ell-1}_n)$ is used for both timesteps. It is because we used the
Brownian Bridge with diffusion term $g(X^{\ell-1}_n)$ to derive both minima. If we
changed $g(X^{\ell-1}_n)$ to $g(\tilde{X}^{\ell-1}_{n+\frac12})$ in $X^{\ell-1}_{n+\frac12,\min}$, this would mean that different
Brownian Bridges were used on the first and second half of the coarse timestep
and as a consequence condition 7.3 would be violated. Note also the reuse of
the same uniform random numbers $U_n$ and $U_{n+\frac12}$ used to compute the fine
path minimum. The $\min(X^{\ell-1}_{n,\min}, X^{\ell-1}_{n+\frac12,\min})$ has exactly the same distribution
as $X^{\ell-1}_{n,\min}$, since they are both based on the same Brownian interpolation,
and therefore equality 7.3 is satisfied. Giles et al. [20] proved the following
theorem.
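This requires the simulation of $(x(T), 1_{\tau > T})$. The simplest method sets
\[
\tau^{\Delta t_\ell} = \inf_n \{X^\ell_n < B\}
\]
\[
\cdots = \exp\left( \frac{-2\,(X^\ell_n - B)^+ (X^\ell_{n+\frac12} - B)^+}{g(X^\ell_n)^2\,\Delta t_\ell} \right),
\]
and
\[
p^\ell_{n+\frac12} = \mathbb{P}\Bigl( \inf_{(n+\frac12)\Delta t_\ell \le t < (n+1)\Delta t_\ell} \tilde{X}^\ell(t) < B \,\Big|\, X^\ell_{n+\frac12}, X^\ell_{n+1} \Bigr)
= \exp\left( \frac{-2\,(X^\ell_{n+\frac12} - B)^+ (X^\ell_{n+1} - B)^+}{g(X^\ell_{n+\frac12})^2\,\Delta t_\ell} \right).
\]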
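The crossing probabilities above share one formula, sketched below for a scalar down-and-out barrier. The function names and the list-based path representation are assumptions for illustration; `h` denotes the length of the sub-interval between consecutive path values.

```python
import math

def crossing_probability(x_a, x_b, barrier, g, h):
    """Probability that a Brownian bridge from x_a to x_b over an interval
    of length h, with frozen diffusion g, goes below the barrier."""
    pos_a = max(x_a - barrier, 0.0)
    pos_b = max(x_b - barrier, 0.0)
    return math.exp(-2.0 * pos_a * pos_b / (g * g * h))

def survival_weight(path, barrier, g_values, h):
    """Product of (1 - p) over consecutive pairs of path values, that is,
    the conditional probability of never hitting the barrier."""
    weight = 1.0
    for x_a, x_b, g in zip(path[:-1], path[1:], g_values):
        weight *= 1.0 - crossing_probability(x_a, x_b, barrier, g, h)
    return weight

print(survival_weight([1.10, 1.05, 1.12], 1.0, [0.2, 0.2], 0.01))
```

If either endpoint is at or below the barrier the positive part vanishes, the crossing probability equals one, and the payoff weight is zero.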
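The payoff for the coarse path is defined similarly. However, in order to reduce
the variance, we subsample $\tilde{X}^{\ell-1}_{n+\frac12}$, as we did for lookback options, from the
Brownian Bridge connecting $X^{\ell-1}_n$ and $X^{\ell-1}_{n+1}$
\[
\begin{aligned}
&\mathbb{E}\Bigl[ f(X^{\ell-1}_{2^{\ell-1}}) \prod_{n=0}^{2^{\ell-1}-1} 1_{\{X^{\ell-1}_{n,\min} \ge B\}} \Bigr]\\
&\quad= \mathbb{E}\Bigl[ \mathbb{E}\Bigl[ f(X^{\ell-1}_{2^{\ell-1}}) \prod_{n=0}^{2^{\ell-1}-1} 1_{\{X^{\ell-1}_{n,\min} \ge B\}} \,\Big|\, X^{\ell-1}_0, \tilde{X}^{\ell-1}_{\frac12}, \ldots, \tilde{X}^{\ell-1}_{2^{\ell-1}-\frac12}, X^{\ell-1}_{2^{\ell-1}} \Bigr] \Bigr]\\
&\quad= \mathbb{E}\Bigl[ f(X^{\ell-1}_{2^{\ell-1}}) \prod_{n=0}^{2^{\ell-1}-1} \mathbb{E}\Bigl[ 1_{\{X^{\ell-1}_{n,\min} \ge B\}} \,\Big|\, X^{\ell-1}_n, \tilde{X}^{\ell-1}_{n+\frac12}, X^{\ell-1}_{n+1} \Bigr] \Bigr]\\
&\quad= \mathbb{E}\Bigl[ f(X^{\ell-1}_{2^{\ell-1}}) \prod_{n=0}^{2^{\ell-1}-1} (1 - p^{\ell-1}_{1,n})(1 - p^{\ell-1}_{2,n}) \Bigr],
\end{aligned}
\]
where
\[
p^{\ell-1}_{1,n} = \exp\left( \frac{-2\,(X^{\ell-1}_n - B)^+ (\tilde{X}^{\ell-1}_{n+\frac12} - B)^+}{g(X^{\ell-1}_n)^2\,\Delta t_\ell} \right)
\]
and
\[
p^{\ell-1}_{2,n} = \exp\left( \frac{-2\,(\tilde{X}^{\ell-1}_{n+\frac12} - B)^+ (X^{\ell-1}_{n+1} - B)^+}{g(X^{\ell-1}_n)^2\,\Delta t_\ell} \right).
\]
Note that the same $g(X^{\ell-1}_n)$ is used (rather than using $g(\tilde{X}^{\ell-1}_{n+\frac12})$ in $p^{\ell-1}_{2,n}$) to
calculate both probabilities for the same reason as we did for lookback options.
The final estimator can be written as
\[
P^c_{\ell-1} = f(X^{\ell-1}_{2^{\ell-1}}) \prod_{n=0}^{2^{\ell-1}-1} (1 - p^{\ell-1}_{1,n})(1 - p^{\ell-1}_{2,n}). \tag{7.17}
\]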
Theorem 7.3. Provided $\inf_{[0,T]} |g(B)| > 0$, and $\inf_{[0,T]} x(t)$ has a bounded
density in the neighborhood of $B$, then the multilevel estimator for a
down-and-out barrier option has variance $V_\ell = O(\Delta t_\ell^{3/2-\delta})$ for any $\delta > 0$.
The reason the variance is approximately $O(\Delta t_\ell^{3/2-\delta})$ instead of $O(\Delta t_\ell^2)$
is the following: due to the strong convergence property the probability of
the numerical approximation being outside the $\Delta t_\ell^{1-\delta}$-neighborhood of the
solution to the SDE 7.4 is arbitrarily small, that is for any $\varepsilon > 0$
\[
\mathbb{P}\Bigl( \sup_{0\le n\Delta t_\ell\le T} \bigl\|x(n\Delta t_\ell) - X^\ell_n\bigr\| \ge \Delta t_\ell^{1-\varepsilon} \Bigr)
\le \Delta t_\ell^{-p+p\varepsilon}\,\mathbb{E}\Bigl[ \sup_{0\le n\Delta t_\ell\le T} \bigl\|x(n\Delta t_\ell) - X^\ell_n\bigr\|^p \Bigr] = O(\Delta t_\ell^{p\varepsilon}). \tag{7.18}
\]
If $\inf_{[0,T]} x(t)$ is outside the $\Delta t_\ell^{1/2}$-neighborhood of the barrier $B$ then by
Equation 7.18 it is shown that so are numerical approximations. The
probabilities of crossing the barrier in that case are asymptotically either 0 or 1 and
essentially we are in the Lipschitz payoff case. If the $\inf_{[0,T]} x(t)$ is within the
$\Delta t_\ell^{1/2}$-neighborhood of the barrier $B$, then so are the numerical
approximations. In that case it can be shown that $\mathbb{E}[(P^f_\ell - P^c_{\ell-1})^2] = O(\Delta t_\ell^{1-\delta})$ but due
to the bounded density assumption, the probability that $\inf_{[0,T]} x(t)$ is within the
$\Delta t_\ell^{1/2}$-neighborhood of the barrier $B$ is of order $\Delta t_\ell^{1/2-\delta}$. Therefore the overall
MLMC variance is $V_\ell = O(\Delta t_\ell^{3/2-\delta})$ for any $\delta > 0$.
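\[
P = 1_{\{x(T) > B\}} .
\]
\[
P^f_\ell = \mathbb{E}\Bigl[ 1_{\{X^\ell_{2^{\ell-1}} > B\}} \,\Big|\, X^\ell_{2^{\ell-1}-\frac12} \Bigr]
= \Phi\left( \frac{X^\ell_{2^{\ell-1}-\frac12} + f(X^\ell_{2^{\ell-1}-\frac12})\,\Delta t_\ell - B}{|g(X^\ell_{2^{\ell-1}-\frac12})|\,\sqrt{\Delta t_\ell}} \right), \tag{7.19}
\]
\[
P^c_{\ell-1} = \mathbb{E}\Bigl[ 1_{\{X^{\ell-1}_{2^{\ell-1}} > B\}} \,\Big|\, X^{\ell-1}_{2^{\ell-1}-1}, \Delta w^\ell_{2^{\ell-1}-\frac12} \Bigr]
= \Phi\left( \frac{X^{\ell-1}_{2^{\ell-1}-1} + f(X^{\ell-1}_{2^{\ell-1}-1})\,\Delta t_{\ell-1} + g(X^{\ell-1}_{2^{\ell-1}-1})\,\Delta w^\ell_{2^{\ell-1}-\frac12} - B}{|g(X^{\ell-1}_{2^{\ell-1}-1})|\,\sqrt{\Delta t_\ell}} \right). \tag{7.20}
\]
Theorem 7.4. Provided $g(B) \ne 0$, and $x(t)$ has a bounded density in the
neighborhood of $B$, then the multilevel estimator for a digital option has
variance $V_\ell = O(\Delta t_\ell^{3/2-\delta})$ for any $\delta > 0$.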
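Equations 7.19 and 7.20 replace the discontinuous indicator on the final step by a Gaussian probability. The sketch below shows this for a scalar SDE with an Euler final step; the function names are assumptions, and the drift is passed as `f` and the volatility as `g`.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def digital_fine(x_pen, f, g, dt_fine, barrier):
    """Fine-path conditional expectation in the style of Equation 7.19;
    x_pen is the penultimate fine-path value."""
    mean = x_pen + f(x_pen) * dt_fine
    return norm_cdf((mean - barrier) / (abs(g(x_pen)) * math.sqrt(dt_fine)))

def digital_coarse(x_pen, f, g, dt_coarse, dw_first_half, barrier):
    """Coarse-path conditional expectation in the style of Equation 7.20:
    condition on the fine Brownian increment over the first half of the
    last coarse step, so the remaining variance is g^2 * dt_coarse / 2."""
    dt_fine = 0.5 * dt_coarse
    mean = x_pen + f(x_pen) * dt_coarse + g(x_pen) * dw_first_half
    return norm_cdf((mean - barrier) / (abs(g(x_pen)) * math.sqrt(dt_fine)))

f = lambda x: 0.05 * x
g = lambda x: 0.2 * x
print(digital_fine(1.0, f, g, 0.01, 1.0),
      digital_coarse(1.0, f, g, 0.02, 0.003, 1.0))
```

Both quantities are smooth functions of the penultimate state, which is what allows the improved variance rate stated in Theorem 7.4.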
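Here $\theta$ represents a generic input parameter, and the probability density
function for $Z$ is
\[
p_Z(Z) = (2\pi)^{-d/2} \exp\bigl(-\|Z\|_2^2/2\bigr),
\]
where $d$ is the dimension of the vector $Z$.
Let $X_n = X_n(Z, \theta)$. If the drift, volatility and payoff functions are all
differentiable, Equation 7.21 may be differentiated to give
\[
\frac{\partial V}{\partial\theta} = \int \frac{\partial P(X_n)}{\partial X_n}\,\frac{\partial X_n}{\partial\theta}\,p_Z(Z)\,dZ, \tag{7.22}
\]
\[
\frac{\partial X_{n+1}}{\partial\theta} = \frac{\partial X_n}{\partial\theta}
+ \left( \frac{\partial f(X_n,\theta)}{\partial X_n}\,\frac{\partial X_n}{\partial\theta} + \frac{\partial f(X_n,\theta)}{\partial\theta} \right)\Delta t_\ell
+ \left( \frac{\partial g(X_n,\theta)}{\partial X_n}\,\frac{\partial X_n}{\partial\theta} + \frac{\partial g(X_n,\theta)}{\partial\theta} \right)\Delta w^\ell_{n+1}. \tag{7.23}
\]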
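The recursion 7.23 can be illustrated with an Euler discretization of geometric Brownian motion, taking $\theta = \sigma$ so that $f = rX$, $g = \sigma X$, $\partial g/\partial X = \sigma$ and $\partial g/\partial\theta = X$. The model choice and function name are assumptions for the example; the tangent is checked against a central finite difference on the same Brownian increments.

```python
import math
import random

def euler_path_with_vega(x0, r, sigma, dt, dws):
    """Euler path of dX = r X dt + sigma X dw together with dX/dsigma,
    propagated with the recursion of Equation 7.23."""
    x, dx = x0, 0.0                  # X_0 does not depend on sigma
    for dw in dws:
        # Tangent update uses X_n, so it is done before X is advanced.
        dx = dx + (r * dx) * dt + (sigma * dx + x) * dw
        x = x + r * x * dt + sigma * x * dw
    return x, dx

rng = random.Random(3)
n_steps, T = 32, 1.0
dt = T / n_steps
dws = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

x_T, dx_T = euler_path_with_vega(1.0, 0.05, 0.2, dt, dws)
bump = 1e-5
up = euler_path_with_vega(1.0, 0.05, 0.2 + bump, dt, dws)[0]
down = euler_path_with_vega(1.0, 0.05, 0.2 - bump, dt, dws)[0]
finite_difference = (up - down) / (2.0 * bump)
print(dx_T, finite_difference)
```

For a differentiable payoff the pathwise estimate of the sensitivity is then the sample average of $P'(X_n)\,\partial X_n/\partial\theta$.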
\[
M^{-1} \sum_{m=1}^{M} \frac{\partial P(X_{n,m})}{\partial X_n}\,\frac{\partial X_{n,m}}{\partial\theta}
\]
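where $p_{X_n}(x, \theta)$ is the probability density function for $X_n$ which will depend
on all of the input parameters. Since probability density functions are usually
smooth, Equation 7.24 can be differentiated to give
\[
\frac{\partial V}{\partial\theta} = \int P(x)\,\frac{\partial p_{X_n}}{\partial\theta}\,dx
= \int P(x)\,\frac{\partial(\log p_{X_n})}{\partial\theta}\,p_{X_n}\,dx
= \mathbb{E}\left[ P(x)\,\frac{\partial(\log p_{X_n})}{\partial\theta} \right],
\]
\[
M^{-1} \sum_{m=1}^{M} P(X_{n,m})\,\frac{\partial \log p_{X_n}(X_{n,m})}{\partial\theta}.
\]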
This is the LRM. Its great advantage is that it does not require the differen-
tiation of P (Xn ). This makes it applicable to cases in which the payoff is dis-
continuous, and it also simplifies the practical implementation because banks
often have complicated flexible procedures through which traders specify pay-
offs. However, it does have a number of limitations, one being a requirement
of absolute continuity which is not satisfied in a few important applications
such as the LIBOR market model [24].
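To make the LRM concrete, the sketch below estimates the delta of a digital call under a one-step lognormal (Black–Scholes) model, where the density of the terminal value is known in closed form, and compares it with the analytic value. The model, parameter values, and function names are assumptions for illustration; the score used is $\partial \log p / \partial S_0 = Z/(S_0\,\sigma\sqrt{T})$.

```python
import math
import random

def lrm_digital_delta(s0, k, r, sigma, T, n_samples, seed=0):
    """Likelihood ratio estimate of d/ds0 of exp(-rT) E[1{S_T > k}]."""
    rng = random.Random(seed)
    disc = math.exp(-r * T)
    vol = sigma * math.sqrt(T)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma**2) * T + vol * z)
        payoff = 1.0 if s_T > k else 0.0   # discontinuous payoff, never differentiated
        score = z / (s0 * vol)             # d(log density)/d(s0)
        total += disc * payoff * score
    return total / n_samples

def exact_digital_delta(s0, k, r, sigma, T):
    """Analytic delta of the digital call, exp(-rT) phi(d2) / (s0 sigma sqrt(T))."""
    vol = sigma * math.sqrt(T)
    d2 = (math.log(s0 / k) + (r - 0.5 * sigma**2) * T) / vol
    pdf = math.exp(-0.5 * d2 * d2) / math.sqrt(2.0 * math.pi)
    return math.exp(-r * T) * pdf / (s0 * vol)

estimate = lrm_digital_delta(100.0, 100.0, 0.05, 0.2, 1.0, 200000)
print(estimate, exact_digital_delta(100.0, 100.0, 0.05, 0.2, 1.0))
```

The payoff enters only through its value, never through its derivative, which is why the discontinuity causes no difficulty here.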
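\[
\mathbb{E}\bigl[ P(X^\ell_{2^\ell}) \,\big|\, X^\ell_{2^\ell-1} \bigr] \approx \frac{1}{s} \sum_{i=1}^{s} P\bigl(X^\ell_{2^\ell-1}, \Delta w^{\ell,i}_{2^\ell}\bigr). \tag{7.27}
\]
At the coarse level, similar to the case of digital options, the fine increment
of the Brownian motion over the first half of the coarse timestep is used,
\[
\mathbb{E}\bigl[ P(X^{\ell-1}_{2^{\ell-1}}) \,\big|\, X^{\ell-1}_{2^{\ell-1}-1}, \Delta w^\ell_{2^\ell-2} \bigr]
\approx \frac{1}{s} \sum_{i=1}^{s} P\bigl(X^{\ell-1}_{2^{\ell-1}-1}, \Delta w^\ell_{2^\ell-2}, \Delta w^{\ell,i}_{2^\ell-1}\bigr). \tag{7.28}
\]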
This approach was tested in Reference 26, with the scalar Milstein scheme used
to obtain the penultimate step, and is presented in Table 7.5. As expected the
values of β tend to the rates offered by conditional expectations as s increases
and the approximation gets more precise.
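The outer expectation is an average over the discrete Brownian motion
increments, while the inner conditional expectation is averaging over $Z$.
To compute the sensitivity to the input parameter $\theta$, the first step
is to apply the pathwise sensitivity approach for fixed $w^\ell$ to obtain
$\partial\mu_{w^\ell}/\partial\theta$, $\partial\sigma_{w^\ell}/\partial\theta$. We then apply LRM to the inner conditional expectation
to get
\[
\frac{\partial V}{\partial\theta} = \frac{\partial}{\partial\theta}\,\mathbb{E}\Bigl[ \mathbb{E}_Z\bigl[ P(X^\ell_{2^\ell}) \,\big|\, w^\ell \bigr] \Bigr]
= \mathbb{E}\left[ \mathbb{E}_Z\left[ P(X^\ell_{2^\ell})\,\frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\theta} \,\Big|\, w^\ell \right] \right],
\]
where
\[
\frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\theta}
= \frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\mu_{w^\ell}}\,\frac{\partial\mu_{w^\ell}}{\partial\theta}
+ \frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\sigma_{w^\ell}}\,\frac{\partial\sigma_{w^\ell}}{\partial\theta}.
\]
This leads to the estimator
\[
\frac{\partial V}{\partial\theta} \approx \frac{1}{N_\ell} \sum_{m=1}^{N_\ell} \left(
\frac{\partial\mu_{w^{\ell,m}}}{\partial\theta}\,\mathbb{E}\left[ P(X^\ell_{2^\ell})\,\frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\mu_{w^\ell}} \,\Big|\, w^{\ell,m} \right]
+ \frac{\partial\sigma_{w^{\ell,m}}}{\partial\theta}\,\mathbb{E}\left[ P(X^\ell_{2^\ell})\,\frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\sigma_{w^\ell}} \,\Big|\, w^{\ell,m} \right] \right). \tag{7.29}
\]
We compute $\partial\mu_{w^{\ell,m}}/\partial\theta$ and $\partial\sigma_{w^{\ell,m}}/\partial\theta$ with pathwise sensitivities. With
$X^{\ell,m,i}_{2^\ell} = X^\ell_{2^\ell}(w^{\ell,m}, Z^i)$, we substitute the following estimators into
Equation 7.29:
\[
\begin{cases}
\displaystyle \mathbb{E}\left[ P(X^\ell_{2^\ell})\,\frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\mu_{w^\ell}} \,\Big|\, w^{\ell,m} \right]
\approx \frac{1}{s} \sum_{i=1}^{s} P\bigl(X^{\ell,m,i}_{2^\ell}\bigr)\,\frac{X^{\ell,m,i}_{2^\ell} - \mu_{w^{\ell,m}}}{\sigma^2_{w^{\ell,m}}},\\[2ex]
\displaystyle \mathbb{E}\left[ P(X^\ell_{2^\ell})\,\frac{\partial(\log p_{X^\ell_{2^\ell}})}{\partial\sigma_{w^\ell}} \,\Big|\, w^{\ell,m} \right]
\approx \frac{1}{s} \sum_{i=1}^{s} P\bigl(X^{\ell,m,i}_{2^\ell}\bigr) \left( -\frac{1}{\sigma_{w^{\ell,m}}} + \frac{\bigl(X^{\ell,m,i}_{2^\ell} - \mu_{w^{\ell,m}}\bigr)^2}{\sigma^3_{w^{\ell,m}}} \right).
\end{cases}
\]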
In a multilevel setting, at the fine level we can use Equation 7.29 directly.
At the coarse level, as for digital options in Section 7.3.5, the fine
Brownian increments over the first half of the coarse timestep are reused to derive
Equation 7.29.
The numerical experiments for the call option with s = 10 were obtained in
Reference 26, with the scalar Milstein scheme used to obtain the penultimate step.
where the jump term $J(t)$ is a compound Poisson process $\sum_{i=1}^{N(t)} (Y_i - 1)$,
the jump magnitude $Y_i$ has a prescribed distribution, and $N(t)$ is a Poisson
process with intensity $\lambda$, independent of the Brownian motion. Due to the
existence of jumps, the process is a càdlàg process, that is, having right
continuity with left limits. We note that $x(t-)$ denotes the left limit of the process
while $x(t) = \lim_{s\to t+} x(s)$. In Reference 30, Merton also assumed that $\log Y_i$
has a normal distribution.
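\[
\begin{aligned}
X^{\ell,-}_{n+1} &= X^\ell_n + f(X^\ell_n)\,\Delta t_n + g(X^\ell_n)\,\Delta w^\ell_{n+1}
+ \tfrac{1}{2}\,g'(X^\ell_n)\,g(X^\ell_n)\bigl((\Delta w^\ell_{n+1})^2 - \Delta t_n\bigr),\\
X^\ell_{n+1} &=
\begin{cases}
X^{\ell,-}_{n+1} + c(X^{\ell,-}_{n+1})(Y_i - 1), & \text{when } t_{n+1} = \tau_i,\\
X^{\ell,-}_{n+1}, & \text{otherwise},
\end{cases}
\end{aligned} \tag{7.31}
\]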
Hence
\[
\tilde{X}^{\ell-1}_{n+\frac12} = X^{\ell-1}_{k_n} + \lambda_{\ell-1}\bigl(X^{\ell-1,-}_{\hat{k}_n} - X^{\ell-1}_{k_n}\bigr)
+ g(X^{\ell-1}_{k_n})\Bigl( w\bigl((n+\tfrac12)\Delta t_{\ell-1}\bigr) - w(k_n) - \lambda_{\ell-1}\bigl(w(\hat{k}_n) - w(k_n)\bigr) \Bigr)
\]
Notice the use of the left limits $X^{\ell,-}$. Following the discussion in the previous
sections, the minima for the coarse timestep can be derived using the interpolated
value $\tilde{X}^{\ell-1}_{n+\frac12}$. Deriving the payoffs for lookback and barrier options is now
straightforward.
For digital options, due to jump-adapted time grid, in order to find condi-
tional expectations, we need to look at relations between the last jump time
and the last timestep before expiry. In fact, there are three cases:
With this in mind, we can easily write down the payoffs for the coarse and
fine approximations as we presented in Section 7.3.5.
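and define
\[
Z_t = \sum_{n\ge 1} 1_{t \ge \tau_n} .
\]
where $\tau$ are the jump times. The acceptance probability for a candidate jump
under the measure $Q$ is defined to be $\tfrac{1}{2}$ for both coarse and fine paths, instead
of $p_\tau = \lambda(X(\tau-), \tau)\,/\,\lambda_{\sup}$. The corresponding Radon–Nikodym derivatives
are
\[
R^f_\tau =
\begin{cases}
2\,p^f_\tau, & \text{if } U < \tfrac12;\\
2\,(1 - p^f_\tau), & \text{if } U \ge \tfrac12,
\end{cases}
\qquad
R^c_\tau =
\begin{cases}
2\,p^c_\tau, & \text{if } U < \tfrac12;\\
2\,(1 - p^c_\tau), & \text{if } U \ge \tfrac12.
\end{cases}
\]
Since $\mathbb{V}[R^f_\tau - R^c_\tau] = O(\Delta t^2)$ and $\mathbb{V}[\hat{P}_\ell - \hat{P}_{\ell-1}] = O(\Delta t^2)$, this results in the
multilevel correction variance $\mathbb{V}_Q\bigl[\hat{P}_\ell \prod_\tau R^f_\tau - \hat{P}_{\ell-1} \prod_\tau R^c_\tau\bigr]$ being $O(\Delta t^2)$.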
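The change of measure for a single candidate jump can be sketched as follows. The function names are assumptions; `p_fine` and `p_coarse` stand for the acceptance probabilities $p^f_\tau$ and $p^c_\tau$ under the original measure, and `u` is the shared uniform random number.

```python
def rn_weight(p, u):
    """Radon-Nikodym factor for one candidate jump that is accepted with
    probability 1/2 under Q instead of probability p."""
    return 2.0 * p if u < 0.5 else 2.0 * (1.0 - p)

def coupled_candidate(p_fine, p_coarse, u):
    """Accept or reject the candidate identically on both paths (same u),
    returning the decision and the two path weights."""
    take_jump = u < 0.5
    return take_jump, rn_weight(p_fine, u), rn_weight(p_coarse, u)

# Under Q the weight has expectation one, and the weighted acceptance
# indicator has expectation p, so expectations under the original
# measure are recovered.
p = 0.3
mean_weight = 0.5 * rn_weight(p, 0.25) + 0.5 * rn_weight(p, 0.75)
mean_accept = 0.5 * rn_weight(p, 0.25)
print(mean_weight, mean_accept)
print(coupled_candidate(0.30, 0.31, 0.4))
```

Because the accept/reject decision no longer depends on the path, the coarse and fine paths always jump at the same times, and their difference enters only through the smooth weights.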
If the analytic formulation is expressed using the same thinning and change
of measure, the weak error can be decomposed into two terms as follows:
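\[
\mathbb{E}_Q\Bigl[ \hat{P}_\ell \prod_\tau R^f_\tau - P \prod_\tau R_\tau \Bigr]
= \mathbb{E}_Q\Bigl[ (\hat{P}_\ell - P) \prod_\tau R^f_\tau \Bigr]
+ \mathbb{E}_Q\Bigl[ P \Bigl( \prod_\tau R^f_\tau - \prod_\tau R_\tau \Bigr) \Bigr].
\]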
Using Hölder’s inequality, the bound max(Rτ , Rτf ) ≤ 2 and standard results
for a Poisson process, the first term can be bounded using weak convergence
results for the constant rate process, and the second term can be bounded
using the corresponding strong convergence results [33]. This guarantees that
the multilevel procedure does converge to the correct value.
With this choice of Δt and h , the authors of Reference 34 analyzed the stan-
dard EM scheme for Lévy-driven SDEs. This approach gives good results for a
Blumenthal–Getoor index smaller than 1. For a Blumenthal–Getoor index big-
ger than 1, a Gaussian approximation of small jumps gives better results [35].
so that
\[
\mathbb{E}[P^f_L] = \mathbb{E}[P^f_0] + \sum_{\ell=1}^{L} \mathbb{E}[P^f_\ell - P^c_{\ell-1}],
\]
still holds. For lookback, barrier, and digital options, we showed that we can
obtain a better MLMC variance by suitably modifying the estimator on the
coarse levels. By further exploiting the flexibility of MLMC, Giles and Szpruch
[18] modified the estimator on the fine levels in order to avoid simulation of
the Lévy areas.
and therefore
\[
P(X^f) - P(X^c) \approx -\bigl(P(X^a) - P(X^c)\bigr),
\]
so that $\tfrac{1}{2}\bigl(P(X^f) + P(X^a)\bigr) \approx P(X^c)$. This leads to $\tfrac{1}{2}\bigl(P(X^f) + P(X^a)\bigr) - P(X^c)$
having a much smaller variance than the standard estimator $P(X^f) - P(X^c)$.
We now present a lemma which gives an upper bound on the convergence
of the variance of $\tfrac{1}{2}\bigl(P(X^f) + P(X^a)\bigr) - P(X^c)$.
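Lemma 7.1. If $P \in C^2(\mathbb{R}^d, \mathbb{R})$ and there exist constants $L_1$, $L_2$ such that for
all $x \in \mathbb{R}^d$
\[
\left\| \frac{\partial P}{\partial x} \right\| \le L_1, \qquad \left\| \frac{\partial^2 P}{\partial x^2} \right\| \le L_2,
\]
then for $p \ge 2$,
\[
\mathbb{E}\left[ \left| \tfrac{1}{2}\bigl(P(X^f) + P(X^a)\bigr) - P(X^c) \right|^p \right]
\le 2^{p-1} L_1^p\,\mathbb{E}\left[ \left\| \tfrac{1}{2}(X^f + X^a) - X^c \right\|^p \right]
+ 2^{-(p+1)} L_2^p\,\mathbb{E}\left[ \bigl\| X^f - X^a \bigr\|^{2p} \right].
\]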
with x(0) = y(0) = 0, and zero correlation between the two Brownian motions
w1 (t) and w2 (t). These equations can be integrated exactly over a time interval
where $\Delta w_{i,n} \equiv w_i(t_{n+1}) - w_i(t_n)$, and $A_{12,n}$ is the Lévy area defined as
\[
A_{12,n} = \int_{t_n}^{t_{n+1}} \bigl(w_1(t) - w_1(t_n)\bigr)\,dw_2(t) - \int_{t_n}^{t_{n+1}} \bigl(w_2(t) - w_2(t_n)\bigr)\,dw_1(t).
\]
\[
\mathbb{E}\bigl[(x_2(T) - X_2(T))^2\bigr] \ge \tfrac{1}{4}\,T\,\Delta t.
\]
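\[
\begin{aligned}
X^c_{1,n+1} &= X^c_{1,n} + \Delta w^{\ell-1}_{1,n+1}\\
X^c_{2,n+1} &= X^c_{2,n} + X^c_{1,n}\,\Delta w^{\ell-1}_{2,n+1} + \tfrac{1}{2}\,\Delta w^{\ell-1}_{1,n+1}\,\Delta w^{\ell-1}_{2,n+1}.
\end{aligned} \tag{7.36}
\]
\[
\begin{aligned}
X^f_{1,n+\frac12} &= X^f_{1,n} + \Delta w^\ell_{1,n+\frac12}\\
X^f_{2,n+\frac12} &= X^f_{2,n} + X^f_{1,n}\,\Delta w^\ell_{2,n+\frac12} + \tfrac{1}{2}\,\Delta w^\ell_{1,n+\frac12}\,\Delta w^\ell_{2,n+\frac12}\\
X^f_{1,n+1} &= X^f_{1,n+\frac12} + \Delta w^\ell_{1,n+1}\\
X^f_{2,n+1} &= X^f_{2,n+\frac12} + X^f_{1,n+\frac12}\,\Delta w^\ell_{2,n+1} + \tfrac{1}{2}\,\Delta w^\ell_{1,n+1}\,\Delta w^\ell_{2,n+1},
\end{aligned}
\]
where $\Delta w^{\ell-1}_{n+1} = \Delta w^\ell_{n+\frac12} + \Delta w^\ell_{n+1}$. Using this relation, the equations for the
two fine timesteps can be combined to give an equation for the increment over
\[
\begin{aligned}
X^a_{2,n+\frac12} &= X^a_{2,n} + X^a_{1,n}\,\Delta w^\ell_{2,n+1} + \tfrac{1}{2}\,\Delta w^\ell_{1,n+1}\,\Delta w^\ell_{2,n+1},\\
X^a_{1,n+1} &= X^a_{1,n+\frac12} + \Delta w^\ell_{1,n+\frac12},\\
X^a_{2,n+1} &= X^a_{2,n+\frac12} + X^a_{1,n+\frac12}\,\Delta w^\ell_{2,n+\frac12} + \tfrac{1}{2}\,\Delta w^\ell_{1,n+\frac12}\,\Delta w^\ell_{2,n+\frac12},
\end{aligned}
\]
and hence
\[
\begin{aligned}
X^a_{1,n+1} &= X^a_{1,n} + \Delta w^{\ell-1}_{1,n+1},\\
X^a_{2,n+1} &= X^a_{2,n} + X^a_{1,n}\,\Delta w^{\ell-1}_{2,n+1} + \tfrac{1}{2}\,\Delta w^{\ell-1}_{1,n+1}\,\Delta w^{\ell-1}_{2,n+1}
- \tfrac{1}{2}\bigl(\Delta w^\ell_{1,n+\frac12}\,\Delta w^\ell_{2,n+1} - \Delta w^\ell_{2,n+\frac12}\,\Delta w^\ell_{1,n+1}\bigr).
\end{aligned} \tag{7.38}
\]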
Swapping Δwn+ 1 and Δwn+1 does not change the distribution of the driving
2
Brownian increments, and hence X a has exactly the same distribution as X f .
[Figure: a Brownian path W with its coarse (W^c), fine (W^f), and antithetic (W^a) approximations.]
FIGURE 7.1: Brownian path and approximations over one coarse timestep.
Note also the change in sign in the last term in Equation 7.37 compared to
the corresponding term in Equation 7.38. This is important because these two
terms cancel when the two equations are averaged.
These last terms correspond to the Lévy areas for the fine and antithetic
paths, and the sign reversal is a particular instance of a more general result
for time-reversed Brownian motion [36]. If $(w_t, 0 \le t \le 1)$ denotes a Brownian
motion on the time interval $[0, 1]$, then the time-reversed Brownian motion
$(z_t, 0 \le t \le 1)$ defined by
\[
z_t = w_1 - w_{1-t} \tag{7.39}
\]
has exactly the same distribution, and it can be shown that its Lévy area is
equal in magnitude and opposite in sign to that of $w_t$.
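Lemma 7.2. If $X^f_n$, $X^a_n$, and $X^c_n$ are as defined above, then
\[
X^f_{1,n} = X^a_{1,n} = X^c_{1,n}, \qquad \tfrac{1}{2}\bigl(X^f_{2,n} + X^a_{2,n}\bigr) = X^c_{2,n}, \qquad \forall n \le N
\]
and
\[
\mathbb{E}\Bigl[ \bigl(X^f_{2,N} - X^a_{2,N}\bigr)^4 \Bigr] = \tfrac{3}{4}\,T\,(T + \Delta t)\,\Delta t^2 .
\]
In the following section, we will see how this lemma generalizes to nonlinear
multidimensional SDEs 7.4.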
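The first part of Lemma 7.2 can be checked directly in a few lines. The sketch below simulates the coarse, fine, and antithetic discretizations of Equations 7.36 through 7.38 (with the Lévy area term omitted) from shared Brownian increments; the function names and the use of two independent standard Brownian motions are assumptions consistent with the zero-correlation setting above.

```python
import math
import random

def step(x, d1, d2):
    """One Milstein step without the Levy area for dx1 = dw1, dx2 = x1 dw2."""
    return [x[0] + d1, x[1] + x[0] * d2 + 0.5 * d1 * d2]

def simulate(n_coarse, T, seed=0):
    """Return terminal coarse, fine and antithetic states."""
    rng = random.Random(seed)
    sd = math.sqrt(T / (2 * n_coarse))      # std dev of a fine increment
    xc, xf, xa = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for _ in range(n_coarse):
        first = (rng.gauss(0.0, sd), rng.gauss(0.0, sd))    # (dw1, dw2), first half
        second = (rng.gauss(0.0, sd), rng.gauss(0.0, sd))   # (dw1, dw2), second half
        xc = step(xc, first[0] + second[0], first[1] + second[1])
        xf = step(step(xf, *first), *second)     # fine path: natural order
        xa = step(step(xa, *second), *first)     # antithetic path: swapped order
    return xc, xf, xa

xc, xf, xa = simulate(32, 1.0)
print(xf[1] - xc[1], xa[1] - xc[1], 0.5 * (xf[1] + xa[1]) - xc[1])
```

The fine and antithetic second components each differ from the coarse one, but their deviations are equal and opposite, so the average matches the coarse path up to rounding error.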
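\[
\cdots + \sum_{j,k=1}^{m} h_{ijk}(X^c_n)\bigl(\Delta w^{\ell-1}_{j,n+1}\,\Delta w^{\ell-1}_{k,n+1} - \Omega_{jk}\,\Delta t_{\ell-1}\bigr).
\]
The first fine path approximation $X^f_n$ (that corresponds to $X^c_n$) uses the
corresponding discretization with timestep $\Delta t_{\ell-1}/2$,
\[
X^f_{i,n+\frac12} = X^f_{i,n} + f_i(X^f_n)\,\Delta t_{\ell-1}/2 + \sum_{j=1}^{m} g_{ij}(X^f_n)\,\Delta w^\ell_{j,n+\frac12}
+ \sum_{j,k=1}^{m} h_{ijk}(X^f_n)\bigl(\Delta w^\ell_{j,n+\frac12}\,\Delta w^\ell_{k,n+\frac12} - \Omega_{jk}\,\Delta t_{\ell-1}/2\bigr), \tag{7.40}
\]
\[
X^f_{i,n+1} = X^f_{i,n+\frac12} + f_i(X^f_{n+\frac12})\,\Delta t_{\ell-1}/2 + \sum_{j=1}^{m} g_{ij}(X^f_{n+\frac12})\,\Delta w^\ell_{j,n+1}
+ \sum_{j,k=1}^{m} h_{ijk}(X^f_{n+\frac12})\bigl(\Delta w^\ell_{j,n+1}\,\Delta w^\ell_{k,n+1} - \Omega_{jk}\,\Delta t_{\ell-1}/2\bigr), \tag{7.41}
\]
where $\Delta w^{\ell-1}_{n+1} = \Delta w^\ell_{n+\frac12} + \Delta w^\ell_{n+1}$.
The antithetic approximation $X^a_n$ is defined by exactly the same
discretization except that the Brownian increments $\Delta w^\ell_{n+\frac12}$ and $\Delta w^\ell_{n+1}$ are swapped,
so that
\[
\begin{aligned}
X^a_{i,n+\frac12} &= X^a_{i,n} + f_i(X^a_n)\,\Delta t_{\ell-1}/2 + \sum_{j=1}^{m} g_{ij}(X^a_n)\,\Delta w^\ell_{j,n+1}
+ \sum_{j,k=1}^{m} h_{ijk}(X^a_n)\bigl(\Delta w^\ell_{j,n+1}\,\Delta w^\ell_{k,n+1} - \Omega_{jk}\,\Delta t_{\ell-1}/2\bigr),\\
X^a_{i,n+1} &= X^a_{i,n+\frac12} + f_i(X^a_{n+\frac12})\,\Delta t_{\ell-1}/2 + \sum_{j=1}^{m} g_{ij}(X^a_{n+\frac12})\,\Delta w^\ell_{j,n+\frac12}
+ \sum_{j,k=1}^{m} h_{ijk}(X^a_{n+\frac12})\bigl(\Delta w^\ell_{j,n+\frac12}\,\Delta w^\ell_{k,n+\frac12} - \Omega_{jk}\,\Delta t_{\ell-1}/2\bigr).
\end{aligned} \tag{7.42}
\]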
It can be shown that [18]
Lemma 7.3. For all integers $p \ge 2$, there exists a constant $K_p$ such that
\[
\mathbb{E}\Bigl[ \max_{0\le n\le N} \bigl\|X^f_n - X^a_n\bigr\|^p \Bigr] \le K_p\,\Delta t^{p/2} .
\]
This, together with a classical strong convergence result for the Milstein
discretization, allows us to estimate the MLMC variance for smooth payoffs. In the
case of a payoff which is a smooth function of the final state $x(T)$, taking $p = 2$
in Lemma 7.1, $p = 4$ in Lemma 7.3, and $p = 2$ in Theorem 7.5 immediately
gives the result that the multilevel variance
\[
\mathbb{V}\Bigl[ \tfrac{1}{2}\bigl(P(X^f_N) + P(X^a_N)\bigr) - P(X^c_N) \Bigr]
\]
has an $O(\Delta t^2)$ upper bound. This matches the convergence rate for the
multilevel method for scalar SDEs using the standard first-order Milstein
discretization, and is much better than the $O(\Delta t)$ convergence obtained with the EM
discretization.
However, very few financial payoff functions are twice differentiable on the
entire domain Rd . A more typical 2D example is a call option based on the
minimum of two assets,
P (x(T )) ≡ max (0, min(x1 (T ), x2 (T )) − K) ,
which is piecewise linear, with a discontinuity in the gradient along the three
lines (s, K), (K, s), and (s, s) for s ≥ K.
To handle such payoffs, an assumption which bounds the probability of
the solution of the SDE having a value at time T close to such lines with
discontinuous gradients is needed.
and there is a corresponding definition for the fine timestep $[t_{k+\frac12}, t_{k+1}]$. It
can be shown that [18]
\[
\sup_{0\le t\le T} \mathbb{E}\Bigl[ \bigl\|\overline{X}^f(t) - X^c(t)\bigr\|^p \Bigr] \le K_p\,\Delta t^p ,
\]
where $\overline{X}^f(t)$ is the average of the piecewise linear interpolants $X^f(t)$ and $X^a(t)$.
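\[
x_{ave} \equiv T^{-1} \int_0^T x(t)\,dt .
\]
\[
\begin{aligned}
X^c_{ave} &\equiv T^{-1} \int_0^T X^c(t)\,dt = N^{-1} \sum_{n=0}^{N-1} \tfrac{1}{2}\bigl(X^c_n + X^c_{n+1}\bigr),\\
X^f_{ave} &\equiv T^{-1} \int_0^T X^f(t)\,dt = N^{-1} \sum_{n=0}^{N-1} \tfrac{1}{4}\bigl(X^f_n + 2X^f_{n+\frac12} + X^f_{n+1}\bigr),\\
X^a_{ave} &\equiv T^{-1} \int_0^T X^a(t)\,dt = N^{-1} \sum_{n=0}^{N-1} \tfrac{1}{4}\bigl(X^a_n + 2X^a_{n+\frac12} + X^a_{n+1}\bigr).
\end{aligned}
\]
\[
\mathbb{E}\Bigl[ \bigl\|X^f_{ave} - X^a_{ave}\bigr\|^p \Bigr] \le T^{-1} \int_0^T \mathbb{E}\Bigl[ \bigl\|X^f(t) - X^a(t)\bigr\|^p \Bigr]\,dt
\le \sup_{[0,T]} \mathbb{E}\Bigl[ \bigl\|X^f(t) - X^a(t)\bigr\|^p \Bigr],
\]
and similarly
\[
\mathbb{E}\Bigl[ \bigl\|\tfrac{1}{2}(X^f_{ave} + X^a_{ave}) - X^c_{ave}\bigr\|^p \Bigr] \le \sup_{[0,T]} \mathbb{E}\Bigl[ \bigl\|\overline{X}^f(t) - X^c(t)\bigr\|^p \Bigr].
\]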
Hence, if the Asian payoff is a smooth function of the average, then we obtain
a second-order bound for the multilevel correction variance.
This analysis can be extended to include payoffs which are a smooth
function of a number of intermediate variables, each of which is a linear functional
of the path $x(t)$ of the form:
\[
\int_0^T g^T(t)\,x(t)\,\mu(dt),
\]
for some vector function g(t) and measure μ(dt). This includes weighted aver-
ages of x(t) at a number of discrete times, as well as continuously weighted
averages over the whole time interval.
As with the European options, the analysis can also be extended to pay-
offs which are Lipschitz functions of the average, and have first and second
derivatives which exist, and are continuous and uniformly bounded, except for
a set of points K of zero measure.
\[
|P(x) - P(y)| \le L\,|x - y|, \quad \forall\, x, y \in \mathbb{R}^d,
\]
and the first and second derivatives exist, are continuous and have uniform
bound $L$ at all points $x \notin K$, where $K$ is a set of zero measure, and there exists
a constant $c$ such that the probability of $x_{ave}$ being within a neighborhood of
the set $K$ has the bound
\[
\mathbb{P}\Bigl( \min_{y\in K} \|x_{ave} - y\| \le \varepsilon \Bigr) \le c\,\varepsilon, \quad \forall\, \varepsilon > 0.
\]
[Figure: four panels. Top left: log2 variance of P_l and P_l − P_{l−1} against level l, with reference slopes. Top right: log2 |mean| of P_l and P_l − P_{l−1} against level l, with a reference slope. Bottom left: ε² cost against accuracy ε for standard MC and MLMC. Bottom right: log2 variance of X^f − X^c against level l, with a reference slope.]
The top left plot shows the behavior of the variance of both $P_\ell$ and $P_\ell - P_{\ell-1}$.
The superimposed reference slope with rate 1.5 indicates that the variance
$V_\ell = \mathbb{V}[P_\ell - P_{\ell-1}] = O(\Delta t_\ell^{1.5})$, corresponding to $O(\varepsilon^{-2})$ computational
complexity of antithetic MLMC. The top right plot shows that $\mathbb{E}[P_\ell - P_{\ell-1}] = O(\Delta t_\ell)$.
The bottom left plot shows the computational complexity $C$ (as
defined in Theorem 7.1) with desired accuracy $\varepsilon$. The plot is of $\varepsilon^2 C$ versus $\varepsilon$,
because we expect to see that $\varepsilon^2 C$ is only weakly dependent on $\varepsilon$ for MLMC.
For standard Monte Carlo, theory predicts that $\varepsilon^2 C$ should be proportional
to the number of timesteps on the finest level, which in turn is roughly
proportional to $\varepsilon^{-1}$ due to the weak convergence order. For accuracy $\varepsilon = 10^{-4}$, the
antithetic MLMC is approximately 500 times more efficient than the standard
Monte Carlo. The bottom right plot shows that $\mathbb{V}[X^\ell_{1,\cdot} - X^{\ell-1}_{1,\cdot}] = O(\Delta t_\ell)$.
This corresponds to the standard strong convergence of order 0.5. We have
also tested the algorithm presented in Reference 18 for approximation of Asian
options. Our results were almost identical to those for the European options. In order
to treat the lookback, digital, and barrier options, we found that a suitable
antithetic approximation to the Lévy areas is needed. For a suitable
modification of the antithetic MLMC estimator, we performed numerical experiments
where we obtained $O(\varepsilon^{-2} \log(\varepsilon)^2)$ complexity for estimating barrier, digital,
and lookback options. Currently, we are working on theoretical justification
of our results.
Multilevel Monte Carlo Methods for Applications in Finance 237
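\[ \Delta p = -\mu\,\frac{\partial p}{\partial x}\,\Delta t + \frac{1}{2}\,\frac{\partial^2 p}{\partial x^2}\,\Delta t - \sqrt{\rho}\,\frac{\partial p}{\partial x}\,\Delta M_t, \qquad x > 0 \tag{7.43} \]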
\[ \sum_{n=1}^{N-1} x_n + \sum_{n=1}^{N-1} y_n + \sum_{n=1}^{N-1} z_n \]
• Model uncertainty
Our advice would be to always use double precision for the final accumu-
lation of payoff values and pathwise sensitivity analysis as much as possible
for computing Greeks, but if there remains a need for the path simulation to
be performed in double precision then one could use a two-level approach in
which level 0 corresponds to single precision and level 1 corresponds to double
precision.
On both levels one would use the same random numbers. The multilevel
analysis would then give the optimal allocation of effort between the single
precision and double precision computations. Since it is likely that most of
the calculations would be single precision, the computational savings would
be a factor of 2 or more compared to standard double precision calculations.
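The two-level idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the payoff (a European call on geometric Brownian motion sampled exactly at maturity), the parameter values, and the function names `payoff` and `two_level_estimate` are all assumptions made for the example. Level 0 evaluates the payoff in single precision, level 1 estimates the correction (double minus single) using the same random numbers, and all accumulation is done in double precision.

```python
import numpy as np


def payoff(z, dtype, S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0):
    """Discounted call payoff computed entirely in the given precision.

    The model and parameters are illustrative assumptions.
    """
    z = z.astype(dtype)
    S0, K, r, sigma, T = (dtype(v) for v in (S0, K, r, sigma, T))
    drift = (r - dtype(0.5) * sigma * sigma) * T
    ST = S0 * np.exp(drift + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - K, dtype(0))


def two_level_estimate(n0, n1, seed=0):
    """Level 0: single precision. Level 1: double-minus-single correction."""
    rng = np.random.default_rng(seed)

    # Level 0: cheap single-precision samples, accumulated in double precision.
    z0 = rng.standard_normal(n0)
    level0 = payoff(z0, np.float32).astype(np.float64).mean()

    # Level 1: the same random numbers drive both precisions.
    z1 = rng.standard_normal(n1)
    correction = payoff(z1, np.float64) - payoff(z1, np.float32).astype(np.float64)

    return level0 + correction.mean(), float(correction.var())


estimate, correction_variance = two_level_estimate(400_000, 20_000)
print(estimate, correction_variance)
```

Because the correction has a very small variance, few double-precision samples are needed; in a full treatment the split between `n0` and `n1` would come from the multilevel cost and variance analysis rather than being fixed by hand as here.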
[Figure 7.3, four panels. Top left: log2 variance versus level l for P_l and P_l − P_{l−1}, with legend entries 1, 16, 256, and 4096. Top right: log2 |mean| versus level l. Bottom left: N_l versus level l for ε = 0.00005, 0.0001, 0.0002, 0.0005, and 0.001. Bottom right: ε² cost versus ε for standard QMC and MLQMC.]
FIGURE 7.3: European call option (From Giles M.B. and Waterhouse B.J.
Advanced Financial Modelling, Radon Series on Computational and Applied
Mathematics, pages 165–181. de Gruyter, 2009.)
7.9 Conclusion
In the past 6 years, considerable progress has been achieved with the
MLMC method for financial options based on underlying assets described
by Brownian diffusions, jump diffusions, and more general Lévy processes.
The multilevel approach is conceptually very simple. In essence it is a
recursive control variate strategy, using a coarse path simulation as a control
variate for a fine path simulation, relying on strong convergence properties to
ensure a very strong correlation between the two.
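The recursive control variate structure can be sketched in a few lines. The example below is an illustration under stated assumptions, not the chapter's code: it uses an Euler-Maruyama discretization of geometric Brownian motion, a European call payoff, level l with 2^l timesteps, and hand-picked sample sizes per level (a production estimator would choose them from the estimated variances). The names `mlmc_level` and `mlmc_estimate` are hypothetical.

```python
import numpy as np


def mlmc_level(l, n, rng, S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0):
    """Samples of P_0 (l = 0) or of P_l - P_{l-1} (l > 0)."""
    nf = 2 ** l                      # fine timesteps on level l
    hf = T / nf
    dW = rng.standard_normal((n, nf)) * np.sqrt(hf)

    # Fine path (Euler-Maruyama).
    Sf = np.full(n, S0)
    for i in range(nf):
        Sf = Sf + r * Sf * hf + sigma * Sf * dW[:, i]
    Pf = np.exp(-r * T) * np.maximum(Sf - K, 0.0)
    if l == 0:
        return Pf

    # Coarse path: same Brownian path, increments summed in pairs.
    hc = 2.0 * hf
    dWc = dW[:, 0::2] + dW[:, 1::2]
    Sc = np.full(n, S0)
    for i in range(nf // 2):
        Sc = Sc + r * Sc * hc + sigma * Sc * dWc[:, i]
    Pc = np.exp(-r * T) * np.maximum(Sc - K, 0.0)
    return Pf - Pc


def mlmc_estimate(L, n_per_level, seed=0):
    """Telescoping sum E[P_0] + sum_l E[P_l - P_{l-1}]."""
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for l in range(L + 1):
        Y = mlmc_level(l, n_per_level[l], rng)
        means.append(Y.mean())
        variances.append(Y.var())
    return sum(means), variances


price, level_variances = mlmc_estimate(5, [200_000, 50_000, 25_000, 12_000, 6_000, 3_000])
print(price, level_variances)
```

The printed level variances decay with l because the fine and coarse paths share the same Brownian increments, which is exactly the coupling the text describes.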
In practice, the challenge is to couple the coarse and fine path simulations
as tightly as possible, minimizing the difference in the payoffs obtained for
each. In doing this, there is considerable freedom to be creative, as shown in
the use of Brownian Bridge constructions to improve the variance for lookback
with α(s) = f (s)/g(s). Similarly, the minimum ys,t of the process (x(t)) on
the time interval [s, t] is given by
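\[ y_{s,t} = x(s) + g(s)\, m^{\alpha(s)}_{t-s}. \]
\[
\begin{aligned}
\mathbb{P}\left[ y_{s,t} \le y \mid x(s), x(t) \right]
&= \mathbb{P}\left[ x(s) + g(s)\, m^{\alpha(s)}_{t-s} \le y \;\middle|\; x(s) + g(s)\, w^{\alpha(s)}_{t-s} = x(t) \right] \\
&= \mathbb{P}\left[ m^{\alpha(s)}_{t-s} \le \frac{y - x(s)}{g(s)} \;\middle|\; w^{\alpha(s)}_{t-s} = \frac{x(t) - x(s)}{g(s)} \right] \\
&= \exp\left( -\frac{2\,(x(s) - y)(x(t) - y)}{(g(s))^2\,(t - s)} \right)
\end{aligned}
\tag{7A.3}
\]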
Now imagine we want to derive these probabilities over time interval [s, u],
where t ≤ u conditioned on x(s) and x(u). The first strategy would be to
take Equation 7A.2 connecting x(s) and x(u) and calculate the conditional
distribution as we did in Equation 7A.4. The second strategy is as follows: (a)
first we sample a point x(t) from the BB connecting x(s) and x(u); (b) we
calculate the conditional distribution of the minimum of BB (Equation 7A.2)
conditioned first on x(s), x(t), and then on x(t), x(u). However, in order to
make sure both strategies give us results that are equivalent in distribution,
we are only allowed to use the same Brownian bridge as we used in the first
strategy. This has a consequence for calculating the conditional distribution of the
minimum given x(t) and x(u):
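\[
\begin{aligned}
\mathbb{P}\left[ y_{s,t} \le y \mid x(t), x(u) \right]
&= \mathbb{P}\left[ x(t) + g(s)\, m^{\alpha(s)}_{u-t} \le y \;\middle|\; x(t) + g(s)\, w^{\alpha(s)}_{u-t} = x(u) \right] \\
&= \exp\left( -\frac{2\,(x(t) - y)(x(u) - y)}{(g(s))^2\,(u - t)} \right).
\end{aligned}
\tag{7A.4}
\]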
Notice that we have not changed g(s) to g(t) and hence we have used the
same Brownian bridge for both strategies.
Another implication of conditional distribution 7A.1 is that we can find
the minimum ys,t explicitly. If we know the probability function F (z) of a
continuous random variable Z, we can generate random variable Z using uni-
formly distributed random variable U . Let U ∼ U ([0, 1]), then F −1 (U ) = Z,
where F −1 is an inverse function. It is straightforward to see that from Equa-
tion 7A.1 we have
\[ m^{\alpha}_t = \frac{1}{2}\left( z - \sqrt{z^2 - 2t \log U} \right) \quad \text{in distribution.} \]
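Now
\[
\begin{aligned}
y_{s,t} &= x(s) + g(s)\, m^{\alpha(s)}_{t-s} \\
&= x(s) + g(s)\,\frac{1}{2}\left( \frac{x(t) - x(s)}{g(s)} - \sqrt{ \left( \frac{x(t) - x(s)}{g(s)} \right)^{2} - 2\,(t - s) \log U } \right) \\
&= \frac{1}{2}\left( x(t) + x(s) - \sqrt{ (x(t) - x(s))^2 - 2\,(g(s))^2 (t - s) \log U } \right).
\end{aligned}
\]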
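The closed-form expression for the minimum translates directly into an inverse-transform sampler. The sketch below is illustrative: the function name `sample_bridge_minimum`, the endpoint values, and the diffusion coefficient are assumptions chosen for the example. It draws the minimum from a uniform variate and compares the empirical distribution with the conditional probability in Equation 7A.3.

```python
import numpy as np


def sample_bridge_minimum(xs, xt, g, dt, u):
    """Minimum of the bridge between x(s)=xs and x(t)=xt over a step dt.

    g is the (frozen) diffusion coefficient g(s); u is uniform on (0, 1].
    """
    d = xt - xs
    return 0.5 * (xt + xs - np.sqrt(d * d - 2.0 * g * g * dt * np.log(u)))


rng = np.random.default_rng(1)
xs, xt, g, dt = 1.0, 1.2, 0.5, 0.25          # illustrative values
u = 1.0 - rng.random(200_000)                 # uniform on (0, 1], avoids log(0)
y_min = sample_bridge_minimum(xs, xt, g, dt, u)

# Compare with P[y_min <= y0] = exp(-2 (xs - y0)(xt - y0) / (g^2 dt)).
y0 = 0.9
empirical = np.mean(y_min <= y0)
exact = np.exp(-2.0 * (xs - y0) * (xt - y0) / (g * g * dt))
print(empirical, exact)
```

Every sampled minimum lies at or below both endpoints, since log U is nonpositive and the square root is therefore at least |x(t) − x(s)|.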
References
1. Heinrich, S. Multilevel Monte Carlo Methods, volume 2179 of Lecture Notes in
Computer Science, pages 58–67. Springer-Verlag, 2001.
7. Giles, M.B. Improved multilevel Monte Carlo convergence using the Milstein
scheme. In Keller, A., Heinrich, S., and Niederreiter, H., editors, Monte Carlo
and Quasi-Monte Carlo Methods 2006, pages 343–358. Springer-Verlag, 2008.
10. Szpruch, L., Mao, X., Higham, D.J., and Pan, J. Numerical simulation of a
strongly nonlinear Ait-Sahalia-type interest rate model. BIT Numerical Mathe-
matics, 51(2):405–425, 2011.
11. Kloeden, P.E., Neuenkirch, A., and Pavani, R. Multilevel Monte Carlo for
stochastic differential equations with additive fractional noise. Annals of Oper-
ations Research, 189(1):255–276, 2011.
12. Kloeden, P.E. and Platen, E. Numerical Solution of Stochastic Differential Equa-
tions. Springer, Berlin, 1992.
13. Gaines, J.G. and Lyons, T.J. Random generation of stochastic integrals. SIAM
Journal on Applied Mathematics, 54(4):1132–1146, 1994.
16. Clark, J.M.C. and Cameron, R.J. The maximum rate of convergence of discrete
approximations for stochastic differential equations. In Grigelionis, B., editor,
Stochastic Differential Systems Filtering and Control, pp. 162–171. Springer,
Berlin, Heidelberg, 1980.
18. Giles, M.B. and Szpruch, L. Antithetic Multilevel Monte Carlo estimation
for multi-dimensional SDEs without Lévy area simulation. Arxiv preprint
arXiv:1202.6283, 2012.
19. Giles, M.B., Higham, D.J., and Mao, X. Analysing multilevel Monte Carlo for
options with non-globally Lipschitz payoff. Finance and Stochastics, 13(3):403–
413, 2009.
20. Debrabant, K., Giles, M.B., and Rossler, A. Numerical analysis of multilevel
Monte Carlo path simulation using Milstein discretization: Scalar case. Technical
Report, 2011.
22. Avikainen, R. On irregular functionals of SDEs and the Euler scheme. Finance
and Stochastics, 13(3):381–401, 2009.
23. Broadie, M., Glasserman, P., and Kou, S. A continuity correction for discrete
barrier options. Mathematical Finance, 7(4):325–348, 1997.
26. Burgos, S. and Giles, M.B. Computing Greeks using multilevel path simulation.
In Plaskota, L. and Woźniakowski, H., editors, Monte Carlo and Quasi-Monte
Carlo Methods 2010. pp. 281–296. Springer, Berlin, Heidelberg, 2012.
27. Asmussen, S. and Glynn, P. Stochastic Simulation. Springer, New York, 2007.
28. Giles, M.B. Multilevel Monte Carlo for basket options. In Winter Simulation
Conference, pp. 1283–1290. Winter Simulation Conference, 2009.
29. Xia, Y. and Giles, M.B. Multilevel path simulation for jump-diffusion SDEs.
In Plaskota, L. and Woźniakowski, H., editors, Monte Carlo and Quasi-Monte
Carlo Methods 2010. pp. 695–708. Springer, Berlin, Heidelberg, 2012.
30. Merton, R.C. Option pricing when underlying stock returns are discontinuous.
Journal of Financial Economics, 3:125–144, 1976.
33. Xia, Y. Multilevel Monte Carlo method for jump-diffusion SDEs. Arxiv preprint
arXiv:1106.4730, 2011.
34. Dereich, S. and Heidenreich, F. A multilevel Monte Carlo algorithm for Lévy-
driven stochastic differential equations. Stochastic Processes and their Applica-
tions, 121(7):1565–1587, 2011.
35. Dereich, S. Multilevel Monte Carlo algorithms for Lévy-driven SDEs with Gaus-
sian correction. Annals of Applied Probability, 21(1):283–311, 2011.
36. Karatzas, I. and Shreve, S.E. Brownian Motion and Stochastic Calculus. Grad-
uate Texts in Mathematics, Vol. 113. Springer, New York, 1991.
37. Barth, A., Schwab, C., and Zollinger, N. Multi-level Monte Carlo finite element
method for elliptic PDEs with stochastic coefficients. Numerische Mathematik,
119(1):123–161, 2011.
38. Cliffe, K.A., Giles, M.B., Scheichl, R., and Teckentrup, A. Multilevel Monte
Carlo methods and applications to elliptic PDEs with random coefficients. Com-
puting and Visualization in Science, 14(1):3–15, 2011.
39. Graubner, S. Multi-level Monte Carlo Method für stochastiche partial Differen-
tialgleichungen. Diplomarbeit, TU Darmstadt, 2008.
40. Giles, M.B. and Reisinger, C. Stochastic finite differences and multilevel Monte
Carlo for a class of SPDEs in finance. SIAM Journal of Financial Mathematics,
3:572–592, 2012.
41. Belomestny, D. and Schoenmakers, J. Multilevel dual approach for pricing Amer-
ican style derivatives. Preprint 1647, WIAS, 2011.
43. Broadie, M. and Kaya, O. Exact simulation of stochastic volatility and other
affine jump diffusion processes. Operations Research, 54(2):217–231, 2006.
44. Glasserman, P. and Kim K.-K. Gamma expansion of the Heston stochastic
volatility model. Finance and Stochastics, 15(2):267–296, 2011.
45. Heston, S.L. A closed-form solution for options with stochastic volatility with
applications to bond and currency options. Review of Financial Studies, 6:327–
343, 1993.
46. Giles, M.B. and Waterhouse, B.J. Multilevel quasi-Monte Carlo path simulation.
In Advanced Financial Modelling, Radon Series on Computational and Applied
Mathematics, pages 165–181, 2009.
47. Gerstner, T. and Noll, M. Randomized multilevel quasi-Monte Carlo path sim-
ulation. In Recent Developments in Computational Finance: Foundations, Algo-
rithms and Applications, pp. 349–369, 2013.
48. Shreve, S.E. Stochastic Calculus for Finance: Continuous-Time Models, Vol. 2.
Springer, Berlin, Heidelberg, 2004.
Chapter 8
Fourier and Wavelet Option Pricing
Methods
CONTENTS
8.1 Introduction
8.1.1 European option pricing problem
8.2 COS Method
8.2.1 Density coefficients
8.2.2 Plain vanilla payoff coefficients
8.2.3 Domain truncation
8.2.4 Pricing multiple strikes
8.3 Wavelet Series
8.4 WA[a,b] Method
8.4.1 Density coefficients
8.4.2 Plain vanilla payoff coefficients
8.5 SWIFT method
8.5.1 Density coefficients
8.5.2 Payoff coefficients
8.5.3 Pricing multiple strikes
8.6 Numerical Results
8.6.1 Computational time
8.6.2 Robustness of WA[a,b]
8.6.3 Rate of convergence
8.6.4 Multiple strike pricing
References
8.1 Introduction
In this overview chapter, we will discuss the use of exponentially converging
techniques for option valuation. We will focus on the pricing
of European options, as they are the basic instruments within a calibration
procedure when fitting the parameters in asset dynamics. The numerical solution
amounts to computing the discounted expectation of the payoff
function. For the computation of the expectation, we require knowledge about
where v denotes the option value, T the maturity, t the initial date, EQ the
expectation operator under the risk-neutral measure Q, x and y the state
variables at time t and T , respectively, f (y|x) the probability density function
of y given x, and r the deterministic risk-neutral interest rate.
Whereas f is typically not known, the characteristic function is often avail-
able. We represent the option values as function of the scaled log-asset prices,
and denote these prices by,
\[ v(x,t) \approx v_1(x,t) = e^{-r(T-t)} \int_a^b v(y,T)\, f(y|x)\, dy. \tag{8.3} \]
where the apostrophe (’) after the summation sign denotes that the first term
of the summation is divided by 2. We will refer to Dk (x) as the (Fourier cosine)
density coefficients.
Inserting the Fourier cosine expansion of f (y|x) into Equation 8.3, using
Fubini’s Theorem, gives,
\[ v_1(x,t) = e^{-r(T-t)} {\sum_{k=0}^{\infty}}' D_k(x) \left[ \int_a^b v(y,T) \cos\left( k\pi\,\frac{y-a}{b-a} \right) dy \right], \tag{8.5} \]
where we note that the integral at the right-hand side is equal to the Fourier
coefficients of v(y, T ) in y (except for a constant). We therefore define the
payoff coefficients Vk as the Fourier cosine series coefficients of v(y, T ) as
\[ V_k := \frac{2}{b-a} \int_a^b v(y,T) \cos\left( k\pi\,\frac{y-a}{b-a} \right) dy, \tag{8.6} \]
and obtain,
\[ v_1(x,t) = \frac{b-a}{2}\, e^{-r(T-t)} {\sum_{k=0}^{\infty}}' D_k(x)\, V_k. \]
Due to the rapid decay of the payoff and density coefficients, we can further
truncate the series summation to obtain,
\[ v_2(x,t) = \frac{b-a}{2}\, e^{-r(T-t)} {\sum_{k=0}^{N-1}}' D_k(x)\, V_k. \]
\[ \check{f}(\omega; x) = \int_{\mathbb{R}} f(y|x)\, e^{i\omega y}\, dy \approx \int_a^b f(y|x)\, e^{i\omega y}\, dy. \]
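\[
\begin{aligned}
D_k(x) &= \frac{2}{b-a} \int_a^b f(y|x) \cos\left( k\pi\,\frac{y-a}{b-a} \right) dy \\
&= \frac{2}{b-a}\, \mathrm{Re}\left\{ e^{-ik\pi \frac{a}{b-a}} \int_a^b f(y|x) \exp\left( i\,\frac{k\pi}{b-a}\, y \right) dy \right\} \\
&\approx \frac{2}{b-a}\, \mathrm{Re}\left\{ e^{-ik\pi \frac{a}{b-a}} \int_{\mathbb{R}} f(y|x) \exp\left( i\,\frac{k\pi}{b-a}\, y \right) dy \right\} \\
&= \frac{2}{b-a}\, \mathrm{Re}\left\{ \check{f}\left( \frac{k\pi}{b-a}; x \right) e^{-ik\pi \frac{a}{b-a}} \right\} =: D_k^*(x).
\end{aligned}
\tag{8.7}
\]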
\[ v(x,t) \approx v_3(x,t) = e^{-r(T-t)} {\sum_{k=0}^{N-1}}' \mathrm{Re}\left\{ \check{f}\left( \frac{k\pi}{b-a}; x \right) e^{-ik\pi \frac{a}{b-a}} \right\} V_k, \tag{8.8} \]
\[ V_k = \frac{2}{b-a} \int_a^b \left[ \alpha \cdot K (e^y - 1) \right]^+ \cos\left( k\pi\,\frac{y-a}{b-a} \right) dy. \]
Let us consider a European call option, α = 1. For a put, the steps are similar.
We distinguish two different cases. If a < b < 0, the integral equals zero, and
Vk = 0 for all k. In the other case, set ā = max(0, a). We can then rewrite
Vk as,
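\[ V_k = K \left[ \frac{2}{b-a} \int_{\bar{a}}^{b} e^y \cos\left( k\pi\,\frac{y-a}{b-a} \right) dy - \frac{2}{b-a} \int_{\bar{a}}^{b} \cos\left( k\pi\,\frac{y-a}{b-a} \right) dy \right], \]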
where the first term within the brackets represents the Fourier cosine coeffi-
cient of the function ey and the second term the Fourier cosine coefficient of
the constant function 1. Both of them can be solved analytically using basic
calculus, and for a proof the reader is referred to [4].
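To make the COS pricing formula concrete, the following sketch prices a European call under geometric Brownian motion, where the characteristic function is known in closed form and the result can be checked against the Black-Scholes formula. It is an illustration under assumptions: the model, the truncation rule a, b = mean ∓ L·std with L = 10, and the function names `cos_call` and `bs_call` are choices made for this example, not taken from the text. The state variable is y = log(S_T/K), and the two integrals are evaluated with elementary antiderivatives.

```python
import math
import numpy as np


def bs_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes call, used only as a reference value."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return S0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)


def cos_call(S0, K, r, sigma, T, N=128, L=10.0):
    """COS approximation of a European call under GBM (illustrative)."""
    x = math.log(S0 / K)
    mean = x + (r - 0.5 * sigma ** 2) * T      # mean of y = log(S_T / K)
    std = sigma * math.sqrt(T)
    a, b = mean - L * std, mean + L * std      # assumed truncation rule
    if b <= 0.0:
        return 0.0                             # payoff vanishes on [a, b]
    a_bar = max(0.0, a)

    k = np.arange(N)
    u = k * math.pi / (b - a)

    # Density coefficients D_k^* from the characteristic function of y given x.
    f_check = np.exp(1j * u * mean - 0.5 * (std * u) ** 2)
    D = 2.0 / (b - a) * np.real(f_check * np.exp(-1j * u * a))

    # Integral of e^y cos(u (y - a)) over [a_bar, b].
    chi = (np.cos(u * (b - a)) * math.exp(b) - np.cos(u * (a_bar - a)) * math.exp(a_bar)
           + u * (np.sin(u * (b - a)) * math.exp(b)
                  - np.sin(u * (a_bar - a)) * math.exp(a_bar))) / (1.0 + u ** 2)

    # Integral of cos(u (y - a)) over [a_bar, b]; the k = 0 term is the length.
    psi = np.empty(N)
    psi[0] = b - a_bar
    psi[1:] = (np.sin(u[1:] * (b - a)) - np.sin(u[1:] * (a_bar - a))) / u[1:]

    V = 2.0 / (b - a) * K * (chi - psi)        # payoff coefficients V_k

    terms = D * V
    terms[0] *= 0.5                            # primed sum: halve the first term
    return 0.5 * (b - a) * math.exp(-r * T) * terms.sum()


print(cos_call(100.0, 100.0, 0.05, 0.2, 1.0), bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
```

For this smooth density the two printed values should agree to many digits with a modest number of terms, which is the exponential convergence the chapter refers to.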
and,
f (x) ∈ Vj ⇔ f (2x) ∈ Vj+1 .
If these conditions are met, then a function φ ∈ V0 exists, such that {φj,k }k∈Z
forms an orthonormal basis of Vj , where,
In other words, the function φ, called the scaling function or father wavelet,
generates an orthonormal basis for each Vj subspace.
Let us define $W_j$ in such a way that $V_{j+1} = V_j \oplus W_j$. That is, $W_j$ is the
space of functions in $V_{j+1}$ but not in $V_j$, and so, $L^2(\mathbb{R}) = \bigoplus_j W_j$. Then a
function $\psi \in W_0$ exists, the mother wavelet, such that by defining,
the wavelet family {ψj,k }k∈Z gives rise to an orthonormal basis of Wj and
{ψj,k }j,k∈Z is a wavelet basis of L2 (R).
For any f ∈ L2 (R), a projection map of Pm : L2 (R) → Vm is defined by
means of,
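\[
\begin{aligned}
P_m f(x) &= \sum_{j=-\infty}^{m-1} \sum_{k \in \mathbb{Z}} d_{j,k}\, \psi_{j,k}(x) = \sum_{k \in \mathbb{Z}} c_{m,k}\, \phi_{m,k}(x), \\
\text{where} \quad d_{j,k} &= \int_{\mathbb{R}} f(x)\, \psi_{j,k}(x)\, dx, \\
c_{m,k} &= \int_{\mathbb{R}} f(x)\, \phi_{m,k}(x)\, dx.
\end{aligned}
\tag{8.11}
\]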
Note that the first part in Equation 8.11 is a truncated wavelet series. If j were
allowed to go to +∞, we would have the full wavelet series. The second part
in Equation 8.11 gives us an equivalent sum in terms of the scaling functions
φm,k . When m tends to infinity, by the theory of MRA, the truncated wavelet
series converges to f .
As opposed to the Fourier series in Equation 8.4, wavelets can be translated
(by means of k) and stretched or compressed (by means of j) to accurately
represent local properties of a function.
but the resulting wavelet family $\{\phi^j_{m,k}\}_{m,k \in \mathbb{Z}}$ does not form an orthonormal
basis of $L^2(\mathbb{R})$. However, they do form a Riesz basis, a relaxation of
orthonormality, which still allows us to apply MRA. For details about Riesz bases,
see [10].
Following the MRA framework, we define the wavelet family $\{\phi^j_{m,k}\}_{m,k \in \mathbb{Z}}$
with wavelets,
\[ \phi^j_{m,k}(x) = 2^{m/2}\, \phi^j(2^m x - k), \]
for a fixed wavelet scale m. We discuss the choice of m ∈ N in the numerical
section at the end of this chapter.
Cardinal B-spline functions are compactly supported, with support
[0, j + 1], and their Fourier transform is,
\[ \hat{\phi}^j(\omega) = \left( \frac{1 - e^{-i\omega}}{i\omega} \right)^{j+1}. \]
Since splines are only piecewise polynomial functions (see Figure 8.1), they
are very easy to implement.
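As an illustration of how little code a cardinal B-spline needs, the sketch below evaluates the order-j spline with the standard two-term recursion, starting from the indicator of [0, 1). The recursion and the function name `cardinal_bspline` are assumptions of this example (the chapter does not spell out an implementation); the script then checks the support [0, j + 1], the partition of unity, and the Fourier transform stated above, using the convention that the transform integrates against exp(−iωx).

```python
import numpy as np


def cardinal_bspline(j, x):
    """Cardinal B-spline of order j (degree j) with support [0, j + 1]."""
    x = np.asarray(x, dtype=float)
    if j == 0:
        return np.where((x >= 0.0) & (x < 1.0), 1.0, 0.0)
    # Standard recursion linking order j to two shifted copies of order j - 1.
    return (x * cardinal_bspline(j - 1, x)
            + (j + 1 - x) * cardinal_bspline(j - 1, x - 1.0)) / j


def fourier_transform_numeric(j, omega, n=30_001):
    """Numerical Fourier transform of the order-j spline on a fine grid.

    The sum is a trapezoidal rule because the spline vanishes at both ends
    of its support for j >= 1.
    """
    x = np.linspace(0.0, j + 1.0, n)
    h = x[1] - x[0]
    return h * np.sum(cardinal_bspline(j, x) * np.exp(-1j * omega * x))


def fourier_transform_exact(j, omega):
    return ((1.0 - np.exp(-1j * omega)) / (1j * omega)) ** (j + 1)


grid = np.linspace(0.0, 0.999, 50)
for order in range(4):
    total = sum(cardinal_bspline(order, grid - k) for k in range(-5, 6))
    print(order, float(np.max(np.abs(total - 1.0))))   # partition of unity

print(fourier_transform_numeric(2, 1.7), fourier_transform_exact(2, 1.7))
```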
In Ortiz-Gracia and Oosterlee [6], two methods for approximating the den-
sity function are described. We focus on the method that applies a Wavelet
Approximation on a bounded interval [a, b], the WA[a,b] method.
[Figure 8.1: B-splines of order 0, 1, 2, and 3.]
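\[ f^c_{m,j}(y|x) = \sum_{k=0}^{(j+1)\cdot(2^m - 1)} D^j_{m,k}(x)\, \phi^j_{m,k}\left( (j+1) \cdot \frac{y-a}{b-a} \right), \quad j \ge 0, \tag{8.12} \]
\[
\begin{aligned}
v_2(x,t) &= e^{-r(T-t)} \int_a^b v(y,T)\, f^c_{m,j}(y|x)\, dy \\
&= e^{-r(T-t)} \sum_{k=0}^{(j+1)\cdot(2^m - 1)} D^j_{m,k}(x)\, V^j_{m,k},
\end{aligned}
\tag{8.13}
\]
\[ V^j_{m,k} = \int_a^b v(y,T)\, \phi^j_{m,k}\left( (j+1) \cdot \frac{y-a}{b-a} \right) dy. \tag{8.14} \]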
where γ denotes a circle of radius ρ, ρ > 0, about the origin. We set ρ = 0.9995
[13]. Considering now the change of variables $z = \rho e^{iu}$, and the approximation
$P^j_m(z; x) \approx Q^j_m(z; x)$ gives us,
\[ D^j_{m,k}(x) \approx \frac{1}{2\pi\rho^k} \int_0^{2\pi} Q^j_m(\rho e^{iu}; x)\, e^{-iku}\, du. \tag{8.16} \]
We approximate the above integral with the Trapezoidal Rule over the grid
points $u_n = n\,\frac{2\pi}{N}$ for $N = 2^m (j+1)$ and $n = 0, 1, 2, \ldots, N-1$. Thus the final
Note that we can directly apply the FFT algorithm to compute the whole
vector of coefficients $\{D^j_{m,k}\}_{k=0}^{N-1}$ with a computational complexity of just
$O(N \cdot \log_2 N)$.
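The step from the contour integral in Equation 8.16 to a single FFT can be sketched as follows. The example is generic and rests on assumptions: `Q` stands for any function that is a power series in z with real coefficients (here a hypothetical polynomial with known coefficients, so the result can be checked), and `series_coefficients` is an illustrative name. Applying the trapezoidal rule on the grid u_n = 2πn/N turns the integral into a discrete Fourier transform divided by N ρ^k.

```python
import numpy as np


def series_coefficients(Q, N, rho=0.9995):
    """Recover D_0, ..., D_{N-1} of Q(z) = sum_k D_k z^k with one FFT.

    Trapezoidal rule for (1 / (2 pi rho^k)) * integral of Q(rho e^{iu}) e^{-iku} du
    over u_n = 2 pi n / N. Real coefficients are assumed.
    """
    u = 2.0 * np.pi * np.arange(N) / N
    values = Q(rho * np.exp(1j * u))
    k = np.arange(N)
    # np.fft.fft computes sum_n values[n] * exp(-2 pi i k n / N).
    return np.real(np.fft.fft(values)) / (N * rho ** k)


# Hypothetical test function: a polynomial with known coefficients.
known = np.array([0.5, -1.0, 2.0, 0.25, 0.0, 3.0, -0.75, 1.5])
Q = lambda z: np.polynomial.polynomial.polyval(z, known)

recovered = series_coefficients(Q, N=len(known))
print(np.max(np.abs(recovered - known)))
```

For a polynomial of degree below N the recovery is exact up to rounding; for a genuine power series the choice of ρ close to 1 balances aliasing from higher-order coefficients against the amplification by 1/ρ^k.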
The resulting B-splines wavelet pricing formula for general European
options is,
\[ v(x,t) \approx e^{-r(T-t)} \sum_{k=0}^{N} D^{j,*}_{m,k}(x)\, V^j_{m,k}, \tag{8.18} \]
where the density coefficients $D^{j,*}_{m,k}(x)$ are given by Equation 8.17. The payoff
coefficients depend on the type of contract, which we discuss in the following
section for plain vanilla options.
\[ V^j_{m,k} = \int_a^b \left[ \alpha \cdot K(e^y - 1) \right]^+ \phi^j_{m,k}\left( (j+1) \cdot \frac{y-a}{b-a} \right) dy. \tag{8.19} \]
Let us consider a European call option, $\alpha = 1$. For a put, the steps are similar.
We distinguish two different cases. If $a < b < 0$, the integral equals zero,
and $V^j_{m,k} = 0$ for all $k$. In the other case, set $\bar{a} = \max(0, a)$. We can then
rewrite $V^j_{m,k}$ as,
\[ V^j_{m,k} = K \left[ \int_{\bar{a}}^{b} e^y\, \phi^j_{m,k}\left( (j+1) \cdot \frac{y-a}{b-a} \right) dy - \int_{\bar{a}}^{b} \phi^j_{m,k}\left( (j+1) \cdot \frac{y-a}{b-a} \right) dy \right]. \tag{8.20} \]
Both of the integrals can be solved analytically using basic calculus, and
for a proof the reader is referred to [6].
FIGURE 8.2: Shannon scaling function φ(x) (phi, thick line) and wavelet
ψ(x) (psi, dashed line).
where the scaling functions are defined from φ(x) = sinc(x) for a fixed wavelet
scale m ∈ N.
Since Shannon wavelets have infinite support, we take a different approach
in truncating the wavelet series. We note that for h ∈ Z,
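\[ f\left( \frac{h}{2^m} \,\Big|\, x \right) \approx P_m f\left( \frac{h}{2^m} \,\Big|\, x \right) = 2^{\frac{m}{2}} \sum_{k \in \mathbb{Z}} D_{m,k}(x)\, \delta_{k,h} = 2^{\frac{m}{2}} D_{m,h}(x). \]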
Now, since $f \in L^2(\mathbb{R})$ is nonnegative, and if we assume that
$\lim_{x \to \pm\infty} f(x) = 0$, then we conclude that $D_{m,k}$ vanishes as well as $k \to \pm\infty$.
We therefore approximate the infinite series in Equation 8.21 by a finite
summation without considerable loss of density mass,
\[ f(y|x) \approx f_m(y|x) := \sum_{k=k_1}^{k_2} D_{m,k}(x)\, \phi_{m,k}(y), \tag{8.22} \]
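for conveniently chosen integers $k_1 < k_2$. When setting $I_m = \left[ \frac{k_1}{2^m}, \frac{k_2}{2^m} \right]$, the
option pricing formula becomes,
\[
\begin{aligned}
v(x,t) &= e^{-r(T-t)} \int_{\mathbb{R}} v(y,T)\, f(y|x)\, dy \\
&\approx e^{-r(T-t)} \sum_{k=k_1}^{k_2} D_{m,k}(x)\, V_{m,k},
\end{aligned}
\tag{8.23}
\]
\[ \frac{k_1}{2^m} \le a < b \le \frac{k_2}{2^m}, \]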
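\[ \mathrm{sinc}(t) \approx \mathrm{sinc}^*(t) := \frac{1}{2^{J-1}} \sum_{j=1}^{2^{J-1}} \cos\left( \frac{2j-1}{2^J}\, \pi t \right), \tag{8.25} \]
\[ |\mathrm{sinc}(t) - \mathrm{sinc}^*(t)| \le \frac{(\pi c)^2}{2^{2(J+1)} - (\pi c)^2}, \tag{8.26} \]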
Proof. We show how to find the expression for sinc∗ (t). The proof of the error
bound is Lemma 2 in Ortiz-Gracia and Oosterlee [11]. As shown by Vieta,
and described by Gearhart and Shultz [16], the sinc function can be written
as an infinite product,
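\[ \mathrm{sinc}(t) = \prod_{j=1}^{\infty} \cos\left( \frac{\pi t}{2^j} \right), \tag{8.27} \]
\[ \mathrm{sinc}(t) \approx \prod_{j=1}^{J} \cos\left( \frac{\pi t}{2^j} \right) = \frac{1}{2^{J-1}} \sum_{j=1}^{2^{J-1}} \cos\left( \frac{2j-1}{2^J}\, \pi t \right) =: \mathrm{sinc}^*(t). \]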
We now note the resemblance between the integral in the right-hand side of
the equation above and the integral in the COS method in Equation 8.7. In a
similar way, we replace the integral over the unknown density function by its
Fourier transform,
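\[
\begin{aligned}
\int_{\mathbb{R}} f(x) \cos\left( \frac{2j-1}{2^J}\, \pi (2^m x - k) \right) dx
&= \mathrm{Re}\left\{ \int_{\mathbb{R}} f(x) \exp\left( -i\, \frac{2j-1}{2^J}\, \pi (2^m x - k) \right) dx \right\} \\
&= \mathrm{Re}\left\{ e^{i \frac{2j-1}{2^J} \pi k} \int_{\mathbb{R}} f(x) \exp\left( -i\, \frac{(2j-1)\pi 2^m}{2^J}\, x \right) dx \right\} \\
&= \mathrm{Re}\left\{ e^{i \frac{2j-1}{2^J} \pi k}\, \hat{f}\left( \frac{(2j-1)\pi 2^m}{2^J} \right) \right\}.
\end{aligned}
\tag{8.28}
\]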
Inserting this into the density coefficients gives us the expression,
\[ D^*_{m,k}(x) = \frac{2^{m/2}}{2^{J-1}} \sum_{j=1}^{2^{J-1}} \mathrm{Re}\left\{ \hat{f}\left( \frac{(2j-1)\pi 2^m}{2^J}; x \right) e^{\frac{ik\pi(2j-1)}{2^J}} \right\}. \tag{8.29} \]
A strategy for choosing $J$ follows from Theorem 8.1, which implies that when
we set $M_{m,k} = \max(|2^m a - k|, |2^m b + k|)$ and $M_m := \max_{k_1 < k < k_2} M_{m,k}$, then
we set $J = j := \lceil \log_2(\pi M_m) \rceil$, where $\lceil x \rceil$ denotes the smallest integer greater
than or equal to $x$. For a proof, the reader is referred to [11].
Although for every $k$ another $J$ could be chosen, we decide to fix a $j$ for
all $k$, such that we can benefit from the efficiency of the FFT algorithm to
compute the vector of density coefficients $\{D_{m,k}(x)\}_{k=k_1}^{k_2}$ at once, as described
by Ortiz-Gracia and Oosterlee [11].
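Equation 8.29 can be tried out on a density whose Fourier transform is known. The sketch below is a hedged illustration: it uses a normal density with assumed mean 0.3 and unit variance, evaluates the double sum directly with a matrix product rather than the FFT mentioned in the text, and picks J from the width k2 − k1 as a simple stand-in for the rule above. The names `swift_density_coeffs`, `gaussian_fhat`, and `gaussian_pdf` are hypothetical. The result is compared with the sampling property D_{m,k} ≈ 2^{−m/2} f(k/2^m).

```python
import math
import numpy as np

MU, SIGMA = 0.3, 1.0                            # illustrative density parameters


def gaussian_fhat(omega):
    """Fourier transform, with kernel exp(-i omega x), of the N(MU, SIGMA^2) density."""
    return np.exp(-1j * omega * MU - 0.5 * (SIGMA * omega) ** 2)


def gaussian_pdf(x):
    return np.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))


def swift_density_coeffs(fhat, m, k1, k2, J):
    """Density coefficients D*_{m,k}, k = k1..k2, by direct summation of Eq. 8.29."""
    k = np.arange(k1, k2 + 1)
    j = np.arange(1, 2 ** (J - 1) + 1)
    omega = (2 * j - 1) * math.pi / 2 ** J      # nodes (2j - 1) pi / 2^J
    transform = fhat(omega * 2 ** m)            # f^ at (2j - 1) pi 2^m / 2^J
    phase = np.exp(1j * np.outer(k, omega))     # e^{i k pi (2j - 1) / 2^J}
    return 2 ** (m / 2) / 2 ** (J - 1) * np.real(phase * transform).sum(axis=1)


m, a, b = 2, -8.0, 8.0
k1, k2 = int(2 ** m * a), int(2 ** m * b)
J = math.ceil(math.log2(math.pi * (k2 - k1)))   # assumed simple choice of J
coeffs = swift_density_coeffs(gaussian_fhat, m, k1, k2, J)

k = np.arange(k1, k2 + 1)
sampled = 2 ** (-m / 2) * gaussian_pdf(k / 2 ** m)
print(float(np.max(np.abs(coeffs - sampled))))
```

The direct summation costs O((k2 − k1) · 2^{J−1}) operations; the FFT formulation referred to in the text reduces this, but the direct form makes the correspondence with Equation 8.29 easier to read.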
WA[a,b] methods, we do not have an analytic expression for the payoff coeffi-
cients, but we can once more benefit from the FFT algorithm for an efficient
approximation.
We look for an expression for the payoff for a European call. The steps for
deriving a formula for the European put are similar. Recall that the payoff
coefficients for a European call are defined by,
\[ V_{m,k} = K \left[ \int_{I_m} e^y\, \phi_{m,k}(y)\, dy - \int_{I_m} \phi_{m,k}(y)\, dy \right], \tag{8.30} \]
\[ I^1_{j,k}(a, b) := \int_a^b e^y \cos\left( \omega_j (2^m y - k) \right) dy \]
and,
\[ I^2_{j,k}(a, b) := \int_a^b \cos\left( \omega_j (2^m y - k) \right) dy, \]
where $\omega_j := \frac{2j-1}{2^J}\, \pi$. Note that these integrals are just a change of variables
of integrals we had to solve for the COS payoff coefficients. For a proof the
reader is referred to [11].
When k2 ≤ 0, the payoff coefficients vanish, that is, Vm,k = 0 for every k.
In case 0 < k2 , we can write Equation 8.30 as,
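\[ V_{m,k} \approx V^*_{m,k} := K\, \frac{2^{m/2}}{2^{J-1}} \sum_{j=1}^{2^{J-1}} \left[ I^1_{j,k}\left( \frac{\bar{k}_1}{2^m}, \frac{k_2}{2^m} \right) - I^2_{j,k}\left( \frac{\bar{k}_1}{2^m}, \frac{k_2}{2^m} \right) \right]. \tag{8.31} \]
The resulting SWIFT pricing formula for general European options is,
\[ v(x,t) \approx e^{-r(T-t)} \sum_{k=k_1}^{k_2} D^*_{m,k}(x)\, V^*_{m,k}, \tag{8.32} \]
where the density coefficients are given by Equation 8.29, and the payoff
coefficients $V^*_{m,k}$ for a European call are given by Equation 8.31.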
\[ f(y|x) = f(y - x|0) = \sum_{k=k_1}^{k_2} D^*_{m,k}(0)\, \phi_{m,k}(y - x), \]
where $D^*_{m,k}(x)$ are the density coefficients as in Equation 8.29, evaluated at
$x = 0$. The SWIFT option pricing formula then becomes,
\[ v(x,t) = e^{-r(T-t)} \sum_{k=k_1}^{k_2} D^*_{m,k}(0)\, V^*_{m,k}(x), \]
\[ \text{where} \quad V^*_{m,k}(x) := \int_{I_m} v(y,T)\, \phi_{m,k}(y - x)\, dy. \]
Compared to the original SWIFT pricing formula in Equation 8.32, the depen-
dence on x has been moved from the density coefficients to the payoff coef-
ficients. The density coefficients have to be computed only once. The payoff
coefficients now depend on x, and they are generally cheaper to compute,
especially for the WA[a,b] method.
TABLE 8.1: CPU times (in milliseconds) for a European put on CGMY
dynamics at different scales for a corresponding absolute price error

        COS                     SWIFT                     Haar
N       Error      Time         Scale   Error      Time   Scale   Error      Time
32      1.36e−02   0.41         1       1.58e−01   0.36   5       8.59e−02   0.42
64      3.32e−05   0.34         2       2.27e−04   0.42   7       1.41e−04   0.49
128     3.42e−09   0.39         3       5.61e−07   0.48   9       5.13e−07   1.10
256     4.44e−14   0.53         4       5.80e−11   0.70   11      1.73e−08   2.16

Note: Reference price by the COS method.
[Figure, two panels: density coefficients (left) and payoff coefficients (right) versus the coefficient index k, on a logarithmic vertical scale.]
Wavelets form a local basis and, as can be seen from the figure, each coefficient
$D^{j,*}_{m,k}$ only affects the points of the density locally, in the interval $\left[ \frac{k}{2^m}, \frac{k+1}{2^m} \right]$.
We can avoid large round-off errors by removing the payoff coefficients that
cause them at the right-hand side of the domain. We
therefore consider the truncated series,
\[ v_{\kappa_m}(x,t) := e^{-r(T-t)} \sum_{k=0}^{\kappa_m} D^{j,*}_{m,k}(x)\, V^j_{m,k}, \tag{8.33} \]
with $D^{j,*}_{m,k}$ as in Equation 8.17, and by choosing $\kappa_m$ such that $v_{\kappa_m}(x,t) < S_0$,
using that $S_0$ is an upper bound for the value of a call, we find an error of
about $10^{-1}$.
[Figure: absolute price error versus number of coefficients for the COS, Haar, linear B-spline, and SWIFT methods; one panel is titled "Heston EU call T = 45".]
[Figure, two panels: pricing error versus strike for COS (n = 512), SWIFT (m = 8), and Haar (m = 9); CPU time in milliseconds versus number of strikes priced simultaneously.]
Remark 8.2. The SWIFT method has a slight disadvantage in multiple strike
pricing, as both the payoff and density coefficients are computed using the FFT,
and thus the resulting computational complexity is O((M + 1)N log2 N ).
generally for log-asset models from the affine jump-diffusion class, like the
Heston model. By wavelets, and in particular by the SWIFT method, we can
enhance the robustness of Fourier methods, when dealing with fat-tailed dis-
tributions or very long (and very short) maturity options. Parallelization tech-
niques may be employed in the context of the calibration framework, where
option pricing computations need to be performed for multiple strike prices.
It is the strike direction which lends itself well to parallelization, leading to
a truly high-performance calibration.
References
1. Carr, P.P. and Madan, D.B. Option valuation using the fast Fourier transform.
Journal of Computational Finance, 2:61–73, 1999.
3. Lindström, E., Ströjby, J., Brodén, M., Wiktorsson, M., and Holst, J. Sequential
calibration of options. Computational Statistics & Data Analysis, 52:2877–2891,
2008.
4. Fang, F. and Oosterlee, C.W. A novel option pricing method based on Fourier-
cosine series expansions. SIAM Journal on Scientific Computing, 31(2):826–848,
2008.
7. Ortiz-Gracia, L. and Oosterlee, C.W. Efficient VaR and expected shortfall com-
putations for nonlinear portfolios within the delta-gamma approach. Applied
Mathematics and Computation, 244:16–31, 2014.
8. Kirkby, J.L. Efficient option pricing by frame duality with the fast Fourier trans-
form. SIAM Journal on Financial Mathematics, 6(1):713–747, 2016.
10. Daubechies, I. Ten Lectures on Wavelets. Society for Industrial and Applied
Mathematics, Philadelphia, PA, USA, 1992.
11. Ortiz-Gracia, L. and Oosterlee, C.W. A highly efficient Shannon wavelet inverse
Fourier technique for pricing European options. SIAM Journal on Scientific
Computing, 38(1):B118–B143, 2016.
13. Ortiz-Gracia, L. and Masdemont, J.J. Peaks and jumps reconstruction with
B-splines scaling functions. Journal of Computational and Applied Mathematics,
272:258–272, 2014.
15. Maree, S.C., Ortiz-Gracia, L., and Oosterlee, C.W. Pricing early-exercise and
discrete barrier options by Shannon wavelet expansions. Numerische Mathe-
matik, 136(4):1035–1070, 2017.
16. Gearhart, W.B. and Shultz, H.S. The function sin(x)/x. The College Mathemat-
ics Journal, 21(2):90–99, 1990.
17. Quine, B.M. and Abrarov, S.M. Application of the spectrally integrated Voigt
function to line-by-line radiative transfer modelling. Journal of Quantitative
Spectroscopy & Radiative Transfer, 244:37–48, 2013.
18. von Sydow, L. et al. BENCHOP—The BENCH marking project in option pric-
ing. International Journal of Computer Mathematics, 92:12, 2015.
19. Carr, P.P., Geman, H., Madan, D.B., and Yor, M. The fine structure of asset
returns: An empirical investigation. Journal of Business, 75:305–332, 2002.
20. Heston, S. A closed-form solution for options with stochastic volatility with
applications to bond and currency options. The Review of Financial Studies,
6:327–343, 1993.
Chapter 9
A Practical Robust Long-Term
Yield Curve Model
CONTENTS
9.1 Introduction
9.2 Multifactor Yield Curve Models and Their Drawbacks
9.2.1 Requirements for model development
9.2.2 Available multifactor yield curve models
9.2.3 Classification of three-factor affine short rate models
9.2.4 Difficulties with Gaussian affine models
9.3 Nonlinear Black Correction for the EFM Model
9.3.1 Three-factor basic EFM model
9.3.2 EFM model calibration
9.3.3 Black correction for negative rates
9.3.4 Stylized properties of Black models
9.4 HPC Approaches to Calibrating Black Models
9.4.1 Three-factor Black model calibration
9.4.2 Monte Carlo bond pricing
9.4.3 PDE bond pricing
9.4.4 Black model calibration progress
9.5 UKF EM Algorithm HPC Implementation
9.5.1 UKF for the Black EFM model
9.5.2 Quasi-maximum likelihood estimation
9.5.3 Technical implementation
9.5.4 HPC implementation
9.6 Empirical Evaluation of the Model In- and Out-of-Sample
9.6.1 Data
9.6.2 Yield curve bootstrapping
9.6.3 In-sample goodness-of-fit
9.6.4 Out-of-sample Monte Carlo projection
9.7 Conclusion
Acknowledgments
Appendix 9A
References
9.1 Introduction
Since the 2007–2008 financial crisis, low interest rates have prevailed in
all the world’s major developed economies and were presaged by more than a
decade in Japan. This has posed a problem for the widespread use of diffusion-
based yield curve models for derivative and other structured financial product
pricing and for forward rate simulation for systematic investment, asset lia-
bility management (ALM), and economic forecasting. Indeed, while Gaussian
models remained sufficiently accurate for pricing and discounting in relatively
high-rate environments, their tendency to produce an unacceptable proportion
of negative forward rates at short maturities with Monte Carlo scenario simu-
lation from initial conditions in low-rate environments has called their current
use into question. The implications for this question of negative nominal rates
in deflationary regimes remain to be seen, as does the necessity for currently
fashionable multicurve models. Be that as it may, beginning with work in the
Bank of Japan in the early 2000s, there has recently been a flurry of research in
universities, central banks, and financial services firms to develop yield curve
models whose simulation produces nonnegative rate scenarios.
All this work is based on a posthumously published suggestion of Fisher
Black (1995) to apply a call option payoff with zero strike to the model
instantaneous short rate which leads to a piecewise linear nonlinearity in
standard Gaussian affine yield curve model formulae for zero coupon (dis-
count) bond prices and the corresponding yields, and precludes their explicit
closed-form solution. As a result, most of the published solutions to Black-
corrected yield curve models to date are approximations and even these require
high-performance computing (HPC) techniques for numerical solution. We
shall study here an obvious approximation which works extremely well and is
amenable to cloud computing for speedup.
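The zero-strike call payoff that Black suggested can be written r = max(0, s), where s is the unconstrained (shadow) short rate. The following minimal Python sketch shows the resulting piecewise linear map; the function name and the sample rates are illustrative assumptions and do not come from the chapter.

```python
def black_short_rate(shadow_rate: float) -> float:
    """Zero-strike call payoff applied to the shadow rate: r = max(0, s)."""
    return max(0.0, shadow_rate)


# Hypothetical shadow short rates, from negative to positive.
shadow_rates = [-0.02, -0.005, 0.0, 0.01, 0.03]
floored_rates = [black_short_rate(s) for s in shadow_rates]
```

Negative shadow rates map to zero and positive ones pass through unchanged, which is the kink that removes the closed-form bond price formulae of the underlying Gaussian model.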
In practice, there are a variety of approaches to yield curve modeling which
are driven by the intended use of the models. The literature is devoted
predominantly to the needs of investment banks in pricing and hedging fixed
income derivative and other structured products. Model calibration is short
term, to current forward market data at pricing and hedging time, and is
updated for rehedging. A specific model is evaluated by its realized hedging
profit and loss.
A second approach is that of central bank forecasting for monetary pol-
icy making. Here calibration uses long-term historical data for medium-term
forecasts and is updated for the next forecast. Model accuracy appears from
the open literature to be mainly evaluated by in-sample fit to the historical
data employed, with little out-of-sample forecasting evaluation reported.
The approach of interest here supports consultants’ and fund managers’
advice to institutional and individual clients regarding product pricing,
investment, and asset liability management over long horizons. It involves
long-term calibration to historical market data, often using filtering techniques, which
is updated for decision points such as restructurings or portfolio rebalances.
A Practical Robust Long-Term Yield Curve Model 275
• Allow both pricing and dynamic evolution under the market (real world)
measure, that is, the model should reflect the market risk premium¹
• Reproduce a wide range of yield curve shapes and dynamics (to allow for
realistic risk assessment, for example), including steepening, flattening,²
inverted and humped yield curves
¹ As argued in Nawalkha and Rebonato (2011), this is especially relevant for the buy-
side practitioner. For sell-side banks, it usually suffices to do pricing and initial hedging
calculations under the risk-neutral measure from the forward market data on the day. Having
an exact fit to the observed yields is thus more important for the sell side.
² Products based on these properties of the yield curve are traded on the NYSE, for
example, US Treasury Flattener ETN (ticker: FLAT) and iPath US Treasury Steepener
ETN (ticker: STPP), although admittedly these are not very popular.
• Time homogeneity
the structure of forward rates. This framework is very general and convenient
for studying arbitrage-free properties in theory. However, some of the models
in this framework may be non-Markovian and most practical models coming
from the HJM framework are either well-known short rate models or market
models.
The class of market models is focused on describing the dynamics of the
observable quantities (e.g., LIBOR and swap market models). They are
especially useful for derivative pricing. However, under the low interest rate
conditions currently prevailing, the popular LIBOR Market Model (see, e.g.,
de Jong et al. 2001) may require parameter estimates that are unrealistic
(e.g., simulated cash returns more volatile than actual equity returns, with
significant probability assigned to interest rates of more than 10,000%).
Model and computational complexity considerations, as well as the appli-
cations envisaged, suggest that short rate models are the most suitable class
for our needs.
There are other factors influencing our considerations. First, we have had
long successful experience with utilization of the EFM model (see Section 9.3)
in situations in which the rate ZLB is not binding. We have confidence in the
performance of the EFM yield curve model in these situations, so we would
prefer our new model not to deviate too far from it.
Secondly, most of the current research on ZLBs is done in the framework
of short rate models. Having a way of estimating the level of “shadow” rates
may be useful, not least because some policy makers appear to monitor them.
There have been attempts in the research literature to use the “shadow” short
rate and its distance to zero as a forecast of the estimated time until the
low-rate regime is lifted, see Ueno et al. (2006) and Wu and Xia (2014). The
Federal Reserve Bank of Atlanta publishes the Wu-Xia Shadow Federal Funds
Rate based on the Wu and Xia paper. It should be noted, however, that the
level of the shadow rate is not a very reliable indicator, as it is strongly model
dependent (see Bauer and Rudebusch 2013; Christensen and Rudebusch 2013).
A review of the literature on short rate models shows that the most popular
subclass of short rate models in empirical research and applications is that of
the Affine Term Structure Models (ATSMs), owing to their analytical tractability,
flexibility, and empirical efficiency. This class of models includes Vasicek
(1977), Dothan (1978), Cox-Ingersoll-Ross (1985), Ho-Lee (1986), Hull-White
(1990), and many other one- and multifactor models.
To illuminate the analysis that we undertake below for the more complex
multifactor models, we first discuss the characteristics of the simpler one-factor
models. The stochastic differential equations (SDEs) governing the evolution
of the short rate under the risk-neutral or pricing measure Q for the respective
models are:⁴
1. Vasicek (1977)
$$d\mathbf{X}_t = \lambda(\theta - X_t)\,dt + \sigma\,d\mathbf{W}_t \tag{9.1}$$
2. Dothan (1978)
3. Cox-Ingersoll-Ross (1985)
$$d\mathbf{X}_t = \lambda(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,d\mathbf{W}_t \tag{9.3}$$
4. Ho-Lee (1986)
$$d\mathbf{X}_t = \theta_t\,dt + \sigma\,d\mathbf{W}_t \tag{9.4}$$
5. Hull-White (1990)
$$d\mathbf{X}_t = \lambda(\theta_t - X_t)\,dt + \sigma_t\,d\mathbf{W}_t. \tag{9.5}$$
⁴ We use boldface type in the sequel to denote stochastic entities, here conditionally.
where A and B are solutions of certain ODEs (see, e.g., James and Webber
2000).
The instantaneous short rate is also an affine function of the state
$$r_t = \phi_0^{Q} + \phi_X^{Q}\,\mathbf{X}_t. \tag{9.10}$$
Zero coupon bond yields to maturity, termed rates, are linked with the
bond prices by
$$y_t(\tau) = -\log P_t(\tau)/\tau. \tag{9.11}$$
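As a small worked illustration of Equation 9.11, the sketch below converts a zero coupon bond price into a continuously compounded yield and back. The function names and the sample numbers are assumptions made for the example only.

```python
import math


def zero_yield(price: float, tau: float) -> float:
    """Yield to maturity from a zero coupon bond price, y = -log(P) / tau."""
    if price <= 0.0 or tau <= 0.0:
        raise ValueError("price and maturity must be positive")
    return -math.log(price) / tau


def zero_price(yield_rate: float, tau: float) -> float:
    """Inverse map: P = exp(-y * tau)."""
    return math.exp(-yield_rate * tau)


# A hypothetical 10-year zero coupon bond yielding 3% per annum.
price_10y = zero_price(0.03, 10.0)
yield_10y = zero_yield(price_10y, 10.0)
```

A price above 1 corresponds to a negative yield under this convention, which is the situation that the Black correction is designed to rule out.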
There are models that lack affine structure (and thus forfeit simple formulae
for bond prices) but a vector of K rates $R_t$ of specified maturities may
sometimes still be recovered as the numerical solution of the Riccati equation
$$\frac{\partial R_t(\tau)}{\partial\tau} = \Lambda R_t(\tau) - \frac{1}{2}R_t(\tau)\,\Sigma S\Sigma^{\top}R_t(\tau)^{\top} + r_t\mathbf{1}, \tag{9.12}$$
where $\mathbf{1}$ is the K vector of ones (Dempster et al. 2014).
Dai and Singleton (2000) denote different affine subfamilies by $A_m(n)$ with
n the number of factors and m ≤ n the number of bounded factors. They
perform empirical tests on the different subclasses for n equal to 3. Dempster
et al. (2014) also analyzed various three-factor affine models with requirements
similar to ours to uncover a variety of shortcomings with the models evaluated.
In particular, they studied the three-factor extended Vasicek model specified
under the market measure P in Equation 9.6 by
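$$\Lambda := \begin{pmatrix}\lambda_{11} & 0 & 0\\ \lambda_{21} & \lambda_{22} & 0\\ \lambda_{31} & \lambda_{32} & \lambda_{33}\end{pmatrix}, \qquad \Theta := \begin{pmatrix}\theta_1\\ \theta_2\\ \theta_3\end{pmatrix},$$
$$\Sigma := \begin{pmatrix}\sigma_1 & 0 & 0\\ 0 & \sigma_2 & 0\\ 0 & 0 & \sigma_3\end{pmatrix}, \qquad S := \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}, \tag{9.13}$$
$$r(t) := \delta_0 + \delta_1 y_1(t) + \delta_2 y_2(t) + \delta_3 y_3(t).$$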
This Dai and Singleton $A_0(3)$ model with 16 parameters (also known as
the Hull–White model) is not econometrically identified under P (i.e., different
values of Θ can give the same paths of the factor process X) unless Θ := 0,
which is only appropriate to the pricing measure Q, and has other difficulties
as well.
Dempster et al. (2014) were led to introduce a Black-corrected affine model
which always produces nonnegative rates.6 This was based on the recent (2011)
Joslin–Singleton–Zhu (JSZ) three-factor affine Gaussian yield curve model
whose continuous evolution of the three factors Y is given by
$$dY(t) = \Lambda\bigl(\theta - Y(t) + \Pi(t)\bigr)\,dt + \Sigma\sqrt{S(t)}\,dW(t), \tag{9.14}$$
where
$$\Pi(t) = k_0 + K_1 Y(t) \tag{9.15}$$
is the affine state-dependent market price of risk (excess factor return) vector.
JSZ estimate the parameters of the discrete time version of their model with
three observed yield curve points (rates) fit exactly and a few extra rates
fit approximately by least squares by means of two standard econometric
vector autoregression (VAR) models given, respectively, under the market
(real-world) and pricing (risk-neutral) measures P and Q. Dempster et al.
(2014) use only a single extra yield curve point fit by least squares.
6 We shall describe the Black correction in the following section.
[Figures: two panels plotted against maturity (months), estimated using an exact fit to the 6-month, 5-year, and 15-year data points and an approximate fit to the 25-year point.]
January 2, 2012. This model gives a 25% probability of future 10-year negative
rates within 3.5 years starting from an initial value of about 3.2% which it
predicts will remain the median value over the 30-year forecast horizon.
In summary, we have stated a number of desirable requirements for a prac-
tical long-term yield curve model and have briefly surveyed the range of models
available in the literature. We determined that the short rate class is the most
suitable for our needs and within this class it seems that the most reasonable
decision a priori is to evaluate a model with three factors, in particular, within
the $A_0(3)$ affine class. We have illustrated some of the potential drawbacks of
such models, which lack a correction to keep simulated rates nonnegative.
FIGURE 9.3: EFM model Euro 10-year rate forecast for 30 years.
[Plot: scenario quantile paths Q_0.01, Q_0.05, Q_0.25, Q_0.5, Q_0.75, Q_0.95, and Q_0.99 with market data, January 2012 to July 2040.]
mirrored by its rate of interest: the higher a people’s intelligence and moral
strength, the lower the rate of interest (Homer and Sylla 2005).
As a low-rate environment has prevailed in most major developed countries
since 2008, and in Japan since the early 1990s, it is crucial to realistically
model rates behavior in these circumstances. We will present a Black-corrected
version of the EFM model discussed in Medova et al. (2005), Yong (2007), and
Dempster et al. (2010).
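$$\begin{aligned}
d\mathbf{X}_t &= \lambda_X(\theta_X - X_t)\,dt + \sigma_X\,d\mathbf{W}^X_t\\
d\mathbf{Y}_t &= \lambda_Y(\theta_Y - Y_t)\,dt + \sigma_Y\,d\mathbf{W}^Y_t\\
d\mathbf{R}_t &= k(X_t + Y_t - R_t)\,dt + \sigma_R\,d\mathbf{W}^R_t,
\end{aligned} \tag{9.16}$$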
and was first brought to our attention by Lehman Brothers under the auspices of Pioneer
Investments of UniCredit Bank.
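To make the dynamics of Equation 9.16 concrete, the sketch below simulates one path of the three factors with an Euler-Maruyama scheme and then floors the short rate R at zero in the spirit of the Black correction. This is only an illustration under stated assumptions: the three Brownian motions are taken to be independent, all parameter values are hypothetical, and the function and dictionary key names are not from the chapter.

```python
import math
import random


def simulate_black_efm(x0, y0, r0, p, years, steps_per_year, seed=0):
    """Euler-Maruyama path of the three-factor system (9.16).

    Returns the shadow short-rate path and its zero-floored counterpart.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps_per_year
    sqrt_dt = math.sqrt(dt)
    x, y, r = x0, y0, r0
    shadow = [r]
    for _ in range(int(round(years * steps_per_year))):
        # Independent standard normal shocks (an assumption of this sketch).
        zx, zy, zr = (rng.gauss(0.0, 1.0) for _ in range(3))
        dx = p["lam_x"] * (p["theta_x"] - x) * dt + p["sig_x"] * sqrt_dt * zx
        dy = p["lam_y"] * (p["theta_y"] - y) * dt + p["sig_y"] * sqrt_dt * zy
        dr = p["k"] * (x + y - r) * dt + p["sig_r"] * sqrt_dt * zr
        x, y, r = x + dx, y + dy, r + dr
        shadow.append(r)
    floored = [max(0.0, s) for s in shadow]  # Black correction: max(0, shadow rate)
    return shadow, floored


# Hypothetical parameter values, chosen only to give a low-rate starting point.
params = {
    "lam_x": 0.1, "theta_x": 0.04, "sig_x": 0.008,
    "lam_y": 0.3, "theta_y": -0.01, "sig_y": 0.010,
    "k": 0.5, "sig_r": 0.006,
}
shadow_path, black_path = simulate_black_efm(
    0.03, 0.0, 0.005, params, years=30, steps_per_year=12
)
```

Repeating the simulation over many seeds and taking quantiles at each date would give scenario fans of the kind shown in the chapter's forecast figures.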
$$y_t = Bx_t + d, \tag{9.19}$$
$$y_t^{\mathrm{obs}} = Bx_t + d + \varepsilon_t, \tag{9.21}$$
where $y_t^{\mathrm{obs}}$ corresponds to the yields observed in the market and B and d are
defined above. The centered measurement error process $\varepsilon_t$ is a K-vector of
serially independent Gaussian noise with covariance matrix H.⁸
Given a data series for the observed yields $y_t^{\mathrm{obs}}$, the KF generates an
estimated expected path of the Gaussian state variables and their conditional
covariance matrix $\Sigma_{t|t-1}$.
Initialization
The filter is initialized using unconditional moments, which, following
Harvey (1993), gives initial values
$$\hat{x}_0 := (I - A)^{-1}c, \qquad \operatorname{vec}(\Sigma_0) := (I - A\otimes A)^{-1}\operatorname{vec}(G), \tag{9.22}$$
8 But see Dempster and Tang (2011) regarding handling measurement error serial corre-
innovations η. The matrix A and the vector c are the entities in the transition
equation 9.20 and the elements of Σ0 can be computed analytically.
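KF prediction
$$\begin{aligned}
\hat{x}_{t|t-1} &= A\hat{x}_{t-1} + c\\
\hat{y}_{t|t-1} &= B\hat{x}_{t|t-1} + d\\
\Sigma_{t|t-1} &= A\Sigma_{t-1}A^{\top} + G.
\end{aligned} \tag{9.23}$$
KF update
$$\begin{aligned}
v_t &:= y_t^{\mathrm{obs}} - \hat{y}_{t|t-1} = y_t^{\mathrm{obs}} - B\hat{x}_{t|t-1} - d\\
F_t &= B\Sigma_{t|t-1}B^{\top} + H\\
\hat{x}_t &= \hat{x}_{t|t-1} + \Sigma_{t|t-1}B^{\top}F_t^{-1}v_t\\
\Sigma_t &= \Sigma_{t|t-1} - \Sigma_{t|t-1}B^{\top}F_t^{-1}B\Sigma_{t|t-1}.
\end{aligned} \tag{9.24}$$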
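The recursions 9.22 to 9.24 can be sketched for the simplest possible case of one latent state and one observed yield, where every matrix reduces to a scalar. This scalar setting and all numerical values are assumptions made to keep the example dependency-free; the chapter's filter is multivariate.

```python
def kf_init(A, c, G):
    """Unconditional moments (9.22) in the scalar case."""
    x0 = c / (1.0 - A)            # (I - A)^{-1} c
    p0 = G / (1.0 - A * A)        # (I - A (x) A)^{-1} vec(G)
    return x0, p0


def kf_step(x_prev, p_prev, y_obs, A, c, B, d, G, H):
    """One Kalman filter prediction (9.23) and update (9.24), scalar case."""
    # Prediction
    x_pred = A * x_prev + c
    y_pred = B * x_pred + d
    p_pred = A * p_prev * A + G
    # Update
    v = y_obs - y_pred                 # measurement prediction error
    F = B * p_pred * B + H             # prediction error variance
    gain = p_pred * B / F
    x_new = x_pred + gain * v
    p_new = p_pred - gain * B * p_pred
    return x_new, p_new, v, F


# Hypothetical scalar system and a single observation.
x0, p0 = kf_init(A=0.9, c=0.1, G=0.19)
x1, p1, v1, F1 = kf_step(x0, p0, y_obs=2.0, A=0.9, c=0.1, B=1.0, d=0.0, G=0.19, H=1.0)
```

With these numbers the predicted state is 1 and the observation is 2, so the update moves the state estimate halfway to the observation and halves the state variance.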
Quasi-maximum likelihood parameter estimation
Let Θ denote the 14 SDE model parameters of the transition equation and
define ψ := {Θ, H}. Then the log-likelihood is given by
$$\log L(\Theta, H) = -\frac{TK}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{T}\log\det F_t - \frac{1}{2}\sum_{t=1}^{T}v_t^{\top}F_t^{-1}v_t, \tag{9.25}$$
where K is the total number of maturities used, T is the number of time steps,
and v and F are computed using Equation 9.24.
The maximization of the log-likelihood is performed in two steps, alternately
optimizing Θ and H to convergence. There are two phases of the
numerical optimization: a global phase using the DIRECT global optimization
algorithm (Jones et al. 1993) to locate the region of the maximum, followed
by a local phase using an approximate conjugate gradient algorithm (Powell
1964) to locate the maximum itself.
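The control flow described above, alternating between two parameter blocks with a global phase followed by a local phase, can be sketched as follows. This is not the authors' code: a uniform grid search stands in for DIRECT, a derivative-free step-halving search stands in for Powell's method, both blocks are scalars, and the objective is a toy concave function rather than a filter likelihood.

```python
def grid_search(f, lower, upper, points=41):
    """Global phase stand-in: best point on a uniform grid."""
    step = (upper - lower) / (points - 1)
    return max((lower + i * step for i in range(points)), key=f)


def local_search(f, start, step=0.1, tol=1e-9):
    """Local phase stand-in: derivative-free search with step halving."""
    best, best_val = start, f(start)
    while step > tol:
        improved = False
        for cand in (best - step, best + step):
            val = f(cand)
            if val > best_val:
                best, best_val, improved = cand, val, True
                break
        if not improved:
            step *= 0.5
    return best


def alternating_maximize(objective, h_start, lower, upper, sweeps=30):
    """Alternate between the two blocks, global then local within each."""
    h = h_start
    theta = None
    for _ in range(sweeps):
        theta = local_search(lambda t: objective(t, h),
                             grid_search(lambda t: objective(t, h), lower, upper))
        h = local_search(lambda v: objective(theta, v),
                         grid_search(lambda v: objective(theta, v), lower, upper))
    return theta, h


def toy_log_likelihood(theta, h):
    """Hypothetical concave objective with its maximum at (1.0, 0.5)."""
    return -(theta - 1.0) ** 2 - (h - 0.5) ** 2 - 0.5 * (theta - 1.0) * (h - 0.5)


theta_hat, h_hat = alternating_maximize(toy_log_likelihood, 0.0, -2.0, 2.0)
```

The global phase only needs to land in the right basin; the local phase then does the fine work, which is the division of labor the text describes.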
FIGURE 9.4: Black JSZ model 10-year gilt rate 50-year prediction.
their linearity when modified using this idea. This makes the resulting mod-
els difficult to calibrate. We discuss different approaches to calibration in the
following section.
work on extensions to multifactor models started only after the crisis of 2008.
There are two main reasons for such a timeline. First, the ZLB was not
observed in the United States from the Great Depression until 2008. Only
in Japan from the mid-1990s did rates come near zero in a major economy.
Perhaps more importantly, the implementation of the Black correction is con-
siderably more difficult (both theoretically and computationally) than imple-
mentation of the usual ATSMs. The main problem is the lack of a closed-form
formula for the bond price given by
$$P_t(\tau) = E_t^{Q}\left[\exp\left(-\int_t^{t+\tau}(0\vee\mathbf{s}_u)\,du\right)\right]. \tag{9.27}$$
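Since Equation 9.27 has no closed form, one direct numerical route is Monte Carlo simulation of the shadow rate under the pricing measure. The sketch below assumes, purely for illustration, a one-factor Vasicek shadow rate with hypothetical parameters; the chapter's model has three factors, and the function name and arguments are not from the text.

```python
import math
import random


def mc_bond_price(s0, lam, theta, sigma, tau, n_paths=2000, steps=100,
                  seed=0, floor=True):
    """Monte Carlo estimate of E[exp(-integral of the short rate)] over [t, t + tau].

    With floor=True the integrand is max(0, s_u) as in (9.27); with
    floor=False it is the shadow rate itself.
    """
    rng = random.Random(seed)
    dt = tau / steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s = s0
        integral = 0.0
        for _ in range(steps):
            rate = max(0.0, s) if floor else s
            integral += rate * dt            # left-point rule for the time integral
            s += lam * (theta - s) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        total += math.exp(-integral)
    return total / n_paths


# Hypothetical low-rate starting point and a 5-year maturity.
black_price = mc_bond_price(0.002, 0.2, 0.02, 0.01, 5.0, floor=True)
shadow_price = mc_bond_price(0.002, 0.2, 0.02, 0.01, 5.0, floor=False)
```

Because both calls use the same seed, the floored rate is at least as large as the shadow rate on every path, so the Black bond price cannot exceed the shadow bond price and never rises above 1.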
• Method of inferring states of the latent three factors from observed mar-
ket rates
– Inverse mapping or least squares
– Extended or iterated extended Kalman filter (EKF or IEKF) using
piecewise linearization
– Unscented Kalman filter (UKF) using averaged multiple displaced KF
paths
Filtering
Christensen and Rudebusch (2013), Bauer and Rudebusch (2014), and
Lemke and Vladu (2014) use the EKF for parameter estimates. However,
Krippner (2013a) uses IEKF to fit his shadow rate approximation for the case
of two and three factors, because he found that using the EKF was not robust.
Priebsch (2013) uses the UKF. Christoffersen et al. (2014) perform a series of
comparisons of EKF with UKF and the particle filter. They conclude that the
UKF significantly outperforms the EKF and performs well compared to the
significantly more computationally expensive particle filter.
Likelihood maximization
Most of the papers on shadow rate models omit discussion of the opti-
mization methods used for this purpose. Richard (2013) mentions that he
maximizes the likelihood function by using Nelder-Mead (1965) global search
combined with Powell (1964) local search.
The mapping that links observed yields and the shadow rate is no longer
linear, so it takes the piecewise linear form
example, the shadow rates in their analysis reach the implausibly low levels
of −15%, which suggests model misspecification (see Ichiue and Ueno, 2006,
2007).
Both Bomfim (2003) and Kim and Singleton (2011) relied on a numerical
(finite-difference) method for solving a 2D parabolic quasi-linear bond price
PDE given by
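$$\frac{\partial P_t}{\partial\tau} - \frac{1}{2}\operatorname{tr}\left[\Sigma\Sigma^{\top}\frac{\partial^2 P_t}{\partial x\,\partial x^{\top}}\right] - \bigl[K(\theta - x)\bigr]^{\top}\frac{\partial P_t}{\partial x} + \max[0, s(x)]\,P_t = 0 \tag{9.31}$$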
where $P_t^S(\tau)$ is the shadow bond price (i.e., the price of a bond in a market
where currency is not available) and $C_t^A(\tau, \tau; 1)$ is the value of an American
call option at time t with maturity in τ years and strike 1, written on the
shadow bond maturing in τ years. There is no analytic formula for $C_t^A(\tau, \tau; 1)$,
but Krippner argues that the American option can be approximated by an
analytically tractable European one and introduces an auxiliary bond price
equation
$$P_t^{\mathrm{aux}}(\tau, \tau+\delta) = P_t^S(\tau+\delta) - C_t^E(\tau, \tau+\delta; 1), \tag{9.33}$$
where $C_t^E(\tau, \tau+\delta; 1)$ is the value of a European call option at time t with
maturity at time t + τ and strike 1 written on a shadow bond maturing at
t + τ + δ. Krippner then takes the limit with δ → 0 to obtain the nonnegative
(due to future currency availability immediately after maturity) instantaneous
forward rate as
$$f_t(\tau) = \lim_{\delta\to 0}\left[-\frac{\partial}{\partial\delta}\ln P_t^{\mathrm{aux}}(\tau, \tau+\delta)\right]. \tag{9.34}$$
The nonnegative yield with maturity τ in Krippner’s framework is calculated as
$$y_t(\tau) = \frac{1}{\tau}\int_t^{t+\tau}f_t(s)\,ds = y_t^S(\tau) + \frac{1}{\tau}\int_t^{t+\tau}\lim_{\delta\to 0}\frac{\partial}{\partial\delta}\left[\frac{C_t^E(\tau, \tau+\delta; 1)}{P_t(s+\delta)}\right]ds. \tag{9.35}$$
Here $y_t^S(\tau)$ are the shadow bond yields. Unfortunately, closed-form analytic
expressions for the bond prices and yields are still not available, but
they can be evaluated through calculating integrals that are numerically
tractable. More importantly, Krippner’s approach is not fully arbitrage-free.
The short rates are identical under the market measure P in the Black and
Krippner frameworks, but different under the risk-neutral measure Q. Kripp-
ner’s approach is extendible to three factors. Priebsch (2013) proposes to view
the quantity
$$\log P_t(\tau) = \log E_t^{Q}\left[\exp\left(-\int_t^{t+\tau}(0\vee\mathbf{s}_u)\,du\right)\right]. \tag{9.36}$$
These 2L sigma point results are then combined to obtain the predicted
(here yield) measurements, measurements covariance matrix, and predicted
state-measurement cross-covariance matrix
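$$\begin{aligned}
\hat{y}_{t|t-1} &= \sum_{j=0}^{2L}W_s^j\,\gamma_t^j\\
\Sigma_{y_t y_t} &= \sum_{j=0}^{2L}W_c^j\left(\gamma_t^j - \hat{y}_{t|t-1}\right)\left(\gamma_t^j - \hat{y}_{t|t-1}\right)^{\top}\\
\Sigma_{x_t y_t} &= \sum_{j=0}^{2L}W_c^j\left(\chi_{t|t-1}^j - \hat{x}_{t|t-1}\right)\left(\gamma_t^j - \hat{y}_{t|t-1}\right)^{\top},
\end{aligned} \tag{9.44}$$
where the weights $W^j$ for combining sigma point estimates (predictions) are
potentially different for the state vector and the covariance matrices. They
are given by
$$\begin{aligned}
W_s^0 &:= \frac{\lambda}{L+\lambda}, \qquad W_c^0 := \frac{\lambda}{L+\lambda} + (1 - \alpha^2 + \beta),\\
W_s^j &:= W_c^j := \frac{1}{2(L+\lambda)}, \qquad j = 1, \ldots, 2L.
\end{aligned} \tag{9.45}$$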
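The weights of Equation 9.45 are straightforward to compute once the scaling parameter λ is fixed. The sketch below assumes the usual scaled unscented transform definition λ = α²(L + κ) − L, which is not reproduced in this part of the text; the defaults for κ and β and the function name are likewise assumptions.

```python
def ukf_weights(L, alpha=1e-3, beta=2.0, kappa=0.0):
    """Sigma point weights (9.45) for state means (W_s) and covariances (W_c)."""
    lam = alpha ** 2 * (L + kappa) - L        # assumed definition of lambda
    w_rest = 1.0 / (2.0 * (L + lam))
    w_state = [lam / (L + lam)] + [w_rest] * (2 * L)
    w_cov = [lam / (L + lam) + (1.0 - alpha ** 2 + beta)] + [w_rest] * (2 * L)
    return w_state, w_cov


# Three latent factors, so 2L + 1 = 7 sigma points.
state_weights, cov_weights = ukf_weights(3, alpha=0.5)
```

The state weights always sum to one, but for small α the central weight becomes a very large negative number offset by large positive outer weights, which gives some intuition for why the filter can be sensitive to the choice of α.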
$$\log L(\Theta, H) = -\frac{TK}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{T}\log\det F_t - \frac{1}{2}\sum_{t=1}^{T}v_t^{\top}F_t^{-1}v_t,$$
by alternating between the parameters Θ and H, except that now the mea-
surement prediction errors in the last term of the log-likelihood are those of
the UKF and the calculation of the Ft terms from Equations 9.24 and 9.25
uses the UKF state covariance matrices Σt of Equation 9.48.
• Node 1:
Memory: 16 GB
[Figure: a master thread (DIRECT and Powell optimization, slave synchronization) coordinating slave threads 1 to 31, each running a Kalman filter.]
• Nodes 2 to 5:
4 x CPU Xeon (TM) 3 GHz
Memory: 16 GB
OS CentOS 5.7
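The master and slave arrangement, in which an optimizer hands out parameter sets and workers each run a filter and return a likelihood value, can be sketched with Python's standard `concurrent.futures` module. This is an illustration of the pattern only and not the authors' implementation: the objective is a toy function standing in for a filter likelihood, and the candidate parameter pairs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_objective(params):
    """Stand-in for one filter run returning a log-likelihood value."""
    theta, h = params
    return -(theta - 1.0) ** 2 - (h - 0.5) ** 2


def evaluate_in_parallel(objective, candidates, workers=4):
    """Master side: farm out evaluations and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(objective, candidates))


# Hypothetical candidate parameter sets proposed by the optimizer.
candidates = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (0.5, 0.5)]
values = evaluate_in_parallel(toy_objective, candidates)
best_candidate = candidates[max(range(len(values)), key=values.__getitem__)]
```

In CPython, threads do not speed up pure Python arithmetic because of the global interpreter lock, so a real CPU-bound filter would more likely use `ProcessPoolExecutor` or compiled numerical code; the structure of the master loop stays the same.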
9.6.1 Data
We use a combination of LIBOR data and fixed interest rate swap rates
(the ISDA fix) for each of 4 currency areas (EUR, GBP, USD, JPY) to boot-
strap the yield curve daily for 14 maturities:
In the case of the Swiss franc (CHF), only 12 maturities are available:
The calibration periods used for these 5 currencies are the following:
After the 2012 LIBOR scandals, ICAP (formerly InterCapital Brokers) lost
to ICE Benchmark Administration Limited its role as administrator for the
ISDA fix rates, data collection, and calculation. Major reforms in the calcu-
lation methodology are being implemented (changing sources from polls of
contributing banks to actual transaction quotes). This transfer process is not
without difficulties for data providers.
The data was obtained from Bloomberg (indices US000**, BP000**,
EE000**, JY000**, SF000** for LIBOR rates and USISDA**, BPISDB**,
JYISDA**, SFISDA** for ISDAFIX rates).
9 Longer term out-of-sample yield curve prediction has recently been independently found
$$y(t, T) = \tau^{-1}\left[A(\tau, \theta)R_t + B(\tau, \theta)X_t + C(\tau, \theta)Y_t + D(\tau, \theta)\right]. \tag{9.49}$$
FIGURE 9.6: In-sample EUR yield curves on August 12, 2008.
[Plot: Black, EFM, and Data yield curves against maturity, horizontal axis 0.25 to 29.]
[Plot: Black, EFM, and Data yield curves against maturity, horizontal axis 0.25 to 9.75.]
[Plot: Black, EFM, and Data yield curves against maturity, horizontal axis 0.25 to 29.]
FIGURE 9.9: In-sample USD yield curves on October 14, 2008.
[Plot: Black, EFM, and Data yield curves against maturity, horizontal axis 0.25 to 29.25.]
to the average level of rates in the currency area. From Table 9.2 and Fig-
ures 9.6 through 9.10, we can see that the total measurement error volatility
of the best model fit is very respectably small for all currency areas, EUR,
GBP, and USD being the highest and CHF and JPY the lowest. Moreover, for
all but USD the Black-corrected EFM model goodness-of-fit equals or exceeds
that of the EFM shadow rate model.
Although the three models in Table 9.2 all have the same parameter set,
their likelihoods are not generally comparable as the models are not nested
in the statistical sense. However, the likelihoods of the affine EFM model
estimated with the KF and the UKF are comparable and in all cases, except
for Japan, the UKF likelihood exceeds the KF likelihood, a reflection of the
general power of the UKF widely attested to in the literature.
We may nevertheless compare the likelihoods achieved with the UKF for
the affine EFM and nonlinear Black EFM models. Here the Black EFM like-
lihood exceeds that of the EFM in all cases. In terms of total measurement
error standard deviation, the two models are also close, but the Black EFM
again gives the lowest values.
We found that the NAG UKF code we used required careful tuning of the
α parameter of Equation 9.42, which adjusts the displacement of sigma points
from the central path, for each data set. The calibration column of Table 9.2
shows that for all currencies except CHF small α parameter values closer to the
generally recommended $10^{-3}$ are appropriate for the simple piecewise linear
option “hockey stick” nonlinearity being handled here with the UKF. We are
currently at a loss to explain the anomalous case with α = 1, particularly since
JPY has an even better overall fit than CHF. This suggests that in future it
may be fruitful to consider the reparametrization of sigma point displacement
and more generally to study the properties of the algorithm theoretically,
which appears to be a lacuna in the literature to date. To this end, further
FIGURE 9.10: In-sample JPY yield curves on November 12, 2012.
[Plot: Black, EFM, and Data yield curves against maturity, horizontal axis 0.25 to 29.]
[Figure panels: EUR 10-year rate Black EFM with market data up to 15 July 2015; CHF 10-year rate Black EFM with market data up to 11 December 2014. Each shows scenario quantile paths Q_0.01 to Q_0.99 and market data.]
of the paths of the quartiles and 1% and 5% tails of the 10,000 scenario dis-
tribution. The actual market data evolution is also plotted on these figures
up to a more recent date for each currency area: EUR and CHF, December
11, 2014; GBP and USD, January 15, 2015; and JPY, January 24, 2014.¹⁰
The out-of-sample 10-year rate median forecast root mean square error rel-
ative to the market realization is also shown in the figures for EUR, CHF,
[Figure panel: GBP 10-year rate Black EFM with market data up to 15 July 2015, showing scenario quantile paths Q_0.01 to Q_0.99 and market data.]
GBP, and USD, which are naturally higher than the comparable in-sample
figures. Surprisingly, these are best for USD, which has the poorest in-sample
fit and for which, as for CHF, the Black EFM model prediction is superior to
that of EFM.¹¹
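The two summary statistics used in this evaluation, quantile paths of the simulated scenario distribution and the root mean square error of the median forecast against realized rates, can be sketched as follows. The data here are synthetic placeholders rather than model output, and the helper names are assumptions.

```python
import math
import statistics


def scenario_quantiles(values, levels=(1, 5, 25, 50, 75, 95, 99)):
    """Percentiles of the scenario distribution at one forecast date."""
    cuts = statistics.quantiles(values, n=100)   # 99 percentile cut points
    return {level: cuts[level - 1] for level in levels}


def rmse(forecast, realized):
    """Root mean square error of a forecast path against realized values."""
    squared = [(f - r) ** 2 for f, r in zip(forecast, realized)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic stand-in for 10,000 simulated 10-year rates at a single date.
simulated_rates = [i / 100000.0 for i in range(1, 10001)]
quantiles_at_date = scenario_quantiles(simulated_rates)

# Synthetic median forecast path and realized market path.
median_forecast = [0.020, 0.021, 0.022]
realized_path = [0.018, 0.021, 0.025]
forecast_rmse = rmse(median_forecast, realized_path)
```

Applying `scenario_quantiles` at each simulation date and joining the results over time gives quantile paths of the kind plotted in the forecast figures.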
These figures demonstrate the basic negative scenario generation problem
with the Gaussian EFM model (cf. Dempster et al. 2014) and the primary
11 The data series for JPY was deemed too short at 8 months to be significant.
[Figure panels: USD 10-year rate Black EFM with market data up to 15 July 2015; USD 10-year rate Black UKF alpha = 0.001. Each shows scenario quantile paths Q_0.01 to Q_0.99 and market data.]
9.7 Conclusion
This chapter explains the initial development and evaluation of a new
approximation of the Black (1995) correction to ensure nonnegative nominal
rates at all maturities for a practically effective Gaussian three-factor affine
yield curve model—the EFM model. Perhaps the most important feature of
this novel approach is the demonstrated fact that the HPC calibration of the
Black EFM model can be effected in only about double the runtime of that of
the underlying shadow rate EFM model. Although some issues with using the
UKF code have been identified here for further work, the results presented
in this chapter are promising, both in- and out-of-sample. We are confident
that addressing the identified issues in future research can result in a deeper
understanding of both the Black correction and the UKF.
Acknowledgments
The research leading to these results has received funding from the Euro-
pean Union Seventh Framework Programme (FP7/2007–13) under grant
agreement no. 289032 (HPC Finance). The authors wish to acknowledge
support and helpful comments from John Holden and Martyn Byng of the
Numerical Algorithms Group, Grigorios Papamanousakis of Aberdeen Asset
Management and, particularly, Giles Thompson, senior associate of Cambridge
Systems Associates.
Appendix 9A
Given an initial set of parameters (Θ0 , H0 ), the EM algorithm for
estimating the parameters of the EFM model from market data using the
Kalman filter alternates between generating paths with the filter for the log-
likelihood function and optimizing this function in the model parameters.
Defining O(Θ, H) := log L(Θ, H), a single step of the EM algorithm for
quasi-MLE can be presented in pseudo code as follows:
Calculation of the log-likelihood function
1. Input (Θ0, H0)
2. for t = 1 to T do
3. KF predictions (9.23)
4. KF update (9.24)
5. Calculate a term of the log-likelihood function (9.25)
6. end for
7. Compute O(Θ, H) (9.25)
8. Output O(Θ, H)
5. end while
9. end while
References
Andreasen, M. A. and Meldrum, A. 2014. Dynamic term structure models: The best
way to enforce the zero lower bound. CREATES Research Paper 2014–47.
Bauer, M. D. and Rudebusch, G. D. 2013. The shadow rate, Taylor rules and
monetary policy lift-off. Working Paper, Federal Reserve Bank of San Francisco,
February 2013. [Link]
[Link]
Bomfim, A. N. 2003. Interest rates as options: Assessing the markets view of the liq-
uidity trap. Working Paper 2003–45, Finance and Economics Discussion Series,
Federal Reserve Board, Washington, DC.
Christensen, J., Diebold, F., and Rudebusch, G. D. 2011. The affine arbitrage-free
class of Nelson–Siegel term structure models. Working Paper, Federal Reserve
Bank of San Francisco. Journal of Econometrics 164, 4–20.
Christoffersen, P., Dorion, C., Jacobs, K., and Karoui, L. 2014. Nonlinear Kalman fil-
tering in affine term structure models: CREATES Research Paper 14–04, Aarhus
University, January 2014.
Cox, J., Ingersoll, J., and Ross, S. 1985. A theory of the term structure of interest
rates. Econometrica 53, 363–384.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood for
incomplete data via the EM algorithm. Journal of the Royal Statistical Society
39, 1–38.
Feldhutter, P., Heyerdahl-Larsen, C., and Illeditsch, P. 2015. Risk premia and volatil-
ities in a non-linear term structure model. Working Paper, The Wharton School,
University of Pennsylvania. [Link]
Gorovoi, V. and Linetsky, V. 2004. Black’s model of interest rates as options, eigen-
function expansions and Japanese interest rates. Mathematical Finance 14(1),
49–78.
Ho, T. and Lee, S. 1986. Term structure movements and pricing interest rate con-
tingent claims. Journal of Finance 41, 1011–1029.
Homer, S. and Sylla, R. 2005. A History of Interest Rates. Wiley, Hoboken, NJ, p.1.
Hull, J. and White, A. 1990. Pricing interest rate derivative securities. Review of
Financial Studies 3, 573–592.
Ichiue, H. and Ueno, Y. 2006. Monetary policy and the yield curve at zero inter-
est: The macro-finance model of interest rates as options. Working Paper 06-
E-16, Bank of Japan. [Link] rev/wps 2006/
data/[Link]
Ichiue, H. and Ueno, Y. 2007. Equilibrium interest rates and the yield curve in a
low interest rate environment. Working Paper 07-E-18, Bank of Japan.
Ichiue, H. and Ueno, Y. 2013. Estimating term premia at zero bound: An analysis
of Japanese, US and UK yields. Working Paper 13-E-8, Bank of Japan.
Jameson, L. 1998. A wavelet-optimized, very high order adaptive grid and order
numerical method. SIAM Journal on Scientific Computing 19, 1980–2013.
Jong, F. de, Driessen, J., and Pelsser, A. 2001. Libor market models versus swap
market models for pricing interest rate derivatives: An empirical analysis. Euro-
pean Finance Review 5, 201–237.
Joslin, S., Singleton, K. J., and Zhu, H. 2011. A new perspective on Gaussian
dynamic term structure models. Review of Financial Studies 24, 926–970.
Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. 1995. A new approach for
filtering nonlinear systems. In Proceedings of the American Control Conference,
1628–1632.
Julier, S. J. and Uhlmann, J. K. 1997. A new extension of the Kalman filter to non-
linear systems. International Symposium on Aerospace/Defense Sensing, Sim-
ulation and Control, Signal Processing, Sensor Fusion, and Target Recognition
VI 3, 182–193.
Kim, D. H. and Singleton, K. J. 2011. Term structure models and the zero bound: An
empirical investigation of Japanese yields. Journal of Econometrics 170, 32–49.
Krippner, L. 2012. Modifying Gaussian term structure models when interest rates
are near the zero lower bound. Discussion Paper 2012/02, Reserve Bank of New
Zealand.
Krippner, L. 2013a. A tractable framework for zero lower bound Gaussian term
structure models. CAMA Working Paper No. 49/2013, Australian National
University.
Krippner, L. 2013b. Faster solutions for Black zero lower bound term structure
models. CAMA Working Paper No. 66/2013, Australian National University.
Lemke, W. and Vladu, A. L. 2014. A shadow-rate term structure for the Euro Area.
[Link]/events/pdf/conferences/140908/lemke [Link].
Lipton, A., Gal, A., and Lasis, A. 2014. Pricing of vanilla and first-generation exotic
options in the local stochastic volatility framework: Survey and results.
Quantitative Finance 14, 1899–1922.
Medova, E. A., Rietbergen, M. I., Villaverde, M., and Yong, Y. S. 2005. Modelling
the long term dynamics of yield curves. Working Paper 09/2005, Centre for
Financial Research, Judge Business School, University of Cambridge.
Nawalkha, S. K. and Rebonato, R. 2011. What interest rate models to use? Buy side
versus sell side. SSRN Electronic Journal, 01/2011.
Rebonato, R. and Cooper, I. 1995. The limitations of simple two-factor interest rate
models. Journal of Financial Engineering 5, 1–16.
Ron, U. 2000. A practical guide to swap curve construction. Working Paper 2000–17,
Bank of Canada, August 2000.
Swanson, E. T. and Williams, J. C. 2013. Measuring the effect of the zero lower
bound on medium- and longer-term interest rates. Working Paper, Federal
Reserve Bank of San Francisco, January 2013. [Link]
research/files/[Link]
Ueno, Y., Baba, N., and Sakurai, Y. 2006. The use of the Black model of interest
rates as options for monitoring the JGB market expectations. Working Paper
06E15, Bank of Japan. [Link] rev/wps 2006/
data/[Link]
Yong, Y. S. 2007. Scenario Generation for Dynamic Fund Management. PhD Thesis,
Centre for Financial Research, Judge Business School, University of Cambridge.
Chapter 10
Algorithmic Differentiation
CONTENTS
10.1 Motivation
10.2 Introduction
    10.2.1 Tangent mode AD
    10.2.2 Adjoint mode AD
    10.2.3 Second derivatives
    10.2.4 Review of AD in finance
10.3 Implementation
    10.3.1 Checkpointing
    10.3.2 Implicit functions
        Linear systems
        Nonlinear systems
        Convex optimization
    10.3.3 Smoothing
    10.3.4 Preaccumulation
    10.3.5 Further issues
10.4 Case Studies
    10.4.1 European option
    10.4.2 American option pricing
    10.4.3 Nearest correlation matrix
10.5 Summary and Conclusion
References
10.1 Motivation
Inspired by Giles and Glasserman [1], Algorithmic Differentiation (AD)
[2,3] has been gaining popularity in computational finance over recent years.
Adjoint AD (AAD) in particular facilitates a paradigm shift in financial mod-
elling through provision of first-order sensitivities at a relative computational
cost which is independent of the number of sensitivities asked for.
For illustration, we consider a simple European call option written on an
underlying driven by a local volatility process. Let $S = (S_t)_{t \ge 0}$ be the solution
of the SDE
\[ dS_t = r S_t \, dt + \sigma(\log S_t, t) \, S_t \, dW_t, \tag{10.1} \]
where $W = (W_t)_{t \ge 0}$ is a standard Brownian motion, $r > 0$ is the risk-free
interest rate, and $\sigma$ is the local volatility function. The price of the call
option is then given by
\[ V = e^{-rT} \, E\,(S_T - K)^+ \tag{10.2} \]
for maturity $T > 0$ and strike $K > 0$. The local volatility $\sigma = \sigma(x, t)$ is
computed from the market observed implied volatility surface using bicubic
spline interpolation.
To compute the call price V from Equation 10.2, we apply a simple Euler–
Maruyama scheme to the log process $X_t = \log(S_t)$ which satisfies the SDE
\[ dX_t = \left( r - \tfrac{1}{2} \sigma^2(X_t, t) \right) dt + \sigma(X_t, t) \, dW_t. \tag{10.3} \]
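The Euler–Maruyama discretisation of Equation 10.3 can be sketched as follows. This is an illustrative sketch only: the names `LocalVol` and `mc_call_price`, the fixed seed, and the use of an arbitrary callable in place of the bicubic spline interpolation of the local volatility surface are assumptions made for this example, not the implementation used for the results in this chapter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Hypothetical signature for the local volatility sigma(x, t), x = log(S).
using LocalVol = std::function<double(double, double)>;

// Monte Carlo estimate of V = exp(-rT) E (S_T - K)^+ using an
// Euler-Maruyama scheme for the log process X_t = log(S_t):
//   X_{j+1} = X_j + (r - sigma^2 / 2) dt + sigma sqrt(dt) Z_j,  Z_j ~ N(0, 1).
double mc_call_price(double S0, double K, double r, double T,
                     const LocalVol& sigma, int steps, int paths,
                     unsigned seed = 42) {
  assert(steps > 0 && paths > 0);
  std::mt19937_64 gen(seed);
  std::normal_distribution<double> normal(0.0, 1.0);
  const double dt = T / steps;
  const double sqrt_dt = std::sqrt(dt);
  double payoff_sum = 0.0;
  for (int p = 0; p < paths; ++p) {
    double x = std::log(S0);
    for (int j = 0; j < steps; ++j) {
      const double s = sigma(x, j * dt);  // local volatility at (X_j, t_j)
      x += (r - 0.5 * s * s) * dt + s * sqrt_dt * normal(gen);
    }
    payoff_sum += std::max(std::exp(x) - K, 0.0);
  }
  return std::exp(-r * T) * payoff_sum / paths;
}
```

With a flat volatility, for example `[](double, double) { return 0.2; }`, and S0 = K = 1, r = 0.04, T = 1, the estimate should lie close to the Black–Scholes value of about 0.0993, which gives a simple sanity check before a genuine local volatility surface is plugged in.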
[Figure: run time (s) plotted against n for finite differences (FD) and AAD.]
10.2 Introduction
AD is a semantic transformation of a given computer code called the
primal code or primal function. In addition to computing the primal func-
tion value, the transformed code also computes the derivatives of the primal
function with respect to a specified set of parameters.
Consider a computer implementation of a function F mapping
$\mathbb{R}^n \times \mathbb{R}^{\tilde n}$ into $\mathbb{R}^m \times \mathbb{R}^{\tilde m}$. We are
interested in derivatives of the first m outputs of F
(the active outputs) with respect to the first n inputs (the active inputs). The
second $\tilde m$ outputs and second $\tilde n$ inputs of F are termed the passive outputs and
passive inputs, respectively. For example, an active output may be the Monte
Carlo price of an option while a passive output may be the corresponding
confidence interval. An active input may be the initial asset price $S_0$, while
a passive input may be the set of random numbers used in the Monte Carlo
simulation. Without loss of generality and to keep the notation simple, we
restrict the discussion to scalar active outputs, that is, $m = 1$. We therefore
consider multivariate functions of the type
along with the values of all active and passive outputs as functions of the
active and passive inputs. Similarly, we look for the Hessian of all second
partial derivatives of y with respect to x, that is,
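\[ \nabla^2 F = \nabla^2 F(x, \tilde x) \equiv \left( \frac{\partial^2 y}{\partial x_j \, \partial x_i} \right)_{j,i = 0, \dots, n-1} \in \mathbb{R}^{n \times n}. \tag{10.7} \]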
A corresponding tangent code implements $(y, y^{(1)}, \tilde y) := F^{(1)}(x, x^{(1)}, \tilde x)$,
where
\[ y^{(1)} := \nabla F(x, \tilde x) \cdot x^{(1)} \tag{10.8} \]
A corresponding adjoint code implements
$(y, \tilde y, x_{(1)}) := F_{(1)}(x, x_{(1)}, \tilde x, y_{(1)})$, where
\[ x_{(1)} := x_{(1)} + \nabla F(x, \tilde x)^T \cdot y_{(1)}. \tag{10.9} \]
The adjoint code therefore increments given adjoints $x_{(1)}$ of the active inputs
with the product of the gradient and a given adjoint $y_{(1)}$ of the active out-
put (see Reference 3 for details). We use subscripts $(1)$ to denote first-order
adjoints. Initializing (seeding) $x_{(1)} = 0$ and $y_{(1)} = 1$ yields the gradient in $x_{(1)}$
from a single run of the adjoint code. Again, the gradient is computed with
machine accuracy. The computational cost no longer depends on the size of
the gradient n.
$(y, y^{(2)}, y^{(1)}, y^{(1,2)}, \tilde y) := F^{(1,2)}(x, x^{(2)}, x^{(1)}, x^{(1,2)}, \tilde x),$
where
where
The full Hessian can be obtained by n runs of the second-order adjoint code.
The reduction in computational complexity due to the initial application of
adjoint mode to the primal code is therefore carried over to the second-order
adjoint. Sparsity of the Hessian can and should be exploited [5].
the pathwise AAD method for Greeks to a LIBOR market model and
Joshi has several other contributions in the field with different collaborators
[8–11]. Capriotti et al. applied AAD for fast Greeks in various different applica-
tions including PDEs, credit risk, Bermudan-style options, and XVA [12–16].
Antonov also used AAD for XVA and callable exotics [17,18]. Contributions
in the area of calibration include those of Turinici [19], Kaebe et al. [20],
Schlenkrich [21], and Henrard [22]. For Greeks in the context of discontinuous
payoffs, Giles introduced the Vibrato Monte Carlo method [23]. The problem
of discontinuity is also treated in works related to second-order Greeks [24,25].
The above is not meant as a complete review of AAD applications in
finance. Further related work can be found in the references within the cited
publications.
10.3 Implementation
AD constitutes a set of rules for deriving first- and higher-order tangent
and adjoint versions of a primal numerical simulation code. Two fundamental
approaches to implementing AD are distinguished: source transformation
and operator overloading.
Source transformation rewrites the given primal code yielding a corre-
sponding derivative code usually in the same programming language. For
example, consider an implementation of $f : \mathbb{R}^2 \to \mathbb{R}$,
$y = f(x) = e^{\sin(x_0 \cdot x_1)}$ as

    // on input: x holds x0, y holds x1; on output: y holds the result
    void f(double x, double& y) { x *= y; y = exp(sin(x)); }

where $x \mathrel{\hat=} x_0$ and $y \mathrel{\hat=} x_1$ on input and with output
$y \mathrel{\hat=} y$. The first-order
tangent version returns the directional derivative
\[ y^{(1)} := \frac{\partial y}{\partial x} \cdot x^{(1)} \]
in addition to the function value, for example,

    // tx, ty: tangents of x0 and x1 on input; ty: tangent of y on output
    void tf(double x, double& tx, double& y, double& ty) {
      tx = tx * y + x * ty; x *= y;            // tangent of the product first
      y = exp(sin(x)); ty = y * cos(x) * tx;   // then value and tangent of y
    }
Each arithmetic statement is augmented with its directional derivative (also
tangent).
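To make the role of the seed concrete, the following sketch calls the tangent code once per input direction to assemble the gradient of f. The driver `gradient_by_tangent` is a hypothetical helper introduced for this illustration; `tf` is repeated so that the block is self-contained.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Tangent version of y = exp(sin(x0 * x1)). On input x holds x0, y holds x1
// and (tx, ty) hold the direction; on output y is the function value and
// ty the directional derivative.
void tf(double x, double& tx, double& y, double& ty) {
  tx = tx * y + x * ty; x *= y;
  y = std::exp(std::sin(x)); ty = y * std::cos(x) * tx;
}

// Hypothetical driver: one tangent run per Cartesian basis direction, so
// the cost grows linearly with the number of active inputs (here n = 2).
std::array<double, 2> gradient_by_tangent(double x0, double x1) {
  std::array<double, 2> g{};
  for (int i = 0; i < 2; ++i) {
    double tx = (i == 0) ? 1.0 : 0.0;  // seed for x0
    double ty = (i == 1) ? 1.0 : 0.0;  // seed for x1
    double y = x1;                     // y carries x1 on input
    tf(x0, tx, y, ty);
    g[i] = ty;                         // i-th gradient entry
  }
  return g;
}
```

The two runs share no work, which is why the cost of a tangent-mode gradient scales with n, in contrast to the single adjoint run described in Section 10.2.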
A first-order adjoint version returns the adjoint directional derivative
\[ x_{(1)} := x_{(1)} + \left( \frac{\partial y}{\partial x} \right)^T \cdot y_{(1)} \]
in addition to the function value.
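A hand-written adjoint of f could look as follows. This is a sketch under assumptions rather than a verbatim listing: the name `af`, the argument order, and the local variables `x0` and `s` used to record values before they are overwritten are illustrative choices. Because y holds $x_1$ on input and the result on output, `ay` carries $y_{(1)}$ on input and is overwritten with the adjoint of $x_1$, while `ax` is incremented.

```cpp
#include <cassert>
#include <cmath>

// Adjoint of y = exp(sin(x0 * x1)); on input x holds x0 and y holds x1.
// ax: adjoint of x0 (incremented); ay: y_(1) on input, adjoint of x1 on output.
void af(double x, double& ax, double& y, double& ay) {
  // Augmented forward section: run the primal and record the values the
  // reverse section needs but the primal overwrites.
  const double x0 = x;  // x is overwritten by x *= y
  const double s = y;   // y (holding x1) is overwritten by the result
  x *= y;
  y = std::exp(std::sin(x));

  // Reverse section: adjoints of the statements in reverse order.
  const double av = y * std::cos(x) * ay;  // adjoint of v = x0 * x1
  ay = x0 * av;                            // adjoint of x1
  ax += s * av;                            // adjoint of x0
}
```

Seeding `ax = 0` and `ay = 1` before the call returns both gradient entries from a single run.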
Adjoints of all arithmetic statements are executed in reverse order (see reverse
section of the adjoint code). They require intermediate values computed in the
augmented forward section of the adjoint code. Some of these values may need
to be recorded prior to getting lost due to overwriting, for example, s. An in-
depth discussion of adjoint code generation rules is beyond the scope of this
article. Refer to Reference 3 for a more detailed description. Manual source
transformation turns out to be tedious, error-prone, and hard to maintain from
a software evolution perspective. Preprocessing tools have been developed for
many years providing reasonable coverage of Fortran and C in addition to
various simpler special-purpose scripting languages [26].
Currently, there is no mature source transformation tool for C++. The
method of choice for implementing AD for C++ programs is based on
operator and function overloading, typically combined with advanced
metaprogramming techniques using the generic (compile-time) typing
mechanism provided by C++ templates. Instead of parsing the primal source
code followed by unparsing a differentiated version, the semantics of
operators and intrinsic functions
are redefined. Overloading for a custom active data type yields augmented
operations. For example, in basic tangent mode the active data type consists
of a value and a directional derivative component. All operations are over-
loaded for such pairs yielding, for example,
$(y, y^{(1)}) := (\sin(x), \cos(x) \cdot x^{(1)})$.
In basic adjoint mode, the operations are overloaded to record a tape of the pri-
mal computation. Conceptually, the tape can be regarded as a directed acyclic
graph with vertices representing the inputs to the program as well as all oper-
ations performed to compute its outputs. Edges represent data dependencies.
They can be labeled with the local partial derivative of the value represented
by its target vertex with respect to the value represented by its source. An
example is shown in Figure 10.2a. Adjoints are computed by interpretation of
the tape. The tape interpreter eliminates all intermediate vertices in reverse
topological order by introducing new edges connecting all predecessor vertices
with all successors. New edges are labeled with the product of the local partial
derivatives on the corresponding incoming and outgoing edges. Parallel edges
are merged by adding their labels. The vertex to be eliminated is removed
together with its incident edges; see Figure 10.2b and c for elimination of ver-
tices 3 and 2 from the tape in Figure 10.2a. Tape interpretation amounts to
a sequence of fused multiply–add (fma; the elemental operation of the chain
rule) operations whose length is of the order of the number of edges in the
tape. The resulting bipartite graph contains only edges representing nonzero
Jacobian/gradient entries.
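The overloading approach in basic adjoint mode can be illustrated with a deliberately minimal tape. The sketch below is a toy under stated assumptions, not the design of any production AD tool: the names `Node`, `Tape`, `Var`, `record` and `interpret` are hypothetical, only `*`, `sin` and `exp` are overloaded, and a single global tape is used. Instead of eliminating vertices explicitly, the interpreter propagates adjoints over the recorded edges in reverse order, performing one multiply–add per edge.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One tape entry (vertex): up to two incoming edges, each labeled with the
// local partial derivative with respect to its source vertex (-1: no edge).
struct Node {
  int arg[2];
  double partial[2];
};

struct Tape {
  std::vector<Node> nodes;
  int record(int a0, double p0, int a1 = -1, double p1 = 0.0) {
    nodes.push_back({{a0, a1}, {p0, p1}});
    return static_cast<int>(nodes.size()) - 1;
  }
  // Reverse sweep: seed the adjoint of vertex `out` with 1 and apply
  // adj[source] += partial * adj[target] along every recorded edge.
  std::vector<double> interpret(int out) const {
    assert(out >= 0 && out < static_cast<int>(nodes.size()));
    std::vector<double> adj(nodes.size(), 0.0);
    adj[out] = 1.0;
    for (int i = out; i >= 0; --i)
      for (int k = 0; k < 2; ++k)
        if (nodes[i].arg[k] >= 0)
          adj[nodes[i].arg[k]] += nodes[i].partial[k] * adj[i];
    return adj;
  }
};

// Single global tape (a simplification chosen for this toy).
inline Tape& tape() { static Tape t; return t; }

// Active data type: primal value plus the index of its vertex on the tape.
struct Var {
  double v;
  int id;
  explicit Var(double value) : v(value), id(tape().record(-1, 0.0)) {}
  Var(double value, int index) : v(value), id(index) {}
};

Var operator*(const Var& a, const Var& b) {
  return Var(a.v * b.v, tape().record(a.id, b.v, b.id, a.v));
}
Var sin(const Var& a) {
  return Var(std::sin(a.v), tape().record(a.id, std::cos(a.v)));
}
Var exp(const Var& a) {
  const double e = std::exp(a.v);
  return Var(e, tape().record(a.id, e));
}
```

Writing `Var x0(0.5), x1(2.0); Var y = exp(sin(x0 * x1));` records five vertices, and `tape().interpret(y.id)` then holds the two gradient entries at positions `x0.id` and `x1.id`.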
[Figure 10.2: tape of the example (a) with vertices 0: $x_0$, 1: $x_1$, 2: ∗ and 3, and
its reduction by elimination of vertices 3 (b) and 2 (c).]

4 In basic adjoint mode, elemental functions are the arithmetic operations built into the

assuming availability of adjoint elementals
$x^{i-1}_{(1)} = (\nabla F^i)^T \cdot x^i_{(1)}$ for $i = q, \dots, 1$.
Let $x^{k-1}_{(1)} = (\nabla F^k)^T \cdot x^k_{(1)}$ for some $k \in \{1, \dots, q\}$ not be treated in basic adjoint
10.3.1 Checkpointing
Consider Figure 10.3 for motivation. Basic adjoint mode applied to
$x := F^i(x)$ for $i = 1, \dots, q$ ($q = 3$ in Figure 10.3) uses a store-all strategy. It
generates a tape of size q assuming unit tape size for the individual $F^i$. The
total primal operations count⁵ adds up to q assuming unit primal operations
count per $F^i$. A tape similar to Figure 10.3a is generated for all $F^i$ by running
$\overrightarrow{F^i}$ for $i = 1, \dots, q$ followed by its interpretation by
$\overleftarrow{F^j}$ for $j = q, \dots, 1$ (see
Figure 10.3b). The tape memory requirement reaches its minimum 1 + in
a recompute-all strategy by checkpointing the original inputs of size 1
($\downarrow F^1$) followed by the evaluation of $F^i$ for $i = 1, \dots, q - 1$, the generation
of the tape for $F^q$ ($\overrightarrow{F^q}$) and its interpretation ($\overleftarrow{F^q}$).
Repeated accesses to the
checkpoint ($\uparrow F^1$) enable the recursive application of this data flow reversal
scheme for $i = q - 1, \dots, 1$ (see Figure 10.3c). The primal operations count
grows quadratically with q yielding 6 for q = 3. Figure 10.3d illustrates a data
flow reversal scheme built on two checkpoints. It reduces the primal operations
count to 5 at the expense of additional memory required to store the second
checkpoint yielding a total memory requirement of 1 + 2.

⁵ Number of evaluations of the primal function. The adjoint operations count is invariant
with respect to different data flow reversal schemes as the adjoint of each primal operation
is evaluated exactly once.

[Figure 10.4, panel (a): tape with vertex 0: x, vertices 1 to 3: $x_i := F^i(x)$ with edge
labels $[\nabla F^i]$, and vertex 4: $y := R(x_1, x_2, x_3)$ with edge labels $[\nabla_{x_i} R]$;
panel (b): $\downarrow F^1$; $F^1; F^2; F^3$; $\overrightarrow{R}; \overleftarrow{R}$; then
$\uparrow F^1$; $\overrightarrow{F^i}; \overleftarrow{F^i}$ for $i = 1, 2, 3$.]
FIGURE 10.4: Adjoint of ensemble: tape (a) and pathwise adjoints (b).
Single- and multilevel checkpointing schemes have been proposed in the
literature [27]. The fundamental combinatorial optimization problem of min-
imizing the primal operations count given an upper bound on the available
memory for storing tape and checkpoints is known to be NP-complete [28].
Efficient algorithms for its solution exist for relevant special cases such as
evolutions [29] similar to the previous example. They form the core of many
iterative algorithms including Crank–Nicolson schemes used in the context of
finite difference methods for solving parabolic partial differential equations
(see also Section 10.4).
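The trade-off between the store-all and recompute-all strategies can be reproduced on a toy evolution. In the sketch below the step $F(x) = \sin(x)$, the struct `Result` and the two driver names are illustrative assumptions; the "tape" of a step is reduced to its stored input, and the recompute-all variant keeps a single checkpoint of the original input. Both variants return the same derivative, while the number of primal step evaluations is q for store-all and q(q + 1)/2 for recompute-all, that is, 6 for q = 3.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Result {
  double y;          // primal result after q steps
  double dydx;       // adjoint result dy/dx
  int primal_evals;  // number of evaluations of the step function
};

// Toy evolution step (assumption): F(x) = sin(x), so dF/dx = cos(x).
inline double step(double x, int& evals) { ++evals; return std::sin(x); }

// Store-all: tape (here: the input of) every step, then reverse.
Result adjoint_store_all(double x, int q) {
  assert(q >= 1);
  int evals = 0;
  std::vector<double> tape;
  for (int i = 0; i < q; ++i) { tape.push_back(x); x = step(x, evals); }
  double a = 1.0;  // seed y_(1) = 1
  for (int i = q - 1; i >= 0; --i) a *= std::cos(tape[i]);
  return {x, a, evals};
}

// Recompute-all: keep only a checkpoint of the original input; before
// reversing step i, restore the checkpoint and recompute steps 1..i-1.
Result adjoint_recompute_all(double x, int q) {
  assert(q >= 1);
  int evals = 0;
  const double checkpoint = x;
  double y = 0.0, a = 1.0;
  for (int i = q; i >= 1; --i) {
    double xi = checkpoint;                            // restore checkpoint
    for (int j = 1; j < i; ++j) xi = step(xi, evals);  // passive recomputation
    const double out = step(xi, evals);                // "tape" step i
    if (i == q) y = out;
    a *= std::cos(xi);                                 // interpret tape of step i
  }
  return {y, a, evals};
}
```

Schemes with more than one checkpoint sit between these two extremes; choosing where to place a limited number of checkpoints is the optimization problem referred to above.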
A second special case with particular relevance to finance is Monte Carlo
sampling for solving SDEs as, for example, in Section 10.1 (see also Sec-
tion 10.4). Refer to Figure 10.4 for illustration. Adjoints of such ensembles
can be computed very efficiently through exploiting the missing data depen-
dencies among the individual paths ($F^i$) drawing from a common set of active
inputs x. Their results are typically computed in parallel followed by a reduc-
tion to a (often scalar) value y by some function R. A gap in the tape is
induced by checkpointing x (with size in memory equal to ) as an input to
the $F^i$ (e.g., $\downarrow F^1$) followed by a passive evaluation of the primal ensemble and
the generation and interpretation of the tape for R. Adjoints can be computed
individually for each path after recovering x (e.g., $\uparrow F^1$). The maximum mem-
ory requirement is limited to 1+ under the assumption that a single path has
unit memory requirement exceeding the memory occupied by the tape of R.
The primal operations count is roughly doubled, that is, approximately equal
to 2q, where again q = 3 in Figure 10.4. Parallelization of pathwise adjoint
computation turns out to be straightforward. Potential conflicts need to be
handled when writing to $x_{(1)}$.
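The pathwise scheme can be sketched on a toy ensemble. Everything concrete in the block below is an assumption made for illustration: the path function $F^i(x) = \exp(x \cdot z_i)$ with passive inputs $z_i$, the mean as reduction R, and the names `EnsembleAdjoint`, `path` and `ensemble_adjoint`. The derivative of each path is written out by hand where an AD tool would tape and interpret the path.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct EnsembleAdjoint {
  double y;          // reduced value y = R(F^1(x), ..., F^q(x))
  double ax;         // adjoint x_(1) = dy/dx for seed y_(1) = 1
  int primal_evals;  // evaluations of the path function
};

// Hypothetical path function F^i(x) = exp(x * z_i) with passive input z_i.
inline double path(double x, double z, int& evals) {
  ++evals;
  return std::exp(x * z);
}

// y is the mean of the path results; adjoints are computed path by path.
EnsembleAdjoint ensemble_adjoint(double x, const std::vector<double>& z) {
  assert(!z.empty());
  const double q = static_cast<double>(z.size());
  int evals = 0;

  // Passive evaluation of the ensemble followed by the reduction R.
  double y = 0.0;
  for (double zi : z) y += path(x, zi, evals) / q;

  // Adjoint of R for seed y_(1) = 1: every path result has adjoint 1/q.
  const double a_path = 1.0 / q;

  // Pathwise adjoints: rerun one path at a time and accumulate into the
  // shared adjoint of x; a parallel version would have to protect or
  // reduce this update.
  double ax = 0.0;
  for (double zi : z) {
    const double fi = path(x, zi, evals);  // rerun of path i
    ax += zi * fi * a_path;                // d exp(x * z) / dx = z * exp(x * z)
  }
  return {y, ax, evals};
}
```

Each path is evaluated twice, once passively and once for its adjoint, which matches the primal operations count of roughly 2q, and only one path needs to be held in memory at a time.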
linear system
\[ \frac{\partial^2 G}{\partial s^2}(s, \lambda)^T \cdot z = -s_{(1)} \]
followed by a single call of the second-order adjoint version of F at the solution
z to obtain
\[ \lambda_{(1)} := \lambda_{(1)} + \frac{\partial^2 G}{\partial \lambda \, \partial s}(s, \lambda)^T \cdot z. \]
Savings in computational complexity are obtained as in Section [Link]. Sim-
ilar comments apply. See Reference 33 for a discussion of this method in the
context of calibration.
10.3.3 Smoothing
AD is based on the assumption that the given implementation of the target
function $F : \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable at all points of
interest. This prerequisite is likely to be violated in many practical applications.
Generalized derivatives have been proposed to overcome this problem; see, for
example, References 34 and 35 for recent work in this area.
In pricing of financial derivatives, nonsmoothness is often induced by
branches in the flow of control depending, for example, on a strike price. An
option may be exercised or not. Any data flow dependence of the predicted
price or payoff p on strike K is lost suggesting independence of the sensitivity
of p from K, which is obviously false. Local finite differencing as well as vari-
ous smoothing techniques can be used to potentially overcome this problem.
For example, in sigmoidal smoothing [36] the nonsmooth function
\[ f(S) = \begin{cases} f_1(S), & S < K, \\ f_2(S), & \text{otherwise} \end{cases} \]
and the width of the transition α controls the quality of the approximation.
A case study is discussed in Section 10.4.2.
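One common form of sigmoidal smoothing blends the two branches with a logistic function centered at K. The sketch below assumes this particular blend, $f_s(S) = (1 - \sigma_s) f_1(S) + \sigma_s f_2(S)$ with $\sigma_s = 1 / (1 + e^{-(S - K)/\alpha})$; the function names are hypothetical and the exact formulation in Reference 36 may differ.

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid centered at K with transition width alpha > 0.
inline double sigmoid(double S, double K, double alpha) {
  assert(alpha > 0.0);
  return 1.0 / (1.0 + std::exp(-(S - K) / alpha));
}

// Smoothed replacement for the branch: f(S) = (S < K) ? f1(S) : f2(S).
template <typename F1, typename F2>
double smoothed(double S, double K, double alpha, F1 f1, F2 f2) {
  const double w = sigmoid(S, K, alpha);  // close to 0 below K, close to 1 above
  return (1.0 - w) * f1(S) + w * f2(S);
}

// Example: the call payoff max(S - K, 0) written as two branches.
inline double smoothed_call_payoff(double S, double K, double alpha) {
  return smoothed(S, K, alpha,
                  [](double) { return 0.0; },
                  [K](double s) { return s - K; });
}
```

Because K now enters through the sigmoid, an AD tool applied to the smoothed function sees a data dependence on the strike; smaller values of α track the original branch more closely at the price of steeper derivatives near K.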
10.3.4 Preaccumulation
Preaccumulation is a technique for speeding up the computation of adjoints
while at the same time reducing the size of the tape. It comes in various flavors.
As an example, we consider adjoint versions of implementations of a simulation
$F : \mathbb{R}^n \to \mathbb{R}^m$ as $y = F(x) = F^3(F^2(F^1(x)))$, where
$F^2 : \mathbb{R}^k \to \mathbb{R}^l$ is
assumed to yield a tape with q edges (local partial derivatives). Without loss
of generality, the Jacobian $\nabla F^2 \in \mathbb{R}^{l \times k}$ is assumed to be dense.⁶ Hence, the
number of scalar fma operations required for its evaluation in tangent mode
AD is $k \cdot q$. Accumulation of the overall Jacobian $\nabla F \in \mathbb{R}^{m \times n}$ in adjoint
mode induces a local computational cost of $m \cdot q$ due to m interpretations
of the tape of $F^2$. Preaccumulation of $\nabla F^2$ in tangent mode yields a tape
with $k \cdot l$ edges. The contribution of $F^2$ to the total cost of accumulating $\nabla F$
without preaccumulation ($m \cdot q$) potentially exceeds the cumulative cost of
preaccumulation of $\nabla F^2$ ($k \cdot q$) followed by interpretation of the compressed
tape (adding $m \cdot k \cdot l$). Moreover, no tape is required for preaccumulating $\nabla F^2$
yielding a reduction of the overall tape size by $q - k \cdot l$ assuming unit memory
size per tape entry.
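As a purely hypothetical numerical illustration of these counts (the sizes are made up for this example and do not come from the case studies), take k = 2, l = 1, q = 1000 and m = 50:

```latex
% Hypothetical sizes: k = 2, l = 1, q = 1000, m = 50
\begin{align*}
\text{without preaccumulation:}\quad & m \cdot q = 50 \cdot 1000 = 50\,000 \\
\text{with preaccumulation:}\quad & k \cdot q + m \cdot k \cdot l = 2 \cdot 1000 + 50 \cdot 2 \cdot 1 = 2\,100 \\
\text{tape size reduction:}\quad & q - k \cdot l = 1000 - 2 = 998
\end{align*}
```

The gain comes from the local Jacobian having far fewer entries ($k \cdot l$) than the tape of $F^2$ has edges (q); when q is close to $k \cdot l$, preaccumulation brings little benefit.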
Alternative scenarios for preaccumulation include the repeated use of a
local Jacobian as part of an iteration in the enclosing adjoint computation. In
this case, the local Jacobian should be preaccumulated (cached) potentially
yielding significant savings in terms of tape size and run time. An exponential
number (in terms of the size of the tape) of different scenarios for preaccumu-
lation result from the associativity of the chain rule. Determining the optimal
method turns out to be computationally intractable [37]. Further theory is
discussed in Reference 38.
6 Jacobian compression methods based on coloring techniques enter the scene in case of
sparsity [5].
where $E_{x,t}$ denotes expectation with respect to the measure under which the
Markov process X starts at time $t \in [0, T]$ at the value $x \in \mathbb{R}$. Standard
results from the theory of Markov processes then show that V satisfies the
parabolic PDE
\[ 0 = \frac{\partial}{\partial t} V(x, t) + \left( r - \tfrac{1}{2} \sigma^2(x, t) \right) \frac{\partial}{\partial x} V(x, t)
     + \tfrac{1}{2} \sigma^2(x, t) \frac{\partial^2}{\partial x^2} V(x, t) - r V(x, t)
     \quad \text{for } (x, t) \in \mathbb{R} \times [0, T), \tag{10.16} \]
\[ (e^x - K)^+ = V(x, T) \quad \text{for all } x \in \mathbb{R}. \tag{10.17} \]
Table 10.2 shows the accuracy of gradient entries computed via finite dif-
ferences (forward and central) compared with AD. Figures are for the smallest
problem (gradient size n = 10).
Table 10.3 compares peak memory requirements and elapsed run times
for the PDE solutions without and with (evolution) checkpointing for
$N = 10^4$ spatial grid points and $M = 360$ time steps on our reference computer.
Basic adjoint mode (pde/a1s) yields infeasible memory requirements for all
gradients of size n ≥ 10.
First-order sensitivities are obtained by AAD for five active inputs, the
stock price, volatility, time to maturity, risk-free interest rate, and strike price,
respectively. Second-order sensitivities are computed for Δ with respect to the
five active inputs.
A computation of the test case with the number of paths $N_P$ equal to $10^6$
and $10^2$ exercise opportunities $N_T$ yields enormous memory requirements in
basic adjoint mode. Moreover, it can be seen that some of the second-order
sensitivities including Γ are computed to be zero by using the AD technique,
due to its inability to capture control flow dependencies. After each regression,
a decision occurs whether to exercise the current option or to hold it (see
line 11 of Algorithm 10.1). This decision leads to zero adjoints of the local
cash flow with respect to the exercise boundary.
For memory reduction, evolution checkpointing as introduced in Sec-
tion 10.3.1 is applied to each iteration in line 2 of Algorithm 10.1. A checkpoint
consists of the values of the inputs, the local cash flow, and the time of the
respective loop cycle.
To capture the control flow dependency of the exercise decision, sigmoidal
smoothing is used as introduced in Section 10.3.3. For this purpose, the exercise
boundary is chosen to be the center of the transition. Then, the “if” statement
in line 11 and the payoff function in line 12 of the algorithm are replaced by
the assignment (10.21) in which $\sigma_s$ is the sigmoid function and $v_p$ denotes the
payoff of the current path.
\[ v_p := (1 - \sigma_s) \cdot (K - S_p) + \sigma_s \cdot v_p. \tag{10.21} \]
Checkpointing yields a reduction in memory requirement by approximately
85% compared to basic adjoint mode. Larger test cases can be computed at a
slightly higher computational cost. Run times and memory requirements are
shown in Table 10.4.

Algorithm:
 1: Initialization
 2: for t = N_T − 2 to 1 do
 3:     I ← {}
 4:     for p = 1 to N_P do
 5:         v_p ← v_p · exp(−r · T/N_T)
 6:         S_{p,t} = h(S_0, T, σ, r, t, N_T, Z_{p,t})
 7:         if S_{p,t} < K then
 8:             I ← I ∪ {p};
 9:     b = R(I, S_t, v)
10:     for all p ∈ I do
11:         if K − S_{p,t} > b then
12:             v_p ← K − S_{p,t}
13: V ← Σ_p v_p · exp(−r · T/N_T) / N_P
Those second-order sensitivities which could not be calculated satisfac-
torily due to the missing control flow dependencies are approximated by the
smoothing approach for the exercise decision. All other sensitivities are similar
to the values that are computed with the AAD method without smoothing.
The option price as well as the sensitivities are given in Table 10.5. The qual-
ity of the smoothing depends on the transition parameter α often determined
through experiments in practice.
By assuming the missing control flow dependencies to be negligible, the
time and the path loop can be switched and the algorithm for the sensitivity
TABLE 10.4: Run times and required tape memory for a single pricing
calculation and the basic and the checkpointed adjoint methods

        Run time (s)                               Memory requirement (GB)
N_T     Pricer   Basic adjoint   Checkpointed      Pricer   Basic adjoint   Checkpointed
                                 adjoint                                    adjoint
100     22       192 (8.7)       228 (10.4)        0.80     84.78           10.93
500     113      –               1011 (8.9)        3.78     >100            49.90
1000    226      –               2245 (9.9)        7.51     >100            98.47

Note: Relative run times are given in brackets.

TABLE 10.5: Value and sensitivities of the test cases for the
algorithmic differentiation methods applied to the basic and smoothed
version (subscript s) of the Longstaff–Schwartz algorithm for S_0 = 1, K = 1,
T = 1, r = 4%, σ = 20%, and α = 0.005

N_T     V         V_s       Δ          Δ_s        ν         ν_s       Γ    Γ_s
100     0.06361   0.06353   −0.41962   −0.41761   0.37653   0.37921   0    0.82227
500     0.06378   0.06345   −0.42414   −0.41749   0.37688   0.38275   0    0.86813
1000    0.06369   0.06318   −0.42465   −0.41869   0.37611   0.38515   0    0.73878

Note: In Reference 44, analytical reference values for V and Δ are given as V_ref = 0.064
and Δ_ref = −0.416.
\[ \frac{\partial h}{\partial y}(y^*, G)^T \cdot z = -y^*_{(1)} \tag{10.24} \]
The basic adjoint run times clearly show the advantages of AAD for com-
puting Greeks as opposed to using a finite differences based approach. The
symbolic adjoint of the NCM presents a significant improvement as it is mul-
tiple orders of magnitude faster than the basic adjoint and is not restricted
by memory even for large or hard problem instances.
References
1. Giles, M. and Glasserman, P. Smoking adjoints: Fast Monte Carlo greeks. Risk,
19:88–92, 2006.
5. Gebremedhin, A., Manne, F., and Pothen, A. What color is your Jacobian?
Graph coloring for computing derivatives. SIAM Review, 47:629–705, 2005.
6. Leclerc, M., Liang, Q., and Schneider, I. Fast Monte Carlo Bermudan greeks.
Risk, 22(7):84–88, 2009.
8. Joshi, M. and Pitt, D. Fast sensitivity computations for Monte Carlo valuation
of pension funds. Astin Bulletin, 40:655–667, 2010.
9. Joshi, M. and Yang, C. Fast and accurate pricing and hedging of long-dated
CMS spread options. International Journal of Theoretical and Applied Finance,
13:839–865, 2010.
10. Joshi, M. and Yang, C. Algorithmic hessians and the fast computation of cross-
gamma risk. IIE Transactions, 43:878–892, 2011.
11. Joshi, M. and Yang, C. Fast delta computations in the swap-rate market model.
Journal of Economic Dynamics and Control, 35:764–775, 2011.
13. Capriotti, L. and Giles, M. Fast correlation greeks by adjoint algorithmic differ-
entiation. Risk, 23:79–83, 2010.
14. Capriotti, L. and Giles, M. Adjoint greeks made easy. Risk, 25:92, 2012.
15. Capriotti, L., Jiang, Y., and Macrina, A. Real-time risk management: An AAD–
PDE approach. International Journal of Financial Engineering, 2:1550039,
2015.
16. Capriotti, L., Jiang, Y., and Macrina, A. AAD and Least Squares Monte Carlo:
Fast Bermudan-style options and XVA greeks. Algorithmic Finance, 6(1–2):35–
49, 2017.
18. Antonov, A., Issakov, S., Konikov, M., McClelland, A., and Mechkov, S. PV and
XVA greeks for callable exotics by algorithmic differentiation. 2017. Available at
19. Turinici, G. Calibration of local volatility using the local and implied instanta-
neous variance. Journal of Computational Finance, 13(2):1, 2009.
20. Käbe, C., Maruhn, J. H., and Sachs, E. W. Adjoint-based Monte Carlo calibra-
tion of financial market models. Finance and Stochastics, 13(3):351–379, 2009.
22. Henrard, M. Calibration in finance: Very fast greeks through algorithmic dif-
ferentiation and implicit function. Procedia Computer Science, 18:1145–1154,
2013.
23. Giles, M. Vibrato Monte Carlo sensitivities. In L’Ecuyer, P. and Owen, A. edi-
tors. Monte Carlo and Quasi Monte Carlo Methods, 369–382. Springer, 2009.
24. Capriotti, L. Likelihood ratio method and algorithmic differentiation: Fast sec-
ond order greeks. Algorithmic Finance, 4:81–87, 2015.
25. Pironneau, O. et al. Vibrato and automatic differentiation for high order deriva-
tives and sensitivities of financial options. arXiv preprint arXiv:1606.06143 ,
2016.
26. Utke, J., Naumann, U., Fagan, M., Tallent, N., Strout, M., Heimbach, P., Hill,
C., and Wunsch, C. OpenAD/F: A modular open-source tool for automatic
differentiation of Fortran codes. ACM Transactions on Mathematical Software,
34:18:1–18:36, July 2008.
27. Stumm, P. and Walther, A. Multi-stage approaches for optimal offline check-
pointing. SIAM Journal of Scientific Computing, 31:1946–1967, 2009.
30. Giles, M. Collected matrix derivative results for forward and reverse mode algo-
rithmic differentiation. In Bischof, C., Bücker, M., Hovland, P., Naumann, U.,
and Utke, J., editors. Advances in Automatic Differentiation, Volume 64 of Lec-
ture Notes in Computational Science and Engineering, pages 35–44. Springer,
2008.
32. Naumann, U., Lotz, J., Leppkes, K., and Towara, M. Algorithmic differentiation
of numerical methods: Tangent and adjoint solvers for parameterized systems
of nonlinear equations. ACM Transactions on Mathematical Software, 41:26:1–
26:21, 2015.
35. Khan, K. and Barton, P. A vector forward mode of automatic differentiation for
generalized derivative evaluation. Optimization Methods and Software, 30(6):1–
28, 2015.
40. Deussen, J., Mosenkis, V., and Naumann, U. Fast Estimates of Greeks from
American Options: A Case Study in Adjoint Algorithmic Differentiation. Tech-
nical Report AIB-2018-02, RWTH Aachen University, January 2018.
43. Qi, H. and Sun, D. A quadratically convergent Newton method for comput-
ing the nearest correlation matrix. SIAM Journal Matrix Analysis Applications,
28:360–385, 2006.
44. Geske, R. and Johnson, H. E. The American put option valued analytically.
Journal of Finance, 39(5):1511–1524, 1984.
Chapter 11
Case Studies of Real-Time Risk
Management via Adjoint
Algorithmic Differentiation (AAD)
CONTENTS
11.1 Introduction
11.2 Adjoint Algorithmic Differentiation: A Primer
    11.2.1 Adjoint design paradigm
        A simple example
11.3 Real-Time Risk Management of Interest Rate Products
    11.3.1 Pathwise derivative method
        Libor market model simulation
11.4 Real-Time Counterparty Credit Risk Management
    11.4.1 Counterparty credit risk management
    11.4.2 Adjoint algorithmic differentiation and the counterparty
           credit risk management
        Rating transition risk
    11.4.3 Results
11.5 Real-Time Risk Management of Flow Credit Products
    11.5.1 Pricing of credit derivatives
        Calibration step
    11.5.2 Challenges in the calculation of credit risk
    11.5.3 Adjoint calculation of risk
    11.5.4 Implicit function theorem
    11.5.5 Adjoint of the calibration step
    11.5.6 Results
        Credit default swaps
        Credit default index swaptions
11.6 Conclusion
References
11.1 Introduction
The renewed emphasis of the financial industry on quantitatively sound
risk management practices comes with formidable computational challenges.
In fact, standard approaches for the calculation of risk require repeating the
calculation of the P&L of the portfolio under hundreds of market scenar-
ios. As a result, in many cases these calculations cannot be completed in a
practical amount of time, even employing a vast amount of computer power,
especially for risk management problems requiring computationally intensive
Monte Carlo (MC) simulations. Since the total cost of the through-the-life risk
management can determine whether it is profitable to execute a new trade,
solving this technology problem is critical to allow a securities firm to remain
competitive.
Following the introduction of adjoint methods in Finance [1], a compu-
tational technique dubbed adjoint algorithmic differentiation (AAD) [2–4]
has recently emerged as tremendously effective for speeding up the calcu-
lation of sensitivities in MC in the context of the so-called pathwise derivative
method [5].
Algorithmic differentiation (AD) [6] is a set of programming techniques for
the efficient calculation of the derivatives of functions implemented as com-
puter programs. The main idea underlying AD is that any such function—no
matter how complicated—can be interpreted as a composition of basic arith-
metic and intrinsic operations that are easy to differentiate. What makes AD
particularly attractive, when compared to standard (finite-difference) meth-
ods for the calculation of derivatives, is its computational efficiency. In fact,
AD exploits the information on the structure of the computer code in order
to optimize the calculation. In particular, when one requires the derivatives
of a small number of outputs with respect to a large number of inputs, the
calculation can be highly optimized by applying the chain rule through the
instructions of the program in opposite order with respect to their original
evaluation [6]. This gives rise to the adjoint (mode of) algorithmic differenti-
ation (AAD).
Surprisingly, even if AD has been an active branch of computer science
for several decades, its impact in other research fields has been fairly limited
until recently. Interestingly, reversing the usual situation in which
well-established ideas in applied mathematics or physics are “borrowed” by
quants, AAD was introduced in MC applications in natural science [7]
only after its “rediscovery” in quantitative finance.
In this chapter, we discuss three particularly significant applications of
AAD to risk management: interest rate products, counterparty credit risk
management (CCRM), and volume credit products. Together they illustrate the
power and generality of this groundbreaking numerical technique.
Y = FUNCTION(X) (11.1)
X → · · · → U → V → · · · → Y. (11.2)
which corresponds to the adjoint mode equation for the intermediate function
$V = V(U)$,
\[ \bar{U}_i = \sum_k \bar{V}_k \frac{\partial V_k}{\partial U_i}, \tag{11.4} \]
namely a function of the form Ū = V̄ (U, V̄ ). Starting from the adjoint of the
outputs, Ȳ , we can apply this to each step in the calculation, working from
right to left,
X̄ ← · · · ← Ū ← V̄ ← · · · ← Ȳ (11.5)
until we obtain X̄, that is, the following linear combination of the rows of the
Jacobian of the function X → Y :
\[ \bar{X}_i = \sum_{j=1}^{m} \bar{Y}_j \frac{\partial Y_j}{\partial X_i}, \tag{11.6} \]
with i = 1, . . . , n.
In the adjoint mode, the cost does not increase with the number of inputs,
but it is linear in the number of (linear combinations of the) rows of the
Jacobian that need to be evaluated independently. In particular, if the full
Jacobian is required, one needs to repeat the adjoint calculation m times,
setting the vector Ȳ equal to each of the elements of the canonical basis in
$\mathbb{R}^m$. Furthermore, since the partial (branch) derivatives depend on the values
of the intermediate variables, one generally first has to compute the original
calculation storing the values of all of the intermediate variables such as U
and V , before performing the adjoint mode sensitivity calculation.
One particularly important theoretical result [6] is that given a computer
program performing some high-level function 11.1, the execution time of its
adjoint counterpart
\[ \bar{X} = \mathrm{FUNCTION\_b}(X, \bar{Y}) \tag{11.7} \]
with $\omega_A \in [3, 4]$. Thus one can obtain the sensitivity of a single output, or of
a linear combination of outputs, to an unlimited number of inputs for a little
more work than the original calculation.
As also discussed at length in References 2–4, AAD can be straightfor-
wardly implemented by starting from the output of an algorithm and pro-
ceeding backwards applying systematically the adjoint composition rule 11.4
to each intermediate step, until the adjoints of the inputs 11.6 are computed.
As already noted, the execution of such backward sweep requires information
that needs to be computed and stored by executing beforehand the steps of
the original algorithm—the so-called forward sweep.
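The forward and backward sweeps described above can be sketched on a toy function. This is a minimal illustration, not code from the chapter: the function `function`, its three inputs, and the intermediate names `u`, `v`, `w` are hypothetical choices made only to show the pattern of Equations 11.4 to 11.6, and the finite-difference helper is there purely as a cross-check.

```python
import math

def function(x):
    """Forward sweep: evaluate Y = FUNCTION(X) and keep the intermediates."""
    x1, x2, x3 = x
    u = x1 * x2              # intermediate U
    v = math.exp(u)          # intermediate V = V(U)
    w = math.sin(x3)         # intermediate W
    y = v + w * x1           # single output Y
    return y, (u, v, w)

def function_b(x, y_bar):
    """Backward sweep: return X_bar_i = Y_bar * dY/dX_i for all inputs at once."""
    x1, x2, x3 = x
    _, (u, v, w) = function(x)   # the forward sweep supplies stored values
    # adjoint of y = v + w * x1
    v_bar = y_bar
    w_bar = y_bar * x1
    x1_bar = y_bar * w
    # adjoint of w = sin(x3)
    x3_bar = w_bar * math.cos(x3)
    # adjoint of v = exp(u)
    u_bar = v_bar * v
    # adjoint of u = x1 * x2
    x1_bar += u_bar * x2
    x2_bar = u_bar * x1
    return [x1_bar, x2_bar, x3_bar]

def finite_difference(x, eps=1e-6):
    """Central differences: two extra evaluations per input, for comparison only."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += eps
        dn[i] -= eps
        grads.append((function(up)[0] - function(dn)[0]) / (2.0 * eps))
    return grads

x = [0.5, 1.2, 0.3]
adjoint_grad = function_b(x, 1.0)
bumped_grad = finite_difference(x)
```

A single call to `function_b` returns all three sensitivities, whereas the bumping helper re-evaluates the function for every input; this is the cost asymmetry the text refers to.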
a volatility surface or an interest rate curve, that are shared across different
pricing applications.
Fortunately, the principles of AD can be used as a programming paradigm
for any algorithm. An easy way to illustrate the adjoint design paradigm is
to consider again the arbitrary computer function in Equation 11.1 and to
imagine that this represents a certain high-level algorithm that we want to
differentiate. By appropriately defining the intermediate variables, any such
algorithm can be abstracted in general as a composition of functions as in
Equation 11.2. In the following section, we give a very simple example illus-
trating this idea. The interested reader can find in Reference 3 a practical
step-by-step guide.
Step $\bar{2}$ Set $\bar{X}_i = \bar{P}\, I\big(\sum_{j=1}^{n} X_j - K\big)$, for $i = 1, \ldots, n$. Here $I(x)$ is the Heaviside function.
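As a hedged illustration of this adjoint step, the sketch below assumes that the forward payout is the basket call $P = (\sum_j X_j - K)^+$, which is what the Heaviside factor suggests but is not stated in the surviving text. The function names `payout` and `payout_b` are hypothetical.

```python
def payout(x, strike):
    """Forward steps (assumed form): basket sum, then call payout."""
    basket = sum(x)                      # Step 1: B = sum_j X_j
    return max(basket - strike, 0.0)     # Step 2: P = (B - K)^+

def payout_b(x, strike, p_bar=1.0):
    """Adjoint steps, executed in reverse order of the forward steps."""
    basket = sum(x)                               # recompute the intermediate
    indicator = 1.0 if basket > strike else 0.0   # Heaviside function I(B - K)
    basket_bar = p_bar * indicator                # adjoint of Step 2
    return [basket_bar for _ in x]                # adjoint of Step 1: X_bar_i = B_bar

in_the_money = payout_b([40.0, 35.0, 30.0], 100.0)
out_of_the_money = payout_b([40.0, 35.0, 20.0], 100.0)
```

Every input receives the same adjoint because the basket is an unweighted sum; with weights, each `X_bar_i` would simply be scaled by its weight.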
Here $X(t)$ is an $N$-dimensional vector and represents the value of a set of
underlying market factors (e.g., stock prices, interest rates, foreign exchange
pairs, and so on) at time $t$. $P(X(T_1), \ldots, X(T_M))$ is the discounted payout
function of the priced security and depends in general on $M$ observations of those
factors. In the following, we will indicate the collection of such observations with
a $d = N \times M$-dimensional state vector $X = (X(T_1), \ldots, X(T_M))^t$.
The expectation value in Equation 11.9 can be estimated by means of
MC by sampling a number NMC of random replicas of the underlying state
vector X[1], . . . , X[NMC ], sampled according to the distribution Q(X), and
evaluating the payout P (X) for each of them. This leads to the estimate of
the option value V as
\[ V \simeq \frac{1}{N_{MC}} \sum_{i_{MC}=1}^{N_{MC}} P(X[i_{MC}]). \tag{11.10} \]
The point of this subtle change is that P(Z) does not depend on the parameters
θ whereas Q(X) does. Indeed, whenever the payout function is regular enough,
for example, Lipschitz-continuous, and under additional conditions that are
often satisfied in financial pricing (see, e.g., [9]), one can write the sensitivity
θ̄k ≡ ∂V /∂θk as
\[ \bar{\theta}_k = E_Q\left[ \frac{\partial P_\theta(X)}{\partial \theta_k} \right]. \tag{11.12} \]
In general, the calculation of Equation 11.12 can be performed by applying
the chain rule and averaging on each MC path the so-called pathwise derivative
estimator
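\[ \bar{\theta}_k \equiv \frac{\partial P_\theta(X)}{\partial \theta_k} = \sum_{j=1}^{d} \frac{\partial P_\theta(X)}{\partial X_j} \times \frac{\partial X_j}{\partial \theta_k} + \frac{\partial P_\theta(X)}{\partial \theta_k}. \tag{11.13} \]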
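To make the pathwise derivative estimator concrete, the following sketch applies it to a single-asset European call under lognormal dynamics. The model, the parameter values, and the function names are illustrative assumptions and do not come from the chapter; the Black-Scholes formulas are used only as an independent check. In this example the payout has no explicit dependence on the parameters, so only the chain-rule term of Equation 11.13 contributes.

```python
import math
import random

def pathwise_greeks(x0, strike, rate, sigma, maturity, n_paths, seed=12345):
    """Monte Carlo value, Delta and Vega from the pathwise derivative estimator."""
    rng = random.Random(seed)
    df = math.exp(-rate * maturity)
    sqrt_t = math.sqrt(maturity)
    value = delta = vega = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        x_t = x0 * math.exp((rate - 0.5 * sigma ** 2) * maturity + sigma * sqrt_t * z)
        if x_t > strike:
            dp_dx = df                                  # dP/dX_T on this path
            value += df * (x_t - strike)
            delta += dp_dx * x_t / x0                   # times dX_T/dx0
            vega += dp_dx * x_t * (sqrt_t * z - sigma * maturity)  # times dX_T/dsigma
    return value / n_paths, delta / n_paths, vega / n_paths

def black_scholes_delta_vega(x0, strike, rate, sigma, maturity):
    """Closed-form Delta and Vega of a European call, for comparison only."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(x0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    delta = 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))
    vega = x0 * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * sqrt_t
    return delta, vega

mc_value, mc_delta, mc_vega = pathwise_greeks(100.0, 100.0, 0.02, 0.2, 1.0, 100000)
bs_delta, bs_vega = black_scholes_delta_vega(100.0, 100.0, 0.02, 0.2, 1.0)
```

Both sensitivities are accumulated on the same paths as the value, so no additional simulation is needed; the Monte Carlo and closed-form numbers should agree within statistical error.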
\[ \frac{dL_i(t)}{L_i(t)} = \mu_i(L(t))\, dt + \sigma_i(t)^T dW_t, \tag{11.14} \]
\[ \mu_i(L(t)) = \sum_{j=\eta(t)}^{i} \frac{\sigma_i^T \sigma_j\, \delta L_j(t)}{1 + \delta L_j(t)}, \tag{11.15} \]
where $\eta(t)$ denotes the index of the bond maturity immediately following
time $t$, with $T_{\eta(t)-1} \le t < T_{\eta(t)}$. As is common in the literature, to keep this
example as simple as possible, we take each vector $\sigma_i$ to be a function of time
to maturity
\[ \sigma_i(t) = \sigma_{i-\eta(t)+1}(0) = \lambda(i - \eta(t) + 1). \tag{11.16} \]
Equation 11.14 can be simulated by applying an Euler discretization to
the logarithms of the forward rates, for example, by dividing each interval
[Ti , Ti+1 ) into Ns steps of equal width, h = δ/Ns . This gives
\[ \frac{L_i(t_{n+1})}{L_i(t_n)} = \exp\left[ \left( \mu_i(L(t_n)) - \|\sigma_i(t_n)\|^2/2 \right) h + \sigma_i^T(t_n) Z(t_n) \sqrt{h} \right], \tag{11.17} \]
where L̂j (tn+1 ) is calculated from Lj (tn ) using the evolution 11.17, that is,
with the simple Euler drift 11.15.
The pseudocode for the propagation of the Libor rates for dW = 1, cor-
responding to a function PROP implementing the Euler step 11.17, is shown
in Figure 11.1. Here, as discussed in Reference 1, the computational cost of
implementing Equation 11.17 is minimized by first evaluating
\[ v_i(t_n) = \sum_{j=\eta(nh)}^{i} \frac{\sigma_j\, \delta L_j(t_n)}{1 + \delta L_j(t_n)}, \tag{11.19} \]
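A one-factor Euler step along these lines might look as follows. This is only a sketch of the plain Euler update 11.17 with the running sum 11.19, not a reproduction of the pseudocode in Figure 11.1: the function name `prop`, the zero-based indexing, and the argument layout are assumptions, and the predictor-corrector refinement is not included.

```python
import math

def prop(libor, lam, eta, delta, h, z):
    """One Euler step (Equation 11.17) for a one-factor Libor market model.

    libor : forward rates L_i(t_n); entries with index < eta have already fixed
    lam   : volatilities by time-to-maturity bucket, sigma_i = lam[i - eta]
    eta   : zero-based index of the first rate that is still alive
    delta : accrual period, h : time step, z : standard normal draw
    """
    new_libor = list(libor)
    v = 0.0                       # running sum of Equation 11.19
    sqrt_h = math.sqrt(h)
    for i in range(eta, len(libor)):
        sigma = lam[i - eta]      # volatility parameterization 11.16
        v += sigma * delta * libor[i] / (1.0 + delta * libor[i])
        drift = sigma * v         # mu_i = sigma_i * v_i in the one-factor case
        new_libor[i] = libor[i] * math.exp(
            (drift - 0.5 * sigma * sigma) * h + sigma * z * sqrt_h)
    return new_libor

rates = [0.05, 0.05, 0.05]
stepped = prop(rates, [0.2, 0.2], eta=1, delta=0.25, h=0.25, z=0.0)
frozen = prop(rates, [0.0, 0.0], eta=1, delta=0.25, h=0.25, z=1.3)
```

Because `v` is accumulated as the loop advances, the drifts of all live rates are obtained in a single pass over the curve rather than by recomputing the sum in Equation 11.15 for each rate.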
\[ V(T_n) = \sum_{i=n+1}^{N+1} B(T_n, T_i)\, \delta\, (S_n(T_n) - K)^+, \tag{11.20} \]
\[ B(T_n, T_i) = \prod_{l=n}^{i-1} \frac{1}{1 + \delta L_l(T_n)}, \tag{11.21} \]
\[ S_n(T_n) = \frac{1 - B(T_n, T_{N+1})}{\delta \sum_{l=n+1}^{N+1} B(T_n, T_l)}. \tag{11.22} \]
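Equations 11.20 to 11.22 translate into a short routine. The sketch below is a hedged illustration with a hypothetical function name and zero-based list layout; it evaluates the payout at expiry from the simulated forward rates and is not taken from the chapter.

```python
def swaption_payout(libor, n, strike, delta):
    """Payout 11.20 at expiry T_n from the forward rates L_n(T_n), ..., L_N(T_n).

    libor[l] holds L_l(T_n) for l = 0, ..., N; only entries with l >= n are used.
    """
    last = len(libor)              # index N + 1 of the final bond maturity
    bond = 1.0                     # B(T_n, T_n) = 1
    annuity = 0.0                  # sum over i = n+1, ..., N+1 of B(T_n, T_i)
    for i in range(n + 1, last + 1):
        bond /= 1.0 + delta * libor[i - 1]          # Equation 11.21, built up recursively
        annuity += bond
    swap_rate = (1.0 - bond) / (delta * annuity)    # Equation 11.22
    value = annuity * delta * max(swap_rate - strike, 0.0)   # Equation 11.20
    return value, swap_rate

flat_value, flat_swap_rate = swaption_payout([0.04] * 8, 2, 0.03, 0.5)
```

For a flat forward curve the swap rate coincides with the common forward rate, which gives a quick sanity check of the indexing.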
FIGURE 11.2: Adjoint of the propagation method PROP bn for the Libor
Market Model of Equation 11.14 for dW = 1, under the predictor corrector
Euler approximation 11.18, and the volatility parameterization 11.16. The
corresponding forward method is shown in Figure 11.1. The instructions com-
mented are the forward counterpart to the adjoint instructions immediately
after.
[Figure 11.3 plot: $R_{CPU}$ versus option expiry $T_n$ (0 to 15). Series: Delta and Vega bumping; Delta bumping; Delta and Vega AAD (approximately 2.2).]
FIGURE 11.3: Ratio of the CPU time required for the AAD calculation of
the Delta and Vega and the time to calculate the option value for the swaption
in Equation 11.20 as a function of the option expiry Tn . The time to calculate
Delta and Vega using bumping is also shown. Lines are guides for the eye.
where τc is the default time of the counterparty, N P V (t) is the net present
value of the portfolio at time t from the dealer’s point of view, C(R(t)) is the
collateral outstanding, typically dependent on the rating R of the counter-
party, LGD (t) is the loss given default, D(0, t) is the discount factor for the
interval [0, t], and I(τc ≤ T ) is the indicator that the counterparty’s default
happens before the longest deal maturity in the portfolio, $T$. Here, for simplicity
of notation, we consider the unilateral CVA; the generalization to bilateral
CVA [13] is straightforward. The quantity in Equation 11.23 is typically computed
on a discrete time grid of “horizon dates” $T_0 < T_1 < \cdots < T_{N_O}$ as, for
instance,
\[ V_{CVA} \simeq E\left[ \sum_{i=1}^{N_O} I(T_{i-1} < \tau_c \le T_i)\, D(0, T_i)\, L_{GD}(T_i) \left( NPV(T_i) - C\left(R(T_i^-)\right) \right)^+ \right]. \tag{11.24} \]
In general, the quantity above depends on several correlated random market
factors, including interest rate, counterparty’s default time and rating, recov-
ery amount, and all the market factors the net present value of the portfolio
depends on. As such, its calculation requires an MC simulation.
To simplify the notation and generalize the discussion beyond the small
details that might enter in a dealer’s definition of a specific credit charge, here
\[ P = \sum_{i=1}^{N_O} P(T_i, R(T_i), X(T_i)), \tag{11.26} \]
where
\[ P(T_i, R(T_i), X(T_i)) = \sum_{r=0}^{N_R} \tilde{P}_i(X(T_i); r)\, \delta_{r, R(T_i)}. \tag{11.27} \]
Here the rating of the counterparty entity including default, R(t), is rep-
resented by an integer r = 0, . . . , NR for simplicity; X(t) is the realized
value of the M market factors at time t. Q = Q(R, X) represents a prob-
ability distribution according to which R = (R(T1 ), . . . , R(TN0 ))t and X =
(X(T1 ), . . . , X(TN0 ))t are distributed; P̃i (·; r) is a rating-dependent payout at
time Ti .2
The expectation value in Equation 11.25 can be estimated by means of
MC by sampling a number NMC of random replicas of the underlying rating
and market state vector, R[1], . . . , R[NMC ] and X[1], . . . , X[NMC ], according
to the distribution Q(R, X), and evaluating the payout P (R, X) for each of
them.
In the following, we will make minimal assumptions on the particular
model employed to describe the dynamics of the market factors. In partic-
ular, we will only assume that for a given MC sample the value at time Ti of
the market factors can be obtained from their value at time Ti−1 by means
of a mapping of the form X(Ti ) = Fi (X(Ti−1 ), Z X ) where Z X is an N X -
dimensional vector of correlated standard normal random variates, X(T0 ) is
today’s value of the market state vector, and Fi is a mapping regular enough
for the pathwise derivative method to be applicable [9], as it is generally the
case for practical applications.
As an example of a counterparty rating model generally used in practice,
here we consider the rating transition Markov chain model of Jarrow et al.
[14] in which the rating at time Ti can be simulated as
\[ R(T_i) = \sum_{r=1}^{N_R} I\left( \tilde{Z}_i^R > Q(T_i, r) \right), \tag{11.28} \]
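\[ \bar{\theta}_k \equiv \frac{\partial P_\theta(R, X)}{\partial \theta_k} = \sum_{i=1}^{N_O} \sum_{l=1}^{M} \frac{\partial P_\theta(R, X)}{\partial X_l(T_i)} \times \frac{\partial X_l(T_i)}{\partial \theta_k} + \frac{\partial P_\theta(R, X)}{\partial \theta_k}, \tag{11.29} \]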
where we have allowed for an explicit dependence of the payout on the model
parameters. Due to the discreteness of the state space of the rating factor, the
pathwise estimator for its related sensitivities is not well defined. However,
as we will show below, one can express things in such a way that the rating
sensitivities are incorporated in the explicit term ∂Pθ (R, X)/∂θk .
In the following, we will show how the calculation of the pathwise derivative
estimator 11.29 can be implemented efficiently by means of AAD.
with P̄ = 1.
A few comments are in order. In Step 4̄, the adjoint of the payout function
is defined while keeping the discrete rating variable constant. This provides
the derivatives X̄l (Ti ) = ∂Pθ /∂Xl (Ti ) and θ̄k = ∂Pθ /∂θk . In defining the
adjoint in Step 2̄, we have taken into account that the propagation rule in
Step 2 is explicitly dependent on both X(Ti ) and the model parameters θ. As
a result, its adjoint counterpart produces contributions to both θ̄ and X̄(Ti ).
Both the adjoint of the payout and of the propagation mapping can be imple-
mented following the principles of AAD as discussed in References 2 and 3.
In many situations, AD tools can be also used as an aid or to automate the
implementation, especially for simpler, self-contained functions. In the back-
ward sweep above, Steps 1̄ and 3̄ have been skipped because we have assumed
for simplicity of exposition that the parameters θ do not affect the correlation
matrices ρi , and the rating dynamics. If correlation risk is instead required,
Step 2̄ produces also the adjoint of the random variables Z X , and Step 1̄ con-
tains the adjoint of the Cholesky decomposition, possibly with the support of
the binning technique, as described in Reference 4.
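\[ P\left(T_i, \tilde{Z}_i^R, X(T_i)\right) = \tilde{P}_i(X(T_i); 0) + \sum_{r=1}^{N_R} \left( \tilde{P}_i(X(T_i); r) - \tilde{P}_i(X(T_i); r-1) \right) I\left( \tilde{Z}_i^R > Q(T_i, r; \theta) \right), \tag{11.30} \]
\[ \partial_{\theta_k} P\left(T_i, \tilde{Z}_i^R, X(T_i)\right) = -\sum_{r=1}^{N_R} \left( \tilde{P}_i(X(T_i); r) - \tilde{P}_i(X(T_i); r-1) \right) \delta\left( \tilde{Z}_i^R = Q(T_i, r; \theta) \right) \times \partial_{\theta_k} Q(T_i, r; \theta). \tag{11.31} \]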
This estimator cannot be sampled in this form with MC. Nevertheless, it can
be integrated out using the properties of Dirac’s delta along the lines of [16],
giving after straightforward computations,
\[ \bar{\theta}_k = -\sum_{r=1}^{N_R} \frac{\phi(Z, Z_i^X, \rho_i)}{\sqrt{i}\, \phi(Z_i^X, \rho_i^X)}\, \partial_{\theta_k} Q(T_i, r; \theta) \left( \tilde{P}_i(X(T_i); r) - \tilde{P}_i(X(T_i); r-1) \right), \tag{11.32} \]
where $Z$ is such that $(Z + \sum_{j=1}^{i-1} Z_j^R)/\sqrt{i} = Q(T_i, r; \theta)$ and $\phi(Z_i^X, \rho_i^X)$ is
an $N_X$-dimensional standard normal probability density function with correlation
matrix $\rho_i^X$ obtained by removing the first row and column of $\rho_i$; here
$\partial_{\theta_k} Q(T_i, r; \theta)$ is not stochastic and can be evaluated (e.g., using AAD) once
per simulation. The final result is rather intuitive as it is given by the
probability weighted sum of the discontinuities in the payout.
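The rewriting of the rating-dependent payout in Equation 11.30 rests on a telescoping identity that is easy to check numerically. The sketch below is illustrative only: the threshold and payout values are made up, the function names are hypothetical, and the thresholds are assumed to be sorted in increasing order of $r$.

```python
def rating(z, thresholds):
    """Equation 11.28: count the thresholds Q(T_i, r), r = 1..N_R, exceeded by z."""
    return sum(1 for q in thresholds if z > q)

def payout_by_rating(z, thresholds, payouts):
    """Payout selected directly by the simulated rating (Equation 11.27)."""
    return payouts[rating(z, thresholds)]

def payout_telescoped(z, thresholds, payouts):
    """Equation 11.30: base payout plus the jump at each crossed threshold."""
    total = payouts[0]
    for r, q in enumerate(thresholds, start=1):
        if z > q:
            total += payouts[r] - payouts[r - 1]
    return total

thresholds = [-1.0, 0.5, 2.0]        # hypothetical Q(T_i, r) for r = 1, 2, 3
payouts = [10.0, 7.0, 3.0, 0.0]      # hypothetical payouts for r = 0, ..., 3
draws = [-2.0, 0.0, 1.0, 3.0]
direct = [payout_by_rating(z, thresholds, payouts) for z in draws]
telescoped = [payout_telescoped(z, thresholds, payouts) for z in draws]
```

In the telescoped form the dependence on the thresholds sits entirely in the indicator functions, which is what allows the differentiation leading to Equations 11.31 and 11.32.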
11.4.3 Results
As a numerical test, we present here results for the calculation of risk on
the CVA of a portfolio of swaps on commodity Futures. For the purpose of this
illustration, we consider a simple one-factor lognormal model for the Futures
curve of the form
\[ \frac{dF_T(t)}{F_T(t)} = \sigma_T \exp(-\beta(T - t))\, dW_t, \tag{11.33} \]
can be simulated exactly for any time t so that the propagation rule in Step 2
reads for Ti ≤ T
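\[ F_T(T_i) = F_T(T_{i-1}) \exp\left( \sigma_i \sqrt{\Delta T_i}\, Z - \frac{1}{2} \sigma_i^2 \Delta T_i \right), \tag{11.34} \]
where $\Delta T_i = T_i - T_{i-1}$, and
\[ \sigma_i^2 = \frac{\sigma_T^2}{2\beta \Delta T_i}\, e^{-2\beta T} \left( e^{2\beta T_i} - e^{2\beta T_{i-1}} \right) \]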
is the outturn variance. In this example, we will consider deterministic interest
rates. As underlying portfolio for the CVA calculation, we consider a set of
commodity swaps, paying on a strip of Futures (e.g., monthly) expiries tj ,
j = 1, . . . , Ne the amount Ftj (tj ) − K. The time t net present value for this
portfolio reads
\[ NPV(t) = \sum_{j=1}^{N_e} D(t, t_j) \left( F_{t_j}(t) - K \right). \tag{11.35} \]
Note that although we consider here for simplicity of exposition a linear portfo-
lio, the method proposed applies to an arbitrarily complex portfolio of deriva-
tives, for which in general the N P V will be a nonlinear function of the market
factors Ftj (t) and model parameters θ.
For this example, the adjoint propagation rule in Step 2̄ simply reads
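\[ \bar{F}_T(T_{i-1}) \mathrel{+}= \bar{F}_T(T_i) \exp\left( \sigma_i \sqrt{\Delta T_i}\, Z - \frac{1}{2} \sigma_i^2 \Delta T_i \right), \]
\[ \bar{\sigma}_i = \bar{F}_T(T_i)\, F_T(T_i) \left( \sqrt{\Delta T_i}\, Z - \sigma_i \Delta T_i \right) \]
with $\bar{\sigma}_i$ related to this step's contribution to the adjoint of the Futures' volatility $\bar{\sigma}_T$ by
\[ \bar{\sigma}_T \mathrel{+}= \frac{\bar{\sigma}_i}{\sqrt{2\beta \Delta T_i}} \sqrt{ e^{-2\beta T} \left( e^{2\beta T_i} - e^{2\beta T_{i-1}} \right) }. \]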
At the end of the backward path, F̄T (0) and σ̄T contain the pathwise derivative
estimator 11.29 corresponding, respectively, to the sensitivity with respect to
today’s price and volatility of the Futures contract with expiry T .
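A single propagation step and its adjoint can be sketched as follows. The function names, argument order, and numerical inputs are assumptions made for illustration; the formulas are those of Equation 11.34 and of the adjoint propagation rule above.

```python
import math

def outturn_vol(sigma_t, beta, expiry, t_prev, t_next):
    """Outturn volatility sigma_i over [t_prev, t_next] for expiry T."""
    dt = t_next - t_prev
    scale = math.exp(-2.0 * beta * expiry) * (
        math.exp(2.0 * beta * t_next) - math.exp(2.0 * beta * t_prev))
    return sigma_t * math.sqrt(scale / (2.0 * beta * dt))

def step(f_prev, sigma_t, beta, expiry, t_prev, t_next, z):
    """Forward propagation of Equation 11.34."""
    dt = t_next - t_prev
    sigma_i = outturn_vol(sigma_t, beta, expiry, t_prev, t_next)
    return f_prev * math.exp(sigma_i * math.sqrt(dt) * z - 0.5 * sigma_i ** 2 * dt)

def step_b(f_prev, sigma_t, beta, expiry, t_prev, t_next, z, f_bar):
    """Adjoint of `step`: contributions to F_bar(T_{i-1}) and sigma_T_bar."""
    dt = t_next - t_prev
    sigma_i = outturn_vol(sigma_t, beta, expiry, t_prev, t_next)
    growth = math.exp(sigma_i * math.sqrt(dt) * z - 0.5 * sigma_i ** 2 * dt)
    f_next = f_prev * growth
    f_prev_bar = f_bar * growth
    sigma_i_bar = f_bar * f_next * (math.sqrt(dt) * z - sigma_i * dt)
    scale = math.exp(-2.0 * beta * expiry) * (
        math.exp(2.0 * beta * t_next) - math.exp(2.0 * beta * t_prev))
    sigma_t_bar = sigma_i_bar / math.sqrt(2.0 * beta * dt) * math.sqrt(scale)
    return f_prev_bar, sigma_t_bar

args = (50.0, 0.3, 0.5, 2.0, 0.5, 1.0, 0.7)   # hypothetical inputs
f_prev_bar, sigma_t_bar = step_b(*args, f_bar=1.0)
```

In a full backward sweep the two returned values would be accumulated step by step, starting from the payout adjoint at the last horizon date.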
The remarkable computational efficiency of the AAD implementation is
clearly illustrated in Figure 11.4. Here we plot the speedup produced by AAD
with respect to the standard finite-difference method. On a fairly typical trade
horizon of 5 years, for a portfolio of 5 swaps referencing distinct commodi-
ties Futures with monthly expiries, the CVA bears nontrivial risk to over 600
parameters: 300 Futures prices ($F_T(0)$) and as many at-the-money volatilities ($\sigma_T$),
(say) 10 points on the zero rate curve, and 10 points on the CDS curve of
the counterparty used to calibrate the transition probabilities of the rating
transition model 11.28. As illustrated in Figure 11.4, the CPU time required
for the calculation of the CVA, and its sensitivities, is less than 4 times the
CPU time spent for the computation of the CVA alone, as predicted by Equation
11.8. As a result, even for this very simple application, AAD produces
risk over 150 times faster than finite differences; that is, for a CVA evaluation
taking 10 seconds, AAD produces the full set of sensitivities in less than 40
seconds, while finite differences require approximately 1 hour and 40 minutes.
[Figure 11.4 plot: speedup and $R_{CPU}$ versus number of risks $N_{risks}$ (0 to 600); the CPU time ratio levels off at approximately 3.8.]
FIGURE 11.4: Speedup in the calculation of risk for the CVA of a portfolio
of 5 commodity swaps over a 5-year horizon, as a function of the number
of risks computed (empty dots). The full dots are the ratio of the CPU time
required for the calculation of the CVA, and its sensitivities, and the CPU time
spent for the computation of the CVA alone. Lines are guides for the eye.
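The timings quoted for this example follow from simple arithmetic, reproduced below as a sanity check. The sketch assumes one revaluation per bumped input (one-sided finite differences) and rounds the number of risks to 600; both are illustrative assumptions rather than details given in the text.

```python
n_risks = 600          # approximate number of sensitivities
cva_seconds = 10.0     # time for one CVA valuation
aad_ratio = 4.0        # upper bound on (CVA + all sensitivities) / CVA

bump_seconds = n_risks * cva_seconds    # one revaluation per bumped input
aad_seconds = aad_ratio * cva_seconds   # value and full risk in one adjoint run
speedup = bump_seconds / aad_seconds
bump_minutes = bump_seconds / 60.0
```

Here `bump_minutes` is 100, that is, 1 hour and 40 minutes, and `speedup` is 150, matching the figures in the paragraph above.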
Moreover, as a result of the analytic integration of the singularities intro-
duced by the rating process, the risk produced by AAD is typically less noisy
than the one produced by finite differences. This is clearly illustrated in
Table 11.1 showing the variance reduction on the sensitivities with respect
to the thresholds Q(Ti , r) for a simple test case. Here we have considered the
calculation of a call option of the form (FT (Ti ) − C(R(Ti )))+ with a strike
C(R(Ti )) linearly dependent on the rating, and Ti = 1. The variance reduction
displayed in the table can be thought of as a further speedup factor because
4 The specification of the parameters used for this example is available upon request.
\[ Q(t, T; \lambda^i) = \exp\left[ -\int_t^T du\, \lambda_u^i \right]. \tag{11.36} \]
where L(t, Tj ; λ, θ) and A(t, Tj ; λ, θ) are, respectively, the expected loss and
credit risky annuity for a Tj maturity CDS contract starting at time t.5 These
are defined as
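\[ L(t, T; \lambda, \theta) = \int_t^T du\, Z(t, u; \theta)\, (1 - R_u) \left( -\frac{dQ(t, u; \lambda)}{du} \right), \tag{11.40} \]
\[ A(t, T; \lambda, \theta) = \int_t^T du\, Z(t, u; \theta)\, Q(t, u; \lambda). \tag{11.41} \]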
Here, Z(t, u; θ) is the discount factor from time t to time u, Q(t, u; λ) (resp.
−dQ(t, u; λ)) is the probability that the reference entity survives up to (resp.
defaults in an infinitesimal interval around) time u, and Ru is the expected
percentage recovery upon default at time u. The latter is generally expressed
as a piecewise constant function with the same discretization of the hazard
rate function, say R = (R1 , . . . , RM ).
The calibration Equations 11.38 and 11.39 are based on the definition of
par spread si as break-even coupons c making the value of CDS
worth zero.6 Since both the expected loss and the risky annuity at time Ti
depend on hazard rate points λj with j ≤ i, the calibration equations can be
solved iteratively starting from i = 1, by keeping fixed the hazard rate knot
points λj with j < i, and solving for λi .
Through the calibration process, the system of M Equations 11.38 defines
implicitly the function λ = λ(θ), linking the hazard rate to the credit spreads,
the term structure of expected recovery and the discount factors. These are
in turn a function of the market instruments that are used for the calibration
of the discount curve.
characterized by a standard coupon and are generally quoted in terms of upfronts or quote
spreads. Both mark types can be mapped to a dollar value of a CDS contract by means of a
market standard parameterization [19], and hazard rates can be equivalently bootstrapped
from these marks using Equation 11.42. Credit (par) spreads remain nonetheless commonly
used in the market practice as risk factors for credit derivatives. The analysis of this paper
can be easily formulated in terms of quote spreads or upfronts.
where the first term captures the explicit dependence on the model param-
eters θ through the pricing step and the second term captures the implicit
dependence via the calibration step.
The computation of the calibration component of the price sensitivities
with standard bump and reval approaches is particularly onerous because it
involves repeating the calibration step for each perturbation. Especially for
portfolios of simple credit derivatives, like CDS, this can easily represent the
bulk of the computational burden. In addition, finite size perturbations of
credit spreads, recovery, or interest rates often correspond to inputs that do
not admit an arbitrage-free representation in terms of a nonnegative hazard
rate curve, thus making the robust and stable computation of sensitivities
challenging.
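Pricing step:
\[ \bar{\theta}_k = \bar{V} \frac{\partial V}{\partial \theta_k}, \qquad \bar{\lambda}_j = \bar{V} \frac{\partial V}{\partial \lambda_j}, \tag{11.44} \]
Calibration step:
\[ \bar{\theta}_k = \bar{\theta}_k + \sum_{j=1}^{M} \bar{\lambda}_j \frac{\partial \lambda_j}{\partial \theta_k}. \tag{11.45} \]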
This relation allows the computation of the sensitivities of λ(θ), locally defined
in an implicit fashion by Equations 11.38 and 11.39, in terms of the sensitivities
of the function 11.39.
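In the specific case when $\theta_k \neq s_j$ for $j = 1, \ldots, M$, that is, when
considering sensitivities with respect to market risk factors other than the credit
spreads, Equation 11.46 can be expressed in turn as
\[ \frac{\partial \lambda_i}{\partial \theta_k} = -\left[ \left( \frac{\partial s(\lambda, \theta)}{\partial \lambda} \right)^{-1} \frac{\partial s(\lambda, \theta)}{\partial \theta} \right]_{ik}. \tag{11.47} \]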
with respect to the model parameters, $\partial s_j(\lambda, \theta)/\partial \theta_k$, and the hazard rates,
$\partial s_k(\lambda, \theta)/\partial \lambda_i$, and (ii) solving a linear system, for example, by Gaussian
elimination. This method is significantly more stable and efficient than the naïve
approach of calculating the derivatives of the implicit functions $\theta \to \lambda(\theta)$
by differentiating directly the calibration step either by bump and reval or
by applying AAD to the calibration step. This is because s(λ, θ) in Equa-
tion 11.48 are explicit functions of the hazard rate and the model parameters
that are easy to compute and differentiate.
Combining the implicit function theorem with adjoint methods results in
extremely efficient risk computations, as we will demonstrate later.
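The use of the implicit function theorem can be illustrated on a toy bootstrap with two hazard rate knots. Everything in this sketch is hypothetical: the spread functions in `spreads` are invented so that the calibration has a closed form, and `theta` stands for a generic model parameter. Only the structure, explicit derivatives of $s(\lambda, \theta)$ followed by a (here lower triangular) linear solve, mirrors the text.

```python
def spreads(lam, theta):
    """Toy par-spread functions s_j(lambda, theta), lower triangular in lambda."""
    return [theta * lam[0],
            0.5 * theta * (lam[0] + lam[1]) + lam[0] * lam[1]]

def calibrate(market, theta):
    """Bootstrap: solve s_1 = market_1 for lam_0, then s_2 = market_2 for lam_1."""
    lam0 = market[0] / theta
    lam1 = (market[1] - 0.5 * theta * lam0) / (0.5 * theta + lam0)
    return [lam0, lam1]

def hazard_sensitivities(lam, theta):
    """d lambda / d theta = -(ds/dlambda)^{-1} ds/dtheta, without recalibrating."""
    ds_dlam = [[theta, 0.0],
               [0.5 * theta + lam[1], 0.5 * theta + lam[0]]]
    ds_dtheta = [lam[0], 0.5 * (lam[0] + lam[1])]
    # forward substitution on the lower triangular system
    x0 = ds_dtheta[0] / ds_dlam[0][0]
    x1 = (ds_dtheta[1] - ds_dlam[1][0] * x0) / ds_dlam[1][1]
    return [-x0, -x1]

market = [0.010, 0.015]     # hypothetical market spreads
theta = 0.6
lam = calibrate(market, theta)
ift_sens = hazard_sensitivities(lam, theta)

# cross-check by repeating the calibration for bumped theta
eps = 1e-6
up = calibrate(market, theta + eps)
dn = calibrate(market, theta - eps)
bump_sens = [(u - d) / (2.0 * eps) for u, d in zip(up, dn)]
```

The bumped cross-check has to rerun the calibration twice per parameter, while the implicit-function route reuses the calibrated hazard rates and only differentiates the explicit spread functions.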
sj = sj (λ, θ)
defined by Equation 11.48, namely, using the definitions 11.1 and 11.6,
where the scalar s̄j is the adjoint of the jth par spread with j = 1, . . . , M . By
applying the rules of AAD, this can be implemented as
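\[ \bar{A}_j = -\bar{s}_j \frac{L(t, T_j; \lambda, \theta)}{A(t, T_j; \lambda, \theta)^2} \]
\[ \bar{L}_j = \bar{s}_j \frac{1}{A(t, T_j; \lambda, \theta)} \]
\[ (\bar{\lambda}, \bar{\theta}) \mathrel{+}= \bar{A}(t, T_j; \lambda, \theta, \bar{A}_j) \]
\[ (\bar{\lambda}, \bar{\theta}) \mathrel{+}= \bar{L}(t, T_j; \lambda, \theta, \bar{L}_j), \]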
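1. Execute $(\bar{\lambda}, \bar{\theta}) = \bar{s}_j(\lambda, \theta, \bar{s}_j)$ with $\bar{s}_j = 1$ for $j = 1, \ldots, M$. This gives
the derivatives:
\[ \bar{\lambda}_{ij} = \frac{\partial s_j}{\partial \lambda_i}, \qquad \bar{\theta}_{kj} = \frac{\partial s_j}{\partial \theta_k}, \]
for $i = 1, \ldots, M$ and $k = 1, \ldots, N_\theta$.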
[Figure plot: ratio of CPU time versus number of risks (2 to 36). Legend: AAD (calibration), AAD (total), Bumping, Calibration, Pricing.]
3. Return:
\[ \bar{\theta}_k = \sum_{i=1}^{M} \bar{\lambda}_i \frac{\partial \lambda_i}{\partial \theta_k}, \]
for $k = 1, \ldots, M$.
The adjoint of the calibration algorithm described earlier is extremely effi-
cient. Indeed, as illustrated in Figure 11.5, the sensitivities of the hazard rate
with respect to the credit spreads, and interest rate instruments can be com-
puted in ∼25% less time than performing a single bootstrap.
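The adjoint of the par spread, written as the ratio of expected loss to risky annuity, can be sketched in the simplest possible setting. The flat hazard rate, flat interest rate, constant recovery, and all function names below are assumptions chosen so that Equations 11.40 and 11.41 have closed forms; they are not the piecewise-constant setup of the chapter.

```python
import math

def annuity(lam, rate, maturity):
    """Risky annuity (Equation 11.41) for flat hazard rate and flat interest rate."""
    k = rate + lam
    return (1.0 - math.exp(-k * maturity)) / k

def annuity_b(lam, rate, maturity, a_bar):
    """Adjoint of `annuity` with respect to the hazard rate."""
    k = rate + lam
    e = math.exp(-k * maturity)
    da_dk = (maturity * e * k - (1.0 - e)) / (k * k)
    return a_bar * da_dk

def expected_loss(lam, rate, maturity, recovery):
    """Expected loss (Equation 11.40) in the flat case: (1 - R) * lam * A."""
    return (1.0 - recovery) * lam * annuity(lam, rate, maturity)

def expected_loss_b(lam, rate, maturity, recovery, l_bar):
    """Adjoint of `expected_loss`: product rule, reusing the annuity adjoint."""
    a = annuity(lam, rate, maturity)
    lam_bar = l_bar * (1.0 - recovery) * a
    lam_bar += annuity_b(lam, rate, maturity, l_bar * (1.0 - recovery) * lam)
    return lam_bar

def par_spread(lam, rate, maturity, recovery):
    return expected_loss(lam, rate, maturity, recovery) / annuity(lam, rate, maturity)

def par_spread_b(lam, rate, maturity, recovery, s_bar=1.0):
    """Adjoint of s = L / A, following the four instructions of this section."""
    a = annuity(lam, rate, maturity)
    loss = expected_loss(lam, rate, maturity, recovery)
    a_bar = -s_bar * loss / (a * a)
    l_bar = s_bar / a
    lam_bar = annuity_b(lam, rate, maturity, a_bar)
    lam_bar += expected_loss_b(lam, rate, maturity, recovery, l_bar)
    return lam_bar

ds_dlam = par_spread_b(0.02, 0.03, 5.0, 0.4)
```

In this flat setting the par spread reduces to $(1 - R)\lambda$, so the adjoint should return $1 - R$, which offers a convenient check of the sign conventions.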
11.5.6 Results
Credit default swaps
As a first example, we consider the calculation of price sensitivities for a
(portfolio of) CDS. In this case, the adjoint of the pricing step simply reads,
from Equation 11.42,
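\[ \bar{L} = \bar{V} \]
\[ \bar{A} = -\bar{V} c \]
\[ (\bar{\lambda}, \bar{\theta}) = \bar{A}(t, T; \lambda, \theta, \bar{A}) \]
\[ (\bar{\lambda}, \bar{\theta}) \mathrel{+}= \bar{L}(t, T; \lambda, \theta, \bar{L}), \]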
where the risky annuity and expected loss (and their adjoint counterparts) are
those of the CDS in the portfolio. In this case, as illustrated in Figure 11.5,
the cost of the pricing step is a small portion (∼10%) of the overall cost
of computing the sensitivities which is instead dominated by the cost of the
calibration step. As a result, all the sensitivities can be obtained by means of
AAD for ∼15% less than the cost of performing a single valuation. In typical
applications, where computing sensitivities with respect to 18 spread tenors
and interest rate instruments is commonplace, this results in a reduction of
the computational cost by a factor of 50 or more.
where ζ = 1 for a payer and ζ = −1 for a receiver option, ViCDS (TE , TM ) is the
value at time TE of the underlying credit default index swap (long protection)
with standard coupon rate and maturity TM , PE is the exercise fee, and L(TE )
is the value at time TE of the loss given default associated to the names that
have defaulted before expiry,
\[ L(T_E) = \sum_{i=1}^{N} I(\tau^i < T_E)\, N^i \left( 1 - R^i_{\tau^i} \right), \]
where N is the number of names in the index, I is the indicator function, and
N i , τ i , and Rui are the notional, default time, and recovery function of the
ith name in the portfolio.7
According to the de facto market standard model [22], the value at time
TE of the random quantity given by the sum of the loss amount, L(TE ), and
the value of the credit default index swap, ViCDS (TE , TM ), are modeled in
terms of a single state variable, the default adjusted forward spread sTE , as
\[ V_{iCDS}(T_E, T_M) + L(T_E) = N_{tot}\, A_{isda}(s_{T_E}, T_E, T_M)\, (s_{T_E} - c), \tag{11.51} \]
where $c$ is the fixed rate in the underlying credit default index swap and
$N_{tot} = \sum_{i=1}^{N} N^i$ is the total notional of the index. Here $A_{isda}(s, t, T)$ is the
standardized risky annuity of Equation 11.41 calculated assuming a flat term
structure of the credit spread s, according to the standard ISDA conventions
[19]. In the simplest setting, the default adjusted forward spread is assumed
lognormally distributed,
\[ s_{T_E} = F_{T_E} \exp\left( -\frac{1}{2} \sigma_{T_E}^2 (T_E - t) + \sigma_{T_E} \sqrt{T_E - t}\, \tilde{Z} \right), \tag{11.52} \]
where σTE is the volatility of the default adjusted forward spread, Z̃ is a
standard normal random variable and the forward FTE , can be determined by
7 Here for simplicity of exposition, we assume that no names in the index have defaulted
at valuation time.
\[ G_F(F_{T_E}, \lambda, \theta) \equiv V^{adj}_{iCDS}(T_E, T_M; \lambda, \theta) - V^{isda}_{iCDS}(T_E, T_M; F_{T_E}, \theta) = 0. \tag{11.53} \]
can be computed according to the standard hazard rate model using the time
t default and recovery curves of the index constituents:
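\[ V^{adj}_{iCDS}(T_E, T_M; \lambda, \theta) = \tilde{L}(t, T_E; \lambda, \theta) + Z(t, T_E; \theta) \times \sum_{i=1}^{N} N^i \left[ L(T_E, T_M; \lambda^i, \theta) - c\, A(T_E, T_M; \lambda^i, \theta) \right], \tag{11.54} \]
with
\[ \tilde{L}(t, T_E; \lambda, \theta) = Z(t, T_E; \theta) \sum_{i=1}^{N} N^i\, \tilde{L}^i(t, T_E; \lambda^i, \theta), \]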
The calibration Equation 11.53 defines implicitly the loss adjusted forward
spread, FTE , as a function of its volatility σTE , the hazard rates and expected
recoveries of the index constituents, and the risk parameters of the discount
curve, in short
For a given set of input parameters θ and the calibrated hazard rates
for the index constituents λ, the pricing algorithm consists of the following
steps:
Step 1 Calibrate the forward by solving the calibration Equation 11.53. This
involves computing Equation 11.54 using the hazard rate model and
Equation 11.55 by numerical integration for each trial value of FTE .
Step 2 Compute the option value 11.50 using Equation 11.51, for example,
using Gaussian quadrature
\[ V_t = Z(t, T_E) \sum_{k=1}^{L} w_k\, \phi(x_k; F_{T_E}, \theta)\, P_k, \tag{11.57} \]
with
\[ \bar{G}_F = \bar{V}^{adj}_{iCDS}(T_E, T_M; \lambda, \theta, \bar{G}_F) - \bar{V}^{isda}_{iCDS}(T_E, T_M; F_{T_E}, \theta, \bar{G}_F). \tag{11.60} \]
Here
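\[ (\bar{\lambda}, \bar{\theta}) = \bar{V}^{adj}_{Idx}(T_E, T_M; \lambda, \theta, \bar{V}^{adj}_{iCDS}) \]
and
\[ (\bar{F}_{T_E}, \bar{\theta}) = \bar{V}^{isda}_{iCDS}(T_E, T_M; F_{T_E}, \theta, \bar{V}^{isda}_{iCDS}) \]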
are the adjoints of Equations 11.54 and 11.55, respectively. For ḠF = 1,
Equation 11.60 gives F̄TE = ∂GF /∂FTE , λ̄ij = ∂GF /∂λij , and θ̄k = ∂GF /∂θk ,
for i = 1, . . . , N , j = 1, . . . , M , k = 1, . . . , Nθ . Applying the implicit function
theorem to the function GF , one finally obtains the outputs of the function
in Equation 11.59:
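\[ \bar{\lambda}_{ij} = \bar{F}_{T_E} \frac{\partial F_{T_E}}{\partial \lambda_{ij}} = -\bar{F}_{T_E} \left( \frac{\partial G_F}{\partial F_{T_E}} \right)^{-1} \frac{\partial G_F}{\partial \lambda_{ij}}, \]
\[ \bar{\theta}_k = \bar{F}_{T_E} \frac{\partial F_{T_E}}{\partial \theta_k} = -\bar{F}_{T_E} \left( \frac{\partial G_F}{\partial F_{T_E}} \right)^{-1} \frac{\partial G_F}{\partial \theta_k}. \]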
Step $\bar{2}$ Set:
\[ \bar{Z} = \bar{V} \frac{V_t}{Z(t, T_E; \theta)} \]
and
\[ \bar{\theta} = \bar{Z}(t, T_E; \theta, \bar{Z}), \]
where $\bar{Z}(t, T; \theta, \bar{Z})$ is the adjoint of the discount function. Then compute
the adjoint of the Gaussian quadrature Equations 11.57 and 11.58, namely set
$\bar{F}_{T_E} = 0$, and
\[ \bar{\phi}_k = \bar{V} Z(t, T_E; \theta)\, w_k P_k, \]
\[ (\bar{F}_{T_E}, \bar{\theta}) \mathrel{+}= \bar{\phi}(x_k; F_{T_E}, \theta, \bar{\phi}_k), \]
for $k = 1, \ldots, L$, where $\bar{\phi}(x_k; F_{T_E}, \theta, \bar{\phi}_k)$ is the adjoint of the probability
density function. Note that due to the linearity of the adjoint function with
respect to the adjoint input, these instructions can be re-expressed in terms of a
numerical integration of the form
\[ (\bar{F}_{T_E}, \bar{\theta}) = Z(t, T_E; \theta) \sum_{k=1}^{L} w_k\, \bar{\phi}(x_k; F_{T_E}, \theta, \bar{V})\, P_k, \]
[Figure 11.6 plot: ratio of CPU time versus number of index constituents (10 to 125). Series: AAD and Bumping.]
FIGURE 11.6: Cost of computing the sensitivities with respect to the volatil-
ity, the constituents’ credit spreads and interest rate instruments—relative to
the cost of performing a single valuation—as a function of the number of index
constituents.
11.6 Conclusion
In conclusion, we have shown how AAD is extremely beneficial for the risk
management of financial derivatives by discussing three examples: (i) interest
rate products; (ii) counterparty credit risk management; and (iii) flow credit
products. These examples illustrate how AAD is effective in speeding up, by
several orders of magnitude, the computation of price sensitivities both in the
context of MC applications and for applications involving faster numerical
methods. In particular, we have shown how, by combining adjoint ideas with
the implicit function theorem, one can avoid repeating the calibration step
of a financial model multiple times. Especially for flow products, this step
often represents the bottleneck in the computation of risk. A recent
publication [23] illustrates the application of these ideas to the calculation of
risk for partial differential equation (PDE) applications.
These examples illustrate how AAD allows one to perform in minutes risk
runs that would otherwise take several hours or could not even be performed
overnight without large parallel computers. AAD therefore makes possible
real-time risk management on an industrial scale without onerous investments
in calculation infrastructure, allowing investment firms to hedge their positions
more effectively, actively manage their capital allocation, reduce their
infrastructure costs, and ultimately attract more business.
The opinions and views expressed in this chapter are solely those of the
authors and do not necessarily represent those of Credit Suisse.
References
1. Giles, M. and Glasserman, P. Smoking adjoints: Fast Monte Carlo greeks. Risk,
19:88–92, 2006.
8. Brace, A., Gatarek, D., and Musiela, M. The market model of interest rate
dynamics. Mathematical Finance, 7:127–155, 1997.
10. Denson, N. and Joshi, M. Fast and accurate greeks for the Libor Market Model.
Journal of Computational Finance, 14:115–125, 2011.
11. Leclerc, Q. M. and Schneider, I. Fast Monte Carlo Bermudan greeks. Risk, 22:84–
88, 2009.
12. Capriotti, L., Peacock, M., and Lee, J. Real time counterparty credit risk man-
agement in Monte Carlo. Risk, 24:86–90, 2011.
13. Brigo, D. and Capponi, A. Bilateral counterparty risk with application to CDSs.
Risk, 22:85–90, 2010.
14. Jarrow, R., Lando, D., and Turnbull, S. A Markov model for the term structure
of credit risk spreads. Review of Financial Studies, 10:481–523, 1997.
16. Joshi, M. and Kainth, D. Rapid computation of prices and deltas of nth to
default swaps in the Li model. Quantitative Finance, 4:266–275, 2004.
18. Capriotti, L. and Lee, J. Adjoint credit risk management. Risk, 27:90–96, 2014.
19. ISDA. ISDA CDS standard model. Lehman Brothers Quantitative Credit
Research, 2003.
23. Capriotti, L., Jiang, Y., and Macrina, A. Real-time risk management: An AAD–
PDE approach. International Journal of Financial Engineering, 2:1550039, 2015.
Chapter 12
Tackling Reinsurance Contract
Optimization by Means of
Evolutionary Algorithms and HPC
CONTENTS
12.1 Introduction
12.2 Modeling the RCO Problem
    12.2.1 Reinsurance costs
    12.2.2 Reinsurance recoveries
    12.2.3 The risk value and optimization problem
12.3 Evolutionary Algorithms
    12.3.1 Population-based incremental learning (PBIL)
    12.3.2 Differential evolution (DE)
12.4 Case Study
    12.4.1 Metrics
    12.4.2 Results
        Parallel version
References
12.1 Introduction
Risk hedging strategies are at the heart of prudent risk management. Indi-
viduals often hedge risks to their property, particularly from infrequent but
expensive events such as fires, floods, and robberies, by entering into risk
transfer contracts with insurance companies. Insurance companies collect pre-
miums from those individuals with the expectation that at the end of the year
they will have taken in more money than they have had to pay out in losses
and overhead, and therefore remain profitable or at least solvent. Perhaps
not surprisingly, insurance companies themselves try to hedge their risks, par-
ticularly from the potentially enormous losses often associated with natural
catastrophes such as earthquakes, hurricanes, and floods. Much of this hedging
is facilitated by the global “property cat” reinsurance market [1], where reinsurance
companies insure primary insurance companies against the massive
claims that can occur due to natural catastrophes. Figure 12.1 illustrates how
this flow works.
[Figure 12.1: consumer, primary insurer, and reinsurers, linked by premium/risk and claims flows.]
Analytics in the reinsurance market is becoming increasingly complex for
at least three reasons. First, factors like climate change are skewing the data
in ways that are not fully understood, making experience less useful for deci-
sion making. Second, the global distribution of economic activity is changing
rapidly with key supply chains now having significant presence in parts of
the world where catastrophic risk is not as well understood. For example, few
in 2011 understood that a Thailand flood event could cost US$47 billion in
property losses and cause a global shortage of hard disk drives that lasted
throughout 2012. Lastly, there is a tendency for risk transfer contracts to
become ever more complex, in large part by increasing the number of subcon-
tracts (called layers) that make up a contract. This in turn makes it increas-
ingly important to have good computational tools that can help underwrit-
ers understand the interaction between layers and to decide on placement
percentages—placement percentages involve choosing layers to buy and how
large a share or percentage of them to buy in order to maximize the risk
hedging and the expected return.
From the perspective of an insurance company, the problem is known as
Reinsurance Contract Optimization (RCO). In this problem, we can iden-
tify a reinsurance contract consisting of a fixed number of layers and a set
of expected loss distributions (one per layer) as produced by a Catastrophe
Model [2], plus a model of current costs in the global reinsurance market.
The main difficulty in this problem is to identify the optimal combinations of
placements (percent shares of subcontracts).
In order to solve the RCO problem, an enumeration method can be used;
however, this approach presents two main problems: (i) it has to be discretized,
demanding some changes in numerical algorithms and (ii) it is only applica-
ble in small problem instances ranging from two to four layers, whereas real
instances of the RCO problem can have seven or more layers. For instance, a
seven-layered problem can take several weeks to be solved with a 5% level of
discretization on the search space using the enumeration method as presented
[Figure: Layer 1 and Layer 2 under Solution 1 and Solution 2.]
where p is the monetary value of the premium, μ is the rate on line, π is the
placement, d is the deductible, and l is the limit.
For contracts with multiple layers, Equation 12.1 can be generalized to
Equation 12.2,
$$ p = \mu^{T} L \pi \tag{12.2} $$
where ci is the value of the claim for ith instance of X. Equation 12.4 can
then be extended to contracts with multiple layers as follows:
$$ c_i = \sum_{j=1}^{n} \max\{0, \min\{l_j, x_i - d_j\}\}\, \pi_j \tag{12.5} $$
where lj , dj , and πj are the limit, deductible, and placement of the jth layer,
respectively. In addition to this, many contracts allow for multiple claims in
any given contractual year. The yearly contractual loss is then, assuming no
financial terms that impose a maximum amount claimable, simply the sum of
all individual claims in a given contractual year
$$ y_j = \sum_{i=1}^{n} c_{ij} \tag{12.6} $$
where yj is the annual amount claimed for the jth layer. The annual return
for reinsurance contract is then defined as
$$
\begin{aligned}
r &= y^{T}\pi - p \\
  &= y^{T}\pi - \mu^{T} L \pi \\
  &= (y^{T} - \mu^{T} L)\,\pi
\end{aligned}
\tag{12.7}
$$
$$ R = (Y - ML)\,\pi \tag{12.8} $$
cost of placing risk in the marketplace. Since the same year is being simulated,
each row in matrix M is the same. This formulation leads to the following
optimization problem:
$$
\begin{aligned}
&\text{maximize} && \mathrm{VaR}_{\alpha}(R(\pi)) \\
&\text{s.t.} && E(R(\pi)) = a
\end{aligned}
\tag{12.9}
$$
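The quantities entering Equations 12.5 through 12.9 can be sketched in a few lines of Python. The example below is illustrative only: the function names, the tiny loss matrix, and the empirical-quantile convention used for VaR are assumptions and are not taken from the chapter.

```python
def layer_claim(x, limit, deductible):
    """Claim on a single layer for a loss x: max{0, min{l, x - d}}."""
    return max(0.0, min(limit, x - deductible))

def annual_returns(Y, mu, limits, pi):
    """Row-wise R = (Y - M L) pi, where every row of M equals mu and L = diag(limits)."""
    premium = [m * l for m, l in zip(mu, limits)]       # mu_j * l_j per layer
    return [sum((y - c) * p for y, c, p in zip(row, premium, pi)) for row in Y]

def var_alpha(returns, alpha):
    """Empirical alpha-quantile of the returns (one possible VaR convention)."""
    ordered = sorted(returns)
    index = min(len(ordered) - 1, int(alpha * len(ordered)))
    return ordered[index]

# Hypothetical data: 3 simulated years, 2 layers.
Y = [[0.0, 0.0], [10.0, 5.0], [20.0, 0.0]]   # annual claims per layer
mu = [0.1, 0.2]                              # rates on line
limits = [20.0, 10.0]
pi = [0.5, 1.0]                              # placements

R = annual_returns(Y, mu, limits, pi)
expected_return = sum(R) / len(R)
print(R, expected_return, var_alpha(R, 0.05))
```

An optimizer would call `annual_returns` once per candidate placement vector `pi`, which is why the evaluation cost dominates the search.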
Given that the expected return a is not specified, Equation 12.9 can be
rewritten as a Pareto Frontier problem such that
[Figure: flowchart of an evolutionary algorithm: initial population, evaluation, stop criteria (yes: final population; no: genetic operators).]
then updated based on the best members of a population. Unlike other EAs,
which transform the current population into new populations, a new popu-
lation is generated at random using an updated probability vector on each
generation. Baluja describes his algorithm as a “combination of evolutionary
optimization and hill-climbing” [8].
Since Baluja’s work, extensions to the algorithm have been proposed for
continuous and base-n represented search spaces [9–11]. The extension to con-
tinuous search spaces using histograms (PBILH ) and real-code (RPBIL) sug-
gests splitting the search space into intervals, each with their own probability
[9,11]. For multivariate cases, the probability vector is replaced by
a probability matrix, such that each row or column of the matrix represents
a probability vector for a given independent variable. While those methods
support continuous spaces, a similar idea was used in Reference 3 to extend
PBIL to a discrete approach in order to deal with the reinsurance problem
constraints; that is, we substitute the intervals for equidistant increments
between the lower and upper bounds of the search space.
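A minimal sketch of the discrete idea follows: one probability vector per variable (a row of the probability matrix), with one entry per equidistant level. The update rule shown is the classic Baluja-style shift toward the best individual, used here as an assumption because the chapter's own update equation is not reproduced above; the toy cost function and all names are hypothetical.

```python
import random

def make_prob_matrix(n_vars, n_levels):
    """One uniform probability vector (row) per variable."""
    return [[1.0 / n_levels] * n_levels for _ in range(n_vars)]

def sample_individual(prob, levels, rng):
    """Draw one level per variable according to its probability row."""
    return [rng.choices(levels, weights=row, k=1)[0] for row in prob]

def update(prob, best, levels, lr):
    """Shift each row toward the level used by the best individual (assumed rule)."""
    for row, value in zip(prob, best):
        chosen = levels.index(value)
        for k in range(len(row)):
            target = 1.0 if k == chosen else 0.0
            row[k] = (1.0 - lr) * row[k] + lr * target

def toy_cost(ind):
    """Hypothetical objective: squared distance from 0.5 in every variable."""
    return sum((x - 0.5) ** 2 for x in ind)

levels = [round(k * 0.05, 2) for k in range(21)]   # 5% increments on [0, 1]
prob = make_prob_matrix(3, len(levels))
rng = random.Random(0)
for _ in range(50):
    population = [sample_individual(prob, levels, rng) for _ in range(20)]
    update(prob, min(population, key=toy_cost), levels, lr=0.1)
print([levels[row.index(max(row))] for row in prob])
```

Because each update is a convex combination of two probability vectors, every row keeps summing to one without renormalization.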
where LFijk is the ith learning factor, as described in Reference 10, for the
kth best result for the jth variable.
The main drawback of the discrete PBIL is that it requires all of the objective
functions to be transformed into a single function. In order to address this
issue, and compute a better Pareto frontier, a true multiobjective optimization
approach, called MOPBIL, is presented in Algorithm 12.2.
FIGURE 12.4: How the MOPBIL creates points and updates the probability
matrix.
randomly chosen. When Pidx3 is the best individual in the population, the
strategy is called DE/Best/1.
The process of computing v is called mutation. Afterwards, a new individ-
ual is created in a similar way as the discrete cross-over of genetic algorithms,
that is, for each dimension d, a gene is chosen from the vector of differences v
with a probability of CR, or from the target individual i with a probability of
1 − CR. Finally, if the fitness of the new individual is better than the fitness
of the target one, then the new individual replaces the individual i.
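The mutation, crossover, and selection steps just described can be sketched as a small runnable Python routine. This is an illustrative DE/Rand/1 variant under assumptions that the text does not specify: the target individual is excluded when the three indices are drawn, trial vectors are clamped to the bounds, and lower fitness is better. The function name and default parameters are hypothetical.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, none equal to the target (assumption).
            i1, i2, i3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            # Mutation: base vector plus scaled difference.
            v = [pop[i3][d] + F * (pop[i1][d] - pop[i2][d]) for d in range(dim)]
            # Crossover: take each gene from v with probability CR.
            trial = [v[d] if rng.random() < CR else pop[i][d] for d in range(dim)]
            # Keep the trial inside the search box (assumption).
            trial = [min(max(x, lo), hi) for x, (lo, hi) in zip(trial, bounds)]
            trial_fit = f(trial)
            # Selection: the trial replaces the target only if it is better.
            if trial_fit < fit[i]:
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(x):
    """Hypothetical test objective with minimum 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

Replacing `pop[i3]` by the current best individual in the mutation line would give the DE/Best/1 strategy mentioned above.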
The canonical DE has the same limitation as single-objective PBIL, that
is, it has to aggregate all functions into a single evaluation function. Thus, a
multiobjective version, called DEMO, is shown in Algorithm 12.4; it is
similar to the canonical version of DE with the DE/Rand/1 strategy [13].
The differences start in line 16, when the new population
is selected for the next iteration. If a new individual (indiv) dominates
the target one (Pop_i), then the new one is added to the new population; if the
target individual dominates the new one, then the target individual is added to
the new population; otherwise, both individuals go to the new population. The
dominance process builds a new population whose size ranges from pop_size to
2 × pop_size. Finally, if the size of the new population is larger than pop_size,
then the individuals that go to the next iteration are selected by crowding
distance (select_cdistance function).
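The pairwise rule that builds the new population can be sketched as follows. The sketch assumes that every objective is minimized and leaves out the crowding-distance truncation; the function names are hypothetical and are not those of Algorithm 12.4.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def demo_select(target, target_fit, trial, trial_fit):
    """Return the (individual, fitness) pairs that survive the comparison."""
    if dominates(trial_fit, target_fit):
        return [(trial, trial_fit)]
    if dominates(target_fit, trial_fit):
        return [(target, target_fit)]
    # Neither dominates: both are kept, so the population may grow.
    return [(target, target_fit), (trial, trial_fit)]

survivors = demo_select([0.2, 0.8], (1.0, 3.0), [0.3, 0.7], (2.0, 2.0))
print(len(survivors))
```

Applying `demo_select` to each of the pop_size targets yields between pop_size and 2 × pop_size survivors, which is the range stated above.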
Algorithm 12.3: DE
Pop ← generate_pop(n, d)
fit ← evaluate(Pop)
while (Stop Criteria is FALSE) do
    for i = 1 to pop_size do
        idx ← select_indiv(3)
        v ← Pop_idx3 + F ∗ (Pop_idx1 − Pop_idx2)
        for j = 1 to dimension do
            n_j ← rand()
            if (n_j < CR) then
                pop'_j ← v_j
            else
                pop'_j ← Pop_ij
            end
        end
        fit'_i ← evaluate(pop')
        if fit'_i < fit_i then
            Pop_i ← pop'
            fit_i ← fit'_i
        end
    end
end
The main drawback of the original DEMO is that it does not maintain an archive,
thereby losing good solutions when the number of nondominated points
exceeds the size of the population. Taking this into account, we changed
the original algorithm in two ways. First, we introduce an archive in the
algorithm (after line 31, i.e., it is done on each iteration) in order to not lose
nondominated solutions from one iteration to another due to the crowding dis-
tance algorithm in line 30. Doing so, we are able to use different mutation oper-
ators such as those presented in Equations 12.12 through 12.15. These strate-
gies are called DE/ND/Rand/1, DE/ND/Rand/RF/1, DE/Arch/Rand/1, and
DE/Arch/Rand/RF/1, respectively. Equation 12.12 uses a random individual
from the set of nondominated ones. In order to do so, it is necessary to
compute the nondominated set between lines 3 and 4, that is, before starting
the loop which deals with the population. Equation 12.13 is similar to the
previous one; however, F is a random number between 0 and 1. Then, both
Equations 12.14 and 12.15 use the first individual chosen from the archive.
The difference between them is the use of F which is randomly chosen in
Equation 12.15.
            add fit and fit' into nf
        end
    end
    if (nrow(pop') == nrow(Pop)) then
        Pop ← pop'
    else
        [Pop, fit] ← select_cdistance(pop', nf)
    end
end
hypervolume by the union of all v_i. The final number of solutions after all
trials is shown as well.
$$ hv = \mathrm{volume}\left( \bigcup_{i=1}^{|Q|} v_i \right) \tag{12.16} $$
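For two objectives, the union volume in Equation 12.16 reduces to an area that can be computed with a single sweep. The sketch below assumes that both objectives are minimized and that each v_i is the rectangle between solution i and a user-chosen reference point; the reference point and the function name are assumptions, not details from the chapter.

```python
def hypervolume_2d(points, ref):
    """Area of the union of rectangles [f1, ref[0]] x [f2, ref[1]] over all points."""
    # Only points that improve on the reference point in both objectives count.
    candidates = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in candidates:
        if f2 < best_f2:                      # dominated points add no area
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(4.0, 4.0)))
```

Adding a dominated point such as (3.0, 3.0) to `front` leaves the result unchanged, which is why the metric rewards only genuine improvements of the frontier.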
12.4.2 Results
All tests were conducted using R version 3.2.1 on a Red Hat Linux 64-bit
operating system, on a machine with two Intel Xeon E5-2650 processors
running at 2.0 GHz with 8 cores, hyperthreading, and 256 GB
of memory. Runs of 250 and 500 iterations were considered, with a population size equal to 50. The
following parameters were used for MOPBIL:
• Population size = 50
• Slice size = 0.05
• Best population = 3
• [Link] = 0.1
• [Link] = 0.075
• [Link] = 0.02
• [Link] = 0.05
• Population size = 50
• Strategy = DE/Arch/Rand/RF/1
Table 12.1 shows metrics for 250, 500, and 1000 iterations (Figure 12.5).
As expected, results tend to improve as the number of iterations increases.
Moreover, DEMO is the better of the two algorithms in terms
of both number of solutions and hypervolume.
[Figure: three panels plotting Risk ($) against Expected return ($) for MOPBIL and DEMO.]
FIGURE 12.5: Final Pareto frontier for DEMO and MOPBIL using 7 layers,
250, 500, and 1000 iterations.
Figures 12.6 and 12.7 show the time and speedup reached on the Xeon
architecture when varying the thread count. Regardless of the number of layers, the
best efficiency is reached using two threads, with an efficiency of 96.7%
and 98.2%, respectively. The speedup is almost linear up to four
threads. The best speedup is reached using 32 threads: 9.38 and
8.33 for 7 and 15 layers, respectively; however, the use of 32 threads corresponds
to an efficiency of only 29.3% and 26% for 7 and 15 layers. Moreover, the best speedups
are obtained with 7 layers, saturating at approximately 16 threads.
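The efficiency figures quoted above follow from the usual definition, efficiency = speedup / number of threads. The short check below reproduces the 32-thread values from the reported speedups; the helper name is hypothetical.

```python
def efficiency(speedup, threads):
    """Parallel efficiency: achieved speedup divided by the thread count."""
    return speedup / threads

for layers, speedup in ((7, 9.38), (15, 8.33)):
    print(f"{layers} layers, 32 threads: {100 * efficiency(speedup, 32):.1f}%")
```

Read in the other direction, the two-thread efficiencies of 96.7% and 98.2% correspond to speedups of about 1.93 and 1.96.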
Figure 12.8 presents the Pareto frontier obtained by varying the thread
count for 1000 iterations and 7 layers, where we can observe that visually all
Pareto frontiers seem to be the same. Table 12.2 reports the average values
of the metrics. Even though the number of solutions decreases as we increase the
number of threads, the final number of solutions is not affected. Moreover,
the hypervolume is quite stable across thread counts; therefore, the faster the
execution the better. In fact, the small numbers in Table 12.3, which represent
the coverage, mean that the Pareto frontiers are very similar regardless of the
number of threads.
[Figure: execution time in seconds versus thread count (1T to 32T) for 7 and 15 layers.]
FIGURE 12.6: Time for 7 and 15 layers and 1000 iterations on Xeon.
[Figure: speedup versus thread count (2T to 32T) for 7 and 15 layers.]
FIGURE 12.7: Speedup for 7 and 15 layers and 1000 iterations on Xeon.
[Figure: Pareto frontiers for thread counts T1, T2, T4, T8, T16, and T32.]
FIGURE 12.8: Pareto frontier varying thread count for 1000 iterations and
7 layers.
Figure 12.9 shows the Pareto frontier obtained by varying the thread count
for 1000 iterations and 15 layers, where we can observe that, visually, the
difference between the Pareto frontiers obtained with different thread counts
is not meaningful. On the other hand, Table 12.4 shows how the number
of solutions decreases as we increase the number of threads; nonetheless,
the hypervolume indicates that this decrease is worthwhile up to eight threads,
because the hypervolume is larger. The time saved using eight threads is also
notable. Table 12.5 reinforces that using eight threads is an
attractive option, because its solutions dominate 52% and 70% of the solutions
from 16 and 32 threads, respectively.
[Figure: Pareto frontiers for thread counts T1, T2, T4, T8, T16, and T32.]
FIGURE 12.9: Pareto frontier varying thread count for 1000 iterations and
15 layers.
References
1. Cai, J., Tan, K. S., Weng, C., and Zhang, Y. Optimal reinsurance under VAR
and CTE risk measures. Insurance: Mathematics and Economics, 43(1):185–196,
2008.
3. Cortes, A. C., Rau-Chaplin, A., Wilson, D., Cook, I., and Gaiser-Porter, J.
Efficient optimization of reinsurance contracts using discretized PBIL. In The
Third International Conference on Data Analytics, pp. 18–24, Porto, Portugal,
2013.
4. Mistry, S., Gaiser-Porter, J., McSharry, P., and Armour, T. Parallel computation
of reinsurance models, 2012. (Unpublished)
5. Mitschele, A., Oesterreicher, I., Schlottmann, F., and Seese, D. Heuristic opti-
mization of reinsurance programs and implications for reinsurance buyers. In
Operations Research Proceedings, pp. 287–292, 2006.
6. Herrera, F., Lozano, M., and Verdegay, J. L. Tackling real-coded genetic algo-
rithms: Operators and tools for behavioural analysis. Artificial Intelligence
Review, 12(4):265–319, 1998.
10. Servais, M.P., de Jager, G., and Greene, J. R. Function optimisation using
multiple-base population based incremental learning. In The Eighth Annual
South African Workshop on Pattern Recognition, Rhodes University, Graham-
stown, South Africa, 1997.
11. Yuan, B. and Gallagher, M. Playing in continuous spaces: Some analysis and extension
of population-based incremental learning. In IEEE Congress on Evolutionary
Computation, IEEE, pp. 443–450, Canberra, Australia, 2003.
12. Storn, R. and Price, K. Differential evolution: A simple and efficient heuristic
for global optimization over continuous spaces. Journal of Global Optimization,
12(4):341–359, 1997.
13. Qin, A. K., Huang, V. L., and Suganthan, P. N. Differential evolution algorithm
with strategy adaptation for global numerical optimization. IEEE Transactions
on Evolutionary Computation, 13(2):398–417, 2009.
17. Tierney, L., Rossini, A. J., Li, N., and Sevcikova, H. Snow, [Link]
[Link]/web/packages/snow/[Link], 2017.
Chapter 13
Evaluating Blockchain
Implementation of Clearing
and Settlement at the IATA
Clearing House
CONTENTS
13.1 ICH (IATA Clearing House)’s Current Clearing Procedure and Potential for Improving Its Efficiency
    13.1.1 ICH’s clearance procedure
    13.1.2 Improving ICH’s clearance procedure with blockchain technology
13.2 Data Simulation Description and Model Assumptions
13.3 Mathematical Formulation of the Model
    13.3.1 Discrete-time optimal control model
    13.3.2 The variants of the objective function
        Reducing the transaction costs
        Improving the liquidity profile
    13.3.3 How to choose the values of the control variables?
        The control variable a ∈ [0, 1]
        The control variables u_i ∈ {0; 1}, i = 1, ..., N
    13.3.4 The final mathematical formulation of the model
13.4 Practical Implementation of the Model
    13.4.1 Results representation
        Before IATA Coin adoption
        After IATA Coin adoption
    13.4.2 The model results
        Example of one simulation
        Increasing the number of simulations
13.5 Summary
References
• IATA guarantees that all invoices will be submitted for payment and this
means that ICH’s clearing and settlement procedure reduces the credit
risk.
• About a 64% offset ratio (the relation between the volume of billings
and the amount of cash required to settle them). This means that from
[Figure: timelines over t, t + 7, and t + 21 showing invoices, the clearance cycle, ICH cash flow, and other cash flow.]
There are also disadvantages for both sides—for ICH and its members.
They are:
• Slow payment procedure (a huge delay between sending the invoices and
receiving the money)
and government, to name just a few. Similarly, blockchain has the potential
to be a positive transformative force within the airline industry.
A distributed ledger is essentially an asset database that is shared across
a network of multiple sites, geographies, or institutions. The assets can be
financial, legal, physical, or virtual in nature. Selected participants within the
network have copies of the ledger and any changes to the ledger are reflected
in all copies within minutes, or in some cases, seconds. The security and accu-
racy of the transactions stored in the ledger are ensured cryptographically
through the use of keys and signatures to control who can do what within the
shared ledger. Entries can also be updated by one, some, or all of the par-
ticipants according to rules agreed by the network. Distributed ledger tech-
nologies (DLTs) hold the potential to redefine a number of industries and
various aspects of society. From reducing the transaction costs experienced by
large companies to providing greater possibilities for distributed economies
and business models, they hold great disruptive potential. DLT can bring
multiple benefits to its users, including but not limited to:
• Accelerate cash flows (the payments will become significantly faster with
settlement and confirmation on a distributed ledger)
[Figure: timelines over t, t + 7, and t + 21 showing invoices, the clearance cycle, ICH cash flow, and other cash flow, with the annotations “Submitting invoices in IATA Coins”, “And exchanging IATA Coins for fiat currency”, and “Can provide liquidity when it’s necessary”.]
1. There are two main agents whose interests are not the same:
• ICH, which tries to increase the offset ratio and to accelerate the
payment procedure
• The company, which has its own view (and its own criteria of opti-
mality) in choosing the dates when invoices should be submitted
2. The company is acting rationally and trying to reduce transaction costs.
3. ICH’s current clearance cycle is still working, but all invoices can be submitted
in IATA Coins and consequently excluded from ICH’s clearing
procedure.
5. The company can choose the times when it wants to exchange IATA
Coins for fiat money (according to the liquidity profile of the company).
6. On the settlement day of the clearance cycle, the payments are effected
not in IATA Coins, but in fiat currency.
7. For calculating the offset ratio, only fiat currency payments will be taken
into account.
8. Due to the absence of historical data, the sample for analysis will be
generated according to the following distributional assumptions:
• The number of invoices per day for a company will be considered as a
random variable with a Poisson distribution (with an arbitrary value
of the parameter)
• The volume of invoices will be considered as a random variable with
a lognormal distribution (with arbitrary values of the parameters)
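The distributional assumptions in item 8 can be turned into a small sample generator. The sketch below uses only the Python standard library; the Poisson sampler is Knuth's multiplication method (adequate for small rates), and the parameter values and function names are arbitrary illustrations rather than values used in the chapter.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_invoices(days, lam, mu, sigma, seed=0):
    """Return (day, volume) pairs: Poisson counts per day, lognormal volumes."""
    rng = random.Random(seed)
    sample = []
    for day in range(days):
        for _ in range(poisson(rng, lam)):
            sample.append((day, rng.lognormvariate(mu, sigma)))
    return sample

sample = simulate_invoices(days=7, lam=3.0, mu=10.0, sigma=1.0, seed=1)
print(len(sample), sample[:2])
```

Fixing the seed makes a simulated clearance cycle reproducible, which helps when comparing results before and after IATA Coin adoption on the same data.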
So, the sample for constructing the model in the simplest case will contain
the following attributes:
• Date
where
Clt+21 is the after-clearing sum of money for company A and the time period
[t, t + 7];
Invi is the volume of the invoice i of company A;
$u_i$ is the control variable defined by Equation 13.2 as
$$
u_i :=
\begin{cases}
0, & \text{if the company decides to include } \mathrm{Inv}_i \text{ into clearing;} \\
1, & \text{if the company decides to submit } \mathrm{Inv}_i \text{ in IATA Coins and exclude it from clearing.}
\end{cases}
\tag{13.2}
$$
The company has the opportunity to exchange IATA Coins for fiat cur-
rency. Let the proportion of IATA Coins exchanged for fiat currency be
denoted by a. The sum of money for the period [t, t + 7] which will be paid
to (or received from) the company A in fiat currency can be calculated by
Equation 13.3:
$$ P_{t+7} = a \sum_{i=1}^{N} \mathrm{Inv}_i u_i + \sum_{j=1}^{M} R_j, \tag{13.3} $$
where
Pt+7 is the total volume of payments of company A in the time period [t, t + 7]
which are not included into clearing and are submitted in fiat currency;
Rj is the volume of fiat currency payment j which does not go through IATA.
The control variable a takes values in [0, 1]. If a = 1, the sum of all
the volumes of invoices submitted in IATA Coins will be exchanged for fiat
money. If 0 < a < 1, only some IATA Coins will be exchanged and the remainder
will be equal to $(1 - a) \sum_{i=1}^{N} \mathrm{Inv}_i u_i$.
It is easy to see that
$$ Cl_{t+21} + P_{t+7} + (1 - a) \sum_{i=1}^{N} \mathrm{Inv}_i u_i = \sum_{i=1}^{N} \mathrm{Inv}_i + \sum_{j=1}^{M} R_j. \tag{13.4} $$
This means that the actual cash flow (in fiat currency) can be reduced by the
sum $(1 - a) \sum_{i=1}^{N} \mathrm{Inv}_i u_i$, which will be held in IATA Coins.
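The balance in Equation 13.4 can be checked numerically. The sketch below assumes that the after-clearing sum is the total of the invoices left in clearing, that is, the sum of Inv_i (1 − u_i); this form is an assumption made for illustration, and the invoice values and function names are hypothetical.

```python
def cleared_sum(invoices, u):
    """Assumed after-clearing sum: invoices that stay in clearing (u_i = 0)."""
    return sum(inv * (1 - ui) for inv, ui in zip(invoices, u))

def fiat_payments(invoices, u, a, R):
    """P_{t+7} = a * sum(Inv_i * u_i) + sum(R_j), as in Equation 13.3."""
    return a * sum(inv * ui for inv, ui in zip(invoices, u)) + sum(R)

def coins_held(invoices, u, a):
    """Part of the IATA Coin invoices that is not exchanged for fiat."""
    return (1 - a) * sum(inv * ui for inv, ui in zip(invoices, u))

invoices = [100.0, -40.0, 250.0, 60.0]
u = [1, 0, 1, 0]          # invoices 1 and 3 are submitted in IATA Coins
a = 0.4                   # 40% of the coins are exchanged for fiat
R = [30.0, -10.0]         # fiat payments outside IATA

lhs = cleared_sum(invoices, u) + fiat_payments(invoices, u, a, R) + coins_held(invoices, u, a)
rhs = sum(invoices) + sum(R)
print(lhs, rhs)
```

With these numbers, 210 of the 390 units of total flow stay in IATA Coins and never appear as fiat cash flow.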
$$ \frac{\left| \sum_{i=1}^{N} \mathrm{Inv}_i (1 - u_i) \right| + \sum_{j=1}^{M} |R_j| + |Cl_{t+7}|}{\sum_{i=1}^{N} |\mathrm{Inv}_i| (1 - u_i) + \sum_{j=1}^{M} |R_j| + |Cl_{t+7}|} \to \min. \tag{13.5} $$
Expression (13.5) is equivalent to maximizing the offset ratio. This rational
multiextremal function is nonlinear and nonconvex, and its global solution
can only be found by exhaustive search. It may be approximated by a suitable
global optimization procedure.
In other words, the company tries to decrease the absolute value of the sum
of all the fiat currency payments on the time period [t, t + 7]. This potentially
can help to smooth the liquidity profile by receiving additional liquidity from
exchanging IATA Coins for fiat currency.
In order to give this bi-objective model only a single optimization criterion,
the second objective (13.6) will be implemented as a constraint:
$$ \left| a \sum_{i=1}^{N} \mathrm{Inv}_i u_i + \sum_{j=1}^{M} R_j + Cl_{t+7} \right| \le \left| \sum_{j=1}^{M} R_j + Cl_{t+7} \right|. \tag{13.7} $$
1. $\sum_{i=1}^{N} \mathrm{Inv}_i > 0$ and $\sum_{j=1}^{M} R_j + Cl_{t+7} > 0$: It does not make sense to
exchange IATA Coins for fiat currency ⇒ a = 0.
2. $\sum_{i=1}^{N} \mathrm{Inv}_i > 0$ and $\sum_{j=1}^{M} R_j + Cl_{t+7} < 0$: It makes sense to exchange
IATA Coins for fiat currency ⇒ 0 < a ≤ 1.
There are two cases which may appear:
a. $\sum_{i=1}^{N} \mathrm{Inv}_i u_i > -\left( \sum_{j=1}^{M} R_j + Cl_{t+7} \right)$: The sum of IATA Coins is
greater than the value of the negative fiat currency balance. In this
case, the company will intend only to cover the value of the negative
balance: $a \sum_{i=1}^{N} \mathrm{Inv}_i u_i = -\left( \sum_{j=1}^{M} R_j + Cl_{t+7} \right)$ ⇒ 0 < a < 1.
b. $\sum_{i=1}^{N} \mathrm{Inv}_i u_i \le -\left( \sum_{j=1}^{M} R_j + Cl_{t+7} \right)$: The sum of IATA Coins is
less than or equal to the value of the negative fiat currency balance. In
this case, the company will intend to exchange all IATA Coins that
it possesses ⇒ a = 1.
3. $\sum_{i=1}^{N} \mathrm{Inv}_i < 0$ and $\sum_{j=1}^{M} R_j + Cl_{t+7} > 0$: The balance in IATA Coins
is negative, there is nothing to exchange ⇒ a = 0.
It does not make sense to exchange fiat currency for IATA Coins,
because it seems to be more rational to include the invoices in clearing
and later to pay for them in fiat currency.
4. $\sum_{i=1}^{N} \mathrm{Inv}_i < 0$ and $\sum_{j=1}^{M} R_j + Cl_{t+7} < 0$: The balance in IATA Coins
is negative, there is nothing to exchange ⇒ a = 0.
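The four cases can be collapsed into one small function. In the sketch below, `coin_balance` stands for the sum of Inv_i u_i and `fiat_balance` for the sum of R_j plus Cl_{t+7}; the function name is hypothetical, and treating a zero balance like the "nothing to exchange" cases is an assumption, since the text does not cover equality.

```python
def choose_a(coin_balance, fiat_balance):
    """Share of IATA Coins to exchange for fiat currency, following cases 1 to 4."""
    if coin_balance <= 0 or fiat_balance >= 0:
        return 0.0                          # cases 1, 3, and 4
    shortfall = -fiat_balance               # size of the negative fiat balance
    if coin_balance > shortfall:
        return shortfall / coin_balance     # case 2a: cover the shortfall only
    return 1.0                              # case 2b: exchange everything

print(choose_a(100.0, 50.0), choose_a(100.0, -40.0), choose_a(100.0, -150.0))
```

In case 2a the returned value satisfies a × coin_balance = shortfall, so the fiat balance after the exchange is exactly zero.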
1. The company’s opinion: Using IATA Coins, the company will try to
decrease the cash amount which is circulated in the network. This can
be formulated as the following constraints:
a. Total cash payments after IATA Coin adoption should not be greater
than before it:
$$ \left| \sum_{i=1}^{N} \mathrm{Inv}_i (1 - u_i) \right| + \left| a \sum_{i=1}^{N} \mathrm{Inv}_i u_i \right| + \sum_{j=1}^{M} |R_j| + |Cl_{t+7}| \le \left| \sum_{i=1}^{N} \mathrm{Inv}_i \right| + \sum_{j=1}^{M} |R_j| + |Cl_{t+7}| \tag{13.8} $$
or
$$ \left| \sum_{i=1}^{N} \mathrm{Inv}_i (1 - u_i) \right| + \left| a \sum_{i=1}^{N} \mathrm{Inv}_i u_i \right| \le \left| \sum_{i=1}^{N} \mathrm{Inv}_i \right| \tag{13.9} $$
3. ICH’s criterion: As stated above, ICH’s main goal is to increase the
offset ratio. In fact, ICH is not a decision maker in the model, so it
can only hope that the decisions made by the companies will improve
the offset ratio. However, it follows from the formulation of the model
that whatever decisions the companies make (provided they are rational),
they will not decrease the initial value of the offset ratio.
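$$
\begin{aligned}
& \frac{\left| \sum_{i=1}^{N} \mathrm{Inv}_i (1 - u_i) \right| + \sum_{j=1}^{M} |R_j| + |Cl_{t+7}|}{\sum_{i=1}^{N} |\mathrm{Inv}_i| (1 - u_i) + \sum_{j=1}^{M} |R_j| + |Cl_{t+7}|} \to \min, \\
& \left| \sum_{i=1}^{N} \mathrm{Inv}_i (1 - u_i) \right| + \left| \sum_{i=1}^{N} \mathrm{Inv}_i u_i \right| \le \left| \sum_{i=1}^{N} \mathrm{Inv}_i \right|, \\
& \sum_{i=1}^{N} |\mathrm{Inv}_i| (1 - u_i) + \left| \sum_{i=1}^{N} \mathrm{Inv}_i u_i \right| \le \sum_{i=1}^{N} |\mathrm{Inv}_i|, \\
& \left| \sum_{i=1}^{N} \mathrm{Inv}_i u_i + \sum_{j=1}^{M} R_j + Cl_{t+7} \right| \le \left| \sum_{j=1}^{M} R_j + Cl_{t+7} \right|, \\
& a \in [0, 1], \quad u_i \in \{0; 1\}, \quad i = 1, \ldots, N.
\end{aligned}
\tag{13.15}
$$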
        1       2       ...     n       Total Σ     Offset
1       -       a12     ...     a1n     a1·         a1 = a1· − a·1
2       a21     -       ...     a2n     a2·         a2 = a2· − a·2
...     ...     ...     ...     ...     ...         ...
n       an1     an2     ...     -       an·         an = an· − a·n
Σ       a·1     a·2     ...     a·n
where
$a_{ij}$ is the sum of money that airline i needs to pay to airline j in the current
clearing cycle (the sum of invoices from airline i to airline j during the clearing
cycle, where $N_{ij}$ is the number of invoices from airline i to airline j):
$$ a_{ij} = \sum_{k=1}^{N_{ij}} \mathrm{Inv}_{ijk}, \tag{13.16} $$
$a_{i\cdot}$ is the sum of money that airline i needs to pay to all of the members of
ICH:
$$ a_{i\cdot} = \sum_{j=1}^{n} a_{ij}, \tag{13.17} $$
$a_{\cdot j}$ is the sum of money that airline j should receive from all of the members
of ICH:
$$ a_{\cdot j} = \sum_{i=1}^{n} a_{ij}. \tag{13.18} $$
In IATA Coins
        1       2       ...     n       Σ
1       -       b_12    ...     b_1n    b_1·
2       b_21    -       ...     b_2n    b_2·
...     ...     ...     ...     ...     ...
n       b_n1    b_n2    ...     -       b_n·
Σ       b_·1    b_·2    ...     b_·n
In Fiat Currency
        1       2       ...     n       Σ       Offset
1       -       ã_12    ...     ã_1n    ã_1·    ã_1 = ã_1· − ã_·1
2       ã_21    -       ...     ã_2n    ã_2·    ã_2 = ã_2· − ã_·2
...     ...     ...     ...     ...     ...     ...
n       ã_n1    ã_n2    ...     -       ã_n·    ã_n = ã_n· − ã_·n
Σ       ã_·1    ã_·2    ...     ã_·n
where
$b_{ij}$ is the sum of payments in IATA Coins that airline $i$ will pay to airline $j$.
$\tilde{a}_{ij}$ is the sum of cash that airline $i$ needs to pay to airline $j$ in the current
clearing cycle (invoices submitted in IATA Coins are excluded).
Generally, the sum of billings both in cash and IATA Coins is equal to the
volume of billings in the case when airlines do not use IATA Coins:
The offset ratio for fiat payments can be calculated in a similar way as in
Equation 13.19. But actually, for invoices in IATA Coins if we calculate the
offset (in the same meaning as for fiat currency invoices), it will be equal to 0,
because the invoices submitted in IATA Coins are not aggregated.
The total offset ratio for the system:
$$\mathrm{Offset}_2 = 1 - \frac{\sum_{i=1}^{n} |\tilde{a}_i| + \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{N_{ij}} |\mathrm{Inv}_{ijk}|}. \tag{13.21}$$
However, if the offset ratio is understood as the relation between the volume of
billings and the amount of cash required to settle them, and IATA Coins are
not regarded as cash, then the offset ratio for the second case is
$$\mathrm{Offset}_2^{*} = 1 - \frac{\sum_{i=1}^{n} |\tilde{a}_i|}{\sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{N_{ij}} |\mathrm{Inv}_{ijk}|}. \tag{13.22}$$
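The two ratios 13.21 and 13.22 can be evaluated directly from the fiat and IATA Coin payment matrices. The sketch below is illustrative only: the function names and the three-airline figures are hypothetical, and the gross billing volume is passed in as a single number (here the sum of all matrix entries, which assumes all invoices are positive).

```python
def net_positions(matrix):
    """Return the offsets a_i = a_{i.} - a_{.i} (row sum minus column sum)."""
    n = len(matrix)
    return [sum(matrix[i]) - sum(matrix[k][i] for k in range(n)) for i in range(n)]


def offset_ratios(fiat, coins, gross_volume):
    """Return (Offset_2, Offset_2*) following Equations 13.21 and 13.22."""
    cash_needed = sum(abs(x) for x in net_positions(fiat))
    coin_volume = sum(sum(row) for row in coins)
    offset_2 = 1 - (cash_needed + coin_volume) / gross_volume
    offset_2_star = 1 - cash_needed / gross_volume
    return offset_2, offset_2_star


# Hypothetical payment matrices for three airlines (row pays column).
fiat = [[0, 100, 50], [80, 0, 20], [30, 60, 0]]
coins = [[0, 20, 0], [0, 0, 10], [0, 0, 0]]
gross = sum(map(sum, fiat)) + sum(map(sum, coins))  # assumes positive invoices only

print(offset_ratios(fiat, coins, gross))
```

In this toy example the net fiat positions are 40, -60, and 20, so 120 units of cash settle 370 units of billings.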
• The number of invoices per day for a company was considered as a ran-
dom variable with Poisson distribution. The parameter λ was fitted on a
sample of the invoices between 2 airlines which was provided by IATA.
It is approximately equal to 10, meaning that for each pair of counter-
parties on each day there are 10 invoices on average. The model was also
implemented on sample invoices with other values of λ.
• Other payments which do not go through IATA are also included into
the sample (for simplicity, there is only one counterparty nonmember of
ICH and λ is set the same, equal to 10).
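The sampling step described in these bullets could be sketched as follows. This is an assumption-laden illustration rather than the authors' code: invoice counts are drawn with Knuth's multiplication method for the Poisson distribution, invoice amounts are drawn from a lognormal distribution whose parameters `mu` and `sigma` are purely hypothetical, and all function names are invented for the example.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_day(n_pairs, lam, rng, mu=8.0, sigma=1.0):
    """Return, for each counterparty pair, a list of simulated invoice amounts.

    mu and sigma are hypothetical lognormal parameters, not fitted values.
    """
    return [
        [rng.lognormvariate(mu, sigma) for _ in range(poisson_knuth(lam, rng))]
        for _ in range(n_pairs)
    ]


rng = random.Random(2017)
day = simulate_day(n_pairs=3, lam=10.0, rng=rng)
print([len(invoices) for invoices in day])
```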
The solution of the optimization problem was found via a genetic algorithm
[2–4]. This is a rather resource-intensive optimization method, which explains why
the number of simulations was limited to 1000 and the number of companies in
the sample to 3 in order to decrease the time needed for calculations.
The above described model is implemented for one clearing cycle and for
each company. This means that every company chooses its optimal strategy
independently from the others. Then for each invoice in the sample, the opin-
ions of both counterparties (the payer and the payee) are compared. If both
companies decide that the submission of the invoice in IATA Coins is more
profitable for them, then the invoice will be submitted in IATA Coins, other-
wise it will be included in the clearing procedure.
We see that using IATA Coins can potentially help to reduce the cash flow:
[Figure: density plot; horizontal axis: Cashflow (1,500,000 to 4,500,000); vertical axis: Density]
In order to check the quality of the model, the results of the model were
compared with random behavior in which invoices that should be submitted
in IATA Coins were selected in a random way.
The results of the comparison are shown in Figure 13.3. For this example,
a large number (namely, 100,000) of scenarios of random invoice selection
were generated and for each scenario the value of after-clearing net cash flow
was calculated. These values were then represented in the form of a histogram,
in which three values are also marked with vertical lines:
• The value of net cash flow without using IATA Coins (as in the current
clearing procedure)
• The average value of the cash flow in the randomly generated scenarios
We see that the current clearing procedure gives a better result than the
average of the random scenarios, but the model can help to improve this
result.
Table 13.1 reports the main metrics calculated for these three cases:
the values of cash flow (going through ICH and externally), the sum of the
invoices, and the offset ratio. In this example, the offset ratio has increased
from 65.9% to 69.4%, that is, by 3.5 percentage points or 5.3% in relative terms.
Because the sample drawn from the lognormal invoice volume distribution is
small, the averaged offset ratio is lower than in the example from the previous
subsection, but the model result is better than the current procedure
result by 4.8%.
13.5 Summary
This chapter covers the issues of the implementation of blockchain tech-
nologies in the clearing and settlement procedure of the ICH. We have devel-
oped an approach to estimate the industry level benefits of adoption of the
blockchain-based industry money (IATA Coin) for clearing and settlement.
The potential system-wide offsetting benefits are mainly driven by the follow-
ing factors:
References
1. Committee on Payments and Market Infrastructures (CPMI), 2017. Distributed Ledger Technology in Payment, Clearing and Settlement: An Analytical Framework. Basel. [Link]
2. Lucasius, C.B. and Kateman, G., 1993. Understanding and using genetic algorithms—Part 1. Concepts, properties and context. Chemometrics and Intelligent Laboratory Systems, 19:1–33.
3. Lucasius, C.B. and Kateman, G., 1994. Understanding and using genetic algorithms—Part 2. Representation, configuration and hybridization. Chemometrics and Intelligent Laboratory Systems, 25:99–145.
4. Willighagen, E. and Ballings, M., 2015. Package “genalg”: R Based Genetic Algorithm. [Link]
Part III
Chapter 14
Supercomputers
Peter Schober
CONTENTS
14.1 Introduction
14.2 History, Current Landscape, and Upcoming Trends
14.3 Programming Languages and Parallelization Interfaces
14.4 Advantages and Disadvantages
14.5 Supercomputers for Financial Applications
14.5.1 Suitable financial applications
14.5.2 Performance measurement
14.5.3 Access to and costs of supercomputers
14.6 Case Study: Pricing Basket Options Using C++ and MPI
14.6.1 Problem
14.6.2 Approach
14.6.2.1 Decomposition
14.6.2.2 Combination technique
14.6.2.3 Parallelization
14.6.2.4 Implementation
14.6.3 Results
14.7 Case Study: Optimizing Life Cycle Investment Decisions Using MATLAB
14.7.1 Problem
14.7.2 Approach
14.7.2.1 Discrete time dynamic programming
14.7.2.2 Parallelization
14.7.2.3 Implementation
14.7.3 Results
Further Reading
References
14.1 Introduction
Generally speaking, a supercomputer is a computer that ranks among the
fastest of its time. Usually, a computer's performance is measured in
Floating Point Operations Per Second (FLOPS) when running a certain
benchmark program. Since 1993, the TOP500 list [1] has ranked the top 500
supercomputers worldwide according to their maximal performance achieved
when running the LINPACK benchmark.¹ As of November 2016, the fastest
supercomputer in the world is the Sunway TaihuLight at the National
Supercomputing Center in Wuxi, China. Its maximal performance is 93 Peta
FLOPS, that is, $93 \times 10^{15}$ FLOPS.
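In contrast to the measured LINPACK figure, the theoretical peak performance is simply the product of floating point operations per clock cycle, core frequency, and core count. The short sketch below applies this product to JUQUEEN, using the 458,752 cores and 1.6 GHz clock frequency quoted in this chapter; the value of 8 floating point operations per cycle and core is an assumption made for illustration and is not taken from the text.

```python
def theoretical_peak_flops(flops_per_cycle, clock_hz, n_cores):
    """Theoretical peak = FLOPs per clock cycle x core frequency x number of cores."""
    return flops_per_cycle * clock_hz * n_cores


# 8 FLOPs per cycle and core is an assumed value for illustration.
peak = theoretical_peak_flops(flops_per_cycle=8, clock_hz=1.6e9, n_cores=458_752)
print(f"{peak / 1e15:.3f} Peta FLOPS")
```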
All supercomputers on the current TOP500 list are massively parallel com-
puters designed either in a computer cluster or in a Massively Parallel Process-
ing (MPP) architecture. A computer cluster consists of many interconnected,
independent computers that are set up in a way that they are virtually a sin-
gle system. The computers in the cluster (often called compute nodes) usually
are standalone systems and are configured to work on a parallel job, which is
distributed over the nodes and controlled by specific job scheduling software.
In contrast, an MPP computer consists of many individual compute nodes
that do not qualify as a standalone computer and are connected by a custom
network. For example, in IBM’s Blue Gene series, the compute nodes comprise
multiple CPUs and a shared RAM, but no hard disk, and are interconnected
via a multidimensional torus network. In MPP architectures, the
complexity of a single CPU (e.g., the number of transistors or the clock
frequency) is often reduced to allow for a higher number of parallel cores.
Both architectures are frequently combined with hardware acceleration methods
such as coprocessors or general-purpose GPUs (GPGPUs).
An example of a supercomputer with MPP architecture is the Blue Gene/Q
supercomputer JUQUEEN, currently ranked 19 on the TOP500 list. It com-
bines compute nodes which have 16 cores with 1.6 GHz clock frequency per
node sharing 16 GB RAM and run the lightweight Compute Node Kernel
(CNK). These are stacked together in 28 racks, share a main memory, and
are interconnected by infiniband with five-dimensional torus structure.2 The
supercomputer can be accessed via the SSH protocol on separate log-in nodes
that run Linux. Jobs can be sent to the compute nodes via a central job
manager called LoadLeveler. A simple example of a computer cluster is a
collection of conventional servers, each with multiple cores, a large amount
of RAM, and its own hard drive, running Windows HPC Server. These
servers are put in one rack and are connected via Gigabit Ethernet. Central
software on a dedicated head node, the HPC Cluster Manager, schedules the
jobs to the compute nodes.
1 Theoretical maximum performance can also be reported and is calculated as follows:
maximum number of floating point operations per clock cycle × maximum frequency of one
core × number of cores of the computer.
2 The full specifications can be found in Reference 2.
[Figure: performance development of the TOP500 lists, 1995 to 2020; vertical axis: Performance (100 MFlop/s to 10 EFlop/s); horizontal axis: Lists; series: Sum, #1, #500]
3 For comparison: today, common desktop computers comprise multicore CPUs with two,
4 The Cray-1 was built by Cray Research, Inc., founded by Seymour Cray, who is broadly
and a CPU with 16 cores, which totals 560,640 cores. Lately, China has drawn
level with the USA (171 systems each in the TOP500 list as of November 2016;
Germany is third with 32 systems) and has established its dominance at the top
of the list. In June 2016, the Chinese Sunway TaihuLight, developed by the
National Research Center of Parallel Computer Engineering & Technology,
with 93 Peta FLOPS and over 10 million cores, took the position as the
world's fastest supercomputer from the likewise Chinese Tianhe-2 (33.9 Peta
FLOPS). Thus the top two Chinese systems alone account for about 127 Peta
FLOPS of the total aggregate of 672 Peta FLOPS on the current TOP500
list.
The focus on speed as the only measure of performance has led to increasing
power consumption of the top supercomputers during the last decade.
About half of the costs of a conventional cluster can be attributed to cooling
of the facilities, the other half to the actual computations. In 2006, an
awareness of social responsibility regarding climate change and global
warming, but also of cost efficiency, within the High Performance Computing
community led to the announcement of the Green500 list [5], which aims at
ranking the most energy efficient supercomputers worldwide (measured in Mega
FLOPS per Watt). Since then, architectures have developed toward hybrid
supercomputers combining CPUs with GPGPU and coprocessor accelerators, which
are connected by low latency network technologies like Infiniband. Besides the
good suitability of GPGPUs and coprocessors for data flow tasks, a reason for
the increased use of these technologies is that their power and cost
efficiency is about 10 times higher than that of conventional CPUs. In
November 2014, the L-CSC at the GSI Helmholtz Center claimed the number one
position in the Green500 list with 5271 Mega FLOPS per Watt. It was developed
at the Frankfurt Institute for Advanced Studies (FIAS) [6], a public-private
partnership of the Goethe University Frankfurt and private sponsors. L-CSC
reached its top position by massive use of GPU acceleration and an especially
efficient cooling system. Since November 2016, the in-house supercomputer DGX
SaturnV of the GPU manufacturer NVIDIA, based in Santa Clara, Calif., USA,
has been at the top of the Green500 list. It operates at 9462 Mega FLOPS per
Watt and is closely followed by the Swiss Piz Daint (7453 Mega FLOPS per Watt),
which also ranks 8 in the TOP500 list. Recently, with the advent of
energy efficient supercomputers, the European Mont-Blanc project [7] has aimed
at building power efficient supercomputers based on low-power embedded system
technology as used in smart phones or tablet computers. Within this project,
funded by the European Commission, Systems on Chip (SoC) are stacked
together in blades and are interconnected by 10 Gigabit Ethernet. The first
Mont-Blanc prototype has a total of 2160 CPUs and 1080 GPUs and runs
parallel applications from physics and engineering using conventional
programming languages and parallelization approaches.
Whichever technology is going to prevail in the future, if computing power
doubles every 2 years, supercomputers will be operating at Exa scale FLOPS
by 2020.
Table 14.1 depicts features of programming languages that are widely used by
scientific researchers and practitioners for writing applications for
supercomputers. Of course, this overview makes no claim to completeness, and
various other programming languages exist that can be used on supercomputers
in combination with parallelization. Table 14.2 briefly summarizes the
characteristics of the most important parallelization interfaces. Usually,
these are extended by task-specific (e.g., I/O of data) or hardware-specific
parallelization interfaces (such as CUDA for the use of GPUs).
Fortran and C, followed by C++, are still widely used programming languages
in scientific computing and have recently been combined more often with Python.
Parallelization of code is mostly done with MPI, because it can handle the
distributed memory of the systems, but it is often combined with shared memory
parallelization using OpenMP, Pthreads, or, more recently, GPUs. Table 14.3
provides an overview of the programming languages and parallelization
interfaces used by codes that run on all 458,752 cores of the supercomputer
JUQUEEN. Double counting is possible as nearly all codes use combinations of
multiple programming languages and parallelization interfaces. However, all
codes use MPI for parallelization, mostly in combination with Fortran, closely
followed by C and C++. OpenMP is frequently used on node level, because all
cores on one node share the node's memory. Pthreads are rarely used. While
interpreted programming languages such as Python or MATLAB play a minor role in
5 Python is sometimes used to write wrappers for pre- and postprocessing of the results
or similar tasks.
6 There is (commercial) software that facilitates parallel debugging with MPI.
7 However, these licenses are part of a special license for computer clusters called
Distributed Computing Server and are cheaper than full MATLAB licenses.
8 Most MPI extensions for Python, similarly to MATLAB, provide (a subset) of the MPI
9 Usually, GPUs have many more cores and threads with a lower clock frequency than
standard CPUs.
• Numerical integration
• Finite differences and finite element methods
Some of these numerical methods are inherently parallel, such as Monte Carlo
simulations; others are inherently coupled, like finite difference methods.
The problems that are inherently parallel are often called embarrassingly
parallel problems. This term usually applies if one big problem can be disas-
sembled into a set of decoupled problems. Examples include:
• Portfolio optimization over the life cycle with discrete time dynamic programming
10 Due to the high costs of implementing and maintaining, supercomputers are often run by
The latter two are discussed as case studies in Sections 14.6 and 14.7. For
embarrassingly parallel problems, parallelization is straightforward and, using
the right combination of programming language and parallelization interface,
often a “quick win” when using supercomputers.
In the case of coupled problems, parallelization is usually not so straight-
forward and harder to implement, for example, for finite differences or finite
element methods for general pricing of derivatives.
$$E_{\mathrm{strong}} = \frac{S_{\mathrm{strong}}}{S_{\mathrm{strong}}^{\mathrm{opt}}}. \tag{14.1}$$
For example, if the runtime on one computing unit, $P_{\min} = 1$, is $t_{\max} = 80$ seconds,
then one expects the runtime on $P = 8$ cores to be $t = 10$ seconds and the
application would scale linearly with speedup $S_{\mathrm{strong}} = 80/10 = 8 = S_{\mathrm{strong}}^{\mathrm{opt}}$.
Parallel efficiency is $E_{\mathrm{strong}} = 1$. Embarrassingly parallel problems often
scale strongly, but, in general, it is harder to achieve high strong scaling effi-
ciencies at larger number of computing units since, depending on the paral-
lelization approach, the communication overhead for many applications also
increases proportionally in the number of computing units used. In addition,
ideally the problems should be distributed in a way that the variance of the
runtime distribution of the atomic components (e.g., the paths) is minimal.
Otherwise, the longest running computing unit governs the overall solution
time. As already indicated, an example is a Monte Carlo simulation where
the development of a single path is decoupled from the development of all
other paths and hence all paths can be computed in parallel. If a fraction of
the code cannot run in parallel, the realized speedup $S_{\mathrm{strong}}$ cannot achieve
the optimal speedup $S_{\mathrm{strong}}^{\mathrm{opt}}$. Consider the following example: the sequential
part of the code takes 1 second and the parallelizable part of the code takes
4 seconds. With four computing units available, the optimal speedup would
be 4, that is, a runtime of 1.25 seconds. However, since there is a part of the
code with runtime of 1 second that is not parallelizable, the best achievable
runtime is 2 seconds, that is, at best a speedup of 2.5. Amdahl’s law con-
nects the fraction α of the code that can run in parallel with the theoretically
achievable speedup given this fraction by
$$S_{\mathrm{strong}}^{\mathrm{theo}} = \frac{1}{(1 - \alpha) + \dfrac{\alpha}{S_{\mathrm{strong}}^{\mathrm{opt}}}} \tag{14.2}$$
$$E_{\mathrm{strong}}^{\mathrm{theo}} = \frac{S_{\mathrm{strong}}}{S_{\mathrm{strong}}^{\mathrm{theo}}}. \tag{14.3}$$
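The worked example above (1 second of sequential code, 4 seconds of parallelizable code, four computing units) can be reproduced with a few lines. This is a minimal sketch of Equations 14.2 and 14.3; the function names and the realized speedup of 2.0 are hypothetical.

```python
def amdahl_speedup(alpha, s_opt):
    """Theoretically achievable speedup (Equation 14.2) for parallel fraction alpha."""
    return 1.0 / ((1.0 - alpha) + alpha / s_opt)


def efficiency_vs_theoretical(s_realized, alpha, s_opt):
    """Parallel efficiency relative to the Amdahl bound (Equation 14.3)."""
    return s_realized / amdahl_speedup(alpha, s_opt)


t_sequential, t_parallel, units = 1.0, 4.0, 4
alpha = t_parallel / (t_sequential + t_parallel)   # 0.8 of the code is parallelizable
s_theo = amdahl_speedup(alpha, s_opt=units)        # bound implied by the serial part

print(round(s_theo, 6))
print(round(efficiency_vs_theoretical(2.0, alpha, units), 6))  # hypothetical measurement
```

The bound of 2.5 matches the best achievable runtime of 2 seconds stated in the text (5 seconds divided by 2 seconds).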
For an application that scales weakly, usually the system or node level
resources like the RAM are the limiting resource. That means, the problem size
per computing unit stays constant and additional computing units are used
to solve an altogether bigger problem. For example, when parallelizing the
solution routine to a PDE using a finite difference grid, the grid is partitioned
into pieces and each computing unit works on a piece of the grid that just
fits in its memory. If the total number of grid points is to be increased, more
computing units have to be used. In the case of weak scaling, optimal scaling
is achieved if the run time stays constant while the problem size is increased
proportionally to the number of computing units, that is, $S_{\mathrm{weak}}^{\mathrm{opt}} = 1$. The
realized speedup is then given by $S_{\mathrm{weak}} = t_{\max}/t$ and parallel efficiency is
measured by
$$E_{\mathrm{weak}} = \frac{S_{\mathrm{weak}}}{S_{\mathrm{weak}}^{\mathrm{opt}}} = S_{\mathrm{weak}}. \tag{14.4}$$
For example, if the runtime on one computing unit, $P_{\min} = 1$, is $t_{\max} = 10$ seconds,
then one expects the runtime of an eight times larger problem on $P = 8$
cores to be $t = 10$ seconds and the application would scale with speedup
$S_{\mathrm{weak}} = 10/10 = 1 = S_{\mathrm{weak}}^{\mathrm{opt}}$. In contrast to embarrassingly parallel problems,
coupled problems often scale weakly. Most applications scale well to larger
numbers of computing units as they typically employ nearest-neighbor com-
munication. That is, in this example the pieces of the grid only need to know
the values at the grid points of their direct neighbors. Thus the communica-
tion overhead is constant regardless of the number of computing units used.
For weak scaling, Gustafson’s law describes the theoretically possible speedup
in presence of inherently serial code parts:
$$S_{\mathrm{weak}}^{\mathrm{theo}} = \frac{(1 - \alpha) P_{\min}}{P} + \alpha. \tag{14.5}$$
$$E_{\mathrm{weak}}^{\mathrm{theo}} = \frac{S_{\mathrm{weak}}}{S_{\mathrm{weak}}^{\mathrm{theo}}}. \tag{14.6}$$
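Equations 14.5 and 14.6 can be evaluated in the same way for weak scaling. In the sketch below the runtimes and the parallel fraction $\alpha = 0.95$ are hypothetical values chosen for illustration, and the function names are not part of any library.

```python
def weak_speedup_theoretical(alpha, p, p_min=1):
    """Theoretically possible weak-scaling speedup (Equation 14.5)."""
    return (1.0 - alpha) * p_min / p + alpha


def weak_efficiency_theoretical(t_max, t, alpha, p, p_min=1):
    """Efficiency of the realized speedup t_max / t relative to Equation 14.5."""
    s_weak = t_max / t
    return s_weak / weak_speedup_theoretical(alpha, p, p_min)


# Hypothetical measurement: 10 s on one unit, 12.5 s for an 8x larger problem on 8 units.
s_theo_weak = weak_speedup_theoretical(alpha=0.95, p=8)
e_theo_weak = weak_efficiency_theoretical(t_max=10.0, t=12.5, alpha=0.95, p=8)
print(round(s_theo_weak, 5), round(e_theo_weak, 5))
```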
The following best practice rules should be taken into consideration when
measuring parallel code performance:
3. Calculate the speedup for all scaling stages on basis of the average run-
times and compare it to the respective ideal linear speedup in your
setting.
4. Calculate the nonparallel fraction of your code and report the theoreti-
cally achievable speedup given by Amdahl’s or Gustafson’s law, respec-
tively.
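Rules 3 and 4 can be turned into a small reporting routine. The sketch below is one possible way to do this and rests on assumptions: the runtimes are invented, and the nonparallel fraction is estimated with the Karp-Flatt metric, which is a common choice but not a method prescribed in this chapter.

```python
from statistics import mean, stdev


def scaling_report(runtimes_by_units, base_units=1):
    """Summarise repeated runtime measurements per scaling stage.

    runtimes_by_units maps the number of computing units to a list of runtimes.
    Returns rows (units, mean, std, speedup, ideal speedup, serial fraction).
    """
    base_mean = mean(runtimes_by_units[base_units])
    rows = []
    for units in sorted(runtimes_by_units):
        times = runtimes_by_units[units]
        speedup = base_mean / mean(times)
        ideal = units / base_units
        # Karp-Flatt estimate of the nonparallel fraction (undefined for ideal == 1).
        serial = None if ideal == 1 else (1 / speedup - 1 / ideal) / (1 - 1 / ideal)
        rows.append((units, mean(times), stdev(times), speedup, ideal, serial))
    return rows


# Hypothetical measurements, three repetitions per scaling stage.
measurements = {1: [80.0, 82.0, 78.0], 8: [12.0, 12.5, 11.5]}
for row in scaling_report(measurements):
    print(row)
```

With these figures the realized speedup on 8 units is about 6.67 against an ideal of 8, which corresponds to an estimated nonparallel fraction of roughly 2.9%.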
on the current TOP500 list) offers access and a variety of services to the indus-
try and small- and medium-sized enterprises. In Germany, computing time
at the tier 0 supercomputers Hazel Hen, JUQUEEN (currently ranked 19),
and SuperMUC (rank 36) can be applied for at the Gauss Centre for Super-
computing. In Europe, the Partnership for Advanced Computing in Europe
(PRACE)—an international nonprofit association consisting of 25 member
countries—provides access to supercomputers for large-scale scientific and
engineering applications. Though access to the supercomputing infrastructure
can be granted to the industry, it is foremost for research purposes such as sim-
ulations in the automotive or aerospace sector. Thereby, the costs commercial
institutions have to bear are not publicly disseminated. Moreover, computing
time is distributed among the users using a job queuing system. This means
that the job execution start time is conditional on the queue priority, which
is determined by the total runtime of the job (the so-called walltime) and the
availability of the requested computing resources, that is, the number of cores
and potentially special accelerators. Altogether, having access to a shared
supercomputer is not sufficient for common operational tasks in financial insti-
tutions, such as overnight portfolio valuation or pricing of derivatives at the
trading desk.
Proprietary supercomputers, run and maintained by an institution or
shared over different entities in a bigger institution or consortium, pose an
often chosen but costly alternative to shared access to the largest super-
computers of the world. Table 14.4 depicts the costs associated with setting
up and maintaining a small proprietary computer cluster of 128 cores with
an estimated LINPACK performance of 6.4 Tera FLOPS. A medium-sized
supercomputer with roughly 3200 cores has total costs of approximately
700,000 EUR.13
13 Based on calculations presented by Bull Atos Technologies at the ICS 2015 [9].
14.6.1 Problem
In higher dimensions, the development of $d$ correlated stocks over time can
be described by a vector process $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, where the component
$i$ follows a geometric Brownian motion with drift
$$dx_i(t) = x_i(t)\,\mu_i\,dt + x_i(t) \sum_{j=1}^{n} \sigma_{ij}\,dW_j(t), \quad t \ge 0, \quad x_i(0) = x_{0i}. \tag{14.7}$$
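$$\frac{\partial u}{\partial t} - r \sum_{i=1}^{d} x_i \frac{\partial u}{\partial x_i} - \frac{1}{2} \sum_{i,j=1}^{d} \sigma_i \sigma_j \rho_{ij}\, x_i x_j \frac{\partial^2 u}{\partial x_i \partial x_j} + r u = 0, \tag{14.8}$$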
where r denotes the risk-free rate. Subject of this case study is a European
put option with strike K on the S&P 500 index with an arithmetic average
payoff. Hence, the initial condition for the PDE 14.8 is
$$u(x, 0) = \left(K - \sum_{i=1}^{500} \gamma_i x_i\right)_{+} \quad \forall x \in \mathbb{R}^d_{+}. \tag{14.9}$$
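The payoff 14.9 and the dynamics 14.7 can be combined into a plain Monte Carlo estimator, which is also the kind of benchmark used later in this case study. The sketch below is illustrative only: it assumes risk-neutral dynamics (drift $\mu_i = r$), independent Brownian motions $W_j$, and a hypothetical three-asset basket instead of the 500 index constituents; all names and parameter values are invented for the example.

```python
import math
import random


def basket_put_mc(x0, gamma, sigma, r, maturity, strike, n_paths, seed=0):
    """Monte Carlo price of a European arithmetic basket put.

    sigma[i][j] is the loading of asset i on the j-th independent Brownian motion.
    """
    rng = random.Random(seed)
    d, m = len(x0), len(sigma[0])
    drift = [(r - 0.5 * sum(s * s for s in sigma[i])) * maturity for i in range(d)]
    sqrt_t = math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        basket = 0.0
        for i in range(d):
            shock = sqrt_t * sum(sigma[i][j] * z[j] for j in range(m))
            basket += gamma[i] * x0[i] * math.exp(drift[i] + shock)
        payoff_sum += max(strike - basket, 0.0)
    return math.exp(-r * maturity) * payoff_sum / n_paths


# Hypothetical three-asset basket with equal weights.
sigma = [[0.20, 0.00, 0.00], [0.06, 0.19, 0.00], [0.05, 0.04, 0.24]]
price = basket_put_mc([1.0, 1.0, 1.0], [1 / 3] * 3, sigma,
                      r=0.01, maturity=1.0, strike=1.0, n_paths=20_000)
print(round(price, 4))
```

Each path is independent of all others, which is why this estimator is an example of an embarrassingly parallel problem.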
14.6.2 Approach
Firstly, a decomposition technique is employed to decompose the high-
dimensional PDE into a linear combination of low-dimensional PDEs. For
these low-dimensional PDEs, the curse of dimensionality is then broken by use
of the so-called combination technique on sparse grids, which in turn allows
for straightforward parallelization. This parallelization approach significantly
reduces the overall runtime of the solution routine for the decomposed high-
dimensional PDE.
14.6.2.1 Decomposition
The Taylor-like ANOVA decomposition is a special form of an anchored
ANOVA-type decomposition of the solution u(x), x ∈ Rd , to a d-dimensional
PDE. In addition to the anchor point a, all contributing terms also depend
on the first r ≥ 1 coordinates. With s ≤ d − r, the solution can be written as
$$u(x) = u_0^{(r)}(a; x_1, \ldots, x_r) + \sum_{p=1}^{s} \sum_{\{i_1, \ldots, i_p\} \subseteq \{r+1, \ldots, d\}} u_{i_1, \ldots, i_p}^{(r)}(a; x_1, \ldots, x_r; x_{i_1}, \ldots, x_{i_p}). \tag{14.10}$$
For u being a function of the eigenvalues (λ1 , . . . , λd ) of the covariance matrix
of the vector process x, the first-order approximation, s = 1, r = 1, is given by
$$\begin{aligned}
u(x) &\approx u_0^{(1)}(a; x_1) + \sum_{j=2}^{d} \left[ u_j^{(1)}(a; x_1; x_j) - u_0^{(1)}(a; x_1) \right] \\
&\approx \sum_{j=2}^{d} u_j^{(1)}(a; x_1; x_j) - (d - 2)\, u_0^{(1)}(a; x_1),
\end{aligned} \tag{14.11}$$
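where $u_j^{(1)}(a; x_1; x_j)$ is a solution to the heat equation
$$\frac{\partial u}{\partial t} - \frac{1}{2} \lambda_1 \frac{\partial^2 u}{\partial x_1^2} - \frac{1}{2} \lambda_j \frac{\partial^2 u}{\partial x_j^2} = 0 \quad \forall (t, x_1, x_j) \in [0, T] \times \mathbb{R} \times \mathbb{R} \tag{14.12}$$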
Here, $|l|_1 = \sum_{i=1}^{d} l_i$.
Computing the approximated solution $u_n(x)$ to the (high-dimensional)
PDE thus boils down to computing $O(d n^{d-1})$ full grid solutions $u_l(x)$,
each on a grid with only $O(2^n)$ grid points.
This solution approach breaks the curse of dimensionality as it only involves
$O(2^n d n^{d-1})$ grid points compared to $O(2^{nd})$ grid points of the full grid
solution with mesh width $2^{-n}$. It can be shown that the error of the approximation
deteriorates only by a logarithmic factor if $u$ fulfills certain smoothness
conditions. In addition, the combination technique facilitates straightforward
parallelization as the up to $O(d n^{d-1})$ full grid solutions can be computed
in parallel.
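The grid point counts can be made concrete by enumerating the anisotropic grids involved. The sketch below assumes the classical form of the combination technique, in which the grids with level multi-index $l$ (all $l_i \ge 1$) and $|l|_1 = n + d - 1 - q$, $q = 0, \ldots, d - 1$, are combined and each direction carries $2^{l_i} - 1$ interior points; the indexing convention of Equation 14.13 in this chapter may differ, so the absolute numbers are indicative only.

```python
def compositions(total, parts):
    """Yield all tuples of `parts` positive integers that sum to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def combination_grids(d, n):
    """Level multi-indices used by the (assumed classical) combination technique."""
    grids = []
    for q in range(d):
        grids.extend(compositions(n + d - 1 - q, d))
    return grids


def interior_points(level):
    """Number of interior points of the full grid with the given level multi-index."""
    count = 1
    for l_i in level:
        count *= 2 ** l_i - 1
    return count


for n in (3, 10):
    grids = combination_grids(2, n)
    sparse_total = sum(interior_points(level) for level in grids)
    full_total = interior_points((n, n))
    print(n, len(grids), sparse_total, full_total)
```

Each of the enumerated grids is an independent problem, so the length of the list is also the number of tasks that can be solved in parallel under this convention.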
14.6.2.3 Parallelization
The Taylor-like ANOVA decomposition 14.11 generates a set of d = 493
independent problems (1 one-dimensional, 492 two-dimensional). To solve the
two-dimensional PDEs in parallel, computing units are grouped and each
group of computing units solves one (or more) of the two-dimensional PDEs
on sparse grids using the combination technique. Depending on the level
$n$ used in the combination technique, let $P_{\max}$ be the maximum number of
computing units for a fully parallel solution. When $P_{\max}$ computing
units are available, every computing unit calculates the solution on exactly one full
grid. If more than $P_{\max}$ computing units were available, the additional
computing units would be idle. If fewer than $P_{\max}$ computing
units are available, it is favorable to split the available computing units into as
many groups as there are independent problems. Figure 14.2 sketches the idea.
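The grouping step can be sketched without any parallel library by computing, for every rank, the index of the group it belongs to. In an MPI code this index would typically serve as the color argument when splitting the communicator; the function below is a hypothetical illustration and not the implementation used in this case study.

```python
from collections import Counter


def group_of(rank, n_units, n_groups):
    """Map a rank in 0..n_units-1 to one of n_groups contiguous groups.

    Group sizes differ by at most one unit; requires n_units >= n_groups.
    """
    if n_units < n_groups:
        raise ValueError("need at least one computing unit per group")
    return rank * n_groups // n_units


# Example: 1024 computing units shared among the 492 two-dimensional problems.
sizes = Counter(group_of(rank, 1024, 492) for rank in range(1024))
print(min(sizes.values()), max(sizes.values()), len(sizes))
```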
14.6.2.4 Implementation
The calculations are run on JUQUEEN. For the combination technique,
the PDEs are solved on the full grids using a Crank–Nicolson time stepping
scheme combined with a finite difference discretization in space. The resulting
set of linear equations is solved using a BiCGSTAB solver with a precision of
the minimum residual norm of $10^{-12}$. The implementation of the
parallelization approach is done in C++ and MPI and the code is compiled using the
IBM XL compiler.
14.6.3 Results
For an at-the-money basket put option with strike $K = 1$, the price of the
basket option computes to $u(x_0, 0) = 0.0276$ using the Taylor-like ANOVA
decomposition of first order 14.11 and the combination technique 14.13 on level
$n = 10$. Since there is no analytical solution, a benchmark result was calculated
using a Monte Carlo simulation with 100,000 paths ($u(x_0, 0) = 0.0281$). To
test for strong scaling, the number of computing units is repeatedly doubled
until $P_{\max} = 13{,}285$ computing units are reached; because of round lot
allocation on JUQUEEN, full parallelization is achieved when utilizing 13,296
computing units. The code runs seven times for every scaling stage and average
runtimes $\mu$ and standard deviations $\sigma$ are reported to account for jitter due to
the job scheduler of the cluster and the cluster's network. Table 14.5 depicts
realized mean runtimes, standard deviations, and speedups. Figure 14.3 plots
the realized speedup against the theoretically achievable speedup. The parallel
efficiency is $E_{\mathrm{strong}} = 63.85\%$ at 13,296 computing units with respect to 1024
computing units. Looking at the realized mean runtime $\mu$, the basket option
was priced within 3 minutes using massive parallelization.

FIGURE 14.3: Strong scaling results for the arithmetic basket option on
the S&P 500; realized and theoretical speedup plotted against the number of
cores (1024 to 16,384). (Adapted from Schober, P., Schröder, P., and Wittum, G., In
revision at the Journal of Computational Finance, available at SSRN 2591254,
2015.)
14.7.1 Problem
Formally, the problem of maximizing the individual's expected utility from
her choices $p_t \in \mathbb{R}^k$ over all time periods $t \in \{0, \ldots, T\}$ needs to be solved:
$$\max_{p_t} \sum_{t=0}^{T} \rho^t\, E_0\left[u(p_t)\right]. \tag{14.14}$$
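$$\begin{aligned}
f^1_{t+1} &:= W_{t+1} = S_t r^S_{t+1} + B_t r_B + \mathbb{1}_{\{t < t_R\}} G_t P_t \nu_{t+1} \vartheta_{t+1} + \mathbb{1}_{\{t \ge t_R\}} \lambda G_t P_t \\
f^2_{t+1} &:= P_{t+1} = \mathbb{1}_{\{t < t_R\}} P_t \nu_{t+1} + \mathbb{1}_{\{t \ge t_R\}} P_t \\
f^3_{t+1} &:= L_{t+1} = L_t + \frac{1}{\ddot{a}_t}
\end{aligned} \tag{14.15}$$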
subject to
St , Bt , At ≥ 0 (14.18)
Ct − ε ≥ 0 (14.19)
Wt − Ct − St − Bt − At = 0. (14.20)
Here, v is a known function of the final state and a minimal consumption of
ε > 0 is assumed.
14.7.2 Approach
A general way to solve problem 14.16–14.20 is discrete time dynamic pro-
gramming stepping backwards in time, which allows for a simple paralleliza-
tion easily implemented in MATLAB.
where the basis functions $\varphi^i$ can be global polynomials or Ansatz functions
with local support (such as B-splines). The same grid is chosen for every time
step $t$ and the coefficients $c_t^i$ are determined in such a way that the
approximation fits the known function values at all grid points at a given time. The
last period's optimal value function 14.17 is given by $j_T(s_T^i) = v(s_T^i)$, $\forall i \in I$.
To determine the optimal solution of the next-to-last period $j_{T-1}(s_t^i)$ and all
earlier periods $t \in \{0, \ldots, T - 2\}$ at all grid points $s_t^i$, $i \in I$, an optimization
routine which maximizes 14.16 over the real-valued vector $p_t$ is used (subject
to the boundary and budget constraints 14.18–14.20). Since within the
optimization routine $f_{t+1}$ does not generally correspond to a grid point, the
expectation is approximated by a Gaussian quadrature rule with $q = 1, \ldots, m$
nodes $\omega^q$ and weights $w^q$:
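$$
E_t\left[ j_{t+1}\left( f_{t+1}(p_t, s_t^i, \omega_{t+1}) \right) \right] \approx \sum_{q=1}^{m} \sum_{i \in I} c_t^i \, \varphi^i\left( f_{t+1}(p_t, s_t^i, \omega_{t+1}^q) \right) w_{t+1}^q. \tag{14.22}
$$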
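As a minimal sketch of the quadrature step, the following Python code approximates an expectation over a single standard normal shock with a fixed three-point Gauss-Hermite rule. The toy transition and value function are hypothetical stand-ins for $f_{t+1}$ and the interpolated $j_{t+1}$; the case study's actual model, shock distribution, and number of nodes are not reproduced here.

```python
import math

# Three-point Gauss-Hermite rule for omega ~ N(0, 1) (probabilists' form):
# nodes 0 and +/-sqrt(3), weights 2/3 and 1/6. Exact for polynomials of degree <= 5.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def expected_value(j_next, transition, p, s):
    """Approximate E[j_next(transition(p, s, omega))] by a weighted sum over nodes."""
    return sum(w * j_next(transition(p, s, omega))
               for omega, w in zip(NODES, WEIGHTS))


def toy_transition(p, s, omega):
    """Hypothetical state transition: wealth s, risky share p, shocked return."""
    return s * (1.0 + p * (0.05 + 0.2 * omega))


def toy_value(wealth):
    """Hypothetical next-period value function (stands in for the interpolant)."""
    return wealth * wealth


approx = expected_value(toy_value, toy_transition, p=1.0, s=1.0)
exact = 1.05 ** 2 + 0.2 ** 2  # E[(1.05 + 0.2 * omega)^2] for omega ~ N(0, 1)
print(approx, exact)
```

Because the toy value function is a quadratic polynomial, the three-point rule reproduces the exact expectation up to rounding; for a general interpolant the rule is only an approximation.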
14.7.2.1 Parallelization
The dynamic programming approach yields a set of independent optimiza-
tion problems in every period t: for every grid point in the state space grid
Note that this is a fixed-size parallelization problem that should scale strongly
until Pmax is reached.
For the parallelization, a master–worker pattern is employed: one of the P
computing units is the designated master, and the other P − 1 computing units
become the workers. The master first generates a (possibly random) execution
order over all multi-indexes i from the index set I, and then assigns to each
worker the optimization problem associated with one grid point indexed by i.
Every time a computing unit idles, the master dispatches the next problem
from the predefined order to this idle computing unit until all problems
are solved. Figure 14.4 contains a schematic representation of the approach.
If there is no a priori knowledge about the runtime of the single problems
that are dispatched in parallel, the master–worker pattern is a simple
load-balancing solution that in practice achieves very high speedups.
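The dispatch logic can be sketched in a few lines of Python. This is an illustration of the pattern only, not the case study's MATLAB implementation: the thread pool plays the role of the workers, the calling thread plays the master, and `solve_grid_point` is a hypothetical placeholder for the optimization at one grid point.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def solve_grid_point(index):
    """Hypothetical placeholder for the optimization at one grid point."""
    return index, sum(i * i for i in index)


def master(grid_indices, workers, seed=0):
    """Fix a (random) execution order, then let idle workers pull the next problem."""
    order = list(grid_indices)
    random.Random(seed).shuffle(order)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool's internal queue hands the next problem to whichever worker is idle.
        return dict(pool.map(solve_grid_point, order))


grid = [(i, j) for i in range(4) for j in range(4)]
results = master(grid, workers=3)
print(len(results), results[(3, 2)])
```

Threads are used here only to keep the sketch self-contained; for CPU-bound optimizations in Python, a process pool or a cluster scheduler would be the more realistic choice.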
14.7.2.2 Implementation
The computations are performed using MATLAB version R2012b on a
proprietary computer cluster consisting of heterogeneous high-performance
servers with CPU speed varying from 2 GHz to 3.3 GHz, number of cores
from 12 to 16 and RAM size from 48 GB to 128 GB per server. All servers are
stacked in one air-cooled rack and are connected via Gigabit Ethernet. The
servers run on Windows HPC Server 2008 with Microsoft HPC Pack as a job
scheduler.
The parallelization is done using MATLAB’s parfor command that imple-
ments a master–worker pattern. For the optimization, the gradient-based con-
strained optimization solver fmincon from the MATLAB Optimization Tool-
box is used. Besides standard MATLAB, the Parallel Computing Toolbox
and MATLAB Distributed Computing Server are required to use the parfor
construct for parallelization across MATLAB sessions running on multiple
servers.
14.7.3 Results
To evaluate the optimal choices obtained from the optimization, a Monte
Carlo simulation over the life cycle from age 20 to age 99 for 100,000 paths is
performed. In every path and every time step, the optimal choices from the
numerical solution are evaluated to determine the path’s evolution. Finally,
labor income, stock and bond investment, consumption, and annuity purchase
profiles are generated by averaging over all paths, see Figure 14.5.
[Figure 14.4, panels (a) and (b): schematic of the master–worker pattern, in which a master controls and communicates with computing units P1 to P4 and dispatches numbered problems from a queue.]
[Plot: averages in 1000 US dollars (0 to 70) against age (20 to 100) for consumption, stock holdings, bond holdings, annuity purchases, and labor income.]
FIGURE 14.5: Average life cycle profile for consumption, stock and bond
holdings, annuity purchases, and labor income in total US dollar values.
Further Reading
Section 14.1 [1] contains the current TOP500 list, various statistics, his-
torical data, and an extensive FAQ regarding the TOP500. In Reference 4, the
LINPACK benchmark is published and available for download. For detailed
information on the LINPACK benchmark, see also [14]. Supercomputers as a
platform for heavily parallel applications are discussed in Reference 15.
Section 14.6 [11] is the basis for the case study. Reference 27 covers the
special class of ANOVA decomposition used. Reference 28 is a comprehensive
reference for sparse grids. Reference 29 proposes the parallelization approach
applied.
Section 14.7 [13] comprises and extends this case study for the same life
cycle model, which is taken from Reference 12.
References
1. Top500 homepage. Available: [Link] 2017.
4. Petitet, A., Whaley, C., Dongarra, J., and Cleary, A., LINPACK. [Online]. Avail-
able: [Link] 2008.
11. Schober, P., Schröder, P., and Wittum, G., Efficient parallel solution methods
for high-dimensional option pricing problems, In revision at the Journal of Com-
putational Finance, available at SSRN 2591254, 2015.
12. Horneff, W. J., Maurer, R. H., and Stamos, M. Z., Life-cycle asset allocation
with annuity markets, Journal of Economic Dynamics and Control, vol. 32, no.
11, pp. 3590–3612, 2008.
13. Horneff, V., Maurer, R., and Schober, P., Efficient parallel solution methods
for dynamic portfolio choice models in discrete time, Available: SSRN 2665031,
2016.
14. Dongarra, J., Luszczek, P., and Petitet, A., The LINPACK benchmark: Past,
present and future, University of Tennessee, Tech. Rep., 2002.
15. Vajteršic, M., Zinterhof, P., and Trobec, R., Overview—Parallel Computing:
Numerics, Applications, and Trends. Springer, London, United Kingdom, 2009.
17. Lindenstruth, V., The L-CSC construction and its applications, In Presentation
at the ISC Conference, Frankfurt, 2015.
18. Rajovic, N., Carpenter, P. M., Gelado, I., Puzovic, N., Ramirez, A., and Valero,
M., Supercomputing with commodity CPUs: Are mobile SoCs ready for HPC?
In High Performance Computing, Networking, Storage and Analysis (SC), 2013
International Conference for IEEE, pp. 1–12, 2013.
21. Message Passing Interface Forum, MPI: A Message passing interface standard,
Version 2.2. [Online]. Available: [Link]
[Link], 2009.
23. Wilmott, P., On Quantitative Finance, 2nd ed. John Wiley & Sons, Ltd., Chich-
ester, United Kingdom, vol. I–III, 2006.
25. Amdahl, G. M., Validity of the single processor approach to achieving large scale
computing capabilities, In Proceedings of the April 18–20, 1967, Spring Joint
Computer Conference. ACM, pp. 483–485, 1967.
27. Schröder, P., Gerstner, T., and Wittum, G., Taylor-like ANOVA expansion for
high-dimension PDEs in finance, Working paper, 2013.
28. Bungartz, H.-J. and Griebel, M., Sparse grids, Acta Numerica, vol. 13, pp. 147–
269, 2004.
29. Schröder, P., Mlynczak, P., and Wittum, G., Dimension-Wise Decomposi-
tions and Their Efficient Parallelization. World Scientific, ch. 13, pp. 445–472,
2013, [Online]. Available: [Link]
9789814436434 0013.
Chapter 15
Multiscale Dataflow Computing
in Finance
CONTENTS
15.1 Introduction
15.2 The Dataflow Paradigm
15.3 Maxeler Dataflow Systems
15.3.1 DFEs in the cloud
15.4 Dataflow Programming Principles
15.5 Development Process and Design Optimization
15.6 A Case Study: Correlation
15.7 Financial Application Examples
15.7.1 Maxeler RiskAnalytics platform
15.7.2 Interest rate swap pricing
15.7.3 Value-at-risk
15.7.4 Exotic interest rate pricing
15.7.5 Credit value adjustment capital
15.7.6 Standard initial margin model
15.8 Conclusion
References
15.1 Introduction
Computer technology has become an essential driver for the financial
industry in almost all of its areas. Advances in hardware and software
technology, novel numerical methods, financial models, and algorithms have made
computers a key technology for all financial institutions.
High-performance computing (HPC) systems are widely used to price finan-
cial products or to quickly calculate the risk of complex portfolios. Often,
the available computational power determines the types of problems that can
be practically solved. Being able to handle a more complex problem or to
obtain the results faster than all other organizations directly translates into a
competitive advantage.
part of the application while host processors are tasked with control-intensive
tasks and also with setting up and controlling the computation on the DFE.
Depending on the nature of the problem, one can also adopt a combined
processing approach where the processor computes the less demanding part
of the application while the DFE will target the performance-critical part.
This results in a codesign approach where we develop a conventional pro-
cessor application together with a customized DFE implementation. In the
following, we first cover Maxeler dataflow systems, followed by programming
principles and custom optimizations.
[Diagram: an MPC-C node with x86 CPU cores and main memory connected via PCIe Gen 2 to up to 4 dataflow engines, each with dataflow cores and 48 GB of LMem, linked by MaxRing.]
FIGURE 15.2: MPC-C series architecture. A single node contains both x86
CPUs and 4 DFEs connected via PCIe and MaxRing.
[Diagram: an MPC-X node with 8 dataflow engines, each with dataflow cores and 48 GB of LMem, linked by MaxRing and connected through an Infiniband switch fabric to x86 CPU cores and memory.]
FIGURE 15.3: MPC-X series architecture with eight DFEs inside a node.
The system also includes storage and networking, and it is integrated into a
dense 1U¹ industry-standard rack unit. Such a system supports simple
standalone deployment of DFE technology, tightly coupled with high-end CPUs.
The architecture is beneficial for high-performance applications that run on a
fixed number of CPU cores and continuously use one or multiple DFEs.
The MPC-X series enables a more heterogeneous system architecture
supporting dynamic balancing of CPU and DFE resources. MPC-X series systems
are pure DFE nodes without any CPUs (see Figure 15.3). An MPC-X system
combines 8 DFE cards in a 1U chassis directly connected through MaxRing.
DFEs are also linked through Infiniband to a cluster of CPU nodes. The
system can dynamically allocate arbitrary (often large) numbers of DFEs,
providing good scalability and flexibility for applications with changing
behavior, for example, when the computation has several stages that vary in their
characteristics. The ratio of CPUs to DFEs can be matched to user application
requirements.
¹ U is the abbreviation of rack unit. It is the standard unit of measure for the vertical
usable data center space, with 1U defined as a height of 1.75 inches (44.45 mm). A typical
full-size rack cage is 42U high, composed of multiple 1U, 2U, or 4U boxes.
FIGURE 15.4: MPC-N series architecture with four DFEs inside a node.
Maxeler’s MPC-N series systems (see Figure 15.4) are a network-oriented
platform that provides Ethernet connections directly to the DFEs, supporting
ultra low-latency line-rate processing of multiple 10–40 Gbit data streams.
A single MPC-N node contains up to 4 DFE cards, similar to the MPC-C
series architecture. However, each DFE card also supports up to 3 QSFP+
40 Gbit Ethernet connections, where each 40 Gbit port can be split into 4 ×
10 Gbit ports. Providing fast Ethernet connections directly to the DFE enables
network processing with minimal latency. The memory architecture in DFE
also differs from the two previous system architectures: in addition to 24 GB
DRAM available as LMem, the DFE also integrates 72 MB of QDR SRAM
(QMem) supporting very low latency off-chip data access. The system contains
additional 10 Gbit connections to the CPU. MPC-N series systems are well
suited for a range of networking applications including gateways, aggregators,
or endpoints.
Maxeler systems are provided with a compilation and simulation environ-
ment (called MaxCompiler) for application development, and the MaxelerOS
system management environment. MaxelerOS coordinates the use of DFE
resources at run time, and manages the scheduling and data movement within
Maxeler systems. MaxCompiler provides a high-level programming
environment to express dataflow structures, and produces the necessary binaries for
both the CPU and the DFE.
with no moving parts and hence faster and more predictable read and write times.
(the so-called .max file). The computation will later be performed by loading
the .max configuration file into the DFE and streaming the data through it.
Before we can do this, we need to modify the CPU application to invoke the
DFE. To simplify this process, MaxCompiler will generate the necessary func-
tion prototypes and header files. The CPU code is then compiled as usual and
is linked with the .max file and Maxeler’s Simple Live CPU (SLiC) interface
library. The result of this is a single executable file that contains all the binary
code to run on both the conventional CPUs and the DFEs in a system.
Let us focus on the principles of dataflow programming in MaxJ. As men-
tioned previously, MaxJ is a metalanguage that describes dataflow computing
structures; it uses Java syntax but is in principle different from regular Java
programming (or other imperative programming paradigms that describe com-
putations by changing state). The most important principle in MaxJ is that
we describe a fixed spatial dataflow structure that can perform computations
by simply streaming through data, and not a sequence of instructions to be
executed on a traditional processor.
To illustrate these principles, we show how a simple loop computation can
be transformed into a dataflow description using MaxJ. Let us assume we
want to calculate $y = x^2 + 3x + 17$ over a data set. Even though there is
nothing inherently sequential in this computation, a conventional C program
would require a for loop. This is illustrated in Figure 15.7. The calculation is
repeated for the number of data elements in a loop. Within the loop body, all
operations also run sequentially.
In contrast, a dataflow implementation would focus on identifying the core
part of the computation and creating a data path for it. Figure 15.8 illustrates
such a dataflow implementation. The same computation that is described
inside the loop body can be performed by a fixed data path that contains two
multipliers and two adders. A key feature of dataflow computing is that
several operators are present at the same time and run concurrently,
instead of sharing a single functional unit over time inside a processor. A practical
dataflow implementation can have thousands of operators in a data path all
running concurrently. Another important principle is the absence of control
and instructions. The data path is fixed and the computation is performed by
streaming data from memory directly into the data path.
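The contrast between a loop of instructions and a fixed data path can be imitated in ordinary software. The sketch below is plain Python rather than MaxJ, and all names are illustrative: `datapath` fixes the four operators once, and `stream` only pushes data through it. In hardware the four operators would work concurrently on different stream elements, which this sequential emulation does not capture.

```python
def datapath(x):
    """Fixed data path for y = x^2 + 3x + 17: two multipliers and two adders."""
    square = x * x              # multiplier 1
    triple = 3 * x              # multiplier 2
    partial = square + triple   # adder 1
    return partial + 17         # adder 2


def stream(xs):
    """Push every element of the input stream through the same fixed data path."""
    for x in xs:
        yield datapath(x)


print(list(stream([0, 1, 2, 3])))
```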
Figure 15.9 depicts the MaxJ kernel description that can generate the data
path shown in Figure 15.8. The MaxJ description begins by extending the
Kernel class (line 1). The Kernel class is part of the Maxeler Java extensions,
and the user develops their own kernels by using inheritance. Next, we define
a constructor for the SimpleCalc class (line 2). It is important to remember that
this MaxJ program will only run once to build the DFE configuration; the
constructor will facilitate building the dataflow implementation. To create the
streaming inputs and outputs for the kernel, the methods io.input (line 3)
and io.output (line 5) are used. Streaming inputs and outputs replace the for
loop in the original C code that iterates over data. The input method takes two
arguments: the name of the input that will be used by the manager to connect
the kernel and the data type of the input. In this case, we use a standard single
precision floating point format (8-bit exponent and a 24-bit mantissa), but
MaxJ also supports custom data types that can be defined by the user. This
is useful when optimizing the numerical behavior and performance, which will
    int x = 10;
    DFEVar y;
    DFEVar z;
    y = x; // ok, assign constant to run-time variable
    x = y; // compiler error, cannot read run-time variable into compile-time Java variable
    z = y; // ok, both handle run-time data
FIGURE 15.10: DFEVars handle run-time data, Java constants are evalu-
ated only at compile time.
be covered later. The output method uses three arguments: the name of the
output to be used by the manager, the variable to connect to the output, and
the data format. The computation itself (see Figure 15.9) is expressed in a
very similar way as in the original C code (line 4).
In MaxJ, the DFEVar object is used to handle run-time data. Since MaxJ
describes a dataflow graph rather than a procedure, we have to distinguish
between run-time values and compile-time values. Regular Java variables such
as int will be evaluated and fixed at compile time. Such variables can be
used as constants for improved code readability or to control the build of the
dataflow graph. The values of DFEVars are known only at run time when
data is streamed through the kernel. This means assigning a Java variable
to a DFEVar will result in a constant. However, it is not possible to read a
DFEVar and assign its value to a Java variable (see Figure 15.10).
The principle described above means that we can use Java variables and
control constructs to shape the structure of our dataflow graph. Let us consider
an example of a nested loop as shown in Figure 15.11. We observe that the
outer for loop performs an iteration over data, while the inner for loop
describes a computation with a cyclic dependency of v from one loop iteration
to another.
This example can be effectively transformed into a dataflow description
as illustrated in Figure 15.12. Again, the outer loop is replaced by streaming
inputs and outputs. The inner loop is described with the same for loop
statement in Java, but the compilation of this loop will result in an unrolled
implementation of the loop body in space, as depicted in Figure 15.13. Unlike
    class Loop extends Kernel {
        Loop() {
            // Streaming input replaces the outer loop over the data
            DFEVar d = io.input("d", dfeFloat(8, 24));
            DFEVar v = 2.91 - 2.0 * d;
            // Evaluated once at build time: unrolls the loop body four times
            for (int iteration = 0; iteration < 4; iteration += 1) {
                v = v * (2.0 - d * v);
            }
            io.output("output", v, dfeFloat(8, 24));
        }
    }
[Diagram: the initial estimate 2.91 − 2.0 × d feeds four chained copies of the loop body, labeled Iteration 0 to Iteration 3.]
FIGURE 15.13: The result of the MaxJ loop is an unrolled and pipelined
data path.
    DFEVar x = io.input("x", dfeFloat(9, 31));
    // Control input; named "a" here so that it does not clash with the output "y"
    DFEVar a = io.input("a", dfeFloat(9, 31));

    DFEVar y1 = x * 5;
    DFEVar y2 = x - 7;

    // Both y1 and y2 are computed; the ternary operator selects one of them
    DFEVar y = a > 3 ? y1 : y2;

    io.output("y", y, dfeFloat(9, 31));
FIGURE 15.14: Data-dependent control with the ternary operator and use
of a custom number format.
the original loop in C, the for loop in MaxJ does not carry out four iterations
at run time. Instead, the compiler can resolve the dependency of v from one
loop iteration to another and construct an unrolled, acyclic data path where
the calculation inside the loop body is replicated four times, and each v is
connected to the result from the previous iteration.
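The build-time nature of the loop can be imitated with a toy graph builder in Python. This is a hypothetical sketch, not the MaxJ API: `Node` overloads the arithmetic operators so that running the Python loop once creates four copies of the loop body, and `evaluate` then plays the role of streaming one value through the finished graph. The update v ← v(2 − d·v) is the Newton-Raphson iteration for the reciprocal 1/d, so the graph's output can be checked against a direct division.

```python
class Node:
    """Operator or input in a toy dataflow graph (hypothetical, not the MaxJ API)."""

    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __mul__(self, other):
        return Node("mul", self, other)

    def __rmul__(self, other):
        return Node("mul", other, self)

    def __sub__(self, other):
        return Node("sub", self, other)

    def __rsub__(self, other):
        return Node("sub", other, self)


def build_reciprocal_graph(iterations=4):
    """Runs once, at build time; each loop pass adds three operators to the graph."""
    d = Node("input", "d")
    v = 2.91 - 2.0 * d
    for _ in range(iterations):
        v = v * (2.0 - d * v)
    return v


def count_operators(node, seen=None):
    """Count distinct operator nodes reachable from the output."""
    seen = set() if seen is None else seen
    if not isinstance(node, Node) or node.op == "input" or id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_operators(arg, seen) for arg in node.args)


def evaluate(node, inputs, cache=None):
    """Stream one set of input values through the fixed graph."""
    cache = {} if cache is None else cache
    if not isinstance(node, Node):
        return node  # build-time constant baked into the graph
    if id(node) not in cache:
        if node.op == "input":
            cache[id(node)] = inputs[node.args[0]]
        else:
            a, b = (evaluate(arg, inputs, cache) for arg in node.args)
            cache[id(node)] = a * b if node.op == "mul" else a - b
    return cache[id(node)]


graph = build_reciprocal_graph()
print(count_operators(graph), evaluate(graph, {"d": 0.8}))
```

The operator count grows with the loop bound (2 operators for the initial estimate plus 3 per iteration), which illustrates why an unrolled loop consumes chip area in proportion to its trip count.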
The previous example has shown how a Java for loop can be used to
control the replication of statements inside the loop body into an unrolled
data path. Likewise, it is possible to use Java conditionals such as if or case
to control the construction of the dataflow graph. The Java if condition is
evaluated at compile time, and the block of code inside the conditional state-
ment will be added into the dataflow graph only if the condition is evaluated
as true.
However, we cannot use a Java conditional on DFEVars because their
value will be only known at run time. As previously mentioned, run-time
dependent behavior is undesirable as it is against the principles of static
dataflow computing. If a data-dependent decision needs to be made, then
this can be expressed using the ternary operator ? : (see Figure 15.14).
This example results in data-dependent control, but in the data path, both
y1 and y2 will be computed concurrently. At the output, we simply select
one of the two results, depending on the value of a. This switching will be
very fast and will not delay or stall the stream processing. However, it also
means that we require resources for both computations on the DFE chip even
though only one of the two outputs will be used at any time. This makes
this type of control effective for fast, small-scale switching. For switching
between larger blocks of computation, it might be more effective to imple-
ment separate DFE kernels and handle the switching and control from the
CPU host.
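A software analogue of this selection behavior, written in plain Python with illustrative names, makes the resource cost visible: both candidate results are computed for every stream element, and the condition only chooses which one is forwarded.

```python
def select_kernel(xs, controls):
    """Emulate y = a > 3 ? x * 5 : x - 7 on two input streams."""
    ys = []
    for x, a in zip(xs, controls):
        y1 = x * 5  # candidate 1 is always computed
        y2 = x - 7  # candidate 2 is always computed
        ys.append(y1 if a > 3 else y2)  # multiplexer forwards one result
    return ys


print(select_kernel([1.0, 2.0, 3.0], [4.0, 0.0, 3.5]))
```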
Figure 15.14 also illustrates that custom number formats other than con-
ventional single or double-precision floating point can be used. In this exam-
ple, we use a 9-bit exponent and a 31-bit mantissa, which offers better scaling
and precision than single precision (8, 24 bit) but less than double precision
(11, 53 bit). Likewise, it is possible to use any arbitrary fixed-point or integer
format. The application developer can use such custom number formats to
    DFEVar x = io.input("x", dfeFloat(8, 24));
    DFEVar prev = stream.offset(x, -1);
    DFEVar next = stream.offset(x, 1);
    DFEVar y = (prev + x + next) / 3;
    io.output("y", y, dfeFloat(8, 24));
FIGURE 15.15: Using stream offsets to access values with relative offsets
in the stream.
However, in some cases we need to access values that are ahead of or behind
the current element in the data stream. For example, in a moving average
filter we need to compute:
$$
y_i = \frac{x_{i-1} + x_i + x_{i+1}}{3} \tag{15.2}
$$
In dataflow computing, x is a stream rather than an indexed array, and we
need a way of accessing elements of the same stream with other indices than
the current one. This can be achieved with the stream.offset method that
accesses values with a relative offset from the current value in the stream. In
the moving average example (see Figure 15.15), we need the previous value
(−1) and the next value (+1).
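The effect of relative offsets can be emulated on an ordinary list. In the Python sketch below, `stream_offset` is a hypothetical helper that mimics reading a neighbor of the current element; skipping the first and last element is an assumption of this sketch, since the text does not say how the kernel treats stream boundaries.

```python
def stream_offset(xs, i, offset):
    """Value at a relative offset from the current stream position i."""
    return xs[i + offset]


def moving_average(xs):
    """Three-point moving average; the two boundary elements are skipped."""
    ys = []
    for i in range(1, len(xs) - 1):
        prev = stream_offset(xs, i, -1)
        nxt = stream_offset(xs, i, +1)
        ys.append((prev + xs[i] + nxt) / 3)
    return ys


print(moving_average([3.0, 6.0, 9.0, 12.0]))
```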
Figure 15.16 illustrates how a DFE application interacts with the CPU host
application. On the right side, we see the moving average kernel MAVKernel
from our last example. As previously mentioned, we also create a manager to
describe the connectivity between the kernel and the available DFE interfaces.
In Figure 15.16, the kernel is connected directly to the CPU, and all of the
communication will be facilitated via PCIe. The manager also makes visible
to the CPU application all the names of the kernel streaming inputs and
outputs. Compiling the manager and kernel will produce a .max file that can
be included in the host application code. In the host application, running
the moving average calculation will be performed with a simple function call
to MAVKernel(). In this example, the host application is written in C but
MaxCompiler can also generate bindings for a variety of other languages such
as MATLAB or Python.
MaxelerOS and the SLiC library provide a software layer that facilitates
the execution and control of the DFE applications. The SLiC Application
Programming Interface (API) is used to invoke the DFE and process data
on it. In the example in Figure 15.16, we use a simple SLiC interface and
the simple function call MAVKernel() will carry out all DFE control functions
such as loading the binary configuration file and streaming data in and out
over PCIe. More advanced SLiC interfaces are also available that provide the
user with additional control over the DFE behavior. For example, in many
cases it is beneficial to transfer the data to DFE memory (LMem) first and
then start the computation. This is one of many performance optimizations,
which we will briefly cover in the following section.
architecture while at the same time optimizing the dataflow structure to match
the requirements of the algorithm. Another key difference to traditional soft-
ware design is the implementation and optimization cycle. In software design,
a developer would typically implement a design, go on to profile and evaluate
the performance of the current implementation, and then tweak the implemen-
tation. In dataflow design, we adopt a different approach where the design is
optimized before it is implemented. The behavior inside a DFE is very pre-
dictable and we can therefore plan and precisely predict the performance of a
possible solution without even implementing it. This means the design will be
analyzed and optimized with simple spreadsheet calculations before we create
the final implementation.
This development process is illustrated in Figure 15.17. The first step con-
sists of an application analysis phase. The purpose of this step is to establish
an understanding of the application, the data set, the algorithms used, and
the potential performance-critical parts. Since we will codesign an algorithm
and its dataflow architecture, this analysis should cover all parts of the com-
putational problem, from the mathematical formulation and algorithm to the
architecture and implementation details. Typical considerations are the type
and regularity of the computation, the ratio between computation and mem-
ory accesses, the ratio of computation to disk IO or network communication,
and the balance between recomputation and storage of precomputed results.
All these aspects can have a significant impact on the performance of the
final implementation. If, for instance, an application is limited by the speed
at which data can be read from disk, then optimizing the throughput of the
compute kernel beyond that limit will have no benefit.
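This kind of back-of-the-envelope analysis can be written down as a short calculation. The sketch below assumes a simple model in which the sustained throughput of a streaming application equals that of its slowest stage; the stage names and rates are hypothetical and are not measurements from the text.

```python
def bottleneck(stage_rates):
    """Return the slowest stage and its rate; it bounds the whole pipeline."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical sustained rates in GB/s for each stage of an application.
rates = {"disk": 0.5, "pcie": 2.0, "kernel": 8.0}
print(bottleneck(rates))

# Doubling the kernel throughput leaves the application rate unchanged.
faster_kernel = dict(rates, kernel=16.0)
print(bottleneck(faster_kernel))
```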
The second step involves algorithmic transformations. A designer could
attempt to choose a different algorithm to solve the problem, or transform
the code, data access patterns, or number representations. A typical example
of an algorithmic transformation is to change the number format: choosing
a smaller number representation can support more IO bandwidth and
higher computational performance, but the numerical effects on the algorithm
have to be well understood. The reconfigurable technology used inside the
DFEs supports far greater flexibility in the available number formats than
conventional processors. Instead of choosing from single or double
precision floating point, a design can exploit a custom format with arbitrary
bit-widths of its exponent and mantissa. Another common optimization is the
(the pipeline depth). These values will vary between DFE generations and
can only be truly determined through performing place-and-route using the
reconfigurable chip vendor tools. The latter takes a significant amount of
compute time (typically tens of hours). Figure 15.18 shows the layout of the DFE
correlation with 12 parallel pipes each with a pipeline depth of 12 stages, so
that 144 data elements can be simultaneously processed with a throughput of
12 data elements per clock cycle. Even when assuming a low clock frequency,
this design is able to perform all 36 million required multiplications in less
than 30 ms.
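The runtime estimate can be reproduced with simple arithmetic. The sketch below assumes that each data element leaving the pipes corresponds to one multiplication, so that 12 multiplications complete per clock cycle, and it uses 100 MHz as a hypothetical "low" clock frequency; neither assumption is stated explicitly in the text.

```python
def runtime_seconds(operations, ops_per_cycle, clock_hz):
    """Time to stream all operations through a fully pipelined design."""
    cycles = operations / ops_per_cycle
    return cycles / clock_hz


# 36 million multiplications, 12 per cycle, assumed 100 MHz clock.
t = runtime_seconds(36_000_000, 12, 100e6)
print(f"{t * 1e3:.1f} ms")
```

Under these assumptions the bound of 30 ms is reached at exactly 100 MHz, and any higher clock frequency brings the runtime below it.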
• Database integration
interest rate swaps, involving bootstrapping the Overnight Index Swap (OIS)
curve and the London Interbank Offered Rate (LIBOR) curve, followed by gen-
erating swap cashflow schedules, valuing swaps, and calculating swap portfolio
risk. Each stage is available as either a CPU or a DFE library component and
can be accessed via a number of convenient APIs. The implementation provides
construction of and access to all intermediate and final objects.
Depending on the characteristics of the swap pricing application, DFE
acceleration can be beneficial at one or more stages of the computa-
tion. Table 15.1 illustrates two possible module configurations where the
performance-critical DFE acceleration can be carried out at different stages of
the pipeline. The modular design of Maxeler's RiskAnalytics allows the user
application to dynamically balance load between CPUs and DFEs, and to target
heavy compute load to DFEs, leaving CPUs to support application logic and
lighter compute loads. DFE functionality can be switched in real time by using
MaxelerOS SLiC API functions. Fully pipelined, a Maxeler DFE-equipped 1U
MPC-X node can value a portfolio of 10-year interest rate swaps at a rate of
over 2 billion per second—including bootstrapping of the underlying interest
rate curves.
15.7.3 Value-at-risk
Value-at-risk, or VaR, is a measure widely used to evaluate the risk of loss on
a portfolio over a given time period. VaR defines the loss amount that a
portfolio is not expected to exceed for a specified level of confidence over a given time
frame. VaR can be calculated in a number of ways (e.g., using fixed historical
scenarios, arbitrarily specified scenarios, a delta-based approach, or
Monte Carlo generated scenarios). Irrespective of the method chosen,
the VaR computation involves evaluating many possible market scenarios,
which is computationally very demanding. With any of these approaches,
the computation of VaR using conventional technology is frequently
slow and often inaccurate, as well as being unstable in the tail of the loss
distribution, resulting in uncertainty in risk attribution and difficulty in optimizing
against portfolio VaR targets. This is illustrated in Figure 15.20a, where the
tail of the loss distribution for a mixed portfolio of interest rate swaps exhibits
a stepwise profile, making it extremely difficult to accurately manage portfolio
VaR.
Mitigating these problems requires a massively increased number of
scenarios, both to provide higher resolution in the tail of the loss distribution
and to significantly improve stability for risk attribution and/or provide
FIGURE 15.20: Value-at-Risk with 10,000 scenarios (a) and 500,000 sce-
narios (b).
greater visibility of the impact of market and portfolio changes. This is clearly
illustrated when comparing Figure 15.20a and b. In the second case, the num-
ber of Monte Carlo scenarios is increased by a factor of 50, resulting in far
greater granularity in the tail of the loss distribution leading to improved accu-
racy of portfolio risk management. Fully pipelined, a Maxeler DFE-equipped
1U MPC-X node can compute full revaluation VaR on a portfolio of 250,000
10-year interest rate swaps (equivalent to a rate of over 2 billion swaps per
second)—including bootstrapping of the underlying interest rate curves, as
well as scenario construction.
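The effect of the scenario count on tail resolution can be illustrated with a
minimal sketch. It assumes a toy model in which each scenario loss is a
standard normal draw, which is not the swap portfolio of Figure 15.20; the
function names simulate_losses, value_at_risk, and tail_count are hypothetical
and chosen only for this illustration.

```python
import random


def simulate_losses(n_scenarios, seed=0):
    """Toy loss model (an assumption): one standard normal loss per scenario."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_scenarios)]


def value_at_risk(losses, confidence=0.99):
    """Empirical VaR: the loss not exceeded with the given confidence."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, int(confidence * len(ordered)))
    return ordered[index]


def tail_count(losses, confidence=0.99):
    """Number of scenarios at or beyond the VaR level."""
    var = value_at_risk(losses, confidence)
    return sum(1 for loss in losses if loss >= var)


if __name__ == "__main__":
    # Same scenario counts as the two panels of Figure 15.20.
    for n in (10_000, 500_000):
        losses = simulate_losses(n)
        print(n, round(value_at_risk(losses), 3), tail_count(losses))
```

At the 99% level, only about 1% of the scenarios lie in the tail, so 10,000
scenarios leave roughly a hundred points to describe it while 500,000 leave
roughly five thousand. This is the granularity difference the two panels show.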
Increasing the number of Monte Carlo scenarios as suggested above obvi-
ously increases the computational requirements, but with DFE acceleration,
the extra scenarios can be easily and practically achieved. When the accu-
racy of computation is increased, several new approaches to VaR can become
feasible:
a single 1U MPC-X node can replace between 19 and 221 conventional CPU-
based units. The power efficiency advantage due to the dataflow nature of the
implementation also ranges between 1 and 2 orders of magnitude.
[Figure: Capital (0.0% to 3.0%) under Current, SA-CCR, BA-CVA, SA-CVA, and SA-CVA (managed); series 2017 and 2019.]
their return on capital but should also be able to expand their OTC derivative
franchises relative to competitors who are more capital (and leverage ratio)
constrained.
A limiting factor for market participants is the computational complex-
ity of the CVA calculation, which requires an entire subportfolio of deriva-
tive assets to be projected through multiple scenarios to determine the
potential exposure to counterparty default. Since large portfolios of trades
between two parties can extend to tens of thousands of assets, each asset
must be projected until its maturity date (which can run as long as 30
years) and a large number of scenarios must be used to appropriately capture
tail risk, this quickly becomes a problem too large in scale for all
traditional CPU-based implementations (including state-of-the-art and
overclocked systems).
Utilizing the DFE-accelerated components of the Maxeler RiskAnalytics
library, CVA calculations can be made practical at enterprise scale for large
financial institutions. Banks are also becoming more rigorous about pricing
capital via KVA. This requires the simulation of future capital requirements
rather than just the calculation of spot capital. Under the SA-CVA approach,
for example, this is particularly time-consuming because the methodology is
sensitivity based, so the sensitivities would need to be calculated in all
possible future states. Nevertheless, due to the convexity of capital
requirements (i.e., capital can go up more than it can go down), such a
calculation is extremely important to accurately quantify KVA.
Figure 15.23 shows such a capital projection run on Maxeler DFEs. The
bold red line is the expected value, often known as the expected capital
profile (ECP).
[Figure: Capital (0% to 16%) against Time (years, 0 to 10).]
to risk factors, such as interest rates or equity prices, are computed and then
aggregated using a prescribed set of weights and correlations. One key drawback
of the framework as currently drafted is that the methodology to calculate
these sensitivities is not similarly prescribed, so counterparties may not
agree on the inputs to the model. Exposures may be netted within a single
counterparty portfolio, but are grossed up between counterparties, so firms
have a strong incentive to reduce initial margin requirements through careful
matching of trades and portfolio compression.
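The netting and grossing-up behaviour described above can be made concrete
with a minimal sketch. It assumes a generic sensitivity-based aggregation,
K = sqrt(sum_i sum_j rho_ij WS_i WS_j) with WS_i = w_i s_i, which is a
simplification and not the full margin methodology. The weights, the
correlation, and the sensitivities are made-up illustrative numbers, and all
function names are hypothetical.

```python
import math

# Illustrative inputs (assumptions): two risk factors, e.g. a rate and an equity price.
WEIGHTS = [0.5, 0.2]
CORRELATIONS = [[1.0, 0.3], [0.3, 1.0]]


def aggregate(sensitivities, weights, correlations):
    """K = sqrt(sum_i sum_j rho_ij * WS_i * WS_j), where WS_i = w_i * s_i."""
    ws = [w * s for w, s in zip(weights, sensitivities)]
    total = sum(correlations[i][j] * ws[i] * ws[j]
                for i in range(len(ws)) for j in range(len(ws)))
    return math.sqrt(max(total, 0.0))


def net_sensitivities(trades):
    """Net the per-risk-factor sensitivities of all trades in one portfolio."""
    return [sum(column) for column in zip(*trades)]


def margin_for_counterparty(trades):
    """Margin on the netted sensitivities of a single counterparty portfolio."""
    return aggregate(net_sensitivities(trades), WEIGHTS, CORRELATIONS)


def total_margin(portfolios):
    """Netted within each counterparty portfolio, summed (grossed up) across them."""
    return sum(margin_for_counterparty(trades) for trades in portfolios.values())


if __name__ == "__main__":
    trade, offset = [100.0, 50.0], [-100.0, -50.0]
    print(total_margin({"A": [trade, offset]}))        # offsetting pair, one counterparty
    print(total_margin({"A": [trade], "B": [offset]}))  # same pair, split across two
```

In this sketch, an offsetting pair of trades held with one counterparty nets to
zero margin, whereas the same pair split across two counterparties attracts
margin on both legs. This is the incentive for careful matching of trades.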
In order to calculate the initial margin requirement for a new trade, every
trade already existing between two counterparties must be revalued under
both a base scenario and a number of shocked scenarios, so that any risk off-
sets between the new trade and the existing portfolio are taken into account.
If this trading portfolio is already large, a significant performance cost can
be incurred simply transferring it from disk storage to memory—in a DFE-
accelerated solution, the portfolio can instead be stored in LMem and accessed
directly as needed. Combining this memory model with high-performance pric-
ing engines like that described in Section 15.7.2 allows for margin calculations
to occur at the speed of trading.
Maxeler’s SIMM calculation product on Amazon Web Services (AWS)
combines the ease of deployment of a cloud service with an industry-proven
risk analytics infrastructure. Our SIMM product splits naturally into the cal-
culation of sensitivities, and the application of risk weights and aggregation.
The Maxeler RiskAnalytics library provides the framework for calculating
greeks on CPUs as well as on Maxeler DFEs. Initial margin requirements can
be calculated directly from a portfolio of trades supplied in FpML format, or
15.8 Conclusion
Cutting-edge applications in computational finance depend on highly capable
computational systems, while scaling current computing technology is becoming
increasingly problematic. Maxeler has pioneered a novel vertically integrated,
dataflow-oriented approach that can deliver orders-of-magnitude improvements
in performance, data center space, and power consumption for a wide range of
real, highly demanding applications. DFEs provide a highly efficient
computational model for the data- and compute-intensive parts of an
application. In addition, DFE-based systems can be balanced with all other
computing system resources, for example, CPUs and storage, according to the
specific requirements of the application. Elastic scaling of these systems to
the public cloud to cope with peak performance demands is another key
advantage. Maxeler supports a high-level programming model that allows
application experts to harness the computational power of dataflow systems and
focus on optimizing their applications all the way from the formulation of the
algorithm down to the design of the best possible dataflow architecture for
its solution. This dataflow technology is key to many finance applications
where a more complex model, larger data sets, or more frequent recomputation
often translates directly into monetisable competitive advantage. A number of
DFE-based products for analytics and trading are available from Maxeler, and
we have described several practical application scenarios that could not be
achieved with conventional, general-purpose computing technology.
470 High-Performance Computing in Finance
References
1. Godfrey, M., Hendry, D.: The computer as von Neumann planned it. Annals of
the History of Computing, IEEE 15(1), 11–21, 1993.
2. Pell, O., Mencer, O.: Surviving the end of frequency scaling with reconfigurable
dataflow computing. SIGARCH Computer Architecture News 39(4), 60–65, 2011.
3. Chau, T.C.P., Niu, X., Eele, A., Maciejowski, J., Cheung, P.Y.K., Luk, W.:
Mapping adaptive particle filters to heterogeneous reconfigurable systems. ACM
Transactions on Reconfigurable Technology and Systems 7(4), 36:1–36:17, 2014.
4. Thomas, D.B., Luk, W.: Multiplierless algorithm for multivariate gaussian ran-
dom number generation in FPGAs. IEEE Transactions on VLSI Systems 21(12),
2193–2205, 2013.
5. Lindtjorn, O., Clapp, R., Pell, O., Fu, H., Flynn, M., Mencer, O.: Beyond tradi-
tional microprocessors for geoscience high-performance computing applications.
IEEE Micro 31(2), 41–49, 2011.
6. Weston, S., Spooner, J., Racanière, S., Mencer, O.: Rapid computation of value
and risk for derivatives portfolios. Concurrency and Computation: Practice and
Experience 24(8), 880–894, 2012.
10. Jin, Q., Dong, D., Tse, A.H.T., Chow, G.C.T., Thomas, D.B., Luk, W.,
Weston, S.: Multi-level customisation framework for curve based Monte Carlo
financial simulations. In: Reconfigurable Computing: Architectures, Tools and
Applications—8th International Symposium, ARC, pp. 187–201. Springer, 2012.
11. Fu, H., Gan, L., Clapp, R.G., Ruan, H., Pell, O., Mencer, O., Flynn, M.J.,
Huang, X., Yang, G.: Scaling reverse time migration performance through recon-
figurable dataflow engines. IEEE Micro 34(1), 30–40, 2014.
12. Pell, O., Bower, J., Dimond, R., Mencer, O., Flynn, M.J.: Finite-difference wave
propagation modeling on special-purpose dataflow machines. IEEE Transactions
on Parallel Distributed Systems 24(5), 906–915, 2013.
13. Gan, L., Fu, H., Yang, C., Luk, W., Xue, W., Mencer, O., Huang, X., Yang,
G.: A highly-efficient and green data flow engine for solving Euler atmospheric
equations. In: 24th International Conference on Field Programmable Logic and
Applications, FPL 2014, Munich, Germany, September 2–4, 2014, pp. 1–6. IEEE,
2014.
14. Arram, J., Luk, W., Jiang, P.: Ramethy: Reconfigurable acceleration of bisulfite
sequence alignment. In: Proceedings of the International Symposium on Field-
Programmable Gate Arrays (FPGA), pp. 250–259. ACM, 2015.
Chapter 16
Manycore Parallel Computation
CONTENTS
16.1 Introduction
16.1.1 What a practitioner needs to know
16.1.2 Outline of the chapter
16.2 The Parallelism Imperative
16.2.1 Moore’s Law
16.2.2 Dennard Scaling
16.2.3 Performance then and now
16.3 Systems Architecture
16.3.1 Building blocks
16.3.2 Basic CPU architecture
16.3.3 Basic GPU architecture
16.4 Parallelism and Performance
16.4.1 Amdahl’s Law
16.4.2 Gustafson’s Law
16.4.3 Little’s Law
16.4.4 Task parallel
16.4.5 Instruction parallel
16.4.6 Data parallel
16.5 Parallelism and Execution
16.5.1 Logical threading models
16.5.2 Physical execution models
16.5.3 Data structures
16.6 Putting It All Together
16.7 The LIBOR Market Model
16.7.1 The LMM in discrete time
16.7.2 Multiple regression
16.7.3 Packages and hardware
16.7.4 Design overview
16.7.5 Memory use, threads, and blocks
16.7.6 Path generation
16.7.7 Product specification and design
16.7.8 Least-squares and multiple regressions on the GPU
16.7.9 The data collection phase
16.1 Introduction
In this chapter, we will argue that there is a parallelism imperative: quants
must learn to write effective parallel code in order to take advantage of future
computing hardware. We provide a grounding in the basic computer science
and hardware considerations needed to explore effective parallelism in more
depth. These are, fortunately, far simpler than the typical financial mathemat-
ics encountered in computational finance. We spend the bulk of the chapter
applying parts of these foundational points in working a detailed example of
coding a nontrivial early exercise LMM problem on the GPU, which highlights
the key issues of writing highly parallel code. The results clearly highlight the
parallelism imperative—quants who leverage parallel execution well can gain
a significant advantage over competitors who do not.
Section 16.3. In Sections 16.4 and 16.5, we start to fill our parallel computing
design considerations toolbox. We switch gears in Section 16.6, and begin to
work an actual example, which we detail in Section 16.7.
sockets. Each core will have a dedicated set of 2/4 or 4/8 single/double pre-
cision vector lanes.
CPUs based on other microarchitectures such as ARM or IBM Power will
have different characteristics, but these are currently less commonly used in
finance and are out of the scope of this chapter, as are systems based on
AMD’s x86 chips.
time of the code is $P$, and the total execution time is $T_{\text{old}}$, then
if you can infinitely accelerate the parallel portion of the code, or
equivalently, drive the execution time of the parallel portion of the code to
zero, then the new execution time is given by
$$T_{\text{new}} = (1 - P)\,T_{\text{old}},$$
This is closely related to the concept of strong scaling, which looks at the ratio
of improvement in compute time to the number of processing units engaged in
the computation. Normally, there is a point where strong scaling breaks down,
as communications and coordination costs overwhelm the actual computation.
The strong scaling properties of a code on a system limit our ability to achieve
the theoretical Amdahl limit. Combined with Gustafson’s Law, below, this is
extremely useful in defining expectations for performance on existing and future
hardware.
From a parallel speedup perspective, if a code spends 90 seconds in
potentially parallel sections and 30 seconds in serial sections, a 3× speedup
of the parallel code reduces execution time from 2 minutes to 1 minute
(30 + 90/3 = 60 seconds). If in a second phase of the performance tuning the
parallel section is accelerated to 9× its original speed, the overall elapsed
time only goes to 40 seconds (30 + 90/9). This highlights the law of
diminishing returns from a performance perspective.
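The arithmetic of this example can be reproduced with a short sketch. The
function names amdahl_time and overall_speedup are hypothetical; the only
assumption is that the serial part is unaffected by the acceleration.

```python
def amdahl_time(serial_seconds, parallel_seconds, speedup):
    """Elapsed time when only the parallel part is accelerated by `speedup`."""
    return serial_seconds + parallel_seconds / speedup


def overall_speedup(serial_seconds, parallel_seconds, speedup):
    """Ratio of the original elapsed time to the accelerated elapsed time."""
    original = serial_seconds + parallel_seconds
    return original / amdahl_time(serial_seconds, parallel_seconds, speedup)


if __name__ == "__main__":
    # 30 seconds serial, 90 seconds parallel, as in the example in the text.
    for s in (1, 3, 9, 1_000_000):
        print(s, amdahl_time(30.0, 90.0, s),
              round(overall_speedup(30.0, 90.0, s), 3))
```

However large the speedup of the parallel part, the elapsed time cannot drop
below the 30 serial seconds, so the overall speedup stays below 120/30 = 4.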
From a strong scaling perspective, if the problem at hand is to evaluate
32,000 paths, and the parallelism of the problem is at the path level, then
adding threads up to 32,000 could result in some performance improvement—
any threads/cores added after that have no useful work to do and contribute
no additional performance. Strong scaling performance cliffs are not always
this dramatic.
in the register file is fast. Because of the limit on the number of resident
threads, the compiler and the hardware need to ensure that a given thread
completes as much work as possible, and keeps the CPU core as busy as
possible, while it is active. Logically, threads have great flexibility in passing
messages and sharing memory, with a few more options for threads within a
given process than for threads between processes. CPU threads thrive on task
parallelism, have dedicated hardware to take advantage of ILP, and if written
to take advantage of the vector units, have some ability to leverage them for
data parallelism as well. Highly performant code must take advantage of all
three types of parallelism.
A logical thread on a GPU is quite different. A GPU thread is always part
of a hierarchy that includes a local thread block and a set of thread blocks
called a grid (which in our definitions here is a task). A GPU thread is a very
lightweight construct, with the majority of startup and teardown managed in
hardware. GPU threads have three main states—waiting to be active, active
and in the register set of a core, and completed. All the threads in a local
thread block are in the same main state; and threads within an active thread
block can communicate and synchronize through a variety of mechanisms.
Threads that are not in the same thread block can only communicate through
the main device memory. GPUs excel at data parallelism, although they can
take limited advantage of ILP and, particularly with more recent GPUs, can
exploit task parallelism as well. The logical threading model for the GPU does
not allow the programmer to assume or force any particular ordering for the
execution of thread blocks.
Whereas efficient CPU code will normally have one or two active threads
per core, efficient GPU code will have many active threads per core—and
with many more cores. Cooperative CPU codes may use dozens of threads,
but GPUs will not function efficiently without thousands of threads, and rou-
tinely are used with millions of logical threads. This is very much due to
differences in the complexity of the large CPU cores versus the smaller but
more numerous GPU cores (bringing Amdahl’s and Gustafson’s Laws into
play) and the latency and bandwidth of the memory subsystems (bringing in
Little’s Law).
When a core needs data that isn’t already in the local register file, it will
normally begin a cascade of checks down the cache hierarchy until the memory
management subsystem finds the data it is looking for; this is then hoisted
through the various cache layers into the local register file where it can be
used as an operand in instructions. Writes that spill out of the register file or
are directed at memory proceed to invalidate caches in other cores until they
reach system memory, causing cache synchronization traffic.
On a GPU, things are less familiar but in many ways simpler. When a
task (kernel) is launched onto the GPU by a CPU side thread, a grid of GPU
threads is created. The threads are grouped into thread blocks, and each
thread block is independently scheduled onto a single SM. Depending on the
hardware and resources required, several thread blocks may be resident on
an SM at a given time and many SMs can be servicing thread blocks from
a grid at the same time. Once a thread block is active, it will remain active
on that SM until all its threads have exited, having processed their section of
the overall dataset. As a programmer, you have no control over or ability to
assume what order thread blocks will execute in nor can you communicate,
coordinate, or synchronize with other thread blocks except through global
device memory.
Memory management on the GPU presents several more options that the
programmer can consider. Reads and writes to and from device global memory
behave similarly to CPU reads and writes. Alternatively, a thread
block can declare a small area of memory to be shared memory. This creates
a private allocation of memory, physically drawn from the same pool as the L1
cache, that is visible to all the threads in the thread block and to no threads
outside that thread block. Because this data is very near to the cores in the
memory hierarchy, it is extremely fast. And because it is not coherent with
any other memory in the system, it pays no synchronization or more distant
memory management costs. It is effectively a private working space for the
threads in a thread block to share data.
Drawing on their graphics heritage, GPUs also have additional paths from
global device memory to the cores that bypass some or all of the conventional
caches—the most commonly used are the constant cache and texture cache.
The constant cache is now well leveraged by the compiler and, as its name
suggests, is well suited to delivering a very small number of values to all the
threads in the SM. The texture cache is another path to memory that has
slightly different caching behavior from the standard caches. The usefulness
of the texture cache to the programmer changes from generation to generation
on the GPU. Some algorithms and data access patterns can make good use of
the texture cache.
cache line and uses 3/4 of it. Effective BW goes from 0.45 to 0.75 of peak—a
huge improvement. In the second example, each thread needs all three cache
lines—9 requests to memory, with an efficiency of 0.25 as opposed to 0.32,
which is only slightly worse.
Good data structure design needs to balance usability and performance.
As these toy examples show, data layout can have a huge impact on how
efficient a code can be on any given hardware.
products are the most common exotic interest rate derivatives. However, no
special features of the product are used and other products can be handled
by modifying the functions defining the coupons with no changes elsewhere.
We note that there has been previous work on the use of GPUs for Bermu-
dan/American options. Dang, Christara, and Jackson (2010) proceed using a
PDE approach. Abbas-Turki and Lapeyre (2009) use a least-squares Monte
Carlo approach. However, the case they study is four-dimensional and indica-
tive rather than realistic so it is difficult to know how their techniques would
translate to the high-dimensional interest rate case. Most other work appears
to be focussed on binomial trees and/or the one-dimensional case.
In this example, we focus on lower bounds. However, another part of the
Bermudan pricing problem is upper bounds: one cannot be sure a lower bound
is good without a nearby upper bound. To do a regression-based method such
as that in Joshi and Tang (2014) would not be a particularly hard extension of
the work here. First, one would have to compute Deltas along each path using
adjoint differentiation techniques. Second, one would have to use similar tech-
niques to those here to compute regression estimates of their value. Third, one
would run a hedging simulation using these estimates. To do a method such
as Andersen and Broadie (2004) would be more challenging since it involves
running many subsimulations. One could put each subsimulation on the GPU
as a separate kernel; however, that would result in a very large number of
kernel launches and possibly not much of a relative speedup unless large num-
bers of paths were being used. We defer the problem of designing a smarter
implementation to future work.
We review the LIBOR market model in Section 16.7. We discuss its algo-
rithmic implementation in a discretized setting in Section 16.7.1. We develop
new ideas for early exercise in Section 16.7.2. We discuss the software and
hardware used in Section 16.7.3. We outline the design of the code in
Section 16.7.4. The intricacies of memory use are examined in Section 16.7.5. We
study how to evolve the LIBOR market model on the GPU in Section 16.7.6.
The specification of products is done in Section 16.7.7. We go into the imple-
mentation details of regression on the GPU in Section 16.7.8. Section 16.7.9
covers methodologies for data collection to prepare for least-squares. Pricing
is discussed in Section 16.7.10. We present timings and numerical results in
Section 16.7.11 and we conclude in Section 16.8.
Mark would like to thank Oh Kang Kwon for his assistance with coding a
Brownian bridge and with a skipping Sobol generator. Mark is also grateful
to Jacques Du Toit for his comments on an earlier version of this work.
as Andersen and Piterbarg (2010), Brace (2007), Brigo and Mercurio (2006),
and Joshi (2011) for more details.
Since it was given a firm theoretical base in the fundamental papers by
Brace, Gatarek, and Musiela (1997), Musiela and Rutkowski (1997), and
Jamshidian (1997), the LIBOR market model has become a very popular
method for pricing interest rate derivatives. It is based on the idea of evolving
the yield curve directly through a set of discrete market observable forward
rates, rather than indirectly through the use of a single nonobservable quantity
which is assumed to drive the yield curve.
Suppose we have a set of tenor dates, $0 = T_0 < T_1 < \cdots < T_{n+1}$, with
corresponding forward rates $f_0, \ldots, f_n$. Let $\delta_j = T_{j+1} - T_j$,
and let $P(t, T)$ denote the price at time $t$ of a zero-coupon bond paying one
at its maturity, $T$. Using no-arbitrage arguments,
$$f_j(t) = \frac{\dfrac{P(t, T_j)}{P(t, T_{j+1})} - 1}{\delta_j},$$
where $f_j(t)$ is said to reset at time $T_j$, after which point it is assumed
that it does not change in value. We work solely in the spot LIBOR measure,
which corresponds to using the discretely compounded money market account as
numeraire, because this has certain practical advantages (Joshi, 2003a). This
numeraire is made up of an initial portfolio of one zero-coupon bond expiring
at time $T_1$, with the proceeds received when each bond expires being
reinvested in bonds expiring at the next tenor date, up until $T_n$. More
formally, the value of the numeraire portfolio at time $t$ will be
$$N(t) = P\bigl(t, T_{\eta(t)}\bigr) \prod_{i=1}^{\eta(t)-1} \bigl(1 + \delta_i f_i(T_i)\bigr),$$
and thus gives the index of the next forward rate to reset.
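The no-arbitrage relation between zero-coupon bond prices and forward rates
given above can be checked numerically with a minimal sketch. The function
names forward_rates and prices_from_forwards and the sample numbers are
hypothetical; prices are taken at t = T_0 = 0, so that P(0, T_0) = 1.

```python
def forward_rates(bond_prices, tenor_dates):
    """f_j = (P(t, T_j) / P(t, T_{j+1}) - 1) / delta_j, delta_j = T_{j+1} - T_j.

    bond_prices[j] is P(t, T_j) and tenor_dates[j] is T_j.
    """
    rates = []
    for j in range(len(tenor_dates) - 1):
        delta = tenor_dates[j + 1] - tenor_dates[j]
        rates.append((bond_prices[j] / bond_prices[j + 1] - 1.0) / delta)
    return rates


def prices_from_forwards(rates, tenor_dates, first_price=1.0):
    """Invert the relation: P(t, T_{j+1}) = P(t, T_j) / (1 + delta_j * f_j)."""
    prices = [first_price]
    for j, rate in enumerate(rates):
        delta = tenor_dates[j + 1] - tenor_dates[j]
        prices.append(prices[-1] / (1.0 + delta * rate))
    return prices


if __name__ == "__main__":
    tenors = [0.0, 0.5, 1.0, 1.5]   # illustrative semi-annual tenor dates
    rates = [0.03, 0.035, 0.04]     # illustrative forward rates f_0, f_1, f_2
    prices = prices_from_forwards(rates, tenors)
    print(prices)
    print(forward_rates(prices, tenors))  # recovers the input rates up to rounding
```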
Under the displaced diffusion LIBOR market model, the forward rates
that make up the state variables of the model are assumed to be driven by
the following process:
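$$df_i(t) = \mu_i(f, t)\,(f_i(t) + \alpha_i)\,dt + \sigma_i(t)\,(f_i(t) + \alpha_i)\,dW_i(t), \tag{16.1}$$
where
$$\mu_i(f, t) = \sum_{j=\eta(t)}^{i} \frac{(f_j(t) + \alpha_j)\,\delta_j}{1 + f_j(t)\,\delta_j}\,\sigma_i(t)\,\sigma_j(t)\,\rho_{i,j};$$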
• The pseudo-square root, Aj−1 , of the covariance matrix of the log forward
rates for each time step from Tj−1 to Tj
Note that one could equivalently specify the covariance matrix, Cj , for the
time step instead of the pseudo-square root. The pseudo-square root uniquely
determines the covariance matrix, of course, and it is the covariance matrix
that determines the drifts. However, since we are working with reduced-factor
models, a pseudo-square root is a more natural object. See Joshi (2011) for
discussion of this approach to calibration. In addition, when working with
low-discrepancy numbers, it is generally believed that working with a spectral
pseudo-square root can improve convergence (Giles, Kuo, Sloan, and Waterhouse,
2008; Jäckel, 2001), so it is convenient to specify this explicitly.
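The relationship between a pseudo-square root and its covariance matrix can be
illustrated with a minimal sketch. It assumes the usual convention C = A Aᵀ,
with one row of A per rate and one column per factor. The matrix entries and
the function names are hypothetical.

```python
import math


def covariance_from_pseudo_root(a):
    """C = A A^T for a pseudo-square root A (rows: rates, columns: factors)."""
    n = len(a)
    return [[sum(x * y for x, y in zip(a[r], a[l])) for l in range(n)]
            for r in range(n)]


def rotate_factors(a, angle):
    """Rotate a two-factor pseudo-square root: A -> A Q with Q orthogonal."""
    c, s = math.cos(angle), math.sin(angle)
    return [[row[0] * c - row[1] * s, row[0] * s + row[1] * c] for row in a]


# Illustrative two-factor pseudo-square root for three rates (made-up numbers).
A = [[0.20, 0.00],
     [0.18, 0.05],
     [0.15, 0.09]]

if __name__ == "__main__":
    print(covariance_from_pseudo_root(A))
    print(covariance_from_pseudo_root(rotate_factors(A, 0.7)))  # same up to rounding
```

A rotated pseudo-square root reproduces the same covariance matrix, so A
determines C but not the other way round. Choosing a particular root, such as
the spectral one, is therefore a separate decision.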
We use log rates $x_r = \log(f_r + \alpha_r)$ and the rates $f_r$ as
convenient. We need to compute drifts; these are state dependent so only the
drift at the start of the first step is known in advance. We use a
predictor–corrector algorithm so they have to be computed twice per step. The
discretized drift of the log forward rate $x_r$ across step $j$ is
$$\mu_{j,r} = -0.5\,C_{j,rr} + \sum_{l=0}^{r} \frac{(f_l(t) + \alpha_l)\,\delta_l}{1 + f_l(t)\,\delta_l}\,C_{j,rl},$$
where $t = T_{j-1}$ when predicting and $T_j$ when correcting. Whilst this
expression is correct, it is inefficient from a computational perspective. An
algorithm giving the same numbers with lower computational order is presented
in Joshi (2003b), and we use it here.
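The difference in computational order can be seen in a minimal sketch. It
compares a direct evaluation of the discretized drift with a factor-based
running-sum evaluation. The sketch is written in the spirit of the lower-order
algorithm cited above but is not taken from that reference. It assumes
C = A Aᵀ for a reduced-factor pseudo-square root A, and it takes the weight
inside the sum to be indexed by the summation variable l, as in the drift of
Equation 16.1. All names and sample numbers are hypothetical.

```python
def covariance(a):
    """C = A A^T (rows of A: rates, columns: factors)."""
    n = len(a)
    return [[sum(x * y for x, y in zip(a[r], a[l])) for l in range(n)]
            for r in range(n)]


def weights(f, alpha, delta):
    """g_l = (f_l + alpha_l) * delta_l / (1 + f_l * delta_l)."""
    return [(fl + al) * dl / (1.0 + fl * dl)
            for fl, al, dl in zip(f, alpha, delta)]


def drifts_naive(a, f, alpha, delta):
    """mu_r = -0.5*C[r][r] + sum_{l<=r} g_l*C[r][l]; O(n^2) in the rates."""
    c = covariance(a)
    g = weights(f, alpha, delta)
    return [-0.5 * c[r][r] + sum(g[l] * c[r][l] for l in range(r + 1))
            for r in range(len(f))]


def drifts_by_factor(a, f, alpha, delta):
    """Same numbers in O(n*F): keep one partially computed sum per factor."""
    g = weights(f, alpha, delta)
    running = [0.0] * len(a[0])          # e_k = sum_{l<=r} g_l * A[l][k]
    out = []
    for r, row in enumerate(a):
        for k, a_rk in enumerate(row):
            running[k] += g[r] * a_rk
        cross = sum(a_rk * e_k for a_rk, e_k in zip(row, running))
        out.append(-0.5 * sum(a_rk * a_rk for a_rk in row) + cross)
    return out


if __name__ == "__main__":
    a = [[0.20, 0.00], [0.18, 0.05], [0.15, 0.09]]   # illustrative pseudo-root
    f, alpha, delta = [0.03, 0.035, 0.04], [0.01] * 3, [0.5] * 3
    print(drifts_naive(a, f, alpha, delta))
    print(drifts_by_factor(a, f, alpha, delta))
```

The second version needs only one running sum per factor, so its storage
grows with the number of factors rather than with the number of rates.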
Our evolution algorithm for the rates on each path is therefore as follows:
1. Draw N F uncorrelated standard normals from a quasi-random generator.
Once the forward rate path has been developed, other ancillary quantities
such as discount factors are easy to compute.
$$N_d = N_1 \theta^d,$$
$$\theta = 0.1^{1/5}$$
when using 327,680 paths so that the final regression has roughly 32,768 paths.
The advantage of this approach is that by only discarding a small fraction of
paths that are very far from the money at each stage, we are less likely to be
affected by a substantial misestimation of the boundary.
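The path counts implied by this rule can be tabulated with a minimal sketch. It
reads the two relations above as N_d = N_1 θ^d with θ = 0.1^(1/5), and
interprets d as the index of the regression stage. That interpretation, the
function name paths_at_depth, and its parameters are assumptions made for
illustration.

```python
def paths_at_depth(n1, depth, total_depth=5, final_fraction=0.1):
    """N_d = N_1 * theta**d with theta = final_fraction**(1/total_depth)."""
    theta = final_fraction ** (1.0 / total_depth)
    return n1 * theta ** depth


if __name__ == "__main__":
    # 327,680 initial paths, as in the text.
    for d in range(6):
        print(d, round(paths_at_depth(327_680, d)))
```

Each stage keeps a fraction θ of the paths, a little under two thirds, so only
a modest share is discarded at any one stage while a tenth remain after five.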
Many authors use second-order polynomials in forward rates and swap
rates as basis functions following Piterbarg (2004). We will take the first for-
ward rate, the adjoining coterminal swap rate and the final discount factor
as our basis variables. The basis functions are then quadratic polynomials in
these with or without cross-terms.
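A minimal sketch of such basis functions follows. It assumes that the quadratic
polynomial includes a constant term; the function name quadratic_basis and the
sample values of the three basis variables are hypothetical.

```python
from itertools import combinations


def quadratic_basis(variables, cross_terms=True):
    """Constant, linear, and squared terms, plus pairwise products if requested."""
    values = [1.0]
    values.extend(variables)
    values.extend(v * v for v in variables)
    if cross_terms:
        values.extend(a * b for a, b in combinations(variables, 2))
    return values


if __name__ == "__main__":
    # Illustrative values for the first forward rate, the adjoining coterminal
    # swap rate, and the final discount factor.
    basis_variables = [0.03, 0.035, 0.85]
    print(len(quadratic_basis(basis_variables, cross_terms=False)))
    print(len(quadratic_basis(basis_variables, cross_terms=True)))
```

With three basis variables this gives 7 basis functions without cross-terms and
10 with them, which fixes the number of regression coefficients per exercise
time.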
3. A second Monte Carlo simulation is run using the exercise strategy gen-
erated in the second phase using N2 paths. This is equivalent to pricing a
path-dependent derivative with no optionality since the exercise strategy
has been fixed.
It is the first phase that requires most care, because the data generated has
to be stored until the end of the second phase. Thus we will require memory
proportional to $N_1$. For our design, we keep all this data on the GPU at all
times and so there must be sufficient memory to store it. If the GPU’s total
global memory is $M$ and we store $m$ bytes per path, we have the immediate
constraint
$$N_1 m < M.$$
A Tesla K20c has 4800 megabytes of global memory so if we run 327,680
paths, the maximum storage per path is 15,360 bytes. In practice, since other
data must be stored the maximum would be lower. A float takes 4 bytes so
we have storage for less than 3840 floats per path.
If our forward rate evolution has N rates and n steps, storing the entire
evolution for a path will take N n floats. If we take N = n, we will run out of
memory by N = 62 at the latest, since 62 × 62 = 3844 already exceeds 3840.
Whilst one could squeeze some more memory by
discarding already reset rates, it would complicate accesses and there would
be very little left for other computations. In practice, one will often want to
develop many pieces of auxiliary data for all paths simultaneously, such as all
the implied discount factors, which multiplies the memory requirements.
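The memory budget above is simple arithmetic and can be checked with a short
sketch. It assumes 1 megabyte = 1024 × 1024 bytes, which reproduces the figure
of 15,360 bytes per path; the function names are hypothetical.

```python
MEGABYTE = 1024 * 1024
BYTES_PER_FLOAT = 4


def bytes_per_path(global_memory_bytes, n_paths):
    """Upper bound on storage per path if all paths stay on the device."""
    return global_memory_bytes // n_paths


def max_square_evolution(floats_per_path):
    """Largest N such that an N-rate, N-step evolution (N*N floats) fits."""
    n = 0
    while (n + 1) * (n + 1) <= floats_per_path:
        n += 1
    return n


if __name__ == "__main__":
    per_path = bytes_per_path(4800 * MEGABYTE, 327_680)
    floats = per_path // BYTES_PER_FLOAT
    print(per_path, floats, max_square_evolution(floats))
```

This bound ignores every other allocation on the device, so the practical limit
on N is lower. That motivates the batching approach.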
We therefore adopt an approach based on batching. Rather than storing
everything about 327,680 paths, we divide them into, say, 10 batches and only
store the aspects of the paths that are required for the backwards induction.
So what must be stored? First, we note that we only need data at exercise
times. We use “exercise step” to mean the step from one exercise date to the
next. This may or may not be the same as the step from one reset date to the
next.
• The sum of the discounted values of any cash flows generated by the
product during the exercise step. This yields 1 float per exercise step.
• The value of the numeraire at the start of the exercise step. This again
yields 1 float.
• The basis variables for the exercise time. Their number depends on the
choice of basis functions, but typically 3 is enough.
drastic effects on the speed of a program. The principal sorts of memory are
(with the amount on a K20c in parentheses):
• Global—the graphics card’s main memory (4800 MB), large and plentiful
but must be accessed correctly or slowness occurs
Memory transfer between host and global memory is typically slow for
large data sets and is often the main bottleneck in GPU programs. Kooderive
avoids this issue by simply not using host memory after the set-up phase.
Thus the model calibration and product specifications are passed to the GPU
initially but the only data passed back thereafter is the mean values for paths.
The layout of how data is stored in global memory greatly affects speed.
This is a consequence of the fact that threads do not access global memory
independently. Each thread has a thread number, t, and a block number, b.
We will call the total number of threads the width, w. The number of
threads per block we denote s for size. If the code is written so that in a warp,
thread t accesses location l + t for some l, then the access is coalesced and
occurs quickly. However, if the mapping is more complicated and each thread
accesses f (t) for some nontrivial function f such as f (t) = l + αt for some
α > 1, memory access is slower because more cache lines must be fetched to
service all the memory requests. A common approach throughout Kooderive
is that thread t in block b is responsible for the path
t + bs + kw,
for all k such that t + bs + kw is less than the total number of data points
(typically the paths in a batch). Data is then stored with the path in the
smallest dimension, the time step in the largest dimension, and any other
index in the middle. Thus if there are R rates, N steps, and P paths, forward
rate r on time step s for path p would be stored at location
p + rP + sRP.
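The two index formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Kooderive's code: the function names are hypothetical, and the time-step index is called `step` here to avoid a clash with the block size s.

```python
def paths_for_thread(t, b, s, w, total_paths):
    """Paths handled by thread t of block b: t + b*s + k*w for k = 0, 1, ..."""
    return list(range(t + b * s, total_paths, w))


def flat_index(p, r, step, P, R):
    """Offset of forward rate r on time step `step` for path p.

    The path index varies fastest and the time step slowest, so
    neighbouring threads (neighbouring p) touch neighbouring floats.
    """
    return p + r * P + step * R * P


# Illustrative sizes only: 128 threads per block, 256 blocks, one batch.
s, blocks = 128, 256
w = s * blocks  # 32,768 threads in total
print(paths_for_thread(t=5, b=2, s=s, w=w, total_paths=32768))  # [261]
print(flat_index(p=1, r=0, step=0, P=32768, R=40)
      - flat_index(p=0, r=0, step=0, P=32768, R=40))  # 1
```

Because consecutive threads differ by one in p, their offsets differ by one float, which is the access pattern described above as coalesced.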
• Scrambling
• Conversion to normals
• Brownian bridging
• Path generation
• Indices that denote which rates are not yet reset for each step (texture)
• The initial logs of the rates (texture)
• The quasi-random variates (texture)
from Joshi (2003b) requires floats equal to the number of factors to store the
partially computed sums. We therefore use factors times block-size floats of
shared memory per block, that is, one float per factor for each thread. Given
these facts, the evolution for each path is then straightforward and the coding
of the algorithm is little different from that of a C program.
For step 0, we multiply the variates by the pseudo-square root and add
them to the log rates. We add on the drifts. We then compute the state-
dependent drifts at the end of the step. We correct the values of the log
rates and the rates. We store their values in the output data and we use this
location to retrieve them when needed during this kernel. We then use the
rates to compute the discount factors implied by these forward rates for the
step. For the other steps, we first compute the drifts at the start of the step,
and then do as for step zero.
Whilst the code is robust against variation in block and grid size, the com-
bination of 128 threads per block and 256 blocks proved effective when using
32,768 paths per batch. Note that this implies that each thread does precisely
one path. A slight further optimization could be obtained by rewriting the
code not to handle other cases, but the gains do not seem sufficient for this
to be worthwhile.
path is terminated when the product indicates to do so. The design is set up
so that the product is able to store auxiliary data, such as a running coupon,
if necessary, which allows the possibility of path dependence. Alternatively, one
of the rates passed in could be made path dependent.
Exercise values are generated independently of the product, again allowing
maximum flexibility.
$1, x, y, z, x^2, y^2, z^2,$
and have 7. The code is templatized on the algorithm for turning variables
into functions to allow flexibility.
Thus at the start of each step, we first use a kernel to generate the basis
functions for the step. Note that we only ever store the full basis functions for
one step at a time to reduce memory usage.
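As a minimal sketch of turning basis variables into basis functions, the following Python function builds the seven-function quadratic set without cross terms from three basis variables. The function name is hypothetical, and the real code is a templatized GPU kernel rather than a scalar routine.

```python
def quadratic_basis(x, y, z):
    """Map three basis variables to the 7 functions 1, x, y, z, x^2, y^2, z^2."""
    return [1.0, x, y, z, x * x, y * y, z * z]


# One row of the regression matrix for a single path (illustrative values).
row = quadratic_basis(1.0, 2.0, 3.0)
print(row)  # [1.0, 1.0, 2.0, 3.0, 1.0, 4.0, 9.0]
```

Each path contributes one such row for the step, which is why only one step's worth of basis functions needs to be held at a time.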
Once the basis functions are known, we have to find the minimal least-squares
error solution of a highly overdetermined rectangular system
$$Ax = y,$$
where $A$ has $N_1$ rows. Each row consists of the values of the basis functions
for one path for the step. The target $y$ is the discounted future cash flows
for the path. Typically, $N_1 = 327{,}680$ and there are 10 basis functions so the
system is very overdetermined. We solve in two phases. First, we write
$$(A^{T} A)x = A^{T} y,$$
with the sum taken over a subset of i. The subsets for different blocks partition
the paths, and we use 1024 blocks. So at the end, we have 1024 numbers for
each matrix entry still to be summed. These sums are performed by a second
kernel which uses a different block for each entry. The computation is then
done by copying the data into shared memory and then using a repeated
binary summation so that each thread in the first half adds the value for the
corresponding thread in the second half. Once a thread reaches the second half
it does nothing further. Eventually only thread zero remains and it contains
the value of the sum. Note that this approach minimizes the length of the
computation chain, which is an important consideration when working with
floats in order to limit round-off error. The computation of $A^{T} y$ is done similarly.
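The repeated binary summation can be imitated on the CPU as follows. This is a sequential sketch of the idea only: the inner loop stands in for the threads of one block, the list stands in for shared memory, and the power-of-two restriction is an assumption made here for simplicity.

```python
def tree_sum(values):
    """Pairwise (binary-tree) summation mirroring a shared-memory reduction.

    Assumes the number of values is a power of two, as with 1024 partial sums.
    """
    buf = list(values)  # stands in for the copy into shared memory
    n = len(buf)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    half = n // 2
    while half >= 1:
        for t in range(half):  # threads 0..half-1 are still active
            buf[t] += buf[t + half]
        half //= 2  # threads in the upper half now do nothing further
    return buf[0]  # thread zero holds the total


print(tree_sum([1.0, 2.0, 3.0, 4.0]))  # 10.0
print(tree_sum(range(1024)))           # 523776
```

With 1024 partial sums, each value passes through at most 10 additions rather than up to 1023 in a naive running sum, which is the shortening of the computation chain referred to above.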
For the second part of solving the small system, we copy the problem to
the CPU and solve there. The fact that it is only a 10 × 10 system means
that this is fast. The solution of the small system is the vector of regression
coefficients.
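The two phases can be sketched in plain Python: accumulate the small matrix and right-hand side path by path, then solve the small system. This is a hedged illustration under simplifying assumptions, namely a sequential accumulation in place of the block-wise GPU reduction and a textbook Gaussian elimination in place of whatever CPU solver is actually used; the function names are hypothetical.

```python
def normal_equations(rows, y):
    """Accumulate A^T A and A^T y, one path (row) at a time."""
    m = len(rows[0])
    ata = [[0.0] * m for _ in range(m)]
    aty = [0.0] * m
    for row, target in zip(rows, y):
        for i in range(m):
            aty[i] += row[i] * target
            for j in range(m):
                ata[i][j] += row[i] * row[j]
    return ata, aty


def solve(a, b):
    """Solve the small square system by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


# Toy data: basis functions (1, t) and targets lying exactly on 2 + 3t.
rows = [[1.0, float(t)] for t in range(5)]
targets = [2.0 + 3.0 * t for t in range(5)]
ata, aty = normal_equations(rows, targets)
coefficients = solve(ata, aty)  # approximately [2.0, 3.0]
```

However many paths there are, the system handed to the solver has only as many rows and columns as there are basis functions, which is why moving it to the CPU is cheap.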
However, the step is not yet done since we are doing multiple regressions.
The regression coefficients yield an implied continuation value for every path.
Our multiple regression algorithm requires us to discard the fraction 1 − θ
of paths which are furthest from the exercise boundary, that is the ones that
yield the largest absolute value for discounted continuation value minus dis-
counted exercise value. First, a cut-off level is found which gives the threshold
above which paths are discarded. This is done by repeated bisection with the
counting being done using the Thrust transform and count algorithms. Second,
the data for the remaining paths are moved to be contiguous in memory. This
is performed using Thrust’s scatter-if algorithm. The process is repeated on
the remaining data until the preset regression depth is reached (e.g., 5) or
too few paths remain according to a preset cut-off such as 2048. For subsequent
estimates of continuation values, cascading through the coefficients is
performed until a threshold level is reached or the maximum depth is obtained.
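The cut-off search and the compaction step can be sketched as follows. This is a sequential Python analogue under stated assumptions: the list comprehension plays the role of the count, the filtered list plays the role of the scatter, and the function names, the fixed iteration count, and the use of "less than or equal" at the threshold are choices made here for illustration.

```python
def find_cutoff(distances, theta, iterations=50):
    """Bisect for a level c such that about theta * len(distances) entries
    satisfy distance <= c. Distances are absolute values, so they are >= 0."""
    target = theta * len(distances)
    lo, hi = 0.0, max(distances)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        kept = sum(1 for d in distances if d <= mid)  # the counting step
        if kept < target:
            lo = mid
        else:
            hi = mid
    return hi


def compact(values, distances, cutoff):
    """Keep, contiguously and in order, the values whose distance is <= cutoff."""
    return [v for v, d in zip(values, distances) if d <= cutoff]


# |continuation value - exercise value| for ten illustrative paths.
dist = [0.5, 3.0, 1.0, 2.0, 4.0, 0.1, 2.5, 1.5, 3.5, 0.9]
cutoff = find_cutoff(dist, theta=0.5)
print(compact(list(range(10)), dist, cutoff))  # [0, 2, 5, 7, 9]
```

The surviving paths are those nearest the exercise boundary, and the next regression in the cascade would be run on them alone.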
Once the continuation values have been estimated, the next stage of the
algorithm is to set the paths’ stepwise values to be either the discounted future
flows for the path if exercise does not occur, or to the discounted exercise value
if it does. This is straightforwardly performed by a simple kernel. The final
action for the step is to deflate to the previous exercise time using the ratio
of the numeraire values at the two times. This is again straightforward.
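A per-path sketch of these two actions is given below. Two conventions are assumptions made here and are not spelled out in the text: exercise is taken to occur when the exercise value exceeds the estimated continuation value, and deflation multiplies by the earlier numeraire value divided by the later one. The function names are hypothetical.

```python
def update_path_value(future_flows, exercise_value, continuation_estimate):
    """Return the stepwise value of one path.

    Assumed rule: exercise if the discounted exercise value exceeds the
    regression estimate of the continuation value.
    """
    if exercise_value > continuation_estimate:
        return exercise_value
    return future_flows


def deflate(value, numeraire_previous, numeraire_current):
    """Move a value back to the previous exercise time (assumed convention)."""
    return value * numeraire_previous / numeraire_current


value = update_path_value(future_flows=5.0, exercise_value=7.0,
                          continuation_estimate=6.0)
print(value)                       # 7.0
print(deflate(10.0, 1.0, 1.25))    # 8.0
```

Note that when exercise does not occur the path keeps its realized discounted future flows, not the regression estimate, as stated above.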
We then simply repeat back to step 0. The average value after doing step
0 yields the first pass estimate of the discounted cash flows on or after the
first exercise time.
• The just reset forward rate at each step is extracted as is the adjoining
coterminal swap rate.
• The basis variables are computed from the three previous extracted val-
ues and stored.
• The cash flows along the paths are generated and discounted to exercise
dates. They are then aggregated to each exercise date and stored.
• The coterminal swap rates and their annuities are computed, that is,
the swap rates with the same final date but with the first date varying (cf.
Jamshidian 1997).
• The just reset forward rate at each step is extracted as is the adjoining
coterminal swap rate.
• The final discount factor for each step is also extracted.
• The basis variables are computed from the three previous extracted val-
ues and stored.
• The cash flows along the paths are generated up to the exercise time. If
exercise occurs, the exercise value is taken into account.
• The cash flows are deflated.
For each of these, a simple dedicated kernel is used. The only one of much
interest is the cash-flow generation kernel. For maximum flexibility, this is
templatized on three parameters: the product, the exercise value computer,
and the exercise strategy. The auxiliary data for the strategy is moved into
shared memory for rapid access. Once this has been done, the routine is very
similar to that for the cash-flow generation in the first phase.
Note that each batch produces one number which is the mean pay-off
value. Different batches can be achieved either by making the quasi-random
generator skip or by using scrambling.
• Compute the full-factor covariance matrix for the step. This involves
integrating the product of the volatility functions and multiplying by
the instantaneous correlation for each entry.
• Scale the rows of B so that the variances of the log rates are the same
as before factor reduction.
We note, however, that there is nothing special about 5 from the implementa-
tion perspective and the Kooderive code will function for a differing number
of factors.
Beveridge, Joshi, and Tang (2013) achieve a price of 1088 with a standard
error of 2.5 using double regression and the exclusion of suboptimal points.
They use policy iteration to get an increase of 6.5 with a standard error of
2. Thus their lower bound price is 1094.5 with a standard error of 3.2. Their
upper bound has the slightly lower value of 1094 with a standard error of 3.
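The quoted lower bound can be checked by hand. Assuming that the base estimate and the policy-iteration increment are independent (an assumption made here, not stated in the text), their standard errors combine in quadrature:

```latex
\[
1088 + 6.5 = 1094.5, \qquad
\sqrt{2.5^{2} + 2^{2}} = \sqrt{10.25} \approx 3.2 .
\]
```

This reproduces the lower bound of 1094.5 with a standard error of 3.2.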
We present results on timings and price for varying regression depths in
Table 16.1. We use 10 batches of 32,768 paths for the first pass and 32 of
them for the second. We separate the time actually spent doing regressions
from that spent on doing other parts of the exercise strategy building. The
price increases substantially when we increase from single regression to double
regression. Another slight increase occurs from double to triple and it is stable
thereafter. The fraction of paths retained after each regression is given by
$0.1^{1/d}$, where d is the total regression depth. The implied price, whilst slightly
lower than that in Beveridge, Joshi, and Tang (2013), is within one standard
error and so can be regarded as accurate.
TABLE 16.1: Timings and prices for the 40-rate cancellable swap with
varying numbers of regressions

Regression depth                    1        2        3        4        5
Time taken for first pass paths     0.207    0.206    0.207    0.206    0.207
Time taken for regression set up    0.206    0.169    0.172    0.179    0.179
Time taken for regression           0.094    0.191    0.254    0.32     0.39
Time taken for second pass          0.697    0.71     0.729    0.747    0.765
Total time                          1.204    1.276    1.362    1.452    1.541
Second pass price                   0.10740  0.10911  0.10924  0.10927  0.10925

Note: All standard errors are between 0.5 and 0.6 basis points.

In Table 16.2, we present timing comparisons for Kooderive versus
QuantLib. For simplicity in running the QuantLib (QL) code, we do not
consider the first two noncall coupons. The value of these is analytic in any case
and a simulation pricing of them is not needed. We see that the first pass of
path storage is 179 times faster in Kooderive. Similarly, the second pass pricing
is 188 times faster, despite the fact that QuantLib terminates path generation
early when appropriate and Kooderive does not. The timing ratio
for the computation of regression coefficients is not quite so impressive, as
only a 29 times speedup is achieved. Note, however, that the division between
stages in Kooderive is slightly different from QuantLib. The generation of
basis functions from basis variables is done in the first pass in QuantLib and
during strategy building in Kooderive. This means that our numbers overstate
the speed of the first pass and the slowness of strategy building. The overall
ratio, which is what really matters, is very large at 137. We can run numbers
of paths in seconds that would previously have been regarded as silly in a live
environment.
Of course, a CPU implementation could also be multithreaded and the first
and third parts should scale well since they are embarrassingly parallel. For
the second part, one would have to solve challenges similar to those presented
for the GPU. We do not explore how to carry out such an implementation
here. However, we note that if the CPU used t threads, the best we could
hope for is a t-times speedup, and so we would still expect the GPU to be
137/t times faster. It would take a very large number of CPU cores for the CPU to
be competitive against a single GPU.
16.8 Conclusion
We spent the first part of this chapter arguing that:
References
Abbas-Turki, L., and Lapeyre, B. 2009. American options pricing on multi-core
graphic cards. 2009 International Conference on Business Intelligence and Finan-
cial Engineering, 307, Beijing, China.
Andersen, L. and Piterbarg, V. V. 2010. Interest rate modelling. London, New York:
Atlantic Financial Press.
Beveridge, C. J., Joshi, M. S., and Tang, R. 2013. Practical policy iteration: Generic
methods for obtaining rapid and tight bounds for Bermudan exotic derivatives
using Monte Carlo simulation. Journal of Economic Dynamics and Control, 37 ,
1342–1361.
Brace, A., Gatarek, D., and Musiela, M. 1997. The market model of interest rate
dynamics. Mathematical Finance, 7 , 127–155.
Brigo, D. and Mercurio, F. 2001. Interest Rate Models: Theory and Practice. Hei-
delberg: Springer Verlag.
Brigo, D. and Mercurio, F. 2006. Interest Rate Models—Theory and Practice: With
Smile, Inflation and Credit. Springer.
Broadie, M. and Cao, M. 2008. Improved lower and upper bound algorithms for
pricing American options by simulation. Quantitative Finance, 8 , 845–861.
Carrière, J. F. 1996. Valuation of the early-exercise price for options using simu-
lation and nonparametric regression. Insurance: Mathematics and Economics, 19 ,
19–30.
Dang, D. M., Christara, C. C., and Jackson, K. R. 2010. Pricing multi-asset Amer-
ican options on graphics processing units using a PDE approach. 2010 IEEE
Workshop on High Performance Computational Finance (WHPCF) (pp. 1–8), New
Orleans, Louisiana, USA.
Dennard, R. H., Gaensslen, F. H., Rideout, V. L., Bassous, E., LeBlanc, A. R., and
Gaensslen, R. H. 1974. Design of ion-implanted mosfets with very small physical
dimensions. IEEE Journal of Solid-State Circuits, SC-9 (5), 256–268.
du Toit, J. 2011. A high-performance Brownian bridge for GPUS: Lessons for band-
width bound applications.
Giles, M., Kuo, F. Y., Sloan, I. H., and Waterhouse, B. J. 2008. Quasi-Monte Carlo
for finance applications. ANZIAM Journal, 50 , C308–C323.
Giles, M. and Xiaoke, S. 2008. Notes on using the nVidia 8800 GTX graphics card.
([Link] old/[Link])
Hunter, C., Jäckel, P., and Joshi, M. S. 2001. Getting the drift. Risk, July, 81–84.
Jäckel, P. 2001. Monte Carlo Methods in Finance. New York: John Wiley & Sons
Ltd.
Jamshidian, F. 1997. LIBOR and swap market models and measures. Finance and
Stochastics, 1 , 293–330.
Joshi, M. 2003b. Rapid drift computations in the LIBOR market model. Wilmott
Magazine, May, 84–85.
Joshi, M. 2008. C++ Design Patterns and Derivatives Pricing (2nd edition).
London: Cambridge University Press.
Little, J. D. C. 1961. A proof for the queuing formula: L = λw. Operations Research,
9.3 , 383–387.
Shaw, W. T. and Brickman, N. 2009. Differential equations for Monte Carlo recy-
cling and a GPU-optimized normal quantile (Tech. Rep.). Citeseer.
Chapter 17
Practitioner’s Guide on the Use of
Cloud Computing in Finance
CONTENTS
17.1 What Is Cloud Computing?
17.1.1 Why cloud computing and why now?
17.2 Background
17.2.1 The taxonomy of parallel computing
17.2.2 Glossary
17.3 Financial Applications of Cloud Computing
17.3.1 Derivative valuation and pricing
17.3.2 Risk management and reporting
17.3.3 Quantitative trading
17.3.4 Credit scoring
17.4 The Nature of Challenges
17.5 Implementation and Practices
17.5.1 Implementation example: Techila middleware with MATLAB
17.5.2 Computational needs
    Do you have a computational bottleneck?
    Where is your computational bottleneck?
17.5.3 Solution selection
17.5.4 Algorithm design
17.6 Case Studies
17.6.1 Portfolio backtesting
    Potential computing bottleneck
    Computing environment and architecture
    Experiment design and test result
17.6.2 Distributed portfolio optimization
    Challenges in large-scale portfolio construction
    Algorithm design for large-scale mean-variance optimization problem
FIGURE 17.1: HPC spending by sector 2013 versus 2018. (Adapted from
Joseph, E. et al., 2014. IDC HPC update at ISC’14.)
By 2012, 80% of Fortune 1000 enterprises will pay for some cloud
computing service and 30% of them will pay for cloud computing
infrastructure. Through 2010, more than 80% of enterprise use of
17.2 Background
17.2.1 The taxonomy of parallel computing
Computer processors process instructions sequentially. Thus traditional
computing problems are serial problems by design. The advent of
multiprocessors has created a new type of computing problem: how to utilize the
parallel structure.
A parallel computing problem, in contrast to a serial computing problem,
is a computing problem that can be divided into subproblems
to be processed simultaneously. Based on the dependency structure of the
subproblems, parallel problems can be further classified into embarrassingly
parallel and nonembarrassingly parallel computing problems. If the processing
of each subproblem is independent of the other subproblems, then the problem
is embarrassingly parallel. Otherwise, it is called a nonembarrassingly parallel
computing problem.
Figure 17.4 illustrates the structure of embarrassingly parallel
and nonembarrassingly parallel problems. There is no communication between
jobs in the embarrassingly parallel case, as in Figure 17.4a, while communication
is required in the nonembarrassingly parallel case, as in Figure 17.4b.
By the nature of the underlying problem, parallel problems can also be
classified as data-parallel problems and task-parallel problems. While data
parallelism focuses on distributing data across different processors, task
parallelism focuses on distributing execution processes (subtasks) to different processors.
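An embarrassingly parallel, data-parallel computation can be sketched with Python's standard library. The pricing function below is a hypothetical stand-in for any subproblem that needs no information from the others; the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def price_scenario(shock):
    """Hypothetical subproblem: value of a position under one market shock."""
    return 100.0 * (1.0 + shock)


def run_parallel(shocks, workers=4):
    """Distribute independent scenarios over a pool of workers.

    No job communicates with any other, so results can simply be
    collected in input order once every job has finished.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(price_scenario, shocks))


values = run_parallel([0.0, 0.1, -0.1])  # approximately [100.0, 110.0, 90.0]
```

A thread pool is used here only to keep the sketch short. In standard CPython, threads do not speed up CPU-bound work, so a process pool (`ProcessPoolExecutor`, which offers the same `map` interface) or a cluster scheduler would take its place in practice.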
Another important aspect of parallel computing is whether the parallel
computing problem is a scalable problem. A scalable problem has either a
scalable problem size or scalable parallelism. Either the solution time reduces
[Figure 17.4: (a) embarrassingly parallel and (b) nonembarrassingly parallel problems. Each panel shows information, jobs, and results.]
17.2.2 Glossary
• Computing instance: refers to a (virtual) server instance that is linked
to a computing network to provide computing resources. To offer flex-
ibility to their customers, cloud vendors offer different types of nodes
• Wall clock time (WCT): Wall clock time is the human perception of the
passage of time from the start to the completion of a task.
• CPU time: The amount of time for which a CPU was used for processing
instructions of a computer program or operating system, as opposed to,
for example, waiting for input/output (I/O) operations or entering low-
power (idle) mode.
• Workload : In cloud computing, workload is measured by the amount of
CPU time and memory consumption.
• CPU efficiency: CPU efficiency is measured as the CPU time used for
computation divided by the sum of CPU time and I/O time used for data
transfer. Thus CPU efficiency measures the overhead of parallelizing a
computation. A low CPU efficiency, in general, indicates a high overhead.
• Acceleration factor : Acceleration factor is measured by wall clock time of
running the program locally on the end user’s computer divided by the
wall clock time of running it on the cloud. In an ideal case, the acceler-
ation factor can be linear in the number of cores used for computation.
• Total cost of ownership (TCO): TCO measures both direct and indirect
costs of deploying the solution. In cloud computing and alternative com-
puting solutions, TCO includes the cost of: hardware, software, operat-
ing expenses (such as infrastructure, electricity, outage cost, and so on),
and long-term expenses (such as replacement, upgrade and scalability
expenses, decommissioning, and so on).
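Two of the glossary measures are simple ratios and can be written down directly. The sketch below uses hypothetical function names and illustrative timings in seconds.

```python
def cpu_efficiency(cpu_time, io_time):
    """CPU time used for computation divided by CPU time plus I/O time."""
    return cpu_time / (cpu_time + io_time)


def acceleration_factor(local_wall_clock, cloud_wall_clock):
    """Local wall clock time divided by wall clock time on the cloud."""
    return local_wall_clock / cloud_wall_clock


print(cpu_efficiency(cpu_time=90.0, io_time=10.0))    # 0.9
print(acceleration_factor(3600.0, 60.0))              # 60.0
```

In the ideal case mentioned above, a job spread over 60 cores would show an acceleration factor close to 60; data-transfer overhead lowers both measures.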
increased rapidly after 2009 and so have the search volumes for Solvency
II and Basel III. We are not suggesting any causality between the increasing
attention paid to cloud computing and that paid to risk regulation. However,
such trends suggest that cloud computing became popular at the right time to
serve as a potential solution for regulation-oriented computation needs.
The financial industry started to embrace cloud solutions, especially when
they are integrated to support the need for effective and timely risk
management. IBM’s survey (reference) on the implementation of cloud computing
for Solvency II in the insurance industry points out the trend of adopting cloud
computing as part of the implementation strategy for risk management. Of
the 19 firms, 27% either have successfully implemented cloud solutions or are
in the process of implementing cloud solutions. Another 23.8% have started
considering cloud solutions.
One of the key questions is whether a cloud solution is cost-efficient. Lit-
tle (2011) from Moody’s Analytics analyzes the potential usage of cloud for
economic scenario generation and Solvency II in general. They conclude:
see (Hand and Henley, 1997; West, 2000; Baesens et al., 2003) and many
others.
The large number of consumers and the variety of credit report formats
create a big data problem. To solve the classification problem over the massive
data set of credit history of consumers, an efficient data storage and processing
system is required.
In summary, modern day financial computing requires:
• Data security
There are several solutions for the type 1 challenge. In a production sce-
nario, for example, when implementing a high frequency trading algorithm,
hardware accelerators, such as FPGA and GPU, may be better alternatives
to cloud computing.4 The reasons are:
4 Although in practice, there are firms implementing their algorithms in the cloud to
benefit from lower latency in their connection to the exchange when colocation is not
possible or too costly to implement.
5 GCE machine types are charged a minimum of 10 minutes. After 10 minutes, instances
6 For more information on other user scenarios, please refer to Techila’s benchmark report
[Figure: CPU cores online (%) against time (seconds) for Linux and Windows instances: GCE-n1-standard-8 (Linux), AWS-c3.8xlarge (Linux), Azure-extra large (Win), AWS-c3.8xlarge (Win), Azure-A8 (Win).]
[Figure: Normalized price-performance. Normalized price against normalized average execution time on reference servers for Azure-A11, Azure-D14, GCE-n1-standard-16, AWS-c3.8xlarge, and AWS-c4.4xlarge, each on Windows and Linux.]
computation after correcting for the difference in pricing model. For example,
for portfolio simulation, the simplified cost ranges from 0.58 USD (GCE with
n1-standard-16 instance on Linux Debian 7 operating system) to 1.99 USD
(Azure A11 on Windows Server 2012 R2). The difference is significant (a factor
of about 3.4). However, if we take into consideration the pricing model, the real
cost (that is, the billing from the vendor) differs even more. The cost of using
AWS is more than 10 times the cost of using Azure or GCE. This is because
AWS uses an hour-based pricing model. To reduce the cost of computation with
AWS, users should allocate their computation in units of hours.
The report provides valuable insight about the effect of pricing models
on the cost of cloud computing. Together with the benchmarks on instance
performances, this should provide readers some information on how to choose
cloud vendors.
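The effect of the billing increment on the invoiced cost can be sketched with a small function. The hourly price and job length below are hypothetical, and the per-minute increment after a 10-minute minimum is an assumption made for illustration rather than a statement of any vendor's current terms.

```python
import math


def billed_cost(runtime_minutes, hourly_price, increment_minutes,
                minimum_minutes=0):
    """Cost of one instance when usage is rounded up to a billing increment."""
    billed = max(runtime_minutes, minimum_minutes)
    billed = math.ceil(billed / increment_minutes) * increment_minutes
    return hourly_price * billed / 60.0


# A 7-minute job on an instance priced at a hypothetical 1.20 USD per hour.
hourly_model = billed_cost(7, 1.20, increment_minutes=60)
minute_model = billed_cost(7, 1.20, increment_minutes=1, minimum_minutes=10)
print(round(hourly_model, 2), round(minute_model, 2))  # 1.2 0.2
```

For short jobs the hour-based model bills several times more than the minute-based one at the same list price, which is why packing work into whole hours matters under hourly billing.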
their strategies and understand the potential risks by backtesting on the prior
time period using the datasets that are available today. Computer simulation
of the strategy/model is the main part of modern backtesting procedure. It
might be very time-consuming due to a few computing issues raised during the
procedure. Thus it is necessary to seek acceleration using modern techniques
and shorten time-to-market in the rapidly changing financial world.
1. The datasets used for testing might be huge while the requested
output (performance measure) is relatively small. For example, consider a
portfolio that consists of N assets. Its historical return data series over the
past T time periods has size N × T. The covariance matrix is of size N × N. In case
N = 75,000, the memory size of the full covariance matrix is 45 GB (double
precision).
Cloud computing has a natural advantage in processing large data sets. In
general, a CPU thread performs better than a GPU thread, especially in
handling complex logic branching operations. Thus cloud computing seems
to be a suitable technique for accelerating the backtesting procedure. To
illustrate how to use cloud computing for backtesting, we ran some experiments
in the Microsoft Windows Azure Cloud, as well as on a local cluster. The results are
presented in the following sections.
[Link]
portfolios-with-financial-toolbox
[Figure: portfolio returns plotted against portfolio risk and year, 2000 to 2014.]
covariance or its inverse of the return series. When using the sample variance
as the expected variance, the estimation error could be large. To achieve a
reasonable accuracy, as stated in DeMiguel et al. (2009), an in-sample period of
3000 months is needed for a portfolio of 25 assets to beat the naive 1/N strategy.
The problem becomes even more significant when the portfolio size is large.
As noted in Fan et al. (2011), estimating the moments of a high-dimensional
distribution is challenging. One crucial problem is the spurious
correlation that arises with the curse of dimensionality.
On the other hand, the problem is also challenging numerically. First, when
the number of degrees of freedom is large, finding the optimum in a
high-dimensional parameter space is almost impossible to achieve in reasonable
time with general optimizers. Additionally, we need to take good care of the
properties of the matrices to retain feasibility. It is also a data-intensive
problem from a hardware perspective. Suppose we are dealing with 75,000 assets
(data of the universe); the covariance matrix then has 2,812,537,500 distinct
parameters. That means it takes more than 20 GB of memory if we are using double
precision. Last but not least, the computational cost of a matrix operation on a
matrix of size M × N increases linearly with the number of columns.
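The storage figures follow from counting entries. The sketch below, with a hypothetical function name, counts the distinct entries of a symmetric N × N matrix, N(N + 1)/2, and converts both the symmetric and the full layout to bytes at 8 bytes per double-precision value.

```python
def covariance_storage(n_assets, bytes_per_value=8):
    """Return (distinct entries, bytes for distinct entries, bytes for full matrix)."""
    unique = n_assets * (n_assets + 1) // 2  # upper triangle including diagonal
    full = n_assets * n_assets
    return unique, unique * bytes_per_value, full * bytes_per_value


unique, packed_bytes, full_bytes = covariance_storage(75_000)
print(unique)              # 2812537500
print(packed_bytes / 1e9)  # 22.5003
print(full_bytes / 1e9)    # 45.0
```

Storing only the distinct entries needs about 22.5 GB, in line with the figure of more than 20 GB above, while the full square matrix needs 45 GB.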
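$$
\begin{aligned}
\min\quad & w^{T} C w \\
\text{s.t.}\quad & w^{T}\mu = b, \\
& w^{T}\mathbf{1}_{N} = 1,
\end{aligned}
$$
$$
I_{\mathcal{C}}(w) =
\begin{cases}
0, & \text{if } w \in \mathcal{C}, \\
\infty, & \text{if } w \notin \mathcal{C},
\end{cases}
\tag{17.3}
$$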
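For a small number of assets, the equality-constrained problem above can be solved directly from its first-order (KKT) conditions. The following sketch does exactly that in plain Python. It is a direct dense solve for illustration only, not the distributed algorithm of this section, and the function names and toy data are hypothetical.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def min_variance_weights(cov, mu, target):
    """Minimize w'Cw subject to w'mu = target and w'1 = 1.

    Builds the KKT system
        [2C  mu  1] [w   ]   [0     ]
        [mu'  0  0] [lam1] = [target]
        [1'   0  0] [lam2]   [1     ]
    and returns the weights w.
    """
    n = len(mu)
    size = n + 2
    kkt = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i in range(n):
        for j in range(n):
            kkt[i][j] = 2.0 * cov[i][j]
        kkt[i][n] = kkt[n][i] = mu[i]
        kkt[i][n + 1] = kkt[n + 1][i] = 1.0
    rhs[n] = target
    rhs[n + 1] = 1.0
    return solve_linear(kkt, rhs)[:n]


# Toy example: identity covariance, three assets, target return 0.25.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = min_variance_weights(identity, [0.1, 0.2, 0.3], 0.25)
```

The dense KKT matrix has (N + 2)² entries, so this approach is only practical for small N; at the scale of 75,000 assets discussed above it is the storage and solve cost of such systems that motivates a distributed formulation.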
alternative solutions8 :
17.7.2 Risks
The risk of IT system failure is nonnegligible in the finance industry. The
following two examples provide some ideas of the importance of having backup
IT systems and highly reliable IT systems.
IT failure can be costly. What, then, would be the best way to manage the risk
of IT systems? Cloud computing can be viewed as insurance for
IT. While diversification is a widely accepted concept in the finance industry,
cloud computing may be an easy way to diversify the IT failure risk for the
finance industry. Distributed file systems, either in-house or in cloud
vendors’ data centers, protect data from hardware failures. Cloud vendors also
offer access to computing in data centers located in various places around
the world. Such a scheme provides a constant supply of computing resources in
case of catastrophic tail events, such as earthquakes, tsunamis, and so on.
References
Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G.,
Patterson, D., Rabkin, A., Stoica, I. et al., 2010. A view of cloud computing.
Communications of the ACM 53, 50–58.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J., Vanthienen, J.,
2003. Benchmarking state-of-the-art classification algorithms for credit scoring.
Journal of the Operational Research Society 54, 627–635.
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., 2011. Distributed optimiza-
tion and statistical learning via the alternating direction method of multipliers.
Foundations and Trends in Machine Learning 3, 1–122.
DeMiguel, V., Garlappi, L., Uppal, R., 2009. Optimal versus naive diversification:
How inefficient is the 1/n portfolio strategy? Review of Financial Studies 22,
1915–1953.
Fan, J., Lv, J., Qi, L., 2011. Sparse high dimensional models in economics. Annual
Review of Economics 3, 291.
Hand, D.J., Henley, W.E., 1997. Statistical classification methods in consumer credit
scoring: A review. Journal of the Royal Statistical Society. Series A (Statistics
in Society) 160, 523–541.
Joseph, E., Conway, S., Dekate, C., Cohen, L., 2014. IDC HPC update at ISC’14.
Kanniainen, J., Lin, B., Yang, H., 2014. Estimating and using garch models with
vix data for option valuation. Journal of Banking & Finance 43, 200–211.
Kanniainen, J., Piché, R., 2013. Stock price dynamics and option valuations under
volatility feedback effect. Physica A: Statistical Mechanics and its Applications
392, 722–740.
Little, M., 2011. ESG and Solvency II in the cloud. Moody’s Analytics Insights.
Published in Barrie+Hibbert (later Moody’s Analytics) magazine, see http://
[Link]/[Link].
Mell, P., Grance, T., 2009. The NIST definition of cloud computing. National Insti-
tute of Standards and Technology 53, 50.
Techila, T., 2015. Cloud HPC in finance, cloud benchmark report with real-world
use-cases.
West, D., 2000. Neural network credit scoring models. Computers & Operations
Research 27, 1131–1152.
Yang, H., Kanniainen, J., 2017. Jump and volatility dynamics for the S&P 500:
Evidence for infinite-activity jumps with non-affine volatility dynamics from
stock and option markets. Review of Finance 21, 811–844.
Chapter 18
Blockchains and Distributed Ledgers
in Retrospective and Perspective
Alexander Lipton
CONTENTS
18.1 Introduction
18.2 Blockchains and Distributed Ledgers
18.3 Historical Examples of BCs and DLs
18.3.1 Genealogical trees
18.3.2 Land titles
18.4 The Bitcoin Ecosystem
18.5 Potential Usages of DLT in Banking
18.5.1 Banking X-Road
18.5.2 Trade execution, clearing, settlement
18.5.3 Global payments, trade finance, rehypothecation
18.6 Monetary Circuit and Money Creation
18.6.1 Monetary circuit
18.6.2 General aspects of money creation
18.6.3 Money creation by individual banks
18.6.4 Money creation by the banking system
18.6.5 Bank lending versus bitcoin and P2P lending
18.7 CBDCs and Negative Interest Rates
18.7.1 Why CBDCs?
18.7.2 How CBDCs can be issued?
18.7.3 How CBDCs can be used to implement the Chicago Plan?
18.8 Conclusion
Acknowledgments
References
Alexander Ostrovsky
Without a Dowry, A drama in four acts
18.1 Introduction
In this chapter, we discuss blockchains (BCs) and distributed ledgers (DLs)
in retrospect and in prospect, with an emphasis on their applications to
money and banking in the twenty-first century. Additional aspects are
discussed in References 1 through 3.
Civilization is not possible without money, and, by extension, banking,
and vice versa. Through the ages, money has existed in many forms, stretching
from the exquisite electrum coins of the Phrygian King Midas, the giant stones
of Polynesia, cowry shells, and the paper money of Kublai Khan and other rulers
who came after him, to digital currencies, and everything in between. The
meaning of money has preoccupied rulers and their tax collectors, traders,
entrepreneurs, laborers, economists, philosophers, writers, stand-up
comedians, and ordinary folks alike. It is universally accepted that money has
several important functions, such as a store of value, a means of payment in
general, and of taxes in particular, and a unit of account. The author shares
the view of Aristotle formulated in his Ethics: “Money exists not by nature but
by law” [4]. Thus money is linked to government and government to money. In
fact, anything taken in lieu of tax eventually becomes money.
For the last five centuries, money has gradually assumed the form of records
in various ledgers. This aspect of money is all important in the modern world.
At present, money is nothing more than a sequence of transactions, organized
in ledgers maintained by various private banks, and by central banks who pro-
vide means (central bank cash) and tools (various money transfer systems)
used to reconcile these ledgers. In addition to their ledger-maintaining
functions, private banks play a very important role, which central banks are not
equipped to perform. They are the system gatekeepers, who provide know your
customer (KYC) services, and system policemen, who provide anti-money
laundering (AML) services. We argue that, in addition to the more obvious areas
of application of distributed ledger technology (DLT), for instance, digital
currencies (DCs), including central bank issued digital currencies (CBDCs),
DLT can be used to solve such complex issues as trust and identity, with an
emphasis on the KYC and AML aspects [5]. Further, given that all banking
activities boil down to maintaining a ledger, judicious applications of DLT
can facilitate the trading, clearing, and settlement triad, payments, trade
finance, and so on.
The chapter is organized as follows. We introduce DLs and briefly discuss
their different types in Section 18.2. We present historical instances of BCs
and DLs in Section 18.3, and describe what happened when they underwent
hard forking. Bitcoin, the most popular current application of DLT, is covered
in Section 18.4, where a few less well-known facts about bitcoin are presented.
Potential applications of DLT to banking are discussed in Section 18.5. As an
interesting potential area of application of BC/DLT, we introduce a modern
version of the monetary circuit in Section 18.6 and show that it can benefit
from the BC/DL framework because money moves in a gigantic circle (or
In order to resolve it, the peers of France applied the Salic law of Succession, by
which persons descended from a previous sovereign only through a woman are
not eligible to occupy the throne. The House of Plantagenet did not accept
this outcome and started the Hundred Years’ War (1337–1453) against the
House of Valois, a cadet branch of the Capetian dynasty, which was a dynastic
conflict for control of the Kingdom of France. In the end, the Valois established
themselves as Kings of France at the expense of the Plantagenets.
Similar conflicts occurred with regularity and for very similar reasons
throughout history. For example, the War of the Austrian Succession (1740–
1748), which involved all major powers of Europe, was fought to settle the
question of the Pragmatic Sanction and to decide whether the Habsburg
hereditary possessions could be inherited by a woman. It was finally resolved
in favor of Maria Theresa, who became the only female ruler of the Habsburg
dominions.
Closer to our times, an interesting example of hard forking occurred on
Ethereum in July of 2016, as a result of fixing a theft of 60 Mil USD worth of
ether from the DAO. Buterin [12] described the situation as follows:
FIGURE 18.2: Genealogical chart (chain) of the House of Capet. Hard fork was
resolved in favor of the House of Valois at the expense of the House of
Plantagenet by inventing the Salic law. The Hundred Years’ War commenced as a
result. (Adapted from Wikipedia.)
Once again, we see that ambiguity within a BC cannot be resolved via its
intrinsic mechanisms.
Title deeds are paper documents showing the chain of ownership for
land and property. They can include: conveyances, contracts for sale,
wills, mortgages and leases.
It is clear that titles are BCs currently held in a central repository; instead
of miners, succession is verified by notaries. Titles are meaningful candidates
for being treated on a DL. However, there are still some issues which need
to be resolved before it can be done. For example, the recent lawsuits filed by
Mark Zuckerberg, seeking to force hundreds of Hawaiians to sell him small plots
of land located within the external boundaries of his 700-acre beachfront
property on the island of Kauai, are a good case in point. They illustrate that
in some instances, it is not possible to identify the first owner of land and
then build a chain of ownership from the original owner to the present,
resulting in an ambiguous and potentially vulnerable BC.
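The view of a title as a chain can be made concrete with a minimal Python sketch. It is an illustration only, not a description of any land registry: the record layout, the function names, and the sample owners are all hypothetical. Each transfer stores the hash of the preceding record, so tampering with an earlier transfer is detectable, while the very first record has nothing to point back to and must be trusted on other grounds, which is exactly the weak spot noted above.

```python
import hashlib
import json


def record_hash(record):
    """Return the SHA-256 hex digest of a record's canonical JSON form."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_transfer(chain, seller, buyer):
    """Append a transfer that points at the hash of the previous record."""
    prev = record_hash(chain[-1]) if chain else None
    chain.append({"seller": seller, "buyer": buyer, "prev": prev})


def verify_title(chain):
    """Check the hash links and that each seller is the previous buyer."""
    for i, rec in enumerate(chain):
        if i == 0:
            # The first record has no predecessor; its validity has to be
            # established outside the chain (for example, by a notary).
            if rec["prev"] is not None:
                return False
            continue
        if rec["prev"] != record_hash(chain[i - 1]):
            return False
        if rec["seller"] != chain[i - 1]["buyer"]:
            return False
    return True


# Hypothetical chain of ownership for one plot of land.
title = []
append_transfer(title, "original grant", "Alice")
append_transfer(title, "Alice", "Bob")
append_transfer(title, "Bob", "Carol")
print(verify_title(title))

# Rewriting an earlier transfer breaks the link held by the next record.
title[1]["buyer"] = "Mallory"
print(verify_title(title))
```

The check can only confirm internal consistency. Whether the "original grant" was itself legitimate is a question the chain cannot answer.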
1 There is a heated debate about the true identity of Satoshi Nakamoto. Nick Szabo is often
mentioned as a potential inventor of bitcoin. Here is a small piece of evidence, which might be
of interest. Nakamoto’s initials are SN, while Szabo’s are NS. However, Szabo is originally a
Hungarian name, and in Hungarian the last name comes first, so his initials would be SN. An
interesting coincidence.
All building blocks of the bitcoin ecosystem have been known for some
time, including two of the most important techniques in public-key
cryptography, a one-way hash function and the Elliptic Curve Digital Signature
Algorithm (ECDSA) (see References 14 through 16). Proof of work, based
on cryptographic hash functions, specifically SHA-256, is similar to hashcash
invented by Back [17], while Merkle trees were introduced in the seminal paper
by Merkle [18].
Ignoring such nuances as wallets, and so on, we can describe the basic setup
as follows. Participants of the system are represented by their public/private
key pairs. The main control variable is the number of bitcoins belonging to
a particular public key. This number is known to all participants at all times
(in theory). The owner of a particular public key broadcasts their intent to
send a certain quantity of bitcoins to another public key. Miners aggregate
individual transactions into blocks, verify them to ensure that there is no
double spend by competitively providing proof of work, and receive mining
rewards in bitcoins. A transaction is confirmed if there are at least six new
blocks built on top of the block to which it belongs. A typical block is
shown in Figure 18.3.
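The proof-of-work step described above can be illustrated with a toy Python sketch. It is a simplification under stated assumptions: the block is a plain string, the nonce is appended as text, and the difficulty is a fixed number of leading zero bits. Bitcoin itself hashes a binary block header twice with SHA-256 and adjusts its target periodically; neither detail is reproduced here, and the function names are hypothetical.

```python
import hashlib


def pow_digest(block_data, nonce):
    """SHA-256 digest of the block contents combined with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).digest()


def mine(block_data, difficulty_bits, max_nonce=2**32):
    """Search for a nonce whose digest has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = pow_digest(block_data, nonce)
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    raise RuntimeError("no nonce found below max_nonce")


def check(block_data, nonce, difficulty_bits):
    """Verification is a single hash, however long the search took."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(pow_digest(block_data, nonce), "big") < target


# Toy block; 16 zero bits needs about 65,000 hash evaluations on average.
nonce, digest = mine("A pays B 1 BTC", 16)
print(nonce, digest)
print(check("A pays B 1 BTC", nonce, 16))
```

The asymmetry is the point of the design: finding the nonce is expensive, checking it is cheap, and each additional required zero bit doubles the expected search effort.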
The size of mining rewards is halved at regular intervals so that the total
number of bitcoins in circulation converges to 21 Mil. Currently there are
about 16 Mil bitcoins in circulation. It is believed that at least one Mil are
irretrievably lost or stolen. Some 450,000 blocks have been mined so far; a
new block is mined every 10 minutes on average. Due to the fact that mining
rewards are paid with new bitcoins, transaction costs are claimed to be very
low. This is a nifty bit of sleight of hand, however, because the value of existing
bitcoins is constantly diluted. Some representative bitcoin statistics are given in
Figure 18.4.
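The convergence to 21 Mil follows from a geometric series, which the short Python sketch below makes explicit. The chapter does not state the protocol parameters, so two values are supplied here as assumptions: an initial reward of 50 bitcoins per block and a halving every 210,000 blocks, with rewards tracked in integer units of 10^-8 bitcoin. The function names are hypothetical.

```python
INITIAL_REWARD = 50 * 10**8    # assumed initial reward, in units of 1e-8 BTC
HALVING_INTERVAL = 210_000     # assumed number of blocks between halvings


def block_reward(height):
    """Reward for the block at a given height, halved once per interval."""
    halvings = height // HALVING_INTERVAL
    return INITIAL_REWARD >> halvings if halvings < 64 else 0


def supply_after(n_blocks):
    """Total bitcoins issued once n_blocks blocks have been mined."""
    total = 0
    height = 0
    while height < n_blocks:
        reward = block_reward(height)
        if reward == 0:
            break
        # Process the remainder of the current reward era in one step.
        era_end = (height // HALVING_INTERVAL + 1) * HALVING_INTERVAL
        count = min(era_end, n_blocks) - height
        total += count * reward
        height += count
    return total / 10**8


print(supply_after(450_000))      # issuance after the 450,000 blocks cited above
print(supply_after(10_000_000))   # effectively the limit of the series
```

Under these assumptions, 450,000 blocks correspond to 16.125 Mil bitcoins, in line with the figure of about 16 Mil quoted above, and the series sums to just under 210,000 × 50 × 2 = 21 Mil.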
FIGURE 18.4: Representative bitcoin statistics. (Plots not reproduced; panel
titles include Total BTC and Transactions last 24 h.)
Bitcoin promises are grand. Its proponents expect it to become a
supranational currency eventually supplanting national currencies, which, in
their minds, can be easily manipulated. Many even believe that bitcoin is the
modern digital version of gold, due to the effort required for PoW [13]. Whilst
bitcoin is clearly an impressive breakthrough, reality is much less grand than
perception, and is quite telling (at the time of writing this paper):
6. Miners are arranged in gigantic pools (so much for peer to peer (P2P)
mining!): AntPool (18.7%), F2Pool (17.7%), BitFury (7.7%), BTCC
Pool (7.4%), [Link] (7.3%). Thus a 51% attack becomes possible!
There is a very high probability that six consecutive blocks will be mined
by the same actor (so much for checks and balances!). Most of these
pools are Chinese, partly due to low electricity cost, partly due to
high-tech advances. Not only are miners predominantly Chinese, so are the
players: 91% CNY, 7% USD, 1% EUR.
7. At the moment, the main purpose of using bitcoin is for speculation and
circumvention of capital controls in China.
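One rough way to put numbers on the remark in item 6 about six consecutive blocks is sketched below in Python. It rests on a strong simplifying assumption that is not made in the text: each block is won independently, with probability equal to the pool's quoted share. The pool shares are those listed above; the function names and the figure of one block per 10 minutes over a 365-day year are illustrative choices.

```python
def prob_run(share, run=6):
    """Probability that a given window of `run` blocks is mined by one pool."""
    return share ** run


def expected_runs_per_year(share, run=6, blocks_per_year=6 * 24 * 365):
    """Expected number of windows of `run` consecutive blocks won by the pool."""
    windows = blocks_per_year - run + 1
    return windows * share ** run


# Shares quoted in item 6 for the two largest pools.
pools = {"AntPool": 0.187, "F2Pool": 0.177}
for name, share in pools.items():
    print(name, prob_run(share), expected_runs_per_year(share))

# The same calculation for a hypothetical actor controlling both pools.
combined = sum(pools.values())
print("combined", prob_run(combined), expected_runs_per_year(combined))
```

For any single six-block window the probability is small, but over a year of blocks the expected number of such runs for the largest pool is above one, and it grows quickly if pools coordinate.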
It is truly amazing to see how miners are prepared to perform socially useless
tasks, as long as they are paid for it. A telling historical analogy jumps to
mind. During the contest for design of the dome of Santa Maria del Fiore,
it was suggested to use dirt mixed with small coins to serve as scaffolding.
After the dome’s completion, the dirt was to be cleared away for free by the
profit-seeking citizens of Florence (proto-miners). It is clear that BC/DL is
still awaiting its Brunelleschi who figured out how to build the dome without
scaffolding [19].
T. J. Dunning, quoted by Karl Marx in Das Kapital [20], put it
succinctly:
With adequate profit, capital is very bold. A certain 10 per cent will
ensure its employment anywhere; 20 per cent certain will produce
eagerness; 50 per cent, positive audacity; . . .
1. Post-trade processing
2. Global payments
3. Trade finance
2 Other countries tried to follow suit but not all attempts were unqualified successes.
3 Corda, recently described in a white paper by R3, might be a step in this direction [22].
4. Rehypothecation
5. Syndicated loans
4 The thriller “Ronin,” which deals with DvP, is not critically acclaimed [24]. In the
author’s view, it takes the difficult challenges of transactions among many untrustworthy
parties which underlie many great thrillers and brings them to the fore, making
“Ronin” arguably one of the greatest of all thrillers ever. (Perhaps the ending would have
been different had the characters known about DLT.)
2. Netting
1. Cost
2. Speed
FIGURE 18.5: (a) Current organization of share trading. (b) First improvement
of stock trading setup, CSD and CCP are replaced by BC/DL. (c) Second
improvement of stock trading setup, in addition to CSD and CCP, custodians
and stock transfer agents are replaced by BC/DL.
For trade finance, there is the potential to use BC/DL to simplify the flow
of information among all participants and smart contracts to partially solve
the DvP problem.
In the rehypothecation setup, it is possible to use BC/DL to untangle
the ownership of the collateral. However, this is more of an accounting tool,
rather than a comprehensive solution, because in many instances the actual
legal ownership of collateral cannot be established with certainty.
Money and currency are very strange things; They keep on going
up and down and no one knows why; If you want to win, you lose,
however hard you try.
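FIGURE 18.6: Schematic representation of the monetary circuit. (Diagram not
reproduced; node labels: B1, B2, B3, B4, G, PB, CB, H, F, NFA.)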
In the author’s opinion, the functioning of the economy and the role of
money are best described by monetary circuit theory (MCT), which provides
a unifying framework for specifying how money lubricates and facilitates
production and consumption cycles in society. MCT describes in the most
precise way the dynamics of the economy and explains how and by whom
money is created. More specifically, it describes the interactions among
five sectors: government, central bank, private banks, firms, and
households. As part of the monetary circuit, private banks play an
outstanding role as credit money creators. In this framework, central banks do
not create money directly, but rather accelerate or slow down the process of
money creation by private banks by providing a unique universal medium in the
form of electronic cash for different banks to control their inventories of assets
and liabilities. A schematic representation of the monetary circuit is given in
Figure 18.6, which represents money flowing among the above-mentioned five
sectors of the economy.
the latter theory severely underemphasizes the unique and special role of the
banking sector in the process of money creation, and cannot rationally explain
things like the global financial crisis of 2007–2008 and other similar events,
which happen with disconcerting regularity. This aspect is particularly impor-
tant because currently there is a profound lack of appreciation on the part of
the conventional economic paradigm of the special role of banks. For example,
banks are excluded from widely used dynamic stochastic general equilibrium
models, which are influential in contemporary macroeconomics and popular
among central bankers, in spite of the fact that they systematically fail to
produce any meaningful results [31]. It is clear that a vibrant financial system
cannot operate without banks, and that the banking system is very complex
and difficult to regulate because banks become interconnected as a part of
their regular lending activities. In addition to their money creation role, banks
regulate access to the monetary system, by providing KYC and AML services.
destroyed, but the interest stays in the system. If the borrower defaults, the
money stays in the system indefinitely. The chain of money transfers from one
owner to the next is naturally described by a BC, ideally residing on a DL.
FIGURE 18.9: Money creation by two banks. The case of borrower’s default.
(a and b) Assets and liabilities of the first bank. (c and d) Assets and liabilities
of the second bank. Capital and CB cash of the first bank decrease, while
capital and CB cash of the second bank increase.
P2P—only money they have. Hence, the P2P impact on the financial system
as a whole is very limited.
6 One cannot help but notice with a modicum of satisfaction, that critics of the celebrated
Vasicek model for interest rates [35], who vigorously attacked him for allowing short rates
to become negative, proved to be completely wrong.
country’s GDP; see, e.g., [38]. It will smooth the motion of the wheels of
commerce and help the unbanked to become participants in the digital economy,
thus positively affecting society at large.
18.8 Conclusion
While the idea of BC/DLs is not new, modern technology gives it a new
lease of life. DLT opens new possibilities for making conventional banking and
trading activities less expensive and more efficient by removing unnecessary
frictions. Moreover, if built with skill, knowledge, and ambition, it has the
potential for restructuring the whole financial system on new principles. We
emphasize that achieving this goal requires overcoming not only technical but
also political obstacles.
While DLT has numerous applications, it is not entirely clear which finan-
cial applications should be handled first. Exchanges, payments, trade finance,
rehypothecation, syndicated loans, and other similar areas, where frictions
are particularly high, are attractive candidates. DCs, including CBDCs, are
another very promising avenue.
Currently, many applications of DL and related technology appear to be
misguided. In some cases, they are driven by a desire to apply these tools for
their own sake, rather than because the result would be clearly superior. In
other cases, they are driven by a failure to appreciate that the current systems
may be the way they are not for technological reasons, but rather because
of business and other considerations.
Acknowledgments
The invaluable help of Marsha Lipton from Numeraire Financial in thinking
about and preparing this chapter cannot be overestimated. I am grateful to
several colleagues, including Alex Pentland and David Shrier from MIT, Damir
Filipovic from EPFL, Matheus Grasselli from McMaster, Julian Phillips from
Standard Chartered Bank, and Paolo Tasca from UCL for their help and
suggestions. As CEO of StrongHold Labs, I am currently working on a new type
of digital bank, which will utilize some of the ideas presented in this
chapter. This chapter is reprinted with permission from the Journal of Risk
Finance, 19(1), 2018.
References
1. Lipton, A., 2016, Banks must embrace their digital destiny, Risk Magazine, Vol.
29, No. 8.
2. Lipton, A., Shrier, D., and Pentland, A., 2016, Digital Banking Manifesto: The
End of Banks? in Frontiers of Financial Technology, Visionary Future, pp. 117–
140.
3. Tasca, P., Aste, T., Pelizzon, L., and Perony, N. (Eds.) 2016, Banking Beyond
Banks and Money: A Guide to Banking Services in the Twenty-First Century,
Springer, Switzerland.
4. Aristotle, Aristotle’s Nicomachean Ethics, R.C. Bartlett and S.D. Collins (trans-
lators), University of Chicago Press, Reprint edition.
5. Zyskind, G., Nathan, O., and Pentland, A., 2015, Enigma: Decentralized com-
putation platform with guaranteed privacy, MIT Working Paper.
6. Allen, W.R., 1993, Irving Fisher and the 100 percent reserve proposal, The
Journal of Law and Economics, Vol. 36, No. 2, pp. 703–717.
7. Beneš, J. and Kumhof, M., 2012, The Chicago plan revisited, IMF Working
Paper.
8. Miller, S.J., 1993, A fully replicated distributed database system, Research Note
ERL-0719-RN, Electronics Research Laboratory.
10. Nakamoto, S., 2008, Bitcoin: A peer-to-peer electronic cash system, Working
Paper.
11. Chaum, D., 1983, Blind signatures for untraceable payments, in Advances in
Cryptology, Springer, US, pp. 199–203.
13. Popper, N., 2015, Digital Gold: The Untold Story of Bitcoin, Penguin, UK.
14. Diffie, W., and Hellman, M., 1976, New directions in cryptography, IEEE Trans-
actions on Information Theory, Vol. 22, No. 6, pp. 644–654.
15. Miller, V.S., 1986, Use of elliptic curves in cryptography, in Williams, H.C.
(Ed.), Advances in Cryptology—CRYPTO ’85 Proceedings, Lecture Notes in
Computer Science, Vol. 218, Springer, Berlin, Heidelberg, pp. 417–426.
17. Back, A., 2002, Hashcash—A denial of service counter-measure, Working Paper.
18. Merkle, R.C., 1987, A digital signature based on a conventional encryption func-
tion, in Conference on the Theory and Application of Cryptographic Techniques,
Springer, Berlin, Heidelberg, pp. 369–378.
19. King, R., 2013, Brunelleschi’s Dome: How a Renaissance Genius Reinvented
Architecture, Walker & Company, New York, NY.
20. Marx, K., 1867, Das Kapital: Kritik der Politischen Ökonomie, Verlag von Otto
Meisner, Germany.
21. Ansper, A., Buldas, A., Freudenthal, M., and Willemson, J., 2003, Scalable
and efficient PKI for inter-organizational communication, in Computer Security
Applications, Proceedings of 19th Annual Conference, IEEE, pp. 308–318.
22. Brown, R.G., Carlyle, J., Grigg, I., and Hearn, M., 2016, Corda: An Introduc-
tion, R3 CEV Working Paper.
23. Skeel, D., 2010, The New Financial Deal: Understanding the Dodd–Frank Act
and Its (Unintended) Consequences, John Wiley & Sons, Hoboken, NJ.
24. Turan, K., 2004, Never Coming to a Theater Near You: A Celebration of a
Certain Kind of Movie, PublicAffairs, New York.
25. Bloch, M., 1953, Mutations monétaires dans l’ancienne France: Première Partie,
Annales Economies, Societes, Civilisations, Vol. 8, No. 2, pp. 145–158.
26. Keynes, J.M., 1936, General Theory of Employment, Interest and Money,
Macmillan, London.
27. Keen, S., 2001, Debunking Economics: The Naked Emperor of the Social Sci-
ences, Zed Books, London & New York.
28. Werner, R.A., 2014, Can banks individually create money out of nothing?—The
theories and the empirical evidence, International Review of Financial Analysis,
Vol. 36, pp. 1–19.
29. Lipton, A., 2016, Modern monetary circuit theory, stability of interconnected
banking network, and balance sheet optimization for individual banks, Interna-
tional Journal of Theoretical and Applied Finance, Vol. 19, No. 6, pp. 1650034-1–
1650034-57.
30. Robinson, J., 1977, Michal Kalecki on the economics of capitalism, Oxford Bul-
letin of Economics and Statistics, Vol. 39, No. 1, pp. 7–17.
31. Buiter, W.H., 2009, The unfortunate uselessness of most “state of the art”
academic monetary economics, MPRA Working Paper.
32. Rogoff, K.S., 2016, The Curse of Cash, Princeton University Press, Princeton
and Oxford.
33. Ilgmann, C., 2015, Silvio Gesell: “A strange, unduly neglected” monetary theo-
rist, Journal of Post Keynesian Economics, Vol. 38, No. 4, pp. 532–564.
34. Fisher, I., Cohrssen, H.R., and Fisher, H.W., 1933, Stamp Scrip, Adelphi Com-
pany, New York, NY.
35. Vasicek, O., 1977, An equilibrium characterization of the term structure, Journal
of Financial Economics, Vol. 5, No. 2, pp. 177–188.
36. Barrdear, J. and Kumhof, M. 2016, The macroeconomics of central bank issued
digital currencies, Bank of England, Working Paper.
37. Broadbent, B. 2016, Central banks and digital currencies, Speech at London
School of Economics.
38. Lipton, A., 2016, The decline of the cash empire, Risk Magazine, Vol. 29, No.
11, p. 53.
39. Danezis, G. and Meiklejohn, S., 2015, Centrally banked cryptocurrencies, UCL
Working Paper.
40. Reid, F. and Harrigan, M., 2013, An analysis of anonymity in the bitcoin system,
in Security and Privacy in Social Networks, Springer, New York, pp. 197–223.
41. Baynham-Herd, X., 2016, Banking Balance Sheets and Blockchain: A Path to
100% Digital Money, UBS Discussion Paper.
42. King, M., 2016, The End of Alchemy: Money, Banking, and the Future of the
Global Economy, WW Norton & Company, New York, NY.
43. Dwyer, J., 2016, Central Bank-Issued Digital Currency: Assessing Central Bank
Perspectives of DLT and Implications for Fiat Currency and Policy Stimulus,
Celent Working Paper.
Chapter 19
Optimal Feature Selection Using
a Quantum Annealer
CONTENTS
19.1 Introduction
19.2 Credit Scoring and Classification as a Business Problem
19.3 Quadratic Unconstrained Binary Optimization as an Established Approach
19.4 Formulation of the Credit Scoring and Classification Problem
19.5 Feature Selection
19.6 Classification
19.7 Binarizing, Scaling, and Correlating the German Credit Data
19.8 Coding the Feature Selector
19.8.1 An inspiring simplicity
19.8.2 QUBO feature selection in the 1QBit SDK
19.8.3 What happens in the call to minimize()
19.9 Evaluation Metrics
19.10 Experimental Results
19.10.1 Establishing the zero-rule and other baseline properties
19.10.2 QUBO feature selection with logistic regression
19.10.3 Recursive feature elimination with logistic regression
19.10.4 Comparison of QUBO feature selection and recursive feature elimination
19.10.5 Comparison with potentially missed subsets
19.11 Comparison with Previously Reported Results
19.12 Conclusion
Acknowledgments
References
19.1 Introduction
Quantum computing is still in its infancy. Its potential is sensed, but it is
not yet widely applied. Part of this is due to its specialized nature, and to
the small size of the problems that can currently be handled. However, small
does not mean zero, and with the aid of software like the 1QBit Quantum-Ready
Software Development Kit, machines like the D-Wave quantum annealer can be
used to solve small but useful problems.
The software development kit (SDK) forms an abstraction layer between
the quantum hardware and the financial application program. In the specific
case of D-Wave, the SDK provides the objects needed to represent the objec-
tive function for a quadratic unconstrained binary optimization (QUBO) prob-
lem. The SDK also provides tools for translating constrained problems into
unconstrained problems, integer problems into binary problems, and so on.
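To make the QUBO objective and the constrained-to-unconstrained translation concrete, here is a minimal Python sketch. It does not use the 1QBit SDK and makes no claim about its API: the function names are hypothetical, the solver is an exhaustive search that is only feasible for a handful of variables, and the example problem (pick exactly one of three items at minimum cost) is invented for illustration. The constraint is folded into the objective as a quadratic penalty, penalty × (x1 + x2 + x3 − 1)^2, expanded using x_i^2 = x_i for binary variables and with the constant term dropped.

```python
from itertools import product


def qubo_energy(Q, x):
    """Evaluate the QUBO objective: the sum over i, j of Q[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_minimize(Q):
    """Try every binary assignment; a stand-in for a real solver on tiny cases."""
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=len(Q)):
        e = qubo_energy(Q, x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


def one_hot_qubo(costs, penalty):
    """Build Q for: minimize the cost of the chosen item, choosing exactly one."""
    n = len(costs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Linear terms sit on the diagonal because x_i * x_i equals x_i.
        Q[i][i] = costs[i] - penalty
        for j in range(i + 1, n):
            # Selecting two items at once costs 2 * penalty.
            Q[i][j] = 2 * penalty
    return Q


Q = one_hot_qubo([3, 1, 2], penalty=10)
x, energy = brute_force_minimize(Q)
print(x, energy)
```

The penalty has to exceed the largest cost difference for the constraint to hold at the minimum. Adding back the dropped constant (here 10) recovers the cost of the selected item.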
Optimization is a computational paradigm that follows naturally from
the physics of quantum annealing. However, other types of hardware can
have other paradigms. The abstraction layer is designed to dispatch its high-
level problem representations to the appropriate physical solver. 1QBit refers
to the SDK as quantum ready, and to its overall architecture as hardware
agnostic.
The practical details of abstracting from multiple-state qubits to conven-
tional ones and zeros are outside the scope of this chapter. Suffice it to say
that human beings have been doing experiments with quantum mechanical
systems for over a century now. A lot is known about how to distinguish
between energy states, and how to accumulate observations until some level
of certainty has been reached. Some of this knowledge is encoded into the
software made by the hardware manufacturers themselves. However, for the
quantitative analyst or software developer working on a business problem, it is
easier to work with a consistent set of abstract entities that map more closely
onto the problem domain. The goal of the SDK is to provide these.
In the rest of this chapter, we will approach the problem of optimal feature
selection for credit scoring and classification as a perfectly ordinary problem
from the literature, that we just happen to solve with a quantum computer.
Only at a few select points will we pull back the curtain to reveal the hardware
being used.
are nonperforming [4]. Superficially, this indicates that lenders are making
good decisions 98% of the time. However, the U.S. Federal Deposit Insurance
Corporation [5] publishes yearly summaries of bank failures, and in the 15
years since 2001, there have been 547 failed institutions. Nonperforming loans
have been a major cause.
According to the 2016 Credit Access Survey by the U.S. Federal Reserve
Bank of New York [6], approximately 40% of U.S. credit applications are
rejected. Moreover, between 20% and 40% of consumers expect their applica-
tions to be rejected (it depends on the type of credit), and many do not even
apply. Yet, among these people, there may well be qualified customers for the
right kind of lender.
According to a literature survey by Huang [7], the academic approach to credit
scoring is typically one of bigger data and bigger models. However, in a recent
article in Forbes magazine, consumer lending veteran Matt Harris [8] takes a
different view:
The recipe for success here starts with picking a truly underserved
segment. Then figure out some new methods for sifting the gold from
what everyone else sees as sand; this will end up being a combination
of data sources, data science and hard won credit observations.
If Harris is correct, the thing to look for in new credit scoring and
classification tools will not be their success rate in large-scale applications,
for which tools like FICO [9] already exist, but their flexibility and ease of
integration into specialized applications.
Feature selection has a natural role in this. More and more data is available
all the time, and although there are various complex schemes for using it, the
idea of finding a small set of key features is simple and easy to grasp. Ideally, we
would want these features to be both influential and independent, but here too
there are nuances. People lie. Data can mislead [10]. The “redundant” feature
might actually be the corroborative feature. The correct balance of influence
and independence will depend on the “hard-won credit observations” that
Harris sees as crucial, and on the lender’s confidence that the model genuinely
includes them.
It should be noted that even the largest and most elaborate system
for credit scoring has to begin somewhere. Moreover, the addition of new
credit instruments to an existing system needs its own stage of analysis and
validation. Thus the work of a quantitative analyst may involve both small
feature sets and large feature sets. The development of new instruments is
not so very different from the development of new markets as Harris pictures
them.
The main difference between QUBO Feature Selection and RFE lies in how
aggressively each tries to reduce the number of feature variables in the fea-
ture subset. QUBO Feature Selection considers both the independence and
influence of the features under consideration. RFE focuses on eliminating
features that are less influential. Both approaches yield good results on the
German Credit Data.
For the classifier, we use logistic regression from scikit-learn. Unlike linear
regression, where one can imagine two continuous variables and fitting a line
to a scattered set of points, logistic regression assumes that the dependent
variable is a category, for example, the zero and one of a binary classifier. The
fitted line no longer predicts the value of the dependent variable, but rather
the probability that the dependent variable will have a specific value. It is a
well-established technique with a long pedigree [15].
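The step from a fitted line to a predicted probability can be illustrated with a small sketch. The coefficients below are made up for illustration and are not fitted to the German Credit Data; the function names are ours and do not come from scikit-learn.

```python
import math

def logistic(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, intercept, slope):
    """Probability that the dependent variable equals 1 for input x."""
    return logistic(intercept + slope * x)

def predict_class(x, intercept, slope, cutoff=0.5):
    """Binary class obtained by thresholding the probability."""
    return 1 if predict_probability(x, intercept, slope) >= cutoff else 0

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT, SLOPE = -1.0, 2.0
for x in (-1.0, 0.5, 2.0):
    p = predict_probability(x, INTERCEPT, SLOPE)
    print(f"x = {x:+.1f}  p = {p:.3f}  class = {predict_class(x, INTERCEPT, SLOPE)}")
```

The linear score plays the role of the fitted line; the logistic function converts it to a probability, and the cutoff turns the probability into a class.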
To provide a benchmark, we apply logistic regression to the full fea-
ture set. Out of the box (i.e., without tuning), the logistic regression class
from scikit-learn gives a 75% success rate on the German Credit Data.
This is comparable to other methods reported in the literature, such as
support vector machines (SVMs), decision trees, neural networks, k-Nearest
Neighbors (k-NN) classification schemes, and so forth, as in Waad [12] or
Huang [7].
Standing on the shoulders of giants, then, we will now approach the optimal
feature selection problem in a perfectly ordinary way.
• An integer, where the order (lower to higher) has potential meaning, for
example, bank balances, years of education, and so on.
• A category, such as a geographic region code, where higher and lower
values are arbitrary. It may also include values such as “missing” or
“refused to answer.”
• A decimal number, representing, for example, age, dollar amounts, inter-
est rates, and so on, where the order has meaning. Data such as the
latitude and longitude of the applicant’s home would be better repre-
sented as a category.
• A Boolean (yes/no) value, which may be considered as an integer or a
category.
From this, we can express the problem in terms of the argmin operator,
which returns the vector x∗ for which its function argument is minimized:
$$x^{*} = \operatorname*{argmin}_{x} \left( -x^{T} Q x \right).$$
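For a handful of features, the argmin can be evaluated by exhaustive enumeration, which makes the meaning of the expression concrete. The sketch below is a classical brute-force stand-in, not the quantum annealer or the SDK; the 3 x 3 matrix Q is hypothetical and the function names are ours.

```python
from itertools import product

def objective(x, Q):
    """Return -x^T Q x for a binary vector x and a square matrix Q."""
    n = len(x)
    return -sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))

def argmin_bruteforce(Q):
    """Enumerate all 2^n binary vectors and return the first minimizer found."""
    n = len(Q)
    best_x, best_value = None, float("inf")
    for x in product((0, 1), repeat=n):
        value = objective(x, Q)
        if value < best_value:
            best_x, best_value = x, value
    return best_x, best_value

# Hypothetical 3 x 3 matrix: positive diagonal entries reward selecting a
# feature, negative off-diagonal entries penalize selecting a pair together.
Q = [
    [0.5, -0.4, -0.1],
    [-0.4, 0.4, -0.1],
    [-0.1, -0.1, 0.3],
]
x_star, value = argmin_bruteforce(Q)
print(x_star, value)
```

Enumeration doubles in cost with every added feature, which is why a dedicated solver is needed at realistic problem sizes.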
19.6 Classification
The classification problem may be stated as follows: given a row vector u of
new observations from a new applicant, calculate whether the vector belongs
to the creditworthy class. More specifically, find a function f (u) that returns
0 for acceptance and 1 for rejection.
One of the premises of machine learning is that such a function can be
derived from a programmatic analysis of existing data. The existing data is
divided into a training set and a test set. A candidate function is derived
from the training set, and its performance is measured on the test set. Much
has been written on the best way to define such functions, the best way to
divide the data, how to adapt to new data, and so on. For example, see
the citation lists at the UCI Machine Learning Repository [13] or Chen [17].
For a cautionary note, however, the well-known Anscombe’s Quartet is worth
revisiting as a problem in dividing points as opposed to fitting lines through
them [18].
In our example here, we use a simple classifier based on logistic regression.
However, in the code examples we will see that other classifiers could easily be
used in its place. One might also imagine a classifier with tunable parameters,
and searching through these to obtain the settings for best overall perfor-
mance. In the future, the speed of QUBO Feature Selection on a quantum
annealer might enable searches on quite large spaces to be done interactively,
as opposed to being spread out over hours or even days.
• All of the numerical features are scaled to mean zero and variance one.
• The classification variable is transformed to 0 = good, 1 = bad.
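The two preparation steps listed above can be sketched in plain Python. The values are made up, and the raw class coding (1 = good, 2 = bad) is an assumption about the source file rather than something stated in this chapter; the function names are ours.

```python
from statistics import mean, pstdev

def standardize(column):
    """Scale a numeric column to mean zero and variance one."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    return [(value - mu) / sigma for value in column]

def recode_class(labels, good_label=1):
    """Map the original class labels to 0 = good, 1 = bad."""
    return [0 if label == good_label else 1 for label in labels]

# Made-up values for illustration only.
credit_amounts = [1200.0, 5000.0, 2300.0, 800.0]
raw_labels = [1, 2, 1, 1]  # assumed raw coding: 1 = good, 2 = bad

scaled = standardize(credit_amounts)
classes = recode_class(raw_labels)
print(scaled, classes)
```

In a scikit-learn workflow the same scaling would normally be done with the StandardScaler class, which also divides by the population standard deviation.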
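import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

featureMatrix = [Link]
classVector = [Link]
estimator = LogisticRegression()
# Recursive feature elimination with threefold cross-validation,
# removing one feature per step.
selector = RFECV(estimator, step=1, cv=3)
selector = selector.fit(featureMatrix, classVector)
# Boolean mask of the retained features, converted to column indices.
indexList = selector.get_support()
featureList = np.where(indexList)[0]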
At the end of this simple code block, the analyst has a feature list that can
be used to select columns from the feature matrix. The accuracy scores for
the classifier (a.k.a. the estimator) can then be computed using testing and
scoring classes such as ShuffleSplit or StratifiedShuffleSplit.
Strictly speaking, RFECV is not directly comparable with QUBO Feature
Selection, since the feature list is calculated independently of the accuracy
scoring. The simpler variant of RFE with a cardinality target is closer in
terms of program flow. A list of candidate subsets (of varying cardinalities) is
returned by the selector and subsequently tested for performance. However,
the simplicity of the RFECV code block is both an example and an inspiration.
is no a priori reason why this should be so, and the authors of this chapter
are hopeful that the wider use of QUBO Feature Selection may lead to some
deeper insights.
In the code example below, we construct a Q matrix using code that can
be easily written from the equations shown in the previous section. We can
assign our value of α at the outer level of a grid search, or at the core of a
bisection search that looks for jumps in the objective function, but in each
case we must somewhere solve the optimization problem to obtain a candidate
feature subset.
To do this, we use the SDK’s QuadraticBinaryPolynomialBuilder class,
which returns a polynomial object representing the objective function seen
earlier. The poly object is passed to a solver and the solution is returned as a
list of ones and zeros, referred to as a configuration. We convert this to a list of
integer indices that can be used to extract columns from a pandas DataFrame
or NumPy array, which is typically how feature matrices and class vectors are
passed to machine learning methods.
In the example below, the D-Wave solver is assigned explicitly.
builder = [Link]()
## ... Some code to construct the Q matrix...
poly = builder.build_polynomial()
solver = [Link](HWDWaveSolver(url, token))
solutionList = [Link](poly)
lowEnergySolution = solutionList.get_minimum_energy_solution()
config = [Link]
featureList = [Link]([Link]()).flatten().tolist()
featureMatrix = [Link][:, featureList].values
classVector = [Link]
estimator = LogisticRegression() # and so on...
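The conversion from a configuration to a list of column indices, described before the code above, does not depend on the SDK and can be sketched in plain Python. The configuration and the feature matrix below are hypothetical, and the helper names are ours.

```python
def configuration_to_indices(config):
    """Return the positions of the ones in a 0/1 configuration."""
    return [index for index, bit in enumerate(config) if bit == 1]

def select_columns(rows, indices):
    """Keep only the selected columns from a list-of-lists matrix."""
    return [[row[i] for i in indices] for row in rows]

# Hypothetical solver output for a five-feature problem.
config = [1, 0, 0, 1, 1]
feature_list = configuration_to_indices(config)

# Made-up feature matrix with two rows and five columns.
feature_matrix = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.1, 1.2, 1.3, 1.4, 1.5],
]
reduced = select_columns(feature_matrix, feature_list)
print(feature_list, reduced)
```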
In the experimental study that led to this chapter being written, we fixed
the parameters for the classifier and focused on studying how the value of
α affected the feature subset returned by QUBO Feature Selection. A full
holistic optimization across all of the available parameters is a topic for future
work.
• Readout. The state of the system is then read, and in the ideal case it
would correspond to the optimal solution of the optimization problem
we wish to solve.
evaluation metric was defined as the unweighted accuracy, that is, the number
of correct classifications divided by the total number of classifications made.
Other metrics from scikit-learn were attempted, but they always led to the
same optimal alpha or RFE feature set. Unweighted accuracy was kept for
compatibility with other work and ease of understanding, as in Reference 7.
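The unweighted accuracy metric is simple enough to state in a few lines. The sketch below uses made-up labels and a function name of our own; scikit-learn's accuracy_score computes the same quantity.

```python
def unweighted_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels: 0 = good credit, 1 = bad credit.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 0, 0, 1, 1, 0, 1]
print(unweighted_accuracy(y_true, y_pred))
```

Every misclassification counts equally here, whether it is a false positive or a false negative.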
Testing and scoring were performed using the StratifiedShuffleSplit
cross-validation class from scikit-learn. Given the feature matrix and class
vector, this class returns sets of row indices that can be used to divide the
matrix rows into a training set and a test set. The number of shuffles and the
test share are each set by an argument. For example, 5 shuffles with a 20% test
share will take 80% of the matrix for the training set and 20% for the test
set, 5 times over. Each split is drawn at random, so, unlike the folds of a
K-fold scheme, the test sets are not guaranteed to be disjoint.
The accuracy score for each split is slightly different. Since these reflect a
random selection of data for training and testing, it is conventional to report
the mean of the individual accuracy scores. We follow the convention used by
scikit-learn and calculate the error bars for the 95% confidence level.
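A minimal sketch of this scoring procedure follows. It uses a synthetic data set as a stand-in for the credit data, only 100 shuffles to keep the run short, and assumes that the convention referred to is the mean plus or minus two standard deviations; none of these choices is taken from the authors' code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

# Synthetic stand-in for the credit data: 1000 rows, 48 features,
# roughly 70% of rows in class 0 and 30% in class 1.
X, y = make_classification(
    n_samples=1000, n_features=48, n_informative=10,
    weights=[0.7, 0.3], random_state=0,
)

# 100 independent shuffles, each holding out 20% of the rows for testing.
splitter = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=0)
estimator = LogisticRegression(max_iter=1000)

scores = cross_val_score(estimator, X, y, cv=splitter, scoring="accuracy")
mean_accuracy = scores.mean()
error_bar = 2 * scores.std()  # assumed convention: two standard deviations

print(f"accuracy = {mean_accuracy:.3f} +/- {error_bar:.3f}")
```

Changing n_splits and test_size reproduces the kind of comparison between numbers of shuffles and test shares that is discussed in this section.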
[Plot: Spearman correlation (absolute value) versus feature number (sorted by influence).]
were binned into ordered categories and generally behaved as expected. How-
ever, the more arbitrary binarized categories (e.g., loan purpose) had uni-
formly low mutual information scores, and the objective function tended to
cycle among features. A plot of the ranked mutual information scores is shown
in Figure 19.3.
The results shown in this chapter were all calculated using the Spearman
correlation coefficient. As mentioned previously, this is an area where more
research is needed, for example, using other data sets with a broader mix of
feature variables.
Before we select any features, however, we first examine how well the
logistic regression classifier performs on the full feature set, that is, all 48
feature variables. We do this using the StratifiedShuffleSplit() class from scikit-
learn. Figure 19.4 shows that the mean accuracy depends on how many times
the data is shuffled, and on how the data is split between the training set and
the test set.
The combination of 1000 shuffles and a 20% test share was chosen arbitrarily
as the standard for initial performance comparisons, being much more
convenient than the larger numbers. It avoids the fluctuations found below 500
shuffles, and is close to the converged scores at 3000 shuffles and above. For
the definitive score comparison, however, the full 3000 shuffles were used. The
results for 10%, 15%, and 20% share were always very close and typically in
the median position. The 20% test share was therefore used throughout.
[Plot: mutual information score versus feature number (sorted by influence).]
FIGURE 19.3: Mutual information scores. The age, term, and credit
amount fields were binned into categories. However, the correlation matrix
based on this technique led to fluctuations in the accuracy scores, a lower
mean accuracy at the “best” feature subset, and a larger “best” feature sub-
set cardinality of 34 elements.
[Plot: mean accuracy versus number of shuffles (0 to 3000).]
[Histogram: count versus accuracy score.]
FIGURE 19.5: Accuracy scores for all 48 features, using 1000 shuffles with
20% test share. In the work described in this chapter, this distribution was
typical of the German Credit Data, regardless of the feature subset, number
of shuffles, test share, classifier parameters, and the like.
It can be seen in Figure 19.4 that the mean accuracy increases as the test
share is reduced, that is, a bigger training set yields a more accurate predictor.
However, the difference is small in relation to the dispersion of scores from
different shuffles, as can be seen in Figure 19.5.
Note that the 30% share curve does not fluctuate as much as the curves
for smaller shares. It turns out that scikit-learn chose a threefold default for
RFECV, and this may be one reason why it can attain a good (although not
optimal) result in relatively little time. It is also important to keep track of
absolute numbers as well as percentages: a 5% test share of 1000 samples
consists of only 50 samples, which stratification on the German Credit Data
will constrain to 35 good credit samples and 15 bad samples. The curve for
such a small share fluctuates widely at the outset and converges slowly.
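The absolute numbers follow from simple arithmetic, sketched below for the 5% share mentioned above and for the 20% share used elsewhere in this chapter. The function is ours and uses plain rounding; scikit-learn's own allocation may differ by a sample when the proportions do not divide evenly.

```python
def stratified_test_counts(n_samples, test_share, bad_fraction):
    """Return (test size, good count, bad count) for a stratified test set."""
    n_test = round(n_samples * test_share)
    n_bad = round(n_test * bad_fraction)
    return n_test, n_test - n_bad, n_bad

# German Credit Data proportions as described in the text: 1000 samples,
# 70% good credit and 30% bad credit.
for share in (0.05, 0.20):
    print(share, stratified_test_counts(1000, share, 0.30))
```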
In Figure 19.5, we see the distribution of accuracy scores for all 48 features
at a 20% test share (800/200 train/test split) counted over 1000 shuffles.
Stratification forces the training set and the test set to have the same 70/30
distribution of good and bad credit samples, so that a 200-sample test set
will contain 60 bad credit samples chosen from 300 in the set overall. In
a large number of shuffles, there will inevitably be some repetition (a point
highlighted by scikit-learn in its documentation). Thus although the data looks
“Gaussian” and fits into the Gaussian overlay, in practice there are certain
scores that occur more frequently, and extreme values are not observed above
a certain limit. Note that in comparison to the spread of accuracy scores from
0.7 to 0.8 seen in Figure 19.5, the (converged) spread of mean scores from
0.75 to 0.76 in Figure 19.4 is relatively small. So long as we avoid small test
shares and low numbers of shuffles, the error bars from the accuracy scores
will dominate the uncertainty overall.
We now take a moment to examine the behavior of logistic regression on
feature subsets with fewer than 48 features.
The number of possible subsets is given by the combinatorial function
C(48, K), where K is the cardinality of the subset. The largest number of
possible subsets occurs when 24 feature variables are selected, and is approx-
imately 32 trillion. There are some 280 trillion subsets possible overall.
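These counts are easy to check with the standard library. The sketch below tabulates C(48, K) for every cardinality and sums the results; the variable names are ours.

```python
from math import comb

N_FEATURES = 48

# Number of subsets at each cardinality K = 0, ..., 48.
counts = {k: comb(N_FEATURES, k) for k in range(N_FEATURES + 1)}

largest_k = max(counts, key=counts.get)
total = sum(counts.values())

print(f"largest: K = {largest_k}, C(48, {largest_k}) = {counts[largest_k]:,}")
print(f"total over all cardinalities: {total:,}")
```

The total equals 2 to the power 48, since each of the 48 features is either in a subset or out of it.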
It is not possible to test these trillions of subsets systematically. How-
ever, we can gain an idea of how they behave from random sampling. For
example, Figure 19.6 shows the accuracy of logistic regression for up to 10,000
randomly selected subsets at each of the 48 possible cardinalities (fewer where
fewer subsets exist), which examines 432,354 feature subsets out of
281,474,976,710,656 possibilities. If we record
the best of the mean accuracies for each group of 10,000 subsets, and plot
them separately (the triangle markers in Figure 19.6), we can identify a “best
detected” subset at cardinality 35, with an accuracy of 0.76 ± 0.05. We then
examine how this mean was calculated from the accuracy scores for 1000 shuf-
fles with a 20% test share. In Figure 19.7, we see that the variance is large,
and comparable to what we saw with all 48 features present. Feature selection
looks to be a search for small improvements in collections of very noisy test
results.
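The random search can be sketched as follows. The sampling helper is ours, not the authors' code. One reading that reproduces the figure of 432,354 subsets is a budget of 10,000 subsets per cardinality, capped by the number of subsets that exist, and counted over cardinalities 0 through 48; that reading is our assumption.

```python
import random
from math import comb

N_FEATURES = 48
MAX_PER_CARDINALITY = 10_000

def random_subsets(n_features, cardinality, n_subsets, rng):
    """Yield random feature subsets (sorted index tuples) of one cardinality."""
    for _ in range(n_subsets):
        yield tuple(sorted(rng.sample(range(n_features), cardinality)))

# Budget per cardinality: at most 10,000, or every subset if fewer exist.
# Counting cardinalities 0 through 48 is our assumption.
budget = {k: min(MAX_PER_CARDINALITY, comb(N_FEATURES, k))
          for k in range(N_FEATURES + 1)}
print(sum(budget.values()))

rng = random.Random(0)
for subset in random_subsets(N_FEATURES, 5, 3, rng):
    print(subset)
```

Each sampled subset would then be scored with the same shuffle-and-split procedure used for the full feature set. Independent draws can repeat a subset, so a careful implementation would deduplicate them.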
[Plot: mean accuracy versus feature subset cardinality.]
[Histogram: count versus mean accuracies for the splits (using StratifiedShuffleSplit).]
FIGURE 19.7: Distribution of accuracy scores for the best feature subset
found through random search.
• Calculate the feature subset in an efficient way that can scale to larger
initial feature sets.
[Plot: accuracy means versus alpha, from 0.0 (“feature independence”) to 1.0 (“feature influence”).]
of the wrapper model is that optimization can be done “at the α level” without
looking at the details of the subsets.
Figure 19.8 shows the full range of α from 0 to 1. On the left-hand side,
where α is close to zero, the emphasis is on feature independence. This favors
small subsets, and since their regression coefficients are often not large enough
to “push” the classifier across the cutoff point of p ≥ 0.5, the predicted class
is 0. They classify almost all of the samples as “good credit” and achieve the
zero-rule’s 70% success rate.
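The zero-rule baseline and the effect of the cutoff can be illustrated with a short sketch. The class balance is the 70/30 split described in the text; the probabilities are made up, and the function names are ours.

```python
def zero_rule_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    majority = max(set(labels), key=labels.count)
    return labels.count(majority) / len(labels)

def classify(probability_of_bad, cutoff=0.5):
    """Return 1 (bad credit) only if the probability reaches the cutoff."""
    return 1 if probability_of_bad >= cutoff else 0

# Class balance described in the text: 700 good (0) and 300 bad (1).
labels = [0] * 700 + [1] * 300
print(zero_rule_accuracy(labels))

# Made-up probabilities that never reach the cutoff: every sample is
# classified as good credit, which reproduces the zero-rule result.
weak_probabilities = [0.31, 0.42, 0.28, 0.49]
print([classify(p) for p in weak_probabilities])
```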
In Figure 19.9, we look more closely at the region between α = 0.9 and
α = 1. Here, the emphasis is on feature influence, and the subsets eventually
grow to include all 48 available feature variables.
It is interesting that accuracy increases with the size of the subset, reaches
a peak at α = 0.977 with 24 elements, and then declines gradually as more
features are added. The drop in accuracy to the left of α = 0.977 is quite
sharp, and although it is encouraging to see a global maximum so clearly
defined, this may be due to the data, and should be further investigated.
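The way the subset grows with α can be imitated on a toy problem. The sketch below is not the authors' implementation: the form of Q (α times the feature-to-class correlation on the diagonal, minus (1 - α) times the feature-to-feature correlation off the diagonal) is our assumption, chosen to match the description of α as a trade-off between influence and independence. The correlations are made up, and a brute-force search stands in for the annealer.

```python
from itertools import product

def build_q(influence, correlation, alpha):
    """Assumed Q: influence on the diagonal, pairwise dependence off it."""
    n = len(influence)
    return [
        [alpha * influence[i] if i == j else -(1.0 - alpha) * correlation[i][j]
         for j in range(n)]
        for i in range(n)
    ]

def solve(Q):
    """Brute-force minimizer of -x^T Q x over binary vectors x."""
    n = len(Q)
    def cost(x):
        return -sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return min(product((0, 1), repeat=n), key=cost)

# Made-up absolute correlations for four features.
influence = [0.30, 0.25, 0.10, 0.05]  # feature versus class
correlation = [                       # feature versus feature
    [1.0, 0.6, 0.1, 0.2],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]

for alpha in (0.0, 0.5, 0.9, 1.0):
    x = solve(build_q(influence, correlation, alpha))
    print(f"alpha = {alpha:.1f}  subset = {[i for i, b in enumerate(x) if b]}")
```

In a grid search, each candidate subset returned for a given α would then be scored with the classifier, and the α with the best mean accuracy retained.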
[Plot: accuracy means versus alpha (zoomed in on 0.9 to 1.0).]
fitting is accompanied by testing, using a training set and test set chosen
according to a folding parameter.
Conceptually, RFE is like starting the QUBO objective function at α = 1
and working downward toward α = 0, except that RFE is recursive and treats
each iteration as a new feature set.1 Unlike QUBO Feature Selection, RFE
does not test explicitly for feature independence, nor does it allow a feature
to “come back” after it has been eliminated.
We began with the direct version of RFE, where we specified the desired
number of features and then measured the performance of the returned feature
subset.
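A minimal sketch of the direct version, using scikit-learn's RFE class with a cardinality target, is shown below. The synthetic data and the target of 8 features are placeholders for illustration and do not correspond to the experiments reported here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the credit data (not the German Credit Data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0,
)

target_cardinality = 8  # hypothetical target, chosen for illustration
selector = RFE(
    LogisticRegression(max_iter=1000),
    n_features_to_select=target_cardinality,
    step=1,
)
selector.fit(X, y)

# Boolean mask of retained columns, converted to integer indices.
feature_list = [i for i, keep in enumerate(selector.get_support()) if keep]
X_reduced = X[:, feature_list]
print(feature_list, X_reduced.shape)
```

Looping over the target cardinality and scoring each reduced matrix gives one accuracy estimate per cardinality.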
The accuracies were measured with the same 1000 shuffles and 20% test
share that was used with the other methods. The results are shown in
Figure 19.10.
RFECV was very fast, although it converged to different feature subsets as
the cross-validation settings and random seeds were varied. In practice, how-
ever, it was easy to search these (and faster than running RFE for thousands
of shuffles). RFECV ultimately delivered a 31-element feature subset with an
accuracy of 0.76 ± 0.05, comparable with the other methods.
1 One could imagine the recursive elimination of features as α is iterated from 1 down
to 0. A recursive version of QUBO Feature Selection would make an interesting topic for
future work.
[Plot: mean accuracy, on a vertical axis from 0.70 to 0.85.]
FIGURE 19.10: RFE for cardinality targets from 1 to 48, using 1000 shuffles
with 20% test share. Error bars represent a confidence level of 95%. The
maximum mean accuracy is 0.77 ± 0.05.
[Plot: accuracy versus feature subset cardinality.]
[Plot: accuracy for QUBO, RFE, and random max versus feature subset cardinality (region with the best accuracy scores).]
positives [16]. The above values reflect the classification of the German Credit
Data without the application of a cost matrix and can be compared with
results in the literature that were calculated in the same way.
The dashed vertical lines represent the means of the best subsets. The
vertical bars represent the counted occurrences of mean accuracies for the
perturbed feature subsets. QUBO Feature Selection does a better job than
RFE in finding a subset that is better than its neighbors. This reflects the
behavior of the quadratic objective function, which considers all of the feature
sets simultaneously (especially in the quantum annealer implementation). In
contrast, once a feature has been eliminated by RFE, there is no possibility
of bringing it back on a later iteration.
The same behavior is observed when adding or subtracting a feature from
the best subsets. In Figure 19.14, we see that the accuracy of the best QUBO
subset is again better than the perturbed subsets.
with other methods is given by Waad [12]. Waad also used a publicly available
machine learning package, Weka 3.7.0 [21], and created a “three-stage feature
selection fusion” technique with QUBO Feature Selection as the first stage,
which yielded very good accuracies. Waad’s results are not directly compara-
ble to the results in this chapter, since they were given in terms of precision
and recall, which are affected by whether the 5:1 cost ratio prescribed by
Hofmann has been applied in the training set. In this chapter, we did not use
the cost ratio and Waad does not mention it. Also, the Weka package does
not have a function like scikit-learn’s StratifiedShuffleSplit(), and in Waad’s
reported experimental procedure, the division of the samples into a training
set and a test set was done at an early stage with 10 folds but no shuffling.
Taken in total, however, Waad provides strong motivation for studying
how QUBO Feature Selection might be used as part of a larger procedure. For
example, Figure 19.12 shows that, although QUBO Feature Selection found a
very good subset with 24 elements, there were other subsets nearby that were
slightly better. Additional searching could uncover more.
For feature selection overall, Chen and Li [17] were able to achieve good
results with 12 features from the original German Credit Data with its 20 cat-
egorical and integer (or fixed decimal) features. These were manually selected
after comparing various correlation measures between the features and the
classification. Chen and Li did not use the programmatic binarization of cat-
egorical variables that came into greater popularity between 1998 and the
present day. Their results are primarily for SVMs and do not deliver notable
accuracy, especially since Chen and Li also report performing only a single
10-fold cross-validation. The real lesson from this early work is that correlation
makes a difference, and that automatic binarization might not always be a
good idea.
19.12 Conclusion
Our objective in this chapter has been to show that quantum computing is
now within the reach of everyone. We took an old problem and an old method
that people used to think was slow, and implemented it using an SDK that
can route the problem to either a quantum solver or an advanced classical
solver. If the reader has forgotten the term “quantum annealer” by this point,
that is perhaps a sign of success.
On the binarized German Credit Data, QUBO Feature Selection delivered
a smaller feature set (24 features) than either RFE (28 features) or RFECV
(31 features). All three methods showed comparable accuracy. A priority for
future work is to study the behavior of the QUBO method on different data
sets, with different correlation methods, and with a multistage approach that
could improve its performance.
Also of interest is the unusually small number of candidate feature sets
returned to the selector, and the possibility of applying the technique to
much larger initial feature sets. The method performed best when the fea-
ture correlation coefficients fell off smoothly. One could imagine broadening
the “wrapper” concept to include the selection of a correlation algorithm, and
using the speed of the QUBO Feature Selector to explore this space more
systematically.
The authors hope that the availability of QUBO Feature Selection
via 1QBit’s quantum-ready SDK will make this method more accessible to
researchers interested in studying its possibilities, and to practitioners want-
ing to add a new (and quantum-ready) tool to their metaphorical toolboxes.
Acknowledgments
The author would like to thank Majid Dadashi and the 1QBit Software
Development Team for creating the SDK, and for their assistance with its
use. Anna Levit provided useful comments on the draft. Jaspreet Oberoi con-
tributed the idea that led to recursive QUBO Feature Selection, a concept
whose possibilities we have yet to explore.
The author wishes to express his gratitude to Gili Rosenberg and his many
collaborators on the 1QBit research paper, Solving the Optimal Trading Tra-
jectory Problem Using a Quantum Annealer, and to 1QBit for permission to
quote at length from their description of the D-Wave quantum annealer.
References
1. Organization for Economic Cooperation and Development (OECD). Household
debt (indicator), December 2016.
4. World Bank. Bank Non-Performing Loans to Gross Loans for United States,
2016. Data series DDSI02USA156NWDB retrieved from FRED, Federal Reserve
Bank of St. Louis.
5. U.S. Federal Deposit Insurance Corporation. Bank Failures in Brief, 2017. From
[Link].
6. Center for Microeconomic Data U.S. Federal Reserve Bank of New York. SCE
Credit Access Survey, 2016. From [Link].
8. Harris, M. The Short History and Long Future of the Online Lending Industry.
Forbes Valley Voices, 2017.
9. FICO (formerly the Fair Isaac Company). FICO Website www.fi[Link], 2017.
10. Goel, V. Russian Cyberforgers steal millions a day with fake sites. New York
Times Online, 2016.
12. N’Cir Waad, B. B. On Feature Selection Methods for Credit Scoring. PhD thesis,
Université de Tunis, Institut Supérieur de Gestion, École Doctorale Sciences de
Gestion LARODEC, 2016.
14. Pedregosa, F., Varoquaux, G., and Gramfort, A. Scikit-learn: Machine learning
in python. Journal of Machine Learning Research, 12:2825–2830, 2011. Note:
This is the citation requested at the [Link] website, accessed January
12, 2017.
15. Cox, D. The regression analysis of binary sequences (with discussion). Journal
of Royal Statistical Society B., 20:215–242, 1958.
17. Chen, Fei-Long and Li, Feng-Chia. Combination of feature selection approaches
with SVM in credit scoring. Expert Systems with Applications, 37:4902–4909,
2010.
21. Bouckaert, R. R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A.,
and Scuse, D. Weka Manual (3.7.1), 2016. This is the citation given by Waad
(see below) and describes the Weka features available at the time.
23. Rao, M. How to Evaluate Bank Credit Risk Prediction Accuracy based on SVM
and Decision Tree Models, Capgemini “Capping IT Off” Blog, November 2,
2016. [Link], accessed January 18, 2017.









