0% found this document useful (0 votes)
5 views61 pages

Stochastic Differential Equations Overview

Uploaded by

fatmahaliali5
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views61 pages

Stochastic Differential Equations Overview

Uploaded by

fatmahaliali5
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Lecture notes on

Stochastic Differential Equations and Applications


(draft version)

Nuno M. Brites, nbrites@[Link]


November 16, 2023
Contents

Preface

1 Computational Simulation 1
1.1 Monte Carlo simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Simulation of random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Uniform random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Bernoulli and Binomial random variables . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Poisson random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.4 Exponential random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.5 Normal random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.6 Other built-in random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Multivariate random number generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Markov chain simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Advanced simulation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Mathematical Foundations 11
2.1 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Path Regularity of Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Itô’s integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 Itô formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.8 Martingale Representation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9 Multidimensional Itô Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 Solution of Stochastic Differential Equations 25


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Existence and Uniqueness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Linear SDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4 Stochastic Differential Equation models 33


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Deterministic and Stochastic Populational Growth Models . . . . . . . . . . . . . . . . . 35
4.3 Deterministic and Stochastic Logistic Growth Models . . . . . . . . . . . . . . . . . . . . 36
4.4 Deterministic and Stochastic Generalized Logistic Growth Models . . . . . . . . . . . . . 37
4.5 Deterministic and Stochastic Gompertz Growth Models . . . . . . . . . . . . . . . . . . . 38
4.6 Deterministic and Stochastic Negative Exponential Growth Models . . . . . . . . . . . . 39
4.7 Deterministic and Stochastic Linear Growth Models . . . . . . . . . . . . . . . . . . . . . 39
4.8 Stochastic Square-Root Growth Model with Mean Reversion . . . . . . . . . . . . . . . . 40
CONTENTS

5 Approximation and Estimation of Solutions to Stochastic Differential Equations 41


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Taylor Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Iterative Schemes for Approximating SDE’s . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3.1 The Euler-Maruyama Approximation . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3.2 The Milstein (Second-Order) Approximation . . . . . . . . . . . . . . . . . . . . . 43
5.3.3 Variations on the EM and Milstein Schemes . . . . . . . . . . . . . . . . . . . . . . 43
5.3.4 The Lamperti Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.4 Local Linearization techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4.1 The Ozaki Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4.2 The Shoji-Ozaki Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4.3 The Rate of Convergence of the Local Linearization Method . . . . . . . . . . . . 47

6 Estimation of Parameters of Stochastic Differential Equations 49


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.2 The Maximum Likelihood Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3 The Markov Property, Transitional Densities, and the Likelihood Function of the Sample 50
6.4 Change of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.5 The Transition Probability Density Function is Known . . . . . . . . . . . . . . . . . . . . 50
6.6 The Transition Probability Density Function is Unknown . . . . . . . . . . . . . . . . . . 51
6.6.1 Euler-Maruyama scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.6.2 Ozaki Linearization Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.6.3 Shoji-Ozaki Linearization Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

References 55
Preface

This notes must be used as a secondary bibliography to clarify punctual doubts and/ or as a compiler
of the fundamental concepts regarding Stochastic Differential Equations. Please do not use this as a
reference book.
All errors found are the sole responsibility of the authors and must be communicated. Furthermore,
comments and suggestions are welcome.
CONTENTS
Chapter 1

Computational Simulation

The present chapter is related with the simulation of random numbers using computational tools,
specially important in mathematics when treating the estimation of unknown parameters, for example.
The structure of the chapter is based on a summarisation of fundamental concepts, followed by the
presentation of explained examples.

1.1 Monte Carlo simulation


Monte Carlo method is constituted by a wide range of algorithms used in the simulation of parameters
for a given random variables. This method is based on the simulation of various independent and
identically distributed copies of the random variable we are in search of parameters. The resulting
values from the simulated random variables are in turned used to estimate the value for the desired
parameter.
The most ubiquitous of these parameters to estimate is the mean, µ = E[ x ], of a given random variable
P
X. Equipped with the Law of large numbers, i.e. n1 ∑in=1 Xi −
→ µ, the approximation thus becomes
the generation of a large number of values for X and the subsequent computation of their average
X = n1 ∑in=1 Xi .
Recall that according to the central limit theorem, given a sequence of independent and identically
distributed random variables, X1 , X2 , · · · , Xn , with mean µ, variance σ2 , and finite moments (E[ x n ] < ∞
for any n ∈ N), then

∑in=1 Xi − nµ P
Zn = √ −−−−→ N (0, 1). (1.1)

In other words, when n → ∞, the distribution function of Zn follows asymptotically a standard normal
distribution, N (0, 1).
Example: Consider an experience consisting in removing one ball from a basket with balls numbered
from 1 to 4. What is the expected number of the removed ball?
Let X be the random variable defining the number of the ball removed. Since we have 4 balls, the
probability function has value 41 = 0.25.
The real Expected Value is:
E_X <- 1*0.25 + 2*0.25 + 3*0.25 + 4*0.25
print(E_X)

## [1] 2.5
Using the Monte Carlo Method, we have the following simulated expected value:

1
2 CHAPTER 1. COMPUTATIONAL SIMULATION

sample_1 <- round(runif(5, min=1, max=4)) # Small Sample


# Round fundamental to guarantee the number is integer
# print(sample_1) # Uncomment to visualize sample
MC_E_X <- sum(sample_1) / length(sample_1)
print(MC_E_X) # Poor approximation

## [1] 2.8
sample_2 <- round(runif(20000, min=1, max=4)) # Larger sample
#print(sample_2)
MC_E_X_2 <- sum(sample_2) / length(sample_2)
print(MC_E_X_2) # CLoser to the real parameter

## [1] 2.49415
Monte Carlo method may be applied to other parameters, proceeding supported by a large simulated
sample.

1.2 Simulation of random variables


1.2.1 Uniform random variables
Independent random numbers uniformly distributed in the interval [0, 1], un , can be generated by first
selecting three positive integers m, b, x0 ∈ Z+ , where m > b, x0 > 0, and then applying the algorithm:

x n ≡ b x n −1 mod m
xn
un = ,
m
for any n ∈ {1, 2, ..., m}.
Example:
Let m = 7, b = 3, and x0 = 2, we have,

x1 ≡ 3 × 2 ≡ 6 mod 7, u1 = 0.857
x2 ≡ 3 × 6 ≡ 4 mod 7, u2 = 0.571
x3 ≡ 3 × 4 ≡ 5 mod 7, u3 = 0.714
x4 ≡ 3 × 5 ≡ 1 mod 7, u4 = 0.143
x5 ≡ 3 × 1 ≡ 3 mod 7, u5 = 0.429
x6 ≡ 3 × 3 ≡ 2 mod 7, u6 = 0.286. ■

Notice, that parameters b and m require careful selection, in order to ensure that the cycle length is m, be-
cause if b, m, and x0 are not coprime, i.e. gcd(b, m), gcd(m, x0 ), gcd(b, x0 ) ̸= 1, then the aforementioned
algorithm will not complete a cycle.
1.2. SIMULATION OF RANDOM VARIABLES 3

Example:
Let us consider m = 14, b = 6, and x0 = 5, where gcd(m, b) = 2, we have

x1 ≡ 6 × 5 ≡ 2 mod 14
x2 ≡ 6 × 2 ≡ 12 mod 14
x3 ≡ 6 × 12 ≡ 2 mod 14

The cycle is only of length 2. ■


Example:
Let us consider m = 7, b = 4, and x0 = 6, where gcd(b, x0 ) = 2, we have

x1 ≡ 4 × 6 ≡ 3 mod 7
x2 ≡ 4 × 3 ≡ 5 mod 7
x3 ≡ 4 × 5 ≡ 6 mod 7

The cycle is only of length 3. ■


Hence, by the previous examples, m must preferably be prime to avoid having a greatest common
divisor different from 1 with b, and b and x0 must be coprime. Otherwise, at a certain point the method
will start returning 0 (if m = bi × x0 for some i ∈ Z+ ), or the cycle will be of a length lower than m, and
hence repeat generated values un , which may ruin the approximation.
There exists a built-in function in R to generate random uniformly distributed numbers, using a strategy
similar to the presented above: runif(n, min=a, max=b) returns a sample of n random numbers in the
interval [ a, b].
Due to the fact that theses methods generate predictable random numbers when x0 , b and m are known,
the obtained numbers are also called pseudorandom numbers.

1.2.2 Bernoulli and Binomial random variables


A Bernoulli trial is an experiment with two possible outcomes, success or failure, with probability p
and 1 − p, respectively, for some p ∈ [0, 1].

P( X = 0) = 1 − p, P( X = 1) = p, (1.2)

A Binomial random variable is the sum of m independent Bernoulli random variables.


 
m x
P( X = x ) = p (1 − p)m− x , for any x ∈ {0, ..., m}, (1.3)
x

In both cases, the random variables count the number of successes.


One possible way to simulate a Binomial random variable is through the simulation of uniform random
variables.
4 CHAPTER 1. COMPUTATIONAL SIMULATION

Example: Considering a basket with balls numbered from 1 to 4, let the success be the observation of
ball number “2”. How many times do we observe ball number “2”?
ball <- runif(10, min=0, max=1)
correct_ball <- (ball<=0.25) #Success if correct_ball <= 0.25
table(correct_ball) # TRUE represent the number of "2" observed

## correct_ball
## FALSE TRUE
## 7 3
Another is using R built-in functions:
Generator: rbinom(n, size, prob); Returns n random variables of a Binomial distribution with parameter
m and p.
Probability function: dbinom(x, size, prob); Returns P( X = x ) when size represents the number of
trials and prob represents the probability of a success.
Distribution function: pbinom(x, size, prob); Returns P( X ≤ x ) when size represents the number of
trials and prob represents the probability of a success.
Quantile function: qbinom(x, size, prob); Returns a such that P( X ≤ a) = x, when size represents the
number of trials and prob represents the probability of a success.
Example:
rbinom(5, 10, 0.25) # 5 samples of a binomial distribution

## [1] 0 2 2 3 2
dbinom(5, 10, 0.25) # P(X = 5)

## [1] 0.0583992
pbinom(5, 10, 0.25) # P(X =< 5)

## [1] 0.9802723
qbinom(0.5, 10, 0.25) # Median

## [1] 2

1.2.3 Poisson random variables


A Poisson random variable is the limit of a sequence of binomial distribution with parameters m going
to infinity and pm to zero. Both expected value and variance converge to a constant, λ, referred as rate.

e−λ λ x
P( X = x ) = , for any x ∈ N, (1.4)
x!

Functions
Generator: rpois(n, lambda); Returns n random variables of a Poisson random variable with rate λ.
Probability function: dpois(x, lambda); Returns P( X = x ) where λ represents the rate.
Distribution function: ppois(x, lambda); Returns P( X ≤ x ) where λ represents the rate.
Quantile function: qpois(x, lambda); Returns a such that P( X ≤ a) = x, where λ represent the rate.
1.2. SIMULATION OF RANDOM VARIABLES 5

Example:
rpois(10, 1.5)

## [1] 1 0 1 3 3 2 4 2 0 2
dpois(2, 1.5)

## [1] 0.2510214
ppois(2, 1.5)

## [1] 0.8088468
qpois(0.5, 1.5)

## [1] 1

A homogeneous Poisson random variable can be generated considering λ as the rate of a poisson for
one single period and T as the number of periods. The next algorithm generates the number of events
in each period:

1. Generate the number of events, N, in the time interval T, using rpois(1, λ*T);
2. Generate the moment when the events occur in the time interval [0,T], using runif(N, max=T).

Example
[Link](123456789) # fix the results
T <- 5 # Number of periods
N_events <- rpois(1, 1.5*T) # lambda=1.5
N_events # number of events in 5 periods

## [1] 9
Instant <- sort(runif(N_events, max=5)) # "Sort" to be crescent in time
Instant# return the moments when events occur!

## [1] 1.175238 1.340850 2.266536 3.269508 3.364405 3.594946 4.495499 4.607723


## [9] 4.654993
Poisson_vector <- rep(0,T)

for (i in 1:T){
Poisson_vector[i] <- sum(Instant<i & Instant>i-1)
}
print(Poisson_vector) # Poisson random variable

## [1] 0 2 1 3 3

Poisson random variables disposed on a line can be simulated using the generator of a exponential
distribution, representing the time to perform a task or the duration until failure, which will be summed
in a cumulative way in order to obtain the instants where the studied event occurs.

Example:
t <- rexp(5, rate=2)
cumsum(t)

## [1] 1.577225 2.366788 2.735835 4.189652 4.239626


6 CHAPTER 1. COMPUTATIONAL SIMULATION

1.2.4 Exponential random variables


Exponential distribution is commonly associated to the Poisson distribution but its scope is larger.
Exponential random variables are used to model waiting or failure times, or the time to complete a
task. With probability density function f (t) and distribution function F (t), respectively, given by

f (t) = λe−λt , (1.5)


−λt
F (t) = P( T ≤ t) = 1 − e . (1.6)

The parameter that defines the distribution is again λ.

Functions

Generator: rexp(n, lambda); Returns n exponential random variables with rate λ.

Probability density function: dexp(t, lambda); Returns P( T = t) where λ represents the rate, f (t) =
λe−λt

Distribution function: pexp(t, lambda); Returns P( T ≤ t) where λ represents the rate, F (t) = 1 − e−λt

Quantile function: qexp(x, lambda); Returns a such that P( T ≤ a) = t, where λ represent the rate.

Simulation of exponential random variables can be done by inverting function F,

log(1 − U )
F ( T ) = U ⇔ T = F −1 (U ) ⇔ T = − , (1.7)
λ

then generating values for a uniform distribution U ([0, 1]) and finally imputing those values on the
expression for F −1 .

1.2.5 Normal random variables


Normal distribution is among the most important distributions and it’s widely applied. Mean, or
expected value, and variance characterise the normal distribution. When µ = 0 and σ2 = 1, it is said
that the random variable follow a standard normal distribution. Central limit theorem contributes to
the relevance of this distribution.

Functions

Generator: rnorm(n, mean=0, sd=1); Returns n random variables normally distributed with mean 0
and variance 1.

Probability density function: dnorm(n, mean=0, sd=1); Returns P( X = x ) where X is a random variable
normally distributed with mean 0 and variance 1.

Distribution function: pnorm(n, mean=0, sd=1); Returns P( X ≤ x ) where X is a random variable


normally distributed with mean 0 and variance 1.

Quantile function: qnorm(n, mean=0, sd=1); Returns a such that P( X ≤ a) = x, where X is a random
variable normally distributed with mean 0 and variance 1.

1.2.6 Other built-in random variables


R software has many standard distributions following the convention of rxxx to generate random
variables, dxxx to obtain the probability (density) function, pxxx to the cumulative distribution function
and qxxx to obtain the quantiles. To visualise all these distributions, please consult Braun and Murdoch
(2021).
1.3. MULTIVARIATE RANDOM NUMBER GENERATION 7

1.3 Multivariate random number generation


When the problem to model is based in more than one random variable, which may happen frequently
when working on real problems, it’s required to simulate random variables with different distributions.

One possible technique is sequential conditional generation, where random variables are simulated
conditioned by the previous. Considering X = ( X1 , X2 , · · · , Xn ), and knowing the distributions of each
random variable, the first step is to simulate X1 , then simulate X2 conditioned on X1 , X3 conditioned on
X1 and X2 and so on until Xn , which is conditioned by X1 , · · · , Xn−1 . This method may not be feasible
when distributions are not known.

Even though R has some multivariate distributions built in, in general it is necessary to construct our
own distribution.

1.4 Markov chain simulation


Markov chains are sequences of random variables where, at time n, the random variable depends
only on the previous time instant, i.e. n − 1. Markov chains are used in modeling systems with short
memory, like stock market or in genome passage from parents to offspring.

When there are 1, · · · , n states, the transition probabilities are represented by a n × n matrix P, where
i represent the current state, j a possible future state and P( Xn+1 = j | Xn = i ) = Pi,j represents the
probability of moving from state i to state j.

Two things to note is that the sum of the rows of a transition probability matrix adds to one,
i.e. ∑nj=1 Pi,j = ∑nj=1 P( Xn+1 = j | Xn = i ) = 1, and since time increments are independent, the matrix
will be invariant over time, hence the transition matrix remains unchanged.

An important property of a Markov chain, is that they possess an invariant distribution, that is a
distribution πi = [ P( X = i )]i , such that,
π = πP.

Notice that π is simply an eigenvector of P T , such that, ∑in=1 πi = 1. An even more interesting property
is that for some transition probability matrices P, Xn will converge to the invariant distribution π.

Example: Let’s consider a two state case: weather is sunny (1) or rainy (2). The transition probability
matrix associated to this trial is:
MC_weather <- matrix( c(0.9, 0.1, 0.5, 0.5), nrow=2, ncol = 2, byrow = TRUE)
MC_weather

## [,1] [,2]
## [1,] 0.9 0.1
## [2,] 0.5 0.5

We can extract from the matrix that the probability of having a sunny day tomorrow if today was
sunny is 90%. Oppositely, the probability of raining after a sunny day is 10%. If today was raining, the
probability of having a rainny or sunny day tomorrow is equal to 50%.

1.5 Monte Carlo integration


Integration through Monte Carlo method is useful to approximate the value of an integral resorting to
the simulation of various points uniformly distributed along the interval of integration, which are then
evaluated on the integrating function to finally take the estimate of the integral as the mean of that last
points times the length of that interval.
8 CHAPTER 1. COMPUTATIONAL SIMULATION

The previously explained strategy uses the law of large numbers to support the use of the sample mean
and is based on the formula of the expected value to obtain its final expression:

Z b Z b
1
E( g(Ui )) = g(Ui ) du ⇔ g(Ui ) du = (b − a) E( g(Ui ))
a b−a a

1
Note that we are using uniform random variables, so the density distribution is equal to b− a .
R3
Example: Considering the integral 0
x2 dx, with value 9. Computing this integral with Monte Carlo
we have:
[Link](1234567656) # Fixing results
aux <- runif(100, min=0, max=3)

Figure 1.1: Simulation of points with an uniform distribution.

points <- auxˆ2

Figure 1.2: Valued of simulated points in the target function.

integral_approx <- mean(points)*(3-0)


integral_approx

## [1] 9.918777
1.6. ADVANCED SIMULATION METHODS 9

# or: (augmented sample)


[Link](1234567656) # Fixing results
aux <- runif(10000, min=0, max=3)
points <- auxˆ2
integral_approx <- mean(points)*(3-0)
integral_approx

## [1] 9.013217
As one can see, the approximation of the integral through Monte Carlo provides a better result as the
number of simulated points increase.
Rb Rb
When dealing with a multiple integration on m variables, a11 · · · amm g(u1 , · · · , um ) du1 · · · dum , the
j
strategy is similar but with the adjustment of simulating as many uniform random variables, Ui ∼
U [ a j , b j ] for any i ∈ {1, · · · , n} and j ∈ {1, · · · , m}, as the number of variables in the integral and the
length of the integral is substituted by the product of the length of the integrals.

Z b1 Z bm
du1 dum
E( g(Ui1 , · · · , Uim )) = ··· g ( u1 , · · · , u m ) ··· (1.8)
a1 am b1 − a1 bm − a m
Z b1 Z bm m

a1
···
am
g(u1 , · · · , um ) du1 · · · dum = ∏(bj − a j ) E( g(Ui1 , · · · , Uim )) (1.9)
j =1

R 10 R 7
Example: Considering the integral 3 1 sin( x − y) dx dy, with approximated value 0.1185. Computing
this integral with Monte Carlo we have:
x_aux <- runif(10000000, min=1, max=7)
y_aux <- runif(10000000, min=3, max=10)
points_2 <- sin(x_aux-y_aux)
integral_approx_2 <- mean(points_2)*(10-3)*(7-1)
integral_approx_2

## [1] 0.1129121
The Uniform distribution is not the unique distribution that can be generated to compute an integral.
 g( X ) 
If, for instance, we consider a random variable X that has a distribution function f then E f (X ) =
R  g( X )  R R
f (X)
f ( x ) dx = g( x ) dx. Hence we can compute g( x ) dx by generating a very large number of
g( X )
values for X, inputting them in f (X)
and calculating the average of the resulting values.

1.6 Advanced simulation methods


Up to now we have discussed particular, and well-known, distributions. However, there exists two
simulation methods commonly used: rejection sampling and importance sampling.
Rejection sampling from a set S consists on the simulation of values for an appropriate random variable
X, such that S ⊂ { X (ω ) : ω ∈ Ω), and selecting the simulated values contained in S, discarding the
remaining ones.
Importance sampling is a technique to generate randomly both the sample and the weights,aiming to
approximate the expected values to the wanted density function. In this technique we start by choosing
a convenient density distribution, f ( x ), to obtain a sample, then compute the weights dividing the
target distribution, g( x ), by the selected distribution in order to, finally, approximate the expectation
of a function h( X ) where X has the probability distribution function g( x ) using averages of h( xi )
g( x )
weighted by wi = f (xi ) .
i
10 CHAPTER 1. COMPUTATIONAL SIMULATION
Chapter 2

Mathematical Foundations

This chapter covers the fundamental contents required to the topic of Stochastic Differential Equations,
which is a subject that requires a broad knowledge and many times is necessary to review or address
any deficiency in background. In this sense, our intention is to reproduce a succinct yet complete
summary comprehending the most important topics.

2.1 Stochastic Processes


A stochastic process is a collection of random variables { Xt , t ∈ T } defined on a given probability space
(Ω, A, P ), where the set of values assumed by Xt , called states, form the state space S, for each t in a
parameter space T.
Therefore, a stochastic process is a real-valued random function Xt : T → S. For a fixed t ∈ T, the map
w ∈ Ω → Xt (w) is a random variable. For each w ∈ Ω fixed, the map t ∈ T → Xt (w) is a real-valued
function called the trajectory, realization or sample path of the process.
When the parameter space is designated as the time space, T can be either discrete, T = N, or
continuous, T = [0, +∞). State space S can be a countable set, S = N for example, or an interval in R,
S = [ a, b), a, b ∈ R.
Two stochastic processes, { Xt }t∈T and {Yt }t∈T defined on the same probability space (Ω, A, P ), are
said to be stochastically equivalent or indistinguishable if
 
Xt = Yt a.s., ∀ t ∈ T, or equivalently, P Xt (w) = Yt (w) | w = 1.

The process { Xt } is a version or modification of {Yt }, and vice versa, and their finite-dimensional
distributions coincide.
The finite-dimensional distribution of a stochastic process { Xt }t∈T defined on a probability space
(Ω, A, P ) is the joint distribution of the random vector ( X (t1 ), · · · , X (tn )), given by

Ft1 ,··· ,tn ( x1 , · · · , xn ) = P( Xt1 ≤ x1 , · · · , Xtn ≤ xn ), (2.1)

with t ∈ T, xi ∈ R, n ≥ 1.
The family of distributions that define a stochastic process are important to comprehend its behavior and
to determine the P(( X (t1 ), · · · , X (tn )) ∈ A, with n ∈ N+ , (t1 , · · · , tn ) ∈ T n and A ⊆ Sn a measurable
set.
Sometimes it is convenient to construct a stochastic process with a specific finite-dimensional dis-
tribution. When this construction is not readily obtained from a sequence of random variables
{ Xi , i = 1, · · · , n}, one might verify the Kolmogorov Existence Theorem.

11
12 CHAPTER 2. MATHEMATICAL FOUNDATIONS

Theorem 2.1 (Kolmogorov Existence Theorem). Let P to be a family of finite-dimensional distributions


satisfying the following consistency conditions:
1. If {π (1), · · · , π (n)} is a permutation of the numbers 1, · · · , n, then for arbitrary time points t1 , · · · , tn ∈
T and n ≥ 1, Ftπ(1) ,··· ,tπ(n) ( xπ (1) , · · · , xπ (n) ) = Ft1 ,··· ,tn ( x1 , · · · , xn );

2. For m < n and arbitrary tm+1 , · · · , tn ∈ T, Ft1 ,··· ,tm ,tm+1 ,··· ,tn ( x1 , · · · , xm , ∞, · · · , ∞) =
Ft1 ,··· ,tm ( x1 , · · · , xm ).
Then, there exists a probability space (Ω, A, P ) and a stochastic process { Xt }t∈T defined on a that space such
that P is the collection of finite-dimensional distributions of { Xt }t∈T .

A stochastic process { Xt }t∈T defined on a probability space (Ω, A, P ) has the following characteristics:
1. Continuous, if all its paths are continuous, i.e., for almost all w ∈ Ω the mapping t ∈ T → Xt (w)
is continuous;
2. Integrable, if Xt is an integrable random variable ∀t ≥ 0;
3. Measurable, if the mapping (t, w) → Xt (w) is measurable with respect to the σ-algebra B( T ) × A,
with B( T ) the family of all Borel subsets of T;
4. Square-integrable, if E(| Xt |2 ) < ∞, ∀t ≥ 0;
5. Increasing, if for any s < t, Xs ≤ Xt ;
6. (Strictly) Stationary, if the finite-dimensional distributions are invariant under time displace-
ments, i.e., for ti , ti+h ∈ [to , n] and h ∈ R, Ft1+h ,··· ,tn+h ( x1 , · · · , xn ) = Ft1 ,··· ,tn ( x1 , · · · , xn );
7. Gaussian, if finite-dimensional distributions are normally distributed;
8. Continuous in probability, if P(| Xt − Xs |> ε) → 0, with any s ∈ T, s → t and ε > 0;
9. Quadratic variation process, if Xt (w) : Ω → R is continuous and the second is defined as

( X, X )2t (w) = lim∆tk →0 ∑ | X t k +1 ( w ) − X t k ( w ) | 2 ,


tk ≤t

with 0 < t1 < t2 < · · · < tn = t and ∆tk = tk+1 − tk ;


10. Markov, if the process has no memory in the sense that the change at time t is only determined
by the value of the process in that specific moment, and not by its past values (Markov Property).
A σ-algebra on the probability space (Ω, A, P ) is defined as At = σ( Xs , s ∈ T, s ≤ t) and comprises all
the information regarding the process available at time t ∈ T. The σ-algebra A∞ = σ(∪t∈T At ) gathers
all information of all time points t and, for any t ∈ T, At ⊆ A∞ .
Given a probability space (Ω, A, P ), a filtration J = {At }t∈T is an increasing family of sub-σ-algebras
of A, that is, for s, t ∈ T with 0 ≤ s ≤ t ≤ ∞, As ⊆ At ⊆ A, which implies that At contains the same or
more information about the process than As , both bounded above by A, and the information of As is a
subset of the At information.
A probability space endowed with a filtration is called a filtered probability space and is denoted as
(Ω, A, J , P ). A stochastic process { Xt }t∈T is adapted to the filtration J if, for each t ∈ T, σ( Xs , s ∈
T, s ≤ T ) ⊆ At , or, equivalently, it is At -measurable, that is, the information held at time t by At is
enough to determine the value of Xt .
A nonanticipating process is a process that does not present the capability to anticipate the future,
that is, the value of Xt depends on the previous behavior of the system, without losing information,
and it is only known if At is also known. An important property, denominated tower property, states
that E( E( Xt | As ) | Ar ) = E( Xt | Ar ) with r < s < t, since Ar comprises less information than As
(Ar ⊂ As ).
2.2. MARTINGALES 13

Given a stochastic process { Xt }t∈T , one can construct its natural filtration denoted as FtX = {AtX }t∈T ,
which consists in a family of σ-algebras generated by the values of process up to time t, generating the
smallest filtration to which { Xt }t∈T is adapted.
Filtrations are of special importance because they model the information produced by a stochastic
process over time. Thus, one can check if a given event has occurred up to the time under analysis, that
is, if A ∈ σ( Xs , s ≤ t) with σ( Xs , s ≤ t) the natural filtration of { Xt }t∈T .
A probability space (Ω, A, P ) is complete with respect to a measure µ if all sets of measure zero belong
to the σ-algebra A.
A filtration J = {At }t∈T is complete if the probability space (Ω, A, P ) is complete and if the collection
of all subsets with zero probability, A0 , contains all the sets with null probability- A ∈ A, P( A) = 0
then A ∈ A0 .
A filtration J = {At }t∈T is right-continuous if At = At+ , with At+ = ∩{ As , s ∈ T, s > t} ∀ t ∈ T.
A continuous stochastic process adapted to a filtration is also progressively measurable with respect to
that filtration.

2.2 Martingales
Let (Ω, A, P ) be a probability space and J = {At }t∈T a filtration on that probability space. A
martingale defined on a filtered probability space (Ω, A, J , P ) is a sequence of random variables
{ Xt }t∈T on the probability space that are adapted to the filtration. Hence, if { Xt }t∈T satisfies
1. Xt is At -measurable (Xt ’s are adapted to the filtration)
2. E(| Xt |) < ∞ (Xt ’s are integrable)
3. E( Xt | As ) = Xs , ∀ 1 ≤ s < t < ∞ (martingale property)
is discrete-time martingale.
The martingale property states that the expected value of Xt+1 given the information available up to
time t is Xt . The best next forecast for the fortune of a gambler on a fair game after tth trials is the
current fortune at time t, for example. If the “=” sign in the martingale property is substituted by a “≤”
(“≥”), the process { Xt }t∈T is called supermartingale (submartingale).
One interest thing that worth mentioning regarding martingales is that one can construct new martin-
gales based on an old one. The general approach is to consider a new process { X̃t , 0 ≤ t < ∞} defined
as X̃0 = X0 and X̃t = X0 + A1 ( X1 − X0 ) + A2 ( X2 − X1 ) + · · · + An ( Xt − Xt−1 ), t ≥ 1. The process
X̃t is the martingale transform of { Xt }t∈T by { At }t∈T . The following theorem states the conditions
appropriately.

Theorem 2.2 (Martingale Transform Theorem). Let { Xt }t∈T be a martingale adapted to the filtration
J = {At }t∈T with { At , 1 ≤ t < ∞} a sequence of bounded random variables that are nonantecipating with
respect to J . Then, the sequence of martingale transforms { X̃t }t∈T is itself a martingale adapted to J .

A stopping time for the filtration J = {At }t∈T is a random variable τ : Ω → T = N ∪ {+∞} where
{w | τ (w) ≤ t} ∈ At , ∀ 0 ≤ t < ∞, that is, a stopping time constitutes a rule to stop the process if the
criterion τ was verified based on the information about the filtration J . Stopping times are related with
martingales by the noticeable fact that a stopped process is a martingale adapted to J if the original
process was also a martingale adapted to J .
The study of the behavior of a process is interesting to check if this process converges to a certain
threshold as time evolves. The following theorems cover these theoretical aspects.

Theorem 2.3. If a process { Xt }∞


t=1 is a L -bounded martingale on a probability space ( Ω, A, P ), i.e., if
1

supk∈K E(| Xk |) < +∞, then there exists an integrable random variable Z such that limt→+∞ Xt = Z a.s.
14 CHAPTER 2. MATHEMATICAL FOUNDATIONS

Convergence in L1 can be proved using uniform integrability, verifying if for a sequence of random
variables { Xt }∞
t=1 , either 1 or 2 verifies.
R
1. limm→∞ (supt E(| Xt |; | Xt | > m) = limm→∞ (supt {|Xt |>m} | Xt |dP) = 0.

2. { Xt }∞ 1 ( Ω, A, P ) and for all ε > 0 there exists a δ > 0 such that, for an event
t=1 is bounded on L R
A ∈ A, P( A) < δ implies A | Xt | dP < δ ∀ t.

Theorem 2.4. Let { Xt }∞ ∞


t=1 be a martingale on the probability space ( Ω, A, P ) adapted to the filtration {A}t=1 .
If { Xt }∞
t=1 is uniformly integrable, then (1) exists an integrable r.v. Z on ( Ω, A, P ) such that Xt → Z a.s. in
L1 (Ω, A, P) as t → ∞ and (2) Xm = E( Z | Am ) a.s. in (Ω, A, P).

Theorem 2.5. If (Ω, A0 , P), X ∈ L1 and At are arbitrary σ-algebras with A0 , then the r.v. E( X | A∞ ) are
uniformly integrable.

Theorem 2.6 (Dominated Convergence Theorem for Conditional Expectations). Suppose { Xt }∞ t=1 is a
martingale on a probability space (Ω, A, P). Let Xt → X a.s. and | Xt | ≤ Z ∀t, with E( Z ) < +∞. If At ↑ A∞ ,
then E( Xt | At ) → E( X | A∞ ) a.s.

A martingale { Xt }t≥0 is L2 -bounded if E(| Xt |2 ) < B, B ∈ R, for all t ≥ 1.

Theorem 2.7 (L2 -bounded Martingale Convergence Theorem). Let { Xt }t≥0 be a L2 -bounded martingale.
Then there exists a r.v. X with E(| Xt |2 ) < B, B ∈ R, such that P(limt→∞ Xt = X ) = 1 and limt→∞ ∥( Xt −
X ∥2 = 0.

Let´s now extend the previous concepts to the case of continuous-time martingales. Through the use of
the correct conversion, it’s possible to generate that results from discrete-time martingales.
Given a stochastic process { Xt }t≥0 adapted to the filtration {At }t≥0 , defined on the probability space
(Ω, A, P), it is a continuous-time martingale if E( Xt | As ) = Xs ∀ 0 ≤ s ≤ t.
Considering { Xt }t≥0 a continuous-time martingale adapted to the filtration {At }t≥0 and { Xtn }∞ n =1
uniformly integrable, with {tn }∞
n=1 a strictly increasing sequence of real numbers verifying tn → ∞ as
n → ∞. Hence, by theorem 2.4, exists a r.v. Z such that Xtn → Z a.s. as n → ∞ and E( Z | Atn ) = Xtn .
Furthermore, E( Z | At ) = E( E( Z | Atn ) | At ) = E( Xtn | At ) = Xt , ∀ tn > t > 0.
The random variable τ : Ω → R ∪ {+∞} is a stopping time adpated to the filtration {At }t≥0 since
{w ∈ Ω | τ (w) ≤ t} ∈ At , ∀t ≥ 0.

Theorem 2.8 (Doob’s Continuous-Time Stopping Theorem). Let { Xt }t≥0 be a continuous martingale
adapted to the filtration {At }t≥0 that is right-continuous and A0 contains all P-null sets. If τ is a stopping time
for J , then the process {Yt }t≥0 = { Xmin(τ,t) }t≥0 is also a continuous martingale adapted to J .

The convergence aspect of a continuous time process requires the definition of a Càdlàg process (from
the french continue à droite, limite à gauche), i.e., a process that is right continuous, if for w ∈ Ω the
function Xt (w) is right continuous, with left limits, if for w ∈ Ω the lims↑t Xs (w) exists and is finite for
t ≥ 0.

Theorem 2.9 (Doob’s Convergence Theorem). Let J = {At }t≥0 be a filtration of the probability space
(Ω, A, P) and let { Xt }t≥0 be a martingale with respect to the filtration J whose paths are right-continuous and
left limit. Then, the following are equivalent:
1. { Xt }t≥0 converges in L1 when t → ∞;
2. As t → ∞, { Xt }t≥0 converges a.s. to an integrable and At -measurable r.v. Z that satifies Xt = E( Z |
At ), t ≥ 0;
3. { Xt }t≥0 is uniformly integrable.
2.3. PATH REGULARITY OF STOCHASTIC PROCESSES 15

2.3 Path Regularity of Stochastic Processes


A function f : R+ → R is Hölder with exponent γ if there exists constants c, γ > 0 such that
| f (t) − f (s)| ≤ c|t − s|γ with s, t ∈ R+ . Hölder continuity allows the determination of the rate of
convergence: | f (t) − f (s)| → 0 at least as fast as |t − s|γ → 0.
The following theorem is related to the regularity of the paths of a stochastic process.

Theorem 2.10 (Kolmogorov Continuity Theorem). For α, ε, c > 0, if the stochastic process { Xt }t∈[0,1]
defined on the probability space (Ω, A, P) satisfies E(| Xt − Xs |α ) ≤ c|t − s|1+ε for s, t ∈ [0, 1] then there exists
a modification of { Xt }t∈[0,1] that is a continuous process and whose paths are γ-Hölder for every γ ∈ [0, γα ].

Regarding martingales, their regular versions depend on the regularity properties of the associated
filtration, thus some conditions on the filtration are mandatory. Thereon, the following theorem
guarantees the existence of modified martingales with regular paths.

Theorem 2.11 (Doob’s Regularization Theorem). Let (Ω, A, {At }t≥0 , P) be a filtered probability space that
satisfies:
1. For A ∈ A with P( A) = 0, then every subset of A is in A0 ;
2. the filtration {At }t≥0 is right-continuous (At ∩ At+ε ∀ t ≥ 0, ε > 0).
Let { Xt }t≥0 be a supermartingale with respect to the filtration J = {At }t≥0 . Assuming the function t 7→ E( Xt )
is right-continuous, then there exists a modified process { X̃t }t≥0 of { Xt }t≥0 with the following properties:
1. { X̃t }t≥0 is adapted to the filtration J ;
2. The paths of { X̃t }t≥0 are locally bounded, right-continuous and left limited;
3. { X̃t }t≥0 is a supermartingale with respect to the filtration J .

2.4 Markov Processes


A stochastic process { Xt }t≥0 defined on (Ω, A, P) provided with the filtration At = σ( Xs , 0 ≤ s < t)
that is real-valued and adapted is a Markov process if

P( Xt ∈ B | As ) = P( Xt ∈ B | Xs ), ∀ 0 ≤ s < t < ∞, B ∈ R, (Markov property). (2.2)

From the Markov property is possible to conclude that, given the present value, the future value of
a Markov process is independent of the past values, which is the reason why Markov processes are
classified as “memoryless”. Moreover, from the definition is possible to conclude that any process with
independent increments, either discrete or continuous, is a Markov process.
A remark to the fact that 2.2 is defined for a continuous state space, being the process denominated as a
Markov Process. However, it is possible to adjust the definition for the case of a discrete state space,
obtaining a process called Markov Chain.
The transition probability of a process is denoted by P(t, B|s, x ) = P( Xt ∈ B| Xs = x ) and represents
the probability of a process that is at position x at time s to move to a state in B at time t. Transition
probabilities of a Markov process has the following properties:
1. P(t, ·|s, x ) is a probability measure on the family of Borel sets B for every 0 ≤ s < t < ∞ and
x ∈ R;
2. P(t, B|s, ·) is Borel measurable for every 0 ≤ s < t < ∞ and B ∈ B ;
R
3. P(t, B|s, x ) = R P(t, B|r, y) P(r, dy|s, x ) for s < r < t, x ∈ R and B ∈ B (Chapman-Kolmogorov
equation).
16 CHAPTER 2. MATHEMATICAL FOUNDATIONS

To understand property 3, let´s divide the transition from x to a state in B in two steps and consider an
intermediate point at moment r. More steps are allowed if it is intended to, but when aiming to some
intuition two steps are enough.
The process moves firstly from x to y (at moment r), and then from y to a state in B. The Markov
property says that past information should be disregarded and considered only the present value of the
process, so the useful information at time r is that the process is in state y.
The absence of memory that characterize this process impose that transitions of moments s → r and
r → t are independent and the probability of moving from x to a state in B is the product of the
probabilities of the intermediate movements.
Since y can assume any value, through the total probability theorem, this transition probability from
x to a state in B verified between moments s and t should consider all possible trajectories for each
different y, which is the reason for the presence of the integral.
The transition probabilities are stationary if

P( Xt+τ ∈ B| Xs+τ = x ) = P( Xt ∈ B| Xs = x ), τ constant, s < t (2.3)

is satisfied. That is, the initial and final moments are no longer relevant but, instead, the critical point is
the duration of the interval, along with states x and B. A Markov process is termed homogeneous (in
time) when their transition probabilities are stationary.
A homogeneous Markov process satisfies the strong Markov property if, for any τ ≥ 0 and B a Borel
set,
P ( X t + τ ∈ B | A t ) = P ( X τ ∈ B | X0 ) (2.4)

holds for every Markov instants of time. An equivalent way of defining the strong Markov property
presented in 2.4 is through the verification of

E [ h ( X t + τ ) | A t ] = E [ h ( X τ ) | X0 ] , (2.5)

for all Borel-measurable bounded functions h. If the strong Markov property is verified, a homogeneous
Markov process is said to be a strong Markov process. Please note that a strong Markov process is also
Markov process, but a Markov process is not always a strong Markov process.

2.5 Brownian motion


A stochastic process {Wt }t≥0 defined on a probability space (Ω, A, P) is a standard Brownian Motion if
the following properties are satisfied:
1. the process begins at 0, i.e., t0 = 0 or P(W0 = 0) = 1;
2. for 0 = t0 ≤ t1 ≤ · · · ≤ tn , the increments W (t1 ), W (t2 ) − W (t1 ), · · · , W (tn ) − W (tn−1 ) are
independent random variables;
3. for 0 < s < t, the increments Wt − Ws ∼ N (0, t − s).
Standard Brownian Motions are also called Wiener Processes, due to the contribution of Norbert Wiener
on the mathematical definition.
Property (i) is a convention. Property (ii) translates the absence of memory of a standard Brownian
motion, since the increment verified on the time interval [tk , tk+1 ] is independent of the increment on
[tk+n , tk+n+1 ], with n ∈ N . Finally, property (iii) states the distribution followed by the increments,
which has zero mean and variance proportional to the duration of the interval- the longer the interval,
the larger the variance, so a process like Brownian motion does not evidence propensity to return to a
previous position.
2.5. BROWNIAN MOTION 17

Standard Brownian motion may be used to represent the cumulative effect of noise. The difference
between the value of a standard Brownian motion at instants t and s, Wt − Ws ∀ 0 ≤ s < t, reflects the
noise on the interval [s, t].
The following characteristics of standard Brownian motion worth mentioning:
1. The increments are stationary, i.e., the distribution of Wt − Ws , 0 ≤ s < t, only depends on t − s;
2. It is a Quadratic Variation process with quadratic variation equal to t: E(Wt ) = 0 and Var (Wt ) =
t, 0 ≤ s < t;
3. E(Ws Wt ) = E(Ws Wt + Ws2 − Ws2 ) = E(Ws (Wt − Ws ) + Ws2 ) = E(Ws ) E(Wt − Ws ) + E(Ws2 ) =
E(Ws2 ) = s, or in general, E(Ws Wt ) = min{s, t};
4. The mapping t → W (t, w) is a.s. continuous in t and W (0, w) = 0, for each w;
5. If a standard Brownian motion process {Wt }t≥0 is measurable, then each sample path function
W (·, w) is A-measurable;
6. Its sample paths are highly irregular and a.s. of unbounded variation;
7. Its sample paths are a.s. non differentiable;
8. The symmetric process {−Wt }t≥0 is also a Brwonian motion (symmetric property);

9. {Wct }t≥0 follows the same probability law as { c Wt }t≥0 (scalling property);
10. limt→0 (Wt /t) = 0 a.s., i.e., {tW1/t }t≥0 follows the same probability law as {Wt }t≥0 (Time-
inversion property);
11. {Wt+h − Wt }t≥0 with h > 0 is also a standard Brownian motion;
12. A standard Brownian motion process is a continuous-time martingale (given {Wt }t≥0 and At =
σ(Ws , 0 ≤ s < t), E(Wt | As ) = E(Wt − Ws | As ) + E(Ws | As ) = 0 + Ws = Ws because
Wt − Ws is independent of As and Ws is As -measurable, and E((Wt − Ws )2 | As ) = t − s since
the increment is independent of the past and normally distributed with mean zero- a standard
Brownian motion with independent increments is a square integrable martingale);
A random process {Wt = (Wt1 , · · · , Wtd )}t≥0 is a d-dimensional standard Brownian motion if every
{Wti }t≥0 is a one-dimensional standard Brownian motion and independent, for i = 1, · · · , d. For
{Wt = (Wt1 , · · · , Wtd )}t≥0 a d-dimensional standard Brownian motion and i, j = 1, · · · , d,
1. E(Wt − Ws | As ) = 0, 0 ≤ s < t < ∞
j
2. E(Wti Ws ) = min{t, s}δij , 1 ≤ i, j ≤ d;
j j
3. E((Wti − Wsi )(Wt − Ws )) = (t − s)δij , 0 ≤ s < t;
with 
1 if i = j
δij = ,
0 if i ̸= j
the Dirac delta function.
When Wt − Ws is normally distributed with mean µ(t − s) and variance σ2 (t − s), we have a general
Brownian motion. The coefficient µ is called drift coefficient and σ2 is the variance coefficient, with
σ2 ̸= 0. When µ = 0 and σ2 = 1 we have process termed normalized Brownian motion.
Given a Brownian motion {Wt }t≥0 process with drift µ and variance σ2 ,
1. {−Wt }t≥0 is a Brownian motion process with drift −µ and σ2 ;
2. { aWbt }t≥0 is a Brownian motion process with drift abµ and variance a2 bσ2 , with constants a, b > 0;
(Wt −Ws )−µ(t−s)
3. σ is a normalized Brownian motion process.
18 CHAPTER 2. MATHEMATICAL FOUNDATIONS

A standard Brownian motion process {Wt }t≥0 is a Markov process with stationary transition probability
R −(x−Ws )2
P(Wt ∈ B | Ws ) = √ 1 B
e 2(t−s) dx a.s., homogeneous with respect to state space and time and
2π (t−s)
it is possible to present continuous sample paths with probability 1- theorem 2.10 guarantees it.
Furthermore, {Wt }t≥0 is also a strong Markov process.
The derivative with respect to time of a standard Brownian motion is a constructed notion, since
Brownian motion are not differentiable due to its unbounded variation, called a white noise and
denoted as Ẇt = dWt /dt. This process is Gaussian and wide-sense stationary (Xt is wide-sense
stationary if E( Xt Xs ) = h(t − s) for h : R → R and E( Xt ) = E( Xs )) with

1 if 0 ∈ A
h(·) = δ0 ( A) = .
0 if 0 ∈
/A

Using R software, one can simulate a Brownian motion path through an Euler scheme, using normally
distributed random variables, as in the following example. There exist other strategies that allow the
construction of this paths.
Example:
n <- 100 # number of steps
T <- 5 # Time

delta_t <- T/n # time step


t <- seq(0, T, by=delta_t) # time vector

W <- rep(0, length(t)) # Brownian motion


[Link](535563663) # Fix results
N <- rnorm (length(t)-1) # N(0,1) r.v.

for (i in 2:n){
W[i] <- W[i-1] + N[i]*sqrt(delta_t)
}

plot (t, W, type="l", ylim=range(W), xlab="Time (Years)",


ylab="W(t)", [Link]=1, lwd=1, lty=1, col="black")
1.0
0.5
W(t)

0.0
−0.5
−1.0

0 1 2 3 4 5

Time (Years)
2.6. ITÔ’S INTEGRAL 19

2.6 Itô’s integral


Consider the filtered probability space (Ω, A, {At }t≥0 , P), and a one dimensional Brownian motion Wt
adapted to the filtration J = {At }t≥0 . Our objective in this section is to define a stochastic integral of
the form Z b Z b
f (s, ω )dW (s, ω ) where ω ∈ Ω, or more succinctly f (s)dWs ,
a a
for any a, b ∈ R.

Definition 2.1 (Natural domain of the Itô integral). For any a, b ∈ R0+ , we define the natural domain
of the Itô integral as the space M2 ([ a, b]) of all real-valued processes f = { f (t)}t∈[a,b] on the product
space Ω × [ a, b] such that
1. the process f is measurable with regards to the σ- algebra on Ω × [ a, b];
2. the process f is adapted with respect to the filtration {At }[a,b] i.e. for any t ∈ [ a, b], f (t) is
At -measurable;
R 
b
3. ∥ f ∥22 ([ a, b]) = E a | f (t)|2 dt < ∞.

Note, that under the norm ∥·∥22 ([ a, b]) the space M2 ([ a, b]) is complete, and any two processes f , g ∈
M2 ([ a, b]) are equivalent if ∥ f − g∥22 ([ a, b]).

Definition 2.2 (Itô Integral for Elementary Functions). A stochastic process ϕ = {ϕ(t)}t∈[a,b] ∈
M20 ([ a, b]) ⊂ M2 ([ a, b]) is said to be an elementary process (where M20 ([ a, b]) is the subspace of el-
ementary functions of M2 ([ a, b])) if

k −1
ϕ(t, ω ) = ∑ ei (ω ), 1(t ,t + ] (t),
i i 1
(2.6)
i =1

where a = t0 < t1 < · · · < tn = b, and ei are square integrable random variables and At -measurable.
Its stochastic integral in respect to Wt is thus defined as
Z b k −1
I (ϕ) =
a
ϕ(t) dW (t) = ∑ ei [Wt + i 1
− Wti ]. (2.7)
i =1

Theorem 2.12 (Itô Isometry). For any elementary function ϕ ∈ M20 ([ a, b]), ∥ f ∥2M2 ([a,b]) = ∥ f ∥2L2 (P) =
0
∥ f ∥22 ([ a, b]) = E( I (ϕ)), i.e. the mapping I : M20 ([ a, b]) → L2 (P) is a continuous isometry.

Lemma 2.1. Let f ∈ M2 ([ a, b]), then


 there exists a sequence of elementary functions ϕn ∈ M20 ([ a, b]), such
Z b 
that lim ∥ f − ϕn ∥2M2 ([a,b]) = lim E | f − ϕn |2 dt = 0.
n→∞ 0 n→∞ a

Definition 2.3 (Itô Integral). For any stochastic process f ∈ M2 ([ a, b]), and a sequence of elementary
functions ϕn ∈ M20 ([ a, b]) such that ϕn ∈ M20 ([ a, b]), such that lim ∥ f − ϕn ∥2M2 ([a,b]) = 0, the Itô integral
n→∞ 0
of f is defined as
Z b
I ( f ) = lim ϕn dW (t) in L2 (P). (2.8)
n→∞ a

Consider now, f , g ∈ M2 ([ a, b]) and α, β ∈ R, then the following 9 properties can be easily proven, by
first proving the properties for an elementary function ϕ ∈ M20 ([ a, b]) and then
 using the fact that there Z b
exists a sequence of elementary functions ϕn ∈ M20 ([ a, b]), such that lim E | f − ϕn |2 dt = 0:
n→∞ a
20 CHAPTER 2. MATHEMATICAL FOUNDATIONS

Rb
1. (Measurability) a f (t) dW (t) is A a -measurable;
R  R 
b b
2. E a f (t) dW (t) = 0 and E a f (t) dW (t)|A a = 0;
 
Rb 2
R
b

2
3. E a
f (t) dW (t) = E a |
f ( t )| dt and
 
Rb 2 R
b
 R
b
 
2 2
E a
f ( t ) dW ( t ) |A a = E a | f ( t )| dt |A a = a
E | f ( t )| |A a dt;

Rb Rb Rb
4. (Linearity) a (α f (t) + βg(t)) dW (t) = α a f (t) dW (t) + β a g(t) dW (t);
hR i hR i
b Rb b
5. E a f (t) dW (t) a g(t) dW (t) = E a f (t) g(t) dW (t) ;

6. If Z is a real value, bounded Ab -measurable random variable then Z f ∈ M2 ([ a, b]) and


Rb Rb
a
Z f (t) dW (t) = Z a f (t) dW (t);
Rb
7. Gaussian If f is deterministic (i.e. f is independent of the value ω ∈ Ω, then a f (t) dW (t) ∼
 R 
b
N 0, a | f (t)|2 dt ;
Rb Rb Rc
8. (Additivity) a
f (t) dW (t) = f (t) dW (t), for any c ∈ ( a, b);
c
f (t) dW (t) + a

9. For any set A ⊂ [ a, b], we have A f (t) dW (t) = a f (t)1 A dW (t).


R Rb

Definition 2.4 (Itô Integral). For any stochastic process f ∈ M2 ([ a, b]), the undefinided Itô stochastic
integral of f is defined as
Z t
I (t) = f (t) dW (t), for any t ∈ [0, T ], (2.9)
0

where f (0) = 0, { I (t)}t∈[0,T ] is At -adapted, square integrable and martingale with respect to the
filtration {At }t≥0 , i.e.

t t
Z  Z 
E[ I (t)|As ] = E[ I (s)|As ] + E f (r ) dW (r )|As = I (s) + E f (r ) dW (r ) = I (s), (2.10)
s s

for any 0 ≤ s < t ≤ T. By the Doob’s martingale property we have:


" # "Z #
Z t 2 t 2 Z T

E max f (s) dW (s) ≤ 4E f (s) dW (s) = 4E | f (s)|2 ds . (2.11)
0≤ t ≤ T 0 0 0

Moreover, { I (t)}t∈[0,T ] has a continuous version a.s., that is, there exists a t-continuous stochastic
 Rt 
process Jt on (Ω, A, {A}t , P) with continuous paths such that P Jt = 0 g(s) dW (s) = 1, for any
t ∈ [0, T ].

Rt
Let us write the Itô’s integral of a stochastic process f as Xt = 0 f (s) dW (s), for every t ∈ [0, T ] the
following four properties are easily shown:
Rt
1. E( Xt ) = 0 and V( Xt ) = E( Xt2 ) = 0 E( f (s)2 ) ds, for any t ∈ [0, T ];
R min{t,s}
2. Cov( Xt , Xs ) = E( Xt Xs ) = 0 E( f (u)2 ) du, for any t, s ∈ [0, T ];
 Rt
3. E | Xt |2 = 0 E | f (u)|2 du, for any t ∈ [0, T ];


4. Xt has the orthogonal property, that is E [( Xu − Xt )( Xs − Xr )] = 0 for 0 ≤ r ≤ s ≤ t ≤ u ≤ T.


2.7. ITÔ FORMULA 21

2.7 Itô formula


Definition 2.5 (Itô’s process). Let {Wt }t≥0 be a one-dimensional Brownian motion defined on the
complete probability space (Ω, A, {At }t≥0 , P). A continuous and adapted process Xt is called an Itô
Process if it is of the form
Z t Z t
X t = X0 + f (s) ds + g(s) dW (s), (2.12)
0 0

where f ∈ L1R+ and g ∈ L2R+ . Its stochastic differential is


0 0

dXt = f (s) ds + g(s) dW (s) (2.13)

Theorem 2.13 (One-Dimensional Itô Formula). Let Xt be an Itô process as previously defined. Consider a
function V = V ( X, t) : R × R0+ → R twice continuously differentiable in X and once in t. Then the process
Y (t) = V ( X (t), t) is an Itô process that can be expressed as

Y ( t ) − Y (0) = V ( X ( t ), t ) − V ( X (0), 0) (2.14)


Z t 
1 2
= Vt ( X (s), s) + VX ( X (s), s) f (s) + VXX ( X (s), s) g(s) ds
0 2
Z t
+ VX ( X (s), s) g(s) dWs
0

We can all write the Itô formula for Y (t) in the differential form:
 
1 2
dY (t) = Vt ( X (t), t) + VX ( X (t), t) f (t) + VXX ( X (t), t) g(t) dt (2.15)
2
+ VX ( X (t), t) g(t) dWt

The proof of the Itô formula while simple is quite laborious. It is a consequence of the application of
the two dimensional Taylor series to V ( X, t), which results in the equation
∞ ∞ Z t Z X (t)
1
V ( X ( t ), t ) − V ( X (0), 0) = ∑ ∑ n!m! 0 X (0)
VX n ,tm (dX )n (ds)m , (2.16)
n =0 m =0

and after every component of the sum is computed we arrive at the Itô formula. The computation for
each term of the sum can be heuristically expressed by the Itô multiplication table:

· dt dW (t)
dt 0 0
dW (t) 0 dt

Theorem 2.14 (Integration by parts Formula). Let f (s) be independent of Ω, with continuous and bounded
variation on s ∈ [0, T ]. Then
Z t Z t
f (s) dW (s) = f (t)W (t) − W ( s ) d f ( s ). (2.17)
0 0

The previous theorem 2.17 is the result of an application of the Itô formula to the function V ( X, t) =
f (t) X, where X is W (t).
Rt Rt
Note that if { Xt }t≥0 is an Itô process Xt = X0 + 0 f (s) ds + 0 g(s) dW (s) where both f and g are
deterministic functions of t, with X0 constant, then
 Z t Z t 
X (t) ∼ N [E( X (t)), V( X (t))] = N X0 + f (s) ds, 2
g(s) ds
0 0
22 CHAPTER 2. MATHEMATICAL FOUNDATIONS

with independent increments.

The following tables present the solution to common stochastic integrals and the function V ( X, t),
where X is W (t), that was applied to the Itô formula, if any, to solve the integral.

Stochastic Integral
Z t
1. dW (s) = W (t)
Z0 t
1 1
2. W (s) dW (s) = W ( t )2 − t
Z0 b 2 2
3. c dW (s) = c (W (b) − W ( a)),
Za b
1  1
4. W (s) dW (s) = W ( b )2 − W ( a )2 − ( b − a )
Za t 2 Z 2
t
5. s dW (s) = tW (t) − W (s)ds
Z0 t 0
t
1
Z
6. W (s) dW (s) = W (t)3 −
2
W (s)ds
Z0 t 3 0Z
1 t W (s)
7. eW (s) dW (s) = eW (t) − 1 − e ds
Z0 2 0
t 1
Z t
W (s) W (t) W (t)
8. W (s)e dW (s) = 1 + W (t)e −e eW (s) (1 + W (s)) ds

Z0 0 2
t 1 t
 
1 t
Z
9. sW (s) dW (s) = W ( t )2 − − W (s)2 ds
0
Z t
2 2 2 0
1
W (s)2 − s dW (s) = W (t)3 − tW (t)

10.
Z0 t 3  µ σ Z t
µs+σW (s) 1  µt+σW (t)
11. e dW (s) = e −1 − + eµs+σW (s) ds, with µ, σ constants
Z0 t σ σ 2 0
1 t
Z
12. sin (W (s)) dW (s) = 1 − cos(W (t)) − cos(W (s))ds
Z0 t Z t2 0
1
13. cos (W (s)) dW (s) = sin(W (t)) + sin(W (s))ds
0 Z 2 0
t

14. E dW (s)
 Z0 t 
15. E W (s) dW (s) = 0
Z0 t
t2

16. V W (s) dW (s) =
0 2

V ( X, t)
1. direct
2. V ( X, t) = X 2
3. direct
4. V ( X, t) = X 2
5. V ( X, t) = tX
6. V ( X, t) = X 3
7. V ( X, t) = e X
8. V ( X, t) = Xe X and the result from 2
9. V ( X, t) = tX 2
10. from results 5 and 6
11. V ( X, t) = eµt+σX
12. V ( X, t) = cos( X )
13. V ( X, t) = sin( X )
14. direct
15. direct from 2
16. direct from 2
2.8. MARTINGALE REPRESENTATION THEOREM 23

2.8 Martingale Representation Theorem


Theorem 2.15 (Itô Representation Theorem). Let {Wt }t≥0 be a Brownian motion in the probability space
(Ω, At , {At }t≥0 , P). For every f ∈ L2 (P) and adaptedto the filtration {At }t≥0 , there exists a unique stochastic
Z t
process Ut , adapted to {At }t≥0 , where E U (s)2 ds < ∞, such that
0
Z t
f (t) = E[ f (0)] + Us dW (s). (2.18)
0

Theorem 2.16 (Martingale Representation Theorem). Let {Wt }t≥0 be a Brownian motion in the probability
space (Ω, At , {At }t≥0 , P). For every { Xt }t≥0 a square martingale to the filtration {At }t≥0 , there exists
adapted 
 Z t
a unique stochastic process Ut , adapted to {At }t≥0 , where E U (s)2 ds < ∞, such that
0
Z t
X (t) = E[ X (0)] + Us dW (s). (2.19)
0

2.9 Multidimensional Itô Formula


Definition 2.6 (Multidimensional Itô Process). Suppose {Wt }t≥0 = (W1 (t), . . . , Wm (t)) is an m-
dimensional Brownian motion defined in the probability space (Ω, A, {At }t≥0 , P), adapted to the
filtration {At }t≥0 , where the components Wr , for r = 1, . . . , m, are independent one dimensional
Brownian motions. A Rd -valued continuous adapted process {X(t)}t≥0 = ( X1 (t), . . . , Xd (t))′ is called
a d-dimensional Itô process if it is of the form,
Z t Z t
X t = X0 + f(s) ds + g(s) dW(s), (2.20)
0 0

or in a different notation,
Z t m Z t
Xi ( t ) = Xi ( 0 ) + f i (s) ds + ∑ gi,j (s) dWj (s), for any i = 1, 2, . . . , d., (2.21)
0 j =1 0

where f(t) = ( f 1 (t), . . . , f d (t))′ ∈ L1R+ Rd and g(t) = gi,j d×m ∈ L2R+ Rd×m . Its stochastic differ-
   
0 0
ential is
dXt = f(s) dt + g(t) dW(t), (2.22)
or in a different notation,
m
dXi (t) = f i (t) dt + ∑ gi,j (t) dWj (t), for any i = 1, 2, . . . , d. (2.23)
j =1

Theorem 2.17 (Multidimensional Itô Formula). Let Xt be an Itô process as previously defined . Consider a
function V = V (X, t) : Rd × R0+ → R twice continuously differentiable in X and once in t. Then the process
Y(t) = V (X(t), t) is an Itô process that can be expressed as
 
1
dY (t) = Vt (X(t), t) + VX′ (X(t), t)f(t) + tr g′ (t)VXX (X(t), t)g(t) dt

(2.24)
2
+ VX′ (X(t), s)g(t) dW(t),
or
" #
d
1 d d m
dY (t) = Vt + ∑ VXi f i (t) + ∑ ∑ ∑ VXi Xj gik (t) g jk (t) dt (2.25)
i =1
2 i =1 j =1 k =1
d m
+ ∑ ∑ VXi gij (t) dWj (t).
i =1 j =1
24 CHAPTER 2. MATHEMATICAL FOUNDATIONS

The proof of the multidimensional Itô formula is analogous to the one-dimensional case. It is a
consequence of applying the d + 1 dimensional Taylor series to V ( X1 , X2 , . . . , Xd , t), which results in
the equation

V (X( t ), t ) − V (X(0), 0) (2.26)


=
∞ ∞ ∞ Z t Z X nd (t ) Z X n1 ( t )
1
∑ ∑ ∑
d 1
··· n
··· n
VX n1 ,...,X nd ,tm (dX1 )n1 . . . (dXd )nd (ds)m ,
n1 =0 nd n
=0 m =0 1 ! . . . nd !m! 0 Xd d (0) X1 1 ( 0 ) 1 d

and after every component of the sum computed we arrive at the multidimensional Itô formula. Just
as we did previously, the computation for each term of the sum can be heuristically expressed by the
multidimensional Itô multiplication table:

· dt dWi dWj
dt 0 0 0
dWi 0 dt 0
dWj 0 0 dt
Chapter 3

Solution of Stochastic Differential


Equations

The present chapter is dedicated to the study of the solutions of Stochastic Differential Equation, namely
the conditions to its existence and uniqueness, how to determine the exact solution and its stability.

3.1 Introduction
We are interested in the solution of a Stochastic Differential Equation of the form
(
dX (t) = f ( X (t), t)dt + g( X (t), t)dW (t), t ∈ [t0 , T ], T > 0
, (3.1)
X ( t 0 ) = X0

where X0 is independent of the W (t) − W (t0 ) and E(| X0 |2 ) < +∞. Function f is called the drift
coefficient and can be interpreted as the short-term growth. Function g is the diffusion coefficient
and can be viewed as the short-term variability. If g( X (t), t) = 0, the equation 3.1 is converted to an
Ordinary Differential Equation.
Let (Ω, A, P) be a complete probability space endowed with the filtration J = {At }t≥0 that is right-
continuous and {A0 } contains all P-null sets, {Wt }t≥0 a one-dimensional Brownian motion defined
on the filtered probability space (Ω, A, {At }t≥0 , P) and, for 0 ≤ to < T < +∞, X0 is a At0 -measurable
random variable with E(| X0 |2 ) < +∞. Furthermore, consider f , g : R × [to , T ] → R Borel-measurable
functions. One can represent the solution of 3.1 in the integral form as follows:
Z t Z t
X ( t ) = X (0) + f ( X (s), s)ds + g( X (s), s)dW (s), t ∈ [t0 , T ]. (3.2)
t0 t0

A one-dimensional stochastic process { Xt }t∈[to ,T ] is the solution to equation 3.1 if:


1. { Xt } is continuous and At -adapted;
2. { f ( X (t), t)} ∈ L1 ([t0 , T ]) and { g( X (t), t)} ∈ L2 ([t0 , T ]);
3. Equation 3.2 holds for every t ∈ [to , T ] almost surely (a.s.).
Finally, a solution { Xt }t∈[to ,T ] is unique if it is indistinguishable from another solution { X̃t }t∈[to ,T ] , i.e.,
P( Xt = X̃t , t ∈ [t0 , T ]) = 1.

3.2 Existence and Uniqueness


The following theorem emphasize the conditions to have a solution that is unique to equation 3.1.

25
26 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS

Theorem 3.1 (Existence and Uniqueness Theorem). Let the coefficient functions f , g of the SDE 3.1 satisfy
the following conditions:
1. Uniform Lipschitz condition: | f ( x, t) − f (y, t)| + | g( x, t) − g(y, t)| ≤ k | x − y|, for some k ∈ R constant,
t ∈ [t0 , T ], T > 0;
2. Linear growth condition: | f ( x, t)|2 + | g( x, t)|2 ≤ c2 (1 + | x |2 ), for some constant c and t ∈ [t0 , T ], T >
0.
If these conditions hold, then there exists a continuously adapted solution Xt of equation 3.1 such that X (0) = X0
and Xt is uniformly bounded in M2 ([t0 , T ]) (supt0 ≤t≤T E(| Xt2 |) < +∞). Furthermore, if both Xt and X̃t are
 
2
continuous M ([t0 , T ]) bounded solutions of 3.1, then P sup | Xt − X̃t | = 0 = 1, and thus Xt is the
t0 ≤ t ≤ T
unique solution.
R 1/2
t λs
One proof of the Existence Uniqueness Theorem involves the use of a norm ∥ X ∥ = 0
e E [| X s | 2 ] ds ,
where λ > k2 (t + 1), and the operator Ψ : M2 ([t0 , T ]) → M2 ([t0 , T ]), where Ψ( X )t =
Z t Z t
X0 + f (s, Xs ) ds + g(s, Xs ) dWs . The linear growth condition ensures that the operator Ψ is well
0 0
defined (i.e. Ψ( X )t ∈ M2 ([t0 , T ]) for any X ∈ M2 ([t0 , T ])), and the uniform Lipschitz condition guaran-
tees the operator Ψ is contractive under the aforementioned norm (i.e. ∥ X − Y ∥ > ∥Ψ( X ) − Ψ(Y )∥).
Hence, by the Fixed Point Theorem, (3.2) has a unique solution.
Another way to look at the theorem is by noticing that the first condition guarantees that functions f
and g do not change faster than a linear function of X relative to changes in X, implying the Lipschitz
continuity of f (·, t) and g(·, t) for all t ∈ [t0 , T ]. The second condition restricts the coefficient functions
f and g to linear increases with respect to X, guaranteeing a.s. that the solution Xt does not explode in
the interval [t0 , T ] independently of X0 .
A solution Xt is termed a strong solution if it has strongly unique sample path and the filtered probability
space, the Brownian motion and the coefficient functions are specified in advance. When only f and g
are specified in advance and the pair ( X̃t , W̃t ) is defined on a suitably probability space the solution X̃t
is called a weak solution, besides the verification of the conditions of theorem 3.1. If two weak solutions
are indistinguishable, then path uniqueness holds. Two weak solutions are called weakly unique if
they have the same probability distribution function. A strong solution is also weak, but generally a
weak solution is not strong.
Given that the conditions of theorem 3.1 are quiet restrictive, the first condition can be relaxed to
a local uniform Lipschitz condition: for every n ≥ 1, exists a positive constant k n such that, for all
x, y ∈ R, t ∈ [t0 , T ], T > 0 and max {| x |, |y|} ≤ n,

| f ( x, t) − f (y, t)| + | g( x, t) − g(y, t)| ≤ k n | x − y|. (3.3)

Thus, if this local uniform Lipschitz condition holds simultaneously with the second condition of
theorem 3.1, then there exists a unique solution Xt , Xt ∈ M2 ([t0 , T ]) to the SDE 3.1. This holds every
time when f and g are continuous differentiable in x on R × [t0 , T ].
Moreover, it is also possible to replace the linear growth condition by a monotone condition: let k be a
positive constant such that
1
x f ( x, t) + | g( x, t)|2 ≤ k (1 + | x |2 ), ∀ ( X (t), t) ∈ R × [t0 , T ], T > 0. (3.4)
2
Now, if 3.3 and 3.4 holds, then there exists a solution Xt to 3.1, with Xt ∈ M2 ([t0 , T ]). The linear growth
condition implies the monotonic condition, but the opposite may not be true.
It worth mention that for f and g defined on R × [t0 , T ] and when the conditions of theorem 3.1 are
verified on every finite subinterval [t0 , T ] ⊂ [to , +∞), the equation 3.1 has a unique solution Xt define
on [t0 , T ) called global solution.
3.3. LINEAR SDES 27

An autonomous SDE of the form


(
dX (t) = f ( X (t))dt + g( X (t))dW (t), t ∈ [t0 , T ], T > 0
, (3.5)
X ( t 0 ) = X0

with X0 a random variable independent of W (t) − W (t0 ), that verifies the assumptions of theorem 3.1
has a unique, continuous global solution Xt on [t0 , +∞) such that X (t0 ) = X0 given that

| f ( x ) − f (y)| + | g( x ) − g(y)| ≤ k| x − y| (3.6)

is satisfied and
| f ( x )|2 + | g( x )|2 ≤ k2 (1 + | x |2 ) (3.7)

holds for a fixed y.

3.3 Linear SDEs


Equation 3.1 is a linear SDE if the coefficient functions f and g are both linear with respect to X
on R × [t0 , T ]. Thus, if f ( X (t), t) = a(t) + A(t) X (t) and g( X (t), t) = b(t) + B(t) X (t) hold with
a(t), b(t), A(t), B(t) : [to , T ] → R, the linear SDE is as follows:

(
dX (t) = ( a(t) + A(t) X (t))dt + (b(t) + B(t) X (t))dW (t), t ∈ [t0 , T ], T > 0
. (3.8)
X ( t 0 ) = X0

A linear SDE is homogeneous if a(t) = b(t) = 0 and linear in the narrow sense if B(t) ≡ 0.
If functions a(t), b(t), A(t), B(t) are measurable and bounded on [t0 , T ],T > 0, i.e.,

sup (| a(t)| + |b(t)| + | A(t)| + | B(t)|) < +∞,


t0 ≤ t ≤ T

then functions f ( X (t), t) = a(t) + A(t) X (t) and g( X (t), t) = b(t) + B(t) X (t) satisfy the hypothesis of
theorem 3.1, guaranteeing that equation 3.8 has a unique continuous solution Xt over [t0 , T ], provided
E(| X0 |2 ) < ∞ and X0 is independent of W (t) − W (t0 ). Again, if the previous is verified in all subsets
of [t0 , +∞), the solution Xt is classified as a unique, continuous and global solution.
An autonomous linear SDE requires functions a(t), b(t), A(t), B(t) independent of time, with SDE
represented by
dX (t) = ( a + AX (t))dt + (b + BX (t)) dW (t), (3.9)

where a, b, A, B are real constants and a unique, continuous and global solution Xt is always achievable
for this type of SDE.
Itô formula is extremely useful to determine the solution to a SDE of the form Xt = (Wt , t). One can
obtain that solution by writing Xt = (Wt , t) in terms of second-order Taylor Expansion and applying
the rule of products established on Itô multiplication rule. After the relevant simplifications, it remains
to match the coefficients of the first SDE with the coefficients of the SDE obtained by the previous
explained strategy. Finally, the function that satisfies both equations is the solution to the SDE. This
methodology is called Itô coefficient matching and the class of SDEs that are particularly well suited
for solving applying this method are characterized as exact.

Theorem 3.2 (Test for exactness). A SDE is exact if the coefficient functions f ( x, t) and g( x, t) satisfy the
following condition:
1
f x = gt + gxx .
2
28 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS

Note, that the test for exactness, is just an application of the Itô’s form to a stochastic function h(t, Wt ),
where h(t, Wt ) = Xt .
Example To solve the Geometric BM one can apply directly Itô formula or Itô matching coefficient
strategy.
(
dXt = µXt dt + σXt dWt , t ∈ [0, +∞)
(3.10)
X ( 0 ) = X0

Let’s start by Itô formula: it’s possible to rewrite Geometric BM as

dXt
= µdt + σdWt . (3.11)
Xt

Left hand side of the equation suggest the existence of ln( Xt ) in the solution. Thus, considering
V ( Xt , t) = ln( Xt ), f = µXt and g = σXt , Itô formula returns
 1 1 −1 2 2   1 
dV ( Xt , t) = 0 + µXt + σ X t dt + σX t dWt , (3.12)
Xt 2 Xt2 Xt

which can be developed as


Z t Z t Z t 
1 
dV ( Xs , s) = µ − σ2 ds + σ dWs . (3.13)
0 0 2 0

Finally, substituting V ( Xt , t) and integrating, one obtains

1
ln( Xt ) − ln( X0 ) = (µ − σ2 )t + σ2 Wt, (3.14)
2
with W0 = 0. The solution of Geometric BM SDE is
X  1 1 2 2
t
ln = (µ − σ2 )t + σ2 Wt ⇐⇒ Xt = X0 e(µ− 2 σ )t+σ Wt . (3.15)
X0 2

To apply Itô matching coefficient strategy let’s impose the conditions to find function h( x, t):
1. µh( x, t) = ht ( x, t) + 21 h xx ( x, t)
2. σh( x, t) = h x ( x, t)
From 2., one can obtain
h x ( x, t)
= σ ⇐⇒ h( x, t) = eσx+c(t) , (3.16)
h( x, t)
with c(t) arbitrary. Replacing that information in 1., one obtains

1  1 
µeσx+c(t) = c′ (t)eσx+c(t) + σ2 eσx+c(t) ⇐⇒ µeσx+c(t) = c′ (t) + σ2 eσx+c(t) (3.17)
2 2
The previous equation imposes that

1  1 
c′ (t) = µ − σ2 ⇐⇒ c(t) = µ − σ2 t, (3.18)
2 2
1 2 1 2
obtaining h( x, t) = eσx+(µ− 2 σ )t . Finally, the solution to the Geometric BM SDE is Xt = X0 e(µ− 2 σ )t+σ2 Wt ,
since it is necessary to leave Xt on the left hand side of the SDE, as in the Itô formula. ■
Below we present the strong solutions of specific types of SDE, given the General form dX (t) =
f ( X (t), t)dt + g( X (t), t)dW (t).
3.3. LINEAR SDES 29

1. Linear → dX (t) = ( at + bt Xt )dt + (ct + et Xt )dW (t)


Rt Rt Rt
Solution: X (t) = Φ(t){ X0 + 0 ( as − es cs )Φ− s ds + 0 cs Φs dWs }, with Φ ( t ) = exp{ 0 es dWs +
1 −1
Rt −1 = exp {− s e dW − s ( b − 1 e2 ) du }
1 2
Φ
R R
0
( b s − 2 e s ) ds } and s 0 u u 0 u 2 u

(a) Homogeneous → dX (t) = bt Xt dt + et Xt dW (t) ( at = ct ≡ 0)


Rt Rt
Solution: Xt = X0 exp{ 0 (bs − 12 e2s )ds + 0 es dWs }

(b) Narrow sense →R dX (t) = ( at + bRt Xt )dt + ct dW (t) (Ret ≡ 0)


t Rt u Rt u
Solution: Xt = e 0 bs ds { X0 + 0 e− 0 bu du as ds + 0 e− 0 bu du cs dWs}

2. Autonomous → dX (t) = f ( Xt )dt + g( Xt )dW (t) ( f ( Xt , t) ≡ f ( Xt ); g( Xt , t) ≡ g( Xt ))

(a) Linear → dX (t) = ( a + bXt )dt + (c + eXt )dW (t)


Rt Rt Rt 1 ds + c t Φ−1 ds },
Solution: Xt = exp{e 0 dWs + (b − 21 e2 ) 0 ds} × { X0 + ( a − ec) 0 Φ−
R
s 0 s
with Φ− 1 = exp {− e s dW − ( b − 1 e2 ) s du }
R R
s 0 u 2 0

(b) Homogeneous → dX (t) = bXt dt + eXt dW (t) ( a = c ≡ 0)


Rt Rt Rt
Solution: Xt = X0 exp{ 0 bs ds + 0 es dWs − 12 0 e2s ds}

(c) Narrow sense → dX (t) = ( a + bXt )dt + cdW (t) (e ≡ 0)


Rt Rt
Solution: Xt = X0 ebt + a 0 eb(t−s) ds + c 0 eb(t−s) dWs

The solution to the most general case, i.e. dX (t) = ( at + bt Xt )dt + (ct + et Xt )dW (t), can be computed
by presupposing Xt is the product of two well known stochastic processes Ut = αt dt + β t dWt , and
Vt = bt Vt dt + et Vt dWt , applying the Itô formula to the product Ut Vt , developing the resulting equation,
and find the expressions for αt and β t that satisfy the assumed equality, dXt = d(Ut Vt ).

A solution Xt , with X (t0 ) = X0 , is stochastically bounded if for each ϵ > 0 there exists a γϵ = γϵ (t0 , X0 )
such that
inf P(| Xt | ≤ γϵ ) > 1 − ϵ. (3.19)
t∈[t0 ,T ]

If γϵ only depends on X0 , then Xt is termed uniformly stochastically bounded. A sufficient condition


for stochastic boundedness of a process { Xt }t∈[t0 ,T ] is a p > 0 such that the pth-order moment E| Xt | p is
bounded on the interval [t0 , T ].

Theorem 3.3 (Existence of Moments). Let the assumptions of theorem 3.1 hold, E| X0 |2p < +∞ for p integer
and Xt a solution to SDE 3.1 on interval [t0 , T ], T < +∞. Thus,

1. E| Xt |2p ≤ (1 + E( X0 )2p )ec(t−t0 )

2. E| Xt − X0 |2p ≤ D (1 + E( X0 )2p )(t − t0 ) p ec(t−t0 ) ,

where c = 2p(2p + 1)K2 and D constants dependent on p, k, T − t0 .

The two previous inequalities can be proved by the use of other well known inequalities, being the two
most worthy of note among them, the Gronwall’s inequality, and the assumed linear growth condition.

RT
Theorem 3.4. Let p ≥ 2, g(t) ∈ M2 ([0, T ]) and E 0
| g(s)| p ds. Then
Z t Z T
p p ( p − 1 ) p p −2
E| g(s)dW (s)| ≤ ( )2 T 2 E | g(s)| p ds. (3.20)
0 2 0
30 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS

Let f , g satisfy the conditions stated on the theorem 3.1 and Xt be the solution of SDE 3.1. Then, Xt is
continuous on [t0 , T ] if there is a constant c > 0 such that | Xt − Xr |2 ≤ c|t − r |, t0 ≤ r, t ≤ T, where
Rt Rt
Xt − Xr = r f ( Xs , s)ds + r g( Xs , s)dWS holds.
In some cases, the solution of a SDE is a Markov process, as follows.

Theorem 3.5. Let Xt be a solution to SDE 3.1 with coefficient functions f , g satisfying the conditions of theorem
3.1. Then Xt is a Markov process with initial probability given by P0 ( B) = P( X0 ∈ B), B ∈ B , transition
Rt
probability P( x, s; B, t) = P( Xt ( x, s) ∈ B) and Xt = Xt ( x, s) is a solution to Xt = x + s f ( Xr , r )dr +
Rt
s
g( Xr , r )dWr , s ≤ t.

If instead of the conditions of theorem 3.5 f , g are uniformly Lipschitz continuous with linear growth
condition satisfied, then the solution Xt is called a strong Markov process.
When transition probabilities are stationary, P( Xt+τ ∈ B | At ) = P( Xτ ∈ B | X0 ), 0 ≤ u ≤ T − t, a
Markov process is said to be homogeneous.
When in the case of an autonomous SDE, if f , g satisfy the conditions of theorem 3.1 with stationary
transition probabilities the solution Xt is a homogeneous Markov process. Again if f , g are uniformly
Lipschitz continuous and satisfy the linear growth condition, stronger conditions, Xt is called homoge-
neous strong Markov process.
A solution of a SDE can be expressed as special stochastic process called Itô diffusion. A one-
dimensional Markov process with continuous sample paths with probability 1 is a diffusion process if
transition probabilities satisfy:
1. limt↓s t−1 s |y− x|<ϵ p( Xt = y | Xs = x )dy = 0;
R

2. limt↓s t−1 s |y− x|<ϵ (y − x ) p( Xt = y | Xs = x )dy = a( x, s);


R

3. limt↓s t−1 s |y− x|<ϵ (y − x )2 p( Xt = y | Xs = x )dy = b( x, s)2 ,


R

with a( x, s) representing the conditional infinitesimal mean and b( x, s)2 the variance of the process.

Theorem 3.6. Given a SDE of the form of 3.1, if f and g satisfy the conditions of theorem 3.1, then any solution
Xt is a diffusion process on interval [0, T ] with drift f ( Xt , t) and diffusion g( Xt , t)2 .

3.4 Stability
Due to some constraints in the representation of solutions of SDEs, its importance is more related to the
qualitative information one can extract, rather than the presentation of the explicit solution.
A relevant aspect of solutions is the concept of stability, which evaluates the impact of a small change
in the initial conditions on the solution of the SDE. When small changes in the initial conditions lead to
small changes in the solution we are in the presence of a stable solution. When small changes lead to
large changes in the solution, we have an unstable solution.
Considering equation 3.1, let (a) the assumptions of theorem 3.1 hold, (b) the initial value X (t0 ) =
X0 ∈ R is constant with probability one, (c) equation 3.1 has a unique global solution Xt (t0 , X0 ) with
continuous sample paths and finite moments for any X0 independent of Wt , t ≥ t0 and (d) f (0, t) =
g(0, t) = 0 such that the unique solution Xt ≡ 0 corresponds to the initial value X (t0 ) = X0 = 0. The
trivial or equilibrium solution to SDE 3.1 is Xt ≡ 0.
It is important to state some concepts related to stability. A continuous scalar-valued function v( x )
defined on a suitably restricted neighbourhood of zero Nh = { x || x | < h, h > 0} is positive definite
if v(0) = 0 and v( x ) > 0, x ∈ Nh ∀ x ̸= 0. A function v is negative definite if −v is positive
definite. Function v is descending if exists a positive definite function u( x ) such that v( x, t) ≤ u( x ),
3.4. STABILITY 31

that is, function u( x ) is an upper bounded for v( x, t). Function v( x, t) is radially unbounded if
lim| x|→+∞ inf{t≥t0 } v( x, t) → +∞.
Let Vt = v( Xt , t) be a continuous positive definite function twice differentiable in Xt and once in t
defined on Nh × R, Nh = { Xt || Xt | < h, h > 0}. The operator L associated with SDE 3.1 is
δ δ 1 δ2
L= + f ( Xt , t ) + g ( Xt , t )2 2 . (3.21)
δt δXt 2 δXt
Applying L on Vt we get
δv δv 1 δv2
Lv( Xt , t) = + f ( Xt , t ) + g ( Xt , t )2 2 . (3.22)
δt δXt 2 δXt
Using Itô formula, if Xt ∈ Nh then
δv
dv( Xt , t) = Lv( Xt , t)dt + g( Xt , t) dWt . (3.23)
δXt
A stable solution to 3.1 requires that, on average, E(dVt ) = E( Lv( Xt , t)dt) < 0, which is satisfied if
Lv( Xt , t) ≤ 0, ∀t ≥ 0. Thus, Vt = v( Xt , t) represents the stochastic Lyapunov function for SDE 3.1,
which allows the application of extended Lyapunov method to determine if the trivial solution is stable.
The main issue with Lyapunov method is Lyapunov function, since it is often hard to find.
Assuming that conditions (a)-(d) are satisfied, the trivial or equilibrium solution is stochastically stable
or stable in probability if  
lim P sup | Xt (t0 , X0 )| ≥ ε = 0, (3.24)
X0 → 0 t∈[t0 ,+∞)

otherwise the solution is called stochastically unstable.


Trivial solution is stochastically asymptotically stable if it is stochastically stable and
 
lim P lim Xt (t0 , X0 ) = 0 = 1. (3.25)
X0 → 0 t→+∞

Moreover, the trivial solution is globally stochastically asymptotically stable, or stochastically asymp-
totic stable in large if it is stochastically stable and ∀ X0 ∈ R
 
P lim Xt (t0 , X0 ) = 0 = 1. (3.26)
t→+∞

Theorem 3.7. 1. Suppose assumptions (a)-(d) hold and there exists a positive definite stochastic Lyapunov
function v( Xt , t) defined on Nh × [t0 , +∞) that is everywhere continuous, twice differentiable in Xt and
once in t. Let Lv( Xt , t) ≤ 0, t ≥ t0 , 0 ≤ Xt ∈ Nh , where Lv( Xt , t) is given by equation 3.22. Then, the
trivial solution is stochastically stable.
2. If, in addition, v( Xt , t) is decreasing function and Lv( Xt , t) is negative definite, then the trivial solution is
stochastically asymptotically stable.
3. If point 2 hold for a radially bounded function v( Xt , t) defined everywhere on [t0 , +∞) × R, then the
trivial solution is globally stochastically asymptotically stable.

Example: Consider the following SDE: dXt = µXt dt + σXt dWt . Let the selected stochastic Lyapunov
function be Vt = v( Xt , t) = Xt2 . Then, applying equation 3.22,
 1 
Lv = 0 + µXt (2Xt ) + (σXt )2 (2)
2

2 1 2 2 
= 2µXt + σ Xt 2
2
 1 2 2
= 2 µ + σ Xt
2
 1 
= 2 µ + σ2 vt ≤ 0
2
32 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS

if µ + 12 σ2 < 0. Hence, if the inequality holds, the trivial solution Xt = 0 is globally stochastically
asymptotically stable, or stochastically asymptotic stable in large. ■
A. M. Lyapunov developed the stability concept for ODEs: please consult this author and others with
relevant contribution in this topic to better understand the general idea for the case of ODEs.
Chapter 4

Stochastic Differential Equation models

This chapter intends to present some models based on Stochastic Differential Equations with applica-
tions on biological and financial areas. This models are derived from the simpler Ordinary Differential
Equation models, with the introduction of a stochastic component.

4.1 Introduction
A deterministic model ignores the noise presented on the behaviour of a modeled variable. Thus,
the stochastic component aims to address this issue, namely through the introduction of a Brownian
motion process.
Models have a deterministic component that captures the growth rate of the dependent variable,
measurable given the initial conditions, and a stochastic component that represents the variability
provoked by random phenomena.
Regarding biology, some sources of noise are the demographic stochasticity, comprising all factors
related to reproducible and mortality of each individual, environmental stochasticity for issues like
weather, natural disasters and others, mensuration stochasticity originated from the misleading count-
ing and, finally, informational stochasticity for all possible errors or omissions made during the process
of modelling.
In the financial case, noise can be generated when a big institution/ investor program their trading,
creating a suddenly increase in the volume of trade, or when a group of traders bet on the same industry
or company originating a bubble.
The conversion of a non-linear SDE into a linear, by a changing of variable, constitutes a great advantage
since the solution to a linear SDE is already known (please look to 3.3).
Given a non-linear SDE of the form

dXt = f ( Xt , t)dt + g( Xt , t)dWt , (4.1)

consider Yt = U ( Xt , t) with δU /δXt ̸= 0. Then, the solution of 4.1 has the form Xt = V (Yt , t), provided
the coefficients a1 (t), a2 (t), b1 (t), b2 (t) of

dYt = ( a1 (t) + a2 (t)Yt )dt + (b1 (t) + b2 (t)Yt )dWt . (4.2)

From the application of Itô formula to U ( Xt , t), one can verify that the following expressions make 4.1
reducible.
1. Ut + f U X + 21 g2 U XX = a1 (t) + a2 (t)U ( Xt , t);
2. gU X = b1 (t) + b2 (t)U ( Xt , t).

33
34 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS

The next step uses the equalities above to determine the value of a1 (t), a2 (t), b1 (t), b2 (t), U ( Xt , t) to
further plug in the linear equation 4.2 to find the value of Yt . Then, when Yt is known, it just follows to
compute Xt by the inverse operations.
For the special case of autonomous SDE, we have
1. f U X + 12 g2 U XX = a1 + a2 U ( Xt );
2. gU X = b1 + b2 U ( Xt ).
R Xs ds
b
If g( Xt ) ̸= 0 and b2 ̸= 0, then by 2 we have U ( Xt ) = Ce 2 x0 g(s) − bb12 , with C and X0 constants. Applying
  R Xs ds  
b ( Xt )
U ( Xt ) in 1, one obtains C b2 A( Xt ) + 12 b22 − a2 e 2 x0 g(s) = a1 − a2 bb12 , with A( Xt ) = gf ((XXtt )) − 21 dgdX t
.
 
dA d dA
Skipping computations, b2 dX t
+ dX t
g( Xt ) dX t
= 0 and it represents the necessary and sufficient
condition to convert an autonomous non-linear SDE into an autonomous linear SDE. Moreover,
R Xs ds
b2
• if b2 ̸= 0, U ( Xt ) = e x0 g(s)

R Xt ds
• if b2 = 0, U ( Xt ) = b1 x0 g ( s )
+ C.

It’s mandatory to chose b1 in a way that it satisfies 2. Given U ( Xt ), b1 , b2 properly defined, 1 allows us
to determine a1 , a2 . Substituting this coefficients in 4.2, the solution Yt is direct. Finally, the solution Xt
to the non-linear problem 4.1 is obtained from the application of the inverse operation on Yt .
Example: To find the solution of the non-linear SDE

1
dXt = − e−2Xt dt + e−Xt dWt , (4.3)
2

with f ( Xt ) = − 21 e−2Xt and g( Xt ) = e−Xt , let’s first evaluate A( Xt ),

f ( Xt ) 1 dg( Xt ) 1 1
A ( Xt ) = − = − e−Xt + e−Xt = 0. (4.4)
g( Xt ) 2 dXt 2 2

The result 4.4 implies that


dA d  dA 
b2 + g ( Xt ) =0 (4.5)
dXt dXt dXt
holds for any b2 . For b2 = 0 and b1 = 1,

gU X = b1 ⇔ e−Xt U X = 1 ⇔ U X = e Xt ⇒ U = e Xt . (4.6)

Then, f U X + 12 g2 U XX = a1 + a2 U ( Xt ) implies

1 1 2
− e−2Xt · e Xt + e−Xt e Xt = a1 + a2 e Xt ⇔ 0 = a1 + a2 e Xt , (4.7)
2 2
thus a1 , a2 can assume any value. To simplify, let a1 = a2 = 0, check 4.2 and note that the linear SDE is
Z t Z t
dYt = dWt ⇒ dYs = dWs ⇔ Yt = Y0 + Wt . (4.8)
0 0

Since Yt = U ( Xt ) = e Xt , then the solution to 4.3 is

Yt = Y0 + Wt ⇔ e Xt = e X0 + Wt ⇔ Xt = ln(e X0 + Wt ). ■ (4.9)

When Yt = U ( Xt ) is known in advance, we resort to the second-order Taylor expansion of U to write


dYt = dU ( Xt ) = U X dXt + 12 U XX (dXt )2 and substitute dXt by the corresponding SDE, which will lead
to an autonomous linear SDE.
4.2. DETERMINISTIC AND STOCHASTIC POPULATIONAL GROWTH MODELS 35

Example: If one is working on SDE 4.3 but Yt = U = e Xt is given, then

1
dYt = e Xt dXt + e Xt (dXt )2
2
 1 
= e Xt − e−2Xt dt + e−Xt dWt
2
1 Xt  1 −2Xt 2
− Xt
+ e − e dt + e dWt
2 2
1 1
= − e−Xt dt + dWt + e Xt (e−Xt )2 (dWt )2
2 2
1 − Xt 1
= − e dt + dWt + e−Xt dt
2 2
= dWt

To finalize the answer, one must proceed as in the end of the previous example. ■

4.2 Deterministic and Stochastic Populational Growth Models


The deterministic population growth model is based on the ODE

dNt = rNt dt, (4.10)

where r is a constant and represents the instantaneous relative growth rate that can be rewritten as the
difference of birds (b) and deaths (d) for a single species with no migration, i.e. r = b − d, termed as the
net bird rate or the constant per capita growth rate. The solution to 4.10 is

N (t) = N0 ert , N (0) = N0 , (4.11)

so when r > 0 the population grows exponentially and when r < 0 the decreases exponentially.

If instead of considering relative population growth rate just as the difference between birds and deaths,
one considers a new relative population growth rate defined as the net bird rate plus a noise component,
in the form of r (t) − α(t) · noise with r, α constants and noise = dW
dt , then the ODE is now as follows:
t

dNt dWt
= (r ( t ) − α ( t ) · ) Nt , N (0) = N0 , t ≥ 0 (4.12)
dt dt

or, alternatively,
dNt = r (t) Nt dt − α(t) Nt dWt , N (0) = N0 , t ≥ 0. (4.13)

Noting that SDE 4.13 is autonomous, linear and homogeneous, the solution is a geometric Brownian
1 2
motion process of the form N (t) = N0 e(r− 2 a )t+αWt with

1. E( N (t)) = N0 ert ;
2
2. Var ( N (t)) = N0 e2rt (eα t − 1), if N0 is independent of Wt .

In the long-term:

1. if r > 12 α2 , then N (t) −→ +∞ as t −→ +∞;

2. if r < 21 α2 , then N (t) −→ −∞ as t −→ +∞;

3. if r = 21 α2 , then N (t) fluctuates between arbitrarily large and small small values as t −→ +∞.
36 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS

4.3 Deterministic and Stochastic Logistic Growth Models


One possible critic addressed to the Stochastic Population Growth model is the possibility of populations
to expand without limit.

To fix this issue, Verhulst proposed a self-regulating mechanism that ponders the population
 relative
N (t)
growth rate according to the current size of the population. The new r is given by r 1 − K , where
K represents the limiting size of the population or carrying capacity. Thus, the deterministic Logistic
Growth Model is
 N (t) 
dN (t) = r 1 − N (t)dt, (4.14)
K
with solution
KN0
N (t) = , N (0) = N0 ,t ≥ 0. (4.15)
N0 + (K − N0 )e−rt

If births are defined as b = b1 − b2 N (t), deaths as d = d1 + d2 N (t) and r as the difference between
births and deaths, then

dNt  
= b1 − b2 N (t) − d1 − d2 N (t) N (t) (4.16)
dt  
= (b1 − d1 ) − (b2 + d2 ) N (t) N (t)
 N (t) 
= (b1 − d1 ) 1 − N ( t ).
(b1 − d1 )/(b2 + d2 )

Hence, K = (b1 − d1 )/(b2 + d2 ).

The stochastic Logistic Growth Model appears with the introduction of the multiplicative noise term,
originating
 N (t) 
dN (t) = r 1 − N (t)dt + αN (t)dWt , (4.17)
K
A strategy similar to the used to convert deterministic population growth model into a stochastic one
may also be applied here.

The solution to the stochastic logistic growth model is obtained resorting to the Itô equation, considering
V ( Nt , t) = Nt−1 and VN = − Nt−2 , VNN = 2Nt−3 , f = r (1 − NKt ) Nt , g = αNt . Following the procedure
of Itô formula, one may arrive to the solution
n  1 o
d( Nt−1 ) = r − Nt−1 + + α2 Nt−1 dt − αNt−1 dWt . (4.18)
k

The previous solution can be expressed as:

exp{(r − (1/2)α2 )t + αWt }


Nt = Rt . (4.19)
N0−1 + (r/K ) 0 exp{(r − (1/2)α2 )s + αWs }ds

The mean and variance of the process Nt are, respectively,:


K
1. E( N (t)) = 1+[(K/N0 )−1]e−rt
;
n o −2 2
K
2. Var ( N (t)) = K2 e2rt N0 + (ert − 1) ( e α t − 1).
4.4. DETERMINISTIC AND STOCHASTIC GENERALIZED LOGISTIC GROWTH MODELS 37

4.4 Deterministic and Stochastic Generalized Logistic Growth Models


The Generalized Logistic Growth model is obtained considering the a new growth equation to the
logistic growth model. Hence, the deterministic Generalized Logistic Growth Model is
dNt h  N r i
t
= βNt 1 − . (4.20)
dt N∞
N∞ denotes the horizontal asymptote when t −→ ∞, termed saturation level. β is a feedback term
that ponders growth rate.
The solution to 4.20 is
h  N −r  i− 1r
0
Nt = N∞ 1 + − 1 e− βrt . (4.21)
N∞
The stochastic Generalized Logistic Growth model is obtained introducing to 4.20 the multiplicative
noise term. The SDE is h  N r i
t
dNt = βNt 1 − dt + αNt dWt . (4.22)
N∞
Nt
To solve SDE 4.22 one must start by diving all terms by N∞ , defining Yt = N∞ and observe that 4.20 the
multiplicative noise term. The SDE is
dYt = βYt (1 − Ytr )dt + αYt dWt (4.23)
is an autonomous non-linear SDE. To proceed, one should apply the reduction method presented in
subsection 4.1, starting by determining a suitable transformation function Zt = U (Yt ) and then using
the results presented in 3.3.
We have f = βYt (1 − Ytr ) and g = αYt , so let’s proceed to find U (Yt ).
βYt (1 − Ytr ) 1 β(1 − Ytr ) α
A(Yt ) = − α= − ̸= 0 (4.24)
αYt 2 α 2
 
b2 − ( β/α)rYtr−1 − r2 βYtr−1 = 0 ⇔ b2 = −rα. (4.25)
Hence, R Yt 1
−rα ds 1
U (Yt ) = Ce Y0 αYt
= U (Yt ) = Ce−rα α ln(Yt −Y0 ) = Ce−rln(Yt /Y0 ) = Yt−r , (4.26)
for C, Y0 arbitrary.
Now let´s find a1 , a2 , b1 , b2 .
 
αYt − rYt−r−1 = b1 − rαYt−r ⇔ b1 = 0, (4.27)

  1  2
− rYt−r−1 + αYt · (−r )(−r − 1)Yt−r−2 = a1 (t) + a2 (t)Yt−r
βYt (1 − Ytr ) · (4.28)
2
⇐⇒
1 
rβ + r (r + 1)α2 − rβ Yt−r = a1 (t) + a2 (t)Yt−r , (4.29)
2
so a1 = rβ and a2 = 21 r (r + 1)α2 − rβ.
Let Zt = Yt−r . The linear SDE is
 1  
dZt = rβ + r (r + 1)α2 − rβ Zt dt − rαZt dWt . (4.30)
2

Resorting to 3.3 and inverting the change of variables, the final solution to 4.22 is achieved. This
solution is
n N −r Z t o− 1r
−1 0
Nt = N∞ Φt r + βr Φ−
s
1
ds , (4.31)
N∞ 0
n  o
with Φt = exp − αrWt + 21 α2 r − βr t .
38 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS

4.5 Deterministic and Stochastic Gompertz Growth Models


If the instantaneous rate of growth is proportional to lnN∞ − lnNt we have
dNt = β(lnN∞ − lnNt ) Nt dt. (4.32)
The logarithm presents a vertical asymptote at x = 0 and the growth rate has the shape of an inverse
logarithm function.
The solution to ODE 4.32 is the deterministic Gompertz population growth model
− βt
Nt = N∞ e−ln( N∞ /Nt )e , N (0) = N0 , t ≥ 0. (4.33)

Nt
Considering Xt = N∞ , one can write ODE 4.32 as
dNt N∞ Nt
= β(ln ) dt ⇔ dXt = βln( Xt−1 ) Xt dt ⇔ dXt = − βXt lnXt dt. (4.34)
N∞ Nt N∞

By introducing a noise term, the stochastic model arise, as presented below,


dXt = − βXt lnXt dt + kXt dWt , (4.35)
which is an autonomous non-linear SDE.
Once again, the method applied to find the solution of 4.35 is the reduction model presented in 4.1.
First, let’s find the transformation function, given that f = − βXt lnXt and g = kXt :
− βXt 1 −β k
A ( Xt ) = lnXt − k = lnXt − , (4.36)
kXt 2 k 2

 β  d  −β   β 
b2 − + kXt = 0 ⇔ b2 − + 0 = 0 ⇔ b2 = 0, (4.37)
kXt dXt kXt kXt
Z Xt Z Xt
1 1 1
B ( Xt ) = ds = ds = lnXt , (4.38)
X0 g(s) X0 kXs k

b1
U ( Xt ) = b1 B( Xt ) + C = lnXt + C, (4.39)
k

b1
gU X = b1 + b2 U ( Xt ) ⇐⇒ kXt
= b1 + 0 ⇐⇒ b1 = b1 , (4.40)
kXt
so that b1 can assume any value. Thus, with b1 = k and C = 0, we have
k
U ( Xt ) = lnXt + 0 = lnXt . (4.41)
k
To find a1 , a2 we use
1 βXt lnXt k2 Xt2 k2 k2
f U X + g2 U XX = − − = − βlnX t − = − βZt − , (4.42)
2 Xt 2Xt2 2 2
with Zt = U ( Xt ) = lnXt .
The reduced form of 4.35 is
1  
dZt = − k2 − βZt dt + kdWt , (4.43)
2
whose solution is given by 3.3, since now we are working with an autonomous linear SDE. After
inverting the change of variables, the final solution to 4.35 is the stochastic Gompertz growth function
Z t
n h 1 2 βt io
Xt = exp e− βt lnX0 − k ( e − 1) + k e βs dWs . (4.44)
2β 0
4.6. DETERMINISTIC AND STOCHASTIC NEGATIVE EXPONENTIAL GROWTH MODELS 39

4.6 Deterministic and Stochastic Negative Exponential Growth Models


The deterministic negative exponential model is
N


dNt = β − 1 Nt dt, (4.45)
Nt

where NN∞0 − 1 is called feedback term because it compares the limit level of the variable N with its
current state, returning a decrease when the variable is above the limit (Nt > N∞ ) or supporting an
increase when this value is below the limit (Nt < N∞ ). β represents the growth rate.
The representation of the evolution of a variable under the deterministic negative exponential model is
given by
 
Nt = N∞ 1 − e− βt , t ≥ 0, N∞ , β > 0. (4.46)

Considering noise as αNt , the stochastic negative exponential growth model is

dNt = β( N∞ − Nt )dt + αNt dWt , (4.47)

which, fortunately, is an autonomous linear SDE, so its solution is direct.


As in 3.3, the solution to SDE 4.47 is
Z t
n  1 o n n 1  o o
Nt = exp αWt − β + α2 t × N0 + βN∞ exp β + α2 s − αWs ds . (4.48)
2 0 2

4.7 Deterministic and Stochastic Linear Growth Models


A deterministic linear growth model is characterized by the ODE

dNt = βdt, (4.49)

with solution
Nt = N0 + βt, t ≥ 0. (4.50)

The addition of a multiplicative noise term generates the SDE

dNt = βdt + αNt dWt , (4.51)

the general form of a stochastic linear growth model with multiplicative noise. This SDE is autonomous
and linear, with parameter a = β, b = c = 0 and e = α, and its solution is
Z t
n 1 o n 1  o
Nt = exp αWt − α2 t × N0 + β exp α2 s − αWs ds , (4.52)
2 0 2

as defined on 3.3.
If instead of the multiplicative noise one introduces the additive noise, the SDE becomes

dNt = βdt + αdWt , (4.53)

an autonomous and linear SDE in the narrow sense (a = β, b = e = 0, c = α) with solution


Z t Z t
Nt = X0 + β ds + α dWs = X0 + βt + αWt . (4.54)
0 0
40 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS

4.8 Stochastic Square-Root Growth Model with Mean Reversion


The mean-reverting square-root SDE is defined as follows:

dXt = (α − βXt )dt + σ Xt dWt , X (0) = X0 > 0. (4.55)

This model is famous as the Cox-Ingersoll-Ross (CIR) model to the dynamics of short-run interest rates.
In that sense, α√ is the speed of adjustment and β, σ > 0. The drift α − βXt ensures mean revertion and
the diffusion σ X t describes volatility.

Given an integrating factor e βt , if one rewrites 4.55 as dXt + βXt dt = αdt + σ X t dWt , then
√ √
e βt dXt + βe βt Xt dt = αe βt dt + σe βt X t dWt ⇔ d(e βt Xt ) = αe βt dt + σe βt X t dWt . (4.56)

Integrating 4.56, one obtains


Z t
βs
Z t
βs
Z t √
d ( e Xs ) = αe ds + σe βs Xs dWs (4.57)
0 0 0

 e βte β ·0 
Z t √
e βt Xt − e β·0 X0 = α − +σ e βu Xu dWu (4.58)
β β 0

α βt
  Z t √
e βt Xt = X0 + e −1 +σ e βu Xu dWu (4.59)
β 0

− βt α − βt  βt 
− βt
Z t √
X t = X0 e + e e − 1 + σe e βu Xu dWu (4.60)
β 0

α  α
Z t √
Xt = + e− βt X0 − + σe− βt e βu Xu dWu , (4.61)
β β 0

the stochastic mean-reverting square-root growth equation or CIR growth equation.


Due to its application to interest rates, mean and variance must be highlighted.
h   Rt √ i  
1. E[ Xt ] = E αβ + e− βt X0 − αβ + σe− βt 0 e βu Xu dWu = αβ + e− βt X0 − αβ ;
h Rt √ 2 i h R
t βu √
2 i
2. Var [ Xt ] = E[( Xt − E( Xt ))2 ] = E σe− βt 0 e βu Xu dWu = σ2 e−2βt E 0
e X u dWu =
hR i   
t t t
σ2 e−2βt E 0 e2βu Xu du = σ2 e−2βt 0 e2βu E( Xu )du = σ2 e−2βt 0 e2βu αβ + e− βt X0 − αβ du =
R R
h 2 −βt 2βt i 2 − e−2βt )
σ (e −e )
X0 β + ασ (12β 2 .

CIR model in 4.55 was subject to an extension, defined as CKLS mean-reverting gamma SDE in honour
to the authors, and is defined as
γ
dXt = (α − βXt )dt + σXt dWt , X (0) = X0 > 0, (4.62)

where γ measures the degree of non-linearity of Xt and its volatility.


Chapter 5

Approximation and Estimation of Solutions


to Stochastic Differential Equations

5.1 Introduction
In a lot of cases, it is not possible to determine a close form solution to the Itô initial value problem,
given by the stochastic differential equation:

dXt = f (t, Xt ) dt + g(t, Xt ) dWt , X (t0 ) = X0 , t ∈ [t0 , +∞). (5.1)

Hence, similar to the deterministic differential equations case, numerical methods are employed. We
start by discretizing the interval [t0 , T ] into N smaller subintervals t0 < t1 < · · · < t N = T, then
√ pseudorandom number generator, compute increments for ∆Wi = W (ti+1 ) − W (ti ) ∼
using a normal
N (0, ∆ti ) = ∆ti N (0, 1), where ∆ti = ti+1 − ti , and applying a recursive algorithm to calculate the
approximations Yi of X (ti ; X0 , t0 ), where Y0 = X0 , for each i = 0, 1, · · · , N.

Definition 5.1 (Weak and Strong Convergence). Let us consider a discretization ti of [t0 , T ] into N
subintervals of equal length, i.e. ti = Ni ( T − t0 ) + t0 , hence ∆ti = T − t0
N , for i = 0, 1, . . . , N.

1. A time discretization approximation Y converges with strong order γ to a continuous-time


stochastic process X at time T, if there exists a constant C, independent of ∆t, where for N large
enough such that ∆t < 1, we have:

E[| X ( T ) − Y ( T )|] ≤ C (∆t)γ (5.2)

2. A time discretization approximation Y converges with weak order β to a continuous-time stochas-


tic process X at time T, if there exists a continuously differentiable polynomial function h and a
constant Ch , independent of ∆t, where for N large enough such that ∆t < 1, we have:

|E[h( X ( T ))] − E[h(Y ( T ))]| ≤ Ck (∆t) β . (5.3)

5.2 Taylor Expansions


For the deterministic case, the approximation of a certain continuous process of the form dX dt = f ( Xt )
t

can be calculated by the application of the deterministic Taylor formula until a certain r-term, resulting in
(t−t )k
the approximation h( Xt ) ≈ h( Xt0 ) + ∑rk=1 k!0 Lh( X0 ), where L is the operator L = f ∂X ∂
. Analogously,
for the stochastic case where f (t, Xt ) = f ( Xt ) and g(t, Xt ) = g( Xt ) (that is dXt = f ( Xt ) dt + g( Xt ) dWt ),
and f and g are sufficiently smooth real-value functions, satisfying the linear growth condition, then

41
42CHAPTER 5. APPROXIMATION AND ESTIMATION OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQ

any h( Xt ) : R → R twice continuously differentiable, can be expressed as

∂2
Z t 
∂ 1
h ( Xt ) = h ( Xt0 ) + f ( Xs ) h( Xs ) + g( Xs )2 2 h( Xs ) ds (5.4)
t0 ∂X 2 ∂X
Z t

+ g ( Xs ) h( Xs ) dWs , t ∈ [t0 , T ]
t0 ∂X

or as
Z t Z t
h ( Xt ) = h ( Xt0 ) + L0 h( Xs ) ds + L1 h( Xs ) dWs (5.5)
t0 t0

where

∂ 1 ∂2 ∂
L0 = f + g2 2 , L1 g (5.6)
∂X 2 ∂X ∂X

are the operators.

If we apply this formula to the functions f and g within the formula we obtain the Itô-Taylor expansion

Xt = Xt0 + f ( Xt )(t − t0 ) + g( Xt )(Wt − Wt0 ) + Rt , (5.7)

where the remainder term Rt is


Z tZ s Z tZ s
Rt = L0 f ( Xz ) dz ds + L1 f ( Xz ) dWz ds (5.8)
t0 t0 t0 t0
Z tZ s Z tZ s
+ L0 g( Xz ) dz dWs + L1 g( Xz ) dWz dWs .
t0 t0 t0 t0

Note that, in the context of numerical methods, we can continuously apply the Itô-Taylor expansion
and getting better algorithms for approximation of a stochastic Xt , with each subsequent expansion.

5.3 Iterative Schemes for Approximating SDE’s


5.3.1 The Euler-Maruyama Approximation
The Euler-Maruyama approximation for SDEs can either be viewed as the generalization of the Euler
method for ODEs to the SDE case, or as the first term of the Itô-Taylor expansion. Given a time
discretization of [t0 , T ], t0 < t1 < · · · < t N = T, the EM approximation for a stochastic process Xt , is
derived from
Z t i +1 Z t i +1
X t i +1 = X t i + f ( Xs , s) ds + g( Xs , s) dWs (5.9)
ti ti
Z t i +1 Z t i +1
≈ Xti + f ( Xti , t i ) ds + g( Xti , ti ) dWs
ti ti
= Xti + f ( Xti , ti )∆ti + g( Xti , ti )∆Wti .

Thus, the Euler-Maruyama approximation is defined as a continuous-time stochastic process {Yt }t∈[t0 ,T ]
that satisfies the iterative scheme:

Yi+1 = Yi + f (Yi , ti )∆t + g(Yi , ti ) ∆tZ, (5.10)

where Z ∼ N (0, 1). The EM routine converges with strong order γ = 1/2 and weak order β = 1.
5.3. ITERATIVE SCHEMES FOR APPROXIMATING SDE’S 43

5.3.2 The Milstein (Second-Order) Approximation


As the name implies, the Milstein Approximation is derived from the second order Itô-Taylor expansion.
Given a time discretization of [t0 , T ], t0 < t1 < · · · < tn = T, the Milstein approximation for a stochastic
process Xt , is derived from:

Z t i +1 Z t i +1 Z t i +1 Z s
∂g
X t i +1 = X t i + f ( Xs , s) ds + g( Xs , s) dWs + g ( Xz , z ) ( Xz , z) dWz dWs + Rt (5.11)
ti ti ti ti ∂Xz
Z t i +1 Z t i +1 t i +1 s Z Z
∂g
≈ Xti + f ( Xti , t i ) ds + g( Xti , ti ) dWs + g( Xti , ti ) ( Xti , t i ) dWz dWs
ti ti ∂Xti ti ti
∂g 1
= Xti + f ( Xti , ti )∆ti + g( Xti , ti )∆Wti + g( Xti , ti ) ( Xti , ti ) ((∆Wti )2 − ∆ti )
∂Xti 2

Hence, the Milstein approximation is defined as a continuous-time stochastic process {Yt }t∈[t0 ,T ]
satisfying the iterative scheme:

√ 1 ∂g
Yi+1 = Yi + f (Yi , ti )∆t + g(Yi , ti ) ∆tZ + g(Yi , ti ) ( Xti , ti ) ∆tZ2 − ∆t ,

(5.12)
2 ∂Yi

where Z ∼ N (0, 1). The Milstein routine converges with strong order γ = 1.

Example

Given the SDE for the Geometric BM model

dXt = µXt dt + σXt dWt , X (0) = X0

consider a n-partition 0 = t0 ≤ t1 ≤ · · · ≤ tn = T of [0, T ]. The EM and Milstein iteration schemes for


approximating the geometric BM are given by

• EM: Yti+1 = Yti + µYti ∆ti + σYti ∆ti Z, i = 0, 1, ..., n − 1,

• Milstein: Yti+1 = Yti + µYti ∆ti + σYti ∆ti Z + 12 σ2 ∆( Z2 − 1), i = 0, 1, ..., n − 1,

where Z ∼ N (0, 1).

If we use the Geometric BM model SDE in its logarithmic form,


 
1
d log( Xt ) = µ − σ2 dt + σ dWt ,
2

then EM and Milstein iteration schemes become identical with expression


  
1 2 p
Yti+1 = Yti exp µ − σ ∆ti + σ ∆ti Z , i = 0, 1, ..., n − 1,
2

again with Z ∼ N (0, 1). ■

5.3.3 Variations on the EM and Milstein Schemes


As we stated previously, the Itô-Taylor expansion can be expanded to derive other iterative methods
aside from the EM and Milstein schemes. We now present 5 iteration techniques for an approximation
Yt for SDEs of the form dXt = f ( Xt ) dt + g( Xt ) dWt .
44CHAPTER 5. APPROXIMATION AND ESTIMATION OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQ

1. Strong Order 1.5 Taylor Scheme is given by

1
Yti+1 = Yti + f (Yti )∆ti + g(Yti )∆Wti + g(Yt ) gX (Yt ) (∆Wti )2 − ∆ti

(5.13)
2 
∆Vt

1
+ f X (Yti ) g(Yti )∆ti ∆Wti + √ i
2 3
 
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti )) f XX (Yti ) (∆ti )2
2
2 2
∆Vti
   
1 1
+ f (Yti ) gX (Yti ) + ( g(Yti )) gXX (Yti ) ∆ti ∆Wti −
2

2 2 3
 
1 1
+ g(Yti ) g(Yti ) gXX (Yti ) + ( g(Yti ))2 (∆Wti )2 − ∆ti ∆Wti ,

2 3

where ∆Wti , ∆Vti ∼ N (0, ∆ti ) are independent random variables, for all i = 1, . . . , N.
2. Weak Order 2.0 Taylor Scheme is defined as

1
Yti+1 = Yti + f (Yti )∆ti + g(Yti )∆Wti + g(Yt ) gX (Yt ) (∆Wti )2 − ∆ti

(5.14)
2 
∆Vti

1
+ f X (Yti ) g(Yti )∆ti ∆Wti + √
2 3
 
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti )) f XX (Yti ) (∆ti )2
2
2 2
∆Vti
   
1 1
+ f (Yti ) gX (Yti ) + ( g(Yti )) gXX (Yti ) ∆ti ∆Wti − √
2
,
2 2 3

where ∆Wti , ∆Vti ∼ N (0, ∆ti ) are independent random variables for all i = 1, . . . , N.
3. Weak Order 2.0 Predictor-Corrector Scheme is divided into two steps
• Step 1: Consider a predictor Y ti given by the scheme

1
Y ti+1 = Yti + f (Yti )∆ti + λi + f X (Yt ) g(Yt ) ∆Ŵti ∆ti

(5.15)
 2 
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti ))2 f XX (Yti ) (∆ti )2 ,
2 2

where
1
λi = g(Yti )∆Ŵti + g(Yt ) gX (Yt ) (∆Ŵti )2 − ∆ti

(5.16)
 2 
1 1
+ f (Yti ) gX (Yti ) + ( g(Yti )) gXX (Yti ) ∆Ŵti ∆ti ,
2
2 2

and ∆Wti ∼ N (0, ∆ti ).


• Step 2: We define the corrector Yti as

1
Y ti+1 + f (Yti ) ∆ti + λi

Yti+1 = Yti + (5.17)
2

4. Derivative-Free Weak Order 2.0 Predictor-Corrector Scheme is defined in two steps


• Step 1: We define the predictor Y ti

1
Y ti+1 = Yti + [ f (ui ) + f (Yti )] ∆ti + Oi (5.18)
2
5.3. ITERATIVE SCHEMES FOR APPROXIMATING SDE’S 45

where the supporting values ui and Oi are defined by

ui = Yti + f (Yti )∆ti + g(Yti )∆Ŵti (5.19)


1
Oi = [ g(u+ ) + g(u− ) + 2g(Yti )]∆Ŵti
4
1 1
+ [ g(u+ ) + g(u− )] (∆Ŵti )2 − ∆ti (∆ti )− 2 ,
 
4
where
p
ui+ = Yti + f (Yti )∆ti + g(Yti ) ∆ti (5.20)
ui− = Yti + f (Yti )∆ti − g(Yti )
p
∆ti

and ∆Ŵti ∼ N (0, ∆ti ).


• Step 2: We compute the corrector Yti as

1
f (Y ti+1 ) + f (Yti ) ∆ti + Oi

Yti+1 = Yti + (5.21)
2

5. Weak Order 2.0 Milstein Scheme is given by


 
1 p 1
Yti+1 = Yti + f (Yti ) − g(Yt ) gX (Yt ) ∆ti + g(Yti ) ∆ti Z + g(Yt ) gX (Yt )∆ti Z2 (5.22)
2 2
1
f (Yti ) gX (Yti ) + f X (Yti ) g(Yti ) + ( g(Yti ))2 gXX (Yti ) (∆ti )3/2 Z

+
2 
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti )) f XX (Yti ) (∆ti )2 ,
2
2 2

where Z ∼ N (1, 0).

5.3.4 The Lamperti Transformation


One desired aspect of any approximation Yt of any SDE is their stability. A way to ensure that feature is
to remove the state Yt as factor in the diffusion coefficient g(t, Yt ). For this end, and if possible, we may
want to deploy the Lamperti transformation.

Theorem 5.1 (Lamperti transformation). Consider an SDE of the form

dXt = f ( Xt , t) dt + g( Xt ) dWt , X ( t 0 ) = X0 (5.23)

with diffusion coefficient g( Xt ) dependent only on Xt . The Lamperti transformation, given by

1
Z
Yt = F ( Xt ) = du , (5.24)
g(u) u = Xt

transforms process Xt into the process Yt , which solves the SDE


 
f (t, Xt ) 1
dYt = + gX ( Xt ) dt + dWt (5.25)
g ( Xt ) 2
f (t, F −1 (Yt )) 1
 
−1
= + g X ( F ( Yt )) dt + dWt .
g( F −1 (Yt )) 2

Notice that the previous theorem is a direct result of applying the Itô formula to F ( Xt ).
46CHAPTER 5. APPROXIMATION AND ESTIMATION OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQ

5.4 Local Linearization techniques


5.4.1 The Ozaki Method
Consider the continuous time process Xt defined by the SDE of the form, dXt = f ( Xt )dt + g( Xt )dWt
where f is twice continuously differentiable, and g is continuously differentiable. As we seen previously,
R
if we apply the Lamperti transformation, Yt = F ( Xt ) = g(σu) du , we get an SDE with constant
u = Xt
diffusion coefficient σ

dYt = b(Yt ) dt + σ dWt . (5.26)

The Ozaki method is developed by taking bY (Ys ) to be constant for an interval s ∈ [t, t + ∆t]. In this
case, for Zs = E[Ys |Yt ] the previous SDE is transformed into the ODE dZ ds = [ bY ( Zt ) s + b (Yt )], which
s

has the solution


b(Yt )  bY (Yt )∆t 
Zt+∆t = Yt + e −1 . (5.27)
bY (Yt )

Recall that we supposed bY (Ys ) to be a constant on the interval s ∈ [t, t + ∆t], given this, the SDE for Yt
becomes the autonomous linear narrow sense (ALNS) variety dYt = Kt Yt dt + σ dWt , which we know
the solution to be
Z t+∆t
Yt+∆t = Yt eKt ∆t + σ eKt (t+∆t−s) dWs . (5.28)
t

On the other hand Zt+∆t = E[Yt+∆t |Yt ] = Yt eKt ∆t , hence we compute Kt to be


 
1 b(Yt )  bY (Yt )∆t
Kt = log 1 + e −1 (5.29)
∆t Yt bY (Yt )
Notice
q that the stochastic integral of Yt is a normal distribution with mean 0 and standard deviation
2K ∆t
σ e 2Kt−1 , thus the discrete time model given by the Ozaki method approximates Yt as follow:
t

Yt+∆t = At Yt + Bt Wt+∆t (5.30)


At = eKt ∆t
s
e2Kt ∆t − 1
Bt = σ
2Kt
 
1 b(Yt )  bY (Yt )∆t
Kt = log 1 + e −1 .
∆t Yt bY (Yt )

5.4.2 The Shoji-Ozaki Method


The Shoji-Ozaki local linearization method is a generalization of the Ozaki method to a continuous-time
process Xt defined by the SDE of the form, dXt = f ( Xt , t)dt + g( Xt )dWt , where f is twice continuously
differentiable on Xt , continuous differentiable in t and g is continuously differentiable. As we seen
R
previously, if we apply the Lamperti transformation, Yt = F ( Xt ) = g(σu) du , we get an SDE with
u = Xt
constant diffusion coefficient σ

dYt = b(Yt , t) dt + σ dWt . (5.31)

Analogously to the previous method, the Shoji-Ozaki method is developed by supposing bt (Yt , t),
bY (Yt , t) and bYY (Yt , t) all to be constant in an interval t ∈ [s, s + ∆s]. Therefore if we apply the Itô
formula to b, we get

db(Yt , t) = Ls Yt + Ms t + Ns , (5.32)
5.4. LOCAL LINEARIZATION TECHNIQUES 47

where

Ls = bY (Ys , s), (5.33)


σ2
Ms = bt (Ys , s) + bYY (Ys , s),
2
σ2
 
Ns = b(Ys , s) − bY (Ys , s)Ys − bt (Ys , s) + bYY (Ys , s) s
2
= b(Ys , s) − Ls Ys − Ms s,

hence we can rewrite the SDE for the process Yt as

dYt = ( Ls Yt + Ms t + Ns ) dt + σ dWt , t ≥ s. (5.34)

If we consider the transformed process Zt = e− Ls t Yt , apply the Itô formula to it, compute the integral
formula of Zt and reincorporate the result back into Yt , we arrive at the discretized process of Yt given
by
Z s+∆s
b(Ys , s)  Ls ∆s  M h i
− 1 + 2 e Ls ∆s − 1 + Ls ∆s + σ e Ls (s+∆s−u) dWu .
s
Ys+∆s = Ys + e (5.35)
Ls Ls s

q
2Ls ∆s
The stochastic integral of Yt is a normal distribution with mean 0 and standard deviation σ e 2Ls−1 ,
thus the discrete time model, given by the Shoji-Ozaki method, which approximates Ys , at Ys ̸= 0 and
Ls ̸= 0, is given by:

Ys+∆s = A(Ys )Ys + B( Xs )Ws+∆s , with (5.36)


b(Ys , s)  Ls ∆s  Ms h Ls ∆s i
A(Ys ) = 1 + e −1 + e − 1 + L s ∆s and
Ys Ls Ys L2s
s
e2Ls ∆s − 1
B(Ys ) = σ .
2Ls

5.4.3 The Rate of Convergence of the Local Linearization Method


Theorem 5.2 (Shoji’s Theorem 1). Consider a continuous-time stochastic process Xt and its discretized
approximation X̃t given by the Shoji-Ozaki local linearization method. If we define a p-th order error one-step-
1/p
ahead prediction as Es | Xt − X̃t | p , for s ≤ t ≤ T. Then the rate of convergence of Es | Xt − X̃t | p is 2,
i.e.

Es | Xt − X̃t | p = O (t − s)2p

(5.37)

where O represents the big O notation, and Es the conditional expectation at s.

Theorem 5.3 (Shoji’s Theorem 2). Consider a fixed t ∈ [s, T ], a n-partition s = t0 ≤ t1 ≤ · · · ≤ tn = T of


[s, T ], where ∆ti = ti+1 − ti , and a step-by-step approximation of the integration
Z t
Xt − Xs = f ( Xs , s) ds + σ(Wt − Ws ), (5.38)
s

denoted by X̃t − X̃s , using the Shoji-Ozaki local linearization method. Then the convergence of the step-by-step
approximation is O(∆t) in L p , that is

E| ( Xt+∆t − Xt ) − X̃t+∆t − X̃t | p = O ((∆t) p ) .



(5.39)
48CHAPTER 5. APPROXIMATION AND ESTIMATION OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQ
Chapter 6

Estimation of Parameters of Stochastic


Differential Equations

6.1 Introduction
Consider the time-homogeneous stochastic differential equation
dXt = f ( Xt ; θ ) dt + g( Xt ; θ ) dWt , t ≥ 0 (6.1)
our objective is to estimate θ ∈ Θ where Θ is an open subset of Rn . Since, continuous-time diffusions
can only be observed at discrete time points, the transition probability density function is not explicitly
computable. Thus, in order to overcome this quandary, we will make use of the pseudo-maximum
likelihood (PML) method, which is undergirded by the well-known maximum likelihood estimation
technique.

6.2 The Maximum Likelihood Technique


If we want to estimate the value of an unknown parameter θ ∈ Rn of a given population, where the
identical independent random variables X1 , . . . , Xn have been sampled from the population with a
probability density function p( x; θ ), then the joint probability function is given by
n
L ( x1 , . . . , xn ; θ, n) = ∏ p ( x i ; θ ). (6.2)
i =1

In this equation θ is constant and xi , for i = 1, . . . , n are variables. The inverse is true for the likelihood
function of a sample
n
L (θ; x1 , . . . , xn , n) = ∏ p( xi ; θ ) (6.3)
i =1

where xi , for i = 1, . . . , n are realized samples of the random variables and θ is now the variable. For
ease of computation the logarithmic-likelihood function is sometimes used:
n
log L (θ; x1 , . . . , xn , n) = ∑ log p(xi ; θ ). (6.4)
i =1

The maximum likelihood estimate θ̂ ∈ Rn of θ, for the given xi realizations for the independent random
variables, results from a well know result in analysis, that the local maximum θ̂ of L is given when
∂ log L
= 0, (6.5)
∂θ θ =θ̂

∂2 log L
and ∂θ 2
is negative definite.
θ =θ̂

49
50 CHAPTER 6. ESTIMATION OF PARAMETERS OF STOCHASTIC DIFFERENTIAL EQUATIONS

6.3 The Markov Property, Transitional Densities, and the Likelihood Func-
tion of the Sample
Let { Xt }t∈[t0 ,T ] be a continuous Markov process on the probability space (Ω, A, {A}t∈[t0 ,T ] , P), and con-
sider a time discretization t0 ≤ t1 ≤ · · · ≤ tn = T, and let us define Xi = X (ti ), for i = 0, 1, . . . , n. Let
P( X0 , . . . , Xn |θ ) be the joint probability of observing the path, for the a given value for parameter θ. Re-
call that if the process Xt is Markov, it has short memory, that is P( Xn | X0 , . . . , Xn−1 ; θ ) = P( Xn | Xn−1 ; θ ),
therefore by the continuous application of the conditional probability definition and the Markov
property P( X0 , . . . , Xn |θ ) is simplified into,

P ( X0 , . . . , X n ; θ ) = P ( X n | X n − 1 ; θ ) · P ( X n − 1 | X n − 2 ; θ ) · · · P ( X1 | X0 ; θ ) · P ( X0 ; θ ) . (6.6)

The joint probability P( X0 , . . . , Xn ; θ ) of the stochastic process { Xt }t∈[t0 ,T ] has the probability density
function, therefore the Markov property can be expressed as p( Xn | X1 , . . . , Xn−1 ; θ ) = p( Xn | Xn−1 ; θ ).
Again, the continuous application of the conditional probability definition and the Markov property,
results in

p ( X0 , . . . , X n ; θ ) = p ( X n | X n − 1 ; θ ) · p ( X n − 1 | X n − 2 ; θ ) · · · p ( X1 | X0 ; θ ) · p ( X0 ; θ ) . (6.7)

Thus the likelihood function, for the θ variable, of a realized sample xi , for i = 1, . . . , n, is given by the
formula
n
L (θ; x1 , . . . , xn , n) = p( x0 ; θ ) ∏ p( xi+1 | xi ; θ ) (6.8)
i =1

6.4 Change of Variables


We know from analysis that given two open sets U, V ⊂ Rn , a differentiable function a : U → R,
and a diffeomorphism f : V → U between open sets U, V, i.e. a change of variables f (y1 , . . . , yn ) =
( x1 (y1 , . . . , yn ), . . . , xn (y1 , . . . , yn )), we have
Z Z
a( x1 , . . . , xn ) dx1 . . . dxn = a ◦ f (y1 , . . . , yn )| det d f | dy1 . . . dyn (6.9)
Z  
∂x
= a ◦ f (y1 , . . . , yn ) det dy1 . . . dyn .
∂y

Thus, the change of variables for a probability density function p X ( x1 , . . . , xn ) of X for a probabil-
ity density function pY (y1 , . . . , yn ) of Y, where Y is defined as X = f (Y ), where f (y1 , . . . , yn ) =
( x1 (y1 , . . . , yn ), . . . , xn (y1 , . . . , yn )) is an homeomorphism, is simply a generalization of the previous
property. In fact, for every measurable A set we have
Z
E[1 A ( X )] = p X ( x1 , . . . , xn ) dx1 . . . dxn (6.10)
X ( A)
Z  
∂x
= p X ◦ f (y1 , . . . , yn ) det dy1 . . . dyn .
Y ( A) ∂y

Therefore, the probability density function pY is given by


 
∂x
pY (y1 , . . . , yn ) = p X ◦ f (y1 , . . . , yn ) det . (6.11)
∂y

6.5 The Transition Probability Density Function is Known


To have a better perspective of the PML method let us start by tackling a case where equation 6.1 is
solvable, hence the PML is not needed
6.6. THE TRANSITION PROBABILITY DENSITY FUNCTION IS UNKNOWN 51

Example:
Consider the SDE
dXt = µXt dt + σXt dWt , t ≥ 0
and n realized samples for Xt , at increasing time points ti , for i = 1, . . . , n. By the Itô formula we have:

X (ti ) = X (ti−1 )e(µ− 2 σ )∆ti +σ ∆ti Zi
1 2

where Zi ∼ N (0, 1) are  independent and ∆ti = ti − ti−1 . If we denote xi = log( X (ti ) − X (ti−1 )),
then xi ∼ N µ − 2 σ ∆ti , σ ∆ti . By the change of variables equation (6.11), we conclude that the
1 2


probability density function of X (ti ) − X( ti−1 ) is given by

[ xi − (µ − (1/2)σ2 )∆ti ]2
 
1
p( xi , ∆ti ; µ, σ) = √ exp −
σ( X (ti ) − X( ti−1 )) 2π∆ti 2σ2 ∆ti
Thus, the log-likelihood function for this probability density function is
n
n n
log L = − ∑ log( X (ti ) − X (ti−1 )) − log(2π ) − log(σ2 ∆ti )
i =1
2 2
n
1

2σ ∆ti
2 ∑ [xi − (µ − (1/2)σ2 )∆ti ]2 .
i =1
∂ log L ∂ log L ∑in=1 xi
= µ̂ − 12 σ̂2 ∆ti and

If we compute ∂µ = 0, and ∂σ = 0, we arrive at the expressions m̂ = n
∑in=1 ( xi −m̂)2
ν̂ = n = σ̂2 ti which in turn yield the estimates for µ and σ:
s
m̂ + (1/2)ν ν̂
µ̂ = , and σ̂ = .■ (6.12)
∆ti ∆ti

6.6 The Transition Probability Density Function is Unknown


When the probability density function p is unknown but some of the conditional moments of the
diffusion process are known, then under certain restrictions it might be expedient to estimate θ from a
pseudo-transition density function h, which while different from p has consonant moments. Therefore,
the method for obtaining a pseudo-likelihood function, for a Markov process { Xt }t≥0 where n + 1
historical observations of X (t) at non stochastic times t0 < t1 < · · · < tn , where we denote Xi = X (ti ),
is analogous to the ML case, resulting in
n
L(θ, X1 , . . . , Xn , ∆ti , n) = h( X0 ; θ ) ∏ h(∆ti , Xi−1 , Xi ; θ ) (6.13)
i =1

or if X0 is independent from θ
n
L(θ, X1 , . . . , Xn , ∆ti , n) = ∏ h(∆ti , Xi−1 , Xi ; θ ). (6.14)
i =1

Thus, the PML estimate of θ is the value that maximizes the logarithm of the pseudo-likelihood function,
i.e.
n
θ̂ = arg max ∑ log h(∆ti , Xi−1 , Xi ; θ ). (6.15)
θ i =1

In the following, we shall observe how to estimate the PML for the SDE (6.1) using 3 three different
iterations schemes: the Euler-Maruyama scheme, the Ozaki linearization scheme, and the Shoji-Ozaki
linearization scheme. Recall, the every SDE of the form (6.1) can be transformed to an SDE with constant
diffusion term, using the Lamperti transformation, hence we only need to consider SDE’s of the form
dXt = f ( Xt ; θ ) dt + σ dWt , t ≥ 0, X (0) = X0 . (6.16)
52 CHAPTER 6. ESTIMATION OF PARAMETERS OF STOCHASTIC DIFFERENTIAL EQUATIONS

6.6.1 Euler-Maruyama scheme


The Euler-Maruyama scheme for SDE (6.16), is given by the expression,

Xt+∆t = Xt + f ( Xt ; θ )∆t + σ∆Wt . (6.17)

Since, Xt+∆ti − Xt are independent Normal random variables their pseudo-transition density function
are thus
( Xi − Xi −1 − f ( Xi −1 ;θ )∆ti )2
1 −
h(∆ti , Xi−1 , Xi ; θ, σ) = p e 2σ2 ∆ti
,
2πσ2 ∆ti

where Xi = Xti . Therefore, its pseudo log-likelihood function is given by


n
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) −
log(2πσ2 ∆ti ) (6.18)
2
1 n ( X − Xi−1 − f ( Xi−1 ; θ )∆ti )2
− ∑ i .
2 i =1 σ2 ∆ti

Example:
Consider the Ornstein-Uhlenbeck process given by the SDE

dXt = α(µ − Xt ) dt + σ dWt .

By the previous results, the pseudo log-likelihood function associated with the EM-scheme is
n
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) −
log(2πσ2 ∆ti )
2
1 n ( X − Xi−1 − α(µ − Xi−1 )∆ti )2
− ∑ i
2 i =1 σ2 ∆ti

The approximation afforded by the EM scheme is considered good if:


1. if ∆t is very small;
2. if the polynomial growth condition holds, i.e. there exist L, m > 0, such that | f ( X; θ )| ≤
L (1 + | X |m ), for any θ ∈ Θ; and
3. if n(∆tn )3 → 0 (this condition ensures the PML estimator from EM-scheme is consistent and
asymptotically efficient). ■

6.6.2 Ozaki Linearization Method


The Ozaki Linearization scheme for SDE (6.16), is given by the expression

Xt+∆ti = At Xt + Bt Wt+∆t , (6.19)

where
 12
e2Kt ∆t − 1

At = eKt ∆t , Bt = σ , (6.20)
2Kt

and
 
1 f ( Xt )  f X (Xt )∆t
Kt = log 1 + e −1 . (6.21)
∆t Xt f X ( Xt )
6.6. THE TRANSITION PROBABILITY DENSITY FUNCTION IS UNKNOWN 53

Similarly to the EM scheme case, Xt+∆ti − Xt are independent Normal random variables with expected
value, and variance given by

e2Kti ∆ti − 1
!
Kti ∆ti
Ei = E( Xti ) = e Xti , Vi = V( Xti ) = σ 2
, (6.22)
2Kt

respectively. Hence, their pseudo-transition density function are


2
1 (X −E )
− i2V i−1
h(∆ti , Xi−1 , Xi ; θ, σ) = √ e i −1 (6.23)
2πVi−1
where Wi = Wti and Xi = Xti . Therefore, its pseudo log-likelihood function is

n 1 n ( Xi − Ei−1 )2
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) − log(2πVi−1 ) − ∑ (6.24)
2 2 i =1 σ2 ∆ti

Example:
Consider the SDE, dXt = θXt3 dt + σ dWt , by the previous results, the pseudo log-likelihood function
resulting from the Ozaki Linearization scheme, is completely defined by
 
1 1  3θX2 ∆ti
Ki = log 1 + e i −1
∆ti 3
because, in turn, Ki defines the expected value and variance as
h  i2 
1 + 1
e 3θXi2 ∆ti − 1 − 1
Xi 3θX2 ∆ti
  3
Ei = eKti ∆ti Xti = Xi + e i − 1 , Vi = σ2  . ■
 h  i 
3 2
1+ 1
e 3θXi2 ∆ti
−1
∆ti 3

6.6.3 Shoji-Ozaki Linearization Method


The Shoji-Ozaki Linearization scheme for SDE (6.16), is given by the expression

Xt+∆t = A( Xt ) Xt + B( Xt )Wt+∆t , (6.25)

where
f ( Xt )  Lt ∆t  Mt h Lt ∆t  i
At = 1 + e −1 + e − 1 − L t ∆t , (6.26)
Xt L t Xt L2t
 2Lt ∆t 1
e −1 2 σ2
Bt = σ , Lt = f X ( Xt ), and Mt = f XX ( Xt ).
2Lt 2
Similarly to the previous two schemes, Xt+∆ti − Xt are independent Normal random variables with
expected value, and variance given by

e2Lti ∆ti − 1
!
Ei = E( Xti ) = A( Xti ) Xti , Vi = V( Xti ) = σ 2
(6.27)
2Lti

respectively. Hence, their pseudo-transition density function are


2
1 (X −E )
− i2V i−1
h(∆ti , Xi−1 , Xi ; θ, σ) = √ e i −1 (6.28)
2πVi−1
where Wi = Wti and Xi = Xti . Therefore, its pseudo log-likelihood function is

n 1 n ( Xi − Ei−1 )2
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) − log(2πVi−1 ) − ∑ (6.29)
2 2 i =1 σ2 ∆ti
54 CHAPTER 6. ESTIMATION OF PARAMETERS OF STOCHASTIC DIFFERENTIAL EQUATIONS

Example:
Consider the SDE, dXt = θXt3 dt + σ dWt , by the previous results, the pseudo log-likelihood function
resulting from the Shoji-Ozaki Linearization scheme, is completely defined by

Li = 3θXi2 , Mi = σ2 (3θXi )

because, in turn, Li and Mi define the expected value and variance as


!
e6θXi ∆ti − 1
2
Xi  3θX2 ∆ti  σ2  3θX2 ∆ti 
Ei = Xi + e i −1 + e i − 1 − 3θX 2
i ∆t i , Vi = σ2 .■
3 3θXt3 6θXi2
References

Braumann, Carlos A. 2019. Introduction to Stochastic Differential Equations with Applications to Modelling
in Biology and Finance. John Wiley & Sons.
Braun, W John, and Duncan J Murdoch. 2021. A First Course in Statistical Programming with r. Cambridge
University Press.
Capiński, Marek, and Peter E Kopp. 2004. Measure, Integral and Probability. Vol. 14. Springer.
Øksendal, Bernt, and Bernt Øksendal. 2003. Stochastic Differential Equations. Springer.
Panik, Michael J. 2017. “Stochastic Differential Equations: An Introduction with Applications in
Population Dynamics Modelling.”
Pires, Gabriel E. 2012. Cálculo Diferencial e Integral Em Rn. Vol. 45. IST Press.

55

You might also like