Stochastic Differential Equations Overview
Stochastic Differential Equations Overview
Preface
1 Computational Simulation 1
1.1 Monte Carlo simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Simulation of random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Uniform random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Bernoulli and Binomial random variables . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Poisson random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.4 Exponential random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.5 Normal random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.6 Other built-in random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Multivariate random number generation . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Markov chain simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Advanced simulation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Mathematical Foundations 11
2.1 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Path Regularity of Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Itô’s integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 Itô formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.8 Martingale Representation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9 Multidimensional Itô Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
References 55
Preface
This notes must be used as a secondary bibliography to clarify punctual doubts and/ or as a compiler
of the fundamental concepts regarding Stochastic Differential Equations. Please do not use this as a
reference book.
All errors found are the sole responsibility of the authors and must be communicated. Furthermore,
comments and suggestions are welcome.
CONTENTS
Chapter 1
Computational Simulation
The present chapter is related with the simulation of random numbers using computational tools,
specially important in mathematics when treating the estimation of unknown parameters, for example.
The structure of the chapter is based on a summarisation of fundamental concepts, followed by the
presentation of explained examples.
∑in=1 Xi − nµ P
Zn = √ −−−−→ N (0, 1). (1.1)
nσ
In other words, when n → ∞, the distribution function of Zn follows asymptotically a standard normal
distribution, N (0, 1).
Example: Consider an experience consisting in removing one ball from a basket with balls numbered
from 1 to 4. What is the expected number of the removed ball?
Let X be the random variable defining the number of the ball removed. Since we have 4 balls, the
probability function has value 41 = 0.25.
The real Expected Value is:
E_X <- 1*0.25 + 2*0.25 + 3*0.25 + 4*0.25
print(E_X)
## [1] 2.5
Using the Monte Carlo Method, we have the following simulated expected value:
1
2 CHAPTER 1. COMPUTATIONAL SIMULATION
## [1] 2.8
sample_2 <- round(runif(20000, min=1, max=4)) # Larger sample
#print(sample_2)
MC_E_X_2 <- sum(sample_2) / length(sample_2)
print(MC_E_X_2) # CLoser to the real parameter
## [1] 2.49415
Monte Carlo method may be applied to other parameters, proceeding supported by a large simulated
sample.
x n ≡ b x n −1 mod m
xn
un = ,
m
for any n ∈ {1, 2, ..., m}.
Example:
Let m = 7, b = 3, and x0 = 2, we have,
x1 ≡ 3 × 2 ≡ 6 mod 7, u1 = 0.857
x2 ≡ 3 × 6 ≡ 4 mod 7, u2 = 0.571
x3 ≡ 3 × 4 ≡ 5 mod 7, u3 = 0.714
x4 ≡ 3 × 5 ≡ 1 mod 7, u4 = 0.143
x5 ≡ 3 × 1 ≡ 3 mod 7, u5 = 0.429
x6 ≡ 3 × 3 ≡ 2 mod 7, u6 = 0.286. ■
Notice, that parameters b and m require careful selection, in order to ensure that the cycle length is m, be-
cause if b, m, and x0 are not coprime, i.e. gcd(b, m), gcd(m, x0 ), gcd(b, x0 ) ̸= 1, then the aforementioned
algorithm will not complete a cycle.
1.2. SIMULATION OF RANDOM VARIABLES 3
Example:
Let us consider m = 14, b = 6, and x0 = 5, where gcd(m, b) = 2, we have
x1 ≡ 6 × 5 ≡ 2 mod 14
x2 ≡ 6 × 2 ≡ 12 mod 14
x3 ≡ 6 × 12 ≡ 2 mod 14
x1 ≡ 4 × 6 ≡ 3 mod 7
x2 ≡ 4 × 3 ≡ 5 mod 7
x3 ≡ 4 × 5 ≡ 6 mod 7
P( X = 0) = 1 − p, P( X = 1) = p, (1.2)
Example: Considering a basket with balls numbered from 1 to 4, let the success be the observation of
ball number “2”. How many times do we observe ball number “2”?
ball <- runif(10, min=0, max=1)
correct_ball <- (ball<=0.25) #Success if correct_ball <= 0.25
table(correct_ball) # TRUE represent the number of "2" observed
## correct_ball
## FALSE TRUE
## 7 3
Another is using R built-in functions:
Generator: rbinom(n, size, prob); Returns n random variables of a Binomial distribution with parameter
m and p.
Probability function: dbinom(x, size, prob); Returns P( X = x ) when size represents the number of
trials and prob represents the probability of a success.
Distribution function: pbinom(x, size, prob); Returns P( X ≤ x ) when size represents the number of
trials and prob represents the probability of a success.
Quantile function: qbinom(x, size, prob); Returns a such that P( X ≤ a) = x, when size represents the
number of trials and prob represents the probability of a success.
Example:
rbinom(5, 10, 0.25) # 5 samples of a binomial distribution
## [1] 0 2 2 3 2
dbinom(5, 10, 0.25) # P(X = 5)
## [1] 0.0583992
pbinom(5, 10, 0.25) # P(X =< 5)
## [1] 0.9802723
qbinom(0.5, 10, 0.25) # Median
## [1] 2
e−λ λ x
P( X = x ) = , for any x ∈ N, (1.4)
x!
Functions
Generator: rpois(n, lambda); Returns n random variables of a Poisson random variable with rate λ.
Probability function: dpois(x, lambda); Returns P( X = x ) where λ represents the rate.
Distribution function: ppois(x, lambda); Returns P( X ≤ x ) where λ represents the rate.
Quantile function: qpois(x, lambda); Returns a such that P( X ≤ a) = x, where λ represent the rate.
1.2. SIMULATION OF RANDOM VARIABLES 5
Example:
rpois(10, 1.5)
## [1] 1 0 1 3 3 2 4 2 0 2
dpois(2, 1.5)
## [1] 0.2510214
ppois(2, 1.5)
## [1] 0.8088468
qpois(0.5, 1.5)
## [1] 1
A homogeneous Poisson random variable can be generated considering λ as the rate of a poisson for
one single period and T as the number of periods. The next algorithm generates the number of events
in each period:
1. Generate the number of events, N, in the time interval T, using rpois(1, λ*T);
2. Generate the moment when the events occur in the time interval [0,T], using runif(N, max=T).
Example
[Link](123456789) # fix the results
T <- 5 # Number of periods
N_events <- rpois(1, 1.5*T) # lambda=1.5
N_events # number of events in 5 periods
## [1] 9
Instant <- sort(runif(N_events, max=5)) # "Sort" to be crescent in time
Instant# return the moments when events occur!
for (i in 1:T){
Poisson_vector[i] <- sum(Instant<i & Instant>i-1)
}
print(Poisson_vector) # Poisson random variable
## [1] 0 2 1 3 3
Poisson random variables disposed on a line can be simulated using the generator of a exponential
distribution, representing the time to perform a task or the duration until failure, which will be summed
in a cumulative way in order to obtain the instants where the studied event occurs.
Example:
t <- rexp(5, rate=2)
cumsum(t)
Functions
Probability density function: dexp(t, lambda); Returns P( T = t) where λ represents the rate, f (t) =
λe−λt
Distribution function: pexp(t, lambda); Returns P( T ≤ t) where λ represents the rate, F (t) = 1 − e−λt
Quantile function: qexp(x, lambda); Returns a such that P( T ≤ a) = t, where λ represent the rate.
log(1 − U )
F ( T ) = U ⇔ T = F −1 (U ) ⇔ T = − , (1.7)
λ
then generating values for a uniform distribution U ([0, 1]) and finally imputing those values on the
expression for F −1 .
Functions
Generator: rnorm(n, mean=0, sd=1); Returns n random variables normally distributed with mean 0
and variance 1.
Probability density function: dnorm(n, mean=0, sd=1); Returns P( X = x ) where X is a random variable
normally distributed with mean 0 and variance 1.
Quantile function: qnorm(n, mean=0, sd=1); Returns a such that P( X ≤ a) = x, where X is a random
variable normally distributed with mean 0 and variance 1.
One possible technique is sequential conditional generation, where random variables are simulated
conditioned by the previous. Considering X = ( X1 , X2 , · · · , Xn ), and knowing the distributions of each
random variable, the first step is to simulate X1 , then simulate X2 conditioned on X1 , X3 conditioned on
X1 and X2 and so on until Xn , which is conditioned by X1 , · · · , Xn−1 . This method may not be feasible
when distributions are not known.
Even though R has some multivariate distributions built in, in general it is necessary to construct our
own distribution.
When there are 1, · · · , n states, the transition probabilities are represented by a n × n matrix P, where
i represent the current state, j a possible future state and P( Xn+1 = j | Xn = i ) = Pi,j represents the
probability of moving from state i to state j.
Two things to note is that the sum of the rows of a transition probability matrix adds to one,
i.e. ∑nj=1 Pi,j = ∑nj=1 P( Xn+1 = j | Xn = i ) = 1, and since time increments are independent, the matrix
will be invariant over time, hence the transition matrix remains unchanged.
An important property of a Markov chain, is that they possess an invariant distribution, that is a
distribution πi = [ P( X = i )]i , such that,
π = πP.
Notice that π is simply an eigenvector of P T , such that, ∑in=1 πi = 1. An even more interesting property
is that for some transition probability matrices P, Xn will converge to the invariant distribution π.
Example: Let’s consider a two state case: weather is sunny (1) or rainy (2). The transition probability
matrix associated to this trial is:
MC_weather <- matrix( c(0.9, 0.1, 0.5, 0.5), nrow=2, ncol = 2, byrow = TRUE)
MC_weather
## [,1] [,2]
## [1,] 0.9 0.1
## [2,] 0.5 0.5
We can extract from the matrix that the probability of having a sunny day tomorrow if today was
sunny is 90%. Oppositely, the probability of raining after a sunny day is 10%. If today was raining, the
probability of having a rainny or sunny day tomorrow is equal to 50%.
The previously explained strategy uses the law of large numbers to support the use of the sample mean
and is based on the formula of the expected value to obtain its final expression:
Z b Z b
1
E( g(Ui )) = g(Ui ) du ⇔ g(Ui ) du = (b − a) E( g(Ui ))
a b−a a
1
Note that we are using uniform random variables, so the density distribution is equal to b− a .
R3
Example: Considering the integral 0
x2 dx, with value 9. Computing this integral with Monte Carlo
we have:
[Link](1234567656) # Fixing results
aux <- runif(100, min=0, max=3)
## [1] 9.918777
1.6. ADVANCED SIMULATION METHODS 9
## [1] 9.013217
As one can see, the approximation of the integral through Monte Carlo provides a better result as the
number of simulated points increase.
Rb Rb
When dealing with a multiple integration on m variables, a11 · · · amm g(u1 , · · · , um ) du1 · · · dum , the
j
strategy is similar but with the adjustment of simulating as many uniform random variables, Ui ∼
U [ a j , b j ] for any i ∈ {1, · · · , n} and j ∈ {1, · · · , m}, as the number of variables in the integral and the
length of the integral is substituted by the product of the length of the integrals.
Z b1 Z bm
du1 dum
E( g(Ui1 , · · · , Uim )) = ··· g ( u1 , · · · , u m ) ··· (1.8)
a1 am b1 − a1 bm − a m
Z b1 Z bm m
⇔
a1
···
am
g(u1 , · · · , um ) du1 · · · dum = ∏(bj − a j ) E( g(Ui1 , · · · , Uim )) (1.9)
j =1
R 10 R 7
Example: Considering the integral 3 1 sin( x − y) dx dy, with approximated value 0.1185. Computing
this integral with Monte Carlo we have:
x_aux <- runif(10000000, min=1, max=7)
y_aux <- runif(10000000, min=3, max=10)
points_2 <- sin(x_aux-y_aux)
integral_approx_2 <- mean(points_2)*(10-3)*(7-1)
integral_approx_2
## [1] 0.1129121
The Uniform distribution is not the unique distribution that can be generated to compute an integral.
g( X )
If, for instance, we consider a random variable X that has a distribution function f then E f (X ) =
R g( X ) R R
f (X)
f ( x ) dx = g( x ) dx. Hence we can compute g( x ) dx by generating a very large number of
g( X )
values for X, inputting them in f (X)
and calculating the average of the resulting values.
Mathematical Foundations
This chapter covers the fundamental contents required to the topic of Stochastic Differential Equations,
which is a subject that requires a broad knowledge and many times is necessary to review or address
any deficiency in background. In this sense, our intention is to reproduce a succinct yet complete
summary comprehending the most important topics.
The process { Xt } is a version or modification of {Yt }, and vice versa, and their finite-dimensional
distributions coincide.
The finite-dimensional distribution of a stochastic process { Xt }t∈T defined on a probability space
(Ω, A, P ) is the joint distribution of the random vector ( X (t1 ), · · · , X (tn )), given by
with t ∈ T, xi ∈ R, n ≥ 1.
The family of distributions that define a stochastic process are important to comprehend its behavior and
to determine the P(( X (t1 ), · · · , X (tn )) ∈ A, with n ∈ N+ , (t1 , · · · , tn ) ∈ T n and A ⊆ Sn a measurable
set.
Sometimes it is convenient to construct a stochastic process with a specific finite-dimensional dis-
tribution. When this construction is not readily obtained from a sequence of random variables
{ Xi , i = 1, · · · , n}, one might verify the Kolmogorov Existence Theorem.
11
12 CHAPTER 2. MATHEMATICAL FOUNDATIONS
2. For m < n and arbitrary tm+1 , · · · , tn ∈ T, Ft1 ,··· ,tm ,tm+1 ,··· ,tn ( x1 , · · · , xm , ∞, · · · , ∞) =
Ft1 ,··· ,tm ( x1 , · · · , xm ).
Then, there exists a probability space (Ω, A, P ) and a stochastic process { Xt }t∈T defined on a that space such
that P is the collection of finite-dimensional distributions of { Xt }t∈T .
A stochastic process { Xt }t∈T defined on a probability space (Ω, A, P ) has the following characteristics:
1. Continuous, if all its paths are continuous, i.e., for almost all w ∈ Ω the mapping t ∈ T → Xt (w)
is continuous;
2. Integrable, if Xt is an integrable random variable ∀t ≥ 0;
3. Measurable, if the mapping (t, w) → Xt (w) is measurable with respect to the σ-algebra B( T ) × A,
with B( T ) the family of all Borel subsets of T;
4. Square-integrable, if E(| Xt |2 ) < ∞, ∀t ≥ 0;
5. Increasing, if for any s < t, Xs ≤ Xt ;
6. (Strictly) Stationary, if the finite-dimensional distributions are invariant under time displace-
ments, i.e., for ti , ti+h ∈ [to , n] and h ∈ R, Ft1+h ,··· ,tn+h ( x1 , · · · , xn ) = Ft1 ,··· ,tn ( x1 , · · · , xn );
7. Gaussian, if finite-dimensional distributions are normally distributed;
8. Continuous in probability, if P(| Xt − Xs |> ε) → 0, with any s ∈ T, s → t and ε > 0;
9. Quadratic variation process, if Xt (w) : Ω → R is continuous and the second is defined as
Given a stochastic process { Xt }t∈T , one can construct its natural filtration denoted as FtX = {AtX }t∈T ,
which consists in a family of σ-algebras generated by the values of process up to time t, generating the
smallest filtration to which { Xt }t∈T is adapted.
Filtrations are of special importance because they model the information produced by a stochastic
process over time. Thus, one can check if a given event has occurred up to the time under analysis, that
is, if A ∈ σ( Xs , s ≤ t) with σ( Xs , s ≤ t) the natural filtration of { Xt }t∈T .
A probability space (Ω, A, P ) is complete with respect to a measure µ if all sets of measure zero belong
to the σ-algebra A.
A filtration J = {At }t∈T is complete if the probability space (Ω, A, P ) is complete and if the collection
of all subsets with zero probability, A0 , contains all the sets with null probability- A ∈ A, P( A) = 0
then A ∈ A0 .
A filtration J = {At }t∈T is right-continuous if At = At+ , with At+ = ∩{ As , s ∈ T, s > t} ∀ t ∈ T.
A continuous stochastic process adapted to a filtration is also progressively measurable with respect to
that filtration.
2.2 Martingales
Let (Ω, A, P ) be a probability space and J = {At }t∈T a filtration on that probability space. A
martingale defined on a filtered probability space (Ω, A, J , P ) is a sequence of random variables
{ Xt }t∈T on the probability space that are adapted to the filtration. Hence, if { Xt }t∈T satisfies
1. Xt is At -measurable (Xt ’s are adapted to the filtration)
2. E(| Xt |) < ∞ (Xt ’s are integrable)
3. E( Xt | As ) = Xs , ∀ 1 ≤ s < t < ∞ (martingale property)
is discrete-time martingale.
The martingale property states that the expected value of Xt+1 given the information available up to
time t is Xt . The best next forecast for the fortune of a gambler on a fair game after tth trials is the
current fortune at time t, for example. If the “=” sign in the martingale property is substituted by a “≤”
(“≥”), the process { Xt }t∈T is called supermartingale (submartingale).
One interest thing that worth mentioning regarding martingales is that one can construct new martin-
gales based on an old one. The general approach is to consider a new process { X̃t , 0 ≤ t < ∞} defined
as X̃0 = X0 and X̃t = X0 + A1 ( X1 − X0 ) + A2 ( X2 − X1 ) + · · · + An ( Xt − Xt−1 ), t ≥ 1. The process
X̃t is the martingale transform of { Xt }t∈T by { At }t∈T . The following theorem states the conditions
appropriately.
Theorem 2.2 (Martingale Transform Theorem). Let { Xt }t∈T be a martingale adapted to the filtration
J = {At }t∈T with { At , 1 ≤ t < ∞} a sequence of bounded random variables that are nonantecipating with
respect to J . Then, the sequence of martingale transforms { X̃t }t∈T is itself a martingale adapted to J .
A stopping time for the filtration J = {At }t∈T is a random variable τ : Ω → T = N ∪ {+∞} where
{w | τ (w) ≤ t} ∈ At , ∀ 0 ≤ t < ∞, that is, a stopping time constitutes a rule to stop the process if the
criterion τ was verified based on the information about the filtration J . Stopping times are related with
martingales by the noticeable fact that a stopped process is a martingale adapted to J if the original
process was also a martingale adapted to J .
The study of the behavior of a process is interesting to check if this process converges to a certain
threshold as time evolves. The following theorems cover these theoretical aspects.
supk∈K E(| Xk |) < +∞, then there exists an integrable random variable Z such that limt→+∞ Xt = Z a.s.
14 CHAPTER 2. MATHEMATICAL FOUNDATIONS
Convergence in L1 can be proved using uniform integrability, verifying if for a sequence of random
variables { Xt }∞
t=1 , either 1 or 2 verifies.
R
1. limm→∞ (supt E(| Xt |; | Xt | > m) = limm→∞ (supt {|Xt |>m} | Xt |dP) = 0.
2. { Xt }∞ 1 ( Ω, A, P ) and for all ε > 0 there exists a δ > 0 such that, for an event
t=1 is bounded on L R
A ∈ A, P( A) < δ implies A | Xt | dP < δ ∀ t.
Theorem 2.5. If (Ω, A0 , P), X ∈ L1 and At are arbitrary σ-algebras with A0 , then the r.v. E( X | A∞ ) are
uniformly integrable.
Theorem 2.6 (Dominated Convergence Theorem for Conditional Expectations). Suppose { Xt }∞ t=1 is a
martingale on a probability space (Ω, A, P). Let Xt → X a.s. and | Xt | ≤ Z ∀t, with E( Z ) < +∞. If At ↑ A∞ ,
then E( Xt | At ) → E( X | A∞ ) a.s.
Theorem 2.7 (L2 -bounded Martingale Convergence Theorem). Let { Xt }t≥0 be a L2 -bounded martingale.
Then there exists a r.v. X with E(| Xt |2 ) < B, B ∈ R, such that P(limt→∞ Xt = X ) = 1 and limt→∞ ∥( Xt −
X ∥2 = 0.
Let´s now extend the previous concepts to the case of continuous-time martingales. Through the use of
the correct conversion, it’s possible to generate that results from discrete-time martingales.
Given a stochastic process { Xt }t≥0 adapted to the filtration {At }t≥0 , defined on the probability space
(Ω, A, P), it is a continuous-time martingale if E( Xt | As ) = Xs ∀ 0 ≤ s ≤ t.
Considering { Xt }t≥0 a continuous-time martingale adapted to the filtration {At }t≥0 and { Xtn }∞ n =1
uniformly integrable, with {tn }∞
n=1 a strictly increasing sequence of real numbers verifying tn → ∞ as
n → ∞. Hence, by theorem 2.4, exists a r.v. Z such that Xtn → Z a.s. as n → ∞ and E( Z | Atn ) = Xtn .
Furthermore, E( Z | At ) = E( E( Z | Atn ) | At ) = E( Xtn | At ) = Xt , ∀ tn > t > 0.
The random variable τ : Ω → R ∪ {+∞} is a stopping time adpated to the filtration {At }t≥0 since
{w ∈ Ω | τ (w) ≤ t} ∈ At , ∀t ≥ 0.
Theorem 2.8 (Doob’s Continuous-Time Stopping Theorem). Let { Xt }t≥0 be a continuous martingale
adapted to the filtration {At }t≥0 that is right-continuous and A0 contains all P-null sets. If τ is a stopping time
for J , then the process {Yt }t≥0 = { Xmin(τ,t) }t≥0 is also a continuous martingale adapted to J .
The convergence aspect of a continuous time process requires the definition of a Càdlàg process (from
the french continue à droite, limite à gauche), i.e., a process that is right continuous, if for w ∈ Ω the
function Xt (w) is right continuous, with left limits, if for w ∈ Ω the lims↑t Xs (w) exists and is finite for
t ≥ 0.
Theorem 2.9 (Doob’s Convergence Theorem). Let J = {At }t≥0 be a filtration of the probability space
(Ω, A, P) and let { Xt }t≥0 be a martingale with respect to the filtration J whose paths are right-continuous and
left limit. Then, the following are equivalent:
1. { Xt }t≥0 converges in L1 when t → ∞;
2. As t → ∞, { Xt }t≥0 converges a.s. to an integrable and At -measurable r.v. Z that satifies Xt = E( Z |
At ), t ≥ 0;
3. { Xt }t≥0 is uniformly integrable.
2.3. PATH REGULARITY OF STOCHASTIC PROCESSES 15
Theorem 2.10 (Kolmogorov Continuity Theorem). For α, ε, c > 0, if the stochastic process { Xt }t∈[0,1]
defined on the probability space (Ω, A, P) satisfies E(| Xt − Xs |α ) ≤ c|t − s|1+ε for s, t ∈ [0, 1] then there exists
a modification of { Xt }t∈[0,1] that is a continuous process and whose paths are γ-Hölder for every γ ∈ [0, γα ].
Regarding martingales, their regular versions depend on the regularity properties of the associated
filtration, thus some conditions on the filtration are mandatory. Thereon, the following theorem
guarantees the existence of modified martingales with regular paths.
Theorem 2.11 (Doob’s Regularization Theorem). Let (Ω, A, {At }t≥0 , P) be a filtered probability space that
satisfies:
1. For A ∈ A with P( A) = 0, then every subset of A is in A0 ;
2. the filtration {At }t≥0 is right-continuous (At ∩ At+ε ∀ t ≥ 0, ε > 0).
Let { Xt }t≥0 be a supermartingale with respect to the filtration J = {At }t≥0 . Assuming the function t 7→ E( Xt )
is right-continuous, then there exists a modified process { X̃t }t≥0 of { Xt }t≥0 with the following properties:
1. { X̃t }t≥0 is adapted to the filtration J ;
2. The paths of { X̃t }t≥0 are locally bounded, right-continuous and left limited;
3. { X̃t }t≥0 is a supermartingale with respect to the filtration J .
From the Markov property is possible to conclude that, given the present value, the future value of
a Markov process is independent of the past values, which is the reason why Markov processes are
classified as “memoryless”. Moreover, from the definition is possible to conclude that any process with
independent increments, either discrete or continuous, is a Markov process.
A remark to the fact that 2.2 is defined for a continuous state space, being the process denominated as a
Markov Process. However, it is possible to adjust the definition for the case of a discrete state space,
obtaining a process called Markov Chain.
The transition probability of a process is denoted by P(t, B|s, x ) = P( Xt ∈ B| Xs = x ) and represents
the probability of a process that is at position x at time s to move to a state in B at time t. Transition
probabilities of a Markov process has the following properties:
1. P(t, ·|s, x ) is a probability measure on the family of Borel sets B for every 0 ≤ s < t < ∞ and
x ∈ R;
2. P(t, B|s, ·) is Borel measurable for every 0 ≤ s < t < ∞ and B ∈ B ;
R
3. P(t, B|s, x ) = R P(t, B|r, y) P(r, dy|s, x ) for s < r < t, x ∈ R and B ∈ B (Chapman-Kolmogorov
equation).
16 CHAPTER 2. MATHEMATICAL FOUNDATIONS
To understand property 3, let´s divide the transition from x to a state in B in two steps and consider an
intermediate point at moment r. More steps are allowed if it is intended to, but when aiming to some
intuition two steps are enough.
The process moves firstly from x to y (at moment r), and then from y to a state in B. The Markov
property says that past information should be disregarded and considered only the present value of the
process, so the useful information at time r is that the process is in state y.
The absence of memory that characterize this process impose that transitions of moments s → r and
r → t are independent and the probability of moving from x to a state in B is the product of the
probabilities of the intermediate movements.
Since y can assume any value, through the total probability theorem, this transition probability from
x to a state in B verified between moments s and t should consider all possible trajectories for each
different y, which is the reason for the presence of the integral.
The transition probabilities are stationary if
is satisfied. That is, the initial and final moments are no longer relevant but, instead, the critical point is
the duration of the interval, along with states x and B. A Markov process is termed homogeneous (in
time) when their transition probabilities are stationary.
A homogeneous Markov process satisfies the strong Markov property if, for any τ ≥ 0 and B a Borel
set,
P ( X t + τ ∈ B | A t ) = P ( X τ ∈ B | X0 ) (2.4)
holds for every Markov instants of time. An equivalent way of defining the strong Markov property
presented in 2.4 is through the verification of
E [ h ( X t + τ ) | A t ] = E [ h ( X τ ) | X0 ] , (2.5)
for all Borel-measurable bounded functions h. If the strong Markov property is verified, a homogeneous
Markov process is said to be a strong Markov process. Please note that a strong Markov process is also
Markov process, but a Markov process is not always a strong Markov process.
Standard Brownian motion may be used to represent the cumulative effect of noise. The difference
between the value of a standard Brownian motion at instants t and s, Wt − Ws ∀ 0 ≤ s < t, reflects the
noise on the interval [s, t].
The following characteristics of standard Brownian motion worth mentioning:
1. The increments are stationary, i.e., the distribution of Wt − Ws , 0 ≤ s < t, only depends on t − s;
2. It is a Quadratic Variation process with quadratic variation equal to t: E(Wt ) = 0 and Var (Wt ) =
t, 0 ≤ s < t;
3. E(Ws Wt ) = E(Ws Wt + Ws2 − Ws2 ) = E(Ws (Wt − Ws ) + Ws2 ) = E(Ws ) E(Wt − Ws ) + E(Ws2 ) =
E(Ws2 ) = s, or in general, E(Ws Wt ) = min{s, t};
4. The mapping t → W (t, w) is a.s. continuous in t and W (0, w) = 0, for each w;
5. If a standard Brownian motion process {Wt }t≥0 is measurable, then each sample path function
W (·, w) is A-measurable;
6. Its sample paths are highly irregular and a.s. of unbounded variation;
7. Its sample paths are a.s. non differentiable;
8. The symmetric process {−Wt }t≥0 is also a Brwonian motion (symmetric property);
√
9. {Wct }t≥0 follows the same probability law as { c Wt }t≥0 (scalling property);
10. limt→0 (Wt /t) = 0 a.s., i.e., {tW1/t }t≥0 follows the same probability law as {Wt }t≥0 (Time-
inversion property);
11. {Wt+h − Wt }t≥0 with h > 0 is also a standard Brownian motion;
12. A standard Brownian motion process is a continuous-time martingale (given {Wt }t≥0 and At =
σ(Ws , 0 ≤ s < t), E(Wt | As ) = E(Wt − Ws | As ) + E(Ws | As ) = 0 + Ws = Ws because
Wt − Ws is independent of As and Ws is As -measurable, and E((Wt − Ws )2 | As ) = t − s since
the increment is independent of the past and normally distributed with mean zero- a standard
Brownian motion with independent increments is a square integrable martingale);
A random process {Wt = (Wt1 , · · · , Wtd )}t≥0 is a d-dimensional standard Brownian motion if every
{Wti }t≥0 is a one-dimensional standard Brownian motion and independent, for i = 1, · · · , d. For
{Wt = (Wt1 , · · · , Wtd )}t≥0 a d-dimensional standard Brownian motion and i, j = 1, · · · , d,
1. E(Wt − Ws | As ) = 0, 0 ≤ s < t < ∞
j
2. E(Wti Ws ) = min{t, s}δij , 1 ≤ i, j ≤ d;
j j
3. E((Wti − Wsi )(Wt − Ws )) = (t − s)δij , 0 ≤ s < t;
with
1 if i = j
δij = ,
0 if i ̸= j
the Dirac delta function.
When Wt − Ws is normally distributed with mean µ(t − s) and variance σ2 (t − s), we have a general
Brownian motion. The coefficient µ is called drift coefficient and σ2 is the variance coefficient, with
σ2 ̸= 0. When µ = 0 and σ2 = 1 we have process termed normalized Brownian motion.
Given a Brownian motion {Wt }t≥0 process with drift µ and variance σ2 ,
1. {−Wt }t≥0 is a Brownian motion process with drift −µ and σ2 ;
2. { aWbt }t≥0 is a Brownian motion process with drift abµ and variance a2 bσ2 , with constants a, b > 0;
(Wt −Ws )−µ(t−s)
3. σ is a normalized Brownian motion process.
18 CHAPTER 2. MATHEMATICAL FOUNDATIONS
A standard Brownian motion process {Wt }t≥0 is a Markov process with stationary transition probability
R −(x−Ws )2
P(Wt ∈ B | Ws ) = √ 1 B
e 2(t−s) dx a.s., homogeneous with respect to state space and time and
2π (t−s)
it is possible to present continuous sample paths with probability 1- theorem 2.10 guarantees it.
Furthermore, {Wt }t≥0 is also a strong Markov process.
The derivative with respect to time of a standard Brownian motion is a constructed notion, since
Brownian motion are not differentiable due to its unbounded variation, called a white noise and
denoted as Ẇt = dWt /dt. This process is Gaussian and wide-sense stationary (Xt is wide-sense
stationary if E( Xt Xs ) = h(t − s) for h : R → R and E( Xt ) = E( Xs )) with
1 if 0 ∈ A
h(·) = δ0 ( A) = .
0 if 0 ∈
/A
Using R software, one can simulate a Brownian motion path through an Euler scheme, using normally
distributed random variables, as in the following example. There exist other strategies that allow the
construction of this paths.
Example:
n <- 100 # number of steps
T <- 5 # Time
for (i in 2:n){
W[i] <- W[i-1] + N[i]*sqrt(delta_t)
}
0.0
−0.5
−1.0
0 1 2 3 4 5
Time (Years)
2.6. ITÔ’S INTEGRAL 19
Definition 2.1 (Natural domain of the Itô integral). For any a, b ∈ R0+ , we define the natural domain
of the Itô integral as the space M2 ([ a, b]) of all real-valued processes f = { f (t)}t∈[a,b] on the product
space Ω × [ a, b] such that
1. the process f is measurable with regards to the σ- algebra on Ω × [ a, b];
2. the process f is adapted with respect to the filtration {At }[a,b] i.e. for any t ∈ [ a, b], f (t) is
At -measurable;
R
b
3. ∥ f ∥22 ([ a, b]) = E a | f (t)|2 dt < ∞.
Note, that under the norm ∥·∥22 ([ a, b]) the space M2 ([ a, b]) is complete, and any two processes f , g ∈
M2 ([ a, b]) are equivalent if ∥ f − g∥22 ([ a, b]).
Definition 2.2 (Itô Integral for Elementary Functions). A stochastic process ϕ = {ϕ(t)}t∈[a,b] ∈
M20 ([ a, b]) ⊂ M2 ([ a, b]) is said to be an elementary process (where M20 ([ a, b]) is the subspace of el-
ementary functions of M2 ([ a, b])) if
k −1
ϕ(t, ω ) = ∑ ei (ω ), 1(t ,t + ] (t),
i i 1
(2.6)
i =1
where a = t0 < t1 < · · · < tn = b, and ei are square integrable random variables and At -measurable.
Its stochastic integral in respect to Wt is thus defined as
Z b k −1
I (ϕ) =
a
ϕ(t) dW (t) = ∑ ei [Wt + i 1
− Wti ]. (2.7)
i =1
Theorem 2.12 (Itô Isometry). For any elementary function ϕ ∈ M20 ([ a, b]), ∥ f ∥2M2 ([a,b]) = ∥ f ∥2L2 (P) =
0
∥ f ∥22 ([ a, b]) = E( I (ϕ)), i.e. the mapping I : M20 ([ a, b]) → L2 (P) is a continuous isometry.
Definition 2.3 (Itô Integral). For any stochastic process f ∈ M2 ([ a, b]), and a sequence of elementary
functions ϕn ∈ M20 ([ a, b]) such that ϕn ∈ M20 ([ a, b]), such that lim ∥ f − ϕn ∥2M2 ([a,b]) = 0, the Itô integral
n→∞ 0
of f is defined as
Z b
I ( f ) = lim ϕn dW (t) in L2 (P). (2.8)
n→∞ a
Consider now, f , g ∈ M2 ([ a, b]) and α, β ∈ R, then the following 9 properties can be easily proven, by
first proving the properties for an elementary function ϕ ∈ M20 ([ a, b]) and then
using the fact that there Z b
exists a sequence of elementary functions ϕn ∈ M20 ([ a, b]), such that lim E | f − ϕn |2 dt = 0:
n→∞ a
20 CHAPTER 2. MATHEMATICAL FOUNDATIONS
Rb
1. (Measurability) a f (t) dW (t) is A a -measurable;
R R
b b
2. E a f (t) dW (t) = 0 and E a f (t) dW (t)|A a = 0;
Rb 2
R
b
2
3. E a
f (t) dW (t) = E a |
f ( t )| dt and
Rb 2 R
b
R
b
2 2
E a
f ( t ) dW ( t ) |A a = E a | f ( t )| dt |A a = a
E | f ( t )| |A a dt;
Rb Rb Rb
4. (Linearity) a (α f (t) + βg(t)) dW (t) = α a f (t) dW (t) + β a g(t) dW (t);
hR i hR i
b Rb b
5. E a f (t) dW (t) a g(t) dW (t) = E a f (t) g(t) dW (t) ;
Definition 2.4 (Itô Integral). For any stochastic process f ∈ M2 ([ a, b]), the undefinided Itô stochastic
integral of f is defined as
Z t
I (t) = f (t) dW (t), for any t ∈ [0, T ], (2.9)
0
where f (0) = 0, { I (t)}t∈[0,T ] is At -adapted, square integrable and martingale with respect to the
filtration {At }t≥0 , i.e.
t t
Z Z
E[ I (t)|As ] = E[ I (s)|As ] + E f (r ) dW (r )|As = I (s) + E f (r ) dW (r ) = I (s), (2.10)
s s
Moreover, { I (t)}t∈[0,T ] has a continuous version a.s., that is, there exists a t-continuous stochastic
Rt
process Jt on (Ω, A, {A}t , P) with continuous paths such that P Jt = 0 g(s) dW (s) = 1, for any
t ∈ [0, T ].
Rt
Let us write the Itô’s integral of a stochastic process f as Xt = 0 f (s) dW (s), for every t ∈ [0, T ] the
following four properties are easily shown:
Rt
1. E( Xt ) = 0 and V( Xt ) = E( Xt2 ) = 0 E( f (s)2 ) ds, for any t ∈ [0, T ];
R min{t,s}
2. Cov( Xt , Xs ) = E( Xt Xs ) = 0 E( f (u)2 ) du, for any t, s ∈ [0, T ];
Rt
3. E | Xt |2 = 0 E | f (u)|2 du, for any t ∈ [0, T ];
Theorem 2.13 (One-Dimensional Itô Formula). Let Xt be an Itô process as previously defined. Consider a
function V = V ( X, t) : R × R0+ → R twice continuously differentiable in X and once in t. Then the process
Y (t) = V ( X (t), t) is an Itô process that can be expressed as
We can all write the Itô formula for Y (t) in the differential form:
1 2
dY (t) = Vt ( X (t), t) + VX ( X (t), t) f (t) + VXX ( X (t), t) g(t) dt (2.15)
2
+ VX ( X (t), t) g(t) dWt
The proof of the Itô formula while simple is quite laborious. It is a consequence of the application of
the two dimensional Taylor series to V ( X, t), which results in the equation
∞ ∞ Z t Z X (t)
1
V ( X ( t ), t ) − V ( X (0), 0) = ∑ ∑ n!m! 0 X (0)
VX n ,tm (dX )n (ds)m , (2.16)
n =0 m =0
and after every component of the sum is computed we arrive at the Itô formula. The computation for
each term of the sum can be heuristically expressed by the Itô multiplication table:
· dt dW (t)
dt 0 0
dW (t) 0 dt
Theorem 2.14 (Integration by parts Formula). Let f (s) be independent of Ω, with continuous and bounded
variation on s ∈ [0, T ]. Then
Z t Z t
f (s) dW (s) = f (t)W (t) − W ( s ) d f ( s ). (2.17)
0 0
The previous theorem 2.17 is the result of an application of the Itô formula to the function V ( X, t) =
f (t) X, where X is W (t).
Rt Rt
Note that if { Xt }t≥0 is an Itô process Xt = X0 + 0 f (s) ds + 0 g(s) dW (s) where both f and g are
deterministic functions of t, with X0 constant, then
Z t Z t
X (t) ∼ N [E( X (t)), V( X (t))] = N X0 + f (s) ds, 2
g(s) ds
0 0
22 CHAPTER 2. MATHEMATICAL FOUNDATIONS
The following tables present the solution to common stochastic integrals and the function V ( X, t),
where X is W (t), that was applied to the Itô formula, if any, to solve the integral.
Stochastic Integral
Z t
1. dW (s) = W (t)
Z0 t
1 1
2. W (s) dW (s) = W ( t )2 − t
Z0 b 2 2
3. c dW (s) = c (W (b) − W ( a)),
Za b
1 1
4. W (s) dW (s) = W ( b )2 − W ( a )2 − ( b − a )
Za t 2 Z 2
t
5. s dW (s) = tW (t) − W (s)ds
Z0 t 0
t
1
Z
6. W (s) dW (s) = W (t)3 −
2
W (s)ds
Z0 t 3 0Z
1 t W (s)
7. eW (s) dW (s) = eW (t) − 1 − e ds
Z0 2 0
t 1
Z t
W (s) W (t) W (t)
8. W (s)e dW (s) = 1 + W (t)e −e eW (s) (1 + W (s)) ds
−
Z0 0 2
t 1 t
1 t
Z
9. sW (s) dW (s) = W ( t )2 − − W (s)2 ds
0
Z t
2 2 2 0
1
W (s)2 − s dW (s) = W (t)3 − tW (t)
10.
Z0 t 3 µ σ Z t
µs+σW (s) 1 µt+σW (t)
11. e dW (s) = e −1 − + eµs+σW (s) ds, with µ, σ constants
Z0 t σ σ 2 0
1 t
Z
12. sin (W (s)) dW (s) = 1 − cos(W (t)) − cos(W (s))ds
Z0 t Z t2 0
1
13. cos (W (s)) dW (s) = sin(W (t)) + sin(W (s))ds
0 Z 2 0
t
14. E dW (s)
Z0 t
15. E W (s) dW (s) = 0
Z0 t
t2
16. V W (s) dW (s) =
0 2
V ( X, t)
1. direct
2. V ( X, t) = X 2
3. direct
4. V ( X, t) = X 2
5. V ( X, t) = tX
6. V ( X, t) = X 3
7. V ( X, t) = e X
8. V ( X, t) = Xe X and the result from 2
9. V ( X, t) = tX 2
10. from results 5 and 6
11. V ( X, t) = eµt+σX
12. V ( X, t) = cos( X )
13. V ( X, t) = sin( X )
14. direct
15. direct from 2
16. direct from 2
2.8. MARTINGALE REPRESENTATION THEOREM 23
Theorem 2.16 (Martingale Representation Theorem). Let {Wt }t≥0 be a Brownian motion in the probability
space (Ω, At , {At }t≥0 , P). For every { Xt }t≥0 a square martingale to the filtration {At }t≥0 , there exists
adapted
Z t
a unique stochastic process Ut , adapted to {At }t≥0 , where E U (s)2 ds < ∞, such that
0
Z t
X (t) = E[ X (0)] + Us dW (s). (2.19)
0
or in a different notation,
Z t m Z t
Xi ( t ) = Xi ( 0 ) + f i (s) ds + ∑ gi,j (s) dWj (s), for any i = 1, 2, . . . , d., (2.21)
0 j =1 0
where f(t) = ( f 1 (t), . . . , f d (t))′ ∈ L1R+ Rd and g(t) = gi,j d×m ∈ L2R+ Rd×m . Its stochastic differ-
0 0
ential is
dXt = f(s) dt + g(t) dW(t), (2.22)
or in a different notation,
m
dXi (t) = f i (t) dt + ∑ gi,j (t) dWj (t), for any i = 1, 2, . . . , d. (2.23)
j =1
Theorem 2.17 (Multidimensional Itô Formula). Let Xt be an Itô process as previously defined . Consider a
function V = V (X, t) : Rd × R0+ → R twice continuously differentiable in X and once in t. Then the process
Y(t) = V (X(t), t) is an Itô process that can be expressed as
1
dY (t) = Vt (X(t), t) + VX′ (X(t), t)f(t) + tr g′ (t)VXX (X(t), t)g(t) dt
(2.24)
2
+ VX′ (X(t), s)g(t) dW(t),
or
" #
d
1 d d m
dY (t) = Vt + ∑ VXi f i (t) + ∑ ∑ ∑ VXi Xj gik (t) g jk (t) dt (2.25)
i =1
2 i =1 j =1 k =1
d m
+ ∑ ∑ VXi gij (t) dWj (t).
i =1 j =1
24 CHAPTER 2. MATHEMATICAL FOUNDATIONS
The proof of the multidimensional Itô formula is analogous to the one-dimensional case. It is a
consequence of applying the d + 1 dimensional Taylor series to V ( X1 , X2 , . . . , Xd , t), which results in
the equation
and after every component of the sum computed we arrive at the multidimensional Itô formula. Just
as we did previously, the computation for each term of the sum can be heuristically expressed by the
multidimensional Itô multiplication table:
· dt dWi dWj
dt 0 0 0
dWi 0 dt 0
dWj 0 0 dt
Chapter 3
The present chapter is dedicated to the study of the solutions of Stochastic Differential Equation, namely
the conditions to its existence and uniqueness, how to determine the exact solution and its stability.
3.1 Introduction
We are interested in the solution of a Stochastic Differential Equation of the form
(
dX (t) = f ( X (t), t)dt + g( X (t), t)dW (t), t ∈ [t0 , T ], T > 0
, (3.1)
X ( t 0 ) = X0
where X0 is independent of the W (t) − W (t0 ) and E(| X0 |2 ) < +∞. Function f is called the drift
coefficient and can be interpreted as the short-term growth. Function g is the diffusion coefficient
and can be viewed as the short-term variability. If g( X (t), t) = 0, the equation 3.1 is converted to an
Ordinary Differential Equation.
Let (Ω, A, P) be a complete probability space endowed with the filtration J = {At }t≥0 that is right-
continuous and {A0 } contains all P-null sets, {Wt }t≥0 a one-dimensional Brownian motion defined
on the filtered probability space (Ω, A, {At }t≥0 , P) and, for 0 ≤ to < T < +∞, X0 is a At0 -measurable
random variable with E(| X0 |2 ) < +∞. Furthermore, consider f , g : R × [to , T ] → R Borel-measurable
functions. One can represent the solution of 3.1 in the integral form as follows:
Z t Z t
X ( t ) = X (0) + f ( X (s), s)ds + g( X (s), s)dW (s), t ∈ [t0 , T ]. (3.2)
t0 t0
25
26 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS
Theorem 3.1 (Existence and Uniqueness Theorem). Let the coefficient functions f , g of the SDE 3.1 satisfy
the following conditions:
1. Uniform Lipschitz condition: | f ( x, t) − f (y, t)| + | g( x, t) − g(y, t)| ≤ k | x − y|, for some k ∈ R constant,
t ∈ [t0 , T ], T > 0;
2. Linear growth condition: | f ( x, t)|2 + | g( x, t)|2 ≤ c2 (1 + | x |2 ), for some constant c and t ∈ [t0 , T ], T >
0.
If these conditions hold, then there exists a continuously adapted solution Xt of equation 3.1 such that X (0) = X0
and Xt is uniformly bounded in M2 ([t0 , T ]) (supt0 ≤t≤T E(| Xt2 |) < +∞). Furthermore, if both Xt and X̃t are
2
continuous M ([t0 , T ]) bounded solutions of 3.1, then P sup | Xt − X̃t | = 0 = 1, and thus Xt is the
t0 ≤ t ≤ T
unique solution.
R 1/2
t λs
One proof of the Existence Uniqueness Theorem involves the use of a norm ∥ X ∥ = 0
e E [| X s | 2 ] ds ,
where λ > k2 (t + 1), and the operator Ψ : M2 ([t0 , T ]) → M2 ([t0 , T ]), where Ψ( X )t =
Z t Z t
X0 + f (s, Xs ) ds + g(s, Xs ) dWs . The linear growth condition ensures that the operator Ψ is well
0 0
defined (i.e. Ψ( X )t ∈ M2 ([t0 , T ]) for any X ∈ M2 ([t0 , T ])), and the uniform Lipschitz condition guaran-
tees the operator Ψ is contractive under the aforementioned norm (i.e. ∥ X − Y ∥ > ∥Ψ( X ) − Ψ(Y )∥).
Hence, by the Fixed Point Theorem, (3.2) has a unique solution.
Another way to look at the theorem is by noticing that the first condition guarantees that functions f
and g do not change faster than a linear function of X relative to changes in X, implying the Lipschitz
continuity of f (·, t) and g(·, t) for all t ∈ [t0 , T ]. The second condition restricts the coefficient functions
f and g to linear increases with respect to X, guaranteeing a.s. that the solution Xt does not explode in
the interval [t0 , T ] independently of X0 .
A solution Xt is termed a strong solution if it has strongly unique sample path and the filtered probability
space, the Brownian motion and the coefficient functions are specified in advance. When only f and g
are specified in advance and the pair ( X̃t , W̃t ) is defined on a suitably probability space the solution X̃t
is called a weak solution, besides the verification of the conditions of theorem 3.1. If two weak solutions
are indistinguishable, then path uniqueness holds. Two weak solutions are called weakly unique if
they have the same probability distribution function. A strong solution is also weak, but generally a
weak solution is not strong.
Given that the conditions of theorem 3.1 are quiet restrictive, the first condition can be relaxed to
a local uniform Lipschitz condition: for every n ≥ 1, exists a positive constant k n such that, for all
x, y ∈ R, t ∈ [t0 , T ], T > 0 and max {| x |, |y|} ≤ n,
Thus, if this local uniform Lipschitz condition holds simultaneously with the second condition of
theorem 3.1, then there exists a unique solution Xt , Xt ∈ M2 ([t0 , T ]) to the SDE 3.1. This holds every
time when f and g are continuous differentiable in x on R × [t0 , T ].
Moreover, it is also possible to replace the linear growth condition by a monotone condition: let k be a
positive constant such that
1
x f ( x, t) + | g( x, t)|2 ≤ k (1 + | x |2 ), ∀ ( X (t), t) ∈ R × [t0 , T ], T > 0. (3.4)
2
Now, if 3.3 and 3.4 holds, then there exists a solution Xt to 3.1, with Xt ∈ M2 ([t0 , T ]). The linear growth
condition implies the monotonic condition, but the opposite may not be true.
It worth mention that for f and g defined on R × [t0 , T ] and when the conditions of theorem 3.1 are
verified on every finite subinterval [t0 , T ] ⊂ [to , +∞), the equation 3.1 has a unique solution Xt define
on [t0 , T ) called global solution.
3.3. LINEAR SDES 27
with X0 a random variable independent of W (t) − W (t0 ), that verifies the assumptions of theorem 3.1
has a unique, continuous global solution Xt on [t0 , +∞) such that X (t0 ) = X0 given that
is satisfied and
| f ( x )|2 + | g( x )|2 ≤ k2 (1 + | x |2 ) (3.7)
(
dX (t) = ( a(t) + A(t) X (t))dt + (b(t) + B(t) X (t))dW (t), t ∈ [t0 , T ], T > 0
. (3.8)
X ( t 0 ) = X0
A linear SDE is homogeneous if a(t) = b(t) = 0 and linear in the narrow sense if B(t) ≡ 0.
If functions a(t), b(t), A(t), B(t) are measurable and bounded on [t0 , T ],T > 0, i.e.,
then functions f ( X (t), t) = a(t) + A(t) X (t) and g( X (t), t) = b(t) + B(t) X (t) satisfy the hypothesis of
theorem 3.1, guaranteeing that equation 3.8 has a unique continuous solution Xt over [t0 , T ], provided
E(| X0 |2 ) < ∞ and X0 is independent of W (t) − W (t0 ). Again, if the previous is verified in all subsets
of [t0 , +∞), the solution Xt is classified as a unique, continuous and global solution.
An autonomous linear SDE requires functions a(t), b(t), A(t), B(t) independent of time, with SDE
represented by
dX (t) = ( a + AX (t))dt + (b + BX (t)) dW (t), (3.9)
where a, b, A, B are real constants and a unique, continuous and global solution Xt is always achievable
for this type of SDE.
Itô formula is extremely useful to determine the solution to a SDE of the form Xt = (Wt , t). One can
obtain that solution by writing Xt = (Wt , t) in terms of second-order Taylor Expansion and applying
the rule of products established on Itô multiplication rule. After the relevant simplifications, it remains
to match the coefficients of the first SDE with the coefficients of the SDE obtained by the previous
explained strategy. Finally, the function that satisfies both equations is the solution to the SDE. This
methodology is called Itô coefficient matching and the class of SDEs that are particularly well suited
for solving applying this method are characterized as exact.
Theorem 3.2 (Test for exactness). A SDE is exact if the coefficient functions f ( x, t) and g( x, t) satisfy the
following condition:
1
f x = gt + gxx .
2
28 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS
Note, that the test for exactness, is just an application of the Itô’s form to a stochastic function h(t, Wt ),
where h(t, Wt ) = Xt .
Example To solve the Geometric BM one can apply directly Itô formula or Itô matching coefficient
strategy.
(
dXt = µXt dt + σXt dWt , t ∈ [0, +∞)
(3.10)
X ( 0 ) = X0
dXt
= µdt + σdWt . (3.11)
Xt
Left hand side of the equation suggest the existence of ln( Xt ) in the solution. Thus, considering
V ( Xt , t) = ln( Xt ), f = µXt and g = σXt , Itô formula returns
1 1 −1 2 2 1
dV ( Xt , t) = 0 + µXt + σ X t dt + σX t dWt , (3.12)
Xt 2 Xt2 Xt
1
ln( Xt ) − ln( X0 ) = (µ − σ2 )t + σ2 Wt, (3.14)
2
with W0 = 0. The solution of Geometric BM SDE is
X 1 1 2 2
t
ln = (µ − σ2 )t + σ2 Wt ⇐⇒ Xt = X0 e(µ− 2 σ )t+σ Wt . (3.15)
X0 2
To apply Itô matching coefficient strategy let’s impose the conditions to find function h( x, t):
1. µh( x, t) = ht ( x, t) + 21 h xx ( x, t)
2. σh( x, t) = h x ( x, t)
From 2., one can obtain
h x ( x, t)
= σ ⇐⇒ h( x, t) = eσx+c(t) , (3.16)
h( x, t)
with c(t) arbitrary. Replacing that information in 1., one obtains
1 1
µeσx+c(t) = c′ (t)eσx+c(t) + σ2 eσx+c(t) ⇐⇒ µeσx+c(t) = c′ (t) + σ2 eσx+c(t) (3.17)
2 2
The previous equation imposes that
1 1
c′ (t) = µ − σ2 ⇐⇒ c(t) = µ − σ2 t, (3.18)
2 2
1 2 1 2
obtaining h( x, t) = eσx+(µ− 2 σ )t . Finally, the solution to the Geometric BM SDE is Xt = X0 e(µ− 2 σ )t+σ2 Wt ,
since it is necessary to leave Xt on the left hand side of the SDE, as in the Itô formula. ■
Below we present the strong solutions of specific types of SDE, given the General form dX (t) =
f ( X (t), t)dt + g( X (t), t)dW (t).
3.3. LINEAR SDES 29
The solution to the most general case, i.e. dX (t) = ( at + bt Xt )dt + (ct + et Xt )dW (t), can be computed
by presupposing Xt is the product of two well known stochastic processes Ut = αt dt + β t dWt , and
Vt = bt Vt dt + et Vt dWt , applying the Itô formula to the product Ut Vt , developing the resulting equation,
and find the expressions for αt and β t that satisfy the assumed equality, dXt = d(Ut Vt ).
A solution Xt , with X (t0 ) = X0 , is stochastically bounded if for each ϵ > 0 there exists a γϵ = γϵ (t0 , X0 )
such that
inf P(| Xt | ≤ γϵ ) > 1 − ϵ. (3.19)
t∈[t0 ,T ]
Theorem 3.3 (Existence of Moments). Let the assumptions of theorem 3.1 hold, E| X0 |2p < +∞ for p integer
and Xt a solution to SDE 3.1 on interval [t0 , T ], T < +∞. Thus,
The two previous inequalities can be proved by the use of other well known inequalities, being the two
most worthy of note among them, the Gronwall’s inequality, and the assumed linear growth condition.
RT
Theorem 3.4. Let p ≥ 2, g(t) ∈ M2 ([0, T ]) and E 0
| g(s)| p ds. Then
Z t Z T
p p ( p − 1 ) p p −2
E| g(s)dW (s)| ≤ ( )2 T 2 E | g(s)| p ds. (3.20)
0 2 0
30 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS
Let f , g satisfy the conditions stated on the theorem 3.1 and Xt be the solution of SDE 3.1. Then, Xt is
continuous on [t0 , T ] if there is a constant c > 0 such that | Xt − Xr |2 ≤ c|t − r |, t0 ≤ r, t ≤ T, where
Rt Rt
Xt − Xr = r f ( Xs , s)ds + r g( Xs , s)dWS holds.
In some cases, the solution of a SDE is a Markov process, as follows.
Theorem 3.5. Let Xt be a solution to SDE 3.1 with coefficient functions f , g satisfying the conditions of theorem
3.1. Then Xt is a Markov process with initial probability given by P0 ( B) = P( X0 ∈ B), B ∈ B , transition
Rt
probability P( x, s; B, t) = P( Xt ( x, s) ∈ B) and Xt = Xt ( x, s) is a solution to Xt = x + s f ( Xr , r )dr +
Rt
s
g( Xr , r )dWr , s ≤ t.
If instead of the conditions of theorem 3.5 f , g are uniformly Lipschitz continuous with linear growth
condition satisfied, then the solution Xt is called a strong Markov process.
When transition probabilities are stationary, P( Xt+τ ∈ B | At ) = P( Xτ ∈ B | X0 ), 0 ≤ u ≤ T − t, a
Markov process is said to be homogeneous.
When in the case of an autonomous SDE, if f , g satisfy the conditions of theorem 3.1 with stationary
transition probabilities the solution Xt is a homogeneous Markov process. Again if f , g are uniformly
Lipschitz continuous and satisfy the linear growth condition, stronger conditions, Xt is called homoge-
neous strong Markov process.
A solution of a SDE can be expressed as special stochastic process called Itô diffusion. A one-
dimensional Markov process with continuous sample paths with probability 1 is a diffusion process if
transition probabilities satisfy:
1. limt↓s t−1 s |y− x|<ϵ p( Xt = y | Xs = x )dy = 0;
R
with a( x, s) representing the conditional infinitesimal mean and b( x, s)2 the variance of the process.
Theorem 3.6. Given a SDE of the form of 3.1, if f and g satisfy the conditions of theorem 3.1, then any solution
Xt is a diffusion process on interval [0, T ] with drift f ( Xt , t) and diffusion g( Xt , t)2 .
3.4 Stability
Due to some constraints in the representation of solutions of SDEs, its importance is more related to the
qualitative information one can extract, rather than the presentation of the explicit solution.
A relevant aspect of solutions is the concept of stability, which evaluates the impact of a small change
in the initial conditions on the solution of the SDE. When small changes in the initial conditions lead to
small changes in the solution we are in the presence of a stable solution. When small changes lead to
large changes in the solution, we have an unstable solution.
Considering equation 3.1, let (a) the assumptions of theorem 3.1 hold, (b) the initial value X (t0 ) =
X0 ∈ R is constant with probability one, (c) equation 3.1 has a unique global solution Xt (t0 , X0 ) with
continuous sample paths and finite moments for any X0 independent of Wt , t ≥ t0 and (d) f (0, t) =
g(0, t) = 0 such that the unique solution Xt ≡ 0 corresponds to the initial value X (t0 ) = X0 = 0. The
trivial or equilibrium solution to SDE 3.1 is Xt ≡ 0.
It is important to state some concepts related to stability. A continuous scalar-valued function v( x )
defined on a suitably restricted neighbourhood of zero Nh = { x || x | < h, h > 0} is positive definite
if v(0) = 0 and v( x ) > 0, x ∈ Nh ∀ x ̸= 0. A function v is negative definite if −v is positive
definite. Function v is descending if exists a positive definite function u( x ) such that v( x, t) ≤ u( x ),
3.4. STABILITY 31
that is, function u( x ) is an upper bounded for v( x, t). Function v( x, t) is radially unbounded if
lim| x|→+∞ inf{t≥t0 } v( x, t) → +∞.
Let Vt = v( Xt , t) be a continuous positive definite function twice differentiable in Xt and once in t
defined on Nh × R, Nh = { Xt || Xt | < h, h > 0}. The operator L associated with SDE 3.1 is
δ δ 1 δ2
L= + f ( Xt , t ) + g ( Xt , t )2 2 . (3.21)
δt δXt 2 δXt
Applying L on Vt we get
δv δv 1 δv2
Lv( Xt , t) = + f ( Xt , t ) + g ( Xt , t )2 2 . (3.22)
δt δXt 2 δXt
Using Itô formula, if Xt ∈ Nh then
δv
dv( Xt , t) = Lv( Xt , t)dt + g( Xt , t) dWt . (3.23)
δXt
A stable solution to 3.1 requires that, on average, E(dVt ) = E( Lv( Xt , t)dt) < 0, which is satisfied if
Lv( Xt , t) ≤ 0, ∀t ≥ 0. Thus, Vt = v( Xt , t) represents the stochastic Lyapunov function for SDE 3.1,
which allows the application of extended Lyapunov method to determine if the trivial solution is stable.
The main issue with Lyapunov method is Lyapunov function, since it is often hard to find.
Assuming that conditions (a)-(d) are satisfied, the trivial or equilibrium solution is stochastically stable
or stable in probability if
lim P sup | Xt (t0 , X0 )| ≥ ε = 0, (3.24)
X0 → 0 t∈[t0 ,+∞)
Moreover, the trivial solution is globally stochastically asymptotically stable, or stochastically asymp-
totic stable in large if it is stochastically stable and ∀ X0 ∈ R
P lim Xt (t0 , X0 ) = 0 = 1. (3.26)
t→+∞
Theorem 3.7. 1. Suppose assumptions (a)-(d) hold and there exists a positive definite stochastic Lyapunov
function v( Xt , t) defined on Nh × [t0 , +∞) that is everywhere continuous, twice differentiable in Xt and
once in t. Let Lv( Xt , t) ≤ 0, t ≥ t0 , 0 ≤ Xt ∈ Nh , where Lv( Xt , t) is given by equation 3.22. Then, the
trivial solution is stochastically stable.
2. If, in addition, v( Xt , t) is decreasing function and Lv( Xt , t) is negative definite, then the trivial solution is
stochastically asymptotically stable.
3. If point 2 hold for a radially bounded function v( Xt , t) defined everywhere on [t0 , +∞) × R, then the
trivial solution is globally stochastically asymptotically stable.
Example: Consider the following SDE: dXt = µXt dt + σXt dWt . Let the selected stochastic Lyapunov
function be Vt = v( Xt , t) = Xt2 . Then, applying equation 3.22,
1
Lv = 0 + µXt (2Xt ) + (σXt )2 (2)
2
2 1 2 2
= 2µXt + σ Xt 2
2
1 2 2
= 2 µ + σ Xt
2
1
= 2 µ + σ2 vt ≤ 0
2
32 CHAPTER 3. SOLUTION OF STOCHASTIC DIFFERENTIAL EQUATIONS
if µ + 12 σ2 < 0. Hence, if the inequality holds, the trivial solution Xt = 0 is globally stochastically
asymptotically stable, or stochastically asymptotic stable in large. ■
A. M. Lyapunov developed the stability concept for ODEs: please consult this author and others with
relevant contribution in this topic to better understand the general idea for the case of ODEs.
Chapter 4
This chapter intends to present some models based on Stochastic Differential Equations with applica-
tions on biological and financial areas. This models are derived from the simpler Ordinary Differential
Equation models, with the introduction of a stochastic component.
4.1 Introduction
A deterministic model ignores the noise presented on the behaviour of a modeled variable. Thus,
the stochastic component aims to address this issue, namely through the introduction of a Brownian
motion process.
Models have a deterministic component that captures the growth rate of the dependent variable,
measurable given the initial conditions, and a stochastic component that represents the variability
provoked by random phenomena.
Regarding biology, some sources of noise are the demographic stochasticity, comprising all factors
related to reproducible and mortality of each individual, environmental stochasticity for issues like
weather, natural disasters and others, mensuration stochasticity originated from the misleading count-
ing and, finally, informational stochasticity for all possible errors or omissions made during the process
of modelling.
In the financial case, noise can be generated when a big institution/ investor program their trading,
creating a suddenly increase in the volume of trade, or when a group of traders bet on the same industry
or company originating a bubble.
The conversion of a non-linear SDE into a linear, by a changing of variable, constitutes a great advantage
since the solution to a linear SDE is already known (please look to 3.3).
Given a non-linear SDE of the form
consider Yt = U ( Xt , t) with δU /δXt ̸= 0. Then, the solution of 4.1 has the form Xt = V (Yt , t), provided
the coefficients a1 (t), a2 (t), b1 (t), b2 (t) of
From the application of Itô formula to U ( Xt , t), one can verify that the following expressions make 4.1
reducible.
1. Ut + f U X + 21 g2 U XX = a1 (t) + a2 (t)U ( Xt , t);
2. gU X = b1 (t) + b2 (t)U ( Xt , t).
33
34 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS
The next step uses the equalities above to determine the value of a1 (t), a2 (t), b1 (t), b2 (t), U ( Xt , t) to
further plug in the linear equation 4.2 to find the value of Yt . Then, when Yt is known, it just follows to
compute Xt by the inverse operations.
For the special case of autonomous SDE, we have
1. f U X + 12 g2 U XX = a1 + a2 U ( Xt );
2. gU X = b1 + b2 U ( Xt ).
R Xs ds
b
If g( Xt ) ̸= 0 and b2 ̸= 0, then by 2 we have U ( Xt ) = Ce 2 x0 g(s) − bb12 , with C and X0 constants. Applying
R Xs ds
b ( Xt )
U ( Xt ) in 1, one obtains C b2 A( Xt ) + 12 b22 − a2 e 2 x0 g(s) = a1 − a2 bb12 , with A( Xt ) = gf ((XXtt )) − 21 dgdX t
.
dA d dA
Skipping computations, b2 dX t
+ dX t
g( Xt ) dX t
= 0 and it represents the necessary and sufficient
condition to convert an autonomous non-linear SDE into an autonomous linear SDE. Moreover,
R Xs ds
b2
• if b2 ̸= 0, U ( Xt ) = e x0 g(s)
R Xt ds
• if b2 = 0, U ( Xt ) = b1 x0 g ( s )
+ C.
It’s mandatory to chose b1 in a way that it satisfies 2. Given U ( Xt ), b1 , b2 properly defined, 1 allows us
to determine a1 , a2 . Substituting this coefficients in 4.2, the solution Yt is direct. Finally, the solution Xt
to the non-linear problem 4.1 is obtained from the application of the inverse operation on Yt .
Example: To find the solution of the non-linear SDE
1
dXt = − e−2Xt dt + e−Xt dWt , (4.3)
2
f ( Xt ) 1 dg( Xt ) 1 1
A ( Xt ) = − = − e−Xt + e−Xt = 0. (4.4)
g( Xt ) 2 dXt 2 2
gU X = b1 ⇔ e−Xt U X = 1 ⇔ U X = e Xt ⇒ U = e Xt . (4.6)
Then, f U X + 12 g2 U XX = a1 + a2 U ( Xt ) implies
1 1 2
− e−2Xt · e Xt + e−Xt e Xt = a1 + a2 e Xt ⇔ 0 = a1 + a2 e Xt , (4.7)
2 2
thus a1 , a2 can assume any value. To simplify, let a1 = a2 = 0, check 4.2 and note that the linear SDE is
Z t Z t
dYt = dWt ⇒ dYs = dWs ⇔ Yt = Y0 + Wt . (4.8)
0 0
Yt = Y0 + Wt ⇔ e Xt = e X0 + Wt ⇔ Xt = ln(e X0 + Wt ). ■ (4.9)
1
dYt = e Xt dXt + e Xt (dXt )2
2
1
= e Xt − e−2Xt dt + e−Xt dWt
2
1 Xt 1 −2Xt 2
− Xt
+ e − e dt + e dWt
2 2
1 1
= − e−Xt dt + dWt + e Xt (e−Xt )2 (dWt )2
2 2
1 − Xt 1
= − e dt + dWt + e−Xt dt
2 2
= dWt
To finalize the answer, one must proceed as in the end of the previous example. ■
where r is a constant and represents the instantaneous relative growth rate that can be rewritten as the
difference of birds (b) and deaths (d) for a single species with no migration, i.e. r = b − d, termed as the
net bird rate or the constant per capita growth rate. The solution to 4.10 is
so when r > 0 the population grows exponentially and when r < 0 the decreases exponentially.
If instead of considering relative population growth rate just as the difference between birds and deaths,
one considers a new relative population growth rate defined as the net bird rate plus a noise component,
in the form of r (t) − α(t) · noise with r, α constants and noise = dW
dt , then the ODE is now as follows:
t
dNt dWt
= (r ( t ) − α ( t ) · ) Nt , N (0) = N0 , t ≥ 0 (4.12)
dt dt
or, alternatively,
dNt = r (t) Nt dt − α(t) Nt dWt , N (0) = N0 , t ≥ 0. (4.13)
Noting that SDE 4.13 is autonomous, linear and homogeneous, the solution is a geometric Brownian
1 2
motion process of the form N (t) = N0 e(r− 2 a )t+αWt with
1. E( N (t)) = N0 ert ;
2
2. Var ( N (t)) = N0 e2rt (eα t − 1), if N0 is independent of Wt .
In the long-term:
3. if r = 21 α2 , then N (t) fluctuates between arbitrarily large and small small values as t −→ +∞.
36 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS
To fix this issue, Verhulst proposed a self-regulating mechanism that ponders the population
relative
N (t)
growth rate according to the current size of the population. The new r is given by r 1 − K , where
K represents the limiting size of the population or carrying capacity. Thus, the deterministic Logistic
Growth Model is
N (t)
dN (t) = r 1 − N (t)dt, (4.14)
K
with solution
KN0
N (t) = , N (0) = N0 ,t ≥ 0. (4.15)
N0 + (K − N0 )e−rt
If births are defined as b = b1 − b2 N (t), deaths as d = d1 + d2 N (t) and r as the difference between
births and deaths, then
dNt
= b1 − b2 N (t) − d1 − d2 N (t) N (t) (4.16)
dt
= (b1 − d1 ) − (b2 + d2 ) N (t) N (t)
N (t)
= (b1 − d1 ) 1 − N ( t ).
(b1 − d1 )/(b2 + d2 )
The stochastic Logistic Growth Model appears with the introduction of the multiplicative noise term,
originating
N (t)
dN (t) = r 1 − N (t)dt + αN (t)dWt , (4.17)
K
A strategy similar to the used to convert deterministic population growth model into a stochastic one
may also be applied here.
The solution to the stochastic logistic growth model is obtained resorting to the Itô equation, considering
V ( Nt , t) = Nt−1 and VN = − Nt−2 , VNN = 2Nt−3 , f = r (1 − NKt ) Nt , g = αNt . Following the procedure
of Itô formula, one may arrive to the solution
n 1 o
d( Nt−1 ) = r − Nt−1 + + α2 Nt−1 dt − αNt−1 dWt . (4.18)
k
1 2
− rYt−r−1 + αYt · (−r )(−r − 1)Yt−r−2 = a1 (t) + a2 (t)Yt−r
βYt (1 − Ytr ) · (4.28)
2
⇐⇒
1
rβ + r (r + 1)α2 − rβ Yt−r = a1 (t) + a2 (t)Yt−r , (4.29)
2
so a1 = rβ and a2 = 21 r (r + 1)α2 − rβ.
Let Zt = Yt−r . The linear SDE is
1
dZt = rβ + r (r + 1)α2 − rβ Zt dt − rαZt dWt . (4.30)
2
Resorting to 3.3 and inverting the change of variables, the final solution to 4.22 is achieved. This
solution is
n N −r Z t o− 1r
−1 0
Nt = N∞ Φt r + βr Φ−
s
1
ds , (4.31)
N∞ 0
n o
with Φt = exp − αrWt + 21 α2 r − βr t .
38 CHAPTER 4. STOCHASTIC DIFFERENTIAL EQUATION MODELS
Nt
Considering Xt = N∞ , one can write ODE 4.32 as
dNt N∞ Nt
= β(ln ) dt ⇔ dXt = βln( Xt−1 ) Xt dt ⇔ dXt = − βXt lnXt dt. (4.34)
N∞ Nt N∞
β d −β β
b2 − + kXt = 0 ⇔ b2 − + 0 = 0 ⇔ b2 = 0, (4.37)
kXt dXt kXt kXt
Z Xt Z Xt
1 1 1
B ( Xt ) = ds = ds = lnXt , (4.38)
X0 g(s) X0 kXs k
b1
U ( Xt ) = b1 B( Xt ) + C = lnXt + C, (4.39)
k
b1
gU X = b1 + b2 U ( Xt ) ⇐⇒ kXt
= b1 + 0 ⇐⇒ b1 = b1 , (4.40)
kXt
so that b1 can assume any value. Thus, with b1 = k and C = 0, we have
k
U ( Xt ) = lnXt + 0 = lnXt . (4.41)
k
To find a1 , a2 we use
1 βXt lnXt k2 Xt2 k2 k2
f U X + g2 U XX = − − = − βlnX t − = − βZt − , (4.42)
2 Xt 2Xt2 2 2
with Zt = U ( Xt ) = lnXt .
The reduced form of 4.35 is
1
dZt = − k2 − βZt dt + kdWt , (4.43)
2
whose solution is given by 3.3, since now we are working with an autonomous linear SDE. After
inverting the change of variables, the final solution to 4.35 is the stochastic Gompertz growth function
Z t
n h 1 2 βt io
Xt = exp e− βt lnX0 − k ( e − 1) + k e βs dWs . (4.44)
2β 0
4.6. DETERMINISTIC AND STOCHASTIC NEGATIVE EXPONENTIAL GROWTH MODELS 39
where NN∞0 − 1 is called feedback term because it compares the limit level of the variable N with its
current state, returning a decrease when the variable is above the limit (Nt > N∞ ) or supporting an
increase when this value is below the limit (Nt < N∞ ). β represents the growth rate.
The representation of the evolution of a variable under the deterministic negative exponential model is
given by
Nt = N∞ 1 − e− βt , t ≥ 0, N∞ , β > 0. (4.46)
with solution
Nt = N0 + βt, t ≥ 0. (4.50)
the general form of a stochastic linear growth model with multiplicative noise. This SDE is autonomous
and linear, with parameter a = β, b = c = 0 and e = α, and its solution is
Z t
n 1 o n 1 o
Nt = exp αWt − α2 t × N0 + β exp α2 s − αWs ds , (4.52)
2 0 2
as defined on 3.3.
If instead of the multiplicative noise one introduces the additive noise, the SDE becomes
This model is famous as the Cox-Ingersoll-Ross (CIR) model to the dynamics of short-run interest rates.
In that sense, α√ is the speed of adjustment and β, σ > 0. The drift α − βXt ensures mean revertion and
the diffusion σ X t describes volatility.
√
Given an integrating factor e βt , if one rewrites 4.55 as dXt + βXt dt = αdt + σ X t dWt , then
√ √
e βt dXt + βe βt Xt dt = αe βt dt + σe βt X t dWt ⇔ d(e βt Xt ) = αe βt dt + σe βt X t dWt . (4.56)
CIR model in 4.55 was subject to an extension, defined as CKLS mean-reverting gamma SDE in honour
to the authors, and is defined as
γ
dXt = (α − βXt )dt + σXt dWt , X (0) = X0 > 0, (4.62)
5.1 Introduction
In a lot of cases, it is not possible to determine a close form solution to the Itô initial value problem,
given by the stochastic differential equation:
Hence, similar to the deterministic differential equations case, numerical methods are employed. We
start by discretizing the interval [t0 , T ] into N smaller subintervals t0 < t1 < · · · < t N = T, then
√ pseudorandom number generator, compute increments for ∆Wi = W (ti+1 ) − W (ti ) ∼
using a normal
N (0, ∆ti ) = ∆ti N (0, 1), where ∆ti = ti+1 − ti , and applying a recursive algorithm to calculate the
approximations Yi of X (ti ; X0 , t0 ), where Y0 = X0 , for each i = 0, 1, · · · , N.
Definition 5.1 (Weak and Strong Convergence). Let us consider a discretization ti of [t0 , T ] into N
subintervals of equal length, i.e. ti = Ni ( T − t0 ) + t0 , hence ∆ti = T − t0
N , for i = 0, 1, . . . , N.
can be calculated by the application of the deterministic Taylor formula until a certain r-term, resulting in
(t−t )k
the approximation h( Xt ) ≈ h( Xt0 ) + ∑rk=1 k!0 Lh( X0 ), where L is the operator L = f ∂X ∂
. Analogously,
for the stochastic case where f (t, Xt ) = f ( Xt ) and g(t, Xt ) = g( Xt ) (that is dXt = f ( Xt ) dt + g( Xt ) dWt ),
and f and g are sufficiently smooth real-value functions, satisfying the linear growth condition, then
41
42CHAPTER 5. APPROXIMATION AND ESTIMATION OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQ
∂2
Z t
∂ 1
h ( Xt ) = h ( Xt0 ) + f ( Xs ) h( Xs ) + g( Xs )2 2 h( Xs ) ds (5.4)
t0 ∂X 2 ∂X
Z t
∂
+ g ( Xs ) h( Xs ) dWs , t ∈ [t0 , T ]
t0 ∂X
or as
Z t Z t
h ( Xt ) = h ( Xt0 ) + L0 h( Xs ) ds + L1 h( Xs ) dWs (5.5)
t0 t0
where
∂ 1 ∂2 ∂
L0 = f + g2 2 , L1 g (5.6)
∂X 2 ∂X ∂X
If we apply this formula to the functions f and g within the formula we obtain the Itô-Taylor expansion
Note that, in the context of numerical methods, we can continuously apply the Itô-Taylor expansion
and getting better algorithms for approximation of a stochastic Xt , with each subsequent expansion.
Thus, the Euler-Maruyama approximation is defined as a continuous-time stochastic process {Yt }t∈[t0 ,T ]
that satisfies the iterative scheme:
√
Yi+1 = Yi + f (Yi , ti )∆t + g(Yi , ti ) ∆tZ, (5.10)
where Z ∼ N (0, 1). The EM routine converges with strong order γ = 1/2 and weak order β = 1.
5.3. ITERATIVE SCHEMES FOR APPROXIMATING SDE’S 43
Z t i +1 Z t i +1 Z t i +1 Z s
∂g
X t i +1 = X t i + f ( Xs , s) ds + g( Xs , s) dWs + g ( Xz , z ) ( Xz , z) dWz dWs + Rt (5.11)
ti ti ti ti ∂Xz
Z t i +1 Z t i +1 t i +1 s Z Z
∂g
≈ Xti + f ( Xti , t i ) ds + g( Xti , ti ) dWs + g( Xti , ti ) ( Xti , t i ) dWz dWs
ti ti ∂Xti ti ti
∂g 1
= Xti + f ( Xti , ti )∆ti + g( Xti , ti )∆Wti + g( Xti , ti ) ( Xti , ti ) ((∆Wti )2 − ∆ti )
∂Xti 2
Hence, the Milstein approximation is defined as a continuous-time stochastic process {Yt }t∈[t0 ,T ]
satisfying the iterative scheme:
√ 1 ∂g
Yi+1 = Yi + f (Yi , ti )∆t + g(Yi , ti ) ∆tZ + g(Yi , ti ) ( Xti , ti ) ∆tZ2 − ∆t ,
(5.12)
2 ∂Yi
where Z ∼ N (0, 1). The Milstein routine converges with strong order γ = 1.
Example
1
Yti+1 = Yti + f (Yti )∆ti + g(Yti )∆Wti + g(Yt ) gX (Yt ) (∆Wti )2 − ∆ti
(5.13)
2
∆Vt
1
+ f X (Yti ) g(Yti )∆ti ∆Wti + √ i
2 3
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti )) f XX (Yti ) (∆ti )2
2
2 2
∆Vti
1 1
+ f (Yti ) gX (Yti ) + ( g(Yti )) gXX (Yti ) ∆ti ∆Wti −
2
√
2 2 3
1 1
+ g(Yti ) g(Yti ) gXX (Yti ) + ( g(Yti ))2 (∆Wti )2 − ∆ti ∆Wti ,
2 3
where ∆Wti , ∆Vti ∼ N (0, ∆ti ) are independent random variables, for all i = 1, . . . , N.
2. Weak Order 2.0 Taylor Scheme is defined as
1
Yti+1 = Yti + f (Yti )∆ti + g(Yti )∆Wti + g(Yt ) gX (Yt ) (∆Wti )2 − ∆ti
(5.14)
2
∆Vti
1
+ f X (Yti ) g(Yti )∆ti ∆Wti + √
2 3
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti )) f XX (Yti ) (∆ti )2
2
2 2
∆Vti
1 1
+ f (Yti ) gX (Yti ) + ( g(Yti )) gXX (Yti ) ∆ti ∆Wti − √
2
,
2 2 3
where ∆Wti , ∆Vti ∼ N (0, ∆ti ) are independent random variables for all i = 1, . . . , N.
3. Weak Order 2.0 Predictor-Corrector Scheme is divided into two steps
• Step 1: Consider a predictor Y ti given by the scheme
1
Y ti+1 = Yti + f (Yti )∆ti + λi + f X (Yt ) g(Yt ) ∆Ŵti ∆ti
(5.15)
2
1 1
+ f (Yti ) f X (Yti ) + ( g(Yti ))2 f XX (Yti ) (∆ti )2 ,
2 2
where
1
λi = g(Yti )∆Ŵti + g(Yt ) gX (Yt ) (∆Ŵti )2 − ∆ti
(5.16)
2
1 1
+ f (Yti ) gX (Yti ) + ( g(Yti )) gXX (Yti ) ∆Ŵti ∆ti ,
2
2 2
1
Y ti+1 + f (Yti ) ∆ti + λi
Yti+1 = Yti + (5.17)
2
1
Y ti+1 = Yti + [ f (ui ) + f (Yti )] ∆ti + Oi (5.18)
2
5.3. ITERATIVE SCHEMES FOR APPROXIMATING SDE’S 45
1
f (Y ti+1 ) + f (Yti ) ∆ti + Oi
Yti+1 = Yti + (5.21)
2
1
Z
Yt = F ( Xt ) = du , (5.24)
g(u) u = Xt
Notice that the previous theorem is a direct result of applying the Itô formula to F ( Xt ).
46CHAPTER 5. APPROXIMATION AND ESTIMATION OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQ
The Ozaki method is developed by taking bY (Ys ) to be constant for an interval s ∈ [t, t + ∆t]. In this
case, for Zs = E[Ys |Yt ] the previous SDE is transformed into the ODE dZ ds = [ bY ( Zt ) s + b (Yt )], which
s
Recall that we supposed bY (Ys ) to be a constant on the interval s ∈ [t, t + ∆t], given this, the SDE for Yt
becomes the autonomous linear narrow sense (ALNS) variety dYt = Kt Yt dt + σ dWt , which we know
the solution to be
Z t+∆t
Yt+∆t = Yt eKt ∆t + σ eKt (t+∆t−s) dWs . (5.28)
t
Analogously to the previous method, the Shoji-Ozaki method is developed by supposing bt (Yt , t),
bY (Yt , t) and bYY (Yt , t) all to be constant in an interval t ∈ [s, s + ∆s]. Therefore if we apply the Itô
formula to b, we get
db(Yt , t) = Ls Yt + Ms t + Ns , (5.32)
5.4. LOCAL LINEARIZATION TECHNIQUES 47
where
If we consider the transformed process Zt = e− Ls t Yt , apply the Itô formula to it, compute the integral
formula of Zt and reincorporate the result back into Yt , we arrive at the discretized process of Yt given
by
Z s+∆s
b(Ys , s) Ls ∆s M h i
− 1 + 2 e Ls ∆s − 1 + Ls ∆s + σ e Ls (s+∆s−u) dWu .
s
Ys+∆s = Ys + e (5.35)
Ls Ls s
q
2Ls ∆s
The stochastic integral of Yt is a normal distribution with mean 0 and standard deviation σ e 2Ls−1 ,
thus the discrete time model, given by the Shoji-Ozaki method, which approximates Ys , at Ys ̸= 0 and
Ls ̸= 0, is given by:
Es | Xt − X̃t | p = O (t − s)2p
(5.37)
denoted by X̃t − X̃s , using the Shoji-Ozaki local linearization method. Then the convergence of the step-by-step
approximation is O(∆t) in L p , that is
6.1 Introduction
Consider the time-homogeneous stochastic differential equation
dXt = f ( Xt ; θ ) dt + g( Xt ; θ ) dWt , t ≥ 0 (6.1)
our objective is to estimate θ ∈ Θ where Θ is an open subset of Rn . Since, continuous-time diffusions
can only be observed at discrete time points, the transition probability density function is not explicitly
computable. Thus, in order to overcome this quandary, we will make use of the pseudo-maximum
likelihood (PML) method, which is undergirded by the well-known maximum likelihood estimation
technique.
In this equation θ is constant and xi , for i = 1, . . . , n are variables. The inverse is true for the likelihood
function of a sample
n
L (θ; x1 , . . . , xn , n) = ∏ p( xi ; θ ) (6.3)
i =1
where xi , for i = 1, . . . , n are realized samples of the random variables and θ is now the variable. For
ease of computation the logarithmic-likelihood function is sometimes used:
n
log L (θ; x1 , . . . , xn , n) = ∑ log p(xi ; θ ). (6.4)
i =1
The maximum likelihood estimate θ̂ ∈ Rn of θ, for the given xi realizations for the independent random
variables, results from a well know result in analysis, that the local maximum θ̂ of L is given when
∂ log L
= 0, (6.5)
∂θ θ =θ̂
∂2 log L
and ∂θ 2
is negative definite.
θ =θ̂
49
50 CHAPTER 6. ESTIMATION OF PARAMETERS OF STOCHASTIC DIFFERENTIAL EQUATIONS
6.3 The Markov Property, Transitional Densities, and the Likelihood Func-
tion of the Sample
Let { Xt }t∈[t0 ,T ] be a continuous Markov process on the probability space (Ω, A, {A}t∈[t0 ,T ] , P), and con-
sider a time discretization t0 ≤ t1 ≤ · · · ≤ tn = T, and let us define Xi = X (ti ), for i = 0, 1, . . . , n. Let
P( X0 , . . . , Xn |θ ) be the joint probability of observing the path, for the a given value for parameter θ. Re-
call that if the process Xt is Markov, it has short memory, that is P( Xn | X0 , . . . , Xn−1 ; θ ) = P( Xn | Xn−1 ; θ ),
therefore by the continuous application of the conditional probability definition and the Markov
property P( X0 , . . . , Xn |θ ) is simplified into,
P ( X0 , . . . , X n ; θ ) = P ( X n | X n − 1 ; θ ) · P ( X n − 1 | X n − 2 ; θ ) · · · P ( X1 | X0 ; θ ) · P ( X0 ; θ ) . (6.6)
The joint probability P( X0 , . . . , Xn ; θ ) of the stochastic process { Xt }t∈[t0 ,T ] has the probability density
function, therefore the Markov property can be expressed as p( Xn | X1 , . . . , Xn−1 ; θ ) = p( Xn | Xn−1 ; θ ).
Again, the continuous application of the conditional probability definition and the Markov property,
results in
p ( X0 , . . . , X n ; θ ) = p ( X n | X n − 1 ; θ ) · p ( X n − 1 | X n − 2 ; θ ) · · · p ( X1 | X0 ; θ ) · p ( X0 ; θ ) . (6.7)
Thus the likelihood function, for the θ variable, of a realized sample xi , for i = 1, . . . , n, is given by the
formula
n
L (θ; x1 , . . . , xn , n) = p( x0 ; θ ) ∏ p( xi+1 | xi ; θ ) (6.8)
i =1
Thus, the change of variables for a probability density function p X ( x1 , . . . , xn ) of X for a probabil-
ity density function pY (y1 , . . . , yn ) of Y, where Y is defined as X = f (Y ), where f (y1 , . . . , yn ) =
( x1 (y1 , . . . , yn ), . . . , xn (y1 , . . . , yn )) is an homeomorphism, is simply a generalization of the previous
property. In fact, for every measurable A set we have
Z
E[1 A ( X )] = p X ( x1 , . . . , xn ) dx1 . . . dxn (6.10)
X ( A)
Z
∂x
= p X ◦ f (y1 , . . . , yn ) det dy1 . . . dyn .
Y ( A) ∂y
Example:
Consider the SDE
dXt = µXt dt + σXt dWt , t ≥ 0
and n realized samples for Xt , at increasing time points ti , for i = 1, . . . , n. By the Itô formula we have:
√
X (ti ) = X (ti−1 )e(µ− 2 σ )∆ti +σ ∆ti Zi
1 2
where Zi ∼ N (0, 1) are independent and ∆ti = ti − ti−1 . If we denote xi = log( X (ti ) − X (ti−1 )),
then xi ∼ N µ − 2 σ ∆ti , σ ∆ti . By the change of variables equation (6.11), we conclude that the
1 2
[ xi − (µ − (1/2)σ2 )∆ti ]2
1
p( xi , ∆ti ; µ, σ) = √ exp −
σ( X (ti ) − X( ti−1 )) 2π∆ti 2σ2 ∆ti
Thus, the log-likelihood function for this probability density function is
n
n n
log L = − ∑ log( X (ti ) − X (ti−1 )) − log(2π ) − log(σ2 ∆ti )
i =1
2 2
n
1
−
2σ ∆ti
2 ∑ [xi − (µ − (1/2)σ2 )∆ti ]2 .
i =1
∂ log L ∂ log L ∑in=1 xi
= µ̂ − 12 σ̂2 ∆ti and
If we compute ∂µ = 0, and ∂σ = 0, we arrive at the expressions m̂ = n
∑in=1 ( xi −m̂)2
ν̂ = n = σ̂2 ti which in turn yield the estimates for µ and σ:
s
m̂ + (1/2)ν ν̂
µ̂ = , and σ̂ = .■ (6.12)
∆ti ∆ti
or if X0 is independent from θ
n
L(θ, X1 , . . . , Xn , ∆ti , n) = ∏ h(∆ti , Xi−1 , Xi ; θ ). (6.14)
i =1
Thus, the PML estimate of θ is the value that maximizes the logarithm of the pseudo-likelihood function,
i.e.
n
θ̂ = arg max ∑ log h(∆ti , Xi−1 , Xi ; θ ). (6.15)
θ i =1
In the following, we shall observe how to estimate the PML for the SDE (6.1) using 3 three different
iterations schemes: the Euler-Maruyama scheme, the Ozaki linearization scheme, and the Shoji-Ozaki
linearization scheme. Recall, the every SDE of the form (6.1) can be transformed to an SDE with constant
diffusion term, using the Lamperti transformation, hence we only need to consider SDE’s of the form
dXt = f ( Xt ; θ ) dt + σ dWt , t ≥ 0, X (0) = X0 . (6.16)
52 CHAPTER 6. ESTIMATION OF PARAMETERS OF STOCHASTIC DIFFERENTIAL EQUATIONS
Since, Xt+∆ti − Xt are independent Normal random variables their pseudo-transition density function
are thus
( Xi − Xi −1 − f ( Xi −1 ;θ )∆ti )2
1 −
h(∆ti , Xi−1 , Xi ; θ, σ) = p e 2σ2 ∆ti
,
2πσ2 ∆ti
Example:
Consider the Ornstein-Uhlenbeck process given by the SDE
By the previous results, the pseudo log-likelihood function associated with the EM-scheme is
n
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) −
log(2πσ2 ∆ti )
2
1 n ( X − Xi−1 − α(µ − Xi−1 )∆ti )2
− ∑ i
2 i =1 σ2 ∆ti
where
12
e2Kt ∆t − 1
At = eKt ∆t , Bt = σ , (6.20)
2Kt
and
1 f ( Xt ) f X (Xt )∆t
Kt = log 1 + e −1 . (6.21)
∆t Xt f X ( Xt )
6.6. THE TRANSITION PROBABILITY DENSITY FUNCTION IS UNKNOWN 53
Similarly to the EM scheme case, Xt+∆ti − Xt are independent Normal random variables with expected
value, and variance given by
e2Kti ∆ti − 1
!
Kti ∆ti
Ei = E( Xti ) = e Xti , Vi = V( Xti ) = σ 2
, (6.22)
2Kt
n 1 n ( Xi − Ei−1 )2
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) − log(2πVi−1 ) − ∑ (6.24)
2 2 i =1 σ2 ∆ti
Example:
Consider the SDE, dXt = θXt3 dt + σ dWt , by the previous results, the pseudo log-likelihood function
resulting from the Ozaki Linearization scheme, is completely defined by
1 1 3θX2 ∆ti
Ki = log 1 + e i −1
∆ti 3
because, in turn, Ki defines the expected value and variance as
h i2
1 + 1
e 3θXi2 ∆ti − 1 − 1
Xi 3θX2 ∆ti
3
Ei = eKti ∆ti Xti = Xi + e i − 1 , Vi = σ2 . ■
h i
3 2
1+ 1
e 3θXi2 ∆ti
−1
∆ti 3
where
f ( Xt ) Lt ∆t Mt h Lt ∆t i
At = 1 + e −1 + e − 1 − L t ∆t , (6.26)
Xt L t Xt L2t
2Lt ∆t 1
e −1 2 σ2
Bt = σ , Lt = f X ( Xt ), and Mt = f XX ( Xt ).
2Lt 2
Similarly to the previous two schemes, Xt+∆ti − Xt are independent Normal random variables with
expected value, and variance given by
e2Lti ∆ti − 1
!
Ei = E( Xti ) = A( Xti ) Xti , Vi = V( Xti ) = σ 2
(6.27)
2Lti
n 1 n ( Xi − Ei−1 )2
log L(θ, σ; X1 , . . . , Xn , ∆ti , n) = log h( X0 ; θ, σ) − log(2πVi−1 ) − ∑ (6.29)
2 2 i =1 σ2 ∆ti
54 CHAPTER 6. ESTIMATION OF PARAMETERS OF STOCHASTIC DIFFERENTIAL EQUATIONS
Example:
Consider the SDE, dXt = θXt3 dt + σ dWt , by the previous results, the pseudo log-likelihood function
resulting from the Shoji-Ozaki Linearization scheme, is completely defined by
Li = 3θXi2 , Mi = σ2 (3θXi )
Braumann, Carlos A. 2019. Introduction to Stochastic Differential Equations with Applications to Modelling
in Biology and Finance. John Wiley & Sons.
Braun, W John, and Duncan J Murdoch. 2021. A First Course in Statistical Programming with r. Cambridge
University Press.
Capiński, Marek, and Peter E Kopp. 2004. Measure, Integral and Probability. Vol. 14. Springer.
Øksendal, Bernt, and Bernt Øksendal. 2003. Stochastic Differential Equations. Springer.
Panik, Michael J. 2017. “Stochastic Differential Equations: An Introduction with Applications in
Population Dynamics Modelling.”
Pires, Gabriel E. 2012. Cálculo Diferencial e Integral Em Rn. Vol. 45. IST Press.
55