EM: Stochastic Processes
➢ Review of probability, conditional probability, and higher-order moments
➢ Random variables and principles of convergence
➢ Stochastic calculus
➢ Stochastic integrals and Itô integrals
➢ Brownian motion and Wiener processes
➢ Markov chains
➢ Propagation of uncertainty
➢ Stochastic differential equations
Stochastic Modeling
Objectives
The objectives of this course are to
- introduce students to the standard concepts and methods of stochastic modeling;
- illustrate the rich diversity of applications of stochastic processes in the sciences and in
applied problems.
More precisely, the course aims to
- introduce several classes of stochastic processes;
- analyze their behavior over a finite or infinite time horizon;
- study topics such as martingales, Markov chains, renewal processes, and queueing
systems with various approaches;
- give students the knowledge they need to strengthen their problem-solving skills.
Prof AZRAR Lahcen
Chapter I: Preliminaries
Probability Review
Events and Probabilities
Let A and B be events. The event that at least one of A or B occurs is called the
union of A and B and is written A ∪ B.
The event that both occur is called the intersection of A and B and is written
A ∩ B, or simply AB.
More generally, given events A1, A2, …, the event that at least one occurs is written
$$A_1 \cup A_2 \cup A_3 \cup \dots = \bigcup_{i=1}^{\infty} A_i,$$
and the event that all occur is written
$$A_1 \cap A_2 \cap A_3 \cap \dots = \bigcap_{i=1}^{\infty} A_i.$$
Probability
The probability of an event A is written Pr(A).
The certain event, denoted by Ω always occurs, and Pr(Ω) = 1.
The impossible event, denoted by Ø, never occurs, and Pr(Ø) = 0.
It is always the case that 0 ≤ Pr(A) ≤ 1 for any event A.
Events A and B are said to be disjoint if A ∩ B = Ø,
i.e., if A and B cannot both occur.
For disjoint events A and B, we have the addition law
Pr(A∪ B) = Pr(A) + Pr(B).
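Theorem
Let A and B be two events with A ⊆ B. Then
$$\Pr(B) = \Pr(A) + \Pr(B \cap A^c),$$
where A^c denotes the complement of A.
Corollary
Let A and B be two events with A ⊆ B. Then
$$\Pr(A) \le \Pr(B).$$
More generally, even if we do not have A ⊆ B, we have the following property.
Theorem (Principle of inclusion–exclusion, two-event version)
Let A and B be two events. Then
$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B).$$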
Exercise
Demonstrate this theorem
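As a numerical sanity check (not a proof), the two-event inclusion–exclusion identity Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B) can be verified on a small finite sample space. The sketch below assumes a fair six-sided die with equally likely outcomes; the events A and B and the helper `pr` are illustrative choices, not part of the course text.

```python
from fractions import Fraction

# Sample space of one roll of a fair six-sided die (an illustrative assumption).
OMEGA = frozenset(range(1, 7))

def pr(event):
    """Probability of an event (a subset of OMEGA) when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})   # "the roll is even"
B = frozenset({4, 5, 6})   # "the roll is at least 4"

lhs = pr(A | B)                       # Pr(A ∪ B)
rhs = pr(A) + pr(B) - pr(A & B)       # Pr(A) + Pr(B) − Pr(A ∩ B)
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared without rounding error.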
A stronger form of the addition law is as follows: Let A1, A2, … be events with
Ai and Aj disjoint whenever i ≠ j. Then
$$\Pr\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \Pr(A_i).$$
The addition law leads directly to the law of total probability:
Let A1, A2, … be disjoint events for which Ω = A1 ∪ A2 ∪ …. Equivalently,
exactly one of the events A1, A2, … will occur.
The law of total probability asserts that, for any event B,
$$\Pr(B) = \sum_{i=1}^{\infty} \Pr(B \cap A_i).$$
The law enables the calculation of the probability of an event B from the
sometimes more easily determined probabilities Pr(B ∩ Ai), where i = 1, 2, …
Events A and B are said to be independent if Pr(A ∩ B) = Pr(A)·Pr(B).
Property
Events A1, A2, … are independent if
$$\Pr(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_n}) = \Pr(A_{i_1})\,\Pr(A_{i_2}) \cdots \Pr(A_{i_n})$$
for every finite set of distinct indices i1, i2, …, in.
Axiomatic Probability Theory and probability space
Recall that the basic elements of probability theory are
1. the sample space, a set Ω whose elements ω correspond to the possible
outcomes of an experiment;
2. the family of events, a collection ℱ of subsets A of Ω;
we say that the event A occurs if the outcome ω of the experiment is an
element of A;
3. the probability measure, a function P defined on ℱ and satisfying
$$0 = P(\emptyset) \le P(A) \le P(\Omega) = 1 \quad \text{for } A \in \mathcal{F}$$
and
$$P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n)$$
if the events A1, A2, … are disjoint, i.e., Ai ∩ Aj = Ø when i ≠ j.
The triple (Ω, ℱ, P) is called a probability space.
σ-algebra
The family of events ℱ should satisfy
(a) Ø is in ℱ and Ω is in ℱ; (*)
(b) A^c is in ℱ whenever A is in ℱ, where A^c = {ω ∈ Ω : ω ∉ A} is the
complement of A; and
(c) $\bigcup_{n=1}^{\infty} A_n$ is in ℱ whenever An is in ℱ for n = 1, 2, …
A collection ℱ of subsets of a set 𝛀 satisfying (a, b, c) is called a σ -algebra
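On a finite sample space, conditions (a), (b), and (c) can be checked mechanically. The following is a minimal sketch under that finiteness assumption; the function name `is_sigma_algebra` and the example families are hypothetical illustrations. For a finite family, closure under pairwise unions already implies closure under countable unions, which is what the last check relies on.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check conditions (a), (b), (c) for a family of subsets of a FINITE set omega."""
    omega = frozenset(omega)
    fam = {frozenset(s) for s in family}
    # (a) the empty set and omega belong to the family
    if frozenset() not in fam or omega not in fam:
        return False
    # (b) closed under complement
    if any(omega - a not in fam for a in fam):
        return False
    # (c) closed under unions; pairwise closure suffices for a finite family
    return all(a | b in fam for a, b in combinations(fam, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
bad = [set(), {1}, {1, 2, 3, 4}]          # the complement {2, 3, 4} is missing

print(is_sigma_algebra(omega, good), is_sigma_algebra(omega, bad))
```

The family `good` is the σ-algebra generated by the event {1, 2}; `bad` fails condition (b).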
Random variable
A random variable is a variable that takes on its values by chance.
A random variable is also called random quantity, aleatory variable, or stochastic
variable
Definition
A random variable (real valued) is a function from the sample space S to the set
R of all real numbers.
Figure: A random variable X as a function on the sample space S.
Example Indicator Functions
One special kind of random variable is worth mentioning. If A is any event, then
we can define the indicator function of A, written I_A, to be the random variable
that is equal to 1 on A and equal to 0 on A^c.
Given random variables X and Y, we can perform the usual arithmetic operations
on them.
Thus, for example, Z = X² is another random variable, defined by
Z(s) = X²(s) = (X(s))² = X(s)·X(s).
Similarly, if W = X·Y³, then W(s) = X(s)·Y(s)·Y(s)·Y(s).
Also, if Z = X + Y, then Z(s) = X(s) + Y(s), etc.
Most of the time we adhere to the convention of using capital letters such as X; Y;
Z to denote random variables, and lowercase letters such as x, y, z for real
numbers.
The expression {X ≤ x} is the event that the random variable X assumes a value
that is less than or equal to the real number x.
More precisely, a real random variable X is a real-valued function defined on Ω
fulfilling certain “measurability” conditions.
The distribution function of the random variable X is formally given by
$$\Pr\{a < X \le b\} = P(\{\omega : a < X(\omega) \le b\}).$$
In other words, the probability that the random variable X takes a value in (a, b]
is calculated as the probability of the set of outcomes ω for which a < X(ω) ≤ b,
where X satisfies the condition
$$\{\omega : X(\omega) \le x\} \in \mathcal{F} \quad \text{for every real } x.$$
Let 𝒜 be any σ-algebra of subsets of Ω. We say that X is measurable with
respect to 𝒜, or more briefly 𝒜-measurable, if
$$\{\omega : X(\omega) \le x\} \in \mathcal{A} \quad \text{for every real } x.$$
Stochastic Processes
A stochastic process is a family of random variables Xt, where t is a parameter
running over a suitable index set T. (Where convenient, we will write X(t) instead
of Xt.)
In a common situation, the index t corresponds to time: the index set is
T = {0, 1, 2, …} for discrete units of time, or T = [0, ∞) for continuous time.
t often represents time, but different situations also frequently arise.
t may represent distance from an arbitrary origin, and Xt may indicate the number
of defects in the interval (0; t]
or the number of cars in the interval (0; t] along a highway
Example: assume that we run an experiment and observe the random values
of X(·) as time evolves. We are in fact looking at a sample path X(t, ω) (t ≥ 0)
for some fixed ω ∈ Ω. If we rerun the experiment, we will in general observe
a different sample path.
Figure: Two continuous sample paths of a stochastic process.
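The idea that each run of the experiment fixes one outcome ω and therefore one sample path can be sketched in code. The example below assumes a simple symmetric random walk in discrete time (an illustrative process, not one introduced at this point of the notes), and lets the seed of the random number generator play the role of ω; the function name `sample_path` is hypothetical.

```python
import random

def sample_path(seed, n_steps=10):
    """Return one path X_0, X_1, ..., X_n of a simple symmetric random walk.

    The seed stands in for the outcome omega: fixing it fixes the whole path.
    """
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n_steps):
        x += rng.choice((-1, 1))   # one step up or down with equal probability
        path.append(x)
    return path

path_a = sample_path(seed=1)   # the path observed for one outcome
path_b = sample_path(seed=2)   # rerunning the experiment with another outcome
print(path_a)
print(path_b)
```

Calling `sample_path` twice with the same seed reproduces the same path, which mirrors the statement that X(t, ω) is a deterministic function of t once ω is fixed.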
Probability Distribution Function
The probability that the event {X ≤ x} occurs is written Pr{X ≤ x}.
Allowing x to vary, this probability defines a function
$$F(x) = \Pr\{X \le x\}, \quad -\infty < x < +\infty,$$
called the distribution function of the random variable X.
Where several random variables appear in the same context, we may choose to
distinguish their distribution functions with subscripts, writing, for example,
$$F_X(\xi) = \Pr\{X \le \xi\} \quad \text{and} \quad F_Y(\xi) = \Pr\{Y \le \xi\},$$
defining the distribution functions of the random variables X and Y, respectively,
as functions of the real variable ξ.
Remark
The distribution function contains all the information available about a random
variable
Properties
Discrete random variable
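A random variable X is called discrete if there is a finite or denumerable set of
distinct values x1, x2, … such that
$$a_i = \Pr\{X = x_i\} > 0 \ \text{ for } i = 1, 2, \dots \quad \text{and} \quad \sum_i a_i = 1.$$
Probability mass function
The function
$$p(x_i) = p_X(x_i) = a_i \quad \text{for } i = 1, 2, \dots$$
is called the probability mass function for the random variable X.
It is related to the distribution function via
$$p(x_i) = F(x_i) - F(x_i^-) \quad \text{and} \quad F(x) = \sum_{x_i \le x} p(x_i).$$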
Remark
The distribution function for a discrete random variable is a step function,
which increases only in jumps, the size of the jump at xi being p(xi).
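The step-function behaviour of a discrete distribution function can be illustrated by building F from a probability mass function. The sketch below assumes, purely for illustration, that X counts the heads in two tosses of a fair coin; the names `pmf` and `cdf` are hypothetical.

```python
from fractions import Fraction

# Probability mass function of the number of heads in two fair coin tosses
# (an illustrative choice of discrete random variable).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x, pmf):
    """F(x) = sum of p(x_i) over all values x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

# F is constant between the values x_i and jumps by p(x_i) at x_i.
for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
    print(x, cdf(x, pmf))

jump_at_1 = cdf(1, pmf) - cdf(0.999, pmf)   # size of the jump at x = 1
```

The jump at x = 1 equals p(1), in line with the remark above.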
Continuous random variable
If Pr{X = x} = 0 for every value of x, then the random variable X is called
continuous and its distribution function F(x) is a continuous function of x.
Probability density function
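If there is a nonnegative function f(x) = fX(x) defined for −∞ < x < ∞ such that
$$\Pr\{a < X \le b\} = \int_a^b f(x)\,dx \quad \text{for } -\infty < a < b < \infty,$$
then f(x) is called the probability density function for the random variable X.
If X has a probability density function f(x), then X is continuous and
$$F(x) = \int_{-\infty}^{x} f(\xi)\,d\xi, \quad -\infty < x < \infty.$$
If F(x) is differentiable in x, then X has a probability density function given by
$$f(x) = \frac{d}{dx}F(x) = F'(x), \quad -\infty < x < \infty.$$
In differential form, this equation leads to the informal statement
$$\Pr\{x < X \le x + dx\} = F(x + dx) - F(x) = dF(x) = f(x)\,dx.$$
This leads to a shorthand approximate version
$$\Pr\{x < X \le x + \Delta x\} = f(x)\,\Delta x + o(\Delta x),$$
where o(Δx) is a generic remainder term of order less than Δx as Δx → 0.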
Moments and Expected Values
If X is a discrete random variable, then its mth moment is given by
$$E[X^m] = \sum_i x_i^m \Pr\{X = x_i\},$$
provided that the infinite sum converges absolutely.
If X is a continuous random variable with probability density function f(x), then
its mth moment is given by
$$E[X^m] = \int_{-\infty}^{\infty} x^m f(x)\,dx,$$
provided that this integral converges absolutely.
Mean or Expected value and Variance of X
The first moment, corresponding to m = 1, is commonly called the mean or
expected value of X and written m_X or μ_X.
The mth central moment of X is defined as the mth moment of the random variable
X − μ_X, provided that μ_X exists.
The first central moment is zero.
The second central moment is called the variance of X and written σ_X²
or Var[X].
We have the equivalent formulas
$$\mathrm{Var}[X] = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2.$$
The median of a random variable X is any value ν with the property that
$$\Pr\{X \ge \nu\} \ge \tfrac{1}{2} \quad \text{and} \quad \Pr\{X \le \nu\} \ge \tfrac{1}{2}.$$
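A short computation can make the definitions of mean, variance, and median concrete. The sketch below assumes a fair six-sided die as the discrete random variable (an illustrative choice) and checks that the two standard variance formulas, E[(X − μ)²] and E[X²] − μ², agree; the helper names `moment` and `is_median` are hypothetical.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die (illustrative assumption).
die = {k: Fraction(1, 6) for k in range(1, 7)}

def moment(pmf, m):
    """m-th moment: sum of x_i**m * Pr{X = x_i}."""
    return sum(x**m * p for x, p in pmf.items())

def is_median(v, pmf):
    """v is a median if Pr{X >= v} >= 1/2 and Pr{X <= v} >= 1/2."""
    upper = sum(p for x, p in pmf.items() if x >= v)
    lower = sum(p for x, p in pmf.items() if x <= v)
    return upper >= Fraction(1, 2) and lower >= Fraction(1, 2)

mean = moment(die, 1)
var_central = sum((x - mean) ** 2 * p for x, p in die.items())  # second central moment
var_shortcut = moment(die, 2) - mean**2                          # E[X^2] - mean^2

print(mean, var_central, var_shortcut)
print(is_median(3, die), is_median(Fraction(7, 2), die), is_median(2, die))
```

For the die, every value between 3 and 4 is a median, which illustrates why the definition says "any value".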
Expectation of g(X)
If X is a random variable and g is a function, then Y =g(X) is also a random variable.
Discrete case
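If X is a discrete random variable with possible values x1, x2, …, then the expectation
of g(X) is given by
$$E[g(X)] = \sum_i g(x_i)\,\Pr\{X = x_i\},$$
provided that the sum converges absolutely.
Continuous Case
If X is continuous and has the probability density function fX, then the expected
value of g(X) is evaluated from
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx,$$
provided that the integral converges absolutely.
Let FY(y) = Pr{Y ≤ y} denote the distribution function for Y = g(X). When X is a discrete
random variable, then
$$E[Y] = \sum_i y_i\,\Pr\{Y = y_i\} = \sum_i g(x_i)\,\Pr\{X = x_i\}$$
if yi = g(xi) and provided that the second sum converges absolutely.
In general, we have
$$E[Y] = \int y\,dF_Y(y) = \int g(x)\,dF_X(x).$$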
Joint Distribution Functions
Given a pair (X, Y) of random variables, their joint distribution function is the
function FXY of two real variables given by
$$F_{XY}(x, y) = F(x, y) = \Pr\{X \le x \text{ and } Y \le y\}.$$
A joint distribution function FXY is said to possess a (joint) probability density
if there exists a function fXY of two real variables for which
$$F_{XY}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(\xi, \eta)\,d\eta\,d\xi \quad \text{for all } x, y.$$
Marginal distribution function
The function FX(x) = lim_{y→∞} F(x, y) is a distribution function, called the
marginal distribution function of X.
Similarly, FY(y) = lim_{x→∞} F(x, y) is the marginal distribution function of Y.
If the distribution function F possesses the joint density function f, then the
marginal density functions for X and Y are given, respectively, by
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.$$
Property
If X and Y are jointly distributed, then E[X + Y] = E[X] + E[Y],
provided only that all these moments exist.
Independence
If F(x; y) = FX(x).FY (y) for every choice of x; y, then the random variables X and
Y are said to be independent.
Property
If X and Y are independent and possess a joint density function f (x; y),
then necessarily f(x, y) = fX(x). fY (y) for all x; y.
Covariance
Given jointly distributed random variables X and Y having means μ_X and μ_Y and
finite variances, the covariance of X and Y, written σXY or Cov[X, Y], is the
product moment
$$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X\,\mu_Y.$$
X and Y are said to be uncorrelated if their covariance is zero, i.e., σXY = 0.
Remark
Independent random variables having finite variances are uncorrelated,
but the converse is not true.
There are uncorrelated random variables that are not independent.
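A standard counterexample shows that zero covariance does not imply independence. The sketch below assumes X is uniform on {−1, 0, 1} and sets Y = X² (an illustrative choice, not taken from the notes); it computes the covariance exactly and compares Pr{X = 0, Y = 0} with Pr{X = 0}·Pr{Y = 0}.

```python
from fractions import Fraction

# Joint probability mass function of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(g):
    """E[g(X, Y)] under the joint mass function."""
    return sum(g(x, y) * q for (x, y), q in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))

p_x0 = sum(q for (x, y), q in joint.items() if x == 0)    # Pr{X = 0}
p_y0 = sum(q for (x, y), q in joint.items() if y == 0)    # Pr{Y = 0}
p_both = joint[(0, 0)]                                    # Pr{X = 0, Y = 0}

print(cov, p_both, p_x0 * p_y0)
```

The covariance is zero, yet the joint probability differs from the product of the marginals, so X and Y are uncorrelated but not independent.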
Correlation coefficient
The correlation coefficient is defined by
$$\rho = \frac{\sigma_{XY}}{\sigma_X\,\sigma_Y},$$
for which −1 ≤ ρ ≤ 1.
Generally
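The joint distribution function of any finite collection X1, …, Xn of random
variables is defined as
$$F(x_1, \dots, x_n) = F_{X_1, \dots, X_n}(x_1, \dots, x_n) = \Pr\{X_1 \le x_1, \dots, X_n \le x_n\}.$$
Property
If F(x1, …, xn) = FX1(x1) ⋯ FXn(xn) for all values of x1, …, xn, then the random
variables X1, …, Xn are said to be independent.
A joint distribution function F(x1, …, xn) is said to have a probability density
function f(ξ1, ξ2, …, ξn) if
$$F(x_1, \dots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(\xi_1, \dots, \xi_n)\,d\xi_1 \cdots d\xi_n$$
for all values of x1, …, xn.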
Algebraic operations of random variables
Distribution function of the sums
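If X and Y are independent random variables having distribution functions FX
and FY, respectively,
then the distribution function of their sum Z = X + Y is the convolution of FX
and FY:
$$F_Z(z) = \int F_X(z - \xi)\,dF_Y(\xi) = \int F_Y(z - \eta)\,dF_X(\eta).$$
If, in addition, X and Y have the probability densities fX and fY, respectively,
then the density function fZ of the sum Z = X + Y is the convolution of the
densities fX and fY:
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z - \eta)\,f_Y(\eta)\,d\eta = \int_{-\infty}^{\infty} f_Y(z - \xi)\,f_X(\xi)\,d\xi.$$
If X and Y are independent and have respective variances σ_X² and σ_Y², then the
variance of the sum Z = X + Y is the sum of the variances:
$$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2.$$
More generally,
if X1, …, Xn are independent random variables having variances σ²_{X1}, …, σ²_{Xn},
respectively, then the variance of the sum Z = X1 + X2 + … + Xn is
$$\sigma_Z^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \dots + \sigma_{X_n}^2.$$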
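Example
Let X and Y be two independent uniform random variables on the interval
[0, 1], and let Z = X + Y. What is the density function of their sum?
We have
$$f_X(x) = f_Y(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
The density function for the sum Z is given by
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)\,f_Y(y)\,dy.$$
Since fY(y) = 1 if 0 ≤ y ≤ 1 and 0 otherwise, this becomes
$$f_Z(z) = \int_0^1 f_X(z - y)\,dy.$$
Now, the integrand is 0 unless 0 ≤ z − y ≤ 1 (i.e., unless z − 1 ≤ y ≤ z), and then it is 1.
So, if 0 ≤ z ≤ 1, we have
$$f_Z(z) = \int_0^z dy = z,$$
while if 1 < z ≤ 2, we have
$$f_Z(z) = \int_{z-1}^{1} dy = 2 - z,$$
and if z < 0 or z > 2, we have fZ(z) = 0.
Hence
$$f_Z(z) = \begin{cases} z & \text{if } 0 \le z \le 1, \\ 2 - z & \text{if } 1 < z \le 2, \\ 0 & \text{otherwise.} \end{cases}$$
Figure: Convolution of two uniform densities.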
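The convolution integral for the sum of two independent uniform variables on [0, 1] can be checked numerically against the triangular density (z on [0, 1], 2 − z on (1, 2], 0 elsewhere). The sketch below uses a midpoint Riemann sum as an assumed, simple quadrature rule; `convolve_at` and `triangle` are hypothetical helper names.

```python
def f_uniform(x):
    """Density of a uniform random variable on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve_at(z, f, g, lo=0.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of the integral of f(z - y) * g(y) over [lo, hi].

    The interval [lo, hi] should contain the support of g.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        total += f(z - y) * g(y)
    return h * total

def triangle(z):
    """Closed-form density of the sum of two independent uniforms on [0, 1]."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0

for z in (-0.5, 0.25, 0.5, 1.0, 1.5, 2.5):
    print(z, round(convolve_at(z, f_uniform, f_uniform), 4), triangle(z))
```

The numerical values agree with the closed form up to the discretization error of the quadrature rule.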
Example
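Let X and Y be two independent random variables, each with an exponential density with parameter
λ. What is the density of their sum Z = X + Y?
Denote by fX, fY, and fZ the density functions of X, Y, and Z, so that
$$f_X(x) = f_Y(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$
If z > 0, then
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)\,f_Y(y)\,dy = \int_0^z \lambda e^{-\lambda (z - y)}\,\lambda e^{-\lambda y}\,dy = \int_0^z \lambda^2 e^{-\lambda z}\,dy = \lambda^2 z\,e^{-\lambda z},$$
while if z < 0, fZ(z) = 0. Hence
$$f_Z(z) = \begin{cases} \lambda^2 z\,e^{-\lambda z} & \text{if } z \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$
Figure: Convolution of two exponential densities with λ = 1.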
Change of Variable
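Suppose that X is a random variable with probability density function fX and that
g is a strictly increasing differentiable function.
Then, Y = g(X) defines a random variable, and
$$F_Y(y) = \Pr\{Y \le y\} = \Pr\{g(X) \le y\} = \Pr\{X \le g^{-1}(y)\} = F_X(g^{-1}(y)),$$
where g⁻¹ is the inverse function to g;
i.e., y = g(x) if and only if x = g⁻¹(y).
Thus, we obtain the correspondence between the distribution function of Y and
that of X.
Recall the differential calculus formula
$$\frac{d\,g^{-1}(y)}{dy} = \frac{1}{g'(x)}, \quad \text{where } y = g(x).$$
Used together with the chain rule of differentiation, this gives
$$f_Y(y) = \frac{d}{dy}F_Y(y) = f_X(g^{-1}(y))\,\frac{d\,g^{-1}(y)}{dy}.$$
The density function for Y is expressed in terms of the density for X when g is
strictly increasing and differentiable by the following formula
$$f_Y(y) = \frac{f_X(x)}{g'(x)}, \quad \text{where } y = g(x).$$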
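The change-of-variable formula for a strictly increasing g, namely f_Y(y) = f_X(x)/g'(x) with y = g(x), can be tried on a case with a known answer. The sketch below assumes X is exponential with parameter 1 and g(x) = eˣ (both illustrative choices), for which f_Y(y) = 1/y² on y ≥ 1 and F_Y(y) = 1 − 1/y; the helper `transformed_density` is hypothetical.

```python
import math

def transformed_density(f_x, g_inv, g_prime):
    """Return f_Y for Y = g(X), g strictly increasing: f_Y(y) = f_X(x) / g'(x), x = g^{-1}(y)."""
    def f_y(y):
        x = g_inv(y)
        return f_x(x) / g_prime(x)
    return f_y

def f_x(x):
    """Density of an exponential random variable with parameter 1."""
    return math.exp(-x) if x >= 0 else 0.0

# Y = exp(X): the inverse of g is log, and g' = exp.
f_y = transformed_density(f_x, math.log, math.exp)

def integrate(f, lo, hi, n=10000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

print(f_y(2.0))                   # compare with 1 / 2**2
print(integrate(f_y, 1.0, 2.0))   # compare with F_Y(2) = 1 - 1/2
```

The integral of f_Y over [1, 2] matches F_X(g⁻¹(2)) = 1 − e^(−ln 2) = 1/2, which is the correspondence between the two distribution functions.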
CONDITIONAL PROBABILITY
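For any events A and B, the conditional probability of A given B is written
Pr(A | B) and defined by
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} \quad \text{if } \Pr(B) > 0.$$
More useful is the following equivalent multiplicative form:
$$\Pr(A \cap B) = \Pr(A \mid B)\,\Pr(B).$$
Theorem (Bayes’ theorem)
Let A and B be two events, each of positive probability. Then
$$\Pr(A \mid B) = \frac{\Pr(A)}{\Pr(B)}\,\Pr(B \mid A).$$
PROOF We compute that
$$\frac{\Pr(A)}{\Pr(B)}\,\Pr(B \mid A) = \frac{\Pr(A)}{\Pr(B)} \cdot \frac{\Pr(A \cap B)}{\Pr(A)} = \frac{\Pr(A \cap B)}{\Pr(B)} = \Pr(A \mid B).$$
This gives the result.
Using the law of total probability, we have, for disjoint events A1, A2, … with Ω = A1 ∪ A2 ∪ …,
$$\Pr(B) = \sum_i \Pr(B \cap A_i).$$
The previous equation leads to
$$\Pr(B) = \sum_i \Pr(B \mid A_i)\,\Pr(A_i).$$
Then, one has
$$\Pr(A_j \mid B) = \frac{\Pr(B \mid A_j)\,\Pr(A_j)}{\sum_i \Pr(B \mid A_i)\,\Pr(A_i)}.$$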
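Bayes' theorem combined with the law of total probability gives Pr(A_j | B) = Pr(B | A_j) Pr(A_j) / Σ_i Pr(B | A_i) Pr(A_i) for a partition A1, A2, …. The sketch below applies this to a hypothetical screening test; the prior 1/100, the detection rate 95/100, and the false-positive rate 5/100 are made-up numbers chosen only for illustration, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return Pr(A_j | B) for each j, given Pr(A_j) and Pr(B | A_j) over a partition."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # Pr(B ∩ A_j)
    total = sum(joint)                                     # Pr(B), law of total probability
    return [j / total for j in joint]

# Partition: A_1 = "condition present", A_2 = "condition absent" (hypothetical numbers).
priors = [Fraction(1, 100), Fraction(99, 100)]
# B = "test is positive": Pr(B | A_1) and Pr(B | A_2).
likelihoods = [Fraction(95, 100), Fraction(5, 100)]

post = posterior(priors, likelihoods)
print(post, float(post[0]))
```

With these illustrative numbers the posterior probability of the condition given a positive test is 19/118, about 0.16, much smaller than the detection rate, because the prior is small.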
The Major Distributions
Discrete Distributions
In this section, let us summarize the most important discrete probability
distributions and their relevant properties.
Bernoulli Distribution
The so-called Bernoulli distribution describes a binary experiment in which only
two exclusive options are possible. (‘‘heads or tails’’, ‘‘either it rains or not’’). A
random variable X following the Bernoulli distribution with parameter p has
only two possible values, 0 and 1, and the probability mass function is
p(1)= p and p(0)=1-p, where 0 < p < 1.
The mean and variance are E[X] = p and Var[X] = p(1-p), respectively.
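Example:
1) "Heads or tails": coin tossing is the practice of throwing a coin in the air
and checking which side is showing when it lands, in order to choose randomly
between two alternatives, heads or tails.
2) Bernoulli random variables occur frequently as indicators of events. The
indicator of an event A is the random variable
$$I_A = \begin{cases} 1 & \text{if } A \text{ occurs}, \\ 0 & \text{if } A \text{ does not occur.} \end{cases}$$
Then I_A is a Bernoulli random variable with parameter p = E[I_A] = Pr(A).
Property
Let α1, α2, …, αn be arbitrary real numbers and A1, A2, …, An
be events. Then
$$\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\,\alpha_j\,\Pr(A_i \cap A_j) \ge 0.$$
Prove this property.
Demonstration
Using the indicators of the events, we observe that
$$0 \le \Big(\sum_{i=1}^{n} \alpha_i I_{A_i}\Big)^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\,\alpha_j\,I_{A_i} I_{A_j} = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\,\alpha_j\,I_{A_i \cap A_j}.$$
This gives, after taking expectations,
$$0 \le \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\,\alpha_j\,\Pr(A_i \cap A_j).$$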
Binomial Distribution
The binomial distribution with parameters n and p is the discrete probability
distribution of the number of successes in a sequence of n independent
experiments, each asking a yes–no question.
In general, if the random variable X follows the binomial distribution with
parameters n ∈ N and p ∈ [0, 1], we write X ~ B(n, p). The probability of getting
exactly k successes in n independent Bernoulli trials (with the same rate p) is
given by the binomial probability mass function
$$\Pr\{X = k\} = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n,$$
where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ is the binomial coefficient, hence the name of the distribution.
Consider independent events A1, A2, …, An, all having the same probability
p = Pr(Ai) of occurrence. Let Y count the total number of events among A1, …,
An that occur. Then Y has a binomial distribution with parameters n and p. The
associated probability mass function is
$$p_Y(k) = \Pr\{Y = k\} = \frac{n!}{k!\,(n-k)!}\,p^k (1 - p)^{n - k}, \quad k = 0, 1, \dots, n.$$
Writing Y as a sum of indicators in the form
$$Y = I_{A_1} + I_{A_2} + \dots + I_{A_n}$$
makes it easy to determine the moments
$$E[Y] = E[I_{A_1}] + \dots + E[I_{A_n}] = np,$$
and using independence, we can also determine that
$$\mathrm{Var}[Y] = \mathrm{Var}[I_{A_1}] + \dots + \mathrm{Var}[I_{A_n}] = np(1 - p).$$
Example
Suppose a biased coin comes up heads with probability p = 0.3 when tossed. The
probability of seeing exactly k = 4 heads in n = 6 tosses is
$$\Pr\{X = 4\} = \binom{6}{4}(0.3)^4(0.7)^2 = 15 \times 0.0081 \times 0.49 \approx 0.0595.$$
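The biased-coin computation (p = 0.3, n = 6, k = 4) can be reproduced with the standard binomial formula C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ. The sketch below uses only the Python standard library; `binomial_pmf` is a hypothetical helper name.

```python
import math

def binomial_pmf(k, n, p):
    """Pr{X = k} for X ~ B(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, p = 6, 0.3
prob_4_heads = binomial_pmf(4, n, p)

# Sanity checks: the probabilities sum to 1 and the mean equals n * p.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(prob_4_heads, total, mean)
```

The value of `prob_4_heads` is 15 × 0.0081 × 0.49 = 0.059535, up to floating-point rounding.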
The Poisson Distribution
The Poisson distribution plays a role in the class of discrete distributions that
parallels in some sense that of the normal distribution in the continuous class.
It has many elegant and surprising mathematical properties that make analysis a
pleasure.
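A random variable X, taking on one of the values 0, 1, 2, …, is said to be a Poisson
random variable with parameter λ > 0 if its probability mass function
p(k) = Pr{X = k} is given by
$$p(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$
Using the series expansion
$$e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \dots,$$
we see that $\sum_{k \ge 0} p(k) = 1$.
The same series helps calculate the mean via
$$E[X] = \sum_{k=0}^{\infty} k\,p(k) = \sum_{k=1}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda.$$
The same trick works on the variance, beginning with
$$E[X(X-1)] = \sum_{k=0}^{\infty} k(k-1)\,p(k) = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} = \lambda^2.$$
Exercise
Let X be a random variable having the Poisson distribution with parameter λ;
demonstrate that
$$E[X^2] = \lambda^2 + \lambda \quad \text{and} \quad \mathrm{Var}[X] = \lambda.$$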
Example
If the number of accidents occurring on a highway each day is a Poisson random
variable with parameter λ = 3, what is the probability that no accidents occur
today?
Solution:
Pr{X = 0} = e^{−3} ≈ 0.0498
An important property of the Poisson random variable is that it may be used to
approximate a binomial random variable when the binomial parameter n is large
and p is small.
Exercise
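Demonstrate that the binomial distribution with parameters n and p converges to
the Poisson distribution with parameter λ if n → ∞ and p → 0 in such a way that λ = np remains
constant.
Proof
Write the binomial probability as
$$\Pr\{X = k\} = \frac{n(n-1)\cdots(n-k+1)}{k!}\,p^k (1 - p)^{n-k}$$
and substitute p = λ/n:
$$\Pr\{X = k\} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \frac{\lambda^k}{k!} \cdot \Big(1 - \frac{\lambda}{n}\Big)^{n} \Big(1 - \frac{\lambda}{n}\Big)^{-k}.$$
Now let n → ∞ and observe that
$$\frac{n(n-1)\cdots(n-k+1)}{n^k} \to 1, \qquad \Big(1 - \frac{\lambda}{n}\Big)^{n} \to e^{-\lambda},$$
and
$$\Big(1 - \frac{\lambda}{n}\Big)^{-k} \to 1,$$
to obtain the Poisson distribution
$$\Pr\{X = k\} \to \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$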
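The convergence of the binomial distribution to the Poisson distribution can also be observed numerically. The sketch below fixes λ = 3 (an illustrative value) and compares the binomial probabilities with p = λ/n to the Poisson probabilities λᵏ e^(−λ)/k! for increasing n; the helper names and the cut-off k ≤ 10 are assumptions made for the illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Pr{X = k} for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Pr{X = k} for a Poisson random variable with parameter lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def max_gap(n, lam, k_max=10):
    """Largest absolute difference between the two mass functions for k = 0..k_max."""
    p = lam / n   # keep n * p = lam fixed while n grows
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))

lam = 3.0
gaps = {n: max_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap shrinks as n grows, which is the numerical counterpart of the limit computed in the proof; it is not a substitute for that proof.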
The Multinomial Distribution
The multinomial distribution generalizes the binomial one.
Consider an experiment having a total of r possible outcomes, and let the
corresponding probabilities be p1, …, pr, respectively. Now perform n
independent replications of the experiment and let Xi record the total number of
times that the ith type outcome is observed in the n trials.
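Then X1, …, Xr have the multinomial distribution. This is a joint distribution of r
variables in which only nonnegative integer values 0, …, n are possible. The joint
probability mass function is given by
$$\Pr\{X_1 = k_1, \dots, X_r = k_r\} = \begin{cases} \dfrac{n!}{k_1! \cdots k_r!}\,p_1^{k_1} \cdots p_r^{k_r} & \text{if } k_1 + \dots + k_r = n, \\ 0 & \text{otherwise,} \end{cases}$$
where pi > 0 for i = 1, …, r and p1 + p2 + … + pr = 1.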
Continuous Distributions
The Uniform Distribution
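A random variable U is uniformly distributed over the interval [a, b], where
a < b, if it has the probability density function given by
$$f_U(u) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le u \le b, \\ 0 & \text{elsewhere.} \end{cases}$$
The mean and variance are, respectively,
$$E[U] = \frac{a + b}{2} \quad \text{and} \quad \mathrm{Var}[U] = \frac{(b - a)^2}{12}.$$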
Example
Calculate the cumulative distribution function of a random variable uniformly
distributed over (α, β).
Example
If X is uniformly distributed over (0, 10), calculate the probability that
(a) X<3, (b) X>7, (c) 1<X<6.
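One way to approach both examples is through the cumulative distribution function of a uniform variable on (α, β), which is 0 below α, (x − α)/(β − α) between α and β, and 1 above β. The sketch below encodes that function and evaluates the three probabilities for the interval (0, 10); the helper name `uniform_cdf` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def uniform_cdf(x, alpha, beta):
    """Cumulative distribution function of a uniform random variable on (alpha, beta)."""
    if x <= alpha:
        return Fraction(0)
    if x >= beta:
        return Fraction(1)
    return Fraction(x - alpha, beta - alpha)

alpha, beta = 0, 10
# For a continuous variable, strict and non-strict inequalities give the same probability.
p_a = uniform_cdf(3, alpha, beta)                                # Pr{X < 3}
p_b = 1 - uniform_cdf(7, alpha, beta)                            # Pr{X > 7}
p_c = uniform_cdf(6, alpha, beta) - uniform_cdf(1, alpha, beta)  # Pr{1 < X < 6}

print(p_a, p_b, p_c)
```

Each probability is simply the length of the relevant subinterval divided by the length of (0, 10).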
The Normal Distribution
We say that X is a normal random variable (or simply that X is normally
distributed) with parameters μ and σ² > 0 if the probability density function of
X is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty.$$
Figure: Normal densities with different location and scale parameters.
The density function is symmetric about the point μ, and the parameter σ² is the
variance of the distribution.
The case μ = 0 and σ² = 1 is referred to as the standard or unit normal
distribution.
If X is normally distributed with mean μ and variance σ², then Z = (X − μ)/σ has
a standard or unit normal distribution, i.e., Z is normally distributed with
parameters 0 and 1.
The standard normal density and distribution functions are given respectively
by
$$\varphi(\xi) = \frac{1}{\sqrt{2\pi}}\,e^{-\xi^2/2}, \quad -\infty < \xi < \infty,$$
and
$$\Phi(x) = \int_{-\infty}^{x} \varphi(\xi)\,d\xi, \quad -\infty < x < \infty.$$
Property
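If X is normally distributed with parameters μ and σ², then Y = αX + β is
normally distributed with parameters αμ + β and α²σ².
Exercise: Prove this property
To prove this, suppose first that α > 0 and note that FY(·), the cumulative
distribution function of the random variable Y, is given by
$$F_Y(a) = \Pr\{Y \le a\} = \Pr\{\alpha X + \beta \le a\} = \Pr\Big\{X \le \frac{a - \beta}{\alpha}\Big\} = \int_{-\infty}^{(a - \beta)/\alpha} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2/(2\sigma^2)}\,dx.$$
Using the change of variables v = αx + β, this equation leads to
$$F_Y(a) = \int_{-\infty}^{a} \frac{1}{\alpha\sigma\sqrt{2\pi}}\,\exp\Big\{-\frac{(v - (\alpha\mu + \beta))^2}{2\alpha^2\sigma^2}\Big\}\,dv.$$
Since $F_Y(a) = \int_{-\infty}^{a} f_Y(v)\,dv$, it follows from this equation that the probability
density function fY(·) is given by
$$f_Y(v) = \frac{1}{\alpha\sigma\sqrt{2\pi}}\,\exp\Big\{-\frac{(v - (\alpha\mu + \beta))^2}{2(\alpha\sigma)^2}\Big\}, \quad -\infty < v < \infty.$$
Hence, Y is normally distributed with parameters αμ + β and (ασ)². The case α < 0 is handled similarly.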
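The affine-transformation property can be explored with the `NormalDist` class from Python's standard `statistics` module, which supports multiplication and addition by constants. The sketch below assumes the illustrative values μ = 2, σ = 3, α = 4, β = 1, and compares the library's density of Y = αX + β with the normal density having parameters αμ + β and (ασ)²; the function `density_from_property` is a hypothetical helper.

```python
import math
from statistics import NormalDist

mu, sigma = 2.0, 3.0        # parameters of X (illustrative values)
alpha, beta = 4.0, 1.0      # coefficients of the affine map, with alpha > 0

X = NormalDist(mu=mu, sigma=sigma)
Y = alpha * X + beta        # NormalDist supports scaling and shifting by constants
Z = (X - mu) / sigma        # standardization

def density_from_property(v):
    """Normal density with mean alpha*mu + beta and standard deviation alpha*sigma."""
    m, s = alpha * mu + beta, alpha * sigma
    return math.exp(-((v - m) ** 2) / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

print(Y.mean, Y.variance)   # compare with alpha*mu + beta and (alpha*sigma)**2
print(Z.mean, Z.stdev)      # compare with 0 and 1
print(Y.pdf(5.0), density_from_property(5.0))
```

This is a consistency check between the library and the stated property, not a derivation; the derivation is the change of variables in the text.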
The Gamma Distribution
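The gamma distribution with parameters α > 0 and λ > 0 has probability density function
$$f(x) = \frac{\lambda}{\Gamma(\alpha)}\,(\lambda x)^{\alpha - 1} e^{-\lambda x} \quad \text{for } x > 0.$$
Figure: Gamma densities with different shape parameters α.
The gamma function is defined by
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx \quad \text{for } \alpha > 0.$$
The gamma function is a generalization of the factorial function: at integral
arguments, Γ(k) = (k − 1)! for k = 1, 2, ….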
Asymptotically, we have
for n = 0;1, ….
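Given an integer number α of independent exponentially distributed random
variables Y1, …, Yα having common parameter λ, their sum
Xα = Y1 + … + Yα has the gamma density f(x), from which we obtain the moments
$$E[X_\alpha] = \frac{\alpha}{\lambda} \quad \text{and} \quad \mathrm{Var}[X_\alpha] = \frac{\alpha}{\lambda^2}.$$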
Exercise: Prove these results
The Beta Distribution
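The beta density with parameters α > 0 and β > 0 is given by
$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha - 1}(1 - x)^{\beta - 1} & \text{for } 0 < x < 1, \\ 0 & \text{elsewhere.} \end{cases}$$
The mean and variance are, respectively,
$$E[X] = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \mathrm{Var}[X] = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}.$$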
Exercise: Prove these results
The Joint Normal Distribution
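Let σX, σY, μX, μY, and ρ be real constants subject to σX > 0, σY > 0, and −1 < ρ < 1.
For real variables x and y, define
$$Q(x, y) = \frac{1}{1 - \rho^2}\left[\Big(\frac{x - \mu_X}{\sigma_X}\Big)^2 - 2\rho\,\Big(\frac{x - \mu_X}{\sigma_X}\Big)\Big(\frac{y - \mu_Y}{\sigma_Y}\Big) + \Big(\frac{y - \mu_Y}{\sigma_Y}\Big)^2\right].$$
The joint normal (or bivariate normal) distribution for random variables X,
Y is defined by the density function
$$\varphi_{X,Y}(x, y) = \frac{1}{2\pi\,\sigma_X\,\sigma_Y\sqrt{1 - \rho^2}}\,\exp\Big\{-\tfrac{1}{2}\,Q(x, y)\Big\}, \quad -\infty < x, y < \infty.$$
The moments (means, variances, and covariance) are
$$E[X] = \mu_X, \quad E[Y] = \mu_Y, \quad \mathrm{Var}[X] = \sigma_X^2, \quad \mathrm{Var}[Y] = \sigma_Y^2, \quad \mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = \rho\,\sigma_X\,\sigma_Y.$$
The dimensionless parameter ρ is called the correlation coefficient.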
Exercise: Prove these results
Property
If ρ = 0, then X and Y are independent random variables.
Linear Combinations of Normally Distributed Random Variables
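Suppose X and Y have the bivariate normal density, and let Z = aX + bY for
arbitrary constants a, b. Then Z is normally distributed with mean
$$E[Z] = a\,\mu_X + b\,\mu_Y$$
and variance
$$\sigma_Z^2 = a^2\sigma_X^2 + 2ab\,\rho\,\sigma_X\,\sigma_Y + b^2\sigma_Y^2.$$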
Exercise: Prove these results
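For a linear combination Z = aX + bY of jointly normal variables, the standard result is that Z has mean a μ_X + b μ_Y and variance a²σ_X² + 2abρσ_Xσ_Y + b²σ_Y². The sketch below simply evaluates these two expressions; the function name `linear_combination_params` and the numerical values are illustrative assumptions, and the code does not replace the proof asked for in the exercise.

```python
def linear_combination_params(a, b, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of Z = a*X + b*Y for bivariate normal (X, Y) with correlation rho."""
    mean = a * mu_x + b * mu_y
    variance = (a * a * sigma_x**2
                + 2.0 * a * b * rho * sigma_x * sigma_y   # covariance term: Cov[X, Y] = rho*sx*sy
                + b * b * sigma_y**2)
    return mean, variance

# Illustrative values: Z = X - Y with positively correlated X and Y.
mean_z, var_z = linear_combination_params(
    a=1.0, b=-1.0, mu_x=1.0, mu_y=2.0, sigma_x=2.0, sigma_y=3.0, rho=0.5)
print(mean_z, var_z)

# With rho = 0 the variance reduces to the sum a^2*sx^2 + b^2*sy^2.
_, var_uncorrelated = linear_combination_params(1.0, -1.0, 1.0, 2.0, 2.0, 3.0, 0.0)
print(var_uncorrelated)
```

In this example the positive correlation reduces the variance of the difference X − Y from 13 to 7.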