
Probability Theory & Stochastic Processes

This document provides an overview of probability theory and stochastic processes. It defines key concepts such as sample space, events, axioms of probability, conditional probability, Bayes' law, independence, random variables, probability mass functions, binomial, Poisson, exponential, uniform and other common probability distributions. Examples are given for each concept to illustrate their application and properties. The document appears to be teaching materials for a class on probability theory and stochastic processes.

Uploaded by

mohan
Copyright
© All Rights Reserved

INSTITUTE OF AERONAUTICAL ENGINEERING

(Autonomous)
DUNDIGAL, HYDERABAD - 500043

PPT ON PROBABILITY THEORY & STOCHASTIC PROCESSES
II [Link] I semester (JNTUH-R15)

Prepared by
[Link] Swarna Latha
(Assistant professor)
[Link] kumar reddy
(Assistant professor)
Probability Introduced Through Sets and Relative Frequency
• Experiment: a random experiment is an action or process that leads to one of several possible outcomes
Experiment        Outcomes
Flip a coin       Heads, Tails
Exam marks        Numbers: 0, 1, 2, ..., 100
Assembly time     t > 0 seconds
Course grades     F, D, C, B, A, A+
Sample Space
• The list of all possible outcomes is called the sample space.
• The individual outcomes are called the simple events.
This list must be exhaustive, i.e. ALL possible outcomes must be included.
• Die roll: {1,2,3,4,5} is not exhaustive (6 is missing); {1,2,3,4,5,6} is.
• The list must be mutually exclusive, i.e. no two outcomes can occur at the same time:
• Die roll: {odd number, even number} is mutually exclusive.
• Die roll: {number less than 4, even number} is not, since a 2 belongs to both.
Sample Space
• A list of exhaustive [don't leave anything out] and mutually exclusive outcomes [impossible for 2 different outcomes to occur in the same experiment] is called a sample space and is denoted by S.
• The outcomes are denoted by O1, O2, …, Ok
• Using notation from set theory, we can represent the sample space and its outcomes as:
• S = {O1, O2, …, Ok}
• Given a sample space S = {O1, O2, …, Ok}, the probabilities assigned to the outcomes must satisfy these requirements:
(1) The probability of any outcome is between 0 and 1
• i.e. 0 ≤ P(Oi) ≤ 1 for each i, and
(2) The sum of the probabilities of all the outcomes equals 1
• i.e. P(O1) + P(O2) + … + P(Ok) = 1
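The two requirements can be checked mechanically. A minimal Python sketch, assuming the outcome probabilities are stored in a dict keyed by outcome; `is_valid_assignment` is a hypothetical helper name, and `Fraction` keeps the arithmetic exact so that six copies of 1/6 sum to exactly 1:

```python
from fractions import Fraction

def is_valid_assignment(probs):
    """Check requirements (1) and (2) for a dict {outcome: P(outcome)} (hypothetical helper)."""
    values = list(probs.values())
    each_in_range = all(0 <= p <= 1 for p in values)   # requirement (1)
    sums_to_one = sum(values) == 1                      # requirement (2)
    return each_in_range and sums_to_one

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
bad_coin = {"H": Fraction(1, 2), "T": Fraction(1, 3)}   # sums to 5/6, so not valid
print(is_valid_assignment(fair_die))   # True
print(is_valid_assignment(bad_coin))   # False
```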
Relative Frequency
Consider a random experiment with sample space S. We shall assign a non-negative number, called the probability, to each event in the sample space.
Let A be a particular event in S. Then "the probability of event A" is denoted by P(A).
Suppose that the random experiment is repeated n times. If the event A occurs n_A times, then the ratio n_A/n is called the relative frequency of A.
• Relative Frequency Definition: The probability of an event A is defined as the limit of its relative frequency,
$$P(A) = \lim_{n \to \infty} \frac{n_A}{n}$$
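The limit can be illustrated by simulation. The sketch below uses Python's `random` module with a fixed seed (an assumption made only so that runs are reproducible); the event A is "a fair die shows an even number", whose probability is 1/2:

```python
import random

def relative_frequency(event, n, seed=0):
    """Return n_A / n, where n_A counts how often `event` occurs in n simulated fair-die rolls."""
    rng = random.Random(seed)                  # seeded only for reproducibility
    n_A = sum(1 for _ in range(n) if event(rng.randint(1, 6)))
    return n_A / n

def is_even(face):
    return face % 2 == 0

for n in (100, 10_000, 200_000):
    print(n, relative_frequency(is_even, n))  # tends toward P(EVEN) = 1/2 as n grows
```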
Axioms of Probability
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:
(i) $P(A) \ge 0$ (probability is a nonnegative number)
(ii) $P(S) = 1$ (probability of the whole set is unity)
(iii) If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
(Note that (iii) states that if A and B are mutually exclusive (M.E.) events, the probability of their union is the sum of their probabilities.)
Events
• The probability of an event is the sum of the
probabilities of the simple events that
constitute the event.
• E.g. (assuming a fair die) S = {1, 2, 3, 4, 5, 6}
and
• P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Then:
• P(EVEN) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
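The same computation as a Python sketch: the probability of an event is obtained by adding the probabilities of the simple events it contains. `prob` is a hypothetical helper name, and events are represented as sets of faces:

```python
from fractions import Fraction

# Fair die: each simple event gets probability 1/6.
P = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(event) = sum of the probabilities of the simple events in it."""
    return sum(P[o] for o in event)

EVEN = {2, 4, 6}
print(prob(EVEN))                 # 1/2
print(prob(set(range(1, 7))))     # 1, the whole sample space
```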
Conditional Probability
• Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
• Experiment: randomly select one student in the class.
• P(randomly selected student is male) =
• P(randomly selected student is male | student is in the 3rd row) =
• Conditional probabilities are written as P(A | B), read as "the probability of A given B", and calculated as
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}, \quad P(B) > 0$$
• Hence P(A and B) = P(A)·P(B | A) = P(B)·P(A | B); both are true
• Keep this in mind!
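When outcomes are equally likely, conditional probabilities can be computed by counting. A sketch with a fair die, where A = "even face" and B = "face greater than 3" are example events chosen only for illustration; the last line checks the multiplication rule in both orders:

```python
from fractions import Fraction

S = range(1, 7)                      # fair die, equally likely outcomes

def P(event):
    return Fraction(sum(1 for o in S if event(o)), len(S))

def P_given(A, B):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    return P(lambda o: A(o) and B(o)) / P(B)

A = lambda o: o % 2 == 0             # even face
B = lambda o: o > 3                  # face greater than 3
print(P_given(A, B))                 # 2/3
# Multiplication rule: P(A and B) = P(B) P(A|B) = P(A) P(B|A)
assert P(lambda o: A(o) and B(o)) == P(B) * P_given(A, B) == P(A) * P_given(B, A)
```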
Bayes’ Law
• Bayes' Law is named for Thomas Bayes, an eighteenth-century mathematician.
• In its most basic form, if we know P(B | A), we can apply Bayes' Law to determine P(A | B):
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
• The probabilities P(A) and P(AC) are called
prior probabilities because they are
determined prior to the decision about taking
the preparatory course.
• The conditional probability P(A | B) is called a
posterior probability (or revised probability),
because the prior probability is revised after
the decision about taking the preparatory
course.
Total Probability Theorem
• Take events A_i for i = 1 to k to be:
– Mutually exclusive: $A_i \cap A_j = \emptyset$ for all $i \ne j$
– Exhaustive: $A_1 \cup \cdots \cup A_k = S$
For any event B on S,
$$p(B) = p(B \mid A_1)\,p(A_1) + \cdots + p(B \mid A_k)\,p(A_k) = \sum_{i=1}^{k} p(B \mid A_i)\,p(A_i)$$
Bayes' theorem follows:
$$p(A_j \mid B) = \frac{p(A_j \cap B)}{p(B)} = \frac{p(B \mid A_j)\,p(A_j)}{\sum_{i=1}^{k} p(B \mid A_i)\,p(A_i)}$$
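A sketch of the total probability theorem followed by Bayes' theorem. The numbers are hypothetical, chosen only to exercise the formulas: machines A1, A2, A3 make 50%, 30% and 20% of all parts, with defect rates 1%, 2% and 3%, and B is the event that a randomly chosen part is defective. `total_and_posterior` is a hypothetical helper name:

```python
def total_and_posterior(priors, likelihoods):
    """priors[i] = p(A_i), likelihoods[i] = p(B | A_i) for a partition A_1..A_k.

    Returns p(B) from the total probability theorem and the list of
    posteriors p(A_j | B) from Bayes' theorem.
    """
    p_B = sum(l * p for l, p in zip(likelihoods, priors))
    posteriors = [l * p / p_B for l, p in zip(likelihoods, priors)]
    return p_B, posteriors

p_B, post = total_and_posterior([0.5, 0.3, 0.2], [0.01, 0.02, 0.03])
print(round(p_B, 4))                  # 0.017
print([round(x, 4) for x in post])    # [0.2941, 0.3529, 0.3529]
```

The priors 0.5, 0.3, 0.2 are revised into posteriors: a defective part is less likely to come from A1 than its 50% share suggests.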
Independence
• Do A and B depend on one another?
– Yes! B is more likely to be true if A is.
– A should be more likely if B is.
• If independent
$$p(A \cap B) = p(A)\,p(B)$$
$$p(A \mid B) = p(A), \qquad p(B \mid A) = p(B)$$
• If dependent
$$p(A \cap B) \ne p(A)\,p(B)$$
$$p(A \cup B) = p(A) + p(B) - p(A \cap B)$$
$$p(A \cap B) = p(B \mid A)\,p(A)$$
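Independence can be tested directly from the definition p(A ∩ B) = p(A)p(B). A sketch on two fair dice, with example events chosen for illustration: "first die is even" turns out to be independent of "sum is 7" but not of "sum is 8":

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice, 36 equally likely outcomes

def P(event):
    return Fraction(sum(1 for o in S if event(o)), len(S))

def independent(A, B):
    """True if p(A and B) == p(A) p(B)."""
    return P(lambda o: A(o) and B(o)) == P(A) * P(B)

first_even = lambda o: o[0] % 2 == 0
sum_is_7 = lambda o: o[0] + o[1] == 7
sum_is_8 = lambda o: o[0] + o[1] == 8
print(independent(first_even, sum_is_7))   # True
print(independent(first_even, sum_is_8))   # False
```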
Random Variable
• Random variable
– Assigns a numerical value to each outcome of a particular experiment
[Figure: outcomes in the sample space S mapped to points −3, −2, −1, 0, 1, 2, 3 on the real line]
• Example 1: Machine Breakdowns
– Sample space: S = {electrical, mechanical, misuse}
– Each of these failures may be associated with a repair cost
– State space: {50, 200, 350}
– Cost is a random variable: 50, 200, and 350
• Probability Mass Function (p.m.f.)
– A set of probability values assigned to each of the values x_i taken by the discrete random variable
– $0 \le p_i \le 1$ and $\sum_i p_i = 1$
– Probability: $P(X = x_i) = p_i$
Continuous and Discrete Random Variables
• Discrete random variables have a countable number of outcomes
– Examples: dead/alive, treatment/placebo, dice, counts, etc.
• Continuous random variables have an infinite continuum of possible values.
– Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.
• Distribution function: $F_X(x) = P(X \le x)$
• If F_X(x) is a continuous function of x, then X is a continuous random variable.
– F_X(x) discrete in x (a staircase of jumps) ⇒ discrete r.v.'s
– F_X(x) piecewise continuous (continuous pieces plus jumps) ⇒ mixed r.v.'s
– PROPERTIES: $0 \le F_X(x) \le 1$; $F_X(-\infty) = 0$ and $F_X(\infty) = 1$; F_X(x) is nondecreasing and continuous from the right; $P(a < X \le b) = F_X(b) - F_X(a)$.
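Probability Density Function (pdf)
• X: continuous r.v., then $f(x) = \dfrac{dF(x)}{dx}$
• pdf properties:
1. $f(x) \ge 0$ for all x
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
• $F(t) = \int_{-\infty}^{t} f(x)\,dx$, which for a nonnegative r.v. (f(x) = 0 for x < 0) reduces to $F(t) = \int_{0}^{t} f(x)\,dx$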
Binomial
• Suppose that the probability of success is p
• What is the probability of failure?
q = 1 – p
• Examples
– Toss of a coin (S = head): p = 0.5 ⇒ q = 0.5
– Roll of a die (S = 1): p = 0.1667 ⇒ q = 0.8333
– Fertility of a chicken egg (S = fertile): p = 0.8 ⇒ q = 0.2
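Binomial
• Imagine that a trial is repeated n times
• Examples
– A coin is tossed 5 times
– A die is rolled 25 times
– 50 chicken eggs are examined
• Assume p remains constant from trial to trial and that the trials are statistically independent of each other
• Example
– What is the probability of obtaining 2 heads from a coin that was tossed 5 times?
P(HHTTT) = (1/2)^5 = 1/32
Every other ordering with exactly 2 heads (HTHTT, TTTHH, ...) also has probability 1/32, and there are C(5, 2) = 10 such orderings, so P(2 heads) = 10/32 = 5/16.
• In general, the probability of x successes in n trials is
$$P(x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, \ldots, n$$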
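The binomial probability can be computed with `math.comb` (Python 3.8+). A sketch using exact fractions, so the coin example reproduces 5/16 exactly:

```python
from math import comb
from fractions import Fraction

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) p^x q^(n-x) with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

p = Fraction(1, 2)                                     # fair coin, exact arithmetic
print(binomial_pmf(2, 5, p))                           # 5/16
print(sum(binomial_pmf(x, 5, p) for x in range(6)))    # 1
```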


Poisson
• When there is a large number of trials but a small probability of success, the binomial calculation becomes impractical
– Example: number of deaths from horse kicks in the Army in different years
• The mean number of successes from n trials is µ = np
– Example: 64 deaths in 20 years from thousands of soldiers, i.e. µ = 64/20 = 3.2 deaths per year
If we substitute µ/n for p and let n tend to infinity, the binomial distribution becomes the Poisson distribution:
$$P(x) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$
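The limit can be seen numerically. This sketch compares Binomial(n, µ/n) with Poisson(µ) at x = 2 for growing n, using µ = 64/20 = 3.2 from the horse-kick example; the choice x = 2 is arbitrary:

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P(x) = e^{-mu} mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

mu = 64 / 20          # horse-kick example: 3.2 deaths per year on average
for n in (10, 100, 10_000):
    # Binomial(n, mu/n) approaches Poisson(mu) as n grows
    print(n, binomial_pmf(2, n, mu / n), poisson_pmf(2, mu))
```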
Poisson
• The Poisson distribution is applied where random events in space or time are expected to occur
• Deviation from the Poisson distribution may indicate some degree of non-randomness in the events under study
• Investigation of the cause may be of interest
Exponential Distribution
Uniform
Most (pseudo)random number generators produce deviates from the U(0,1) distribution; that is, if you generate a large number of such random variables and plot their empirical distribution function, it will approach the U(0,1) distribution function in the limit.
U(a,b): the pdf is constant over the (a,b) interval and the CDF is the ramp function
[Figure: U(0,1) pdf and cdf plotted against time]
Uniform distribution
$$F(x) = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \le x \le b, \\ 1, & x > b. \end{cases}$$
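A sketch of the ramp CDF next to an empirical distribution function built from Python's generator (`random.random()` returns deviates in [0, 1)). The seed and sample size are arbitrary choices:

```python
import random

def uniform_cdf(x, a=0.0, b=1.0):
    """Ramp CDF of U(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def empirical_cdf(samples, x):
    """Fraction of samples that are <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

rng = random.Random(1)
samples = [rng.random() for _ in range(100_000)]
for x in (0.1, 0.5, 0.9):
    print(x, empirical_cdf(samples, x), uniform_cdf(x))
```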
Gaussian (Normal) Distribution
• Bell-shaped pdf – intuitively pleasing!
• Central Limit Theorem: the (suitably normalized) mean of a large number of mutually independent r.v.'s (having arbitrary distributions with finite variance) starts following the normal distribution as n → ∞
• μ: mean, σ: std. deviation, σ²: variance (N(μ, σ²)); the pdf is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}$$
• μ and σ completely describe the statistics. This is significant in statistical estimation, signal processing, communication theory, etc.
• N(0,1) is called the normalized Gaussian.
• N(0,1) is symmetric, i.e.
– f(x) = f(−x)
– F(−z) = 1 − F(z).
• The failure rate h(t) follows IFR (increasing failure rate) behavior.
– Hence, N(μ, σ²) is suitable for modeling long-term wear or aging-related failure phenomena
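A sketch of the N(0,1) cdf written with `math.erf`, using the standard identity F(z) = ½[1 + erf(z/√2)], a check of the symmetry property, and a small Central Limit Theorem experiment. Averaging n = 12 uniform deviates and the seed are arbitrary illustrative choices; `clt_fraction` is a hypothetical helper name:

```python
import math
import random

def normal_cdf(z, mu=0.0, sigma=1.0):
    """F(z) for N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((z - mu) / (sigma * math.sqrt(2))))

# Symmetry of N(0, 1): F(-z) = 1 - F(z)
for z in (0.5, 1.0, 2.0):
    print(z, normal_cdf(-z), 1 - normal_cdf(z))

def clt_fraction(z, n=12, trials=50_000, seed=2):
    """Fraction of standardized means of n U(0,1) draws that are <= z."""
    rng = random.Random(seed)
    scale = math.sqrt(1 / 12 / n)      # std. deviation of the mean; U(0,1) has variance 1/12
    hits = 0
    for _ in range(trials):
        m = sum(rng.random() for _ in range(n)) / n
        if (m - 0.5) / scale <= z:
            hits += 1
    return hits / trials

print(clt_fraction(1.0), normal_cdf(1.0))    # both roughly 0.84
```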
Exponential Distribution
Conditional Distributions
• The conditional distribution of Y given X=1 is:
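• While marginal distributions are obtained from the bivariate distribution by summing, conditional distributions are obtained by "making a cut" through the bivariate distribution and rescaling that cut so it sums to 1 (i.e. dividing by the marginal probability of the conditioning value)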
The Expectation of a Random Variable
Expectation of a discrete random variable with p.m.f. $P(X = x_i) = p_i$:
$$E(X) = \sum_i p_i x_i$$
Expectation of a continuous random variable with p.d.f. f(x):
$$E(X) = \int_{\text{state space}} x f(x)\,dx$$
Expectation of X = mean of X = average of X:
$$E[X] = \overline{X} = \int_{-\infty}^{\infty} x f_X(x)\,dx \qquad \text{continuous r.v.}$$
$$E[X] = \overline{X} = \sum_{i=1}^{N} x_i P(x_i) \qquad \text{discrete r.v.}$$
If $f_X(x + a) = f_X(-x + a)$ for all x (pdf symmetric about x = a), then $E[X] = a$.
X r.v. ⇒ Y = g(X) is also a r.v. Ex: $Y = g(X) = X^2$ with
$$P(X = 0) = P(X = -1) = P(X = 1) = \frac{1}{3} \;\Rightarrow\; P(Y = 0) = \frac{1}{3}, \quad P(Y = 1) = \frac{2}{3}$$
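A sketch of the Y = X² example: the pmf of Y is built by adding P(x_i) over all x_i that map to the same y, and E[Y] comes out the same whether computed from the pmf of Y or directly as E[g(X)]. `expectation` is a hypothetical helper name:

```python
from fractions import Fraction
from collections import defaultdict

# X takes -1, 0, 1 with probability 1/3 each (the example above).
pmf_X = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum_i g(x_i) P(x_i) for a discrete r.v."""
    return sum(g(x) * p for x, p in pmf.items())

# pmf of Y = g(X) = X^2: add up P(x_i) over all x_i mapping to the same y.
pmf_Y = defaultdict(Fraction)
for x, p in pmf_X.items():
    pmf_Y[x**2] += p

print(dict(pmf_Y))                      # {1: Fraction(2, 3), 0: Fraction(1, 3)}
print(expectation(pmf_X))               # 0
print(expectation(pmf_Y), expectation(pmf_X, lambda x: x**2))   # 2/3 2/3
```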
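Expectation
Expectation of a function of a r.v. X:
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \qquad \text{continuous r.v.}$$
$$E[g(X)] = \sum_{i=1}^{N} g(x_i) P(x_i) \qquad \text{discrete r.v.}$$
Conditional expectation of a r.v. X:
$$E[X \mid B] = \int_{-\infty}^{\infty} x f_X(x \mid B)\,dx \qquad \text{continuous r.v.}$$
$$E[X \mid B] = \sum_{i=1}^{N} x_i P(x_i \mid B) \qquad \text{discrete r.v.}$$
Ex: $B = \{X \le b\}$
$$f_X(x \mid X \le b) = \begin{cases} \dfrac{f_X(x)}{\int_{-\infty}^{b} f_X(x)\,dx}, & x < b \\ 0, & x \ge b \end{cases} \quad\Rightarrow\quad E[X \mid X \le b] = \frac{\int_{-\infty}^{b} x f_X(x)\,dx}{\int_{-\infty}^{b} f_X(x)\,dx}$$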
Moments
n-th moment of a r.v. X:
$$m_n = E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx \qquad \text{continuous r.v.}$$
$$m_n = E[X^n] = \sum_{i=1}^{N} x_i^n P(x_i) \qquad \text{discrete r.v.}$$
$m_0 = 1, \quad m_1 = \overline{X}$
Properties of expectation:
(1) $E[c] = c$, c a constant
(2) $E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)]$
Proof:
$$E[c] = \int_{-\infty}^{\infty} c f_X(x)\,dx = c \int_{-\infty}^{\infty} f_X(x)\,dx = c$$
$$E[a g(X) + b h(X)] = \int_{-\infty}^{\infty} \{a g(x) + b h(x)\} f_X(x)\,dx = a \int_{-\infty}^{\infty} g(x) f_X(x)\,dx + b \int_{-\infty}^{\infty} h(x) f_X(x)\,dx = a E[g(X)] + b E[h(X)]$$
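Variance of a r.v. X:
$$\sigma_X^2 = \mu_2 = E[(X - \overline{X})^2] = E[X^2 - 2X\overline{X} + \overline{X}^2] = E[X^2] - 2\overline{X}E[X] + \overline{X}^2 = m_2 - m_1^2$$
Standard deviation of a r.v. X: $\sigma_X \; (\ge 0)$
Skewness of a r.v. X: $\dfrac{\mu_3}{\sigma_X^3}$, where $\mu_3 = E[(X - \overline{X})^3]$
If $f_X(x)$ is symmetric about $x = \overline{X}$, then $\mu_3 = 0$.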
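Ex 3.2-1 & Ex 3.2-2: exponential r.v.
$$f_X(x) = \begin{cases} \dfrac{1}{b} e^{-(x-a)/b}, & x > a \\ 0, & x < a \end{cases}$$
$$m_1 = E[X] = \int_a^{\infty} \frac{x}{b} e^{-(x-a)/b}\,dx = a + b$$
$$m_2 = E[X^2] = \int_a^{\infty} \frac{x^2}{b} e^{-(x-a)/b}\,dx = (a + b)^2 + b^2$$
$$\sigma_X^2 = \mu_2 = m_2 - m_1^2 = b^2$$
$$m_3 = E[X^3] = \int_a^{\infty} \frac{x^3}{b} e^{-(x-a)/b}\,dx = a^3 + 3a^2 b + 6ab^2 + 6b^3$$
$$\mu_3 = E[(X - \overline{X})^3] = E[X^3 - 3X^2\overline{X} + 3X\overline{X}^2 - \overline{X}^3] = m_3 - 3m_1 m_2 + 3m_1^2 m_1 - m_1^3$$
$$= a^3 + 3a^2 b + 6ab^2 + 6b^3 - 3(a + b)\{(a + b)^2 + b^2\} + 2(a + b)^3 = 2b^3$$
Skewness of X: $\dfrac{\mu_3}{\sigma_X^3} = \dfrac{2b^3}{b^3} = 2$
Chebyshev's inequality:
$$P[|X - \overline{X}| \ge \varepsilon] \le \frac{\sigma_X^2}{\varepsilon^2}$$
Proof:
$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \overline{X})^2 f_X(x)\,dx \ge \int_{|x - \overline{X}| \ge \varepsilon} (x - \overline{X})^2 f_X(x)\,dx \ge \varepsilon^2 \int_{|x - \overline{X}| \ge \varepsilon} f_X(x)\,dx = \varepsilon^2 P[|X - \overline{X}| \ge \varepsilon]$$
Markov's inequality: if $P[X < 0] = 0$, then for any a > 0
$$P[X \ge a] \le \frac{E[X]}{a}$$
Ex 3.2-3:
$$P[|X - \overline{X}| \ge 3\sigma_X] \le \frac{\sigma_X^2}{9\sigma_X^2} = \frac{1}{9}$$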
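The exponential results and the two inequalities can be checked numerically. This sketch assumes illustrative parameters a = 1, b = 2 for the moments (any b > 0 works) and integrates with a plain midpoint rule over [a, a + 60b], where the neglected tail is tiny. It then compares exact tail probabilities of the a = 0, b = 1 exponential (mean 1, variance 1) with the Chebyshev and Markov bounds. Helper names such as `moment` are hypothetical:

```python
import math

def exp_pdf(x, a, b):
    """Shifted exponential pdf: (1/b) e^{-(x-a)/b} for x > a, else 0."""
    return math.exp(-(x - a) / b) / b if x > a else 0.0

def moment(n, a, b, steps=100_000, span=60.0):
    """m_n = E[X^n] by the midpoint rule on [a, a + span*b]; the tail beyond is negligible."""
    h = span * b / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += x**n * exp_pdf(x, a, b)
    return total * h

a, b = 1.0, 2.0                           # assumed illustrative parameters
m1, m2, m3 = (moment(n, a, b) for n in (1, 2, 3))
var = m2 - m1**2                          # sigma_X^2 = m2 - m1^2
mu3 = m3 - 3 * m1 * m2 + 2 * m1**3        # third central moment
skew = mu3 / var**1.5
print(m1, var, skew)                      # about 3.0 (= a + b), 4.0 (= b^2), 2.0

# Chebyshev and Markov for X exponential with a = 0, b = 1 (mean 1, variance 1).
def chebyshev_pair(k):
    """Exact P[|X - 1| >= k] and the bound 1/k^2, for k >= 1."""
    # X >= 0 and 1 - k <= 0, so only the upper tail X >= 1 + k contributes.
    return math.exp(-(1 + k)), 1 / k**2

def markov_pair(t):
    """Exact P[X >= t] = e^{-t} and the bound E[X]/t = 1/t, for t > 0."""
    return math.exp(-t), 1 / t

for k in (2, 3, 4):
    print(k, *chebyshev_pair(k))          # k = 3: e^{-4} is about 0.018, bound 1/9 about 0.111
for t in (0.5, 1.0, 2.0, 5.0):
    print(t, *markov_pair(t))
```

Both bounds hold but are loose for this distribution, which is typical: they use only the mean and variance.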
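Characteristic function of r.v. X:
$$\Phi_X(\omega) = E[e^{j\omega X}] = \int_{-\infty}^{\infty} f_X(x) e^{j\omega x}\,dx$$
$\Phi_X(\omega)$ and $f_X(x)$ form a Fourier transform pair:
$$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_X(\omega) e^{-j\omega x}\,d\omega$$
$$|\Phi_X(\omega)| \le \int_{-\infty}^{\infty} f_X(x) \left|e^{j\omega x}\right| dx = \int_{-\infty}^{\infty} f_X(x)\,dx = 1 = \Phi_X(0)$$
$$\left.\frac{d^n \Phi_X(\omega)}{d\omega^n}\right|_{\omega=0} = \left.\int_{-\infty}^{\infty} f_X(x) (jx)^n e^{j\omega x}\,dx\right|_{\omega=0} = j^n \int_{-\infty}^{\infty} f_X(x) x^n\,dx = j^n E[X^n]$$
$$m_n = (-j)^n \left.\frac{d^n \Phi_X(\omega)}{d\omega^n}\right|_{\omega=0}$$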
Functions That Give Moments
Moment generating function of r.v. X:
$$M_X(v) = E[e^{vX}] = \int_{-\infty}^{\infty} f_X(x) e^{vx}\,dx$$
$$\left.\frac{d^n M_X(v)}{dv^n}\right|_{v=0} = \left.\int_{-\infty}^{\infty} f_X(x) x^n e^{vx}\,dx\right|_{v=0} = \int_{-\infty}^{\infty} f_X(x) x^n\,dx = m_n$$

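Ex 3.3-1 & Ex 3.3-2: for the exponential r.v.
$$f_X(x) = \begin{cases} \dfrac{1}{b} e^{-(x-a)/b}, & x > a \\ 0, & x < a \end{cases}$$
$$\Phi_X(\omega) = E[e^{j\omega X}] = \int_a^{\infty} \frac{1}{b} e^{-(x-a)/b} e^{j\omega x}\,dx = \frac{e^{a/b}}{b} \int_a^{\infty} e^{-(1/b - j\omega)x}\,dx = \frac{e^{a/b}}{b} \cdot \frac{e^{-(1/b - j\omega)a}}{1/b - j\omega} = \frac{e^{j\omega a}}{1 - j\omega b}$$
$$\frac{d\Phi_X(\omega)}{d\omega} = \frac{ja\,e^{j\omega a}(1 - j\omega b) + e^{j\omega a}\,jb}{(1 - j\omega b)^2}$$
$$m_1 = (-j) \left.\frac{d\Phi_X(\omega)}{d\omega}\right|_{\omega=0} = (-j)(ja + jb) = a + b$$
$$M_X(v) = E[e^{vX}] = \frac{e^{va}}{1 - vb}, \quad v < 1/b$$
$$\frac{dM_X(v)}{dv} = \frac{a\,e^{va}(1 - vb) + b\,e^{va}}{(1 - vb)^2}, \qquad m_1 = \left.\frac{dM_X(v)}{dv}\right|_{v=0} = a + b$$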
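Both routes to the moments can be checked with finite differences. A sketch assuming illustrative parameters a = 1, b = 2; the step sizes are ad hoc choices that balance truncation and rounding error:

```python
import cmath
import math

a, b = 1.0, 2.0            # assumed illustrative parameters (need b > 0)

def Phi(w):
    """Characteristic function of the shifted exponential: e^{j w a} / (1 - j w b)."""
    return cmath.exp(1j * w * a) / (1 - 1j * w * b)

def M(v):
    """Moment generating function e^{v a} / (1 - v b), valid for v < 1/b."""
    return math.exp(v * a) / (1 - v * b)

h = 1e-5
dPhi = (Phi(h) - Phi(-h)) / (2 * h)                      # central difference at w = 0
m1_from_Phi = (-1j * dPhi).real                          # m_1 = (-j) dPhi/dw at w = 0
m1_from_M = (M(h) - M(-h)) / (2 * h)                     # m_1 = dM/dv at v = 0
m2_from_M = (M(1e-4) - 2 * M(0.0) + M(-1e-4)) / 1e-8     # m_2 = d^2M/dv^2 at v = 0
print(m1_from_Phi, m1_from_M, a + b)                     # all about 3.0
print(m2_from_M, (a + b) ** 2 + b**2)                    # both about 13.0
```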
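Chernoff's inequality (Ex 3.3-3): for any v ≥ 0, since the unit step satisfies $u(x - a) \le e^{v(x-a)}$ for all x,
$$P[X \ge a] = \int_a^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} f_X(x)\,u(x - a)\,dx \le \int_{-\infty}^{\infty} f_X(x)\,e^{v(x-a)}\,dx = e^{-va} M_X(v)$$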
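A sketch of the Chernoff bound for a standard normal X, assumed here because its MGF M_X(v) = e^{v²/2} is simple. The threshold a is renamed t to avoid a clash with the exponential example. Minimizing e^{-vt}M_X(v) over a grid of v ≥ 0 recovers the closed-form optimum e^{-t²/2} at v = t:

```python
import math

def gaussian_tail(t):
    """Exact P[X >= t] for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def chernoff_bound(t, vs):
    """Smallest e^{-v t} M_X(v) over the given v >= 0, with M_X(v) = e^{v^2/2} for N(0, 1)."""
    return min(math.exp(-v * t + v * v / 2) for v in vs)

grid = [k / 100 for k in range(1001)]          # v = 0.00, 0.01, ..., 10.00
for t in (1.0, 2.0, 3.0):
    # The optimum is v = t, which gives the closed form e^{-t^2/2}
    print(t, gaussian_tail(t), chernoff_bound(t, grid), math.exp(-t * t / 2))
```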
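Transformations of a Random Variable
$Y = T(X)$, $f_X(x)$ given ⇒ $f_Y(y) = ?$
Monotone increasing: $T(x_1) < T(x_2)$ for any $x_1 < x_2$
Monotone decreasing: $T(x_1) > T(x_2)$ for any $x_1 < x_2$
Assume T(·) monotone increasing, Y = T(X), and let $y_0 = T(x_0)$:
$$F_Y(y_0) = P[Y \le y_0] = P[X \le x_0] = F_X(x_0)$$
$$\int_{-\infty}^{y_0} f_Y(y)\,dy = \int_{-\infty}^{T^{-1}(y_0)} f_X(x)\,dx$$
Differentiating both sides with respect to $y_0$:
$$f_Y(y_0) = f_X[T^{-1}(y_0)]\,\frac{dT^{-1}(y_0)}{dy_0}$$
$$f_Y(y) = f_X[T^{-1}(y)]\,\frac{dT^{-1}(y)}{dy} = f_X(x)\,\frac{dx}{dy}$$
Assume T(·) monotone decreasing, Y = T(X):
$$F_Y(y_0) = P[Y \le y_0] = P[X \ge x_0] = 1 - F_X(x_0)$$
$$f_Y(y) = -f_X(x)\,\frac{dx}{dy}$$
Monotone T(·), either case:
$$f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \frac{f_X(x)}{\left|dy/dx\right|}$$
Nonmonotone T(·): sum over all real roots $x_n$ of $y = T(x)$:
$$f_Y(y) = \sum_n \frac{f_X(x_n)}{\left|\dfrac{dT(x)}{dx}\right|_{x = x_n}}$$
Ex 3.4-2: $Y = T(X) = cX^2$ with c > 0 (nonmonotone; the roots are $x = \pm\sqrt{y/c}$)
$$f_Y(y) = f_X\!\left(\sqrt{y/c}\right)\left|\frac{d\sqrt{y/c}}{dy}\right| + f_X\!\left(-\sqrt{y/c}\right)\left|\frac{d(-\sqrt{y/c})}{dy}\right| = \frac{f_X(\sqrt{y/c}) + f_X(-\sqrt{y/c})}{2\sqrt{cy}}, \quad y > 0$$
and $f_Y(y) = 0$ for y < 0.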
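The two-root formula can be checked against a direct cdf. The sketch assumes X ~ N(0, 1) and c = 2 (illustrative choices). It integrates f_Y over [0.5, 3] with a midpoint rule and compares the result with F_Y(3) − F_Y(0.5), where F_Y(y) = P[|X| ≤ √(y/c)] = erf(√(y/c)/√2):

```python
import math

def f_X(x):
    """Standard normal pdf; the input distribution is an assumption for this sketch."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_Y(y, c):
    """pdf of Y = c X^2 (c > 0) from the two-root formula."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y / c)                       # the roots are x = +r and x = -r
    return (f_X(r) + f_X(-r)) / (2 * math.sqrt(c * y))

def F_Y(y, c):
    """Direct cdf: P[c X^2 <= y] = P[|X| <= sqrt(y/c)] = erf(sqrt(y/c) / sqrt(2))."""
    return math.erf(math.sqrt(y / c) / math.sqrt(2)) if y > 0 else 0.0

c, y1, y2, steps = 2.0, 0.5, 3.0, 10_000
h = (y2 - y1) / steps
integral = sum(f_Y(y1 + (k + 0.5) * h, c) for k in range(steps)) * h   # midpoint rule
print(integral, F_Y(y2, c) - F_Y(y1, c))      # the two values agree closely
```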
MULTIPLE RANDOM VARIABLES and OPERATIONS:
MULTIPLE RANDOM VARIABLES :
Vector Random Variables
A vector random variable X is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment.
Events and Probabilities
EXAMPLE 4.4
Consider the two-dimensional random variable X = (X, Y). Find the region of the plane corresponding to the events
$A = \{X + Y \le 10\}$,
$B = \{\min(X, Y) \le 5\}$, and
$C = \{X^2 + Y^2 \le 100\}$.
The regions corresponding to events A and C are straightforward to find and are shown in Fig. 4.1.
Independence
The one-dimensional random variables X and Y are "independent" if, for any event A1 that involves X only and any event A2 that involves Y only,
$$P[X \text{ in } A_1, Y \text{ in } A_2] = P[X \text{ in } A_1]\,P[Y \text{ in } A_2].$$
In the general case of n random variables, we say that the random variables X1, X2, …, Xn are independent if
$$P[X_1 \text{ in } A_1, \ldots, X_n \text{ in } A_n] = P[X_1 \text{ in } A_1] \cdots P[X_n \text{ in } A_n], \qquad (4.3)$$
where A_k is an event that involves X_k only.


Pairs of Discrete Random Variables
Let the vector random variable X = (X, Y) assume values from some countable set $S = \{(x_j, y_k),\ j = 1, 2, \ldots,\ k = 1, 2, \ldots\}$. The joint probability mass function of X specifies the probabilities of the product-form event $\{X = x_j\} \cap \{Y = y_k\}$:
$$p_{X,Y}(x_j, y_k) = P[\{X = x_j\} \cap \{Y = y_k\}] = P[X = x_j, Y = y_k], \quad j = 1, 2, \ldots,\ k = 1, 2, \ldots \qquad (4.4)$$
The probability of any event A is the sum of the pmf over the outcomes in A:
$$P[X \text{ in } A] = \sum_{(x_j, y_k) \text{ in } A} p_{X,Y}(x_j, y_k). \qquad (4.5)$$
Taking A to be the whole sample space S gives
$$\sum_{j=1}^{\infty} \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k) = 1. \qquad (4.6)$$
The marginal probability mass functions:
$$p_X(x_j) = P[X = x_j] = P[X = x_j, Y = \text{anything}] = P[\{X = x_j \text{ and } Y = y_1\} \cup \{X = x_j \text{ and } Y = y_2\} \cup \cdots] = \sum_{k=1}^{\infty} p_{X,Y}(x_j, y_k), \qquad (4.7a)$$
$$p_Y(y_k) = P[Y = y_k] = \sum_{j=1}^{\infty} p_{X,Y}(x_j, y_k). \qquad (4.7b)$$
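A sketch of (4.5) through (4.7b) on a small joint pmf. The table values are hypothetical, chosen only so that the arithmetic is easy to follow; the joint pmf is stored as a dict keyed by (x, y):

```python
from fractions import Fraction as Fr

# Hypothetical joint pmf p_{X,Y}(x, y) on a small countable set.
joint = {
    (0, 0): Fr(1, 8), (0, 1): Fr(1, 8),
    (1, 0): Fr(1, 4), (1, 1): Fr(1, 2),
}
assert sum(joint.values()) == 1                                  # (4.6)

def marginal_X(x):
    return sum(p for (xj, yk), p in joint.items() if xj == x)    # (4.7a)

def marginal_Y(y):
    return sum(p for (xj, yk), p in joint.items() if yk == y)    # (4.7b)

def prob(A):
    """P[X in A] = sum of the pmf over the outcomes in A, (4.5)."""
    return sum(p for xy, p in joint.items() if A(*xy))

print(marginal_X(0), marginal_X(1))          # 1/4 3/4
print(marginal_Y(0), marginal_Y(1))          # 3/8 5/8
print(prob(lambda x, y: x + y >= 1))         # 7/8
```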
The Joint cdf of X and Y
The joint cumulative distribution function of X and Y is defined as the probability of the product-form event $\{X \le x_1\} \cap \{Y \le y_1\}$:
$$F_{X,Y}(x_1, y_1) = P[X \le x_1, Y \le y_1]. \qquad (4.8)$$
The joint cdf is nondecreasing in the "northeast" direction,
(i) $F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2)$ if $x_1 \le x_2$ and $y_1 \le y_2$.
It is impossible for either X or Y to assume a value less than −∞, therefore
(ii) $F_{X,Y}(-\infty, y_1) = F_{X,Y}(x_2, -\infty) = 0$.
It is certain that X and Y will assume values less than infinity, therefore
(iii) $F_{X,Y}(\infty, \infty) = 1$.
If we let one of the variables approach infinity while keeping the other fixed, we obtain the marginal cumulative distribution functions
(iv) $F_X(x) = F_{X,Y}(x, \infty) = P[X \le x, Y < \infty] = P[X \le x]$
and
$F_Y(y) = F_{X,Y}(\infty, y) = P[Y \le y]$.
Recall that the cdf for a single random variable is continuous from the right. It can be shown that the joint cdf is continuous from the "north" and from the "east":
(v) $\lim_{x \to a^+} F_{X,Y}(x, y) = F_{X,Y}(a, y)$
and
$\lim_{y \to b^+} F_{X,Y}(x, y) = F_{X,Y}(x, b)$.
The Joint pdf of Two Jointly Continuous Random Variables
We say that the random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a pdf. There is a nonnegative function f_{X,Y}(x, y), called the joint probability density function, that is defined on the real plane such that for every event A, a subset of the plane,
$$P[X \text{ in } A] = \iint_A f_{X,Y}(x', y')\,dx'\,dy', \qquad (4.9)$$
as shown in Fig. 4.7. When A is the entire plane, the integral must equal one:
$$1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x', y')\,dx'\,dy'. \qquad (4.10)$$
The joint cdf can be obtained in terms of the joint pdf of jointly continuous random variables by integrating over the semi-infinite rectangle defined by (x, y):
$$F_{X,Y}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(x', y')\,dy'\,dx'.$$
The marginal pdf's f_X(x) and f_Y(y) are obtained by taking the derivative of the corresponding marginal cdf's, $F_X(x) = F_{X,Y}(x, \infty)$ and $F_Y(y) = F_{X,Y}(\infty, y)$:
$$f_X(x) = \frac{d}{dx} \int_{-\infty}^{x} \left\{ \int_{-\infty}^{\infty} f_{X,Y}(x', y')\,dy' \right\} dx' = \int_{-\infty}^{\infty} f_{X,Y}(x, y')\,dy', \qquad (4.15a)$$
$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x', y)\,dx'. \qquad (4.15b)$$

INDEPENDENCE OF TWO RANDOM VARIABLES
X and Y are independent random variables if any event A1 defined in terms of X is independent of any event A2 defined in terms of Y:
$$P[X \text{ in } A_1, Y \text{ in } A_2] = P[X \text{ in } A_1]\,P[Y \text{ in } A_2]. \qquad (4.17)$$
Suppose that X and Y are a pair of discrete random variables. If we let $A_1 = \{X = x_j\}$ and $A_2 = \{Y = y_k\}$, then the independence of X and Y implies that
$$p_{X,Y}(x_j, y_k) = P[X = x_j, Y = y_k] = P[X = x_j]\,P[Y = y_k] = p_X(x_j)\,p_Y(y_k) \quad \text{for all } x_j \text{ and } y_k. \qquad (4.18)$$


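Condition (4.18) can be checked cell by cell. A sketch with two hypothetical tables: one built as a product of marginals (independent by construction) and one that is not a product of its own marginals:

```python
from fractions import Fraction as Fr
from itertools import product

def is_independent(joint):
    """Check (4.18): p_{X,Y}(x_j, y_k) == p_X(x_j) p_Y(y_k) for every pair."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    pX = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    pY = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == pX[x] * pY[y] for x, y in product(xs, ys))

# Hypothetical marginals; their product is an independent joint pmf.
pX_marg = {0: Fr(1, 4), 1: Fr(3, 4)}
pY_marg = {0: Fr(1, 3), 1: Fr(2, 3)}
indep = {(x, y): px * py for x, px in pX_marg.items() for y, py in pY_marg.items()}

# This table is not the product of its marginals: p(0,0) = 1/8 but p_X(0) p_Y(0) = 3/32.
dep = {(0, 0): Fr(1, 8), (0, 1): Fr(1, 8), (1, 0): Fr(1, 4), (1, 1): Fr(1, 2)}
print(is_independent(indep), is_independent(dep))   # True False
```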
4.4 CONDITIONAL PROBABILITY AND CONDITIONAL EXPECTATION
Conditional Probability
In Section 2.4, we saw that
$$P[Y \text{ in } A \mid X = x] = \frac{P[Y \text{ in } A, X = x]}{P[X = x]}. \qquad (4.22)$$
If X is discrete, then Eq. (4.22) can be used to obtain the conditional cdf of Y given X = x_k:
$$F_Y(y \mid x_k) = \frac{P[Y \le y, X = x_k]}{P[X = x_k]}, \quad \text{for } P[X = x_k] > 0. \qquad (4.23)$$
The conditional pdf of Y given X = x_k, if the derivative exists, is given by
$$f_Y(y \mid x_k) = \frac{d}{dy} F_Y(y \mid x_k). \qquad (4.24)$$
MULTIPLE RANDOM VARIABLES

Joint Distributions
The joint cumulative distribution function of X1, X2, …, Xn is defined as the probability of an n-dimensional semi-infinite rectangle associated with the point (x1, …, xn):
$$F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n]. \qquad (4.38)$$
The joint cdf is defined for discrete, continuous, and mixed-type random variables.
FUNCTIONS OF SEVERAL RANDOM
VARIABLES
One Function of Several Random Variables
Let the random variable Z be defined as a function of several random variables:
$$Z = g(X_1, X_2, \ldots, X_n). \qquad (4.51)$$
The cdf of Z is found by first finding the equivalent event of $\{Z \le z\}$, that is, the set $R_z = \{\mathbf{x} = (x_1, \ldots, x_n) \text{ such that } g(\mathbf{x}) \le z\}$; then
$$F_Z(z) = P[\mathbf{X} \text{ in } R_z] = \int \cdots \int_{\mathbf{x} \text{ in } R_z} f_{X_1, \ldots, X_n}(x_1', \ldots, x_n')\,dx_1' \cdots dx_n'. \qquad (4.52)$$
EXAMPLE 4.31 Sum of Two Random Variables
Let Z = X + Y. Find F_Z(z) and f_Z(z) in terms of the joint pdf of X and Y.
The cdf of Z is
$$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x'} f_{X,Y}(x', y')\,dy'\,dx'.$$
The pdf of Z is
$$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x', z - x')\,dx'. \qquad (4.53)$$
Thus the pdf for the sum of two random variables is given by a superposition integral. If X and Y are independent random variables, then by Eq. (4.21) the pdf is given by the convolution integral of the marginal pdf's of X and Y:
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x') f_Y(z - x')\,dx'. \qquad (4.54)$$
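For discrete random variables the convolution integral (4.54) becomes a convolution sum, p_Z(z) = Σ_x p_X(x) p_Y(z − x). A sketch using two independent fair dice as an illustrative pair; `convolve` is a hypothetical helper name:

```python
from fractions import Fraction

def convolve(pX, pY):
    """pmf of Z = X + Y for independent X, Y: p_Z(z) = sum over x of p_X(x) p_Y(z - x)."""
    pZ = {}
    for x, px in pX.items():
        for y, py in pY.items():
            pZ[x + y] = pZ.get(x + y, 0) + px * py
    return dict(sorted(pZ.items()))

die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[2], two_dice[7], two_dice[12])   # 1/36 1/6 1/36
```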

pdf of Linear Transformations
We consider first the linear transformation of two random variables:
$$V = aX + bY, \quad W = cX + eY \qquad \text{or} \qquad \begin{bmatrix} V \\ W \end{bmatrix} = \begin{bmatrix} a & b \\ c & e \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}.$$
Denote the above matrix by A. We will assume A has an inverse, so each point (v, w) has a unique corresponding point (x, y) obtained from
$$\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1} \begin{bmatrix} v \\ w \end{bmatrix}. \qquad (4.56)$$
In Fig. 4.15, the infinitesimal rectangle and the parallelogram are equivalent events, so their probabilities must be equal. Thus
$$f_{X,Y}(x, y)\,dx\,dy \simeq f_{V,W}(v, w)\,dP$$
where dP is the area of the parallelogram. The joint pdf of V and W is thus given by
$$f_{V,W}(v, w) = \frac{f_{X,Y}(x, y)}{\left|\dfrac{dP}{dx\,dy}\right|}, \qquad (4.57)$$
where x and y are related to (v, w) by Eq. (4.56). It can be shown that $dP = |ae - bc|\,dx\,dy$, so the "stretch factor" is
$$\frac{dP}{dx\,dy} = \frac{|ae - bc|\,dx\,dy}{dx\,dy} = |ae - bc| = |A|,$$
where |A| denotes the absolute value of the determinant of A.
Let the n-dimensional vector Z be
$$\mathbf{Z} = A\mathbf{X},$$
where A is an n × n invertible matrix. The joint pdf of Z is then
$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{z})}{|A|}.$$
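Formula (4.57) can be checked in a case where the answer is known independently. This sketch assumes X and Y are independent N(0, 1) and picks a hypothetical invertible A; then (V, W) is zero-mean Gaussian with covariance AAᵀ, so f_{V,W} from (4.57) should match the bivariate Gaussian pdf with that covariance:

```python
import math

def f_XY(x, y):
    """Assumed input: X, Y independent N(0, 1)."""
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)

def f_VW(v, w, a, b, c, e):
    """(4.57): f_{V,W}(v, w) = f_{X,Y}(x, y) / |ae - bc|, with (x, y) = A^{-1}(v, w)."""
    det = a * e - b * c
    x = (e * v - b * w) / det          # first row of A^{-1} is [e, -b] / det
    y = (-c * v + a * w) / det         # second row of A^{-1} is [-c, a] / det
    return f_XY(x, y) / abs(det)

def gaussian2(v, w, k11, k12, k22):
    """Zero-mean bivariate Gaussian pdf with covariance [[k11, k12], [k12, k22]]."""
    det = k11 * k22 - k12 * k12
    q = (k22 * v * v - 2 * k12 * v * w + k11 * w * w) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))

a, b, c, e = 2.0, 1.0, -1.0, 3.0       # hypothetical invertible A (det = 7)
k11, k12, k22 = a * a + b * b, a * c + b * e, c * c + e * e   # entries of A A^T
points = [(0.0, 0.0), (1.0, -0.5), (2.5, 3.0)]
for v, w in points:
    print(f_VW(v, w, a, b, c, e), gaussian2(v, w, k11, k12, k22))
```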
EXPECTED VALUE OF FUNCTIONS OF RANDOM VARIABLES
The expected value of Z = g(X, Y) can be found using the following expressions:
$$E[Z] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy & X, Y \text{ jointly continuous} \\ \displaystyle\sum_i \sum_n g(x_i, y_n)\,p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \qquad (4.64)$$
*Joint Characteristic Function
The joint characteristic function of n random variables is defined as
$$\Phi_{X_1, X_2, \ldots, X_n}(w_1, w_2, \ldots, w_n) = E\left[e^{j(w_1 X_1 + w_2 X_2 + \cdots + w_n X_n)}\right]. \qquad (4.73a)$$
$$\Phi_{X,Y}(w_1, w_2) = E\left[e^{j(w_1 X + w_2 Y)}\right]. \qquad (4.73b)$$
If X and Y are jointly continuous random variables, then
$$\Phi_{X,Y}(w_1, w_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,e^{j(w_1 x + w_2 y)}\,dx\,dy. \qquad (4.73c)$$
The inversion formula for the Fourier transform implies that the joint pdf is given by
$$f_{X,Y}(x, y) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi_{X,Y}(w_1, w_2)\,e^{-j(w_1 x + w_2 y)}\,dw_1\,dw_2. \qquad (4.74)$$
JOINTLY GAUSSIAN RANDOM VARIABLES
The random variables X and Y are said to be jointly Gaussian if their joint pdf has the form
$$f_{X,Y}(x, y) = \frac{\exp\left\{ \dfrac{-1}{2(1 - \rho_{X,Y}^2)} \left[ \left(\dfrac{x - m_1}{\sigma_1}\right)^2 - 2\rho_{X,Y}\left(\dfrac{x - m_1}{\sigma_1}\right)\left(\dfrac{y - m_2}{\sigma_2}\right) + \left(\dfrac{y - m_2}{\sigma_2}\right)^2 \right] \right\}}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{X,Y}^2}} \qquad (4.79)$$
for $-\infty < x < \infty$ and $-\infty < y < \infty$.
The pdf is constant for values x and y for which the argument of the exponent is constant:
$$\left(\frac{x - m_1}{\sigma_1}\right)^2 - 2\rho_{X,Y}\left(\frac{x - m_1}{\sigma_1}\right)\left(\frac{y - m_2}{\sigma_2}\right) + \left(\frac{y - m_2}{\sigma_2}\right)^2 = \text{constant}$$
These contours are ellipses. When ρ_{X,Y} = 0, X and Y are independent; when ρ_{X,Y} ≠ 0, the major axis of the ellipse is oriented along the angle
$$\theta = \frac{1}{2} \arctan\left( \frac{2\rho_{X,Y}\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2} \right). \qquad (4.80)$$
Note that the angle is 45º when the variances are equal and ρ_{X,Y} > 0 (−45º when ρ_{X,Y} < 0).
The marginal pdf of X is found by integrating f_{X,Y}(x, y) over all y:
$$f_X(x) = \frac{e^{-(x - m_1)^2 / 2\sigma_1^2}}{\sqrt{2\pi}\,\sigma_1}, \qquad (4.81)$$
that is, X is a Gaussian random variable with mean m1 and variance σ1².
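n Jointly Gaussian Random Variables
The random variables X1, X2, …, Xn are said to be jointly Gaussian if their joint pdf is given by
$$f_{\mathbf{X}}(\mathbf{x}) = f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \frac{\exp\left\{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T K^{-1} (\mathbf{x} - \mathbf{m})\right\}}{(2\pi)^{n/2}\,|K|^{1/2}}, \qquad (4.83)$$
where x and m are column vectors defined by
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \mathbf{m} = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_n \end{bmatrix} = \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}$$
and K is the covariance matrix that is defined by
$$K = \begin{bmatrix} \mathrm{VAR}(X_1) & \mathrm{COV}(X_1, X_2) & \cdots & \mathrm{COV}(X_1, X_n) \\ \mathrm{COV}(X_2, X_1) & \mathrm{VAR}(X_2) & \cdots & \mathrm{COV}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{COV}(X_n, X_1) & \cdots & \cdots & \mathrm{VAR}(X_n) \end{bmatrix}. \qquad (4.84)$$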
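For n = 2 the matrix form (4.83) must reproduce the explicit form (4.79) when K is built from σ1, σ2 and ρ per (4.84). The sketch checks this at one point with hypothetical parameter values, and evaluates (4.80) with `math.atan2`. atan2 is a substitution for the plain arctan: it agrees with (4.80) when σ1 > σ2, still works when σ1 = σ2 (where the ratio is infinite), and picks the major-axis branch otherwise:

```python
import math

def pdf_479(x, y, m1, m2, s1, s2, rho):
    """Explicit bivariate Gaussian pdf, Eq. (4.79)."""
    u, v = (x - m1) / s1, (y - m2) / s2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))

def pdf_483(x, y, m1, m2, s1, s2, rho):
    """Same pdf from the matrix form, Eq. (4.83), with n = 2 and K built per (4.84)."""
    n = 2
    k11, k12, k22 = s1 * s1, rho * s1 * s2, s2 * s2
    det = k11 * k22 - k12 * k12                                     # |K|
    dx, dy = x - m1, y - m2
    q = (k22 * dx * dx - 2 * k12 * dx * dy + k11 * dy * dy) / det   # (x-m)^T K^{-1} (x-m)
    return math.exp(-q / 2) / ((2 * math.pi) ** (n / 2) * math.sqrt(det))

def major_axis_angle(s1, s2, rho):
    """Eq. (4.80), evaluated with atan2 so that s1 == s2 is handled."""
    return 0.5 * math.atan2(2 * rho * s1 * s2, s1 * s1 - s2 * s2)

params = (1.0, -2.0, 1.5, 0.5, 0.6)          # hypothetical m1, m2, sigma1, sigma2, rho
print(pdf_479(0.3, -1.0, *params), pdf_483(0.3, -1.0, *params))
print(math.degrees(major_axis_angle(1.0, 1.0, 0.6)))    # approximately 45 for equal variances
```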
Transformations of Random Vectors
Let X1, …, Xn be random variables associated with some experiment, and let the random variables Z1, …, Zn be defined by n functions of X = (X1, …, Xn):
$$Z_1 = g_1(\mathbf{X}), \quad Z_2 = g_2(\mathbf{X}), \quad \ldots, \quad Z_n = g_n(\mathbf{X}).$$
The joint cdf of Z1, …, Zn at the point z = (z1, …, zn) is equal to the probability of the region of x where $g_k(\mathbf{x}) \le z_k$ for k = 1, …, n:
$$F_{Z_1, \ldots, Z_n}(z_1, \ldots, z_n) = P[g_1(\mathbf{X}) \le z_1, \ldots, g_n(\mathbf{X}) \le z_n]. \qquad (4.55a)$$
If X1, …, Xn have a joint pdf, then
$$F_{Z_1, \ldots, Z_n}(z_1, \ldots, z_n) = \int \cdots \int_{\mathbf{x}' : g_k(\mathbf{x}') \le z_k} f_{X_1, \ldots, X_n}(x_1', \ldots, x_n')\,dx_1' \cdots dx_n'. \qquad (4.55b)$$
Stochastic Processes
Let ξ denote the random outcome of an experiment. To every such outcome suppose a waveform X(t, ξ) is assigned. The collection of such waveforms forms a stochastic process. The set of {ξ_k} and the time index t can be continuous or discrete (countably infinite or finite) as well.
[Fig. 14.1: realizations X(t, ξ_1), X(t, ξ_2), …, X(t, ξ_k), …, X(t, ξ_n) plotted against t, with time instants t_1 and t_2 marked]
For fixed ξ_i ∈ S (the set of all experimental outcomes), X(t, ξ_i) is a specific time function. For fixed t,
$$X_1 = X(t_1, \xi)$$
is a random variable. The ensemble of all such realizations X(t, ξ) over time represents the stochastic process X(t) (see Fig. 14.1). For example,
$$X(t) = a\cos(\omega_0 t + \varphi),$$
where φ is a uniformly distributed random variable in (0, 2π), represents a stochastic process.
If X(t) is a stochastic process, then for fixed t, X(t) represents a random variable. Its distribution function is given by
$$F_X(x, t) = P\{X(t) \le x\}$$
Notice that $F_X(x, t)$ depends on t, since for a different t, we obtain a different random variable. Further,
$$f_X(x, t) = \frac{dF_X(x, t)}{dx}$$
represents the first-order probability density function of the process X(t).
For t = t1 and t = t2, X(t) represents two different random variables X1 = X(t1) and X2 = X(t2) respectively. Their joint distribution is given by
$$F_X(x_1, x_2, t_1, t_2) = P\{X(t_1) \le x_1, X(t_2) \le x_2\}$$
and
$$f_X(x_1, x_2, t_1, t_2) = \frac{\partial^2 F_X(x_1, x_2, t_1, t_2)}{\partial x_1\,\partial x_2}$$
represents the second-order density function of the process X(t). Similarly, $f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n)$ represents the nth-order density function of the process X(t). Complete specification of the stochastic process X(t) requires the knowledge of $f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n)$ for all $t_i$, $i = 1, 2, \ldots, n$, and for all n (an almost impossible task in reality).
Mean of a Stochastic Process:
$$\mu_X(t) = E\{X(t)\} = \int_{-\infty}^{\infty} x\,f_X(x, t)\,dx$$
represents the mean value of a process X(t). In general, the mean of a process can depend on the time index t.
The autocorrelation function of a process X(t) is defined as
$$R_{XX}(t_1, t_2) = E\{X(t_1) X^*(t_2)\} = \iint x_1 x_2^*\,f_X(x_1, x_2, t_1, t_2)\,dx_1\,dx_2$$
and it represents the interrelationship between the random variables X1 = X(t1) and X2 = X(t2) generated from the process X(t).
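Ensemble averages can be estimated by simulating many realizations. This sketch uses the random-phase process X(t) = a cos(ω0 t + φ) with φ uniform on (0, 2π), for which μ_X(t) = 0 and R_XX(t1, t2) = (a²/2) cos ω0(t1 − t2) are standard results. The amplitude, frequency, seed and ensemble size are illustrative assumptions:

```python
import math
import random

a, w0 = 2.0, 3.0                 # assumed illustrative amplitude and frequency
rng = random.Random(3)
# One phase per realization: X(t, xi) = a cos(w0 t + phi), phi ~ U(0, 2*pi)
phis = [rng.uniform(0, 2 * math.pi) for _ in range(100_000)]

def X(t, phi):
    return a * math.cos(w0 * t + phi)

def ensemble_mean(t):
    """Estimate mu_X(t) = E{X(t)} by averaging over the ensemble."""
    return sum(X(t, p) for p in phis) / len(phis)

def ensemble_autocorr(t1, t2):
    """Estimate R_XX(t1, t2) = E{X(t1) X*(t2)}; the process is real, so X* = X."""
    return sum(X(t1, p) * X(t2, p) for p in phis) / len(phis)

for t1, t2 in [(0.7, 0.2), (1.7, 1.2)]:       # same difference t1 - t2 = 0.5
    print(ensemble_mean(t1), ensemble_autocorr(t1, t2),
          a * a / 2 * math.cos(w0 * (t1 - t2)))
```

Both time pairs give nearly the same autocorrelation, since it depends only on t1 − t2 for this process.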

Properties:
1. $R_{XX}(t_1, t_2) = R_{XX}^*(t_2, t_1) = [E\{X(t_2) X^*(t_1)\}]^*$
2. $R_{XX}(t, t) = E\{|X(t)|^2\} \ge 0$.
3. $R_{XX}(t_1, t_2)$ represents a nonnegative definite function, i.e., for any set of constants $\{a_i\}_{i=1}^{n}$,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j^*\,R_{XX}(t_i, t_j) \ge 0. \qquad \text{(14-8)}$$
Eq. (14-8) follows by noticing that $E\{|Y|^2\} \ge 0$ for $Y = \sum_{i=1}^{n} a_i X(t_i)$.
The function
$$C_{XX}(t_1, t_2) = R_{XX}(t_1, t_2) - \mu_X(t_1)\,\mu_X^*(t_2) \qquad \text{(14-9)}$$
represents the autocovariance function of the process X(t).
Example 14.1
Let
$$z = \int_{-T}^{T} X(t)\,dt.$$
Then
$$E[|z|^2] = \int_{-T}^{T}\int_{-T}^{T} E\{X(t_1) X^*(t_2)\}\,dt_1\,dt_2 = \int_{-T}^{T}\int_{-T}^{T} R_{XX}(t_1, t_2)\,dt_1\,dt_2 \qquad \text{(14-10)}$$
Stationary Stochastic Processes
Stationary processes exhibit statistical properties that are invariant to a shift in the time index. Thus, for example, second-order stationarity implies that the statistical properties of the pairs {X(t1), X(t2)} and {X(t1+c), X(t2+c)} are the same for any c. Similarly, first-order stationarity implies that the statistical properties of X(ti) and X(ti+c) are the same for any c.
In strict terms, the statistical properties are governed by the joint probability density function. Hence a process is nth-order Strict-Sense Stationary (S.S.S) if
$$f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n) \equiv f_X(x_1, x_2, \ldots, x_n, t_1 + c, t_2 + c, \ldots, t_n + c) \qquad \text{(14-14)}$$
for any c, where the left side represents the joint density function of the random variables $X_1 = X(t_1), X_2 = X(t_2), \ldots, X_n = X(t_n)$ and the right side corresponds to the joint density function of the random variables $X_1' = X(t_1 + c), X_2' = X(t_2 + c), \ldots, X_n' = X(t_n + c)$.
A process X(t) is said to be strict-sense stationary if (14-14) is true for all $t_i$, $i = 1, 2, \ldots, n$, $n = 1, 2, \ldots$, and any c.
For a first-order strict-sense stationary process, from (14-14) we have
$$f_X(x, t) \equiv f_X(x, t + c) \qquad \text{(14-15)}$$
for any c. In particular, c = −t gives
$$f_X(x, t) = f_X(x) \qquad \text{(14-16)}$$
i.e., the first-order density of X(t) is independent of t. In that case
$$E[X(t)] = \int_{-\infty}^{\infty} x\,f(x)\,dx = \mu, \text{ a constant.} \qquad \text{(14-17)}$$
Similarly, for a second-order strict-sense stationary process we have from (14-14)
$$f_X(x_1, x_2, t_1, t_2) \equiv f_X(x_1, x_2, t_1 + c, t_2 + c)$$
for any c. For c = −t2 we get
$$f_X(x_1, x_2, t_1, t_2) \equiv f_X(x_1, x_2, t_1 - t_2) \qquad \text{(14-18)}$$
i.e., the second-order density function of a strict-sense stationary process depends only on the difference of the time indices $t_1 - t_2 = \tau$. In that case the autocorrelation function is given by
$$R_{XX}(t_1, t_2) = E\{X(t_1) X^*(t_2)\} = \iint x_1 x_2^*\,f_X(x_1, x_2, \tau = t_1 - t_2)\,dx_1\,dx_2 = R_{XX}(t_1 - t_2) \triangleq R_{XX}(\tau) = R_{XX}^*(-\tau), \qquad \text{(14-19)}$$
i.e., the autocorrelation function of a second-order strict-sense stationary process depends only on the difference of the time indices $\tau = t_1 - t_2$.
Notice that (14-17) and (14-19) are consequences of the stochastic process being first- and second-order strict-sense stationary.
On the other hand, the basic conditions for first- and second-order stationarity, Eqs. (14-16) and (14-18), are usually difficult to verify. In that case, we often resort to a looser definition of stationarity, known as Wide-Sense Stationarity (W.S.S), by making use of (14-17) and (14-19) as the necessary conditions. Thus, a process X(t) is said to be Wide-Sense Stationary if
(i) $E\{X(t)\} = \mu$   (14-20)
and
(ii) $E\{X(t_1) X^*(t_2)\} = R_{XX}(t_1 - t_2)$,   (14-21)
i.e., for wide-sense stationary processes, the mean is a constant and the autocorrelation function depends only on the difference between the time indices. Notice that (14-20)-(14-21) do not say anything about the nature of the probability density functions, and instead deal with the average behavior of the process. Since (14-20)-(14-21) follow from (14-16) and (14-18), strict-sense stationarity always implies wide-sense stationarity (provided the first two moments exist). However, the converse is not true in general, the only exception being the Gaussian process.
This follows, since if X(t) is a Gaussian process, then by definition $X_1 = X(t_1), X_2 = X(t_2), \ldots, X_n = X(t_n)$ are jointly Gaussian random variables for any $t_1, t_2, \ldots, t_n$, whose joint characteristic function is given by
$$\phi_X(\omega_1, \omega_2, \ldots, \omega_n) = e^{\,j\sum_{k=1}^{n} \mu(t_k)\,\omega_k \;-\; \sum_{l,k} C_{XX}(t_l, t_k)\,\omega_l\,\omega_k / 2}, \tag{14-22}$$
where $C_{XX}(t_i, t_k)$ is as defined in (14-9). If X(t) is wide-sense stationary, then using (14-20)-(14-21) in (14-22) we get
$$\phi_X(\omega_1, \omega_2, \ldots, \omega_n) = e^{\,j\sum_{k=1}^{n} \mu\,\omega_k \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{n} C_{XX}(t_i - t_k)\,\omega_i\,\omega_k}, \tag{14-23}$$
and hence if the set of time indices is shifted by a constant $c$ to generate a new set of jointly Gaussian random variables $X_1' = X(t_1 + c), X_2' = X(t_2 + c), \ldots, X_n' = X(t_n + c)$, then their joint characteristic function is identical to (14-23). Thus the sets of random variables $\{X_i\}_{i=1}^{n}$ and $\{X_i'\}_{i=1}^{n}$ have the same joint probability distribution for all $n$ and all $c$, establishing the strict sense stationarity of Gaussian processes from their wide-sense stationarity.
To summarize, if X(t) is a Gaussian process, then
wide-sense stationarity (w.s.s.) $\Rightarrow$ strict-sense stationarity (s.s.s.).
Notice that since the joint p.d.f. of Gaussian random variables depends only on their second order statistics, which is also the basis for wide-sense stationarity, we obtain strict-sense stationarity as well.

PILLAI/Cha
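As a numerical illustration of the wide-sense conditions (14-20)-(14-21), the following Python sketch uses a standard textbook example that is not in these slides: the random-phase sinusoid $X(t) = A\cos(\omega_0 t + \varphi)$ with $\varphi$ uniform on $(0, 2\pi)$, for which $E\{X(t)\} = 0$ and $R_{XX}(t_1, t_2) = \frac{A^2}{2}\cos\omega_0(t_1 - t_2)$. The values of `A` and `w0`, the sample size, and the helper names are arbitrary assumptions; the ensemble averages are Monte Carlo estimates, so they match the theory only approximately.

```python
import numpy as np

# Hypothetical example: X(t) = A cos(w0 t + phi), phi ~ Uniform(0, 2*pi).
# Theory: E{X(t)} = 0 and R_XX(t1, t2) = (A**2 / 2) cos(w0 (t1 - t2)).
A, w0 = 2.0, 3.0
rng = np.random.default_rng(0)
phi = rng.uniform(0.0, 2.0 * np.pi, size=200_000)  # one phase per realization

def X(t):
    """Values of X(t) across all realizations (the ensemble at time t)."""
    return A * np.cos(w0 * t + phi)

def ensemble_R(t1, t2):
    """Monte Carlo estimate of E{X(t1) X*(t2)} (X is real here)."""
    return np.mean(X(t1) * X(t2))

# (14-20): the ensemble mean should not depend on t
mean_t0 = np.mean(X(0.0))
mean_t5 = np.mean(X(5.0))

# (14-21): two time pairs with the same difference tau = t1 - t2 = 0.4
r_a = ensemble_R(1.4, 1.0)
r_b = ensemble_R(7.4, 7.0)
r_theory = A**2 / 2 * np.cos(w0 * 0.4)
print(mean_t0, mean_t5, r_a, r_b, r_theory)
```

Both time pairs share the same difference $\tau$, so their estimated correlations agree (up to sampling error) with each other and with $\frac{A^2}{2}\cos\omega_0\tau$.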
Systems with Stochastic Inputs
A deterministic system¹ transforms each input waveform $X(t, \xi_i)$ into an output waveform $Y(t, \xi_i) = T[X(t, \xi_i)]$ by operating only on the time variable $t$. Thus a set of realizations at the input corresponding to a process X(t) generates a new set of realizations $\{Y(t, \xi)\}$ at the output associated with a new process Y(t).

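[Fig. 14.3: a realization $X(t, \xi_i)$ of the input process X(t) passes through the system $T[\cdot]$ to give the realization $Y(t, \xi_i)$ of the output process Y(t).]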

Our goal is to study the output process statistics in terms of the input
process statistics and the system function.

¹A stochastic system, on the other hand, operates on both the variables $t$ and $\xi$.
Linear Systems: $L[\cdot]$ represents a linear system if
$$L\{a_1 X(t_1) + a_2 X(t_2)\} = a_1 L\{X(t_1)\} + a_2 L\{X(t_2)\}. \tag{14-28}$$
Let
$$Y(t) = L\{X(t)\} \tag{14-29}$$
represent the output of a linear system.
Time-Invariant System: $L[\cdot]$ represents a time-invariant system if
$$Y(t) = L\{X(t)\} \;\Rightarrow\; L\{X(t - t_0)\} = Y(t - t_0), \tag{14-30}$$
i.e., a shift in the input results in the same shift in the output.
If $L[\cdot]$ satisfies both (14-28) and (14-30), then it corresponds to a linear time-invariant (LTI) system.
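A discrete-time sketch of the two defining properties, assuming a hypothetical 3-point moving-average system (not from the slides) that is at rest before $n = 0$; it checks (14-28) on two random inputs and (14-30) on a delayed input.

```python
import numpy as np

def moving_average(x, M=3):
    """Hypothetical LTI system: y[n] = (x[n] + x[n-1] + ... + x[n-M+1]) / M,
    with x[n] = 0 for n < 0 (system at rest before n = 0)."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[max(0, n - M + 1): n + 1].sum() / M
    return y

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
a1, a2 = 2.0, -0.5

# Linearity (14-28): L{a1 x1 + a2 x2} == a1 L{x1} + a2 L{x2}
lin_ok = np.allclose(moving_average(a1 * x1 + a2 * x2),
                     a1 * moving_average(x1) + a2 * moving_average(x2))

# Time invariance (14-30): delaying the input by n0 samples delays the output by n0
n0 = 7
x_shift = np.concatenate([np.zeros(n0), x1])[:50]   # x_shift[n] = x1[n - n0]
ti_ok = np.allclose(moving_average(x_shift)[n0:], moving_average(x1)[:50 - n0])
print(lin_ok, ti_ok)  # True True
```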
LTI systems can be uniquely represented in terms of their output to a delta function: the response of an LTI system to $\delta(t)$ is its impulse response $h(t)$.
[Fig. 14.5: impulse $\delta(t)$ → LTI → $h(t)$, the impulse response of the system.]
Then, for an arbitrary input X(t),
$$Y(t) = \int_{-\infty}^{\infty} h(t - \tau)\,X(\tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,X(t - \tau)\,d\tau. \tag{14-31}$$
[Fig. 14.6: arbitrary input X(t) → LTI → output Y(t).]
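Eq. (14-31) follows by expressing X(t) as
$$X(t) = \int_{-\infty}^{\infty} X(\tau)\,\delta(t - \tau)\,d\tau \tag{14-32}$$
and applying (14-28) and (14-30) to $Y(t) = L\{X(t)\}$. Thus
$$\begin{aligned}
Y(t) = L\{X(t)\} &= L\Big\{\int_{-\infty}^{\infty} X(\tau)\,\delta(t - \tau)\,d\tau\Big\} \\
&= \int_{-\infty}^{\infty} L\{X(\tau)\,\delta(t - \tau)\}\,d\tau && \text{(by linearity)} \\
&= \int_{-\infty}^{\infty} X(\tau)\,L\{\delta(t - \tau)\}\,d\tau && \text{(by time-invariance)} \\
&= \int_{-\infty}^{\infty} X(\tau)\,h(t - \tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,X(t - \tau)\,d\tau.
\end{aligned} \tag{14-33}$$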
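The superposition argument in (14-32)-(14-33) has a direct discrete-time analogue, where the integral becomes a sum and $\delta(t)$ becomes the unit sample. This sketch assumes an arbitrary causal impulse response `h` and uses `numpy.convolve` to evaluate $y[n] = \sum_k h[k]\,x[n-k]$; the explicit loop rebuilds the same output as a weighted sum of delayed copies of `h`, mirroring the last line of (14-33).

```python
import numpy as np

# Hypothetical causal FIR impulse response (values chosen for illustration)
h = np.array([1.0, 0.5, 0.25])

def lti(x):
    """Hypothetical LTI system defined by its impulse response:
    y[n] = sum_k h[k] x[n - k], output truncated to len(x)."""
    return np.convolve(x, h)[: len(x)]

# Response to a unit impulse delta[n] is the impulse response h[n] (zero-padded)
delta = np.zeros(8)
delta[0] = 1.0
print(lti(delta))

# Superposition: x[n] = sum_m x[m] delta[n - m], so y[n] = sum_m x[m] h[n - m]
x = np.array([3.0, -1.0, 2.0, 0.0, 4.0, 1.0, 0.0, -2.0])
y_super = np.zeros(len(x))
for m, xm in enumerate(x):
    shifted_h = np.zeros(len(x))
    n_keep = min(len(h), len(x) - m)
    shifted_h[m:m + n_keep] = h[:n_keep]   # h[n - m]: impulse response delayed by m
    y_super += xm * shifted_h
print(np.allclose(y_super, lti(x)))  # True
```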
Output Statistics: Using (14-33), the mean of the output process is given by
$$\mu_Y(t) = E\{Y(t)\} = \int_{-\infty}^{\infty} E\{X(\tau)\}\,h(t - \tau)\,d\tau = \int_{-\infty}^{\infty} \mu_X(\tau)\,h(t - \tau)\,d\tau = \mu_X(t) * h(t). \tag{14-34}$$
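To illustrate (14-34), the following Monte Carlo sketch passes many realizations of a hypothetical discrete-time input with time-varying mean $\mu_X[n] = 0.1n$ (plus zero-mean unit-variance noise) through an assumed FIR impulse response, and compares the ensemble-averaged output with the convolution $\mu_X * h$. The input model and the filter taps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
h = np.array([0.5, 0.3, 0.2])      # hypothetical causal impulse response
N, trials = 20, 50_000
n = np.arange(N)
mu_X = 0.1 * n                      # hypothetical time-varying input mean

# Each row is one realization: X[n] = mu_X[n] + zero-mean unit-variance noise
X = mu_X + rng.normal(size=(trials, N))

# y[n] = sum_k h[k] x[n - k], with the system at rest before n = 0
Y = np.zeros_like(X)
for k, hk in enumerate(h):
    Y[:, k:] += hk * X[:, :N - k]

mu_Y_mc = Y.mean(axis=0)                   # ensemble average of the output
mu_Y_theory = np.convolve(mu_X, h)[:N]     # (14-34): mu_Y = mu_X * h
print(np.max(np.abs(mu_Y_mc - mu_Y_theory)))
```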
Similarly, the cross-correlation function between the input and output processes is given by
$$\begin{aligned}
R_{XY}(t_1, t_2) &= E\{X(t_1)Y^*(t_2)\} = E\Big\{X(t_1)\int_{-\infty}^{\infty} X^*(t_2 - \alpha)\,h^*(\alpha)\,d\alpha\Big\} \\
&= \int_{-\infty}^{\infty} E\{X(t_1)X^*(t_2 - \alpha)\}\,h^*(\alpha)\,d\alpha \\
&= \int_{-\infty}^{\infty} R_{XX}(t_1, t_2 - \alpha)\,h^*(\alpha)\,d\alpha = R_{XX}(t_1, t_2) * h^*(t_2).
\end{aligned} \tag{14-35}$$
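For a zero-mean white-noise input, $R_{XX}[k] = \sigma^2\delta[k]$, so the discrete-time version of (14-35) predicts $R_{XY}[k] = E\{X[n+k]Y^*[n]\} = \sigma^2 h^*[-k]$. This sketch assumes real Gaussian white noise and an arbitrary real FIR filter, and estimates $R_{XY}[k]$ by ensemble averaging at a fixed reference time `n0`; the helper `h_at` is a hypothetical convenience for evaluating $h$ outside its support.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 2.0
h = np.array([1.0, -0.6, 0.3])          # hypothetical real impulse response
N, trials = 16, 100_000

# Zero-mean white input: R_XX[k] = sigma2 * delta[k]
X = rng.normal(scale=np.sqrt(sigma2), size=(trials, N))
Y = np.zeros_like(X)
for k, hk in enumerate(h):
    Y[:, k:] += hk * X[:, :N - k]       # y[n] = sum_k h[k] x[n - k]

def h_at(m):
    """h[m] for integer m, zero outside the support 0 <= m < len(h)."""
    return h[m] if 0 <= m < len(h) else 0.0

n0 = 8                                   # reference time, past the start-up transient
lags = np.arange(-3, 4)
R_XY_mc = np.array([np.mean(X[:, n0 + k] * Y[:, n0]) for k in lags])
R_XY_theory = np.array([sigma2 * h_at(-k) for k in lags])   # sigma2 * h[-k]
print(np.round(R_XY_mc, 2), R_XY_theory)
```

The cross-correlation is nonzero only for $k \le 0$, i.e. it is the time-reversed impulse response scaled by $\sigma^2$.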
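Finally, the output autocorrelation function is given by
$$\begin{aligned}
R_{YY}(t_1, t_2) &= E\{Y(t_1)Y^*(t_2)\} = E\Big\{\int_{-\infty}^{\infty} X(t_1 - \beta)\,h(\beta)\,d\beta\; Y^*(t_2)\Big\} \\
&= \int_{-\infty}^{\infty} E\{X(t_1 - \beta)Y^*(t_2)\}\,h(\beta)\,d\beta \\
&= \int_{-\infty}^{\infty} R_{XY}(t_1 - \beta, t_2)\,h(\beta)\,d\beta = R_{XY}(t_1, t_2) * h(t_1),
\end{aligned} \tag{14-36}$$
or
$$R_{YY}(t_1, t_2) = R_{XX}(t_1, t_2) * h^*(t_2) * h(t_1). \tag{14-37}$$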

[Fig. 14.7: (a) $\mu_X(t)$ → $h(t)$ → $\mu_Y(t)$; (b) $R_{XX}(t_1, t_2)$ → $h^*(t_2)$ → $R_{XY}(t_1, t_2)$ → $h(t_1)$ → $R_{YY}(t_1, t_2)$.]
In particular, if X(t) is wide-sense stationary, then we have $\mu_X(t) = \mu_X$, so that from (14-34)
$$\mu_Y(t) = \mu_X \int_{-\infty}^{\infty} h(\tau)\,d\tau = \mu_X\,c, \quad \text{a constant.} \tag{14-38}$$
Also $R_{XX}(t_1, t_2) = R_{XX}(t_1 - t_2)$, so that (14-35) reduces to
$$R_{XY}(t_1, t_2) = \int_{-\infty}^{\infty} R_{XX}(t_1 - t_2 + \alpha)\,h^*(\alpha)\,d\alpha = R_{XX}(\tau) * h^*(-\tau) \triangleq R_{XY}(\tau), \quad \tau = t_1 - t_2. \tag{14-39}$$

Thus X(t) and Y(t) are jointly w.s.s. Further, from (14-36), the output autocorrelation simplifies to
$$R_{YY}(t_1, t_2) = \int_{-\infty}^{\infty} R_{XY}(t_1 - \beta - t_2)\,h(\beta)\,d\beta = R_{XY}(\tau) * h(\tau) \triangleq R_{YY}(\tau), \quad \tau = t_1 - t_2. \tag{14-40}$$


From (14-37), we obtain
$$R_{YY}(\tau) = R_{XX}(\tau) * h^*(-\tau) * h(\tau). \tag{14-41}$$
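The stationary result (14-41) can be checked the same way in discrete time. Assuming, as an illustration, a real zero-mean white input with $R_{XX}[k] = \sigma^2\delta[k]$ and a real FIR filter, (14-41) reduces to $R_{YY}[k] = \sigma^2\sum_j h[j+k]\,h[j]$, i.e. $\sigma^2$ times the convolution of $h$ with its time reverse. The sketch compares this with an ensemble estimate; the filter taps and the variance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 1.5
h = np.array([1.0, 0.8, -0.4])          # hypothetical real impulse response
L = len(h)
N, trials = 16, 100_000

X = rng.normal(scale=np.sqrt(sigma2), size=(trials, N))   # white, zero mean
Y = np.zeros_like(X)
for k, hk in enumerate(h):
    Y[:, k:] += hk * X[:, :N - k]       # y[n] = sum_k h[k] x[n - k]

# (14-41) with R_XX[k] = sigma2 * delta[k]: R_YY[k] = sigma2 * sum_j h[j + k] h[j].
# np.convolve(h, h[::-1]) lists lags L-1, ..., 0, ..., -(L-1); for real h the
# result is symmetric in the lag, so the ordering does not matter here.
R_YY_theory = sigma2 * np.convolve(h, h[::-1])
lags = np.arange(-(L - 1), L)
n0 = 8                                   # reference time, past the start-up transient
R_YY_mc = np.array([np.mean(Y[:, n0 + k] * Y[:, n0]) for k in lags])
print(np.round(R_YY_mc, 2), R_YY_theory)
```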


From (14-38)-(14-40), the output process is also wide-sense stationary. This gives rise to the following representations (Fig. 14.8):
(a) a wide-sense stationary process X(t) applied to an LTI system h(t) produces a wide-sense stationary process Y(t);
(b) a strict-sense stationary process X(t) applied to an LTI system h(t) produces a strict-sense stationary process Y(t) (see Text for proof);
(c) a Gaussian process X(t) (also stationary) applied to a linear system produces a Gaussian process Y(t) (also stationary).
Discrete Time Stochastic Processes:

A discrete time stochastic process $X_n = X(nT)$ is a sequence of random variables. The mean, autocorrelation and auto-covariance functions of a discrete-time process are given by
$$\mu_n = E\{X(nT)\}, \tag{14-57}$$
$$R(n_1, n_2) = E\{X(n_1 T)X^*(n_2 T)\} \tag{14-58}$$
and
$$C(n_1, n_2) = R(n_1, n_2) - \mu_{n_1}\mu_{n_2}^* \tag{14-59}$$
respectively. As before, the strict-sense stationarity and wide-sense stationarity definitions apply here also. For example, X(nT) is wide-sense stationary if
$$E\{X(nT)\} \equiv \mu, \text{ a constant,} \tag{14-60}$$
and
$$E[X\{(k + n)T\}\,X^*\{kT\}] = R(n) = r_n = r_{-n}^*. \tag{14-61}$$
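A hypothetical discrete-time example (not from the slides) that satisfies (14-60)-(14-61) is the real first-order autoregressive sequence $X_k = aX_{k-1} + W_k$ with $|a| < 1$ and white $W_k$ of variance $\sigma^2$, started from its stationary distribution; then $\mu = 0$ and $r_n = \sigma^2 a^{|n|}/(1 - a^2)$. The sketch estimates $r_n$ by ensemble averaging at two different starting indices $k$ to show that it depends only on the lag $n$.

```python
import numpy as np

# Hypothetical AR(1) example: X[k] = a X[k-1] + W[k], W white with variance sigma2.
# Started from its stationary distribution, so mu = 0 and
# r_n = sigma2 * a**|n| / (1 - a**2).
rng = np.random.default_rng(5)
a, sigma2 = 0.7, 1.0
K, trials = 40, 100_000
var0 = sigma2 / (1 - a**2)               # stationary variance r_0

X = np.empty((trials, K))
X[:, 0] = rng.normal(scale=np.sqrt(var0), size=trials)
for k in range(1, K):
    X[:, k] = a * X[:, k - 1] + rng.normal(scale=np.sqrt(sigma2), size=trials)

def r_hat(k, n):
    """Ensemble estimate of E{X[k + n] X*[k]} (X is real here)."""
    return np.mean(X[:, k + n] * X[:, k])

n = 3
r_theory = sigma2 * a**abs(n) / (1 - a**2)
print(r_hat(5, n), r_hat(30, n), r_theory)   # approximately equal for both k
```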


Power Spectrum
For a deterministic signal x(t), the spectrum is well defined: if $X(\omega)$ represents its Fourier transform, i.e., if
$$X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt, \tag{18-1}$$
then $|X(\omega)|^2$ represents its energy spectrum. This follows from Parseval's theorem, since the signal energy is given by
$$\int_{-\infty}^{\infty} x^2(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |X(\omega)|^2\,d\omega = E. \tag{18-2}$$
Thus $\frac{1}{2\pi}|X(\omega)|^2\,\Delta\omega$ represents the signal energy in the band $(\omega, \omega + \Delta\omega)$ (see Fig 18.1).

| X ( )|2
X (t ) Energy in( ,  )

0 t 0 
   
Fig 18.1
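Parseval's relation (18-2) can be checked numerically by sampling a finite-energy signal and approximating the Fourier integral (18-1) with a scaled FFT. This is a sketch under stated assumptions: the Gaussian pulse $x(t) = e^{-t^2}$, the grid, and the scaling $X(\omega_k) \approx \Delta t \cdot \mathrm{FFT}$ are illustrative choices. The FFT differs from (18-1) by a phase factor because the grid starts at $t = -5$, which does not affect $|X(\omega)|^2$. For this pulse, $\int x^2(t)\,dt = \sqrt{\pi/2}$.

```python
import numpy as np

dt = 0.01
t = np.arange(-5.0, 5.0, dt)
x = np.exp(-t**2)                        # hypothetical finite-energy signal

# Left side of (18-2): integral of x^2(t) dt, approximated by a Riemann sum
E_time = np.sum(x**2) * dt

# X(w) on the grid w_k = 2*pi*k / (N*dt), up to a phase factor
X_w = np.fft.fft(x) * dt
dw = 2 * np.pi / (len(x) * dt)

# Right side of (18-2): (1/2pi) * integral of |X(w)|^2 dw
E_freq = np.sum(np.abs(X_w) ** 2) * dw / (2 * np.pi)
print(E_time, E_freq, np.sqrt(np.pi / 2))
```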
However, for stochastic processes, a direct application of (18-1) generates a sequence of random variables for every $\omega$. Moreover, for a stochastic process, $E\{|X(t)|^2\}$ represents the ensemble average power (instantaneous energy) at the instant $t$.

To obtain the spectral distribution of power versus frequency for stochastic processes, it is best to avoid infinite intervals to begin with, and start with a finite interval $(-T, T)$ in (18-1). Formally, the partial Fourier transform of a process X(t) based on $(-T, T)$ is given by
$$X_T(\omega) = \int_{-T}^{T} X(t)\,e^{-j\omega t}\,dt, \tag{18-3}$$
so that
$$\frac{|X_T(\omega)|^2}{2T} = \frac{1}{2T}\left|\int_{-T}^{T} X(t)\,e^{-j\omega t}\,dt\right|^2 \tag{18-4}$$
represents the power distribution associated with that realization based on $(-T, T)$. Notice that (18-4) represents a random variable for every $\omega$, and its ensemble average gives the average power distribution based on $(-T, T)$. Thus
$$P_T(\omega) = E\left\{\frac{|X_T(\omega)|^2}{2T}\right\} = \frac{1}{2T}\int_{-T}^{T}\!\int_{-T}^{T} E\{X(t_1)X^*(t_2)\}\,e^{-j\omega(t_1 - t_2)}\,dt_1\,dt_2 = \frac{1}{2T}\int_{-T}^{T}\!\int_{-T}^{T} R_{XX}(t_1, t_2)\,e^{-j\omega(t_1 - t_2)}\,dt_1\,dt_2 \tag{18-5}$$
represents the power distribution of X(t) based on $(-T, T)$. For wide-sense stationary (w.s.s.) processes, it is possible to further simplify (18-5). Thus if X(t) is assumed to be w.s.s., then $R_{XX}(t_1, t_2) = R_{XX}(t_1 - t_2)$ and (18-5) simplifies to
$$P_T(\omega) = \frac{1}{2T}\int_{-T}^{T}\!\int_{-T}^{T} R_{XX}(t_1 - t_2)\,e^{-j\omega(t_1 - t_2)}\,dt_1\,dt_2.$$
Let $\tau = t_1 - t_2$ and proceeding as in (14-24), we get
$$P_T(\omega) = \frac{1}{2T}\int_{-2T}^{2T} R_{XX}(\tau)\,e^{-j\omega\tau}\,(2T - |\tau|)\,d\tau = \int_{-2T}^{2T} R_{XX}(\tau)\,e^{-j\omega\tau}\left(1 - \frac{|\tau|}{2T}\right)d\tau \ge 0 \tag{18-6}$$
to be the power distribution of the w.s.s. process X(t) based on $(-T, T)$. Finally, letting $T \to \infty$ in (18-6), we obtain
$$S_{XX}(\omega) = \lim_{T \to \infty} P_T(\omega) = \int_{-\infty}^{\infty} R_{XX}(\tau)\,e^{-j\omega\tau}\,d\tau \ge 0 \tag{18-7}$$
to be the power spectral density of the w.s.s. process X(t). Notice that
$$R_{XX}(\tau) \;\overset{\text{F.T.}}{\longleftrightarrow}\; S_{XX}(\omega) \ge 0, \tag{18-8}$$

i.e., the autocorrelation function and the power spectrum of a w.s.s. process form a Fourier transform pair, a relation known as the Wiener-Khinchin Theorem. From (18-8), the inverse formula gives
$$R_{XX}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{XX}(\omega)\,e^{j\omega\tau}\,d\omega, \tag{18-9}$$
and in particular for $\tau = 0$, we get
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} S_{XX}(\omega)\,d\omega = R_{XX}(0) = E\{|X(t)|^2\} = P, \text{ the total power.} \tag{18-10}$$
From (18-10), the area under $S_{XX}(\omega)$ (scaled by $1/2\pi$) represents the total power of the process X(t), and hence $S_{XX}(\omega)$ truly represents the power spectrum (Fig 18.2).
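The construction (18-3)-(18-7) suggests a practical estimate: average $|X_T(\omega)|^2 / 2T$ over realizations. This is a discrete-time analogue under assumed conditions, not the slides' own procedure: it uses the hypothetical AR(1) sequence $X_n = aX_{n-1} + W_n$, whose spectrum is known to be $S(\omega) = \sigma^2 / |1 - ae^{-j\omega}|^2$, replaces $2T$ by the record length `N`, and also checks the discrete form of (18-10), namely that the average of the spectrum over one period of $\omega$ equals $r_0 = E\{|X_n|^2\}$.

```python
import numpy as np

# Hypothetical AR(1) example: X[n] = a X[n-1] + W[n], W white with variance sigma2
rng = np.random.default_rng(7)
a, sigma2 = 0.7, 1.0
N, trials = 256, 2000

X = np.empty((trials, N))
X[:, 0] = rng.normal(scale=np.sqrt(sigma2 / (1 - a**2)), size=trials)  # stationary start
for n in range(1, N):
    X[:, n] = a * X[:, n - 1] + rng.normal(scale=np.sqrt(sigma2), size=trials)

# Finite-record periodogram |X_N(w)|^2 / N, averaged over the ensemble (cf. P_T in (18-5))
P_avg = np.mean(np.abs(np.fft.fft(X, axis=1)) ** 2 / N, axis=0)
w = 2 * np.pi * np.arange(N) / N                      # DFT frequencies in [0, 2*pi)
S_theory = sigma2 / np.abs(1 - a * np.exp(-1j * w)) ** 2

rel_err = np.abs(P_avg - S_theory) / S_theory

# Discrete analogue of (18-10): the mean over one period of w approximates r_0
total_power = np.mean(P_avg)
r0 = sigma2 / (1 - a**2)
print(np.median(rel_err), total_power, r0)
```

The remaining mismatch comes from the finite number of realizations and from the triangular weighting $(1 - |n|/N)$, the discrete counterpart of the factor in (18-6), which vanishes as `N` grows.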
If X(t) is a real w.s.s. process, then $R_{XX}(\tau) = R_{XX}(-\tau)$, so that
$$S_{XX}(\omega) = \int_{-\infty}^{\infty} R_{XX}(\tau)\,e^{-j\omega\tau}\,d\tau = \int_{-\infty}^{\infty} R_{XX}(\tau)\cos\omega\tau\,d\tau = 2\int_{0}^{\infty} R_{XX}(\tau)\cos\omega\tau\,d\tau = S_{XX}(-\omega) \ge 0, \tag{18-13}$$
so that the power spectrum is an even function (in addition to being real and nonnegative).
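As a numerical check of (18-13), this sketch evaluates $2\int_0^\infty R_{XX}(\tau)\cos\omega\tau\,d\tau$ with the trapezoidal rule for the assumed autocorrelation $R_{XX}(\tau) = e^{-\alpha|\tau|}$ (a standard example, not from the slides), whose transform is $S_{XX}(\omega) = 2\alpha/(\alpha^2 + \omega^2)$. The upper limit is truncated at $\tau = 40$, where $R_{XX}$ is negligible, and the value of `alpha` is arbitrary.

```python
import numpy as np

# Hypothetical real w.s.s. autocorrelation R(tau) = exp(-alpha |tau|)
alpha = 1.5
tau = np.linspace(0.0, 40.0, 400_001)   # [0, 40] stands in for [0, inf)
R = np.exp(-alpha * tau)

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids depending on a specific NumPy version)."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

def S_from_R(w):
    """(18-13): S(w) = 2 * integral_0^inf R(tau) cos(w tau) d tau."""
    return 2.0 * trapezoid(R * np.cos(w * tau), tau)

for w in (0.0, 1.0, 3.0):
    print(w, S_from_R(w), 2 * alpha / (alpha**2 + w**2))
```

Because the integrand uses $\cos\omega\tau$, the result is automatically real and even in $\omega$, as (18-13) states.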
