Probability Theory in Random Processes
DEPARTMENT OF ECE
COURSE FILE
PREPARED BY
[Link] Ram
Unit-I: PROBABILITY AND RANDOM VARIABLE
Introduction
It is remarkable that a science which began with the consideration of games of
chance should have become the most important object of human knowledge.
A brief history
Probability has an amazing history. A practical gambling problem faced by the French
nobleman Chevalier de Méré sparked the idea of probability in the mind of Blaise Pascal
(1623-1662), the famous French mathematician. Pascal's correspondence with Pierre de
Fermat (1601-1665), another French mathematician, in the form of seven letters in 1654 is
regarded as the genesis of probability. Early mathematicians like Jacob Bernoulli (1654-1705),
Abraham de Moivre (1667-1754), Thomas Bayes (1702-1761) and Pierre-Simon de
Laplace (1749-1827) contributed to the development of probability. Laplace's Théorie
analytique des probabilités gave comprehensive tools to calculate probabilities based on
the principles of permutations and combinations. Laplace also said, "Probability theory is
nothing but common sense reduced to calculation."
For example, the thermal noise appearing in an electronic device is generated by the random
motion of electrons. We have deterministic models for weather prediction, which take into
account the factors affecting the weather. We can also predict the temperature or the
rainfall of a place locally on the basis of previous data. Probabilistic models are established from
observation of a random phenomenon. While probability is concerned with the analysis of a
random phenomenon, statistics helps in building such models from data.
A deterministic model can be used for a physical quantity and the process generating it
provided sufficient information is available about the initial state and the dynamics of the
process generating the physical quantity. For example,
Many of the physical quantities are random in the sense that these quantities cannot be
predicted with certainty and can be described in terms of probabilistic models only. For
example,
The outcome of the tossing of a coin cannot be predicted with certainty. Thus the
outcome of tossing a coin is random.
The number of ones and zeros in a packet of binary data arriving through a
communication channel cannot be precisely predicted and is therefore random.
The ubiquitous noise corrupting the signal during acquisition, storage and
transmission can be modelled only through statistical analysis.
A digital signal is defined at discrete points and also takes a discrete set of
values.
As an example, consider the case of an analog-to-digital (AD) converter. The input to the
AD converter is an analog signal while the output is a digital signal obtained by taking the
samples of the analog signal at periodic intervals of time and approximating the sampled
values by a discrete set of values.
Figure 3 Analog-to-digital (AD) converters
Random Signal
i. Radar signal: Signals are sent out and get reflected by targets. The reflected
signals are received and used to locate the target and target distance from the
receiver. The received signals are highly noisy and demand statistical techniques
for processing.
ii. Sonar signal: Sound signals are sent out and then the echoes generated by some
targets are received back. The goal of processing the signal is to estimate the
location of the target.
These signals can be described with the help of probability and other concepts in
statistics. Particularly the signal under observation is considered as a realization of a
random process or a stochastic process. The terms random processes, stochastic processes
and random signals are used synonymously.
A deterministic signal is analyzed in the frequency-domain through Fourier series
and Fourier transforms. We have to know how random signals can be analyzed in the
frequency domain.
Processing refers to performing any operations on the signal. The signal can be
amplified, integrated, differentiated and rectified. Any noise that corrupts the signal can
also be reduced by performing some operations. Signal processing thus involves
o Amplification
o Filtering
These operations are performed by passing the input signal to a system that
performs the processing. For example, filtering involves selectively emphasising certain
frequency components and attenuating others. In low-pass filtering illustrated in Fig.4,
high-frequency components are attenuated
A problem frequently encountered in signal processing is the estimation of the true
value of the signal from the received noisy data. Consider the received noisy signal $x(t)$
given by
$x(t) = s(t) + n(t)$
where $s(t)$ is the desired transmitted signal buried in the noise $n(t)$.
Simple frequency selective filters cannot be applied here, because random noise
cannot be localized to any spectral band and does not have a specific spectral pattern. We
have to do this by dissociating the noise from the signal in the probabilistic sense. Optimal
filters like the Wiener filter, adaptive filters and the Kalman filter deal with this problem.
In estimation, we try to find a value that is close enough to the transmitted signal.
The process is explained in Figure 6. Detection is a related process that decides the best
choice out of a finite number of possible values of the transmitted signal with minimum
error probability. In binary communication, for example, the receiver has to decide about
'zero' and 'one' on the basis of the received waveform. Signal detection theory, also known
as decision theory, is based on hypothesis testing and other related techniques and widely
applied in pattern classification, target detection etc.
Digital data is efficiently represented with number of bits for a symbol decided by
its probability of occurrence.
The data at a rate smaller than the channel capacity can be transmitted over a noisy
channel with arbitrarily small probability of error. The channel capacity again is
determined from the probabilistic descriptions of the signal and the noise.
A set is a well defined collection of objects. These objects are called elements or
members of the set. Usually uppercase letters are used to denote sets.
Probability Concepts
2. Sample Space: The sample space $S$ is the collection of all possible outcomes of a
random experiment. The elements of $S$ are called sample points.
3. Event: An event A is a subset of the sample space such that probability can be
assigned to it. Thus $A \subseteq S$.
Figure 1
The associated finite sample space is .Some events are
And so on.
We may have to toss the coin any number of times before a head is obtained. Thus the
possible outcomes are:
H, TH, TTH, TTTH, ...
How many outcomes are there? The outcomes are countable but infinite in number. The
countably infinite sample space is $S = \{H, TH, TTH, TTTH, \ldots\}$.
Definition of probability
Consider a random experiment with a finite number of outcomes $N$. If all the outcomes of
the experiment are equally likely, the probability of an event $A$ is defined by
$P(A) = N_A / N$
where $N_A$ is the number of outcomes favourable to $A$.
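Example 6 A fair die is rolled once. What is the probability of getting a '6'?
Here $S = \{1, 2, 3, 4, 5, 6\}$ and $A = \{6\}$, so $N = 6$, $N_A = 1$ and $P(A) = 1/6$.
Example 7 A fair coin is tossed twice. What is the probability of getting two 'heads'?
Here $S = \{HH, HT, TH, TT\}$ and $A = \{HH\}$.
The total number of outcomes is 4 and all four outcomes are equally likely, so $P(A) = 1/4$.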
Discussion
The classical definition is limited to a random experiment which has only a finite
number of outcomes. In many experiments like that in the above examples, the
sample space is finite and each outcome may be assumed ‘equally likely.' In such
cases, the counting method can be used to compute probabilities of events.
Consider the experiment of tossing a fair coin until a 'head' appears. As we have
discussed earlier, there are countably infinite outcomes. Can you believe that all
these outcomes are equally likely?
The notion of equally likely is important here. Equally likely means equally
probable. Thus this definition presupposes that all outcomes occur with equal
probability. The definition is therefore circular: it includes the very concept to be defined.
If an experiment is repeated $n$ times under similar conditions and the event $A$ occurs
$n_A$ times, then
$P(A) = \lim_{n \to \infty} n_A / n$
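The relative-frequency idea can be illustrated by simulation. The following is a minimal sketch, assuming a fair six-sided die simulated with Python's pseudo-random generator; the function name `relative_frequency`, the seed and the trial counts are illustrative choices, not part of the course material.

```python
import random

def relative_frequency(event, trials, seed=0):
    """Estimate P(event) as (number of occurrences) / (number of trials).

    `event` is a predicate on the face shown by a simulated fair die.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

# The estimate for the event "a 6 appears" settles near 1/6 as trials grow.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(lambda face: face == 6, n))
```

The estimates fluctuate for small numbers of trials and stabilise as the number of trials increases, which is the behaviour the relative-frequency definition relies on.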
Example 8 Suppose a die is rolled 500 times. The following table shows the frequency of
each face.
We see that the relative frequencies are close to $1/6$. How do we ascertain that these
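Definition Let $S$ be a sample space and $\mathcal{F}$ a sigma field defined over it. Let
$P: \mathcal{F} \to \mathbb{R}$ be a mapping from the sigma-algebra $\mathcal{F}$ into the real line
such that for each $A \in \mathcal{F}$, there exists a unique $P(A) \in \mathbb{R}$. Clearly $P$ is a
set function and is called probability, if it satisfies the following three axioms.
1. $P(A) \ge 0$ for every $A \in \mathcal{F}$
2. $P(S) = 1$
3. Countable additivity: if $A_1, A_2, \ldots$ are pairwise disjoint events, then
$P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$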
Figure 2
Discussion
Any assignment of probability must satisfy the above three axioms.
If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
This is a special case of axiom 3 and for a discrete sample space, this simpler
version may be considered as the axiom 3. We shall give a proof of this result
below.
1.
Suppose,
Then
Therefore
3. where where
We have,
4. If
We have,
5. If
We have ,
6. We can apply the properties of sets to establish the following result for
,
Consider a finite sample space $S = \{s_1, s_2, \ldots, s_n\}$. Then the sigma algebra
$\mathcal{F}$ is defined by the power set of S. For any elementary event $\{s_i\}$, we can
assign a probability $P(s_i)$ such that
$\sum_{i=1}^{n} P(s_i) = 1$
In a special case, when the outcomes are equi-probable, we can assign equal
probability $p = 1/n$ to each elementary event.
Example 9 Consider the experiment of rolling a fair die considered in example 2.
Example 10 Consider the experiment of tossing a fair coin until a head is obtained
discussed in Example 3. Here . Let us call
Suppose the sample space S is continuous and uncountable. Such a sample space
arises when the outcomes of an experiment are real numbers. For example, such a sample space
occurs when the experiment consists of measuring the voltage, the current or the
resistance. In such a case, the sigma algebra consists of the Borel sets on the real line.
Example 11 Suppose
Then for
In many applications we have to deal with a finite sample space and the
elementary events formed by single elements of the set may be assumed equiprobable. In
this case, we can define the probability of the event A according to the classical definition
discussed earlier:
1. Product rule Suppose we have a set A with m distinct elements and the set B with
Example 1 A fair die is thrown twice. What is the probability that a 3 will appear at least
once?
Solution: The sample space corresponding to two throws of the die is illustrated in the
following table. Clearly, the sample space has $6 \times 6 = 36$ elements by the product rule.
The event corresponding to getting at least one 3 is highlighted and contains 11 elements.
Therefore, the required probability is $11/36$.
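The counting argument for the two-dice example can be checked by enumerating the sample space directly. This is a small sketch under the stated assumption that all ordered pairs of faces are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two throws of a die (product rule: 6 * 6 outcomes).
sample_space = list(product(range(1, 7), repeat=2))

# Outcomes in which a 3 appears at least once.
event = [outcome for outcome in sample_space if 3 in outcome]

# Classical definition: favourable outcomes / total outcomes.
probability = Fraction(len(event), len(sample_space))
print(len(sample_space), len(event), probability)  # 36 11 11/36
```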
Example 2 Birthday problem - Given a class of students, what is the probability of at least two
students in the class having the same birthday? Plot this probability vs. number of
students and be surprised!
Let $n$ be the number of students in the class.
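The birthday probability can be tabulated numerically. The sketch below assumes 365 equally likely birthdays, independent across students, and no leap years; under these assumptions the probability that all birthdays are distinct is the product of (365 - k)/365 for k = 0, ..., n - 1, and the required probability is its complement. The function name is illustrative.

```python
def birthday_collision_probability(n, days=365):
    """P(at least two of n students share a birthday).

    Assumes birthdays are independent and uniform over `days` days.
    """
    if n > days:
        return 1.0  # pigeonhole principle: a shared birthday is certain
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (10, 23, 40, 60):
    print(n, round(birthday_collision_probability(n), 4))
```

Under these assumptions the probability already exceeds one half for a class of 23 students, which is the surprise the example alludes to.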
Example 3 An urn contains 6 red balls, 5 green balls and 4 blue balls. 9 balls were picked
at random from the urn without replacement. What is the probability that out of the balls 4
are red, 3 are green and 2 are blue?
Solution :
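One way to obtain the answer is by direct counting, assuming every selection of 9 balls out of the 15 is equally likely. The sketch below uses binomial coefficients for the number of ways to choose the red, green and blue balls; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Ways to pick 4 of 6 red, 3 of 5 green and 2 of 4 blue balls.
favourable = comb(6, 4) * comb(5, 3) * comb(4, 2)

# Ways to pick any 9 balls out of the 15 in the urn.
total = comb(15, 9)

probability = Fraction(favourable, total)
print(favourable, total, probability, float(probability))
```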
Example 4 What is the probability that in a throw of 12 dice each face occurs twice?
Solution: The total number of elements in the sample space of the outcomes of a
single throw of 12 dice is $6^{12}$.
The number of favourable outcomes is the number of ways in which 12 dice can be
arranged in six groups of size 2 each – group 1 consisting of two dice each showing 1,
group 2 consisting of two dice each showing 2 and so on.
Therefore, the total number of distinct groupings is $12! / (2!)^6$, and the required
probability is $12! / ((2!)^6 \, 6^{12})$.
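The numbers in this example are too large for comfortable hand calculation, so a short numerical check may help. The sketch assumes the 12 dice are distinguishable and fair, so that the favourable arrangements are counted by the multinomial coefficient 12!/(2!)^6.

```python
from fractions import Fraction
from math import factorial

# Number of ways to split 12 distinguishable dice into six labelled pairs,
# one pair for each face value.
favourable = factorial(12) // factorial(2) ** 6

# Every die can show any of 6 faces.
total = 6 ** 12

probability = Fraction(favourable, total)
print(favourable, total, float(probability))
```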
Conditional probability
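Consider the probability space $(S, \mathcal{F}, P)$. Let A and B be two events in $\mathcal{F}$. We ask the
following question –
Given that A has occurred, what is the probability of B?
Let us consider the case of equiprobable events discussed earlier. Let $N_{AB}$ sample
points be favourable for the joint event $A \cap B$ and $N_A$ sample points be favourable for $A$.
Figure 1
Clearly, $P(B | A) = N_{AB} / N_A = (N_{AB}/N) / (N_A/N) = P(A \cap B) / P(A)$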
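Example 2 A family has two children. It is known that at least one of the children is a
girl. What is the probability that both the children are girls?
Clearly, with the sample space $\{GG, GB, BG, BB\}$ of equally likely outcomes,
$P(\text{both girls} \mid \text{at least one girl}) = (1/4) / (3/4) = 1/3$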
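The two-children example can be verified by enumeration. This sketch assumes each child is equally likely to be a girl (G) or a boy (B), independently of the other, and computes the conditional probability as the ratio of favourable outcomes within the conditioning event.

```python
from fractions import Fraction
from itertools import product

# Equally likely ordered outcomes: (first child, second child).
families = list(product("GB", repeat=2))

# Conditioning event: at least one girl.
at_least_one_girl = [f for f in families if "G" in f]

# Joint event: both children are girls (a subset of the conditioning event).
both_girls = [f for f in at_least_one_girl if f == ("G", "G")]

p_conditional = Fraction(len(both_girls), len(at_least_one_girl))
print(p_conditional)  # 1/3
```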
In the following we show that the conditional probability satisfies the axioms of
probability.
By definition
Axiom 1:
Axiom 2 :
We have ,
Axiom 3 :
We have ,
Figure 2
If , then
We have ,
Chain Rule of Probability
We have ,
Joint Probability
Joint probability is defined as the probability of both A and B taking place, and is denoted
by P(AB).
Joint probability is not the same as conditional probability, though the two concepts are
often confused. Conditional probability assumes that one event has taken place or will take place,
and then asks for the probability of the other (A, given B). Joint probability does not have such
conditions; it simply asks for the chances of both happening (A and B). In a problem, to help
distinguish between the two, look for qualifiers that one event is conditional on the other
(conditional) or whether they will happen concurrently (joint).
Exam questions may test the ability to calculate joint probabilities. Such computations require use of
the multiplication rule, which states that the joint probability of A and B is the product of the
conditional probability of A given B and the probability of B. In probability notation:
P(AB) = P(A | B) P(B)
Given a conditional probability P(A | B) = 40%, and a probability of B = 60%, the joint probability
P(AB) = 0.6*0.4 or 24%, found by applying the multiplication rule.
The joint probability also appears in the addition rule for the union of two events:
P(A∪B) = P(A) + P(B) - P(A∩B)
Moreover, for independent events the multiplication rule reduces to a product of the individual
probabilities, and it generalizes to more than two events provided they are all independent of one
another, so the joint probability of three events is P(ABC) = P(A) * P(B) * P(C), again assuming
independence.
Total Probability
Remark
(2) The theorem of total probability can be used to determine the probability of a
complex event in terms of related simpler events. This result will be used in Bayes'
theorem, to be discussed towards the end of the lecture.
Example 3 Suppose a box contains 2 white and 3 black balls. Two balls are picked at
random without replacement.
Bayes' Theorem
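For events $A_1, A_2, \ldots, A_n$ that partition $S$ and an event $B$ with $P(B) > 0$,
$P(A_k | B) = P(B | A_k) P(A_k) / \sum_{i=1}^{n} P(B | A_i) P(A_i)$
This result is known as Bayes' theorem. The probability $P(A_k)$ is called the a priori
probability and $P(A_k | B)$ is called the a posteriori probability. Thus Bayes' theorem
enables us to determine the a posteriori probability $P(A_k | B)$ from the observation that B
has occurred. This result is of practical importance and is the heart of Bayesian
classification, Bayesian estimation etc.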
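The computation behind Bayes' theorem can be sketched in a few lines. The priors and likelihoods below are hypothetical numbers chosen only for illustration (they are not taken from any example in these notes), and the function name `bayes_posteriors` is likewise an assumption. The denominator is the total probability of the observed event.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return (P(B), [P(A_k | B)]) from priors P(A_k) and likelihoods P(B | A_k).

    The events A_k are assumed to partition the sample space.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B | A_k) P(A_k)
    total = sum(joint)                                    # total probability P(B)
    return total, [j / total for j in joint]

# Hypothetical three-way partition with a priori probabilities 0.5, 0.3, 0.2.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
# Hypothetical conditional probabilities of the observed event B.
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

p_b, posteriors = bayes_posteriors(priors, likelihoods)
print(p_b, posteriors)
```

Exact fractions are used so that the posteriors can be seen to sum to one without rounding error.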
Example 6
Given and
Example 7: In an electronics laboratory, there are identically looking capacitors of three
makes in the ratio [Link]. It is known that 1% of , 1.5% of
are defective. What percentage of the capacitors in the laboratory is defective? If a capacitor
picked at random is found to be defective, what is the probability it is of make ?
Here
Independent events
Two events are called independent if the probability of occurrence of one event
does not affect the probability of occurrence of the other. Thus the events A and B are
independent if
$P(B | A) = P(B)$ and $P(A | B) = P(A)$
or, equivalently, $P(A \cap B) = P(A) P(B)$
Two events A and B are called statistically dependent if they are not independent.
Similarly, we can define the independence of n events. The events are
called independent if and only if
Example 4 Consider the example of tossing a fair coin twice. The resulting sample space
is given by $S = \{HH, HT, TH, TT\}$ and all the outcomes are equiprobable.
and
Again, so that
A random variable associates the points in the sample space with real numbers.
Observations:
$S$ is the domain of $X$.
The range of $X$, denoted by $R_X$, is given by $R_X = \{X(s) \mid s \in S\}$.
Clearly $R_X \subseteq \mathbb{R}$.
• The above definition of the random variable requires that the mapping $X$ is such
that $\{s \mid X(s) \le x\}$ is a valid event in $\mathcal{F}$. If $S$ is a discrete sample space, this
requirement is met by any mapping $X: S \to \mathbb{R}$. Thus any mapping defined on the
discrete sample space is a random variable.
Example 2 Consider the example of tossing a fair coin twice. The sample space is S={
HH,HT,TH,TT} and all four outcomes are equally likely. Then we can define a random
variable as follows
Here .
Example 3 Consider the sample space associated with the single toss of a fair die.
The sample space is given by $S = \{1, 2, 3, 4, 5, 6\}$.
If we define the random variable $X$ that associates a real number equal to the
number on the face of the die, then $R_X = \{1, 2, 3, 4, 5, 6\}$.
Discrete, Continuous and Mixed-type Random Variables
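Consider the Borel set $(-\infty, x]$, where $x$ represents any real number. The equivalent
event $\{s \mid X(s) \le x\}$ is denoted as $\{X \le x\}$. The event $\{X \le x\}$ can be
taken as a representative event in studying the probability description of a random variable
$X$. Any other event can be represented in terms of this event. For example,
$\{X > x\} = \{X \le x\}^c$, $\{x_1 < X \le x_2\} = \{X \le x_2\} \setminus \{X \le x_1\}$
and so on.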
This follows from the fact that $F_X(x) = P(\{X \le x\})$ is a probability and its value should lie
between 0 and 1.
.
We have ,
We can further establish the following results on probability of events on the real line:
Find a) .
b) .
c) .
d) .
Solution:
Figure 6 shows the plot of FX(x).
Figure 6
The discrete random variable in this case is completely specified by the probability
mass function (pmf) .
Clearly,
• Suppose .Then
Example 1
Interpretation of
so that
.
This follows from the fact that is a non-decreasing function
Figure 8 below illustrates the probability of an elementary interval in terms of the pdf.
Remark: Using the Dirac delta function we can define the density function for a
discrete random variable.
Consider the random variable defined by the probability mass function (pmf)
Example 3
Consider the random variable defined with the distribution function given by,
where
where
Figure 10
where
and
Example 5
X is the random variable representing the life time of a device with the PDF for
. Define the following random variable
Find FY(y).
Solution:
In the following, we shall discuss a few commonly used discrete random variables. The
importance of these random variables will be highlighted.
Suppose X is a random variable that takes two values 0 and 1, with probability mass
function
$p_X(1) = P\{X = 1\} = p$
and
$p_X(0) = P\{X = 0\} = 1 - p$
Such a random variable X is called a Bernoulli random variable, because it describes the
outcomes of a Bernoulli trial.
Remark
We can define the pdf of $X$ with the help of the Dirac delta function. Thus
$f_X(x) = (1 - p)\,\delta(x) + p\,\delta(x - 1)$
Remark
The Bernoulli RV is the simplest discrete RV. It can be used as the building block
for many discrete RVs.
For the Bernoulli RV, $E X^n = 1^n \cdot p + 0^n \cdot (1 - p) = p$ for $n = 1, 2, \ldots$
Thus all the moments of the Bernoulli RV have the same value of $p$.
where
The sum of n independent identically distributed Bernoulli random variables is a
binomial random variable.
The binomial distribution is useful when there are two types of objects - good, bad;
correct, erroneous; healthy, diseased etc.
Suppose $X$ is the random variable representing the number of bit errors in a block of
8 bits. Then
Therefore,
The probability mass function for a binomial random variable with n = 6 and p =0.8 is
shown in the Figure 3 below.
Figure 3
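The probability mass function for n = 6 and p = 0.8 can be tabulated directly from the standard binomial formula C(n, k) p^k (1 - p)^(n - k). The sketch below is illustrative; the function name `binomial_pmf` is an assumption and the printed values are rounded.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 6, 0.8
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, value in enumerate(pmf):
    print(k, round(value, 6))

# Mean computed from the pmf; for a binomial RV it equals n * p.
mean = sum(k * value for k, value in enumerate(pmf))
print("mean:", round(mean, 6))
```

For these parameters the pmf peaks at k = 5, reflecting the high success probability.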
A discrete random variable X is called a Poisson random variable with the parameter $\lambda$ if
$\lambda > 0$ and
$p_X(k) = e^{-\lambda} \lambda^k / k!$, $k = 0, 1, 2, \ldots$
i. no call is received
ii. exactly 5 calls are received
iii. More than 3 calls are received.
Solution: Let X be the random variable representing the number of calls received. Given
Where Therefore,
0.9897
Then
Thus the Poisson approximation can be used to compute binomial probabilities for large n.
It also makes the analysis of such probabilities easier. Typical examples are:
Example 4 Suppose there is an error probability of 0.01 per word in typing. What is the
probability that there will be more than 1 error in a page of 120 words?
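Solution: Suppose X is the RV representing the number of errors per page of 120 words.
Then X is binomial with $n = 120$ and $p = 0.01$, which can be approximated by a Poisson RV
with parameter $\lambda$, where $\lambda = np = 120 \times 0.01 = 1.2$. Therefore,
$P(X > 1) = 1 - P(X = 0) - P(X = 1) \approx 1 - e^{-1.2}(1 + 1.2) \approx 0.337$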
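The quality of the Poisson approximation in this example can be checked against the exact binomial value. The sketch assumes errors in different words are independent with probability 0.01 each, so that the number of errors in 120 words is binomial and the Poisson parameter is the product of the two; the helper names are illustrative.

```python
from math import comb, exp, factorial

n, p = 120, 0.01   # words per page, error probability per word
lam = n * p        # Poisson parameter used in the approximation

def binomial_pmf(k):
    """Exact P(X = k) for the binomial model."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k):
    """Approximate P(X = k) using the Poisson model with parameter lam."""
    return exp(-lam) * lam ** k / factorial(k)

# P(more than 1 error) = 1 - P(0 errors) - P(1 error)
exact = 1 - binomial_pmf(0) - binomial_pmf(1)
approx = 1 - poisson_pmf(0) - poisson_pmf(1)
print(round(exact, 4), round(approx, 4))
```

The two values agree to about three decimal places, which is why the approximation is convenient for large n and small p.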
In the following we shall discuss some important continuous random variables.
A continuous random variable X is called uniformly distributed over the interval [a,
b], $-\infty < a < b < \infty$, if its probability density function is given by
$f_X(x) = 1/(b - a)$ for $a \le x \le b$, and $f_X(x) = 0$ otherwise.
Figure 1
Distribution function
Figure 2 illustrates the CDF of a uniform random variable.
Figure 2
Example 1
Figure 3 illustrates two normal variables with the same mean but different variances.
Figure 3
Thus can be computed from tabulated values of . The table was very
useful in the pre-computer days.
These results follow from the symmetry of the Gaussian pdf. The function is
tabulated and the tabulated results are used to compute probability involving the Gaussian
random variable.
Using the Error Function to compute Probabilities for Gaussian Random Variables
The function is closely related to the error function and the complementary
error function .
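In place of printed tables, the error function available in most numerical libraries can be used. The sketch below relies on the standard identities Phi(x) = (1 + erf(x / sqrt(2))) / 2 for the standard normal CDF and Q(x) = 1 - Phi(x) = erfc(x / sqrt(2)) / 2 for its tail probability; the function names and the symbols Phi and Q are assumptions made here for illustration.

```python
from math import erf, erfc, sqrt

def std_normal_cdf(x):
    """Phi(x): CDF of the zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def q_function(x):
    """Q(x) = 1 - Phi(x): tail probability of the standard Gaussian."""
    return 0.5 * erfc(x / sqrt(2.0))

def gaussian_interval_probability(a, b, mu, sigma):
    """P(a < X <= b) for X Gaussian with mean mu and standard deviation sigma."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)

print(round(q_function(1.0), 6))
print(round(gaussian_interval_probability(-1.0, 1.0, 0.0, 1.0), 6))
```

Any Gaussian probability reduces to the standard normal CDF after subtracting the mean and dividing by the standard deviation, as the last function shows.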
Note that,
If X is distributed, then
Proof:
Exponential Random Variable
Figure 1
Example 1
Figure 6
Similarly,
Relation between the Rayleigh Distribution and the Gaussian Distribution
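Consider the event $\{X \le x\}$ and any event B involving the random variable X. The
conditional distribution function of X given B is defined as
$F_X(x | B) = P(\{X \le x\} | B) = P(\{X \le x\} \cap B) / P(B)$, $P(B) \ne 0$
We can verify that $F_X(x | B)$ satisfies all the properties of the distribution function.
Particularly,
$F_X(-\infty | B) = 0$ and $F_X(\infty | B) = 1$.
$0 \le F_X(x | B) \le 1$.
$F_X(x | B)$ is a non-decreasing function of $x$.
All the properties of the pdf apply to the conditional pdf and we can easily show that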
Then
And
Case 2:
and
and are plotted in the following figures.
Figure 1
For , .Therefore,
For , .Therefore,
Thus,
and . Then
where and
Remark
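provided the defining integral exists.
$EX$ is also called the mean or statistical average of the random variable $X$ and is
denoted by $\mu_X$.
Note that, for a discrete RV $X$ with the probability mass function (pmf) $p_X(x_i)$, $i = 1, 2, \ldots, N$,
the pdf is given by $f_X(x) = \sum_{i=1}^{N} p_X(x_i)\,\delta(x - x_i)$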
Example 1
Then
Example 2
pX(x)
Then
Then
=
Hence EX does not exist. This density function is known as the Cauchy density function.
We shall illustrate the above result in the special case when $y = g(x)$ is a one-to-
one and monotonically increasing function of $x$. In this case,
Figure 2
(a) If is a constant,
Clearly
Mean-square value
Variance
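For a random variable $X$ with the pdf $f_X(x)$ and mean $\mu_X$, the variance of $X$ is denoted
by $\sigma_X^2$ and defined as
$\sigma_X^2 = E(X - \mu_X)^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$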
Example 4
Find the variance of the random variable in the above example
Example 5
Find the variance of the random variable discussed in above example. As already
computed
For example, consider two random variables with pmf as shown below. Note
Properties of variance
(1)
(2) If then
(3) If is a constant,
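We can define the nth moment and the nth central moment of a random variable X by
the following relations
$E(X^n) = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$, $n = 1, 2, \ldots$
$E(X - \mu_X)^n = \int_{-\infty}^{\infty} (x - \mu_X)^n f_X(x)\,dx$, $n = 1, 2, \ldots$
Note that
The mean $\mu_X = E(X)$ is the first moment and the mean-square value $E(X^2)$ is the
second moment.
The first central moment is 0 and the variance $\sigma_X^2 = E(X - \mu_X)^2$ is the second
central moment.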
SKEWNESS
The third central moment $E(X - \mu_X)^3$ measures lack of symmetry of the pdf of a random
variable. The normalized fourth central moment $E(X - \mu_X)^4 / \sigma_X^4$ is called kurtosis.
If the peak of the pdf is sharper, then the random variable has a higher kurtosis.
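Chebyshev Inequality
The standard deviation gives us an intuitive idea of how the random variable is
distributed about the mean. This idea is more precisely expressed in the remarkable
Chebyshev inequality stated below. For a random variable $X$ with mean $\mu_X$ and variance $\sigma_X^2$,
$P\{|X - \mu_X| \ge \epsilon\} \le \sigma_X^2 / \epsilon^2$ for any $\epsilon > 0$.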
Proof:
Characteristic function
Example 1
Solution:
Example 2
Suppose X is a random variable taking values from the discrete set with
corresponding probability mass function for the value
Then,
Thus ,
Description:
Suppose we are given a random variable X with density $f_X(x)$. We apply a function g
to produce a random variable Y = g(X). We can think of X as the input to a black
box, and Y as the output.
UNIT-3
MULTIPLE RANDOM VARIABLES
In many applications we have to deal with more than two random variables. For
example, in the navigation problem, the position of a spacecraft is represented by three
random variables denoting the x, y and z coordinates. The noise affecting the R, G, B
channels of colour video may be represented by three random variables. In such situations,
it is convenient to define the vector-valued random variables where each component of the
vector is a random variable.
In this lecture, we extend the concepts of joint random variables to the case of
multiple random variables. A generalized analysis will be presented for random
variables defined on the same sample space.
We may define two or more random variables on the same sample space. Let $X$ and
$Y$ be two real random variables defined on the same probability space $(S, \mathcal{F}, P)$. The
mapping $S \to \mathbb{R}^2$ such that for $s \in S$, $(X(s), Y(s)) \in \mathbb{R}^2$ is called a joint random variable.
Figure 1
Recall the definition of the distribution of a single random variable. The event $\{X \le x\}$
was used to define the probability distribution function $F_X(x)$. Given $F_X(x)$, we
can find the probability of any event involving the random variable. Similarly, for two
random variables $X$ and $Y$, the event $\{X \le x, Y \le y\} = \{X \le x\} \cap \{Y \le y\}$ is considered as
the representative event.
The probability $P\{X \le x, Y \le y\}$, $(x, y) \in \mathbb{R}^2$, is called the joint distribution function or
the joint cumulative distribution function (CDF) of the random variables $X$ and $Y$ and
denoted by $F_{X,Y}(x, y)$.
Figure 2
Properties of JPDF
1)
2)
3)
Note that
4)
6)
7)
To prove this
Similarly .
If $X$ and $Y$ are two discrete random variables defined on the same probability space
$(S, \mathcal{F}, P)$ such that $X$ takes values from the countable subset $R_X$ and $Y$ takes values from
the countable subset $R_Y$, then the joint random variable $(X, Y)$ can take values from the
countable subset in $R_X \times R_Y$. The joint random variable $(X, Y)$ is completely specified by
their joint probability mass function
$p_{X,Y}(x, y) = P\{X = x, Y = y\}$, $(x, y) \in R_X \times R_Y$
Remark
This is because
and similarly
These probability mass functions $p_X(x)$ and $p_Y(y)$ obtained from the joint probability
mass function are called marginal probability mass functions.
Example 4 Consider the random variables $X$ and $Y$ with the joint probability mass
function as tabulated in Table 1. The marginal probabilities $p_X(x)$ and $p_Y(y)$ are as shown
in the last column and the last row respectively.
Table 1
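If $X$ and $Y$ are two continuous random variables and their joint distribution
function $F_{X,Y}(x, y)$ is continuous in both $x$ and $y$, then we can define the joint probability density
function $f_{X,Y}(x, y)$ by
$f_{X,Y}(x, y) = \partial^2 F_{X,Y}(x, y) / \partial x \, \partial y$
provided it exists.
Clearly $F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\,dv\,du$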
Example 6 The joint pdf of two random variables $X$ and $Y$ is given by
• Find .
• Find .
• Find and .
• What is the probability ?
Conditional Distributions
and
We can define these quantities for two random variables. We start with the conditional
probability mass functions for two random variables.
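Conditional Probability Mass Functions
Suppose $X$ and $Y$ are two jointly distributed discrete random variables with the joint PMF $p_{X,Y}(x, y)$.
The conditional PMF of $Y$ given $X = x$ is denoted by $p_{Y|X}(y | x)$ and defined as
$p_{Y|X}(y | x) = P\{Y = y | X = x\} = p_{X,Y}(x, y) / p_X(x)$, provided $p_X(x) \ne 0$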
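Consider two jointly distributed continuous random variables $X$ and $Y$ with the joint probability
distribution function $F_{X,Y}(x, y)$. We are interested in finding the conditional distribution
function of one of the random variables on the condition of a particular value of the other
random variable.
We cannot define the conditional distribution function of the random variable $Y$ on the
condition of the event $\{X = x\}$ by the relation
$F_{Y|X}(y | x) = P(Y \le y | X = x) = P(Y \le y, X = x) / P(X = x)$
because $P(X = x) = 0$ for a continuous random variable, which leaves the ratio undefined.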
Similarly we have
Example 2 X and Y are two jointly random variables with the joint pdf given by
find,
(a)
(b)
(c)
Solution:
Since
We get
and equivalently
We are often interested in finding out the probability density function of a function of
two or more RVs. Following are a few examples.
where is received signal which is the superposition of the message signal and the
noise .
Figure 1
Consider Figure 2
Figure 2
We have
Example 1
Suppose X and Y are independent random variables and each uniformly distributed over
(a, b). And are as shown in the figure below.
The CLT states that under very general conditions converges in distribution
to as . The conditions are:
We shall consider the first condition only. In this case, the central-limit theorem can be
stated as follows:
We give a less rigorous proof of the theorem with the help of the characteristic
function. Further we consider each of to have zero mean. Thus,
Clearly,
The characteristic function of is given by
We will show that as the characteristic function is of the form of the
characteristic function of a Gaussian random variable.
Expanding in power series
Substituting
Note also that each term in involves a ratio of a higher moment and a power of
and therefore,
which is the characteristic function of a Gaussian random variable with 0 mean and
variance .
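The convergence claimed by the central limit theorem can be observed numerically. The following is a minimal sketch, not the course's own construction: it assumes independent zero-mean random variables uniform on (-0.5, 0.5), each with variance 1/12, and normalises their sum by the square root of the number of terms. The sample sizes, seed and names are illustrative.

```python
import random
from math import erf, sqrt

def normalized_sum(n, rng):
    """Sum of n iid zero-mean uniform(-0.5, 0.5) variables divided by sqrt(n)."""
    return sum(rng.uniform(-0.5, 0.5) for _ in range(n)) / sqrt(n)

rng = random.Random(1)        # fixed seed so the run is repeatable
sigma = sqrt(1.0 / 12.0)      # standard deviation of each uniform variable

samples = [normalized_sum(30, rng) for _ in range(20_000)]

# Empirical P(Y <= sigma) compared with the Gaussian value Phi(1).
empirical = sum(1 for y in samples if y <= sigma) / len(samples)
gaussian = 0.5 * (1.0 + erf(1.0 / sqrt(2.0)))
print(round(empirical, 3), round(gaussian, 3))
```

Even with only 30 terms per sum, the empirical probability is close to the Gaussian value, although the individual terms are far from Gaussian.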
Expected Values of Functions of Random Variables
Where
Note that
Thus,
Example 2 If
Proof:
Example 3
(1) We have earlier shown that expectation is a linear operator. We can generally
write
Thus
(2) If are independent random variables and ,then
Just like the moments of a random variable provide a summary description of the random
variable, so also the joint moments provide summary description of two random variables.
For two continuous random variables , the joint moment of order is defined
as
Remark
(1) If are discrete random variables, the joint expectation of order and
is defined as
We will also show that To establish the relation, we prove the following
result:
Non-negativity of the left-hand side implies that its minimum also must be nonnegative.
Now
Thus
then
Note that independence implies uncorrelated. But uncorrelated generally does not
imply independence (except for jointly Gaussian random variables).
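A small discrete example shows why zero correlation does not guarantee independence. The construction below is a standard textbook illustration chosen here as an assumption, not taken from these notes: X is uniform on {-1, 0, 1} and Y = X squared, so Y is completely determined by X yet their covariance is zero.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with equal probability; Y = X ** 2.
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

e_x = sum(x * p for x, p in pmf_x.items())            # E[X]
e_y = sum(x ** 2 * p for x, p in pmf_x.items())       # E[Y] = E[X^2]
e_xy = sum(x * x ** 2 * p for x, p in pmf_x.items())  # E[XY] = E[X^3]
covariance = e_xy - e_x * e_y

# Independence would require P(X = 0, Y = 0) = P(X = 0) P(Y = 0).
p_joint = pmf_x[0]               # Y = 0 exactly when X = 0
p_product = pmf_x[0] * pmf_x[0]  # P(Y = 0) equals P(X = 0)

print(covariance, p_joint, p_product)  # 0 1/3 1/9
```

The covariance vanishes because the distribution of X is symmetric about zero, while the joint probability differs from the product of the marginals.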
function instead of
If $X$ and $Y$ are discrete random variables, we can define the joint characteristic function in
terms of the joint probability mass function as follows:
The joint characteristic function has properties similar to the properties of the characteristic
function of a single random variable. We can easily establish the following properties:
1.
2.
3. If and are independent random variables, then
4. We have,
Hence,
Example 2 The joint characteristic function of the jointly Gaussian random variables
and with the joint pdf
We can use the joint characteristic functions to simplify the probabilistic analysis as
illustrated on next page:
Many practically occurring random variables are modeled as jointly Gaussian random
variables. For example, noise samples at different instants in the communication system
are modeled as jointly Gaussian random variables.
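Two random variables $X$ and $Y$ are called jointly Gaussian if their joint probability
density function is
$f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho_{X,Y}^2}} \exp\left\{ -\frac{1}{2(1 - \rho_{X,Y}^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2\rho_{X,Y}(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right] \right\}$
The joint pdf is determined by 5 parameters:
means $\mu_X$ and $\mu_Y$
variances $\sigma_X^2$ and $\sigma_Y^2$
correlation coefficient $\rho_{X,Y}$
We denote the jointly Gaussian random variables $X$ and $Y$ with these parameters as
$(X, Y) \sim N(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho_{X,Y})$.
The joint pdf has a bell shape centered at $(\mu_X, \mu_Y)$ as shown in the Figure 1 below. The
variances $\sigma_X^2$ and $\sigma_Y^2$ determine the spread of the pdf surface and $\rho_{X,Y}$ determines the
orientation of the surface in the $(x, y)$ plane.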
(1) If $X$ and $Y$ are jointly Gaussian, then $X$ and $Y$ are both Gaussian.
We have
Similarly
(2) The converse of the above result is not true. If each of $X$ and $Y$ is Gaussian, $X$ and $Y$
are not necessarily jointly Gaussian. Suppose
And
Similarly,
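(3) If $X$ and $Y$ are jointly Gaussian, then for any constants $a$ and $b$, the random
variable $Z$ given by $Z = aX + bY$ is Gaussian with mean $\mu_Z = a\mu_X + b\mu_Y$ and variance
$\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\rho_{X,Y}\,\sigma_X\sigma_Y$
(4) Two jointly Gaussian RVs $X$ and $Y$ are independent if and only if $X$ and $Y$ are
uncorrelated. Observe that if $X$ and $Y$ are uncorrelated, then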
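Example 1 Suppose X and Y are two jointly-Gaussian 0-mean random variables with
variances of 1 and 4 respectively and a covariance of 1. Find the joint PDF.
We have $\sigma_X = 1$, $\sigma_Y = 2$ and $\rho_{X,Y} = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y) = 1/2$, so that
$f_{X,Y}(x, y) = \frac{1}{2\sqrt{3}\,\pi} \exp\left\{ -\frac{2}{3}\left( x^2 - \frac{xy}{2} + \frac{y^2}{4} \right) \right\}$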
Suppose then
Hence proved.
Univariate transformations
When working on the probability density function (pdf) of a random variable X, one
is often led to create a new variable Y defined as a function f(X) of the original variable
X. For example, if X ~ N(µ, σ²), then the new variable
Y = f(X) = (X - µ)/σ
is N(0, 1).
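The standardisation above can be checked by simulation. The sketch assumes a Gaussian X with mean 5 and standard deviation 2 (values chosen only for illustration) and verifies that the transformed samples have sample mean close to 0 and sample variance close to 1.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 5.0, 2.0     # illustrative parameters of X

x = [rng.gauss(mu, sigma) for _ in range(50_000)]
y = [(value - mu) / sigma for value in x]   # Y = (X - mu) / sigma

print(round(fmean(x), 3), round(pvariance(x), 3))  # close to 5 and 4
print(round(fmean(y), 3), round(pvariance(y), 3))  # close to 0 and 1
```

The sample moments only show that the mean and variance are as expected; that Y is itself Gaussian follows from the fact that a linear function of a Gaussian random variable is Gaussian.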
It is also often the case that the quantity of interest is a function of another (random)
quantity whose distribution is known. Here are a few examples:
*Scaling: from degrees to radians, miles to kilometers, light-years to parsecs, degrees
Celsius to degrees Fahrenheit, linear to logarithmic scale, to the distribution of the
variance
* Laws of physics: what is the distribution of the kinetic energy of the molecules of a
gas if the distribution of the speed of the molecules is known ?
Multivariate transformations
The problem extends naturally to the case when several variables Yj are defined from
several variables Xi through a transformation y = h(x).
Here are some examples:
Sampling distributions
Let f(x) be the pdf of the r.v. X. Let also Z1 = z1(x1, x2, ..., xn) be a statistic, e.g.
the sample mean. What is the pdf of Z1?
Z1 is a function of the n r.v. Xi (with n the sample size), that are iid with pdf f(x). If it
is possible to identify n - 1 other independent statistics Zi, i = 2... n, then a
transformation Z = h(X) is defined, and g(z), the joint distribution of Z = {Z1, Z2, ...,
Zn} can be calculated. The pdf of Z1 is then calculated as one of the marginal
distributions of Z by integrating g(z) over zi , i = 2, .., n.
Integration limits
Calculations on joint distributions often involve multiple integrals whose
integration limits are themselves variables. An appropriate change of variables
sometimes allows changing all these variables but one into fixed integration limits,
thus making the calculation of the integrals much simpler.
Linear Transformations of Random Variables
Adding a constant: Y = X + b
Subtracting a constant: Y = X - b
Multiplying by a constant: Y = mX
Dividing by a constant: Y = X/m
Multiplying by a constant and adding a constant: Y = mX + b
Dividing by a constant and subtracting a constant: Y = X/m - b
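All of the cases listed above are instances of Y = mX + b, for which E[Y] = m E[X] + b and Var(Y) = m² Var(X). The sketch below checks this exactly on a fair six-sided die. The die and the values m = 3, b = −2 are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction

def mean_var(pmf):
    """Mean and variance of a discrete random variable given as {value: probability}."""
    mean = sum(p * x for x, p in pmf.items())
    var = sum(p * (x - mean) ** 2 for x, p in pmf.items())
    return mean, var

def linear_transform(pmf, m, b):
    """PMF of Y = m*X + b."""
    out = {}
    for x, p in pmf.items():
        y = m * x + b
        out[y] = out.get(y, 0) + p
    return out

# Assumed example: X is the face of a fair die, Y = 3X - 2.
die = {x: Fraction(1, 6) for x in range(1, 7)}
m, b = Fraction(3), Fraction(-2)

mean_x, var_x = mean_var(die)
mean_y, var_y = mean_var(linear_transform(die, m, b))
```

Using exact fractions avoids rounding, so the identities hold with equality: the added constant shifts the mean only, while the multiplier scales the variance by its square.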
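Suppose the vector of random variables $X = (X_1, \dots, X_n)^T$ has the joint
density $f_X(x)$. Set $Y = AX + b$ for some square matrix $A$ and vector $b$. If
$\det A \neq 0$ then $Y$ has the joint density
$f_Y(y) = \frac{1}{|\det A|}\, f_X\left(A^{-1}(y - b)\right)$
Indeed, suppose $X \sim f_X$ (this is the notation for "$f_X$ is the distribution density of $X$")
and $Y = AX + b$. For any domain $D$ of the $Y$ space we can
write
$P(Y \in D) = P(AX + b \in D) = \int_{A^{-1}(D - b)} f_X(x)\,dx = \int_{D} f_X\left(A^{-1}(y - b)\right)\frac{dy}{|\det A|}$   (Linear transformation of random variables)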
For two independent standard normal variables (s.n.v.) $Z_1$ and $Z_2$ the combination
$aZ_1 + bZ_2$ is distributed as $N(0, a^2 + b^2)$.
A product of normal variables is not a normal variable. See the section on the chi-squared
distribution.
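The density of a linearly transformed vector, f_Y(y) = f_X(A⁻¹(y − b)) / |det A|, can be verified on a two-dimensional Gaussian, for which the density of Y = AX + b is also known directly as N(Aµ + b, AΣAᵀ). The sketch below is a minimal check under that assumption. The particular µ, Σ, A and b are made-up illustration values, and the small 2×2 helpers are hypothetical.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def gaussian_pdf(x, mean, cov):
    """Two-dimensional Gaussian density N(mean, cov) evaluated at x."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    w = matvec(inv2(cov), d)
    quad = d[0] * w[0] + d[1] * w[1]
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det2(cov)))

# Assumed illustration values.
mu_x = [1.0, -1.0]
cov_x = [[2.0, 0.5], [0.5, 1.0]]
A = [[1.0, 2.0], [0.0, 3.0]]
b = [0.5, -0.5]

def density_of_y(y):
    """Change-of-variables formula f_X(A^{-1}(y - b)) / |det A|."""
    x = matvec(inv2(A), [y[0] - b[0], y[1] - b[1]])
    return gaussian_pdf(x, mu_x, cov_x) / abs(det2(A))

y = [0.2, 1.3]
mu_y = [v + c for v, c in zip(matvec(A, mu_x), b)]
cov_y = matmul(matmul(A, cov_x), transpose(A))
via_formula = density_of_y(y)
direct = gaussian_pdf(y, mu_y, cov_y)
```

Both routes give the same number, which illustrates that the factor 1/|det A| is exactly what keeps the transformed density normalized.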
UNIT 4
STOCHASTIC PROCESSES-TEMPORAL
CHARACTERISTICS
Random Processes
In practical problems, we deal with time-varying waveforms whose value at any time t
is random in nature. For example, the speech waveform recorded by a microphone, the
signal received by a communication receiver or the daily record of stock-market data
represents random variables that change with time. How do we characterize such data?
Such data are characterized as random or stochastic processes. This lecture covers the
fundamentals of random processes.
Recall that a random variable maps each sample point in the sample space to a point
on the real line. A random process maps each sample point to a waveform.
The value of a random process X(t) at any time t can be described from its
probabilistic model.
The state is the value taken by X(t) at a time t, and the set of all such states is called
the state space. A random process is discrete-state if the state space is finite or countable.
It also means that the corresponding sample space is finite or countable. Otherwise,
the random process is called continuous-state.
First-order and nth-order probability density functions and distribution functions
We defined the moments of a random variable and the joint moments of random variables. We
can define all the possible moments and joint moments of a random process {X(t)}.
In particular, the following moments are important:
Mean: $\mu_X(t) = E[X(t)]$
Autocorrelation: $R_{XX}(t_1, t_2) = E[X(t_1)X(t_2)]$
Autocovariance: $C_{XX}(t_1, t_2) = E[(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2))]$
Note that $C_{XX}(t_1, t_2) = R_{XX}(t_1, t_2) - \mu_X(t_1)\mu_X(t_2)$.
The autocorrelation function and the autocovariance function are widely used to
characterize a class of random processes called wide-sense stationary processes.
On the basis of the above definitions, we can study the degree of dependence between two
random processes.
The concept of stationarity plays an important role in solving practical problems involving
random processes. Just as time-invariance is an important characteristic of many
deterministic systems, stationarity describes certain time-invariant properties of a class of
random processes. Stationarity also leads to a frequency-domain description of a random
process.
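A random process {X(t)} is called strict-sense stationary (SSS) if its probability structure is
invariant under a shift of the time origin, that is, if for every $n$, every choice of $t_1, \dots, t_n$ and every shift $t_0$,
$F_{X(t_1),\dots,X(t_n)}(x_1,\dots,x_n) = F_{X(t_1+t_0),\dots,X(t_n+t_0)}(x_1,\dots,x_n)$
Particularly, if this equality holds for $n = 1, 2, \dots, k$, then {X(t)} is called kth-order stationary.
The joint distribution of a stationary process does not depend on the placement of the origin of the
time axis. This requirement is very strict. Less strict forms of stationarity may be defined.
If {X(t)} is stationary up to order 1,
$F_{X(t_1)}(x_1) = F_{X(t_1+t_0)}(x_1)$ for all $t_0$.
As a consequence, putting $t_0 = -t_1$,
$E[X(t_1)] = E[X(0)] = \mu_X$, a constant.
If {X(t)} is stationary up to order 2,
$F_{X(t_1),X(t_2)}(x_1, x_2) = F_{X(t_1+t_0),X(t_2+t_0)}(x_1, x_2)$
Put $t_0 = -t_2$:
$F_{X(t_1),X(t_2)}(x_1, x_2) = F_{X(t_1-t_2),X(0)}(x_1, x_2)$
Therefore, the autocorrelation function of an SSS process depends only on the time lag $t_1 - t_2$.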
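We can also define the joint stationarity of two random processes. Two processes {X(t)} and {Y(t)} are called jointly strict-sense stationary if their joint distribution functions of all orders are invariant under a shift of the time origin.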
This is because
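It is very difficult to test whether a process is SSS or not. A weaker form of stationarity,
called wide-sense stationarity (WSS), is extremely important from a practical point of
view. A process {X(t)} is WSS if its mean is constant and its autocorrelation $R_{XX}(t_1, t_2)$ depends only on the lag $t_1 - t_2$.
(2) An SSS process with finite second-order moments is always WSS, but the converse is not always true.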
This is the model of the carrier wave (sinusoid of fixed frequency) used to analyse
the noise performance of many receivers.
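The carrier model mentioned above can be checked numerically. The example itself did not survive in the text, so the sketch below assumes the usual form X(t) = A cos(ω0 t + Φ) with Φ uniformly distributed over one period. The amplitude, frequency and function names are illustration values only. The ensemble average over Φ is replaced by an average over equally spaced phases, which is exact for sinusoids up to rounding.

```python
import math

# Assumed carrier parameters (illustration only).
A = 2.0
w0 = 2.0 * math.pi * 5.0
N = 360
phases = [2.0 * math.pi * k / N for k in range(N)]

def x(t, phi):
    """One realization of the random-phase sinusoid."""
    return A * math.cos(w0 * t + phi)

def ensemble_mean(t):
    """E[X(t)], averaging over the uniformly distributed phase."""
    return sum(x(t, p) for p in phases) / N

def ensemble_autocorr(t, tau):
    """E[X(t + tau) X(t)], averaging over the uniformly distributed phase."""
    return sum(x(t + tau, p) * x(t, p) for p in phases) / N

tau = 0.02
expected = 0.5 * A * A * math.cos(w0 * tau)
values = [ensemble_autocorr(t, tau) for t in (0.0, 0.013, 0.4)]
means = [ensemble_mean(t) for t in (0.0, 0.013, 0.4)]
```

The mean is zero at every t and the autocorrelation equals (A²/2) cos(ω0 τ) regardless of t, which are the two conditions for wide-sense stationarity.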
Note that
Note that
and
Such signals are called power signals. For a power signal $x(t)$ the autocorrelation function
is defined as
$R_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} x(t + \tau)\,x(t)\,dt$
The autocorrelation of the deterministic signal gives us insight into the properties of the
autocorrelation function of a WSS process. We shall discuss these properties next.
Because,
Remark For a complex process
We have
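4. $R_{XX}(\tau)$ is a positive semi-definite function in the sense that for any positive integer $n$
and real $a_i, a_j$,
$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j R_{XX}(t_i - t_j) \ge 0$
Proof
Define the random variable $Y = \sum_{i=1}^{n} a_i X(t_i)$. Then $E[Y^2] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j R_{XX}(t_i - t_j) \ge 0$.
It can be shown that the sufficient condition for a function $R_{XX}(\tau)$ to be the autocorrelation
function of a real WSS process is that $R_{XX}(\tau)$ be real, even and positive semidefinite.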
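The positive semi-definiteness condition can be probed numerically by evaluating the double sum for chosen coefficients and time instants. The sketch below is only an illustration under assumed test functions: e^{−|τ|} is a valid autocorrelation, while a rectangular function (1 for |τ| ≤ 1, 0 otherwise) is real and even but fails the test. All names are hypothetical.

```python
import math
import random

def quadratic_form(autocorr, times, coeffs):
    """Return sum_i sum_j a_i a_j R(t_i - t_j)."""
    return sum(
        a_i * a_j * autocorr(t_i - t_j)
        for a_i, t_i in zip(coeffs, times)
        for a_j, t_j in zip(coeffs, times)
    )

def exp_autocorr(tau):
    """Assumed valid autocorrelation function exp(-|tau|)."""
    return math.exp(-abs(tau))

def box_function(tau):
    """Real and even, but not positive semi-definite."""
    return 1.0 if abs(tau) <= 1.0 else 0.0

# Random probing of the exponential function: the form never goes negative.
rng = random.Random(0)
min_exp = min(
    quadratic_form(
        exp_autocorr,
        [rng.uniform(-5.0, 5.0) for _ in range(6)],
        [rng.uniform(-1.0, 1.0) for _ in range(6)],
    )
    for _ in range(200)
)

# A specific choice that exposes the rectangular function.
box_value = quadratic_form(box_function, [0.0, 1.0, 2.0], [1.0, -math.sqrt(2.0), 1.0])
```

For the rectangular function the form evaluates to 4 − 4√2, which is negative, so being real and even is not enough on its own. Random probing can only fail to find a violation; it does not prove positive semi-definiteness.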
Proof: Note that a real WSS random process {X(t)} is called mean-square periodic (MS
periodic) with a period $T_p$ if for every $t$, $E[(X(t + T_p) - X(t))^2] = 0$.
Again
If {X(t)} and {Y(t)} are two real jointly WSS random processes, their cross-correlation
functions are independent of $t$ and depend only on the time lag $\tau$. We can write the
cross-correlation function as
$R_{XY}(\tau) = E[X(t + \tau)\,Y(t)]$
We have
Further,
iii. If X(t) and Y(t) are uncorrelated, $R_{XY}(\tau) = \mu_X \mu_Y$.
iv. If X(t) and Y(t) are orthogonal processes, $R_{XY}(\tau) = 0$.
Example 2
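Consider a random process $Z(t) = X(t) + Y(t)$ which is the sum of two real jointly WSS random
processes X(t) and Y(t).
We have
$R_{ZZ}(\tau) = E[Z(t + \tau)Z(t)] = R_{XX}(\tau) + R_{YY}(\tau) + R_{XY}(\tau) + R_{YX}(\tau)$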
Example 3
Suppose
which is constant for the selected realization. Note that the time-averaged mean represents the dc value of
the realization x(t).
The above definitions are in contrast to the corresponding ensemble average defined by
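Let us consider the simplest case of the time-averaged mean of a discrete-time WSS
random process {X[n]} given by
$\langle\mu_X\rangle_N = \frac{1}{2N + 1}\sum_{n=-N}^{N} X[n]$
The mean of $\langle\mu_X\rangle_N$ is $E[\langle\mu_X\rangle_N] = \mu_X$.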
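Let us consider the time-averaged mean for the continuous case. We have
$\langle\mu_X\rangle_T = \frac{1}{2T}\int_{-T}^{T} X(t)\,dt$
and its variance is
$E[(\langle\mu_X\rangle_T - \mu_X)^2] = \frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T} C_{XX}(t_1 - t_2)\,dt_1\,dt_2$
The above double integral is evaluated on the square area bounded by $t_1 = \pm T$ and
$t_2 = \pm T$. We divide this square region into a sum of trapezoidal strips parallel to $t_1 - t_2 = 0$
(see Figure 1). Putting $t_1 - t_2 = \tau$ and noting that the differential area between
$t_1 - t_2 = \tau$ and $t_1 - t_2 = \tau + d\tau$ is $(2T - |\tau|)\,d\tau$, the above double integral is converted to a
single integral as follows:
$E[(\langle\mu_X\rangle_T - \mu_X)^2] = \frac{1}{4T^2}\int_{-2T}^{2T}(2T - |\tau|)\,C_{XX}(\tau)\,d\tau = \frac{1}{2T}\int_{-2T}^{2T}\left(1 - \frac{|\tau|}{2T}\right)C_{XX}(\tau)\,d\tau$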
Figure 1
Ergodicity Principle
If the time averages converge to the corresponding ensemble averages in the probabilistic
sense, then a time-average computed from a large realization can be used as the value for
the corresponding ensemble average. Such a principle is the ergodicity principle to be
discussed below:
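As a rough illustration of the principle, the sketch below compares the time average of a single long realization with the ensemble mean for two assumed discrete-time processes: X[n] = µ + W[n] with independent unit-variance Gaussian W[n], which is mean ergodic, and X[n] = A with a single random draw A per realization, which is not. The processes, the seed and the sample size are assumptions made for the example.

```python
import random

def time_average(samples):
    """Time-averaged mean of one realization."""
    return sum(samples) / len(samples)

rng = random.Random(0)
mu = 1.5          # assumed ensemble mean
N = 20000         # assumed length of the realization

# Mean-ergodic case: constant mean plus independent noise.
noisy_realization = [mu + rng.gauss(0.0, 1.0) for _ in range(N)]
noisy_time_avg = time_average(noisy_realization)

# Non-ergodic case: the whole realization is one random constant.
a = rng.gauss(mu, 1.0)
constant_realization = [a] * N
constant_time_avg = time_average(constant_realization)
```

In the first case the time average settles close to µ as N grows. In the second it stays at the drawn value a however long the record is, so a single realization says nothing about the ensemble mean.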
and
Here
Autocorrelation ergodicity
We consider $Z(t) = X(t + \tau)X(t)$ so that $\mu_Z = E[Z(t)] = R_{XX}(\tau)$.
Then {X(t)} will be autocorrelation ergodic if {Z(t)} is mean ergodic.
Thus will be autocorrelation ergodic if
where
Example 2
and
UNIT 5
STOCHASTIC PROCESSES—
SPECTRAL CHARACTERISTICS
Definition of Power Spectral Density of a WSS Process
And
The average power is given by
We have
Figure 1
Therefore,
Figure 2
Figure 4
Properties of the PSD
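Consider a random process $Z(t) = X(t) + Y(t)$ which is the sum of two real jointly WSS random
processes X(t) and Y(t). As we have seen earlier,
$R_{ZZ}(\tau) = R_{XX}(\tau) + R_{YY}(\tau) + R_{XY}(\tau) + R_{YX}(\tau)$
Thus we see that $S_{ZZ}(\omega)$ includes contributions from the Fourier transforms of the
cross-correlation functions $R_{XY}(\tau)$ and $R_{YX}(\tau)$.
These Fourier transforms represent cross power spectral densities.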
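Given two real jointly WSS random processes X(t) and Y(t), with $R_{XY}(\tau) = E[X(t + \tau)Y(t)]$, the cross power spectral
density (CPSD) is defined as
$S_{XY}(\omega) = \lim_{T \to \infty} \frac{E[X_T(\omega)\,Y_T^{*}(\omega)]}{2T}$
where $X_T(\omega)$ and $Y_T(\omega)$ are the Fourier transforms of the processes truncated to $[-T, T]$.
Proceeding in the same way as in the derivation of the Wiener-Khinchin-Einstein theorem for
the WSS process, it can be shown that
$S_{XY}(\omega) = \int_{-\infty}^{\infty} R_{XY}(\tau)\,e^{-j\omega\tau}\,d\tau$
and
$S_{YX}(\omega) = \int_{-\infty}^{\infty} R_{YX}(\tau)\,e^{-j\omega\tau}\,d\tau$
The cross-correlation function and the cross power spectral density form a Fourier
transform pair and we can write
$S_{XY}(\omega) = \int_{-\infty}^{\infty} R_{XY}(\tau)\,e^{-j\omega\tau}\,d\tau$
and
$R_{XY}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_{XY}(\omega)\,e^{j\omega\tau}\,d\omega$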
Properties of the CPSD
The CPSD is a complex function of the frequency ω. Some properties of the CPSD of
two jointly WSS processes X(t) and Y(t) are listed below:
(1)
Note that
We have
Observe that
(4) If X(t) and Y(t) are orthogonal, then $S_{XY}(\omega) = S_{YX}(\omega) = 0$
We have,
Wiener-Khinchin-Einstein theorem
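The theorem states that the power spectral density of a WSS process is the Fourier transform of its autocorrelation function. As a small numerical check, the sketch below takes the autocorrelation R(τ) = e^{−2|τ|} (the one used in the exercises of this course file) and integrates it numerically. The closed form of the transform is 4/(4 + ω²). The truncation limit, the step size and the function names are assumptions made for the example.

```python
import math

def autocorr(tau):
    """Autocorrelation R(tau) = exp(-2|tau|)."""
    return math.exp(-2.0 * abs(tau))

def psd_numeric(w, tau_max=20.0, step=1e-3):
    """Trapezoidal estimate of the Fourier transform of an even autocorrelation.

    Because R(tau) is real and even, S(w) = 2 * integral_0^inf R(tau) cos(w tau) dtau.
    """
    n = int(round(tau_max / step))
    total = 0.5 * (autocorr(0.0) + autocorr(tau_max) * math.cos(w * tau_max))
    for k in range(1, n):
        tau = k * step
        total += autocorr(tau) * math.cos(w * tau)
    return 2.0 * step * total

def psd_closed_form(w):
    """Analytical transform of exp(-2|tau|)."""
    return 4.0 / (4.0 + w * w)

errors = [abs(psd_numeric(w) - psd_closed_form(w)) for w in (0.0, 1.0, 3.0)]
```

The numerical and analytical values agree to within the discretization error, and the result is real, even in ω and non-negative, as a PSD must be.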
(b) In a single throw of two dice, what is the probability of obtaining a sum of at
least 10?
2. (a) Define the joint distribution function. Explain how marginal density
functions are computed given their joint distribution functions.
(b) A product is classified according to the number of defects it contains (X1) and
the factory that produces it (X2). The joint probability distribution is given by :
X2 → 1 2
X1 ↓
0 1/8 1/16
1 1/16 1/16
2 3/16 1/8
3 1/8
3. (a) Show that mean of the binomial distribution is the product of the parameters
p & the number of times n.
(b) Sketch the probability density function & Probability distribution function of
i) Exponential distribution
ii) Uniform Distribution
iii) Gaussian Distribution
4. (a) State the condition for wide sense stationary Random process.
(b) Find the Auto Correlation function for white noise shown in the figure 4b
below :
-t0/2 t0/2
Figure 4b
5. (a) Derive the relation between PSDs of input and output random process of an
LTI system.
(b) X(t) is a stationary random process with zero mean and autocorrelation
R_XX(τ) = e^(−2|τ|). It is applied to a system with transfer function H(ω) = 1/(jω + 2). Find the mean
and PSD of its output.
7. (a) What are the important parameters that determine the overall noise figure of a
multistage filtering?
(b) Bring out the importance of Friis' formula.
8. (a) A code is composed of dots and dashes. Assume that a dash is three times as
long as the dot and has one-third the probability of occurrence. Find:
(b) Suppose 100 voltage levels are employed to transmit 100 equally likely
messages. Assume the system to be a Gaussian channel with λ = 3.5 and
bandwidth B = 10^4 Hz. Find S/N.
SET 2
1. (a) If A and B are any events, not necessarily mutually exclusive, derive an
expression for the probability of A union B. When A and B are mutually
exclusive, what happens to the expression derived above?
(b) Define the term independent events. State the conditions for independence of:
(c) A coin is tossed. If it turns up heads, two balls will be drawn from Box A;
otherwise, two
balls will be drawn from Box B. Box A contains three black and five white balls. Box B
contains seven black and one white ball. In both cases, selections are to be made with
replacement. What is the probability that Box A is used, given that both balls are black?
2. (a) Two discrete random variables X and Y have joint p.m.f. given by the following
table
X↓ 1 2 3 Y
←
i. X≤ 11/2
ii .XY is even
iii .Y is even given that X is even.
4. (a) State and prove any four properties of mean of a random variable X.
(b) Prove that the density function of sum of two statistically independent random
variables is the convolution of their individual density functions.
Xc(t) = A X(t) cos(ωc t + θ)
where X(t) is the message signal and cos(ωc t + θ) is the carrier. The message signal
X(t) is modeled as a zero-mean stationary random process with the
autocorrelation function R_XX(τ) and the PSD G_X(f). The carrier amplitude A and
frequency ωc are assumed to be constants and the initial carrier phase θ is
assumed to be a random variable uniformly distributed in the interval
(−π, π). Furthermore, X(t) and θ are assumed to be independent.
5. (a) Derive the relation between PSDs of input and output random process of an LTI
system.
(b) X(t) is a stationary random process with zero mean and autocorrelation R_XX(τ) = e^(−2|τ|).
It is applied to a system with transfer function H(ω). Find the mean and PSD of its
output.
(b) Calculate the rms noise voltage generated in a bandwidth of 15 kHz by a resistor of
2 kΩ operating at [Link] the noise power over this bandwidth. Find the noise
PSD.
7. In TV receivers, the antenna is often mounted on a tall mast and a long lossy cable is
used to connect the antenna and receiver. To overcome the effect of the noisy cable, a
preamplifier is mounted on the antenna. The parameters of the different stages are:
Preamplifier gain = 20 dB
Preamplifier noise figure = 6 dB
Lossy cable noise figure = 3 dB
Cable loss = −20 dB
Receiver front-end gain = 60 dB
Receiver noise figure = 16 dB
Determine the overall noise figure of the system.
8. (a) A code is composed of dots and dashes. Assume that a dash is three times as long as
the dot and has one-third the probability of occurrence.
Find:
(b) Suppose 100 voltage levels are employed to transmit 100 equally likely messages.
Assume the system to be a Gaussian channel with λ = 3.5 and bandwidth B =
10^4 Hz. Find S/N.
SET 3
1. (a) Define Probability density function and obtain the relationship between
probability and probability density.
(b) Consider the probability density f(x) = a e^(−b|x|), where x is a random variable
whose allowable values range from −∞ to +∞. Find:
i. the CDF,
ii. the relationship between a and b, and
iii. the probability that the outcome X lies between 1 and 2.
2. (a) Derive an expression for the average value and variance associated with the
Gaussian probability density function.
(b) The average life of a certain type of electric bulb is 1200 hours. What
percentage of this type of bulb is expected to fail in the first 800 hours of
working? What percentage is expected to fail between 800 and 1000 hours? Assume
a normal distribution with
4. X(t) is a zero-mean stationary random process with an autocorrelation function R_XX(τ).
By integrating X(t), form a random variable Y as
5. (a)Derive the relation between PSDs of input and output random process of an LTI
system.
(b) X(t) is a stationary random process with zero mean and autocorrelation
R_XX(τ) = e^(−2|τ|). It is applied to a system with transfer function H(ω). Find the mean and
PSD of its output.
7. (a) What are the precautions to be taken in cascading stages of a network in the
point of view of noise reduction?
(b) What is the need for band-limiting the signal with a view to increasing
the SNR?
8. (a) Obtain the Shannon-Hartley law giving the relation among channel capacity,
bandwidth and signal-to-noise ratio of a continuous system.
(b) Consider a message sequence having alphabets Q1, Q2, Q3 and Q4 with
probabilities 1/2, 1/4, 1/8 and 1/8 respectively.
(b) A letter is known to have come either from LONDON or [Link] the
postmark only the two consecutive letters ‘ON’ are [Link] is the chance that
it came from London? Give a step-by-step answer.
(c) Show that the chances of throwing six with 4, 3 or 2 dice respectively are as
[Link].
2. (a) Derive an expression for the average value and variance associated with the
Gaussian probability density function.
(b) The average life of a certain type of electric bulb is 1200 hours. What
percentage of this type of bulb is expected to fail in the first 800 hours of
working? What percentage is expected to fail between 800 and 1000 hours? Assume
a normal distribution with
3. (a) State and Prove any four properties of mean of a random variable X.
(b) Prove that the density function of sum of two statistically independent random
variables is the convolution of their individual density functions.
4. (a) If the auto correlation function of a wss process is R ( )=k.e( -kт) ,Show that its
6. (a) Calculate the equivalent noise bandwidth of an RC low-pass filter. How is it
related to its 3 dB bandwidth?
(b) Find the PSD of the noise voltage across the terminals 1 & 2 for the following
circuit (Figure 6b):
1
1Ω 1F
1Ω 1H
2
Figure 6b
7. (a) Evaluate the equivalent noise temperature of a two-port device with a matched
source and a matched load.
(b) Bring out the significance of noise figure in determining the performance of a
communication system.
SET 1
2. (a) Define the Rayleigh density and distribution functions and explain them with their
plots.
(b) Define and briefly explain the Gaussian random variable.
(c) Determine whether the following is a valid distribution function: F(x) = 1 −
exp(−x/2) for x ≥ 0 and 0 elsewhere. [5+5+6]
3. (a) State and prove properties of characteristic function of a random variable X
(b) Let X be a random variable defined by the density function
f_X(x) = (5/4)(1 − x^4) for 0 < x ≤ 1
f_X(x) = 0 elsewhere.
Find E[X], E[X²] and the variance. [8+8]
4. The joint space for two random variables X and Y and corresponding probabilities
are shown in table
Find and Plot
(a) FXY (x, y)
(b) marginal distribution functions of X and Y.
(c) Find P(0.5 < X < 1.5),
(d) Find P(X ≤ 1, Y ≤ 2) and
(e) Find P(1 < X ≤ 2, Y ≤ 3).
X, Y 1,1 2,2 3,3 4,4
P 0.05 0.35 0.45 0.15
[3+4+3+3+3]
5. (a) Show that the variance of a weighted sum of uncorrelated random variables
equals the weighted sum of the variances of the random variables.
(b) Two random variables X and Y have the joint characteristic function
φ_X,Y(ω1, ω2) = exp(−2ω1² − 8ω2²).
i. Show that X and Y are zero-mean random variables.
ii. Are X and Y correlated? [8+8]
7. (a) Derive the expression for PSD and ACF of band pass white noise and plot
them
(b) Define various types of noise and explain. [8+8]
SET 2
1. (a) With an example define and explain the following:
i. Equally likely events
ii. Exhaustive events.
iii. Mutually exclusive events.
(b) In an experiment of picking a resistor, each having the same likelihood of being picked,
define the events A as “draw a 47 Ω resistor”, B as “draw a resistor with 5%
tolerance” and C as “draw a 100 Ω resistor”, from a box containing 100 resistors
having resistance and tolerance as shown below. Determine the joint probabilities and
conditional probabilities. [6+10]
Resistance (Ω)    Tolerance 5%    Tolerance 10%    Total
22                10              14               24
47                28              16               44
100               24              8                32
Total             62              38               100
2. (a) What is binomial density function? Find the equation for binomial distribution
function.
(b) What do you mean by continuous and discrete random variable? Discuss the
condition for a function to be a random variable. [6+10]
5. (a) let Xi, i = 1,2,3,4 be four zero mean Gaussian random variables. Use the joint
characteristic function to show that E {X1 X2 X3 X4} = E[X1 X2] E[X3 X4]
+ E[X1X3]E[X2X4] + E[X2X3] E[X1X4]
(b) Show that two random variables X1 and X2 with joint pdf.
fX1X2(X1, X2) = 1/16 |X1|< 4, 2 < X2< 4 are independent and orthogonal.[8+8]
6. A random process Y (t) = X (t) - X (t +τ) is defined in terms of a process X(t) that
is at least wide sense stationary.
(a) Show that mean value of Y (t) is 0 even if X(t) has a non Zero mean value.
(b) Show that σ_Y² = 2[R_XX(0) − R_XX(τ)]
(c) If Y(t) = X(t) + X(t + τ), find E[Y(t)] and σ_Y². [5+5+6]
8. (a) A signal x(t) = u(t) exp(−αt) is applied to a network having an impulse
response h(t) = ω u(t) exp(−ωt). Here α and ω are real positive constants.
Find the network response. (6M)
(b) Two systems have transfer functions H1(ω) and H2(ω). Show that the transfer
function H(ω) of the cascade of the two is H(ω) = H1(ω) H2(ω).
(c) For a cascade of N systems with transfer functions Hn(ω), n = 1, 2, ..., N, show
that H(ω) = ∏ Hn(ω), the product being taken over n = 1 to N. [6+6+4]
SET 3
6. (a) Define cross correlation function of two random processes X(t) and Y(t) and
State the properties of cross correlation function.
(b) Let two random processes X (t) and Y (t) be defined by
a) X (t) = A cos ω0t + B sin ω0t
Y (t) = B cos ω0t - A sin ω0t
Where A and B are random variables and ω0 is a constant. Assume A and
B are uncorrelated, zero mean random variables with same variance. Find the cross
correlation function RXY (t, t+τ ) and show that X(t) and Y(t) are
jointly wide sense stationary. [6+10]
SET 4
5. Three statistically independent random variables X1, X2 and X3 have mean values
¯X1= 3, ¯X2= 6 and ¯X3= −2. Find the mean values of the following functions.
(a) g(X1,X2,X3) = X1 + 3X2 + 4X3
(b) g(X1,X2,X3) = X1 X2 X3
(c) g(X1,X2,X3) = −2X1 X2 − 3X1 X3 + 4X2 X3
(d) g (X1,X2,X3) = X1+X2+X3. [16]
6. Statistically independent zero-mean random processes X(t) and Y(t) have
autocorrelation functions R_XX(τ) = e^(−|τ|) and R_YY(τ) = cos(2πτ) respectively.
(a) Find the auto correlation function of the sum W1 (t) = X (t) + Y (t)
(b) Find the auto correlation function of difference W2 (t) = X (t) - Y(t)
(c) Find the cross correlation function of W1 (t) and W2 (t). [5+5+6]
8. A random noise X(t) having power spectrum S_XX(ω) = 3/(49 + ω²) is applied to a
network for which h(t) = u(t) t² exp(−7t). The network response is denoted by Y(t).
(a) What is the average power in X(t)?
(b) Find the power spectrum of Y (t)
(c) Find average power of Y (t). [5+6+5]
SOLVED EXAMPLES
INTERNAL PAPERS
(MID, ASSIGNMENT)
Hall Ticket No.
SET-1
*****
Hall Ticket No.
SET-2
CMR COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
[Link] III Semester- I mid Examinations August – 2016
(Regulation: CMRCET-R15)
Subject Name: Probability Theory & Random Process
Department: ECE
Date: 09 .08.2016 Time: 10AM-11:20AM [Link]
PART-A
Answer all TEN questions (Compulsory)
Each question carries ONE mark.
5x1=5M
1. Associative law is given as______________
2. If three events A, B and C are independent events, then P(A∩B∩C) is _____________
3. List out the types of distribution functions
4. What is the significance of standard deviation ?
5. What is the relation between density and distribution function?
PART-B
Answer any THREE questions. Each question carries FIVE Marks.
3x5=15M
6. a) State and prove the total probability theorem and Bayes' theorem.
b) In a pen manufacture factory, machines A, B, C manufacture 20%, 30%, 50% of total
output respectively. From their outputs 4%, 5%, 3% are defective pens. A pen is drawn at
random and found to be defective. i) What is the probability of getting defective pen? ii)
What is the probability that it was manufactured by machine A? iii) What is the probability
that it was manufactured by machine C
7. a) What is conditional probability? Verify whether it satisfies the axioms of probability.
b) In an experiment of drawing a card from a pack of 52 cards, the event of getting a
spade is denoted by A, and getting a pictured card is denoted by B. Find the probabilities
of A, B, A∩B and A∪B.
8. a) A Gaussian random variable X has mean a_X = 0 and standard deviation σ_X = 2. Find i) P{2 < X ≤ 3} ii)
P{2 < X ≤ 3 | X > 2}.
b) Two dice are rolled simultaneously. The random variable X denotes the “sum of the two
faces that show up”. Find the mean and variance of the random variable X.
9. a) State and prove the properties of Characteristic function.
b) The amplitude of output signal of a radar system is Rayleigh random variable with
a=0 and b=4 volts. The system gets false detection if signal exceeds V volts. What is the
value of V, if the probability of false detection is 0.001.
10. a) Show that the mean and variance of the binomial distribution are Np and Npq respectively.
b) State and prove properties of joint distribution function.
CMR COLLEGE OF ENGINEERING AND TECHNOLOGY
KANDLAKOYA (VILLAGE), MEDCHAL
DEPARTMENT OF ECE
PTRP ASSIGNMENT-I QUESTIONS
1. a) State and prove the total probability theorem and Bayes' theorem.
b) Write short notes on independent events.
2. a) Define probability. State and prove its axioms.
b) A box contains 4 point-contact diodes and 6 alloy-junction diodes. What is the
probability that 3 diodes picked at random contain at least 2 point-contact diodes?
3. a) It is known that a particular random voltage can be represented by a Rayleigh
random variable with a = 0 and b = 5. The voltage is applied to a device that
generates power g(V) = V². Find the average power.
b) State and prove the properties of the probability density function (PDF).
4. a) Explain in detail the Gaussian distribution.
b) State and prove the properties of variance.
5. a) Explain the moment generating function and find the mean value of an
exponential random variable X by using the moment generating function.
b) State and prove the properties of the joint density function.
RESULT ANALYSIS
Academic Year    No. of students registered    No. of students attended    No. of students failed    No. of students passed    Pass percentage (%)
COURSE CLASS COs PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
1 √ √ √
III
2 √ √ √
LDICA [Link]
3 √ √
I SEM
4 √ √ √
CONTENT BEYOND THE SYLLABUS
Topics covered: