Notes On PAS
by
Dr Amol Deshpande
Govind Tukram Haldankar
Electronics Engineering
Bharatiya Vidya Bhavan’s
Sardar Patel Institute of Technology
Munshi Nagar, Andheri(W), Mumbai-400058
University of Mumbai
January 2026
Contents
1 Probability and Stochastic Processes
1.1 Set theory
1.1.1 Types of Sets:
1.1.2 Operations on Sets:
1.1.3 Relationships between Sets:
1.2 Probability
1.3 Key Concepts in Probability:
1.4 Random Variable:
1.4.1 Discrete Random Variable:
1.4.2 Numerical on Discrete Random Variable
1.5 Continuous Random Variable
1.6 Example:
1.7 Types of Probability:
1.8 Inclusive and mutually exclusive probability:
1.8.1 Inclusive probability:
1.8.2 Mutually exclusive events
1.8.3 Examples of Mutually Exclusive Events:
1.8.4 Non mutually exclusive events
1.8.5 Example of Non Mutually exclusive events
1.8.6 Brackets and their meanings
1.8.7 Independent events
1.8.8 Example of Independent Events:
1.8.9 Example of dependent Events:
1.9 Conditional probability
1.10 Bayes’ Theorem:
1.11 Gamma function:
1.11.1 Properties of the Gamma Function:
2 Statistics
2.1 Key Components of Statistics:
2.2 Applications of Statistics:
2.3 Descriptive statistics:
3 Random number
3.1 Discrete Random Variables
3.2 Probability Mass Function
3.3 Example on Probability Mass Function
3.4 Usage in Discrete Probability Distributions
3.5 Binomial distribution:
3.6 Example on Binomial Distribution:
3.7 Poisson’s distribution:
3.8 Continuous Random Variables:
3.8.1 Uniform Random Variable
3.8.2 Exponential Random Variable:
3.9 Comparison between exponential distribution and Poisson’s distribution
3.9.1 A Laplace random variable:
3.10 Random Process:
3.11 Types of Random Processes:
3.12 Examples of Random Processes:
5 Normal distribution:
5.1 Probability distribution of a binomial random variable
5.2 Probability distribution of a binomial random variable
5.3 Gaussian distribution:
5.4 Question
5.5 Normal Approximations of the Binomial Distribution
5.6 Probability Histogram for Binomial Distribution
5.7 Example:
5.8 Example:
5.9 Example:
5.10 Moments
6 Series expansion
7 Characteristic function
8 Random Processes:
8.1 Ensemble averages:
8.2 What is the definition of a stationary random process?
8.3 Invariant under the translation of time period
8.4 Stationarity
8.5 Stochastic Convergence
8.6 Bounds of Probabilities
8.7 Law of Large Numbers
8.8 Central Limit Theorem (CLT)
8.9 Covariance and Correlation
8.10 Spectral Characteristics of Random Process
8.11 Autocorrelation and Power Spectral Density
8.12 Ergodicity
8.13 Auto-Correlation and Cross-Correlation Functions
8.14 Impulse Response in LTI Systems
8.15 Applications in Noise Analysis
8.16 Gaussian and Poisson random processes
8.17 Markov Chains
1 Probability and Stochastic Processes
1.1 Set theory
Set theory is a branch of mathematical logic that studies collections of objects, called sets. It
provides a fundamental framework for various mathematical concepts and structures. Here are
some key points to explain set theory:
A set is a collection of distinct objects, considered as a whole. The objects in a set are
called elements or members.
Sets are typically denoted using curly braces. For example, A = {1, 2, 3} is a set
containing the numbers 1, 2, and 3.
2. Intersection: The intersection of two sets A and B is the set that contains only the
elements that are common to both sets.
Example: Using the same sets A and B, the intersection is A ∩ B = {3}.
3. Complement: The complement of a set A consists of all the elements of a universal set
U that are not in A.
Example: Let U = {1, 2, 3, 4, 5, 6} (the universal set) and A = {2, 3}. The
complement of A, denoted A’ or $\bar{A}$, is A’ = U − A = {1, 4, 5, 6}.
1.1.3 Relationships between Sets:
Subset: A set A is a subset of a set B if all elements of A are also in B, denoted
A ⊆ B.
1.2 Probability
Probability is a branch of mathematics that measures the likelihood or chance of an event
occurring. It quantifies uncertainty and is expressed as a number between 0 and 1, where:
Outcome: The result of a single trial of an experiment. For example, getting a 4 when
rolling a die.
Event: A specific outcome or a set of outcomes from an experiment. For example, the
event of rolling an even number on a die includes the outcomes {2, 4, 6}.
Sample Space: The set of all possible outcomes of an experiment. For example, when
rolling a die, the sample space is S = {1, 2, 3, 4, 5, 6}.
Numerical 1 : The probability of the closing of each relay (5 relays are shown in the
circuit) in the circuit shown below is given by p (we say that a relay is closed, when current
can flow). If all relays function independently, what is the probability that the lamp lights?
Solution : The lamp lights if there exists at least one conducting path from the supply to
the lamp through the relay network.
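We analyze the circuit by conditioning on the state of relay R3.
Case 1: Relay R3 is open
With R3 open, the lamp can light only through the top branch (R1 and R2 in series) or the
bottom branch (R4 and R5 in series). The probability that the top branch conducts is
\[ P(A) = P(R_1 \cap R_2) = p \times p = p^2 \]
Similarly, the probability that the bottom branch conducts is
\[ P(B) = P(R_4 \cap R_5) = p \times p = p^2 \]
The lamp lights if at least one branch conducts:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Since the relays are independent,
\[ P(A \cap B) = p^4 \]
Therefore,
\[ P(A \cup B) = p^2 + p^2 - p^4 = 2p^2 - p^4 \]
Since R3 is open with probability (1 − p), the probability contribution is
\[ P_1 = (1 - p)(2p^2 - p^4) \]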
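Case 2: Relay R3 is closed
If R3 is closed, the middle connection links the two branches.
The circuit conducts if both the left side and the right side of the network conduct.
Probability that the left side conducts:
\[ P_L = 1 - (1 - p)^2 \]
Probability that the right side conducts:
\[ P_R = 1 - (1 - p)^2 \]
Since these events are independent,
\[ P = P_L \times P_R = [1 - (1 - p)^2]^2 \]
Since R3 must also be closed (probability p),
\[ P_2 = p\,[1 - (1 - p)^2]^2 \]
Total Probability
The required probability that the lamp lights is
\[ P = P_1 + P_2 \]
Final Result
\[ P = (1 - p)(2p^2 - p^4) + p\,[1 - (1 - p)^2]^2 = 2p^2 + 2p^3 - 5p^4 + 2p^5 \]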
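The conditioning argument can be cross-checked by brute force over all 32 relay states. The sketch below assumes the usual bridge layout implied by the solution (R1 and R2 in series on top, R4 and R5 in series on the bottom, R3 joining the two midpoints); the circuit figure is not reproduced in these notes, so this topology and the name `lamp_probability` are assumptions made for illustration.

```python
from itertools import product


def lamp_probability(p):
    """Probability that the lamp lights, by enumerating all relay states.

    Assumed topology (not confirmed by a figure): R1-R2 on top,
    R4-R5 on the bottom, R3 bridging the two midpoints.
    """
    total = 0.0
    for r1, r2, r3, r4, r5 in product([0, 1], repeat=5):
        # The four conducting paths of the assumed bridge network.
        conducts = (
            (r1 and r2)
            or (r4 and r5)
            or (r1 and r3 and r5)
            or (r4 and r3 and r2)
        )
        if conducts:
            closed = r1 + r2 + r3 + r4 + r5
            total += p**closed * (1 - p) ** (5 - closed)
    return total


def conditioning_formula(p):
    """P1 + P2 from conditioning on R3."""
    return (1 - p) * (2 * p**2 - p**4) + p * (1 - (1 - p) ** 2) ** 2


print(lamp_probability(0.5), conditioning_formula(0.5))
```

If the two values agree for several choices of p, the case analysis on R3 has not missed or double-counted any conducting path.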
1.4.1 Discrete Random Variable:
A discrete random variable takes a finite or countably infinite number of distinct values.
Let X be a discrete random variable that can take values x1 , x2 , x3 , . . ..
The probability mass function (PMF) of X is defined as
P (X = xi ) = p(xi ), i = 1, 2, 3, . . .
The PMF satisfies the following properties:
\[ p(x_i) \ge 0 \quad \forall i \]
\[ \sum_{i=1}^{\infty} p(x_i) = 1 \]
where µ = E[X].
1.4.2 Numerical on Discrete Random Variable
A random variable X represents the number obtained when a fair die is thrown once.
Solution:
Since the die is fair,
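\[ X \in \{1, 2, 3, 4, 5, 6\}, \qquad P(X = x) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6 \]
Mean:
\[ E[X] = \sum_{x=1}^{6} x\,P(X = x) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{21}{6} = 3.5 \]
Second moment:
\[ E[X^2] = \sum_{x=1}^{6} x^2 P(X = x) = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6} \]
Variance:
\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12} \]
\[ E[X] = 3.5, \qquad \mathrm{Var}(X) = \frac{35}{12} \]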
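The mean and variance computation for a PMF can be sketched in a few lines. The helper name `mean_and_variance` is an illustrative choice, and exact fractions are used so that the result can be compared directly with 3.5 and 35/12.

```python
from fractions import Fraction


def mean_and_variance(pmf):
    """Return (E[X], Var(X)) for a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return mean, second_moment - mean**2


# Fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean, variance = mean_and_variance(die)
print(mean, variance)  # 7/2 35/12
```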
Qu 1 Suppose that each of 3 sticks is broken into one long and one short part. The 6 parts
are arranged into 3 pairs from which new sticks are formed. Determine the probability that
the parts will be joined in the original order.
Solution : Label the parts as: L1 , L2 , L3 and S1 , S2 , S3
The correct (original) pairing is: (L1 , S1 ), (L2 , S2 ), (L3 , S3 )
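Total number of ways to pair:
Assuming each new stick is formed from one long part and one short part, we must pair 3 long
parts with 3 short parts. The number of ways to match 3 long parts with 3 short parts is 3! = 6.
Favorable outcomes:
Only one arrangement gives the original pairing: (L1, S1), (L2, S2), (L3, S3).
So, favorable cases = 1.
Probability:
\[ P = \frac{\text{Favorable outcomes}}{\text{Total outcomes}} = \frac{1}{3!} = \frac{1}{6} \]
Final Answer: $\frac{1}{6}$
If instead any two of the 6 parts may be joined, there are 6!/(2^3 · 3!) = 15 possible pairings
and the probability becomes 1/15.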
Problem: A product is manufactured by two factories, A and B. 80% of the product is
manufactured in factory A and 20% in factory B. 30% of the products from A are defective,
while 10% from B are defective.
If a randomly selected product from the market is found to be defective, find the probability
that it was manufactured by factory A.
Solution:
Let P(A) = 0.8, P(B) = 0.2, P(D | A) = 0.3, P(D | B) = 0.1.
We need to find P(A | D).
Using Bayes’ Theorem:
\[ P(A \mid D) = \frac{P(A)\,P(D \mid A)}{P(A)\,P(D \mid A) + P(B)\,P(D \mid B)} \]
Substituting values:
\[ P(A \mid D) = \frac{0.8 \times 0.3}{(0.8 \times 0.3) + (0.2 \times 0.1)} = \frac{0.24}{0.24 + 0.02} = \frac{0.24}{0.26} = \frac{12}{13} \approx 0.9231 \]
Solution:
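(a) Verification:
\[ \int_{-\infty}^{\infty} f_X(x)\,dx = \int_{0}^{1} 2x\,dx = \left[ x^2 \right]_{0}^{1} = 1 \]
(b) Mean:
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{0}^{1} x(2x)\,dx = 2\int_{0}^{1} x^2\,dx = 2\left[ \frac{x^3}{3} \right]_{0}^{1} = \frac{2}{3} \]
\[ E[X] = \frac{2}{3} \]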
Example 2 (Discrete Random Variable):
Let X be the number obtained when a fair die is thrown once.
The mean of X is $E[X] = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5$.
The second moment is $E[X^2] = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6}$.
Therefore, the variance is $\mathrm{Var}(X) = \frac{91}{6} - (3.5)^2 = \frac{35}{12}$.
Example 1:
The distribution function of a continuous random variable X is given by $F(x) = 1 - e^{-x}$,
$x \ge 0$.
Find the probability density function of X.
Solution:
The probability density function is the derivative of the distribution function.
\[ f(x) = \frac{d}{dx}F(x) = \frac{d}{dx}\left(1 - e^{-x}\right) = e^{-x}, \quad x \ge 0. \]
Hence, the probability density function is $f(x) = e^{-x}$, $x \ge 0$.
Example 2:
The distribution function of a random variable X is defined as
\[ F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases} \]
Find the corresponding probability density function.
Solution:
The probability density function is obtained by differentiating F(x).
For 0 ≤ x ≤ 1,
\[ f(x) = \frac{d}{dx}(x^2) = 2x. \]
Thus, the probability density function is
\[ f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]
Example 3:
The distribution function of a random variable X is
\[ F(x) = \begin{cases} 0, & x < 1 \\ \dfrac{x - 1}{3}, & 1 \le x \le 4 \\ 1, & x > 4 \end{cases} \]
Find P(2 ≤ X ≤ 3) and the probability density function.
Solution:
Using the distribution function,
\[ P(2 \le X \le 3) = F(3) - F(2). \]
\[ F(3) = \frac{3 - 1}{3} = \frac{2}{3}, \qquad F(2) = \frac{2 - 1}{3} = \frac{1}{3} \]
Hence,
\[ P(2 \le X \le 3) = \frac{2}{3} - \frac{1}{3} = \frac{1}{3}. \]
The probability density function is
\[ f(x) = \frac{d}{dx}F(x) = \frac{1}{3}, \quad 1 \le x \le 4. \]
Example 4:
Let the distribution function of X be $F(x) = \frac{x}{5}$ for $0 \le x \le 5$.
Find the probability density function and verify that it integrates to unity.
Solution:
The probability density function is
\[ f(x) = \frac{d}{dx}F(x) = \frac{1}{5}, \quad 0 \le x \le 5. \]
Verification:
\[ \int_{0}^{5} f(x)\,dx = \int_{0}^{5} \frac{1}{5}\,dx = \frac{1}{5}(5) = 1. \]
Hence, f(x) is a valid probability density function.
For a continuous random variable, the probability density function is the derivative of the
distribution function, i.e., $f(x) = \frac{d}{dx}F(x)$.
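The relation between a distribution function and its density can also be checked numerically. The sketch below approximates the derivative with a central difference (a numerical stand-in for the analytic differentiation used in the examples); the helper name `pdf_from_cdf` and the step size `h` are illustrative choices.

```python
import math


def pdf_from_cdf(cdf, x, h=1e-5):
    """Approximate f(x) = dF/dx with a central difference."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


def exponential_cdf(x):
    """F(x) = 1 - e^{-x} for x >= 0, and 0 otherwise."""
    return 1 - math.exp(-x) if x >= 0 else 0.0


def uniform_cdf(x):
    """F(x) = x / 5 on [0, 5], clamped to [0, 1] outside."""
    return min(max(x / 5, 0.0), 1.0)


print(pdf_from_cdf(exponential_cdf, 1.0), math.exp(-1.0))
print(pdf_from_cdf(uniform_cdf, 2.0), 1 / 5)
```

The approximation is only meaningful at points where F is differentiable, so it should not be evaluated exactly at the break points of a piecewise distribution function.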
Example 1:
A discrete random variable X takes the values 0, 1, 2 with probabilities
P (X = 0) = 0.2, P (X = 1) = 0.5, P (X = 2) = 0.3.
Find the distribution function F (x).
Solution:
The distribution function is defined as F (x) = P (X ≤ x).
For x < 0, F (x) = 0.
For 0 ≤ x < 1, F (x) = P (X = 0) = 0.2.
For 1 ≤ x < 2, F (x) = P (X = 0) + P (X = 1) = 0.2 + 0.5 = 0.7.
For x ≥ 2, F (x) = P (X = 0) + P (X = 1) + P (X = 2) = 1.
Hence,
\[ F(x) = \begin{cases} 0, & x < 0 \\ 0.2, & 0 \le x < 1 \\ 0.7, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]
Example 2:
The distribution function of a discrete random variable X is given by
\[ F(x) = \begin{cases} 0, & x < 1 \\ 0.3, & 1 \le x < 2 \\ 0.8, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases} \]
Find P(X = 2).
Solution:
For a discrete random variable, $P(X = x) = F(x) - F(x^-)$.
Therefore,
\[ P(X = 2) = F(2) - F(2^-) = 0.8 - 0.3 = 0.5. \]
Example 3:
The distribution function of X is
\[ F(x) = \begin{cases} 0, & x < 0 \\ 0.4, & 0 \le x < 1 \\ 0.6, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]
Find the probability mass function of X.
Solution:
\[ P(X = 0) = F(0) - F(0^-) = 0.4 - 0 = 0.4. \]
\[ P(X = 1) = F(1) - F(1^-) = 0.6 - 0.4 = 0.2. \]
\[ P(X = 2) = F(2) - F(2^-) = 1 - 0.6 = 0.4. \]
Hence, the probability mass function is
\[ P(X = x) = \begin{cases} 0.4, & x = 0 \\ 0.2, & x = 1 \\ 0.4, & x = 2 \end{cases} \]
Example 4:
A function F(x) is defined as
\[ F(x) = \begin{cases} 0, & x < 0 \\ \frac{1}{4}, & 0 \le x < 1 \\ \frac{3}{4}, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]
Verify whether F(x) is a valid distribution function.
Solution:
The given function satisfies the following properties: F(x) is non-decreasing,
$\lim_{x \to -\infty} F(x) = 0$, and $\lim_{x \to \infty} F(x) = 1$.
Hence, F(x) is a valid distribution function.
For a discrete random variable, the distribution function is a step function and the
probability mass function is obtained using $P(X = x) = F(x) - F(x^-)$.
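The jump rule P(X = x) = F(x) − F(x−) translates directly into code. In this sketch the step function is described by its jump points and the value of F at each of them; that representation, and the name `pmf_from_cdf`, are assumptions made for illustration.

```python
from fractions import Fraction


def pmf_from_cdf(jump_points, cdf_values):
    """Recover the PMF of a discrete variable from its step-function CDF.

    jump_points: sorted x values where F jumps.
    cdf_values:  F(x) at each jump point (same order).
    """
    pmf = {}
    previous = Fraction(0)  # F is 0 to the left of the first jump
    for x, value in zip(jump_points, cdf_values):
        pmf[x] = value - previous  # size of the jump at x
        previous = value
    return pmf


# Example 3 above: F jumps to 0.4, 0.6 and 1 at x = 0, 1, 2.
values = [Fraction("0.4"), Fraction("0.6"), Fraction("1")]
pmf = pmf_from_cdf([0, 1, 2], values)
print(pmf)
```

Exact fractions avoid floating-point noise such as 0.6 − 0.4 evaluating to 0.19999999999999996.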
1.6 Example:
Example 1: If you roll a fair six-sided die:
The sample space is S = {1, 2, 3, 4, 5, 6}.
If you want to calculate the probability of rolling a 3, there is 1 favorable outcome (rolling a 3)
out of 6 possible outcomes: $P(\text{rolling a 3}) = \frac{1}{6}$.
Example 2: A die is rolled twice. What is the probability that the sum is 9 or 11?
Solution:
A die is rolled twice. Let us find the probability that the sum is 9 or 11.
Step 1: Total number of outcomes
Each die has 6 faces, so the total number of outcomes when two dice are rolled is:
6 × 6 = 36
Step 2: Favorable outcomes
(i) Sum = 9:
(3, 6), (4, 5), (5, 4), (6, 3)
Number of favorable outcomes = 4
(ii) Sum = 11:
(5, 6), (6, 5)
Number of favorable outcomes = 2
Step 3: Total favorable outcomes
4+2=6
Step 4: Probability
\[ P(\text{sum} = 9 \text{ or } 11) = \frac{\text{favorable outcomes}}{\text{total outcomes}} = \frac{6}{36} = \frac{1}{6} \]
Answer: $\frac{1}{6}$
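The count of favorable outcomes can be confirmed by listing all 36 ordered rolls. This is a small enumeration sketch; the function name `prob_sum_in` is illustrative.

```python
from fractions import Fraction
from itertools import product


def prob_sum_in(targets):
    """Probability that the sum of two fair dice lies in `targets`."""
    rolls = list(product(range(1, 7), repeat=2))  # 36 ordered outcomes
    favorable = [roll for roll in rolls if sum(roll) in targets]
    return Fraction(len(favorable), len(rolls))


print(prob_sum_in({9, 11}))  # 1/6
```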
Independent events :
Two events are said to be independent if the occurrence of one does not affect the
occurrence of the other. Mathematically, for independent events A and B:
\[ P(A \cap B) = P(A)\,P(B) \]
Example 3: The probability that it rains in a day is 0.2. What is the probability that
there will be rain in the next five days (all five days)?
Solution:
Let the probability of rain on a single day be
P (rain) = 0.2
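Assuming rain on each day is independent of the other days,
\[ P(\text{rain on all five days}) = (0.2)^5 = 0.00032 \]
Answer: 0.00032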
Place Floor
Movie theater First
Restaurant First
Restaurant Second
Garment Second
Movie theater Second
Restaurant Third
Garment Third
If a customer is chosen at random, what is the probability that the customer is going to a
movie theater or to the third floor?
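Solution:
Step 1: Total number of facilities
There are 7 facilities in total:
n = 7
Step 2: Identify favorable outcomes
- Movie theaters (event A): 2 (first and second floors), so P(A) = 2/7
- Facilities on the third floor (event B): 2 (restaurant and garment), so P(B) = 2/7
- Movie theaters on the third floor: none, so the overlap is P(A ∩ B) = 0
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{2}{7} + \frac{2}{7} - 0 = \frac{4}{7} \]
Answer: $\frac{4}{7}$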
Example 5: A bag contains 4 white balls and 3 black balls. Two balls are drawn at
random. Find the probability of drawing one white and one black ball (not necessarily in that
order):
1. Without replacement
2. With replacement
Solution:
Step 1: Total balls in the bag
Total balls = 4 + 3 = 7
Case 1: Without replacement
Consider the two possible orders: white first then black, or black first then white.
(i) White first, black second:
\[ P(W \text{ then } B) = \frac{4}{7} \cdot \frac{3}{6} = \frac{12}{42} = \frac{2}{7} \]
(ii) Black first, white second:
\[ P(B \text{ then } W) = \frac{3}{7} \cdot \frac{4}{6} = \frac{12}{42} = \frac{2}{7} \]
Total probability (without replacement):
\[ P(\text{one white and one black}) = \frac{2}{7} + \frac{2}{7} = \frac{4}{7} \]
Case 2: With replacement
Here, after drawing the first ball, it is put back into the bag, so the probabilities for each
draw remain the same.
(i) White first, black second:
\[ P(W \text{ then } B) = \frac{4}{7} \cdot \frac{3}{7} = \frac{12}{49} \]
(ii) Black first, white second:
\[ P(B \text{ then } W) = \frac{3}{7} \cdot \frac{4}{7} = \frac{12}{49} \]
Total probability (with replacement):
\[ P(\text{one white and one black}) = \frac{12}{49} + \frac{12}{49} = \frac{24}{49} \]
Answer:
Without replacement: $\frac{4}{7}$. With replacement: $\frac{24}{49}$.
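Both cases can be verified by enumerating ordered draws from a list of 7 labelled balls. The sketch uses `itertools.permutations` for drawing without replacement and `itertools.product` for drawing with replacement; the helper name `prob_one_of_each` is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["W"] * 4 + ["B"] * 3  # 4 white, 3 black


def prob_one_of_each(draws):
    """Fraction of ordered two-ball draws with one white and one black."""
    draws = list(draws)
    favorable = sum(1 for first, second in draws if first != second)
    return Fraction(favorable, len(draws))


# Without replacement: ordered pairs of distinct balls (7 * 6 = 42).
without_replacement = prob_one_of_each(permutations(balls, 2))
# With replacement: every ordered pair, repeats allowed (7 * 7 = 49).
with_replacement = prob_one_of_each(product(balls, repeat=2))
print(without_replacement, with_replacement)  # 4/7 24/49
```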
Numerical 5: Five distinct numbers are randomly given to persons 1 to 5. Persons 1 and
2 compare their numbers and the winner is the one having the smaller number. The winner
compares with person 3 and so on. What is the probability that person 1 wins 2 times?
Solution : Let the numbers assigned to the persons be N1 , N2 , N3 , N4 , N5 , which are distinct
numbers from 1 to 5.
Person 1 wins exactly 2 times if and only if N1 < N2, N1 < N3 and N1 > N4.
Step 1: Total number of arrangements
5! = 120
Step 2: Count favorable outcomes
- Let N1 be person 1’s number.
- To win exactly 2 times, N1 must be smaller than N2 and N3, but bigger than N4.
- By enumeration (or combinatorial reasoning), there are exactly 10 favorable arrangements.
Step 3: Probability
\[ P(\text{person 1 wins exactly 2 times}) = \frac{\text{favorable outcomes}}{\text{total outcomes}} = \frac{10}{120} = \frac{1}{12} \]
Answer: $\frac{1}{12}$
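The count of 10 favorable arrangements can be reproduced by simulating the comparison chain over all 120 assignments. This is an enumeration sketch; the function name `wins_of_person_one` is illustrative, and the rule "smaller number wins, the winner meets the next person" is taken from the problem statement.

```python
from fractions import Fraction
from itertools import permutations


def wins_of_person_one(numbers):
    """Number of comparisons won by person 1 (index 0); smaller number wins."""
    wins = 0
    champion = 0  # index of the current winner
    for challenger in range(1, len(numbers)):
        if numbers[champion] < numbers[challenger]:
            if champion == 0:
                wins += 1
        else:
            champion = challenger  # once person 1 loses, they never return
    return wins


arrangements = list(permutations(range(1, 6)))
favorable = sum(1 for nums in arrangements if wins_of_person_one(nums) == 2)
probability = Fraction(favorable, len(arrangements))
print(favorable, probability)  # 10 1/12
```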
Conditional Probability:
Conditional probability refers to the probability of an event occurring given that another
event has already occurred.
Let A and B be two events in a sample space with P (B) > 0. The conditional probability
of A given B is defined as:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
Explanation:
- P(A ∩ B) is the probability that both A and B occur.
- P(B) is the probability that B occurs.
- By dividing P(A ∩ B) by P(B), we are restricting our sample space to B, i.e., we
only consider outcomes where B has occurred.
Example:
Suppose a card is drawn from a standard deck of 52 cards. Let:
Example 6: Out of 8 items that have arrived, one is defective. A worker picks these parts
one by one. What is the probability that the third is defective given that the first two are not
(use conditional probability)
Solution:
Let the events be defined as follows:
- A: the third item picked is defective
- B: the first two items picked are not defective
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
Step 1: Compute P(B)
- Total items = 8, defective = 1, non-defective = 7
- Probability that the first item is not defective:
\[ P(\text{first not defective}) = \frac{7}{8} \]
- Probability that the second item is not defective (given first is not defective):
\[ P(\text{second not defective} \mid \text{first not defective}) = \frac{6}{7} \]
- Therefore,
\[ P(B) = P(\text{first two not defective}) = \frac{7}{8} \cdot \frac{6}{7} = \frac{6}{8} = \frac{3}{4} \]
Step 2: Compute P(A ∩ B)
- For A ∩ B to happen: first two are not defective, third is defective.
- Probability:
\[ P(A \cap B) = P(\text{first not defective}) \cdot P(\text{second not defective} \mid \text{first not defective}) \cdot P(\text{third defective} \mid \text{first two not defective}) \]
\[ P(A \cap B) = \frac{7}{8} \cdot \frac{6}{7} \cdot \frac{1}{6} = \frac{1}{8} \]
Step 3: Conditional probability
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/8}{3/4} = \frac{1}{8} \cdot \frac{4}{3} = \frac{1}{6} \]
Answer: $\frac{1}{6}$
Explanation:
- Conditional probability allows us to compute the probability of an event (third item
defective) given that some other event has already occurred (first two are non-defective).
- Once we know the first two are non-defective, there are only 6 items left, one of which is
defective, so the probability that the third is defective is 1/6.
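The same result follows from restricting the sample space directly. The sketch below assumes the single defective item is equally likely to be in any of the 8 picking positions, which is what "picks these parts one by one" at random implies; the variable names are illustrative.

```python
from fractions import Fraction

ITEMS = 8  # assumption: the defective item is equally likely at positions 1..8

# Event B: the first two picks are not defective (defective position > 2).
positions_b = [pos for pos in range(1, ITEMS + 1) if pos > 2]
# Event A and B: the defective item is exactly the third pick.
positions_a_and_b = [pos for pos in positions_b if pos == 3]

p_b = Fraction(len(positions_b), ITEMS)
p_a_and_b = Fraction(len(positions_a_and_b), ITEMS)
p_a_given_b = p_a_and_b / p_b
print(p_b, p_a_and_b, p_a_given_b)  # 3/4 1/8 1/6
```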
1. Conditional Probability:
Conditional probability is the probability of an event occurring given that another event has
already occurred.
Let A and B be two events in a sample space S, with P (B) > 0. The conditional probability
of A given B is defined as:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
Example:
Suppose a card is drawn from a standard deck of 52 cards. Let
Example:
A factory has two machines producing items:
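- Machine 1 produces 60% of items, Machine 2 produces 40%
- Defective rates: Machine 1: 1%, Machine 2: 2%
Let D be the event that an item is defective. The marginal probability of selecting a
defective item is:
\[ P(D) = P(D \mid \text{Machine 1})P(\text{Machine 1}) + P(D \mid \text{Machine 2})P(\text{Machine 2}) = (0.01)(0.6) + (0.02)(0.4) = 0.014 \]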
Let B be any event with P (B) > 0. Then, the probability of Ak given that B has occurred
is:
\[ P(A_k \mid B) = \frac{P(B \mid A_k)\,P(A_k)}{\sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)} \]
Explanation:
- $P(A_k)$: Prior probability of event $A_k$ (before observing B)
- $P(B \mid A_k)$: Likelihood of observing B given $A_k$
- $P(A_k \mid B)$: Posterior probability, the updated probability of $A_k$ after observing B
- Denominator $\sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$: total probability of B, also called the
marginal probability of B
—
Example:
Suppose a factory produces items from two machines:
- Machine 1 produces 60% of items, Machine 2 produces 40% of items.
- Defective rate: Machine 1: 1%, Machine 2: 2%
Let D be the event that an item is defective. Find the probability that a defective item
came from Machine 2.
\[ P(\text{Machine 2} \mid D) = \frac{P(D \mid \text{Machine 2})\,P(\text{Machine 2})}{P(D \mid \text{Machine 1})P(\text{Machine 1}) + P(D \mid \text{Machine 2})P(\text{Machine 2})} \]
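Bayes’ theorem for a finite set of hypotheses can be sketched as a small function that normalizes prior × likelihood. The function name `bayes_posterior` is illustrative; the numbers are those of the two-machine example.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(A_k | B) for each hypothesis A_k.

    priors:      P(A_k), one per hypothesis (should sum to 1).
    likelihoods: P(B | A_k), in the same order.
    """
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(joint)  # marginal probability P(B)
    return [value / total for value in joint]


# Machines 1 and 2: priors 0.6 and 0.4, defective rates 1% and 2%.
posterior = bayes_posterior([0.6, 0.4], [0.01, 0.02])
print(posterior)  # second entry is P(Machine 2 | D) = 0.008 / 0.014 = 4/7
```

Here the numerator for Machine 2 is 0.02 × 0.4 = 0.008 and the denominator is 0.006 + 0.008 = 0.014, so the posterior is 4/7 ≈ 0.571.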
Example 7: It is observed that 40% of the tape recorders have a flaw, and those with a flaw
will die within six months. Of those that do not have a flaw, 20% die within six months.
Your tape recorder died in 4 months. What is the probability that it had the flaw?
Solution :
Let us define the events:
- F: the tape recorder has a flaw
- D: the tape recorder dies within six months
Step 1: Apply Bayes’ theorem
\[ P(F \mid D) = \frac{P(D \mid F)\,P(F)}{P(D \mid F)P(F) + P(D \mid F^c)P(F^c)} \]
Step 2: Substitute the given probabilities
\[ P(F) = 0.4, \quad P(F^c) = 0.6, \quad P(D \mid F) = 1, \quad P(D \mid F^c) = 0.2 \]
Step 3: Compute P(F | D)
\[ P(F \mid D) = \frac{1 \cdot 0.4}{1 \cdot 0.4 + 0.2 \cdot 0.6} = \frac{0.4}{0.4 + 0.12} = \frac{0.4}{0.52} = \frac{10}{13} \]
Answer: $\frac{10}{13}$
Explanation:
- This uses Bayes’ theorem, which allows us to compute the probability of an event (the
tape recorder has a flaw) given observed evidence (it died at 4 months, i.e. within six months).
- We combine the probability of dying given a flaw and the probability of dying without a
flaw, weighted by their prior probabilities.
Example: An urn contains 4 white and 6 black balls and another urn contains 3 white
and 5 black balls. Two balls are drawn at random from the first urn and placed in the second
urn, and then 1 ball is drawn at random from the second urn. What is the probability that the
ball drawn from the second urn is white? Solve using total probability.
Solution:
Urn I contains 4 white and 6 black balls
Urn II contains 3 white and 5 black balls
Two balls are drawn at random from Urn I and transferred to Urn II. Then one ball is
drawn at random from Urn II. We find the probability that this ball is white.
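If both transferred balls are white (WW):
\[ P(WW) = \frac{\binom{4}{2}}{\binom{10}{2}} = \frac{6}{45} = \frac{2}{15} \]
After transfer, Urn II has (3 + 2)W, 5B = 5W, 5B, so
\[ P(W \mid WW) = \frac{5}{10} \]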
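The full total-probability sum over the three possible transfers (0, 1 or 2 white balls moved) can be sketched as follows. The function name `prob_white_from_urn_two` is illustrative, and the code simply applies the same counting used for the WW case to the other two cases.

```python
from fractions import Fraction
from math import comb


def prob_white_from_urn_two():
    """P(white drawn from Urn II) after moving 2 random balls from Urn I.

    Urn I: 4 white, 6 black. Urn II: 3 white, 5 black (10 balls after transfer).
    """
    total = Fraction(0)
    for whites_moved in range(3):  # 0, 1 or 2 white balls transferred
        blacks_moved = 2 - whites_moved
        # Probability of this transfer composition.
        p_transfer = Fraction(
            comb(4, whites_moved) * comb(6, blacks_moved), comb(10, 2)
        )
        # Probability of drawing white from Urn II given this transfer.
        p_white = Fraction(3 + whites_moved, 10)
        total += p_transfer * p_white
    return total


print(prob_white_from_urn_two())  # 19/50
```

The three terms are (15/45)(3/10), (24/45)(4/10) and (6/45)(5/10), which add up to 171/450 = 19/50.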
Solution :
P (W | F ) = 0.6
P (W | O) = 0.1
P (F ) = 0.3
where W = event that the team wins the game, F = event that the team scores the first
goal, O = event that the opposing team scores the first goal.
Since either the team or the opponent scores the first goal,
P (W ) = P (W | F )P (F ) + P (W | O)P (O)
= (0.6)(0.3) + (0.1)(0.7)
= 0.18 + 0.07
= 0.25
Solution : Let Hk : the event that the bag contains k white balls, k = 0, 1, 2, 3, 4, 5.
Since nothing is known beforehand, we assume all hypotheses are equally likely:
1
P (Hk ) = , k = 0, 1, 2, 3, 4, 5
6
Let E: the event that the two drawn balls are white.
—
Step 1: Likelihood P (E | Hk )
If the bag contains k white balls, then
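\[ P(E \mid H_k) = \begin{cases} 0, & k < 2 \\ \dfrac{\binom{k}{2}}{\binom{5}{2}}, & k \ge 2 \end{cases} \]
\[ P(H_5 \mid E) = \frac{P(E \mid H_5)\,P(H_5)}{\sum_{k=0}^{5} P(E \mid H_k)\,P(H_k)} \]
Compute numerator:
\[ P(E \mid H_5) = \frac{\binom{5}{2}}{\binom{5}{2}} = 1 \]
\[ P(E \mid H_5)\,P(H_5) = 1 \cdot \frac{1}{6} = \frac{1}{6} \]
Compute denominator:
\[ \sum_{k=2}^{5} \frac{\binom{k}{2}}{\binom{5}{2}} \cdot \frac{1}{6} = \frac{1}{6\binom{5}{2}} \left[ \binom{2}{2} + \binom{3}{2} + \binom{4}{2} + \binom{5}{2} \right] = \frac{1}{6 \cdot 10}(1 + 3 + 6 + 10) = \frac{20}{60} = \frac{1}{3} \]
\[ P(H_5 \mid E) = \frac{1/6}{1/3} = \frac{1}{2} \]
\[ P(\text{all balls are white}) = \frac{1}{2} \]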
1.8.1 Inclusive probability:
Inclusive probability considers the probability of either event occurring, allowing for the
possibility of overlap. When events can happen simultaneously, the probability of their union
includes the overlapping outcomes.
Rolling a Die: When rolling a fair six-sided die, the outcomes (1, 2, 3, 4, 5, 6) are mutually
exclusive. For instance, if you roll a 3, you cannot simultaneously roll a 5. If you denote
two events A = "roll a 2" and B = "roll a 5", then P(A ∩ B) = 0.
Choosing a Card from a Deck: In a standard deck of cards, the event of drawing a heart
(A) and the event of drawing a spade (B) are mutually exclusive. If you draw a card that
is a heart, it cannot be a spade at the same time. Thus, P(A ∩ B) = 0.
Weather Conditions: The events "it is raining" and "it is snowing" can be considered
mutually exclusive in a particular location during a specific time (e.g., at roughly the
same temperature and conditions). You cannot have rain and snow falling at the same
time under typical conditions.
Sports Events: If a sports team has an event where they either win (A) or lose (B), these
outcomes are mutually exclusive. If the team wins, they cannot lose in that particular
game.
Intersections: The intersection A ∩ B includes the outcomes where the card drawn is both
a heart and a face card, namely the king, queen and jack of hearts. So,
$P(A \cap B) = \frac{3}{52}$.
Calculating the Probability of the Union: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Substituting
the values,
\[ P(A \cup B) = \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26} \]
1.8.6 Brackets and their meanings
[a, b) includes all numbers from a to b , including a but not b .
(a, b) includes all numbers strictly between a and b , excluding both a and b .
1.8.7 Independent events
Can both happen together, with their own individual probabilities.
Events that do not influence each other’s occurrence.
\[ P(A \cap B) = P(A) \times P(B) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} \]
Event A: Drawing a heart.
Event B: Drawing a red card. There are 26 red cards in total (hearts and diamonds),
and since half of the red cards are hearts, we can find:
$P(A) = \frac{13}{52} = \frac{1}{4}$ (probability of drawing a heart).
$P(B) = \frac{26}{52} = \frac{1}{2}$ (probability of drawing a red card).
$P(A \cap B) = P(A) = \frac{13}{52} = \frac{1}{4}$ (since all hearts are red).
Now we can find P(A | B):
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} \]
This means that given that a red card is drawn, the probability that it is a heart is $\frac{1}{2}$.
\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]
Problem: The patients of the infirmary of a cardiology clinic have been operated by
three physicians P1 , P2 and P3 . Assume that these physicians have operated 50%, 30% and
20% of all the patients of the infirmary and that they commit a malpractice with probability
0.04, 0.05 and 0.02, respectively. If a patient of the infirmary is chosen at random and if he/she
is victim of a malpractice, what is the probability that the physician P3 has caused the problem?
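Solution: Probability that a patient was operated on by P1, P2 and P3:
P(P1) = 0.50, P(P2) = 0.30, P(P3) = 0.20
Probability of malpractice given the physician:
P(M | P1) = 0.04, P(M | P2) = 0.05, P(M | P3) = 0.02
By Bayes’ theorem,
\[ P(P_3 \mid M) = \frac{P(M \mid P_3)P(P_3)}{\sum_{i=1}^{3} P(M \mid P_i)P(P_i)} = \frac{(0.02)(0.20)}{(0.04)(0.50) + (0.05)(0.30) + (0.02)(0.20)} = \frac{0.004}{0.039} = \frac{4}{39} \approx 0.1026 \]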
Problem: In a bag there are three true coins and one false coin with heads on both sides.
A coin is chosen at random and tossed four times. If heads occurs all four times, what is
the probability that the false coin was chosen and used?
Solution: $P(\text{selecting a true coin}) = P_1 = \frac{3}{4}$
$P(\text{selecting the false coin}) = P_2 = \frac{1}{4}$
$P(\text{all heads with a true coin}) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{16}$
$P(\text{all heads with the false coin}) = 1$
\[ P(\text{false coin} \mid \text{all heads}) = \frac{P(\text{false coin})\,P(\text{all heads} \mid \text{false coin})}{P(\text{true coin})\,P(\text{all heads} \mid \text{true coin}) + P(\text{false coin})\,P(\text{all heads} \mid \text{false coin})} = \frac{\frac{1}{4} \cdot 1}{\frac{3}{4} \cdot \frac{1}{16} + \frac{1}{4} \cdot 1} \]
P(false coin was chosen and used) = $\frac{16}{19}$
Problem: A bag contains 7 red and 3 black balls and another bag contains 4 red and
5 black balls. One ball is transferred from the first bag to the second bag and then a ball is
drawn from the second bag. If this ball happens to be red, find the probability that a black
ball was transferred.
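Solution: If a black ball is transferred (probability 3/10), the second bag holds 4 red and
6 black balls, so a red ball is then drawn with probability 4/10. If a red ball is transferred
(probability 7/10), the second bag holds 5 red and 5 black balls, so a red ball is then drawn
with probability 5/10. By Bayes’ theorem,
\[ P(\text{black transferred} \mid \text{red drawn}) = \frac{P(\text{black transferred})\,P(\text{red drawn} \mid \text{black transferred})}{P(\text{red transferred})\,P(\text{red drawn} \mid \text{red transferred}) + P(\text{black transferred})\,P(\text{red drawn} \mid \text{black transferred})} \]
\[ = \frac{\frac{3}{10} \cdot \frac{4}{10}}{\frac{3}{10} \cdot \frac{4}{10} + \frac{7}{10} \cdot \frac{5}{10}} = \frac{12}{47} \]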
Problem: Find the probability that a year selected at random contains 53 Sundays.
Solution: There are two possibilities.
Case 1: A leap year is selected, with probability 1/4. A leap year has 52 weeks + 2 extra
days, and it contains 53 Sundays if the extra days are Sat-Sun or Sun-Mon (probability 2/7):
1/4 × 2/7 = 2/28
Case 2: A non-leap year is selected, with probability 3/4. It has 52 weeks + 1 extra day,
which is a Sunday with probability 1/7:
3/4 × 1/7 = 3/28
∴ P(a year selected at random contains 53 Sundays) = 2/28 + 3/28 = 5/28
Problem: Let X, Y, Z be independent events with probabilities a, b, c respectively.
Let the random variable n denote the number of the events X, Y, Z that occur. Find the
probability that exactly two events occur.
Solution: Let P(X) = a, P(Y) = b, P(Z) = c.
Exactly two events occur in the cases XYZ’, XY’Z and X’YZ, so
P(n = 2) = ab(1 − c) + a(1 − b)c + (1 − a)bc
= ab − abc + ac − abc + bc − abc
= ab + ac + bc − 3abc
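The closed form can be checked against a direct enumeration of the 8 occur / not-occur combinations. This is a small sketch; the function name `prob_exactly_two` and the sample values of a, b, c are illustrative.

```python
from itertools import product


def prob_exactly_two(a, b, c):
    """P(exactly two of three independent events occur), by enumeration."""
    total = 0.0
    for occurs in product([True, False], repeat=3):
        if sum(occurs) != 2:
            continue
        weight = 1.0
        for happened, p in zip(occurs, (a, b, c)):
            weight *= p if happened else (1 - p)
        total += weight
    return total


a, b, c = 0.2, 0.5, 0.7
closed_form = a * b + a * c + b * c - 3 * a * b * c
print(prob_exactly_two(a, b, c), closed_form)
```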
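\[ \Gamma\left(\tfrac{3}{2}\right) = \tfrac{1}{2}\sqrt{\pi}; \quad \Gamma\left(\tfrac{5}{2}\right) = \tfrac{3}{4}\sqrt{\pi}; \quad \Gamma\left(\tfrac{7}{2}\right) = \tfrac{15}{8}\sqrt{\pi}; \quad \Gamma\left(\tfrac{9}{2}\right) = \tfrac{105}{16}\sqrt{\pi} \qquad (1) \]
Negative Arguments: The Gamma function is not defined for non-positive integers, and
it has poles at these values. However, it can be related via the reflection formula:
\[ \Gamma(z)\,\Gamma(1 - z) = \frac{\pi}{\sin(\pi z)} \]
It allows for the computation of the factorial of any positive real number instead of just integers.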
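The half-integer values and the reflection formula can be checked numerically with the standard library function `math.gamma`. The test point z = 0.3 for the reflection formula is an arbitrary illustrative choice.

```python
import math

# Half-integer values: Gamma(3/2), Gamma(5/2), Gamma(7/2), Gamma(9/2).
half_integer_values = {
    1.5: math.sqrt(math.pi) / 2,
    2.5: 3 * math.sqrt(math.pi) / 4,
    3.5: 15 * math.sqrt(math.pi) / 8,
    4.5: 105 * math.sqrt(math.pi) / 16,
}
for z, expected in half_integer_values.items():
    print(z, math.gamma(z), expected)

# Factorial connection: Gamma(n + 1) = n! for non-negative integers n.
print(math.gamma(5), math.factorial(4))

# Reflection formula at a non-integer point.
z = 0.3
print(math.gamma(z) * math.gamma(1 - z), math.pi / math.sin(math.pi * z))
```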
A/B/C
where:
A = Arrival process
B = Service time distribution
C = Number of servers
Explanation of M/M/1:
A queue with Poisson arrivals, exponential service time, and a single server.
2. Markov Property
The system satisfies the Markov property:
3. What is a Server?
A server is the entity that provides service to customers in a queueing system.
Definition:
4. Examples of Servers
System Server
Bank Teller
Hospital Doctor
Call center Operator
Computer system CPU
Restaurant Waiter
Elevator system Elevator
5. Service Rate
The service rate is denoted by:
6. Types of Servers
(a) Single Server System
Machine
Software system
Automated process
Queueing Theory – Questions and Answers
Q1. What is a queueing system? Explain its components.
Answer:
A queueing system is a mathematical model used to study waiting lines.
Components:
(b) Average number in system:
\[ L = \frac{\lambda}{\mu - \lambda} = \frac{4}{6 - 4} = 2 \]
Telecommunication systems
Traffic control
Computer networks
Q6. What is the condition for stability of a queueing system?
\[ \rho = \frac{\lambda}{\mu} < 1 \]
If ρ ≥ 1, the system becomes unstable.
(c) Find the average time spent in the system (W ).
Answer:
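\[ \rho = \frac{\lambda}{\mu} = \frac{5}{8} = 0.625 \]
\[ L = \frac{\lambda}{\mu - \lambda} = \frac{5}{8 - 5} = \frac{5}{3} \approx 1.667 \]
\[ W = \frac{1}{\mu - \lambda} = \frac{1}{3} \approx 0.333 \text{ hours} \]
\[ L_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{25}{8 \times 3} = \frac{25}{24} \approx 1.042 \]
\[ W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{5}{8 \times 3} = \frac{5}{24} \approx 0.208 \text{ hours} \]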
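The M/M/1 formulas used in this answer can be collected into one helper. This is a minimal sketch, not part of the original notes; the function name `mm1_metrics` and the dictionary keys are illustrative choices.

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 measures for arrival rate lam and service rate mu."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu (rho < 1)")
    return {
        "rho": lam / mu,                     # server utilisation
        "L": lam / (mu - lam),               # average number in system
        "W": 1 / (mu - lam),                 # average time in system
        "Lq": lam ** 2 / (mu * (mu - lam)),  # average number in queue
        "Wq": lam / (mu * (mu - lam)),       # average waiting time in queue
    }

example = mm1_metrics(5, 8)  # lam = 5 per hour, mu = 8 per hour
for name, value in example.items():
    print(f"{name} = {value:.3f}")
```

Little's Law L = λW holds for the returned values, which gives a quick consistency check.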
Average time spent in the system:
\[ W = \frac{1}{\mu - \lambda} \]
Using Little’s Law:
L = λW
The system remains stable only when the service rate exceeds the arrival rate.
As λ approaches µ, the queue length and waiting time increase rapidly.
Qu 1: If X is a binomial random variable, then the variance of X is?
Qu 2: If Y is a Poisson random variable, then the variance of Y is?
Qu 3: If X is a binomial random variable, the probability of X = n is?
Qu 4: If Y is a Poisson random variable, the probability of Y = 1 is?
Qu 5: The probability of getting a head when a biased coin is tossed is 0.6. What is the probability of getting three heads when this coin is tossed 5 times?
Qu 6: Consider a Poisson random variable with λ = 3 per hour. The probability of no arrival in an hour is?
Qu 7: For a standard normal variable, the area to the left of Z = 0 is?
Qu 8: A random variable following a normal distribution has X = 2µ and µ = 2σ. Find the value of z.
Qu 9: A random variable following a normal distribution has µ = 2σ. What should X be so that z = 1?
Qu 10: In a normal distribution, P(Z < 1) is?
Qu 11: Imagine a biased coin, where the probability of getting heads is 0.7 and tails is 0.3, is tossed once. Is it a Bernoulli distribution?
Qu 12: The number of students in a class follows which type of distribution?
Qu 13: If a cricket match is held between India and Kenya, the probability of India winning is 0.6. Consider the result to be either a win or a loss (i.e., there is no tie or cancellation). Suppose there is a 5-match series between India and Kenya; what is the probability of India winning the series?
Qu 14: Damaged products arrive at the production area at an average rate of one every 20 minutes. Assuming a Poisson process, what is the probability of 5 defective products arriving in one hour?
Qu 15: A die has four sides pasted red and two sides pasted green. It is rolled six times. Find the probability of getting four red and two green.
Qu 16: Comment on the shape of the normal distribution curve: right-skewed, left-skewed, or symmetric?
Qu 17: If only the mean of a normal distribution changes:
Qu 18: If only the standard deviation of a normal distribution changes:
Qu 19:
Qu 20:
Qu 21:
Qu 22:
2 Statistics
Statistics is a branch of mathematics that deals with the collection, analysis, interpretation,
presentation, and organization of data. It provides methods and techniques to summarize and
make sense of data, allowing for informed decision-making based on empirical evidence.
can significantly impact the analysis and conclusions.
Data Analysis: Statistical analysis involves techniques to explore, describe, and under-
stand the data. This can include descriptive statistics (summarizing and organizing data)
and inferential statistics (making predictions or generalizations about a population based
on sample data).
Inferential Statistics: This branch uses sample data to make inferences about a larger population. It involves hypothesis testing, confidence intervals, regression analysis, and other methods that allow statisticians to draw conclusions beyond the immediate data.
Statistical Models: Statistics often involves creating models that represent real-world processes. These models can help in understanding relationships between variables and predicting outcomes.
Median: The middle value when the data is ordered. If the number of observations is odd,
the median is the middle value; if even, it is the average of the two middle values.
Mode: The most frequently occurring value(s) in a dataset. A dataset may have one
mode (unimodal), more than one mode (multimodal), or no mode at all.
Variance: The average of the squared differences from the mean. For a population, it is calculated as:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
For a sample, it is calculated as:
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
where µ is the population mean, X̄ is the sample mean, N is the population size, and n is the sample size.
Standard Deviation: The square root of the variance, representing the average distance from the mean: \( \sigma = \sqrt{\sigma^2} \) (population standard deviation), \( s = \sqrt{s^2} \) (sample standard deviation).
Interquartile Range (IQR): Measures the spread of the middle 50% of the data. Calculated
as the difference between the third quartile (Q3) and the first quartile (Q1): IQR =
Q3 − Q1
Kurtosis: Measures the "tailedness" of the distribution. High kurtosis indicates heavier tails (more data located far from the mean); low kurtosis indicates lighter tails.
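Several of the measures above are available in Python's standard `statistics` module. The sketch below uses a small made-up dataset (hypothetical values, not from the notes) and shows the population and sample versions side by side.

```python
import statistics

# Hypothetical dataset used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

median = statistics.median(data)         # middle value of the ordered data
modes = statistics.multimode(data)       # list of most frequent value(s)
pop_var = statistics.pvariance(data)     # divides by N
sample_var = statistics.variance(data)   # divides by n - 1
pop_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

# Quartiles (default "exclusive" method) and the interquartile range.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(median, modes, pop_var, sample_var, pop_sd, sample_sd, iqr)
```

Note that quartile conventions differ between textbooks and software, so the IQR of a small dataset may not match a hand calculation exactly.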
3 Random number
If we are interested in measuring any characteristic of an experiment, we must associate a number
with each outcome. For instance, we can assign the values 1 and 0 to a perfect and a defective
manufactured item, and to a sequence of heads and tails we can assign the number of heads
observed. There are innumerable examples of random variables. For instance, age, weight,
height, income, number of children and number of cars are possible random variables associated
with a randomly chosen person. The numbers of balls of a given color are random variables
associated with a random selection of balls from an urn, or the sum of the outcomes is a
random variable associated with the experiment of tossing two dice. Usually capital letters
such as X, Y and Z are used to denote random variables. However, when speaking of the value
of these variables, in general, lowercase letters such as x, y and z are used. Random variables
provide a rigorous framework for quantifying and analyzing the outcomes of complex, real-world
experiments and processes across multiple domains.
3.1 Discrete Random Variables
Given a random variable X, if the range space RX is finite or countably infinite, X is called a
discrete random variable.
The range space of such a variable can be written as RX = {x1, x2, . . . , xn, . . .}; in the finite
case, the list of values terminates and in the countably infinite case it continues indefinitely.
We associate a probability p(xi) with each element xi of RX, such that p(xi) ≥ 0 for all i and
p(x1) + p(x2) + . . . = 1.
A discrete random variable is characterized by a probability mass function (PMF), which provides
the probability of each possible value.
Solution: We want the distribution of the number of "selfish actions" obtained when two actions are drawn from a mix of 4 "selfish actions" and 16 "kind actions". The total number of actions is 20.
When drawing two actions, the number X of "selfish actions" drawn can be 0, 1 or 2. We use combinations to calculate the probability of each of these outcomes.
Total ways to choose 2 actions from 20:
\[ \binom{20}{2} = \frac{20 \times 19}{2} = 190 \]
Probability of drawing 0 "selfish actions" (both actions drawn are kind):
\[ P(X = 0) = \frac{\binom{4}{0}\binom{16}{2}}{\binom{20}{2}} = \frac{1 \cdot 120}{190} = \frac{12}{19} \approx 0.632 \]
Probability of drawing 1 "selfish action" (1 selfish action and 1 kind action):
\[ P(X = 1) = \frac{\binom{4}{1}\binom{16}{1}}{\binom{20}{2}} = \frac{4 \cdot 16}{190} = \frac{64}{190} = \frac{32}{95} \approx 0.337 \]
Probability of drawing 2 "selfish actions":
\[ P(X = 2) = \frac{\binom{4}{2}\binom{16}{0}}{\binom{20}{2}} = \frac{6 \cdot 1}{190} = \frac{3}{95} \approx 0.032 \]
Summary of the Probability Distribution: The probability distribution of the number of "selfish actions" drawn when picking 2 actions is:
P(X = 0) = 12/19 ≈ 0.632, P(X = 1) = 32/95 ≈ 0.337, P(X = 2) = 3/95 ≈ 0.032
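The three probabilities follow the hypergeometric pattern (drawing without replacement). A minimal sketch with exact fractions, where the helper name `hypergeom_pmf` is an illustrative choice rather than anything defined in the notes:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, successes, failures, draws):
    """P(exactly k successes) when `draws` items are taken without replacement
    from a pool of `successes` + `failures` items."""
    return Fraction(
        comb(successes, k) * comb(failures, draws - k),
        comb(successes + failures, draws),
    )

# 4 "selfish actions", 16 "kind actions", 2 draws.
pmf = {k: hypergeom_pmf(k, 4, 16, 2) for k in range(3)}
for k, p in pmf.items():
    print(k, p, round(float(p), 3))
```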
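Example 2: Let X be a random variable taking values 1, 2 and 3 with probabilities 3/15, 7/15 and 5/15. Find its distribution function and show the distribution function diagrammatically.
Solution: \( F_X(1) = P(X \le 1) = P(X = 1) = 3/15 \). Similarly, \( F_X(2) = P(X \le 2) = 3/15 + 7/15 = 10/15 \) and \( F_X(3) = P(X \le 3) = 1 \).
Another example is a fair die, whose PMF is
\[ p(x) = \begin{cases} \frac{1}{6} & \text{if } x = 1, 2, 3, 4, 5, 6 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]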
Poisson Distribution: For counting the number of events occurring within a fixed interval
of time or space.
Geometric Distribution: For modeling the number of trials until the first success occurs.
For a discrete random variable X, the CDF, denoted \( F_X(x) \), is defined mathematically as:
\[ F_X(x) = P(X \le x) \]
Where:
Fixed Number of Trials (n): The binomial distribution is defined for a fixed number of trials, denoted n. Each trial is an independent event.
Two Possible Outcomes: Each trial results in one of two possible outcomes: "success" or "failure". The probability of success is denoted p, and the probability of failure is denoted q, where q = 1 − p.
Constant Probability of Success (p): The probability of success remains constant across all trials. For instance, if you are flipping a fair coin, the probability of getting heads (considered a success) remains 0.5 in each trial.
Independence of Trials: The outcome of one trial does not affect the outcomes of others. This independence is crucial for applying the binomial distribution.
Probability Mass Function (PMF): The probability of obtaining exactly k successes in n trials is given by the probability mass function:
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \]
where \( \binom{n}{k} \) is the binomial coefficient, representing the number of ways to choose k successes out of n trials.
Mean and Variance: The mean (expected value) of the binomial distribution is given by:
\[ \mu = np \]
The variance of the binomial distribution is given by:
\[ \sigma^2 = np(1 - p) \]
The standard deviation is thus \( \sigma = \sqrt{np(1 - p)} \).
Shape of the Distribution: The shape of the binomial distribution depends on the values of n and p. For p = 0.5, the distribution is symmetric.
For p < 0.5, the distribution is skewed to the right.
For p > 0.5, the distribution is skewed to the left.
Example 3: Studies show that color blindness affects 8% of men. A random sample of 10 men is taken. Find the probability that (1) all 10 men are color blind, (2) no men are color blind, (3) exactly 2 men are color blind, and (4) at least 2 men are color blind.
Solution: The probability of a man being color blind is p = 0.08 (which is 8%), so the probability of a man not being color blind is 1 − p = 0.92. With n = 10,
\[ P(X = k) = \binom{10}{k} (0.08)^k (0.92)^{10-k} \]
Case 1: All 10 men are color blind
\[ P(X = 10) = \binom{10}{10} (0.08)^{10} (0.92)^0 = (0.08)^{10} = 1.073741824 \times 10^{-11} \]
Case 2: No men are color blind
\[ P(X = 0) = \binom{10}{0} (0.08)^0 (0.92)^{10} = (0.92)^{10} \approx 0.4344 \]
Case 3: Exactly 2 men are color blind
\[ P(X = 2) = \binom{10}{2} (0.08)^2 (0.92)^8 = 45 \times 0.0064 \times 0.5132 \approx 0.1478 \]
Case 4: At least 2 men are color blind
\[ P(X \ge 2) = 1 - P(X < 2), \qquad P(X < 2) = P(X = 0) + P(X = 1) \]
\[ P(X = 1) = \binom{10}{1} (0.08)^1 (0.92)^9 = 10 \times 0.08 \times 0.4722 \approx 0.3777 \]
\[ P(X < 2) = P(X = 0) + P(X = 1) \approx 0.4344 + 0.3777 = 0.8121 \]
\[ P(X \ge 2) = 1 - P(X < 2) \approx 1 - 0.8121 = 0.1879 \]
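The four cases can be reproduced with a few lines of Python. This is a sketch under the stated parameters n = 10 and p = 0.08; the function name `binom_pmf` is an illustrative choice.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.08
p_all = binom_pmf(10, n, p)                                      # Case 1
p_none = binom_pmf(0, n, p)                                      # Case 2
p_two = binom_pmf(2, n, p)                                       # Case 3
p_at_least_two = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)     # Case 4

print(p_all, p_none, p_two, p_at_least_two)
```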
Applications:
The Poisson distribution is used in various real-world situations, such as:
The number of phone calls received by a call center in an hour.
The number of decay events per unit time from a radioactive source.
The number of printing errors on a single page.
Traffic flow, such as the number of cars passing through a toll booth in an hour.
Mean and Variance:
Non-negativity: The pdf must be non-negative for all possible values of the random variable: f(x) ≥ 0 for all x.
Normalization: The total area under the pdf over the entire range of the random variable must be equal to 1. This represents the certainty that some value within the range will occur:
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]
Probability of an Interval: The probability that the random variable X falls within an interval [a, b] is given by the integral of the pdf over that interval:
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]
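Example 4: The function given is \( f(x) = \frac{4}{65} x^3 \) for 2 ≤ x ≤ 3. Elsewhere, f(x) = 0. Show that f(x) is a pdf and determine the probability P(1.5 ≤ X ≤ 2.5).
Solution: f(x) ≥ 0 everywhere, and
\[ \int_2^3 \frac{4}{65} x^3\,dx = \frac{4}{65}\left[\frac{x^4}{4}\right]_2^3 = \frac{81 - 16}{65} = 1 \]
so f(x) is a pdf. Since f(x) = 0 for x < 2,
\[ P(1.5 \le X \le 2.5) = \int_2^{2.5} \frac{4}{65} x^3\,dx = \frac{4}{65}\left[\frac{x^4}{4}\right]_2^{2.5} \approx 0.355 \]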
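Both integrals can be checked exactly using the antiderivative x⁴/65 of the density. A minimal sketch, assuming only the density stated in the example; `interval_prob` is a hypothetical helper name.

```python
from fractions import Fraction

LOW, HIGH = Fraction(2), Fraction(3)  # support of the density

def interval_prob(lo, hi):
    """P(lo <= X <= hi) for f(x) = (4/65) x^3 on [2, 3], zero elsewhere."""
    lo = min(max(Fraction(lo), LOW), HIGH)  # clip to the support,
    hi = min(max(Fraction(hi), LOW), HIGH)  # where the density is non-zero
    return (hi ** 4 - lo ** 4) / 65         # antiderivative is x^4 / 65

total_area = interval_prob(2, 3)
p_example = interval_prob(Fraction(3, 2), Fraction(5, 2))
print(total_area, p_example, float(p_example))
```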
If the pdf of a random variable is given, we can identify the sample space as the region of
the real axis where the pdf has positive values.
Uniformly distributed continuous random variable: A uniformly distributed continuous random variable is characterized by having an equal probability of taking any value within a specified range. This type of distribution is defined by its constant probability density function (pdf) over a particular interval, making it one of the simplest forms of continuous probability distributions.
Key Characteristics of Uniform Distribution
Uniformity: Within the specified interval [a, b], every outcome is equally likely. Outside this interval, the probability is zero. The probability density is constant within the domain.
Probability Density Function (pdf): For a uniform distribution over the interval [a, b], the pdf is given by:
\[ f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b \]
so the height of the density is \( \frac{1}{b - a} \) and the total probability is
\[ P(a \le X \le b) = (b - a) \cdot \frac{1}{b - a} = 1 \]
Mean or expected value:
\[ \text{mean} = \text{median} = \mu = \frac{a + b}{2} \]
Standard deviation:
\[ \sigma = \sqrt{\frac{(b - a)^2}{12}} \]
Probability of a sub-interval [c, d] of [a, b]:
\[ P(c \le X \le d) = \frac{d - c}{b - a} \]
has no prior information about it. The phase of a sinusoidal signal is a crucial parameter in
determining its precise position or timing within one complete cycle of the waveform. Mathe-
matically, a sinusoidal signal can be expressed as ( A cos(ωt + ϕ) ). The phase of a sinusoid is
typically measured in radians, and a full cycle corresponds to (2π) radians. While the transmit-
ter may know the precise phase (ϕ), the receiver often does not have prior information about
the phase due to channel impairments or lack of synchronization. Without any additional infor-
mation, all phases between (0) and (2π) are equally possible at the receiver. In such a scenario,
the phase at the receiver can be modeled as a random variable that is uniformly distributed
over [0, 2π). A uniform distribution over [0, 2π) implies that any phase within this interval is
equally likely. This reflects a state of maximum uncertainty in the phase information. The probability density function (PDF) for a uniformly distributed random variable Φ over [0, 2π) is
\[ f(\phi) = \frac{1}{2\pi} \quad \text{for } 0 \le \phi < 2\pi \]
The height of the PDF is constant at \( \frac{1}{2\pi} \), ensuring that the total probability over the interval integrates to 1.
Modeling the phase as a uniform random variable is mathematically convenient and realistic
in cases where the receiver does not have phase information. This assumption facilitates the
analysis and design of communication systems, particularly in evaluating system performance,
designing demodulation schemes, and studying the effects of phase uncertainty.
Probability Density Function (PDF): The PDF f(x) for a uniform distribution is given by:
\[ f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
This indicates that the probability density is constant across the interval ([a, b]) and zero
outside this interval.
The cumulative distribution function F(x) is defined as the probability that the random variable X takes a value less than or equal to x: F(x) = P(X ≤ x).
To derive the CDF from the PDF, we integrate the PDF over the range [a, x], considering different cases based on the value of x relative to a and b.
Case 1: x < a. For values of x less than a, the CDF is:
\[ F(x) = P(X \le x) = 0 \quad \text{(since the distribution starts at } a\text{)} \]
Case 2: a ≤ x ≤ b. For values of x within the interval [a, b], we integrate the PDF from a to x:
\[ F(x) = P(X \le x) = \int_a^x f(t)\,dt \]
Substituting the PDF:
\[ F(x) = \int_a^x \frac{1}{b - a}\,dt \]
Calculating the integral:
\[ F(x) = \frac{1}{b - a}\,[t]_a^x = \frac{1}{b - a}(x - a) \]
Thus, simplifying:
\[ F(x) = \frac{x - a}{b - a} \quad \text{for } a \le x \le b \]
Case 3: x > b. For values greater than b, the CDF is:
\[ F(x) = P(X \le x) = 1 \quad \text{(since all of the distribution is covered)} \]
Putting all three cases together, the cumulative distribution function F(x) for a uniform distribution on the interval [a, b] can be expressed as:
\[ F(x) = \begin{cases} 0 & \text{if } x < a \\ \frac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b \end{cases} \tag{4} \]
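The piecewise CDF translates directly into code. The sketch below is illustrative only; the function names and the interval [0, 4] are hypothetical choices, not values from the notes.

```python
def uniform_cdf(x, a, b):
    """CDF of a uniform distribution on [a, b], following the three cases."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d), computed as F(d) - F(c)."""
    return uniform_cdf(d, a, b) - uniform_cdf(c, a, b)

# Hypothetical interval [0, 4] for illustration.
print(uniform_cdf(-1, 0, 4), uniform_cdf(1, 0, 4), uniform_cdf(9, 0, 4))
print(uniform_interval_prob(1, 3, 0, 4))
```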
The variance is given by:
\[ \mathrm{Var}(X) = \frac{1}{\lambda^2} \]
Example
Suppose a call center receives phone calls randomly at an average rate of 3 calls per hour. Let X be the time (in hours) until the next call.
Here, λ = 3 calls per hour. The PDF is given by:
\[ f(x; 3) = 3e^{-3x} \quad \text{for } x \ge 0 \]
If we want to find the probability that the time until the next call is more than 30 minutes (i.e., x > 0.5 hours), we can calculate:
\[ P(X > 0.5) = 1 - F(0.5) = e^{-3 \times 0.5} = e^{-1.5} \approx 0.2231 \]
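The same tail probability can be computed from the exponential CDF F(x) = 1 − e^(−λx). A minimal sketch; `exp_cdf` is an illustrative helper name.

```python
import math

def exp_cdf(x, lam):
    """CDF of an exponential distribution with rate lam."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 3                              # calls per hour
p_wait_over_30_min = 1 - exp_cdf(0.5, lam)
print(round(p_wait_over_30_min, 4))
```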
Comparison between the distributions: The binomial distribution deals with the number of successes in a fixed number of independent trials, and the geometric distribution deals with the number of trials between successes in a series of independent trials. Likewise, the Poisson distribution deals with the number of occurrences in a fixed period of time, and the exponential distribution deals with the time between occurrences of successive events as time flows by continuously.
A continuous random variable is a random variable which can take any value in some interval. It is characterized by its probability density function, a graph which has a total area of 1 beneath it: the probability of the random variable taking values in any interval is simply the area under the curve over that interval (and the probability of the random variable taking any one specific value is 0).
The exponential distribution: Consider the time between successive incoming calls at a switchboard, or between successive patrons entering a store. These "interarrival" times are typically exponentially distributed. If the mean interarrival time is 1/λ (so λ is the mean arrival rate per unit time), then the variance will be 1/λ² (and the standard deviation will be 1/λ). The graph below displays the exponential density function when λ = 1. Generally, if X is exponentially distributed, then Pr(s < X ≤ t) = e^(−λs) − e^(−λt) (where e ≈ 2.71828). The exponential distribution fits the examples cited above because it is the only continuous distribution with the "lack-of-memory" property:
If X is exponentially distributed, then Pr(X ≤ s + t|X > s) = Pr(X ≤ t). (After waiting a
minute without a call, the probability of a call arriving in the next two minutes is the same
as was the probability (a minute ago) of getting a call in the following two minutes. As you
continue to wait, the chance of something happening “soon” neither increases nor decreases.)
Note that, among discrete distributions, the geometric distribution is the only one with the
lack-of-memory property; indeed, the exponential and geometric distributions are analogues of
one another.
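The lack-of-memory property can be illustrated numerically. In the sketch below the rate λ = 3 per hour and the one-minute and two-minute durations are hypothetical values chosen to mirror the waiting example in the text.

```python
import math

def exp_survival(x, lam):
    """P(X > x) for an exponential distribution with rate lam."""
    return math.exp(-lam * x)

lam = 3.0        # hypothetical rate, events per hour
s = 1 / 60       # already waited one minute (in hours)
t = 2 / 60       # look at the next two minutes

# P(X <= s + t | X > s) = P(s < X <= s + t) / P(X > s)
conditional = (exp_survival(s, lam) - exp_survival(s + t, lam)) / exp_survival(s, lam)
# P(X <= t)
unconditional = 1 - exp_survival(t, lam)
print(conditional, unconditional)
```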
Example 5: A bus is uniformly late between 2 and 10 minutes. How long can you expect to wait? With what standard deviation? If it is more than 7 minutes late, you will be late for work. What is the probability of you being late?
Solution: a = 2, b = 10
\[ \mu = \frac{2 + 10}{2} = 6 \]
\[ \sigma = \sqrt{\frac{(10 - 2)^2}{12}} \approx 2.31 \]
\[ P(7 \le X \le 10) = \frac{10 - 7}{10 - 2} = 0.375 \]
Example 6: Let X be the lifetime of a certain electronic component (in hours). Suppose that the pdf is given by
\[ f(x) = \frac{C}{x^2} \quad \text{for } 1000 \le x \le 2000 \]
Elsewhere, f(x) = 0. The pdf implies that we are assigning probability zero to the events X < 1,000 and X > 2,000.
Solution: The pdf must integrate to 1. Since
\[ \int_{1000}^{2000} \frac{1}{x^2}\,dx = \frac{1}{1000} - \frac{1}{2000} = \frac{1}{2000} \]
we need C/2000 = 1, so
\[ C = 2000 \]
The constant C is called a normalizing constant.
Continuous-Time Process: The index set is continuous, which allows variables at any
instant within a time interval. Example: Temperature readings over time.
Markov Process: A process where the future state depends only on the current state, not
on the sequence of events that preceded it.
4 Multiple Random Variables
We have so far only considered one-dimensional random variables, i.e., we assumed that the
outcome of a random experiment could be represented as a single number. However, in many
practical situations there are several characteristics associated with the elements of a popula-
tion or a sample. For example, a physician is interested in several characteristics of a patient,
e.g., age, weight, blood pressure, blood sugar values, etc. In evaluating the competitiveness of
the countries of a certain community, several characteristics are interesting, as, e.g., an index of
unemployment, stock prices, exchange values, etc. A bidimensional random variable gives you
a way to study how two random variables relate to each other. By analyzing them together,
you can assess correlations, patterns, and dependencies that single random variables might
miss. The range space—whether discrete or continuous—provides a visual representation of all
possible outcomes for these two variables on a coordinate plane. [2]
4.1 Example:
Let X, Y denote the number of sons and daughters of a family, randomly chosen from a certain
district of a city. The fictive probability distribution is given in the following table
Y \ X      0     1     2     3     4     P(Y=i)
0          0.02  0.02  0.03  0.08  0.05  0.20
1          0.05  0.10  0.15  0.15  0.05  0.50
2          0.05  0.05  0.10  0.05  0.05  0.30
P(X=j)     0.12  0.17  0.28  0.28  0.15  1
0.02+0.03+0.08+0.05+0.15+0.15+0.05+0.05+0.05=0.63.
Similarly, the probability of the event B=(X=Y) is obtained as P(B)=p(0,0)+p(1,1)+p(2,2)=
0.02+0.1+0.1= 0.22.
Finally, for C=(X+Y ≥ 4)we get
P(C)=p(2,2)+p(3,1)+p(3,2)+p(4,0)+p(4,1)+p(4,2) =0.1+0.15+0.05+0.05+0.05+0.05=0.45.
In order to calculate a probability depending on only one of the variables X, Y, we need only
the values of the margins of the table. For example, P(Y ≤ 1)=P(Y=0)+P(Y=1)=0.20+0.50=0.70.
One can also calculate conditional probabilities of two events depending on X and Y. For ex-
ample, the probability that the family has at least three sons if it has no daughter, is:
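From the table, this conditional probability works out as P(X ≥ 3 | Y = 0) = P(X ≥ 3, Y = 0) / P(Y = 0) = (0.08 + 0.05) / 0.20 = 0.65. The event probabilities in this example can also be computed by summing the joint table over the relevant cells, as in the sketch below; the dictionary layout and the helper `prob` are illustrative choices.

```python
# Joint pmf from the table: rows are Y = 0, 1, 2 (daughters),
# columns are X = 0..4 (sons).
table = {
    0: [0.02, 0.02, 0.03, 0.08, 0.05],
    1: [0.05, 0.10, 0.15, 0.15, 0.05],
    2: [0.05, 0.05, 0.10, 0.05, 0.05],
}
joint = {(x, y): p for y, row in table.items() for x, p in enumerate(row)}

def prob(event):
    """Sum the joint pmf over all (x, y) pairs for which event(x, y) is true."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

p_b = prob(lambda x, y: x == y)         # P(X = Y)
p_c = prob(lambda x, y: x + y >= 4)     # P(X + Y >= 4)
p_y_le_1 = prob(lambda x, y: y <= 1)    # P(Y <= 1), a marginal probability
# Conditional probability P(X >= 3 | Y = 0)
p_cond = prob(lambda x, y: x >= 3 and y == 0) / prob(lambda x, y: y == 0)

print(round(p_b, 2), round(p_c, 2), round(p_y_le_1, 2), round(p_cond, 2))
```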
Non-Negativity:
The joint density function \( f_{X,Y}(x, y) \) must be non-negative for all values of x and y. That is,
\[ f_{X,Y}(x, y) \ge 0 \quad \text{for all } x, y \]
Normalization:
The total probability over the entire space must equal 1. This is expressed mathematically as:
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1 \]
This means that when you integrate the joint density function over the entire range of both variables, the result must be 1.
Marginal Densities:
The marginal density functions for each variable can be obtained by integrating the joint density function with respect to the other variable. The marginal density functions \( f_X(x) \) and \( f_Y(y) \) are given by:
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \]
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \]
Independence:
If the random variables ( X ) and ( Y ) are independent, the joint density function can
be expressed as the product of the marginal densities: [fX,Y (x, y) = fX (x) · fY (y) ] If
this property holds, it indicates that knowledge of one variable does not provide any
information about the other.
4.3 Example:
An urn contains 3 balls numbered 1, 2, 3 and two balls are drawn in succession. If X is the number on the first ball drawn and Y is the number on the second ball, find the probability distribution of (X, Y). Case 1: The balls are replaced each time. Case 2: The balls are not replaced.
Solution: Case 1: The balls are replaced after each draw. In this case, each draw is independent of the previous one. Since there are 3 balls and each ball can be drawn each time, the possible pairs (X, Y) are:
(1, 1) (1, 2) (1, 3) (2, 1) (2, 2) (2, 3) (3, 1) (3, 2) (3, 3)
The total number of outcomes is 3 × 3 = 9, and
\[ P(X = x, Y = y) = P(X = x)\,P(Y = y) = \frac{1}{9} \quad \text{for each } (x, y) \]
Case 2: The balls are not replaced. The second ball must differ from the first, so the possible pairs are those with x ≠ y. Each of these outcomes is equally likely, and the total number of ways to draw two balls in order from three is 3 × 2 = 6.
Thus, the probability of each outcome is:
\[ P(X = x, Y = y) = \frac{1}{6} \quad \text{for each } (x, y) \text{ with } x \ne y \]
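Both sample spaces are small enough to enumerate directly. A minimal sketch using `itertools`, with variable names chosen for illustration:

```python
from fractions import Fraction
from itertools import permutations, product

balls = [1, 2, 3]

# Case 1: with replacement, every ordered pair (x, y) is possible.
with_replacement = list(product(balls, repeat=2))
pmf_with = {pair: Fraction(1, len(with_replacement)) for pair in with_replacement}

# Case 2: without replacement, only ordered pairs with x != y are possible.
without_replacement = list(permutations(balls, 2))
pmf_without = {pair: Fraction(1, len(without_replacement)) for pair in without_replacement}

print(len(pmf_with), pmf_with[(1, 1)])
print(len(pmf_without), pmf_without[(1, 2)], (1, 1) in pmf_without)
```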
4.4 Example :
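Verify whether \( f_{XY}(x, y) = x^2 + \frac{xy}{3} \) is a valid two-dimensional probability density function (pdf) over the specified region 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 and zero otherwise.
Solution: The function is non-negative on the region, and
\[ \int_0^1 \int_0^2 f_{XY}(x, y)\,dy\,dx = \int_0^1 \int_0^2 \left( x^2 + \frac{xy}{3} \right) dy\,dx = \int_0^1 \left( 2x^2 + \frac{2x}{3} \right) dx = \frac{2}{3} + \frac{1}{3} = 1 \]
Hence \( f_{XY}(x, y) = x^2 + \frac{xy}{3} \) is a valid two-dimensional probability density function (pdf) over the specified region.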
particularly in the context of multiple continuous random variables.
The joint probability density function fXY (x, y) of two continuous random variables X and
Y describes the likelihood of these two random variables occurring simultaneously at specific
values x and y.
If X and Y are jointly continuous random variables with a joint PDF given by \( f_{XY}(x, y) = 4xy \) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, this function describes the joint distribution of X and Y (the constant 4 makes the total probability equal to 1).
The marginal probability density function for a continuous random variable, by contrast, provides the probability density of that variable without consideration of the other variable(s). It is derived from the joint PDF by integrating out the other variable(s).
From the above joint PDF \( f_{XY}(x, y) = 4xy \), the marginal PDF for X can be found by integrating over y:
\[ f_X(x) = \int_0^1 f_{XY}(x, y)\,dy = \int_0^1 4xy\,dy = 4x \left[ \frac{y^2}{2} \right]_0^1 = 2x \]
4.7 Example :
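Find the marginal distribution functions of X and Y as well as the joint probability density function (pdf) from the joint distribution function \( F_{XY}(x, y) = \frac{xy}{16}(x + y) \) for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2.
To find the marginal distribution function \( F_X(x) \), replace y with its upper limit 2 in \( F_{XY}(x, y) \):
\[ F_X(x) = \frac{x \cdot 2}{16}(x + 2) = \frac{x}{8}(x + 2) \quad \text{for } 0 \le x \le 2 \]
Similarly, to find the marginal distribution function \( F_Y(y) \), replace x with its upper limit 2 in \( F_{XY}(x, y) \):
\[ F_Y(y) = \frac{y \cdot 2}{16}(y + 2) = \frac{y}{8}(y + 2) \quad \text{for } 0 \le y \le 2 \]
The joint pdf is obtained by differentiating the joint distribution function:
\[ f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y} = \frac{\partial^2}{\partial x\,\partial y} \left( \frac{x^2 y + x y^2}{16} \right) = \frac{x + y}{8} \quad \text{for } 0 \le x \le 2,\ 0 \le y \le 2 \]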
4.7.2 Example :
The joint probability distribution of X1 and X2 is given by
\[ P(X_1 = x_1, X_2 = x_2) = \frac{1}{27}(x_1 + 2x_2) \]
where x1 = 0, 1, 2 and x2 = 0, 1, 2. Find the marginal pmfs of X1 and X2.
Solution: Prepare the table of the joint probability distribution. Summing over the other variable gives
\[ P(X_1 = x_1) = \frac{x_1 + 2}{9} \quad \text{for } x_1 = 0, 1, 2 \]
\[ P(X_2 = x_2) = \frac{1 + 2x_2}{9} \quad \text{for } x_2 = 0, 1, 2 \]
The marginal pmf of X1 in tabular form is
4.7.3 Example :
The joint function of a two-dimensional discrete random variable (X, Y) is given by
f(x, y) = c(x² + 2y) for x = 0, 1, 2; y = 1, 2, 3, 4 and 0 elsewhere.
2. P(X = 2, Y = 3)
Solution: We can tabulate the probabilities as follows.
X \ Y    1     2     3     4     Total
0        2c    4c    6c    8c    20c
1        3c    5c    7c    9c    24c
2        6c    8c    10c   12c   36c
Total    11c   17c   23c   29c   80c
Since the probabilities must sum to 1, 80c = 1, hence c = 1/80. Then P(X = 2, Y = 3) = 10c = 10/80 = 1/8.
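The normalizing constant and the tabulated probabilities can be reproduced by summing the unnormalized weights x² + 2y. A minimal sketch with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

xs = [0, 1, 2]
ys = [1, 2, 3, 4]

# Unnormalized weights x^2 + 2y for every (x, y) pair in the table.
weights = {(x, y): x ** 2 + 2 * y for x in xs for y in ys}

# c is fixed by requiring the probabilities to sum to 1.
c = Fraction(1, sum(weights.values()))
joint = {pair: c * w for pair, w in weights.items()}

p_2_3 = joint[(2, 3)]
marginal_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
marginal_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
print(c, p_2_3, marginal_x, marginal_y)
```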
4.7.4 Example :
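Given the joint probability density function \( f_{XY}(x, y) = \frac{x^3 y^3}{16} \) for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2, and \( f_{XY}(x, y) = 0 \) elsewhere, we need to verify that this is a valid joint probability density function and, if desired, find the marginal distributions.
Solution:
\[ \int_0^2 \int_0^2 \frac{x^3 y^3}{16}\,dx\,dy = \int_0^2 \left[ \int_0^2 \frac{x^3 y^3}{16}\,dx \right] dy = \int_0^2 \frac{y^3}{4}\,dy = 1 \]
The marginal probability density function of Y is given by
\[ f_Y(y) = \int_0^2 \frac{x^3 y^3}{16}\,dx = \frac{y^3}{16} \left[ \frac{x^4}{4} \right]_0^2 = \frac{y^3}{16} \cdot \frac{2^4}{4} = \frac{y^3}{16} \cdot 4 = \frac{y^3}{4} \quad \text{for } 0 \le y \le 2 \]
By the symmetry of \( f_{XY} \) in x and y, the marginal probability density function of X is
\[ f_X(x) = \int_0^2 \frac{x^3 y^3}{16}\,dy = \frac{x^3}{16} \left[ \frac{y^4}{4} \right]_0^2 = \frac{x^3}{4} \quad \text{for } 0 \le x \le 2 \]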
4.7.5 Example :
The joint probability density function provided is:
fXY (x, y) = 2 for 0 ≤ x ≤ 1 and 0 ≤ y ≤ x
and fXY (x, y) = 0 otherwise.
Find the marginal pdfs of the random variables ( X ) and ( Y ).
Solution:
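First check that the density integrates to 1:
\[ \int_0^1 \int_0^x f_{XY}(x, y)\,dy\,dx = \int_0^1 \int_0^x 2\,dy\,dx = 2 \int_0^1 [y]_0^x\,dx = 2 \int_0^1 x\,dx = 1 \]
Marginal pdf of X is given by
\[ f_X(x) = \int_0^x 2\,dy = 2x \quad \text{for } 0 \le x \le 1 \]
Marginal pdf of Y is given by
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx = \int_y^1 2\,dx = 2(1 - y) \quad \text{for } 0 \le y \le 1 \]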
4.7.6 Example :
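Prove that the given function \( f_{XY}(x, y) = x + y \) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 (and 0 elsewhere) is a valid probability density function (PDF) and find the marginal pdfs of X and Y.
Solution: The function is non-negative on the region, and
\[ \int_0^1 \int_0^1 (x + y)\,dy\,dx = 1 \]
so it is a valid PDF. Marginal pdf of X is given by
\[ f_X(x) = \int_0^1 f_{XY}(x, y)\,dy = \int_0^1 (x + y)\,dy = \left[ xy + \frac{y^2}{2} \right]_0^1 = x + \frac{1}{2} \quad \text{for } 0 \le x \le 1 \]
By symmetry, the marginal pdf of Y is \( f_Y(y) = y + \frac{1}{2} \) for 0 ≤ y ≤ 1.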
4.7.7 Example :
Suppose X and Y represent the operating lives of the components A and B in years, in a certain
system and their probability density function is given by fXY (x, y) = e−x−y for x ≥ 0, y ≥ 0.
Elsewhere it is 0.
Find the probability of the event that component A has an operating life less than or
equal to 1 year.
Find the probability of the event that component B has an operating life greater than 2
years.
Solution:
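The marginal probability density function of Y is given by:
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx = \int_0^{\infty} e^{-x} e^{-y}\,dx = e^{-y} \int_0^{\infty} e^{-x}\,dx = e^{-y} \quad \text{for } y \ge 0 \]
and, by symmetry, \( f_X(x) = e^{-x} \) for x ≥ 0. Hence
\[ P(X \le 1) = \int_0^1 e^{-x}\,dx = 1 - e^{-1} \approx 0.632 \]
\[ P(Y > 2) = \int_2^{\infty} e^{-y}\,dy = \left[ -e^{-y} \right]_2^{\infty} = e^{-2} \approx 0.135 \]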
4.7.8 Example :
The joint probability density function of two random variables is given by
\( f_{XY}(x, y) = 15e^{-3x-5y} \) for x ≥ 0, y ≥ 0. Elsewhere it is 0. Find the probability that
(i) 1 < X < 2 and 0.2 < Y < 0.3 (ii) X < 2 and Y > 0.2
Find the marginal probability distributions of X and Y.
Solution: (i)
\[ P(1 < X < 2,\ 0.2 < Y < 0.3) = \int_1^2 \int_{0.2}^{0.3} 15e^{-3x-5y}\,dy\,dx = \int_1^2 3e^{-3x}\,dx \int_{0.2}^{0.3} 5e^{-5y}\,dy = (e^{-3} - e^{-6})(e^{-1} - e^{-1.5}) \approx 0.0068 \]
(ii)
\[ P(X < 2,\ Y > 0.2) = 15 \int_0^2 e^{-3x}\,dx \int_{0.2}^{\infty} e^{-5y}\,dy = (1 - e^{-6})\,e^{-1} \approx 0.367 \]
(b)
\[ f_X(x) = \int_0^{\infty} 15e^{-3x} e^{-5y}\,dy = 3e^{-3x} \quad \text{for } x > 0 \text{ and 0 elsewhere} \]
\[ f_Y(y) = \int_0^{\infty} 15e^{-5y} e^{-3x}\,dx = 5e^{-5y} \quad \text{for } y > 0 \text{ and 0 elsewhere} \]
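Because the joint density factorizes into two exponential densities (rates 3 and 5), each rectangle probability is a product of one-dimensional exponential interval probabilities. A short numerical sketch; the helper `exp_interval` is a hypothetical name introduced for illustration.

```python
import math

def exp_interval(rate, lo, hi):
    """P(lo < T < hi) for an exponential variable with the given rate."""
    return math.exp(-rate * lo) - math.exp(-rate * hi)

# (i) P(1 < X < 2, 0.2 < Y < 0.3)
p_i = exp_interval(3, 1, 2) * exp_interval(5, 0.2, 0.3)
# (ii) P(X < 2, Y > 0.2); math.exp(-inf) evaluates to 0.0
p_ii = exp_interval(3, 0, 2) * exp_interval(5, 0.2, math.inf)

print(round(p_i, 4), round(p_ii, 3))
```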
4.7.9 Example :
A two dimensional random variable (X,Y) has the joint density
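\( f_{X,Y}(x, y) = \frac{8}{9} xy \) for 1 ≤ y ≤ 2 and 1 ≤ x ≤ y, and 0 elsewhere.
Find the marginal distributions.
Solution:
\[ f_X(x) = \int_x^2 f_{XY}(x, y)\,dy = \int_x^2 \frac{8}{9} xy\,dy = \frac{4}{9} x (4 - x^2) \quad \text{for } 1 \le x \le 2 \]
\[ f_Y(y) = \int_1^y f_{XY}(x, y)\,dx = \int_1^y \frac{8}{9} xy\,dx = \frac{4}{9} y (y^2 - 1) \quad \text{for } 1 \le y \le 2 \]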
4.7.10 Example :
(X, Y) is a two-dimensional continuous random variable with the following probability distribution.
\( f_{XY}(x, y) = 6e^{-2x-3y} \) for x > 0, y > 0 and 0 elsewhere.
Verify whether X and Y are independent.
Solution: The marginal probability distributions of X and Y are
\[ f_X(x) = \int_0^{\infty} 6e^{-2x-3y}\,dy = 2e^{-2x} \quad \text{for } x > 0 \]
\[ f_Y(y) = \int_0^{\infty} 6e^{-2x-3y}\,dx = 3e^{-3y} \quad \text{for } y > 0 \]
Since \( f_X(x)\,f_Y(y) = 2e^{-2x} \cdot 3e^{-3y} = 6e^{-2x-3y} = f_{XY}(x, y) \), the variables X and Y are independent.
4.7.11 Example :
Consider the joint distribution \( P(X_1 = x_1, X_2 = x_2) = \frac{1}{27}(x_1 + 2x_2) \) for x1 = 0, 1, 2; x2 = 0, 1, 2. (i) Find the probability mass functions of X1 and X2. (ii) Find the conditional probability distribution of X1 given X2 = 2.
Solution: The marginal probability distribution of X1 is given by
\[ P(X_1 = x_1) = \sum_{x_2=0}^{2} P(X_1 = x_1, X_2 = x_2) = \sum_{x_2=0}^{2} \frac{1}{27}(x_1 + 2x_2) = \frac{1}{27}\left[ (x_1 + 0) + (x_1 + 2) + (x_1 + 4) \right] = \frac{3x_1 + 6}{27} = \frac{x_1 + 2}{9} \]
X1         0     1     2     Total
P(X1)      2/9   1/3   4/9   1
The marginal probability distribution of X2 is given by
\[ P(X_2 = x_2) = \sum_{x_1=0}^{2} P(X_1 = x_1, X_2 = x_2) = \sum_{x_1=0}^{2} \frac{1}{27}(x_1 + 2x_2) = \frac{1}{27}\left[ (0 + 2x_2) + (1 + 2x_2) + (2 + 2x_2) \right] = \frac{3 + 6x_2}{27} = \frac{1 + 2x_2}{9} \]
X2         0     1     2     Total
P(X2)      1/9   1/3   5/9   1
The conditional probability distribution of X1 given X2 = 2 is obtained as follows.
\[ P(X_1 = x_1 \mid X_2 = 2) = \frac{P[(X_1 = x_1) \cap (X_2 = 2)]}{P(X_2 = 2)} = \frac{(x_1 + 4)/27}{5/9} = \frac{x_1 + 4}{15} \quad \text{for } x_1 = 0, 1, 2 \]
X1                      0      1     2     Total
P(X1 = x1 | X2 = 2)     4/15   1/3   2/5   1
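The marginal and conditional tables can be generated mechanically from the joint pmf. A minimal sketch with exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction

values = range(3)  # x1 and x2 both take the values 0, 1, 2

# Joint pmf P(X1 = x1, X2 = x2) = (x1 + 2*x2) / 27
joint = {(x1, x2): Fraction(x1 + 2 * x2, 27) for x1 in values for x2 in values}

# Marginals: sum the joint pmf over the other variable.
marginal_x1 = {x1: sum(joint[(x1, x2)] for x2 in values) for x1 in values}
marginal_x2 = {x2: sum(joint[(x1, x2)] for x1 in values) for x2 in values}

# Conditional pmf of X1 given X2 = 2: joint divided by P(X2 = 2).
cond_x1_given_x2_2 = {x1: joint[(x1, 2)] / marginal_x2[2] for x1 in values}

print(marginal_x1)
print(marginal_x2)
print(cond_x1_given_x2_2)
```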
Independence of Random Variables: Two random variables X and Y are said to be statistically independent if and only if the joint cumulative distribution function (CDF) can be expressed as the product of the individual cumulative distribution functions:
\[ P(X \le x, Y \le y) = P(X \le x)\,P(Y \le y) \]
To find the joint probability density function (PDF) when X and Y are independent, we differentiate the joint CDF with respect to x and y:
\[ f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y} = f_X(x)\,f_Y(y) \]
4.7.12 Example :
5 Normal distribution:
5.1 Probability distribution of a binomial random variable
median, and mode of the Gaussian distribution are all equal.
The value of the normal distribution will become more apparent when we begin to work with the
sampling distribution of the mean. For now, however, it is important to note that many random
variables of interest, including blood pressure, serum cholesterol level, height, and weight, are
approximately normally distributed. The normal curve can thus be used to estimate probabilities
associated with these variables. For example, in a population in which serum cholesterol
level is normally distributed with mean µ and standard deviation σ, we might wish to find the
probability that a randomly chosen individual has a serum cholesterol level greater than 250 mg/100 ml.
Perhaps this knowledge will help us to plan for future cardiac services. Since the total area beneath
the normal curve is equal to 1, we can estimate the probability in question by determining the
proportion of the area under the curve that lies to the right of the point x = 250, or P(X > 250).
This can be done using a computer program or a table of areas calculated for the normal curve.
Since a normal distribution could have an infinite number of possible values for its mean and
standard deviation, it is impossible to tabulate the area associated with each and every normal
curve. Instead, only a single curve is tabulated-the special case for which µ = 0 and σ = 1 .
This curve is known as the standard normal distribution.
probability, the sum of the area to the right of 1 and to the left of -1 is P(Z > 1) + P(Z < −1)
= 0.159 + 0.159 = 0.318.
Since the total area under the curve is equal to 1, the area between -1 and 1 must be
P(−1 ≤ Z ≤ 1) = 1 − [P(Z > 1) + P(Z < −1)] = 1 − 0.318 = 0.682
Therefore, for the standard normal distribution, approximately 68.2% of the area beneath the
curve lies within ±1 standard deviation from the mean. We might also wish to calculate the
area under the standard normal curve that is contained in the interval µ ± 2σ, or P(−2 ≤ Z ≤ 2).
The area to the right of z = 2.00 is 0.023; the area to the left of z = −2.00 is 0.023. Therefore,
the area between −2.00 and 2.00 must be
P(−2 ≤ Z ≤ 2) = 1 − [P(Z > 2) + P(Z < −2)] = 1.000 − [0.023 + 0.023] = 0.954.
Approximately 95.4% of the area under the standard normal curve lies within ±2 standard
deviations from the mean.
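The two areas above can also be obtained without a table. The sketch below assumes only the identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal CDF; the helper names are illustrative. Small differences in the third decimal place relative to the table values come from rounding in the table.

```python
import math


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_within(k):
    """P(-k <= Z <= k): area within k standard deviations of the mean."""
    return phi(k) - phi(-k)


for k in (1, 2):
    print(f"P(-{k} <= Z <= {k}) = {area_within(k):.4f}")
```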
5.4 Question
Plot normal distribution curve for the following types.
Three normal distribution curves for three different values of the mean µ = −2,µ = 2, µ =
0 with σ = 1
Three normal distribution curves with fixed value of mean,µ = 0 and three different
standard deviation values σ = 0.5, σ = 1, σ = 2.
5.5 Normal Approximations of the Binomial Distribution
The normal distribution can be used to approximate the binomial distribution when certain
conditions are met. This approximation makes calculations easier, especially for large number
of trials, and is based on the Central Limit Theorem. The binomial distribution describes the
probability of having exactly (k) successes in (n) independent Bernoulli trials, each with success
probability (p). Its probability mass function (PMF) is:
$P(k) = \binom{n}{k} p^k (1 - p)^{n-k}$
When (n) is large, the shape of the binomial distribution resembles a bell-shaped curve. Ac-
cording to the Central Limit Theorem, the sum of many independent random variables (like
binomial trials) tends toward a normal distribution.
For the normal approximation to be reasonably accurate, typically:
np ≥ 5 and n(1 − p) ≥ 5
This ensures the distribution isn’t too skewed and is symmetric enough for the normal approx-
imation.
The binomial probabilities P(k) are approximated by the area under the normal curve
between k − 0.5 and k + 0.5 (using a continuity correction):
$P(k) \approx \int_{k-0.5}^{k+0.5} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$
where µ = np and σ = $\sqrt{np(1-p)}$.
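To see how the continuity correction behaves in practice, the sketch below compares the exact binomial sum with the normal approximation. It assumes µ = np and σ = √(np(1−p)); the function names and the illustrative values n = 12, p = 0.5 are choices made here, not part of the derivation.

```python
import math


def binom_pmf(n, k, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and std deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_approx(n, p, lo, hi):
    """Approximate P(lo <= K <= hi) using the continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mu, sigma) - normal_cdf(lo - 0.5, mu, sigma)


exact = sum(binom_pmf(12, k, 0.5) for k in range(4, 8))
approx = normal_approx(12, 0.5, 4, 7)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Even for n as small as 12 the two values agree to about two decimal places, because np = n(1 − p) = 6 satisfies the rule of thumb.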
5.6 Probability Histogram for Binomial Distribution
A probability histogram for a binomial distribution is a visual representation that shows how the
probabilities of different numbers of successes (k) are distributed across all possible outcomes
in a binomial experiment [5].
The histogram plots the probability of each possible number of successes (k = 0, 1, 2, ...,
n) on the vertical axis. The x-axis represents the number of successes (k).
Each bar’s height corresponds to the probability $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$, which is the
likelihood of observing k successes out of n trials, given the probability p of success in each trial.
The histogram is typically discrete: bars are separated at integer values of k. It often has a
bell-shaped curve when n is large, especially for ( p ) near 0.5, resembling a normal distribution.
When p is close to 0 or 1, the histogram skews toward the lower or higher end of k.
Basic property of the Binomial Probability Histogram: For np ≥ 5 and nq ≥ 5, the
probability histogram for the binomial distribution is nearly symmetric about µ = np over the
interval [µ − 3σ, µ + 3σ], where σ = $\sqrt{npq}$, and outside this interval P(k) ≈ 0.
Example A fair coin is tossed 100 times. Find the probability P that heads occurs (a)
exactly 60 times, (b) between 48 and 53 times inclusive, (c) less than 45 times.
Solution: This is a binomial experiment with n = 100, p = 0.5 and q = 0.5.
∴ µ = np = 100 × 0.5 = 50 and σ = $\sqrt{npq}$ = 5.
Since we are using the normal approximation for discrete binomial counts, apply the continuity
correction. Below, BP denotes a binomial probability and NP a normal probability.
(a) To find P(k = 60), evaluate the probability of the normal variable falling between 59.5
and 60.5: BP(60) ≈ NP(59.5 ≤ X ≤ 60.5).
$z_1 = \frac{59.5 - 50}{5} = 1.9$ and $z_2 = \frac{60.5 - 50}{5} = 2.1$
P = BP(60) ≈ NP(59.5 ≤ X ≤ 60.5) = NP(1.9 ≤ Z ≤ 2.1) = 0.4821 − 0.4713 = 0.0108
(b) To find P(48 ≤ k ≤ 53), evaluate the probability of the normal variable falling between 47.5
and 53.5: BP(48 ≤ k ≤ 53) ≈ NP(47.5 ≤ X ≤ 53.5).
$z_1 = \frac{47.5 - 50}{5} = -0.5$ and $z_2 = \frac{53.5 - 50}{5} = 0.7$
∴ P(48 ≤ k ≤ 53) ≈ NP(−0.5 ≤ Z ≤ 0.7) = 0.1915 + 0.2580 = 0.4495
(c) To find P(k < 45) = P(k ≤ 44), evaluate the probability of the normal variable falling
below 44.5.
$z = \frac{44.5 - 50}{5} = -1.1$
∴ P(k < 45) ≈ NP(Z ≤ −1.1) = 0.5 − 0.3643 = 0.1357
Example A fair coin is tossed 12 times. Determine the probability P that the number
of heads occurring is between 4 and 7 inclusive by using a) the binomial distribution, b) the
normal approximation to the binomial distribution.
Solution: Number of trials, n = 12
Probability of heads in each trial, p = 0.5
a) We need the probability that heads occur between 4 and 7 times inclusive, i.e., P(4 ≤ X ≤ 7).
$P(k) = \binom{n}{k} p^k (1 - p)^{n-k}$
$BP(4) = \binom{12}{4} (0.5)^4 (0.5)^{8} = \frac{495}{4096}$
$BP(5) = \binom{12}{5} (0.5)^5 (0.5)^{7} = \frac{792}{4096}$
$BP(6) = \binom{12}{6} (0.5)^6 (0.5)^{6} = \frac{924}{4096}$
$BP(7) = \binom{12}{7} (0.5)^7 (0.5)^{5} = \frac{792}{4096}$
Hence P = BP(4) + BP(5) + BP(6) + BP(7) = $\frac{3003}{4096}$ = 0.7332
b) Here µ = np = 6 and σ = $\sqrt{npq}$ = $\sqrt{3}$ ≈ 1.73. With the continuity correction,
P ≈ NP(3.5 < X < 7.5), which is converted to standard units using $Z = \frac{X - \mu}{\sigma}$.
3.5 in standard units = $\frac{3.5 - 6}{1.73}$ = −1.45
7.5 in standard units = $\frac{7.5 - 6}{1.73}$ = 0.87
Hence P ≈ NP(−1.45 < Z < 0.87) = 0.4265 + 0.3078 = 0.7343, which is close to the exact
value 0.7332.
5.7 Example :
Find the value of z that cuts off the upper 10% of the standard normal distribution, or the
value of z for which P(Z > z) = 0.10.
Solution:
Locating 0.100 in the body of the table, we observe that the corresponding value of z is
1.28. Therefore, 10% of the area under the standard normal curve lies to the right of z = 1.28.
Similarly, another 10% of the area lies to the left of z = -1.28.
5.8 Example :
The heights of 300 students are normally distributed with mean 64.5 inches and standard
deviation 3.3 inches. How many students have a height less than 5 feet?
Solution: 5 feet = 60 inches, so
$Z = \frac{X - \mu}{\sigma} = \frac{60 - 64.5}{3.3} = \frac{-4.5}{3.3} \approx -1.36$
P(Z < −1.36) = 0.5 − 0.4131 = 0.0869
Number of students = 0.0869 × 300 = 26.07 ≈ 26 students.
To determine how many students have a height between 5 feet (60 inches) and 5 feet 9
inches (69 inches), we again use the normal distribution with mean µ = 64.5 inches and
standard deviation σ = 3.3 inches.
$Z_1 = \frac{60 - 64.5}{3.3} = \frac{-4.5}{3.3} \approx -1.36$
$Z_2 = \frac{69 - 64.5}{3.3} = \frac{4.5}{3.3} \approx 1.36$
P(60 < X < 69) = P(Z < 1.36) − P(Z < −1.36) = 0.9131 − 0.0869 = 0.8262
Number of students = 0.8262 × 300 = 247.86 ≈ 248 students
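The same head counts can be reproduced with Python's standard library. This is a minimal sketch assuming the stated parameters (µ = 64.5 inches, σ = 3.3 inches, 300 students); the variable names are illustrative. Because the code does not round z to two decimals, the probabilities differ from the table-based ones in the third or fourth decimal place, but the rounded head counts are the same.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=3.3)  # heights in inches
n_students = 300

# P(X < 60): students shorter than 5 feet
p_below_60 = heights.cdf(60)

# P(60 < X < 69): students between 5 feet and 5 feet 9 inches
p_between = heights.cdf(69) - heights.cdf(60)

count_below = round(p_below_60 * n_students)
count_between = round(p_between * n_students)
print(count_below, count_between)
```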
5.9 Example :
Let X be a random variable that represents systolic blood pressure. For the population of 18-
to 74-year-old males in the United States, systolic blood pressure is approximately normally
distributed with mean 129 millimeters of mercury (mm Hg) and standard deviation 19.8 mm
Hg. Find the value of x that cuts off the upper 2.5% of the curve of systolic blood pressures.
Solution: We wish to find the value of x for which P(X > x) = 0.025. From the z table,
we see that the area to the right of z = 1.96 is 0.025. Solving
$z = 1.96 = \frac{x - \mu}{\sigma}$
gives x = 129 + (1.96)(19.8) = 167.8. Therefore, approximately 2.5% of the men in this
population have systolic blood pressures that are greater than 167.8 mm Hg, while
97.5% have blood pressures less than 167.8 mm Hg. In other words, if we randomly select an
individual from this adult male population, the probability that his systolic blood pressure is
greater than 167.8 mm Hg is 0.025.
Because the standard normal curve is symmetric around z = 0, we know that the area to the
left of z = −1.96 is also 0.025. Solving
$z = -1.96 = \frac{x - \mu}{\sigma}$
gives x = 129 + (−1.96)(19.8) = 90.2, so 2.5% of the men have a systolic blood pressure
that is less than 90.2 mm Hg. Since 2.5% of the men in the population have systolic blood
pressures greater than 167.8 mm Hg and 2.5% have values less than 90.2 mm Hg, the remaining
95% of the men must have systolic blood pressure readings that lie between 90.2 and 167.8 mm Hg.
We might also be interested in determining the proportion of men in the population who have
systolic blood pressures greater than 150 mm Hg. In this case, we are given the outcome of the
random variable X and must solve for the normal deviate z:
$z = \frac{150 - 129}{19.8} = 1.06$
The area to the right of z = 1.06 is 0.145. Therefore, approximately 14.5% of the men in this
population have systolic blood pressures greater than 150 mm Hg.
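Both directions of the calculation (probability to cutoff, and cutoff to probability) can be sketched with `statistics.NormalDist`, assuming the stated mean of 129 mm Hg and standard deviation of 19.8 mm Hg. The variable names are illustrative.

```python
from statistics import NormalDist

sbp = NormalDist(mu=129, sigma=19.8)  # systolic blood pressure, mm Hg

# Cutoffs that leave 2.5% in the upper and lower tails (inverse CDF).
upper_cutoff = sbp.inv_cdf(0.975)
lower_cutoff = sbp.inv_cdf(0.025)

# Proportion of the population above 150 mm Hg.
p_above_150 = 1 - sbp.cdf(150)

print(f"upper 2.5% cutoff: {upper_cutoff:.1f} mm Hg")
print(f"lower 2.5% cutoff: {lower_cutoff:.1f} mm Hg")
print(f"P(X > 150) = {p_above_150:.3f}")
```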
The coefficient of variation (CV) is a statistical measure that quantifies the relative
variability of a data set. It is defined as the ratio of the standard deviation to the mean, often
expressed as a percentage. The formula for the coefficient of variation is:
$CV = \frac{\sigma}{\mu} \times 100$
where σ is the standard deviation and µ is the mean.
Importance of Coefficient of Variation
Comparison of Variability: The CV allows for the comparison of variability between different
data sets, regardless of their unit of measurement or scale. This is particularly useful in finance,
where you might want to compare the risk (volatility) of different investments with different
expected returns.
Standardized Measure: Since the CV is a dimensionless number (it has no units), it
standardizes variability, making it easier to interpret.
Insights into Data Distribution: A low CV indicates that the data points tend to be close to
the mean, whereas a high CV indicates that they are spread out over a wider range of values.
Example 1: Following are the marks obtained by the students in a subject. Find the
standard deviation.

x      |x − x̄|
32     17
49     0
44     5
52     3
69     20
63     14
34     15

Solution: Here n = 7 and x̄ = 49, so $\sum (x - \bar{x})^2 = 1144$.
Standard deviation: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1144}{7 - 1}} = 13.8$
Variance: $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Example 2: Investment Returns:
Investment A: Mean return = 10%, Standard deviation = 2%
Investment B: Mean return = 15%, Standard deviation = 5%
CV for A: $\frac{2}{10} \times 100 = 20\%$
CV for B: $\frac{5}{15} \times 100 = 33.33\%$
Interpretation: Although Investment B has a higher mean
return, it also has a higher relative volatility compared to Investment A.
Example 3: Numerical Find the coefficient of variation and comment on the value obtained.
Year 2021 Bitcoin Ethereum
Jan 33100 1310
Feb 45200 1420
Mar 58800 1920
April 57700 2770
May 37300 2710
June 35000 2270
July 41600 2530
August 47100 3430
Sept 43800 3000
Oct 61100 4290
Nov 56900 4630
Dec 46200 3680
Bitcoin standard deviation = $S_B$ = 9650; Ethereum standard deviation = $S_E$ = 1045
$CV_B = \frac{S_B}{\bar{x}_B} = \frac{9650}{47000} = 0.21$ and $CV_E = \frac{S_E}{\bar{x}_E} = \frac{1045}{2830} = 0.37$
So the standard deviation of Bitcoin is 21% of its mean, and for Ethereum it is 37% of
its mean. This indicates that Ethereum is relatively more volatile. The coefficient of variation
thus states the size of the standard deviation relative to the mean.
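The figures quoted for the two series can be recomputed from the monthly table. The sketch below assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

# Monthly 2021 values from the table (Jan to Dec).
bitcoin = [33100, 45200, 58800, 57700, 37300, 35000,
           41600, 47100, 43800, 61100, 56900, 46200]
ethereum = [1310, 1420, 1920, 2770, 2710, 2270,
            2530, 3430, 3000, 4290, 4630, 3680]


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)


cv_bitcoin = coefficient_of_variation(bitcoin)
cv_ethereum = coefficient_of_variation(ethereum)
print(f"CV Bitcoin  = {cv_bitcoin:.2f}")
print(f"CV Ethereum = {cv_ethereum:.2f}")
```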
5.10 Moments
In probability and statistics, moments are quantitative measures related to the shape of a
probability distribution. They provide important information about the characteristics of the
distribution, such as its central tendency, variability, and shape. Understanding moments can
help in data analysis, model fitting, and various applications in fields like finance, engineering,
and social sciences. Moments are expected values of powers of the random variable.
The following are the types.
Zeroth Moment (Total Probability):
Definition: The zeroth moment of a random variable is the total probability, which is
always equal to 1.
Formula: $M_0 = \int_{-\infty}^{\infty} f(x)\, dx = 1$
Example: For any probability density function (PDF), integrating the PDF over its entire
range yields 1, confirming that total probability is conserved.
First Moment (Mean), E(X):
Definition: The first moment is the mean (or expected value) of the random variable,
reflecting the central tendency of the distribution.
Formula: $M_1 = \mu = E[X] = \int_{-\infty}^{\infty} x f(x)\, dx$
Example: For a uniform distribution on the interval [a, b], $E[X] = \frac{a + b}{2}$.
If a = 1 and b = 3, then $E[X] = \frac{1 + 3}{2} = 2$.
Second Moment (Variance), related to E(X²):
Definition: The second moment about the mean (or variance) measures the dispersion or
spread of the distribution.
Formula: $M_2 = E[(X - \mu)^2] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$
Example: For a normal distribution, where µ is the mean and σ² is the variance, if µ = 0
and σ = 1, the second moment about the mean is σ² = 1.
Third Moment (Skewness), related to E(X³):
Definition: The third moment about the mean (skewness) measures the asymmetry of the
distribution.
Formula: $\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3}$
Example: A distribution with positive skew (right-skewed) has a longer tail on the right
side, while a negative skew (left-skewed) has a longer tail on the left side. The skewness
value is accordingly greater than 0 or less than 0.
Fourth Moment (Kurtosis), related to E(X⁴):
Definition: The fourth moment about the mean (kurtosis) measures the "tailedness" of
the distribution, indicating the presence of outliers.
Formula: $\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4} - 3$ (the subtraction of 3 makes the kurtosis of the normal
distribution equal to 0).
Example: A distribution whose ratio $E[(X - \mu)^4]/\sigma^4$ is greater than 3 (that is, γ2 > 0) has
heavy tails and more outliers, whereas one whose ratio is less than 3 (γ2 < 0) has lighter
tails. For instance, a Laplace distribution has a higher kurtosis than a normal distribution.
The mean is written as E(X) = µ′1 and the variance is written as µ2.
Let X be a (discrete or continuous) random variable. We define the kth moment about the
origin as $\mu'_k = E(X^k)$ and the kth central moment as $\mu_k = E[(X - E(X))^k]$.
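$\mu'_3 = E(X^3) = 1^3 \cdot \frac{1}{6} + 2^3 \cdot \frac{1}{6} + 3^3 \cdot \frac{1}{6} + 4^3 \cdot \frac{1}{6} + 5^3 \cdot \frac{1}{6} + 6^3 \cdot \frac{1}{6} = \frac{147}{2}$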
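The third raw moment above has the form of a fair six-sided die calculation (faces 1 to 6, each with probability 1/6). Under that assumption, the following sketch computes raw and central moments exactly; the function names are illustrative.

```python
from fractions import Fraction

FACES = range(1, 7)          # assumed outcomes of a fair die
PROB = Fraction(1, 6)        # probability of each face


def raw_moment(k):
    """k-th moment about the origin, E(X^k)."""
    return sum(PROB * x**k for x in FACES)


def central_moment(k):
    """k-th moment about the mean, E[(X - E(X))^k]."""
    mu = raw_moment(1)
    return sum(PROB * (x - mu) ** k for x in FACES)


print("E(X)   =", raw_moment(1))
print("E(X^3) =", raw_moment(3))
print("Var(X) =", central_moment(2))
```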
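Numerical based on expectation E(X): The daily consumption of electric power (in
million kWh) is a random variable X with probability density function f(x) = kxe^{−x/3} for
x > 0 and 0 elsewhere. Find the value of k, the expectation of X, and the probability that on a
given day the electric consumption is more than the expected value.
Solution: Since the total probability is 1,
$\int_0^{\infty} k x e^{-x/3}\, dx = 1$, which gives $k = \frac{1}{9}$.
$E[X] = \int_0^{\infty} x f(x)\, dx = \int_0^{\infty} x \cdot \frac{1}{9} x e^{-x/3}\, dx = \frac{1}{9} \int_0^{\infty} x^2 e^{-x/3}\, dx$
Substitute x = 3t, so that dx = 3 dt. The limits remain the same since as x goes from 0
to ∞, t also goes from 0 to ∞.
$\int_0^{\infty} x^2 e^{-x/3}\, dx = \int_0^{\infty} (3t)^2 e^{-t} \cdot 3\, dt = 3 \int_0^{\infty} 9 t^2 e^{-t}\, dt = 27 \int_0^{\infty} t^2 e^{-t}\, dt$
But $\int_0^{\infty} t^2 e^{-t}\, dt = 2!$
Thus $\int_0^{\infty} x^2 e^{-x/3}\, dx = 54$ and $E[X] = \frac{1}{9} \int_0^{\infty} x^2 e^{-x/3}\, dx = 6$.
Alternatively, integrating by parts directly:
$\int_0^{\infty} x^2 e^{-x/3}\, dx = \left[ x^2 \frac{e^{-x/3}}{-1/3} - 2x \frac{e^{-x/3}}{1/9} + 2 \frac{e^{-x/3}}{-1/27} \right]_0^{\infty} = 54$, so that $E[X] = \frac{54}{9} = 6$.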
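The values k = 1/9 and E[X] = 6 can be checked numerically, and the same integration gives the remaining quantity asked for in the question, P(X > E[X]). The sketch below is only a numerical check: the trapezoidal helper `integrate` and the truncation point `UPPER = 150` are assumptions made here (the tail beyond 150 is negligible for this density), and the closed form 3e^{−2} used for comparison follows from integrating the density from 6 to ∞.

```python
import math


def pdf(x, k=1 / 9):
    """Density f(x) = k * x * exp(-x/3) for x > 0, and 0 elsewhere."""
    return k * x * math.exp(-x / 3) if x > 0 else 0.0


def integrate(func, a, b, steps=100_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


UPPER = 150.0  # truncation point standing in for infinity (assumption)

total_probability = integrate(pdf, 0.0, UPPER)
expected_value = integrate(lambda x: x * pdf(x), 0.0, UPPER)
p_above_mean = integrate(pdf, 6.0, UPPER)

print(f"total probability = {total_probability:.6f}")
print(f"E[X]              = {expected_value:.6f}")
print(f"P(X > 6)          = {p_above_mean:.4f}")
```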
Numerical based on expectation E(X): The distribution function of a random variable
X is given by $F_X(x) = 1 - (1 + x)e^{-x}$, x ≥ 0. Find the mean and the variance.
Solution: $f_X(x) = \frac{dF_X(x)}{dx} = (1 + x)e^{-x} - e^{-x} = x e^{-x}$, x ≥ 0
Mean = $\bar{X} = \int_0^{\infty} x f(x)\, dx = \int_0^{\infty} x^2 e^{-x}\, dx = 2$
$E(X^2) = \int_0^{\infty} x^2 f(x)\, dx = \int_0^{\infty} x^3 e^{-x}\, dx = 6$
Variance = $E(X^2) - (\bar{X})^2 = 6 - 4 = 2$
Discrete Random Variables: If X is a discrete random variable with probability mass
function $P(X = x_i)$, then the expectation of the function g(X) is calculated as follows:
$E[g(X)] = \sum_i g(x_i) P(X = x_i)$
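Numerical based on expectation of g(X): If the probability density function
of X is $f(x) = \frac{2}{9} x \left(2 - \frac{x}{2}\right)$, 0 ≤ x ≤ 3, find E(Y) where Y = (X + 1)².
Solution: $E(Y) = \int_0^{\infty} g(x) f(x)\, dx = \int_0^{3} (x + 1)^2 f(x)\, dx$
$= \int_0^{3} (x + 1)^2 \frac{2}{9} x \left(2 - \frac{x}{2}\right) dx = \frac{2}{9} [-24.3 + 20.25 + 31.5 + 9] = 8.1$
For independent random variables X1 and X2, the variance of a linear combination is
$V(a_1 X_1 + a_2 X_2) = a_1^2 V(X_1) + a_2^2 V(X_2)$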
6 Series expansion
Taylor series for $e^u$ (where u = tx):
$e^{tx} = 1 + tx + \frac{t^2 x^2}{2} + \frac{t^3 x^3}{6} + \frac{t^4 x^4}{24} + \cdots$
Expansion of $(1 - z)^{-1}$:
$(1 - z)^{-1} = 1 + z + z^2 + z^3 + z^4 + \cdots$
Expansion of $(x - a)^n$:
$(x - a)^n = x^n - \binom{n}{1} a x^{n-1} + \binom{n}{2} a^2 x^{n-2} - \binom{n}{3} a^3 x^{n-3} + \cdots + (-1)^n a^n$
The first four moments about the mean (using the notation $\mu_r$ for moments about the
mean and $\mu'_r$ for moments about zero):
First Moment:
$\mu_1 = 0$
Second Moment (Variance):
$\mu_2 = \mu'_2 - (\mu'_1)^2$
Third Moment:
$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3$
Fourth Moment:
$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4$
7 Characteristic function
The characteristic function is a way to describe a random variable X. The characteristic
function, $\phi_X(t) = E[e^{itX}]$, a function of t, determines the behavior and properties of the
probability distribution of X. It is equivalent to a probability density function or cumulative
distribution function, since knowing one of these functions allows computation of the others,
but they provide different insights into the features of the random variable. In particular cases,
one or another of these equivalent functions may be easier to represent in terms of simple
standard functions. If a random variable admits a density function, then the characteristic function
is its Fourier dual, in the sense that each of them is a Fourier transform of the other. If a
random variable has a moment-generating function $M_X(t)$, then the domain of the characteristic
function can be extended to the complex plane, and $\phi_X(-it) = M_X(t)$. Note however that the
characteristic function of a distribution is well defined for all real values of t, even when the
moment-generating function is not well defined for all real values of t.
The characteristic function $\phi_X(t)$ of a random variable X is defined as:
$\phi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\, dx$ for continuous variables
or
$\phi_X(t) = \sum_x e^{itx} P(X = x)$ for discrete variables
If the moment generating function $M_X(t)$ exists, the characteristic function can be related
to it. Specifically, the characteristic function can be derived from the moment generating
function by substituting it for t, that is, $\phi_X(t) = M_X(it)$.
The moments of a random variable can be obtained from derivatives of its characteristic
function:
$E(X^r) = \frac{1}{i^r} \frac{d^r}{dt^r} \phi_X(t) \Big|_{t=0}$
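Example : Prove that the characteristic function of the sum of independent random variables
is the product of their individual characteristic functions.
Let S = X + Y, where X and Y are independent. Then
$p_S(s) = \int p_X(u)\, p_Y(s - u)\, du$
and we have to show that
$\phi_S(t) = \phi_X(t)\, \phi_Y(t)$
which is nothing but the Fourier convolution theorem.
Proof: $\phi_X(t) = \int_{-\infty}^{\infty} e^{itx} p_X(x)\, dx$
$p_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_X(t) e^{-itx}\, dt$ (inverse Fourier transform)
Since $p_S(s) = \int p_X(u)\, p_Y(s - u)\, du$,
$p_S(s) = \int p_X(u) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_Y(t) e^{-it(s - u)}\, dt \right] du$
$= \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_Y(t) e^{-its} \left[ \int_{-\infty}^{\infty} p_X(u) e^{itu}\, du \right] dt$
$p_S(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_Y(t)\, \phi_X(t)\, e^{-its}\, dt$
Comparing this with the inverse Fourier transform $p_S(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_S(t) e^{-its}\, dt$, it is
proved that the characteristic function of the sum of independent random variables is the
product of their individual characteristic functions, i.e. $\phi_S(t) = \phi_Y(t)\, \phi_X(t)$.
Scaling law for Random Variables (a > 0):
$\phi_{aX}(t) = \int e^{itx} p_{aX}(x)\, dx = \int e^{itx} \frac{1}{a} p_X\left(\frac{x}{a}\right) dx$
$\phi_{aX}(t) = \int e^{i(at)\frac{x}{a}}\, p_X\left(\frac{x}{a}\right) \frac{dx}{a} = \phi_X(at)$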
Example : Let X have the probability mass function
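$|e^{ix}| = 1$
$\phi_X(t) = E[e^{itX}]$
$\phi_X(t) = E\left[ 1 + itX + \frac{(itX)^2}{2!} + \frac{(itX)^3}{3!} + \cdots \right]$
$\phi_X(t) = 1 + (it) E(X) + \frac{(it)^2}{2!} E(X^2) + \frac{(it)^3}{3!} E(X^3) + \cdots$
$\mu'_r$ = coefficient of $\frac{(it)^r}{r!}$ in the expansion of $\phi_X(t)$
also,
$\mu'_r = E(X^r) = \frac{1}{i^r} \frac{d^r}{dt^r} \phi_X(t) \Big|_{t=0}$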
Problem : Find the characteristic function of Poisson’s distribution and hence find its
mean and variance.
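Solution: The probability mass function of the Poisson distribution with parameter λ is
$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$, x = 0, 1, 2, ...
Thus its characteristic function is
$\phi_X(t) = \sum_{x=0}^{\infty} e^{itx} \frac{e^{-\lambda} \lambda^x}{x!}$
$\phi_X(t) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^{it})^x}{x!}$
$\phi_X(t) = e^{-\lambda} e^{\lambda e^{it}}$
$\phi_X(t) = e^{-\lambda(1 - e^{it})}$
$E(X) = \frac{1}{i} \frac{d}{dt} \phi_X(t) \Big|_{t=0} = \frac{1}{i} \left[ i\lambda e^{it} e^{-\lambda(1 - e^{it})} \right]_{t=0}$
At t = 0, E(X) = λ
$E(X^2) = \frac{1}{i^2} \frac{d^2}{dt^2} \phi_X(t) \Big|_{t=0} = \lambda + \lambda^2$
Hence the variance is $E(X^2) - [E(X)]^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$.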
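The closed form of the Poisson characteristic function and the moments obtained from it can be checked numerically. The sketch below is only a sanity check: the value λ = 3, the series truncation at 200 terms, and the finite-difference step sizes are all arbitrary choices made here.

```python
import cmath
import math


def poisson_cf_series(t, lam, terms=200):
    """Truncated sum of e^{itx} P(X = x) over x = 0 .. terms-1."""
    total = 0j
    prob = math.exp(-lam)  # P(X = 0)
    for x in range(terms):
        total += cmath.exp(1j * t * x) * prob
        prob *= lam / (x + 1)  # recurrence P(x+1) = P(x) * lam / (x+1)
    return total


def poisson_cf_closed(t, lam):
    """Closed form exp(-lam * (1 - e^{it}))."""
    return cmath.exp(-lam * (1 - cmath.exp(1j * t)))


lam = 3.0
phi = lambda t: poisson_cf_closed(t, lam)

# E(X) = (1/i) phi'(0), estimated with a central difference.
h1 = 1e-5
mean = ((phi(h1) - phi(-h1)) / (2 * h1) / 1j).real

# E(X^2) = (1/i^2) phi''(0) = -phi''(0), second central difference.
h2 = 1e-4
second_moment = (-(phi(h2) - 2 * phi(0.0) + phi(-h2)) / h2**2).real

variance = second_moment - mean**2
print(mean, second_moment, variance)
```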
Problem : Find the characteristic function of Binomial distribution and hence find its
mean and variance.
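Solution: The binomial distribution is characterized by two parameters: n (the number of
trials) and p (the probability of success in each trial). The probability mass function (PMF)
of a binomially distributed random variable X is given by:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$ for k = 0, 1, 2, . . . , n
The characteristic function $\phi_X(t)$ of a binomial random variable X is defined as:
$\phi_X(t) = E[e^{itX}] = \sum_{k=0}^{n} e^{itk} P(X = k) = \sum_{k=0}^{n} \binom{n}{k} (p e^{it})^k (1 - p)^{n-k} = (p e^{it} + (1 - p))^n$
where the last step uses the binomial theorem.
$E(X) = \frac{1}{i} \frac{d}{dt} \phi_X(t) \Big|_{t=0} = \frac{1}{i} \frac{d}{dt} (p e^{it} + (1 - p))^n \Big|_{t=0} = \frac{1}{i} \left[ n (p e^{it} + (1 - p))^{n-1}\, i p e^{it} \right]_{t=0} = np$
$E(X^2) = \frac{1}{i^2} \frac{d^2}{dt^2} \phi_X(t) \Big|_{t=0} = n(n - 1)p^2 + np$
Hence the variance is $E(X^2) - [E(X)]^2 = n(n - 1)p^2 + np - n^2 p^2 = np(1 - p)$.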
Problem : Find the characteristic function of a random variable X that follows a uniform
distribution on the interval [−1, 1].
Solution: $\phi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\, dx$
For a uniform distribution over the interval [−1, 1], the probability density function $f_X(x)$
is defined as:
$f_X(x) = \frac{1}{2}$ if −1 ≤ x ≤ 1, and $f_X(x) = 0$ otherwise
$\phi_X(t) = \int_{-1}^{1} e^{itx} f_X(x)\, dx = \int_{-1}^{1} e^{itx} \cdot \frac{1}{2}\, dx = \frac{1}{2} \int_{-1}^{1} e^{itx}\, dx$
$\frac{1}{2} \int e^{itx}\, dx = \frac{1}{2} \cdot \frac{1}{it} e^{itx}$
$\frac{1}{2} \int_{-1}^{1} e^{itx}\, dx = \frac{1}{2} \cdot \frac{1}{it} \left[ e^{itx} \right]_{-1}^{1} = \frac{1}{2} \cdot \frac{1}{it} (e^{it} - e^{-it}) = \frac{\sin(t)}{t}$
$\phi_X(t) = \frac{\sin(t)}{t}$ if t ≠ 0, and $\phi_X(t) = 1$ if t = 0
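The result sin(t)/t can be checked by evaluating the defining integral numerically. The following sketch uses a midpoint rule; the step count and the test values of t are arbitrary choices made here.

```python
import cmath
import math


def uniform_cf_numeric(t, steps=20_000):
    """Midpoint-rule estimate of (1/2) * integral of e^{itx} over [-1, 1]."""
    h = 2.0 / steps
    total = 0j
    for i in range(steps):
        x = -1.0 + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += cmath.exp(1j * t * x)
    return 0.5 * total * h


def uniform_cf_closed(t):
    """Closed form: sin(t)/t for t != 0, and 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(t) / t


for t in (0.0, 0.5, 2.0, 7.0):
    print(t, uniform_cf_numeric(t).real, uniform_cf_closed(t))
```

The imaginary part of the numerical value is essentially zero, as expected for a density that is symmetric about 0.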
8 Random Processes:
A random process is a collection of random variables indexed by time or space, representing
a system that evolves over time or across different configurations. It can be thought of as a
”family” of possible outcomes at different points, where each specific realization of the random
process corresponds to a possible state of the system.
In statistical terms, an ensemble refers to the complete set of realizations of the random
process at a given point in time. Each realization represents a different ”trajectory” or ”sam-
ple path” of the process, providing insight into the variability and probabilistic nature of the
system’s behavior.
Using the notion of an ensemble, you can derive important statistical properties, such as:
Mean (Expected Value): The average of all realizations at a given time provides insights
into the expected behavior of the random process. This is often computed as: E[X(t)] =
Average of the values in the ensemble at time t.
Variance: Measures the dispersion of the values in the ensemble around the mean, providing
information about the variability of the process: $Var(X(t)) = E[(X(t) - E[X(t)])^2]$.
Covariance: Captures how two random variables (or values of the random process at
different times) move together and is derived from the joint distribution of the ensemble at
those times.
In integral form, the ensemble mean is
$E[X(t)] = \int_{-\infty}^{\infty} x\, P_X(x, t)\, dx$
where $P_X(x, t)$ is the probability density function of the random variable X(t) at time t.
The integral computes the average of all possible values that the process might take at time t,
weighted by their probabilities.
8.2 What is the definition of a stationary random process?
A stationary random process is defined as a stochastic process whose statistical properties do
not change over time. Specifically, this means that:
Mean: The expected value (mean) of the process is constant over time.
Variance: The variance of the process is also constant over time.
Covariance: The covariance between values of the process at different times depends only
on the time difference (lag) between those values, not on the actual time at which the
values are observed.
In simpler terms, a stationary random process exhibits consistent behavior regardless of when
it is observed, allowing for reliable statistical inference over time. Stationarity is a critical
assumption in many statistical models and time series analyses.
8.4 Stationarity
A random process is said to be stationary if it is invariant under time translation. This means
that if you take the process X(t) and look at it at a different time t + τ (where τ is a
constant), the statistical properties remain the same: X(t) and X(t + τ) have the same
distribution. Specifically, for a stationary process, the covariance between two points in time
depends only on the difference in time between those points, not the absolute times. For example:
$Cov(X(t), X(t + \tau)) = Cov(X(0), X(\tau))$
This indicates that the statistical structure of the
process does not change when observed at different times.
Implications of Invariance:
If a process is invariant under time translation, it suggests that the process is stable over
time. This is an important property in many fields, including signal processing, finance, and
physics, as it allows for consistent modeling and predictions. For instance, in time series analy-
sis, many statistical methods assume that the underlying data-generating process is stationary
(or at least approximately stationary) so that past behavior can inform future expectations
reliably.
Applications and Importance:
Invariance under time translation is key in constructing models for forecasting and under-
standing the underlying dynamics of stochastic processes. In practical terms, if observing a
process yields the same statistical behavior regardless of when observations are made, it sim-
plifies analytical operations, allowing for broader applications of models developed based on
those observations.
What are IID random variables?
IID stands for ”Independent and Identically Distributed,” and it refers to a collection of
random variables that have two key properties:
Independent: Each random variable in the collection does not influence or provide any
information about the others. The occurrence of one event does not affect the probability
of occurrence of another event.
Identically Distributed: Each random variable has the same probability distribution. This
means that they all share the same mean, variance, and shape of the distribution, even
though the individual outcomes may differ.
A set of IID random variables behaves as if they are drawn from the same statistical population
without any dependence between them, making them a fundamental concept in probability the-
ory and statistics, especially in the context of sampling and many statistical inference methods.
Strict Stationarity:
A random process (X(t)) is said to be strictly stationary if the joint distribution of any
collection of random variables (X(t1 ), X(t2 ), ..., X(tn )) is the same as the joint distribution of
(X(t1 + τ ), X(t2 + τ ), ..., X(tn + τ )) for all time shifts (τ ) and any collection of time points
(t1 , t2 , ..., tn ). In simpler terms, the statistical properties of the process are invariant under time
shifts, meaning that the entire distribution does not change if you shift the time variable.
Wide Sense Stationarity (WSS):
A random process X(t) is said to be wide sense stationary if:
The mean E[X(t)] is constant over time: E[X(t)] = µ for all t.
The autocovariance $Cov(X(t_1), X(t_2))$ depends only on the time difference $\tau = t_2 - t_1$ and
not on the actual time points: $Cov(X(t_1), X(t_1 + \tau)) = Cov(X(0), X(\tau))$ for all $t_1$, τ.
In broad terms, it focuses on the first two moments (mean and autocovariance) of the process.
Conditions
Strict Stationarity (also called a strict sense stationary or strongly stationary process):
Requires the entire joint distribution of the process, and hence every moment that exists,
to be invariant under time shifts. This is a stronger condition; it applies to the
entire distribution of the process.
Wide Sense Stationarity: Only requires the mean to be constant and the autocovariance to
depend only on the time difference. It does not impose any restrictions on higher moments
(like skewness or kurtosis). This is a less restrictive condition, and many processes can
be WSS without being strictly stationary. This process is also called a weakly stationary
process or covariance stationary process.
Note: An SSS process with finite first and second order moments is a WSS process, while a WSS
process need not be an SSS process.
Strictly Stationary Process Example:
A Gaussian process with a constant mean and an autocovariance that depends only on the
time lag is strictly stationary, because all joint distributions of a Gaussian process are
determined by its mean and covariance, and these are unchanged under time shifts.
Wide Sense Stationary Process Example:
A sine wave with a constant mean, X(t) = A sin(ωt + ϕ) + µ, where A and ω are constants and
the phase ϕ is a random variable uniformly distributed over [0, 2π), is wide sense stationary:
its mean is µ for all t and its autocovariance, (A²/2) cos(ωτ), depends only on the lag τ, even
though the process is non-Gaussian.
WSS processes are often easier to work with in practical applications like time series analy-
sis, especially when using methods that rely on mean and autocovariance properties, but they
may leave out important characteristics of the process.
The covariance depends on the specific times $t_1$ and $t_2$ rather than just on their
difference: $Cov(X(t_1), X(t_2)) = \lambda \min(t_1, t_2)$, which equals $\lambda t_1$ when $t_1 = t_2$.
The mean E[X(t)] = λt changes with time t, and the covariance also depends on the
specific values of $t_1$ and $t_2$.
Since:
The mean of X(t) changes with time (it is not constant).
The covariance $Cov(X(t_1), X(t_2))$ does not depend solely on the time difference $|t_2 - t_1|$.
Thus, the Poisson process X(t) is not covariance stationary because it does not satisfy the
conditions of constant mean and covariance structure dependent only on the lag [3].
FX (x) for all x where FX is continuous. This type of convergence is concerned with
the limiting behavior of the probability distribution of the random variables.
Weak Convergence Concept:
The term weak convergence comes from the fact that convergence in distribution does
not require strong forms of convergence, such as convergence of the random variables
themselves almost surely or in probability. Instead, it only requires that the distributions
of the random variables converge to the distribution of the limiting random variable. This
means that convergence is ”weak” in the sense that it applies to the distribution functions
rather than to the random variables themselves.
Convergence of Random Variables: Even though the values of Xn (the random vari-
ables) cannot be expected to converge pointwise to a specific value, the distribution (as
captured by the CDF) is converging.
$P(|X - \mu| < k\sigma) = 1 - P(|X - \mu| \ge k\sigma) \ge 1 - \frac{1}{k^2}$
If kσ = C then $P(|X - \mu| \ge C) \le \frac{\sigma^2}{C^2}$
$P(|X - \mu| < C) \ge 1 - \frac{\sigma^2}{C^2}$
Thus the interval is [75-10(5), 75+10(5)] = [25,125]
Chebyshev’s Inequality and the properties of the normal distribution (along with the Z-
table) serve different purposes, and each has its own advantages depending on the context
of the analysis. Here are some reasons and scenarios where Chebyshev’s Inequality is still
valuable, even when the normal distribution is available:
– Applicability to All Distributions: Chebyshev’s Inequality applies to any probability
distribution with a finite mean and variance, regardless of its shape (normal, uniform,
skewed, etc.) or the nature of the data. This is particularly useful when the distribution
of the data is unknown or cannot be assumed to be normal.
Non-Normal Data: In many real-world applications, data may not follow a normal
distribution. Chebyshev’s Inequality can be used to understand the dispersion of
such data without needing specific distribution information.
– Fewer Assumptions (No Assumed Normality): Using the Z-table and properties of the
normal distribution requires a normality assumption. If the data does not conform to
this assumption, the Z-scores and resultant probabilities may be misleading.
Chebyshev’s Inequality circumvents this issue by not requiring any assumptions about the
shape of the underlying distribution.
– Conservative Estimates: Chebyshev’s Inequality provides a conservative estimate
that gives a lower bound on probabilities. This means that it can be useful in
scenarios where it’s important to establish minimum expectations about the spread
of data.
– Preliminary Analysis: In exploratory data analysis, if you are unsure whether the
data follows a particular distribution, Chebyshev’s Inequality can be leveraged to
make initial observations about variance and spread.
– Foundational Understanding: Chebyshev’s Inequality is often taught in statistics as
it helps reinforce concepts about variance, mean, and dispersion, providing founda-
tional knowledge about probability without leaning solely on normal distributions.
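Example : Two unbiased dice are thrown. If X is the sum of the numbers shown,
prove that $P(|X - 7| \ge 3) \le \frac{35}{54}$. Also find the actual probability.
Solution : We know that
$E(X) = \sum p_i x_i = \frac{1}{36}(2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12) = 7$
$E(X^2) = \sum p_i x_i^2 = \frac{1}{36}(4 + 18 + 48 + 100 + 180 + 294 + 320 + 324 + 300 + 242 + 144) = \frac{329}{6}$
$\sigma^2 = E(X^2) - [E(X)]^2 = \frac{329}{6} - 49 = \frac{35}{6}$
Using Chebyshev’s inequality,
$P(|X - \mu| \ge C) \le \frac{\sigma^2}{C^2}$
$P(|X - 7| \ge 3) \le \frac{35/6}{9} = \frac{35}{54}$
The actual probability is $P(|X - 7| \ge 3) = P(X \in \{2, 3, 4, 10, 11, 12\}) = \frac{12}{36} = \frac{1}{3}$,
which is well below the Chebyshev bound.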
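The dice example above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original notes; the variable names are illustrative. It uses exact fractions to compare the Chebyshev bound with the actual probability.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice, reduced to their sum.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
n_outcomes = len(sums)

mean = Fraction(sum(sums), n_outcomes)
second_moment = Fraction(sum(s * s for s in sums), n_outcomes)
variance = second_moment - mean ** 2

c = 3  # deviation threshold C in P(|X - mu| >= C)
chebyshev_bound = variance / c ** 2
actual = Fraction(sum(1 for s in sums if abs(s - mean) >= c), n_outcomes)

print("mean =", mean, "variance =", variance)
print("Chebyshev bound =", chebyshev_bound, "actual probability =", actual)
```

The bound is valid but loose here, which is typical of Chebyshev’s inequality because it uses only the mean and variance.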
Example : The mean height of students in the class is 5 feet 5 inches. Find the bound
on the probability that a student selected at random from the class is taller than 8 feet.
Solution : Given that E[H] = 65 inches and k = 96 inches, Markov’s inequality gives
$P[|X| \ge k] \le \frac{E[|X|]}{k}$
$P[|H| \ge 96] \le \frac{65}{96} = 0.68$
In case of discrete random variable, $P(X \ge k) \le e^{-tk} M_X(t)$ for any $t > 0$, where $M_X(t)$ is the moment generating function of X.
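For the uniform distribution, the spread is simply the length of the interval (b - a), and
because of the uniformity, the variance is $\frac{(b-a)^2}{12}$.
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
$Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i) = n\sigma^2$.
$Var(\bar{X}_n) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} X_i\right)$.
$Var(\bar{X}_n) = \frac{1}{n^2} \times n\sigma^2 = \frac{\sigma^2}{n}$.
For X uniform on (0, 1): $Var(X) = \frac{(1-0)^2}{12} = \frac{1}{12}$.
$\frac{Var(X)}{n} = \frac{1/12}{n} = \frac{1}{12n}$ is the variance of the sample mean.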
Explain the law of large numbers (LLN) using the concepts of Stochastic
Convergence.
The Law of Large Numbers states that as the sample size ( n ) increases, the sample mean
(average) of a sequence of independent and identically distributed (IID) random variables
converges in probability to the expected value (mean) of the underlying distribution.
Let ($X_1, X_2, \ldots, X_n$) be a sequence of IID random variables with a finite expectation
($E[X] = \mu$).
As ( n ) approaches infinity, the sample mean ($\bar{X}_n$), defined as
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$,
converges in probability to ( µ ):
$\bar{X}_n \xrightarrow{P} \mu$, that is,
$\lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0$ for any $\epsilon > 0$.
This statement indicates that as we take more samples, the probability that the sample
mean (X̄n ) deviates from the true mean ( µ ) by more than any fixed amount (ϵ ) ap-
proaches zero.
Sampling: If we roll the die and calculate the average from various sample sizes:
The sample mean (X̄n ) will fluctuate more closely around ( 3.5 ) as the number of rolls
increases due to the increased cancellation of random extremes (variance). Using conver-
gence in probability, we can say that the probability of observing an average significantly
different from ( 3.5 ) decreases with larger numbers of samples.
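The die-rolling illustration above can be simulated directly. This is a minimal sketch under stated assumptions: a fair six-sided die, a fixed seed chosen only for repeatability, and sample sizes picked for illustration; none of these come from the original notes.

```python
import random

rng = random.Random(0)  # fixed seed (an arbitrary choice) so runs are repeatable

def sample_mean(n):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# The sample mean should settle near the expected value 3.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(sample_mean(n), 4))
```

Individual runs with small n can land far from 3.5; the law of large numbers only says that such deviations become increasingly unlikely as n grows.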
Compare the almost sure or strong convergence with the weak convergence?
Almost Sure Convergence (also known as strong convergence) and Weak Convergence
(also known as convergence in distribution) are two types of convergence for sequences of
random variables.
Weak Convergence:
– Weaker Requirement: Weak convergence does not require pointwise convergence of
the random variables. It only requires the distributions of Xn to approximate the
distribution of ( X ) as ( n ) becomes large.
– Limited Insight: Weak convergence does not provide information about specific sam-
ple paths or how the sequence behaves on specific realizations. You might have sit-
uations where Xn converges in distribution to ( X ) but does not converge almost
surely.
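$Var(\bar{X}_n) = Var\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right)$
Since $Var(aX) = a^2 Var(X)$,
$Var(\bar{X}_n) = \frac{1}{n^2} Var(X_1 + X_2 + \ldots + X_n) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$
By applying Chebyshev’s Inequality, we can state that:
$P(|\bar{X}_n - \mu| > \epsilon) \le \frac{Var(\bar{X}_n)}{\epsilon^2}$
$P(|\bar{X}_n - \mu| > \epsilon) \le \frac{\sigma^2/n}{\epsilon^2} = \frac{\sigma^2}{\epsilon^2 n}$
which goes to zero as n → ∞. Hence the weak law of large numbers (WLLN),
$P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$, is proved.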
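To see how quickly the Chebyshev bound in this proof shrinks, the sketch below evaluates it for a fair die. The die (variance 35/12), the tolerance ε = 0.1 and the sample sizes are illustrative assumptions, not values from the notes.

```python
def chebyshev_bound(variance, n, eps):
    """Upper bound on P(|sample mean - mu| > eps), capped at 1."""
    return min(1.0, variance / (n * eps ** 2))

die_variance = 35 / 12  # variance of a single roll of a fair six-sided die
eps = 0.1               # illustrative tolerance around the mean

bounds = {n: chebyshev_bound(die_variance, n, eps)
          for n in (100, 1_000, 10_000, 100_000)}

for n, b in bounds.items():
    print(f"n = {n:>6}: P(|mean - 3.5| > {eps}) <= {b:.5f}")
```

The bound falls like 1/n, which is all the proof needs; the true probability is usually far smaller.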
8.8 Central Limit Theorem (CLT)
Question: State the Central Limit Theorem (CLT)
Let $X_1, X_2, \ldots, X_n$ be a sequence of IID random variables, each with finite mean $\mu = E[X_i]$
and finite variance $\sigma^2 = Var(X_i)$.
As ( n ) approaches infinity, the distribution of the standardized sum (or average) of these
random variables approaches a normal distribution:
$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$,
where ( $\bar{X}_n$ ) is the sample mean defined as $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
The Central Limit Theorem provides that the distribution of the sample mean (or sum) of
a large number of IID random variables approaches a normal distribution, regardless of the
original distribution of the variables, as long as they have a finite mean and variance.
Problem : The lifetime of a certain brand electric bulb may be considered a random
variable with mean 1200 hours and standard deviation 250 hours. Using the Central Limit
Theorem find the probability that the average lifetime of 60 bulbs exceeds 1250 hours.
Solution: If $\bar{X}$ denotes the mean lifetime of 60 bulbs, then by the central limit theorem,
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, with µ = 1200, σ = 250 and n = 60.
Problem : A random sample of size 100 is taken from a population whose mean is 60
and whose variance is 400. Using the central limit theorem, with what probability can we assert
that the mean of the sample will not differ from µ = 60 by more than 4?
Solution: If $\bar{X}$ denotes the mean of the 100 samples, then
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, with µ = 60, σ = √400 = 20 and n = 100.
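Both CLT problems above reduce to evaluating the standard normal CDF at a z-score. The sketch below is one way to finish the arithmetic without a Z-table; the helper `phi` is an illustrative name, built from the standard-library `math.erf`.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Bulb problem: mu = 1200 h, sigma = 250 h, n = 60, threshold 1250 h.
z_bulb = (1250 - 1200) / (250 / sqrt(60))
p_bulb = 1.0 - phi(z_bulb)            # P(sample mean > 1250)

# Sample-mean problem: variance 400, n = 100, tolerance 4 around mu = 60.
z_max = 4 / (sqrt(400) / sqrt(100))
p_within = phi(z_max) - phi(-z_max)   # P(|sample mean - 60| <= 4)

print(f"z = {z_bulb:.3f}, P(mean lifetime > 1250) = {p_bulb:.4f}")
print(f"z = {z_max:.1f}, P(|mean - 60| <= 4) = {p_within:.4f}")
```

Under these numbers the first probability comes out near 0.06 and the second near 0.95, matching what a Z-table lookup at z ≈ 1.55 and z = 2 would give.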
cars (consisting of 30 cars and 29 gaps) is between 185 and 195 m?
Solution: The total length of the queue can be defined as:
$L_{tot} = \sum_{i=1}^{30} L_i + \sum_{j=1}^{29} D_j$,
$E[L_{tot}] = E\left[\sum_{i=1}^{30} L_i + \sum_{j=1}^{29} D_j\right]$.
The variance of the total length is calculated by summing the variances of the lengths and
the gaps:
$Var(L_{tot}) = Var\left(\sum_{i=1}^{30} L_i\right) + Var\left(\sum_{j=1}^{29} D_j\right)$.
$Z = \frac{L_{tot} - E[L_{tot}]}{\sqrt{Var(L_{tot})}} = \frac{L_{tot} - 190.9}{\sqrt{45.81}}$.
$\sqrt{Var(L_{tot})} = \sqrt{45.81} \approx 6.77$.
$Z_1 = \frac{185 - 190.9}{6.77} = \frac{-5.9}{6.77} \approx -0.87$
$Z_2 = \frac{195 - 190.9}{6.77} = \frac{4.1}{6.77} \approx 0.61$.
$P(185 < L_{tot} < 195) = \Phi(0.61) - \Phi(-0.87) = 0.7291 - 1 + 0.8078 = 0.5369$.
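The table lookup in the queue-length solution can be cross-checked numerically. This is a small sketch that takes the mean 190.9 and variance 45.81 from the solution as given; `phi` is an illustrative helper, not something defined in the notes.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mean_total = 190.9   # E[L_tot] in metres, as stated in the solution
var_total = 45.81    # Var(L_tot), as stated in the solution
sd_total = sqrt(var_total)

z1 = (185 - mean_total) / sd_total
z2 = (195 - mean_total) / sd_total
p_between = phi(z2) - phi(z1)   # P(185 < L_tot < 195)

print(f"z1 = {z1:.3f}, z2 = {z2:.3f}, probability = {p_between:.4f}")
```

The result differs from the table-based figure only in the third decimal place, because the table uses z-values rounded to two decimals.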
Problem : A distribution with unknown mean µ has variance equal to 1.5. Use the central
limit theorem to find how large a sample should be taken from the distribution in order that
the probability will be at least 0.95 that the sample mean will be within 0.5 of the population
mean.
$P\left[\frac{|\bar{X} - \mu|}{\sqrt{1.5}/\sqrt{n}} < \frac{0.5}{\sqrt{1.5}/\sqrt{n}}\right] \ge 0.95$
$P(|Z| < 0.4082\sqrt{n}) \ge 0.95$
From the table of areas under the normal curve,
$P(Z < 1.96) \approx 0.975$
$P(Z < -1.96) \approx 0.025$
$P(|Z| < 1.96) = P(-1.96 < Z < 1.96) = 0.975 - 0.025 = 0.95$
∴ the least n is given by $0.4082\sqrt{n} = 1.96$, i.e. n ≈ 23.05, which rounds up to n = 24.
∴ the size of the sample must be at least 24.
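The sample-size calculation above is a one-line formula once the critical value is fixed. A minimal sketch, assuming the two-sided 95% critical value 1.96 used in the solution:

```python
from math import ceil, sqrt

variance = 1.5   # population variance from the problem
margin = 0.5     # required closeness of the sample mean to mu
z_crit = 1.96    # two-sided 95% critical value of the standard normal

# Require z_crit <= margin / (sigma / sqrt(n)) and solve for n.
n_exact = (z_crit * sqrt(variance) / margin) ** 2
n_min = ceil(n_exact)  # round up: n must be a whole number of samples

print(f"n must satisfy n >= {n_exact:.2f}, so the smallest sample size is {n_min}")
```

Rounding up rather than to the nearest integer matters, since any n below the exact value would fall short of the 0.95 guarantee.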
. Are X and Y orthogonal?
Answer : $R_{XY} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xy\, f_{XY}(x, y)\,dx\,dy$
Since the density is given in terms of Dirac delta functions, the term 0.4δ(x + α)δ(y − 2) means
that the point (−α, 2) occurs with probability 0.4, and similarly for the remaining three terms.
Hence we can write
$R_{XY} = 0.4(-\alpha)(2) + 0.3(\alpha)(2) + 0.1(-\alpha)^2 + 0.2(1)(1)$
$R_{XY} = -0.2\alpha + 0.1\alpha^2 + 0.2$
$\frac{dR_{XY}}{d\alpha} = -0.2 + 0.2\alpha = 0$
∴ α = 1
Verify whether $\frac{d^2R_{XY}}{d\alpha^2} > 0$. Since $\frac{d^2R_{XY}}{d\alpha^2} = 0.2 > 0$, α = 1 gives the minimum, and
substituting α = 1 gives $R_{XY} = -0.2 + 0.1(1^2) + 0.2 = 0.1$, which is the minimum value.
Since $R_{XY} \ne 0$, we can conclude that X and Y are not orthogonal.
NOTE : When you have a function ( f(x) ) and you’ve found its first derivative ( f’(x) ), set-
ting it to zero gives you critical points (points where the function could have a local maximum,
local minimum, or saddle point). The second derivative ( f”(x) ) is used to analyze these points:
If (f ”(x) > 0) at a critical point, the function is concave up, indicating a local minimum
at that point.
If (f ”(x) < 0) at a critical point, the function is concave down, indicating a local maxi-
mum.
If ( f”(x) = 0 ), the test is inconclusive, and further analysis or higher derivatives may be
needed.
Question: Discrete random variables X and Y have the joint density $f_{XY}(x, y) = 0.3\delta(x - \alpha)\delta(y - \alpha) + 0.5\delta(x + \alpha)\delta(y - 4) + 0.2\delta(x + 2)\delta(y + 2)$.
Determine the value of α, if any, that minimizes the covariance of X and Y. Find the minimum
covariance. Are X and Y correlated?
Answer : We know that,
Cov(X, Y) = E[XY] - E[X]E[Y]
$C_{XY} = R_{XY} - \bar{X}\bar{Y}$
$R_{XY} = 0.3(\alpha)^2 + 0.5(-\alpha)(4) + 0.2(-2)(-2)$
$R_{XY} = 0.3\alpha^2 - 2\alpha + 0.8$
$\bar{X} = 0.3(\alpha) + 0.5(-\alpha) + 0.2(-2) = -0.2\alpha - 0.4$
$\bar{Y} = 0.3(\alpha) + 0.5(4) + 0.2(-2) = 0.3\alpha + 1.6$
∴ $C_{XY} = 0.3\alpha^2 - 2\alpha + 0.8 - [(-0.2\alpha - 0.4)(0.3\alpha + 1.6)]$
∴ $C_{XY} = 0.36\alpha^2 - 1.56\alpha + 1.44$
∴ $\frac{dC_{XY}}{d\alpha} = 0.72\alpha - 1.56 = 0$
∴ α = 1.56/0.72 ≈ 2.17
Verify $\frac{d^2C_{XY}}{d\alpha^2}$, which is 0.72 and greater than zero, so this is a minimum. Hence
$C_{min} = 0.36(2.17)^2 - 1.56(2.17) + 1.44 \approx -0.25$
Since $C_{min} \ne 0$, X and Y are not uncorrelated, i.e. they are (negatively) correlated.
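The covariance calculation above can be checked numerically by treating the delta terms as point masses. This is a hedged sketch: the function name and the three-point fit used to recover the quadratic are illustrative choices, not the method used in the notes.

```python
def covariance(alpha):
    """Cov(X, Y) for the three point masses (x, y, probability) of the joint density."""
    points = [(alpha, alpha, 0.3), (-alpha, 4.0, 0.5), (-2.0, -2.0, 0.2)]
    e_xy = sum(p * x * y for x, y, p in points)
    e_x = sum(p * x for x, _, p in points)
    e_y = sum(p * y for _, y, p in points)
    return e_xy - e_x * e_y

# Cov is a quadratic a*alpha^2 + b*alpha + c; recover a and b from three evaluations.
c0, c_plus, c_minus = covariance(0.0), covariance(1.0), covariance(-1.0)
a = (c_plus + c_minus) / 2 - c0
b = (c_plus - c_minus) / 2

alpha_min = -b / (2 * a)          # vertex of the parabola (a > 0, so a minimum)
cov_min = covariance(alpha_min)

print(f"a = {a:.2f}, b = {b:.2f}, c = {c0:.2f}")
print(f"alpha_min = {alpha_min:.4f}, minimum covariance = {cov_min:.4f}")
```

The fitted coefficients reproduce 0.36, −1.56 and 1.44, and the minimum lands at α ≈ 2.17 with a covariance of about −0.25.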
Question: Define the following terms
Uncorrelated
Not Uncorrelated
Positive Correlation
Negative Correlation
Covariance and Correlation
Answer :
Uncorrelated : Two random variables ( X ) and ( Y ) are said to be uncorrelated if
their covariance is zero: Cov(X, Y) = E[XY] - E[X]E[Y] = 0. This implies that there
is no linear relationship between the two variables. It does not by itself imply that X
and Y are independent, since a nonlinear dependence may still exist.
Not Uncorrelated : Saying that two random variables are not uncorrelated means that
they are correlated, which can happen when: Cov(X, Y ) ̸= 0. This indicates that there
is some degree of linear relationship between the two variables.
Positive Correlation : If the covariance is positive (( Cov(X, Y ) > 0)), this suggests
that as one variable increases, the other variable tends to increase as well.
Negative Correlation: If the covariance is negative ((Cov(X, Y ) < 0 )), it implies that
as one variable increases, the other variable tends to decrease.
Correlation is a standardized measure that quantifies the strength and direction of the
relationship between two variables. It provides insights into how closely related the
variables are, in a dimensionless form. The most commonly used correlation measure is the
Pearson correlation coefficient, denoted as ’r’. The correlation coefficient ’r’ ranges from
-1 to 1. r = 1 : Perfect positive linear correlation; r = -1 : Perfect negative linear
correlation, and r = 0 : No linear correlation.
$r = \frac{Cov(X, Y)}{\sqrt{Var(X) \cdot Var(Y)}}$
yi       -3    2    4
g(yi)    0.4   0.3  0.3
$Cov(X, Y) = E(XY) - \mu_x\mu_y$
First compute $\mu_x$ and $\mu_y$:
$\mu_x = \sum x_i f(x_i) = (1)(0.5) + (3)(0.5) = 2$
$\mu_y = \sum y_i g(y_i) = (-3)(0.4) + (2)(0.3) + (4)(0.3) = 0.6$
Next compute E(XY) as follows.
$E(XY) = \sum x_i y_i h(x_i, y_i)$
E(XY) = (1)(-3)(0.1) + (1)(2)(0.2) + (1)(4)(0.2) + (3)(-3)(0.3) + (3)(2)(0.1) + (3)(4)(0.1) = 0
$Cov(X, Y) = E(XY) - \mu_x\mu_y = 0 - 2(0.6) = -1.2$
To compute $\sigma_x$ and $\sigma_y$:
$E(X^2) = \sum x_i^2 f(x_i) = (1)(0.5) + (9)(0.5) = 5$
$\sigma_x^2 = Var(X) = E(X^2) - \mu_x^2 = 5 - 4 = 1$
$\sigma_x = 1$
and
$E(Y^2) = \sum y_i^2 g(y_i) = (9)(0.4) + (4)(0.3) + (16)(0.3) = 9.6$
$\sigma_y^2 = Var(Y) = E(Y^2) - \mu_y^2 = 9.6 - (0.6)^2 = 9.24$
$\sigma_y = \sqrt{9.24} \approx 3.04$
$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_x\sigma_y} = \frac{-1.2}{(1)(3.04)} \approx -0.39$ (about −0.4)
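The covariance and correlation coefficient in this example can be recomputed directly from the joint probabilities. In the sketch below the joint table `joint` is read off the terms of the E(XY) sum above; the variable names are illustrative.

```python
from math import sqrt

# Joint pmf h(x, y), taken from the six terms of the E(XY) computation.
joint = {(1, -3): 0.1, (1, 2): 0.2, (1, 4): 0.2,
         (3, -3): 0.3, (3, 2): 0.1, (3, 4): 0.1}

mu_x = sum(p * x for (x, _), p in joint.items())
mu_y = sum(p * y for (_, y), p in joint.items())
e_xy = sum(p * x * y for (x, y), p in joint.items())

cov = e_xy - mu_x * mu_y
var_x = sum(p * x * x for (x, _), p in joint.items()) - mu_x ** 2
var_y = sum(p * y * y for (_, y), p in joint.items()) - mu_y ** 2
rho = cov / sqrt(var_x * var_y)

print(f"mu_x = {mu_x:.2f}, mu_y = {mu_y:.2f}, Cov = {cov:.2f}, rho = {rho:.3f}")
```

The sign of the result is the point to watch: a negative covariance necessarily gives a negative correlation coefficient.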
8.10 Spectral Characteristics of Random Process
Question: Explain the Spectral Characteristics of a random process.
Answer : The spectral characteristics of a random process provide insights into how the
process behaves in the frequency domain, as opposed to the time domain. The spectral char-
acteristics of a random process primarily refer to the distribution of power or energy of the
process across different frequencies. This is typically described using Power Spectral Density
(PSD). The Power Spectral Density (PSD) is a function that provides a measure of the power
present in a signal as a function of frequency.
The Power Spectral Density $S_X(f)$ of a random process ( X(t) ) is defined as the Fourier
transform of its auto-correlation function $R_X(\tau)$: $S_X(f) = \int_{-\infty}^{+\infty} R_X(\tau) e^{-j2\pi f\tau}\,d\tau$.
$R_X(\tau) = E[X(t)X(t + \tau)]$ is the auto-correlation function.
Non-negative: The PSD is non-negative since it represents power, which cannot be negative.
Symmetry: For real-valued processes, the PSD is an even function, meaning that $S_X(-f) = S_X(f)$.
Real quantity
The total power of the signal can be found by integrating the PSD across all frequencies:
$P = \int_{-\infty}^{+\infty} S_X(f)\,df$.
Wide-Sense Stationarity: If a random process is wide-sense stationary, its PSD exists and
is a function of frequency alone, independent of time.
Frequency Content: Analyzing the PSD allows us to understand which frequencies carry
significant power in the signal, providing insights into the nature of the random process.
Band-limited Signals: If the PSD is non-zero only within a certain bandwidth, the pro-
cess is considered band-limited. This is important in telecommunications, as it helps in
designing efficient communication systems.
Signal Processing: Engineers use spectral analysis to design filters that can enhance
desired signals while reducing noise. The PSD helps in determining the effectiveness of
these filters at different frequencies.
System Design: Understanding the spectral characteristics allows for designing control
systems that can effectively deal with different frequency behaviors.
Noise Analysis: In systems like electrical circuits, the spectral characteristics of noise play
a vital role in determining system performance.
8.11 Autocorrelation and Power Spectral Density
Autocorrelation and Power Spectral Density:
From Autocorrelation to Power Spectral Density: Using the Fourier transform, you can
transform the autocorrelation function to obtain the Power Spectral Density:
$S_X(f) = \mathcal{F}\{R_X(\tau)\} = \int_{-\infty}^{\infty} R_X(\tau) e^{-j2\pi f\tau}\,d\tau$.
This relationship tells us that the PSD can be derived from the autocorrelation function
by performing a Fourier transform.
From Power Spectral Density to Autocorrelation: Conversely, you can obtain the
autocorrelation function from the Power Spectral Density using the inverse Fourier transform:
$R_X(\tau) = \mathcal{F}^{-1}\{S_X(f)\} = \int_{-\infty}^{\infty} S_X(f) e^{j2\pi f\tau}\,df$.
This means that the values of the autocorrelation function at different lags are obtained
by integrating the Power Spectral Density weighted by the complex exponential function.
Expected Value of the Square: In the relation $E[X^2(t)] = R_{XX}(0)$, the left-hand side,
$E[X^2(t)]$, represents the expected value of the square of the random variable ( X(t) ).
This measure provides information about the power or energy of the random process at
time ( t ).
Autocorrelation at Zero Lag: The right-hand side, $R_{XX}(0)$, signifies the autocorrelation
at zero lag. For a wide-sense stationary (WSS) process it is equal to the expected value
of the square of the random variable, and it equals the variance of the process when the
mean is zero.
If ( X(t) ) is stationary, and we set the mean E[X(t)] = µ, then the relationship can also
be expressed as: $R_{XX}(0) = Var(X(t)) + (E[X(t)])^2 = E[X^2(t)]$. Hence,
$Var(X(t)) = E[X^2(t)] - (E[X(t)])^2$. This emphasizes the role of $R_{XX}(0)$ in assessing both
variance and expected power. The equation $E[X^2(t)] = R_{XX}(0)$ reflects a fundamental
relationship in stochastic process analysis, indicating that the expected value of the square
of the process is linked directly to its autocorrelation at zero lag. This is crucial for
understanding the power and variance characteristics of random processes.
Problem : Find the Power Spectral Density (PSD) of the given wireless signal X(t) with
the autocorrelation function $R_{XX}(\tau) = \frac{1}{2a} e^{-a|\tau|}$, given that a = 5 kHz. From the PSD,
calculate the bandwidth (BW) required to contain 90% of the signal energy.
Solution: In wireless communication, the power outside the bandwidth should not be
transmitted.
$E[X^2(t)] = R_{XX}(0) = \frac{1}{2a}$ is the power of the signal.
Step 1: Autocorrelation Function
$R_{XX}(\tau) = \frac{1}{2a} e^{-a|\tau|}$, where a = 5 kHz
Step 2: Calculate the Power Spectral Density (PSD)
Fourier Transform of the Autocorrelation Function: $S_{XX}(f) = \int_{-\infty}^{\infty} R_{XX}(\tau) e^{-j2\pi f\tau}\,d\tau$.
$S_{XX}(f) = \int_{-\infty}^{\infty} \frac{1}{2a} e^{-a|\tau|} e^{-j2\pi f\tau}\,d\tau$.
$S_{XX}(f) = \frac{1}{2a}\left[\int_{-\infty}^{0} e^{a\tau} e^{-j2\pi f\tau}\,d\tau + \int_{0}^{\infty} e^{-a\tau} e^{-j2\pi f\tau}\,d\tau\right]$
$S_{XX}(f) = \frac{1}{2a}\left[\frac{1}{a - j2\pi f} + \frac{1}{a + j2\pi f}\right] = \frac{1}{a^2 + (2\pi f)^2}$
The total energy of the signal corresponds to the integral of the PSD over all frequencies:
$E_{total} = \int_{-\infty}^{\infty} S_{XX}(f)\,df = \frac{1}{2a}$.
For the band (−F, F) to contain 90% of this energy:
$0.9 \cdot \frac{1}{2a} = \int_{-F}^{F} \frac{1}{a^2 + (2\pi f)^2}\,df$
$0.9 \cdot \frac{1}{2a} = \frac{1}{\pi a}\tan^{-1}\left(\frac{2\pi F}{a}\right)$
∴ $\frac{2\pi F}{a} = \tan(0.45\pi) \approx 6.31$, so F ≈ 5 kHz and Bandwidth = 2F ≈ 10 kHz.
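The 90% bandwidth can be evaluated in closed form, since the fraction of energy inside (−F, F) for this PSD is (2/π)·arctan(2πF/a). The sketch below assumes a = 5 kHz is used numerically as 5000 in units of 1/s; that interpretation of the units is an assumption.

```python
from math import atan, pi, tan

a = 5e3          # decay constant; assumed to be 5000 (1/s) for a = 5 kHz
fraction = 0.9   # share of total energy the band must contain

# Energy fraction inside (-F, F) is (2/pi) * atan(2*pi*F / a); solve for F.
F = a * tan(fraction * pi / 2) / (2 * pi)
bandwidth = 2 * F

# Substitute back to confirm the band holds the requested share of energy.
check = (2 / pi) * atan(2 * pi * F / a)

print(f"F = {F:.1f} Hz, two-sided bandwidth = {bandwidth:.1f} Hz, fraction = {check:.3f}")
```

With these numbers F comes out just above 5 kHz, which is why the solution quotes a bandwidth of roughly 10 kHz.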
8.12 Ergodicity
What is ergodicity? Ergodicity is a concept from statistics and probability theory, particularly
in the context of stochastic processes and dynamical systems. In simple terms, ergodicity
connects the behavior of a system over time with its behavior over an ensemble (a collection of
all possible states).
Ergodicity tells us that observing a single system for a long time is equivalent to observing
many identical systems at one point in time. It’s crucial in areas like:
Signal processing
Statistical mechanics
Time series analysis
Define ergodicity? A process X(t) is ergodic if, for a given function f, the following holds:
$\lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} f[X(t)]\,dt = E[f(X)]$
The average of f[X(t)] over time is equal to the expected value over all possible states.
Example : A random process is given by X(t) = cos(t + ϕ), where ϕ is a random variable
uniformly distributed in (0, 2π). Show that X(t) is (i) stationary in the wide sense, (ii) ergodic
(based on the first order).
Solution: A random process is said to be stationary in the wide sense if its mean is constant
over time and its autocorrelation function depends only on the time difference $\tau = t_1 - t_2$.
$E[X(t)] = E[\cos(t + \phi)]$
Since ϕ is uniformly distributed on (0, 2π):
$E[X(t)] = \frac{1}{2\pi}\int_{0}^{2\pi} \cos(t + \phi)\,d\phi = 0$
This shows that the mean ( E[X(t)] ) is constant (zero) for all ( t ).
Verify the auto-correlation function $R_{XX}(t_1, t_2) = E[X(t_1)X(t_2)] = E[X(t)X(t + \tau)]$
$E[X(t)X(t + \tau)] = E[\cos(t + \phi)\cos(t + \tau + \phi)]$
$R_{XX}(t, t + \tau) = \frac{1}{2}\int_{0}^{2\pi} \cos(2t + \tau + 2\phi)\,\frac{1}{2\pi}\,d\phi + \frac{1}{2}\cos\tau$
∴ $R_{XX}(t, t + \tau) = \frac{1}{2}\cos\tau$
∴ The process is wide-sense stationary.
The time average is $\bar{X}_T = \frac{1}{2T}\int_{-T}^{T} X(t)\,dt = \frac{1}{2T}\int_{-T}^{T} \cos(t + \phi)\,dt$
$\bar{X}_T = \frac{1}{2T}[\sin(T + \phi) - \sin(-T + \phi)]$
$\lim_{T \to \infty} \bar{X}_T = 0$, since the sine terms lie between -1 and +1 while the factor 1/(2T) tends to
zero. Since the time average $\bar{X}_T$ is equal to the ensemble average (both zero), the process
is mean ergodic.
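The ensemble and time averages in this example can be checked numerically. The sketch below replaces the integral over ϕ with an average over an evenly spaced grid on (0, 2π); the grid size and the test values of t, τ and T are arbitrary illustrative choices.

```python
from math import cos, pi, sin

N = 1000
phis = [2 * pi * k / N for k in range(N)]  # evenly spaced phases on [0, 2*pi)

def ensemble_mean(t):
    """Approximate E[cos(t + phi)] for phi uniform on (0, 2*pi)."""
    return sum(cos(t + p) for p in phis) / N

def autocorr(t, tau):
    """Approximate E[cos(t + phi) * cos(t + tau + phi)]."""
    return sum(cos(t + p) * cos(t + tau + p) for p in phis) / N

def time_average(T, phi):
    """Closed form of (1/2T) * integral of cos(t + phi) over (-T, T)."""
    return (sin(T + phi) - sin(-T + phi)) / (2 * T)

m_check = ensemble_mean(0.7)
r_check = autocorr(0.3, 1.2)   # should match 0.5 * cos(1.2) for any t
r_shift = autocorr(5.0, 1.2)   # same lag, different t
t_check = time_average(1e6, 1.0)

print(m_check, r_check, r_shift, 0.5 * cos(1.2), t_check)
```

The two autocorrelation values agree with each other and with 0.5·cos(τ), illustrating that the result depends on the lag only, while the time average shrinks toward the ensemble mean of zero as T grows.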
Auto-Correlation: Used in time series analysis for identifying trends, seasonality, and
cyclic patterns. Commonly applied in fields such as economics, meteorology, and engi-
neering to analyze the stability and predictability of processes.
Cross-Correlation: Commonly used in signal processing to detect time delays and rela-
tionships between signals (e.g., in communication systems, physics, and audio processing).
Used in image processing to detect similar features across different images.
Assuming ( m(t) ) is a random process with certain statistical properties, it can be charac-
terized by its mean ( E[m(t)] ) and autocorrelation function Rm (τ ) = E[m(t)m(t + τ )]. For
stationary random processes, this function depends only on the time difference τ .
When transmitting the AM signal, it is subjected to various types of noise, often modeled as
additive white Gaussian noise (AWGN). The received signal ( r(t) ) can be described as:
$r(t) = s(t) + n(t) = [A + m(t)]\cos(2\pi f_c t) + n(t)$, where n(t) is the noise process, modeled as
a stationary random process with zero mean, characterized by its power spectral density $S_n(f)$.
The mean of the received signal can be calculated as:
$E[r(t)] = E[(A + m(t))\cos(2\pi f_c t)] + E[n(t)]$
Since ( n(t) ) is a zero-mean process: $E[r(t)] = (A + E[m(t)])\cos(2\pi f_c t)$. If ( m(t) ) is
also zero-mean, i.e., E[m(t)] = 0: $E[r(t)] = A\cos(2\pi f_c t)$.
The autocorrelation function of the received signal ( r(t) ) can be expressed as:
$R_r(t_1, t_2) = E[r(t_1)r(t_2)]$. Expanding this, we have:
$R_r(t_1, t_2) = E\{[(A + m(t_1))\cos(2\pi f_c t_1) + n(t_1)] \cdot [(A + m(t_2))\cos(2\pi f_c t_2) + n(t_2)]\}$
Applying the expectation, and assuming that m(t) and n(t) are independent and zero-mean, the
cross terms vanish and the autocorrelation function separates into parts relating to m(t) and n(t):
$R_r(t_1, t_2) = [A^2 + R_m(t_1, t_2)]\cos(2\pi f_c t_1)\cos(2\pi f_c t_2) + R_n(t_1, t_2)$
When the received signal ( r(t) ) is demodulated, the presence of noise affects the accuracy of
the signal recovery:
The mean square error (MSE) in demodulation can be evaluated, providing key insights
into how noise affects the received message signal. The performance of the demodulator can be
understood by analyzing the probability of error, which depends on the SNR.
Applications:
Modeling arrival times in queuing theory (e.g., customers arriving at a service counter).
Telecommunications (e.g., modeling packet arrivals or calls in a network).
Traffic flow analysis.
NOTE : Gaussian Processes are characterized by their linear relationships and normal dis-
tributions, making them suitable for noise modeling and regression analysis. Poisson Processes
are employed for modeling count events that occur independently in time or space, especially
relevant in fields dealing with random occurrences, such as telecommunications and queuing
theory.
it. This property makes Markov chains memoryless.
State Space (S): The set of all possible states that the process can occupy, denoted as
$S = \{s_1, s_2, \ldots, s_n\}$.
Transition Probabilities: The probabilities of moving from one state to another are represented
in a transition matrix P: $P_{ij} = P(X_{n+1} = s_j \mid X_n = s_i)$, where $P_{ij}$ is the probability of
transitioning from state $s_i$ to state $s_j$.
Initial State Distribution: The process starts at an initial state, described by a probability
vector $\pi = [\pi_1, \pi_2, \ldots, \pi_n]$, where $\pi_i$ is the probability of starting in state $s_i$.
The transition matrix ( P ) contains all transition probabilities. For example, for a system
with states $s_1, s_2, s_3$, the transition matrix ( P ) might look like:
$P = \begin{pmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{pmatrix}$
In this matrix, the rows indicate the current state and the columns indicate the next state.
State Transition: To describe the system moving from one state to another, the state
probability distribution at the next step can be computed as: $p_{n+1} = p_n \cdot P$, where $p_n$ is the
probability distribution over states at time ( n ).
n-step Transition: The n-step transition probabilities can be calculated by raising the
transition matrix to the power of ( n ): $P^{(n)} = P^n$,
and the probabilities can be computed as: $p_n = p_0 \cdot P^n$,
where $p_0$ is the initial state distribution.
Stationary Distribution: A stationary distribution π satisfies: $\pi P = \pi$, meaning that
the distribution does not change after applying the transition matrix.
Assumptions in Markov Process:
Finite number of states.
States are mutually exclusive.
States are collectively exhaustive.
Probability of moving from one state to another is constant over time.
Transition probability : The probability of moving from one state to another or remaining in
the same state is called the transition probability.
$P_{ij} = P(\text{next state } S_j \text{ at } t = 1 \mid \text{initial state } S_i \text{ at } t = 0)$, where i is the initial state and j is the final state.
Example : In a certain market, only two brands of refrigerator A and B are sold. Given
that a man last purchased brand A, there is 80 % chance that he would buy the same brand
in the next purchase, while if a man purchased brand B, there is 90% chance that his next
purchase would be brand B. Using this information,
Develop transition probability matrix.
Interpret the state transition matrix in terms of retention and loss as well as retention
and gain.
Draw transition diagram.
Solution
$P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} = \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \end{pmatrix}$
Retention and Loss
P11 = 0.8 = 80% retention by A
P12 = P(B next at t = 1 | A at time 0)
P12 = 0.2 = 20% loss by A (gain to B)
Retention and gain.
P21 = 0.1 = 10% loss by B (gain to A)
P22 = 0.9 = 90% retention by B
Transition Diagram
[Transition diagram: A → A with 0.8, A → B with 0.2, B → A with 0.1, B → B with 0.9]
In the transition diagram, A and B are not absorbing states because the process can move
from A to B and back again from B to A.
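The refrigerator-brand matrix above can be iterated to see the n-step and long-run behaviour. This is a minimal pure-Python sketch; the starting distribution (a buyer who currently owns brand A) and the number of iterations are illustrative assumptions.

```python
P = [[0.8, 0.2],   # row A: stay with A, switch to B
     [0.1, 0.9]]   # row B: switch to A, stay with B

def step(p, matrix):
    """One transition: row vector p multiplied by the transition matrix."""
    n = len(matrix)
    return [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(p0, matrix, n_steps):
    """Distribution after n_steps transitions, i.e. p0 * P^n."""
    p = list(p0)
    for _ in range(n_steps):
        p = step(p, matrix)
    return p

p0 = [1.0, 0.0]                           # assumed start: last purchase was brand A
p2 = distribution_after(p0, P, 2)         # after two further purchases
p_long = distribution_after(p0, P, 200)   # effectively the stationary distribution

print("after 2 purchases:", p2)
print("long run:", p_long)
```

The long-run vector satisfies πP = π; for this matrix it settles at one third for A and two thirds for B regardless of the starting brand.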
Example : The school of international studies for population found out by its survey that
the mobility of population of a state to the village, town and city is in the following percentage.
Interpret the state transition matrix in terms of retention and loss as well as retention
and gain.
Draw transition diagram.
Solution
Retention and loss
P11 = 0.5 = retention by village, P12 = 0.3 = loss by village (gain to town),
P13 = 0.2 = loss by village (gain to city),
P21 = 0.1 = loss by town (gain to village), P22 = 0.7 = retention by town,
P23 = 0.2 = loss by town (gain to city), P31 = 0.1 = loss by city (gain to village)
[Figure: transition diagram for the states village (V), town (T) and city]
The probability that the customer returns to Store A after a visit is P (A → A) = 0.8.
The probability that the customer returns to Store B after a visit is P (B → B) = 0.7.
The probability that the customer returns to Store C after a visit is P (C → C) = 0.6.
Probability of transferring from Store A to Store B: P (A → B) = 0.10
Probability of transferring from Store A to Store C: P (A → C) = 0.10
Probability of transferring from Store B to Store C: P (B → C) = 0.10
Probability of transferring from Store B to Store A: P (B → A) = 0.20
Probability of transferring from Store C to Store A: P (C → A) = 0.10
Probability of transferring from Store C to Store B: P (C → B) = 0.30
Initially 200 customers in shop A, 120 in shop B and 180 in shop C.
Solution :
∴ A1 = 0.404, B1 = 0.316, C1 = 0.280
New state : Customer in shop A = 202, Customer in shop B = 158 and Customer in shop C =
140.
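The one-step update for the three stores can be reproduced from the transition probabilities and initial customer counts listed in the problem. A short sketch follows; the row and column order A, B, C is a choice made here for illustration.

```python
# Rows: current store (A, B, C); columns: store visited next (A, B, C).
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]

counts0 = [200, 120, 180]            # initial customers in A, B, C
total = sum(counts0)
p0 = [c / total for c in counts0]    # initial state distribution

# Next-step distribution: p1[j] = sum over i of p0[i] * P[i][j].
p1 = [sum(p0[i] * P[i][j] for i in range(3)) for j in range(3)]
counts1 = [round(total * share) for share in p1]

print("shares after one step:", [round(s, 3) for s in p1])
print("customers after one step:", counts1)
```

Each row of the matrix sums to 1, so the total number of customers is preserved by the update.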
[Figure: transition diagram for stores A, B and C]
9 Queueing Theory
There are many situations in daily life when a queue is formed. For example, machines waiting
to be repaired, patients waiting in a doctor’s room, and customers waiting at counters all form
queues. A queue is formed if the service required by the customer (machine, patient, car, etc.)
is not immediately available, that is, if the current demand for a particular service exceeds the
capacity to provide the service.
Queues may be decreased in size or prevented from forming by providing additional service
facilities, which results in a drop in the profit. On the other hand, excessively long queues may
result in lost sales and lost customers. Hence the problem of interest is how to achieve a balance
between the cost associated with waiting and the cost associated with the prevention of waiting,
in order to maximize the profit. As queueing theory provides an answer to this problem, it has
become a topic of interest. Before we consider the solutions of queueing problems, we shall
consider the general framework of a queueing system [1].
Although there are many types of queueing systems, all of them can be classified and de-
scribed according to the following characteristics.
The input (or arrival) pattern: The input (arrival) pattern in queueing systems describes
how entities (such as customers, data packets, or jobs) arrive at the system over time. The
most common pattern assumed in queueing theory is the Poisson arrival process, where
arrivals occur randomly but at an average rate, and the number of arrivals in any interval
follows a Poisson distribution. This is often used because of its mathematical simplicity
and because many natural processes approximate this randomness, such as customers
arriving at a bank or calls coming into a call center.
The service mechanism in a queueing system describes how long it takes to serve each
customer or entity. When we say it follows an exponential distribution, it means that
the service times are memoryless and randomly distributed, with a constant average rate
of service. Specifically, the exponential distribution is characterized by the probability
density function: f (t) = µe−µt , t ≥ 0
where:
( t ) is the service time, µ is the service rate, representing the average number of customers
served per unit time.
This means that the likelihood of finishing service in the next instant is independent of how
long the customer has already been served, making the process "memoryless". Formally,
$P(T > s + t \mid T > s) = P(T > t)$. For example, if a machine has a 5-minute average service
time and a job has already been in service for 5 minutes, the expected remaining service
time is still 5 minutes. The exponential service pattern simplifies analysis in queueing
models like M/M/1 and M/M/c systems, where both inter-arrival and service times are
assumed to be exponentially distributed.
In an exponential distribution, the mean (or expected value) of the service or inter-arrival
times is given by:
$\text{Mean} = \frac{1}{\mu}$
Here’s why:
The parameter µ represents the rate at which events occur (for example, the number of
customers served per unit time).
The exponential distribution’s probability density function (pdf) is:
$f(t) = \mu e^{-\mu t}, \quad t \ge 0$
The expected value (mean) of the exponential distribution is derived by integrating ( t )
times the pdf over all ( t ):
$E[T] = \int_{0}^{\infty} t \cdot \mu e^{-\mu t}\,dt$
Solving this integral, you find:
$E[T] = \frac{1}{\mu}$
If µ is the rate of service (e.g., 2 customers per minute), then the average service time is
the reciprocal:
Average service time = $\frac{1}{\mu} = \frac{1}{2}$ minute. So, higher µ means faster service and lower average
service times; lower µ means slower service and higher average times.
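The mean 1/µ and the memoryless property of the exponential distribution can both be checked numerically. The sketch below uses the rate µ = 2 per minute from the example; the integration limit, step count and the elapsed times s and t are arbitrary illustrative choices.

```python
from math import exp

mu = 2.0  # service rate: customers per minute, as in the example

def survival(t, rate=mu):
    """P(T > t) for an exponential service time with the given rate."""
    return exp(-rate * t)

def numeric_mean(rate, upper=50.0, steps=200_000):
    """Trapezoidal estimate of the integral of t * rate * exp(-rate * t) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * t * rate * exp(-rate * t)
    return total * h

mean_service = numeric_mean(mu)   # should be close to 1 / mu

# Memorylessness: P(T > s + t | T > s) equals P(T > t).
s, t = 3.0, 1.5
conditional = survival(s + t) / survival(s)

print(f"numeric mean = {mean_service:.6f}, 1/mu = {1 / mu}")
print(f"P(T > s+t | T > s) = {conditional:.6f}, P(T > t) = {survival(t):.6f}")
```

The conditional and unconditional survival probabilities coincide for any choice of s, which is exactly the statement that past service time carries no information about the remaining time.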
The queue discipline: Queue discipline refers to the rules or policies that determine how
entities (customers, data packets, jobs, etc.) are selected for service in a queueing system.
It essentially defines the order in which entities are served, affecting the system’s fairness,
efficiency, and waiting times.
The most common queue discipline is First-Come, First-Served (FCFS), where en-
tities are served in the order they arrive, ensuring fairness. Other types include: Prior-
ity Queueing: Entities are served based on priority levels; higher-priority entities are
served first regardless of arrival time. Round Robin: Each entity gets a fixed time
slot in a cyclic order, often used in computer systems to share CPU time. Last-Come,
First-Served (LCFS): The most recent arrival is served next, used in some specific
scenarios like certain types of processing. Service in batches: Multiple entities are
served together as a group rather than individually. Choosing a queue discipline depends
on system goals—whether fairness, minimizing wait times, or prioritizing urgent tasks is
most important. It influences overall system performance and user satisfaction.
Standard formulas for these queueing parameters, typically in the context of an M/M/1
system:
Single server queueing systems.
Problem : Only one railway reservation concession form counter is available in the institute.
The students arrive at a counter according to a Poisson’s input process with mean rate of 30
per hour. The time required to serve a student has an exponential distribution with mean 90
seconds. Find the
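Length of the system $L_s$
Queue length $L_q$
Given data: λ = 30 students per hour; mean service time = 90 seconds, so µ = 3600/90 = 40
students per hour.
$L_s = \frac{\lambda}{\mu - \lambda} = \frac{30}{40 - 30} = 3$
$L_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{30 \times 30}{40(40 - 30)} = \frac{9}{4}$
Waiting time in queue = $W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{30}{40(40 - 30)} = \frac{3}{40}$ hours = $\frac{3}{40} \times 60 = 4.5$ min
Time spent by a customer in the system = $W_s = \frac{1}{\mu - \lambda} = \frac{1}{40 - 30} = \frac{1}{10}$ hours = 6 min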
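The M/M/1 quantities for the reservation-counter problem follow directly from λ and µ. A brief sketch using the standard M/M/1 formulas; the variable names are illustrative.

```python
lam = 30.0        # arrival rate: students per hour
mu = 3600 / 90    # service rate: one student per 90 s gives 40 per hour

assert lam < mu, "the queue is stable only when arrivals are slower than service"

rho = lam / mu                        # server utilisation
L_s = lam / (mu - lam)                # mean number of students in the system
L_q = lam ** 2 / (mu * (mu - lam))    # mean number waiting in the queue
W_s = 1 / (mu - lam)                  # mean time in the system, hours
W_q = lam / (mu * (mu - lam))         # mean waiting time in the queue, hours

print(f"utilisation = {rho:.2f}, L_s = {L_s:.2f}, L_q = {L_q:.2f}")
print(f"W_s = {W_s * 60:.1f} min, W_q = {W_q * 60:.1f} min")
```

The results are consistent with Little’s law, L = λW, for both the system and the queue.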
References
[1] T Veerarajan, ”Probability, Statistics and Random Processes”, second edition, Tata
McGraw-Hill, 2003.
[2] Peter Zörnig, ”Probability Theory and Statistics with Real World Applications” , Deutsche
National bibliothek, 2024.
[3] Marcello Pagano, Kimberlee Gauvreau, ”Principles of Biostatistics”, 2nd edition, Duxbury
Thomson Learning, 2000.
[4] Scott Miller, Donald Childers, ”Probability and Random Processes”, 2nd edition, Elsevier,
2012.
[5] Murray R Spiegel, ”Schaum’s Outline of Theory and Problems of Statistics”, 2nd edition,
McGraw Hill, 1992.
[6] Ronald E Walpole, Raymond H. Myers, Sharon L Myers, Keying E. Ye, ”Probability and
Statistics for Engineers and Scientists”, 9th edition, Pearson, 2022.
[7] Henry Stark, John W Woods, ”Probability and Random Processes with Applications to
Signal Processing”, 3rd edition, Pearson, 2012.
[9] Feller, ”An Introduction to Probability Theory and Its Applications”, 2nd edition, Wiley,
1970.
[10] Sheldon M Ross, ”Probability Models”, 6th edition, Harcourt Asia PTE LTD, 2000.
[11] Richard A Johnson, ”Miller & Freund’s Probability and Statistics for Engineers”, 6th
edition, Pearson, 2001.
[12] Alberto Leon-Garcia, ”Probability and Random Processes for Electrical Engineering”, 2nd
edition, Pearson, 2009.
[13] J Susan Milton, Jesse C. Arnold ”Introduction to Probability and Statistics”, 4th edition,
McGrawHill, 2014.