
Notes On PAS

The document is a comprehensive set of notes on Probability and Stochastic Processes, intended for second-year Electronics and Telecommunication students. It covers various topics including set theory, probability concepts, random variables, statistics, and random processes, along with numerous examples and applications. The notes are authored by Dr. Amol Deshpande and are part of the curriculum at Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, University of Mumbai.
Copyright: © All Rights Reserved

Notes on Probability and Stochastic Processes

Second Year Electronics and Telecommunication

by

Dr Amol Deshpande
Govind Tukram Haldankar

Electronics Engineering
Bharatiya Vidya Bhavan’s
Sardar Patel Institute of Technology
Munshi Nagar, Andheri(W), Mumbai-400058
University of Mumbai
January 2026
Contents
1 Probability and Stochastic Processes
1.1 Set theory
1.1.1 Types of Sets:
1.1.2 Operations on Sets:
1.1.3 Relationships between Sets:
1.2 Probability
1.3 Key Concepts in Probability:
1.4 Random Variable:
1.4.1 Discrete Random Variable:
1.4.2 Numerical on Discrete Random Variable
1.5 Continuous Random Variable
1.6 Example:
1.7 Types of Probability:
1.8 Inclusive and mutually exclusive probability:
1.8.1 Inclusive probability:
1.8.2 Mutually exclusive events
1.8.3 Examples of Mutually Exclusive Events:
1.8.4 Non mutually exclusive events
1.8.5 Example of Non Mutually exclusive events
1.8.6 Brackets and their meanings
1.8.7 Independent events
1.8.8 Example of Independent Events:
1.8.9 Example of dependent Events:
1.9 Conditional probability
1.10 Bayes’ Theorem:
1.11 Gamma function:
1.11.1 Properties of the Gamma Function:

2 Statistics
2.1 Key Components of Statistics:
2.2 Applications of Statistics:
2.3 Descriptive statistics:

3 Random number
3.1 Discrete Random Variables
3.2 Probability Mass Function
3.3 Example on Probability Mass Function
3.4 Usage in Discrete Probability Distributions
3.5 Binomial distribution:
3.6 Example on Binomial Distribution:
3.7 Poisson’s distribution:
3.8 Continuous Random Variables:
3.8.1 Uniform Random Variable
3.8.2 Exponential Random Variable:
3.9 Comparison between exponential distribution and Poisson’s distribution
3.9.1 A Laplace random variable:
3.10 Random Process:
3.11 Types of Random Processes:
3.12 Examples of Random Processes:

4 Multiple Random Variables
4.1 Example:
4.2 Two dimension Random Variables
4.3 Example:
4.4 Example:
4.4.1 Joint PDF and Marginal PDF:
4.5 Marginal Distribution Function:
4.6 Marginal Probability Mass Function (Discrete Case)
4.7 Example:
4.7.1 The marginal probability density function (PDF)
4.7.2 Example:
4.7.3 Example:
4.7.4 Example:
4.7.5 Example:
4.7.6 Example:
4.7.7 Example:
4.7.8 Example:
4.7.9 Example:
4.7.10 Example:
4.7.11 Example:
4.7.12 Example:

5 Normal distribution:
5.1 Probability distribution of a binomial random variable
5.2 Probability distribution of a binomial random variable
5.3 Gaussian distribution:
5.4 Question
5.5 Normal Approximations of the Binomial Distribution
5.6 Probability Histogram for Binomial Distribution
5.7 Example:
5.8 Example:
5.9 Example:
5.10 Moments

6 Series expansion

7 Characteristic function

8 Random Processes:
8.1 Ensemble averages:
8.2 What is the definition of a stationary random process?
8.3 Invariant under the translation of time period
8.4 Stationarity
8.5 Stochastic Convergence
8.6 Bounds of Probabilities
8.7 Law of Large Numbers
8.8 Central Limit Theorem (CLT)
8.9 Covariance and Correlation
8.10 Spectral Characteristics of Random Process
8.11 Autocorrelation and Power Spectral Density
8.12 Ergodicity
8.13 Auto-Correlation and Cross-Correlation Functions
8.14 Impulse Response in LTI Systems
8.15 Applications in Noise Analysis
8.16 Gaussian and Poisson random processes
8.17 Markov Chains

9 Queueing Theory

1 Probability and Stochastic Processes
1.1 Set theory
Set theory is a branch of mathematical logic that studies collections of objects, called sets. It
provides a fundamental framework for various mathematical concepts and structures. Here are
some key points to explain set theory:

• A set is a collection of distinct objects, considered as a whole. The objects in a set are
called elements or members.

• Sets are typically denoted using curly braces. For example, \( A = \{1, 2, 3\} \) is a set
containing the numbers 1, 2, and 3.

1.1.1 Types of Sets:

• Finite Set: A set with a limited number of elements, e.g., \( \{a, b, c\} \).
• Infinite Set: A set with an unlimited number of elements, e.g., the set of all natural
numbers \( \{1, 2, 3, \ldots\} \).

• Empty Set: A set with no elements, denoted \( \emptyset \) or \( \{\} \).


1.1.2 Operations on Sets:
• Union: The set containing all elements that belong to at least one of the two sets, \( A \cup B \).
• Intersection: The set containing only the elements common to both sets, \( A \cap B \).
• Difference: The set of elements of \( A \) that are not in \( B \), \( A - B \).
• Complement: The set of all elements not in the given set, relative to a universal set.

1. Union: The union of two sets \( A \) and \( B \) is the set that contains all elements from
both sets.
Example: Let \( A = \{1, 2, 3\} \) and \( B = \{3, 4, 5\} \).
The union is \( A \cup B = \{1, 2, 3, 4, 5\} \).

2. Intersection: The intersection of two sets \( A \) and \( B \) is the set that contains only the
elements that are common to both sets.
Example: Using the same sets \( A \) and \( B \), the intersection is \( A \cap B = \{3\} \).

3. Complement: The complement of a set \( A \) consists of all the elements of a universal set
\( U \) that are not in \( A \).
Example: Let \( U = \{1, 2, 3, 4, 5, 6\} \) (the universal set) and \( A = \{2, 3\} \). The
complement of \( A \), denoted \( A' \) or \( \bar{A} \), is \( A' = U - A = \{1, 4, 5, 6\} \).
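
The operations above map directly onto Python's built-in set type. The following is a minimal sketch that reuses the sets from the examples; the variable names (`A2`, `difference`, and so on) are chosen here for illustration only.

```python
# Sets from the union and intersection examples
A = {1, 2, 3}
B = {3, 4, 5}

union = A | B            # elements in A or in B
intersection = A & B     # elements in both A and B
difference = A - B       # elements of A that are not in B

# Complement example: taken relative to a universal set U
U = {1, 2, 3, 4, 5, 6}
A2 = {2, 3}
complement = U - A2      # elements of U that are not in A2

print(union, intersection, difference, complement)
```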

1.1.3 Relationships between Sets:
• Subset: \( A \) is a subset of \( B \) if all elements of \( A \) are also in \( B \), denoted \( A \subseteq B \).
• Proper Subset: \( A \) is a proper subset of \( B \) if \( A \subseteq B \) and \( A \neq B \), denoted \( A \subset B \).
• Equal Sets: \( A \) and \( B \) are equal if they contain exactly the same elements, denoted \( A = B \).
• Non-Subset: \( C \) is not a subset of \( B \) if at least one element of \( C \) is not in \( B \), denoted \( C \nsubseteq B \).
Applications: Set theory forms the basis for various other mathematical disciplines, such as
topology, abstract algebra, and measure theory. It is essential for defining functions, relations,
cardinal numbers, and more.
Set theory helps to establish a precise language and framework for mathematics, making it
foundational to the field.

1.2 Probability
Probability is a branch of mathematics that measures the likelihood or chance of an event
occurring. It quantifies uncertainty and is expressed as a number between 0 and 1, where:

• A probability of 0 indicates that an event is impossible (it will not occur).

• A probability of 1 indicates that an event is certain (it will always occur).
1.3 Key Concepts in Probability:
• Experiment: An action or process that leads to an outcome. For example, rolling a die
or flipping a coin.

• Outcome: The result of a single trial of an experiment. For example, getting a 4 when
rolling a die.

• Event: A specific outcome or a set of outcomes from an experiment. For example, the
event of rolling an even number on a die includes the outcomes \( \{2, 4, 6\} \).

• Sample Space: The set of all possible outcomes of an experiment. For example, when
rolling a die, the sample space is \( S = \{1, 2, 3, 4, 5, 6\} \).

• Probability of an Event: When all outcomes are equally likely, the probability \( P(E) \) of
an event \( E \) is the ratio of the number of favorable outcomes to the total number of
outcomes in the sample space:
\[ P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} \]

Numerical 1: Each of the 5 relays in the circuit shown below closes with probability \( p \)
(we say that a relay is closed when current can flow). If all relays function independently,
what is the probability that the lamp lights?

Solution: The lamp lights if there exists at least one conducting path from the supply to
the lamp through the relay network.
We analyze the circuit by conditioning on the state of relay \( R_3 \).

Case 1: Relay \( R_3 \) is open

When \( R_3 \) is open, the circuit reduces to two parallel branches:

• Top branch: \( R_1 \) and \( R_2 \) in series

• Bottom branch: \( R_4 \) and \( R_5 \) in series

The probability that the top branch conducts is
\[ P(A) = P(R_1 \cap R_2) = p \times p = p^2 \]
Similarly, the probability that the bottom branch conducts is
\[ P(B) = P(R_4 \cap R_5) = p \times p = p^2 \]
The lamp lights if at least one branch conducts:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Since the relays are independent,
\[ P(A \cap B) = p^4 \]
Therefore,
\[ P(A \cup B) = p^2 + p^2 - p^4 = 2p^2 - p^4 \]
Since \( R_3 \) is open with probability \( 1 - p \), the contribution of this case is
\[ P_1 = (1 - p)(2p^2 - p^4) \]

Case 2: Relay \( R_3 \) is closed

If \( R_3 \) is closed, the middle connection links the two branches.
The circuit conducts if

• at least one of \( R_1 \) or \( R_4 \) is closed (left side), and

• at least one of \( R_2 \) or \( R_5 \) is closed (right side).

Probability that the left side conducts:
\[ P_L = 1 - (1 - p)^2 \]
Probability that the right side conducts:
\[ P_R = 1 - (1 - p)^2 \]
Since these events are independent,
\[ P_L \times P_R = \left[1 - (1 - p)^2\right]^2 \]
Since \( R_3 \) must also be closed (probability \( p \)),
\[ P_2 = p\left[1 - (1 - p)^2\right]^2 \]

Total Probability

The required probability that the lamp lights is
\[ P = P_1 + P_2 = (1 - p)(2p^2 - p^4) + p\left[1 - (1 - p)^2\right]^2 \]
Simplifying,
\[ P = 2p^2 + 2p^3 - 5p^4 + 2p^5 \]

Final Result
\[ P(\text{Lamp lights}) = 2p^2 + 2p^3 - 5p^4 + 2p^5 \]
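
The result can be cross-checked by brute force. The sketch below enumerates all 32 relay states and sums the probabilities of the states in which the lamp lights. The circuit figure is not reproduced here, so the bridge layout in the code (R1 and R4 on the left, R2 and R5 on the right, R3 linking the two branches) is an assumption inferred from the two cases of the solution; the function names are illustrative.

```python
from itertools import product

def lamp_probability(p):
    """Probability that the lamp lights, by enumerating all 2^5 relay states."""
    total = 0.0
    for r1, r2, r3, r4, r5 in product([0, 1], repeat=5):
        # Conducting paths of the assumed bridge: R1-R2, R4-R5, R1-R3-R5, R4-R3-R2
        lit = (r1 and r2) or (r4 and r5) or (r1 and r3 and r5) or (r4 and r3 and r2)
        if lit:
            closed = r1 + r2 + r3 + r4 + r5
            total += p**closed * (1 - p)**(5 - closed)
    return total

def closed_form(p):
    """Polynomial obtained in the solution above."""
    return 2*p**2 + 2*p**3 - 5*p**4 + 2*p**5

for p in (0.1, 0.5, 0.9):
    print(p, lamp_probability(p), closed_form(p))
```

The two columns should agree up to floating-point rounding for any value of p between 0 and 1.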

1.4 Random Variable:


A random variable is a function that assigns a real number to each outcome of a random
experiment. Random variables are broadly classified into two types: discrete and continuous.

1.4.1 Discrete Random Variable:
A discrete random variable takes a finite or countably infinite number of distinct values.
Let \( X \) be a discrete random variable that can take values \( x_1, x_2, x_3, \ldots \).
The probability mass function (PMF) of \( X \) is defined as
\[ P(X = x_i) = p(x_i), \quad i = 1, 2, 3, \ldots \]
The PMF satisfies the following properties:
\[ p(x_i) \ge 0 \quad \forall i \]
\[ \sum_{i=1}^{\infty} p(x_i) = 1 \]

The expectation (mean) of a discrete random variable is given by
\[ E[X] = \sum_{i=1}^{\infty} x_i \, p(x_i) \]

The variance of a discrete random variable is
\[ \mathrm{Var}(X) = \sum_{i=1}^{\infty} (x_i - \mu)^2 \, p(x_i) \]
where \( \mu = E[X] \).

Continuous Random Variable:

A continuous random variable can take any value in a given interval of the real line.
Let \( X \) be a continuous random variable with probability density function (PDF) \( f_X(x) \).
The PDF satisfies the following properties:
\[ f_X(x) \ge 0 \quad \forall x \]
\[ \int_{-\infty}^{\infty} f_X(x) \, dx = 1 \]
For a continuous random variable, the probability that \( X \) lies between \( a \) and \( b \) is
\[ P(a \le X \le b) = \int_{a}^{b} f_X(x) \, dx \]

The expectation (mean) of a continuous random variable is given by
\[ E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx \]

The variance of a continuous random variable is
\[ \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x) \, dx \]
where \( \mu = E[X] \).

1.4.2 Numerical on Discrete Random Variable
A random variable \( X \) represents the number obtained when a fair die is thrown once.

(a) Find the probability mass function of \( X \)

(b) Find the mean and variance of \( X \)

Solution:
Since the die is fair, \( X \) takes the values 1, 2, 3, 4, 5, 6 with
\[ P(X = x) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6 \]

Mean:
\[ E[X] = \sum_{x=1}^{6} x \, P(X = x) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{21}{6} = 3.5 \]

Second moment:
\[ E[X^2] = \sum_{x=1}^{6} x^2 \, P(X = x) = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6} \]

Variance:
\[ \mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12} \]

\[ E[X] = 3.5, \quad \mathrm{Var}(X) = \frac{35}{12} \]
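
The same computation can be carried out exactly with rational arithmetic. This is a small sketch, assuming the PMF is stored as a dictionary; the helper name `mean_and_variance` is introduced here for illustration.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """pmf: dict mapping each value x to P(X = x), given as exact Fractions."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x**2 * p for x, p in pmf.items())
    return mean, second_moment - mean**2   # Var(X) = E[X^2] - (E[X])^2

die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean, var = mean_and_variance(die_pmf)
print(mean, var)  # 7/2 35/12
```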

Qu 1: Suppose that each of 3 sticks is broken into one long and one short part. The 6 parts
are arranged into 3 pairs from which new sticks are formed. Determine the probability that
the parts will be joined in the original order.
Solution: Label the parts \( L_1, L_2, L_3 \) and \( S_1, S_2, S_3 \).
The correct (original) pairing is \( (L_1, S_1), (L_2, S_2), (L_3, S_3) \).
Total number of ways to pair:
This solution assumes that each new stick is formed from one long part and one short part.
The number of ways to match 3 long parts with 3 short parts is \( 3! = 6 \).
Favorable outcomes:
Only one arrangement gives the original pairing \( (L_1, S_1), (L_2, S_2), (L_3, S_3) \),
so there is 1 favorable case.
Probability:
\[ P = \frac{\text{Favorable outcomes}}{\text{Total outcomes}} = \frac{1}{3!} = \frac{1}{6} \]
Final Answer: \( \frac{1}{6} \)
Note: If the 6 parts may be paired without restriction (long with long and short with short
allowed), there are \( 5 \times 3 \times 1 = 15 \) possible pairings, and the probability of
recovering the original order is \( \frac{1}{15} \).
Problem: A product is manufactured by two factories A and B. 80% of the product is
manufactured in factory A and 20% in factory B. 30% of the products from A are defective,
while 10% from B are defective.
If a randomly selected product from the market is found to be defective, find the probability
that it was manufactured by factory A.
Solution:
Let \( P(A) = 0.8 \), \( P(B) = 0.2 \), \( P(D \mid A) = 0.3 \), \( P(D \mid B) = 0.1 \).
We need to find \( P(A \mid D) \).
Using Bayes’ Theorem:
\[ P(A \mid D) = \frac{P(A) \, P(D \mid A)}{P(A) \, P(D \mid A) + P(B) \, P(D \mid B)} \]
Substituting values:
\[ P(A \mid D) = \frac{0.8 \times 0.3}{(0.8 \times 0.3) + (0.2 \times 0.1)} = \frac{0.24}{0.24 + 0.02} = \frac{0.24}{0.26} = \frac{12}{13} \]
\[ P(A \mid D) = \frac{12}{13} \approx 0.9231 \]
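
The substitution above can be expressed as a small function. This is a sketch only; `posterior` and the dictionary layout are illustrative names, not part of the notes.

```python
def posterior(priors, likelihoods, k):
    """P(A_k | D) from the priors P(A_i) and the likelihoods P(D | A_i)."""
    evidence = sum(priors[i] * likelihoods[i] for i in priors)  # total probability P(D)
    return priors[k] * likelihoods[k] / evidence

priors = {"A": 0.8, "B": 0.2}        # share of production per factory
likelihoods = {"A": 0.3, "B": 0.1}   # defect rate per factory
print(round(posterior(priors, likelihoods, "A"), 4))  # 0.9231
```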

1.5 Continuous Random Variable


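A continuous random variable \( X \) has probability density function
\[ f_X(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]

(a) Verify that \( f_X(x) \) is a valid PDF

(b) Find the mean of \( X \)

Solution:

(a) Verification:
\[ \int_{-\infty}^{\infty} f_X(x) \, dx = \int_{0}^{1} 2x \, dx = \left[ x^2 \right]_0^1 = 1 \]
Since \( f_X(x) \ge 0 \) everywhere and it integrates to 1, \( f_X(x) \) is a valid PDF.

(b) Mean:
\[ E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int_{0}^{1} x(2x) \, dx = 2 \int_{0}^{1} x^2 \, dx = 2 \left[ \frac{x^3}{3} \right]_0^1 = \frac{2}{3} \]

\[ E[X] = \frac{2}{3} \]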
Example 2 (Discrete Random Variable):
Let \( X \) be the number obtained when a fair die is thrown once.
The mean of \( X \) is \( E[X] = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5 \).
The second moment is \( E[X^2] = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6} \).
Therefore, the variance is \( \mathrm{Var}(X) = \frac{91}{6} - (3.5)^2 = \frac{35}{12} \).

Example 3 (Continuous Random Variable):

Let a continuous random variable \( X \) have the probability density function \( f(x) = 2x \) for
\( 0 \le x \le 1 \).
The mean of \( X \) is \( E[X] = \int_0^1 x(2x) \, dx = \frac{2}{3} \).
The second moment is \( E[X^2] = \int_0^1 x^2 (2x) \, dx = \frac{1}{2} \).
Hence, the variance is \( \mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{1}{2} - \left(\frac{2}{3}\right)^2 = \frac{1}{18} \).

Example 4 (Continuous Random Variable):

Let \( X \) be uniformly distributed over the interval \( [0, 1] \).
The mean is \( E[X] = \int_0^1 x \, dx = \frac{1}{2} \).
The second moment is \( E[X^2] = \int_0^1 x^2 \, dx = \frac{1}{3} \).
Therefore, the variance is \( \mathrm{Var}(X) = \frac{1}{3} - \left(\frac{1}{2}\right)^2 = \frac{1}{12} \).
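
The integrals in the two continuous examples can be checked numerically. The sketch below approximates the mean and variance with a midpoint Riemann sum; the function name `moments` and the number of subintervals are arbitrary choices made for illustration.

```python
def moments(pdf, a, b, n=100_000):
    """Approximate the mean and variance of a density on [a, b] by the midpoint rule."""
    h = (b - a) / n
    xs = [a + (i + 0.5) * h for i in range(n)]      # midpoints of the subintervals
    mean = sum(x * pdf(x) for x in xs) * h          # E[X]
    second = sum(x * x * pdf(x) for x in xs) * h    # E[X^2]
    return mean, second - mean**2

print(moments(lambda x: 2 * x, 0.0, 1.0))   # close to (2/3, 1/18)
print(moments(lambda x: 1.0, 0.0, 1.0))     # close to (1/2, 1/12)
```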

Example 1:
The distribution function of a continuous random variable \( X \) is given by
\( F(x) = 1 - e^{-x},\ x \ge 0 \).
Find the probability density function of \( X \).
Solution:
The probability density function is the derivative of the distribution function:
\[ f(x) = \frac{d}{dx} F(x) = \frac{d}{dx}\left(1 - e^{-x}\right) = e^{-x}, \quad x \ge 0. \]
Hence, the probability density function is \( f(x) = e^{-x},\ x \ge 0 \).

Example 2:
The distribution function of a random variable \( X \) is defined as
\[ F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases} \]
Find the corresponding probability density function.
Solution:
The probability density function is obtained by differentiating \( F(x) \).
For \( 0 \le x \le 1 \),
\[ f(x) = \frac{d}{dx}(x^2) = 2x. \]
Thus, the probability density function is
\[ f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]

Example 3:
The distribution function of a random variable \( X \) is
\[ F(x) = \begin{cases} 0, & x < 1 \\ \dfrac{x - 1}{3}, & 1 \le x \le 4 \\ 1, & x > 4 \end{cases} \]
Find \( P(2 \le X \le 3) \) and the probability density function.
Solution:
Using the distribution function,
\[ P(2 \le X \le 3) = F(3) - F(2). \]
\[ F(3) = \frac{3 - 1}{3} = \frac{2}{3}, \qquad F(2) = \frac{2 - 1}{3} = \frac{1}{3} \]
Hence,
\[ P(2 \le X \le 3) = \frac{2}{3} - \frac{1}{3} = \frac{1}{3}. \]
The probability density function is
\[ f(x) = \frac{d}{dx} F(x) = \frac{1}{3}, \quad 1 \le x \le 4. \]

Example 4:
Let the distribution function of \( X \) be \( F(x) = \frac{x}{5} \) for \( 0 \le x \le 5 \).
Find the probability density function and verify that it integrates to unity.
Solution:
The probability density function is
\[ f(x) = \frac{d}{dx} F(x) = \frac{1}{5}, \quad 0 \le x \le 5. \]
Verification:
\[ \int_0^5 f(x) \, dx = \int_0^5 \frac{1}{5} \, dx = \frac{1}{5}(5) = 1. \]
Hence, \( f(x) \) is a valid probability density function.

For a continuous random variable, the probability density function is the derivative of the
distribution function, i.e., \( f(x) = \frac{d}{dx} F(x) \).
Example 1:
A discrete random variable \( X \) takes the values 0, 1, 2 with probabilities
\( P(X = 0) = 0.2 \), \( P(X = 1) = 0.5 \), \( P(X = 2) = 0.3 \).
Find the distribution function \( F(x) \).
Solution:
The distribution function is defined as \( F(x) = P(X \le x) \).
For \( x < 0 \), \( F(x) = 0 \).
For \( 0 \le x < 1 \), \( F(x) = P(X = 0) = 0.2 \).
For \( 1 \le x < 2 \), \( F(x) = P(X = 0) + P(X = 1) = 0.2 + 0.5 = 0.7 \).
For \( x \ge 2 \), \( F(x) = P(X = 0) + P(X = 1) + P(X = 2) = 1 \).
Hence,
\[ F(x) = \begin{cases} 0, & x < 0 \\ 0.2, & 0 \le x < 1 \\ 0.7, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]

Example 2:
The distribution function of a discrete random variable \( X \) is given by
\[ F(x) = \begin{cases} 0, & x < 1 \\ 0.3, & 1 \le x < 2 \\ 0.8, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases} \]
Find \( P(X = 2) \).
Solution:
For a discrete random variable, \( P(X = x) = F(x) - F(x^-) \), where \( F(x^-) \) is the
value of \( F \) just to the left of \( x \).
Therefore,
\[ P(X = 2) = F(2) - F(2^-) = 0.8 - 0.3 = 0.5. \]

Example 3:
The distribution function of \( X \) is
\[ F(x) = \begin{cases} 0, & x < 0 \\ 0.4, & 0 \le x < 1 \\ 0.6, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]
Find the probability mass function of \( X \).
Solution:
\( P(X = 0) = F(0) - F(0^-) = 0.4 - 0 = 0.4 \).
\( P(X = 1) = F(1) - F(1^-) = 0.6 - 0.4 = 0.2 \).
\( P(X = 2) = F(2) - F(2^-) = 1 - 0.6 = 0.4 \).
Hence, the probability mass function is
\[ P(X = x) = \begin{cases} 0.4, & x = 0 \\ 0.2, & x = 1 \\ 0.4, & x = 2 \end{cases} \]

Example 4:
A function \( F(x) \) is defined as
\[ F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{1}{4}, & 0 \le x < 1 \\ \dfrac{3}{4}, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]
Verify whether \( F(x) \) is a valid distribution function.
Solution:
The given function satisfies the following properties: \( F(x) \) is non-decreasing and
right-continuous, \( \lim_{x \to -\infty} F(x) = 0 \), and \( \lim_{x \to \infty} F(x) = 1 \).
Hence, \( F(x) \) is a valid distribution function.

For a discrete random variable, the distribution function is a step function and the
probability mass function is obtained using \( P(X = x) = F(x) - F(x^-) \).

The cumulative distribution function (CDF) of a random variable \( X \) is defined as
\( F(x) = P(X \le x) \) and it is defined for both discrete and continuous random variables.
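
The jump rule P(X = x) = F(x) − F(x⁻) can be sketched in a few lines. The representation of a step CDF as a list of (jump point, value after the jump) pairs and the name `pmf_from_cdf` are assumptions made for this illustration.

```python
def pmf_from_cdf(jumps):
    """jumps: list of (x, F(x)) at the jump points, in increasing order of x.
    Returns {x: P(X = x)} using P(X = x) = F(x) - F(x-)."""
    pmf, previous = {}, 0.0
    for x, F in jumps:
        pmf[x] = round(F - previous, 10)  # rounding removes floating-point noise
        previous = F
    return pmf

# Step CDF with values 0.4, 0.6 and 1 after the jumps at x = 0, 1, 2
print(pmf_from_cdf([(0, 0.4), (1, 0.6), (2, 1.0)]))
# {0: 0.4, 1: 0.2, 2: 0.4}
```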

1.6 Example:
Example 1: If you roll a fair six-sided die:
The sample space is \( S = \{1, 2, 3, 4, 5, 6\} \).
If you want to calculate the probability of rolling a 3, there is 1 favorable outcome (rolling a 3)
out of 6 possible outcomes:
\[ P(\text{rolling a 3}) = \frac{1}{6} \]

Example 2: A die is rolled twice. What is the probability that the sum is 9 or 11?
Solution:
Step 1: Total number of outcomes
Each die has 6 faces, so the total number of outcomes when two dice are rolled is
\[ 6 \times 6 = 36 \]

Step 2: Favorable outcomes

(i) Sum = 9: (3, 6), (4, 5), (5, 4), (6, 3)
Number of favorable outcomes = 4
(ii) Sum = 11: (5, 6), (6, 5)
Number of favorable outcomes = 2

Step 3: Total favorable outcomes
\[ 4 + 2 = 6 \]

Step 4: Probability
\[ P(\text{sum} = 9 \text{ or } 11) = \frac{\text{favorable outcomes}}{\text{total outcomes}} = \frac{6}{36} = \frac{1}{6} \]
Answer: \( \frac{1}{6} \)
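
The counting in Example 2 can be reproduced by listing all 36 ordered outcomes. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))         # all (first roll, second roll) pairs
favorable = [o for o in outcomes if sum(o) in (9, 11)]  # sum is 9 or 11
prob = Fraction(len(favorable), len(outcomes))
print(len(favorable), prob)  # 6 1/6
```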

Independent events:
Two events are said to be independent if the occurrence of one does not affect the
probability of the other. Mathematically, for independent events \( A \) and \( B \):
\[ P(A \cap B) = P(A) \cdot P(B) \]

Let \( R_i \) denote “rain on day \( i \)”, \( i = 1, 2, 3, 4, 5 \). The days are independent and
\[ P(R_i) = 0.2 \]
Then, the probability that it rains on all five days is
\[ P(R_1 \cap R_2 \cap R_3 \cap R_4 \cap R_5) = P(R_1) \cdot P(R_2) \cdot P(R_3) \cdot P(R_4) \cdot P(R_5) = 0.2^5 = 0.00032 \]
The probability decreases rapidly as more independent events are required to
occur simultaneously.

Example 3: The probability that it rains in a day is 0.2. What is the probability that
there will be rain in the next five days (all five days)?
Solution:
Let the probability of rain on a single day be
\[ P(\text{rain}) = 0.2 \]
The probability that it rains on all 5 consecutive days (assuming independence) is
\[ P(\text{rain on all 5 days}) = P(\text{rain})^5 = (0.2)^5 = 0.00032 \]
Answer: 0.00032

Example 4: A mall has the following facilities on various floors:

Place            Floor
Movie theater    First
Restaurant       First
Restaurant       Second
Garment          Second
Movie theater    Second
Restaurant       Third
Garment          Third

If a customer is chosen at random, what is the probability that the customer is going to a
movie theater or to the third floor?

Solution:
Step 1: Total number of facilities
There are 7 facilities in total, each treated as equally likely:
\[ n = 7 \]
Step 2: Identify favorable outcomes
- Movie theaters: (Movie theater, First), (Movie theater, Second) ⇒ 2 outcomes
- Third floor facilities: (Restaurant, Third), (Garment, Third) ⇒ 2 outcomes
- Overlap (movie theater on third floor): none, so overlap = 0

Step 3: Use the formula for union of events

Let \( A \) = going to a movie theater and \( B \) = going to the third floor. Then:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Substitute the values:
\[ P(A) = \frac{2}{7}, \quad P(B) = \frac{2}{7}, \quad P(A \cap B) = 0 \]
\[ P(A \cup B) = \frac{2}{7} + \frac{2}{7} - 0 = \frac{4}{7} \]

Answer: \( \frac{4}{7} \)

Example 5: A bag contains 4 white balls and 3 black balls. Two balls are drawn at
random. Find the probability of drawing one white and one black ball (not necessarily in that
order):
1. Without replacement

2. With replacement
Solution:
Step 1: Total balls in the bag
\[ \text{Total balls} = 4 + 3 = 7 \]

Case 1: Without replacement
Consider the two possible orders: white first then black, or black first then white.
(i) White first, black second:
\[ P(W \text{ then } B) = \frac{4}{7} \cdot \frac{3}{6} = \frac{12}{42} = \frac{2}{7} \]
(ii) Black first, white second:
\[ P(B \text{ then } W) = \frac{3}{7} \cdot \frac{4}{6} = \frac{12}{42} = \frac{2}{7} \]
Total probability (without replacement):
\[ P(\text{one white and one black}) = \frac{2}{7} + \frac{2}{7} = \frac{4}{7} \]

Case 2: With replacement
Here, after drawing the first ball, it is put back into the bag, so the probabilities for each
draw remain the same.
(i) White first, black second:
\[ P(W \text{ then } B) = \frac{4}{7} \cdot \frac{3}{7} = \frac{12}{49} \]
(ii) Black first, white second:
\[ P(B \text{ then } W) = \frac{3}{7} \cdot \frac{4}{7} = \frac{12}{49} \]
Total probability (with replacement):
\[ P(\text{one white and one black}) = \frac{12}{49} + \frac{12}{49} = \frac{24}{49} \]

Answer:
Without replacement: \( \frac{4}{7} \). With replacement: \( \frac{24}{49} \).
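
Both cases can be verified by enumerating ordered draws. In this sketch the balls are treated as distinguishable positions in a list, so every ordered draw is equally likely; the helper name `prob_one_of_each` is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["W"] * 4 + ["B"] * 3

def prob_one_of_each(draws):
    """Fraction of equally likely ordered draws that contain one white and one black ball."""
    draws = list(draws)
    hits = sum(1 for a, b in draws if a != b)
    return Fraction(hits, len(draws))

without = prob_one_of_each(permutations(balls, 2))       # second ball differs from the first
with_repl = prob_one_of_each(product(balls, repeat=2))   # first ball is put back
print(without, with_repl)  # 4/7 24/49
```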

Numerical 5: Five distinct numbers are randomly given to persons 1 to 5. Persons 1 and
2 compare their numbers and the winner is the one having the smaller number. The winner
compares with person 3 and so on. What is the probability that person 1 wins exactly 2 times?
Solution: Let the numbers assigned to the persons be \( N_1, N_2, N_3, N_4, N_5 \), which are
the distinct numbers from 1 to 5 in random order.
Person 1 wins exactly 2 times if and only if:

1. Person 1 beats person 2 (\( N_1 < N_2 \))

2. Person 1 beats person 3 (\( N_1 < N_3 \))

3. Person 1 loses to person 4 (\( N_1 > N_4 \))

Step 1: Count total permutations

There are 5 distinct numbers, so the total number of permutations is
\[ 5! = 120 \]

Step 2: Count favorable outcomes
\( N_1 \) must be smaller than \( N_2 \) and \( N_3 \), but bigger than \( N_4 \). Among the four
numbers \( N_1, \ldots, N_4 \), this means \( N_4 \) is the smallest and \( N_1 \) is the second
smallest. \( N_4 \) is the smallest of the four with probability \( \frac{1}{4} \), and \( N_1 \) is
then the smallest of the remaining three with probability \( \frac{1}{3} \), so a fraction
\( \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{12} \) of the 120 permutations is favorable, i.e.
exactly 10 arrangements.

Step 3: Probability
\[ P(\text{person 1 wins exactly 2 times}) = \frac{\text{favorable outcomes}}{\text{total outcomes}} = \frac{10}{120} = \frac{1}{12} \]

Answer: \( \frac{1}{12} \)
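
The count of 10 favorable arrangements can be confirmed by enumerating all 120 permutations. A minimal sketch, in which `wins_of_person_1` is a hypothetical helper written for this check:

```python
from fractions import Fraction
from itertools import permutations

def wins_of_person_1(numbers):
    """Count the comparisons won by person 1; the smaller number wins each round."""
    wins = 0
    for challenger in numbers[1:]:
        if numbers[0] < challenger:
            wins += 1
        else:
            break  # person 1 has lost and plays no further rounds
    return wins

perms = list(permutations(range(1, 6)))   # numbers[i] is the number of person i + 1
favorable = sum(1 for n in perms if wins_of_person_1(n) == 2)
print(favorable, Fraction(favorable, len(perms)))  # 10 1/12
```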

Conditional Probability:
Conditional probability refers to the probability of an event occurring given that another
event has already occurred.
Let \( A \) and \( B \) be two events in a sample space with \( P(B) > 0 \). The conditional
probability of \( A \) given \( B \) is defined as:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
Explanation:
- \( P(A \cap B) \) is the probability that both \( A \) and \( B \) occur.
- \( P(B) \) is the probability that \( B \) occurs.
- By dividing \( P(A \cap B) \) by \( P(B) \), we are restricting our sample space to \( B \), i.e., we
only consider outcomes where \( B \) has occurred.
Example:
Suppose a card is drawn from a standard deck of 52 cards. Let:
\[ A = \{\text{the card is a King}\}, \quad B = \{\text{the card is a face card}\} \]
Every King is a face card, so \( A \cap B = A \).
- \( P(A \cap B) = P(\text{King}) = \frac{4}{52} = \frac{1}{13} \)
- \( P(B) = P(\text{face card}) = \frac{12}{52} = \frac{3}{13} \)
Then, the conditional probability that the card is a King given that it is a face card is
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/13}{3/13} = \frac{1}{3} \]
This means that among all face cards, the probability of drawing a King is \( \frac{1}{3} \).

Example 6: Out of 8 items that have arrived, one is defective. A worker picks these parts
one by one. What is the probability that the third is defective given that the first two are not?
(Use conditional probability.)
Solution:
Let the events be defined as follows:

• \( A \): The third item picked is defective.

• \( B \): The first two items picked are not defective.

We are asked to find the conditional probability:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

Step 1: Compute \( P(B) \)
- Total items = 8, defective = 1, non-defective = 7
- Probability that the first item is not defective:
\[ P(\text{first not defective}) = \frac{7}{8} \]
- Probability that the second item is not defective (given first is not defective):
\[ P(\text{second not defective} \mid \text{first not defective}) = \frac{6}{7} \]
- Therefore,
\[ P(B) = P(\text{first two not defective}) = \frac{7}{8} \cdot \frac{6}{7} = \frac{6}{8} = \frac{3}{4} \]

Step 2: Compute \( P(A \cap B) \)
- For \( A \cap B \) to happen, the first two are not defective and the third is defective.
- Probability:
\[ P(A \cap B) = P(\text{first not defective}) \cdot P(\text{second not defective} \mid \text{first not defective}) \cdot P(\text{third defective} \mid \text{first two not defective}) \]
\[ P(A \cap B) = \frac{7}{8} \cdot \frac{6}{7} \cdot \frac{1}{6} = \frac{1}{8} \]

Step 3: Conditional probability
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/8}{3/4} = \frac{1}{8} \cdot \frac{4}{3} = \frac{1}{6} \]

Answer: \( \frac{1}{6} \)

Explanation:
- Conditional probability allows us to compute the probability of an event (third item
defective) given that some other event has already occurred (first two are non-defective).
- Once we know the first two are non-defective, there are only 6 items left, one of which is
defective, so the probability that the third is defective is 1/6.
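
The same conditional probability can be obtained by counting ordered picks of three items. In this sketch the defective item is placed at index 0 of a list purely for illustration; every ordered pick of three distinct items is equally likely.

```python
from fractions import Fraction
from itertools import permutations

items = ["D"] + ["G"] * 7                  # one defective (D), seven good (G)
orders = list(permutations(range(8), 3))   # ordered picks of three distinct items

# Event B: the first two picks are good
b = [o for o in orders if items[o[0]] == "G" and items[o[1]] == "G"]
# Event A and B: additionally, the third pick is defective
a_and_b = [o for o in b if items[o[2]] == "D"]

print(Fraction(len(a_and_b), len(b)))  # 1/6
```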

1. Conditional Probability:
Conditional probability is the probability of an event occurring given that another event has
already occurred.
Let A and B be two events in a sample space S, with P (B) > 0. The conditional probability
of A given B is defined as:

P (A ∩ B)
P (A | B) =
P (B)
Example:

20
Suppose a card is drawn from a standard deck of 52 cards. Let

A = {the card is a King}, B = {the card is a face card}.


- P(A ∩ B) = P(King) = 4/52 = 1/13
- P(B) = P(face card) = 12/52 = 3/13

Then the probability that the card is a King given that it is a face card is:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/13}{3/13} = \frac{1}{3} \]

2. Marginal Probability:
Marginal probability is the probability of an event occurring irrespective of the outcome of
other events. It is obtained by summing (or integrating) over all possible outcomes of the other
events.
Let A1 , A2 , . . . , An be a partition of the sample space S, and let B be an event. The marginal
probability of B is:
\[ P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i) \]

Example:
A factory has two machines producing items:
- Machine 1 produces 60% of items, Machine 2 produces 40%.
- Defective rates: Machine 1: 1%, Machine 2: 2%.
Let D be the event that an item is defective. The marginal probability of selecting a
defective item is:

P (D) = P (D | Machine 1)P (Machine 1) + P (D | Machine 2)P (Machine 2)

P (D) = 0.01 · 0.6 + 0.02 · 0.4 = 0.006 + 0.008 = 0.014
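As a minimal sketch, the marginal probability above is a weighted sum over the partition. The helper name `total_probability` is an assumption made for illustration.

```python
def total_probability(priors, likelihoods):
    """Return sum_i P(B | A_i) * P(A_i) for a partition A_1..A_n."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (the A_i must form a partition)")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Machine 1 and Machine 2 from the example: shares 60% / 40%, defect rates 1% / 2%.
p_defective = total_probability(priors=[0.6, 0.4], likelihoods=[0.01, 0.02])
print(round(p_defective, 3))  # 0.014
```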



Summary:

• Conditional probability P(A | B) depends on the occurrence of another event B.

• Marginal probability P(B) considers B regardless of other events.

Bayes’ Theorem:
Bayes’ theorem provides a way to update the probability of an event based on new evidence.
It relates the conditional and marginal probabilities of random events.
Let A1 , A2 , . . . , An be a partition of the sample space, meaning that these events are mutually
exclusive and exhaustive:
\[ A_i \cap A_j = \emptyset \ \text{for } i \neq j, \quad \text{and} \quad \bigcup_{i=1}^{n} A_i = S \]

Let B be any event with P (B) > 0. Then, the probability of Ak given that B has occurred
is:

\[ P(A_k \mid B) = \frac{P(B \mid A_k)\, P(A_k)}{\sum_{i=1}^{n} P(B \mid A_i)\, P(A_i)} \]


Explanation:
- P(A_k): prior probability of event A_k (before observing B)
- P(B | A_k): likelihood of observing B given A_k
- P(A_k | B): posterior probability, the updated probability of A_k after observing B
- Denominator \( \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i) \): total probability of B, also called the marginal probability of B

Example:
Suppose a factory produces items from two machines:
- Machine 1 produces 60% of items, Machine 2 produces 40% of items.
- Defective rates: Machine 1: 1%, Machine 2: 2%.
Let D be the event that an item is defective. Find the probability that a defective item
came from Machine 2.

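\[ P(\text{Machine 2} \mid D) = \frac{P(D \mid \text{Machine 2})\, P(\text{Machine 2})}{P(D \mid \text{Machine 1})\, P(\text{Machine 1}) + P(D \mid \text{Machine 2})\, P(\text{Machine 2})} \]

\[ = \frac{0.02 \cdot 0.4}{0.01 \cdot 0.6 + 0.02 \cdot 0.4} = \frac{0.008}{0.006 + 0.008} = \frac{0.008}{0.014} = \frac{4}{7} \]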

Answer:
The probability that the defective item came from Machine 2 is 4/7.
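The same update can be sketched in code: multiply each prior by its likelihood and normalise by the sum. The function name `posterior` is an assumption for illustration; exact fractions are used so the output matches the hand calculation.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem: return P(A_k | B) for every k in the partition."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(B | A_k) P(A_k)
    evidence = sum(joint)                                  # P(B), the marginal
    return [j / evidence for j in joint]

priors = [Fraction(6, 10), Fraction(4, 10)]          # Machine 1, Machine 2
likelihoods = [Fraction(1, 100), Fraction(2, 100)]   # P(D | machine)
print(posterior(priors, likelihoods))  # [Fraction(3, 7), Fraction(4, 7)]
```

The posteriors always sum to 1, which is a quick sanity check on any Bayes calculation.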

Example 7: It is observed that 40% of the tape recorders have a flaw, and a flawed recorder always dies within six months. Of those that do not have a flaw, 20% die within six months. Your tape recorder died in 4 months. What is the probability that it had the flaw?
Solution :
Let us define the events:

• F: Tape recorder has a flaw

• F^c: Tape recorder does not have a flaw

• D: Tape recorder dies within 6 months


We are asked to find the conditional probability P (F | D).

Step 1: Bayes’ Theorem

\[ P(F \mid D) = \frac{P(D \mid F)\, P(F)}{P(D \mid F)\, P(F) + P(D \mid F^c)\, P(F^c)} \]

Step 2: Substitute the given probabilities

P (F ) = 0.4, P (F c ) = 1 − 0.4 = 0.6

P (D | F ) = 1 (dies within 6 months if flawed)

P (D | F c ) = 0.2

Step 3: Compute P (F | D)
\[ P(F \mid D) = \frac{1 \cdot 0.4}{1 \cdot 0.4 + 0.2 \cdot 0.6} = \frac{0.4}{0.4 + 0.12} = \frac{0.4}{0.52} = \frac{10}{13} \]

Answer:

\[ \frac{10}{13} \]

Explanation:
- This uses Bayes’ theorem, which allows us to compute the probability of an event (the tape recorder has a flaw) given observed evidence (it died within 4 months, and therefore within 6 months).
- We combine the probability of dying given a flaw and the probability of dying without a flaw, weighted by their prior probabilities.
Numerical 1: An urn contains 4 white and 6 black balls, and another urn contains 3 white and 5 black balls. Two balls are drawn at random from the first urn and placed in the second urn, and then 1 ball is drawn at random from the second urn. What is the probability that the ball drawn from the second urn is white? Solve using total probability.
Solution:
Urn I contains 4 white and 6 black balls
Urn II contains 3 white and 5 black balls
Two balls are drawn at random from Urn I and transferred to Urn II. Then one ball is
drawn at random from Urn II. We find the probability that this ball is white.

Step 1: Possible cases of transfer from Urn I


 
\[ \text{Total ways to choose 2 balls from Urn I} = \binom{10}{2} = 45 \]

Case 1: Two white balls (WW)

\[ P(WW) = \frac{\binom{4}{2}}{\binom{10}{2}} = \frac{6}{45} = \frac{2}{15} \]
After transfer, Urn II has:
(3 + 2)W, 5B = 5W, 5B
\[ P(W \mid WW) = \frac{5}{10} \]

Case 2: One white and one black ball (WB)


\[ P(WB) = \frac{\binom{4}{1}\binom{6}{1}}{\binom{10}{2}} = \frac{24}{45} = \frac{8}{15} \]
After transfer, Urn II has:

(3 + 1)W, (5 + 1)B = 4W, 6B

\[ P(W \mid WB) = \frac{4}{10} \]

Case 3: Two black balls (BB)


\[ P(BB) = \frac{\binom{6}{2}}{\binom{10}{2}} = \frac{15}{45} = \frac{1}{3} \]
After transfer, Urn II has:
3W, (5 + 2)B = 3W, 7B
\[ P(W \mid BB) = \frac{3}{10} \]

Step 2: Apply the Law of Total Probability

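\[ P(W) = P(WW)\,P(W \mid WW) + P(WB)\,P(W \mid WB) + P(BB)\,P(W \mid BB) \]
\[ = \frac{2}{15} \cdot \frac{5}{10} + \frac{8}{15} \cdot \frac{4}{10} + \frac{1}{3} \cdot \frac{3}{10} \]
\[ = \frac{10}{150} + \frac{32}{150} + \frac{15}{150} = \frac{57}{150} \]
\[ P(\text{white ball from Urn II}) = \frac{19}{50} \]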
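The three cases can also be enumerated in a loop over the number of white balls transferred. This is a sketch; the function name and parameter names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def p_white_from_urn2(w1=4, b1=6, w2=3, b2=5, moved=2):
    total = Fraction(0)
    for k in range(moved + 1):  # k = number of white balls among those transferred
        # P(k white and moved-k black are transferred from Urn I)
        p_case = Fraction(comb(w1, k) * comb(b1, moved - k), comb(w1 + b1, moved))
        # P(white drawn from Urn II | this transfer)
        p_white = Fraction(w2 + k, w2 + b2 + moved)
        total += p_case * p_white  # law of total probability
    return total

print(p_white_from_urn2())  # 19/50
```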
Numerical 2: A football team wins 60% of its games when it scores the first goal and wins
only 10% of its games when the opposing team scores the first goal. If the team scores the first
goal in 30% of the games, what is the probability that the team wins a randomly selected game?

Solution :
P (W | F ) = 0.6
P (W | O) = 0.1
P (F ) = 0.3
where W = event that the team wins the game, F = event that the team scores the first
goal, O = event that the opposing team scores the first goal.
Since either the team or the opponent scores the first goal,

P (O) = 1 − P (F ) = 1 − 0.3 = 0.7

Using the Law of Total Probability:

P (W ) = P (W | F )P (F ) + P (W | O)P (O)
= (0.6)(0.3) + (0.1)(0.7)
= 0.18 + 0.07
= 0.25

P (team wins a game) = 0.25


Numerical 3: A bag contains 5 balls, and it is not known how many of them are white. Two balls are drawn at random from the bag and both are white. What is the chance that all the balls in the bag are white?

Solution : Let Hk : the event that the bag contains k white balls, k = 0, 1, 2, 3, 4, 5.
Since nothing is known beforehand, we assume all hypotheses are equally likely:
\[ P(H_k) = \frac{1}{6}, \quad k = 0, 1, 2, 3, 4, 5 \]
Let E: the event that the two drawn balls are white.

Step 1: Likelihood P (E | Hk )
If the bag contains k white balls, then

0, 
 k<2
k
P (E | Hk ) = 2

 5 ,
 k≥2
2

Step 2: Apply Bayes’ theorem

\[ P(H_5 \mid E) = \frac{P(E \mid H_5)\, P(H_5)}{\sum_{k=0}^{5} P(E \mid H_k)\, P(H_k)} \]
Compute numerator:
\[ P(E \mid H_5) = \frac{\binom{5}{2}}{\binom{5}{2}} = 1 \]
\[ P(E \mid H_5)\, P(H_5) = 1 \cdot \frac{1}{6} = \frac{1}{6} \]
Compute denominator:
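\[ \sum_{k=2}^{5} \frac{\binom{k}{2}}{\binom{5}{2}} \cdot \frac{1}{6} = \frac{1}{6\binom{5}{2}} \left[ \binom{2}{2} + \binom{3}{2} + \binom{4}{2} + \binom{5}{2} \right] \]

\[ = \frac{1}{6 \cdot 10} (1 + 3 + 6 + 10) = \frac{20}{60} = \frac{1}{3} \]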

Step 3: Final probability

\[ P(H_5 \mid E) = \frac{1/6}{1/3} = \frac{1}{2} \]
\[ P(\text{all balls are white}) = \frac{1}{2} \]
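A short sketch of the same calculation over all six hypotheses, assuming (as in the solution) a uniform prior on the number of white balls. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def posterior_white_counts(n_balls=5, drawn=2):
    # Assumption from the solution: every count k = 0..n_balls is equally likely a priori.
    prior = Fraction(1, n_balls + 1)
    # Joint probability P(E | H_k) P(H_k); comb(k, drawn) is 0 when k < drawn.
    joint = {k: prior * Fraction(comb(k, drawn), comb(n_balls, drawn))
             for k in range(n_balls + 1)}
    evidence = sum(joint.values())  # P(E)
    return {k: j / evidence for k, j in joint.items()}

post = posterior_white_counts()
print(post[5])  # 1/2
```

The full dictionary `post` also gives the posterior for the other hypotheses, for example that exactly 4 balls are white.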

1.7 Types of Probability:


• Theoretical Probability: Based on reasoning or mathematical analysis (like the die example).
• Experimental Probability: Based on the results of experiments or historical data. It is calculated as:
\[ P(E) = \frac{\text{Number of times event } E \text{ occurred}}{\text{Total number of trials}} \]
• Subjective Probability: Based on personal judgment or opinion, rather than exact calculations.
Probability plays a crucial role in statistics, risk assessment, decision-making, and many
fields such as finance, science, and engineering.

1.8 Inclusive and mutually exclusive probability:


Inclusive and exclusive probability refer to different ways of calculating the probability of com-
bined events, particularly when considering overlapping outcomes.
Here’s a breakdown of each:

1.8.1 Inclusive probability:
Inclusive probability considers the probability of either event occurring, allowing for the pos-
sibility of overlap. When events can happen simultaneously, the probability of their union
includes the overlapping outcomes.

• Notation: The probability of event A or event B happening is denoted as P(A ∪ B).

• Formula: When calculating inclusive probability for two events A and B:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Here, P(A ∩ B) is subtracted to avoid double counting the outcomes that are in both events.

Example: For a card drawn from a standard 52-card deck:

P(A) = 26/52 = 0.5 (probability of drawing a red card),
P(B) = 12/52 ≈ 0.231 (probability of drawing a face card),
P(A ∩ B) = 6/52 ≈ 0.115 (probability of drawing a red face card).
The inclusive probability would be: P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 26/52 + 12/52 − 6/52 = 32/52 = 8/13 ≈ 0.615

1.8.2 Mutually exclusive events


Mutually exclusive events are events that cannot occur at the same time. If one event happens,
the other cannot. This means the occurrence of one event completely rules out the possibility
of the other event occurring. [1]
Key Characteristics: The probability of both events occurring simultaneously is zero: P (A ∩
B) = 0. The sum of the probabilities of mutually exclusive events gives the total probability
of either event occurring.

1.8.3 Examples of Mutually Exclusive Events:


• Flipping a Coin: When you flip a coin, the outcomes are either heads (H) or tails (T). These events are mutually exclusive because if the coin lands on heads, it cannot land on tails at the same time. Mathematically, P(H) + P(T) = 1.

• Rolling a Die: When rolling a fair six-sided die, the outcomes (1, 2, 3, 4, 5, 6) are mutually exclusive. For instance, if you roll a 3, you cannot simultaneously roll a 5. If you denote two events A = "roll a 2" and B = "roll a 5", then P(A ∩ B) = 0.

• Choosing a Card from a Deck: In a standard deck of cards, the event of drawing a heart (A) and the event of drawing a spade (B) are mutually exclusive. If you draw a card that is a heart, it cannot be a spade at the same time. Thus, P(A ∩ B) = 0.

• Weather Conditions: The events "it is raining" and "it is snowing" can be considered mutually exclusive in a particular location during a specific time (e.g., at roughly the same temperature and conditions). You cannot have rain and snow falling at the same time under typical conditions.

• Sports Events: If a sports team has an event where they either win (A) or lose (B), these outcomes are mutually exclusive. If the team wins, they cannot lose in that particular game.

• An easy example to remember is the probability of getting an Ace or a King in a single draw.


Mutually exclusive events are significant in probability theory as they simplify calculations
involving the probabilities of combined outcomes. When assessing the overall likelihood of
mutually exclusive events, you can simply add their individual probabilities.
In summary, mutually exclusive events:

• cannot occur at the same time.

• P(A ∩ B) = 0
• P(A ∪ B) = P(A) + P(B)
1.8.4 Non mutually exclusive events
Non-mutually exclusive events are events that can occur at the same time. In other words, the occurrence of one event does not prevent the other from occurring. These events can overlap, meaning they can have common outcomes. For non-mutually exclusive events:

• The events can happen simultaneously.

• The probability of the union of two non-mutually exclusive events includes the probability of their intersection.

1.8.5 Example of Non Mutually exclusive events


• Event A: Drawing a heart.
• Event B: Drawing a face card (Jack, Queen, or King).

Applying the events:

• Intersection: A ∩ B contains the outcomes where the card drawn is both a heart and a face card, namely the King, Queen and Jack of hearts. So, P(A ∩ B) = 3/52.

• Individual probabilities: probability of drawing a heart: P(A) = 13/52 = 1/4; probability of drawing a face card: P(B) = 12/52 = 3/13.

• Probability of the union: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Substituting the values, P(A ∪ B) = 13/52 + 12/52 − 3/52 = 22/52 = 11/26.

• A good example to remember is P(queen or black card).
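The heart-or-face-card result can be checked by listing the 52 cards and counting. This is a minimal sketch; the deck encoding and the helper `prob` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    # Classical probability: favourable cards / total cards.
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

# Direct count of the union versus the inclusion-exclusion formula.
p_union = prob(lambda c: is_heart(c) or is_face(c))
p_formula = prob(is_heart) + prob(is_face) - prob(lambda c: is_heart(c) and is_face(c))
print(p_union, p_formula)  # 11/26 11/26
```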

1.8.6 Brackets and their meanings
ˆ [a, b) includes all numbers from a to b , including a but not b .
ˆ (a, b) includes all numbers strictly between a and b , excluding both a and b .
1.8.7 Independent events
• Can both happen together, each with its own individual probability.
• Events that do not influence each other’s occurrence.
• P(A ∩ B) = P(A) × P(B), for example 1/2 × 1/6 = 1/12

1.8.8 Example of Independent Events:


Flipping a Coin and Rolling a Die:
Let event A be flipping heads on a coin, and event B be rolling a 5 on a die.
The outcome of the coin flip does not impact the outcome of the die roll.
If P(A) = 0.5 and P(B) = 1/6, then: P(A ∩ B) = P(A) × P(B) = 0.5 × 1/6 = 1/12

1.8.9 Example of dependent Events:


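Two cards are drawn from a standard deck without replacement. Let A be the event that the first card is an Ace and B the event that the second card is an Ace. To find the probability of both events occurring (drawing an Ace first and then another Ace), we use the formula for dependent events:
P(A ∩ B) = P(A) × P(B | A)
Substituting the values:
\[ P(A \cap B) = P(A) \times P(B \mid A) = \frac{4}{52} \times \frac{3}{51} = \frac{12}{2652} = \frac{1}{221} \]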
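As a check on the multiplication rule for dependent events, the sketch below counts every ordered pair of distinct cards. It assumes two cards drawn without replacement from a 52-card deck with 4 Aces; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

labels = ["A"] * 4 + ["other"] * 48               # only "Ace or not" matters here
ordered_pairs = list(permutations(range(52), 2))  # (first card, second card), no replacement

both_aces = sum(1 for i, j in ordered_pairs if labels[i] == "A" and labels[j] == "A")
p_enumerated = Fraction(both_aces, len(ordered_pairs))

p_formula = Fraction(4, 52) * Fraction(3, 51)     # P(A) * P(B | A)
print(p_enumerated, p_formula)  # 1/221 1/221
```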

1.9 Conditional probability


The notation P(A/B) denotes the conditional probability of event A occurring given that event B has already occurred. The formula for conditional probability is defined as:
\[ P(A/B) = \frac{P(A \cap B)}{P(B)} \]
where:

• P(A/B) is the probability of event A occurring given that B has occurred.

• P(A ∩ B) is the probability of both events A and B occurring together (the intersection of A and B).

• P(B) is the probability of event B occurring.

This formula is valid only if P(B) > 0 because we cannot condition on an event that has zero probability. Conditional probability provides insight into how the occurrence of one event can affect the likelihood of another event.
Example Scenario: Suppose we have a standard deck of 52 playing cards.
Let:

Event A: Drawing a heart.
Event B: Drawing a red card. There are 26 red cards in total (hearts and diamonds), and half of the red cards are hearts, so:
P(A) = 13/52 = 1/4 (probability of drawing a heart).
P(B) = 26/52 = 1/2 (probability of drawing a red card).
P(A ∩ B) = P(A) = 13/52 = 1/4 (since all hearts are red).
Now we can find P(A | B):
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/2} = \frac{1}{2} \]

This means that given that a red card is drawn, the probability that it is a heart is 1/2.

1.10 Bayes’ Theorem:


Bayes’ Theorem: This theorem follows from the definition of conditional probability and describes how to update the probability of an event based on new information:

\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]

Problem: The patients of the infirmary of a cardiology clinic have been operated by
three physicians P1 , P2 and P3 . Assume that these physicians have operated 50%, 30% and
20% of all the patients of the infirmary and that they commit a malpractice with probability
0.04, 0.05 and 0.02, respectively. If a patient of the infirmary is chosen at random and if he/she
is victim of a malpractice, what is the probability that the physician P3 has caused the problem?

Solution: Probability that a patient was operated by P1 , P2 and P3 : P (P1 ) = 0.50, P (P2 ) =
0.30, P (P3 ) = 0.20
Probability of malpractice given the physician:
P(M | P1) = 0.04, P(M | P2) = 0.05, P(M | P3) = 0.02

Total probability that a randomly chosen patient is a victim of malpractice:
P(M) = P(P1) × P(M | P1) + P(P2) × P(M | P2) + P(P3) × P(M | P3)
P(M) = (0.50)(0.04) + (0.30)(0.05) + (0.20)(0.02) = 0.020 + 0.015 + 0.004 = 0.039
Using Bayes’ Theorem:
\[ P(P_3 \mid M) = \frac{P(P_3) \times P(M \mid P_3)}{P(M)} = \frac{0.20 \times 0.02}{0.039} = \frac{0.004}{0.039} \approx 0.1026 \]
The probability that Physician P3 caused the malpractice, given that a malpractice has occurred, is approximately 10.26%.

Problem: In a bag there are three true coins and one false coin with heads on both sides. A coin is chosen at random and tossed four times. If heads occurs all four times, what is the probability that the false coin was chosen and used?
Solution: P(selecting a true coin) = P1 = 3/4
P(selecting the false coin) = P2 = 1/4
P(getting all heads with a true coin) = 1/2 × 1/2 × 1/2 × 1/2 = 1/16
P(getting all heads with the false coin) = 1 × 1 × 1 × 1 = 1

By Bayes’ theorem,
\[ P(\text{false coin} \mid \text{all heads}) = \frac{P(\text{false coin})\, P(\text{all heads} \mid \text{false coin})}{P(\text{true coin})\, P(\text{all heads} \mid \text{true coin}) + P(\text{false coin})\, P(\text{all heads} \mid \text{false coin})} = \frac{1/4 \cdot 1}{3/4 \cdot 1/16 + 1/4 \cdot 1} \]
P(false coin was chosen and used) = 16/19

Problem: A bag contains 7 red and 3 black balls and another bag contains 4 red and
5 black balls. One ball is transferred from the first bag to the second bag and then a ball is
drawn from the second bag. If this ball happens to be red, find the probability that a black
ball was transferred.
Solution: Let B be the event that a black ball was transferred, R the event that a red ball was transferred, and D the event that the ball then drawn from the second bag is red. After the transfer the second bag holds 10 balls, so P(D | B) = 4/10 and P(D | R) = 5/10. By Bayes’ theorem:
\[ P(B \mid D) = \frac{P(B)\, P(D \mid B)}{P(R)\, P(D \mid R) + P(B)\, P(D \mid B)} = \frac{(3/10)(4/10)}{(7/10)(5/10) + (3/10)(4/10)} = \frac{12}{47} \]

Problem: Find the probability that a year selected at random contains 53 Sundays.
Solution: There are two possibilities (taking one year in four to be a leap year).
Case 1: A leap year is selected, with probability 1/4. A leap year has 52 weeks + 2 extra days, and it contains 53 Sundays when the extra pair is Sat-Sun or Sun-Mon, i.e. in 2 of the 7 possible pairs:
= 1/4 × 2/7 = 2/28

Case 2: A non-leap year is selected, with probability 3/4. It has 52 weeks + 1 extra day, which is a Sunday with probability 1/7:
= 3/4 × 1/7 = 3/28
∴ P(a year selected at random contains 53 Sundays) = 2/28 + 3/28 = 5/28
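The 5/28 figure rests on two simplifying assumptions: one year in four is a leap year, and every starting weekday is equally likely. As a rough empirical check, the sketch below counts Sundays in real calendar years over one 400-year Gregorian cycle. The chosen range 2000 to 2399 and the function name are assumptions, and the observed share is only expected to be close to 5/28, not equal to it.

```python
from datetime import date

def sundays_in_year(year):
    first = date(year, 1, 1)
    days = (date(year + 1, 1, 1) - first).days   # 365 or 366
    # weekday(): Monday = 0 ... Sunday = 6, so this is the gap to the first Sunday.
    offset = (6 - first.weekday()) % 7
    # Sundays fall on day indices offset, offset + 7, ... below `days`.
    return (days - offset + 6) // 7

years = range(2000, 2400)                         # one full Gregorian cycle
share = sum(sundays_in_year(y) == 53 for y in years) / len(years)
print(share, 5 / 28)                              # empirical share vs. the textbook value
```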

Problem: Let X, Y, Z be independent events with probabilities a, b, c respectively. Let the random variable n denote the number of the events X, Y, Z that occur. Find the probability that exactly two events occur, i.e. P(n = 2).
Solution: Let P(X) = a, P(Y) = b, P(Z) = c.
Exactly two events occur in the cases XYZ', XY'Z and X'YZ, so
P(n = 2) = ab(1 − c) + a(1 − b)c + (1 − a)bc
= ab − abc + ac − abc + bc − abc
= ab + ac + bc − 3abc
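The closed form can be checked numerically by enumerating all eight occur/not-occur combinations. This is a sketch; the values of a, b, c below are illustrative and not taken from the notes.

```python
from itertools import product

def p_exactly_two(a, b, c):
    probs = (a, b, c)
    total = 0.0
    for outcome in product([True, False], repeat=3):   # which of X, Y, Z occur
        if sum(outcome) == 2:
            p = 1.0
            for occurs, q in zip(outcome, probs):
                p *= q if occurs else (1 - q)           # independence: multiply
            total += p
    return total

a, b, c = 0.5, 0.4, 0.2   # illustrative probabilities
print(p_exactly_two(a, b, c), a * b + a * c + b * c - 3 * a * b * c)
```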

1.11 Gamma function:


Gamma Function Definition: The Gamma function, denoted as Γ(n), is defined as:
\[ \Gamma(n) = \int_0^{\infty} t^{n-1} e^{-t}\, dt \quad \text{for } n > 0 \]
For positive integers, Γ(n) = (n − 1)!. This means that Γ(5) = 4! = 24.
The Gamma function computes the area under the curve of the function \( t^{n-1} e^{-t} \), which helps us generalize the idea of multiplying numbers (as in factorials) to other values.

1.11.1 Properties of the Gamma Function :


• Recursive Property: Γ(n + 1) = n · Γ(n). This property shows how moving from one integer to the next involves multiplication by that integer, similar to the factorial property.
• Special Values:

Γ(1) = 1, Γ(2) = 1! = 1, Γ(3) = 2! = 2, Γ(0.5) = √π

\[ \Gamma\!\left(\tfrac{3}{2}\right) = \tfrac{1}{2}\sqrt{\pi}; \quad \Gamma\!\left(\tfrac{5}{2}\right) = \tfrac{3}{4}\sqrt{\pi}; \quad \Gamma\!\left(\tfrac{7}{2}\right) = \tfrac{15}{8}\sqrt{\pi}; \quad \Gamma\!\left(\tfrac{9}{2}\right) = \tfrac{105}{16}\sqrt{\pi} \tag{1} \]
• Negative Arguments: The Gamma function is not defined for non-positive integers, where it has poles. For other arguments it can be related via the reflection formula:
\[ \Gamma(z)\,\Gamma(1 - z) = \frac{\pi}{\sin(\pi z)} \]

It extends the factorial to any positive real number instead of just integers.
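The properties listed above can be spot-checked with Python's standard `math.gamma`. The test point x = 3.7 and z = 0.3 are arbitrary choices for illustration.

```python
import math

# Factorial link: Gamma(n) = (n - 1)!
print(math.gamma(5))                                                  # 24.0

# Special value Gamma(1/2) = sqrt(pi)
print(math.isclose(math.gamma(0.5), math.sqrt(math.pi)))              # True

# Recursive property Gamma(x + 1) = x * Gamma(x) at a non-integer point
x = 3.7
print(math.isclose(math.gamma(x + 1), x * math.gamma(x)))             # True

# Half-integer value Gamma(9/2) = (105/16) * sqrt(pi)
print(math.isclose(math.gamma(4.5), 105 / 16 * math.sqrt(math.pi)))   # True

# Reflection formula Gamma(z) * Gamma(1 - z) = pi / sin(pi z)
z = 0.3
print(math.isclose(math.gamma(z) * math.gamma(1 - z), math.pi / math.sin(math.pi * z)))  # True
```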

1. Why is it called M/M/1?


The notation M/M/1 is based on Kendall’s notation used to describe queueing systems.

A/B/C
where:

ˆ A = Arrival process
ˆ B = Service time distribution
ˆ C = Number of servers
Explanation of M/M/1:

ˆ First M : Markovian arrival process (Poisson arrivals, exponential inter-arrival time)


ˆ Second M : Markovian service process (exponential service time)
ˆ 1: Single server
Thus, M/M/1 represents:

A queue with Poisson arrivals, exponential service time, and a single server.

2. Markov Property
The system satisfies the Markov property:

Future state depends only on present state, not on past history

3. What is a Server?
A server is the entity that provides service to customers in a queueing system.
Definition:

A server is a facility, person, or system that processes or serves arriving customers.

4. Examples of Servers
System            Server
Bank              Teller
Hospital          Doctor
Call center       Operator
Computer system   CPU
Restaurant        Waiter
Elevator system   Elevator

5. Service Rate
The service rate is denoted by:

µ = number of customers served per unit time

6. Types of Servers
(a) Single Server System

ˆ Only one server is available


ˆ Example: Single bank counter
ˆ Model: M/M/1
(b) Multiple Server System

ˆ More than one server is available


ˆ Example: Multiple counters in a bank
ˆ Model: M/M/c
7. Key Insight
A server is not always a human. It can be:

ˆ Machine
ˆ Software system
ˆ Automated process

Queueing Theory – Questions and Answers
Q1. What is a queueing system? Explain its components.
Answer:
A queueing system is a mathematical model used to study waiting lines.
Components:

ˆ Arrival process (λ): Rate at which customers arrive


ˆ Service mechanism (µ): Rate at which service is provided
ˆ Queue discipline: FIFO, LIFO, Priority
ˆ System capacity: Maximum number of customers allowed
ˆ Number of servers
Q2. Define the following terms:
(a) Traffic intensity:
\[ \rho = \frac{\lambda}{\mu} \]
(b) Average number in system (L): Average number of customers in the system (queue + service).
(c) Waiting time (W ): Average time spent by a customer in the system.

Q3. What is an M/M/1 queue? State its assumptions.


Answer:
An M/M/1 queue is a single-server queue with:

ˆ Poisson arrival process


ˆ Exponentially distributed service time
ˆ Infinite queue capacity
ˆ FIFO service discipline
Q4. Numerical Problem
Given: λ = 4 customers/min, µ = 6 customers/min.
(a) Traffic intensity:
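\[ \rho = \frac{4}{6} \approx 0.667 \]

(b) Average number in system:

\[ L = \frac{\lambda}{\mu - \lambda} = \frac{4}{6 - 4} = 2 \]

(c) Average time in system:

\[ W = \frac{1}{\mu - \lambda} = \frac{1}{2} = 0.5 \text{ min} \]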

Q5. Why is queueing theory important in engineering?


Answer:
Queueing theory helps in:

ˆ Reducing waiting time


ˆ Designing efficient systems
ˆ Improving service utilization
Applications include:

ˆ Telecommunication systems
ˆ Traffic control
ˆ Computer networks
Q6. What is the condition for stability of a queueing system?
\[ \rho = \frac{\lambda}{\mu} < 1 \]
If ρ ≥ 1, the system becomes unstable.

Q7. Expression for average number of customers in M/M/1 system


\[ L = \frac{\lambda}{\mu - \lambda} \]

Q1. (10 Marks)


An M/M/1 queue has an arrival rate of λ = 5 customers per hour and a service rate of µ = 8
customers per hour.

(a) Find the traffic intensity.

(b) Find the average number of customers in the system (L).

(c) Find the average time spent in the system (W ).

(d) Find the average number of customers in the queue (Lq ).

(e) Find the average waiting time in the queue (Wq ).

Answer:
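\[ \rho = \frac{\lambda}{\mu} = \frac{5}{8} = 0.625 \]
\[ L = \frac{\lambda}{\mu - \lambda} = \frac{5}{8 - 5} = \frac{5}{3} \approx 1.667 \]
\[ W = \frac{1}{\mu - \lambda} = \frac{1}{3} \approx 0.333 \text{ hours} \]
\[ L_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{25}{8 \times 3} = \frac{25}{24} \approx 1.042 \]
\[ W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{5}{8 \times 3} = \frac{5}{24} \approx 0.208 \text{ hours} \]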
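The five quantities above follow from λ and µ alone, so they can be bundled into one helper. This is a minimal sketch using the steady-state M/M/1 formulas from these notes; the function name `mm1_metrics` and the dictionary keys are assumptions.

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 measures for arrival rate lam and service rate mu."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu (rho < 1)")
    rho = lam / mu                      # traffic intensity
    L = lam / (mu - lam)                # average number in system
    W = 1 / (mu - lam)                  # average time in system
    Lq = lam ** 2 / (mu * (mu - lam))   # average number waiting in queue
    Wq = lam / (mu * (mu - lam))        # average waiting time in queue
    return {"rho": rho, "L": L, "W": W, "Lq": Lq, "Wq": Wq}

m = mm1_metrics(lam=5, mu=8)
print({k: round(v, 3) for k, v in m.items()})
# {'rho': 0.625, 'L': 1.667, 'W': 0.333, 'Lq': 1.042, 'Wq': 0.208}
```

The values satisfy Little's Law in both forms, L = λW and Lq = λWq, which is a useful consistency check.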

Q2. (10 Marks)


Explain the M/M/1 queueing model. Derive expressions for the average number of customers
in the system (L) and the average waiting time in the system (W ). State the condition for
system stability.
Answer:
M/M/1 Model Assumptions:

ˆ Arrivals follow a Poisson process with rate λ


ˆ Service times are exponentially distributed with rate µ
ˆ Single server
ˆ Infinite queue capacity
ˆ FIFO discipline
Steady-state condition:
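\[ \rho = \frac{\lambda}{\mu} < 1 \]
Average number of customers in the system: in steady state the probability of having n customers in the system is \( P_n = (1 - \rho)\rho^n \), so
\[ L = \sum_{n=0}^{\infty} n P_n = \frac{\rho}{1 - \rho} = \frac{\lambda}{\mu - \lambda} \]
Average time spent in the system, using Little’s Law \( L = \lambda W \):
\[ W = \frac{L}{\lambda} = \frac{1}{\mu - \lambda} \]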
The system remains stable only when the service rate exceeds the arrival rate.
As λ approaches µ, the queue length and waiting time increase rapidly.

Qu 1: If X is a binomial random variable, then the variance of X is?
Qu 2: If Y is a Poisson random variable, then the variance of Y is?
Qu 3: If X is a binomial random variable, the probability of X = n is?
Qu 4: If Y is a Poisson random variable, the probability of Y = 1 is?
Qu 5: The probability of getting a head when a biased coin is tossed is 0.6. What is the probability of getting three heads when this coin is tossed 5 times?
Qu 6: Consider a Poisson random variable with λ = 3 per hour. The probability of no arrival in an hour is?
Qu 7: In a standard normal variable, the area to the left of Z = 0 is?
Qu 8: A random variable following a normal distribution has X = 2µ and µ = 2σ. Find the value of z.
Qu 9: A random variable following a normal distribution has µ = 2σ. What should X be so that z = 1?
Qu 10: In a normal distribution, P(z < 1) is?
Qu 11: Imagine a biased coin, where the probability of getting heads is 0.7 and tails is 0.3, is tossed once. Is it a Bernoulli distribution?
Qu 12: The number of students in a class follows which type of distribution?
Qu 13: If a cricket match is held between India and Kenya, the probability of India winning is 0.6. Consider that the result is either a win or a loss (i.e. there is no tie or cancellation). Suppose there is a 5-match series between India and Kenya. What is the probability of India winning the series?
Qu 14: A damaged product comes to the production area on average once every 20 minutes. Assuming a Poisson process, what is the probability of 5 defective products arriving in one hour?
Qu 15: A die has four sides pasted red and two sides pasted green. It is rolled six times. Find the probability of getting four red and two green.
Qu 16: Comment on the normal distribution curve: right skewed, left skewed, or symmetric?
Qu 17: If only the mean of a normal distribution changes:
Qu 18: If only the standard deviation of a normal distribution changes:
Qu 19:
Qu 20:
Qu 21:
Qu 22:

2 Statistics
Statistics is a branch of mathematics that deals with the collection, analysis, interpretation,
presentation, and organization of data. It provides methods and techniques to summarize and
make sense of data, allowing for informed decision-making based on empirical evidence.

2.1 Key Components of Statistics:


ˆ Data Collection: Statistics involves gathering data through various methods such as sur-
veys, experiments, observations, and records. The quality and method of data collection

can significantly impact the analysis and conclusions.

ˆ Data Analysis: Statistical analysis involves techniques to explore, describe, and under-
stand the data. This can include descriptive statistics (summarizing and organizing data)
and inferential statistics (making predictions or generalizations about a population based
on sample data).

ˆ Descriptive Statistics:This aspect focuses on summarizing and presenting data in a way


that is understandable. It includes measures of central tendency (mean, median, mode)
and measures of dispersion (range, variance, standard deviation), as well as visual repre-
sentations like graphs and charts.

ˆ Inferential Statistics:This branch uses sample data to make inferences about a larger
population. It involves hypothesis testing, confidence intervals, regression analysis, and
other methods that allow statisticians to draw conclusions beyond the immediate data.

ˆ Probability Theory:Probability is a fundamental concept in statistics that provides a


framework for quantifying uncertainty. It helps in making predictions about future events
based on the likelihood of occurrence.

ˆ Statistical Models:Statistics often involves creating models that represent real-world pro-
cesses. These models can help in understanding relationships between variables and
predicting outcomes.

2.2 Applications of Statistics:


Statistics is widely used across various fields, including:

ˆ Business: For market research, quality control, and decision-making.


ˆ Healthcare: In clinical trials, epidemiology, and health surveys.
ˆ Social Sciences: To study population trends, behaviors, and social issues.
ˆ Economics: For analyzing economic indicators and trends.
ˆ Sports: In performance analysis and game strategy.
2.3 Descriptive statistics:
In descriptive statistics, parameters are numerical values that summarize and describe certain
characteristics of a dataset. The following parameters describe the center or typical value of a
dataset.

• Mean: The arithmetic average of a set of numbers, calculated by summing all the values and dividing by the number of observations:
\[ \text{Mean} = \frac{\sum X}{N} \]
where \( \sum X \) is the sum of all data points and N is the number of data points.

• Median: The middle value when the data is ordered. If the number of observations is odd, the median is the middle value; if even, it is the average of the two middle values.

ˆ Mode: The most frequently occurring value(s) in a dataset. A dataset may have one
mode (unimodal), more than one mode (multimodal), or no mode at all.

The following parameters describe the dispersion (variability) of a dataset.

• Range: The difference between the largest and smallest values. Range = Maximum − Minimum

• Variance: The average of the squared differences from the mean. For a population, it is calculated as:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
For a sample, it is calculated as:
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]
where µ is the population mean, X̄ is the sample mean, N is the population size, and n is the sample size.

• Standard Deviation: The square root of the variance, representing the typical distance of the data from the mean. \( \sigma = \sqrt{\sigma^2} \) (population standard deviation), \( s = \sqrt{s^2} \) (sample standard deviation)

ˆ Interquartile Range (IQR): Measures the spread of the middle 50% of the data. Calculated
as the difference between the third quartile (Q3) and the first quartile (Q1): IQR =
Q3 − Q1

The following parameters describe the distribution shape of the dataset.

ˆ Skewness: Measures the asymmetry of the probability distribution. A skewness of 0 indi-


cates a symmetrical distribution. Positive skewness indicates a longer right tail; negative
skewness indicates a longer left tail.

• Kurtosis: Measures the "tailedness" of the distribution. High kurtosis indicates more data located in the tails; low kurtosis indicates lighter tails.
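Most of the measures above are available in Python's standard `statistics` module. The sketch below uses a small made-up dataset (not from the notes); note that `statistics.quantiles` uses one particular quartile convention by default, so IQR values can differ slightly between textbooks and tools.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]   # illustrative data

# Measures of central tendency
print(st.mean(data))            # 6
print(st.median(data))          # 5
print(st.mode(data))            # 4

# Measures of dispersion
print(max(data) - min(data))    # 9  (range)
print(st.pvariance(data))       # population variance: divides by N
print(st.variance(data))        # sample variance: divides by n - 1
print(st.pstdev(data), st.stdev(data))

# Interquartile range from the three quartile cut points
q1, q2, q3 = st.quantiles(data, n=4)
print(q3 - q1)
```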

3 Random number
If we are interested in measuring any characteristic of an experiment, we must associate a number
with each outcome. For instance, we can assign the values 1 and 0 to a perfect and a defective
manufactured item, and to a sequence of heads and tails we can assign the number of heads
observed. There are innumerable examples of random variables. For instance, age, weight,
height, income, number of children and number of cars are possible random variables associated
with a randomly chosen person. The numbers of balls of a given color are random variables
associated with a random selection of balls from an urn, or the sum of the outcomes is a
random variable associated with the experiment of tossing two dice. Usually capital letters
such as X, Y and Z are used to denote random variables. However, when speaking of the value
of these variables, in general, lowercase letters such as x, y and z are used. Random variables
provide a rigorous framework for quantifying and analyzing the outcomes of complex, real-world
experiments and processes across multiple domains.

3.1 Discrete Random Variables
Given a random variable X, if the range space RX is finite or countably infinite, X is called a
discrete random variable.
The range space of such a variable can be written as RX = {x1, x2, . . . , xn, . . .}, i.e., in the finite case the list of values terminates and in the countably infinite case it continues indefinitely.
We associate a probability p(xi) with each element xi of RX, such that p(xi) ≥ 0 for all i and p(x1) + p(x2) + . . . = 1.

A discrete random variable is described by a probability mass function (PMF), which provides the probability of each possible value.

3.2 Probability Mass Function


The Probability Mass Function (PMF) is a fundamental concept in probability theory that
applies to discrete random variables. It provides a way to describe the probability distribution
of a discrete random variable by assigning probabilities to each possible value the variable can
take.
For a discrete random variable X, the PMF is denoted as P(X = x) and defined as follows:

P (X = x) = p(x) = P (X takes the value x)


Where
• p(x) is the probability that the random variable X equals a specific value x.
• p(x) ≥ 0 for all x (the probability cannot be negative).
• The sum of probabilities across all possible values must equal 1:
\[ \sum_{x \in S} p(x) = 1 \]
where S is the set of all possible values of the discrete random variable X.
Let X be a discrete or continuous random variable. The (cumulative) distribution function of X (abbreviated as cdf) is defined by F(x) = P(X ≤ x). It follows that F(x) is defined on the entire real line. We obtain immediately
\[ F(x) = \int_{-\infty}^{x} f(s)\, ds \]
for a continuous random variable with density f, and
\[ F(x) = \sum_{x_j \le x} p(x_j) \]
for a discrete random variable.
3.3 Example on Probability Mass Function


Example 1: Find the probability distribution of the number of selfish actions drawn in the
following imagined situation. You have a collection of 20 actions, of which 4 are "selfish actions"
and 16 are "kind actions". Two actions are drawn from this collection without replacement. Let X
be the number of selfish actions drawn.

Solution: The total number of actions is 20. When drawing two actions, X can take the values
0, 1 or 2. We use combinations to calculate the probability of each of these outcomes.

Total number of ways to choose 2 actions from 20:
\[ \binom{20}{2} = \frac{20 \times 19}{2} = 190 \]

Probability of drawing 0 selfish actions (both actions drawn are kind):
\[ P(X = 0) = \frac{\binom{4}{0}\binom{16}{2}}{\binom{20}{2}} = \frac{1 \cdot 120}{190} = \frac{12}{19} \approx 0.632 \]

Probability of drawing 1 selfish action (1 selfish action and 1 kind action):
\[ P(X = 1) = \frac{\binom{4}{1}\binom{16}{1}}{\binom{20}{2}} = \frac{4 \cdot 16}{190} = \frac{64}{190} = \frac{32}{95} \approx 0.337 \]

Probability of drawing 2 selfish actions:
\[ P(X = 2) = \frac{\binom{4}{2}\binom{16}{0}}{\binom{20}{2}} = \frac{6 \cdot 1}{190} = \frac{3}{95} \approx 0.032 \]

Summary of the probability distribution: the probability distribution of the number of selfish
actions drawn when picking 2 actions is
P(X = 0) = 12/19 ≈ 0.632, P(X = 1) = 32/95 ≈ 0.337, P(X = 2) = 3/95 ≈ 0.032.
These three probabilities sum to 1.
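The counting argument of Example 1 can be checked with a few lines of Python. This is only a sketch: the function name `selfish_pmf` and its parameter names are illustrative choices, not part of the notes, and exact fractions are used so that the results can be compared with 12/19, 32/95 and 3/95 directly.

```python
from fractions import Fraction
from math import comb


def selfish_pmf(k, selfish=4, kind=16, draws=2):
    """P(X = k): exactly k selfish actions among `draws` actions
    drawn without replacement (hypothetical helper for Example 1)."""
    total = selfish + kind
    favourable = comb(selfish, k) * comb(kind, draws - k)
    return Fraction(favourable, comb(total, draws))


# PMF over the three possible values of X
pmf = {k: selfish_pmf(k) for k in range(3)}
for k, p in pmf.items():
    print(k, p, round(float(p), 3))
```

Because the three values are the only possible outcomes, the fractions in `pmf` should add up to exactly 1, which is a quick sanity check on any PMF.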
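Example 2: Let X be a random variable taking the values 1, 2 and 3 with probabilities 3/15,
7/15 and 5/15. Find its distribution function and show the distribution function diagrammatically.
Solution: The distribution function FX(x) = P(X ≤ x) at the three possible values is

FX(1) = P(X=1) = 3/15

FX(2) = P(X=1)+P(X=2) = 3/15+7/15 = 10/15

FX(3) = P(X=1)+P(X=2)+P(X=3) = 3/15+7/15+5/15 = 1

Hence FX(x) = 0 for x < 1, 3/15 for 1 ≤ x < 2, 10/15 for 2 ≤ x < 3, and 1 for x ≥ 3. Plotted
against x, FX is a staircase with jumps of 3/15, 7/15 and 5/15 at x = 1, 2 and 3.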
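The running sums in Example 2 are what a CDF of a discrete variable always is: a cumulative sum of the PMF. A minimal sketch, with the names `values`, `probs`, `cdf_steps` and `cdf` chosen here purely for illustration:

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3]
probs = [Fraction(3, 15), Fraction(7, 15), Fraction(5, 15)]

# Height of the staircase at each jump point: running sum of the PMF
cdf_steps = dict(zip(values, accumulate(probs)))


def cdf(x):
    """F(x) = P(X <= x): add the PMF over all values not exceeding x."""
    return sum((p for v, p in zip(values, probs) if v <= x), Fraction(0))


for x in (0.5, 1, 2.7, 3, 10):
    print(x, cdf(x))
```

Evaluating `cdf` between the jump points (for example at 2.7) shows the flat segments of the staircase.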

Another example is a fair die, for which
\[ p(x) = \begin{cases} \dfrac{1}{6} & \text{if } x = 1, 2, 3, 4, 5, 6 \\ 0 & \text{otherwise} \end{cases} \tag{2} \]

3.4 Usage in Discrete Probability Distributions


• Binomial Distribution: For modeling the number of successes in a series of independent
Bernoulli trials.

• Poisson Distribution: For counting the number of events occurring within a fixed interval
of time or space.

• Geometric Distribution: For modeling the number of trials until the first success occurs.

For a discrete random variable X, the CDF, denoted FX(x), is defined mathematically as
\[ F_X(x) = P(X \le x), \]
where FX(x) is the cumulative distribution function evaluated at the value x, and P(X ≤ x) is the
probability that the random variable X takes on a value less than or equal to x.

3.5 Binomial distribution:


The binomial distribution is a discrete probability distribution that models the number of
successes in a fixed number of independent trials, each with the same probability of success.
Here are the key characteristics of the binomial distribution:

• Fixed Number of Trials (n): The binomial distribution is defined for a fixed number of
trials, denoted n. Each trial is an independent event.

• Two Possible Outcomes: Each trial results in one of two possible outcomes: "success" or
"failure". The probability of success is denoted p and the probability of failure is denoted q,
where q = 1 − p.

• Constant Probability of Success (p): The probability of success remains constant across all
trials. For instance, if you are flipping a coin, the probability of getting heads (considered
a success) remains 0.5 in each trial.

• Independence of Trials: The outcome of one trial does not affect the outcomes of others.
This independence is crucial for applying the binomial distribution.

• Random Variable: The random variable X represents the number of successes in the
n trials. X can take values from 0 to n.

• Probability Mass Function (PMF): The probability of obtaining exactly k successes
in n trials is given by the probability mass function
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \]
where \(\binom{n}{k}\) is the binomial coefficient, representing the number of ways to choose k
successes from n trials.

• Mean and Variance: The mean (expected value) of the binomial distribution is
\[ \mu = np. \]
The variance of the binomial distribution is
\[ \sigma^2 = np(1 - p), \]
so the standard deviation is \(\sigma = \sqrt{np(1 - p)}\).

• Shape of the Distribution: The shape of the binomial distribution depends on
the values of n and p. For p = 0.5, the distribution is symmetric for every n.
For p < 0.5, the distribution is skewed to the right.
For p > 0.5, the distribution is skewed to the left.
For p ≠ 0.5, the skew becomes less pronounced as n grows.

Applications of Binomial Distribution: The binomial distribution is widely used in
various fields, including

• Quality control: Testing for defective items.
• Medicine: Determining treatment success rates.
• Marketing: Predicting the success of campaigns.
• Finance: Analyzing success probabilities in investment decisions.

3.6 Example on Binomial Distribution:
Example 2: Suppose that an item is perfect with probability p = 0.8 and defective with
probability 1 − p = 0.2. What is the probability that there are exactly two defective pieces among five
manufactured items, produced in sequence? Assume that the probability p is the same for each
item throughout the duration of the study and that the results of the individual productions
are independent events.
Solution: Let X be the number of defective items among the n = 5 items. Counting a defective
item as a "success", X is binomial with success probability 0.2, so
\[ P(X = k) = \binom{5}{k} (0.2)^k (0.8)^{5-k}. \]
The binomial coefficient for k = 2 is
\[ \binom{5}{2} = \frac{5!}{2!\,(5-2)!} = \frac{5 \times 4}{2 \times 1} = 10. \]
Therefore
\[ P(X = 2) = \binom{5}{2} (0.2)^2 (0.8)^{5-2} = 10 \times (0.2)^2 \times (0.8)^3 = 10 \times 0.04 \times 0.512 = 0.2048. \]
Thus, the probability is 0.2048, or 20.48%.
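The binomial PMF and its mean and variance can be evaluated numerically as a cross-check. The sketch below assumes nothing beyond the formulas above; the names `binomial_pmf`, `p_defective` and `p_two_defective` are illustrative, and "success" here means a defective item, as in the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p_defective = 5, 0.2

# Exactly two defective items among five
p_two_defective = binomial_pmf(2, n, p_defective)

# Mean and variance computed directly from the PMF, to compare
# with the closed forms n*p and n*p*(1-p)
mean = sum(k * binomial_pmf(k, n, p_defective) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p_defective)
               for k in range(n + 1))

print(round(p_two_defective, 4), round(mean, 4), round(variance, 4))
```

Summing over the PMF rather than calling the closed forms directly makes the check independent of the formulas being verified.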

Example 3: Studies show that color blindness affects 8% of men. A random sample of 10
men is taken. Find the probability that (1) all 10 men are color blind, (2) no men are color blind,
(3) exactly 2 men are color blind and (4) at least 2 men are color blind.
Solution: Given that the probability of a man being color blind is p = 0.08 (which is 8%),
the probability of a man not being color blind is 1 − p = 0.92. With n = 10, the binomial PMF is
\[ P(X = k) = \binom{10}{k} (0.08)^k (0.92)^{10-k}. \]
Case 1: All 10 men are color blind
\[ P(X = 10) = \binom{10}{10} (0.08)^{10} (0.92)^{0} = (0.08)^{10} \approx 1.073741824 \times 10^{-11} \]
Case 2: No men are color blind
\[ P(X = 0) = \binom{10}{0} (0.08)^{0} (0.92)^{10} = (0.92)^{10} \approx 0.4344 \]
Case 3: Exactly 2 men are color blind
\[ P(X = 2) = \binom{10}{2} (0.08)^{2} (0.92)^{8} = 45 \times 0.0064 \times 0.5132 \approx 0.1478 \]
Case 4: At least 2 men are color blind
\[ P(X \ge 2) = 1 - P(X < 2), \qquad P(X < 2) = P(X = 0) + P(X = 1) \]
\[ P(X = 1) = \binom{10}{1} (0.08)^{1} (0.92)^{9} = 10 \times 0.08 \times 0.4722 \approx 0.3777 \]
\[ P(X < 2) = P(X = 0) + P(X = 1) \approx 0.4344 + 0.3777 = 0.8121 \]
\[ P(X \ge 2) = 1 - P(X < 2) \approx 1 - 0.8121 = 0.1879 \]
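The four cases of the color-blindness example can be recomputed in one pass. This is a sketch with illustrative variable names; the "at least 2" case uses the complement rule exactly as in the solution.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 10, 0.08

p_all = binomial_pmf(10, n, p)          # Case 1: all 10 color blind
p_none = binomial_pmf(0, n, p)          # Case 2: none color blind
p_exactly_two = binomial_pmf(2, n, p)   # Case 3: exactly 2
# Case 4: complement rule, P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_two = 1 - p_none - binomial_pmf(1, n, p)

for name, value in [("all", p_all), ("none", p_none),
                    ("exactly two", p_exactly_two),
                    ("at least two", p_at_least_two)]:
    print(f"{name}: {value:.4g}")
```

Using the complement avoids summing nine separate terms for k = 2, ..., 10.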

3.7 Poisson’s distribution:


Poisson’s distribution is a discrete probability distribution that models the number of events
occurring within a fixed interval of time or space. These events must be independent, and the
average rate at which they occur should be constant. It is particularly useful for modeling rare
events.
Key Characteristics of Poisson Distribution:
• Events in Fixed Intervals: The distribution is used to model the number of events
happening in a specific fixed interval of time, space, area, or volume.
• Independence: Events occur independently of each other. The occurrence of one event
does not affect the probability of another occurring.
• Constant Average Rate (λ): Events occur with a known constant mean rate λ, which
is the expected number of occurrences in the interval.
• Discrete Outcomes: The possible number of events k in any given interval is any
non-negative integer 0, 1, 2, . . .
• Poisson Probability Mass Function (PMF): The probability of observing k events in an
interval is given by the formula
\[ P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \]
where
X is the random variable representing the number of events,
k is the number of occurrences,
λ is the average number of events in the interval, and
e is the base of the natural logarithm, approximately equal to 2.71828.

Applications:
Poisson's distribution is used in various real-world situations, such as:

• The number of phone calls received by a call center in an hour.
• The number of decay events per unit time from a radioactive source.
• The number of printing errors on a single page.
• Traffic flow, such as the number of cars passing through a toll booth in an hour.

Mean and Variance:

• Mean of the Poisson distribution is λ.
• Variance of the Poisson distribution is also λ.

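The statement that the mean and the variance both equal λ can be checked numerically from the PMF. In the sketch below the rate `lam = 3.0` is an assumed illustrative value (not taken from the notes), and the infinite sum is truncated at 60 terms, which is an approximation that is harmless for such a small rate.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 3.0        # assumed rate, for illustration only
max_k = 60       # truncation point of the infinite sum (assumption)

probs = [poisson_pmf(k, lam) for k in range(max_k)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(round(total, 6), round(mean, 6), round(variance, 6))
```

For larger rates the truncation point would have to grow with λ for the same accuracy.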
3.8 Continuous Random Variables:
X is called a continuous random variable if there is a function f, called the probability density
function (pdf), satisfying the following conditions:

• Non-negativity: The pdf must be non-negative for all possible values of the random
variable: f(x) ≥ 0 for all x.

• Normalization: The total area under the pdf over the entire range of the random variable
must be equal to 1. This represents the certainty that some value within the range will
occur:
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]

• Probability of an Interval: The probability that the random variable X falls within an
interval [a, b] is given by the integral of the pdf over that interval:
\[ P(a \le X \le b) = \int_{a}^{b} f(x)\,dx \]

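Example 4: The function given is
\[ f(x) = \frac{4}{65} x^{3} \quad \text{for } 2 \le x \le 3, \]
and f(x) = 0 elsewhere. Show that f(x) is a pdf and determine the probability P(1.5 ≤ X ≤ 2.5).
Solution: f(x) ≥ 0 everywhere, and
\[ \int_{2}^{3} \frac{4}{65} x^{3}\,dx = \frac{4}{65} \left[ \frac{x^{4}}{4} \right]_{2}^{3} = \frac{81 - 16}{65} = 1, \]
so f(x) is a pdf. Since f(x) = 0 for x < 2, only the part of the interval from 2 to 2.5 contributes:
\[ P(1.5 \le X \le 2.5) = \int_{2}^{2.5} \frac{4}{65} x^{3}\,dx = \frac{4}{65} \left[ \frac{x^{4}}{4} \right]_{2}^{2.5} = \frac{39.0625 - 16}{65} \approx 0.355 \]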
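Both integrals in Example 4 can be approximated numerically. The sketch below uses a simple midpoint rule written out by hand; the helper `integrate` and its step count are illustrative assumptions, not a method prescribed by the notes.

```python
def pdf(x):
    """f(x) = (4/65) x^3 on [2, 3], zero elsewhere (Example 4)."""
    return 4 * x**3 / 65 if 2 <= x <= 3 else 0.0


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


total_area = integrate(pdf, 2, 3)      # should be close to 1
prob = integrate(pdf, 1.5, 2.5)        # P(1.5 <= X <= 2.5)

print(round(total_area, 4), round(prob, 4))
```

Integrating from 1.5 rather than from 2 gives the same answer because the pdf is zero below 2, which is the point made in the solution.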

If the pdf of a random variable is given, we can identify the sample space as the region of
the real axis where the pdf has positive values.

Uniformly distributed continuous random variable: A uniformly distributed continuous
random variable is equally likely to fall anywhere within a specified range. This type of
distribution is defined by its constant probability density function (pdf) over a particular
interval, making it one of the simplest forms of continuous probability distributions.
Key Characteristics of Uniform Distribution
• Uniformity: Within the specified interval [a, b], every outcome is equally likely, so the
density is constant on the interval. Outside this interval, the density is zero.
• Probability Density Function (pdf): For a uniform distribution over the interval [a, b],
the pdf is given by
\[ f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b. \]
• This constant value 1/(b − a) ensures that the total probability over the interval equals 1.
The area under the pdf is a rectangle of width (b − a), so
\[ P(a \le X \le b) = 1 = \text{height} \times (b - a), \qquad \text{height} = f(x) = \frac{1}{b - a}. \]
• Mean (expected value) and median:
\[ \mu = \frac{a + b}{2} \]
• Standard deviation:
\[ \sigma = \sqrt{\frac{(b - a)^{2}}{12}} \]
• Probability of a sub-interval, for a ≤ c ≤ d ≤ b:
\[ P(c \le X \le d) = \frac{d - c}{b - a} \]

3.8.1 Uniform Random Variable


In a communication system, the phase of a radio frequency (RF) sinusoid can be considered
as a uniformly distributed random variable over the interval [0, 2π) when the receiver
has no prior information about it. The phase of a sinusoidal signal is a crucial parameter in
determining its precise position or timing within one complete cycle of the waveform.
Mathematically, a sinusoidal signal can be expressed as A cos(ωt + ϕ). The phase of a sinusoid is
typically measured in radians, and a full cycle corresponds to 2π radians. While the transmitter
may know the precise phase ϕ, the receiver often does not have prior information about
the phase due to channel impairments or lack of synchronization. Without any additional
information, all phases between 0 and 2π are equally possible at the receiver. In such a scenario,
the phase at the receiver can be modeled as a random variable that is uniformly distributed
over [0, 2π). A uniform distribution over [0, 2π) implies that any phase within this interval is
equally likely. This reflects a state of maximum uncertainty in the phase information. The
probability density function (PDF) for a uniformly distributed random variable Φ over [0, 2π) is
\[ f(\phi) = \frac{1}{2\pi} \quad \text{for } 0 \le \phi < 2\pi. \]
The height of the PDF is constant at 1/(2π), ensuring that the
total probability over the interval equals 1.
Modeling the phase as a uniform random variable is mathematically convenient and realistic
in cases where the receiver does not have phase information. This assumption facilitates the
analysis and design of communication systems, particularly in evaluating system performance,
designing demodulation schemes, and studying the effects of phase uncertainty.

A uniform distribution, specifically a continuous uniform distribution defined on the interval
[a, b], has the following characteristics:

• Probability Density Function (PDF): The PDF f(x) for a uniform distribution is given by
\[ f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \tag{3} \]
This indicates that the probability density is constant across the interval [a, b] and zero
outside this interval.

The cumulative distribution function F(x) is defined as the probability that the random
variable X takes a value less than or equal to x: F(x) = P(X ≤ x).
To derive the CDF from the PDF, we integrate the PDF up to x, considering different cases
based on the value of x relative to a and b.
Case 1: x < a. For values of x less than a, the CDF is
\[ F(x) = P(X \le x) = 0 \quad \text{(since the distribution starts at } a\text{)}. \]
Case 2: a ≤ x ≤ b. For values of x within the interval [a, b], we integrate the
PDF from a to x:
\[ F(x) = P(X \le x) = \int_{a}^{x} f(t)\,dt = \int_{a}^{x} \frac{1}{b - a}\,dt = \frac{1}{b - a}\,[t]_{a}^{x} = \frac{x - a}{b - a} \quad \text{for } a \le x \le b. \]
Case 3: x > b. For values greater than b, the CDF is
\[ F(x) = P(X \le x) = 1 \quad \text{(since all of the distribution is covered)}. \]

Putting all three cases together, the cumulative distribution function F(x) for a uniform
distribution on the interval [a, b] can be expressed as:
\[ F(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } x > b \end{cases} \tag{4} \]
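The three-case CDF translates directly into code. This is a minimal sketch; the function names and the interval [0, 4] used in the demonstration are illustrative assumptions.

```python
def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for X uniform on [a, b]."""
    if x < a:
        return 0.0          # Case 1: left of the interval
    if x > b:
        return 1.0          # Case 3: right of the interval
    return (x - a) / (b - a)  # Case 2: inside the interval


def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d) obtained as F(d) - F(c)."""
    return uniform_cdf(d, a, b) - uniform_cdf(c, a, b)


a, b = 0.0, 4.0   # assumed interval, for illustration only
for x in (-1.0, 1.0, 4.0, 5.0):
    print(x, uniform_cdf(x, a, b))
print(uniform_interval_prob(1.0, 3.0, a, b))
```

Computing interval probabilities as F(d) − F(c) reproduces the formula (d − c)/(b − a) whenever both endpoints lie inside [a, b].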

3.8.2 Exponential Random Variable :


An exponential random variable is a type of continuous random variable that is commonly used
to model the time until an event occurs, particularly in processes that are memoryless. It is
characterized by a constant average rate of occurrence and follows an exponential distribution.
The probability density function (PDF) of an exponential random variable X is given by
\[ f(x; \lambda) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0, \]
where λ > 0 is the rate parameter of the distribution. It
represents the average number of events per unit time.
The cumulative distribution function (CDF) is
\[ F(x; \lambda) = P(X \le x) = 1 - e^{-\lambda x} \quad \text{for } x \ge 0. \]
The exponential distribution is memoryless, meaning that the probability of an event occurring
in the next time interval is independent of how much time has already passed. Mathematically,
this is represented as
\[ P(X > s + t \mid X > s) = P(X > t). \]
The expected value (mean) of an exponential random variable is
\[ E[X] = \frac{1}{\lambda}. \]
The variance is given by
\[ \mathrm{Var}(X) = \frac{1}{\lambda^{2}}. \]

Applications of Exponential Random Variables: Exponential random variables are
widely used in various fields, particularly where the timing of events is of interest. Here are
some notable applications:
• Queueing Theory: In systems where customers arrive at a service point (like banks or call
centers), the time between arrivals can be modeled as an exponential random variable.
This helps evaluate service efficiency and waiting times.
• Reliability Engineering: The time until failure of mechanical systems (like components
in a machine) is often modeled using the exponential distribution, especially when the
failure rate is constant.
• Telecommunications: The time between successive transmissions of data packets can be
modeled as exponential, helping to analyze network performance and congestion.
• Survival Analysis: In medical and biological studies, the time until an event such as death
or relapse from a disease can be modeled as an exponential random variable, allowing
researchers to analyze survival rates and treatment effectiveness.
• Life Testing: In industrial applications, companies may conduct tests to understand the
lifespan of products. The time until failure can be modeled as an exponential random
variable, providing insights into product reliability.
• Inventory Systems: The time between restocks in inventory management can be modeled
using the exponential distribution to optimize ordering and storage.

Example
Suppose a call center receives phone calls randomly at an average rate of 3 calls per hour.
Let X be the time (in hours) until the next call.
Here, λ = 3 calls per hour. The PDF is given by
\[ f(x; 3) = 3e^{-3x} \quad \text{for } x \ge 0. \]
If we want to find the probability that the time until the next call is more than 30 minutes
(i.e., x > 0.5 hours), we can calculate
\[ P(X > 0.5) = 1 - F(0.5) = e^{-3 \times 0.5} = e^{-1.5} \approx 0.2231. \]
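The call-center calculation and the memoryless property can both be checked from the CDF. In this sketch the function names and the particular values of `s` and `t` are illustrative assumptions; only λ = 3 and the 0.5 hour threshold come from the example.

```python
from math import exp


def exp_cdf(x, lam):
    """F(x) = P(X <= x) for an exponential variable with rate lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x, lam):
    """P(X > x) = 1 - F(x)."""
    return 1 - exp_cdf(x, lam)


lam = 3.0  # calls per hour, as in the example

# Probability that the next call takes more than half an hour
p_wait_over_half_hour = exp_survival(0.5, lam)

# Memoryless check: P(X > s + t | X > s) should equal P(X > t)
s, t = 0.25, 0.5   # arbitrary illustrative times, in hours
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)

print(round(p_wait_over_half_hour, 4), round(conditional, 4))
```

The conditional probability is computed from its definition, P(X > s + t) / P(X > s), so agreement with P(X > t) is a genuine check rather than a restatement.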

3.9 Comparison between exponential distribution and Poisson's distribution

The Poisson distribution describes the number of events that occur in a set time period. For
example, you might use a Poisson distribution to model the number of customers that arrive at
a coffee shop in an hour.
The exponential distribution describes the time between events that occur continuously over time.
For example, you might use an exponential distribution to model the time between customers
arriving at a coffee shop. The exponential distribution has a memoryless feature, meaning that
the probability of an event occurring in the next interval is the same regardless of how long it
has been since the last event.

Comparison between the distributions: The binomial distribution deals with the number
of successes in a fixed number of independent trials, and the geometric distribution deals with
the waiting time (number of trials) between successes in a series of independent trials. Just so,
the Poisson distribution deals with the number of occurrences in a fixed period of time, and the
exponential distribution deals with the time between occurrences of successive events as time
flows by continuously.

A continuous random variable is a random variable which can take any value in some interval. A
continuous random variable is characterized by its probability density function, a graph which
has a total area of 1 beneath it. The probability of the random variable taking values in any
interval is simply the area under the curve over that interval (and the probability of the random
variable taking any one specific value is 0).

The exponential distribution: Consider the time between successive incoming calls at a
switchboard, or between successive patrons entering a store. These "interarrival" times are
typically exponentially distributed. If the mean interarrival time is 1/λ (so λ is the mean
arrival rate per unit time), then the variance will be 1/λ² (and the standard deviation will be
1/λ). The graph below displays the exponential density function when λ = 1. Generally, if X is
exponentially distributed, then for 0 ≤ s < t,
\[ \Pr(s < X \le t) = e^{-\lambda s} - e^{-\lambda t} \quad (\text{where } e \approx 2.71828). \]
The exponential distribution fits the examples cited above because it is the only continuous
distribution with the "lack-of-memory" property: if X is exponentially distributed, then
\[ \Pr(X \le s + t \mid X > s) = \Pr(X \le t). \]
(After waiting a minute without a call, the probability of a call arriving in the next two minutes
is the same as was the probability (a minute ago) of getting a call in the following two minutes.
As you continue to wait, the chance of something happening "soon" neither increases nor decreases.)
Note that, among discrete distributions, the geometric distribution is the only one with the
lack-of-memory property; indeed, the exponential and geometric distributions are analogues of
one another.

3.9.1 A Laplace random variable :


A Laplace random variable is a type of continuous random variable that is often used to model
data with heavy tails, meaning that it has a higher likelihood of producing extreme values
compared to a distribution like the normal distribution. The Laplace distribution is defined
by its probability density function (PDF). If we denote the random variable as X and
the parameters of the distribution as µ (the location parameter) and b (the scale
parameter), the PDF of a Laplace-distributed random variable is given by
\[ f(x; \mu, b) = \frac{1}{2b}\, e^{-\frac{|x - \mu|}{b}} \]
for all x ∈ R, where µ is the mean of the distribution and b > 0 controls the spread or scale
of the distribution.
Characteristics of the Laplace Distribution:
• Shape: The Laplace distribution has a double-exponential shape, characterized by a peak
at the mean µ and tails that decrease exponentially. This gives it the heavy tail
property.
• Mean and Variance: The mean of the Laplace distribution is µ. The variance
is given by
\[ \mathrm{Var}(X) = 2b^{2}. \]
• Cumulative Distribution Function (CDF): The CDF of the Laplace distribution is given by
\[ F(x; \mu, b) = \begin{cases} \dfrac{1}{2}\, e^{\frac{x - \mu}{b}} & \text{if } x < \mu \\ 1 - \dfrac{1}{2}\, e^{-\frac{x - \mu}{b}} & \text{if } x \ge \mu \end{cases} \tag{5} \]
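The Laplace PDF and CDF can be written down directly, and the relation between them (the PDF is the derivative of the CDF) can be checked numerically. The parameter values `mu = 1.0` and `b = 2.0` below are assumptions chosen only for illustration.

```python
from math import exp


def laplace_pdf(x, mu, b):
    """f(x; mu, b) = exp(-|x - mu| / b) / (2b)."""
    return exp(-abs(x - mu) / b) / (2 * b)


def laplace_cdf(x, mu, b):
    """Piecewise CDF of the Laplace distribution."""
    if x < mu:
        return 0.5 * exp((x - mu) / b)
    return 1 - 0.5 * exp(-(x - mu) / b)


mu, b = 1.0, 2.0   # assumed parameters, for illustration only
x0, h = 2.5, 1e-5

# Central difference of the CDF approximates the PDF away from x = mu
slope = (laplace_cdf(x0 + h, mu, b) - laplace_cdf(x0 - h, mu, b)) / (2 * h)

print(laplace_cdf(mu, mu, b), round(slope, 6), round(laplace_pdf(x0, mu, b), 6))
```

The check point `x0` is deliberately kept away from µ, where the PDF has a corner and a finite-difference slope would be less informative.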

Example 5: A bus is uniformly late between 2 and 10 minutes. How long can you expect to
wait? With what standard deviation? If it is more than 7 minutes late, you will be late for
work. What is the probability of you being late?
Solution: a = 2, b = 10
\[ \mu = \frac{2 + 10}{2} = 6 \]
\[ \sigma = \sqrt{\frac{(10 - 2)^{2}}{12}} \approx 2.31 \]
\[ P(7 \le X \le 10) = \frac{10 - 7}{10 - 2} = 0.375 \]
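The three quantities in the bus example follow from the uniform-distribution formulas and can be recomputed in a few lines. Variable names here are illustrative.

```python
from math import sqrt

a, b = 2, 10                 # lateness in minutes, uniform on [a, b]
threshold = 7                # later than this means being late for work

mean_wait = (a + b) / 2                 # mu = (a + b) / 2
std_wait = sqrt((b - a) ** 2 / 12)      # sigma = sqrt((b - a)^2 / 12)
p_late = (b - threshold) / (b - a)      # P(threshold <= X <= b)

print(mean_wait, round(std_wait, 2), p_late)
```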

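Example 6: Let X be the lifetime of a certain electronic component (in hours). Suppose
that the pdf is given by
\[ f(x) = \frac{C}{x^{2}} \quad \text{for } 1000 \le x \le 2000, \]
and f(x) = 0 elsewhere. The pdf implies that we are assigning probability zero to the events
X < 1,000 and X > 2,000. Find the constant C.
Solution: The pdf must integrate to 1. Since
\[ \int_{1000}^{2000} \frac{1}{x^{2}}\,dx = \frac{1}{1000} - \frac{1}{2000} = \frac{1}{2000}, \]
we need C · (1/2000) = 1, that is,
\[ C = 2000. \]
The constant C is called a normalizing constant.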

3.10 Random Process:


A random process (also called a stochastic process) is a collection of random variables indexed
by time or space, representing the evolution of some system of random values over time or
space. It is used to model systems or phenomena that evolve in a way that is not deterministic,
meaning their future evolution is dependent on probabilities rather than being fixed by initial
conditions.

3.11 Types of Random Processes:


• Discrete-Time Process: The index set is discrete, such as a sequence of random variables
at specific time points. Example: Stock prices at the end of each trading day.

• Continuous-Time Process: The index set is continuous, which allows variables at any
instant within a time interval. Example: Temperature readings over time.

• Stationary and Non-Stationary Processes: In a stationary process, statistical properties
like mean and variance remain constant over time, whereas in a non-stationary process,
statistical properties change over time.

3.12 Examples of Random Processes:


• Brownian Motion (Wiener Process): A continuous-time, continuous-state process often
used to model stock prices and physical phenomena like the diffusion of particles.

• Poisson Process: A discrete-state, continuous-time process used to model the occurrence
of events over time, such as phone call arrivals or radioactive decay.

• Markov Process: A process where the future state depends only on the current state, not
on the sequence of events that preceded it.

4 Multiple Random Variables
We have so far only considered one-dimensional random variables, i.e., we assumed that the
outcome of a random experiment could be represented as a single number. However, in many
practical situations there are several characteristics associated with the elements of a popula-
tion or a sample. For example, a physician is interested in several characteristics of a patient,
e.g., age, weight, blood pressure, blood sugar values, etc. In evaluating the competitiveness of
the countries of a certain community, several characteristics are interesting, as, e.g., an index of
unemployment, stock prices, exchange values, etc. A bidimensional random variable gives you
a way to study how two random variables relate to each other. By analyzing them together,
you can assess correlations, patterns, and dependencies that single random variables might
miss. The range space—whether discrete or continuous—provides a visual representation of all
possible outcomes for these two variables on a coordinate plane. [2]

4.1 Example:
Let X, Y denote the number of sons and daughters of a family, randomly chosen from a certain
district of a city. The hypothetical probability distribution is given in the following table.

              X = 0   X = 1   X = 2   X = 3   X = 4   P(Y = j)
  Y = 0        0.02    0.02    0.03    0.08    0.05     0.20
  Y = 1        0.05    0.10    0.15    0.15    0.05     0.50
  Y = 2        0.05    0.05    0.10    0.05    0.05     0.30
  P(X = i)     0.12    0.17    0.28    0.28    0.15     1

The joint probabilities p(x, y) = P(X = x, Y = y) satisfy
\[ p(x, y) \ge 0 \ \text{for all } x, y, \qquad \sum_{x} \sum_{y} p(x, y) = 1, \]
and the probability of any event B is
\[ P(B) = \sum_{(x, y) \in B} p(x, y). \]
The last column and the last row of the table contain the sums of the rows and columns,
respectively. For example, the first value of the last column is p(0, 0) + p(1, 0) + . . . + p(4, 0)
= 0.02+0.02+0.03+0.08+0.05 = 0.20. This is the probability that Y = 0 while X takes any
possible value, so the sum is P(Y=0). The last column can therefore be interpreted as the distribution
of the variable Y. In the same way the last row can be interpreted as the distribution of X.
Since the probability distributions of X and Y appear at the margins of the table, they are
called the marginal distributions of X and Y, respectively. We now calculate the probabilities
of some events.
Let A = (X > Y), i.e., A denotes the event that the family has more sons than daughters. We
obtain:
\[ P(A) = \sum_{(x, y):\, x > y} p(x, y) \]
= p(1,0)+p(2,0)+p(3,0)+p(4,0)+p(2,1)+p(3,1)+p(4,1)+p(3,2)+p(4,2)
= 0.02+0.03+0.08+0.05+0.15+0.15+0.05+0.05+0.05 = 0.63.
Similarly, the probability of the event B = (X = Y) is obtained as P(B) = p(0,0)+p(1,1)+p(2,2) =
0.02+0.10+0.10 = 0.22.
Finally, for C = (X + Y ≥ 4) we get
P(C) = p(2,2)+p(3,1)+p(3,2)+p(4,0)+p(4,1)+p(4,2) = 0.10+0.15+0.05+0.05+0.05+0.05 = 0.45.

In order to calculate a probability depending on only one of the variables X, Y, we need only
the values in the margins of the table. For example, P(Y ≤ 1) = P(Y=0)+P(Y=1) = 0.20+0.50 = 0.70.
One can also calculate conditional probabilities of two events depending on X and Y. For
example, the probability that the family has at least three sons given that it has no daughter is:

\[ P(X \ge 3 \mid Y = 0) = \frac{P(X \ge 3,\, Y = 0)}{P(Y = 0)} = \frac{0.08 + 0.05}{0.20} = 0.65 \]
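The table calculations above (marginals, event probabilities and the conditional probability) can be reproduced by storing the joint PMF as a dictionary. This is a sketch: the names `rows`, `joint` and `prob` are illustrative, and the table entries are entered in hundredths and converted to exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

# Table rows indexed by y (daughters); entries are P(X = x, Y = y)
# in hundredths for x = 0, 1, 2, 3, 4 (sons)
rows = {
    0: [2, 2, 3, 8, 5],
    1: [5, 10, 15, 15, 5],
    2: [5, 5, 10, 5, 5],
}
joint = {(x, y): Fraction(c, 100)
         for y, row in rows.items() for x, c in enumerate(row)}


def prob(event):
    """P(B): sum of p(x, y) over the pairs for which event(x, y) holds."""
    return sum((p for (x, y), p in joint.items() if event(x, y)), Fraction(0))


marginal_x = {x: prob(lambda u, v, x=x: u == x) for x in range(5)}
marginal_y = {y: prob(lambda u, v, y=y: v == y) for y in range(3)}

p_more_sons = prob(lambda x, y: x > y)            # event A
p_equal = prob(lambda x, y: x == y)               # event B
p_at_least_four = prob(lambda x, y: x + y >= 4)   # event C
p_cond = prob(lambda x, y: x >= 3 and y == 0) / marginal_y[0]

print(float(p_more_sons), float(p_equal), float(p_at_least_four), float(p_cond))
```

Writing each event as a predicate on (x, y) mirrors the formula P(B) = Σ p(x, y) over the pairs in B.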

4.2 Two-Dimensional Random Variables


The joint density function (also known as the joint probability density function) describes the
probability distribution of two or more random variables. The following are the key properties of
a joint density function:

• Non-Negativity:
The joint density function fX,Y(x, y) must be non-negative for all values of x and y. That is,
\[ f_{X,Y}(x, y) \ge 0 \quad \text{for all } x, y. \]

• Normalization:
The total probability over the entire space must equal 1. This is expressed mathematically as
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1. \]
This means that when you integrate the joint density
function over the entire range of both variables, the result must be 1.

• Marginal Densities:
The marginal density functions for each variable can be obtained by integrating the joint
density function with respect to the other variable. The marginal density functions fX(x)
and fY(y) are given by
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx. \]

• Independence:
If the random variables X and Y are independent, the joint density function can
be expressed as the product of the marginal densities:
\[ f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y). \]
If this property holds, it indicates that knowledge of one variable does not provide any
information about the other.

4.3 Example:
An urn contains 3 balls numbered 1, 2, 3 and two balls are drawn in succession. If X is the
number on the first ball drawn and Y is the number on the second ball, find the probability
distribution of (X, Y). Case 1: The balls are replaced each time. Case 2: The balls are not
replaced.
Solution: Case 1: The balls are replaced. In this case, each draw is independent of the
previous one. Since there are 3 balls and each ball can be drawn each time, the possible pairs
(X, Y) when drawing two balls are:
(1, 1) (1, 2) (1, 3) (2, 1) (2, 2) (2, 3) (3, 1) (3, 2) (3, 3)
The total number of outcomes is 3 × 3 = 9, and
\[ P(X = x, Y = y) = P(X = x)\,P(Y = y) = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9} \quad \text{for each } (x, y). \]

           X = 1   X = 2   X = 3
  Y = 1     1/9     1/9     1/9
  Y = 2     1/9     1/9     1/9
  Y = 3     1/9     1/9     1/9

Case 2: The balls are not replaced. The same ball cannot be drawn twice, so the pairs with
x = y are impossible. The remaining outcomes are equally likely, and the total number of ways
to draw two balls in order from three is 3 × 2 = 6.
Thus, the probability of each outcome is
\[ P(X = x, Y = y) = \frac{1}{6} \quad \text{for each } (x, y) \text{ with } x \ne y, \]
and P(X = x, Y = y) = 0 when x = y.

           X = 1   X = 2   X = 3
  Y = 1      0      1/6     1/6
  Y = 2     1/6      0      1/6
  Y = 3     1/6     1/6      0

Note that in both cases the sum of all the probabilities is equal to unity.
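Both cases of the urn example can be obtained by enumerating the equally likely ordered pairs. The sketch below uses the standard library's `itertools.product` for draws with replacement and `itertools.permutations` for draws without replacement; the helper name `joint_pmf` is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations, product

balls = [1, 2, 3]


def joint_pmf(outcomes):
    """Assign equal probability to every ordered pair in `outcomes`."""
    outcomes = list(outcomes)
    return {xy: Fraction(1, len(outcomes)) for xy in outcomes}


# Case 1: with replacement, all 3 x 3 ordered pairs are possible
with_replacement = joint_pmf(product(balls, repeat=2))

# Case 2: without replacement, only the 3 x 2 pairs with x != y occur
without_replacement = joint_pmf(permutations(balls, 2))

for y in balls:
    print([str(without_replacement.get((x, y), Fraction(0))) for x in balls])
```

Pairs missing from `without_replacement` (those with x = y) have probability zero, which is why the lookup uses a default of 0.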

4.4 Example :
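Verify whether
\[ f_{XY}(x, y) = x^{2} + \frac{xy}{3} \]
is a valid two-dimensional probability density function (pdf)
over the specified region 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 and zero otherwise.
Solution: The function is non-negative on the region, and
\[ \int_{0}^{1} \int_{0}^{2} f_{XY}(x, y)\,dy\,dx = \int_{0}^{1} \int_{0}^{2} \left( x^{2} + \frac{xy}{3} \right) dy\,dx = \int_{0}^{1} \left( 2x^{2} + \frac{2x}{3} \right) dx = \frac{2}{3} + \frac{1}{3} = 1. \]
Hence fXY(x, y) = x² + xy/3 is a valid two-dimensional probability density function (pdf) over the
specified region.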
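The normalization of this joint pdf can also be checked numerically. The sketch below uses a two-dimensional midpoint rule written by hand; the helper name `double_midpoint` and the grid size are illustrative assumptions, and the result is an approximation rather than an exact value.

```python
def f_xy(x, y):
    """Joint pdf candidate x^2 + x*y/3 on 0 <= x <= 1, 0 <= y <= 2."""
    return x**2 + x * y / 3


def double_midpoint(f, x_range, y_range, n=200):
    """Midpoint-rule approximation of a double integral on a rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy


total_probability = double_midpoint(f_xy, (0, 1), (0, 2))
print(round(total_probability, 4))
```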

4.4.1 Joint PDF and Marginal PDF :


The joint probability density function (joint PDF) and the marginal probability density function
(marginal PDF) are concepts used to describe the probability distribution of random variables,
particularly in the context of multiple continuous random variables.
The joint probability density function fXY(x, y) of two continuous random variables X and
Y describes the likelihood of these two random variables occurring simultaneously at specific
values x and y.

If X and Y are jointly continuous random variables with a joint PDF given by

\[ f_{XY}(x, y) = 4xy \quad \text{for } 0 \le x \le 1,\ 0 \le y \le 1, \]

this function describes the joint distribution of X and Y. (The constant 4 makes the total
probability over the unit square equal to 1.)

The marginal probability density function for a continuous random variable, in contrast,
provides the probability density of that variable without consideration of the other variable(s). It
is derived from the joint PDF by integrating out the other variable(s).
From the above joint PDF fXY(x, y) = 4xy, the marginal PDF for X can be found by
integrating over y:
\[ f_X(x) = \int_{0}^{1} f_{XY}(x, y)\,dy = \int_{0}^{1} 4xy\,dy = 4x \left[ \frac{y^{2}}{2} \right]_{0}^{1} = 2x \]

Thus, fX(x) = 2x for 0 ≤ x ≤ 1.

4.5 Marginal Distribution Function:


The marginal distribution function is a concept in probability and statistics used to describe
the distribution of a subset of random variables from a joint distribution of multiple random
variables. It provides insight into the behavior of individual variables irrespective of the other
variables present.
Definition: If you have two (or more) random variables, say X and Y, the joint
probability density function (pdf) or probability mass function (pmf) of X and Y is
represented as fX,Y(x, y). The marginal distribution function refers to the distribution of
one of these random variables, obtained by "marginalizing" over the other variables.
For continuous random variables, the marginal pdf of X, denoted as fX(x), is calculated
by integrating the joint pdf fX,Y(x, y) with respect to the other variable y:
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \]
Similarly, the marginal pdf of Y, denoted as fY(y), is obtained by integrating the joint
pdf over x:
\[ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \]

4.6 Marginal Probability Mass Function (Discrete Case)


For discrete random variables, the marginal distribution can be computed by summing the joint
pmf P(X = x, Y = y):
\[ P(X = x) = \sum_{y} P(X = x, Y = y) \]
\[ P(Y = y) = \sum_{x} P(X = x, Y = y) \]

4.7 Example :
Find the marginal distribution functions of X and Y as well as the joint probability
density function (pdf) from the joint distribution function
\[ F_{XY}(x, y) = \frac{xy}{16}(x + y) \quad \text{for } 0 \le x \le 2 \text{ and } 0 \le y \le 2. \]

Solution: To find the marginal distribution function FX(x), replace y with its upper limit 2
in FXY(x, y):
\[ F_X(x) = F_{XY}(x, 2) = \frac{x \cdot 2}{16}(x + 2) = \frac{x}{8}(x + 2) \quad \text{for } 0 \le x \le 2. \]
Similarly, to find the marginal distribution function FY(y), replace x with its upper limit 2 in
FXY(x, y):
\[ F_Y(y) = F_{XY}(2, y) = \frac{y \cdot 2}{16}(y + 2) = \frac{y}{8}(y + 2) \quad \text{for } 0 \le y \le 2. \]
The joint pdf is the mixed partial derivative of the joint distribution function:
\[ f_{XY}(x, y) = \frac{\partial^{2} F_{XY}(x, y)}{\partial x\,\partial y} = \frac{\partial^{2}}{\partial x\,\partial y} \left[ \frac{1}{16} \left( x^{2} y + x y^{2} \right) \right] = \frac{1}{8}(x + y) \]

Hence, fXY(x, y) = (1/8)(x + y) for 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2; elsewhere it is zero.

4.7.1 The marginal probability density function (PDF)


The marginal probability density function (PDF) describes the probability distribution of one variable in a pair of continuous random variables, without regard to the other variable. It is obtained by integrating the joint probability density function over the range of the other variable. When dealing with two continuous random variables $X$ and $Y$, the joint probability density function $f_{XY}(x, y)$ describes the density of $X$ and $Y$ taking specific values simultaneously. This joint PDF is integrated to derive the marginal PDFs, which give the density of each variable separately.
Marginal PDF of $X$, $f_X(x)$:
To obtain the marginal PDF of $X$, integrate the joint PDF $f_{XY}(x, y)$ over all possible values of $Y$:
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$
This process accumulates the density over the dimension of $Y$, leaving only the distribution of $X$.
Marginal PDF of $Y$, $f_Y(y)$:
Similarly, to obtain the marginal PDF of $Y$, integrate the joint PDF $f_{XY}(x, y)$ over all possible values of $X$:
$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$
This integration accumulates the density over the dimension of $X$, resulting in the distribution of $Y$.

4.7.2 Example :
The joint probability distribution of $X_1$ and $X_2$ is given by
$P(X_1 = x_1, X_2 = x_2) = \frac{1}{27}(x_1 + 2x_2)$ where $x_1 = 0, 1, 2$ and $x_2 = 0, 1, 2$. Find the marginal pmfs of $X_1$ and $X_2$.
Solution: Prepare the table of the joint probability distribution.

x1 \ x2     0       1       2       Total
0           0       2/27    4/27    6/27
1           1/27    3/27    5/27    9/27
2           2/27    4/27    6/27    12/27
Total       3/27    9/27    15/27   1

The marginal probability function of $X_1$:

$P(X_1 = x_1) = \sum_{x_2=0}^{2} p(x_1, x_2) = \sum_{x_2=0}^{2} \frac{1}{27}(x_1 + 2x_2) = \frac{1}{27}\left[(x_1 + 0) + (x_1 + 2) + (x_1 + 4)\right]$

$P(X_1 = x_1) = \frac{x_1 + 2}{9}$ for $x_1 = 0, 1, 2$

The marginal pmf of $X_1$ in tabular form is

x1        0      1      2
P(x1)     2/9    1/3    4/9

The marginal probability function of $X_2$:

$P(X_2 = x_2) = \sum_{x_1=0}^{2} p(x_1, x_2) = \sum_{x_1=0}^{2} \frac{1}{27}(x_1 + 2x_2) = \frac{1}{27}\left[(0 + 2x_2) + (1 + 2x_2) + (2 + 2x_2)\right]$

$P(X_2 = x_2) = \frac{1 + 2x_2}{9}$ for $x_2 = 0, 1, 2$

The marginal pmf of $X_2$ in tabular form is

x2        0      1      2
P(x2)     1/9    1/3    5/9
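The row and column sums in this example can be checked mechanically. The following is a minimal sketch using only Python's standard library; the names `joint`, `marginal_x1` and `marginal_x2` are illustrative and not part of the notes.

```python
from fractions import Fraction

values = (0, 1, 2)

# Joint pmf P(X1 = x1, X2 = x2) = (x1 + 2*x2) / 27, stored exactly as fractions.
joint = {(x1, x2): Fraction(x1 + 2 * x2, 27) for x1 in values for x2 in values}

# Marginalize by summing over the other variable.
marginal_x1 = {x1: sum(joint[(x1, x2)] for x2 in values) for x1 in values}
marginal_x2 = {x2: sum(joint[(x1, x2)] for x1 in values) for x2 in values}

for x in values:
    print(x, marginal_x1[x], marginal_x2[x])
```

Using exact fractions avoids rounding noise, so the printed values can be compared directly with the formulas $(x_1 + 2)/9$ and $(1 + 2x_2)/9$.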

4.7.3 Example :
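The joint probability function of a two-dimensional discrete random variable (X, Y) is given by
$f(x, y) = c(x^2 + 2y)$ for x = 0, 1, 2; y = 1, 2, 3, 4, and 0 elsewhere.

1. Find the value of c

2. P(X = 2, Y = 3)

3. P(X ≤ 1, Y > 2) and

4. marginal probability mass functions of X and Y.

Solution: We can tabulate the probabilities as follows.

X \ Y     1      2      3      4      Total
0         2c     4c     6c     8c     20c
1         3c     5c     7c     9c     24c
2         6c     8c     10c    12c    36c
Total     11c    17c    23c    29c    80c

1. The probabilities must sum to 1, so 80c = 1 and hence c = 1/80.

2. P(X = 2, Y = 3) = 10c = 10/80 = 1/8.

3. P(X ≤ 1, Y > 2) = (6c + 8c) + (7c + 9c) = 30c = 3/8.

4. The marginal pmf of X is given by the row totals: P(X = 0) = 20/80, P(X = 1) = 24/80, P(X = 2) = 36/80. The marginal pmf of Y is given by the column totals: P(Y = 1) = 11/80, P(Y = 2) = 17/80, P(Y = 3) = 23/80, P(Y = 4) = 29/80.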
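The same table-based reasoning can be sketched in code: normalize the unscaled weights $x^2 + 2y$ to find c, then sum the relevant cells. This is an illustrative sketch with standard-library Python; the variable names are assumptions, not notation from the notes.

```python
from fractions import Fraction

xs = (0, 1, 2)
ys = (1, 2, 3, 4)

# Unnormalized weights x^2 + 2y; c is fixed by requiring total probability 1.
weights = {(x, y): x * x + 2 * y for x in xs for y in ys}
c = Fraction(1, sum(weights.values()))
pmf = {cell: c * w for cell, w in weights.items()}

p_2_3 = pmf[(2, 3)]                                   # P(X = 2, Y = 3)
p_region = sum(p for (x, y), p in pmf.items() if x <= 1 and y > 2)  # P(X <= 1, Y > 2)

marginal_x = {x: sum(pmf[(x, y)] for y in ys) for x in xs}
marginal_y = {y: sum(pmf[(x, y)] for x in xs) for y in ys}

print(c, p_2_3, p_region)
```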

4.7.4 Example :
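Given the joint probability density function $f_{XY}(x, y) = \frac{x^3 y^3}{16}$ for $0 \le x \le 2$ and $0 \le y \le 2$, and $f_{XY}(x, y) = 0$ elsewhere, verify that this is a valid joint probability density function and find the marginal distributions.
Solution: The function is non-negative on its support, and
$\int_0^2 \int_0^2 \frac{x^3 y^3}{16}\,dx\,dy = \frac{1}{16}\left[\frac{x^4}{4}\right]_0^2 \left[\frac{y^4}{4}\right]_0^2 = \frac{1}{16} \cdot 4 \cdot 4 = 1$
so it is a valid joint pdf.
The marginal probability density function of X is given by
$f_X(x) = \int_0^2 \frac{x^3 y^3}{16}\,dy = \frac{x^3}{16}\left[\frac{y^4}{4}\right]_0^2 = \frac{x^3}{16} \cdot \frac{2^4}{4} = \frac{x^3}{4}$ for $0 \le x \le 2$
The marginal probability density function of Y is given by
$f_Y(y) = \int_0^2 \frac{x^3 y^3}{16}\,dx = \frac{y^3}{16}\left[\frac{x^4}{4}\right]_0^2 = \frac{y^3}{16} \cdot \frac{2^4}{4} = \frac{y^3}{4}$ for $0 \le y \le 2$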

4.7.5 Example :
The joint probability density function provided is:
$f_{XY}(x, y) = 2$ for $0 \le x \le 1$ and $0 \le y \le x$
and $f_{XY}(x, y) = 0$ otherwise.
Find the marginal pdfs of the random variables $X$ and $Y$.
Solution:
$\int_0^1 \int_0^x f_{XY}(x, y)\,dy\,dx = \int_0^1 \int_0^x 2\,dy\,dx = 2\int_0^1 [y]_0^x\,dx = 2\int_0^1 x\,dx = 1$

Hence it is a two-dimensional density function.

The marginal pdf of x is given by
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy = \int_0^x 2\,dy = 2x$ for $0 \le x \le 1$

The marginal pdf of y is given by
$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx = \int_y^1 2\,dx = 2(1 - y)$ for $0 \le y \le 1$

4.7.6 Example :
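Prove that the given function $f_{XY}(x, y) = x + y$, for $0 \le x \le 1$ and $0 \le y \le 1$, is a valid probability density function (PDF) and find the marginal pdfs of X and Y.
Solution:
The function is non-negative on the unit square, and
$\int_0^1 \int_0^1 (x + y)\,dy\,dx = 1$
so it is a valid joint pdf.

The marginal pdf of x is given by
$f_X(x) = \int_0^1 f_{XY}(x, y)\,dy = \int_0^1 (x + y)\,dy = \left[xy + \frac{y^2}{2}\right]_0^1 = x + \frac{1}{2}$ for $0 \le x \le 1$

The marginal pdf of y is given by
$f_Y(y) = \int_0^1 f_{XY}(x, y)\,dx = \int_0^1 (x + y)\,dx = \left[\frac{x^2}{2} + xy\right]_0^1 = \frac{1}{2} + y$ for $0 \le y \le 1$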

4.7.7 Example :
Suppose X and Y represent the operating lives, in years, of the components A and B in a certain system, and their joint probability density function is given by $f_{XY}(x, y) = e^{-x-y}$ for $x \ge 0$, $y \ge 0$. Elsewhere it is 0.

• Find the probability of the event that component A has an operating life less than or equal to 1 year.

• Find the probability of the event that component B has an operating life greater than 2 years.

Solution:
The joint probability density function is given by:

$f_{XY}(x, y) = e^{-(x+y)}$ for $x \ge 0$, $y \ge 0$

The marginal pdf of X is
$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = e^{-x}\int_0^{\infty} e^{-y}\,dy = e^{-x}$ for $x \ge 0$

The marginal pdf of Y is
$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = e^{-y}\int_0^{\infty} e^{-x}\,dx = e^{-y}$ for $y \ge 0$

Component A has an operating life less than or equal to 1 year:

$P(X \le 1) = \int_0^1 e^{-x}\,dx = \left[-e^{-x}\right]_0^1 = 1 - e^{-1} \approx 0.632$

Component B has an operating life greater than 2 years:

$P(Y > 2) = \int_2^{\infty} e^{-y}\,dy = \left[-e^{-y}\right]_2^{\infty} = e^{-2} \approx 0.135$

4.7.8 Example :
The joint probability density function of two random variables is given by
$f_{XY}(x, y) = 15e^{-3x-5y}$ for $x \ge 0$, $y \ge 0$. Elsewhere it is 0. Find the probability that

• (i) 1 < X < 2 and 0.2 < Y < 0.3 (ii) X < 2 and Y > 0.2
• Find the marginal probability distributions of X and Y.

Solution: (i) $P(1 < X < 2,\ 0.2 < Y < 0.3) = \int_1^2 \int_{0.2}^{0.3} 15e^{-3x-5y}\,dy\,dx = \int_1^2 3e^{-3x}\,dx \int_{0.2}^{0.3} 5e^{-5y}\,dy$
$= (e^{-3} - e^{-6})(e^{-1} - e^{-1.5}) = (0.0473)(0.1447) \approx 0.0068$
(ii) $P(X < 2,\ Y > 0.2) = \int_0^2 3e^{-3x}\,dx \int_{0.2}^{\infty} 5e^{-5y}\,dy = (1 - e^{-6})(e^{-1}) \approx 0.367$

(b) $f_X(x) = \int_0^{\infty} 15e^{-3x}e^{-5y}\,dy = 3e^{-3x}$ for $x > 0$ and 0 elsewhere

$f_Y(y) = \int_0^{\infty} 15e^{-5y}e^{-3x}\,dx = 5e^{-5y}$ for $y > 0$ and 0 elsewhere
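Because this joint density factorizes into $3e^{-3x}$ and $5e^{-5y}$, any rectangle probability is a product of two exponential differences. The sketch below evaluates both requested probabilities numerically; the helper name `rect_prob` is hypothetical and introduced only for illustration.

```python
import math

def rect_prob(x_lo, x_hi, y_lo, y_hi):
    """P(x_lo < X < x_hi, y_lo < Y < y_hi) for f(x, y) = 15 exp(-3x - 5y), x, y >= 0."""
    px = math.exp(-3 * x_lo) - math.exp(-3 * x_hi)   # integral of 3 e^{-3x}
    py = math.exp(-5 * y_lo) - math.exp(-5 * y_hi)   # integral of 5 e^{-5y}
    return px * py

p_i = rect_prob(1, 2, 0.2, 0.3)          # part (i)
p_ii = rect_prob(0, 2, 0.2, math.inf)    # part (ii): Y unbounded above

print(round(p_i, 5), round(p_ii, 3))
```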

4.7.9 Example :
A two-dimensional random variable (X, Y) has the joint density
$f_{X,Y}(x, y) = \frac{8}{9}xy$ for $1 \le y \le 2$ and $1 \le x \le y$, and 0 elsewhere.
Find the marginal distributions.
Solution: $f_X(x) = \int_x^2 f_{XY}(x, y)\,dy = \int_x^2 \frac{8}{9}xy\,dy = \frac{4}{9}x(4 - x^2)$ for $1 \le x \le 2$
$f_Y(y) = \int_1^y f_{XY}(x, y)\,dx = \int_1^y \frac{8}{9}xy\,dx = \frac{4}{9}y(y^2 - 1)$ for $1 \le y \le 2$

4.7.10 Example :
(X, Y) is a two-dimensional continuous random variable with the following probability distribution.
$f_{XY}(x, y) = 6e^{-2x-3y}$ for $x > 0$, $y > 0$ and 0 elsewhere
Verify whether X and Y are independent.
Solution:
The marginal probability distributions of X and Y are
$f_X(x) = \int_0^{\infty} 6e^{-2x-3y}\,dy = 2e^{-2x}$ for $x > 0$

$f_Y(y) = \int_0^{\infty} 6e^{-2x-3y}\,dx = 3e^{-3y}$ for $y > 0$
Since $f_X(x)\,f_Y(y) = 2e^{-2x} \cdot 3e^{-3y} = 6e^{-2x-3y} = f_{XY}(x, y)$, the variables X and Y are independent.

4.7.11 Example :
For the joint distribution $P(X_1 = x_1, X_2 = x_2) = \frac{1}{27}(x_1 + 2x_2)$ for $x_1 = 0, 1, 2$; $x_2 = 0, 1, 2$: (i) find the marginal probability mass functions of $X_1$ and $X_2$; (ii) find the conditional probability distribution of $X_1$ given $X_2 = 2$.
Solution: The marginal probability distribution of $X_1$ is given by
$P(X_1 = x_1) = \sum_{x_2=0}^{2} P(X_1 = x_1, X_2 = x_2) = \sum_{x_2=0}^{2} \frac{1}{27}(x_1 + 2x_2)$
$= \frac{1}{27}\left[(x_1 + 0) + (x_1 + 2) + (x_1 + 4)\right] = \frac{3x_1 + 6}{27} = \frac{x_1 + 2}{9}$

X1        0      1      2      Total
P(X1)     2/9    1/3    4/9    1

The marginal probability distribution of $X_2$ is given by
$P(X_2 = x_2) = \sum_{x_1=0}^{2} P(X_1 = x_1, X_2 = x_2) = \sum_{x_1=0}^{2} \frac{1}{27}(x_1 + 2x_2)$
$= \frac{1}{27}\left[(0 + 2x_2) + (1 + 2x_2) + (2 + 2x_2)\right] = \frac{3 + 6x_2}{27} = \frac{1 + 2x_2}{9}$

X2        0      1      2      Total
P(X2)     1/9    1/3    5/9    1

The conditional probability distribution of $X_1$ given $X_2 = 2$ is obtained as follows.
$P(X_1 = x_1 \mid X_2 = 2) = \frac{P[(X_1 = x_1) \cap (X_2 = 2)]}{P(X_2 = 2)} = \frac{(x_1 + 4)/27}{5/9} = \frac{x_1 + 4}{15}$ for $x_1 = 0, 1, 2$

X1                       0       1      2      Total
P(X1 = x1 | X2 = 2)      4/15    1/3    2/5    1
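The conditioning step, dividing one column of the joint table by its column total, can be sketched as follows. This is an illustrative standard-library example; the name `conditional_x1_given` is hypothetical.

```python
from fractions import Fraction

values = (0, 1, 2)
joint = {(x1, x2): Fraction(x1 + 2 * x2, 27) for x1 in values for x2 in values}

def conditional_x1_given(x2):
    """Return the pmf of X1 given X2 = x2 as a dict, i.e. joint / marginal."""
    p_x2 = sum(joint[(x1, x2)] for x1 in values)      # marginal P(X2 = x2)
    return {x1: joint[(x1, x2)] / p_x2 for x1 in values}

cond = conditional_x1_given(2)
print(cond)
```

A conditional pmf must itself sum to 1, which gives a quick sanity check on the table.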
Independence of Random Variables: Two random variables X and Y are said to be statistically independent if and only if, for all x and y,

$P(X \le x, Y \le y) = P(X \le x)\,P(Y \le y)$

that is, the joint cumulative distribution function (CDF) is the product of the individual cumulative distribution functions:

$F_{X,Y}(x, y) = F_X(x)\,F_Y(y)$

Differentiating the joint CDF with respect to x and y shows that, when X and Y are independent, the joint probability density function (PDF) also factorizes:

$f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$

4.7.12 Example :

5 Normal distribution:
5.1 Probability distribution of a binomial random variable

5.2 Probability distribution of a binomial random variable

5.3 Gaussian distribution:


The most common continuous distribution is the normal distribution, also known as the Gaussian distribution or the bell-shaped curve. Its shape is the limit of a binomial distribution for which p is constant but n approaches infinity, or of a Poisson distribution for which λ approaches infinity. Its probability density is given by the equation
$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where µ is the mean and σ² is the variance of the distribution. Together, the two parameters µ and σ completely define a normal curve. The mean, median, and mode of the Gaussian distribution are all equal.
The value of the normal distribution will become more apparent when we begin to work with the sampling distribution of the mean. For now, however, it is important to note that many random variables of interest, including blood pressure, serum cholesterol level, height, and weight, are approximately normally distributed. The normal curve can thus be used to estimate probabilities associated with these variables. For example, in a population in which serum cholesterol level is normally distributed with mean µ and standard deviation σ, we might wish to find the probability that a randomly chosen individual has a serum cholesterol level greater than 250 mg/100 ml. Perhaps this knowledge will help us to plan for future cardiac services. Since the total area beneath the normal curve is equal to 1, we can estimate the probability in question by determining the proportion of the area under the curve that lies to the right of the point x = 250, or P(X > 250).
This can be done using a computer program or a table of areas calculated for the normal curve.
Since a normal distribution could have an infinite number of possible values for its mean and standard deviation, it is impossible to tabulate the area associated with each and every normal curve. Instead, only a single curve is tabulated: the special case for which µ = 0 and σ = 1.
This curve is known as the standard normal distribution.

Z       Area in the right tail      -Z       Area in the left tail
0.00    0.500                       0.00     0.500
1.65    0.049                       -1.65    0.049
1.96    0.025                       -1.96    0.025
2.58    0.005                       -2.58    0.005
3.00    0.001                       -3.00    0.001
Suppose that we wish to know the area under the standard normal curve that lies between z = -1.00 and z = 1.00; since µ = 0 and σ = 1, this is the area contained in the interval µ ± 1σ as shown in the figure. Equivalently, it is P(-1 ≤ Z ≤ 1). The area to the right of z = 1.00 is P(Z > 1) = 0.159. Therefore, the area to the left of z = -1.00 must be 0.159 as well. The events Z > 1 and Z < -1 are mutually exclusive; consequently, applying the additive rule of probability, the sum of the area to the right of 1 and to the left of -1 is P(Z > 1) + P(Z < −1) = 0.159 + 0.159 = 0.318.
Since the total area under the curve is equal to 1, the area between -1 and 1 must be
P(-1 ≤ Z ≤ 1) = 1 - [P(Z > 1) + P(Z < −1)] = 1 - 0.318 = 0.682
Therefore, for the standard normal distribution, approximately 68.2% of the area beneath the curve lies within ± 1 standard deviation from the mean. We might also wish to calculate the area under the standard normal curve that is contained in the interval µ ± 2σ, or P(-2 ≤ Z ≤ 2).
The area to the right of z = 2.00 is 0.023; the area to the left of z = -2.00 is 0.023. Therefore, the area between -2.00 and 2.00 must be
P(-2 ≤ Z ≤ 2) = 1 - [P(Z > 2) + P(Z < −2)] = 1.000 - [0.023 + 0.023] = 0.954. Approximately 95.4% of the area under the standard normal curve lies within ± 2 standard deviations from the mean.
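The tail areas quoted from the table can also be computed directly. As a minimal sketch, the standard normal right-tail area is expressed here through the complementary error function from Python's `math` module; the function names `right_tail` and `central_area` are illustrative only.

```python
import math

def right_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def central_area(k):
    """P(-k <= Z <= k): total area minus the two equal tails."""
    return 1 - 2 * right_tail(k)

for k in (1, 2, 3):
    print(k, round(right_tail(k), 3), round(central_area(k), 3))
```

The small differences from the hand calculation (for example 0.683 versus 0.682) come from rounding the tail areas to three decimals before subtracting.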

5.4 Question
Plot normal distribution curve for the following types.

• Three normal distribution curves for three different values of the mean, µ = −2, µ = 0, µ = 2, with σ = 1.

• Three normal distribution curves with a fixed value of the mean, µ = 0, and three different standard deviation values σ = 0.5, σ = 1, σ = 2.

• The standard normal distribution, marked according to the 68-95-99.7 rule.


Solution:

5.5 Normal Approximations of the Binomial Distribution
The normal distribution can be used to approximate the binomial distribution when certain
conditions are met. This approximation makes calculations easier, especially for large number
of trials, and is based on the Central Limit Theorem. The binomial distribution describes the
probability of having exactly (k) successes in (n) independent Bernoulli trials, each with success
probability (p). Its probability mass function (PMF) is:
$P(k) = \binom{n}{k} p^k (1 - p)^{n-k}$
When (n) is large, the shape of the binomial distribution resembles a bell-shaped curve. Ac-
cording to the Central Limit Theorem, the sum of many independent random variables (like
binomial trials) tends toward a normal distribution.
For the normal approximation to be reasonably accurate, typically:
np ≥ 5 and n(1 − p) ≥ 5
This ensures the distribution isn’t too skewed and is symmetric enough for the normal approx-
imation.
The binomial probabilities P(k) are approximated by the area under the normal curve between k - 0.5 and k + 0.5 (using a continuity correction), with µ = np and σ² = np(1 − p):
$P(k) \approx \int_{k-0.5}^{k+0.5} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$

The continuity correction improves the approximation. The binomial distribution can be closely approximated by a normal distribution when the number of trials is large and the success probability isn't too close to 0 or 1. This is especially useful for simplifying calculations of probabilities for large n, making the analysis of binomial-based problems more practical.

5.6 Probability Histogram for Binomial Distribution
A probability histogram for a binomial distribution is a visual representation that shows how the
probabilities of different numbers of successes (k) are distributed across all possible outcomes
in a binomial experiment [5].
The histogram plots the probability of each possible number of successes (k = 0, 1, 2, ..., n) on the vertical axis. The x-axis represents the number of successes k.
Each bar's height corresponds to the probability $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$, which is the likelihood of observing k successes out of n trials, given the probability p of success in each trial.

The histogram is discrete: bars are located at integer values of k. It often has a bell shape when n is large, especially for p near 0.5, resembling a normal distribution.
When p is close to 0 or 1, the histogram skews toward the lower or higher end of k.
Basic property of the Binomial Probability Histogram: For np ≥ 5 and nq ≥ 5 (where q = 1 − p), the probability histogram for the binomial distribution is nearly symmetric about µ = np over the interval [µ − 3σ, µ + 3σ], where $\sigma = \sqrt{npq}$, and outside this interval P(k) ≈ 0.

Example A fair coin is tossed 100 times. Find the probability P that heads occurs (a) exactly 60 times, (b) between 48 and 53 times inclusive, (c) less than 45 times.
Solution: This is a binomial experiment with n = 100, p = 0.5 and q = 0.5.

∴ µ = np = 100 × 0.5 = 50 and $\sigma = \sqrt{npq} = 5$.
Since we are using the normal approximation for discrete binomial counts, apply the continuity correction. Below, BP denotes a binomial probability, NP a normal probability, and ϕ(z) the tabulated area under the standard normal curve between 0 and z.

(a) To find P(k = 60), evaluate the probability of the normal variable falling between 59.5 and 60.5:
BP(60) ≈ NP(59.5 ≤ X ≤ 60.5)

$z_1 = \frac{59.5 - 50}{5} = 1.9$ and $z_2 = \frac{60.5 - 50}{5} = 2.1$
P = BP(60) ≈ NP(59.5 ≤ X ≤ 60.5) = NP(1.9 ≤ Z ≤ 2.1) = ϕ(2.1) − ϕ(1.9) = 0.4821 − 0.4713 = 0.0108

(b) To find P(48 ≤ k ≤ 53), evaluate the probability of the normal variable falling between 47.5 and 53.5:
BP(48 ≤ k ≤ 53) ≈ NP(47.5 ≤ X ≤ 53.5)
$z_1 = \frac{47.5 - 50}{5} = -0.5$ and $z_2 = \frac{53.5 - 50}{5} = 0.7$
∴ P(48 ≤ k ≤ 53) ≈ NP(−0.5 ≤ Z ≤ 0.7) = ϕ(0.7) + ϕ(0.5) = 0.2580 + 0.1915 = 0.4495

(c) To find the probability of less than 45 heads:

BP(k < 45) = BP(k ≤ 44) ≈ NP(X ≤ 44.5) = NP(Z ≤ −1.1)
= 0.5 − ϕ(1.1) = 0.5 − 0.3643 = 0.1357
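To see how close the continuity-corrected approximation is, the exact binomial sums can be compared with the normal areas. The sketch below assumes Python 3.8 or later (for `math.comb` and `statistics.NormalDist`); all variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)       # normal curve with matching mean and sd

def binom_pmf(k):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# (a) exactly 60 heads
exact_60 = binom_pmf(60)
approx_60 = approx.cdf(60.5) - approx.cdf(59.5)

# (b) between 48 and 53 heads inclusive
exact_48_53 = sum(binom_pmf(k) for k in range(48, 54))
approx_48_53 = approx.cdf(53.5) - approx.cdf(47.5)

# (c) fewer than 45 heads
exact_lt_45 = sum(binom_pmf(k) for k in range(45))
approx_lt_45 = approx.cdf(44.5)

print(exact_60, approx_60)
print(exact_48_53, approx_48_53)
print(exact_lt_45, approx_lt_45)
```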

Example A fair coin is tossed 12 times. Determine the probability P that the number of heads occurring is between 4 and 7 inclusive by using a) the binomial distribution, b) the normal approximation to the binomial distribution.
Solution: Number of trials, n = 12
Probability of heads in each trial, p = 0.5
a) We need the probability that heads occur between 4 and 7 times inclusive, i.e., P(4 ≤ X ≤ 7).
$P(k) = \binom{n}{k} p^k (1 - p)^{n-k}$
$BP(4) = \binom{12}{4}(0.5)^4(0.5)^{8} = \frac{495}{4096}$
$BP(5) = \binom{12}{5}(0.5)^5(0.5)^{7} = \frac{792}{4096}$
$BP(6) = \binom{12}{6}(0.5)^6(0.5)^{6} = \frac{924}{4096}$
$BP(7) = \binom{12}{7}(0.5)^7(0.5)^{5} = \frac{792}{4096}$
Hence P = BP(4) + BP(5) + BP(6) + BP(7) = $\frac{3003}{4096}$ = 0.7332

b) Approximate probability using the normal distribution

µ = np = 12 × 0.5 = 6
$\sigma = \sqrt{np(1 - p)} = \sqrt{12 \times 0.5 \times 0.5} = \sqrt{3} \approx 1.732$

With the continuity correction we need NP(3.5 ≤ X ≤ 7.5), where $Z = \frac{X - \mu}{\sigma}$.
3.5 in standard units = $\frac{3.5 - 6}{1.73} = -1.45$
7.5 in standard units = $\frac{7.5 - 6}{1.73} = 0.87$

P = NP(3.5 ≤ X ≤ 7.5) = NP(−1.45 ≤ Z ≤ 0.87)

P = ϕ(0.87) + ϕ(1.45) = 0.3078 + 0.4265 = 0.7343
Note that the relative error = $\frac{|0.7332 - 0.7343|}{0.7332}$ = 0.0015

5.7 Example :
Find the value of z that cuts off the upper 10% of the standard normal distribution, or the
value of z for which P(Z > z) = 0.10.
Solution:

Locating 0.100 in the body of the table, we observe that the corresponding value of z is 1.28. Therefore, 10% of the area under the standard normal curve lies to the right of z = 1.28.
Similarly, another 10% of the area lies to the left of z = -1.28.
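The table lookup in reverse (from an area to a z value) corresponds to the inverse of the cumulative distribution function. A minimal sketch, assuming Python 3.8+ where `statistics.NormalDist.inv_cdf` is available:

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

# P(Z > z) = 0.10 is the same as P(Z <= z) = 0.90.
z_upper_10 = standard_normal.inv_cdf(0.90)
z_lower_10 = standard_normal.inv_cdf(0.10)

print(round(z_upper_10, 2), round(z_lower_10, 2))
```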

5.8 Example :
The heights of 300 students are normally distributed with mean 64.5 inches and standard deviation 3.3 inches. How many students have a height less than 5 feet?
Solution:
$Z = \frac{X - \mu}{\sigma}$
$Z = \frac{60 - 64.5}{3.3} = \frac{-4.5}{3.3} \approx -1.36$
P(Z < −1.36) = 0.5 − 0.4131 = 0.0869
Number of students = 0.0869 × 300 ≈ 26.07, i.e. about 26 students.
To determine how many students have a height between 5 feet (60 inches) and 5 feet 9 inches (69 inches), we use the same normal distribution with mean µ = 64.5 inches and standard deviation σ = 3.3 inches.
$Z_1 = \frac{60 - 64.5}{3.3} = \frac{-4.5}{3.3} \approx -1.36$
$Z_2 = \frac{69 - 64.5}{3.3} = \frac{4.5}{3.3} \approx 1.36$
P(60 < X < 69) = P(Z < 1.36) − P(Z < −1.36) = 0.9131 − 0.0869 = 0.8262
Number of students = 0.8262 × 300 ≈ 247.86, i.e. about 248 students.

5.9 Example :
Let X be a random variable that represents systolic blood pressure. For the population of 18- to 74-year-old males in the United States, systolic blood pressure is approximately normally distributed with mean 129 millimeters of mercury (mm Hg) and standard deviation 19.8 mm Hg. Find the value of x that cuts off the upper 2.5% of the curve of systolic blood pressures.
Solution: We wish to find the value of x that cuts off the upper 2.5% of the curve of systolic blood pressures, or, equivalently, the value of x for which P(X > x) = 0.025. From the z table, we see that the area to the right of z = 1.96 is 0.025. Solving
$z = 1.96 = \frac{x - \mu}{\sigma}$
gives x = 129 + (1.96)(19.8) = 167.8. Therefore, approximately 2.5% of the men in this population, a small minority, have systolic blood pressures greater than 167.8 mm Hg, while 97.5% have blood pressures less than 167.8 mm Hg. In other words, if we randomly select an individual from this adult male population, the probability that his systolic blood pressure is greater than 167.8 mm Hg is 0.025. Because the standard normal curve is symmetric around z = 0, we know that the area to the left of z = -1.96 is also 0.025. By solving the equation
$z = -1.96 = \frac{x - \mu}{\sigma}$
we obtain x = 129 + (-1.96)(19.8) = 90.2, so 2.5% of the men have a systolic blood pressure less than 90.2 mm Hg. Equivalently, the probability that a randomly selected male has a systolic blood pressure less than 90.2 mm Hg is 0.025. Since 2.5% of the men in the population have systolic blood pressures greater than 167.8 mm Hg and 2.5% have values less than 90.2 mm Hg, the remaining 95% of the men must have systolic blood pressure readings that lie between 90.2 and 167.8 mm Hg. We might also be interested in determining the proportion of men in the population who have systolic blood pressures greater than 150 mm Hg. In this case, we are given the outcome of the random variable X and must solve for the normal deviate z:
$z = \frac{150 - 129}{19.8} = 1.06$
The area to the right of z = 1.06 is 0.145. Therefore, approximately 14.5% of the men in this population have systolic blood pressures greater than 150 mm Hg.
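Both directions used in this example, from a tail area to a cut-off value and from a value to a tail area, can be sketched with `statistics.NormalDist` (Python 3.8+). The variable names below are illustrative, and the results differ slightly from the hand calculation because z is not rounded to two decimals.

```python
from statistics import NormalDist

pressure = NormalDist(mu=129, sigma=19.8)   # systolic blood pressure, mm Hg

upper_cutoff = pressure.inv_cdf(0.975)      # value with 2.5% of the area above it
lower_cutoff = pressure.inv_cdf(0.025)      # value with 2.5% of the area below it
above_150 = 1 - pressure.cdf(150)           # proportion with pressure > 150

print(round(upper_cutoff, 1), round(lower_cutoff, 1), round(above_150, 3))
```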
The coefficient of variation (CV) is a statistical measure that quantifies the relative variability of a data set. It is defined as the ratio of the standard deviation to the mean, often expressed as a percentage. The formula for the coefficient of variation is:
$CV = \frac{\sigma}{\mu} \times 100$
where σ is the standard deviation and µ is the mean.
Importance of Coefficient of Variation
Comparison of Variability: The CV allows for the comparison of variability between different data sets, regardless of their unit of measurement or scale. This is particularly useful in finance, where you might want to compare the risk (volatility) of different investments with different expected returns.
Standardized Measure: Since the CV is a dimensionless number (it has no units), it standardizes variability, making it easier to interpret.
Insights into Data Distribution: A low CV indicates that the data points tend to be close to the mean, whereas a high CV indicates that they are spread out over a wider range of values.

Example 1: The following are the marks x obtained by the students in a subject, together with the absolute deviations from the mean x̄ = 49.
x       |x − x̄|
32      17
49      0
44      5
52      3
69      20
63      14
34      15
Find the standard deviation.
Solution: Standard deviation: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{1144}{7 - 1}} = 13.8$. Variance: $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Example 2: Investment Returns:
Investment A: Mean return = 10%, Standard deviation = 2%. Investment B: Mean return = 15%, Standard deviation = 5%.
CV for A: $\frac{2}{10} \times 100 = 20\%$
CV for B: $\frac{5}{15} \times 100 = 33.33\%$
Interpretation: Although Investment B has a higher mean return, it also has a higher relative volatility compared to Investment A.
Example 3: Numerical Find the coefficient of variation and comment on the value obtained.
Year 2021    Bitcoin    Ethereum
Jan          33100      1310
Feb          45200      1420
Mar          58800      1920
April        57700      2770
May          37300      2710
June         35000      2270
July         41600      2530
August       47100      3430
Sept         43800      3000
Oct          61100      4290
Nov          56900      4630
Dec          46200      3680
Bitcoin standard deviation $S_B$ = 9650; Ethereum standard deviation $S_E$ = 1045
$CV_B = \frac{S_B}{\bar{x}_B} = \frac{9650}{47000} = 0.21$; $CV_E = \frac{S_E}{\bar{x}_E} = \frac{1045}{2830} = 0.37$
So the standard deviation of Bitcoin is 21% of its mean, and in the case of Ethereum it is 37% of the mean. This indicates that Ethereum is more volatile. The coefficient of variation thus states the size of the standard deviation relative to the mean.
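The coefficient of variation for the two price series can be reproduced with the standard library. This sketch assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

bitcoin = [33100, 45200, 58800, 57700, 37300, 35000,
           41600, 47100, 43800, 61100, 56900, 46200]
ethereum = [1310, 1420, 1920, 2770, 2710, 2270,
            2530, 3430, 3000, 4290, 4630, 3680]

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (a unitless ratio)."""
    return stdev(data) / mean(data)

cv_bitcoin = coefficient_of_variation(bitcoin)
cv_ethereum = coefficient_of_variation(ethereum)

print(round(cv_bitcoin, 2), round(cv_ethereum, 2))
```

Because the ratio is unitless, the two series can be compared even though their price levels differ by more than an order of magnitude.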

5.10 Moments
In probability and statistics, moments are quantitative measures related to the shape of a probability distribution. They provide important information about the characteristics of the distribution, such as its central tendency, variability, and shape. Understanding moments can help in data analysis, model fitting, and various applications in fields like finance, engineering, and social sciences. Moments are expected values of powers of the random variable.
The following are the types.
• Zeroth Moment (Total Probability):
Definition: The zeroth moment of a random variable is the total probability, which is always equal to 1.
Formula: $M_0 = \int_{-\infty}^{\infty} f(x)\,dx = 1$
Example: For any probability density function (PDF), integrating the PDF over its entire range yields 1, confirming that total probability is conserved.
• First Moment (Mean), based on E(X):
Definition: The first moment is the mean (or expected value) of the random variable, reflecting the central tendency of the distribution.
Formula: $M_1 = \mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$
Example: For a uniform distribution on the interval [a, b], $E[X] = \frac{a + b}{2}$. If a = 1 and b = 3, then $E[X] = \frac{1 + 3}{2} = 2$.
• Second Moment (Variance), based on E(X²):
Definition: The second moment about the mean (or variance) measures the dispersion or spread of the distribution.
Formula: $M_2 = E[(X - \mu)^2] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
Example: For a normal distribution with mean µ and variance σ², if µ = 0 and σ = 1, the second moment about the mean is σ² = 1.
• Third Moment (Skewness), based on E(X³):
Definition: The third moment about the mean, standardized by σ³ (skewness), measures the asymmetry of the distribution.
Formula: $\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3}$
Example: A distribution with positive skew (right-skewed) has a longer tail on the right side, while a negative skew (left-skewed) has a longer tail on the left side. For a skewed distribution, the skewness value is greater than 0 or less than 0 accordingly.
• Fourth Moment (Kurtosis), based on E(X⁴):
Definition: The fourth moment about the mean, standardized by σ⁴ (kurtosis), measures the "tailedness" of the distribution, indicating the presence of outliers.
Formula: $\gamma_2 = \frac{E[(X - \mu)^4]}{\sigma^4} - 3$ (the subtraction of 3 makes the kurtosis of the normal distribution equal to 0).
Example: A distribution with kurtosis greater than 3 (that is, $\gamma_2 > 0$) has heavy tails and more outliers, whereas one with kurtosis less than 3 ($\gamma_2 < 0$) has lighter tails. For instance, a Laplace distribution has a higher kurtosis than a normal distribution.
The mean is written as $E(X) = \mu'_1$ and the variance is written as $\mu_2$.

Let X be a (discrete or continuous) random variable. We define the kth moment about the origin as $\mu'_k = E(X^k)$ and the kth central moment as $\mu_k = E[(X - E(X))^k]$.

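Let X be the result of tossing a die, then

$\mu'_3 = E(X^3) = 1^3 \cdot \frac{1}{6} + 2^3 \cdot \frac{1}{6} + 3^3 \cdot \frac{1}{6} + 4^3 \cdot \frac{1}{6} + 5^3 \cdot \frac{1}{6} + 6^3 \cdot \frac{1}{6} = \frac{147}{2}$

$\mu_3 = E[(X - E(X))^3] = \frac{1}{6}\left[(1 - 3.5)^3 + (2 - 3.5)^3 + (3 - 3.5)^3 + (4 - 3.5)^3 + (5 - 3.5)^3 + (6 - 3.5)^3\right] = 0$

The third central moment vanishes because the distribution is symmetric about its mean 3.5.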
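The die example can be checked exactly with rational arithmetic. In this minimal sketch the helper names `raw_moment` and `central_moment` are illustrative; they implement the definitions $\mu'_k = E(X^k)$ and $\mu_k = E[(X - E(X))^k]$ for a fair six-sided die.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)            # each face of a fair die is equally likely

def raw_moment(k):
    """k-th moment about the origin, E(X^k)."""
    return sum(prob * x**k for x in faces)

def central_moment(k):
    """k-th moment about the mean, E[(X - E(X))^k]."""
    mu = raw_moment(1)
    return sum(prob * (x - mu) ** k for x in faces)

print(raw_moment(1), raw_moment(3), central_moment(2), central_moment(3))
```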
The expectation of a function of a random variable extends the idea of expectation to incor-
porate transformations of the random variable. It captures the average behavior of the function
applied to the variable, weighted by the probabilities associated with the random variable. This
concept is highly useful in probability and statistics for evaluating various metrics, such as vari-
ance and moments.

Numerical based on expectation E(x): The daily consumption of electric power (in million kWh) is a random variable X with probability density function $f(x) = kxe^{-x/3}$ for x > 0 and 0 elsewhere. Find the value of k, the expectation of X, and the probability that on a given day the electric consumption is more than the expected value.
Solution: The total probability must be 1:
$\int_0^{\infty} kxe^{-x/3}\,dx = 1$
Since $\int_0^{\infty} xe^{-x/3}\,dx = 9$, we get
$k = \frac{1}{9}$

$E[X] = \int_0^{\infty} x f(x)\,dx = \int_0^{\infty} x \cdot \frac{1}{9} xe^{-x/3}\,dx = \frac{1}{9}\int_0^{\infty} x^2 e^{-x/3}\,dx$

To solve using the gamma function, let $t = \frac{x}{3}$, which implies x = 3t and dx = 3 dt.

The limits remain the same, since as x goes from 0 to ∞, t also goes from 0 to ∞.

Now substitute into the integral:

$\int_0^{\infty} x^2 e^{-x/3}\,dx = \int_0^{\infty} (3t)^2 e^{-t} \cdot 3\,dt$

This simplifies to:

$= 3\int_0^{\infty} 9t^2 e^{-t}\,dt = 27\int_0^{\infty} t^2 e^{-t}\,dt$

But $\int_0^{\infty} t^2 e^{-t}\,dt = 2!$

Thus, $\int_0^{\infty} x^2 e^{-x/3}\,dx = 54$ and

$E[X] = \frac{1}{9}\int_0^{\infty} x^2 e^{-x/3}\,dx = 6$

Calculate the probability P(X > E[X]):

$P(X > E[X]) = P(X > 6) = \frac{1}{9}\int_6^{\infty} xe^{-x/3}\,dx = 3e^{-2} = 0.406$

Note that the same value of E[X] follows without the gamma function, by integrating by parts:

$\frac{1}{9}\int_0^{\infty} x^2 e^{-x/3}\,dx = \frac{1}{9}\left[x^2 \frac{e^{-x/3}}{-1/3} - 2x \frac{e^{-x/3}}{1/9} + 2 \frac{e^{-x/3}}{-1/27}\right]_0^{\infty} = \frac{54}{9} = 6$
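The three quantities in this example (normalization, mean, and tail probability) can be cross-checked by numerical integration. The sketch below uses a simple midpoint rule written from scratch; the truncation point `UPPER` and the step count are assumptions chosen so that the neglected tail and discretization error are negligible.

```python
import math

def pdf(x, k=1 / 9):
    """Density f(x) = k * x * exp(-x/3) for x > 0."""
    return k * x * math.exp(-x / 3)

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

UPPER = 150.0   # assumption: the density beyond this point is negligible

total = integrate(pdf, 0.0, UPPER)                     # should be close to 1
mean = integrate(lambda x: x * pdf(x), 0.0, UPPER)     # E[X]
p_above_mean = integrate(pdf, 6.0, UPPER)              # P(X > 6)

print(round(total, 4), round(mean, 4), round(p_above_mean, 3))
```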

Numerical based on expectation E(x): The distribution function of a random variable X is given by $F_X(x) = 1 - (1 + x)e^{-x}$, $x \ge 0$. Find the mean and the variance.
Solution: $f_X(x) = \frac{dF_X(x)}{dx} = (1 + x)e^{-x} - e^{-x} = xe^{-x}$, $x \ge 0$
Mean $= \bar{X} = \int_0^{\infty} x f(x)\,dx = \int_0^{\infty} x^2 e^{-x}\,dx = 2$
$E(X^2) = \int_0^{\infty} x^2 f(x)\,dx = \int_0^{\infty} x^3 e^{-x}\,dx = 6$

$V(X) = E(X^2) - [E(X)]^2 = 6 - 4 = 2$


Expectation of a function of a random variable :The expectation of a function of a
random variable is a generalization of the concept of expectation itself. If you have a random
variable (X) and a function (g(X)), the expected value of the function (g(X)) is denoted as
(E[g(X)]).

• Discrete Random Variables: If X is a discrete random variable with probability mass function $P(X = x_i)$, then the expectation of the function g(X) is calculated as follows:
$E[g(X)] = \sum_i g(x_i) P(X = x_i)$

• Continuous Random Variables: If X is a continuous random variable with probability density function f(x), the expectation is calculated with an integral:
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

Numerical based on expectation f(x) and g(x): If the probability density function of X is $f(x) = \frac{2}{9}x\left(2 - \frac{x}{2}\right)$, $0 \le x \le 3$, find E(Y) where $Y = (X + 1)^2$.
Solution: $E(Y) = \int_0^{3} g(x) f(x)\,dx = \int_0^3 (x + 1)^2 f(x)\,dx$
$= \int_0^3 (x + 1)^2\, \frac{2}{9}x\left(2 - \frac{x}{2}\right)dx = \frac{2}{9}\left[-24.3 + 20.25 + 31.5 + 9\right] = 8.1$
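Since both the density and g(x) are polynomials, E(Y) can be evaluated exactly by multiplying coefficient lists and integrating term by term. This sketch assumes the density is $f(x) = \frac{2}{9}x\left(2 - \frac{x}{2}\right) = \frac{4}{9}x - \frac{1}{9}x^2$ on [0, 3], which is the form consistent with the four terms in the worked solution; the helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_integral(coeffs, lo, hi):
    """Exact definite integral of a polynomial over [lo, hi]."""
    def antiderivative(x):
        return sum(c * Fraction(x) ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

pdf = [Fraction(0), Fraction(4, 9), Fraction(-1, 9)]   # assumed f(x) = (4/9)x - (1/9)x^2
g = [Fraction(1), Fraction(2), Fraction(1)]            # g(x) = (x + 1)^2

total = poly_integral(pdf, 0, 3)                       # normalization check
expected_y = poly_integral(poly_mul(g, pdf), 0, 3)     # E[(X + 1)^2]

print(total, expected_y)
```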

Properties of variance: The following are the properties of variance.

• The variance of a constant is zero.

• $V(aX + b) = a^2 V(X)$

• $V(a_1 X_1 + a_2 X_2) = a_1^2 V(X_1) + a_2^2 V(X_2)$ for independent $X_1$ and $X_2$

Moment Generating Function: It is denoted by $M_X(t)$. $M'_X(0)$ gives the value of E(X).

$M''_X(0)$ gives the value of $E(X^2)$.

6 Series expansion
Taylor series for $e^u$ (where u = tx)

$e^{tx} = 1 + tx + \frac{t^2 x^2}{2} + \frac{t^3 x^3}{6} + \frac{t^4 x^4}{24} + \cdots$

Expansion of $(1 - z)^{-1}$

$(1 - z)^{-1} = 1 + z + z^2 + z^3 + z^4 + \ldots$

Expansion of $(x - a)^n$

$(x - a)^n = x^n - \binom{n}{1} a x^{n-1} + \binom{n}{2} a^2 x^{n-2} - \binom{n}{3} a^3 x^{n-3} + \ldots + (-1)^n a^n$

Moment Generating Function of Gaussian


$M_X(t) = E[e^{tX}] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$

Moments about the mean and moments about the origin:

(µr ): Moment about the mean (central moment).


(µ′r ): Moment about zero (raw moment).

The first four moments about the mean (using the notation (µr ) for moments about the
mean and (µ′r ) for moments about zero):
First Moment (Mean):
µ1 = 0

Second Moment (Variance):


µ2 = µ′2 − (µ′1 )2

Third Moment:
µ3 = µ′3 − 3µ′2 µ′1 + 2(µ′1 )3

Fourth Moment:
µ4 = µ′4 − 4µ′3 µ′1 + 6µ′2 (µ′1 )2 − 3(µ′1 )4
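The conversion formulas above can be checked against a direct computation of the central moments. The sketch below uses a fair die as test data (an illustrative choice) and exact fractions; the function name `central_from_raw` is hypothetical.

```python
from fractions import Fraction

def central_from_raw(m1, m2, m3, m4):
    """Central moments (mu2, mu3, mu4) from raw moments mu'_1 .. mu'_4."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4

# Test data: a fair six-sided die.
faces = range(1, 7)
prob = Fraction(1, 6)
raw = [sum(prob * x**k for x in faces) for k in (1, 2, 3, 4)]

via_formulas = central_from_raw(*raw)
direct = tuple(sum(prob * (x - raw[0]) ** k for x in faces) for k in (2, 3, 4))

print(via_formulas, direct)
```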

7 Characteristic function
The characteristic function is a way to describe a random variable X. The characteristic function, $\phi_X(t) = E[e^{itX}]$, a function of t, determines the behavior and properties of the probability distribution of X. It is equivalent to a probability density function or cumulative distribution function, since knowing one of these functions allows computation of the others, but they provide different insights into the features of the random variable. In particular cases, one or another of these equivalent functions may be easier to represent in terms of simple standard functions. If a random variable admits a density function, then the characteristic function is its Fourier dual, in the sense that each of them is a Fourier transform of the other. If a random variable has a moment-generating function $M_X(t)$, then the domain of the characteristic function can be extended to the complex plane, and $\phi_X(-it) = M_X(t)$. Note however that the characteristic function of a distribution is well defined for all real values of t, even when the moment-generating function is not well defined for all real values of t.

The characteristic function $\phi_X(t)$ of a random variable X is defined as:
$\phi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\,dx$ for continuous variables
or
$\phi_X(t) = \sum_x e^{itx} P(X = x)$ for discrete variables

If the moment generating function $M_X(t)$ exists, the characteristic function can be related to it. Specifically, the characteristic function can be derived from the moment generating function by substituting it for t, that is, $\phi_X(t) = M_X(it)$.

The characteristic function of a distribution is its Fourier transform, and the moments of a random variable can be obtained from derivatives of its characteristic function at t = 0:

$\phi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\,dx$
$\phi_X(0) = 1$; $\phi'_X(0) = i\int x\,p_X(x)\,dx$; $-\phi''_X(0) = \int x^2 p_X(x)\,dx$

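Example : Prove that the characteristic function of the sum of independent random variables
is the product of their individual characteristic functions.
Let \(S = X + Y\), with \(X\) and \(Y\) independent. Then
\[ p_S(s) = \int p_X(u)\, p_Y(s - u)\, du \]
and we want to show that
\[ \phi_S(t) = \phi_X(t)\, \phi_Y(t), \]
which is nothing but the Fourier convolution theorem.
Proof:
\[ \phi_X(t) = \int_{-\infty}^{\infty} e^{itx} p_X(x)\, dx \]
\[ p_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_X(t)\, e^{-itx}\, dt \quad \text{(inverse Fourier transform)} \]
Since \(p_S(s) = \int p_X(u)\, p_Y(s - u)\, du\), writing \(p_Y\) through its inverse transform gives
\[ p_S(s) = \int p_X(u) \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_Y(t)\, e^{-it(s-u)}\, dt \right] du \]
\[ p_S(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_Y(t)\, e^{-its} \left[ \int_{-\infty}^{\infty} p_X(u)\, e^{itu}\, du \right] dt \]
\[ p_S(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_Y(t)\, \phi_X(t)\, e^{-its}\, dt \]
Comparing this with the inverse transform of \(\phi_S\), it is proved that the characteristic
function of the sum of independent random variables is the product of their individual
characteristic functions, i.e. \(\phi_S(t) = \phi_Y(t)\, \phi_X(t)\).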
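Scaling law for Random Variables (for \(a > 0\)):
\[ \phi_{aX}(t) = \int e^{itx} p_{aX}(x)\, dx = \int e^{itx}\, \frac{1}{a}\, p_X\!\left(\frac{x}{a}\right) dx \]
\[ \phi_{aX}(t) = \int e^{i(at)\frac{x}{a}}\, p_X\!\left(\frac{x}{a}\right) d\!\left(\frac{x}{a}\right) = \phi_X(at) \]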

Example : Let X have the probability mass function
\[ f(x) = \frac{6}{\pi^2 x^2}; \quad x = 1, 2, 3, 4, \ldots \]
and 0 otherwise. Find the moment generating function of X.
Here \(M_X(t) = \sum_{x=1}^{\infty} e^{tx}\, \frac{6}{\pi^2 x^2}\), and this series diverges for every
\(t > 0\). Hence the moment generating function of X does not exist.
Such problems can be handled by using the characteristic function, which always exists.

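Since \(|e^{itx}| = 1\), the expectation defining the characteristic function is always finite.
\[ \phi_X(t) = E[e^{itX}] \]
\[ \phi_X(t) = E\left[1 + itX + \frac{(itX)^2}{2!} + \frac{(itX)^3}{3!} + \cdots \right] \]
\[ \phi_X(t) = 1 + (it)E(X) + \frac{(it)^2}{2!} E(X^2) + \frac{(it)^3}{3!} E(X^3) + \cdots \]
\(\mu'_r\) = coefficient of \(\frac{(it)^r}{r!}\) in the expansion of \(\phi_X(t)\), whenever the
\(r\)-th moment exists.
Also,
\[ \mu'_r = E(X^r) = \frac{1}{i^r} \left. \frac{d^r}{dt^r} \phi_X(t) \right|_{t=0} \]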

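Problem : Find the characteristic function of the Poisson distribution and hence find its
mean and variance.
Solution: The probability mass function of the Poisson distribution with parameter \(\lambda\) is
\[ P(x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots \]
Thus its characteristic function is
\[ \phi_X(t) = \sum_{x=0}^{\infty} e^{itx}\, \frac{e^{-\lambda} \lambda^x}{x!} \]
\[ \phi_X(t) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^{it})^x}{x!} \]
\[ \phi_X(t) = e^{-\lambda}\, e^{\lambda e^{it}} \]
\[ \phi_X(t) = e^{-\lambda(1 - e^{it})} \]
\[ E(X) = \frac{1}{i} \frac{d}{dt} \phi_X(t) = \frac{1}{i}\, i\lambda e^{it}\, e^{-\lambda(1 - e^{it})} \]
At t = 0, \(E(X) = \lambda\)
\[ E(X^2) = \frac{1}{i^2} \left. \frac{d^2}{dt^2} \phi_X(t) \right|_{t=0} = \lambda + \lambda^2 \]
Thus mean = \(\lambda\) and variance = \(E(X^2) - [E(X)]^2 = \lambda\).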
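The Poisson result can be sanity-checked numerically. The following is a minimal sketch using only the standard library; the function names, the truncation at 100 terms, the step size `h` and the value of `lam` are all assumptions chosen for illustration.

```python
import cmath
import math

def poisson_cf_series(t, lam, terms=100):
    """Characteristic function as a truncated sum of e^{itx} P(X = x)."""
    total = 0j
    pmf = math.exp(-lam)              # P(X = 0)
    for x in range(terms):
        total += cmath.exp(1j * t * x) * pmf
        pmf *= lam / (x + 1)          # P(X = x + 1) obtained from P(X = x)
    return total

def poisson_cf_closed(t, lam):
    """Closed form exp(-lam * (1 - e^{it}))."""
    return cmath.exp(-lam * (1 - cmath.exp(1j * t)))

def moments_from_cf(cf, h=1e-3):
    """Approximate E[X] and E[X^2] by central finite differences of cf at t = 0."""
    d1 = (cf(h) - cf(-h)) / (2 * h)
    d2 = (cf(h) - 2 * cf(0.0) + cf(-h)) / h**2
    return (d1 / 1j).real, (d2 / 1j**2).real

lam = 2.5
mean, second = moments_from_cf(lambda t: poisson_cf_closed(t, lam))
print(mean, second - mean**2)         # both values should be close to lam
```

Finite differences are used here instead of symbolic differentiation to keep the sketch dependency-free; the price is a small discretisation error.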

Problem : Find the characteristic function of the Binomial distribution and hence find its
mean and variance.
Solution: The binomial distribution is characterized by two parameters: \(n\) (the number of
trials) and \(p\) (the probability of success in each trial), with \(q = 1 - p\). The probability
mass function (PMF) of a binomially distributed random variable \(X\) is given by:
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} \quad \text{for } k = 0, 1, 2, \ldots, n \]
The characteristic function \(\phi_X(t)\) of a binomial random variable \(X\) is defined as:
\[ \phi_X(t) = E[e^{itX}] = \sum_{k=0}^{n} e^{itk} P(X = k) \]
Substituting the PMF into the equation:
\[ \phi_X(t) = \sum_{k=0}^{n} e^{itk} \binom{n}{k} p^k (1 - p)^{n-k} \]
\[ \phi_X(t) = \sum_{k=0}^{n} \binom{n}{k} (p e^{it})^k (1 - p)^{n-k} = (p e^{it} + (1 - p))^n \]
\[ E(X) = \frac{1}{i} \frac{d}{dt} \phi_X(t) = \frac{1}{i} \frac{d}{dt} (p e^{it} + (1 - p))^n \]
\[ E(X) = np\, (p e^{it} + (1 - p))^{n-1} e^{it} \]
At t = 0, we have \(E(X) = np\).
\[ E(X^2) = \frac{1}{i^2} \frac{d^2}{dt^2} \phi_X(t) = -np \left[ (n - 1)(q + e^{it} p)^{n-2}\, p\, i e^{it}\, i e^{it} + (q + e^{it} p)^{n-1} (i^2 e^{it}) \right] \]
At t = 0, we have \(E(X^2) = np[(n - 1)p + 1]\).
Thus mean = \(E(X) = np\) and variance = \(E(X^2) - [E(X)]^2 = np[(n - 1)p + 1] - n^2 p^2 = npq\).

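Problem : If the characteristic function of a random variable is given by
\[ \phi_X(t) = \frac{1}{6} + \frac{1}{2} e^{it} + \frac{1}{3} e^{3it}, \]
find the probability \(P(X \le 1)\).
Solution: The given characteristic function is:
\[ \phi_X(t) = \frac{1}{6} + \frac{1}{2} e^{it} + \frac{1}{3} e^{3it} \]
Comparing with \(\phi_X(t) = \sum_k e^{itk} P(X = k)\):
\[ P(X = 0) = \frac{1}{6}, \quad P(X = 1) = \frac{1}{2}, \quad P(X = 3) = \frac{1}{3} \]
\[ P(X \le 1) = P(X = 0) + P(X = 1) = \frac{1}{6} + \frac{1}{2} = \frac{2}{3} \]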

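Problem : Find the characteristic function of a random variable \(X\) that follows a uniform
distribution on the interval \([-1, 1]\).
Solution:
\[ \phi_X(t) = E[e^{itX}] = \int_{-\infty}^{\infty} e^{itx} f_X(x)\, dx \]
For a uniform distribution over the interval \([-1, 1]\), the probability density function
\(f_X(x)\) is defined as:
\[ f_X(x) = \begin{cases} \frac{1}{2} & \text{if } -1 \le x \le 1, \\ 0 & \text{otherwise} \end{cases} \]
\[ \phi_X(t) = \int_{-1}^{1} e^{itx} f_X(x)\, dx = \int_{-1}^{1} e^{itx} \cdot \frac{1}{2}\, dx = \frac{1}{2} \int_{-1}^{1} e^{itx}\, dx \]
For \(t \ne 0\),
\[ \frac{1}{2} \int_{-1}^{1} e^{itx}\, dx = \frac{1}{2} \left[ \frac{1}{it} e^{itx} \right]_{-1}^{1} = \frac{1}{2} \cdot \frac{1}{it} (e^{it} - e^{-it}) \]
As \(e^{it} - e^{-it} = 2i \sin(t)\),
\[ \frac{1}{2} \cdot \frac{1}{it} (e^{it} - e^{-it}) = \frac{\sin(t)}{t} \]
\[ \phi_X(t) = \begin{cases} \frac{\sin(t)}{t} & \text{if } t \ne 0, \\ 1 & t = 0 \end{cases} \]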
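As a cross-check of the closed form \(\sin(t)/t\), the defining integral can be approximated directly. This is a sketch under stated assumptions: the midpoint rule, the number of sub-intervals `n` and both function names are illustrative choices, not part of the notes.

```python
import cmath
import math

def uniform_cf_numeric(t, n=20000):
    """Midpoint-rule approximation of (1/2) * integral of e^{itx} over [-1, 1]."""
    dx = 2.0 / n
    total = 0j
    for k in range(n):
        x = -1.0 + (k + 0.5) * dx     # midpoint of the k-th sub-interval
        total += cmath.exp(1j * t * x) * 0.5 * dx
    return total

def uniform_cf_closed(t):
    """sin(t)/t, with the limiting value 1 filled in at t = 0."""
    return 1.0 if t == 0 else math.sin(t) / t

for t in (0.0, 0.5, 3.0):
    print(t, uniform_cf_numeric(t), uniform_cf_closed(t))
```

The imaginary part of the numerical result is essentially zero, as expected for a density that is symmetric about the origin.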

8 Random Processes:
A random process is a collection of random variables indexed by time or space, representing
a system that evolves over time or across different configurations. It can be thought of as a
”family” of possible outcomes at different points, where each specific realization of the random
process corresponds to a possible state of the system.

In statistical terms, an ensemble refers to the complete set of realizations of the random
process at a given point in time. Each realization represents a different ”trajectory” or ”sam-
ple path” of the process, providing insight into the variability and probabilistic nature of the
system’s behavior.

Using the notion of an ensemble, you can derive important statistical properties, such as:
Mean (Expected Value): The average of all realizations at a given time provides insights
into the expected behavior of the random process. This is often computed as: E[X(t)] =
Average of the values in the ensemble at time t.
Variance: Measures the dispersion of the values in the ensemble around the mean, providing
information about the variability of the process: \(Var(X(t)) = E[(X(t) - E[X(t)])^2]\).
Covariance: Captures how two random variables (or values of the random process at
different times) move together and is derived from the joint distribution of the ensemble at
those times.

8.1 Ensemble averages:


Ensemble averages refer to the statistical averages calculated over a set of realizations (or ”sam-
ples”) of a random process or random variables at the same point in time, rather than over
time for a single realization. This concept is fundamental in the study of random processes,
especially in physics, engineering, and statistics.
In the context of a random process \(X(t)\), the ensemble average at time \(t\) is defined by the
expected value of the process over multiple realizations (also called the ensemble) of the random
process at that specific time:
\[ \langle X(t) \rangle = E[X(t)] = \int_{-\infty}^{\infty} x \cdot P_X(x, t)\, dx, \]
where:

(PX (x, t)) is the probability density function of the random variable (X(t)) at time (t).
The integral computes the average of all possible values that the process might take at time (t)
weighted by their probabilities.

8.2 What is the definition of a stationary random process?
A stationary random process is defined as a stochastic process whose statistical properties do
not change over time. Specifically, this means that:
• Mean: The expected value (mean) of the process is constant over time.
• Variance: The variance of the process is also constant over time.
• Covariance: The covariance between values of the process at different times depends only
on the time difference (lag) between those values, not on the actual time at which the
values are observed.
In simpler terms, a stationary random process exhibits consistent behavior regardless of when
it is observed, allowing for reliable statistical inference over time. Stationarity is a critical
assumption in many statistical models and time series analyses.

8.3 Invariant under the translation of time period


The term invariant under the translation of time period refers to a property of a random process
where the statistical characteristics (mean, variance, covariance, etc.) do not change when the
time index is shifted. In simpler terms, the behavior of the process remains the same when the
time variable is shifted or translated by a fixed amount.

8.4 Stationarity
A random process is said to be stationary if it is invariant under time translation. This means
that if you take the process \(X(t)\) and look at it at a different time \(t + \tau\) (where
\(\tau\) is a constant), the statistical properties remain the same: \(X(t)\) and \(X(t + \tau)\)
have the same distribution. Specifically for a stationary process, the covariance between two
points in time depends only on the difference in time between those points, not the absolute
times. For example:
\[ Cov(X(t), X(t + \tau)) = Cov(X(0), X(\tau)) \]
This indicates that the statistical structure of the process does not change when observed at
different times.
Implications of Invariance:
If a process is invariant under time translation, it suggests that the process is stable over
time. This is an important property in many fields, including signal processing, finance, and
physics, as it allows for consistent modeling and predictions. For instance, in time series analy-
sis, many statistical methods assume that the underlying data-generating process is stationary
(or at least approximately stationary) so that past behavior can inform future expectations
reliably.
Applications and Importance:
Invariance under time translation is key in constructing models for forecasting and under-
standing the underlying dynamics of stochastic processes. In practical terms, if observing a
process yields the same statistical behavior regardless of when observations are made, it sim-
plifies analytical operations, allowing for broader applications of models developed based on
those observations.

What are IID random variables?
IID stands for ”Independent and Identically Distributed,” and it refers to a collection of
random variables that have two key properties:
• Independent: Each random variable in the collection does not influence or provide any
information about the others. The occurrence of one event does not affect the probability
of occurrence of another event.
• Identically Distributed: Each random variable has the same probability distribution. This
means that they all share the same mean, variance, and shape of the distribution, even
though the individual outcomes may differ.
A set of IID random variables behaves as if they are drawn from the same statistical population
without any dependence between them, making them a fundamental concept in probability the-
ory and statistics, especially in the context of sampling and many statistical inference methods.

Strict Stationarity:
A random process (X(t)) is said to be strictly stationary if the joint distribution of any
collection of random variables (X(t1 ), X(t2 ), ..., X(tn )) is the same as the joint distribution of
(X(t1 + τ ), X(t2 + τ ), ..., X(tn + τ )) for all time shifts (τ ) and any collection of time points
(t1 , t2 , ..., tn ). In simpler terms, the statistical properties of the process are invariant under time
shifts, meaning that the entire distribution does not change if you shift the time variable.
Wide Sense Stationarity (WSS):
A random process \(X(t)\) is said to be wide sense stationary if:
• The mean \(E[X(t)]\) is constant over time: \(E[X(t)] = \mu\) for all \(t\),
• The autocovariance \(Cov(X(t_1), X(t_2))\) depends only on the time difference \(|t_2 - t_1|\)
and not on the actual time points: \(Cov(X(t_1), X(t_1 + \tau)) = Cov(X(0), X(\tau))\) for all
\(t_1, \tau\).
In broad terms, it focuses on the first two moments (mean and autocovariance) of the process.
Conditions
• Strict Stationarity: (Strict sense stationary process or strongly stationary process)
Requires all statistical characteristics (all moments of the distribution) to be the same at
all times and invariant under time shifts. This is a stronger condition; it applies to the
entire distribution of the process.
• Wide Sense Stationarity: Only requires the mean to be constant and the autocovariance to
depend only on the time difference. It does not impose any restrictions on higher moments
(like skewness or kurtosis). This is a less restrictive condition, and many processes can
be WSS without being strictly stationary. This process is also called a weakly stationary
process or covariance stationary process.
Note: An SSS process with finite first and second order moments is a WSS process, while a WSS
process need not be an SSS process.
Strictly Stationary Process Example:
A Gaussian process with a constant mean and an autocovariance that depends only on the lag is
strictly stationary, because every finite-dimensional distribution of a Gaussian process is
determined by its mean and covariance and is therefore unchanged under time shifts.
Wide Sense Stationary Process Example:
A sine wave with a constant offset, \(X(t) = A \sin(\omega t + \phi) + \mu\), is wide sense
stationary when the phase \(\phi\) is random and uniformly distributed over \([0, 2\pi)\): the
mean is then \(\mu\) and the autocovariance is \(\frac{A^2}{2} \cos(\omega \tau)\), which depends
only on the lag \(\tau\), even though the process is non-Gaussian.

WSS processes are often easier to work with in practical applications like time series analy-
sis, especially when using methods that rely on mean and autocovariance properties, but they
may leave out important characteristics of the process.

IID Variables and Stationarity


Relation Between IID and Stationarity: IID =⇒ WSS (for finite variance):
IID random variables with finite variance are wide-sense stationary (WSS). Since they are
identically distributed, their means are constant, and since they are independent, the
covariances between different variables are zero. Therefore: The mean of an IID sequence is
constant. The variance is also constant. The covariance between any two different IID random
variables is zero, so the autocovariance depends only on the lag.
Strict Stationary: For strict stationarity, the joint distribution of the sequence must remain
the same across time shifts. IID variables satisfy this since they are independent and come from
the same distribution, so every joint distribution is a product of identical marginals. Thus,
the entire distribution of the random process remains the same regardless of time shifts.
Numerical 1: Examine whether the Poisson process X(t), given by the probability law
\[ P[X(t) = r] = \frac{e^{-\lambda t} (\lambda t)^r}{r!}, \quad r = 0, 1, 2, 3, \ldots \]
is covariance stationary.
Solution : The mean (expected value) of a Poisson random variable \(X(t)\) is given by:
\(E[X(t)] = \lambda t\).
This indicates that the mean of the process is directly proportional to time ( t ) and will
vary with ( t ).
Calculate the Covariance The formal definition of Covariance is Cov(X, Y) = E[(X - E[X])(Y
- E[Y])]
Cov(X, Y) = E[XY - E[X]Y - XE[Y] + E[X]E[Y]].
Cov(X, Y) = E[XY] - E[E[X]Y] - E[XE[Y]] + E[E[X]E[Y]].
E[E[X]Y] = E[X]E[Y]
E[XE[Y]] = E[Y]E[X]
Cov(X, Y) = E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y].
Cov(X, Y) = E[XY] - E[X]E[Y].
\[ Cov(X(t_1), X(t_2)) = E[X(t_1) X(t_2)] - E[X(t_1)] E[X(t_2)] \]
For \(t_1 \le t_2\) we can express \(X(t_2) = X(t_1) + (X(t_2) - X(t_1))\), where the increment
\(X(t_2) - X(t_1)\) is independent of \(X(t_1)\). Hence
\[ E[X(t_1) X(t_2)] = E[X(t_1)(X(t_1) + (X(t_2) - X(t_1)))] = E[X(t_1)^2] + E[X(t_1)]\, E[X(t_2) - X(t_1)] \]
with
\[ E[X(t_2)] = \lambda t_2, \quad E[X(t_2) - X(t_1)] = \lambda (t_2 - t_1), \]
\[ E[X(t_1)^2] = Var(X(t_1)) + (E[X(t_1)])^2 = \lambda t_1 + (\lambda t_1)^2. \]
Therefore
\[ Cov(X(t_1), X(t_2)) = E[X(t_1)^2] + E[X(t_1)]\, E[X(t_2) - X(t_1)] - E[X(t_1)]\, E[X(t_2)] \]
\[ Cov(X(t_1), X(t_2)) = \lambda t_1 + (\lambda t_1)^2 + \lambda t_1 \cdot \lambda (t_2 - t_1) - \lambda t_1 \cdot \lambda t_2 = \lambda t_1 \quad \text{when } t_1 \le t_2, \]
that is, \(Cov(X(t_1), X(t_2)) = \lambda \min(t_1, t_2)\).
The covariance depends on the specific times \(t_1\) and \(t_2\) rather than just on their
difference.

The mean \(E[X(t)] = \lambda t\) changes with time \(t\), and the covariance also depends on the
specific values of \(t_1\) and \(t_2\).
Since:
The mean of \(X(t)\) changes with time (not constant). The covariance \(Cov(X(t_1), X(t_2))\)
does not depend solely on the time difference \(|t_2 - t_1|\). Thus, the Poisson process \(X(t)\)
is not covariance stationary because it does not satisfy the conditions of constant mean and
covariance structure dependent only on the lag [3].
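A small simulation can illustrate that the covariance of a Poisson process equals \(\lambda \min(t_1, t_2)\) and so is not a function of the lag alone. This is a rough sketch: the helper names, the rate, the time points, the number of runs and the seed are all assumptions, and the process is built from exponential inter-arrival times with independent increments.

```python
import random

def poisson_count(rate, duration, rng):
    """Number of arrivals within `duration`, using exponential inter-arrival times."""
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed <= duration:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

def estimate_cov(lam, t1, t2, runs=20000, seed=0):
    """Monte Carlo estimate of Cov(X(t1), X(t2)) for t1 <= t2."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(runs):
        x1 = poisson_count(lam, t1, rng)
        x2 = x1 + poisson_count(lam, t2 - t1, rng)   # add an independent increment
        xs.append(x1)
        ys.append(x2)
    mx, my = sum(xs) / runs, sum(ys) / runs
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / runs

# Same lag (2 time units), different starting times.
print(estimate_cov(2.0, 1.0, 3.0))   # theory: lam * min(t1, t2) = 2
print(estimate_cov(2.0, 2.0, 4.0))   # theory: lam * min(t1, t2) = 4
```

Both pairs of times are separated by the same lag, yet the estimated covariances differ, which is what rules out covariance stationarity.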

8.5 Stochastic Convergence


Explain Stochastic Convergence. Stochastic convergence describes the behavior of a sequence of
random variables or processes as they approach a particular limit or value in some statistical
sense. Different types of convergence are used to describe how these processes behave in limit
scenarios, and each type has its own implications and applications in probability theory and
statistics.
• Convergence in Distribution (or Weak Convergence)
A sequence of random variables \(X_n\) converges in distribution to a random variable
\(X\) if the cumulative distribution functions (CDFs) of \(X_n\) converge to the CDF
of \(X\) at all points where \(F_X\) is continuous:
\[ \lim_{n \to \infty} F_{X_n}(x) = F_X(x) \quad \text{for all } x \text{ where } F_X \text{ is continuous.} \]
This type of convergence is useful for understanding the limiting behavior of random
variables and often applies in statistical inference.
Example of Convergence in Distribution (or Weak Convergence)
Suppose we have a sequence of random variables defined as follows:
Let \(\bar{X}_n\) be a random variable representing the average of \(n\) independent and
identically distributed (IID) random variables, each uniformly distributed between 0 and 1,
i.e., \(U(0, 1)\). As \(n\) increases, by the Central Limit Theorem, we know that the
distribution of the standardized sample mean approaches a normal distribution. Formally:
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} U_i \quad \text{where } U_i \sim U(0, 1) \]
The mean and variance of the uniform distribution \(U(0, 1)\) are:
Mean: \(\mu = \frac{1}{2}\)
Variance: \(\sigma^2 = \frac{1}{12}\)
Now, as \(n \to \infty\), according to the Central Limit Theorem:
\[ \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1), \]
where \(N(0, 1)\) is the standard normal distribution. This implies that, for large \(n\),
\(\bar{X}_n\) is approximately distributed as
\[ N\!\left(\frac{1}{2}, \frac{1}{12n}\right). \]

Why it is Called Weak Convergence?:


A sequence of random variables \(X_n\) converges in distribution to a random variable
\(X\) if the CDFs converge at all points where \(F_X\) is continuous:
\[ \lim_{n \to \infty} F_{X_n}(x) = F_X(x) \quad \text{for all } x \text{ where } F_X \text{ is continuous.} \]
This type of convergence is concerned with the limiting behavior of the probability
distribution of the random variables.
Weak Convergence Concept:
The term weak convergence comes from the fact that convergence in distribution does
not require strong forms of convergence, such as convergence of the random variables
themselves almost surely or in probability. Instead, it only requires that the distributions
of the random variables converge to the distribution of the limiting random variable. This
means that convergence is ”weak” in the sense that it applies to the distribution functions
rather than to the random variables themselves.

Convergence of Random Variables: Even though the values of Xn (the random vari-
ables) cannot be expected to converge pointwise to a specific value, the distribution (as
captured by the CDF) is converging.

• Convergence in Probability: This concept is essential in probability theory and
statistics, particularly in the context of sampling and estimation.
A sequence of random variables \(X_n\) converges in probability to a random variable \(X\)
if, for any small positive number \(\epsilon\):
\[ \lim_{n \to \infty} P(|X_n - X| > \epsilon) = 0. \]
Essentially, as \(n\) increases, the probability that \(X_n\) deviates from \(X\) by more than
\(\epsilon\) approaches zero, indicating that \(X_n\) is ”getting closer” to \(X\). Convergence
in probability is commonly used in the law of large numbers.

8.6 Bounds of Probabilities


Bounds in Probabilities: If we know the probability distribution of a random variable, we
can calculate E(X) and Var(X) if these exist. But from the knowledge of these measures
alone we cannot find the probability distribution or calculate the probability that X = a,
where a is a given constant. Although we cannot find such probabilities, we can find
the bounds within which these probabilities lie by using Chebyshev’s inequality. In other
words, Chebyshev’s inequality gives us bounds on the probability that a random variable
deviates from its mean value by a given amount. The most striking aspect of the inequality
is that it is quite universal, in the sense that it does not depend upon the nature of the
probability distribution of X.
Chebyshev’s Inequality: If X is a random variable with mean \(\mu\) and
standard deviation \(\sigma\), then for any positive number k,
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
Since the two events are complementary,
\[ P(|X - \mu| \ge k\sigma) + P(|X - \mu| < k\sigma) = 1 \]
\[ P(|X - \mu| < k\sigma) = 1 - P(|X - \mu| \ge k\sigma) \ge 1 - \frac{1}{k^2} \]
If \(k\sigma = C\) then
\[ P(|X - \mu| \ge C) \le \frac{\sigma^2}{C^2} \]
\[ P(|X - \mu| < C) \ge 1 - \frac{\sigma^2}{C^2} \]

Example : X is a random variable denoting the number of complaints received at a
service station on a day, with mean 20 and standard deviation 2. Find a lower bound on the
probability that on a day the number of complaints will lie between 8 and 32.
Solution : We have \(\mu = 20\) and \(\sigma = 2\).
\[ |X - \mu| = k\sigma \quad \Rightarrow \quad k = \frac{|X - \mu|}{\sigma} \]
When X = 32, \(k = \frac{|32 - 20|}{2} = 6\) and when X = 8, \(k = \frac{|8 - 20|}{2} = 6\)
\[ P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]
\[ P(|X - \mu| < 6\sigma) \ge 1 - \frac{1}{36} = \frac{35}{36} \]
\[ P(|X - 20| < 12) \ge \frac{35}{36} \]
\[ \therefore\ P(8 < X < 32) \ge \frac{35}{36} \]

Example : Suppose X is a random variable with µ = 75 and standard deviation σ = 5.


What conclusion about X can be drawn from Chebyshev’s inequality for k =2 and k=3?

Solution : Setting k = 2, we get
\(\mu - k\sigma = 75 - 2(5) = 65\) and \(\mu + k\sigma = 75 + 2(5) = 85\)
Thus we can conclude from Chebyshev’s inequality that the probability that a value of
X lies between 65 and 85 is at least \(1 - \frac{1}{2^2} = \frac{3}{4}\); that is,
\[ P(65 \le X \le 85) \ge \frac{3}{4} \]
By letting k = 3, we get
\(\mu - k\sigma = 75 - 3(5) = 60\) and \(\mu + k\sigma = 75 + 3(5) = 90\)
Thus we can conclude from Chebyshev’s inequality that the probability that a value of
X lies between 60 and 90 is at least \(1 - \frac{1}{3^2} = \frac{8}{9}\); that is,
\[ P(60 \le X \le 90) \ge \frac{8}{9} \]

Example : Suppose X is a random variable with µ = 75 and standard deviation σ = 5.


Estimate the probability that X lies between 75-20 = 55 and 75+20 = 95.
Solution : Set \(k\sigma = 20\) and solve for k. Since \(\sigma = 5\), we get k = 4.
Thus, by Chebyshev’s inequality,
\[ P(55 \le X \le 95) \ge 1 - \frac{1}{4^2} = \frac{15}{16} \approx 0.94 \]
That is, the probability that X lies between 55 and 95 is at least 94 percent.
Example : Suppose X is a random variable with \(\mu = 75\) and standard deviation \(\sigma = 5\).
Determine an interval [a, b] about the mean for which the probability that X lies in the
interval is at least 99 percent.
Solution : Set \(1 - \frac{1}{k^2} = 0.99\) and solve for k. We get
\[ 1 - 0.99 = \frac{1}{k^2} \quad \text{or} \quad k^2 = \frac{1}{0.01} = 100 \quad \text{or} \quad k = 10 \]
Thus the interval is [75 − 10(5), 75 + 10(5)] = [25, 125]
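The Chebyshev calculations in the examples above follow one pattern, which can be sketched in a few lines. The helper names `chebyshev_lower_bound` and `chebyshev_interval` are hypothetical and introduced only for this illustration; exact fractions are used so the results can be compared with the worked answers.

```python
from fractions import Fraction

def chebyshev_lower_bound(sigma, half_width):
    """Chebyshev lower bound on P(|X - mu| < half_width); it does not depend on mu."""
    k = Fraction(half_width) / Fraction(sigma)    # half-width in units of sigma
    return max(Fraction(0), 1 - 1 / k**2)

def chebyshev_interval(mu, sigma, prob):
    """Interval [mu - k*sigma, mu + k*sigma] chosen so that 1 - 1/k^2 = prob."""
    k = (1 / (1 - prob)) ** 0.5
    return mu - k * sigma, mu + k * sigma

print(chebyshev_lower_bound(2, 12))      # complaints example: 35/36
print(chebyshev_lower_bound(5, 20))      # interval 55..95: 15/16
print(chebyshev_interval(75, 5, 0.99))   # close to (25, 125), up to float rounding
```

The bound is clipped at zero because for \(k \le 1\) Chebyshev’s inequality carries no information.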

Question: Compare the Normal distribution and Chebyshev’s Inequality.
Answer:
Chebyshev’s Inequality:
Chebyshev’s Inequality provides a general bound applicable to any probability
distribution with finite variance, stating that for any \(k > 1\):
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \]
Specifically, when \(k = 2\) (2 standard deviations from the mean), the inequality states
that at least \(1 - \frac{1}{k^2} = 1 - \frac{1}{4} = 0.75\), or 75%, of the observations lie
within 2 standard deviations of the mean. For \(k = 1\):
\[ P(|X - \mu| \ge \sigma) \le 1 \]
This provides no useful information, since every probability is at most 1.
Normal Distribution:
In a normal distribution, the empirical rule (68-95-99.7 rule) defines specific probabilities
for the distribution of data: Approximately 68% of the data falls within 1 standard de-
viation ((µ ± σ)) from the mean. Approximately 95% falls within 2 standard deviations
((µ ± 2σ)). Approximately 99.7% falls within 3 standard deviations ((µ ± 3σ)).

Chebyshev’s Inequality and the properties of the normal distribution (along with the Z-
table) serve different purposes, and each has its own advantages depending on the context
of the analysis. Here are some reasons and scenarios where Chebyshev’s Inequality is still
valuable, even when the normal distribution is available:
– Applicability to All Distributions Chebyshev’s Inequality applies to any probability
distribution, regardless of its shape (normal, uniform, skewed, etc.) or the nature of
the data. This is particularly useful when the distribution of the data is unknown
or cannot be assumed to be normal.
Non-Normal Data: In many real-world applications, data may not follow a normal
distribution. Chebyshev’s Inequality can be used to understand the dispersion of
such data without needing specific distribution information.
– Fewer Assumptions No Assumed Normality: Using the Z-table and properties of the
normal distribution requires a normality assumption. If the data does not conform to
this assumption, the Z-scores and resultant probabilities may be misleading. Cheby-
shev’s Inequality circumvents this issue by not requiring any assumptions about the
underlying distribution.
– Conservative Estimates: Chebyshev’s Inequality provides a conservative estimate
that gives a lower bound on probabilities. This means that it can be useful in
scenarios where it’s important to establish minimum expectations about the spread
of data.
– Preliminary Analysis: In exploratory data analysis, if you are unsure whether the
data follows a particular distribution, Chebyshev’s Inequality can be leveraged to
make initial observations about variance and spread.
– Foundational Understanding: Chebyshev’s Inequality is often taught in statistics as
it helps reinforce concepts about variance, mean, and dispersion, providing founda-
tional knowledge about probability without leaning solely on normal distributions.

Example : Two unbiased dice are thrown. If X is the sum of the numbers shown up,
prove that \(P(|X - 7| \ge 3) \le \frac{35}{54}\). Also find the actual probability.
Solution : We know that
\[ E(X) = \sum p_i x_i = \frac{1}{36}(2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12) = 7 \]
\[ E(X^2) = \sum p_i x_i^2 = \frac{1}{36}(4 + 18 + 48 + 100 + 180 + 294 + 320 + 324 + 300 + 242 + 144) = \frac{329}{6} \]
\[ \sigma^2 = E(X^2) - [E(X)]^2 = \frac{329}{6} - 49 = \frac{35}{6} \]
Using Chebyshev’s inequality,
\[ P(|X - \mu| \ge C) \le \frac{\sigma^2}{C^2} \]
\[ P(|X - 7| \ge 3) \le \frac{35/6}{9} = \frac{35}{54} \]
Check the actual probability: \(|X - 7| \ge 3\) means \(X - 7 \le -3\) or \(X - 7 \ge 3\), that is,
\(X \le 4\) or \(X \ge 10\).
\[ P(|X - 7| \ge 3) = P(X \ge 10) + P(X \le 4) = \frac{6}{36} + \frac{6}{36} = \frac{1}{3} \]
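The two-dice example can be verified by enumerating all 36 outcomes. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely sums of two fair dice.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

mean = sum(x * p for x in outcomes)
var = sum((x - mean) ** 2 * p for x in outcomes)

bound = var / 3**2                                     # Chebyshev bound with C = 3
actual = sum(p for x in outcomes if abs(x - 7) >= 3)   # exact probability

print(mean, var, bound, actual)   # 7 35/6 35/54 1/3
```

The exact probability 1/3 is well below the bound 35/54, which illustrates how conservative Chebyshev’s inequality can be.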

Markov’s inequality : Markov’s inequality gives an upper bound for a tail probability using
only the mean. For any \(k > 0\),
\[ P[|X| \ge k] \le \frac{E[|X|]}{k} \]
This is known as Markov’s Inequality. Because it is stated in terms of \(|X|\), it addresses
non-negative random variables only.

Example : The mean height of students in the class is 5 feet 5 inches. Find the bound
on the probability that a student selected at random from the class is taller than 8 feet.
Solution : Given that E[H] = 65 inches and k = 96 inches.
\[ P[|H| \ge k] \le \frac{E[|H|]}{k} \]
\[ P[|H| \ge 96] \le \frac{65}{96} \approx 0.68 \]

NOTE: Markov’s Inequality is more general and simpler, while Chebyshev’s


Inequality is more specific and stronger when mean and variance are known.
Both are foundational concepts in probability that showcase how probabilistic
behaviors can be bounded despite limited information.
Examples of Chebyshev’s and Markov’s Inequality to understand the compar-
ison:
Markov’s Inequality Example: If the expected value \(E[X] = 10\) and you want to bound
the probability \(P(X \ge 20)\), using Markov’s Inequality:
\[ P(X \ge 20) \le \frac{10}{20} = 0.5 \]

Chebyshev’s Inequality Example: If \(E[X] = 100\) and \(Var(X) = 25\) (thus \(\sigma = 5\)),
for \(k = 2\):
\[ P(|X - 100| \ge 10) \le \frac{1}{2^2} = 0.25 \]

Chernoff’s Inequality: Provides exponential bounds on the tail probabilities of sums
of independent random variables.
For a continuous random variable, \(P(X \ge a) \le e^{-at} M_X(t)\) for every \(t > 0\), where
\(M_X(t)\) is the Moment Generating Function.
For a discrete random variable, \(P(X \ge k) \le e^{-tk} M_X(t)\) for every \(t > 0\).
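Because the Chernoff bound holds for every \(t > 0\), in practice one picks the \(t\) that makes it smallest. The sketch below does this by a simple grid search for a Binomial(n, p) variable, whose moment generating function is \((1 - p + p e^t)^n\). The binomial example, the grid parameters and the function names are assumptions made for illustration.

```python
import math

def binomial_mgf(t, n, p):
    """Moment generating function of a Binomial(n, p) random variable."""
    return (1 - p + p * math.exp(t)) ** n

def chernoff_bound(a, n, p, steps=2000, t_max=5.0):
    """Smallest value of e^{-ta} M_X(t) over a grid of t in (0, t_max]."""
    grid = (t_max * k / steps for k in range(1, steps + 1))
    return min(math.exp(-t * a) * binomial_mgf(t, n, p) for t in grid)

def binomial_tail(a, n, p):
    """Exact P(X >= a) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1))

n, p, a = 100, 0.5, 70
print("exact tail    :", binomial_tail(a, n, p))
print("Chernoff bound:", chernoff_bound(a, n, p))
print("Markov bound  :", n * p / a)
```

For this tail event the Chernoff bound is far tighter than Markov’s bound, although it still overestimates the exact probability.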

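Using Chebyshev’s Inequality demonstrate Convergence in Probability :
\[ P(|\bar{X}_n - E[X]| > \epsilon) \le \frac{Var(\bar{X}_n)}{\epsilon^2} \]
is the Chebyshev’s Inequality applied to the sample mean.

For the uniform distribution on \([a, b]\), the spread is simply the length of the interval
\((b - a)\), and because of the uniformity, the variance is \(\frac{(b - a)^2}{12}\).
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \]
\[ Var\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} Var(X_i) = n\sigma^2 \quad \text{(by independence)} \]
\[ Var(\bar{X}_n) = Var\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} X_i\right) \]
\[ Var(\bar{X}_n) = \frac{1}{n^2} \times n\sigma^2 = \frac{\sigma^2}{n} \]
For \(U(0, 1)\):
\[ Var(X) = \frac{(1 - 0)^2}{12} = \frac{1}{12} \]
\[ \frac{Var(X)}{n} = \frac{1/12}{n} = \frac{1}{12n} \]
is the variance of the sample mean.
\[ P(|\bar{X}_n - 0.5| > \epsilon) \le \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{1/(12n)}{\epsilon^2} = \frac{1}{12 \epsilon^2 n} \]
As \(n\) approaches infinity:
\[ \lim_{n \to \infty} P(|\bar{X}_n - 0.5| > \epsilon) \le \lim_{n \to \infty} \frac{1}{12 \epsilon^2 n} = 0 \]
This shows that the probability that the sample mean \(\bar{X}_n\) deviates from 0.5 by more
than \(\epsilon\) approaches zero as \(n\) increases.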
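The bound \(1/(12 \epsilon^2 n)\) can be tabulated to see how quickly it shrinks, and inverted to ask how many samples guarantee a given accuracy. This is only a sketch; the function names and the values of `eps` and `delta` are illustrative assumptions.

```python
import math

def chebyshev_bound_uniform_mean(n, eps):
    """Upper bound 1/(12 n eps^2) on P(|mean of n U(0,1) draws - 0.5| > eps)."""
    return min(1.0, 1.0 / (12 * n * eps**2))

def samples_needed(eps, delta):
    """Smallest n for which the bound above is at most delta."""
    return math.ceil(1.0 / (12 * eps**2 * delta))

for n in (10, 100, 1000, 10000):
    print(n, chebyshev_bound_uniform_mean(n, eps=0.05))

print(samples_needed(eps=0.05, delta=0.01))
```

The bound decays like \(1/n\); the true deviation probability decays much faster, so the sample size returned here is a conservative guarantee rather than a sharp requirement.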

Explain the law of large numbers (LLN) using the concepts of Stochastic
Convergence.
The Law of Large Numbers states that as the sample size ( n ) increases, the sample mean
(average) of a sequence of independent and identically distributed (IID) random variables
converges in probability to the expected value (mean) of the underlying distribution.
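Let \(X_1, X_2, \ldots, X_n\) be a sequence of IID random variables with a finite expectation
\(E[X] = \mu\).
As \(n\) approaches infinity, the sample mean \(\bar{X}_n\), defined as
\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \]
converges in probability to \(\mu\):
\[ \bar{X}_n \xrightarrow{P} \mu, \]
\[ \lim_{n \to \infty} P(|\bar{X}_n - \mu| > \epsilon) = 0 \quad \text{for any } \epsilon > 0. \]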

This statement indicates that as we take more samples, the probability that the sample
mean (X̄n ) deviates from the true mean ( µ ) by more than any fixed amount (ϵ ) ap-
proaches zero.

Suppose we have a fair six-sided die, which has an expected value:
\[ E[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5. \]

Sampling: If we roll the die and calculate the average from various sample sizes:

For ( n = 5 ): The average may be something like ( 3.2 ).


For ( n = 100 ): The average might be closer to ( 3.5 ).
For ( n = 1000 ): The average would likely be even closer to ( 3.5 ).
As ( n ) increases:

The sample mean \(\bar{X}_n\) will fluctuate within a narrower band around 3.5 as the number of
rolls increases, because random extremes cancel and the variance of the sample mean,
\(\sigma^2 / n\), shrinks. Using convergence in probability, we can say that the probability of
observing an average significantly different from 3.5 decreases with larger numbers of samples.
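The die-rolling illustration can be reproduced with a short simulation. This is a sketch with an arbitrary seed and hypothetical helper name; the printed averages depend on the seed, so no specific values are claimed.

```python
import random

def sample_mean_of_die(n, rng):
    """Average of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(42)   # fixed seed so that the run is repeatable
for n in (5, 100, 1000, 100000):
    print(n, sample_mean_of_die(n, rng))
```

For small \(n\) the average can land noticeably away from 3.5, while for large \(n\) it typically settles very close to it, in line with the law of large numbers.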

• Almost Sure Convergence (or Strong Convergence): A sequence of random variables \(X_n\)
converges almost surely to a random variable \(X\) if:
\[ P\left(\lim_{n \to \infty} X_n = X\right) = 1. \]
This type of convergence indicates that, with probability 1, the realized values of \(X_n\)
get arbitrarily close to \(X\) and stay close as \(n\) grows. It’s a stronger form of
convergence than convergence in probability.

Compare the almost sure or strong convergence with the weak convergence?
Almost Sure Convergence (also known as strong convergence) and Weak Convergence
(also known as convergence in distribution) are two types of convergence for sequences of
random variables.

A sequence of random variables \(X_n\) converges almost surely to a random variable \(X\) if:
\[ P\left(\lim_{n \to \infty} X_n = X\right) = 1. \]
This means that, with probability 1, the realized sequence \(X_n\) comes arbitrarily close to
\(X\) and stays close for all \(n\) large enough.

Whereas, a sequence of random variables \(X_n\) converges in distribution (or weakly) to
a random variable \(X\) if the cumulative distribution functions (CDFs) converge at
all points where \(F_X\) is continuous:
\[ \lim_{n \to \infty} F_{X_n}(x) = F_X(x) \quad \text{for all } x \text{ where } F_X \text{ is continuous.} \]

Almost Sure Convergence:

– Stronger Requirement: Almost sure convergence is a stronger criterion than weak
convergence. It indicates that \(X_n\) converges to \(X\) for almost all sample paths. This
means that the convergence is guaranteed for nearly every outcome in the sample
space; only a set of probability zero may fail to converge.
– Pointwise Convergence: Almost sure convergence means that for almost every individual
realization of the random variables, the sequence converges to the limiting variable.

Weak Convergence:

– Weaker Requirement: Weak convergence does not require pointwise convergence of
the random variables. It only requires the distributions of Xn to approximate the
distribution of ( X ) as ( n ) becomes large.
– Limited Insight: Weak convergence does not provide information about specific sam-
ple paths or how the sequence behaves on specific realizations. You might have sit-
uations where Xn converges in distribution to ( X ) but does not converge almost
surely.

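Example of Almost Sure Convergence:
Consider \(X_n = \frac{1}{n}\). This sequence converges almost surely to 0:
\(P(\lim_{n \to \infty} X_n = 0) = 1\). Here every realization is the same deterministic
sequence, which approaches 0 without ever being equal to 0.
Example of Weak Convergence:
Suppose \(X_n\) follows the distribution \(N(0, \frac{1}{n})\) (a normal distribution with mean
0 and variance \(\frac{1}{n}\)). As \(n \to \infty\), \(X_n\) converges in distribution to the
constant 0 (a degenerate distribution with all of its mass at 0). Convergence in distribution
on its own says nothing about sample paths. For instance, if \(Z \sim N(0, 1)\) and
\(X_n = (-1)^n Z\), every \(X_n\) has the same distribution as \(Z\), so \(X_n\) converges in
distribution to \(Z\), yet the realized sequence alternates between \(-Z\) and \(Z\) and does
not converge whenever \(Z \ne 0\).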

• Convergence in Mean (or Mean Convergence): \(X_n\) converges in mean to \(X\) if:
\[ \lim_{n \to \infty} E[|X_n - X|] = 0. \]
Essentially, this requires the expected value of the absolute difference between the random
variables to approach zero. It shows that as \(n\) increases, the random variables become
close to \(X\) in terms of their average behavior.

• Convergence in \(p\)-th Mean : Generalizing the idea of convergence in mean, a
sequence of random variables \(X_n\) converges in \(p\)-th mean to a random variable \(X\) if:
\[ \lim_{n \to \infty} E[|X_n - X|^p] = 0. \]
Here, \(p \ge 1\), and typically \(p = 2\) corresponds to \(L^2\) convergence (mean square
convergence).

8.7 Law of Large Numbers


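Question: State and prove the weak law of large numbers (WLLN).
Answer : Let $X_1, X_2, \ldots, X_n$ be IID random variables with finite mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}(X_1 + X_2 + \ldots + X_n)$ be the sample mean. The WLLN states that for every $\epsilon > 0$, $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$.
Proof: We know that
$E[\bar{X}_n] = E\left[\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right] = \frac{1}{n}(E[X_1] + E[X_2] + \ldots + E[X_n]) = \mu$
Using $Var(aX) = a^2 Var(X)$ and the independence of the $X_i$,
$Var(\bar{X}_n) = Var\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) = \frac{1}{n^2} Var(X_1 + X_2 + \ldots + X_n) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$
By applying Chebyshev's inequality, we can state that:
$P(|\bar{X}_n - \mu| > \epsilon) \le \frac{Var(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2/n}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$
which goes to zero as $n \to \infty$. Hence the weak law of large numbers (WLLN), $P(|\bar{X}_n - \mu| > \epsilon) \to 0$ as $n \to \infty$, is proved.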
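
The Chebyshev bound used in the proof can be checked numerically. The following Python sketch is only an illustration: it assumes fair die rolls as the IID variables (so µ = 3.5 and σ² = 35/12), and the function names, seed and sample sizes are arbitrary choices that do not come from these notes.

```python
import random

MU = 3.5           # mean of one fair die roll (assumed example distribution)
SIGMA2 = 35 / 12   # variance of one fair die roll

def chebyshev_bound(n, eps):
    """Upper bound sigma^2 / (n * eps^2) on P(|mean - mu| > eps), capped at 1."""
    return min(1.0, SIGMA2 / (n * eps ** 2))

def empirical_tail(n, eps, trials=500, seed=1):
    """Fraction of simulated sample means that deviate from MU by more than eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - MU) > eps:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, empirical_tail(n, 0.25), chebyshev_bound(n, 0.25))
```

Both columns shrink as n grows; the simulated frequency is typically far below the bound, since Chebyshev's inequality is a worst-case estimate.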

8.8 Central Limit Theorem (CLT)
Question: State the Central Limit Theorem (CLT).
Let $X_1, X_2, \ldots, X_n$ be a sequence of IID random variables, each with finite mean $\mu = E[X_i]$
and finite variance $\sigma^2 = Var(X_i)$.
As $n$ approaches infinity, the distribution of the standardized sum (or average) of these
random variables approaches a normal distribution:
$Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$,
where $\bar{X}_n$ is the sample mean defined as $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
The Central Limit Theorem provides that the distribution of the sample mean (or sum) of
a large number of IID random variables approaches a normal distribution, regardless of the
original distribution of the variables, as long as they have a finite mean and variance.

Problem : The lifetime of a certain brand electric bulb may be considered a random
variable with mean 1200 hours and standard deviation 250 hours. Using the Central Limit
Theorem find the probability that the average lifetime of 60 bulbs exceeds 1250 hours.
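Solution: If $\bar{X}$ denotes the mean lifetime of 60 bulbs, then by the central limit theorem
$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
is approximately standard normal.

Mean of sample: $\bar{X} = 1250$
Mean of population: $\mu = 1200$
No. of samples: $n = 60$
Standard deviation of population: $\sigma = 250$

$z = \frac{1250 - 1200}{250/\sqrt{60}} \approx 1.55$
P(z > 1.55) = area to the right of 1.55 = 0.5 - 0.4394 = 0.0606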
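
The table lookup can be cross-checked with a few lines of Python. This is a minimal sketch: the helper names are illustrative, and the normal CDF is written with the standard library error function instead of a printed table, so the last digits may differ slightly from the table value.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """CLT approximation of P(sample mean > threshold) for n IID observations."""
    z = (threshold - mu) / (sigma / math.sqrt(n))
    return 1.0 - normal_cdf(z)

p = prob_sample_mean_exceeds(1250, mu=1200, sigma=250, n=60)
print(round(p, 4))
```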

Problem : A random sample of size 100 is taken from a population whose mean is 60
and whose variance is 400. Using the central limit theorem, with what probability can we assert that
the mean of the sample will not differ from $\mu = 60$ by more than 4?
Solution: Let $\bar{X}$ denote the mean of the 100 samples, with $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.

Mean of sample: $\bar{X}$ is not given (it is the random quantity of interest)
Mean of population: $\mu = 60$
No. of samples: $n = 100$
Standard deviation of population: $\sigma = \sqrt{400} = 20$, so $\sigma/\sqrt{n} = 20/10 = 2$
The mean will not differ from 60 by more than 4:
$P[|\bar{X} - 60| < 4] = P[-4 < \bar{X} - 60 < 4] = P[56 < \bar{X} < 64]$
When $\bar{X} = 56$, $Z = (56 - 60)/2 = -2$,
and when $\bar{X} = 64$, $Z = (64 - 60)/2 = 2$.
$P[|\bar{X} - 60| < 4] = P(-2 < Z < 2)$ = 2 × (area from Z = 0 to Z = 2)
= 2 × 0.4772 = 0.9544
Problem : A queue of 30 cars has formed before a railway crossing. Suppose that the
length L of a car is a random variable with expected value $\mu = 5.3$ m and variance $\sigma^2 = 1.44$ m²,
and that the distance D between two successive vehicles is a random variable with expected value $\mu_g = 1.1$ m
and variance $\sigma_g^2 = 0.09$ m². What is the probability that the total length $L_{tot}$ of the queue of
cars (consisting of 30 cars and 29 gaps) is between 185 and 195 m?

Solution: The total length of the queue can be defined as:
$L_{tot} = \sum_{i=1}^{30} L_i + \sum_{j=1}^{29} D_j$

The expected value of the total length:
$E[L_{tot}] = E\left[\sum_{i=1}^{30} L_i + \sum_{j=1}^{29} D_j\right] = 30 \cdot E[L] + 29 \cdot E[D] = 30 \cdot 5.3 + 29 \cdot 1.1 = 159 + 31.9 = 190.9$ m.

Since the car lengths and the gaps are independent, the variance of the total length is the sum of the
variances of the lengths and the gaps:
$Var(L_{tot}) = Var\left(\sum_{i=1}^{30} L_i\right) + Var\left(\sum_{j=1}^{29} D_j\right) = 30 \cdot Var(L) + 29 \cdot Var(D) = 30 \cdot 1.44 + 29 \cdot 0.09 = 43.2 + 2.61 = 45.81$ m².

By the central limit theorem $L_{tot}$ is approximately normal, so we standardize:
$Z = \frac{L_{tot} - E[L_{tot}]}{\sqrt{Var(L_{tot})}} = \frac{L_{tot} - 190.9}{\sqrt{45.81}}$, with $\sqrt{45.81} \approx 6.77$.

For $P(185 < L_{tot} < 195)$:
$Z_1 = \frac{185 - 190.9}{6.77} = \frac{-5.9}{6.77} \approx -0.87$
$Z_2 = \frac{195 - 190.9}{6.77} = \frac{4.1}{6.77} \approx 0.61$

$P(185 < L_{tot} < 195) = P(-0.87 < Z < 0.61) = \Phi(0.61) - \Phi(-0.87) = 0.7291 - (1 - 0.8078) = 0.7291 - 1 + 0.8078 = 0.5369$.

Problem : A distribution with unknown mean $\mu$ has variance equal to 1.5. Use the central
limit theorem to find how large a sample should be taken from the distribution in order that
the probability will be at least 0.95 that the sample mean will be within 0.5 of the population mean.

Solution: Let n be the size of the sample, a typical member of which is $X_i$.
$E(X_i) = \mu$ and $Var(X_i) = 1.5$
Let $\bar{X}$ denote the sample mean.
By the corollary to the central limit theorem,
$\bar{X}$ approximately follows $N\left(\mu, \frac{\sqrt{1.5}}{\sqrt{n}}\right)$ (mean, standard deviation).
We have to find n such that
$P(\mu - 0.5 < \bar{X} < \mu + 0.5) \ge 0.95$
$P[|\bar{X} - \mu| < 0.5] \ge 0.95$
$P\left[\frac{|\bar{X} - \mu|}{\sqrt{1.5}/\sqrt{n}} < \frac{0.5}{\sqrt{1.5}/\sqrt{n}}\right] \ge 0.95$
$P[|Z| < 0.4082\sqrt{n}] \ge 0.95$
From the table of areas under the normal curve,
$P(Z < 1.96) \approx 0.975$ and $P(Z < -1.96) \approx 0.025$, so
$P[|Z| < 1.96] = P(-1.96 < Z < 1.96) = 0.975 - 0.025 = 0.95$
∴ the least n is given by $0.4082\sqrt{n} = 1.96$, i.e. $\sqrt{n} \approx 4.80$ and $n \approx 23.05$, which rounds up to n = 24.
∴ the size of the sample must be at least 24.
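
The same sample-size calculation can be written as a one-line formula, n ≥ Var · (z / d)², rounded up. The Python sketch below is illustrative; the function name and its default critical value 1.96 (for 95%) are choices made here, not part of the notes.

```python
import math

def min_sample_size(variance, half_width, z_crit=1.96):
    """Smallest n with z_crit * sqrt(variance / n) <= half_width."""
    return math.ceil(variance * (z_crit / half_width) ** 2)

n = min_sample_size(variance=1.5, half_width=0.5)
print(n)
```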

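Problem : If $X_1, X_2, \ldots, X_n$ are independent Poisson variates with parameter $\lambda = 2$, use the central
limit theorem to estimate $P(120 \le S_n \le 160)$, where $S_n = X_1 + X_2 + X_3 + \ldots + X_n$ and n = 75.
Solution : $E(X_i) = \lambda = 2$ and $Var(X_i) = \lambda = 2$, so $\mu = 2$ and $\sigma = \sqrt{2}$.
By the central limit theorem, $S_n$ approximately follows $N(n\mu, \sigma\sqrt{n})$ (mean, standard deviation),
i.e. $S_n$ follows $N(150, \sqrt{150})$.
$P(120 \le S_n \le 160) = P\left(\frac{-30}{\sqrt{150}} \le \frac{S_n - 150}{\sqrt{150}} \le \frac{10}{\sqrt{150}}\right) = P(-2.45 \le Z \le 0.82)$
where Z is the standard normal variable.
$P(-2.45 \le Z \le 0.82) \approx 0.7868$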
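
As a sanity check on the approximation, the sketch below compares the CLT estimate with an exact sum. It relies on the standard fact that a sum of 75 independent Poisson(2) variables is Poisson(150); the helper names are illustrative, and no continuity correction is applied, so the two printed numbers are expected to be close rather than identical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam, n = 2.0, 75
mean, std = n * lam, math.sqrt(n * lam)

clt = normal_cdf((160 - mean) / std) - normal_cdf((120 - mean) / std)
exact = sum(poisson_pmf(k, n * lam) for k in range(120, 161))
print(round(clt, 4), round(exact, 4))
```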

8.9 Covariance and Correlation


Question: Discrete random variables X and Y have the joint density $f_{XY}(x, y) = 0.4\delta(x + \alpha)\delta(y - 2) + 0.3\delta(x - \alpha)\delta(y - 2) + 0.1\delta(x - \alpha)\delta(y - \alpha) + 0.2\delta(x - 1)\delta(y - 1)$. Determine the value
of $\alpha$, if any, that minimizes the correlation between X and Y and find the minimum correlation.
Are X and Y orthogonal?
Answer : $R_{XY} = E[XY] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} xy\, f_{XY}(x, y)\, dx\, dy$
Since the density is a sum of Dirac delta functions, the term $0.4\delta(x + \alpha)\delta(y - 2)$ means that the point $(-\alpha, 2)$ occurs with probability 0.4, and similarly for the remaining three terms. Hence we can write:
$R_{XY} = 0.4(-\alpha)(2) + 0.3(\alpha)(2) + 0.1(\alpha)(\alpha) + 0.2(1)(1)$
$R_{XY} = -0.2\alpha + 0.1\alpha^2 + 0.2$
$\frac{dR_{XY}}{d\alpha} = -0.2 + 0.2\alpha = 0$
∴ $\alpha = 1$
Verify the second derivative: $\frac{d^2R_{XY}}{d\alpha^2} = 0.2 > 0$, so $\alpha = 1$ gives a minimum. Putting $\alpha = 1$ in the equation gives $R_{XY} = -0.2(1) + 0.1(1^2) + 0.2 = 0.1$, which is the minimum
value. Since $R_{XY} \ne 0$, we can conclude that X and Y are not orthogonal.

NOTE : When you have a function $f(x)$ and you have found its first derivative $f'(x)$, setting
it to zero gives you the critical points (points where the function could have a local maximum,
local minimum, or saddle point). The second derivative $f''(x)$ is used to analyze these points:

• If $f''(x) > 0$ at a critical point, the function is concave up, indicating a local minimum
at that point.

• If $f''(x) < 0$ at a critical point, the function is concave down, indicating a local maximum.

• If $f''(x) = 0$, the test is inconclusive, and further analysis or higher derivatives may be
needed.
Question: Discrete random variables X and Y have the joint density $f_{XY}(x, y) = 0.3\delta(x - \alpha)\delta(y - \alpha) + 0.5\delta(x + \alpha)\delta(y - 4) + 0.2\delta(x + 2)\delta(y + 2)$. Determine the value of $\alpha$, if any, that
minimizes the covariance of X and Y. Find the minimum covariance. Are X and Y correlated?
Answer : We know that,
Cov(X, Y) = E[XY] - E[X]E[Y]
$C_{XY} = R_{XY} - \bar{X}\bar{Y}$
$R_{XY} = 0.3(\alpha)^2 + 0.5(-\alpha)(4) + 0.2(-2)(-2)$
$R_{XY} = 0.3\alpha^2 - 2\alpha + 0.8$
$\bar{X} = 0.3(\alpha) + 0.5(-\alpha) + 0.2(-2) = -0.2\alpha - 0.4$
$\bar{Y} = 0.3(\alpha) + 0.5(4) + 0.2(-2) = 0.3\alpha + 1.6$
∴ $C_{XY} = 0.3\alpha^2 - 2\alpha + 0.8 - [(-0.2\alpha - 0.4)(0.3\alpha + 1.6)]$
∴ $C_{XY} = 0.36\alpha^2 - 1.56\alpha + 1.44$
∴ $\frac{dC_{XY}}{d\alpha} = 0.72\alpha - 1.56 = 0$
∴ $\alpha = 2.1667 \approx 2.17$
Verify $\frac{d^2C_{XY}}{d\alpha^2}$, which is 0.72 and greater than zero. Hence the minimum is $C_{min} = 0.36(2.1667)^2 - 1.56(2.1667) + 1.44 = 1.69 - 3.38 + 1.44 = -0.25$.
Since $C_{min} \ne 0$, X and Y are not uncorrelated, i.e. they are (negatively) correlated.
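
A direct numerical cross-check of this kind of problem evaluates the covariance from the three point masses instead of from the expanded polynomial. The Python sketch below is only an illustration; the function name and the way the point masses are stored are choices made here.

```python
def covariance(alpha):
    """Cov(X, Y) for the three point masses of the problem at a given alpha."""
    points = [(alpha, alpha, 0.3), (-alpha, 4.0, 0.5), (-2.0, -2.0, 0.2)]
    e_xy = sum(p * x * y for x, y, p in points)
    e_x = sum(p * x for x, y, p in points)
    e_y = sum(p * y for x, y, p in points)
    return e_xy - e_x * e_y

# Cov is the parabola 0.36*a^2 - 1.56*a + 1.44, so its vertex is at a = 1.56 / 0.72.
alpha_star = 1.56 / (2 * 0.36)
print(round(alpha_star, 4), round(covariance(alpha_star), 4))
```

Evaluating `covariance` slightly to the left and right of `alpha_star` is a quick way to confirm that the stationary point is a minimum.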

Question: Define the following terms
• Uncorrelated
• Not Uncorrelated
• Positive Correlation
• Negative Correlation
• Covariance and Correlation
Answer :
• Uncorrelated : Two random variables X and Y are said to be uncorrelated if
their covariance is zero: Cov(X, Y) = E[XY] - E[X]E[Y] = 0. This implies that there
is no linear relationship between the two variables. It does not imply independence:
uncorrelated variables can still be related in a nonlinear way.

• Not Uncorrelated : Saying that two random variables are not uncorrelated means that
they are correlated, which happens when Cov(X, Y) ≠ 0. This indicates that there
is some degree of linear relationship between the two variables.

• Positive Correlation : If the covariance is positive (Cov(X, Y) > 0), this suggests
that as one variable increases, the other variable tends to increase as well.

• Negative Correlation : If the covariance is negative (Cov(X, Y) < 0), it implies that
as one variable increases, the other variable tends to decrease.

• Covariance and Correlation : Covariance is a measure of how two random variables
change together. It indicates the direction of the linear relationship between the variables.
Specifically, it quantifies whether an increase in one variable corresponds to an increase
or decrease in another variable. The covariance can be positive, negative or zero.
Cov(X, Y) = E[XY] - E[X]E[Y]

Correlation is a standardized measure that quantifies the strength and direction of the
linear relationship between two variables. It provides insights into how closely related the
variables are, in a dimensionless form. The most commonly used correlation measure is the
Pearson correlation coefficient, denoted as 'r'. The correlation coefficient 'r' ranges from
-1 to 1. r = 1 : Perfect positive linear correlation; r = -1 : Perfect negative linear
correlation, and r = 0 : No linear correlation.
$r = \frac{Cov(X, Y)}{\sqrt{Var(X) \cdot Var(Y)}}$

Question: Let X and Y be random variables with the following joint distribution.

          Y = -3   Y = 2   Y = 4   Sum
X = 1      0.1      0.2     0.2    0.5
X = 3      0.3      0.1     0.1    0.5
Sum        0.4      0.3     0.3

• Find the distributions of X and Y.
• Find Cov(X, Y), i.e. the covariance of X and Y.
• Find ρ(X, Y), i.e. the correlation of X and Y.
• Are X and Y independent random variables?
Answer : The marginal distribution on the right is the distribution of X, and the marginal
distribution at the bottom is the distribution of Y. Namely,

x_i       1     3
f(x_i)   0.5   0.5

y_i      -3     2     4
g(y_i)   0.4   0.3   0.3

$Cov(X, Y) = E(XY) - \mu_x \mu_y$
First compute $\mu_x$ and $\mu_y$:
$\mu_x = \sum x_i f(x_i) = (1)(0.5) + (3)(0.5) = 2$
$\mu_y = \sum y_i g(y_i) = (-3)(0.4) + (2)(0.3) + (4)(0.3) = 0.6$
Next compute E(XY) as follows.
$E(XY) = \sum x_i y_i h(x_i, y_i)$
E(XY) = (1)(-3)(0.1) + (1)(2)(0.2) + (1)(4)(0.2) + (3)(-3)(0.3) + (3)(2)(0.1) + (3)(4)(0.1) = 0
$Cov(X, Y) = E(XY) - \mu_x \mu_y$ = 0 - 2(0.6) = -1.2

To compute $\sigma_x$ and $\sigma_y$:
$E(X^2) = \sum (x_i)^2 f(x_i)$ = (1)(0.5) + (9)(0.5) = 5
$\sigma_x^2 = Var(X) = E(X^2) - \mu_x^2$ = 5 - 4 = 1
$\sigma_x = 1$
and
$E(Y^2) = \sum (y_i)^2 g(y_i)$ = (9)(0.4) + (4)(0.3) + (16)(0.3) = 9.6
$\sigma_y^2 = Var(Y) = E(Y^2) - \mu_y^2$ = 9.6 - (0.6)² = 9.24
$\sigma_y = \sqrt{9.24} \approx 3.04$
$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_x \sigma_y} = \frac{-1.2}{(1)(3.04)} \approx -0.4$

X and Y are not independent, since
P(X = 1, Y = -3) = 0.1 ≠ P(X = 1) P(Y = -3) = (0.5)(0.4) = 0.2
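
The whole calculation can be repeated mechanically from the joint table. The sketch below is a minimal Python illustration; the dictionary layout and helper name are choices made here, and the independence check simply tests the product rule in every cell.

```python
import math

# Joint pmf h(x, y) from the table: keys are (x, y) pairs.
joint = {
    (1, -3): 0.1, (1, 2): 0.2, (1, 4): 0.2,
    (3, -3): 0.3, (3, 2): 0.1, (3, 4): 0.1,
}

def expectation(func):
    """E[func(X, Y)] under the joint pmf."""
    return sum(p * func(x, y) for (x, y), p in joint.items())

mu_x = expectation(lambda x, y: x)
mu_y = expectation(lambda x, y: y)
cov = expectation(lambda x, y: x * y) - mu_x * mu_y
var_x = expectation(lambda x, y: x * x) - mu_x ** 2
var_y = expectation(lambda x, y: y * y) - mu_y ** 2
rho = cov / math.sqrt(var_x * var_y)

# Independence requires P(X=x, Y=y) = P(X=x) P(Y=y) in every cell.
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (1, 3)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (-3, 2, 4)}
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12 for (x, y) in joint)
print(round(cov, 3), round(rho, 3), independent)
```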

8.10 Spectral Characteristics of Random Process
Question: Explain the Spectral Characteristics of a random process.
Answer : The spectral characteristics of a random process provide insights into how the
process behaves in the frequency domain, as opposed to the time domain. The spectral char-
acteristics of a random process primarily refer to the distribution of power or energy of the
process across different frequencies. This is typically described using Power Spectral Density
(PSD). The Power Spectral Density (PSD) is a function that provides a measure of the power
present in a signal as a function of frequency.
The Power Spectral Density $S_X(f)$ of a wide-sense stationary random process $X(t)$ is defined as the Fourier
transform of its auto-correlation function $R_X(\tau)$: $S_X(f) = \int_{-\infty}^{+\infty} R_X(\tau) e^{-j2\pi f\tau}\, d\tau$,
where $R_X(\tau) = E[X(t)X(t + \tau)]$ is the auto-correlation function.

Properties of Power Spectral Density

• Non-negative: The PSD is non-negative since it represents power, which cannot be negative.

• Symmetry: For real-valued processes, the PSD is an even function, meaning that $S_X(-f) = S_X(f)$.

• Real quantity: The PSD is a real-valued function of frequency.

• Total power: The total power of the signal can be found by integrating the PSD across all frequencies:
$P = \int_{-\infty}^{+\infty} S_X(f)\, df$.

• Wide-Sense Stationarity: If a random process is wide-sense stationary, its PSD exists and
is a function of frequency alone, independent of time.

• Frequency Content: Analyzing the PSD allows us to understand which frequencies carry
significant power in the signal, providing insights into the nature of the random process.

• Band-limited Signals: If the PSD is non-zero only within a certain bandwidth, the process
is considered band-limited. This is important in telecommunications, as it helps in
designing efficient communication systems.

Applications of Spectral Characteristics

• Signal Processing: Engineers use spectral analysis to design filters that can enhance
desired signals while reducing noise. The PSD helps in determining the effectiveness of
these filters at different frequencies.

• System Design: Understanding the spectral characteristics allows for designing control
systems that can effectively deal with different frequency behaviors.

• Noise Analysis: In systems like electrical circuits, the spectral characteristics of noise play
a vital role in determining system performance.

8.11 Autocorrelation and Power Spectral Density
• From Autocorrelation to Power Spectral Density: Using the Fourier transform, you can
transform the autocorrelation function to obtain the Power Spectral Density:
$S_X(f) = \mathcal{F}\{R_X(\tau)\} = \int_{-\infty}^{\infty} R_X(\tau) e^{-j2\pi f\tau}\, d\tau$.
This relationship tells us that the PSD can be derived from the autocorrelation function
by performing a Fourier transform.

• From Power Spectral Density to Autocorrelation: Conversely, you can obtain the
autocorrelation function from the Power Spectral Density using the inverse Fourier transform:
$R_X(\tau) = \mathcal{F}^{-1}\{S_X(f)\} = \int_{-\infty}^{\infty} S_X(f) e^{j2\pi f\tau}\, df$.
This means that the values of the autocorrelation function at different lags are obtained
by integrating the Power Spectral Density weighted by the complex exponential function.

NOTE : In signal processing, if the PSD of noise in a communication channel is measured,


you can directly calculate the expected correlations of that noise at different time shifts, inform-
ing design choices for filtering and signal processing strategies. A direct relationship between
the auto correlation function and the Power Spectral Density of a stationary process allows a
transition between the time domain and the frequency domain, providing powerful tools for
analyzing and understanding the behavior of stochastic processes and signals.

How to calculate PSD in the range of frequencies?


The total power (or variance, for a zero-mean process) of a random process $X(t)$ can be obtained by integrating its
power spectral density $S_X(f)$ over the frequency domain.
Let the power contributed by the frequency components between $-f_2$ and $-f_1$, and
between $+f_1$ and $+f_2$, be given by: $P = \int_{-f_2}^{-f_1} S_X(f)\, df + \int_{f_1}^{f_2} S_X(f)\, df = 2\int_{f_1}^{f_2} S_X(f)\, df$, where the last step uses the even symmetry of the PSD of a real process.
Interpret the equation $E[X^2(t)] = R_{XX}(0)$
The autocorrelation function $R_{XX}(\tau)$ of a random process $X(t)$ is defined as:
$R_{XX}(\tau) = E[X(t)X(t + \tau)]$, where $\tau$ is the time lag.
$R_{XX}(0) = E[X(t)X(t)] = E[X^2(t)]$
The equation $E[X^2(t)] = R_{XX}(0)$ conveys the following:

• Expected Value of the Square: The left-hand side, $E[X^2(t)]$, represents the expected value
of the square of the random variable $X(t)$. This measure provides information about
the power or energy of the random process at time t.

• Autocorrelation at Zero Lag: The right-hand side, $R_{XX}(0)$, is the autocorrelation
at zero lag. For a wide-sense stationary (WSS) process it is equal to the expected value of
the square of the random variable, and it equals the variance when the process has zero mean.

If $X(t)$ is stationary, and we set the mean $E[X(t)] = \mu$, then the relationship can also
be expressed as: $R_{XX}(0) = Var(X(t)) + (E[X(t)])^2 = E[X^2(t)]$. Hence, $Var(X(t)) = E[X^2(t)] - (E[X(t)])^2 = R_{XX}(0) - \mu^2$. This emphasizes the role of $R_{XX}(0)$ in assessing both variance and
expected power. The equation $E[X^2(t)] = R_{XX}(0)$ reflects a fundamental relationship in
stochastic process analysis, indicating that the expected value of the square of the process
is linked directly to its autocorrelation at zero lag. This is crucial for understanding the
power and variance characteristics of random processes.

The Power Spectral Density can also be defined directly from the signal: $S_X(f) = \lim_{T\to\infty} \frac{1}{T} E\left[|\mathcal{F}\{X_T(t)\}|^2\right]$, where $\mathcal{F}$ denotes
the Fourier transform, $X_T(t)$ is $X(t)$ restricted to an observation interval of length T, and $|\mathcal{F}\{X_T(t)\}|^2$ indicates the power of the frequency components.
If you want to find the total power contributed by the frequency components between two
frequencies $f_1$ and $f_2$, you compute the following integral: $P_{f_1}^{f_2} = \int_{f_1}^{f_2} S_X(f)\, df$
Integrating the power spectral density $S_X(f)$ between two frequencies $f_1$ and $f_2$ quantifies
the total power contained in that frequency band.

Problem : Find the Power Spectral Density (PSD) of the given wireless signal X(t) with
the auto-correlation function $R_{XX}(\tau) = \frac{1}{2a} e^{-a|\tau|}$, given that a = 5 kHz. From the PSD,
calculate the bandwidth (BW) that contains 90% of the signal energy.
Solution: In wireless communication, power outside the allotted bandwidth should not be
transmitted.

Figure 1: Autocorrelation Function RXX (tau)

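$E[X^2(t)] = R_{XX}(0) = \frac{1}{2a}$ is the power of the signal.
Step 1: Autocorrelation Function
$R_{XX}(\tau) = \frac{1}{2a} e^{-a|\tau|}$, where a = 5 kHz
Step 2: Calculate the Power Spectral Density (PSD)
Fourier Transform of the Autocorrelation Function: $S_{XX}(f) = \int_{-\infty}^{\infty} R_{XX}(\tau) e^{-j2\pi f\tau}\, d\tau$

$S_{XX}(f) = \int_{-\infty}^{\infty} \frac{1}{2a} e^{-a|\tau|} e^{-j2\pi f\tau}\, d\tau$

$S_{XX}(f) = \frac{1}{2a}\left[\int_{-\infty}^{0} e^{a\tau} e^{-j2\pi f\tau}\, d\tau + \int_{0}^{\infty} e^{-a\tau} e^{-j2\pi f\tau}\, d\tau\right]$

$S_{XX}(f) = \frac{1}{2a}\left[\frac{1}{a - j2\pi f} + \frac{1}{a + j2\pi f}\right] = \frac{1}{a^2 + (2\pi f)^2}$

Step 3: Calculating Bandwidth for 90% of Energy

The total energy of the signal corresponds to the integral of the PSD over all frequencies:
$E_{total} = \int_{-\infty}^{\infty} S_{XX}(f)\, df = R_{XX}(0) = \frac{1}{2a}$.

We need the frequency F such that the band $-F \le f \le F$ holds 90% of it:
$0.9 \cdot \frac{1}{2a} = \int_{-F}^{F} \frac{1}{a^2 + (2\pi f)^2}\, df$

$0.9 \cdot \frac{1}{2a} = \frac{1}{\pi a} \tan^{-1}\left(\frac{2\pi F}{a}\right)$

so $\tan^{-1}(2\pi F/a) = 0.45\pi$, i.e. $F = \frac{a \tan(0.45\pi)}{2\pi} \approx 1.005\,a$.

∴ F ≈ 5 kHz and Bandwidth = 2F ≈ 10 kHz.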
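
The band edge can be evaluated numerically. The sketch below assumes the closed-form integral of $1/(a^2 + (2\pi f)^2)$ over $[-F, F]$, namely $\tan^{-1}(2\pi F/a)/(\pi a)$, and takes a = 5000 as in the problem; the function names are illustrative.

```python
import math

def band_power(a, F):
    """Integral of 1 / (a^2 + (2*pi*f)^2) over -F <= f <= F."""
    return math.atan(2 * math.pi * F / a) / (math.pi * a)

def bandwidth_edge(a, fraction):
    """Frequency F such that [-F, F] holds the given fraction of the total power 1/(2a)."""
    return a * math.tan(fraction * math.pi / 2) / (2 * math.pi)

a = 5e3                      # decay constant, taken as 5 kHz as in the problem
F = bandwidth_edge(a, 0.9)
print(round(F), round(2 * F))
```

Changing `fraction` shows how quickly the required bandwidth grows: the Lorentzian-shaped PSD has long tails, so capturing 99% of the power needs a much wider band than 90%.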

Figure 2: Energy of X(t) SXX (F )

8.12 Ergodicity
What is ergodicity? Ergodicity is a concept from statistics and probability theory, particularly
in the context of stochastic processes and dynamical systems. In simple terms, ergodicity
connects the behavior of a system over time with its behavior over an ensemble (a collection of
all possible states).
Ergodicity tells us that observing a single system for a long time is equivalent to observing
many identical systems at one point in time. It is crucial in areas like:
• Signal processing
• Statistical mechanics
• Time series analysis
Define ergodicity? A process X(t) is ergodic if, for a given function f, the following holds:
$\lim_{T\to\infty} \frac{1}{T}\int_{0}^{T} f[X(t)]\, dt = E[f(X)]$
That is, the average of f[X(t)] over time is equal to the expected value over all possible states.

Example : A random process is given by $X(t) = \cos(t + \phi)$, where $\phi$ is a random variable
uniformly distributed in $(0, 2\pi)$. Show that X(t) is (i) stationary in the wide sense, (ii) ergodic (based on
the first order).
Solution: A random process is said to be stationary in the wide sense if its
mean is constant over time and its autocorrelation function depends only on the time difference
$\tau = t_1 - t_2$.
$E[X(t)] = E[\cos(t + \phi)]$
Since $\phi$ is uniformly distributed on $(0, 2\pi)$:
$E[X(t)] = \frac{1}{2\pi}\int_{0}^{2\pi} \cos(t + \phi)\, d\phi = 0$
This shows that the mean E[X(t)] is constant (zero) for all t.
Next verify the auto-correlation function $R_{XX}(t_1, t_2) = E[X(t_1)X(t_2)] = E[X(t)X(t + \tau)]$:
$E[X(t)X(t + \tau)] = E[\cos(t + \phi)\cos(t + \tau + \phi)]$
Using $\cos A \cos B = \frac{1}{2}[\cos(A + B) + \cos(A - B)]$,
$R_{XX}(t, t + \tau) = \frac{1}{2}\int_{0}^{2\pi} \cos(2t + \tau + 2\phi)\, \frac{1}{2\pi}\, d\phi + \frac{1}{2}\cos\tau$
The integral vanishes because it covers two full periods of the cosine.
∴ $R_{XX}(t, t + \tau) = \frac{1}{2}\cos\tau$
∴ The process is wide-sense stationary.
The time average is $\bar{X}_T = \frac{1}{2T}\int_{-T}^{T} X(t)\, dt = \frac{1}{2T}\int_{-T}^{T} \cos(t + \phi)\, dt$
$\bar{X}_T = \frac{1}{2T}[\sin(T + \phi) - \sin(-T + \phi)]$
$\lim_{T\to\infty} \bar{X}_T = 0$, since the sine terms lie between -1 and +1 while $1/(2T) \to 0$. Since the time average $\bar{X}_T$
is equal to the ensemble average (both zero), the process is mean ergodic.
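
The time average of a single realization can also be computed numerically. The following Python sketch is an illustration under stated assumptions: one phase is drawn with an arbitrary seed, the integral is approximated by the midpoint rule, and the closed form $\sin(T)\cos(\phi)/T$ follows from expanding the two sine terms above.

```python
import math
import random

def time_average(phi, T, steps=20000):
    """Midpoint-rule estimate of (1/2T) * integral of cos(t + phi) over [-T, T]."""
    dt = 2 * T / steps
    total = sum(math.cos(-T + (k + 0.5) * dt + phi) for k in range(steps))
    return total * dt / (2 * T)

def closed_form(phi, T):
    """Exact time average: sin(T) * cos(phi) / T."""
    return math.sin(T) * math.cos(phi) / T

phi = random.Random(0).uniform(0, 2 * math.pi)   # one realization of the phase
for T in (10, 100, 1000):
    print(T, time_average(phi, T), closed_form(phi, T))
```

For any fixed phase the magnitude of the time average is at most 1/T, so it shrinks toward the ensemble mean of zero as the observation window grows.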

8.13 Auto-Correlation and Cross-Correlation Functions


Auto-Correlation and Cross-Correlation Functions. Auto-correlation and cross-correlation
functions are both important tools in signal processing and statistics that measure relationships
between random processes. The auto-correlation function measures the correlation of a signal
with itself over different time lags. It quantifies how the values of a single signal at different
times are related.
RX (τ ) = E[X(t)X(t + τ )]
It is used to identify periodicities and patterns within the same process, detect the presence of
repeating patterns (such as noise or signals), and determine the time series’ structure.
The cross-correlation function evaluates the correlation between two different signals (or pro-
cesses). It measures how one signal relates to another signal over different time lags.
RXY (τ ) = E[X(t)Y (t + τ )]
The following are the applications:

• Auto-Correlation: Used in time series analysis for identifying trends, seasonality, and
cyclic patterns. Commonly applied in fields such as economics, meteorology, and
engineering to analyze the stability and predictability of processes.

• Cross-Correlation: Commonly used in signal processing to detect time delays and
relationships between signals (e.g., in communication systems, physics, and audio processing).
Used in image processing to detect similar features across different images.
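
The time-delay application can be illustrated with sample estimates. The Python sketch below is a toy setup, not a method taken from these notes: it generates white Gaussian samples with an arbitrary seed, delays them by a hypothetical 3 samples, and picks the lag at which the estimated cross-correlation $R_{XY}$ peaks.

```python
import random

def cross_correlation(x, y, lag):
    """Sample estimate of R_XY(lag) = E[X(t) Y(t + lag)] for lag >= 0."""
    n = len(x) - lag
    return sum(x[i] * y[i + lag] for i in range(n)) / n

rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
true_delay = 3
y = [0.0] * true_delay + x[:-true_delay]     # y[i] = x[i - 3]

estimates = {lag: cross_correlation(x, y, lag) for lag in range(10)}
best_lag = max(estimates, key=estimates.get)
print(best_lag)
```

Calling `cross_correlation(x, x, lag)` gives the corresponding auto-correlation estimate, which for white noise is large only at lag 0.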

8.14 Impulse Response in LTI Systems


Impulse Response in LTI Systems : When a random process is input into a Linear Time-Invariant
(LTI) system, the response of the system can be characterized using the impulse
response of the system along with the concept of convolution. The impulse response $h(t)$
of an LTI system is the output when an impulse signal $\delta(t)$ is applied as an input. For any
input random process $X(t)$, the output $Y(t)$ can be obtained by convolving $X(t)$ with
the impulse response $h(t)$:
$Y(t) = (X * h)(t) = \int_{-\infty}^{\infty} X(\tau) h(t - \tau)\, d\tau$
Once the impulse response is known, you can determine how the system alters the
characteristics of a random process. This is particularly useful in understanding the system's
effect on statistical properties like mean, variance, and correlation structure.
The sifting property of the impulse function picks out the value of a random process at a
single instant: $\int_{-\infty}^{\infty} X(t)\delta(t - t_0)\, dt = X(t_0)$, so $E\left[\int_{-\infty}^{\infty} X(t)\delta(t - t_0)\, dt\right] = E[X(t_0)]$. This property
shows that you can isolate and analyze the random variable at the specific time $t_0$, which
is critical in statistical analysis.
In the spectral analysis of random processes, impulses can be thought of as frequency
components. The power spectral density (PSD) of the process often involves Fourier transforms
that use the properties of impulse signals. For example, the Fourier transform of the impulse
is a constant across all frequencies, indicating that an impulse contributes energy equally at
all frequencies. White noise can be modeled as a series of modulated impulse signals, where the
impulse positions represent the random occurrences of signal changes. This representation
allows for efficient analysis and simulation of random processes, as systems driven by white
noise can be estimated and characterized.
In communication systems, recognizing how systems respond to impulse signals helps design filters
and equalizers that appropriately shape the random noise components in signals, optimizing
the reception and interpretation of data.
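
A discrete-time analogue makes the convolution relation concrete. The Python sketch below is an illustration with a hypothetical three-tap impulse response and an arbitrary seed; it uses the standard result that, for a stationary input, the output mean equals the input mean multiplied by the sum of the impulse response taps.

```python
import random

def convolve(x, h):
    """Discrete convolution y[n] = sum_k h[k] * x[n - k] (full length)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

h = [0.5, 0.3, 0.2]                 # hypothetical impulse response (taps sum to 1)
impulse_out = convolve([1.0], h)    # an impulse at the input reproduces h

rng = random.Random(7)
x = [2.0 + rng.gauss(0.0, 1.0) for _ in range(5000)]   # noisy input with mean 2
y = convolve(x, h)[len(h) - 1 : len(x)]                 # drop partially overlapped edge samples
mean_y = sum(y) / len(y)
print(impulse_out, round(mean_y, 2))
```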

8.15 Applications in Noise Analysis


Question: State how noise analysis is carried out in Amplitude Modulation.
Answer : In Amplitude Modulation (AM) systems, noise plays a critical role in determining the quality and
reliability of the received signal. The analysis of noise in AM can be approached mathematically
by modeling both the random message signal and the noise that affects the transmitted
signal.
In AM, the transmitted signal $s(t)$ can be represented as: $s(t) = (A + m(t))\cos(2\pi f_c t)$, where A is the
carrier amplitude, $f_c$ is the carrier frequency, and $m(t)$ is the message signal, which is
typically a random process representing the information being sent.

Assuming $m(t)$ is a random process with certain statistical properties, it can be characterized
by its mean $E[m(t)]$ and autocorrelation function $R_m(\tau) = E[m(t)m(t + \tau)]$. For
stationary random processes, this function depends only on the time difference $\tau$.
When transmitting the AM signal, it is subjected to various types of noise, often modeled as
additive white Gaussian noise (AWGN). The received signal $r(t)$ can be described as:
$r(t) = s(t) + n(t) = (A + m(t))\cos(2\pi f_c t) + n(t)$, where $n(t)$ is the noise process, modeled as
a stationary random process with zero mean, characterized by its power spectral density
$S_n(f)$.
The mean of the received signal can be calculated as: $E[r(t)] = E[(A + m(t))\cos(2\pi f_c t)] + E[n(t)]$.
Since $n(t)$ is a zero-mean process: $E[r(t)] = (A + E[m(t)])\cos(2\pi f_c t)$. If $m(t)$ is
also zero-mean, i.e. $E[m(t)] = 0$: $E[r(t)] = A\cos(2\pi f_c t)$.
The autocorrelation function of the received signal $r(t)$ can be expressed as: $R_r(t_1, t_2) = E[r(t_1)r(t_2)]$. Expanding this, we have:
$R_r(t_1, t_2) = E\{[(A + m(t_1))\cos(2\pi f_c t_1) + n(t_1)] \cdot [(A + m(t_2))\cos(2\pi f_c t_2) + n(t_2)]\}$
Assuming $m(t)$ and $n(t)$ are independent and zero-mean, the cross terms vanish and the autocorrelation function
separates into parts relating to $m(t)$ and $n(t)$:
$R_r(t_1, t_2) = [A^2 + R_m(t_1, t_2)]\cos(2\pi f_c t_1)\cos(2\pi f_c t_2) + R_n(t_1, t_2)$
When the received signal $r(t)$ is demodulated, the presence of noise affects the accuracy of
the signal recovery:
The mean square error (MSE) in demodulation can be evaluated, providing key insights
into how noise affects the received message signal. The performance of the demodulator can be
understood by analyzing the probability of error, which depends on the signal-to-noise ratio (SNR).

8.16 Gaussian and Poisson random processes


Gaussian and Poisson random processes are two essential types of random processes used in
statistics, probability theory, and various applications, including telecommunications, signal
processing, and queuing theory. Their properties, characteristics, and typical applications are
described below.
Gaussian Random Process : A Gaussian random process, also known as a Gaussian process or
normal process, is characterized by the property that any finite collection of random variables
from the process follows a multivariate normal distribution. This implies that any linear
combination of these variables is also normally distributed.
Mean and Covariance: A Gaussian process is fully characterized by its mean function
$\mu(t) = E[X(t)]$ and covariance function $R_{XX}(t_1, t_2) = E[(X(t_1) - \mu(t_1))(X(t_2) - \mu(t_2))]$.
Normal distribution: At any given time t, the random variable $X(t)$ has a normal
distribution, typically expressed as: $X(t) \sim N(\mu(t), R_{XX}(t, t))$.
Stationarity: A Gaussian process can be stationary (constant mean, and covariance depending
only on the time difference) or non-stationary. When stationary, $R_{XX}(t_1, t_2)$ depends only
on the time difference $\tau = t_1 - t_2$.
NOTE: The Gaussian process has the property that its mean and covariance completely
describe its behavior, making it simpler to model and analyze.
Gaussian random processes are widely used in fields such as:

• Noise modeling in communication systems.
• Time series analysis in statistics.
• Machine learning (e.g., Gaussian processes used for regression).
Poisson Random Process: A Poisson random process is a type of stochastic process
that models the occurrence of events happening randomly over time or space. The key feature
of a Poisson process is that it counts the number of events occurring in a fixed interval of time
or space, assuming:

• Events occur independently.
• The number of events in a given time interval follows a Poisson distribution whose mean is
proportional to the length of the interval.
Events Counted: If $N(t)$ is the number of events in time t, then: $P[N(t) = n] = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$, n = 0, 1, 2, ...,
where $\lambda$ is the rate (average number of events per time unit).
Inter-arrival Times: The time between consecutive events follows an exponential distribution.
If T is the inter-arrival time, then: $P[T \le t] = 1 - e^{-\lambda t}$ for $t \ge 0$.
Stationarity: The process is stationary in the sense that the probability of observing a given
number of events in any interval only depends on the length of the interval, not on its position.
Independent Increments: The numbers of events occurring in disjoint time intervals are
independent.
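
The link between exponential inter-arrival times and Poisson counts can be checked by simulation. The Python sketch below is illustrative only: the rate λ = 3, the window t = 2 and the seed are hypothetical values chosen here, and arrivals are generated by summing exponential gaps.

```python
import math
import random

def poisson_pmf(n, lam, t):
    """P(N(t) = n) = (lam*t)^n * exp(-lam*t) / n!"""
    return (lam * t) ** n * math.exp(-lam * t) / math.factorial(n)

def count_events(lam, t, rng):
    """Count arrivals in [0, t] by summing exponential inter-arrival times."""
    clock, count = rng.expovariate(lam), 0
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)
    return count

lam, t = 3.0, 2.0                       # hypothetical rate and window length
rng = random.Random(123)
counts = [count_events(lam, t, rng) for _ in range(20000)]
empirical_mean = sum(counts) / len(counts)
empirical_p6 = counts.count(6) / len(counts)
print(round(empirical_mean, 2), round(empirical_p6, 3), round(poisson_pmf(6, lam, t), 3))
```

The empirical mean should sit near λt, and the observed frequency of exactly six events near the value given by the Poisson formula.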

Applications:

• Modeling arrival times in queuing theory (e.g., customers arriving at a service counter).
• Telecommunications (e.g., modeling packet arrivals or calls in a network).
• Traffic flow analysis.
NOTE : Gaussian processes are characterized by jointly normal distributions that are fully
described by a mean and a covariance function, making them suitable for noise modeling and
regression analysis. Poisson processes are employed for modeling counts of events that occur
independently in time or space, especially in fields dealing with random occurrences, such as
telecommunications and queuing theory.

8.17 Markov Chains


Markov chains : Markov chains are mathematical models of stochastic processes in which the
future state depends only on the current state, not on the past states.
With their well-defined state transition properties and the ability to find stationary
distributions, Markov chains find applications in numerous fields, influencing both theoretical research
and practical implementations.
A Markov chain is a stochastic process that undergoes transitions from one state to another in
a state space. It is characterized by the Markov property, which states that the future state of
the process depends only on the current state and not on the sequence of events that preceded
it. This property makes Markov chains memoryless.

State Space (S): The set of all possible states that the process can occupy, denoted as
$S = \{s_1, s_2, \ldots, s_n\}$.
Transition Probabilities: The probabilities of moving from one state to
another are represented in a transition matrix P: $P_{ij} = P(X_{n+1} = s_j \mid X_n = s_i)$, where $P_{ij}$ is
the probability of transitioning from state $s_i$ to state $s_j$.
Initial State Distribution: The process starts from an initial state described by a probability
vector $\pi = [\pi_1, \pi_2, \ldots, \pi_n]$, where $\pi_i$ is the probability of starting in state $s_i$.

The transition matrix P contains all transition probabilities. For example, for a system
with states $s_1, s_2, s_3$, the transition matrix P might look like:
$P = \begin{bmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{bmatrix}$
In this matrix the rows indicate the current state and the columns the next state, so each row sums to 1.
State Transition: To describe the system moving from one state to another, the state
probability distribution at the next step can be computed as: $p_{n+1} = p_n \cdot P$, where $p_n$ is the
probability distribution over states at time n.

n-step Transition: The n-step transition probabilities can be calculated by raising the
transition matrix to the power of n: $P^{(n)} = P^n$,
and the probabilities can be computed as: $p_n = p_0 \cdot P^n$,
where $p_0$ is the initial state distribution.
Stationary Distribution: A stationary distribution $\pi$ satisfies $\pi P = \pi$, meaning that
the distribution does not change after applying the transition matrix.
Assumptions in Markov Process:
• Finite number of states.
• States are mutually exclusive.
• States are collectively exhaustive.
• Probability of moving from one state to another is constant over time.
Transition probability : The probability of moving from one state to another, or of remaining in the
same state, is called the transition probability: $P_{ij} = P(\text{next state } S_j \text{ at } t = 1 \mid \text{initial state } S_i \text{ at } t = 0)$,
where i is the initial state and j is the final state.

Example : In a certain market, only two brands of refrigerator, A and B, are sold. Given
that a man last purchased brand A, there is an 80% chance that he will buy the same brand
in the next purchase, while if a man purchased brand B, there is a 90% chance that his next
purchase will be brand B. Using this information,
• Develop the transition probability matrix.
• Interpret the state transition matrix in terms of retention and loss as well as retention
and gain.
• Draw the transition diagram.
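Solution
With rows and columns ordered A, B:
\[
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix} = \begin{bmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \end{bmatrix}
\]
Retention and loss
$P_{11} = 0.8$: 80% retention by A.
$P_{12} = P(\text{B next at } t = 1 \mid \text{A at } t = 0) = 0.2$: 20% loss from A (gain for B).
Retention and gain
$P_{21} = 0.1$: 10% loss from B (gain for A).
$P_{22} = 0.9$: 90% retention by B.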
Transition Diagram
[Transition diagram: two states A and B; self-loops A → A (0.8) and B → B (0.9); arrows A → B (0.2) and B → A (0.1).]

In the transition diagram neither A nor B is an absorbing state, because the process can move
from A to B and back from B to A.
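The brand-switching matrix can be checked mechanically. The sketch below is illustrative rather than the notes' own method: the helper names `is_stochastic` and `absorbing_states` are assumptions. It verifies that each row is a probability distribution and looks for absorbing states, taken here to be states with $P_{ii} = 1$.

```python
STATES = ["A", "B"]
P = [[0.8, 0.2],    # from A: stay with A, switch to B
     [0.1, 0.9]]    # from B: switch to A, stay with B


def is_stochastic(P, tol=1e-9):
    """True if every entry lies in [0, 1] and every row sums to 1."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in P
    )


def absorbing_states(P, states, tol=1e-9):
    """States that are never left once entered, i.e. P[i][i] == 1."""
    return [s for i, s in enumerate(states) if abs(P[i][i] - 1.0) < tol]


valid = is_stochastic(P)
absorbing = absorbing_states(P, STATES)

print("valid transition matrix:", valid)
print("absorbing states:", absorbing)
```

An empty list of absorbing states agrees with the observation that customers move in both directions between A and B.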

Example : A survey by the School of International Studies for Population found that the
mobility of a state's population between village, town and city follows the percentages below
(rows: current location; columns: next location).

P (entries in percent) =

            Village   Town   City
Village        50      30     20
Town           10      70     20
City           10      40     50

• Interpret the state transition matrix in terms of retention and loss as well as retention
and gain.
• Draw the transition diagram.
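Solution
Retention and loss (percentages written as probabilities)
$P_{11} = 0.5$: retention by Village; $P_{12} = 0.3$: loss from Village, gain to Town; $P_{13} = 0.2$: loss from Village, gain to City.
$P_{21} = 0.1$: loss from Town, gain to Village; $P_{22} = 0.7$: retention by Town; $P_{23} = 0.2$: loss from Town, gain to City.
$P_{31} = 0.1$: loss from City, gain to Village; $P_{32} = 0.4$: loss from City, gain to Town; $P_{33} = 0.5$: retention by City.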

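[Transition diagram: states V (Village), T (Town) and C (City); self-loops V (0.5), T (0.7), C (0.5); arrows V → T (0.3), T → V (0.1), V → C (0.2), C → V (0.1), T → C (0.2), C → T (0.4).]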

∴ [Next state] = [Current state] [Transition matrix]
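The retention, loss and gain reading of the village/town/city matrix can be tabulated programmatically. This is a minimal sketch under the assumption that retention is the diagonal entry, losses are the other entries in the same row, and gains are the other entries in the same column. The function names `retention_and_loss` and `gains` are hypothetical.

```python
STATES = ["Village", "Town", "City"]
PERCENT = [[50, 30, 20],
           [10, 70, 20],
           [10, 40, 50]]

# Convert the survey percentages to probabilities.
P = [[x / 100 for x in row] for row in PERCENT]


def retention_and_loss(P, states):
    """For each state: diagonal entry = retention, rest of the row = losses."""
    report = {}
    for i, src in enumerate(states):
        losses = {dst: P[i][j] for j, dst in enumerate(states) if j != i}
        report[src] = {"retention": P[i][i], "losses": losses}
    return report


def gains(P, states, target):
    """Off-diagonal entries of the target's column: inflow from other states."""
    j = states.index(target)
    return {src: P[i][j] for i, src in enumerate(states) if i != j}


report = retention_and_loss(P, STATES)
town_gains = gains(P, STATES, "Town")

for state, info in report.items():
    print(state, "retains", info["retention"], "and loses", info["losses"])
print("Town gains from:", town_gains)
```

Because each row sums to 1, the retention of a state plus its total loss always equals 1.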


Problem : Scenario: A customer has three stores to choose from: Store A, Store B, and
Store C. The transition probabilities governing the customer's choices are as follows:

• The probability that the customer returns to Store A after a visit is $P(A \to A) = 0.8$.
• The probability that the customer returns to Store B after a visit is $P(B \to B) = 0.7$.
• The probability that the customer returns to Store C after a visit is $P(C \to C) = 0.6$.
• Probability of transferring from Store A to Store B: $P(A \to B) = 0.10$
• Probability of transferring from Store A to Store C: $P(A \to C) = 0.10$
• Probability of transferring from Store B to Store C: $P(B \to C) = 0.10$
• Probability of transferring from Store B to Store A: $P(B \to A) = 0.20$
• Probability of transferring from Store C to Store A: $P(C \to A) = 0.10$
• Probability of transferring from Store C to Store B: $P(C \to B) = 0.30$
Initially there are 200 customers in Store A, 120 in Store B and 180 in Store C. Find the number of
customers in each store after one transition.
Solution :

∴ [Next state] = [Current state] [Transition matrix]

Of the 500 customers in total, initially 200 (200/500 = 40%) are in Store A, 120 (120/500 = 24%)
in Store B and 180 (180/500 = 36%) in Store C.
\[
[A_1 \;\; B_1 \;\; C_1] = [0.40 \;\; 0.24 \;\; 0.36] \begin{bmatrix} 0.8 & 0.1 & 0.1 \\ 0.2 & 0.7 & 0.1 \\ 0.1 & 0.3 & 0.6 \end{bmatrix}
\]
For example, $A_1 = 0.40(0.8) + 0.24(0.2) + 0.36(0.1) = 0.404$.
∴ $A_1 = 0.404$, $B_1 = 0.316$, $C_1 = 0.280$
New state: customers in Store A = 202, customers in Store B = 158 and customers in Store C =
140.
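The one-step update for the three stores can be reproduced in a few lines. This is a sketch of the rule [Next state] = [Current state] [Transition matrix] in plain Python. The variable names are illustrative assumptions, and the customer counts are rounded to whole numbers at the end.

```python
# Rows: current store (A, B, C); columns: next store (A, B, C).
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]

counts = [200, 120, 180]                 # initial customers in A, B, C
total = sum(counts)                      # 500 customers overall
p0 = [c / total for c in counts]         # initial shares: 0.40, 0.24, 0.36

# Row vector times matrix: p1[j] = sum over i of p0[i] * P[i][j].
p1 = [sum(p0[i] * P[i][j] for i in range(3)) for j in range(3)]
customers = [round(total * share) for share in p1]

print("shares after one step:", [round(s, 3) for s in p1])
print("customers after one step:", customers)
```

Applying the same update again to `p1` would give the distribution after two transitions.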

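[Transition diagram: states A, B and C; self-loops A (0.8), B (0.7), C (0.6); arrows A → B (0.1), B → A (0.2), A → C (0.1), C → A (0.1), B → C (0.1), C → B (0.3).]
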
9 Queueing Theory
Queues form in many everyday situations: machines waiting to be repaired, patients waiting
in a doctor's room, customers waiting at counters. A queue is formed if
the service required by the customer (machine, patient, car, etc.) is not immediately available,
that is, if the current demand for a particular service exceeds the capacity to provide the service.
Queues may be reduced in size or prevented from forming by providing additional service
facilities, but this raises costs and so lowers the profit. On the other hand, excessively long queues may
result in lost sales and lost customers. Hence the problem of interest is how to achieve a balance
between the cost associated with long waiting and the cost associated with the prevention of
waiting, in order to maximize the profit. As
queueing theory provides an answer to this problem, it has become a topic of interest. Before
we consider the solutions of queueing problems, we shall consider the general framework of a
queueing system [1].

Although there are many types of queueing systems, all of them can be classified and de-
scribed according to the following characteristics.

• The input (or arrival) pattern: The input (arrival) pattern in queueing systems describes
how entities (such as customers, data packets, or jobs) arrive at the system over time. The
most common pattern assumed in queueing theory is the Poisson arrival process, where
arrivals occur randomly but at a constant average rate, and the number of arrivals in any interval
follows a Poisson distribution. This is often used because of its mathematical simplicity
and because many natural processes approximate this randomness, such as customers
arriving at a bank or calls coming into a call center.
• The service mechanism: The service mechanism in a queueing system describes how long it
takes to serve each customer or entity. When service times follow an exponential distribution,
they are memoryless and randomly distributed, with a constant average rate
of service. Specifically, the exponential distribution is characterized by the probability
density function
\[ f(t) = \mu e^{-\mu t}, \quad t \ge 0 \]
where $t$ is the service time and $\mu$ is the service rate, representing the average number of
customers served per unit time.
Memorylessness means that the remaining service time does not depend on how
long the customer has already been in service: $P(T > s + t \mid T > s) = P(T > t)$. For example,
if a machine has a 5-minute average service time and a job has already been in service for
5 minutes, the probability that it needs more than one further minute is the same as for a job
that has just started. The exponential service pattern simplifies
analysis in queueing models like M/M/1 and M/M/c systems, where both inter-arrival
and service times are assumed to be exponentially distributed.
In an exponential distribution, the mean (or expected value) of the service or inter-arrival
times is given by
\[ \text{Mean} = \frac{1}{\mu} \]
Here $\mu$ is the rate at which events occur (for example, the number of
customers served per unit time). The mean is obtained by integrating $t$
times the pdf over all $t \ge 0$; integrating by parts gives
\[ E[T] = \int_0^\infty t \, \mu e^{-\mu t} \, dt = \left[ -t e^{-\mu t} \right]_0^\infty + \int_0^\infty e^{-\mu t} \, dt = \frac{1}{\mu} \]
If $\mu$ is the rate of service (e.g., 2 customers per minute), then the average service time is
the reciprocal:
\[ \text{Average service time} = \frac{1}{\mu} = \frac{1}{2} \text{ minute} \]
So a higher $\mu$ means faster service and a lower average
service time; a lower $\mu$ means slower service and a higher average service time.

• The queue discipline: Queue discipline refers to the rules or policies that determine how
entities (customers, data packets, jobs, etc.) are selected for service in a queueing system.
It defines the order in which entities are served, affecting the system's fairness,
efficiency, and waiting times.
The most common queue discipline is First-Come, First-Served (FCFS), where entities
are served in the order they arrive, ensuring fairness. Other types include:
Priority Queueing: Entities are served based on priority levels; higher-priority entities are
served first regardless of arrival time.
Round Robin: Each entity gets a fixed time slot in a cyclic order, often used in computer
systems to share CPU time.
Last-Come, First-Served (LCFS): The most recent arrival is served next, as in stack-like
(last-in, first-out) processing.
Service in batches: Multiple entities are served together as a group rather than individually.
The choice of queue discipline depends on system goals: whether fairness, minimizing wait
times, or prioritizing urgent tasks is most important. It influences overall system performance
and user satisfaction.
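The exponential service-time model described above can be checked numerically. The sketch below assumes a service rate of $\mu = 2$ customers per minute, as in the example in the text. The function names are illustrative, not from the notes. It tests the memoryless property $P(T > s + t \mid T > s) = P(T > t)$ using the survival function $P(T > t) = e^{-\mu t}$, and estimates the mean $1/\mu$ with a simple midpoint-rule integration.

```python
import math


def survival(t, mu):
    """P(T > t) for an exponential service time with rate mu."""
    return math.exp(-mu * t)


def conditional_survival(s, t, mu):
    """P(T > s + t | T > s): chance of lasting t longer after s has elapsed."""
    return survival(s + t, mu) / survival(s, mu)


def mean_by_integration(mu, upper=50.0, steps=100_000):
    """Midpoint-rule estimate of E[T] = integral of t * mu * exp(-mu t) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t * mu * math.exp(-mu * t)
    return total * h


mu = 2.0  # assumed service rate: 2 customers per minute

fresh = survival(1.0, mu)                      # new job lasts more than 1 minute
after_wait = conditional_survival(3.0, 1.0, mu)  # same, given 3 minutes already served
mean_estimate = mean_by_integration(mu)

print("P(T > 1)            =", fresh)
print("P(T > 3 + 1 | T > 3) =", after_wait)
print("estimated mean       =", mean_estimate, "(1/mu =", 1 / mu, ")")
```

The two probabilities agree up to floating-point error, and the estimated mean is close to $1/\mu = 0.5$ minute.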
Standard formulas for the queueing parameters of an M/M/1 system, where $\lambda$ is the arrival
rate, $\mu$ is the service rate and $\lambda < \mu$:

• Average number of customers in the system: $L_s = \dfrac{\lambda}{\mu - \lambda}$

• Average number of customers in the queue: $L_q = \dfrac{\lambda^2}{\mu(\mu - \lambda)}$

• Average waiting time of a customer in the queue: $W_q = \dfrac{\lambda}{\mu(\mu - \lambda)}$

• Average time a customer spends in the system: $W_s = \dfrac{1}{\mu - \lambda}$

• Probability that the number of customers in the system is at least $n$: $P(N \ge n) = \rho^n$,
where $\rho = \lambda/\mu$ and $N$ is the number of customers in the system.
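The formulas above can be collected into one small helper. This is a minimal sketch assuming a stable M/M/1 queue ($0 < \lambda < \mu$). The names `MM1Metrics`, `mm1_metrics` and `prob_at_least`, and the rates $\lambda = 3$, $\mu = 5$, are hypothetical choices for illustration. The final lines check three relations that follow from the formulas: $L_s = \lambda W_s$, $L_q = \lambda W_q$ and $L_s = L_q + \rho$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MM1Metrics:
    rho: float   # traffic intensity lambda / mu
    L_s: float   # value of lambda / (mu - lambda)
    L_q: float   # value of lambda^2 / (mu (mu - lambda))
    W_s: float   # value of 1 / (mu - lambda)
    W_q: float   # value of lambda / (mu (mu - lambda))


def mm1_metrics(lam, mu):
    """Evaluate the standard M/M/1 steady-state formulas."""
    if not 0 < lam < mu:
        raise ValueError("steady state requires 0 < lam < mu")
    return MM1Metrics(
        rho=lam / mu,
        L_s=lam / (mu - lam),
        L_q=lam ** 2 / (mu * (mu - lam)),
        W_s=1 / (mu - lam),
        W_q=lam / (mu * (mu - lam)),
    )


def prob_at_least(n, lam, mu):
    """P(N >= n) = rho^n for the number N of customers present."""
    return (lam / mu) ** n


lam, mu = 3.0, 5.0               # hypothetical rates per unit time
m = mm1_metrics(lam, mu)

print(m)
print("P(N >= 2) =", prob_at_least(2, lam, mu))
print("L_s == lam * W_s:", abs(m.L_s - lam * m.W_s) < 1e-12)
print("L_q == lam * W_q:", abs(m.L_q - lam * m.W_q) < 1e-12)
print("L_s == L_q + rho:", abs(m.L_s - (m.L_q + m.rho)) < 1e-12)
```

Rejecting $\lambda \ge \mu$ up front reflects that the formulas are only meaningful when the server can keep up with arrivals.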

Single-server queueing system.
[Diagram: customer arrivals → queue → Service 1]

Multiple-server (in parallel) queueing system.
[Diagram: customer arrivals → queue → Service 1, Service 2 and Service 3 in parallel]

Multiple-server (in series) queueing system.
[Diagram: customer arrivals → queue → Service 1 → Queue 2 → Service 2]

Problem : Only one railway reservation concession form counter is available in the institute.
Students arrive at the counter according to a Poisson input process with a mean rate of 30
per hour. The time required to serve a student has an exponential distribution with mean 90
seconds. Find the
• Average number of students in the system, $L_s$
• Average queue length, $L_q$
• Average waiting time in the queue, $W_q$
• Average time spent by a student in the system, $W_s$

Given data:

Arrival rate, $\lambda = 30$ students per hour
Service time: exponential with mean 90 seconds
Since the arrival rate is per hour, convert the service time to hours:
\[ 90 \text{ seconds} = \frac{90}{3600} = 0.025 \text{ hours} \]
Calculate the service rate $\mu$:
\[ \mu = \frac{1}{\text{average service time}} = \frac{1}{0.025} = 40 \text{ students/hour} \]
Average number of students in the system:
\[ L_s = \frac{\lambda}{\mu - \lambda} = \frac{30}{40 - 30} = 3 \]
Queue length = average number of students in the queue:
\[ L_q = \frac{\lambda^2}{\mu(\mu - \lambda)} = \frac{30 \times 30}{40(40 - 30)} = \frac{9}{4} \]
Waiting time in the queue:
\[ W_q = \frac{\lambda}{\mu(\mu - \lambda)} = \frac{30}{40(40 - 30)} = \frac{3}{40} \text{ hours} = \frac{3}{40} \times 60 = 4.5 \text{ min} \]
Time spent by a student in the system:
\[ W_s = \frac{1}{\mu - \lambda} = \frac{1}{40 - 30} = \frac{1}{10} \text{ hours} = 6 \text{ min} \]
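The arithmetic of this problem, including the seconds-to-hours conversion, can be verified with exact fractions. This is a sketch using Python's standard `fractions` module. The variable names are illustrative, and the formulas are the M/M/1 expressions quoted earlier in this section.

```python
from fractions import Fraction

lam = Fraction(30)                       # arrival rate, students per hour
mean_service_h = Fraction(90, 3600)      # 90 seconds expressed in hours (1/40)
mu = 1 / mean_service_h                  # service rate, students per hour

L_s = lam / (mu - lam)                   # lambda / (mu - lambda)
L_q = lam ** 2 / (mu * (mu - lam))       # lambda^2 / (mu (mu - lambda))
W_q_min = lam / (mu * (mu - lam)) * 60   # waiting time in queue, in minutes
W_s_min = 1 / (mu - lam) * 60            # time in system, in minutes

print("mu  =", mu, "students/hour")
print("L_s =", L_s)
print("L_q =", L_q)
print("W_q =", float(W_q_min), "min")
print("W_s =", float(W_s_min), "min")
```

Keeping both rates in the same time unit (per hour here) before substituting is the step most easily overlooked; the conversion to minutes is applied only to the final waiting times.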

References
[1] T. Veerarajan, "Probability, Statistics and Random Processes", 2nd edition, Tata
McGraw-Hill, 2003.

[2] Peter Zörnig, "Probability Theory and Statistics with Real World Applications", Deutsche
Nationalbibliothek, 2024.

[3] Marcello Pagano, Kimberlee Gauvreau, "Principles of Biostatistics", 2nd edition, Duxbury
Thomson Learning, 2000.

[4] Scott Miller, Donald Childers, "Probability and Random Processes", 2nd edition, Elsevier,
2012.

[5] Murray R. Spiegel, "Schaum's Outline of Theory and Problems of Statistics", 2nd edition,
McGraw Hill, 1992.

[6] Ronald E. Walpole, Raymond H. Myers, Sharon L. Myers, Keying E. Ye, "Probability and
Statistics for Engineers and Scientists", 9th edition, Pearson, 2022.

[7] Henry Stark, John W. Woods, "Probability and Random Processes with Applications to
Signal Processing", 3rd edition, Pearson, 2012.

[8] Robert Hogg, Joseph W. McKean, Allen T. Craig, "Introduction to Mathematical Statistics",
8th edition, Pearson, 2024.

[9] Feller, "An Introduction to Probability Theory and Its Applications", 2nd edition, Wiley,
1970.

[10] Sheldon M. Ross, "Probability Models", 6th edition, Harcourt Asia PTE LTD, 2000.

[11] Richard A. Johnson, "Miller & Freund's Probability and Statistics for Engineers", 6th
edition, Pearson, 2001.

[12] Alberto Leon-Garcia, "Probability and Random Processes for Electrical Engineering", 2nd
edition, Pearson, 2009.

[13] J. Susan Milton, Jesse C. Arnold, "Introduction to Probability and Statistics", 4th edition,
McGraw-Hill, 2014.
