
Simulating Multivariate Distributions

Md Aktar Ul Karim
Statistical Simulation Course
Symbiosis Statistical Institute, Pune

Contents

1 Introduction

2 Theory of Multivariate Distributions
2.1 Key Aspects of Multivariate Distributions
2.2 Joint Distribution
2.3 Marginal Distribution
2.4 Conditional Distribution
2.5 Covariance
2.6 Correlation

3 Multivariate Normal Distribution

4 Other Multivariate Distributions
4.1 Multivariate t-Distribution
4.2 Multivariate Gamma Distribution
4.3 Multivariate Exponential Distribution

5 Useful Points to Remember

6 Simulations of Multivariate distribution
6.1 Multivariate Normal
6.2 Multivariate t-distribution
6.2.1 t-Distribution vs. Normal Distribution
6.2.2 Role of Degrees of Freedom
6.2.3 Multivariate t-distribution Simulation

7 Exercises

1. Introduction

Multivariate distributions extend univariate distributions to the case where we analyze several random variables simultaneously. They model the relationships between the variables and are commonly used in fields such as statistics, finance, and biological modeling.
Examples of multivariate distributions:

• Multivariate Normal Distribution

• Multivariate t-Distribution

• Multivariate Gamma Distribution

• Multivariate Exponential Distribution

In this lecture, we will focus on:

• Understanding the joint distribution of multiple variables

• Covariance and correlation structures

• Simulation of multivariate normal, t, gamma, and other distributions

• Practical examples in R

• Problems and solutions to solidify concepts

2. Theory of Multivariate Distributions:

A multivariate distribution extends the concept of a univariate distribution (distribution of a single random
variable) to multiple random variables. It describes the probability structure for two or more random variables
taken together.
For example, if X1 , X2 , . . . , Xn are random variables, their joint distribution specifies the probabilities of
different outcomes for these variables considered as a whole.

2.1. Key Aspects of Multivariate Distributions:

• Joint Distribution: Describes how random variables interact with each other.

• Marginal Distribution: The distribution of a subset of the variables.

• Conditional Distribution: The distribution of some variables given specific values of others.

• Covariance and Correlation: Measures of how variables are related.

• Multivariate Normal Distribution: One of the most important multivariate distributions.

2.2. Joint Distribution

The joint distribution of random variables describes the probability structure for a set of variables X1 , X2 , ..., Xn .
For discrete random variables, it is represented by a joint probability mass function (PMF):

P (X1 = x1 , X2 = x2 , ..., Xn = xn ).

For continuous random variables, it is represented by a joint probability density function (PDF):

f (x1 , x2 , ..., xn ).

The joint PDF satisfies:

$$P(X_1 \in A_1, X_2 \in A_2, \ldots, X_n \in A_n) = \int_{A_1} \cdots \int_{A_n} f(x_1, x_2, \ldots, x_n)\, dx_1 \cdots dx_n$$

For a set of random variables X ∈ Rn , the joint distribution defines the probabilities of different outcomes for
all variables considered together.
Example:

1. Consider two random variables $X_1$ and $X_2$, each taking values in the set {1, 2, 3}, and suppose that all nine pairs $(x_1, x_2)$ are equally likely. We want to derive their joint probability mass function (PMF).
To compute the joint probability for a specific pair of values, say $X_1 = 2$ and $X_2 = 3$:

$$P(X_1 = 2, X_2 = 3) = \frac{1}{9}$$

The same value is obtained for every other pair, since all combinations of $X_1$ and $X_2$ have equal probability.

So the joint PMF can be written as:

$$P(X_1 = x_1, X_2 = x_2) = \begin{cases} \frac{1}{9}, & \text{if } (x_1, x_2) \in \{1, 2, 3\} \times \{1, 2, 3\} \\ 0, & \text{otherwise} \end{cases}$$

This joint PMF gives the probability that any pair of values of $X_1$ and $X_2$ occurs simultaneously. As all pairs have equal probability, $(X_1, X_2)$ is uniformly distributed over the nine pairs.

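2. Let $X$ and $Y$ be continuous random variables with the joint probability density function (PDF):

$$f_{X,Y}(x, y) = \begin{cases} 2, & \text{if } 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0, & \text{otherwise} \end{cases}$$

• Verify that $f_{X,Y}(x, y)$ is a valid joint PDF.

• Find $P(X \le \frac{1}{2}, Y \le \frac{1}{2})$.

Sol: A function $f_{X,Y}(x, y)$ is a valid joint PDF if:

• $f_{X,Y}(x, y) \ge 0$ for all $x, y$.

• The integral over all possible values equals 1, i.e.,

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy\, dx = 1$$

Non-negativity holds by inspection, since $f_{X,Y}$ only takes the values 0 and 2.
Now we compute the integral over the support, where $y$ runs from 0 to $1 - x$:

$$\int_0^1 \int_0^{1-x} 2\, dy\, dx = \int_0^1 2(1 - x)\, dx = 2\left[x - \frac{x^2}{2}\right]_0^1 = 2\left(1 - \frac{1}{2} - 0\right) = 2 \cdot \frac{1}{2} = 1$$

Therefore, $f_{X,Y}(x, y)$ is a valid joint PDF.

Now we find $P(X \le \frac{1}{2}, Y \le \frac{1}{2})$. The square $[0, \frac{1}{2}] \times [0, \frac{1}{2}]$ lies entirely inside the support (there $x + y \le 1$), so the density equals 2 throughout:

$$P\left(X \le \tfrac{1}{2}, Y \le \tfrac{1}{2}\right) = \int_0^{1/2} \int_0^{1/2} f_{X,Y}(x, y)\, dy\, dx = \int_0^{1/2} \int_0^{1/2} 2\, dy\, dx = 2 \cdot \frac{1}{2} \cdot \frac{1}{2} = 2 \cdot \frac{1}{4} = \frac{1}{2}$$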
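
Both results for the triangular-support density can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original solution: it draws points uniformly on the unit square and averages the density, so each average approximates the corresponding double integral. The variable names, the seed, and the sample size are illustrative assumptions.

```r
# Monte Carlo check of the density f(x, y) = 2 on {x >= 0, y >= 0, x + y <= 1}.
# Assumption: n and the seed are arbitrary illustrative choices.
set.seed(1)
n <- 200000

# Uniform points on the unit square [0, 1] x [0, 1] (area 1)
x <- runif(n)
y <- runif(n)

# Density evaluated at each sampled point
f <- ifelse(x + y <= 1, 2, 0)

# The average of f approximates the integral of f over the unit square
total_mass <- mean(f)

# The average of f restricted to the corner square approximates
# P(X <= 1/2, Y <= 1/2)
p_corner <- mean(f * (x <= 0.5 & y <= 0.5))

cat("Integral of the density:", total_mass, "\n")
cat("P(X <= 1/2, Y <= 1/2):", p_corner, "\n")
```

The two printed values should be close to 1 and 0.5 respectively, up to Monte Carlo error of order $1/\sqrt{n}$.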

2.3. Marginal Distribution

The marginal distribution of a random variable gives the probability distribution of that variable on its own, irrespective of the other variables in a multivariate setting. It is obtained by summing or integrating over the values of the other random variables in the joint distribution.
For two continuous random variables $X_1$ and $X_2$, the marginal distribution of $X_1$ is obtained by integrating out $X_2$ from the joint probability density function (PDF) and is given by:

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2$$

Similarly, the marginal distribution of $X_2$ is:

$$f_{X_2}(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_1$$

For two discrete random variables $X_1$ and $X_2$, the marginal probability mass function (PMF) of $X_1$ is obtained by summing over all possible values of $X_2$:

$$P(X_1 = x_1) = \sum_{x_2} P(X_1 = x_1, X_2 = x_2)$$

Similarly, the marginal PMF of $X_2$ is:

$$P(X_2 = x_2) = \sum_{x_1} P(X_1 = x_1, X_2 = x_2)$$

Marginal distributions describe the probability of each variable on its own, ignoring the presence of other
variables.
Example:

1. Continuous Case: Let us consider the joint PDF of two random variables $X_1$ and $X_2$ as follows:

$$f(x_1, x_2) = 4 x_1 x_2, \quad 0 \le x_1 \le 1,\ 0 \le x_2 \le 1$$

The joint PDF is valid, as it is non-negative and its integral over the entire domain is equal to 1:

$$\int_0^1 \int_0^1 4 x_1 x_2\, dx_1\, dx_2 = 4 \cdot \frac{1}{2} \cdot \frac{1}{2} = 1$$

To find the marginal PDF of $X_1$, we have to integrate out $X_2$:

$$f_{X_1}(x_1) = \int_0^1 4 x_1 x_2\, dx_2 = 4 x_1 \int_0^1 x_2\, dx_2 = 4 x_1 \left[\frac{x_2^2}{2}\right]_0^1 = 4 x_1 \times \frac{1}{2} = 2 x_1$$

Thus, the marginal PDF of $X_1$ is:

$$f_{X_1}(x_1) = 2 x_1, \quad 0 \le x_1 \le 1$$

Similarly, we can find the marginal PDF of $X_2$:

$$f_{X_2}(x_2) = \int_0^1 4 x_1 x_2\, dx_1 = 4 x_2 \int_0^1 x_1\, dx_1 = 4 x_2 \left[\frac{x_1^2}{2}\right]_0^1 = 4 x_2 \times \frac{1}{2} = 2 x_2$$

Hence, the marginal PDF of $X_2$ is:

$$f_{X_2}(x_2) = 2 x_2, \quad 0 \le x_2 \le 1$$

2. Discrete Case: Suppose the joint PMF $P(X_1 = x_1, X_2 = x_2)$ of two discrete random variables $X_1$ and $X_2$ is given by the following table (rows index $X_2$, columns index $X_1$):

X2 \ X1     1      2      3
1          0.1    0.05   0.05
2          0.2    0.1    0.1
3          0.05   0.1    0.25

To derive the marginal PMF of $X_1$, we sum over all possible values of $X_2$:

$$P(X_1 = 1) = P(X_1 = 1, X_2 = 1) + P(X_1 = 1, X_2 = 2) + P(X_1 = 1, X_2 = 3) = 0.1 + 0.2 + 0.05 = 0.35$$

$$P(X_1 = 2) = P(X_1 = 2, X_2 = 1) + P(X_1 = 2, X_2 = 2) + P(X_1 = 2, X_2 = 3) = 0.05 + 0.1 + 0.1 = 0.25$$

$$P(X_1 = 3) = P(X_1 = 3, X_2 = 1) + P(X_1 = 3, X_2 = 2) + P(X_1 = 3, X_2 = 3) = 0.05 + 0.1 + 0.25 = 0.4$$

Then, the marginal PMF of $X_1$ is given by

$$P(X_1 = 1) = 0.35, \quad P(X_1 = 2) = 0.25, \quad P(X_1 = 3) = 0.4$$

Similarly, we can derive the marginal PMF of $X_2$ (left for the readers).

So, we have to remember

• For continuous cases, the marginal distribution is derived by integrating the joint PDF over the other
variable.

• For discrete cases, the marginal distribution is found by summing the joint PMF over the other variable.
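
For the discrete case, the summation rule maps directly onto row and column sums of the joint PMF table. The following is a small sketch using the table from the discrete example; the object names (`joint`, `marginal_X1`, `marginal_X2`) are illustrative choices, not part of the original notes.

```r
# Joint PMF table: rows index X2 = 1, 2, 3; columns index X1 = 1, 2, 3
joint <- matrix(c(0.10, 0.05, 0.05,
                  0.20, 0.10, 0.10,
                  0.05, 0.10, 0.25),
                nrow = 3, byrow = TRUE,
                dimnames = list(X2 = c("1", "2", "3"), X1 = c("1", "2", "3")))

# A valid joint PMF sums to 1
stopifnot(isTRUE(all.equal(sum(joint), 1)))

# Marginal PMF of X1: sum over X2, i.e. down each column
marginal_X1 <- colSums(joint)

# Marginal PMF of X2: sum over X1, i.e. along each row
marginal_X2 <- rowSums(joint)

print(marginal_X1)
print(marginal_X2)
```

The column sums reproduce the marginal PMF of $X_1$ derived by hand, and the row sums give the marginal PMF of $X_2$.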

2.4. Conditional Distribution

The conditional distribution describes the probability of one random variable given the specific values of
other random variables. This allows us to understand how one variable behaves when the value of another is
fixed.
For continuous random variables, the conditional probability density function (PDF) of $X_1$ given $X_2 = x_2$ is expressed as:

$$f(X_1 = x_1 \mid X_2 = x_2) = \frac{f(X_1 = x_1, X_2 = x_2)}{f(X_2 = x_2)},$$

where:

• $f(X_1 = x_1 \mid X_2 = x_2)$ is the conditional PDF of $X_1$ given $X_2 = x_2$.

• $f(X_1 = x_1, X_2 = x_2)$ is the joint PDF of $X_1$ and $X_2$.

• $f(X_2 = x_2)$ is the marginal PDF of $X_2$, which must be positive at $x_2$.

For discrete random variables, the conditional probability mass function (PMF) of $X_1$ given $X_2 = x_2$ is expressed as:

$$P(X_1 = x_1 \mid X_2 = x_2) = \frac{P(X_1 = x_1, X_2 = x_2)}{P(X_2 = x_2)},$$

where:

• $P(X_1 = x_1 \mid X_2 = x_2)$ is the conditional probability that $X_1 = x_1$ given $X_2 = x_2$.

• $P(X_1 = x_1, X_2 = x_2)$ is the joint PMF of $X_1$ and $X_2$.

• $P(X_2 = x_2)$ is the marginal PMF of $X_2$, which must be positive at $x_2$.

Example:

1. Continuous Case: Suppose $X_1$ and $X_2$ are continuous random variables with the joint PDF:

$$f(x_1, x_2) = 4 x_1 x_2, \quad 0 \le x_1 \le 1,\ 0 \le x_2 \le 1$$

To derive the conditional distribution of $X_1$ given $X_2 = 0.5$, we first have to derive the marginal PDF of $X_2$:

$$f_{X_2}(x_2) = \int_0^1 4 x_1 x_2\, dx_1 = 4 x_2 \int_0^1 x_1\, dx_1 = 4 x_2 \times \frac{1}{2} = 2 x_2$$

Hence, $f_{X_2}(x_2) = 2 x_2$, for $0 \le x_2 \le 1$.

Now we compute the conditional PDF $f(X_1 \mid X_2 = 0.5)$ by using the conditional distribution formula:

$$f(X_1 = x_1 \mid X_2 = 0.5) = \frac{f(X_1 = x_1, X_2 = 0.5)}{f_{X_2}(0.5)}$$

The joint PDF evaluated at $x_2 = 0.5$ is

$$f(X_1 = x_1, X_2 = 0.5) = 4 x_1 \times 0.5 = 2 x_1$$

The marginal PDF evaluated at $x_2 = 0.5$ is

$$f_{X_2}(0.5) = 2 \times 0.5 = 1$$

Therefore, the conditional PDF is:

$$f(X_1 = x_1 \mid X_2 = 0.5) = \frac{2 x_1}{1} = 2 x_1, \quad 0 \le x_1 \le 1$$

Hence, the conditional distribution of $X_1$ given $X_2 = 0.5$ is $f(X_1 \mid X_2 = 0.5) = 2 x_1$, which is a valid PDF over [0, 1] because $\int_0^1 2 x_1\, dx_1 = 1$.

2. Discrete Case: Now we take the same discrete joint probability distribution table for $X_1$ and $X_2$ as in the marginal case (rows index $X_2$, columns index $X_1$):

X2 \ X1     1      2      3
1          0.1    0.05   0.05
2          0.2    0.1    0.1
3          0.05   0.1    0.25

The marginal PMF of $X_2$ can be written as:

$$P(X_2 = 1) = 0.1 + 0.05 + 0.05 = 0.2$$

$$P(X_2 = 2) = 0.2 + 0.1 + 0.1 = 0.4$$

$$P(X_2 = 3) = 0.05 + 0.1 + 0.25 = 0.4$$

Now, we use the conditional probability formula to derive the conditional PMF:

$$P(X_1 = x_1 \mid X_2 = 2) = \frac{P(X_1 = x_1, X_2 = 2)}{P(X_2 = 2)}$$

For $X_2 = 2$, the joint probabilities $P(X_1 = x_1, X_2 = 2)$ are as follows:

$$P(X_1 = 1, X_2 = 2) = 0.2$$

$$P(X_1 = 2, X_2 = 2) = 0.1$$

$$P(X_1 = 3, X_2 = 2) = 0.1$$

The marginal probability is $P(X_2 = 2) = 0.4$. Now, the conditional probabilities are given by:

$$P(X_1 = 1 \mid X_2 = 2) = \frac{0.2}{0.4} = 0.5$$

$$P(X_1 = 2 \mid X_2 = 2) = \frac{0.1}{0.4} = 0.25$$

$$P(X_1 = 3 \mid X_2 = 2) = \frac{0.1}{0.4} = 0.25$$

Therefore, the conditional PMF of $X_1$ given $X_2 = 2$ is:

$$P(X_1 = 1 \mid X_2 = 2) = 0.5, \quad P(X_1 = 2 \mid X_2 = 2) = 0.25, \quad P(X_1 = 3 \mid X_2 = 2) = 0.25$$
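
The discrete conditional PMF amounts to taking one row of the joint table and dividing it by its sum. A short sketch follows, with the table entered as in the example; the object names are illustrative assumptions.

```r
# Joint PMF table: rows index X2 = 1, 2, 3; columns index X1 = 1, 2, 3
joint <- matrix(c(0.10, 0.05, 0.05,
                  0.20, 0.10, 0.10,
                  0.05, 0.10, 0.25),
                nrow = 3, byrow = TRUE,
                dimnames = list(X2 = c("1", "2", "3"), X1 = c("1", "2", "3")))

# P(X2 = 2): the sum of the row X2 = 2
p_x2_is_2 <- sum(joint["2", ])

# Conditional PMF of X1 given X2 = 2: the row divided by its sum
cond_X1_given_X2_2 <- joint["2", ] / p_x2_is_2
print(cond_X1_given_X2_2)

# All conditional PMFs of X1 given X2 at once: divide every row by its row sum
cond_all <- prop.table(joint, margin = 1)
print(cond_all)
```

Each row of `cond_all` is a PMF in its own right, so every row sums to 1.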

2.5. Covariance

The covariance between two random variables $X_1$ and $X_2$ measures how much the variables change together. It is defined as:

$$\operatorname{Cov}(X_1, X_2) = E\left[(X_1 - E[X_1])(X_2 - E[X_2])\right]$$

• If Cov (X1 , X2 ) > 0, the variables tend to increase together.

• If Cov (X1 , X2 ) < 0, one variable tends to increase when the other decreases.

• If Cov (X1 , X2 ) = 0, the variables are uncorrelated.

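The covariance matrix for a set of random variables $X_1, X_2, \ldots, X_n$ is:

$$\Sigma = \begin{pmatrix} \operatorname{Cov}(X_1, X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_n) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Cov}(X_2, X_2) & \cdots & \operatorname{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_n, X_1) & \operatorname{Cov}(X_n, X_2) & \cdots & \operatorname{Cov}(X_n, X_n) \end{pmatrix}$$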

This matrix describes the joint variability between all variables.

2.6. Correlation

The correlation between two variables is a normalized measure of covariance that gives the strength and
direction of a linear relationship. It is defined as:

$$\operatorname{Corr}(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)\operatorname{Var}(X_2)}}$$

The correlation lies between -1 and 1:

• Corr (X1 , X2 ) = 1 means a perfect positive linear relationship.

• Corr (X1 , X2 ) = −1 means a perfect negative linear relationship.

• Corr (X1 , X2 ) = 0 means no linear relationship.

The correlation matrix is derived from the covariance matrix by normalizing each element.
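
Normalizing each element means dividing $\Sigma_{ij}$ by $\sqrt{\Sigma_{ii} \Sigma_{jj}}$. The sketch below does this by hand and compares the result with base R's `cov2cor`; the covariance matrix used here is an arbitrary illustrative choice, not taken from the examples in this section.

```r
# Illustrative covariance matrix (assumed values)
Sigma <- matrix(c(2.0, 0.6,
                  0.6, 1.5), nrow = 2, byrow = TRUE)

# Standard deviations are the square roots of the diagonal entries
sd_vec <- sqrt(diag(Sigma))

# Divide entry (i, j) by sd_i * sd_j
R_manual <- Sigma / outer(sd_vec, sd_vec)

# Base R helper that performs the same normalization
R_builtin <- cov2cor(Sigma)

print(R_manual)
print(R_builtin)
```

The diagonal of the correlation matrix is always 1, and the off-diagonal entry here is $0.6 / \sqrt{2 \times 1.5}$.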
Example:

1. Consider a discrete joint probability distribution for $X_1$ and $X_2$ given by the table (rows index $X_2$, columns index $X_1$):

X2 \ X1     1      2      3
1          0.1    0.05   0.05
2          0.2    0.1    0.1
3          0.05   0.1    0.25

Derive the covariance and correlation between them.

Sol: First, we have to compute E[X1 ] and E[X2 ], and for that we need to compute the marginal PMF
of X1 and X2 . Marginal distribution of X1 :

P (X1 = 1) = P (X1 = 1, X2 = 1) + P (X1 = 1, X2 = 2) + P (X1 = 1, X2 = 3)

P (X1 = 1) = 0.1 + 0.2 + 0.05 = 0.35

P (X1 = 2) = P (X1 = 2, X2 = 1) + P (X1 = 2, X2 = 2) + P (X1 = 2, X2 = 3)

P (X1 = 2) = 0.05 + 0.1 + 0.1 = 0.25

P (X1 = 3) = P (X1 = 3, X2 = 1) + P (X1 = 3, X2 = 2) + P (X1 = 3, X2 = 3)

P (X1 = 3) = 0.05 + 0.1 + 0.25 = 0.4

So the marginal distribution of X1 is:

P (X1 = 1) = 0.35, P (X1 = 2) = 0.25, P (X1 = 3) = 0.4

Marginal distribution of X2 :

P (X2 = 1) = P (X1 = 1, X2 = 1) + P (X1 = 2, X2 = 1) + P (X1 = 3, X2 = 1)

P (X2 = 1) = 0.1 + 0.05 + 0.05 = 0.2

P (X2 = 2) = P (X1 = 1, X2 = 2) + P (X1 = 2, X2 = 2) + P (X1 = 3, X2 = 2)

P (X2 = 2) = 0.2 + 0.1 + 0.1 = 0.4

P (X2 = 3) = P (X1 = 1, X2 = 3) + P (X1 = 2, X2 = 3) + P (X1 = 3, X2 = 3)

P (X2 = 3) = 0.05 + 0.1 + 0.25 = 0.4

So the marginal distribution of X2 is:

P (X2 = 1) = 0.2, P (X2 = 2) = 0.4, P (X2 = 3) = 0.4

Now we derive the expectations of $X_1$ and $X_2$:

$$E[X_1] = \sum_{x_1} x_1 P(X_1 = x_1) = 1 \times 0.35 + 2 \times 0.25 + 3 \times 0.4 = 0.35 + 0.5 + 1.2 = 2.05$$

$$E[X_2] = \sum_{x_2} x_2 P(X_2 = x_2) = 1 \times 0.2 + 2 \times 0.4 + 3 \times 0.4 = 0.2 + 0.8 + 1.2 = 2.2$$

Now, we compute $E[X_1 X_2]$:

$$E[X_1 X_2] = \sum_{x_1} \sum_{x_2} x_1 x_2 P(X_1 = x_1, X_2 = x_2)$$

Using the joint probability table given above:

$$E[X_1 X_2] = 1 \times 1 \times 0.1 + 1 \times 2 \times 0.2 + 1 \times 3 \times 0.05 + 2 \times 1 \times 0.05 + 2 \times 2 \times 0.1 + 2 \times 3 \times 0.1 + 3 \times 1 \times 0.05 + 3 \times 2 \times 0.1 + 3 \times 3 \times 0.25$$

$$E[X_1 X_2] = 0.1 + 0.4 + 0.15 + 0.1 + 0.4 + 0.6 + 0.15 + 0.6 + 2.25 = 4.75$$

Then, by using the formula for covariance, we can write

$$\operatorname{Cov}(X_1, X_2) = E[X_1 X_2] - E[X_1] E[X_2]$$

$$\operatorname{Cov}(X_1, X_2) = 4.75 - (2.05 \times 2.2) = 4.75 - 4.51 = 0.24$$

So, the covariance between $X_1$ and $X_2$ is 0.24.

To compute the correlation, we first have to calculate the variances:

$$\operatorname{Var}(X_i) = E[X_i^2] - (E[X_i])^2$$

For $X_1$, we have already computed $E[X_1] = 2.05$. Now we need to compute $E[X_1^2]$:

$$E[X_1^2] = \sum_{x_1} x_1^2 P(X_1 = x_1)$$

Using the marginal probabilities from the table:

$$E[X_1^2] = 1^2 \times 0.35 + 2^2 \times 0.25 + 3^2 \times 0.4 = 0.35 + 4 \times 0.25 + 9 \times 0.4 = 0.35 + 1 + 3.6 = 4.95$$

So:

$$\operatorname{Var}(X_1) = E[X_1^2] - (E[X_1])^2 = 4.95 - (2.05)^2 = 4.95 - 4.2025 = 0.7475$$

For $X_2$, we have to compute $E[X_2^2]$:

$$E[X_2^2] = \sum_{x_2} x_2^2 P(X_2 = x_2)$$

Using the marginal probabilities:

$$E[X_2^2] = 1^2 \times 0.2 + 2^2 \times 0.4 + 3^2 \times 0.4 = 0.2 + 4 \times 0.4 + 9 \times 0.4 = 0.2 + 1.6 + 3.6 = 5.4$$

So:

$$\operatorname{Var}(X_2) = E[X_2^2] - (E[X_2])^2 = 5.4 - (2.2)^2 = 5.4 - 4.84 = 0.56$$

Now that we have the covariance and the variances, we can compute the correlation:

$$\rho(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)} \cdot \sqrt{\operatorname{Var}(X_2)}}$$

$$\rho(X_1, X_2) = \frac{0.24}{\sqrt{0.7475} \cdot \sqrt{0.56}}$$

Finally, the correlation is:

$$\rho(X_1, X_2) = \frac{0.24}{0.6470} \approx 0.3709$$

The correlation between $X_1$ and $X_2$ is approximately 0.37. This indicates a moderate positive linear relationship between $X_1$ and $X_2$. The closer the correlation is to 1, the stronger the positive relationship, while values closer to 0 indicate weaker relationships.

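2. Continuous Joint Distribution:

Suppose we have two continuous random variables $X_1$ and $X_2$ with the following joint PDF

$$f(x_1, x_2) = \begin{cases} 4(1 - x_1)(1 - x_2), & 0 \le x_1 \le 1,\ 0 \le x_2 \le 1 \\ 0, & \text{otherwise} \end{cases}$$

This is a joint PDF over the range [0, 1] for both $X_1$ and $X_2$. The constant 4 makes it integrate to 1, since $\int_0^1 (1 - x)\, dx = \frac{1}{2}$ in each variable.

We integrate the joint distribution over the other variable to compute the marginal distribution.

Marginal PDF of $X_1$:

$$f_{X_1}(x_1) = \int_0^1 f(x_1, x_2)\, dx_2 = \int_0^1 4(1 - x_1)(1 - x_2)\, dx_2 = 4(1 - x_1) \int_0^1 (1 - x_2)\, dx_2$$

The integral is $\int_0^1 (1 - x_2)\, dx_2 = \left[x_2 - \frac{x_2^2}{2}\right]_0^1 = 1 - \frac{1}{2} = \frac{1}{2}$.

So, the marginal PDF of $X_1$ is given by

$$f_{X_1}(x_1) = 2(1 - x_1), \quad 0 \le x_1 \le 1$$

Marginal PDF of $X_2$:

Similarly, we can compute the marginal PDF of $X_2$:

$$f_{X_2}(x_2) = \int_0^1 f(x_1, x_2)\, dx_1 = \int_0^1 4(1 - x_1)(1 - x_2)\, dx_1 = 4(1 - x_2) \int_0^1 (1 - x_1)\, dx_1$$

The integral is $\int_0^1 (1 - x_1)\, dx_1 = 1 - \frac{1}{2} = \frac{1}{2}$.

So, the marginal PDF of $X_2$ is

$$f_{X_2}(x_2) = 2(1 - x_2), \quad 0 \le x_2 \le 1$$

We now need to calculate the expectations $E[X_1]$, $E[X_2]$, $E[X_1^2]$, $E[X_2^2]$, and $E[X_1 X_2]$.

$$E[X_1] = \int_0^1 x_1 f_{X_1}(x_1)\, dx_1 = 2 \int_0^1 (x_1 - x_1^2)\, dx_1 = 2\left[\frac{x_1^2}{2} - \frac{x_1^3}{3}\right]_0^1 = 2\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{3}$$

$$E[X_2] = \int_0^1 x_2 f_{X_2}(x_2)\, dx_2 = 2 \int_0^1 (x_2 - x_2^2)\, dx_2 = 2\left[\frac{x_2^2}{2} - \frac{x_2^3}{3}\right]_0^1 = 2\left(\frac{1}{2} - \frac{1}{3}\right) = \frac{1}{3}$$

$$E[X_1^2] = \int_0^1 x_1^2 f_{X_1}(x_1)\, dx_1 = 2 \int_0^1 (x_1^2 - x_1^3)\, dx_1 = 2\left[\frac{x_1^3}{3} - \frac{x_1^4}{4}\right]_0^1 = 2\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{1}{6}$$

$$E[X_2^2] = \int_0^1 x_2^2 f_{X_2}(x_2)\, dx_2 = 2 \int_0^1 (x_2^2 - x_2^3)\, dx_2 = 2\left[\frac{x_2^3}{3} - \frac{x_2^4}{4}\right]_0^1 = 2\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{1}{6}$$

$$E[X_1 X_2] = \int_0^1 \int_0^1 x_1 x_2 f(x_1, x_2)\, dx_1\, dx_2 = \int_0^1 \int_0^1 x_1 x_2 \cdot 4(1 - x_1)(1 - x_2)\, dx_1\, dx_2$$

Separate the terms:

$$E[X_1 X_2] = 4 \left(\int_0^1 x_1 (1 - x_1)\, dx_1\right) \left(\int_0^1 x_2 (1 - x_2)\, dx_2\right)$$

Each of these integrals equals $\frac{1}{2} - \frac{1}{3} = \frac{1}{6}$, so:

$$E[X_1 X_2] = 4 \times \frac{1}{6} \times \frac{1}{6} = \frac{1}{9}$$

Now we will calculate the covariance:

$$\operatorname{Cov}(X_1, X_2) = E[X_1 X_2] - E[X_1] E[X_2] = \frac{1}{9} - \frac{1}{3} \times \frac{1}{3} = \frac{1}{9} - \frac{1}{9} = 0$$

Now, the correlation is given by:

$$\rho(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)} \sqrt{\operatorname{Var}(X_2)}}$$

So, we need to find the variances.

$$\operatorname{Var}(X_1) = E[X_1^2] - (E[X_1])^2 = \frac{1}{6} - \left(\frac{1}{3}\right)^2 = \frac{1}{6} - \frac{1}{9} = \frac{1}{18}$$

$$\operatorname{Var}(X_2) = E[X_2^2] - (E[X_2])^2 = \frac{1}{6} - \frac{1}{9} = \frac{1}{18}$$

So the correlation is given by:

$$\rho(X_1, X_2) = \frac{0}{\sqrt{\frac{1}{18}} \sqrt{\frac{1}{18}}} = 0$$

This is the expected result: the joint PDF factorizes as $f(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)$, so $X_1$ and $X_2$ are independent and therefore uncorrelated.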
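
The discrete example (item 1 of this section) can be reproduced in a few lines of R, which is a convenient way to check the hand calculation. This is a sketch under the assumption that the table is stored with rows indexing $X_2$ and columns indexing $X_1$; all object names are illustrative.

```r
# Joint PMF table: rows index X2 = 1, 2, 3; columns index X1 = 1, 2, 3
joint <- matrix(c(0.10, 0.05, 0.05,
                  0.20, 0.10, 0.10,
                  0.05, 0.10, 0.25),
                nrow = 3, byrow = TRUE)
x1_vals <- 1:3
x2_vals <- 1:3

# Marginal PMFs
p_x1 <- colSums(joint)
p_x2 <- rowSums(joint)

# First moments
E_x1 <- sum(x1_vals * p_x1)
E_x2 <- sum(x2_vals * p_x2)

# E[X1 X2]: outer(x2_vals, x1_vals) has entry (i, j) = x2_i * x1_j,
# which matches the layout of the joint table
E_x1x2 <- sum(outer(x2_vals, x1_vals) * joint)

# Covariance, variances, and correlation
cov_x1x2 <- E_x1x2 - E_x1 * E_x2
var_x1 <- sum(x1_vals^2 * p_x1) - E_x1^2
var_x2 <- sum(x2_vals^2 * p_x2) - E_x2^2
corr_x1x2 <- cov_x1x2 / sqrt(var_x1 * var_x2)

cat("Cov(X1, X2):", cov_x1x2, "\n")
cat("Corr(X1, X2):", corr_x1x2, "\n")
```

The printed covariance is 0.24 and the correlation is approximately 0.371, in agreement with the derivation for the discrete table.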

3. Multivariate Normal Distribution

The multivariate normal distribution is one of the most important multivariate distributions. A random vector $X = (X_1, X_2, \ldots, X_n)$ follows a multivariate normal distribution with mean vector $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, denoted by $X \sim N(\mu, \Sigma)$, if its PDF is:

$$f(X) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu)\right)$$

where:

• µ = (E[X1 ], E[X2 ], ..., E[Xn ])T is the mean vector.

• Σ is the covariance matrix, which must be positive definite.

Properties

• Marginal Distributions: The marginal distribution of any subset of the variables is also multivariate
normal.

• Conditional Distributions: The conditional distribution of one subset of variables given another is also
multivariate normal.

• Independence: If Σ is diagonal, the variables are independent.
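
The marginal and conditional properties can be written out explicitly. The following is a sketch in partitioned notation; the block symbols $X_a$, $X_b$, $\mu_a$, $\Sigma_{ab}$ and so on are introduced here for illustration and are not part of the original notes. Partition $X$ into two sub-vectors, with the mean vector and covariance matrix partitioned accordingly:

```latex
X = \begin{pmatrix} X_a \\ X_b \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}

% Marginal distribution of the sub-vector X_a
X_a \sim N(\mu_a, \Sigma_{aa})

% Conditional distribution of X_a given X_b = x_b
X_a \mid X_b = x_b \sim N(\mu_{a|b}, \Sigma_{a|b}), \qquad
\mu_{a|b} = \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (x_b - \mu_b), \qquad
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}
```

In the bivariate case the blocks are scalars, and the conditional mean and variance reduce to $\mu_1 + \frac{\Sigma_{12}}{\Sigma_{22}}(x_2 - \mu_2)$ and $\Sigma_{11} - \frac{\Sigma_{12}^2}{\Sigma_{22}}$. Note that the conditional variance does not depend on the observed value $x_b$.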

Cholesky Decomposition: The Cholesky decomposition of the covariance matrix $\Sigma$ is key to simulating multivariate normal distributions. If $\Sigma = LL^T$, where $L$ is a lower triangular matrix, then we can simulate $X \sim N(\mu, \Sigma)$ by:

$$X = \mu + LZ$$

where $Z \sim N(0, I)$ is a vector of independent standard normal variables. This works because $LZ$ has mean zero and covariance $L I L^T = \Sigma$.
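
The construction $X = \mu + LZ$ can be sketched in base R as follows. This is an illustrative implementation, not the method used by any particular package; the parameter values, seed, and sample size are assumptions chosen for the demonstration. Note that R's `chol()` returns the upper triangular factor, so it has to be transposed to obtain $L$.

```r
set.seed(42)

# Illustrative parameters (assumed values)
mu <- c(1, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
n <- 50000

# chol() returns the upper triangular factor U with t(U) %*% U = Sigma,
# so the lower triangular factor is L = t(U)
L <- t(chol(Sigma))

# Z: n independent standard normal vectors, stored as columns (d x n)
Z <- matrix(rnorm(n * length(mu)), nrow = length(mu))

# X = mu + L Z for every column; mu is recycled down each column.
# Transpose to get the usual n x d sample matrix.
X <- t(mu + L %*% Z)

# Sample mean and covariance should be close to mu and Sigma
print(colMeans(X))
print(cov(X))
```

The sample mean and sample covariance agree with $\mu$ and $\Sigma$ up to simulation error, which shrinks as `n` grows.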

4. Other Multivariate Distributions


4.1. Multivariate t-Distribution

The multivariate t-distribution generalizes the univariate t-distribution to multiple variables. It is useful when the variables are approximately normal but heavier-tailed, for instance when the variance is unknown and has to be estimated from the data. If $X$ follows a multivariate t-distribution with degrees of freedom $\nu$, mean $\mu$, and scale matrix $\Sigma$, we write:

$$X \sim t(\nu, \mu, \Sigma).$$

The PDF has the same elliptical shape as the multivariate normal PDF, but it depends additionally on $\nu$, which controls how slowly the tails decay.

4.2. Multivariate Gamma Distribution

The multivariate gamma distribution extends the gamma distribution to multiple variables. It models positive random variables with a dependency structure and has applications in reliability analysis and survival models.

4.3. Multivariate Exponential Distribution

The multivariate exponential distribution models the joint behavior of time-to-event variables. It is used in survival analysis and reliability engineering, where we are interested in the time until multiple events occur (e.g., component failures).

5. Useful Points to Remember

• Independence
Two discrete random variables $X_1$ and $X_2$ are independent if, for all $x_1$ and $x_2$:

$$P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) P(X_2 = x_2)$$

For continuous variables, the equivalent condition is that their joint PDF factorizes into the product of their marginal PDFs:

$$f(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)$$

For multivariate cases, independence implies that all pairs of random variables are uncorrelated, so their covariance matrix is diagonal. The converse does not hold in general: uncorrelated variables need not be independent, although for the multivariate normal distribution a diagonal covariance matrix does imply independence.

• The covariance matrix describes the linear dependence between all pairs of variables, while the correlation matrix standardizes this dependence.

The diagonal entries of the covariance matrix represent variances, while the off-diagonal entries represent covariances.

Understanding the covariance or correlation structure is crucial for modeling multiple variables' joint behavior and designing simulation algorithms for multivariate distributions.
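
The factorization condition for independence is easy to test on a discrete table: under independence the joint PMF must equal the outer product of its marginals. The sketch below applies this to the joint PMF table used in Sections 2.3 to 2.6; the object names are illustrative assumptions.

```r
# Joint PMF table: rows index X2 = 1, 2, 3; columns index X1 = 1, 2, 3
joint <- matrix(c(0.10, 0.05, 0.05,
                  0.20, 0.10, 0.10,
                  0.05, 0.10, 0.25),
                nrow = 3, byrow = TRUE)

# Marginal PMFs
p_x1 <- colSums(joint)
p_x2 <- rowSums(joint)

# Under independence, entry (i, j) would be P(X2 = i) * P(X1 = j)
product_of_marginals <- outer(p_x2, p_x1)

# Compare the actual joint PMF with the product of the marginals
independent <- isTRUE(all.equal(joint, product_of_marginals,
                                check.attributes = FALSE))

print(product_of_marginals)
cat("Joint PMF factorizes:", independent, "\n")
```

For this table the check fails: for example, $P(X_1 = 1, X_2 = 1) = 0.1$, whereas $P(X_1 = 1) P(X_2 = 1) = 0.35 \times 0.2 = 0.07$. The variables are therefore dependent, which is consistent with their non-zero covariance.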

6. Simulations of Multivariate distribution

6.1. Multivariate Normal:

To simulate samples from a multivariate normal distribution, we can directly use the rmvnorm function from the mvtnorm package.
# Load the package
library(mvtnorm)

# Mean vector and covariance matrix
mu = c(1, 2) # Mean vector
Sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2) # Covariance matrix

# Generate 1000 samples from the multivariate normal distribution
n = 1000
Y = rmvnorm(n, mean = mu, sigma = Sigma)

# Plot the results
plot(Y, col = 'blue', pch = 17, main = "Simulated Multivariate Normal Samples")

Figure 1: Bi-variate Normal Distribution Sample Generation

6.2. Multivariate t-distribution

In multivariate t-distributions, the samples are not drawn purely based on a mean vector and a covariance
matrix.
The multivariate t-distribution introduces an additional parameter, degrees of freedom (df ), which controls
the heaviness of the tails of the distribution.

6.2.1. t-Distribution vs. Normal Distribution


The multivariate normal distribution has thin tails, meaning extreme values (far from the mean) are relatively rare.
The multivariate t-distribution, however, has heavier tails, meaning it produces more extreme values (outliers) with higher probability than the normal distribution. This is useful in situations where the data is prone to outliers or fat tails.

6.2.2. Role of Degrees of Freedom


• The degrees of freedom (df ) parameter controls how heavy the tails are.

• When the degrees of freedom are very large, the multivariate t-distribution approaches a multivariate
normal distribution.

• For small degrees of freedom (e.g., less than 10), the tails become much heavier, increasing the likelihood
of generating extreme values.

• This parameter also affects the overall spread: for $\nu > 2$, the covariance matrix of the multivariate t-distribution is $\frac{\nu}{\nu - 2} \Sigma$, which is larger than the scale matrix $\Sigma$ and shrinks towards it as $\nu$ grows.
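
The effect of the degrees of freedom on the tails can be illustrated in one dimension with the univariate t-distribution, using the base R functions `pt` and `pnorm`. This is only a one-dimensional sketch of the behaviour described in the bullets above; the cut-off of 3 and the list of degrees of freedom are arbitrary illustrative choices.

```r
# Degrees of freedom to compare (assumed illustrative values)
df_values <- c(2, 5, 10, 30, 1000)

# Two-sided tail probability P(|T| > 3) for each number of degrees of freedom
t_tail <- 2 * pt(-3, df = df_values)

# The same tail probability for the standard normal distribution
normal_tail <- 2 * pnorm(-3)

# Tabulate the tail probabilities and their ratio to the normal one
print(data.frame(df = df_values,
                 t_tail = t_tail,
                 ratio_to_normal = t_tail / normal_tail))
```

The tail probability decreases as the degrees of freedom increase and approaches the normal value for very large degrees of freedom, while for small degrees of freedom it is many times larger.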

6.2.3. Multivariate t-distribution Simulation:


In simulating the multivariate t-distribution, zero-mean samples from a multivariate normal distribution are divided by a random factor that depends on the degrees of freedom, which gives the distribution its heavier tails.
If we draw a sample $Z$ from a multivariate normal distribution $N(0, \Sigma)$, and $W$ is an independent chi-squared distributed random variable with $\nu$ degrees of freedom, the multivariate t-distribution sample is generated as:

$$t = \mu + \frac{Z}{\sqrt{W/\nu}},$$

where:

• $\mu$ is the mean (location) vector.

• $\Sigma$ is the scale matrix of the t-distribution, i.e. the covariance matrix of $Z$.

• $Z$ is a sample from the multivariate normal distribution $N(0, \Sigma)$.

• $W \sim \chi^2(\nu)$ is a chi-squared distributed random variable with $\nu$ degrees of freedom.

The factor $\sqrt{W/\nu}$ rescales each normal draw to account for the degrees of freedom. Note that $Z$ must be centred at zero: dividing a sample from $N(\mu, \Sigma)$ by $\sqrt{W/\nu}$ would also rescale the mean.
Simulations in R
So, first, we generate samples from a multivariate normal distribution and then convert them into samples from a t-distribution.

Figure 2: Tail part comparison of multivariate normal distribution and multivariate t-distribution

# Load the required package
library(mvtnorm)

# Parameters
mu = c(1, 2) # Mean vector
Sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2) # Covariance matrix of the normal samples
n = 10000 # Number of samples
df = 5 # Degrees of freedom for the t-distribution

# Generate multivariate normal samples
X_normal = rmvnorm(n, mean = mu, sigma = Sigma)

# Convert normal samples to t-distributed samples:
# centre at zero, divide each row by sqrt(W / df), then add the mean back
W = rchisq(n, df) # Generate chi-squared distributed variables
Z = sweep(X_normal, 2, mu, FUN = "-") # Zero-mean normal samples
X_t = sweep(sweep(Z, 1, sqrt(W / df), FUN = "/"), 2, mu, FUN = "+")

# Function to flag tail points: rows whose largest absolute coordinate
# exceeds a given cutoff
get_tail_values = function(data, cutoff) {
  apply(abs(data), 1, max) > cutoff
}

# Common cutoff: 97.5% quantile of the largest absolute coordinate
# of the normal samples
cutoff = quantile(apply(abs(X_normal), 1, max), 0.975)

# Tail part for both distributions
tail_normal = get_tail_values(X_normal, cutoff)
tail_t = get_tail_values(X_t, cutoff)

# Compare the proportion of tail points
normal_tail_proportion = sum(tail_normal) / n
t_tail_proportion = sum(tail_t) / n

# Visualization of the tail (two panels side by side)
par(mfrow = c(1, 2))

plot(X_normal, col = ifelse(tail_normal, "red", "black"),
main = "Multivariate Normal (tail in red)", pch = 20)

# Plot multivariate t-distribution tail
plot(X_t, col = ifelse(tail_t, "red", "black"),
main = "Multivariate t (tail in red)", pch = 20)
Alternatively, we can generate samples from the multivariate t-distribution directly with the rmvt function from the mvtnorm package.

Figure 3: Samples from multivariate t-distribution

# Load the required package
library(mvtnorm)

# Set parameters
mu = c(1, 2) # Mean vector
Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2) # Scale matrix
n = 10000 # Number of samples
df = 5 # Degrees of freedom for the t-distribution

# Generate multivariate t-distributed samples
X_t = rmvt(n, sigma = Sigma, df = df, delta = mu)

# Tail analysis: flag rows whose largest absolute coordinate lies above
# the given quantile of that statistic
get_tail_values = function(data, quantile_value = 0.975) {
  row_max = apply(abs(data), 1, max)
  row_max > quantile(row_max, quantile_value)
}

# Tail indicators for t-distribution samples
tail_t = get_tail_values(X_t)

# Proportion of samples in the tails (about 1 - quantile_value by construction)
t_tail_proportion = sum(tail_t) / n

# Visualization
# Plot multivariate t-distribution tail
plot(X_t, col = ifelse(tail_t, "red", "black"),
main = "Multivariate t-distribution (tail in red)", pch = 20)

In the exercise, we will discuss the generation of simulated data from the other multivariate
distributions.

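7. Exercises

1. Given a multivariate normal distribution $X \sim N(\mu, \Sigma)$, simulate the conditional distribution of $X_1$ given $X_2 = 5$, where:

$$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$

Sol.
The conditional distribution of $X_1$ given $X_2 = x_2$ (here $x_2 = 5$) is normal with mean and variance:

$$\mu_{1|2} = \mu_1 + \frac{\Sigma_{12}}{\Sigma_{22}} (x_2 - \mu_2) = 1 + \frac{0.5}{1} (5 - 2) = 2.5$$

$$\Sigma_{1|2} = \Sigma_{11} - \frac{\Sigma_{12}^2}{\Sigma_{22}} = 2 - \frac{0.5^2}{1} = 1.75$$

Now we simulate values and plot the distribution.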

# Conditional mean and variance


mu_cond = 1 + (0.5 / 1) * (5 - 2)
sigma_cond = sqrt(2 - (0.5^2 / 1))

# Simulate conditional distribution


X1_given_X2 = rnorm(1000, mean=mu_cond, sd=sigma_cond)

# Plot
hist(X1_given_X2, main="Conditional Distribution", breaks=30)

Figure 4: Conditional distribution

2. Simulate 1000 samples from a bivariate normal distribution with mean vector

$$\mu = \begin{pmatrix} 4 \\ -2 \end{pmatrix}$$

and covariance matrix

$$\Sigma = \begin{pmatrix} 2 & 0.6 \\ 0.6 & 1.5 \end{pmatrix}.$$

Estimate the sample mean and variance for both variables and verify whether they match the theoretical mean and variance.
Sol.: R code

# Load the MASS package (provides mvrnorm)
library(MASS)

# Given parameters
mean_vector = c(4, -2)
cov_matrix = matrix(c(2, 0.6, 0.6, 1.5), nrow = 2)

# Generate 1000 samples from the bivariate normal distribution
set.seed(123)
samples = mvrnorm(n = 1000, mu = mean_vector, Sigma = cov_matrix)
DF = data.frame(X1 = samples[, 1], X2 = samples[, 2])

# Estimate sample mean and sample variance
sample_mean_X1 = mean(DF$X1)
sample_mean_X2 = mean(DF$X2)

sample_var_X1 = var(DF$X1)
sample_var_X2 = var(DF$X2)

cat("Sample Mean of X1:", sample_mean_X1, "\n")
cat("Sample Mean of X2:", sample_mean_X2, "\n")
cat("Sample Variance of X1:", sample_var_X1, "\n")
cat("Sample Variance of X2:", sample_var_X2, "\n")

# Compare with theoretical values
cat("Theoretical Mean of X1: 4, Variance: 2 \n")
cat("Theoretical Mean of X2: -2, Variance: 1.5 \n")

• Sample Mean of X1: 4.003916

• Sample Mean of X2: -2.050917

• Sample Variance of X1: 1.849226

• Sample Variance of X2: 1.632462

• Theoretical Mean of X1: 4, Variance: 2

• Theoretical Mean of X2: -2, Variance: 1.5

3. Simulate 10000 samples from a bivariate normal distribution with the following parameters: Mean vector

$$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

Covariance matrix

$$\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.$$

(a) Plot the histograms of the marginal distributions of both variables.

(b) Draw a scatter plot to visualize their relationship.

(c) Estimate the covariance and correlation between them.

Sol.: R code

# Packages
library(MASS)

# Given parameter values
mean_vector = c(1, 2)
cov_matrix = matrix(c(1, 0.5, 0.5, 1), nrow = 2)

# Simulate 10000 samples from the bivariate normal distribution
set.seed(123)
samples = mvrnorm(n = 10000, mu = mean_vector, Sigma = cov_matrix)

# Extract the two variables
X1 = samples[, 1]
X2 = samples[, 2]

# Histograms of the marginal distributions and scatter plot,
# arranged in one row of three panels
par(mfrow = c(1, 3))

hist(X1, main = "Histogram of X1", xlab = "X1", col = "lightblue", border = "black", freq = FALSE)
hist(X2, main = "Histogram of X2", xlab = "X2", col = "lightgreen", border = "black", freq = FALSE)

# Scatter plot
plot(X1, X2, main = "Scatter Plot of X1 vs X2", xlab = "X1", ylab = "X2", col = "blue", pch = 19)

# Covariance and correlation
estimated_cov = cov(X1, X2)
estimated_corr = cor(X1, X2)

# Print the covariance and correlation estimates
cat("Estimated Covariance between X1 and X2: ", estimated_cov, "\n")
cat("Estimated Correlation between X1 and X2: ", estimated_corr, "\n")

Figure 5: Histogram plot of the marginal distributions and Scatter plot of the bivariate normal distributions

4. Simulate 500 samples from a trivariate normal distribution with the following parameters: Mean vector

$$\mu = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$$

Covariance matrix

$$\Sigma = \begin{pmatrix} 1 & 0.4 & -0.2 \\ 0.4 & 1 & 0.5 \\ -0.2 & 0.5 & 1 \end{pmatrix}$$

(a) Create pairwise scatter plots of the variables.

(b) Compute the covariance matrix and the correlation matrix.

(c) Visualize the marginal distributions with histograms.

Sol

Figure 6: scatter plots of X1 , X2 , X3 with X1

Figure 7: scatter plots of X1 , X2 , X3 with X2

Figure 8: scatter plots of X1 , X2 , X3 with X3

Results

• Estimated Covariance Matrix:

$$\operatorname{Cov}(X) = \begin{pmatrix} 1.0023525 & 0.3297379 & -0.2248198 \\ 0.3297379 & 0.9043863 & 0.5079871 \\ -0.2248198 & 0.5079871 & 1.0324293 \end{pmatrix}$$

• Estimated Correlation Matrix:

$$\operatorname{Corr}(X) = \begin{pmatrix} 1.0000000 & 0.3463232 & -0.2210009 \\ 0.3463232 & 1.0000000 & 0.5257092 \\ -0.2210009 & 0.5257092 & 1.0000000 \end{pmatrix}$$

Figure 9: Histograms of the marginal distributions of X1 , X2 , X3

# Load the MASS package
library(MASS)

# Parameter values
mean_vector = c(0, 1, -1)
cov_matrix = matrix(c(1, 0.4, -0.2,
0.4, 1, 0.5,
-0.2, 0.5, 1), nrow = 3)

# Simulate 500 samples from the trivariate normal distribution


set.seed(123)
samples = mvrnorm(n = 500, mu = mean_vector, Sigma = cov_matrix)

# Extract the three variables


X1 = samples[, 1]
X2 = samples[, 2]
X3 = samples[, 3]

# Pairwise scatter plots


par(mfrow = c(1, 3))

plot(X1, X1, main = "X1 vs X1", xlab = "X1", ylab = "X1", col = "blue", pch = 19)
plot(X1, X2, main = "X1 vs X2", xlab = "X1", ylab = "X2", col = "red", pch = 19)
plot(X1, X3, main = "X1 vs X3", xlab = "X1", ylab = "X3", col = "green", pch = 19)

plot(X2, X1, main = "X2 vs X1", xlab = "X2", ylab = "X1", col = "red", pch = 19)
plot(X2, X2, main = "X2 vs X2", xlab = "X2", ylab = "X2", col = "blue", pch = 19)
plot(X2, X3, main = "X2 vs X3", xlab = "X2", ylab = "X3", col = "orange", pch = 19)

plot(X3, X1, main = "X3 vs X1", xlab = "X3", ylab = "X1", col = "green", pch = 19)
plot(X3, X2, main = "X3 vs X2", xlab = "X3", ylab = "X2", col = "orange", pch = 19)
plot(X3, X3, main = "X3 vs X3", xlab = "X3", ylab = "X3", col = "blue", pch = 19)

# Plot histograms of the marginal distributions


hist(X1, main = "Histogram of X1", xlab = "X1", col = "lightblue", border = "black", freq = FALSE)
hist(X2, main = "Histogram of X2", xlab = "X2", col = "lightgreen", border = "black", freq = FALSE)
hist(X3, main = "Histogram of X3", xlab = "X3", col = "lightpink", border = "black", freq = FALSE)

# Covariance matrix
sample_cov_matrix = cov(samples)
cat("Estimated Covariance Matrix:\n")
print(sample_cov_matrix)

# Correlation matrix
sample_corr_matrix = cor(samples)
cat("Estimated Correlation Matrix:\n")
print(sample_corr_matrix)
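Before simulating, it can help to confirm that the given Σ is a valid covariance matrix, that is, symmetric and positive definite. The following is a minimal sketch of such a check; the names is_symmetric, eig_vals, is_pos_def and true_corr are illustrative and not part of the solution above.

```r
# Sketch: check that Sigma is a valid covariance matrix (base R only).
Sigma <- matrix(c( 1.0, 0.4, -0.2,
                   0.4, 1.0,  0.5,
                  -0.2, 0.5,  1.0), nrow = 3, byrow = TRUE)

is_symmetric <- isSymmetric(Sigma)

# A symmetric matrix is positive definite when all eigenvalues are positive.
eig_vals <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
is_pos_def <- all(eig_vals > 0)

# Theoretical correlation matrix implied by Sigma; the variances are all 1,
# so it coincides with Sigma itself.
true_corr <- cov2cor(Sigma)

print(is_symmetric)
print(is_pos_def)
print(true_corr)
```

Since the theoretical correlation matrix equals Σ here, the estimated covariance and correlation matrices in the results can both be compared directly with the entries 0.4, -0.2 and 0.5.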

5. Simulate 1000 samples from a bivariate t-distribution with 5 degrees of freedom, mean vector
\[
\mu = \begin{pmatrix} 2 \\ 3 \end{pmatrix},
\]
and covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1.5 \end{pmatrix}.
\]

(a) Plot the histograms of the marginal distributions.

(b) Create a scatter plot of the two variables.

(c) Estimate the sample mean and covariance matrix.

Sol.

Figure 10: Histograms of the marginal distributions of the bivariate t-distribution

Figure 11: Scatter plot of the two variables X1 and X2 following a bivariate t-distribution

Results

• Estimated Sample Mean:
\[
\mathrm{Mean}(X) = \begin{pmatrix} 2.034184 \\ 3.027248 \end{pmatrix}
\]

• Estimated Sample Covariance Matrix:
\[
\mathrm{Cov}(X) = \begin{pmatrix} 1.4756852 & 0.7775488 \\ 0.7775488 & 2.5255454 \end{pmatrix}
\]

# Load the necessary package


library(mvtnorm)

# Parameters
mean_vector = c(2, 3)

cov_matrix = matrix(c(1, 0.6, 0.6, 1.5), nrow = 2)
df = 5 # Degrees of freedom

# Number of samples to simulate


n = 1000

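# Simulate from the multivariate t-distribution directly.
# Note: rmvt() treats `sigma` as the scale matrix, so for df > 2 the
# covariance of the samples is df / (df - 2) * sigma, not sigma itself.
set.seed(123)
samples = rmvt(n = n, sigma = cov_matrix, df = df, delta = mean_vector)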

# Extract the two variables


X1 = samples[, 1]
X2 = samples[, 2]

# Histograms plot of the marginal distributions


par(mfrow = c(1, 2))
hist(X1, main = "Histogram of X1", xlab = "X1", col = "lightblue",
border = "black", freq = FALSE, breaks = 30)
hist(X2, main = "Histogram of X2", xlab = "X2", col = "lightgreen",
border = "black", freq = FALSE, breaks = 30)

# Scatter plot of the two variables


par(mfrow = c(1, 1))
plot(X1, X2, main = "Scatter Plot of X1 vs X2", xlab = "X1", ylab = "X2", col = "blue", pch = 18)

# Sample mean and covariance matrix


sample_mean = colMeans(samples)
sample_cov_matrix = cov(samples)

cat("Estimated Sample Mean:\n")


print(sample_mean)

cat("\nEstimated Sample Covariance Matrix:\n")


print(sample_cov_matrix)
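The estimated covariance matrix is noticeably larger than Σ. This is expected when Σ is used as the scale matrix of the t-distribution: for ν > 2 degrees of freedom the covariance is ν/(ν − 2) · Σ. The sketch below computes that theoretical value and, as an assumption-laden illustration, rebuilds t samples in base R from the usual normal over chi-square construction; the names theoretical_cov, Z, W, X and t_mean are illustrative.

```r
# Sketch: theoretical covariance of a multivariate t with scale matrix Sigma,
# plus a base-R construction of the samples (no mvtnorm needed).
set.seed(123)
n <- 100000                             # large illustrative sample size
df <- 5
mu <- c(2, 3)
Sigma <- matrix(c(1, 0.6, 0.6, 1.5), nrow = 2)

# Covariance implied by scale matrix Sigma when df > 2.
theoretical_cov <- df / (df - 2) * Sigma
print(theoretical_cov)

# X = mu + Z / sqrt(W / df), with Z ~ N(0, Sigma) and W ~ chi-square(df).
Z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma)
W <- rchisq(n, df = df)
X <- sweep(Z / sqrt(W / df), 2, mu, "+")   # row i of Z is divided by element i

t_mean <- colMeans(X)
print(t_mean)
print(cov(X))
```

With ν = 5 the factor is 5/3, giving variances of about 1.667 and 2.5 and a covariance of 1.0, which is the scale against which the estimates in the results should be read. If Σ is meant to be the covariance itself, one option is to pass (ν − 2)/ν · Σ as the scale matrix.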

6. Simulate 500 samples from a trivariate gamma distribution. Let the shape parameters for the gamma distributions be
\[
k_1 = 2, \quad k_2 = 3, \quad k_3 = 4,
\]
and the scale parameters be
\[
\theta_1 = 1, \quad \theta_2 = 1.5, \quad \theta_3 = 2.
\]

(a) Plot the histograms of the marginal distributions.

(b) Create pairwise scatter plots.

(c) Estimate the sample mean and variance for each variable.

Sol.

Figure 12: Histograms of the marginal distributions of multivariate Gamma distributions

Results:

• Sample Mean:
2.093228, 4.470578, 8.312582

• Sample Variance:
2.081435, 6.920623, 16.16606

Figure 13: Pairwise scatter plots of the three variables X1, X2, and X3 following gamma distributions

# Parameters
shape_params = c(2, 3, 4)
scale_params = c(1, 1.5, 2)

# Generate 500 samples from each gamma marginal.
# The three columns are drawn independently of one another.
set.seed(456)
samples = matrix(nrow = 500, ncol = 3)
samples[, 1] = rgamma(500, shape = shape_params[1], scale = scale_params[1])
samples[, 2] = rgamma(500, shape = shape_params[2], scale = scale_params[2])
samples[, 3] = rgamma(500, shape = shape_params[3], scale = scale_params[3])

# Plot histograms for each variable


par(mfrow = c(1, 3))
hist(samples[, 1], main = "Histogram of X1", xlab = "X1", col = "lightblue", border = "black")
hist(samples[, 2], main = "Histogram of X2", xlab = "X2", col = "lightgreen", border = "black")
hist(samples[, 3], main = "Histogram of X3", xlab = "X3", col = "lightcoral", border = "black")

# Pairwise scatter plots


par(mfrow = c(1, 1))
pairs(samples, labels = c("X1", "X2", "X3"), main = "Pairwise Scatter Plots",
pch = 21, bg = "lightblue")

# Estimate sample mean and variance


sample_mean = colMeans(samples)
sample_var = apply(samples, 2, var)

cat("Sample Mean:\n", sample_mean, "\n")


cat("Sample Variance:\n", sample_var, "\n")
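The sample estimates can be checked against the theoretical moments of a gamma distribution with shape k and scale θ, namely mean kθ and variance kθ². A short sketch of that comparison follows; theoretical_mean and theoretical_var are illustrative names.

```r
# Sketch: theoretical mean and variance of Gamma(shape = k, scale = theta).
shape_params <- c(2, 3, 4)
scale_params <- c(1, 1.5, 2)

theoretical_mean <- shape_params * scale_params      # k * theta
theoretical_var <- shape_params * scale_params^2     # k * theta^2

cat("Theoretical Mean:\n", theoretical_mean, "\n")
cat("Theoretical Variance:\n", theoretical_var, "\n")
```

These give means 2, 4.5, 8 and variances 2, 6.75, 16, which the reported sample values approximate. Note that the three variables are simulated independently here, so the pairwise scatter plots should show no systematic association.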

7. Simulate 1000 samples from a bivariate exponential distribution using a copula-based approach. The
marginal distributions are exponential with rate parameters λ1 = 0.5 and λ2 = 1.

(a) Create histograms of the marginals.

(b) Draw a scatter plot of the two variables.

(c) Estimate the sample mean and correlation coefficient.

Sol.

Figure 14: Histograms of the marginal distributions of multivariate Exponential distributions

Results:

• Sample Mean:
2.06757, 0.9578588

• Sample Correlation: 0.646122

Figure 15: Scatter plot of the two variables X1 and X2 with exponential marginals

# Load necessary library


library(copula)

# Parameters
lambda1 = 0.5
lambda2 = 1
rho = 0.7 # Copula correlation

# Generate 1000 samples from a bivariate exponential distribution


set.seed(789)
cop = normalCopula(param = rho, dim = 2)
u = rCopula(1000, cop)
# Inverse-CDF transform: if U ~ Uniform(0, 1), then -log(U) / lambda ~ Exp(lambda)
samples = cbind(-log(u[, 1]) / lambda1, -log(u[, 2]) / lambda2)

# Plot histograms for each variable


par(mfrow = c(1, 2))
hist(samples[, 1], main = "Histogram of X1", xlab = "X1",
breaks = 30, col = "lightblue", border = "black")
hist(samples[, 2], main = "Histogram of X2", xlab = "X2",
breaks = 30, col = "lightgreen", border = "black")

par(mfrow = c(1, 1))


# Scatter plot
plot(samples[, 1], samples[, 2], main = "Scatter Plot of X1 vs X2",
xlab = "X1", ylab = "X2", col = "orange", pch = 18)

# Estimate sample mean and correlation
sample_mean = colMeans(samples)
sample_cor = cor(samples[, 1], samples[, 2])

cat("Sample Mean:\n", sample_mean, "\n")


cat("Sample Correlation:\n", sample_cor, "\n")
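To make the copula step less opaque, the Gaussian copula can also be written out by hand: draw correlated standard normals, map them to uniforms with pnorm, then apply the exponential quantile function. The block below is a sketch of that idea in base R, not the internals of the copula package; the names R, Z, U, X, copula_mean and copula_cor are illustrative, and rho = 0.7 is the value chosen in the solution code rather than given in the problem.

```r
# Sketch: Gaussian copula with exponential marginals, base R only.
set.seed(789)
n <- 10000                              # illustrative sample size
lambda <- c(0.5, 1)                     # exponential rate parameters
rho <- 0.7                              # copula correlation (as in the solution)

# Step 1: correlated standard normals via the Cholesky factor.
R <- matrix(c(1, rho, rho, 1), nrow = 2)
Z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)

# Step 2: probability integral transform to Uniform(0, 1) marginals.
U <- pnorm(Z)

# Step 3: exponential quantile function applied to each column.
X <- cbind(qexp(U[, 1], rate = lambda[1]),
           qexp(U[, 2], rate = lambda[2]))

copula_mean <- colMeans(X)              # should be near 1 / lambda = (2, 1)
copula_cor <- cor(X[, 1], X[, 2])       # Pearson correlation of the outputs
print(copula_mean)
print(copula_cor)
```

The Pearson correlation of the exponential variables is generally somewhat below the copula parameter, because the marginal transformation is nonlinear. This is consistent with the reported sample correlation of about 0.65 against rho = 0.7.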
