Multivariate Distributions
Introduction
Dr. Moutushi Chatterjee
Assistant Professor,
SQC & OR Unit,
Indian Statistical Institute, Bangalore
What is Multivariate Data?
▶ Statistical methods that allow the simultaneous investigation
of more than two variables
▶ Most of the real life problems in statistics involve looking at
more than a single measurement at a time, at relationships
among measurements and comparisons between them
Examples of Multivariate Data
▶ During a survey of households, several measurements on each
household are taken. These measurements, being taken on the
same household, will be dependent. For example, the
education level of the head of the household and the annual
income of the family are related.
▶ During a production process, a number of different
measurements such as the tensile strength, brittleness,
diameter, etc. are taken on the same unit. Collectively such
data are viewed as multivariate data.
▶ On a sample of 100 cars, various measurements such as the
average gas mileage, number of major repairs, noise level, etc.
are taken. Also each car is followed for the first 50,000 miles
and these measurements are taken after every 10,000 miles.
Measurements taken on the same car at the same mileage and
those taken at different mileage are going to be correlated. In
fact, these data represent a very complex multivariate analysis
problem.
Examples Contd....
▶ A new drug is to be compared with a control for its
effectiveness. Two different groups of patients are assigned,
one to each of the two treatments, and they are observed weekly for
the next two months. The periodic measurements on the same
patient will exhibit dependence and thus the basic problem is
multivariate in nature. Additionally, if the measurements on
various possible side-effects of the drugs are also considered,
the subsequent analysis will have to be done under several
carefully chosen models.
Examples Contd....
▶ In a designed experiment conducted in a research and
development center, various factors are set up at desired levels
and a number of response variables are measured for each of
these treatment combinations. The problem is to find a
combination of the levels of these factors where all the
responses are at their ‘optimum’. Since a treatment
combination which optimizes one response variable may not
result in the optimum for the other response variable, one has
a problem of conflicting objectives, especially when the problem
is treated as a collection of several univariate optimization
problems. Due to dependence among responses, it may be
more meaningful to analyze response variables simultaneously.
Examples Contd....
▶ An engineer wishes to set up a control chart to identify the
instances when the production process may have gone out of
control. Since an out of control process may produce an
excessively large number of out of specification items,
detection at an early stage is important. In order to do so, she
may wish to monitor several process characteristics on the
same units. However, since these characteristics are functions
of process parameters (conditions), they are likely to be
correlated leading to a set of multivariate data. Thus many
times, it is appropriate to set up a single (or only a few)
multivariate control chart(s) to detect the occurrence of any
out of control conditions. On the other hand, if several
univariate control charts are separately set up and individually
monitored, one may witness too many false alarms, which is
clearly an undesirable situation.
Examples Contd....
▶ In many situations, it is more economical to collect a large
number of measurements on the same unit but such
measurements are made only on a few units. Such a situation
is quite common in many remote sensing data collection
plans. Obviously, it is practically impossible to collectively
interpret hundreds of univariate analyses to come up with
some definite conclusions. A better approach may be that of
data reduction by using some meaningful approach. One may
eliminate some of the variables which are deemed redundant
in the presence of others. Better yet, one may eliminate some
of the linear combinations of all variables which contain little
or no information and then concentrate only on a few
important ones. Which linear combinations of the variables
should be retained can be decided using certain multivariate
methods such as principal component analysis.
IRIS data
Figure: The Famous IRIS Data
Methods of Multivariate Analysis
Probability Space
▶ Consider the probability space (Ω, A, P) arising out of a
random experiment.
▶ Ω is the sample space corresponding to a random experiment
and the elements of Ω are called the outcomes and are usually
denoted by ω.
▶ A is a σ-field of subsets of Ω. Sets in A are called
events.
▶ P is a function from A to [0, 1] with P(Ω) = 1 and such that
if E1 , E2 , · · · ∈ A are disjoint, then
\[
P\Big[\bigcup_{j=1}^{\infty} E_j\Big] = \sum_{j=1}^{\infty} P[E_j]
\]
Random Vector
Consider the probability space (Ω, A, P) arising out of a random
experiment. A vector of functions X = (X1 , X2 , · · · , Xn ) which maps Ω
into R^n is called a random vector if and only if, for each ai ∈ R
(i = 1, 2, · · · , n),
\[
\{\omega : X_1(\omega) \le a_1, \cdots, X_n(\omega) \le a_n\} \in \mathcal{A},
\]
i.e.,
\[
X^{-1}(I) \in \mathcal{A}, \quad I = \{(x_1, \cdots, x_n) : -\infty < x_i \le a_i,\ i = 1, \cdots, n\},
\]
i.e., if X is an A-measurable function. Here I is an n-dimensional
interval.
Example
Suppose a fair coin is tossed twice. Let us assign the score 1 (2)
for H (T). Let X1 and X2 denote, respectively, the score at the first
toss and the total score over the two tosses.
From this, we can define a random vector X = (X1 , X2 ) as follows:
\[
X^{-1}\{(-\infty, a_1] \times (-\infty, a_2]\} =
\begin{cases}
\phi, & a_1 < 1 \text{ or } a_2 < 2 \\
\{HH\}, & a_1 \ge 1,\ 2 \le a_2 < 3 \\
\{HH, HT\}, & 1 \le a_1 < 2,\ a_2 \ge 3 \\
\{HH, HT, TH\}, & a_1 \ge 2,\ 3 \le a_2 < 4 \\
\Omega = \{HH, HT, TH, TT\}, & a_1 \ge 2,\ a_2 \ge 4
\end{cases}
\]
Thus, X is a random vector which can assume values (1,2), (1,3),
(2,3), (2,4) with the corresponding probabilities....
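The coin-toss example can be checked by brute-force enumeration. The sketch below is only an illustration and not part of the original slides: the helper names `joint_pmf` and `preimage` are hypothetical, and it assumes that "fair coin" means each of the four outcomes HH, HT, TH, TT has probability 1/4.

```python
from fractions import Fraction
from itertools import product

# Score 1 for H and 2 for T, as in the example.
SCORE = {"H": 1, "T": 2}


def joint_pmf():
    """Return {(x1, x2): probability} with X1 = first score, X2 = total score."""
    pmf = {}
    for first, second in product("HT", repeat=2):
        x1 = SCORE[first]
        x2 = SCORE[first] + SCORE[second]
        # Assumption: fair coin, so each of the four outcomes has probability 1/4.
        pmf[(x1, x2)] = pmf.get((x1, x2), Fraction(0)) + Fraction(1, 4)
    return pmf


def preimage(a1, a2):
    """Outcomes omega with X1(omega) <= a1 and X2(omega) <= a2."""
    return {
        first + second
        for first, second in product("HT", repeat=2)
        if SCORE[first] <= a1 and SCORE[first] + SCORE[second] <= a2
    }


pmf = joint_pmf()
for point in sorted(pmf):
    print(point, pmf[point])
```

Calling `preimage(a1, a2)` for a few choices of (a1, a2) gives the event in A that the pair of intervals pulls back to, which is what the measurability condition requires.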
Distribution Function of a Random Vector (Bivariate Case)
The joint cumulative distribution function (c.d.f.) of (X, Y) is
defined as
\[ F_{X,Y}(x, y) = P[X \le x, Y \le y], \quad \forall (x, y) \in \mathbb{R}^2 \]
Properties of cdf (Bivariate Case)
1. 1.1 $F_{X,Y}(-\infty, y) = \lim_{x \to -\infty} F(x, y) = 0, \ \forall y$
   1.2 $F_{X,Y}(x, -\infty) = \lim_{y \to -\infty} F(x, y) = 0, \ \forall x$
2. $F_{X,Y}(\infty, \infty) = \lim_{x \to \infty,\, y \to \infty} F(x, y) = 1$
3. For x1 < x2 , y1 < y2 ,
P [x1 < X ≤ x2 , y1 < Y ≤ y2 ]
= F (x2 , y2 ) − F (x2 , y1 ) − F (x1 , y2 ) + F (x1 , y1 ) ≥ 0
4. F(x, y) is right continuous for each argument.
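Property 3 can be illustrated numerically. The sketch below is a hedged example, not taken from the slides: it assumes, purely for illustration, that X and Y are independent Exp(1) variables, so that the joint cdf is a known closed form, and the function names `F` and `rectangle_probability` are hypothetical.

```python
import math


def F(x, y):
    """Joint cdf of two independent Exp(1) variables (an illustrative assumption)."""
    if x <= 0 or y <= 0:
        return 0.0
    return (1 - math.exp(-x)) * (1 - math.exp(-y))


def rectangle_probability(x1, x2, y1, y2):
    """P[x1 < X <= x2, y1 < Y <= y2] computed from the cdf via property 3."""
    return F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)


p = rectangle_probability(0.5, 1.5, 1.0, 2.0)
# Direct computation, valid here only because X and Y are assumed independent.
direct = (math.exp(-0.5) - math.exp(-1.5)) * (math.exp(-1.0) - math.exp(-2.0))
print(p, direct)
```

The two printed numbers should agree up to floating-point error, and the rectangle probability is non-negative, as property 3 demands.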
Distribution Function (General Case)
By a multivariate distribution we mean the distribution of a random
vector X = (X1 , X2 , · · · , Xp )′ whose elements Xi are univariate
random variables with the corresponding distribution functions
FXi (xi ) = P[Xi ≤ xi ], ∀i = 1(1)p. Let x = (x1 , x2 , · · · , xp )′ be a
realisation of X. Then the distribution function of X is given by
\[ F_X(x) = P[X_1 \le x_1, X_2 \le x_2, \cdots, X_p \le x_p] \]
Multivariate Joint Probability Density Function
For an n-dimensional continuous random vector X , suppose there
exists fX (x1 , x2 , · · · , xn ) (≥ 0) such that, for all
(x1 , x2 , · · · , xn ) ∈ R^n ,
\[
F_X(x_1, x_2, \cdots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f_X(u_1, \cdots, u_n)\, du_1 \cdots du_n
\]
i.e., F has finite partial derivatives up to order n such that
\[
\frac{\partial^n F_X(x_1, x_2, \cdots, x_n)}{\partial x_1 \cdots \partial x_n} = f_X(x_1, x_2, \cdots, x_n)
\]
Here fX (x1 , x2 , · · · , xn ) is called the joint pdf of X1 , X2 , · · · , Xn
and satisfies
\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x_1, \cdots, x_n)\, dx_1 \cdots dx_n = 1
\]
Joint probability mass function for Discrete random vector
If each Xi is a discrete random variable, then X is called a discrete
random vector and its probability mass function is given by
\[ p_X(x) = P[X_1 = x_1, X_2 = x_2, \cdots, X_p = x_p] \]
It is also called the joint probability mass function of
X1 , X2 , · · · , Xp .
Marginal Distribution Function of Xi
▶ The function
\[ F_X(\infty, \cdots, \infty, x_i, \infty, \cdots, \infty) = F_{X_i}(x_i) = P(X_i \le x_i) \]
is called the marginal d.f. of Xi .
▶ Similarly, the marginal d.f. of any subset (Xi1 , Xi2 , · · · , Xim ) is
obtained by setting all the remaining variables to ∞.
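Marginal pmf of a subset of X
For discrete X , the marginal pmf of a subset (Xi1 , · · · , Xim )
of X is
\[
\begin{aligned}
p_{i_1,\cdots,i_m} &= P[X_{i_1} = x_{i_1}, \cdots, X_{i_m} = x_{i_m},\ X_j < \infty,\ j \notin \{i_1, \cdots, i_m\}] \\
&= \sum \cdots \sum_{\text{all values of } x_j,\ j \notin \{i_1,\cdots,i_m\}} P[X_{i_1} = x_{i_1}, \cdots, X_{i_m} = x_{i_m},\ X_j = x_j,\ j \notin \{i_1, \cdots, i_m\}]
\end{aligned}
\]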
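The summation over the unwanted coordinates can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name `marginal_pmf` is hypothetical, the joint pmf is stored as a dictionary from value tuples to probabilities, and the sample joint pmf is that of a fair coin tossed twice with X1 the first score and X2 the total score (score 1 for H, 2 for T, each outcome assumed to have probability 1/4).

```python
from collections import defaultdict
from fractions import Fraction


def marginal_pmf(joint, keep):
    """Sum a joint pmf {tuple: prob} over all coordinates not listed in keep."""
    marginal = defaultdict(Fraction)  # Fraction() is 0
    for point, prob in joint.items():
        marginal[tuple(point[i] for i in keep)] += prob
    return dict(marginal)


quarter = Fraction(1, 4)
# Assumed joint pmf of (X1, X2) for the two-toss coin example.
joint = {(1, 2): quarter, (1, 3): quarter, (2, 3): quarter, (2, 4): quarter}

marginal_x1 = marginal_pmf(joint, keep=[0])
marginal_x2 = marginal_pmf(joint, keep=[1])
print(marginal_x1)
print(marginal_x2)
```

Each marginal sums to one, since every joint probability is counted exactly once.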
Marginal pdf of a subset of X
For continuous X , the marginal pdf of a subset (Xi1 , · · · , Xim )
of X is
\[
f_{X_{i_1},\cdots,X_{i_m}}(x_{i_1}, \cdots, x_{i_m}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(x_1, \cdots, x_n) \prod_{j \ne i_1,\cdots,i_m} dx_j
\]
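As a small worked illustration of integrating out a coordinate, consider the bivariate density below. It is an assumed textbook-style example, not one taken from these slides.

```latex
% Assumed example density: f_{X,Y}(x, y) = x + y on the unit square, 0 elsewhere.
\[
f_{X,Y}(x, y) = x + y, \qquad 0 < x < 1,\ 0 < y < 1 .
\]
% It is a valid pdf:
\[
\int_0^1 \int_0^1 (x + y)\, dx\, dy = \tfrac{1}{2} + \tfrac{1}{2} = 1 .
\]
% Marginal pdf of X: integrate out y.
\[
f_X(x) = \int_0^1 (x + y)\, dy = x + \tfrac{1}{2}, \qquad 0 < x < 1 .
\]
```

By symmetry, the marginal pdf of Y in this example is y + 1/2 on (0, 1).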
Conditional pmf of a subset of X
For discrete X , the conditional pmf of Xi1 , · · · , Xik given
Xj1 = xj1 , · · · , Xjn−k = xjn−k is
\[
P[X_{i_1} = x_{i_1}, \cdots, X_{i_k} = x_{i_k} \mid X_{j_1} = x_{j_1}, \cdots, X_{j_{n-k}} = x_{j_{n-k}}]
= \frac{p_{1,2,\cdots,n}}{p_{j_1,j_2,\cdots,j_{n-k}}},
\]
provided the marginal pmf in the denominator is positive.
The above quantities are non-negative and their summation over
all values of Xi1 , · · · , Xik is unity.
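The ratio of joint to marginal pmf can be sketched as follows. This is a minimal illustration under assumptions: the function name `conditional_pmf` is hypothetical, only a single conditioning coordinate is handled, and the sample joint pmf is that of a fair coin tossed twice with X1 the first score and X2 the total score (score 1 for H, 2 for T, each outcome assumed to have probability 1/4).

```python
from fractions import Fraction


def conditional_pmf(joint, given_index, given_value):
    """pmf of the remaining coordinates given X[given_index] = given_value."""
    # Denominator: marginal probability of the conditioning event.
    denom = sum(p for pt, p in joint.items() if pt[given_index] == given_value)
    if denom == 0:
        raise ValueError("conditioning event has probability zero")
    return {
        tuple(v for i, v in enumerate(pt) if i != given_index): p / denom
        for pt, p in joint.items()
        if pt[given_index] == given_value
    }


quarter = Fraction(1, 4)
# Assumed joint pmf of (X1, X2) for the two-toss coin example.
joint = {(1, 2): quarter, (1, 3): quarter, (2, 3): quarter, (2, 4): quarter}

x1_given_total_3 = conditional_pmf(joint, given_index=1, given_value=3)
x1_given_total_2 = conditional_pmf(joint, given_index=1, given_value=2)
print(x1_given_total_3)
print(x1_given_total_2)
```

In each case the conditional probabilities are non-negative and sum to one, as stated above.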
Conditional pdf of a subset of X
For continuous X , the conditional pdf of Xi1 , · · · , Xik given
Xj1 , · · · , Xjn−k is
\[
f_{X_{i_1},\cdots,X_{i_k} \mid X_{j_1},\cdots,X_{j_{n-k}}}(x_{i_1}, x_{i_2}, \cdots, x_{i_k} \mid x_{j_1}, \cdots, x_{j_{n-k}})
= \frac{f_{X_1,X_2,\cdots,X_n}(x_1, x_2, \cdots, x_n)}{f_{X_{j_1},X_{j_2},\cdots,X_{j_{n-k}}}(x_{j_1}, x_{j_2}, \cdots, x_{j_{n-k}})}
\]
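A short worked illustration of this ratio, using an assumed bivariate density that does not come from these slides:

```latex
% Assumed example density: f_{X,Y}(x, y) = x + y on the unit square, 0 elsewhere.
% Marginal pdf of Y, obtained by integrating out x:
\[
f_Y(y) = \int_0^1 (x + y)\, dx = y + \tfrac{1}{2}, \qquad 0 < y < 1 .
\]
% Conditional pdf of X given Y = y, for 0 < y < 1:
\[
f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{x + y}{y + \tfrac{1}{2}}, \qquad 0 < x < 1 .
\]
% For each fixed y it integrates to one:
\[
\int_0^1 \frac{x + y}{y + \tfrac{1}{2}}\, dx = \frac{\tfrac{1}{2} + y}{y + \tfrac{1}{2}} = 1 .
\]
```

Since the conditional pdf depends on y, the two variables in this assumed example are not independent.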
Stochastic Independence
▶ The random variables X1 , X2 , · · · , Xn are said to be
stochastically independent if and only if
\[ F_X(x_1, x_2, \cdots, x_n) = F_{X_1}(x_1) F_{X_2}(x_2) \cdots F_{X_n}(x_n), \quad \forall (x_1, x_2, \cdots, x_n) \in \mathbb{R}^n \]
▶ If X is an n-dimensional discrete random vector, then
X1 , X2 , · · · , Xn are stochastically independent if
\[ P(X_1 = x_1, \cdots, X_n = x_n) = P(X_1 = x_1) \cdots P(X_n = x_n), \quad \forall (x_1, x_2, \cdots, x_n) \in \mathbb{R}^n \]
Thus, in case of two discrete random variables, it implies
\[ P(X = x \mid Y = y) = P(X = x), \qquad P(Y = y \mid X = x) = P(Y = y) \]
for all (x, y) at which the conditional probabilities are defined.
▶ If X is an n-dimensional continuous random vector, then
X1 , X2 , · · · , Xn are stochastically independent if
\[ f_X(x_1, x_2, \cdots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n), \quad \forall (x_1, x_2, \cdots, x_n) \in \mathbb{R}^n \]
Thus, in case of two continuous random variables, it implies
\[ f_{X \mid Y}(x \mid y) = f_X(x), \qquad f_{Y \mid X}(y \mid x) = f_Y(y) \]
for all (x, y) at which the conditional densities are defined.
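For a discrete random vector with finitely many values, the factorisation criterion can be checked mechanically. The sketch below is an illustration only: the function name `is_independent` is hypothetical, the joint pmf is assumed to be a dictionary from value tuples to exact `Fraction` probabilities, and both sample distributions are assumptions built from two tosses of a fair coin (score 1 for H, 2 for T).

```python
from fractions import Fraction
from itertools import product


def is_independent(joint):
    """True if the joint pmf equals the product of its one-dimensional marginals."""
    n = len(next(iter(joint)))
    # Build the marginal pmf of each coordinate by summing out the others.
    marginals = [{} for _ in range(n)]
    for point, prob in joint.items():
        for i, value in enumerate(point):
            marginals[i][value] = marginals[i].get(value, Fraction(0)) + prob
    # Compare joint and product of marginals on the full grid of values.
    for point in product(*(m.keys() for m in marginals)):
        expected = Fraction(1)
        for i, value in enumerate(point):
            expected *= marginals[i][value]
        if joint.get(point, Fraction(0)) != expected:
            return False
    return True


quarter = Fraction(1, 4)
# Scores of the two tosses separately: independent by construction.
two_scores = {(a, b): quarter for a, b in product((1, 2), repeat=2)}
# First score and total score: dependent, since the total contains the first score.
first_and_total = {(1, 2): quarter, (1, 3): quarter, (2, 3): quarter, (2, 4): quarter}

print(is_independent(two_scores), is_independent(first_and_total))
```

Checking the whole grid matters: points with joint probability zero, such as (1, 4) in the second distribution, are exactly where the factorisation fails.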
Theorem
Let X1 , X2 , · · · , Xn be independent random variables. Let gi (Xi ) be
a function of Xi such that gi (Xi ) is a random variable, for
i = 1(1)n. Then g1 (X1 ), g2 (X2 ), · · · , gn (Xn ) are
independent random variables.
Proof: Do it yourself
Mutual independence vs. pairwise independence
▶ n random variables X1 , X2 , · · · , Xn are pairwise independent if
for all pairs (i, j) with i ̸= j, the random variables Xi and Xj
are independent.
▶ If the n random variables X1 , X2 , · · · , Xn are independent and
have the same distribution, then we say that they are
independent and identically distributed, which we abbreviate
as i.i.d.
Example
Exercise 1
Let X1 , X2 be i.i.d. random variables with
\[ P(X_i = 1) = P(X_i = -1) = \frac{1}{2}, \quad i = 1, 2 \]
Define X3 = X1 X2 . Show that X1 , X2 , X3 are pairwise independent
but not stochastically independent.
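The claim can be checked numerically by enumerating the joint distribution. The sketch below is a sanity check, not a proof, and it rests on an assumption about the exercise: each Xi is taken to be +1 or -1 with probability 1/2. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
values = (-1, 1)

# Joint pmf of (X1, X2, X3) with X3 = X1 * X2.
# Assumption: X1, X2 are i.i.d. taking values +1 and -1 with probability 1/2 each.
joint = {(x1, x2, x1 * x2): half * half for x1, x2 in product(values, repeat=2)}


def prob(event):
    """Probability of the set of points satisfying the predicate `event`."""
    return sum(p for point, p in joint.items() if event(point))


# Pairwise independence: P(Xi = a, Xj = b) = P(Xi = a) P(Xj = b) for every pair.
pairwise = all(
    prob(lambda pt: pt[i] == a and pt[j] == b)
    == prob(lambda pt: pt[i] == a) * prob(lambda pt: pt[j] == b)
    for i, j in ((0, 1), (0, 2), (1, 2))
    for a, b in product(values, repeat=2)
)

# Mutual independence: the full joint pmf must factor into the three marginals.
mutual = all(
    prob(lambda pt: pt == (a, b, c))
    == prob(lambda pt: pt[0] == a)
    * prob(lambda pt: pt[1] == b)
    * prob(lambda pt: pt[2] == c)
    for a, b, c in product(values, repeat=3)
)

print("pairwise independent:", pairwise)
print("mutually independent:", mutual)
```

Under this assumption the factorisation fails, for instance, at (1, 1, -1), which has joint probability 0 while the product of the marginals is 1/8.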
Solution