
Chapter 3

Chapter 3 of Statistics 2A focuses on joint distributions of random variables, covering both discrete and continuous cases. It explains concepts such as joint probability mass functions, cumulative distribution functions, and the independence of random variables. The chapter also introduces copulas, which are used to model dependencies between random variables in various applications, including financial statistics.

Uploaded by

Lesego Komane
Copyright
© All Rights Reserved

Statistics 2A

Chapter 3: Joint Distributions

Vaughan van Appel

University of Johannesburg
vvanappel@[Link]

Department of Statistics

1 / 36
Overview
3.0 Revision: Chapter 2

3.1 Introduction

3.2 Discrete Random Variables

3.3 Continuous Random Variables

3.4 Independent Random Variables

3.5 Conditional Distributions

3.5.1 The Discrete Case

3.5.2 The Continuous Case

2 / 36
Revision: Chapter 2

Discrete: pmf: p(x)

▶ a valid pmf: $p(x) \ge 0,\ \forall x \in \Omega$ and $\sum_{x \in \Omega} p(x) = 1$
▶ $P(X \in D) = \sum_{x \in D} p(x)$
▶ cdf: $F(x) = P(X \le x) = \sum_{u \in \Omega,\ u \le x} p(u)$

Please also revise Chapter 3 from the STA01A1/STA1A10/STA1A2E prescribed textbook [Devore and Berk, 2007].

3 / 36
Continuous: pdf: f(x)

▶ a valid pdf: $f(x) \ge 0,\ \forall x$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$
▶ $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
▶ cdf: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$

Please revise Chapter 4 from the STA01A1/STA1A10/STA1A2E prescribed textbook [Devore and Berk, 2007].

4 / 36
3.1 Introduction
This chapter is concerned with the joint probability structure of
two or more random variables defined on the same sample space.
Joint distributions arise naturally in many applications, of which
the following are illustrative:
▶ In ecological studies, counts of several species, modelled as
random variables, are often made. One species is often the
prey of another; clearly, the number of predators will be
related to the number of prey.
▶ The joint probability distribution of the x, y, and z
components of wind velocity can be experimentally measured
in studies of atmospheric turbulence.
▶ The joint distribution of the values of various physiological
variables in a population of patients is often of interest in
medical studies.
▶ A model for the joint distribution of age and length in a population of fish can be used to estimate the age distribution from the length distribution.
The age distribution is relevant to the setting of reasonable
harvesting policies.
5 / 36
The joint behaviour of two random variables, X and Y, is determined by the cumulative distribution function (cdf):

$$F(x, y) = P(X \le x, Y \le y) \equiv P(\{X \le x\} \cap \{Y \le y\}),$$

regardless of whether X and Y are continuous or discrete.

The cdf gives the probability that the point (X, Y ) belongs to a
semi-infinite rectangle in the plane, as shown in the following
Figure:

Figure: F (a, b) gives the probability of the shaded rectangle.

6 / 36
The probability that (X, Y) belongs to a given rectangle is

$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = F(x_2, y_2) - F(x_2, y_1) - F(x_1, y_2) + F(x_1, y_1).$$

The probability of the shaded rectangle can be found by subtracting, from the probability of the (semi-infinite) rectangle having upper-right corner $(x_2, y_2)$, the probabilities of the $(x_1, y_2)$ and $(x_2, y_1)$ rectangles, and then adding back the probability of the $(x_1, y_1)$ rectangle, which has been subtracted twice.
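As a quick check of the rectangle formula, the sketch below applies it to an assumed joint cdf, F(x, y) = xy on the unit square (two independent Uniform(0, 1) variables, chosen only for illustration). For this cdf the rectangle probability should simply equal the rectangle's area.

```python
def _clamp(t):
    """Clamp t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)


def joint_cdf(x, y):
    """Assumed example cdf: X, Y independent Uniform(0, 1), so F(x, y) = x * y."""
    return _clamp(x) * _clamp(y)


def rectangle_prob(F, x1, x2, y1, y2):
    """P(x1 < X <= x2, y1 < Y <= y2) by inclusion-exclusion on the joint cdf F."""
    return F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)


p = rectangle_prob(joint_cdf, 0.2, 0.5, 0.1, 0.6)
print(p)  # approximately (0.5 - 0.2) * (0.6 - 0.1) = 0.15, up to float rounding
```

The function `rectangle_prob` works with any joint cdf passed in as `F`; only `joint_cdf` is specific to this assumed example.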
7 / 36
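If X and Y are jointly continuous, we can also write

$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \int_{x_1}^{x_2}\int_{y_1}^{y_2} f(x, y)\,dy\,dx = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x, y)\,dx\,dy,$$

or, if they are discrete,

$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \sum_{y_1 < y \le y_2}\ \sum_{x_1 < x \le x_2} p(x, y) = \sum_{x_1 < x \le x_2}\ \sum_{y_1 < y \le y_2} p(x, y).$$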

The probability that (X, Y) belongs to a set A, for a class of sets large enough for practical purposes, can be determined by taking limits of intersections and unions of rectangles. In general, if $X_1, \dots, X_n$ are jointly distributed random variables, their joint cdf is

$$F(x_1, x_2, \dots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n).$$
8 / 36
3.2 Discrete Random Variables
Suppose that X and Y are discrete random variables defined on the same sample space and that they take on values $x_1, x_2, \dots$ and $y_1, y_2, \dots$, respectively. Their joint frequency function, or joint probability mass function, p(x, y), is

$$p(x_i, y_j) = P(X = x_i, Y = y_j).$$

A simple example will illustrate this concept. A fair coin is tossed three times; let X denote the number of heads on the first toss and Y the total number of heads. The sample space is

Ω = {hhh, hht, hth, htt, thh, tht, tth, ttt},

and from it we see that the joint frequency function of X and Y is as given in the following table:

x/y    0      1      2      3
0      1/8    2/8    1/8    0
1      0      1/8    2/8    1/8

Table: Joint frequency function of X and Y.
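The table can be reproduced mechanically by enumerating the eight equally likely outcomes and tallying each (x, y) pair. A minimal sketch using exact fractions (note that Python's `Fraction` reduces 2/8 to 1/4 when printing):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
outcomes = ["".join(t) for t in product("ht", repeat=3)]

joint = defaultdict(Fraction)  # joint[(x, y)] = p(x, y); missing cells are 0
for w in outcomes:
    x = 1 if w[0] == "h" else 0  # X: number of heads on the first toss
    y = w.count("h")             # Y: total number of heads
    joint[(x, y)] += Fraction(1, len(outcomes))

for x in (0, 1):
    print(x, [str(joint[(x, y)]) for y in range(4)])
# prints:
# 0 ['1/8', '1/4', '1/8', '0']
# 1 ['0', '1/8', '1/4', '1/8']
```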


9 / 36
In general, to find the frequency function of Y, we simply sum down the appropriate column of the table. For this reason, $p_Y$ is called the marginal frequency function of Y, i.e.,

$$p_Y(y) = \sum_i p(x_i, y).$$

Similarly, summing across the rows gives

$$p_X(x) = \sum_i p(x, y_i),$$

which is the marginal frequency function of X.
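Applied to the coin table, summing across rows and down columns gives both marginals. A short sketch, with the table typed in as a dictionary of rows (x) holding lists indexed by y:

```python
from fractions import Fraction

# Joint frequency function of the coin example: p[x][y] for x in {0, 1}, y in 0..3.
p = {
    0: [Fraction(1, 8), Fraction(2, 8), Fraction(1, 8), Fraction(0)],
    1: [Fraction(0), Fraction(1, 8), Fraction(2, 8), Fraction(1, 8)],
}

# Marginal of X: sum across each row.
p_X = {x: sum(row) for x, row in p.items()}
# Marginal of Y: sum down each column.
p_Y = {y: sum(p[x][y] for x in p) for y in range(4)}

print(p_X)  # {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(p_Y)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```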

10 / 36
The case for several random variables is analogous. If $X_1, \dots, X_m$ are discrete random variables defined on the same sample space, their joint frequency function is

$$p(x_1, \dots, x_m) = P(X_1 = x_1, \dots, X_m = x_m).$$

The marginal frequency function of $X_1$, for example, is

$$p_{X_1}(x_1) = \sum_{x_2, \dots, x_m} p(x_1, x_2, \dots, x_m).$$

The two-dimensional marginal frequency function of $X_1$ and $X_2$, for example, is

$$p_{X_1 X_2}(x_1, x_2) = \sum_{x_3, \dots, x_m} p(x_1, x_2, \dots, x_m).$$
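Both formulas are the same operation: keep some coordinates and sum over the rest. The sketch below writes that as one helper, `marginal` (a hypothetical name), and applies it to an assumed three-variable pmf built from the coin experiment: $X_1$ = heads on the first toss, $X_2$ = total heads, $X_3$ = heads on the last toss. Keeping $(X_1, X_2)$ should recover the coin table; cells with probability zero simply do not appear in the result.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def marginal(joint, keep):
    """Marginal pmf of the coordinates whose positions are listed in `keep`,
    found by summing the joint pmf over all the other coordinates.
    `joint` maps m-tuples (x1, ..., xm) to probabilities."""
    out = defaultdict(Fraction)
    for point, prob in joint.items():
        out[tuple(point[i] for i in keep)] += prob
    return dict(out)


# Illustrative three-variable pmf (an assumption, built from the coin example).
joint = defaultdict(Fraction)
for w in product("ht", repeat=3):
    point = (int(w[0] == "h"), w.count("h"), int(w[2] == "h"))
    joint[point] += Fraction(1, 8)

p_X1 = marginal(joint, keep=(0,))      # one-dimensional marginal of X1
p_X1X2 = marginal(joint, keep=(0, 1))  # two-dimensional marginal of (X1, X2)
print(p_X1)                            # {(0,): Fraction(1, 2), (1,): Fraction(1, 2)}
print(p_X1X2[(0, 1)], p_X1X2[(1, 2)])  # 1/4 1/4, matching the coin table
```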

11 / 36
Example (A)
Suppose that the pmf pXY is given in the following table:

(X, Y ) Y =0 Y =1
X=0 1/10 2/10
X=1 3/10 4/10

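Find the marginal pmf of X.

Summing across each row, $p_X(0) = 1/10 + 2/10$ and $p_X(1) = 3/10 + 4/10$, so

$$p_X(x) = \begin{cases} 3/10, & x = 0 \\ 7/10, & x = 1 \\ 0, & \text{otherwise.} \end{cases}$$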

12 / 36
3.3 Continuous Random Variables
Suppose that X and Y are continuous random variables with a joint cdf, F(x, y). Their joint density function is a piecewise continuous function of two variables, f(x, y). The density function f(x, y) is nonnegative and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$. For any reasonable two-dimensional set A,

$$P\big((X, Y) \in A\big) = \iint_A f(x, y)\,dy\,dx.$$

In particular, if $A = \{(u, v) \mid u \le x \text{ and } v \le y\}$, then

$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du.$$

From the fundamental theorem of multivariable calculus, it follows that

$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y)$$

wherever the derivative is defined.
13 / 36
For small $\delta_x$ and $\delta_y$, if f is continuous at (x, y),

$$P(x \le X \le x + \delta_x,\ y \le Y \le y + \delta_y) = \int_{x}^{x+\delta_x}\int_{y}^{y+\delta_y} f(u, v)\,dv\,du \approx f(x, y)\,\delta_x\,\delta_y.$$

Thus, the probability that (X, Y) is in a small neighborhood of (x, y) is proportional to f(x, y). Differential notation is sometimes useful:

$$P(x \le X \le x + dx,\ y \le Y \le y + dy) = f(x, y)\,dx\,dy.$$
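The approximation $P \approx f(x, y)\,\delta_x\delta_y$ and the mixed-partial formula $f = \partial^2 F / \partial x\,\partial y$ can be seen numerically. The sketch assumes, purely for illustration, that X ~ Exponential(1) and Y ~ Exponential(2) are independent, so $F(x, y) = (1 - e^{-x})(1 - e^{-2y})$ and $f(x, y) = 2e^{-x-2y}$ for positive x, y. The small-rectangle probability is computed from F with the rectangle formula, then divided by $\delta^2$.

```python
import math


def F(x, y):
    """Assumed joint cdf (illustration only): independent Exp(1) and Exp(2)."""
    if x <= 0 or y <= 0:
        return 0.0
    return (1 - math.exp(-x)) * (1 - math.exp(-2 * y))


def f(x, y):
    """The corresponding joint density, i.e. the mixed partial derivative of F."""
    return 2 * math.exp(-x - 2 * y) if x > 0 and y > 0 else 0.0


x, y, d = 0.7, 0.4, 1e-3
# Probability of the small rectangle [x, x + d] x [y, y + d] from the joint cdf.
p_rect = F(x + d, y + d) - F(x + d, y) - F(x, y + d) + F(x, y)
# This ratio is also a finite-difference estimate of the mixed partial of F.
print(p_rect / (d * d), f(x, y))  # close, with relative error of order d
```

Shrinking `d` brings the two numbers closer, until floating-point cancellation in the four cdf values eventually dominates.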

14 / 36
Example (A)
Consider the bivariate density function

$$f(x, y) = \frac{12}{7}\left(x^2 + xy\right), \quad 0 \le x \le 1,\ 0 \le y \le 1,$$

which is plotted in Figure 3.4 (see [Rice, 2007, pg. 76]).

P(X > Y) can be found by integrating f over the set $\{(x, y) \mid 0 \le y \le x \le 1\}$:

$$\begin{aligned} P(X > Y) &= \frac{12}{7}\int_0^1\int_0^x \left(x^2 + xy\right) dy\,dx \\ &= \frac{12}{7}\int_0^1 \left[x^2 y + \frac{1}{2}xy^2\right]_{y=0}^{y=x} dx = \frac{12}{7}\int_0^1 \frac{3}{2}x^3\,dx \\ &= \frac{9}{14}. \end{aligned}$$
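The value 9/14 can be cross-checked numerically. The sketch below uses an iterated midpoint rule (a hypothetical helper, `iterated_midpoint`, written for this check) whose inner limits may depend on x, so it integrates over the triangle $0 \le y \le x \le 1$ directly; it also confirms that the density integrates to 1 over the unit square.

```python
def iterated_midpoint(f, x_lo, x_hi, y_lo, y_hi, n=400):
    """Approximate the iterated integral of f(x, y) dy dx over
    x_lo <= x <= x_hi and y_lo(x) <= y <= y_hi(x) with the midpoint rule."""
    hx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        total += sum(f(x, a + (j + 0.5) * hy) for j in range(n)) * hy * hx
    return total


def density(x, y):
    """Example A: f(x, y) = (12/7)(x^2 + x y) on the unit square."""
    return 12 / 7 * (x * x + x * y)


p_x_gt_y = iterated_midpoint(density, 0.0, 1.0, lambda x: 0.0, lambda x: x)
total_mass = iterated_midpoint(density, 0.0, 1.0, lambda x: 0.0, lambda x: 1.0)
print(p_x_gt_y, 9 / 14)  # agree to several decimal places
print(total_mass)        # close to 1
```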
15 / 36
The marginal cdf of X, or $F_X$, is

$$F_X(x) = P(X \le x) = \lim_{y \to \infty} F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{\infty} f(u, y)\,dy\,du.$$

From this, it follows that the density function of X alone, known as the marginal density of X, is

$$f_X(x) = \frac{d}{dx}F_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy.$$

In the discrete case, the marginal frequency function was found by summing the joint frequency function over the other variable; in the continuous case, it is found by integration.

16 / 36
Example (B)
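Continuing Example A, the marginal density of X is

$$f_X(x) = \frac{12}{7}\int_0^1 \left(x^2 + xy\right) dy = \frac{12}{7}\left(x^2 + \frac{x}{2}\right), \quad 0 \le x \le 1.$$

A similar calculation shows that the marginal density of Y is

$$f_Y(y) = \frac{12}{7}\left(\frac{1}{3} + \frac{y}{2}\right), \quad 0 \le y \le 1.$$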
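As a check on both marginals, the sketch integrates the Example A density over the other variable with a simple midpoint rule and compares the result with the closed forms above at a few points.

```python
def midpoint(g, lo, hi, n=1000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h


def density(x, y):
    """Example A: f(x, y) = (12/7)(x^2 + x y) on the unit square."""
    return 12 / 7 * (x * x + x * y)


def f_X(x):
    """Closed-form marginal density of X from Example B."""
    return 12 / 7 * (x * x + x / 2)


def f_Y(y):
    """Closed-form marginal density of Y from Example B."""
    return 12 / 7 * (1 / 3 + y / 2)


for t in (0.25, 0.5, 0.9):
    num_fx = midpoint(lambda y: density(t, y), 0.0, 1.0)  # integrate out y
    num_fy = midpoint(lambda x: density(x, t), 0.0, 1.0)  # integrate out x
    print(t, num_fx, f_X(t), num_fy, f_Y(t))
```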

17 / 36
For several jointly continuous random variables, we can make the obvious generalizations. The joint density function is a function of several variables, and the marginal density functions are found by integration. There are marginal density functions of various dimensions. Suppose that X, Y, and Z are jointly continuous random variables with density function f(x, y, z).

The one-dimensional marginal density of X is

$$f_X(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y, z)\,dy\,dz,$$

and the two-dimensional marginal density of X and Y is

$$f_{XY}(x, y) = \int_{-\infty}^{\infty} f(x, y, z)\,dz.$$

18 / 36
▶ A copula is a joint cumulative distribution function of random variables that have uniform marginal distributions.
▶ The functions H(x, y) in Example C of [Rice, 2007, Section 3.3] are copulas.
▶ Note that a copula C(u, v) is nondecreasing in each variable, because it is a cdf.
▶ Also, $P(U \le u) = C(u, 1) = u$ and $C(1, v) = v$, since the marginal distributions are uniform.
▶ We will restrict ourselves to copulas that have densities, in which case the density is

$$c(u, v) = \frac{\partial^2}{\partial u\,\partial v} C(u, v) \ge 0.$$

Now, suppose that X and Y are continuous random variables with cdfs $F_X(x)$ and $F_Y(y)$. Then $U = F_X(X)$ and $V = F_Y(Y)$ are uniform random variables (Proposition 2.3C).

19 / 36
For a copula C(u, v), consider the joint distribution defined by

$$F_{XY}(x, y) = C\big(F_X(x), F_Y(y)\big).$$

Since $C(F_X(x), 1) = F_X(x)$ and $C(1, F_Y(y)) = F_Y(y)$, letting $y \to \infty$ (respectively $x \to \infty$) shows that the marginal cdfs corresponding to $F_{XY}$ are $F_X(x)$ and $F_Y(y)$. Using the chain rule, the corresponding density is

$$f_{XY}(x, y) = c\big(F_X(x), F_Y(y)\big)\, f_X(x)\, f_Y(y).$$

▶ This shows that, from the ingredients of two marginal distributions and any copula, a joint distribution with those marginals can be constructed.
▶ It is thus clear that the marginal distributions do not determine the joint distribution. The dependence between the random variables is captured in the copula.
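The construction can be sketched concretely. The example below uses the Farlie-Gumbel-Morgenstern (FGM) copula, $C(u, v) = uv[1 + \theta(1-u)(1-v)]$ with density $c(u, v) = 1 + \theta(1-2u)(1-2v)$, and assumed exponential marginals; the value of θ and the marginals are illustrative choices, not taken from the slides. It checks that the joint cdf reproduces the marginal cdf of X, that integrating the joint density over y recovers $f_X$, and that the variables are dependent when θ ≠ 0.

```python
import math

THETA = 0.5  # hypothetical dependence parameter; the FGM family requires |THETA| <= 1


def fgm_copula(u, v, theta=THETA):
    """Farlie-Gumbel-Morgenstern copula C(u, v) = uv[1 + theta(1 - u)(1 - v)]."""
    return u * v * (1 + theta * (1 - u) * (1 - v))


def fgm_density(u, v, theta=THETA):
    """Copula density c(u, v) = d^2 C / (du dv) = 1 + theta(1 - 2u)(1 - 2v)."""
    return 1 + theta * (1 - 2 * u) * (1 - 2 * v)


# Assumed marginals for illustration: X ~ Exponential(1), Y ~ Exponential(2).
def F_X(x):
    return 1 - math.exp(-x)


def f_X(x):
    return math.exp(-x)


def F_Y(y):
    return 1 - math.exp(-2 * y)


def f_Y(y):
    return 2 * math.exp(-2 * y)


def joint_cdf(x, y):
    """F_XY(x, y) = C(F_X(x), F_Y(y)) for x, y >= 0."""
    return fgm_copula(F_X(x), F_Y(y))


def joint_pdf(x, y):
    """f_XY(x, y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y), by the chain rule."""
    return fgm_density(F_X(x), F_Y(y)) * f_X(x) * f_Y(y)


# Letting y grow recovers the marginal cdf of X, because C(u, 1) = u.
print(joint_cdf(1.0, 50.0), F_X(1.0))

# Integrating the joint density over y (midpoint rule on [0, 20]; the tail
# beyond 20 is negligible here) recovers the marginal density of X.
x0, h, n = 0.8, 0.001, 20_000
marginal_at_x0 = sum(joint_pdf(x0, (k + 0.5) * h) for k in range(n)) * h
print(marginal_at_x0, f_X(x0))

# With THETA != 0 the joint cdf is not the product of the marginal cdfs.
print(joint_cdf(1.0, 1.0), F_X(1.0) * F_Y(1.0))
```

Swapping in a different copula changes only `fgm_copula` and `fgm_density`; the marginals are untouched, which is the point made in the bullets above.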

20 / 36
Copulas are not just academic curiosities: they have been extensively used in financial statistics in recent years to model dependencies in the returns of financial instruments.

In some applications, it is useful to analyse distributions that are uniform over some region of space. For example, in the plane, the random point (X, Y) is uniform over a region, R, if for any $A \subset R$,

$$P\big((X, Y) \in A\big) = \frac{|A|}{|R|},$$

where $|\cdot|$ denotes area.
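The area-ratio rule can be illustrated by simulation. In the sketch below, R is assumed to be the unit disk and A the disk of radius 1/2 with the same centre, so $|A|/|R| = (\pi/4)/\pi = 1/4$. Points uniform on R are drawn by rejection from the square $[-1, 1]^2$, and the fraction landing in A estimates the ratio.

```python
import random


def uniform_in_disk(rng):
    """Draw (X, Y) uniformly from the unit disk by rejection from [-1, 1]^2."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 100_000
hits = 0
for _ in range(n):
    x, y = uniform_in_disk(rng)
    if x * x + y * y <= 0.25:  # A: the disk of radius 1/2 inside R
        hits += 1

estimate = hits / n
print(estimate)  # should be near |A| / |R| = 0.25
```

Rejection sampling works here because accepting only the points of the square that fall in R leaves a distribution that is uniform over R.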

21 / 36
3.4 Independent Random Variables
Definition
Random variables $X_1, X_2, \dots, X_n$ are said to be independent if their joint cdf factors into the product of their marginal cdfs:

$$F(x_1, x_2, \dots, x_n) = F_{X_1}(x_1)F_{X_2}(x_2)\cdots F_{X_n}(x_n) \quad \forall\, x_1, x_2, \dots, x_n,$$

or, equivalently, if their joint pmf/pdf factors into the product of their marginal pmfs/pdfs:

$$f(x_1, x_2, \dots, x_n) = f_{X_1}(x_1)f_{X_2}(x_2)\cdots f_{X_n}(x_n) \quad \forall\, x_1, x_2, \dots, x_n.$$

The definition holds for both continuous and discrete random variables.
▶ For discrete random variables, it is equivalent to state that their joint frequency function factors;
▶ for continuous random variables, it is equivalent to state that their joint density function factors.
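For discrete variables the factorization can be checked cell by cell. The sketch below (with a hypothetical helper, `is_independent`) applies it to the coin example, where X and Y are not independent since $p(0, 0) = 1/8$ while $p_X(0)p_Y(0) = 1/2 \cdot 1/8 = 1/16$, and, for contrast, to the indicators of heads on the first and second tosses, which are.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint):
    """True if the joint pmf (dict (x, y) -> prob) equals p_X(x) p_Y(y) at every
    pair (x, y) in the product of the two supports; missing cells count as 0."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_X = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    p_Y = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == p_X[x] * p_Y[y] for x in xs for y in ys)


# Coin example: X = heads on the first toss, Y = total number of heads.
coin = {}
for w in product("ht", repeat=3):
    key = (int(w[0] == "h"), w.count("h"))
    coin[key] = coin.get(key, 0) + Fraction(1, 8)

# For contrast: heads indicators of the first and second tosses.
tosses = {}
for w in product((0, 1), repeat=3):
    key = (w[0], w[1])
    tosses[key] = tosses.get(key, 0) + Fraction(1, 8)

print(is_independent(coin))    # False
print(is_independent(tosses))  # True
```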
22 / 36
To see why this is true, consider the case of two jointly continuous random variables, X and Y. If they are independent, then

$$F(x, y) = F_X(x)F_Y(y),$$

and taking the second mixed partial derivative makes it clear that the density function factors:

$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y}F_X(x)F_Y(y) = f_X(x)f_Y(y).$$

On the other hand, if the density function factors, then the joint cdf can be expressed as a product:

$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_X(u)f_Y(v)\,dv\,du = \left(\int_{-\infty}^{x} f_X(u)\,du\right)\left(\int_{-\infty}^{y} f_Y(v)\,dv\right) = F_X(x)F_Y(y).$$

23 / 36
▶ It can be shown that the definition implies that if X and Y are independent, then

$$P(X \in A, Y \in B) = P(X \in A)P(Y \in B).$$

▶ It can also be shown that if g and h are functions, then Z = g(X) and W = h(Y) are independent as well.
▶ A sketch of an argument goes like this (the details are beyond the level of this course): we wish to find $P(Z \le z, W \le w)$. Let A(z) be the set of x such that $g(x) \le z$, and let B(w) be the set of y such that $h(y) \le w$. Then

$$\begin{aligned} P(Z \le z, W \le w) &= P\big(X \in A(z), Y \in B(w)\big) \\ &= P\big(X \in A(z)\big)\,P\big(Y \in B(w)\big) \\ &= P(Z \le z)P(W \le w). \end{aligned}$$
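The claim about functions of independent variables can be verified exactly in a small discrete case. The sketch assumes X and Y are independent rolls of a fair die and picks two arbitrary illustrative functions, g(x) = x mod 2 and h(y) = (y − 3)²; it computes the joint pmf of (Z, W) = (g(X), h(Y)) and checks that it factors.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X and Y are independent rolls of a fair die.
faces = range(1, 7)
joint_xy = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}


def g(x):
    return x % 2  # hypothetical g: parity of X


def h(y):
    return (y - 3) ** 2  # hypothetical h: squared distance of Y from 3


# Joint pmf of (Z, W) = (g(X), h(Y)), obtained by pushing (X, Y) through g and h.
joint_zw = {}
for (x, y), p in joint_xy.items():
    key = (g(x), h(y))
    joint_zw[key] = joint_zw.get(key, 0) + p

# Marginals of Z and W.
p_Z, p_W = {}, {}
for (z, w), p in joint_zw.items():
    p_Z[z] = p_Z.get(z, 0) + p
    p_W[w] = p_W.get(w, 0) + p

factors = all(joint_zw.get((z, w), 0) == p_Z[z] * p_W[w] for z in p_Z for w in p_W)
print(factors)  # True: Z and W are independent
```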

24 / 36
Example (A)
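Suppose that (X, Y) has the joint pdf

$$f(x, y) = \begin{cases} 1, & -1/2 \le x \le 1/2,\ -1/2 \le y \le 1/2 \\ 0, & \text{otherwise.} \end{cases}$$

Make a sketch of this square. You can visualize that the marginal distributions of X and Y are uniform on [−1/2, 1/2]. For example, for $-1/2 \le x \le 1/2$, integrating out y gives $f_X(x) = \int_{-1/2}^{1/2} 1\,dy = 1$, so the marginal density at a point x is

$$f_X(x) = \begin{cases} 1, & -1/2 \le x \le 1/2 \\ 0, & \text{otherwise,} \end{cases}$$

and, similarly, the marginal density at a point y is

$$f_Y(y) = \begin{cases} 1, & -1/2 \le y \le 1/2 \\ 0, & \text{otherwise.} \end{cases}$$

The joint density is equal to the product of the marginal densities, so X and Y are independent.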
25 / 36
26 / 36
3.5 Conditional Distributions
3.5.1 Conditional Distributions: The Discrete Case

If X and Y are jointly distributed discrete random variables, the conditional probability that $X = x_i$ given that $Y = y_j$ is, if $p_Y(y_j) > 0$,

$$P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)}.$$

This probability is defined to be zero if $p_Y(y_j) = 0$. We will denote this conditional probability by $p_{X|Y}(x \mid y)$. Note that this function of x is a genuine frequency function, since it is nonnegative and sums to 1, and that $p_{Y|X}(y \mid x) = p_Y(y)$ if X and Y are independent.

27 / 36
Example (A)
We return to the simple discrete distribution considered in [Rice, 2007, Section 3.2], reproducing the table of values for convenience here:

x/y    0      1      2      3
0      1/8    2/8    1/8    0
1      0      1/8    2/8    1/8

Table: Joint frequency function of X and Y.

Since $p_Y(1) = 2/8 + 1/8 = 3/8$, the conditional frequency function of X given Y = 1 is

$$p_{X|Y}(0 \mid 1) = \frac{2/8}{3/8} = \frac{2}{3}, \qquad p_{X|Y}(1 \mid 1) = \frac{1/8}{3/8} = \frac{1}{3}.$$
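The same computation for any conditioning value can be sketched as a small function (the name `conditional_x_given_y` is hypothetical). It divides the column of the table for Y = y by its total, and follows the convention above of returning zeros when $p_Y(y) = 0$.

```python
from fractions import Fraction

# Joint frequency function of the coin example: joint[(x, y)] = p_XY(x, y).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8), (0, 2): Fraction(1, 8),
    (1, 1): Fraction(1, 8), (1, 2): Fraction(2, 8), (1, 3): Fraction(1, 8),
}


def conditional_x_given_y(joint, y):
    """p_{X|Y}(x | y) = p_XY(x, y) / p_Y(y), defined as 0 when p_Y(y) = 0."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    xs = sorted({x for x, _ in joint})
    if p_y == 0:
        return {x: Fraction(0) for x in xs}
    return {x: joint.get((x, y), Fraction(0)) / p_y for x in xs}


print(conditional_x_given_y(joint, 1))  # {0: Fraction(2, 3), 1: Fraction(1, 3)}
```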
28 / 36
The definition of the conditional frequency function just given can be re-expressed as

$$p_{XY}(x, y) = p_{X|Y}(x \mid y)\,p_Y(y)$$

(the multiplication law from [Rice, 2007, Chapter 1]). This useful equation gives a relationship between the joint and conditional frequency functions. Summing both sides over all values of y, we have an extremely useful application of the law of total probability:

$$p_X(x) = \sum_y p_{X|Y}(x \mid y)\,p_Y(y).$$
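On the coin table, both identities can be checked exactly: multiplying each conditional probability by $p_Y(y)$ rebuilds the joint table, and summing those products over y gives $p_X(x) = 1/2$ for both values of x.

```python
from fractions import Fraction

# Joint frequency function of the coin example: joint[(x, y)] = p_XY(x, y).
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8), (0, 2): Fraction(1, 8),
    (1, 1): Fraction(1, 8), (1, 2): Fraction(2, 8), (1, 3): Fraction(1, 8),
}
ys = sorted({y for _, y in joint})
p_Y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ys}


def p_X_given_Y(x, y):
    """Conditional pmf p_{X|Y}(x | y); every p_Y(y) in this table is positive."""
    return joint.get((x, y), Fraction(0)) / p_Y[y]


# Multiplication law: p_XY(x, y) = p_{X|Y}(x | y) p_Y(y) in every cell.
multiplication_law_holds = all(
    p_X_given_Y(x, y) * p_Y[y] == joint.get((x, y), 0) for x in (0, 1) for y in ys
)

# Law of total probability: p_X(x) = sum over y of p_{X|Y}(x | y) p_Y(y).
p_X = {x: sum(p_X_given_Y(x, y) * p_Y[y] for y in ys) for x in (0, 1)}
print(multiplication_law_holds)  # True
print(p_X)                       # {0: Fraction(1, 2), 1: Fraction(1, 2)}
```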

29 / 36
3.5.2 Conditional Distributions: The Continuous Case

In analogy with the definition in the preceding section, if X and Y are jointly continuous random variables, the conditional density of Y given X is defined to be

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}$$

if $0 < f_X(x) < \infty$, and 0 otherwise.

30 / 36
This definition is in accord with the result to which a differential argument would lead. We would define $f_{Y|X}(y \mid x)\,dy$ as $P(y \le Y \le y + dy \mid x \le X \le x + dx)$ and calculate

$$P(y \le Y \le y + dy \mid x \le X \le x + dx) = \frac{f_{XY}(x, y)\,dx\,dy}{f_X(x)\,dx} = \frac{f_{XY}(x, y)\,dy}{f_X(x)}.$$

Note that the rightmost expression is interpreted as a function of y, x being fixed. The numerator is the joint density $f_{XY}(x, y)$, viewed as a function of y for fixed x: you can visualize it as the curve formed by slicing through the joint density function perpendicular to the x axis. The denominator normalizes that curve to have unit area.
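The slicing picture can be made concrete with the density of Example A in Section 3.3. Dividing $\frac{12}{7}(x^2 + xy)$ by the marginal $f_X(x) = \frac{12}{7}(x^2 + x/2)$ gives $f_{Y|X}(y \mid x) = (x + y)/(x + 1/2)$ for $0 < x \le 1$ and $0 \le y \le 1$. The sketch below evaluates the conditional density at an illustrative fixed value x = 0.3 and checks that the normalised slice has unit area.

```python
def f_XY(x, y):
    """Joint density from Example A of Section 3.3."""
    return 12 / 7 * (x * x + x * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0


def f_X(x):
    """Marginal density of X from Example B."""
    return 12 / 7 * (x * x + x / 2) if 0 <= x <= 1 else 0.0


def f_Y_given_X(y, x):
    """Conditional density f_{Y|X}(y | x) = f_XY(x, y) / f_X(x), or 0 if f_X(x) = 0."""
    fx = f_X(x)
    return f_XY(x, y) / fx if fx > 0 else 0.0


x0 = 0.3  # illustrative fixed value of x
n = 1000
h = 1 / n
# Slice the joint density at x = x0, normalise it, and check its area (midpoint rule).
area = sum(f_Y_given_X((k + 0.5) * h, x0) for k in range(n)) * h
print(area)  # close to 1
# For this density the normalised slice simplifies to (x + y) / (x + 1/2).
print(f_Y_given_X(0.6, x0), (x0 + 0.6) / (x0 + 0.5))
```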

31 / 36
The joint density can be expressed in terms of the marginal and conditional densities as follows:

$$f_{XY}(x, y) = f_{Y|X}(y \mid x)\,f_X(x).$$

Integrating both sides over x allows the marginal density of Y to be expressed as

$$f_Y(y) = \int_{-\infty}^{\infty} f_{Y|X}(y \mid x)\,f_X(x)\,dx,$$

which is the law of total probability for the continuous case.
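Continuing with the Example A density (where $f_{Y|X}(y \mid x) = (x + y)/(x + 1/2)$ on the unit square), the sketch below evaluates the law of total probability numerically and compares it with the marginal $f_Y(y) = \frac{12}{7}(1/3 + y/2)$ found in Example B.

```python
def f_X(x):
    """Marginal density of X from Example B of Section 3.3 (on [0, 1])."""
    return 12 / 7 * (x * x + x / 2)


def f_Y_given_X(y, x):
    """Conditional density of Y given X = x for the same example: (x + y) / (x + 1/2)."""
    return (x + y) / (x + 0.5)


def f_Y_total(y, n=1000):
    """f_Y(y) = integral over x of f_{Y|X}(y | x) f_X(x), midpoint rule on [0, 1]."""
    h = 1 / n
    return sum(f_Y_given_X(y, (k + 0.5) * h) * f_X((k + 0.5) * h) for k in range(n)) * h


for y in (0.1, 0.5, 0.9):
    print(y, f_Y_total(y), 12 / 7 * (1 / 3 + y / 2))  # numerical vs Example B
```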

32 / 36
Leave Out!
You can leave out Sections 3.6 - 3.7

33 / 36
Tutorial Exercises
Do the following exercises:

▶ [Rice, 2007, Chapter 3, pg. 107]: 1, 7, 8, 10, 12, 15, 18, 19

34 / 36
References/Prescribed Textbook(s)
John A. Rice. Mathematical Statistics and Data Analysis, 3rd edition (2007). Cengage Learning.

Jay L. Devore and Kenneth N. Berk. Modern Mathematical Statistics with Applications, 2nd edition (2007). Cengage Learning.

35 / 36
Questions?

36 / 36
