Unit 5
Brief overview of probability
Some Common Continuous Distributions
• Uniform distributions
• Gaussian (normal) distribution
• Laplace distribution
Uniform distributions
A continuous random variable X is said to follow a continuous uniform (or rectangular) distribution over an interval (a, b) if its probability density is given by
$$ f_X(x) = \begin{cases} k, & a < x < b \\ 0, & \text{otherwise} \end{cases} $$
Since the total probability must equal 1, $k(b-a) = 1$, so $k = 1/(b-a)$.
The uniform distribution is a type of probability distribution in which all possible outcomes are equally likely.
Ex: A deck of cards has a uniform distribution within it (a discrete analogue), since the probability of drawing a heart, club, diamond or spade is equally likely.
Uniform Distributions
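• The cumulative distribution function (cdf) of X is given by
$$ F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x < b \\ 1, & x \ge b \end{cases} $$
• Mean or expected value of X: since X is uniformly distributed over the interval [a, b], we can expect the mean to be at the midpoint, $E(X) = \frac{a+b}{2}$.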
Uniform Distributions
• The mean and variance of a uniform random variable are
$$ E(X) = \frac{a+b}{2}, \qquad \mathrm{Var}(X) = \frac{(b-a)^2}{12} $$
• This is a useful distribution when we have no prior knowledge of the actual pdf and all continuous values in the same range seem equally likely.
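As a quick check of these closed forms, the following Python sketch (the function names are illustrative, not from any library) evaluates the uniform pdf and cdf and compares the formulas for the mean and variance with a simulated sample from U(2, 8), an interval chosen arbitrarily.

```python
import random

def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of U(a, b): constant 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a < x < b else 0.0

def uniform_cdf(x: float, a: float, b: float) -> float:
    """CDF of U(a, b): 0 below a, linear ramp (x - a)/(b - a), 1 from b onwards."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean_var(a: float, b: float) -> tuple[float, float]:
    """Closed-form mean (a + b)/2 and variance (b - a)^2 / 12."""
    return (a + b) / 2, (b - a) ** 2 / 12

# Monte Carlo sanity check of the closed forms for U(2, 8).
random.seed(0)
a, b = 2.0, 8.0
samples = [random.uniform(a, b) for _ in range(100_000)]
m = sum(samples) / len(samples)
v = sum((s - m) ** 2 for s in samples) / len(samples)

print(uniform_mean_var(a, b))      # (5.0, 3.0)
print(round(m, 2), round(v, 2))    # close to 5.0 and 3.0
```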
Example 1: Uniform Distributions
Given the continuous random variable X ~ U[1, 10]:
• Find the distribution of the transformed variable Y = 2X + 3
• Find P(10 < Y < 20)
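A minimal sketch of Example 1, assuming that the first part asks for the distribution of Y. Because Y = 2X + 3 is an increasing linear map, Y ~ U[5, 23], so P(10 < Y < 20) = 10/18 = 5/9; the same value follows from the equivalent event 3.5 < X < 8.5. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# X ~ U[1, 10]; Y = 2X + 3 is an increasing linear transform of X.
a, b = Fraction(1), Fraction(10)
slope, shift = Fraction(2), Fraction(3)

# An increasing linear map sends U[a, b] to U[slope*a + shift, slope*b + shift].
ya, yb = slope * a + shift, slope * b + shift   # Y ~ U[5, 23]
pdf_y = 1 / (yb - ya)                           # f_Y(y) = 1/18 on [5, 23]

def prob_y_between(lo: Fraction, hi: Fraction) -> Fraction:
    """P(lo < Y < hi) for the uniform Y, clipping the interval to [ya, yb]."""
    lo, hi = max(lo, ya), min(hi, yb)
    return max(hi - lo, Fraction(0)) * pdf_y

# Same event expressed through X: 10 < 2X + 3 < 20  <=>  3.5 < X < 8.5.
p_via_x = (Fraction(17, 2) - Fraction(7, 2)) / (b - a)

print(ya, yb, pdf_y)                                         # 5 23 1/18
print(prob_y_between(Fraction(10), Fraction(20)), p_via_x)   # 5/9 5/9
```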
Uniform Distributions
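A bus arrives regularly every 20 minutes throughout the day. What is the probability that you will have to wait more than 15 minutes, assuming that you arrive at a random time?
The waiting time X is uniform on [0, 20], so its pdf is $f(x) = \frac{1}{20-0} = \frac{1}{20}$, $0 \le x \le 20$.
$$ P(X > 15) = \int_{15}^{20} \frac{1}{20}\,dx = \frac{20-15}{20} = \frac{1}{4} = 0.25 $$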
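The bus calculation can be confirmed with a short simulation; the seed and the number of trials are arbitrary choices.

```python
import random

WAIT_MAX = 20.0    # bus interval in minutes; waiting time X ~ U[0, 20]
THRESHOLD = 15.0

# Exact answer: P(X > 15) = (20 - 15) / 20
p_exact = (WAIT_MAX - THRESHOLD) / WAIT_MAX

# Simulation: draw random waiting times and count how often they exceed 15 minutes.
random.seed(1)
trials = 200_000
hits = sum(random.uniform(0, WAIT_MAX) > THRESHOLD for _ in range(trials))
p_sim = hits / trials

print(p_exact)   # 0.25
print(p_sim)     # roughly 0.25
```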
Gaussian (Normal) Distributions
• Gaussian probability distribution is perhaps the most used distribution in all of science.
• Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the mean are
more frequent in occurrence than data far from the mean.
• Symmetrical distributions occur when a dividing line produces two mirror images.
• In graphical form, the normal distribution appears as a "bell curve".
• Normal distributions are symmetrical, but not all symmetrical distributions are normal.
Gaussian (Normal) Distributions
• The Gaussian or normal distribution is the most widely used distribution in the study of random phenomena in nature and in statistics, for several reasons:
1. It has two parameters that are easy to interpret, and which capture some of the most
basic properties of a distribution, namely its mean and variance.
2. The central limit theorem provides the result that sums of independent random
variables have an approximately Gaussian distribution, which makes it a good choice
for modelling residual errors or ‘noise’.
3. Among all distributions with a specified mean and variance, the Gaussian makes the fewest assumptions (it has maximum entropy), and thus is a good default choice in many cases.
4. Its simple mathematical form is easy to implement, yet often highly effective.
Gaussian (Normal) Distributions
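• Its pdf is given by
$$ f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty $$
where the two parameters are the mean (μ) and the standard deviation (σ); we write X ~ N(μ, σ²).
• When the mean is 0 and the variance (equivalently, the standard deviation) is 1, it becomes the standard normal distribution.
Any arbitrary normal random variable X can be converted to a standard normal random variable Z by the change of variables $Z = \frac{X-\mu}{\sigma}$.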
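The sketch below (helper names are hypothetical) evaluates the Gaussian pdf, applies the change of variables Z = (X − μ)/σ to an arbitrary N(10, 2²) example, and checks numerically that the density integrates to 1.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gaussian density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def standardize(x: float, mu: float, sigma: float) -> float:
    """Change of variables Z = (X - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma, x = 10.0, 2.0, 13.0
z = standardize(x, mu, sigma)   # 1.5

# The density of X at x equals the standard normal density at z, scaled by 1/sigma.
print(normal_pdf(x, mu, sigma), normal_pdf(z) / sigma)

# Riemann-sum check that the N(10, 4) density integrates to 1 over mu +/- 10 sigma.
step = 0.001
area = sum(normal_pdf(mu + k * step, mu, sigma) for k in range(-20_000, 20_001)) * step
print(round(area, 6))           # 1.0
```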
Gaussian (Normal) Distributions
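For a normal random variable, the mean and variance are $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
• Its cdf is given by
$$ F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\,dt $$
• Or, in terms of Z: a standard normal random variable is defined as one whose mean is 0 and variance is 1, which means Z ~ N(0, 1).
For easier reference, a function Φ(z) is defined as
$$ \Phi(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt, \qquad F_X(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) $$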
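In Python's standard library, Φ can be written with `math.erf` through the identity Φ(z) = (1 + erf(z/√2))/2. The sketch below relies on that identity and on F_X(x) = Φ((x − μ)/σ); the function names are illustrative.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, Phi(z) = P(Z <= z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of X ~ N(mu, sigma^2) obtained by standardizing: F_X(x) = Phi((x - mu)/sigma)."""
    return phi((x - mu) / sigma)

print(phi(0.0))                                # 0.5
print(round(phi(1.96), 4))                     # 0.975
print(round(normal_cdf(13.0, 10.0, 2.0), 4))   # same as Phi(1.5)
```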
Laplace Distributions
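• The Laplace distribution, also called the double exponential distribution, is the distribution of the difference between two independent random variables with identical exponential distributions.
• Its pdf is given by
$$ f_X(x) = \frac{1}{2b} \exp\!\left(-\frac{|x-\mu|}{b}\right) $$
• The cdf is given by
$$ F_X(x) = \begin{cases} \frac{1}{2} \exp\!\left(\frac{x-\mu}{b}\right), & x < \mu \\ 1 - \frac{1}{2} \exp\!\left(-\frac{x-\mu}{b}\right), & x \ge \mu \end{cases} $$
• where
μ = location parameter,
b = scale parameter (b > 0).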
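The following sketch codes the Laplace pdf and cdf and checks the difference-of-exponentials description empirically: if E1 and E2 are independent exponential variables with mean b, then μ + E1 − E2 should follow Laplace(μ, b). The parameter values and seed are arbitrary.

```python
import math
import random

def laplace_pdf(x: float, mu: float, b: float) -> float:
    """Laplace density: exp(-|x - mu| / b) / (2b)."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def laplace_cdf(x: float, mu: float, b: float) -> float:
    """Laplace CDF, piecewise on either side of the location parameter mu."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

# random.expovariate takes the rate lambda = 1/mean, so 1/b gives mean b.
random.seed(2)
mu, b, n = 1.0, 2.0, 200_000
samples = [mu + random.expovariate(1 / b) - random.expovariate(1 / b) for _ in range(n)]

# Compare the empirical CDF of the samples with the closed-form CDF.
for x in (-1.0, 1.0, 4.0):
    empirical = sum(s <= x for s in samples) / n
    print(x, round(empirical, 3), round(laplace_cdf(x, mu, b), 3))
```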
Bivariate random variables
• Bivariate random variables
• Joint distribution functions
• Joint probability mass functions
• Joint probability density functions
• Conditional distributions
• Covariance and correlation
Bivariate random variables
• Let X and Y be two random variables defined on the sample space S of a random experiment.
• Then the pair (X, Y) is called a bivariate random variable or two-dimensional random vector, where each of X and Y assigns a real number to every element of S.
• The range space of the bivariate random variable (X, Y) is denoted by R, and (X, Y) can be considered as a function that assigns to each point ζ in S a point (x, y) in the plane.
• (X, Y) is called a discrete bivariate random variable if the random variables X and Y are both discrete.
• Similarly, (X, Y) is called a continuous bivariate random variable if the random variables
X and Y both are continuous.
• (X,Y) is called a mixed bivariate random variable if one of X and Y is discrete and the
other is continuous.
Joint distribution functions
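• The joint cumulative distribution function (or joint cdf) of X and Y is defined as
$$ F_{XY}(x, y) = P(X \le x,\ Y \le y) $$
• Let A = {X ≤ x} and B = {Y ≤ y} be events of S. Then $F_{XY}(x, y) = P(A \cap B)$.
• For given values of x and y, if A and B are independent events of S, then
$$ F_{XY}(x, y) = P(A)\,P(B) = F_X(x)\,F_Y(y) $$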
Joint distribution functions
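• A few important properties of the joint cdf of two random variables, which are similar to those of the cdf of a single random variable, are:
1. $0 \le F_{XY}(x, y) \le 1$
2. $F_{XY}(x, y)$ is nondecreasing in each of x and y.
3. $F_{XY}(\infty, \infty) = 1$ and $F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0$
4. $F_{XY}(x, \infty) = F_X(x)$ and $F_{XY}(\infty, y) = F_Y(y)$
5. $P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = F_{XY}(x_2, y_2) - F_{XY}(x_1, y_2) - F_{XY}(x_2, y_1) + F_{XY}(x_1, y_1)$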
Joint probability mass functions
• For a discrete bivariate random variable (X, Y) that takes the values $(x_i, y_j)$ for certain allowable integers i and j, the joint probability mass function (joint pmf) of (X, Y) is given by
$$ p_{XY}(x_i, y_j) = P(X = x_i,\ Y = y_j) $$
Joint probability mass functions
• In the discrete case, we can obtain the joint cumulative distribution function (joint cdf) of X and Y by summing the joint pmf:
$$ F_{XY}(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} p_{XY}(x_i, y_j) $$
Marginal PMFs
$$ p_X(x_i) = \sum_{j} p_{XY}(x_i, y_j), \qquad p_Y(y_j) = \sum_{i} p_{XY}(x_i, y_j) $$
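To make the summation formulas concrete, here is a small sketch over a hypothetical 2 × 2 joint pmf (not the table used in the examples below), stored as a dictionary keyed by (x, y).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf: {(x, y): P(X = x, Y = y)}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2),
}

def joint_cdf(x, y):
    """F_XY(x, y): sum of p(xi, yj) over all support points with xi <= x and yj <= y."""
    return sum((p for (xi, yj), p in joint.items() if xi <= x and yj <= y), Fraction(0))

def marginals(pmf):
    """Marginal pmfs: sum the joint pmf over the other variable."""
    px, py = defaultdict(Fraction), defaultdict(Fraction)
    for (xi, yj), p in pmf.items():
        px[xi] += p
        py[yj] += p
    return dict(px), dict(py)

px, py = marginals(joint)
print(joint_cdf(0, 1), joint_cdf(1, 0), joint_cdf(1, 1))   # 1/4 3/8 1
print(px, py)
```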
Example 1: Joint probability distribution for a discrete variable
Consider two random variables X and Y with joint PMF given in Table
Example 2: Joint probability distribution for a discrete variable
If we let (x, y) denote one of the possible outcomes of one toss of a pair of four-sided dice, then certainly (1, 1) is a possible outcome, as are (1, 2), (1, 3) and (1, 4). If we continue to enumerate all of the possible outcomes, we soon see that the joint support S has 16 possible outcomes:
S = {(x, y) : x = 1, 2, 3, 4; y = 1, 2, 3, 4}
Joint probability mass function
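Assuming the two four-sided dice are fair and independent (an assumption; the pmf itself is not given in the text), every point of the support has probability 1/16. The sketch enumerates S and checks the pmf properties.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair, independent four-sided dice, so every outcome is equally likely.
faces = range(1, 5)
support = list(product(faces, faces))   # all (x, y) pairs
pmf = {xy: Fraction(1, len(support)) for xy in support}

print(len(support))        # 16
print(pmf[(1, 1)])         # 1/16
# The pmf properties hold: all values are nonnegative and they sum to 1.
print(sum(pmf.values()))   # 1
# Example event: P(X + Y = 5)
print(sum(p for (x, y), p in pmf.items() if x + y == 5))   # 1/4
```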
Example 3: Joint probability distribution for a discrete variable
Show that the following function satisfies the properties of a joint probability mass function:
and determine the following:
a. P(X < 2.5, Y < 3);
b. P(X < 2.5);
c. P(X < 3);
d. P(X > 1.8, Y > 4.7);
e. the marginal probability distribution of the random variable X;
f. the conditional probability distribution of Y given that X = 1.5.
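The steps of Example 3 can be organized as small helpers. The pmf values below are placeholders (the example's actual function is not reproduced in the text), so only the method carries over, not the numbers: validate the pmf, sum it over an event, marginalize, and condition.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical placeholder pmf, used only to exercise the helpers.
pmf = {
    (1.0, 1): Fraction(1, 5), (1.5, 2): Fraction(1, 5), (1.5, 3): Fraction(1, 5),
    (2.5, 4): Fraction(1, 5), (3.0, 5): Fraction(1, 5),
}

def is_valid_pmf(p):
    """Joint pmf properties: every value is nonnegative and the values sum to 1."""
    return all(v >= 0 for v in p.values()) and sum(p.values()) == 1

def prob(p, event):
    """P(event) for a predicate event(x, y), by summing the pmf over matching points."""
    return sum((v for (x, y), v in p.items() if event(x, y)), Fraction(0))

def marginal_x(p):
    """Marginal pmf of X: sum the joint pmf over y."""
    out = defaultdict(Fraction)
    for (x, _), v in p.items():
        out[x] += v
    return dict(out)

def conditional_y_given_x(p, x0):
    """P(Y = y | X = x0) = p(x0, y) / p_X(x0), defined only when p_X(x0) > 0."""
    px0 = marginal_x(p)[x0]
    return {y: v / px0 for (x, y), v in p.items() if x == x0}

print(is_valid_pmf(pmf))                              # True
print(prob(pmf, lambda x, y: x < 2.5 and y < 3))      # 2/5
print(conditional_y_given_x(pmf, 1.5))
```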
Joint probability density functions
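• If (X, Y) is a continuous bivariate random variable, its joint pdf $f_{XY}(x, y) \ge 0$ gives the probability of the rectangle a ≤ X ≤ b, c ≤ Y ≤ d as
$$ P(a \le X \le b,\ c \le Y \le d) = \int_{c}^{d} \int_{a}^{b} f_{XY}(x, y)\,dx\,dy, \qquad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1 $$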
Joint probability density functions
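• The cumulative distribution function $F_{XY}(x, y)$ of continuous random variables X and Y with joint probability density function $f_{XY}(x, y)$ is
$$ F_{XY}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{XY}(u, v)\,du\,dv $$
Conversely, if (X, Y) is a continuous bivariate random variable with cdf $F_{XY}(x, y)$, then the function
$$ f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y} $$
is its joint pdf (wherever the derivative exists).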
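To illustrate both relations, the sketch below uses a hypothetical joint pdf f(x, y) = x + y on the unit square (it integrates to 1). It computes F_XY by a midpoint-rule double integral, compares it with the closed form xy(x + y)/2, and recovers f from F with a finite-difference mixed partial derivative.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint pdf f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def cdf_numeric(x: float, y: float, n: int = 200) -> float:
    """F_XY(x, y) for 0 <= x, y <= 1: midpoint-rule double integral of f over [0, x] x [0, y]."""
    hx, hy = x / n, y / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += f((i + 0.5) * hx, (j + 0.5) * hy)
    return total * hx * hy

def cdf_exact(x: float, y: float) -> float:
    """Closed form of the same integral on the unit square: x*y*(x + y)/2."""
    return x * y * (x + y) / 2

def pdf_from_cdf(x: float, y: float, h: float = 1e-3) -> float:
    """Mixed partial d^2F/(dx dy) by a central finite difference."""
    return (cdf_exact(x + h, y + h) - cdf_exact(x + h, y - h)
            - cdf_exact(x - h, y + h) + cdf_exact(x - h, y - h)) / (4 * h * h)

print(cdf_numeric(0.5, 0.8), cdf_exact(0.5, 0.8))   # both about 0.26
print(cdf_exact(1.0, 1.0))                          # 1.0: total probability
print(pdf_from_cdf(0.3, 0.6), f(0.3, 0.6))          # both about 0.9
```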
Example: Joint probability density functions
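Marginal PDFs
$$ f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx $$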
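Using a hypothetical density of the same kind, f(x, y) = x + y on the unit square, the marginal of X is f_X(x) = x + 1/2 (and, by symmetry, f_Y(y) = y + 1/2). A minimal numerical check of the integral formula:

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint pdf f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_x(x: float, n: int = 1000) -> float:
    """f_X(x): integrate f(x, y) over y in [0, 1] with the midpoint rule."""
    h = 1.0 / n
    return sum(f(x, (j + 0.5) * h) for j in range(n)) * h

for x in (0.0, 0.25, 0.9):
    print(x, marginal_x(x), x + 0.5)   # numeric value vs the closed form x + 1/2
```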
Example 1 : Marginal PDFs
Where C=3/2
Find the marginal PDFs $f_X(x)$ and $f_Y(y)$.
Example 2 : PDFs and Marginal PDFs
Conditional distributions
While working with a discrete bivariate random variable, it is often important to derive the conditional probability function, since X and Y may be related. Based on the joint pmf of (X, Y), the conditional pmf of Y when X = x_i is defined as
$$ p_{Y|X}(y_j \mid x_i) = P(Y = y_j \mid X = x_i) = \frac{p_{XY}(x_i, y_j)}{p_X(x_i)}, \qquad p_X(x_i) > 0 $$
Conditional distributions
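In the same way, when (X, Y) is a continuous bivariate random variable with joint pdf $f_{XY}(x, y)$, the conditional pdf and conditional cdf of Y given X = x are defined as
$$ f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}, \qquad F_{Y|X}(y \mid x) = \int_{-\infty}^{y} f_{Y|X}(v \mid x)\,dv, \qquad f_X(x) > 0 $$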
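Continuing with a hypothetical joint pdf f(x, y) = x + y on the unit square (marginal f_X(x) = x + 1/2), the conditional pdf is (x + y)/(x + 1/2) and the conditional cdf is (xy + y²/2)/(x + 1/2). The sketch checks the definitions numerically at an arbitrary x = 0.2.

```python
def f(x: float, y: float) -> float:
    """Hypothetical joint pdf f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def f_x(x: float) -> float:
    """Its marginal, f_X(x) = x + 1/2 for 0 <= x <= 1."""
    return x + 0.5 if 0 <= x <= 1 else 0.0

def cond_pdf(y: float, x: float) -> float:
    """Conditional pdf f_{Y|X}(y|x) = f(x, y) / f_X(x), defined where f_X(x) > 0."""
    return f(x, y) / f_x(x)

def cond_cdf(y: float, x: float, n: int = 1000) -> float:
    """Conditional cdf F_{Y|X}(y|x): midpoint-rule integral of cond_pdf over [0, y], 0 <= y <= 1."""
    h = y / n
    return sum(cond_pdf((j + 0.5) * h, x) for j in range(n)) * h

x0 = 0.2
print(cond_cdf(1.0, x0))   # total conditional probability, about 1
print(cond_cdf(0.5, x0), (x0 * 0.5 + 0.5 ** 2 / 2) / f_x(x0))   # numeric vs closed form
```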
Example: Conditional distributions
Covariance and correlation
The covariance between two random variables X and Y measures the degree to which X and Y are (linearly) related, that is, how X varies with Y and vice versa:
$$ \mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y] $$
So, if the variance measures how a random variable varies with itself ($\mathrm{Var}(X) = \mathrm{Cov}(X, X)$), the covariance measures how two random variables vary with each other.
Covariance and correlation
Covariance can take any real value, from −∞ to ∞.
Sometimes it is more convenient to work with a normalized measure, because covariance depends on the scale of the variables and alone may not carry enough information about the relationship between them.
For example, let's define 3 different random variables based on flipping a coin:
Looking at these random variables, we can see that they are essentially the same, differing only by a constant multiplier applied to their output. Yet their covariances turn out to be very different.
Covariance and correlation
To solve this problem, we add a normalizing term, which gives the correlation coefficient:
$$ \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y} $$
If Cov(X, Y) = 0, then we say X and Y are uncorrelated.
Equivalently, X and Y are uncorrelated if E(XY) = E(X)E(Y).
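The scaling problem and its fix can be seen in a few lines. Since the slide's own coin-flip variables are not reproduced in the text, the sketch assumes hypothetical ones: X = 1 for heads and 0 for tails on a fair coin, Y = 10X and W = 100X. The covariances grow with the scale factor, while every correlation is exactly 1.

```python
import math

# Hypothetical coin-flip variables (assumed): X = 1 for heads, 0 for tails; Y = 10X; W = 100X.
OUTCOMES = [(0.5, 1), (0.5, 0)]   # (probability, value of X) for a fair coin

def X(x):
    return x

def Y(x):
    return 10 * x

def W(x):
    return 100 * x

def expect(g):
    """E[g(X)], summing over the two coin outcomes."""
    return sum(p * g(x) for p, x in OUTCOMES)

def cov(g, h):
    """Cov(g(X), h(X)) = E[g h] - E[g] E[h]."""
    return expect(lambda x: g(x) * h(x)) - expect(g) * expect(h)

def corr(g, h):
    """Correlation coefficient: covariance divided by the product of standard deviations."""
    return cov(g, h) / math.sqrt(cov(g, g) * cov(h, h))

print(cov(X, Y), cov(X, W), cov(Y, W))      # 2.5 25.0 250.0
print(corr(X, Y), corr(X, W), corr(Y, W))   # 1.0 1.0 1.0
```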
Covariance and correlation
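A few important properties of correlation are:
1. $-1 \le \rho_{XY} \le 1$
2. $\rho_{XY} = \pm 1$ if and only if Y = aX + b for some constants a ≠ 0 and b; ρ = 1 when a > 0 and ρ = −1 when a < 0.
3. If X and Y are independent, then $\rho_{XY} = 0$; the converse does not hold in general.
4. $\rho(aX + b,\ cY + d) = \rho(X, Y)$ for constants with ac > 0.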
Central Limit Theorem
• The central limit theorem states that if you have a population with mean μ and standard
deviation σ and take sufficiently large random samples from the population with
replacement then the distribution of the sample means will be approximately normally
distributed.
• This will hold true regardless of whether the source population is normal or skewed,
provided the sample size is sufficiently large (usually n > 30).
Central Limit Theorem
• The central limit theorem states that the (suitably standardized) sum of independent and identically distributed (i.i.d.) random variables with finite mean and variance approaches a normal distribution as the sample size N → ∞.
• In simpler terms, the theorem states that under certain general conditions, the sum of independent observations that follow the same underlying distribution is approximately normally distributed.
• The approximation steadily improves as the number of observations increases. The underlying distribution of the independent observations can be anything: binomial, Poisson, exponential, chi-squared, etc.
Central Limit Theorem
• This is one of the most important theorems in probability theory.
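• It states that if X1, …, Xn is a sequence of independent, identically distributed random variables, each having mean μ and variance σ², and
$$ Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} $$
• then Z_n converges in distribution to the standard normal random variable as n goes to infinity, that is
$$ \lim_{n \to \infty} P(Z_n \le x) = \Phi(x) \quad \text{for all } x \in \mathbb{R} $$
• where Φ(x) is the standard normal CDF.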
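A small simulation can make the convergence visible. The sketch assumes an exponential source (rate 1, so μ = σ = 1), which is clearly skewed, and compares the empirical P(Z_n ≤ x) for n = 100 with Φ(x); the sample sizes and seed are arbitrary.

```python
import math
import random

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(3)
# Skewed source distribution: Exponential with rate 1 has mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n, reps = 100, 10_000

def z_n() -> float:
    """One draw of Z_n = (X1 + ... + Xn - n*mu) / (sigma * sqrt(n))."""
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

draws = [z_n() for _ in range(reps)]

# Compare the empirical P(Z_n <= x) with Phi(x) at a few points.
for x in (-1.0, 0.0, 1.0):
    empirical = sum(d <= x for d in draws) / reps
    print(x, round(empirical, 3), round(phi(x), 3))
```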
Example: Central Limit Theorem
• In a communication system each data packet consists of 1000 bits. Due to noise, each bit may be received in error with probability 0.1. It is assumed that bit errors occur independently. Find the probability that there are more than 120 errors in a certain data packet.
Solution: Let X be the number of bit errors in the packet. Each bit contributes 1 (error) or 0 (no error), so X follows a binomial distribution with n = 1000 and p = 0.1.
Mean = np = 1000 × 0.1 = 100
Variance = np(1 − p) = 1000 × 0.1 × 0.9 = 90
Standard deviation = √variance = √90 ≈ 9.49
We have to calculate P(X > 120) = 1 − P(X ≤ 120).
Convert 120 into Z: Z = (X − mean)/standard deviation = (120 − 100)/√90 ≈ 2.108 ≈ 2.11
P(X > 120) ≈ 1 − P(Z < 2.11) = 1 − 0.9826 = 0.0174 (from the positive standard normal table)
Example: Central Limit Theorem
Solution: Here X is the sum of n = 100 i.i.d. random variables, each with mean 170 and standard deviation 30.
We have to calculate P(X > 18000) = 1 − P(X ≤ 18000).
Mean = nμ = 100 × 170 = 17000
Variance = nσ² = 100 × 30 × 30 = 90000
Standard deviation = √variance = √90000 = 300
Convert 18000 into Z: Z = (X − mean)/standard deviation = (18000 − 17000)/300 ≈ 3.33
P(X > 18000) ≈ 1 − P(Z < 3.33) = 1 − 0.99957 = 0.00043 (from the positive standard normal table)
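Both worked examples can be reproduced with the helper below. It uses the unrounded z, so the fourth decimal can differ slightly from the table lookups above; for the packet example it also prints the exact binomial tail for comparison. The sum example assumes, as the numbers imply, 100 i.i.d. terms each with mean 170 and standard deviation 30.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_tail(threshold: float, mean: float, sd: float) -> tuple[float, float]:
    """CLT approximation of P(S > threshold): returns (z, 1 - Phi(z))."""
    z = (threshold - mean) / sd
    return z, 1 - phi(z)

# Packet example: S ~ Binomial(1000, 0.1), mean np, variance np(1 - p).
n, p = 1000, 0.1
z1, approx1 = normal_tail(120, n * p, math.sqrt(n * p * (1 - p)))

# Exact binomial tail for comparison: P(S > 120) = sum over k = 121..1000 of C(n, k) p^k (1 - p)^(n - k).
exact1 = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(121, n + 1))

# Sum example: 100 i.i.d. terms, each with mean 170 and standard deviation 30.
z2, approx2 = normal_tail(18000, 100 * 170, math.sqrt(100) * 30)

print(round(z1, 3), round(approx1, 4), round(exact1, 4))
print(round(z2, 2), round(approx2, 5))
```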
Sampling Distribution
Population is a finite set of objects being investigated.
Random sample refers to a sample of objects drawn from a population in a way that every member of the
population has the same chance of being chosen.
Sampling distribution refers to the probability distribution of a random variable defined in a space of
random samples.
Simple random sampling: there is an equal probability of selecting any particular item. It comes in two forms:
• Sampling without replacement (SRSWOR): once an object is selected, it is removed from the population.
• Sampling with replacement (SRSWR): a selected object is not removed from the population.
Sampling Distribution
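Let N = population size and n = sample size.
Sampling with replacement: the number of possible (ordered) samples is $N^n$, and each sample has the same probability $1/N^n$ of being chosen.
Sampling without replacement: the number of possible samples is
$$ {}^{N}C_{n} = \binom{N}{n} = \frac{N!}{n!\,(N-n)!} $$
and the probability of each sample being chosen is $1/{}^{N}C_{n}$.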
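The two counting formulas can be checked by enumerating all samples of size n = 2 from a hypothetical population of N = 4 labelled objects:

```python
from itertools import combinations, product
from math import comb

population = ["a", "b", "c", "d"]   # hypothetical population, N = 4
N, n = len(population), 2

with_repl = list(product(population, repeat=n))    # ordered samples, repeats allowed
without_repl = list(combinations(population, n))   # unordered samples, no repeats

print(len(with_repl), N ** n)          # 16 16
print(len(without_repl), comb(N, n))   # 6 6
```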
Mean and Variance of sample
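As X̄ is a random variable, it also has a mean $\mu_{\bar{X}}$ and a variance $\sigma_{\bar{X}}^2$, which are related to the population mean μ and population variance σ² as follows:
Sampling with replacement
$$ \mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} $$
Sampling without replacement
$$ \mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} $$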
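Enumerating every equally likely sample from a small hypothetical population (values 2, 4, 6, 8, so μ = 5 and σ² = 5 with divisor N) confirms both formulas exactly:

```python
from fractions import Fraction
from itertools import combinations, product

pop = [2, 4, 6, 8]   # hypothetical population values
N, n = len(pop), 2
mu = Fraction(sum(pop), N)
sigma2 = sum((Fraction(v) - mu) ** 2 for v in pop) / N   # population variance (divisor N)

def mean_and_var(samples):
    """Mean and variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    m = sum(means) / len(means)
    return m, sum((x - m) ** 2 for x in means) / len(means)

# With replacement: compare with (mu, sigma^2 / n).
print(mean_and_var(list(product(pop, repeat=n))), sigma2 / n)
# Without replacement: compare with (mu, sigma^2 / n * (N - n) / (N - 1)).
print(mean_and_var(list(combinations(pop, n))), sigma2 / n * Fraction(N - n, N - 1))
```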
HYPOTHESIS TESTING
• In statistics, a hypothesis is an assumption about the probability law of random variables. Take, e.g., a random sample (X1, …, Xn) of a random variable whose joint pdf depends on a parameter κ: f(x; κ) = f(x1, x2, …, xn; κ).
• We want to test the assumption κ = κ0 against the assumption κ = κ1.
• In this case, the assumption κ = κ0 is called the null hypothesis and is denoted by H0. The assumption κ = κ1 is called the alternative hypothesis and is denoted by H1.
• A simple hypothesis is one where all the parameters are specified with an exact value, like H0 or H1 in this case. But if the parameters don't have an exact value, e.g. H1: κ ≠ κ0, then H1 is composite.
HYPOTHESIS TESTING
• Hypothesis testing is the decision process used for validating a hypothesis. We can interpret a decision process as dividing an observation space, say R, into two regions, R0 and R1. If x = (x1, …, xn) is the set of observations, then if x ∈ R0 the decision is in favor of H0, and if x ∈ R1 the decision is in favor of H1.
• The region R0 is called the acceptance region, as the null hypothesis is accepted there, and R1 is the rejection region.
• There are 4 possible decisions based on the two regions in observation space:
1. Accept H0 when H0 is true (correct decision).
2. Reject H0 when H0 is true (Type I error).
3. Accept H0 when H1 is true (Type II error).
4. Reject H0 when H1 is true (correct decision).
HYPOTHESIS TESTING
• Type I error: reject H0 (or accept H1) when H0 is true. An example of this situation is a malignancy test of a tumour in which a benign tumour is classified as malignant and the corresponding treatment is started. This is also called an alpha error, where good is interpreted as bad.
• Type II error: reject H1 (or accept H0) when H1 is true. An example of this situation is a malignancy test of a tumour in which a malignant tumour is classified as benign and no treatment for malignancy is started. This is also called a beta error, where bad is interpreted as good; it can have a more devastating impact.
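To see how the two error probabilities depend on the decision regions, consider a hypothetical test (not from the slides): a single observation x ~ N(κ, 1), H0: κ = 0 against H1: κ = 1, with R1 = {x > c} for c = 1.645. Then P(Type I) = 1 − Φ(c) and P(Type II) = Φ(c − 1); the sketch computes both and checks them by simulation.

```python
import math
import random

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical setup: one observation x ~ N(kappa, 1); H0: kappa = 0, H1: kappa = 1.
# Decision rule: x in R1 (reject H0) when x > c, otherwise x in R0 (accept H0).
k0, k1, c = 0.0, 1.0, 1.645

alpha = 1 - phi(c - k0)   # Type I error: reject H0 although H0 is true
beta = phi(c - k1)        # Type II error: accept H0 although H1 is true

random.seed(4)
trials = 100_000
alpha_sim = sum(random.gauss(k0, 1) > c for _ in range(trials)) / trials
beta_sim = sum(random.gauss(k1, 1) <= c for _ in range(trials)) / trials

print(round(alpha, 3), round(alpha_sim, 3))   # both near 0.05
print(round(beta, 3), round(beta_sim, 3))     # both near 0.74
```

Moving c to the right lowers the Type I error but raises the Type II error, which is the trade-off between the two regions.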
MONTE CARLO APPROXIMATION
• Using the Monte Carlo technique, we can approximate the expected value of any function of a random variable by simply drawing samples from the distribution of the random variable and then computing the arithmetic mean of the function applied to the samples:
$$ E[f(X)] \approx \frac{1}{S} \sum_{s=1}^{S} f(x_s), \qquad x_s \sim p(x) $$
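A minimal Monte Carlo sketch, assuming X ~ U(0, 1) and f(x) = x², for which the exact answer is E[X²] = 1/3. It also reports the standard error of the estimate, which shrinks like 1/√S; the helper name is illustrative.

```python
import random
import statistics

def mc_expectation(f, sampler, n=100_000, seed=0):
    """Monte Carlo estimate of E[f(X)]: average f over n draws from sampler(rng)."""
    rng = random.Random(seed)
    vals = [f(sampler(rng)) for _ in range(n)]
    est = statistics.fmean(vals)
    se = statistics.stdev(vals) / len(vals) ** 0.5   # standard error of the estimate
    return est, se

est, se = mc_expectation(lambda x: x * x, lambda rng: rng.random())
print(est, se)   # est close to 1/3
```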