
Chapter I Ad 2

Chapter I introduces basic probability theory, defining key concepts such as sample space, sample points, events, and event space. It explains the classical definition of probability, which is based on equally likely outcomes, and discusses its limitations, particularly in cases of non-mutually exclusive outcomes and infinite possibilities. The chapter also presents the relative frequency definition of probability, emphasizing its application in real-world scenarios through repeated experiments.

Uploaded by

abish6940
Copyright
© © All Rights Reserved

Chapter I: Basic Probability Theory

To investigate the uncertainty of outcomes scientifically, it is first necessary to have a measure of the
degree of uncertainty. Since the outcomes of a random experiment are uncertain, a
measure of the degree of uncertainty associated with an event (to be defined later on) is known as a
probability measure.

1.2. Sample Space, sample point, Events and Event space


As we said earlier, we cannot predict with certainty the outcome of a random experiment in
advance of (prior to) the completion of the experiment; we can only list all of its possible outcomes.

Definition:-
1. Sample space. The set of all possible outcomes of an experiment is called the sample space
and is denoted by S. The word "sample" is a reminder of the random nature of the
experiment: its outcome is uncertain, so a given outcome is one sample from the
many possible outcomes of the experiment.
Other symbols used in other texts to denote the sample space are $\Omega$, Z and X.
2. Sample point. Each element of the sample space S is called a sample point and is usually
denoted by $\omega$.
3. Events. Events are subsets of the sample space S.
4. Event space. The class or collection of all events associated with a given experiment is
defined to be the event space. We use E to denote the event space.
Now let us see some examples which will help us understand the above definitions.
Example 1. Let the experiment be the rolling of a single well-balanced die. The sample space of the
experiment is S = {1, 2, 3, 4, 5, 6}, which contains all the possible outcomes of the experiment. Each
outcome is a sample point, i.e., $\omega_i \in S$, and this sample space has a finite number of elements. Let A = {2,
4, 6}. Then A is a subset of S and defines the event of obtaining an even outcome. Let B = {4}. Then
B defines the event of obtaining a 4. Both A and B are subsets of the sample space and thus are
events.
We say that an event A occurs if a trial produces one of its elements, i.e. when 2, 4 or 6 occurs. If a
4 turns up, we say events A and B both occur. Since in this case $B \subset A$, whenever B occurs A also
occurs.
We may be interested in the following events:
a) the outcome is the number 1
b) the outcome is even but less than 3
c) the outcome is not even.
When an event contains only one element of S, like B above, it is called an elementary event.
If we collect all subsets of S in one set and denote it by E, this new set is called the event space.
Example 2. The experiment is to record the number of traffic deaths on the eve of the next new year in
Addis Ababa. Any nonnegative integer is a conceivable outcome of this experiment. The sample
space for this experiment is
S = {0, 1, 2, ...}. There is a countably infinite number of points in the sample space, so the size of
this sample space is countably infinite. Since each point is itself an elementary event, there is an
infinite number of events. Another possible event is A = fewer than 500 deaths, that is, A = {0, 1, 2, ..., 499}.

Example 3. Consider the agricultural experiment of growing wheat on an acre of land under a well-defined
set of conditions and then recording the yield in quintals (x). Any nonnegative number of
quintals could be an outcome of this experiment. The sample space here is
$S = \{ x : x \ge 0 \}$
In this case the sample space has an uncountably infinite number of elements, since it takes values in
an interval of the real number line.

We may be interested in the following events:

i) Event A is a yield less than or equal to 100 quintals:
$A = \{ x : x \le 100 \}$
ii) Event B is a yield between 20 and 50 quintals:
$B = \{ x : 20 < x < 50 \}$

Thus we think of events as subsets of the sample space S. Whenever A and B are events in which we
are interested, we can reasonably concern ourselves also with the events "A or B", "A and B", and "not A".
Using set notation we can write them as $A \cup B$, $A \cap B$ and $A^c$ respectively.

Events A and B are called disjoint if their intersection is the empty set $\emptyset$; $\emptyset$ is called the impossible
event. The set S is called the certain or sure event.

We know from our study of set theory that the total number of subsets that can be formed from a given set
is $2^n$, where n is the number of elements of the set under discussion. Exactly
this formula gives the number of possible events that can be formed from a finite sample
space S, where n is the number of elements of the sample space. Therefore, for
sample spaces with a 'sufficiently large' or infinite number of sample points, the number of events that can be
formed will be very large.
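
The $2^n$ count can be checked directly for the die of Example 1. The following is only a sketch; the helper name `power_set` is an illustrative choice and not part of the module.

```python
from itertools import chain, combinations

def power_set(sample_space):
    """Return every subset (event) of a finite sample space as a frozenset."""
    points = list(sample_space)
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]

S = {1, 2, 3, 4, 5, 6}                    # sample space of one roll of a die
events = power_set(S)

print(len(events), 2 ** len(S))           # the two counts agree
print(frozenset({2, 4, 6}) in events)     # the event "even outcome" is one of them
```

Already for n = 6 there are 64 events, which illustrates how quickly the event space grows with the size of S.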

We have seen in the introduction to statistics course that we always try to know the likelihood of
occurrence of events, i.e. we calculate probabilities only for events. Sample
spaces with an infinite number of elements, like those of Examples 2 and 3 above, have infinitely many events, and it
would be impossible (or unnecessary) to assign probabilities to all of them. Hence, for reasons
beyond this module, for infinite sample spaces not all subsets of the sample space are entitled to
be events. The question is then: which subsets of S are entitled to be events?

The required events form a sub-collection E of the set of all subsets of S, known as the event space,
which satisfies the following properties:

a) $S \in E$
b) If $A \in E$, then $A^c \in E$
c) If $A_1, A_2, \ldots \in E$, then $(A_1 \cup A_2 \cup \cdots) \in E$. In compact form, $\bigcup_{i=1}^{\infty} A_i \in E$.
In mathematical language, any collection E of subsets of S which satisfies the above three
properties is known as a $\sigma$-field or $\sigma$-algebra of events.

A $\sigma$-field is a field which is closed under the operations of countable union, countable intersection
and complementation.
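
For a finite sample space the three properties can be checked mechanically. The sketch below assumes a finite S, so that countable unions reduce to finite ones; the function name `is_event_space` is hypothetical.

```python
from itertools import combinations

def is_event_space(S, E):
    """Check properties (a)-(c) for a collection E of subsets of a finite set S."""
    S = frozenset(S)
    E = {frozenset(A) for A in E}
    contains_S = S in E                                   # property (a)
    closed_complement = all(S - A in E for A in E)        # property (b)
    # In a finite collection, closure under pairwise unions implies
    # closure under every union that can be formed, so (c) reduces to this.
    closed_union = all(A | B in E for A, B in combinations(E, 2))
    return contains_S and closed_complement and closed_union

S = {1, 2, 3, 4, 5, 6}
even, odd = {2, 4, 6}, {1, 3, 5}

print(is_event_space(S, [set(), even, odd, S]))   # all three properties hold
print(is_event_space(S, [set(), even, S]))        # fails: the complement of `even` is missing
```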
1.3. Definitions of Probability
There are three widely used definitions, or interpretations, of probability. All of them are, of
course, familiar to you, because you have used them in your previous course on
introduction to statistics. These definitions are:
a) Classical or a priori definition of probability
b) Relative frequency or a posteriori definition of probability
c) Subjective definition of probability
In this section we will discuss each of these definitions briefly.
1.3.1. Classical or a priori definition of Probability
The theory of probability in its early stage was closely associated with games of chance. This
association is the basis for the classical definition. The classical definition of probability is based on the
notions of 'mutually exclusive' and 'equally likely' experimental outcomes.

Definition. If a random experiment can result in N mutually exclusive and equally likely outcomes,
i.e. if the sample space S consists of N mutually exclusive and equally likely outcomes,
and if $N_A$ of these outcomes have an attribute A, then the probability of A is given by the
ratio
$$P(A) = \frac{N_A}{N} = \frac{\text{number of favourable outcomes (outcomes with attribute } A\text{)}}{\text{total number of possible outcomes}} \qquad (1)$$

If an event A is an elementary event, meaning that it contains a single outcome, then its probability P(A) is
$\frac{1}{N}$, where N is the size of the sample space, i.e. we are taking n(S) = N.

Example 4. Let the random experiment be the tossing of a single die.

S = {1, 2, 3, 4, 5, 6}. These six outcomes are mutually exclusive, since two or more
faces cannot turn up simultaneously. If the die is fair, or unbiased, the six outcomes
are equally likely; that is, each face is expected to appear with about equal relative frequency in the
long run.

In this example the probability of obtaining any single outcome, say 2, on a single toss is $\frac{1}{6}$. If we let
A be the event of obtaining an even number, three of the six outcomes have this attribute, i.e. A =
{2, 4, 6}. Then the probability of the event A occurring is given by
$$P(A) = \frac{N_A}{N} = \frac{N(A)}{N} = \frac{3}{6} = \frac{1}{2}$$

P(A) can also be obtained as the sum of the probabilities of the sample points that result in the occurrence
of A. That is,
$$P(A) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}$$
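
The two ways of obtaining P(A) in Example 4 can be reproduced with exact fractions. This is a minimal sketch of equation (1); the function name `classical_probability` is an illustrative choice.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = N(A) / N for equally likely outcomes (equation (1))."""
    event = set(event) & set(sample_space)   # keep only outcomes that can occur
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}

p_A = classical_probability(A, S)
p_sum = sum(classical_probability({w}, S) for w in A)   # P(2) + P(4) + P(6)

print(p_A, p_sum)   # both routes give the same value
```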

Example 5. Let the experiment be tossing a single coin 3 times, or tossing 3 coins simultaneously.

The total possible outcomes of this experiment can be obtained by applying the Cartesian product
principle:
$\{H, T\} \times \{H, T\} \times \{H, T\} = \{HH, HT, TH, TT\} \times \{H, T\}$
$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$
If the coin is well balanced, or fair, these 8 possible outcomes are mutually exclusive and equally
likely. Each single outcome, or sample point, has probability $\frac{1}{8} = \frac{1}{N}$. Now suppose
that we want the probability that the result is at least two heads. Let the event of at least two
heads be represented by B.

$B = \{HHH, HHT, HTH, THH\}$. Then the probability of the event B occurring is given by
$$P(B) = \frac{N(B)}{N} = \frac{4}{8} = \frac{1}{2}$$
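
The sample space and the event B of Example 5 can also be listed by machine. The sketch below builds the Cartesian product with the standard library and assumes a fair coin, as the example does.

```python
from fractions import Fraction
from itertools import product

# Sample space of three tosses: the Cartesian product {H, T} x {H, T} x {H, T}.
S = ["".join(outcome) for outcome in product("HT", repeat=3)]

# Event B: at least two heads.
B = [outcome for outcome in S if outcome.count("H") >= 2]

p_B = Fraction(len(B), len(S))   # classical probability N(B) / N
print(S)
print(B, p_B)
```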
Example 6. The probability calculation associated with drawing a card, such as a spade, at random
from a deck of playing cards is another example of the application of the classical definition, or interpretation,
of probability. Dear student, refer to the discussion of probability in your introduction to statistics
module for additional examples of this sort.
According to the classical definition, the probability P(A) of an event A is determined a priori, without
actual experimentation, i.e. no actual experiment need ever take place. In other words, it is possible
to conceive of the experiment, say, of throwing a coin 3 times without actually doing it, and to proceed
logically on the assumption of an unbiased coin. That is why it is called a priori probability.
The classical definition was introduced as a consequence of the principle of insufficient reason: "In the
absence of any prior knowledge, we must assume that the (elementary) events $A_i$ have equal probabilities." This
conclusion is based on the subjective interpretation of probability as a measure of our state of
knowledge about the events $A_i$.
By the classical definition, the probability of an event A is a number between 0 and 1 inclusive, i.e. $0 \le P(A) \le 1$.

(i) $P(A) \le 1$ because the total number of possible outcomes cannot be less than the
number of outcomes with the specified attribute, i.e. $N \ge N(A)$.
(ii) If an event is certain to happen, its probability is 1.
(iii) If it is certain not to happen, its probability is 0.

Critique:
The classical definition of probability breaks down in the following cases.

a) If the various outcomes of an experiment are not mutually exclusive and equally likely.

Example 7. What is the probability of rolling a 4 with a single toss of a die if the die is
biased? To put it differently, if the die is loaded and the probability of a 4 equals, say, 0.2, the number
0.2 cannot be calculated from the ratio given by equation (1).

Example 8. The classical definition will not help us when we try to answer questions such as
- What is the probability that a child that will be born next in Bahir Dar will be a boy?
- What is the probability that a male will die before age 50?
- What is the probability that a candidate will pass in a certain test?
- What is the probability that it will rain tomorrow?

Since the outcomes in the above questions are not equally likely, it is impossible to answer probability questions of
this sort using the classical definition.

b) When the number of possible outcomes is infinite.

To deal with the above sorts of questions, we can consider broader notions of probability. One widely
applicable notion is the relative frequency, or a posteriori, definition of probability.

Even though the classical interpretation of probability has the serious weaknesses mentioned above, it is still
a widely applicable definition, for two reasons:

i) In a number of applications, the assumption that there are N equally likely alternatives is well
established through long experience. Moreover, we can often, by an appropriate choice of the
sample space, reduce the problem to one in which all outcomes are equally likely.

ii) In a number of applications it is impossible to determine the probabilities of various events
by repeating the experiment at hand a sufficient number of times. In such cases, we have no
choice but to assume that certain alternatives are equally likely and to determine the desired
probability using the classical definition. In this case the classical definition is used as a
working hypothesis. This hypothesis is accepted if its observable consequences agree
with experience; otherwise it is rejected.
1.3.2. Relative Frequency or a posteriori definition of probability
We raised some important questions of probability when we discussed the shortcomings of the
classical definition (see Example 8). To deal with those sorts of questions, we have to consider
alternative and broader ways of calculating the probability of events. One is the relative frequency, or
a posteriori, definition of probability. The relative frequency interpretation of probability is based on the
following concept:

Suppose that we repeat an experiment a large number N of times under essentially similar conditions,
and suppose that A is some event which may or may not occur on each repetition. Experience of such
experimentation shows that the proportion of times that A occurs settles down to some value as $N \to \infty$.
That is to say, if N(A) is the number of times that A occurs in the N trials, the ratio $\frac{N(A)}{N}$
appears to converge to a constant limit as N increases indefinitely. We can take the ultimate value of
this ratio as P(A). Then the relative frequency definition is given by
$$P(A) = \lim_{N \to \infty} \frac{N(A)}{N} \qquad (2)$$

That is, assume that a series of experiments can be made keeping the initial conditions as equal as
possible. An observation of a random experiment is made, then the experiment is repeated and another
observation is taken. When the experiment is repeated a sufficiently large number of times, in many
cases the observations fall into certain classes in which the relative frequencies are quite stable. This
stable relative frequency can be taken as the (approximate) probability of the event.
Example 9. Consider tossing a coin (balanced or unbalanced) 1000 times. Suppose the following result is
obtained from this experiment.

Outcome   Observed    Observed relative   Long-run expected
          frequency   frequency           relative frequency
H         540         0.54                0.50
T         460         0.46                0.50
Total     1000        1.00                1.00

Probability of obtaining H: $P(H) = \frac{540}{1000} \approx 0.5 = \frac{1}{2}$, and similarly $P(T) \approx 0.5$.
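
The settling down of the relative frequency can be imitated with a simulated coin. This is only a sketch: the pseudo-random generator stands in for a real coin, the seed value is arbitrary, and `p_head` is a hypothetical parameter that also allows an unbalanced coin.

```python
import random

def relative_frequency(n_trials, p_head=0.5, seed=0):
    """Simulate n_trials coin tosses and return N(H) / N."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    heads = sum(rng.random() < p_head for _ in range(n_trials))
    return heads / n_trials

# The ratio N(H)/N for an increasing number of tosses.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(n))
```

For small N the ratio can be far from 0.5, as with the 0.54 observed in the table; it tends to move closer as N grows.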
Example 10. If we find, upon examination of large records, that about 51% of births in one locality
are male, it might be reasonable to postulate that the probability of a male birth in this locality is equal
to p, with $p \approx 0.51$, i.e.
P(male birth in that locality) $= p \approx 0.51$.

Example 11. A study of 8,000 economics graduates of A.A.U was conducted. The study revealed that
400 of these students were not employed in their major areas of study. What is the probability that a
particular economics graduate will be employed in an area other than his/her field of study?

$$P(\text{a graduate will be employed in an area other than his/her major}) = p \approx \frac{400}{8000} = \frac{1}{20}$$

What we learn from the above examples is that the probability of an event happening in the long run
is determined by observing what fraction of the time similar events happened in the past:
$$\text{Probability of event happening} = \frac{\text{number of times event occurred in the past}}{\text{total number of observations}}$$
Dear distance student! Please notice that the relative frequency definition does not require a) the events to
be equally likely, or b) the objects of the experiment to be unbiased. When the
objects of the experiment are biased (not balanced), it is the relative frequency definition that must be used
to determine the probability of events.

In general, the important thing is that it is possible to conceive of a series of observations or
experiments under identical conditions. Then a number p can be postulated as the probability of the
event A happening, and p can be approximated by the relative frequency of the event A in a series of
experiments, i.e.
$$P(A) = p \approx \frac{N(A)}{N}, \quad N \to \infty \qquad (3)$$
1.3.3. Subjective Definition
There are many situations, however, in which the relative frequency definition is difficult to apply. If
there is little or no past experience on which to base a probability, it may be assessed subjectively.
Essentially, this means evaluating the available information and then estimating the probability of an
event. This probability is known as subjective probability.

Example 12.
i) Estimating the probability that a new product of a firm will be successful in
the market.
ii) Estimating the probability that a distance student will score an A in the course. It depends on
an individual's evaluation of the student's performance in other subjects.
iii) When a person, perhaps a friend of yours, says that the probability of rain tomorrow is, say, 70%,
he is expressing his personal degree of belief.
You may have noticed from the above examples that subjective probabilities are based on any and all
information available to each person about the uncertainties related to the outcome of the event, and
will differ from person to person.

The concept of subjective probability is the foundation of the Bayesian approach to statistics, which we
do not deal with in this course.

1.4. Axioms of Probability: the rules of Probability


The modern theory of probability is based on the axiomatic approach introduced by the Russian
mathematician A.N. Kolmogorov. In the axiomatic approach some concepts are laid down and certain
properties, commonly known as axioms, are defined, and from these axioms the entire theory is
derived by logical deduction.

The axiomatic definition of probability includes the above three definitions of probability but is free
from their drawbacks.

Discussion of the axiomatic definition of probability is conveniently couched in the language of set
theory. Hence, before giving the axiomatic definition of probability, we present below a summary
of what we know about set theory that is relevant to our interest.

1. De Morgan's Law
$(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$.
[Figure: Venn diagrams of two overlapping events A and B illustrating $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$]

8. Set difference Law
$A - B = A \cap B^c$
$A - B = A - (A \cap B) = (A \cup B) - B$
$A - (B - C) = (A - B) \cup (A \cap C)$
9. Mutually exclusive or disjoint subsets. Subsets A and B of S are defined to be disjoint or
mutually exclusive if $A \cap B = \emptyset$. Similarly, subsets $A_1, A_2, \ldots$ are mutually exclusive if $A_i \cap A_j = \emptyset$ for every $i \ne j$.
10. If A and B are subsets of S then
a) $A = AB \cup AB^c$
b) $AB \cap AB^c = \emptyset$. Thus AB and $AB^c$ are disjoint subsets.
11. If $A \subset B$, then $AB = A$ and $A \cup B = B$.
1.4.1. Axiomatic Definition of Probability
Given the sample space S and event space E (assumed to be an algebra of events), a probability
function P(A) is defined as a set function with domain E (algebra of events) and counter domain [0,
1] which satisfies the following axioms.
Axiom i) $P(A) \ge 0$ for every $A \in E$
Axiom ii) $P(S) = 1$
Axiom iii) If $A_1, A_2, \ldots$ is a (finite or infinite) sequence of mutually exclusive
events in E (i.e. $A_i \cap A_j = \emptyset$ for $i \ne j$) and if $A_1 \cup A_2 \cup \cdots = \bigcup_{i=1}^{\infty} A_i \in E$, then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
Note that when we say algebra of events we mean that events in E satisfy the algebra of subsets
mentioned in the summary of the algebra of set operations above. This is because events are subsets of a
sample space.

From the above 3 axioms we are able to deduce a number of laws (properties) of probability.
Since E is an algebra of events, if we assume that A and $B \in E$, we know that $A^c$, $A \cup B$, $AB$, $AB^c$,
etc. are also elements of E, and hence it makes sense to talk about $P(A^c)$, $P(A \cup B)$, $P(AB)$,
$P(AB^c)$, etc.

1.4.2. Rules (properties) of Probability


For each of the following rules assume that S and E are given and P(A) is a probability with domain
E.
1. The probability of the impossible event is zero: $P(\emptyset) = 0$.
Proof
Since $S \cap \emptyset = \emptyset$, S and $\emptyset$ are disjoint events, and $S \cup \emptyset = S$.
$P(S \cup \emptyset) = P(S)$
$P(S) + P(\emptyset) = P(S)$ ………… by axiom iii
$\Rightarrow P(\emptyset) = P(S) - P(S) = 1 - 1 = 0$

2. Probability of the complementary event.
If A is an event in E, then $P(A^c) = 1 - P(A)$ …………………(4)
Proof
$A \cap A^c = \emptyset$, so A and $A^c$ are disjoint events.
$A \cup A^c = S$
$P(A \cup A^c) = P(S)$
$P(A) + P(A^c) = 1$ ………by axioms iii and ii.
$\Rightarrow P(A^c) = 1 - P(A)$
3. Law of addition of probabilities
a) General law of addition of probability
i) If A and B are any two events in E, the probability of occurrence of at least one of
the two events is given by
$P(A \cup B) = P(A) + P(B) - P(AB)$ ………………(5)
The proof of this law is facilitated by a Venn diagram of two overlapping events A and B.
Proof
$A \cup B = A \cup A^c B$ and $A^c B = B - A = B - AB$.
Since $A \cap A^c B = \emptyset$, A and $A^c B$ are disjoint, so
$P(A \cup B) = P(A) + P(A^c B)$
But $B = AB \cup A^c B$ with AB and $A^c B$ disjoint, so $P(A^c B) = P(B) - P(AB)$. Then
$P(A \cup B) = P(A) + P(B) - P(AB)$

We can arrive at the same result by the relative frequency approach.

$$P(A \cup B) = \frac{N(A \cup B)}{N(S)} = \frac{\text{number of outcomes in } A \cup B}{\text{size of the sample space}}$$
$$P(A \cup B) = \frac{[N(A) - N(AB)] + N(AB) + [N(B) - N(AB)]}{N(S)} = \frac{N(A) + N(B) - N(AB)}{N(S)}$$
$$= \frac{N(A)}{N(S)} + \frac{N(B)}{N(S)} - \frac{N(AB)}{N(S)} = P(A) + P(B) - P(AB)$$
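
Equation (5) can be verified by counting on a small sample space. The sketch below uses one roll of a fair die; the event B = {4, 5, 6} is an illustrative choice that does not appear in the text.

```python
from fractions import Fraction

S = set(range(1, 7))                 # one roll of a fair die

def P(event):
    """Classical probability N(event) / N(S)."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                        # even outcome
B = {4, 5, 6}                        # outcome greater than 3 (illustrative choice)

lhs = P(A | B)                       # P(A or B), counted directly
rhs = P(A) + P(B) - P(A & B)         # right-hand side of equation (5)
print(lhs, rhs)
```

Subtracting P(AB) corrects for the outcomes 4 and 6, which would otherwise be counted twice.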
ii) For three events A, B, C in E,
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(AC) + P(ABC)$ ……(6)

Proof
$$P(A \cup B \cup C) = \frac{N(A \cup B \cup C)}{N(S)} = \frac{1}{N(S)}\left[N(A) - N(AB) - N(AC) + N(ABC) + N(B) - N(BC) + N(C)\right]$$

After rearranging and dividing each term by N(S), we get
$$\frac{N(A)}{N(S)} + \frac{N(B)}{N(S)} + \frac{N(C)}{N(S)} - \frac{N(AB)}{N(S)} - \frac{N(AC)}{N(S)} - \frac{N(BC)}{N(S)} + \frac{N(ABC)}{N(S)}$$
[Figure: Venn diagram of three overlapping events A, B and C]

$\therefore P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$


iii) When we extend the general law of addition of probabilities to n events,
i.e. $A_1, A_2, \ldots, A_n \in E$, we have
$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) - \cdots + (-1)^{n+1} P(A_1 A_2 \cdots A_n) \qquad (7)$$
A rule for remembering the algebraic sign is that probabilities involving an even number of events occur
with a negative sign, the others with a positive sign.
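
The alternating-sign rule of equation (7) translates into a short loop over all groups of k events. The following sketch checks it against a direct count; the 12-outcome sample space and the three events are hypothetical examples, not taken from the module.

```python
from fractions import Fraction
from itertools import combinations

S = set(range(1, 13))                # hypothetical experiment: 12 equally likely outcomes

def P(event):
    """Classical probability N(event) / N(S)."""
    return Fraction(len(event), len(S))

def inclusion_exclusion(events):
    """Right-hand side of equation (7) for a list of events."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)       # + for an odd number of events, - for an even number
        for group in combinations(events, k):
            total += sign * P(set.intersection(*group))
    return total

events = [
    {n for n in S if n % 2 == 0},    # multiples of 2
    {n for n in S if n % 3 == 0},    # multiples of 3
    {n for n in S if n > 8},         # outcomes greater than 8
]

direct = P(set.union(*events))       # left-hand side, counted directly
print(direct, inclusion_exclusion(events))
```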
Corollary. $P(A - B) = P(A) - P(AB)$ ……………………………(8)
We know that $A - B = AB^c = A - AB$. Then $P(A - B) = P(AB^c) = P(A) - P(AB)$.

b) Mutually Exclusive Events


If events are mutually exclusive (disjoint), we simply apply axiom iii to obtain the probability
of occurrence of at least one of them.
i) If A and B are mutually exclusive events, then the probability of occurrence
of A or B is the sum of the individual probabilities:
$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$ ……………………………………………….(9)
ii) For three mutually exclusive events, i.e. A, B, C $\in E$, the probability of occurrence of A or B or
C is the sum of their individual probabilities:
$P(A \cup B \cup C) = P(A) + P(B) + P(C)$ …………………………….(10)
iii) If we partition (classify) the sample space into n mutually exclusive groups, say $A_1, A_2, A_3, \ldots, A_n$,
then we have $S = A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n$, i.e. $S = \bigcup_{i=1}^{n} A_i$. Then
$$P(S) = P\left(\bigcup_{i=1}^{n} A_i\right) = P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{i=1}^{n} P(A_i) \qquad (11)$$
Since P(S) = 1, we have $1 = P(A_1) + P(A_2) + \cdots + P(A_n)$.
Corollary. If A and B are elements of E, then $P(A) = P(AB) + P(AB^c)$.

Proof
We know that $A = A \cap S = A \cap (B \cup B^c) = AB \cup AB^c$. We also know that $AB \cap AB^c = \emptyset$, hence AB and $AB^c$
are mutually exclusive events. Therefore
$P(A) = P(AB \cup AB^c) = P(AB) + P(AB^c)$
Generalization

If $A_1, A_2, A_3, \ldots$ is a sequence of mutually exclusive events in E, and $\bigcup_{i=1}^{\infty} A_i \in E$, then
$$P(A_1 \cup A_2 \cup \cdots) = P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

4. Some Laws (theorems) on inequality of probabilities
A) If A and $B \in E$ and $A \subset B$, then
$P(A) \le P(B)$ …………………………..(12)
Proof
$B = BA \cup BA^c$. But $BA = A$, therefore $B = A \cup BA^c$.
Since $A \cap BA^c = \emptyset$, A and $BA^c$ are disjoint events.
Hence $P(B) = P(A \cup BA^c) = P(A) + P(BA^c)$.
So $P(A) = P(B) - P(BA^c) \le P(B)$, and if $P(BA^c) > 0$ then $P(A) < P(B)$.

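B) Boole's inequality.
If $A_1, A_2, \ldots, A_n \in E$, then
$$P\left(\bigcup_{i=1}^{n} A_i\right) \le P(A_1) + P(A_2) + \cdots + P(A_n) \qquad (13)$$

Proof
Take two events $A_1$ and $A_2 \in E$. Then
$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 A_2) \le P(A_1) + P(A_2)$
That is, $P(A_1 \cup A_2) = P(A_1) + P(A_2)$ if $A_1 \cap A_2 = \emptyset$.
However, if $P(A_1 A_2) > 0$, then $P(A_1 \cup A_2) < P(A_1) + P(A_2)$. Applying the two-event result repeatedly extends the inequality to n events.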
Thus far we have seen some of the rules of probability; what is left is the multiplication of probabilities.
We will discuss this rule together with conditional probability and independence. Before
that we will look at counting procedures.
1.5. Counting Procedure
If the number of possible outcomes in an experiment is small, it is relatively easy to list and count all
the possible events and assign probabilities to all of them. However, if the size of the event, say N(A),
and the size of the sample space, N(S), are large for a given random experiment with a finite number
of equally likely outcomes, counting becomes a difficult problem. Such counting is
usually handled by counting procedures known as combinatorial formulas.
There are three widely used counting (enumeration) methods, namely:
• multiplication principle
• permutations
• combinations
We will see them one by one.
I. Multiplication Principle
Suppose we have two sets F and T. If F has m distinct objects $f_1, f_2, \ldots, f_m$ and T has n distinct
objects $t_1, t_2, \ldots, t_n$, then the number of pairs $(f_i, t_j)$ that can be formed by taking one object from set
F and a second from set T is (m)(n). ……………..(14)

Example 13. Let the random experiment be throwing a balanced coin and a fair die and recording the
paired outcomes. If set F contains the possible outcomes of throwing a coin, its elements are {H, T}
and N(F) = 2; if set T contains the possible outcomes of throwing a die, its elements are
{1, 2, 3, 4, 5, 6} and N(T) = 6. The outcomes of our random experiment are obtained by the Cartesian
product of the two sets. That is,
$\{H, T\} \times \{1, 2, 3, 4, 5, 6\}$
The total possible paired outcomes are
S = { (H,1), (H,2), (H,3), (H,4), (H,5), (H,6),
      (T,1), (T,2), (T,3), (T,4), (T,5), (T,6) }

Therefore the total number of possible outcomes is 12. It can be obtained simply by taking the product
of the sizes of the two sets under discussion. In our example it is 2 x 6 = 12.

In general, if there are m ways of doing one thing and n ways of doing another thing, there are (m)(n)
ways of doing both.
Multiplication formula. Total number of arrangements = (m)(n)

This principle can obviously be extended to any number of procedures. That is, if there are m ways of
doing one thing, n ways of doing another thing and r ways of doing a third thing, then the total number of
arrangements is given by (m)(n)(r).

Example 14. A manufactured item must pass through three control stations. At each station the item
is inspected for particular characteristics and marked accordingly. At the first station 3 ratings are
possible, while at the last two stations 4 ratings are possible. Therefore there are 3 x 4 x 4 = 48 ways in
which the item may be marked.
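
The counts in Examples 13 and 14 can be confirmed by listing the outcomes. In this sketch the rating labels for Example 14 are hypothetical placeholders; only the numbers of ratings (3, 4 and 4) come from the text.

```python
from itertools import product

# Example 13: one coin and one die.
F = ["H", "T"]
T = [1, 2, 3, 4, 5, 6]
S = list(product(F, T))              # all pairs (f_i, t_j)
print(len(S), len(F) * len(T))       # both counts are m * n

# Example 14: three inspection stations with 3, 4 and 4 possible ratings.
# The rating labels are hypothetical; only their counts come from the text.
stations = [range(3), range(4), range(4)]
markings = list(product(*stations))
print(len(markings))
```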

Corollary: Suppose that there are m ways of doing one thing and n ways of doing
another thing, and suppose that, unlike the case discussed above, it is not possible to do both
together. Then the number of ways in which we can do the first or the second is m + n.
This can be extended to more than two distinct sets.

Example 15. Suppose that you and your friends are planning a trip to some tourist attraction areas,
and all of you are deciding between bus and air transportation. If there are 3 bus routes and 1 air
route, then there are 3 + 1 = 4 different routes available for your trip.

II. Permutations
Suppose that we have n different objects. The question is: in how many ways may these objects
be arranged (permuted) when the order of arrangement is important? That is, ABC and CBA are
considered to be two different arrangements. In general, consider the following scheme. Arranging n
objects is equivalent to placing them into a box with n compartments in some specified order.
1 2 . . . n
We have n choices to fill the first compartment. Once we choose any one of the n objects to fill the first
compartment, we have (n-1) options to fill the second compartment, and so on; for the last
compartment we have exactly one option. The total number of arrangements, denoted by $_nP_n$, is
given by
$$_nP_n = n(n-1)(n-2)(n-3) \cdots (2)(1) = n!$$
We may be interested in the number of permutations possible when we choose r (< n) objects from n,
that is, the arrangement (permutation) of n objects taking r objects at a time. As in the above case, the
first compartment can be filled in n ways, the second in (n-1) ways, and so on, and the last
compartment in (n-r+1) ways. Hence the total number of permutations of n objects taken r at a time is
$$_nP_r = n(n-1)(n-2)(n-3) \cdots (n-r+1) = \frac{n!}{(n-r)!} \qquad (15)$$
Example 16. A manufacturer uses a color code to identify lots of manufactured items. The code
consists of stamping seven colored stripes on the container. The order of the colors is significant and
each identification uses all 7 colors.

Suppose that the manufacturer wishes to use seven stripes but has 15 colors available. How many
distinct markings can he get?

$$_{15}P_7 = \frac{15!}{(15-7)!} = \frac{15!}{8!} = 32{,}432{,}400$$
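
Equation (15) and the figure in Example 16 can be checked numerically. This is a sketch: `n_P_r` is an illustrative helper name, and `math.perm` assumes Python 3.8 or later.

```python
import math
from itertools import permutations

def n_P_r(n, r):
    """Number of ordered arrangements of r objects chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_P_r(15, 7))                       # Example 16: 15 colors, 7 stripes
print(n_P_r(15, 7) == math.perm(15, 7))   # agrees with the standard-library helper

# Brute-force check on a small case: arrangements of 2 letters out of "ABC".
small = list(permutations("ABC", 2))
print(len(small), n_P_r(3, 2))
```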
III. Combinations
When arrangements are made without regard to order, they are called combinations. In this case ABC and
CBA are considered to be the same combination. The number of combinations of n different objects
taken r (0 < r < n) at a time, denoted by $_nC_r$ or $\binom{n}{r}$, is
$$_nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!} = \frac{_nP_r}{r!} \qquad (16)$$

Example 17. From eight persons, how many committees of 3 members may be chosen? Since two
committees are the same if they are made up of the same members, we have
$$\binom{8}{3} = \frac{8!}{5!\,3!} = 56$$
possible committees.
Example 18. A group of eight persons consists of 5 men and 3 women. How many committees of 2 men
and 1 woman can be formed? Here we must do two things:
choose 2 men out of 5, and choose one woman out of 3. The required number of committees is given by
$$\binom{5}{2} \cdot \binom{3}{1} = 30 \text{ committees}$$
Dear colleague! So far, what we have discussed in this subsection is determining the size of the
sample space, i.e. n(S). Obviously you are asking yourself how to use these counting (enumeration)
methods to determine the size of an event and hence obtain its probability.
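
The committee counts of Examples 17 and 18 can be reproduced with the standard library. In this sketch `math.comb` assumes Python 3.8 or later, and the member names are placeholders.

```python
import math
from itertools import combinations

# Example 17: committees of 3 chosen from 8 persons.
print(math.comb(8, 3))

# Example 18: 2 men out of 5 and 1 woman out of 3 (multiplication principle).
print(math.comb(5, 2) * math.comb(3, 1))

# Cross-check Example 18 by listing the committees (names are placeholders).
men = ["M1", "M2", "M3", "M4", "M5"]
women = ["W1", "W2", "W3"]
committees = [(pair, w) for pair in combinations(men, 2) for w in women]
print(len(committees))
```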

1.6. Conditional Probability and Independence

Dear Learner! Conditional probability and the probability of independent events are
easier to understand after first learning the concepts of joint and marginal probabilities.

Joint and Marginal probabilities.


In considering events which can be classified by two or more criteria it is often helpful to portray the
situation in one of the following tables.

Table 1. Classification by two criteria

                  $B$               $\bar{B}$               Marginal total
$A$               $P(AB)$           $P(A\bar{B})$           $P(A)$
$\bar{A}$         $P(\bar{A}B)$     $P(\bar{A}\bar{B})$     $P(\bar{A})$
Marginal total    $P(B)$            $P(\bar{B})$            1

The probabilities inside the body of the table are joint probabilities, i.e. $P(AB)$, $P(A\bar{B})$, $P(\bar{A}B)$ and $P(\bar{A}\bar{B})$. In other words, they are probabilities of the joint occurrence of two events.
The probabilities given in the last column and row are marginal probabilities. Note that marginal probabilities are the usual probabilities and are obtained by summing the joint probabilities in a given row or column. For instance, $P(A) = P(AB) + P(A\bar{B})$ and $P(\bar{B}) = P(A\bar{B}) + P(\bar{A}\bar{B})$.

1.6.1. Conditional Probability


An experiment is repeated N times, and on each occasion we observe the occurrence or non-occurrence of two events, say A and B. Given these two events, we are interested in the probability of event A given that event B has occurred. This probability is known as the conditional probability of event A given that B has occurred, and is denoted by P(A/B). Similarly, the conditional probability of event B given that A has occurred is denoted by P(B/A).

Definition. Let A and B be two events in E (the event space) of the given probability space. If P(B) > 0, the conditional probability that event A occurs given that B has occurred is defined by
$$ P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(AB)}{P(B)} = \frac{\text{joint probability of } A \text{ and } B}{\text{marginal probability of } B} \qquad (23) $$
Similarly,
$$ P(B/A) = \frac{P(AB)}{P(A)}, \quad \text{if } P(A) > 0. \qquad (24) $$
Example 22. Two fair dice are thrown. Given that the first shows 3, what is the probability that the total exceeds 6?

Solution.
Clearly S = {1,2,3,4,5,6} x {1,2,3,4,5,6}, so N(S) = 36.
Let B be the event that the first die shows 3, and A be the event that the total exceeds 6. In the usual notation,
$$ B = \{(3,b) : 1 \le b \le 6\}, \qquad A = \{(a,b) : a+b > 6\}, $$
$$ AB = \{(3,4), (3,5), (3,6)\} \quad \text{(joint occurrence)}. $$
Hence
$$ P(A/B) = \frac{P(AB)}{P(B)} = \frac{N(AB)/N}{N(B)/N} = \frac{3}{36} \div \frac{6}{36} = \frac{3}{6} = \frac{1}{2}. $$
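
As a cross-check of Example 22, the conditional probability can be obtained by listing the sample space and counting. This is only a sketch under the equally-likely assumption of the example; the names `S`, `B` and `AB` mirror the notation of the solution.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
S = list(product(range(1, 7), repeat=2))

B = [(a, b) for (a, b) in S if a == 3]        # first die shows 3
AB = [(a, b) for (a, b) in B if a + b > 6]    # ... and the total exceeds 6

# P(A/B) = N(AB) / N(B): only outcomes inside B need to be counted.
p_A_given_B = Fraction(len(AB), len(B))
print(p_A_given_B)
```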

Example 23. A family has two children. What is the probability that both are boys, given that at least
one is a boy?
Solution. The older and younger child may each be male or female, so there are four possible
combinations of sexes, which we assume to be equally likely. Hence the sample space will be,
S= { GG,GB,BG,BB}
We have P(GG) = P(GB) = P(BG) = P(BB) = ¼
The question is to obtain P(BB / at least one boy):
$$ P(BB / GB \cup BG \cup BB) = \frac{P(BB \cap (GB \cup BG \cup BB))}{P(GB \cup BG \cup BB)} = \frac{P(BB)}{P(GB \cup BG \cup BB)} = \frac{1}{4} \div \frac{3}{4} = \frac{1}{4} \times \frac{4}{3} = \frac{1}{3} $$
From the definition of conditional probability we can see that
$$ P(AB) = P(A/B)\,P(B) = P(B/A)\,P(A) $$
An important effect of conditioning is that it reduces the sample space. To make this idea clear, let us revisit the formula
$$ P(A/B) = \frac{P(AB)}{P(B)}. $$
Using the frequency approach, $P(AB) = N(AB)/N$, where N(AB) is the number of occurrences of AB in the N repetitions, and $P(B) = N(B)/N$. Hence
$$ P(A/B) = \frac{N(AB)}{N} \times \frac{N}{N(B)} = \frac{N(AB)}{N(B)}. $$
From this we can see that the given event B is our new sample space, whose size is smaller than the size of the actual sample space S.

In passing, we would like to point out that all probabilities are in fact conditional, because probabilities cannot be computed except on the basis of the probabilities of basic events. Frequently the conditioning statement in a probability statement is omitted, either because it is obvious in the context of the problem or because it is universally accepted; that is, P(A) = P(A/S).

The next question to be investigated is whether conditional probability satisfies the three axioms and the various rules of probability. The answer is yes.
i) $P(A/B) = P(AB)/P(B) \ge 0$ for every $A \in E$; in fact $0 \le P(A/B) \le 1$.
ii) $P(S/B) = P(SB)/P(B) = 1$
Proof
Since $B \subset S$, we have $B \cap S = B$. Hence
$$ \frac{P(SB)}{P(B)} = \frac{P(B)}{P(B)} = 1 $$
iii) If $A_1, A_2, \ldots$ is a sequence of mutually exclusive events in E (that is, $A_i \cap A_j = \emptyset$ for $i \ne j$) and $\bigcup_{i=1}^{\infty} A_i \in E$, then
$$ P\Big(\bigcup_{i=1}^{\infty} A_i / B\Big) = P(A_1 \cup A_2 \cup \cdots / B) = \frac{P\big[\bigcup_{i=1}^{\infty} (A_i B)\big]}{P(B)} = \frac{\sum_{i=1}^{\infty} P(A_i B)}{P(B)} $$
$$ = P(A_1/B) + P(A_2/B) + P(A_3/B) + \cdots = \sum_{i=1}^{\infty} P(A_i/B) \qquad (24) $$

The conditional probabilities also satisfy the following rules /properties/ of probabilities:

Assume that the probability space is given and let $B \in E$ with P(B) > 0.

1. $P(\emptyset/B) = 0$
$P(\emptyset/B) = P(\emptyset \cap B)/P(B) = 0$ [since $\emptyset \cap B = \emptyset$ and $P(\emptyset) = 0$]
2. If A is an event in E (event space), then
$P(\bar{A}/B) = 1 - P(A/B)$
Proof
We know that $A \cup \bar{A} = S$ and $A \cap \bar{A} = \emptyset$. Using this result we have
$P(A \cup \bar{A}/B) = P(A/B) + P(\bar{A}/B) = P(S/B)$ [axiom iii]
$P(A/B) + P(\bar{A}/B) = 1$ [axiom ii]
$\Rightarrow P(\bar{A}/B) = 1 - P(A/B)$
3. If $A_1$ and $A_2 \in E$, then
$P(A_1/B) = P(A_1A_2/B) + P(A_1\bar{A}_2/B)$
Proof
We can write $A_1 = A_1A_2 \cup A_1\bar{A}_2$,
and we know $A_1A_2 \cap A_1\bar{A}_2 = A_1(A_2 \cap \bar{A}_2) = \emptyset$.

Using this result we have

$P(A_1/B) = P(A_1A_2 \cup A_1\bar{A}_2/B) = P(A_1A_2/B) + P(A_1\bar{A}_2/B)$
4. For every two events $A_1$ and $A_2 \in E$,
$P(A_1 \cup A_2/B) = P(A_1/B) + P(A_2/B) - P(A_1A_2/B)$

1.6.2. Theorem of total probability


Let $B_1, B_2, \ldots, B_n$ be a partition (a collection of mutually exclusive events whose union is S) of the sample space S, satisfying
$$ S = \bigcup_{j=1}^{n} B_j, $$
and let $P(B_j) > 0$ for j = 1, ..., n. Then for every A in E (event space)
$$ P(A) = \sum_{j=1}^{n} P(A/B_j)\,P(B_j) \qquad (25) $$

Proof
We know that
$$ A = A \cap S = A \cap (B_1 \cup B_2 \cup \cdots \cup B_n) = AB_1 \cup AB_2 \cup \cdots \cup AB_n = \bigcup_{j=1}^{n} AB_j, $$
and the events $AB_j$ are mutually exclusive. Then
$$ P(A) = P\Big(\bigcup_{j=1}^{n} AB_j\Big) = \sum_{j=1}^{n} P(AB_j) = P(A/B_1)P(B_1) + P(A/B_2)P(B_2) + \cdots + P(A/B_n)P(B_n) $$
$$ \Rightarrow P(A) = \sum_{j=1}^{n} P(A/B_j)\,P(B_j) $$
The total probability theorem helps us to calculate the probability of a single event A using conditional probabilities and the concept of the simultaneous occurrence of two events.
The formation of a partition of S means that when the experiment is performed, exactly one of the events $B_j$ will occur. The result
$$ P(A) = \sum_{j} P(A/B_j)\,P(B_j) $$
is an extremely useful relationship, because P(A) is often difficult to compute directly. But with the additional information that $B_j$ has occurred, we may be able to evaluate $P(A/B_j)$, and from these conditional probabilities together with the $P(B_j)$ it is possible to compute P(A).
Example 24:- A certain item is manufactured by three factories, say, 1, 2 and 3. It is known that
factory 1 produce twice as many items as factory 2, and factories 2 and 3 produce the same number of
items in a given production period. It is also known that 2% of items produced by 1 and 2 are
defective while 4% of items produced by 3 are defective. All the items produced are put into one
stockpile, and then one item is chosen at random. What is the probability that this item is defective?

Solution
Let A = {the item is defective}
B1 = {the item came from factory 1}
B2 = {the item came from factory 2}
B3 = {the item came from factory 3}
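Since factory 1 produces twice as many items as factory 2, and factories 2 and 3 produce the same number,
$$ P(B_1) = \frac{1}{2}, \quad \text{while} \quad P(B_2) = P(B_3) = \frac{1}{4}. $$
The conditional probabilities that the item is defective, given the factory it came from, are
P(A/B1) = 2% = 0.02
P(A/B2) = 2% = 0.02
P(A/B3) = 4% = 0.04
P(A) = P(A/B1) P(B1) + P(A/B2) P(B2) + P(A/B3) P(B3)
$$ P(A) = (0.02)\Big(\frac{1}{2}\Big) + (0.02)\Big(\frac{1}{4}\Big) + (0.04)\Big(\frac{1}{4}\Big) = 0.025 $$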
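
The computation in Example 24 is a direct application of formula (25). Below is a minimal sketch in Python, using exact fractions; the dictionary names are illustrative only, and the production shares are derived from the statement that factory 1 produces twice as much as factory 2 while factories 2 and 3 produce equal amounts.

```python
from fractions import Fraction

# Shares of total output P(Bj): factory 1 makes twice as much as factory 2,
# and factories 2 and 3 make the same amount.
p_B = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# Defective rates P(A/Bj) for each factory.
p_A_given_B = {1: Fraction(2, 100), 2: Fraction(2, 100), 3: Fraction(4, 100)}

# Theorem of total probability: P(A) = sum over j of P(A/Bj) P(Bj).
p_A = sum(p_A_given_B[j] * p_B[j] for j in p_B)
print(p_A, float(p_A))
```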

1.6.3. Independence

In general, the occurrence of some event B changes the probability that another event A occurs; P(A) is then replaced by P(A/B). If the probability remains unchanged, that is if P(A/B) = P(A), then we say that event A is independent of event B. In other words, the occurrence of B has no bearing on the occurrence of A.

Definition. Let A and B be two events in E of the given probability space .Events A and B are called
independent if and only if any one of the following conditions is satisfied
(i) P(AB) = P(A)P(B)
(ii) P(A/B) = P(A), if P(B) >0 …………………….(26)
(iii) P(B/A) = P(B), if P(A)>0
Example25. Suppose that a fair die is tossed twice.
Let A = { the first die shows an even number}
B = {the second die shows 5 or 6}
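Clearly events A and B are unrelated: knowing that B did occur provides no information about the occurrence of A. N(S) = 36 and the outcomes are equally likely, so
$$ P(A) = \frac{18}{36} = \frac{1}{2}, \qquad P(B) = \frac{12}{36} = \frac{1}{3}. $$
Counting directly, AB contains 3 x 2 = 6 outcomes, so
$$ P(AB) = \frac{6}{36} = \frac{1}{6} = \frac{1}{2} \times \frac{1}{3} = P(A)\,P(B), $$
$$ P(A/B) = \frac{P(AB)}{P(B)} = \frac{1}{6} \times \frac{3}{1} = \frac{1}{2} = P(A). $$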
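
Condition (i) of the definition can be verified for Example 25 by enumeration. The following is a sketch only; the helper `prob` and the predicates `A` and `B` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two tosses of a fair die

def prob(event):
    """Probability of an event (a predicate on outcomes), outcomes equally likely."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

A = lambda s: s[0] % 2 == 0        # first toss shows an even number
B = lambda s: s[1] in (5, 6)       # second toss shows 5 or 6

p_A, p_B = prob(A), prob(B)
p_AB = prob(lambda s: A(s) and B(s))

# Independence holds when the joint probability factorizes.
print(p_A, p_B, p_AB, p_AB == p_A * p_B)
```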
Independence of A and B implies independence of other events as well.
If A and B are two independent events in E, then A and $\bar{B}$, $\bar{A}$ and B, and $\bar{A}$ and $\bar{B}$ are all independent events. Let us see some of these:
A) $P(A\bar{B}) = P(A)\,P(\bar{B})$
Proof: We know $A\bar{B} = A - AB$, then
$P(A\bar{B}) = P(A) - P(AB)$
$= P(A) - P(A)P(B)$
$= P(A)[1 - P(B)]$
$= P(A)\,P(\bar{B})$ [because $1 - P(B) = P(\bar{B})$]
B) $P(\bar{A}B) = P(\bar{A})\,P(B)$
Proof: We know that $B\bar{A} = B - AB$, hence
$P(B\bar{A}) = P(B) - P(AB)$
$= P(B) - P(B)P(A)$
$= P(B)[1 - P(A)]$
$= P(B)\,P(\bar{A})$ [because $1 - P(A) = P(\bar{A})$]
C) $P(\bar{A}\bar{B}) = P(\bar{A})\,P(\bar{B})$
The notion of independent events may be extended to more than two events.
Independence of several events.
Let $A_1, A_2, \ldots, A_n$ be a family of events in E. Events $A_1, A_2, \ldots, A_n$ are called independent if and only if
$P(A_iA_j) = P(A_i)P(A_j)$ for $i \ne j$,
$P(A_iA_jA_k) = P(A_i)P(A_j)P(A_k)$ for $i \ne j$, $j \ne k$, $i \ne k$,
and more generally
$$ P\Big(\bigcap_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n} P(A_i) \quad [\textstyle\prod \text{ is the sign of multiplication}] \qquad (27) $$
If the family $\{A_1, A_2, \ldots, A_n\}$ has the property that $P(A_i \cap A_j) = P(A_i)P(A_j)$ for $i \ne j$, then it is called pairwise independent. Pairwise independence does not necessarily imply that the events are independent in the sense defined above. To show this is beyond the scope of this module.
The definition of independence of events is used not only to check whether two events are
independent but also to model some experiment. For example, for a given experiment the nature of
the events A and B might be such that we are willing to assume that A and B are independent; then
the definition of independence gives the probability of the event AB in terms of P(A) &P(B).

Example 26: Consider the experiment of sampling with replacement from an urn containing M balls, of which K are black and M-K are white. Since balls are replaced after each draw, it is reasonable to assume that the outcome of the second draw is independent of the outcome of the first. Then
$$ P(\text{2 blacks in first two draws}) = P(\text{black on 1st draw}) \times P(\text{black on 2nd draw}) = \Big(\frac{K}{M}\Big)\Big(\frac{K}{M}\Big) = \Big(\frac{K}{M}\Big)^{2} $$

Multiplication rule of probability

Let $A_1, \ldots, A_n$ be events in E for which $P(A_1 \cap \cdots \cap A_{n-1}) > 0$; then
$$ P(A_1A_2 \cdots A_n) = P(A_1)\,P(A_2/A_1)\,P(A_3/A_1A_2) \cdots P(A_n/A_1A_2 \cdots A_{n-1}) \qquad (28) $$
To see how this arises, suppose we are interested in $P(A_1A_2A_3)$, and let the joint occurrence $A_2A_3$ be denoted by B. We know, from conditional probability, that
$$ P(A_1B) = P(A_1/B)\,P(B) = P(A_1/A_2A_3)\,P(A_2A_3) $$
We also know that $P(A_2A_3) = P(A_2/A_3)\,P(A_3)$. Substituting this in the above formula we get
$$ P(A_1A_2A_3) = P(A_1/A_2A_3)\,P(A_2/A_3)\,P(A_3) $$
It can be written in several other ways by permuting the subscripts, such as
$$ P(A_1A_2A_3) = P(A_3/A_1A_2)\,P(A_2/A_1)\,P(A_1) $$
Then, extending the above chain rule to n events, we get the following rule:
$$ P(A_1A_2 \cdots A_n) = P(A_1/A_2 \cdots A_n)\,P(A_2/A_3 \cdots A_n) \cdots P(A_{n-1}/A_n)\,P(A_n). $$
If $A_1, A_2, \ldots, A_n$ are independent, then
$$ P(A_1A_2 \cdots A_n) = P(A_1)\,P(A_2) \cdots P(A_n) = \prod_{i=1}^{n} P(A_i) \qquad (29) $$
The multiplication rule is primarily useful for experiments defined in terms of stages.
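
As an illustration of a staged experiment, rule (28) can be applied to drawing three balls without replacement from an urn. This is only a sketch: the urn sizes `M = 10` and `K = 4` are hypothetical values chosen for the illustration and do not come from the module.

```python
from fractions import Fraction
from math import comb

M, K = 10, 4   # hypothetical urn: 10 balls, 4 of them black

# Multiplication rule for three draws without replacement:
# P(A1 A2 A3) = P(A1) P(A2/A1) P(A3/A1 A2), where Ai = "black on draw i".
p_chain = Fraction(K, M) * Fraction(K - 1, M - 1) * Fraction(K - 2, M - 2)

# Cross-check by counting: 3-ball subsets made only of black balls
# divided by all 3-ball subsets.
p_count = Fraction(comb(K, 3), comb(M, 3))

print(p_chain, p_count)
```

Both routes give the same value, which is what the multiplication rule asserts for experiments carried out in stages.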
1.7. Bayes' Theorem
Many of the important problems of economics, science, engineering, etc. are concerned with problems of causation. If P(A/B) = 1 and B occurs, then obviously A occurs. Under these circumstances, if B happens before A, it would be stated that A is because of B. Suppose that the occurrence of one of n mutually exclusive events $A_1, A_2, \ldots, A_n$ is necessary for the occurrence of B. Given that B has occurred, we wish to know which of the A's preceded it, so we ask for the probability that $A_i$ has occurred given that B occurred. This is the problem which Bayes' theorem solves. The formal statement of the theorem is as follows.

Let S be the sample space of some experiment. Let the disjoint events $A_1, A_2, \ldots, A_n$ be a partition of S satisfying
$$ S = \bigcup_{i=1}^{n} A_i $$
with $P(A_i) > 0$ for i = 1, ..., n. Then for every $B \in E$ for which P(B) > 0,
$$ P(A_i/B) = \frac{P(B/A_i)\,P(A_i)}{P(B)} = \frac{P(B/A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B/A_j)\,P(A_j)} \qquad (30) $$
Proof
$$ P(A_i/B) = \frac{P(A_iB)}{P(B)} \quad \text{and} \quad P(A_iB) = P(B/A_i)\,P(A_i), \quad \text{so} \quad P(A_i/B) = \frac{P(B/A_i)\,P(A_i)}{P(B)}. $$
But $P(B) = P(A_1B) + P(A_2B) + \cdots + P(A_nB) = \sum_{j=1}^{n} P(B/A_j)\,P(A_j)$ (from the theorem of total probability).

Therefore
$$ P(A_i/B) = \frac{P(B/A_i)\,P(A_i)}{\sum_{j=1}^{n} P(B/A_j)\,P(A_j)} $$

This is known as Bayes' theorem because Bayes (1763) was one of the first to consider this problem.
Example 27. Three different machines are used to produce chocolate chip cookies by Lovely Company, which promises to have at least six chips in every cookie. Suppose machine No. 1 produces 20% of Lovely cookies, No. 2 produces 30%, and No. 3 produces 50%. Also suppose that the machines represent different vintages of capital, so that 1% of the cookies produced by machine No. 1 are defective, in the sense that they have fewer than six chips, 2% of those produced by machine No. 2 are defective, and 3% of those produced by No. 3 are defective. If one cookie is chosen at random and observed to be defective, what is the probability that it was produced by machine No. 2?
Solution. Let $A_i$ be the event that the randomly chosen cookie is produced by machine No. i.

Thus P(A1) = 0.2, P(A2) = 0.3 and P(A3) = 0.5, and if B is the event that a randomly drawn cookie is defective, then P(B/A1) = 0.01, P(B/A2) = 0.02 and P(B/A3) = 0.03. Using Bayes' theorem we have
$$ P(A_2/B) = \frac{P(B/A_2)\,P(A_2)}{\sum_{i=1}^{3} P(B/A_i)\,P(A_i)} = \frac{(0.3)(0.02)}{(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)} = \frac{0.006}{0.023} \approx 0.26 $$
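
The calculation of Example 27 can be sketched in a few lines of Python, computing the posterior probability for every machine at once. The dictionary names `prior` and `likelihood` are illustrative labels for P(Ai) and P(B/Ai).

```python
from fractions import Fraction

prior = {1: Fraction(20, 100), 2: Fraction(30, 100), 3: Fraction(50, 100)}     # P(Ai)
likelihood = {1: Fraction(1, 100), 2: Fraction(2, 100), 3: Fraction(3, 100)}   # P(B/Ai)

# Denominator of Bayes' theorem: P(B) by the theorem of total probability.
p_B = sum(likelihood[i] * prior[i] for i in prior)

# P(Ai/B) for every machine.
posterior = {i: likelihood[i] * prior[i] / p_B for i in prior}

print(p_B, posterior[2], round(float(posterior[2]), 2))
```

Because the events Ai form a partition, the three values P(Ai/B) add up to 1.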
Bayes' theorem has an interesting interpretation. The probabilities $P(A_i)$ can be called 'prior' probabilities, since they represent the probabilities of the different machines in the above example producing a randomly selected cookie before that cookie is checked to see if it is defective. The conditional probability $P(A_i/B)$ is called a 'posterior' probability, since it represents our assignment of probabilities after the sample evidence of the defective cookie is obtained. Thus, stated verbally, Bayes' theorem says that the posterior probability of an event $A_i$ is proportional to the probability of the sample evidence given $A_i$ times the prior probability of $A_i$.
Chapter II: Random Variables and probability distribution
2.2. The Concept & Definition of Random Variable
In the previous chapter when we were describing the sample space of an experiment we did not
specify that an individual outcome of a random experiment can be or needs to be a number. In our
earlier discussions of assigning probabilities to events we have seen a number of examples in which
the results of the experiment was not a numerical quantity. For example, in classifying a manufactured
item we might simply use the category of defective and non – defective.

However, in many experimental situations we are concerned with measuring something and recording it as a number. That means in many experimental situations we want to assign a real number to every element of the sample space S.

The following examples will make our intention clear.

Example 28. Consider an experiment of tossing two coins at once. The sample space associated with
this experiment is
S= { HH, HT, TH, TT}
Let us define the random variable X to be the number of heads obtained in the two tosses.

Outcome    Number of heads    Random variable X
TT         0                  0
TH         1                  1
HT         1                  1
HH         2                  2

Thus the possible values of the random variable X are {0, 1, 2}. That means X may take a value of 0, 1 or 2 depending on the result of the experiment. So the random variable X associates a real number with each outcome of the experiment.

Example 29. Consider an experiment of tossing two 6-sided fair dice. It is known that S has 36 possible outcomes, and it is possible to define many random variables from this experiment. Let us see two of them. Let the random variable X be the sum of the upturned faces, and let Y denote the absolute difference between the upturned faces. The minimum value that the random variable X can take is 2 (i.e. 1+1), and the maximum value is 12 (i.e. 6+6). Similarly, the minimum value that the random variable Y can take is zero (i.e. |1-1|) and the maximum value is 5, i.e. either |1-6| or |6-1|. We can summarize all possible values that the two random variables can assume as follows:

X (sum of the upturned numbers)    Y (absolute value of the difference)
2                                  0
3                                  1
4                                  2
5                                  3
6                                  4
7                                  5
8
9
10
11
12

Here again, what we did is assign a real number x or y to every element (i, j) of the sample space S, depending on our characterization (sum or absolute difference of the upturned faces), thereby forming random variables.
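
Since a random variable is a function on the sample space, the two variables of Example 29 can be written as ordinary functions and applied to every outcome. The sketch below is illustrative; the function names `X` and `Y` simply follow the notation of the example.

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 outcomes (i, j)

def X(outcome):
    """Sum of the upturned faces."""
    i, j = outcome
    return i + j

def Y(outcome):
    """Absolute difference of the upturned faces."""
    i, j = outcome
    return abs(i - j)

# Set of values each random variable can take over the whole sample space.
values_X = sorted({X(w) for w in S})
values_Y = sorted({Y(w) for w in S})
print(values_X)
print(values_Y)
```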

That is, $x = X(\omega)$ or $y = Y(\omega)$ is the value of a function X or Y from the sample space to the real numbers. In short, we are mapping the sample space S, whose outcomes are possibly non-numerical, to real numbers. Therefore the sample space is the domain, and the corresponding set of real numbers is the range (counter domain) space. A diagrammatic explanation of this mapping is given below:

[Figure: the random variable X maps a sample point $\omega_i$ in S to the real number $X(\omega_i)$.]

Having this in mind, we make the following formal definition of random variable.

Definition: Let $E_1$ be an experiment and S a sample space associated with the experiment. A function X assigning to every element $\omega_i \in S$ a real number $X(\omega_i) = x$ is called a random variable.

Therefore, a random variable is a real- valued function that assigns a number to each sample point in
the sample space of a random experiment.

To remind you again, prior to an experiment we know what value the random variable X might take,
but the value that X does take, x , is not known until the experiment has been performed.

Notations
It is important to distinguish between the rule or function that assigns numbers to the sample points and the numerical values themselves. In this module we distinguish between the two, i.e. between the random variable and the value that it takes, by denoting random variables (the rule) by capital letters such as X, Y, Z, and the values of the random variables by the lower case letters x, y, z respectively. Hence, statements like P(X = x) or P(X < 5) mean the probability that the random variable X takes the value x or takes values less than 5, respectively.

Events Generated by Random Variables


Is it possible to form an event from random variables? Is it possible to assign probability for each
value of a random variable?

Just as we were concerned with events associated with the sample space S, we can generate events from random variables and then assign probabilities to them. In the study of random variables, questions of the following form arise: what is the probability that the random variable (r.v.) X is less than a given number x, or what is the probability that the r.v. X is between two values $x_1$ and $x_2$? Hence $\{X < x\}$ and $\{x_1 < X < x_2\}$ are events. Similarly, we may ask for the probability that the r.v. X takes a value exactly equal to $x_2$; in this case $\{X = x_2\}$ is an event. In Example 28 above, the r.v. X taking exactly two heads, $\{X = 2\}$, and the r.v. X taking at least one head, $\{X \ge 1\}$, are events generated from random variables.

In discussing many of the important concepts associated with random variables, it is convenient to
distinguish between discrete and continuous random variables.

2.3. Discrete Random Variables and Probability Density Function


2.3.1. Discrete Random Variable

Definition. Let X be a random variable. If the number of possible values of X is finite or countably infinite, we call X a discrete random variable. The random variables in Examples 28 and 29 above are discrete random variables. In addition, the number of defective oranges, marks on a test, the number of students in a university, etc. are examples of discrete random variables. The value of a discrete r.v. is determined by counting, thus its value is expressed in terms of whole numbers, like 0, 1, 2, ....

Remember that we are interested in the probability that the r.v. X takes a certain value x, takes a value less than x, or falls between two values, say $x_1$ and $x_2$. For this reason we will next construct a probability distribution, or probability mass function.

2.3.2. Probability Distribution of Discrete random variable


Discrete Density Function.
Definition: Suppose X is a discrete random variable taking at most a countably infinite number of values $x_1, x_2, \ldots, x_n, \ldots$. With each possible value $x_i$ we associate a number $P(x_i) = P(X = x_i)$, i = 1, 2, ..., n, ..., called the probability of $x_i$. The numbers $P(x_i)$ must satisfy the following conditions:
i) $P(x_i) \ge 0$ for all i = 1, 2, ...
ii) $P(x) = 0$ for $x \ne x_i$, i = 1, 2, ... (for an impossible event) ………….(31)
iii) $\sum_{i=1}^{\infty} P(x_i) = 1$, where the summation is over all possible values of the r.v.

The function $P(x_i)$ is called the probability mass function or discrete density function of the discrete random variable. Consequently $0 \le P(x_i) \le 1$. If x is not one of the values that X can take, then P(x) = 0. Also, if the sequence $x_1, x_2, x_3, \ldots$ includes all values that X can take, then the sum of the discrete density function is 1, that is $\sum_{i \ge 1} P(x_i) = 1$. This last result is due to the following: if X is a discrete random variable with distinct values $x_1, x_2, \ldots, x_n, \ldots$, then
$$ S = \bigcup_{n} \{\omega : X(\omega) = x_n\} = \bigcup_{n} \{X = x_n\}, \quad \text{and} \quad \{X = x_i\} \cap \{X = x_j\} = \emptyset \text{ for } i \ne j. $$
Hence
$$ P(S) = 1 = \sum_{n} P(X = x_n) = \sum_{i=1}^{\infty} P(x_i) $$
by the third axiom of probability. The collection of ordered pairs $(x_i, P(x_i))$, i = 1, 2, ..., is called the probability distribution of the discrete random variable X.

Example 30. Let the experiment be the tossing of three coins simultaneously, and let X be the number of heads obtained. Obtain the probability distribution of the discrete random variable X.

Solution.
The sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}. Assume each of the outcomes is equally likely. The discrete density function (probability mass function) can be presented as a table, as illustrated below.

Favorable outcomes    X = number of heads    Number of favorable cases    Probability P(xi)
HHH                   3                      1                            1/8
HHT, HTH, THH         2                      3                            3/8
HTT, THT, TTH         1                      3                            3/8
TTT                   0                      1                            1/8

From this we can obtain a reduced form, which consists of the list of possible values of X and the corresponding P(xi), that is

X    P(xi)
3    1/8
2    3/8
1    3/8
0    1/8

The first row reads: the probability that X is exactly 3 is 1/8; symbolically, P(X=3) = 1/8. We can also see that each P(xi) in the table lies between 0 and 1, that is 0 < P(xi) < 1.

But P(X=4) = 0, because obtaining 4 heads in a throw of 3 coins is an impossible event.

If we take the sum of the probability mass function over all the values of the r.v. X, we can see that it is equal to 1. That is,
$$ \sum_{i=0}^{3} P(x_i) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1. $$

The above table shows the probability distribution of the discrete random variable X generated from the given experiment. The discrete density function can also be represented graphically.

[Figure: bar chart of P(x) against x = 0, 1, 2, 3, with heights 1/8, 3/8, 3/8 and 1/8.]
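
The probability distribution of Example 30 can be built by listing all outcomes of three coin tosses and counting heads. This is a minimal sketch assuming equally likely outcomes; the names `S`, `counts` and `pmf` are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]   # all outcomes of three tosses

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in S)
pmf = {x: Fraction(n, len(S)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
print("total:", sum(pmf.values()))
```

Listing the outcomes mechanically is a useful guard against omitting one of them when the sample space is written out by hand.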

Example 31. Let the experiment be tossing of a fair die twice. And let X be the sum of the upturned
numbers. Obtain the probability distribution of the discrete random variable X.
Solution
We know that N(S)=36. The possible values of X and the corresponding probability mass function is
given below.

X P( xi ) Probability Statement
2 1/36 → P(X=2)
3 2/36 → P(X=3)
4 3/36 → P(X=4)
5 4/36 → P(X=5)
6 5/36 → P(X=6)
7 6/36 → P(X=7)
8 5/36 → P(X=8)
9 4/36 → P(X=9)
10 3/36 → P(X=10)
11 2/36 → P(X=11)
12 1/36 → P(X=12)

Graphically

[Figure: bar chart of P(x) against x = 2, 3, ..., 12, rising from 1/36 at x = 2 to 6/36 at x = 7 and falling back to 1/36 at x = 12.]

Cumulative Distribution Function


A function closely related to the probability mass function or discrete density function of a discrete random variable is the corresponding cumulative distribution function (c.d.f.) or distribution function. Dear Student! Do you remember how you calculated a less-than or more-than cumulative frequency of grouped data in your introduction to statistics course? If not, please go back and revise it; conceptually, the cumulative distribution function and the less-than cumulative frequency distribution are similar. The cumulative distribution function is denoted by F(x).

The cumulative distribution function F(x) is the probability that the random variable X takes on a value at or below a number x. It is given by

$$ F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty $$

$$ F(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\}) $$

$$ F(x) = P(X \le x) = \sum_{x_i \le x} P(x_i) \qquad (32) $$

where the summation is over the values of the probability mass function $P(x_i)$ for all values $x_i$ that the random variable can take that are less than or equal to the specified value x.

Example 32. Consider the experiment of tossing a coin 4 times and let X = number of heads obtained. Then we have

X    P(xi)
0    1/16
1    4/16
2    6/16
3    4/16
4    1/16

F(x)     Range of x
0        for x < 0
1/16     for 0 ≤ x < 1
5/16     for 1 ≤ x < 2
11/16    for 2 ≤ x < 3
15/16    for 3 ≤ x < 4
1        for x ≥ 4

The probability that the r.v. X takes a value less than 3, P[X<3], is obtained by adding the probability mass function over the first three values. That is,

$$ P[X<3] = P(X=0) + P(X=1) + P(X=2) = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} = \frac{11}{16} $$

If we are interested in the probability that the r.v. X takes a value less than or equal to 3, it is obtained as follows:
$$ F(3) = P(X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} + \frac{4}{16} = \frac{15}{16} $$

Note that F(x) is defined for all real numbers, hence
$$ F(2) = P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = \frac{11}{16}. $$

Similarly, we can obtain F(2.5) even though X cannot actually take the value 2.5. That is,
$$ F(2.5) = P(X \le 2.5) = P(X=0) + P(X=1) + P(X=2) = \frac{11}{16}. $$
This is the same as F(2). The implication is that the value of the cumulative distribution function remains constant between two possible values. The value of F(x) changes when x reaches the next value of the r.v. X. Therefore between, say, 2 and 3 in our example, F(x) remains equal to F(2) = 11/16; when x reaches the value 3, F(x) becomes equal to 15/16.
For a discrete random variable, the c.d.f. is a step function and is right continuous.
[Figure: step-function graph of F(x) against x, with jumps at x = 0, 1, 2, 3, 4 to the levels 1/16, 5/16, 11/16, 15/16 and 1.]
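
The step behaviour of the c.d.f. in Example 32 can be seen by evaluating formula (32) at a few points, including points that X cannot take. The sketch below assumes a fair coin, so that P(X = x) is the binomial coefficient C(4, x) divided by 16; the function name `F` follows the notation of the text.

```python
from fractions import Fraction
from math import comb

n = 4
# pmf of X = number of heads in 4 tosses of a fair coin: C(4, x) / 2^4.
pmf = {x: Fraction(comb(n, x), 2 ** n) for x in range(n + 1)}

def F(x):
    """Cumulative distribution function: sum of P(xi) over all xi <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

# F stays constant between possible values and jumps at each of them.
for x in (-1, 0, 1, 2, 2.5, 3, 4, 10):
    print(x, F(x))
```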
Properties of a Cumulative Distribution Function F(x)
A cumulative distribution function has the following properties.
i) $F(-\infty) \equiv \lim_{x \to -\infty} F(x) = 0$, and $F(+\infty) \equiv \lim_{x \to +\infty} F(x) = 1$.
ii) F(x) is a monotone, non-decreasing function; that is, if a < b, then $F(a) \le F(b)$.
iii) F(x) is continuous from the right; that is, $\lim_{h \to 0,\; h > 0} F(x + h) = F(x)$.

Corollary. If F(x) is the c.d.f. of the r.v. X and a < b, then $P(a < X \le b) = F(b) - F(a)$.

Proof
Note that the events $\{a < X \le b\}$ and $\{X \le a\}$ are disjoint and their union is $\{X \le b\}$. Hence by the addition rule of probabilities we have:
$$ P(a < X \le b) + P(X \le a) = P(X \le b) \;\Rightarrow\; P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a). $$
Also, we can prove property ii as follows. We know, from the above, that $\{X \le b\} = \{a < X \le b\} \cup \{X \le a\}$ and $\{a < X \le b\} \cap \{X \le a\} = \emptyset$. Therefore $P(X \le b) = P(a < X \le b) + P(X \le a) \ge P(X \le a)$. Hence $F(b) \ge F(a)$, which proves it.

Theorem 1. Let X be a discrete random variable. F(x) can be obtained from $P(x_i)$ and vice versa.
Proof
Obtaining F(x) from the probability mass function $P(x_i)$ is simply applying the definition of F(x), i.e.
$$ F(x) = \sum_{x_i \le x} P(x_i). $$
Conversely, suppose F(x) is given; then
$$ P(x_i) = F(x_i) - F(x_{i-1}) \qquad (33) $$
Let the discrete r.v. X take values $-\infty < x_1 < x_2 < \cdots < x_n < \infty$. We know that
$$ F(x_i) = \sum_{j \le i} P(X = x_j) = P(x_i) + \sum_{j \le i-1} P(X = x_j) $$
and
$$ F(x_{i-1}) = P(X \le x_{i-1}) = \sum_{j \le i-1} P(X = x_j). $$
$$ \Rightarrow F(x_i) - F(x_{i-1}) = P(x_i) + \sum_{j \le i-1} P(X = x_j) - \sum_{j \le i-1} P(X = x_j) = P(x_i) $$

Thus, given the c.d.f. of a discrete random variable, we can calculate its probability mass function.

Example 33. We hope that you have obtained the c.d.f. of the discrete random variable X of example
31. Let us say that you are given only the c.d.f. instead of the discrete density functions. In that
example F(4)=6/36 and F(5) = 10/36, then how much is P(X=5)?

Solution
$$ P(x_i) = F(x_i) - F(x_{i-1}) \;\Rightarrow\; P(X=5) = F(5) - F(4) = \frac{10}{36} - \frac{6}{36} = \frac{4}{36} $$
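
Formula (33) can be applied to every possible value at once to recover the whole probability mass function from the c.d.f. The following sketch does this for the sum of two fair dice of Example 31; here the c.d.f. `F` is obtained by direct counting, which is an illustrative choice rather than the only way to obtain it.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))

def F(x):
    """c.d.f. of X = sum of two dice: P(X <= x) by direct counting."""
    return Fraction(sum(1 for (i, j) in S if i + j <= x), len(S))

# Recover the probability mass function from the c.d.f.:
# P(xi) = F(xi) - F(x_{i-1}) for consecutive possible values 2, 3, ..., 12.
values = list(range(2, 13))
pmf = {x: F(x) - F(x - 1) for x in values}

print(F(4), F(5), pmf[5])
```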
Example 34. From a lot of 10 items containing 3 defectives, a sample of 4 items is drawn at random. Let the r.v. X denote the number of defective items in the sample. The sample is drawn without replacement. Find P(X ≤ 1), P(X < 1) and P(0 < X < 2).

Solution
The possible values of the discrete r.v. X are 0, 1, 2, 3 in a sample of 4 items. Next we obtain the probability mass function of the discrete r.v. X.

P(X=0) is the probability that no defective item is included in the sample.

$$ P(X=0) = \frac{{}_{3}C_{0} \times {}_{7}C_{4}}{{}_{10}C_{4}} = \frac{1}{6} $$
$$ P(X=1) = \frac{{}_{3}C_{1} \times {}_{7}C_{3}}{{}_{10}C_{4}} = \frac{1}{2} $$
$$ P(X=2) = \frac{{}_{3}C_{2} \times {}_{7}C_{2}}{{}_{10}C_{4}} = \frac{3}{10} $$
$$ P(X=3) = \frac{{}_{3}C_{3} \times {}_{7}C_{1}}{{}_{10}C_{4}} = \frac{1}{30} $$

Then the probability distribution of the discrete r.v. X is

X        0      1      2       3
P(xi)    1/6    1/2    3/10    1/30

a) $P(X \le 1) = F(1) = P(X=0) + P(X=1) = \frac{1}{6} + \frac{1}{2} = \frac{2}{3}$
b) $P(X < 1) = P(X=0) = \frac{1}{6}$
c) $P(0 < X < 2) = P(X=1) = \frac{1}{2}$
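
The probabilities in Example 34 all follow one counting pattern: choose x of the 3 defective items and 4 - x of the 7 good items. A short sketch of that pattern is given below; the constant and function names are hypothetical labels for this illustration.

```python
from fractions import Fraction
from math import comb

N_ITEMS, DEFECTIVE, SAMPLE = 10, 3, 4

def p(x):
    """P(X = x): choose x of the 3 defectives and 4 - x of the 7 good items."""
    return Fraction(comb(DEFECTIVE, x) * comb(N_ITEMS - DEFECTIVE, SAMPLE - x),
                    comb(N_ITEMS, SAMPLE))

pmf = {x: p(x) for x in range(0, DEFECTIVE + 1)}

p_at_most_1 = pmf[0] + pmf[1]      # P(X <= 1)
p_less_than_1 = pmf[0]             # P(X < 1)
p_between = pmf[1]                 # P(0 < X < 2)
print(pmf, p_at_most_1, p_less_than_1, p_between)
```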

Example 35: A random variable X has the following probability mass function.

X        0    1    2     3     4     5      6       7
P(xi)    0    k    2k    2k    3k    k^2    2k^2    7k^2 + k

i) Find k. ii) Evaluate P(X < 6) and P(X ≥ 6), and determine the distribution of X.

Solution
i) We know that
$$ \sum_{i=0}^{7} P(x_i) = 1. $$
Therefore
$$ \sum_{i=0}^{7} P(x_i) = 10k^{2} + 9k = 1 \;\Rightarrow\; 10k^{2} + 9k - 1 = 0 \quad \text{(a quadratic equation)} $$
$$ \Rightarrow (10k - 1)(k + 1) = 0 $$

We have two solutions, k = -1 and k = 1/10. Which one do you think is the correct answer? Why? The correct answer is k = 1/10, because a probability cannot be less than zero.
ii) P(X < 6) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5)
$$ = 0 + \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} + \frac{1}{100} = \frac{81}{100} $$
$$ P(X \ge 6) = 1 - P(X < 6) = 1 - \frac{81}{100} = \frac{19}{100} $$
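
The choice between the two roots in Example 35 can be checked mechanically: a valid k must make every entry of the table non-negative and make the entries sum to 1. The sketch below tests both roots of the factored quadratic; the function name `pmf` and the list `candidates` are illustrative.

```python
from fractions import Fraction

def pmf(k):
    """P(X = x) for x = 0..7, as given in the table, as a function of k."""
    return [0 * k, k, 2 * k, 2 * k, 3 * k, k ** 2, 2 * k ** 2, 7 * k ** 2 + k]

# Roots of 10k^2 + 9k - 1 = 0 from the factorization (10k - 1)(k + 1) = 0;
# keep the one that gives valid (non-negative) probabilities summing to 1.
candidates = [Fraction(-1), Fraction(1, 10)]
k = next(c for c in candidates
         if sum(pmf(c)) == 1 and all(p >= 0 for p in pmf(c)))

p = pmf(k)
p_less_than_6 = sum(p[:6])          # P(X < 6) = P(X=0) + ... + P(X=5)
p_at_least_6 = 1 - p_less_than_6    # P(X >= 6)
print(k, p_less_than_6, p_at_least_6)
```

With k fixed, the list `p` is the full distribution of X asked for in part ii.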

1.3.3. Continuous Random Variables and Probability Density Functions

Continuous Random Variable


Definition. A random variable X is said to be continuous if it can take all possible values between certain limits. That is, a continuous random variable is one that may assume all values in at least one interval of the real number line.

What we are saying is that if a and b are the end points of a given interval and the r.v. X can take all possible values in [a, b], i.e. $a \le X \le b$, we call the r.v. X a continuous random variable. Examples of continuous random variables are the age, height and weight of students in a class, the distance between two locations, etc. In all these cases we talk about the value in a particular interval, not at a point. For example, the distance between Addis Ababa and Bahir Dar is usually taken to be 565 km. But if we want to be more accurate it may be 565.25, and if we want to be more accurate still it may be 565.257. Thus the said distance, when expressed to the desired degree of accuracy, is not one specific value but lies in an interval such as 565 to 565.257 km.

The implication is that, unlike a discrete random variable, a continuous random variable is described by the values it takes in an interval rather than at a single point. Hence, we can redefine a continuous random variable as follows: "a random variable is said to be continuous when its different values cannot be put in one-to-one correspondence with a set of positive integers." It assumes an infinite and uncountable set of values.
Probability Density Function


Definition. If X is a continuous random variable, the function f(x) in
$$ F(x) = \int_{-\infty}^{x} f(u)\,du $$
is called the probability density function (p.d.f.) of X. Here F(x) is the cumulative distribution function of X. For a continuous random variable X, F(x) is absolutely continuous.
Other names of probability density function include density function, continuous density function, and integrating density function.
Dear colleague! Can you observe, from the above definition, that F( x ) can be obtained from f (x )
and vice versa? Then let us put it in terms of theorem.

Theorem 2. Let X be continuous random variable. Then F( x ) can be obtained from f (x ) , and vice
versa.

Proof. If X is a continuous random variable and f(x) is given, then F(x) is obtained by integrating f(x); that is,
$$ F(x) = \int_{-\infty}^{x} f(u)\,du. $$
If, on the other hand, F(x) is given, then f(x) can be obtained by differentiation of F(x), i.e.
$$ f(x) = \frac{dF(x)}{dx} $$
for those values for which F(x) is differentiable.

Note carefully that, unlike the situation for discrete random variables, the p.d.f. of a continuous random
variable, $f(x)$, does not give the probability that X takes the value x. That is, while $P(x_i)$ is the
probability that a discrete r.v. X takes the value $x_i$, $f(x)$ is not a probability. For a continuous
random variable X, probability is given by the area under the curve of $f(x)$ over the corresponding interval
on the horizontal axis.
For a continuous random variable X, if $x_1 = x$ and $x_2 = x + \Delta x$, then the p.d.f. $f(x)$ can be expressed as
a limit:

$$f(x) = \frac{dF(x)}{dx} = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x}$$

But $F(x + \Delta x) - F(x) = P(x < X \le x + \Delta x)$, so for small $\Delta x$

$$f(x)\,\Delta x \approx P(x \le X \le x + \Delta x).$$

This result indicates that the probability that X is in a small
interval containing the value x is approximately equal to $f(x)$ times the width of the interval, $\Delta x$.
Hence the probability of a continuous random variable X is given by the area under the curve of $f(x)$,
that is, the curve of the density function.
Properties of Probability density Function
Any function $f(x)$ with domain the real line and counter domain $[0, \infty)$ is said to be the probability
density function (p.d.f.) of X if and only if it satisfies the following conditions:

(i) $f(x) \ge 0$ for all x

(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ …………. (34)

(iii) for any a, b with $-\infty < a < b < \infty$, we have $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$

[Figure: curve of f(x) against x, with the area under the curve between a and b marked as P(a ≤ X ≤ b).]

Note the following important points as well.


a) For any specified value of a continuous r.v. X, say the point c, we have

$$P(X = c) = \int_{c}^{c} f(x)\,dx = 0,$$

i.e. the area above a single point is zero.

Alternatively, $P(c \le X \le c) = F(c) - F(c) = 0$. Consequently the following probabilities are all the
same if X is a continuous random variable, because the intervals differ only by one or
two points that have probability zero:

$$P(a < X < b) = P(a < X \le b) = P(a \le X < b) = P(a \le X \le b)$$

b) Note that

$$F(-\infty) = \lim_{x \to -\infty} \int_{-\infty}^{x} f(u)\,du = 0, \qquad F(\infty) = \lim_{x \to \infty} \int_{-\infty}^{x} f(u)\,du = 1,$$

and when a < b, $F(a) \le F(b)$.
Proof. From $f(x) = \frac{dF(x)}{dx}$, we have $dF(x) = f(x)\,dx$.

Integrating $dF(x) = f(x)\,dx$ from $-\infty$ to x and using the fact that $F(-\infty) = 0$, we get

$$F(x) = \int_{-\infty}^{x} f(u)\,du .$$

Since $F(\infty) = 1$, therefore $F(\infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$.
c) The probability for the r.v. X to fall in the finite interval $[x_1, x_2]$ is

$$P(x_1 < X \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\,dx$$ ……………………(35)

d) For a discrete random variable, $P(x_i)$ is a function with domain the real line and counter domain
$[0, 1]$, whereas $f(x)$ is a function with domain the real line and counter domain the infinite
interval $[0, \infty)$.

Let us see some examples.


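Example 36. Let X be a continuous random variable. Let the p.d.f. be given by

$$f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$

i) Check that $f(x)$ is a p.d.f.
ii) Obtain $F\left(\tfrac{1}{2}\right)$.
iii) Evaluate the conditional probability $P\left(X \le \tfrac{1}{2} \mid \tfrac{1}{3} \le X \le \tfrac{2}{3}\right)$.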

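Solution

i) $\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{1} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{0}^{1} = 1 - 0 = 1$.
Since $f(x) \ge 0$ and $\int_{0}^{1} 2x\,dx = 1$, it is a p.d.f.

ii) $F\left(\tfrac{1}{2}\right) = \int_{-\infty}^{1/2} f(x)\,dx = \int_{0}^{1/2} 2x\,dx = \tfrac{1}{4}$

iii) $P\left(X \le \tfrac{1}{2} \mid \tfrac{1}{3} \le X \le \tfrac{2}{3}\right)$

First obtain $\{X \le \tfrac{1}{2}\} \cap \{\tfrac{1}{3} \le X \le \tfrac{2}{3}\}$.
The result of this intersection is $\{\tfrac{1}{3} \le X \le \tfrac{1}{2}\}$.

Then apply the rule of conditional probability. Therefore

$$P\left(X \le \tfrac{1}{2} \mid \tfrac{1}{3} \le X \le \tfrac{2}{3}\right) = \frac{P\left(\tfrac{1}{3} \le X \le \tfrac{1}{2}\right)}{P\left(\tfrac{1}{3} \le X \le \tfrac{2}{3}\right)} = \frac{\int_{1/3}^{1/2} 2x\,dx}{\int_{1/3}^{2/3} 2x\,dx} = \frac{5/36}{1/3} = \frac{5}{12}$$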
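The three parts of Example 36 can be checked with exact arithmetic. The sketch below is only an illustration: it assumes the closed form $F(x) = x^2$ on (0, 1) for this density, and the helper names `cdf` and `interval_prob` are made up for the purpose.

```python
from fractions import Fraction

def cdf(x):
    """C.d.f. of f(x) = 2x on (0, 1): F(x) = x**2 there, 0 below, 1 above."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    return x * x

def interval_prob(lo, hi):
    """P(lo < X <= hi) = F(hi) - F(lo)."""
    return cdf(hi) - cdf(lo)

total = interval_prob(0, 1)                    # part i: total area under f
f_half = cdf(Fraction(1, 2))                   # part ii: F(1/2)
conditional = (interval_prob(Fraction(1, 3), Fraction(1, 2))
               / interval_prob(Fraction(1, 3), Fraction(2, 3)))  # part iii

print(total, f_half, conditional)  # 1 1/4 5/12
```

Because X is continuous, it makes no difference whether the interval end points are included.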

Example 37. Let X be the life length of a certain type of light bulb in hours. Assume X to be a
continuous random variable, and suppose that its p.d.f. is given by

$$f(x) = \begin{cases} \dfrac{a}{x^3}, & 1500 \le x \le 2500 \\ 0, & \text{otherwise.} \end{cases}$$

For what value of a will $f(x)$ be a p.d.f.?
Solution

To obtain the constant a, we invoke the condition $\int_{-\infty}^{\infty} f(x)\,dx = 1$. In this case it will be

$$\int_{1500}^{2500} \frac{a}{x^3}\,dx = 1 \;\Rightarrow\; a \int_{1500}^{2500} x^{-3}\,dx = 1$$

$$a = \frac{1}{\int_{1500}^{2500} x^{-3}\,dx} = \frac{1}{\left[-\frac{x^{-2}}{2}\right]_{1500}^{2500}} = 7{,}031{,}250$$

[Figure: graph of f(x) between x = 1500 and x = 2500.]
Example 38. Let X be a continuous random variable with p.d.f.

$$f(x) = \begin{cases} 3e^{-3x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

i) Check that $f(x)$ is a p.d.f.
ii) Evaluate the probability that X falls between 0.5 and 1.
iii) Obtain the c.d.f.

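Solution

i) $\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} 3e^{-3x}\,dx = 3 \lim_{b \to \infty} \left[\frac{e^{-3x}}{-3}\right]_{0}^{b} = 1$

ii) $P(0.5 \le X \le 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = -e^{-3} + e^{-1.5} = 0.173$

iii) The c.d.f. is obtained by integrating the p.d.f.

For x > 0, $F(x) = \int_{-\infty}^{x} f(u)\,du = \int_{0}^{x} 3e^{-3u}\,du = \left[-e^{-3u}\right]_{0}^{x} = 1 - e^{-3x}$

Therefore,

$$F(x) = \begin{cases} 0, & x \le 0 \\ 1 - e^{-3x}, & x > 0 \end{cases}$$

Then $P(0.5 \le X \le 1)$ can also be computed as $F(1) - F(0.5) = (1 - e^{-3}) - (1 - e^{-1.5}) = 0.173$.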
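The two routes to $P(0.5 \le X \le 1)$ in Example 38, area under the p.d.f. and difference of c.d.f. values, can be compared numerically. This is a rough sketch only: the midpoint rule and the helper name `midpoint_integral` are choices made here for illustration, not part of the text.

```python
import math

def pdf(x):
    """Density of Example 38: 3*exp(-3x) for x > 0, zero elsewhere."""
    return 3.0 * math.exp(-3.0 * x) if x > 0 else 0.0

def cdf(x):
    """C.d.f. derived in part iii: 1 - exp(-3x) for x > 0, zero elsewhere."""
    return 1.0 - math.exp(-3.0 * x) if x > 0 else 0.0

def midpoint_integral(func, lo, hi, steps=100_000):
    """Approximate the integral of func over [lo, hi] by the midpoint rule."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width

by_cdf = cdf(1.0) - cdf(0.5)                 # F(1) - F(0.5)
by_area = midpoint_integral(pdf, 0.5, 1.0)   # area under f between 0.5 and 1

print(round(by_cdf, 3), round(by_area, 3))   # 0.173 0.173
```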
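Example 39. Obtain the c.d.f. of the p.d.f. given in Example 36 above.

Solution

Given

$$f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$

$F(x) = 0$ if $x \le 0$.

For $0 < x \le 1$, $F(x) = \int_{-\infty}^{x} f(u)\,du = \int_{0}^{x} f(u)\,du = \int_{0}^{x} 2u\,du = \left[u^2\right]_{0}^{x} = x^2 - 0 = x^2$.

For $x > 1$, $F(x) = 1$.

Therefore

$$F(x) = \begin{cases} 0, & x \le 0 \\ x^2, & 0 < x \le 1 \\ 1, & x > 1 \end{cases}$$

[Figure: graph of the c.d.f. F(x), rising from 0 to the point (1, 1) and constant at 1 thereafter.]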

Example 40. Consider a p.d.f. given by

$$f(x) = \frac{1}{9}x^2, \quad 0 < x < 3$$

Find the probability that X will assume a value between 0 and 3.

$$P(0 < X < 3) = \int_{0}^{3} f(x)\,dx = \int_{0}^{3} \frac{1}{9}x^2\,dx = 1$$
Note that the cumulative distribution function is important for a number of reasons. This is particularly
true when we deal with continuous random variables, for in that case we cannot study the probability
behavior of the random variable X at a point, because that probability always equals zero.
However, we can ask about $P(X \le x)$ and obtain the p.d.f. of X from it as we did previously.
2.4. Expectations and Moments
2.4.1. Expectations /Mathematical expectations/
In order to study random variables and their probability distributions, it is extremely useful to define
the concept of expectation or mathematical expectation of random variable and of functions of a
random variable. These are considered in this sub section.

A) Expected value or mean of a random variable


Let X be a random variable. The expected value or mean of X, denoted by $\mu$, is

i) $E[X] = \sum_{i=1}^{n} x_i P(x_i) = \mu$ ………………………………..(36)
if X is a discrete random variable;

ii) $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$ ……………………….(37)
if X is a continuous random variable.

Here E is the expectation operator and E[X] is read as "the expected value of X".
Note that the above two formulas assume that the sum or integral exists, and that the summation or
integration is over all values of the random variable.

Example 41, Fantu sells new cars for Nyala motors. He usually sells the largest number of cars
on Saturday. He has established the following probability distribution for the number of cars he
expects to sell on a particular Saturday.

No of cars P( xi )
0 0.10
1 0.20
2 0.30
3 0.30
4 0.10

On a typical Saturday how many cars should Fantu expect to sell ?

Solution

$E[X] = \sum_{i=1}^{n} x_i P(x_i)$

= 0(0.10) + 1(0.20) + 2(0.30) + 3(0.30) + 4(0.10)

$\mu$ = 2.1 cars. This average is the expected value of the r.v. X, though X cannot
actually take the value 2.1.

Example 42. Consider the experiment of rolling a single die. Let X be the value that shows on the
die. The probability distribution of X is

xi        1     2     3     4     5     6
P( xi )   1/6   1/6   1/6   1/6   1/6   1/6

What would the average value of X be if the experiment were repeated an infinite number of
times?
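Solution

$E[X] = \sum_{i=1}^{n} x_i P(x_i) = \mu$

$= 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right)$

$= \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \tfrac{7}{2} = 3.5$

This is the expected value of X, despite the fact that X cannot actually take the value 3.5.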
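Example 43. Suppose X is a continuous random variable with p.d.f.

$$f(x) = \begin{cases} \frac{1}{9}x^2, & 0 \le x \le 3 \\ 0, & \text{otherwise.} \end{cases}$$

Then, find E[X].

Solution

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{3} x \cdot \frac{1}{9}x^2\,dx = \frac{1}{9}\int_{0}^{3} x^3\,dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_{0}^{3} = \frac{1}{9} \cdot \frac{81}{4} = \frac{9}{4}$$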
Example 44. A continuous distribution of a variable X in the range (-3, 3) is defined by

$$f(x) = \begin{cases} \frac{1}{16}(3 + x)^2, & -3 < x < -1 \\ \frac{1}{16}(6 - 2x^2), & -1 < x < 1 \\ \frac{1}{16}(3 - x)^2, & 1 < x < 3 \end{cases}$$

Find the mean, E(X), of the above distribution.

Solution
We can consider each segment separately and then add the results to obtain the
expected value of X.

$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-3}^{-1} \frac{1}{16}x(3 + x)^2\,dx + \int_{-1}^{1} \frac{1}{16}x(6 - 2x^2)\,dx + \int_{1}^{3} \frac{1}{16}x(3 - x)^2\,dx$$

$$= \frac{1}{16}\left[\int_{-3}^{-1} (x^3 + 6x^2 + 9x)\,dx + \int_{-1}^{1} (6x - 2x^3)\,dx + \int_{1}^{3} (x^3 - 6x^2 + 9x)\,dx\right]$$

$$= \frac{1}{16}\left[\int_{-3}^{-1} x^3\,dx + 6\int_{-3}^{-1} x^2\,dx + 9\int_{-3}^{-1} x\,dx + 6\int_{-1}^{1} x\,dx - 2\int_{-1}^{1} x^3\,dx + \int_{1}^{3} x^3\,dx - 6\int_{1}^{3} x^2\,dx + 9\int_{1}^{3} x\,dx\right]$$

$$= \frac{1}{16}\left[-20 + 52 - 36 + 0 - 0 + 20 - 52 + 36\right] = \frac{1}{16}[0] = 0$$
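The piecewise calculation in Example 44 can be verified exactly with Simpson's rule, which is exact for polynomials of degree three or less. This is a sketch under one stated assumption: the third piece of the density is taken as $(3 - x)^2/16$ on $1 < x < 3$ and the first piece runs over $-3 < x < -1$, since only with those pieces does the total area equal 1. The names `PIECES` and `simpson` are illustrative.

```python
from fractions import Fraction

# (lower limit, upper limit, density on that piece).
# Assumption: the third piece is (3 - x)^2 / 16, which makes the total area 1.
PIECES = [
    (Fraction(-3), Fraction(-1), lambda x: (3 + x) ** 2 / Fraction(16)),
    (Fraction(-1), Fraction(1), lambda x: (6 - 2 * x ** 2) / Fraction(16)),
    (Fraction(1), Fraction(3), lambda x: (3 - x) ** 2 / Fraction(16)),
]

def simpson(func, lo, hi):
    """Simpson's rule on one interval; exact for polynomials of degree <= 3."""
    mid = (lo + hi) / 2
    return (hi - lo) / 6 * (func(lo) + 4 * func(mid) + func(hi))

# Total area under f (should be 1) and the mean, the integral of x*f(x).
total_area = sum(simpson(f, lo, hi) for lo, hi, f in PIECES)
mean = sum(simpson(lambda x, f=f: x * f(x), lo, hi) for lo, hi, f in PIECES)

print(total_area, mean)  # 1 0
```

The mean of zero also follows from the symmetry of the density about x = 0.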
Example 45. A survey conducted over the last 25 years indicated that in 10 years the winter was
mild, in 8 years it was cold and in the remaining 7 years it was very cold. A company sells 1000
woolen coats in a mild year, 1300 in a cold year and 2000 in a very cold year.
Find the yearly expected profit of the company if a woolen coat costs Birr 173 and it is sold to stores
for Birr 248.
Solution
The probability distribution of the sales is

Type of winter    Probability    No. of coats sold (x)    x P( xi )
Mild              10/25          1000                     400
Cold              8/25           1300                     416
Very cold         7/25           2000                     560
Total                                                     1376

The average or expected sales is

$E[X] = \sum_{i=1}^{n} x_i P(x_i) = 1376$.

Expected or average profit = expected total revenue - expected cost
= (unit price) (expected sales) - (unit cost) (expected sales)
= (unit price - unit cost) (expected sales)
= (Birr 248 - Birr 173) 1376 = Birr 103,200
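The expected-profit calculation of Example 45 is short enough to reproduce directly. A minimal sketch follows; the dictionary and variable names are made up for illustration, and the figures are those given in the example.

```python
from fractions import Fraction

# Data of Example 45: years observed and coats sold per type of winter.
years_observed = {"mild": 10, "cold": 8, "very cold": 7}
coats_sold = {"mild": 1000, "cold": 1300, "very cold": 2000}

total_years = sum(years_observed.values())
# Relative frequencies over the 25 years are used as probabilities.
probabilities = {k: Fraction(n, total_years) for k, n in years_observed.items()}

# E[X] = sum of x * P(x)
expected_sales = sum(coats_sold[k] * probabilities[k] for k in coats_sold)

unit_margin = 248 - 173          # selling price minus cost, in Birr
expected_profit = unit_margin * expected_sales

print(expected_sales, expected_profit)  # 1376 103200
```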

Dear Student! We hope that you have noticed that the expected value of X is a weighted average
of the values that the random variable X can take, the weights being the probabilities of each value.
E[X] can also be regarded as the center of gravity of the probability distribution.

2.4.2. Theorems on expectations

Theorem 1. The expectation of a constant is the constant itself.

That is, if c is a constant value of a random variable X, then E(X) = E[c] = c.
Proof: i) If X is discrete, taking the constant value c throughout, then

$$E(X) = \sum x_i P(x_i) = c \sum P(x_i) = c \quad \left[\text{since } \sum P(x_i) = 1\right]$$

ii) If X is a continuous random variable, then

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{\infty} c f(x)\,dx = c \int_{-\infty}^{\infty} f(x)\,dx = c, \quad \text{since } \int_{-\infty}^{\infty} f(x)\,dx = 1$$

Theorem 2. If X is a random variable and if a and b are constants, then

E(a + bX) = a + bE(X)

Proof
i) If X is a discrete random variable, $E(X) = \sum x_i P(x_i)$, so

$$E(a + bX) = \sum (a + b x_i) P(x_i) = \sum \left[a P(x_i) + b x_i P(x_i)\right]$$

Taking the summation term by term, we have $\sum a P(x_i) + \sum b x_i P(x_i)$. This can be written as

$$a \sum P(x_i) + b \sum x_i P(x_i) = a + bE(X)$$

ii) If X is a continuous random variable, then $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, so

$$E(a + bX) = \int_{-\infty}^{\infty} (a + bx) f(x)\,dx = \int_{-\infty}^{\infty} \left[a f(x) + b x f(x)\right] dx = a \int_{-\infty}^{\infty} f(x)\,dx + b \int_{-\infty}^{\infty} x f(x)\,dx = a + bE(X)$$
In both cases we have utilized theorem 1 and the definition of the expected value of X .
Expected Value of a Function of a Random Variable

Theorem 3. Let X be a random variable and g(X) a function of the r.v. X. Then its
expectation is given by

i) $E[g(X)] = \sum g(x_i) P(x_i)$ ………………………(38)
for discrete random variables.

ii) $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$ …………………….(39)
for continuous random variables.

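Example 46. Suppose X is a continuous random variable with p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

and let $g(X) = X^2$. Then

$$E[g(X)] = E(X^2) = \int_{-\infty}^{\infty} g(x) f(x)\,dx = \int_{a}^{b} x^2 \cdot \frac{1}{b - a}\,dx = \frac{1}{b - a}\int_{a}^{b} x^2\,dx = \frac{1}{3} \cdot \frac{b^3 - a^3}{b - a} = \frac{1}{3}\left(b^2 + ab + a^2\right)$$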

Theorem 4. If $g_1(X)$ and $g_2(X)$ are two functions of the r.v. X, and a and b are
constants, then $E[a g_1(X) + b g_2(X)] = aE[g_1(X)] + bE[g_2(X)]$.
Proof
i) For a discrete random variable,

$$E[a g_1(X) + b g_2(X)] = \sum \left[a g_1(x_i) + b g_2(x_i)\right] P(x_i) = a \sum g_1(x_i) P(x_i) + b \sum g_2(x_i) P(x_i) = aE[g_1(X)] + bE[g_2(X)]$$

ii) For a continuous random variable,

$$E[a g_1(X) + b g_2(X)] = \int_{-\infty}^{\infty} \left[a g_1(x) + b g_2(x)\right] f(x)\,dx = a \int_{-\infty}^{\infty} g_1(x) f(x)\,dx + b \int_{-\infty}^{\infty} g_2(x) f(x)\,dx = aE[g_1(X)] + bE[g_2(X)]$$
B) Variance

The variance of a random variable X is the measure of the spread or dispersion of the density of X.

Let X be a random variable and $\mu$ its mean. Then the variance of X, denoted by $\sigma^2$, is defined by

i) $\sigma^2 = E[X - E(X)]^2 = \sum (x_i - \mu)^2 P(x_i)$, for a discrete random variable ……(40)

ii) $\sigma^2 = E[X - E(X)]^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$, for a continuous random variable …(41)

Standard deviation of X $= \sqrt{Var(X)} = \sqrt{\sigma^2} = \sigma$ ………………..(42)

Var(X) can be easily calculated, in both cases, as

$Var(X) = E(X^2) - (E(X))^2 = E(X^2) - \mu^2$ ……………… .(43)


Proof. Using expectation, the above result can be derived as
$Var(X) = E[(X - E(X))^2]$
Expanding the square we have
$Var(X) = E[X^2 + (E(X))^2 - 2X E(X)]$
Since $E(X) = \mu$, by substitution we get
$Var(X) = E[X^2 + \mu^2 - 2\mu X]$
Taking the expectation of each term in the bracket, we get
$Var(X) = E(X^2) + E(\mu^2) - 2\mu E(X)$
$= E(X^2) + \mu^2 - 2\mu \cdot \mu$ ($\mu$ is constant)
$= E(X^2) + \mu^2 - 2\mu^2$
$= E(X^2) - \mu^2$
$Var(X) = E(X^2) - (E(X))^2$
The required steps are:
a) First obtain the mean, i.e. E(X).
b) Second, obtain the expected value of the square of the r.v. X, i.e. $E(X^2)$.
c) Finally, subtract the square of the mean from $E(X^2)$.
Note carefully that, in general, $E(X^2) \ne (E(X))^2$.
Example 47. Calculate the variance of the probability distribution of Example 42 above.
Solution
We know $Var(X) = E(X^2) - (E(X))^2$

x     P(xi)    x²    x² P(xi)
1     1/6      1     1/6
2     1/6      4     4/6
3     1/6      9     9/6
4     1/6      16    16/6
5     1/6      25    25/6
6     1/6      36    36/6

$E(X^2) = \sum x_i^2 P(x_i) = \frac{91}{6}$

$E(X) = \sum x_i P(x_i) = \frac{21}{6}$

$Var(X) = \frac{91}{6} - \left(\frac{21}{6}\right)^2 = \frac{546 - 441}{36} = \frac{105}{36} = \frac{35}{12} \approx 2.92$
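The three-step recipe for the variance (mean, then $E(X^2)$, then subtract the square of the mean) can be written as a small function and applied to the fair die of Example 42. This is an illustrative sketch; the function name `mean_and_variance` is not from the text. Note that the $x^2$ column must contain $3^2 = 9$ for the face 3.

```python
from fractions import Fraction

def mean_and_variance(distribution):
    """Return (E[X], Var(X)) for a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in distribution.items())               # step a
    second_moment = sum(x * x * p for x, p in distribution.items())  # step b
    return mean, second_moment - mean ** 2                           # step c

# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean, variance = mean_and_variance(die)

print(mean, variance)  # 7/2 35/12
```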
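Example 48. Let X be a continuous random variable with p.d.f. given by
$f(x) = 6x(1 - x)$, $0 \le x \le 1$. Obtain the variance of X.
Solution
We know that $Var(X) = E(X^2) - \mu^2$

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{1} x \cdot 6x(1 - x)\,dx = 6\int_{0}^{1} (x^2 - x^3)\,dx = 6\left[\frac{1}{3}x^3 - \frac{1}{4}x^4\right]_{0}^{1} = \frac{1}{2}$$

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_{0}^{1} x^2 \cdot 6x(1 - x)\,dx = 6\int_{0}^{1} (x^3 - x^4)\,dx = 6\left[\frac{1}{4}x^4 - \frac{1}{5}x^5\right]_{0}^{1} = \frac{6}{20}$$

$$Var(X) = \frac{6}{20} - \left(\frac{1}{2}\right)^2 = \frac{1}{20}, \qquad S.D.(X) = \sqrt{Var(X)} = \sqrt{\frac{1}{20}} = \frac{1}{\sqrt{20}}$$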

Some Theorems on Variance

Theorem 5. The variance of a constant value is zero, i.e. if the r.v. X takes the constant value c
throughout, then its variance will be zero.
$Var(X) = E(X^2) - (E(X))^2$
$E(c) = c$, from theorem 1 of expectation
$E(X^2) = E(c^2) = c^2$, from theorem 1 of expectation
$\therefore Var(X) = c^2 - c^2 = 0$
Theorem 6. Let X be a random variable. If a and b are constants, we have the
following useful results.
i) $Var(aX) = a^2 Var(X)$
ii) $Var(a + bX) = b^2 Var(X)$
iii) $Var(X \pm a) = Var(X)$

The proof is left for you, dear colleague, as an exercise.
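Before attempting the proof, the three results of Theorem 6 can be checked numerically on a concrete distribution. The sketch below does this for a fair die; it is a check on one example, not a proof, and the helper names `variance` and `transform` and the constants a = 5, b = -3 are arbitrary choices.

```python
from fractions import Fraction

def variance(distribution):
    """Var(X) = sum of (x - mean)^2 * P(x) for a distribution {value: probability}."""
    mean = sum(x * p for x, p in distribution.items())
    return sum((x - mean) ** 2 * p for x, p in distribution.items())

def transform(distribution, a, b):
    """Distribution of a + b*X; b must be non-zero so the values stay distinct."""
    return {a + b * x: p for x, p in distribution.items()}

die = {face: Fraction(1, 6) for face in range(1, 7)}
a, b = 5, -3  # arbitrary constants chosen for the check

check_i = variance(transform(die, 0, a)) == a ** 2 * variance(die)    # Var(aX)
check_ii = variance(transform(die, a, b)) == b ** 2 * variance(die)   # Var(a + bX)
check_iii = (variance(transform(die, a, 1)) == variance(die)
             and variance(transform(die, -a, 1)) == variance(die))    # Var(X ± a)

print(check_i, check_ii, check_iii)  # True True True
```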

2.5. Moments
2.5.1. Moments
Do you remember the moments that you studied in the introductory course? The idea that we are going
to discuss here is the same.
A moment of a random variable is taken about some point, and moments are used to describe
the various characteristics of a distribution: central tendency, dispersion, skewness and kurtosis.
Moments are the expectations of the powers of the random variable which has the given distribution.
We can calculate moments about three points:

i) Moments about the origin;
ii) Moments about an assumed value; and
iii) Moments about the mean.
We will treat the first two kinds of moments together under moments about an assumed value.

I. Moments About Assumed Value

Definition. If X is a random variable, the rth moment of X about an assumed value a,
usually denoted by $\mu'_r$, is defined as

$\mu'_r = E[(X - a)^r]$
i) $= \sum (x_i - a)^r P(x_i)$, for a discrete random variable ……………………(51)
ii) $= \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$, for a continuous r.v. …………….(52)

Sometimes the rth moment about a is known as the rth central moment of X about a. If we take a = 0,
then we have

$\mu'_r = E[(X - 0)^r] = E(X^r)$
i) $= \sum x_i^r P(x_i)$, for a discrete random variable …………(53)
ii) $= \int_{-\infty}^{\infty} x^r f(x)\,dx$, for a continuous random variable …….(54)

In most cases we are interested in the first four moments.

If r = 1, we will have the 1st moment about a.

$\mu'_1 = E[X - a] = \sum (x_i - a) P(x_i) = \sum \left(x_i P(x_i) - a P(x_i)\right)$
Taking the summation over each term in the bracket,
$\mu'_1 = \sum x_i P(x_i) - a \sum P(x_i) = \mu - a \;\Rightarrow\; \mu = \mu'_1 + a$ …………………..(55)
Similarly, for a continuous random variable,

$$\mu'_1 = \int_{-\infty}^{\infty} (x - a) f(x)\,dx = \int_{-\infty}^{\infty} x f(x)\,dx - a \int_{-\infty}^{\infty} f(x)\,dx$$

$\mu'_1 = \mu - a$ $\left[\text{since } \int_{-\infty}^{\infty} f(x)\,dx = 1\right]$
$\Rightarrow \mu = \mu'_1 + a$

The second moment is given by

$\mu'_2 = E[(X - a)^2]$
i) $= \sum (x_i - a)^2 P(x_i)$, for a discrete r.v. …………..(56)
ii) $= \int_{-\infty}^{\infty} (x - a)^2 f(x)\,dx$, for a continuous r.v. ………(57)

The third moment is given by

$\mu'_3 = E[(X - a)^3]$
i) $= \sum (x_i - a)^3 P(x_i)$, for a discrete r.v. ………………(58)
ii) $= \int_{-\infty}^{\infty} (x - a)^3 f(x)\,dx$, for a continuous r.v. …………….(59)

The fourth moment is given by

$\mu'_4 = E[(X - a)^4]$
i) $= \sum (x_i - a)^4 P(x_i)$, for a discrete r.v. ………………(60)
ii) $= \int_{-\infty}^{\infty} (x - a)^4 f(x)\,dx$, for a continuous r.v. ………………..(61)

To obtain the 2nd, 3rd and 4th moments about the origin, all we need to do is take a = 0.

II. Moment about mean

Definition. If X is a random variable, the rth central moment of X about $\mu$, denoted by $\mu_r$, is given
by

$\mu_r = E[(X - \mu)^r]$
i) $= \sum (x_i - \mu)^r P(x_i)$, for a discrete r.v. ……………..(61)
ii) $= \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx$, for a continuous r.v. …………(62)

Computation of the basic four moments is the same as above.

1. The 1st moment about the mean $\mu$ is given by
$\mu_1 = E(X - \mu)$
i) $= \sum (x_i - \mu) P(x_i) = 0$, for a discrete r.v.
ii) $= \int_{-\infty}^{\infty} (x - \mu) f(x)\,dx = 0$, for a continuous r.v.
$\mu_1$ is always zero.

2. The 2nd moment about the mean is given by
$\mu_2 = E[(X - \mu)^2]$
i) $= \sum (x_i - \mu)^2 P(x_i) = \sigma^2$, for a discrete r.v.
ii) $= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \sigma^2$, for a continuous r.v.
$\mu_2$ is the variance.

3. The 3rd moment about the mean is given by
$\mu_3 = E[(X - \mu)^3]$
i) $= \sum (x_i - \mu)^3 P(x_i)$, for a discrete r.v.
ii) $= \int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx$, for a continuous r.v.

4. The 4th moment about the mean is given by
$\mu_4 = E[(X - \mu)^4]$
i) $= \sum (x_i - \mu)^4 P(x_i)$, for a discrete r.v.
ii) $= \int_{-\infty}^{\infty} (x - \mu)^4 f(x)\,dx$, for a continuous r.v.
We frequently utilize moments about mean to describe the basic characteristics of probability
distribution of a given random variable.

You are already well acquainted with the mean and the variance. Two other important characteristics of
a distribution are skewness and kurtosis. As you know, Karl Pearson developed measures
for the coefficients of skewness and kurtosis based on moments about the mean. We will discuss them in
this subsection.

A) Skewness
Skewness is lack of symmetry. K. Pearson's coefficient of skewness is given by

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ ………………………..(63)

If $\beta_1 = 0$ the distribution is symmetrical; if $\beta_1 > 0$ the distribution is skewed.

However, $\beta_1$ has a serious limitation: since $\mu_3^2$ and $\mu_2^3$ are never negative, $\beta_1$ is never
negative. Thus $\beta_1$ cannot tell us the direction of skewness. The
alternative measure of skewness, which is free from this limitation, is $\gamma_1$:

$\gamma_1 = \pm\sqrt{\beta_1} = \dfrac{\mu_3}{\sigma^3}$, with the sign taken from $\mu_3$ …………………………………. .(64)

Hence if $\gamma_1 = 0$ the distribution is symmetrical ($\mu_3 = 0$);
if $\gamma_1 > 0$ the distribution is positively skewed ($\mu_3$ is positive);
if $\gamma_1 < 0$ the distribution is negatively skewed ($\mu_3 < 0$).

[Figure: three density curves, labelled symmetrical ($\gamma_1 = 0$), negatively skewed ($\gamma_1 < 0$) and positively skewed ($\gamma_1 > 0$).]

B) Kurtosis
Kurtosis is a measure of the flatness or peakedness of a symmetric curve. The coefficient of kurtosis is given
by

$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ ……………(65)

if $\beta_2 = 3$ the distribution is normal or mesokurtic
if $\beta_2 < 3$ the distribution is platykurtic
if $\beta_2 > 3$ the distribution is leptokurtic.

Alternatively

$\gamma_2 = \beta_2 - 3$ ……………………………….(66)

if $\gamma_2 = 0$ the distribution is normal or mesokurtic
if $\gamma_2 < 0$ the distribution is platykurtic
if $\gamma_2 > 0$ the distribution is leptokurtic.

[Figure: three symmetric curves labelled leptokurtic, mesokurtic and platykurtic.]

You may ask what the purpose is of studying moments about 0 or about some assumed value a.
That is a good question. We need to study them because in many cases moments are given in terms of
either moments about an assumed value or moments about the origin. If either of these is given, it is
possible to change them to moments about the mean using the following conversion rule.

$$\mu_r = \mu'_r - \binom{r}{1}\mu'_{r-1}\,\mu'_1 + \binom{r}{2}\mu'_{r-2}\,(\mu'_1)^2 - \binom{r}{3}\mu'_{r-3}\,(\mu'_1)^3 + \dots + (-1)^r (\mu'_1)^r$$ ………...(66)

Hence $\mu_1 = 0$.
For r = 2, 3 and 4 we get the following:
$\mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3$ ……………………………….(67)
$\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4$

Example 49. In a continuous distribution whose p.d.f. is given by

$$f(x) = \frac{3}{4}x(2 - x), \quad 0 \le x \le 2,$$

find the mean, variance, $\beta_1$ and $\beta_2$.

Solution: General solution.

$$\mu'_r \text{ (about 0)} = \frac{3}{4}\int_{0}^{2} x^r \cdot x(2 - x)\,dx = \frac{3}{4}\int_{0}^{2} \left(2x^{r+1} - x^{r+2}\right)dx$$

$$= \frac{3}{4}\left[\frac{2}{r+2}x^{r+2} - \frac{1}{r+3}x^{r+3}\right]_{0}^{2} = \frac{3}{4}\left[x^{r+2}\,\frac{2r + 6 - (r+2)x}{(r+2)(r+3)}\right]_{0}^{2}$$

$$= \frac{3}{4} \cdot \frac{2^{r+2} \cdot 2}{(r+2)(r+3)} = \frac{3 \cdot 2^{r+1}}{(r+2)(r+3)}$$

This is the general solution. Then

$\mu'_1 = \dfrac{3 \times 2^2}{(1+2)(1+3)} = \dfrac{3 \times 4}{3 \times 4} = 1$

$\mu'_2 = \dfrac{3 \times 2^3}{(2+2)(2+3)} = \dfrac{3 \times 8}{4 \times 5} = \dfrac{6}{5}$

$\mu'_3 = \dfrac{3 \times 2^4}{(3+2)(3+3)} = \dfrac{3 \times 16}{5 \times 6} = \dfrac{8}{5}$

$\mu'_4 = \dfrac{3 \times 2^5}{(4+2)(4+3)} = \dfrac{3 \times 32}{6 \times 7} = \dfrac{16}{7}$

The mean, E(X), is the first moment about the origin. Therefore $E(X) = \mu'_1 = 1$.

$\mu_2 = \mu'_2 - (\mu'_1)^2 = \dfrac{6}{5} - 1 = \dfrac{1}{5}$ = variance

$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3 = \dfrac{8}{5} - \dfrac{18}{5} + 2 = 0$

$\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4 = \dfrac{16}{7} - 4 \times \dfrac{8}{5} \times 1 + 6 \times \dfrac{6}{5} \times 1 - 3 \times 1 = \dfrac{3}{35}$

Hence $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{0}{(1/5)^3} = 0$ $\Rightarrow$ symmetric distribution

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{3}{35} \times 25 = \dfrac{15}{7} < 3$ $\Rightarrow$ platykurtic distribution

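Example 50. The first 3 moments of a distribution about the value 2 are 1, 22 and 10. Find the mean,
S.D. and $\gamma_1$.

Solution
1) We know $\mu = \mu'_1 + a = 1 + 2 = 3$.
2) S.D. $= \sqrt{\mu_2}$, but $\mu_2 = \mu'_2 - (\mu'_1)^2 = 22 - 1 = 21$.
$\therefore$ S.D. $= \sqrt{21} \approx 4.58$
3) $\gamma_1 = \dfrac{\mu_3}{\sigma^3}$, but $\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3 = 10 - (3 \times 22 \times 1) + 2 = -54$.
$\therefore \gamma_1 = \dfrac{-54}{(\sqrt{21})^3} \approx -0.56$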
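The arithmetic of Example 50 can be reproduced in a few lines, taking the moments about a = 2 exactly as stated (1, 22 and 10). The variable names are chosen here for illustration; with these inputs the third central moment comes out as 10 - 66 + 2 = -54.

```python
import math

a = 2                    # assumed value about which the moments are given
m1, m2, m3 = 1, 22, 10   # first three moments about a, as stated in Example 50

mean = a + m1                          # mu = mu'_1 + a
mu2 = m2 - m1 ** 2                     # second central moment (variance)
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3   # third central moment
sd = math.sqrt(mu2)
gamma1 = mu3 / sd ** 3                 # coefficient of skewness

print(mean, mu2, mu3, round(sd, 2), round(gamma1, 2))  # 3 21 -54 4.58 -0.56
```

The negative sign of $\gamma_1$ indicates a negatively skewed distribution.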
