Chapter I Ad 2
For a scientific investigation of uncertain outcomes, we first need a measure of the degree of uncertainty. Since the outcomes of a random experiment are uncertain, a measure of the degree of uncertainty associated with an event (to be defined later on) is known as a probability measure.
Definitions:
1. Sample space. The set of all possible outcomes of an experiment is called the sample space and is denoted by S. The word sample is included as a reminder of the random nature of the experiment: its outcome is uncertain, so a given outcome is one sample of several possible outcomes. That is, each individual outcome occurs as a sample from the many possible outcomes of an experiment.
Some other symbols used in other texts to denote the sample space are Ω, Z and X.
2. Sample point. Each element of the sample space S is called a sample point and is usually denoted by ω.
3. Events. - Events are subsets of the sample space, S.
4. Event space. The class or collection of all events associated with a given experiment is
defined to be the event space. We use E to denote event space.
Now let us see some examples that will help us understand the above definitions.
Example 1. Let the experiment be rolling a single well-balanced die. The sample space of the experiment is S = {1, 2, 3, 4, 5, 6}, which contains all the possible outcomes of the experiment. Each outcome is a sample point, i.e., i ∈ S. Let A = {2, 4, 6}. Then A is a subset of S and defines the event of obtaining an even outcome. Let B = {4}. Then B defines the event of obtaining a 4. Both A and B are subsets of the sample space and thus are events. The size of this sample space is finite.
We say that an event A occurs if a trial produces one of its elements, i.e. when 2 or 4 or 6 occurs. If a 4 turns up, we say that events A and B both occur. Since in this case B ⊂ A, whenever B occurs, A also occurs.
We may be interested in the following events:
a) the outcome is the number 1;
b) the outcome is even but less than 3;
c) the outcome is not even.
When an event contains only one element of S, like B above, it is called an elementary event.
If we collect all subsets of S in one set and denote it by E, this new set is called the event space.
Example 2. The experiment is to record the number of traffic deaths in Addis Ababa on the eve of the next New Year. Any nonnegative integer is a conceivable outcome of this experiment. The sample space for this experiment is
S = {0, 1, 2, . . .}. There is a countably infinite number of points in the sample space; hence the size of this sample space is countably infinite. Since each point is itself an elementary event, there is an infinite number of events. Another possible event is A = fewer than 500 deaths, that is, A = {0, 1, 2, ..., 499}.
Example 3. Consider the agricultural experiment of growing wheat on an acre of land under a well-defined set of conditions and then recording the yield x in quintals. Any nonnegative number of quintals could be an outcome of this experiment. The sample space here is
S = {x : x ≥ 0}.
In this case the sample space has an uncountably infinite number of elements, since its elements fill an interval of the real number line.
Thus we think of events as subsets of the sample space S. Whenever A and B are events in which we are interested, we can reasonably concern ourselves also with the events "A or B", "A and B", and "not A". Using set notation we can write them as A ∪ B, A ∩ B and A^c respectively.
Events A and B are called disjoint if their intersection is the empty set ∅; ∅ is called the impossible event. The set S is called the certain or sure event.
We know from our study of set theory that the total number of subsets that can be formed from a given set is given by the formula 2^n, where n is the number of elements of the set under discussion. Exactly this formula gives the number of possible events that can be formed from a finite sample space S; in this case n is the number of elements of the sample space. Therefore, for sample spaces with a 'sufficiently large' or infinite number of sample points, the number of events that can be formed will be very large.
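For a finite sample space the count 2^n can be checked by brute force. The following Python sketch lists every subset of the die sample space of Example 1; the helper name `all_events` is our own illustrative choice, built only on the standard `itertools` module.

```python
from itertools import chain, combinations


def all_events(sample_space):
    """Return every subset (event) of a finite sample space as a frozenset."""
    points = sorted(sample_space)
    # Chain together the subsets of size r = 0, 1, ..., n.
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]


S = {1, 2, 3, 4, 5, 6}                      # die of Example 1
E = all_events(S)
print(len(E))                               # 64, i.e. 2**6 events
print(frozenset() in E, frozenset(S) in E)  # True True: impossible and sure events
```

Adding one more sample point doubles the number of events, which is why listing all events quickly becomes impractical.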
We have seen in the introduction to statistics course that we always try to know the likelihood of occurrence of events, i.e. we always calculate probabilities for events. Sample spaces with an infinite number of elements, like those of Examples 2 and 3 above, have infinitely many events, and it would be impossible (or unnecessary) to assign probabilities to all of them. Hence, for reasons beyond this module, for infinite sample spaces not all subsets of the sample space are entitled to be events. The question is then: which subsets of S are entitled to be events?
A sub-collection E of the set of all subsets of S, known as the event space, is entitled to be the collection of events if it satisfies the following properties:
a) S ∈ E
b) If A ∈ E, then A^c ∈ E
c) If A1, A2, ... ∈ E, then A1 ∪ A2 ∪ ... ∈ E; in compact form, $\bigcup_{i=1}^{\infty} A_i \in E$.
In mathematical language, any collection E of subsets of S that satisfies the above three properties is known as a σ-field (sigma-field) or algebra of events.
A σ-field is thus a field that is closed under complementation and countable unions, and therefore (by De Morgan's laws) also under countable intersections.
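For a finite sample space, closure under countable unions reduces to closure under unions of pairs, because only finitely many distinct events exist. Under that assumption, properties a) to c) can be checked mechanically. The function `is_sigma_field` below is a hypothetical helper written for illustration, not a library routine.

```python
from itertools import combinations


def is_sigma_field(S, events):
    """Check properties a)-c) for a collection of subsets of a FINITE set S.

    With S finite there are only finitely many distinct events, so closure
    under countable unions reduces to closure under unions of pairs.
    """
    S = frozenset(S)
    E = {frozenset(A) for A in events}
    if S not in E:                                     # a) S belongs to E
        return False
    if any((S - A) not in E for A in E):              # b) closed under complement
        return False
    return all((A | B) in E for A, B in combinations(E, 2))  # c) closed under union


S = {1, 2, 3, 4, 5, 6}
print(is_sigma_field(S, [set(), {2, 4, 6}, {1, 3, 5}, S]))  # True
print(is_sigma_field(S, [set(), {4}, S]))                   # False: complement of {4} missing
```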
1.3. Definitions of Probability
There are three widely used definitions, or interpretations, of probability. All of them are, of course, familiar to you, because you have used them in your previous course on introduction to statistics. These definitions are:
a) Classical or a priori definition of probability
b) Relative frequency or posterior definition of probability
c) Subjective definition of probability
In this section we will discuss each of these definitions briefly.
1.3.1. Classical or a priori definition of Probability
In its early stage the theory of probability was closely associated with games of chance. This association is the basis of the classical definition, which rests on the notions of 'mutually exclusive' and 'equally likely' experimental outcomes.
Definition. If a random experiment can result in N mutually exclusive and equally likely outcomes, i.e. if the sample space S consists of N mutually exclusive and equally likely outcomes, and if N_A of these outcomes have an attribute A, then the probability of A is given by the ratio
$P(A) = \frac{N_A}{N} = \frac{\text{number of outcomes with attribute } A}{\text{total number of possible outcomes}}$ ……..(1)
If A is an elementary event, meaning it contains a single outcome, then its probability is P(A) = 1/N, where N is the size of the sample space, i.e. we take n(S) = N.
For example, in a single toss of a fair die the probability of obtaining any single outcome, say 2, is 1/6. If we let A be the event of obtaining an even number, three of the six outcomes have this attribute, i.e. A = {2, 4, 6}. Then the probability of the event A is
$P(A) = \frac{N_A}{N} = \frac{N(A)}{N} = \frac{3}{6} = \frac{1}{2}$
P(A) can also be obtained as the sum of the probabilities of the sample points that result in the occurrence of A. That is,
$P(A) = P(2) + P(4) + P(6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}$.
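Equation (1) and the sum-of-sample-points argument can be reproduced with exact fractions. The helper `classical_probability` is a hypothetical name; the sketch assumes, as the classical definition does, that every outcome in S is equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = N(A) / N, assuming all N outcomes are equally likely (equation (1))."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                                         # an even number turns up
print(classical_probability(A, S))                    # 1/2

# The same answer by adding the probabilities of the sample points in A.
print(sum(classical_probability({s}, S) for s in A))  # 1/2
```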
Example 5: Let the experiment be tossing a single coin 3 times, or tossing 3 coins simultaneously. The possible outcomes of this experiment can be obtained by applying the Cartesian product principle:
{H, T} × {H, T} × {H, T} = {HH, HT, TH, TT} × {H, T}
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
If the coin is well balanced, or fair, these 8 possible outcomes are mutually exclusive and equally likely. Each single outcome, or sample point, has probability 1/N = 1/8. Now suppose that we want the probability of obtaining at least two heads. Let B denote the event of at least two heads:
B = {HHH, HHT, HTH, THH}. Then the probability of the event B is
$P(B) = \frac{N(B)}{N} = \frac{4}{8} = \frac{1}{2}$
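The Cartesian product construction of Example 5 maps directly onto `itertools.product` in Python's standard library. The sketch below builds S for three tosses and counts the outcomes with at least two heads, assuming a fair coin so that the classical ratio applies.

```python
from fractions import Fraction
from itertools import product

# {H,T} x {H,T} x {H,T}: each string of three letters is one sample point.
S = ["".join(outcome) for outcome in product("HT", repeat=3)]
print(S)  # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# Event B: at least two heads.
B = [s for s in S if s.count("H") >= 2]
print(B, Fraction(len(B), len(S)))  # ['HHH', 'HHT', 'HTH', 'THH'] 1/2
```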
Example 6. Probability calculations associated with drawing a card, such as drawing a spade, at random from a deck of playing cards are another application of the classical definition of probability. Dear student, refer to the discussion of probability in your introduction to statistics module for additional examples of this sort.
According to the classical definition, the probability P(A) of an event A is determined a priori, without actual experimentation, i.e. no actual experiment need ever take place. In other words, it is possible to conceive of the experiment of, say, tossing a coin 3 times without actually doing it, and to proceed logically on the assumption of an unbiased coin. That is why it is called a priori probability.
The classical definition was introduced as a consequence of the principle of insufficient reason: "In the absence of any prior knowledge, we must assume that the (elementary) events Ai have equal probabilities." This conclusion is based on the subjective interpretation of probability as a measure of our state of knowledge about the events Ai.
By the classical definition, the probability of an event A is a number between 0 and 1 inclusive, i.e. 0 ≤ P(A) ≤ 1:
(i) P(A) ≥ 0 because N(A) ≥ 0, and P(A) ≤ 1 because the total number of possible outcomes cannot be less than the number of outcomes with the specified attribute, i.e. N ≥ N(A).
(ii) If an event is certain to happen, its probability is 1.
(iii) If it is certain not to happen, its probability is 0.
Critique:
The classical definition of probability breaks down in the following cases.
a) If the various outcomes of an experiment are not mutually exclusive and equally likely.
Example 7. What is the probability of rolling a 4 in a single toss of a die if the die is biased? To put it differently, if the die is loaded and the probability of a 4 equals, say, 0.2, the number 0.2 cannot be calculated from the ratio given by equation (1).
Example 8. The classical definition will not help us when we try to answer questions such as
- What is the probability that a child that will be born next in Bahir Dar will be a boy?
- What is the probability that a male will die before age 50?
- What is the probability that a candidate will pass in a certain test?
- What is the probability that it will rain tomorrow?
Since the outcomes in the above questions are not equally likely, answering probability questions of this sort with the classical definition is impossible.
To deal with the above sorts of questions, we need a broader notion of probability. This more widely applicable notion is called the relative frequency, or posterior, definition of probability.
Despite the weakness mentioned above, the classical interpretation of probability is still widely applicable. For instance:
i) In a number of applications, the assumption that there are N equally likely alternatives is well established through long experience. Moreover, by an appropriate choice of the sample space we can often reduce the problem to one in which all outcomes are equally likely.
Suppose that we repeat an experiment a large number N of times under essentially similar conditions, and suppose that A is some event which may or may not occur on each repetition. Experience with such experimentation shows that the proportion of times that A occurs settles down to some value as N → ∞. That is to say, if N(A) is the number of times that A occurs in the N trials, the ratio N(A)/N appears to converge to a constant limit as N increases indefinitely. We can take the ultimate value of this ratio as P(A). The relative frequency definition is then given by
$P(A) = \lim_{N \to \infty} \frac{N(A)}{N}$ ………………..(2)
That is, we assume that a series of experiments can be made keeping the initial conditions as equal as possible. An observation of a random experiment is made, then the experiment is repeated and another observation is taken. When the experiment is repeated a sufficiently large number of times, in many cases the observations fall into certain classes in which the relative frequencies are quite stable. This stable relative frequency can be taken as the (approximate) probability of the event.
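Equation (2) can be illustrated by simulation: toss a coin N times and watch N(A)/N settle down as N grows. In the Python sketch below the head probabilities 0.5 and 0.3 are assumptions chosen for illustration, and the tosses come from a seeded pseudo-random generator, so the run is reproducible but only approximates a real experiment.

```python
import random


def relative_frequency(p_heads, n_tosses, seed=0):
    """Simulate n_tosses of a coin with P(H) = p_heads and return N(H)/N."""
    rng = random.Random(seed)  # seeded for a reproducible run
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses


for n in (10, 100, 1_000, 100_000):
    # A fair coin and a biased coin (assumed P(H) = 0.3), side by side.
    print(n, relative_frequency(0.5, n), relative_frequency(0.3, n))
```

For small N the ratios wander; for large N they stay close to 0.5 and 0.3. The biased case is exactly the situation where the classical ratio (1) gives no answer but the relative frequency still does.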
Example 9. Consider tossing a coin (balanced or unbalanced) 1000 times, and suppose heads turns up 540 times. Then
$P(H) \approx \frac{540}{1000} = 0.54 \approx 0.5 = \frac{1}{2}$, and likewise $P(T) \approx 0.5$.
Example 10. If we find upon examination of large records that about 51% of births in one locality are male, it might be reasonable to postulate that the probability of a male birth in this locality is equal to P, with P ≈ 0.51, i.e.
P(male birth in that locality) = P ≈ 0.51.
Example 11. A study of 8,000 economics graduates of A.A.U. was conducted. The study revealed that 400 of these graduates were not employed in their major areas of study. What is the probability that a particular economics graduate will be employed in an area other than his/her field of study?
$P(\text{a graduate will be employed in an area other than his/her major}) \approx \frac{400}{8000} = \frac{1}{20}$
What we learn from the above examples is that the probability of an event happening in the long run is determined by observing what fraction of the time similar events happened in the past:
$\text{Probability of event happening} = \frac{\text{number of times the event occurred in the past}}{\text{total number of observations}}$
Dear distance student! Please notice that the relative frequency definition requires neither a) that events be equally likely nor b) that the objects of the experiment be unbiased. When the objects of the experiment are biased (not balanced), it is the relative frequency definition that must be used to determine the probability of events.
Example 12. i) Estimating the probability that a new product of a firm will be successful in the market.
ii) Estimating the probability that a distance student will score an A in the course. It depends on individuals' evaluation of the student's performance in other subjects.
iii) When a person, perhaps a friend of yours, says that the probability of rain tomorrow is, say, 70%, he is expressing his personal degree of belief.
You may have noticed from the above examples that subjective probabilities are based on any and all information available to each person about the uncertainties related to the outcome of the event, and they will differ from person to person.
The concept of subjective probability is the foundation of the Bayesian approach to statistics, which we do not deal with in this course.
The axiomatic definition of probability includes the above three definitions of probability but is free from their drawbacks.
Discussion of the axiomatic definition of probability is conveniently couched in the language of set theory. Hence, before giving the axiomatic definition of probability below, we present a summary of the results from set theory that are relevant to our interest.
1. De Morgan's laws
(A ∪ B)^c = A^c ∩ B^c, and (A ∩ B)^c = A^c ∪ B^c, as can be seen from a Venn diagram.
[Venn diagrams illustrating (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c]
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
Note that by algebra of events we mean that the events in E obey the algebra of subsets summarized in the set operations above. This is because events are subsets of a sample space.
From the above 3 axioms we can deduce a number of laws (properties) of probability.
Since E is an algebra of events, if A and B ∈ E, then A^c, A ∪ B, AB, AB^c, etc. are also elements of E, and hence it makes sense to talk about P(A^c), P(A ∪ B), P(AB), P(AB^c), etc.
$P(A \cup B) = \frac{N(A) - N(AB) + N(AB) + N(B) - N(AB)}{N(S)}$
$= \frac{N(A) + N(B) - N(AB)}{N(S)}$
$= \frac{N(A)}{N(S)} + \frac{N(B)}{N(S)} - \frac{N(AB)}{N(S)} = P(A) + P(B) - P(AB)$
ii) For three events A, B, C in E,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(BC) − P(AC) + P(ABC) ……(6)
Proof
$P(A \cup B \cup C) = \frac{N(A \cup B \cup C)}{N(S)} = \frac{1}{N(S)}\left[N(A) - N(AB) - N(AC) + N(ABC) + N(B) - N(BC) + N(C)\right]$
Dividing each term by N(S) gives (6).
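Both addition rules can be checked on the die of Example 1 by direct counting. The three events below are arbitrary choices made only for this check; `P` is a small helper implementing the classical ratio N(·)/N(S).

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}


def P(event):
    """Classical probability on the equally likely die outcomes."""
    return Fraction(len(event), len(S))


A, B, C = {2, 4, 6}, {4, 5, 6}, {1, 2, 4}  # illustrative events

# Two events: P(A or B) = P(A) + P(B) - P(AB)
lhs2 = P(A | B)
rhs2 = P(A) + P(B) - P(A & B)

# Three events: equation (6)
lhs3 = P(A | B | C)
rhs3 = (P(A) + P(B) + P(C)
        - P(A & B) - P(B & C) - P(A & C)
        + P(A & B & C))
print(lhs2, rhs2, lhs3, rhs3)  # 2/3 2/3 5/6 5/6
```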
$P\left(\bigcup_{i=1}^{n} A_i\right) = P(A_1) + P(A_2) + \dots + P(A_n) = \sum_{i=1}^{n} P(A_i)$ ……………………………(11)
If, moreover, A1 ∪ A2 ∪ . . . ∪ An = S, then since P(S) = 1 we have 1 = P(A1) + P(A2) + . . . + P(An).
Corollary. If A and B are elements of E, then P(A) = P(AB) + P(AB^c).
Proof
We know that A = AS = A(B ∪ B^c) = AB ∪ AB^c. We also know that AB ∩ AB^c = ∅; hence AB and AB^c are mutually exclusive events. Therefore
P(A) = P(AB ∪ AB^c) = P(AB) + P(AB^c).
Generalization
If A1, A2, A3, . . . is a sequence of mutually exclusive events in E, then $\bigcup_{i=1}^{\infty} A_i \in E$ and
$P(A_1 \cup A_2 \cup \cdots) = P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
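B) Boole's inequality.
If A1, A2, ..., An ∈ E, then $P\left(\bigcup_{i=1}^{n} A_i\right) \le P(A_1) + P(A_2) + \dots + P(A_n)$ ……(13)
Proof
Take two events A1 and A2 ∈ E. Then
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1A2) ≤ P(A1) + P(A2), since P(A1A2) ≥ 0.
That is, P(A1 ∪ A2) = P(A1) + P(A2) if A1A2 = ∅.
However, if P(A1A2) > 0, then P(A1 ∪ A2) < P(A1) + P(A2). The case of n events follows by applying the two-event result repeatedly (induction on n).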
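On a small finite sample space, Boole's inequality and the equality condition for two events can be verified exhaustively. The sketch assumes three equally likely outcomes (an arbitrary choice) and loops over every triple and every pair of events.

```python
from fractions import Fraction
from itertools import chain, combinations, product

S = (1, 2, 3)  # illustrative sample space with equally likely outcomes
events = [frozenset(c) for c in
          chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))]


def P(event):
    return Fraction(len(event), len(S))


# Boole's inequality (13) for every triple of events.
for A1, A2, A3 in product(events, repeat=3):
    assert P(A1 | A2 | A3) <= P(A1) + P(A2) + P(A3)

# Two events: equality exactly when P(A1A2) = 0; here every sample point has
# positive probability, so that happens exactly when A1A2 is empty.
for A1, A2 in product(events, repeat=2):
    assert (P(A1 | A2) == P(A1) + P(A2)) == (not (A1 & A2))

print("checked", len(events) ** 3, "triples")  # checked 512 triples
```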
Thus far we have seen some of the rules of probability; what is left is the multiplication rule. We will discuss this rule together with conditional probability and independence. Before that, we will look at counting procedures.
1.5. Counting Procedure
If the number of possible outcomes in an experiment is small, it is relatively easy to list and count all the possible outcomes and assign probabilities to all events. However, if the size of the event, N(A), and the size of the sample space, N(S), are large for a random experiment with a finite number of equally likely outcomes, counting becomes a difficult problem. Such counting is usually handled by counting procedures known as combinatorial formulas.
There are three widely used counting (enumeration) methods, namely:
• the multiplication principle
• permutations
• combinations.
We will see them one by one.
I. Multiplication Principle
Suppose we have two sets F and T. If F has m distinct objects f1, f2, . . ., fm and T has n distinct objects t1, t2, . . ., tn, then the number of pairs (fi, tj) that can be formed by taking one object from set F and a second from set T is (m)(n). ……………..(14)
Example 13. Let the random experiment be throwing a balanced coin and a fair die and recording the paired outcomes. If set F contains the possible outcomes of throwing the coin, its elements are {H, T} and N(F) = 2; if set T contains the possible outcomes of throwing the die, its elements are {1, 2, 3, 4, 5, 6} and N(T) = 6. The outcomes of our random experiment are obtained by the Cartesian product of the two sets, that is,
{H, T} × {1, 2, 3, 4, 5, 6}.
The possible paired outcomes are
S = {(H,1), (H,2), (H,3), (H,4), (H,5), (H,6),
(T,1), (T,2), (T,3), (T,4), (T,5), (T,6)}
Therefore the total number of possible outcomes is 12. It can simply be obtained by taking the product of the sizes of the two sets under discussion: in our example it is 2 × 6 = 12.
In general, if there are m ways of doing one thing and n ways of doing another thing, there are (m)(n) ways of doing both.
Multiplication formula. Total number of arrangements = (m)(n).
This principle can obviously be extended to any number of procedures. That is, if there are m ways of doing one thing, n ways of doing another thing and r ways of doing a third thing, then the total number of arrangements is (m)(n)(r).
Example 14. A manufactured item must pass through three control stations. At each station the item is inspected for particular characteristics and marked accordingly. At the first station 3 ratings are possible, while at each of the last two stations 4 ratings are possible. Therefore there are 3 × 4 × 4 = 48 ways in which the item may be marked.
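The multiplication principle is what `itertools.product` implements: the number of tuples it yields is the product of the input sizes. The sketch reproduces Examples 13 and 14; the station ratings are represented by the hypothetical labels 0, 1, 2, ... rather than by real rating names.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]
pairs = list(product(coin, die))         # Example 13: F x T
print(len(pairs), len(coin) * len(die))  # 12 12

# Example 14: 3 ratings at station 1, 4 ratings at each of stations 2 and 3.
markings = list(product(range(3), range(4), range(4)))
print(len(markings))                     # 48
```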
Corollary: Suppose there are m ways of doing one thing and n ways of doing another thing, but, unlike the situation above, the two cannot be performed together. Then the number of ways in which we can perform the first or the second is m + n. This addition rule extends to more than two mutually exclusive alternatives.
Example 15: Suppose that you and your friends are planning a trip to some tourist attraction areas, and all of you are deciding between bus and air transportation. If there are 3 bus routes and 1 air route, then there are 3 + 1 = 4 different routes available for your trip.
II. Permutations
Suppose that we have n different objects. The question is: in how many ways may these objects be arranged (permuted) when the order of arrangement is important? That is, ABC and CBA are considered two different arrangements. In general, consider the following scheme. Arranging n objects is equivalent to placing them in a box with n compartments in some specified order:
| 1 | 2 | . . . | n |
We have n choices to fill the first compartment. Once we choose any one of the n objects to fill the first compartment, we have (n − 1) options to fill the second compartment, and so on; for the last compartment we have exactly one option. The total number of arrangements, denoted by nPn, is therefore
nPn = n(n − 1)(n − 2)(n − 3) . . . (2)(1) = n!
We may be interested in the number of permutations possible when we choose r (≤ n) objects from n, that is, the arrangements (permutations) of n objects taken r at a time. As in the above case, the first compartment can be filled in n ways, the second in (n − 1) ways, and so on, and the last (r-th) compartment in (n − r + 1) ways. Hence the total number of permutations of n objects taken r at a time is
${}_nP_r = n(n-1)(n-2)(n-3)\cdots(n-r+1) = \frac{n!}{(n-r)!}$ ………………(15)
Example 16. A manufacturer uses a color code to identify lots of manufactured items. The code consists of stamping seven colored stripes on the container. The order of the colors is significant, and each identification uses all 7 colors.
Suppose that the manufacturer wishes to use seven stripes but has 15 colors available. How many distinct markings can he get?
${}_{15}P_{7} = \frac{15!}{(15-7)!} = \frac{15!}{8!} = 32{,}432{,}400$
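Python's standard `math` module (version 3.8 or later) provides `math.perm(n, r)`, which evaluates formula (15) directly. The sketch checks Example 16 and confirms the formula against an explicit listing for a small case of our own choosing (5 objects taken 3 at a time).

```python
import math
from itertools import permutations

print(math.perm(15, 7))                         # 32432400
print(math.factorial(15) // math.factorial(8))  # 32432400, from 15!/8!

# Brute force: list every ordered arrangement of 3 out of 5 letters.
print(len(list(permutations("ABCDE", 3))), math.perm(5, 3))  # 60 60
```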
III. Combinations
When arrangements are made without regard to order, they are called combinations. In this case ABC and CBA are considered to be the same combination. The number of combinations of n different objects taken r (0 < r < n) at a time, denoted by $\binom{n}{r}$ or ${}_nC_r$, is
$\binom{n}{r} = {}_nC_r = \frac{n!}{(n-r)!\,r!} = \frac{{}_nP_r}{r!}$ …………………………………………(16)
Example 17. From eight persons, how many committees of 3 members may be chosen? Since two committees are the same if they are made up of the same members, we have
$\binom{8}{3} = \frac{8!}{5!\,3!} = 56$
possible committees.
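Example 18. A group of eight persons consists of 5 men and 3 women. How many committees of 3 members consisting of 2 men and 1 woman can be chosen? Here we must do two things: choose 2 men out of 5, and choose 1 woman out of 3. The required number of committees is
$\binom{5}{2}\binom{3}{1} = 10 \times 3 = 30$ committees.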
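`math.comb(n, r)` (Python 3.8+) evaluates formula (16). The sketch checks Examples 17 and 18 both by formula and by listing the committees; the member names M1, ..., W3 are placeholders.

```python
import math
from itertools import combinations

print(math.comb(8, 3))                        # 56 (Example 17)
print(math.perm(8, 3) // math.factorial(3))   # 56, i.e. 8P3 / 3!
print(len(list(combinations(range(8), 3))))  # 56, by listing

men = ["M1", "M2", "M3", "M4", "M5"]
women = ["W1", "W2", "W3"]
# Example 18: every pair of men combined with every single woman.
committees = [pair + (w,) for pair in combinations(men, 2) for w in women]
print(len(committees), math.comb(5, 2) * math.comb(3, 1))  # 30 30
```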
Dear colleague! So far, what we have discussed in this subsection is determining the size of the sample space, i.e. n(S). You are probably asking yourself how to use these counting (enumeration) methods to determine the size of an event and hence obtain its probability.
Dear learner! Conditional probability and the probability of independent events will be easier to understand if we first learn the concepts of joint and marginal probabilities.
Definition. Let A and B be two events in E (the event space) of a given probability space. If P(B) > 0, the conditional probability that event A occurs given that B has occurred is defined by
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(AB)}{P(B)} = \frac{\text{joint probability}}{\text{marginal probability of } B}$ …………….(23)
Similarly, $P(B/A) = \frac{P(AB)}{P(A)}$, if P(A) > 0. …………………………………(24)
Example 22. Two fair dice are thrown. Given that the first shows 3, what is the probability that the total exceeds 6?
Solution.
Clearly S = {1,2,3,4,5,6} × {1,2,3,4,5,6}, so N(S) = 36.
Let B be the event that the first die shows 3, and A the event that the total exceeds 6. In the usual notation,
B = {(3, b): 1 ≤ b ≤ 6}, A = {(a, b): a + b > 6},
AB = {(3,4), (3,5), (3,6)} (the joint occurrence).
Hence
$P(A/B) = \frac{P(AB)}{P(B)} = \frac{N(AB)/N}{N(B)/N} = \frac{3/36}{6/36} = \frac{3}{6} = \frac{1}{2}$
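Example 22 can be checked by enumerating all 36 outcomes. The sketch computes P(A/B) in two ways: from definition (23), and by counting only inside B, which treats B as the new, reduced sample space.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs (a, b)
B = [(a, b) for (a, b) in S if a == 3]      # first die shows 3
A = [(a, b) for (a, b) in S if a + b > 6]   # total exceeds 6
AB = [w for w in A if w in B]               # joint occurrence

print(AB)                                                    # [(3, 4), (3, 5), (3, 6)]
print(Fraction(len(AB), len(S)) / Fraction(len(B), len(S)))  # 1/2, from (23)
print(Fraction(len(AB), len(B)))                             # 1/2, counting inside B
```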
Example 23. A family has two children. What is the probability that both are boys, given that at least one is a boy?
Solution. The older and the younger child may each be male or female, so there are four possible combinations of sexes, which we assume to be equally likely. Hence the sample space is
S = {GG, GB, BG, BB}
and P(GG) = P(GB) = P(BG) = P(BB) = 1/4.
The question is to obtain P(BB / at least one boy):
$P(BB/GB \cup BG \cup BB) = \frac{P(BB \cap (GB \cup BG \cup BB))}{P(GB \cup BG \cup BB)} = \frac{P(BB)}{P(GB \cup BG \cup BB)} = \frac{1/4}{3/4} = \frac{1}{4} \times \frac{4}{3} = \frac{1}{3}$
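The same enumeration idea settles Example 23, under the stated assumption that the four sex combinations are equally likely.

```python
from fractions import Fraction
from itertools import product

S = ["".join(kids) for kids in product("GB", repeat=2)]  # ['GG', 'GB', 'BG', 'BB']
at_least_one_boy = [s for s in S if "B" in s]            # the conditioning event
both_boys = [s for s in at_least_one_boy if s == "BB"]
print(Fraction(len(both_boys), len(at_least_one_boy)))   # 1/3
```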
From the definition of conditional probability we can see that
P(AB) = P(A/B)P(B) = P(B/A)P(A).
An important effect of conditioning is to reduce the sample space. To make this idea clear, let us revisit the formula
$P(A/B) = \frac{P(AB)}{P(B)}$
using the frequency approach. If N(AB) is the number of occurrences of AB and N(B) the number of occurrences of B in N trials, then
$P(AB) = \frac{N(AB)}{N}, \quad P(B) = \frac{N(B)}{N}$
Hence
$P(A/B) = \frac{N(AB)}{N} \times \frac{N}{N(B)} = \frac{N(AB)}{N(B)}$
From this we can see that the given event B is our new sample space, whose size is smaller than the size of the original sample space S.
In passing, we note that in a sense all probabilities are conditional, because probabilities cannot be computed except on the basis of the probabilities of basic events. Frequently the conditioning statement in a probability statement is omitted, either because it is obvious in the context of the problem or because it is universally accepted, i.e. P(A) = P(A/S).
The next question to be investigated is: does conditional probability satisfy the three axioms and the various properties of probability?
The answer is yes.
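i) P(A/B) = P(AB)/P(B) ≥ 0 for every A ∈ E; moreover, since AB ⊂ B, P(AB) ≤ P(B), i.e. 0 ≤ P(A/B) ≤ 1.
ii) P(S/B) = P(SB)/P(B) = 1.
Proof
Since B ⊂ S, SB = B. Hence
$\frac{P(SB)}{P(B)} = \frac{P(B)}{P(B)} = 1$
iii) If A1, A2, ... ∈ E are mutually exclusive, i.e. AiAj = ∅ for i ≠ j, then
$P\left(\bigcup_{i=1}^{\infty} A_i \,/\, B\right) = P(A_1 \cup A_2 \cup \cdots / B) = \frac{P\left(\bigcup_{i=1}^{\infty} A_i B\right)}{P(B)} = \frac{\sum_{i=1}^{\infty} P(A_i B)}{P(B)} = P(A_1/B) + P(A_2/B) + P(A_3/B) + \cdots = \sum_{i=1}^{\infty} P(A_i/B)$ ………………..(24)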
The conditional probabilities also satisfy the following rules (properties) of probability.
Assume that the probability space is given, and let B ∈ E with P(B) > 0.
1. P(∅/B) = 0
P(∅/B) = P(∅B)/P(B) = P(∅)/P(B) = 0 [since ∅B = ∅ and P(∅) = 0]
2. If A is an event in E (the event space), then
P(A^c/B) = 1 − P(A/B)
Proof
We know that A ∪ A^c = S and AA^c = ∅. Using this result we have
P(A ∪ A^c / B) = P(A/B) + P(A^c/B) = P(S/B) [axiom iii)]
P(A/B) + P(A^c/B) = 1
P(A^c/B) = 1 − P(A/B)
3. If A1 and A2 ∈ E, then
P(A1/B) = P(A1A2/B) + P(A1A2^c/B)
Proof
We can write A1 = A1A2 ∪ A1A2^c,
and we know that (A1A2)(A1A2^c) = A1(A2 ∩ A2^c) = ∅. Hence, by iii), P(A1/B) = P(A1A2/B) + P(A1A2^c/B).
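Proof (theorem of total probability, where B1, B2, ..., Bn form a partition of S)
We know that A = AS = A(B1 ∪ B2 ∪ ... ∪ Bn)
= AB1 ∪ AB2 ∪ .... ∪ ABn = $\bigcup_{j=1}^{n} AB_j$; then
$P(A) = \sum_{j=1}^{n} P(AB_j) = \sum_{j=1}^{n} P(A/B_j)P(B_j) = P(A/B_1)P(B_1) + P(A/B_2)P(B_2) + \cdots + P(A/B_n)P(B_n)$
$P(A) = \sum_{j=1}^{n} P(A/B_j)P(B_j)$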
The total probability theorem helps us calculate the probability of a single event A using conditional probabilities and the concept of the joint occurrence of two events.
The formation of a partition of S means that when the experiment is performed then exactly one of the
events Bj will occur. The result
Solution
Let A = {the item is defective}
B1 = {the item came from factory 1}
B2 = {the item came from factory 2}
B3 = {the item came from factory 3}
P(B1) = 1/2, while P(B2) = P(B3) = 1/4.
The probabilities that an item from each factory is defective are
P(A/B1) = 2% = 0.02
P(A/B2) = 2% = 0.02
P(A/B3) = 4% = 0.04
P(A) = P(A/B1)P(B1) + P(A/B2)P(B2) + P(A/B3)P(B3)
$P(A) = (0.02)\left(\tfrac{1}{2}\right) + (0.02)\left(\tfrac{1}{4}\right) + (0.04)\left(\tfrac{1}{4}\right) = 0.025$
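The theorem of total probability is a weighted sum, which the sketch below evaluates exactly for this example. It takes P(B1) = 1/2 and P(B2) = P(B3) = 1/4, the values that make the probabilities of the partition sum to 1 and reproduce the stated answer 0.025.

```python
from fractions import Fraction

prior = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}              # P(B_j)
defect_rate = {1: Fraction(2, 100), 2: Fraction(2, 100), 3: Fraction(4, 100)}  # P(A/B_j)

assert sum(prior.values()) == 1          # B1, B2, B3 must partition S
p_defective = sum(defect_rate[j] * prior[j] for j in prior)
print(p_defective, float(p_defective))   # 1/40 0.025
```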
1.6.3. Independence
In general, the occurrence of some event B changes the probability that another event A occurs; P(A) is then replaced by P(A/B). If the probability remains unchanged, that is, if P(A/B) = P(A), we say that event A is independent of event B. In other words, the occurrence of B has no bearing on the occurrence of A.
Definition. Let A and B be two events in E of a given probability space. Events A and B are called independent if and only if any one of the following conditions is satisfied:
(i) P(AB) = P(A)P(B)
(ii) P(A/B) = P(A), if P(B) >0 …………………….(26)
(iii) P(B/A) = P(B), if P(A)>0
Example 25. Suppose that a fair die is tossed twice.
Let A = {the first toss shows an even number}
B = {the second toss shows 5 or 6}
Intuitively, events A and B are totally unrelated: knowing that B did occur provides no information about the occurrence of A. N(S) = 36 and the outcomes are equally likely.
$P(A) = \frac{18}{36} = \frac{1}{2}, \quad P(B) = \frac{12}{36} = \frac{1}{3}$. Since N(AB) = 3 × 2 = 6, $P(AB) = \frac{6}{36} = \frac{1}{6} = \frac{1}{2} \times \frac{1}{3} = P(A)P(B)$, and
$P(A/B) = \frac{P(AB)}{P(B)} = \frac{1}{6} \times \frac{3}{1} = \frac{1}{2} = P(A)$.
Independence of A and B implies independence of other pairs of events as well.
If A and B are two independent events in E, then A and B^c, A^c and B, and A^c and B^c are all pairs of independent events. Let us see some of these:
A) P(AB^c) = P(A)P(B^c)
Proof: We know that AB^c = A − AB; then
P(AB^c) = P(A) − P(AB)
= P(A) − P(A)P(B)
= P(A)[1 − P(B)]
= P(A)P(B^c) [because 1 − P(B) = P(B^c)]
B) P(A^cB) = P(A^c)P(B)
Proof: We know that BA^c = B − AB; hence
P(BA^c) = P(B) − P(AB)
= P(B) − P(B)P(A)
= P(B)[1 − P(A)]
= P(B)P(A^c) [because 1 − P(A) = P(A^c)]
C) P(A^cB^c) = P(A^c)P(B^c)
(This follows by applying A) to the pair A^c and B, which are independent by B).)
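Example 25 and properties A) to C) can be verified by enumerating the 36 outcomes of two tosses. The event functions and the helper `independent` below are our own illustrative names; the check is exact because it uses fractions.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two tosses of a fair die


def P(event):
    """Probability of the set of outcomes w for which event(w) is True."""
    return Fraction(sum(1 for w in S if event(w)), len(S))


def A(w):
    return w[0] % 2 == 0       # first toss even


def B(w):
    return w[1] in (5, 6)      # second toss shows 5 or 6


def not_A(w):
    return not A(w)


def not_B(w):
    return not B(w)


def independent(E, F):
    """True when P(EF) = P(E)P(F)."""
    return P(lambda w: E(w) and F(w)) == P(E) * P(F)


print(P(A), P(B), P(lambda w: A(w) and B(w)))  # 1/2 1/3 1/6
print(independent(A, B), independent(A, not_B),
      independent(not_A, B), independent(not_A, not_B))  # True True True True
```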
The notion of independent events may be extended to more than two events.
Independence of several events.
Let A1, A2, …, An be a family of events in E. Events A1, A2, …, An are called independent if and only if
P(AiAj) = P(Ai)P(Aj) for i ≠ j
P(AiAjAk) = P(Ai)P(Aj)P(Ak) for i ≠ j, j ≠ k, i ≠ k
and so on, up to
$P\left(\bigcap_{i=1}^{n} A_i\right) = \prod_{i=1}^{n} P(A_i)$ [Π is the sign of multiplication] …….(27)
If the family {A1, A2, ..., An} has the property that P(AiAj) = P(Ai)P(Aj) for i ≠ j, it is called pairwise independent. Pairwise independence does not necessarily imply independence of the whole family; showing this is beyond the scope of this module.
The definition of independence is used not only to check whether two events are independent but also to model experiments. For example, for a given experiment the nature of the events A and B might be such that we are willing to assume that A and B are independent; the definition of independence then gives the probability of the event AB in terms of P(A) and P(B).
Example 26: Consider the experiment of sampling with replacement from an urn containing M balls, of which K are black and M − K white. Since balls are replaced after each draw, it is reasonable to assume that the outcome of the second draw is independent of the outcome of the first. Then
$P(\text{2 blacks in first two draws}) = P(\text{black on 1st draw}) \times P(\text{black on 2nd draw}) = \frac{K}{M} \cdot \frac{K}{M} = \left(\frac{K}{M}\right)^2$
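Example 26 can be checked for particular numbers. The sketch assumes an urn with M = 5 balls of which K = 2 are black (values chosen only for illustration) and enumerates all ordered pairs of draws with replacement.

```python
from fractions import Fraction
from itertools import product

M, K = 5, 2  # illustrative urn: 2 black, 3 white
balls = ["black"] * K + ["white"] * (M - K)

# With replacement every ordered pair of ball positions is equally likely.
draws = list(product(range(M), repeat=2))
two_black = [d for d in draws if all(balls[i] == "black" for i in d)]
print(Fraction(len(two_black), len(draws)), Fraction(K, M) ** 2)  # 4/25 4/25
```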
Let A1, …, An be events in E for which P(A1A2 ⋯ An−1) > 0; then
P(A1A2 ⋯ An) = P(A1)P(A2/A1)P(A3/A1A2) ⋯ P(An/A1A2 ⋯ An−1) ……………………(28)
To see where this comes from, suppose we are interested in P(A1A2A3), and let the joint occurrence A2A3 be denoted by B. We know from conditional probability that
P(A1B) = P(A1/B)P(B)
= P(A1/A2A3)P(A2A3).
We also know that
P(A2A3) = P(A2/A3)P(A3). Substituting this into the above formula we get
P(A1A2A3) = P(A1/A2A3)P(A2/A3)P(A3).
It can be written in several other ways by permuting the subscripts, such as
P(A1A2A3) = P(A3/A1A2)P(A2/A1)P(A1).
Extending the above chain rule to n events we get the following rule:
P(A1A2 ⋯ An) = P(A1/A2 ⋯ An)P(A2/A3 ⋯ An) ⋯ P(An−1/An)P(An).
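The chain rule (28) is easiest to see on draws made without replacement, where the draws are not independent. The sketch assumes an urn with M = 5 balls, K = 3 of them black (illustrative values), takes Ai to be "black on draw i", and compares a direct count with the product P(A1)P(A2/A1)P(A3/A1A2).

```python
from fractions import Fraction
from itertools import permutations

M, K = 5, 3  # illustrative urn: 3 black, 2 white
balls = ["black"] * K + ["white"] * (M - K)

# Ordered draws of 3 balls WITHOUT replacement, all equally likely.
draws = list(permutations(range(M), 3))
all_black = [d for d in draws if all(balls[i] == "black" for i in d)]
by_counting = Fraction(len(all_black), len(draws))

# Chain rule: P(A1) P(A2/A1) P(A3/A1A2)
by_chain_rule = Fraction(K, M) * Fraction(K - 1, M - 1) * Fraction(K - 2, M - 2)
print(by_counting, by_chain_rule)  # 1/10 1/10
```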
If A1, A2, ..., An are independent, then
$P(A_1A_2 \cdots A_n) = P(A_1)P(A_2)\cdots P(A_n) = \prod_{i=1}^{n} P(A_i)$ ………….(29)
The multiplication rule is primarily useful for experiments defined in terms of stages.
1.7. Bayes' Theorem
Many of the important problems of economics, science, engineering, etc. are concerned with the problem of causation. If P(A/B) = 1 and B occurs, then obviously A occurs; under these circumstances it would be said that A is due to B if B happens before A. Suppose that the occurrence of one of n mutually exclusive events A1, A2, …, An is necessary for the occurrence of B. Given that B has occurred, we wish to know which of the Ai preceded it, so we ask for the probability that Ai has occurred given that B has occurred. This is the problem that Bayes' theorem solves. The formal statement of the theorem is as follows.
Let S be the sample space of some experiment, and let the disjoint events A1, A2, …, An form a partition of S, satisfying
$S = \bigcup_{i=1}^{n} A_i$
and P(Ai) > 0 for i = 1, …, n. Then for every B ∈ E for which P(B) > 0,
$P(A_i/B) = \frac{P(B/A_i)P(A_i)}{P(B)} = \frac{P(B/A_i)P(A_i)}{\sum_{j=1}^{n} P(B/A_j)P(A_j)}$ ……………(30)
Proof
$P(A_i/B) = \frac{P(A_iB)}{P(B)}$ and $P(A_iB) = P(B/A_i)P(A_i)$; then $P(A_i/B) = \frac{P(B/A_i)P(A_i)}{P(B)}$.
But P(B) = P(A1B) + P(A2B) + ⋯ + P(AnB) = $\sum_{j=1}^{n} P(B/A_j)P(A_j)$ (from the theorem of total probability).
Therefore
$P(A_i/B) = \frac{P(B/A_i)P(A_i)}{\sum_{j=1}^{n} P(B/A_j)P(A_j)}$
This is known as Bayes' theorem because Bayes (1763) was one of the first to consider this problem.
Example 27. Three different machines are used to produce chocolate chip cookies by Lovely Company, which promises at least six chips in every cookie. Suppose machine No. 1 produces 20% of Lovely cookies, No. 2 produces 30%, and No. 3 produces 50%. Also suppose that the machines represent different vintages of capital, so that 1% of the cookies produced by machine No. 1 are defective, in the sense that they have fewer than six chips, 2% of those produced by machine No. 2 are defective, and 3% of those produced by No. 3 are defective. If one cookie is chosen at random and observed to be defective, what is the probability that it was produced by machine No. 2?
Solution. Let $A_i$ be the event that the randomly chosen cookie is produced by machine No. i. Thus $P(A_1) = 0.2$, $P(A_2) = 0.3$ and $P(A_3) = 0.5$. If B is the event that a randomly drawn cookie is defective, then $P(B/A_1) = 0.01$, $P(B/A_2) = 0.02$ and $P(B/A_3) = 0.03$. Using Bayes' theorem we have
$$P(A_2/B) = \frac{P(B/A_2)\,P(A_2)}{\sum_{i=1}^{3} P(B/A_i)\,P(A_i)} = \frac{(0.3)(0.02)}{(0.2)(0.01) + (0.3)(0.02) + (0.5)(0.03)} = \frac{0.006}{0.023} \approx 0.26.$$
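The arithmetic of Example 27 can be sketched in Python as follows. The function name `posterior` is hypothetical; it simply applies (30), with the denominator P(B) obtained from the theorem of total probability. Exact fractions are used so the answer can be compared with 6/23 ≈ 0.26.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' theorem (30): P(A_i / B) for every i, given P(A_i) and P(B / A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B / A_i) P(A_i)
    p_b = sum(joint)                                      # total probability P(B)
    return [j / p_b for j in joint]


priors = [Fraction(20, 100), Fraction(30, 100), Fraction(50, 100)]     # P(A_1), P(A_2), P(A_3)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]  # P(B / A_i)

post = posterior(priors, likelihoods)
print(post[1], round(float(post[1]), 4))  # 6/23 0.2609
```

The three posterior probabilities sum to 1; machine No. 3 has the largest posterior (15/23) because it combines the largest prior with the highest defect rate.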
Bayes' theorem has an interesting interpretation. The probabilities $P(A_i)$ can be called 'prior' probabilities, since they represent the probabilities that the different machines in the above example produced a randomly selected cookie before that cookie is checked to see whether it is defective. The conditional probability $P(A_i/B)$ is called a 'posterior' probability, since it represents our assignment of probabilities after the sample evidence of the defective cookie is obtained. Thus, stated verbally, Bayes' theorem says that the posterior probability of an event $A_i$ is proportional to the probability of the sample evidence given $A_i$ times the prior probability of $A_i$.
Chapter II: Random Variables and probability distribution
2.2. The Concept & Definition of Random Variable
In the previous chapter, when we were describing the sample space of an experiment, we did not specify that an individual outcome of a random experiment can be, or needs to be, a number. In our earlier discussions of assigning probabilities to events we saw a number of examples in which the result of the experiment was not a numerical quantity. For example, in classifying a manufactured item we might simply use the categories defective and non-defective.
However, in many experimental situations we are concerned with measuring a quantity and recording it as a number. That is, we want to assign a real number to every element of the sample space S.
Example 28. Consider an experiment of tossing two coins at once. The sample space associated with
this experiment is
S= { HH, HT, TH, TT}
Let us define the random variable X to be the number of heads obtained in the two tosses.
Example 29. Consider an experiment of tossing two fair six-sided dice. S has 36 possible outcomes, and it is possible to define many random variables on this experiment. Let us see two of them. Let the random variable X be the sum of the upturned faces, and let Y denote the absolute difference between the upturned faces. The minimum value that X can take is 2 (i.e. 1 + 1) and the maximum value is 12 (i.e. 6 + 6). Similarly, the minimum value that Y can take is 0 (e.g. |1 − 1|) and the maximum value is 5, i.e. |1 − 6| or |6 − 1|. We can summarize all possible values that the two random variables can assume as follows:
$X(\omega_i) \in \{2, 3, \dots, 12\}$ and $Y(\omega_i) \in \{0, 1, \dots, 5\}$.
Having this in mind, we make the following formal definition of a random variable.
Definition: Let $E_1$ be an experiment and S a sample space associated with the experiment. A function X assigning to every element $\omega_i \in S$ a real number $X(\omega_i) = x$ is called a random variable.
Therefore, a random variable is a real-valued function that assigns a number to each sample point in the sample space of a random experiment.
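Because a random variable is just a real-valued function on S, it can be written literally as a function. The Python sketch below builds the 36-point sample space of Example 29 and defines X (sum) and Y (absolute difference) as ordinary functions of a sample point; the names S, X and Y mirror the text, and the tuple representation of a sample point is an assumption made for illustration.

```python
from itertools import product

# Sample space of two dice: 36 ordered pairs (first die, second die)
S = list(product(range(1, 7), repeat=2))


def X(w):
    """Sum of the upturned faces."""
    return w[0] + w[1]


def Y(w):
    """Absolute difference between the upturned faces."""
    return abs(w[0] - w[1])


print(len(S))                     # 36
print(sorted({X(w) for w in S}))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(sorted({Y(w) for w in S}))  # [0, 1, 2, 3, 4, 5]
```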
To remind you again, prior to an experiment we know which values the random variable X might take, but the value that X actually takes, x, is not known until the experiment has been performed.
Notations
It is important to distinguish between the rule or function that assigns numbers to the sample points and the numerical values themselves. In this module we distinguish between the two, i.e. between the random variable and the value that it takes, by denoting random variables (the rule) by capital letters such as X, Y, Z, and the values of the random variable by the corresponding lower-case letters x, y, z. Hence, statements like $P(X = x_i)$ or $P(X < 5)$ mean the probability that the random variable X takes the value $x_i$, or takes a value less than 5, respectively.
Just as we were concerned with events associated with the sample space S, we can generate events from random variables and then assign probabilities to them. In the study of random variables, questions of the following form arise: what is the probability that the random variable (r.v.) X is less than a given number x, or what is the probability that the r.v. X is between two values $x_1$ and $x_2$? Hence $\{X < x\}$ and $\{x_1 < X < x_2\}$ are events. Similarly, we may ask what is the probability that the r.v. X takes a value exactly equal to $x_2$; in this case $\{X = x_2\}$ is an event. In Example 28 above, the r.v. X taking exactly two heads, $\{X = 2\}$, and the r.v. X taking at least one head, $\{X \ge 1\}$, are events generated from a random variable.
In discussing many of the important concepts associated with random variables, it is convenient to distinguish between discrete and continuous random variables.
Definition. Let X be a random variable. If the number of possible values of X is finite or countably infinite, we call X a discrete random variable. The random variables in Examples 28 and 29 above are examples of discrete random variables. In addition, the number of defective oranges, marks on a test, and the number of students in a university are some examples of discrete random variables. The value of a discrete r.v. is determined by counting, so its value is expressed as a whole number, like 0, 1, 2, ... etc.
Remember that we are interested in the probability that the r.v. X takes a certain value x, takes a value less than x, or falls between two values, say $x_1$ and $x_2$. For this reason, we next construct a probability distribution, or probability mass function.
The function $P(x_i) = P(X = x_i)$ is called the probability mass function or discrete density function of the discrete random variable X. Since each $P(x_i)$ is a probability, $0 \le P(x_i) \le 1$. If x is not one of the values that X can take, then $P(X = x) = 0$. Also, if the sequence $x_1, x_2, x_3, \dots$ includes all values that X can take, then the discrete density function sums to 1, that is, $\sum_{i} P(x_i) = 1$. This last result holds for the following reason. If X is a discrete random variable with distinct values $x_1, x_2, \dots, x_n, \dots$, then
$$S = \bigcup_{n} \{\omega : X(\omega) = x_n\} = \bigcup_{n} \{X = x_n\}, \qquad \{X = x_i\} \cap \{X = x_j\} = \emptyset \text{ for } i \ne j.$$
Hence, by the third axiom of probability,
$$P(S) = 1 = \sum_{n} P(X = x_n) = \sum_{i} P(x_i).$$
The collection of ordered pairs $(x_i, P(x_i))$, i = 1, 2, ..., is called the probability distribution of the discrete random variable X.
Example 30. Let the experiment be the tossing of three coins simultaneously, and let X be the number of heads obtained. Obtain the probability distribution of the discrete random variable X.
Solution.
The sample space is S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, which has 8 outcomes. Assume each outcome is equally likely. The discrete density function (probability mass function) can be presented in the following table.
Favorable outcomes    X = No. of heads    No. of favorable cases    P(x_i)
HHH                   3                   1                         1/8
HHT, HTH, THH         2                   3                         3/8
HTT, THT, TTH         1                   3                         3/8
TTT                   0                   1                         1/8
From this we can obtain a reduced form, which consists of the list of possible values of X and the corresponding $P(x_i)$:
X    P(x_i)
3    1/8
2    3/8
1    3/8
0    1/8
The first row reads: the probability that X is exactly 3 is 1/8; symbolically, P(X = 3) = 1/8. We can also see that each $P(x_i)$ lies between 0 and 1, that is, $0 \le P(x_i) \le 1$.
If we take the sum of the probability mass function over all values of the r.v. X, we see that it is equal to 1:
$$\sum_{i=0}^{3} P(x_i) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1.$$
The above table shows the probability distribution of the discrete random variable X generated from the given experiment. The discrete density function can be represented graphically as follows:
[Figure: bar chart of P(x_i) against x = 0, 1, 2, 3, with heights 1/8, 3/8, 3/8, 1/8]
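A short Python check of the table in Example 30: enumerating all 2^3 = 8 equally likely outcomes reproduces P(X = 1) = 3/8 and a total probability of 1. The only assumption is the representation of an outcome as a string such as "HTH".

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of tossing three coins, e.g. "HTH"
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# X = number of heads; count how many outcomes give each value of X
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
print(sum(pmf.values()))  # 1
```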
Example 31. Let the experiment be tossing of a fair die twice. And let X be the sum of the upturned
numbers. Obtain the probability distribution of the discrete random variable X.
Solution
We know that N(S) = 36. The possible values of X and the corresponding probability mass function are given below.
X P( xi ) Probability Statement
2 1/36 → P(X=2)
3 2/36 → P(X=3)
4 3/36 → P(X=4)
5 4/36 → P(X=5)
6 5/36 → P(X=6)
7 6/36 → P(X=7)
8 5/36 → P(X=8)
9 4/36 → P(X=9)
10 3/36 → P(X=10)
11 2/36 → P(X=11)
12 1/36 → P(X=12)
Graphically:
[Figure: bar chart of P(x_i) for x = 2, 3, ..., 12, rising from 1/36 at x = 2 to 6/36 at x = 7 and falling back to 1/36 at x = 12]
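The table of Example 31 can be reproduced by brute force. The Python sketch below enumerates the 36 outcomes and counts the sums; the closed form (6 − |x − 7|)/36 checked at the end is only a compact summary of the same table, not something the example itself uses.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # N(S) = 36 equally likely outcomes
counts = Counter(a + b for a, b in S)      # X = sum of the upturned numbers
pmf = {x: Fraction(counts[x], len(S)) for x in range(2, 13)}

for x, p in pmf.items():
    # Fraction reduces automatically, so 2/36 prints as 1/18
    print(f"P(X={x}) =", p)

# The table follows the compact pattern (6 - |x - 7|)/36
assert all(pmf[x] == Fraction(6 - abs(x - 7), 36) for x in range(2, 13))
assert sum(pmf.values()) == 1
```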
The cumulative distribution function (c.d.f.), F(x), is the probability that the random variable X takes on a value at or below a number x. It is given by
$$F(x) = P(X \le x), \qquad -\infty < x < \infty,$$
$$F(x) = P(X \le x) = \sum_{x_i \le x} P(x_i), \qquad (32)$$
where the summation is over the values of the probability mass function $P(x_i)$ at all values $x_i$ the random variable can take that are less than or equal to the specified value x.
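Example 32. Consider the experiment of tossing a coin 4 times and let X = number of heads obtained. Then the probability mass function is
x        0      1      2      3      4
P(x_i)   1/16   4/16   6/16   4/16   1/16
and the cumulative distribution function is
$$F(x) = \begin{cases} 0, & x < 0, \\ 1/16, & 0 \le x < 1, \\ 5/16, & 1 \le x < 2, \\ 11/16, & 2 \le x < 3, \\ 15/16, & 3 \le x < 4, \\ 1, & x \ge 4. \end{cases}$$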
The probability that the r.v. X takes a value less than 3, P(X < 3), is obtained by adding the probability mass function at the previous three values:
$$P(X < 3) = P(X=0) + P(X=1) + P(X=2) = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} = \frac{11}{16}.$$
If we are interested in the probability that the r.v. X takes a value less than or equal to 3, it is
$$F(3) = P(X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = \frac{1}{16} + \frac{4}{16} + \frac{6}{16} + \frac{4}{16} = \frac{15}{16}.$$
Note that F(x) is defined for all real numbers; hence
$$F(2) = P(X \le 2) = P(X=0) + P(X=1) + P(X=2) = \frac{11}{16}.$$
Similarly, we can obtain F(2.5) even though X cannot actually take the value 2.5:
$$F(2.5) = P(X \le 2.5) = P(X=0) + P(X=1) + P(X=2) = \frac{11}{16},$$
which is the same as F(2). The implication is that the value of the cumulative distribution function remains constant between two consecutive possible values and changes only at the next possible value of the r.v. X. Therefore, between 2 and 3 in our example, F(x) remains equal to F(2) = 11/16; at x = 3 it jumps to F(3) = 15/16.
For a discrete random variable, the c.d.f. is a step function and is right-continuous.
[Figure: step graph of F(x) for Example 32, with levels 0, 1/16, 5/16, 11/16, 15/16 and 1, jumping at x = 0, 1, 2, 3, 4]
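A Python sketch of the step-function behaviour of F(x) in Example 32. It computes the probabilities as C(4, x)/16 (an assumption about how the table was obtained, but it reproduces every entry) and evaluates F at non-integer points to show that F stays constant between possible values and jumps at them.

```python
from fractions import Fraction
from math import comb

# Example 32: X = number of heads in 4 tosses of a fair coin, P(X = x) = C(4, x) / 2^4
pmf = {x: Fraction(comb(4, x), 2**4) for x in range(5)}


def F(t):
    """c.d.f. (32): F(t) = P(X <= t), defined for every real t."""
    return sum((p for x, p in pmf.items() if x <= t), Fraction(0))


print(F(-1), F(0), F(2), F(2.5), F(3), F(4))  # 0 1/16 11/16 11/16 15/16 1
print(F(2.999) == F(2), F(3) == F(2))         # True False
```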
Properties of a Cumulative Distribution Function F(x)
A cumulative distribution function has the following properties.
i) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$ and $F(+\infty) = \lim_{x \to +\infty} F(x) = 1$.
ii) F(x) is a monotone, non-decreasing function; that is, if a < b, then $F(a) \le F(b)$.
iii) F(x) is continuous from the right; that is, $\lim_{h \to 0^+} F(x + h) = F(x)$.
Corollary. If F(x) is the c.d.f. of the r.v. X and a < b, then $P(a < X \le b) = F(b) - F(a)$.
Proof
Note that the events $\{a < X \le b\}$ and $\{X \le a\}$ are disjoint and their union is $\{X \le b\}$. Hence, by the addition rule of probabilities,
$$P(a < X \le b) + P(X \le a) = P(X \le b) \;\Rightarrow\; P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a).$$
We can also prove property ii as follows. From the above, $\{X \le b\} = \{a < X \le b\} \cup \{X \le a\}$ with $\{a < X \le b\} \cap \{X \le a\} = \emptyset$. Therefore
$$P(X \le b) = P(a < X \le b) + P(X \le a) \ge P(X \le a),$$
hence $F(b) \ge F(a)$, which proves it.
$$P(x_i) = F(x_i) - F(x_{i-1}) \qquad (33)$$
Let the discrete r.v. X take the values $x_1 < x_2 < \cdots < x_n$. We know that
$$F(x_i) = P(X \le x_i) = \sum_{x_j \le x_i} P(X = x_j) = P(x_i) + \sum_{x_j \le x_{i-1}} P(X = x_j),$$
and
$$F(x_{i-1}) = P(X \le x_{i-1}) = \sum_{x_j \le x_{i-1}} P(X = x_j).$$
Subtracting,
$$F(x_i) - F(x_{i-1}) = P(x_i).$$
Thus given the c.d.f of a discrete random variable we can calculate its probability mass function.
Example 33. We hope that you have obtained the c.d.f. of the discrete random variable X of Example 31. Suppose that you are given only the c.d.f. instead of the discrete density function. In that example F(4) = 6/36 and F(5) = 10/36; what is P(X = 5)?
Solution
Using $P(x_i) = F(x_i) - F(x_{i-1})$,
$$P(X = 5) = F(5) - F(4) = \frac{10}{36} - \frac{6}{36} = \frac{4}{36}.$$
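A minimal sketch of (33) in Python: starting only from the c.d.f. values of Example 31 (built here from that example's table), successive differences give back the probability mass function. The helper name `pmf_from_cdf` is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Example 31: X = sum of two fair dice, values 2..12
values = list(range(2, 13))
pmf = [Fraction(6 - abs(x - 7), 36) for x in values]   # 1/36, 2/36, ..., 6/36, ..., 1/36
cdf = dict(zip(values, accumulate(pmf)))               # F(x_i) = P(X <= x_i)


def pmf_from_cdf(cdf):
    """Recover P(x_i) = F(x_i) - F(x_{i-1}) (33), taking F = 0 below the smallest value."""
    result, previous = {}, Fraction(0)
    for x in sorted(cdf):
        result[x] = cdf[x] - previous
        previous = cdf[x]
    return result


print(cdf[4], cdf[5])        # 1/6 5/18   (i.e. 6/36 and 10/36)
print(pmf_from_cdf(cdf)[5])  # 1/9        (i.e. 4/36)
```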
Example 34. From a lot of 10 items containing 3 defectives, a sample of 4 items is drawn at random without replacement. Let the r.v. X denote the number of defective items in the sample. Find $P(X \le 1)$, $P(X < 1)$ and $P(0 < X < 2)$.
Solution
The possible values of the discrete r.v. X in a sample of 4 items are 0, 1, 2, 3. Next we obtain the probability mass function of X:
x        0      1      2       3
P(x_i)   1/6    1/2    3/10    1/30
a) $P(X \le 1) = F(1) = P(X=0) + P(X=1) = \frac{1}{6} + \frac{1}{2} = \frac{2}{3}$
b) $P(X < 1) = P(X=0) = \frac{1}{6}$
c) $P(0 < X < 2) = P(X=1) = \frac{1}{2}$
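The probabilities in the table of Example 34 are stated without derivation. One way to obtain them (an assumption about how the table was computed, but it reproduces every entry) is the counting argument below: choose x of the 3 defectives and 4 − x of the 7 good items, out of C(10, 4) equally likely samples.

```python
from fractions import Fraction
from math import comb

N, D, n = 10, 3, 4   # lot size, defectives in the lot, sample size (drawn without replacement)


def p(x):
    """P(X = x): choose x of the D defectives and n - x of the N - D good items."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))


pmf = {x: p(x) for x in range(min(D, n) + 1)}
print([str(v) for v in pmf.values()])  # ['1/6', '1/2', '3/10', '1/30']
print(pmf[0] + pmf[1])                 # P(X <= 1)     -> 2/3
print(pmf[0])                          # P(X < 1)      -> 1/6
print(pmf[1])                          # P(0 < X < 2)  -> 1/2
```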
Example 35: A random variable X has the following probability mass function.
x        0    1    2     3     4     5      6       7
P(x_i)   0    k    2k    2k    3k    k^2    2k^2    7k^2 + k
i) Find k. ii) Evaluate P(X < 6) and P(X ≥ 6), and determine the distribution of X.
Solution
i) We know that $\sum_{i=0}^{7} P(x_i) = 1$. Therefore
$$\sum_{i=0}^{7} P(x_i) = 10k^2 + 9k = 1 \;\Rightarrow\; 10k^2 + 9k - 1 = 0 \;\Rightarrow\; (10k - 1)(k + 1) = 0.$$
We have two solutions, k = −1 and k = 1/10. Which one do you think is the correct answer, and why? The correct answer is k = 1/10, because a probability cannot be less than zero. With k = 1/10 the distribution of X is
x        0    1      2      3      4      5       6       7
P(x_i)   0    0.1    0.2    0.2    0.3    0.01    0.02    0.17
ii) $P(X < 6) = P(X=0) + P(X=1) + \cdots + P(X=5) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} + \frac{1}{100} = \frac{81}{100}$
$P(X \ge 6) = 1 - P(X < 6) = 1 - \frac{81}{100} = \frac{19}{100}$
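The steps of Example 35 can be checked with exact arithmetic. In the Python sketch below (the function name `table` is hypothetical), the quadratic 10k² + 9k − 1 = 0 is solved and the negative root is rejected automatically because it would produce negative probabilities.

```python
from fractions import Fraction
from math import isqrt


def table(k):
    """P(X = 0), ..., P(X = 7) of Example 35 as functions of k."""
    return [0, k, 2 * k, 2 * k, 3 * k, k**2, 2 * k**2, 7 * k**2 + k]


# Total probability 1 gives 10k^2 + 9k - 1 = 0; solve with the quadratic formula.
a, b, c = 10, 9, -1
disc = b * b - 4 * a * c                 # 121, a perfect square
root = isqrt(disc)
assert root * root == disc
roots = [Fraction(-b + s * root, 2 * a) for s in (1, -1)]   # [1/10, -1]

# Keep only the root that makes every probability non-negative.
k = next(r for r in roots if all(q >= 0 for q in table(r)))
pmf = table(k)

print(k)              # 1/10
print(sum(pmf[:6]))   # P(X < 6)  -> 81/100
print(sum(pmf[6:]))   # P(X >= 6) -> 19/100
```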
In other words, if a and b are the end points of an interval and the r.v. X can take all values in [a, b], i.e. a ≤ X ≤ b, we call X a continuous random variable. Examples of continuous random variables are the age, height and weight of students in a class, the distance between two locations, etc. In all these cases we talk about values in a particular interval, not at a point. For example, the distance between Addis Ababa and Bahir Dar is usually taken to be 565 km. But if we want to be more accurate it may be 565.25 km, and if we want to be more accurate still it may be 565.257 km. Thus, when the distance is expressed to any desired degree of accuracy, it is better described by an interval of values than by one specific value.
The implication is that, unlike a discrete random variable, a continuous random variable assigns zero probability to any single value; probabilities are meaningful only for intervals. Hence, we can redefine a continuous random variable as follows: "a random variable is said to be continuous when its different values cannot be put in one-to-one correspondence with a set of positive integers." It assumes an infinite, uncountable set of values.
Probability Density Function
Definition. If X is a continuous random variable, the function f(x) in
$$F(x) = \int_{-\infty}^{x} f(u)\,du$$
is called the probability density function (p.d.f.) of X. Here F(x) is the cumulative distribution function of X. For a continuous random variable X, F(x) is absolutely continuous.
Other names of probability density function include density function, continuous density function, and
integrating density function.
Dear colleague! Can you observe, from the above definition, that F(x) can be obtained from f(x) and vice versa? Let us put it in terms of a theorem.
Theorem 2. Let X be a continuous random variable. Then F(x) can be obtained from f(x), and vice versa. If f(x) is given, F(x) is obtained by integrating f(x); that is,
$$F(x) = \int_{-\infty}^{x} f(u)\,du.$$
If, on the other hand, F(x) is given, then f(x) can be obtained by differentiating F(x), i.e.
$$f(x) = \frac{dF(x)}{dx}$$
for those values of x at which F(x) is differentiable.
Note carefully that, unlike the situation for discrete random variables, the p.d.f. of a continuous random variable, f(x), does not give the probability that X takes the value x. While $P(x_i)$ is the probability that a discrete r.v. X takes the value $x_i$, f(x) is not a probability. For a continuous variable X, probability is given by the area under the curve of f(x) over the corresponding interval on the horizontal axis.
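For a continuous random variable X, if $x_1 = x$ and $x_2 = x + \Delta x$, then the p.d.f. f(x) can be expressed as a limit,
$$f(x) = \frac{dF(x)}{dx} = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x},$$
so that, for small $\Delta x$, $f(x)\,\Delta x \approx F(x + \Delta x) - F(x) = P(x < X \le x + \Delta x)$.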
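(iii) For any a, b with $-\infty < a < b < \infty$, we have
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx.$$
[Figure: the area under the curve f(x) between x = a and x = b, equal to P(a ≤ X ≤ b)]
In particular, taking a = b = c,
$$P(X = c) = \int_{c}^{c} f(x)\,dx = 0,$$
i.e. the area over a single point is zero. Alternatively, since F is continuous, $P(X = c) = F(c) - \lim_{h \to 0^+} F(c - h) = 0$. Consequently, the following probabilities are all the same if X is a continuous random variable, because the intervals differ only by one or two points that have probability zero:
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$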
b) Note that
$$F(-\infty) = \lim_{x \to -\infty} \int_{-\infty}^{x} f(u)\,du = 0, \qquad F(\infty) = \lim_{x \to \infty} \int_{-\infty}^{x} f(u)\,du = 1,$$
and when a < b, $F(a) \le F(b)$.
Proof. From $f(x) = \dfrac{dF(x)}{dx}$ we have $dF(x) = f(x)\,dx$. Integrating $dF(x) = f(x)\,dx$ from $-\infty$ to x and using the fact that $F(-\infty) = 0$, we get
$$F(x) = \int_{-\infty}^{x} f(u)\,du.$$
Since $F(\infty) = 1$, therefore
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
c) The probability for the r.v. X to fall in the finite interval $[x_1, x_2]$ is
$$P(x_1 < X \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\,dx. \qquad (35)$$
d) For a discrete random variable, $P(x_i)$ is a function with domain the real line and counter domain [0, 1], whereas f(x) is a function with domain the real line and counter domain the infinite interval $[0, \infty)$.
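Solution
i) $f(x) \ge 0$ and
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{1} 2x\,dx = \left[x^2\right]_0^1 = 1 - 0 = 1,$$
so f(x) is a p.d.f.
ii)
$$F\!\left(\tfrac{1}{2}\right) = \int_{0}^{1/2} 2x\,dx = \left[x^2\right]_0^{1/2} = \frac{1}{4}.$$
iii) To find $P\!\left(X \le \tfrac{1}{2} \,/\, \tfrac{1}{3} \le X \le \tfrac{2}{3}\right)$, first obtain the intersection $\{X \le \tfrac{1}{2}\} \cap \{\tfrac{1}{3} \le X \le \tfrac{2}{3}\}$. The result of this intersection is $\{\tfrac{1}{3} \le X \le \tfrac{1}{2}\}$. Therefore
$$P\!\left(X \le \tfrac{1}{2} \,/\, \tfrac{1}{3} \le X \le \tfrac{2}{3}\right) = \frac{P\!\left(\tfrac{1}{3} \le X \le \tfrac{1}{2}\right)}{P\!\left(\tfrac{1}{3} \le X \le \tfrac{2}{3}\right)} = \frac{\int_{1/3}^{1/2} 2x\,dx}{\int_{1/3}^{2/3} 2x\,dx} = \frac{5/36}{1/3} = \frac{5}{12}.$$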
Example 37. Let X be the life length of a certain type of light bulb in hours. Assuming X to be a continuous random variable, suppose that its p.d.f. is given by
$$f(x) = \begin{cases} \dfrac{a}{x^3}, & 1500 \le x \le 2500, \\ 0, & \text{otherwise.} \end{cases}$$
For what value of a will f(x) be a p.d.f.?
Solution
To obtain the constant a, we invoke the condition $\int_{-\infty}^{\infty} f(x)\,dx = 1$. In this case it becomes
$$\int_{1500}^{2500} \frac{a}{x^3}\,dx = a \int_{1500}^{2500} x^{-3}\,dx = 1 \;\Rightarrow\; a = \frac{1}{\int_{1500}^{2500} x^{-3}\,dx} = \frac{1}{\left[-\dfrac{x^{-2}}{2}\right]_{1500}^{2500}} = 7{,}031{,}250.$$
[Figure: graph of f(x) = a/x^3, defined between x = 1500 and x = 2500]
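A quick Python check of Example 37: the constant is computed exactly with fractions, and a midpoint Riemann sum (a numerical approximation chosen here only for illustration) confirms that the resulting f(x) integrates to 1 over [1500, 2500].

```python
from fractions import Fraction

lo, hi = 1500, 2500

# Antiderivative of x**-3 is -1/(2 x**2), so the integral of x**-3 over [lo, hi] is exact:
integral = Fraction(-1, 2 * hi**2) - Fraction(-1, 2 * lo**2)
a = 1 / integral
print(a)  # 7031250

# Numerical cross-check: midpoint Riemann sum of f(x) = a / x**3 over [lo, hi]
n = 100_000
h = (hi - lo) / n
area = h * sum(float(a) / (lo + (i + 0.5) * h) ** 3 for i in range(n))
print(round(area, 6))  # 1.0
```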
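Example 38. Let X be a continuous random variable with p.d.f.
$$f(x) = \begin{cases} 3e^{-3x}, & x \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$
i) Check that f(x) is a p.d.f.
ii) Evaluate the probability that X falls between 0.5 and 1.
iii) Obtain the c.d.f.
Solution
i) $f(x) \ge 0$ and
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} 3e^{-3x}\,dx = 3 \lim_{b \to \infty} \left[\frac{-e^{-3x}}{3}\right]_0^b = 1.$$
ii)
$$P(0.5 \le X \le 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = -e^{-3} + e^{-1.5} \approx 0.173.$$
iii) The c.d.f. is obtained by integrating the p.d.f. For x > 0,
$$F(x) = \int_{-\infty}^{x} f(u)\,du = \int_{0}^{x} 3e^{-3u}\,du = \left[-e^{-3u}\right]_0^x = 1 - e^{-3x}.$$
Therefore
$$F(x) = \begin{cases} 0, & x \le 0, \\ 1 - e^{-3x}, & x > 0. \end{cases}$$
Then $P(0.5 \le X \le 1)$ can also be computed as $F(1) - F(0.5) = (1 - e^{-3}) - (1 - e^{-1.5}) \approx 0.173$.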
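Example 38 in Python: the closed-form c.d.f. and a midpoint-rule approximation of the integral (a numerical method assumed here purely as a check) give the same probability for 0.5 ≤ X ≤ 1, and the total area under f is close to 1.

```python
import math


def f(x):
    """p.d.f. of Example 38: 3 e^(-3x) for x >= 0, and 0 otherwise."""
    return 3 * math.exp(-3 * x) if x >= 0 else 0.0


def F(x):
    """c.d.f. obtained by integrating f: 1 - e^(-3x) for x > 0, and 0 otherwise."""
    return 1 - math.exp(-3 * x) if x > 0 else 0.0


def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


print(round(F(1) - F(0.5), 3))                   # 0.173
print(round(midpoint_integral(f, 0.5, 1.0), 3))  # 0.173
print(midpoint_integral(f, 0.0, 20.0))           # close to 1 (the tail beyond 20 is negligible)
```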
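Example 39. Obtain the c.d.f. of the p.d.f. given in Example 36 above.
Solution
Given
$$f(x) = \begin{cases} 2x, & 0 \le x \le 1, \\ 0, & \text{otherwise,} \end{cases}$$
F(x) = 0 if x ≤ 0. For 0 < x ≤ 1,
$$F(x) = \int_{-\infty}^{x} f(u)\,du = \int_{0}^{x} f(u)\,du = \int_{0}^{x} 2u\,du = \left[u^2\right]_0^x = x^2 - 0 = x^2.$$
For x > 1, F(x) = 1. Therefore
$$F(x) = \begin{cases} 0, & x \le 0, \\ x^2, & 0 < x \le 1, \\ 1, & x > 1. \end{cases}$$
[Figure: graph of F(x), rising from 0 at x = 0 to 1 at x = 1 and constant at 1 thereafter]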
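With the c.d.f. of Example 39 in hand, the probabilities asked for in Example 36 reduce to differences of F. The short Python sketch below (exact fractions; the function name F simply mirrors the notation) reproduces F(1/2) = 1/4 and the conditional probability 5/12.

```python
from fractions import Fraction


def F(x):
    """c.d.f. from Example 39: 0 for x <= 0, x^2 for 0 < x <= 1, 1 for x > 1."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return x**2
    return Fraction(1)


half, third, two_thirds = Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)

print(F(half))  # 1/4, as in Example 36 ii)
# Example 36 iii): P(X <= 1/2 / 1/3 <= X <= 2/3) = P(1/3 <= X <= 1/2) / P(1/3 <= X <= 2/3)
print((F(half) - F(third)) / (F(two_thirds) - F(third)))  # 5/12
```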
Example 41. Fantu sells new cars for Nyala Motors. He usually sells the largest number of cars on Saturday. He has established the following probability distribution for the number of cars he expects to sell on a particular Saturday. What is the expected number of cars he sells on a Saturday?
No. of cars (x_i)   P(x_i)
0                   0.10
1                   0.20
2                   0.30
3                   0.30
4                   0.10
Solution
$$E[X] = \sum_{i} x_i P(x_i) = 0(0.10) + 1(0.20) + 2(0.30) + 3(0.30) + 4(0.10) = 2.1 \text{ cars}.$$
This average is the expected value of the r.v. X, though X cannot actually take the value 2.1.
Example 42. Consider the experiment of rolling a single die. Let X be the value that shows on the die. The probability distribution of X is
x_i      1     2     3     4     5     6
P(x_i)   1/6   1/6   1/6   1/6   1/6   1/6
What would the average value of X be if the experiment were repeated an infinite number of times?
Solution
$$E[X] = \sum_{i} x_i P(x_i) = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = \frac{1}{6}(1+2+3+4+5+6) = \frac{21}{6} = \frac{7}{2} = 3.5.$$
This is the expected value of X, despite the fact that X cannot actually take the value 3.5.
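The definition E[X] = Σ x_i P(x_i) used in Examples 41 and 42 is a one-line computation. A minimal Python sketch, assuming the distributions are stored as value-to-probability dictionaries (a representation chosen here for illustration):

```python
from fractions import Fraction


def expectation(pmf):
    """E[X] = sum of x_i P(x_i): a weighted average of the values of X."""
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in pmf.items())


cars = {0: Fraction(10, 100), 1: Fraction(20, 100), 2: Fraction(30, 100),
        3: Fraction(30, 100), 4: Fraction(10, 100)}   # Example 41
die = {x: Fraction(1, 6) for x in range(1, 7)}         # Example 42

print(float(expectation(cars)))  # 2.1
print(float(expectation(die)))   # 3.5
```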
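Example 43. Suppose X is a continuous random variable with p.d.f.
$$f(x) = \begin{cases} \dfrac{1}{9}x^2, & 0 \le x \le 3, \\ 0, & \text{otherwise.} \end{cases}$$
Find E[X].
Solution
First note that f is a valid p.d.f.: $\int_0^3 \frac{1}{9}x^2\,dx = \frac{1}{9}\left[\frac{x^3}{3}\right]_0^3 = \frac{27}{27} = 1$. Then
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{3} \frac{1}{9}x^3\,dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_0^3 = \frac{81}{36} = \frac{9}{4}.$$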
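Because the density in Example 43 is a polynomial, both integrals can be evaluated exactly. The helper `integrate_poly` below is a hypothetical name; it integrates a polynomial given by its coefficient list, which is enough to confirm that f integrates to 1 and that E[X] = 9/4.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of the polynomial sum(coeffs[k] * x**k)."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


pdf = [0, 0, Fraction(1, 9)]   # f(x) = x^2 / 9 on [0, 3]
x_times_pdf = [0] + pdf        # x f(x) = x^3 / 9

print(integrate_poly(pdf, 0, 3))          # 1   -> f is a valid p.d.f.
print(integrate_poly(x_times_pdf, 0, 3))  # 9/4 -> E[X]
```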
Example 44. A continuous distribution of a variable X in the range (−3, 3) is defined by
$$f(x) = \begin{cases} \frac{1}{16}(3 + x)^2, & -3 < x \le -1, \\ \frac{1}{16}(6 - 2x^2), & -1 < x \le 1, \\ \frac{1}{16}(3 - x)^2, & 1 < x < 3. \end{cases}$$
Find the mean, E(X), of the above distribution.
Solution
We can consider each segment separately and then add the results to obtain the expected value of X:
$$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{16}\left[\int_{-3}^{-1} x(3 + x)^2\,dx + \int_{-1}^{1} x(6 - 2x^2)\,dx + \int_{1}^{3} x(3 - x)^2\,dx\right]$$
$$= \frac{1}{16}\left[\int_{-3}^{-1} (x^3 + 6x^2 + 9x)\,dx + \int_{-1}^{1} (6x - 2x^3)\,dx + \int_{1}^{3} (x^3 - 6x^2 + 9x)\,dx\right]$$
$$= \frac{1}{16}\left[(-20 + 52 - 36) + (0 - 0) + (20 - 52 + 36)\right] = \frac{1}{16}\left[-4 + 0 + 4\right] = 0.$$
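The piecewise calculation of Example 44 can be verified exactly. This Python sketch (again using a hypothetical polynomial-integration helper) checks that the three-piece density integrates to 1 and that the mean is 0, as the symmetry of f about x = 0 suggests.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of the polynomial sum(coeffs[k] * x**k)."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


# (lower, upper, coefficients of 16 f(x)) for the three pieces of Example 44
pieces = [
    (-3, -1, [9, 6, 1]),    # (3 + x)^2 = 9 + 6x + x^2
    (-1, 1, [6, 0, -2]),    # 6 - 2x^2
    (1, 3, [9, -6, 1]),     # (3 - x)^2 = 9 - 6x + x^2
]

total = sum(integrate_poly(c, lo, hi) for lo, hi, c in pieces) / 16
# Prepending 0 to a coefficient list multiplies the polynomial by x
mean = sum(integrate_poly([0] + c, lo, hi) for lo, hi, c in pieces) / 16

print(total)  # 1
print(mean)   # 0
```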
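Example 45. A survey conducted over the last 25 years indicated that in 10 years the winter was mild, in 8 years it was cold and in the remaining 7 years it was very cold. A company sells 1000 woolen coats in a mild year, 1300 in a cold year and 2000 in a very cold year. Find the yearly expected profit of the company if a woolen coat costs Birr 173 and it is sold to stores for Birr 248.
Solution
The profit per coat is 248 − 173 = Birr 75. Using the survey frequencies as probabilities, the probability distribution of the sales X is
Winter       Sales (x_i)   P(x_i)
Mild         1000          10/25 = 0.40
Cold         1300          8/25 = 0.32
Very cold    2000          7/25 = 0.28
Hence E(X) = 1000(0.40) + 1300(0.32) + 2000(0.28) = 400 + 416 + 560 = 1376 coats, and the yearly expected profit is 75 × 1376 = Birr 103,200.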
Dear Student! We hope you have noticed that the expected value of X is a weighted average of the values that the random variable X can take, the weights being the probabilities of those values. E[X] can also be regarded as the center of gravity of the probability distribution.
For constants a and b,
$$E(a + bX) = a + bE(X).$$
Proof
i) If X is a discrete random variable, $E(X) = \sum_i x_i P(x_i)$, so
$$E(a + bX) = \sum_i (a + bx_i) P(x_i) = \sum_i \left[a P(x_i) + b x_i P(x_i)\right] = a \sum_i P(x_i) + b \sum_i x_i P(x_i) = a + bE(X).$$
ii) If X is a continuous random variable, $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, so
$$E(a + bX) = \int_{-\infty}^{\infty} (a + bx) f(x)\,dx = a \int_{-\infty}^{\infty} f(x)\,dx + b \int_{-\infty}^{\infty} x f(x)\,dx = a + bE(X).$$
In both cases we have utilized Theorem 1 and the definition of the expected value of X.
Expected Value of a Function of a Random Variable
Theorem 3. Let X be a random variable and g(X) a function of the r.v. X. Then its expectation is given by
i) for discrete random variables,
$$E[g(X)] = \sum_i g(x_i) P(x_i); \qquad (38)$$
ii) for continuous random variables,
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx. \qquad (39)$$
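For example, if X has the p.d.f. $f(x) = \frac{1}{b - a}$ for $a \le x \le b$ (and 0 otherwise), then taking $g(X) = X^2$ in (39),
$$E(X^2) = \int_{a}^{b} x^2 \frac{1}{b - a}\,dx = \frac{1}{b - a}\int_{a}^{b} x^2\,dx = \frac{1}{3}\cdot\frac{b^3 - a^3}{b - a} = \frac{1}{3}\left(b^2 + ba + a^2\right).$$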
Theorem 4. If $g_1(X)$ and $g_2(X)$ are two functions of the r.v. X, and a and b are constants, then $E[a g_1(X) + b g_2(X)] = a E[g_1(X)] + b E[g_2(X)]$.
Proof
i) Discrete case:
$$E[a g_1(X) + b g_2(X)] = \sum_i \left[a g_1(x_i) + b g_2(x_i)\right] P(x_i) = a \sum_i g_1(x_i) P(x_i) + b \sum_i g_2(x_i) P(x_i) = a E[g_1(X)] + b E[g_2(X)].$$
ii) Continuous case:
$$E[a g_1(X) + b g_2(X)] = \int_{-\infty}^{\infty} \left[a g_1(x) + b g_2(x)\right] f(x)\,dx = a \int_{-\infty}^{\infty} g_1(x) f(x)\,dx + b \int_{-\infty}^{\infty} g_2(x) f(x)\,dx = a E[g_1(X)] + b E[g_2(X)].$$
B) Variance
The variance of a random variable X is a measure of the spread, or dispersion, of the distribution of X. Let X be a random variable and μ its mean; then the variance of X, denoted by σ², is defined as follows.
i) For a discrete random variable,
$$\sigma^2 = E[X - E(X)]^2 = \sum_i (x_i - \mu)^2 P(x_i). \qquad (40)$$
ii) For a continuous random variable,
$$\sigma^2 = E[X - E(X)]^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx. \qquad (41)$$
In both cases $\sigma^2 = E(X^2) - \mu^2$, since expanding the square and using Theorem 4 gives $E(X - \mu)^2 = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$. That is,
$$\operatorname{Var}(X) = E(X^2) - (E(X))^2.$$
The required steps are:
a) First obtain the mean, i.e. E(X).
b) Second, obtain the expected value of the square of the r.v. X, i.e. E(X²).
c) Finally, subtract the square of the mean from E(X²).
Note carefully that, in general, $E(X^2) \ne (E(X))^2$.
Example 47. Calculate the variance of the probability distribution of Example 42 above.
Solution
We know Var(X) = E(X²) − (E(X))².
x    P(x_i)   x²    x² P(x_i)
1    1/6      1     1/6
2    1/6      4     4/6
3    1/6      9     9/6
4    1/6      16    16/6
5    1/6      25    25/6
6    1/6      36    36/6
$$E(X^2) = \sum_i x_i^2 P(x_i) = \frac{91}{6}, \qquad E(X) = \sum_i x_i P(x_i) = \frac{21}{6},$$
$$\operatorname{Var}(X) = \frac{91}{6} - \left(\frac{21}{6}\right)^2 = \frac{546}{36} - \frac{441}{36} = \frac{105}{36} = \frac{35}{12} \approx 2.92.$$
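A Python check of Example 47, computing the variance both from the shortcut E(X²) − (E(X))² and directly from definition (40); exact fractions avoid rounding.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die (Example 42)

mean = sum(x * p for x, p in pmf.items())         # E(X)   = 7/2
mean_sq = sum(x**2 * p for x, p in pmf.items())   # E(X^2) = 91/6
var_shortcut = mean_sq - mean**2                  # E(X^2) - (E(X))^2
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())  # definition (40)

print(mean, mean_sq, var_shortcut)  # 7/2 91/6 35/12
assert var_shortcut == var_definition
```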
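Example 48. Let X be a continuous random variable with p.d.f. given by $f(x) = 6x(1 - x)$, $0 \le x \le 1$ (and 0 otherwise). Obtain the variance of X.
Solution
We know that Var(X) = E(X²) − μ².
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{1} x \cdot 6x(1 - x)\,dx = 6\int_{0}^{1} (x^2 - x^3)\,dx = 6\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 = \frac{1}{2},$$
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_{0}^{1} x^2 \cdot 6x(1 - x)\,dx = 6\int_{0}^{1} (x^3 - x^4)\,dx = 6\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1 = \frac{6}{20},$$
$$\operatorname{Var}(X) = \frac{6}{20} - \left(\frac{1}{2}\right)^2 = \frac{1}{20}, \qquad \text{S.D.}(X) = \sqrt{\operatorname{Var}(X)} = \sqrt{\frac{1}{20}} \approx 0.224.$$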
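Example 48 checked with exact polynomial integration in Python. The helper `integrate_poly` is a hypothetical name; multiplying f by x or x² is done by shifting its coefficient list.

```python
import math
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of the polynomial sum(coeffs[k] * x**k)."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))


f = [0, 6, -6]                          # f(x) = 6x - 6x^2 on [0, 1]
EX = integrate_poly([0] + f, 0, 1)      # integral of x f(x)   -> 1/2
EX2 = integrate_poly([0, 0] + f, 0, 1)  # integral of x^2 f(x) -> 3/10
var = EX2 - EX**2

print(integrate_poly(f, 0, 1), EX, EX2, var)  # 1 1/2 3/10 1/20
print(round(math.sqrt(var), 4))               # 0.2236
```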
Theorem 5. The variance of a constant is zero; i.e. if the r.v. X takes the constant value c throughout, then its variance is zero.
Proof. Var(X) = E(X²) − (E(X))². By Theorem 1 of expectation, E(X) = E(c) = c and E(X²) = E(c²) = c². Hence Var(X) = c² − c² = 0.
Theorem 6. Let X be a random variable and let a and b be constants. Then we have the following useful results.
i) Var(aX) = a² Var(X)
ii) Var(a + bX) = b² Var(X)
iii) Var(X ± a) = Var(X)
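Theorems 5 and 6 are easy to check on a concrete distribution. The Python sketch below uses the fair die of Example 42 and the constants a = 5, b = −2 (hypothetical values chosen only for the check); the helper `distribution_of` builds the distribution of a transformed variable g(X) by adding up probabilities of values that map to the same result.

```python
from fractions import Fraction


def variance(pmf):
    """Var(X) = sum of (x_i - mu)^2 P(x_i) for a discrete distribution."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def distribution_of(pmf, g):
    """Distribution of g(X); adds up probabilities if g maps several x to one value."""
    out = {}
    for x, p in pmf.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out


die = {x: Fraction(1, 6) for x in range(1, 7)}
a, b = 5, -2   # hypothetical constants

assert variance(distribution_of(die, lambda x: a * x)) == a**2 * variance(die)      # i)
assert variance(distribution_of(die, lambda x: a + b * x)) == b**2 * variance(die)  # ii)
assert variance(distribution_of(die, lambda x: x - a)) == variance(die)             # iii)
assert variance({7: Fraction(1)}) == 0                                              # Theorem 5
print(variance(die), variance(distribution_of(die, lambda x: a + b * x)))  # 35/12 35/3
```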
2.5. Moments
2.5.1. Moments
Do you remember the moments that you studied in your introductory course? The idea that we are going to discuss here is just the same.
Moments of a random variable are taken about some point, and they are used to describe the various characteristics of a distribution: central tendency, dispersion, skewness and kurtosis. Moments are the expectations of powers of the random variable (measured from that point) under the given distribution.
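We can calculate moments about three points: an arbitrary value a, the origin (a = 0) and the mean μ. The rth moment of X about a, denoted $\mu'_r$, is
i) $\mu'_r = E(X - a)^r = \sum_i (x_i - a)^r P(x_i)$, for a discrete r.v.;
ii) $\mu'_r = E(X - a)^r = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$, for a continuous r.v. (52)
When a is the mean μ, the rth moment about a is called the rth central moment (defined below). If we take a = 0, then we have the moments about the origin:
i) $\mu'_r = E(X - 0)^r = E(X^r) = \sum_i x_i^r P(x_i)$, for a discrete random variable; (53)
ii) $\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx$, for a continuous random variable. (54)
The first moment about a is
$$\mu'_1 = \int_{-\infty}^{\infty} (x - a) f(x)\,dx = \int_{-\infty}^{\infty} x f(x)\,dx - a\int_{-\infty}^{\infty} f(x)\,dx = \mu - a \qquad \left[\text{since } \int_{-\infty}^{\infty} f(x)\,dx = 1\right],$$
so that $\mu = \mu'_1 + a$. (The same result holds for a discrete r.v., with sums in place of integrals.)
The second moment about a is given by
i) $\mu'_2 = E(X - a)^2 = \sum_i (x_i - a)^2 P(x_i)$, for a discrete r.v.;
ii) $\mu'_2 = \int_{-\infty}^{\infty} (x - a)^2 f(x)\,dx$, for a continuous r.v. (57)
The third moment is given by
i) $\mu'_3 = E(X - a)^3 = \sum_i (x_i - a)^3 P(x_i)$, for a discrete r.v.; (58)
ii) $\mu'_3 = \int_{-\infty}^{\infty} (x - a)^3 f(x)\,dx$, for a continuous r.v. (59)
The fourth moment is given by
i) $\mu'_4 = E(X - a)^4 = \sum_i (x_i - a)^4 P(x_i)$, for a discrete r.v.; (60)
ii) $\mu'_4 = \int_{-\infty}^{\infty} (x - a)^4 f(x)\,dx$, for a continuous r.v. (61)
To obtain the 2nd, 3rd and 4th moments about the origin, all we need to do is take a = 0.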
Definition. If X is a random variable, the rth central moment of X about μ, denoted by $\mu_r$, is given by
i) $\mu_r = E(X - \mu)^r = \sum_i (x_i - \mu)^r P(x_i)$, for a discrete r.v.; (61)
ii) $\mu_r = E(X - \mu)^r = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx$, for a continuous r.v. (62)
The first four central moments are computed in the same way as above.
1. The 1st moment about the mean is
$$\mu_1 = E(X - \mu) = \sum_i (x_i - \mu) P(x_i) = 0 \text{ (discrete)}, \qquad \mu_1 = \int_{-\infty}^{\infty} (x - \mu) f(x)\,dx = 0 \text{ (continuous)}.$$
$\mu_1$ is always zero.
2. The 2nd moment about the mean is
$$\mu_2 = E(X - \mu)^2 = \sum_i (x_i - \mu)^2 P(x_i) = \sigma^2 \text{ (discrete)}, \qquad \mu_2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \sigma^2 \text{ (continuous)}.$$
$\mu_2$ is the variance.
3. The 3rd moment about the mean is
$$\mu_3 = E(X - \mu)^3 = \sum_i (x_i - \mu)^3 P(x_i) \text{ (discrete)}, \qquad \mu_3 = \int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx \text{ (continuous)}.$$
4. The 4th moment about the mean is
$$\mu_4 = E(X - \mu)^4 = \sum_i (x_i - \mu)^4 P(x_i) \text{ (discrete)}, \qquad \mu_4 = \int_{-\infty}^{\infty} (x - \mu)^4 f(x)\,dx \text{ (continuous)}.$$
We frequently use moments about the mean to describe the basic characteristics of the probability distribution of a given random variable.
You are already well acquainted with the mean and the variance. Two other important characteristics of a distribution are skewness and kurtosis. As you know, Karl Pearson developed coefficients of skewness and kurtosis based on moments about the mean. We discuss them in this subsection.
A) Skewness
Skewness is lack of symmetry. K. Pearson's coefficient of skewness is given by
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}. \qquad (63)$$
If β1 = 0 the distribution is symmetrical; β1 > 0 indicates a skewed distribution.
However, β1 has a serious limitation: since μ3² and μ2³ are never negative, β1 is never negative, so it cannot tell us the direction of skewness. The alternative measure of skewness, which is free from this limitation, is
$$\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu_3}{\sigma^3}, \qquad (64)$$
which takes the sign of μ3. Hence
if γ1 = 0 (μ3 = 0) the distribution is symmetrical;
if γ1 > 0 (μ3 > 0) the distribution is positively skewed;
if γ1 < 0 (μ3 < 0) the distribution is negatively skewed.
[Figure: a symmetrical curve (γ1 = 0), a negatively skewed curve (γ1 < 0) and a positively skewed curve (γ1 > 0)]
B) Kurtosis
Kurtosis is a measure of the flatness or peakedness of a symmetric curve. The coefficient of kurtosis is given by
$$\beta_2 = \frac{\mu_4}{\mu_2^2}. \qquad (65)$$
if β2 = 3 the distribution is mesokurtic (as the normal distribution is);
if β2 < 3 the distribution is platykurtic;
if β2 > 3 the distribution is leptokurtic.
Alternatively,
$$\gamma_2 = \beta_2 - 3. \qquad (66)$$
if γ2 = 0 the distribution is mesokurtic (normal);
if γ2 < 0 (negative) the distribution is platykurtic;
if γ2 > 0 (positive) the distribution is leptokurtic.
[Figure: leptokurtic, mesokurtic and platykurtic curves]
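Definitions (63) to (66) translate directly into code. The Python sketch below computes β1, γ1, β2 and γ2 for two discrete distributions: the fair die of Example 42 (symmetric) and a small hypothetical distribution with a long right tail, chosen only to show a positive γ1.

```python
import math
from fractions import Fraction


def central_moment(pmf, r):
    """mu_r = sum of (x_i - mu)^r P(x_i)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** r * p for x, p in pmf.items())


def shape(pmf):
    m2, m3, m4 = (central_moment(pmf, r) for r in (2, 3, 4))
    beta1 = m3**2 / m2**3                   # (63)
    gamma1 = m3 / math.sqrt(m2) ** 3        # (64), keeps the sign of mu_3
    beta2 = m4 / m2**2                      # (65)
    return beta1, gamma1, beta2, beta2 - 3  # last value is gamma_2 (66)


die = {x: Fraction(1, 6) for x in range(1, 7)}                       # symmetric
skewed = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}   # hypothetical, long right tail

beta1, gamma1, beta2, gamma2 = shape(die)
print(beta1, gamma1, beta2)  # 0 0.0 303/175  -> symmetric and platykurtic (beta2 < 3)

_, gamma1_s, _, _ = shape(skewed)
print(gamma1_s > 0)          # True -> positively skewed
```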
You may ask: what is the purpose of studying moments about 0 or about some assumed value a? That is a good question. We need to study them because in many cases moments are given either about an assumed value or about the origin. If either of these is given, it is possible to convert them to moments about the mean using the following conversion rule (with $\mu'_0 = 1$):
$$\mu_r = \mu'_r - \binom{r}{1}\mu'_{r-1}\mu'_1 + \binom{r}{2}\mu'_{r-2}{\mu'_1}^{2} - \binom{r}{3}\mu'_{r-3}{\mu'_1}^{3} + \cdots + (-1)^r {\mu'_1}^{r}. \qquad (66)$$
Hence $\mu_1 = 0$. For r = 2, 3 and 4 we get
$$\mu_2 = \mu'_2 - {\mu'_1}^2,$$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3, \qquad (67)$$
$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2{\mu'_1}^2 - 3{\mu'_1}^4.$$
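A minimal Python sketch of the conversion rule (66) with binomial coefficients, checked against moments computed directly about the mean. The fair die and the reference point a = 2 are illustrative assumptions; any distribution and any a would do.

```python
from fractions import Fraction
from math import comb


def central_from_moments_about_a(mu_p, r):
    """Conversion rule (66): mu_r from mu_p[k] = E(X - a)^k, where mu_p[0] = 1."""
    m1 = mu_p[1]
    return sum((-1) ** k * comb(r, k) * mu_p[r - k] * m1**k for k in range(r + 1))


# Check on the fair die with a hypothetical reference point a = 2
die = {x: Fraction(1, 6) for x in range(1, 7)}
a = 2
mu_p = [sum((x - a) ** k * p for x, p in die.items()) for k in range(5)]     # moments about a
mu = sum(x * p for x, p in die.items())
direct = [sum((x - mu) ** r * p for x, p in die.items()) for r in range(5)]  # moments about the mean

for r in range(1, 5):
    assert central_from_moments_about_a(mu_p, r) == direct[r]
print([str(direct[r]) for r in range(1, 5)])  # ['0', '35/12', '0', '707/48']
```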
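Example 50. The first 3 moments of a distribution about the value 2 are 1, 22 and 10. Find the mean, the S.D. and γ1.
Solution
1) We know $\mu = \mu'_1 + a = 1 + 2 = 3$.
2) $\text{S.D.} = \sqrt{\mu_2}$, where $\mu_2 = \mu'_2 - {\mu'_1}^2 = 22 - 1 = 21$, so $\text{S.D.} = \sqrt{21} \approx 4.58$.
3) $\gamma_1 = \dfrac{\mu_3}{\sigma^3}$, where $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2{\mu'_1}^3 = 10 - 3(22)(1) + 2(1)^3 = -54$. Hence
$$\gamma_1 = \frac{-54}{(\sqrt{21})^3} = \frac{-54}{96.23} \approx -0.56,$$
so the distribution is negatively skewed.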
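Example 50 can be checked by feeding the given moments into the same conversion rule. In this Python sketch the list index is the order of the moment, with μ'_0 = 1 prepended (an implementation convenience, not part of the text).

```python
import math
from math import comb


def central_from_moments_about_a(mu_p, r):
    """Conversion rule (66): mu_r from mu_p[k] = E(X - a)^k, where mu_p[0] = 1."""
    m1 = mu_p[1]
    return sum((-1) ** k * comb(r, k) * mu_p[r - k] * m1**k for k in range(r + 1))


a = 2
mu_p = [1, 1, 22, 10]   # mu'_0 = 1, then the given moments about a = 2

mean = mu_p[1] + a                           # 3
mu2 = central_from_moments_about_a(mu_p, 2)  # 21
mu3 = central_from_moments_about_a(mu_p, 3)  # -54
sd = math.sqrt(mu2)
gamma1 = mu3 / sd**3

print(mean, mu2, mu3, round(sd, 3), round(gamma1, 3))  # 3 21 -54 4.583 -0.561
```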