MA026IU
PROBABILITY, STATISTIC AND RANDOM PROCESS
Part I
Academic year 2021–2022
Contents
Introduction
Random experiments
Basic rules of probability
Conditional probability
Probability and combinatorics
Statistics and stochastics
Statistics is a science that aims to develop
methods for making informed guesses and
decisions based on incomplete and uncertain
information.
Stochastics is the field of mathematics concerned
with modelling randomness and probability.
Computation and visualization by a computer may be enough to study
the properties of a particular data set.
The mathematical models of stochastics are needed whenever one wants
to use the data to generalize and predict.
Probability: The concept
Probability is a way of quantifying the degree of belief that
something is true or false.
• Tossing a coin gives “heads” with probability 1/2
• Next Monday in Otaniemi it will rain with probability
  • 14% (says Ilmatieteen laitos)
  • 19% (says Foreca)
“Interpretations” of probability
• Objective (relative frequency in the long run)
• Subjective (degree of belief, based on some information)
The two interpretations are not in conflict, but support each other.
Also, the mathematical laws of probability are the same in both
cases.
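The relative-frequency interpretation can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency_of_heads`, the fixed seed and the chosen numbers of tosses are assumptions made for the illustration, not part of the course material.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The share of heads should settle near 1/2 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For a small number of tosses the relative frequency can be far from 1/2; the long-run behaviour is what the objective interpretation refers to.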
Random experiment
A random experiment is a process that will result in something
happening (an outcome), but we do not exactly know what.
• Sample space S is the set of all possible outcomes
• Outcome = an element of the sample space, s ∈ S
• Event = a set of outcomes; a subset of the sample space, A ⊂ S
Terminology
• An event A occurs, if the outcome s ∈ A
• The full set S is the certain event
• The empty set ∅ is the impossible event
Example: Rolling a die
• Outcome i = the result of the roll
• Sample space S = {1, 2, . . . , 6}
• Events are all subsets of S, for example
• A = “outcome is even” = {2, 4, 6}.
• B = “outcome is bigger than four” = {5, 6}.
Example: Two rolls of a die
• An outcome is a pair of integers (i, j),
where i is the first roll result and j is the
second roll result
• Sample space is
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
Events are for example
• A = “the two results are equal”
= {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.
• B = “first roll was one”
= {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)}.
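The sample space of two rolls can also be generated by a computer instead of being listed by hand. The sketch below uses Python sets of tuples; the variable names `faces`, `S`, `A` and `B` simply mirror the notation of the example.

```python
from itertools import product

faces = range(1, 7)
S = set(product(faces, repeat=2))        # all ordered pairs (i, j)

A = {(i, j) for (i, j) in S if i == j}   # "the two results are equal"
B = {(i, j) for (i, j) in S if i == 1}   # "first roll was one"

print(len(S))      # 36
print(sorted(A))
print(sorted(B))
```

Both events are ordinary subsets of `S`, which matches the definition of an event given earlier.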
Example: Tomorrow’s rainfall in Otaniemi (mm)
• Outcomes are real numbers x ≥ 0.
• Sample space S = [0, ∞).
Events are e.g.
• A = “rainfall exceeds 10 mm” = (10, ∞)
• B = “no rain tomorrow” = {0}
Combining events
We can combine events into new events by different logical
operations.
• “A and B occur”
• “A or B occurs”
• “A does not occur”
• “B occurs but A does not”
In probability, it is customary to use the language of set theory,
because events are sets of possible outcomes.
Intersection of events
The intersection of two events, denoted
A ∩ B, contains every outcome that
belongs to A and also belongs to B.
A ∩ B = {s ∈ S : s ∈ A and s ∈ B}.
Example (One die)
• A = “Result exceeds 3” = {4, 5, 6}
• B = “Result is even” = {2, 4, 6}
• A ∩ B = “Result exceeds 3 and is even” = {4, 6}
Union of events
The union of two events, denoted
A ∪ B, contains every such outcome
that belongs to A or to B (or both).
A ∪ B = {s ∈ S : s ∈ A or s ∈ B}.
Example (Die roll)
• A = “Result exceeds 3” = {4, 5, 6}
• B = “Result is even” = {2, 4, 6}
• A ∪ B = “Result exceeds 3 or is even” = {2, 4, 5, 6}
Note that “or” is usually understood as “inclusive or”, that is, it
allows the possibility of both happening.
Complement of an event
The complement of an event, denoted
A^c, contains exactly those outcomes
that are not in A.
A^c = {s ∈ S : s ∉ A}.
Example (Die roll)
• A = “Result exceeds 3” = {4, 5, 6}
• A^c = “Result does not exceed 3”
= “Result is at most 3” = {1, 2, 3}
Difference of events
The difference event B \ A = B ∩ A^c
contains the outcomes that are in B,
but are not in A.
B \ A = {s ∈ S : s ∈ B and s ∉ A}.
Example (Die roll)
• A = “Result exceeds 3” = {4, 5, 6}
• B = “Result is even” = {2, 4, 6}
• B \ A = “Result is even but does not exceed 3” = {2}
Mutually exclusive events
Two events A and B are mutually exclusive (or disjoint), if they
cannot both occur (at the same time).
A ∩ B = ∅.
Several events A_1, A_2, ... are mutually exclusive, if every pair of
events is mutually exclusive (at most one of the events can occur at
the same time).
Example (Die roll)
• A = {1, 2} and B = {3, 4} are mutually exclusive.
• A = {1, 2, 3} and B = {2, 4, 5} are not mutually exclusive.
• A = {1, 2}, B = {3, 4} and C = {5, 6} are mutually exclusive.
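The event operations from the die-roll examples map directly onto Python's built-in set operations. The following is a minimal sketch; the helper `mutually_exclusive` is a hypothetical name introduced here for the pairwise check.

```python
from itertools import combinations

S = {1, 2, 3, 4, 5, 6}
A = {4, 5, 6}   # "result exceeds 3"
B = {2, 4, 6}   # "result is even"

intersection = A & B     # A and B occur
union = A | B            # A or B occurs (or both)
complement_A = S - A     # A does not occur
difference = B - A       # B occurs but A does not

# B \ A equals B intersected with the complement of A
assert difference == B & complement_A

def mutually_exclusive(*events):
    """True if every pair of the given events is disjoint."""
    return all(x.isdisjoint(y) for x, y in combinations(events, 2))

print(mutually_exclusive({1, 2}, {3, 4}, {5, 6}))   # True
print(mutually_exclusive({1, 2, 3}, {2, 4, 5}))     # False
```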
Combining events — Summary
Name          Notation  Definition                   Interpretation
Sample space  S         {x ∈ S : x ∈ S}              Certain event
Event         A         {x ∈ S : x ∈ A}              A occurs
Event         B         {x ∈ S : x ∈ B}              B occurs
Intersection  A ∩ B     {x ∈ S : x ∈ A and x ∈ B}    A and B occur
Union         A ∪ B     {x ∈ S : x ∈ A or x ∈ B}     A or B occurs (or both)
Difference    A \ B     {x ∈ S : x ∈ A and x ∉ B}    A occurs but B does not
Difference    B \ A     {x ∈ S : x ∈ B and x ∉ A}    B occurs but A does not
Complement    A^c       {x ∈ S : x ∉ A}              A does not occur
Complement    B^c       {x ∈ S : x ∉ B}              B does not occur
Empty set     ∅         {x ∈ S : x ∉ S}              Impossible event
The axioms
A probability on the sample space S is a function from events to
numbers.
The probability of event A is denoted P(A).
We have some basic requirements for the function.
(i) The whole sample space S has probability P(S) = 1.
(ii) For every event A we have 0 ≤ P(A) ≤ 1.
(iii) Additivity: For any collection of mutually exclusive events
A_1, A_2, ... we have
P(A_1 ∪ A_2 ∪ ···) = P(A_1) + P(A_2) + ···
Other rules of calculating with probability can be deduced from
these “axioms”.
Further rules
• General addition rule:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
• Additivity of two mutually exclusive events:
P(A ∪ B) = P(A) + P(B), when A ∩ B = ∅.
• Probability of complement and difference:
P(A^c) = 1 − P(A),
P(B \ A) = P(B) − P(A ∩ B).
• Monotonicity:
P(A) ≤ P(B), when A ⊂ B.
These can be deduced from the axioms.
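One possible way to carry out these deductions is sketched below. It is not necessarily the route taken in the lectures. Each line splits a set into mutually exclusive pieces and applies axioms (i) and (iii).

```latex
\begin{align*}
1 &= P(S) = P(A \cup A^c) = P(A) + P(A^c), \\
P(B) &= P\big((A \cap B) \cup (B \setminus A)\big) = P(A \cap B) + P(B \setminus A), \\
P(A \cup B) &= P\big(A \cup (B \setminus A)\big) = P(A) + P(B \setminus A)
             = P(A) + P(B) - P(A \cap B).
\end{align*}
```

The first line gives the complement rule, the second the difference rule, and the third the general addition rule. Monotonicity follows from the second line: if A ⊂ B, then A ∩ B = A, so P(B) − P(A) = P(B \ A) ≥ 0 by axiom (ii).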
Probability and combinatorics
If we have a finite sample space S that has n equally probable
outcomes, then each outcome has probability 1/n.
Then if an event A contains k outcomes, by additivity its
probability is
$$P(A) = \frac{k}{n} = \frac{\#A}{\#S} = \frac{\text{the number of outcomes in } A}{\text{the number of outcomes in } S}.$$
A sample space whose outcomes are equally probable is called
symmetric. It seems that probability in such spaces is trivially easy.
However . . .
• It is difficult to “count” the elements of a large set one by one.
• Sometimes you can “calculate” the number of elements with
more efficient methods.
• Combinatorics is a field of mathematics that provides tools for
this.
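For small symmetric sample spaces the counting can be left to a computer. The sketch below applies P(A) = #A / #S to two rolls of a die, using exact fractions. The function name `prob` and the event “sum is seven” are assumptions added for illustration.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two rolls of a die, 36 outcomes

def prob(event):
    """P(A) = #A / #S, valid only when all outcomes are equally probable."""
    return Fraction(len(event), len(S))

sum_is_seven = {(i, j) for (i, j) in S if i + j == 7}
print(prob(sum_is_seven))                  # 1/6

A = {(i, j) for (i, j) in S if i == j}     # "the two results are equal"
B = {(i, j) for (i, j) in S if i == 1}     # "first roll was one"

# The general addition rule and the complement rule hold for this counting measure.
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
assert prob(S - A) == 1 - prob(A)
```

Brute-force enumeration like this stops being practical once the sample space is large, which is where combinatorial formulas become useful.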
Conditional probability
If A and B are two events, we define the conditional probability of
A given that B occurs, by the formula
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) \neq 0.$$
• Read as “probability of A given B”, or “P of A given B”
• Interpretation: This is the probability of A occurring if B
occurs.
• Note that P(A|B) is not the same as P(A ∩ B).
• Also P(B|A) is, in general, different from P(A|B).
• If P(B) = 0, we leave P(A|B) undefined.
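The definition and the remarks above can be checked on a single die roll with equally probable outcomes. This is a sketch under that assumption. The events “result exceeds 4” and “result is even” and the helper names `prob` and `cond_prob` are chosen for the illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability under equally probable outcomes."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); None when P(b) = 0 (left undefined)."""
    if prob(b) == 0:
        return None
    return prob(a & b) / prob(b)

A = {5, 6}      # "result exceeds 4"
B = {2, 4, 6}   # "result is even"

print(cond_prob(A, B))     # 1/3
print(cond_prob(B, A))     # 1/2
print(prob(A & B))         # 1/6
print(cond_prob(A, set())) # None
```

The three printed probabilities differ, in line with the notes that P(A|B), P(B|A) and P(A ∩ B) are different quantities.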
General product rule
From the definition of conditional probability, we can simply
deduce the general product rule.
Rule
If P(A) ≠ 0, then
P(A ∩ B) = P(A)P(B|A).
Interpretation
The probability of the event “both A and B occur” is obtained by
multiplying the probability of A by the conditional probability of
B given A.
Product rule for several events (chain rule)
Rule
If P(A_1 ∩ ··· ∩ A_{k−1}) ≠ 0, then
$$P(A_1 \cap \cdots \cap A_k) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap \cdots \cap A_{k-1}).$$
Interpretation
The probability of the event “all of A_1, ..., A_k occur” is obtained
by multiplying together:
• the probability of A_1,
• then the conditional probability of A_2 given A_1,
• then the conditional probability of A_3 given A_1 and A_2,
• ...
• the conditional probability of A_k given A_1, A_2, ..., A_{k−1}.
Product rule — Example
From a well-shuffled deck (of 52 cards) we deal
three cards. What is the probability that all three
are spades?
• A_i = “the ith card is a spade”
• A = A_1 ∩ A_2 ∩ A_3
Apply the chain rule to the three events.
$$P(A) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) = \frac{13}{52} \cdot \frac{12}{51} \cdot \frac{11}{50} \approx 0.013.$$
There is another method that involves combinatorics (we’ll learn
about this later).
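The arithmetic of this example can be reproduced with exact fractions. The comparison with binomial coefficients below is an assumption about what the combinatorial method will look like (choosing 3 of the 13 spades out of all 3-card hands); the slides do not spell it out here.

```python
from fractions import Fraction
from math import comb

# Chain rule: spade, then spade given one spade is gone, and so on.
p_chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Counting argument (assumed form): favourable hands / all hands.
p_count = Fraction(comb(13, 3), comb(52, 3))

print(p_chain)                    # 11/850
print(round(float(p_chain), 3))   # 0.013
print(p_chain == p_count)         # True
```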
Stochastic dependence and independence
Two events A and B are independent, if
P(A ∩ B) = P(A)P(B).
If this does not hold, we say the events are (stochastically)
dependent.
Several events {A_i, i ∈ I} are independent, if
P(A_{i_1} ∩ ··· ∩ A_{i_k}) = P(A_{i_1}) ··· P(A_{i_k})
for every finite choice of distinct indices i_1, i_2, ..., i_k ∈ I.
Example
Some situations where independence is intuitively clear.
• Physically separate tosses of a coin (or a die).
• Sampling with replacement. Pick a lottery ticket from a box,
place it back in the box and shuffle, then pick again a lottery
ticket.
Independence and conditional probability
Fact
If P(A) ≠ 0 and P(B) ≠ 0, then these three conditions are
equivalent:
• A and B are independent.
• P(A|B) = P(A).
• P(B|A) = P(B).
Interpretation
If P(A|B) ≠ P(A), then knowing whether B occurs or not affects
the probability of A occurring (i.e. either makes it more probable
or less probable).
Example: Two events when dealing one card
A random card is dealt from a shuffled deck.
• A = “the card is a spade”
• B = “the card is an ace”
Are A and B dependent or independent?
Let us calculate whether P(A ∩ B) = P(A)P(B).
• P(A) = 13/52 = 1/4.
• P(B) = 4/52 = 1/13.
• P(A ∩ B) = P(“ace of spades”) = 1/52.
Because P(A)P(B) = 1/4 · 1/13 = 1/52 = P(A ∩ B), we see that A and B are
independent events.
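The same check can be done by enumerating the deck. The sketch below assumes a standard 52-card deck with equally probable cards. The event `D` (“the card is black”) is an extra example added here to show a dependent pair; it does not come from the slides.

```python
from fractions import Fraction
from itertools import product

suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = set(product(ranks, suits))          # 52 (rank, suit) pairs

def prob(event):
    return Fraction(len(event), len(deck))

def independent(x, y):
    """Check the defining identity P(x and y) = P(x) P(y)."""
    return prob(x & y) == prob(x) * prob(y)

A = {c for c in deck if c[1] == "spades"}             # "the card is a spade"
B = {c for c in deck if c[0] == "A"}                  # "the card is an ace"
D = {c for c in deck if c[1] in ("spades", "clubs")}  # "the card is black" (added example)

print(independent(A, B))   # True
print(independent(A, D))   # False
```

Knowing that the card is black raises the probability of a spade from 1/4 to 1/2, so A and D are dependent, whereas knowing that the card is an ace leaves the probability of a spade at 1/4.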
Law of total probability
If we divide the sample space S into
mutually exclusive events B_1, ..., B_n
whose union is the whole S, we have a
partition of S.
Rule
If B_1, ..., B_n are a partition of the sample space and P(B_i) ≠ 0 for
all i, then
$$P(A) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i).$$
Proof.
Events C_i = A ∩ B_i are mutually
exclusive and their union is A.
Applying both additivity and the product rule
P(A ∩ B_i) = P(B_i)P(A|B_i), we have
$$P(A) = P\Big(\bigcup_{i=1}^{n} C_i\Big) = \sum_{i=1}^{n} P(C_i) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i).$$
Example. A rare disease
In a population, 1/10000 of the people carry a certain disease. There is a
test for the disease, but it gives false positive and false negative results,
each with probability 1%. What is the probability that a random
person shows a positive (“diseased”) test result?
H− = “not diseased” T− = “test is negative”
H+ = “diseased” T+ = “test is positive”
Law of total probability ⇒ P(T+) = P(H−)P(T+ | H−) + P(H+)P(T+ | H+)
= 0.9999 · 0.01 + 0.0001 · 0.99
= 0.010098.
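The computation can be written as a sum over the partition {H+, H−}. The sketch below uses exact fractions; the dictionary names `prior` and `p_pos_given` are illustrative choices, not notation from the slides.

```python
from fractions import Fraction

p_disease = Fraction(1, 10_000)
p_false_positive = Fraction(1, 100)   # P(T+ | H-)
p_false_negative = Fraction(1, 100)   # P(T- | H+)

prior = {"H+": p_disease, "H-": 1 - p_disease}
p_pos_given = {"H+": 1 - p_false_negative, "H-": p_false_positive}

# Law of total probability: sum of P(B_i) * P(A | B_i) over the partition
p_positive = sum(prior[h] * p_pos_given[h] for h in prior)

print(p_positive)          # 5049/500000
print(float(p_positive))   # 0.010098
```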
Bayes’ rule
Question: How are P(A|B) and P(B|A) related to each other?
Rule (Bayes’ rule)
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$
Proof.
Apply the definition of the conditional probability twice.
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A \cap B)}{P(B)} \cdot \frac{P(B)}{P(A)} = P(A \mid B)\,\frac{P(B)}{P(A)}.$$
Example. A rare disease
In a population, 1/10000 of the people carry a certain disease. There is a
test for the disease, but it gives false positive and false negative results,
each with probability 1%. If we test a random person and the test is
positive, what is the probability that the person indeed has the disease?
H− = “not diseased” T− = “test is negative”
H+ = “diseased” T+ = “test is positive”
Previously we found P(T+) = 0.010098. Bayes’ rule ⇒
$$P(H_{+} \mid T_{+}) = \frac{P(H_{+})\,P(T_{+} \mid H_{+})}{P(T_{+})} = \frac{0.0001 \cdot 0.99}{0.010098} \approx 0.0098.$$
Is there something odd here?
• 99% of all test results are correct
• Over 99% of positive results are wrong!
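The apparent paradox becomes easier to see with counts. The sketch below recomputes the posterior with exact fractions and then restates it for a hypothetical population of one million people. The population size and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

p_h = Fraction(1, 10_000)        # P(H+)
sensitivity = Fraction(99, 100)  # P(T+ | H+)
false_pos = Fraction(1, 100)     # P(T+ | H-)

p_t = p_h * sensitivity + (1 - p_h) * false_pos   # law of total probability
posterior = p_h * sensitivity / p_t               # Bayes' rule
print(posterior, float(posterior))

# Share of all test results that are correct (true positives + true negatives)
correct_share = p_h * sensitivity + (1 - p_h) * (1 - false_pos)
print(correct_share)             # 99/100

# Expected counts in a hypothetical population of one million people
population = 1_000_000
true_positives = population * p_h * sensitivity          # 99
false_positives = population * (1 - p_h) * false_pos     # 9999
print(true_positives, false_positives)
```

Healthy people vastly outnumber diseased ones, so even a 1% false positive rate produces about a hundred times more false positives than true positives. The posterior is exactly 1/102.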
Summary of the rules of probability
Addition rules (two versions)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
= P(A) + P(B) (if A and B are mutually exclusive)
Product rule (two versions)
P(A ∩ B) = P(A)P(B|A)
= P(A)P(B) (if A and B are independent)
Law of total probability
$$P(A) = \sum_{i} P(B_i)\,P(A \mid B_i) \qquad \text{(if the } B_i \text{ are a partition of } S\text{)}$$
Bayes’ rule
$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$