0% found this document useful (0 votes)
17 views13 pages

Bayesian Network Probabilities and Inference

Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views13 pages

Bayesian Network Probabilities and Inference

Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

CS 440 Homework Assignment 2 Part 1

Probabilistic Reasoning
Jinsheng Luo
April 2022

Problem 1. : Consider the following Bayesian network, where variables A through E are
all Boolean valued.
(a) What is the probability that all five of these Boolean variables are simultaneously
true?
The joint probability that all five variables are simultaneously true is:

P (A, B, C, D, E) = P (A) ∗ P (B) ∗ P (C) ∗ P (D|A, B) ∗ P (E|B, C)


= P (A = T ) ∗ P (B = T ) ∗ P (C = T ) ∗ P (D = T |A, B) ∗ P (E = T |B, C)
= 0.2 ∗ 0.5 ∗ 0.8 ∗ 0.1 ∗ 0.3
= 0.0024

Therefore, The joint probability that all five variables are simultaneously true is 0.0024

(b) What is the probability that all five of these Boolean variables are simultaneously
false?
The joint probability that all five variables are simultaneously false is:

P (A, B, C, D, E) = P (A) ∗ P (B) ∗ P (C) ∗ P (D|A, B) ∗ P (E|B, C)


= P (A = F ) ∗ P (B = F ) ∗ P (C = F ) ∗ P (D = F |A, B) ∗ P (E = F |B, C)
= 0.8 ∗ 0.5 ∗ 0.2 ∗ 0.1 ∗ 0.8
= 0.0064

Therefore, The joint probability that all five variables are simultaneously false is 0.0064

(c) What is the probability that A is false given that the four other variables are all
known to be true?

P (¬A|B, C, D, E) = α ∗ P (¬A, B, C, D, E)

1
1
∵α=
P (A, B, C, D, E) + P (¬A, B, C, D, E)
1
=
(0.2 ∗ 0.5 ∗ 0.8 ∗ 0.1 ∗ 0.3) + (0.8 ∗ 0.5 ∗ 0.8 ∗ 0.6 ∗ 0.3)
1
=
0.0024 + 0.0576
1
=
0.06
50
=
3

50
∴ P (¬A|B, C, D, E) = ∗ P (¬A, B, C, D, E)
3
50
= ∗ 0.0576
3
= 0.96

Therefore, The joint probability that A is false given that the four other variables are all
known to be true is 0.96

2
Problem 2. : For this problem, check the Variable Elimination algorithm in your book.
Also consider the Bayesian network from the “burglary” example.

(a) Apply variable elimination to the query:P(Burglary — JohnsCalls = true, MaryCalls


= true) and show in detail the calculations that take place. Use your book to confirm that
your answer is correct.

P (B|J, M ) = α ∗ P (B, J, M )
X X 1
= α ∗ P (B) ∗ P (E) P (A|B, E) ∗ P (J|A) ∗ P (M |A), (α =
E A
P (¬B, J, M ) + P (B, J, M )
   
X 0.95 0.29 1 − 0.95 1 − 0.29
= α ∗ P (B) ∗ P (E) ∗ (0.9 ∗ 0.7 ∗ + 0.5 ∗ 0.01 ∗ )
0.94 0.001 1 − 0.94 1 − 0.001
E
   
X 0.95 0.29 0.05 0.71
= α ∗ P (B) ∗ P (E) ∗ (0.9 ∗ 0.7 ∗ + 0.5 ∗ 0.01 ∗ )
0.94 0.001 0.06 0.999
E
 
X 0.598525 0.183055
= α ∗ P (B) ∗ P (E) ∗
0.59223 0.011295
E
   
0.598525 0.59223
= α ∗ P (B) ∗ (0.002 ∗ + 0.998 ∗ )
0.183055 0.0011295
   
0.001 0.59224259
=α∗( ∗ )
0.999 0.001493351
 
0.00059224259
=α∗
0.001491857649

1
∵α=
P (¬B, J, M ) + P (B, J, M )
1
=
0.00059224259 + 0.001491857649
1
=
0.002084100239
 
1 0.00059224259
∴ P (B|J, M ) = ∗
0.002084100239 0.0014918576
=< 0.2841718354, 0.7158281411 >

Therefore, the probability of a burglary given that John and Mary both called, is 0.2841718354
whereas 0.7158281411. is the probability that burglary will not happen if they both call.

3
(b) Count the number of arithmetic operations performed (additions, multiplications,
divisions), and compare it against the number of operations performed by the tree enumer-
ation algorithm.

There are 24 arithmetic operations performed by previous question.


For tree enumeration, This is computed by adding four terms and each computed by 5 num-
bers. Total is 40 operations( time is doubled because of the computation of α) However, by
simple re-arrangement of the terms can reduce number of operations to 30.

X X
P (B|J, M ) = α ∗ P (B) ∗ P (E) P (A|B, E) ∗ P (J|A) ∗ P (M |A)
E A

Therefore, variable elimination is faster than tree enumeration.

(c) What is the complexity of computing P(X1 | Xn = true) using enumeration? What
is the complexity with variable elimination?

Problem 3. : In this problem, you will implement two kinds of sampling techniques (re-
jection sampling and likelihood weighting) to perform approximate inference on the given
Bayesian network.

(a) How do the approximations and the actual values compare?

Enumeration:

Actual value : P(D | C) = ⟨0.75⟩

Actual value : P(B | C) = ⟨0.1⟩

Actual value : P(D | ¬A, B) = ⟨0.425⟩

Rejection sampling by 1000 samples:

approximate value : P(D | C) = ⟨0.759⟩

approximate value : P(B | C) = ⟨0.1⟩

approximate value : P(D | ¬A, B) = ⟨0.418⟩

Likelihood weighted sampling by 1000 samples:

approximate value : P(D | C) = ⟨0.762⟩

approximate value : P(B | C) = ⟨0.1⟩

4
approximate value : P(D | ¬A, B) = ⟨0.449⟩

The approximate value is very close to the actual value. Because it is randomly sample,
The result will be different in anytime you implement the algorithm

(b) For each of the sampling methods, plot the probability as a function of the number of
samples used. Do you notice large divergences in the convergence rates among the methods?

Figure 1: Rejection Sampling Figure 2: Likelihood Weighting

(c) Construct a query on this Bayes Net such that the convergence and effectiveness of
rejection sampling is noticeably worse than that of likelihood weighting List the query you
are using, and provide the probability plot as a function of samples used. Why is it the case
that rejection sampling is noticeably worse for this query?

Figure 3: Convergence Figure 4: effectiveness

5
I use the query P(D | B) because rejection sampling would be a slower and less accurate
when the probability of finding data-points consistent with the required evidence is large.

Problem 4. :

A B C D E F

Xt = (A, B, C, D, F, F )

Et = (Hot, Cold)
   
P (X1 ) = A 1
P (X1 ) = B  0
   
 P (X1 ) = C  0
P (X1 )=
 = 
P (X 1 ) = D  0
  
P (X1 ) = E  0
P (X1 ) = F 0

(a) Filtering

The observations are given ⟨E1 = Hot,E1 =Cold,E1 =Cold ⟩

P(X3 |h1, C1 , C2 ) =?

we will use this equation to solve this problem:

P
P(Xt |E1:t ) = α ∗ P (Et |Xt ) Xt−1 P (Xt |Xt−1 ) ∗ P (Xt−1 |E1:t−1 )
P
P (X3 |E1 = h, E2 = C, E3 = C) =α ∗ P (E3 |X3 ) ∗ X2 P (X3 |X2 ) ∗ P (X2 |E1 , E2 )

In order to solve this equation, we need to split it:


 
1
0
 
0
P(X1 |E1 = h)=P(X1 )= 
0
 
0
0

P
P(X2 |E1 = h, E2 = C)=α ∗ P (E2 |X2 ) ∗ X1 P (X2 |X1 ) ∗ P (X1 |E1 )

6
First:    
P (E2 = Cold|X2 = A) 0
P (E2 = Cold|X2 = B) 1
   
 P (E2 = Cold|X2 = C)  1
P(E2 = C|X2 )=
P (E2 = Cold|X2 = D)=0(There is no chance of a mistaken reading.)
  
   
P (E2 = Cold|X2 = E) 1
P (E2 = Cold|X2 = F ) 1

Second:      
0.2 1 0.2
0.8 0.8 0.8
     
0 0
 ∗ +0= 0 
P  
X1 P (X 2 |X 1 ) ∗ P (X 1 |E 1 ) = 0 0 0
     
0 0 0
0 0 0
F inally :      
0 0.2 0
1 0.8 0.8
     
1  0 
∗ =α∗ 0 
  1
P (X2 |E1 = h, E2 = C)=α ∗  0  0  0 ∵α= 0.8
= 1.25
     
1  0  0
1 0 0
 
0
1
 
0
∴ P (X2 |E1 = h, E2 = C)=  
0
 
0
0

Then:
P
P (X3 |E1 = h, E2 = C, E3 = C) =α ∗ P (E3 |X3 ) ∗ X2 P (X3 |X2 ) ∗ P (X2 |E1 , E2 )
 
0
1
 
1
P(E3 |X3 ) = 
0

 
1
1
X2 is only 1 at B and 0 otherwise.

7
   
0 0
0.2 0.2
   
P 0.8 0.8
X2 P (X3 |X2 ) ∗ P (X2 |E1 , E2 ) = 0 +  0  + 0 =  0 
   
   
0 0
0 0
In the end:      
0 0 0
1 0.2 0.2
     
1 0.8 0.8 1
P (X3 |E1 = h, E2 = C, E3 = C) = α ∗   ∗   = α ∗ 
     ∵α= 0.8+0.2
=1
0
    0 0
 
1  0  0
1 0 0
 
0
0.2
 
0.8
∴P(X3 |E1 = h, E2 = C, E3 = C) = 0

 
0
0
(b) Smoothing: P(X2 |h1, C1 , C2 ) =?

We use the equation:


P(Xk |E1:t ) = α ∗ P (Ek+1:t |Xk ) ∗ P (Xk |E1:k )

∴ P (X2 |h1 , C2 , C3 ) = α ∗ P (E3:3 |X2 ).P (X2 |E1:2 )


P
P(Et+1:t |Xt ) = Xt+1 ∗P (Xt+1 |Xt ) ∗ P (Et+1 |Xt+1 ) ∗ P (Et+2:t |Xt+1 )
P
P (E3:3 = cold|X2 ) = X3 ∗P (X3 |X2 ) ∗ P (E3 |X3 ) ∗ P (E4:3 |X3 )
∵t=T −1=2
∴Therefore, we can set

P (E4:3 |X3 ) = 1

 
0
1
 
1
Moreover,P (E3 = C|X3 )= 
0
 
1
1

8
X
P (E3:3 = C|X2 ) = P (X3 |X2 ) ∗ P (E3 |X3 ) ∗ 1
X3

 
P (X3 = B|X2 = A) + P (X3 = C|X2 = A) + P (X3 = E|X2 = A) + P (X3 = F |X2 = A)
 P (X3 = B|X2 = B) + P (X3 = C|X2 = B) + P (X3
 = E|X2 = B) + P (X3 = F |X2 = B)  
 P (X3 = B|X2 = C) + P (X3 = C|X2 = C) + P (X3 = E|X2 = C) + P (X3 = F |X2 = C) 
= 
P (X3 = B|X2 = D) + P (X3 = C|X2 = D) + P (X3
 = E|X2 = D) + P (X3 = F |X2 = D) 
 P (X3 = B|X2 = E) + P (X3 = C|X2 = E) + P (X3 = E|X2 = E) + P (X3 = F |X2 = E) 
P (X3 = B|X2 = F ) + P (X3 = C|X2 = F ) + P (X3 = E|X2 = F ) + P (X3 = F |X2 = F )
 
0.8 + 0 + 0 + 0
 0.2 + 0.8 + 0 + 0 
 
 0 + 0.2 + 0 + 0 
=
 
0 + 0 + 0 + 0.8 + 0.0

 0 + 0 + 0.8 + 0.2 
0+0+0+1
 
0.8
1
 
0.2
=
0.8

 
1
1

Therefore,
   
0.8 0
 1  1
   
0.2 0
P (X2 |E1:3 ) = α ∗ 
0.8 ∗ 0
  
   
 1  0
1 0
 
0
1
 
0
=α∗ 
0
 
0
0

9
∵α=1  
0
1
 
0
∴ answer =  
0
 
0
0
(c) Most Likely Explanation:
Let X1 ⋆,X2 ⋆,X3 ⋆ present the most likely sequence.
We need to know X1 ⋆,X2 ⋆,X3 ⋆ given the observations E1 = h, E2 = C, E3 = C

X1 ⋆ = A(Given) Then,
P(A, X2 |E1 = h, E2 = C) = α ∗ P (E2 = C|X2 )*arg maxx1 P(X2 |X1 )
By this function, we get X2 ⋆ = B

P(A,B,B) α0.2
P (A, B, C)α0.8

Therefore,the most likely sequence is A,B and C.

(d) Prediction: P(h4 , h5 , c6 |h1 , c2 , c3 )


For this question we will use this equation
P
P(Et+1:T |E1:t ) = Xt P (Xt |E1:t ) ∗ P (Et+1:T |Xt )

Here t=3, T=6 and t+1=4


P
P(E4:6 |E1:3 ) = X3P (X3 |E1:3 ) ∗ P (E4:6 |X3 )
 
0
0.2
 
0.8
P(X3 |h1 , c2 , c3 )=
 0  From Question 1

 
0
0
Now P
P(Et+1:T |Xt ) = Xt+1 P (Xt+1 |Xt ) ∗ P (Et+1 |Xt+1 ) ∗ P (Et+2:T |Xt+1 )

Therefore P
P(E4:6 |X3 ) = X4 P (X4 |X3 ) ∗ P (E4 |X4 ) ∗ P (E5:6 |X4 )
P
P(E5:6 |X4 ) = X5 P (X5 |X4 ) ∗ P (E5 |X5 ) ∗ P (E6:6 |X5 )

10
P
P(E6:6 |X5 ) = X6 P (X6 |X5 ) ∗ P (E6 |X6 ) ∗ P (E7:6 |X5 )

We start by setting P (E7:6 |X6 ) = 1


Now, P
P(E6:6 = cold|X5 ) = X6 P (X6 |X5 ) ∗ P (C6 |X6 ) ∗ 1
P     
P X6
P (X 6 |X 5 = A) ∗ P (C 6 |X 6 ) 0 + 0.8 + 0 + 0 + 0 + 0 0.8
 X P (X6 |X5 = B) ∗ P (C6 |X6 ) 0 + 0.2 + 0.8 + 0 + 0 + 0  1 
P 6     

P X6 P (X 6 |X 5 = C) ∗ P (C 6 |X 6 )   0 + 0 + 0.2 + 0 + 0 + 0  0.2
=  =  = 
PX6 P (X6 |X5 = D) ∗ P (C6 |X6 )  0 + 0 + 0 + 0 + 0.8 + 0  0.8
     
PX6 P (X6 |X5 = E) ∗ P (C6 |X6 )
  0 + 0 + 0 + 0 + 0.2 + 0.8  1 
X6 P (X6 |X5 = F ) ∗ P (C6 |X6 )
0+0+0+0+0+1 1
Now, P(E5:6 |X4 )
P     
P X 5
P (X 5 |X 4 = A) ∗ P (E 5 = h|X 5 ) ∗ P (E6:6 X 5 ) 0.2 + 0.8 + 0 + 0 + 0 + 0 0.16
 X P (X5 |X4 = B) ∗ P (E5 = h|X5 ) ∗ P (E6:6 X5 )  0 + 0 + 0 + 0 + 0 + 0   0 
P 5     

X P (X 5 |X 4 = C) ∗ P (E 5 = h|X 5 ) ∗ P (E 6:6 X 5 )  0 + 0 + 0 + 0.8 ∗ 0.8 + 0 + 0 0.64
=P 5  =  = 
PX5 P (X5 |X4 = D) ∗ P (E5 = h|X5 ) ∗ P (E6:6 X5 )  0 + 0 + 0 + 0.2 ∗ 0.8, 0, 0  0.16
     
PX5 P (X5 |X4 = E) ∗ P (E5 = h|X5 ) ∗ P (E6:6 X5 )
   0+0+0+0+0+0   0 
X5 P (X5 |X4 = F ) ∗ P (E5 = h|X5 ) ∗ P (E6:6 X5 )
0+0+0+0+0+0 0
Finally, P(E4:6 |X3 )
P     
P X4 P (X4 |X3 = A) ∗ P (E4 = h|X4 ) ∗ P (E5:6 X4 ) 0.2 ∗ 0.16 + 0 + 0 + 0 + 0 + 0 0.032
 X P (X4 |X3 = B) ∗ P (E4 = h|X4 ) ∗ P (E5:6 X4 )  0+0+0+0+0+0   0 
P 4     

X P (X 4 |X 3 = C) ∗ P (E4 = h|X 4 ) ∗ P (E 5:6 X 4 )  0 + 0 + 0 + 0.8 ∗ 0.16 + 0 + 0 0.128
=P
 4
 = 0 + 0 + 0 + 0.2 ∗ 0.16, 0, 0  =0.032
    
PX4 P (X 4 |X 3 = D) ∗ P (E 4 = h|X 4 ) ∗ P (E 5:6 X 4 )     

P 4X P (X 4 |X 3 = E) ∗ P (E 4 = h|X 4 ) ∗ P (E 5:6 X 4 )   0 + 0 + 0 + 0 + 0 + 0   0 
X4 P (X4 |X3 = F ) ∗ P (E4 = h|X4 ) ∗ P (E5:6 X4 )
0+0+0+0+0+0 0
and now,
P
P(E4:6 |E1:3 ) = X3 P (x3 |E1:3 ) ∗ P (E4:6 |X3 )

= 0*0.032+0.2*0+0.8*0.128+0*0.032+0*0+0*0

= 0.1024 = Answer

(e) Prediction P(X4 |h1 , C2 , C3 )

P(Xt+1 |E1:t )= P(Xt |E1:t )*P(Xt+1 |Xt )

Therefore,

11
P(X4 |h1 , C2 , C3 )= P(X3 |h1 , C2 , C3 )*P(X4 |X3 )

Then,  
0
0.2
 
0.8
P(X3 |h1 , C2 , C3 )=0

 
0
0
P
X3 P (X3 |h1 , C2 , C3 ) ∗ P (X4 |X3 )
P 
X
P 3 P (X 4 = A|X 3 ) ∗ P (X 3 |E 3:1 )
 X P (X4 = B|X3 ) ∗ P (X3 |E3:1 )
P 3 

P X3 P (X 4 = C|X 3 ) ∗ P (X 3 |E 3:1 ) 


P 3X P (X 4 = D|X 3 ) ∗ P (X 3 |E 3:1 ) 


P X 3
P (X 4 = E|X 3 ) ∗ P (X 3 |E 3:1 ) 
P (X4 = F |X3 ) ∗ P (X3 |E3:1 )
 X3 
0+0+0+0+0+0
 0 + 0.2 ∗ 0.2 + 0 + 0 + 0 + 0 
 
0 + 0.2 ∗ 0.2 + 0.8 ∗ 0.2 + 0 + 0 + 0
= 
 0 + 0 + 0.8 ∗ 0.8 + 0 + 0 + 0 

 0+0+0+0+0+0 
0+0+0+0+0+0
 
0
0.04
 
0.32
=  (Answer)
0.64 

 0 
0
Similarly:
P(X5 |h1 , C2 , C3 )= P(X4 |h1 , C2 , C3 )*P(X5 |X4 )
P
X4 P (X4 |h1 , C2 , C3 ) ∗ P (X5 |X4 )
P 
X
P 4 P (X 5 = A|X 4 ) ∗ P (X 4 |E4:1 )
 X P (X5 = B|X4 ) ∗ P (X4 |E4:1 )
P 4 

P X4 P (X 5 = C|X 4 ) ∗ P (X 4 |E 4:1 ) 

PX4 P (X5 = D|X4 ) ∗ P (X4 |E4:1 )
 
PX4 P (X5 = E|X4 ) ∗ P (X4 |E4:1 )
 
X4 P (X5 = F |X4 ) ∗ P (X4 |E4:1 )

12
 
0
0.008
 
0.096
=
  (Answer)
0.384

0.512
0

13

You might also like