
Reinforcement Learning Exam Questions

This document outlines the examination structure for a Reinforcement Learning course at Chaitanya Bharathi Institute of Technology, detailing the distribution of questions and marks. It consists of two parts: Part A with five questions worth 15 marks total, and Part B with five questions worth 45 marks total, covering various topics in reinforcement learning. The questions address key concepts such as Gradient Descent, Monte Carlo methods, and the k-armed Bandit Problem.

Uploaded by mamatha.t

Code No.: 20CAE01
CHAITANYA BHARATHI INSTITUTE OF TECHNOLOGY (Autonomous)
B.E. (CSE-AI&ML) V Sem (Main) Examination Dec 2022 / Jan 2023
Reinforcement Learning
Time: 3 Hours Max Marks: 60
Note: Answer ALL questions from Part-A and Part-B (Internal Choice) in one place and in the same order.
Part - A
(5Q X 3M = 15 Marks)
M CO BT
1 Explain why reinforcement learning differs from supervised learning and unsupervised learning. (3) 1 2
2 What is Gradient Descent (GD), and what are its variants? (3) 2 1
3 Compare the first-visit MC method and the every-visit MC method. (3) 3 2
4 What is State-space planning? What are the uses of it? (3) 4 1
5 What is function approximation? (3) 5 2

Part – B
(5Q X 9M = 45 Marks)
M CO BT
6 (a) Explain some examples and possible applications of reinforcement learning that guide us to understand its development. (5) 1 2
(b) Suppose the reinforcement learning player was greedy, that is, it always played the move that brought it to the position that it rated the best. Might it learn to play better, or worse, than a nongreedy player? What problems might occur? (4) 1 3
(OR)
7 (a) What is a k-armed Bandit Problem? Explain the role of Action-value methods in dealing with the Bandit Problem. (5) 1 3
(b) Explain how to tackle a Nonstationary Problem. (4) 1 2

8 (a) Is the MDP framework adequate to usefully represent all goal-directed learning tasks? Can you think of any clear exceptions? (5) 2 3
(b) Draw and describe the optimal state-value function for the golf example. (4) 2 2
(OR)
9 (a) Describe the Efficiency of Dynamic Programming. (5) 2 2
(b) Write the Policy Iteration algorithm for estimating π ≈ π∗ using iterative policy evaluation. (4) 2 3

10 (a) Describe Monte Carlo Control without Exploring Starts (ES). (5) 3 2
(b) Illustrate the backup diagram for Monte Carlo estimation of qπ. (4) 3 3
(OR)
11 (a) What is the optimality of TD(0)? Explain the random walk under batch updating. (5) 3 3
(b) Re-solve the windy grid world assuming eight possible actions, including diagonal moves, rather than four. How much better can you do with the extra actions? (4) 3 2

12 (a) Illustrate the backup diagrams of n-step TD Prediction methods. Describe the formation of the spectrum ranging from one-step TD methods to Monte Carlo methods. (5) 4 3
(b) Is off-policy learning possible without importance sampling? Explain how the n-step tree-backup algorithm is used to achieve this. (4) 4 3
(OR)
13 (a) Why did the Dyna agent with exploration bonus, Dyna-Q+, perform better in the first phase as well as in the second phase of the blocking and shortcut experiments? (5) 4 3
(b) Write the steps in Prioritized sweeping for a deterministic environment. (4) 4 1

14 (a) Describe the on-policy distribution in episodic tasks. (5) 5 2
(b) Explain eligibility traces for action centric methods. (4) 5 3
(OR)
15 (a) Explain policy approximation in action centric methods. (5) 5 2
(b) Explain value prediction with function approximation. (4) 5 2

******
