Module-V Chapter 8
Reinforcement Learning
By
Pramod Kumar PM
Vivekananda College of Engineering & Technology, Puttur
Module 5 - Outline
Chapter 13: Reinforcement Learning
1. Introduction
2. The Learning Task
3. Q Learning
4. Summary
Introduction
▪ Reinforcement learning addresses the question of how an
autonomous agent that senses and acts in its environment can
learn to choose optimal actions to achieve its goals.
▪ Applications
• learning to control a mobile robot
• learning to optimize operations in factories
• learning to play board games.
▪ Consider building a learning robot, called the agent.
▪ It has
• a set of sensors to observe the state of its environment
Ex: Camera, Sonar
• a set of actions it can perform to alter this state
Ex: “Move forward”, “Turn Right”
▪ Its task is to learn a control strategy, or policy, for choosing
actions that achieve its goals.
▪ For example, the robot may have a goal of docking onto its
battery charger whenever its battery level is low.
▪ The goals of the agent can be defined by a reward function.
▪ The reward function assigns a numerical value, an immediate
payoff, to each distinct action the agent may take from each
distinct state.
▪ For example, the goal of docking to the battery charger can
be captured by
• assigning a positive reward (e.g., +100) to state-action
transitions that immediately result in a connection to the
charger and
• a reward of zero to every other state-action transition.
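The docking reward described above can be sketched as a small Python function. The state and action names ("next_to_charger", "dock", and so on) are hypothetical and chosen only for illustration; only the +100 and zero payoffs come from the example itself.

```python
# Hypothetical state and action names, used only for illustration.
GOAL_REWARD = 100  # the +100 payoff from the docking example


def reward(state: str, action: str) -> int:
    """Immediate payoff for taking `action` in `state`."""
    # Only the transition that connects the robot to its charger is rewarded.
    if state == "next_to_charger" and action == "dock":
        return GOAL_REWARD
    return 0  # every other state-action transition


print(reward("next_to_charger", "dock"))  # 100
print(reward("hallway", "move_forward"))  # 0
```

Because the reward is zero everywhere except at the goal, the agent gets no immediate hint about which earlier actions were useful; it has to learn that from the sequence of rewards it eventually receives.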
▪ This reward function
• may be built into the robot, or
• known only to an external teacher who provides the
reward value for each action performed by the robot.
▪ The task of the robot is to perform sequences of actions,
observe their consequences, and learn a control policy.
▪ The control policy we desire is one that, from any initial state,
chooses actions that maximize the reward accumulated over
time by the agent.
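One common way to make "reward accumulated over time" concrete is a discounted sum, in which a reward received i steps in the future is weighted by gamma**i. The sketch below assumes this discounted form; the discount factor `gamma` and its default value of 0.9 are assumptions made for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of gamma**i * rewards[i] over the whole reward sequence."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    # Work backwards through the sequence: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# The robot docks on its third action, so it sees rewards 0, 0, 100.
late = discounted_return([0, 0, 100])
# Docking on the first action is worth more under discounting.
early = discounted_return([100])
print(late < early)  # True
```

With gamma below 1, a policy that reaches the charger sooner accumulates more discounted reward, which is why maximizing this quantity favors short routes to the goal.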
Robot learning
The Learning Task
Source: Wikipedia
Illustrative Example
Q Learning
Q Learning Algorithm
Illustrative Example
Relationship to Dynamic
Programming
Summary
▪ Reinforcement learning
• Learning control strategies for autonomous agents.
• It assumes that training information is available in the form
of a real-valued reward signal given for each state-action
transition.
• The goal of the agent is to learn an action policy that
maximizes the total reward it will receive from any starting
state.
▪ The reinforcement learning algorithms addressed in this
chapter fit a problem setting known as a Markov decision
process.
▪ In Markov decision processes, the outcome of applying any
action to any state depends only on this action and state (and
not on preceding actions or states).
▪ Markov decision processes cover a wide range of problems
including many robot control, factory automation, and
scheduling problems.
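The Markov property stated above can be illustrated with a tiny deterministic world. This is only a sketch: the three-state corridor, its state and action names, and the choice of a deterministic (rather than probabilistic) transition table are all assumptions made for illustration.

```python
# Hypothetical three-state corridor; all names are illustrative assumptions.
TRANSITIONS = {
    ("far", "forward"): "near",
    ("near", "forward"): "docked",
    ("near", "back"): "far",
}
REWARDS = {("near", "forward"): 100}


def step(state, action):
    """Return (next_state, reward) using only the current state and action."""
    # Moves that are not listed leave the state unchanged.
    next_state = TRANSITIONS.get((state, action), state)
    return next_state, REWARDS.get((state, action), 0)


def run(start, actions):
    """Apply a sequence of actions and record each (state, reward) outcome."""
    state, outcomes = start, []
    for action in actions:
        state, r = step(state, action)
        outcomes.append((state, r))
    return outcomes


# Two different histories both pass through "near" before the final "forward".
direct = run("far", ["forward", "forward"])[-1]
detour = run("near", ["back", "forward", "forward"])[-1]
print(direct == detour)  # True: the outcome ignores how "near" was reached
```

Note that `step` takes no history argument at all; that is the Markov assumption expressed as a function signature.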
▪ Q learning is one form of reinforcement learning in which the
agent learns an evaluation function over states and actions.
▪ Evaluation function Q(s, a) is defined as the
• maximum expected, discounted, cumulative reward
• the agent can achieve by applying action a to state s.
▪ Advantage: it can be employed even when the learner has
no prior knowledge of how its actions affect its environment.
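A minimal sketch of tabular Q learning follows, assuming a deterministic world and the widely used update Q(s, a) <- r + gamma * max over a' of Q(s', a'). The corridor world, its state and action names, the discount factor 0.9, and the purely random exploration strategy are assumptions made for illustration. The learner only observes the next state and reward returned after each action; it never reads the transition table directly.

```python
import random
from collections import defaultdict

# Hypothetical corridor world; state and action names are assumptions.
ACTIONS = ["forward", "back"]
TRANSITIONS = {
    ("far", "forward"): "near", ("far", "back"): "far",
    ("near", "forward"): "docked", ("near", "back"): "far",
}
REWARDS = {("near", "forward"): 100}
GOAL = "docked"


def q_learning(episodes=200, gamma=0.9, seed=0):
    """Learn a Q table by acting randomly and applying the update rule."""
    rng = random.Random(seed)
    q = defaultdict(float)  # every Q estimate starts at zero
    for _ in range(episodes):
        state = "far"
        while state != GOAL:
            action = rng.choice(ACTIONS)  # purely random exploration
            # The environment responds; the learner just observes the result.
            next_state = TRANSITIONS[(state, action)]
            r = REWARDS.get((state, action), 0)
            # Deterministic-world update: Q(s,a) <- r + gamma * max_a' Q(s',a')
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] = r + gamma * best_next
            state = next_state
    return q


def greedy_action(q, state):
    """Pick the action with the highest learned Q value in `state`."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = q_learning()
print(greedy_action(q, "far"), greedy_action(q, "near"))  # forward forward
```

In this toy world the learned values settle at 100 for moving forward from "near" and 90 for moving forward from "far", so acting greedily on the Q table leads straight to the charger.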
Thank You