Q-Learning in Reinforcement Learning

Reinforcement learning addresses how autonomous agents can learn optimal actions to achieve goals by interacting with an environment. Q-learning is a reinforcement learning method where an agent learns a Q-function that evaluates each state-action pair to estimate the maximum reward achievable. The agent takes actions in an environment, observes the results and rewards, and updates the Q-values for each state-action pair to learn an optimal policy that maximizes long-term rewards.

Uploaded by

Lahari bilimale

Module-V Chapter 8

Reinforcement Learning
By
Pramod Kumar PM
Vivekananda College of Engineering &
Technology, Puttur
Module 5 - Outline

Chapter 13: Reinforcement Learning


1. Introduction
2. The Learning Task
3. Q Learning
4. Summary

Introduction
▪ Reinforcement learning addresses the question of how an
autonomous agent that senses and acts in its environment can
learn to choose optimal actions to achieve its goals.

▪ Applications
• learning to control a mobile robot
• learning to optimize operations in factories
• learning to play board games.

▪ Consider building a learning robot, called an agent.
▪ It has
• a set of sensors to observe the state of its environment
Ex: Camera, Sonar
• a set of actions it can perform to alter this state
Ex: “Move forward”, “Turn Right”
▪ Its task is to learn a control strategy, or policy, for choosing
actions that achieve its goals.
▪ For example, the robot may have a goal of docking onto its
battery charger whenever its battery level is low.

▪ The goals of the agent can be defined by a reward function.
▪ The reward function assigns a numerical value, an immediate
payoff, to each distinct action the agent may take from each
distinct state.
▪ For example, the goal of docking to the battery charger can
be captured by
• assigning a positive reward (e.g., +100) to state-action
transitions that immediately result in a connection to the
charger and
• a reward of zero to every other state-action transition.
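
The docking example above can be sketched as a small function. This is only an illustration: the state name "next_to_charger" and the action names "dock" and "move_forward" are hypothetical labels, not taken from the slides.

```python
# Hypothetical state and action names, chosen only for illustration.
GOAL_REWARD = 100  # the positive reward used in the slide's example


def reward(state: str, action: str) -> int:
    """Immediate payoff for taking `action` in `state`.

    Only the transition that connects the robot to its charger pays off;
    every other state-action transition yields zero.
    """
    if state == "next_to_charger" and action == "dock":
        return GOAL_REWARD
    return 0


print(reward("next_to_charger", "dock"))  # 100
print(reward("hallway", "move_forward"))  # 0
```

Because almost every transition pays zero, the agent has to work out which earlier actions led to the eventual +100.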

▪ This reward function
• may be built into the robot, or
• known only to an external teacher who provides the
reward value for each action performed by the robot.

▪ The task of the robot is to perform sequences of actions,
observe their consequences, and learn a control policy.

▪ The control policy we desire is one that, from any initial state,
chooses actions that maximize the reward accumulated over
time by the agent.

Robot learning

The Learning Task

Illustrative Example

Q Learning

Q Learning Algorithm

Illustrative Example

Relationship to Dynamic Programming

Summary
▪ Reinforcement learning
• It addresses the problem of learning control strategies for
autonomous agents.
• It assumes that training information is available in the form
of a real-valued reward signal given for each state-action
transition.
• The goal of the agent is to learn an action policy that
maximizes the total reward it will receive from any starting
state.
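
One common way to make "total reward" precise is a discounted sum, which matches the later description of Q as a discounted, cumulative reward. The sketch below assumes a discount factor named gamma with an illustrative value of 0.9; neither the name nor the value comes from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a reward sequence.

    `gamma` (assumed to lie in [0, 1)) controls how strongly later
    rewards are discounted relative to immediate ones.
    """
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return afterwards).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Two zero-reward moves followed by docking (+100):
print(discounted_return([0, 0, 100]))  # about 81.0, i.e. 0.9**2 * 100
```

With gamma below 1, a reward reached sooner is worth more than the same reward reached later, so the agent prefers shorter routes to the goal.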

▪ The reinforcement learning algorithms addressed in this
chapter fit a problem setting known as a Markov decision
process.
▪ In Markov decision processes, the outcome of applying any
action to any state depends only on this action and state (and
not on preceding actions or states).
▪ Markov decision processes cover a wide range of problems
including many robot control, factory automation, and
scheduling problems.
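
The Markov property described above can be illustrated with a lookup table keyed only by the current state and action. The three-state corridor, its state names and its reward values are hypothetical. The table is also deterministic for simplicity, although a Markov decision process may have probabilistic outcomes.

```python
# Hypothetical three-state corridor; all names are illustrative only.
# Maps (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("far", "forward"): ("near", 0),
    ("far", "wait"): ("far", 0),
    ("near", "forward"): ("docked", 100),
    ("near", "wait"): ("near", 0),
}


def step(state, action):
    """Return (next_state, reward) using only the current state and action."""
    return TRANSITIONS[(state, action)]


def run(start, actions):
    """Apply a sequence of actions; return the visited states and total reward."""
    state, total, visited = start, 0, [start]
    for action in actions:
        state, r = step(state, action)
        total += r
        visited.append(state)
    return visited, total


print(run("far", ["wait", "forward", "forward"]))
# (['far', 'far', 'near', 'docked'], 100)
```

Note that step never consults the history of earlier states or actions. That is the assumption the Markov property makes.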

▪ Q learning is one form of reinforcement learning in which the
agent learns an evaluation function over states and actions.

▪ The evaluation function Q(s, a) is defined as the maximum
expected, discounted, cumulative reward the agent can achieve
by applying action a to state s.

▪ Advantage: it can be employed even when the learner has no
prior knowledge of how its actions affect its environment.
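
A minimal sketch of tabular Q learning, under stated assumptions, may help make these points concrete. It uses the standard update for deterministic environments, Q(s, a) <- r + gamma * max over a' of Q(s', a'), and is not necessarily the exact algorithm shown on the slides. The 4-cell corridor, the discount factor of 0.9, the purely random exploration and the episode count are all illustrative choices.

```python
import random

# Hypothetical 4-cell corridor: cells 0..3, where cell 3 is the charger (goal).
N_STATES, GOAL = 4, 3
ACTIONS = ("left", "right")
GAMMA = 0.9  # discount factor (assumed value)


def step(state, action):
    """Environment dynamics; the learner only ever observes the results."""
    nxt = max(state - 1, 0) if action == "left" else min(state + 1, GOAL)
    reward = 100 if nxt == GOAL else 0
    return nxt, reward


def q_learning(episodes=200, seed=0):
    """Learn a Q table from experience alone, without reading the dynamics."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(GOAL)  # start in a random non-goal cell
        while state != GOAL:
            action = rng.choice(ACTIONS)  # purely random exploration
            nxt, reward = step(state, action)
            # Deterministic-case update: immediate reward plus the
            # discounted value of the best action in the next state.
            q[(state, action)] = reward + GAMMA * max(q[(nxt, a)] for a in ACTIONS)
            state = nxt
    return q


def greedy_policy(q):
    """Pick, for each non-goal cell, the action with the highest Q value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


q = q_learning()
print(greedy_policy(q))
```

The learning loop calls step only to observe what happened and never inspects how the environment computes its result, which is the sense in which no prior knowledge of the environment is needed. Under these assumptions the learned values for moving right should approach 81, 90 and 100 for cells 0, 1 and 2, so the greedy policy heads toward the charger from every cell.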

Thank You
