
Module-3 Part-2

The document outlines various reinforcement learning algorithms, including Q-learning and SARSA, and their applications in decision-making scenarios like the multi-armed bandit problem. It discusses the importance of controls in reinforcement learning, the exploration vs exploitation dilemma, and provides insights into implementing these algorithms in practical scenarios. Additionally, it covers the Q-learning process, the Bellman Equation, and the differences between on-policy and off-policy learning methods.

Uploaded by

mohit70673parmar


• Module 3: RL Algorithms

• REINFORCEMENT ALGORITHMS AND APPLICATIONS:

• Algorithms for control learning, Q-learning

• Discrete action space:
• SARSA(λ) (SARSA-Lambda)
• DQN (Deep Q-Network)
• Continuous action space:
• Deep Deterministic Policy Gradient (DDPG)
• Asynchronous Advantage Actor-Critic (A3C)
• Multi-Armed Bandit Problem
• The multi-armed bandit problem is used in reinforcement learning to
formalize the notion of decision-making under uncertainty. In a
multi-armed bandit problem, an agent (learner) repeatedly chooses
between k different actions and receives a reward based on the chosen
action.
Multi-armed bandits are also used to describe fundamental
concepts in reinforcement learning, such as rewards, timesteps,
and values.
• The Multi-Armed Bandit (MAB) problem is a classic problem in
probability theory and decision-making that captures the essence of
balancing exploration and exploitation.
• This problem is named after the scenario of a gambler facing multiple
slot machines (bandits) and needing to determine which machine to
play to maximize their rewards. The MAB problem has significant
applications in various fields, including online advertising, clinical
trials, adaptive routing in networks, and more.
• The exploration vs. exploitation dilemma exists in many aspects of our
life.
• The dilemma arises from incomplete information.
• When an agent selects an action, we assume that each action has its
own reward distribution and that at least one action yields the
highest expected reward.
• Thus, the reward distributions of the actions differ from one another
and are unknown to the agent (decision-maker).
• Hence, the goal of the agent is to identify which action to choose to
get the maximum total reward over a given number of trials.
• Imagine you are in a casino facing multiple slot machines and each is
configured with an unknown probability of how likely you can get a
reward at one play. The question is: What is the best strategy to
achieve highest long-term rewards?
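To find the best machine, the agent needs a running estimate of each action's value. A common choice, assumed here rather than taken from these notes, is the sample average of the rewards observed so far, which can be updated incrementally as Q_n = Q_{n-1} + (R_n − Q_{n-1}) / n without storing past rewards. A minimal Python sketch (the reward sequence is made up):

```python
def incremental_mean(rewards):
    """Return the running sample-average estimate after each reward.

    Uses Q_n = Q_{n-1} + (R_n - Q_{n-1}) / n, which equals the plain
    average of the first n rewards but needs only O(1) memory.
    """
    q = 0.0
    estimates = []
    for n, r in enumerate(rewards, start=1):
        q += (r - q) / n          # move the estimate 1/n of the way toward r
        estimates.append(q)
    return estimates


# Hypothetical rewards from one slot machine over four pulls (1 = win, 0 = loss).
print(incremental_mean([1, 0, 0, 1]))
```

The step size 1/n shrinks as more rewards arrive, so each new reward moves the estimate less; this is what makes the result identical to the ordinary mean.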
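ε-Greedy Algorithm

• Epsilon (ε) is the probability of choosing to explore (selecting a random action); with probability 1 − ε the agent exploits (selects the action with the highest estimated value). The agent therefore exploits most of the time, with a small chance of exploring.

• The ε-greedy algorithm is commonly used to solve the MAB problem.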
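A minimal sketch of ε-greedy on a simulated bandit, assuming Bernoulli slot machines (each pays 1 with a hidden probability, else 0). The arm probabilities, ε = 0.1, the step count and the seed are illustrative assumptions, and the value estimates use the incremental sample average:

```python
import random


def epsilon_greedy_bandit(probs, epsilon=0.1, steps=10_000, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy agent.

    probs   -- true (hidden) success probability of each arm; the agent
               never reads these, it only sees the sampled rewards.
    epsilon -- probability of exploring (choosing a random arm).
    Returns (value estimates, pull counts, total reward).
    """
    rng = random.Random(seed)
    k = len(probs)
    q = [0.0] * k          # estimated value of each arm
    counts = [0] * k       # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                         # explore
        else:
            best = max(q)
            # exploit; ties are broken at random
            arm = rng.choice([a for a in range(k) if q[a] == best])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]          # incremental sample average
        total += reward
    return q, counts, total


q, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8], epsilon=0.1)
print("estimates:", [round(v, 2) for v in q])
print("pulls:", counts, "total reward:", total)
```

With these settings the agent typically ends up pulling the 0.8 arm most often, while the exploratory pulls (about 10% of steps) keep the other estimates up to date. Setting `epsilon=0` gives a purely greedy agent, which locks onto whichever arm pays out first.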
• Implementation work (practice)
• [Link]
• [Multi-Armed Bandit Problem and Epsilon-Greedy Action Value Method in Python: Reinforcement Learning]
• [Introduction to OpenAI Gym and Frozen Lake Environment in Python- Reinforcement Learning Tutorial]
• [Link]
• Intro to OpenAI Gym (Frozen Lake environment):
[Link]
• Intro to OpenAI Gym (CartPole environment):
[Link]
• Intro to OpenAI Gym (Atari game environment):
[Link]
• Classification of RL Algorithms
Control Algorithms
• A control algorithm is an algorithm used to control, coordinate, and optimize the behavior of a system.
• The most common algorithms can be classified as fuzzy-based
algorithms, neural-network-based algorithms, predictive control
algorithms, machine learning algorithms, and combined control
algorithms. For example, Pappis and Mamdani (1977) developed the
first fuzzy-based control algorithm for traffic junctions.
• GOTO is a control algorithm in which the master robot commands the
slave robot to drive itself from its current operational position to
specified positions.
• Dong Shen and Songhang Chen, "Urban Traffic Management System Based on Ontology and Multiagent System", State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
What is a reinforcement learning problem, and why are controls important for it?

• Reinforcement learning is a problem in which an agent tries to learn how to maximize its reward by interacting with its environment.
• Decisions, also known as controls, are important in reinforcement learning problems because the agent must learn which control in a given state is associated with the greatest reward.
• Many different reinforcement learning algorithms help discover the best controls (those that lead to the greatest rewards).
• Individual weaknesses:
• Reinforcement learning algorithms can encourage exploration early in an agent's training to avoid getting stuck in local optima or suboptimal policies.
• In addition, reinforcement learning control algorithms can help an agent recover from mistakes and return to the task at hand.
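One common way to encourage exploration early in training and shift toward exploitation later is to decay ε over time. The linear schedule below is an illustrative assumption (the `start`, `end` and `decay_steps` parameters are hypothetical), not a method prescribed in these notes:

```python
def linear_epsilon(step, start=1.0, end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps,
    then hold it at `end`. Early steps explore a lot; later steps mostly exploit."""
    if step >= decay_steps:
        return end
    fraction = step / decay_steps
    return start + fraction * (end - start)


for step in (0, 250, 500, 1000, 5000):
    print(step, round(linear_epsilon(step), 3))
```

Keeping a small floor (`end > 0`) means the agent never stops exploring entirely, which helps it notice if a previously poor action turns out to be better.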
Q-learning
• Q-learning is a model-free, off-policy reinforcement learning algorithm that finds the best action to take given the agent's current state. Depending on where the agent is in the environment, it decides the next action to be taken.
• To do this, it can act with an exploratory policy (for example, ε-greedy) while learning the values of the greedy policy, so it does not need to follow the policy it is learning.
• "Model-free" means the algorithm learns by interacting directly with the environment, without building a predictive model of how the environment works (which next state and reward each action produces); it relies on the rewards received through trial and error to discover optimal actions.
• "Off-policy" means the algorithm can learn from data generated by a different policy (the behavior policy) than the one it is currently trying to optimize (the target policy).
Using Q-learning, we can optimize an ad recommendation system to recommend products that are frequently bought together. The reward is given when the user clicks on the suggested product.
Activity
• Case study of Q-learning-based ad recommendation systems
• Sequence modelling: aviation industry
• Design of Q-learning
Important Terms in Q-Learning
States: The state, S, represents the current position of the agent in the environment.
Action: The action, A, is the step taken by the agent when it is in a particular state.
Rewards: For every action, the agent receives a positive or negative reward.
Episodes: An episode ends when the agent reaches a terminal state and can't take a new action.
Q-Values: Used to determine how good an action, A, taken in a particular state, S, is; written Q(S, A).
Temporal Difference: A formula used to update the Q-value of the previous state and action using the reward received and the estimated value of the current state and action.
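Putting these terms together, the standard one-step Q-learning update (the temporal-difference rule) can be written as follows; the learning rate α and discount factor γ are not defined in these notes and are introduced here:

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \Big[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \Big]
```

The bracketed quantity is the temporal-difference (TD) error: the gap between the new estimate R_{t+1} + γ max_a Q(S_{t+1}, a) and the old value Q(S_t, A_t). Taking the max over next actions, regardless of which action the agent actually takes next, is what makes Q-learning off-policy.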
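What is the Bellman Equation?
• The Bellman Equation expresses the value of a state in terms of the immediate reward and the discounted value of the states that follow, which tells us how good it is to be in that state (or to take a particular action in it). The optimal value of a state is the highest value obtainable from it.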
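When the environment's transitions and rewards are known, the Bellman optimality equation V*(s) = max_a [r(s, a) + γ V*(s')] can be solved by applying it repeatedly as an update (value iteration). The three-state chain, its rewards and γ = 0.9 below are a hypothetical example, not taken from these notes:

```python
def value_iteration(gamma=0.9, tol=1e-10):
    """Solve V*(s) = max_a [r(s, a) + gamma * V*(s')] on a tiny deterministic chain.

    States 0 - 1 - 2; state 2 is the terminal goal with V = 0.
    Actions move one step left or right (clipped at state 0).
    Reaching state 2 yields reward +1; every other move yields 0.
    """
    n_states, goal = 3, 2
    v = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            if s == goal:
                continue  # terminal state keeps value 0
            candidates = []
            for move in (-1, +1):
                s_next = min(max(s + move, 0), goal)
                reward = 1.0 if s_next == goal else 0.0
                candidates.append(reward + gamma * v[s_next])  # one Bellman backup
            new_v = max(candidates)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


print(value_iteration())  # → [0.9, 1.0, 0.0]
```

The state next to the goal is worth 1, and each extra step away multiplies the value by γ, so states closer to the goal have higher values.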
How to Make a Q-Table?
• While running our algorithm, the agent will try various paths through the environment. How do we find the best among them? We record our findings in a table called a Q-Table.
• A Q-Table stores a value for every state-action pair, which helps us find the best action in each state. At each step we use the Bellman Equation to estimate the expected future reward of the action taken and store it in the table, so that the actions available in a state can be compared.
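A minimal tabular Q-learning sketch that builds such a Q-table. The 1-D corridor environment, its +1 goal reward, and the values of α, γ, ε, the episode count and the seed are all hypothetical choices for illustration:

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States 0..n_states-1; the agent starts at 0 and the episode ends at the
    rightmost state (the goal), which gives reward +1. Actions: 0 = left,
    1 = right. Returns the Q-table as a list of [Q(s, left), Q(s, right)].
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q_table = [[0.0, 0.0] for _ in range(n_states)]

    def choose(state):
        # epsilon-greedy behaviour policy with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q_table[state])
        return rng.choice([a for a in (0, 1) if q_table[state][a] == best])

    for _ in range(episodes):
        state = 0
        while state != goal:
            action = choose(state)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Off-policy target: greedy value of the next state (0 at the goal)
            future = 0.0 if next_state == goal else max(q_table[next_state])
            td_error = reward + gamma * future - q_table[state][action]
            q_table[state][action] += alpha * td_error
            state = next_state
    return q_table


table = q_learning_corridor()
greedy = ["left" if row[0] > row[1] else "right" for row in table[:-1]]
print(greedy)
```

Reading the best action off each row of the table gives the learned policy; here it should point right (toward the goal) in every non-terminal state.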
AWS RoboMaker
[Link]
• Numerical on Q-learning
• Q-learning (start state, policy, Q-table and Q-function, goal state)

• Step 1: Decide the policy π, with action values Qπ(s_t, a_t), where s_t and a_t represent the state and action at time t.

• Step 2: The agent uses the policy and, based on its observation, selects the best action.

• Step 3: The environment returns an immediate reward: positive points for correct actions and negative points for bad actions.

• Step 4: Q-learning relies on the Q-function and the Q-table.

• Step 5: The Q-function is the expected discounted return, which the Bellman equation expresses recursively (generalized form for a policy π), where γ ∈ [0, 1] is the discount factor:

• $Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots + \gamma^{n-1} R_{t+n} \mid s_t, a_t\right]$

• Step 6: Update and maintain the Q-table.

• Step 7: Repeat from Step 1 (starting again from the initial state) until the agent reaches the goal state.
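Step 5 treats the Q-value as an expected discounted return. For a single observed sequence of rewards, the quantity inside the expectation can be computed as in this worked numerical sketch (the rewards 1, 0, 2 and γ = 0.9 are made-up values):

```python
def discounted_return(rewards, gamma):
    """Return R_{t+1} + gamma*R_{t+2} + ... + gamma^(n-1)*R_{t+n}.

    `rewards` lists R_{t+1}, ..., R_{t+n} in the order they are received.
    """
    g = 0.0
    # Work backwards: G_k = R_k + gamma * G_{k+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Worked example: rewards 1, 0, 2 with gamma = 0.9
# 1 + 0.9*0 + 0.81*2 = 2.62
print(discounted_return([1, 0, 2], gamma=0.9))
```

Summing backwards is the same recursion the Bellman equation uses: the return from one step equals the reward now plus γ times the return from the next step.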
Class Activity (17/02/2025)
1. Implement the grid-world agent navigation problem using
❖ Q-learning and SARSA
❖ Compare both results
SARSA Reinforcement Learning
• The SARSA algorithm is a slight variation of the popular Q-learning algorithm. For a learning agent in any reinforcement learning algorithm, its policy can be of two types:

1. On-policy: the learning agent learns the value function according to the current action derived from the policy currently being used.
2. Off-policy: the learning agent learns the value function according to the action derived from another policy.
• Q-learning is an off-policy technique and uses the greedy approach (the maximum Q-value of the next state) to learn the Q-value.
• SARSA, on the other hand, is an on-policy technique and uses the action performed by the current policy to learn the Q-value.
• SARSA (State-Action-Reward-State-Action) is a reinforcement learning algorithm used to learn a policy for an agent interacting with an environment.
• It is an on-policy algorithm, which means that it learns the value function and the policy based on the actions actually taken by the agent.

• The SARSA algorithm works by maintaining a table of action-value estimates Q(s, a), where s is the state and a is the action taken by the agent in that state.
• The table is initialized to some arbitrary values, and the agent uses an epsilon-greedy policy to select actions.
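The on-policy/off-policy difference comes down to the target each method uses for a single transition. A minimal sketch comparing the two; the transition values (reward 0, next-state Q-values [0.2, 0.5, 1.0], γ = 0.9, explored next action 0) are hypothetical:

```python
def q_learning_target(reward, gamma, q_next_row):
    """Off-policy target: uses the best action value in the next state."""
    return reward + gamma * max(q_next_row)


def sarsa_target(reward, gamma, q_next_row, next_action):
    """On-policy target: uses the value of the action the policy actually chose next."""
    return reward + gamma * q_next_row[next_action]


# Hypothetical transition: reward 0, Q-values of the next state for actions [0, 1, 2],
# and the epsilon-greedy policy happened to explore action 0 next.
q_next = [0.2, 0.5, 1.0]
print(q_learning_target(0.0, 0.9, q_next))   # 0.9 * 1.0
print(sarsa_target(0.0, 0.9, q_next, 0))     # 0.9 * 0.2
```

When the policy explores, SARSA's target reflects the cost of that exploratory action, whereas Q-learning ignores it and assumes the best action will be taken.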
SARSA algorithm
• SARSA algorithm: FrozenLake-v0
• Reinforcement Learning in Python: Implementing SARSA Agent
in Taxi Environment
• [Link]
mplementing-sarsa-agent-in-taxi-environment/?ref=asr2
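A minimal tabular SARSA sketch, useful as a starting point for the Q-learning vs. SARSA comparison activity. It uses a hypothetical 1-D corridor (start at state 0, +1 reward at the rightmost state) instead of FrozenLake or Taxi so that it needs no external packages; α, γ, ε, the episode count and the seed are assumed values:

```python
import random


def sarsa_corridor(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                   epsilon=0.1, seed=0):
    """Tabular SARSA on a hypothetical 1-D corridor: start at state 0,
    goal at the rightmost state (+1 reward). Actions: 0 = left, 1 = right.
    Returns the Q-table as a list of [Q(s, left), Q(s, right)].
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    def choose(state):
        # epsilon-greedy policy: the same policy is used to act AND to learn
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q[state])
        return rng.choice([a for a in (0, 1) if q[state][a] == best])

    for _ in range(episodes):
        state = 0
        action = choose(state)                            # S, A
        while state != goal:
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0   # R
            if next_state == goal:
                target = reward                           # terminal: no future value
                next_action = None
            else:
                next_action = choose(next_state)          # S', A' from the same policy
                target = reward + gamma * q[next_state][next_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
    return q


table = sarsa_corridor()
print(["left" if row[0] > row[1] else "right" for row in table[:-1]])
```

Unlike Q-learning, the next action A' is chosen before the update and is then actually executed, so the learned values are those of the ε-greedy policy itself.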
