Introduction to Reinforcement Learning

Reinforcement Learning (RL) is a machine learning method where an agent learns to make decisions through interactions with its environment, receiving feedback in the form of rewards or penalties. Key components of RL include the agent, environment, actions, states, rewards, and policies, with applications in areas like gaming and self-driving cars. The Markov Decision Process (MDP) formalizes RL problems, and Q-learning is a widely used algorithm for learning optimal policies.


Introduction to Reinforcement Learning

Presenter: Sepideh Nikookar
Advisor: Prof. Senjuti Basu Roy

Department of Computer Science, NJIT

Dec 1, 2022
What is Reinforcement Learning?

• Reinforcement learning is a type of machine learning in which an intelligent agent interacts with an environment and learns how to act within it.

• For each good action, the agent gets positive feedback (a reward), and for each bad action, it gets negative feedback (a penalty).

• RL addresses problems where decision making is sequential and the goal is long-term.

• The agent repeatedly does three things:
  • takes an action,
  • changes state or remains in the same state,
  • gets feedback.

By repeating these steps, the agent explores the environment and learns from it.
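The three-step loop can be sketched in Python. This is only an illustration: the `LineWorld` environment, its reward values, and the random action choice are hypothetical and do not come from the slides.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4 on a line, goal at 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); at a wall the agent stays in place
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        # positive feedback at the goal, a small penalty for every other step
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, max_steps=1000, seed=0):
    """Run one episode with a purely random agent (no learning yet)."""
    rng = random.Random(seed)
    state, total_reward = env.state, 0.0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])            # 1. take an action
        state, reward, done = env.step(action)  # 2. change state or remain in the same state
        total_reward += reward                  # 3. get feedback
        if done:
            break
    return state, total_reward
```

A learning agent would replace the random choice with a policy that it improves using the feedback it collects.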


Reinforcement Learning Example

• An example of a state is your cat sitting; you then use a specific word to prompt the cat to walk.
• Your cat reacts by performing an action, transitioning from one “state” (sitting) to another “state” (walking).
• The cat's reaction is the action, and the policy is the method of selecting an action given a state, in expectation of a better outcome.
• After the transition, the cat may receive a reward or a penalty in return.
Reinforcement Learning Applications

Supervised vs. Unsupervised vs. Reinforcement Learning

Criteria         | Supervised ML | Unsupervised ML | Reinforcement ML
Definition       | Learns by using labelled data | Trained using unlabelled data without any guidance | Works by interacting with the environment
Type of problems | Regression and classification | Association and clustering | Exploitation or exploration
Algorithms       | Linear Regression, Logistic Regression, SVM, KNN, etc. | K-Means, C-Means, Apriori | Q-Learning, SARSA
Aim              | Calculate outcomes | Discover underlying patterns | Learn a series of actions
Application      | Risk Evaluation, Sales Forecasting | Recommendation Systems, Anomaly Detection | Self-Driving Cars, Gaming, Healthcare
Terms Used in Reinforcement Learning

Agent: An entity that can perceive and explore the environment and act upon it.
Environment: The situation in which the agent is present or by which it is surrounded.
Action: A move taken by the agent within the environment.
State: The situation returned by the environment after each action taken by the agent.
Reward: Feedback returned to the agent by the environment to evaluate the agent's action.
Policy: The strategy the agent applies to choose the next action based on the current state.
Value: The expected long-term return with the discount factor applied, as opposed to the short-term reward.
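To make the difference between a short-term reward and a discounted long-term return concrete, here is a minimal sketch. The function name `discounted_return` and the default discount factor of 0.9 are assumptions chosen for illustration, not values from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting a reward k steps ahead by gamma**k."""
    g = 0.0
    # Work backwards so each step applies one more factor of gamma
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

With a discount factor below 1, rewards further in the future count for less, so the value of a state depends on the whole sequence of rewards rather than only the next one.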
Elements of Reinforcement Learning
There are four main elements of Reinforcement Learning:

1. Policy: A policy defines how the agent behaves at a given time. It maps the perceived states of the environment to the actions to be taken in those states.

The policy-based approach has two main types of policy:

• Deterministic: For a given state, the policy (π) always produces the same action.
• Stochastic: The policy gives a probability distribution over actions, and the produced action is sampled from it.

2. Reward Signal: The goal of RL is defined by the reward signal. Reward signals are given according to the good and bad actions taken by the learning agent. The main objective is to maximize the total reward received over time.

3. Value Function: Indicates how good a state and action are in terms of how much future reward the agent can expect.

4. Model: Mimics the behavior of the environment.
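The two policy types listed under element 1 can be contrasted with a small sketch. The state names, action names, and probabilities below are hypothetical, loosely following the cat example from earlier in the deck.

```python
import random

# Deterministic policy: each state maps to exactly one action (hypothetical table)
deterministic_policy = {"sitting": "walk", "walking": "sit"}

# Stochastic policy: each state maps to a probability distribution over actions
stochastic_policy = {
    "sitting": {"walk": 0.8, "sit": 0.2},
    "walking": {"walk": 0.5, "sit": 0.5},
}


def act_deterministic(state):
    """Always return the same action for the same state."""
    return deterministic_policy[state]


def act_stochastic(state, rng):
    """Sample an action according to the probabilities for this state."""
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
print(act_deterministic("sitting"))    # walk, every time
print(act_stochastic("sitting", rng))  # "walk" about 80% of the time
```

A stochastic policy lets the agent keep trying actions it currently rates lower, which is one simple way to keep exploring.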


Reinforcement Learning Categories
• Value Based
  • No Policy
  • Value Function

• Policy Based
  • Policy
  • No Value Function

• Actor Critic
  • Policy
  • Value Function

• Model Free
  • Policy and/or Value Function
  • No Model

• Model Based
  • Policy and/or Value Function
  • Model
State Representation

We can represent the agent state using a Markov state, which contains all the required information from the history. A state S_t is a Markov state if it satisfies the following condition:

$$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]$$

The Markov state follows the Markov property, which says that the future is independent of the past given the present.

In this setting, RL works on fully observable environments, where the agent can observe the state of the environment and act on it. The complete process is known as a Markov Decision Process.
Markov Decision Process

• A Markov Decision Process (MDP) is used to formalize Reinforcement Learning problems.

An MDP is a tuple of four elements (S, A, R_a, P_a):

• A finite set of states S
• A finite set of actions A
• Rewards R_a(s, s') received after transitioning from state s to state s' due to action a
• Transition probabilities P_a(s, s') of moving from state s to state s' under action a
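One way to hold the four MDP elements in code is a small container class. This is a sketch under assumptions: the `MDP` class, its field layout, and the toy cat numbers are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: set          # finite set of states
    actions: set         # finite set of actions
    rewards: dict        # (state, action, next_state) -> reward
    transitions: dict    # (state, action) -> {next_state: probability}

    def is_valid(self, tol=1e-9):
        """Check that the outgoing probabilities of every (state, action) pair sum to 1."""
        for s in self.states:
            for a in self.actions:
                probs = self.transitions.get((s, a), {})
                if abs(sum(probs.values()) - 1.0) > tol:
                    return False
        return True


# Hypothetical two-state example: calling the cat usually makes it walk
cat_mdp = MDP(
    states={"sitting", "walking"},
    actions={"call", "wait"},
    rewards={("sitting", "call", "walking"): 1.0},
    transitions={
        ("sitting", "call"): {"walking": 0.8, "sitting": 0.2},
        ("sitting", "wait"): {"sitting": 1.0},
        ("walking", "call"): {"walking": 1.0},
        ("walking", "wait"): {"walking": 0.5, "sitting": 0.5},
    },
)
print(cat_mdp.is_valid())  # True
```

Keeping rewards and transition probabilities in separate tables mirrors the tuple definition. A model-free method such as Q-learning never reads these tables directly and only sees sampled transitions.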

Reinforcement Learning Algorithms
• RL algorithms are mainly used in AI and gaming applications. A widely used algorithm is:

Q-Learning: Q-learning is a popular model-free Reinforcement Learning algorithm based on the Bellman equation.

The main objective of Q-learning is to learn a policy that tells the agent which action to take in which circumstances to maximize the reward.

It is an off-policy RL algorithm that attempts to find the best action to take in the current state.

$$Q^{new}(s_t, a_t) = Q(s_t, a_t) + \alpha \times \left( r_t + \gamma \times \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)$$
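The Q-learning update rule translates almost directly into code. The sketch below assumes a tabular Q stored in a dictionary keyed by (state, action); the function name `q_update` and the default values for the learning rate `alpha` and discount factor `gamma` are illustrative choices, not values from the slides.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to Q[(s, a)] and return the new value."""
    # Best estimated value obtainable from the next state (the max over actions)
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Target: observed reward plus the discounted best next value
    td_target = r + gamma * best_next
    # Move the old estimate a fraction alpha toward the target
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
```

Because the target uses the maximum over next actions rather than the action the agent actually takes next, the update is off-policy. This sketch does not treat terminal states specially; a fuller implementation would typically drop the `best_next` term when `s_next` ends the episode.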


Please open your Jupyter Notebook for the hands-on experience.
