Introduction to Reinforcement Learning
Presenter:
Sepideh Nikookar
Advisor:
Prof. Senjuti Basu Roy
Department of Computer Science
NJIT
Dec 1, 2022
What is Reinforcement Learning?
• Reinforcement learning is a type of machine
learning method in which an intelligent agent interacts
with an environment and learns how to act within it.
• For each good action, the agent gets positive
feedback, and for each bad action, the agent gets
negative feedback or a penalty.
• RL solves a specific type of problem where decision
making is sequential, and the goal is long-term.
• The agent repeatedly does the following three things:
  • takes an action,
  • changes state or remains in the same state,
  • gets feedback.
By repeating these steps, the agent explores the environment and learns from it.
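The take-action, change-state, get-feedback loop can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical `LineWorld` environment (positions 0 to 4, goal at 4, +1 for reaching the goal, -0.1 for every other step) and an agent that simply acts at random; `LineWorld` and `run_episode` are illustrative names, not a library API.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4, the goal is position 4."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the position
        self.state = min(max(self.state + action, 0), self.size - 1)
        done = self.state == self.size - 1
        # feedback: +1 for reaching the goal, a small penalty for every other step
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, max_steps=100, seed=0):
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = rng.choice([-1, 1])            # 1. take an action
        state, reward, done = env.step(action)  # 2. change state (or stay put at a wall)
        total_reward += reward                  # 3. get feedback
        if done:
            break
    return state, total_reward, t + 1
```

For example, `run_episode(LineWorld())` returns the final position, the total feedback collected, and the number of steps taken. A learning agent would replace the random `rng.choice` with a policy that improves from the feedback.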
Reinforcement Learning Example
• An example of a state could be your cat sitting, while you use a specific word to tell the cat to walk.
• Your cat reacts by performing an action that transitions it from one “state” (sitting) to another “state”
(walking).
• The cat's reaction is an action, and the policy is a method of selecting an action given a
state in expectation of better outcomes.
• After the transition, the cat may get a reward or a penalty in return.
Reinforcement Learning Applications
Supervised vs. Unsupervised vs. Reinforcement Learning
Criteria | Supervised ML | Unsupervised ML | Reinforcement ML
Definition | Learns by using labelled data | Trained using unlabelled data without any guidance | Works by interacting with the environment
Type of problems | Regression and classification | Association and clustering | Exploitation or exploration
Algorithms | Linear Regression, Logistic Regression, SVM, KNN, etc. | K-Means, C-Means, Apriori | Q-Learning, SARSA
Aim | Calculate outcomes | Discover underlying patterns | Learn a series of actions
Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Self-driving cars, gaming, healthcare
Terms Used in Reinforcement Learning
Term | Description
Agent | An entity that can perceive/explore the environment and act upon it.
Environment | The situation in which an agent is present or by which it is surrounded.
Action | The moves taken by an agent within the environment.
State | The situation returned by the environment after each action taken by the agent.
Reward | Feedback returned to the agent from the environment to evaluate the agent's action.
Policy | The strategy the agent applies to choose the next action based on the current state.
Value | The expected long-term return with the discount factor applied, as opposed to the short-term reward.
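The "Value" entry refers to the discounted return: future rewards are added up, with the reward k steps ahead weighted by γ^k, so G = r_1 + γ r_2 + γ² r_3 + …. A minimal sketch, assuming a finite list of rewards and a discount factor `gamma` between 0 and 1 (the function name is illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards in which the reward k steps ahead is weighted by gamma**k."""
    g = 0.0
    # iterate backwards so that each step computes G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma=0.9`, the sequence `[1, 1, 1]` is worth about 2.71, and `[0, 0, 10]` is worth about 8.1, which is more than an immediate reward of 1 followed by nothing. This is why a value that looks at the long term can prefer actions whose short-term reward is zero.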
Elements of Reinforcement Learning
There are four main elements of Reinforcement Learning:
1. Policy:
A policy defines how an agent behaves at a given time. It maps the perceived
states of the environment to the actions to take in those states.
There are two main types of policy:
• Deterministic: the policy (π) always produces the same action for a given state, a = π(s).
• Stochastic: the policy assigns a probability to each action, π(a | s), and the action is sampled according to those probabilities.
2. Reward Signal: The reward signal defines the goal of RL. Rewards are given
according to the good and bad actions taken by the learning agent. The main objective is to
maximize the total reward the agent collects over time.
3. Value Function: Estimates how good a state (or a state-action pair) is, that is, how
much total future reward an agent can expect from it.
4. Model: Mimics the behavior of the environment, letting the agent predict the next state and reward that an action will produce.
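The difference between the two policy types can be sketched in Python. This is a minimal sketch, assuming hypothetical cat states (`sitting`, `walking`), actions (`sit`, `walk`) and made-up probabilities; `deterministic_policy` and `stochastic_policy` are illustrative names, not a library API.

```python
import random

# Hypothetical deterministic policy: exactly one action per state, a = pi(s)
DETERMINISTIC_POLICY = {"sitting": "walk", "walking": "walk"}

# Hypothetical stochastic policy: a probability for each action in each state, pi(a | s)
STOCHASTIC_POLICY = {
    "sitting": {"sit": 0.2, "walk": 0.8},
    "walking": {"sit": 0.5, "walk": 0.5},
}


def deterministic_policy(state):
    # the same state always yields the same action
    return DETERMINISTIC_POLICY[state]


def stochastic_policy(state, rng=random):
    # sample an action according to the probabilities listed for this state
    probs = STOCHASTIC_POLICY[state]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
```

Calling `deterministic_policy("sitting")` always returns `"walk"`, while repeated calls to `stochastic_policy("sitting")` return `"walk"` roughly 80% of the time and `"sit"` otherwise.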
Reinforcement Learning Categories
• Value Based
  • No explicit policy (the policy is implicit in the value function)
  • Value function
• Policy Based
  • Policy
  • No value function
• Actor Critic
  • Policy
  • Value function
• Model Free
  • Policy and/or value function
  • No model
• Model Based
  • Policy and/or value function
  • Model
State Representation
We can represent the agent state using the Markov state, which contains all the
required information from the history. A state S_t is a Markov state if it satisfies the
following condition:
$$ P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t] $$
The Markov state follows the Markov property, which says that the future is
independent of the past given the present: the current state alone determines the distribution of the next state.
RL as described here assumes a fully observable environment, where the agent can observe the
environment state directly and act on it to reach a new state. This process is formalized as a Markov
Decision Process.
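Because of the Markov property, a simulator only needs the current state to sample the next one. A minimal sketch in Python, assuming a hypothetical two-state weather chain whose transition probabilities are made up for illustration (`TRANSITIONS`, `next_state` and `sample_chain` are illustrative names):

```python
import random

# Hypothetical transition probabilities P[S_{t+1} | S_t]; each row sums to 1
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def next_state(current, rng):
    # only the current state is consulted; the earlier history is never used
    row = TRANSITIONS[current]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


def sample_chain(start, steps, seed=0):
    rng = random.Random(seed)
    history = [start]
    for _ in range(steps):
        history.append(next_state(history[-1], rng))
    return history
```

Even though `sample_chain` keeps the whole `history`, `next_state` looks only at `history[-1]`, which is exactly what the Markov property permits.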
Markov Decision Process
• Markov Decision Process, or MDP, is used to formalize Reinforcement Learning problems.
An MDP is a tuple of four elements (S, A, R_a, P_a):
• S, a finite set of states
• A, a finite set of actions
• R_a(s, s'), the reward received after transitioning from state s to state s' due to action a
• P_a(s, s'), the probability that taking action a in state s leads to state s'.
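The four elements of an MDP can be held in a small data structure. This is a minimal sketch, assuming a hypothetical `MDP` class and a made-up two-state cat example (states `sitting`/`walking`, actions `sit`/`walk`). Transition probabilities P_a(s, s') are stored per (state, action) pair, rewards R_a(s, s') per transition, and missing rewards are treated as 0.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MDP:
    states: list   # S: finite set of states
    actions: list  # A: finite set of actions
    P: dict        # P[(s, a)] = {s_next: probability that action a in state s leads to s_next}
    R: dict = field(default_factory=dict)  # R[(s, a, s_next)] = reward; missing entries mean 0

    def reward(self, s, a, s_next):
        return self.R.get((s, a, s_next), 0.0)

    def expected_reward(self, s, a):
        # immediate reward of taking a in s, averaged over the possible next states
        return sum(p * self.reward(s, a, s2) for s2, p in self.P[(s, a)].items())

    def step(self, s, a, rng):
        # sample the next state from P_a(s, .) and return it together with its reward
        row = self.P[(s, a)]
        s_next = rng.choices(list(row), weights=list(row.values()), k=1)[0]
        return s_next, self.reward(s, a, s_next)


# Hypothetical cat MDP with made-up numbers
cat_mdp = MDP(
    states=["sitting", "walking"],
    actions=["sit", "walk"],
    P={
        ("sitting", "walk"): {"walking": 0.8, "sitting": 0.2},
        ("sitting", "sit"): {"sitting": 1.0},
        ("walking", "walk"): {"walking": 1.0},
        ("walking", "sit"): {"sitting": 1.0},
    },
    R={("sitting", "walk", "walking"): 1.0},  # a treat for walking on command
)
```

Here `cat_mdp.expected_reward("sitting", "walk")` is 0.8 × 1.0 + 0.2 × 0 = 0.8, and `cat_mdp.step("sitting", "walk", random.Random())` samples one transition.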
Reinforcement Learning Algorithms
• RL algorithms are widely used in AI and gaming applications. One of the most widely
used algorithms is:
Q-Learning: Q-learning is a popular model-free Reinforcement
Learning algorithm based on the Bellman
equation.
The main objective of Q-learning is to learn a
policy that tells the agent which action to take
in each circumstance to maximize the reward.
It is an off-policy RL algorithm: it learns the value of the best
(greedy) action in the current state, even while the agent
behaves according to a different, exploratory policy.
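$$ Q^{new}(s_t, a_t) = Q(s_t, a_t) + \alpha \times \left( r_t + \gamma \times \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) $$
where α is the learning rate and γ is the discount factor.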
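The Q-learning update can be turned into a small tabular learner. This is a minimal sketch, assuming a hypothetical five-cell corridor (start at 0, goal at 4, +1 at the goal, -0.1 per other step), an ε-greedy behaviour policy (an assumption; the slides do not specify how the agent explores), and the common convention that a terminal state's value is 0. All names are illustrative.

```python
import random

# Hypothetical corridor: positions 0..4, the goal is position 4
N_STATES = 5
ACTIONS = [-1, 1]  # move left, move right


def step(state, action):
    """Deterministic toy environment: walls clamp the position."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _t in range(max_steps):
            # epsilon-greedy behaviour policy (assumed): explore with probability epsilon
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s_next, r, done = step(s, a)
            # off-policy target: greedy value of s_next, taken as 0 at the terminal goal
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
            if done:
                break
    return Q


def greedy_policy(Q):
    # best learned action for every non-terminal state
    return {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
```

After training, `greedy_policy(q_learning())` should choose "move right" in every cell. As a single worked update with α = 0.1 and γ = 0.9: if Q(s_t, a_t) = 0.5, r_t = 1 and max_a Q(s_{t+1}, a) = 0.8, the new value is 0.5 + 0.1 × (1 + 0.72 - 0.5) = 0.622.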
Please open your Jupyter Notebook
for the hands-on experience.