Reinforcement Learning Overview and Methods

Uploaded by

pivam12168

Previous Lectures

• Supervised learning
– classification, regression

• Unsupervised learning
– clustering

• Reinforcement learning
– more general than supervised/unsupervised learning
– learn from interaction w/ environment to achieve a goal

[Figure: agent-environment loop. The agent sends an action to the environment; the environment returns a reward and a new state.]

Slides from Peter Bodík RAD Lab, UC Berkeley

Recall
• examples

• defining an RL problem
– Markov Decision Processes

• solving an RL problem
– Dynamic Programming
– Monte Carlo methods
– Temporal-Difference learning
– Policy Gradient Methods
Robot in a room
[Figure: grid world with a START cell, a +1 cell and a -1 cell; the robot chooses UP]
• actions: UP, DOWN, LEFT, RIGHT
• outcome of UP: 80% move UP, 10% move LEFT, 10% move RIGHT

• reward +1 at [4,3], -1 at [4,2]

• reward -0.04 for each step

• what’s the strategy to achieve max reward?

• what if the actions were deterministic?
Other examples
• pole-balancing
• TD-Gammon [Gerry Tesauro]
• helicopter [Andrew Ng]

• no teacher who would say “good” or “bad”

– is reward “10” good or bad?
– rewards could be delayed

• similar to control theory

– more general, fewer constraints

• explore the environment and learn from experience

– not just blind search, try to be smart about it
Robot in a room
[Figure: grid world with a START cell, a +1 cell and a -1 cell; the robot chooses UP]
• actions: UP, DOWN, LEFT, RIGHT
• outcome of UP: 80% move UP, 10% move LEFT, 10% move RIGHT
• reward +1 at [4,3], -1 at [4,2]
• reward -0.04 for each step

• states
• actions
• rewards

• what is the solution?

Is this a solution?
[Figure: robot-in-a-room grid with the +1 and -1 cells]

• only if actions deterministic

– not in this case (actions are stochastic)

• solution/policy
– mapping from each state to an action
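Optimal policy
[Figure: optimal policy on the grid with the +1 and -1 cells]

[Figures: one optimal-policy grid for each of the following step rewards]
• reward for each step: -2
• reward for each step: -0.1
• reward for each step: -0.04
• reward for each step: -0.01
• reward for each step: +0.01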
Markov Decision Process (MDP)
[Figure: agent-environment loop (action, reward, new state)]
• set of states S, set of actions A, initial state S0
• transition model P(s,a,s’)
– P( [1,1], up, [1,2] ) = 0.8
• reward function r(s)
– r( [4,3] ) = +1
• goal: maximize cumulative reward in the long run

• policy: mapping from S to A

– (s) or (s,a) (deterministic vs. stochastic)

• reinforcement learning
– transitions and rewards usually not available
– how to change the policy based on experience
– how to explore the environment
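
The robot-in-a-room example can be written down as an explicit MDP. The following is a minimal sketch, not the slides' own code: the 4x3 grid size, the blocked cell at [2,2], and every function and variable name are assumptions made for illustration (the slides only give the terminal cells, the step reward, and the 80/10/10 action noise). The LEFT/RIGHT slips are interpreted as 90 degrees to either side of the intended direction.

```python
# Sketch of the robot-in-a-room MDP. Grid size, wall position and all names
# are illustrative assumptions, not taken from the slides.
COLS, ROWS = 4, 3                      # assumed 4x3 grid, cells are (x, y)
WALL = {(2, 2)}                        # assumed blocked cell
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STEP_REWARD = -0.04

MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Directions 90 degrees to either side of each intended action.
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("RIGHT", "LEFT"),
         "LEFT": ("DOWN", "UP"), "RIGHT": ("UP", "DOWN")}

STATES = [(x, y) for x in range(1, COLS + 1) for y in range(1, ROWS + 1)
          if (x, y) not in WALL]


def reward(s):
    """r(s): +1 / -1 in the terminal cells, step reward elsewhere."""
    return TERMINALS.get(s, STEP_REWARD)


def move(s, direction):
    """Deterministic move; bumping into the wall or the border keeps s."""
    dx, dy = MOVES[direction]
    nxt = (s[0] + dx, s[1] + dy)
    return nxt if nxt in STATES else s


def transition(s, a):
    """P(s, a, .) as a dict {s': probability}."""
    if s in TERMINALS:
        return {}                      # the episode ends in terminal cells
    probs = {}
    side_a, side_b = SLIPS[a]
    for direction, p in ((a, 0.8), (side_a, 0.1), (side_b, 0.1)):
        nxt = move(s, direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


# Reproduces the slide's example P([1,1], up, [1,2]) = 0.8.
print(transition((1, 1), "UP"))
```

A deterministic policy for this sketch is then just a dict from each non-terminal state to one of the four action names.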
Computing return from rewards
• episodic (vs. continuing) tasks
– “game over” after N steps
– optimal policy depends on N; harder to analyze

• additive rewards
– V(s0, s1, …) = r(s0) + r(s1) + r(s2) + …
– infinite value for continuing tasks

• discounted rewards
– V(s0, s1, …) = r(s0) + γ*r(s1) + γ^2*r(s2) + …
– value bounded if rewards bounded
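
As a small illustration of the two definitions above, the sketch below computes the return of a finite reward sequence. The function name and the example trajectory are hypothetical; setting gamma to 1 gives the additive case.

```python
def discounted_return(rewards, gamma):
    """r(s0) + gamma*r(s1) + gamma^2*r(s2) + ... for a finite reward list."""
    total = 0.0
    for r in reversed(rewards):        # work backwards: G = r + gamma * G
        total = r + gamma * total
    return total


rewards = [-0.04, -0.04, -0.04, 1.0]   # hypothetical trajectory of rewards
print(discounted_return(rewards, gamma=1.0))   # additive rewards
print(discounted_return(rewards, gamma=0.9))   # discounted rewards
```

The boundedness claim follows from the geometric series: if every |r(s)| is at most R_max and 0 ≤ γ < 1, the discounted sum is at most R_max / (1 - γ) in absolute value, even for an infinite sequence.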
Value functions
[Figure: backup diagram s → a → r → s’]
• state value function: V^π(s)
– expected return when starting in s and following π

• state-action value function: Q^π(s,a)
– expected return when starting in s, performing a, and following π

• useful for finding the optimal policy
– can estimate from experience
– pick the best action using Q^π(s,a)

• Bellman equation
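
Written in the notation of the MDP slide (state reward r(s), transition model P(s,a,s’), discount γ), the Bellman equation for a deterministic policy π can be sketched as follows. This is the standard textbook form under those assumptions, not necessarily the exact form shown on the original slide.

```latex
V^{\pi}(s) = r(s) + \gamma \sum_{s'} P\big(s, \pi(s), s'\big)\, V^{\pi}(s')

Q^{\pi}(s, a) = r(s) + \gamma \sum_{s'} P(s, a, s')\, V^{\pi}(s'),
\qquad V^{\pi}(s) = Q^{\pi}\big(s, \pi(s)\big)
```

Each equation expresses the value of s through the values of its possible successors s’, which is the relationship a backup diagram depicts.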
Optimal value functions
• there’s a set of optimal policies Backup Diagrams:
– V defines partial ordering on policies See Textbook from
Richard S. Sutton
– they share the same optimal value function
Chapter 3: Finite
Markov Decision
Processes
• Bellman optimality equation Page 48

– system of n non-linear equations a

– solve for V*(s) r

– easy to extract the optimal policy
s’

• having Q*(s,a) makes it even simpler
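
In the same notation as the MDP slide (r(s), P(s,a,s’), discount γ), the Bellman optimality equation and the policy extraction step can be sketched as below. This is the standard form under those assumptions rather than a transcription of the slide.

```latex
V^{*}(s) = r(s) + \gamma \max_{a} \sum_{s'} P(s, a, s')\, V^{*}(s')

\pi^{*}(s) = \arg\max_{a} \sum_{s'} P(s, a, s')\, V^{*}(s')

Q^{*}(s, a) = r(s) + \gamma \sum_{s'} P(s, a, s')\, \max_{a'} Q^{*}(s', a'),
\qquad \pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```

The max over actions is what makes the system non-linear. Extracting the policy from V* needs a one-step lookahead through the transition model, whereas with Q* the best action is read off directly, with no model required.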

Dynamic programming
• main idea
– use value functions to structure the search for good policies
– need a perfect model of the environment

• two main components

– policy evaluation: compute V from 
– policy improvement: improve  based on V

– start with an arbitrary policy

– repeat evaluation/improvement until convergence
Policy evaluation/improvement
• policy evaluation: π -> V^π
– Bellman eqn’s define a system of n eqn’s
– could solve, but will use iterative version

– start with an arbitrary value function V0, iterate until Vk converges

• policy improvement: V^π -> π’

– π’ either strictly better than π, or π’ is optimal (if π = π’)
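
The two steps can be sketched in code as follows. The MDP used here is a hypothetical 4-cell corridor chosen only to keep the example short (it is not from the slides), and the discount 0.9, the tolerance, and all names are assumptions. Evaluation uses the iterative version starting from V0 = 0; improvement acts greedily on a one-step lookahead.

```python
GAMMA = 0.9                                   # assumed discount factor
STATES = [0, 1, 2, 3]                         # hypothetical 4-cell corridor
ACTIONS = ["LEFT", "RIGHT"]
R = {0: -0.04, 1: -0.04, 2: -0.04, 3: 1.0}    # r(s); cell 3 is terminal
TERMINAL = {3}


def P(s, a):
    """P(s, a, .): intended move with prob 0.8, stay put with prob 0.2."""
    if s in TERMINAL:
        return {}
    target = max(s - 1, 0) if a == "LEFT" else s + 1
    if target == s:                           # bumping into the left end
        return {s: 1.0}
    return {target: 0.8, s: 0.2}


def q_value(s, a, V):
    """One-step lookahead: r(s) + gamma * sum_s' P(s,a,s') V(s')."""
    return R[s] + GAMMA * sum(p * V[s2] for s2, p in P(s, a).items())


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: start from V0 = 0, sweep until stable."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: R[s] if s in TERMINAL else q_value(s, policy[s], V)
                 for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V


def improve(V):
    """Policy improvement: act greedily with respect to V."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, V))
            for s in STATES if s not in TERMINAL}


policy = {s: "LEFT" for s in STATES if s not in TERMINAL}   # arbitrary start
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:                  # no change: policy is optimal
        break
    policy = new_policy
print(policy)
```

In this corridor the loop ends with every cell choosing RIGHT, since that is the only way to reach the rewarding terminal cell.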

Policy/Value iteration
• Policy iteration

– two nested iterations; too slow

– don’t need to converge to V^πk
• just move towards it

• Value iteration

– use Bellman optimality equation as an update

– converges to V*
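
A minimal sketch of value iteration on the robot-in-a-room example follows. The 4x3 grid, the blocked cell at (2, 2), the undiscounted setting γ = 1, the sweep cap, and all names are assumptions; the slides give only the terminal rewards, the -0.04 step reward, and the 80/10/10 action noise.

```python
# Value iteration: repeatedly apply the Bellman optimality equation as an
# update. Grid layout, GAMMA and names are illustrative assumptions.
GAMMA = 1.0
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STEP_REWARD = -0.04
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("RIGHT", "LEFT"),
         "LEFT": ("DOWN", "UP"), "RIGHT": ("UP", "DOWN")}


def move(s, d):
    """Deterministic move; hitting the wall or the border keeps s."""
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    return nxt if nxt in STATES else s


def expected_value(s, a, V):
    """sum_s' P(s,a,s') V(s'): 80% intended move, 10% slip to each side."""
    side_a, side_b = SLIPS[a]
    return (0.8 * V[move(s, a)] + 0.1 * V[move(s, side_a)]
            + 0.1 * V[move(s, side_b)])


def value_iteration(tol=1e-9, max_sweeps=10_000):
    """Sweep all states until the largest change drops below tol."""
    V = {s: TERMINALS.get(s, 0.0) for s in STATES}
    for _ in range(max_sweeps):
        new_V = {s: TERMINALS[s] if s in TERMINALS
                 else STEP_REWARD
                 + GAMMA * max(expected_value(s, a, V) for a in MOVES)
                 for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(V):
    """Extract a policy from V by one-step lookahead."""
    return {s: max(MOVES, key=lambda a: expected_value(s, a, V))
            for s in STATES if s not in TERMINALS}


V = value_iteration()
policy = greedy_policy(V)
for y in (3, 2, 1):                    # print the grid, top row first
    print(" ".join(f"{V.get((x, y), float('nan')):6.3f}" for x in range(1, 5)))
```

Changing STEP_REWARD and re-running is one way to explore how the optimal policy shifts with the step reward. With a positive step reward and γ = 1 the values grow without bound, so that case would need a discount below 1.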
Using DP
• need complete model of the environment and rewards
– robot in a room
• state space, action space, transition model

• can we use DP to solve

– robot in a room?
– backgammon?
– helicopter?
Self Study Example

Iterative policy evaluation:

See Textbook from Richard S. Sutton

CHAPTER 4. DYNAMIC PROGRAMMING

Exercise 4.1, 4.2

Page 61-67
Generalized Policy Iteration (GPI)
Basic Idea:

• GPI allows policy evaluation and policy improvement to interact in a less rigid manner than traditional policy iteration.
• Unlike strict policy iteration, where evaluation and improvement alternate in distinct phases, GPI allows for more flexible and interleaved interactions between these processes.
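
One simple way to picture this interleaving is to cut policy evaluation short: run only a few evaluation sweeps before each improvement step. The sketch below does this on a hypothetical 4-cell corridor MDP; the MDP, the discount 0.9, the parameter `eval_sweeps`, and all other names are assumptions for illustration, and this is just one instance of GPI, not a definition of it.

```python
GAMMA = 0.9                                   # assumed discount factor
STATES = [0, 1, 2, 3]                         # hypothetical 4-cell corridor
ACTIONS = ["LEFT", "RIGHT"]
R = {0: -0.04, 1: -0.04, 2: -0.04, 3: 1.0}    # r(s); cell 3 is terminal
TERMINAL = {3}


def P(s, a):
    """P(s, a, .): intended move with prob 0.8, stay put with prob 0.2."""
    if s in TERMINAL:
        return {}
    target = max(s - 1, 0) if a == "LEFT" else s + 1
    if target == s:
        return {s: 1.0}
    return {target: 0.8, s: 0.2}


def q_value(s, a, V):
    """r(s) + gamma * sum_s' P(s,a,s') V(s')."""
    return R[s] + GAMMA * sum(p * V[s2] for s2, p in P(s, a).items())


def gpi(eval_sweeps, max_rounds=1000, tol=1e-10):
    """Alternate `eval_sweeps` evaluation sweeps with one greedy improvement.

    eval_sweeps=1 amounts to value iteration (after the first round);
    a large eval_sweeps approaches classical policy iteration.
    """
    V = {s: (R[s] if s in TERMINAL else 0.0) for s in STATES}
    policy = {s: "LEFT" for s in STATES if s not in TERMINAL}
    for _round in range(max_rounds):
        old_V = dict(V)
        for _sweep in range(eval_sweeps):     # partial policy evaluation
            V = {s: (R[s] if s in TERMINAL else q_value(s, policy[s], V))
                 for s in STATES}
        new_policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V))
                      for s in policy}        # policy improvement
        stable = (new_policy == policy
                  and max(abs(V[s] - old_V[s]) for s in STATES) < tol)
        policy = new_policy
        if stable:
            break
    return policy, V


for k in (1, 3, 50):
    print(k, gpi(k)[0])
```

In this small example all three settings of `eval_sweeps` end at the same policy; they differ only in how the work is split between evaluation and improvement.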
Outline
• examples

• defining an RL problem
– Markov Decision Processes

• solving an RL problem
– Dynamic Programming
– Monte Carlo methods
– Temporal-Difference learning
