Reinforcement Learning and Deep RL Python Theory and Projects - Off Policy Versus On Policy

Assessment

Interactive Video

Information Technology (IT), Architecture, Social Studies

University

Practice Problem

Hard

Created by

Wayground Content

FREE Resource

The video tutorial introduces two key terms in reinforcement learning: off-policy and on-policy. Q-learning is off-policy: the agent learns the value function of one policy (the greedy target policy) while choosing its actions with another (the behavior policy, for example an epsilon-greedy one). Sarsa is on-policy: it learns the value of the same policy it uses to choose actions. The tutorial briefly touches on the update equations for both methods but focuses on explaining the difference through Python code. The main distinction lies in the update target: Q-learning uses the maximum action value available in the new state, whereas Sarsa uses the value of the action its policy actually selects in the new state.
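
The difference between the two update targets can be sketched in a few lines of Python. This is a minimal illustration and not the code from the video: it assumes a tabular Q-function stored as a NumPy array indexed by `[state, action]`, and the function names, the learning rate `alpha`, and the discount factor `gamma` are all hypothetical choices made for this example.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the largest Q-value in the new state,
    regardless of which action the agent will actually take there."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the Q-value of the action a_next
    that the current policy actually selected in the new state."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy table (assumed values): 2 states, 2 actions.
Q_demo = np.zeros((2, 2))
Q_demo[1] = [1.0, 5.0]  # action values in the new state

# Same transition (s=0, a=0, r=1, s_next=1) under both rules.
q_off = q_learning_update(Q_demo.copy(), 0, 0, 1.0, 1, alpha=0.5, gamma=0.5)
q_on = sarsa_update(Q_demo.copy(), 0, 0, 1.0, 1, a_next=0, alpha=0.5, gamma=0.5)

print(q_off[0, 0])  # 1.75: target used max(1.0, 5.0) = 5.0
print(q_on[0, 0])   # 0.75: target used Q[1, 0] = 1.0, the action actually chosen
```

The two functions differ only in the line that builds `target`. If the policy happens to pick the greedy action in the new state, both rules produce the same update.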

3 questions

1.

OPEN ENDED QUESTION

3 mins • 1 pt

How does the Q-learning update equation differ from the Sarsa update equation?

2.

OPEN ENDED QUESTION

3 mins • 1 pt

In the context of Q-learning, what is meant by the 'maximum value of the new state'?

3.

OPEN ENDED QUESTION

3 mins • 1 pt

What steps does Sarsa take to determine the value of a new action?
