Reinforcement Learning and Deep RL Python Theory and Projects - Off Policy Versus On Policy

Assessment

Interactive Video

Information Technology (IT), Architecture, Social Studies

University

Hard

Created by

Quizizz Content

The video tutorial introduces two key terms in reinforcement learning: off-policy and on-policy. Q-learning is off-policy: it learns the value function of a policy (the greedy one) that can differ from the policy the agent actually follows while exploring. SARSA, in contrast, is on-policy: it learns the value of the same policy it uses to choose actions. The tutorial briefly touches on the mathematical equations for both methods but focuses on explaining the differences through Python code. The main distinction lies in the update target: Q-learning uses the maximum action value available in the new state, while SARSA uses the value of the action its current policy actually selects in the new state.
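
The distinction described above can be sketched as two update functions. This is a minimal illustration, not the tutorial's own code: the function names, the list-of-lists Q-table layout (`Q[state][action]`), and the default `alpha` (learning rate) and `gamma` (discount factor) values are all assumptions made for this example.

```python
# Hypothetical sketch: tabular Q-learning vs. SARSA updates.
# Q is assumed to be a list of lists indexed as Q[state][action].

def q_learning_update(Q, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: the target uses the maximum value in next_state."""
    best_next = max(Q[next_state])  # greedy value, whatever action is taken next
    target = reward + gamma * best_next
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


def sarsa_update(Q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """On-policy update: the target uses the action the policy actually chose."""
    target = reward + gamma * Q[next_state][next_action]
    Q[state][action] += alpha * (target - Q[state][action])
    return Q[state][action]


if __name__ == "__main__":
    # Two states, two actions; state 1 has a clearly better action (index 1).
    q_table = [[0.0, 0.0], [1.0, 5.0]]
    print(q_learning_update(q_table, 0, 0, 1.0, 1, alpha=0.5))

    q_table = [[0.0, 0.0], [1.0, 5.0]]
    # Suppose the exploring policy picked the worse action (index 0) next.
    print(sarsa_update(q_table, 0, 0, 1.0, 1, 0, alpha=0.5))
```

With the same transition, the Q-learning update moves `Q[0][0]` toward a target built from the best next value (5.0), while the SARSA update moves it toward a target built from the value of the action actually chosen (1.0), so the two results differ whenever the policy explores.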

3 questions

1.

OPEN ENDED QUESTION

3 mins • 1 pt

How does the Q-learning equation differ from the SARSA equation?

2.

OPEN ENDED QUESTION

3 mins • 1 pt

In the context of Q-learning, what is meant by 'maximum value of new state'?

3.

OPEN ENDED QUESTION

3 mins • 1 pt

What steps does SARSA take to determine the value of a new action?
