deep rl

University

25 Qs

Similar activities

Logical Reasoning • University • 20 Qs

Kardiologi Mei (2) • University • 25 Qs

Tutorial 1 Records Management • University • 20 Qs

Final Review Chapters 1-5 • University • 20 Qs

Awareness on Fundamental Duties • University • 20 Qs

Shock and Resuscitation • University • 20 Qs

Innovación social • University • 20 Qs

Pengkayaan Uji Kompetensi Bidan (20-40) • University • 20 Qs

deep rl

Assessment • Quiz • Other • University • Easy

Created by lucky star

25 questions


1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

How does the n-step TD approach differ from TD(0)?

TD(0) is only used for deterministic policies, while n-step TD is for stochastic policies.

n-step TD randomly selects how many steps to wait before an update

n-step TD updates only at the end of the episode, just like Monte Carlo

TD(0) is an on-policy method while n-step TD is off-policy.

n-step TD uses longer traces of rewards and states before performing a single update, rather than a one-step lookahead.
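
To make the distinction concrete, here is a minimal tabular sketch of the two targets, assuming a value table V, a discount factor gamma, and an illustrative list of observed rewards (names are not from any particular implementation):

```python
# Minimal sketch (assumed tabular setting): TD(0) target vs. n-step TD target.

def td0_target(V, reward, next_state, gamma=0.99):
    # TD(0): one-step lookahead, bootstrapping immediately from V[next_state].
    return reward + gamma * V[next_state]

def n_step_target(V, rewards, bootstrap_state, gamma=0.99):
    # n-step TD: accumulate n observed rewards along the trajectory, then
    # bootstrap once from the value of the state reached after n steps.
    target = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return target + (gamma ** len(rewards)) * V[bootstrap_state]
```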

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which key feature distinguishes Temporal Difference (TD) learning from Monte Carlo methods?

TD requires access to the full model of the environment’s transition probabilities.

TD is only applicable to deterministic policies.

TD waits until the end of an episode to update value estimates.

TD uses bootstrapping from current estimates rather than waiting for the final outcome.

TD cannot update its estimates online.
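
A minimal sketch of the bootstrapping difference, assuming a tabular value table V, step size alpha, discount gamma, and an episode stored as a list of (state, reward) pairs (all illustrative assumptions):

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # TD: update online, bootstrapping from the current estimate V[s_next].
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def monte_carlo_update(V, episode, alpha=0.1, gamma=0.99):
    # Monte Carlo: wait until the episode ends, then use the actual return G.
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
```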

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In DQN, what is the key purpose of the Experience Replay Buffer?

It amplifies the most recent transition repeatedly to speed up learning

It only stores states without actions or rewards

It replaces the need for a target network

It ensures that all experiences are used exactly once to avoid correlation

It stores past experiences and samples them randomly to break correlation in sequential data
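
A minimal replay-buffer sketch; the capacity, batch size, and transition layout are illustrative assumptions rather than the exact DQN implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions and samples them uniformly at random,
    which breaks the correlation between consecutive steps."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling decorrelates the training minibatch.
        return random.sample(self.buffer, batch_size)
```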

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Q-Learning and SARSA both estimate Q-values, but Q-Learning is considered off-policy because

The actions used in the bootstrapped target are always taken from the same policy that generates behavior

It ignores the discount factor in updates

It never uses ε-greedy exploration

It uses the same policy for both exploration and evaluation

It updates using a greedy action for the next state, not necessarily the one followed by the agent during data collection
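
A minimal tabular sketch of the two update rules, assuming Q is a dict keyed by (state, action) with entries for every action (e.g. a defaultdict(float)); names and hyperparameters are illustrative:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Off-policy: the target uses the greedy action in s_next, not necessarily
    # the action the behavior (e.g. epsilon-greedy) policy will take there.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: the target uses a_next, the action actually chosen by the
    # same policy that generates behavior.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
```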

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Why does DQN use a separate target network?

To generate randomized actions for exploration

To eliminate the need for discounting future rewards

To stabilize Q-value updates by keeping target estimates fixed for a while

To convert a continuous action space into a discrete one

To independently learn a model of the transition probabilities
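
A minimal sketch of the target-network pattern; the sync interval and the deep-copy mechanism are illustrative assumptions, not the exact DQN recipe:

```python
import copy

def refresh_target(q_net, target_net, step, sync_every=1000):
    # The target network stays frozen between syncs, so the regression target
    # for the Q-value update does not shift with every gradient step.
    return copy.deepcopy(q_net) if step % sync_every == 0 else target_net
```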

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Double Q-Learning was introduced primarily to address which issue?

Handling continuous actions without an actor-critic method

The inability of Q-Learning to handle function approximation

The instability caused by batch updates in Q-Learning

Overestimation bias in Q-value updates due to using max over the same Q function

Lack of exploration in Q-Learning
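
A minimal tabular sketch of the double estimator: one table selects the argmax action and the other evaluates it (the table names Q_a and Q_b are illustrative):

```python
def double_q_target(Q_a, Q_b, r, s_next, actions, gamma=0.99):
    # Selecting with one estimate and evaluating with the other avoids the
    # upward bias of taking max over a single noisy Q estimate.
    best_action = max(actions, key=lambda a2: Q_a[(s_next, a2)])  # select with Q_a
    return r + gamma * Q_b[(s_next, best_action)]                 # evaluate with Q_b
```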

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

TD methods differ from Dynamic Programming (DP) primarily because TD methods

Do not use the concept of value functions

Only work in deterministic environments

Always converge faster than DP methods

Can learn directly from raw experience without knowing transition probabilities

Require a perfect model of the environment’s transitions
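
A minimal sketch of the model dependence, assuming P[s][a] maps next states to transition probabilities and R gives expected rewards (both illustrative data structures):

```python
def dp_backup(V, P, R, s, a, gamma=0.99):
    # Dynamic programming: needs the full model P and R of the environment.
    return sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in P[s][a].items())

def td_sample_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # TD: learns from a single sampled transition, no model required.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
```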
