
deep rl
Quiz • Other • University • Practice Problem • Easy
lucky star
Used 2+ times
25 questions
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
How does the n-step TD approach differ from TD(0)?
TD(0) is only used for deterministic policies, while n-step TD is for stochastic policies.
n-step TD randomly selects how many steps to wait before an update.
n-step TD updates only at the end of the episode, just like Monte Carlo.
TD(0) is an on-policy method while n-step TD is off-policy.
n-step TD uses longer traces of rewards and states before performing a single update, rather than a one-step lookahead.
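Study note: the difference between a one-step and an n-step target can be sketched in a few lines. This is a minimal illustration, not part of the quiz; the function name, the reward list, and the value estimates are all made-up assumptions.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the given rewards plus the discounted bootstrap value."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical trajectory data (assumed for illustration only).
rewards = [1.0, 0.0, 2.0]        # R_{t+1}, R_{t+2}, R_{t+3}
values_after = [0.5, 0.4, 0.3]   # current estimates V(S_{t+1}), V(S_{t+2}), V(S_{t+3})
gamma = 0.9

# TD(0): one reward, then bootstrap from V(S_{t+1}).
td0_target = n_step_return(rewards[:1], values_after[0], gamma)

# 3-step TD: three rewards, then bootstrap from V(S_{t+3}).
three_step_target = n_step_return(rewards, values_after[2], gamma)

print(round(td0_target, 4), round(three_step_target, 4))
```

With n = 1 the function reduces to the TD(0) target, so TD(0) can be read as the shortest case of the same family.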
2.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which key feature distinguishes Temporal Difference (TD) learning from Monte Carlo methods?
TD requires access to the full model of the environment’s transition probabilities.
TD is only applicable to deterministic policies.
TD waits until the end of an episode to update value estimates.
TD uses bootstrapping from current estimates rather than waiting for the final outcome.
TD cannot update its estimates online.
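Study note: a small sketch contrasting when each method can update. It assumes a tabular value function stored in a dict; the helper names and the two-state episode are hypothetical.

```python
def td0_update(V, s, r, s_next, alpha, gamma, terminal=False):
    """Update V[s] right after one transition, bootstrapping from V[s_next]."""
    target = r + (0.0 if terminal else gamma * V[s_next])
    V[s] += alpha * (target - V[s])

def mc_update(V, episode, alpha, gamma):
    """Every-visit Monte Carlo: needs the full episode as (state, reward) pairs."""
    g = 0.0
    for s, r in reversed(episode):
        g = r + gamma * g          # actual return observed from s onward
        V[s] += alpha * (g - V[s])

# TD(0) can learn mid-episode from a single step (values assumed for illustration).
V_td = {"A": 0.0, "B": 1.0}
td0_update(V_td, "A", 0.0, "B", alpha=0.5, gamma=1.0)

# Monte Carlo has to wait for the final outcome before touching any estimate.
V_mc = {"A": 0.0, "B": 0.0}
mc_update(V_mc, [("A", 0.0), ("B", 2.0)], alpha=0.5, gamma=1.0)

print(V_td, V_mc)
```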
3.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
In DQN, what is the key purpose of the Experience Replay Buffer?
It amplifies the most recent transition repeatedly to speed up learning
It only stores states without actions or rewards
It replaces the need for a target network
It ensures that all experiences are used exactly once to avoid correlation
It stores past experiences and samples them randomly to break correlation in sequential data
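Study note: a minimal replay buffer sketch using only the Python standard library. The class name, the transition layout, and the capacity are assumptions for illustration; real DQN implementations usually store arrays or tensors.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=3, seed=0)
for i in range(5):                 # states 0..4; only the last three are kept
    buf.push(i, 0, 0.0, i + 1, False)
batch = buf.sample(2)
print(len(buf), batch)
```

Because a transition stays in the buffer until it is evicted, the same experience may be sampled in several different minibatches.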
4.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Q-Learning and SARSA both estimate Q-values, but Q-Learning is considered off-policy because
The actions used in the bootstrapped target are always taken from the same policy that generates behavior
It ignores the discount factor in updates
It never uses ε-greedy exploration
It uses the same policy for both exploration and evaluation
It updates using a greedy action for the next state, not necessarily the one followed by the agent during data collection
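Study note: the two bootstrapped targets side by side, as a sketch. The Q-table, state name, and action names are hypothetical; the point is only which next action each target reads.

```python
def sarsa_target(Q, r, s_next, a_next, gamma):
    """On-policy: bootstrap from the action the agent actually takes next."""
    return r + gamma * Q[s_next][a_next]

def q_learning_target(Q, r, s_next, gamma):
    """Off-policy: bootstrap from the greedy action, whatever the agent does next."""
    return r + gamma * max(Q[s_next].values())

# Assumed Q-table for a single next state.
Q = {"s1": {"left": 1.0, "right": 3.0}}
r, gamma = 0.0, 0.9

# Suppose exploration made the agent pick "left" in s1.
sarsa = sarsa_target(Q, r, "s1", "left", gamma)
qlearn = q_learning_target(Q, r, "s1", gamma)
print(round(sarsa, 4), round(qlearn, 4))
```

The two targets coincide only when the action actually taken next happens to be the greedy one.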
5.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Why does DQN use a separate target network?
To generate randomized actions for exploration
To eliminate the need for discounting future rewards
To stabilize Q-value updates by keeping target estimates fixed for a while
To convert a continuous action space into a discrete one
To independently learn a model of the transition probabilities
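Study note: a toy sketch of periodic target synchronization. A one-entry dict stands in for network parameters, the "gradient step" is faked, and `SYNC_EVERY` is a hypothetical hyperparameter name; none of this comes from a specific library.

```python
import copy

online = {"w": 0.0}               # stand-in for the online network's parameters
target = copy.deepcopy(online)    # frozen copy used to compute TD targets
SYNC_EVERY = 4                    # assumed number of steps between copies

history = []
for step in range(1, 9):
    online["w"] += 1.0            # pretend one training step changed the weights
    if step % SYNC_EVERY == 0:
        target = copy.deepcopy(online)   # refresh the frozen copy
    history.append((step, online["w"], target["w"]))

for row in history:
    print(row)
```

Between copies the target parameters do not move, so the regression targets stay put while the online parameters change.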
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Double Q-Learning was introduced primarily to address which issue?
Handling continuous actions without an actor-critic method
The inability of Q-Learning to handle function approximation
The instability caused by batch updates in Q-Learning
Overestimation bias in Q-value updates due to taking the max over the same Q function
Lack of exploration in Q-Learning
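Study note: a sketch of the Double Q-Learning target, where one table selects the next action and the other evaluates it. The two tables and their values are invented for illustration.

```python
def single_q_target(q, r, s_next, gamma):
    """Standard target: the same table both selects and evaluates the action."""
    return r + gamma * max(q[s_next].values())

def double_q_target(q_select, q_eval, r, s_next, gamma):
    """Double Q target: q_select picks the action, q_eval supplies its value."""
    best_action = max(q_select[s_next], key=q_select[s_next].get)
    return r + gamma * q_eval[s_next][best_action]

# Two independently learned estimates (assumed values).
Q_A = {"s1": {"a": 2.0, "b": 1.0}}
Q_B = {"s1": {"a": 0.5, "b": 1.5}}

single = single_q_target(Q_A, 0.0, "s1", 1.0)
double = double_q_target(Q_A, Q_B, 0.0, "s1", 1.0)
print(single, double)
```

If Q_A's high value for action "a" is partly noise, the single-table target inherits that noise through the max, while the second table gives an estimate that was not used to choose the action.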
7.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
TD methods differ from Dynamic Programming (DP) primarily because TD methods
Do not use the concept of value functions
Only work in deterministic environments
Always converge faster than DP methods
Can learn directly from raw experience without knowing transition probabilities
Require a perfect model of the environment’s transitions
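Study note: a sketch contrasting a DP expected backup with a TD sample backup for one state. The transition list, state names, and step size are assumptions for illustration.

```python
def dp_backup(V, transitions, gamma):
    """Expected backup: needs (probability, reward, next_state) for every outcome."""
    return sum(p * (r + gamma * V[s2]) for p, r, s2 in transitions)

def td_backup(V, s, r, s_next, alpha, gamma):
    """Sample backup: needs only one observed (s, r, s_next) transition."""
    return V[s] + alpha * (r + gamma * V[s_next] - V[s])

V = {"s": 0.0, "x": 1.0, "y": 3.0}

# DP must be given the model: from s, go to x or y with probability 0.5 each.
dp_value = dp_backup(V, [(0.5, 0.0, "x"), (0.5, 0.0, "y")], gamma=1.0)

# TD just uses whatever transition was experienced, here s -> y with reward 0.
td_value = td_backup(V, "s", 0.0, "y", alpha=0.1, gamma=1.0)

print(dp_value, round(td_value, 4))
```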
