
Reinforcement Learning Concepts Worksheet
Authored by Reena Anbhazhagan
Engineering
University
38 questions
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which statement best characterizes the interaction between the agent and environment when the environment is partially observable?
The agent has direct and complete access to the true state at every time step
The agent receives observations (O_t) that are correlated with the true state (S_t)
The agent receives no feedback from the environment
The agent only obtains reward signals without any state-related information
2.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which condition ensures that a process satisfies the Markov property?
The next state depends on the entire history of past states
The next state depends only on the action taken, ignoring the current state
The next state depends only on the reward signal
The next state depends only on the current state and not on past states
3.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which statement correctly describes an optimal policy?
It maximizes the expected cumulative discounted reward over time
It focuses only on maximizing immediate rewards
It minimizes variability in rewards instead of maximizing returns
It is always unique for all decision problems
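The "cumulative discounted reward" referred to in this question can be made concrete with a small calculation. The following is a minimal sketch in Python; the function name discounted_return and the reward sequence are hypothetical, chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Work backwards so that each earlier step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards received at time steps 0, 1 and 2.
rewards = [1.0, 0.0, 2.0]
g = discounted_return(rewards, gamma=0.5)
print(g)  # 1.0 + 0.5 * 0.0 + 0.25 * 2.0 = 1.5
```

A policy's value is the expectation of this quantity over the trajectories the policy generates, so comparing policies means comparing these expected returns rather than single-step rewards.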
4.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
In a stochastic grid world, if an intended action results in other movements with certain probabilities, what does this indicate?
The policy is deterministic
The system depends on past states
Action outcomes are probabilistic, leading to multiple possible next states
Rewards remain constant for every state-action pair
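The stochastic grid world in this question can be sketched as a transition distribution over next states. The example below is an illustration under stated assumptions, not a reference implementation: the grid size, the 0.8 probability for the intended move, and the rule that the remaining probability is split between the two perpendicular moves are all assumed here and are not given in the worksheet.

```python
import random

# Assumed action set and movement offsets (x grows rightwards, y grows downwards).
MOVES = {"up": (0, -1), "right": (1, 0), "down": (0, 1), "left": (-1, 0)}
ORDER = ["up", "right", "down", "left"]


def transition_distribution(state, action, width=4, height=3, p_intended=0.8):
    """Map each reachable next state to its probability for one (state, action)."""
    idx = ORDER.index(action)
    # Assumption: the agent may slip to either perpendicular direction.
    slips = [ORDER[(idx - 1) % 4], ORDER[(idx + 1) % 4]]
    p_slip = (1.0 - p_intended) / 2
    dist = {}
    for move, prob in [(action, p_intended)] + [(s, p_slip) for s in slips]:
        dx, dy = MOVES[move]
        x, y = state[0] + dx, state[1] + dy
        # Assumption: bumping into the boundary leaves the agent where it was.
        if not (0 <= x < width and 0 <= y < height):
            x, y = state
        dist[(x, y)] = dist.get((x, y), 0.0) + prob
    return dist


def sample_next_state(state, action, rng=random):
    """Draw one next state according to the transition distribution."""
    dist = transition_distribution(state, action)
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states])[0]


dist = transition_distribution((1, 1), "up")
print(dist)  # three distinct next states: (1, 0), (0, 1) and (2, 1)
```

A single intended action here yields several possible next states, each with its own probability, which is what distinguishes a stochastic transition model from a deterministic one.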
5.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which tuple correctly defines an MDP?
(S, A, P, R, γ)
(S, A, R, γ)
(S, P, R, γ)
(A, P, R, γ)
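As a way to see what each component of an MDP holds, the sketch below stores states, actions, transition probabilities, rewards and a discount factor in a Python dataclass. The class name MDP, its field names, the validate helper and the two-state example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MDP:
    states: List[str]                                     # S: set of states
    actions: List[str]                                    # A: set of actions
    transitions: Dict[Tuple[str, str], Dict[str, float]]  # P(s' | s, a)
    rewards: Dict[Tuple[str, str], float]                 # R(s, a)
    gamma: float                                          # discount factor

    def validate(self) -> None:
        """Check that gamma is in [0, 1] and each P(. | s, a) sums to 1."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        for s in self.states:
            for a in self.actions:
                total = sum(self.transitions[(s, a)].values())
                if abs(total - 1.0) > 1e-9:
                    raise ValueError(f"P(. | {s}, {a}) sums to {total}")


# A hypothetical two-state, two-action example.
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    transitions={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "go"): {"s0": 0.1, "s1": 0.9},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "go"): {"s0": 0.9, "s1": 0.1},
    },
    rewards={("s0", "stay"): 0.0, ("s0", "go"): 1.0,
             ("s1", "stay"): 0.0, ("s1", "go"): -1.0},
    gamma=0.9,
)
toy.validate()  # raises ValueError if the specification is inconsistent
```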
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which of the following is the correct tuple for a Markov Decision Process (MDP)?
(S, A, R)
(S, A, P, R, γ)
(S, P, R)
(A, P, R, γ)
7.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
What ensures convergence in iterative policy evaluation?
The discount factor satisfies γ < 1
The policy is stochastic
Rewards are always zero
The state space is infinite
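Iterative policy evaluation repeatedly applies the Bellman expectation backup until the value estimates stop changing. The following is a minimal sketch for a small tabular problem; the function name evaluate_policy, the dictionary layout, the stopping tolerance and the two-state example are assumptions made for illustration only.

```python
def evaluate_policy(states, policy, transitions, rewards, gamma,
                    tol=1e-10, max_sweeps=10_000):
    """Estimate V(s) for a fixed policy by repeated Bellman expectation backups.

    policy[s] maps each action to its probability in state s,
    transitions[(s, a)] maps each next state to its probability,
    rewards[(s, a)] is the expected immediate reward.
    """
    values = {s: 0.0 for s in states}
    for _ in range(max_sweeps):
        new_values = {}
        delta = 0.0
        for s in states:
            v = 0.0
            for a, p_a in policy[s].items():
                expected_next = sum(p * values[s2]
                                    for s2, p in transitions[(s, a)].items())
                v += p_a * (rewards[(s, a)] + gamma * expected_next)
            new_values[s] = v
            delta = max(delta, abs(v - values[s]))
        values = new_values
        # Stop once a full sweep changes no value by more than tol.
        if delta < tol:
            break
    return values


# Hypothetical example: s0 moves to s1 (reward 1), s1 loops on itself (reward 2).
states = ["s0", "s1"]
policy = {"s0": {"go": 1.0}, "s1": {"stay": 1.0}}
transitions = {("s0", "go"): {"s1": 1.0}, ("s1", "stay"): {"s1": 1.0}}
rewards = {("s0", "go"): 1.0, ("s1", "stay"): 2.0}

values = evaluate_policy(states, policy, transitions, rewards, gamma=0.5)
print(values)  # approximately V(s1) = 2 / (1 - 0.5) = 4 and V(s0) = 1 + 0.5 * 4 = 3
```

In this example s1 never terminates, so its value is a geometric series in γ. With γ < 1 the series is finite and each sweep shrinks the error by a factor of γ; with γ = 1 the same loop would keep growing until max_sweeps is reached.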