Reinforcement Learning Concepts Worksheet

Authored by Reena Anbhazhagan

Engineering

University

Used 2+ times

38 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which statement best characterizes the interaction between the agent and environment when the environment is partially observable?

The agent has direct and complete access to the true state at every time step

The agent receives observations O_t that are correlated with the true state S_t

The agent receives no feedback from the environment

The agent only obtains reward signals without any state-related information

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which condition ensures that a process satisfies the Markov property?

The next state depends on the entire history of past states

The next state depends only on the action taken, ignoring the current state

The next state depends only on the reward signal

The next state depends only on the current state and not on past states

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which statement correctly describes an optimal policy?

It maximizes the expected cumulative discounted reward over time

It focuses only on maximizing immediate rewards

It minimizes variability in rewards instead of maximizing returns

It is always unique for all decision problems
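
For reference, the "cumulative discounted reward" mentioned in the options above can be computed for a single finite episode as in the following minimal sketch. The function name `discounted_return` and the sample reward lists are illustrative assumptions, not part of the worksheet.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one finite episode."""
    g = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # With gamma = 0 only the first reward counts; larger gamma weights later rewards more.
    print(discounted_return([2.0, 5.0, 7.0], gamma=0.0))
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

Comparing the two calls shows how the discount factor trades off immediate against later rewards.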

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In a stochastic grid world, if an intended action results in other movements with certain probabilities, what does this indicate?

The policy is deterministic

The system depends on past states

Action outcomes are probabilistic, leading to multiple possible next states

Rewards remain constant for every state-action pair
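
A stochastic grid world of the kind described in this question can be sketched as follows. The 4x3 grid size, the 0.8 success probability, and the "slip to a perpendicular direction" rule are assumptions chosen for illustration; the worksheet does not specify them.

```python
import random

# Hypothetical slip model: the intended move succeeds with probability
# p_intended; otherwise the agent slips to one of the two perpendicular moves.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
PERPENDICULAR = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def transition_distribution(state, action, width=4, height=3, p_intended=0.8):
    """Return {next_state: probability} for one (state, action) pair."""
    slip = (1.0 - p_intended) / 2
    outcomes = [(action, p_intended)] + [(a, slip) for a in PERPENDICULAR[action]]
    dist = {}
    for move, prob in outcomes:
        dx, dy = MOVES[move]
        x, y = state[0] + dx, state[1] + dy
        # Moving off the grid leaves the agent where it was.
        nxt = (x, y) if 0 <= x < width and 0 <= y < height else state
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


def sample_next_state(state, action, rng=random):
    """Draw one next state according to the transition distribution."""
    dist = transition_distribution(state, action)
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states])[0]
```

Calling `transition_distribution((1, 1), "up")` returns a dictionary of possible next states with their probabilities, and repeated calls to `sample_next_state` with the same arguments can return different cells.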

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which tuple correctly defines an MDP?

(S, A, P, R, γ)

(S, A, R, γ)

(S, P, R, γ)

(A, P, R, γ)

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which of the following is the correct tuple for a Markov Decision Process (MDP)?

(S, A, R)

(S, A, P, R, γ)

(S, P, R)

(A, P, R, γ)

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What ensures convergence in iterative policy evaluation?

The discount factor satisfies γ < 1

The policy is stochastic

Rewards are always zero

The state space is infinite
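
Iterative policy evaluation repeatedly applies the Bellman expectation update V ← R + γ P V for a fixed policy until the values stop changing. The sketch below assumes a hypothetical two-state chain; the matrix `P`, the reward vector `R`, and the tolerance are illustrative values, not taken from the worksheet.

```python
def policy_evaluation(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Iterate V <- R + gamma * P V for a fixed policy.

    P[s][s2] is the probability of moving from s to s2 under the policy,
    and R[s] is the expected immediate reward in state s.
    Returns the value estimates and the number of sweeps performed.
    """
    n = len(R)
    V = [0.0] * n
    for i in range(max_iters):
        new_V = [
            R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        # Largest change in any state's value during this sweep.
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            return V, i + 1
    return V, max_iters


if __name__ == "__main__":
    # Hypothetical chain: state 0 pays reward 1 and moves to state 1 half the time;
    # state 1 is absorbing with reward 0.
    P = [[0.5, 0.5], [0.0, 1.0]]
    R = [1.0, 0.0]
    values, sweeps = policy_evaluation(P, R, gamma=0.9)
    print(values, sweeps)
```

Rerunning the example with different values of `gamma` and watching the sweep count is a quick way to explore how the discount factor affects convergence.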
