Reinforcement Learning Quiz

Assessment • Quiz • Professional Development • Hard

Created by Sai Ganesh

42 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In the update rule Q_{t+1}(a) ← Q_t(a) + α(R_t − Q_t(a)), select the value of α that we would prefer for estimating Q values in a non-stationary bandit problem. (A small numerical sketch follows the options below.)

α = 1/(n_a + 1)

α = 0.1

α = n_a + 1

α = 1/(n_a + 1)^2
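
For intuition, here is a minimal Python sketch contrasting the sample-average step size 1/(n_a + 1) with a constant α = 0.1 on a single arm whose true mean drifts; the random-walk drift, noise model, and horizon are assumptions made purely for illustration.

```python
import random

def run(step_size, steps=10000, seed=0):
    """Estimate the value of one non-stationary arm with an incremental update."""
    rng = random.Random(seed)
    q, n, true_mean = 0.0, 0, 0.0
    for _ in range(steps):
        true_mean += 0.01 * rng.gauss(0, 1)   # assumed drift: the arm's mean follows a random walk
        r = true_mean + rng.gauss(0, 1)       # noisy reward sample
        n += 1
        q += step_size(n) * (r - q)           # Q_{t+1}(a) <- Q_t(a) + alpha * (R_t - Q_t(a))
    return q, true_mean

q_avg, m = run(lambda n: 1.0 / n)             # sample-average step size: weights all history equally
q_const, _ = run(lambda n: 0.1)               # constant alpha = 0.1: recency-weighted, tracks the drift
print(f"true mean ~{m:.2f}, sample-average estimate {q_avg:.2f}, constant-alpha estimate {q_const:.2f}")
```

Because both runs use the same seed, they see an identical reward stream, so any difference in the final estimates comes from the step size alone.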

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

The “credit assignment problem” is the issue of correctly attributing accumulated rewards to the action(s) that produced them. Which of the following is/are reasons for the credit assignment problem in RL? (Select all that apply)

Reward for an action may only be observed after many time steps.

An agent may get the same reward for multiple actions.

The agent discounts rewards that occurred in previous time steps.

Rewards can be positive or negative.

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Assertion 1: In stationary bandit problems, we can achieve asymptotically correct behaviour by selecting exploratory actions with a fixed non-zero probability, without decaying exploration. Assertion 2: In non-stationary bandit problems, it is important that we decay the probability of exploration to zero over time in order to achieve asymptotically correct behaviour.

Assertion 1 and Assertion 2 are both True.

Assertion 1 is True and Assertion 2 is False.

Assertion 1 is False and Assertion 2 is True.

Assertion 1 and Assertion 2 are both False.

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

We are trying different algorithms to find the optimal arm for a multi-arm bandit. The expected payoff for each algorithm corresponds to some function of time t (with time starting from 0). Given that the optimal expected payoff is 1, which of the following functions corresponds to the algorithm with the least regret? (Hint: Plot the functions; a numerical comparison sketch follows the options below.)

tanh(t/5)

1−2^−t

t/20 if t < 20, and 1 after that

Same regret for all the above functions.
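
As the hint suggests, one way to compare the curves is to plot or numerically sum the gap between each payoff function and the optimal payoff of 1. Below is a quick sketch, assuming regret is accumulated over integer time steps up to an arbitrary horizon.

```python
import math

def cumulative_regret(payoff, horizon=100):
    """Sum of (optimal payoff 1 minus expected payoff) over t = 0 .. horizon-1."""
    return sum(1.0 - payoff(t) for t in range(horizon))

curves = {
    "tanh(t/5)":              lambda t: math.tanh(t / 5),
    "1 - 2^-t":               lambda t: 1 - 2 ** (-t),
    "t/20 if t < 20, else 1": lambda t: t / 20 if t < 20 else 1.0,
}
for name, f in curves.items():
    print(f"{name:25s} cumulative regret = {cumulative_regret(f):.2f}")
```

The horizon of 100 steps is arbitrary; since all three curves reach (or approach) the optimal payoff of 1, the ranking of their cumulative regrets stabilises well before that point.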

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which of the following is/are correct and valid reasons to consider sampling actions from a softmax distribution instead of using an ε-greedy approach?

Softmax exploration makes the probability of picking an action proportional to the action-value estimates. By doing so, it avoids wasting time exploring obviously ‘bad’ actions.

We do not need to worry about decaying exploration slowly like we do in the ε-greedy case. Softmax exploration gives us asymptotic correctness even for a sharp decrease in temperature.

It helps us differentiate between actions whose action-value estimates (Q values) are very close to that of the action with the maximum Q value.

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Consider a standard multi-arm bandit problem. The probability of picking an action under the softmax policy is given by: Pr(a_t = a) = e^(Q_t(a)/β) / Σ_b e^(Q_t(b)/β). Now, assume the following action-value estimates: Q_t(a_0) = 1, Q_t(a_1) = 0.2, Q_t(a_2) = 0.5, Q_t(a_3) = -1, Q_t(a_4) = 0.02, and Q_t(a_5) = -2. What is the probability that action 2 is selected? (use β = 0.1; a worked computation follows the options below)

0

0.13

0.232

0.143
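
A direct way to check this is to plug the given estimates into the softmax formula above; a minimal sketch:

```python
import math

beta = 0.1                                        # temperature given in the question
q = [1, 0.2, 0.5, -1, 0.02, -2]                   # Q_t(a_0) .. Q_t(a_5)

weights = [math.exp(v / beta) for v in q]         # e^(Q_t(a)/beta) for every action
total = sum(weights)
probs = [w / total for w in weights]              # Pr(a_t = a) after normalisation
print(f"Pr(a_t = a_2) = {probs[2]:.4f}")
```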

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What are the properties of a solution method that is PAC Optimal?

Both (b) and (c)

Both (a) and (b)

It always reaches optimal behaviour faster than an algorithm that is simply asymptotically correct.

It is guaranteed to find the correct solution.

It minimizes sample complexity to make the PAC guarantee.
