
Reinforcement Learning Quiz
Quiz • Other • Professional Development • Practice Problem • Hard
Sai Ganesh

42 questions
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
In the update rule Q_{t+1}(a) ← Q_t(a) + α(R_t − Q_t(a)), select the value of α that we would prefer for estimating Q values in a non-stationary bandit problem.
α=1/(n_a+1)
α=0.1
α=n_a+1
α=1/(n_a+1)^2
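The effect of the step size in this update rule can be explored with a minimal sketch. It assumes a single arm whose reward jumps from 0 to 1 halfway through (a hypothetical, noise-free stand-in for non-stationarity) and compares the sample-average step size 1/(n_a+1) with a constant step size. The helper name `run_update` and the reward sequence are illustrative assumptions, not part of the quiz.

```python
def run_update(rewards, step_size):
    """Apply Q <- Q + alpha * (R - Q) over a reward sequence and return the final Q.

    step_size is a function of n, the number of earlier updates for this arm.
    """
    q = 0.0
    for n, r in enumerate(rewards):
        alpha = step_size(n)
        q += alpha * (r - q)
    return q


# Hypothetical non-stationary arm: reward is 0 for 100 steps, then 1 for 100 steps.
rewards = [0.0] * 100 + [1.0] * 100

q_sample_average = run_update(rewards, lambda n: 1.0 / (n + 1))  # alpha = 1/(n_a+1)
q_constant = run_update(rewards, lambda n: 0.1)                  # alpha = 0.1

print("sample-average estimate:", q_sample_average)
print("constant-step estimate: ", q_constant)
```

With the sample-average step size every past reward carries equal weight, while a constant step size weights recent rewards exponentially more heavily; comparing the two printed estimates against the arm's current reward of 1 shows how each reacts to the change.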
2.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
The “credit assignment problem” is the problem of correctly attributing accumulated rewards to the action(s) responsible for them. Which of the following is/are reasons for the credit assignment problem in RL? (Select all that apply)
Reward for an action may only be observed after many time steps.
An agent may get the same reward for multiple actions.
The agent discounts rewards that occurred in previous time steps.
Rewards can be positive or negative
3.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Assertion 1: In stationary bandit problems, we can achieve asymptotically correct behaviour by selecting exploratory actions with a fixed non-zero probability without decaying exploration.
Assertion 2: In non-stationary bandit problems, it is important that we decay the probability of exploration to zero over time in order to achieve asymptotically correct behaviour.
Assertion 1 and Assertion 2 are both True.
Assertion 1 is True and Assertion 2 is False.
Assertion 1 is False and Assertion 2 is True.
Assertion 1 and Assertion 2 are both False.
4.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
We are trying different algorithms to find the optimal arm of a multi-armed bandit. The expected payoff of each algorithm is some function of time t (with time starting from 0). Given that the optimal expected payoff is 1, which of the following functions corresponds to the algorithm with the least regret? (Hint: Plot the functions)
tanh(t/5)
1−2^−t
t/20 if t < 20, and 1 after that
Same regret for all the above functions.
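As an alternative to plotting, regret can be approximated numerically as the accumulated gap between the optimal payoff and each algorithm's expected payoff. The sketch below assumes discrete time steps t = 0, 1, 2, ... and a hypothetical horizon of 200 steps; the function name `cumulative_regret` and the dictionary keys are illustrative only.

```python
import math


def cumulative_regret(payoff, horizon, optimal=1.0):
    """Sum the per-step gap (optimal - payoff(t)) for t = 0 .. horizon - 1."""
    return sum(optimal - payoff(t) for t in range(horizon))


# Expected-payoff curves from the answer options, as functions of time t.
payoffs = {
    "tanh(t/5)": lambda t: math.tanh(t / 5),
    "1 - 2^-t": lambda t: 1 - 2.0 ** (-t),
    "t/20 then 1": lambda t: t / 20 if t < 20 else 1.0,
}

HORIZON = 200  # assumed horizon; all three curves are essentially at 1 by then
regrets = {name: cumulative_regret(f, HORIZON) for name, f in payoffs.items()}

for name, value in regrets.items():
    print(f"{name}: {value:.3f}")
```

The curve with the smallest printed total is the one that stays closest to the optimal payoff of 1 over the whole run, which is the area between the curve and the line y = 1 in the suggested plot.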
5.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Which of the following is/are correct and valid reasons to consider sampling actions from a softmax distribution instead of using an ε-greedy approach?
Softmax exploration makes the probability of picking an action proportional to the action-value estimates. By doing so, it avoids wasting time exploring obviously ’bad’ actions.
We do not need to worry about decaying exploration slowly like we do in the ε-greedy case. Softmax exploration gives us asymptotic correctness even for a sharp decrease in temperature.
It helps us differentiate between actions with action-value estimates (Q values) that are very close to the action with maximum Q value.
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Consider a standard multi-armed bandit problem. The probability of picking an action using the softmax policy is given by: Pr(a_t = a) = e^(Q_t(a)/β) / Σ_b e^(Q_t(b)/β). Now, assume the following action-value estimates: Q_t(a_0) = 1, Q_t(a_1) = 0.2, Q_t(a_2) = 0.5, Q_t(a_3) = -1, Q_t(a_4) = 0.02 and Q_t(a_5) = -2. What is the probability that action 2 is selected? (use β = 0.1)
0
0.13
0.232
0.143
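The softmax formula in this question can be evaluated directly. The following is a minimal sketch, assuming that "action 2" refers to a_2 (index 2 in the list); the function name `softmax_probs` is illustrative. Subtracting the maximum before exponentiating is a standard numerical-stability step and leaves the probabilities unchanged.

```python
import math


def softmax_probs(q_values, beta):
    """Return Pr(a) = exp(Q(a)/beta) / sum_b exp(Q(b)/beta) for every action."""
    scaled = [q / beta for q in q_values]
    largest = max(scaled)
    # Shift by the largest exponent so exp() never overflows; the ratio is unchanged.
    weights = [math.exp(s - largest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Action-value estimates Q_t(a_0) .. Q_t(a_5) from the question.
q_estimates = [1, 0.2, 0.5, -1, 0.02, -2]
probs = softmax_probs(q_estimates, beta=0.1)

for index, p in enumerate(probs):
    print(f"Pr(a_{index}) = {p:.5f}")
```

With a temperature as small as β = 0.1, differences between the estimates are magnified tenfold inside the exponent, so the distribution concentrates heavily on the action with the largest estimate.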
7.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
What are the properties of a solution method that is PAC Optimal?
Both (b) and (c)
Both (a) and (b)
It always reaches optimal behaviour faster than an algorithm that is simply asymptotically correct.
It is guaranteed to find the correct solution.
It minimizes sample complexity to make the PAC guarantee.