
Deep Learning: RNNs

Quiz • Computers • 5th Grade • Medium

Josiah Wang
6 questions
1.
MULTIPLE SELECT QUESTION
1 min • 1 pt
Which of the following might help prevent vanishing gradients in recurrent neural networks:
Using a gated recurrent network such as an LSTM
Using ReLU activations
Using more layers in your RNN
None of the above
Answer explanation
Vanishing gradients in vanilla recurrent neural networks arise from back-propagation through time (BPTT): when modelling long sequences, repeated application of the chain rule multiplies together many small weight and activation derivatives, driving the gradients towards zero. It has been shown experimentally that ReLU activations can reduce this effect. One can also use gated units such as GRUs or LSTMs, whose gates and additive cell pathways let the model control which gradients flow back through time.
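As a rough illustration, here is a minimal PyTorch sketch of the two remedies mentioned above (the sizes and batch are made up, not part of the quiz): a vanilla RNN with ReLU activations, and an LSTM used as a drop-in gated alternative.

import torch
import torch.nn as nn

x = torch.randn(8, 50, 32)  # dummy batch: 8 sequences, 50 time steps, 32 features

# Vanilla RNN with ReLU instead of the default tanh activation
rnn_relu = nn.RNN(input_size=32, hidden_size=64, nonlinearity='relu', batch_first=True)
# Gated alternative: an LSTM with the same sizes
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

out_rnn, h_n = rnn_relu(x)
out_lstm, (h_n, c_n) = lstm(x)
print(out_rnn.shape, out_lstm.shape)  # both torch.Size([8, 50, 64])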
2.
MULTIPLE CHOICE QUESTION
1 min • 1 pt
Which of the following is True when implementing an RNN with language data:
All sentences in the training data must be padded so that they are the same length
We only need to use padding within a mini batch so that sentences in the same mini batch have the same length
Answer explanation
The depth of BPTT can be adapted each time a new mini-batch is sampled from the dataset, so it is not necessary to pad every sentence in the training data to a single global length. However, the inputs within one mini-batch must have the same length, so padding is required within each mini-batch.
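A small PyTorch sketch of per-mini-batch padding, assuming the sentences have already been mapped to integer token ids (the values below are made up):

import torch
from torch.nn.utils.rnn import pad_sequence

# Three sentences of different lengths, already converted to token ids
batch = [torch.tensor([4, 9, 2]),
         torch.tensor([7, 1]),
         torch.tensor([3, 5, 8, 6, 2])]

# Pad only within this mini-batch, up to the length of its longest sentence (5)
padded = pad_sequence(batch, batch_first=True, padding_value=0)
print(padded.shape)  # torch.Size([3, 5])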
3.
MULTIPLE SELECT QUESTION
1 min • 1 pt
Which of the following are True about GRUs/LSTMs/vanilla RNNs:
LSTMs are likely to perform better on hard tasks with long-distance relations compared to vanilla RNNs
A GRU has more parameters than an LSTM
A GRU is likely to converge quicker than an LSTM
An RNN is likely to eventually outperform a GRU
Answer explanation
Unlike GRUs or vanilla RNNs, LSTMs maintain two internal states: the cell state and the hidden state. The former aims to capture long-term relations while the latter captures the short-term relations present in the sequence.
Compared to an LSTM, a GRU merges the input and forget gates into a single update gate and has no output gate or separate cell state. Therefore, a GRU unit has fewer parameters than an LSTM.
Given the previous point, it is reasonable to expect that, in general, a GRU will converge faster than an LSTM, since it has fewer parameters to optimise.
A vanilla RNN is unlikely to outperform a GRU, given the vanishing gradient problem, which GRUs address to some extent.
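The parameter difference is easy to check in PyTorch; this is only an illustrative sketch with arbitrary sizes:

import torch.nn as nn

gru = nn.GRU(input_size=32, hidden_size=64)
lstm = nn.LSTM(input_size=32, hidden_size=64)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(gru), n_params(lstm))  # 18816 vs 25088: 3 weight blocks vs 4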
4.
MULTIPLE CHOICE QUESTION
1 min • 1 pt
True or False: Vanishing gradients are more likely to be an issue with longer sentences
True
False
Answer explanation
A longer sequence results in more applications of the chain rule during BPTT, and therefore more opportunities for the gradients to vanish.
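A back-of-the-envelope sketch of why length matters, assuming a scalar recurrent weight w below 1 (the numbers are purely illustrative):

# With a recurrent weight w < 1 and activation derivatives bounded by 1,
# the gradient through T steps of BPTT shrinks roughly like w**T.
w = 0.9
for T in (5, 20, 100):
    print(T, w ** T)  # ~0.59, ~0.12, ~2.7e-5: longer sequences, smaller gradients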
5.
MULTIPLE CHOICE QUESTION
1 min • 1 pt
True or False: a BiLSTM has the same number of parameters as an LSTM
False
True
Answer explanation
A Bidirectional LSTM trains two LSTM networks: one processes the sequence left-to-right, exactly as a traditional LSTM would, and the other processes it right-to-left. Therefore, a BiLSTM has twice as many parameters as a unidirectional LSTM.
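This doubling can be verified directly in PyTorch (the sizes below are arbitrary):

import torch.nn as nn

uni = nn.LSTM(input_size=32, hidden_size=64)
bi = nn.LSTM(input_size=32, hidden_size=64, bidirectional=True)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(bi) == 2 * n_params(uni))  # True: the backward LSTM doubles the count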
6.
MULTIPLE SELECT QUESTION
1 min • 1 pt
Which of the following are gates used in an LSTM:
Forget gate
Input gate
Output gate
Memory gate
Answer explanation
An LSTM has three gates: the input gate, the forget gate and the output gate. There is no "memory gate"; long-term memory is carried by the cell state, which is updated through the gates but is not a gate itself.
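To make the gate structure concrete, here is a hand-written single LSTM step (a sketch only; the function name, weight layout and shapes are my own, not from the quiz):

import torch

def lstm_step(x, h, c, W, U, b):
    # W, U, b stack four blocks: input gate, forget gate, candidate, output gate
    gates = x @ W + h @ U + b
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # the three gates
    g = torch.tanh(g)                       # candidate values, not a gate
    c_new = f * c + i * g                   # cell state: the long-term memory
    h_new = o * torch.tanh(c_new)           # hidden state, filtered by the output gate
    return h_new, c_new

x, h, c = torch.randn(1, 32), torch.zeros(1, 64), torch.zeros(1, 64)
W, U, b = torch.randn(32, 256), torch.randn(64, 256), torch.zeros(256)
h, c = lstm_step(x, h, c, W, U, b)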