Deep Learning: RNNs

5th Grade • 6 Qs

Similar activities

Senior Computer Basics (5th Grade - Professional Development, 5 Qs)
Logic Gates (4th - 9th Grade, 10 Qs)
Automatic Night Light Quizziz (1st Grade - Professional Development, 9 Qs)
PLC Quiz (5th Grade, 5 Qs)
Small Basic (Turtle) (KG - University, 10 Qs)
Prinsip Dasar Antarmuka (1st - 10th Grade, 10 Qs)
GRADE 5 FIRST TERM ICT EXAM (5th Grade, 10 Qs)
codes (KG - University, 8 Qs)

Assessment • Quiz • Computers • 5th Grade
Practice Problem • Medium
Created by Josiah Wang
Used 8+ times


6 questions


1.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which of the following might help prevent vanishing gradients in recurrent neural networks?

Using a gated recurrent network such as an LSTM

Using ReLU activations

Using more layers in your RNN

None of the above

Answer explanation

Gradient vanishing in vanilla recurrent neural networks arises from back-propagation through time (BPTT): when modelling long sequences, repeated application of the chain rule multiplies many factors together, and small weight values drive the gradients towards zero. It has been shown experimentally that using ReLU activations can reduce this effect. One can also use GRU or LSTM units, whose gates and channels let the model control how gradients flow back through time.
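The repeated-chain-rule effect can be illustrated with a minimal sketch (plain Python; the scalar "RNN" and the numbers are purely illustrative): each BPTT step multiplies the gradient by a factor w * tanh'(a_t), so with |w| < 1 the gradient decays geometrically with sequence length.

```python
import math

def bptt_gradient_scale(w, T, x=0.5):
    """Magnitude of d h_T / d h_0 for a toy scalar RNN h_t = tanh(w * h_{t-1} + x).

    Each chain-rule step contributes a factor w * tanh'(a_t); with |w| < 1
    the product shrinks geometrically, i.e. the gradient vanishes.
    """
    h, grad = 0.0, 1.0
    for _ in range(T):
        a = w * h + x
        h = math.tanh(a)
        grad *= w * (1.0 - math.tanh(a) ** 2)  # tanh'(a) = 1 - tanh(a)^2
    return abs(grad)

grad_short = bptt_gradient_scale(w=0.5, T=5)
grad_long = bptt_gradient_scale(w=0.5, T=50)   # many orders of magnitude smaller
```

Gated units such as LSTMs mitigate this because their additive cell-state update gives gradients a path that is not repeatedly squashed in this way.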

2.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

Which of the following is true when implementing an RNN with language data?

All sentences in the training data must be padded so that they are the same length

We only need to use padding within a mini-batch, so that sentences in the same mini-batch have the same length

Answer explanation

We can adapt the depth of the BPTT algorithm whenever a new mini-batch is sampled from the dataset, so it is not necessary to pad all sentences in the training data to a common length. However, the inputs within the same mini-batch must have the same size, and therefore padding is required in this case (within the mini-batch).
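Per-batch padding can be sketched in a few lines of plain Python (the `pad_batch` helper and the token IDs are hypothetical; real pipelines would typically use a framework utility such as a pad-sequence helper):

```python
def pad_batch(batch, pad_token=0):
    """Pad token-ID sequences in one mini-batch to that batch's max length.

    Only sequences that share a mini-batch need equal lengths; other
    mini-batches can be padded to their own (possibly shorter) maximum.
    """
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in batch]

batch = [[4, 8, 15], [16, 23], [42]]
padded = pad_batch(batch)  # every row now has length 3
```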

3.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which of the following are true about GRUs/LSTMs/vanilla RNNs?

LSTMs are likely to perform better on hard tasks with long-distance relations compared to vanilla RNNs

A GRU has more parameters than an LSTM

A GRU is likely to converge quicker than a LSTM

An RNN is likely to eventually outperform a GRU

Answer explanation

Contrary to GRUs or vanilla RNNs, LSTMs have two internal states: the cell state and the hidden state. The former aims to capture the long-term relations, while the latter captures the short-term relations present in the sequence.

An LSTM has two extra gates compared to a GRU (the forget gate and the output gate). Therefore, a GRU unit has fewer parameters than an LSTM.

Given the previous point, it is reasonable to expect that in general a GRU will converge faster than an LSTM, since it has fewer parameters to optimise.

A vanilla RNN is unlikely to outperform a GRU if we take into account the gradient vanishing problem, which is addressed to some extent in GRUs.
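The parameter comparison can be made concrete with a rough count (a sketch assuming the standard formulation: each gate/candidate block has an h x (d + h) weight matrix plus a bias vector, ignoring implementation-specific extra bias terms):

```python
def rnn_params(d, h, gates):
    """Parameters for `gates` blocks, each an h x (d + h) weight plus an h bias."""
    return gates * (h * (d + h) + h)

def vanilla_rnn_params(d, h):
    return rnn_params(d, h, gates=1)  # single recurrent transformation

def gru_params(d, h):
    return rnn_params(d, h, gates=3)  # reset gate, update gate, candidate

def lstm_params(d, h):
    return rnn_params(d, h, gates=4)  # forget, input, output gates + candidate

# For any input size d and hidden size h, GRU < LSTM (ratio exactly 3:4 here).
```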

4.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

True or False: Vanishing gradients are more likely to be an issue with longer sentences

True

False

Answer explanation

A longer sequence results in more applications of the chain rule in BPTT, and therefore more opportunity for the gradients to vanish.
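A quick way to see the length dependence is the loose bound |d h_T / d h_0| <= |w|^T for a toy scalar RNN with tanh activation (since |tanh'| <= 1, each of the T chain-rule factors is at most |w|). A sketch with illustrative numbers:

```python
def gradient_upper_bound(w, T):
    """Loose bound on |d h_T / d h_0| for h_t = tanh(w * h_{t-1} + x_t):
    each of the T chain-rule factors is at most |w|, since |tanh'| <= 1."""
    return abs(w) ** T

# The bound shrinks exponentially as the sequence gets longer.
bounds = [gradient_upper_bound(0.9, T) for T in (10, 50, 100)]
```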

5.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

True or False: a BiLSTM has the same number of parameters as an LSTM

False

True

Answer explanation

A bidirectional LSTM trains two LSTM networks: one models the sequence exactly as a traditional LSTM would, and the other models the sequence in reverse. Therefore, a BiLSTM has twice as many parameters as an LSTM.
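The doubling can be checked with a rough parameter count (a sketch assuming the standard LSTM formulation: four gate/candidate blocks, each with an h x (d + h) weight matrix and an h bias):

```python
def lstm_params(d, h):
    """Four gate/candidate blocks, each an h x (d + h) weight plus an h bias."""
    return 4 * (h * (d + h) + h)

def bilstm_params(d, h):
    """A BiLSTM runs one forward and one independent backward LSTM."""
    return 2 * lstm_params(d, h)
```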

6.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which of the following are gates used in an LSTM?

Forget gate

Input gate

Output gate

Memory gate

Answer explanation

An LSTM uses three gates: the forget gate, the input gate and the output gate. There is no "memory gate"; long-term memory is instead carried by the cell state.
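The three gates can be seen in a minimal scalar LSTM step (an illustrative sketch, not a library implementation; the weight values are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step; w maps each block name to (w_x, w_h, b).

    The three gates are forget (f), input (i) and output (o); g is the
    candidate cell update, not a gate.
    """
    pre = {k: wx * x + wh * h_prev + b for k, (wx, wh, b) in w.items()}
    f = sigmoid(pre["f"])    # forget gate: keep or discard old cell state
    i = sigmoid(pre["i"])    # input gate: admit new information
    o = sigmoid(pre["o"])    # output gate: expose the cell state
    g = math.tanh(pre["g"])  # candidate values
    c = f * c_prev + i * g   # cell state (long-term memory)
    h = o * math.tanh(c)     # hidden state (short-term output)
    return h, c

weights = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "o", "g")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=weights)
```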
