NLP-Transformers Last Quiz

Authored by Hazem Abdelazim

Computers

University


10 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main purpose of multi-head self-attention in the Transformer model?

To ignore certain aspects of the input

To process inputs sequentially

To process inputs in parallel

To learn multiple contextual relationships at once
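The idea behind this question is easiest to see in miniature: each head runs scaled dot-product attention with its own projections, so several heads can capture different contextual relationships at the same time. A pure-Python sketch with toy scalar "projections" (all values illustrative, not from any trained model):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def head(query, keys, values):
    """Single-head scaled dot-product attention for one query (toy scalars)."""
    d_k = 1  # toy dimensionality
    weights = softmax([query * k / math.sqrt(d_k) for k in keys])
    return sum(w * v for w, v in zip(weights, values))

# Two heads with different (hand-picked) query projections of the same input
# attend to different parts of the sequence; in a real Transformer their
# outputs are concatenated and projected back to the model dimension.
keys, values = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
head_outputs = [head(q, keys, values) for q in (1.0, -1.0)]  # heads run independently
print(head_outputs)
```

The positive-query head weights the later (larger-key) positions heavily while the negative-query head prefers the earlier ones — two different relationships learned over the same input, at once.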

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the dimension of each weight matrix in the multi-head self-attention layer if the embedding dimension is 512 and there are 8 heads?

512 x 64

512 x 128

256 x 128

256 x 64
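The dimension arithmetic behind this question can be checked directly (d_model = 512 and 8 heads are taken from the question; the per-head convention d_k = d_model / h follows the original Transformer design):

```python
# Per-head projection size in multi-head self-attention.
d_model = 512               # embedding dimension (from the question)
num_heads = 8               # number of attention heads (from the question)
d_k = d_model // num_heads  # each head works in a 64-dimensional subspace

# Each per-head projection matrix W_Q, W_K, W_V maps d_model -> d_k,
# i.e. has shape 512 x 64.
w_shape = (d_model, d_k)
print(w_shape)  # (512, 64)
```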

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the purpose of using sinusoids in the original Transformer model?

To process inputs sequentially

To reduce the dimension of the input

To add non-linearity to the system

To inject position information back into the model
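Because self-attention is order-agnostic, the sinusoids add each token's position back into its embedding. A small sketch of the fixed sinusoidal encoding from the original Transformer (pure Python, toy dimension):

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal positional encoding: sin on even indices, cos on odd."""
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i] = math.sin(angle)
        if i + 1 < d_model:
            pe[i + 1] = math.cos(angle)
    return pe

# Position 0 encodes as alternating 0s and 1s (sin 0 = 0, cos 0 = 1);
# every other position gets a distinct pattern of phases.
print(positional_encoding(0, 8))
```

Each position receives a unique, deterministic vector, which is simply added to the token embedding before the first attention layer.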

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the purpose of the position-wise feed-forward network in the Transformer model?

To ignore certain aspects of the input

To process inputs sequentially

To introduce non-linearity to learn complex relationships

To reduce the dimension of the input
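The role of the feed-forward sub-layer is easiest to see in miniature: without its activation, two stacked linear maps collapse into a single linear map. A toy scalar sketch (weights are made-up illustrative numbers, not from any model):

```python
def relu(x):
    """Rectified linear unit: the non-linearity in the original Transformer FFN."""
    return max(0.0, x)

def pointwise_ffn(x, w1, b1, w2, b2):
    """Position-wise FFN for a single position with scalar weights (toy sketch)."""
    hidden = relu(x * w1 + b1)  # non-linearity lets the network model
    return hidden * w2 + b2     # relationships a pure linear map cannot

# Negative pre-activation is clipped to zero, so the output is no longer
# a linear function of the input.
print(pointwise_ffn(-1.0, 2.0, 0.5, 3.0, 0.0))  # 0.0
```

The same two-layer map is applied to every position independently, which is why it is called "position-wise".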

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main training task used in the original BERT model?

Sentiment Analysis

Text Generation

Masked Language Modeling (MLM)

Next Sentence Prediction (NSP)
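Masked Language Modeling can be sketched as a corruption step: hide a fraction of tokens and keep the originals as labels for the model to predict. A toy version (real BERT masks about 15% of tokens and sometimes substitutes a random or unchanged token instead of [MASK]; this sketch only does the simple replacement):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=1):
    """Toy MLM corruption: replace some tokens, remember originals as labels."""
    rng = random.Random(seed)
    masked, labels = [], []
    for t in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)  # hide the token from the model
            labels.append(t)           # keep the original as the target
        else:
            masked.append(t)
            labels.append(None)        # no prediction needed here
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split())
print(masked)
```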

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the purpose of fine-tuning a pre-trained language model?

To adapt the model for a specific task with less data and time

To train the model from scratch

To reduce the dimension of the input

To ignore certain aspects of the input

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main difference between BERT-base and BERT-large models?

More layers in the decoder

Larger vocabulary size

More layers in the encoder

More blocks, more attention heads, and larger embedding dimensions
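The published configurations make the difference concrete (figures from the BERT paper, Devlin et al., 2019):

```python
# Published BERT configurations.
bert_base  = {"layers": 12, "heads": 12, "hidden": 768,  "params_millions": 110}
bert_large = {"layers": 24, "heads": 16, "hidden": 1024, "params_millions": 340}

# BERT-large scales every axis at once: more encoder blocks, more
# attention heads, and a larger embedding/hidden dimension.
for key in bert_base:
    assert bert_large[key] > bert_base[key]
print(bert_large["layers"] // bert_base["layers"])  # 2
```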
