NLP-Transformers Last Quiz

Authored by Hazem Abdelazim

Computers

University


10 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main purpose of multi-head self-attention in the Transformer model?

To ignore certain aspects of the input

To process inputs sequentially

To process inputs in parallel

To learn multiple contextual relationships at once
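The idea behind this question is easiest to see in miniature: each head runs scaled dot-product attention with its own projections, so several heads can capture different contextual relationships at the same time. A pure-Python sketch with toy scalar "projections" (all values illustrative, not from any trained model):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def head(query, keys, values):
    """Single-head scaled dot-product attention for one query (toy scalars)."""
    d_k = 1  # toy dimensionality
    weights = softmax([query * k / math.sqrt(d_k) for k in keys])
    return sum(w * v for w, v in zip(weights, values))

# Two heads with different (hand-picked) query projections of the same input
# attend to different parts of the sequence; in a real Transformer their
# outputs are concatenated and projected back to the model dimension.
keys, values = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
head_outputs = [head(q, keys, values) for q in (1.0, -1.0)]  # heads run independently
print(head_outputs)
```

The positive-query head weights the later (larger-key) positions heavily while the negative-query head prefers the earlier ones — two different relationships learned over the same input, at once.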

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the dimension of each weight matrix in the multi-head self-attention layer if the embedding dimension is 512 and there are 8 heads?

512 x 64

512 x 128

256 x 128

256 x 64
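The dimension arithmetic behind this question can be checked directly (d_model = 512 and 8 heads are taken from the question; the per-head convention d_k = d_model / h follows the original Transformer design):

```python
# Per-head projection size in multi-head self-attention.
d_model = 512               # embedding dimension (from the question)
num_heads = 8               # number of attention heads (from the question)
d_k = d_model // num_heads  # each head works in a 64-dimensional subspace

# Each per-head projection matrix W_Q, W_K, W_V maps d_model -> d_k,
# i.e. has shape 512 x 64.
w_shape = (d_model, d_k)
print(w_shape)  # (512, 64)
```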

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the purpose of using sinusoids in the original Transformer model?

To process inputs sequentially

To reduce the dimension of the input

To add non-linearity to the system

To inject position information back into the model
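Because self-attention is order-agnostic, the sinusoids add each token's position back into its embedding. A small sketch of the fixed sinusoidal encoding from the original Transformer (pure Python, toy dimension):

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal positional encoding: sin on even indices, cos on odd."""
    pe = [0.0] * d_model
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i] = math.sin(angle)
        if i + 1 < d_model:
            pe[i + 1] = math.cos(angle)
    return pe

# Position 0 encodes as alternating 0s and 1s (sin 0 = 0, cos 0 = 1);
# every other position gets a distinct pattern of phases.
print(positional_encoding(0, 8))
```

Each position receives a unique, deterministic vector, which is simply added to the token embedding before the first attention layer.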

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the purpose of the position-wise feed-forward network in the Transformer model?

To ignore certain aspects of the input

To process inputs sequentially

To introduce non-linearity to learn complex relationships

To reduce the dimension of the input
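The role of the feed-forward sub-layer is easiest to see in miniature: without its activation, two stacked linear maps collapse into a single linear map. A toy scalar sketch (weights are made-up illustrative numbers, not from any model):

```python
def relu(x):
    """Rectified linear unit: the non-linearity in the original Transformer FFN."""
    return max(0.0, x)

def pointwise_ffn(x, w1, b1, w2, b2):
    """Position-wise FFN for a single position with scalar weights (toy sketch)."""
    hidden = relu(x * w1 + b1)  # non-linearity lets the network model
    return hidden * w2 + b2     # relationships a pure linear map cannot

# Negative pre-activation is clipped to zero, so the output is no longer
# a linear function of the input.
print(pointwise_ffn(-1.0, 2.0, 0.5, 3.0, 0.0))  # 0.0
```

The same two-layer map is applied to every position independently, which is why it is called "position-wise".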

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main training task used in the original BERT model?

Sentiment Analysis

Text Generation

Masked Language Modeling (MLM)

Next Sentence Prediction (NSP)
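Masked Language Modeling can be sketched as a corruption step: hide a fraction of tokens and keep the originals as labels for the model to predict. A toy version (real BERT masks about 15% of tokens and sometimes substitutes a random or unchanged token instead of [MASK]; this sketch only does the simple replacement):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=1):
    """Toy MLM corruption: replace some tokens, remember originals as labels."""
    rng = random.Random(seed)
    masked, labels = [], []
    for t in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)  # hide the token from the model
            labels.append(t)           # keep the original as the target
        else:
            masked.append(t)
            labels.append(None)        # no prediction needed here
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split())
print(masked)
```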

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the purpose of fine-tuning a pre-trained language model?

To adapt the model for a specific task with less data and time

To train the model from scratch

To reduce the dimension of the input

To ignore certain aspects of the input

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main difference between BERT-base and BERT-large models?

More layers in the decoder

Larger vocabulary size

More layers in the encoder

More blocks, more attention heads, and larger embedding dimensions
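The published configurations make the difference concrete (figures from the BERT paper, Devlin et al., 2019):

```python
# Published BERT configurations.
bert_base  = {"layers": 12, "heads": 12, "hidden": 768,  "params_millions": 110}
bert_large = {"layers": 24, "heads": 16, "hidden": 1024, "params_millions": 340}

# BERT-large scales every axis at once: more encoder blocks, more
# attention heads, and a larger embedding/hidden dimension.
for key in bert_base:
    assert bert_large[key] > bert_base[key]
print(bert_large["layers"] // bert_base["layers"])  # 2
```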
