SummerSchool-Quiz8

Authored by Irfan Ahmad

Computers

University

Used 2+ times

9 questions (7 shown below)

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Transformer models do not have recurrent units but can still perform sequence modeling.

True

False
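
For reference, transformers do drop recurrence entirely: self-attention lets every position attend to every other position in parallel, so no hidden state is carried step by step. A minimal NumPy sketch of scaled dot-product self-attention (the learned query/key/value projections are omitted for brevity; all names are illustrative):

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a whole sequence at once.

    X: (seq_len, d_model) matrix of token embeddings.
    Every position attends to every other position in parallel,
    so no recurrent state is needed to model the sequence.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                               # weighted mix of all positions

rng = np.random.default_rng(0)
out = self_attention(rng.normal(size=(5, 8)))   # 5 tokens, 8-dim embeddings
print(out.shape)                                # (5, 8)
```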

2.

MULTIPLE CHOICE QUESTION

45 sec • 1 pt

As the number of training examples goes to infinity, your model will have:

Low bias

High bias

Same bias

Depends on the model’s variance
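
For context, the reasoning behind this question is the standard bias-variance decomposition: as the training-set size grows, the variance term typically shrinks toward zero, while the bias term, a property of the model class itself, is unchanged.

```latex
\underbrace{\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]}_{\text{expected error}}
= \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
```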

3.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

Compared to an encoder-decoder model that does not use an attention mechanism, we expect the attention model to have the greatest advantage when:

The input sequence length is large.

The input sequence length is small.

The vocabulary size is large.

The vocabulary size is small.
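
The intuition here is the fixed-length bottleneck: a plain encoder-decoder must squeeze the entire input into one summary vector, whereas attention lets each decoder step re-weight all encoder states. A minimal NumPy sketch of one dot-product attention step (all names are illustrative):

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """One decoder step of dot-product attention.

    Rather than relying on a single fixed-length summary vector,
    the decoder re-weights ALL encoder states at every step, so no
    information is lost to a bottleneck as the input grows longer.
    """
    scores = encoder_states @ decoder_state   # (T,) one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over the T input positions
    return weights @ encoder_states           # (d,) context vector

rng = np.random.default_rng(1)
enc = rng.normal(size=(100, 16))   # T = 100 encoder hidden states
dec = rng.normal(size=16)          # current decoder hidden state
print(attention_context(dec, enc).shape)   # (16,)
```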

4.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

You have a friend whose mood is heavily dependent on the current and past few days’ weather. You’ve collected data for the past 365 days on the weather, which you represent as a sequence x<1>, …, x<365>. You’ve also collected data on your friend’s mood, which you represent as y<1>, …, y<365>. You’d like to build a model to map from x→y. Should you use a unidirectional RNN or a bidirectional RNN for this problem?

Bidirectional RNN, because this allows the prediction of mood on day t to take into account more information

Bidirectional RNN, because this allows backpropagation to compute more accurate gradients

Unidirectional RNN, because the value of y<t> depends only on x<1>,…,x<t>, but not on x<t+1>,…,x<365>

Unidirectional RNN, because the value of y<t> depends only on x<t>, and not on other days
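
One way to check the distinction empirically: in a bidirectional RNN the output at time t changes when a later input changes, while in a unidirectional RNN it cannot. A small PyTorch sketch (assuming PyTorch is available; untrained weights, purely illustrative):

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 10, 4)   # (batch, time, features): 10 "days" of weather
x2 = x.clone()
x2[0, 7] += 1.0             # perturb a FUTURE day (t = 7)

for bidir in (False, True):
    rnn = torch.nn.RNN(input_size=4, hidden_size=8,
                       batch_first=True, bidirectional=bidir)
    with torch.no_grad():
        y1, _ = rnn(x)
        y2, _ = rnn(x2)
    # Does the output at an EARLIER step (t = 3) feel the change at t = 7?
    changed = not torch.allclose(y1[0, 3], y2[0, 3])
    print(f"bidirectional={bidir}: output at t=3 affected by input at t=7? {changed}")
```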

5.

MULTIPLE SELECT QUESTION

1 min • 2 pts

In beam search, if you increase the beam width, which of the following would you expect to be true?

Beam search will run more slowly

Beam search will use up more memory

Beam search will generally find better solutions

Beam search will converge after fewer steps

Beam search will run much faster as more options can be considered
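
A toy beam search sketch follows, with a made-up next-token model standing in for a real decoder; note how the number of stored hypotheses and the per-step work both grow with the beam width:

```python
import math

def beam_search(step_logprobs, beam_width, length):
    """Toy beam search: keep the `beam_width` best-scoring prefixes.

    `step_logprobs(prefix)` -> {token: log-prob} is a stand-in for a
    real sequence model. A wider beam stores and expands more
    hypotheses per step (more time and memory) and generally finds
    higher-probability outputs.
    """
    beams = [([], 0.0)]   # (prefix, total log-prob)
    for _ in range(length):
        candidates = [
            (prefix + [tok], score + lp)
            for prefix, score in beams
            for tok, lp in step_logprobs(prefix).items()
        ]
        # Keep only the best `beam_width` hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

def toy_model(prefix):
    # Hypothetical distribution that depends slightly on the prefix.
    probs = {"a": 0.5, "b": 0.3, "c": 0.2}
    if prefix and prefix[-1] == "a":
        probs = {"a": 0.1, "b": 0.6, "c": 0.3}
    return {t: math.log(p) for t, p in probs.items()}

for prefix, score in beam_search(toy_model, beam_width=2, length=3):
    print("".join(prefix), round(score, 3))
```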

6.

MULTIPLE CHOICE QUESTION

45 sec • 1 pt

How does the decoder module of the transformer model avoid attending to tokens that have not yet appeared in the output sequence?

Multi-head attention

Positional encoding

Self attention

Masking future positions
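
For reference, the standard mechanism is to add −∞ to the attention scores at future positions before the softmax, so those positions receive zero weight. A minimal NumPy sketch:

```python
import numpy as np

def causal_attention_weights(scores):
    """Mask future positions before the softmax (decoder self-attention).

    scores: (T, T) raw attention scores; position i may only attend
    to positions j <= i, so entries with j > i are set to -inf.
    """
    T = scores.shape[0]
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # True strictly above diagonal
    masked = np.where(mask, -np.inf, scores)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
w = causal_attention_weights(rng.normal(size=(4, 4)))
print(np.round(w, 2))   # lower-triangular weights; each row sums to 1
```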

7.

MULTIPLE CHOICE QUESTION

45 sec • 1 pt

Which concept in the transformer injects sequence-order information into the input tokens?

Multi-head attention

Positional encoding

Self attention

Masking future positions before the softmax step
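
For reference, the sinusoidal positional encoding from “Attention Is All You Need” is added to the token embeddings so the otherwise order-agnostic attention layers can distinguish positions. A NumPy sketch (assumes an even d_model):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))

    Added to token embeddings so attention layers can tell positions apart.
    """
    pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]     # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)              # even dimensions
    pe[:, 1::2] = np.cos(angles)              # odd dimensions
    return pe

print(sinusoidal_positional_encoding(seq_len=6, d_model=8).shape)   # (6, 8)
```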
