What is the main purpose of Batch Normalization?

Stabilize and accelerate training
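
As a rough illustration of the answer above, the following sketch normalizes each feature of a mini-batch to zero mean and unit variance, which is the core step of Batch Normalization. The function name `batch_norm`, the scalar `gamma`/`beta` defaults, and the toy batch are assumptions made for this example; real layers learn one `gamma` and `beta` per feature and keep running statistics for inference.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize every feature (column) of a mini-batch, then scale and shift."""
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        for i in range(n):
            # eps guards against division by zero for constant features
            out[i][j] = gamma * (column[i] - mean) / math.sqrt(var + eps) + beta
    return out

# Two features on very different scales (illustrative values)
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm(batch)
```

After normalization both columns live on the same scale, so later layers see inputs with a similar distribution from batch to batch.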

What kind of regularization does dropout provide?

Randomly ignoring units during training
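
A minimal sketch of the idea, assuming the common "inverted dropout" variant in which surviving activations are rescaled during training so that nothing needs to change at inference time. The function name `dropout` and its parameters are illustrative, not taken from the quiz.

```python
import random

def dropout(activations, drop_prob=0.5, training=True, rng=random):
    """Zero each unit with probability drop_prob during training only."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled by 1/keep_prob so the expected activation is unchanged
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10
train_out = dropout(hidden, drop_prob=0.5, training=True, rng=rng)
eval_out = dropout(hidden, training=False)
```

Because a different random subset of units is ignored on every training step, no single unit can be relied on exclusively, which is where the regularizing effect comes from.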

What is ensemble learning?

Combining multiple models for better performance
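
One simple way to combine models is majority voting over their predicted labels. The sketch below assumes three hypothetical classifiers whose predictions are already available as lists; `majority_vote` is a name chosen for this example.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of labels per model, all of the same length."""
    votes_per_sample = zip(*predictions)
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_sample]

# Hypothetical predictions from three models on four samples
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

Each model is wrong on a different sample here, so the vote can recover from individual mistakes. Averaging predicted probabilities is another common choice.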

What is a hyperparameter?

A parameter set before training begins

A parameter adjusted by backpropagation

A random number assigned to inputs

How is hyperparameter optimization commonly performed?

Random sampling within a defined range and evaluating results

Using a single run with fixed values

Increasing them until loss vanishes
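
The random-sampling approach can be sketched as follows. Everything here is an assumption for illustration: `validation_loss` is a made-up stand-in for "train a model and measure its validation loss", and the search ranges for the learning rate and weight decay are arbitrary. Sampling the exponent uniformly (a log-uniform search) is a common choice when a hyperparameter spans several orders of magnitude.

```python
import math
import random

def validation_loss(lr, weight_decay):
    # Hypothetical objective with its minimum at lr=1e-2, weight_decay=1e-4
    return (math.log10(lr) + 2.0) ** 2 + (math.log10(weight_decay) + 4.0) ** 2

def random_search(num_trials=50, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        lr = 10 ** rng.uniform(-6, -1)            # sample on a log scale
        weight_decay = 10 ** rng.uniform(-8, -2)
        loss = validation_loss(lr, weight_decay)
        if best is None or loss < best[0]:
            best = (loss, lr, weight_decay)
    return best

best_loss, best_lr, best_wd = random_search()
```

In practice the range is usually narrowed around the best results and the search repeated.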

ML Chapter 06

University • 15 Qs

ML Chapter 06

Quiz • Computers • University • Practice Problem • Medium

Jhonston Benjumea

15 questions

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What does SGD stand for in neural network training?

Soft Gradient Descent

Stochastic Gradient Descent

Strong Graph Derivative

Semi-Gain Depth

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main idea behind Stochastic Gradient Descent (SGD)?

Using the full dataset for every update

Adding randomness to initialization

Updating weights using small random batches

Freezing weights during training
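
To make the mini-batch idea concrete, here is a small sketch that fits a single weight `w` in the model y = w * x. The data, learning rate, and batch size are illustrative assumptions. Each update uses the gradient of the squared error on a small, randomly chosen batch rather than on the full dataset.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, batch_size=4, epochs=200, seed=0):
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

xs = [i / 10 for i in range(1, 21)]   # 0.1, 0.2, ..., 2.0
ys = [3.0 * x for x in xs]            # noise-free targets with true weight 3
w = sgd_linear_fit(xs, ys)
```

Each step is cheap because it touches only four samples, at the cost of a noisier gradient estimate.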

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What problem does the Momentum method solve in SGD?

Overfitting

Vanishing gradient

Oscillations in gradient updates

Data imbalance
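
The sketch below compares plain gradient descent with the classical momentum update on an elongated quadratic bowl, f(x, y) = x^2 / 20 + y^2. The function, starting point, and step count are assumptions chosen for illustration. The velocity accumulates past gradients, which tends to damp zig-zagging across the steep direction and speed up progress along the shallow one.

```python
def loss(x, y):
    return x ** 2 / 20.0 + y ** 2

def grad(x, y):
    return x / 10.0, 2.0 * y

def run(momentum, lr=0.1, steps=200):
    x, y = -7.0, 2.0
    vx, vy = 0.0, 0.0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Velocity is a decaying sum of past (negative) gradients
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x += vx
        y += vy
    return loss(x, y)

plain_loss = run(momentum=0.0)      # momentum=0 reduces to plain gradient descent
momentum_loss = run(momentum=0.9)
```

With the same learning rate and number of steps, the momentum run ends much closer to the minimum on this function.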

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

How does AdaGrad adjust the learning rate?

Keeps it constant

Increases it exponentially

Adapts it for each parameter based on past gradients

Resets it every epoch
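
A minimal sketch of the AdaGrad update, assuming two parameters that receive constant gradients of very different magnitude (the values are made up). Each parameter keeps its own running sum of squared gradients, and its step is divided by the square root of that sum.

```python
import math

def adagrad_step(params, grads, cache, lr=0.1, eps=1e-7):
    """Update params in place; cache holds the per-parameter sum of squared gradients."""
    for i, g in enumerate(grads):
        cache[i] += g * g
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return params, cache

params = [1.0, 1.0]
cache = [0.0, 0.0]
history = [list(params)]
for _ in range(3):
    grads = [10.0, 0.1]   # hypothetical gradients on very different scales
    adagrad_step(params, grads, cache)
    history.append(list(params))
```

Both parameters move by almost the same amount despite the 100x difference in gradient size, and every later step is smaller than the one before because the accumulated sum only grows.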

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the main feature of the Adam optimizer?

Ignores momentum

Uses only recent gradients

Combines Momentum and AdaGrad

Requires no tuning
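
The following sketch applies the Adam update to a one-dimensional quadratic, (x - 3)^2, chosen only for illustration. It keeps a momentum-like running average of gradients (`m`) and a running average of squared gradients (`v`) that scales the step per parameter. Note that `v` is an exponential moving average, as in RMSProp, rather than AdaGrad's ever-growing sum; it plays the same adaptive-scaling role. The hyperparameter defaults below are commonly used values, not something stated in the quiz.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum-like)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (adaptive scaling)
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Gradient of (x - 3)^2 is 2 * (x - 3)
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Adam still has hyperparameters (`lr`, `beta1`, `beta2`), so it does not remove the need for tuning.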

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Why is initializing weights with a standard deviation of 0.01 sometimes problematic?

It slows down learning

It may cause vanishing gradients

It improves generalization

It speeds up convergence too much

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What is the Xavier initialization designed for?

ReLU activations

Linear regression

Layers with sigmoid/tanh activations

Binary classification
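
The two initialization questions above can be explored with a small experiment. The sketch passes random inputs through five tanh layers of 50 units each and measures the spread of the final activations, once with a fixed weight standard deviation of 0.01 and once with a Xavier-style standard deviation of 1 / sqrt(n). The layer sizes, batch size, and the use of tanh are assumptions made for this example.

```python
import math
import random

def final_activation_std(weight_std, n=50, layers=5, batch=20, seed=0):
    """Standard deviation of the last layer's activations in a tanh network."""
    rng = random.Random(seed)
    acts = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(batch)]
    for _ in range(layers):
        w = [[rng.gauss(0.0, weight_std) for _ in range(n)] for _ in range(n)]
        acts = [
            [math.tanh(sum(row[i] * w[i][j] for i in range(n))) for j in range(n)]
            for row in acts
        ]
    flat = [a for row in acts for a in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((a - mean) ** 2 for a in flat) / len(flat))

n = 50
small_std = final_activation_std(0.01, n=n)
xavier_std = final_activation_std(1.0 / math.sqrt(n), n=n)
```

With a standard deviation of 0.01 the activations shrink toward zero layer by layer, so the signal (and the gradient flowing back through it) becomes tiny, while the Xavier-style scale keeps the activations spread out across layers.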

ML Chapter 06

15 questions

What does SGD stand for in neural network training?

What is the main idea behind Stochastic Gradient Descent (SGD)?

What problem does the Momentum method solve in SGD?

How does AdaGrad adjust the learning rate?

What is the main feature of the Adam optimizer?

Why is initializing weights with a standard deviation of 0.01 sometimes problematic?

What is the Xavier initialization designed for?

What is the He initialization suitable for?

What is the main purpose of Batch Normalization?

What does weight decay help prevent?
