Which parameter is commonly used to help prevent a single Decision Tree from overfitting the training data?

Increasing the number of features

Using more categorical variables

A Random Forest builds multiple trees, each trained on a bootstrap sample. In this context, what is a bootstrap sample?

A random sample drawn with replacement from the training data.

A stratified subset that guarantees an exact 50/50 class balance.

A random sample drawn without replacement from the training data.

A subset of features rather than a subset of training examples.
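As a rough illustration of sampling with replacement, the sketch below draws a bootstrap sample using only Python's standard library. The function name `bootstrap_sample` and the toy `training_rows` list are hypothetical, chosen for this example and not taken from the quiz.

```python
import random


def bootstrap_sample(rows, seed=None):
    """Draw len(rows) items from rows, sampling with replacement."""
    rng = random.Random(seed)
    # random.Random.choices samples with replacement, so an item can repeat.
    return rng.choices(rows, k=len(rows))


training_rows = list(range(10))  # hypothetical row indices
sample = bootstrap_sample(training_rows, seed=0)

# Rows that were never drawn are often called "out-of-bag" rows.
out_of_bag = sorted(set(training_rows) - set(sample))
```

Because each draw is independent, the sample has the same size as the training data, but some rows may appear more than once and others may not appear at all.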

Recap Quiz

Authored by Timilehin Aderinola

Information Technology (IT)

University

10 questions

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

In the CRISP-DM methodology, what is the primary purpose of the Data Understanding phase?

To clean and transform raw data into a usable format

To explore and gather initial insights about the data through tables and visualizations

To build and evaluate machine learning models

To deploy the final model into production
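To make the idea of exploring data through tables concrete, here is a minimal sketch of a first-pass summary. The `records` list, its field names, and the `summarize` helper are all invented for illustration; a real project would more likely use a data-frame library and plots.

```python
import statistics
from collections import Counter

# Hypothetical toy dataset; the values are made up for this example.
records = [
    {"height_cm": 185, "position": "guard"},
    {"height_cm": 191, "position": "guard"},
    {"height_cm": 201, "position": "forward"},
    {"height_cm": 206, "position": "forward"},
    {"height_cm": 213, "position": "center"},
]


def summarize(values):
    """Return a small table of descriptive statistics for a numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


height_summary = summarize([r["height_cm"] for r in records])
position_counts = Counter(r["position"] for r in records)
```

Summaries like these are meant to build intuition about ranges, typical values, and category frequencies before any cleaning or modeling takes place.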

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

During data cleaning, why might you choose to clamp outliers rather than remove those rows entirely?

Clamping is required by most machine learning algorithms.

Clamping is faster computationally than deleting rows.

Clamping keeps every row and reduces the influence of extreme values, so the rest of the information in those rows is not lost.

Removing rows corrupts the original dataset permanently.
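The difference between clamping and dropping rows can be sketched as follows. The `clamp` helper, the bounds of 40 and 60, and the `salaries` values are assumptions made up for this example; in practice the bounds are often derived from percentiles or the interquartile range.

```python
def clamp(values, lower, upper):
    """Limit each value to the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


salaries = [42, 45, 47, 50, 52, 400]  # hypothetical data with one extreme value

# Clamping keeps all six rows; the extreme value is pulled in to the upper bound.
clamped = clamp(salaries, lower=40, upper=60)

# Dropping removes the row entirely, along with anything else recorded in it.
dropped = [v for v in salaries if 40 <= v <= 60]
```

Note that clamping does change the extreme value itself; what it preserves is the row and its remaining fields.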

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

When preparing data for a machine learning model, why is stratified sampling important when splitting your data into training and test sets?

It ensures both the training and test sets have similar class distributions for the target variable.

It guarantees equal numbers of every class in each set.

It automatically normalizes the numeric features.

It prevents the model from overfitting to the majority class.
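One way to picture a stratified split is to split each class separately and then combine the pieces. The sketch below does this with the standard library; `stratified_split`, the 20% test fraction, and the 90/10 toy labels are assumptions for illustration, not a reference implementation.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions kept similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


labels = [0] * 90 + [1] * 10  # hypothetical imbalanced target
train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
```

With these toy labels, both sets end up with 10% positives, which matches the original distribution without forcing the classes to be equal in size.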

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

A basketball player's weight is standardized using a z-score transformation. If a player's weight is exactly equal to the mean weight of the dataset, what will their standardized value be?

-1

0.5
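As a worked example of the z-score transformation, z = (x - mean) / standard deviation, the sketch below standardizes a few weights. The `weights_kg` values are invented, and the population standard deviation is used here as one possible convention.

```python
import statistics

weights_kg = [90, 100, 110]  # hypothetical player weights
mean = statistics.mean(weights_kg)
std = statistics.pstdev(weights_kg)  # population standard deviation


def z_score(x):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


at_mean = z_score(mean)   # numerator is zero, so the result is zero
above_mean = z_score(110)
below_mean = z_score(90)
```

Whatever the spread of the data, a value equal to the mean makes the numerator zero, so its standardized value is zero.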

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

A linear regression model's predictions are evaluated, giving a Mean Absolute Error (MAE) of 25 and a Root Mean Squared Error (RMSE) of 45. Why is the RMSE noticeably larger than the MAE?

MAE cannot exceed RMSE by definition.

RMSE uses entirely different units of measurement than MAE.

There was likely a calculation error during evaluation.

RMSE penalizes large errors more heavily due to the squaring of the residuals.
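The effect of squaring can be seen by comparing two sets of residuals with the same MAE. The residual lists below are made up for illustration and are not the data behind the figures in the question.

```python
import math


def mae(errors):
    """Mean of the absolute residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Square root of the mean of the squared residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


even_errors = [10, -10, 10, -10]  # hypothetical: every error has the same size
one_big_error = [0, 0, 0, 40]     # hypothetical: one large miss

even_mae, even_rmse = mae(even_errors), rmse(even_errors)
big_mae, big_rmse = mae(one_big_error), rmse(one_big_error)
```

Both lists have an MAE of 10, but the RMSE rises from 10 to 20 when the error is concentrated in a single large miss. A wide gap between RMSE and MAE therefore suggests that a few predictions are far off.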

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

What is the primary output of a logistic regression model before any decision threshold is applied?

A probability score for class membership

A definitive class label (e.g., Class 0 or Class 1)

The distance to the nearest cluster centroid

A continuous numeric prediction with no upper bound
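The two stages, probability first and label second, can be sketched for a single feature. The weight, bias, and threshold values are hypothetical; a fitted model would learn the weight and bias from data.

```python
import math


def predict_proba(x, weight, bias):
    """Sigmoid of the linear score: a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def predict_label(x, weight, bias, threshold=0.5):
    """Turn the probability into a class label by applying a threshold."""
    return 1 if predict_proba(x, weight, bias) >= threshold else 0


# Hypothetical parameters, chosen only for illustration.
p = predict_proba(2.0, weight=1.0, bias=0.0)
label_default = predict_label(2.0, weight=1.0, bias=0.0)
label_strict = predict_label(2.0, weight=1.0, bias=0.0, threshold=0.95)
```

The same probability can map to different labels depending on the threshold, which is why the threshold is treated as a separate decision from the model's output.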

MULTIPLE CHOICE QUESTION

20 sec • 1 pt

In an imbalanced classification setting (e.g., a dataset with 95% negative cases and 5% positive cases), why can evaluating the model solely on "Accuracy" be misleading?

Accuracy cannot be computed for binary classification tasks.

Accuracy ignores class distribution, so a model can appear highly accurate simply by predicting the majority class every time.

Accuracy requires probabilities rather than discrete labels.

Accuracy is only a valid metric when using cross-validation.
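Using the 95% / 5% split from the question, the sketch below scores a hypothetical model that always predicts the negative class. The variable names are illustrative only.

```python
y_true = [0] * 95 + [1] * 5  # 95% negative, 5% positive
y_pred = [0] * 100           # always predict the majority class

# Fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of actual positives that the model found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)
```

Accuracy comes out at 0.95 even though the model never identifies a single positive case, so recall is 0.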

In a medical screening context where failing to detect a disease is highly dangerous, which type of error is usually the most critical to minimize?
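For a screening setting, the error that corresponds to a missed disease is the false negative. The sketch below counts each outcome type for a small set of made-up labels; `confusion_counts` and the data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1  # disease present but not detected
        elif t == 0 and p == 1:
            counts["fp"] += 1  # healthy but flagged
        else:
            counts["tn"] += 1
    return counts


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical ground truth
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # hypothetical screening results
counts = confusion_counts(y_true, y_pred)

# Recall (sensitivity) falls as false negatives increase.
recall = counts["tp"] / (counts["tp"] + counts["fn"])
```

Here two of the four actual cases are missed, giving a recall of 0.5. Screening applications typically prioritize keeping this kind of error low, even at the cost of more false positives.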
