Intro to ML: Unsupervised Learning


University

10 Qs


Assessment: Quiz

Subjects: Mathematics, Computers, Fun

Difficulty: Hard

Created by Josiah Wang


1.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

Does one expect two runs of k-means clustering to produce the same clustering results?

yes

no

Answer explanation

No. k-means is sensitive to initialisation: the centroids start at randomly chosen positions in the data space, so two runs can converge to different clusterings.
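To see the effect of random initialisation, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm) in which the starting centroids are k datapoints drawn with a seeded random generator. The function name `kmeans`, its `seed` parameter and the toy data are illustrative assumptions, not taken from the course slides. Runs with the same seed are identical, but different seeds may end in different clusterings, which is why a common practice is to restart several times and keep the run with the lowest within-cluster sum of squares (SSE).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Hypothetical Lloyd-style k-means with seeded random initialisation.

    Returns (assignments, centroids, sse).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialisation: k distinct datapoints
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # assignments unchanged: converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        centroids = new_centroids
    # Within-cluster sum of squared distances (lower is better).
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return assignments, centroids, sse


# Hypothetical toy data: three small groups of two points each.
points = [(0, 0), (0, 1), (4, 0), (4, 1), (8, 0), (8, 1)]

# One run per seed; runs with different seeds may end in different clusterings.
runs = [kmeans(points, 2, seed=s) for s in range(10)]
# Restart strategy: keep the run with the lowest SSE.
best = min(runs, key=lambda r: r[2])
```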

2.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

Is it possible that the assignment of observations to clusters doesn’t change between successive iterations in K-Means?

yes

no

can't say

Answer explanation

Yes! Each centroid is updated to the average position of the datapoints assigned to it in the previous iteration. If a centroid update does not cause any datapoint to switch clusters, the assignments stay the same and the centroids will not move again: this is exactly the point at which k-means has converged.
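The stopping rule can be sketched as follows: iterate until an assignment step reproduces the previous assignments. The helpers (`lloyd`, `update_centroids`) and the toy points are hypothetical; on this data the assignments change once before settling on the third pass.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of its assigned points (empty clusters stay put)."""
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == j]
        updated.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return updated


def lloyd(points, centroids, max_iter=100):
    """Lloyd iterations from the given centroids; returns (centroids, assignments, iterations)."""
    assignments = None
    for iteration in range(1, max_iter + 1):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            # No point switched cluster, so recomputing the means would
            # give the same centroids again: stop.
            return centroids, assignments, iteration
        assignments = new_assignments
        centroids = update_centroids(points, assignments, centroids)
    return centroids, assignments, max_iter


# Hypothetical data and starting centroids.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
centroids, assignments, iterations = lloyd(points, [(0, 0), (1, 0)])
```

Here the first pass assigns (1, 0), (10, 0) and (11, 0) to the same cluster; after one centroid update, (1, 0) switches over, and the third pass leaves every assignment unchanged.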

3.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

True or False: the larger the number of centroids in K-means, the less likely the model is to overfit.

True

False

Answer explanation

If you keep increasing the number of centroids, at some point K will equal the number of datapoints, and each data instance will be assigned its own cluster. You will then be fitting the spurious noise, not the underlying trend of the data! More centroids make overfitting more likely, not less; the challenge with k-means is picking the right number of centroids for the problem.
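A small hypothetical example of the overfitting point: placing one centroid on every training point (K equal to the number of datapoints) gives a training score of exactly 0, yet a held-out validation set scores worse than with two sensibly placed centroids. The centroids here are set by hand for illustration rather than fitted, and `mean_nearest_distance` is an assumed helper.

```python
import math


def mean_nearest_distance(points, centroids):
    """Mean Euclidean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


# Hypothetical toy data with two obvious groups.
train = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
validation = [(0.5, 0.0), (10.5, 0.0)]

# K = n: one centroid on every training point.
overfit = train
# K = 2: one centroid at the mean of each group (placed by hand).
sensible = [(0.5, 0.0), (10.5, 0.0)]

train_overfit = mean_nearest_distance(train, overfit)     # 0.0: perfect fit to training data
val_overfit = mean_nearest_distance(validation, overfit)  # 0.5
train_sensible = mean_nearest_distance(train, sensible)   # 0.5
val_sensible = mean_nearest_distance(validation, sensible)  # 0.0
```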

4.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

True or False: the initial position of the centroids does not affect the final result of K-Means.

True

False

Answer explanation

False. Because each centroid is simply moved to the average position of its assigned datapoints, there is no guarantee of converging to the global optimum. Instead, k-means converges to a local minimum, and which one it reaches depends on how the centroids were initialised.
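The dependence on initialisation can be reproduced on four hypothetical points at the corners of a 4 by 1 rectangle. Starting Lloyd's iterations from two different sets of centroids, both runs converge, but to different clusterings: one with an SSE of 1.0 and one stuck at 16.0. The `lloyd` helper is an illustrative sketch, not a library function.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centroids, max_iter=100):
    """Run Lloyd iterations from fixed starting centroids; return (centroids, sse)."""
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        centroids = updated
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, sse


# Hypothetical data: corners of a 4 by 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initialisation A: centroids at the left and right ends -> left/right split.
centroids_a, sse_a = lloyd(points, [(0, 0), (4, 0)])
# Initialisation B: centroids on the bottom and top edges -> bottom/top split.
centroids_b, sse_b = lloyd(points, [(2, 0), (2, 1)])
```

Run B is a fixed point of the algorithm: no point is closer to the other centroid, so it never escapes to the much better left/right split.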

5.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

A student has applied the k-means algorithm to an unsupervised problem. On analysis, they find that the mean distance between the data instances and the cluster centres to which they are assigned is 0. What does this mean?

That the chosen value of k must equal the true number of clusters

That the chosen value of k must at least equal the number of datapoints

That this specific configuration (i.e. position) of k centroids is optimal for this dataset

None of these

Answer explanation

Assuming that no two datapoints have identical attributes, any centroid with more than one datapoint assigned to it has a positive mean distance to those datapoints. A mean distance of 0 therefore requires every datapoint to sit exactly on its own centroid, so k must be at least the number of datapoints.
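The reasoning can be written out as a short derivation. The notation is assumed here rather than taken from the slides: x_1, ..., x_n are the datapoints, μ_1, ..., μ_k the centroids, and c(i) the index of the centroid assigned to x_i.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert = 0
\quad\Longrightarrow\quad
\lVert x_i - \mu_{c(i)} \rVert = 0 \text{ for all } i
\quad\Longrightarrow\quad
x_i = \mu_{c(i)} \text{ for all } i
```

The first step holds because every term in the sum is non-negative. If the datapoints are distinct, then for i ≠ j we have μ_{c(i)} = x_i ≠ x_j = μ_{c(j)}, so c(i) ≠ c(j): the n datapoints need n different centroids, giving k ≥ n.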

6.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

[Figure: mean distance between validation datapoints and the nearest centroid, plotted against K]

The K-means algorithm was executed several times with different values of K. The mean distance between validation datapoints and the nearest centroid was calculated and plotted. From this plot determine the best value for K.

1

3

4

6

9

Answer explanation

Check out the 'Elbow' method in the slides. The point where the decline in the score sharply plateaus as K increases suggests where you stop modelling the true underlying clusters of the data and start modelling noise.
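Reading the elbow off the plot is usually done by eye, but one simple heuristic (an assumption here, not necessarily the method in the slides) picks the K with the largest discrete second difference of the score curve. The scores below are made-up values for K = 1 to 5, not the data behind the quiz plot, and the sketch assumes K increases in unit steps.

```python
def pick_elbow(ks, scores):
    """Return the K at the sharpest bend of a decreasing score curve.

    Uses the largest discrete second difference
    scores[i-1] - 2*scores[i] + scores[i+1]; returns None if fewer
    than three scores are given.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(scores) - 1):
        bend = scores[i - 1] - 2 * scores[i] + scores[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical mean distances for K = 1..5 (not the values from the quiz plot).
ks = [1, 2, 3, 4, 5]
scores = [9.0, 3.0, 2.5, 2.2, 2.0]
elbow = pick_elbow(ks, scores)  # sharpest bend at K = 2
```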

7.

MULTIPLE SELECT QUESTION

45 sec • 1 pt

Which of the following are limitations of the k-means algorithm?

It is sensitive to outliers

It is sensitive to initialisation

It has exponential time complexity with dataset size

It is not suitable for datasets containing non-hyper-ellipsoidal clusters

None of the above

Answer explanation

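Check the slides! Outliers drag centroids away from the bulk of their cluster, the result depends on the initialisation, and assigning points to the nearest centroid favours compact, convex clusters, so non-hyper-ellipsoidal shapes such as rings or crescents are handled poorly. Each iteration of the standard algorithm costs O(nkd) for n datapoints, k centroids and d features, which is linear, not exponential, in the dataset size.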
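The outlier sensitivity comes from the update step, which moves each centroid to the mean of its points. The sketch below compares that mean with a coordinate-wise median (the update used by k-medians, a different algorithm shown only for contrast) on a hypothetical cluster with one far outlier.

```python
import statistics


def mean_centroid(points):
    """Coordinate-wise mean: the k-means centroid update."""
    return tuple(statistics.fmean(c) for c in zip(*points))


def median_centroid(points):
    """Coordinate-wise median, as used by k-medians (shown for contrast)."""
    return tuple(statistics.median(c) for c in zip(*points))


# Hypothetical cluster: unit square corners, plus one far outlier.
cluster = [(0, 0), (1, 0), (0, 1), (1, 1)]
with_outlier = cluster + [(100, 100)]

print(mean_centroid(cluster))         # (0.5, 0.5)
print(mean_centroid(with_outlier))    # (20.4, 20.4): dragged towards the outlier
print(median_centroid(with_outlier))  # (1, 1): barely moves
```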
