Exam Questions

Assessment

Flashcard

Mathematics

KG

Hard

Created by

Wyatt Beals

16 questions

1.

FLASHCARD QUESTION

Front

Back

SVMs don’t aim for the sharpest slope of the separating hyperplane; they aim for the maximum margin, meaning the boundary that is as far as possible from the closest points (the support vectors). A steep slope might fit the training data too tightly and keep training points very close to the margin, limiting generalizability. (Either of these answers is acceptable for full credit.)
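The margin idea above can be checked numerically. This is a minimal sketch with made-up 2-D data and two hand-picked hyperplanes (the weights and intercepts are illustrative, not from the flashcard): a steep boundary that hugs the data has a smaller margin than a balanced one.

```python
import numpy as np

# Toy linearly separable data (illustrative values).
X = np.array([[0., 0.], [1., 1.], [4., 4.], [5., 3.]])
y = np.array([-1, -1, 1, 1])

def margin(w, b):
    """Smallest distance from any point to the hyperplane w.x + b = 0
    (valid when the hyperplane separates the classes correctly)."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# A steep boundary vs. a max-margin-style boundary:
steep = margin(np.array([10., 1.]), -25.)    # passes close to the data
balanced = margin(np.array([1., 1.]), -5.)   # equidistant from both classes
print(steep, balanced)
```

The balanced hyperplane achieves the larger margin, which is exactly the quantity an SVM maximizes.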

2.

FLASHCARD QUESTION

Front

In my lab, we study the use of gestures in collaborative problem-solving tasks. We gathered 4 hours of audio-visual data, consisting of 10 groups of 3 people each collaboratively solving a problem involving physical objects, and developed a random forest method to detect when any participant in the data is performing a gesture of interest (such as pointing, pinching, or grabbing). There are two possible ways we could evaluate this gesture classifier: (a) pool the samples, randomly shuffle them, split them into 10 folds, and perform a rotating stratified 90:10 10-fold cross-validation; or (b) perform a rotating stratified 10-fold cross-validation using each group in turn as the test group.

Which of these is a better way to evaluate if I want to understand the robustness of my classifier to unseen data, and why?

Back

(b) is better, because our method needs to work on entire groups that it has not seen during training. Pooling the samples as in (a) means that in each fold, the model is likely to see at least some of the same gestures by the same people during training as it does during cross-validation.
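Option (b) can be sketched as a leave-one-group-out split, where every sample from a group lands on the same side of each split. The group labels below are illustrative stand-ins for the 10 real groups; scikit-learn's GroupKFold implements the same idea.

```python
import numpy as np

# sample -> group id (illustrative; the study had 10 groups of 3 people)
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

def group_folds(groups):
    """Yield (train_idx, test_idx) with each group used once as the test set."""
    for g in np.unique(groups):
        yield np.where(groups != g)[0], np.where(groups == g)[0]

for train_idx, test_idx in group_folds(groups):
    # No group ever appears on both sides, unlike pooled shuffling in (a).
    assert not set(groups[train_idx]) & set(groups[test_idx])
```

Pooled shuffling, by contrast, almost guarantees that gestures from every person appear in both the training and test portions of each fold.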

3.

FLASHCARD QUESTION

Front

Back

In step 6, the test data is being normalized using the test means. This means that test points are no longer being represented using values that are meaningful relative to the training distribution.

4.

FLASHCARD QUESTION

Front

Intuitively, why are individual decision trees brittle and sensitive to individual feature values? What do random forests do that alleviates this limitation? What mechanism can be used in a random forest to come to a final decision?

Back

Individual decision trees split based on hard thresholds, so small changes in feature values can result in a completely different tree. Random forests build many different trees by subsampling different features out of the data. Random forests come to a final decision by majority voting for classification problems and averaging for regression problems.
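The two aggregation mechanisms in the answer can be sketched in a few lines; the per-tree predictions below are illustrative stand-ins for the outputs of real trees.

```python
from collections import Counter

tree_predictions = ["cat", "dog", "cat", "cat", "dog"]  # illustrative votes

def majority_vote(votes):
    """Classification: the label predicted by the most trees wins."""
    return Counter(votes).most_common(1)[0][0]

def average(preds):
    """Regression: the forest's prediction is the mean of the trees'."""
    return sum(preds) / len(preds)

print(majority_vote(tree_predictions))   # cat
print(average([2.0, 2.5, 3.0]))          # 2.5
```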

5.

FLASHCARD QUESTION

Front

Why can this function not be used as an activation function in a multilayer perceptron neural network (hint: think about how we have to incorporate activation functions when calculating gradient descent)?

Back

This function cannot theoretically be used within gradient descent because it is not differentiable (at x = 0), and cannot practically be used because its derivative at all points where x ≠ 0 is equal to 0.
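This can be verified numerically, assuming the function in question is the Heaviside step function: away from the jump at x = 0, every estimated derivative is exactly 0, so backpropagation would receive no gradient signal.

```python
def step(x):
    """Heaviside step function (assumed to be the function in the question)."""
    return 1.0 if x > 0 else 0.0

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-2.0, -0.5, 0.5, 2.0):
    # Everywhere except the jump, the gradient is exactly 0 -> no learning.
    print(numerical_derivative(step, x))
```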

6.

FLASHCARD QUESTION

Front

What function h(x) can be used as an activation in a neural network, but has similar properties as the step function (e.g., bounds on h(x)) as x → ±∞?

Back

The sigmoid function.
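A quick check of the claimed properties: the sigmoid is smooth and differentiable everywhere, yet shares the step function's bounds, approaching 0 as x → −∞ and 1 as x → +∞.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: smooth, with the same asymptotic bounds as the step."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-50))  # very close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(50))   # very close to 1
```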

7.

FLASHCARD QUESTION

Front

What common activation function that we discussed in class has a derivative that is the step function?

Back

ReLU.
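A minimal sketch of why: ReLU(x) = max(0, x), so its derivative is 0 for x < 0 and 1 for x > 0 (undefined at x = 0), which is exactly the step function.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU away from x = 0 -- i.e., the step function."""
    return 1.0 if x > 0 else 0.0

for x in (-3.0, -0.1, 0.1, 3.0):
    print(relu_grad(x))  # 0 on the left of the origin, 1 on the right
```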
