1. In supervised learning, using dropout as a regularization technique during training helps prevent overfitting by temporarily removing randomly selected units and their connections from the model.
true
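A minimal numpy sketch of the mechanism (the keep probability, mask, and activation values below are illustrative, not from the quiz): units are zeroed at random during training and the survivors rescaled, while at test time every unit is kept.

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(activations, p_drop=0.5, training=True):
        # Inverted dropout: zero out units at random during training and
        # rescale the survivors so the expected activation is unchanged.
        if not training:
            return activations  # at test time all units are kept
        mask = rng.random(activations.shape) >= p_drop
        return activations * mask / (1.0 - p_drop)

    h = np.array([0.2, 1.5, -0.7, 3.0])
    print(dropout(h, training=True))   # some entries zeroed, rest rescaled
    print(dropout(h, training=False))  # unchanged at test time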
2. For learning models that employ gradient descent, adding a regularization term to the loss will require a change to the gradient used by the update rule.
true
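For example, an L2 penalty lambda * ||w||^2 adds a weight-decay term 2 * lambda * w to the gradient. A hedged numpy sketch of the modified update (the data, learning rate, and penalty strength are invented for illustration):

    import numpy as np

    X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    y = np.array([1.0, 2.0, 3.0])
    w = np.zeros(2)
    lr, lam = 0.01, 0.1

    for _ in range(100):
        residual = X @ w - y
        grad_mse = 2 * X.T @ residual / len(y)  # gradient of the data term
        grad_l2 = 2 * lam * w                   # extra term from the L2 penalty
        w -= lr * (grad_mse + grad_l2)          # update rule uses both parts

    print(w)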
3. MAE gives higher penalties to larger errors compared to MSE.
false
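A quick numeric check with arbitrary error values shows why: squaring makes MSE, not MAE, blow up on the large error.

    import numpy as np

    errors = np.array([0.5, 1.0, 10.0])   # one large error among small ones
    print("MAE:", np.abs(errors).mean())  # ~3.83: grows linearly with error size
    print("MSE:", (errors ** 2).mean())   # 33.75: dominated by the squared outlier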
4. A feedforward neural network with one hidden layer containing a sufficient number of neurons can approximate any continuous function on a closed and bounded interval, given appropriate activation functions.
true
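The approximator the theorem refers to has the form f(x) = sum_i v_i * sigma(w_i * x + b_i). A tiny numpy evaluation of that form (the weights here are arbitrary placeholders, not trained values):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer with 3 neurons: f(x) = sum_i v_i * sigmoid(w_i * x + b_i)
    w = np.array([1.0, -2.0, 0.5])   # input-to-hidden weights (arbitrary)
    b = np.array([0.0, 1.0, -0.5])   # hidden biases (arbitrary)
    v = np.array([2.0, -1.0, 0.3])   # hidden-to-output weights (arbitrary)

    def f(x):
        return sigmoid(w * x + b) @ v

    print(f(0.7))  # with enough neurons and trained weights, f can match
                   # any continuous target on a closed, bounded interval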
5. L2 is a regularization technique that is applied only during the training phase of supervised learning, while all neurons are utilized during the testing phase.
true
6. Which of the following algorithms are supervised learning methods?
Support Vector Machines (SVM)
Linear Regression
Gradient Boosting Machines
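All three learn from labeled pairs (X, y). A hedged scikit-learn sketch (the toy data is invented): each model's fit() call requires labels, which is what makes it supervised.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import GradientBoostingClassifier

    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y_class = np.array([0, 0, 1, 1])          # labels for the classifiers
    y_real = np.array([0.1, 0.9, 2.1, 2.9])   # targets for the regressor

    # Each model is supervised: fit() takes both features and labels.
    print(SVC().fit(X, y_class).predict([[1.5]]))
    print(GradientBoostingClassifier().fit(X, y_class).predict([[1.5]]))
    print(LinearRegression().fit(X, y_real).predict([[1.5]]))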
7. Which of the following statements about autoencoders and PCA are not true?
PCA does not require labeled data. It uses eigenvectors/eigenvalues from the covariance matrix, which is purely unsupervised.
PCA is a supervised learning method, whereas autoencoders are always unsupervised.
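A minimal numpy sketch of why PCA is unsupervised: the whole computation touches only the feature matrix, never any labels (the data here is random, purely for illustration).

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))

    Xc = X - X.mean(axis=0)                  # center the features
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition, no labels involved

    # Project onto the top principal component (largest eigenvalue).
    top_pc = eigvecs[:, -1]
    scores = Xc @ top_pc
    print(scores[:5])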
8. Which of the following are not characteristics of the Naive Bayes classifier?
It requires large amounts of training data to perform well.
It is sensitive to the correlation between features.
It doesn't perform well with a small amount of training data, especially in text classification tasks.
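In fact Naive Bayes is a standard baseline for small text datasets. A hedged scikit-learn sketch (the four-document corpus and its labels are made up):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    texts = ["free prize now", "meeting at noon",
             "win a free prize", "lunch meeting today"]
    labels = [1, 0, 1, 0]   # 1 = spam, 0 = not spam (toy labels)

    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    model = MultinomialNB().fit(X, labels)   # trains fine on just four documents
    print(model.predict(vec.transform(["free prize inside"])))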
9. Which of the following statements are not true about the softmax layer?
The softmax layer is typically applied to hidden layers in deep networks to introduce nonlinearity.
The softmax function ensures that all outputs are non-negative and bounded between -1 and 1.
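For contrast, softmax actually maps logits to values in (0, 1) that sum to 1, and it is applied at the output layer. A quick numpy check (the logits are invented):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())   # subtract the max for numerical stability
        return e / e.sum()

    logits = np.array([2.0, -1.0, 0.5])
    p = softmax(logits)
    print(p)         # every entry lies in (0, 1), none negative
    print(p.sum())   # 1.0: a valid probability distribution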
10. Which of the following statements are true about the sigmoid activation function? (8 pts)
Sigmoid outputs values between 0 and 1, making it suitable for binary classification tasks.
Sigmoid can cause the vanishing gradient problem because its gradients become very small for large input values.
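Both statements follow from sigma(x) = 1 / (1 + e^(-x)) and its derivative sigma'(x) = sigma(x) * (1 - sigma(x)). A numpy check at a few arbitrary input values:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.array([-10.0, 0.0, 10.0])
    s = sigmoid(x)
    print(s)            # outputs squashed into (0, 1)
    print(s * (1 - s))  # gradient: ~4.5e-5 at |x| = 10, i.e. nearly vanished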
11. Cross-entropy loss:
Is commonly used for classification
Measures the difference between true labels and predicted probabilities
Is minimized when predicted probabilities match the true distribution
Can be used with softmax outputs
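All four points are visible in the definition H(y, p) = -sum_i y_i * log(p_i). A numpy sketch on an invented three-class example:

    import numpy as np

    y_true = np.array([0.0, 1.0, 0.0])      # one-hot true label
    p_good = np.array([0.05, 0.90, 0.05])   # softmax output close to the truth
    p_bad = np.array([0.70, 0.20, 0.10])    # confident but wrong

    def cross_entropy(y, p):
        return -np.sum(y * np.log(p))

    print(cross_entropy(y_true, p_good))  # ~0.105: small when p matches y
    print(cross_entropy(y_true, p_bad))   # ~1.609: large when it does not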
12. Which of the following are effective strategies for handling class imbalance in classification tasks?
Applying a class-balanced loss function to give higher penalties for misclassifying minority-class instances.
Using SMOTE (Synthetic Minority Over-sampling Technique) to create synthetic instances of the minority class by interpolating between existing minority samples.
Utilizing undersampling of the majority class to reduce its representation in the dataset, thereby balancing class distributions.
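Two of these strategies in code, as a hedged sketch: scikit-learn's class_weight option implements the balanced loss, and SMOTE comes from the third-party imbalanced-learn package (the data here is random, with a 9:1 imbalance by construction).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.array([0] * 90 + [1] * 10)   # 9:1 class imbalance

    # Strategy 1: class-balanced loss, so minority errors cost more.
    clf = LogisticRegression(class_weight="balanced").fit(X, y)

    # Strategy 2: SMOTE, interpolating new synthetic minority samples.
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
    print(np.bincount(y), "->", np.bincount(y_res))  # [90 10] -> [90 90]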
13. Which of the following are true about standard scaling of features?
Standard scaling transforms the data by shifting it to have a mean of zero and a standard deviation of one.
Standard scaling helps prevent the model from being biased toward features with larger magnitudes.
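The transform is z = (x - mu) / sigma, applied per feature. A numpy sketch on invented data with two very different feature scales:

    import numpy as np

    X = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0]])   # features on very different scales

    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    print(Z.mean(axis=0))  # ~[0, 0]: zero mean per feature
    print(Z.std(axis=0))   # [1, 1]: both features now on the same footing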
14. Which of the following statements about linear PCA is correct? (select one) (7 pts)
The sum of the eigenvalues produced by PCA equals the total variance of the dataset.
The eigenvector with the largest eigenvalue is the direction along which the projection of the data has the highest variance.
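Both properties can be checked numerically on random data (generated here purely for illustration): the covariance eigenvalues sum to the trace of the covariance matrix (the total variance), and the variance of the projection onto the top eigenvector equals the largest eigenvalue.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic data
    Xc = X - X.mean(axis=0)

    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    print(eigvals.sum(), np.trace(cov))   # equal: eigenvalues sum to total variance
    proj = Xc @ eigvecs[:, -1]            # project onto the top eigenvector
    print(proj.var(ddof=1), eigvals[-1])  # equal: max projected variance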