CS 3262 - Machine Learning

32 Terms

1

True or False - The first principal component is always orthogonal to the second principal component.

True
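This follows from the fact that principal components are eigenvectors of the (symmetric) covariance matrix, which are always mutually orthogonal. A minimal NumPy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 samples, 3 features, with features 0 and 1 correlated
X = rng.normal(size=(100, 3))
X[:, 1] += 0.8 * X[:, 0]

# PCA via eigendecomposition of the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # columns are principal directions
order = np.argsort(eigvals)[::-1]       # sort by variance, descending
pc1, pc2 = eigvecs[:, order[0]], eigvecs[:, order[1]]

# Eigenvectors of a symmetric matrix are orthogonal
print(abs(pc1 @ pc2) < 1e-10)  # True
```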

2

True or False - In bagging, each model sees all data points.

False

3

True or False - PCA's first component always has the highest variance ratio.

True

4

What is the key function of the hyperplane in Support Vector Machines?

To separate data points into different classes

5

What happens when a model is overfitting the data?

The model captures noise and patterns specific to the training data

6

How does regularization help in preventing overfitting?

By adding a penalty to large weights, thus reducing the model's complexity
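Ridge (L2) regression illustrates this directly: the penalty `lam * ||w||^2` is added to the squared-error loss and shrinks the learned weights. A small NumPy sketch on synthetic data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

def ridge_weights(X, y, lam):
    # Minimizes ||y - Xw||^2 + lam * ||w||^2 (closed-form solution)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_ols = ridge_weights(X, y, 0.0)   # no penalty: ordinary least squares
w_reg = ridge_weights(X, y, 10.0)  # penalized fit

# The penalty shrinks the weight vector toward zero
print(np.linalg.norm(w_reg) < np.linalg.norm(w_ols))  # True
```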

7

What type of problem does linear regression solve?

Regression

8

Which method is used to optimize logistic regression?

Gradient descent

9

What is the loss function for logistic regression?

Log likelihood or log loss
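Log loss is the negative average log-likelihood of the true labels under the predicted probabilities. A minimal sketch (example values are made up):

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    # Negative average log-likelihood for binary labels in {0, 1}
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Confident, correct predictions give a small loss
print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))  # 0.1446
```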

10

What is the main difference between supervised and unsupervised learning?

Supervised learning uses labeled data, while unsupervised learning does not.

11

Why is linear regression not ideal for classification tasks?

It doesn't provide optimal class separation and probabilities.

12

Which metric is commonly used to evaluate the performance of a classification model?

Accuracy

13

What does min-max standardization do to the features of a dataset?

It scales the features to a range between 0 and 1.
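A minimal sketch of the transform x' = (x - min) / (max - min):

```python
def min_max_scale(values):
    # Rescale a list of numbers so min maps to 0 and max maps to 1
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```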

14

In gradient descent, what does the negative gradient represent?

The direction of steepest decrease.
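Stepping along the negative gradient drives the loss downhill. A toy example minimizing f(x) = (x - 3)^2, whose minimum is at x = 3:

```python
def grad(x):
    # f(x) = (x - 3)^2, so f'(x) = 2 * (x - 3)
    return 2 * (x - 3)

x, lr = 0.0, 0.1
for _ in range(100):
    x -= lr * grad(x)  # move opposite the gradient (steepest decrease)

print(abs(x - 3) < 1e-6)  # True: converges to the minimizer
```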

15

Imagine you are working on a binary classification model to detect fraudulent transactions. Out of 1,000 transactions, only 50 are actually fraudulent. Explain why using accuracy alone might not be an adequate metric to assess your model’s performance. How might precision, recall, and F1 score provide a better understanding of your model’s effectiveness?

Accuracy alone may not be the best metric because the data is overwhelmingly of one class. If all fraudulent transactions were misidentified, that would be a disastrous result, but the accuracy would be 95% - seemingly effective. Precision would be an improvement because it identifies what proportion of transactions marked as fraudulent were truly fraudulent. Recall would also be an improvement because it would identify how many out of all genuine fraud cases were caught. Finally, F1 score could improve upon accuracy by combining both measures with equal importance.
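The worst case described above can be checked directly from raw confusion counts (the counts are hypothetical, matching the 1,000-transaction scenario):

```python
# A model that flags nothing as fraud: 50 fraud cases all missed
tp, fp, fn, tn = 0, 0, 50, 950

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(accuracy)  # 0.95, despite catching zero fraud
print(recall)    # 0.0
```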

16

You are developing a spam filter for emails. Discuss the advantages and disadvantages of using precision, recall, and accuracy for evaluating your model. Which metric(s) would you prioritize and why?

I would rather use recall than accuracy or precision. Recall would allow me to maximize the amount of spam I catch, which in my opinion is the point of a spam filter. Precision would allow me to know if a non-spam email was accidentally caught, but that's of less importance. All a user would need to do in that situation is take an email out of the spam folder. And as for accuracy, most emails people receive are not spam, so letting every spam email go would still get high accuracy, which is not ideal.

17

Can you briefly describe the ROC curve, including how it is generated and what the X and Y axes represent? Also, explain the concept of the AUC (Area Under the Curve).

To make an ROC curve, you first pick a probability threshold to classify each observation. At that threshold, you compute the True Positive Rate (TPR) by dividing the number of true positives by the total number of actual positives, and the False Positive Rate (FPR) by dividing the number of false positives by the total number of actual negatives. Sweeping the threshold from high to low and plotting TPR (Y axis) against FPR (X axis) traces out the ROC curve. The AUC is the area under this ROC curve, and the goal is to maximize AUC to find the best classifier.
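The TPR/FPR computation at a single threshold can be sketched as follows (labels and scores are made up):

```python
def roc_point(y_true, scores, threshold):
    # One point on the ROC curve: (FPR, TPR) at a given threshold
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fp / neg, tp / pos

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_point(y, s, 0.5))  # (0.0, 0.5): only the 0.8-scored positive passes
```

Repeating this for every threshold and integrating TPR over FPR gives the AUC.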

18

Can you briefly explain the pros and cons of K-fold cross-validation and leave-one-out cross-validation? If we have a relatively small dataset, which method would be more appropriate?

K-fold cross validation splits a dataset into a number (k) of blocks, allowing us to use the entire dataset for training and testing. For small datasets, however, it's better to make k the size of the dataset: leave-one-out cross validation. This allows us to make the most out of a small amount of data points, but it is computationally expensive. K-fold cross validation is less computationally expensive, but its estimation of the model's performance is not as unbiased as leave-one-out cross validation.
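The relationship between the two methods can be sketched by generating fold indices directly; with k equal to the dataset size, the K-fold split degenerates into leave-one-out:

```python
def kfold_indices(n, k):
    # Split range(n) into k contiguous folds (no shuffling, for clarity)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

print(len(kfold_indices(10, 5)))   # 5 folds of 2 samples each
print(len(kfold_indices(10, 10)))  # LOOCV: k = n, so 10 folds of 1 sample
```

Each fold serves once as the test set, so LOOCV trains n models while 5-fold trains only 5.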

19

Which of the following statements is true about the number of support vectors in a Support Vector Machine (SVM)?

Only a subset of the training data points are support vectors.

20

What is the effect of a large value of the regularization parameter C in SVM?

It results in a smaller margin, with fewer misclassifications.

21

You have trained a machine learning model on a dataset and evaluated its performance. The following observations are made:

The model performs well on the training data.

The model's performance on a separate test dataset is significantly worse than on the training data.
Which of the following best describes the likely issue with the model?

The model is suffering from high variance

22

What is the primary purpose of Principal Component Analysis (PCA) in data analysis?

To reduce the dimensionality of the dataset by transforming it into a new set of uncorrelated variables

23

Which best explains how bagging improves the performance of a random forest compared to a single decision tree?

Bagging trains multiple decision trees on random subsets of the data, then averages their predictions to reduce overfitting and improve stability
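The bootstrap sampling step behind bagging can be sketched as follows; on average each resample omits roughly a third (about 1/e) of the points, which is why different trees see different data (seed and sizes are illustrative):

```python
import random

def bootstrap_sample(data, rng):
    # Draw n points with replacement: some repeat, some are left out
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

rng = random.Random(0)
data = list(range(100))
sample = bootstrap_sample(data, rng)
left_out = set(data) - set(sample)

print(len(sample) == 100)  # True: same size as the original dataset
print(len(left_out) > 0)   # True: some points never drawn for this "tree"
```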

24

What is the primary purpose of a loss function in machine learning models like linear and logistic regression?

To measure the performance of the model by quantifying the difference between the predicted and actual values

25

Consider the gradient at a given point during the gradient descent process. Which of the following is true about the gradient and its relationship to the loss function at that point?

The gradient is a vector pointing in the direction of the steepest decrease, and gradient descent moves in the opposite direction of the gradient to minimize the loss function.

26

Which of the following best describes how a Support Vector Machine (SVM) works?

It finds the hyperplane that maximizes the margin between data points of different classes.

27

Which of the following statements is true regarding bias, variance, and overfitting/underfitting?

Overfitting occurs when the model performs well on the training data but poorly on new, unseen data.

28

Which of the following statements is true regarding Lasso regression and its impact on underfitting and overfitting?

Lasso regression encourages sparsity by driving some weights to zero or reducing their magnitudes, which helps prevent overfitting.

29

Which of the following correctly describes a key difference between K-Fold Cross-Validation and Leave-One-Out Cross-Validation (LOOCV)?

In Leave-One-Out Cross-Validation, the number of folds is equal to the number of samples, making it more computationally intensive than K-Fold Cross-Validation.

30

Which of the following statements best describes k-fold cross-validation?

The data is divided into k equal-sized parts, and each part is used as a test set exactly once, with the model trained on the remaining parts each time.

31

The sigmoid function used in logistic regression outputs values between 0 and 1. What does the output of the sigmoid function represent in the context of binary classification?

The probability of the sample belonging to class 1
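A minimal sketch of the sigmoid sigma(z) = 1 / (1 + e^(-z)), whose output is read as P(y = 1 | x):

```python
import math

def sigmoid(z):
    # Maps any real number into (0, 1)
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))            # 0.5: the decision boundary
print(round(sigmoid(2), 3))  # 0.881: confident prediction of class 1
```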

32

You have the following DataFrame df. Which of the following statements is not correct?

df['Score'] selects the column as a dataframe (this is the incorrect statement: single brackets return a Series, while df[['Score']] returns a DataFrame)
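Assuming a pandas environment, the distinction can be checked directly (the DataFrame contents here are illustrative, since the original df is not shown):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ana", "Ben"], "Score": [88, 92]})

# Single brackets return a Series; double brackets return a DataFrame
print(type(df["Score"]).__name__)    # Series
print(type(df[["Score"]]).__name__)  # DataFrame
```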