Machine Learning Questions

Last updated 2:22 PM on 7/20/24

14 Terms

1

Define bias

Bias is the systematic difference between a model's average prediction and the actual values it is trying to predict. It arises when an algorithm makes assumptions that are too simple to capture the real relationship between data points, for example when Linear Regression is fitted to a non-linear relationship. High bias typically shows up as underfitting.

2

Define variance

Variance describes how much a model's predictions would change if it were trained on a different training set. A high-variance model is very sensitive to the particular sample it saw. In general statistical terms, variance measures how far a random variable deviates from its expected value.
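Bias and variance can be estimated by retraining a model on many resampled training sets and looking at its predictions at one point. The sketch below is illustrative only: the true function, the noise level, and the two toy models (a constant predictor and a nearest-neighbour lookup) are assumptions chosen for the demonstration.

```python
import random
import statistics

def true_f(x):
    # Assumed ground-truth relationship for this demonstration.
    return x ** 2

def sample_training_set(rng, xs, noise_sd):
    # Same inputs each time, fresh random noise on the targets.
    return [(x, true_f(x) + rng.gauss(0.0, noise_sd)) for x in xs]

def predict_constant(train, x0):
    # Very simple model: always predicts the mean target (ignores x0).
    return statistics.fmean(y for _, y in train)

def predict_nearest(train, x0):
    # Very flexible model: returns the target of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x0))[1]

def bias_and_variance(predict, x0, n_sets=500, noise_sd=0.3, seed=0):
    rng = random.Random(seed)
    xs = [i / 10 for i in range(11)]
    preds = [predict(sample_training_set(rng, xs, noise_sd), x0)
             for _ in range(n_sets)]
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

const_bias_sq, const_var = bias_and_variance(predict_constant, 1.0)
near_bias_sq, near_var = bias_and_variance(predict_nearest, 1.0)
print("constant model:", const_bias_sq, const_var)
print("nearest model: ", near_bias_sq, near_var)
```

In this setup the constant model shows the larger squared bias and the nearest-neighbour model shows the larger variance.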

3

You have come across some missing data in your dataset. How will you handle it?

The easiest way to handle missing or corrupted data is either to drop the affected rows or columns (dropna()) or to replace the missing entries with substitute values. The two most useful pandas functions for this purpose are isnull() and fillna().

  • isnull(): is used to find missing values in a dataset

  • fillna(): is used to fill missing values with a chosen value, such as 0 or the column mean
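The two functions above can be tried on a small table. This is a minimal sketch assuming pandas is installed. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, float("nan"), 31.0, 40.0],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# isnull() marks missing cells; summing counts them per column.
missing_per_column = df.isnull().sum()

# fillna() with a single value: replace missing ages with 0.
age_zero_filled = df["age"].fillna(0)

# fillna() with a statistic: replace missing ages with the column mean.
age_mean_filled = df["age"].fillna(df["age"].mean())

# fillna() with a dict: a different fill value for each column.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

# Alternative: drop every row that contains a missing value.
dropped = df.dropna()

print(missing_per_column)
print(filled)
```

Which strategy is appropriate depends on the data. Filling with 0 can distort a numeric column, so a mean or median fill is often a safer default.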

4

Explain Decision Tree Classification.

A decision tree uses a tree structure to build regression or classification models. As the tree is grown, the dataset is split into ever-smaller subsets: each internal node tests a feature, each branch represents an outcome of that test, and each leaf holds a prediction. Decision trees can handle both categorical and numerical data.
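A single splitting step can be sketched in a few lines. The example below is a simplified illustration, not a full tree: it assumes one numerical feature and uses Gini impurity as the split criterion, which is one common choice among several. The function names are hypothetical.

```python
def gini(labels):
    # Gini impurity: 0.0 when all labels in the subset are identical.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    # Try a threshold between each pair of adjacent distinct values and
    # keep the one with the lowest weighted impurity of the two subsets.
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

threshold, impurity = best_split([1, 2, 3, 10, 11, 12],
                                 ["a", "a", "a", "b", "b", "b"])
print(threshold, impurity)
```

A full tree would apply the same search recursively to each subset until a stopping rule is met.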

5

How is a logistic regression model evaluated?

One of the best ways to evaluate a logistic regression model is to use a confusion matrix, a table that counts the true positives, false positives, true negatives, and false negatives a classifier produces.

Using a confusion matrix, you can easily calculate the Accuracy Score, Precision, Recall, and F1 score. These are useful indicators of how well your logistic regression model performs.

If the recall of your model is low, your model produces too many False Negatives. Similarly, if the precision is low, it produces too many False Positives. To select a model that balances precision and recall, use the F1 Score, which is their harmonic mean.
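The four scores follow directly from the confusion-matrix counts. Here is a minimal pure-Python sketch for binary labels (1 = positive, 0 = negative). The function name and sample labels are made up for illustration.

```python
def classification_scores(y_true, y_pred):
    # Count the four cells of the binary confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

scores = classification_scores(
    y_true=[1, 1, 1, 0, 0, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 1, 0, 1, 0],
)
print(scores)
```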

6

What is Selection Bias?

Selection Bias is a statistical error introduced when the sample for a study is not chosen at random. As a result, some groups are selected more often than others, so the sample does not represent the population, which leads to inaccurate conclusions.

7

What is the difference between correlation and causality?

Correlation means that two actions or variables (A and B) tend to occur or change together, without A necessarily leading to B. Causality is the situation where one action (A) actually causes a result (B).

8

What is the difference between Correlation and Covariance?

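Correlation quantifies the strength and direction of the linear relationship between two random variables on a unitless scale from -1 to 1: 1 is a perfect positive relationship, -1 is a perfect negative relationship, and 0 means no linear relationship.

Covariance measures how two variables vary together, but its magnitude depends on the units of the variables. Correlation is the covariance divided by the product of the two standard deviations.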
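One way to see the difference is to rescale one variable: the covariance changes with the units while the correlation does not. This is a minimal sketch using sample (n - 1) statistics. The data and function names are made up for illustration.

```python
import math

def covariance(xs, ys):
    # Sample covariance with the n - 1 denominator.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations.
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
x_scaled = [100 * v for v in x]  # same quantity in different units

print(covariance(x, y), correlation(x, y))
print(covariance(x_scaled, y), correlation(x_scaled, y))
```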

9

What are the differences between Type I error and Type II error?

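Type I: False positive. The test reports that something has happened when it hasn’t (a true null hypothesis is rejected).

Type II: False negative. The test reports that nothing has happened when it has (a false null hypothesis is not rejected).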

10

What is sensitivity?

This is the probability that the model predicts positive when the actual value is positive. It measures the model’s ability to identify the true positives of each available category, and is also known as recall or the true positive rate.

Sensitivity = TP / (TP + FN) (i.e. True Positive / (True Positive + False Negative))

11

What is specificity?

This is the probability that the model predicts negative when the actual value is negative. It measures the model’s ability to identify the true negatives of each available category, and is also known as the true negative rate.

Specificity = TN / (TN + FP) (i.e. True Negative / (True Negative + False Positive))
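Sensitivity and specificity can be computed side by side from the four confusion-matrix counts. The counts below are hypothetical, chosen only to make the arithmetic easy to check.

```python
def sensitivity(tp, fn):
    # Share of actual positives that the model labelled positive.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Share of actual negatives that the model labelled negative.
    return tn / (tn + fp)

# Hypothetical counts: 50 actual positives and 50 actual negatives.
tp, fn = 40, 10
tn, fp = 45, 5

sens = sensitivity(tp, fn)  # 40 / 50
spec = specificity(tn, fp)  # 45 / 50
print(sens, spec)
```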

12

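Why is the ROC curve important?

The ROC curve is important because it is a visual representation of how well a model can distinguish between two classes, and it can be used to compare different models. It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at every classification threshold. The area under the curve (AUC) summarizes performance in a single number, with a higher AUC indicating a better model: 0.5 corresponds to random guessing and 1.0 to perfect separation. Additionally, the shape of the curve shows the trade-off between catching positives and raising false alarms, which helps when choosing a threshold.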
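The curve and its area can be computed by hand for a small example. This is a minimal sketch that sweeps the threshold over each distinct score and integrates with the trapezoid rule. It assumes binary labels (1 = positive) with at least one example of each class. The function names and scores are made up for illustration.

```python
def roc_points(y_true, scores):
    # One (false positive rate, true positive rate) point per threshold,
    # moving from the strictest threshold to the most lenient.
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    # Trapezoid rule over consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
imperfect = roc_points(y_true, [0.1, 0.4, 0.35, 0.8])
perfect = roc_points(y_true, [0.1, 0.2, 0.8, 0.9])
print(imperfect, auc(imperfect))
print(perfect, auc(perfect))
```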

13

Why does overfitting occur in ML?

Overfitting occurs in ML when the model is too complex or has too many parameters relative to the amount of data that is available. This causes the model to fit the noise of the data rather than the underlying patterns, resulting in poor generalization and an inability to accurately predict on previously unseen data.
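A common symptom of overfitting is a large gap between training error and test error. The sketch below compares a two-parameter line with a model that simply memorizes the training points. The data-generating process (y = 2x plus Gaussian noise), the sample sizes, and the seed are all assumptions chosen for illustration.

```python
import random

def make_data(rng, n, noise_sd=1.0):
    # Assumed ground truth: y = 2x plus Gaussian noise.
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, noise_sd)) for x in xs]

def fit_line(train):
    # Ordinary least squares with two parameters (slope and intercept).
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(train):
    # Stores every training point and returns the target of the nearest one,
    # so it reproduces the noise in the training data exactly.
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(42)
train, test = make_data(rng, 50), make_data(rng, 500)
line, memorizer = fit_line(train), fit_memorizer(train)

line_train, line_test = mse(line, train), mse(line, test)
memo_train, memo_test = mse(memorizer, train), mse(memorizer, test)
print("line:     ", line_train, line_test)
print("memorizer:", memo_train, memo_test)
```

The memorizer reaches zero training error because it has effectively one parameter per data point, but it should do worse than the line on unseen data because it has fitted the noise.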

14