Alg for Machine Learning Test 1 Prep Set

Last updated 2:06 PM on 6/25/26

147 Terms

1

Independent Variables of Machine Learning

Inputs used to make predictions

2

Dependent Variables of Machine Learning

The target/results that are predicted

3

Simple Linear Regression

The type of regression that utilizes one variable to predict the target

4

Multiple Linear Regression

The type of regression that utilizes multiple variables to predict the target

5

What does "fitting a line" mean in linear regression?

Finding the line/hyperplane that minimizes prediction error between predicted and actual values

6

What type of variable does linear regression predict?

Continuous numerical variables

7

In linear regression why is a column of 1s added?

It adds the intercept (bias) term θ0 to the model

8

Why is the Normal Equation computationally expensive?

It requires inverting the matrix X^T X, which can be very slow for datasets with a large number of features
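
A minimal sketch of the Normal Equation, theta = (X^T X)^-1 X^T y, for one feature plus the column of 1s, assuming plain Python and a hand-written 2x2 inverse. The function name and the toy data are illustrative assumptions, not part of the original card set. With n features the matrix to invert is (n+1) x (n+1), which is where the cost comes from.

```python
def fit_normal_equation(xs, ys):
    """Solve theta = (X^T X)^-1 X^T y for one feature plus an intercept."""
    # Each design-matrix row is [1, x]; the leading 1 carries the intercept.
    X = [[1.0, float(x)] for x in xs]

    # Entries of the symmetric 2x2 matrix X^T X = [[a, b], [b, d]].
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)

    # Entries of the vector X^T y = [u, v].
    u = sum(row[0] * y for row, y in zip(X, ys))
    v = sum(row[1] * y for row, y in zip(X, ys))

    # Inverse of [[a, b], [b, d]] is (1/det) * [[d, -b], [-b, a]].
    det = a * d - b * b
    theta0 = (d * u - b * v) / det
    theta1 = (a * v - b * u) / det
    return theta0, theta1


# Toy data generated from y = 2x + 1 (hypothetical example values).
theta0, theta1 = fit_normal_equation([1, 2, 3, 4], [3, 5, 7, 9])
```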

9

When would you choose a gradient descent algorithm over a Normal Equation algorithm?

When datasets are very large or have a large number of features

10

What variable does logistic regression predict?

Binary class probabilities for classification

11

Purpose of the Sigmoid Function in Logistic Regression

Maps the output of the linear model into the range (0, 1) so it can be read as a probability
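
A small sketch of the sigmoid, 1 / (1 + e^-z), applied to a one-feature linear model. The helper names and parameter values are assumptions for illustration; the two-branch form is only there to avoid overflow for very negative inputs.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids computing exp of a large positive number.
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, theta0, theta1):
    """Probability of the positive class for a one-feature logistic model."""
    return sigmoid(theta0 + theta1 * x)


# Hypothetical parameters: the decision boundary sits at x = 2.
p_low = predict_proba(0.0, theta0=-2.0, theta1=1.0)
p_mid = predict_proba(2.0, theta0=-2.0, theta1=1.0)
p_high = predict_proba(6.0, theta0=-2.0, theta1=1.0)
```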

12

Why is MSE not used in Logistic Regression?

It makes the cost function nonconvex and harder to optimize

13

What variable does softmax regression predict?

Multi-class probabilities for classification
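
A sketch of the softmax function that turns one score per class into probabilities summing to 1. The scores are made-up example values; subtracting the maximum score is a common numerical-stability step and does not change the result.

```python
import math


def softmax(scores):
    """Convert a list of class scores into class probabilities."""
    m = max(scores)  # shift scores so the largest exponent is exp(0) = 1
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])          # hypothetical scores for 3 classes
predicted_class = probs.index(max(probs))  # class with the highest probability
```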

14

One-Hot Encoding

Representing categorical classes as binary vectors where only one value is 1 and the rest are 0s
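
A minimal one-hot encoding sketch. The class names are hypothetical, and the position of the 1 simply follows the order of the `classes` list.

```python
def one_hot(label, classes):
    """Return a vector with a 1 at the label's position and 0s elsewhere."""
    return [1 if c == label else 0 for c in classes]


classes = ["cat", "dog", "bird"]
encoded = [one_hot(label, classes) for label in ["dog", "cat", "bird"]]
```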

15

Supervised Learning

Model is trained with labeled data

16

Unsupervised Learning

Model is trained with unlabeled data

17

Semi-Supervised Learning

Model is trained with a small amount of labeled data combined with a larger amount of unlabeled data

18

Instance-Based Learning

Model that compares new examples to stored training instances. Saves training data for reference

19

Model-Based Learning

Model builds an equation/graph that is used to predict target. Does not save training data for reference

20

Underfitting

Model performs poorly on both training and test data because it is too simple for the given dataset and cannot learn the feature-target relationships

21

Overfitting

Model performs well on training data but poorly on test data because it is overly complex for the given dataset

22

Which model failure is poor at generalization?

Overfitting

23

How to fix underfitting?

Use a more complex model

24

What are the 3 main examples of bad data problems?

- Dataset has a lot of noise

- Dataset has a lot of outliers

- Dataset has a lot of missing features/values

25

Class Imbalance

One class dominates, appearing far more often than the others in a dataset

26

Why is a high accuracy misleading in a heavily imbalanced data set?

A model can reach high accuracy by predicting only the majority class while being 100% wrong on the minority class
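
A quick illustration with made-up labels: 95 negatives, 5 positives, and a hypothetical model that always predicts the majority class. Recall on the positive class is used here as one example of a metric that exposes the problem.

```python
# Hypothetical imbalanced labels: 95% class 0, 5% class 1.
y_true = [0] * 95 + [1] * 5
# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

# Fraction of all predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of actual positives that the model found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)
```

Accuracy comes out at 0.95 even though no positive example is ever detected (recall is 0.0).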

27

K-Fold Cross Validation

Data is split into k mutually exclusive subsets (folds) and k training/testing experiments are conducted, each using a different fold for testing and the remaining k-1 folds for training
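
A sketch of how the k train/test index splits could be generated, assuming no shuffling for simplicity (in practice the data is usually shuffled first). The function name is an illustrative assumption.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs with disjoint test folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        splits.append((train_idx, test_idx))
        start += size
    return splits


splits = k_fold_indices(n_samples=10, k=3)
for train_idx, test_idx in splits:
    # Train on train_idx, evaluate on test_idx, then average the k scores.
    pass
```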

28

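Why can't training data be used for testing?

It biases the evaluation: the model has already seen those examples, so the test score overstates how well it generalizes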

29

Feature Engineering

Creating or transforming features to improve model performance

30

Examples of Feature Engineering

- Feature Creation

- Feature Selection

- Dimensionality Reduction

31

Feature Derivation

Creating new features from existing ones

32

When should a feature be removed?

- Feature has no correlation with target

- Feature is highly correlated/redundant to another feature

- All values in the feature are the same

- Over 60% of the values of the feature are missing

33

Best Plot for categorical variables

Histogram (for categories, drawn as a bar chart of the count in each category)

34

Best plot to show correlation between continuous variables

Heatmap

35

Primary Question of EDA

Is my data ready for machine learning?

36

Consequences of skipping EDA

- Incorrect conclusions

- Overfitting

- Underfitting

- Issues with the data remain undiscovered

- Wasting time

37

What sampling method preserves class ratios in the train/test split?

Stratified Sampling

38

Stratified Sampling

A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group
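
A sketch of a stratified train/test split, assuming the grouping attribute is the class label. The function name, seed, and toy labels are illustrative assumptions. Each class is shuffled and split separately so the class ratio is preserved in both parts.

```python
import random


def stratified_split(labels, test_fraction, seed=0):
    """Split indices into train/test while keeping each class's proportion."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                          # random sample within group
        n_test = round(len(idxs) * test_fraction)  # same fraction per class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


labels = ["a"] * 80 + ["b"] * 20   # hypothetical 80/20 class ratio
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
```

With these toy labels the test set holds 20 "a" and 5 "b" examples, the same 80/20 ratio as the full dataset.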

39

Equal-Frequency Binning

Each bin contains the same number of observations
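
A sketch of equal-frequency binning by rank, with made-up values. The function name is an assumption; when the number of values is not divisible by the number of bins, bin sizes differ by at most one.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so bins hold (nearly) equal counts."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # Map the value's rank to a bin number from 0 to n_bins - 1.
        bins[i] = rank * n_bins // len(values)
    return bins


values = [1, 2, 3, 4, 100, 200, 300, 4000]   # skewed example data
bins = equal_frequency_bins(values, n_bins=4)
```

Each of the 4 bins receives 2 values here, even though the values themselves are very unevenly spread.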

40

Main use of Gradient Descent

Minimizing the cost function during training

41

What is the role of partial derivatives in gradient descent?

They determine the direction to update model parameters

42

What does learning rate control in gradient descent?

The size of parameter updates during optimization

43

What happens if learning rate is too small?

Training becomes slow and the runtime becomes too long to reach the minimum

44

What happens if learning rate is too large?

Training overshoots the minimum and may diverge (the parameters oscillate instead of converging)
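
A toy demonstration of learning-rate effects, assuming the hypothetical cost function J(theta) = theta^2, whose gradient is 2 * theta and whose minimum is at theta = 0. The specific learning rates are chosen only to make the three behaviors visible.

```python
def run_gd(learning_rate, steps=20, theta=1.0):
    """Run gradient descent on J(theta) = theta**2 and return the final theta."""
    for _ in range(steps):
        gradient = 2 * theta                      # dJ/dtheta
        theta = theta - learning_rate * gradient  # step against the gradient
    return theta


too_small = run_gd(0.001)   # barely moves from the start value of 1.0
reasonable = run_gd(0.1)    # ends close to the minimum at 0
too_large = run_gd(1.1)     # sign flips every step and magnitude grows
```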

45

Why is feature scaling important in gradient descent?

It speeds up convergence and prevents uneven updates

46

Types of Gradient Descent

- Batch GD

- Stochastic GD

- Mini Batch GD

47

Batch Gradient Descent

Uses entire dataset in each step

48

Main Con of Batch GD

Slow for large datasets

49

Stochastic Gradient Descent

Uses one training example at a time, so each step (parameter update) is based on a single training example

50

Pros of Stochastic GD

- Fast

- Works well with very large datasets

51

Con of Stochastic GD

- Noisy updates

52

Mini-Batch Gradient Descent

Uses small subsets of data for each step
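
One way to see the three gradient descent variants is that they differ only in how many examples feed each update. The sketch below (function name, seed, and sizes are assumptions) builds the index batches for one pass over 10 samples.

```python
import random


def make_batches(n_samples, batch_size, seed=0):
    """Shuffle sample indices and cut them into consecutive batches."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size]
            for i in range(0, n_samples, batch_size)]


n = 10
batch_gd = make_batches(n, batch_size=n)        # 1 update per pass, all data
stochastic_gd = make_batches(n, batch_size=1)   # 10 updates, 1 example each
mini_batch_gd = make_batches(n, batch_size=4)   # 3 updates, up to 4 examples
```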

53

Pros of Mini-Batch GD

- Efficient & stable

- Works well for large datasets

54

Use Case of Polynomial Regression

When variable relationships are nonlinear

55

Bias

Error due to an overly simple model

56

Variance

Error due to model sensitivity to training data

57

What do bias and variance look like in the case of best generalization?

Balanced

58

Regularization

Techniques used to reduce overfitting by penalizing large model coefficients

59

What are the 3 regularized linear models?

- Ridge Regression

- Lasso Regression

- Elastic Net

60

Ridge Regression

Method of regularization by limiting the sum of the squares of the coefficients (aka L2 regularization). Shrinks coefficients but doesn't eliminate them

61

Lasso Regression

Uses L1 regularization (limits the sum of the absolute values of the coefficients), which can shrink some coefficients exactly to zero and therefore performs feature selection

62

Elastic Net

Uses a combination of L1 & L2 regularization which balances feature selection and coefficient shrinkage
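
A sketch of the three penalty terms that get added to the cost function. The `l1_ratio` mixing parameter and the example coefficients are assumptions for illustration; exact scaling factors vary between textbooks and libraries, and the intercept is typically left out of the penalty.

```python
def l1_penalty(coefs):
    """Lasso term: sum of absolute values of the coefficients."""
    return sum(abs(c) for c in coefs)


def l2_penalty(coefs):
    """Ridge term: sum of squared coefficients."""
    return sum(c * c for c in coefs)


def elastic_net_penalty(coefs, l1_ratio):
    """Weighted mix of L1 and L2; l1_ratio=1 is pure L1, 0 is pure L2."""
    return l1_ratio * l1_penalty(coefs) + (1 - l1_ratio) * l2_penalty(coefs)


coefs = [3.0, -4.0]                            # hypothetical coefficients
lasso_term = l1_penalty(coefs)                 # 3 + 4
ridge_term = l2_penalty(coefs)                 # 9 + 16
mixed_term = elastic_net_penalty(coefs, 0.5)   # halfway between the two
```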

63

What does a learning curve show?

Model performance (typically training and validation error) throughout training

64

Machine Learning

The process of training an algorithm to learn patterns from data to make predictions automatically

65

Components of a machine learning problem

- Task (T)

- Experience (E)

- Performance Measure (P)

66

Task

What a model is trying to do

67

Experience

The data a model learns from

68

Performance Measure

How a model is evaluated

69

What data does classification predict?

Categories

70

ML Workflow Steps

- Understand problem

- Collect data

- Perform EDA

- Prep & clean data

- Convert data to design matrix

- Train model

- Evaluate the trained model and repeat the training step if necessary

- Deploy model

71

Exploratory Data Analysis

The process of summarizing, visualizing, & understanding a data set before modeling

72

Nominal Data

Categories with no inherent order (any numbers assigned are only labels)

73

Ordinal Data

Categories with a meaningful order or ranking

74

Prediction Error in Linear Regression

The difference between an actual data point and the corresponding predicted point on the model line
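
A short sketch of per-point prediction errors and the mean squared error (MSE) built from them. The actual and predicted values are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between predictions and actuals."""
    errors = [p - t for t, p in zip(y_true, y_pred)]  # one error per point
    return sum(e * e for e in errors) / len(errors)


y_true = [3.0, 5.0, 7.0]   # actual target values (hypothetical)
y_pred = [2.5, 5.0, 8.0]   # values read off the fitted line (hypothetical)
mse = mean_squared_error(y_true, y_pred)
```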

75

Gradient Descent Process

- Start with random parameter values

- Calculate prediction error

- Compute Gradient

- Update parameters

- Repeat until convergence (gradient ≈ 0, meaning a minimum has been reached)
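
The steps above can be sketched for simple linear regression with an MSE cost. This is a minimal illustration, not a reference implementation: the learning rate, iteration count, and toy data are assumptions, and the parameters start at zero rather than at random values so the run is reproducible.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iters=5000):
    """Fit y = theta0 + theta1 * x by batch gradient descent on the MSE."""
    theta0, theta1 = 0.0, 0.0   # fixed start (assumption; random also works)
    m = len(xs)
    for _ in range(n_iters):
        # Prediction error for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter.
        grad0 = (2 / m) * sum(errors)
        grad1 = (2 / m) * sum(e * x for e, x in zip(errors, xs))
        # Move each parameter against its gradient.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Toy data generated from y = 2x + 1 (hypothetical example values).
theta0, theta1 = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
```

A fixed iteration count stands in for the convergence check here; a stricter version would stop once both gradients fall below a small tolerance.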

76

Feature Scaling

Adjusting feature values to a common scale

77

Binary Classification

Classification with 2 classes

78

Multi-Class Classification

Classification with 3+ classes

79

Multi-Label Classification

One data instance can have multiple labels at once. Ex: a movie with several genres

80

If there is a lot of bias in a model, what could we expect the model to have?

Underfitting

81

If there is a lot of variance in a model, what could we expect the model to have?

Overfitting

82

Reinforcement Learning

The training of machine learning models to make a sequence of decisions, learning from rewards and penalties

83

What is commonly mistaken as a part of feature engineering but is not?

Replacing missing data

84

Pro of K-Fold Cross Validation

More reliable performance estimate

85

Single K-Fold Cross Validation

86

Why does a rule-based spam filter become difficult to maintain?

It creates a long list of complex rules that needs constant updating

87

Bootstrapping

A variant of k-fold cross validation that runs multiple training steps on multiple random splits and averages the results

88

Min-Max Normalization Process

- Find min & max values

- Get range: max - min

- Plug in values (denoted by x or xi) into the formula to scale all values

89

Min-Max Normalization Formula

(xi - min) / range
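
A sketch of the formula applied to a whole feature. The function name and sample values are assumptions, and returning 0.0 for a constant feature (range of 0) is a choice made here to avoid division by zero, not something the formula itself defines.

```python
def min_max_normalize(values):
    """Scale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    value_range = hi - lo
    if value_range == 0:
        # Constant feature: assumption, map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / value_range for v in values]


scaled = min_max_normalize([10, 20, 30, 50])
```

The smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.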

90

Range of Min-Max Normalization

[0,1]

91

Analogy for Gradient Descent Optimization

A person moving downhill in a valley

92

How does Gradient Descent move in relation to the slope of the cost function?

It moves in the opposite direction of the slope (gradient) of the cost function

93

What is the Learning Rate in Gradient Descent?

The size of steps taken during each iteration

94

If the learning rate is too small in GD what happens to convergence?

It converges too slowly and takes a long time to reach the local min

95

If the learning rate is too large in GD what happens to convergence?

It can overshoot the minimum and diverge, usually oscillating back and forth across the convex function

96

Why is the MSE cost function for linear regression used with GD?

It is convex, so it has a single global minimum for gradient descent to reach

97

When both θ0 and θ1 vary, the cost function is best visualized using what plot?

3D Surface Plot

98

Contour Plot

Shows constant cost levels in 2D

99

If Gradient Descent starts at different initial points on a non-convex cost surface what conclusion will it have?

It may converge to different local minima depending on the starting point

100

What determines the direction of the parameter update in Gradient Descent?

The partial derivatives (gradient) of the cost function