Alg for Machine Learning Test 1 Prep Set

Last updated 2:06 PM on 6/25/26

147 Terms

New cards

Independent Variables of Machine Learning

Inputs used to make predictions

New cards

Dependent Variables of Machine Learning

The target/results that are predicted

New cards

Simple Linear Regression

The type of regression that utilizes one variable to predict the target

New cards

Multiple Linear Regression

The type of regression that utilizes multiple variables to predict the target

New cards

What does "fitting a line" mean in linear regression?

Finding the line/hyperplane that minimizes prediction error between predicted and actual values

New cards

What type of variable does linear regression predict?

Continuous numerical variables

New cards

In linear regression why is a column of 1s added?

It adds the intercept (bias) term to the model

New cards

Why is the Normal Equation computationally expensive?

It requires inverting the matrix X^T X, a cost that grows roughly cubically with the number of features, so it can be very slow for datasets with a large number of features
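
As a minimal sketch of the Normal Equation and the column of 1s from the cards above, the example below solves (X^T X) theta = X^T y by hand for a single feature. The function name `fit_normal_equation` and the toy data are assumptions made for illustration; with only two parameters the matrix is 2x2, so the inverse can be written out directly.

```python
def fit_normal_equation(xs, ys):
    """Solve (X^T X) theta = X^T y where each row of X is [1, x_i]."""
    n = len(xs)
    # Entries of X^T X; the leading 1 in each row produces n and sum(x).
    s_x = sum(xs)
    s_xx = sum(x * x for x in xs)
    # Entries of X^T y.
    s_y = sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    # Invert the 2x2 matrix [[n, s_x], [s_x, s_xx]] explicitly.
    det = n * s_xx - s_x * s_x
    intercept = (s_xx * s_y - s_x * s_xy) / det
    slope = (n * s_xy - s_x * s_y) / det
    return intercept, slope


# Toy data generated from y = 2x + 1 (hypothetical values).
intercept, slope = fit_normal_equation([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```

With many features the same system needs a general matrix solve, which is where the cost mentioned on the card comes from.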

New cards

When would you choose a gradient descent algorithm over a Normal Equation algorithm?

When datasets are very large or have a large number of features

New cards

What variable does logistic regression predict?

Binary class probabilities for classification

New cards

Purpose of the Sigmoid Function in Logistic Regression

Maps the output of the linear model into the probability range between 0 and 1
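
A short sketch of the sigmoid function, assuming the linear model's output is a single number `z`. The two-branch form is an implementation choice made here to avoid overflow for very negative inputs; it computes the same value as 1 / (1 + e^(-z)).

```python
import math


def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as e^z / (1 + e^z) so exp never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```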

New cards

Why is MSE not used in Logistic Regression?

It makes the cost function nonconvex and harder to optimize
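
The card does not name the replacement cost, but the one usually used is log loss (binary cross-entropy), which is convex for logistic regression. The sketch below is an assumption-laden illustration: the name `log_loss` and the clipping constant `eps` are chosen here, not taken from the set.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


confident_right = log_loss([1, 0], [0.9, 0.1])
confident_wrong = log_loss([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

Confidently wrong predictions are penalized much more heavily than confidently right ones are rewarded.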

New cards

What variable does softmax regression predict?

Multi class probabilities for classification
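
A minimal sketch of the softmax function that turns per-class scores into multi-class probabilities. The input scores are hypothetical; subtracting the maximum score is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(scores):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
```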

New cards

One-Hot Encoding

Representing categorical classes as binary vectors in which exactly one value is 1 and the rest are 0s
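
One-hot encoding can be sketched in a few lines. The helper name `one_hot` and the example labels are assumptions for illustration; classes are ordered alphabetically here, which is just one possible convention.

```python
def one_hot(labels):
    """Return the sorted class list and one binary vector per label."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    vectors = [
        [1 if index[label] == j else 0 for j in range(len(classes))]
        for label in labels
    ]
    return classes, vectors


classes, vectors = one_hot(["cat", "dog", "cat", "bird"])
print(classes)   # ['bird', 'cat', 'dog']
print(vectors)   # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```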

New cards

Supervised Learning

Model is trained with labeled data

New cards

Unsupervised Learning

Model is trained with unlabeled data

New cards

Semi-Supervised Learning

Model is trained with a small amount of labeled data combined with a larger amount of unlabeled data

New cards

Instance-Based Learning

Model that compares new examples to stored training instances. Saves training data for reference

New cards

Model-Based Learning

Model builds an equation/graph that is used to predict the target. Does not save training data for reference

New cards

Underfitting

Model performs poorly on both training and test data because it is too simple for the given dataset to learn feature-target relationships

New cards

Overfitting

Model performs well on training data but poorly on test data because it is overly complex for the given dataset

New cards

Which model failure is poor at generalization?

Overfitting

New cards

How to fix underfitting?

Use a more complex model

New cards

What are the 3 main examples of bad data problems?

- Dataset has a lot of noise

- Dataset has a lot of outliers

- Dataset has a lot of missing features/values

New cards

Class Imbalance

One class appears far more often than the others in a dataset

New cards

Why is a high accuracy misleading in a heavily imbalanced data set?

A model can predict only the majority class and still score high accuracy while getting every minority-class example wrong
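
A small worked example of why accuracy misleads on imbalanced data. The 95/5 class split is a made-up dataset; the "model" simply predicts the majority class every time.

```python
# Hypothetical dataset: 95 examples of class 0 and 5 examples of class 1.
labels = [0] * 95 + [1] * 5
# A trivial model that always predicts the majority class.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: how many of the class-1 examples were found.
minority_total = sum(1 for y in labels if y == 1)
minority_hits = sum(1 for p, y in zip(predictions, labels) if y == 1 and p == 1)
minority_recall = minority_hits / minority_total

print(accuracy, minority_recall)  # 0.95 0.0
```

Accuracy looks strong even though the model never identifies a single minority example.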

New cards

K-Fold Cross Validation

Data is split into k mutually exclusive subsets (folds), and k training/testing experiments are run, each using a different fold as the test set
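
The splitting step of k-fold cross validation can be sketched as index bookkeeping. The helper `k_fold_indices` is a hypothetical name; it does not shuffle, which a real experiment would usually do first.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs with non-overlapping test folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


splits = k_fold_indices(10, 3)
for train, test in splits:
    print(len(train), test)
```

Each sample appears in exactly one test fold, so every example is used for testing once and for training k - 1 times.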

New cards

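Why can't training data be used for testing?

It biases the evaluation: the model has already seen those examples, so the test score overestimates performance on new data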

New cards

Feature Engineering

Creating or transforming features to improve model performance

New cards

Examples of Feature Engineering

- Feature Creation

- Feature Selection

- Dimensionality Reduction

New cards

Feature Derivation

Creating new features from existing ones

New cards

When should a feature be removed?

- Feature has no correlation with target

- Feature is highly correlated/redundant to another feature

- All values in the feature are the same

- Over 60% of the values of the feature are missing

New cards

Best Plot for categorical variables

Bar chart of category counts (often loosely called a histogram)

New cards

Best plot to show correlation between continuous variables

Heatmap of the correlation matrix

New cards

Primary Question of EDA

Is my data ready for machine learning?

New cards

Consequences of skipping EDA

- Incorrect conclusions

- Overfitting

- Underfitting

- Issues with data remain undiscovered

- Wasting time

New cards

What sampling method preserves class ratios in the train/test split?

Stratified Sampling

New cards

Stratified Sampling

A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group
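
A minimal sketch of a stratified train/test split, assuming the grouping attribute is the class label. The function name `stratified_split`, the `seed` parameter, and the 80/20 toy labels are all illustrative choices.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split indices so each class keeps the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        # Sample randomly within each class (stratum).
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


labels = ["a"] * 80 + ["b"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
test_counts = {c: sum(1 for i in test_idx if labels[i] == c) for c in ("a", "b")}
print(test_counts)  # {'a': 20, 'b': 5}
```

The 4:1 class ratio of the full dataset is preserved in the test set.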

New cards

Equal-Frequency Binning

Each bin contains the same number of observations
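
Equal-frequency binning can be sketched by ranking the values and cutting the ranking into equal parts. The helper name and the sample values are assumptions; ties are broken by position here, which is a simplification.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin number so bins hold (nearly) equal counts."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


values = [1, 2, 3, 10, 20, 100, 200, 1000]
bins = equal_frequency_bins(values, 4)
print(bins)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Bin widths vary widely (the last bin spans 200 to 1000), but every bin contains two observations.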

New cards

Main use of Gradient Descent

Minimizing the cost function during training

New cards

What is the role of partial derivatives in gradient descent?

They determine the direction to update model parameters

New cards

What does learning rate control in gradient descent?

The size of parameter updates during optimization

New cards

What happens if learning rate is too small?

Training becomes slow and takes too long to reach the minimum

New cards

What happens if learning rate is too large?

Training overshoots the minimum and may oscillate or diverge instead of converging
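
The effect of the learning rate can be seen on a toy cost function. The sketch below assumes the cost f(w) = w^2, whose derivative is 2w and whose minimum is at w = 0; the three learning rates are arbitrary values picked to show each behavior.

```python
def gradient_descent(learning_rate, steps=50, start=10.0):
    """Minimize f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2 gives the update direction
        w -= learning_rate * grad    # learning rate sets the step size
    return w


too_small = gradient_descent(0.001)  # barely moves away from the start
reasonable = gradient_descent(0.1)   # ends close to the minimum at 0
too_large = gradient_descent(1.1)    # each step overshoots further than the last
print(too_small, reasonable, too_large)
```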

New cards

Why is feature scaling important in gradient descent?

It speeds up convergence and prevents uneven updates
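
The card does not say which scaling method is meant. Standardization (z-scores) is one common choice and is assumed in this sketch; the column values are hypothetical.

```python
import statistics


def standardize(column):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population standard deviation
    return [(v - mean) / std for v in column]


# A feature measured in the thousands, e.g. a size in square feet.
scaled = standardize([1000.0, 2000.0, 3000.0, 4000.0])
print(scaled)
```

After scaling, features measured in very different units contribute gradients of similar magnitude, so no single parameter dominates the updates.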

New cards

Types of Gradient Descent

- Batch GD

- Stochastic GD

- Mini Batch GD

New cards

Batch Gradient Descent

Uses entire dataset in each step

New cards

Main Con of Batch GD

Slow for large datasets

New cards

Stochastic Gradient Descent

Uses one training example at a time, so each training example produces one update step

New cards

Pros of Stochastic GD

- Fast

- Works well with very large datasets

New cards

Con of Stochastic GD

- Noisy updates

New cards

Mini-Batch Gradient Descent

Uses small subsets of data for each step
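
The three gradient descent variants differ only in how much data feeds each step, which a batching helper makes concrete. `iterate_minibatches` is a hypothetical name; shuffling once per pass is a common convention assumed here.

```python
import random


def iterate_minibatches(examples, batch_size, seed=0):
    """Yield shuffled batches; the last batch may be smaller."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


data = list(range(10))
batches = list(iterate_minibatches(data, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Setting `batch_size=len(data)` gives one step per pass (batch GD), and `batch_size=1` gives one step per example (stochastic GD).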

New cards

Pros of Mini-Batch GD

- Efficient & stable

- Works well for large datasets

New cards

Use Case of Polynomial Regression

When variable relationships are nonlinear
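
Polynomial regression is commonly implemented as linear regression on expanded features. This sketch shows only the expansion step for one input value; the helper name is an assumption, and the intercept column is left out.

```python
def polynomial_features(x, degree):
    """Expand a single input x into [x, x**2, ..., x**degree]."""
    return [x ** d for d in range(1, degree + 1)]


print(polynomial_features(3.0, 3))  # [3.0, 9.0, 27.0]
```

A linear model fitted on these expanded features is still linear in its coefficients, even though its prediction curve in x is not a straight line.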

New cards

Bias

Error due to an overly simple model

New cards

Variance

Error due to model sensitivity to training data

New cards

What do bias and variance look like in the case of best generalization?

Balanced

New cards

Regularization

Techniques used to reduce overfitting by penalizing large model coefficients

New cards

What are the 3 regularized linear models?

- Ridge Regression

- Lasso Regression

- Elastic Net

New cards

Ridge Regression

Regularization method that penalizes the sum of the squared coefficients (aka L2 regularization). Shrinks coefficients toward zero but doesn't eliminate them

New cards

Lasso Regression

Uses L1 regularization, which can shrink some coefficients exactly to zero and thereby performs feature selection

New cards

Elastic Net

Uses a combination of L1 & L2 regularization which balances feature selection and coefficient shrinkage
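
The three penalties can be compared side by side. This is a simplified sketch: the parameter names `alpha` and `l1_ratio` and the plain weighted mix are assumptions for illustration, and libraries may scale the terms differently.

```python
def l1_penalty(coefs):
    """Lasso-style penalty: sum of absolute coefficient values."""
    return sum(abs(c) for c in coefs)


def l2_penalty(coefs):
    """Ridge-style penalty: sum of squared coefficient values."""
    return sum(c * c for c in coefs)


def elastic_net_penalty(coefs, alpha, l1_ratio):
    """Weighted mix of the L1 and L2 penalties, scaled by alpha."""
    return alpha * (l1_ratio * l1_penalty(coefs) + (1.0 - l1_ratio) * l2_penalty(coefs))


coefs = [3.0, -4.0, 0.0]
print(l1_penalty(coefs))                     # 7.0
print(l2_penalty(coefs))                     # 25.0
print(elastic_net_penalty(coefs, 1.0, 0.5))  # 16.0
```

The penalty is added to the training cost, so larger coefficients make the total cost worse and the optimizer is pushed toward smaller ones.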

New cards

What does a learning curve show?

Model performance on the training and validation sets as training progresses (or as the training set grows)

New cards

Machine Learning

The process of training an algorithm to learn patterns from data to make predictions automatically

New cards

Components of a machine learning problem

- Task (T)

- Experience (E)

- Performance Measure (P)

New cards

Task

What a model is trying to do

New cards

Experience

The data a model learns from

New cards

Performance Measure

How a model is evaluated

New cards

What data does classification predict?