ML Fundamentals

Last updated 1:58 AM on 5/11/26
90 Terms

1
New cards
"What is the main difference between supervised and unsupervised learning?"
"Supervised learning trains on labeled data, learning a mapping from inputs to known outputs, while unsupervised learning finds patterns and structure in unlabeled data."
2
New cards
"Why would you use unsupervised learning instead of supervised learning?"
"You use unsupervised learning when labeled outcomes are unavailable, too expensive to collect, or when the goal is to discover hidden structure such as clusters or latent factors."
3
New cards
"What does it mean for a model to generalize well?"
"A model generalizes well when it performs well on unseen data, not just on the examples it was trained on."
4
New cards
"Why does overfitting happen?"
"Overfitting happens when a model learns noise, or patterns specific to the training set, rather than the underlying relationship, so it performs well on training data but poorly on new data."
5
New cards
"Why does underfitting happen?"
"Underfitting happens when a model is too simple to capture the true relationship in the data. It performs poorly on both training and test data."
6
New cards
"What is the bias-variance tradeoff?"
"The bias-variance tradeoff describes the balance between error from overly simple assumptions and error from being too sensitive to the training data. A good model minimizes both as much as possible."
7
New cards
"What happens to bias and variance as model complexity increases?"
"As model complexity increases, bias tends to decrease while variance tends to increase."
8
New cards
"Why can training error keep decreasing while test error starts increasing?"
"Training error decreases because a more complex model can fit the training data better. Test error increases when the model starts fitting noise instead of generalizable patterns."
9
New cards
"Why do we use a validation set instead of choosing the model based on test performance?"
"A validation set is used to tune models and hyperparameters without contaminating the final test evaluation. The test set should only be used once for an unbiased estimate of final performance."
10
New cards
"Why is cross-validation useful for estimating model performance?"
"Cross-validation evaluates a model across multiple train-validation splits, giving a more stable and less optimistic estimate of performance than a single split."
11
New cards
"How does K-fold cross-validation work?"
"The data is split into K folds. The model trains on K-1 folds and validates on the remaining fold, repeating K times so each fold serves as the validation set exactly once. The K scores are then averaged."
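The splitting procedure above can be sketched in plain Python (an illustrative index generator, not scikit-learn's `KFold`):

```python
# Minimal K-fold index generator: yields (train_idx, val_idx) pairs.
def k_fold_indices(n_samples, k):
    # distribute any remainder across the first few folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds = []
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]          # this fold validates
        train = indices[:start] + indices[start + size:]  # the rest trains
        folds.append((train, val))
        start += size
    return folds

for train_idx, val_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(val_idx))  # each of the 5 folds: 8 train, 2 validation
```

In practice you would shuffle (or stratify) the indices first; this sketch keeps them in order for readability.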
12
New cards
"What is the difference between K-fold cross-validation and leave-one-out cross-validation?"
"K-fold uses groups of data as validation sets, while leave-one-out cross-validation (LOOCV) is the extreme case where K equals the number of samples, so each validation set contains a single observation."
13
New cards
"If LOOCV uses almost all the data for training, why not always use it?"
"LOOCV requires fitting the model once per observation, which is computationally expensive on large datasets, and its error estimate can have high variance because the training sets overlap almost completely."
14
New cards
"How does regularization help prevent overfitting?"
"Regularization adds a penalty for model complexity, discouraging large weights and keeping the model from fitting noise in the training data."
15
New cards
"How can you diagnose overfitting and underfitting using training and validation performance?"
"If training error is low but validation error is high, the model is overfitting. If both training and validation error are high, the model is underfitting."
16
New cards
"What is the goal of gradient descent in machine learning?"
"Gradient descent minimizes a loss function by repeatedly updating model parameters in the direction that most reduces the loss."
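A minimal sketch of the idea on a made-up one-parameter least-squares problem (the data, learning rate, and iteration count are illustrative assumptions):

```python
# Gradient descent on f(w) = mean((w*x - y)^2) for toy data generated by y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0    # initial parameter
lr = 0.02  # learning rate
for _ in range(500):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step in the direction that reduces the loss

print(round(w, 4))  # converges near the true slope 3.0
```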
17
New cards
"How does batch gradient descent update model parameters?"
"Batch gradient descent computes the gradient using the entire training dataset before making one parameter update."
18
New cards
"How does stochastic gradient descent differ from batch gradient descent?"
"Stochastic gradient descent updates parameters using one training example at a time, making updates noisier but far more frequent, which can speed up learning on large datasets."
19
New cards
"Why is mini-batch gradient descent commonly used in practice?"
"Mini-batch gradient descent updates parameters using small groups of examples, balancing the stability of batch updates with the speed of stochastic updates while exploiting vectorized hardware efficiently."
20
New cards
"What is the tradeoff between batch, stochastic, and mini-batch gradient descent?"
"Batch gradient descent gives stable but slow updates, stochastic gives fast but noisy updates, and mini-batch sits in between, trading some stability for speed."
21
New cards
"Why is the learning rate important in gradient descent?"
"The learning rate controls how large each parameter update is. If it is too small, training is slow; if it is too large, the loss can oscillate or diverge."
22
New cards
"What happens when the learning rate is too high?"
"The loss may bounce around, fail to decrease, or diverge entirely because each update overshoots the minimum."
23
New cards
"What happens when the learning rate is too low?"
"Training becomes very slow and may appear stuck because each update changes the parameters only slightly."
24
New cards
"When is mean squared error commonly used as a loss function?"
"Mean squared error is commonly used for regression problems where the goal is to predict continuous numerical values."
25
New cards
"Why is cross-entropy loss used for classification?"
"Cross-entropy measures how far the predicted probability distribution is from the true class distribution. It strongly penalizes confident wrong predictions."
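The penalty for confident wrong predictions can be seen with a quick sketch (the probabilities are illustrative):

```python
import math

def cross_entropy(p_true_class):
    # Loss for one example: -log of the probability the model assigned
    # to the true class.
    return -math.log(p_true_class)

print(round(cross_entropy(0.9), 3))  # 0.105: confident and correct, small loss
print(round(cross_entropy(0.1), 3))  # 2.303: true class got only 0.1, large loss
```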
26
New cards
"What is hinge loss used for?"
"Hinge loss is commonly used in support vector machines. It penalizes predictions that are wrong or not confidently correct by enforcing a margin between classes."
27
New cards
"Why do we add regularization to a model?"
"Regularization penalizes model complexity to reduce overfitting and improve generalization on unseen data."
28
New cards
"What is L1 regularization, and why does it lead to sparsity?"
"L1 regularization penalizes the sum of the absolute values of the weights. Its penalty pushes many weights exactly to zero, effectively performing feature selection."
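The zeroing behavior shows up in the soft-thresholding update that an L1 penalty induces on a single weight; a minimal sketch with made-up numbers:

```python
def soft_threshold(w, lam):
    # Proximal (soft-thresholding) operator associated with an L1 penalty:
    # shrinks w toward zero by lam, and sets it exactly to zero if |w| <= lam.
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

print(soft_threshold(1.0, 0.25))    # 0.75  (shrunk toward zero)
print(soft_threshold(0.125, 0.25))  # 0.0   (small weight driven exactly to zero)
print(soft_threshold(-0.5, 0.25))   # -0.25
```

An L2 penalty, by contrast, rescales weights multiplicatively, so they shrink but rarely hit zero exactly.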
29
New cards
"What is L2 regularization, and how does it affect model weights?"
"L2 regularization penalizes the sum of squared weights. It shrinks weights toward zero without usually making them exactly zero, spreading influence across correlated features."
30
New cards
"What is the geometric difference between L1 and L2 regularization, and why does L1 produce sparse solutions?"
"The L1 constraint region is a diamond with corners on the axes, while the L2 region is a circle. The loss contours tend to first touch the L1 diamond at a corner, where some coordinates are exactly zero, which is why L1 produces sparse solutions."
31
New cards
"What is the main goal of linear regression?"
"Linear regression estimates a continuous target variable by fitting a linear relationship between input features and the output."
32
New cards
"What are the key assumptions of linear regression?"
"Linear regression assumes linearity, independence of errors, homoscedasticity, and normally distributed residuals."
33
New cards
"Why does linear regression assume linearity?"
"It assumes the expected value of the target can be represented as a weighted sum of the features; if the true relationship is highly nonlinear, a linear model will be biased and underfit."
34
New cards
"What does homoscedasticity mean in linear regression?"
"Homoscedasticity means the variance of the residuals is roughly constant across all predicted values."
35
New cards
"Why is multicollinearity a problem in linear regression?"
"Multicollinearity makes coefficient estimates unstable because highly correlated predictors make it difficult to separate each feature’s individual effect."
36
New cards
"What is logistic regression used for?"
"Logistic regression is used for classification, modeling the probability that an observation belongs to a particular class."
37
New cards
"What is the link function in logistic regression?"
"Logistic regression uses the logit link function, which maps probabilities in (0, 1) to the full real line so they can be modeled as a linear function of the features."
38
New cards
"Why does logistic regression use a sigmoid function?"
"The sigmoid function converts linear model outputs into probabilities between 0 and 1, making the output interpretable as the probability of the positive class."
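A minimal sketch of the squashing behavior (the scores are illustrative):

```python
import math

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))             # 0.5: a score of zero means maximum uncertainty
print(round(sigmoid(4.0), 3))   # 0.982: strongly positive score, near 1
print(round(sigmoid(-4.0), 3))  # 0.018: strongly negative score, near 0
```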
39
New cards
"How does a decision tree choose splits?"
"A decision tree chooses splits that increase class purity, typically by maximizing information gain or minimizing Gini impurity at each split."
40
New cards
"What is the difference between entropy and Gini impurity in decision trees?"
"Entropy measures impurity using information theory and is used in information gain, while Gini impurity measures the chance of misclassifying a randomly chosen sample and is slightly cheaper to compute. In practice they usually produce similar trees."
41
New cards
"Why can decision trees easily overfit?"
"Decision trees can keep splitting until they memorize small patterns or noise in the training data, so they usually need constraints such as maximum depth, minimum samples per leaf, or pruning."
42
New cards
"What is KNN’s main assumption?"
"KNN assumes that similar data points are close together in feature space, so a point's label can be predicted from the labels of its nearest neighbors."
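A minimal KNN classifier sketch on made-up 2-D points (the data and k are illustrative assumptions):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # train: list of ((x, y), label) pairs. Predict by majority vote among
    # the k training points closest to the query in Euclidean distance.
    by_distance = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (0.5, 0.5), k=3))  # "A": the 3 nearest points are all class A
```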
43
New cards
"Why does KNN suffer from the curse of dimensionality?"
"In high-dimensional spaces, distances between points become nearly uniform, so the notion of a nearest neighbor loses meaning and KNN performance degrades."
44
New cards
"What is the difference between precision, recall, and F1 score?"
"Precision is the fraction of predicted positives that are truly positive, recall is the fraction of actual positives that are found, and F1 is their harmonic mean, balancing the two."
45
New cards
"When should you prefer a PR-Curve over ROC-AUC for imbalanced datasets?"
"A PR-Curve is usually more informative when the positive class is rare because it focuses directly on precision and recall for the positive class. ROC-AUC can look overly optimistic on imbalanced data because the false positive rate may remain small when there are many true negatives."
46
New cards
"Why do many machine learning models require feature scaling?"
"Scaling makes features comparable in magnitude so that one large-scale feature does not dominate the learning process. It is especially important for distance-based models such as KNN and K-Means, and for gradient-based models, where unscaled features slow convergence."
47
New cards
"What is the difference between normalization and standardization?"
"Normalization usually rescales values to a fixed range such as [0, 1], while standardization rescales values to have zero mean and unit variance."
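Both transforms can be sketched in a few lines (the feature values are made up):

```python
from statistics import mean, pstdev

data = [2.0, 4.0, 6.0, 8.0]  # illustrative feature values

# Normalization (min-max): rescale to [0, 1]
lo, hi = min(data), max(data)
normalized = [(x - lo) / (hi - lo) for x in data]

# Standardization (z-score): zero mean, unit variance
mu, sigma = mean(data), pstdev(data)
standardized = [(x - mu) / sigma for x in data]

print(normalized)    # endpoints map to exactly 0.0 and 1.0
print(standardized)  # symmetric around 0 with unit spread
```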
48
New cards
"When would you prefer normalization over standardization?"
"Normalization is useful when features need to be bounded, for example for algorithms sensitive to absolute ranges, or when the data does not follow a roughly normal distribution."
49
New cards
"When would you prefer standardization over normalization?"
"Standardization is often preferred for models that assume or benefit from centered data, such as linear models, SVMs, and PCA, and it is less distorted by outliers than min-max scaling."
50
New cards
"What are common strategies for handling missing data?"
"Common approaches include deletion, mean/median/mode imputation, model-based imputation, and adding indicator variables that flag missingness."
51
New cards
"Why is it important to understand why data is missing?"
"The reason data is missing affects the correct handling strategy. Missing completely at random is less concerning, but data missing for reasons related to the target or other variables can bias the model if handled naively."
52
New cards
"When should you use median imputation instead of mean imputation?"
"Median imputation is usually better when the feature has outliers or a skewed distribution because the median is more robust than the mean."
53
New cards
"What is one-hot encoding, and when is it appropriate?"
"One-hot encoding converts a categorical variable into binary columns, one per category. It is appropriate for nominal variables with no natural order and reasonably low cardinality."
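A minimal sketch of the encoding (an illustrative helper, not pandas.get_dummies):

```python
def one_hot(values):
    # Build one binary column per distinct category, in sorted order.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

rows, columns = one_hot(["red", "green", "red", "blue"])
print(columns)  # ['blue', 'green', 'red']
print(rows)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```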
54
New cards
"What is a major downside of one-hot encoding?"
"It can create many columns when a categorical variable has high cardinality. This increases memory usage and can make models slower to train and more prone to overfitting on sparse columns."
55
New cards
"What is target encoding?"
"Target encoding replaces each category with a statistic based on the target, such as the mean target value for that category."
56
New cards
"When would you choose target encoding over one-hot encoding?"
"Target encoding is useful when a categorical variable has many unique levels and one-hot encoding would create too many sparse columns. However, it risks leaking target information, so it should be computed with out-of-fold statistics or smoothing."
57
New cards
"What is PCA used for in machine learning?"
"PCA reduces dimensionality by projecting data onto new orthogonal directions that capture the most variance. It is often used for compression, visualization, noise reduction, and speeding up downstream models."
58
New cards
"Why is PCA considered a lossy transformation?"
"PCA is lossy when you keep only a subset of principal components because some variance from the original data is discarded. The transformed data no longer contains all original information."
59
New cards
"What is data leakage?"
"Data leakage occurs when information from outside the training process is used to build the model, such as information from the test set or from the future, producing overly optimistic evaluation results."
60
New cards
"How can data leakage occur during cross-validation?"
"Leakage occurs when preprocessing steps like scaling, imputation, or feature selection are fit on the full dataset before splitting, letting validation-fold information influence training."
61
New cards
"What is the core difference between bagging and boosting?"
"Bagging trains many models independently on different bootstrapped samples and averages their predictions. Boosting trains models sequentially, with each new model focusing on the errors of the current ensemble."
62
New cards
"Why is Random Forest considered a bagging method?"
"Random Forest trains many decision trees on bootstrapped datasets and averages their outputs. It also randomly samples features at each split, which further decorrelates the trees."
63
New cards
"Why does bagging require diversity among base models?"
"Bagging works best when individual models make different errors. If all trees are highly correlated, averaging them provides little variance reduction."
64
New cards
"Why does bagging reduce variance?"
"Averaging multiple noisy models cancels out some individual errors, so the ensemble's predictions fluctuate less than any single model's."
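The variance reduction can be simulated; this sketch assumes the base models' errors are independent, which is the best case for bagging (the noise level and ensemble size are made up):

```python
import random
from statistics import pvariance

random.seed(0)

def noisy_prediction():
    # Stand-in for one base model's prediction: true value 10 plus unit noise.
    return 10 + random.gauss(0, 1)

# 2000 predictions from a single model vs. 2000 averages of 25 models
singles = [noisy_prediction() for _ in range(2000)]
ensembles = [sum(noisy_prediction() for _ in range(25)) / 25 for _ in range(2000)]

print(round(pvariance(singles), 2))    # close to 1: variance of one model
print(round(pvariance(ensembles), 2))  # close to 1/25: averaging shrinks variance
```

With independent errors, the variance of the mean of n models is 1/n of a single model's variance; correlation between models erodes this benefit.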
65
New cards
"What is the main idea behind boosting?"
"Boosting builds an ensemble sequentially, with each new weak learner trained to correct the mistakes of the combined model so far."
66
New cards
"How does boosting optimize an objective function?"
"Boosting minimizes an objective made of a loss term plus a regularization term. Each new learner is added to reduce the current ensemble’s loss; in gradient boosting, each learner fits the negative gradient of the loss with respect to the current predictions."
67
New cards
"Why do people say boosting focuses on residuals?"
"In regression with squared error loss, the negative gradient equals the residuals, so each new learner is literally fit to the current residuals of the ensemble."
68
New cards
"What is the practical difference between XGBoost and LightGBM?"
"XGBoost is a highly optimized gradient boosting framework known for regularization and strong performance. LightGBM is often faster on large datasets because it uses histogram-based splitting and leaf-wise tree growth."
69
New cards
"What is out-of-bag error in bagging?"
"Out-of-bag error estimates validation performance using samples that were not included in a tree’s bootstrap training set. Each tree predicts on its unused samples, and aggregating these predictions gives a validation-like error estimate without a separate holdout set."
70
New cards
"Why can out-of-bag error act like a validation score?"
"Each bootstrap sample leaves out some training observations (roughly 37% on average), so each tree can be evaluated on data it never saw, mimicking a held-out validation set."
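The roughly-37% figure (the limit of (1 - 1/n)^n is 1/e ≈ 0.368) can be checked with a quick bootstrap simulation; the sample size and seed here are arbitrary:

```python
import random

random.seed(0)
n = 10_000
# One bootstrap sample: n draws with replacement from n observations.
sample = [random.randrange(n) for _ in range(n)]

out_of_bag = n - len(set(sample))  # observations never drawn
print(round(out_of_bag / n, 2))    # about 0.37: fraction a bootstrap sample misses
```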
71
New cards
"What is stacking in ensemble learning?"
"Stacking combines multiple different base models by training a meta-model on their predictions. The meta-model learns how to best combine the strengths of each base learner."
72
New cards
"Why must stacking use out-of-fold predictions for training the meta-model?"
"If the meta-model is trained on predictions from base models that were trained on the same data, it learns from overfit predictions and overestimates the base models’ reliability. Out-of-fold predictions keep the meta-model’s training signal honest."
73
New cards
"What is Gini feature importance in tree-based models?"
"Gini importance measures how much a feature reduces impurity across all splits where it is used. Features that create strong purity improvements receive higher importance scores."
74
New cards
"What is permutation feature importance?"
"Permutation importance measures how much model performance drops when a feature’s values are randomly shuffled. If performance drops a lot, the model relied heavily on that feature."
75
New cards
"When should you prefer permutation importance over Gini importance?"
"Use permutation importance when you want a performance-based measure of feature usefulness, or when Gini importance might be biased, for example toward high-cardinality or continuous features."
76
New cards
"What is Naive Bayes used for in machine learning?"
"Naive Bayes is a probabilistic classifier that predicts the class with the highest posterior probability using Bayes’ theorem. It is commonly used for text classification tasks such as spam filtering and sentiment analysis."
77
New cards
"What is the naive assumption in Naive Bayes?"
"The naive assumption is that all features are conditionally independent given the class label. In other words, once the class is known, the value of one feature tells you nothing about the others."
78
New cards
"Why does Naive Bayes often perform well even when the independence assumption is false?"
"Naive Bayes can still classify well because it only needs the correct class to receive the highest score, not well-calibrated probabilities; errors in the probability estimates often do not change which class ranks first."
79
New cards
"How does Naive Bayes use Bayes’ theorem for classification?"
"It estimates the probability of each class given the observed features by combining the class prior with the likelihood of the features under that class, then predicts the class with the highest posterior."
80
New cards
"Why is smoothing used in Naive Bayes?"
"Smoothing prevents zero probabilities when a feature value appears in the test set but was not seen with a class during training. Without smoothing, a single unseen feature-class combination would force the entire posterior for that class to zero."
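Laplace (add-alpha) smoothing, the most common variant, can be sketched with made-up word counts:

```python
def smoothed_prob(count, class_total, vocab_size, alpha=1.0):
    # Laplace (add-alpha) smoothing for P(word | class): unseen words get a
    # small nonzero probability instead of zero.
    return (count + alpha) / (class_total + alpha * vocab_size)

# Illustrative counts: a word seen 0 times among 100 words of a class,
# with a 50-word vocabulary.
print(smoothed_prob(0, 100, 50))  # ~0.0067 rather than 0
print(smoothed_prob(5, 100, 50))  # 0.04
```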
81
New cards
"How does K-Means clustering work?"
"K-Means partitions data into K clusters by assigning each point to the nearest centroid, then recomputing each centroid as the mean of its assigned points, repeating until assignments stop changing."
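The two alternating steps can be sketched in one dimension (the data and starting centroids are made up):

```python
def k_means_1d(points, centroids, iterations=10):
    # Minimal 1-D K-Means: alternate assignment and centroid-update steps.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: attach each point to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # update step: each centroid becomes the mean of its assigned points
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # two obvious groups
print(k_means_1d(points, centroids=[0.0, 5.0]))  # converges to [2.0, 11.0]
```

Real implementations also restart from several random initializations, since K-Means only finds a local optimum.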
82
New cards
"What assumptions does K-Means make about clusters?"
"K-Means assumes clusters are roughly spherical, similar in size, and separable by distance to the nearest centroid."
83
New cards
"Why is K-Means sensitive to outliers?"
"K-Means uses the mean as the cluster center, and means are easily pulled toward extreme values, so outliers can distort centroids."
84
New cards
"What is the main difference between K-Means and hierarchical clustering?"
"K-Means requires choosing K beforehand and produces a flat clustering. Hierarchical clustering builds a tree-like structure of nested clusters and does not require fixing the number of clusters in advance."
85
New cards
"What is the difference between agglomerative and divisive hierarchical clustering?"
"Agglomerative clustering starts with each point as its own cluster and repeatedly merges the closest clusters. Divisive clustering starts with all points in one cluster and recursively splits them."
86
New cards
"What is a dendrogram in hierarchical clustering?"
"A dendrogram is a tree diagram that shows how clusters merge or split at different distance levels. It helps decide how many clusters may be reasonable."
87
New cards
"What is the purpose of the Expectation-Maximization algorithm?"
"EM is used to estimate model parameters when there are hidden or latent variables. It alternates between estimating hidden assignments and updating parameters to maximize likelihood."
88
New cards
"How is EM for Gaussian Mixture Models different from K-Means?"
"K-Means uses hard assignments, while EM for Gaussian Mixture Models uses soft assignments: each point receives a probability of belonging to each cluster, and clusters can have different shapes and covariances."
89
New cards
"What is the difference between Euclidean and Manhattan distance?"
"Euclidean distance measures straight-line distance, while Manhattan distance sums the absolute differences along each coordinate axis."
90
New cards
"When is cosine similarity better than Euclidean distance?"
"Cosine similarity is better when the direction of vectors matters more than their magnitude. It is commonly used for text embeddings and other high-dimensional sparse data."
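The three measures from the last two cards can be sketched together (the vectors are made up; note the cosine similarity of 1.0 despite the nonzero distances):

```python
import math

def euclidean(a, b):
    return math.dist(a, b)  # straight-line distance

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))  # sum of per-axis differences

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))  # normalize out magnitude

a, b = (3.0, 0.0), (6.0, 0.0)   # same direction, different magnitude
print(euclidean(a, b))          # 3.0
print(manhattan(a, b))          # 3.0
print(cosine_similarity(a, b))  # 1.0: identical direction despite different lengths
```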