Vocabulary flashcards covering fundamental ML concepts, logistic regression, model evaluation, and hyperparameter tuning from the lecture notes.
Machine Learning
A branch of artificial intelligence in which algorithms learn patterns from data and use them to make predictions or decisions without being explicitly programmed for the task.
Artificial Intelligence
A broad field focused on making machines perform tasks that normally require human intelligence; it includes machine learning (ML), deep learning (DL), and rule-based systems.
Data preprocessing
Cleaning, transforming, and preparing data (handling missing values, scaling) before feeding it to a model.
Training data
The data used to fit a machine learning model; in supervised learning it is labeled with the correct outputs.
Test data
Held-out data, not used during training, that is used to estimate how well the trained model performs on unseen data.
Model
A mathematical representation built from data to make predictions or decisions.
Hyperparameter
A model setting chosen before training that controls learning behavior and model capacity (e.g., the regularization strength in logistic regression).
Parameter
A quantity learned from data during training (e.g., coefficients in regression).
GridSearchCV
A scikit-learn class that exhaustively evaluates every combination of hyperparameter values in a specified grid, scores each combination with cross-validation, and selects the best-scoring settings.
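GridSearchCV itself is a scikit-learn class (typically used as `GridSearchCV(estimator, param_grid, cv=5).fit(X, y)`, after which the winning settings are in `best_params_`). To show what it does without depending on scikit-learn, here is a minimal stdlib-only sketch of the same idea: try every combination in a parameter grid, score each with k-fold cross-validation, and keep the best. The toy 1-D nearest-neighbour classifier, its `n_neighbors` hyperparameter, the interleaved fold assignment, and all function names are hypothetical illustrations, not scikit-learn's implementation.

```python
from itertools import product
from statistics import fmean


def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical toy model: majority vote among the n_neighbors closest 1-D points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    votes_for_1 = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes_for_1 > len(nearest) else 0


def cv_accuracy(xs, ys, k_folds, **params):
    """Mean validation accuracy over k folds (fold f holds every k_folds-th sample)."""
    n = len(xs)
    if not 2 <= k_folds <= n:
        raise ValueError("k_folds must be between 2 and the number of samples")
    fold_scores = []
    for f in range(k_folds):
        val_idx = list(range(f, n, k_folds))
        val_set = set(val_idx)
        train_idx = [i for i in range(n) if i not in val_set]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[j], **params) == ys[j] for j in val_idx
        )
        fold_scores.append(correct / len(val_idx))
    return fmean(fold_scores)


def grid_search(xs, ys, param_grid, k_folds=5):
    """Exhaustively score every hyperparameter combination; return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(xs, ys, k_folds, **params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score


# Toy data: class 1 for larger x, with two deliberately noisy labels
xs = list(range(20))
ys = [0] * 10 + [1] * 10
ys[3], ys[15] = 1, 0
best_params, best_score = grid_search(xs, ys, {"n_neighbors": [1, 3, 5, 7]})
print(best_params, round(best_score, 3))
```

The real class adds refitting on the full data, configurable scoring, and parallelism; this sketch only mirrors the search-plus-cross-validation loop.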
Cross-validation
A model evaluation method that repeatedly splits the data into training and validation sets and averages the validation scores to estimate how well the model generalizes.
K-fold cross-validation
A type of cross-validation where the data is split into k roughly equal folds; each fold serves as the validation set once while the remaining k - 1 folds are used for training.
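A minimal sketch of how k-fold indices can be generated, assuming contiguous folds with no shuffling (scikit-learn's `KFold` assigns folds the same way when `shuffle=False`; in practice the data is often shuffled first). The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample, so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, 3):
    print("validate on", val_idx, "train on", len(train_idx), "samples")
```

Each index lands in exactly one validation fold, so averaging the k validation scores uses every sample for evaluation exactly once.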
Feature scaling
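Preprocessing that standardizes or normalizes features so they share a comparable scale, preventing features with large numeric ranges from dominating distance-based or gradient-based models.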
Standardization
Scaling features to zero mean and unit variance.
Normalization
Scaling features to a fixed range, typically [0, 1].
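A minimal stdlib sketch of both scalings applied to a single feature. It assumes the feature is not constant (otherwise the standard deviation or range is zero) and uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses. The function names and the `ages` values are illustrative.

```python
import statistics


def standardize(values):
    """Z-scores: subtract the mean, divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mean) / std for v in values]


def normalize(values):
    """Min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]
print([round(z, 3) for z in standardize(ages)])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
print(normalize(ages))                           # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In a real pipeline, the mean and standard deviation (or minimum and maximum) should be computed on the training data only and reused to transform the test data, so no information from the test set leaks into training.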
Logistic Regression
A supervised statistical technique that models the probability of a categorical (typically binary) dependent variable using the logistic (sigmoid) function.
Sigmoid function
An S-shaped function, σ(z) = 1 / (1 + e^{-z}), that maps any real number to a value strictly between 0 and 1, interpreted as a probability.
Logistic model equation
p = 1 / (1 + e^{-(β0 + β1 x)}), the predicted probability that the outcome is 1 (the positive class) for a single predictor x.
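A minimal sketch of the sigmoid and the single-predictor logistic model above. The coefficients are hypothetical values chosen for illustration, not fitted to any data; the branch on the sign of z is a common trick to keep `math.exp` from overflowing on large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into (0, 1) in exact arithmetic."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent e^z / (1 + e^z)
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, beta0, beta1):
    """p = 1 / (1 + e^{-(beta0 + beta1 * x)}) for a single predictor x."""
    return sigmoid(beta0 + beta1 * x)


beta0, beta1 = -4.0, 0.8  # hypothetical coefficients
for x in (0, 5, 10):
    print(x, round(predict_proba(x, beta0, beta1), 3))
# 0 0.018
# 5 0.5
# 10 0.982
```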
Binary Logistic Regression
Logistic regression where the dependent variable has two possible outcomes/classes.
Multinomial Logistic Regression
Logistic regression for a dependent variable with three or more unordered categories.
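Multinomial logistic regression is usually formulated with the softmax function, which turns one linear score per class into probabilities that sum to 1; with two classes it reduces to the sigmoid of the score difference. A minimal sketch, where the three scores are hypothetical stand-ins for the per-class linear predictors at a single input.

```python
import math


def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear scores (one per class) for a single observation
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class)  # 0
```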
Ordinal Logistic Regression
Logistic regression for an ordinal dependent variable with ordered categories.
Decision boundary
The boundary in feature space that separates predicted classes.
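For logistic regression with the usual 0.5 threshold, p >= 0.5 exactly when β0 + β1 x1 + β2 x2 >= 0, so with two features the decision boundary is the straight line β0 + β1 x1 + β2 x2 = 0. A minimal sketch with hypothetical coefficients (β0 = -3, β1 = β2 = 1, giving the line x1 + x2 = 3); the function names are illustrative.

```python
import math

# Hypothetical coefficients for two features (illustration only)
BETA0, BETA1, BETA2 = -3.0, 1.0, 1.0


def probability(x1, x2):
    """Predicted probability of class 1 from the logistic model."""
    z = BETA0 + BETA1 * x1 + BETA2 * x2
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x1, x2, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if probability(x1, x2) >= threshold else 0


def boundary_x2(x1):
    """x2 on the p = 0.5 boundary, solved from BETA0 + BETA1*x1 + BETA2*x2 = 0."""
    return -(BETA0 + BETA1 * x1) / BETA2


print(boundary_x2(1.0))         # 2.0
print(predict_class(1.0, 2.5))  # 1 (above the line x1 + x2 = 3)
print(predict_class(1.0, 1.5))  # 0 (below it)
```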
Confusion Matrix
A table that cross-tabulates predicted versus actual class labels; for binary classification its cells are TP, FP, TN, and FN.
True Positive
Predicted positive and actual positive.
True Negative
Predicted negative and actual negative.
False Positive
Predicted positive but actually negative.
False Negative
Predicted negative but actually positive.
Accuracy
Proportion of correct predictions: (TP + TN) / (TP + TN + FP + FN).
Precision
Proportion of predicted positives that were actually positive: TP / (TP + FP).
Recall (Sensitivity)
Proportion of actual positives that were correctly predicted: TP / (TP + FN).
F1 Score
Harmonic mean of precision and recall: 2 * (Precision * Recall) / (Precision + Recall).
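A minimal stdlib sketch that tallies the confusion-matrix cells and computes accuracy, precision, recall, and F1 for binary labels, treating 1 as the positive class. Returning 0.0 when a denominator is zero is a common convention assumed here; the example labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 5, 2)
print({k: round(v, 3) for k, v in classification_metrics(y_true, y_pred).items()})
# {'accuracy': 0.7, 'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

Here accuracy (0.7) looks reasonable while recall shows that half of the actual positives were missed, which is why precision, recall, and F1 are reported alongside accuracy.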
AUC-ROC
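Area Under the Receiver Operating Characteristic Curve (which plots the true positive rate against the false positive rate across all thresholds); measures a classifier’s ability to distinguish between classes, where 0.5 corresponds to random guessing and 1.0 to perfect separation.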
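AUC-ROC has an equivalent rank interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch using that identity and made-up scores; the pairwise loop costs O(positives × negatives), so it is meant for illustration rather than large datasets.

```python
def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]   # predicted probabilities of class 1
print(auc_roc(y_true, scores))  # 0.75
```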
Threshold
Probability cutoff used to assign a class (commonly 0.5); raising it typically increases precision and lowers recall, and lowering it does the reverse.
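A minimal sketch of the trade-off using made-up predicted probabilities: moving the cutoff upward makes the model predict the positive class less often, so in this example precision rises while recall falls. The function name is illustrative.

```python
def precision_recall_at(probs, y_true, threshold):
    """Precision and recall when class 1 is predicted for probabilities >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


probs = [0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]
y_true = [1, 1, 0, 1, 1, 0, 0, 0]
for threshold in (0.3, 0.5, 0.7):
    precision, recall = precision_recall_at(probs, y_true, threshold)
    print(threshold, round(precision, 3), round(recall, 3))
# 0.3 0.667 1.0
# 0.5 0.75 0.75
# 0.7 1.0 0.5
```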
Odds
Ratio of the probability of an event to its complement: p / (1 - p).
Log odds
Natural logarithm of the odds, ln(p / (1 - p)), also called the logit; logistic regression models the log odds as a linear function of the predictors.
Odds ratio
e^β (that is, exp(β)); the multiplicative change in the odds for a one-unit increase in a predictor, holding other predictors constant.
Intercept
β0; the log odds when all predictors are zero.
Coefficient
βi; the change in log odds per one-unit increase in predictor i, holding others constant.
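A minimal sketch tying odds, log odds, the intercept, and the odds ratio together for a single-predictor model. The coefficients are hypothetical; the point is that the log odds equal β0 + β1 x exactly, and that the ratio of the odds at x + 1 to the odds at x equals e^β1 regardless of the starting x.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def log_odds(p):
    """Logit: natural log of the odds."""
    return math.log(odds(p))


beta0, beta1 = -1.0, 0.7  # hypothetical intercept and coefficient


def prob(x):
    """Logistic model probability for a single predictor x."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * x)))


print(round(odds(0.8), 6))                          # 4.0
print(round(log_odds(prob(0.0)), 6))                # -1.0 (the intercept)
print(round(math.exp(beta1), 4))                    # 2.0138 (odds ratio)
print(round(odds(prob(3.0)) / odds(prob(2.0)), 4))  # 2.0138
```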
Multicollinearity
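High correlation among independent variables, which inflates the variance of the coefficient estimates and makes them unstable; commonly detected with the Variance Inflation Factor (VIF).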
Variance Inflation Factor (VIF)
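A metric for detecting multicollinearity: VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing predictor i on all the other predictors. It measures how much the variance of coefficient i is inflated by correlation with the other predictors; values above 5 or 10 are common rules of thumb for problematic multicollinearity.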
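A minimal stdlib sketch for the special case of exactly two predictors, where the R² from regressing one predictor on the other equals their squared Pearson correlation, so both share VIF = 1 / (1 - r²). With more predictors, R_i² has to come from a full multiple regression (for example via statsmodels' `variance_inflation_factor`), which this sketch does not attempt. The height and weight values are made up.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF shared by both predictors when they are the only two: 1 / (1 - r^2)."""
    r_squared = pearson_r(a, b) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)


height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 71, 77, 90]
print(round(vif_two_predictors(height_cm, weight_kg), 1))  # 62.4, far above 5 or 10
```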
Box-Tidwell test
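A test of the assumption that each continuous predictor has a linear relationship with the logit (log odds); it adds an x · ln(x) interaction term for each continuous predictor and checks whether that term is statistically significant.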
Likelihood interpretation of coefficients
Coefficients reflect log odds; exponentiated coefficients give odds ratios.
Decision boundary concept
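The surface the model uses to separate classes when making predictions; for logistic regression with a 0.5 cutoff it is the set of points where β0 + β1 x1 + ... + βk xk = 0, which is linear in the features.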
Feature scaling (overview)
Standardization or normalization to improve model performance and convergence.