Vocabulary flashcards covering key concepts from the lecture on data preparation, linear regression, and basic ML workflow.
y (ground truth / true label)
The actual target value for a sample used during training; the model aims for y_hat to be as close as possible to y.
y_hat (predicted value)
The model's predicted value for a sample; used to compute the error with the true label y.
w (weights)
Model parameters learned during training; the coefficients multiplied with the features x to produce predictions (the intercept is handled separately by the bias b).
b (bias / intercept)
A constant term added to the linear combination to shift the prediction.
i (sample index)
Index of a data sample (0-based in the lecture; runs from 0 to n − 1).
n (number of samples)
The total number of samples in the dataset.
d (number of features)
The number of feature dimensions (input size for each sample).
y_hat = w^T x + b (linear model equation)
The linear predictor: a dot product between the weights and the features plus the bias, giving the predicted value.
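A minimal sketch of the linear predictor with NumPy; the weight, bias, and feature values are illustrative, not from the lecture:

```python
import numpy as np

# Hypothetical weights, bias, and one feature vector (d = 3)
w = np.array([2.0, -1.0, 0.5])
b = 1.0
x = np.array([1.0, 2.0, 4.0])

# Linear predictor: dot product of weights and features, plus the bias
y_hat = w @ x + b  # 2*1 + (-1)*2 + 0.5*4 + 1 = 3.0
```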
Mean Squared Error (MSE)
Average of the squared differences between predicted and true values: (1/n) Σ (y_hat_i − y_i)^2.
Root Mean Squared Error (RMSE)
Square root of MSE; same units as y and easier to interpret.
Mean Absolute Error (MAE)
Average of the absolute differences between predicted and true values; less sensitive to outliers.
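The three error metrics above can be sketched in a few NumPy lines; the label and prediction values are made up for illustration:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])   # ground-truth labels y
y_pred = np.array([2.0, 5.0, 4.0])   # model predictions y_hat

errors = y_pred - y_true             # per-sample errors: [-1, 0, 2]
mse = np.mean(errors ** 2)           # mean of squared errors
rmse = np.sqrt(mse)                  # same units as y
mae = np.mean(np.abs(errors))        # mean of absolute errors; robust to outliers
```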
Classification vs Regression
Classification predicts discrete categories; regression predicts continuous numeric values.
One-hot encoding
Converts a categorical feature with k categories into k binary features to avoid implying ordinal relationships.
Drop first in one-hot encoding
Option (drop_first=True) to remove one dummy column and avoid redundancy/collinearity.
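A short sketch of both one-hot variants with pandas `get_dummies` (the `color` column is a hypothetical example):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# k = 3 categories -> 3 binary columns
full = pd.get_dummies(df, columns=["color"])

# drop_first=True removes one dummy column to avoid redundancy/collinearity
reduced = pd.get_dummies(df, columns=["color"], drop_first=True)
```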
Label encoding
Assigns integers to categories; can introduce artificial order and is not ideal for nominal categories.
Structured data
Tabular data with rows and columns (like an Excel file) where features and labels are clearly defined.
Unstructured data
Data without a fixed schema (e.g., text, images); images are matrices of pixel values.
Train-test split
Partition data into training and testing subsets; often 80/20; random_state for reproducibility; stratify to preserve class distribution.
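The split described above can be sketched with scikit-learn's `train_test_split`; the toy arrays are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)      # 10 samples, 2 features
y = np.array([0] * 5 + [1] * 5)       # balanced binary labels

# 80/20 split; random_state makes it reproducible,
# stratify=y preserves the class distribution in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```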
Random state (seed)
A seed for the random number generator to ensure reproducible splits and results.
Imputation
Filling in missing values (e.g., with column mean); training data used to compute imputation values to avoid leaking test information.
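A minimal sketch of mean imputation with NumPy, using the training mean for both splits to avoid leakage; the columns are made-up examples:

```python
import numpy as np

train_col = np.array([1.0, np.nan, 3.0])
test_col = np.array([np.nan, 5.0])

# Compute the fill value on the TRAINING data only
train_mean = np.nanmean(train_col)   # 2.0

# Apply the same training mean to both splits (no test-set leakage)
train_filled = np.where(np.isnan(train_col), train_mean, train_col)
test_filled = np.where(np.isnan(test_col), train_mean, test_col)
```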
Standardization
Scaling features to zero mean and unit variance (z-scores) using fit_transform on training data and transform on test data.
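The fit-on-train, transform-on-test pattern can be sketched with scikit-learn's `StandardScaler`; the data is illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [3.0], [5.0]])
X_test = np.array([[3.0]])

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_std = scaler.transform(X_test)        # reuse the training mean/std
```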
Min-max scaling
Scaling features to [0, 1] by subtracting the min and dividing by the range.
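Min-max scaling is a one-liner in NumPy; the feature values are made up:

```python
import numpy as np

x = np.array([2.0, 4.0, 10.0])

# Subtract the min, divide by the range -> values land in [0, 1]
x_scaled = (x - x.min()) / (x.max() - x.min())
```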
Pseudoinverse
Generalized inverse used when X^T X is not invertible; enables least-squares solutions for non-square matrices.
Normal equations
Closed-form solution for linear regression: w* = (X^T X)^{-1} X^T y; can be expensive for large datasets, hence iterative methods are common.
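Both closed-form routes above can be sketched with NumPy on a tiny dataset that exactly satisfies y = 1 + 2x (the data is illustrative):

```python
import numpy as np

# Design matrix with a leading column of ones for the intercept w_0
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])   # generated by y = 1 + 2*x

# Normal equations: w* = (X^T X)^{-1} X^T y
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Pseudoinverse: also works when X^T X is not invertible
w_pinv = np.linalg.pinv(X) @ y
```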
Simple vs. Multiple Linear Regression
Simple: one independent variable; Multiple: more than one independent variable.
Intercept (w_0)
The predicted value when all features are zero; the base level of the regression line.
Slope (w_1, etc.)
The change in the predicted value for a one-unit change in the corresponding feature.