Comprehensive vocabulary and core concepts for Machine Learning and Data Science based on lecture notes covering chapters 2, 4, 5, and 6.
pandas
Library for loading and manipulating tabular data using objects like DataFrame.
numpy
Library for numerical computation with arrays (ndarray) using vectorized operations.
matplotlib
Library used for data visualization and plotting, such as bar graphs and line plots.
sklearn
Library providing machine learning models, metrics, and utilities like train_test_split and accuracy_score.
X
The feature matrix containing all independent input/predictor columns.
y
The target vector representing the single dependent variable, outcome, or label to predict.
train_test_split
A utility to split data into training and test sets, often using a 70/30 or 75/25 ratio.
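A minimal sketch of a 70/30 split with scikit-learn. The toy arrays and the choice of `random_state=0` are assumptions made up for illustration; only `train_test_split` itself comes from the card above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (made up for illustration): 20 rows, 2 feature columns.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# test_size=0.3 requests a 70/30 split; random_state makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)
```

The rows are shuffled before splitting by default, so X_train and y_train stay aligned row by row.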
pd.factorize()
A pandas function that converts categorical text into integer labels, returning both the label array and the unique values.
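A short sketch of what the two return values look like. The color values are made-up sample data.

```python
import pandas as pd

# Made-up categorical column.
colors = pd.Series(["red", "blue", "red", "green"])

# labels: one integer code per row; uniques: distinct values
# in order of first appearance.
labels, uniques = pd.factorize(colors)
print(labels)
print(uniques)
```

The integer code at position i indexes into uniques, so `uniques[labels]` recovers the original text values.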
.mask()
A pandas method used for conditional replacement, often used for binning continuous columns into categories.
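One possible way to bin a continuous column with `.mask()`. The column name `age`, the cut-offs 18 and 65, and the category names are all assumptions chosen for illustration.

```python
import pandas as pd

# Made-up continuous column.
df = pd.DataFrame({"age": [5, 17, 30, 64, 80]})

# Start from a default label, then overwrite wherever a condition is True.
group = pd.Series("adult", index=df.index)
group = group.mask(df["age"] < 18, "child")
group = group.mask(df["age"] >= 65, "senior")
df["age_group"] = group
print(df)
```

`.mask(cond, other)` replaces values where the condition is True and keeps the rest, which is the opposite of `.where()`.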
.fit()
The method used to train a machine learning model on training data during the learning phase.
.predict()
The method used to generate predictions on new data during the inference or testing phase.
accuracy_score
Evaluation metric for classification representing the fraction of correct predictions.
confusion_matrix
A table specifically used for classification to show counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
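To make the four confusion-matrix cells and accuracy concrete, here is a plain-Python sketch that counts them by hand. The label lists are made up, and class 1 is assumed to be the positive class.

```python
# Made-up true labels and predictions; 1 is treated as the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

# Accuracy is the fraction of predictions that match the true label.
accuracy = (tp + tn) / len(pairs)
print(tp, tn, fp, fn, accuracy)
```

For binary labels 0 and 1, sklearn's `confusion_matrix(y_true, y_pred)` lays the same counts out as [[TN, FP], [FN, TP]].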
Normalization
Min-max scaling of each feature to the range [0, 1] using $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$, so that features with large numeric ranges do not dominate kNN distances.
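A minimal sketch of min-max normalization on one column. The helper name `min_max_normalize`, the sample incomes, and the handling of a constant column are assumptions, not part of the notes.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: return zeros to avoid dividing by zero
        # (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up feature with a large numeric range.
incomes = [20000, 35000, 50000, 80000]
scaled = min_max_normalize(incomes)
print(scaled)
```

The smallest value maps to 0 and the largest to 1, so every feature contributes on the same scale when distances are computed.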
Supervised learning
Learning from labeled training data to predict future outcomes, branching into regression and classification.
Regression
A model type for relationships between continuous variables that estimates a numerical output.
Linear regression
Fitting a straight line defined as y=c+mX using Ordinary Least Squares (OLS) to minimize Mean Squared Error (MSE).
c (intercept)
The constant or bias value of y when X=0, where the line crosses the y-axis.
m (coefficient)
The slope or weight representing how much y changes per unit increase in X.
MSE (Mean Squared Error)
The average of squared prediction errors; it is the primary objective to minimize in OLS.
RMSE (Root Mean Squared Error)
The square root of MSE, resulting in error values in the same units as the target variable y.
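The pieces above (intercept c, slope m, MSE, RMSE) can be tied together in a small sketch. It uses the closed-form OLS solution for a single feature; the data points are made up.

```python
import math

# Made-up data that roughly follows a straight line.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(y) / n

# Closed-form OLS for one feature: slope m, then intercept c.
m = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(X, y)) / sum(
    (xi - x_mean) ** 2 for xi in X
)
c = y_mean - m * x_mean

# Predictions from y = c + mX, then the two error metrics.
preds = [c + m * xi for xi in X]
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n
rmse = math.sqrt(mse)
print(c, m, mse, rmse)
```

RMSE is reported in the same units as y, which usually makes it easier to interpret than MSE.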
Overfitting
A condition where a model memorizes training data and fails to generalize to new data, often characterized by low bias and high variance.
Ridge regression
Linear regression with an L2 penalty that shrinks all coefficients but retains all features.
Lasso regression
Linear regression with an L1 penalty that can zero out irrelevant features for automatic feature selection.
Gradient descent
Iterative optimization that minimizes a cost function by repeatedly stepping against the gradient, scaled by a learning rate (alpha), and stopping when the change in cost falls below a threshold (epsilon).
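A rough sketch of gradient descent fitting y = c + mX by minimizing MSE. The values of alpha and epsilon, the `max_iters` safeguard, and the data are assumptions for illustration; a real problem would need its own tuning.

```python
def gradient_descent(X, y, alpha=0.05, epsilon=1e-10, max_iters=100_000):
    """Fit y = c + m*X by minimizing MSE; returns (c, m)."""
    c, m = 0.0, 0.0
    n = len(X)
    prev_cost = float("inf")
    for _ in range(max_iters):  # max_iters is a safeguard added here
        errors = [(c + m * xi) - yi for xi, yi in zip(X, y)]
        cost = sum(e * e for e in errors) / n
        # Stop once the cost changes by less than epsilon.
        if abs(prev_cost - cost) < epsilon:
            break
        prev_cost = cost
        # Partial derivatives of MSE with respect to c and m.
        grad_c = 2 / n * sum(errors)
        grad_m = 2 / n * sum(e * xi for e, xi in zip(errors, X))
        # Step against the gradient, scaled by the learning rate.
        c -= alpha * grad_c
        m -= alpha * grad_m
    return c, m

# Made-up data generated from y = 1 + 2x.
X = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
c, m = gradient_descent(X, y)
print(c, m)
```

If alpha is too large the updates overshoot and the cost grows instead of shrinking; if it is too small, convergence takes many more iterations.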
r2_score
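R-squared value representing the proportion of variance in y explained by a regression model. The best possible score is 1, a model that always predicts the mean of y scores 0, and the score can be negative for a model that does worse than that baseline.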
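A small sketch of the R-squared calculation, 1 minus the ratio of residual to total sum of squares, showing a good model, the mean baseline, and a model worse than the baseline. The helper name `r2` and the numbers are made up.

```python
def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
good = r2(y_true, [1.1, 1.9, 3.2, 3.8])      # close predictions
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
bad = r2(y_true, [4.0, 3.0, 2.0, 1.0])       # worse than the mean
print(good, baseline, bad)
```

sklearn's `r2_score(y_true, y_pred)` computes the same quantity.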
Classification
A supervised learning task that predicts discrete categories or class labels.
kNN algorithm
A lazy learner that classifies new points by taking a majority vote of the k nearest neighbors' labels.
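A minimal kNN sketch using Euclidean distance and a majority vote. The function name `knn_predict`, the choice k=3, and the points are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # "Lazy": no training step, distances are computed at prediction time.
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up 2D points forming two clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(points, labels, (2, 2))
pred_near_b = knn_predict(points, labels, (6, 5))
print(pred_near_a, pred_near_b)
```

An odd k is a common choice for two-class problems because it avoids tied votes.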
Decision tree
A model that predicts classes using a tree structure of rules composed of decision nodes and leaf nodes.
Entropy
A measure of impurity or disorder; a completely homogeneous node has an entropy of 0, while a two-class node split 50/50 has an entropy of 1 (using log base 2).
Information gain
The reduction in entropy achieved after a split; models like ID3 select attributes with the largest information gain.
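Entropy and information gain can be sketched in a few lines. The helper names and the yes/no labels are made up; log base 2 is assumed so that a 50/50 two-class node has entropy 1.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes perfectly.
pure_split = [["yes"] * 4, ["no"] * 4]
# A split whose children look just like the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_pure = information_gain(parent, pure_split)
gain_useless = information_gain(parent, useless_split)
print(gain_pure, gain_useless)
```

A tree builder that follows the ID3 idea would evaluate each candidate attribute this way and split on the one with the largest gain.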
Random forest
An ensemble method using bagging to build many decision trees, aggregating them via majority vote to reduce overfitting.
Logistic regression
A binary classification algorithm that uses the sigmoid function to convert output into a probability.
Sigmoid function
An S-shaped curve defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$ that bounds output between 0 and 1.
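A direct sketch of the sigmoid, 1 / (1 + e^(-x)), to show how it squashes any real input into the interval (0, 1). The sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    # Note: math.exp overflows for very large negative x (about x < -709);
    # this simple form is only meant for moderate inputs.
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5, 0, 5):
    print(x, sigmoid(x))
```

An input of 0 maps to exactly 0.5, which is why 0.5 is the usual default threshold for turning a probability into a class label.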
Likelihood l()
The probability of the observed data given the model parameters; logistic regression finds its parameters by iteratively maximizing the log-likelihood.
Softmax regression
A multinomial extension of logistic regression for more than two classes that outputs a probability distribution over n classes.
Softmax function
A function that takes a vector of numbers and normalizes them into probabilities that sum to 1.
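A small sketch of softmax. Subtracting the maximum score before exponentiating is a standard numerical-stability step added here; it does not change the result. The input scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # stability step; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

Larger scores receive larger probabilities, and the predicted class is the one with the highest probability.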
Naive Bayes
A classification method based on Bayes' theorem that assumes all features are conditionally independent given the class.
Prior probability P(c)
The base rate probability of a class before observing the current data.
Posterior probability P(c|x)
The updated probability of a class after seeing the data; the highest posterior wins in Naive Bayes.
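To illustrate how a prior becomes a posterior, here is a toy Naive Bayes calculation. The classes, words, and every probability in it are hypothetical numbers invented for the example.

```python
# Hypothetical priors P(c) for two classes.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-class likelihoods P(word | c).
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.02, "meeting": 0.20},
}

def posterior(words):
    """Return P(c | words) under the naive independence assumption."""
    scores = {}
    for c in priors:
        score = priors[c]
        for w in words:
            # Independence assumption: multiply per-feature likelihoods.
            score *= likelihoods[c][w]
        scores[c] = score
    total = sum(scores.values())  # normalizing constant P(words)
    return {c: s / total for c, s in scores.items()}

post = posterior(["free"])
predicted = max(post, key=post.get)  # highest posterior wins
print(post, predicted)
```

Even though "ham" has the larger prior, seeing the word "free" shifts the posterior strongly toward "spam".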
MultinomialNB
A Naive Bayes variant specifically for count or categorical data following a multinomial distribution.
GaussianNB
A Naive Bayes variant for continuous features that assumes a Gaussian (normal) distribution.
SVM (Support Vector Machine)
A classifier that finds the optimal separating hyperplane with the maximum margin between classes.
Support vectors
The training data points closest to the hyperplane that define and control the margin size.
Margin
The distance between the separating hyperplane and the nearest support vectors; maximized to improve generalization.
Kernel
A function that implicitly maps data into a higher-dimensional space where the classes become separable (the kernel trick), with options like linear, rbf, and poly.
ROC / AUC
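ROC plots the true positive rate (TPR) against the false positive rate (FPR) across classification thresholds; AUC is the area under that curve, where 1.0 is perfect and 0.5 is no better than random guessing.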
Precision
Calculation defined as $\frac{TP}{TP + FP}$, measuring how many predicted positives were actually correct.
Recall (Sensitivity)
Calculation defined as $\frac{TP}{TP + FN}$, measuring what fraction of actual positives the model caught.
F1 Score
The harmonic mean of precision and recall, calculated as $F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$.
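Precision, recall, and F1 can all be computed from three confusion-matrix counts. In this sketch the counts are made up, and returning 0.0 when a denominator is zero is a convention chosen for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(precision, recall, f1)
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, so a model cannot score well by maximizing only precision or only recall.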