Machine Learning and Data Science Study Sheet Flashcards

Description and Tags

Comprehensive vocabulary and core concepts for Machine Learning and Data Science based on lecture notes covering chapters 2, 4, 5, and 6.

Last updated 5:32 PM on 6/27/26

50 Terms

New cards

pandas

Library for loading and manipulating tabular data using objects like DataFrame.

numpy

Library for numerical computation with arrays (ndarray) using vectorized operations.

matplotlib

Library used for data visualization and plotting, such as bar graphs and line plots.

sklearn

Library providing machine learning models, metrics, and utilities like train_test_split and accuracy_score.

X

The feature matrix containing all independent input/predictor columns.

y

The target vector representing the single dependent variable, outcome, or label to predict.

train_test_split

A utility to split data into training and test sets, often using a 70/30 or 75/25 ratio.
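
To make the idea of a train/test split concrete, here is a minimal standard-library sketch. The helper name `simple_train_test_split` and its parameters are hypothetical and only imitate the general behavior; this is not sklearn's implementation, which offers more options (such as stratification) and may round the split size differently.

```python
import random

def simple_train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle row indices, then take the first shuffled chunk as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_test = round(len(rows) * test_size)     # number of held-out rows
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test

rows = list(range(20))
train, test = simple_train_test_split(rows, test_size=0.25, seed=42)
print(len(train), len(test))  # 15 5
```

Every row ends up in exactly one of the two sets, so the model is evaluated only on data it never saw during training.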

pd.factorize()

A pandas function that encodes categorical values as integer codes, returning both the codes and the array of unique values.

.mask()

A pandas method for conditional replacement that replaces values where a condition is True; often used for binning continuous columns into categories.

.fit()

The method used to train a machine learning model on training data during the learning phase.

.predict()

The method used to generate predictions on new data during the inference or testing phase.

accuracy_score

Evaluation metric for classification representing the fraction of correct predictions.

confusion_matrix

A table specifically used for classification to show counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
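
As a rough illustration of how the four confusion-matrix counts and accuracy relate, the sketch below tallies them by hand in plain Python. The function name `confusion_counts` and the sample labels are made up for this example; sklearn's `confusion_matrix` returns the same information as a 2D array.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1          # predicted positive, actually positive
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        elif actual == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # illustrative predictions
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```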

Normalization

Min-max scaling of features to the range $[0, 1]$ using the formula $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$ so that no single feature dominates kNN distance calculations.
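
A small sketch of min-max scaling in plain Python follows. The function name `min_max_normalize`, the sample values, and the handling of a constant column (returning zeros) are assumptions made for the example.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero (a choice made for this sketch)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 60]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```

In pandas the same expression is typically applied per column, for example with `apply` and a lambda.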

Supervised learning

Learning from labeled training data to predict future outcomes, branching into regression and classification.

Regression

A model type for relationships between continuous variables that estimates a numerical output.

Linear regression

Fitting a straight line defined as $y = c + mX$ using Ordinary Least Squares (OLS) to minimize Mean Squared Error (MSE).

c (intercept)

The constant or bias value of $y$ when $X = 0$, where the line crosses the y-axis.

m (coefficient)

The slope or weight representing how much $y$ changes per unit increase in $X$.
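
For a single feature, the OLS intercept $c$ and slope $m$ have a closed form: $m$ is the covariance of $X$ and $y$ divided by the variance of $X$, and $c = \bar{y} - m\bar{X}$. The sketch below uses a hypothetical helper `fit_line` and made-up data that lie exactly on $y = 1 + 2X$.

```python
def fit_line(xs, ys):
    """Closed-form OLS fit of y = c + m * x for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept
    return c, m

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # generated from y = 1 + 2x
c, m = fit_line(xs, ys)
print(c, m)  # 1.0 2.0
```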

MSE (Mean Squared Error)

The average of squared prediction errors; it is the primary objective to minimize in OLS.

RMSE (Root Mean Squared Error)

The square root of MSE, resulting in error values in the same units as the target variable $y$.
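
A short sketch of both error metrics in plain Python; the function names and the sample values are chosen for illustration only.

```python
import math

def mse(y_true, y_pred):
    """Average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as y."""
    return math.sqrt(mse(y_true, y_pred))

y_true = [10, 20, 30, 40]
y_pred = [10, 24, 30, 40]   # one prediction is off by 4
print(mse(y_true, y_pred), rmse(y_true, y_pred))  # 4.0 2.0
```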

Overfitting

A condition where a model memorizes training data and fails to generalize to new data, often characterized by low bias and high variance.

Ridge regression

Linear regression with an L2 penalty that shrinks all coefficients but retains all features.

Lasso regression

Linear regression with an L1 penalty that can zero out irrelevant features for automatic feature selection.

Gradient descent

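Iterative optimization that minimizes a cost function by repeatedly stepping against the gradient, scaled by a learning rate (alpha), and stopping when the change between iterations is less than a tolerance (epsilon).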
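
The update rule can be sketched on a one-dimensional cost. Everything here is an assumption made for illustration: the cost $f(w) = (w - 3)^2$, the helper name `gradient_descent`, and the choice to measure "change" as the change in the parameter rather than in the cost.

```python
def gradient_descent(grad, start, alpha=0.1, epsilon=1e-8, max_iters=10_000):
    """Step against the gradient until the parameter change is below epsilon."""
    w = start
    for _ in range(max_iters):
        w_new = w - alpha * grad(w)      # move downhill, scaled by the learning rate
        if abs(w_new - w) < epsilon:     # stopping criterion
            return w_new
        w = w_new
    return w

# Cost f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(round(w_min, 4))  # 3.0
```

A learning rate that is too large can overshoot and diverge, while one that is too small converges slowly.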

r2_score

R-squared value representing the proportion of variance explained by a regression model; the best possible score is 1, and it can be negative when the model fits worse than always predicting the mean.
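
R-squared is computed as $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, which is why a sufficiently bad model can score below 0. The sketch below uses a hypothetical helper `r2` and made-up values to show the three typical cases.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1, 2, 3, 4]
print(r2(y_true, [1, 2, 3, 4]))          # 1.0  (perfect fit)
print(r2(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0  (always predicts the mean)
print(r2(y_true, [4, 3, 2, 1]))          # -3.0 (worse than the mean)
```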

Classification

A supervised learning task that predicts discrete categories or class labels.

kNN algorithm

A lazy learner that classifies new points by taking a majority vote of the $k$ nearest neighbors' labels.
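
A minimal sketch of the majority-vote idea, assuming Euclidean distance and tiny made-up data. The helper `knn_predict` is hypothetical; sklearn's `KNeighborsClassifier` is the usual tool in practice.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))  # a
print(knn_predict(points, labels, (6, 5)))  # b
```

No training step happens here, which is what "lazy learner" refers to: all the work is deferred to prediction time.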

Decision tree

A model that predicts classes using a tree structure of rules composed of decision nodes and leaf nodes.

Entropy

A measure of impurity or disorder; completely homogeneous nodes have an entropy of 0, while an even split between two classes has an entropy of 1 (using log base 2).

Information gain

The reduction in entropy achieved after a split; models like ID3 select attributes with the largest information gain.
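
Entropy and information gain can be sketched as follows, assuming log base 2 and the standard definition $H = -\sum_i p_i \log_2 p_i$. The function names and the toy labels are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    weighted = sum(len(ch) / len(parent) * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]
print(entropy(parent))                          # 1.0
print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing.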

Random forest

An ensemble method using bagging to build many decision trees, aggregating them via majority vote to reduce overfitting.

Logistic regression

A binary classification algorithm that uses the sigmoid function to convert output into a probability.

Sigmoid function

An S-shaped curve defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$ that bounds output between 0 and 1.
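
A small sketch of the sigmoid in plain Python. Splitting on the sign of the input is an implementation choice made here to avoid overflow for large negative inputs; it computes the same function.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real number into the interval [0, 1]."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)          # safe for very negative x, where exp(-x) would overflow
    return z / (1 + z)

print(sigmoid(0))  # 0.5
```

Logistic regression typically predicts the positive class when this probability is at least 0.5, which corresponds to a raw score of 0 or more.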

Likelihood l()

The probability of the observed data given the model parameters; logistic regression maximizes the log-likelihood iteratively.

Softmax regression

A multinomial extension of logistic regression for more than two classes that outputs a probability distribution over $n$ classes.

Softmax function

A function that takes a vector of numbers and normalizes them into probabilities that sum to 1.
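
The standard definition is $\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$. The sketch below subtracts the maximum score before exponentiating, a common numerical-stability step that does not change the result; the sample scores are made up.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)                           # stability shift, result is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

The largest score always receives the largest probability, and equal scores receive equal probabilities.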

Naive Bayes

A classification method based on Bayes' theorem that assumes all features are conditionally independent given the class.

Prior probability P(c)

The base rate probability of a class before observing the current data.

Posterior probability P(c|x)

The updated probability of a class after seeing the data; the highest posterior wins in Naive Bayes.
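
Bayes' theorem gives $P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$. The sketch below works through it with invented numbers for a two-class spam example; the helper `posteriors`, the class names, and every probability are hypothetical. Under the naive independence assumption, $P(x \mid c)$ is the product of per-feature likelihoods.

```python
def posteriors(priors, likelihoods):
    """Combine priors P(c) and likelihoods P(x | c) into normalized posteriors P(c | x)."""
    unnormalized = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(unnormalized.values())        # P(x), the normalizing constant
    return {c: v / evidence for c, v in unnormalized.items()}

priors = {"spam": 0.4, "ham": 0.6}               # made-up base rates
likelihoods = {
    "spam": 0.8 * 0.5,   # product of two per-feature likelihoods (made up)
    "ham": 0.1 * 0.5,
}
post = posteriors(priors, likelihoods)
predicted = max(post, key=post.get)              # highest posterior wins
print(predicted, round(post["spam"], 3))  # spam 0.842
```

Although "ham" has the higher prior here, the evidence shifts the posterior toward "spam".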

MultinomialNB

A Naive Bayes variant for discrete count features (such as word counts) that follow a multinomial distribution.

GaussianNB

A Naive Bayes variant for continuous features that assumes a Gaussian (normal) distribution.

SVM (Support Vector Machine)

A classifier that finds the optimal separating hyperplane with the maximum margin between classes.

Support vectors

The training data points closest to the hyperplane that define and control the margin size.

Margin

The distance between the separating hyperplane and the nearest support vectors; maximized to improve generalization.

Kernel

A function that implicitly maps data to a higher-dimensional space where classes become separable (the kernel trick), with options like linear, rbf, and poly.

ROC / AUC

ROC plots TPR against FPR across classification thresholds; AUC summarizes model quality, where 1.0 is perfect and 0.5 is random guessing.

Precision

Calculation defined as $\frac{TP}{TP + FP}$, measuring how many predicted positives were actually correct.

Recall (Sensitivity)

Calculation defined as $\frac{TP}{TP + FN}$, measuring what fraction of actual positives the model caught.

F1 Score

The harmonic mean of precision and recall, calculated as $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$.
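
The three metrics can be computed directly from confusion-matrix counts. The sketch below uses a hypothetical helper `precision_recall_f1` and made-up counts, and assumes the denominators are nonzero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts (assumes nonzero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 6 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(precision, recall, round(f1, 3))  # 0.75 0.6 0.667
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a model cannot score well by excelling at only one of them.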