Mean
The average of a set of numbers, found by summing values and dividing by count.
Variance
A measure of how far data points spread out from the mean.
Precision
The proportion of true positives among all predicted positives.
Recall
The proportion of true positives among all actual positives.
F1-score
Harmonic mean of precision and recall, useful for imbalanced datasets.
Supervised Learning
Machine learning using labeled data to train models.
Unsupervised Learning
Machine learning using unlabeled data to find patterns.
Overfitting
When a model performs well on training data but poorly on new data.
Normalization
Scaling data to a standard range, often 0-1.
Neural Network
A computational model inspired by the human brain, consisting of layers of nodes.
Bias in AI
Systematic error introduced when training data misrepresents reality.
Bayes' Theorem
A formula for conditional probability: P(A|B) = P(B|A)P(A)/P(B).
ROC Curve
Graph showing trade-off between true positive rate and false positive rate.
Feature Engineering
Creating new input variables to improve model performance.
Reinforcement Learning
Training agents through rewards and penalties for actions taken.
Population vs. sample
A population is the entire set of interest; a sample is a subset drawn from it to estimate population parameters.
Parameter vs. statistic
Parameters describe populations (e.g., μ, σ); statistics describe samples (e.g., x̄, s).
Mean
Sum of values divided by count; sensitive to outliers.
Median
Middle value in a sorted list; robust to outliers.
Mode
Most frequent value; can be multimodal.
Variance
Average squared deviation from the mean; population: σ², sample: s².
Standard deviation
Square root of variance; interpretable spread in original units.
Interquartile range (IQR)
Q3 − Q1; robust spread measure used in boxplots.
Skewness
Asymmetry of distribution; positive skew has a long right tail.
Kurtosis
Tail heaviness relative to normal; high kurtosis implies heavy tails.
Empirical rule
In normal distributions, ~68%, 95%, 99.7% within 1, 2, 3 SDs.
Z-score
Standardized value: (x - μ)/σ; compares across scales.
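A minimal NumPy sketch of standardization, using made-up values:

```python
import numpy as np

x = np.array([4.0, 7.0, 10.0, 13.0, 16.0])   # illustrative values

# z-score: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

print(z)                   # standardized values
print(z.mean(), z.std())   # ~0.0 and 1.0
```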
Central limit theorem
Sample mean tends toward normal as n increases, regardless of population distribution.
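A minimal simulation sketch, assuming NumPy and a skewed exponential population:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: exponential (heavily skewed), mean = 1
n, trials = 50, 10_000
sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

# The distribution of the sample means is approximately normal around 1
print(sample_means.mean())   # close to 1
print(sample_means.std())    # close to 1 / sqrt(n)
```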
Law of large numbers
Sample average converges to population mean as sample size grows.
Correlation vs. causation
Correlation quantifies association; causation requires mechanisms and controls.
Pearson correlation
Linear association; sensitive to outliers; −1 to 1.
Spearman correlation
Rank-based; robust to nonlinearity and outliers.
Probability basics
P(A ∪ B) = P(A) + P(B) - P(A ∩ B); independence: P(A ∩ B)=P(A)P(B).
Conditional probability
P(A|B)=P(A ∩ B)/P(B).
Bayes' theorem
P(A|B)=P(B|A)P(A)/P(B).
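A worked example with made-up numbers (1% prevalence, 95% sensitivity, 5% false-positive rate), sketched in Python:

```python
# P(A): prior probability of the condition
p_a = 0.01
# P(B|A): probability of a positive test if the condition is present
p_b_given_a = 0.95
# P(B|not A): false-positive rate
p_b_given_not_a = 0.05

# Total probability of a positive test, P(B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) via Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # ~0.161: most positives are still false alarms
```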
Prior vs. posterior
Prior: belief before data; posterior: updated belief after observing evidence via Bayes.
Likelihood
Probability of data given parameters; central in ML (maximum likelihood).
Distributions: normal
Symmetric, bell-shaped; defined by μ, σ; ubiquitous in measurement data.
Distributions: binomial
Fixed n trials, success probability p; counts of successes; mean np, var np(1−p).
Distributions: Poisson
Counts of events over fixed interval with rate λ; mean = variance = λ.
Distributions: exponential
Memoryless waiting times; parameter λ; mean 1/λ.
Distributions: Bernoulli
Single trial with success/failure; mean p, variance p(1−p).
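A sampling sketch with NumPy's random generator, using made-up parameters, to check the means and variances listed above (NumPy parameterizes the exponential by scale = 1/λ):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

samples = {
    "normal(mu=0, sigma=1)": rng.normal(0, 1, n),
    "binomial(n=10, p=0.3)": rng.binomial(10, 0.3, n),   # mean np=3, var np(1-p)=2.1
    "poisson(lam=4)":        rng.poisson(4, n),          # mean = var = 4
    "exponential(lam=2)":    rng.exponential(1 / 2, n),  # scale = 1/lambda, mean 0.5
    "bernoulli(p=0.3)":      rng.binomial(1, 0.3, n),    # mean p, var p(1-p)
}

for name, s in samples.items():
    print(f"{name}: mean={s.mean():.3f}, var={s.var():.3f}")
```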
Sampling methods
Simple random, stratified, cluster, systematic; impact bias and variance.
Bias types (stats)
Selection bias, survivorship bias, measurement bias, nonresponse bias.
Confidence intervals
Range likely containing parameter; depends on variability and sample size.
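A minimal normal-approximation sketch on simulated data; for small samples a t critical value would replace 1.96:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=2, size=200)    # made-up sample

# 95% CI for the mean: x_bar +/- 1.96 * s / sqrt(n)
n = len(x)
x_bar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)              # standard error from the sample SD
ci = (x_bar - 1.96 * se, x_bar + 1.96 * se)
print(ci)                                    # should cover 10 in ~95% of repeated samples
```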
Hypothesis testing
Null vs. alternative; p-value assesses evidence against null.
Type I vs. Type II error
Type I: false positive (α); Type II: false negative (β); power = 1−β.
Data quality dimensions
Accuracy, completeness, consistency, timeliness, validity, uniqueness.
Data cleaning
Handle missing (drop, impute), fix types, de-duplicate, resolve outliers, enforce constraints.
Missing data mechanisms
MCAR, MAR, MNAR; guide imputation strategy.
Imputation methods
Mean/median, mode, KNN impute, regression impute, multivariate imputation (MICE).
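A minimal scikit-learn sketch on a toy matrix with missing values (median and KNN imputation only):

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy matrix with missing entries marked as np.nan
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 5.0]])

# Univariate: replace each missing value with the column median
X_median = SimpleImputer(strategy="median").fit_transform(X)

# Multivariate: estimate missing values from the k nearest rows
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

print(X_median)
print(X_knn)
```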
Feature scaling
Normalization (min-max), standardization (z-score), robust scaling (IQR-based).
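A NumPy sketch of the three scalings on a made-up feature that contains an outlier:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.5, 3.0, 100.0])   # made-up feature with an outlier

# Min-max normalization to [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization
standard = (x - x.mean()) / x.std()

# Robust scaling: center on the median, scale by the IQR
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)

print(minmax, standard, robust, sep="\n")
```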
Feature encoding
One-hot, ordinal, target encoding (use with caution to avoid leakage).
Feature selection
Filter (correlation, chi-squared), wrapper (RFE), embedded (L1/L2 regularization).
Dimensionality reduction
PCA (linear), t-SNE/UMAP (manifold visualization), autoencoders (nonlinear).
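A minimal scikit-learn PCA sketch on made-up data (t-SNE/UMAP and autoencoders omitted):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # made-up 5-dimensional data
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # nearly redundant column

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                      # project onto the top 2 components

print(X_2d.shape)                                # (200, 2)
print(pca.explained_variance_ratio_)             # variance retained per component
```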
Data leakage
Training data includes information from the test set or the future; inflates measured performance; avoid via strict splits.
Train/validation/test split
Typical: 60-20-20 or 70-15-15; validation tunes; test is final unbiased estimate.
Cross-validation
k-fold, stratified k-fold for classification; reduces variance of performance estimates.
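A minimal scikit-learn sketch of stratified 5-fold cross-validation on a made-up imbalanced dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy imbalanced binary classification data
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="f1")

print(scores)          # one F1 score per fold
print(scores.mean())   # cross-validated estimate
```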
Stratification
Preserve class proportions across folds/splits; critical in imbalanced data.
Supervised learning
Learn mapping from features X to labels y using labeled data.
Regression vs. classification
Regression predicts continuous values; classification predicts discrete classes.
Overfitting vs. underfitting
Overfit: memorizes noise; underfit: too simple; manage with regularization, more data.
Bias-variance tradeoff
High bias: underfit; high variance: overfit; aim for optimal complexity.
Regularization
L1 (lasso) sparsity, L2 (ridge) shrinkage; reduces overfitting.
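A scikit-learn sketch contrasting L2 and L1 penalties on made-up data where only one feature matters (alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] * 3.0 + rng.normal(size=200)   # only the first feature is informative

ridge = Ridge(alpha=1.0).fit(X, y)         # L2: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)         # L1: drives irrelevant ones to exactly 0

print(np.count_nonzero(ridge.coef_))       # all 20 remain nonzero (just smaller)
print(np.count_nonzero(lasso.coef_))       # far fewer: a sparse solution
```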
Early stopping
Halt training when validation loss stops improving to prevent overfit.
Ensembles
Bagging (Random Forest), boosting (XGBoost), stacking; often superior generalization.
Linear regression
Minimize \(\sum (y - \hat{y})^2\); assumptions: linearity, homoscedasticity, normal errors, independence.
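A closed-form least-squares sketch with NumPy on simulated data (true intercept 2, slope 3):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

# Ordinary least squares: minimize sum((y - y_hat)^2) in closed form
X = np.column_stack([np.ones_like(x), x])   # add an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)                                 # approximately [2, 3]
```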
Logistic regression
Sigmoid outputs probability; decision boundary via log-odds; interpretable coefficients.
KNN
Instance-based; choose k and distance metric; sensitive to scaling and noise.
Naive Bayes
Assumes feature independence; strong baseline for text; fast and robust.
Decision trees
Recursive splits; interpretable; prone to overfitting without pruning.
Random forest
Ensemble of trees via bagging; reduces variance; provides feature-importance estimates.
Gradient boosting
Sequential trees fit residuals; powerful but sensitive to hyperparameters.
SVM
Maximize margin with kernels (linear, RBF); effective in high-dimensional spaces.
Clustering: K-means
Partition into k clusters; minimizes within-cluster variance; requires scaling; assumes roughly spherical clusters.
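A minimal scikit-learn sketch on three made-up blobs, with scaling applied first:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Three made-up blobs in 2D
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 5, 10)])

X_scaled = StandardScaler().fit_transform(X)   # scaling matters for distance-based methods
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

print(km.labels_[:5])   # cluster assignment per point
print(km.inertia_)      # within-cluster sum of squares being minimized
```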
Clustering: hierarchical
Agglomerative/divisive; dendrogram visual; flexible but computationally heavy.
Clustering: DBSCAN
Density-based; finds arbitrary shapes and noise; requires eps/minPts tuning.
Topic modeling
LDA uncovers topics via word distributions; unsupervised text analysis.
Evaluation: accuracy
Proportion correct; misleading in imbalanced data.
Precision
TP / (TP + FP); the fraction of predicted positives that are correct.
Recall (sensitivity)
TP / (TP + FN); the fraction of actual positives that are captured.
Specificity
TN / (TN + FP); true negative rate.
F1-score
Harmonic mean of precision and recall; balances both.
Confusion matrix
2×2 summary: TP, FP, TN, FN; foundation for metrics.
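A scikit-learn sketch computing the counts and the metrics above from made-up labels:

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up binary labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, tn, fn)                     # the counts behind every metric below

print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(recall_score(y_true, y_pred))       # TP / (TP + FN)
print(f1_score(y_true, y_pred))           # harmonic mean of the two
```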
ROC curve
TPR vs. FPR across thresholds; AUC summarizes separability.
PR curve
Precision vs. recall; preferred in heavy class imbalance.
Regression metrics
MAE (robust), MSE (penalizes large errors), RMSE (scale-aware), \(R^2\) (variance explained).
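A NumPy sketch computing all four metrics on made-up predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # made-up targets
y_pred = np.array([2.5, 5.0, 8.0, 12.0])   # made-up predictions

mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mse, rmse, r2)
```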
Calibration
Agreement between predicted probabilities and observed frequencies; reliability diagrams.
Threshold selection
Choose decision threshold optimizing metric of interest (F1, cost-sensitive, Youden's J).
Neural networks
Layers of neurons; weights and biases; nonlinear activations enable complex functions.
Activation functions
ReLU, Leaky ReLU, Sigmoid, Tanh, Softmax (for multiclass probabilities).
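A NumPy sketch of three common activations on a made-up input vector:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))      # negatives clipped to 0
print(sigmoid(z))   # squashed into (0, 1)
print(softmax(z))   # sums to 1: multiclass probabilities
```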
Backpropagation
Gradient-based weight updates via chain rule; paired with optimizers like SGD/Adam.
Vanishing/exploding gradients
Gradients shrink or blow up in deep nets; mitigated with normalization and residual connections.
Batch normalization
Normalizes layer inputs per batch; stabilizes training.
Dropout
Randomly zeroes activations; regularizes by preventing co-adaptation.
CNNs
Convolutions for spatial features; pooling reduces dimensions; used in computer vision.
RNNs
Sequential models with recurrent connections; struggle with long-range dependencies.
LSTM/GRU
Gated RNNs; capture long-term dependencies more effectively than vanilla RNNs.
Transformers
Attention mechanisms model global dependencies; state-of-the-art in NLP and beyond.
Word embeddings
Dense vector representations (Word2Vec, GloVe); capture semantic similarity.