Activation functions
ReLU, Leaky ReLU, Sigmoid, Tanh, Softmax (for multiclass probabilities).
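The activations above can be sketched in a few lines of pure Python (the `alpha` default for Leaky ReLU is a common choice, not a fixed standard):

```python
import math

# Minimal sketches of common activation functions.
def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is a small illustrative slope for negative inputs.
    return x if x > 0 else alpha * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability;
    # the outputs sum to 1, so they can be read as class probabilities.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```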
Backpropagation
Gradient-based weight updates via chain rule; paired with optimizers like SGD/Adam.
Batch normalization
Normalizes layer inputs per batch; stabilizes training.
Bayes' Theorem
A formula for conditional probability: P(A|B) = P(B|A)P(A)/P(B).
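A worked example of the formula, using made-up screening-test numbers (prevalence, sensitivity, and false-positive rate are all illustrative assumptions):

```python
# Bayes' theorem on a hypothetical screening test.
p_disease = 0.01             # P(A): prior prevalence (assumed)
p_pos_given_disease = 0.95   # P(B|A): sensitivity (assumed)
p_pos_given_healthy = 0.05   # false-positive rate (assumed)

# Law of total probability gives P(B), the chance of a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```

Even with a 95% sensitive test, the posterior is only about 16% here because the prior prevalence is so low.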
Bias in AI
Systematic error introduced when training data misrepresents reality.
Bias types (stats)
Selection bias, survivorship bias, measurement bias, nonresponse bias.
Bias-variance tradeoff
High bias: underfit; high variance: overfit; aim for optimal complexity.
Calibration
Agreement between predicted probabilities and observed frequencies; reliability diagrams.
Central limit theorem
Sample mean tends toward normal as n increases, regardless of population distribution.
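A quick simulation illustrating the idea (not a proof): means of samples drawn from a decidedly non-normal uniform(0, 1) distribution cluster tightly around the population mean 0.5.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

def sample_means(n, trials=2000):
    # Each trial: draw n uniform(0, 1) values and record their mean.
    return [statistics.fmean(random.random() for _ in range(n))
            for _ in range(trials)]

means = sample_means(n=50)
grand_mean = statistics.fmean(means)   # close to the population mean 0.5
spread = statistics.stdev(means)       # shrinks roughly like 1/sqrt(n)
```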
Clustering: DBSCAN
Density-based; finds arbitrary shapes and noise; requires eps/minPts tuning.
Clustering: hierarchical
Agglomerative/divisive; dendrogram visual; flexible but computationally heavy.
Clustering: K-means
Partition into k clusters; minimizes within-cluster variance; requires scaling; spherical clusters.
CNNs
Convolutions for spatial features; pooling reduces dimensions; used in computer vision.
Confidence intervals
Range likely containing parameter; depends on variability and sample size.
Confusion matrix
2×2 summary: TP, FP, TN, FN; foundation for metrics.
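Tallying the four cells from labels is a one-liner each; the label lists below are invented for illustration (1 = positive, 0 = negative):

```python
# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
# The four cells always partition the data.
```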
Conditional probability
P(A|B)=P(A ∩ B)/P(B).
Correlation vs. causation
Correlation quantifies association; causation requires mechanisms and controls.
Cross-validation
k-fold, stratified k-fold for classification; reduces variance of performance estimates.
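A minimal sketch of how k-fold index splitting works (no shuffling or stratification; real pipelines typically use a library helper for those):

```python
def kfold_indices(n, k):
    # Distribute n indices into k folds as evenly as possible.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    # Each fold serves once as the validation set; the rest is training.
    return [(sorted(set(range(n)) - set(f)), f) for f in folds]

splits = kfold_indices(10, 3)  # 3 (train_idx, val_idx) pairs
```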
Data cleaning
Handle missing (drop, impute), fix types, de-duplicate, resolve outliers, enforce constraints.
Data leakage
Training data includes information from the test set or the future; inflates measured performance; avoid via strict splits.
Data quality dimensions
Accuracy, completeness, consistency, timeliness, validity, uniqueness.
Decision trees
Recursive splits; interpretable; prone to overfitting without pruning.
Dimensionality reduction
PCA (linear), t-SNE/UMAP (manifold visualization), autoencoders (nonlinear).
Distributions: Bernoulli
Single trial with success/failure; mean p, variance p(1−p).
Distributions: binomial
Fixed n trials, success probability p; counts of successes; mean np, var np(1−p).
Distributions: exponential
Memoryless waiting times; parameter λ; mean 1/λ.
Distributions: normal
Symmetric, bell-shaped; defined by μ, σ; ubiquitous in measurement data.
Distributions: Poisson
Counts of events over fixed interval with rate λ; mean = variance = λ.
Dropout
Randomly zeroes activations; regularizes by preventing co-adaptation.
Early stopping
Halt training when validation loss stops improving to prevent overfit.
Empirical rule
In normal distributions, ~68%, 95%, 99.7% within 1, 2, 3 SDs.
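The three percentages can be checked directly against the standard normal CDF using the stdlib's `statistics.NormalDist`:

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, sd 1

def within(k):
    # Probability mass within k standard deviations of the mean.
    return std.cdf(k) - std.cdf(-k)
# within(1), within(2), within(3) ≈ 0.683, 0.954, 0.997
```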
Ensembles
Bagging (Random Forest), boosting (XGBoost), stacking; often superior generalization.
Evaluation: accuracy
Proportion correct; misleading in imbalanced data.
F1-score
Harmonic mean of precision and recall; balances both.
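Computing F1 from illustrative counts shows why it balances the two: a strong precision cannot mask a weak recall.

```python
# Counts are made up for illustration.
tp, fp, fn = 8, 2, 4
precision = tp / (tp + fp)   # 0.8
recall = tp / (tp + fn)      # 2/3
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```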
Feature encoding
One-hot, ordinal, target encoding (use with caution to avoid leakage).
Feature engineering
Creating new input variables to improve model performance.
Feature scaling
Normalization (min-max), standardization (z-score), robust scaling (IQR-based).
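The first two scalers in pure Python (a sketch; in practice the transform is fit on training data only, then applied to test data):

```python
import statistics

def min_max(xs):
    # Rescale to [0, 1].
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    # Z-score: subtract the mean, divide by the (population) SD.
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

data = [2.0, 4.0, 6.0, 8.0]
scaled = min_max(data)        # spans exactly [0, 1]
zscores = standardize(data)   # mean 0, unit variance
```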
Feature selection
Filter (correlation, chi-squared), wrapper (RFE), embedded (L1/L2 regularization).
Gradient boosting
Sequential trees fit residuals; powerful but sensitive to hyperparameters.
Hypothesis testing
Null vs. alternative; p-value assesses evidence against null.
Imputation methods
Mean/median, mode, KNN impute, regression impute, multivariate imputation (MICE).
Interquartile range (IQR)
Q3 − Q1; robust spread measure used in boxplots.
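The quartile cut points are available from the stdlib; `method="inclusive"` matches the common textbook linear-interpolation convention (quartile conventions do vary between tools):

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13]
# n=4 splits the data at three cut points: Q1, the median, and Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
```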
KNN
Instance-based; choose k and distance metric; sensitive to scaling and noise.
Kurtosis
Tail heaviness relative to normal; high kurtosis implies heavy tails.
Law of large numbers
Sample average converges to population mean as sample size grows.
Likelihood
Probability of data given parameters; central in ML (maximum likelihood).
Linear regression
Minimize Σ(y - ŷ)²; assumptions: linearity, homoscedasticity, normal errors, independence.
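For a single feature, minimizing Σ(y − ŷ)² has the closed form slope = cov(x, y) / var(x), intercept = ȳ − slope·x̄; a pure-Python sketch:

```python
import statistics

def fit_line(xs, ys):
    # Ordinary least squares for simple linear regression.
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = cov / var
    return slope, y_bar - slope * x_bar

# Data lying exactly on y = 2x + 1 is recovered exactly.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```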
Logistic regression
Sigmoid outputs probability; decision boundary via log-odds; interpretable coefficients.
LSTM/GRU
Gated RNNs; capture long-term dependencies more effectively than vanilla RNNs.
Mean
The average of a set of numbers, found by summing values and dividing by count.
Median
Middle value in a sorted list; robust to outliers.
Missing data mechanisms
MCAR, MAR, MNAR; guide imputation strategy.
Mode
Most frequent value; can be multimodal.
Naive Bayes
Assumes feature independence; strong baseline for text; fast and robust.
Neural networks
Layered computational models inspired by the brain: neurons with weights and biases, where nonlinear activations enable learning complex functions.
Normalization
Scaling data to a standard range, often 0-1.
Overfitting
When a model performs well on training data but poorly on new data.
Overfitting vs. underfitting
Overfit: memorizes noise; underfit: too simple; manage with regularization, more data.
Parameter vs. statistic
Parameters describe populations (e.g., μ, σ); statistics describe samples (e.g., x̄, s).
Pearson correlation
Linear association; sensitive to outliers; −1 to 1.
Population vs. sample
A population is the entire set of interest; a sample is a subset drawn from it to estimate population parameters.
PR curve
Precision vs. recall; preferred in heavy class imbalance.
Precision
The proportion of true positives among all predicted positives.
Probability basics
P(A ∪ B) = P(A) + P(B) - P(A ∩ B); independence: P(A ∩ B)=P(A)P(B).
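The union formula can be verified exactly on one die roll, with A = "even" and B = "greater than 3" (exact fractions avoid float noise):

```python
from fractions import Fraction

outcomes = range(1, 7)                   # a fair six-sided die
A = {x for x in outcomes if x % 2 == 0}  # {2, 4, 6}
B = {x for x in outcomes if x > 3}       # {4, 5, 6}

def p(s):
    return Fraction(len(s), 6)

union_direct = p(A | B)                  # count the union outright
union_formula = p(A) + p(B) - p(A & B)   # inclusion–exclusion
```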
Random forest
Ensemble of trees via bagging; reduce variance; feature importance estimates.
Recall
The proportion of true positives among all actual positives.
Regression metrics
MAE (robust), MSE (penalizes large errors), RMSE (scale-aware), R² (variance explained).
Regression vs. classification
Regression predicts continuous values; classification predicts discrete classes.
Regularization
L1 (lasso) sparsity, L2 (ridge) shrinkage; reduces overfitting.
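L2 shrinkage is easiest to see in one dimension, where ridge regression (no intercept) has the closed form w = Σxy / (Σx² + λ); the data below is illustrative:

```python
def ridge_1d(xs, ys, lam):
    # One-feature ridge regression without intercept.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # data on y = 2x exactly
w_ols = ridge_1d(xs, ys, lam=0.0)    # λ = 0 recovers ordinary least squares
w_ridge = ridge_1d(xs, ys, lam=10.0) # larger λ shrinks w toward zero
```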
Reinforcement Learning
Training agents through rewards and penalties for actions taken.
RNNs
Sequential modeling with recurrent connections; struggles with long dependencies.
ROC curve
TPR vs. FPR across thresholds; AUC summarizes separability.
Sampling methods
Simple random, stratified, cluster, systematic; impact bias and variance.
Skewness
Asymmetry of distribution; positive skew has a long right tail.
Spearman correlation
Rank-based; robust to nonlinearity and outliers.
Specificity
TN / (TN + FP); true negative rate.
Standard deviation
Square root of variance; interpretable spread in original units.
Stratification
Preserve class proportions across folds/splits; critical in imbalanced data.
Supervised learning
Machine learning that learns a mapping from features X to labels y using labeled training data.
SVM
Maximize margin with kernels (linear, RBF); effective in high-dimensional spaces.
Threshold selection
Choose decision threshold optimizing metric of interest (F1, cost-sensitive, Youden's J).
Topic modeling
LDA uncovers topics via word distributions; unsupervised text analysis.
Train/validation/test split
Typical: 60-20-20 or 70-15-15; validation tunes; test is final unbiased estimate.
Transformers
Attention mechanisms model global dependencies; state-of-the-art in NLP and beyond.
Type I vs. Type II error
Type I: false positive (α); Type II: false negative (β); power = 1−β.
Unsupervised Learning
Machine learning using unlabeled data to find patterns.
Vanishing/exploding gradients
Gradients shrink or blow up in deep nets; mitigated with normalization, residuals.
Variance
A measure of how far data points spread out from the mean.
Word embeddings
Dense vector representations (Word2Vec, GloVe); capture semantic similarity.
Z-score
Standardized value: (x - μ)/σ; compares across scales.
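A small example of the cross-scale comparison; the IQ and exam parameters are hypothetical:

```python
def z_score(x, mu, sigma):
    # Standardize: how many SDs x lies from the mean.
    return (x - mu) / sigma

# Hypothetical: an IQ of 130 (μ=100, σ=15) vs. an exam score of 75 (μ=60, σ=10).
z_iq = z_score(130, 100, 15)
z_exam = z_score(75, 60, 10)
# The IQ score is the more extreme result relative to its own scale.
```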