Mean
Sum of the values divided by their count.
Median
Middle value of the sorted data (the average of the two middle values when the count is even).
Mode
Most frequent value in a dataset.
Range
Max minus min in a dataset.
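A minimal Python sketch of the four summary statistics above, using the standard-library `statistics` module. The sample data is made up for illustration.

```python
from statistics import mean, median, mode

data = [1, 3, 3, 6, 7, 10]  # hypothetical sample

avg = mean(data)                # (1+3+3+6+7+10) / 6 = 30 / 6 = 5
mid = median(data)              # even count: average of the middle pair 3 and 6 = 4.5
most = mode(data)               # 3 appears twice, more often than any other value
spread = max(data) - min(data)  # range: 10 - 1 = 9

print(avg, mid, most, spread)
```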
Variance
Average squared distance from the mean.
Standard deviation
Square root of variance; typical spread from the mean.
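The variance and standard deviation definitions can be checked by hand against `statistics.pvariance` and `statistics.pstdev`. This sketch treats the data as a whole population (divide by n); `statistics.variance` and `statistics.stdev` give the sample versions, which divide by n - 1. The data values are illustrative.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mu = sum(data) / len(data)                          # mean = 40 / 8 = 5.0
var = sum((x - mu) ** 2 for x in data) / len(data)  # squared deviations sum to 32; 32 / 8 = 4.0
sd = sqrt(var)                                      # sqrt(4.0) = 2.0

# The hand-computed values agree with the standard-library population versions.
print(var, pvariance(data), sd, pstdev(data))
```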
Correlation
Strength and direction of a linear relationship.
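A from-scratch sketch of Pearson's correlation coefficient, the usual measure of linear correlation: the co-movement of x and y around their means, scaled so the result lies between -1 and 1. The function name `pearson_r` is illustrative; Python 3.10+ also provides `statistics.correlation`.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect rising line, -1 perfect falling line, 0 no linear trend."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-movement around the means
    sxx = sum((x - mx) ** 2 for x in xs)                    # spread of x
    syy = sum((y - my) ** 2 for y in ys)                    # spread of y
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```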
Correlation vs causation
Correlation does not prove one variable causes another.
Conditional probability P(A|B)
Probability of A given that B has occurred: P(A and B) / P(B).
Independent events
A and B are independent if P(A|B)=P(A).
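Conditional probability and the independence test can be checked by counting outcomes of one fair six-sided die. This is a sketch using exact fractions; the events `is_even`, `is_high`, and `is_low` are made up for the example.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # fair six-sided die: every face equally likely

def prob(event):
    """P(event) = favourable outcomes / all outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def cond_prob(a, b):
    """P(A|B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def is_even(o):
    return o % 2 == 0  # {2, 4, 6}

def is_high(o):
    return o > 3       # {4, 5, 6}

def is_low(o):
    return o <= 2      # {1, 2}

print(prob(is_even))                # 1/2
print(cond_prob(is_even, is_high))  # 2/3: differs from P(even), so not independent
print(cond_prob(is_even, is_low))   # 1/2: equals P(even), so independent
```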
Normal distribution
Bell-shaped distribution defined by mean and standard deviation.
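Python's standard library models the normal distribution with `statistics.NormalDist`. The sketch below uses made-up parameters (mean 170, standard deviation 10) to check the familiar rule that about 68% of values fall within one standard deviation of the mean.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # hypothetical: heights in cm

# P(160 <= X <= 180) = CDF(mean + 1 sd) - CDF(mean - 1 sd)
within_one_sd = heights.cdf(180) - heights.cdf(160)
print(round(within_one_sd, 4))  # 0.6827
```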
Expected value
Probability-weighted average of all possible outcomes.
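Expected value is the sum of each outcome times its probability. A minimal sketch with a hypothetical `expected_value` helper and two made-up distributions:

```python
from fractions import Fraction

def expected_value(dist):
    """Sum of outcome * probability; `dist` maps outcomes to probabilities summing to 1."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die
bet = {10: Fraction(1, 10), -1: Fraction(9, 10)}      # win 10 with prob 0.1, else lose 1

print(expected_value(die))  # 7/2, i.e. 3.5
print(expected_value(bet))  # 1/10
```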
Discrete variable
Countable, separate values, such as number of clicks.
Continuous variable
Can take any value within a range, such as time or temperature.
Histogram use
Shows distribution of a numeric variable.
Scatterplot use
Shows relationship between two numeric variables.
Boxplot use
Shows median, quartiles, spread, and potential outliers.
Data cleaning purpose
Fix errors and improve data quality for analysis and models.
Common data issues
Missing values, duplicates, inconsistent formats, outliers, and bad sources.
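A minimal pure-Python sketch of three of the cleaning steps listed above (dropping duplicates, normalizing inconsistent formats, and marking missing values) on made-up records. The record fields and the `clean` function are hypothetical; in pandas the same steps map roughly onto `drop_duplicates`, the `.str` string methods, and `isna`/`fillna`.

```python
raw = [
    {"id": 1, "city": "  new york ", "age": "34"},
    {"id": 2, "city": "Boston", "age": ""},      # missing age
    {"id": 1, "city": "New York", "age": "34"},  # duplicate of id 1
    {"id": 3, "city": "BOSTON", "age": "29"},    # inconsistent casing
]

def clean(records):
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:  # drop duplicates, keeping the first occurrence
            continue
        seen_ids.add(rec["id"])
        city = rec["city"].strip().title()  # normalize whitespace and capitalization
        age_text = rec["age"].strip()
        age = int(age_text) if age_text else None  # mark missing values explicitly as None
        cleaned.append({"id": rec["id"], "city": city, "age": age})
    return cleaned

for row in clean(raw):
    print(row)
```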
Structured data
Data with a fixed schema, such as rows and columns in tables.
Unstructured data
Text, images, audio, and video without a fixed schema.
Semi-structured data
JSON or XML with flexible fields.
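Semi-structured records can be parsed with Python's built-in `json` module. Because fields vary between records, optional ones are read with `dict.get` and a default. The payload and field names are made up.

```python
import json

payload = """
[
  {"name": "Ana", "email": "ana@example.com"},
  {"name": "Ben", "tags": ["vip", "beta"]}
]
"""

records = json.loads(payload)
for rec in records:
    # "name" is present in every record; "email" and "tags" are optional.
    print(rec["name"], rec.get("email", "n/a"), rec.get("tags", []))
```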
Primary key
Uniquely identifies a row in a table.
Foreign key
References a primary key in another table to link tables.
SQL SELECT
Returns columns and rows from a table.
SQL WHERE
Filters rows by condition.
SQL GROUP BY
Groups rows for aggregation with functions like COUNT, SUM, and AVG.
SQL HAVING
Filters grouped results after aggregation.
INNER JOIN
Returns rows with matching keys in both tables.
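The key and query cards above can be tried together in an in-memory SQLite database through Python's built-in `sqlite3` module. The tables, names, and amounts are hypothetical; the query combines SELECT, INNER JOIN, WHERE, GROUP BY, and HAVING so each clause's effect is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each row
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key to customers
    amount      REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy'), (4, 'Dee');
INSERT INTO orders (customer_id, amount) VALUES
    (1, 20), (1, 35), (1, 5), (2, 15), (2, 50), (3, 12);
""")

query = """
SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.id  -- Dee has no orders, so she is excluded
WHERE o.amount >= 10                            -- row filter before grouping: drops Ana's 5
GROUP BY c.name                                 -- one group per customer
HAVING COUNT(*) >= 2                            -- group filter after aggregation: drops Cy
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Ben', 2, 65.0), ('Ana', 2, 55.0)]

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```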
Pandas used for
Data wrangling, cleaning, and analysis with DataFrames.
NumPy used for
Fast numerical arrays and mathematical operations.
PyTorch used for
Building and training deep learning models.
Generative AI
AI that produces new content like text, images, and code.
LLM limitation
Can hallucinate and be confidently wrong.
Train/validation/test split
Train learns; validation tunes; test evaluates generalization.
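A minimal sketch of a random train/validation/test split in plain Python. The 70/15/15 proportions, the seed, and the `train_val_test_split` name are assumptions for illustration, not a fixed rule; libraries such as scikit-learn provide their own splitting utilities (for example `train_test_split`).

```python
import random

def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The test part should stay untouched until final evaluation; tuning against it would leak information and overstate generalization.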
Supervised learning
Learns from labeled examples to predict targets.
Unsupervised learning
Finds structure in unlabeled data like clusters.
Reinforcement learning
Learns actions through rewards and penalties.
Overfitting
Strong performance on training data but poor performance on new data.
Underfitting
Model too simple; poor performance even on training data.
Confusion matrix
Table of TP, FP, TN, and FN counts for a classifier.
Precision
TP / (TP + FP): the share of predicted positives that are correct.
Recall
TP / (TP + FN): the share of actual positives that are found.
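The confusion matrix, precision, and recall cards fit in one short sketch for a binary classifier. Labels use 1 for the positive class; the example labels and the `confusion_counts` helper are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # predicted positive, actually positive
    fp = pairs.count((0, 1))  # predicted positive, actually negative
    tn = pairs.count((0, 0))  # predicted negative, actually negative
    fn = pairs.count((1, 0))  # predicted negative, actually positive
    return tp, fp, tn, fn

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of everything flagged positive, how much was right
recall = tp / (tp + fn)     # of all actual positives, how many were found
print(tp, fp, tn, fn)       # 3 2 2 1
print(precision, recall)    # 0.6 0.75
```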
Linear regression
Predicts a continuous numeric value.
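Simple (one-feature) linear regression can be fit by ordinary least squares in closed form: slope = covariance(x, y) / variance(x), intercept = mean(y) - slope * mean(x). A sketch with a hypothetical `fit_line` helper and toy data lying exactly on y = 2x + 1:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # toy data on the line y = 2x + 1

slope, intercept = fit_line(xs, ys)
print(slope, intercept)         # 2.0 1.0
print(slope * 5 + intercept)    # prediction at x = 5: 11.0
```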
Logistic regression
Predicts probability for classification.
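Logistic regression turns a linear score into a probability with the sigmoid function, then applies a threshold to get a class. A sketch of the prediction step only (no training); the weights `w` and `b` are made-up values, not fitted ones.

```python
import math

def predict_proba(x, w, b):
    """Logistic model: squash the linear score w*x + b into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

w, b = 1.5, -3.0  # hypothetical parameters, not learned from data
for x in [0, 2, 4]:
    p = predict_proba(x, w, b)
    label = 1 if p >= 0.5 else 0  # 0.5 is a common, but adjustable, decision threshold
    print(x, round(p, 3), label)
```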
Decision tree
Splits data by feature rules to classify or predict.
K-means clustering
Groups data into k clusters by similarity.
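K-means is commonly implemented with Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal 2D sketch with made-up points and hypothetical helper names; real libraries add smarter initialization (such as k-means++) and multiple restarts, since results depend on the starting centroids.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(points):
    """Mean position of a non-empty list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = random.Random(seed).sample(points, k)  # random initial centroids
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # roughly (0.33, 0.33) and (10.33, 10.33)
```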
Algorithmic bias
Systematic unfair outcomes due to data or model choices.
Ethical AI practice
Inclusive data, bias audits, transparency, accountability.