Measures of Central Tendency
Mean, Median, Mode
Mean
Average value, sensitive to outliers
Median
Middle value when data is ordered; robust to outliers
Mode
Most frequently occurring value
When to use mean, median, and mode
Mean for normal distributions, median for skewed data, mode for categorical data
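The three measures can be checked in a few lines with Python's built-in statistics module; the data here is invented to show the mean's outlier sensitivity:

```python
import statistics

data = [2, 3, 3, 5, 7, 100]  # 100 is an outlier

mean = statistics.mean(data)      # 20 -- pulled upward by the outlier
median = statistics.median(data)  # 4.0 -- middle of the ordered data, robust
mode = statistics.mode(data)      # 3 -- most frequent value
```

Note how one extreme value moves the mean far from the bulk of the data while the median barely notices.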
Measures of Variability
Variance, Standard Deviation, Range, Interquartile Range (IQR)
Variance
Average of squared deviations from mean
Standard Deviation
Square root of variance; same units as original data
Range
Difference between max and min values
Interquartile Range (IQR)
Range of middle 50% of data; robust to outliers
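A minimal sketch of the variability measures using the standard-library statistics module (data invented for illustration; `quantiles` uses its default "exclusive" method):

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

variance = statistics.pvariance(data)          # mean of squared deviations
stdev = statistics.pstdev(data)                # sqrt of variance, same units as data
data_range = max(data) - min(data)             # 42 - 4 = 38
q1, _median, q3 = statistics.quantiles(data)   # quartile cut points (n=4 default)
iqr = q3 - q1                                  # spread of the middle 50%
```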
Skewness
Measure of asymmetry (positive = right tail, negative = left tail)
Kurtosis
Measure of tail heaviness compared to normal distribution
Addition Rule
P(A or B) = P(A) + P(B) - P(A and B)
Multiplication Rule
P(A and B) = P(A) * P(B|A)
Conditional Probability
P(A|B) = P(A and B)/P(B)
Bayes’ Theorem
Formula: P(A|B) = P(B|A) * P(A)/P(B)
Key concept: Updates prior probability with new evidence
Common interview trap: Confusing P(A|B) with P(B|A)
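The interview trap above can be made concrete with the classic medical-testing example (all numbers hypothetical): even a sensitive test can yield a low posterior when the prior is small.

```python
# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false-positive rate
p_disease = 0.01
p_pos_given_disease = 0.99           # P(B|A): positive test given disease
p_pos_given_healthy = 0.05           # false-positive rate

# Total probability of a positive test: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))  # 0.167 -- far below the 99% sensitivity
```

P(disease | positive) ≈ 17%, even though P(positive | disease) = 99%: confusing the two directions is exactly the trap.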
Independent Events
P(A|B) = P(A); knowing B doesn’t change probability of A
Mutually Exclusive
P(A and B) = 0; events cannot occur together
Common mistake: independence vs. mutual exclusivity
Assuming mutually exclusive events are independent
Normal Distribution Properties
Bell-shaped, symmetric, defined by mean and standard deviation
68-95-99.7 Rule
~68% within 1 SD, ~95% within 2 SD, ~99.7% within 3 SD
Standard Normal
Mean = 0, SD = 1; used for z-scores
Central Limit Theorem (CLT)
Key insight: Sample means approach normal distribution as sample size increases
Rule of thumb: n >= 30 for CLT to apply
Why it matters: Enables inference even when population isn’t normal
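The CLT can be seen in a quick simulation sketch (parameters invented): the exponential population is strongly right-skewed, yet means of n = 30 samples cluster symmetrically around the population mean.

```python
import random
import statistics

random.seed(42)

# Population: exponential with rate 1 (right-skewed, clearly non-normal)
def sample_mean(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Distribution of sample means for n = 30
means = [sample_mean(30) for _ in range(2000)]

# The sample means concentrate near the population mean of 1.0
grand_mean = statistics.mean(means)
```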
Binomial Distribution
Number of successes in fixed number of trials
Poisson Distribution
Count of rare events in fixed time/space
Exponential Distribution
Time between events in Poisson process
t-distribution
Used when sample size is small and population SD unknown
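The named distributions can be sanity-checked by simulation with the random module (parameters invented); a binomial draw is just a sum of Bernoulli trials, and `expovariate` samples the exponential directly.

```python
import random
import statistics

random.seed(1)

# Binomial(n=10, p=0.3) drawn as a sum of Bernoulli successes
def binomial_draw(n, p):
    return sum(random.random() < p for _ in range(n))

binom = [binomial_draw(10, 0.3) for _ in range(10_000)]
expo = [random.expovariate(2.0) for _ in range(10_000)]  # rate lambda = 2

# Simulated means match theory: n*p = 3 and 1/lambda = 0.5
binom_mean = statistics.mean(binom)
expo_mean = statistics.mean(expo)
```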
Hypothesis Testing Core Framework
Null Hypothesis (H0): Status quo assumption
Alternative Hypothesis (H1): What we’re trying to prove
Test Statistic: Standardized measure of evidence against H0
p-value: Probability of observing data this extreme if H0 is true
Decision: Reject H0 if p-value < alpha (significance level)
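The framework can be walked through numerically. Below is a hand-rolled one-sample t-test on a made-up sample with H0: mu = 100, using only the standard library:

```python
import math
import statistics

# Hypothetical sample; H0: population mean = 100
sample = [102, 98, 105, 103, 99, 104, 101, 106]
mu0 = 100

n = len(sample)
xbar = statistics.mean(sample)   # 102.25
s = statistics.stdev(sample)     # sample SD (n-1 denominator)

# Test statistic: standardized distance of the sample mean from mu0
t = (xbar - mu0) / (s / math.sqrt(n))

# Compare |t| to the critical value for df = 7 (2.365 at alpha = 0.05, two-sided)
reject = abs(t) > 2.365
print(round(t, 2), reject)  # 2.26 False -- fail to reject H0
```

Here |t| = 2.26 falls just short of 2.365, so at alpha = 0.05 we fail to reject H0 despite the sample mean sitting above 100.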
Type I Error (alpha)
Rejecting true null hypothesis (false positive)
Type II Error (beta)
Failing to reject false null hypothesis (false negative)
Power
1 - beta; probability of correctly rejecting false null hypothesis
Trade-off
Decreasing alpha increases beta, and vice versa
One-sample t-test
Compare sample mean to known value
Two-sample t-test
Compare means of two groups
Paired t-test
Compare before/after measurements
Chi-square test
Test independence between categorical variables
ANOVA (Analysis of Variance)
Compare means across multiple groups
p-hacking
Manipulating analysis to achieve significant p-value
Multiple testing problem
Increased chance of Type I error with multiple tests
Bonferroni correction
Adjust alpha by dividing by number of tests
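A minimal sketch of the Bonferroni correction (p-values invented): each of m tests is judged against alpha/m so the family-wise Type I error rate stays at or below alpha.

```python
# Hypothetical p-values from 5 tests at overall alpha = 0.05
p_values = [0.01, 0.04, 0.03, 0.20, 0.008]
alpha = 0.05
m = len(p_values)

# Bonferroni: each individual test uses alpha / m
adjusted_alpha = alpha / m  # 0.01
significant = [p for p in p_values if p < adjusted_alpha]
print(significant)  # [0.008] -- 0.01 is not strictly below 0.01
```

Without the correction, three of the five p-values would clear alpha = 0.05; with it, only one survives.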
Confidence Interval Interpretation
CORRECT: “We are 95% confident the interval contains the true parameter.”
INCORRECT: “There’s a 95% chance the parameter is in this interval.”
KEY POINT: The interval is random, NOT the parameter.
Factors Affecting Width (of confidence interval)
Sample size: Larger n → narrower interval
Confidence level: Higher confidence → wider interval
Population variability: More variability → wider interval
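The sample-size effect is easy to quantify for a z-interval on a mean (s = 12 is a made-up summary statistic): the margin of error shrinks with the square root of n, so quadrupling n halves the width.

```python
import math

# 95% z-interval margin of error for a mean (hypothetical s = 12)
z = 1.96   # critical value for 95% confidence
s = 12.0   # sample standard deviation

margins = {n: z * s / math.sqrt(n) for n in (25, 100, 400)}
# Quadrupling n halves the margin: 4.70 -> 2.35 -> 1.18
```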
Pearson Correlation
Linear relationship between continuous variables (-1 to +1)
KEY LIMITATION: Only captures linear relationships
Spearman Correlation
Monotonic relationship; uses ranks
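Both coefficients can be hand-rolled to show the difference (toy data; the rank helper below assumes no ties): y = x³ is perfectly monotonic but nonlinear, so Spearman is 1 while Pearson is not.

```python
import statistics

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(v):  # simple ranking, no tie handling
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))  # Pearson computed on the ranks

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but nonlinear

print(round(pearson(x, y), 3))   # 0.943 -- nonlinearity lowers Pearson
print(round(spearman(x, y), 3))  # 1.0 -- perfect monotonic relationship
```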
Requirements for establishing causation
Temporal precedence
Covariation
No confounding variables
Common fallacy of causation
Assuming correlation implies causation
Solutions for establishing causation
Randomized experiments
Instrumental variables
Natural experiments
Types of Sampling Bias
Selection Bias
Survivorship Bias
Response Bias
Confirmation Bias
Selection Bias
Non-representative sample selection
Survivorship Bias
Only analyzing “survivors” of a process
Response Bias
Systematic differences in who responds
Confirmation Bias
Seeking data that confirms preconceptions
Factors of Sample Size Determination
Desired confidence level
Margin of error
Population variability
Power analysis for sample size determination
Determining sample size needed to detect meaningful effect
Common mistake in sample size determination
Using sample size formulas without considering effect size
A/B Testing Design Principles
Randomization
Power calculation
Duration
Primary metric
Randomization
Ensures groups are comparable
Power calculation
Determine sample size before starting
Duration
Balance statistical power with external validity
Primary metric
Define success metric before starting
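The power-calculation principle can be sketched with the standard two-proportion sample-size formula (all inputs hypothetical: 10% baseline conversion, a +2 percentage-point minimum detectable effect, alpha = 0.05 two-sided, 80% power):

```python
import math

p1, p2 = 0.10, 0.12   # baseline vs. hypothesized treatment conversion
z_alpha = 1.96        # z for alpha/2 = 0.025 (two-sided)
z_beta = 0.84         # z for power = 0.80

pbar = (p1 + p2) / 2
# Per-group n for a two-proportion z-test
n = ((z_alpha * math.sqrt(2 * pbar * (1 - pbar))
      + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
     / (p2 - p1) ** 2)

n_per_group = math.ceil(n)  # roughly 3,800+ users per arm
```

Small lifts on small baselines demand large samples, which is why the calculation must happen before the test starts, not after.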
A/B Testing Common Pitfalls
Peeking
Novelty effect
Simpson’s paradox
Peeking
Checking results before predetermined end
Novelty effect
Initial behavior change driven by the novelty of the change itself, not its lasting value
Simpson’s paradox
Trend reverses when data is segmented
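Simpson's paradox can be demonstrated with the classic kidney-stone treatment numbers (used here purely as a flashcard illustration): treatment A wins inside every segment, yet B wins in the aggregate because the segments differ in size and difficulty.

```python
# (successes, total) per severity segment -- classic kidney-stone example
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Per segment: A beats B in BOTH
per_segment = {name: rate(*g["A"]) > rate(*g["B"]) for name, g in groups.items()}

# Aggregated: B beats A -- the trend reverses
a_s = sum(g["A"][0] for g in groups.values())  # 273
a_t = sum(g["A"][1] for g in groups.values())  # 350
b_s = sum(g["B"][0] for g in groups.values())  # 289
b_t = sum(g["B"][1] for g in groups.values())  # 350
reversed_overall = rate(a_s, a_t) < rate(b_s, b_t)
```

This is why A/B results should be checked within meaningful segments before trusting the pooled number.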