Statistics Study Guide for Data Science Interviews

64 Terms


Measures of Central Tendency

Mean, Median, Mode


Mean

Arithmetic average (sum of values divided by their count); sensitive to outliers


Median

Middle value when data is ordered; robust to outliers


Mode

Most frequently occurring value


When to use mean, median, and mode

Mean for roughly symmetric data without extreme outliers (e.g., normal distributions), median for skewed data or data with outliers, mode for categorical data
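
As a rough illustration of why the choice matters, the sketch below computes all three measures on a small, made-up list of salaries (the values are hypothetical) where one outlier drags the mean far above the median.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [40, 42, 45, 45, 48, 50, 400]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle value of the sorted data
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print(f"mean={mean_salary:.1f} median={median_salary} mode={mode_salary}")
```

Here the median and mode stay at 45 while the mean is roughly double that, which is why the median is usually preferred for skewed data.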


Measures of Variability

Variance, Standard Deviation, Range, Interquartile Range (IQR)


Variance

Average of squared deviations from the mean (divide by n for a population, by n - 1 for a sample)


Standard Deviation

Square root of variance; same units as original data


Range

Difference between max and min values


Interquartile Range (IQR)

Q3 - Q1: the range of the middle 50% of the data; robust to outliers
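
The four variability measures can be sketched with Python's standard `statistics` module. The data values are hypothetical, and the quartiles use the module's default "exclusive" method; other tools may use a different quartile convention and give a slightly different IQR.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

pop_var = statistics.pvariance(data)    # population variance: divides by n
sample_var = statistics.variance(data)  # sample variance: divides by n - 1
pop_sd = statistics.pstdev(data)        # square root of the population variance
data_range = max(data) - min(data)      # max minus min

# Quartiles with the default 'exclusive' method; IQR = Q3 - Q1
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(pop_var, sample_var, pop_sd, data_range, iqr)
```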


Skewness

Measure of asymmetry (positive = longer right tail, negative = longer left tail)


Kurtosis

Measure of tail heaviness compared to normal distribution


Addition Rule

P(A or B) = P(A) + P(B) - P(A and B)


Multiplication Rule

P(A and B) = P(A) * P(B|A)


Conditional Probability

P(A|B) = P(A and B)/P(B)


Bayes’ Theorem

Formula: P(A|B) = P(B|A) * P(A)/P(B)

Key concept: Updates prior probability with new evidence

Common interview trap: Confusing P(A|B) with P(B|A)
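
A worked example may make the P(A|B) versus P(B|A) trap concrete. The sketch below assumes a hypothetical diagnostic test; the prevalence, sensitivity, and false positive rate are illustrative numbers, and `posterior` is a helper name introduced here.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # P(positive) by the law of total probability
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical inputs: 1% prevalence, P(positive | disease) = 0.95,
# P(positive | no disease) = 0.05
p_disease_given_positive = posterior(
    prior=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(round(p_disease_given_positive, 3))
```

Even though P(positive | disease) is 0.95, P(disease | positive) comes out at roughly 0.16 because the prior is so low.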


Independent Events

P(A|B) = P(A); knowing B doesn’t change probability of A


Mutually Exclusive

P(A and B) = 0; events cannot occur together


Independence vs. Mutual Exclusivity Common Mistake

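Assuming mutually exclusive events are independent. If P(A) > 0 and P(B) > 0, mutually exclusive events are dependent: knowing B occurred means A did not.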
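
The difference can be checked on a single roll of a fair die. This is a minimal sketch; the events `even`, `odd`, and `low` and the helper `prob` are chosen here purely for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair six-sided die

def prob(event):
    """Probability of an event (a set of faces) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
odd = {1, 3, 5}
low = {1, 2}

# Mutually exclusive: P(even and odd) = 0, yet P(even) * P(odd) = 1/4,
# so the two events are NOT independent.
exclusive_even_odd = prob(even & odd) == 0
independent_even_odd = prob(even & odd) == prob(even) * prob(odd)

# Independent but not mutually exclusive: P(even and low) = 1/6 = 1/2 * 1/3
exclusive_even_low = prob(even & low) == 0
independent_even_low = prob(even & low) == prob(even) * prob(low)

print(exclusive_even_odd, independent_even_odd)  # True False
print(exclusive_even_low, independent_even_low)  # False True
```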


Normal Distribution Properties

Bell-shaped, symmetric, defined by mean and standard deviation


68-95-99.7 Rule

~68% within 1 SD, ~95% within 2 SD, ~99.7% within 3 SD


Standard Normal

Mean = 0, SD = 1; used for z-scores
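
The 68-95-99.7 figures can be recovered from the standard normal CDF. The sketch below uses `statistics.NormalDist` from the standard library; `within` is a helper name introduced here.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Probability that a standard normal value lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```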


Central Limit Theorem (CLT)

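Key insight: The distribution of sample means approaches a normal distribution as sample size increases (for independent draws from a population with finite variance)

Rule of thumb: n >= 30 is often treated as large enough, though heavily skewed populations can need more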

Why it matters: Enables inference even when population isn’t normal
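
A small simulation can illustrate the idea. The sketch below assumes an exponential population with mean 1 and SD 1 (clearly not normal) and draws many samples of size 30; the seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# Distribution of the sample mean for n = 30
means = [sample_mean(30) for _ in range(5000)]

mean_of_means = statistics.fmean(means)  # should be near the population mean, 1
sd_of_means = statistics.stdev(means)    # should be near 1 / sqrt(30)

print(round(mean_of_means, 3), round(sd_of_means, 3))
```

A histogram of `means` would look approximately bell-shaped even though the underlying population is strongly right-skewed.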


Binomial Distribution

Number of successes in fixed number of trials


Poisson Distribution

Count of (typically rare) events in a fixed interval of time or space, when events occur independently at a constant average rate
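
Both the binomial and Poisson distributions have simple probability mass functions. The sketch below writes them out with the standard library; the function names and example parameters are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with average rate lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Probability of exactly 3 heads in 10 fair coin flips: 120 / 1024
p_three_heads = binomial_pmf(3, n=10, p=0.5)

# Probability of zero events when the average is 2 per interval: exp(-2)
p_no_events = poisson_pmf(0, lam=2.0)

print(p_three_heads, p_no_events)
```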


Exponential Distribution

Time between events in Poisson process


t-distribution

Used when sample size is small and population SD unknown


Hypothesis Testing Core Framework

Null Hypothesis (H0): Status quo assumption
Alternative Hypothesis (H1): The claim we are looking for evidence to support
Test Statistic: Standardized measure of evidence against H0
p-value: Probability of observing data at least this extreme if H0 is true
Decision: Reject H0 if p-value < alpha (significance level)
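
The framework can be walked through with a two-sided one-sample z-test. This is a minimal sketch that assumes the population SD is known (otherwise a t-test would be the usual choice); the sample values, `mu0`, and `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test; assumes the population SD sigma is known."""
    n = len(sample)
    # Test statistic: standardized distance of the sample mean from mu0
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # p-value: probability of a statistic at least this extreme under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision: reject H0 if p-value < alpha
    return z, p_value, p_value < alpha

# H0: population mean is 50; H1: population mean differs from 50
sample = [52, 55, 49, 58, 54, 56, 53, 57, 51, 55]
z, p_value, reject = z_test(sample, mu0=50, sigma=4)
print(round(z, 2), round(p_value, 4), reject)
```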


Type I Error (alpha)

Rejecting true null hypothesis (false positive)


Type II Error (beta)

Failing to reject false null hypothesis (false negative)


Power

1 - beta; probability of correctly rejecting false null hypothesis


Trade-off

For a fixed sample size and effect size, decreasing alpha increases beta, and vice versa


One-sample t-test

Compare sample mean to known value


Two-sample t-test

Compare means of two groups


Paired t-test

Compare two measurements on the same subjects (e.g., before/after)


Chi-square test

Test independence between categorical variables


ANOVA (Analysis of Variance)

Compare means across three or more groups


p-hacking

Manipulating the analysis (e.g., trying many tests or subsets and reporting only the ones that work) to obtain a significant p-value


Multiple testing problem

Increased chance of Type I error with multiple tests


Bonferroni correction

Adjust alpha by dividing by number of tests
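
The multiple testing problem and the Bonferroni fix can be shown with a little arithmetic. The sketch below assumes the tests are independent, which is what makes the family-wise error rate formula 1 - (1 - alpha)^m exact; the function names are illustrative.

```python
def familywise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test significance level after Bonferroni correction."""
    return alpha / m

m = 20
fwer_uncorrected = familywise_error_rate(0.05, m)          # about 0.64
per_test_alpha = bonferroni_alpha(0.05, m)                 # 0.05 / 20
fwer_corrected = familywise_error_rate(per_test_alpha, m)  # just under 0.05

print(round(fwer_uncorrected, 3), per_test_alpha, round(fwer_corrected, 4))
```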


Confidence Interval Interpretation

CORRECT: “We are 95% confident the interval contains the true parameter,” meaning about 95% of intervals built this way from repeated samples would contain it.

INCORRECT: “There’s a 95% chance the parameter is in this interval.”

KEY POINT: The interval is random, NOT the parameter.


Factors Affecting Width (of confidence interval)

Sample size: Larger n → narrower interval
Confidence level: Higher confidence → wider interval
Population variability: Higher variability → wider interval
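
All three factors appear in the half-width of a z-based interval for a mean, z * sd / sqrt(n). The sketch below uses that normal approximation (an assumption; a small sample would use the t-distribution instead), and `ci_half_width` is a helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(sd, n, confidence=0.95):
    """Half-width (margin of error) of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    return z * sd / sqrt(n)

base = ci_half_width(sd=10, n=100)
larger_n = ci_half_width(sd=10, n=400)                          # narrower
higher_confidence = ci_half_width(sd=10, n=100, confidence=0.99)  # wider
more_variable = ci_half_width(sd=20, n=100)                     # wider

print(round(base, 2), round(larger_n, 2),
      round(higher_confidence, 2), round(more_variable, 2))
```

Quadrupling n only halves the width, because the width shrinks with the square root of the sample size.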


Pearson Correlation

Linear relationship between continuous variables (-1 to +1)

KEY LIMITATION: Only captures linear relationships


Spearman Correlation

Monotonic relationship; uses ranks
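
The contrast with Pearson shows up on a relationship that is monotonic but not linear. The sketch below implements both coefficients by hand; the ranking helper assumes there are no tied values, and the data (y = x cubed) is made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Rank positions starting at 1 (this sketch assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v**3 for v in x]  # monotonic but not linear

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Spearman is 1 because the ranks agree perfectly, while Pearson falls short of 1 because the points do not lie on a straight line.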


Requirements for establishing causation

Temporal precedence
Covariation
No confounding variables


Common fallacy of causation

Assuming correlation implies causation


Solutions for establishing causation

Randomized experiments
Instrumental variables
Natural experiments


Types of Sampling Bias

Selection Bias
Survivorship Bias
Response Bias
Confirmation Bias


Selection Bias

Non-representative sample selection


Survivorship Bias

Only analyzing “survivors” of a process


Response Bias

Systematic differences in who responds or in how respondents answer, so the collected responses misrepresent the population


Confirmation Bias

Seeking data that confirms preconceptions


Factors of Sample Size Determination

Desired confidence level
Margin of error
Population variability


Power analysis for sample size determination

Determining sample size needed to detect meaningful effect
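
One common approximation for comparing two group means is n per group = 2 * ((z_(1-alpha/2) + z_power) * sd / effect)^2. The sketch below uses that normal-approximation formula (an assumption; exact t-based calculations give slightly larger numbers), with illustrative inputs.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `effect`
    with a two-sided, two-sample z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

n_medium = sample_size_per_group(effect=0.5, sd=1.0)
n_small = sample_size_per_group(effect=0.25, sd=1.0)
print(n_medium, n_small)
```

Halving the effect size roughly quadruples the required sample, which is why the effect size has to be fixed before a sample size can be chosen.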


Common mistake in sample size determination

Using sample size formulas without considering effect size


A/B Testing Design Principles

Randomization
Power calculation
Duration
Primary metric


Randomization

Ensures groups are comparable


Power calculation

Determine sample size before starting


Duration

Balance statistical power with external validity


Primary metric

Define success metric before starting


A/B Testing Common Pitfalls

Peeking
Novelty effect
Simpson’s paradox


Peeking

Checking results before predetermined end


Novelty effect

Short-lived behavior change caused by the newness of a change rather than its lasting value


Simpson’s paradox

A trend seen in aggregated data reverses or disappears when the data is segmented into groups
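
A small numeric example can show how the reversal happens in an A/B test. The conversion counts below are hypothetical and were chosen so that variant A wins within each device segment while variant B wins overall, because the two variants received very different mixes of traffic.

```python
# Hypothetical (conversions, visitors) per segment and variant
data = {
    "desktop": {"A": (80, 100), "B": (350, 500)},
    "mobile": {"A": (100, 500), "B": (10, 100)},
}

def rate(conversions, visitors):
    return conversions / visitors

# Winner within each segment
segment_winner = {
    segment: max(variants, key=lambda v: rate(*variants[v]))
    for segment, variants in data.items()
}

# Pool the segments, then pick the overall winner
totals = {
    variant: tuple(sum(data[seg][variant][i] for seg in data) for i in (0, 1))
    for variant in ("A", "B")
}
overall_winner = max(totals, key=lambda v: rate(*totals[v]))

print(segment_winner)   # A wins on desktop (80% vs 70%) and mobile (20% vs 10%)
print(overall_winner)   # B wins overall (60% vs 30%)
```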
