Fuck Ahh Gary

Last updated 10:42 PM on 5/11/26

132 Terms

1
New cards

Week 1

Foundations of Probability

2
New cards

What is frequentist probability

Probability as the long-run frequency of a repeatable process: flip a fair coin enough times and the share of heads settles near one half

3
New cards

What is Bayesian probability

Probability as a degree of belief that updates when new evidence arrives — like a weather forecast updating with new data

4
New cards

What is conditional probability

The probability of something happening given that something else is already known to be true

5
New cards

What is a sample space

All possible outcomes before any condition is applied

6
New cards

P(win | switch) = 2/3

Switching doors in Monty Hall wins 2 out of every 3 times — the host's reveal moves the leftover probability onto the remaining door

7
New cards

P(win | stay) = 1/3

Staying with your original door only wins 1 out of every 3 times — your first pick only ever had a 1 in 3 shot

8
New cards

P(land on step n) = 1 divided by E[X] = 1 divided by 1.5 = 2/3

The long run chance of landing on any specific step equals 1 divided by the average step size
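A minimal simulation sketch of this card, assuming each step is 1 or 2 with equal probability (so the average step size is 1.5) and that the walk starts at position 0. The function name and defaults are hypothetical.

```python
import random

def landing_rate(target=100, trials=20_000, seed=0):
    """Estimate the chance that a walk of random 1- or 2-steps lands exactly on `target`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        position = 0
        while position < target:
            position += rng.choice((1, 2))  # fair flip: move 1 or 2
        if position == target:  # otherwise the walk jumped over the target
            hits += 1
    return hits / trials

print(landing_rate())  # should be close to 2/3
```

The estimate only approaches 2/3 for targets far from the start; for very small targets the starting position still matters.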

9
New cards

P(both male | at least one male) = 1/3

Knowing at least one pup is male removes one outcome (both female) from the four equally likely ones, leaving 3 equal possibilities, and only 1 of them is both male
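The counting argument can be checked by listing the outcomes directly. This sketch assumes each pup is independently male or female with equal probability; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely (first pup, second pup) outcomes: MM, MF, FM, FF.
outcomes = list(product("MF", repeat=2))

# Condition: keep only outcomes with at least one male.
at_least_one_male = [o for o in outcomes if "M" in o]

# Event of interest inside the reduced list.
both_male = [o for o in at_least_one_male if o == ("M", "M")]

p = Fraction(len(both_male), len(at_least_one_male))
print(p)  # 1/3
```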

10
New cards

Walk me through the Week 1 code

The code runs the Monty Hall game 10000 times. Each run hides the car randomly. The contestant picks a door. The host opens a goat door. The contestant either switches or stays. Wins are counted and divided by the total to get the win rate. A second simulation walks step by step from step 1 adding 1 or 2 each flip and counts how often it lands exactly on step 100 across 100000 tries
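The Monty Hall part of that walkthrough might look roughly like the sketch below. It is not the submitted code: the function name, the seed, and the door numbering are assumptions made for illustration.

```python
import random

def monty_hall_win_rate(switch, trials=10_000, seed=0):
    """Simulate the game `trials` times and return the fraction of wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # car hidden at random
        pick = rng.randrange(3)  # contestant's first pick
        # Host opens a goat door that is neither the pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            # Move to the one door that is neither picked nor opened.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print("switch:", monty_hall_win_rate(switch=True))   # near 2/3
print("stay:  ", monty_hall_win_rate(switch=False))  # near 1/3
```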

11
New cards

Was Week 1 originally submitted correctly

Yes

12
New cards

Week 2

Normal Distributions and Empirical Data

13
New cards

What is a normal distribution

A symmetric bell shaped curve that describes how data spreads around an average — the basis of most classic statistics

14
New cards

What is the Shapiro-Wilk test

A test that checks if data follows a bell curve. If the p-value is below 0.05, normality is rejected; a larger p-value means the test found no evidence against normality

15
New cards

What is skewness

How lopsided the data is — 0 means balanced

16
New cards

What is kurtosis

How peaked and heavy-tailed the data is. A normal distribution has kurtosis 3 (excess kurtosis 0)

17
New cards

What is a gain score

A way to measure improvement that accounts for how much room a student had to grow — not just the raw change in score

18
New cards

f'(-1) = 1 divided by sqrt(2 times pi times e) = 0.2420

The slope of the standard bell curve at x = -1 is gentle, not steep. This is used to identify which of two graphs is geometrically accurate
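The value on this card follows from the derivative of the standard normal density, f'(x) = -x f(x). A short check using only the math library (the function name is illustrative):

```python
import math

def normal_pdf_slope(x):
    """Derivative of the standard normal density: f'(x) = -x * f(x)."""
    pdf = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return -x * pdf

slope = normal_pdf_slope(-1)
closed_form = 1 / math.sqrt(2 * math.pi * math.e)
print(round(slope, 4), round(closed_form, 4))  # both 0.242
```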

19
New cards

gain = (final minus initial) divided by (1 minus initial)

Measures how much a student improved relative to how much they could have improved — accounts for starting point
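A small sketch of the gain formula, assuming scores are expressed as fractions of the maximum (0 to 1). The example scores are made up to show that two students with different raw changes can have the same gain.

```python
def gain(initial, final):
    """Gain score: improvement relative to the room left to improve."""
    if initial >= 1:
        raise ValueError("a perfect initial score leaves no room to improve")
    return (final - initial) / (1 - initial)

# Hypothetical students: raw changes of 0.3 and 0.1, same gain.
print(round(gain(0.4, 0.7), 3))  # 0.5
print(round(gain(0.8, 0.9), 3))  # 0.5
```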

20
New cards

Walk me through the Week 2 code

The slope formula is calculated directly using the math library, with no data needed. The police shooting file is loaded and split into two groups by race. For each group the code finds the average age and the skewness, then runs the normality test. The test scores file is loaded and a gain column is calculated for each student using the formula. The same skewness, kurtosis, and normality tests run on those scores, and everything is printed

21
New cards

Was Week 2 originally submitted correctly

Mostly — skewness was off by 0.01 and kurtosis was off by 0.09 which are small rounding differences. The bigger issue was the filename in the code used spaces instead of underscores so the file would not load at all

22
New cards

Week 3

Bayes Theorem

23
New cards

What is prior probability

What you believe about something before seeing any new evidence — usually the background rate of how common it is

24
New cards

What is posterior probability

Your updated belief after running Bayes theorem — what you believe after seeing the evidence

25
New cards

What is sensitivity

How often a test correctly says positive when the thing is truly there

26
New cards

What is specificity

How often a test correctly says negative when the thing is truly absent

27
New cards

What is the false positive rate

How often the test wrongly says positive when the thing is not there — equals 1 minus specificity

28
New cards

What is the base rate fallacy

Ignoring how rare something is when deciding whether a positive test result is real — leads to overestimating reliability

29
New cards

P(A|B) = P(B|A) times P(A) divided by P(B)

The core Bayes formula — flips conditional probabilities so you can find the cause from the effect

30
New cards

P(B) = P(B|A) times P(A) + P(B|not A) times P(not A)

The total probability formula — fills in the bottom of the Bayes fraction by accounting for all the ways the evidence can show up

31
New cards

Medical test at 0.1% incidence gives P(disease|positive) = 0.0194

Even with a test that catches 99% of true cases, only about 2% of positives are real when the disease is very rare, because the low base rate dominates. The figures on these cards are consistent with a 5% false positive rate

32
New cards

Medical test at 10% incidence gives P(disease|positive) = 0.6875

The same test is nearly 69% reliable when the disease is more common — the prior drives the posterior

33
New cards

50% certainty threshold = false positive rate divided by (sensitivity + false positive rate) = 4.8%

The minimum disease rate needed before a positive result is more likely real than false
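The medical test cards can be reproduced with a short function. The sketch assumes a sensitivity of 0.99 and a false positive rate of 0.05; those two values are inferred from the results on the cards rather than stated there, and the function names are hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive) via Bayes theorem and total probability."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

def fifty_percent_threshold(sensitivity, false_positive_rate):
    """Prior at which a positive result is exactly 50% likely to be real."""
    return false_positive_rate / (sensitivity + false_positive_rate)

SENS, FPR = 0.99, 0.05  # assumed test characteristics

print(round(posterior(0.001, SENS, FPR), 4))         # 0.0194
print(round(posterior(0.10, SENS, FPR), 4))          # 0.6875
print(round(fifty_percent_threshold(SENS, FPR), 3))  # 0.048
```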

34
New cards

Burglar alarm result = 8.68%

Despite the alarm sounding there is only an 8.68% chance of an actual intruder because burglaries are so rare

35
New cards

Container result — P(C1|green) = 1/3 and P(C2|green) = 2/3

Seeing a green ball makes Container 2 twice as likely as Container 1 since it has more green balls

36
New cards

Sam logins result — P(logged yesterday | more than 5 min today) = 4/9

About 44% chance Sam logged in yesterday given he spent more than 5 minutes today

37
New cards

Walk me through the Week 3 code

One general function takes three inputs — how common something is

38
New cards

Was Week 3 originally submitted correctly

Yes

39
New cards

Week 4

Introduction to Bayesian Data Analysis

40
New cards

What is the frequentist framework

Treats unknown values as fixed constants — probability only describes long run frequencies of repeatable events

41
New cards

What is the Bayesian framework

Treats unknown values as uncertain — probability describes degrees of belief that update as evidence arrives

42
New cards

What is a prior

What you believe about a value before seeing any data

43
New cards

What is a likelihood

How well the data fits each possible value of the unknown

44
New cards

What is a posterior

The updated belief after combining the prior and the data together

45
New cards

What is a credibility interval

A Bayesian range estimate — means there is literally a stated probability the true value falls inside it given the data

46
New cards

Was Week 4 originally submitted correctly

Yes

47
New cards

Week 5

Bayesian Estimation and the Beta Distribution

48
New cards

What is the Beta distribution

A flexible distribution that lives between 0 and 1 — used to represent uncertainty about a probability

49
New cards

What is a conjugate prior

A prior that stays in the same family after updating — the Beta is conjugate to the Binomial so updating with coin flip data always gives another Beta

50
New cards

What is a credibility interval

A range with a direct probability meaning — there is genuinely a stated percentage chance the true value falls inside it

51
New cards

What is the posterior mean

The average of the posterior distribution — used as the best single estimate of the unknown probability

52
New cards

Beta(1, 1) as a starting distribution

A flat starting point — every possible value of p between 0 and 1 is equally plausible before seeing any data

53
New cards

Beta(1 + H, 1 + T) as the updated distribution

After seeing H heads and T tails this is the updated distribution — the entire Bayesian update done in one step of arithmetic

54
New cards

Posterior mean = alpha divided by (alpha + beta)

The best single estimate of p — the average of the updated distribution

55
New cards

95% credibility interval using the inverse Beta CDF at 0.025 and 0.975

The range holding the middle 95% of the posterior, computed with the inverse of the Beta cumulative distribution function (the quantile function)
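A standard-library sketch of the update, the posterior mean, and the interval. Instead of a library inverse-Beta call, it swaps in a bisection search on the Beta CDF, which for whole-number parameters equals a binomial sum. That substitution and all function names are assumptions for illustration, and the approach is only meant for modest head and tail counts.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b, using the binomial-sum identity."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def beta_quantile(q, a, b, tol=1e-10):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def summarize(heads, tails, level=0.95):
    """Posterior mean and central credibility interval from a flat Beta(1, 1) start."""
    a, b = 1 + heads, 1 + tails
    tail = (1 - level) / 2
    return a / (a + b), beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)

mean, lower, upper = summarize(heads=7, tails=3)  # hypothetical counts
print(round(mean, 4), round(lower, 4), round(upper, 4))
```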

56
New cards

Walk me through the Week 5 code

The coin toss file is loaded with one column per experiment. For each column the code counts heads and tails after removing blank rows. It adds 1 to each count to get the updated distribution shape. The average of that shape is computed as the estimate of p. The inverse Beta CDF is called twice per interval to get the lower and upper bounds of the 95% and 99% intervals. Results are printed for all five experiments

57
New cards

Was Week 5 originally submitted correctly

Yes — all five experiments matched the actual data exactly

58
New cards

Week 6

Regression Analysis LLS vs LAD

59
New cards

What is a residual

The gap between what the model predicted and what actually happened — actual value minus predicted value

60
New cards

What is LLS

Fits a line by minimizing the sum of squared gaps — sensitive to outliers because squaring makes big errors very large

61
New cards

What is LAD

Fits a line by minimizing the sum of absolute gaps — more forgiving of outliers because it does not square them

62
New cards

What does robust mean

A method that holds up well even when some data points are extreme or unusual

63
New cards

What is Shapiro-Wilk on residuals

Testing whether the prediction errors follow a bell curve. The usual regression p-values and intervals assume normally distributed errors

64
New cards

LLS objective — minimize sum of (y minus predicted) squared

Squaring every gap before adding means outliers count far more than typical points

65
New cards

LAD objective — minimize sum of absolute value of (y minus predicted)

Taking the absolute value means each gap counts in proportion to its size (linearly), not quadratically
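The two objectives can be compared on a toy data set with one outlier. In this sketch LLS uses the closed-form solution, while LAD swaps in a brute-force search over lines through pairs of points, relying on the property that some LAD-optimal line passes through two data points. The data and function names are made up for illustration.

```python
from itertools import combinations

def fit_lls(xs, ys):
    """Closed-form least squares: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def abs_loss(intercept, slope, xs, ys):
    """LAD objective: sum of absolute residuals."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

def fit_lad(xs, ys):
    """Try every line through two data points and keep the smallest absolute loss."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical line, not a valid fit
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = abs_loss(intercept, slope, xs, ys)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 30]  # follows y = 1 + 2x except the last point

print("LLS:", fit_lls(xs, ys))  # slope pulled well above 2 by the outlier
print("LAD:", fit_lad(xs, ys))  # stays on intercept 1, slope 2
```

The pairwise search grows quadratically with the number of points, so it suits small examples only; a numerical optimizer is the more practical choice on real data.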

66
New cards

Shapiro-Wilk p below 0.05 on residuals

The test rejects normality of the prediction errors, so the normal-errors assumption behind the usual regression inference is violated

67
New cards

Walk me through the Week 6 code

The crab data is loaded with pre-molt size as input and post-molt size as the target. LLS is run first using a built-in function that solves it directly and calculates the gaps. For LAD a custom function measures total absolute gap. An optimizer searches for the intercept and slope that make that value as small as possible starting from the LLS answer. Normality tests are run on the gaps from both models to compare which produced cleaner errors

68
New cards

Was Week 6 originally submitted correctly

No. The original reported the intercept as -25.21, but the actual data gives +25.80, so even the sign was wrong. The slope was also off. Both models reject normality on the real data, which contradicts the original. The original results did not come from running the code against the actual file; the filename and column names were never verified

69
New cards

Week 7

Statistical Significance and Effect Size

70
New cards

What is a p-value

The chance of seeing a result at least this extreme if there were actually no real effect. A small p means such a result would be unusual under pure chance

71
New cards

What is effect size

A number that says how big a difference actually is regardless of sample size — standardized so you can compare across studies

72
New cards

What is Cohen's d

Effect size for comparing two group averages: the difference in means divided by the pooled standard deviation. Values below 0.2 are tiny

73
New cards

What is Cohen's h

Same idea as Cohen's d but used when comparing two percentages instead of two averages

74
New cards

What is a Monte Carlo permutation test

Tests significance without assuming normality — shuffles group labels thousands of times to see if the real gap could happen by chance

75
New cards

What is the KS statistic

The biggest vertical gap between two cumulative distribution curves — measures how different two distributions are overall

76
New cards

Cohen's d = (mean x minus mean y) divided by pooled SD

Standardizes the gap between groups so you can judge how meaningful it is on a universal scale
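A short sketch of the formula using the standard library, with the pooled standard deviation built from the two sample variances. The sample values are invented for the example.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Difference in means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

# Hypothetical groups: means 4 and 3, both with sample variance 4.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```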

77
New cards

Cohen's h = 2 times arcsin(sqrt(p2)) minus 2 times arcsin(sqrt(p1))

Transforms percentages before taking the difference to account for how proportions naturally behave
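The arcsine transformation is a one-liner. The proportions below are made-up inputs, not the course success rates from the worksheet.

```python
from math import asin, pi, sqrt

def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))

h = cohens_h(0.50, 0.75)  # hypothetical success rates
print(round(h, 4))        # 0.5236, which is pi / 6
```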

78
New cards

Monte Carlo p-value = count of shuffles matching or beating the real gap divided by total shuffles

What fraction of random label shuffles produced a gap at least as big as the real one. This is the non-parametric significance test
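A minimal sketch of the shuffle test, assuming the gap is the absolute difference in group means (a two-sided test). The function name, seed, and sample data are hypothetical.

```python
import random

def permutation_p_value(x, y, shuffles=10_000, seed=0):
    """Fraction of label shuffles whose mean gap matches or beats the observed gap."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # reassign group labels at random
        new_x, new_y = pooled[:n_x], pooled[n_x:]
        gap = abs(sum(new_x) / len(new_x) - sum(new_y) / len(new_y))
        if gap >= observed:
            hits += 1
    return hits / shuffles

group_a = [21, 25, 30, 34, 28, 26]  # hypothetical values
group_b = [35, 41, 38, 44, 39, 47]
print(permutation_p_value(group_a, group_b))
```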

79
New cards

Walk me through the Week 7 code

The police data is loaded and split into Black and White age arrays. A Cohen's d function is written using the pooled standard deviation formula. A KS test compares the overall shape of both distributions. A Monte Carlo function pools both arrays, shuffles the labels 10000 times, and measures the gap each time. The fraction of shuffles that matched or beat the real gap becomes the p-value. A separate Cohen's h function handles the course success rate comparison using the arcsin transformation

80
New cards

Was Week 7 originally submitted correctly

No — Cohen's d was reported as 0.56 but the actual value is 0.58. The submission also left out the CDF comparison the worksheet required

81
New cards

Week 9

Logistic Regression

82
New cards

What is a binary outcome

An outcome that can only be one of two things — retained or not

83
New cards

What is logistic regression

A model that predicts the probability of a yes or no outcome — keeps all predictions between 0 and 1 using an S-shaped curve

84
New cards

What is an odds ratio

What you get when you take exp of a coefficient — tells you how much the odds multiply for each one unit increase in a predictor. 2.0 means they double

85
New cards

What is complete separation

When one variable perfectly predicts the outcome — the model breaks down because the coefficient tries to grow to infinity

86
New cards

What is classification accuracy

The percentage of cases the model predicted correctly

87
New cards

What is baseline accuracy

What you get by always predicting the most common outcome — the model must beat this to be useful

88
New cards

Logistic formula — P(Y=1) = 1 divided by (1 plus exp of negative linear combination)

Feeds any combination of inputs through an S-curve to produce a probability between 0 and 1

89
New cards

Log-odds = b0 + b1 times X1 + b2 times X2

The linear form of the model — makes it solvable while the sigmoid handles the probability constraint

90
New cards

Odds ratio = exp of coefficient

Converts a coefficient into a real world multiplier for the odds
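The three formula cards above fit together as follows. The coefficients and inputs are hypothetical, chosen only to show that raising one predictor by one unit multiplies the odds by exp of its coefficient.

```python
import math

def predict_probability(b0, coefs, xs):
    """Logistic model: pass the log-odds through the S-shaped curve."""
    log_odds = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

b0, coefs = -1.0, [0.7, 0.3]  # hypothetical b0, b1, b2

p_before = predict_probability(b0, coefs, [2, 1])
p_after = predict_probability(b0, coefs, [3, 1])  # X1 raised by one unit

ratio = odds(p_after) / odds(p_before)
print(round(ratio, 4), round(math.exp(0.7), 4))  # the two values agree
```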

91
New cards

Walk me through the Week 9 code

The Excel file is loaded and split into retained and not retained groups to compare averages. After seeing that GPA and SAT are nearly identical between groups but meetings and workshops differ a lot, those engagement variables are chosen as inputs. Rows with missing values are dropped. A constant is added. The statsmodels Logit function fits the model and prints p-values. Odds ratios are computed by taking exp of each coefficient. Predictions above 0.5 are called retained, and accuracy is measured against actual outcomes

92
New cards

Was Week 9 originally submitted correctly

No. This was the most serious error. The original reported 1000 students when the actual dataset has 105. The file referenced does not exist and the column names do not match the actual file, so it would crash before producing any output. Every result, including p-values, odds ratios, and accuracy, was wrong. The actual significant predictors are peer mentor meetings and workshops, not GPA or SAT

93
New cards

Week 10

Cross Validation

94
New cards

What is overfitting

When a model learns training data too well and does worse on new data — it memorized the noise instead of the real pattern

95
New cards

What is k-fold cross-validation

Splitting data into k equal chunks, training on k minus 1 of them and testing on the remaining one, then rotating k times so every chunk gets tested exactly once

96
New cards

What is training MSE

The average squared prediction error on the same data used to train. It is optimistic: it usually underestimates the error the model will make on new data

97
New cards

What is CV MSE

The average squared prediction error on held-out data — the honest estimate of real world performance
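A standard-library sketch of k-fold cross-validation for a one-predictor linear model. The fold assignment, function names, and seed are assumptions; when the data does not divide evenly, the chunks are only near-equal in size.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices and deal them into k chunks."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cv_mse(xs, ys, k=5, seed=0):
    """Average squared error on held-out rows, each row tested exactly once."""
    squared_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared_errors.extend((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]              # hypothetical predictor
ys = [52, 55, 61, 60, 68, 71, 70, 78, 80, 85]     # hypothetical scores
print(round(cv_mse(xs, ys), 3))
```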

98
New cards

What is R-squared

The share of variation in the outcome that the model explains — 0.86 means 86% of the variation in math scores is captured

99
New cards

MSE = (1/n) times sum of (actual minus predicted) squared

Average squared prediction error — lower means the model is predicting more accurately

100
New cards

R-squared = 1 minus (residual variance divided by total variance)

Proportion of variation in the outcome the model explains — higher is better
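Both formulas can be written directly from their definitions. In this sketch R-squared uses sums of squares, which gives the same ratio as the variances because both share the same divisor; the numbers are made up.

```python
def mse(actual, predicted):
    """Average squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 minus residual variation divided by total variation."""
    mean_actual = sum(actual) / len(actual)
    total = sum((a - mean_actual) ** 2 for a in actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - residual / total

actual = [1, 2, 3]            # hypothetical outcomes
predicted = [1, 2, 5]         # hypothetical predictions
print(mse(actual, predicted))                # 1.3333333333333333
print(r_squared(actual, actual))             # 1.0, a perfect fit
print(r_squared(actual, [2, 2, 2]))          # 0.0, no better than the mean
```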