Week 1
Foundations of Probability
What is frequentist probability
Probability as the long-run frequency of a repeatable process: flip a fair coin enough times and the share of heads settles near one half
What is Bayesian probability
Probability as a degree of belief that updates when new evidence arrives — like a weather forecast updating with new data
What is conditional probability
The probability of something happening given that something else is already known to be true
What is a sample space
All possible outcomes before any condition is applied
P(win | switch) = 2/3
Switching doors in Monty Hall wins 2 out of every 3 times — the host's reveal moves the leftover probability onto the remaining door
P(win | stay) = 1/3
Staying with your original door only wins 1 out of every 3 times — your first pick only ever had a 1 in 3 shot
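The two Monty Hall cards above can be checked with a short simulation. This is a minimal sketch, not the submitted Week 1 code; the function names and the fixed seed are assumptions made for illustration.

```python
import random

def play_monty_hall(switch, rng):
    """Play one game and return True if the contestant ends on the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=10000, seed=0):
    """Fraction of games won over many independent plays."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials

switch_rate = win_rate(switch=True)
stay_rate = win_rate(switch=False)
print(f"switch: {switch_rate:.3f}, stay: {stay_rate:.3f}")
```

The two rates should land close to 2/3 and 1/3, with small run-to-run differences if the seed changes.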
P(land on step n) = 1 divided by E[X] = 1 divided by 1.5 = 2/3
The long run chance of landing on any specific step equals 1 divided by the average step size
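The step-landing result above can be approximated by simulation. A rough sketch, assuming the walk starts on step 1 and each flip adds 1 or 2 with equal probability; the helper names are hypothetical and the trial count is kept smaller than a full-size run to stay quick.

```python
import random

def lands_on(target, rng, start=1):
    """Walk forward by 1 or 2 per flip and report whether the target step is hit."""
    position = start
    while position < target:
        position += rng.choice((1, 2))
    return position == target

def landing_rate(target=100, trials=20000, seed=0):
    rng = random.Random(seed)
    hits = sum(lands_on(target, rng) for _ in range(trials))
    return hits / trials

rate = landing_rate()
expected_step = (1 + 2) / 2  # E[X] = 1.5
print(f"simulated: {rate:.3f}, 1 / E[X]: {1 / expected_step:.3f}")
```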
P(both male | at least one male) = 1/3
Knowing at least one pup is male removes the female-female outcome, leaving 3 equally likely possibilities (MM, MF, FM), and only 1 of them is both male
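The counting argument on this card can be written out by enumerating the sample space. A small sketch, assuming each pup is independently male or female with equal probability.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes for two pups: MM, MF, FM, FF.
outcomes = list(product("MF", repeat=2))

# Conditioning keeps only the outcomes with at least one male.
at_least_one_male = [o for o in outcomes if "M" in o]
both_male = [o for o in at_least_one_male if o == ("M", "M")]

p_both_male = Fraction(len(both_male), len(at_least_one_male))
print(p_both_male)  # 1/3
```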
Walk me through the Week 1 code
The code runs the Monty Hall game 10000 times. Each run hides the car randomly. The contestant picks a door. The host opens a goat door. The contestant either switches or stays. Wins are counted and divided by the total to get the win rate. A second simulation walks step by step from step 1 adding 1 or 2 each flip and counts how often it lands exactly on step 100 across 100000 tries
Was Week 1 originally submitted correctly
Yes
Week 2
Normal Distributions and Empirical Data
What is a normal distribution
A symmetric bell shaped curve that describes how data spreads around an average — the basis of most classic statistics
What is the Shapiro-Wilk test
A test that checks whether data is consistent with a normal distribution. A p-value below 0.05 is evidence against normality; a larger p-value does not prove the data is normal
What is skewness
How lopsided the data is — 0 means balanced
What is kurtosis
How heavy the tails are, often read as how peaked the data is. A normal distribution scores 3, or 0 when reported as excess kurtosis
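Skewness and kurtosis can both be computed from central moments. This is a sketch using population moments and made-up data; it is not the course code. Some libraries report excess kurtosis (this value minus 3) by default, so numbers may differ by that offset.

```python
def central_moment(values, k):
    """Average k-th power of deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    # Third moment scaled by the spread: 0 for balanced data.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Fourth moment scaled by the spread: 3 for a normal distribution.
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # hypothetical balanced data
right_tailed = [1, 1, 1, 2, 10]    # hypothetical data with a long right tail

print(skewness(symmetric), skewness(right_tailed))
print(kurtosis(symmetric), kurtosis(right_tailed))
```

The balanced list gives a skewness of 0, while the long right tail gives a positive value.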
What is a gain score
A way to measure improvement that accounts for how much room a student had to grow — not just the raw change in score
f'(-1) = 1 divided by sqrt(2 times pi times e) = 0.2420
The slope of the bell curve at x = -1 is gentle, not steep. Used to identify which of two graphs is geometrically accurate
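The slope value can be reproduced with the standard library alone. A minimal sketch, assuming the curve is the standard normal density (mean 0, standard deviation 1), whose derivative is -x times the density.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_pdf_slope(x):
    # The derivative of the standard normal density is -x * pdf(x).
    return -x * normal_pdf(x)

slope = normal_pdf_slope(-1)
closed_form = 1 / math.sqrt(2 * math.pi * math.e)
print(f"{slope:.4f} {closed_form:.4f}")
```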
gain = (final minus initial) divided by (1 minus initial)
Measures how much a student improved relative to how much they could have improved — accounts for starting point
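A short sketch of the gain formula, assuming scores are stored as fractions between 0 and 1 (so 1 is a perfect score). The example scores are made up.

```python
def normalized_gain(initial, final):
    """Improvement achieved as a share of the improvement that was possible."""
    if initial >= 1:
        raise ValueError("no room to improve when the initial score is already 1")
    return (final - initial) / (1 - initial)

# Two hypothetical students with different raw changes but the same gain.
gain_low_start = normalized_gain(0.40, 0.70)   # raw change 0.30
gain_high_start = normalized_gain(0.80, 0.90)  # raw change 0.10
print(gain_low_start, gain_high_start)
```

Both students closed half of their remaining gap, which is why the gain score treats them as equal even though the raw changes differ.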
Walk me through the Week 2 code
The slope formula is calculated directly using the math library with no data needed. The police shooting file is loaded and split into two groups by race. For each group the code finds the average age and the skewness, then runs the normality test. The test scores file is loaded and a gain column is calculated for each student using the formula. The same skewness, kurtosis, and normality checks run on those scores and everything is printed
Was Week 2 originally submitted correctly
Mostly — skewness was off by 0.01 and kurtosis was off by 0.09 which are small rounding differences. The bigger issue was the filename in the code used spaces instead of underscores so the file would not load at all
Week 3
Bayes Theorem
What is prior probability
What you believe about something before seeing any new evidence — usually the background rate of how common it is
What is posterior probability
Your updated belief after running Bayes theorem — what you believe after seeing the evidence
What is sensitivity
How often a test correctly says positive when the thing is truly there
What is specificity
How often a test correctly says negative when the thing is truly absent
What is the false positive rate
How often the test wrongly says positive when the thing is not there — equals 1 minus specificity
What is the base rate fallacy
Ignoring how rare something is when deciding whether a positive test result is real — leads to overestimating reliability
P(A|B) = P(B|A) times P(A) divided by P(B)
The core Bayes formula — flips conditional probabilities so you can find the cause from the effect
P(B) = P(B|A) times P(A) + P(B|not A) times P(not A)
The total probability formula — fills in the bottom of the Bayes fraction by accounting for all the ways the evidence can show up
Medical test at 0.1% incidence gives P(disease|positive) = 0.0194
Even with a 99% sensitive test only about 2% of positives are real when the disease is very rare, because the low base rate dominates. The card values are consistent with a 5% false positive rate
Medical test at 10% incidence gives P(disease|positive) = 0.6875
With the same test about 69% of positives are real when the disease is more common. The prior drives the posterior
50% certainty threshold = false positive rate divided by (sensitivity + false positive rate) = 4.8%
The minimum disease rate needed before a positive result is more likely real than false
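The medical test cards above can be reproduced with one small function. This is a sketch, not the submitted Week 3 code: the sensitivity of 0.99 and false positive rate of 0.05 are assumptions inferred from the card values, and the function names are hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive) from Bayes theorem with the total probability rule."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

def break_even_prior(sensitivity, false_positive_rate):
    """Base rate at which a positive result is exactly 50% likely to be real."""
    return false_positive_rate / (sensitivity + false_positive_rate)

SENSITIVITY = 0.99          # assumption: inferred from the card values
FALSE_POSITIVE_RATE = 0.05  # assumption: inferred from the card values

rare = posterior(0.001, SENSITIVITY, FALSE_POSITIVE_RATE)
common = posterior(0.10, SENSITIVITY, FALSE_POSITIVE_RATE)
threshold = break_even_prior(SENSITIVITY, FALSE_POSITIVE_RATE)
print(f"{rare:.4f} {common:.4f} {threshold:.3f}")
```

Plugging the threshold back into `posterior` returns 0.5, which is a quick way to confirm the break-even formula.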
Burglar alarm result = 8.68%
Despite the alarm sounding there is only an 8.68% chance of an actual intruder because burglaries are so rare
Container result — P(C1|green) = 1/3 and P(C2|green) = 2/3
Seeing a green ball makes Container 2 twice as likely as Container 1 since it has more green balls
Sam logins result — P(logged yesterday | more than 5 min today) = 4/9
About 44% chance Sam logged in yesterday given he spent more than 5 minutes today
Walk me through the Week 3 code
One general function takes three inputs — how common something is
Was Week 3 originally submitted correctly
Yes
Week 4
Introduction to Bayesian Data Analysis
What is the frequentist framework
Treats unknown values as fixed constants — probability only describes long run frequencies of repeatable events
What is the Bayesian framework
Treats unknown values as uncertain — probability describes degrees of belief that update as evidence arrives
What is a prior
What you believe about a value before seeing any data
What is a likelihood
How well the data fits each possible value of the unknown
What is a posterior
The updated belief after combining the prior and the data together
What is a credibility interval
A Bayesian range estimate — means there is literally a stated probability the true value falls inside it given the data
Was Week 4 originally submitted correctly
Yes
Week 5
Bayesian Estimation and the Beta Distribution
What is the Beta distribution
A flexible distribution that lives between 0 and 1 — used to represent uncertainty about a probability
What is a conjugate prior
A prior that stays in the same family after updating — the Beta is conjugate to the Binomial so updating with coin flip data always gives another Beta
What is a credibility interval
A range with a direct probability meaning — there is genuinely a stated percentage chance the true value falls inside it
What is the posterior mean
The average of the posterior distribution — used as the best single estimate of the unknown probability
Beta(1, 1) as a starting distribution
A flat starting point — every possible value of p between 0 and 1 is equally plausible before seeing any data
Beta(1 + H, 1 + T) as the updated distribution
After seeing H heads and T tails this is the updated distribution — the entire Bayesian update done in one step of arithmetic
Posterior mean = alpha divided by (alpha + beta)
The best single estimate of p — the average of the updated distribution
95% credibility interval using Beta inverse at 0.025 and 0.975
The range holding the middle 95% of the posterior — computed using the inverse of the Beta distribution
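The update, posterior mean, and interval can be sketched with the standard library. Note the swap: instead of calling an inverse Beta function, this version approximates the interval by sorting random draws from the posterior. The head and tail counts are made up, and the function names are hypothetical.

```python
import random

def beta_posterior(heads, tails, prior_alpha=1, prior_beta=1):
    """Conjugate update: add the observed counts to the prior parameters."""
    return prior_alpha + heads, prior_beta + tails

def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)

def sampled_interval(alpha, beta, level=0.95, draws=50000, seed=0):
    """Approximate the central interval from sorted posterior draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper

alpha, beta = beta_posterior(heads=7, tails=3)  # hypothetical counts
mean = posterior_mean(alpha, beta)
low, high = sampled_interval(alpha, beta)
print(f"Beta({alpha}, {beta}) mean={mean:.3f} interval=({low:.3f}, {high:.3f})")
```

Sampling gives bounds that wobble slightly with the seed; an exact inverse Beta call removes that noise.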
Walk me through the Week 5 code
The coin toss file is loaded with one column per experiment. For each column the code counts heads and tails after removing blank rows. It adds 1 to each count to get the updated distribution shape. The average of that shape is computed as the estimate of p. The inverse of the Beta distribution is called twice to get the lower and upper bounds of the 95% and 99% intervals. Results are printed for all five experiments
Was Week 5 originally submitted correctly
Yes — all five experiments matched the actual data exactly
Week 6
Regression Analysis LLS vs LAD
What is a residual
The gap between what the model predicted and what actually happened — actual value minus predicted value
What is LLS
Fits a line by minimizing the sum of squared gaps — sensitive to outliers because squaring makes big errors very large
What is LAD
Fits a line by minimizing the sum of absolute gaps — more forgiving of outliers because it does not square them
What does robust mean
A method that holds up well even when some data points are extreme or unusual
What is Shapiro-Wilk on residuals
Testing whether the prediction errors follow a bell curve. Normal errors are assumed by the standard significance tests and intervals for a regression
LLS objective — minimize sum of (y minus predicted) squared
Squaring every gap before adding means outliers count far more than typical points
LAD objective — minimize sum of absolute value of (y minus predicted)
Taking the absolute value means each gap counts in proportion to its size, so outliers grow linearly rather than quadratically
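The difference between the two objectives shows up clearly on a tiny data set with one outlier. A sketch with made-up data: LLS uses the closed-form solution, and LAD uses a brute-force search over lines through pairs of points (a best LAD line always passes through at least two data points), which stands in for a numerical optimizer.

```python
from itertools import combinations

def fit_lls(xs, ys):
    """Closed-form least squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def total_abs_gap(intercept, slope, xs, ys):
    """LAD objective: sum of absolute residuals."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))

def fit_lad(xs, ys):
    """Try every line through two data points and keep the smallest objective."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b * x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = total_abs_gap(intercept, slope, xs, ys)
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    return best[1], best[2]

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 40]  # hypothetical: y = 2x except for the last point

lls_intercept, lls_slope = fit_lls(xs, ys)
lad_intercept, lad_slope = fit_lad(xs, ys)
print(f"LLS slope {lls_slope:.2f}, LAD slope {lad_slope:.2f}")
```

The single outlier drags the LLS slope far above 2, while the LAD line stays on the five points that agree with each other.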
Shapiro-Wilk p below 0.05 on residuals
The prediction errors are not normally distributed which means the regression assumptions are violated
Walk me through the Week 6 code
The crab data is loaded with pre-molt size as input and post-molt size as the target. LLS is run first using a built-in function that solves it directly and calculates the gaps. For LAD a custom function measures total absolute gap. An optimizer searches for the intercept and slope that make that value as small as possible starting from the LLS answer. Normality tests are run on the gaps from both models to compare which produced cleaner errors
Was Week 6 originally submitted correctly
No. The original reported the intercept as minus 25.21 but the actual data gives plus 25.80, so the sign was wrong. The slope was also off. Both models reject normality on the real data, which contradicts the original. The original results did not come from running the code against the actual file, and the filename and column names were never verified
Week 7
Statistical Significance and Effect Size
What is a p-value
The chance of seeing a result at least this extreme if there were actually no real effect. A small p means the data would be surprising under pure chance
What is effect size
A number that says how big a difference actually is regardless of sample size — standardized so you can compare across studies
What is Cohen's d
Effect size for comparing two group averages — the difference in means divided by the combined spread. Below 0.2 tiny
What is Cohen's h
Same idea as Cohen's d but used when comparing two percentages instead of two averages
What is a Monte Carlo permutation test
Tests significance without assuming normality — shuffles group labels thousands of times to see if the real gap could happen by chance
What is the KS statistic
The biggest vertical gap between two cumulative distribution curves — measures how different two distributions are overall
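The KS statistic can be computed directly from two empirical cumulative curves. A minimal sketch with made-up samples; it returns only the statistic, not a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_values, x):
        # Share of values less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    # The gap can only change at observed values, so checking those is enough.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

same = ks_statistic([1, 2, 3], [1, 2, 3])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
print(same, shifted, disjoint)
```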
Cohen's d = (mean x minus mean y) divided by pooled SD
Standardizes the gap between groups so you can judge how meaningful it is on a universal scale
Cohen's h = 2 times arcsin(sqrt(p2)) minus 2 times arcsin(sqrt(p1))
Transforms percentages before taking the difference to account for how proportions naturally behave
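Both effect size formulas fit in a few lines. A sketch using sample variances for the pooled standard deviation; the example numbers are made up.

```python
import math
from statistics import mean, variance

def cohens_d(x, y):
    """Difference in means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)

def cohens_h(p1, p2):
    """Difference between two proportions after the arcsin transformation."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))

d = cohens_d([2, 4, 6], [1, 3, 5])  # hypothetical group values
h = cohens_h(0.25, 0.75)            # hypothetical success rates
print(f"d = {d:.2f}, h = {h:.4f}")
```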
Monte Carlo p-value = count of shuffles matching or beating the real gap divided by total shuffles
What fraction of random label shuffles produced a gap as big as the real one — the non-parametric significance test
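The shuffle-and-count idea can be sketched as follows. The gap here is the absolute difference in group means, and the data and seed are made up for illustration.

```python
import random

def mean_gap(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(x, y, shuffles=10000, seed=0):
    """Share of label shuffles whose gap matches or beats the observed gap."""
    rng = random.Random(seed)
    observed = mean_gap(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        # The first len(x) values play the role of group x after relabeling.
        if mean_gap(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return hits / shuffles

p_similar = permutation_p_value([1, 3, 5, 7], [2, 4, 6, 8])
p_separated = permutation_p_value(list(range(1, 9)), list(range(11, 19)))
print(p_similar, p_separated)
```

Interleaved groups give a large p-value, while fully separated groups give one close to zero.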
Walk me through the Week 7 code
The police data is loaded and split into Black and White age arrays. A Cohen's d function is written using the pooled standard deviation formula. A KS test compares the overall shape of both distributions. A Monte Carlo function pools both arrays, shuffles labels 10000 times, and measures the gap each time. The fraction of shuffles that matched or beat the real gap becomes the p-value. A separate Cohen's h function handles the course success rate comparison using the arcsin transformation
Was Week 7 originally submitted correctly
No — Cohen's d was reported as 0.56 but the actual value is 0.58. The submission also left out the CDF comparison the worksheet required
Week 9
Logistic Regression
What is a binary outcome
An outcome that can only be one of two things — retained or not
What is logistic regression
A model that predicts the probability of a yes or no outcome — keeps all predictions between 0 and 1 using an S-shaped curve
What is an odds ratio
What you get when you take exp of a coefficient — tells you how much the odds multiply for each one unit increase in a predictor. 2.0 means they double
What is complete separation
When one variable perfectly predicts the outcome — the model breaks down because the coefficient tries to grow to infinity
What is classification accuracy
The percentage of cases the model predicted correctly
What is baseline accuracy
What you get by always predicting the most common outcome — the model must beat this to be useful
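Accuracy and its baseline are easy to compare side by side. A small sketch with made-up outcomes, where 1 means retained and 0 means not retained.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Share of cases where the prediction matches the outcome."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def baseline_accuracy(actual):
    """Accuracy of always predicting the most common outcome."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)

actual = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]     # hypothetical outcomes
predicted = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # hypothetical model predictions

model_acc = accuracy(actual, predicted)
base_acc = baseline_accuracy(actual)
print(model_acc, base_acc)
```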
Logistic formula — P(Y=1) = 1 divided by (1 plus exp of negative linear combination)
Feeds any combination of inputs through an S-curve to produce a probability between 0 and 1
Log-odds = b0 + b1 times X1 + b2 times X2
The linear form of the model — makes it solvable while the sigmoid handles the probability constraint
Odds ratio = exp of coefficient
Converts a coefficient into a real world multiplier for the odds
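The three formulas above connect as shown in this sketch. The intercept and coefficients are hypothetical values chosen for illustration, not fitted results.

```python
import math

def predicted_probability(intercept, coefficients, predictors):
    """Pass the linear log-odds through the S-curve to get P(Y = 1)."""
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1 / (1 + math.exp(-log_odds))

def odds(probability):
    return probability / (1 - probability)

B0, B = -1.0, [0.7, 0.4]  # hypothetical intercept and coefficients

p_before = predicted_probability(B0, B, [1, 2])
p_after = predicted_probability(B0, B, [2, 2])  # first predictor raised by one unit
odds_multiplier = odds(p_after) / odds(p_before)
odds_ratio = math.exp(B[0])
print(f"{odds_multiplier:.4f} {odds_ratio:.4f}")
```

Raising one predictor by a single unit multiplies the odds by exp of its coefficient, whatever the other inputs are.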
Walk me through the Week 9 code
The Excel file is loaded and split into retained and not retained groups to compare averages. GPA and SAT are nearly identical between groups, but meetings and workshops differ a lot, so those engagement variables are chosen as inputs. Rows with missing values are dropped. A constant is added. The statsmodels Logit function fits the model and prints p-values. Odds ratios are computed by taking exp of each coefficient. Predictions above 0.5 are called retained and accuracy is measured against actual outcomes
Was Week 9 originally submitted correctly
No. This was the most serious error. The original reported 1000 students when the actual dataset has 105. The file referenced does not exist and the column names do not match the actual file, so the code would crash before producing any output. Every result, including p-values, odds ratios, and accuracy, was wrong. The actual significant predictors are peer mentor meetings and workshops, not GPA or SAT
Week 10
Cross Validation
What is overfitting
When a model learns training data too well and does worse on new data — it memorized the noise instead of the real pattern
What is k-fold cross-validation
Splitting data into k equal chunks, training on k minus 1 of them, and testing on the remaining one, rotating k times so every chunk gets tested exactly once
What is training MSE
The average squared prediction error on the same data used to train. It tends to be an optimistic underestimate of the error on new data
What is CV MSE
The average squared prediction error on held-out data — the honest estimate of real world performance
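Training MSE and CV MSE can be compared on a simple straight-line model. A sketch under stated assumptions: the data is synthetic, the folds are interleaved (every k-th point) rather than randomly shuffled, and the function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(intercept, slope, xs, ys):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_cv_mse(xs, ys, k=5):
    """Average held-out MSE, with every point tested exactly once."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point forms one fold
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        intercept, slope = fit_line(train_x, train_y)
        fold_errors.append(mse(intercept, slope, test_x, test_y))
    return sum(fold_errors) / k

rng = random.Random(0)
xs = list(range(20))
ys = [1 + 2 * x + rng.gauss(0, 3) for x in xs]  # synthetic noisy line

train_mse = mse(*fit_line(xs, ys), xs, ys)
cv_mse = k_fold_cv_mse(xs, ys, k=5)
print(f"training MSE {train_mse:.2f}, 5-fold CV MSE {cv_mse:.2f}")
```

Setting `k=len(xs)` turns this into leave-one-out cross-validation, where the held-out error for a least squares line is never below the training error.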
What is R-squared
The share of variation in the outcome that the model explains — 0.86 means 86% of the variation in math scores is captured
MSE = (1/n) times sum of (actual minus predicted) squared
Average squared prediction error — lower means the model is predicting more accurately
R-squared = 1 minus (residual variance divided by total variance)
Proportion of variation in the outcome the model explains — higher is better
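Both formulas can be checked on a handful of numbers. A minimal sketch with made-up actual and predicted values; R-squared is computed from sums of squares, which gives the same ratio as the variance form.

```python
def mse(actual, predicted):
    """Average squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the outcome's variation explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

actual = [3, 5, 7, 9]              # hypothetical outcomes
predicted = [2.5, 5.5, 6.5, 9.5]   # hypothetical predictions

error = mse(actual, predicted)
explained = r_squared(actual, predicted)
print(error, explained)
```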