Binomial Distribution
Describes the probability of a specific number of successes in a fixed number of independent trials, each with the same probability of success.
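A minimal sketch in Python (assuming SciPy is installed; the numbers are illustrative, not from the cards):
```python
from scipy.stats import binom

# Probability of exactly 3 successes in 10 independent trials,
# each with success probability 0.5 (example values).
n, x, p = 10, 3, 0.5
print(binom.pmf(x, n, p))  # ~0.1172
```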
Poisson Distribution
Calculates the probability of a certain number of events occurring within a fixed interval of time or space.
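The same kind of sketch for the Poisson case (example values are my own):
```python
from scipy.stats import poisson

# Probability of observing exactly k = 2 events when the average
# rate is lambda = 4 events per interval (example values).
lam, k = 4, 2
print(poisson.pmf(k, lam))  # ~0.1465
```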
Terms in a binomial distribution
n (number of trials), x (number of successes), p (probability of success in a single trial), and q (probability of failure in a single trial)
Terms in a Poisson distribution
λ (lambda), which represents the average rate of events occurring; k, the number of events happening within a given time frame; and P(k), which denotes the probability of observing exactly k events
Terms in a binomial distribution for mean and variance
"n" representing the number of trials and "p" representing the probability of success, where the mean is calculated as "np" and the variance as "np(1-p)"
When to use a binomial test
When you have a fixed number of independent trials, each with only two possible outcomes, and you want to test if the observed proportion of "successes" is significantly different from a known or expected proportion
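A minimal sketch of running such a test, assuming a recent SciPy (the counts are illustrative):
```python
from scipy.stats import binomtest

# Test whether 14 successes in 20 trials is consistent with
# an expected success probability of 0.5 (example values).
result = binomtest(k=14, n=20, p=0.5, alternative="two-sided")
print(result.pvalue)  # compare to your significance level
```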
How to interpret a binomial test
Compare the p-value to the significance level: if the p-value is less than the significance level, reject the null hypothesis; otherwise, fail to reject it.
One sided tests
You have some reason to believe that your test will go in a specific direction, and the hypotheses are written to reflect that direction (e.g., H0: p ≤ p0 against H1: p > p0).
Two sided tests
Your hypotheses have no direction at all; the test asks whether the observed value differs from the null value in either direction (e.g., H1: p ≠ p0).
P-Value
The probability, calculated from your test, of obtaining a result at least as extreme as the one observed if the null hypothesis is true.
Type I Error Rate
A threshold (α, the significance level) that is set before you conduct your test; it is the probability of rejecting a true null hypothesis.
Values needed for a Goodness of Fit Test
Observed (the actual counts from your data)
Expected (the counts you would expect to see if the null hypothesis is true)
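A minimal goodness-of-fit sketch in Python (example counts for a hypothetical 9:3:3:1 ratio across 160 observations; SciPy assumed):
```python
from scipy.stats import chisquare

# Observed counts from the data vs. expected counts under H0.
observed = [95, 30, 28, 7]
expected = [90, 30, 30, 10]
stat, pvalue = chisquare(f_obs=observed, f_exp=expected)
print(stat, pvalue)
```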
𝑋² Test
Calculates the squared differences between the observed and expected values: X² = Σ (O − E)² / E
G-test
Calculates the ratio of probabilities under the null and alternative hypotheses (a likelihood ratio): G = 2 Σ O ln(O / E)
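One way to compute the G statistic in Python is SciPy's power_divergence with the log-likelihood option (same example counts as above):
```python
from scipy.stats import power_divergence

# G-test (log-likelihood ratio) on observed vs. expected counts;
# lambda_="log-likelihood" selects the G statistic instead of X².
observed = [95, 30, 28, 7]
expected = [90, 30, 30, 10]
g, pvalue = power_divergence(f_obs=observed, f_exp=expected,
                             lambda_="log-likelihood")
print(g, pvalue)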
When to use a goodness of fit test
When you want to assess how well your data aligns with a specific theoretical distribution; analyzes one categorical variable
When to use a test of association
When you want to test whether two or more categorical variables are associated with each other (e.g., a contingency table relating two categorical response variables).
Finding degrees of freedom
(number of rows - 1) * (number of columns - 1)
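A minimal test-of-association sketch in Python, which also reports the degrees of freedom (example counts; SciPy assumed):
```python
from scipy.stats import chi2_contingency

# 2x3 contingency table of two categorical variables (example counts).
table = [[30, 10, 20],
         [20, 25, 15]]
chi2, pvalue, dof, expected = chi2_contingency(table)
print(dof)           # (2 - 1) * (3 - 1) = 2
print(chi2, pvalue)
```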
How to interpret X² and G-tests
If the p-value is less than the significance level (usually 0.05), reject the null hypothesis and conclude that there is a statistically significant difference between the categorical values being compared.
When to use Yates correction test
When performing a chi-square test on a 2x2 contingency table, particularly when dealing with small sample sizes, to adjust the calculated chi-square. Each observed count is moved 0.5 toward its expected value before the statistic is computed: subtract 0.5 from a and d and add 0.5 to b and c if ad - bc is positive, and do the reverse if ad - bc is negative.
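In SciPy, chi2_contingency applies Yates' continuity correction to 2x2 tables by default; a quick sketch comparing corrected and uncorrected statistics (example counts):
```python
from scipy.stats import chi2_contingency

# For a 2x2 table, correction=True (the default) applies Yates'
# continuity correction.
table = [[12, 5],
         [7, 16]]
chi2_corrected, p_corrected, dof, _ = chi2_contingency(table, correction=True)
chi2_raw, p_raw, _, _ = chi2_contingency(table, correction=False)
print(chi2_corrected, chi2_raw)  # corrected statistic is smaller
```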
When to use Fisher’s exact test
Fisher’s exact test is used to determine if two categorical variables are independent, especially when sample sizes are small (expected counts <5). It’s ideal for 2x2 contingency tables with binary outcomes. Interpretation: A p-value < 0.05 suggests a significant association. A p-value ≥ 0.05 indicates no significant relationship.
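A minimal sketch in Python (example counts chosen so that expected values are small; SciPy assumed):
```python
from scipy.stats import fisher_exact

# 2x2 table with small expected counts (example values).
table = [[2, 8],
         [9, 3]]
oddsratio, pvalue = fisher_exact(table, alternative="two-sided")
print(oddsratio, pvalue)
```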
Odds Ratio
Compares the odds of an event between two groups; commonly used in case-control studies. OR = ad/bc
Relative Risk
Compares the actual risk (probability) of an event between two groups; mainly used in cohort studies. RR = q1/q2, where q1 and q2 are the probabilities of the event in each group, i.e., [a/(a+b)] / [c/(c+d)]
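A worked arithmetic sketch of both measures from one 2x2 table (the counts are illustrative):
```python
# 2x2 table laid out as [[a, b], [c, d]]:
# rows = exposed / unexposed, columns = event / no event (example counts).
a, b = 20, 80   # exposed:   20 events, 80 non-events
c, d = 10, 90   # unexposed: 10 events, 90 non-events

odds_ratio = (a * d) / (b * c)                 # ad/bc = 2.25
relative_risk = (a / (a + b)) / (c / (c + d))  # risk1 / risk2 = 2.0
print(odds_ratio, relative_risk)
```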
Basic characteristics of the normal distribution
Bell-shaped and symmetric; the mean, median, and mode are at the center. Defined by the mean (μ) and standard deviation (σ). Follows the Empirical Rule, is asymptotic to the x-axis, and the total area under the curve equals one.
Central Limit Theorem
States that as sample size increases, the distribution of sample means approaches a normal distribution, regardless of the population's shape. The mean of sample means equals the population mean. The spread (Standard Error) decreases as sample size increases.
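A small simulation sketch of this idea in Python (NumPy assumed; the skewed exponential population is my own choice of example):
```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 10,000 samples of size n = 50 from a skewed (exponential)
# population; the distribution of the sample means is still
# approximately normal.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(sample_means.mean())  # close to the population mean (1.0)
print(sample_means.std())   # close to sigma/sqrt(n) = 1/sqrt(50) ~ 0.141
```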
Calculating z-scores
A z-score (also called a standard score) measures how many standard deviations a data point is from the mean. It is calculated as: Z = (X − μ) / σ
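A quick worked example (the values are illustrative):
```python
# z-score for a single data point (example values).
x, mu, sigma = 130, 100, 15
z = (x - mu) / sigma
print(z)  # 2.0 -> 2 standard deviations above the mean
```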
How z-scores make the standard normal curve
The standard normal curve is made by plotting z-scores instead of the original data. A standard normal curve will always have a mean of zero and a standard deviation of 1. Z-scores tell you how many standard deviations you are from the mean.
Cumulative Probability Graph
Cumulative probability graph (also called a cumulative distribution function (CDF) graph) shows the probability that a random variable X takes on a value less than or equal to a specific value.
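A minimal sketch of reading a cumulative probability off the standard normal CDF in Python (SciPy assumed):
```python
from scipy.stats import norm

# P(Z <= 1.96) on the standard normal: the cumulative probability up to z.
print(norm.cdf(1.96))      # ~0.975
print(1 - norm.cdf(1.96))  # upper-tail probability, ~0.025
```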
Difference between z-score and z-test
Z-score is for individual values. Z-test is for comparing sample means to a population mean.
Interpreting the z-test
Compare zstat to zcrit: if |zstat| > |zcrit|, reject the null hypothesis; if |zstat| < |zcrit|, fail to reject it. If zcrit is negative, you can treat it as positive (compare absolute values) to make the interpretation easier.
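A minimal one-sample z-test sketch in Python (the data values and α = 0.05 are my own example; SciPy assumed):
```python
import math
from scipy.stats import norm

# One-sample z-test (example values): H0: mu = 100, known sigma = 15.
xbar, mu0, sigma, n = 106, 100, 15, 36
z_stat = (xbar - mu0) / (sigma / math.sqrt(n))  # = 2.4
z_crit = norm.ppf(1 - 0.05 / 2)                 # two-sided, alpha = 0.05 -> ~1.96
print(abs(z_stat) > z_crit)  # True -> reject H0
```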
Difference between a z-test and a t-test
Use a t-test when the population standard deviation is unknown and/or the sample size is small (n < 30). Use a z-test when σ is known and the sample size is large (n ≥ 30). A t-test uses the t-distribution, while a z-test uses the normal distribution.
Calculating degrees of freedom for a two-sample t-test
n1 + n2 – 2
Assumptions of the t-test
The data in both groups are normally distributed, and the variances in both groups are equal; this idea is also called homogeneity of variances or homoscedasticity (if the variances are not equal, we have heteroscedasticity).
Terms in a two-sample t-test
x̅1, x̅2 = sample means
n1, n2 = sample sizes
s²p = pooled variance
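A minimal pooled two-sample t-test sketch in Python (equal_var=True uses the pooled variance; the data are illustrative, SciPy assumed):
```python
from scipy.stats import ttest_ind

# Pooled two-sample t-test; df = n1 + n2 - 2 = 5 + 5 - 2 = 8.
group1 = [5.1, 4.8, 6.0, 5.5, 5.9]
group2 = [4.2, 4.9, 4.5, 4.0, 4.7]
t_stat, pvalue = ttest_ind(group1, group2, equal_var=True)
print(t_stat, pvalue)
```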