Independence
Two events, A and B, are independent if P(A | B) = P(A) or if P(B | A) = P(B)
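The definition can be checked by counting outcomes. This is a minimal sketch that assumes one roll of a fair six-sided die (all outcomes equally likely); the event names `even`, `at_most_4` and `at_least_4` are hypothetical, chosen for illustration.

```python
from fractions import Fraction

# One roll of a fair six-sided die: every outcome is equally likely
outcomes = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
at_least_4 = {4, 5, 6}

def prob(event, given=outcomes):
    """P(event | given): count outcomes of `event` inside the subpopulation `given`."""
    return Fraction(len(event & given), len(given))

print(prob(even), prob(even, at_most_4))    # 1/2 1/2 -> independent
print(prob(even), prob(even, at_least_4))   # 1/2 2/3 -> not independent
```

Knowing the roll is at most 4 leaves the chance of "even" unchanged, so those two events are independent; knowing it is at least 4 raises that chance, so those two are not.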
To calculate probability…
Typically, count the number of participants that had the characteristic of interest and divide by the population size
For conditional probabilities, the population size (denominator) is modified to reflect the subpopulation of interest
P(A and B) = P(A) x P(B) if A and B are independent
P(C or D) = P(C) + P(D) if C and D are mutually exclusive
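The counting recipe and both rules can be sketched on a 2×2 table. The counts below are hypothetical, made up for illustration only (they are not data from these notes), and were chosen so that A and B happen to be independent.

```python
from fractions import Fraction

# Hypothetical counts (made up for illustration), 250 participants in total
counts = {("A", "B"): 30, ("A", "not B"): 20, ("not A", "B"): 120, ("not A", "not B"): 80}
total = sum(counts.values())

# P(A): participants with characteristic A divided by the whole population
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)

# P(A | B): the denominator becomes the subpopulation with B
n_b = counts[("A", "B")] + counts[("not A", "B")]
p_a_given_b = Fraction(counts[("A", "B")], n_b)
p_b = Fraction(n_b, total)

# P(A and B) read straight from the table
p_a_and_b = Fraction(counts[("A", "B")], total)

print(p_a, p_a_given_b)          # 1/5 1/5
print(p_a_and_b == p_a * p_b)    # True: multiplication rule, since A and B are independent here

# Addition rule: "A and B" and "A and not B" are mutually exclusive, so their probabilities add
p_a_and_not_b = Fraction(counts[("A", "not B")], total)
print(p_a_and_b + p_a_and_not_b == p_a)   # True
```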
Binomial distribution
Model for dichotomous outcome
The binomial formula generates the probability of observing exactly x successes out of n
We typically do not talk about mean and variance for dichotomous variables, but we can quantify the mean and variance of a probability distribution such as the binomial
Mean and variance of (random variables generated from) the binomial distribution
Binomial distribution: Model for dichotomous outcome
Two possible values (responses) for each data point: success and failure
Replications of the process are independent
P(success) is constant for each replication
Binomial distribution: The binomial formula generates the probability of observing exactly x successes out of n
$P(x \text{ successes}) = \frac{n!}{x!\,(n-x)!}\, p^{x}(1-p)^{n-x}$
Binomial Distribution: Notation
n = number of times the process is repeated
p = P(success), where success is the outcome of interest
x = number of successes of interest, 0 ≤ x ≤ n
! = factorial; k! = k(k-1)(k-2)… 1
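The formula translates almost symbol for symbol into code. This is a sketch; the function name `binomial_pmf` is hypothetical, and the factorials are written out to mirror the notation above (in practice `math.comb(n, x)` computes the same coefficient).

```python
from math import factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent trials with P(success) = p)."""
    if not 0 <= x <= n:
        return 0.0
    # Binomial coefficient n! / (x! (n - x)!), written with factorials to mirror the formula
    coefficient = factorial(n) // (factorial(x) * factorial(n - x))
    return coefficient * p**x * (1 - p) ** (n - x)

# Example: probability of exactly 3 heads in 5 fair coin flips
print(binomial_pmf(3, 5, 0.5))   # 0.3125  (= 10 / 32)
```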
What the binomial formula does in one step
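Every specific ordering of x successes and n − x failures has the same probability, pˣ(1 − p)ⁿ⁻ˣ. The binomial coefficient (the fraction with factorials) counts how many such orderings are possible, and the formula then multiplies that count by the common probability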
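To see that the one-step formula agrees with listing every ordering by hand, here is a brute-force sketch; the values n = 4, x = 2, p = 0.3 are arbitrary choices for illustration.

```python
from itertools import product
from math import comb, prod

n, x, p = 4, 2, 0.3

# All orderings of n trials (S = success, F = failure) with exactly x successes
orderings = [seq for seq in product("SF", repeat=n) if seq.count("S") == x]

def ordering_probability(seq):
    # Independent trials: multiply p for each success and (1 - p) for each failure
    return prod(p if outcome == "S" else 1 - p for outcome in seq)

brute_force = sum(ordering_probability(seq) for seq in orderings)
one_step = comb(n, x) * p**x * (1 - p) ** (n - x)

print(len(orderings), comb(n, x))            # 6 6
print(abs(brute_force - one_step) < 1e-12)   # True
```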
Binomial distribution: Mean and variance of (random variables generated from) the binomial distribution
Mean or expected number of successes: μ = np
Variance: σ² = np(1 − p)
Standard deviation: σ = √(np(1 − p))
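As a sanity check, the closed forms can be compared with the general definitions E[X] = Σ x P(x) and Var(X) = Σ (x − μ)² P(x). This sketch assumes n = 10 and p = 0.3, picked only for illustration.

```python
from math import comb, sqrt

n, p = 10, 0.3

# Closed-form summaries of the binomial distribution
mean = n * p                    # mu = np
variance = n * p * (1 - p)      # sigma^2 = np(1 - p)
sd = sqrt(variance)             # sigma

# The same quantities computed from the probability of every possible x
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * px for x, px in enumerate(pmf))

print(round(mean, 4), round(variance, 4), round(sd, 4))   # 3.0 2.1 1.4491
print(abs(mean - mean_from_pmf) < 1e-9, abs(variance - var_from_pmf) < 1e-9)   # True True
```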
Normal distribution
Aka Gaussian distribution
Model for a continuous outcome when the distribution is well described by a bell-shaped curve
Notation: μ = distribution mean and σ = distribution standard deviation
x-axis is used to display the scale of the characteristic/variable being analyzed (e.g., height, weight, systolic blood pressure)
y-axis reflects the probability density (relative likelihood) of observing each value
Curve is highest in the middle, so values near the middle are more likely to occur; values at either extreme are much less likely to occur
Mean = median = mode (hump/most frequent value)
Area under the density curve to the left of X = a represents P(X ≤ a)
Properties of normal distribution
The normal distribution is symmetric about the mean i.e. P(X > μ) = P(X < μ) = 0.5.
The mean = the median = the mode
The mean and variance, μ and σ², completely characterize the normal distribution
P(a < X < b) = the area under the normal density curve from a to b
P(μ − σ < X < μ + σ) ≈ 0.68
P(μ − 2σ < X < μ + 2σ) ≈ 0.95
P(μ − 3σ < X < μ + 3σ) ≈ 0.997
The probability density function of a normal distribution is given by $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$
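The density formula and the "area = probability" idea can be tied together numerically. This sketch approximates P(μ − kσ < X < μ + kσ) with a simple midpoint-rule sum and compares it to the closed form erf(k/√2); the mean 120 and SD 10 are hypothetical values, and the helper names `normal_pdf` and `area` are illustrative only.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def area(a: float, b: float, mu: float, sigma: float, steps: int = 10_000) -> float:
    """Approximate P(a < X < b) as the area under the density (midpoint rule)."""
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))

mu, sigma = 120.0, 10.0   # hypothetical mean and SD
for k in (1, 2, 3):
    numeric = area(mu - k * sigma, mu + k * sigma, mu, sigma)
    exact = erf(k / sqrt(2))   # closed form of the same area, via the error function
    print(k, round(numeric, 4), round(exact, 4))
# 1 0.6827 0.6827
# 2 0.9545 0.9545
# 3 0.9973 0.9973
```

The exact values show why the 1σ and 2σ figures are quoted as 0.68 and 0.95, and why the 3σ figure is closer to 0.997 than to 0.99.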
Standard normal distribution Z
Normal distribution with μ = 0 and σ = 1
P(−1 < Z < 1) ≈ 0.68
P(−2 < Z < 2) ≈ 0.95
P(−3 < Z < 3) ≈ 0.997
Z = (X − μ) / σ
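The parentheses in the standardization matter: the mean is subtracted before dividing by σ. A minimal sketch, using a hypothetical X with μ = 120 and σ = 10 (without the parentheses, 135 − 120 / 10 would give 123, not a z-score):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    # Subtract the mean first, then divide by the standard deviation
    return (x - mu) / sigma

# Hypothetical example: X ~ normal with mu = 120, sigma = 10
print(z_score(135, 120, 10))   # 1.5  -> 1.5 SDs above the mean
print(z_score(110, 120, 10))   # -1.0 -> 1 SD below the mean
```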
Normal distribution calculate probability
Template: Computing probabilities about normal distributions
For the normal distribution, and for the distributions of other continuous variables, a single point has no area under the curve, so P(X = a specific value) = 0
Normal distribution calculate probability: Template: Computing probabilities about normal distributions
First standardize or convert a problem about a normal distribution (X) into a problem about the standard normal distribution (Z)
Then use the Z table to compute the desired probability
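The two-step template can be sketched with Python's standard library. Here `statistics.NormalDist(0, 1).cdf` stands in for the printed Z table (a substitution, not part of the original method), and μ = 120, σ = 10 are hypothetical values.

```python
from statistics import NormalDist

mu, sigma = 120, 10          # hypothetical mean and SD
a, b = 110, 135

# Step 1: standardize the bounds
z_a, z_b = (a - mu) / sigma, (b - mu) / sigma     # -1.0, 1.5

# Step 2: look up standard normal areas (the CDF plays the role of the Z table)
Z = NormalDist(0, 1)
p_below_b = Z.cdf(z_b)                 # P(X < 135) = P(Z < 1.5)
p_between = Z.cdf(z_b) - Z.cdf(z_a)    # P(110 < X < 135)

print(round(p_below_b, 4), round(p_between, 4))   # 0.9332 0.7745

# Same answer without standardizing, using the original distribution directly
X = NormalDist(mu, sigma)
print(abs(X.cdf(b) - X.cdf(a) - p_between) < 1e-12)   # True
```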
Normal distribution calculate probability: For the normal distribution, and for the distributions of other continuous variables, a single point has no area under the curve, so P(X = a specific value) = 0
This is not true for the binomial distribution or for other probability distributions of discrete/categorical/ordinal variables
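A quick way to see the contrast: for a continuous variable, the probability of an interval shrinks toward 0 as the interval narrows, while a binomial variable puts positive probability on single values. This sketch assumes a hypothetical normal X with μ = 120, σ = 10 and a binomial with n = 10, p = 0.3.

```python
from math import comb
from statistics import NormalDist

X = NormalDist(mu=120, sigma=10)   # hypothetical continuous variable

# Shrinking the interval around 130 shrinks its area toward 0
for width in (1.0, 0.1, 0.001):
    print(width, X.cdf(130 + width / 2) - X.cdf(130 - width / 2))

# A single point has zero width, hence zero area
print(X.cdf(130) - X.cdf(130))     # 0.0

# A discrete (binomial) variable puts positive probability on single values
n, p = 10, 0.3
print(round(comb(n, 3) * p**3 * (1 - p) ** (n - 3), 4))   # P(X = 3) = 0.2668
```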
Percentiles of the normal distribution
The kth percentile is defined as the score that holds k percent of the scores below it
90th percentile is the score that holds 90% of the scores below it
Percentiles of the normal distribution: 90th percentile is the score that holds 90% of the scores below it
Q1 = 25th percentile
Median = 50th percentile
Q3 = 75th percentile
For the normal distribution, the following is used to compute percentiles
x = μ + zσ
z = (x − μ) / σ
where
μ = mean of the random variable X
σ = standard deviation of X
z = value from the standard normal distribution for the desired percentile
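The percentile formula x = μ + zσ can be sketched by taking z from the inverse CDF of the standard normal. `statistics.NormalDist().inv_cdf` is used here in place of reading z from a table, and μ = 120, σ = 10 are hypothetical; the helper name `normal_percentile` is illustrative.

```python
from statistics import NormalDist

mu, sigma = 120, 10          # hypothetical mean and SD of X

def normal_percentile(k: float, mu: float, sigma: float) -> float:
    """kth percentile of a normal distribution: x = mu + z * sigma."""
    z = NormalDist().inv_cdf(k / 100)   # z-value with k% of the standard normal below it
    return mu + z * sigma

for k in (25, 50, 75, 90):   # Q1, median, Q3, 90th percentile
    print(k, round(normal_percentile(k, mu, sigma), 2))
# 25 113.26
# 50 120.0
# 75 126.74
# 90 132.82
```

Because the normal distribution is symmetric, Q1 and Q3 sit the same distance (about 0.674σ) below and above the mean.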