Notes on Normal Distribution, Tail Probabilities, Binomial Density, and Sampling Information

Normal distribution, tail probabilities, and percentiles

  • Z is standard normal: Z \sim N(0,1)
  • Cumulative distribution function (CDF) of Z: \Phi(z) = \mathbb{P}(Z \le z)
  • Probability in the right tail: \mathbb{P}(Z > z) = 1 - \Phi(z) = \Phi(-z)
  • Terminology:
    • "lower tail" refers to the left tail (probability mass to the left of a point)
    • "upper tail" refers to the right tail (probability mass to the right of a point)
  • Important intuition: "lower tail probability" is about the direction (left) rather than being automatically < 0.5; it depends on the chosen threshold, not on the tail label alone.
  • Inverse normal (percentiles): the function that maps a probability to a z-value such that the CDF equals that probability.
    • Let qnorm(p) = z \text{ such that } \Phi(z) = p
    • Examples:
    • Left-tail probability 0.1: qnorm(0.1) \approx -1.2816
    • Left-tail probability 0.5: qnorm(0.5) = 0
  • Left-tail vs right-tail interpretation examples:
    • If we want the value so that only 10% lies to the left of it, the threshold is the 10th percentile: z_{0.10} = qnorm(0.10) \approx -1.2816
    • If we want only 10% to lie above a threshold (i.e., the 90th percentile), the threshold is z_{0.90} = qnorm(0.90) \approx 1.2816
    • Practical takeaway: to find the cutoff so that 10% are above, use the 90th percentile; to find the cutoff so that 10% are below, use the 10th percentile.
  • Example interpretation with a real-world context (packages):
    • If you ask for the weight threshold with only 10% of packages weighing above it, you are asking for the 90th percentile of the weight distribution (right tail 0.10). The corresponding z-value is approximately +1.2816 for the standard normal.

- Conversely, asking for the left-tail probability of 0.10 yields threshold at z ≈ -1.2816.
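These left-tail/right-tail conventions can be checked with Python's standard library; `statistics.NormalDist` stands in for R's `qnorm`/`pnorm` here:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, sd 1

# Inverse CDF (R's qnorm): maps a left-tail probability to a z-value
print(round(Z.inv_cdf(0.10), 4))  # 10th percentile, about -1.2816
print(round(Z.inv_cdf(0.90), 4))  # 90th percentile, about  1.2816
print(Z.inv_cdf(0.50))            # median of the standard normal: 0.0

# CDF (R's pnorm) and the tail identity P(Z > z) = 1 - Phi(z) = Phi(-z)
z = 1.2816
right_tail = 1 - Z.cdf(z)   # mass above z
left_tail = Z.cdf(-z)       # same mass, below -z, by symmetry
print(round(right_tail, 4), round(left_tail, 4))  # both about 0.10
```

So "10% above" means querying the inverse CDF at 0.90, while "10% below" queries it at 0.10, exactly as in the package-weight example.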

Binomial density (often called the binomial PMF) and the dbinom function

  • If X ~ Binomial(n, p), the probability of observing exactly k successes is:
    \mathbb{P}(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}
  • The density/value at a given x is often denoted y1 in the notes, i.e., the pair (x, y1) represents a single point on the distribution.
  • Observations about sample size and density estimates:
    • With a small n (e.g., n = 100), the discreteness of the binomial distribution is visible, and gaps between consecutive probabilities can be noticeable.
    • As n grows (e.g., n = 1000), neighboring probabilities change more gradually and the gaps fill in, making the histogram/empirical density look nearly continuous.

- Practical implication: larger samples provide clearer, more reliable estimates of the underlying distribution pattern; more data reduces sampling error and improves decision-making.
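The PMF formula and the smoothing effect of larger n can be verified with a short stdlib sketch (the function name `binom_pmf` is my own, not from the notes):

```python
from math import comb

def binom_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Sanity check against a hand computation: Binomial(4, 0.5) at k = 2
# C(4, 2) * 0.5^2 * 0.5^2 = 6 / 16 = 0.375
print(binom_pmf(4, 0.5, 2))

# The jump between neighboring probabilities near the mean shrinks as n
# grows, which is why the distribution looks smoother for larger n.
for n in (100, 1000):
    k = n // 2
    gap = abs(binom_pmf(n, 0.5, k) - binom_pmf(n, 0.5, k + 1))
    print(n, gap)
```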

Data, information, and decision quality

  • More data generally yields more information, which reduces uncertainty (variance) and error in estimates.
  • However, not all data are equally useful: signal vs. noise matters; pattern recognition and appropriate summarization are essential.
  • Early intuition: with more observations, the observed pattern (deviations from expectation) becomes clearer, enabling better decisions.

- Core idea: data quality and quantity together determine the reliability of conclusions.

Sample means and the role of X̄ (the sample mean)

  • X̄ denotes the sample mean from a sample of observations.
  • In many contexts, X̄ is used as an estimator of the population mean μ.
  • Key properties (standard results):
    • Expectation: \mathbb{E}[\bar{X}] = \mu
    • Variance: \operatorname{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n}
    • By the Central Limit Theorem (for large n):
      \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1)
  • Practical takeaway:
    • As the sample size n increases, the distribution of the sample mean concentrates around μ (the variance shrinks as 1/n).
    • This tightening reduces the standard error and improves the precision of estimates and hypothesis tests.

- The phrase in the notes about X̄ “within keep increasing” can be read as: collecting more samples pushes the estimate toward the true mean and reduces sampling variability.
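The 1/n shrinkage of Var(X̄) is easy to see by simulation. A minimal sketch, with arbitrarily chosen population parameters (μ = 10, σ = 2) and a fixed seed for reproducibility:

```python
import random
from statistics import pstdev

random.seed(0)  # arbitrary seed so the run is reproducible

mu, sigma = 10.0, 2.0  # hypothetical population parameters

def sample_mean(n: int) -> float:
    """Draw n observations from N(mu, sigma^2) and return their mean."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Empirical spread of X-bar across many replications, for two sample sizes.
# Theory predicts sd(X-bar) = sigma / sqrt(n): 0.4 for n=25, 0.1 for n=400.
for n in (25, 400):
    means = [sample_mean(n) for _ in range(2000)]
    print(n, round(pstdev(means), 3))
```

Quadrupling n halves the standard error, matching σ/√n; the simulated spreads land close to the theoretical 0.4 and 0.1.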

Connections to foundational principles and real-world relevance

  • Links to probability basics: CDF, tail probabilities, quantiles, percentiles, and inverse functions.
  • Link to distribution theory: binomial pmf, mean/variance, and how discrete distributions approximate continuous ones with larger n.
  • Practical data science workflow:
    • Compute tail probabilities to understand risk or extreme events (left vs right tail).
    • Use inverse CDF (quantiles) to set thresholds corresponding to desired tail probabilities.
    • Model counts with binomial distribution, assess probabilities of observed counts, and interpret density values as frequencies.
    • Gather more data to reduce noise, observe clearer deviation patterns, and make more informed decisions.
  • Ethical, practical implications:
    • Decisions based on misinterpreted tail probabilities or incorrect percentiles can lead to misassessment of risk.
    • Collecting more data should be paired with attention to data quality, representativeness, and potential biases.

- Overreliance on asymptotic results (e.g., CLT for small n) can be misleading; check conditions before applying normal approximations.

Summary of key formulas to memorize

  • Normal CDF and tails:
    • \Phi(z) = \mathbb{P}(Z \le z), \quad Z \sim N(0,1)
    • Right tail: \mathbb{P}(Z > z) = 1 - \Phi(z) = \Phi(-z)
  • Inverse normal / percentile relation:
    • qnorm(p) = z \text{ such that } \Phi(z) = p
    • Examples:
    • qnorm(0.1) \approx -1.2816
    • qnorm(0.9) \approx 1.2816
  • Binomial density (PMF):
    • \mathbb{P}(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}
  • Sample mean properties:
    • \mathbb{E}[\bar{X}] = \mu
    • \operatorname{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n}
    • CLT normalization: \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1)
  • Meanings: interpret thresholds and percentiles in terms of left/right tails and their probabilities.