Probability Distributions: The Normal Distribution

The distribution of birth weight for a sample of 500 full-term babies born in Brisbane, Australia, followed a roughly symmetric and bell-shaped curve.
Key Point: Distributions of continuous variables frequently exhibit this bell-shaped form.

The normal distribution is a mathematical model describing the behavior of a continuous random variable.
It allows analysis of underlying processes that lead to observations and enables testing data against specified models.
Once sample estimates of the mean ($\bar{x}$) and standard deviation (SD, $s$) are obtained, a normal distribution with these values as its parameters can be fitted to the sample data.
Parameters:
- Mean ($\mu$) controls the location of the central peak.
- Standard deviation ($\sigma$) controls the extent of variability.
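As a minimal sketch of fitting a normal model, the Python below uses the standard library's `statistics.NormalDist.from_samples` (Python 3.8+), which estimates $\mu$ and $\sigma$ from data. The five weights are hypothetical values chosen for illustration, not the Brisbane data.

```python
from statistics import NormalDist

# Hypothetical birth weights in kg (illustrative only, not study data).
weights = [3.1, 3.4, 3.6, 3.5, 3.9]

# from_samples() estimates the mean and the sample SD (n - 1 denominator).
fitted = NormalDist.from_samples(weights)
print(f"mean = {fitted.mean:.2f} kg, SD = {fitted.stdev:.2f} kg")

# The fitted model can then answer questions about the population,
# e.g. the modelled proportion of babies weighing under 2.5 kg.
print(f"P(weight < 2.5 kg) = {fitted.cdf(2.5):.4f}")
```

The fitted model is only useful for inference if it actually describes the sample well, which is why the fit should be checked before relying on it.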

If the probability model fits the sample, features of the model can be used for further population inferences or predictions.
Properties of the Normal Distribution:
- Approximately 68% of the distribution lies within 1 standard deviation ($\sigma$) of the mean.
- Approximately 95% lies within 2 standard deviations: $\Pr(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95$
- More precisely, a central 95% leaves 2.5% in each tail, so the multiplier is the value $z$ with $\Pr(Z < z) = 1 - 0.025 = 0.975$, which gives $z = 1.960$ (approximately 2).
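These proportions and the 1.960 multiplier can be checked numerically. This sketch assumes Python's standard-library `statistics.NormalDist`, whose `cdf` gives $\Pr(Z < z)$ and whose `inv_cdf` is the inverse of that function.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Proportion within k SDs of the mean is cdf(k) - cdf(-k).
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)

# Multiplier that leaves 2.5% in each tail: the 0.975 quantile.
z_975 = z.inv_cdf(0.975)

print(f"within 1 SD: {within_1sd:.4f}")  # within 1 SD: 0.6827
print(f"within 2 SD: {within_2sd:.4f}")  # within 2 SD: 0.9545
print(f"z for 0.975: {z_975:.3f}")       # z for 0.975: 1.960
```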

Continuity: The normal distribution describes continuous variables over the entire real line, from $-\infty$ to $+\infty$.
Symmetry: It is bell-shaped and symmetrical about its mean.
Two parameters: It is defined entirely by its mean ($\mu$) and standard deviation ($\sigma$).
As a result, limits set at a given number of standard deviations from the mean always enclose the same proportion of the distribution.
Any normal distribution can be converted to the standard normal distribution $N(0,1)$, which has mean 0 and standard deviation 1, so probabilities for any limits can be obtained from a single table:
- Conversion formula: $Z = \frac{x - \mu}{\sigma}$
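A short sketch of the conversion, assuming hypothetical parameters $\mu = 3.5$ and $\sigma = 0.5$ (not taken from the notes). It shows that the probability read from $N(0,1)$ after standardizing matches the one computed directly on the original scale; `standardize` is an illustrative helper name.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """Convert an observation x to a Z score: (x - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical parameters, for illustration only.
mu, sigma = 3.5, 0.5
z = standardize(4.0, mu, sigma)  # 1.0

# Same probability from the standard normal and from the original scale.
p_std = NormalDist().cdf(z)
p_raw = NormalDist(mu, sigma).cdf(4.0)
print(f"z = {z:.2f}, P(Z < z) = {p_std:.4f}, P(X < 4.0) = {p_raw:.4f}")
```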

The binomial distribution can be approximated by the normal distribution as the number of trials $n$ increases; the approximation works best when $n$ is large and $p$ is close to 0.5.
The properties of a binomial $B(n,p)$ distribution are:
- Mean: $\mu = np$
- Variance: $\sigma^2 = np(1 - p)$
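To illustrate the normal approximation, this sketch compares an exact binomial probability with the normal probability that uses $\mu = np$ and $\sigma = \sqrt{np(1-p)}$. The values $n = 100$, $p = 0.5$, $k = 55$ and the continuity correction (adding 0.5 to $k$) are assumptions of the sketch rather than content from the notes.

```python
import math
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p), summing the binomial pmf."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p, k = 100, 0.5, 55               # illustrative values
mu = n * p                           # mean: np
sigma = math.sqrt(n * p * (1 - p))   # SD: sqrt(np(1 - p))

exact = binom_cdf(k, n, p)
# Normal approximation; the +0.5 is a continuity correction
# (an assumption of this sketch, not part of the notes).
approx = NormalDist(mu, sigma).cdf(k + 0.5)
print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")
```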

For continuous random variables, probabilities relate not to single values but to ranges of values.
Probabilities are given by the area under the probability density function (pdf) over an interval, not by the height of the pdf at a single point:
- Example: The probability that a person is exactly 1.67 m tall is zero, but we can calculate the probability that their height lies between 1.665 m and 1.675 m.
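The height example can be sketched as follows, assuming a hypothetical population with mean 1.70 m and SD 0.07 m. The probability of an interval is the difference of two CDF values, while a single point gets probability zero.

```python
from statistics import NormalDist

# Hypothetical height distribution in metres (illustrative only).
height = NormalDist(mu=1.70, sigma=0.07)

# Probability of a range = area under the pdf = difference of CDF values.
p_range = height.cdf(1.675) - height.cdf(1.665)

# A single exact value is an interval of zero width, so zero probability.
p_exact = height.cdf(1.67) - height.cdf(1.67)

# For a narrow interval, the area is roughly density x width.
p_approx = height.pdf(1.67) * (1.675 - 1.665)

print(f"P(1.665 < X < 1.675) = {p_range:.4f} (density x width: {p_approx:.4f})")
print(f"P(X = 1.67) = {p_exact}")
```

The density-times-width comparison shows why the pdf value alone is not a probability: it has to be multiplied by an interval width.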

The concept of sampling variability is fundamental to statistical inference.
Because estimates are based on a sample, they are subject to uncertainty, which is quantified using confidence intervals (CIs) and hypothesis tests.
Properties of the Sampling Distribution of the Mean:
1. The mean of the sampling distribution equals the mean of the population.
2. Variability of the sampling distribution decreases with increased sample size. Thus, larger samples yield more precise population estimates.
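Both properties can be seen in a small simulation. This is a sketch under assumed settings (a normal population with mean 3.5 and SD 0.5, 2,000 samples per sample size, a fixed random seed); the helper `sample_means` is hypothetical. Python 3.9+ is assumed for the type hints.

```python
import random
import statistics

random.seed(1)  # fixed seed so runs are reproducible

# Hypothetical population: normal with mean 3.5 and SD 0.5.
POP_MEAN, POP_SD = 3.5, 0.5

def sample_means(n: int, reps: int = 2000) -> list[float]:
    """Means of `reps` random samples, each of size n."""
    return [statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
            for _ in range(reps)]

results = {}
for n in (5, 20, 80):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:3d}: mean of means = {results[n][0]:.3f}, "
          f"SD of means = {results[n][1]:.3f}")
```

The mean of the sample means stays close to 3.5 for every $n$, while their SD shrinks as $n$ grows.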

The standard deviation of the sampling distribution of the mean is known as the standard error (SE): $\text{SE} = \frac{\sigma}{\sqrt{N}}$
Larger sample sizes give smaller standard errors and therefore more precise estimates.
Conversely, the greater the variability of the original population (the larger $\sigma$), the larger the standard error.
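The formula itself is a one-liner. This sketch, with illustrative values for $\sigma$ and $N$, shows that quadrupling $N$ halves the SE, while doubling $\sigma$ doubles it; `standard_error` is an illustrative name.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

# Illustrative values only.
base = standard_error(0.5, 25)          # 0.1
bigger_n = standard_error(0.5, 100)     # 0.05: four times the sample, half the SE
bigger_sigma = standard_error(1.0, 25)  # 0.2: twice the SD, twice the SE
print(base, bigger_n, bigger_sigma)
```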

The central limit theorem states that, regardless of the shape of the underlying distribution, sample means tend to follow a normal distribution provided the sample size is sufficiently large (a common rule of thumb is $N > 30$).
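A rough check of the theorem under assumed settings: draw samples of size $N = 50$ from a strongly skewed exponential population (mean 1, SD 1) and count how many sample means fall within $\mu \pm 1.96\,\text{SE}$. If the means are close to normal, the share should be near 95%. The population, $N$, repetition count and seed are all choices made for this sketch.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so runs are reproducible

# Skewed population: exponential with rate 1 (mean 1, SD 1).
MU, SIGMA = 1.0, 1.0
N, REPS = 50, 4000

means = [statistics.fmean(random.expovariate(1.0) for _ in range(N))
         for _ in range(REPS)]

# If the sample means are approximately normal, about 95% should fall
# within mu +/- 1.96 * SE, even though single values are highly skewed.
se = SIGMA / math.sqrt(N)
inside = sum(MU - 1.96 * se < m < MU + 1.96 * se for m in means)
coverage = inside / REPS
print(f"share of sample means within mu +/- 1.96 SE: {coverage:.3f}")
```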

A sample mean serves as an estimator of the unknown population mean. For a sample of birth weights:
- Mean = 3.49 kg
The calculated SE is 0.07 kg, yielding a 95% CI of:
- $3.49 \pm (1.96 \times 0.07) = 3.49 \pm 0.14$, i.e. approximately 3.35 to 3.63 kg. Across repeated samples, about 95% of intervals constructed this way would contain the population mean.
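The interval arithmetic as a small Python function; `normal_ci` is an illustrative name, and the 1.96 default assumes a 95% interval based on the normal distribution. Python 3.9+ is assumed for the type hints.

```python
def normal_ci(mean: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate CI: mean +/- z * SE (z = 1.96 for 95%)."""
    half_width = z * se
    return mean - half_width, mean + half_width

# Birth-weight figures from the text: mean 3.49 kg, SE 0.07 kg.
lower, upper = normal_ci(3.49, 0.07)
print(f"95% CI: {lower:.2f} to {upper:.2f} kg")  # 95% CI: 3.35 to 3.63 kg
```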

Comparing groups is vital in public health research, which often examines differences in health outcomes or risk factors between groups.
Key Stages in Analysis:
1. Describe data patterns within groups.
2. Evaluate the significance of differences between groups, with emphasis on causal and modifiable risk factors.

Chi-Squared Goodness of Fit Test:
- Tests whether observed frequencies differ significantly from expected frequencies.
- Calculated as $\chi^{2} = \sum \frac{(O - E)^{2}}{E}$, where $O$ = observed and $E$ = expected counts.
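A minimal sketch of the statistic, using hypothetical counts (not the skin cancer data) and assumed expected proportions of 0.4, 0.4 and 0.2. The comparison value 5.991 is the standard 5% critical value of $\chi^2$ with 2 degrees of freedom. Python 3.9+ is assumed for the type hints.

```python
def chi_squared(observed: list[float], expected: list[float]) -> float:
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts in three categories (illustrative only).
observed = [50, 30, 20]
total = sum(observed)                             # 100
expected = [p * total for p in (0.4, 0.4, 0.2)]   # [40, 40, 20]

stat = chi_squared(observed, expected)
df = len(observed) - 1
# 5.991: tabulated 5% critical value for 2 degrees of freedom.
print(f"chi2 = {stat:.2f}, df = {df}, significant at 5%: {stat > 5.991}")
# chi2 = 5.00, df = 2, significant at 5%: False
```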

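When two categorical variables are cross-tabulated, expected frequencies are calculated under the null hypothesis that the variables are independent: $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$.
An example involves skin cancer incidence by skin type, comparing the observed counts with those expected under independence.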
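Expected counts under independence can be sketched as follows. The 2 × 3 table is hypothetical and only illustrates the calculation; degrees of freedom are $(\text{rows} - 1)(\text{columns} - 1)$. Python 3.9+ is assumed for the type hints.

```python
def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table (rows: groups, columns: categories).
observed = [[10, 20, 30],
            [20, 20, 20]]
expected = expected_counts(observed)  # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]

# Chi-squared statistic summed over every cell of the table.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 5.33, df = 2
```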

Understanding normal distribution properties and statistical inference techniques is essential for analyzing health data effectively.
Emphasizing precise definitions and calculation methodologies supports accurate data interpretation.