Stats 3.5 Transformation of Variables
Chapter 12: Statistical Distributions and Derivations
Main Topic: Various statistical distributions and concepts, including derivations and applications.
Overview
The session focuses on deriving and understanding distributions, specifically the Poisson, binomial, Bernoulli, and uniform distributions.
Emphasis on the interrelations of these distributions and practical implications in statistics.
Statistical Distributions
Bernoulli Distribution
Defined for a single trial with two possible outcomes: success (1) and failure (0).
Represented by a single parameter p (the probability of success); possible outcomes are 1 (success, with probability p) and 0 (failure, with probability 1 - p).
Binomial Distribution
Derived from the Bernoulli distribution, applicable for multiple independent Bernoulli trials.
Parameters: number of trials (n) and probability of success (p).
Probability mass function (PMF):
P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \text{ for } k = 0, 1, …, n
Here, $\binom{n}{k}$ is the binomial coefficient, the number of ways to choose k successes out of n trials.
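As a minimal sketch (standard library only; the trial counts and probabilities below are arbitrary illustration values), the binomial PMF can be evaluated directly from this formula:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the PMF sums to 1 over k = 0, 1, ..., n.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```

A quick hand check: for n = 4 and p = 0.5, P(X = 2) = C(4, 2) (0.5)^2 (0.5)^2 = 6/16 = 0.375.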
Poisson Distribution
Derived from the binomial distribution when n is large and p is small, representing the number of events in a fixed interval.
PMF given by:
P(X=k) = \frac{e^{-\lambda} \lambda^k}{k!}, \text{ where } \lambda = np
Describes event occurrence over time or space, useful in scenarios like call center arrivals.
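A short sketch of the limiting relationship (λ = 3 and n = 10,000 are arbitrary illustration values): with λ = np held fixed, the binomial PMF should be nearly indistinguishable from the Poisson PMF for large n.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Hold lambda = n * p fixed at 3; as n grows, Binomial(n, lam/n) -> Poisson(lam).
lam = 3.0
n = 10_000
gap = max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
          for k in range(20))
```

By Le Cam's bound the total variation distance here is at most λ²/n, so `gap` is on the order of 10⁻³ or smaller.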
Derivations and Interrelations
Poisson Distribution Derivation
Positioned as a limiting case of the binomial when n is large and p is small.
Practical example: let W be the waiting time until the first call in a Poisson process with rate λ per unit time.
To derive the cumulative distribution function (CDF) of W, consider the event that no calls occur within a given time frame: the number of calls in the interval [$0, x$] follows a Poisson distribution with mean λx.
Hence P(W > x) = P(no calls in [$0, x$]) = e^{-λx}, giving the CDF F_W(x) = 1 - e^{-λx}.
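A small simulation can check the waiting-time CDF F(x) = 1 - e^{-λx}, which follows from counting no events in [0, x]. This is an illustrative sketch: it assumes the standard fact that the first-arrival time of a rate-λ Poisson process is Exponential(λ), and the values λ = 2 and x = 0.5 are arbitrary.

```python
import math
import random

random.seed(0)
lam, x = 2.0, 0.5

# Waiting time to the first arrival in a rate-lam Poisson process
# is exponentially distributed with rate lam.
samples = [random.expovariate(lam) for _ in range(100_000)]

# Empirical P(W <= x) vs. the analytic CDF 1 - exp(-lam * x).
empirical = sum(w <= x for w in samples) / len(samples)
analytic = 1 - math.exp(-lam * x)
```

With 100,000 draws the empirical fraction should agree with the analytic value to within about one percentage point.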
Hypergeometric Distribution Debate
Discussion of whether a Poisson variable can be derived from a hypergeometric distribution, especially when events are dependent (as with draws made without replacement).
Dependence means the covariance terms no longer vanish, which complicates variance calculations and makes clean derivations harder.
Variance Calculation
Variance of Hypergeometric Distribution
The calculation requires handling dependent variables; for any two random variables:
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
If X and Y are independent, Cov(X, Y) = 0 and the variance is simply additive. For n > 2 variables the number of pairwise covariance terms grows as \binom{n}{2}, which is why the hypergeometric variance is tedious to derive directly.
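The identity above holds exactly for empirical moments as well, which makes it easy to verify numerically. In this sketch the dependence structure (Y = X plus independent noise, so Cov(X, Y) > 0) is an arbitrary choice for illustration:

```python
import random

random.seed(1)
n = 200_000

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# A dependent pair: Y = X + noise, so Cov(X, Y) is positive.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]
sums = [x + y for x, y in zip(xs, ys)]

lhs = var(sums)                                # Var(X + Y)
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)      # Var(X) + Var(Y) + 2Cov(X, Y)
```

Because the same denominator is used throughout, `lhs` and `rhs` agree to floating-point precision, not merely approximately.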
Statistical Relevance and Application
Importance of Covariance
Highlighted as crucial for understanding the relationship between two variables in applied studies.
Example: correlating medical data (e.g., blood pressure and dietary habits) shows why the covariance between variables matters.
Practical Examples
Uniform Distribution Example
Described with a defined range, [15, 25], over which the probability density is constant.
Since the uniform distribution is continuous, it has a probability density function (PDF) rather than a PMF:
f(x) = \frac{1}{b-a} \text{ for } x \in [a, b]
Specific focus on transforming results from a uniform distribution into other ranges through scaling and shifting.
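The scaling-and-shifting step can be sketched in a few lines: a Uniform(0, 1) draw u maps to Uniform(a, b) via a + (b - a)u. The range [15, 25] below matches the example above; the helper name `shift_scale` is my own.

```python
import random

random.seed(2)

def shift_scale(u: float, a: float, b: float) -> float:
    """Map u ~ Uniform(0, 1) to Uniform(a, b) by scaling and shifting."""
    return a + (b - a) * u

# Transform standard uniform draws into the range [15, 25] from the example.
samples = [shift_scale(random.random(), 15, 25) for _ in range(100_000)]
lo, hi = min(samples), max(samples)
mean = sum(samples) / len(samples)
```

Every transformed draw lands in [15, 25], and the sample mean sits near the midpoint (a + b) / 2 = 20.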
Chi-Squared Distribution
Produced by squaring a standard normal random variable (giving one degree of freedom); more generally, the sum of k independent squared standard normals is chi-squared with k degrees of freedom. Useful in statistical hypothesis testing.
Importance arises in understanding sample variance and its distribution:
The PDF is more involved, but the distribution is fundamental for statistical tests like ANOVA.
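A quick sketch of the squaring construction: draws of Z² for Z standard normal should match the known chi-squared(1) moments, mean 1 and variance 2. The sample size is an arbitrary illustration value.

```python
import random

random.seed(3)
n = 200_000

# Squaring a standard normal gives a chi-squared variable with 1 degree
# of freedom, whose mean is 1 and whose variance is 2.
samples = [random.gauss(0, 1) ** 2 for _ in range(n)]
mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
```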
Course Structure and Format
Class pacing designed to cover foundational material effectively before tackling advanced topics at a slower pace for better conceptual understanding.
Discussions encompass both intuition and complex mathematics to develop comprehensive statistical literacy among students.
End of Unit
Encouragement to embrace the learning process and grasp concepts to progress effectively in statistical studies.
Conclusion
The exploration of statistical distributions underscores their relevance in practical applications and the derivations that illuminate their properties. Understanding these principles lays the groundwork for more advanced statistical methods.