Sampling Distribution Notes

Sampling Distribution

A sampling distribution is the probability distribution of a sample statistic, that is, of any function $g(x_1, x_2, \ldots, x_n)$ of the observations $x_1, x_2, \ldots, x_n$ that constitute a sample.
- Suppose a sample of size $n$ is drawn from a finite population of size $N$.
- The total number of possible samples (drawn without replacement, ignoring order) is: ${}^{N}C_{n} = \frac{N!}{(N - n)!\,n!} = k$ (say)
- For each sample, we can compute some statistic $t = f(x_1, x_2, \ldots, x_n)$, such as the sample mean $\bar{x}$ or variance $s^2$.
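
The counting and per-sample computation above can be sketched in a few lines of Python. The population values and the sample size are hypothetical, chosen only for illustration, and the sample variance is assumed to use the divisor $n - 1$.

```python
from itertools import combinations
from math import comb
from statistics import mean, variance

# Hypothetical finite population of size N = 5 (illustrative values only).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement, order ignored.
samples = list(combinations(population, n))
k = comb(len(population), n)  # N! / ((N - n)! n!)

# One row per sample: (sample, sample mean, sample variance with divisor n - 1).
rows = [(s, mean(s), variance(s)) for s in samples]

for number, (s, x_bar, s2) in enumerate(rows, start=1):
    print(number, s, x_bar, s2)
```

Each printed row corresponds to one row of a table of per-sample statistics.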

Example Table

Sample Number	Statistic ( $t$ )	Statistic ( $\bar{x}$ )	Statistic ( $s^2$ )
1	$T_1$	$\bar{X}_1$	$S^2_1$
2	$T_2$	$\bar{X}_2$	$S^2_2$
3	$T_3$	$\bar{X}_3$	$S^2_3$
…	…	…	…
k	$T_k$	$\bar{X}_k$	$S^2_k$

If these $k$ values of the statistic are arranged in a frequency table, we obtain the sampling distribution of the statistic.
The mean and variance of the sampling distribution are denoted by $\bar{t}$ and $S^2$ , respectively.
The distribution of sample means, sample variances, or any function of sample statistics is known as a sampling distribution.
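
As a minimal sketch of how the frequency table is built, the snippet below enumerates every sample from a small hypothetical population and tabulates the sample mean. It assumes that all $k$ samples are equally likely and that the variance of the sampling distribution uses the divisor $k$; both are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]  # hypothetical population, N = 5
n = 2

# Sample mean of every possible sample, kept as exact fractions.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
k = len(sample_means)

# Frequency table: value of the statistic -> number of samples producing it.
frequency = Counter(sample_means)
for value in sorted(frequency):
    print(value, frequency[value], Fraction(frequency[value], k))

# Mean and variance of the sampling distribution, weighting all k samples equally.
t_bar = sum(sample_means) / k
var_t = sum((t - t_bar) ** 2 for t in sample_means) / k
print(t_bar, var_t)
```

For this population the mean of the sample means coincides with the population mean, which is the property that makes the sample mean useful for inference.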

Purpose of Sampling Distribution

A sample is studied not for its own sake but to infer the characteristics of the population.
Finding exact population parameters is often costly, difficult, or impossible when $N$ is large.
Sampling is a practical tool to estimate population parameters easily and efficiently.

Rational Distribution and Sampling Distribution

Rational Distribution: When the standard deviation of a sampling distribution (its standard error) is very small, the sampling distribution is referred to as a rational distribution.
Why a Sampling Distribution Is Called Rational: When the variability is small, the statistic computed from almost any sample lies close to the population parameter, so it is a reliable (rational) estimator of that parameter.
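
To make "small variability" concrete, the sketch below computes the standard deviation of the sample mean over all possible samples, for each sample size, from a hypothetical population. The helper name `standard_error` and the data are assumptions introduced for this example.

```python
from itertools import combinations
from statistics import mean, pstdev

population = [2, 4, 6, 8, 10]  # hypothetical population values

def standard_error(pop, n):
    """Standard deviation of the sample mean over all samples of size n."""
    means = [mean(s) for s in combinations(pop, n)]
    return pstdev(means)

# Standard error for every sample size from 1 up to the full population.
errors = {n: standard_error(population, n) for n in range(1, len(population) + 1)}
for n, se in errors.items():
    print(n, round(se, 4))
```

In this example the spread of the sampling distribution shrinks as $n$ grows and vanishes when the sample is the whole population.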

Importance of Sampling Distribution in Statistics

To infer population parameters through point estimation.
To develop confidence intervals.
To perform hypothesis testing.
Sampling distribution helps determine critical values needed for statistical tests.
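
As an illustration of critical values and confidence intervals, the sketch below assumes the sampling distribution of the sample mean is normal with a known population standard deviation. The numbers (`x_bar`, `sigma`, `n`) are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05  # significance level for a two-sided test
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Critical value: the point that leaves alpha / 2 in the upper tail.
z_crit = standard_normal.inv_cdf(1 - alpha / 2)

# Hypothetical sample summary, with sigma assumed known.
x_bar, sigma, n = 50.0, 10.0, 25
standard_error = sigma / n ** 0.5

# Confidence interval for the population mean at level 1 - alpha.
ci = (x_bar - z_crit * standard_error, x_bar + z_crit * standard_error)
print(round(z_crit, 4), tuple(round(v, 2) for v in ci))
```

A test of a hypothesised mean at the same level would reject when the hypothesised value falls outside this interval.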

Uses of Sampling Distribution

Helps estimate the characteristics of the population (universe) by examining a small part of it.
Facilitates inference about population parameters using statistics computed from samples.
Enables construction of confidence intervals and hypothesis testing.

How to Obtain the Distribution of Random Variables

To find the distribution of a function of a random variable, we consider two cases:
- Case I: Single random variable
- Case II: Several random variables

Case I: Single Random Variable (Graphical Approach)

Let $X$ be a continuous random variable with density $f(x)$.
Let $Z = g(X)$, where $g(x)$ is strictly monotonic and differentiable, hence invertible with inverse $x^* = g^{-1}(z)$.
Transformation: probability is preserved over corresponding small intervals, $h(z) |dz| = f(x) |dx|$, with $dz = \frac{dg(x)}{dx} dx$.
Thus, the density $h(z)$ is: $h(z) = f(x) \left|\frac{dx}{dz}\right| = f(g^{-1}(z)) \left|\frac{dg^{-1}(z)}{dz}\right|$
where $\left|\frac{dg^{-1}(z)}{dz}\right|$ is the Jacobian $J$.
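
A minimal numerical sketch of this formula follows. It assumes $X$ is exponential with rate 1 and $Z = e^{-X}$, a decreasing transformation chosen to show why the absolute value matters. The function names are hypothetical, and the Jacobian is approximated by a central difference.

```python
import math

def transformed_density(f, g_inv, z, eps=1e-6):
    """h(z) = f(g^{-1}(z)) * |d g^{-1}(z) / dz|, Jacobian by central difference."""
    jacobian = (g_inv(z + eps) - g_inv(z - eps)) / (2 * eps)
    return f(g_inv(z)) * abs(jacobian)

def f(x):
    """Assumed density of X: exponential with rate 1."""
    return math.exp(-x) if x > 0 else 0.0

def g_inv(z):
    """Inverse of the assumed transformation z = exp(-x), valid for 0 < z < 1."""
    return -math.log(z)

values = [transformed_density(f, g_inv, z) for z in (0.1, 0.5, 0.9)]
print(values)
```

Under these assumptions the transformed density is constant on $0 < z < 1$, so $Z$ is uniform. Without the absolute value the result would be negative, because $g^{-1}$ is decreasing.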

Case I: Single Random Variable (Algebraic Approach)

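CDF Approach (for increasing $g$): $H(z) = P(Z \le z) = P(g(X) \le z) = P(X \le g^{-1}(z)) = F(g^{-1}(z))$
Differentiating: $h(z) = \frac{dH(z)}{dz} = \frac{dF(g^{-1}(z))}{dz}$
By the chain rule: $h(z) = f(g^{-1}(z)) \frac{dg^{-1}(z)}{dz}$. For decreasing $g$, $H(z) = 1 - F(g^{-1}(z))$ and the derivative changes sign, so in both cases $h(z) = f(g^{-1}(z)) \left|\frac{dg^{-1}(z)}{dz}\right| = f(g^{-1}(z)) \times J$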
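
The CDF route can be checked numerically. The sketch below assumes $X$ is exponential with rate 1 and $Z = \sqrt{X}$ (an increasing transformation), differentiates $H(z) = F(g^{-1}(z))$ by a central difference, and compares the result with the chain-rule formula. All function names are illustrative.

```python
import math

def F(x):
    """Assumed CDF of X: exponential with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    """Density corresponding to F."""
    return math.exp(-x) if x > 0 else 0.0

def g_inv(z):
    """Inverse of the assumed transformation z = sqrt(x), for z > 0."""
    return z * z

def H(z):
    """CDF of Z obtained as F(g^{-1}(z))."""
    return F(g_inv(z))

def h_from_cdf(z, eps=1e-5):
    """Density of Z by numerically differentiating its CDF."""
    return (H(z + eps) - H(z - eps)) / (2 * eps)

def h_from_formula(z):
    """Density of Z from the chain rule; d g^{-1}(z) / dz = 2z."""
    return f(g_inv(z)) * abs(2 * z)

checks = [(z, h_from_cdf(z), h_from_formula(z)) for z in (0.5, 1.0, 2.0)]
for row in checks:
    print(row)
```

The two columns agree up to the finite-difference error.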

Case II: Several Random Variables

Suppose $X_1, X_2, \ldots, X_n$ are continuous random variables with joint distribution $f(x_1, x_2, \ldots, x_n)$.
Define $k$ functions ( $k \le n$ ):
- $z_1 = g_1(x_1, x_2, \ldots, x_n)$
- $z_2 = g_2(x_1, x_2, \ldots, x_n)$
- …
- $z_k = g_k(x_1, x_2, \ldots, x_n)$
If $k = n$, solve for $x_i$ in terms of $z_j$: $x_i^* = g_i^{-1}(z_1, z_2, \ldots, z_k)$ for $i = 1, 2, \ldots, k$

Jacobian and Joint Density

Jacobian Matrix:

$$
J = \begin{bmatrix}
\frac{\partial x_1^*}{\partial z_1} & \frac{\partial x_1^*}{\partial z_2} & \cdots & \frac{\partial x_1^*}{\partial z_k} \\
\frac{\partial x_2^*}{\partial z_1} & \frac{\partial x_2^*}{\partial z_2} & \cdots & \frac{\partial x_2^*}{\partial z_k} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_k^*}{\partial z_1} & \frac{\partial x_k^*}{\partial z_2} & \cdots & \frac{\partial x_k^*}{\partial z_k}
\end{bmatrix}
$$

Joint Density:

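$h(z_1, z_2, \ldots, z_k) = f(x_1^*, x_2^*, \ldots, x_k^*) \times |\det J|$, where $|\det J|$ is the absolute value of the determinant of the Jacobian matrix.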
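
A two-variable sketch of this recipe follows. It assumes $X_1, X_2$ are independent exponentials with rate 1, and uses $Z_1 = X_1 + X_2$, $Z_2 = X_1 / (X_1 + X_2)$, so that $x_1^* = z_1 z_2$ and $x_2^* = z_1 (1 - z_2)$. The partial derivatives are approximated by central differences, and all names are hypothetical.

```python
import math

def f(x1, x2):
    """Assumed joint density of two independent Exp(1) variables."""
    return math.exp(-(x1 + x2)) if x1 > 0 and x2 > 0 else 0.0

def inverse(z1, z2):
    """Solve z1 = x1 + x2, z2 = x1 / (x1 + x2) for (x1, x2)."""
    return (z1 * z2, z1 * (1.0 - z2))

def jacobian_matrix(z1, z2, eps=1e-6):
    """2x2 matrix with entry [i][j] = d x_i / d z_j, by central differences."""
    plus_1, minus_1 = inverse(z1 + eps, z2), inverse(z1 - eps, z2)
    plus_2, minus_2 = inverse(z1, z2 + eps), inverse(z1, z2 - eps)
    return [
        [(plus_1[i] - minus_1[i]) / (2 * eps), (plus_2[i] - minus_2[i]) / (2 * eps)]
        for i in range(2)
    ]

def joint_density(z1, z2):
    """h(z1, z2) = f(x1*, x2*) * |det J|."""
    J = jacobian_matrix(z1, z2)
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return f(*inverse(z1, z2)) * abs(det)

value = joint_density(2.0, 0.3)
print(value, 2.0 * math.exp(-2.0))
```

For this choice the determinant is $-z_1$, so the joint density works out to $z_1 e^{-z_1}$ on $z_1 > 0$, $0 < z_2 < 1$, which the printed pair reflects.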

Multiple Solutions

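If there are multiple solutions $(x_{1i}^*, x_{2i}^*, \ldots, x_{ki}^*)$, $i = 1, 2, \ldots$, the contributions of all solutions are summed:

$h(z_1, z_2, \ldots, z_k) = \sum_i f(x_{1i}^*, x_{2i}^*, \ldots, x_{ki}^*) \times |\det J_i|$, where $J_i$ is the Jacobian matrix evaluated at the $i$-th solution.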
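
A one-variable instance of the multiple-solution case, offered as an illustrative sketch: assume $X$ is standard normal and $Z = X^2$, so each $z > 0$ has the two solutions $x = \pm\sqrt{z}$. The sum over both solutions is compared with the known chi-square density with one degree of freedom.

```python
import math

def phi(x):
    """Standard normal density (the assumed density of X)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def density_of_square(z):
    """Density of Z = X^2, summing over both solutions x = +sqrt(z), -sqrt(z)."""
    if z <= 0:
        return 0.0
    root = math.sqrt(z)
    solutions = [root, -root]
    # dx/dz at each solution; the absolute value is taken in the sum below.
    jacobians = [1.0 / (2.0 * root), -1.0 / (2.0 * root)]
    return sum(phi(x) * abs(j) for x, j in zip(solutions, jacobians))

def chi_square_1(z):
    """Chi-square density with one degree of freedom, for comparison."""
    return math.exp(-0.5 * z) / math.sqrt(2.0 * math.pi * z)

for z in (0.2, 1.0, 3.5):
    print(z, density_of_square(z), chi_square_1(z))
```

Using only one of the two solutions would give exactly half of the correct density.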

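Marginal Distribution: Integrating the joint density over the remaining variables gives the marginal density of $z_1$:

$h_1(z_1) = \int \int \cdots \int h(z_1, z_2, \ldots, z_k) \, dz_2 \, dz_3 \cdots dz_k$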
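
The integration step can be sketched numerically. The joint density below is an assumed example: for independent rate-1 exponentials with $Z_1 = X_1 + X_2$ and $Z_2 = X_2$, the inverse is $x_1 = z_1 - z_2$, $x_2 = z_2$ with $|\det J| = 1$, giving $h(z_1, z_2) = e^{-z_1}$ on $0 < z_2 < z_1$. The midpoint rule stands in for the integral.

```python
import math

def h(z1, z2):
    """Assumed joint density of (Z1, Z2): exp(-z1) on the region 0 < z2 < z1."""
    return math.exp(-z1) if 0.0 < z2 < z1 else 0.0

def marginal_z1(z1, steps=10_000):
    """Integrate h(z1, z2) over z2 on (0, z1) with the midpoint rule."""
    if z1 <= 0:
        return 0.0
    width = z1 / steps
    return sum(h(z1, (i + 0.5) * width) for i in range(steps)) * width

for z1 in (0.5, 1.5, 4.0):
    print(z1, marginal_z1(z1), z1 * math.exp(-z1))
```

The numerical marginal matches $z_1 e^{-z_1}$, the density of the sum of the two exponentials.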

Example Question and Solution

Question: If $f(x) = x^2$ for 0 < x < 2, find the distribution of $z = 2x + 3$ .
- Step 1: Find the range of $z$
 - When $x = 0, z = 2(0) + 3 = 3$
 - When $x = 2, z = 2(2) + 3 = 7$
 - Thus, 3 < z < 7.
- Step 2: Find the relationship between $x$ and $z$
 - From $z = 2x + 3$ , solve for $x$ : $x = \frac{z - 3}{2}$
- Step 3: Find the new PDF $g(z)$
 - We use the formula for transforming variables: $g(z) = f(x) |\frac{dx}{dz}|$ where $x = \frac{z - 3}{2}$ .
 - First, compute $\frac{dx}{dz}$ : $\frac{dx}{dz} = \frac{1}{2}$
 - Now, plug $x = \frac{z - 3}{2}$ into $f(x)$ : $f(\frac{z - 3}{2}) = (\frac{z - 3}{2})^2 = \frac{(z - 3)^2}{4}$
 - Thus, $g(z) = \frac{(z - 3)^2}{4} \times \frac{1}{2} = \frac{(z - 3)^2}{8}$ for 3 < z < 7.
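Final Answer: The distribution of $z$ is: $g(z) = \frac{(z-3)^2}{8}$ , 3 < z < 7
Note: as stated, $f(x) = x^2$ integrates to $\frac{8}{3}$ over 0 < x < 2, so it is not a normalized density, and $g(z)$ inherits the same total. With the normalized density $f(x) = \frac{3x^2}{8}$, the same steps give $g(z) = \frac{3(z-3)^2}{64}$ , 3 < z < 7.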
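
One way to sanity-check the result is to confirm that the mass $f$ assigns to an interval of $x$ equals the mass $g$ assigns to the image of that interval under $z = 2x + 3$. The sketch below does this with a midpoint rule. The helper name and the test interval are arbitrary choices for illustration.

```python
def f(x):
    """Original function from the example, taken as stated."""
    return x ** 2 if 0 < x < 2 else 0.0

def g(z):
    """Transformed function derived in the example."""
    return (z - 3) ** 2 / 8 if 3 < z < 7 else 0.0

def midpoint_integral(func, lower, upper, steps=100_000):
    """Approximate the integral of func over (lower, upper)."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width

# The interval 0.5 < x < 1.0 maps to 4 < z < 5 under z = 2x + 3.
mass_x = midpoint_integral(f, 0.5, 1.0)
mass_z = midpoint_integral(g, 4.0, 5.0)
print(mass_x, mass_z)
```

The two masses agree, as they should for any interval and its image.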

Another Example Problem

Given the probability density function: $f(x) = x^2$ , 0 < x < 2, find the distribution of the random variable: $z = 5x + 2$ .
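
The problem is stated without a solution. Following the same three steps as the previous example, and taking $f$ exactly as given: the range is 2 < z < 12, the inverse is $x = \frac{z - 2}{5}$ with $\frac{dx}{dz} = \frac{1}{5}$, and so $g(z) = \frac{(z - 2)^2}{125}$ for 2 < z < 12. The sketch below uses a hypothetical helper for any linear transformation $z = ax + b$ with $a \ne 0$ and compares it with that closed form.

```python
def f(x):
    """Function from the problem statement, taken as given."""
    return x ** 2 if 0 < x < 2 else 0.0

def linear_transform_density(f, a, b):
    """Return g with g(z) = f((z - b) / a) * |1 / a| for Z = a * X + b."""
    if a == 0:
        raise ValueError("a must be non-zero for the transformation to be invertible")

    def g(z):
        return f((z - b) / a) * abs(1.0 / a)

    return g

g = linear_transform_density(f, a=5, b=2)

def closed_form(z):
    """Hand-derived result for z = 5x + 2."""
    return (z - 2) ** 2 / 125 if 2 < z < 12 else 0.0

for z in (3.0, 7.0, 11.5):
    print(z, g(z), closed_form(z))
```

Calling `linear_transform_density(f, a=2, b=3)` gives the transformed function for the earlier $z = 2x + 3$ example in the same way.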

Let's clarify Jacobian and Joint Density:

Jacobian: In the context of transforming random variables, the Jacobian matrix (often denoted as $J$) is the matrix of partial derivatives of the original variables with respect to the new ones, and "the Jacobian" usually refers to its determinant. It is used when changing variables in multiple integrals: the absolute value of the determinant accounts for how 'volume' changes during the transformation. In simpler terms, it corrects for the stretching or compressing of space that occurs when you switch from one set of variables to another.
Joint Density: The joint density function (or joint distribution) describes how multiple random variables behave together. If you have two random variables, $X$ and $Y$ , their joint density $f(x, y)$ tells you the probability density at each point $(x, y)$ . It's a way of understanding how these variables are related and how they vary in conjunction with each other.

In the equations provided:

The Jacobian Matrix $J$ is a matrix of partial derivatives:

$$
J = \begin{bmatrix}
\frac{\partial x_1^*}{\partial z_1} & \frac{\partial x_1^*}{\partial z_2} & \cdots & \frac{\partial x_1^*}{\partial z_k} \\
\frac{\partial x_2^*}{\partial z_1} & \frac{\partial x_2^*}{\partial z_2} & \cdots & \frac{\partial x_2^*}{\partial z_k} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_k^*}{\partial z_1} & \frac{\partial x_k^*}{\partial z_2} & \cdots & \frac{\partial x_k^*}{\partial z_k}
\end{bmatrix}
$$

And the Joint Density $h(z_1, z_2, \ldots, z_k)$ is calculated as:

$h(z_1, z_2, \ldots, z_k) = f(x_1^*, x_2^*, \ldots, x_k^*) \times |\det J|$

Where $f(x_1^*, x_2^*, \ldots, x_k^*)$ is the original joint distribution in terms of the original variables, and $|\det J|$ is the absolute value of the determinant of the Jacobian matrix.