Sampling Distribution Notes

Sampling Distribution

  • A sampling distribution is the probability distribution of a sample statistic, or of any function $g(x_1, x_2, \ldots, x_n)$, where $x_1, x_2, \ldots, x_n$ constitute a sample.

    • Suppose a sample of size $n$ is drawn from a finite population of size $N$.

    • The total number of possible samples (drawn without replacement, order ignored) is: ${}^{N}C_{n} = \frac{N!}{(N - n)!\,n!} = K$ (say)

    • For each sample, we can compute some statistic $t = f(x_1, x_2, \ldots, x_n)$, such as the sample mean $\bar{x}$ or the sample variance $s^2$.
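The enumeration above can be sketched for a small case. The population values and the sample size here are hypothetical, chosen only for illustration, and the sample variance is assumed to use the divisor n - 1 (the notes do not say which divisor is meant).

```python
from itertools import combinations
from math import comb
from statistics import mean, variance

# Hypothetical finite population of size N = 5 (illustrative values only).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every unordered sample of size n drawn without replacement.
samples = list(combinations(population, n))
K = comb(len(population), n)  # N! / ((N - n)! n!)

# One value of each statistic per sample.
# Assumption: the sample variance uses the divisor n - 1.
sample_means = [mean(s) for s in samples]
sample_variances = [variance(s) for s in samples]

print("number of samples:", K)
for s, xbar, s2 in zip(samples, sample_means, sample_variances):
    print(s, xbar, s2)
```

Each printed row corresponds to one possible sample together with its mean and variance.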

Example Table

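| Sample Number | Statistic ($t$) | Statistic ($\bar{x}$) | Statistic ($s^2$) |
| --- | --- | --- | --- |
| 1 | $T_1$ | $\bar{X}_1$ | $S^2_1$ |
| 2 | $T_2$ | $\bar{X}_2$ | $S^2_2$ |
| 3 | $T_3$ | $\bar{X}_3$ | $S^2_3$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ |
| $K$ | $T_K$ | $\bar{X}_K$ | $S^2_K$ |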

  • If these $K$ values of the statistic (one per possible sample) are arranged in a frequency table, we obtain the sampling distribution of the statistic.

  • The mean and variance of the sampling distribution are denoted by $\bar{t}$ and $S^2$, respectively.

  • The distribution of sample means, sample variances, or any function of sample statistics is known as a sampling distribution.
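As a minimal sketch of how the frequency table is built, the following enumerates every sample from a hypothetical four-value population and tabulates the sample mean. The population, the sample size, and the variable names are assumptions for illustration; exact fractions are used so the results are not blurred by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4]  # hypothetical population, N = 4
n = 2

# Sample mean of each possible sample, kept exact with Fraction.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
K = len(means)

# Frequency table: each distinct value of the statistic with its relative frequency.
sampling_distribution = {t: Fraction(c, K) for t, c in sorted(Counter(means).items())}

# Mean and variance of the sampling distribution.
t_bar = sum(t * p for t, p in sampling_distribution.items())
S2 = sum((t - t_bar) ** 2 * p for t, p in sampling_distribution.items())

for t, p in sampling_distribution.items():
    print(f"sample mean {t}: relative frequency {p}")
print("t_bar =", t_bar, " S2 =", S2)
```

In this small case the mean of the sampling distribution equals the population mean (5/2), which is the sense in which the sample mean is an unbiased estimator.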

Purpose of Sampling Distribution

  • A sample is studied not for its own sake but to infer the characteristics of the population.

  • Finding exact population parameters is often costly, difficult, or impossible when $N$ is large.

  • Sampling is a practical tool to estimate population parameters easily and efficiently.

Rational Distribution and Sampling Distribution

  • Rational Distribution: When the standard deviation of a sampling distribution (the standard error of the statistic) is very small, it is referred to as a rational distribution.

  • Why a Sampling Distribution is called Rational: When the variability is small, the value of the statistic changes little from sample to sample, so the statistic becomes a reliable (rational) estimator of the population parameter.
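A rough illustration of when the variability becomes small, assuming a hypothetical six-value population: the standard deviation of the sample mean, computed over every possible sample, shrinks as the sample size grows. The function name `standard_error` is introduced here for illustration only.

```python
from itertools import combinations
from statistics import pstdev

population = [1, 2, 3, 4, 5, 6]  # hypothetical population

def standard_error(n):
    """Standard deviation of the sample mean over all samples of size n."""
    sample_means = [sum(s) / n for s in combinations(population, n)]
    return pstdev(sample_means)

for n in (1, 2, 3, 5):
    print(n, round(standard_error(n), 4))
```

The printed values decrease with n, so larger samples give a sampling distribution that is more tightly concentrated around the population mean.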

Importance of Sampling Distribution in Statistics

  • To infer population parameters through point estimation.

  • To develop confidence intervals.

  • To perform hypothesis testing.

  • Sampling distribution helps determine critical values needed for statistical tests.

Uses of Sampling Distribution

  • Helps estimate the characteristics of the universe (population) by examining a small part of it.

  • Facilitates inference about population parameters using statistics computed from samples.

  • Enables construction of confidence intervals and hypothesis testing.

How to Obtain the Distribution of Random Variables

  • To find the distribution of a function of a random variable, we consider two cases:

    • Case I: Single random variable

    • Case II: Several random variables

Case I: Single Random Variable (Graphical Approach)

  • Let $X$ be a continuous random variable with density $f(x)$.

  • Let $Z = g(X)$, where $g(x)$ is strictly monotonic and differentiable, and therefore invertible with inverse $x^* = g^{-1}(z)$.

  • Transformation: a small interval $dx$ around $x$ maps to an interval of length $dz = \left|\frac{dg(x)}{dx}\right| dx$ around $z = g(x)$, and both intervals carry the same probability: $h(z)\,dz = f(x)\,dx$.

  • Thus, the density $h(z)$ is: $h(z) = f(x) \left|\frac{dx}{dz}\right| = f(g^{-1}(z)) \left|\frac{dg^{-1}(z)}{dz}\right|$

  • where $\left|\frac{dg^{-1}(z)}{dz}\right|$ is the Jacobian $J$.
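The formula can be sketched as a small helper. The helper name `transformed_density` and the example (X uniform on (0, 1) with Z = -ln X) are assumptions chosen for illustration; they are not part of the notes.

```python
import math

def transformed_density(f, g_inv, dg_inv_dz):
    """Return h(z) = f(g^{-1}(z)) * |d g^{-1}(z) / dz| for a strictly monotonic g."""
    def h(z):
        return f(g_inv(z)) * abs(dg_inv_dz(z))
    return h

# Illustrative assumption: X ~ Uniform(0, 1), so f(x) = 1 on (0, 1),
# and Z = g(X) = -ln(X), which is strictly decreasing.
f = lambda x: 1.0 if 0.0 < x < 1.0 else 0.0
g_inv = lambda z: math.exp(-z)        # x* = g^{-1}(z)
dg_inv_dz = lambda z: -math.exp(-z)   # derivative of the inverse

h = transformed_density(f, g_inv, dg_inv_dz)
print(h(0.5), math.exp(-0.5))  # both values agree: Z has density exp(-z) for z > 0
```

The absolute value matters here: the inverse has a negative derivative because g is decreasing, yet the resulting density is positive.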

Case I: Single Random Variable (Algebraic Approach)

  • CDF Approach (for increasing $g$): $H(z) = P(g(X) \le z) = P(X \le g^{-1}(z)) = F(g^{-1}(z))$. For decreasing $g$ the inequality reverses, so $H(z) = P(X \ge g^{-1}(z)) = 1 - F(g^{-1}(z))$.

  • Differentiating (increasing case): $h(z) = \frac{dH(z)}{dz} = \frac{dF(g^{-1}(z))}{dz}$

  • By the chain rule: $h(z) = f(g^{-1}(z)) \left|\frac{dg^{-1}(z)}{dz}\right| = f(g^{-1}(z)) \times J$, where the absolute value covers both the increasing and the decreasing case.
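A numerical check of the CDF approach, assuming for illustration that X is Exponential(1) and Z = sqrt(X) (an increasing transformation on x > 0): the finite-difference derivative of H(z) = F(g^{-1}(z)) should match the chain-rule formula.

```python
import math

# Illustrative assumption: X ~ Exponential(1) and Z = g(X) = sqrt(X).
F = lambda x: 1.0 - math.exp(-x)   # CDF of X
f = lambda x: math.exp(-x)         # density of X
g_inv = lambda z: z * z            # x* = g^{-1}(z)
dg_inv_dz = lambda z: 2.0 * z      # derivative of the inverse

def H(z):
    """CDF of Z: H(z) = F(g^{-1}(z))."""
    return F(g_inv(z))

def h_numeric(z, eps=1e-6):
    """Central-difference approximation of dH/dz."""
    return (H(z + eps) - H(z - eps)) / (2.0 * eps)

def h_formula(z):
    """Chain-rule result: f(g^{-1}(z)) * |d g^{-1}(z) / dz|."""
    return f(g_inv(z)) * abs(dg_inv_dz(z))

for z in (0.5, 1.0, 1.5):
    print(z, h_numeric(z), h_formula(z))
```

The two columns agree to within the accuracy of the finite difference.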

Case II: Several Random Variables

  • Suppose $X_1, X_2, \ldots, X_n$ are continuous random variables with joint distribution $f(x_1, x_2, \ldots, x_n)$.

  • Define $k$ functions ($k \le n$):

    • $z_1 = g_1(x_1, x_2, \ldots, x_n)$

    • $z_2 = g_2(x_1, x_2, \ldots, x_n)$

    • $\vdots$

    • $z_k = g_k(x_1, x_2, \ldots, x_n)$

  • If $k = n$, solve for $x_i$ in terms of $z_j$: $x_i^* = g_i^{-1}(z_1, z_2, \ldots, z_k)$ for $i = 1, 2, \ldots, k$

Jacobian and Joint Density

  • Jacobian Matrix:

$$
J = \begin{bmatrix}
\frac{\partial x_1^*}{\partial z_1} & \frac{\partial x_1^*}{\partial z_2} & \cdots & \frac{\partial x_1^*}{\partial z_k} \\
\frac{\partial x_2^*}{\partial z_1} & \frac{\partial x_2^*}{\partial z_2} & \cdots & \frac{\partial x_2^*}{\partial z_k} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_k^*}{\partial z_1} & \frac{\partial x_k^*}{\partial z_2} & \cdots & \frac{\partial x_k^*}{\partial z_k}
\end{bmatrix}
$$

  • Joint Density:

$$h(z_1, z_2, \ldots, z_k) = f(x_1^*, x_2^*, \ldots, x_k^*) \times |\det J|$$
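A two-variable sketch of the joint-density formula. The setup is an assumption for illustration: X1 and X2 are independent Exponential(1) variables, transformed to z1 = x1 + x2 and z2 = x1 - x2. The determinant is written out by hand because the matrix is 2 by 2.

```python
import math

# Illustrative assumption: X1, X2 independent Exponential(1),
# so f(x1, x2) = exp(-(x1 + x2)) for x1, x2 > 0.
def f(x1, x2):
    return math.exp(-(x1 + x2)) if x1 > 0 and x2 > 0 else 0.0

# Transformation with k = n = 2: z1 = x1 + x2, z2 = x1 - x2.
def inverse(z1, z2):
    """Solve for (x1*, x2*) in terms of (z1, z2)."""
    return (z1 + z2) / 2.0, (z1 - z2) / 2.0

# Jacobian matrix of partial derivatives d x_i* / d z_j (constant for a linear map).
J = [[0.5, 0.5],
     [0.5, -0.5]]
abs_det_J = abs(J[0][0] * J[1][1] - J[0][1] * J[1][0])

def h(z1, z2):
    """Joint density of (Z1, Z2): f(x1*, x2*) * |det J|."""
    x1, x2 = inverse(z1, z2)
    return f(x1, x2) * abs_det_J

print(abs_det_J, h(2.0, 1.0), 0.5 * math.exp(-2.0))
```

The density is zero wherever the inverse lands outside the support of f, for example at (z1, z2) = (1, 2), where x2 would be negative.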

Multiple Solutions

  • If the equations have multiple solutions $(x_{1i}^*, x_{2i}^*, \ldots, x_{ki}^*)$, $i = 1, 2, \ldots$, sum the contributions of all solutions, evaluating the Jacobian $J_i$ at each one:

$$h(z_1, z_2, \ldots, z_k) = \sum_i f(x_{1i}^*, x_{2i}^*, \ldots, x_{ki}^*) \times |\det J_i|$$
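A one-variable sketch of the multiple-solutions case, assuming for illustration that X is standard normal and Z = X^2. Each z > 0 has two solutions, x = +sqrt(z) and x = -sqrt(z), and the sum over both reproduces the chi-square density with 1 degree of freedom.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def h(z):
    """Density of Z = X^2 for X ~ N(0, 1), summed over both solutions of x^2 = z."""
    if z <= 0:
        return 0.0
    root = math.sqrt(z)
    solutions = (root, -root)
    jacobian = 1.0 / (2.0 * root)  # |dx/dz| happens to be equal at both solutions here
    return sum(phi(x) * jacobian for x in solutions)

def chi2_1(z):
    """Chi-square density with 1 degree of freedom, for comparison."""
    return math.exp(-0.5 * z) / math.sqrt(2.0 * math.pi * z)

for z in (0.5, 1.0, 2.0):
    print(z, h(z), chi2_1(z))
```

Using only one of the two solutions would give half the correct density, which is why every solution must be included in the sum.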

  • Marginal Distribution: Integrating the joint density over the remaining variables gives the marginal density of $z_1$:

$$h_1(z_1) = \int \int \cdots \int h(z_1, z_2, \ldots, z_k) \, dz_2 \, dz_3 \cdots dz_k$$
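A numerical sketch of the marginalization step, under an illustrative assumption: X1 and X2 are independent Exponential(1) variables, Z1 = X1 + X2 and Z2 = X1 - X2, so the joint density is 0.5 * exp(-z1) on |z2| < z1. Integrating out z2 with a simple midpoint rule should reproduce the known Gamma(2, 1) density z1 * exp(-z1) of the sum.

```python
import math

def h(z1, z2):
    """Joint density of Z1 = X1 + X2, Z2 = X1 - X2 for independent Exponential(1) X1, X2."""
    x1, x2 = (z1 + z2) / 2.0, (z1 - z2) / 2.0
    return 0.5 * math.exp(-(x1 + x2)) if x1 > 0 and x2 > 0 else 0.0

def marginal_z1(z1, steps=2000):
    """Integrate h(z1, z2) over z2 by the midpoint rule; h vanishes outside |z2| < z1."""
    if z1 <= 0:
        return 0.0
    lo, hi = -z1, z1
    width = (hi - lo) / steps
    return sum(h(z1, lo + (i + 0.5) * width) for i in range(steps)) * width

for z1 in (0.5, 1.0, 3.0):
    print(z1, marginal_z1(z1), z1 * math.exp(-z1))
```

The limits of integration come from the support of the joint density, which is often the step that needs the most care in practice.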

Example Question and Solution

  • Question: If $f(x) = x^2$ for $0 < x < 2$, find the distribution of $z = 2x + 3$.

    • Step 1: Find the range of $z$

      • When $x = 0$, $z = 2(0) + 3 = 3$

      • When $x = 2$, $z = 2(2) + 3 = 7$

      • Since $z$ is increasing in $x$, $3 < z < 7$.

    • Step 2: Find the relationship between $x$ and $z$

      • From $z = 2x + 3$, solve for $x$: $x = \frac{z - 3}{2}$

    • Step 3: Find the new PDF $g(z)$

      • We use the formula for transforming variables: $g(z) = f(x) \left|\frac{dx}{dz}\right|$ where $x = \frac{z - 3}{2}$.

      • First, compute $\frac{dx}{dz}$: $\frac{dx}{dz} = \frac{1}{2}$

      • Now, plug $x = \frac{z - 3}{2}$ into $f(x)$: $f\left(\frac{z - 3}{2}\right) = \left(\frac{z - 3}{2}\right)^2 = \frac{(z - 3)^2}{4}$

      • Thus, $g(z) = \frac{(z - 3)^2}{4} \times \frac{1}{2} = \frac{(z - 3)^2}{8}$ for $3 < z < 7$.

  • Final Answer: The distribution of $z$ is: $g(z) = \frac{(z-3)^2}{8}$, $3 < z < 7$

  • Note: As given, $f(x) = x^2$ integrates to $\frac{8}{3}$ over $0 < x < 2$ rather than 1, so it is not a normalized density, and $g(z)$ inherits the same total. With the normalized density $f(x) = \frac{3x^2}{8}$, the same steps give $g(z) = \frac{3(z-3)^2}{64}$ for $3 < z < 7$.
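The result can be checked numerically. This sketch uses a simple midpoint rule (the helper `integrate` and the interval endpoints are choices made here for illustration): the mass that f places on a < x < b should equal the mass that g places on 2a + 3 < z < 2b + 3.

```python
def f(x):
    """f(x) = x^2 on 0 < x < 2, as given in the question."""
    return x * x if 0.0 < x < 2.0 else 0.0

def g(z):
    """Derived result: g(z) = (z - 3)^2 / 8 on 3 < z < 7."""
    return (z - 3.0) ** 2 / 8.0 if 3.0 < z < 7.0 else 0.0

def integrate(func, a, b, steps=10000):
    """Midpoint-rule integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

# Mass on a < x < b versus mass on the image interval 2a + 3 < z < 2b + 3.
a, b = 0.5, 1.5
mass_x = integrate(f, a, b)
mass_z = integrate(g, 2 * a + 3, 2 * b + 3)
print(mass_x, mass_z)

# Total area under f as given; a properly normalized density would give 1.
total_f = integrate(f, 0.0, 2.0)
print(total_f)
```

The two masses agree, which confirms the transformation step. The total area comes out near 8/3, so f as stated would need the factor 3/8 to be a proper density.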

Another Example Problem

  • Given the probability density function $f(x) = x^2$, $0 < x < 2$, find the distribution of the random variable $z = 5x + 2$.
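The notes pose this problem without a solution. One way to work it, following the same steps as the single-variable transformation formula $g(z) = f(x)\left|\frac{dx}{dz}\right|$ and taking $f$ exactly as given, might look like this:

```latex
% Step 1: range of z (z is increasing in x)
x = 0 \Rightarrow z = 2, \qquad x = 2 \Rightarrow z = 12, \qquad \text{so } 2 < z < 12

% Step 2: invert the transformation
x = \frac{z - 2}{5}, \qquad \left|\frac{dx}{dz}\right| = \frac{1}{5}

% Step 3: substitute into g(z) = f(x) |dx/dz|
g(z) = \left(\frac{z - 2}{5}\right)^2 \times \frac{1}{5} = \frac{(z - 2)^2}{125}, \qquad 2 < z < 12
```

As a consistency check, $g$ integrates to $\frac{8}{3}$ over $2 < z < 12$, the same total that $f(x) = x^2$ has over $0 < x < 2$.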

Let's clarify Jacobian and Joint Density:

  • Jacobian: In the context of transforming random variables, the Jacobian (often denoted as $J$) is the determinant of a matrix of partial derivatives. It's used when you change variables in multiple integrals. Essentially, it accounts for how the 'volume' changes during the transformation. In simpler terms, it helps to correct for the stretching or compressing of the space that occurs when you switch from one set of variables to another.

  • Joint Density: The joint density function (or joint distribution) describes how multiple random variables behave together. If you have two random variables, $X$ and $Y$, their joint density $f(x, y)$ tells you the probability density at each point $(x, y)$. It's a way of understanding how these variables are related and how they vary in conjunction with each other.

In the equations provided:

  • The Jacobian Matrix $J$ is a matrix of partial derivatives:

$$
J = \begin{bmatrix}
\frac{\partial x_1^*}{\partial z_1} & \frac{\partial x_1^*}{\partial z_2} & \cdots & \frac{\partial x_1^*}{\partial z_k} \\
\frac{\partial x_2^*}{\partial z_1} & \frac{\partial x_2^*}{\partial z_2} & \cdots & \frac{\partial x_2^*}{\partial z_k} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_k^*}{\partial z_1} & \frac{\partial x_k^*}{\partial z_2} & \cdots & \frac{\partial x_k^*}{\partial z_k}
\end{bmatrix}
$$

  • And the Joint Density $h(z_1, z_2, \ldots, z_k)$ is calculated as:

$$h(z_1, z_2, \ldots, z_k) = f(x_1^*, x_2^*, \ldots, x_k^*) \times |\det J|$$

Where $f(x_1^*, x_2^*, \ldots, x_k^*)$ is the original joint distribution evaluated at the original variables written in terms of $z_1, z_2, \ldots, z_k$, and $|\det J|$ is the absolute value of the determinant of the Jacobian matrix.