Introduction to Statistical Inference

  • Statistical inference involves making generalizations about a larger group (population) based on a smaller collected sample.

  • License: CC BY-NC-SA 4.0

Descriptive vs. Inferential Statistics

  • Descriptive Statistics: Summarizing and describing the characteristics of the collected data (the sample).

  • Inferential Statistics: Using the sample data to make broader generalizations about the population.

Population vs. Sample

  • Population: The entire group of individuals or items of interest.

    • Characterized by Parameters (e.g., expected value, variance, median, proportion).

  • Sample: A subset of the population from which data is collected.

    • Characterized by Statistics (e.g., sample mean, sample variance, sample median, sample proportion).

  • Descriptive statistics are used on the sample, while inferential statistics aim to draw conclusions about the population.

Example: Stephen Curry's Basketball Scores

  • Stephen Curry's average score is 30.1 points over 79 games.

  • $X$ = points scored in a single game

  • $X \sim N(\mu, \sigma^2)$

  • $\mu$ and $\sigma$ are unknown parameters representing the population mean and standard deviation.

Parameters vs. Statistics

  • Parameter: A characteristic of the population, typically unknown.

    • Unknown because measuring every individual/outcome is often impossible.

  • Statistic: A value calculated from the sample data.

Examples of Statistics and Parameters

  • Statistics (from sample):

    • $\bar{X}$: Sample mean.

    • $s$: Sample standard deviation.

    • $\hat{p}$: Sample proportion.

  • Parameters (unknown, for population):

    • $\mu$: Population mean.

    • $\sigma$: Population standard deviation.

    • $p$: Population proportion.
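
As a minimal sketch of how these three statistics are computed from a sample, the snippet below uses only Python's standard library. The data values, the function name `sample_statistics`, and the "success" rule (scoring at least 30 points) are all hypothetical, chosen only for illustration.

```python
import statistics

def sample_statistics(values, is_success):
    """Return (sample mean, sample standard deviation, sample proportion)."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # divides by n - 1
    p_hat = sum(1 for v in values if is_success(v)) / len(values)
    return x_bar, s, p_hat

# Hypothetical points scored in six games (illustrative, not real data).
points = [28, 35, 22, 41, 30, 24]
x_bar, s, p_hat = sample_statistics(points, lambda v: v >= 30)
print(x_bar, s, p_hat)
```

The corresponding parameters (the population mean, standard deviation, and proportion) cannot be computed this way, because the full population is not observed.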

Sampling Distribution

  • Data points are random variables.

  • Statistics are functions of the data and therefore also random variables.

  • The distribution of the statistics depends on the parameters of the data's distribution.

  • Example: The sample mean ($\bar{X}$) is a statistic used to estimate the population mean ($\mu$).
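
One way to see that a statistic is itself a random variable is to draw many samples and compute the statistic on each. The sketch below assumes normally distributed data with made-up parameter values (mean 30, standard deviation 8); the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2); return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# mu = 30 and sigma = 8 are made-up values chosen only to run the simulation.
means = simulate_sample_means(mu=30.0, sigma=8.0, n=25, repetitions=2000)
print(min(means), max(means))     # the statistic changes from sample to sample
print(statistics.fmean(means))    # centre of the simulated sampling distribution
print(statistics.stdev(means))    # spread of the simulated sampling distribution
```

The collection of simulated means approximates the sampling distribution of the sample mean; its centre and spread depend on the parameters used to generate the data.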

Sample Mean as a Random Variable

  • Given data: $X_1, X_2, X_3, \ldots, X_n$ from a sample.

  • Assumptions:

    • Random Sample (EAS):

    • Observations are Independent and Identically Distributed (iid).

    • Example: Each $X_i \sim N(\mu, \sigma^2)$, for $i = 1, \ldots, n$.

  • Challenge: $\mu$ and $\sigma$ are typically unknown.

Expected Value of the Sample Mean

  • $E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu$

  • Explanation:

    • $E(\bar{X}) = E\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \sum_{i=1}^{n} \mu = \frac{1}{n} \cdot n \cdot \mu = \mu$
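
The result can be checked exactly on a small discrete case by listing every possible sample and averaging the sample means. This is only a sketch: the four-value distribution and the function name `exact_mean_of_sample_means` are hypothetical, and each ordered sample is assumed equally likely (independent draws from equally likely values).

```python
from fractions import Fraction
from itertools import product

def exact_mean_of_sample_means(values, n):
    """Average the sample mean over every equally likely ordered sample of size n."""
    samples = list(product(values, repeat=n))
    total = sum(Fraction(sum(sample), n) for sample in samples)
    return total / len(samples)

# Hypothetical distribution: X is 0, 1, 2 or 5 with equal probability.
values = [0, 1, 2, 5]
mu = Fraction(sum(values), len(values))
print(mu, exact_mean_of_sample_means(values, n=3))
```

Exact fractions are used instead of floats so that the equality between the two printed values is not blurred by rounding.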

Standard Deviation of the Sample Mean

  • $\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}$

  • The error in $\bar{X}$ decreases as $n$ increases.

  • Depends on $\sigma$, the standard deviation of the data, which is typically unknown.
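
The formula follows from the iid assumption stated earlier. A short derivation, relying on the fact that the variance of a sum of independent variables is the sum of their variances:

```latex
\begin{aligned}
\mathrm{Var}(\bar{X})
  &= \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
   = \frac{1}{n^2}\cdot n\,\sigma^2
   = \frac{\sigma^2}{n},\\
\sigma_{\bar{X}}
  &= \sqrt{\mathrm{Var}(\bar{X})}
   = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
```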

Standard Error of the Sample Mean

  • $\mathrm{e.s.}(\bar{X}) = \frac{s}{\sqrt{n}}$

  • Difference between $\sigma_{\bar{X}}$ and $\mathrm{e.s.}(\bar{X})$:

    • $\sigma$ is generally unknown.

    • We replace $\sigma$ with the statistic $s$.

Example: 5 Normal, Independent Observations

  • Data set: 63, 65, 72, 74, 74

  • Sample mean: $\bar{X} = \frac{63 + 65 + 72 + 74 + 74}{5} = 69.6$

  • Sample standard deviation: $s = \sqrt{\frac{1}{4}\left(\sum_{i=1}^{5} x_i^2 - 5(69.6)^2\right)} = 5.225$

  • Standard error: $\mathrm{e.s.}(\bar{X}) = \frac{s}{\sqrt{5}} = \frac{5.225}{\sqrt{5}} = 2.337$
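
The arithmetic in this example can be reproduced with a few lines of Python. The sketch computes the standard deviation twice, once with the shortcut formula (sum of squares minus n times the squared mean, divided by n - 1) and once with the standard library's `statistics.stdev`, to show that both give the same value.

```python
import math
import statistics

data = [63, 65, 72, 74, 74]
n = len(data)

x_bar = statistics.mean(data)
# Shortcut formula: s^2 = (sum of x_i^2 - n * x_bar^2) / (n - 1)
s_shortcut = math.sqrt((sum(x * x for x in data) - n * x_bar**2) / (n - 1))
s = statistics.stdev(data)              # definition-based, divides by n - 1
standard_error = s / math.sqrt(n)

print(round(x_bar, 1), round(s, 3), round(standard_error, 3))  # 69.6 5.225 2.337
```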

Distribution of the Sample Mean

  • Rule: If $X_1, X_2, \ldots, X_n$ are independent and normally distributed with mean $\mu$ and standard deviation $\sigma$ (i.e., each $X_i \sim N(\mu, \sigma^2)$), then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
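
Once the distribution of the sample mean is known, probabilities about it can be computed directly. The sketch below uses `statistics.NormalDist` from the standard library; the parameter values and the function name `sample_mean_distribution` are assumptions made for illustration, since in practice the population mean and standard deviation are unknown.

```python
import math
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Distribution of the sample mean of n iid N(mu, sigma^2) observations."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Made-up parameter values, for illustration only.
mu, sigma, n = 30.0, 8.0, 16
x_bar_dist = sample_mean_distribution(mu, sigma, n)

# Probability that the sample mean lands within 2 points of mu.
prob = x_bar_dist.cdf(mu + 2) - x_bar_dist.cdf(mu - 2)
print(x_bar_dist.stdev, prob)
```

With these values the standard deviation of the sample mean is 8 / 4 = 2, so the interval covers one standard deviation on each side of the mean.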

Standardization

  • For a variable $X$: $Z = \frac{X - \mu}{\sigma}$

  • For the sample mean $\bar{X}$ of $n$ variables $X_1, \ldots, X_n$: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma}$
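
Both standardizations can be written as small functions. In this sketch the sample mean 69.6 and sample size 5 come from the five-observation example, while the population mean 70 and standard deviation 5 are assumed values, since the notes do not give them. The function names are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Standardize a single observation."""
    return (x - mu) / sigma

def z_score_of_mean(x_bar, mu, sigma, n):
    """Standardize a sample mean using its standard deviation sigma / sqrt(n)."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

mu, sigma = 70.0, 5.0     # assumed, not given in the notes
x_bar, n = 69.6, 5        # from the five-observation example

z = z_score_of_mean(x_bar, mu, sigma, n)
z_alternative = math.sqrt(n) * (x_bar - mu) / sigma   # second form of the same formula
print(z, z_alternative)
```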

Population and Sample (Recap)

  • Population: Parameters (Expected Value, Variance, Median, Proportion).

  • Sample: Statistics (Sample Mean, Sample Variance, Sample Median, Sample Proportion).

  • Descriptive statistics describe the sample, while inferential statistics infer about the population.

Example: Average Number of Cars in US Households

  • Consider the number of cars in each household in the United States.

  • Population: All US households.

  • Population size: $N = 324{,}227{,}000$

  • Data set: $x_1, \ldots, x_N$, where $x_1$ is the number of cars in the 1st household, etc.

  • Population mean: $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$

  • Population standard deviation: $\sigma = \sqrt{\frac{1}{N - 1} \sum_{\text{for all } x} (x - \mu)^2}$
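
If the whole population could be listed, both parameters would be plain computations. The sketch below applies the two formulas above to a hypothetical toy population of eight households; the data and function names are invented for illustration.

```python
import math

def population_mean(values):
    """Sum of all values divided by the population size N."""
    return sum(values) / len(values)

def population_sd(values):
    """Standard deviation with the N - 1 divisor used in the formula above."""
    mu = population_mean(values)
    big_n = len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / (big_n - 1))

# Hypothetical toy population: cars owned by each of 8 households.
cars = [0, 1, 1, 2, 2, 2, 3, 1]
print(population_mean(cars), population_sd(cars))
```

The divisor here follows the notes. Python's `statistics.stdev` also divides by one less than the number of values, whereas `statistics.pstdev` divides by the number of values itself; for a population as large as the one in this example the two differ negligibly.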

Sample (Cars in US Households Example)

  • Sample data set: $X_1, \ldots, X_n$

  • Sample size: $n$

  • Relative size: $n \ll N$ ($n$ is much smaller than $N$).

  • Objective: Infer conclusions about population parameters from sample values.

  • Ideal sample: representative and non-biased, chosen randomly.

Simple Random Sample

  • $X_1, \ldots, X_n$ is a simple random sample if:

    • Choosing one member doesn't affect the chances of choosing another.

    • Each member has the same probability of being chosen.

  • In other words:

    • $X_1, \ldots, X_n$ are independent.

    • $X_1, \ldots, X_n$ are identically distributed (same probability mass or density function).
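
The two conditions can be mimicked in code by drawing with replacement, so that every draw has the same distribution and does not depend on the others. This is a sketch under that assumption; the household labels and the function name `draw_simple_random_sample` are hypothetical.

```python
import random
from collections import Counter

def draw_simple_random_sample(population, n, seed=None):
    """Draw n members independently, each with equal probability (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

households = ["A", "B", "C", "D"]   # hypothetical labels
draws = draw_simple_random_sample(households, n=40_000, seed=1)
counts = Counter(draws)
for label in households:
    print(label, counts[label] / len(draws))   # each share should be close to 0.25
```

Surveys usually draw without replacement (in Python, `random.sample`), which makes the draws slightly dependent; when the sample is much smaller than the population the difference is small.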

Estimating $\mu$

  • An estimator of a parameter is a statistic whose value in the sample is used to estimate that parameter.

  • Estimator for $\mu$: Sample mean, $\bar{X}$.

  • Estimator for $\sigma$: Sample standard deviation, $s$.

  • Examples:

    • $X_1, \ldots, X_n$ is a sample of $n$ US households.

    • Estimator for $\mu$ will be $\bar{X} = \frac{x_1 + \cdots + x_n}{n}$.

    • Estimator for $\sigma$ will be $s = \sqrt{\frac{1}{n-1} \sum_{\text{for all } x} (x - \bar{X})^2}$.
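
To illustrate estimation end to end, the sketch below builds a hypothetical population of households, draws a random sample from it, and compares the estimates with the population mean they target. The population, its weights, the sample size, and the function name `estimate_mean_and_sd` are all invented for illustration; they are not data about US households.

```python
import random
import statistics

def estimate_mean_and_sd(sample):
    """Return the sample mean and sample standard deviation (n - 1 divisor)."""
    return statistics.fmean(sample), statistics.stdev(sample)

rng = random.Random(42)
# Hypothetical population: 100,000 households with 0 to 4 cars (made-up weights).
population = rng.choices([0, 1, 2, 3, 4], weights=[10, 35, 35, 15, 5], k=100_000)
# Random sample of 500 households, drawn with replacement.
sample = rng.choices(population, k=500)

mu = statistics.fmean(population)        # parameter: known here only because we built the population
x_bar, s = estimate_mean_and_sd(sample)  # estimates computed from the sample alone
print(mu, x_bar, s)
```

In a real study only the last two numbers would be available; the first is shown here to make the gap between estimate and parameter visible.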

Properties of $\bar{X}$

  • Question: Is $\bar{X}$ a