Introduction to Statistical Inference

Statistical inference involves making generalizations about a larger group (population) based on a smaller collected sample.
License: CC BY NC SA 4.0

Descriptive vs. Inferential Statistics

Descriptive Statistics: Summarizing and describing the characteristics of the collected data (the sample).
Inferential Statistics: Using the sample data to make broader generalizations about the population.

Population vs. Sample

Population: The entire group of individuals or items of interest.
- Characterized by Parameters (e.g., expected value, variance, median, proportion).
Sample: A subset of the population from which data is collected.
- Characterized by Statistics (e.g., sample mean, sample variance, sample median, sample proportion).
Descriptive statistics are used on the sample, while inferential statistics aim to draw conclusions about the population.

Example: Stephen Curry's Basketball Scores

Stephen Curry's average score is 30.1 points over 79 games.
$X$ = points scored in a single game
$X \sim N(\mu, \sigma^2)$
$\mu$ and $\sigma$ are unknown parameters representing the population mean and standard deviation.

Parameters vs. Statistics

Parameter: A characteristic of the population, typically unknown.
- Unknown because measuring every individual/outcome is often impossible.
Statistic: A value calculated from the sample data.

Examples of Statistics and Parameters

Statistics (from sample):
- $\bar{X}$ : Sample mean.
- $s$ : Sample standard deviation.
- $\hat{p}$ : Sample proportion.
Parameters (unknown, for population):
- $\mu$ : Population mean.
- $\sigma$ : Population standard deviation.
- $p$ : Population proportion.
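
As a small illustration of the statistics listed above, the sketch below computes $\bar{X}$, $s$ and $\hat{p}$ for a sample. The point values are made up for illustration, and the threshold of 30 points used for the proportion is an arbitrary assumption, not something taken from these notes.

```python
import statistics

# Hypothetical sample: points scored in 8 games (made-up values).
points = [28, 35, 22, 41, 30, 27, 33, 24]

# Sample mean.
x_bar = statistics.mean(points)

# Sample standard deviation (uses the n - 1 denominator).
s = statistics.stdev(points)

# Sample proportion of games with at least 30 points (arbitrary threshold).
p_hat = sum(1 for x in points if x >= 30) / len(points)

print(f"x_bar = {x_bar}, s = {s:.3f}, p_hat = {p_hat}")
```

The corresponding parameters $\mu$, $\sigma$ and $p$ cannot be computed this way, because they refer to all games rather than to the 8 observed ones.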

Sampling Distribution

Data points are random variables.
Statistics are functions of the data and therefore also random variables.
The distribution of a statistic, called its sampling distribution, depends on the parameters of the data's distribution.
Example: The sample mean ( $\bar{X}$ ) is a statistic used to estimate the population mean ( $\mu$ ).
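
To make the idea of a statistic as a random variable concrete, the following sketch draws several samples from the same distribution and prints the sample mean of each. The values of `MU`, `SIGMA` and `N_GAMES` are assumptions chosen only for illustration. In practice $\mu$ and $\sigma$ would be unknown.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical parameter values, chosen only for illustration.
MU, SIGMA = 30.0, 8.0
N_GAMES = 10


def draw_sample_mean(n):
    """Draw n iid N(MU, SIGMA^2) observations and return their mean."""
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.fmean(sample)


# Five independent samples give five different values of the same statistic.
sample_means = [draw_sample_mean(N_GAMES) for _ in range(5)]
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Each run of `draw_sample_mean` is one realization of $\bar{X}$. The spread among the printed values is what the sampling distribution describes.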

Sample Mean as a Random Variable

Given data: $X_1, X_2, X_3, \ldots, X_n$ from a sample.
Assumptions:
- Random Sample (EAS):
- Observations are Independent and Identically Distributed (iid).
- Example: Each $X_i \sim N(\mu, \sigma^2)$ , for $i = 1, …, n$ .
Challenge: $\mu$ and $\sigma$ are typically unknown.

Expected Value of the Sample Mean

$E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu$
Explanation:
- $E(\bar{X}) = E\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \sum_{i=1}^{n} \mu = \frac{1}{n} \cdot n \cdot \mu = \mu$
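
The result $E(\bar{X}) = \mu$ can be checked numerically. The sketch below averages many simulated sample means and compares the result with $\mu$. The values of `MU`, `SIGMA`, `N` and `REPS` are assumptions for illustration only.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameter values, chosen only for illustration.
MU, SIGMA, N = 30.0, 8.0, 10
REPS = 10_000  # number of simulated samples

# One sample mean per simulated sample of size N.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

# The average of the sample means approximates E(X_bar).
average_of_means = statistics.fmean(sample_means)
print(f"average of {REPS} sample means: {average_of_means:.3f} (mu = {MU})")
```

The simulated average is only an approximation of $E(\bar{X})$, so it will be close to, but not exactly equal to, $\mu$.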

Standard Deviation of the Sample Mean

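$\sigma_{\bar{X}} = \sqrt{Var(\bar{X})} = \frac{\sigma}{\sqrt{n}}$
Explanation (using the independence of the $X_i$):
- $Var(\bar{X}) = Var\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) = \frac{1}{n^2} \cdot n \cdot \sigma^2 = \frac{\sigma^2}{n}$
The variability of $\bar{X}$ decreases as $n$ increases: quadrupling $n$ halves $\sigma_{\bar{X}}$ .
It depends on $\sigma$ , the standard deviation of the data, which is typically unknown.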
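
A simulation can illustrate how $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ shrinks with $n$. The sketch below compares the empirical standard deviation of simulated sample means with the formula for a few sample sizes. The values of `MU`, `SIGMA`, `REPS` and the sample sizes are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical parameter values, chosen only for illustration.
MU, SIGMA = 30.0, 8.0
REPS = 4000  # simulated samples per sample size


def sd_of_sample_means(n):
    """Empirical standard deviation of REPS sample means of size n."""
    means = [
        statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(REPS)
    ]
    return statistics.stdev(means)


# Map each sample size to (empirical value, sigma / sqrt(n)).
results = {}
for n in (4, 16, 64):
    results[n] = (sd_of_sample_means(n), SIGMA / math.sqrt(n))
    print(f"n = {n:2d}: empirical = {results[n][0]:.3f}, formula = {results[n][1]:.3f}")
```

Each fourfold increase in $n$ should roughly halve both columns, up to simulation noise.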

Standard Error of the Sample Mean

$e.s.(\bar{X}) = \frac{s}{\sqrt{n}}$
Difference between $\sigma_{\bar{X}}$ and $e.s.(\bar{X})$ :
- $\sigma$ is generally unknown.
- We replace $\sigma$ with the statistic $s$ .

Example: 5 Normal, Independent Observations

Data set: 63, 65, 72, 74, 74
Sample mean: $\bar{X} = \frac{63 + 65 + 72 + 74 + 74}{5} = 69.6$
Sample standard deviation: $s = \sqrt{\frac{1}{4} \left( \sum_{i=1}^{5} x_i^2 - 5(69.6)^2 \right)} = \sqrt{\frac{24330 - 24220.8}{4}} = 5.225$
Standard error: $e.s.(\bar{X}) = \frac{s}{\sqrt{5}} = \frac{5.225}{\sqrt{5}} = 2.337$
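
The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch using the standard `statistics` module. It also recomputes $s$ through the shortcut formula (sum of squares minus $n \bar{X}^2$, divided by $n - 1$) to check that both routes agree.

```python
import math
import statistics

data = [63, 65, 72, 74, 74]
n = len(data)

# Sample mean.
x_bar = statistics.mean(data)

# Sample standard deviation (n - 1 denominator).
s = statistics.stdev(data)

# Same quantity via the shortcut formula: (sum of x_i^2 - n * x_bar^2) / (n - 1).
s_shortcut = math.sqrt((sum(x * x for x in data) - n * x_bar ** 2) / (n - 1))

# Standard error of the sample mean.
se = s / math.sqrt(n)

print(f"x_bar = {x_bar:.1f}, s = {s:.3f}, shortcut s = {s_shortcut:.3f}, se = {se:.3f}")
```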

Distribution of the Sample Mean

Rule: If $X_1, X_2, \ldots, X_n$ are independent and normally distributed with mean $\mu$ and standard deviation $\sigma$ (i.e., each $X_i \sim N(\mu, \sigma^2)$ ), then $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$ .
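
One consequence of this rule is that $\bar{X}$ is much more concentrated around $\mu$ than a single observation. The sketch below uses `statistics.NormalDist` to compare the probability of landing within 4 units of $\mu$ for a single $X_i$ and for $\bar{X}$. The values of `MU`, `SIGMA` and `N` are assumptions for illustration only.

```python
import math
from statistics import NormalDist

# Hypothetical parameter values, chosen only for illustration.
MU, SIGMA, N = 30.0, 8.0, 16

# Distribution of a single observation: N(MU, SIGMA^2).
single_dist = NormalDist(mu=MU, sigma=SIGMA)

# Distribution of the sample mean: N(MU, SIGMA^2 / N), so sd = SIGMA / sqrt(N).
x_bar_dist = NormalDist(mu=MU, sigma=SIGMA / math.sqrt(N))

# Probability of falling between 26 and 34 under each distribution.
p_single = single_dist.cdf(34) - single_dist.cdf(26)
p_mean = x_bar_dist.cdf(34) - x_bar_dist.cdf(26)

print(f"P(26 < X_i   < 34) = {p_single:.4f}")
print(f"P(26 < X_bar < 34) = {p_mean:.4f}")
```

Note that `NormalDist` takes the standard deviation, not the variance, as its second argument.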

Standardization

For a variable $X$ : $Z = \frac{X - \mu}{\sigma}$
For the sample mean $\bar{X}$ of $n$ variables $X_1, \ldots, X_n$ : $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n} (\bar{X} - \mu)}{\sigma}$
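
Standardizing lets any probability about $\bar{X}$ be read from the standard normal distribution. The sketch below computes $Z$ for an observed sample mean and checks that the standard normal gives the same probability as working with the distribution of $\bar{X}$ directly. The helper name `z_for_sample_mean` and all numeric values are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Standardize a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


# Hypothetical values, chosen only for illustration.
mu, sigma, n = 30.0, 8.0, 16
x_bar = 32.0

z = z_for_sample_mean(x_bar, mu, sigma, n)

# P(X_bar <= 32) computed directly from N(mu, sigma^2 / n) ...
p_direct = NormalDist(mu, sigma / math.sqrt(n)).cdf(x_bar)
# ... and from the standard normal after standardizing.
p_standard = NormalDist().cdf(z)

print(f"z = {z}, direct = {p_direct:.4f}, via Z = {p_standard:.4f}")
```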

Population and Sample (Recap)

Population: Parameters (Expected Value, Variance, Median, Proportion).
Sample: Statistics (Sample Mean, Sample Variance, Sample Median, Sample Proportion).
Descriptive statistics describe the sample, while inferential statistics infer about the population.

Example: Average Number of Cars in US Households

Consider the number of cars in each household in the United States.
Population: All US households.
Population size: $N = 324,227,000$
Data set: $x_1, \ldots, x_N$ , where $x_1$ is the number of cars in the 1st household, etc.
Population mean: $\mu = \frac{x_1 + x_2 + \ldots + x_N}{N}$
Population standard deviation: $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$

Sample (Cars in US Households Example)

Sample data set: $\{X_1, \ldots, X_n\}$
Sample size: $n$
Order: $n \ll N$ ( $n$ is much smaller than $N$ ).
Objective: Infer conclusions about population parameters from sample values.
Ideal sample: representative and non-biased, chosen randomly.

Simple Random Sample

$\{X_1, \ldots, X_n\}$ is a simple random sample if:
- Choosing one member doesn't affect the chances of choosing another.
- Each member has the same probability of being chosen.
In other words:
- $X_1, \ldots, X_n$ are independent.
- $X_1, \ldots, X_n$ are identically distributed (same probability mass or density function).
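
The sketch below shows one way such a sample could be drawn in code. The population is simulated: the household car counts and their weights are made up for illustration and are not real US data. `random.sample` draws without replacement, so the draws are only approximately independent, which is a reasonable approximation when $n$ is much smaller than $N$.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: number of cars in each of 10,000 households.
# The values 0..4 and their weights are made up for illustration.
population = random.choices([0, 1, 2, 3, 4], weights=[9, 33, 37, 15, 6], k=10_000)

n = 50  # sample size, much smaller than the population size

# Every household is equally likely to be chosen.
simple_random_sample = random.sample(population, n)

print(simple_random_sample[:10])
```

Drawing with replacement instead (for example with `random.choices(population, k=n)`) would make the observations exactly independent, at the cost of possibly selecting the same household twice.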

Estimating $\mu$

An estimator of a parameter is a statistic whose value in the sample is used to estimate that parameter.
Estimator for $\mu$ : Sample mean, $\bar{X}$ .
Estimator for $\sigma$ : Sample standard deviation, $s$ .
Examples:
- $X_1, \ldots, X_n$ is a sample of $n$ US households.
- Estimator for $\mu$ will be $\bar{X} = \frac{X_1 + \ldots + X_n}{n}$ .
- Estimator for $\sigma$ will be $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$ .
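
The estimators above can be tried on simulated data. In the sketch below the population of household car counts is made up for illustration (it is not real US data), so $\mu$ can be computed and compared with the estimate $\bar{X}$. With real data $\mu$ would be unknown.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of car counts per household (made-up weights).
population = random.choices([0, 1, 2, 3, 4], weights=[9, 33, 37, 15, 6], k=10_000)

# Population mean: known here only because the population is simulated.
mu = statistics.fmean(population)

# Draw a sample of n households and compute the estimates.
n = 200
sample = random.sample(population, n)
x_bar = statistics.fmean(sample)   # estimate of mu
s = statistics.stdev(sample)       # estimate of sigma (n - 1 denominator)

print(f"mu = {mu:.3f}, x_bar = {x_bar:.3f}, s = {s:.3f}")
```

A different seed gives a different sample and therefore different values of $\bar{X}$ and $s$, which is why estimators are treated as random variables.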

Properties of $\bar{X}$

Question: Is $\bar{X}$ a
