Introduction to Statistical Inference
Statistical inference involves making generalizations about a larger group (population) based on a smaller collected sample.
License: CC BY NC SA 4.0
Descriptive vs. Inferential Statistics
Descriptive Statistics: Summarizing and describing the characteristics of the collected data (the sample).
Inferential Statistics: Using the sample data to make broader generalizations about the population.
Population vs. Sample
Population: The entire group of individuals or items of interest.
Characterized by Parameters (e.g., expected value, variance, median, proportion).
Sample: A subset of the population from which data is collected.
Characterized by Statistics (e.g., sample mean, sample variance, sample median, sample proportion).
Descriptive statistics are used on the sample, while inferential statistics aim to draw conclusions about the population.
Example: Stephen Curry's Basketball Scores
Stephen Curry's average score is 30.1 points over 79 games.
$X$ = points scored in a single game
$\mu$ and $\sigma$ are unknown parameters representing the population mean and standard deviation of $X$.
Parameters vs. Statistics
Parameter: A characteristic of the population, typically unknown.
Unknown because measuring every individual/outcome is often impossible.
Statistic: A value calculated from the sample data.
Examples of Statistics and Parameters
Statistics (from sample):
$\bar{X}$: Sample mean.
$S$: Sample standard deviation.
$\hat{p}$: Sample proportion.
Parameters (unknown, for population):
$\mu$: Population mean.
$\sigma$: Population standard deviation.
$p$: Population proportion.
Sampling Distribution
Data points are random variables.
Statistics are functions of the data and therefore also random variables.
The distribution of the statistics depends on the parameters of the data's distribution.
Example: The sample mean ($\bar{X}$) is a statistic used to estimate the population mean ($\mu$).
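To make the idea of a sampling distribution concrete, here is a minimal simulation sketch. It assumes normally distributed data with hypothetical values mu = 30, sigma = 8 and n = 79; these numbers are illustrative only and are not taken from the source. Each repetition draws a fresh sample and computes its sample mean, so the list `means` approximates the distribution of the statistic.

```python
import random
import statistics


def sample_mean(n, mu, sigma, rng):
    """Draw n iid normal observations and return their sample mean."""
    return statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])


rng = random.Random(0)          # fixed seed so the run is reproducible
mu, sigma, n = 30.0, 8.0, 79    # hypothetical parameter values (assumption)

# Each repetition yields a different value of the same statistic.
means = [sample_mean(n, mu, sigma, rng) for _ in range(2000)]

print("first five sample means:", [round(m, 2) for m in means[:5]])
print("average of the sample means:", round(statistics.fmean(means), 2))
print("spread of the sample means:", round(statistics.stdev(means), 2))
```

The sample means differ from run to run but cluster around the chosen mu, which is what it means for a statistic to be a random variable whose distribution depends on the parameters.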
Sample Mean as a Random Variable
Given data: $X_1, X_2, \ldots, X_n$ from a sample.
Assumptions:
Simple Random Sample (SRS):
Observations are independent and identically distributed (iid).
Example: Each $X_i \sim N(\mu, \sigma)$, for $i = 1, \ldots, n$.
Challenge: $\mu$ and $\sigma$ are typically unknown.
Expected Value of the Sample Mean
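$E(\bar{X}) = \mu$
Explanation: $E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu$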
Standard Deviation of the Sample Mean
$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
The error in $\bar{X}$ decreases as $n$ increases.
Depends on $\sigma$, the standard deviation of the data, which is typically unknown.
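The standard result that the standard deviation of the sample mean equals sigma / sqrt(n) can be checked by simulation. The sketch below assumes iid normal data with hypothetical values mu = 10 and sigma = 4 (not from the source) and compares the empirical spread of many sample means with the theoretical value for several sample sizes.

```python
import math
import random
import statistics


def simulate_means(n, mu, sigma, reps, rng):
    """Return `reps` sample means, each computed from n iid N(mu, sigma) draws."""
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


rng = random.Random(1)      # fixed seed for reproducibility
mu, sigma = 10.0, 4.0       # hypothetical parameter values (assumption)

results = {}
for n in (4, 16, 64):
    means = simulate_means(n, mu, sigma, reps=4000, rng=rng)
    empirical = statistics.stdev(means)     # spread observed in the simulation
    theoretical = sigma / math.sqrt(n)      # sigma / sqrt(n)
    results[n] = (empirical, theoretical)
    print(f"n={n:3d}  empirical SD={empirical:.3f}  sigma/sqrt(n)={theoretical:.3f}")
```

Quadrupling n should roughly halve the spread of the sample mean, which is the sense in which the error decreases as the sample grows.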
Standard Error of the Sample Mean
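Difference between $SD(\bar{X})$ and $SE(\bar{X})$:
$\sigma$ is generally unknown, so $SD(\bar{X}) = \sigma/\sqrt{n}$ cannot be computed from the data.
We replace $\sigma$ with the statistic $S$, giving $SE(\bar{X}) = S/\sqrt{n}$.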
Example: 5 Normal, Independent Observations
Data set: 63, 65, 72, 74, 74
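Sample mean: $\bar{x} = \frac{63 + 65 + 72 + 74 + 74}{5} = 69.6$
Sample standard deviation: $s = \sqrt{\frac{1}{4}\sum_{i=1}^{5}(x_i - \bar{x})^2} = \sqrt{27.3} \approx 5.22$
Standard error: $SE(\bar{x}) = \frac{s}{\sqrt{5}} \approx 2.34$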
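The three quantities for this data set can be reproduced with Python's standard library. This is a small sketch rather than the source's own computation; note that `statistics.stdev` uses the n - 1 divisor, which is the usual convention for the sample standard deviation.

```python
import math
import statistics

data = [63, 65, 72, 74, 74]
n = len(data)

x_bar = statistics.fmean(data)   # sample mean
s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)            # standard error of the sample mean

print(f"sample mean        = {x_bar:.2f}")
print(f"sample std. dev.   = {s:.2f}")
print(f"standard error     = {se:.2f}")
```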
Distribution of the Sample Mean
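Rule: If $X_1, \ldots, X_n$ are independent and normally distributed with mean $\mu$ and standard deviation $\sigma$ (i.e., each $X_i \sim N(\mu, \sigma)$), then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.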
Standardization
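For a variable $X \sim N(\mu, \sigma)$: $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
For the sample mean of $n$ variables $X_i \sim N(\mu, \sigma)$: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$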
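As an illustration, the sketch below standardizes a single observation and a sample mean, then reads probabilities off the standard normal distribution. The values mu = 70, sigma = 5 and n = 25 are hypothetical, and the helper names `z_single` and `z_mean` are introduced here for illustration only. It relies on the standard results Z = (X - mu) / sigma for one normal observation and Z = (sample mean - mu) / (sigma / sqrt(n)) for the mean of n iid normal observations.

```python
import math
from statistics import NormalDist


def z_single(x, mu, sigma):
    """Standardize one observation from N(mu, sigma)."""
    return (x - mu) / sigma


def z_mean(x_bar, mu, sigma, n):
    """Standardize the mean of n iid N(mu, sigma) observations."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


mu, sigma, n = 70.0, 5.0, 25     # hypothetical values (assumption)

z1 = z_single(75.0, mu, sigma)   # one observation equal to 75
z2 = z_mean(72.0, mu, sigma, n)  # a sample mean equal to 72

standard_normal = NormalDist()   # mean 0, standard deviation 1
p1 = standard_normal.cdf(z1)     # P(X <= 75)
p2 = standard_normal.cdf(z2)     # P(sample mean <= 72)

print(f"z for the single observation: {z1:.2f}, P = {p1:.4f}")
print(f"z for the sample mean:        {z2:.2f}, P = {p2:.4f}")
```

The sample mean is divided by the smaller quantity sigma / sqrt(n), so a given distance from mu corresponds to a larger z-value for a mean than for a single observation.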
Population and Sample (Recap)
Population: Parameters (Expected Value, Variance, Median, Proportion).
Sample: Statistics (Sample Mean, Sample Variance, Sample Median, Sample Proportion).
Descriptive statistics describe the sample, while inferential statistics infer about the population.
Example: Average Number of Cars in US Households
Consider the number of cars in each household in the United States.
Population: All US households.
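Population size: $N$
Data set: $x_1, x_2, \ldots, x_N$, where $x_1$ is the number of cars in the 1st household, etc.
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Population standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$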
Sample (Cars in US Households Example)
Sample data set: $X_1, X_2, \ldots, X_n$
Sample size: $n$
Order: $n \ll N$ ($n$ is much smaller than $N$).
Objective: Infer conclusions about population parameters from sample values.
Ideal sample: representative and non-biased, chosen randomly.
Simple Random Sample
$X_1, X_2, \ldots, X_n$ is a simple random sample if:
Choosing one member doesn't affect the chances of choosing another.
Each member has the same probability of being chosen.
In other words:
$X_1, X_2, \ldots, X_n$ are independent.
$X_1, X_2, \ldots, X_n$ are identically distributed (same probability mass or density function).
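The "same probability of being chosen" condition can be checked empirically. The sketch below assumes a hypothetical population of N = 1000 households and samples of size n = 50 (both numbers are made up for illustration). It repeatedly draws a simple random sample with `random.sample` and counts how often each household is selected. Because `random.sample` draws without replacement, the observations are only approximately independent; the approximation is good when n is much smaller than N.

```python
import random

rng = random.Random(42)          # fixed seed for reproducibility
N, n = 1000, 50                  # hypothetical population and sample sizes
household_ids = range(N)

reps = 2000
counts = [0] * N
for _ in range(reps):
    # One simple random sample of n households, drawn without replacement.
    for h in rng.sample(household_ids, n):
        counts[h] += 1

# Every household should appear in roughly n / N of the repetitions.
freqs = [c / reps for c in counts]
print("target inclusion probability:", n / N)
print("smallest observed frequency: ", min(freqs))
print("largest observed frequency:  ", max(freqs))
```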
Estimating $\mu$ and $\sigma$
An estimator of a parameter is a statistic whose value in the sample is used to estimate that parameter.
Estimator for $\mu$: Sample mean, $\bar{X}$.
Estimator for $\sigma$: Sample standard deviation, $S$.
Examples:
$X_1, X_2, \ldots, X_n$ is a sample of $n$ US households.
Estimator for $\mu$ will be $\bar{X}$.
Estimator for $\sigma$ will be $S$.
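A short sketch of estimation in the households setting follows. The population here is simulated, and the distribution of cars per household (the `weights` list) is invented purely for illustration; it is not real survey data. Because the population is simulated, its mean and standard deviation can be computed directly and compared with the estimates from a sample, which is not possible in practice.

```python
import random
import statistics

rng = random.Random(7)           # fixed seed for reproducibility

# Hypothetical population of N households; the weights are made up.
N = 50_000
population = rng.choices([0, 1, 2, 3, 4], weights=[9, 33, 37, 15, 6], k=N)

# Population parameters (normally unknown; available here only because
# the population is simulated).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)    # divides by N

# Estimates computed from a simple random sample of n households.
n = 500
sample = rng.sample(population, n)
x_bar = statistics.fmean(sample)         # estimate of mu
s = statistics.stdev(sample)             # estimate of sigma (divides by n - 1)

print(f"population mean mu      = {mu:.3f}   estimate = {x_bar:.3f}")
print(f"population SD sigma     = {sigma:.3f}   estimate = {s:.3f}")
```

The estimates will not match the parameters exactly; how close they tend to be is governed by the standard error of the estimator.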
Properties of $\bar{X}$
Question: Is a