Lecture 11 - Introduction to Inference & Sampling
What is Statistical Inference?
Statistical inference is the process of using a sample to make conclusions about a larger population.
Why? Measuring an entire population can be expensive or impractical.
Example: Determine the average width of a species of sea star using a representative sample.
Inference in Opinion Polling
Example: Angus Reid Poll estimates 53% of Canadians feel unable to keep up with rising costs.
Sampling Distribution:
Distribution of a statistic (e.g., the mean) computed across many samples of the same size drawn from the same population.
Sample Distribution: Distribution of the observed values within a single sample.
Larger sample sizes lead to less variable sample means, because each mean averages over more observations and is less affected by any single unusual value.
As sample size increases:
Sampling distribution becomes narrower.
More sample estimates cluster closer to the true population mean.
Less variability among sample point estimates.
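The effect of sample size on the sampling distribution can be checked with a small simulation. This is a minimal sketch in Python (standard library only) rather than the R used in the course; the population of sea star widths, its mean of 12 cm, its standard deviation of 3 cm, and the helper name `sample_means` are all invented for illustration.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples and return the mean of each one."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(2024)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 sea star widths in cm (made-up values).
population = [rng.gauss(12.0, 3.0) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Standard deviation of the sample means for each sample size.
spread_by_size = {}
for n in (10, 40, 160):
    means = sample_means(population, n, 1_000, rng)
    spread_by_size[n] = statistics.stdev(means)
    print(
        f"n={n:>3}  mean of sample means={statistics.mean(means):.2f}  "
        f"sd of sample means={spread_by_size[n]:.3f}"
    )
```

The standard deviation of the sample means should shrink as `n` grows, while their average stays close to `population_mean`.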
Descriptive vs. Inferential Questions
Descriptive Questions:
Goal: Summarize existing data.
Focus: What happened?
Methods: Statistics and visualizations.
Example Keywords: "What is the average…?"
Inferential Questions:
Goal: Draw conclusions and generalize findings.
Focus: What does the sample tell us about the wider population, and how certain are we?
Methods: Statistical tests, modeling, estimating population parameters.
Example Keywords: "Is there a significant difference…?"
Key Definitions
Point Estimate: A single number estimating an unknown population parameter.
Population: Entire set of entities/objects of interest.
Random Sampling: Selecting observations where each has an equal likelihood of being chosen.
Representative Sampling: Selecting observations that accurately reflect the population's characteristics.
Population Parameter: Numerical summary of a population (e.g., mean).
Sampling Distribution: Distribution of point estimates from different samples of the same population.
Inference in Market Assessment
Example: Assessing the proportion of undergraduate students owning an iPhone.
A/B Testing Example
Assess which of two website designs yields higher customer engagement.
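As a rough illustration of the A/B setup, the sketch below compares the engagement rate observed under two designs. It is written in Python rather than the course's R, and the visitor and engagement counts are hypothetical numbers, not results from any real test.

```python
# Hypothetical A/B test counts (invented for illustration).
visitors_a, engaged_a = 1_000, 120  # design A
visitors_b, engaged_b = 1_000, 150  # design B

# Each sample proportion is a point estimate of that design's engagement rate.
rate_a = engaged_a / visitors_a
rate_b = engaged_b / visitors_b
difference = rate_b - rate_a

print(f"design A: {rate_a:.3f}")
print(f"design B: {rate_b:.3f}")
print(f"estimated difference (B - A): {difference:.3f}")
```

The observed difference is itself only a point estimate. Deciding whether it reflects a real difference in the population of all customers requires accounting for sampling variability.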
Estimation
Inferential problem: Estimating a quantitative property of a population (e.g., iPhone ownership proportion).
Steps:
Randomly select a sample.
Calculate sample proportion (point estimate) to estimate true population proportion.
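The two steps above can be sketched as follows. The example uses Python's standard library rather than R, and the population of 20,000 students with a 0.6 ownership rate is an assumption made up for the illustration; in a real study the population proportion would be unknown.

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical population of 20,000 students; True means "owns an iPhone".
# The 0.6 ownership rate is an invented value for illustration only.
population = [rng.random() < 0.6 for _ in range(20_000)]

# Step 1: randomly select a sample (without replacement).
sample = rng.sample(population, 40)

# Step 2: the sample proportion is the point estimate.
point_estimate = sum(sample) / len(sample)

# Only available here because the population is simulated.
true_proportion = sum(population) / len(population)

print(f"point estimate from the sample:   {point_estimate:.3f}")
print(f"population proportion (simulated): {true_proportion:.3f}")
```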
Virtual Simulation Experiment
Simulate the estimation process for population parameters.
Example: Determine iPhone ownership amongst UBC undergrads.
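One way to run such a virtual experiment is to build a simulated population, draw many samples from it, and record the point estimate from each. The sketch below does this in Python (the course itself uses R); the population size, the 0.63 ownership rate, the sample size of 40, and the helper name `sample_proportion` are all assumptions chosen for illustration.

```python
import random
import statistics
from collections import Counter

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical population; the 0.63 ownership rate is an assumption.
population = [rng.random() < 0.63 for _ in range(50_000)]
true_proportion = sum(population) / len(population)


def sample_proportion(pop, size, rng):
    """Draw one random sample and return its proportion of owners."""
    sample = rng.sample(pop, size)
    return sum(sample) / size


# Repeat the sampling many times to approximate the sampling distribution.
estimates = [sample_proportion(population, 40, rng) for _ in range(2_000)]

print(f"population proportion: {true_proportion:.3f}")
print(f"mean of estimates:     {statistics.mean(estimates):.3f}")
print(f"sd of estimates:       {statistics.stdev(estimates):.3f}")

# Rough text histogram: estimates rounded to one decimal place.
counts = Counter(round(e, 1) for e in estimates)
for bin_center in sorted(counts):
    print(f"{bin_center:.1f} {'#' * (counts[bin_center] // 20)}")
```

The estimates should centre on the population proportion, with individual samples landing above or below it.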
Comments on Distribution
Shapes of distributions (bell-shaped, skewed).
Identify center measures: Mean, median, mode.
Spread measures: Range, standard deviation.
Sampling distribution spread indicates precision of point estimates.
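The center and spread measures listed above can be computed directly. This is a small Python sketch (not the R used in the course) applied to a hypothetical list of eight point estimates; the values are invented.

```python
import statistics

# Hypothetical point estimates from eight samples (invented values).
estimates = [0.55, 0.60, 0.60, 0.65, 0.70, 0.60, 0.50, 0.65]

center = {
    "mean": statistics.mean(estimates),
    "median": statistics.median(estimates),
    "mode": statistics.mode(estimates),
}
spread = {
    "range": max(estimates) - min(estimates),
    "standard deviation": statistics.stdev(estimates),
}

for name, value in {**center, **spread}.items():
    print(f"{name}: {value:.4f}")
```

A smaller standard deviation across the estimates means that individual samples tend to give similar answers, which is what a precise point estimate looks like.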
Overview of Statistical Inference
Fundamental concepts of sampling and estimation.
Common techniques: Point estimation and interval estimation.
Learning Objectives:
Describe inferential questions and population parameters.
Define sampling terms and explain their relations.
Use R for sampling and creating sampling distributions.
Need for Sampling
Understanding how sample observations relate to the broader population.
Turn sample data into estimates of population parameters, which are typically unknown.
Bootstrapping
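A method for approximating the sampling distribution by repeatedly resampling, with replacement, from a single observed sample.
Useful for quantifying the variability (uncertainty) of a point estimate when only one sample is available; it does not make the point estimate itself more accurate.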
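A minimal bootstrap sketch, assuming a single observed sample of 40 students of whom 25 own an iPhone (hypothetical data). It is written in Python rather than the course's R. The helper name `bootstrap_proportions` and the simple percentile interval are illustrative choices, not a method prescribed by the lecture.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# One observed sample (hypothetical): 25 owners out of 40 students.
observed = [True] * 25 + [False] * 15
point_estimate = sum(observed) / len(observed)


def bootstrap_proportions(sample, n_resamples, rng):
    """Resample with replacement, same size as the original sample."""
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)]


boot = sorted(bootstrap_proportions(observed, 5_000, rng))

# Simple percentile interval covering the middle 95% of bootstrap estimates.
lower = boot[int(0.025 * len(boot))]
upper = boot[int(0.975 * len(boot)) - 1]

print(f"point estimate:        {point_estimate:.3f}")
print(f"bootstrap sd:          {statistics.stdev(boot):.3f}")
print(f"95% percentile range:  {lower:.3f} to {upper:.3f}")
```

The bootstrap distribution is centred on the sample's own proportion, not the population's, so it describes how much the estimate might vary rather than correcting it.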
Summary of Key Concepts
Point estimates provide single values from samples to infer about populations.
Sampling distributions illustrate variability of estimates.
Shape and spread of sampling distributions depend on sample size.
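Larger sample sizes generally lead to more precise estimates, because the sampling distribution becomes narrower.
Bootstrapping approximates the sampling distribution from a single sample, which helps quantify the uncertainty of an estimate.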
Code Functions Reference Table
| Function | Definition |
|---|---|
| | Randomly selects a sample from a dataset. |
| | Computes the average of a numeric array or list. |
| | Calculates the standard deviation of a numeric dataset. |
| | Creates a frequency table of categorical data. |
| | Conducts hypothesis tests for proportions. |
| | Implements bootstrapping techniques for estimating distributions. |
| | Generates a histogram for visual representation of data distribution. |
| | Provides quantiles for a normal distribution. |
| | Performs t-tests to compare means of two groups. |
| | Fits linear models to data for regression analysis. |