Statistical inference is the process of using a sample to make conclusions about a larger population.
Why? Measuring an entire population can be expensive or impractical.
Example: Determine the average width of a species of sea star using a representative sample.
Example: An Angus Reid poll estimates that 53% of Canadians feel unable to keep up with rising costs.
Sampling Distribution: Distribution of a statistic (e.g., the mean) across many samples drawn from the same population.
Sample Distribution: Distribution of the observed values within a single sample.
Larger sample sizes lead to less variable sample means, because a sample with more data points reflects the population more closely.
As sample size increases:
Sampling distribution becomes narrower.
More sample estimates cluster closer to the true population mean.
Less variability among sample point estimates.
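The effect of sample size on the sampling distribution can be checked by simulation. The sketch below is a minimal illustration, not the course's own code: the notes name R as the tool, while this uses Python's standard library. The population (10,000 simulated values), the sample sizes, and the helper `sampling_distribution_of_mean` are all assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated measurements (arbitrary units).
population = [random.gauss(10.0, 2.0) for _ in range(10_000)]


def sampling_distribution_of_mean(population, sample_size, n_samples=1_000):
    """Draw many random samples and return the mean of each one."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Spread (standard deviation) of the sample means for each sample size.
spread_by_size = {}
for n in (10, 50, 200):
    means = sampling_distribution_of_mean(population, n)
    spread_by_size[n] = statistics.stdev(means)
    print(f"n={n:>3}: SD of sample means = {spread_by_size[n]:.3f}")
```

The printed standard deviations should shrink as `n` grows, which is the narrowing of the sampling distribution described above.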
Descriptive Questions:
Goal: Summarize existing data.
Focus: What happened?
Methods: Summary statistics and visualizations.
Example Keywords: "What is the average…?"
Inferential Questions:
Goal: Draw conclusions and generalize findings.
Focus: Does the pattern seen in the sample hold for the wider population?
Methods: Statistical tests, modeling, estimating population parameters.
Example Keywords: "Is there a significant difference…?"
Point Estimate: A single number estimating an unknown population parameter.
Population: Entire set of entities/objects of interest.
Random Sampling: Selecting observations so that each member of the population has an equal chance of being chosen.
Representative Sampling: Selecting observations that accurately reflect the population's characteristics.
Population Parameter: Numerical summary of a population (e.g., mean).
Sampling Distribution: Distribution of point estimates from different samples of the same population.
Example: Assess the proportion of undergraduate students who own an iPhone.
Example: Assess which of two website designs yields higher customer engagement.
Inferential problem: Estimating a quantitative property of a population (e.g., iPhone ownership proportion).
Steps:
Randomly select a sample.
Calculate sample proportion (point estimate) to estimate true population proportion.
Simulate the estimation process for population parameters.
Example: Determine iPhone ownership amongst UBC undergrads.
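The steps above can be sketched as a small simulation. This is an illustration under stated assumptions, written in Python's standard library (the notes name R as the course tool). The population is invented: 20,000 students with a made-up ownership rate of 0.6, not a real figure. The helper `sample_proportion` and the sample size of 40 are also hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 20,000 undergraduates; True means "owns an iPhone".
# The 0.6 ownership rate is invented for this sketch.
TRUE_PROPORTION = 0.6
population = [random.random() < TRUE_PROPORTION for _ in range(20_000)]


def sample_proportion(population, sample_size):
    """Randomly select a sample and return the share of True values."""
    sample = random.sample(population, sample_size)
    return sum(sample) / sample_size


# Steps 1 and 2: one random sample, one point estimate.
p_hat = sample_proportion(population, 40)
print(f"point estimate from one sample: {p_hat:.3f}")

# Repeating the process shows how the estimate varies from sample to sample.
estimates = [sample_proportion(population, 40) for _ in range(2_000)]
mean_of_estimates = sum(estimates) / len(estimates)
population_proportion = sum(population) / len(population)
print(f"population proportion:   {population_proportion:.3f}")
print(f"mean of 2,000 estimates: {mean_of_estimates:.3f}")
```

In practice the population proportion is unknown. It is computable here only because the population is simulated, which is what makes it possible to compare the estimates against the truth.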
Shape: Bell-shaped or skewed.
Center measures: Mean, median, mode.
Spread measures: Range, standard deviation.
The spread of a sampling distribution indicates the precision of the point estimate: a narrower distribution means a more precise estimate.
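As a quick illustration of the center and spread measures listed above, the sketch below computes each one with Python's standard `statistics` module (the notes name R as the course tool). The `widths` values are invented for the example and are not real measurements.

```python
import statistics

# Hypothetical sample of sea star widths in cm, invented for illustration.
widths = [4.2, 5.1, 5.1, 5.8, 6.0, 6.3, 7.4, 9.9]

center = {
    "mean": statistics.mean(widths),
    "median": statistics.median(widths),
    "mode": statistics.mode(widths),
}
spread = {
    "range": max(widths) - min(widths),
    "standard deviation": statistics.stdev(widths),  # sample SD (divides by n - 1)
}

for name, value in {**center, **spread}.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample the mean sits above the median, which is what a right-skewed shape tends to produce.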
Fundamental concepts of sampling and estimation.
Common techniques: Point estimation and interval estimation.
Learning Objectives:
Describe inferential questions and population parameters.
Define sampling terms and explain their relations.
Use R for sampling and creating sampling distributions.
Understanding how sample observations relate to the broader population.
Turn sample statistics into estimates of population parameters, which are typically unknown.
Bootstrapping: A method for approximating a sampling distribution by repeatedly resampling, with replacement, from a single sample.
Useful for quantifying the uncertainty of point estimates and understanding variability within sample estimates.
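A minimal sketch of bootstrapping, the resampling method these notes refer to, might look like the following. It is written in Python's standard library (the notes name R as the course tool). The sample itself, the number of resamples, and the helper `bootstrap_means` are assumptions for illustration only.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical single sample of 40 measurements (arbitrary units).
sample = [random.gauss(10.0, 2.0) for _ in range(40)]


def bootstrap_means(sample, n_resamples=2_000):
    """Resample with replacement, at the original sample size, and collect means."""
    size = len(sample)
    return [
        statistics.mean(random.choices(sample, k=size))
        for _ in range(n_resamples)
    ]


boot_means = bootstrap_means(sample)
point_estimate = statistics.mean(sample)
bootstrap_se = statistics.stdev(boot_means)  # spread of the bootstrap distribution

print(f"point estimate (sample mean): {point_estimate:.3f}")
print(f"bootstrap standard error:     {bootstrap_se:.3f}")
```

The bootstrap distribution is centered near the sample mean, not the unknown population mean. Its value lies in approximating the spread of the estimate, not in moving the estimate closer to the truth.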
Point estimates provide single values from samples to infer about populations.
Sampling distributions illustrate variability of estimates.
Shape and spread of sampling distributions depend on sample size.
Larger sample sizes generally lead to more precise estimates.
Bootstrapping approximates the sampling distribution from a single sample, showing how much an estimate may vary.
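The link between sample size and the spread of a sampling distribution, noted in the summary above, can be stated with the standard error formulas for a sample mean and a sample proportion. The symbols are introduced here and are not in the original notes: \sigma is the population standard deviation, p the population proportion, and n the sample size. The formulas assume independent observations, or a sample that is small relative to the population.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
```

Because n sits under a square root, quadrupling the sample size halves the standard error.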
| Function | Definition |
|---|---|
| | Randomly selects a sample from a dataset. |
| | Computes the average of a numeric array or list. |
| | Calculates the standard deviation of a numeric dataset. |
| | Creates a frequency table of categorical data. |
| | Conducts hypothesis tests for proportions. |
| | Implements bootstrapping techniques for estimating distributions. |
| | Generates a histogram for visual representation of data distribution. |
| | Provides quantiles for a normal distribution. |
| | Performs t-tests to compare means of two groups. |
| | Fits linear models to data for regression analysis. |