Lecture 11 - Introduction to Inference & Sampling

What is Statistical Inference?
  • Statistical inference is the process of using a sample to make conclusions about a larger population.

    • Why? Measuring an entire population can be expensive or impractical.

    • Example: Estimate the average width of a species of sea star using a representative sample.

Inference in Opinion Polling
  • Example: Angus Reid Poll estimates 53% of Canadians feel unable to keep up with rising costs.

  • Sampling Distribution:

    • Distribution of a statistic (e.g., mean) across multiple samples.

    • Not to be confused with the Sample Distribution: the distribution of the observations within a single sample.

    • Larger samples give less variable sample means, because each mean averages over more observations.

    • As sample size increases:

      • Sampling distribution becomes narrower.

      • More sample estimates cluster closer to the true population mean.

      • Less variability of sample point estimates.
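The effect of sample size on the sampling distribution can be checked with a short simulation. The following is a minimal sketch in base R, not the lecture's own code: the population (10,000 right-skewed values), the sample sizes, and the helper name sample_means() are all assumptions made for illustration.

```r
set.seed(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 measurements from a right-skewed distribution.
population <- rgamma(10000, shape = 2, rate = 0.5)

# Draw `reps` samples of size `n` and return the mean of each one.
# (sample_means is a helper invented for this sketch.)
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(population, size = n)))
}

means_small <- sample_means(n = 20)
means_large <- sample_means(n = 200)

# Spread of each sampling distribution: smaller for the larger sample size.
sd(means_small)
sd(means_large)

# Both sampling distributions are centred near the population mean.
mean(population)
mean(means_small)
mean(means_large)
```

Comparing the two sd() values shows the narrowing directly; plotting hist(means_small) and hist(means_large) on the same x-axis range makes the same point visually.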

Descriptive vs. Inferential Questions
  • Descriptive Questions:

    • Goal: Summarize existing data.

    • Focus: What happened?

    • Methods: Statistics and visualizations.

    • Example Keywords: "What is the average…?"

  • Inferential Questions:

    • Goal: Draw conclusions and generalize findings.

    • Focus: Does what we see in the sample hold for the wider population?

    • Methods: Statistical tests, modeling, estimating population parameters.

    • Example Keywords: "Is there a significant difference…?"

Key Definitions
  • Point Estimate: A single number estimating an unknown population parameter.

  • Population: Entire set of entities/objects of interest.

  • Random Sampling: Selecting observations so that each member of the population has an equal chance of being chosen.

  • Representative Sampling: Selecting observations that accurately reflect the population's characteristics.

  • Population Parameter: Numerical summary of a population (e.g., mean).

  • Sampling Distribution: Distribution of point estimates from different samples of the same population.

Inference in Market Assessment
  • Example: Assessing the proportion of undergraduate students owning an iPhone.

A/B Testing Example
  • Assess which of two website designs yields higher customer engagement.
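One way such a comparison might be run in R is a two-sample test of proportions with prop.test(). The counts below are hypothetical and "engagement" is assumed to mean that a visitor either engaged or did not; this is a sketch of the idea, not a prescribed analysis.

```r
# Hypothetical counts: visitors who engaged, out of visitors shown each design.
engaged  <- c(A = 120, B = 150)
visitors <- c(A = 1000, B = 1000)

# Two-sample test of equal proportions (continuity correction on by default).
result <- prop.test(x = engaged, n = visitors)

result$estimate  # sample proportion for each design (the point estimates)
result$conf.int  # confidence interval for the difference in proportions
result$p.value   # p-value for the null hypothesis of equal proportions
```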

Estimation
  • Inferential problem: Estimating a quantitative property of a population (e.g., iPhone ownership proportion).

  • Steps:

    1. Randomly select a sample.

    2. Calculate sample proportion (point estimate) to estimate true population proportion.
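The two steps above can be sketched in base R as follows. The population here is simulated, and its size (40,000 students) and true proportion (60%) are assumptions chosen only so the example runs; in a real study the true proportion is unknown.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 40,000 students; assume 60% own an iPhone.
population <- rep(c("yes", "no"), times = c(24000, 16000))

# Step 1: randomly select a sample of 100 students (without replacement).
one_sample <- sample(population, size = 100)

# Step 2: the sample proportion is the point estimate of the population proportion.
p_hat <- mean(one_sample == "yes")
p_hat

# table() shows the counts behind the proportion.
table(one_sample)
```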

Virtual Simulation Experiment
  • Simulate the estimation process for population parameters.

  • Example: Estimate the proportion of iPhone owners among UBC undergrads.
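A simulation of this kind might look like the sketch below, which repeats the estimation many times on a made-up population and collects the point estimates. The population size, the true proportion (0.6), the sample size (40), and the number of repetitions (2,000) are all assumptions for illustration.

```r
set.seed(123)  # fixed seed so the sketch is reproducible

# Hypothetical population: TRUE = owns an iPhone. The true proportion is 0.6.
population <- rep(c(TRUE, FALSE), times = c(24000, 16000))

# Repeat the estimation 2,000 times, each time with a fresh sample of 40 students.
estimates <- replicate(2000, mean(sample(population, size = 40)))

mean(estimates)  # centre of the simulated sampling distribution
sd(estimates)    # spread of the simulated sampling distribution

# Histogram of the point estimates: the simulated sampling distribution.
hist(estimates,
     main = "Sampling distribution of the sample proportion (n = 40)",
     xlab = "Sample proportion of iPhone owners")
```

Only a simulation allows this, because it requires access to the whole population; with real data there is usually just one sample.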

Comments on Distribution
  • Shapes of distributions (bell-shaped, skewed).

  • Identify center measures: Mean, median, mode.

  • Spread measures: Range, standard deviation.

  • Sampling distribution spread indicates precision of point estimates.
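The centre and spread measures listed above map onto base R functions. The sketch below applies them to hypothetical right-skewed data; the data and the seed are assumptions for illustration only.

```r
set.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical data: 500 right-skewed values.
x <- rexp(500, rate = 1)

# Centre
mean(x)
median(x)       # typically below the mean when the data are right-skewed

# Spread
range(x)        # smallest and largest value
diff(range(x))  # the range expressed as a single number
sd(x)           # standard deviation

# Shape: the histogram shows the long right tail.
hist(x, main = "Right-skewed example", xlab = "Value")
```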

Overview of Statistical Inference
  • Fundamental concepts of sampling and estimation.

  • Common techniques: Point estimation and interval estimation.

  • Learning Objectives:

    • Describe inferential questions and population parameters.

    • Define sampling terms and explain their relations.

    • Use R for sampling and creating sampling distributions.

Need for Sampling
  • Understanding how sample observations relate to the broader population.

  • Turn sample statistics into estimates of population parameters, which are typically unknown.

Bootstrapping
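  • A method for approximating a sampling distribution by repeatedly resampling, with replacement, from a single observed sample.

  • Useful for quantifying the variability (uncertainty) of a point estimate when only one sample is available; it does not make the point estimate itself more accurate.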
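As a rough illustration of the idea, the sketch below bootstraps a sample proportion using only base R (sample() with replace = TRUE) rather than the boot package. The observed sample (26 iPhone owners out of 40 students) and the number of resamples (5,000) are hypothetical.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical observed sample: 40 students, 26 of whom own an iPhone.
observed <- rep(c(TRUE, FALSE), times = c(26, 14))

# Resample the observed sample WITH replacement, at the original sample size,
# and recompute the proportion each time.
boot_props <- replicate(
  5000,
  mean(sample(observed, size = length(observed), replace = TRUE))
)

mean(observed)                         # point estimate from the original sample
sd(boot_props)                         # bootstrap estimate of its standard error
quantile(boot_props, c(0.025, 0.975))  # percentile interval for the proportion
```

The bootstrap distribution is centred on the sample proportion, not on the unknown population proportion, so it describes how much the estimate varies rather than correcting it.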

Summary of Key Concepts
  1. Point estimates are single values computed from a sample to estimate a population parameter.

  2. Sampling distributions illustrate variability of estimates.

  3. Shape and spread of sampling distributions depend on sample size.

  4. Larger sample sizes generally give less variable, and therefore more precise, estimates.

  5. Bootstrapping approximates the sampling distribution from a single sample, showing how much an estimate could vary.

Code Functions Reference Table

Function      Definition

sample()      Randomly selects elements from a vector, with or without replacement.
mean()        Computes the average of a numeric vector.
sd()          Calculates the sample standard deviation of a numeric vector.
table()       Creates a frequency table of categorical data.
prop.test()   Conducts hypothesis tests, and computes confidence intervals, for proportions.
boot()        Generates bootstrap replicates of a statistic (from the boot package).
hist()        Generates a histogram to show the distribution of numeric data.
qnorm()       Returns quantiles of a normal distribution.
t.test()      Performs one- and two-sample t-tests on means.
lm()          Fits linear models to data for regression analysis.