Class 9: Introduction to Statistical Inference

  • Statistical Inference introduces tools for estimating population characteristics based on sample analysis.

  • Key Topics:

    • Point and Interval Estimation

    • The application of statistics in social sciences.

    • Course offered by the Department of Statistics, UC3M.

Chapter 9: Statistical Inference

  • Topics Covered:

    1. The Sampling Distribution

    2. Point Estimation

    3. Interval Estimation

  • Recommended Reading:

    • Chapters 20 and 21 of Peña and Romo (1997).

Objective

  • To grasp basic concepts of statistical inference including:

    • How a sample helps estimate wider population characteristics.

    • Example: If the mean salary in Spain is €25,000, what is the probability that a sample of 100 people shows an average salary of €22,000 or less?

    • Understanding what such a result implies for a hypothesis about the population.

Statistical Inference

  • Descriptive Statistics: e.g., Sample mean salary of 100 workers is €22,000 with a standard deviation of €2,000.

  • Probability Determination: Assess if it is probable to find such a sample if the true mean is assumed to be €25,000.

  • Inference: If the derived probability is very low, the hypothesis μ=€25,000 is rejected.
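The three steps above can be sketched numerically. This is a minimal illustration, assuming the sample standard deviation of €2,000 stands in for σ and that the sample mean is approximately normal; the variable names are illustrative and not part of the course material.

```python
import math

mu_0 = 25_000    # hypothesised population mean (from the example)
x_bar = 22_000   # observed sample mean
s = 2_000        # sample standard deviation, used here in place of sigma (assumption)
n = 100          # sample size

standard_error = s / math.sqrt(n)      # 2000 / 10 = 200
z = (x_bar - mu_0) / standard_error    # standardised distance from mu_0

# Lower-tail normal probability P(Z <= z), written with erfc so that
# very small tail probabilities do not round to exactly zero.
p_value = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"z = {z:.1f}, P(sample mean <= {x_bar}) = {p_value:.3g}")
```

A sample mean 15 standard errors below €25,000 is so improbable that the hypothesis μ = €25,000 would be rejected.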

The Sampling Distribution

  • Sample Mean Distribution:

    • Different samples yield different means; the sample mean (X̄) is a random variable before sampling.

    • Formulas:

      • E(X̄) = μ

      • Var(X̄) = σ²/n

    • By the central limit theorem, the sample mean is approximately normally distributed when the sample size is large enough (rule of thumb: n > 30).
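A small simulation can illustrate these properties. The sketch below assumes a hypothetical exponential population (mean 1, variance 1), chosen only because it is clearly non-normal. It draws many samples and checks that the sample means average to about μ, with variance about σ²/n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 50               # size of each sample
num_samples = 5000   # number of repeated samples
mu, sigma2 = 1.0, 1.0  # exponential(rate=1) population: mean 1, variance 1

# One sample mean per repeated sample.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.4f} (theory: {sigma2 / n})")
```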

The Sampling Distribution Example

  • Sample Variable: X = Salary of a random Spanish worker.

  • To estimate μ:

Point Estimation

  • The sample mean (X̄) serves as an effective point estimate of the population mean (μ).

  • It exhibits favorable statistical properties such as:

    • Unbiasedness

    • It is the maximum likelihood estimator of μ when the data are normally distributed.

    • Likewise, the sample variance S² is a good estimator of the population variance σ².
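Unbiasedness can be checked by simulation. The sketch below assumes a hypothetical normal population with μ = 25,000 and σ = 2,000 (values borrowed from the salary example purely for illustration). It compares the average of X̄ and of S² over many samples with the true values. It also shows the variance estimator that divides by n rather than n − 1, which underestimates σ².

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

mu, sigma = 25_000, 2_000   # hypothetical population values
n, reps = 10, 5_000         # small samples, many repetitions

means, s2_values, biased_values = [], [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    s2_values.append(statistics.variance(sample))       # divides by n - 1
    biased_values.append(statistics.pvariance(sample))  # divides by n

avg_mean = statistics.fmean(means)
ratio_s2 = statistics.fmean(s2_values) / sigma**2
ratio_biased = statistics.fmean(biased_values) / sigma**2

print(f"average sample mean: {avg_mean:.1f} (mu = {mu})")
print(f"average S^2 / sigma^2: {ratio_s2:.3f} (close to 1)")
print(f"average (divide-by-n variance) / sigma^2: {ratio_biased:.3f} (close to (n-1)/n)")
```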

Interval Estimates

  • Goal: Identify an interval that is likely to contain μ.

    • Wide Interval: Corresponds to lower precision.

    • Narrow Interval: Increases risk of error.

  • Probability Approach:

    • Choose confidence levels (e.g., 95%, 90%, 99%)

    • Find two statistics, L(X₁,...,Xn) and U(X₁,...,Xn), such that P(L < μ < U) = 0.95.

    • Confidence Interval: CI95%(μ) = (L(X₁,...,Xn), U(X₁,...,Xn)).

Interpretation

  • If 95% confidence intervals are constructed from many independent samples, about 95% of those intervals will contain the true parameter.

  • Once a specific interval has been computed, it either contains μ or it does not; the 0.95 describes the procedure, not the probability that this particular interval contains μ.
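The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under stated assumptions: a hypothetical normal population with μ = 25,000 and known σ = 2,000, and the known-σ interval X̄ ± 1.96·σ/√n. It counts how often the interval covers the true mean.

```python
import math
import random

random.seed(2)  # fixed seed so the run is reproducible

mu, sigma = 25_000, 2_000     # hypothetical true values; sigma treated as known
n, experiments = 100, 2_000   # sample size and number of repeated experiments
margin = 1.96 * sigma / math.sqrt(n)

covered = 0
for _ in range(experiments):
    x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Each experiment produces its own interval; check whether it contains mu.
    if x_bar - margin < mu < x_bar + margin:
        covered += 1

coverage = covered / experiments
print(f"share of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains μ or not; only the long-run share is close to 0.95.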

A 95% Confidence Interval for the Population Mean (σ² Known)

  • Formula:

    • When σ² is known: CI95%(μ) = (X̄ - 1.96·σ/√n, X̄ + 1.96·σ/√n).

    • Justification of 1.96 as a critical value from the normal distribution.

    • Comparing 95% confidence interval with 99% interval regarding width.
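The role of 1.96 can be made explicit with a short derivation, assuming X̄ is (at least approximately) normal with mean μ and variance σ²/n:

```latex
\begin{align*}
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
  &\quad\text{and}\quad P(-1.96 < Z < 1.96) \approx 0.95 \\
\Longrightarrow\quad
  P\!\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)
  &\approx 0.95 .
\end{align*}
```

For a 99% interval the critical value is the 0.995 quantile of the standard normal, about 2.576, so the 99% interval is roughly 2.576/1.96 ≈ 1.31 times as wide as the 95% interval from the same data.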

Examples for Confidence Intervals

  1. Tuition Fees:

    • Sample of 20 students from Madrid with a mean fee of €2000, standard deviation €500 → Calculate a 95% CI.

  2. Heights:

    • Sample of 10 International Studies students with mean height 170cm, standard deviation 5cm → 99% CI calculation.
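Both examples can be worked through with the known-σ formula. The sketch below assumes that the stated standard deviations are treated as the known σ; `z_interval` is a hypothetical helper name, not something from the course material.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper, margin) for a normal-based CI of the mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% and 2.576 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Example 1: tuition fees, n = 20, mean 2000, sd 500, 95% level.
fees = z_interval(2000, 500, 20, confidence=0.95)
# Example 2: heights, n = 10, mean 170 cm, sd 5 cm, 99% level.
heights = z_interval(170, 5, 10, confidence=0.99)

print(f"fees:    ({fees[0]:.2f}, {fees[1]:.2f}), margin {fees[2]:.2f}")
print(f"heights: ({heights[0]:.2f}, {heights[1]:.2f}), margin {heights[2]:.2f}")
```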

Calculation via Excel

  • Excel capabilities for confidence interval calculation:

    • Excel does not return the whole interval, but it can compute the margin 1.96·σ/√n, which is then added to and subtracted from the sample mean.

    • Example: CI for the mean tuition fee in Madrid: €2000 ± 219.13.

A 95% Confidence Interval (σ² Unknown)

  • Adjusted for Student’s t Distribution:

    • CI95%(μ) = (X̄ - t(n-1, 0.975)·s/√n, X̄ + t(n-1, 0.975)·s/√n).

    • Utilization of Excel to obtain t-values.

Example Calculation for Fraud Sentences

  • Data on 19 prison sentences: Mean = 72.7 months, Standard Deviation = 10.2 months.

  • Calculate 95% CI for mean duration.

Calculation for Fraud Sentences via Excel

  • As in the previous examples, Excel gives the margin of error, which yields the CI for the mean fraud sentence: 72.7 months ± 4.92.
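The ±4.92 margin can be reproduced without Excel. The Python standard library has no Student's t quantile function, so this sketch computes one numerically (Simpson's rule for the CDF, bisection for the quantile). The helper names are illustrative, and in practice the t-value would normally come from a table or a spreadsheet.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection (assumes the answer is below 100)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, x_bar, s = 19, 72.7, 10.2          # data from the sentences example
t_crit = t_quantile(0.975, n - 1)     # t(18, 0.975)
margin = t_crit * s / math.sqrt(n)

print(f"t critical value: {t_crit:.4f}")
print(f"95% CI: {x_bar} ± {margin:.2f} months")
```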

Non-Normal Distribution (σ² Unknown)

  • When the population is not normal but the sample is large (n > 30), the sample mean is still approximately normal:

    • CI95%(μ) = (X̄ - 1.96·σ/√n, X̄ + 1.96·σ/√n), with σ replaced by the sample standard deviation s when σ² is unknown.

Confidence Interval for a Proportion

  • For a sample of size n with sample proportion p̂: CI95%(p) = (p̂ - 1.96·√[p̂(1-p̂)]/√n, p̂ + 1.96·√[p̂(1-p̂)]/√n).

    • For other confidence levels, replace 1.96 with the corresponding normal critical value (e.g., 2.576 for 99%).

Example for Proportion

  • In a sample of 100 adults, 45 express concern about a sedentary lifestyle:

    • Find the point estimate and a CI for the population proportion.

Calculation for Proportion via Excel

  • Excel's CI calculation for a mean can be reused by supplying the proportion's standard deviation, √[p̂(1-p̂)], in place of σ: CI99%(p) = 0.45 ± 0.128.
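The same result can be checked in a few lines. This is a sketch of the normal-approximation interval for a proportion; `proportion_interval` is a hypothetical helper name introduced only for illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (p_hat, margin) for the normal-approximation CI of a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for a 99% level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# 45 of 100 adults express concern; 99% confidence level as in the example.
p_hat, margin = proportion_interval(45, 100, confidence=0.99)

print(f"point estimate: {p_hat:.2f}")
print(f"99% CI: {p_hat:.2f} ± {margin:.3f}")
```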

Exercise

  • From a recent YouGov survey, calculate a 90% confidence interval for the proportion of UK adults who oppose the entry of Ukrainian refugees without a visa, based on respondents' political preferences.