Class 9: Introduction to Statistical Inference
Statistical Inference introduces tools for estimating population characteristics based on sample analysis.
Key Topics:
Point and Interval Estimation
The application of statistics in social sciences.
Course offered by the Department of Statistics, UC3M.
Chapter 9: Statistical Inference
Topics Covered:
The Sampling Distribution
Point Estimation
Interval Estimation
Recommended Reading:
Chapters 20 and 21 of Peña and Romo (1997).
Objective
To grasp basic concepts of statistical inference including:
How a sample helps estimate wider population characteristics.
Example: If the mean salary in Spain is €25,000, what is the probability that a sample of 100 people shows an average salary of €22,000 or less?
What such a result implies for hypotheses about the population.
Statistical Inference
Descriptive Statistics: e.g., Sample mean salary of 100 workers is €22,000 with a standard deviation of €2,000.
Probability Determination: Assess if it is probable to find such a sample if the true mean is assumed to be €25,000.
Inference: If the derived probability is very low, the hypothesis μ=€25,000 is rejected.
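The salary reasoning above can be sketched numerically. This is only an illustration under stated assumptions: the sample standard deviation of €2,000 stands in for the unknown σ, the sample mean is treated as approximately normal (n = 100), and "such a sample" is read as "a sample mean of €22,000 or less". The variable names are chosen for this sketch.

```python
from statistics import NormalDist

mu_0 = 25_000   # hypothesised population mean (euros)
s = 2_000       # sample standard deviation, used here in place of sigma (assumption)
n = 100         # sample size
x_bar = 22_000  # observed sample mean

standard_error = s / n ** 0.5           # 2000 / 10 = 200
z = (x_bar - mu_0) / standard_error     # number of standard errors below mu_0
probability = NormalDist().cdf(z)       # P(sample mean <= 22,000) if mu = 25,000

print(f"standard error = {standard_error:.0f}")
print(f"z = {z:.1f}")
print(f"probability = {probability:.3g}")
```

The observed mean lies 15 standard errors below the hypothesised value, so the probability is negligible and the hypothesis μ = €25,000 would be rejected.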
The Sampling Distribution
Sample Mean Distribution:
Different samples yield different means; the sample mean (X̄) is a random variable before sampling.
Formulas:
E(X̄) = μ
Var(X̄) = σ²/n
For a sufficiently large sample (n > 30), the sample mean is approximately normally distributed, whatever the distribution of X.
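A small simulation can make the two properties of the sample mean concrete. The sketch below draws many samples from a hypothetical population (normal, with μ = 25,000 and σ = 2,000, values chosen only for illustration) and checks that the sample means average out near μ with variance near σ²/n.

```python
import random
from statistics import fmean, pvariance

random.seed(0)                       # fixed seed so the run is reproducible
mu, sigma, n = 25_000, 2_000, 100    # hypothetical population values and sample size
n_samples = 5_000                    # number of repeated samples

# One sample mean per repeated sample of size n.
sample_means = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = fmean(sample_means)
variance_of_means = pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.1f} (theory: {mu})")
print(f"variance of sample means: {variance_of_means:.0f} (theory: {sigma**2 / n:.0f})")
```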
The Sampling Distribution Example
Sample Variable: X = Salary of a random Spanish worker.
To estimate μ:
Point Estimation
The sample mean (X̄) serves as an effective point estimate of the population mean (μ).
It exhibits favorable statistical properties such as:
Unbiasedness
It is the maximum likelihood estimator of μ when X is normally distributed.
Variance estimation: the sample variance S² is similarly a good estimator of σ².
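As a minimal sketch of these point estimates, the snippet below computes X̄ and S² for a small made-up data set (the five salaries are hypothetical, not taken from the course). It also shows the version of the variance that divides by n, for contrast with S², which divides by n - 1.

```python
from statistics import mean, pvariance, variance

salaries = [21_500, 23_000, 19_800, 24_200, 22_500]  # hypothetical sample (euros)

x_bar = mean(salaries)            # point estimate of mu
s2 = variance(salaries)           # S^2: sum of squared deviations divided by n - 1
s2_biased = pvariance(salaries)   # same sum divided by n (smaller than S^2)

print(f"x_bar = {x_bar}")
print(f"S^2 (n - 1 divisor) = {s2}")
print(f"variance with n divisor = {s2_biased}")
```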
Interval Estimates
Goal: Identify an interval that is likely to contain μ.
Wide Interval: More likely to contain μ, but less precise.
Narrow Interval: More precise, but more likely to miss μ.
Probability Approach:
Choose confidence levels (e.g., 95%, 90%, 99%)
Find two statistics L(X₁,...,Xn) and U(X₁,...,Xn) such that P(L < μ < U) = 95%.
Confidence Interval: CI95%(μ) = (L(X₁,...,Xn), U(X₁,...,Xn)).
Interpretation
If 95% confidence intervals are constructed from many independent samples, about 95% of these intervals will contain the true parameter being estimated.
Once a specific interval has been computed, it either contains μ or it does not; the 0.95 describes the procedure, not the probability that this particular interval contains μ.
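The repeated-sampling interpretation can be checked with a simulation. The sketch below assumes a hypothetical normal population with known σ (values chosen for illustration), builds a 95% interval X̄ ± 1.96·σ/√n in each experiment, and counts how often the interval contains the true μ.

```python
import random
from statistics import fmean

random.seed(1)                       # fixed seed so the run is reproducible
mu, sigma, n = 25_000, 2_000, 100    # hypothetical population values and sample size
n_experiments = 2_000

margin = 1.96 * sigma / n ** 0.5     # half-width of each 95% interval (sigma known)

covered = 0
for _ in range(n_experiments):
    x_bar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if x_bar - margin < mu < x_bar + margin:
        covered += 1                 # this experiment's interval contains mu

coverage = covered / n_experiments
print(f"share of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains μ or not; only the long-run share is close to 0.95.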
A 95% Confidence Interval for the Population Mean (σ² Known)
Formula:
When σ² is known: CI95%(μ) = (X̄ - 1.96·σ/√n, X̄ + 1.96·σ/√n).
The critical value 1.96 is the 0.975 quantile of the standard normal distribution: P(-1.96 < Z < 1.96) = 0.95.
A 99% interval uses the critical value 2.576 instead and is therefore wider than the 95% interval.
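The critical values can be recovered from the standard normal quantile function. The helper name z_critical below is introduced only for this sketch; it returns the value z with P(-z < Z < z) equal to the chosen confidence level.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

The critical value grows with the confidence level (roughly 1.645, 1.960 and 2.576), which is why a 99% interval is wider than a 95% interval built from the same sample.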
Examples for Confidence Intervals
Tuition Fees:
Sample of 20 students from Madrid with a mean fee of €2000, standard deviation €500 → Calculate a 95% CI.
Heights:
Sample of 10 International Studies students with mean height 170cm, standard deviation 5cm → 99% CI calculation.
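Both examples can be worked with the known-variance formula. The sketch below assumes, as this section does, that the stated standard deviations can be treated as the known σ; the function name z_interval is hypothetical.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a mean with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin, margin

# Tuition fees: n = 20, mean 2000 euros, sigma = 500 euros, 95% confidence.
tuition = z_interval(2000, 500, 20, confidence=0.95)
# Heights: n = 10, mean 170 cm, sigma = 5 cm, 99% confidence.
heights = z_interval(170, 5, 10, confidence=0.99)

print(f"tuition: ({tuition[0]:.2f}, {tuition[1]:.2f}), margin {tuition[2]:.2f}")
print(f"heights: ({heights[0]:.2f}, {heights[1]:.2f}), margin {heights[2]:.2f}")
```

If the standard deviations were instead estimated from such small samples, the Student's t interval would be the more appropriate choice.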
Calculation via Excel
Excel capabilities for confidence interval calculation:
Excel does not return the whole interval, but it can compute the margin of error 1.96·σ/√n, which is then subtracted from and added to the mean to obtain the interval.
Example: the 95% CI for the mean tuition fee in Madrid is €2000 ± 219.13.
A 95% Confidence Interval (σ² Unknown)
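When σ² is unknown and the data are normal, replace σ with the sample standard deviation s and 1.96 with the 0.975 quantile of Student’s t distribution with n-1 degrees of freedom: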
CI95%(μ) = (X̄ - t(n-1, 0.975)·s/√n, X̄ + t(n-1, 0.975)·s/√n).
Utilization of Excel to obtain t-values.
Example Calculation for Sentences
Data on 19 prison sentences: Mean = 72.7 months, Standard Deviation = 10.2 months.
Calculate 95% CI for mean duration.
Calculation for Fraud Sentences via Excel
Using Excel as in the previous examples, the 95% CI for the mean fraud sentence is 72.7 months ± 4.92.
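The same margin can be reproduced by hand. The Python standard library has no Student's t quantile function, so this sketch hard-codes the table value t(18, 0.975) ≈ 2.101 as an assumption; with SciPy installed, scipy.stats.t.ppf(0.975, df=18) would be one way to compute it instead.

```python
n = 19          # number of sentences in the sample
x_bar = 72.7    # sample mean (months)
s = 10.2        # sample standard deviation (months)

# Assumption: 0.975 quantile of Student's t with n - 1 = 18 degrees of
# freedom, taken from a t table rather than computed here.
t_crit = 2.101

margin = t_crit * s / n ** 0.5
lower, upper = x_bar - margin, x_bar + margin

print(f"margin = {margin:.2f} months")
print(f"95% CI = ({lower:.2f}, {upper:.2f}) months")
```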
Non-Normal Distribution (σ² Unknown)
For a non-normal population, a large sample (n > 30) still allows an approximate CI based on the normal distribution:
CI95%(μ) = (X̄ - 1.96·σ/√n, X̄ + 1.96·σ/√n), replacing σ with the sample standard deviation s when σ² is unknown.
Confidence Interval for a Proportion
For a sample proportion p computed from a sample of size n: CI95%(population proportion) = (p - 1.96·√[p(1-p)]/√n, p + 1.96·√[p(1-p)]/√n).
For other confidence levels, replace 1.96 with the corresponding normal critical value (e.g., 1.645 for 90%, 2.576 for 99%).
Example for Proportion
In a sample of 100 adults, 45 express concern over a sedentary lifestyle:
Find the point estimate and a CI for the population proportion.
Calculation for Proportion via Excel
Use the same Excel approach as for the mean, with the standard deviation replaced by √[p(1-p)]: CI99%(population proportion) = 0.45 ± 0.128.
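The proportion example can be reproduced as follows. The function name proportion_interval is hypothetical, and the sketch relies on the large-sample normal approximation used throughout this section.

```python
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (point estimate, margin) for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat, margin

# 45 of 100 adults express concern over a sedentary lifestyle.
p_hat, margin_99 = proportion_interval(45, 100, confidence=0.99)
_, margin_95 = proportion_interval(45, 100, confidence=0.95)

print(f"point estimate = {p_hat:.2f}")
print(f"99% CI = {p_hat:.2f} ± {margin_99:.3f}")
print(f"95% CI = {p_hat:.2f} ± {margin_95:.3f}")
```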
Exercise
From a recent YouGov survey, calculate the 90% confidence interval for the proportion of UK adults who oppose the entry of Ukrainian refugees without a visa, broken down by respondents' political preferences.