Foundations of Statistical Inference I
Introduction to Statistical Inference
Objective of Class: In today’s class, we will cover the following key points:
Discussion on statistical inference.
Learning how to test hypotheses.
Focus on random sampling and its significance, as well as how samples can yield accurate information about a larger population.
What are Inferential Statistics?
Definition: Inferential statistics are procedures for judging how closely a relationship observed in a sample reflects the unobserved relationship in the population from which the sample was drawn.
Key Inquiry: How representative is the sample of the population being investigated?
Population
Definition of Population:
The population refers to the universe of cases that a researcher wants to describe or study.
Example: In examining prejudice against immigrants in Britain, the population consists of British citizens.
Another Example: If studying financial activities of Political Action Committees (PACs) during recent congressional elections, the population includes all PACs that contributed during these elections.
Population Parameter
Definition: A population parameter is the characteristic or measure of a population that a researcher seeks to determine.
Example in PAC context: The average dollar amount of contributions made by all PACs.
Example in Britain: The average level of prejudice against immigrants among British citizens.
Sample
Definition of Sample: A sample consists of a subset of cases or observations drawn from a larger population.
Example of Access to the Entire Population: In a study of civil wars, a researcher could examine every case, because the population is all civil wars since 1945 (127 in total).
Situation Necessitating Samples: Researchers often cannot measure the whole population, for example when studying attitudes about immigration in Britain. In those cases they must rely on samples.
Sample Statistic
Definition: A sample statistic is an estimate of a population parameter, calculated from the sample under study.
Goal: The aim is to evaluate how closely the sample statistic approximates the actual population parameter.
Choosing Samples
Three Crucial Factors Influencing Sample Representation:
Random Sampling
Sample Size
Variation in Population Characteristics
1) Random Sampling
Importance of Random Sampling:
The method of selecting a sample is critically important.
Reflects the logic of experimental research design, where random assignment makes the groups equivalent on average.
In the context of sample selection, drawing samples randomly from the population enhances representativeness.
Random Selection: Ensures that each individual in the population has an equal chance of being included, effectively reducing bias between the sample and the overall population.
Example of Poor Sampling: The Literary Digest's 1936 presidential election poll drew its sample from sources such as telephone directories and automobile registration lists, which over-represented wealthier individuals. Because not all members of the population had an equal chance of selection, the poll predicted the wrong outcome. This is referred to as selection bias.
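The difference between a random sample and a sample with selection bias can be sketched in a short simulation. This is an illustration only, not a reconstruction of the 1936 poll: the population size, the 30% share of wealthy voters, and the support rates for a hypothetical "candidate A" are all invented for the example.

```python
import random

random.seed(1936)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 100,000 voters; about 30% are "wealthy".
# Assumed (made-up) support for candidate A: 35% among wealthy voters,
# 65% among everyone else.
population = []
for _ in range(100_000):
    wealthy = random.random() < 0.30
    supports_a = random.random() < (0.35 if wealthy else 0.65)
    population.append((wealthy, supports_a))


def share_supporting_a(voters):
    """Proportion of the given voters who support candidate A."""
    return sum(1 for _, supports in voters if supports) / len(voters)


true_share = share_supporting_a(population)

# Random sample: every voter has the same chance of being selected.
random_share = share_supporting_a(random.sample(population, 2_000))

# Biased sample: only wealthy voters can be selected (selection bias).
wealthy_voters = [voter for voter in population if voter[0]]
biased_share = share_supporting_a(random.sample(wealthy_voters, 2_000))

print(f"population:    {true_share:.3f}")
print(f"random sample: {random_share:.3f}")
print(f"biased sample: {biased_share:.3f}")
```

Both samples have the same size, so the gap between them comes from how the cases were selected, not from how many were drawn.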
Sampling Scenarios
Sampling from a Class Example:
If a sample were drawn from a single class to analyze parking arrangements and student satisfaction at IU Indianapolis, it would likely not be representative of the whole student body.
Random Sampling Error: Even with random selection of 300 students from a total population of 30,000, the sample statistic will differ somewhat from the population parameter purely by chance.
This chance discrepancy is the random sampling error; the standard error measures its typical size.
Random Sampling Error
Definition: Random sampling error quantifies the deviation between the sample statistic and the population parameter due to chance.
Illustrative Example:
If the overall satisfaction rating in the population is known to be 7 out of 10, one random sample might yield 6.6 and another randomly selected sample 7.2. These differences arise purely from chance in sampling.
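The way sample means scatter around the population value can be seen by drawing several random samples from one population. The sketch below assumes a made-up population of 30,000 satisfaction ratings on a 0 to 10 scale centred near 7; the spread of 1.5 and the number of samples are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(7)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 30,000 ratings on a 0-10 scale, centred near 7.
# Values outside the scale are clipped to its end points.
population = [
    min(10.0, max(0.0, random.gauss(7.0, 1.5))) for _ in range(30_000)
]
population_mean = statistics.fmean(population)

# Five independent random samples of 300 students each.
sample_means = [
    statistics.fmean(random.sample(population, 300)) for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, mean in enumerate(sample_means, start=1):
    error = mean - population_mean
    print(f"sample {i}: mean = {mean:.2f}, sampling error = {error:+.2f}")
```

Every sample is drawn the same way from the same population, so any difference between the printed means is due to chance alone.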
Formula to Calculate Random Sampling Error:
Relationship Components: Random sampling error is directly related to variation within the population and inversely related to sample size.
Formula for Random Sampling Error Relation: random sampling error = (variation component) / (sample size component) = σ / √n, where σ is the population standard deviation and n is the sample size.
Notably, increasing the sample size decreases random sampling error, while greater variation in the population increases it.
Pollock’s Example of Sampling
Scenario: A student organization gauges student attitudes towards the Democratic Party using a feelings thermometer scale from 0 to 100.
High Variation Result: High variation across attitudes leads to large sampling errors.
Effect of Sample Size on Error: If sample size increases, the random sampling error decreases due to a larger denominator in the estimation formula.
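Both effects can be checked empirically by drawing many samples and measuring how much their means vary. The sketch below is an assumption-laden illustration: the two populations of thermometer ratings are simulated, their spreads (25 and 10) are made up, and the helper names are hypothetical.

```python
import random
import statistics

random.seed(58)  # arbitrary seed so the sketch is reproducible


def clipped_ratings(mean, spread, size):
    """Hypothetical thermometer ratings, kept inside the 0-100 scale."""
    return [
        min(100.0, max(0.0, random.gauss(mean, spread))) for _ in range(size)
    ]


def spread_of_sample_means(population, n, draws=1_000):
    """Standard deviation of the means of `draws` random samples of size n."""
    means = [
        statistics.fmean(random.sample(population, n)) for _ in range(draws)
    ]
    return statistics.stdev(means)


# Two made-up student populations with similar means but different variation.
high_variation = clipped_ratings(58, 25, 20_000)
low_variation = clipped_ratings(58, 10, 20_000)

error_high = spread_of_sample_means(high_variation, n=100)
error_low = spread_of_sample_means(low_variation, n=100)
error_high_big_n = spread_of_sample_means(high_variation, n=400)

print(f"high variation, n = 100: {error_high:.2f}")
print(f"low variation,  n = 100: {error_low:.2f}")
print(f"high variation, n = 400: {error_high_big_n:.2f}")
```

The first two lines compare populations at a fixed sample size; the third keeps the high-variation population and enlarges the sample.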
2) Sample Size and Random Sampling Error
Inverse Relationship: Random sampling error is inversely related to the square root of the sample size; the sample size component of the formula is √n.
Diminishing Returns: Because the error shrinks with √n rather than n, each increase in sample size yields a smaller reduction in sampling error. Halving the error requires quadrupling the sample.
Common Practice: Pollsters typically sample between 1,500 and 2,000 respondents to achieve accurate estimates.
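The diminishing returns can be made concrete with a few lines of arithmetic. The sketch below assumes a population standard deviation of 20, a made-up value, and applies the relationship error = standard deviation / √n to four sample sizes.

```python
import math

SIGMA = 20.0  # hypothetical population standard deviation

sample_sizes = [100, 400, 1600, 6400]
errors = [SIGMA / math.sqrt(n) for n in sample_sizes]

for n, error in zip(sample_sizes, errors):
    print(f"n = {n:>4}: random sampling error = {error:.2f}")
```

With these inputs the errors are 2.00, 1.00, 0.50, and 0.25. Going from 100 to 400 cases adds 300 observations and cuts the error by 1.0; going from 1,600 to 6,400 adds 4,800 observations and cuts it by only 0.25.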
3) Variation and Random Sampling Error
Diversity in Population Attitudes:
Sampling error is likely to be larger when attitudes in the population are more diverse than when they vary little.
Standard Error vs. Standard Deviation Distinction:
Standard Error: Indicates how far sample means typically deviate from the population mean.
Standard Deviation: Reflects how individual observations vary around the sample mean.
Summary of Standard Error and Standard Deviation
Standard Deviation: Small if individual cases are close to mean; large if cases deviate broadly.
Relationship between the Two:
The standard error is computed from the standard deviation and serves as the estimate of random sampling error.
Calculation formula for standard error based on standard deviation: standard error = standard deviation / √n, where n is the sample size.
Calculation Processes for Sample Statistics
Standard Deviation Calculation Example:
Steps:
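Calculate the mean (μ = 58).
Determine each value's deviation from the mean.
Square these deviations and sum them to get the total squared deviations.
Divide by (n − 1), where n is the number of observations, to obtain the variance, then take the square root to get the standard deviation: standard deviation = √( Σ(xᵢ − μ)² / (n − 1) ).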
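The steps above can be sketched in Python. The five ratings below are hypothetical, chosen only so that their mean is 58 as in the example; they are not the data from the lecture.

```python
import math
import statistics

# Hypothetical thermometer ratings chosen so that the mean is 58.
ratings = [40, 50, 58, 66, 76]

n = len(ratings)
mean = sum(ratings) / n                          # step 1: mean
deviations = [x - mean for x in ratings]         # step 2: deviations from mean
sum_of_squares = sum(d ** 2 for d in deviations) # step 3: square and sum
variance = sum_of_squares / (n - 1)              # step 4: divide by n - 1
std_dev = math.sqrt(variance)                    # square root of the variance

print(f"mean = {mean}, sum of squares = {sum_of_squares}")
print(f"variance = {variance}, standard deviation = {std_dev:.2f}")

# Cross-check against the standard library's sample standard deviation.
print(f"statistics.stdev = {statistics.stdev(ratings):.2f}")
```

For these numbers the squared deviations sum to 776, the variance is 194, and the standard deviation is about 13.93.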
Standard Error Calculation Example:
The standard deviation is then used to estimate the random sampling error: standard error = standard deviation / √n.
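A minimal sketch of that calculation follows, assuming the standard error is estimated as the sample standard deviation divided by √n. The function name `standard_error` and the five ratings are hypothetical and used only for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Estimated random sampling error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical thermometer ratings with a mean of 58.
ratings = [40, 50, 58, 66, 76]
se = standard_error(ratings)

print(f"standard deviation = {statistics.stdev(ratings):.2f}")
print(f"standard error     = {se:.2f}")
```

With only five cases the standard error is large, about 6.23 on the 0 to 100 scale; a larger sample with the same standard deviation would give a smaller value.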
Variability in Sample Size Statistics
Chance in Distributions: Examples from genetics illustrate how larger samples from a population reduce random error.
Drawing larger samples lowers the random sampling error and raises confidence in the estimated sample statistics.
Confidence Intervals Based on Sampling Error
Because sample means cluster around the population mean in ranges defined by the standard error (assuming an approximately normal sampling distribution):
There is a 68% probability that a sample mean falls within one standard error of the population mean.
There is roughly a 95% probability that a sample mean falls within two standard errors (more precisely, 1.96) of the population mean. This range defines the 95% confidence interval used to evaluate sample means against population parameters.
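These two percentages can be checked with a simulation. The sketch below assumes a normally distributed population with a mean of 58 and a standard deviation of 20 (both made up), draws 2,000 samples of 100 cases, and counts how many sample means land within one and two standard errors of the population mean. The helper `confidence_interval` is a hypothetical name.

```python
import math
import random
import statistics

random.seed(95)  # arbitrary seed so the sketch is reproducible

# Hypothetical population mean, population SD, sample size, number of samples.
MU, SIGMA, N, DRAWS = 58.0, 20.0, 100, 2_000
standard_error = SIGMA / math.sqrt(N)  # 20 / 10 = 2.0

sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(DRAWS)
]

within_one = sum(abs(m - MU) <= standard_error for m in sample_means) / DRAWS
within_two = sum(abs(m - MU) <= 2 * standard_error for m in sample_means) / DRAWS


def confidence_interval(sample_mean, se, multiplier=2.0):
    """Interval of `multiplier` standard errors around a sample mean."""
    return sample_mean - multiplier * se, sample_mean + multiplier * se


low, high = confidence_interval(sample_means[0], standard_error)

print(f"share within 1 standard error:  {within_one:.3f}")
print(f"share within 2 standard errors: {within_two:.3f}")
print(f"approximate 95% interval for the first sample: {low:.1f} to {high:.1f}")
```

The two shares should come out close to 0.68 and 0.95, with small differences from run to run that are themselves random sampling error.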