Foundations of Statistical Inference I

Objective of Class: In today’s class, we will cover the following key points:
- Discussion on statistical inference.
- Learning how to test hypotheses.
- Focus on random sampling and its significance, as well as how samples can yield accurate information about a larger population.

Definition: Inferential statistics involve procedures for concluding how closely a relationship observed in a sample reflects the unobserved relationship in the overall population from which the sample was sourced.
Key Inquiry: How representative is the sample of the population being investigated?

Definition of Population:
- The population refers to the universe of cases that a researcher wants to describe or study.
- Example: In examining prejudice against immigrants in Britain, the population consists of British citizens.
- Another Example: If studying financial activities of Political Action Committees (PACs) during recent congressional elections, the population includes all PACs that contributed during these elections.

Definition: A population parameter is the characteristic or measure of a population that a researcher seeks to determine.
- Example in PAC context: The average dollar amount of contributions made by all PACs.
- Example in Britain: The average level of prejudice against immigration among British citizens.

Definition of Sample: A sample consists of a subset of cases or observations drawn from a larger population.
- Access to Entire Population Example: In a study of civil wars, the entire population could be all civil wars since 1945 (totaling 127 civil wars).
- Situation Necessitating Samples: Researchers often cannot measure the whole population, for example when studying attitudes about immigration in Britain, so they must rely on samples.

Definition: A sample statistic is an estimate of a population parameter, calculated from the sample under study.
Goal: The aim is to evaluate how closely the sample statistic approximates the actual population parameter.

Three Crucial Factors Influencing Sample Representation:
1. Random Sampling
2. Sample Size
3. Variation in Population Characteristics

Importance of Random Sampling:
- The method of selecting a sample is critically important.
- Random sampling reflects the logic of experimental research design, where random assignment makes groups equivalent apart from chance differences.
- In the context of sample selection, drawing samples randomly from the population enhances representativeness.
Random Selection: Ensures that each individual in the population has an equal chance of being included, effectively reducing bias between the sample and the overall population.
Example of Poor Sampling: The Literary Digest's 1936 presidential election poll drew its sample from sources that over-represented wealthier individuals, so not all members of the population had an equal chance of selection and the prediction was wrong. This is referred to as selection bias.
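To make the contrast between random selection and selection bias concrete, here is a minimal Python sketch. The population, the group sizes, and the candidate preferences are all hypothetical (they are not the actual 1936 figures); the point is only that a sample drawn from wealthy voters alone misses the true share, while a simple random sample lands close to it.

```python
import random

# Hypothetical population of 10,000 voters (NOT the real 1936 data).
# Each voter is an (is_wealthy, favors_candidate_a) pair.
population = (
    [(True, True)] * 2100      # wealthy, favor candidate A
    + [(True, False)] * 900    # wealthy, favor someone else
    + [(False, True)] * 2450   # not wealthy, favor candidate A
    + [(False, False)] * 4550  # not wealthy, favor someone else
)


def share_for_a(voters):
    """Proportion of the given voters who favor candidate A."""
    return sum(favors_a for _, favors_a in voters) / len(voters)


rng = random.Random(1936)  # fixed seed so the sketch is repeatable

# Simple random sample: every voter has the same chance of selection.
random_sample = rng.sample(population, 1000)

# Biased sample: only wealthy voters can be selected.
wealthy_only = [voter for voter in population if voter[0]]
biased_sample = rng.sample(wealthy_only, 1000)

true_share = share_for_a(population)
random_estimate = share_for_a(random_sample)
biased_estimate = share_for_a(biased_sample)

print(f"True share for A:        {true_share:.3f}")
print(f"Random-sample estimate:  {random_estimate:.3f}")
print(f"Wealthy-only estimate:   {biased_estimate:.3f}")
```

The biased estimate stays far from the true share no matter how large the sample is, because the problem lies in who can be selected, not in how many are selected.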

Sampling from a Class Example:
- If a sample were drawn from a single class to analyze parking arrangements and student satisfaction at IU Indianapolis, it may not yield representative results.
Random Sampling Error: Even when 300 students are selected at random from a total population of 30,000, the sample statistic can differ from the population value by chance, so the sample may not depict the entire student body exactly.
- This chance discrepancy is the random sampling error; its expected size is measured by the standard error.

Definition: Random sampling error quantifies the deviation between the sample statistic and the population parameter due to chance.
- Illustrative Example: If the overall satisfaction rating in the population is known to be 7 out of 10, but one randomly selected group yields 6.6 and another randomly selected group yields 7.2, the differences arise purely from sampling chance.
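The following Python sketch illustrates the same idea with simulated data. The population of 30,000 satisfaction scores is hypothetical and is constructed so that its mean is exactly 7; the seed and the score values are assumptions made only for this illustration. Two random samples of 300 are drawn, and each sample mean differs a little from 7 purely by chance.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 30,000 satisfaction scores on a 1-10 scale,
# built so that the population mean is exactly 7.
population = [5, 6, 7, 8, 9] * 6000
population_mean = statistics.mean(population)

# Two independent simple random samples of 300 students each.
sample_a = rng.sample(population, 300)
sample_b = rng.sample(population, 300)

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample A mean:   {mean_a:.2f} (error {mean_a - population_mean:+.2f})")
print(f"Sample B mean:   {mean_b:.2f} (error {mean_b - population_mean:+.2f})")
```

Changing the seed changes both sample means, which is exactly what random sampling error describes.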
Formula Relating the Population Parameter to the Sample Statistic:
$\text{Population Parameter} = \text{Sample Statistic} \pm \text{Random Sampling Error}$
Relationship Components: Random sampling error is directly tied to variation within the population and inversely related to sample size.
- Formula for Random Sampling Error Relation:
  $\text{Random Sampling Error} = \frac{\text{Variation Component}}{\text{Sample Size Component}}$
- Notably, increasing sample size will decrease random sampling error, while increased variation leads to increased sampling error.

Scenario: A student organization gauges student attitudes towards the Democratic Party using a feelings thermometer scale from 0 to 100.
- High Variation Result: High variation across attitudes leads to large sampling errors.
Effect of Sample Size on Error: If sample size increases, the random sampling error decreases due to a larger denominator in the estimation formula.
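As a rough illustration of the variation effect, the sketch below compares two hypothetical sets of feeling-thermometer ratings that share the same mean but differ in spread. It assumes the variation component is the sample standard deviation and the sample size component is the square root of the sample size; the function name and the ratings are invented for this example.

```python
import math
import statistics


def estimated_sampling_error(ratings):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(ratings) / math.sqrt(len(ratings))


# Hypothetical thermometer ratings (0-100); both groups have a mean of 50.
low_variation = [45, 48, 50, 50, 52, 55]
high_variation = [5, 20, 50, 50, 80, 95]

low_error = estimated_sampling_error(low_variation)
high_error = estimated_sampling_error(high_variation)

print(f"Low-variation group:  mean = {statistics.mean(low_variation)}, "
      f"error = {low_error:.2f}")
print(f"High-variation group: mean = {statistics.mean(high_variation)}, "
      f"error = {high_error:.2f}")
```

With the sample size held fixed at six, the group with more diverse attitudes produces the larger estimated sampling error.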

Inverse Relationship: Random sampling error is inversely related to the square root of the sample size, expressed mathematically as:
$\text{Random Sampling Error} = \frac{\text{Variation Component}}{\sqrt{\text{Sample Size}}}$
Diminishing Returns: Increasing the sample size yields diminishing returns in reducing sampling error.
Common Practice: Pollsters typically sample between 1,500 and 2,000 respondents to achieve accurate estimates.
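The diminishing returns follow from the square root in the denominator. A small Python sketch, assuming a hypothetical variation component of 20 (the function name and the sample sizes are also illustrative), shows that each fourfold increase in sample size only halves the error:

```python
import math


def random_sampling_error(variation, sample_size):
    """Variation component divided by the square root of the sample size."""
    return variation / math.sqrt(sample_size)


variation = 20.0  # hypothetical variation component (a standard deviation)
sample_sizes = [100, 400, 1600, 6400]

errors = {n: random_sampling_error(variation, n) for n in sample_sizes}

for n in sample_sizes:
    print(f"n = {n:>5}: random sampling error = {errors[n]:.2f}")
# n =   100: random sampling error = 2.00
# n =   400: random sampling error = 1.00
# n =  1600: random sampling error = 0.50
# n =  6400: random sampling error = 0.25
```

Going from 100 to 400 respondents removes a full point of error, while going from 1,600 to 6,400 removes only a quarter of a point, which is one reason very large samples are rarely worth the cost.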

Diversity in Population Attitudes:
- Sample error is likely larger if the population has more diverse attitudes compared to one with less variation.
- Standard Error vs. Standard Deviation Distinction:
  - Standard Error: Indicates deviation of sample mean from the population mean.
  - Standard Deviation: Reflects how individual observations vary around the sample mean.

Standard Deviation: Small if individual cases are close to mean; large if cases deviate broadly.
Relationship between the Two:
- The standard error is derived from the standard deviation and serves as the estimate of sampling error.
- The standard error is calculated by dividing the standard deviation by the square root of the sample size.

Steps:
1. Calculate Mean (μ=58).
2. Determine each value’s deviation from the mean.
3. Square these deviations and sum them for total squared deviations.
4. Divide by (N-1) to determine variance and subsequently calculate standard deviation:
  $\text{Standard Deviation} (\sigma) = \sqrt{\text{Variance}}$
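The four steps can be sketched in Python as follows. The scores are hypothetical, chosen only so that their mean is 58 as in step 1; they are not the data set used in class.

```python
import math

# Hypothetical scores, chosen only so that the mean equals 58.
scores = [40, 50, 58, 66, 76]
n = len(scores)

# Step 1: calculate the mean.
mean = sum(scores) / n  # 58.0

# Step 2: determine each value's deviation from the mean.
deviations = [x - mean for x in scores]  # [-18.0, -8.0, 0.0, 8.0, 18.0]

# Step 3: square the deviations and sum them.
total_squared_deviations = sum(d ** 2 for d in deviations)  # 776.0

# Step 4: divide by (N - 1) for the variance, then take the square root.
variance = total_squared_deviations / (n - 1)  # 194.0
standard_deviation = math.sqrt(variance)

print(f"Mean: {mean}")
print(f"Variance: {variance}")
print(f"Standard deviation: {standard_deviation:.2f}")
```

Python's standard library offers `statistics.stdev`, which also divides by N - 1 and should give the same result for these scores.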

The standard deviation is then used to estimate the random sampling error:
$\text{Standard Error} (\sigma_M) = \frac{\sigma}{\sqrt{\text{Sample Size}}}$
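A minimal Python sketch of this calculation is shown below. The ratings and the function name are hypothetical; `statistics.stdev` computes the sample standard deviation using the N - 1 denominator.

```python
import math
import statistics


def standard_error(values):
    """Standard deviation of the sample divided by the square root of its size."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical sample of nine ratings (not data from the class).
ratings = [40, 46, 52, 55, 58, 61, 64, 70, 76]

sample_mean = statistics.mean(ratings)
sample_sd = statistics.stdev(ratings)
sample_se = standard_error(ratings)

print(f"Sample mean:        {sample_mean}")
print(f"Standard deviation: {sample_sd:.2f}")
print(f"Standard error:     {sample_se:.2f}")
```

Because there are nine ratings, the standard error here is simply the standard deviation divided by 3.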

Chance in Distributions: Examples from genetics illustrate how larger samples from a population reduce random error.
Drawing larger samples lowers the random sampling error while raising confidence in the estimated sample statistics.

Because sample means cluster around the population mean in an approximately normal distribution, the standard error defines ranges with known probabilities:
- There is a 68% probability that the sample mean falls within one standard error of the population mean.
- There is a 95% probability that the sample mean falls within about two standard errors of the population mean. The sample mean plus or minus two standard errors therefore gives a 95% confidence interval for the population parameter.
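As a short worked illustration, the Python sketch below builds both ranges around a sample mean. The sample mean of 58 and standard error of 2 are hypothetical, the function name is invented for this example, and the multiplier of 2 is the usual rounding of the more exact 1.96.

```python
def interval_around_mean(sample_mean, standard_error, multiplier):
    """Return (lower, upper) bounds: sample mean +/- multiplier * standard error."""
    margin = multiplier * standard_error
    return (sample_mean - margin, sample_mean + margin)


sample_mean = 58.0     # hypothetical sample statistic
standard_error = 2.0   # hypothetical standard error

range_68 = interval_around_mean(sample_mean, standard_error, 1)
range_95 = interval_around_mean(sample_mean, standard_error, 2)

print(f"68% range (1 standard error):   {range_68}")  # (56.0, 60.0)
print(f"95% interval (2 standard errors): {range_95}")  # (54.0, 62.0)
```

Under these assumptions, a researcher would report that the population mean most likely lies between 54 and 62.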