Lecture 9

Pearson Correlation:
- **Assumptions:**
  - Both variables must be continuous (interval or ratio).
  - The relationship between the variables should be linear; Pearson's r understates curvilinear (e.g., parabolic) relationships.
  - No outliers should be present, as they can distort the coefficient.
  - Both variables should be approximately normally distributed; assess using skew and kurtosis or through histograms.
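
The normality check can be sketched in R as follows. The data are simulated for illustration, and the two helper functions (`skewness`, `excess_kurtosis`) are hypothetical names written out by hand so that no add-on package is needed; both values should be near 0 for roughly normal data.

```r
# Hypothetical data: 200 draws from a normal distribution (illustration only).
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)

# Moment-based skewness: 0 for a perfectly symmetric distribution.
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: 0 for a normal distribution.
excess_kurtosis <- function(v) {
  d <- v - mean(v)
  mean(d^4) / mean(d^2)^2 - 3
}

skewness(x)
excess_kurtosis(x)

# Visual check: the histogram should look roughly symmetric and bell-shaped.
hist(x, main = "Histogram of x", xlab = "x")
```
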
Spearman Correlation:
- Used when at least one variable is ordinal, or when two quantitative variables have a monotonic relationship that is not strictly linear.
Kendall’s Tau-b:
- Used for two qualitative (or categorical) ordinal variables.
- In R, select the coefficient by adding method = "type" to the correlation call, e.g. cor(x, y, method = "kendall"), where type is pearson, spearman, or kendall.
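
As a minimal sketch of the three coefficients side by side, the snippet below calls base R's `cor()` with each `method` value. The `hours` and `score` vectors are made-up illustration data, not values from the lecture.

```r
# Hypothetical paired observations (illustration only).
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
score <- c(52, 55, 61, 60, 68, 74, 79, 95)

r_pearson  <- cor(hours, score, method = "pearson")   # linear association
r_spearman <- cor(hours, score, method = "spearman")  # association between ranks
r_kendall  <- cor(hours, score, method = "kendall")   # concordant vs. discordant pairs

round(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall), 3)
```

`cor.test()` accepts the same `method` argument and additionally reports a significance test.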

Definition of Samples:
- Samples provide information about a population, defined as any well-defined set of units.
- A population can refer to a variety of entities, not just people (counties, businesses, etc.).
Population Example: All adult citizens (18+) in the USA in 2018.
- Direct interviews with the entire population are impractical.
- Sampling involves selecting a subset for investigation.
Significance of Sampling: The sample size and selection method impact the accuracy and reliability of inferences about the whole population.

After gathering samples, researchers measure characteristics of interest to approximate population parameters.
Sample statistics provide estimates for population parameters, which will typically not match exactly but should be close under appropriate sampling procedures.
Goal of Statistical Inference: To infer unknown population characteristics from sample statistics. The key concept is the sampling distribution: the distribution of a statistic across all possible samples of a given size.
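
The relationship between a parameter and its estimate can be illustrated with a small simulation. Everything here is assumed for the sake of the sketch: the population of ages is simulated, and the population size (100,000) and sample size (500) are arbitrary choices.

```r
set.seed(9)

# Hypothetical population: ages of 100,000 adults (simulated, illustration only).
population_age <- sample(18:90, size = 100000, replace = TRUE)
parameter <- mean(population_age)   # population mean (unknown in real research)

# One random sample of 500 adults and the corresponding sample statistic.
sampled_age <- sample(population_age, size = 500)
statistic <- mean(sampled_age)      # estimate of the population mean

# The estimate rarely matches the parameter exactly, but it should be close.
c(parameter = parameter, statistic = statistic, difference = statistic - parameter)
```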

Example with Presidential Approval:
- If we interview ten adult Americans about Trump’s performance and find an 80% approval rate while the true population parameter is 50%, the 30-point difference is the sampling error.
- Each sample can yield a different approval statistic because chance determines who is selected; small samples vary the most.
Repeated Sampling: Taking multiple samples leads to varied estimates, but their average tends to converge on the true population parameter.
- For instance, the mean of the approval rates from many samples of ten will lie much closer to 50% than a typical single sample does.
Emergence of the Normal Distribution: As more samples accumulate, the sample statistics form a bell-shaped, approximately normal distribution whose mean is centered on the population parameter.
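
The approval example can be simulated to see this behavior. This is a sketch under stated assumptions: the true approval rate is set to 50%, each sample has ten respondents, the number of repeated samples (5,000) is arbitrary, and `rbinom()` treats the population as effectively infinite.

```r
set.seed(42)

true_approval <- 0.50   # population parameter (assumed known for illustration)
n_per_sample  <- 10     # ten respondents per sample, as in the example
n_samples     <- 5000   # number of repeated samples (arbitrary choice)

# Each entry is the approval proportion found in one sample of ten people.
sample_props <- rbinom(n_samples, size = n_per_sample, prob = true_approval) / n_per_sample

# Individual samples vary widely; some show 80% approval or more ...
range(sample_props)
mean(sample_props >= 0.80)

# ... but the average across samples sits close to the true 50%.
mean(sample_props)

# The histogram of sample proportions is roughly bell-shaped around 0.50.
hist(sample_props, breaks = seq(-0.05, 1.05, by = 0.1),
     main = "Sampling distribution (n = 10)",
     xlab = "Sample approval proportion")
```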

Element: A unit of analysis; in the presidential approval study, it's an individual American adult.
Sampling Frame: A comprehensive list from which sampling units are drawn, essential for accurate representation of the population.

Probability Sample: Each element has a known, nonzero probability of selection, which makes it possible to estimate sampling error.
Non-Probability Sample: Each element has an unknown probability of being selected, which can introduce bias.
Probability samples are generally preferred because they promote representativeness.

Characteristics: Each element in the population has an equal chance of being included, ideally drawn from a complete sampling frame.
- Advantages: Tends to produce a representative sample, especially at larger sample sizes.
- Disadvantages: A complete sampling frame is often difficult to obtain.
- Important note: All samples are subject to error; no sample captures the population perfectly.
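
Equal-chance selection from a frame can be sketched with base R's `sample()`. The sketch assumes the passage describes simple random sampling, which the notes do not state explicitly; the frame of 5,000 IDs and the sample size of 200 are hypothetical.

```r
set.seed(7)

# Hypothetical sampling frame: a complete list of 5,000 unit IDs.
sampling_frame <- data.frame(id = 1:5000)

# Draw 200 units without replacement, so every unit on the frame has the
# same inclusion probability (200 / 5000 = 4%).
sampled_rows <- sample(nrow(sampling_frame), size = 200, replace = FALSE)
srs <- sampling_frame[sampled_rows, , drop = FALSE]

head(srs)
```

If the frame omits part of the population, the omitted units have zero chance of selection no matter how the draw is made.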

Posttest Design: A classical experimental design without a pretest, relying on random assignment and large sample sizes to infer causal relations.
- Researchers must establish that the treatment occurs before the dependent variable is measured to support causal inferences.
- Random assignment mitigates the impact of pre-experiment differences across groups, but without a pretest researchers cannot measure how large those differences actually were.
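
A posttest-only comparison might look like the following sketch. All values are simulated: the number of participants, the outcome scale, and the 5-point treatment effect are assumptions made purely for illustration.

```r
set.seed(123)

n <- 200  # hypothetical number of participants

# Random assignment: shuffle an equal number of treatment and control labels.
group <- sample(rep(c("treatment", "control"), each = n / 2))

# Simulated posttest outcome (illustration only): the treatment is assumed
# to raise the outcome by 5 points on average.
posttest <- rnorm(n, mean = 50, sd = 10) + ifelse(group == "treatment", 5, 0)

# With no pretest, the effect estimate is the difference in posttest means.
group_means <- tapply(posttest, group, mean)
group_means["treatment"] - group_means["control"]

# Welch two-sample t-test comparing the groups on the posttest.
t.test(posttest ~ group)
```

Because there is no pretest, nothing in this analysis can show how similar the two groups were before treatment; the design relies on random assignment and a large `n` to make them comparable on average.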