Topic 3 Notes

Truth in Data: Data collection methods that do not rely on chance (random selection or random assignment) can lead to untrustworthy conclusions.
Random Assignment:
- Minimizes bias
- Reduces influence of known and unknown factors
- Enables statistical inference methods

Definition of Randomness:
- More than unpredictability: chance is used deliberately as a statistical tool.
- Obtaining truly random values is challenging (computers typically generate pseudorandom numbers).

Population vs. Sample:
- A population includes all items/subjects of interest.
- A sample is a subset, enabling generalizations about the population.
Key Criteria for Sampling:
1. The sample must be selected randomly.
2. The sample must be selected from the relevant population.

Definition: A census attempts to collect data from every subject in a population.
Challenges:
- Practicality: Some individuals are hard to locate or contact.
- Timeliness: The population can change before the census is complete.
- Expense: Conducting a census is costly.
- Accuracy: Collecting so much data increases the potential for errors.

Definition: An observational study observes individuals and measures variables of interest without imposing treatments.
Types:
- Retrospective: Examines data on past events.
- Prospective: Follows individuals into the future.
Sample Surveys:
- Collect data from a sample to make inferences about the population.
- Like other observational studies, cannot establish cause-and-effect relationships.

Simple Random Sample (SRS):
- Every group of a given size has an equal chance of selection.
- Methods: random number generators, tables of random numbers, drawing cards.
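A minimal sketch of drawing an SRS with a random number generator, here Python's standard `random` module. The population of student ID numbers is hypothetical; `random.Random.sample` draws distinct items without replacement, so every group of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every subset of size n is equally likely."""
    rng = random.Random(seed)  # seed only makes the example reproducible
    # sample() picks n distinct items without replacement
    return rng.sample(population, n)

# Hypothetical population: student ID numbers 1 through 500
students = list(range(1, 501))
sample = simple_random_sample(students, 10, seed=42)
print(sample)
```

Fixing the seed makes the draw repeatable for checking work; leaving it as `None` gives a fresh sample each run.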
Stratified Random Sample:
- Divides population into strata; selects random samples from each.
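- Advantages: Produces less variable estimates than an SRS of the same size when individuals within each stratum are similar (homogeneous) with respect to the variable being measured.
- Disadvantages: More complex to carry out; each individual's stratum must be known in advance.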
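A sketch of a stratified random sample, assuming hypothetical strata (students grouped by grade level) and an equal number drawn from each stratum. Proportional allocation, where larger strata contribute more, is a common alternative not shown here.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a separate SRS of size n_per_stratum from each stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # An independent SRS within each stratum
        sample[name] = rng.sample(members, n_per_stratum)
    return sample

# Hypothetical strata: students grouped by grade level
strata = {
    "9th": [f"9-{i}" for i in range(120)],
    "10th": [f"10-{i}" for i in range(110)],
    "11th": [f"11-{i}" for i in range(100)],
    "12th": [f"12-{i}" for i in range(90)],
}
result = stratified_sample(strata, 5, seed=1)
print(result)
```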
Cluster Sample:
- Divides population into clusters; selects whole clusters randomly.
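- Advantages: Unbiased when clusters are chosen randomly; often cheaper and easier to carry out than an SRS.
- Disadvantages: Estimates can be highly variable if individuals within each cluster are similar to one another (ideally, each cluster resembles the population as a whole).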
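A sketch of a cluster sample using hypothetical homerooms as clusters. Unlike a stratified sample, which takes some individuals from every group, this randomly chooses a few whole groups and includes everyone in them.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters whole clusters and include every member."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names
    sample = []
    for name in chosen:
        sample.extend(clusters[name])  # take the entire cluster
    return chosen, sample

# Hypothetical clusters: 20 homerooms of 25 students each
homerooms = {f"Room {r}": [f"R{r}-S{s}" for s in range(25)] for r in range(101, 121)}
chosen, members = cluster_sample(homerooms, 3, seed=7)
print(chosen, len(members))
```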
Systematic Random Sample: Chooses a random starting point, then selects every kth individual from an ordered list.
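A sketch of a systematic random sample, assuming a hypothetical ordered list of 100 customers and an interval of k = 10. The start is chosen at random among the first k positions; after that, selection is fixed.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting index in 0..k-1
    return population[start::k]

customers = list(range(1, 101))  # hypothetical ordered list of 100 customers
sample = systematic_sample(customers, 10, seed=3)
print(sample)
```

If the list has a repeating pattern that lines up with k, a systematic sample can be unrepresentative, so the ordering of the list matters.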

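Voluntary Response Bias: The sample consists of people who choose to respond; they often have strong opinions, so the sample is usually unrepresentative.
Undercoverage Bias: Some parts of the population have a reduced chance (or no chance) of being included in the sample.
Nonresponse Bias: Individuals chosen for the sample who do not respond differ in important ways from those who do.
Response Bias: Flaws in how data are collected (question wording, reliance on self-reports) lead to inaccurate responses.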

Definition: Sampling variability means that different random samples from the same population produce different values of a statistic.
Sample Size Impact: Larger random samples reduce sampling variability (sampling error) but do not reduce bias.
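A small simulation sketch of the sample size point, assuming a made-up population in which 2,000 of 10,000 people score higher on average and are missed by a flawed sampling frame (undercoverage). The numbers are invented for illustration. As n grows, the spread of SRS means shrinks, but the average error of the undercovered method stays roughly the same.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population of 10,000 scores: 8,000 "reachable" people averaging
# about 70, plus 2,000 people averaging about 85 whom a flawed frame misses.
reachable = [rng.gauss(70, 10) for _ in range(8000)]
missed = [rng.gauss(85, 10) for _ in range(2000)]
population = reachable + missed
true_mean = statistics.fmean(population)

def sample_means(frame, n, reps=500):
    """Means of `reps` simple random samples of size n drawn from `frame`."""
    return [statistics.fmean(rng.sample(frame, n)) for _ in range(reps)]

fair_sd = {}  # spread of sample means under an SRS from the full population
bias = {}     # average error when sampling only from the reachable frame
for n in (10, 100, 1000):
    fair_sd[n] = statistics.stdev(sample_means(population, n))
    bias[n] = statistics.fmean(sample_means(reachable, n)) - true_mean
    print(f"n={n:4d}  SD of SRS means: {fair_sd[n]:.2f}  "
          f"bias of undercovered sample: {bias[n]:+.2f}")
```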

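Experimental Units: The individuals or objects to which treatments are assigned (called subjects when they are people).
Explanatory Variable (Factor): The variable deliberately manipulated by the experimenter.
Response Variable: The outcome measured after treatments are applied.
Confounding Variable: A variable related to both the explanatory and response variables, making it impossible to tell which one caused the observed change in the response.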

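Key Features of a Well-Designed Experiment:
- Comparison: At least two treatment groups are compared.
- Random Assignment: Treatments are assigned to experimental units at random.
- Replication: Enough experimental units in each group so that treatment effects can be distinguished from chance variation.
- Control: Other variables are kept the same for all groups to minimize confounding.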

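Completely Randomized Design: Treatments are assigned to experimental units completely at random, which tends to balance the effects of uncontrolled variables across the treatment groups.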
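A sketch of random assignment for a completely randomized design, assuming 30 hypothetical plants and three generic treatments. Shuffling the units and then dealing them out in turn keeps group sizes equal (or within one of each other).

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle units, then deal them out to treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # a random order makes every assignment equally likely
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Dealing in rotation keeps group sizes within one of each other
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical example: 30 plants, three treatments
plants = [f"plant{i}" for i in range(1, 31)]
groups = completely_randomized(plants, ["treatment A", "treatment B", "treatment C"], seed=5)
print(groups)
```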

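Control Group: A group of experimental units that receives no treatment or a placebo, serving as a baseline for comparison.
Placebo: An inactive treatment that looks identical to the real one.
Blinding: Keeping treatment assignments hidden to avoid bias. In a single-blind experiment, either the subjects or the people evaluating them do not know who received which treatment; in a double-blind experiment, neither knows.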

Testing Fertilizer Claims:
- Factor: Fertilizer (3 levels: none, ½ dose, full dose).
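- Control: Keep soil, location, water, and sunlight similar for all experimental units.
- Random Assignment: Assign experimental units to the three fertilizer levels at random so that differences in the response can be attributed to the fertilizer.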

Matched Pairs Design: Subjects are arranged in pairs based on relevant characteristics; treatments are then randomly assigned within each pair.
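A sketch of randomizing within pairs, assuming hypothetical subjects already matched on relevant characteristics. A fair coin flip for each pair decides which member gets treatment A; the other gets treatment B.

```python
import random

def matched_pairs_assign(pairs, treatments=("A", "B"), seed=None):
    """For each pair, flip a fair coin to decide which member gets which treatment."""
    rng = random.Random(seed)
    assignments = []
    for first, second in pairs:
        if rng.random() < 0.5:  # "heads": first member gets treatments[0]
            assignments.append({treatments[0]: first, treatments[1]: second})
        else:                   # "tails": second member gets treatments[0]
            assignments.append({treatments[0]: second, treatments[1]: first})
    return assignments

# Hypothetical pairs matched on, e.g., age and fitness level
pairs = [("Ana", "Bea"), ("Cal", "Dev"), ("Eli", "Fay"), ("Gus", "Hal")]
plan = matched_pairs_assign(pairs, seed=11)
print(plan)
```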

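Statistical Significance: An observed difference between treatment groups that is too large to be plausibly explained by chance alone is called statistically significant. In a well-designed randomized experiment, a statistically significant difference supports the conclusion that the treatment caused the change.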
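One way to judge "unlikely due to chance" is a randomization test, sketched here under stated assumptions: the responses are made up, and the test is one-sided (is the treatment mean higher?). If the treatment had no effect, each unit's response would be the same no matter which group it landed in, so the code checks every possible way the random assignment could have split the pooled responses and reports the fraction of splits with a difference at least as large as the one observed. Enumerating every split is only feasible for small groups; larger experiments typically simulate many random re-assignments instead.

```python
from itertools import combinations
from statistics import fmean

def randomization_p_value(treatment, control):
    """Exact one-sided randomization test for a difference in means."""
    pooled = treatment + control
    n_t = len(treatment)
    observed = fmean(treatment) - fmean(control)
    count = total = 0
    # Every way to choose which n_t pooled responses form the "treatment" group
    for idx in combinations(range(len(pooled)), n_t):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if fmean(t) - fmean(c) >= observed - 1e-9:  # tolerance for float rounding
            count += 1
    return observed, count / total

# Made-up responses for illustration only
treated = [20, 22, 23, 24, 25]
controls = [12, 14, 15, 16, 18]
observed, p = randomization_p_value(treated, controls)
print(f"observed difference = {observed:.1f}, p-value = {p:.4f}")
```

Here only the actual split, out of 252 possible splits, gives a difference as large as 7.8, so p = 1/252, well below common cutoffs such as 0.05.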

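Generalization: Results can be generalized to a larger population only if the experimental units are representative of that population; selecting the units at random from the population is the most reliable way to achieve this. Random assignment, in contrast, is what supports cause-and-effect conclusions.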