Topic 3: Collecting Data
3.1 Importance of Data Collection
Truth in Data: Conclusions are only as trustworthy as the data behind them. Data collection methods that do not rely on chance tend to produce biased data and untrustworthy conclusions.
Random Assignment:
Minimizes bias
Reduces influence of known and unknown factors
Enables statistical inference methods
3.2 Understanding Randomness
Definition of Randomness:
More than unpredictability: randomness is used deliberately as a statistical tool (e.g., to select samples and to assign treatments).
Obtaining truly random values is challenging.
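One practical reason truly random values are hard to obtain: software generators are pseudo-random, meaning they follow a deterministic rule from a starting seed. The short Python sketch below illustrates this with the standard `random` module; the seed value 2024 and the die-roll range are arbitrary choices made for illustration.

```python
import random

# A seeded pseudo-random generator is deterministic: the same seed
# reproduces the same "random" sequence.
rng_a = random.Random(2024)  # the seed 2024 is an arbitrary choice
rng_b = random.Random(2024)

# Ten simulated rolls of a six-sided die from each generator.
draws_a = [rng_a.randint(1, 6) for _ in range(10)]
draws_b = [rng_b.randint(1, 6) for _ in range(10)]

print(draws_a)
print(draws_a == draws_b)  # same seed, same sequence
```

For most classroom sampling tasks a pseudo-random generator is good enough; fixing the seed also makes a selection reproducible.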
3.3 Planning a Study
Population vs. Sample:
A population includes all items/subjects of interest.
A sample is a subset of the population, used to make generalizations about the whole population.
Key Criteria for Sampling:
The sample must be selected at random.
The sample must be drawn from the population of interest.
3.4 Parameters and Statistics
Parameter: Numerical summary of a whole population.
Statistic: Numerical summary derived from a sample.
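A minimal Python sketch of the parameter/statistic distinction. The population here is simulated (10,000 hypothetical heights in cm), so every number in it is an assumption made for illustration, not real data.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 individuals.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a summary of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a random sample of 100.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, and the statistic serves as an estimate of it.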
3.5 Census
Definition: A census attempts to collect data from every subject in a population.
Challenges:
Practicality: Hard to locate individuals.
Timeliness: Populations can change over time.
Expense: High costs associated with conducting a census.
Accuracy: Potential for errors in data collection.
3.6 Observational Studies
Definition: Researchers observe individuals and record data without imposing treatments.
Types:
Retrospective: Examines data that already exist (past records).
Prospective: Follows individuals into the future.
Sample Surveys:
Collect data to infer about the population.
Like all observational studies, cannot establish causal relationships.
3.7 Experiments
Definition: Researchers deliberately assign different conditions (treatments) to participants and measure the response.
Purpose: To establish causal relationships, which is only justified when the experiment is well designed.
3.8 Random Sampling Methods
Sampling Without Replacement: Each selected item cannot be chosen again.
Sampling With Replacement: An item can be selected multiple times.
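The two approaches can be sketched in Python with the standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The five-item list and the seed are made up for illustration.

```python
import random

rng = random.Random(7)  # arbitrary seed for reproducibility
items = ["A", "B", "C", "D", "E"]

# Without replacement: no item can appear more than once.
without_replacement = rng.sample(items, k=3)

# With replacement: every draw is made from the full list, so repeats
# are possible. Drawing 8 times from 5 items guarantees at least one repeat.
with_replacement = rng.choices(items, k=8)

print(without_replacement)
print(with_replacement)
```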
3.9 Understanding Bias
Bias: A systematic tendency of a data collection method to favor certain outcomes, so that results consistently miss the truth in the same direction.
Avoiding Bias: Use random selection when sampling.
3.10 Sampling Techniques
Simple Random Sample (SRS):
Every group of a given size has an equal chance of selection.
Methods: random number generators, tables of random numbers, drawing cards.
Stratified Random Sample:
Divides population into strata; selects random samples from each.
Advantages: Reduces the variability of estimates when the units within each stratum are similar (homogeneous).
Disadvantages: More complex to carry out; each unit's stratum must be known in advance.
Cluster Sample:
Divides population into clusters; selects whole clusters randomly.
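Advantages: Unbiased when clusters are chosen at random, and usually cheaper and easier to carry out.
Disadvantages: High variability if the units within each cluster are alike (homogeneous) and the clusters differ from one another.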
Systematic Random Sample: Picks a random starting point, then selects units at a fixed interval (every k-th unit).
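The four techniques above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the population of 120 students, the field names (`grade`, `homeroom`), and all function names are hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(42)  # arbitrary seed for reproducibility

# Hypothetical population: 120 students, each with an id, a grade level
# (used as the stratum), and a homeroom (used as the cluster).
population = [
    {"id": i, "grade": 9 + i % 4, "homeroom": i // 10}
    for i in range(120)
]

def simple_random_sample(units, n):
    """Every group of n units is equally likely to be chosen."""
    return rng.sample(units, n)

def stratified_sample(units, key, n_per_stratum):
    """Take a separate SRS of n_per_stratum units from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[unit[key]].append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(units, key, n_clusters):
    """Randomly pick whole clusters and keep every unit in them."""
    cluster_ids = sorted({unit[key] for unit in units})
    picked = set(rng.sample(cluster_ids, n_clusters))
    return [unit for unit in units if unit[key] in picked]

def systematic_sample(units, step):
    """Random start within the first `step` units, then every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]

print(len(simple_random_sample(population, 12)))
print(len(stratified_sample(population, "grade", 3)))
print(len(cluster_sample(population, "homeroom", 2)))
print(len(systematic_sample(population, 10)))
```

Note the contrast between the stratified and cluster versions: the first samples some units from every group, the second keeps all units from a few randomly chosen groups.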
3.11 Multi-Stage Sampling
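Definition: Combines two or more sampling methods in successive stages (e.g., randomly select clusters, then take an SRS within each selected cluster); useful for large or complex populations.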
3.12 Bias in Sampling
Voluntary Response Bias: Sample consists of volunteers; often unrepresentative.
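Undercoverage Bias: Some portions of the population are less likely to be included in the sample, or are left out entirely.
Nonresponse Bias: Selected individuals who do not respond may differ systematically from those who do.
Response Bias: Flaws in how the data are collected distort the answers (e.g., leading question wording, inaccurate self-reports).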
3.13 Sampling Variability
Definition: Different random samples from the same population yield different values of a statistic.
Sample Size Impact: Larger samples reduce sampling error but do not eliminate bias.
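A small simulation can make both points concrete: sample means vary less as the sample size grows, but a flawed sampling frame stays biased at every sample size. The sketch below is illustrative only; the simulated population, the cutoff of 45 used to mimic undercoverage, and the helper name `sample_means` are all assumptions.

```python
import random
import statistics

rng = random.Random(3)  # arbitrary seed for reproducibility

# Hypothetical population of 20,000 values with mean near 50.
population = [rng.gauss(50, 12) for _ in range(20_000)]
true_mean = statistics.mean(population)

# A flawed sampling frame (undercoverage): only values above 45 can be reached.
biased_frame = [x for x in population if x > 45]

def sample_means(frame, n, repeats=500):
    """Means of `repeats` independent random samples of size n."""
    return [statistics.mean(rng.sample(frame, n)) for _ in range(repeats)]

for n in (10, 100, 1000):
    fair = sample_means(population, n)
    biased = sample_means(biased_frame, n)
    print(
        f"n={n:4d}  spread of fair means={statistics.stdev(fair):.2f}  "
        f"average biased mean - true mean={statistics.mean(biased) - true_mean:+.2f}"
    )
```

The spread column should shrink as n increases, while the gap between the biased means and the true mean should stay roughly the same.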
3.14 Experimental Design
Experimental Units: Participants or objects receiving treatments.
Explanatory Variable: Intentionally manipulated variable in an experiment.
Response Variable: Outcome measured post-treatment.
Confounding Variable: A variable related to both the explanatory and response variables, making it impossible to tell which one caused the observed effect.
3.15 Well-Designed Experiments
Key Features:
Comparative: At least two treatment groups.
Randomized: Random assignment of treatments.
Replication: Sufficient experimental units in each group.
Control: Keep potential confounding variables the same across treatment groups.
3.16 Completely Randomized Design
Treatments are assigned to experimental units entirely at random, which tends to balance uncontrolled variables across the treatment groups.
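One way to carry out a completely randomized assignment is to shuffle the units and deal them out to the treatments in turn. The Python sketch below assumes 12 hypothetical units and two treatment labels; the function name `completely_randomized` is made up for illustration.

```python
import random

rng = random.Random(11)  # arbitrary seed for reproducibility

def completely_randomized(units, treatments):
    """Shuffle the units, then deal them out to the treatments in turn."""
    shuffled = units[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = [f"unit_{i}" for i in range(12)]
assignment = completely_randomized(units, ["control", "treatment"])

for treatment, members in assignment.items():
    print(treatment, members)
```

Dealing in turn keeps the group sizes equal (or within one unit of each other) while leaving group membership up to chance.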
3.17 Control Measures
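Control Group: A group of experimental units given no treatment or a placebo, used as a baseline for comparison.
Placebo: An inactive treatment that looks identical to the real one.
Blinding: Reduces bias by keeping treatment assignments hidden from the subjects (single-blind) or from both the subjects and those who evaluate them (double-blind).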
3.18 Example Experimental Setup
Testing Fertilizer Claims:
Factor: Fertilizer (3 levels: none, ½ dose, full dose).
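Control: Keep soil, location, water, and sunlight similar across all experimental units.
Random Assignment: Randomly assign the fertilizer levels to experimental units so that uncontrolled differences tend to balance out.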
3.19 Blocking
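Definition: Experimental units are first grouped into blocks of similar units, and treatments are then randomly assigned within each block. This reduces variability caused by the blocking variable.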
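A minimal Python sketch of a randomized block assignment. The eight plots, the two fields used as blocks, the treatment labels "A" and "B", and the function name are all hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(5)  # arbitrary seed for reproducibility

def randomized_block_design(units, block_of, treatments):
    """Group units into blocks, then randomize treatments separately in each block."""
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomization happens inside the block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical units: 8 plots, blocked by field ("north" or "south").
plots = [("north", i) for i in range(4)] + [("south", i) for i in range(4)]
assignment = randomized_block_design(plots, lambda p: p[0], ["A", "B"])

for plot, treatment in sorted(assignment.items()):
    print(plot, treatment)
```

Because each block receives every treatment equally often, differences between blocks cannot be mistaken for treatment effects.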
3.20 Matched Pairs Design
Subjects arranged in pairs based on relevant characteristics; treatments randomly assigned within pairs.
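A matched pairs assignment might be sketched in Python like this. The six subjects, the use of age as the matching characteristic, and the pairing rule (sort, then pair neighbors) are assumptions chosen for illustration.

```python
import random

rng = random.Random(9)  # arbitrary seed for reproducibility

# Hypothetical subjects with a matching characteristic (age).
subjects = [("s1", 21), ("s2", 45), ("s3", 23), ("s4", 44), ("s5", 33), ("s6", 31)]

# Sort on the characteristic so neighbors are similar, then pair them up.
ordered = sorted(subjects, key=lambda s: s[1])
pairs = [ordered[i:i + 2] for i in range(0, len(ordered), 2)]

# Within each pair, a coin flip decides who gets the treatment.
assignment = {}
for first, second in pairs:
    if rng.random() < 0.5:
        first, second = second, first
    assignment[first[0]] = "treatment"
    assignment[second[0]] = "control"

for pair in pairs:
    print([(name, age, assignment[name]) for name, age in pair])
```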
3.21 Statistically Significant Results
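The observed differences are too large to be plausibly explained by chance alone. In a well-designed randomized experiment, a statistically significant result supports the conclusion that the treatment caused the difference.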
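One way to judge whether a difference is "unlikely due to chance" is a permutation (randomization) test, a technique these notes do not name and which is shown here only as an illustration. The response values below are invented, and the function name `permutation_p_value` is hypothetical. The idea: if the treatment did nothing, the group labels are arbitrary, so reshuffling them shows what differences chance alone produces.

```python
import random
import statistics

rng = random.Random(13)  # arbitrary seed for reproducibility

# Hypothetical responses from a two-group randomized experiment.
control = [12.1, 11.4, 13.0, 12.5, 11.9, 12.2]
treated = [13.8, 14.1, 13.2, 14.6, 13.5, 14.0]
observed = statistics.mean(treated) - statistics.mean(control)

def permutation_p_value(a, b, observed_diff, reps=5000):
    """Share of random relabelings whose difference is at least the observed one."""
    pooled = a + b
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if diff >= observed_diff:
            count += 1
    return count / reps

p_value = permutation_p_value(treated, control, observed)
print(f"observed difference: {observed:.2f}")
print(f"one-sided p-value:   {p_value:.4f}")
```

A small p-value means few random relabelings matched the observed difference, which is what "statistically significant" expresses.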
3.22 Generalizing Experiment Results
Results can be generalized to a larger group only if the experimental units are representative of that group. Selecting the units at random from the group makes this more likely.