Topic 3: Collecting Data

3.1 Importance of Data Collection

  • Truth in Data: Conclusions are only as trustworthy as the data behind them. Collection methods that do not use chance (random selection or random assignment) are prone to bias.

  • Random Assignment:

    • Minimizes bias

    • Reduces influence of known and unknown factors

    • Enables statistical inference methods

3.2 Understanding Randomness

  • Definition of Randomness:

    • Not just unpredictability; used as a statistical tool.

    • Obtaining truly random values is challenging.

3.3 Planning a Study

  • Population vs. Sample:

    • A population includes all items/subjects of interest.

    • A sample is a subset of the population, used to draw conclusions about the whole population.

  • Key Criteria for Sampling:

    1. The sample must be selected at random.

    2. The sample must be drawn from the relevant population.

3.4 Parameters and Statistics

  • Parameter: Numerical summary of a whole population.

  • Statistic: Numerical summary derived from a sample.
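
The distinction can be illustrated with a minimal sketch. The population below (10,000 simulated heights) is hypothetical and generated only for illustration; in practice the parameter is usually unknown and the statistic is used to estimate it.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 individuals.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: numerical summary of the whole population.
mu = statistics.mean(population)

# Statistic: numerical summary of one random sample from that population.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu:.2f}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

The two values are typically close but not equal; a different sample would give a different statistic, while the parameter stays fixed.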

3.5 Census

  • Definition: A census attempts to collect data from every subject in a population.

  • Challenges:

    • Practicality: Some individuals are hard to locate or contact.

    • Timeliness: Populations can change over time.

    • Expense: High costs associated with conducting a census.

    • Accuracy: Potential for errors in data collection.

3.6 Observational Studies

  • Definition: Researchers observe and record data on individuals without imposing treatments.

  • Types:

    • Retrospective: Examines data on events that have already happened.

    • Prospective: Follows individuals into the future.

  • Sample Surveys:

    • Collect data from a sample to draw inferences about the population.

    • Cannot establish causal relationships.

3.7 Experiments

  • Definition: Researchers deliberately assign different conditions (treatments) to participants and measure the responses.

  • Purpose: A well-designed experiment can establish a causal relationship.

3.8 Random Sampling Methods

  • Sampling Without Replacement: Each selected item cannot be chosen again.

  • Sampling With Replacement: An item can be selected multiple times.
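
A small sketch of the two approaches using Python's standard `random` module. The list of ten numbered items is a hypothetical stand-in for a population.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

items = list(range(1, 11))  # hypothetical population of 10 numbered items

# Without replacement: random.sample never returns the same item twice.
without_replacement = rng.sample(items, 5)

# With replacement: random.choices may return the same item more than once.
with_replacement = rng.choices(items, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```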

3.9 Understanding Bias

  • Bias: A systematic tendency for a sampling or measurement method to favor certain outcomes.

  • Avoiding Bias: Use random selection when sampling.

3.10 Sampling Techniques

  • Simple Random Sample (SRS):

    • Every possible sample of a given size has an equal chance of being selected.

    • Methods: random number generators, tables of random numbers, drawing cards.

  • Stratified Random Sample:

    • Divides population into strata; selects random samples from each.

    • Advantages: Reduces variability when the units within each stratum are similar (homogeneous).

    • Disadvantages: Complexity in execution.

  • Cluster Sample:

    • Divides population into clusters; selects whole clusters randomly.

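    • Advantages: Often easier and cheaper to carry out; unbiased when clusters are chosen at random.

    • Disadvantages: High variability if the units within each cluster are similar to one another, because a single cluster then fails to reflect the whole population.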

  • Systematic Random Sample: Starts at a randomly chosen point and then selects units at a fixed interval (every k-th unit).
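
The four techniques above can be sketched side by side. Everything here is an illustrative assumption: the 120-student population, the use of grade as the stratum and homeroom as the cluster, the helper function names, and the choice to take the same number of units from every stratum.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, rng):
    # Every possible sample of size n is equally likely.
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    # Group units into strata, then take a random sample from each stratum.
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(units, cluster_of, n_clusters, rng):
    # Group units into clusters, then keep every unit in randomly chosen clusters.
    clusters = defaultdict(list)
    for u in units:
        clusters[cluster_of(u)].append(u)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [u for c in picked for u in clusters[c]]

def systematic_sample(units, k, rng):
    # Random starting point among the first k units, then every k-th unit.
    start = rng.randrange(k)
    return units[start::k]

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Hypothetical population: (student id, grade 9-12, homeroom 0-5).
students = [(i, 9 + i % 4, i // 20) for i in range(120)]

srs = simple_random_sample(students, 12, rng)
strat = stratified_sample(students, lambda s: s[1], 3, rng)  # 3 per grade
clus = cluster_sample(students, lambda s: s[2], 2, rng)      # 2 whole homerooms
syst = systematic_sample(students, 10, rng)                  # every 10th student

print(len(srs), len(strat), len(clus), len(syst))
```

Note the structural difference: the stratified sample contains a few students from every grade, while the cluster sample contains every student from a few homerooms.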

3.11 Multi-Stage Sampling

  • Definition: Combines several sampling methods in successive stages, for example randomly selecting clusters and then taking a random sample within each selected cluster.

3.12 Bias in Sampling

  • Voluntary Response Bias: Sample consists of volunteers; often unrepresentative.

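  • Undercoverage Bias: Part of the population is left out or has a reduced chance of being included.

  • Nonresponse Bias: Selected individuals cannot be reached or decline to respond, and nonrespondents may differ from respondents.

  • Response Bias: Problems in how the data are gathered (question wording, inaccurate self-reports) distort the answers given.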

3.13 Sampling Variability

  • Definition: Different random samples from the same population yield different values of a statistic.

  • Sample Size Impact: Larger samples reduce sampling error but do not eliminate bias.
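
A simulation sketch of sampling variability, assuming a hypothetical population of 5,000 values. It draws many random samples at two sample sizes and measures how much the sample means vary from sample to sample.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is reproducible

# Hypothetical population of 5,000 values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5_000)]

def spread_of_sample_means(n, repeats=500):
    # Draw `repeats` random samples of size n and return the standard
    # deviation of their means (how much the statistic varies).
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=100: {spread_large:.2f}")
```

The larger samples give means that cluster more tightly. This only addresses chance variation: if the sampling method itself were biased, a larger sample would not correct it.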

3.14 Experimental Design

  • Experimental Units: Participants or objects receiving treatments.

  • Explanatory Variable: Intentionally manipulated variable in an experiment.

  • Response Variable: Outcome measured post-treatment.

  • Confounding Variable: A variable related to both the explanatory and response variables, so its effect cannot be separated from the effect of the treatment.

3.15 Well-Designed Experiments

  • Key Features:

    • Comparative: At least two treatment groups.

    • Randomized: Random assignment of treatments.

    • Replication: Sufficient experimental units in each group.

    • Control: Minimize confounding variables.

3.16 Completely Randomized Design

  • Treatments are assigned to experimental units entirely at random, which tends to balance uncontrolled variables across the treatment groups.

3.17 Control Measures

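  • Control Group: A group of experimental units given no treatment or a placebo, used as a baseline for comparison.

  • Placebo: An inactive treatment made to look identical to the real one.

  • Blinding: Keeps treatment assignments hidden from subjects, evaluators, or both, so that expectations do not bias the results.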

3.18 Example Experimental Setup

  • Testing Fertilizer Claims:

    • Factor: Fertilizer (3 levels: none, ½ dose, full dose).

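    • Control: Keep soil, location, water, and sun conditions similar for all plants.

    • Random Assignment: Assign fertilizer levels to plants at random so that uncontrolled differences tend to balance out.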
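
One way the random assignment for this setup might look, as a completely randomized design. The 12 plots and their labels are hypothetical; the notes do not say how many experimental units are used.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is reproducible

plots = [f"plot_{i:02d}" for i in range(1, 13)]  # hypothetical 12 plots
levels = ["none", "half dose", "full dose"]      # the three fertilizer levels

# Shuffle the plots, then cut the shuffled list into equal-sized groups.
shuffled = plots[:]
rng.shuffle(shuffled)

group_size = len(shuffled) // len(levels)
assignment = {
    level: sorted(shuffled[i * group_size:(i + 1) * group_size])
    for i, level in enumerate(levels)
}

for level, group in assignment.items():
    print(f"{level:>9}: {group}")
```

Each level receives four plots (replication), and chance alone decides which plots those are.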

3.19 Blocking

  • Definition: Experimental units are grouped into blocks of similar units, and treatments are randomly assigned within each block, which reduces variability in the results.
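
A sketch of a randomized block assignment. The blocking variable (sunny versus shady location), the six plots per block, and the treatment names are assumptions chosen for illustration.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is reproducible

treatments = ["none", "half dose", "full dose"]

# Hypothetical blocks: plots grouped by a variable expected to affect growth.
blocks = {
    "sunny": [f"sunny_{i}" for i in range(1, 7)],
    "shady": [f"shady_{i}" for i in range(1, 7)],
}

assignment = {}
for block_name, units in blocks.items():
    # Randomize separately inside each block.
    order = units[:]
    rng.shuffle(order)
    # Deal treatments out in rotation so each appears equally often per block.
    for position, unit in enumerate(order):
        assignment[unit] = treatments[position % len(treatments)]

for unit in sorted(assignment):
    print(unit, "->", assignment[unit])
```

Because every treatment appears equally often in each block, differences between sunny and shady plots cannot be mistaken for a treatment effect.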

3.20 Matched Pairs Design

  • Subjects arranged in pairs based on relevant characteristics; treatments randomly assigned within pairs.
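
A sketch of a matched pairs assignment, assuming six hypothetical subjects matched on age. Subjects are sorted by the matching characteristic, adjacent subjects form a pair, and a virtual coin flip decides who in each pair receives the treatment.

```python
import random

rng = random.Random(6)  # fixed seed so the sketch is reproducible

# Hypothetical subjects: (name, age). Age is the matching characteristic.
subjects = [("A", 34), ("B", 61), ("C", 29), ("D", 58), ("E", 45), ("F", 47)]

# Pair subjects with similar ages by sorting and taking neighbors.
by_age = sorted(subjects, key=lambda s: s[1])
pairs = [by_age[i:i + 2] for i in range(0, len(by_age), 2)]

assignment = {}
for first, second in pairs:
    # Coin flip within the pair decides who is treated.
    if rng.random() < 0.5:
        first, second = second, first
    assignment[first[0]] = "treatment"
    assignment[second[0]] = "control"

for pair in pairs:
    print([(name, age, assignment[name]) for name, age in pair])
```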

3.21 Statistically Significant Results

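  • An observed difference is statistically significant when it is too large to be plausibly explained by chance alone. In a well-designed randomized experiment, a significant result supports a causal conclusion.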
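
One way to judge whether a difference is "unlikely due to chance" is a randomization (permutation) test. The notes do not name a specific method, so this is a swapped-in illustration, and the plant heights are made-up data. The idea: if the treatment had no effect, the group labels would be arbitrary, so reshuffling them shows how large a difference chance alone tends to produce.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical plant heights (cm) for two groups.
control = [20.1, 19.8, 21.0, 20.5, 19.6, 20.3]
treated = [22.4, 21.9, 23.1, 22.0, 22.8, 21.7]

observed = statistics.mean(treated) - statistics.mean(control)

pooled = control + treated
n_treated = len(treated)
reps = 5_000
extreme = 0
for _ in range(reps):
    # Reassign the group labels at random and recompute the difference.
    rng.shuffle(pooled)
    diff = statistics.mean(pooled[:n_treated]) - statistics.mean(pooled[n_treated:])
    if diff >= observed - 1e-9:  # small tolerance for floating-point rounding
        extreme += 1

# Estimated share of random relabelings at least as extreme as what was observed.
p_value = extreme / reps

print(f"observed difference = {observed:.2f} cm")
print(f"approximate one-sided p-value = {p_value:.4f}")
```

A small p-value means that random relabeling rarely produces a difference as large as the observed one, which is what "statistically significant" expresses.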

3.22 Generalizing Experiment Results

  • Results can be generalized to a larger group only if the experimental units are representative of that group. Randomly selecting the units improves representativeness.