Study Notes on Collecting Data

CHAPTER 4: COLLECTING DATA

POPULATION

Population refers to the entire group of individuals about which we want to make conclusions.

SAMPLE

A sample is a subset of the population, selected to represent the population in the study.

HOW TO CHOOSE AN SRS?

1. Technology: Random Number Generator (RNG)

Label: Assign each individual a numerical label from 1 to N.
Randomize: Use an RNG to generate n different integers from 1 to N (ignoring any repeats).
Select: Choose individuals corresponding to the selected integers.
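The three steps above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name choose_srs, the population size of 50, and the seed are assumptions made for the example.

```python
import random

def choose_srs(population_size, sample_size, seed=None):
    """Return sample_size distinct labels drawn at random from 1..population_size."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    # random.sample draws without replacement, so no repeats can occur
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical population of 50 labelled individuals, sample of 5
labels = choose_srs(population_size=50, sample_size=5, seed=1)
print(labels)
```

Because the draw is made without replacement, the "ignoring any repeats" step is handled automatically.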

2. Table D

Label: Assign each individual a distinct numerical label with the same number of digits (e.g., 01 to NN if using two digits, or 001 to NNN if using three digits).
Randomize: Read consecutive groups of digits of that length from Table D, ignoring repeats and any groups that do not match a label, until the desired sample size is reached.
Select: Choose individuals corresponding to the randomly selected labels.
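The reading procedure can be sketched as below. The digit string is made up for illustration and is not copied from Table D; the function name sample_from_digits and its parameters are likewise assumptions.

```python
def sample_from_digits(digits, max_label, sample_size, width=2):
    """Read consecutive width-digit groups; keep valid, unused labels."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spaces
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # skip groups outside 1..max_label and labels already chosen
        if 1 <= label <= max_label and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

# Made-up line of digits; individuals are labelled 01 to 30
line = "07 45 12 88 12 03 61 29 30 15"
picked = sample_from_digits(line, max_label=30, sample_size=4)
print(picked)  # [7, 12, 3, 29]
```

Here 45, 88, and 61 are skipped because no individual has those labels, and the second 12 is skipped as a repeat.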

3. Slips of Paper

Label: Write each individual's name or label on identical slips of paper, one slip per individual.
Randomize: Place the slips in a bowl or hat, mix them well, and draw n slips one at a time without replacement.
Select: Choose the individuals whose slips were drawn.

TYPES OF STUDIES

1. Observational Studies

Involve observing individuals and measuring variables of interest without attempting to influence the responses.

2. Experimental Studies

Involve deliberately imposing treatments (conditions) on individuals to measure their responses.

3. Stratified Random Sampling

Divide the population into strata (non-overlapping groups of individuals that share a characteristic expected to affect responses).
Choose a separate SRS from each stratum and combine these SRSs into the overall sample. When the strata are chosen well, this yields more precise estimates of unknown population values than a Simple Random Sample (SRS) of the same size.
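A minimal sketch of stratified random sampling follows. The strata names, the member labels, and the function name stratified_sample are hypothetical, chosen only for the example.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate random sample from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))  # no repeats within a stratum
    return sample

# Hypothetical strata: two grade levels with six students each
strata = {
    "freshmen": ["F1", "F2", "F3", "F4", "F5", "F6"],
    "seniors": ["S1", "S2", "S3", "S4", "S5", "S6"],
}
sample = stratified_sample(strata, per_stratum=2, seed=3)
print(sample)
```

Every stratum is guaranteed to be represented, which an SRS of the whole population cannot promise.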

4. Matched Pairs Design

A common experimental design for comparing two treatments, creating blocks of size 2.
Two similar experimental units are paired, and treatments are randomly assigned within each pair. Alternatively, each unit receives both treatments in random order.
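Random assignment within pairs can be sketched as follows. The unit labels, the treatment names "A" and "B", and the function name assign_within_pairs are assumptions for illustration.

```python
import random

def assign_within_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomly decide which unit gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # the chance step, repeated separately for every pair
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment

# Hypothetical pairs of similar experimental units
pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
assignment = assign_within_pairs(pairs, seed=7)
print(assignment)
```

For the version in which each unit receives both treatments, the same shuffle would instead set the order of the two treatments for that unit.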

SAMPLING WELL

1. Simple Random Sample (SRS)

Gives every possible sample of a specified size the same chance to be selected.

2. Cluster Sampling

Divides the population into non-overlapping groups (clusters) of individuals that are located near each other.
Randomly selects clusters and includes all individuals in those clusters. Ideally, each cluster is heterogeneous within and similar to the other clusters, so that every cluster mirrors the population.
Saves time and resources.
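A minimal sketch of cluster sampling, assuming hypothetical homerooms as clusters; the function name cluster_sample and all labels are made up for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include every individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # choose cluster names
    individuals = [person for name in chosen for person in clusters[name]]
    return chosen, individuals

# Hypothetical clusters: four homerooms of three students each
clusters = {
    "room 1": ["a1", "a2", "a3"],
    "room 2": ["b1", "b2", "b3"],
    "room 3": ["c1", "c2", "c3"],
    "room 4": ["d1", "d2", "d3"],
}
chosen, individuals = cluster_sample(clusters, n_clusters=2, seed=5)
print(chosen, individuals)
```

Note the contrast with stratified sampling: here only some groups are chosen, but everyone in a chosen group is included.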

3. Systematic Random Sampling

Selects every k-th individual from an ordered list, where k is roughly the population size divided by the desired sample size.
Randomly select a starting point from 1 to k.
Caution: If there is a pattern in the population's order, this method may yield a biased sample.
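The procedure can be sketched as below, assuming individuals are labelled 1 to N and that N is a multiple of the sample size; the function name systematic_sample is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start from 1..k, then take every k-th label."""
    rng = random.Random(seed)
    k = population_size // sample_size  # sampling interval
    start = rng.randint(1, k)           # randint includes both endpoints
    return [start + i * k for i in range(sample_size)]

# Hypothetical list of 100 individuals, sample of 10, so k = 10
labels = systematic_sample(population_size=100, sample_size=10, seed=2)
print(labels)
```

Only the starting point is random; every later choice is fixed by it, which is why a pattern in the list order can bias the sample.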

BASIC PRINCIPLES OF EXPERIMENTAL DESIGN

Sampling without Replacement: An individual, once selected, cannot be chosen again. This is a property of how samples are drawn rather than one of the four design principles that follow.
Comparison: Use designs that compare two or more treatments.
Random Assignment: Ensure that assignments of experimental units to treatments are random, creating approximately equivalent groups pre-treatment.
Control: Keep other variables constant across groups to avoid confounding, aiding in determining treatment effects.
Replication: Repeat each treatment on sufficient experimental units to distinguish treatment effects from chance differences.

COMPLETELY RANDOMIZED DESIGNS

All assignments are made completely at random; every individual has an equal chance of being placed into any treatment group.
Example: Assign 20 companies numbers from 1 to 20. Use a random number generator to select 10 different integers; those companies receive additional lighting (Treatment 1) and the remaining 10 serve as the control (Treatment 2). Compare productivity.
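The lighting example can be sketched in Python. The function name completely_randomized and the seed are assumptions; only the 20 companies and the 10/10 split come from the example above.

```python
import random

def completely_randomized(units, group_size, seed=None):
    """Randomly choose group_size units for treatment; the rest are control."""
    rng = random.Random(seed)
    treatment = set(rng.sample(units, group_size))
    control = [u for u in units if u not in treatment]
    return sorted(treatment), control

companies = list(range(1, 21))  # companies labelled 1 to 20
lighting, control = completely_randomized(companies, group_size=10, seed=4)
print("Additional lighting:", lighting)
print("Control:", control)
```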

RANDOMIZED BLOCK DESIGN

Random assignment of units to treatments occurs separately within each block (grouping based on a known variable).
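Example: For a track experiment, pair runners with similar current speed. Within each pair, randomly assign which runner runs clockwise first and which runs counterclockwise first. Blocking on speed keeps differences in speed from adding variability to the comparison.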
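A minimal sketch of random assignment within blocks follows. The block names, runner labels, and the function name block_assign are hypothetical, and the sketch assumes each block size is a multiple of the number of treatments.

```python
import random

def block_assign(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomize order inside this block only
        for i, unit in enumerate(shuffled):
            # cycle through treatments so each block is split evenly
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks of runners grouped by current speed
blocks = {
    "faster": ["R1", "R2", "R3", "R4"],
    "slower": ["R5", "R6", "R7", "R8"],
}
treatments = ("clockwise first", "counterclockwise first")
assignment = block_assign(blocks, treatments, seed=6)
print(assignment)
```

Each block ends up with the same number of units in every treatment, so the blocking variable cannot be confounded with the treatments.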

VOCABULARY

Confounding: When two variables are intertwined, making it difficult to distinguish their individual effects on the response variable.
Placebo: A treatment that contains no active substance but is designed to appear identical to other treatments in the study.
Treatment: A specific condition applied to experimental units in a study.
Experimental Unit: The entity to which a treatment is assigned.
Subjects: Experimental units that are human beings.
Factors: Explanatory variables manipulated in the study that may affect the response variable.
Levels: Different values of a factor.
Control Group: A baseline used for comparison against treatment groups.
Placebo Effect: A psychological phenomenon where recipients experience a response due to their expectations rather than the treatment itself.
Double-Blind: Neither subjects nor those measuring outcomes know which treatment participants receive.
Single-Blind: Either the subjects or the people measuring outcomes, but not both, are unaware of which treatment is given.
Random Assignment: Experimental units are allocated to treatments using a chance process.

BAD SAMPLES

1. Convenience Sampling

Choosing individuals who are easiest to reach, risking bias.

2. Voluntary Sampling

Individuals opt to join the study upon receiving an open invitation, which can also lead to bias.

BIAS AND ARTEFACTS

Undercoverage: Occurs when certain groups within the population are less likely to be chosen, or cannot be chosen, for the sample.
Nonresponse: When individuals cannot be contacted or refuse to participate after being chosen.
Response Bias: A systematic pattern of inaccurate responses to survey questions.

IDENTIFYING THE PERCENTAGE (P-VALUE)

Steps to Calculate P-Value

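1. Identify the observed difference in means between the two groups.
2. Simulate many random reassignments of the observed responses to the groups, assuming the treatment has no effect, and plot each simulated difference in means on a dot plot.
3. Count how many dots are equal to or exceed the observed difference from step 1.
4. Calculate the percentage of dots that are equal to or above this difference. This percentage estimates the P-value.
5. Compare this percentage to a threshold (typically 5%): if it falls below the threshold, the observed difference is statistically significant.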
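The steps above can be sketched as a simulation. The response values below are made up for illustration, and the function name simulated_p_value, the number of trials, and the seed are assumptions.

```python
import random
from statistics import mean

def simulated_p_value(group_a, group_b, trials=2000, seed=None):
    """Estimate how often chance alone gives a difference at least as large."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)  # observed difference in means
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)              # re-randomize who is in which group
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        if mean(new_a) - mean(new_b) >= observed:
            count += 1                   # a "dot" at or beyond the observed value
    return observed, count / trials

# Made-up responses for two treatment groups
group_a = [12, 15, 14, 16, 13]
group_b = [10, 11, 9, 12, 10]
observed, p_value = simulated_p_value(group_a, group_b, seed=8)
print(f"observed difference = {observed:.1f}, estimated P-value = {p_value:.3f}")
print("statistically significant at 5%:", p_value < 0.05)
```

Each pass through the loop corresponds to one dot on the dot plot, and the returned proportion is the percentage computed in the fourth step.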

THE SCOPE OF INFERENCE

Criteria for Establishing Causation Without Experimental Evidence

The association between the explanatory and response variable is strong.
The association is consistent across various studies.
Larger values of the explanatory variable correlate with stronger responses.
The cause precedes the effect in time.
The proposed cause is plausible.

ETHICS IN DATA COLLECTION

All planned studies must undergo review by an institutional review board to protect the subjects' well-being.
Informed consent must be obtained from all participants.
Account for sampling variability when interpreting results: estimate ± margin of error gives an interval of plausible values for the population value.
Maintain the confidentiality of individual data; report only statistical summaries for groups.

INFERENCE GUIDELINES

If individuals are randomly selected from the population: Inference about the population is justified.
If individuals are randomly assigned to treatments: Inference about cause and effect is justified.
Combining the two: With both, cause-and-effect conclusions can be generalized to the population. With neither, the results describe only the individuals studied, and no cause-and-effect conclusion is justified.
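The standard pairing (random selection supports inference about the population, random assignment supports inference about cause and effect) can be sketched as a small lookup. The function name scope_of_inference and the wording of the returned strings are assumptions for illustration.

```python
def scope_of_inference(random_selection, random_assignment):
    """Return the two conclusions a study design supports."""
    population = (
        "can generalize to the population"
        if random_selection
        else "applies only to individuals like those in the study"
    )
    causation = (
        "can conclude cause and effect"
        if random_assignment
        else "cannot conclude cause and effect"
    )
    return population, causation

# All four combinations of selection and assignment
for selected in (True, False):
    for assigned in (True, False):
        print(selected, assigned, scope_of_inference(selected, assigned))
```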

SUMMARY OF RANDOMNESS IN STUDIES

Random selection and random assignment play different roles: random selection supports generalizing results to the population, and random assignment supports cause-and-effect conclusions. Build whichever the study's goals require into the design from the start.