Population
All individuals that are capable of being chosen
Sample
A chosen subset of a population
Convenience Sample
A sample that is easy to reach (e.g., surveying the next 10 customers making a purchase)
Bias
A consistent over- or underestimate in a specific direction
Simple Random Sample / SRS
Choosing a group from a population so that every individual, and every possible group of the same size, is equally likely to be chosen
Label the individuals [assign numbers]
Randomize [use a random number generator]
Select the individuals with the random numbers
Unbiased; can be easy or difficult to carry out, and is sometimes imprecise
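The label / randomize / select steps above can be sketched in Python. This is only a minimal illustration: the population of 100 numbered individuals, the sample size of 10, and the seed are assumptions made up for the example.

```python
import random

# Step 1 (label): hypothetical population of 100 individuals, numbered 1..100.
population = list(range(1, 101))

# Step 2 (randomize): use a random number generator. The seed is arbitrary and
# is fixed only so the example is repeatable.
rng = random.Random(1)

# Step 3 (select): random.sample draws without replacement, so every group of
# 10 individuals is equally likely to be chosen.
srs = rng.sample(population, k=10)
print(sorted(srs))
```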
Stratified Random Sample
Splitting a population into groups (strata) and choosing an SRS from each stratum
Each stratum consists of individuals with shared attributes (homogeneous)
“Sample some from all groups”
Unbiased, very precise, low variability when strata are homogeneous
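A stratified random sample might look like the following sketch. The 30 students, the grade-level strata, and the choice of 3 per stratum are hypothetical values chosen for illustration.

```python
import random

rng = random.Random(2)  # arbitrary seed, for repeatability only

# Hypothetical population: 30 students, each tagged with a grade level.
population = [(f"student{i}", grade)
              for i, grade in enumerate(["9th", "10th", "11th"] * 10)]

# Split the population into strata by the shared attribute (grade level).
strata = {}
for name, grade in population:
    strata.setdefault(grade, []).append(name)

# Take an SRS of 3 from every stratum, so each group is represented.
stratified_sample = {grade: rng.sample(members, k=3)
                     for grade, members in strata.items()}
print(stratified_sample)
```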
Low, Low
A sampling method works best if it has (low/high) variability and (low/high) bias
Cluster Sample
A sampling method that splits subjects into representative, heterogeneous groups (clusters)
Performs a census of randomly chosen clusters
“Sample all from some groups”
Unbiased; very high variability when clusters are homogeneous
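For comparison, a cluster sample can be sketched as below. The six homerooms of five students each, and the choice of two clusters, are made-up values for illustration.

```python
import random

rng = random.Random(3)  # arbitrary seed, for repeatability only

# Hypothetical population: 6 homerooms (clusters) of 5 students each.
clusters = {f"room{r}": [f"room{r}-student{s}" for s in range(5)]
            for r in range(6)}

# Randomly choose 2 whole clusters...
chosen_rooms = rng.sample(sorted(clusters), k=2)

# ...then take a census of each chosen cluster (every member is included).
cluster_sample = [student for room in chosen_rooms
                  for student in clusters[room]]
print(chosen_rooms, len(cluster_sample))
```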
Systematic Random Sample
Choose a random starting point and take a sample using equal intervals
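A systematic random sample could be drawn as in this sketch, assuming a hypothetical ordered list of 100 customers and a target sample of 10.

```python
import random

rng = random.Random(4)  # arbitrary seed, for repeatability only

# Hypothetical ordered list of 100 customers; we want a sample of 10.
population = list(range(1, 101))
interval = len(population) // 10  # take every 10th individual

# Random starting point within the first interval, then equal steps.
start = rng.randrange(interval)
systematic_sample = population[start::interval]
print(systematic_sample)
```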
Census
Surveying everyone in a population
Undercoverage Bias
Some people are less likely to be chosen: happens before sampling
Nonresponse Bias
People cannot be reached or do not answer a survey; happens during sampling
Response Bias
Problems with the data-gathering instrument or process (lies, leading questions, uniforms)
Parameter
A number that summarizes something about a population
Statistic
A number that summarizes something about a sample
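The parameter / statistic distinction can be shown with a small sketch. The population of 1,000 scores is simulated and entirely hypothetical; in practice the parameter is usually unknown and the statistic is used to estimate it.

```python
import random
import statistics

rng = random.Random(5)  # arbitrary seed, for repeatability only

# Hypothetical population of 1,000 test scores.
population = [rng.randint(400, 1600) for _ in range(1000)]

# Parameter: a number that summarizes the whole population.
parameter = statistics.mean(population)

# Statistic: a number that summarizes a sample (here, an SRS of 50).
sample = rng.sample(population, k=50)
statistic = statistics.mean(sample)
print(parameter, statistic)
```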
Observational Study
A study where no treatment is imposed
Prospective
An observational study that looks ahead
Retrospective
An observational study that “looks back” (uses data that already exists)
Experiment
A study in which a treatment is imposed on the subjects ==> is able to show causation
Explanatory Variable / Factor
What is being changed in an experiment (used to predict the response)
Response Variable
What is being measured (e.g., SAT results)
Confounding variables
Variables other than the explanatory variable that affect the outcome, so their effects cannot be separated from the treatment's
Experimental Units / Subjects
What or who a treatment is imposed on
Treatment
What is done or not done to subjects / experimental units
Each treatment is a level of the explanatory variable, or a combination of levels when there are several explanatory variables
Well Designed Experiment
Comparison (2+ Treatments)
Random assignment (equivalent groups)
Replication (>1 in each treatment group)
Control: keeping other variables constant
Random Assignment
This can show causation:
Label
Randomize
Assign
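The label / randomize / assign steps can be sketched as follows. The 20 volunteer subjects and the even split into two groups are assumptions for illustration.

```python
import random

rng = random.Random(6)  # arbitrary seed, for repeatability only

# Label: hypothetical list of 20 volunteer subjects.
subjects = [f"subject{i}" for i in range(1, 21)]

# Randomize: shuffle a copy so chance alone decides the order.
shuffled = subjects[:]
rng.shuffle(shuffled)

# Assign: the first half gets the treatment, the second half is the control.
treatment_group = shuffled[:10]
control_group = shuffled[10:]
print(treatment_group, control_group)
```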
Placebo Effect
When subjects respond to a fake treatment (placebo) because they believe they are being treated
Blinding
When the subjects or the experimenters (single-blind), or both (double-blind), don’t know which treatment is which
Block Design
(For experiments)
A way to experiment using groups of similar experimental units
Blocking
Separate subjects into blocks
Randomly assign treatments within each block
Variability
If done correctly, blocking reduces _______
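A randomized block design might be sketched like this. The 16 subjects, the age-group blocking variable, and the two treatments "A" and "B" are hypothetical.

```python
import random

rng = random.Random(7)  # arbitrary seed, for repeatability only

# Hypothetical subjects, each with a blocking variable (age group).
subjects = [(f"subject{i}", "under 40" if i < 8 else "40 and over")
            for i in range(16)]

# Separate subjects into blocks of similar experimental units.
blocks = {}
for name, age_group in subjects:
    blocks.setdefault(age_group, []).append(name)

# Randomly assign the two treatments within each block.
assignment = {}
for age_group, members in blocks.items():
    shuffled = members[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    for name in shuffled[:half]:
        assignment[name] = "A"
    for name in shuffled[half:]:
        assignment[name] = "B"
print(assignment)
```

Because each block gets both treatments in equal numbers, differences between the age groups do not get mixed into the treatment comparison.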
Matched Pairs Design
Blocks of size 2
Subjects are paired by similarity
Randomly assign treatment
OR: each subject gets both treatments but the order is randomized
Controls for variability between subjects (in the both-treatments version, each subject serves as their own comparison)
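The paired version of this design can be sketched as below. The six names and their pretest scores are invented, and pairing neighbors after sorting by score is just one simple way to pair by similarity.

```python
import random

rng = random.Random(8)  # arbitrary seed, for repeatability only

# Hypothetical subjects with a pretest score used to judge similarity.
scores = {"Ana": 71, "Ben": 88, "Cal": 70, "Dee": 90, "Eli": 55, "Fay": 58}

# Pair subjects by similarity: sort by score, then take neighbors two at a time.
ordered = sorted(scores, key=scores.get)
pairs = [ordered[i:i + 2] for i in range(0, len(ordered), 2)]

# Within each pair, a coin flip decides who gets which treatment.
assignment = {}
for first, second in pairs:
    if rng.random() < 0.5:
        first, second = second, first
    assignment[first] = "treatment"
    assignment[second] = "control"
print(pairs, assignment)
```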
Sampling Variability
Different samples yield different results
Larger samples provide more precise estimates (less variability from sample to sample)
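A quick simulation illustrates both points. The population here is simulated (mean 100, standard deviation 15), and the sample sizes of 10 and 100 and the 500 repetitions are arbitrary choices for the sketch.

```python
import random
import statistics

rng = random.Random(9)  # arbitrary seed, for repeatability only

# Hypothetical population of 10,000 values.
population = [rng.gauss(100, 15) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean across many SRSs of size n."""
    means = [statistics.mean(rng.sample(population, k=n))
             for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(100)
# Different samples give different means; the n=100 means vary less.
print(small, large)
```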
Statistically Significant
Results of an experiment are unlikely (typically less than a 5% chance) to happen by chance alone
If _______ → convincing evidence that the treatment caused a difference
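One way to estimate how likely a result is "by chance" is to re-randomize the data many times. The sketch below uses made-up responses for two groups of 8 and a one-sided comparison; the 5% cutoff is the conventional threshold, not a fixed rule.

```python
import random
import statistics

rng = random.Random(10)  # arbitrary seed, for repeatability only

# Hypothetical responses from a randomized experiment (made-up numbers).
treatment = [24, 27, 31, 29, 26, 30, 28, 32]
control = [21, 23, 25, 22, 26, 24, 20, 23]
observed = statistics.mean(treatment) - statistics.mean(control)

# Re-randomize: shuffle all responses, re-split into two groups of 8, and count
# how often chance alone gives a difference at least as large as the observed one.
pooled = treatment + control
trials = 5000
extreme = 0
for _ in range(trials):
    rng.shuffle(pooled)
    diff = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(observed, p_value, p_value < 0.05)
```

If the estimated proportion is below 0.05, a difference this large would rarely arise from the random assignment alone.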
Random Selection
___________ Allows us to generalize conclusions to the population from which we sampled
Random Assignment
_________ Allows us to conclude that a treatment causes changes in the response variable
Association
If an experiment does not use random assignment, only an __________ can be determined
Variation
How much distance is there between estimates?
Question Wording Bias
When survey questions are confusing or leading
Self-Reporting Response Bias
When individuals inaccurately report their own traits
Completely Randomized Design
Experiment in which treatments are assigned to all experimental units entirely at random; this tends to balance the effects of confounding variables so that results can be attributed to treatments
Replication
Requires that multiple experimental units receive the same treatment
Statistical Inference
Using results from a sample to draw conclusions about the population it came from
Sampling Frame
The list of individuals from which the sample is actually selected; ideally it covers the whole target population, but it is often only a portion of it
Target Population
Entire set of individuals in the population that meet the sampling criteria