Study Guide! Chapter 4 - Statistics and Probability
Section 1 - Sampling and Surveys
Terminology
Population: entire group of individuals we want information about
Census: collect data from every individual in the population
Sample: subset of individuals in a population from which we actually collect data
Individual: object described in a set of data → people, animals, things
Bad Sampling
Convenience sampling: choosing individuals from the population who are easy to reach. The design of a statistical study shows bias if it is very likely to underestimate or overestimate the value you want to know. Convenience samples are often biased because the people who are easy to reach may not represent the whole population.
Voluntary Response Sampling: allows people to choose to be in the sample by responding to a general invitation.
Good Sampling
Simple random sample (SRS): a sample of size n chosen by a chance process so that every group of n individuals in the population has an equal chance of being selected
Stratified random sampling: divides the population into strata (groups of individuals that are similar in some way that might affect their responses), chooses a simple random sample from each stratum, and combines these into one overall sample.
Cluster sampling: selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample
Systematic random sampling: selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter.
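Quick sketch of the four good sampling methods in Python, just to see the mechanics. The student ID list, the grade strata, and the homeroom clusters are made up for illustration, and the function names are my own (only the standard random module is used).

```python
import random

def simple_random_sample(population, n, rng):
    # Every group of n individuals is equally likely to be chosen.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of individuals.
    # Take an SRS from each stratum, then combine them into one sample.
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, n_per_stratum))
    return combined

def cluster_sample(clusters, n_clusters, rng):
    # clusters: list of lists. Randomly choose whole clusters,
    # then include every member of each chosen cluster.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

def systematic_sample(ordered_population, k, rng):
    # Randomly pick a starting point among the first k individuals,
    # then take every kth individual after it.
    start = rng.randrange(k)
    return ordered_population[start::k]

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40 student IDs, two grades, 8 homerooms of 5.
students = list(range(1, 41))
strata = {"grade 9": students[:20], "grade 10": students[20:]}
clusters = [students[i:i + 5] for i in range(0, 40, 5)]

srs = simple_random_sample(students, 8, rng)
strat = stratified_sample(strata, 4, rng)
clust = cluster_sample(clusters, 2, rng)
syst = systematic_sample(students, 5, rng)

print("SRS:       ", sorted(srs))
print("Stratified:", sorted(strat))
print("Cluster:   ", sorted(clust))
print("Systematic:", syst)
```

Things to notice: the stratified sample always has exactly 4 students from each grade, the cluster sample is made of whole homerooms, and the systematic sample is evenly spaced through the list.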
Things that can go wrong when sampling
Non-response: occurs when an individual chosen for the sample can’t be contacted or refuses to participate
Response Bias: occurs when there is a systematic pattern of inaccurate answers to a survey question
Undercoverage: occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample
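To see why undercoverage causes bias (consistently missing in one direction), here is a small simulation. Everything in it is invented: a population of 1,000 where half can't be reached by the sampling frame and happen to have larger values. It is only meant to show the pattern.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 500 individuals the sampling frame can reach
# (values 0-10) and 500 it cannot reach (values 10-20).
covered = [rng.randint(0, 10) for _ in range(500)]
not_covered = [rng.randint(10, 20) for _ in range(500)]
population = covered + not_covered
true_mean = statistics.mean(population)

def average_estimate(frame, n, repetitions):
    # Take many random samples of size n from the frame and
    # average the resulting sample means.
    estimates = [statistics.mean(rng.sample(frame, n))
                 for _ in range(repetitions)]
    return statistics.mean(estimates)

full_frame_avg = average_estimate(population, 50, 200)
undercovered_avg = average_estimate(covered, 50, 200)

print("True population mean:        ", true_mean)
print("Average estimate, full frame:", full_frame_avg)
print("Average estimate, undercover:", undercovered_avg)
```

Random sampling from the full population lands close to the true mean on average. Random sampling from the incomplete frame underestimates it every time, and taking a bigger sample would not fix that.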
Section 2 - Experiments
Studies
Observational
Retrospective: Examines existing data for a sample of individuals
Prospective: Tracks individuals into the future
Experimental
Control Group: used to provide a baseline for comparing the effects of other treatments
Experimental Unit: object to which a treatment is randomly assigned
Subject: an experimental unit that is a human being
Treatment: specific condition applied to individuals in an experiment
Factor: variable that’s manipulated and may cause a change in the response variable
Levels: different values of a factor
Placebo: treatment that has no active ingredient, but is otherwise like other treatments
Placebo Effect: describes the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive one
Confounding Variables: two variables are confounded when they are associated in such a way that their effects on a response variable cannot be distinguished from each other
Double Blind vs Single Blind
Double blind: neither the subjects nor those who interact with them and measure the responses know which treatment a subject received
Single blind: either the subjects or those who interact with them and measure the responses don't know which treatment a subject received, but not both
Replication: using enough experimental units to distinguish a difference in the effects of the treatments from chance variation due to the random assignment
Random Assignment: experimental units are assigned to treatments using a chance process
Randomized Block Design: in each block, experimental units are randomly assigned to treatments
Block: group of experimental units known BEFORE EXPERIMENT to be similar in some way that is expected to affect the response to the treatment
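Matched Pairs: a common kind of blocking with blocks of size 2, which makes the two treatments easy to compare. Either two very similar experimental units are paired and the two treatments are randomly assigned within each pair, or each unit receives both treatments in a random order.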
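Sketch of random assignment, a randomized block design, and matched pairs in Python. The unit labels, the age blocks, and the treatment names are hypothetical, and the helper functions are my own. The point is that in a block design the random assignment happens separately inside each block.

```python
import random

def completely_randomized(units, treatments, rng):
    # Shuffle the units, then deal them out to the treatments in turn,
    # so group sizes differ by at most one.
    shuffled = units[:]
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(unit)
    return assignment

def randomized_block(blocks, treatments, rng):
    # blocks: dict mapping block name -> list of experimental units.
    # Random assignment is carried out separately within each block.
    return {name: completely_randomized(units, treatments, rng)
            for name, units in blocks.items()}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
treatments = ["new drug", "placebo"]

# Hypothetical blocks, formed BEFORE the experiment by age group.
blocks = {"under 40": ["A", "B", "C", "D"],
          "40 and over": ["E", "F", "G", "H"]}
design = randomized_block(blocks, treatments, rng)

# Matched pairs is the same idea with blocks of size 2.
pairs = {"pair 1": ["P", "Q"], "pair 2": ["R", "S"]}
matched = randomized_block(pairs, treatments, rng)

for name, groups in design.items():
    print(name, groups)
for name, groups in matched.items():
    print(name, groups)
```

Each age block ends up with two subjects on each treatment, so age can't be confounded with the treatment. In each matched pair, one member gets each treatment and chance decides which.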
Section 3 - Using Studies Wisely
Inference
Sampling Variability: refers to the fact that different random samples of the same size from the same population produce different estimates. Estimates from larger samples are more precise than estimates from smaller samples.
When the observed results of a study are too unusual to be explained by chance alone, the results are called Statistically Significant.
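Both ideas above can be checked by simulation. This is a rough sketch with invented numbers: the population values and the two groups of scores are hypothetical, and the cutoff for "too unusual" (often 5%) is a convention, not something fixed by these notes. The first part compares how much sample means vary for small and large samples. The second part reshuffles the group labels many times to see how often chance alone gives a difference at least as big as the observed one.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Part 1: sampling variability.
# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repetitions=500):
    # Standard deviation of the means of many random samples of size n.
    means = [statistics.mean(rng.sample(population, n))
             for _ in range(repetitions)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

# Part 2: statistical significance by re-randomization.
def randomization_p_value(group_a, group_b, repetitions=2000):
    # Proportion of random re-assignments whose difference in means
    # is at least as large as the observed difference.
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = group_a + group_b
    count = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)
        new_a = pooled[:len(group_a)]
        new_b = pooled[len(group_a):]
        if statistics.mean(new_a) - statistics.mean(new_b) >= observed:
            count += 1
    return count / repetitions

# Hypothetical responses from a randomized experiment.
treatment = [12, 14, 15, 16, 18, 19]
control = [8, 9, 10, 11, 12, 13]
p_value = randomization_p_value(treatment, control)

print("Spread of sample means, n = 10: ", spread_small)
print("Spread of sample means, n = 100:", spread_large)
print("Estimated p-value:", p_value)
```

The spread for n = 100 comes out smaller than for n = 10, which is what "more precise" means here. A small p-value says a difference this big would rarely happen by chance alone, so it would be called statistically significant.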
Proving causation
Experiment
Scope of Inference
Random individual selection
Allows inference about the population from which individuals were chosen
Random assignment to groups
Allows inference about cause and effect
Observational study - when you can’t perform an experiment, these criteria help establish causation; don’t just assume one thing causes another
Strong Association - check r
Consistent Association
Larger values of the explanatory variable are associated with stronger responses
Cause precedes effect
Cause is plausible
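For the "check r" step, here is a minimal sketch of computing the correlation r by hand in Python, using the usual formula (average product of z-scores, dividing by n - 1). The hours and scores data are made up. Remember that r close to 1 or -1 only shows a strong linear association; it is just one of the criteria above and does not prove causation by itself.

```python
import math

def correlation(xs, ys):
    # r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied (explanatory) and test score (response).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
print("r =", round(r, 3))
```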
Ethics: Don’t do bad stuff. Don’t harm the people you experiment on: don’t traumatize babies, don’t kill people.
Confidential: All individual data must be kept confidential; only statistical group summaries can be made public
Consent: Subjects must give informed consent before the study begins