Statistics review

Study Guide! Chapter 4 - Statistics and Probability

Section 1 - Sampling and Surveys

Terminology

Population: entire group of individuals we want information about
Census: collect data from every individual in the population
Sample: subset of individuals in a population from which we actually collect data
Individual: object described in a set of data → people, animals, things
Bad Sampling
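- Convenience sampling: choosing individuals from the population who are easy to reach results in a convenience sample. Convenience samples are usually biased; the design of a statistical study shows bias if it is very likely to underestimate or overestimate the value you want to know.
- Voluntary response sampling: allows people to choose to be in the sample by responding to a general invitation. Voluntary response samples are usually biased because people with strong (often negative) opinions are the most likely to respond.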
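A small simulation can make the definition of bias concrete. The sketch below uses a hypothetical population of commute times and assumes, purely for illustration, that the "easy to reach" students are those with commutes under 15 minutes. Repeating each design many times shows that the convenience design lands below the true mean essentially every time, while simple random samples center on it.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: one-way commute times (minutes) for 10,000 students.
population = [rng.uniform(5, 60) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Assumption for illustration: the "easy to reach" students live on or near
# campus, i.e., their commutes are under 15 minutes.
easy_to_reach = [t for t in population if t < 15]

def convenience_sample(n):
    # Chosen only from the students who are easy to reach.
    return rng.sample(easy_to_reach, n)

def simple_random_sample(n):
    # Every student has the same chance of being chosen.
    return rng.sample(population, n)

# Repeat each design 1,000 times and record the estimated mean commute.
conv_means = [statistics.mean(convenience_sample(100)) for _ in range(1_000)]
srs_means = [statistics.mean(simple_random_sample(100)) for _ in range(1_000)]

print(f"true mean:                {true_mean:.1f}")
print(f"average convenience mean: {statistics.mean(conv_means):.1f}")
print(f"average SRS mean:         {statistics.mean(srs_means):.1f}")
```

The convenience design is biased because of how individuals are chosen, not because the samples are too small; taking more convenience samples of the same kind would not fix it.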
Good Sampling
- Simple random sample (SRS): a sample of size n chosen by a chance process so that every possible group of n individuals in the population has an equal chance of being selected
- Stratified random sampling: divides the population into strata (groups of individuals that are similar in some way that may affect the response), then selects a sample by choosing a simple random sample from each stratum and combining the simple random samples into one overall sample.
- Cluster sampling: divides the population into clusters (groups of individuals located near each other), then selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample
- Systematic random sampling: selects a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter.
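The four chance-based designs above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical roster of 40 students split into four grades (the strata) and eight mixed-grade homerooms (the clusters); the function names are made up for this example.

```python
import random

def simple_random_sample(population, n, rng):
    # Every group of n individuals is equally likely to be chosen.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of members.
    # Take an SRS inside each stratum, then combine them.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep every member of each one.
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

def systematic_sample(ordered_population, k, rng):
    # Randomly pick a start among the first k, then take every kth after it.
    start = rng.randrange(k)
    return ordered_population[start::k]

rng = random.Random(42)
students = [f"S{i:02d}" for i in range(1, 41)]  # 40 hypothetical students
grades = {"9th": students[:10], "10th": students[10:20],
          "11th": students[20:30], "12th": students[30:]}
homerooms = [students[i::8] for i in range(8)]  # 8 homerooms of 5, mixed grades

print(simple_random_sample(students, 8, rng))
print(stratified_sample(grades, 2, rng))
print(cluster_sample(homerooms, 2, rng))
print(systematic_sample(students, 5, rng))
```

Note the contrast in how groups are used: stratified sampling samples within every group, while cluster sampling samples the groups themselves.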
Things that can go wrong when sampling
- Non-response: occurs when an individual chosen for the sample can’t be contacted or refuses to participate
- Response Bias: occurs when there is a systematic pattern of inaccurate answers to a survey question
- Undercoverage: occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample
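Nonresponse can bias results even when the sample itself was chosen at random. The sketch below assumes a hypothetical town where 40% of adults support a proposal and, as an illustrative assumption, supporters answer the survey twice as often as opponents (60% vs. 30%).

```python
import random

rng = random.Random(7)

# Hypothetical town of 10,000 adults: True = supports the proposal (40% do).
population = [True] * 4_000 + [False] * 6_000

def survey(response_prob, n=1_000):
    # Choose an SRS, then each chosen person answers with a probability that
    # depends on their opinion; nonrespondents simply drop out.
    chosen = rng.sample(population, n)
    answers = [op for op in chosen if rng.random() < response_prob[op]]
    return sum(answers) / len(answers)

def average_estimate(response_prob, repeats=200):
    # Average the estimated proportion over many repeated surveys.
    return sum(survey(response_prob) for _ in range(repeats)) / repeats

everyone_answers = average_estimate({True: 1.0, False: 1.0})
# Assumption: supporters respond 60% of the time, opponents only 30%.
with_nonresponse = average_estimate({True: 0.6, False: 0.3})

print("true proportion:            0.400")
print(f"everyone responds:          {everyone_answers:.3f}")
print(f"opinion-linked nonresponse: {with_nonresponse:.3f}")
```

With these assumed response rates, the respondents overrepresent supporters, so the estimate drifts well above 0.40 no matter how carefully the original SRS was drawn.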

Section 2 - Experiments

Studies
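- Observational study: observes individuals and measures variables of interest but does not attempt to influence the responses
  - Retrospective: examines existing data for a sample of individuals
  - Prospective: tracks individuals into the future
- Experiment: deliberately imposes some treatment on individuals to measure their responses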
- Control Group: used to provide a baseline for comparing the effects of other treatments
- Experimental Unit: object to which a treatment is randomly assigned
- Subject: when the experimental unit is human
- Treatment: specific condition applied to individuals in an experiment
- Factor: variable that’s manipulated and may cause a change in the response variable
  - Levels: different values of a factor
- Placebo: treatment that has no active ingredient, but is otherwise like other treatments
  - Placebo Effect: describes the fact that some subjects in an experiment will respond favorably to any treatment
- Confounding variables: two variables are confounded when they are associated in such a way that their effects on a response variable cannot be distinguished from each other
- Double blind vs. single blind
  - Double blind: neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received
  - Single blind: either the subjects or those who interact with them and measure the response (but not both) don't know which subjects are getting which treatment
- Replication: using enough experimental units to distinguish a difference in the effects of the treatments from chance variation due to the random assignment
- Random Assignment: experimental units are assigned to treatments using a chance process
- Randomized Block Design: in each block, experimental units are randomly assigned to treatments
- Block: group of experimental units known BEFORE THE EXPERIMENT to be similar in some way that is expected to affect the response to the treatments
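- Matched pairs design: a common form of blocking for comparing two treatments. Each block is a pair of similar experimental units, and chance decides which unit in each pair gets which treatment; alternatively, each subject receives both treatments in a random order. Comparing responses within each pair makes treatment differences easier to detect.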
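Random assignment, randomized block designs, and matched pairs differ mainly in where the chance process is applied. A minimal sketch, assuming 12 hypothetical subjects, two treatments named "drug" and "placebo", blocks by sex, and three hypothetical matched pairs; the function names are illustrative. The first function is a completely randomized design, where one chance process assigns every unit.

```python
import random

def completely_randomized(units, treatments, rng):
    # Shuffle all units, then deal them out to the treatments in turn,
    # so group sizes differ by at most one.
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, rng):
    # Same chance process, but carried out separately inside each block.
    return {name: completely_randomized(units, treatments, rng)
            for name, units in blocks.items()}

def matched_pairs(pairs, treatments, rng):
    # Two treatments: a fair coin decides which member of each pair
    # gets the first treatment; the other member gets the second.
    first_trt, second_trt = treatments
    groups = {first_trt: [], second_trt: []}
    for a, b in pairs:
        if rng.random() < 0.5:
            a, b = b, a
        groups[first_trt].append(a)
        groups[second_trt].append(b)
    return groups

rng = random.Random(3)
subjects = [f"P{i}" for i in range(1, 13)]             # 12 hypothetical subjects
blocks = {"men": subjects[:6], "women": subjects[6:]}  # hypothetical blocks
pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]     # hypothetical matched pairs
treatments = ["drug", "placebo"]

print(completely_randomized(subjects, treatments, rng))
print(randomized_block(blocks, treatments, rng))
print(matched_pairs(pairs, treatments, rng))
```

In the block design each block is split evenly between the treatments, so any difference between men and women cannot be confused with a treatment effect.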

Section 3 - Using Studies Wisely

Inference
- Sampling Variability: refers to the fact that different random samples of the same size from the same population produce different estimates. Estimates from larger samples are more precise than estimates from smaller samples.
- When the observed results of a study are too unusual to be explained by chance alone, the results are called Statistically Significant.
- Establishing causation
- Experiment
  - Scope of inference
    - Random selection of individuals: allows inference about the population from which the individuals were chosen
    - Random assignment of individuals to groups (treatments): allows inference about cause and effect
- Observational study: when you can't perform an experiment, check these criteria before concluding that one variable causes another; don't just assume causation
  - Strong association: check the correlation r
  - Consistent association: many studies in different settings find the same link
  - Dose-response: larger values of the explanatory variable are associated with stronger responses
  - The alleged cause precedes the effect in time
  - The alleged cause is plausible
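- Ethics: studies must not harm their subjects (for example, don't traumatize infants and don't put people's lives at risk); studies on people are acceptable only with safeguards such as confidentiality and informed consent
- Confidentiality: all individual data must be kept confidential; only statistical summaries for groups of individuals may be made public
- Informed consent: subjects must be told in advance about the nature of the study and any risk of harm it may bring, and must agree to participate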
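Both sampling variability and statistical significance can be checked by simulation. The sketch below assumes a hypothetical population proportion of 0.6 and a hypothetical claim of predicting 16 of 20 coin flips; neither number comes from a real study. The exact chance probability is computed with `math.comb` for comparison with the simulation.

```python
import math
import random
import statistics

rng = random.Random(2024)

# --- Sampling variability ---
# Hypothetical: 60% of a very large population has some trait.
P_TRUE = 0.6

def sample_proportion(n):
    # Proportion with the trait in one random sample of size n.
    return sum(rng.random() < P_TRUE for _ in range(n)) / n

spread = {}
for n in (25, 400):
    estimates = [sample_proportion(n) for _ in range(1_000)]
    spread[n] = statistics.stdev(estimates)
    print(f"n = {n:3d}: sample proportions vary with std. dev. {spread[n]:.3f}")

# --- Statistical significance ---
# Hypothetical: someone claims to predict coin flips and gets 16 of 20 right.
# How often would pure guessing (chance alone) do at least that well?
OBSERVED, TRIALS, SIMS = 16, 20, 10_000
hits = sum(
    sum(rng.random() < 0.5 for _ in range(TRIALS)) >= OBSERVED
    for _ in range(SIMS)
)
simulated = hits / SIMS
exact = sum(math.comb(TRIALS, k) for k in range(OBSERVED, TRIALS + 1)) / 2**TRIALS
print(f"P(at least {OBSERVED}/{TRIALS} by guessing): "
      f"simulated {simulated:.4f}, exact {exact:.4f}")
```

The n = 400 estimates cluster much more tightly than the n = 25 estimates. In the coin example, guessing reaches 16 or more correct well under 1% of the time, so under this assumed scenario the result would usually be judged too unusual to be explained by chance alone; where to draw that line is a judgment call, and 5% is a common cutoff.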