Statistics review

Study Guide! Chapter 4 - Statistics and Probability

Section 1 - Sampling and Surveys

Terminology

  • Population: entire group of individuals we want information about 

  • Census: collect data from every individual in the population

  • Sample: subset of individuals in a population from which we actually collect data 

  • Individual: object described in a set of data → people, animals, things 

  • Bad Sampling

    • Convenience sampling: choosing individuals from the population who are easy to reach. Convenience samples are usually biased; the design of a statistical study shows bias if it is very likely to underestimate or overestimate the value you want to know.
    • Voluntary response sampling: allows people to choose to be in the sample by responding to a general invitation. Voluntary response samples are biased because people with strong opinions, often negative ones, are the most likely to respond.
  • Good Sampling 

    • Simple random sample (SRS): random sampling uses a chance process to determine which members of a population are included in the sample; an SRS of size n is chosen so that every group of n individuals in the population has an equal chance to be selected
    • Stratified random sampling: divides the population into strata (groups of individuals that are similar in some way that might affect their responses), chooses a simple random sample from each stratum, and combines these into one overall sample
    • Cluster sampling: divides the population into clusters (groups of individuals located near each other), randomly chooses clusters, and includes every member of the selected clusters in the sample
    • Systematic sampling: selects a sample from an ordered arrangement of the population by randomly choosing one of the first k individuals and then choosing every kth individual thereafter
  • Things that can go wrong when sampling

    • Non-response: occurs when an individual chosen for the sample can’t be contacted or refuses to participate 
    • Response Bias: occurs when there is a systematic pattern of inaccurate answers to a survey question
    • Undercoverage: occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample 
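
The four good sampling methods above can be sketched in Python with only the standard `random` module. This is a minimal illustration under stated assumptions, not a standard implementation: the function names, the fixed seed, and the hypothetical school of 100 students split into four grades (strata) and ten homerooms (clusters) are all made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals has the same chance of being chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of size n_per_stratum from each stratum and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then include every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(ordered_population, k, rng):
    """Randomly pick one of the first k individuals, then take every kth one after it."""
    start = rng.randrange(k)
    return ordered_population[start::k]

rng = random.Random(2024)       # fixed seed so the demo is reproducible
students = list(range(1, 101))  # hypothetical student IDs 1..100

# Hypothetical strata: four grades of 25 students each
grades = {g: students[i * 25:(i + 1) * 25] for i, g in enumerate([9, 10, 11, 12])}
# Hypothetical clusters: ten homerooms of 10 students each
homerooms = {f"room{r}": students[r * 10:(r + 1) * 10] for r in range(10)}

srs = simple_random_sample(students, 20, rng)
strat = stratified_sample(grades, 5, rng)
clus = cluster_sample(homerooms, 2, rng)
syst = systematic_sample(students, 5, rng)
print(len(srs), len(strat), len(clus), len(syst))  # 20 20 20 20
```

All four samples have 20 students, but they differ in how chance enters: the stratified sample is guaranteed 5 students from every grade, while the cluster sample contains two entire homerooms and nobody else.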

Section 2 - Experiments

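  • Studies

    • Observational study: observes individuals and measures variables of interest but does not attempt to influence the responses
      • Retrospective: examines existing data for a sample of individuals
      • Prospective: tracks individuals into the future
    • Experiment: deliberately imposes some treatment on individuals to measure their responses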
    • Control Group: used to provide a baseline for comparing the effects of other treatments
    • Experimental Unit: object to which a treatment is randomly assigned
    • Subjects: experimental units that are human beings
    • Treatment: specific condition applied to individuals in an experiment
    • Factor: variable that’s manipulated and may cause a change in the response variable
      • Levels: different values of a factor
    • Placebo: treatment that has no active ingredient, but is otherwise like other treatments
      • Placebo Effect: describes the fact that some subjects in an experiment will respond favorably to any treatment
    • Confounding: occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
    • Double blind vs. single blind
      • Double blind: neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received
      • Single blind: either the subjects or those who interact with them and measure the response variable (but not both) don't know which treatment a subject received
    • Replication: using enough experimental units to distinguish a difference in the effects of the treatments from chance variation due to the random assignment
    • Random Assignment: experimental units are assigned to treatments using a chance process 
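    • Block: a group of experimental units known before the experiment to be similar in some way that is expected to affect the response to the treatments
    • Randomized block design: units are first grouped into blocks; then, within each block, the units are randomly assigned to treatments
    • Matched pairs design: a common form of blocking for comparing two treatments. Similar units are paired (or each subject receives both treatments in a random order), and treatments are randomly assigned within each pair, so comparisons are made between similar units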
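
Random assignment, randomized block designs, and matched pairs can also be sketched with Python's standard `random` module. This is an illustrative sketch, not a prescribed procedure: the 12 subjects, the treatments "drug" and "placebo", the blocks by sex, the twin pairs, and all function names are hypothetical. For matched pairs it shows only the version with paired units; when each subject receives both treatments, the same coin flip would instead decide the order.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in equal-sized groups."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, rng):
    """Randomly assign units to treatments separately within each block."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}

def matched_pairs(pairs, treatments, rng):
    """For each matched pair, a coin flip decides which unit gets which treatment."""
    a, b = treatments
    assignments = []
    for first, second in pairs:
        if rng.random() < 0.5:
            assignments.append({a: first, b: second})
        else:
            assignments.append({a: second, b: first})
    return assignments

rng = random.Random(7)                       # fixed seed so the demo is reproducible
subjects = [f"S{i}" for i in range(1, 13)]   # 12 hypothetical subjects
treatments = ["drug", "placebo"]

groups = completely_randomized(subjects, treatments, rng)
blocks = {"men": subjects[:6], "women": subjects[6:]}  # hypothetical blocks
by_block = randomized_block(blocks, treatments, rng)
twins = [("S1", "S2"), ("S3", "S4")]                   # hypothetical matched pairs
pairs = matched_pairs(twins, treatments, rng)
print(groups)
print(by_block)
print(pairs)
```

The block version guarantees that each treatment group gets exactly 3 men and 3 women, whereas the completely randomized version only balances the groups by chance.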

Section 3 - Using Studies Wisely

  • Inference
    • Sampling variability: different random samples of the same size from the same population produce different estimates. Larger random samples give more precise estimates than smaller ones.
    • When the observed results of a study are too unusual to be explained by chance alone, the results are called statistically significant.
    • Establishing causation
    • Experiment
      • Scope of inference
        • Random selection of individuals: allows inference about the population from which the individuals were chosen
        • Random assignment to groups: allows inference about cause and effect
    • Observational study: when you can't perform an experiment, check these criteria for establishing causation; don't just assume one thing causes another
      • Strong association: check the correlation r
      • Consistent association: the association holds across many studies in different settings
      • Dose-response: larger values of the explanatory variable are associated with stronger responses
      • Cause precedes effect in time
      • Cause is plausible
    • Ethics: studies must protect subjects from harm; planned studies should be reviewed in advance by an institutional review board (IRB)
      • Confidentiality: all individual data must be kept confidential; only statistical summaries for groups of individuals may be made public
      • Informed consent: all subjects must give their informed consent before data are collected
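
Sampling variability and statistical significance can both be seen in a quick simulation. The sketch below uses only hypothetical data: a population of 10,000 people in which 40% answer "yes", and made-up treatment and control responses. For the significance part it swaps in a one-sided randomization test (reshuffling the group labels many times) as one way to ask whether a difference is too unusual to be explained by chance alone; the function names are illustrative, not from any particular textbook or library.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the demo is reproducible

# Sampling variability: a hypothetical population of 10,000 people,
# 40% of whom would answer "yes" (coded as 1).
population = [1] * 4000 + [0] * 6000

def sample_proportions(pop, n, reps):
    """Take `reps` simple random samples of size n; return each sample proportion."""
    return [sum(rng.sample(pop, n)) / n for _ in range(reps)]

small = sample_proportions(population, 25, 1000)
large = sample_proportions(population, 400, 1000)
# Both sets of estimates center near 0.40, but the n = 400 estimates
# are much less spread out (more precise) than the n = 25 estimates.
print("SD of estimates, n = 25: ", round(statistics.stdev(small), 3))
print("SD of estimates, n = 400:", round(statistics.stdev(large), 3))

# Statistical significance: is an observed difference too unusual to be
# explained by the chance variation in random assignment alone?
treatment = [12, 15, 14, 16, 13, 17, 15, 14]  # hypothetical responses
control = [10, 11, 9, 12, 10, 11, 13, 10]     # hypothetical responses

def rerandomization_p_value(group_a, group_b, reps):
    """Fraction of random re-assignments whose difference in means is at least
    as large as the observed difference (a one-sided randomization test)."""
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = group_a + group_b
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # pretend the treatment had no effect: reshuffle labels
        diff = (statistics.mean(pooled[:len(group_a)])
                - statistics.mean(pooled[len(group_a):]))
        if diff >= observed:
            hits += 1
    return hits / reps

p = rerandomization_p_value(treatment, control, 10_000)
print("Observed difference in means:", statistics.mean(treatment) - statistics.mean(control))
print("Estimated chance of a difference this large:", p)
```

If the reshuffled differences almost never reach the observed one, chance alone is an unlikely explanation and the result would be called statistically significant.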
