Statistics review

Study Guide! Chapter 4 - Statistics and Probability


Section 1 - Sampling and Surveys


Terminology

  • Population: entire group of individuals we want information about

  • Census: collect data from every individual in the population

  • Sample: subset of individuals in a population from which we actually collect data

  • Individual: object described in a set of data → people, animals, things

  • Bad Sampling

    • Convenience: choosing individuals from the population who are easy to reach. Convenience samples often show bias: the design of a statistical study is biased if it is very likely to underestimate or overestimate the value you want to know.

    • Voluntary Response Sampling: allows people to choose to be in the sample by responding to a general invitation.

  • Good Sampling

    • Simple random sample (SRS): chooses a sample of size n using a chance process that gives every group of n individuals in the population an equal chance of being selected

    • Stratified random sampling: selects a sample by choosing a simple random sample from each stratum and combining the simple random samples into one overall sample.

    • Cluster sampling: selects a sample by randomly choosing clusters and including each member of the selected clusters in the sample

    • Systematic: selects a sample from an ordered arrangement of the population by randomly selecting one of the first K individuals and choosing every Kth individual thereafter.

  • Things that can go wrong when sampling

    • Non-response: occurs when an individual chosen for the sample can’t be contacted or refuses to participate

    • Response Bias: occurs when there is a systematic pattern of inaccurate answers to a survey question

    • Undercoverage: occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample
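The four "Good Sampling" methods above can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the 12 students, the grade strata, and the homeroom clusters are hypothetical, and the function names are invented for this example.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS from each stratum and combine them into one sample."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(ordered_population, k, rng):
    """Pick a random start among the first k, then take every k-th individual."""
    start = rng.randrange(k)
    return ordered_population[start::k]

# Hypothetical population: 12 students, split by grade (strata)
# and by homeroom (clusters).
rng = random.Random(0)
students = [f"s{i:02d}" for i in range(12)]
by_grade = {"grade9": students[:6], "grade10": students[6:]}
by_homeroom = {"A": students[0:4], "B": students[4:8], "C": students[8:12]}

print("SRS:       ", simple_random_sample(students, 4, rng))
print("Stratified:", stratified_sample(by_grade, 2, rng))
print("Cluster:   ", cluster_sample(by_homeroom, 1, rng))
print("Systematic:", systematic_sample(students, 3, rng))
```

Note the difference between the middle two: the stratified sample takes some individuals from every group, while the cluster sample takes everyone from only some groups.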




Section 2 - Experiments


  • Studies

    • Observational

      • Retrospective: Examines existing data for a sample of individuals

      • Prospective: Tracks individuals into the future

    • Experimental

      • Control Group: used to provide a baseline for comparing the effects of other treatments

      • Experimental Unit: object to which a treatment is randomly assigned

      • Subject: an experimental unit that is a human being

      • Treatment: specific condition applied to individuals in an experiment

      • Factor: variable that’s manipulated and may cause a change in the response variable

        • Levels: different values of a factor

      • Placebo: treatment that has no active ingredient, but is otherwise like other treatments

        • Placebo Effect: describes the fact that some subjects in an experiment will respond favorably to any treatment

    • Confounding Variables: two variables are confounded when their effects on a response variable cannot be distinguished from each other

    • Double Blind vs Single Blind

      • Double blind: neither the subjects nor those who interact with them and measure the response know which treatment each subject received

      • Single blind: either the subjects or those who interact with them and measure the response don't know which treatment each subject received, but not both

    • Replication: using enough experimental units to distinguish a difference in the effects of the treatments from chance variation due to the random assignment

    • Random Assignment: experimental units are assigned to treatments using a chance process

    • Randomized Block Design: in each block, experimental units are randomly assigned to treatments

      • Block: group of experimental units known BEFORE EXPERIMENT to be similar in some way that is expected to affect the response to the treatment

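    • Matched Pairs: a randomized block design in which each block is a pair of similar experimental units, with the two treatments randomly assigned within the pair (or a single unit that receives both treatments in random order). Pairing makes the treatments easy to compare.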
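Random assignment and a randomized block design, as defined above, might look like this in Python. It is a rough sketch, not a standard recipe: the blocks ("beginner" and "experienced"), the unit labels, and the two treatments are hypothetical.

```python
import random

def randomly_assign(units, treatments, rng):
    """Shuffle the units by chance, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

def randomized_block_design(blocks, treatments, rng):
    """Carry out a separate random assignment inside each block."""
    return {name: randomly_assign(units, treatments, rng)
            for name, units in blocks.items()}

# Hypothetical experiment: subjects are blocked by experience level,
# which is known before the experiment and expected to affect the response.
rng = random.Random(42)
blocks = {
    "beginner": ["b1", "b2", "b3", "b4"],
    "experienced": ["e1", "e2", "e3", "e4"],
}
design = randomized_block_design(blocks, ["placebo", "drug"], rng)
for block, groups in design.items():
    print(block, groups)
```

Because the shuffle happens inside each block, every treatment group ends up with the same number of beginners and experienced subjects, so experience level cannot be confounded with the treatment.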


Section 3 - Using Studies Wisely

  • Inference

    • Sampling Variability: refers to the fact that different random samples of the same size from the same population produce different estimates. Estimates from larger samples are more precise than estimates from smaller samples.

    • When the observed results of a study are too unusual to be explained by chance alone, the results are called Statistically Significant.

    • Proving causation

      • Experiment

        • Scope of Inference

          • Random individual selection

            • Allows inference about the population from which individuals were chosen

          • Random group assignment

            • Allows inference about cause and effect

      • Study - there are criteria for establishing causation when you can’t perform an experiment; don’t just assume one thing causes another

        • Strong Association - check r

        • Consistent Association

        • Dose-response relationship - larger values of the explanatory variable are associated with stronger responses

        • Cause precedes effect

        • Cause is plausible

    • Ethics: A study must not harm the people who take part in it. Experiments on human subjects are acceptable only when the rules below are followed.

      • Confidential: All individual data must be kept confidential; only statistical group summaries can be made public

      • Consent: Subjects must give consent
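The sampling variability idea from the Inference bullets can be checked with a small simulation. This is only a sketch: the population of 10,000 values is hypothetical (generated from a normal distribution with mean 50 and standard deviation 10), and the sample sizes 10 and 100 are arbitrary choices.

```python
import random
import statistics

def spread_of_sample_means(population, n, repetitions, rng):
    """Draw many SRSs of size n and return the standard deviation of their means."""
    means = [statistics.mean(rng.sample(population, n))
             for _ in range(repetitions)]
    return statistics.stdev(means)

rng = random.Random(1)
# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, 10, 500, rng)
large = spread_of_sample_means(population, 100, 500, rng)
print(f"spread of sample means, n=10:  {small:.2f}")
print(f"spread of sample means, n=100: {large:.2f}")
```

The sample means vary from sample to sample in both cases, but they vary less when n = 100 than when n = 10, which is what "larger samples are more precise" means.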
