Statistics and Global Issues Notes

Introduction

  • Decision-making in the real world is intensely data-driven.
  • Decisions are based on diverse and ubiquitous data.
  • Decision-makers use statistics to:
    • Collect data
    • Analyze data
    • Make interpretations
    • Build models
    • Describe data using graphs and summary measurements
    • Develop cost-effective methods for designing experiments and gathering data that reduce bias

Statistics and Global Issues

  • Example: Pharmaceutical companies developing a vaccine during a global pandemic.
  • This process uses a controlled experiment.
  • A typical setup involves:
    • Creating two groups sampled from the population.
    • Treatment group (or experimental group): Receives the vaccine.
    • Control group: Does not receive the vaccine.
    • Random assignment of participants to either group.
  • Because participants are assigned at random, a difference in the effect of the virus on the two groups that is larger than chance alone would explain can be attributed to the treatment (vaccine).
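
A minimal sketch of the random-assignment step, assuming a hypothetical list of participant IDs and an even split between the groups. The function name assign_groups and the IDs are illustrative only; a real trial would follow a formal randomization protocol.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into a treatment and a control group."""
    rng = random.Random(seed)      # seeded so the split can be reproduced
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs P001 .. P020
participants = [f"P{i:03d}" for i in range(1, 21)]
treatment, control = assign_groups(participants, seed=42)

print(len(treatment), len(control))  # 10 10
```

Shuffling before splitting means no participant characteristic can influence which group a person lands in, which is what justifies comparing the two groups.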

Definitions

  • Controlled Experiment: An experiment conducted under controlled conditions, where only one factor (or a small number of factors) is changed at a time while everything else is held constant, to determine whether a relationship exists between variables.
  • Treatment Group: The group that receives the treatment in the experiment.
  • Control Group: The group that does not receive the treatment; it provides a baseline to determine if the treatment has an effect.
  • Treatment: Something applied or administered to one or more groups in a controlled experiment.
  • Population: The set of all objects or elements about which we are interested in making inferences.
  • Frame: A list containing all members of the population.
  • Population Parameters: Numerical facts about the population, such as a mean or a percentage.
    • A population can have many parameters.
  • Example: Researchers studying registered voters in a state:
    • Population: Registered voters in the state.
    • Frame: A list of all registered voters in that state.
    • Population Parameters of Interest:
      • Percentage of voters that will vote on election day.
      • Percentage of voters that favor a particular candidate.
      • Average income of voters that favor a particular candidate.
  • For a specific population at a specific point in time, population parameters do not change; they are fixed numbers.
  • The value of a population parameter is seldom known because it involves all population measurements, which are usually too expensive or time-consuming to collect.
  • It is the statistician's job to estimate these values from the data that can be collected.

Samples and Statistics

  • Sample: A subset of the population used to gain insight about the population.
    • Samples are used to represent a larger group, the population.
  • Statistic: A numerical fact or characteristic calculated from a sample.
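
A small sketch of the parameter/statistic distinction, using a simulated population of voters. The population size (10,000), the 55% support rate, and the sample size (200) are made-up assumptions for illustration. In practice the parameter could not be computed directly.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 1 = favors the candidate, 0 = does not
population = [1 if rng.random() < 0.55 else 0 for _ in range(10_000)]

# Parameter: computed from every member of the population (a fixed number)
parameter = statistics.mean(population)

# Statistic: computed from a random sample drawn without replacement
sample = rng.sample(population, 200)
statistic = statistics.mean(sample)

print(f"parameter (population proportion): {parameter:.3f}")
print(f"statistic (sample proportion):     {statistic:.3f}")
```

The two printed values will generally be close but not identical, since the statistic depends on which 200 voters happened to be sampled.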

Visualizations

  • Population (large blue oval):
    • Described by parameters.
    • Values usually cannot be obtained because collecting data from everyone is too expensive/time-consuming.
  • Sample (smaller green oval):
    • A smaller group taken from the population.
    • Its characteristics are the statistics.
  • The goal is to use statistics from the sample to estimate the characteristics (parameters) of the population.

Cyclical Nature of Statistics

  • Population → (Sample) - From the population, look at a small group called the sample.
  • Sample → (Statistics) - From the sample, calculate a statistic.
  • Statistics → (Parameter) - The value of the statistic serves as an estimate of the value of the parameter.
  • Parameter → (Population) - The parameter is a characteristic of the population.
  • The cycle:
    • Start with a population.
    • Look at a smaller group (sample).
    • Calculate the statistic from the sample.
    • Use the statistic to estimate the parameter.
    • Use the parameter to describe the population.
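
The cycle above can be sketched as a short simulation. This is an illustration under stated assumptions: the incomes are randomly generated (not real data), and the helper estimate_mean, the population size, and the sample size are all hypothetical choices.

```python
import random
import statistics

def estimate_mean(population, sample_size, rng):
    """Draw one random sample and return its mean (the statistic)."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)

rng = random.Random(1)

# Step 1: start with a population (hypothetical incomes)
population = [rng.gauss(50_000, 12_000) for _ in range(5_000)]

# The parameter is fixed, but normally unknown to the researcher
parameter = statistics.mean(population)

# Steps 2-4: repeat "sample -> statistic -> estimate" five times
estimates = [estimate_mean(population, 100, rng) for _ in range(5)]

for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: estimate = {est:,.0f}  (error = {est - parameter:+,.0f})")
```

Each run of the loop yields a slightly different statistic, while the parameter stays the same: statistics vary from sample to sample, and parameters do not.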