Statistics: Observational Units, Populations, Sampling, and Study Designs

Observational units, population, and sample

  • Observational unit: This is a specific person or thing we collect data from.

    • Example: If we're looking at eye colors in a class, each student (Zev, Nadia) is an observational unit.

    • Why it matters: Data is gathered for each individual unit.

  • Population: This is the entire group of observational units we are interested in for a study.

    • Examples: All students at a university; all people living in a city.

    • Simply put: It’s the full group you want to learn about.

  • Sample: This is a smaller piece or a subset of the population from which we actually get information.

    • Simply put: A sample is just a part of the whole group.

    • Examples: A specific class in a university; a group of first-generation students from that university; a subset of international students.

  • How population and sample relate:

    • Population = the complete group you're interested in (e.g., all students at a university).

    • Sample = a smaller selection taken from that population (e.g., just students in a specific math class, or engineering students).

  • Quick examples (from lecture):

    • All students in a university → Population

    • Students in a university who study mathematics → Sample

    • All employees in a company → Population

    • Employees in a company who are over 35 years old → Sample

    • All residents in an apartment building → Population

    • Residents in an apartment building with blue eyes → Sample

  • Important note about getting data:

    • It's often impossible to get information from everyone in the entire population (too big, privacy issues, difficult to reach).

    • Collecting information from a sample is usually feasible (though you might need to pick new samples for different studies).

    • When you can't survey the whole population, you use a sample and then make educated guesses about the population based on that sample's data (this is called inferential statistics).
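
The idea of estimating a population value from a sample can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the population of student ages, the sample size of 200, and the fixed seed are all made-up assumptions. In a real study the population mean would be unknown, which is why the sample is needed.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 university students (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.randint(17, 30) for _ in range(10_000)]

# Draw a sample of 200 students without replacement.
sample = rng.sample(population, k=200)

# The sample mean serves as an estimate of the population mean.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)  # usually unknown in practice

print(f"sample mean:     {sample_mean:.2f}")
print(f"population mean: {population_mean:.2f}")
```

The two printed values should be close but not identical; that gap is the uncertainty that inferential statistics tries to quantify.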

Descriptive statistics vs inferential statistics

  • Descriptive statistics

    • What it is: Ways to organize and summarize data (like using graphs, charts, or tables) to clearly show what the data is already telling you.

    • Purpose: To make data easier to understand and to present a clear, short summary (think of it as telling a story with numbers).

    • Example: A table from the 1948 US presidential election showing candidates, parties, votes, and percentages. The percentage column summarizes what happened with votes.

    • Key point: Descriptive statistics simply describe what was observed; they don't try to make predictions or conclusions beyond the data you have.

  • Inferential statistics

    • What it is: Using information from a sample to draw conclusions or make predictions about the larger population that the sample came from.

    • Why we use it: Because it's often not possible to collect data from everyone in a population, we use a carefully chosen sample to learn about the whole group.

    • Example: A table showing how much time Americans 2 years and older spent watching TV in 2010 and 2011. The inferential part comes when you interpret the data (e.g., noting a "0.2% increase") to explain a trend over time, which goes beyond just the raw numbers.

    • Important note: Inferential statistics often involve comparisons or trends, and the conclusions are only as reliable as the underlying data and the way it is collected and presented.
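
The descriptive side can be illustrated with a vote table like the one in the election example. The sketch below is only an illustration: the candidate names and vote counts are hypothetical placeholders, not the actual 1948 results. It turns raw counts into a percentage column, which summarizes the data without claiming anything beyond it.

```python
# Hypothetical vote counts (illustrative numbers, not real election results).
votes = {"Candidate A": 4_800, "Candidate B": 4_200, "Candidate C": 1_000}

total = sum(votes.values())

# Percentage share for each candidate: a descriptive summary of the counts.
percentages = {name: 100 * count / total for name, count in votes.items()}

# Print a small table with candidate, votes, and percentage.
print(f"{'Candidate':<12}{'Votes':>8}{'Percent':>9}")
for name, count in votes.items():
    print(f"{name:<12}{count:>8}{percentages[name]:>8.1f}%")
```

Nothing here generalizes beyond the counts themselves. It would become inferential only if these counts came from a sample and were used to say something about a larger group.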

Observational studies vs designed experiments

  • Observational study

    • Researchers watch and measure characteristics without changing anything in the study environment.

    • Examples:

      • Tracking whether people who drink soda have higher rates of heart disease (this can show a connection or association, not direct causation).

      • Observing the relationship between air pollution and asthma rates in children.

      • Just observing what people wear at a mall (only describing what's seen).

  • Designed experiment (randomized experiment)

    • Researchers actively create different conditions (called "treatments" and "controls"), assign subjects to them (at random, in a randomized experiment), and then observe the results.

    • Examples:

      • A drug trial to see if a new drug works (one group gets the drug, another gets a fake pill/placebo, which is the control).

      • A plant growth experiment where different plants get different types of water (tap, filtered, salt, sugar) to see which grows best. These are controlled conditions.

  • Main difference:

    • Observational studies: Can show if things are related or associated, but they cannot prove cause and effect.

    • Randomized experiments: Can provide strong evidence of cause and effect if done correctly with proper controls.
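
What makes an experiment "randomized" is that chance, not the researcher, decides who gets the treatment and who gets the control. A minimal sketch of that assignment step follows. The participant IDs, the group size of 20, and the fixed seed are hypothetical choices made for illustration.

```python
import random

# Hypothetical participant IDs (made up for illustration).
participants = [f"P{i:02d}" for i in range(1, 21)]

rng = random.Random(0)  # fixed seed so the assignment is reproducible
shuffled = participants[:]  # copy so the original list is left untouched
rng.shuffle(shuffled)

# First half gets the treatment (e.g. the new drug),
# second half gets the control (e.g. the placebo).
half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because the split is random, the two groups should be similar on average in everything except the treatment, which is what supports a cause-and-effect conclusion.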

Simple random sampling (SRS) and sampling bias

  • Simple random sampling (SRS)

    • This is the standard baseline method for sampling: every person or item in the population has an equal chance of being chosen, and every possible sample of the same size is equally likely.

    • Purpose: To get a sample that truly represents the whole population and to avoid making unfair choices (