Statistics: Observational Units, Populations, Sampling, and Study Designs

Observational units, population, and sample

Observational unit: This is a specific person or thing we collect data from.
- Example: If we're looking at eye colors in a class, each student (Zev, Nadia) is an observational unit.
- Why it matters: Data is gathered for each individual unit.
Population: This is the entire group of observational units we are interested in for a study.
- Examples: All students at a university; all people living in a city.
- Simply put: It’s the full group you want to learn about.
Sample: This is a smaller piece or a subset of the population from which we actually get information.
- Simply put: A sample is just a part of the whole group.
- Examples: A specific class in a university; a group of first-generation students from that university; a subset of international students.
How population and sample relate:
- Population = the complete group you're interested in (e.g., all students at a university).
- Sample = a smaller selection taken from that population (e.g., just students in a specific math class, or engineering students).
Quick examples (from lecture):
- All students in a university → Population
- Students in a university who study mathematics → Sample
- All employees in a company → Population
- Employees in a company who are over 35 years old → Sample
- All residents in an apartment building → Population
- Residents in an apartment building with blue eyes → Sample
Important note about getting data:
- It's often impossible to get information from everyone in the entire population (too big, privacy issues, difficult to reach).
- Collecting information from a sample is usually feasible (though you might need to draw a new sample for each study).
- When you can't survey the whole population, you use a sample and then make educated guesses about the population based on that sample's data (this is called inferential statistics).
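
A minimal sketch of the population/sample relationship, assuming a made-up population of weekly study hours (the values, the population size of 10,000, and the sample size of 100 are all illustrative choices, not from the lecture). It shows that a mean computed from a randomly drawn sample can serve as an educated guess for the population mean:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly study hours for 10,000 students (made-up values).
population = [random.gauss(15, 4) for _ in range(10_000)]

# Sample: 100 students drawn at random, without replacement.
sample = random.sample(population, 100)

# In a real study the population mean is usually unknown;
# it is computed here only so the two numbers can be compared.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two printed values should be close but not identical; that gap is why conclusions drawn from a sample are treated as estimates rather than exact facts about the population.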

Descriptive statistics vs inferential statistics

Descriptive statistics
- What it is: Ways to organize and summarize data (like using graphs, charts, or tables) to clearly show what the data is already telling you.
- Purpose: To make data easier to understand and to present a clear, short summary (think of it as telling a story with numbers).
- Example: A table from the 1948 US presidential election showing candidates, parties, votes, and percentages. The percentage column summarizes what happened with votes.
- Key point: Descriptive statistics simply describe what was observed; they don't try to make predictions or conclusions beyond the data you have.
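
As a rough illustration of a descriptive summary, the sketch below turns raw vote counts into a percentage column. The candidate names and counts are hypothetical placeholders (they are not the 1948 results), and `percentage_table` is a helper name invented for this example:

```python
# Hypothetical vote counts (illustrative only, not real election data).
votes = {"Candidate A": 500, "Candidate B": 300, "Candidate C": 200}


def percentage_table(counts):
    """Return each entry's share of the total, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


table = percentage_table(votes)

# Print a small summary table: name, raw votes, percentage of the total.
for name, pct in table.items():
    print(f"{name:<12} {votes[name]:>6} {pct:>6.1f}%")
```

Nothing here goes beyond the observed counts: the percentages only restate the data in a form that is easier to read.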
Inferential statistics
- What it is: Using information from a sample to draw conclusions or make predictions about the larger population that the sample came from.
- Why we use it: Because it's often not possible to collect data from everyone in a population, we use a carefully chosen sample to learn about the whole group.
- Example: A table showing how much time Americans 2 years and older spent watching TV in 2010 and 2011. Nobody can record the viewing of every American, so figures like these are presumably based on a sample; reading a trend from them (e.g., noting a "0.2% increase") is a conclusion about the whole population, which goes beyond the raw numbers.
- Important note: Inferential statistics often look at comparisons or trends, but the conclusions are only as reliable as the sample and the way the data was collected and presented.

Observational studies vs designed experiments

Observational study
- Researchers watch and measure characteristics without changing anything in the study environment.
- Examples:
  - Tracking whether drinking soda is linked to heart disease (this can show a connection or association, not direct cause).
  - Observing the relationship between air pollution and asthma rates in children.
  - Just observing what people wear at a mall (only describing what's seen).
Designed experiment (randomized experiment)
- Researchers actively create different conditions (called "treatments" and "controls") and then observe the results.
- Examples:
  - A drug trial to see if a new drug works (one group gets the drug, another gets a fake pill/placebo, which is the control).
  - A plant growth experiment where different plants get different types of water (tap, filtered, salt, sugar) to see which grows best. These are controlled conditions.
Main difference:
- Observational studies: Can show if things are related or associated, but they cannot prove cause and effect.
- Randomized experiments: Can provide strong evidence of cause and effect if done correctly, with units randomly assigned to treatments and proper controls in place.
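
One way the random assignment in a designed experiment might be sketched, assuming 20 hypothetical participants split evenly into a treatment group and a control group (the participant labels, the 50/50 split, and the `assign_groups` helper are assumptions made for illustration):

```python
import random


def assign_groups(units, seed=None):
    """Shuffle the units and split them into treatment and control halves."""
    rng = random.Random(seed)   # local generator, so the global state is untouched
    shuffled = list(units)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant labels P1..P20.
participants = [f"P{i}" for i in range(1, 21)]
groups = assign_groups(participants, seed=42)

print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because chance alone decides who lands in each group, differences between the groups other than the treatment tend to balance out, which is what lets an experiment support a cause-and-effect conclusion.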

Simple random sampling (SRS) and sampling bias

Simple random sampling (SRS)
- This is the standard baseline method for sampling: every person or item in the population has an equal chance of being chosen, and every possible sample of a given size is equally likely to be selected.
- Purpose: To get a sample that truly represents the whole population and to avoid making unfair choices (