Statistics: Observational Units, Populations, Sampling, and Study Designs
Observational units, population, and sample
Observational unit: This is a specific person or thing we collect data from.
Example: If we're looking at eye colors in a class, each student (Zev, Nadia) is an observational unit.
Why it matters: Data is gathered for each individual unit.
Population: This is the entire group of observational units we are interested in for a study.
Examples: All students at a university; all people living in a city.
Simply put: It’s the full group you want to learn about.
Sample: This is a smaller piece or a subset of the population from which we actually get information.
Simply put: A sample is just a part of the whole group.
Examples: A specific class in a university; a group of first-generation students from that university; a subset of international students.
How population and sample relate:
Population = the complete group you're interested in (e.g., all students at a university).
Sample = a smaller selection taken from that population (e.g., just students in a specific math class, or engineering students).
Quick examples (from lecture):
All students in a university → Population
Students in a university who study mathematics → Sample
All employees in a company → Population
Employees in a company who are over 35 years old → Sample
All residents in an apartment building → Population
Residents in an apartment building with blue eyes → Sample
Important note about getting data:
It's often impossible to get information from everyone in the entire population (too big, privacy issues, difficult to reach).
Getting information from a sample is usually far more practical (though you might need to pick new samples for different studies).
When you can't survey the whole population, you use a sample and then make educated guesses about the population based on that sample's data (this is called inferential statistics).
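A minimal sketch of the population/sample relationship, assuming a made-up population of student ages (the data, the sample size of 200, and the variable names are all hypothetical, chosen only for illustration). It draws a random sample and uses the sample mean as an educated guess for the population mean:

```python
import random
import statistics

# Hypothetical population: ages of 10,000 students (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [rng.randint(17, 30) for _ in range(10_000)]

# Draw a sample of 200 students at random, without replacement.
sample = rng.sample(population, k=200)

# In a real study the population mean is usually unknown;
# it is computed here only to compare it with the sample estimate.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean age: {population_mean:.2f}")
print(f"Sample mean age (estimate): {sample_mean:.2f}")
```

The two printed values should be close but rarely identical. That gap is why conclusions drawn from a sample are estimates rather than exact facts about the population.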
Descriptive statistics vs inferential statistics
Descriptive statistics
What it is: Ways to organize and summarize data (like using graphs, charts, or tables) to clearly show what the data is already telling you.
Purpose: To make data easier to understand and to present a clear, short summary (think of it as telling a story with numbers).
Example: A table from the 1948 US presidential election showing candidates, parties, votes, and percentages. The percentage column summarizes what happened with votes.
Key point: Descriptive statistics simply describe what was observed; they don't try to make predictions or conclusions beyond the data you have.
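As an illustration of a descriptive summary, the sketch below builds a small vote table with a percentage column. The candidate names and vote counts are made up for the example and are not the 1948 results:

```python
# Hypothetical vote counts (made-up numbers, not real election data).
votes = {"Candidate A": 4_800, "Candidate B": 4_200, "Candidate C": 1_000}

total_votes = sum(votes.values())

# Each candidate's share of the total, rounded to one decimal place.
percentages = {
    name: round(100 * count / total_votes, 1)
    for name, count in votes.items()
}

# Print a simple table: name, votes, percentage.
for name, count in votes.items():
    print(f"{name:<12} {count:>6} {percentages[name]:>5.1f}%")
```

Nothing here goes beyond the observed counts: the percentages only restate the data in a form that is easier to read.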
Inferential statistics
What it is: Using information from a sample to draw conclusions or make predictions about the larger population that the sample came from.
Why we use it: Because it's often not possible to collect data from everyone in a population, we use a carefully chosen sample to learn about the whole group.
Example: A table showing how much time Americans 2 years and older spent watching TV in 2010 and 2011. The inferential part comes when you interpret the data (e.g., noting a "0.2% increase") to explain a trend over time, which goes beyond just the raw numbers.
Important note: Inferential statistics often look at comparisons or trends, but the conclusions are only as reliable as the sample they come from and the way the data were collected and presented.
Observational studies vs designed experiments
Observational study
Researchers watch and measure characteristics without changing anything in the study environment.
Examples:
Tracking whether drinking soda is linked to heart disease (this can show a connection or association, not direct cause).
Observing the relationship between air pollution and asthma rates in children.
Just observing what people wear at a mall (only describing what's seen).
Designed experiment (randomized experiment)
Researchers actively create different conditions (called "treatments" and "controls"), assign subjects to them (at random, in a randomized experiment), and then observe the results.
Examples:
A drug trial to see if a new drug works (one group gets the drug, another gets a placebo, i.e., an inactive pill, and serves as the control).
A plant growth experiment where different plants get different types of water (tap, filtered, salt, sugar) to see which grows best. These are controlled conditions.
Main difference:
Observational studies: Can show if things are related or associated, but they cannot prove cause and effect.
Randomized experiments: Can provide strong evidence of cause and effect if done correctly with proper controls.
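One way to picture what makes an experiment "randomized" is the assignment step. The sketch below is an illustration only: `assign_groups` and the participant IDs are hypothetical names, and an even split into two groups is an assumption. It shuffles the subjects and splits them into a treatment group and a control group, so that chance, not the researcher, decides who gets the treatment:

```python
import random

def assign_groups(units, seed=None):
    """Randomly split units into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)    # random order removes any selection pattern
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs for a 20-person trial.
participants = [f"participant_{i}" for i in range(1, 21)]
treatment, control = assign_groups(participants, seed=7)

print("Treatment:", treatment)
print("Control:", control)
```

Because the split is random, other characteristics of the subjects (age, health, habits) tend to balance out across the two groups, which is what lets a difference in outcomes be attributed to the treatment.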
Simple random sampling (SRS) and sampling bias
Simple random sampling (SRS)
This is the most basic method of random sampling: every person or item in the population has an equal chance of being chosen, and every possible sample of a given size is equally likely.
Purpose: To get a sample that truly represents the whole population and to avoid making unfair choices (