Data Production Notes #1
Data Production: Take sample data from the population, with sampling and study designs that avoid bias.
Displaying and Summarizing: Use appropriate displays and summaries of the sample data, according to variable types and roles.
Probability: Assume we know what's true for the population; how should random samples behave?
Statistical Inference: Assume we only know what's true about sampled values of a single variable or relationship: what can we infer about the larger population?
Population: The entire collection of individuals you want to learn about.
(Parameter) is the numerical value that summarizes the population.
Sample: Part of the population that is selected for the study.
(Statistic) is the numerical value that summarizes the sample.
Experiment: A study where the person conducting the study considers how a response behaves under experimental conditions.
Observational study: A study where the person conducting the study observes characteristics of a sample selected from a population.
Hawthorne Effect: Noticing that employees work harder simply because management installed a camera and they know that they are being watched.
Anecdotal Evidence: A friend claims that drinking herbal tea cured their cold, but that's just one story, not scientific proof.
Confounding Variables: Finding that children who wear glasses tend to do better in school, without realizing that it's age (not glasses) influencing both.
Lack of Realism: Testing the effect of loud music on concentration in a quiet lab may not represent a student's noisy home environment.
Paired Sample: Measuring students' test scores before and after a special tutoring session to see improvement.
Two Sample: Comparing average heights between two independent groups, men and women.
Type 1 Error: A COVID-19 test says “positive” when the patient is actually healthy (a false alarm).
Type 2 Error: A cancer screening test says “negative” when the patient actually has cancer (a missed diagnosis).
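The two error types above can be told apart by comparing what the test says with what is actually true. The following is a minimal sketch, assuming the "no condition" state plays the role of the default (null) assumption; the `results` list and the `classify` helper are hypothetical and made up for illustration only.

```python
# Each pair is (test_says_positive, patient_actually_has_condition).
# Hypothetical results, invented for illustration; not real data.
results = [
    (True, True),    # correct positive
    (True, False),   # false alarm
    (False, False),  # correct negative
    (False, True),   # missed diagnosis
    (False, False),  # correct negative
    (True, True),    # correct positive
]

def classify(test_positive, has_condition):
    """Label one test result as a Type 1 error, a Type 2 error, or correct."""
    if test_positive and not has_condition:
        return "type_1_error"   # positive result for a healthy patient
    if not test_positive and has_condition:
        return "type_2_error"   # negative result for a patient with the condition
    return "correct"

# Tally how often each outcome occurs.
counts = {"type_1_error": 0, "type_2_error": 0, "correct": 0}
for test_positive, has_condition in results:
    counts[classify(test_positive, has_condition)] += 1

print(counts)
```

A result only counts as an error when the test and the truth disagree; which error it is depends on the direction of the disagreement.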
Types of samples
Simple Random Sample (SRS) | Selects individuals at random and without replacement |
Stratified | Separate random samples from groups of similar individuals within a population |
Cluster | Selects small groups (clusters) at random - all units in each selected cluster are sampled |
Systematic | Selects from an ordered arrangement - random start, then every k-th individual |
Convenience/Haphazard | Nonrandom selection based on ease of access |
Volunteer | Individuals volunteer to be part of the sample |
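The four random sampling methods in the table can be sketched with Python's standard `random` module. This is an illustrative sketch only: the population of ID numbers, the two strata, the ten "rooms" used as clusters, and all function names are hypothetical assumptions, not part of the notes.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: n individuals chosen at random, without replacement."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: a separate random sample from each group of similar individuals."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: choose whole clusters at random, then keep every unit in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(ordered_population, k, rng):
    """Systematic: random start among the first k, then every k-th individual."""
    start = rng.randrange(k)
    return ordered_population[start::k]

rng = random.Random(0)            # fixed seed only so the sketch is repeatable
population = list(range(1, 101))  # hypothetical ID numbers 1..100

# Hypothetical strata and clusters built from the same population.
strata = {"first_year": population[:50], "second_year": population[50:]}
clusters = {f"room_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)

print(srs, strat, clus, syst, sep="\n")
```

Convenience and volunteer samples are left out on purpose: they involve no random selection, so there is no random step to sketch.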
Avoiding Bias