Definitions found on the AP Stat midterm list
Categorical data
Words
Ex- male/female, eye color, breed of dogs
Quantitative data
Numerical
Ex- weight of hamsters, amounts of chemicals in beverages
Bar graphs
Bars do not touch- for graphing one variable
Pie charts
Percentages must sum up to 100%, for graphing one variable
Dot plots
Can resemble probability curves, for graphing one quantitative variable
Stem and leaf plots
Remember to put key, split the stems if there is too much data
Histogram
Looks like a bar graph but allows bars to touch, groups data into classes, shape is easily visible
Box plot
Shows 5-number summary and outliers, side-by-sides are good for comparing quartile, median, and spread
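A rough sketch of how the 5-number summary and outliers behind a box plot could be computed. The quartile rule used here (median of each half, leaving out the middle value when the count is odd) and the 1.5 x IQR outlier cutoff are common classroom conventions assumed for illustration; the function names and the data are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # values above it (median skipped when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Flag points more than 1.5 * IQR beyond the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical data
summary = five_number_summary(scores)
flagged = outliers(scores)
print(summary, flagged)
```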
Scatterplots
Look for relations between variables
Describe using form, strength, and direction
Linear correlation coefficient (r)
Measures the strength and direction of the linear relationship, -1 ≤ r ≤ 1
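A minimal sketch of computing r by hand, assuming paired lists of equal length. The function name and the data are hypothetical; the two data sets are chosen to be perfectly linear so r lands at the ends of its range.

```python
from math import sqrt

def correlation(xs, ys):
    """Linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]            # hypothetical x values
rising = [2, 4, 6, 8, 10]          # perfect positive linear pattern
falling = [10, 8, 6, 4, 2]         # perfect negative linear pattern
r_up = correlation(hours, rising)
r_down = correlation(hours, falling)
print(r_up, r_down)
```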
Least squares regression line (LSRL)
Used for prediction
Minimizes the sum of the squared vertical distances from each data point to the line drawn
y-hat
Predicted y value
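One way the LSRL and a y-hat prediction might be computed, using the standard least-squares formulas for slope and intercept. The names fit_lsrl and predict and the data are illustrative assumptions, not part of the list.

```python
def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return intercept, slope

def predict(intercept, slope, x):
    """y-hat: the predicted y value for a given x."""
    return intercept + slope * x

xs = [1, 2, 3, 4, 5]               # hypothetical data
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_lsrl(xs, ys)
y_hat_at_3 = predict(intercept, slope, 3)
print(intercept, slope, y_hat_at_3)
```

Calling predict with an x far outside 1 to 5 would still return a number, which is exactly the extrapolation risk: the code cannot tell that the line is no longer trustworthy there.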
Extrapolation
Predicting a y value when the x is far from the other x values and not represented on the visual graph
Coefficient of Determination (r^2)
gives the proportion (%) of variation in the values of y that can be explained by the linear relationship with x seen in the regression line
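A small sketch of r^2 computed as 1 minus (unexplained variation / total variation), which for a least-squares line equals the square of r. The function name and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Proportion of the variation in y explained by the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Variation left over after using the line (sum of squared residuals)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]               # hypothetical data
ys = [2, 4, 5, 4, 5]
proportion_explained = r_squared(xs, ys)
print(proportion_explained)
```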
Residual (y-y hat)
Vertical distance from actual data point to the regression line
Residual plot
Scatterplot of observed x values and residuals, or (x, y - y hat)
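A short sketch of the points that go on a residual plot: each observed x paired with its residual (y - y hat). The data are made up, and the line y hat = 2.2 + 0.6x is assumed to be the already-fitted least-squares line for that data.

```python
def residuals(xs, ys, intercept, slope):
    """Residual = actual y minus predicted y, for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]               # hypothetical data
ys = [2, 4, 5, 4, 5]
resid = residuals(xs, ys, intercept=2.2, slope=0.6)

# Points to plot: observed x on the horizontal axis, residual on the vertical axis
residual_plot_points = list(zip(xs, resid))
print(residual_plot_points)
```

For a least-squares line the residuals sum to zero, so the plotted points should scatter above and below the horizontal axis with no clear pattern if a line is a good model.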
Lurking variables
Variables not identified or considered
Confounding variables
A third party variable affects the response variable, and its effect cannot be separated from the effect of the explanatory variable
Common response
A third party variable affects both the explanatory and response variables
Census
Contacts every individual in the population to obtain data
Mean and SD computed from the whole population are parameters (the same values computed from a sample are statistics)
Sample survey
Collects data from a part of a population in order to learn about the entire population
Voluntary response sample
Participants choose themselves, usually only those with strong opinions choose to respond
Ex- online surveys, call-in opinion questions
Convenience sample
Investigators choose to sample those people who are easy to reach
Ex- marketing surveys done in a mall
Bias
The design systematically favors certain responses or outcomes
Ex- surveying pacifist church members on their opinions about war
Simple random sample
A group of n individuals chosen from a population in such a way that every set of n individuals has an equal chance of being the chosen sample
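A minimal sketch of drawing a simple random sample with Python's standard random module. The population of student labels, the helper name, and the seed are assumptions for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Choose n individuals so that every set of n is equally likely."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

students = [f"student_{i}" for i in range(1, 101)]   # hypothetical population
chosen = simple_random_sample(students, 10, seed=1)
print(chosen)
```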
A statistically significant result from an experiment that follows all of the rules of experimentation
What can claim causation?
Stratified random sample
divide the population into groups (strata) of similar individuals (by some chosen category) then choose a simple random sample of each of the groups
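A sketch of a stratified random sample, assuming the strata are already known and stored in a dictionary. The grade-level strata, the helper name, and the sample size per stratum are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

strata = {                              # hypothetical strata by grade level
    "freshmen": [f"fr_{i}" for i in range(50)],
    "sophomores": [f"so_{i}" for i in range(40)],
    "juniors": [f"jr_{i}" for i in range(30)],
}
sample = stratified_sample(strata, 5, seed=2)
print(sample)
```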
systematic random sampling
choosing every nth individual after choosing the first randomly
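A sketch of systematic random sampling: pick a random starting point among the first few individuals, then take every nth one after it. The helper name, the list of ID numbers, and the step of 10 are assumptions.

```python
import random

def systematic_sample(population, step, seed=None):
    """Random start within the first `step` individuals, then every step-th one."""
    rng = random.Random(seed)
    start = rng.randrange(step)        # random index from 0 to step - 1
    return population[start::step]

id_numbers = list(range(100))          # hypothetical ordered list of individuals
picked = systematic_sample(id_numbers, 10, seed=3)
print(picked)
```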
cluster random sample
divide the population into groups (most likely by location), randomly select one or multiple of these groups, and survey each member from each of the selected groups
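A sketch of a cluster random sample: randomly choose whole groups, then include every member of each chosen group. The homeroom clusters and the helper name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen_names}

homerooms = {                           # hypothetical clusters by location
    "room_101": ["Ana", "Ben", "Cy"],
    "room_102": ["Dee", "Eli", "Fay"],
    "room_103": ["Gus", "Hal", "Ivy"],
    "room_104": ["Jo", "Kai", "Lu"],
}
surveyed = cluster_sample(homerooms, 2, seed=4)
print(surveyed)
```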
undercoverage, nonresponse, response bias, wording of questions
Cautions (things that could ruin the sample):
Undercoverage
when some groups of the population are left out, often because a complete list of the population from which the sample was chosen was not available
nonresponse
when an individual appropriately chosen for the sample cannot or does not respond
response bias
when an individual does not answer a question truthfully due to shame, embarrassment, or pride
question wording
questions are worded to elicit a particular response (self-fulfilling prophecy)
after sample is obtained
When does experimental design take place?
observational study
observes individuals in a population or sample, measures variables of interest, but does not in any way assign treatments or influence responses
experiment
deliberately imposes some treatment on individuals (experimental units or subjects) in order to observe a response- can only give evidence for causation if it is designed well and the result is statistically significant
control
controls lurking variables by assigning some units to a comparison group that does NOT get the treatment (normal person)
randomize
use chance (random assignment) to assign units to treatment groups or control groups
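A minimal sketch of randomizing units into a treatment group and a control group by shuffling and splitting in half. The helper name, the subject labels, and the even split are assumptions for illustration.

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle the units by chance, then split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(units)             # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

subjects = [f"subject_{i}" for i in range(1, 21)]   # hypothetical units
groups = randomly_assign(subjects, seed=5)
print(groups)
```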
replicate
use the same treatment on many units to reduce the variation due to chance
double blind
“best experiments,” neither the researchers nor the subjects know which treatments are being used on which subjects (placebos are commonly used)
block designs
subjects are grouped before the experiment based on a certain characteristic or set of characteristics, then treatments are randomly assigned within each block
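A sketch of a block design, assuming the blocks have already been formed on some characteristic: the random assignment to treatment or control happens separately inside each block. The block names, the subject labels, and the helper name are hypothetical.

```python
import random

def blocked_assignment(blocks, seed=None):
    """Randomly split each block into treatment and control on its own."""
    rng = random.Random(seed)
    design = {}
    for name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)          # chance decides the order within this block
        half = len(shuffled) // 2
        design[name] = {"treatment": shuffled[:half], "control": shuffled[half:]}
    return design

blocks = {                              # hypothetical blocks by age group
    "under_30": ["a1", "a2", "a3", "a4"],
    "30_and_over": ["b1", "b2", "b3", "b4", "b5", "b6"],
}
design = blocked_assignment(blocks, seed=6)
print(design)
```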
matched pairs
a type of block design where two treatments are assigned, sometimes to the same subject or sometimes to another subject matched with them (very common for twins)
long term, short term
probability refers to “…..” not “……”
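A small simulation sketch of the long-term idea: the fraction of heads in a few coin flips can be far from 0.5, while the fraction over many flips settles close to it. The fair-coin model, the helper name, the flip counts, and the seed are all assumptions.

```python
import random

def relative_frequency(n_flips, seed=0):
    """Fraction of heads in n_flips simulated flips of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

short_run = relative_frequency(10)        # can wander well away from 0.5
long_run = relative_frequency(100_000)    # should sit close to 0.5
print(short_run, long_run)
```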
independent
one event does not change or have an effect on another event
1
all probabilities for one event must sum to…