two types of studies
observational and experimental
observational study
observing variables without manipulating them; can show correlation but not cause and effect
experimental study
making observations with a manipulated variable; investigating cause and effect
controlled experiment
changes one variable at a time
independent / explanatory variable
variable being manipulated
dependent variable / response variable
variable measured as the outcome; produces the data needed to support or refute the hypothesis
constants
aspects of the experiment kept the same
confounding variable
a variable related to both the independent and dependent variables that could be affecting the outcome
experimental units
individuals or objects the treatments are applied to; human units are referred to as subjects
four principles of experimental design
comparison, random assignment, controls / constants, replication
comparison
the experiment compares two or more treatments
random assignment
using chance to assign treatments to groups
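A minimal Python sketch of random assignment, assuming 20 hypothetical subjects numbered 1 to 20 and two treatment labels made up for illustration. The helper name `randomly_assign` is not from these notes.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units by chance, then deal them out to the treatments in turn."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical subjects 1..20 split between two made-up treatment labels.
groups = randomly_assign(range(1, 21), ["treatment", "placebo"], seed=1)
print(groups)
```

Because the units are shuffled before being dealt out, chance alone decides which group each subject lands in.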
controls / constants
variables kept the same in an experiment
replication
using enough experimental units in each treatment group so that chance differences are unlikely to explain the results
placebo effect
any response to a dummy (inactive) treatment
double-blind experiment
neither the researchers nor the subjects know which treatment each subject is receiving
population
entire group we want information about
census
collects data from every individual in the population
sample
subset of individuals from the population from which we actually collect data
convenience sample
choosing easy-to-reach individuals from the population
voluntary response surveys
people decide whether to join the sample; creates bias since people with strong opinions are the most likely to participate
simple random sampling
every group of n individuals in the population has an equal chance of being selected (hat method)
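The hat method can be sketched in Python with the standard library's `random.sample`, which draws without replacement. The population of 100 numbered individuals and the helper name `simple_random_sample` are assumptions for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement, like pulling names from a hat."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(list(population), n)

# Hypothetical population: individuals labeled 1..100.
srs = simple_random_sample(range(1, 101), 10, seed=42)
print(srs)
```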
stratified random sampling
classify the population into homogeneous groups (strata) and take an SRS from each stratum
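A rough Python sketch of stratified random sampling: group individuals by stratum, then take an SRS inside every stratum. The student list, the "junior"/"senior" strata, and the helper name `stratified_sample` are all made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take an SRS from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in population:
        strata[stratum_of(individual)].append(individual)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 40 students, each tagged with a class year.
students = [(f"student{i}", "junior" if i % 2 else "senior") for i in range(40)]
chosen = stratified_sample(students, lambda s: s[1], n_per_stratum=5, seed=3)
print(chosen)
```

Every stratum contributes individuals, so no class year can be left out by chance.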
cluster sampling
divide the population into heterogeneous groups (clusters, often geographic) and take an SRS of the clusters; every individual in a chosen cluster must be used
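A rough Python sketch of cluster sampling: the random draw picks whole clusters, and everyone in a chosen cluster is kept. The four city blocks and the helper name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of cluster names and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen_names}

# Hypothetical clusters: four city blocks with five households each.
blocks = {
    "block A": [f"A{i}" for i in range(5)],
    "block B": [f"B{i}" for i in range(5)],
    "block C": [f"C{i}" for i in range(5)],
    "block D": [f"D{i}" for i in range(5)],
}
sampled = cluster_sample(blocks, n_clusters=2, seed=7)
print(sampled)
```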
systematic sampling
randomly choose starting point in population and select every kth member
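A minimal Python sketch of systematic sampling, assuming the random starting point is picked among the first k members of an ordered list. The list of 100 members and the helper name `systematic_sample` are made up for illustration.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every kth member."""
    rng = random.Random(seed)
    start = rng.randrange(k)           # index 0 .. k-1
    return list(population)[start::k]

# Hypothetical ordered population: members labeled 0..99, every 10th selected.
every_tenth = systematic_sample(range(100), k=10, seed=5)
print(every_tenth)
```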
undercoverage
occurs when some members of the population cannot be chosen for the sample
nonresponse
when an individual chosen for the sample cannot be reached or refuses to participate
response bias
a systematic pattern of incorrect responses; for example, the person asking the questions could affect the answers given
wording of question
the way a question is worded could affect the answers given
qualitative data
shown in bar graphs, frequency / relative frequency tables, pie charts
quantitative data
shown in histograms, stem plots, dot plots, box and whisker plots
two-way table
two categorical variables organized according to a row and column variable
marginal distribution
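distribution of one of the two categorical variables on its own, read from the row or column totals (the "margins") of a two-way table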
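To illustrate reading the margins of a two-way table, here is a small Python sketch. The table counts, the row and column labels, and the helper name `marginal_distributions` are invented for the example.

```python
def marginal_distributions(table):
    """Return (row, column) marginal distributions as proportions of the grand total."""
    total = sum(sum(row.values()) for row in table.values())
    row_marginal = {r: sum(cols.values()) / total for r, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for c, count in cols.items():
            col_totals[c] = col_totals.get(c, 0) + count
    col_marginal = {c: v / total for c, v in col_totals.items()}
    return row_marginal, col_marginal

# Hypothetical two-way table: class year (rows) by survey answer (columns).
table = {
    "juniors": {"yes": 30, "no": 20},
    "seniors": {"yes": 10, "no": 40},
}
row_marginal, col_marginal = marginal_distributions(table)
print(row_marginal)   # row totals divided by the grand total
print(col_marginal)   # column totals divided by the grand total
```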
mean
average of the data (x̄, "x-bar", for the sample mean and μ, "mu", for the population mean)
median
midpoint of distribution when data is arranged smallest to largest
interquartile range (IQR)
range of the middle half of the data (Q3 - Q1)
five number summary
minimum, Q1, median, Q3, maximum
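A small Python sketch of the five number summary and IQR. Quartile conventions vary between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when the count is odd. The data values and the helper name `five_number_summary` are made up.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two data values."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]  # skip the median itself when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

scores = [1, 3, 4, 7, 8, 10, 12]       # hypothetical data
summary = five_number_summary(scores)
iqr = summary[3] - summary[1]          # Q3 - Q1
print(summary)                         # (1, 3, 7, 10, 12)
print(iqr)                             # 7
```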
two ways to measure spread
IQR (based on quartiles; paired with the median); standard deviation (paired with the mean)
standard deviation
measures the typical distance of observations from the mean; calculated by averaging the squared distances from the mean (dividing by n - 1 for a sample, n for a population) and taking the square root (s for sample and σ, "sigma", for population)
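A short Python sketch of the standard deviation calculation, showing the sample version (divide by n - 1) next to the population version (divide by n). The data values and the helper name `standard_deviation` are made up; `statistics.stdev` and `statistics.pstdev` from the standard library are used only as a cross-check.

```python
from math import sqrt
from statistics import stdev, pstdev

def standard_deviation(data, sample=True):
    """Square root of the averaged squared distances from the mean."""
    n = len(data)
    mean = sum(data) / n
    squared_distances = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n   # n - 1 for s, n for sigma
    return sqrt(squared_distances / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data, mean = 5
s = standard_deviation(data, sample=True)
sigma = standard_deviation(data, sample=False)
print(s, stdev(data))                  # the two sample values should agree
print(sigma, pstdev(data))             # the two population values should agree
```

The sample value comes out slightly larger than the population value because it divides by the smaller number n - 1.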
skewed right
when data values are concentrated on the left and fewer values are on the right; mean is usually greater than median (dinosaur tail points right)
skewed left
when data values are concentrated on the right and fewer values are on the left; mean is usually less than median (dinosaur tail points left)
symmetric
when data values are balanced around the center; mean and median are about the same
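A quick Python check of how the mean and median compare for each shape. The three data sets are made up for illustration, and the comparison is a rule of thumb rather than a guarantee for every data set.

```python
from statistics import mean, median

# Hypothetical data sets, one per shape.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # long tail to the right
left_skewed = [1, 17, 18, 18, 19, 19, 19, 20]    # long tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(name, "mean =", mean(values), "median =", median(values))
```

The extreme value in the tail pulls the mean toward it, while the median barely moves.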