What makes a good sample
precise (low sampling error)
accurate (unbiased)
random
What constitutes a random sample
independent (one individual’s selection doesn’t influence another’s) and every individual has an equal chance of being selected
what makes a bad sample
sample of convenience - a collection of individuals that happen to be available
population parameters
are consistent
sample estimates ____ with the sample
vary
2 ways to describe uncertainty
95% CI and Standard Deviation
Pseudoreplication
samples that are not independent but are treated like they are
how the CI changes when the confidence level increases from 95% to 99%
interval must widen
how increasing sample size affects CI width at the same confidence level
narrower
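Both behaviors above (higher confidence level = wider interval; larger n = narrower interval) fall out of the normal-approximation half-width, z × sd / √n. A minimal sketch; the function name and numbers are illustrative, not from the cards:

```python
import math

def ci_half_width(sd, n, z):
    # Normal-approximation CI half-width for a mean: z * standard error
    return z * sd / math.sqrt(n)

sd = 10.0
w95 = ci_half_width(sd, n=25, z=1.960)          # 95% CI
w99 = ci_half_width(sd, n=25, z=2.576)          # 99% CI: larger z, wider
w95_more_n = ci_half_width(sd, n=100, z=1.960)  # same 95%, larger n: narrower
```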
Graphing 1 numerical
histogram
Graphing 1 categorical
bar graph
Graphing 2 categorical
mosaic or grouped bar graph
Graphing 1 categorical 1 numerical
box plot, violin plot, strip chart
Graphing 2 numerical
scatter plot
common problems with figures
do axes start at zero?
do axes titles have units?
are x-axis group names in a legend?
2 general statistics and examples
descriptive - characterizing aspects of a numerical data set (mean and sd)
inferential - evaluate strength of evidence about a hypothesis (t-value, F-ratio)
Type 1 error and how to decrease
false positive: rejecting Ho when it is actually true
decrease the decision threshold (α, e.g., from 0.05 to 0.01)
Type 2 error and how to decrease
false negative: fail to reject Ho when it is actually false
increase sample size = increased power
power
probability that a random sample results in the rejection of a false Ho
p value
probability of getting data at least as extreme as observed if Ho is true
signal to noise ratios
t-value
F-ratio
r-value
1 sample t-test
comparing a sample mean to a known (hypothesized) value
proportions
when you have a number of successes out of a sample size and want to test it against a known proportion
paired t-test
2 sets of measurements that are connected (e.g., before/after on the same subjects); tests whether they differ
2-sample t-test
2 independent groups, not connected; tests whether their means differ
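A t-value really is a signal-to-noise ratio: the difference in means (signal) over a standard error (noise). A minimal Welch-style 2-sample sketch in plain Python; the data and function name are made up for illustration:

```python
import math
from statistics import mean, stdev

def two_sample_t(x, y):
    # signal: difference in sample means; noise: combined standard error
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se

a = [5.1, 4.8, 5.6, 5.0, 4.9]
b = [4.2, 4.0, 4.5, 3.9, 4.3]
t = two_sample_t(a, b)  # large |t| = strong signal relative to noise
```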
paired, 2-sample, and ANOVA assumptions
random
normal distributions
2-sample and anova assumptions
variances of both populations are equal
correlation
describes the linear association between 2 numerical variables
correlation coefficient
quantifies linear association through
direction and
magnitude/strength
why variables might be correlated
chance
a causes b
c may cause/influence a and b
a may lead to an increase in c which increases b
correlation misconception
association/correlation does not imply causation
point of linear regression
predict the value of one variable (response) from another (explanatory)
difference between regression and correlation
regression does not treat the two variables equally
residuals
the difference between the actual value and the predicted value of the response variable
minimize these squared for best fit line
R²
variance in y (response variables) explained by x (explanatory variable)
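The least-squares idea above (minimize the squared residuals; R² = share of variance in y explained by x) can be sketched directly. Function names are illustrative:

```python
def fit_line(xs, ys):
    # Least-squares slope and intercept minimize the sum of squared residuals
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def r_squared(xs, ys):
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # actual - predicted
    my = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot  # fraction of variance in y explained by x
```

On data that lie exactly on a line, every residual is zero, so R² comes out as 1.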
predicting mean y for x
higher precision
use confidence intervals = bands bow in toward the line, narrowest near the mean of x
predicting specific y for x
lower precision
use prediction intervals = run parallel to line of best fit
extrapolation
predicting y for x out of range
you don’t know if the trend continues
assumptions for correlation and regression
relationship between x and y is linear
frequency distributions are normal (no gaps, no outliers)
variance of y does not change with x (no funnel shape)
(each y is chosen at random for each x - regression only)
three things to do when assumptions fail
ignore them
transform the data
non-parametric tests
non-parametric tests
ranks the data
Wilcoxon (rank-based) test assumptions
both samples are random
both have same distribution shape
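Rank-based tests use only the relative order of the values, not their magnitudes. As an illustration (not from the cards), the Mann-Whitney U statistic behind the Wilcoxon rank-sum test just counts, over every pair, how often one sample’s value beats the other’s:

```python
def mann_whitney_u(x, y):
    # Count pairs where an x-value exceeds a y-value; ties count one half.
    # Only the ordering matters, so outliers cannot dominate the statistic.
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

Under Ho the two samples interleave, so U is near len(x) × len(y) / 2; values near 0 or len(x) × len(y) suggest the groups differ.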
plots for addressing normal frequency
histogram
qq plot
plots for addressing equal variance
boxplots with roughly equal IQRs
why normal distributions are important
they occur frequently in nature
symmetric about mean - bell shaped
fully described by mean and sd, mean = median = mode
95% of data fall within 2 sd of the mean
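The 95%-within-2-sd rule can be checked empirically by simulation (the seed and sample size here are arbitrary choices):

```python
import random

random.seed(1)
draws = [random.gauss(0, 1) for _ in range(100_000)]
# Fraction of draws within 2 standard deviations of the mean
frac = sum(abs(d) <= 2 for d in draws) / len(draws)  # close to 0.95
```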
2 goals of experimental design
eliminate bias (increase accuracy)
decrease sampling error (increase precision)
ways to eliminate bias
controls
random assignment of samples to treatments
blinding
random assignment can only happen for
experimental studies not observational
Placebo
special kind of control, the expectation to get better is powerful
independent recovery
people will often get better anyway because they seek treatment when they feel worst, so you need a control group to compare to in order to see whether the treatment actually works
goal of reducing sampling error
increase signal to noise ratio
how to increase signal to noise ratio
increase sample size
decrease sd
4 ways to reduce sampling error
replication
balance
blocking
extreme treatments
balance
each group has equal number of samples
n1 = n2 gives the smallest standard error, so the least noise
blocking
grouping of similar experimental units into blocks
creates mini experiments in each block
accounts for variation between blocks
extreme treatments
stronger treatments can increase signal to noise ratio
threats to reproducible science
bias - no controls, randomization, or blinding
low statistical power - small sample size
poor quality control (higher sampling error) - no replication, balance, blocking, or extreme treatments
P-hacking - keep adding data till significance is seen
publication bias - only publishing significant results
HARKing - hypothesizing after the results are known
planning sample size and problems with that
want a sample size with sufficient power and precision
calculate sample size assuming 2 sample experiment comparing means with normal data sets and equal sd
n = 8 x (sd/margin of error)²
problem = can be hard to estimate the population sd and margin of error without literature or short trial experiment
focus on the beginning of the curve because that’s where you can add the most precision
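The rule-of-thumb formula above, n = 8 × (sd / margin of error)², is simple to apply once sd and the desired margin are estimated; the numbers below are illustrative:

```python
import math

def planned_n(sd, margin):
    # Rule-of-thumb n per group for a 2-sample comparison of means
    # (normal data, equal sds): n = 8 * (sd / margin)^2
    return math.ceil(8 * (sd / margin) ** 2)

n = planned_n(sd=10, margin=5)  # sd twice the margin -> 32 per group
```

Because the margin of error is squared, halving the desired margin quadruples the required sample size.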
frequentist stats
defines the probability of an event as the relative frequency with which it tends to occur; deductive
the true population looks like this so my sample should look like this
Bayesian stats
more subjective, defines probability as a measure of strength of your belief regarding true situation
inductive
my sample came out like this so the true population might be this
Bayes’ theorem for a particular parameter
posterior ∝ likelihood × prior
posterior
new probability given prior data and new data
likelihood
probability of data given parameter
prior
probability of parameter
2 factors that affect where the posterior is on the graph
precision - the posterior is pulled toward the skinniest (most precise) curve
distance between prior and likelihood - further apart = more shifted posterior
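The posterior ∝ likelihood × prior update can be sketched on a discrete grid of candidate proportions. A flat prior and a binomial likelihood are assumed here purely for illustration:

```python
import math

def grid_posterior(k, n, n_grid=101):
    # Candidate values of the parameter p (a proportion)
    ps = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid                  # flat prior belief
    # Binomial likelihood: probability of k successes in n trials given p
    like = [math.comb(n, k) * p**k * (1 - p)**(n - k) for p in ps]
    unnorm = [l * pr for l, pr in zip(like, prior)]  # posterior ∝ like × prior
    total = sum(unnorm)
    return ps, [u / total for u in unnorm]           # normalize to sum to 1

ps, post = grid_posterior(k=7, n=10)  # posterior peaks near p = 0.7
```

With a flat prior the posterior simply tracks the likelihood; a sharper (more precise) prior would pull the peak toward the prior instead.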