Statistical tests, their assumptions and more!
4 ways to classify variables:
fixed vs random
discrete vs continuous
numerical vs categorical
independent vs dependent
Errors of data organization:
data is not versioned
Shorthand dictionary is missing
data has not been backed up
headers are wordy with lots of spaces and no clear pathway
Type 1 error:
rejecting the null hypothesis when it is actually true (false positive)
Type 2 error:
failing to reject the null hypothesis when it is actually false (false negative)
When to use a chi-squared test:
comparing observed counts of categorical data against expected counts (goodness of fit), or testing whether two categorical variables are independent
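As a rough illustration of the goodness-of-fit case, the statistic can be computed by hand as the sum of (observed - expected)^2 / expected over all categories. The die-roll counts below are made up for the example, and `chi_squared_statistic` is a hypothetical helper name, not a library function.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 die rolls, so a fair die predicts 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi2 = chi_squared_statistic(observed, expected)
df = len(observed) - 1  # degrees of freedom for a goodness-of-fit test

# 11.07 is the standard 5% critical value for 5 degrees of freedom.
reject_null = chi2 > 11.07
print(chi2, df, reject_null)
```

A statistic below the critical value means the observed counts are consistent with the expected ones.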
When to use a t-test:
when comparing the means of numerical data between two groups (or one group's mean against a fixed value)
4 types of t-tests:
paired
unpaired
one-tailed
two-tailed
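A minimal sketch of how the paired and unpaired statistics differ, using only the standard library. The unpaired version here is Welch's form (it does not assume equal variances), which is one common choice among several; the measurements are invented for illustration. One-tailed versus two-tailed only changes how the p-value is read from the t distribution, not the statistic itself.

```python
import math
import statistics


def unpaired_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def paired_t(before, after):
    """Paired t statistic: a one-sample t on the within-pair differences."""
    diffs = [y - x for x, y in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(before, after)
print(t_paired, t_unpaired)
```

On the same numbers the paired statistic is much larger in magnitude, because pairing removes the subject-to-subject variation.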
What does an F statistic mean:
gives the ratio of two variances (in ANOVA, the variance between groups divided by the variance within groups)
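The sketch below shows two ways an F statistic can arise: a plain ratio of two sample variances, and the one-way ANOVA form (mean square between groups over mean square within groups). The function names and data are hypothetical.

```python
import statistics


def variance_ratio(a, b):
    """F as a simple ratio of two sample variances."""
    return statistics.variance(a) / statistics.variance(b)


def anova_f(groups):
    """One-way ANOVA F: mean square between groups / mean square within groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical groups: the third group sits well above the other two.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = anova_f(groups)
ratio = variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f_stat, ratio)
```

A large F means the group means differ by more than the spread inside the groups would suggest.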
When to use Cohen's d:
to measure effect size, i.e. how large the difference between two group means is in standard deviation units (about 0.2 small, 0.5 medium, 0.8 or more large)
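For reference, a small sketch of Cohen's d using the pooled standard deviation, which is the usual textbook form. The two samples are invented so the arithmetic is easy to follow.

```python
import math
import statistics


def cohens_d(a, b):
    """Difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Hypothetical samples: means 4 and 3, both with variance 4 (sd 2).
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)
```

Here the means differ by half a standard deviation, which falls in the conventional "medium" range.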
When to use a correlation test:
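measures the direction and strength of the relationship between two variables when it is unknown which one influences the other; any value other than 0 indicates some correlation (positive = they rise together, negative = one rises as the other falls)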
When to use a linear regression:
to model and predict how a numerical response changes with a predictor; the fit by itself does not prove causation
When to use a regression model:
when you know that the predictor variable influences the response variable
Assumptions of a linear regression:
observations are taken from a random sample
individual observations are independent from each other
predictor variable is not a random variable
for each X value, the corresponding Y comes from a normal distribution
variance of Y is constant and does not depend on X (homoscedastic)
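A minimal least-squares sketch, assuming a single predictor. The residuals it returns are what one would usually plot to check the normality and constant-variance assumptions listed above. `fit_line` and the data are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data lying close to y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)

# Residuals: observed Y minus fitted Y; they should scatter evenly around 0
# at every X if the homoscedasticity assumption holds.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(slope, intercept, residuals)
```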
When to use a Wilcoxon Signed Rank test:
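when paired data are not normally distributed and you want to compare the two sets of measurements (nonparametric version of the paired t-test; it works on the ranks of the differences rather than on the means)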
r means:
correlation coefficient, indicates the direction and magnitude of correlation (-1 to 1)
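As a sketch, r can be computed directly from its definition: the summed cross-products of deviations divided by the square root of the product of the summed squared deviations. The data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r_perfect_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfect positive line
r_perfect_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfect negative line
r_noisy = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # positive but imperfect
print(r_perfect_pos, r_perfect_neg, r_noisy)
```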
assumptions of a correlation test:
both variables are continuous
a linear relationship between the variables exists
no outliers are present
both variables are normally distributed
What test does a Spearman's correlation test relate to:
Nonparametric version of the Pearson correlation test
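One way to see the relationship: Spearman's coefficient is the Pearson coefficient computed on ranks instead of raw values. The sketch below assumes average ranks for ties; the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: Y rises with X every time, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]
print(pearson_r(x, y), spearman_rho(x, y))
```

Because only the ordering matters, a curved but strictly increasing relationship still gives a Spearman value of 1 while the Pearson value stays below 1.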
Possible transformations:
log
square root
box-cox
arcsin
power
exponential
reciprocal
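A short sketch applying each transformation to one value. It assumes the "arcsin" entry means the arcsine square-root transform for proportions and uses the standard one-parameter Box-Cox form; the power of 2 and the input values are arbitrary choices for illustration.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform for x > 0; lam = 0 is defined as the natural log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def arcsine_sqrt(p):
    """Arcsine square-root transform for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


x = 4.0
transformed = {
    "log": math.log(x),
    "square root": math.sqrt(x),
    "box-cox (lam=0.5)": box_cox(x, 0.5),
    "arcsin (p=0.25)": arcsine_sqrt(0.25),
    "power (x^2)": x ** 2,
    "exponential": math.exp(x),
    "reciprocal": 1 / x,
}
for name, value in transformed.items():
    print(name, value)
```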
When to use nonlinear models:
when transformations cannot alter data enough so a parametric test can be used
two nonlinear models:
Michaelis-Menten and quadratic models
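For orientation, the two curve shapes can be written as plain functions. The parameter names (`vmax`, `km`, `a`, `b`, `c`) follow common convention and the values are invented; fitting them to data would need an optimizer, which is not shown here.

```python
def michaelis_menten(s, vmax, km):
    """Saturating curve: rises quickly, then levels off towards vmax."""
    return vmax * s / (km + s)


def quadratic(x, a, b, c):
    """Parabola: a * x^2 + b * x + c."""
    return a * x ** 2 + b * x + c


# Hypothetical parameters: vmax = 10, km = 2.
half_max = michaelis_menten(2.0, vmax=10.0, km=2.0)      # at s = km the rate is vmax / 2
near_max = michaelis_menten(2000.0, vmax=10.0, km=2.0)   # approaches vmax, never exceeds it
peak = quadratic(1.0, a=-1.0, b=2.0, c=0.0)               # top of this parabola is at x = 1
print(half_max, near_max, peak)
```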
ANOVA assumptions:
Normality
Homogeneity of variance
Additivity
Regression Fallacy:
Extreme measurements tend to be followed by measurements closer to the mean (regression towards the mean); the fallacy is crediting this change to the treatment
What a Poisson distribution is:
Describes the number of successes in blocks of space or time
Poisson Distribution assumptions:
Successes are independent of one another
Successes must occur with equal probability
When to use Poisson Distribution:
to determine whether events happen randomly in space and time
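A sketch of the Poisson probability formula, P(k) = lam^k * e^(-lam) / k!, together with the variance-to-mean ratio, which is one simple check for randomness: for a Poisson process the variance of the counts is about equal to their mean. The counts per block are invented.

```python
import math
import statistics


def poisson_pmf(k, lam):
    """Probability of exactly k successes in a block when the mean is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def dispersion_index(counts):
    """Variance-to-mean ratio: near 1 is random, above 1 clumped, below 1 evenly spaced."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical counts of events in 10 equal blocks.
counts = [0, 1, 2, 1, 0, 3, 1, 2, 1, 1]
lam = statistics.mean(counts)

# Number of blocks expected to contain 0, 1, 2, 3 events if events are random.
expected_blocks = [len(counts) * poisson_pmf(k, lam) for k in range(4)]
index = dispersion_index(counts)
print(lam, expected_blocks, index)
```

The expected block counts can then be compared with the observed ones, for example with a chi-squared goodness-of-fit test.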
When to use Logistic Regression:
when responses are binary (e.g. yes/no) so a linear regression cannot be used
Appearances of linear vs. logistic regressions:
linear = straight line, logistic = S-shaped (sigmoid) curve bounded between 0 and 1
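The difference in shape can be seen by evaluating both forms at a few points. The coefficients below are arbitrary, chosen only to illustrate that a straight line can leave the 0 to 1 range while the logistic curve cannot.

```python
import math


def linear(x, b0, b1):
    """Straight line: unbounded, so predictions can fall below 0 or above 1."""
    return b0 + b1 * x


def logistic(x, b0, b1):
    """Sigmoid of a linear predictor: always strictly between 0 and 1."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


xs = [-6, -3, 0, 3, 6]
line = [linear(x, 0.5, 0.1) for x in xs]      # hypothetical coefficients
curve = [logistic(x, 0.0, 1.0) for x in xs]   # hypothetical coefficients
print(line)
print(curve)
```

The logistic output can be read as the probability of the "yes" outcome, which is why it suits binary responses.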