multivariate analysis
used to analyze more than 2 variables at once —> use for concepts you know are complex and influenced by multiple factors (such as self-esteem)
goal of multivariate analysis
find patterns/correlations b/w several variables simultaneously
multiple regression
multiple, numeric predictors that describe a response variable
method to find an equation that best describes a response variable as a function of more than one explanatory variable when all of the predictor variables are numeric
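A minimal R sketch of a multiple regression (the data frame 'plants' and the variable names are hypothetical):
    # two numeric predictors describing one numeric response
    fit <- lm(biomass ~ rainfall + soil_nitrogen, data = plants)
    summary(fit)  # partial regression coefficients, R², p-values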
partial regression coefficient
effect of one predictor on the response when all other explanatory variables are held constant
coefficient of multiple determination (R²)
proportion of variance in Y explained by all X
R² diff between linear and multiple regression
linear = corresponds to the square of a single correlation coefficient (R² = r²)
multiple = proportion of variance explained by all predictors combined
adjusted R²
only increases if new predictor improves model significantly
prevents overestimation of model’s explanatory power by adjusting for degrees of freedom
interpreted the same way: explains ___% of variation in [insert response variable]
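Sketch of pulling both values from the hypothetical lm() fit above:
    s <- summary(fit)
    s$r.squared      # R²: variance explained by all predictors combined
    s$adj.r.squared  # adjusted R²: penalized for degrees of freedom used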
why multiple regression is useful
can determine which factor is more important when two predictors are both positively correlated with the same response variable
ultimate goal of multiple regression
explain the most variation using the fewest variables, NOT to determine which factors are significant
hypothesis testing
determines if findings of a study provide evidence to support a specific theory relevant to a larger population
model selection vs hypothesis testing
1) which set of variables = best model vs does this variable matter
2) uses adjusted R²/AIC vs uses p-values from t-test or F-stats to determine if coefficients are significantly different from 0
multiple comparisons problem
testing a single predictor = probability of a Type I error = 0.05; with multiple predictors, each test carries its own alpha level, so more predictors = increased overall (familywise) probability of a Type I error
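A quick R illustration of how the familywise error rate grows with the number of tests:
    k <- 10           # number of independent tests, each at alpha = 0.05
    1 - (1 - 0.05)^k  # probability of at least one Type I error ≈ 0.40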
why we don’t like adding more factors to explain more variance
trade off between model fit and complexity
increases likelihood of Type I errors
factors interact (many, many possible interaction terms)
could add infinite number of factors (where would we stop)
p-hacking
repeatedly testing diff variables, models, and data subsets until we get one with a p-value less than 0.05 (increases likelihood of Type I)
AIC
strives to address trade-off between model fit and complexity (penalty for adding more terms)
ideal AIC value
lowest (most negative) value; lower AIC = better trade-off between fit and complexity
problems with AIC
doesn’t tell you about quality of model, influenced by sample size, and only useful for model selection
collinearity
issue where two predictor variables overlap too much (are highly correlated with each other), making their separate effects hard to estimate
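One common diagnostic (not named in the cards, so treat as an assumption) is the variance inflation factor from the car package:
    library(car)
    vif(fit)  # large values (rules of thumb: > 5 or > 10) suggest a predictor overlaps with the others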
variance components and repeatability
proportion of variation in our response variable that is due to our random effect
variance components
VarCorr() function
intercept = variability among random groups
residual = variability within groups
repeatability
proportion of total variation that is explained by random effect
repeatability = varAmong / (varAmong + varWithin)
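A hedged lme4 sketch (the grouping variable 'individual' and data frame 'dat' are hypothetical):
    library(lme4)
    m <- lmer(response ~ 1 + (1 | individual), data = dat)
    vc <- as.data.frame(VarCorr(m))                # variance components
    var_among  <- vc$vcov[vc$grp == "individual"]  # variability among groups (intercept)
    var_within <- vc$vcov[vc$grp == "Residual"]    # variability within groups (residual)
    var_among / (var_among + var_within)           # repeatability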
comparing AIC numbers
every term added adds 2 to the AIC penalty, so for a more complicated model to be justified, its AIC has to be more than 2 units smaller than that of the next, simpler model
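Sketch comparing AIC values for the hypothetical models above:
    m1 <- lm(biomass ~ rainfall, data = plants)
    m2 <- lm(biomass ~ rainfall + soil_nitrogen, data = plants)
    AIC(m1, m2)  # prefer m2 only if its AIC is more than ~2 units lower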
ordination
summarize data in a reduced number of dimensions while accounting for as much of the variability in the original data set as possible
measure stress
way to check how well differences between points are maintained in the ordination as the number of points increases
stress
reflects how well ordination summarizes observed distances among the samples
eigenanalyses
decomposes square matrix into eigenvectors and eigenvalues
eigenvectors
directions of greatest variance in the data, forming the axes of ordination space (an eigenvector’s direction stays the same no matter how the transformation distorts the data)
eigenvalues
indicate amount of variance explained by each eigenvector
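A runnable R example of an eigenanalysis, using the built-in iris data (this is essentially what PCA does under the hood):
    X <- scale(iris[, 1:4])   # center and scale the numeric columns
    e <- eigen(cov(X))        # decompose the covariance matrix
    e$vectors                 # eigenvectors: the ordination axes
    e$values / sum(e$values)  # eigenvalues: proportion of variance per axis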
why ordination is important when dealing with complex communities
reduces complexity by identifying the most important underlying gradients or axes that explain variation in community structure AND visualization that readily allows researchers to identify patterns and relationships
characteristics of communities that we use ordination for
have associations between members, have a lot of zeroes in the matrix, many potential causal variables (but a small fraction explain most of the variation), lots of noise/stochasticity
PCA
Euclidean distances (can’t interpret 0s well)
PCoA, NMDS
non-Euclidean dissimilarities to calculate differences
NMDS
ranks pairwise distances between samples and preserves these rankings instead of actual data values
PCA and PCoA = eigenanalyses
both use eigenvectors as axes
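Hedged vegan sketches of each method, assuming a hypothetical site-by-species matrix 'comm':
    library(vegan)
    pca  <- prcomp(comm, scale. = TRUE)       # PCA: Euclidean distances, eigenanalysis
    pcoa <- cmdscale(vegdist(comm, "bray"))   # PCoA: eigenanalysis on a non-Euclidean dissimilarity
    nmds <- metaMDS(comm, distance = "bray")  # NMDS: rank-based; check nmds$stress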
autocorrelation
response variable can be at least partially predicted by itself at earlier time points or at certain spatial distances
errors associated with autocorrelation
type I errors (false positives)
significance of degrees of freedom
crucial for determining the p-value and critical values used to assess the statistical significance of your results; also helps you understand the precision and reliability of those results
temporal autocorrelation
correlation between values of a variable at different points in time
autocorrelation function
ACF is a plot of autocorrelation between a variable and itself separated by specified lags
autocorrelation and residuals
we use residuals because, if the data weren’t autocorrelated, the residuals would be random
a pattern in the residuals = the residuals are autocorrelated
we don’t use raw data values because residuals are the part of the dataset that’s NOT explained by the model
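In R, the check is one line on a fitted model:
    acf(resid(fit))  # spikes beyond lag 0 outside the bands suggest autocorrelated residuals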
durbin-watson test
use to test for autocorrelation
Ho = no correlation among the residuals
Ha = residuals are autocorrelated
want a p-value greater than 0.05 so we fail to reject the null!
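Sketch assuming the lmtest package (car::durbinWatsonTest is an alternative):
    library(lmtest)
    dwtest(fit)  # small p-value -> reject Ho -> residuals are autocorrelated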
lag function
add to model as a predictor —> lag(response)
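A hedged sketch using dplyr::lag() (variable names hypothetical; assumes rows are in time order):
    library(dplyr)
    dat <- mutate(dat, response_lag = lag(response))                # previous time point
    fit_lag <- lm(response ~ response_lag + predictor, data = dat)  # lm drops the leading NA row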
unconstrained ordination
finds patterns without considering external explanatory variables.
It is data-driven, meaning it purely reflects variations in the response variables (e.g., species composition) without forcing them to align with specific predictors
PCA, NMDS, PCoA
exploratory and find natural variation patterns
constrained ordination
forces the ordination to be guided by known explanatory variables (e.g., environmental factors like pH, temperature, or soil type).
It maximizes variation explained by these predictors, making it hypothesis-driven rather than purely exploratory.
explain variation using predictors
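Hedged vegan sketch, reusing the hypothetical 'comm' matrix plus a data frame 'env' of environmental predictors:
    library(vegan)
    rda_fit <- rda(comm ~ pH + temperature, data = env)  # ordination constrained by predictors
    anova(rda_fit)  # permutation test of the constraints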
PERMANOVA
non-parametric statistical test used in multivariate analysis to compare groups based on a distance matrix. It is especially useful in ecological and community composition studies.
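Hedged sketch with vegan::adonis2(), again assuming 'comm' and a grouping variable in 'env':
    adonis2(comm ~ treatment, data = env, method = "bray")  # PERMANOVA on Bray-Curtis distances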
uses of multiple regression
prediction - predict or estimate a response variable affected by many explanatory variables
causation - is there a functional relationship between the response variable and the explanatory variables, and which factors cause variation in the response variable?
running out of df
df = statistical power, no df = no possibility of finding a significant result
rank deficiency
too many factors (main or random effects) relative to sample size