multivariate analysis
used to analyze more than 2 variables at once —> use for concepts you know are complex and influenced by multiple factors (such as self-esteem)
goal of multivariate analysis
find patterns/correlations b/w several variables simultaneously
multiple regression
multiple, numeric predictors that describe a response variable
method to find an equation that best describes a response variable as a function of more than one explanatory variable when all of the predictor variables are numeric
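A minimal R sketch of a multiple regression (the data frame 'plants' and the variable names are hypothetical):
    # two numeric predictors describing one numeric response
    fit <- lm(biomass ~ rainfall + soil_nitrogen, data = plants)
    summary(fit)  # partial regression coefficients, R², p-values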
partial regression coefficient
effect of one predictor on the response when all other explanatory variables are held constant
coefficient of multiple determination (R²)
proportion of variance in Y explained by all X
R² diff between linear and multiple regression
linear = corresponds to the square of a single correlation coefficient (R² = r²)
multiple = proportion of variance explained by all predictors combined
adjusted R²
only increases if new predictor improves model significantly
prevents overestimation of model’s explanatory power by adjusting for degrees of freedom
interpreted the same way: explains ___% of variation in [insert response variable]
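Sketch of pulling both values from the hypothetical lm() fit above:
    s <- summary(fit)
    s$r.squared      # R²: variance explained by all predictors combined
    s$adj.r.squared  # adjusted R²: penalized for degrees of freedom used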
why multiple regression is useful
can determine which factor is more important when two predictors are both positively correlated with the same response variable
ultimate goal of multiple regression
explain the most variation using the fewest variables, NOT to determine which factors are significant
hypothesis testing
determines if findings of a study provide evidence to support a specific theory relevant to a larger population
model selection vs hypothesis testing
1) which set of variables = best model vs does this variable matter
2) uses adjusted R²/AIC vs uses p-values from t-test or F-stats to determine if coefficients are significantly different from 0
multiple comparisons problem
testing a single predictor = probability of a Type I error = 0.05; with multiple predictors, each test carries its own alpha level, so more predictors = increased overall (familywise) probability of a Type I error
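A quick R illustration of how the familywise error rate grows with the number of tests:
    k <- 10           # number of independent tests, each at alpha = 0.05
    1 - (1 - 0.05)^k  # probability of at least one Type I error ≈ 0.40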
why we don’t like adding more factors to explain more variance
trade off between model fit and complexity
increases likelihood of Type I errors
factors interact (many, many possible interaction terms)
could add infinite number of factors (where would we stop)
p-hacking
repeatedly testing diff variables, models, and data subsets until we get one with a p-value less than 0.05 (increases likelihood of Type I)
AIC
strives to address trade-off between model fit and complexity (penalty for adding more terms)
ideal AIC value
lowest (most negative) value; lower AIC = better trade-off between fit and complexity
problems with AIC
doesn’t tell you about quality of model, influenced by sample size, and only useful for model selection
collinearity
issue where two predictor variables overlap too much (are highly correlated with each other), making their separate effects hard to estimate
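One common diagnostic (not named in the cards, so treat as an assumption) is the variance inflation factor from the car package:
    library(car)
    vif(fit)  # large values (rules of thumb: > 5 or > 10) suggest a predictor overlaps with the others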
variance components and repeatability
proportion of variation in our response variable that is due to our random effect
variance components
VarCorr() function
intercept = variability among random groups
residual = variability within groups
repeatability
proportion of total variation that is explained by random effect
repeatability = varAmong / (varAmong + varWithin)
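A hedged lme4 sketch (the grouping variable 'individual' and data frame 'dat' are hypothetical):
    library(lme4)
    m <- lmer(response ~ 1 + (1 | individual), data = dat)
    vc <- as.data.frame(VarCorr(m))                # variance components
    var_among  <- vc$vcov[vc$grp == "individual"]  # variability among groups (intercept)
    var_within <- vc$vcov[vc$grp == "Residual"]    # variability within groups (residual)
    var_among / (var_among + var_within)           # repeatability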
comparing AIC numbers
every term added adds 2 to the AIC penalty, so for a more complicated model to be justified, its AIC has to be more than 2 units smaller than that of the next, simpler model
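Sketch comparing AIC values for the hypothetical models above:
    m1 <- lm(biomass ~ rainfall, data = plants)
    m2 <- lm(biomass ~ rainfall + soil_nitrogen, data = plants)
    AIC(m1, m2)  # prefer m2 only if its AIC is more than ~2 units lower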
ordination
summarize data in a reduced number of dimensions while accounting for as much of the variability in the original data set as possible
measure stress
way to check how well differences between points are maintained in the ordination as the number of points increases
stress
reflects how well ordination summarizes observed distances among the samples
eigenanalyses
decomposes square matrix into eigenvectors and eigenvalues
eigenvectors
directions of greatest variance in the data, forming the axes of ordination space (an eigenvector’s direction stays the same no matter how the transformation distorts the data)
eigenvalues
indicate amount of variance explained by each eigenvector
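A runnable R example of an eigenanalysis, using the built-in iris data (this is essentially what PCA does under the hood):
    X <- scale(iris[, 1:4])   # center and scale the numeric columns
    e <- eigen(cov(X))        # decompose the covariance matrix
    e$vectors                 # eigenvectors: the ordination axes
    e$values / sum(e$values)  # eigenvalues: proportion of variance per axis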
why ordination is important when dealing with complex communities
reduces complexity by identifying the most important underlying gradients or axes that explain variation in community structure AND visualization that readily allows researchers to identify patterns and relationships
characteristics of communities that we use ordination for
have associations between members, have a lot of zeroes in the matrix, many potential causal variables (but a small fraction explain most of the variation), lots of noise/stochasticity
PCA
Euclidean distances (can’t interpret 0s well)
PCoA, NMDS
non-Euclidean dissimilarities to calculate differences
NMDS
ranks pairwise distances between samples and preserves these rankings instead of actual data values
PCA and PCoA = eigenanalyses
both use eigenvectors as axes
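Hedged vegan sketches of each method, assuming a hypothetical site-by-species matrix 'comm':
    library(vegan)
    pca  <- prcomp(comm, scale. = TRUE)       # PCA: Euclidean distances, eigenanalysis
    pcoa <- cmdscale(vegdist(comm, "bray"))   # PCoA: eigenanalysis on a non-Euclidean dissimilarity
    nmds <- metaMDS(comm, distance = "bray")  # NMDS: rank-based; check nmds$stress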
autocorrelation
response variable can be at least partially predicted by itself at earlier time points or at certain spatial distances
errors associated with autocorrelation
type I errors (false positives)
significance of degrees of freedom
crucial for determining the p-value and critical values used to assess the statistical significance of your results; also helps you understand the precision and reliability of those results
temporal autocorrelation
correlation between values of a variable at different points in time
autocorrelation function
ACF is a plot of autocorrelation between a variable and itself separated by specified lags
autocorrelation and residuals
we use residuals because, if the data weren’t autocorrelated, the residuals would be random
a pattern in the residuals = the residuals are autocorrelated
we don’t use raw data values because residuals are the part of the dataset that’s NOT explained by the model
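In R, the check is one line on a fitted model:
    acf(resid(fit))  # spikes beyond lag 0 outside the bands suggest autocorrelated residuals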
durbin-watson test
use to test for autocorrelation
Ho = no correlation among the residuals
Ha = residuals are autocorrelated
want a p-value greater than 0.05 so we fail to reject the null!
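Sketch assuming the lmtest package (car::durbinWatsonTest is an alternative):
    library(lmtest)
    dwtest(fit)  # small p-value -> reject Ho -> residuals are autocorrelated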
lag function
add to model as a predictor —> lag(response)
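A hedged sketch using dplyr::lag() (variable names hypothetical; assumes rows are in time order):
    library(dplyr)
    dat <- mutate(dat, response_lag = lag(response))                # previous time point
    fit_lag <- lm(response ~ response_lag + predictor, data = dat)  # lm drops the leading NA row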
unconstrained ordination
finds patterns without considering external explanatory variables.
It is data-driven, meaning it purely reflects variations in the response variables (e.g., species composition) without forcing them to align with specific predictors
PCA, NMDS, PCoA
exploratory and find natural variation patterns
constrained ordination
forces the ordination to be guided by known explanatory variables (e.g., environmental factors like pH, temperature, or soil type).
It maximizes variation explained by these predictors, making it hypothesis-driven rather than purely exploratory.
explain variation using predictors
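Hedged vegan sketch, reusing the hypothetical 'comm' matrix plus a data frame 'env' of environmental predictors:
    library(vegan)
    rda_fit <- rda(comm ~ pH + temperature, data = env)  # ordination constrained by predictors
    anova(rda_fit)  # permutation test of the constraints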
PERMANOVA
non-parametric statistical test used in multivariate analysis to compare groups based on a distance matrix. It is especially useful in ecological and community composition studies.
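Hedged sketch with vegan::adonis2(), again assuming 'comm' and a grouping variable in 'env':
    adonis2(comm ~ treatment, data = env, method = "bray")  # PERMANOVA on Bray-Curtis distances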
uses of multiple regression
prediction - predict or estimate a response variable affected by many explanatory variables
causation - is there a functional relationship between the response variable and the explanatory variables, and which factors cause variation in the response variable?
running out of df
df = statistical power, no df = no possibility of finding a significant result
rank deficiency
too many factors (main or random effects) relative to sample size