General Linear Model (GLM)
statistical framework that describes the relationship between a DV and one or more IVs
when to use a GLM
to test a hypothesis with a numeric outcome
e.g. regression, correlation, t test, anova
linear regression
tests whether there is a linear association between numeric (often continuous) variables
also works with categorical predictors as an alternative to anova (dummy-code the predictor)
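A minimal sketch (hypothetical data, variable names made up): regressing a numeric outcome on a dummy-coded two-level factor with statsmodels gives the same test as a one-way anova / t-test, and the coefficients are the reference-group mean and the group difference.

```python
# Minimal sketch (hypothetical data): linear regression with a dummy-coded
# categorical predictor is equivalent to a one-way ANOVA / t-test.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["control", "treatment"], 30),
    "score": np.concatenate([rng.normal(10, 2, 30), rng.normal(12, 2, 30)]),
})

# C(group) dummy-codes the factor: control is the reference level (0),
# treatment is coded 1, so the slope is the treatment - control difference.
fit = smf.ols("score ~ C(group)", data=df).fit()
print(fit.params)     # Intercept ~ control mean; slope ~ mean difference
print(fit.f_pvalue)   # same p-value as a one-way ANOVA on these groups
```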
What does a significant association in linear regression indicate?
the slope is significantly different from 0 (i.e. the DV changes reliably with the IV); the intercept is tested separately
What are the assumptions of multiple regression?
No multicollinearity, homoscedasticity, linear relationship between DV and IVs, normally distributed residuals.
centering values
the mean is subtracted from each data point in the variable
the intercept then equals the mean DV value (the predicted DV at the mean of the predictor)
= makes the intercept more interpretable
df in multiple regression
dfm = number of predictors in the model
dfe = number of participants − number of predictors − 1
What does the F statistic in ANOVA represent?
The ratio of variance due to differences between groups to variance within groups. (F = MSbetween/MSwithin)
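A minimal sketch (made-up groups): computing F = MSbetween/MSwithin by hand and checking it against scipy's one-way anova.

```python
# Minimal sketch (hypothetical data): compute F = MS_between / MS_within by hand
# and check it against scipy's one-way ANOVA.
import numpy as np
from scipy import stats

groups = [np.array([4., 5., 6., 5.]),
          np.array([7., 8., 6., 7.]),
          np.array([9., 10., 11., 10.])]

grand_mean = np.mean(np.concatenate(groups))
k = len(groups)
n_total = sum(len(g) for g in groups)

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # df_between = k - 1
ms_within = ss_within / (n_total - k)    # df_within = N - k
F = ms_between / ms_within

print(F, stats.f_oneway(*groups).statistic)  # the two values should match
```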
group effects
the deviation of each group mean from the grand mean
What is the significance of Mauchly's test in repeated measures ANOVA?
It tests the assumption of sphericity, which is the equality of variances of the differences between treatment levels.
What is a two way anova used for
when there are two categorical IVs
What is the null hypothesis for testing the interaction between two independent variables in a two way anova
the interaction has no effect on the DV (the effect of one IV does not depend on the level of the other)
What is the formula for the individual score in the context of ANOVA?
Individual score (DV) = grand mean + main effect A + main effect B + interaction + error.
why use a Bonferroni correction in post hoc tests
reduce chance of type 1 error from multiple comparisons
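A minimal sketch (made-up p-values): applying a Bonferroni correction with statsmodels; the adjusted p-values are just the raw p-values multiplied by the number of comparisons.

```python
# Minimal sketch (made-up p-values): Bonferroni correction for multiple comparisons.
from statsmodels.stats.multitest import multipletests

pvals = [0.012, 0.030, 0.001, 0.200]            # hypothetical post hoc p-values
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
print(p_adj)    # each p multiplied by the number of comparisons (capped at 1)
print(reject)   # equivalent to comparing the raw p-values against 0.05 / 4
```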
What does the term 'homogeneity of variances' refer to in ANOVA?
The assumption that different groups have similar variances.
assumptions of anovas
observations are independent events and identically distributed
homogeneity of variances
normality of residuals
if assumptions for anova not met
do the anova anyway (less power and increased risk of type 2 error)
transform the DV
use the Kruskal-Wallis non-parametric test
What is the difference between fixed effects and random effects in ANOVA?
Fixed effects are chosen levels of factors
random effects are factors selected randomly from a population.
What is the significance of the residuals in regression analysis?
Residuals are the differences between observed values and predicted values, indicating the model's accuracy.
What does 'dummy coding' refer to in regression analysis?
Transforming categorical predictors into numeric 0/1 variables for analysis; each dummy-coded contrast uses one degree of freedom
What is the main effect in the context of ANOVA?
The individual effect of one independent variable on the dependent variable.
grand mean
average score across all subjects no matter the condition
df in two way anova
DFmain effect = k − 1 for each factor
DFinteraction = DFa × DFb
sphericity
The assumption that the variances of the differences between treatment levels are equal, in repeated-measures designs with 3+ levels
mauchlys test
tests whether the variances of the differences between conditions are equal
if p < .05, sphericity is violated and the F test produces too many false positives
correct the df using the Greenhouse-Geisser or Huynh-Feldt correction
What is ANCOVA?
An extension of ANOVA that includes a numeric covariate that may explain additional variance in DV
strengths of ancova
reduces within-group (error) variance, as the covariate explains some of the error
controls for confounds by including them in the model
contrast coding in linear models
A method to transform categorical predictors into numeric values for analysis, allowing for comparisons against a reference level.
dummy contrast coding
one level of a categorical variable is defined as the reference (0) and other levels are compared to it (1)
intercept shows mean of reference level and slope shows the difference between each other level and the reference level
successive differences coding
tests whether there are differences between successive (adjacent) levels, with the intercept representing the grand mean and each slope showing the mean difference between adjacent levels
deviation coding
A method used for factors with two levels, assigning one level −0.5 and the other +0.5
the intercept becomes the grand mean and the slope shows the mean difference between groups
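A minimal sketch (hypothetical two-group data, made-up variable names) of how dummy coding versus deviation coding changes what the intercept and slope mean.

```python
# Minimal sketch (hypothetical two-group data): dummy coding vs deviation coding
# and how each changes the meaning of the intercept and slope.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

a = np.array([4., 5., 6., 5.])   # group A scores
b = np.array([8., 9., 7., 8.])   # group B scores
df = pd.DataFrame({"score": np.concatenate([a, b]),
                   "grp": ["A"] * 4 + ["B"] * 4})

# Dummy coding: A = 0 (reference), B = 1
df["dummy"] = (df["grp"] == "B").astype(float)
# Deviation coding: A = -0.5, B = +0.5
df["dev"] = np.where(df["grp"] == "B", 0.5, -0.5)

print(smf.ols("score ~ dummy", df).fit().params)
# Intercept = mean of A (reference level), slope = mean(B) - mean(A)

print(smf.ols("score ~ dev", df).fit().params)
# Intercept = grand mean (average of the group means), slope = mean(B) - mean(A)
```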
What are the five important assumptions in order in statistical modeling according to Gelman and Hill (2007)?
Validity, Linearity and additivity, Independence of errors, Homogeneity of variances, Normality of residuals.
multicollinearity
when predictors are strongly correlated with each other
= can inflate standard errors and make regression coefficient estimates unstable; predictor correlations above 0.7 are a concern
What does the Variance Inflation Factor (VIF) measure?
How much a coefficient's variance is inflated by collinearity with the other predictors; values greater than 5 indicate high collinearity, which is a concern in regression analysis.
Cook's distance
detects influential outliers in regression by investigating how much predicted values change if an observation is removed
a higher value means the observation has more influence on the fitted model
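A minimal sketch (simulated data with one planted outlier): getting Cook's distance for every observation from a fitted OLS model via statsmodels' influence diagnostics.

```python
# Minimal sketch (simulated data): Cook's distance for each observation
# from a fitted OLS model, via statsmodels' influence diagnostics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2 * x + rng.normal(size=50)
y[0] += 10                                  # plant one influential outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = fit.get_influence().cooks_distance
print(cooks_d[:5])                          # observation 0 should stand out
```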
centering
Shifting a variable so its mean becomes 0, keeping the same shape, spread, and relationships
z score transformation
rescaling a variable so its mean is 0 and its SD is 1
What is log transformation used for?
To compress the long tail of a skewed distribution and make the data more normally distributed.
Akaike Information Criterion (AIC)?
A measure used to compare different models, where a smaller AIC indicates a better fit.
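A minimal sketch (simulated data, made-up predictors): comparing the AIC of two candidate models; x2 is pure noise here, so the simpler model should win.

```python
# Minimal sketch (simulated data): comparing two models by AIC; the model
# with the smaller AIC is preferred.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 1.5 * df["x1"] + rng.normal(size=100)     # x2 is pure noise here

m1 = smf.ols("y ~ x1", df).fit()
m2 = smf.ols("y ~ x1 + x2", df).fit()
print(m1.aic, m2.aic)   # the simpler model usually wins when x2 adds nothing
```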
Linear Mixed Models (LMMs)?
Models that account for random effects in data with nested or hierarchical structures, allowing for varying intercepts or slopes.
fixed effects in LMMs?
Explanatory variables hypothesized to affect the DV.
random effects in LMMs?
Categorical grouping variables considered random samples from a larger population, like participants or schools.
What is REML?
Restricted Maximum Likelihood, the default parameter estimation criterion for linear mixed models.
random effect variables
categorical, ideally 5+ levels, represent a sample from a broader population
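A minimal sketch (simulated data, made-up variable names) of a linear mixed model with one fixed effect and a by-participant random intercept, fitted with statsmodels; in R/lme4 the equivalent formula would be y ~ x + (1 | participant).

```python
# Minimal sketch (simulated data): a linear mixed model with one fixed effect
# and a by-participant random intercept.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_trials = 20, 10
subj = np.repeat(np.arange(n_subj), n_trials)
x = rng.normal(size=n_subj * n_trials)
subj_intercepts = rng.normal(0, 1, n_subj)            # true random intercepts
y = 2 + 0.5 * x + subj_intercepts[subj] + rng.normal(0, 1, len(x))
df = pd.DataFrame({"y": y, "x": x, "participant": subj})

# groups= defines the random-effects grouping factor; by default this fits
# a random intercept per participant, estimated with REML.
m = smf.mixedlm("y ~ x", df, groups=df["participant"]).fit()
print(m.summary())
```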
What is the purpose of centering variables in LMMs?
To improve interpretability of the intercept and reduce multicollinearity issues.
What is the significance of including random slopes in a model?
To examine how predictors interact with random effects, though it can complicate the model and lead to overfitting.
nested random effects
a lower level grouping factor exists only within one specific level of a higher level factor
crossed random effects
the levels of one factor appear across multiple levels of another factor (e.g. every participant responds to every item)
What is the purpose of transformations in statistical analysis?
To meet assumptions of normality, linearity, and homogeneity of variance, improving model accuracy.
What does it mean if a model fails to converge?
It indicates that the model is too complex or improperly specified, often due to overfitting or insufficient data.
p-value
probability of observing a test statistic that is at least as extreme or more extreme than the one we observed if the null hypothesis is true and we repeat our experiment many times
C: doesn't tell you that the null hypothesis is false (or how likely it is to be false)
alpha level
threshold for declaring significance (5%)
C: levels should be different for different contexts (Fisher, 1935)
effect size
tells us how practical the result is in the real world
‘small’: Cohen's d / Hedges' g ≈ 0.2; correlation 0.1–0.3; eta² (anova) 0.01; Cohen's f 0.1
‘medium’: Cohen's d / Hedges' g ≈ 0.5; correlation 0.3–0.5; eta² (anova) 0.06; Cohen's f 0.25
‘large’: Cohen's d / Hedges' g ≈ 0.8; correlation > 0.5; eta² (anova) 0.14; Cohen's f 0.4
eta squared (η²)
effect size for main effects in anova that tells us the proportion of variance in the DV explained by the predictor
partial eta squared (η²p)
the proportion of variance in the DV explained by the predictor, after accounting for the variance explained by the other predictors in the model
generalised eta squared (η²G)
estimates the effect size in a design where only the term of interest was manipulated, accounting for the fact that some terms cannot be manipulated
-formula depends on design
cohens f (partial)
a transformation of partial eta squared: it equals 0 when the population means are all equal and grows indefinitely large as the means move further and further apart
factors that affect power
sample size- larger gives larger power
expected effect size- larger gives larger power
type 1 error rate- as tolerance for type 1 errors increases (larger alpha), power increases
reliability of measures- more reliable larger power
power
probability of correctly rejecting the null, if type 2 error rate is .2, power is .8
a priori power analysis
what sample size do we need to have 80% power in detecting an effect size
sensitivity power analysis
what is the smallest effect size we can detect with the power, sample size, and alpha we have
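A minimal sketch of both analyses for an independent-samples t-test, using statsmodels' power module; the effect size d = 0.5 and n = 30 per group are made-up inputs.

```python
# Minimal sketch: a priori and sensitivity power analyses for an
# independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: what n per group is needed for 80% power to detect d = 0.5?
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)          # roughly 64 per group

# Sensitivity: smallest effect detectable with n = 30 per group at 80% power?
min_d = analysis.solve_power(nobs1=30, alpha=0.05, power=0.8)
print(min_d)
```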
meta analysis strengths
+higher power than individual studies
+overall effect across studies
+can identify potential publication bias
+can explore impact of design and analysis decisions
+more accurate estimation of effect size for future power analysis
fixed effect model for meta analyses
assumes all studies are estimating the same population effect size. any error is due to sampling
random effects model for meta analyses
allows the population effect to differ between studies. allows for differences in design, population, dosage
heterogeneity of meta analysis
the extent to which effect sizes vary between studies
heterogeneity measures
Cochran's Q statistic: based on the differences between the observed effect sizes and the overall effect size
I²: the percentage of variability in the effect sizes not caused by sampling error
tau-squared (τ²): an alternative measure of between-study heterogeneity
funnel plot
plots effect size against precision of the study
studies with high precision should cluster at the top
studies with low precision should scatter widely at the bottom
an asymmetrical funnel can be a sign of publication bias
error/residuals (anova)
the difference between an individual data point and its group mean
sum of squares
summarises the total variance associated with each component of the model
SStotal = SSwithin + SSbetween
mean squares
we can't compare SS directly (they grow with the number of observations), so we use mean squares
MSwithin = SSwithin / DFwithin
MSbetween = SSbetween / DFbetween
homoscedasticity
similar amount of variation in y at each value of x
VIF
VIF = 1 / (1 − R²), where R² comes from regressing that predictor on the other predictors
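A minimal sketch (simulated predictors, x2 deliberately collinear with x1): computing a VIF per predictor with statsmodels.

```python
# Minimal sketch (simulated data): VIF for each predictor, i.e. 1 / (1 - R^2)
# from regressing that predictor on the others.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # strongly correlated with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
# x1 and x2 should show large VIFs (> 5); x3 should be near 1
```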
calculate effect size (η²) for anova
η² = SSeffect / SStotal
mu with a hat
sample estimate of the population mean value
if linear mixed effects model doesnt converge
increase the number of iterations of the optimiser, or change the optimiser
simplify the random-effects structure, e.g. remove correlations between random effects
dummy coding intercept and slope
intercept is mean of reference level
each slope is the mean difference between that level and the reference level
calculate each group's mean in deviation coding
group coded +0.5: intercept + 0.5 × slope; group coded −0.5: intercept − 0.5 × slope
anova equation
Individual score (DV) = grand mean + group effect (group mean − grand mean) + error (individual score − group mean)
error df for ancova or multiple regression
number of observations − number of predictors (including covariates) − 1
multiple regression formula
DV = intercept + slope1 × IV1 + slope2 × IV2 + error
degrees of freedom
simple linear regression: error df = N − 2
multiple regression: error df = N − K − 1 (K predictors)
anova: error df = N − K (K groups); between-groups df = K − 1
ancova: error df = N − K − (number of covariates)
fixed effects lmm: depends on the estimation method
random effects lmm: number of random-effect variables
meta analysis: K − 1 (K studies)
poisson distribution
count variable
variance = mean
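A minimal sketch: simulated Poisson counts with mean 4 should show a variance close to 4 as well.

```python
# Minimal sketch: for a Poisson-distributed count variable, the variance
# equals the mean (checked here on simulated counts).
import numpy as np

rng = np.random.default_rng(5)
counts = rng.poisson(lam=4.0, size=100_000)
print(counts.mean(), counts.var())   # both close to 4
```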
when to use z score
when you want to compare distributions and coefficients across predictors measured on different scales