What assumptions need to be met for chi-squared tests
• Fewer than 20% of cells have an expected frequency below 5.
• The minimum expected frequency is at least 1.
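A minimal sketch of how the two chi-squared rules of thumb could be checked by hand, assuming the expected count for a cell is row total × column total / grand total. The function names and the example table are hypothetical, not SPSS output.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_assumptions_met(table):
    """True if fewer than 20% of cells have expected count < 5 and the minimum is >= 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 < 0.20 and min(cells) >= 1


# Hypothetical 2x2 table of observed counts (illustrative numbers only).
observed = [[20, 30], [25, 25]]
print(expected_counts(observed))              # [[22.5, 27.5], [22.5, 27.5]]
print(chi_squared_assumptions_met(observed))  # True
```

If the check fails, the cards below point to Fisher's exact test or a binomial exact test instead.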
How can we use a regression model to predict values we don't have for our outcome variable?
In SPSS: run the linear regression, click Save, then tick Individual under Prediction Intervals
What are the three assumptions for Multiple Regression Inference
1. The relationship between the dependent (Y) and each continuous independent variable (x variables) is linear.
2. Residuals or error terms e should be approximately normally distributed.
We can plot a histogram of the error terms
3. Homoscedasticity (stability in variance of residuals)
What are the steps to test mediation effect (Baron and Kenny )
1. Test that the IDV (X) is associated with the DV (Y) (path c)
2. Test the association between the IDV and the mediator (M) (path a)
3. Test the association between the mediator and the DV, controlling for the IDV (path b)
4. Test the association between the IDV and the DV when controlling for the mediator (path c')
What are the two methods to test indirect effects
- Sobel test (Normal Theory Approach)
- Nonparametric Sobel test, PROCESS (bootstrapping)
What is the Sobel test for
Testing whether the indirect effect through the mediator, i.e. ab, differs significantly from zero
How do you calculate the estimated linear effect of a categorical variable in the interaction
B1 + B3 × Z, where Z is the level of the modifier (0 or 1)
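As a small worked sketch, assuming an interaction model of the form Y = B0 + B1·x1 + B2·Z + B3·(x1·Z) with a binary modifier Z, the slope of x1 at each level of Z is B1 + B3 × Z. The coefficient values below are hypothetical.

```python
def effect_of_x1(b1, b3, z):
    """Estimated slope of x1 on Y at modifier level z (0 or 1)."""
    return b1 + b3 * z


# Hypothetical coefficients, for illustration only.
b1, b3 = 2.0, 0.5
print(effect_of_x1(b1, b3, 0))  # 2.0 -> slope when Z = 0 (just B1)
print(effect_of_x1(b1, b3, 1))  # 2.5 -> slope when Z = 1 (B1 + B3)
```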
What is the confidence interval for?
A range of plausible values for the population parameter, estimated from the sample. A higher confidence level (e.g. 99% instead of 95%) gives a wider, more cautious interval
When do we use an independent sample t test
For normally distributed continuous data, to test the difference in means between two independent groups
When do we use a Pearson's chi-squared test
To test whether two categorical variables are associated, provided the assumptions are met.
In other words, to test whether, according to the current data, the proportions in two groups differ significantly from each other.
What is the McNemar test used for
A test for paired categorical data that meet the assumptions, to see whether proportions changed between the paired measurements (e.g. over time)
What do we need to look at before reporting a Pearson's chi-squared test
Assumptions:
• Fewer than 20% of cells have an expected frequency below 5.
• The minimum expected frequency is at least 1.
What do we need to look at before reporting an independent samples t test
Levene's test. If Levene's test is significant (p < .05), report the second row, 'Equal variances not assumed'
What does X2 (χ²) stand for
Chi-squared
What test do we use for non-parametric data to compare one group to another group
Mann-Whitney U test
What test is used for comparing independent categorical groups that don't meet assumptions
Fisher's exact test
What are the assumptions of the McNemar test
McNemar's test needs paired 2×2 data with at least 25 discordant pairs. With fewer discordant pairs we use the paired binomial exact test; for 3×3 or larger tables we use the McNemar-Bowker test
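A rough sketch of the arithmetic behind McNemar's test, assuming b and c are the two discordant cell counts of the paired 2×2 table. The statistic here is the uncorrected form (b − c)² / (b + c); software may apply a continuity correction, so results can differ slightly. The exact version treats min(b, c) as a binomial outcome with probability 0.5. Function names are hypothetical.

```python
import math


def mcnemar_chi_squared(b, c):
    """Uncorrected McNemar statistic from the two discordant counts (1 df)."""
    return (b - c) ** 2 / (b + c)


def mcnemar_exact_p(b, c):
    """Two-sided exact binomial p value, useful when there are few discordant pairs."""
    n = b + c
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    lower_tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical discordant counts, for illustration only.
print(mcnemar_chi_squared(15, 5))  # 5.0
print(mcnemar_exact_p(1, 9))       # 0.021484375
```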
How do we interpret a scatterplot
See if there is a linear relationship (either positive or negative) and if we can add a linear fit line
How do we test the 3 assumptions for Multiple Regression Inference
1. Use partial scatterplots: the residuals of the dependent variable (Y) plotted against the residuals of each independent variable (x) should show a linear trend.
2. Plot a histogram of the error terms to see whether they roughly follow a normal distribution, and check the normal P-P plot.
3. A scatterplot of standardised residuals against standardised predicted values (ZRESID against ZPRED) should show no pattern.
How do we test distributions
For continuous variables we use the histogram plot
How do we test our one-sample data against a predicted value or a population value?
Use a one-sample t test (continuous data) or a one-sample chi-square test (categorical data)
Before using a test, make sure to analyse the descriptives! Look at the distribution of the data
What do we use the One sample chi square test for
Used to test the proportions of categories in the data, by default against equal proportions (50/50 for two categories)
That is, to test whether, according to the current data, the proportion in the population equals a certain pre-specified value.
How do we calculate the 95% confidence interval?
x̄ ± (1.96 × SE)
How do we calculate the standard error
SE= sample SD / square root of sample size (n)
How do we calculate a confidence interval
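x̄ ± (critical value × SE), where the critical value depends on the confidence level (1.96 for 95%); subtract for the lower bound and add for the upper bound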
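A minimal sketch of the two calculations above: SE = sample SD / √n, and the interval is the mean plus or minus the critical value times SE. It assumes the normal critical value 1.96 for a 95% interval; for small samples a t critical value would usually be used instead. The sample values are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval(sample, critical_value=1.96):
    """Return (lower, upper) bounds: mean -/+ critical value * SE."""
    mean = statistics.mean(sample)
    margin = critical_value * standard_error(sample)
    return mean - margin, mean + margin


# Hypothetical measurements, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = confidence_interval(sample)
print(f"SE = {standard_error(sample):.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```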
What type of test do we use for a binary categorical independent variable of interest and a continuous dependent variable
T test (if parametric)
What type of test do we use for a categorical independent variable of interest and a categorical dependent variable
Chi-squared test (if assumptions met)
What type of test do we use for a continuous independent variable of interest and a continuous dependent variable
Simple linear regression
What type of test do we use for multiple continuous and categorical independent variables of interest and a continuous dependent variable
Multiple linear regression
When do we use a paired sample t test
When comparing two means from the same group of respondents, with the differences between the paired continuous measurements being approximately normally distributed
What test do we use for non-parametric data to compare one group to a pre-defined value
Wilcoxon signed-rank test
What test do we use for non-parametric data to compare two related groups
Wilcoxon paired/matched signed-rank test
When do we split a variable?
To check the distribution of levels of a categorical variable separately (eg gender or ethnicity) against a continuous outcome
When do we merge two variables?
Only merge paired variables, and only to check distribution/suitability: we compute the differences between the paired data and check whether those differences are normally distributed
What test do we use for categorical data with >20% of cells having an expected count less than 5, to compare one group to a pre-defined value
One sample Binomial Exact test
What test is used for comparing paired categorical groups that don't meet assumptions
McNemar Bowker test/ paired Binomial exact test
What do you report for non-parametric tests?
medians, with min-max, p value and df
What visualisation is used to see if two continuous variables are associated
Scatterplots are used to see the association between two continuous variables
Why do we use a correlation
When we need an objective measure of strength of a linear relationship.
The correlation coefficient r quantifies the direction and magnitude of the linear association between two continuous variables.
When do we use Pearson's correlation
When we have two continuous variables with a linear relationship and both look approximately normally distributed on a histogram
When do we use Spearman's correlation
When we have two continuous variables with a monotonic relationship and the data are not normally distributed (skewed on a histogram)
How do we interpret the results of a correlation?
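Coefficient r: ranges from -1 to 1; the closer to 1 or -1, the stronger the linear correlation (magnitude), and the sign gives the direction
p value: the probability of seeing a correlation at least this strong if there were no true association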
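To make the range of r concrete, here is a small sketch that computes Pearson's r from its definition (covariation of x and y divided by the square root of the product of their sums of squares). It does not compute the p value. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical data: a perfectly increasing and a perfectly decreasing line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```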
What is a simple linear regression for
In statistical modelling, a regression model is a set of statistical processes for estimating the relationships among variables. These models describe the relationship between variables by fitting a line to the observed data.
What is the equation for a simple linear regression
Y=B0+B1 𝒙 + 𝜺
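A minimal sketch of how B0 and B1 could be estimated by ordinary least squares, assuming the usual formulas B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and B0 = ȳ − B1·x̄. The function name and data are hypothetical; SPSS reports the same quantities in its coefficients table.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) for the least-squares line Y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2x, so the error term is zero here.
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0

# Predicting an outcome we do not have: plug a new x into the fitted line.
print(b0 + b1 * 10)  # 21.0
```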
What is the difference between R2 and R2 adjusted
The R2 value indicates how much of the total variation in the dependent variable is explained by the model; the adjusted R2 is the same quantity after adjusting for the number of predictors
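A short sketch of the adjustment, assuming the standard formula adjusted R2 = 1 − (1 − R2)(n − 1) / (n − p − 1), where n is the sample size and p the number of predictors. The input values are hypothetical.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R2 for a model with p predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical model: R2 = 0.50 with 2 predictors and 21 observations.
print(round(adjusted_r_squared(0.50, n=21, p=2), 3))  # 0.444
```

The adjusted value is never larger than R2, and the gap grows as predictors are added without improving the fit.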
What do we do when we have a categorical predictor that we want to test
We code it through dummy variables, with one level acting as a reference category (0)
What are dummy variables represented by
0 and 1
How many dummy variables are needed to represent a variable with 3 levels?
2, labelled d1 and d2; the remaining level is the reference category (coded 0 on both)
How do you use dummy variables in a linear regression with a categorical variable with 3 levels
Input coded dummy variables as usual in regression
If you input all dummy variables (including coded reference category) SPSS will choose a reference category and exclude it
The results table shows the dummy variables COMPARED to the reference category, and if there is a significant difference in the outcome
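A minimal sketch of dummy coding for a 3-level variable, assuming one level is picked as the reference and every other level gets its own 0/1 column. The function name, levels and data are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per non-reference level of a categorical variable."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical 3-level variable with "low" as the reference category.
groups = ["low", "medium", "high", "low", "high"]
dummies = dummy_code(groups, reference="low")
print(dummies)
# {'high': [0, 0, 1, 0, 1], 'medium': [0, 1, 0, 0, 0]}
```

Rows from the reference category are 0 in every column, which is why each coefficient is read as a comparison with the reference.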
What is the equation for a multiple linear regression
Y=B0+B1 𝒙1 + B2 𝒙2 + 𝜺
What does x1 or x2 represent in the MLR model
x1 and x2 represent the independent variables of interest
What is the multiple linear regression equation that has a categorical predictor with 3 levels
Y= B0+B1 d1+ B2 d2+ 𝜀
What does R2 represent in a MLR model
R2 is often interpreted as the proportion of the variance in the dependent variable that is "explained" by the independent variables in the model.
What does a higher R2 mean
Higher values of R2 indicate better prediction.
What does a B1 of 2 tell us?
That for a 1-unit increase in IDV 1, the DV increases by 2 on average, holding the other predictors constant
What does M stand for
M = the mediator
What does a mediator do
A mediator (M) of the causal effect of independent variable (x1) on dependent variable (Y) is a variable x2 on the causal pathway from x1 to Y.
What is the symbol for the pathway of the total effect
c
What is the difference between complete and partial mediation
Partial mediation, there is still a significant effect of the IDV on the DV when controlling for the mediator but the effect is reduced
Complete mediation, the mediator eliminates the significant effect of the IDV on the DV
What is the equation to compute the indirect effect
Indirect effect = ab, with standard error SE(ab) = √(a² × Sb² + b² × Sa²), where Sa and Sb are the standard errors of a and b
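A sketch of the Sobel (normal theory) calculation, assuming the indirect effect is a × b, its standard error is the square root of a²·Sb² + b²·Sa², and the test statistic z = ab / SE(ab) is compared with a standard normal distribution. The path estimates below are hypothetical.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return (indirect effect, SE(ab), z, two-sided p) for paths a and b."""
    indirect = a * b
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = indirect / se_ab
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return indirect, se_ab, z, p


# Hypothetical path estimates and standard errors, for illustration only.
indirect, se_ab, z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"ab = {indirect:.2f}, SE = {se_ab:.3f}, z = {z:.2f}, p = {p:.4f}")
```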
What is PROCESS used for
To estimate the direct and indirect effects with bootstrapping; check the 95% bias-corrected bootstrap confidence interval for each effect in the model
What is needed for the indirect effect to be significant
the confidence interval does not contain 0
Which steps in the Baron Kenny method are essential to establish mediation
Steps 2 and 3 are essential for establishing mediation
What do you do if there might be a modifier
Create a new variable for the interaction between the IDV and the modifier (IDV × modifier)
Then run a multiple linear regression of the DV on the IDV(s), the modifier and the new interaction variable
What does Z represent
the modifier
What is B3
𝛃𝟑 is interpreted as the difference of the effect of 𝐱𝟏 on Y by levels of 𝐙 variable.
What are the DFBETA and DFFIT used for?
To detect outliers and cases with a strong influence on the model
Sort the values both ascending and descending (look for values below -1 and above +1)
In the SDF1 column, a case with an absolute value > 1 has a strong influence on the model and may need to be removed