50% exam
What is a (bivariate) correlation?
This examines the strength of the relationship between 2 variables, e.g. ice cream sales and temperature
What is a linear regression?
Allows us to predict one variable (DV/criterion) from a series of other related variables (IVs/predictors), e.g. ice cream sales can be predicted by temperature. NOT CAUSAL
What is regression?
Statistical technique that allows us to predict someone’s score on one variable from their score on:
one variable (bivariate regression)
more than one variable (multiple regression)
What is another name for the DV? And for the IV?
DV= criterion variable
IV= predictor variable
What is multiple regression?
A statistical technique that builds a hypothetical model of a relationship between a single criterion variable (DV) and multiple predictor variables (IVs).
– A predictive model that best predicts an outcome/ criterion variable.
– Produces a regression equation that can be used to make predictions
• e.g. to predict someone’s exam score, we assess their performance in a previous exam, how many hours they revised, their attendance at lectures, and their motivation.
What are some Data Requirements for Multiple Regression
Sample size; normality; linearity; multicollinearity; homoscedasticity
(5 ASSUMPTIONS: S N L M H)
Hypotheses for Multiple regression?
H0: There is no linear relationship between the criterion variable and the predictor variables.
• H1: There is a linear relationship between the criterion variable and at least one of the predictor variables
Formula for sample size?
N > 50 + 8M
M is the number of predictor variables
(Tabachnick & Fidell, 2007)
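As a quick sketch, the rule can be checked in Python (the function name is just for illustration):

```python
def min_sample_size(m: int) -> int:
    """Tabachnick & Fidell's (2007) rule of thumb: you need N > 50 + 8M,
    where M is the number of predictor variables."""
    return 50 + 8 * m

# e.g. four predictors (age, depression, social support, sleep duration):
print(min_sample_size(4))  # -> 82, so N must exceed 82
```

This matches the worked report later in these notes, where a sample of 292 with four predictors comfortably exceeds the required 82.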
What is the assumption ‘Multicollinearity’?
And what is singularity?
A high inter-correlation (r > ±.90) between the predictors, e.g. 3 predictors are highly correlated. Creates issues for regression models!
Singularity: a perfect linear relationship between variables (more intense form of multicollinearity)
What are the guidelines for suggesting no multicollinearity?
Tolerance > 0.50
VIF < 10.00
Correlations between predictors < 0.9
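These figures come from regressing each predictor on the others. A rough Python sketch of how tolerance and VIF are computed (made-up simulated data, not SPSS output):

```python
import numpy as np

def tolerance_and_vif(X):
    """For each predictor (column of X), regress it on the remaining
    predictors: tolerance = 1 - R^2 of that regression, VIF = 1 / tolerance."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        out.append((1 - r2, 1 / (1 - r2)))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=200)  # deliberately collinear with x1
x3 = rng.normal(size=200)                          # unrelated predictor
X = np.column_stack([x1, x2, x3])
for tol, vif in tolerance_and_vif(X):
    print(f"tolerance = {tol:.2f}, VIF = {vif:.2f}")
```

The collinear pair produces low tolerance and high VIF, while the unrelated predictor sits near tolerance = 1, VIF = 1.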
What is a residual?
The difference between the value a model predicts and the value observed in the data on which the model is based. On a linear plot, it is the gap between the line of best fit and each plotted data point.
e.g. a negative residual = the predicted value is too high; a positive residual = the predicted value is too low.
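A small Python illustration with made-up ice cream data, showing observed minus predicted values:

```python
import numpy as np

# Hypothetical data: temperature (predictor) and ice cream sales (criterion).
temp  = np.array([16, 18, 20, 23, 25, 28, 30], dtype=float)
sales = np.array([40, 52, 55, 70, 73, 90, 95], dtype=float)

b1, b0 = np.polyfit(temp, sales, 1)   # line of best fit (slope, intercept)
predicted = b0 + b1 * temp
residuals = sales - predicted          # observed minus predicted

for t, obs, res in zip(temp, sales, residuals):
    kind = "positive (predicted too low)" if res > 0 else "negative (predicted too high)"
    print(f"temp={t:.0f}  observed={obs:.0f}  residual={res:+.2f}  {kind}")

# With least squares and an intercept, the residuals sum to (essentially) zero.
print(round(residuals.sum(), 6))
```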
How do we check for normality?
. On histogram (line of best fit)
. On P-plot
The line of best fit attempts to minimise the residuals
What is the assumption Homoscedasticity
. This is the SPREAD of scores/variance.
. Check using a scatterplot
. The assumption that the residuals at each level of the predictor variable have similar variances/ evenly distributed.
What things would you see on a scatterplot for homoscedasticity vs heteroscedasticity?
HOMO- random spread of scores, equal number below and above the line
HETERO- fan shape score spread or bow tie shape
How do we check for the assumption linearity?
The predictor variables should be linearly related to the criterion variable.
What are the 3 types of multiple regression?
– Standard / Simultaneous
– Hierarchical
– Stepwise
Describe standard multiple regression:
All the predictor variables entered into the model simultaneously.
. Use this approach if you have a set of variables and want to know how much variance in a criterion variable they explain as a group.
. Tells you how much unique variance each predictor variable explains.
How do we assess the model via SPSS
. See R, R Square and Adjusted R square (can convert to %)
. See significance level (should be less than 0.05) and standardised coefficients beta
. See df and F value
What is the structure to formally report results?
A standard multiple regression was used to assess the ability of four predictor variables (age, depression, social support and sleep duration) to predict PTSD. All predictor variables were entered simultaneously.
Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity and homoscedasticity. None of the assumptions were violated. Sample size of 292 exceeded the required amount.
61.7% (adjusted R2) of the variation in PTSD symptoms could be explained by the variation in age, sleep duration, depression scores, and social support, F (4, 287) = 118.44, p< .001.
Depression made a significant contribution to the model (p< .001). Depression made a positive contribution, suggesting that increases in depression are associated with an increase in PTSD symptoms.
What do you also say when talking about implications of research?
. How can our findings benefit people? Who does it affect and what interventions can be put in place?
What does a bigger R squared value mean
The more variance the combo of predictors can explain in the outcome variable
Lecture 2:
Multiple hierarchical regression
What is the regression equation and what does it show?
> Allows us to predict the value of the criterion variable (Y) from a set of predictor variables (X1, X2, X3)
> Allows us to predict how Y will change as a result of changes in X
> Equation is Ŷ = b0 + b1x1 + b2x2… + error
. Ŷ = the predicted criterion/DV
. b0 = the intercept/constant (the value of the DV when all predictors are 0)
. b1 = regression coefficient of the predictor
. x1 = value of the predictor
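A minimal worked example of the equation in Python; the coefficients below are made up, for predicting an exam score from hours revised and lecture attendance:

```python
# Hypothetical coefficients (made up for illustration):
b0 = 20.0   # intercept: predicted score when every predictor is 0
b1 = 1.5    # each extra hour of revision adds 1.5 marks, holding x2 constant
b2 = 0.25   # each extra percent attendance adds 0.25 marks, holding x1 constant

def predict(x1, x2):
    """Y-hat = b0 + b1*x1 + b2*x2 (the error term is unknown for a new case)."""
    return b0 + b1 * x1 + b2 * x2

# 10 hours of revision, 80% attendance:
print(predict(10, 80))  # 20 + 15 + 20 -> 55.0
```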
How did we find the values of B0, B1 etc on SPSS? What do you need to report for this and what is the wording for this?
Unstandardised coefficients B column
. The value of the constant/intercept=
. The number of units the criterion (Y) changes by for each unit increase in the predictor (x), while controlling for the other predictors
What are the assumptions for multiple hierarchical regression?
. Linearity
. Normality
. Homoscedasticity
. Multicollinearity
. Sample size
What happens if you have too small of a sample size?
You will be underpowered to find a significant result
. The more predictors, the bigger the sample size generally needed
Multicollinearity for hierarchical multiple regression?
. Limits the size of R
. Makes determining the importance of the given predictor difficult
. VIF, tolerance, correlations values
How do we deal with multicollinearity?
check for errors in data entry/coding
If there’s a fairly large set of predictors- reduce this to a smaller set of predictors
Consider deleting/omitting a predictor that is highly correlated with another predictor
HOWEVER, it may just be that the predictors are truly highly correlated
How do we check normality for hierarchical multiple regression?
P-plot and histogram
. Look at the normal distribution line; if the points on the P-plot are not close to the predicted line, there is a lack of normality
How do we check for linearity in hierarchical multiple regression?
The predictor variables should be linearly related to the criterion variable (should be a straight line). Check on P-Plot
How do we check for homoscedasticity for hierarchical multiple regression?
. Check on scatterplot which plots the standardised residuals against the predicted values
. The residuals at each level of the predictors should have the same level of variance
. When the variance is unequal - Heteroscedastic
What is an outlier in regression?
An observation (case) that is substantially different from the others
. Can have large impacts on the results of regression analysis
How do we detect outliers in SPSS
. Scatterplots and residual plots (but this is subjective)
. Residuals statistics tables: Standardised residuals and Cook’s Distance
What are standardised residuals (for detecting outliers)?
Helps to identify anyone whose predicted score is quite different from their actual score
(Tabachnick and Fidell, 2001): values that fall outside the safe zone of ±3.3
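A minimal Python sketch of this check on simulated residuals (one extreme case is planted deliberately):

```python
import numpy as np

def flag_outliers(residuals, cutoff=3.3):
    """Standardise the residuals (z-scores) and return the indices of cases
    outside +/- cutoff, per Tabachnick and Fidell's guideline."""
    z = (residuals - residuals.mean()) / residuals.std()
    return np.where(np.abs(z) > cutoff)[0]

rng = np.random.default_rng(1)
resid = rng.normal(size=100)
resid[17] = 9.0   # plant one extreme residual
print(flag_outliers(resid))  # only index 17 should be flagged
```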
What is Cook’s distance for outliers?
. Cook’s distance measures the influence of deleting a case
. So any large value (> 1) indicates that the case considerably affects the estimated regression coefficients
Describe hierarchical multiple regression?
. AKA Sequential
. Predictor variables are entered into the equation, in the order specified by the researcher (generally known predictors from previous research are entered first, then new predictors in successive models)
. Variables or sets of variables are entered in steps (blocks)
. Each block of predictor variables are assessed in terms of what it adds to the criterion (DV) variable, after previous blocks of variables that have been controlled for.
What about categorical predictors in regression?
Predictors should be quantitative, or if they are categorical, they should be dichotomous (exhaustive and mutually exclusive)… so you fall into 1 or another group
Regression models should include mostly quantitative predictors (and not all categorical)
They are coded as dummy variables, e.g. Boys = 1, Girls = 0; Graduated = 1, Non-graduated = 0
Be careful about….
. Make sure you consider what the variables mean, e.g. low scores on a test may mean something positive (when you may expect low scores to be a bad thing)!!
When we are predicting the effect of age, education and physical memory tests and PRMQ questionnaire on dementia what are the criterion and predictor variables
Criterion: dementia
Predictors: age, education, physical memory tests and the PRMQ questionnaire
Write what you would say when assessing the model to say adjusted Squared for different models/blocks?
After the variables in Block 1 (Age, education, physical memory test score) have been entered, the overall model explains 51.7% of the variance in dementia symptomology (Adjusted R2=0.517)
After the Block 2 variable (PRMQ test) has been included, Model 2 explains 61.5% (Adjusted R2= 0.615).
What do you look at to assess significance?
Model Summary table: R Square Change and whether the change is significant (Sig. F Change < .05)
ANOVA table for : model 1 and 2 significance
What table do you look at when evaluating each of the predictors variables?
Coefficients to see each predictor variable’s individual contribution
How do we formally report the results of the Hierarchical Multiple Regression?
A hierarchical multiple regression was used to assess the ability of self-reported memory scores on the PRQM to predict dementia symptomology after controlling for the variation explained by age, education and physical memory test.
Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity and homoscedasticity. Assumptions were met, although homoscedasticity was questionable.
Age, education and scores on physical memory test were entered in Model 1, explaining 51.7% of the variation in dementia symptomology (Adjusted R2= .517), F(3,86)= 32.716, p< .001. Age and physical memory test were significant predictors of dementia symptomology (p<.001). Education was not a significant predictor.
After entry of the PRMQ score in Model 2, the model explained significantly more variance, F(4, 85) =36.564, p < .001; R2 change= 0.099, p< .001. The total variation in dementia symptomology explained by the model as a whole was 61.5%, (Adjusted R2= .615). The significant predictors in model 2 were age, physical memory score, PRMQ score (p< .001). Education was not a significant predictor.
In the final model, age and PRMQ score made positive contributions, suggesting that increases in age and increases in scores on the PRMQ are associated with an increase in dementia symptomology.
Physical memory test scores made a negative contribution to predicting dementia symptomology, suggesting that increases in scores on the physical memory tasks are associated with a decrease in dementia symptomology.
Explain the regression line of best fit?
. the model’s predicted values are plotted on a regression line, which passes through a scatterplot. The regression line (of best fit) attempts to minimise the residuals.
LECTURE 3
Stepwise multiple regression
What 5 things can we do with outliers?
Transform variable
Replace value
Delete value
Delete participant
Non-parametric test
What is stepwise multiple regression?
• Method of regression that adds multiple predictors while simultaneously removing those that don’t improve the R2 value
• SPSS selects ONLY the predictors which provide the strongest prediction of variance in the outcome variable.
• The aim is to create the best model fit and achieve the highest R2 value with the fewest predictors.
Researcher provides a list of predictor variables and then allows the software to select which predictors to enter, and which order they go in.
• Variables are added to the regression equation one at a time, with an attempt to maximise the R².
– Default criteria in SPSS, p< .05.
• After each variable is entered, each of the included variables are tested to see if the model would be better if it were excluded.
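The entry loop can be sketched roughly in Python. This toy version adds predictors by R² gain rather than SPSS's p-value entry/removal criteria, and all data (anxiety, checking, hoarding, misophonia) are simulated:

```python
import numpy as np

def r_squared(cols, y):
    """R^2 from an OLS fit of y on the given predictor columns (plus intercept)."""
    A = np.column_stack([np.ones(len(y))] + cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - resid.var() / y.var()

def forward_stepwise(predictors, y, min_gain=0.01):
    """Greedy forward selection: repeatedly add the predictor that most
    improves R^2, stopping when no candidate adds at least min_gain."""
    selected, remaining, best_r2 = [], dict(predictors), 0.0
    while remaining:
        gains = {name: r_squared([predictors[s] for s in selected] + [col], y) - best_r2
                 for name, col in remaining.items()}
        name, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain < min_gain:
            break
        selected.append(name)
        best_r2 += gain
        del remaining[name]
    return selected, best_r2

rng = np.random.default_rng(2)
n = 300
anxiety, checking, hoarding = rng.normal(size=(3, n))
misophonia = 0.6 * anxiety + 0.3 * checking + rng.normal(size=n)  # hoarding unrelated

selected, r2 = forward_stepwise(
    {"anxiety": anxiety, "checking": checking, "hoarding": hoarding}, misophonia)
print(selected, round(r2, 3))
```

The strongest predictor (anxiety) enters first, checking enters second, and the unrelated predictor adds too little R² to be kept, mirroring how SPSS retains only the strongest predictors.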
What are the advantages of stepwise multiple regression?
. Saves time: Stepwise regression can help identify the most important predictors for a given outcome variable in a relatively short amount of time.
• Useful for exploratory data analysis: an exploratory tool to identify potentially important predictors that can be further investigated and refined using other statistical techniques.
What are the disadvantages of stepwise multiple regression?
. Overfitting: When selecting variables based on their statistical significance/predictive power, the resulting model may perform well on that sample but generalise poorly to new data.
• Biased estimates: can produce biased estimates and incorrect conclusions when there are correlations between the predictor variables.
How do we formally report stepwise regression?
Stepwise multiple regression was used to identify the best predictive model of misophonia from the predictors: OCD (washing, checking, ordering, obsessing, hoarding, neutralising) and anxiety.
Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity and homoscedasticity. An outlier was identified but was (/was not) omitted from the final analyses.
A final model was identified where anxiety (p<.001) and OCD checking (p= .033) explained 26.5% (Adjusted R2= 0.265) of the variance in misophonia, F (2, 149)= 28.17, p< .001. OCD washing, ordering, obsessing, hoarding and neutralising were not significant predictors in the final model.
Anxiety and OCD checking made positive contributions, suggesting that increases in these predictors are associated with an increase in misophonia levels. However, these two variables only explained 26.5% of the variability in misophonia, suggesting there may be other factors that contribute to predicting the variability in misophonia levels.
What are Cohen’s (1988) guidelines for interpreting R squared?
These show whether the model constitutes a good fit.
Cohen (1988), interpreting R squared:
<0.02 Very weak
0.021- 0.13 Weak
0.131- 0.26 Moderate
> 0.261 Substantial
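These cut-offs can be wrapped in a small helper (boundaries approximated from the table above):

```python
def interpret_r_squared(r2: float) -> str:
    """Label an R squared value using Cohen's (1988) guidelines as given
    in the lecture: <.02 very weak, to .13 weak, to .26 moderate, above that substantial."""
    if r2 < 0.02:
        return "very weak"
    elif r2 <= 0.13:
        return "weak"
    elif r2 <= 0.26:
        return "moderate"
    else:
        return "substantial"

# e.g. the misophonia model's Adjusted R2 of .265:
print(interpret_r_squared(0.265))  # -> substantial
```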
How may we interpret results for stepwise multiple regression….
The results suggest increases in anxiety and OCD checking are predictive of increases in misophonia levels.
• Misophonia may be linked to anxiety, and the checking component of OCD, but the results from the present sample suggest that the other components of OCD, namely: washing, ordering, obsessing, hoarding and neutralizing did not contribute to explaining the variance in misophonia.
What are 3 problems of stepwise entry?
When there are small differences between variables the computer will unquestioningly choose the largest for addition at each step.
Danger that none of the variables are included in the equation, as the variables fail to meet the rules of the stepwise method.
There is a lack of researcher control.
LECTURE 4
ANOVA
What are 3 types of t-test
Independent samples t-test: compares means from two independent groups.
Paired samples t-test: compares means from two related sets of scores (the same or matched individuals).
• Repeated measures design
• Matched-subjects design
One-sample t-test: compares an observed mean to a population mean.
What test do we use when we want to compare more than 2 conditions, or more than 2 means, e.g. group 1 no music, group 2 constant music, group 3 intermittent music?
ANOVA (Analysis of Variance)
Why would we not just use several T-tests though?
e.g. carry out 3 separate t-tests to see the differences between the 3 groups
. The Experimentwise Error Rate (EER) / The Familywise Error Rate (FWER)
. The probability of making at least one Type I error (false positive) across a set of multiple hypothesis tests in a single experiment/ study. (say something is significant, when it actually isn’t)
• When you perform just one statistical test at a significance level of α = 0.05, there’s a 5% chance of incorrectly rejecting a true null hypothesis.
• But when you perform many tests within the same experiment, the chance of making at least one false positive increases dramatically.
. SO you increase the chance of making a Type 1 error
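The inflation is easy to compute: for k independent tests, FWER = 1 − (1 − α)^k. A quick Python check:

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one Type I error) across n independent tests at level alpha:
    1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 3, 10):
    print(f"{n} tests: FWER = {familywise_error_rate(n):.3f}")
```

With 3 t-tests the chance of at least one false positive is already about 14%, not 5%, which is why ANOVA is preferred.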
What is ANOVA?
• A parametric inferential test used to test for variability in scores
• Used when we have more than two groups/ conditions (levels) and/or more than one independent variable (factor)
– Statistically advantageous over performing multiple t-tests on the same data
• A major advantage= it allows you to investigate the effect of multiple factors on your dependent variable at the same time (in combination)
– Factorial ANOVA
How are t-tests and ANOVA similar, and when would you use each?
• Both compare means between-groups
• With 2 groups both work but:
. t-test more efficient
. ANOVA inefficient (not needed in terms of parsimony)
• With more than 2 groups:
. t-test not efficient
. ANOVA more efficient
What are some research questions ANOVA could address?
. Are there attitudinal and behavioural differences between different generations?
. Are there differences in reaction times for drivers:
• Hands-on phone drivers
• Hands-free phone drivers
• No-phone drivers
And does gender interact with this?
What are 4 assumptions of ANOVA
. Level of data
. Normality
. Homogeneity of variance
. Independent random samples
What is the levels of data assumption?
The dependent variable (DV) consists of data measured at interval or ratio level (quantifiable)
…. not nominal or ordinal (categorical)
Interval:
Puts scores in an order, with equal distances (intervals) between numbers. No true zero; zero doesn’t mean an absence of the variable, e.g. temperature, score on an intelligence test
Ratio:
Like interval BUT with a true zero. There can be a total absence of the variable, e.g. speed in miles per hour, number of children in a household
What is the assumption normality? What are the 2 ways we can check it?
. The data for the dependent variable(s) is normally distributed
. Check histogram of DV data
. Use skewness and kurtosis: convert these statistics into z scores (divide the statistic by its standard error). If z falls outside ±1.96, it is significant (p < .05) and suggests non-normal data
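A minimal sketch of the z-score check (the skewness/kurtosis values below are made up, standing in for SPSS output):

```python
def z_score(statistic, std_error):
    """Convert a skewness or kurtosis statistic (e.g. from SPSS output)
    into a z score; |z| > 1.96 suggests significant non-normality (p < .05)."""
    return statistic / std_error

# Hypothetical SPSS output values:
z_skew = z_score(0.412, 0.241)   # within +/-1.96, acceptable
z_kurt = z_score(-1.203, 0.478)  # outside +/-1.96, suggests non-normality
print(round(z_skew, 2), round(z_kurt, 2))  # -> 1.71 -2.52
```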
What is the assumption homogeneity of variance?
. The samples being compared are drawn from populations with the same/ similar variance
. Levene’s test of equality of error variances, or on histogram
What is the independent random samples assumption?
For independent groups designs, independent random samples must have been taken from each population. (Via random allocation)
How does ANOVA work?
• ANOVA analyses the different sources that cause variance in the DV
• It analyses the variability between conditions (between-groups variance) and within conditions (within-group variance)
• Basically, is the variance primarily due to differences between groups, or differences within groups?
–Is there a true effect of the IV (factor) on the DV?
What is between-group variance and what are 3 sources that impact this?
• Between-groups variance is the variance (difference) between group means, e.g. no music has a mean of 9, but intermittent music has a mean of 22
SO variance BETWEEN the groups!!!
Arises from…
Individual differences
Treatment effects
Random effects
What are treatment effects?
This is the effect of the IV(s)/ factors
Variance due to different groups of people, in different conditions, behaving differently from each other
We anticipate a difference between experimental conditions
What are individual differences?
• People naturally vary.
• We don’t want a high amount of individual differences as this might lead us to think our IV is having an effect when it isn’t
• For example: reaction time to identify famous paintings.
What are random effects?
–Errors of measurement can arise from a variety of sources such as:
• Varying external conditions (time of day, temperature)
• State of the participant (tired, motivated)
• Experimenter’s, or computer’s, ability to measure and score accurately (same instructions, demeanour)
What is within group variances and what are 2 factors which influence this?
. AKA error variance (as the difference is not due to the IV, but error)
. It’s the variation between people within the same group, e.g. there’s a difference in test scores within group 2 (one person scored 8, another scored 30)
Affected by
random effects
Individual differences
SO what is the logic of the between and within group variations?
–subjects in different groups should have different scores because they have been treated differently (i.e. given different experimental conditions)= Between-groups variance
–subjects within the same group should have the same/ similar score= Within-groups variance
When would we use the null hypothesis, and the alternative hypothesis? Give an example for both
NULL= The populations from which the samples have been drawn have equal means. e.g There will be no difference in test scores for students in the 3 conditions
ALTERNATIVE= the populations from which the samples have been drawn do not have equal means. e.g ‘There will be a difference in test scores for students in the no music, constant low music or intermittent music conditions.’ (non-directional) (NO PREVIOUS RESEARCH)
e.g ‘Students in the constant low music condition will perform better in the test, compared to students in the no music or the intermittent music condition.’ (directional) (PREVIOUS RESEARCH)
What does it mean if the between groups variance is larger than the within groups variance?
Statistically significant value for F, and can conclude there is a significant difference
How do we calculate the F value
We want to see if our manipulation of the IV/factor is responsible for the differences between scores
F = variance due to manipulation of IV/factor DIVIDED BY error variance
ANOVA calculates the ratio of the variance due to our manipulation of the IV (between-groups variance) and the error variance (within-groups variance)
F = between-groups variance DIVIDED BY within-groups variance
How do we find out if the F-ratio is statistically significant?
• If the F-ratio is larger than 1, we need to decide if the value is large enough to be statistically significant
• The p value needs to be equal to or less than 0.05 for the F ratio to be regarded as statistically significant
What’s the difference of F in ANOVA vs multiple regression?
• Multiple Regression is the statistical model that is used to predict a continuous outcome on the basis of one or more continuous predictor variables.
– The F statistic is the test of the fit of the linear model
• ANOVA is the statistical model that is used to predict a continuous outcome on the basis of one or more categorical predictor variables.
– The F statistics is the test of fit for the group means
• The F ratio in ANOVA is exactly the same as in regression, except that the regression model for ANOVA contains categorical predictors.
SO what are factors?
This is the IV, e.g music type
What are factor levels
. conditions of the IV, e.g no music, constant, intermittent
What is a mixed ANOVA design?
Mixed ANOVA is used when a study design includes one or more within-subjects factors and one or more between-subjects factors, E.g., looking at the effect of music type (between-subjects; 3 levels) at different times of the day (within-subjects; 3 levels) on test performance
When labelling ANOVA, what does a one-way or two-way or three-way mean?
• The number indicates the number of factors (IVs)
–One-way ANOVA (one factor, e.g. music type)
–Two-way ANOVA (two factors, e.g. music type and time of day)
–Three-way ANOVA (three factors, e.g. music type, time of day and gender)
What does a 3×3 ANOVA mean?
The numbers tell us how many levels each factor has: a 3×3 ANOVA has two factors with three levels each
e.g. music type has 3 levels (no music; constant; intermittent) and time of day has 3 levels (morning; afternoon; evening)
LECTURE 5
ONE-WAY ANOVA
What does a higher F score mean?
More likely statistically significant
ANOVA has 4 assumptions — which 2 do you check before running the ANOVA, and which do you check while running it?
BEFORE: interval/ratio data; samples have been independent and random
DURING: homogeneity of variance; normality
Where do we check for normality?
. Histogram
. calculate skewness and kurtosis
How do we check for outliers?
Boxplots
. A circle indicates an outlier; a star indicates an extreme outlier
How do you calculate the mean square for the between groups and within groups?
Between-groups sum of squares divided by between-groups df
Within-groups sum of squares divided by within-groups df
How do you calculate the f value?
Between groups mean square divided by within groups mean square
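Putting the last few cards together, a hand calculation of a one-way ANOVA F in Python (made-up scores for the three music conditions):

```python
import numpy as np

# Hypothetical test scores for three music conditions (between-subjects).
groups = [
    np.array([9.0, 11, 8, 10, 12]),    # no music
    np.array([14.0, 16, 15, 13, 17]),  # constant music
    np.array([22.0, 20, 24, 21, 23]),  # intermittent music
]

k = len(groups)                        # number of conditions
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within  = n_total - k

ms_between = ss_between / df_between   # between-groups mean square
ms_within  = ss_within / df_within     # within-groups (error) mean square
f_value    = ms_between / ms_within

print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

Here the between-groups variance dwarfs the within-groups (error) variance, giving a large F that would be statistically significant.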
What do profile plots show?
A visual representation of the scores in each condition. When the error bars do not overlap, this suggests that the conditions are likely to be significantly different.
What is the effect size for ANOVA called?
Eta squared (η²)
What are planned vs unplanned comparisons??
1. Planned (a priori) comparisons
– Conducted when the researcher has hypothesised in advance which means will differ from each other, e.g. ‘this group will have a larger reaction time’
– The overall main effect does not need to be significant to run planned comparisons.
2. Unplanned (post hoc) comparisons
– Differences in means explored after data has been collected.
– Don’t use if overall main effect is not significant
. Normally you would only conduct either planned OR unplanned
What are Cohen’s d effect sizes (1988)
0.2= small
0.5= medium
>0.8= large
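Cohen's d itself is the mean difference divided by the pooled standard deviation; a small Python sketch with made-up recall scores:

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical recall scores:
morning = [14, 15, 13, 16, 17]
evening = [10, 11, 9, 12, 10]
d = cohens_d(morning, evening)
print(round(d, 2))  # -> 3.34, well above 0.8, so a large effect
```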
How do you report a planned comparison?
Planned comparisons were performed to test the two hypotheses. See Table 1 for mean (and SD) recall of each group.
The mean recall scores of the morning group were significantly higher than the evening group, t (27) = 4.14, p<.001, with a large effect size (d= 1.80).
Furthermore, the mean recall scores of the afternoon group were significantly higher than the evening group, t (27) = 4.56, p<.001, with a large effect size (d=2.27).
Give an example of interpreting the results?
The results suggest that recall for the evening group was lower than both the morning and afternoon group. There was no significant difference between the morning and afternoon.
This suggests that to aid students’ recall of information, morning (9am) and afternoon (2pm) teaching sessions produce greater recall than evening (9pm) teaching sessions.
LECTURE 6
One way within-subjects ANOVA
What does a one-way within-subjects ANOVA involve and give an example
. one factor (IV)
. repeated measures so everyone takes part in ALL conditions
e.g is there a difference in the recall of information depending on the time of day it was presented (morning, afternoon, evening)
Advantages of repeated measures design?
. Increased statistical power
. Removes the effect of individual differences
. Fewer participants needed
. Less time and money
What are some disadvantages of repeated measures design?
. Practice effects
. Fatigue
. Contrast effects (order effect)
. Demand characteristics
How can we remove order effects?
Randomising the order of testing
• For two conditions (A and B) we could randomly determine whether A or B is experienced first
Counterbalancing order of testing
• For two conditions (A and B) half the participants would experience condition A followed by B and the remaining half would experience B followed by A
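A tiny Python sketch of counterbalancing two conditions across participants (participant names are arbitrary):

```python
import itertools

def counterbalance(participants, conditions=("A", "B")):
    """Assign each participant an order of conditions, cycling through all
    possible orders so each order is used equally often."""
    orders = list(itertools.permutations(conditions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}

assignment = counterbalance(["P1", "P2", "P3", "P4"])
for p, order in assignment.items():
    print(p, "->", " then ".join(order))
```

Half the participants get A then B, the other half B then A, so any order effect is spread evenly across conditions.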