sign test definition
involves looking up the number of pluses/minuses (S) (whichever is smaller) against the total number of pluses and minuses (N) in a table
the table tells you whether the results you have obtained are statistically significant at the 5% level
the sign test is used when you have pairs of scores of related samples
used in matched pairs or repeated measures designs
sign test procedure
give each pair of scores a plus if the score in the left column is bigger than the score in the right column
give each pair of scores a minus if the score in the left column is smaller than the score in the right column
give each pair of scores a zero if there is no difference between the left and right columns
make a note of the number of times the less frequent sign (S) occurs and the total number of pluses and minuses (N) (don’t include any zeroes in N)
look up in the statistical table the critical value of S for your value of N at the 5% significance level (the counting of S and N is sketched below)
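A minimal Python sketch of the counting step, using hypothetical before/after scores (the variable names and data are illustrative, not from the notes):

```python
# Sketch of the sign-test counting step (hypothetical paired scores).
before = [12, 15, 9, 14, 10, 11, 13]   # left column
after  = [14, 13, 9, 17, 12, 10, 16]   # right column

pluses = sum(1 for b, a in zip(before, after) if b > a)   # left score bigger: +
minuses = sum(1 for b, a in zip(before, after) if b < a)  # left score smaller: -
# pairs with no difference (zeroes) are simply not counted

S = min(pluses, minuses)   # the less frequent sign
N = pluses + minuses       # total pluses and minuses, zeroes excluded

print(f"pluses={pluses}, minuses={minuses}, S={S}, N={N}")
# S is then looked up against the critical value for N in the statistical table
```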
final steps of sign test
if the value of S you have found is equal to or lower than the value in the statistical table, the IV has had an effect on the DV
thus the results are significant at the 5% level
if the value of S you have found is more than the table value, then it is concluded that the IV has had no effect on the DV
thus the results are not significant at the 5% level
example of model answer for sign test
the calculated value of S (7) is greater than the critical value (5) for N = 20 at the 5% level (p = 0.05)
for a one-tailed test such as this, so the results are not significant
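As a rough sketch, the same decision rule in Python using the numbers from the model answer (S = 7, critical value = 5, N = 20, one-tailed):

```python
# Decision rule applied to the model-answer figures.
S = 7
critical_value = 5  # from the table for N = 20 at the 5% level, one-tailed

if S <= critical_value:
    print("S is equal to or lower than the critical value: significant at the 5% level")
else:
    print("S is greater than the critical value: not significant at the 5% level")
# here 7 > 5, so the result is not significant
```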
purpose of statistical testing
statistical tests determine if a difference/correlation is statistically significant
factors that determine the choice of statistical tests
has the researcher conducted a test of difference or correlation?
if a test of difference is conducted, which experimental design was used?
unrelated - independent groups design
related - repeated measures and matched pairs designs
has the researcher collected nominal, ordinal or interval data?
when to use which statistical test
chi-squared is a test of both difference and association
Spearman's rho and Pearson's r are the only tests of correlation
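The choice of test can be written out as a lookup, sketched below in Python; the notes above only name some of these tests, so the full mapping follows the standard choice table and should be treated as an assumption:

```python
# Sketch of the usual test-choice logic: purpose of the test, experimental
# design (for tests of difference) and level of data.
def choose_test(purpose, design, data):
    # purpose: "difference" or "correlation"
    # design: "related", "unrelated", or None for correlations
    # data: "nominal", "ordinal" or "interval"
    table = {
        ("difference", "unrelated", "nominal"): "Chi-squared",
        ("difference", "related", "nominal"): "Sign test",
        ("difference", "unrelated", "ordinal"): "Mann-Whitney",
        ("difference", "related", "ordinal"): "Wilcoxon",
        ("difference", "unrelated", "interval"): "Unrelated (independent) t-test",
        ("difference", "related", "interval"): "Related (paired) t-test",
        ("correlation", None, "nominal"): "Chi-squared (test of association)",
        ("correlation", None, "ordinal"): "Spearman's rho",
        ("correlation", None, "interval"): "Pearson's r",
    }
    return table[(purpose, design, data)]

print(choose_test("difference", "related", "nominal"))   # Sign test
print(choose_test("correlation", None, "interval"))      # Pearson's r
```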
parametric tests definition
parametric tests assume:
a normal distribution
use of interval data (as it’s the most sensitive and precise)
homogeneity of variance
homogeneity of variance definition
if the set of scores per condition are similar in terms of dispersion, then this means they have homogeneity of variance
if both conditions have a similar standard deviation, then this indicates that there was not a large amount of variability in each condition
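A small illustrative sketch (hypothetical scores) of checking homogeneity of variance by comparing the standard deviation of each condition:

```python
# Compare the spread of scores in two conditions.
from statistics import stdev

condition_a = [12, 14, 15, 13, 16, 14]
condition_b = [11, 13, 14, 12, 15, 13]

sd_a, sd_b = stdev(condition_a), stdev(condition_b)
print(f"SD condition A = {sd_a:.2f}, SD condition B = {sd_b:.2f}")
# similar standard deviations suggest similar dispersion,
# i.e. homogeneity of variance
```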
strengths of parametric tests
more powerful and precise than non-parametric tests, as they have more statistical power and are more likely to detect a significant difference or correlation
non-parametric tests definition
do not follow the same criteria as parametric tests
there is no assumption of normal distribution as what is being measured may not fall within defined parameters
they use nominal or ordinal data
they do not depend on homogeneity of variance
reliability definition
reliability is consistency (if results are reliable, they will be consistent every time the experiment is repeated)
internal reliability definition
internal reliability means the test is consistent within itself
e.g. 2 parts of the same test need to measure the same thing in the same way
external reliability definition
external reliability means the test is consistent over a period of time
e.g. an IQ test should produce the same results for the same person the next year as it did last year
external reliability can be determined using the test-retest method
how to test consistency
test-retest reliability
if a test is repeated, a reliable test would yield similar results
split half reliability
if 2 halves of the same test yield the same results
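A rough Python sketch of split-half reliability with hypothetical item scores: split each participant's items into two halves and correlate the half-totals (the data and the half-split are illustrative):

```python
# Split-half reliability: do the two halves of the same test agree?
from statistics import correlation  # Pearson's r, available from Python 3.10

# each inner list = one participant's scores on 6 items
scores = [
    [4, 5, 3, 4, 5, 4],
    [2, 3, 2, 3, 2, 3],
    [5, 5, 4, 5, 4, 5],
    [3, 2, 3, 3, 3, 2],
]

first_half = [sum(p[:3]) for p in scores]   # items 1-3
second_half = [sum(p[3:]) for p in scores]  # items 4-6

print(f"split-half correlation = {correlation(first_half, second_half):.2f}")
# a strong positive correlation suggests both halves measure the same thing
```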
threats to reliability
task interest
if a participant finds a task interesting, they are more likely to do well in it
if a participant is bored by a task, there may be a decline in performance
if a task-interest issue is suspected, it is necessary to make the control task equally interesting
how to combat task interest
counterbalancing
e.g half of participants would do task A then task B
the other half would do task B then task A
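A minimal sketch of counterbalancing task order, assuming a hypothetical list of participant IDs:

```python
# Counterbalancing: half do A then B, the other half do B then A.
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]

half = len(participants) // 2
group_ab = participants[:half]   # task A then task B
group_ba = participants[half:]   # task B then task A

for p in group_ab:
    print(f"{p}: task A -> task B")
for p in group_ba:
    print(f"{p}: task B -> task A")
```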
inter-rater reliability definition
how similarly different raters/judges score the same event
it is unreliable to have the results of a test depend upon who is doing the observation
all people making judgements need to be making the same judgements using the same criteria
MODEL ANS: how to check inter-rater reliability
inter-rater reliability is when you have 2 or more people making the same judgements using the same criteria
the numerical scores of the observers are compared using a scattergram
a positive correlation of 0.8 or more demonstrates good inter-rater reliability
the correlation can be worked out by calculating a correlational coefficient
correlational coefficient can be calculated using Spearman’s Rho and Pearson’s R
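A hedged sketch of the check described above, using hypothetical observer scores; Spearman's rho is computed here via scipy, assuming scipy is available:

```python
# Compare two observers' numerical scores with Spearman's rho
# (hypothetical ratings; a scattergram of the two lists could also be plotted).
from scipy.stats import spearmanr

rater_1 = [3, 5, 2, 4, 4, 1, 5, 3]
rater_2 = [3, 4, 2, 4, 5, 1, 5, 2]

rho, p_value = spearmanr(rater_1, rater_2)
print(f"correlation coefficient = {rho:.2f}")
# a positive correlation of 0.8 or more demonstrates good inter-rater reliability
print("good inter-rater reliability" if rho >= 0.8 else "reliability needs improving")
```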
MODEL ANS: how to train inter-rater reliability
clear and mutually exclusive categories are made for observation
joint observations are completed (preferably using a video) whilst recording categories, to standardise observations and judgements
the reliability of observational scores is compared using a scattergram
this is done by calculating the correlational coefficient of observational scores
correlational coefficient can be calculated using Spearman’s Rho and Pearson’s R
a correlational coefficient of 0.8 or more demonstrates good inter-rater reliability
validity definition
a test or measure is valid if it measures what it's supposed to measure
face validity definition
weakest form of testing validity
process of looking at a measure and making a quick judgement as to whether it does measure what it’s supposed to
population validity definition
concerns the population chosen for a sample
questions which are asked:
is the sample size large enough?
is the sample size too narrow culturally?
can the findings be generalised to a wider population?
external validity definition
another way of questioning whether the results can be extrapolated or generalised across a wider population
ideas to look at and whether these make the data generalisable or not:
location
time of day
era within history
internal validity definition
examines whether the IV caused the DV
the results of a study are internally valid if they result from the manipulation of the IV acting on the DV and have not been affected by extraneous variables
ecological validity definition
how true to life the experimental situation is and if the results would be replicated in a real-life scenario
content validity definition
does the content of the experiment/test measure what it is supposed to
criterion validity definition
examines whether the criteria being used measure what they are meant to
concurrent validity definition
examines whether other tests done at the same time support the results found
predictive validity definition
examines whether the experiment or theory predicts what will happen to other people or how other people will behave, based on what happened to the participants or how they behaved in the study
experimental validity definition
examines whether the conclusions drawn from a piece of research are true
examines whether the experiment worked
examines whether there was a genuine effect of the IV on the DV
type I and type II errors can be referred to here
experimental validity requires internal validity
construct validity definition
examines whether the correct assumptions about psychological constructs have been made
e.g. is measuring sadness through crying correct? are there any other occasions when people cry? are there other ways of expressing sadness?
temporal validity definition
measures the extent to which research findings are still relevant in the current age
threats to validity
investigator effects
demand characteristics
Hawthorne Effect
observational unreliability
unreliable self-report
improving the validity of lab experiments
using controlled conditions and standardised procedures to establish causality
using single-blind or double-blind procedures to ensure no bias from researchers
avoiding investigator effects
use double-blind procedure to reduce impact of the investigator on participant performance
avoiding demand characteristics
disguise the aim of the research as much as ethical conduct allows
using single-blind procedures
improving observational reliability
using covert methods in naturalistic observation to reduce participant behaviour being contrived
ensuring behavioural categories are unambiguous and mutually exclusive
improving reliability of self-report
using a lie scale to show inconsistencies in responses
using reverse scoring to ensure participants answer all questions with the same direction of response, rather than just picking the same option every time (e.g. 10/10 for all answers)
type I error definition
occurs when the null hypothesis is rejected when it should have been accepted (researcher claims the results are significant when they are not)
more likely to occur when the researcher uses a probability value that is too high (e.g. 0.1 rather than 0.05)
type II error definition
occurs when the null hypothesis is accepted when it should have been rejected (researcher claims the results are not significant when they are)
more likely to happen when the researcher uses a probability value that is too low (e.g. 0.01 instead of 0.05)
how to combat type I and II errors
using a significance level of 0.05
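A small illustrative sketch (hypothetical p-values) of how the chosen significance level drives the decision and the two error risks:

```python
# Decision rule: reject the null hypothesis when p is at or below the chosen level.
def decide(p_value, alpha=0.05):
    return "reject null (significant)" if p_value <= alpha else "accept null (not significant)"

p = 0.07
print(decide(p, alpha=0.05))    # not significant at the conventional 5% level
print(decide(p, alpha=0.1))     # a lenient level calls this significant -> more Type I risk
print(decide(0.03, alpha=0.01)) # a strict level misses an effect at p = 0.03 -> more Type II risk
```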