week 4: normality/outliers/erroneous data
first step of data screening is checking for
erroneous data
normal distribution
erroneous data (eg age) results from the methodology/materials:
time taken
extreme/non-serious responses
questionnaire/subscale scores are out of range
ways to screen data:
check demographic data
time check - how long did it take them to do?
what is reasonable?
check responses vary
have people just ticked strongly agree for everything?
check response ranges - within expected ranges?
check for ppts with missing data
removal of ppt needs to be documented and justified
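the screening checks above can be sketched in code. this is a minimal illustration, not a standard procedure: the field names (age, seconds, items), the thresholds and the three example participants are all made up. each ppt gets a list of reasons, which doubles as the documentation needed to justify any removal.

```python
from statistics import pstdev

# Hypothetical thresholds, chosen for illustration only.
MIN_SECONDS = 120      # assumed minimum plausible completion time
ITEM_RANGE = (1, 5)    # assumed 5-point Likert scale
AGE_RANGE = (18, 100)  # assumed plausible ages for the sample


def screen_participant(ppt):
    """Return a list of reasons this participant's data looks suspect."""
    flags = []
    items = ppt["items"]
    answered = [x for x in items if x is not None]

    # Demographic check: age present and plausible.
    if ppt["age"] is None or not AGE_RANGE[0] <= ppt["age"] <= AGE_RANGE[1]:
        flags.append("age missing or implausible")
    # Time check: finished faster than is reasonable.
    if ppt["seconds"] < MIN_SECONDS:
        flags.append("completed too quickly")
    # Missing data check.
    if len(answered) < len(items):
        flags.append("missing item responses")
    # Range check: every answer must sit on the response scale.
    if any(not ITEM_RANGE[0] <= x <= ITEM_RANGE[1] for x in answered):
        flags.append("response out of range")
    # Variation check: the same answer to every item.
    if len(answered) > 1 and pstdev(answered) == 0:
        flags.append("no variation in responses")
    return flags


# Made-up example data.
participants = [
    {"id": "p1", "age": 21, "seconds": 340, "items": [4, 2, 5, 3, 4]},
    {"id": "p2", "age": 19, "seconds": 45, "items": [5, 5, 5, 5, 5]},
    {"id": "p3", "age": 210, "seconds": 400, "items": [3, None, 7, 2, 4]},
]

# Keep the reasons per participant so any removal can be justified later.
screening_log = {p["id"]: screen_participant(p) for p in participants}
for ppt_id, reasons in screening_log.items():
    print(ppt_id, reasons or "ok")
```

a flag is a prompt to look at the ppt, not an automatic exclusion. what counts as "too quick" or "implausible" depends on the study.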
ways to reduce poor/nonsense responses:
commitment checks
attention checks eg please tick the neutral response for this statement, trap questions
speed warning eg "you’ve answered these questions very quickly, have you given everything sufficient thought?"
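scoring an attention check afterwards is simple. a small sketch, assuming a 1-5 scale where 3 is the neutral response and using made-up answers to the check item:

```python
NEUTRAL = 3  # assumed midpoint of a 1-5 response scale

# Hypothetical data: each participant's answer to the attention-check item
# ("please tick the neutral response for this statement").
attention_answers = {"p1": 3, "p2": 5, "p3": 3, "p4": 1}

# Anyone who did not tick the neutral option failed the check.
failed = sorted(p for p, ans in attention_answers.items() if ans != NEUTRAL)
print(failed)  # ['p2', 'p4']
```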
assumption = a rule that the data need to meet
parametric tests rely on the mean - 3 assumptions
interval/ratio data
homogeneity of variance
normally distributed
non-parametric tests rely on the median
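one reason the mean/median distinction matters for screening is that the mean is pulled by extreme values much more than the median. a small sketch with made-up scores, where one outlier (95) is added to an otherwise tight set:

```python
from statistics import mean, median

# Made-up scores for illustration.
scores = [12, 14, 15, 15, 16, 17, 18]
with_outlier = scores + [95]  # one extreme value added

print(round(mean(scores), 2), median(scores))    # 15.29 15
print(mean(with_outlier), median(with_outlier))  # 25.25 15.5
```

the mean jumps by about 10 points while the median barely moves. this is why outliers and erroneous values need checking before running tests that rely on the mean.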
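kurtosis = how peaked or flat the distribution is, and how heavy its tails are, compared with a normal distribution
platykurtic = flat distribution with thin tails - negative kurtosis
leptokurtic = peaked distribution with fat tails - positive kurtosis
mesokurtic = normal-shaped distribution - kurtosis of 0
value of kurtosis (excess kurtosis, ie kurtosis minus 3) is 0 if data are normally distributed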
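to see where the sign comes from, here is a sketch of the simple moment-based excess kurtosis (fourth moment over squared variance, minus 3 so that a normal distribution scores 0). the function name and both data sets are made up. stats packages often apply a small-sample correction, so their values may differ slightly from this.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no small-sample correction)."""
    m = fmean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment (variance)
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Made-up data: evenly spread scores vs scores piled in the middle with two extremes.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]

print(excess_kurtosis(flat) < 0)    # True: platykurtic
print(excess_kurtosis(peaked) > 0)  # True: leptokurtic
```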