week 4: normality/outliers/erroneous data

first step of data screening is checking for

  • erroneous data

  • normal distribution

erroneous data (eg an impossible age) results from the methodology/materials. things to look at:

  • time taken

  • extreme/non-serious responses

  • questionnaire/subscale scores are out of range

ways to screen data:

  • check demographic data

  • time check - how long did it take them to do?

    • what is reasonable?

  • check responses vary

    • have people just ticked strongly agree for everything?

  • check response ranges - within expected ranges?

  • check for ppts with missing data
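
rough sketch of the screening checks above in python. the field names (age, seconds, answers), the cutoffs (60 seconds, ages 18-100) and the 1-5 response scale are all made up for illustration and are not from the lecture - what counts as reasonable depends on the study:

```python
# Hypothetical screening sketch: record layout and thresholds are assumptions.
LIKERT_MIN, LIKERT_MAX = 1, 5   # assumed 5-point response scale
MIN_SECONDS = 60                # assumed "too fast to be serious" cutoff
AGE_RANGE = (18, 100)           # assumed plausible age range


def screen(ppt):
    """Return a list of reasons why a participant record looks suspect."""
    flags = []

    # demographic check
    age = ppt.get("age")
    if age is None or not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        flags.append("age missing or implausible")

    # time check
    if ppt["seconds"] < MIN_SECONDS:
        flags.append("completed too quickly")

    # missing data check
    answers = ppt["answers"]
    if any(a is None for a in answers):
        flags.append("missing responses")

    # range check and variation check on the answers that were given
    given = [a for a in answers if a is not None]
    if any(not LIKERT_MIN <= a <= LIKERT_MAX for a in given):
        flags.append("response out of range")
    if len(given) > 1 and len(set(given)) == 1:
        flags.append("no variation in responses")

    return flags


# Invented example records, not real data.
participants = [
    {"id": 1, "age": 21, "seconds": 340, "answers": [4, 2, 5, 3, 1]},
    {"id": 2, "age": 199, "seconds": 25, "answers": [5, 5, 5, 5, 5]},
    {"id": 3, "age": 30, "seconds": 410, "answers": [2, None, 7, 3, 4]},
]

report = {p["id"]: screen(p) for p in participants}
for ppt_id, flags in report.items():
    print(ppt_id, flags or "ok")
```

the function only flags ppts, it does not remove them, so each exclusion can still be looked at and written up.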

removal of any ppt needs to be documented and justified

ways to reduce poor/nonsense responses:

  • commitment checks

  • attention checks eg please tick the neutral response for this statement, trap questions

  • speed warning eg you’ve answered these questions very quickly, have you given everything sufficient thought?

assumption = a rule that the data needs to meet for a test to be valid

parametric tests rely on the mean - 3 assumptions

  • interval/ratio data

  • homogeneity of variance

  • normally distributed

non-parametric tests rely on the median (or ranks) rather than the mean, so they do not need normally distributed data
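
small illustration of why the mean/median difference matters for outliers, using python's statistics module. the scores are invented and the single extreme value of 50 is just an example of an outlier:

```python
import statistics

# Invented scores, for illustration only.
scores = [2, 3, 3, 4, 5]
with_outlier = scores + [50]  # one extreme value added

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly towards the outlier, the median barely moves.
print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

one extreme score moves the mean from 3.4 to about 11.17 but the median only from 3 to 3.5, which is why tests built on the mean are more sensitive to outliers and non-normal data.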

kurtosis = how heavy the tails of the distribution are (often described as how peaked or flat it is) compared with a normal distribution

  • platykurtic = flat distribution with thin tails - negative kurtosis

  • leptokurtic = peaked distribution with fat tails - positive kurtosis

  • mesokurtic = tails like a normal distribution - kurtosis of 0

value of kurtosis is 0 if data is normally distributed (this is excess kurtosis, ie raw kurtosis minus 3; the raw kurtosis of a normal distribution is 3)
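
sketch of how kurtosis can be computed by hand in python, to show the sign for each shape. this uses the simple population formula (fourth central moment divided by squared variance, minus 3); stats packages often apply a small-sample correction, so their values may differ slightly. the three data sets are invented:

```python
import random


def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance (second central moment)
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Flat, evenly spread data: platykurtic, negative kurtosis.
flat = list(range(1, 101))

# Mostly identical values with two extreme ones: leptokurtic, positive kurtosis.
heavy_tails = [0] * 98 + [-10, 10]

# Simulated normal data (fixed seed): mesokurtic, kurtosis close to 0.
rng = random.Random(4)
normalish = [rng.gauss(0, 1) for _ in range(100_000)]

k_flat = excess_kurtosis(flat)
k_heavy = excess_kurtosis(heavy_tails)
k_normal = excess_kurtosis(normalish)

print("flat:       ", round(k_flat, 2))
print("heavy tails:", round(k_heavy, 2))
print("normal-ish: ", round(k_normal, 2))
```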