Research Methods I - Reliability

A newly developed test measuring the level of negative affect (high score = bad) was administered to 500 randomly selected individuals; the resulting dataset is simulated.
Analyze the dataset to determine if the scores follow a normal distribution.
- In SPSS: Analyze -> Descriptive Statistics -> Explore -> Put the variable in the “Dependent List” -> Click “Plots” and select “Histogram”.
Identify and exclude outliers before proceeding.
- Detect outliers using SPSS: Analyze -> Descriptive Statistics -> Explore -> Click “Statistics” and select “Outliers”.
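
For readers working outside SPSS, the two steps above can be sketched in Python. This is a minimal illustration, not the procedure SPSS runs internally: it assumes the common 1.5 × IQR boxplot rule for flagging outliers and uses skewness as a rough symmetry check. The simulated scores, the planted extreme values, and the function names are all hypothetical.

```python
import random
import statistics


def iqr_outlier_bounds(scores, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def skewness(scores):
    """Population skewness; values near 0 suggest a symmetric distribution."""
    m = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return sum(((x - m) / sd) ** 3 for x in scores) / len(scores)


# Hypothetical stand-in for the affect dataset: 500 simulated scores
# plus two extreme values planted on purpose.
random.seed(1)
scores = [random.gauss(50, 7.5) for _ in range(500)] + [120.0, -20.0]

lower, upper = iqr_outlier_bounds(scores)
outliers = [x for x in scores if x < lower or x > upper]
cleaned = [x for x in scores if lower <= x <= upper]

print(f"fences: {lower:.1f} to {upper:.1f}, flagged {len(outliers)} scores")
print(f"skewness after exclusion: {skewness(cleaned):.2f}")
```

A histogram remains the more informative check for normality; the skewness value only summarizes symmetry in a single number.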

An already validated and reliable test measuring anxiety level (high score = bad) was administered to 500 randomly selected individuals; the resulting dataset is simulated.
Client X scored 55 on the affect test and 105 on the anxiety test.
Assuming both tests are valid, determine which score is more concerning if scores exceeding the 90th percentile indicate complaints.

Test for Affect
- M = 49.92, SD = 7.65
- Obtained score = 55
- Difference from mean is approximately 5.
- Approximately 25% of the reference group has a higher score.
Test for Anxiety
- M = 100.14, SD = 3.01
- Obtained score = 105
- Difference from mean is approximately 5.
- Only approximately 5% of the reference group has a higher score.
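
The comparison above can be reproduced by converting each raw score to a z-score and reading off the proportion of the reference group that scores higher. The sketch below assumes both score distributions are normal and uses the means and standard deviations listed above; the function and variable names are illustrative.

```python
from statistics import NormalDist


def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / sd


def proportion_above(score, mean, sd):
    """Share of a normal reference group expected to score higher."""
    return 1.0 - NormalDist(mean, sd).cdf(score)


# (obtained score, mean, standard deviation) for client X on each test
tests = {"affect": (55, 49.92, 7.65), "anxiety": (105, 100.14, 3.01)}

results = {}
for name, (score, mean, sd) in tests.items():
    z = z_score(score, mean, sd)
    above = proportion_above(score, mean, sd)
    results[name] = (z, above)
    # Exceeding the 90th percentile means fewer than 10% score higher.
    flag = "above" if above < 0.10 else "below"
    print(f"{name}: z = {z:.2f}, {above:.1%} score higher, "
          f"{flag} the 90th percentile")
```

Both scores lie about 5 points above their mean, but because the anxiety test has the smaller standard deviation, the same raw difference corresponds to a larger z-score and a more extreme percentile.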

Reliability refers to the consistency or precision of a measurement.
What impacts reliability?
- Measurement error
- Measurement error is any fluctuation in scores that results from factors related to the measurement process that are irrelevant to what is being measured.

Correlation describes the relationship between two variables.
What is correlation?
Given graphs depicting different correlations, determine which correlation is higher.
[Graphs A, B, C: examples of correlations]
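
As a numeric counterpart to comparing graphs, the sketch below computes the Pearson correlation coefficient by hand for three small, made-up datasets. The data and names are hypothetical and only illustrate that the strength of a correlation is judged by the absolute value of r, while the sign gives its direction.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for three scatterplots sharing the same x values
x = [1, 2, 3, 4, 5, 6]
strong_positive = [2, 4, 5, 4, 6, 7]
weak_positive = [4, 2, 6, 3, 5, 4]
strong_negative = [7, 6, 4, 5, 2, 1]

r_values = {
    "strong positive": pearson_r(x, strong_positive),
    "weak positive": pearson_r(x, weak_positive),
    "strong negative": pearson_r(x, strong_negative),
}
for label, r in r_values.items():
    print(f"{label}: r = {r:.2f}, strength |r| = {abs(r):.2f}")
```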

Source of Error	Type of Tests Prone to Each Error Source	Appropriate Measures Used to Estimate Error
Interscorer differences	Tests scored with a degree of subjectivity	Scorer reliability
Time sampling error	Tests of relatively stable traits or behaviors	Test-retest reliability (r) a.k.a. stability coefficient
Content sampling error	Tests for which consistency of results, as a whole, is desired	Alternate-form reliability (r) or split-half reliability
Interitem inconsistency	Tests that require inter-item consistency	Split-half reliability or Kuder-Richardson 20 (K-R 20)
Interitem inconsistency and content heterogeneity combined	Tests that require inter-item consistency and homogeneity	Internal consistency measures
Time and content sampling error combined	Tests that require stability and consistency of results, as a whole	Delayed alternate-form reliability

Split-half reliability
Why?
- To assess inter-item consistency.
How?
- Correlate scores of participants on half of the test with the other half.
But…
- Then we only have the correlation for half of the test.
- We cannot simply extrapolate this.
- So…
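
The notes do not name the correction at this point. A standard way to estimate full-length reliability from a half-test correlation is the Spearman-Brown formula, $r_{full} = 2 r_{hh} / (1 + r_{hh})$, and the sketch below assumes that is the intended next step. The odd/even split, the data, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_hh):
    """Step a half-test correlation up to the full test length."""
    return 2 * r_hh / (1 + r_hh)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per participant."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    r_hh = pearson_r(odd_half, even_half)
    return r_hh, spearman_brown(r_hh)


# Hypothetical data: 6 participants x 4 items
data = [
    [2, 3, 2, 3],
    [5, 5, 4, 6],
    [7, 6, 8, 7],
    [1, 2, 1, 1],
    [9, 8, 9, 10],
    [4, 5, 5, 4],
]

r_hh, r_full = split_half_reliability(data)
print(f"half-test correlation r_hh = {r_hh:.3f}")
print(f"Spearman-Brown corrected estimate = {r_full:.3f}")
```

The result depends on how the items are split; an odd/even split is only one of many possible choices.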

Cronbach’s alpha
Why?
- To assess inter-item / internal consistency.
How?
- Perform all possible split-half analyses on the dataset and average the $r_{hh}$'s → Cronbach’s alpha…
- Or use the formula and calculate by hand: https://www.youtube.com/watch?v=JkOiLUZkutc&t=273s
- Or simply use SPSS/R
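
For reference, a by-hand calculation can be sketched with the commonly used variance form of the formula, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{s_{total}^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$, and $s_{total}^2$ the variance of the total scores. This is assumed to match the formula in the linked video, which is not reproduced here; the data and names are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """item_scores holds one list of item scores per participant."""
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # scores grouped per item
    item_variances = [statistics.variance(col) for col in columns]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 6 participants x 4 items
data = [
    [2, 3, 2, 3],
    [5, 5, 4, 6],
    [7, 6, 8, 7],
    [1, 2, 1, 1],
    [9, 8, 9, 10],
    [4, 5, 5, 4],
]

alpha = cronbach_alpha(data)
print(f"Cronbach's alpha = {alpha:.3f}")
```

With only six participants and four items, the value is purely illustrative; real reliability estimates need far larger samples.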

Good item?
- We want to throw away items that do not correlate well (go together) with the total score on a test.
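
One way to put this into practice is the corrected item-total correlation: each item is correlated with the total of the remaining items, so the item does not inflate its own correlation. The sketch below uses made-up data in which the fourth item is deliberately unrelated to the others. The 0.3 cutoff is a common rule of thumb and an assumption here, not something the notes prescribe.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(item_scores):
    """Correlate each item with the total score of all other items."""
    k = len(item_scores[0])
    correlations = []
    for i in range(k):
        item = [row[i] for row in item_scores]
        rest = [sum(row) - row[i] for row in item_scores]
        correlations.append(pearson_r(item, rest))
    return correlations


# Hypothetical data: 6 participants x 4 items; item 4 is unrelated noise
data = [
    [2, 3, 2, 7],
    [5, 5, 4, 2],
    [7, 6, 8, 5],
    [1, 2, 1, 6],
    [9, 8, 9, 3],
    [4, 5, 5, 8],
]

CUTOFF = 0.3  # assumed rule-of-thumb threshold
item_total = corrected_item_total(data)
for number, r in enumerate(item_total, start=1):
    verdict = "keep" if r >= CUTOFF else "consider dropping"
    print(f"item {number}: corrected item-total r = {r:.2f} ({verdict})")
```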

200 students filled out a questionnaire assessing their level of Nescafe addiction.
The questionnaire is still under development, and this was a first pilot.
Items can be scored from 0 to 10.
Please investigate the dataset, and perform a psychometric analysis.
Specifically, perform the following analyses:
- 1) Are there any outliers? Can you exclude them from further analysis? If so, please do exclude them.
- 2) Report the mean score of the scale (what did participants score on average?) and the standard deviation.
- 3) Do you think it is a reliable test? Why?
- 4) Would you keep all the items, or would you exclude items from the test?
- 5) Assume that people scoring above the 90th percentile usually present with Nescafe addiction. Should we worry about participant “40”?