3 types of reliability:
Across time
Across raters
Across items within the test
Random error
We may not get identical scores when repeating a measure on the same person
Participant-driven error: mood, hunger, fatigue
Environment-driven error: temperature, noise, time of the day
Reliability across time (2 procedures)
Test-retest reliability
Parallel-forms reliability
Test-retest reliability
The extent to which scores on identical measures correlate with each other when administered at two different times
General procedure
Administer the test
Get the results
Have the interval/time gap
Repeat steps 1 and 2
Correlate both results
r-value
The more reliable the measure (the same scores are observed again and again), the higher the r-value (from 0.00 to 1.00)
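The test-retest procedure above can be sketched in a few lines of Python. This is only an illustration: the scores are made-up numbers, and `pearson_r` is a hypothetical helper written for this example (Pearson's correlation is assumed to be the r-value the notes refer to).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people at two administrations
time_1 = [10, 12, 15, 18, 20]  # first administration
time_2 = [11, 12, 14, 19, 20]  # same test after the time gap

# Correlate both sets of results: a value close to 1.00 suggests
# the scores are consistent across time
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

The same calculation would apply to parallel-forms reliability, with `time_2` holding the scores from form B instead of a repeat of the same test.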
Limitation
The completion of the 1st test may influence one’s knowledge in completing the subsequent tests
E.g. it is easier to complete an IQ test the 2nd time when we already know the questions
To fix that limitation, we may use:
Parallel-forms reliability
Parallel-forms reliability
The extent to which scores on similar, but not identical, measures correlate with each other when administered at two different times
General procedure
Administer the test (form A)
Get the results
Have the interval/time gap
Administer the other test (form B)
Get the results
Correlate both results
Limitation
Expensive to create double the number of tests, and difficult to make sure the two tests are really equivalent
Reliability estimates are only meaningful when:
the construct does not change over time
Example: measuring children’s IQ at age 5 and 10
Low reliability (r = 0.5) may not be meaningful, as the difference in IQ scores at age 5 vs 10 could be due to changes in intelligence
Why are changes in intelligence not considered random error?
Because intelligence is the construct we are measuring, and there is a genuine change in the construct between age 5 and 10 that is not due to random error
Example: BFLM scale administered to a couple in a 1-year relationship and a couple in a 10-year relationship
Low reliability (r = 0.3) could be due to the fact that love can change after 10 years
Time interval
Choose a time interval that makes sense, depending on CONTEXT and what we are measuring
Example: time interval for BFLM scale
With a 3-day interval, a low test-retest reliability (TRR) is more likely due to random error (as love cannot change in such a short period), so consider revising the measure
When is it appropriate to use short/long intervals?
Long: when the construct is more resistant to change, e.g. personality
Short: when the construct is more susceptible to change, e.g. moods
Possible limitation of short intervals
Susceptible to low TRR
E.g. with a 5-minute interval between two tastings of a plate of Hokkien mee, low consistency in taste ratings could be due to fullness
Two ways to improve test-retest (TRR) and parallel-forms (PFR) reliability:
Revise your measurement: remove subjective questions, or make them more specific to avoid multiple interpretations, which increases resistance to random error
Administer the measure more times (across the day) and aggregate the scores together
Over a series of measurements, the inconsistencies in scores caused by random error should average out to about 0
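The idea that random error averages out over repeated measurements can be illustrated with a small simulation. This is a sketch under stated assumptions: the true score, the size of the error, and the choice of normally distributed error are all hypothetical, picked only for the demonstration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_SCORE = 50.0  # hypothetical true score of one participant
ERROR_SD = 5.0     # hypothetical spread of the random error

def observe():
    """One measurement = true score + random error (mood, noise, etc.)."""
    return TRUE_SCORE + rng.gauss(0, ERROR_SD)

single = observe()                        # one administration
many = [observe() for _ in range(1000)]   # many administrations
aggregate = sum(many) / len(many)         # aggregated (mean) score

# Distance of each estimate from the true score
print(abs(single - TRUE_SCORE), abs(aggregate - TRUE_SCORE))
```

The aggregated score should land very close to the true score, whereas any single measurement can be off by several points.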
Reliability across raters; inter-rater reliability
The extent to which the ratings of two or more judges correlate with each other
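One simple way to put a number on inter-rater reliability is to correlate each pair of judges and average the results. The sketch below assumes that approach (other indices exist), and the judges, ratings, and `pearson_r` helper are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings: three judges each rate the same five targets
ratings = {
    "judge_a": [3, 4, 2, 5, 4],
    "judge_b": [3, 5, 2, 4, 4],
    "judge_c": [2, 4, 1, 5, 3],
}

# Correlate every pair of judges
pairwise = {
    (a, b): pearson_r(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}

# Average the pairwise correlations into one summary figure
mean_irr = sum(pairwise.values()) / len(pairwise)
print(round(mean_irr, 2))
```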
Why do we need inter-rater reliability (IRR)?
Observer error may arise because raters may differ in mood, attention, motivation, interests, etc.
3 ways to improve low IRR
Train your raters and provide clearer guidelines for ratings
Revise your scale (similar to TRR & PFR)
Use more raters and aggregate their scores
The overestimates and underestimates caused by observer error should average out to about 0
Reliability across items within the test; internal consistency
The extent to which the scores of the different items on a scale correlate with each other, thus measuring the true score
Why do we need internal consistency?
Most measures consist of more than one item to fulfil content validity, and each item is assumed to measure a part of the total construct
2 ways to calculate internal consistency
Split-half reliability
Cronbach’s alpha
Split-half reliability
The extent to which scores between two halves of a scale are correlated
Procedure example: Odd-even order
For a 4-item scale, one half consists of items 1 and 3, and the other half consists of items 2 and 4
If the scores on the two halves are strongly correlated, then the scale has good split-half reliability (SHR)
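The odd-even split described above can be sketched as follows. The responses are invented for a hypothetical 4-item scale, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one row per respondent, columns are items 1 to 4
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]

# Score each half separately for every respondent
odd_half = [row[0] + row[2] for row in responses]   # items 1 and 3
even_half = [row[1] + row[3] for row in responses]  # items 2 and 4

# Correlate the two half-scores
split_half_r = pearson_r(odd_half, even_half)
print(round(split_half_r, 2))
```

A different way of splitting the items (e.g. first half vs second half) could give a different value, which is the limitation noted in the next card.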
Limitation of using SHR
The 2 halves/versions may not really be equivalent
To fix that limitation, we may use:
Cronbach’s alpha
Cronbach’s alpha
An estimate of internal consistency based on the average correlation among all the items on the scale; it can be thought of as the average of all possible split-half reliability outcomes
What is a suitable Cronbach’s alpha score for an acceptable scale?
Above 0.7
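Cronbach's alpha is usually computed from item variances rather than by listing every split. The sketch below uses the common variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which is not spelled out in these notes; the data and the `cronbach_alpha` function name are hypothetical.

```python
from statistics import variance

# Hypothetical data: one row per respondent, columns are items 1 to 4
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]

def cronbach_alpha(rows):
    """Variance-based Cronbach's alpha for a respondents-by-items table."""
    k = len(rows[0])  # number of items on the scale
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(responses)
print(round(alpha, 2), "acceptable" if alpha > 0.7 else "below the 0.7 guideline")
```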
Relationship between reliability and validity
Reliability is a prerequisite for validity; a measure must first be reliable before it can be valid, but a measure can be reliable without being valid
Step 4a
Select your sample
Selection bias
When the sampling method favours the selection of some individuals over others
2 categories of Sampling methods:
Probability sampling
Non-probability sampling
When to use each sampling method?
Probability sampling: when the entire population is known, sampling occurs through an unbiased and equal-chance selection process
Non-probability sampling: when the population is not completely known, sampling is guided by common sense or convenience, while maintaining representativeness and avoiding bias
5 types of probability sampling methods:
Random sampling
Systematic sampling
Stratified random sampling
Proportionate stratified sampling
Cluster sampling
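The notes list these five methods without describing them, so the sketch below relies on their usual textbook meanings, stated in the comments. The population, its strata and clusters, and all function names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 people: 60 in "year1", 40 in "year2",
# spread over 5 clusters (e.g. classes)
population = [
    {"id": i, "stratum": "year1" if i < 60 else "year2", "cluster": i % 5}
    for i in range(100)
]

def group_by(pop, key):
    """Split the population into groups sharing the same value of `key`."""
    groups = {}
    for person in pop:
        groups.setdefault(person[key], []).append(person)
    return groups

def simple_random(pop, n):
    """Random sampling: every individual has an equal chance."""
    return rng.sample(pop, n)

def systematic(pop, n):
    """Systematic sampling: random start, then every k-th individual."""
    step = len(pop) // n
    start = rng.randrange(step)
    return pop[start::step][:n]

def stratified_random(pop, n_per_stratum):
    """Stratified random sampling: same number drawn from each stratum."""
    strata = group_by(pop, "stratum")
    return [p for members in strata.values()
            for p in rng.sample(members, n_per_stratum)]

def proportionate_stratified(pop, n):
    """Proportionate stratified: each stratum's share matches the population."""
    sample = []
    for members in group_by(pop, "stratum").values():
        share = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, share))
    return sample

def cluster(pop, n_clusters):
    """Cluster sampling: pick whole clusters at random, keep everyone in them."""
    clusters = group_by(pop, "cluster")
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in clusters[c]]

print(len(simple_random(population, 10)), len(cluster(population, 2)))
```

Each function needs the full `population` list up front, which reflects the requirement that the entire population be known for probability sampling.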
2 types of non-probability sampling methods:
Convenience sampling
Quota sampling
When to use convenience vs quota sampling?
Convenience: when it doesn’t matter if there are any group differences in the variable of interest, e.g. males or females will produce the same results, hence numbers do not matter
Quota: when you believe there are established group differences in your variable of interest eg. males and females will produce different results, hence numbers matter
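The difference between the two methods can be sketched as follows: a convenience sample takes whoever comes first, while a quota sample keeps taking available people only until each group's quota is filled. The arrival order, group labels, and function names are hypothetical.

```python
def convenience_sample(arrivals, n):
    """Take the first n people available, ignoring group membership."""
    return arrivals[:n]

def quota_sample(arrivals, quotas):
    """Take people as they come, but stop accepting a group once its quota is met."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in arrivals:
        group = person["group"]
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:  # every quota filled, stop recruiting
            break
    return sample

# Hypothetical stream of volunteers: every third person is male
arrivals = [{"id": i, "group": "female" if i % 3 else "male"} for i in range(30)]

conv = convenience_sample(arrivals, 10)
quota = quota_sample(arrivals, {"male": 5, "female": 5})

def count(sample, group):
    return sum(p["group"] == group for p in sample)

print(count(conv, "male"), count(conv, "female"))    # uneven groups
print(count(quota, "male"), count(quota, "female"))  # groups match the quotas
```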
Is this probability or non-probability sampling?
Non-probability: not everyone has an equal chance of being selected. Because the survey is conducted on only one day, those who did not take the MRT or bus that day were not able to take part
Which sampling method is more employed?
Non-probability, due to its convenience and much lower cost
Probability sampling requires every individual in a population to be known and to have an equal chance of being selected, which is difficult
What is the standard sample size for accurate generalisation?
>50 individuals from each group
Step 4b
Prepare for ethics approval
4 elements of ethical procedure involving human participants:
No harm
Informed consent
Privacy and confidentiality
Debriefing
No harm
Researchers are to avoid harming subjects, and to minimise harm when it is foreseeable and unavoidable
E.g. the Little Albert experiment was very harmful to its participant
Informed consent
Researchers must obtain consent from participants
The consent form typically contains: the purpose of the research and expected duration, study procedure, any prospective research benefits, potential risks, limits of confidentiality, contact details, and participants' rights to decline and withdraw from the research
Privacy and confidentiality
Researchers are obligated to take precautions to protect confidential information
Practising anonymity for publication: ensuring subjects’ names are not directly associated with any information or measurements obtained from them
Debriefing
Researchers must promptly let participants know about the nature, results, and conclusions of the research; this is especially necessary when there is a cover story
The role of The Institutional Review Board (IRB)
Reviews all research with respect to the treatment of human participants; its approval must be obtained before conducting any research