Reliability
dependency
consistency of measurement
Reliability coefficient
High _____ is a prerequisite of validity
_______increases with test length
Reliability coefficient
an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
Reliability estimates
test-retest reliability
Parallel forms reliability
Internal consistency reliability
Inter-rater reliability
Test-retest reliability
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
Coefficient of stability
The longer the time that passes, the greater the likelihood that the reliability coefficient will be lower
If the test-retest interval is too short, there is a tendency for a carryover effect/practice effect
Problem: it is not applicable to states (less enduring characteristics of a person)
Applicable to trait tests (long-lasting, enduring characteristics)
Pearson r or Spearman rho
Coefficient of stability
how stable is the construct or measure
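As an illustration (all scores below are invented), the coefficient of stability is simply a Pearson r between the two administrations of the same test:

```python
# Hypothetical test-retest example: Pearson r between two
# administrations of the same test to the same six people.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 9, 20, 17, 11]   # first administration
time2 = [13, 14, 10, 19, 18, 12]  # retest, same people
stability = pearson_r(time1, time2)  # coefficient of stability
```

With these made-up scores the coefficient comes out close to 1, consistent with a stable trait measure.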
Parallel Forms and Alternate Forms reliability
Item Sampling
The consistency of test results between two different but equivalent forms of a test
coefficient of equivalence
the advantage of having another form is it eliminates carryover effect
Pearson r or Spearman rho
Parallel forms
________for each form of the test, the mean and the variances of observed test scores are equal
Same items, different positions/numbering
Alternate forms
are simply different versions of a test that have been constructed so as to be parallel
Internal consistency reliability
defines measurement error strictly in terms of consistency or inconsistency in the content of the test
Split half reliability estimate
Spearman-Brown Formula
Cronbach's Coefficient Alpha
Kuder-Richardson Formula
Split-half reliability estimate
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
odd-even reliability
Three steps
Divide the test into equivalent halves
Calculate a Pearson r between scores on the two halves of the test
Adjust the half-test reliability using the Spearman-Brown formula
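The three steps above can be sketched as follows (the item responses are invented for illustration):

```python
# Split-half (odd-even) reliability with the Spearman-Brown correction.
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Each row: one person's scores on a 6-item test (hypothetical data).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
]

# Step 1: divide the test into equivalent halves (odd vs even items).
odd  = [sum(r[0::2]) for r in responses]   # items 1, 3, 5
even = [sum(r[1::2]) for r in responses]   # items 2, 4, 6

# Step 2: Pearson r between the two half-test scores.
r_half = pearson_r(odd, even)

# Step 3: Spearman-Brown adjustment back to full test length (doubling).
r_full = (2 * r_half) / (1 + r_half)
```

The adjustment is needed because each half is only half as long as the actual test, and shorter tests tend to be less reliable.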
Spearman-Brown Formula
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test
Estimates the effect of lengthening or shortening on the test's reliability
used to determine the number of items needed to attain a desired level of reliability
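A minimal sketch of the general Spearman-Brown prophecy formula, plus its rearrangement for the length factor needed to reach a target reliability (the .60/.80 figures are just an example):

```python
# Spearman-Brown prophecy formula and its inverse.

def spearman_brown(r_orig, n):
    """Predicted reliability when the test is made n times as long."""
    return (n * r_orig) / (1 + (n - 1) * r_orig)

def length_factor(r_orig, r_target):
    """How many times longer the test must be to reach r_target."""
    return (r_target * (1 - r_orig)) / (r_orig * (1 - r_target))

# Doubling a test with r = .60 predicts r = .75:
r_doubled = spearman_brown(0.60, 2)

# To raise r = .60 to r = .80, the test must be about 2.67x as long,
# e.g. a 10-item test would need roughly 27 items.
n_needed = length_factor(0.60, 0.80)
```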
Cronbach’s Coefficient alpha
used with ratio or interval data
nondichotomous items
Mean of all possible split half correlations
Preferred statistic for obtaining an estimate of internal consistency reliability
Typically ranges in value from 0 to 1
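A small sketch of coefficient alpha for multipoint (non-dichotomous) items; the rating data are hypothetical:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
from statistics import pvariance

def cronbach_alpha(rows):
    """rows: one list of item scores per person."""
    k = len(rows[0])  # number of items
    item_vars = [pvariance([person[i] for person in rows]) for i in range(k)]
    total_var = pvariance([sum(person) for person in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Four people rating three 5-point items (invented data):
ratings = [[4, 5, 4], [3, 3, 2], [5, 4, 5], [2, 2, 3]]
alpha = cronbach_alpha(ratings)
```

Population variances are used throughout for consistency; alpha rises when items covary strongly relative to their individual variances.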
Kuder-Richardson Formula
used for tests with dichotomous items, primarily those that can be scored right or wrong (such as multiple-choice items)
KR 20
useful for evaluating the internal consistency of highly homogeneous items
used for inter-item consistency of dichotomous items
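For right/wrong items, KR-20 replaces the item variances in alpha with p·q terms (p = proportion passing each item). A hypothetical sketch:

```python
# KR-20: k/(k-1) * (1 - sum(p*q) / total score variance),
# for dichotomous (0/1) items only.
from statistics import pvariance

def kr20(rows):
    """rows: one list of 0/1 item scores per person."""
    k = len(rows[0])  # number of items
    n = len(rows)     # number of people
    pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in rows) / n  # proportion passing item i
        pq += p * (1 - p)
    total_var = pvariance([sum(person) for person in rows])
    return (k / (k - 1)) * (1 - pq / total_var)

# Five people on a 4-item right/wrong test (invented data):
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_kr20 = kr20(answers)
```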
Inter-item consistency
refers to the degree of correlation among all the items on a scale
Test homogeneity - measure single trait
Test heterogeneity - measure different factors
multipoint item
Pearson r between equivalent test halves with Spearman-Brown correction or Kuder-Richardson for dichotomous items, or coefficient alpha for________
Inter-scorer reliability
the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure
scorer reliability, judge reliability, observer reliability, and inter-rater reliability
coefficient of inter-scorer reliability
Pearson r or Spearman rho
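When two scorers rank the same set of products, Spearman rho is a natural index of inter-scorer agreement. A hypothetical sketch (scores invented, no tied ranks assumed):

```python
# Spearman rho between two raters via the rank-difference formula:
# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

def spearman_rho(x, y):
    """Spearman rank-order correlation (assumes no tied scores)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

# Two raters scoring the same six essays (hypothetical data):
rater_a = [88, 72, 95, 60, 81, 77]
rater_b = [85, 74, 92, 65, 70, 80]
rho_ab = spearman_rho(rater_a, rater_b)  # coefficient of inter-scorer reliability
```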
higher
the______the reliability of a selection test the better
0.7
the minimum satisfactory figure for test reliability is ____
.80
a reliability coefficient of _____ indicates that 20% of the variability in test scores is due to measurement error
Validity
the agreement between a test score or measure and the quality it is believed to measure
As applied to a test, is a judgment or estimate of how well a test measures what it purports to measure in a particular context.
More specifically, it is a judgment based on evidence about the appropriateness of inferences drawn from test scores.
The process of gathering and evaluating evidence about validity.
Validation studies (i.e., local validation studies)
Logic of validity analysis
a valid test is one that
predicts future performance on appropriate variables
measures an appropriate domain
measures appropriate characteristics of test takers
Validity is determined by the relationship between test scores and some other variable referred to as the validation measure
Validity: Trinitarian model
content validity
criterion-related validity
construct validity
3 approaches to assessing validity
Scrutinizing the test’s content
Relating scores obtained on the test to other test scores or other measures
Executing a comprehensive analysis of
How scores on the test relate to other test scores and measures
How scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure
Face validity
not a true measure of validity
no evidence
relates more to what a test appears to measure to the person being tested than to what the test actually measures
Content validity
describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample
Two concepts
Construct under-representation
Construct-irrelevant variance
Construct under-representation
Failure to capture important components of the construct
Construct-irrelevant variance
scores are influenced by factors irrelevant to the construct
Criterion validity
a test score can be used to infer an individual's most probable standing on a criterion measure
How a test corresponds to a particular criterion
Predictive
Predictor and criterion
Concurrent
Predictive Validity
Measures of the relationship between test scores and a criterion measure obtained at a future time
Researchers must take into consideration the base rate of the occurrence of the variable, both as that variable exists in the general population and as it exists in the sample being studied
Concurrent validity
If the test scores are obtained at about the same time as the criterion measures are obtained
Extent to which test scores may be used to estimate an individual’s present standing on a criterion
Economically efficient
Validity coefficient
Relationship between a test and a criterion
Tells the extent to which the test is valid for making statements about the criterion
Construct validity
something built by mental synthesis
Involves assembling evidence about what a test means
Show relationship between test and other measures
judgment about the appropriateness of inferences drawn from test scores regarding an individual's standing on the variable of interest
Convergent Evidence
Discriminant Evidence
Convergent evidence
Correlation between two sets of scores believed to measure the same construct
Discriminant evidence
divergent validation
the test measures something unique
low correlations with unrelated constructs
Evaluating validity coefficient
Look for changes in the cause of the relationship
What does the criterion mean?
The criterion should be valid and reliable
Review the subject population in the study
Is the sample size adequate?
Do not confuse the criterion with the predictor
Is there variability in the criterion and the predictor?
Is there evidence for validity generalization?
Consider differential prediction
Relationship between validity & reliability
Reliability: ability to produce consistent scores that measure stable characteristics
Validity: which stable characteristics the test scores measures
It is theoretically possible to develop a reliable test that is not valid. If a test is not reliable, its potential validity is limited.
Convergent result
significant
positive direction
Divergent result
not significant
negative direction