Classical test theory
States that if X represents an observed score, T a true score, and E error, then X = T + E
Systematic error
Less likely to affect scores; it is associated with the test itself and applies to all test takers, so it does not change the variability of the distribution or affect reliability
Random error
Affects test performance more, it is not associated with the test itself
Reliability
Refers to the consistency of the findings or results of a psychological study
Reliability
Refers to the trustworthiness of a measure, yielding the same results across multiple applications to the same sample
Test construction, test administration, test scoring and interpretation, other sources
Sources of error variance
Test construction
One source of variance during this is item sampling or content sampling, terms that refer to variation among items within a test as well as to variation among items between tests
Test construction
From the perspective of a test developer, a challenge in test development is to maximize the proportion of true variance from relevant differences and to minimize the proportion of error variance from irrelevant or random differences
Test administration
Sources of error variance that occur during this may influence the testtaker’s attention or motivation; the testtaker’s reactions to those influences are the source of error variance
Test environment, test taker variables, examiner related variables
Other sources of error under test administration are:
Test environment
The room temperature, the level of lighting, and the amount of ventilation and noise, for instance
Test taker variables
Emotional problems, physical discomfort, lack of sleep, and the effects of food, drugs, or any medication
Examiner related variables
Examiner’s physical appearance and manners, the presence or absence of an examiner, head nodding, eye movements, and non verbal gestures
Test scoring and interpretation
The advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences in many tests
Test scoring and interpretation
If subjectivity is involved in scoring, then the scorer or rater can be a source of error variance
Other sources
Females may underreport abuse because of fear, shame, or social desirability factors and overreport abuse if they are seeking help
Other sources
Males may underreport abuse because of embarrassment and social desirability factors and overreport abuse if they are attempting to justify the report
Test retest reliability, parallel and alternate forms reliability, internal consistency estimate of reliability/inter item consistency, inter scorer reliability
Types of reliability estimates
Test retest reliability
One way of estimating the reliability of a measuring instrument is by using the same instrument to measure the same thing at two points in time; it is obtained by correlating pairs of scores from the same people on two different administrations of the same test
Test retest reliability
Is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as personality traits; it is also appropriate in measuring reaction time and perceptual judgments
Test retest reliability
The passage of time can be a source of error variance; the longer the time that passes, the greater the likelihood that the reliability coefficient will be lower; factors include practice, memory, fatigue, and motivation
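As an illustration of the cards above, a test retest estimate is the Pearson correlation between scores from two administrations. The sketch below is a minimal example; the function name `pearson_r` and the scores are hypothetical, made up for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people tested at two points in time
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 13, 14, 19, 21]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

A coefficient near 1 suggests that people kept roughly the same rank order across the two administrations.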
Parallel and alternate forms reliability
Evaluates the degree of the relationship between various forms of a test, two different forms
Parallel forms
__ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal; the means of scores obtained on these forms correlate equally with the true score, and scores obtained on them correlate equally with other measures; scores on forms A and B correlate highly, so these forms are more reliable
Alternate forms
Are simply different versions of a test that have been constructed so as to be parallel; although they do not meet the requirements for parallel forms, they are designed to be equivalent with respect to variables such as content and level of difficulty; the correlation between forms A and B is lower but still significant
Alternate forms
Minimizes the effect of memory for the content of a previously administered form of the test; certain traits are presumed to be relatively stable in people over time; time consuming and expensive
Internal consistency estimate of reliability/ inter item consistency
A reliability estimate of a test can still be obtained without developing an alternate form of the test nor administering the test twice to the same people, entails an evaluation of the internal consistency of the test items
Split half reliability
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once, difficult to make
Split half reliability
Acceptable ways to split a test are to randomly assign items to one or the other half of the test, to use odd-even reliability, or to divide the test by content so that each half contains items equivalent in difficulty and content
Split half reliability
It is not recommended to simply divide the test in the middle because of factors such as different amounts of fatigue for the different parts of the test, different amounts of test anxiety, and differences in item placement in the test
Odd even reliability
Assign odd numbered items to one half of the test and even numbered items to the other half
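The odd-even procedure can be sketched as follows. The data are hypothetical (five test takers, six items scored 0 or 1). The Spearman-Brown adjustment at the end is not mentioned on the cards; it is included here as an assumption, because it is the usual way to step a half-test correlation up to full test length.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def odd_even_split_half(item_scores):
    """item_scores: one list of item scores per test taker."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown adjustment (assumption, not from the cards)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical right/wrong scores: rows are test takers, columns are items
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = odd_even_split_half(scores)
print(round(r_half, 2), round(r_full, 2))
```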
Inter item consistency
Refers to the degree of correlation among all the items on a scale; it is calculated from a single administration of a single form of a test and is useful in assessing the homogeneity of the test, that is, whether it measures a single trait; the more homogeneous a test is, the more inter item consistency it can be expected to have
Heterogeneity
The degree to which a test measures different factors; a heterogeneous test is non homogeneous
Kuder-richardson formula
G. Frederic Kuder and M. W. Richardson developed their own measures for estimating reliability primarily for dichotomous items that replaced split half reliability, no equal variances
KR 20
__ and split half reliability estimates are similar, is the statistic of choice for determining the inter item consistency of dichotomous items, primarily those items that can be scored right or wrong, if more heterogeneous, this will yield lower reliability estimates
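A minimal sketch of KR 20 for right/wrong items, assuming the usual form (k / (k - 1)) × (1 − Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 − p, and σ² is the variance of total scores. The function name `kr20`, the use of the population variance, and the data are assumptions made for illustration.

```python
from statistics import pvariance

def kr20(item_scores):
    """KR 20 for dichotomous (0/1) items.

    item_scores: one list of 0/1 item scores per test taker.
    Uses the population variance (divide by n) of total scores.
    """
    n_people = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    total_variance = pvariance(totals)
    sum_pq = 0.0
    for item in zip(*item_scores):  # iterate over item columns
        p = sum(item) / n_people    # proportion answering the item correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical right/wrong scores: rows are test takers, columns are items
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

kr20_value = kr20(scores)
print(round(kr20_value, 3))
```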
KR 21
Is used if there is reason to assume that all the test items have approximately the same degree of difficulty
KR 21
The formula for this has become outdated in an era of calculators and computers; in the past, this was sometimes used to estimate KR 20 because it is simpler and requires fewer calculations
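For comparison, KR 21 needs only the number of items and the mean and variance of total scores, which is why it was the simpler hand calculation. The sketch below assumes the usual form (k / (k - 1)) × (1 − M(k − M) / (kσ²)); the function name `kr21`, the use of the population variance, and the total scores are hypothetical.

```python
from statistics import mean, pvariance

def kr21(total_scores, k):
    """KR 21 from total scores on a k-item right/wrong test.

    Assumes all items have approximately the same difficulty.
    """
    m = mean(total_scores)
    variance = pvariance(total_scores)  # population variance of total scores
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * variance))

# Hypothetical total scores of five test takers on a 6-item test
totals = [6, 4, 3, 1, 0]

kr21_value = kr21(totals, k=6)
print(round(kr21_value, 3))
```

No item-level data are required, only the total scores.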
Coefficient alpha
Developed by Cronbach and subsequently elaborated on by others, may be thought of as the mean of all possible split half correlations
Coefficient alpha
In contrast to KR 20, this is appropriate for use on tests containing non dichotomous/polytomous items, no right or wrong answers
Coefficient alpha
The preferred statistic for obtaining an estimate of internal consistency reliability, is widely used as a measure of reliability because it requires only one administration of the test
Coefficient alpha
Typically ranges in value from zero (no similarity) to one (perfectly identical); negative values of alpha are theoretically impossible, so if one occurs under rare circumstances, alpha can be reported as zero
Inter scorer reliability
Also referred to as judge reliability, observer reliability, and inter rater reliability; it is the degree of agreement or consistency between two or more scorers or judges with regard to a particular measure
Nature of tests
Considerations concerning the purpose and use of a reliability coefficient are those concerning the nature of the test itself
Test items are homogeneous or heterogeneous in nature, the characteristic, ability, or trait being measured is presumed to be dynamic or static, range of test scores is or is not restricted, the test is a speed or power test, the test is or is not criterion referenced
Some things we have to consider as to whether:
Homogeneous
A test is said to be this in items if it is functionally uniform throughout, tests designed to measure one factor, such as one ability or one trait, are expected to be __ in items, it is expected for internal consistency to be high
Heterogeneous
If the test is __ in items, it is expected for test retest reliability to be high
Dynamic characteristic
Is a trait, state, or ability presumed to be ever changing as a function of situational and cognitive experiences; its reliability is best estimated through internal consistency
Static characteristic
A trait, state, or ability presumed to be relatively unchanging, such as intelligence; its reliability is best estimated by test retest or alternate forms
Restriction
If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be low
Inflation
If the variance of either variable in a correlational analysis is inflated by the sampling procedure then the resulting correlation coefficient tends to be high
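The effect of restriction can be illustrated with a small simulation. This is a sketch only: the data are randomly generated, the seed and the cutoff of 0.5 are arbitrary choices, and only the restriction case is shown.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated scores on two related measures for 2000 people
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 1) for xi in x]
full_r = pearson_r(x, y)

# Restrict the sample to people scoring above 0.5 on the first measure
kept = [(a, b) for a, b in zip(x, y) if a > 0.5]
restricted_x = [a for a, _ in kept]
restricted_y = [b for _, b in kept]
restricted_r = pearson_r(restricted_x, restricted_y)

print(round(full_r, 2), round(restricted_r, 2))
```

Selecting only high scorers shrinks the variance of the first measure, and the correlation in the restricted sample comes out lower than in the full sample.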
Power test
No time limit, allows testtakers to attempt all items; items range from easy to so difficult that no testtaker is able to answer all of them
Speed test
Contains items of uniform level of difficulty so that, when given a generous time limit, all test takers should be able to answer as many test items as possible
Criterion referenced test
Is designed to provide an indication of where a testtaker stands with respect to some variable or criterion, used frequently to gauge achievement or mastery; scores on these tests tend to be interpreted in pass or fail terms
Classical test theory, domain sampling theory, generalizability theory, item response theory
Reliability theories
Classical test theory/true score theory
The test is the unit of analysis, it is explained by the formula X is equal to T added to E
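Written out as equations, and adding the assumption (not stated on the card) that true scores and errors are uncorrelated, the model implies that observed variance is true variance plus error variance, and that reliability is the proportion of observed variance that is true variance. The symbols σ² and r_xx are notation chosen here.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}
```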
True score
The ability to be measured is not always evident because it is covered by error
Confidence interval
Since we cannot measure the actual ability, we can estimate its __, the location where the true score is, the range of the true score
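One common way to build such an interval uses the standard error of measurement, SEM = SD × √(1 − reliability). The SEM is not mentioned on the cards, so treat it as an assumption here; the function names, the example values, and the 95% multiplier of 1.96 are likewise illustrative.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * sqrt(1 - reliability)

def true_score_interval(observed, sd, reliability, z=1.96):
    """Range likely to contain the true score (z = 1.96 gives about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD of 15, reliability of 0.91, observed score of 100
low, high = true_score_interval(observed=100, sd=15, reliability=0.91)
print(round(low, 2), round(high, 2))
```

The lower the reliability, the larger the SEM and the wider the range around the observed score.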
Domain sampling theory
True score is equal to a universe or infinity of items; a small number of items is representative of the larger number of items, controls sampling error in selection of items, more items, more reliable, internal consistency is perhaps the most compatible
Generalizability theory
Developed by Lee Cronbach, a person’s test scores vary from testing to testing because of variables in the testing situation or bias, it may contain more sources of error
Item response theory/latent trait theory
The test items are the unit of analysis; it is possible for a test with fewer items to be reliable
Item branching
Calibrates difficulty of items depending on the testtaker’s performance
Item difficulty in IRT
The attribute of not being easily solved, or comprehended
Item discrimination in IRT
The degree to which an item differentiates among people with higher or lower levels of what is being measured