Reliability
The consistency of an instrument in measuring a specific dimension.
reliability
________ is a concern in psychological testing whenever a test that measures dimensions such as mental health, personality, or intelligence is developed or administered.
It ensures that the psychological tests being developed or administered are consistent in measuring the dimensions that need to be measured.
Reliability testing
Done when developing psychological tests to lessen the Error Variance.
Error Variance
Reliability testing is done when developing psychological tests to lessen what?
Error Variance
This is the statistical variability of scores caused by the influence of variables other than the independent variable (the variable that the test was intended to measure).
Reliability
Is the degree to which a measure is free from random errors.
reliability testing
Test developers conduct ___________ to lessen those errors, so that the variability of scores can be considered true variance and not variance caused by extraneous variables or errors.
errors
Test developers conduct reliability testing to lessen those ______, so that the variability of scores can be considered true variance and not variance caused by extraneous variables or errors.
Test Score Theory
This theory assumes that each person has a true score that would be obtained if there were no errors in measurement; however, because measuring instruments are imperfect, the score observed for each person almost always differs from the person’s true ability or characteristic.
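The true-score idea is often written as observed score = true score + error. The sketch below is only an illustration with made-up numbers, not data from a real test, and every variable name in it is hypothetical. The errors were chosen to average zero and to be uncorrelated with the true scores, which is the usual assumption in this theory. Under that assumption the observed variance splits into true variance plus error variance, and the ratio of true variance to observed variance is one common way to express reliability.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors for four testtakers.
# The errors average zero and are uncorrelated with the true scores.
true_scores = [10, 12, 14, 16]
errors = [1, -1, -1, 1]

# Each observed score is the true score plus that person's error.
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = pvariance(true_scores)
error_var = pvariance(errors)
observed_var = pvariance(observed)

# Proportion of observed variance that is true variance.
reliability = true_var / observed_var

print("observed scores:", observed)
print("true + error variance:", true_var + error_var)
print("observed variance:", observed_var)
print("reliability:", round(reliability, 3))
```

The smaller the error variance, the closer this ratio gets to 1, which is why lessening error variance is the goal of reliability testing.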
Test Construction; Test Administration; Test Scoring and Interpretation.
Sources of Error Variance
Test Construction
One source of variance or difference during ___________ is item sampling or content sampling.
item sampling or content sampling
One source of variance or difference during test construction is ________.
Item sampling or content sampling
Refers to variation among items within a test as well as to variation among items between tests.
Test Administration
Sources of error variance that occur during test administration may influence the testtaker’s attention or motivation.
Test environment, testtaker variables, and examiner-related variables.
Test Administration: Sources of Error Variance classifications
These can cause the testtaker to make mistakes in entering a test response, thus increasing the variance of test scores.
Test Environment
Examples of these are the following: Room Temperature; Level of Lighting; Amount of Ventilation; Noise.
Room Temperature
Level of Lighting
Amount of Ventilation
Noise
Examples of sources of error variance in the test environment
Testtaker Variables
Examples of these are the following: Pressing Emotional Problems; Physical Discomfort; Lack of Sleep.
Examiner Related Variables
Examiner’s physical appearance and demeanor
Examiners who provide clues
Level of professionalism.
Test Scoring and Interpretation
Sources of error variance
Errors are less likely to be committed in scoring simple tests that are usually answered with pencil and paper.
Errors are more likely in tests that have open-ended questions or observations, which the examiner must interpret himself/herself.
Interpreting test results from open-ended questions or observations is subjective; thus, errors in the score might be committed.
Reliability Estimates
Test-Retest Reliability Estimates
Parallel-Forms and Alternate-Forms Reliability Estimates
Split-Half Reliability
Other Methods of Estimating Internal Consistency
Measures of Inter-Scorer Reliability.
Test-Retest Reliability Estimates
This estimates the reliability of a measuring instrument by using the same instrument to measure the same thing at two points in time; This estimate is obtained by correlating pairs of scores from the same people on two different administrations.
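As a minimal sketch of "correlating pairs of scores from the same people on two different administrations", the example below computes a Pearson r by hand. The scores are invented for illustration, and the helper name `pearson_r` is a hypothetical choice, not part of any testing library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of the same five people on two administrations.
first_administration = [12, 15, 11, 18, 14]
second_administration = [13, 14, 12, 17, 15]

test_retest_r = pearson_r(first_administration, second_administration)
print("test-retest estimate:", round(test_retest_r, 2))
```

A value near 1 means people kept roughly the same relative standing across the two administrations.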
Test-Retest measure
The _________ is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time, such as a personality trait.
Stable factors
Personality Trait; Intelligence Quotient.
Not easily changed
fluctuating factors
Past Learning; Skill Level.
Changeable
Coefficient of stability
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the ________.
six months
When the interval between testing is greater than _________, the estimate of test-retest reliability is often referred to as the coefficient of stability.
Test-Retest Reliability
This type of reliability testing is not appropriate for tests that measure a constantly changing characteristic for obvious reasons.
For example, a test measuring happiness should not be evaluated for reliability using __________ because happiness varies with the situation experienced by an individual.
Parallel-Forms
_______ of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.
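The condition of equal means and equal variances can be checked directly. The sketch below is a simplified illustration with invented scores; the function name `looks_parallel` and its `tolerance` parameter are hypothetical, and with real samples the statistics are rarely exactly equal, so any practical check would need a judgment about how close is close enough.

```python
from math import isclose
from statistics import mean, pvariance

def looks_parallel(form_a, form_b, tolerance=1e-9):
    """True when both forms have (nearly) equal means and variances."""
    same_mean = isclose(mean(form_a), mean(form_b), abs_tol=tolerance)
    same_variance = isclose(pvariance(form_a), pvariance(form_b),
                            abs_tol=tolerance)
    return same_mean and same_variance

# Hypothetical observed scores of the same five people on three forms.
form_a = [20, 22, 24, 26, 28]
form_b = [22, 20, 24, 28, 26]   # same mean and variance as form_a
form_c = [21, 23, 25, 27, 29]   # same variance, but a higher mean

print(looks_parallel(form_a, form_b))
print(looks_parallel(form_a, form_c))
```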
Parallel-Forms
Mr. Bitoy, a psychologist, creates two forms of a test measuring confidence and administers both to the same standardization sample. The scores on the two forms are then correlated.
Parallel-Forms method
This method of testing reliability provides one of the most rigorous assessments of reliability commonly in use.
Often test developers find it burdensome to develop two forms of the same test, and practical constraints make it difficult to retest the same group of individuals
Alternate Forms
are simply different versions of a test that have been constructed so as to be parallel.
Alternate Forms equivalence
Although they do not meet the requirements for the legitimate designation “parallel,” alternate forms of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty.
Split-Half Reliability Estimates
It is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once; It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.
Split-Half Reliability step 1
Divide the test into equivalent halves.
Split-Half Reliability step 2
Calculate a Pearson r between scores of the two halves of the test.
Split-Half Reliability step 3
Adjust the half-test reliability using the Spearman-Brown formula.
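The three steps can be sketched as follows. This is an illustration under stated assumptions: the response matrix is invented, an odd-even split is used for step 1, and the function names are hypothetical. The correction in step 3 is the Spearman-Brown formula for a test twice as long as each half, 2r / (1 + r).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half_reliability(responses):
    """Return (half-test r, Spearman-Brown adjusted r) for an odd-even split."""
    # Step 1: odd-numbered items (1st, 3rd, ...) vs even-numbered items.
    odd_totals = [sum(person[0::2]) for person in responses]
    even_totals = [sum(person[1::2]) for person in responses]
    # Step 2: Pearson r between the two half scores.
    half_r = pearson_r(odd_totals, even_totals)
    # Step 3: Spearman-Brown adjustment to full test length.
    full_r = (2 * half_r) / (1 + half_r)
    return half_r, full_r

# Hypothetical data: 5 testtakers (rows) x 6 items (columns), 1 = correct.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

half_r, full_r = split_half_reliability(responses)
print("half-test r:", round(half_r, 2))
print("adjusted r:", round(full_r, 2))
```

The adjusted value is higher than the half-test correlation because, other things being equal, a longer test tends to be more reliable than a shorter one.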
Divide the test into equivalent halves
When it comes to split-half reliability coefficients, there’s more than one way to split a test – but there are some ways you should never split a test.
Middle split
Simply dividing the test in the middle is not recommended because it’s likely this procedure would spuriously raise or lower the reliability coefficient.
Random assignment split
One acceptable way to split a test is to randomly assign items to one or the other half of the test.
Odd-even split
Another acceptable way to split a test is to assign odd-numbered items to one half of the test and even-numbered items to the other half.
Spearman Brown Formula
The Spearman-Brown formula allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
Spearman Brown Formula general use
It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items.
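The general form of the formula can be written as n·r / (1 + (n − 1)·r), where r is the reliability of the existing test and n is the factor by which its length changes. A minimal sketch follows; the function and parameter names are hypothetical, and the formula assumes the added or removed items are equivalent to the existing ones.

```python
def spearman_brown(r, n):
    """Estimated reliability after changing test length by a factor of n.

    r: reliability of the current test
    n: new length divided by current length (2 doubles it, 0.5 halves it)
    """
    return (n * r) / (1 + (n - 1) * r)

# Split-half case: each half correlates 0.50, full test is twice as long.
print(round(spearman_brown(0.50, 2), 3))

# Shortening a test with reliability 0.80 to half its length.
print(round(spearman_brown(0.80, 0.5), 3))
```

With n = 2 this reduces to the familiar split-half adjustment, 2r / (1 + r).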
Spearman Brown Formula data
This formula is best used for continuous and ordinal data.
Inter-Item Consistency
Refers to the degree of correlation among all the items on a scale.
Inter-Item Consistency calculation
A measure of inter-item consistency is calculated from a single administration of a single form of a test.
Interitem consistency and homogeneity
An index of interitem consistency, in turn, is useful in assessing the homogeneity of the test.
Homogeneous tests
Tests are said to be homogeneous if they contain items that measure a single trait.
Kuder-Richardson formula 20 (KR-20)
This is best used to measure internal consistency of tests with dichotomous items.
Dichotomous items
Examples of these are questions answerable by yes or no, male or female, or true or false.
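For dichotomous items scored 0 or 1, KR-20 is commonly written as (k / (k − 1)) · (1 − Σpq / σ²), where k is the number of items, p is the proportion answering an item in the keyed direction, q = 1 − p, and σ² is the variance of total scores. The sketch below uses invented data and a hypothetical function name, and it assumes the population variance (dividing by N) for the total scores.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a matrix of 0/1 responses (rows = people, columns = items)."""
    n_people = len(responses)
    k = len(responses[0])
    # Sum of p * q across items.
    pq_sum = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n_people
        pq_sum += p * (1 - p)
    # Population variance of each person's total score (an assumption here).
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical data: 5 testtakers x 6 true/false items, 1 = keyed answer.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

kr20_estimate = kr20(responses)
print("KR-20:", round(kr20_estimate, 2))
```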
Coefficient Alpha
Coefficient Alpha is appropriate for use on tests containing non-dichotomous items.
Coefficient Alpha use
Coefficient Alpha is widely used as a measure of reliability, in part because it requires only one administration of the test.
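Coefficient alpha is commonly written as (k / (k − 1)) · (1 − Σσᵢ² / σ²), where σᵢ² is the variance of each item and σ² is the variance of total scores. The sketch below is illustrative only: the ratings are invented, the function name is hypothetical, and population variances are assumed for both the items and the totals.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha for a score matrix (rows = people, columns = items)."""
    k = len(responses[0])
    # Variance of each item across people.
    item_vars = [pvariance([person[i] for person in responses])
                 for i in range(k)]
    # Variance of each person's total score.
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents x 4 items rated on a 1-5 scale.
ratings = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(ratings)
print("coefficient alpha:", round(alpha, 2))
```

When the items are scored 0 or 1, each item variance equals p·q, so the same computation gives the KR-20 value.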
Measures of Inter-Scorer Reliability
Measures of Inter-Scorer Reliability.
Inter-Scorer Reliability
Is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
High Inter-Scorer Reliability coefficient
If the reliability coefficient is high, the prospective test user knows that test scores can be derived in a systematic, consistent way by various scorers with sufficient training.
Inter-Scorer Reliability example
An example of this is when judges in gymnastics score an athlete based on various criteria. The extent to which their scores are consistent with each other is called Inter-Scorer Reliability.
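One simple way to put a number on the consistency among judges is to correlate each pair of judges and average the results. The sketch below does that for three hypothetical judges scoring five routines; the scores and names are invented, and averaging pairwise Pearson correlations is only one of several possible indices, chosen here for simplicity.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores from three judges for the same five routines.
judge_scores = {
    "judge_a": [9.0, 8.5, 7.0, 9.5, 8.0],
    "judge_b": [8.8, 8.4, 7.2, 9.6, 7.9],
    "judge_c": [9.1, 8.0, 7.5, 9.4, 8.2],
}

# Correlate every pair of judges, then average the coefficients.
pairwise = [pearson_r(judge_scores[a], judge_scores[b])
            for a, b in combinations(judge_scores, 2)]
mean_inter_scorer_r = sum(pairwise) / len(pairwise)

print("mean inter-scorer r:", round(mean_inter_scorer_r, 2))
```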
Validity
Defined as the agreement between a test score or measure and the quality it is believed to measure.
Validity question
Does the test measure what it is supposed to measure?
Evidence
The support for inferences made about a test score; there are three types: construct related, criterion related, and content related.
Validation
Validation is the process of gathering and evaluating evidence about validity.
Local validation studies
Necessary when test users plan to alter in some way the format, instructions, language, or content of the test.
Types of evidence
Construct-related; Criterion-related; Content-related.
Face Validity
Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
Face Validity judgment
A judgement concerning how relevant the test items appear to be.
Lack of face validity
A test’s lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test.
Face Validity effect on test taker
A decrease in the testtaker’s cooperation or motivation.
Face Validity effect on administrators
Unwillingness of administrators or managers to buy into the use of a particular test.
Content Validity
This measure of validity is based on an evaluation of the subjects, topics, or content covered by the items in the test.
Content Validity description
Describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
Content Validity and construct vision
Ideally, test developers have a clear vision of the construct being measured, and the clarity of this vision can be reflected in the content validity of the test.
Content Validity and key components
Test developers strive to include key components of the construct targeted for measurement, and exclude content irrelevant to the construct targeted for measurement.
Test Blueprint
From the pooled information, there emerges a Test Blueprint for the structure of the evaluation – a plan regarding the types of information.
Criterion Validity
This is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
Criterion Validity judgment
A judgement of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest.
Criterion
The standard against which a test or a test score is evaluated.
Adequate criterion
An adequate criterion is relevant, valid for the purpose for which it is being used, and uncontaminated.
Concurrent Validity
An index of the degree to which a test score is related to some criterion obtained at the same time (concurrently).
Predictive Validity
An index of the degree to which a test score predicts some criterion measure.
Concurrent Validity description
If test scores are obtained at about the same time as the criterion measures are obtained, measures of the relationship between the test scores and the criterion provide evidence of concurrent validity.
Predictive Validity description
Measures of the relationship between the test scores and a criterion measure obtained at a future time provide evidence of predictive validity.
Predictive Validity example
Measures of the relationship between college admission tests and freshman grade point averages provide evidence of the predictive validity of the admissions tests.
Base Rate
The extent to which a particular trait, behavior, characteristic, or attribute exists in the population, expressed as a proportion.
Hit Rate
The proportion of people a test accurately identifies as possessing or exhibiting a particular trait or characteristic.
Miss Rate
The proportion of people the test fails to identify as having, or not having, a particular characteristic or trait.
False Positive
A miss wherein the test predicted that the testtaker DID possess the trait being measured when in fact the testtaker DID NOT.
False Negative
A miss wherein the test predicted that the testtaker DID NOT possess the trait being measured when in fact he/she actually DID.
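These rates can be tallied from a list of test predictions and the testtakers' actual status. The sketch below uses invented data and hypothetical variable names. As a simplifying assumption, it counts any correct classification (trait present or trait absent) as a hit and any incorrect one as a miss.

```python
# Hypothetical data for ten testtakers: True = trait present.
predicted = [True, True, False, False, True, False, True, False, False, True]
actual = [True, False, False, True, True, False, True, False, False, False]

pairs = list(zip(predicted, actual))
n = len(pairs)

# A hit is a correct classification; misses come in two kinds.
hits = sum(1 for p, a in pairs if p == a)
false_positives = sum(1 for p, a in pairs if p and not a)
false_negatives = sum(1 for p, a in pairs if a and not p)

base_rate = sum(actual) / n   # proportion who actually have the trait
hit_rate = hits / n
miss_rate = (false_positives + false_negatives) / n

print("base rate:", base_rate)
print("hit rate:", hit_rate)
print("miss rate:", miss_rate)
print("false positives:", false_positives)
print("false negatives:", false_negatives)
```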
Validity Coefficient
A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
Incremental Validity
The degree to which an additional predictor explains something about the criterion that is not explained by predictors already in use; may be used when predicting something like academic success in college.
Incremental Validity and hierarchical regression
Uses hierarchical regression, which first estimates how well a criterion can be predicted with existing predictors and then evaluates how much the prediction improves when new predictors are added to the equation.
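The hierarchical idea can be sketched with two predictors: fit the criterion on the existing predictor first, then add the new predictor and look at the change in R². The data and variable names below are invented for illustration. To stay self-contained, the sketch uses the standard two-predictor identity R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²) rather than a regression library.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def r_squared_two_predictors(y, x1, x2):
    """R-squared for regressing y on x1 and x2, from pairwise correlations."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical data for six students.
high_school_gpa = [3.0, 3.5, 2.5, 4.0, 3.2, 2.8]   # existing predictor
study_hours = [10, 8, 12, 15, 6, 9]                # new predictor
college_gpa = [3.1, 3.2, 2.9, 3.9, 2.8, 2.8]       # criterion

# Step 1: existing predictor only. Step 2: add the new predictor.
r2_step1 = pearson_r(college_gpa, high_school_gpa) ** 2
r2_step2 = r_squared_two_predictors(college_gpa, high_school_gpa, study_hours)
delta_r2 = r2_step2 - r2_step1

print("R^2 with existing predictor:", round(r2_step1, 3))
print("R^2 after adding new predictor:", round(r2_step2, 3))
print("improvement (delta R^2):", round(delta_r2, 3))
```

A larger improvement in R² suggests the new predictor accounts for more of the criterion beyond what the existing predictor already explains.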
Construct Validity
This is a measure of validity that is arrived at by executing a comprehensive analysis of how scores on the test relate to other test scores and measures.
Construct Validity theoretical framework
How scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.
Construct Validity judgment
A judgment about the appropriateness of inferences drawn from test scores regarding individuals’ standings on a variable called a construct.
Intelligence construct
Intelligence is a construct that may be invoked to describe why a student performs well in school.
Anxiety construct
Anxiety is a construct that may be invoked to describe why a psychiatric patient paces the floor.
Evidence of Construct Validity
Evidence of homogeneity; Evidence of changes in age; Evidence of pretest-posttest changes; Evidence from distinct groups; Correlating scores on other tests in accordance with what would be predicted from a theory that covers the manifestation of the construct in question.
Evidence of Homogeneity
The test is homogeneous, measuring a single construct.
Evidence of Homogeneity example
A test of academic achievement that contains subtests in areas of mathematics, spelling, and reading comprehension.
Pearson r in homogeneity
The Pearson r could be used to correlate average subtest scores with the average total test score.
Subtests and homogeneity
Subtests that in the test developer’s judgment do not correlate well with the test as a whole might have to be reconstructed (or eliminated).