Random error
causes a person's test score to change from one administration of a test to the next
Which type of error is more relevant for reliability?
Random error; it lowers the reliability of a test, while systematic error shifts every score consistently and so does not reduce reliability
Classical test theory
observed score = true score + error
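The classical test theory equation can be illustrated with a quick simulation. This is a minimal sketch with hypothetical values (a true score of 50 and normally distributed random error with SD 5), not data from any real test:

```python
import random
import statistics

random.seed(0)

true_score = 50          # hypothetical (unobservable) true score
error_sd = 5             # hypothetical spread of the random error

# Observed score = true score + error, over many repeated administrations
observed = [true_score + random.gauss(0, error_sd) for _ in range(1000)]

# Random error averages out to zero, so observed scores center on the true score
print(round(statistics.mean(observed), 1))
```

Systematic error, by contrast, would shift every observed score by the same amount, so it would not average out.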
Systematic error
When a single source of error always increases or decreases the true score by the same amount
Name 3 reliability types
1. Test-retest method
2. Alternate-forms method
3. Internal consistency method
What does the test-retest method tell you about the test?
Allows us to examine the stability of test scores over time and provides an estimate of the test's reliability/precision
What does the alternate-forms method tell you about the test?
It is a way of evaluating a test's reliability - how consistently it measures whatever it's supposed to measure
What does the internal consistency method tell you about the test?
How well the items within a test work together to measure the same underlying construct
What is test-retest reliability?
Administer the same test to the same people at two points in time
What needs to be done in terms of test administration to be able to calculate it? (test-retest reliability)
Administer the same test to the same group on two occasions, then correlate the two sets of scores using the Pearson product-moment correlation
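The Pearson product-moment correlation can be computed from scratch; a minimal sketch with hypothetical time-1 and time-2 scores for five test takers:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five people tested twice
time1 = [10, 12, 14, 16, 18]
time2 = [11, 13, 13, 17, 19]

print(round(pearson_r(time1, time2), 3))  # 0.959 -> high test-retest reliability
```

A coefficient this close to 1.0 indicates that test takers kept nearly the same rank order across the two administrations.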
What is internal consistency reliability?
Give the test in one administration, and then compare all possible split halves
What needs to be done in terms of test administration to be able to calculate it? (internal consistency reliability)
Give the test in a single administration, then estimate how consistently the items hang together using coefficient alpha (or KR-20 for right/wrong items)
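Coefficient alpha can be computed from one administration. A minimal sketch using the standard alpha formula, k/(k-1) × (1 - sum of item variances / variance of total scores), with hypothetical item scores:

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per item, aligned across the same test takers."""
    k = len(items)
    item_vars = sum(statistics.pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]   # each person's total score
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 3-item test taken by 4 people (rows = items, columns = people)
items = [
    [2, 4, 4, 5],
    [3, 4, 5, 5],
    [2, 3, 4, 5],
]
print(round(cronbach_alpha(items), 2))  # 0.96
```

Alpha near 1.0 suggests the items are all tapping the same underlying construct.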
What is interrater reliability?
Give the test once, and have it scored by two scorers or two methods
What needs to be done in terms of test administration to be able to calculate it? (Interrater reliability)
Have the two scorers (or two methods) independently score the same single administration, then correlate their scores using the Pearson product-moment correlation
What is scorer reliability?
how consistently a test is scored when human judgment is involved
What needs to be done in terms of test administration to be able to calculate it? (scorer reliability)
Set up the test administration so that multiple scorers can independently score the same set of responses
How does the test-retest interval influence test reliability?
Test-retest reliability will decline because the number of opportunities for the test takers or the testing situation to change increases over time
Given this influence, how should a test developer decide how long to wait to do retesting?
The interval should be long enough that test takers cannot remember their answers, but short enough that the trait being measured has not changed
What is the standard error of measurement and why is it useful?
An estimate of how much the individual's observed test score might differ from the individual's true test score
How is the standard error of measurement useful
It tells you how much trust you can put in a test score
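One common formula for the SEM is the standard deviation of test scores times the square root of (1 - reliability). A minimal sketch with hypothetical values (SD of 15, reliability of .91):

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SD of scores * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

# Hypothetical test with SD = 15 and reliability coefficient = .91
error = sem(15, 0.91)
print(round(error, 1))  # 4.5

# A rough 68% confidence band around an observed score of 100
print(round(100 - error, 1), round(100 + error, 1))  # 95.5 104.5
```

Roughly 68% of the time, the true score falls within one SEM of the observed score; a more reliable test yields a smaller SEM and a tighter band.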
List and describe two things about test administration that can influence reliability
-Consistency of testing conditions: the physical setting (noise, lighting, temperature, interruptions) should be the same for all test takers
-Standardization of instructions: every test taker should receive the same directions, delivered in the same way
List and describe two things about test scoring that can influence reliability.
-Using the correct scoring key: applying the wrong key or making clerical scoring errors introduces error into scores
-Consistency of scorers: when human judgment is involved, scorers must apply the same standards to every response
List and describe two things about test takers that can influence reliability
-Effort and motivation: test takers who are fatigued, unmotivated, or careless introduce random error into their scores
-Understanding and attention: test takers who misunderstand instructions or lose focus respond inconsistently
How would you explain to someone that while reliability is largely a function of the test itself, validity is not?
Reliability is about the test's consistency - something built into the test
Validity is about whether the test scores are meaningful for a specific purpose - something that depends on how and why the test is used
What are the three traditional types of validity?
Content validity, criterion-related validity, and construct validity
What is content validity?
Whether the test's items accurately and completely represent the material or skills it is supposed to measure
What is criterion-related validity?
Tells you whether test scores actually connect to real-world outcomes or established measures in the way they should
What is construct validity?
the degree to which a test measures what it claims, or purports, to be measuring
List and describe four of the sources of information for evidence of validity
-Evidence based on test content
-Evidence based on the response process
-Evidence based on internal structure
-Evidence based on relations with other variables
Evidence based on test content
Looks at what is actually on the test - the questions, tasks, wording, and format
-Ask whether the test content fully represents the construct it is supposed to measure and avoids including irrelevant material
Evidence based on response process
Examine how test takers think, behave, or respond while completing the test.
- It checks whether the mental processes used by test takers match what the test is intended to measure
Evidence based on internal structure
How the items on the test relate to each other and whether the test behaves the way the underlying theory says it should
Evidence based on relations to other variables
This examines how test scores relate to other measures
- Expected relationships with measures of related constructs (convergent evidence)
- Lack of relationships with unrelated constructs (discriminant evidence)
Describe the process for assessing content validity
Assessing content validity involves having a panel of experts review each test item and rate whether it is essential, useful but not essential, or not necessary for measuring the construct. Using Lawshe's method, a Content Validity Ratio (CVR) is calculated for each item to determine the level of expert agreement. Items that meet the minimum CVR value are kept as evidence of validity, while items that do not meet the standard are revised or removed. This process ensures the test content accurately represents the construct being measured.
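Lawshe's CVR described above is a simple formula: (number of experts rating the item essential minus half the panel) divided by half the panel. A minimal sketch with a hypothetical panel and rating count:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_essential - N/2) / (N/2); ranges from -1 to +1."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical item: 9 of 10 experts rate it "essential"
print(content_validity_ratio(9, 10))  # 0.8
```

A CVR of 0 means exactly half the panel rated the item essential; the minimum acceptable value depends on panel size (Lawshe published a table of critical values).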
Explain what information about test validity this assessment of content validity provides
Content validity evidence shows whether the test items are essential, relevant, and representative of the construct
-supports the argument that test scores meaningfully reflect what the test claims to measure
What is face validity?
Whether a test looks like it measures what it says it measures
Give an example of when a test would be face valid
A math test that contains only math problems has high face validity because it appears to measure math skills.
Explain both why face validity is desirable and why it might be a problem for a psychological test.
Tests that look relevant increase cooperation and perceived fairness. However, it can be a problem because it is based only on appearance, not scientific evidence, and highly face‑valid tests can be easier to fake.
What is the predictive method?
Shows a relationship between test scores and future behavior
Describe the process for assessing the predictive method of validation.
Give a test (predictor), wait a set time, then measure later performance (criterion). Correlate the test scores with the future performance scores. A strong correlation shows the test predicts future behavior.
When is it most common to use this method? (predictive method)
test needs to forecast future performance, especially in employment settings (predicting job performance), educational settings (predicting academic success), and clinical settings (predicting future outcomes)
What is the concurrent method?
Test administration and criterion measurement happen at approximately the same time
Describe the process for assessing the concurrent method of validation.
Administer the test and a valid criterion measure to the same group at the same time, then correlate the two sets of scores. A strong correlation provides concurrent evidence of validity.
When is it most common to use this method? (concurrent method of validation)
when researchers need immediate evidence of validity, especially in employment, educational, and clinical settings where test scores can be compared to current performance or diagnoses
What is a validity coefficient in criterion-oriented validation?
It is the correlation between test scores and a criterion measure, showing how well the test predicts or reflects the outcome.
What is restriction of range?
Occurs when a sample's data is limited to a narrow subset of the total population, weakening observed correlations
What is the typical influence of range restriction on validity coefficients?
Range restriction usually lowers the validity coefficient because reduced variation in test scores weakens the correlation between the test and the criterion.
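The effect can be demonstrated with hypothetical data: compute the correlation on a full applicant pool, then again on only the top-scoring half, as if only high scorers were hired and had criterion data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical predictor (test) and criterion scores for 10 applicants
test      = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
criterion = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

r_full = pearson_r(test, criterion)

# Restrict the range: keep only the top half of test scorers
pairs = [(t, c) for t, c in zip(test, criterion) if t >= 6]
r_restricted = pearson_r([t for t, _ in pairs], [c for _, c in pairs])

print(round(r_full, 2), round(r_restricted, 2))  # 0.94 0.82
```

The same test looks noticeably less predictive in the restricted sample, even though nothing about the test changed.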
Why do we care about this influence? (range restriction on validity coefficient)
Because range restriction lowers the validity coefficient, it can make a test appear less predictive than it truly is. This may lead to incorrect conclusions about the test's usefulness in hiring, admissions, or clinical decisions.
How is evidence of validity from relationships with external criteria different than validity from content?
Content validity evaluates whether the test items represent the construct's domain, while criterion‑related validity examines how test scores relate to an external measure, using correlations to show prediction or real‑world performance.
What is the difference between objective and subjective criteria?
Objective criteria are measurable and based on observable facts, while subjective criteria rely on personal judgments or opinions.
Describe one objective and one subjective criterion in educational settings.
Objective criterion - standardized test score
Subjective criterion - teacher's rating of class participation
Why is the choice of criterion measures important in interpreting validity coefficients and test validity?
Because the validity coefficient depends on the quality of the criterion. If the criterion is unreliable or poorly matched to what the test measures, the correlation will be inaccurate and may underestimate or misrepresent the test's true validity
What do tests of significance and the coefficient of determination tell us about validity coefficients?
Tests of significance show whether the validity coefficient is statistically meaningful, while the coefficient of determination (r2) shows how much of the variation in the criterion is explained by the test scores.
How are these methods different in the questions they answer? (test of significance and coefficient of determination)
Tests of significance ask whether the validity coefficient is statistically real or due to chance, while the coefficient of determination asks how much of the criterion's variation is explained by the test scores.
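Both quantities can be computed directly from the validity coefficient. A sketch with a hypothetical r of .40 from n = 102 test takers; the t formula shown is the standard test of whether a correlation differs from zero:

```python
from math import sqrt

def r_squared(r):
    """Coefficient of determination: proportion of criterion variance explained."""
    return r ** 2

def t_statistic(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical validity coefficient of .40 in a sample of 102 test takers
r, n = 0.40, 102
print(round(r_squared(r), 2))       # 0.16 -> 16% of criterion variance explained
print(round(t_statistic(r, n), 2))  # 4.36 -> compare to a t table with n - 2 df
```

The two answers are complementary: the t statistic says the relationship is unlikely to be chance, while r² says how practically large it is.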
What does linear regression allow us to do that validity coefficients do not when making inferences about validity?
Linear regression allows us to make specific predictions of criterion scores from test scores and estimate prediction accuracy, while validity coefficients only show the strength of the relationship.
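A minimal sketch of making a specific prediction with least-squares regression; all scores here are hypothetical:

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (c - my) for a, c in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope            # intercept, slope

# Hypothetical test scores (predictor) and job-performance ratings (criterion)
test = [50, 60, 70, 80, 90]
perf = [2.0, 2.5, 3.5, 3.5, 4.5]

a, b = fit_line(test, perf)
predicted = a + b * 75                       # prediction for a test score of 75
print(round(predicted, 2))  # 3.5
```

Unlike a bare correlation, the fitted equation yields a concrete predicted criterion score for any given test score.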
How is multiple regression different from linear regression?
Linear regression uses one predictor to estimate a criterion, while multiple regression uses several predictors at once and shows how much each one uniquely contributes to predicting the outcome.
Why is multiple regression useful in determining how many tests to use in a selection battery?
Because multiple regression shows the unique contribution each test makes to predicting job performance, helping identify which tests add value and which are redundant so the selection battery can be efficient and effective.
Describe the process for assessing construct validity.
Define the construct, make predictions about how the test should relate to other variables, collect data to test those predictions (e.g., convergent, discriminant, factor‑analytic, and group‑difference evidence), and evaluate whether the results support the test as a measure of the intended construct.
Explain what kind of information about test validity an assessment of construct validity provides
It shows whether the test truly measures the intended psychological construct by examining how the test relates to theory, other measures, group differences, and its internal structure. This reveals what the test actually measures and how meaningful its scores are.
List and describe four ways to establish quantitative evidence about construct validity
-Convergent evidence: The test correlates strongly with measures of related constructs.
-Discriminant evidence: The test shows low correlations with unrelated constructs.
-Factor‑analytic evidence: The test's internal structure matches the theoretical structure of the construct.
-Group‑difference evidence: The test differentiates between groups that theory predicts should score differently.
What is factor analysis?
An advanced statistical procedure, based on the concept of correlation, that helps investigators explain why items within a test are correlated or why two different tests are correlated
Why is factor analysis useful for construct validity and testing in general?
Because it reveals the test's underlying structure, shows whether items measure the intended construct, identifies subscales, and helps detect weak or misfitting items—providing strong evidence that the test measures what it claims to measure.
What is the difference between confirmatory and exploratory factor analysis?
Exploratory factor analysis (EFA) is used to discover the underlying factor structure without prior assumptions, while confirmatory factor analysis (CFA) tests whether a hypothesized factor structure fits the data.