blue=lect notes, purple=textbook notes, green=chat
validity (3)
does the test measure what it was designed to measure?
the extent to which a test measures the attribute it is designed to measure
Whether the measurement we obtain from a test accurately reflects what the test is meant to measure
Ex. A math test may be a valid test of 1st graders' skills, but not of 5th graders'. It depends on the purposes for which we are using the test.
general points on validity (3)
1)do not accept a test’s name as an indicator of what the test measures
2)validity is not a yes/no decision
3)validity evidence: tells how well the test measures what it is intended to measure
1)do not accept a test’s name as an indicator of what the test measures
Don't take test name as evidence of what the test is meant to measure
the test name is only a vague label inferring what it's supposed to measure and NOT (validity) evidence of what it measures (in reality)
You have to look deeper beyond label to find true validity
2)validity is not a yes/no decision (3)
it (validity) comes in degrees *there is a degree/spectrum of validity
tests are valid for/it applies to a particular use and particular population
Ex. A test can be valid for high schoolers but not AS VALID for measuring the math skills of college students
The test used for college students still measures math skills, but it's not as valid as when it measures math skills for high school students
it (validity) is a process: an ongoing process of gathering (validity) evidence to ensure test scores are meaningful and accurate
Experts keep accumulating evidence that tells whether the test is accurate in measuring what it's supposed to measure
3)validity evidence (3)
*validity evidence: tells how well the test measures what it is intended to measure
*validity is the evidence for inferences made about a test score
There are 3 types of evidence: 1)construct-related, 2)criterion related and 3)content related
*Most recent standards emphasize that validity is a unitary concept that represents all of the evidence that supports the intended interpretation of a measure
While validity is separated into these convenient subcategories (e.g. construct, content, etc.), this use of categories does NOT imply that there are distinct forms of validity
correlation coefficients for validity VS reliability (2)
Reliability has one coefficient
Ex. Test retest
There is only ONE correlation coefficient we can obtain
But for validity there is NO single number that will indicate how valid or not valid a test is
There are many indicators of validity that are produced by the procedures we use
3 types of validity
1)content validity
2)criterion validity
3)construct validity
face validity (2)
face validity: whether a test appears to measure what it is supposed to measure (does the test(‘s face) appear valid?)
BUT not sufficient evidence of validity *not a real form of validity evidence
ex. “I care about ppl” — clearly indicative of s/o’s empathy, more valid
VS “i prefer baths to showers” — less face validity for measuring empathy
pros of a test having face validity (3)
1)induces cooperation and positive motivation before/during test administration
*If test looks relevant and appropriate, test takers are more likely to take it seriously, put in effort, and stay engaged during (test) administration
2)reduces dissatisfaction and feelings of injustice among low scorers
*Ppl who score poorly are less likely to complain that the test was “unfair” or irrelevant, since it looks like a valid measure of what it claims to measure.
3)convinces policymakers, employers and administrators to implement the test
*Policymakers, employers, administrators, or clients are more willing to accept and adopt a test if it appears appropriate and valid. It gives the test more credibility in applied settings.
when might low face validity be useful? (3)
sometimes a test with low face validity can elicit more honest responses
For example, in personality or attitude tests, if the questions are too obviously linked to what’s being measured, people may try to fake their responses (social desirability bias).
Tests with lower face validity can hide their true purpose, which may actually produce more honest and unbiased responses
content validity *evidence (3)
content validity: evaluates how adequately the test samples the domain or the content of the construct *Degree to which the test samples the content of the construct
is the content of the test relevant and representative of the content domain?
*How well do the items in the test represent the domain we want to test?
bc we can't possibly put every item that exists for a construct, so we have to sample from the domain of all possible behaviour. The representativeness of that sample is key to determining if there's content validity.
establishing content validity (5)
a)describe the content domain
identify boundaries of the content domain *We have to be clear about what is included vs not included in a domain (ex. Math test: boundaries=math knowledge, so if there are biology questions then the test doesn’t have content validity)
determine the structure of content boundaries
ex. For structure, there are multiple topics covered across a certain number of questions in math classes, so the test of knowledge of the course should reflect the structure of the course. More questions on more important topics than on topics that are less important/less covered in the course
b)inspect test
c)form judgement that (*about whether) the test measures what it is supposed to measure…without gathering any external evidence
*It's judgment call from professionals that know the domain—they assess how representative the items are to the domain
When is content validity high? (2)
1) when the test content is representative of the tasks that define the content domain *representative of defining tasks of the content domain *there is good content coverage/high validity: when the test content is a representative sample of the domain
there is poor coverage/low validity when the domain is very wide and the test samples from a very narrow part of the domain so there is poor coverage—not representative of the entire content domain
2)when the items do NOT measure smth else *unrelated to content domain ex. if a psychometric test asks you about biology, you are contaminating the psychometric test with biological elements that are irrelevant to psychometrics
evaluation of content validity (3) *limitations
not sufficient—it doesn't tell us how test scores relate to the test scores of other attributes (ex. performance)
it’s the beginning of the process of generalizing validity for a test
*due to no information about relation of test to external variables—doesn't tell us how well the test measures performance or predicts other constructs
*no info on how the test relates to external variables
criterion validity (definition in terms of prediction)
how accurately does the test predict specific variables that are considered direct indicators of the construct
Ex. Neuroticism: bc neuroticism can’t be directly measured, researchers use questionnaires or scales as indirect measures.
To determine whether these tests are valid, they look at how well the test scores predict observable behaviours that are considered direct indicators of neuroticism (e.g., anxiety reactions, mood instability, stress sensitivity). *ex. observable neuroticism-related behaviours = criterion
Criterion validity tests whether a measure can successfully predict a relevant, observable outcome that directly reflects the construct.
Criterion validity means evaluating how well a test predicts (or corresponds) with a real-world/direct indicators/outcome (the criterion) that represents the construct.
*how do scientists establish criterion validity
To do this, they:
1)Identify or create a “pure” behavioural indicator of neuroticism (a criterion).
2) Compare test scores to that indicator (criterion).
3)Look for a strong correlation between the scale (scores on their test) and the behavioural criterion.
If the test scores correspond well with/predict the criterion (those direct behavioural indicators), the test shows good criterion validity.
Explain criterion validity ex. Academic aptitude tests to predict school success (construct, indirect measure, direct indicator, process)
Construct: Academic aptitude (cannot be directly observed)
Indirect measure (the test): Aptitude test scores
Direct indicator (criterion): Real school outcomes (e.g., GPA, grades, graduation rates)
Process: Since academic aptitude itself can’t be directly measured, we use test scores as an indirect measure.
We choose a “pure”/direct indicator of academic success, like semester GPA or standardized grades. *criterion = standard against which the test is compared
Then we check how well the test scores correlate with those outcomes (criterion).
A strong correlation = good criterion validity.
✅ Prediction-based criterion validity → Do high aptitude scores actually predict strong academic performance?
Explain criterion validity ex. personality tests to predict health risk behaviour
Construct: Personality traits (e.g., impulsivity, conscientiousness)
Indirect measure (the test): Personality questionnaire or scale
Direct indicator (criterion): Observable health-related behaviours, like: Smoking, excessive alcohol use, risky sexual behaviour, poor diet or lack of exercise
Process:
Since personality traits can't be directly measured, the personality test is an indirect indicator.
Researchers select a behavioural criterion that reflects the trait of interest (e.g., impulsivity predicting smoking or binge drinking).
They then assess whether people with certain test scores are more likely to engage in those behaviours. *are ppl with higher impulsivity scores more likely to smoke, engage in risky sexual behaviour, etc.
Strong correlations between test scores and real behaviours provide criterion-related evidence.
✅ Outcome-based criterion validity → Do personality test results help predict actual health risk behaviours?
criterion (def) (3)
criterion: a standard used by researchers to measure (*test) outcomes (slides)
Criterion as a benchmark: we see how well our test performs relative to the benchmark
criterion—the standard against which the test is compared—Criterion-related evidence shows how well a test matches up with a standard or "gold standard" measure (aka criterion)
criterion is the outcome or standard you ultimately care about and want to measure—something that the test is supposed to predict or match.
Ex: If you design a premarital test to predict relationship outcomes, the criterion is “marital success” (the real-world outcome you want to know about).
The test itself is just a stand-in to estimate or predict that criterion, since you can’t measure it directly at the time of testing.
selecting a criterion *2 types of criterion
when selecting a criterion, the criterion can be objective or subjective
1)objective criteria—observable and measurable
ex. for employee work performance, number of accidents, days of absence
2)subjective criteria—based on a person’s judgement
ex. supervisor’s ratings, peer ratings
E.g. if we want to assess our job performance test: for criterion validity, we first ask a supervisor to rate the performance of employees. Then we assess the validity of our new test against the supervisor's rating—the supervisor's rating as criterion
types of criterion validity (evidence)
1)concurrent
2)predictive
concurrent validity (def)
concurrent validity: criterion available at THE same time as test
large group takes the test (predictor)
same group takes another measure (the criterion) with evidence of reliability and validity
the test and the criterion measure are taken at the same point in time
scores are correlated
Concurrent validity evidence applies when the test and the criterion can be measured at the same time
It comes from assessment of the simultaneous relationship between the test and criterion
Ex. the relationship between learning disability and school performance—measures are taken at the same time bc the test is designed to explain why the person is now having difficulty in school
predictive validity (def)
predictive validity: criterion measure available in the future
large group takes the test (predictor) and their scores are held (e.g. for 6 months)
at future time, same group is administered a second measure with evidence of reliability and validity (the criterion)
scores are correlated
Predictive validity evidence: type of criterion validity evidence that uses test to predict behavior
Ex. SAT Critical Reading Test serves as predictive validity evidence if it accurately predicts how well high-school students will do in their college studies
predictive vs concurrent criterion validity
goal of test? possible criterion options?
goal: to determine whether the new admissions test effectively selects candidates who will be successful counsellors.
Possible Criteria (i.e., real-world indicators of success):
1)Undergraduate Grades: These could be compared with test scores. However, grades may not truly reflect future counselling performance, so this might be a weak criterion.
2)Career Outcomes. How many admitted students go on to become practising counsellors. This is a stronger, behavioural indicator of long-term success in the field.
3)Client Ratings. Feedback from clients (e.g., satisfaction, perceived helpfulness) could serve as a direct measure of counselling effectiveness. This would reflect the construct (*counsellor success) more accurately.
✅ Goal: Validate a new procrastination scale made up of self-report items.
✅ What’s needed: A behavioural criterion—a real-world indicator that reflects actual procrastination—to compare against test scores.
✅ Possible Criteria (examples discussed):
1)Professor ratings. Professors could rate whether a student typically submits assignments late. Late submissions serve as a direct behavioural indicator.
2)Task initiation behavior. Ask students whether they’ve started specific assignments. Compare scale scores between those who have started vs those who haven’t.
3)Cramming before exams. Measure how many hours of sleep someone gets the night before an exam. Assumption: less sleep may indicate last-minute studying (procrastination). BUT: Sleep loss could also be due to stress, so it may not be a clean indicator.
selecting a criterion for criterion validity (def of right criterion, 2 problems with selecting a criterion)
which is the right criterion? the “right” criterion is one that measures the same construct (or very close to it) that your test is supposed to measure — not less, not more, not something different.
if the criterion *measure measures fewer dimensions than those measured by the test→there is decreased evidence of validity based on its content bc it has underrepresented some important characteristics
if the criterion measures more dimensions than those measured by the test→there is criterion contamination
criterion underrepresentation (ex. math test)
Problem 1: Criterion measures too little (underrepresentation).
If the criterion only covers part of what the test measures, it weakens validity evidence. This is because important dimensions of the construct are left out.
Example: Your test measures math reasoning, algebra and geometry. But your criterion is only algebra grades.→ This ignores reasoning and geometry, so you can't fully validate the test.
→there is decreased evidence of criterion validity
criterion contamination (ex. math test)
Problem 2: Criterion measures too much (contamination).
If the criterion includes extra things your test isn’t meant to measure, the comparison gets “contaminated.” This can make your test look more or less valid than it really is.
Example: Your test measures only math reasoning. But the criterion is the overall math course grade, which includes: homework effort, attendance, participation → These extra factors distort the comparison.
criterion contamination: test may look less/more valid due to ‘contaminated’ criterion
✅ Your test may look more valid than it really is if:
The extra components in the criterion happen to correlate with your test scores by accident.
Example: Your test measures math reasoning. The criterion is overall math grade, which also includes attendance and participation. Students who are strong in reasoning may also tend to show up and participate more. → This extra overlap artificially inflates the correlation, making your test look more valid than it actually is.
✅ Your test may look less valid than it really is if:
The extra components in the criterion do not align with what your test measures—or even work against it.
Example: Your test still measures math reasoning. But some students with strong reasoning don’t participate or submit homework regularly.
→ Their grades drop for reasons unrelated to reasoning, weakening the correlation.
→ Your test appears less valid even though it may measure the construct well.
what is a validity coefficient (3)
It is the correlation between: ➝ Test scores and ➝ Criterion scores.
When assessing criterion validity, we correlate the two sets of scores and compute a correlation coefficient (r).
It shows how well the test predicts/corresponds to the criterion.
In other words, it reflects how useful or valid the test is for making statements about real-world outcomes.
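A minimal sketch (not from the notes) of how this is literally computed: the validity coefficient is just the Pearson correlation between test scores and criterion scores. The score values below are made up for illustration.

```python
# Hypothetical illustration: the validity coefficient is the Pearson correlation
# between test (predictor) scores and criterion (outcome) scores.
import numpy as np

test_scores      = [12, 15, 9, 20, 17, 11, 14, 18]            # e.g. aptitude test scores (made up)
criterion_scores = [2.9, 3.4, 2.5, 3.9, 3.6, 2.8, 3.1, 3.7]   # e.g. later GPA (made up)

r_yx = np.corrcoef(test_scores, criterion_scores)[0, 1]
print(f"observed validity coefficient r_yx = {r_yx:.2f}")
```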
what is the size of validity coefficients (4)
Validity coefficients are rarely higher than r = 0.60 (Cronbach, 1990). *Coefficients above 0.60 are uncommon and not expected.
In applied settings, values below 0.60 can still indicate good criterion validity.
r = 0.30 to 0.40 is generally considered adequate.
Very high correlations (close to 1.0) may imply the test and criterion are too similar or measuring the exact same thing.
*A validity coefficient is just a correlation ranging from –1 to +1, but in practice we focus on whether it's strong enough to show meaningful prediction.
validity coefficient tells us the extent to which the test is valid for making statement about the criterion—shows how valid (useful) the test is for predicting or describing something about the criterion.
magnitude of validity coefficients: effects sizes for medical vs psychological tests (3)
debate on whether psychological tests were less valid than other disciplines (e.g. medicine)
but when they compared the validity correlation coefficients in psychological vs medical tests (see paper) it was found that psychological tests perform as well as medical tests
*psychological tests were NOT less valid than medical tests
take home message: psychological tests can provide information that is as valid as common medical tests
3 factors limiting validity coefficients (3)
1)range of scores—restricted range of scores decreases validity coefficients
2)unreliability of test scores—low reliability decreases validity coefficients *when test scores are less reliable, the relationship/correlation between the test and criterion (aka validity coefficient) is weaker
There is greater correction (aka adjustment for attenuation) (*needed) when reliability is lower than when reliability is higher
3)unreliability in criterion—RE: low reliability decreases validity coefficients
Restricted range (def)
Restricted range: a variable has restricted range if all scores for that variable fall very close together
(ex. GPAs of graduate students in PhD programs tend to fall within a limited range of the scale—usually above 3.5 on a 4-point scale)
1)problem with restricted range (3)
Key Idea: Restricting the range of scores lowers the observed validity coefficient.
RE: A validity coefficient is the correlation between a test and the outcome it’s meant to predict.
But this correlation can look smaller than it truly is when the range of scores is restricted
why does restricted range decrease validity coefficient (3)
Why restriction weakens the correlation:
When you only look at a limited group (e.g., only high scorers), you remove part of the natural variation in scores (e.g. lower scorers)
With less variation (fewer lows and highs), the relationship between the test and the criterion appears weaker — even if the true relationship hasn't changed.
problem with restricted range (ex. SAT and GPA): (3)
Example (SAT test scores and GPA criterion):
If a university only admits students with high SAT scores, the SAT range in that group is restricted *restricted range of SAT scores
When you correlate SAT and GPA within that admitted *limited group, the correlation (validity coefficient) will look artificially low.
If students across the full SAT range were included (high and low scorers), you'd see the true, stronger relationship (btwn SAT test scores and GPA criterion)— because the full variation in scores is visible.
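A small simulation sketch of this (not from the lecture; the true correlation of .60 and the top-25% admission cutoff are assumed numbers), showing how range restriction shrinks the observed validity coefficient:

```python
# Simulation sketch: restricting the range of test scores lowers the observed
# validity coefficient even though the true relationship is unchanged.
import numpy as np

rng = np.random.default_rng(0)
true_r = 0.60                                   # assumed "true" test-criterion correlation
cov = [[1.0, true_r], [true_r, 1.0]]
test, criterion = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

full_r = np.corrcoef(test, criterion)[0, 1]     # all applicants: full range of scores
admitted = test > np.quantile(test, 0.75)       # only the top 25% of scorers "admitted"
restricted_r = np.corrcoef(test[admitted], criterion[admitted])[0, 1]

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
# the restricted-range r comes out noticeably smaller than the full-range r
```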
2)unreliability of test scores (3)
2)unreliability of test scores—low reliability decreases validity coefficients *when test scores are less reliable, the relationship/correlation between the test and criterion (aka validity coefficient) is weaker
there is potential for random error in testing—leading to unreliability of test scores
low reliability decreases validity coefficients
Measures with perfect reliability have reliability = 1
measuring unreliability of test scores (4)
to measure the unreliability of test scores affecting a correlation coefficient, we want to assess:
how much of correlation is impacted by unreliability of test scores
we do this using the correction for attenuation: correcting for unreliability in the test
When a test is impacted by (random) error, the observed correlation is attenuated (weakened)
So the correction for attenuation is an adjustment that estimates what the correlation would be without that error
what do the variables r, x and y mean in this context? (4)
r: Stands for the correlation coefficient — the statistical measure of how two variables relate.
It’s always the first part of the notation (e.g., ryx, rxx′).
x: Represents the predictor variable — usually your test or assessment tool. (Ex. SAT scores, personality test, aptitude test.)
y: Represents the criterion variable — the outcome you’re trying to predict. (Ex. GPA, job performance, health outcomes)
𝑟ᵧₓ (3)
𝑟ᵧₓ (Observed validity coefficient) *which is the score we obtain from the test
This is the actual correlation you obtained between:
✅ Your test (predictor) and
✅ The criterion (outcome).
It’s “observed” because it comes from real data — but it’s affected by measurement error in your test.
𝑟ₓₓ′ (4)
𝑟ₓₓ′ (Reliability coefficient of the predictor test)
This reflects how consistent and accurate your test is. *reliability
It ranges from 0 to 1: High reliability (e.g., .90+) → scores are stable, low error AND Low reliability (e.g., .50) → scores contain a lot of error
This matters because unreliable tests underestimate validity *low reliability decreases validity coefficient
"r" = correlation, "xx′" = test correlated with itself (across time/items)
✅ So: Reliability of the predictor/test.
rᵧₓ (corrected) or rᵧₓcorr (3)
𝑟ᵧₓ₍𝚌ₒᵣᵣ₎ (Corrected / estimated validity coefficient)
This is a hypothetical estimate of what the validity would be if the test were perfectly reliable.
It adjusts the observed validity coefficient (𝑟ᵧₓ) to account for test unreliability.
It tells you: ➝ “How strong would the test–criterion relationship be if the test had no measurement error?”
“r" = correlation, "y" = criterion, "x" = predictor, "corr" = corrected for unreliability
✅ So: Estimated correlation between test and criterion if the test were perfectly reliable.
correcting for attenuation formula—unreliability in test (*scores)
we divide the observed validity coefficient (ryx) (which we obtained from our test) by the square root of the reliability coefficient of the predictor test (rxx’) to get the estimated validity coefficient (ryxcorr) *which is hypothetical
this shows how much of this correlation is impacted by the unreliability of test scores
This tells us that if we improved the test (made it more reliable), how predictive it would be of the criterion
*If we had a perfect measure with reliability of 1, what would the correlation coefficient (for validity) be?
Get estimated validity coefficient and compare the 2
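A sketch of the standard attenuation-correction formula for test unreliability only; the worked numbers (.30 observed validity, .64 test reliability) are illustrative assumptions, not values from the slides:

```latex
% Correction for attenuation (unreliability in the test only).
% Worked numbers (.30 observed validity, .64 test reliability) are illustrative.
\[
  r_{yx(\mathrm{corr})} = \frac{r_{yx}}{\sqrt{r_{xx'}}}
  \qquad \text{e.g.} \qquad
  \frac{.30}{\sqrt{.64}} = \frac{.30}{.80} \approx .38
\]
```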
interpretation of estimated validity coefficient (in correction for attenuation: correcting for unreliability in test) (4)
This estimated (corrected) validity coefficient (ryxcorr) is hypothetical (you can’t actually observe it).
It shows how much of the low observed validity coefficient is due to test unreliability *test measurement error
If reliability improved, the test’s predictive validity would rise — but it’s capped by the true relationship between the construct and the criterion.
RE: unreliability of test scores—low reliability decreases validity coefficients *when test scores are less reliable, the relationship/correlation between the test and criterion (aka validity coefficient) is weaker
how high vs low reliability coefficient of predictor test (𝑟ₓₓ′) affects estimated validity coefficient (rᵧₓcorr )
There is greater correction (aka adjustment for attenuation) for the estimated validity coefficient (rᵧₓcorr) when the reliability coefficient of the predictor test (𝑟ₓₓ′) is lower than when it is higher
The lower the reliability of the test (𝑟ₓₓ′), the greater the correction for attenuation and the higher the value of the estimated validity coefficient (rᵧₓcorr)
*the lower the reliability coefficient of the predictor test, the higher the estimated (corrected) validity coefficient
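A worked comparison sketch, assuming an observed rᵧₓ of .30 (an assumed value, not from the lecture), showing that a less reliable test gets a bigger upward correction:

```latex
% Same observed validity (.30), different test reliabilities: the lower the
% reliability, the larger the corrected (estimated) validity coefficient.
\[
  r_{xx'} = .90:\quad \frac{.30}{\sqrt{.90}} \approx .32
  \qquad\qquad
  r_{xx'} = .50:\quad \frac{.30}{\sqrt{.50}} \approx .42
\]
```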
3)unreliability in criterion (3)
There is not only unreliability in test scores, but there can also be unreliability in criterion scores (e.g. both SAT test scores AND GPA criterion scores)
When we transfer criterion data from a database there may be error, thus unreliability of the criterion
We can correct for BOTH unreliability of test and criterion scores, in doing so, correcting for both potential sources of error
rᵧᵧ′ (3)
rᵧᵧ′ (reliability coefficient of the criterion)
r = correlation
ᵧᵧ′ = the criterion correlated with itself
Measures how consistently the outcome (*the criterion) is measured.
rᵧcₒᵣᵣₓcₒᵣᵣ (3)
rᵧcₒᵣᵣₓcₒᵣᵣ (estimated/corrected validity coefficient) for test AND criterion
How much unreliability is dragging down your observed validity.
How predictive (*or more valid) your test could be if BOTH: The test were more reliable AND the criterion were measured more accurately
This gives you a hypothetical “best-case” estimate of how valid the test could be in predicting the outcome.
correcting for attenuation (formula)—unreliability in test AND criterion (rᵧcₒᵣᵣₓcₒᵣᵣ)
we divide the observed validity coefficient (ryx) (which we obtained from our test) by the square root of the product of the reliability coefficient of the predictor test (𝑟ₓₓ′) AND the reliability coefficient of the criterion (rᵧᵧ′) to get the estimated validity coefficient (rᵧcₒᵣᵣₓcₒᵣᵣ) *which is hypothetical
this shows how much of this correlation is impacted by the unreliability of test AND criterion scores
This tells us that if we improved the test and criterion (made BOTH more reliable), how predictive the test would be of the criterion (validity coefficient)
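A sketch of the double-correction formula (unreliability in both test and criterion); the worked numbers (.30 observed, .64 test reliability, .81 criterion reliability) are illustrative assumptions:

```latex
% Correction for attenuation in both the test and the criterion.
% Worked numbers (.30, .64, .81) are illustrative assumptions.
\[
  r_{y(\mathrm{corr})x(\mathrm{corr})} = \frac{r_{yx}}{\sqrt{r_{xx'}\, r_{yy'}}}
  \qquad \text{e.g.} \qquad
  \frac{.30}{\sqrt{.64 \times .81}} = \frac{.30}{.72} \approx .42
\]
```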
how high vs low reliability coefficient of predictor test (𝑟ₓₓ′) and reliability coefficient of criterion (rᵧᵧ′) affects estimated validity coefficient (rᵧcₒᵣᵣₓcₒᵣᵣ)
if the observed validity coefficient (rᵧₓ) is held constant:
the higher the reliability of predictor test (𝑟ₓₓ′) and the higher the reliability coefficient of criterion (rᵧᵧ′), the lower the (corrected) estimated validity coefficient (rᵧcₒᵣᵣₓcₒᵣᵣ)
the lower the reliability of predictor test (𝑟ₓₓ′) and the lower the reliability coefficient of criterion (rᵧᵧ′), the higher the (corrected) estimated validity coefficient (rᵧcₒᵣᵣₓcₒᵣᵣ)
estimated validity coefficient when reliability of predictor test (𝑟ₓₓ′) and the reliability coefficient of criterion (rᵧᵧ′) are the same (3)
if the observed validity coefficient (rᵧₓ) is held constant:
but if the reliability coefficients of the predictor test (𝑟ₓₓ′) and the criterion (rᵧᵧ′) are both low (e.g. both are 0.5), then the estimated validity coefficient (rᵧcₒᵣᵣₓcₒᵣᵣ) can come out above 1 (e.g. 1.2)
this means that we are overcorrecting the estimated validity coefficient (rᵧcₒᵣᵣₓcₒᵣᵣ) to the point that it is not realistic
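The arithmetic behind that overcorrection, assuming an observed rᵧₓ of .60 (an assumed value chosen to match the 1.2 figure above) with both reliabilities at .50:

```latex
% Overcorrection: with low reliabilities, the corrected coefficient can exceed 1.
% The observed value of .60 is an assumed illustration.
\[
  \frac{.60}{\sqrt{.50 \times .50}} = \frac{.60}{.50} = 1.2
\]
```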
summary of criterion validity (2 main takeaways)
1)importance of choosing appropriate criterion *to assess the validity of our test
*Be thoughtful when choosing a criterion to assess the validity of our test
Think about whether the criterion reflects the attribute/construct or not
2)(even) small validity coefficients can have practical utility
E.g. medical test vs psychological test, criterion validity of both tests is the same
So if we trust medical tests to make decisions, why don't we trust psyc tests to do the same?