Psychometric Properties and Principles

Last updated 8:46 PM on 5/27/26

130 Terms

What is dependability?

Reliability

What is the reliability coefficient?

An index of reliability: the proportion of total (observed) score variance that is attributable to true score variance.
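
As a worked form of this definition, assuming the classical test theory model (which this card does not name explicitly), with X as the observed score, T as the true score, and E as random error:

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

A coefficient of 0.80 would then mean that 80% of the observed score variance is true score variance and 20% is error variance.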

What is test score variability?

Refers to how spread out test scores are within the group

What is the difference between systematic and random error?

Random error consists of unpredictable fluctuations and lowers reliability; systematic error is a constant bias and lowers validity.

Why is systematic error a greater threat to validity than to reliability?

Because systematic error is constant, it mimics consistency: scores stay stable across administrations, yet they consistently deviate from the true value.

What are the sources of error?

Item/content sampling, test administration, test scoring and interpretation

What is item/content sampling?

The process of selecting a limited set of items to represent a broader domain of knowledge.

What is the difference between internal and external reliability?

Internal reliability refers to the consistency within the test itself; external reliability refers to the consistency of scores across differing circumstances, times, or raters.

What are the sources of error in test-retest reliability estimates?

Time-sampling, carryover/practice effects, maturation/reactivity

Which reliability coefficient is entirely free from content-sampling error?

Test-retest; because the content remains constant

What is the difference between carryover and practice effects?

Carryover is when emotional or physical states from the first test persist into the second and either increase or decrease test scores; practice is a type of carryover in which memory or skill gains from the first test usually increase scores on the second.

How do test sophistication and test wiseness differ from carryover and practice effects?

Test sophistication: Familiarity with a specific test or type of test that can boost scores (knowing the format, structure, or typical items).
Test wiseness: General strategies for answering test questions effectively, regardless of content (e.g., eliminating wrong choices, time management).
Carryover effects: When performance on one test influences performance on a later test (e.g., memory of items, lingering effects).
Practice effects: Improved performance simply from repeated exposure to the test, not because of learning the content but because of familiarity with the task.

What is the difference between parallel and alternate forms?

Parallel forms: statistically equivalent (same means, variances, and error variances); alternate forms: closely matched, but not strictly equivalent in those statistics.

How does counterbalancing help with carryover effects?

Counterbalancing helps by neutralizing the order in which conditions are presented so that carryover effects (like fatigue, practice, or boredom) are shared equally across all groups.

What correlation coefficient do we use for parallel/alternate forms reliability?

Pearson r
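
A minimal sketch of how that coefficient could be computed by hand, using only the standard library. The function name `pearson_r` and the two score lists are hypothetical and are not part of the card set.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and the two sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of five test-takers on Form A and Form B.
form_a = [10, 12, 15, 18, 20]
form_b = [11, 13, 14, 19, 21]
r = pearson_r(form_a, form_b)  # the forms-based reliability estimate
```

The same function would apply to test-retest data by passing first-administration and second-administration scores instead of two forms.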

If test-retest reliability yields the coefficient of stability, what do parallel/alternate forms yield?

Coefficient of equivalence

What is the difference between KR-20 and KR-21, and when do we use each?

KR-20 is used for dichotomous items with different levels of difficulty; KR-21 is used for dichotomous items with the same level of difficulty.
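
The two formulas can be sketched as follows, assuming the usual textbook forms (KR-20 uses each item's pass proportion p and q = 1 - p; KR-21 uses only the mean and variance of total scores). The function names and the response matrix are hypothetical, and population variance is an assumption here.

```python
from statistics import mean, pvariance

def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    var_total = pvariance(totals)
    sum_pq = 0.0
    for i in range(k):
        p = mean(person[i] for person in responses)  # proportion passing item i
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

def kr21(responses):
    """KR-21: same idea, but treats all items as equally difficult."""
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    m = mean(totals)
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))

# Hypothetical data: 5 respondents x 4 items of clearly different difficulty.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3), round(kr21(data), 3))
```

Because the item difficulties differ in this made-up data, KR-21 comes out lower than KR-20, which is why KR-21 is reserved for items of similar difficulty.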

What is the difference between Cronbach’s Alpha and McDonald’s Omega?

Cronbach's Alpha: Assumes all items contribute equally to the score (tau-equivalence).
McDonald’s Omega: Allows for different item contributions, giving a more flexible measure of consistency.
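
A minimal sketch of Cronbach's alpha only, assuming the standard variance-ratio formula; omega is omitted because it requires a fitted factor model. The function name and the ratings are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([person[i] for person in responses]) for i in range(k)]
    # Variance of the total scores.
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point ratings: 5 respondents x 3 items.
ratings = [
    [4, 5, 4],
    [3, 3, 4],
    [2, 2, 1],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
```

The result does not depend on whether population or sample variance is used, as long as the same choice is made for the items and the totals, since only their ratio enters the formula.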

Define tau-equivalence and explain what happens to Cronbach’s Alpha if it is violated.

Tau-equivalence means that each item on a test contributes equally to the true score, with differences only due to random error. If it is violated, items contribute unequally and Cronbach’s alpha tends to underestimate the reliability of the test scores.

What is Average Proportional Difference?

A percentage that expresses the difference between two values, so that differences can be seen and compared in a standardized way.

What is the difference between the Spearman-Brown and Rulon formulas?

The Spearman-Brown formula is typically used to estimate split-half reliability when the two halves of a test have about equal variance.
The Flanagan-Rulon formula is used when the two halves of the test have unequal variances. It does not assume equal standard deviations, making it more suitable when the score distributions of the halves differ.
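
A minimal sketch comparing the two estimates, assuming the common textbook forms: Spearman-Brown steps up the half-test correlation, while Rulon uses the variance of the difference between halves. Function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_brown_split_half(half_a, half_b):
    """Full-test reliability from the correlation between two halves."""
    r_hh = pearson_r(half_a, half_b)
    return 2 * r_hh / (1 + r_hh)

def rulon_split_half(half_a, half_b):
    """Rulon estimate: 1 - var(difference between halves) / var(total)."""
    diffs = [a - b for a, b in zip(half_a, half_b)]
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 1 - pvariance(diffs) / pvariance(totals)

# Hypothetical half-test scores with equal variances: both estimates agree.
equal_a, equal_b = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
# Hypothetical halves with unequal variances: the estimates diverge.
unequal_a, unequal_b = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
```

With the unequal-variance halves, Spearman-Brown returns 1.0 because the halves correlate perfectly, while Rulon returns a lower value because it takes the differing spreads into account.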

When do we use the Kappa statistic and Kendall’s W?

Kappa is for agreement when raters categorize nominal data; Kendall's W is for agreement when raters rank ordinal data.

When do we use Fleiss' and Cohen’s Kappa?

Cohen's kappa is used for 2 raters; Fleiss' kappa is used for 3 or more raters.
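
A minimal sketch of Cohen's kappa for the two-rater case, assuming the standard definition (observed agreement corrected for chance agreement). The function name and the ratings are hypothetical; Fleiss' kappa is not shown.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories."""
    n = len(rater_a)
    if n != len(rater_b) or n == 0:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Undefined (division by zero) if both raters use one identical category only.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical diagnoses ("Y"/"N") given by two raters to ten cases.
rater_1 = ["Y", "Y", "Y", "Y", "Y", "N", "N", "N", "N", "N"]
rater_2 = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "Y"]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on 8 of 10 cases, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw 80% agreement.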

What is the difference between criterion- and norm-referenced tests?

A norm-referenced test compares an individual's scores against other people's performance; a criterion-referenced test compares test scores against a criterion or predetermined standard.

Differentiate CTT, Domain Sampling, Generalizability, and IRT.

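CTT states that observed scores comprise true scores plus error; Domain Sampling states that adding items increases reliability by better representing the content; Generalizability theory extends CTT by partitioning error into multiple sources (facets) such as raters, occasions, and items; IRT improves on CTT by providing sample-independent item statistics, at the cost of requiring much larger sample sizes.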

What is the concept of sample-dependency all about?

In CTT, if the sample has high ability, the items appear easy; conversely, if the sample has lower ability, the items appear difficult. Essentially, item statistics are contingent on the specific sample taking the test rather than the item itself.

How does Domain Sampling relate to CTT?

Domain Sampling explains why CTT rules work: A bigger scoop (more questions) gives a more accurate sample of your knowledge, which naturally cuts out error and boosts reliability.
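
The "bigger scoop" idea can be illustrated with the general Spearman-Brown prophecy formula, which is an assumption here since the card does not name it. It predicts reliability when a test is lengthened with comparable items. The function name and the numbers are hypothetical.

```python
def spearman_brown_prophecy(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added items are comparable to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical test with reliability 0.60, doubled and tripled in length.
doubled = spearman_brown_prophecy(0.60, 2)
tripled = spearman_brown_prophecy(0.60, 3)
```

Each added block of items raises the predicted reliability, but by a smaller amount than the previous one.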

What is another term for IRT?

Latent-trait theory

Why is IRT better than CTT?

Because IRT is sample-independent, an item's parameters (such as difficulty and discrimination) remain invariant regardless of the ability level of the sample taking the test.

If test-retest reliability yields the coefficient of stability, what do interrater, split-half, and inter-item reliability yield?

Test-retest reliability → Coefficient of Stability (consistency of scores over time).
Interrater reliability → Coefficient of Equivalence (agreement between different raters/observers).
Split-half reliability → Coefficient of Internal Consistency (consistency between two halves of a test).
Inter-item reliability → Coefficient of Homogeneity (consistency among items within the same test).

What is the equivalent of ±1 and ±2 SEM as a confidence interval?

±1 SEM ≈ 68% confidence interval → The true score is likely within 1 SEM above or below the observed score.
±2 SEM ≈ 95% confidence interval → The true score is very likely within 2 SEMs above or below the observed score.
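
A short worked example of these bands, assuming the usual formula SEM = SD × sqrt(1 − reliability). The function names, the SD of 15, the reliability of 0.91, and the observed score of 100 are all hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sem, n_sem=1):
    """Band of +/- n_sem SEMs around an observed score."""
    return (observed - n_sem * sem, observed + n_sem * sem)

# Hypothetical test: SD = 15, reliability = 0.91, observed score = 100.
sem = standard_error_of_measurement(15, 0.91)  # about 4.5
band_68 = score_band(100, sem, n_sem=1)         # roughly 95.5 to 104.5
band_95 = score_band(100, sem, n_sem=2)         # roughly 91 to 109
```

The higher the reliability, the smaller the SEM and the narrower both bands become.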

How do we differentiate SEM, SED, SEE?

SEM (Standard Error of Measurement): Focuses on a single score; shows how much error is in one observed score. Linked to reliability.
SED (Standard Error of Difference): Compares two scores; indicates whether the gap between them is real or significant.
SEE (Standard Error of Estimate): Evaluates prediction accuracy; shows the average amount of error in predicted scores. Linked to validity.
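
A minimal sketch of the SED and SEE calculations, assuming the common textbook formulas (both scores on the same scale for SED; a single predictor for SEE). The function names and the input values are hypothetical.

```python
from math import sqrt

def standard_error_of_difference(sd, reliability_1, reliability_2):
    """SED for two scores on the same scale: SD * sqrt(2 - r1 - r2)."""
    return sd * sqrt(2 - reliability_1 - reliability_2)

def standard_error_of_estimate(sd_criterion, validity_coefficient):
    """SEE for predicted criterion scores: SD_y * sqrt(1 - r_xy^2)."""
    return sd_criterion * sqrt(1 - validity_coefficient ** 2)

# Hypothetical values: two tests with SD = 15 and reliabilities 0.91 and 0.84.
sed = standard_error_of_difference(15, 0.91, 0.84)  # about 7.5
# Hypothetical criterion with SD = 10 and a validity coefficient of 0.60.
see = standard_error_of_estimate(10, 0.60)          # about 8.0
```

SED depends on the reliability of both scores being compared, whereas SEE depends on the validity coefficient between predictor and criterion.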

How do we differentiate reliability and validity?

Reliability: Consistency of measurement. A test is reliable if it produces stable, repeatable results (like a scale giving the same weight each time).
Validity: Accuracy of measurement. A test is valid if it actually measures what it claims to measure (like a scale measuring weight, not height).

What is the relationship between reliability and validity?

A test can be reliable but not valid (e.g., consistently wrong), but it cannot be valid unless it is reliable.

What is the concept that refers to a judgment regarding how well the test measures what it purports to measure at the time and place the variable is naturally being emitted?

Ecological validity

Differentiate external validity and ecological validity.

External Validity: Who, Where, and When (Generalizability).
Ecological Validity: The Setting (Naturalness vs. Artificiality).

What are the components of ecological validity?

Verisimilitude → Appearance: The degree of resemblance between the test situation and real-world conditions. "Looks like real life."
Veridicality → Accuracy: The degree to which test scores truly reflect or predict actual functioning in the real world. "Works like real life."

How can we increase internal and external validity?

Internal validity: use random assignment, standardization, and counterbalancing to control extraneous variables; external validity: use a diverse but deliberately selected sample of participants, as well as a naturalistic setting.

How do external, internal, conceptual, and face validity differ from each other?

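Internal: confidence that the observed relationship is real and not due to extraneous variables; external: generalizability of the results; conceptual: involves a theoretical foundation; face: whether the test appears, in the test-taker's judgment, to measure what it claims to measure.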

What is the difference between conceptual and construct validity?

Conceptual is the blueprint on which the test is based; construct ensures that the construct is being measured accurately.

Differentiate the three types of validity in the trinitarian view.

Content: whether the test covers all important aspects of the topic; criterion-related: whether test scores relate to other measures or outcomes (the criterion); construct: whether test results align with or contradict the theory behind the construct.

Define construct underrepresentation and construct-irrelevant variance.

Construct underrepresentation: the test lacks important components of the construct; construct-irrelevant variance: the test picks up extraneous constructs, which reduces the accuracy of the test.

What are the core principles of content validity?

Representativeness, relevance, absence of bias, technical quality

How does content validity associate with construct and criterion validity?

Content validity ensures that the test covers all important aspects of the construct, which supports construct validity. That coverage is in turn what allows the results to correlate meaningfully with another related measure or a standard (criterion validity).

What do we use to calculate the content validity of each item according to experts? How is each item rated?

The content validity index (CVI); experts rate the relevance of each item on a 4-point scale.

What does Zero CVR mean? How about a positive one?

A CVR of 0 means there is no consensus between raters (exactly half rate the item essential); +1 means all raters agree on the item's relevance; -1 means all raters find the item irrelevant.
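
These three values can be checked with Lawshe's content validity ratio, which is assumed here to be the CVR the card refers to: CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating the item essential and N is the panel size. The function name and panel sizes are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2)."""
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 experts rating a single item.
all_agree = content_validity_ratio(10, 10)   # +1.0
split = content_validity_ratio(5, 10)        #  0.0
none_agree = content_validity_ratio(0, 10)   # -1.0
```

Values between 0 and +1 indicate that more than half, but not all, of the panel rated the item essential.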

What is the difference between I-CVI and S-CVI?

I-CVI (item-level): the rated relevance of an individual item; S-CVI (scale-level): a summary of the relevance of all items on the scale.
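
A minimal sketch of both indices, assuming the common convention that a rating of 3 or 4 on the 4-point scale counts as "relevant", and showing two frequently used scale-level summaries (the average of I-CVIs and the proportion of items with universal agreement). Function names and ratings are hypothetical.

```python
def i_cvi(ratings):
    """Item-level CVI: proportion of experts rating the item 3 or 4."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def s_cvi_ave(item_ratings):
    """Scale-level CVI (average method): mean of the I-CVIs."""
    values = [i_cvi(ratings) for ratings in item_ratings]
    return sum(values) / len(values)

def s_cvi_ua(item_ratings):
    """Scale-level CVI (universal agreement): share of items with I-CVI of 1."""
    return sum(1 for ratings in item_ratings if i_cvi(ratings) == 1.0) / len(item_ratings)

# Hypothetical ratings from 4 experts on 3 items (4-point relevance scale).
item_ratings = [
    [4, 4, 3, 4],  # all experts rate it relevant
    [4, 3, 2, 4],  # three of four rate it relevant
    [2, 1, 3, 4],  # two of four rate it relevant
]
per_item = [i_cvi(ratings) for ratings in item_ratings]
```

In this made-up panel the per-item values are 1.0, 0.75, and 0.5, so the average-based summary is 0.75 while only one item in three reaches universal agreement.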

What can criterion-related validity be used for?

Can be used to infer an individual's most probable standing on some measure of interest.

What is the primary difference between concurrent and predictive validity?

Concurrent validity → The degree to which test scores correlate with another measure taken at the same time.
Predictive validity → The degree to which test scores forecast or predict performance on a relevant measure taken in the future.

What does high incremental validity indicate?

Incremental validity: The extent to which a new test or measure adds useful information beyond what existing measures already provide.
High incremental validity → Shows that the measure contributes unique, additional predictive value not captured by other tests or assessments.

What are the core principles of construct validity?

Clear conceptualization, operationalization, empirical evidence

What are the six (6) types of evidence of construct validity and how do they differ from each other?

Evidence of homogeneity → Proof that the test measures only one construct (items are internally consistent).
Evidence of changes with age → The construct is expected to change across development, and scores should follow that expected course.
Evidence of pretest-posttest changes → If an intervention targets the construct, test scores should change accordingly.
Known-groups validity → The test should differentiate between groups known to differ on the construct (e.g., clinical vs. non-clinical).
Convergent validity → The test should correlate with other established measures of the same construct.
Discriminant validity → The test should show little to no correlation with measures of different constructs.

What do we use to evaluate both convergent and divergent validity simultaneously?

Multitrait-Multimethod Matrix (MTMM)

What is the primary purpose of the Multitrait-Multimethod Matrix?

For assessing the construct validity of a set of measures in a study

What is the function of each correlation in the MTMM matrix?

Monomethod-monotrait: reliability diagonal; should be the highest in the entire matrix.
Heteromethod-monotrait: validity diagonal; provides evidence of convergent validity.
Monomethod-heterotrait: should be low, as it provides evidence of discriminant validity.
Heteromethod-heterotrait: should be the lowest, as it further supports discriminant validity.

If the correlations in the heterotrait-monomethod triangle are high, this means that the test has __?

Low discriminant validity; because it's only capturing the test-taker’s ability with the method (e.g., response style, format familiarity) instead of uniquely measuring the intended construct.

What are latent variables?

Factors that are not directly observed but are inferred from patterns in responses or behaviors.

How does factor loading work?

A statistical value that shows how strongly a specific item (question) is associated with a latent construct (hidden variable); represents the correlation between the item and the latent construct.

Higher loadings that are close to __ mean the item is a strong indicator of the construct

How does factor analysis function?

A statistical technique used to identify latent variables (factors) by examining patterns of correlations among items or questions.

What are the two types of factor analysis?

Exploratory Factor Analysis (EFA) → Used when we have no preconceived idea of the factor structure. It aims to explore and discover the underlying dimensions. It is theory-generating.
Confirmatory Factor Analysis (CFA) → Used when we already have a hypothesized factor structure based on theory or prior research. It aims to test and confirm whether the data fit that structure. It is theory-testing.

How does Factor Analysis differ from Principal Component Analysis?

Factor Analysis (FA) → Aims to identify latent variables (factors) by examining correlations among items; focuses on uncovering hidden structure.
Principal Component Analysis (PCA) → Aims to reduce dimensionality by summarizing items into components that maximize explained variance; involves data simplification.

What does variance maximization have to do with PCA?

The more variance a component captures, the more information it retains from the original dataset, making summarization more effective.

PCA forces data into a shape that doesn't overlap as it follows the concept of __?

Orthogonality

Define eigenvalue. Where does it lie on the scree plot?

Eigenvalue → A numerical value that represents the amount of variance explained by a factor or principal component; it is plotted on the y-axis.

For every item, the eigenvalue is equivalent to __. Additionally, what should we do with factors that have eigenvalues lower than that?

1.0; we discard factors whose eigenvalues are lower than 1.0

Which criterion/rule states that we should retain factors that have an eigenvalue greater than 1.0?

Kaiser criterion/K1 rule
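
A minimal sketch of applying the rule once eigenvalues are already available (computing them from a correlation matrix is not shown). The function names and the eigenvalues are hypothetical.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Return the eigenvalues that pass the Kaiser (K1) rule."""
    return [ev for ev in eigenvalues if ev > threshold]

def proportion_of_variance(eigenvalues):
    """Share of total variance explained by each factor or component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Hypothetical eigenvalues for a 5-item correlation matrix. They sum to 5
# because each standardized item contributes a variance of 1.0.
eigenvalues = [2.4, 1.2, 0.7, 0.4, 0.3]
retained = kaiser_retained(eigenvalues)       # [2.4, 1.2]
shares = proportion_of_variance(eigenvalues)  # first factor explains 48%
```

A factor with an eigenvalue below 1.0 explains less variance than a single item does, which is the rationale for the cutoff.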

Why is Kaiser considered not the most accurate method in EFA and PCA?

It tends to overestimate the number of factors, especially when many variables are included.

What exactly is the limitation of the elbow method?

Elbow point is subjective and therefore ambiguous.

What is the process of revalidating a test by using a different group and their scores as the criterion?

Cross-validation

What is the primary goal of cross-validation?

To check whether the test’s validity generalizes beyond the initial sample.

What is the difference between co-norming and co-validation?

Co-validation → Involves checking whether two tests predict the same criterion. It’s about comparing their validity evidence: do both tests measure what they claim to measure in relation to an external standard?
Co-norming → Involves administering two or more tests to the same group of test-takers so their scores can be placed on a common normative scale. It’s about aligning score interpretations across tests.

What is the difference between co-validation and predictive validity?

"Co-validation: comparing two tests against the same criterion.