What does it mean to analyze test/survey development in a social manner?
Is testing fair? Is it socially beneficial? How does testing affect society?
What does it mean to analyze test/survey development in an ethical manner?
Test takers’ and survey takers’ privacy? Unintended use of test/survey data
What does it mean to analyze test/survey development in a legal manner?
Legal defense, adverse impact, etc
What does it mean to analyze test/survey development in a cultural manner?
Can the test/survey be used in that cultural setting? Are the constructs universal?
What does it mean to analyze test/survey development in a professional manner?
Is the test really necessary? What is the best way to assess it?
What does it mean to analyze test/survey development in a scientific manner?
Is the assessment validated? Appropriate development processes?
What are the SIOP Principles?
Principles for the validation and use of personnel selection procedures; provides practical and psychometric guidance
What are the APA Standards?
Standards for educational and psychological tests
What are the APA Rights of Test Takers
provides ethical guidance around the rights and responsibilities of test takers
What are the Uniform Guidelines
Legal guidelines. Follows the trinitarian view of validity
What is Divided Loyalties in terms of ethical dilemmas for IOs
IOs are often retained by the organization, but the participant is an individual. You should clearly define roles and expectations when conducting organizational research, use formal agreements specifying potential actions with ethical implications, and, when asked to engage in unethical behavior, inform the organization of the violation
What is an organizational survey?
a systematic method of collecting feedback from employees to assess and understand the organization’s current state, work environment, culture, leadership, and employee attitudes and experiences
What is a psychological test
a systematic procedure for comparing the behavior of 2 or more people
Why do psychological tests measure observable events
In some cases the behavior itself is important; in others, the behavior can reflect an unobservable psychological attribute
All forms of measurement have inaccuracy and problems such as
complexity of psychological constructs, participant reactivity, observer expectation and bias, use of composite scores (lower accuracy when contrasted with physical measurements), score sensitivity (may not capture subtle variations), lack of awareness of psychometrics
What is scaling
the way numerical values are assigned to psychological attributes. Important because measurement is about quantifying differences in psychological attributes; scaling affects the interpretation of scores and their use for comparisons or statistical analysis
What are interindividual differences
differences between people (e.g. in their levels of an attribute)
What are intraindividual differences
differences emerging in one person over time or in different circumstances
How can you interpret results
utilizing scale anchors, comparing them with past results/benchmarks, examining them by groups, confirming their variability
When interpreting scores, the results produced are “raw” and ambiguous so…
reframe them within a useful informational context
Test Norms
a distribution of scores that represents some relevant population. Ideally a large sample, drawn in a way that maximizes representativeness of the relevant population
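Where a norm group’s mean and SD are known, a raw score can be re-expressed against the norms. A minimal sketch, assuming a roughly normal norm distribution (all numbers hypothetical):

```python
# Convert a raw score to a z-score and percentile against a norm group.
# Norm-group mean/SD and the raw score are made-up values for illustration.
from scipy import stats

norm_mean, norm_sd = 50.0, 10.0
raw_score = 63.0

z = (raw_score - norm_mean) / norm_sd        # standardized (z) score
percentile = stats.norm.cdf(z) * 100         # assumes normally distributed norms

print(f"z = {z:.2f}, percentile = {percentile:.1f}")   # z = 1.30, percentile = 90.3
```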
Ethical considerations of using AI
bias in outputs, hallucinations, non-reproducible outputs, privacy and data security, copyright and intellectual property
What are the 4 steps for effective prompting
role
context
command
format
What are key tasks for a typical organizational survey project
project planning and stakeholder engagement (initial consultation, scope definition, approval of survey plan)
Developing organizational survey (lit review, survey design, survey structure, review & feedback from stakeholders, pilot testing, survey tool selection)
Data collection (communication strategy, survey distribution, follow-up and reminders, monitor participation, incentives)
Data analysis (cleaning, descriptive statistics, advanced statistical analysis, benchmarking, qualitative analysis, segmentation analysis)
Reporting and deliverables (report drafting, actionable insights, review and stakeholder feedback, final report)
Sharing results with employees (presentation development, leadership briefing, employee meetings, Q&As, Feedback collection)
Action planning & follow-up (action plan development, communication of next steps, monitoring progress, follow-up survey)
Procedure for developing surveys
collect info about needs
planning/scheduling
collect info to write items
write items and check/edit items
prepare a survey platform
prepare other materials
procedure for developing tests
planning/scheduling
collect info to write items
item generation
data collection
data analysis
revise items
data collection
data analysis
complete scoring algorithm
prepare a test platform
prepare other materials
Benefits of an odd-numbered Likert scale
allows for a neutral option so respondents can express neutrality or uncertainty, which reduces response bias and stress
benefit of an even-numbered Likert scale
forces respondents to choose a side, reducing people taking the “easy way out” with a neutral response
benefits of open & closed item formats
open: may obtain useful information that developers did not consider
closed: respondents can clearly understand the intended meaning, may remind participants of things they would not consider, analyzing the data is more straightforward
how many items should you generate?
for tests, 2-4x the amount you want to use. for surveys, 1.5x
what is a bad item
ambiguous
too long
too difficult words/phrases
multiple negatives
double barreled
leading questions
loaded questions
ambiguous pronoun references
misplaced modifiers
adjective forms instead of noun forms
when should you conduct EFA or CFA to determine dimensionality
if your survey contains dimensions, includes sections, or could develop sections that will serve as a foundation for future analyses
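A minimal EFA sketch using the third-party factor_analyzer package (an assumed tool choice; the file name and item layout are hypothetical). A CFA would instead test a prespecified structure with an SEM tool:

```python
# Exploratory factor analysis on a respondents-by-items matrix of responses.
import pandas as pd
from factor_analyzer import FactorAnalyzer   # pip install factor_analyzer

items = pd.read_csv("survey_items.csv")      # hypothetical item-response file

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")   # oblique: factors may correlate
fa.fit(items)

loadings = pd.DataFrame(fa.loadings_, index=items.columns)
print(loadings.round(2))   # items that load together suggest sections/dimensions
```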
what does it mean if correlation and mean are high
drivers are related to engagement, but since the current condition is already good, improving the drivers may not be necessary
what does it mean if correlation is low and mean is high
drivers are not related to engagement and the current condition is good
What does it mean if correlation is high and the mean is low
drivers are related to engagement and should consider how to improve the drivers
what does it mean when correlation and mean are both low
drivers are not related to engagement and the current condition is not good. Provide recommendations to improve the drivers, but the priority is not high (see the sketch below)
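A minimal sketch of the correlation-by-mean logic in the four cards above, assuming a DataFrame with one engagement column and several driver columns (file name, column names, and cut points are all hypothetical):

```python
import pandas as pd

df = pd.read_csv("engagement_survey.csv")    # hypothetical survey data
drivers = [c for c in df.columns if c != "engagement"]

summary = pd.DataFrame({
    "corr_with_engagement": df[drivers].corrwith(df["engagement"]),
    "mean": df[drivers].mean(),
})

# Illustrative cut points (r = .30, scale midpoint 3.0); use your own benchmarks.
summary["high_priority"] = (summary["corr_with_engagement"] >= .30) & (summary["mean"] < 3.0)
print(summary.sort_values("high_priority", ascending=False))
```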
What does Natural Language Processing (NLP) mean?
a set of techniques used to analyze written and spoken language. Used in psychometrics to analyze open-ended questions
What is Word-Level Analysis NLP
counts which words appear most often in responses. Results can be shown in bar charts or word clouds. Shows what employees care most about.
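A minimal word-level sketch using only the standard library; the responses and stop-word list are stand-ins:

```python
from collections import Counter
import re

responses = [
    "More flexible hours would help",
    "Communication from leadership needs work",
    "Flexible scheduling and better communication",
]
stopwords = {"more", "would", "from", "and", "needs", "better", "help", "work"}

words = [w for r in responses
         for w in re.findall(r"[a-z']+", r.lower())
         if w not in stopwords]
print(Counter(words).most_common(5))   # e.g. [('flexible', 2), ('communication', 2), ...]
```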
What is Grouping Responses (Clustering & Topic Modeling) NLP
Groups together responses with similar content. Automatically creates clusters so that similar opinions fall into the same group. Identifies themes within responses
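A minimal clustering sketch using scikit-learn’s TF-IDF vectorizer plus k-means; topic modeling (e.g., LDA) follows a similar pipeline. The responses and the cluster count are hypothetical choices:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = ["pay is too low", "salary needs review", "my manager never listens",
             "leadership ignores feedback", "compensation is not competitive"]

X = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for label, text in sorted(zip(labels, responses)):
    print(label, text)   # pay/salary comments should cluster apart from manager comments
```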
What is Sentiment & Evaluation Analysis
Identifies positive/negative classification. Can go deeper with Multi-Dimensional Emotion Analysis to identify other emotions like anger or joy
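A minimal positive/negative sketch using NLTK’s VADER lexicon (one assumed tool among many; the lexicon needs a one-time download):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
sia = SentimentIntensityAnalyzer()

for text in ["I love the new flexible schedule", "Management never listens to us"]:
    score = sia.polarity_scores(text)["compound"]   # -1 (negative) to +1 (positive)
    print(f"{score:+.2f}  {text}")
```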
What is Classical Test Theory (CTT)
The assumption that an observed score is the sum of a true score and random error. This assumption underlies reliability
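A minimal simulation of the CTT decomposition, with arbitrary distribution choices, showing that reliability works out to var(T)/var(X):

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(50, 10, size=10_000)   # true scores T
error = rng.normal(0, 5, size=10_000)    # random error E, uncorrelated with T
observed = true + error                  # observed scores X = T + E

reliability = true.var() / observed.var()
print(round(reliability, 2))   # ~ 10**2 / (10**2 + 5**2) = 0.80
```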
What is reliability
the degree to which observed score differences are consistent with true score differences
what are the 4 key measurement models of reliability ( from most to least strict)
Parallel tests
Tau-equivalent
Essentially tau-equivalent
congeneric
What does the parallel model assume
A person’s true score on the first testing exactly equals his or her true score on the other testing (Xt1 = Xt2). This means that true score means and variances, observed score means and variances, and error variances are all the same
What does the tau-equivalent test assume
true score means and variances and observed score means are the same, but observed score variances and error variances are not
what does essentially tau-equivalent model mean
true score variances are the same, but true score means, observed score means and variances, and error variances are different
what does congeneric model mean
the two tests measure the same construct, but true score means and variances, observed score means and variances, and error variances are different
Raw alpha vs standardized alpha
raw alpha is what we normally think of; standardized alpha is obtained when we standardize (z-score) items before aggregating them. Use standardized alpha when one dimension includes items with different numbers of Likert-scale choices
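A minimal sketch of both coefficients from first principles; standardized alpha is just raw alpha computed on z-scored items (the simulated data are arbitrary):

```python
import numpy as np

def raw_alpha(items):
    """Cronbach's alpha from a respondents-by-items array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def standardized_alpha(items):
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    return raw_alpha(z)

data = np.random.default_rng(1).multivariate_normal(
    mean=[0, 0, 0], cov=[[1, .5, .5], [.5, 1, .5], [.5, .5, 1]], size=500)
print(round(raw_alpha(data), 2), round(standardized_alpha(data), 2))   # both ~ .75
```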
What does cronbach’s alpha assume
all items measure the same true score with equal strength
what is omega
estimates reliability that is accurate in a wider range of circumstances than alpha (less strict set of assumptions)
how to improve reliability
longer tests, stronger internal consistency
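The lengthening effect can be quantified with the Spearman-Brown prophecy formula: the projected reliability of a test lengthened by a factor k, given current reliability r.

```python
def spearman_brown(r, k):
    """Projected reliability after lengthening a test by factor k."""
    return (k * r) / (1 + (k - 1) * r)

print(round(spearman_brown(0.70, 2), 2))   # doubling a .70 test -> 0.82
```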
What are statistical indices for interrater reliability
Cohen’s Kappa
Fleiss Kappa
Intraclass correlation (ICC)
What are the types of ICC
Case 1, Case (1,1): one-way, consistency, single
Case 2, Case (2,1): two-way random, agreement, single
Case 3, Case (3,1): two-way mixed, consistency, single
Case 1, Case (1,k): one-way, consistency, average
Case 2, Case (2,k): two-way random, agreement, average
Case 3, Case (3,k): two-way mixed, consistency, average
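A minimal sketch using the third-party pingouin package (an assumed tool choice). Data are long-format, one row per subject-rater pair, with hypothetical values:

```python
import pandas as pd
import pingouin as pg   # pip install pingouin

df = pd.DataFrame({                      # 4 subjects rated by 3 raters
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "rater":   ["a", "b", "c"] * 4,
    "score":   [7, 8, 7, 4, 5, 4, 9, 9, 8, 3, 2, 3],
})

icc = pg.intraclass_corr(data=df, targets="subject", raters="rater", ratings="score")
print(icc[["Type", "ICC"]])   # rows ICC1..ICC3k map onto the six cases above
```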
What is a one-way model
each subject rated by a different set of randomly selected raters
What is a two-way model?
Random: subjects rated by same raters who are randomly selected
Mixed: subjects rated by same set of fixed raters
What is consistency?
extent to which raters agree on a relative order of the subjects
what is agreement?
extent to which raters assign the same score to the same subject
what is single data
raw data is used for the calculation
what is average data
average data is used for the calculation
How to determine if items are consistent with the rest of the test?
item-total correlations
item discrimination index
alpha if item deleted
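A minimal sketch of the item-total correlation and alpha-if-item-deleted checks (hypothetical Likert data; the raw-alpha helper mirrors the one sketched earlier):

```python
import numpy as np
import pandas as pd

def raw_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

items = pd.DataFrame(np.random.default_rng(2).integers(1, 6, size=(200, 5)),
                     columns=[f"q{i}" for i in range(1, 6)])

for col in items.columns:
    rest = items.drop(columns=col).sum(axis=1)     # total score without this item
    r_it = items[col].corr(rest)                   # corrected item-total correlation
    a_del = raw_alpha(items.drop(columns=col))     # alpha if item deleted
    print(f"{col}: item-total r = {r_it:.2f}, alpha if deleted = {a_del:.2f}")
```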
why is item validity important?
if validity is low, the item may not be able to detect differences between high and low performers, and items with low validity will not correlate well with other items
what is an example of direct evidence of validity
interviews with respondents thinking out loud
what is an example of indirect evidence of validity
eye tracking, response times, statistical analysis, experimental studies of processes
types of validity evidence
direct
indirect
convergent
discriminant
criterion
what do we call the associations that a test should have with other measures
nomological network
what are focused examinations for validity evidence
very few criteria have strong relevance for the meaning of scores. Instead of looking at a wide range of variables, you select a few key ones to study in depth
what is unsystematic examination of sets of correlations for validity evidence
several criterion variables are examined; you “eyeball” the pattern of correlations and draw conclusions regarding convergent and discriminant validity
what is the multi-trait multi-method matrix (MTMM) for validity evidence
several other measures are examined to systematically evaluate the pattern of correlations and draw conclusions regarding convergent and discriminant validity
what is the systematic examination of sets of correlations for validity evidence
evaluate the pattern of correlations and draw conclusions regarding convergent and discriminant validity
what are factors affecting observed associations for validity evidence
restriction of range
method variance
time
predictions of single events
criterion issues
unrepresentative sample
cultural or contextual differences
what is validity correction
accounts for measurement error and restriction of range. Uncorrected estimates should still be reported too. Issue: the correction is only as accurate as the reliability estimates it relies on
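A minimal sketch of the classic correction for attenuation (the observed validity divided by the square root of the two reliabilities); numbers are illustrative, and the uncorrected value should be reported alongside:

```python
def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for measurement error in x and y."""
    return r_xy / (rel_x * rel_y) ** 0.5

print(round(disattenuate(0.30, 0.80, 0.70), 2))   # observed .30 -> corrected .40
```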
what is direct range restriction
individuals are screened on the procedure that is being validated
what is indirect range restriction
the procedure being validated is correlated with one or more of the procedures used for selection
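For direct range restriction, the standard (Thorndike Case II) correction rescales the restricted correlation by u, the ratio of unrestricted to restricted predictor SDs (values illustrative):

```python
def correct_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction."""
    return (r * u) / (1 - r**2 + (r**2) * (u**2)) ** 0.5

print(round(correct_range_restriction(0.25, 1.5), 2))   # restricted .25 -> ~ .36
```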
what is transportability
the process of using validity evidence from one situation and applying it to another without conducting a new validation study
what is synthetic validity/job component validity
estimating validity of selection measures by breaking down job into components and predicting how well various selection measures predict success on those components
What is response bias
the tendency for participants to respond inaccurately or falsely to questions
what is test bias?
when the test systematically obscures differences between groups
Examples of response bias
acquiescence
extremity and modesty
social desirability
malingering
random/careless responding
guessing
examples of test bias
construct bias
predictive bias
how to deal with response bias
manage testing context, testing content, scoring, or use specialized tests
what is the importance of test bias
tests need to differentiate among people based on real psychological differences
what does construct bias look like for test bias
scores may have different meanings for different groups. differences in test scores may not reflect true group differences
how can you detect construct bias
differential reliability (examine internal structures of test scores)
differential rank order of item difficulties (items difficult for one group are not for others)
differential item discrimination index (compute for each group, are the indices different)
differential dimensionality (is the factor structure different across groups)
differential item functioning (based on IRT, compare item properties across groups)
what is predictive bias in terms of test bias
if a test is more predictive for some groups than others
types of predictive bias
intercept bias (do 2 groups have the same intercept)
slope bias (do 2 groups have the same slope)
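Both forms are commonly tested with moderated regression; a minimal statsmodels sketch, with hypothetical column names (test score x, criterion y, group membership group):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("selection_data.csv")   # hypothetical: columns x, y, group

model = smf.ols("y ~ x * group", data=df).fit()
print(model.summary())
# A significant `group` coefficient suggests intercept bias;
# a significant `x:group` interaction suggests slope bias.
```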
types of forced choice questions
binary preference (which out of these 2 is more like you)
blocks (rank the options from most to least true of you/like you)
graded preference (out of these options, select on a Likert-type scale the degree to which one option is more like you)
proportional preference (assign point values to each option, totaling 100, based on how much each is like you)
advantages of forced choice
-reduced response bias
-increases dimensionality
-do not need verbal and non-verbal anchors
-reduced faking
disadvantages of forced-choice
-ipsative scores (only a comparison within an individual, unless you use advanced IRT)
-more difficult to develop
-need more data collection
What is classical test theory
framework for assessing the reliability of tests and measurements by assuming: observed score = true score + random error. Linear relationship between item responses and level of the construct. Difficulty and discrimination are calculated based on relationships between the item and the whole test
What is item response theory
focuses on each item, not the test. Each item independently has difficulty and discrimination information; if used in a different item set, the item retains its parameters
Differences between CTT for non-cognitive and cognitive items?
Non-cog: focus on descriptive stats, ICC, non response rate
Cog: focus on difficulty, discrimination, response distributions, point biserial correlation
what is the average, desirable item difficulty for a cognitive item
.5 (# who got it correct/total)
what is the acceptable and ideal item discrimination for a cognitive item
.20+ acceptable, .30+ ideal (create high and low performer groups based on total test score, calculate the % who answered the item correctly in each group, then take the high performer % minus the low performer %)
what is point biserial correlation for a cognitive item
correlation between performance on the item and performance on the total test. Should be above .20
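A minimal sketch computing all three cognitive item statistics above from a 0/1-scored response matrix (data simulated, so the values are arbitrary; the 27% split for the discrimination groups is the common convention):

```python
import numpy as np
import pandas as pd

scored = pd.DataFrame(np.random.default_rng(3).integers(0, 2, size=(100, 10)))
total = scored.sum(axis=1)

difficulty = scored.mean()                      # p = proportion answering correctly
high = scored[total >= total.quantile(.73)]     # top ~27% of scorers
low = scored[total <= total.quantile(.27)]      # bottom ~27% of scorers
discrimination = high.mean() - low.mean()       # discrimination index D
point_biserial = scored.apply(lambda item: item.corr(total - item))   # corrected r_pb

print(pd.DataFrame({"p": difficulty, "D": discrimination, "r_pb": point_biserial}).round(2))
```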