Psychometrics Midterm

93 Terms

1

What does it mean to analyze test/survey development in a social manner?

Is testing fair? Is it socially beneficial? How does testing affect society?

2

What does it mean to analyze test/survey development in an ethical manner?

Is test takers’ and survey takers’ privacy protected? Could test/survey data be used in unintended ways?

3

What does it mean to analyze test/survey development in a legal manner?

Legal defensibility, adverse impact, etc.

4

What does it mean to analyze test/survey development in a cultural manner?

Can the test/survey be used in the area? Are the constructs universal?

5

What does it mean to analyze test/survey development in a professional manner?

Is the test really necessary? What is the best way to assess it?

6

What does it mean to analyze test/survey development in a scientific manner?

Is the assessment validated? Appropriate development processes? 

7

What are the SIOP Principles?

Principles for the Validation and Use of Personnel Selection Procedures. Provides practical and psychometric guidance

8

What are the APA Standards?

Standards for Educational and Psychological Testing (published jointly by AERA, APA, and NCME)

9

What are the APA Rights of Test Takers?

provides ethical guidance around the rights and responsibilities of test takers

10

What are the Uniform Guidelines?

Legal guidelines. Follows the trinitarian view of validity (content, criterion-related, and construct validity)

11

What are divided loyalties in terms of ethical dilemmas for IOs?

IOs are often retained by the organization, but the participant is an individual. IOs should clearly define roles and expectations when conducting organizational research and set up formal agreements specifying potential actions with ethical implications. When asked to engage in unethical behavior, they have an obligation to inform the organization of the violation

12

What is an organizational survey?

a systematic method of collecting feedback from employees to assess and understand the organization’s current state, work environment, culture, leadership, and employee attitudes and experiences 

13

What is a psychological test

a systematic procedure for comparing the behavior of 2 or more people

14

Why do psychological tests measure observable events?

In some cases the behavior itself is important; in others the behavior reflects an unobservable psychological attribute

15

All forms of measurement have inaccuracy and problems such as

complexity of psychological constructs, participant reactivity, observer expectation and bias, use of composite scores (lower accuracy when contrasted with physical measurements), score sensitivity (may not capture subtle variations), lack of awareness of psychometrics

16

What is scaling

the way numerical values are assigned to psychological attributes. Important because measurement is about quantifying differences in psychological attributes. Scaling affects how scores on a measure are interpreted and how they can be used for comparisons or statistical analysis

17

What are interindividual differences

differences between people (e.g. in their levels of an attribute)

18

What are intraindividual differences

differences emerging in one person over time or in different circumstances 

19

How can you interpret results

utilizing scale anchors, comparing them with past results/benchmarks, examining them by groups, confirming their variability

20

When interpreting scores, the results produced are “raw” and ambiguous, so…

reframe them within a useful informational context

21

Test Norms

a distribution of scores that represents some relevant population. Ideally based on a large sample drawn in a way that maximizes representativeness of that population
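
As an illustration of how norms give a raw score meaning, here is a minimal sketch that places a raw score in a norm distribution using a z score and a percentile rank. The norm sample and function names are made up for the example.

```python
from statistics import mean, pstdev

# Hypothetical norm sample of raw scores (illustrative only).
norm_scores = [10, 12, 14, 15, 15, 16, 18, 20, 22, 28]

def z_score(raw, norms):
    """Distance of a raw score from the norm mean, in SD units."""
    return (raw - mean(norms)) / pstdev(norms)

def percentile_rank(raw, norms):
    """Percent of the norm sample scoring below raw (ties count half)."""
    below = sum(s < raw for s in norms)
    ties = sum(s == raw for s in norms)
    return 100 * (below + 0.5 * ties) / len(norms)

print(percentile_rank(18, norm_scores))  # 65.0
```

A raw score of 18 sits at roughly the 65th percentile of this made-up sample, which is easier to interpret than "18" alone.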

22

Ethical considerations of using AI

bias in outputs, hallucinations, non-repeatable (inconsistent) outputs, privacy and data security, copyright and intellectual property

23

What are the 4 steps for effective prompting

role

context

command

format

24

What are key tasks for a typical organizational survey project

  1. project planning and stakeholder engagement (initial consultation, scope definition, approval of survey plan)

  2. Developing organizational survey (lit review, survey design, survey structure, review & feedback from stakeholders, pilot testing, survey tool selection)

  3. Data collection (communication strategy, survey distribution, follow-up and reminders, monitor participation, incentives)

  4. Data analysis (cleaning, descriptive statistics, advanced statistical analysis, benchmarking, qualitative analysis, segmentation analysis)

  5. Reporting and deliverables (report drafting, actionable insights, review and stakeholder feedback, final report)

  6. Sharing results with employees (presentation development, leadership briefing, employee meetings, Q&As, Feedback collection)

  7. Action planning & follow-up (action plan development, communication of next steps, monitoring progress, follow-up survey)

25

Procedure for developing surveys

  1. collect info about needs

  2. planning/scheduling

  3. collect info to write items

  4. write items and check/edit items

  5. prepare a survey platform

  6. prepare other materials

26

procedure for developing tests

  1. planning/scheduling

  2. collect info to write items

  3. item generation

  4. data collection

  5. data analysis

  6. revise items

  7. data collection

  8. data analysis

  9. complete scoring algorithm

  10. prepare a test platform

  11. prepare other materials

27

Benefits of an odd numbered likert scale

provides a neutral option that lets respondents express neutrality or uncertainty, which reduces response bias and stress

28

benefit of an even likert scale

forces respondents to choose a side, reducing people taking the “easy way out” with a neutral response

29

benefits of open & closed item formats

open: may obtain useful information that developers did not consider

closed: respondents can clearly understand the intended meaning, may remind participants of things they would not consider, analyzing the data is more straightforward

30

how many items should you generate?

for tests, 2-4x the number you want to use. For surveys, 1.5x

31

what is a bad item

ambiguous

too long

overly difficult words/phrases

multiple negatives

double barreled

leading questions

loaded questions

ambiguous pronoun references

misplaced modifiers

adjective forms instead of noun forms

32

when should you conduct EFA or CFA to determine dimensionality

if your survey contains dimensions, if it includes sections, if it has the potential to develop sections that will serve as a foundation for future analysis

33

what does it mean if correlation and mean are high

drivers are related to engagement, but the drivers may not need improvement

34

what does it mean if correlation is low and mean is high

drivers are not related to engagement and the current condition is good

35

What does it mean if correlation is high and the mean is low

drivers are related to engagement and should consider how to improve the drivers

36

what does it mean when correlation and mean are low

drivers are not related to engagement, but the current condition is not good. Provide recommendations to improve the drivers, but the priority is not high
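
The four correlation/mean combinations above can be sketched as a small decision rule. This is only an illustration: the cutoffs (`r_cut`, `mean_cut`) and the function name are assumptions, not values from the course.

```python
def driver_priority(correlation, mean_score, r_cut=0.30, mean_cut=3.5):
    """Classify a driver by its correlation with engagement and its mean.

    r_cut and mean_cut are hypothetical thresholds for "high".
    """
    related = correlation >= r_cut
    good = mean_score >= mean_cut
    if related and good:
        return "related to engagement; may not need improvement"
    if related and not good:
        return "related to engagement; consider how to improve"
    if not related and good:
        return "not related to engagement; current condition good"
    return "not related to engagement; improve, but low priority"

print(driver_priority(0.55, 2.8))
```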

37

What does Natural Language Processing (NLP) mean?

a set of techniques used to analyze written and spoken language. Used in psychometrics to analyze open-ended questions

38

What is Word-Level Analysis NLP

counts which words appear most often in responses. Results can be shown in bar charts or word clouds. Shows what employees care most about.
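
A minimal sketch of counting words across open-ended responses, using only the Python standard library. The responses and the stopword list are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list and responses (illustrative only).
STOPWORDS = {"the", "a", "is", "and", "to", "my", "i", "more"}
responses = [
    "My manager is supportive",
    "I want more pay and flexibility",
    "Pay is low",
    "The manager listens",
]

def word_counts(texts):
    """Count non-stopword tokens across all responses."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

counts = word_counts(responses)
print(counts.most_common(2))
```

The resulting counts are what a bar chart or word cloud would be drawn from.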

39

What is Grouping Responses (Clustering & Topic Modeling) NLP

Groups together responses with similar content. Automatically creates clusters so that similar opinions fall into the same group. Identifies themes within responses

40

What is Sentiment & Evaluation Analysis

Identifies positive/negative classification. Can go deeper with Multi-Dimensional Emotion Analysis to identify other emotions like anger or joy
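
One very simple way to get a positive/negative classification is a word-list (lexicon) approach. The sketch below assumes tiny, made-up word lists; real sentiment tools use much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons (illustrative only).
POSITIVE = {"good", "great", "supportive", "happy"}
NEGATIVE = {"bad", "low", "stressful", "unfair"}

def sentiment(text):
    """Label a response by counting positive vs. negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("My manager is supportive and great"))
```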

41

What is Classical Test Theory (CTT)

An assumption that the observed score is the sum of a true score and a random error. Needed for reliability

42

What is reliability

the degree to which observed score differences are consistent with true score differences
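
Under CTT (observed = true + random error), reliability can be expressed as the ratio of true score variance to observed score variance. A minimal sketch with made-up true scores and errors, constructed here so that the errors are uncorrelated with the true scores:

```python
from statistics import pvariance

# Hypothetical true scores and random errors (uncorrelated by construction).
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Reliability = var(T) / var(X); here var(X) = var(T) + var(E) = 25 + 1.
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 3))  # 0.962
```

In practice true scores are never observed, so reliability has to be estimated (for example with alpha or omega).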

43

what are the 4 key measurement models of reliability (from most to least strict)

  1. Parallel tests

  2. Tau-equivalent

  3. Essentially tau-equivalent

  4. congeneric

44

What does the parallel model assume

A person’s true score on the first test exactly equals his or her true score on the other test (Xt1 = Xt2). This means that true score means and variances, observed score means and variances, and error variances are all the same

45

What does the tau-equivalent test assume

true score means and variances and observed score means are the same, but observed score variances and error variances can differ

46

what does essentially tau-equivalent model mean

true score variances are the same, but true score means, observed score means and variances, and error variances can differ

47

what does congeneric model mean

the two tests measure the same construct, but true score means and variances, observed score means and variances, and error variances can all differ

48

Raw alpha vs standardized alpha

Raw alpha is what we normally think of. Standardized alpha is obtained when items are standardized (z-scored) before being aggregated. Use standardized alpha when one dimension includes items with different numbers of response options on the Likert scale

49

What does Cronbach’s alpha assume

all items measure the same true score with equal strength
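
For reference, a minimal sketch of raw and standardized alpha using the usual textbook formulas: raw alpha from item and total-score variances, standardized alpha from the average inter-item correlation. The item data are made up, and each inner list holds one item's scores across respondents.

```python
from itertools import combinations
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def raw_alpha(items):
    """k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def standardized_alpha(items):
    """k * r_bar / (1 + (k - 1) * r_bar), r_bar = mean inter-item r."""
    k = len(items)
    r_bar = mean([pearson(a, b) for a, b in combinations(items, 2)])
    return k * r_bar / (1 + (k - 1) * r_bar)

# Hypothetical data: 3 items answered by 6 respondents.
items = [
    [4, 5, 3, 2, 4, 1],
    [4, 4, 3, 2, 5, 2],
    [5, 5, 2, 1, 4, 2],
]
print(round(raw_alpha(items), 2), round(standardized_alpha(items), 2))
```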

50

what is omega

estimates reliability that is accurate in a wider range of circumstances than alpha (less strict set of assumptions)

51

how to improve reliability

longer tests, stronger internal consistency
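
The effect of test length is commonly illustrated with the Spearman-Brown prophecy formula (not named on the card, added here as a standard reference point). A minimal sketch, assuming the added items are comparable to the existing ones:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)

# Doubling a test with reliability .60 (hypothetical value).
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```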

52

What are statistical indices for interrater reliability

  1. Cohen’s Kappa

  2. Fleiss Kappa

  3. Intraclass correlation (ICC)
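
As an example of one of these indices, here is a minimal sketch of Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. The ratings are made up.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """(p_observed - p_chance) / (1 - p_chance) for two raters."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's category proportions.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical yes/no ratings of 10 subjects by two raters.
rater_a = list("YYYYYNNNNN")
rater_b = list("YYYYNNNNNY")
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

Here the raters agree on 8 of 10 subjects (.80), chance agreement is .50, so kappa is .60.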

53

What are the types of ICC

  • Case 1, Case (1,1): one-way, agreement, single

  • Case 2, Case (2,1): two-way random, agreement, single

  • Case 3, Case (3,1): two-way mixed, consistency, single

  • Case 1 (1,k): one-way, agreement, average

  • Case 2 (2,k): two-way random, agreement, average

  • Case 3 (3,k): two-way mixed, consistency, average

54

What is a one-way model

each subject rated by a different set of randomly selected raters

55

What is a two-way model?

Random: subjects rated by same raters who are randomly selected

Mixed: subjects rated by same set of fixed raters

56

What is consistency?

extent to which raters agree on a relative order of the subjects

57

what is agreement?

extent to which raters assign the same score to the same subject

58

what is single data

raw data (a single rater’s ratings) is used for the calculation

59

what is average data

average data (the mean of the raters’ ratings) is used for the calculation

60

How to determine if items are consistent with the rest of the test?

  1. item-total correlations

  2. item discrimination index

  3. alpha if item deleted
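
A minimal sketch of the first index, computed here as a corrected item-total correlation (each item is correlated with the sum of the other items, so the item does not inflate its own total). The data are made up, and item 3 is deliberately inconsistent with the rest.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

def corrected_item_total(items):
    """Correlate each item with the total of the remaining items."""
    results = []
    for i, item in enumerate(items):
        others = [it for j, it in enumerate(items) if j != i]
        rest_total = [sum(scores) for scores in zip(*others)]
        results.append(pearson(item, rest_total))
    return results

# Hypothetical data: 3 items answered by 5 respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [5, 1, 4, 2, 3],  # behaves differently from the other two
]
print([round(r, 2) for r in corrected_item_total(items)])
```

An item with a low or negative value, like item 3 here, is a candidate for revision or removal.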

61

why is item validity important?

If an item has low validity, it may not be able to detect differences between high and low performers. Items with low validity tend not to correlate with other items

62

what is an example of direct evidence of validity

interviews with respondents thinking out loud

63

what is an example of indirect evidence of validity

eye tracking, response times, statistical analysis, experimental studies of processes

64

types of validity evidence

  • direct

  • indirect

  • convergent

  • discriminant

  • criterion

  • associations that a test should have with other measures

  • nomological network

65

what are focused examinations for validity evidence

only a few criterion variables have strong relevance for the meaning of the scores. Instead of looking at a wide range of variables, you select a few key ones to study in depth

66

what is unsystematic examination of sets of correlations for validity evidence

several criterion variables are examined. Conclusions regarding convergent and discriminant validity are drawn by “eyeballing” the pattern of correlations

67

what is the multi-trait multi-method matrix (MTMM) for validity evidence

several other measures are examined to systematically evaluate the pattern of correlations and draw conclusions regarding convergent and discriminant validity

68

what is the systematic examination of sets of correlations for validity evidence

evaluate the pattern of correlations and draw conclusions regarding convergent and discriminant validity

69

what are factors affecting observed associations for validity evidence

  • restriction of range

  • method variance

  • time

  • predictions of single events

  • criterion issues

  • unrepresentative sample

  • cultural or contextual differences

70

what is validity correction

accounts for measurement error and restriction of range. should still report uncorrected estimates too. Issues: impacts reliability
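
The measurement-error part of the correction is usually done with the classic correction for attenuation, which divides the observed validity by the square root of the product of the two reliabilities. A minimal sketch with made-up values:

```python
def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation if both measures were perfectly reliable.

    r_xy: observed validity; r_xx, r_yy: reliabilities of test and criterion.
    """
    return r_xy / (r_xx * r_yy) ** 0.5

# Hypothetical values: observed r = .30, reliabilities .81 and .64.
print(round(correct_for_attenuation(0.30, 0.81, 0.64), 3))  # 0.417
```

Reporting both .30 (uncorrected) and .417 (corrected) follows the advice on the card.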

71

what is direct range restriction

individuals are screened on the procedure that is being validated
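
For direct range restriction, a commonly used correction is Thorndike's Case II formula, which needs the ratio of the unrestricted (applicant) SD to the restricted (selected group) SD on the predictor. A minimal sketch with made-up values; the function and parameter names are illustrative.

```python
def correct_direct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II estimate of the unrestricted correlation."""
    u = sd_unrestricted / sd_restricted
    r = r_restricted
    return r * u / (1 - r ** 2 + (r ** 2) * (u ** 2)) ** 0.5

# Hypothetical: r = .30 among those hired, applicant SD twice the hired SD.
print(round(correct_direct_range_restriction(0.30, 10.0, 5.0), 3))  # 0.532
```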

72

what is indirect range restriction

the procedure being validated is correlated with one or more of the procedures used for selection

73

what is transportability

the process of using validity evidence from one situation and applying it to another without conducting a new validation study

74

what is synthetic validity/job component validity

estimating validity of selection measures by breaking down job into components and predicting how well various selection measures predict success on those components

75

What is response bias

the tendency for participants to respond inaccurately or falsely to questions

76

what is test bias?

when the test systematically obscures differences between groups

77

Examples of response bias

  • acquiescence

  • extremity and modesty

  • social desirability

  • malingering

  • random/careless responding

  • guessing

78

examples of test bias

  • construct bias

  • predictive bias

79

how to deal with response bias

manage testing context, testing content, scoring, or use specialized tests

80

what is the importance of test bias

tests need to differentiate among people based on real psychological differences 

81

what does construct bias look like for test bias

scores may have different meanings for different groups. differences in test scores may not reflect true group differences

82

how can you detect construct bias

  • differential reliability (examine internal structures of test scores)

  • differential rank order of item difficulties (items difficult for one group are not for others)

  • differential item discrimination index (compute for each group, are the indices different)

  • differential dimensionality (is the factor structure different across groups)

  • differential item functioning (based on IRT, compare item properties across groups)

83

what is predictive bias in terms of test bias

if a test is more predictive for some groups than others

84

types of predictive bias

  • intercept bias (do 2 groups have the same intercept)

  • slope bias (do 2 groups have the same slope)
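
One simple way to look at both kinds of predictive bias is to fit a separate regression line (criterion on test score) for each group and compare the intercepts and slopes. A minimal sketch with made-up data; a real analysis would test the differences statistically.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares slope and intercept for y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Hypothetical test scores (x) and criterion scores (y) for two groups.
test_scores = [1, 2, 3, 4]
criterion_a = [2, 4, 6, 8]
criterion_b = [3, 5, 7, 9]

slope_a, intercept_a = fit_line(test_scores, criterion_a)
slope_b, intercept_b = fit_line(test_scores, criterion_b)
print(slope_a, intercept_a, slope_b, intercept_b)
```

In this made-up case the slopes match (no slope bias) but the intercepts differ by 1 (intercept bias): the same test score predicts a different criterion level in each group.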

85

types of forced choice questions

  • binary preference (which out of these 2 is more like you)

  • blocks (rank the options from most to least true/like you)

  • graded preference (out of these options, indicate on a Likert-type scale how much more one is like you than the other)

  • proportional preference (distribute 100 points across the options based on how much each is like you)

86

advantages of forced choice

-reduced response bias

-increases dimensionality

-do not need verbal and non verbal anchors

-reduced faking

87

disadvantages of forced-choice

-ipsative scores (only allow comparisons within an individual, unless you use advanced IRT)

-more difficult to develop

-need more data collection

88

What is classical test theory

framework for assessing the reliability of tests and measurements by assuming: observed score = true score + random error. Assumes a linear relationship between item responses and the level of the construct. Difficulty and discrimination are calculated based on relationships between the item and the whole test

89

What is item response theory

focuses on each item rather than the test as a whole. Each item has its own difficulty and discrimination parameters. If used in a different item set, the item retains its parameters

90

Differences between CTT for non-cognitive and cognitive items?

Non-cog: focus on descriptive stats, ICC, non response rate

Cog: focus on difficulty, discrimination, response distributions, point biserial correlation

91

what is the average, desirable item difficulty for a cognitive item

.5 (difficulty = # who got it correct / total # of test takers)

92

what is the acceptable and ideal item discrimination for a cognitive item

20+ is acceptable, 30+ is ideal, in percentage points (create high and low performer groups based on test score. Calculate the % in each group who answered the item correctly. Discrimination = high performer % - low performer %)
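
A minimal sketch of item difficulty and the discrimination index as described on these cards. The 0/1 response data are made up, and splitting respondents into top and bottom halves is an assumption (other splits, such as upper and lower 27%, are also used).

```python
def item_difficulty(responses, item):
    """Proportion of respondents who answered the item correctly."""
    return sum(r[item] for r in responses) / len(responses)

def discrimination_index(responses, item):
    """High performer % correct minus low performer % correct."""
    ranked = sorted(responses, key=sum, reverse=True)  # by total score
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[-half:]

    def pct_correct(group):
        return 100 * sum(r[item] for r in group) / len(group)

    return pct_correct(high) - pct_correct(low)

# Hypothetical data: 6 respondents x 3 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(item_difficulty(responses, 2))                  # 0.5
print(round(discrimination_index(responses, 2), 1))   # 33.3
```

The third item has the desirable difficulty of .5 and a discrimination of about 33 percentage points, which falls in the ideal range.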

93

what is point biserial correlation for a cognitive item

correlation between performance on the item and performance on the total test. should be more than .2
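
Because the item is scored 0/1, the point-biserial correlation is just the Pearson correlation between the item scores and the total scores. A minimal sketch with made-up data; note that the total here includes the item itself, as on the card.

```python
from statistics import mean

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item and total test scores."""
    mx, my = mean(item_scores), mean(total_scores)
    cov = sum((a - mx) * (b - my) for a, b in zip(item_scores, total_scores))
    ssx = sum((a - mx) ** 2 for a in item_scores)
    ssy = sum((b - my) ** 2 for b in total_scores)
    return cov / (ssx * ssy) ** 0.5

# Hypothetical data: one item's 0/1 scores and total scores for 6 test takers.
item_scores = [1, 0, 1, 0, 1, 0]
total_scores = [3, 2, 2, 1, 1, 0]
print(round(point_biserial(item_scores, total_scores), 2))  # 0.52
```

A value of .52 is well above the .2 guideline.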