What does it mean to analyze test/survey development in a social manner?
Is testing fair? Is it socially beneficial? How does testing affect society?
What does it mean to analyze test/survey development in an ethical manner?
Test takers’ and survey takers’ privacy? Unintended use of test/survey data
What does it mean to analyze test/survey development in a legal manner?
Legal defense, adverse impact, etc
What does it mean to analyze test/survey development in a cultural manner?
Can the test/survey be used in that cultural setting? Are the constructs universal?
What does it mean to analyze test/survey development in a professional manner?
Is the test really necessary? What is the best way to assess it?
What does it mean to analyze test/survey development in a scientific manner?
Is the assessment validated? Appropriate development processes?
What are the SIOP Principles?
Principles for the validation and use of personnel selection procedures; provides practical and psychometric guidance
What are the APA Standards?
Standards for educational and psychological tests
What are the APA Rights of Test Takers
provides ethical guidance around the rights and responsibilities of test takers
What are the Uniform Guidelines
Legal guidelines. Follows the trinitarian view of validity
What is Divided Loyalties in terms of ethical dilemmas for IOs
IOs are often retained by the organization, but the participant is an individual. You should clearly define roles and expectations when conducting organizational research, use formal agreements specifying potential actions with ethical implications, and, when asked to engage in unethical behavior, inform the organization of the violation
What is an organizational survey?
a systematic method of collecting feedback from employees to assess and understand the organization’s current state, work environment, culture, leadership, and employee attitudes and experiences
What is a psychological test
a systematic procedure for comparing the behavior of 2 or more people
Why do psychological tests measure observable events
In some cases the behavior itself is important; in others, the behavior can reflect an unobservable psychological attribute
All forms of measurement have inaccuracy and problems such as
complexity of psychological constructs, participant reactivity, observer expectation and bias, use of composite scores (lower accuracy when contrasted with physical measurements), score sensitivity (may not capture subtle variations), lack of awareness of psychometrics
What is scaling
the way numerical values are assigned to psychological attributes. Important because measurement is about quantifying differences in psychological attributes; scaling affects the interpretation of scores and their use for comparisons or statistical analysis
What are interindividual differences
differences between people (e.g. in their levels of an attribute)
What are intraindividual differences
differences emerging in one person over time or in different circumstances
How can you interpret results
utilizing scale anchors, comparing them with past results/benchmarks, examining them by groups, confirming their variability
When interpreting scores, the results produced are “raw” and ambiguous so…
reframe them within a useful informational context
Test Norms
a distribution of scores that represents some relevant population. Ideally a large sample, drawn in a way that maximizes representativeness of the relevant population
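Where a norm group’s mean and SD are known, a raw score can be re-expressed against the norms. A minimal sketch, assuming a roughly normal norm distribution (all numbers hypothetical):

```python
# Convert a raw score to a z-score and percentile against a norm group.
# Norm-group mean/SD and the raw score are made-up values for illustration.
from scipy import stats

norm_mean, norm_sd = 50.0, 10.0
raw_score = 63.0

z = (raw_score - norm_mean) / norm_sd        # standardized (z) score
percentile = stats.norm.cdf(z) * 100         # assumes normally distributed norms

print(f"z = {z:.2f}, percentile = {percentile:.1f}")   # z = 1.30, percentile = 90.3
```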
Ethical considerations of using AI
bias in outputs, hallucinations, non-reproducible outputs, privacy and data security, copyright and intellectual property
What are the 4 steps for effective prompting
role
context
command
format
What are key tasks for a typical organizational survey project
project planning and stakeholder engagement (initial consultation, scope definition, approval of survey plan)
Developing organizational survey (lit review, survey design, survey structure, review & feedback from stakeholders, pilot testing, survey tool selection)
Data collection (communication strategy, survey distribution, follow-up and reminders, monitor participation, incentives)
Data analysis (cleaning, descriptive statistics, advanced statistical analysis, benchmarking, qualitative analysis, segmentation analysis)
Reporting and deliverables (report drafting, actionable insights, review and stakeholder feedback, final report)
Sharing results with employees (presentation development, leadership briefing, employee meetings, Q&As, Feedback collection)
Action planning & follow-up (action plan development, communication of next steps, monitoring progress, follow-up survey)
Procedure for developing surveys
collect info about needs
planning/scheduling
collect info to write items
write items and check/edit items
prepare a survey platform
prepare other materials
procedure for developing tests
planning/scheduling
collect info to write items
item generation
data collection
data analysis
revise items
data collection
data analysis
complete scoring algorithm
prepare a test platform
prepare other materials
Benefits of an odd-numbered Likert scale
allows for a neutral option so respondents can express neutrality or uncertainty, which reduces response bias and stress
benefit of an even-numbered Likert scale
forces respondents to choose a side, reducing people taking the “easy way out” with a neutral response
benefits of open & closed item formats
open: may obtain useful information that developers did not consider
closed: respondents can clearly understand the intended meaning, may remind participants of things they would not consider, analyzing the data is more straightforward
how many items should you generate?
for tests, 2-4x the amount you want to use. for surveys, 1.5x
what is a bad item
ambiguous
too long
too difficult words/phrases
multiple negatives
double barreled
leading questions
loaded questions
ambiguous pronoun references
misplaced modifiers
adjective forms instead of noun forms
when should you conduct EFA or CFA to determine dimensionality
if your survey contains dimensions, includes sections, or could develop sections that will serve as a foundation for future analyses
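A minimal EFA sketch using the third-party factor_analyzer package (an assumed tool choice; the file name and item layout are hypothetical). A CFA would instead test a prespecified structure with an SEM tool:

```python
# Exploratory factor analysis on a respondents-by-items matrix of responses.
import pandas as pd
from factor_analyzer import FactorAnalyzer   # pip install factor_analyzer

items = pd.read_csv("survey_items.csv")      # hypothetical item-response file

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")   # oblique: factors may correlate
fa.fit(items)

loadings = pd.DataFrame(fa.loadings_, index=items.columns)
print(loadings.round(2))   # items that load together suggest sections/dimensions
```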
what does it mean if correlation and mean are high
drivers are related to engagement, but since the current condition is already good, improving the drivers may not be necessary
what does it mean if correlation is low and mean is high
drivers are not related to engagement and the current condition is good
What does it mean if correlation is high and the mean is low
drivers are related to engagement and should consider how to improve the drivers
what does it mean when correlation and mean are both low
drivers are not related to engagement and the current condition is not good. Provide recommendations to improve the drivers, but the priority is not high (see the sketch below)
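A minimal sketch of the correlation-by-mean logic in the four cards above, assuming a DataFrame with one engagement column and several driver columns (file name, column names, and cut points are all hypothetical):

```python
import pandas as pd

df = pd.read_csv("engagement_survey.csv")    # hypothetical survey data
drivers = [c for c in df.columns if c != "engagement"]

summary = pd.DataFrame({
    "corr_with_engagement": df[drivers].corrwith(df["engagement"]),
    "mean": df[drivers].mean(),
})

# Illustrative cut points (r = .30, scale midpoint 3.0); use your own benchmarks.
summary["high_priority"] = (summary["corr_with_engagement"] >= .30) & (summary["mean"] < 3.0)
print(summary.sort_values("high_priority", ascending=False))
```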
What does Natural Language Processing (NLP) mean?
a set of techniques used to analyze written and spoken language. Used in psychometrics to analyze open-ended questions
What is Word-Level Analysis NLP
counts which words appear most often in responses. Results can be shown in bar charts or word clouds. Shows what employees care most about.
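A minimal word-level sketch using only the standard library; the responses and stop-word list are stand-ins:

```python
from collections import Counter
import re

responses = [
    "More flexible hours would help",
    "Communication from leadership needs work",
    "Flexible scheduling and better communication",
]
stopwords = {"more", "would", "from", "and", "needs", "better", "help", "work"}

words = [w for r in responses
         for w in re.findall(r"[a-z']+", r.lower())
         if w not in stopwords]
print(Counter(words).most_common(5))   # e.g. [('flexible', 2), ('communication', 2), ...]
```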
What is Grouping Responses (Clustering & Topic Modeling) NLP
Groups together responses with similar content. Automatically creates clusters so that similar opinions fall into the same group. Identifies themes within responses
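A minimal clustering sketch using scikit-learn’s TF-IDF vectorizer plus k-means; topic modeling (e.g., LDA) follows a similar pipeline. The responses and the cluster count are hypothetical choices:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

responses = ["pay is too low", "salary needs review", "my manager never listens",
             "leadership ignores feedback", "compensation is not competitive"]

X = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for label, text in sorted(zip(labels, responses)):
    print(label, text)   # pay/salary comments should cluster apart from manager comments
```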
What is Sentiment & Evaluation Analysis
Identifies positive/negative classification. Can go deeper with Multi-Dimensional Emotion Analysis to identify other emotions like anger or joy
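A minimal positive/negative sketch using NLTK’s VADER lexicon (one assumed tool among many; the lexicon needs a one-time download):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
sia = SentimentIntensityAnalyzer()

for text in ["I love the new flexible schedule", "Management never listens to us"]:
    score = sia.polarity_scores(text)["compound"]   # -1 (negative) to +1 (positive)
    print(f"{score:+.2f}  {text}")
```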
What is Classical Test Theory (CTT)
The assumption that an observed score is the sum of a true score and random error. This assumption underlies reliability
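A minimal simulation of the CTT decomposition, with arbitrary distribution choices, showing that reliability works out to var(T)/var(X):

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(50, 10, size=10_000)   # true scores T
error = rng.normal(0, 5, size=10_000)    # random error E, uncorrelated with T
observed = true + error                  # observed scores X = T + E

reliability = true.var() / observed.var()
print(round(reliability, 2))   # ~ 10**2 / (10**2 + 5**2) = 0.80
```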
What is reliability
the degree to which observed score differences are consistent with true score differences
what are the 4 key measurement models of reliability ( from most to least strict)
Parallel tests
Tau-equivalent
Essentially tau-equivalent
congeneric
What does the parallel model assume
A person’s true score on the first testing exactly equals his or her true score on the other testing (Xt1 = Xt2). This means that true score means and variances, observed score means and variances, and error variances are all the same
What does the tau-equivalent test assume
true score means and variances and observed score means are the same, but observed score variances and error variances are not
what does essentially tau-equivalent model mean
true score variances are the same, but true score means, observed score means and variances, and error variances are different
what does congeneric model mean
the two tests measure the same construct, but true score means and variances, observed score means and variances, and error variances are different
Raw alpha vs standardized alpha
raw alpha is what we normally think of; standardized alpha is obtained when we standardize (z-score) items before aggregating them. Use standardized alpha when one dimension includes items with different numbers of Likert-scale choices
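A minimal sketch of both coefficients from first principles; standardized alpha is just raw alpha computed on z-scored items (the simulated data are arbitrary):

```python
import numpy as np

def raw_alpha(items):
    """Cronbach's alpha from a respondents-by-items array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def standardized_alpha(items):
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    return raw_alpha(z)

data = np.random.default_rng(1).multivariate_normal(
    mean=[0, 0, 0], cov=[[1, .5, .5], [.5, 1, .5], [.5, .5, 1]], size=500)
print(round(raw_alpha(data), 2), round(standardized_alpha(data), 2))   # both ~ .75
```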
What does cronbach’s alpha assume
all items measure the same true score with equal strength
what is omega
estimates reliability that is accurate in a wider range of circumstances than alpha (less strict set of assumptions)
how to improve reliability
longer tests, stronger internal consistency
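The lengthening effect can be quantified with the Spearman-Brown prophecy formula: the projected reliability of a test lengthened by a factor k, given current reliability r.

```python
def spearman_brown(r, k):
    """Projected reliability after lengthening a test by factor k."""
    return (k * r) / (1 + (k - 1) * r)

print(round(spearman_brown(0.70, 2), 2))   # doubling a .70 test -> 0.82
```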
What are statistical indices for interrater reliability
Cohen’s Kappa
Fleiss Kappa
Intraclass correlation (ICC)
What are the types of ICC
Case 1, Case (1,1): one-way, consistency, single
Case 2, Case (2,1): two-way random, agreement, single
Case 3, Case (3,1): two-way mixed, consistency, single
Case 1, Case (1,k): one-way, consistency, average
Case 2, Case (2,k): two-way random, agreement, average
Case 3, Case (3,k): two-way mixed, consistency, average
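A minimal sketch using the third-party pingouin package (an assumed tool choice). Data are long-format, one row per subject-rater pair, with hypothetical values:

```python
import pandas as pd
import pingouin as pg   # pip install pingouin

df = pd.DataFrame({                      # 4 subjects rated by 3 raters
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "rater":   ["a", "b", "c"] * 4,
    "score":   [7, 8, 7, 4, 5, 4, 9, 9, 8, 3, 2, 3],
})

icc = pg.intraclass_corr(data=df, targets="subject", raters="rater", ratings="score")
print(icc[["Type", "ICC"]])   # rows ICC1..ICC3k map onto the six cases above
```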
What is a one-way model
each subject rated by a different set of randomly selected raters
What is a two-way model?
Random: subjects rated by same raters who are randomly selected
Mixed: subjects rated by same set of fixed raters
What is consistency?
extent to which raters agree on a relative order of the subjects
what is agreement?
extent to which raters assign the same score to the same subject
what is single data
raw data is used for the calculation
what is average data
average data is used for the calculation
How to determine if items are consistent with the rest of the test?
item-total correlations
item discrimination index
alpha if item deleted
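A minimal sketch of the item-total correlation and alpha-if-item-deleted checks (hypothetical Likert data; the raw-alpha helper mirrors the one sketched earlier):

```python
import numpy as np
import pandas as pd

def raw_alpha(items):
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

items = pd.DataFrame(np.random.default_rng(2).integers(1, 6, size=(200, 5)),
                     columns=[f"q{i}" for i in range(1, 6)])

for col in items.columns:
    rest = items.drop(columns=col).sum(axis=1)     # total score without this item
    r_it = items[col].corr(rest)                   # corrected item-total correlation
    a_del = raw_alpha(items.drop(columns=col))     # alpha if item deleted
    print(f"{col}: item-total r = {r_it:.2f}, alpha if deleted = {a_del:.2f}")
```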
why is item validity important?
if validity is low, the item may not be able to detect differences between high and low performers, and items with low validity will not correlate well with other items
what is an example of direct evidence of validity
interviews with respondents thinking out loud
what is an example of indirect evidence of validity
eye tracking, response times, statistical analysis, experimental studies of processes
types of validity evidence
direct
indirect
convergent
discriminant
criterion
what do we call the associations that a test should have with other measures
nomological network
what are focused examinations for validity evidence
very few criteria have strong relevance for the meaning of scores. Instead of looking at a wide range of variables, you select a few key ones to study in depth
what is unsystematic examination of sets of correlations for validity evidence
several criterion variables are examined; you “eyeball” the pattern of correlations and draw conclusions regarding convergent and discriminant validity
what is the multi-trait multi-method matrix (MTMM) for validity evidence
several other measures are examined to systematically evaluate the pattern of correlations and draw conclusions regarding convergent and discriminant validity
what is the systematic examination of sets of correlations for validity evidence
evaluate the pattern of correlations and draw conclusions regarding convergent and discriminant validity
what are factors affecting observed associations for validity evidence
restriction of range
method variance
time
predictions of single events
criterion issues
unrepresentative sample
cultural or contextual differences
what is validity correction
accounts for measurement error and restriction of range. Uncorrected estimates should still be reported too. Issue: the correction is only as accurate as the reliability estimates it relies on
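A minimal sketch of the classic correction for attenuation (the observed validity divided by the square root of the two reliabilities); numbers are illustrative, and the uncorrected value should be reported alongside:

```python
def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for measurement error in x and y."""
    return r_xy / (rel_x * rel_y) ** 0.5

print(round(disattenuate(0.30, 0.80, 0.70), 2))   # observed .30 -> corrected .40
```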
what is direct range restriction
individuals are screened on the procedure that is being validated
what is indirect range restriction
the procedure being validated is correlated with one or more of the procedures used for selection
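For direct range restriction, the standard (Thorndike Case II) correction rescales the restricted correlation by u, the ratio of unrestricted to restricted predictor SDs (values illustrative):

```python
def correct_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction."""
    return (r * u) / (1 - r**2 + (r**2) * (u**2)) ** 0.5

print(round(correct_range_restriction(0.25, 1.5), 2))   # restricted .25 -> ~ .36
```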
what is transportability
the process of using validity evidence from one situation and applying it to another without conducting a new validation study
what is synthetic validity/job component validity
estimating validity of selection measures by breaking down job into components and predicting how well various selection measures predict success on those components
What is response bias
the tendency for participants to respond inaccurately or falsely to questions
what is test bias?
when the test systematically obscures differences between groups
Examples of response bias
acquiescence
extremity and modesty
social desirability
malingering
random/careless responding
guessing
examples of test bias
construct bias
predictive bias
how to deal with response bias
manage testing context, testing content, scoring, or use specialized tests
what is the importance of test bias
tests need to differentiate among people based on real psychological differences
what does construct bias look like for test bias
scores may have different meanings for different groups. differences in test scores may not reflect true group differences
how can you detect construct bias
differential reliability (examine internal structures of test scores)
differential rank order of item difficulties (items difficult for one group are not for others)
differential item discrimination index (compute for each group, are the indices different)
differential dimensionality (is the factor structure different across groups)
differential item functioning (based on IRT, compare item properties across groups)
what is predictive bias in terms of test bias
if a test is more predictive for some groups than others
types of predictive bias
intercept bias (do 2 groups have the same intercept)
slope bias (do 2 groups have the same slope)
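Both forms are commonly tested with moderated regression; a minimal statsmodels sketch, with hypothetical column names (test score x, criterion y, group membership group):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("selection_data.csv")   # hypothetical: columns x, y, group

model = smf.ols("y ~ x * group", data=df).fit()
print(model.summary())
# A significant `group` coefficient suggests intercept bias;
# a significant `x:group` interaction suggests slope bias.
```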
types of forced choice questions
binary preference (which out of these 2 is more like you)
blocks (rank the options from most to least true of you/like you)
graded preference (out of these options, select on a Likert-type scale the degree to which one option is more like you)
proportional preference (assign point values to each option, totaling 100, based on how much each is like you)
advantages of forced choice
-reduced response bias
-increases dimensionality
-do not need verbal and non-verbal anchors
-reduced faking
disadvantages of forced-choice
-ipsative scores (only a comparison within an individual, unless you use advanced IRT)
-more difficult to develop
-need more data collection
What is classical test theory
framework for assessing the reliability of tests and measurements by assuming: observed score = true score + random error. Linear relationship between item responses and level of the construct. Difficulty and discrimination are calculated based on relationships between the item and the whole test
What is item response theory
focuses on each item, not the test. Each item independently has difficulty and discrimination information; if used in a different item set, the item retains its parameters
Differences between CTT for non-cognitive and cognitive items?
Non-cog: focus on descriptive stats, ICC, non response rate
Cog: focus on difficulty, discrimination, response distributions, point biserial correlation
what is the average, desirable item difficulty for a cognitive item
.5 (# who got it correct/total)
what is the acceptable and ideal item discrimination for a cognitive item
.20+ acceptable, .30+ ideal (create high and low performer groups based on total test score, calculate the % who answered the item correctly in each group, then take the high performer % minus the low performer %)
what is point biserial correlation for a cognitive item
correlation between performance on the item and performance on the total test. Should be above .20
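A minimal sketch computing all three cognitive item statistics above from a 0/1-scored response matrix (data simulated, so the values are arbitrary; the 27% split for the discrimination groups is the common convention):

```python
import numpy as np
import pandas as pd

scored = pd.DataFrame(np.random.default_rng(3).integers(0, 2, size=(100, 10)))
total = scored.sum(axis=1)

difficulty = scored.mean()                      # p = proportion answering correctly
high = scored[total >= total.quantile(.73)]     # top ~27% of scorers
low = scored[total <= total.quantile(.27)]      # bottom ~27% of scorers
discrimination = high.mean() - low.mean()       # discrimination index D
point_biserial = scored.apply(lambda item: item.corr(total - item))   # corrected r_pb

print(pd.DataFrame({"p": difficulty, "D": discrimination, "r_pb": point_biserial}).round(2))
```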