A set of 500 English question-and-answer flashcards covering psychological testing, assessment principles, psychometrics, statistics, test construction, major tests, ethics, and related concepts for exam review.
What does the term “psychological testing” refer to?
The process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behaviour.
In psychological testing, what is typically the end product?
A numerical test score or series of scores used for evaluation.
Who is considered the key decision-maker in psychological assessment?
The assessor conducting the evaluation.
What is an ecological momentary assessment?
An “in-the-moment” evaluation of problems and related variables at the time and place they occur.
Define collaborative assessment.
An assessment approach in which the assessor and assessee work as partners from initial contact through feedback.
What is meant by therapeutic assessment?
An approach that encourages self-discovery and new understanding while assessment is conducted.
Describe a dynamic assessment.
An interactive approach that follows the model: evaluation → intervention → re-evaluation.
What is a psychological test?
A device or procedure designed to measure variables related to psychology.
List two basic elements every test item has.
A stimulus (the item itself) and a response to be scored.
What is a cut-score?
A reference point, derived by judgment, used to divide data into two or more classifications.
What does psychometric soundness refer to?
The technical quality and scientific adequacy of a test.
Who is a psychometrist?
A professional who uses, analyses, and interprets psychological test data.
Differentiate achievement and aptitude tests.
Achievement measures prior learning; aptitude estimates potential for future learning.
What does an intelligence test aim to measure?
General potential to solve problems, adapt, think abstractly, and learn from experience.
Define non-standardised interview.
An unstructured interview pursuing relevant ideas in depth without preset questions.
What is a semi-standardised interview?
An interview with optional probing on a limited set of prepared questions.
Explain the SORC model in behavioural observation.
Stimulus, Organismic variables, Response, and Consequence framework for analysing behaviour.
What makes a personality test ‘structured’?
It uses self-report statements and fixed response alternatives.
Give an example of a projective personality test.
Rorschach Inkblot Test or Thematic Apperception Test.
What is a speed test designed to evaluate?
How many items a test-taker can answer correctly within a time limit.
Contrast speed and power tests.
Speed emphasises time; power emphasises difficulty of items answered correctly.
Define a norm-referenced test.
A test that compares an individual’s score to a normative sample.
What is a criterion-referenced test?
A test that evaluates performance against a fixed standard or skill set.
What are the four main steps in the psychological assessment process?
Determine referral question, acquire knowledge, collect data, interpret data.
What is an actuarial assessment?
Evaluation relying on statistical rules rather than clinical judgment.
Describe mechanical prediction.
Use of computer algorithms plus statistics to generate assessment findings.
What is extra-test behaviour?
Observations about how an examinee behaves during testing that aid interpretation.
Define psychological trait.
A relatively enduring way one individual differs from another.
What is a psychological state?
A temporary pattern of thinking, feeling, or behaving in a specific situation.
What is cumulative scoring?
The assumption that endorsing more keyed responses indicates more of the trait.
Why is reliability important?
It indicates consistency and dependability of test scores across occasions or forms.
What is classical test theory’s basic equation?
Observed score = True score + Error.
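A minimal simulation of this equation in Python; the score distributions and variances are illustrative assumptions, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate classical test theory: observed = true + random error (toy data).
true_scores = rng.normal(loc=100, scale=15, size=10_000)  # hypothetical "true" ability
error = rng.normal(loc=0, scale=5, size=10_000)           # random measurement error
observed = true_scores + error

# Under CTT, reliability is the ratio of true-score variance to observed-score variance.
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))  # ~0.90 here (225 / 250)
```

Because the error is random, its variance adds to the true-score variance, so reliability falls as error variance grows.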
Name three potential sources of measurement error.
Assessors, measuring instruments, and random events such as luck.
Differentiate random and systematic error.
Random error fluctuates unpredictably; systematic error is consistent, staying constant or proportionate to the true value.
What is test–retest reliability appropriate for?
Measuring stability of traits expected to remain constant over time.
Define carry-over effect.
When experience with the first test administration influences the second administration.
What statistic is usually used for test–retest correlation?
Pearson r or Spearman rho, depending on data level.
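A minimal sketch using scipy.stats, assuming SciPy is available (the paired scores are hypothetical):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical test-retest scores for the same examinees on two occasions.
time1 = [12, 15, 11, 18, 20, 14, 16]
time2 = [13, 14, 12, 17, 21, 15, 15]

r, _ = pearsonr(time1, time2)      # interval-level data
rho, _ = spearmanr(time1, time2)   # ordinal (rank-order) data
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```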
What is split-half reliability?
Correlation between two halves of one test administered once.
What formula adjusts split-half coefficients to full-length?
Spearman-Brown formula.
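A quick sketch of the adjustment in Python; the general form r' = n·r / (1 + (n − 1)·r) projects reliability for a test n times as long, and n = 2 recovers full length from a half-test coefficient (the .70 figure is illustrative):

```python
def spearman_brown(r_half: float, n: float = 2.0) -> float:
    """Project reliability when test length changes by a factor of n.
    For a split-half coefficient, n = 2 restores full length."""
    return n * r_half / (1 + (n - 1) * r_half)

# An illustrative half-test correlation of .70 implies full-length reliability of ~.82.
print(round(spearman_brown(0.70), 2))  # 0.82
```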
Explain Cronbach’s alpha.
An internal-consistency index suitable when items are scored in more than two categories; it generalises KR-20.
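A sketch of the usual computing formula, α = (k/(k−1))(1 − Σσ²ᵢ/σ²ₜ), on invented Likert data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: examinees x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5 examinees x 4 Likert items (toy data).
data = np.array([[4, 5, 4, 4],
                 [2, 3, 2, 3],
                 [5, 5, 4, 5],
                 [3, 3, 3, 2],
                 [4, 4, 5, 4]])
print(round(cronbach_alpha(data), 2))
```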
What does KR-20 measure?
Internal consistency for dichotomous items with unequal difficulty.
When would KR-21 be preferred?
For dichotomous items all having equal difficulty.
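A sketch of KR-20 in Python, following the standard formula KR-20 = (k/(k−1))(1 − Σpq/σ²); the 0/1 response matrix is toy data, not from the source:

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """items: examinees x items matrix of dichotomous (0/1) scores."""
    k = items.shape[1]
    p = items.mean(axis=0)                       # proportion passing each item
    q = 1 - p
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Toy dichotomous data (illustrative).
data = np.array([[1, 1, 0, 1],
                 [0, 1, 0, 0],
                 [1, 1, 1, 1],
                 [0, 0, 0, 1],
                 [1, 1, 1, 0]])
print(round(kr20(data), 2))  # ~0.68 for this toy matrix
```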
What is parallel-forms reliability?
Correlation between scores on two versions with equal means and error variances.
Define inter-scorer reliability.
Agreement consistency between two or more scorers.
What is Fleiss’ kappa used for?
Inter-rater agreement for three or more raters on categorical data.
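A sketch of one standard formulation of Fleiss' kappa (the rating table is invented for illustration):

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts: subjects x categories matrix; each cell is how many of the
    n raters assigned that subject to that category (row sums all equal n)."""
    N, _ = counts.shape
    n = counts[0].sum()                                   # raters per subject
    p_j = counts.sum(axis=0) / (N * n)                    # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

# Toy data: 3 raters classify 4 subjects into 2 categories.
table = np.array([[3, 0],
                  [2, 1],
                  [0, 3],
                  [1, 2]])
print(round(fleiss_kappa(table), 2))  # ~0.33
```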
What does restriction of range do to reliability?
It lowers correlation coefficients and reliability estimates.
Explain standard error of measurement (SEM).
The standard deviation of error scores, reflecting score precision.
What is a confidence interval in testing?
A range around an observed score likely to contain the true score.
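A worked sketch tying the two cards together: SEM = SD·√(1 − reliability), and a 95% interval is the observed score ± 1.96·SEM (the scale values are assumptions for illustration):

```python
import math

sd, reliability = 15.0, 0.91        # hypothetical IQ-style scale
sem = sd * math.sqrt(1 - reliability)

observed = 110
z = 1.96                            # z value for a 95% confidence level
lower, upper = observed - z * sem, observed + z * sem
print(f"SEM = {sem:.2f}; 95% CI = {lower:.1f} to {upper:.1f}")  # SEM = 4.50
```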
Define validity in testing.
The degree to which evidence supports intended interpretations of test scores.
What is content validity?
How well test items represent the construct’s domain.
Describe face validity.
Whether a test appears to measure what it claims to, to test-takers.
What problem does construct underrepresentation present?
The test fails to capture essential aspects of the construct.
What is construct-irrelevant variance?
Score variance due to factors unrelated to the construct of interest.
Name the formula often used for expert item ratings in content validation.
Lawshe’s Content Validity Ratio (CVR).
What is criterion validity?
How well test scores predict or relate to an external criterion.
Differentiate predictive and concurrent validity.
Predictive uses future criteria; concurrent uses simultaneous criteria.
Define sensitivity regarding a test.
The proportion of genuinely positive cases that the test correctly identifies as positive.
What does specificity measure?
The proportion of genuinely negative cases that the test correctly identifies as negative.
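A sketch computing both indices from a hypothetical confusion matrix (all counts invented for illustration):

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    sensitivity = tp / (tp + fn)   # hit rate among actual positives
    specificity = tn / (tn + fp)   # correct-rejection rate among actual negatives
    return sensitivity, specificity

# Hypothetical screening results: 40 true positives, 10 misses,
# 45 true negatives, 5 false alarms.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.80, 0.90
```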
What is incremental validity?
The additional predictive value gained by a new test beyond existing predictors.
Explain convergent evidence for construct validity.
High correlations with other measures of the same construct.
Explain discriminant evidence.
Low correlations with measures of different constructs.
What is a multitrait-multimethod matrix used for?
Simultaneously evaluating convergent and discriminant validity across traits and methods.
Who developed factor analysis?
Charles Spearman.
What is cross-validation?
Re-testing a test’s validity on a new independent sample.
Define validity shrinkage.
The tendency for validity coefficients to decrease after cross-validation.
What is a rating error called when scores cluster at the middle?
Central tendency error.
What is the halo effect?
Rater’s inability to discriminate among separate traits, giving uniformly high ratings.
Contrast leniency and severity errors.
Leniency inflates ratings; severity deflates them.
Define utility analysis.
Cost-benefit evaluation of a testing program’s practical value.
What is a Taylor-Russell table used for?
Estimating improvement in selection decisions when a test is added.
What is a selection ratio?
The number of people to be hired divided by the number of available applicants.
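A small sketch relating the selection ratio to the hit rate used in Taylor-Russell-style reasoning (all counts are invented):

```python
# Hypothetical applicant pool.
applicants, openings = 200, 20
selection_ratio = openings / applicants      # 0.10: only 1 in 10 applicants can be hired

successful_hires = 15
hit_rate = successful_hires / openings       # 0.75: proportion of hires judged successful
print(selection_ratio, hit_rate)
```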
Explain the Angoff method.
Experts estimate probability of minimally competent candidates answering each item correctly to set cut scores.
What does IRT stand for?
Item Response Theory.
In IRT, what does item discrimination indicate?
How well an item differentiates between high and low trait levels.
Define item difficulty in IRT.
The trait level where an examinee has a 50% chance of endorsing the keyed response.
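A sketch of how discrimination (a) and difficulty (b) enter the two-parameter logistic model, P(θ) = 1 / (1 + e^(−a(θ−b))); the parameter values are illustrative:

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic IRT model: probability of a keyed response
    at trait level theta, with discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly .50, which is how difficulty is defined.
print(p_correct_2pl(theta=0.0, a=1.2, b=0.0))   # 0.5
print(p_correct_2pl(theta=1.0, a=1.2, b=0.0))   # ~0.77
```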
What is computerized adaptive testing (CAT)?
Interactive testing where item selection is based on previous responses.
What are floor effects?
Scores clustering at the lower limit of a test.
What are ceiling effects?
Scores clustering at the upper limit of a test.
Name the four levels of measurement used in psychological scaling.
Nominal, ordinal, interval, and ratio.
Give an example of a comparative scale.
Paired-comparison ranking of preferences.
What is a Likert scale?
A format where respondents rate agreement on ordered categories.
Define ipsative scale.
Respondents choose between options, producing intra-individual profiles.
What is pilot testing?
Preliminary study to refine items before full test development.
How many subjects per item are recommended for tryout?
No fewer than 5 subjects per item, and preferably as many as 10.
What is item difficulty index (p)?
Proportion of test-takers who answered the item correctly.
Describe item discrimination index (D).
Difference in item success between high and low scoring groups.
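A sketch computing p and D with the common upper/lower 27% extreme-group method (the response vectors are toy data):

```python
import numpy as np

def item_stats(responses: np.ndarray, total_scores: np.ndarray) -> tuple[float, float]:
    """responses: 0/1 answers to one item; total_scores: examinees' test totals.
    Returns (p, D) using upper/lower 27% extreme groups for D."""
    p = responses.mean()                               # difficulty: proportion correct
    cut = max(1, int(len(responses) * 0.27))
    order = np.argsort(total_scores)
    lower, upper = responses[order[:cut]], responses[order[-cut:]]
    D = upper.mean() - lower.mean()                    # discrimination index
    return p, D

# Toy response data for one item, plus each examinee's total score.
resp = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
totals = np.array([55, 30, 60, 48, 25, 52, 33, 58, 45, 28])
print(item_stats(resp, totals))  # p = 0.6, D = 1.0 here
```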
What is the optimal average item difficulty for maximal discrimination?
Approximately 0.50.
What is the formula for optimal p on a four-option multiple choice item?
(1 + chance level) / 2 = (1 + 0.25)/2 = 0.625.
Define item reliability index.
Product of item-score SD and item-total correlation, reflecting internal consistency.
What is differential item functioning (DIF)?
An item showing different probabilities of endorsement across groups at equal trait levels.
What is scoring drift?
Gradual departure of scorers from an anchor protocol over time.
What is an anchor protocol?
A model scoring guide to resolve scoring discrepancies.
List the four measures of central tendency.
Mean, median, mode, and the less commonly used midrange.
When is median preferred over mean?
When the distribution is skewed or contains extreme scores.
Define variance.
The average squared deviation of scores from the mean.
What does standard deviation represent?
The square root of the variance; roughly, the typical distance of scores from the mean.
What is a percentile rank?
Percentage of scores in a distribution that fall below a specific score.
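A sketch of the three statistics above (variance, standard deviation, and percentile rank) on a toy score distribution:

```python
import numpy as np

scores = np.array([10, 12, 12, 15, 18, 20, 25])   # toy data

variance = scores.var(ddof=0)   # average squared deviation from the mean
sd = np.sqrt(variance)          # standard deviation

# Percentile rank of 18: percentage of scores falling strictly below it.
pr = (scores < 18).mean() * 100
print(f"variance = {variance:.2f}, SD = {sd:.2f}, PR(18) = {pr:.1f}")
```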
Explain positive skew.
Distribution tail extends toward higher scores; mean > median > mode.
What is kurtosis?
The peakedness or flatness of a distribution’s centre.
What values describe a normal curve?
Mean = median = mode, symmetrical, mesokurtic.