Comprehensive Study Notes – Psychological Testing & Assessment
Psychological Testing vs. Psychological Assessment
- Psychological Test
- Measurement tool/technique requiring performance of one or more behaviours to infer attributes, traits, characteristics or predict outcomes.
- Primary goal: obtain a numerical gauge of an ability or attribute.
- Psychological Assessment
- Gathering & integration of psychology-related data (tests, interviews, case studies, behavioural observations, specialised apparatus & procedures) to answer a referral question, solve a problem or reach a decision.
- Focuses on how an individual processes, not only what was produced.
- Key Contrasts
- Objective: Testing → score(s); Assessment → reasoned answer/decision.
- Process: Testing may be group/individual; Assessment typically individualised & integrative.
- Skill: Testing → technician-like scoring & basic interpretation; Assessment → expert tool selection, data integration.
- Outcome: Testing → series of scores; Assessment → logical problem-solving approach that sheds light on the referral question.
Types of Psychological Assessment
- Collaborative Assessment – assessor & assessee work as partners throughout.
- Therapeutic Assessment – assessment itself used for self-discovery & therapeutic gain.
- Dynamic Assessment – interactive cycle: (1) evaluation, (2) intervention, (3) re-evaluation. Common in educational, correctional, corporate, neuropsychological & clinical contexts.
Psychological Tests: Major Classifications
- By Administration
- Individual Tests – administered one-to-one.
- Group Tests – one examiner → many examinees.
- By Behaviour Measured
- Ability Tests – speed, accuracy or both.
- Achievement – previous learning.
- Aptitude – potential to acquire skill.
- Intelligence – capacity to solve problems, adapt, profit from experience.
- Item & Test Types: free-response vs multiple-choice items; Power tests (few/no time limits; item difficulty differentiates examinees) vs Speeded tests (easy items, heavy time pressure); Adaptive tests (computer tailors item difficulty to the examinee).
- Personality Tests – typical behaviour (traits, temperament, dispositions).
- Structured / Objective – self-report (True/False, Yes/No).
- Projective – ambiguous stimuli; unclear response requirements (e.g., Rorschach inkblots).
Core Concepts & Terminology
- Psychological Construct – unobservable attribute inferred from behaviour.
- Test Format – layout & administration mode (paper, digital, etc.).
- Scoring – assigning evaluative codes.
- Cut Score – numerical reference point dividing classifications.
- Psychometrics – science of psychological measurement.
- Psychometric Soundness – reliability & validity of a test.
- Psychometric Utility – practical usefulness for a purpose.
- Interview (including Panel / Board interview)
- Portfolio – work products.
- Case History / Case Study – archival records integrated into illustrative account.
- Behavioural Observation – naturalistic or structured; may involve role-play tests.
- Computer Assistance
- CAPA – Computer-Assisted Psychological Assessment.
- CAT – Computer Adaptive Testing (item difficulty adapts to ability).
- Report Types: Extended Scoring, Interpretative, Consultative, Integrative.
Users of Tests
- Test Developers – >20 000 new tests/year (APA).
- Test Takers / Assessees – can include deceased (→ Psychological Autopsy).
Applications of Assessment
- Education (school ability, achievement, diagnostic, informal)
- Clinical (maladjustment, therapy outcomes, learning difficulties, forensic)
- Counseling (adjustment, productivity, attitudes/values)
- Geriatric, Business/Military (HR processes, product design, marketing), Governmental credentialing, Program evaluation, Health psychology, Disability assessment (accommodations, alternate assessments).
Historical Milestones
- China (≈2200 BCE) – imperial civil-service exams (music, archery → agriculture, law, strategy).
- 19th C diffusion: British (1832 EIC; 1855 civil service), French & German, U.S. Civil Service Commission (1883).
- Greco-Roman humours → personality types.
- Middle Ages – tests to detect witchcraft/devil association.
- Renaissance–18th C – measurement concepts (Christian von Wolff).
- Francis Galton – anthropometry, correlation, questionnaires.
- Wilhelm Wundt – first psychology lab (Leipzig, 1879); studies of reaction time (RT), perception.
- Spearman – reliability, factor analysis.
- Binet & Simon (1905) – 30-item intelligence scale → clinical testing movement.
- Wechsler (1939) – Wechsler-Bellevue Intelligence Scale, forerunner of the WAIS: adult intelligence.
- Group Intelligence Tests – WWI/II military screening.
- Woodworth Psychoneurotic Inventory – first wide self-report personality test.
Culture & Assessment Issues
- Culture Definition – socially transmitted patterns, beliefs, products.
- Problems: language comprehension, culturally loaded items, non-verbal behaviours, differing evaluation standards.
- Test & Group Membership – group score differences → debates on bias vs validity; affirmative action to level playing field.
Legal & Ethical Framework
- Laws vs Ethics – statutes vs professional principles (standard of care).
- U.S. Examples:
- Minimum Competency Testing (1970s) – grade promotion, diplomas.
- Truth-in-Testing (1980s) – disclose scoring criteria.
- APA Test-User Qualifications
- Level A – administration/scoring via manual.
- Level B – some technical knowledge (stats, psychometrics).
- Level C – advanced understanding + supervised experience.
- Special Populations – adaptations, alternative assessments.
- Computerised Testing Issues – access/security, equivalence, interpretation quality, unregulated online tests.
- Rights of Test Takers
- Informed consent (purpose, reason, instruments; written).
- Feedback of findings.
- Privacy & confidentiality; privileged communication distinctions.
- Least stigmatizing label.
Statistics Refresher
Frequency Distributions & Graphs
- Frequency distribution: raw scores + occurrence counts.
- Grouped distributions use class intervals.
- Graphs: Histogram, Bar graph, Frequency polygon.
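A minimal sketch of the two tabulations above, assuming whole-number raw scores. The score list, the function names and the class-interval width of 10 are illustrative choices, not part of the notes.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (score, frequency) pairs, highest score first."""
    counts = Counter(scores)
    return sorted(counts.items(), reverse=True)

def grouped_distribution(scores, width):
    """Count scores per class interval.

    Intervals are (lower, upper) inclusive; lower limits fall on
    multiples of `width` (an assumed convention for this sketch).
    """
    counts = Counter((s // width) * width for s in scores)
    return [((low, low + width - 1), counts[low]) for low in sorted(counts)]

scores = [72, 85, 85, 90, 67, 72, 85, 78, 93, 67]  # hypothetical raw scores

print(frequency_distribution(scores))
# [(93, 1), (90, 1), (85, 3), (78, 1), (72, 2), (67, 2)]
print(grouped_distribution(scores, width=10))
# [((60, 69), 2), ((70, 79), 3), ((80, 89), 3), ((90, 99), 2)]
```

The grouped output is what a histogram or frequency polygon would plot: one bar or point per class interval.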
Central Tendency
- \text{Mean}=\frac{\sum X}{N}
- Median – middle score.
- Mode – most frequent.
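The three measures can be checked on a small hypothetical score set using Python's standard `statistics` module. The data are invented for illustration.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical raw scores, N = 6

mean = sum(scores) / len(scores)      # sum of X divided by N
median = statistics.median(scores)    # middle score (average of the two middle scores when N is even)
modes = statistics.multimode(scores)  # most frequent score(s); a list, because ties are possible

print(mean)    # 5.0
print(median)  # 4.0
print(modes)   # [3]
```

Here the mean (5.0) sits above the median (4.0) because the single high score of 10 pulls it upward, the pattern expected in a positively skewed distribution.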
Variability
- Range = X_{max}-X_{min}
- Inter-quartile Range = Q_3-Q_1 ; Semi-IQR =\frac{Q_3-Q_1}{2}
- Average Deviation AD=\frac{\sum |X-\bar{X}|}{N}
- Variance s^2=\frac{\sum (X-\bar{X})^2}{N}
- Standard Deviation s=\sqrt{s^2}
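A worked sketch of the variability measures above, using the N-denominator (descriptive) variance given in these notes. The data are hypothetical, and quartile conventions differ between textbooks and software; this sketch simply takes the default method of `statistics.quantiles`.

```python
import math
import statistics

def variability(scores):
    """Compute the descriptive variability measures for a list of scores."""
    n = len(scores)
    mean = sum(scores) / n
    # Quartile cut points; other quartile conventions can give slightly different values.
    q1, _q2, q3 = statistics.quantiles(scores, n=4)
    variance = sum((x - mean) ** 2 for x in scores) / n  # divides by N, as in the notes
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "semi_iqr": (q3 - q1) / 2,
        "average_deviation": sum(abs(x - mean) for x in scores) / n,
        "variance": variance,
        "sd": math.sqrt(variance),
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores, mean = 5
result = variability(scores)
print(result["range"], result["average_deviation"], result["variance"], result["sd"])
# 7 1.5 4.0 2.0
```

Dividing by N - 1 instead of N would give the sample estimate of the population variance; the notes use the N form.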
Shape Characteristics
- Skewness – positive (tail right) vs negative (tail left).
- Kurtosis – leptokurtic (peaked), mesokurtic, platykurtic (flat).
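One common way to put numbers on shape is via moments about the mean. The sketch below assumes the moment-based definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); other formulas exist, and the data are hypothetical.

```python
def central_moment(scores, k):
    """k-th moment about the mean, with an N denominator."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)

def skewness(scores):
    """Positive = tail to the right, negative = tail to the left, 0 = symmetrical."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Above 0 = leptokurtic, near 0 = mesokurtic, below 0 = platykurtic."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                # hypothetical, evenly spread
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # hypothetical, one high outlier

print(skewness(symmetric))          # 0.0
print(skewness(right_tailed) > 0)   # True
print(excess_kurtosis(symmetric) < 0)  # True (flat, platykurtic)
```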
Normal Curve Facts
- Bell-shaped, symmetrical; mean = median = mode.
- Areas: ≈68 % within \pm1\sigma, ≈95 % within \pm2\sigma, ≈99.7 % within \pm3\sigma.
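The areas quoted above can be verified numerically. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k / sqrt(2)); the function name below is an illustrative choice.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```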
Standard Scores
- Z-score: z=\frac{X-\mu}{\sigma} (mean 0, SD 1).
- T-score: T=10z+50 (mean 50, SD 10; 0–100 range, no negatives).
- Stanine: mean 5, SD ≈2; 9 whole-number categories.
- College-admissions A-score: mean 500, SD 100 (e.g., SAT, GRE).
- Deviation IQ: mean 100, SD 15 (≈95 % between 70–130).
- Linear vs Non-linear transformations; Normalization of skewed distributions.
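The linear conversions above all start from z. A minimal sketch, assuming a raw score with known mean and SD: the function and key names are illustrative, and the stanine line is a rounding approximation (2z + 5, clipped to 1–9) rather than the percentile-band definition.

```python
import math

def standard_scores(x, mean, sd):
    """Convert one raw score to several standard-score scales."""
    z = (x - mean) / sd
    # Approximation: round 2z + 5 half-up, then clip to the 1-9 stanine range.
    stanine = min(9, max(1, math.floor(2 * z + 5 + 0.5)))
    return {
        "z": z,                          # mean 0, SD 1
        "T": 10 * z + 50,                # mean 50, SD 10
        "deviation_iq": 15 * z + 100,    # mean 100, SD 15
        "college_style": 100 * z + 500,  # mean 500, SD 100
        "stanine": stanine,              # mean 5, SD about 2
    }

converted = standard_scores(x=65, mean=50, sd=10)  # hypothetical raw score
print(converted)
# {'z': 1.5, 'T': 65.0, 'deviation_iq': 122.5, 'college_style': 650.0, 'stanine': 8}
```

Each conversion is a linear transformation, so it changes the mean and SD but not the shape of the distribution. Normalizing a skewed distribution is a non-linear step and is not shown here.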
Test Development Process
- Test Conceptualisation
- Clarify what is measured, objectives, need, users/takers.
- Create test blueprint (content areas × item types × number of items).
- Item Development (Phase 1)
- Deductive vs Inductive.
- Define operational behaviours per objective.
- Generate a large item pool (≈2× the final test length). For the tryout sample, a rule of thumb is ≈10 respondents per item; a sample of ≥1000 is rated “excellent”.
- Follow writing guidelines (clarity, reading level, avoid double-barrelled, balanced positive/negative, no slang, etc.).
- Scale Development & Evaluation (Phases 2-3)
- Pre-testing, item analysis, reliability & validity evidence, norming.
- Selected-Response
- Multiple-Choice (stem + key + distractors).
- Matching (premises vs responses; use extra options & flexible rules).
- Binary/True-False (higher guess rate).
- Constructed-Response
- Completion/Short-Answer, Essay (allows depth; scoring rubrics, multiple raters).
- Other taxonomy: Dichotomous, Polytomous, Likert, Category (1–10), Adjective Checklist, Q-sort.
- Computer-Based
- Item Banking + Item Branching → CAT; can cut both the number of items administered and measurement error by as much as 50 %, and mitigates floor/ceiling effects.
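Item branching can be pictured with a toy rule: a correct answer moves the examinee up one difficulty level, an incorrect answer moves them down one. This is only a sketch under that assumption; operational CAT systems typically select items with item response theory models. The item bank, item names and simulated examinee below are all hypothetical.

```python
def run_branching_test(bank, answer_fn, n_items=5):
    """Administer up to n_items from a difficulty-levelled item bank.

    bank: dict mapping difficulty level (int) -> list of item ids.
    answer_fn: callable taking an item id and returning True if answered correctly.
    Returns a list of (item, level, correct) tuples in order of administration.
    """
    levels = sorted(bank)
    idx = len(levels) // 2  # start at the middle difficulty level
    used = set()
    administered = []
    for _ in range(n_items):
        level = levels[idx]
        remaining = [item for item in bank[level] if item not in used]
        if not remaining:  # this level of the bank is exhausted
            break
        item = remaining[0]
        used.add(item)
        correct = answer_fn(item)
        administered.append((item, level, correct))
        # Branch: harder after a correct answer, easier after an incorrect one.
        idx = min(idx + 1, len(levels) - 1) if correct else max(idx - 1, 0)
    return administered

# Hypothetical bank: 5 difficulty levels, 3 items per level.
bank = {level: [f"item{level}{letter}" for letter in "abc"] for level in range(1, 6)}
item_level = {item: level for level, items in bank.items() for item in items}

# Simulated examinee who passes every item at level 3 or below.
result = run_branching_test(bank, lambda item: item_level[item] <= 3)
print([level for _, level, _ in result])  # [3, 4, 3, 4, 3]
```

The examinee sees only levels 3 and 4, never the very easy or very hard items, which is how tailoring shortens the test and limits floor and ceiling effects.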
Scaling Methods
- Summative (Likert) Scale – sum item scores.
- Method of Paired Comparisons – choose preferred of two stimuli; ordinal output.
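- Guttman (Scalogram) Scale – items ordered from weaker to stronger expressions of the attribute; endorsing a stronger item implies endorsing all milder ones; evaluated with scalogram analysis.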
- Thurstone Equal-Appearing Intervals – judge ratings 1–9; select items with low SD of judges.
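Two of the methods above reduce to short calculations. The sketch below assumes a 5-point Likert format with reverse-keyed items, and for the Thurstone step a scale value equal to the mean judge rating with an arbitrary SD cut-off of 1.0. The function names, the threshold and the data are illustrative assumptions.

```python
import statistics

def likert_total(responses, reverse_keyed=(), points=5):
    """Summative score: add item ratings, flipping reverse-keyed items.

    responses: list of ratings from 1 to `points`.
    reverse_keyed: positions (0-based) of items scored as points + 1 - rating.
    """
    return sum((points + 1 - r) if i in reverse_keyed else r
               for i, r in enumerate(responses))

def thurstone_select(judge_ratings, max_sd=1.0):
    """Keep statements the judges agree on (low SD); scale value = mean rating.

    judge_ratings: dict mapping statement -> list of judges' ratings (1-9).
    """
    kept = {}
    for statement, ratings in judge_ratings.items():
        if statistics.pstdev(ratings) <= max_sd:
            kept[statement] = statistics.mean(ratings)
    return kept

print(likert_total([4, 2, 5, 1], reverse_keyed={1, 3}))  # 4 + 4 + 5 + 5 = 18

ratings = {  # hypothetical judge ratings for three statements
    "A": [8, 9, 8, 9, 8],  # judges agree: strongly favourable
    "B": [1, 9, 5, 3, 7],  # judges disagree: ambiguous statement
    "C": [2, 2, 3, 2, 1],  # judges agree: unfavourable
}
kept = thurstone_select(ratings)
print(sorted(kept))  # ['A', 'C']
```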
Response Bias & Control
- Social Desirability – manage via forced-choice, desirability studies.
- Acquiescence – balance keyed direction.
- Random Responding – embed obvious items, consistency checks.
- Faking Good/Bad – validity (faking) scales; example item “The sun rises in the East.”
- Validity Scales – high scores → non-cooperation, random or fake responding.
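A rough sketch of how such checks might be scored on a True/False inventory: count wrong answers to obvious items, count contradictions between repeated items, and compute the proportion of "True" answers as a crude acquiescence index. The item ids, keys and function name are hypothetical, and real inventories set their own cut-offs.

```python
def validity_indices(answers, obvious_key, consistency_pairs):
    """Compute simple response-distortion indices.

    answers: dict mapping item id -> True/False response.
    obvious_key: dict mapping obvious item id -> the answer nearly everyone gives.
    consistency_pairs: list of (item_a, item_b) that should receive the same answer.
    """
    return {
        # Obvious items answered against the key suggest random or careless responding.
        "obvious_misses": sum(answers[i] != key for i, key in obvious_key.items()),
        # Repeated items answered differently suggest inconsistency.
        "inconsistent_pairs": sum(answers[a] != answers[b] for a, b in consistency_pairs),
        # Share of "True" responses; values near 1.0 suggest acquiescence.
        "true_rate": sum(answers.values()) / len(answers),
    }

answers = {  # hypothetical protocol
    "q1": True, "q2": True, "q3": False, "q4": True,
    "sun_rises_east": False,  # obvious item, expected answer True
    "q1_repeat": False,       # repeat of q1
}
indices = validity_indices(
    answers,
    obvious_key={"sun_rises_east": True},
    consistency_pairs=[("q1", "q1_repeat")],
)
print(indices)
# {'obvious_misses': 1, 'inconsistent_pairs': 1, 'true_rate': 0.5}
```

Higher counts on the first two indices correspond to the pattern noted above: elevated validity-scale scores point to non-cooperation or random responding.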
Ethical Test Construction & Administration Summary
- Ensure reliability & validity evidence, clear objectives, adequate norms.
- Provide accommodations & alternate assessments when required.
- Maintain confidentiality, informed consent, appropriate test-user qualification.
- Stay vigilant about cultural bias, legal mandates, and response distortions.