Comprehensive Study Notes – Psychological Testing & Assessment

Psychological Testing vs. Psychological Assessment

  • Psychological Test
    • Measurement tool/technique requiring performance of one or more behaviours to infer attributes, traits, characteristics or predict outcomes.
    • Primary goal: obtain a numerical gauge of an ability or attribute.
  • Psychological Assessment
    • Gathering & integration of psychology-related data (tests, interviews, case studies, behavioural observations, specialised apparatus & procedures) to answer a referral question, solve a problem or reach a decision.
    • Focuses on how an individual arrives at a response (the process), not only what was produced (the product).
  • Key Contrasts
    • Objective: Testing → score(s); Assessment → reasoned answer/decision.
    • Process: Testing may be group/individual; Assessment typically individualised & integrative.
    • Skill: Testing → technician-like scoring & basic interpretation; Assessment → expert tool selection, data integration.
    • Outcome: Testing → series of scores; Assessment → logical problem-solving illumination of referral question.

Types of Psychological Assessment

  • Collaborative Assessment – assessor & assessee work as partners throughout.
  • Therapeutic Assessment – assessment itself used for self-discovery & therapeutic gain.
  • Dynamic Assessment – interactive cycle: (1) evaluation, (2) intervention, (3) re-evaluation. Common in education, correctional, corporate, neuropsychological & clinical contexts.

Psychological Tests: Major Classifications

  • By Administration
    • Individual Tests – administered one-to-one.
    • Group Tests – one examiner → many examinees.
  • By Behaviour Measured
    1. Ability Tests – speed, accuracy or both.
    • Achievement – previous learning.
    • Aptitude – potential to acquire skill.
    • Intelligence – capacity to solve problems, adapt, profit from experience.
    • Item Types: free-response, multiple-choice.
    • Pacing: Power tests (few/no time limits; difficulty differentiates) vs Speeded tests (easy items, heavy time pressure) vs Adaptive tests (computer tailors difficulty).
    2. Personality Tests – typical behaviour (traits, temperament, dispositions).
    • Structured / Objective – self-report (True/False, Yes/No).
    • Projective – ambiguous stimuli; unclear response requirements (e.g., Rorschach inkblots).

Core Concepts & Terminology

  • Psychological Construct – unobservable attribute inferred from behaviour.
  • Test Format – layout & administration mode (paper, digital, etc.).
  • Scoring – assigning evaluative codes.
  • Cut Score – numerical reference point dividing classifications.
  • Psychometrics – science of psychological measurement.
    • Psychometric Soundness – reliability & validity of a test.
    • Psychometric Utility – practical usefulness for a purpose.

Major Assessment Tools

  • Interview (including Panel / Board interview)
  • Portfolio – work products.
  • Case History / Case Study – archival records integrated into illustrative account.
  • Behavioural Observation – naturalistic or structured; may involve role-play tests.
  • Computer Assistance
    • CAPA – Computer-Assisted Psychological Assessment.
    • CAT – Computer Adaptive Testing (item difficulty adapts to ability).
    • Report Types: Extended Scoring, Interpretative, Consultative, Integrative.

Users of Tests

  • Test Developers – >20 000 new tests/year (APA).
  • Test Takers / Assessees – can include deceased (→ Psychological Autopsy).

Applications of Assessment

  • Education (school ability, achievement, diagnostic, informal)
  • Clinical (maladjustment, therapy outcomes, learning difficulties, forensic)
  • Counseling (adjustment, productivity, attitudes/values)
  • Geriatric, Business/Military (HR processes, product design, marketing), Governmental credentialing, Program evaluation, Health psychology, Disability assessment (accommodations, alternate assessments).

Historical Milestones

  • China (≈2200 BCE) – imperial civil-service exams (music, archery → agriculture, law, strategy).
  • 19th C diffusion: British (1832 EIC; 1855 civil service), French & German, U.S. Civil Service Commission (1883).
  • Greco-Roman humours → personality types.
  • Middle Ages – tests to detect witchcraft/devil association.
  • Renaissance–18th C – measurement concepts (Christian von Wolff).
  • Francis Galton – anthropometry, correlation, questionnaires.
  • Wilhelm Wundt – first psychology lab; studies of RT, perception.
  • Spearman – reliability, factor analysis.
  • Binet & Simon (1905) – 30-item intelligence scale → clinical testing movement.
  • Wechsler (1939) – Wechsler-Bellevue scale of adult intelligence (forerunner of the WAIS, 1955).
  • Group Intelligence Tests – WWI/WWII military screening (e.g., Army Alpha & Beta).
  • Woodworth Psychoneurotic Inventory (Personal Data Sheet) – first widely used self-report personality test.

Culture & Assessment Issues

  • Culture Definition – socially transmitted patterns, beliefs, products.
  • Problems: language comprehension, culturally loaded items, non-verbal behaviours, differing evaluation standards.
  • Test & Group Membership – group score differences → debates on bias vs validity; affirmative action to level playing field.

Legal & Ethical Framework

  • Laws vs Ethics – statutes vs professional principles (standard of care).
  • U.S. Examples:
    • Minimum Competency Testing (1970s) – grade promotion, diplomas.
    • Truth-in-Testing (1980s) – disclose scoring criteria.
  • APA Test-User Qualifications
    • Level A – administration/scoring via manual.
    • Level B – some technical knowledge (stats, psychometrics).
    • Level C – advanced understanding + supervised experience.
  • Special Populations – adaptations, alternative assessments.
  • Computerised Testing Issues – access/security, equivalence, interpretation quality, unregulated online tests.
  • Rights of Test Takers
    • Informed consent (purpose, reason, instruments; written).
    • Feedback of findings.
    • Privacy & confidentiality; privileged communication distinctions.
    • Least stigmatizing label.

Statistics Refresher

Frequency Distributions & Graphs

  • Frequency distribution: raw scores + occurrence counts.
  • Grouped distributions use class intervals.
  • Graphs: Histogram, Bar graph, Frequency polygon.
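
Building a simple and a grouped frequency distribution can be sketched in Python (the score set and interval width here are made-up illustrations):

```python
from collections import Counter

scores = [85, 92, 78, 85, 90, 78, 88, 95, 85, 72]  # hypothetical raw scores

# Simple frequency distribution: raw score -> occurrence count
freq = Counter(scores)

# Grouped distribution: class intervals of width 5 (70-74, 75-79, ...)
width = 5
grouped = Counter((s // width) * width for s in scores)
for lower in sorted(grouped):
    print(f"{lower}-{lower + width - 1}: {grouped[lower]}")
```

Each class interval is labelled by its lower bound; plotting these counts gives the histogram or frequency polygon mentioned above.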

Central Tendency

  • \text{Mean}=\frac{\sum X}{N}
  • Median – middle score.
  • Mode – most frequent.

Variability

  • Range = X_{max}-X_{min}
  • Inter-quartile Range: IQR = Q_3-Q_1 ; Semi-IQR =\frac{Q_3-Q_1}{2}
  • Average Deviation (AD).
  • Variance s^2=\frac{\sum (X-\bar{X})^2}{N} (population form; sample estimates divide by N-1)
  • Standard Deviation s=\sqrt{s^2}
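
These definitions map directly onto Python's `statistics` module; a quick check with a made-up score set (population formulas, dividing by N as above):

```python
from statistics import mean, median, mode, pvariance, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

m = mean(scores)                       # sum(X) / N
med = median(scores)                   # middle score (average of middle two here)
mo = mode(scores)                      # most frequent score
rng = max(scores) - min(scores)        # Range = X_max - X_min
ad = mean(abs(x - m) for x in scores)  # Average Deviation
var = pvariance(scores)                # population variance (divides by N)
sd = pstdev(scores)                    # standard deviation = sqrt(variance)
```

With this set the mean (5) exceeds the median (4.5), a hint of the positive skew described in the next subsection.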

Shape Characteristics

  • Skewness – positive (tail right) vs negative (tail left).
  • Kurtosis – leptokurtic (peaked), mesokurtic, platykurtic (flat).

Normal Curve Facts

  • Bell-shaped, symmetrical; mean = median = mode.
  • Areas: ≈68 % within \pm1\sigma, ≈95 % within \pm2\sigma, ≈99.7 % within \pm3\sigma.

Standard Scores

  • Z-score: z=\frac{X-\mu}{\sigma} (mean 0, SD 1).
  • T-score: T=10z+50 (mean 50, SD 10; scores effectively span 0–100, eliminating negatives).
  • Stanine: mean 5, SD ≈2; 9 whole-number categories.
  • College-admissions A-score: mean 500, SD 100 (e.g., SAT, GRE).
  • Deviation IQ: mean 100, SD 15 (≈95 % between 70–130).
  • Linear vs Non-linear transformations; Normalization of skewed distributions.
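
The linear transformations above can be sketched as small functions; the raw-score example is hypothetical, and the stanine banding is the common round-and-clip approximation rather than a normalized table:

```python
def z_score(x, mu, sigma):
    """Standardise a raw score: mean 0, SD 1."""
    return (x - mu) / sigma

def t_score(z):
    """Linear transform to mean 50, SD 10."""
    return 10 * z + 50

def deviation_iq(z):
    """Linear transform to mean 100, SD 15."""
    return 100 + 15 * z

def stanine(z):
    """Whole numbers 1-9, mean 5, SD ~2 (round-and-clip approximation)."""
    return max(1, min(9, round(2 * z + 5)))

# A raw score of 65 on a test with mean 50 and SD 10:
z = z_score(65, 50, 10)
print(z, t_score(z), deviation_iq(z), stanine(z))
```

Because each conversion is linear in z, a score's relative standing is identical on every scale; only the reference mean and SD change.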

Test Development Process

  1. Test Conceptualisation
    • Clarify what is measured, objectives, need, users/takers.
    • Create test blueprint (content areas × item types × number of items).
  2. Item Development (Phase 1)
    • Deductive vs Inductive.
    • Define operational behaviours per objective.
    • Generate large item pool (≈2× final length; rule-of-thumb sample ≈10 respondents per item; ≥1000 = “excellent”).
    • Follow writing guidelines (clarity, reading level, avoid double-barrelled, balanced positive/negative, no slang, etc.).
  3. Scale Development & Evaluation (Phases 2-3)
    • Pre-testing, item analysis, reliability & validity evidence, norming.

Item Formats

  • Selected-Response
    1. Multiple-Choice (stem + key + distractors).
    2. Matching (premises vs responses; use extra options & flexible rules).
    3. Binary/True-False (higher guess rate).
  • Constructed-Response
    • Completion/Short-Answer, Essay (allows depth; scoring rubrics, multiple raters).
  • Other taxonomy: Dichotomous, Polytomous, Likert, Category (1–10), Adjective Checklist, Q-sort.
  • Computer-Based
    • Item Banking + Item Branching → CAT; can roughly halve the number of items administered without increasing measurement error, and mitigates floor/ceiling effects.
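
A minimal item-branching loop, assuming a hypothetical nine-level difficulty ladder and a simple step-up/step-down rule, might look like:

```python
def adaptive_session(answer_fn, lowest=1, highest=9, start=5, n_items=5):
    """Step difficulty up after a correct answer, down after an error.

    answer_fn(difficulty) -> True if the examinee answers correctly.
    Returns the list of (difficulty, correct) pairs administered.
    """
    level = start
    administered = []
    for _ in range(n_items):
        correct = answer_fn(level)
        administered.append((level, correct))
        step = 1 if correct else -1
        level = min(max(level + step, lowest), highest)  # clip to the ladder
    return administered

# An examinee who can handle items up to difficulty 6:
history = adaptive_session(lambda d: d <= 6)
```

The administered difficulties quickly oscillate around the examinee's ability level, which is why far fewer items are needed than in a fixed-form test.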

Scaling Methods

  • Summative (Likert) Scale – sum item scores.
  • Method of Paired Comparisons – choose preferred of two stimuli; ordinal output.
  • Guttman (Scalogram) Scale – hierarchy of agreeability; scalogram analysis.
  • Thurstone Equal-Appearing Intervals – judge ratings 1–9; select items with low SD of judges.
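
Summative (Likert) scoring, including reverse-keying of negatively worded items (a common control for acquiescence), can be sketched as follows; the item responses and keying are hypothetical:

```python
def score_likert(responses, reverse_keyed=(), scale_max=5):
    """Sum 1..scale_max item responses, flipping reverse-keyed items."""
    total = 0
    for i, r in enumerate(responses):
        # A reversed item maps 1 -> scale_max, 2 -> scale_max-1, etc.
        total += (scale_max + 1 - r) if i in reverse_keyed else r
    return total

# Four items on a 1-5 scale; items at index 1 and 3 are negatively worded:
total = score_likert([5, 2, 4, 1], reverse_keyed={1, 3})
```

After reversal, a higher total consistently indicates more of the measured trait, which is what makes simple summation defensible.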

Response Bias & Control

  • Social Desirability – manage via forced-choice, desirability studies.
  • Acquiescence – balance keyed direction.
  • Random Responding – embed obvious items, consistency checks.
  • Faking Good/Bad – validity (faking) scales; example item “The sun rises in the East.”
  • Validity Scales – high scores → non-cooperation, random or fake responding.

Ethical Test Construction & Administration Summary

  • Ensure reliability & validity evidence, clear objectives, adequate norms.
  • Provide accommodations & alternate assessments when required.
  • Maintain confidentiality, informed consent, appropriate test-user qualification.
  • Vigilance versus cultural bias, legal mandates, and response distortions.