Comprehensive Study Notes – Psychological Testing & Assessment

Psychological Testing vs. Psychological Assessment

Psychological Test
- Measurement tool/technique requiring performance of one or more behaviours to infer attributes, traits, characteristics or predict outcomes.
- Primary goal: obtain a numerical gauge of an ability or attribute.
Psychological Assessment
- Gathering & integration of psychology-related data (tests, interviews, case studies, behavioural observations, specialised apparatus & procedures) to answer a referral question, solve a problem or reach a decision.
- Focuses on how an individual processes information, not only on the result of that processing.
Key Contrasts
- Objective: Testing → score(s); Assessment → reasoned answer/decision.
- Process: Testing may be group/individual; Assessment typically individualised & integrative.
- Skill: Testing → technician-like scoring & basic interpretation; Assessment → expert tool selection, data integration.
- Outcome: Testing → one or more scores; Assessment → a logical problem-solving approach that sheds light on the referral question.

Types of Psychological Assessment

Collaborative Assessment – assessor & assessee work as partners throughout.
Therapeutic Assessment – assessment itself used for self-discovery & therapeutic gain.
Dynamic Assessment – interactive cycle: (1) evaluation, (2) intervention, (3) re-evaluation. Common in educational, correctional, corporate, neuropsychological & clinical contexts.

Psychological Tests: Major Classifications

By Administration
- Individual Tests – administered one-to-one.
- Group Tests – one examiner → many examinees.
By Behaviour Measured
1. Ability Tests – speed, accuracy or both.
- Achievement – previous learning.
- Aptitude – potential to acquire skill.
- Intelligence – capacity to solve problems, adapt, profit from experience.
- Item & Test Types: free-response or multiple-choice items; Power tests (few/no time limits; item difficulty differentiates examinees) vs Speeded tests (easy items, heavy time pressure) vs Adaptive tests (computer tailors item difficulty to the examinee).
2. Personality Tests – typical behaviour (traits, temperament, dispositions).
- Structured / Objective – self-report (True/False, Yes/No).
- Projective – ambiguous stimuli; unclear response requirements (e.g., Rorschach inkblots).

Core Concepts & Terminology

Psychological Construct – unobservable attribute inferred from behaviour.
Test Format – layout & administration mode (paper, digital, etc.).
Scoring – assigning evaluative codes.
Cut Score – numerical reference point dividing classifications.
Psychometrics – science of psychological measurement.
- Psychometric Soundness – reliability & validity of a test.
- Psychometric Utility – practical usefulness for a purpose.

Major Assessment Tools

Interview (including Panel / Board interview)
Portfolio – work products.
Case History / Case Study – archival records integrated into illustrative account.
Behavioural Observation – naturalistic or structured; may involve role-play tests.
Computer Assistance
- CAPA – Computer-Assisted Psychological Assessment.
- CAT – Computer Adaptive Testing (item difficulty adapts to ability).
- Report Types: Extended Scoring, Interpretative, Consultative, Integrative.

Users of Tests

Test Developers – >20 000 new tests/year (APA).
Test Takers / Assessees – can include deceased (→ Psychological Autopsy).

Applications of Assessment

Education (school ability, achievement, diagnostic, informal)
Clinical (maladjustment, therapy outcomes, learning difficulties, forensic)
Counseling (adjustment, productivity, attitudes/values)
Geriatric, Business/Military (HR processes, product design, marketing), Governmental credentialing, Program evaluation, Health psychology, Disability assessment (accommodations, alternate assessments).

Historical Milestones

China (≈2200 BCE) – imperial civil-service exams (music, archery → agriculture, law, strategy).
19th C diffusion: British (1832 EIC; 1855 civil service), French & German, U.S. Civil Service Commission (1883).
Greco-Roman humours → personality types.
Middle Ages – tests to detect witchcraft/devil association.
Renaissance–18th C – measurement concepts (Christian von Wolff).
Francis Galton – anthropometry, correlation, questionnaires.
Wilhelm Wundt – first psychology lab; studies of RT, perception.
Spearman – reliability, factor analysis.
Binet & Henri (1895) – proposals for measuring higher mental abilities; Binet & Simon (1905) – 30-item intelligence scale → clinical testing movement.
Wechsler (1939) – Wechsler-Bellevue Intelligence Scale (forerunner of the WAIS): adult intelligence.
Group Intelligence Tests – WWI/II military screening.
Woodworth Psychoneurotic Inventory – first wide self-report personality test.

Culture & Assessment Issues

Culture Definition – socially transmitted patterns, beliefs, products.
Problems: language comprehension, culturally loaded items, non-verbal behaviours, differing evaluation standards.
Test & Group Membership – group score differences → debates on bias vs validity; affirmative action to level playing field.

Legal & Ethical Framework

Laws vs Ethics – statutes vs professional principles (standard of care).
U.S. Examples:
- Minimum Competency Testing (1970s) – grade promotion, diplomas.
- Truth-in-Testing (1980s) – disclose scoring criteria.
APA Test-User Qualifications
- Level A – administration/scoring via manual.
- Level B – some technical knowledge (stats, psychometrics).
- Level C – advanced understanding + supervised experience.
Special Populations – adaptations, alternative assessments.
Computerised Testing Issues – access/security, equivalence, interpretation quality, unregulated online tests.
Rights of Test Takers
- Informed consent (purpose, reason, instruments; written).
- Feedback of findings.
- Privacy & confidentiality; privileged communication distinctions.
- Least stigmatizing label.

Statistics Refresher

Frequency Distributions & Graphs

Frequency distribution: raw scores + occurrence counts.
Grouped distributions use class intervals.
Graphs: Histogram, Bar graph, Frequency polygon.
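
The two kinds of distribution above can be sketched in a few lines. This is a minimal illustration, assuming integer scores that are all at or above the first class boundary; the scores, the `grouped_distribution` helper, and the interval width of 5 are made up for the example.

```python
from collections import Counter

# Hypothetical raw scores, invented for illustration only.
scores = [12, 15, 15, 18, 21, 21, 21, 24, 27, 30]

# Simple frequency distribution: each raw score with its occurrence count.
simple = sorted(Counter(scores).items())

def grouped_distribution(values, start, width):
    """Count how many values fall in each class interval [low, low + width - 1]."""
    counts = Counter((v - start) // width for v in values)
    return [
        (start + k * width, start + k * width + width - 1, counts[k])
        for k in range(max(counts) + 1)
    ]

grouped = grouped_distribution(scores, start=10, width=5)

for low, high, f in grouped:
    # A row of asterisks serves as a crude text histogram.
    print(f"{low:>2}-{high:<2} | {'*' * f} ({f})")
```

Each printed row is one class interval, so the asterisk rows approximate a histogram turned on its side.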

Central Tendency

Mean \bar{X}=\frac{\sum X}{N}
Median – middle score.
Mode – most frequent.
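
A short sketch of the three measures, using Python's standard `statistics` module on a hypothetical score list chosen so that the three values differ:

```python
import statistics

# Hypothetical scores with one high outlier (positively skewed).
scores = [2, 3, 3, 4, 5, 7, 11]

mean = sum(scores) / len(scores)      # sum of X divided by N
median = statistics.median(scores)    # middle score of the sorted list
mode = statistics.mode(scores)        # most frequent score

print(mean, median, mode)
```

Here the mean is pulled toward the outlier, so mean > median > mode, the usual ordering for a positively skewed set of scores.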

Variability

Range = X_{max}-X_{min}
Inter-quartile Range = Q_3-Q_1 ; Semi-IQR =\frac{Q_3-Q_1}{2}
Average Deviation AD=\frac{\sum |X-\bar{X}|}{N}
Variance s^2=\frac{\sum (X-\bar{X})^2}{N}
Standard Deviation s=\sqrt{s^2}
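
The variability measures above, worked on one hypothetical data set. Quartile conventions differ between textbooks and software; the `method="inclusive"` option used here is an assumption, not the only valid choice. Variance divides by N, matching the formula in these notes.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
n = len(scores)
mean = sum(scores) / n

score_range = max(scores) - min(scores)

# One common quartile convention (assumed); others give slightly different cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
semi_iqr = iqr / 2

average_deviation = sum(abs(x - mean) for x in scores) / n
variance = sum((x - mean) ** 2 for x in scores) / n  # divides by N, as in the notes
sd = math.sqrt(variance)

print(score_range, iqr, semi_iqr, average_deviation, variance, sd)
```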

Shape Characteristics

Skewness – positive (tail right) vs negative (tail left).
Kurtosis – leptokurtic (peaked), mesokurtic, platykurtic (flat).
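
Both shape characteristics can be quantified from central moments. The sketch below uses one common moment-based definition (other formulas with small-sample corrections exist); the function names are illustrative. Under this definition, excess kurtosis above 0 indicates a leptokurtic shape, near 0 mesokurtic, and below 0 platykurtic.

```python
def central_moment(values, k):
    """k-th central moment: mean of (X - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Third standardised moment; > 0 means tail to the right, < 0 tail to the left."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth standardised moment minus 3, so a normal curve scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical, evenly spread
right_tailed = [2, 3, 3, 4, 5, 7, 11]  # hypothetical, one high outlier

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```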

Normal Curve Facts

Bell-shaped, symmetrical; mean = median = mode.
Areas: ≈68 % within \pm1\sigma, ≈95 % within \pm2\sigma, ≈99.7 % within \pm3\sigma.
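
The area figures can be checked numerically. A minimal sketch using the error function from the standard library; `area_within` is an illustrative helper name.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.4f}")
```

The exact proportions are slightly above the rounded figures usually memorised, which is why they are best read as approximations.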

Standard Scores

Z-score: z=\frac{X-\mu}{\sigma} (mean 0, SD 1).
T-score: T=10z+50 (mean 50, SD 10; z from −5 to +5 maps onto 0–100, so scores in that span are never negative).
Stanine: mean 5, SD ≈2; 9 whole-number categories.
College-admissions A-score: mean 500, SD 100 (e.g., SAT, GRE).
Deviation IQ: mean 100, SD 15 (≈95 % between 70–130).
Linear vs Non-linear transformations; Normalization of skewed distributions.
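
The linear transformations above all start from z. A sketch, with illustrative function names; the stanine function is a linear approximation (2z + 5, rounded and clipped to 1–9) and assumes a roughly normal distribution, whereas stanines are formally defined by percentage bands.

```python
import math

def z_score(x, mean, sd):
    return (x - mean) / sd

def t_score(z):
    return 10 * z + 50

def deviation_iq(z):
    return 15 * z + 100

def admissions_score(z):
    """Mean 500, SD 100 scale."""
    return 100 * z + 500

def stanine(z):
    """Approximate stanine: half-SD-wide bands centred on 5, clipped to 1..9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))

# Hypothetical raw score of 65 on a test with mean 50 and SD 10.
z = z_score(65, mean=50, sd=10)
print(z, t_score(z), deviation_iq(z), admissions_score(z), stanine(z))
```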

Test Development Process

Test Conceptualisation
- Clarify what is measured, objectives, need, users/takers.
- Create test blueprint (content areas × item types × number of items).
Item Development (Phase 1)
- Deductive vs Inductive.
- Define operational behaviours per objective.
- Generate large item pool (≈2× final length; rule-of-thumb sample ≈10 respondents per item; ≥1000 = “excellent”).
- Follow writing guidelines (clarity, reading level, avoid double-barrelled, balanced positive/negative, no slang, etc.).
Scale Development & Evaluation (Phases 2-3)
- Pre-testing, item analysis, reliability & validity evidence, norming.

Item Formats

Selected-Response
1. Multiple-Choice (stem + key + distractors).
2. Matching (premises vs responses; use extra options & flexible rules).
3. Binary/True-False (higher guess rate).
Constructed-Response
- Completion/Short-Answer, Essay (allows depth; scoring rubrics, multiple raters).
Other taxonomy: Dichotomous, Polytomous, Likert, Category (1–10), Adjective Checklist, Q-sort.
Computer-Based
- Item Banking + Item Branching → CAT; can reduce the number of items administered and the measurement error by as much as 50 %, and mitigates floor/ceiling effects.
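
Item branching can be illustrated with a deliberately simplified rule: move one step harder after a correct response and one step easier after an incorrect one, stopping when the next step would repeat an item. This is a toy sketch, not how operational CAT works (real systems typically estimate ability with item response theory); the function, the item bank, and the simulated examinee are all hypothetical.

```python
def run_branching_test(item_difficulties, answers_correctly, max_items):
    """Administer items, stepping up in difficulty after a correct
    response and down after an incorrect one."""
    bank = sorted(item_difficulties)
    index = len(bank) // 2  # start near the middle of the bank
    used = set()
    administered = []
    while (
        len(administered) < max_items
        and 0 <= index < len(bank)
        and index not in used  # stop once the next step would repeat an item
    ):
        difficulty = bank[index]
        correct = answers_correctly(difficulty)
        used.add(index)
        administered.append((difficulty, correct))
        index += 1 if correct else -1
    return administered

# Hypothetical bank of nine items with difficulties 1..9, and a simulated
# examinee who passes every item at or below difficulty 6.5.
bank = list(range(1, 10))
administered = run_branching_test(bank, lambda d: d <= 6.5, max_items=9)
print(administered)
```

In this run only three of the nine items are needed to bracket the simulated examinee between difficulty 6 and 7, and the very easy and very hard items are never presented.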

Scaling Methods

Summative (Likert) Scale – sum item scores.
Method of Paired Comparisons – choose preferred of two stimuli; ordinal output.
Guttman (Scalogram) Scale – hierarchy of agreeability; scalogram analysis.
Thurstone Equal-Appearing Intervals – judge ratings 1–9; select items with low SD of judges.
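
The Thurstone selection step can be sketched as follows: each statement's scale value is taken as the mean of the judges' ratings, and statements on which judges disagree (high SD) are dropped. The ratings, the item names, and the cutoff of 1.0 are hypothetical, and some treatments use the median rather than the mean.

```python
import statistics

# Hypothetical ratings from five judges (1 = least favourable, 9 = most favourable).
ratings = {
    "item_a": [2, 2, 3, 2, 2],
    "item_b": [1, 9, 5, 3, 7],  # judges disagree widely
    "item_c": [8, 8, 9, 8, 8],
}

def select_items(judge_ratings, max_sd):
    """Keep items whose judge ratings have SD <= max_sd; return their scale values."""
    selected = {}
    for item, values in judge_ratings.items():
        if statistics.pstdev(values) <= max_sd:
            selected[item] = statistics.mean(values)
    return selected

selected = select_items(ratings, max_sd=1.0)
print(selected)
```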

Response Bias & Control

Social Desirability – manage via forced-choice, desirability studies.
Acquiescence – balance keyed direction.
Random Responding – embed obvious items, consistency checks.
Faking Good/Bad – validity (faking) scales; example item “The sun rises in the East.”
Validity Scales – elevated scores suggest non-cooperation, random or faked responding.
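
Balancing the keyed direction can be illustrated with summative (Likert) scoring in which some items are reverse-scored before summing. A minimal sketch; the item names, the 1–5 response range, and the `score_scale` helper are assumptions for the example.

```python
def score_scale(responses, reverse_keyed, low=1, high=5):
    """Sum item responses, flipping reverse-keyed items (1<->5, 2<->4, ...)."""
    total = 0
    for item, response in responses.items():
        if item in reverse_keyed:
            response = low + high - response
        total += response
    return total

# Hypothetical respondent who answers "strongly agree" (5) to everything.
yea_sayer = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}
score = score_scale(yea_sayer, reverse_keyed={"q3", "q4"})
print(score)
```

With half the items reverse-keyed, indiscriminate agreement lands at the scale midpoint (4 items × 3) instead of the maximum, so acquiescence alone cannot produce an extreme score.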

Ethical Test Construction & Administration Summary

Ensure reliability & validity evidence, clear objectives, adequate norms.
Provide accommodations & alternate assessments when required.
Maintain confidentiality, informed consent, appropriate test-user qualification.
Stay vigilant about cultural bias, legal mandates, and response distortions.