Psychological Assessment and Psychometrics

Distinctions Between Psychological Testing and Psychological Assessment

Psychological testing and psychological assessment are related but distinct processes with different objectives, focuses, and methodologies.

Objective: Testing aims to quantify a construct (obtaining a numerical score), while assessment aims to answer a referral question through evaluation and integration of data.
Focus: Testing utilizes a nomothetic approach (focusing on groups and general laws), whereas assessment uses an idiographic approach (focusing on the individual and their unique characteristics).
Process: Testing can be conducted on an individual or group basis. Assessment is conducted individually.
Outcome: The outcome of testing is a test score or a psychometric report. Assessment results in a comprehensive psychological report.
Source of Data: Testing relies solely on the test taker. Assessment incorporates collateral sources (e.g., interviews with family, teachers, archival records).
Evaluator Role: In testing, the tester is not the key factor; the process is often automated or standardized. In assessment, the assessor is the key, as they must integrate and interpret the data.
Duration and Cost: Testing is generally shorter and relatively inexpensive. Assessment is a longer process and significantly more expensive.
Qualification: Testing typically requires a Registered Psychometrician ($RPm$), while assessment requires a Registered Psychologist ($RPsy$).

Varieties and Approaches to Psychological Assessment

Psychological assessment is defined as the gathering and integration of psychology-related data for making a psychological evaluation through tools such as tests, interviews, case studies, behavioral observations, and specialized apparatuses.

Dynamic Psychological Assessment: An interactive approach following the "sandwich method" (evaluation-intervention-evaluation).
Collaborative Psychological Assessment: The assessor and assessee work as "partners" from initial contact to final feedback.
Therapeutic Psychological Assessment: Emphasizes therapeutic self-discovery and gaining new understandings throughout the process.
Educational Assessment: Uses tests (intelligence, achievement, reading comprehension) to evaluate skills relevant to academic success or failure.
Retrospective Assessment: Uses evaluative tools to draw conclusions about psychological aspects of a person as they existed at a prior point in time.
Remote Assessment: Gathers data and draws conclusions about subjects not in physical proximity to the evaluator.
Ecological Momentary Assessment (EMA): An "in the moment" evaluation of specific problems and variables at the exact time and place they occur.

Fundamentals of Psychological Testing and Test Construction

Psychological testing is the process of measuring psychology-related variables using procedures designed to obtain a sample of behavior. A psychological test is the specific device or procedure used.

Key Variables of Testing

Content: The subject matter.
Format: The form, plan, structure, and arrangement/layout of the test.
Item: A specific stimulus to which a person responds overtly; this response is scored or evaluated.
Administration Procedure: Whether the test is administered on an individual or group basis.
Score: A code or summary statement, usually numerical, reflecting performance. Scoring is the process of assigning these values.
Cut Score: A numerical reference point derived by judgment used to divide data into classifications (e.g., pass/fail).
Psychometric Soundness: The technical quality of a test.
Psychometrics: The science of psychological measurements.
Psychometrician: A professional who uses, analyzes, and interprets psychological test data.

Psychometric Properties: Reliability

Reliability refers to the consistency of measurement. A Reliability Coefficient is an index (a proportion) indicating the ratio between the true score variance and the total score variance, ranging from $0$ to $1$ .

Reliability Formulas and Concepts

True Score Formula: $True \, Score = R_{xx}(X - \bar{x}) + \bar{x}$ , where $R_{xx}$ is the reliability coefficient, $X$ is the obtained score, and $\bar{x}$ is the group mean. The result is an estimated true score, regressed toward the mean.
True Variance Component: $\frac{True \, Variance}{True \, Variance + Error \, Variance}$ . The higher the proportion of true variance, the more reliable the test.
Classical Test Theory (CTT): Presumes a score ( $X$ ) reflects a True Score ( $T$ ) plus Error ( $E$ ): $X = T + E$ .
Measurement Error:
- Random Error: Unpredictable fluctuations; unavoidable.
- Systematic Error: Constant or proportionate to the true value; avoidable if corrected.
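
As a worked illustration of the true score formula above, the following minimal Python sketch estimates a true score from an obtained score. The function name and the sample values (obtained score 120, group mean 100, reliability 0.90) are assumptions chosen for illustration, not figures from any real test.

```python
def estimated_true_score(obtained, mean, reliability):
    """Regress an obtained score toward the group mean.

    Implements True Score = R_xx * (X - mean) + mean, where R_xx is
    the reliability coefficient of the test (between 0 and 1).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return reliability * (obtained - mean) + mean


# Hypothetical values, for illustration only.
estimate = estimated_true_score(obtained=120, mean=100, reliability=0.90)
print(round(estimate, 2))
```

With a perfectly reliable test ( $R_{xx} = 1$ ) the estimate equals the obtained score; as reliability drops toward $0$ , the estimate moves toward the group mean.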

Sources of Error

Test Construction: Item sampling or content sampling variations.
Test Administration: Environment (light, noise, temp), test-taker variables (illness, mood), and examiner variables (physical appearance, gestures).

Reliability Estimates

Test-Retest Reliability: Correlating scores from two administrations of the same test. Longer intervals between administrations lead to lower Coefficients of Stability. Issues include the Practice Effect, Carryover Effects, and Mortality (subjects dropping out).
Parallel and Alternate Forms: Correlating two equivalent forms. Parallel forms have equal means and variances. Counterbalancing is used to avoid carryover effects.
Internal Consistency:
- Kuder-Richardson Formula 20 (KR20): Used for dichotomously scored items (True/False) with varying difficulty.
- KR21: Used for dichotomous items with equal difficulty.
- Cronbach's Alpha ( $\alpha$ ): Used for continuous scales (e.g., Likert).
- McDonald's Omega ( $\omega$ ): Preferred for complex factor structures.
- Split-Half Reliability: Uses the Spearman-Brown Formula to estimate reliability of a full test from two halves.
Interrater Reliability: Consistency between scorers.
- Cohen's Kappa ( $\kappa$ ): Agreement between two raters (nominal data).
- Fleiss Kappa: Agreement between three or more raters.
- Kendall's W: Used for ordinal (rank) data.
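
Three of the estimates listed above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library. The function names and the sample data are assumptions for illustration, and population variances are used throughout (under that choice, alpha computed on dichotomous 0/1 items coincides with KR20).

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha. item_scores holds one row of item scores per respondent."""
    k = len(item_scores[0])                    # number of items
    items = list(zip(*item_scores))            # transpose: one tuple per item
    sum_item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def spearman_brown(r, length_factor=2):
    """Project reliability when test length is multiplied by length_factor.

    With length_factor=2 this steps a half-test correlation up to the
    estimated reliability of the full test.
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal categories."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


# Hypothetical data: five respondents answering three Likert items.
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 3]]
print(round(cronbach_alpha(responses), 3))

# Hypothetical half-test correlation of 0.60.
print(round(spearman_brown(0.60), 3))

# Hypothetical yes/no classifications from two raters.
print(round(cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]), 3))
```

In the kappa example the raters agree on 3 of 4 cases (0.75), chance agreement is 0.50, and so kappa is 0.50. In the Spearman-Brown example a half-test correlation of 0.60 projects to a full-test reliability of 0.75.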

Psychometric Properties: Validity

Validity is the judgment of how well a test measures what it purports to measure in a specific context.

The Trinitarian View of Validity

Content Validity: Evaluation of subjects and topics covered. Includes Lawshe's Content Validity Ratio (CVR): $CVR = \frac{n_e - (N/2)}{N/2}$ where $n_e$ is the number of experts rating an item "essential" and $N$ is the total number of experts.
Criterion-Related Validity: Relationship between test scores and an external criterion.
- Concurrent Validity: Scores and criterion collected at the same time.
- Predictive Validity: Scores predict a future outcome.
Construct Validity: How well a test reflects a theoretical framework.
- Convergent Validity: High correlation with tests of the same construct.
- Discriminant (Divergent) Validity: Low correlation with tests of unrelated constructs.
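
Lawshe's CVR formula above is simple enough to check by hand, but a short Python sketch makes the range of values concrete. The function name and the panel size of 10 experts are assumptions for illustration.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts rating a single item.
for n_e in (10, 8, 5, 0):
    print(n_e, content_validity_ratio(n_e, 10))
```

The ratio runs from $-1$ (no expert rates the item essential) through $0$ (exactly half do) to $+1$ (all do). With 8 of 10 experts rating the item essential, $CVR = (8 - 5)/5 = 0.6$ .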

Advanced Validity Concepts

Multi-trait Multi-method Matrix (MTMM): Evaluates reliability and validity. Monomethod-monotrait (reliability) should be the highest; Heteromethod-heterotrait should be the lowest.
Face Validity: Whether a test appears valid to the test taker.
Ecological Validity: How well a test measures variables in the environment where they actually occur.

Factor Analysis and Data Reduction

Factor analysis is a mathematical procedure to identify factors (dimensions) on which people differ.

Exploratory Factor Analysis (EFA): Theory-generating; discovers underlying structures.
Confirmatory Factor Analysis (CFA): Theory-testing; confirms a hypothesized structure.
Principal Component Analysis (PCA): Reduces data dimensionality while maximizing variance.
Scree Plot: A plot of eigenvalues against component numbers to find the "elbow" or point of inflection.
Kaiser Criterion: Retain factors with an eigenvalue greater than $1.0$ .
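
The Kaiser criterion can be illustrated with a minimal Python sketch. The eigenvalues below are made up for a hypothetical six-variable correlation matrix (they sum to 6, as eigenvalues of a correlation matrix sum to the number of variables). In practice they would come from a statistics package.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Return the eigenvalues that exceed the Kaiser threshold."""
    return [ev for ev in eigenvalues if ev > threshold]


def proportion_of_variance(eigenvalues):
    """Share of total variance accounted for by each factor."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [2.8, 1.4, 0.7, 0.5, 0.4, 0.2]

retained = kaiser_retained(eigenvalues)
shares = proportion_of_variance(eigenvalues)

print(len(retained))  # number of factors kept under the Kaiser criterion
print([round(s, 3) for s in shares[:len(retained)]])
```

Here two factors have eigenvalues above $1.0$ and would be retained. A scree plot of the same values would normally be inspected alongside this rule, since the two do not always agree.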

Test Norms and Standardization

Standardization is the process of administering a test to a representative sample to establish norms.

Sampling Methods

Stratified Sampling: Representing subgroups (strata) of the population.
Stratified-Random Sampling: Every member has an equal chance of being selected within their stratum.
Purposive Sampling: Intentionally selecting participants based on specific criteria.
Incidental Sampling (Convenience): Selecting available participants (low budget).

Types of Norms

Percentile Norms: Converting raw data into the percentage of people falling below a score.
Developmental Norms: Based on traits that change with age (Age Norms) or Grade (Grade Norms).
Norm-Referenced vs. Criterion-Referenced: Norm-referenced compares an individual to a group. Criterion-referenced compares an individual to a set standard.
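
The difference between norm-referenced and criterion-referenced interpretation can be sketched in Python. The norm sample, the cut score of 18, and the function names are assumptions for illustration. The percentile rank follows the definition given above (percentage of the norm group scoring below a given score), although some texts also count half of the tied scores.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced decision: compare the score to a fixed standard."""
    return score >= cut_score


# Hypothetical norm sample of raw scores.
norm_sample = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

raw = 20
print(percentile_rank(raw, norm_sample))    # norm-referenced interpretation
print(meets_criterion(raw, cut_score=18))   # criterion-referenced interpretation
```

The same raw score of 20 yields two different statements: it exceeds 50% of this norm group, and it passes a cut score of 18 regardless of how anyone else performed.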

Research Methods and Statistics in Assessment

Levels of Measurement

Nominal: Categories/Names (e.g., gender, eye color).
Ordinal: Ranking (e.g., finishing positions in a race, Likert scales).
Interval: Equal intervals, no absolute zero (e.g., IQ, temperature in Celsius).
Ratio: Absolute zero, all mathematical properties (e.g., weight, income, age, height).

Measures of Central Tendency and Variability

Mean: $\bar{x} = \frac{\sum x}{n}$ . Most stable, used for normal distributions.
Median: Middle score. Best for skewed data.
Mode: Most frequent score. Best for nominal data.
Standard Deviation ( $\sigma$ ): Square root of variance.
Interquartile Range ( $IQR$ ): $Q3 - Q1$ .
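
The measures above map directly onto Python's standard `statistics` module. This is a minimal sketch on made-up scores. It uses the population standard deviation ( $\sigma$ ) and the module's default quartile method, which is one of several conventions and can give slightly different quartiles than other software.

```python
from statistics import mean, median, mode, pstdev, quantiles

scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical raw scores

average = mean(scores)              # sum of scores divided by n
middle = median(scores)             # middle score of the sorted list
most_frequent = mode(scores)        # most frequently occurring score
sd = pstdev(scores)                 # population standard deviation
q1, _, q3 = quantiles(scores, n=4)  # quartile cut points (default method)
iqr = q3 - q1                       # interquartile range

print(average, middle, most_frequent, round(sd, 3), iqr)
```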

Distribution Shapes

Positive Skew: Few high scores (difficult test); Mean > Median > Mode.
Negative Skew: Few low scores (easy test); Mean < Median < Mode.
Kurtosis: Peakedness of the curve.
- Leptokurtic: Peaked, fat-tailed.
- Platykurtic: Flat, thin-tailed.
- Mesokurtic: Normal curve.
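
Skew and kurtosis can also be computed numerically. The sketch below uses moment-based population formulas, which is one common convention among several. The function names and data are assumptions for illustration. Excess kurtosis is reported, so a normal (mesokurtic) curve is near $0$ , leptokurtic is positive, and platykurtic is negative.

```python
def central_moment(data, order):
    """Average of (x - mean) raised to the given power."""
    m = sum(data) / len(data)
    return sum((x - m) ** order for x in data) / len(data)


def skewness(data):
    """Positive for a long right tail, negative for a long left tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal curve scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical score sets.
hard_test = [1, 1, 2, 2, 3, 10]    # few high scores
easy_test = [1, 8, 9, 9, 10, 10]   # few low scores
flat = [1, 2, 3, 4, 5]             # evenly spread scores

print(skewness(hard_test) > 0)     # positive skew
print(skewness(easy_test) < 0)     # negative skew
print(excess_kurtosis(flat) < 0)   # flatter than normal (platykurtic)
```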

Standard Scores

Z-Score: $Mean = 0$ , $SD = 1$ . Formula: $Z = \frac{X - \mu}{\sigma}$ .
T-Score: $Mean = 50$ , $SD = 10$ . Formula: $T = 50 + 10(Z)$ .
STANINE: $Mean = 5$ , $SD = 2$ .
STEN: $Mean = 5.5$ , $SD = 2$ .
Deviation IQ: $Mean = 100$ , $SD = 15$ .
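
The standard scores above are all linear transformations of $Z$ , which the following Python sketch makes explicit. The function names and the sample values are assumptions for illustration. Stanines and stens are rounded and clamped to their scales here. Published tests often assign them from percentile bands instead, so treat this as an approximation.

```python
import math


def z_score(x, mean, sd):
    """Z = (X - mean) / SD."""
    return (x - mean) / sd


def t_score(z):
    return 50 + 10 * z


def deviation_iq(z):
    return 100 + 15 * z


def _round_clamp(value, low, high):
    """Round half up, then keep the result inside [low, high]."""
    return min(high, max(low, math.floor(value + 0.5)))


def stanine(z):
    return _round_clamp(5 + 2 * z, 1, 9)


def sten(z):
    return _round_clamp(5.5 + 2 * z, 1, 10)


# Hypothetical raw score of 65 on a test with mean 50 and SD 10.
z = z_score(65, mean=50, sd=10)
print(z, t_score(z), deviation_iq(z), stanine(z), sten(z))
```

A raw score 1.5 standard deviations above the mean corresponds to a T-score of 65, a deviation IQ of 122.5, a stanine of 8, and a sten of 9.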

Statistical Tests and Hypothesis Testing

Parametric Tests: Assume normal distribution and homogeneous variance (e.g., Pearson r, t-tests, ANOVA).
Non-Parametric Tests: "Distribution-free"; used for nominal/ordinal data (e.g., Spearman Rho, Mann-Whitney U, Kruskal-Wallis).
Hypothesis Testing:
- Null Hypothesis ( $H_0$ ): No relationship.
- Alternative Hypothesis ( $H_a$ ): Significant relationship.
- Significant Result: $p \le 0.05$ (usually).
Specific Tests:
- Chi-Squared ( $\chi^2$ ): Used for categorical data.
- Post Hoc Tests: Used after ANOVA (e.g., Tukey's HSD, Scheffé).
- Levene's Test/Bartlett's Test: Check for homogeneity of variance.
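
As a small worked example of hypothesis testing with categorical data, the sketch below computes a chi-squared goodness-of-fit statistic by hand. The counts are made up, and the function name is an assumption. The critical value 3.841 is the standard tabled value for $df = 1$ at $\alpha = 0.05$ . A statistics package would normally report the exact $p$ value instead.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 50 respondents choosing between two options,
# with the null hypothesis expecting an even 25/25 split.
observed = [30, 20]
expected = [25, 25]

statistic = chi_square_statistic(observed, expected)
CRITICAL_DF1_ALPHA_05 = 3.841  # tabled critical value, df = 1, alpha = 0.05

reject_null = statistic > CRITICAL_DF1_ALPHA_05
print(statistic, reject_null)
```

Here the statistic is $2.0$ , which falls below the critical value, so the null hypothesis is retained for these counts.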

The Test Development Process

Test Conceptualization: Defining purpose, target population, and item specifications.
Test Construction: Writing items, setting scaling rules, and defining the Item Pool.
Test Tryout: Administering the prototype to a sample ( $5$ to $10$ subjects per item).
Item Analysis:
- Item-Difficulty Index ( $p$ ): Proportion of test takers who answer the item correctly. Optimal difficulty is usually $\frac{1+c}{2}$ (where $c$ is the probability of answering correctly by chance).
- Item-Discrimination Index ( $d$ or $D$ ): Ability of an item to distinguish between high and low scorers.
Test Revision: Modifying content based on analysis; involves Cross-Validation and checking for Validity Shrinkage.
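
The two item-analysis indices above can be sketched in Python. Responses are coded 1 (correct) and 0 (incorrect). The function names, the sample responses, and the use of equal-sized upper and lower scoring groups are assumptions for illustration.

```python
def item_difficulty(responses):
    """p: proportion of test takers answering the item correctly."""
    return sum(responses) / len(responses)


def optimal_difficulty(chance):
    """Midpoint between chance-level success and 1.0, i.e. (1 + c) / 2."""
    return (1 + chance) / 2


def item_discrimination(upper_group, lower_group):
    """d: proportion correct among high scorers minus that among low scorers."""
    return item_difficulty(upper_group) - item_difficulty(lower_group)


# Hypothetical responses to one four-option multiple-choice item.
upper = [1, 1, 1, 0]   # test takers with the highest total scores
lower = [0, 1, 0, 0]   # test takers with the lowest total scores

print(item_difficulty(upper + lower))    # difficulty across both groups
print(optimal_difficulty(chance=0.25))   # four options, so c = 0.25
print(item_discrimination(upper, lower))
```

For a four-option item the optimal difficulty works out to $(1 + 0.25)/2 = 0.625$ . The item above has $p = 0.5$ and $d = 0.5$ , meaning high scorers pass it noticeably more often than low scorers.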

Classifications of Psychological Tests

Based on Administration and Qualification

Individual vs. Group: Level C tests (e.g., Wechsler) are individual-only.
Level A: Straightforward, manual-based (e.g., achievement tests).
Level B: Requires psychometric knowledge (e.g., objective personality tests, group IQ tests).
Level C: Substantial specialized training (e.g., projective tests, individual IQ tests).

Based on Variable Measured

Ability Tests: Measure what a client can do.
- Achievement: Past learning.
- Aptitude: Future potential.
- Intelligence: General mental ability (present-oriented).
Typical Performance: Measure day-to-day behavior.
- Personality Inventories: Stable traits.
- Interest/Values/Attitude Inventories.
- Projective Techniques: Elicit unconscious material (e.g., Rorschach, TAT).

Landmark IQ and Ability Tests

Stanford-Binet (SB-5): Population $2$ to $85+$ years. Measures $5$ CHC factors: Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, and Working Memory. Uses Routing Tests and Basal/Ceiling levels.
Wechsler Adult Intelligence Scale (WAIS-IV/V): Population $16$ to $90$ years. The WAIS-IV indices are Verbal Comprehension ( $VCI$ ), Perceptual Reasoning ( $PRI$ ), Working Memory ( $WMI$ ), and Processing Speed ( $PSI$ ).
Raven's Progressive Matrices (RPM): Non-verbal measure of abstract reasoning. Includes Standard (general), Advanced (gifted), and Colored (children/elderly) versions.
Culture Fair Intelligence Test (CFIT): By R.B. Cattell; minimizes cultural/educational bias.
Differential Aptitude Tests (DAT): Measures specific skills like mechanical reasoning, space relations, and perceptual speed.

Major Personality and Projective Tests

MMPI-2/MMPI-3: Used for psychopathology. Contains clinical scales (e.g., Hypochondriasis, Schizophrenia) and validity scales (e.g., Lie, Infrequency, Correction).
MCMI-III/IV: Based on Theodore Millon's theory; aligns with DSM personality disorders. Uses Base Rate (BR) scores.
NEO-PI-3: Measures the Big Five (OCEAN): Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.
MBTI: Measures preferences (Extraversion/Introversion, Sensing/Intuition, Thinking/Feeling, Judging/Perceiving) based on Jungian theory.
16PF: Developed by Cattell; uses STEN scores for $16$ primary factors.
Projective Tests:
- Rorschach Inkblot: $10$ inkblots (5 black/gray, 2 black and red, 3 multicolored).
- Thematic Apperception Test (TAT): Storytelling based on ambiguous pictures. Includes local versions: Philippine TAT (Lagmay).
- Draw-A-Person (DAPT), House-Tree-Person (HTP), Kinetic Family Drawing (KFD).

Ethical Principles and Philippine Laws

PAP Code of Ethics Principles

Respect for Dignity: Recognizing inherent worth, privacy, and informed consent.
Competent Caring: Working for benefit, maximizing benefits, and "doing no harm."
Integrity: Honesty, accuracy, and managing conflicts of interest.
Professional/Scientific Responsibility to Society: Knowledge promotion and beneficial use.

Relevant Philippine Laws

RA 10029 (Philippine Psychology Act of 2009): Regulates the practice of psychology and psychometrics; sets the passing standard for the licensure examination (general average $\ge 75\%$ , with no grade below $60\%$ in any subject).
RA 11036 (Mental Health Act): Protects rights of those with psychiatric/neurologic needs.
Data Privacy Act (RA 10173): Governance of personal information.

Advanced Errors and Biases in Assessment

Flynn Effect: Rising IQ scores over time resulting in the need for re-norming.
Hawthorne Effect: Increased performance due to being observed.
Rosenthal/Pygmalion Effect: Examiner's high expectations influence the subject's performance.
Barnum Effect: Accepting vague, general personality descriptions as uniquely accurate.
Rating Errors: Leniency (too easy), Severity (too strict), Central Tendency (middle-only scoring), and Halo/Horn effects.
Social Desirability: "Faking good," that is, responding in ways meant to win the examiner's approval rather than reflect actual behavior.