Measuring Intelligence: Tests, Subtests, Psychometrics, and Debates
Historical Background & Foundational Concepts
Early measurement of intelligence adopted a strictly analytic (cognitive-testing) view.
Binet–Simon Scale (1905)
Goal: "measure the child’s intellectual powers" to identify whether a child was “normal” or “retarded.” (Terminology now considered outdated.)
Approach:
Age-graded test items created for successive age groups.
Child worked through age sets until failing an entire set.
"Mental age" = highest age level at which child passed all items.
Lower mental age than chronological age flagged possible intellectual disability.
Significance: First systematic, standardized intelligence test.
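The age-graded procedure above can be sketched as a small function. This is a simplified illustration, not Binet's actual scoring manual. It assumes each age set is stored as a list of pass/fail booleans, that testing stops at the first set in which every item is failed, and that mental age is the highest level at which all items were passed. The names `binet_mental_age` and `below_chronological_age` are hypothetical.

```python
from __future__ import annotations


def binet_mental_age(results_by_age: dict[int, list[bool]]) -> int | None:
    """Return the highest age level at which every item was passed.

    Age sets are checked in ascending order; testing stops at the first set
    in which the child fails every item. Returns None if no set was fully passed.
    """
    mental_age = None
    for age in sorted(results_by_age):
        items = results_by_age[age]
        if all(items):
            mental_age = age  # every item at this level passed
        elif not any(items):
            break  # entire set failed: stop testing
    return mental_age


def below_chronological_age(mental_age: int | None, chronological_age: int) -> bool:
    """Simplified flag from the notes: mental age lower than chronological age."""
    return mental_age is None or mental_age < chronological_age


# Hypothetical record for a 7-year-old (True = item passed)
record = {
    5: [True, True, True],
    6: [True, True, False],
    7: [True, False, False],
    8: [False, False, False],
}
ma = binet_mental_age(record)
print(ma, below_chronological_age(ma, 7))
```

Here the child passes every 5-year item but not every 6-year item, so the mental age is 5 and the flag is raised.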
Terman’s Stanford–Binet Revision (1916)
Introduced the Intelligence Quotient (IQ).
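Formula: $\mathrm{IQ} = \dfrac{\mathrm{MA}}{\mathrm{CA}} \times 100$, where MA = mental age and CA = chronological age.
Example: 7-yr-old passing 7-yr items → $\mathrm{IQ} = \frac{7}{7} \times 100 = 100$ (average).
7-yr-old passing 8-yr items → $\mathrm{IQ} = \frac{8}{7} \times 100 \approx 114$ (above average).
7-yr-old failing 6-yr items → mental age below 6, so $\mathrm{IQ} < \frac{6}{7} \times 100 \approx 86$ (below average).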
Limitation: mental-age items are capped at 16, so the ratio IQ cannot be applied accurately beyond age 16. Mental age stops rising while the denominator (chronological age) keeps growing, so IQ would artificially decline.
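The ratio IQ, $\mathrm{IQ} = (\mathrm{MA} / \mathrm{CA}) \times 100$, and its ceiling problem can be checked numerically. This is a minimal sketch. The function name `ratio_iq` is hypothetical. The below-average case assumes a child who fails the 6-year set entirely and therefore has a mental age of at most 5.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Terman-style ratio IQ: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# 7-year-olds with mental ages of 7, 8 and 5 (the last is an assumed value)
for mental_age in (7, 8, 5):
    print(f"MA {mental_age}, CA 7 -> IQ {ratio_iq(mental_age, 7):.0f}")

# Ceiling problem: mental-age items stop at 16, so even a top scorer's
# ratio IQ shrinks as chronological age grows.
MENTAL_AGE_CEILING = 16
for age in (16, 20, 32):
    print(f"CA {age} -> IQ {ratio_iq(MENTAL_AGE_CEILING, age):.0f}")
```

The second loop makes the limitation concrete. An adult who tops out the scale would score 100 at 16, 80 at 20, and 50 at 32, even though nothing about their ability has changed.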
Modern IQ Scoring & Distribution
Contemporary tests abandon the mental-age ratio; instead, scores are norm-referenced.
IQs distributed normally (bell curve):
Mean = 100, standard deviation (SD) = 15.
Allows classification bands (e.g., < 70 ≈ intellectual disability range, ≥ 130 ≈ gifted range; each cut-off lies 2 SD from the mean).
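Norm-referencing can be illustrated with a z-score conversion. This sketch assumes the usual deviation-IQ scale of mean 100 and SD 15, hypothetical same-age norms (raw mean 42, raw SD 8), and illustrative cut-offs of < 70 and ≥ 130. Published tests convert raw scores through age-based norm tables rather than through a single linear formula, so this shows only the underlying logic. The function names are hypothetical. The percentile uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

IQ_MEAN, IQ_SD = 100, 15
IQ_DIST = NormalDist(mu=IQ_MEAN, sigma=IQ_SD)


def deviation_iq(raw_score, norm_mean, norm_sd):
    """Place a raw score on the IQ scale relative to same-age norms."""
    z = (raw_score - norm_mean) / norm_sd  # distance from peers in SD units
    return IQ_MEAN + IQ_SD * z


def percentile(iq):
    """Percentage of the normative population expected to score below this IQ."""
    return 100 * IQ_DIST.cdf(iq)


def band(iq):
    """Illustrative bands using cut-offs 2 SD below / above the mean."""
    if iq < 70:
        return "intellectual disability range"
    if iq >= 130:
        return "gifted range"
    return "typical range"


# Hypothetical norms: same-age peers average 42 raw points, SD 8.
iq = deviation_iq(58, norm_mean=42, norm_sd=8)
print(iq, round(percentile(iq), 1), band(iq))
```

A raw score 2 SD above same-age peers maps to IQ 130, about the 97.7th percentile. This is why the same raw score can yield different IQs at different ages.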
Stanford–Binet, 5th Edition (SB-5)
Contains 10 core subtests yielding 5 factor scores (two subtests per factor):
Fluid Reasoning – solving novel problems (e.g., matrices).
Knowledge – crystallized knowledge (e.g., vocabulary).
Quantitative Reasoning – numerical & arithmetic ability.
Visual–Spatial Processing – pattern detection in visual stimuli.
Working Memory – holding & manipulating information short-term.
Wechsler Family of Tests
WAIS-IV (Wechsler Adult Intelligence Scale, 2008 edition)
Generates Full-Scale IQ plus four index scores:
Verbal Comprehension Index (VCI)
Similarities: "How are apples and pears alike?"
Vocabulary: "What is a guitar?"
Information: "What is the capital of France?"
Comprehension: "Why are we tried by a jury of our peers?"
Working Memory Index (WMI)
Digit Span: recall lengthening strings of numbers.
Arithmetic: mental calculations.
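Letter–Number Sequencing: hear a mixed string of letters and numbers, then repeat the numbers in ascending order followed by the letters in alphabetical order (e.g., "Q1 B3 J2" → "1 2 3 B J Q").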
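The Letter–Number Sequencing rule is precise enough to state as code. This sketch derives the expected response from a presented string and checks an answer against it, ignoring spacing and case. It assumes single-digit numbers, as in the "Q1 B3 J2" example. It is not the official administration or scoring procedure, and the function names are hypothetical.

```python
def lns_expected_response(presented: str) -> str:
    """Expected answer: digits in ascending order, then letters alphabetically."""
    chars = [c.upper() for c in presented if c.isalnum()]
    digits = sorted(c for c in chars if c.isdigit())
    letters = sorted(c for c in chars if c.isalpha())
    return " ".join(digits + letters)


def _normalize(response: str) -> str:
    """Drop whitespace and ignore case so only the order of items matters."""
    return "".join(response.split()).upper()


def lns_is_correct(presented: str, response: str) -> bool:
    """True if the response lists the items in the expected order."""
    return _normalize(response) == _normalize(lns_expected_response(presented))


print(lns_expected_response("Q1 B3 J2"))
```

The task loads working memory because the examinee must hold all items while sorting two categories, which the two `sorted` calls do here in a single step.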
Perceptual Reasoning Index (PRI)
Block Design: replicate 2-D patterns using colored blocks.
Matrix Reasoning: choose the missing piece that satisfies both the row and the column rules.
Visual Puzzles: choose the three pieces that together assemble into the target shape.
Picture Completion: identify missing element (car without wheels, balloon without string).
Figure Weights: balance-scale quantitative and analogical reasoning (e.g., use star/pentagon equivalences shown on balanced scales to deduce what balances the target scale).
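The substitution reasoning behind Figure Weights can be sketched as resolving chains of equivalences into a common unit. The item here is hypothetical (1 star = 2 pentagons, 1 pentagon = 3 circles) and deliberately simplified: real items can place several shapes on each pan. `Fraction` keeps the arithmetic exact. The function names are hypothetical.

```python
from fractions import Fraction


def weights_in_base_unit(equivalences, base):
    """Express every shape's weight in units of `base`.

    `equivalences` maps shape -> (count, other_shape), read as
    "one shape balances `count` copies of other_shape". Chains are followed
    until they reach `base`; an unknown shape raises KeyError.
    """
    resolved = {base: Fraction(1)}

    def resolve(shape, seen=()):
        if shape in resolved:
            return resolved[shape]
        if shape in seen:
            raise ValueError(f"circular equivalence involving {shape!r}")
        count, other = equivalences[shape]
        resolved[shape] = Fraction(count) * resolve(other, seen + (shape,))
        return resolved[shape]

    for shape in equivalences:
        resolve(shape)
    return resolved


def pan_weight(pan, weights):
    """Total weight of one scale pan, given shape -> count."""
    return sum(weights[shape] * n for shape, n in pan.items())


# Hypothetical item: 1 star = 2 pentagons, 1 pentagon = 3 circles.
weights = weights_in_base_unit(
    {"star": (2, "pentagon"), "pentagon": (3, "circle")}, base="circle"
)
# How many circles balance one star plus one pentagon?
print(pan_weight({"star": 1, "pentagon": 1}, weights))
```

A star is worth 6 circles and a pentagon 3, so 9 circles balance the target pan.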
Processing Speed Index (PSI)
Symbol Search: rapid yes/no detection of targets among distractors.
Coding: transcribe symbol–digit pairs quickly (code-breaker).
Cancellation: mark all instances of specified targets (e.g., red squares & yellow triangles).
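Processing-speed tasks such as Cancellation reward accurate marking, so careless marks should not help. This sketch scores a page as hits minus false alarms. That rule is an assumption chosen for illustration and is not taken from any test manual. The time limit is not modelled, and the `Stimulus` class, target set, and example page are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    color: str
    shape: str


# Targets from the example in the text
TARGETS = frozenset({Stimulus("red", "square"), Stimulus("yellow", "triangle")})


def cancellation_score(grid, marked_indices, targets=TARGETS):
    """Score a cancellation page as hits minus false alarms (assumed rule).

    Marking the same position twice counts once; timing is not modelled.
    """
    marked = set(marked_indices)
    hits = sum(1 for i in marked if grid[i] in targets)
    false_alarms = len(marked) - hits
    return hits - false_alarms


example_grid = [
    Stimulus("red", "square"),       # target
    Stimulus("blue", "square"),      # distractor: wrong color
    Stimulus("yellow", "triangle"),  # target
    Stimulus("red", "triangle"),     # distractor: wrong color-shape pairing
]
print(cancellation_score(example_grid, [0, 2, 3]))  # 2 hits, 1 false alarm
```

Making `Stimulus` a frozen dataclass makes it hashable, so target lookup is a simple set membership test.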
WISC (Wechsler Intelligence Scale for Children): ages 6–16 (index structure similar to the WAIS).
WPPSI (Wechsler Preschool & Primary Scale of Intelligence): ages roughly 2½–7.
Raven’s Progressive Matrices
Non-verbal multiple-choice test; the participant selects the missing panel that satisfies rules running both across rows and down columns.
Advantages:
Minimizes influence of language, reading, writing.
Widely used in job selection for perceived cultural fairness.
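The row-and-column logic shared by Raven's Progressive Matrices and Matrix Reasoning can be shown with a toy solver. This sketch assumes a deliberately simple rule set: each attribute is either constant along every row or constant down every column. Real items use richer rules such as progressions and distributions of values. The example matrix, options, and function names are hypothetical.

```python
def predict_missing_cell(matrix):
    """Predict the missing bottom-right cell of a 3x3 matrix.

    Toy rule set (an assumption, far simpler than real items): each attribute
    is either constant along every row or constant down every column.
    """
    full_rows = matrix[:2]  # rows 1-2 are complete
    full_cols = [[matrix[r][c] for r in range(3)] for c in range(2)]  # cols 1-2
    prediction = {}
    for attr in matrix[0][0]:
        if all(len({cell[attr] for cell in row}) == 1 for row in full_rows):
            prediction[attr] = matrix[2][0][attr]  # row rule: copy from same row
        elif all(len({cell[attr] for cell in col}) == 1 for col in full_cols):
            prediction[attr] = matrix[0][2][attr]  # column rule: copy from same column
        else:
            raise ValueError(f"no row or column rule found for {attr!r}")
    return prediction


def choose_option(matrix, options):
    """Index of the first answer option that matches the prediction, else None."""
    target = predict_missing_cell(matrix)
    return next((i for i, opt in enumerate(options) if opt == target), None)


shapes = ["circle", "square", "triangle"]
example_matrix = [[{"shape": s, "count": n} for n in (1, 2, 3)] for s in shapes]
example_matrix[2][2] = None  # the missing panel
example_options = [
    {"shape": "triangle", "count": 2},
    {"shape": "square", "count": 3},
    {"shape": "triangle", "count": 3},
    {"shape": "circle", "count": 3},
]
print(choose_option(example_matrix, example_options))
```

Shape follows a row rule and count follows a column rule, so only the option that satisfies both (a triangle with count 3, index 2) is accepted. Distractors that satisfy one rule only are rejected.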
Psychometric Properties
Reliability (consistency)
Typically assessed via test–retest correlations.
IQ tests often yield r ≈ .90 or higher (very high).
Perfect reliability would be r = 1.0.
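Test–retest reliability is a Pearson correlation between two administrations of the same test. This sketch computes it from scratch. The scores are invented for illustration and are not real data, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired scores (time 1 vs. time 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined (division by zero) if scores are constant


# Invented IQs for five people tested twice (illustration only)
time1 = [92, 105, 118, 99, 130]
time2 = [95, 103, 121, 97, 127]
print(round(pearson_r(time1, time2), 3))
```

Scores that keep the same rank order and spacing across sessions push r toward 1.0. Large reshuffles between sessions pull it toward 0.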
Validity (accuracy)
Criterion validity examined via correlations with theoretically relevant outcomes:
School grades, years of education, occupational status, job performance.
Typical correlations around r ≈ .5 (moderate–strong).
Indicates IQ is not a perfect predictor but provides substantial information from a single measure.
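Why a moderate validity correlation is informative but imperfect can be shown with two standard formulas: shared variance is $r^2$, and in standardized simple regression the predicted criterion z-score is $r$ times the predictor z-score. This sketch assumes illustrative r values (0.3, 0.5, 0.7) and the deviation-IQ scale of mean 100 and SD 15. The function names are hypothetical.

```python
def variance_explained(r):
    """Proportion of criterion variance shared with the predictor (r squared)."""
    if not -1 <= r <= 1:
        raise ValueError("correlation must lie in [-1, 1]")
    return r ** 2


def predicted_criterion_z(iq, r, mean=100, sd=15):
    """Standardized regression prediction: criterion z = r * predictor z."""
    return r * (iq - mean) / sd


for r in (0.3, 0.5, 0.7):
    print(f"r = {r:.1f} -> {variance_explained(r):.0%} of variance shared")

# With an assumed r = 0.5, someone 2 SD above the IQ mean is predicted
# to be only 1 SD above the mean on the outcome.
print(predicted_criterion_z(130, 0.5))
```

A single score sharing about a quarter of the outcome variance is substantial for one measure, yet predictions still regress halfway toward the mean.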
Arguments For IQ Testing
Practical decision-making:
Detect learning disabilities, intellectual disabilities, or giftedness ⇒ tailor intervention/education/treatment.
Neuropsychological diagnosis: localize cognitive deficits after brain injury.
Objectivity: standardized scoring reduces examiner bias relative to subjective judgment.
Predictive utility: moderate correlations with academic and occupational outcomes justify usage.
Arguments Against IQ Testing
Construct Validity Concerns: may gauge accumulated knowledge rather than capacity to learn.
Cultural Bias: items may favor Western, majority-culture, or academically privileged groups.
Labeling & Self-Fulfilling Prophecy:
Diagnostic labels can stigmatize; low-score students may disengage, lowering future performance.
Incomplete View of Intelligence: omits domains like emotional intelligence, creative or practical intelligence.
Ethical, Philosophical & Practical Implications
Importance of fair test design and interpretation to avoid reinforcing social inequities.
Need for multi-faceted assessment (emotional, social, creative) to capture broad human abilities.
Continuous revision (e.g., SB-5, WAIS-IV) attempts to keep norms current and reduce bias.
Key Takeaways
Intelligence measurement evolved from Binet’s age-based method to modern norm-referenced IQs.
Stanford–Binet and Wechsler batteries remain flagship tools, breaking IQ into multiple cognitive factors.
Raven’s Matrices offer a largely language-free, non-verbal reasoning measure that is often regarded as culturally fairer.
High reliability and moderate–strong validity support utility, yet critiques remind us scores reflect only part of the intelligence construct.
Ethical use demands awareness of cultural bias, labeling risks, and the existence of multiple intelligences beyond those captured by traditional IQ tests.