Measuring Intelligence: Tests, Subtests, Psychometrics, and Debates

Historical Background & Foundational Concepts

  • Early measurement of intelligence adopted a strictly analytic (cognitive-testing) view.

  • Binet–Simon Scale (1905)

    • Goal: "measure the child’s intellectual powers" to identify whether a child was “normal” or “retarded.” (Terminology now considered outdated.)

    • Approach:

      • Age-graded test items created for successive age groups.

      • Child worked through age sets until failing an entire set.

      • "Mental age" = highest age level at which child passed all items.

      • Lower mental age than chronological age flagged possible intellectual disability.

    • Significance: First systematic, standardized intelligence test.

  • Terman’s Stanford–Binet Revision (1916)

    • Introduced the Intelligence Quotient (IQ).

    • Formula: $IQ = \frac{\text{Mental Age}}{\text{Chronological Age}} \times 100$

    • Example: 7-yr-old passing 7-yr items → $\frac{7}{7}\times100 = 100$ (average).

    • 7-yr-old passing 8-yr items → $\frac{8}{7}\times100 \approx 114$ (above average).

    • 7-yr-old passing only up to the 6-yr items (mental age 6) → $\frac{6}{7}\times100 \approx 86$ (below average).

    • Limitation: Mental-age items capped at 16 ⇒ impossible to test post-16 chronological ages accurately (IQ would artificially decline as denominator rises).
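
The two scoring steps above (Binet–Simon mental age, then Terman's ratio IQ) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout of the item results, and the sample child are hypothetical, and the cap of 16 is the figure given in these notes.

```python
def mental_age(results):
    """Return the highest age level at which every item was passed.

    `results` maps an age level in years to a list of booleans, one per
    item (True = passed). This data layout is a hypothetical convenience.
    Returns None if no age level was passed completely.
    """
    passed = [age for age, items in results.items() if items and all(items)]
    return max(passed) if passed else None


def ratio_iq(mental_age_years, chronological_age_years):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_years <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age_years / chronological_age_years * 100


# Hypothetical 7-year-old: passes all 5-, 6- and 7-year items,
# only one 8-year item, and no 9-year items.
child = {
    5: [True, True, True],
    6: [True, True, True],
    7: [True, True, True],
    8: [True, False, False],
    9: [False, False, False],
}

print(mental_age(child))                       # 7
print(round(ratio_iq(mental_age(child), 7)))   # 100
print(round(ratio_iq(8, 7)))                   # 114
print(round(ratio_iq(6, 7)))                   # 86

# Ceiling problem: once mental age cannot exceed the cap, the ratio
# shrinks as chronological age grows, even if ability is unchanged.
MENTAL_AGE_CAP = 16
for age in (16, 20, 32):
    print(age, round(ratio_iq(min(age, MENTAL_AGE_CAP), age)))
# 16 100
# 20 80
# 32 50
```

The last loop shows why the ratio formula breaks down for adults: an adult performing at the top of the scale would score 80 at age 20 and 50 at age 32 purely because the denominator keeps rising.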

Modern IQ Scoring & Distribution

  • Contemporary tests abandon the mental-age ratio; instead, scores are norm-referenced.

  • IQs distributed normally (bell curve):

    • Mean = 100, standard deviation = 15.

    • Allows classification bands (e.g., <70 intellectual disability, 130+ gifted).
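
As a rough sketch of how norm-referenced scoring works, the snippet below rescales a raw score to the IQ metric (mean 100, SD 15) and looks up its percentile under a normal curve. The norm-group mean and SD for the raw score (50 and 10) are made-up numbers for illustration, the function names are hypothetical, and the band cut-offs are simply the ones quoted above.

```python
from statistics import NormalDist

IQ_MEAN = 100
IQ_SD = 15
iq_scale = NormalDist(mu=IQ_MEAN, sigma=IQ_SD)


def deviation_iq(raw_score, norm_mean, norm_sd):
    """Rescale a raw score to the IQ metric via its z-score in the norm group."""
    z = (raw_score - norm_mean) / norm_sd
    return IQ_MEAN + IQ_SD * z


def percentile(iq):
    """Percentage of the norm population expected to score at or below `iq`."""
    return 100 * iq_scale.cdf(iq)


def band(iq):
    """Illustrative bands using the cut-offs quoted in these notes."""
    if iq < 70:
        return "below 70"
    if iq >= 130:
        return "130 or above"
    return "70 to 129"


# Hypothetical norm group: raw scores with mean 50 and SD 10.
iq = deviation_iq(65, norm_mean=50, norm_sd=10)
print(iq, band(iq))                 # 122.5 70 to 129
print(round(percentile(100), 1))    # 50.0
print(round(percentile(130), 1))    # 97.7
print(round(percentile(70), 1))     # 2.3
```

Because both cut-offs sit two standard deviations from the mean, each band beyond them contains only about 2.3% of the norm population.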

Stanford–Binet, 5th Edition (SB-5)

  • Contains 10 core subtests yielding 5 factor scores (two subtests per factor):

    1. Fluid Reasoning – solving novel problems (e.g., matrices).

    2. Knowledge – crystallized knowledge (e.g., vocabulary).

    3. Quantitative Reasoning – numerical & arithmetic ability.

    4. Visual–Spatial Processing – pattern detection in visual stimuli.

    5. Working Memory – holding & manipulating information short-term.

Wechsler Family of Tests

  • WAIS-IV (Wechsler Adult Intelligence Scale, 2008 edition)

    • Generates Full-Scale IQ plus four index scores:

    1. Verbal Comprehension Index (VCI)

      • Similarities: "How are apples and pears alike?"

      • Vocabulary: "What is a guitar?"

      • Information: "What is the capital of France?"

      • Comprehension: "Why are we tried by a jury of our peers?"

    2. Working Memory Index (WMI)

      • Digit Span: recall lengthening strings of numbers.

      • Arithmetic: mental calculations.

      • Letter–Number Sequencing: reorder mixed strings (e.g., "Q1 B3 J2" → "1 2 3 B J Q").

    3. Perceptual Reasoning Index (PRI)

      • Block Design: replicate 2-D patterns using colored blocks.

      • Matrix Reasoning: choose missing piece respecting row/column rules (e.g., answer = panel 4 in sample).

      • Visual Puzzles: assemble given parts to match target shape (e.g., combination 1 + 3 + 6).

      • Picture Completion: identify missing element (car without wheels, balloon without string).

      • Figure Weights: balance-scale analogical reasoning (e.g., deduce answer "3" using star/pentagon equivalences).

    4. Processing Speed Index (PSI)

      • Symbol Search: rapid yes/no detection of targets among distractors.

      • Coding: transcribe symbol–digit pairs quickly (code-breaker).

      • Cancellation: mark all instances of specified targets (e.g., red squares & yellow triangles).

  • WISC (Wechsler Intelligence Scale for Children): ages 6–16 (mirrors WAIS indices).

  • WPPSI (Wechsler Preschool & Primary Scale of Intelligence): preschool and early-primary children, roughly ages 2½ to 7½.
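
The Letter–Number Sequencing rule listed under the Working Memory Index (digits first in ascending order, then letters alphabetically) is simple enough to express in code. The sketch below only reproduces the correct answer for a given stimulus; the function name is hypothetical and this is not the test's actual scoring procedure.

```python
def letter_number_sequence(stimulus):
    """Return digits in ascending order, then letters in alphabetical order."""
    chars = [c for c in stimulus if c.isalnum()]   # drop spaces and punctuation
    digits = sorted(c for c in chars if c.isdigit())
    letters = sorted(c.upper() for c in chars if c.isalpha())
    return " ".join(digits + letters)


print(letter_number_sequence("Q1 B3 J2"))  # 1 2 3 B J Q
```

The program does this trivially, whereas the test-taker must hold all six items in mind while reordering them, which is what makes the task a working-memory measure.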

Raven’s Progressive Matrices

  • Non-verbal multiple-choice test; participant selects the missing panel that completes the pattern along both rows and columns.

  • Advantages:

    • Minimizes influence of language, reading, writing.

    • Widely used in job selection for perceived cultural fairness.

Psychometric Properties

  • Reliability (consistency)

    • Typically assessed via test–retest correlations.

    • IQ tests often yield $r \approx 0.85$ (very high).

    • Perfect reliability would be $r = 1.0$.

  • Validity (accuracy)

    • Criterion validity examined via correlations with theoretically relevant outcomes:

    • School grades, years of education, occupational status, job performance.

    • Typical correlations $r \approx 0.40\text{–}0.75$ (moderate–strong).

    • Indicates IQ is not a perfect predictor but provides substantial information from a single measure.
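
Both reliability and criterion validity are reported as Pearson correlations, so a small sketch may help make the numbers concrete. The two score lists below are invented for illustration (they are not real test data), and the helper names are hypothetical. The second helper squares r to give the proportion of variance in an outcome that a linear relationship with IQ would account for.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_explained(r):
    """Proportion of outcome variance accounted for by a correlation r."""
    return r * r


# Invented test-retest data: the same eight people tested twice.
first_session = [95, 110, 102, 88, 121, 130, 99, 105]
second_session = [98, 108, 100, 92, 118, 127, 104, 101]
print(f"test-retest r = {pearson_r(first_session, second_session):.2f}")

# Validity range quoted above: r of 0.40 to 0.75.
for r in (0.40, 0.75):
    print(f"r = {r:.2f} -> {variance_explained(r):.0%} of variance")
# r = 0.40 -> 16% of variance
# r = 0.75 -> 56% of variance
```

Read this way, even the upper end of the validity range leaves a large share of the variation in grades or job performance unaccounted for, which is consistent with the point that IQ is informative but far from a perfect predictor.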

Arguments For IQ Testing

  • Practical decision-making:

    • Detect learning disabilities, intellectual disabilities, or giftedness ⇒ tailor intervention/education/treatment.

    • Neuropsychological diagnosis: localize cognitive deficits after brain injury.

  • Objectivity: standardized scoring reduces examiner bias relative to subjective judgment.

  • Predictive utility: moderate correlations with academic and occupational outcomes justify usage.

Arguments Against IQ Testing

  • Construct Validity Concerns: may gauge accumulated knowledge rather than capacity to learn.

  • Cultural Bias: items may favor Western, majority-culture, or academically privileged groups.

  • Labeling & Self-Fulfilling Prophecy:

    • Diagnostic labels can stigmatize; low-score students may disengage, lowering future performance.

  • Incomplete View of Intelligence: omits domains like emotional intelligence, creative or practical intelligence.

Ethical, Philosophical & Practical Implications

  • Importance of fair test design and interpretation to avoid reinforcing social inequities.

  • Need for multi-faceted assessment (emotional, social, creative) to capture broad human abilities.

  • Continuous revision (e.g., SB-5, WAIS-IV) attempts to keep norms current and reduce bias.

Key Takeaways

  • Intelligence measurement evolved from Binet’s age-based method to modern norm-referenced IQs.

  • Stanford–Binet and Wechsler batteries remain flagship tools, breaking IQ into multiple cognitive factors.

  • Raven’s Matrices offer a non-verbal reasoning measure that is widely perceived as more culturally fair.

  • High reliability and moderate–strong validity support utility, yet critiques remind us scores reflect only part of the intelligence construct.

  • Ethical use demands awareness of cultural bias, labeling risks, and the existence of multiple intelligences beyond those captured by traditional IQ tests.