Intelligence Testing Notes

Lecture Overview

Prevailing models:
- Cattell-Horn-Carroll (CHC) theory
- Multiple Intelligences
- Triarchic theory
- PASS model
Cultural bias
High stakes decisions
Ethical issues
History of IQ testing, development, reliability, measurement.
Clinical and beneficial uses of IQ tests.
Definition of intelligence.
Risks of IQ tests.

What is Intelligence?

David Wechsler (1944): "The aggregate or global capacity of the individual to act purposefully, to think rationally and deal effectively with his environment."
Robert Sternberg (1985): "Mental activity directed toward purposive adaptation to, and selection and shaping of, real-world environments relevant to one’s life."
Jagannath Prasad Das (1984): "The ability to plan and structure one’s behavior with an end in view."
John Wasserman (2018): "In spite of over a century of research, the study of intelligence remains controversial for its social applications and implications."

Early Concepts

"g" factor (Charles Spearman, 1904): Intelligence is a singular construct.
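Intelligence quotient (built on Alfred Binet’s 1905 concept of mental age; the ratio itself was proposed by William Stern, 1912): A child’s mental age, as estimated from test performance, divided by their chronological age, multiplied by 100: IQ = \frac{\text{mental age}}{\text{chronological age}} \times 100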
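The ratio formula above is simple arithmetic, and a short sketch may make it concrete. The function name ratio_iq and the example ages are illustrative assumptions, not part of any published test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# An 8-year-old performing at the level of a typical 10-year-old:
print(ratio_iq(10, 8))  # 125.0
# Performance exactly at age level always gives 100:
print(ratio_iq(8, 8))   # 100.0
```

Modern tests no longer use this ratio; they use deviation-based scores, as noted later in these notes.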
Fluid intelligence (Raymond Cattell, 1963): Abstract reasoning on novel tasks.
Crystallized intelligence (Raymond Cattell, 1963): Learned procedures and knowledge.

Major Theories of Intelligence

Cattell-Horn-Carroll (CHC) Theory
Sternberg’s Triarchic Theory of Successful Intelligence
Gardner’s Multiple Intelligences
Cognitive Processing Theories (PASS model)

Cattell-Horn-Carroll (CHC) Theory

Incorporates Cattell and Horn’s theory of fluid and crystallized intelligence (without g).
Based on John Carroll’s (1993) "three-stratum theory" (with g).
Developed and confirmed through factor analysis.
Currently the most widely accepted theory of cognitive abilities.
Strengths:
- Evidence-based/data-based.
- Widespread application.
- Useful in guiding assessment for specific learning disabilities or comprehensive cognitive ability assessments.
- Ever-evolving.
Weaknesses:
- Complicated and confusing for clinicians and clients.
- Challenging to view some abilities as relevant to "intelligence."
- Ever-evolving, making it difficult to know the current version.

Multiple Intelligences

Howard Gardner (1983) recognized that there are many ways people show “intelligences” beyond those valued by Western societies.
These areas were distinct from each other – not theoretically related.
Gardner’s multiple intelligences include:
- Linguistic intelligence
- Logical-mathematical intelligence
- Musical intelligence
- Visual-spatial intelligence
- Bodily-kinaesthetic intelligence
- Naturalist intelligence
- Interpersonal intelligence
- Intrapersonal intelligence
Strengths:
- Recognizes a wide range of capabilities/talents, not just "book smarts."
- Has informed different teaching approaches.
Weaknesses:
- Are these "intelligences" better described as "talents" or "skills"?
- Are these intelligences truly independent of each other?
- Limited supporting research evidence (though difficult to measure).
- Difficult to quantify performance.
- Often conflated with the prevailing myth of "learning styles."
Sternberg (1991): "It is very difficult, if not impossible, to quantify performance [on these measures]; assessments take place over extremely long periods of time, and it is questionable whether anything approaching objective scoring is even possible."

Sternberg’s Triarchic Theory of Successful Intelligence

Sternberg believed schools focus too much on analytical and memory abilities and not enough on creative and practical abilities; analytical, creative, and practical abilities all need to function together for someone to use their intelligence successfully.
3 dimensions of success:
- Componential (internal processes): metacognition, planning/organizing, memory retrieval, knowledge acquisition.
- Experiential: how well people connect their internal world to external reality – applying insights, synthesizing, dealing with novel problems, automatizing.
- Contextual: how well people adapt to, select, and shape their environments; "street smarts."
Three areas associated with successful intelligence:
- Analytical abilities: Useful in analyzing and evaluating options; problem-solving skills.
- Creative abilities: Use of experience in ways that foster insight and new ideas.
- Practical abilities: Use of tacit knowledge to adapt to changing contexts in everyday life.
Strengths:
- Combines internal aspects of intelligence (e.g., problem-solving and reasoning skills) with external aspects (e.g., experience, practice).
- Focus on real-world success.
- Traits can more easily be measured.
- Suggests that many real-life intelligent decisions are not measured by current standardized tests.
Weaknesses:
- Limited information about how the componential, experiential and contextual dimensions relate to one another.
- Are these dimensions distinct, or interrelated?
- Is there a mixing of personality traits (e.g., confidence, sociability) with intelligence?

PASS Theory (Das, Naglieri & Kirby, 1994)

An alternative to the idea of g, emphasizing psychological processes.
PASS = Planning-Attention-Simultaneous-Successive Processing theory
- Planning = cognitive control, knowledge, intentionality, self-regulation
- Attention = focused cognitive activity
- Simultaneous processing = perception of stimuli as a whole, including the ability to integrate words into a meaningful idea
- Successive processing = making a decision based on stimuli in a sequence
These work together when doing intellectual tasks – some stronger than others, depending on the task
Strengths:
- Based on neuropsychological theory about information processing.
- Some IQ tests are based on this model (Naglieri’s Cognitive Assessment System – CAS2, and Kaufman’s KABC-II).
- Factor-analytic studies of performance on the Cognitive Assessment System (CAS) show reasonable support for the PASS model (according to the authors).
Weaknesses:
- Independent authors have claimed that the PASS model is not supported by factor analysis of CAS results.
- The Planning and Attention factors are highly correlated (r = .99).
- Are the components identified in the PASS model actually what is being measured in the CAS2?

Important Note on IQ Tests

Access to and use of standardized IQ tests are restricted to registered psychologists.
Only registered psychologists can purchase the tests, directly from the publisher. They agree to a contract of use, which includes keeping the test materials secure and confidential.
In addition to being protected by copyright law, the content of IQ tests is considered a trade secret.

History of IQ Tests

Stanford-Binet intelligence scale developed during the era of emerging interest in developmental theory and intelligence theory (early 1900s).
World War I (1914–1918) created a need to identify which soldiers were suited to different roles, hence the Army Mental Tests (Alpha and Beta) were developed in 1917 and published in 1920.
David Wechsler developed his first test, the Wechsler-Bellevue Intelligence Scale (1939), by ‘cherry-picking’ the most statistically and clinically useful subtests from several existing tests; it included a Verbal IQ and a Performance IQ, and used a deviation-based IQ.
Use of IQ tests contributed to forced sterilization, institutionalization, racial segregation, dehumanization, and genocide in the decades on either side of World War II.
We must always remember never to use IQ tests as a tool for discrimination and harm, and to speak up when they are used this way.

The Flynn Effect

Population average IQ scores gradually increase by around 0.33 IQ points per year.
First observed when the Raven’s Progressive Matrices were routinely given to all 18-year-old army recruits over a long period of time.
Some differences between countries.
The increase is stronger for fluid than for crystallized intelligence.
There is some evidence that the Flynn Effect could plateau.
Potential causes: education, familiarity with testing conditions generally, changes to family life (technology, smaller families, more learning opportunities).
Life expectancy, infant mortality, and height have also improved over the same period.
Has some unintended consequences (e.g., accuracy of diagnosis, criminal culpability).
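To see why ageing norms matter for diagnosis, the rate quoted above can be turned into a rough estimate of how far scores drift as norms get older. This is a minimal sketch that assumes a constant, linear rate of 0.33 points per year; the function name and the example score are illustrative assumptions, and the calculation is not a clinical correction procedure.

```python
FLYNN_POINTS_PER_YEAR = 0.33  # approximate rate quoted in these notes


def expected_norm_inflation(years_since_norming: float,
                            rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Rough number of IQ points by which outdated norms may overstate a score.

    Assumes a constant linear rate, which is a simplification.
    """
    if years_since_norming < 0:
        raise ValueError("years_since_norming cannot be negative")
    return years_since_norming * rate


# A test normed 15 years ago:
inflation = expected_norm_inflation(15)
print(round(inflation, 2))       # 4.95
# A hypothetical obtained score of 72 on those norms, expressed against current norms:
print(round(72 - inflation, 2))  # 67.05
```

A drift of about five points is enough to move a score across a diagnostic cut-off, which is one way the accuracy-of-diagnosis concern arises.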

Measurement Terminology

Norm-referenced standardized tests
- Constructed by professional test makers.
- Normed on a representative sample from the population for which the test is intended.
- Involve fixed (standardized) procedures for administration and scoring.

Test Norms: How are they Developed?

Items with good psychometric properties are chosen so that, together, they produce a range of performance (i.e., easier versus more difficult items).
The finalized test is administered to an appropriately-sized, representative sample.
Individuals’ raw scores are converted into standardized scaled or composite scores.
Individual subtest scores usually have a mean of 10 and a standard deviation of 3.
Composite scores (e.g. IQ) usually have a mean of 100 with standard deviation of 15.
These scores reflect position above or below the mean.
Standardized scores are normally distributed.
We can compare how an individual is positioned in relation to the representative, standardized sample.
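Because both score metrics express distance from the mean in standard deviation units, the same relative position can be written on either scale. The sketch below shows that correspondence; the helper names are assumptions, and it does not show how composites are actually derived (real tests use publisher norm tables).

```python
SCALED_MEAN, SCALED_SD = 10, 3          # subtest scaled scores
COMPOSITE_MEAN, COMPOSITE_SD = 100, 15  # composite scores such as IQ


def z_score(score: float, mean: float, sd: float) -> float:
    """Distance from the mean in standard deviation units."""
    return (score - mean) / sd


def from_z(z: float, mean: float, sd: float) -> float:
    """Express a z-score on a scale with the given mean and SD."""
    return mean + z * sd


z = z_score(13, SCALED_MEAN, SCALED_SD)
print(z)                                          # 1.0 (one SD above the mean)
print(from_z(z, COMPOSITE_MEAN, COMPOSITE_SD))    # 115.0 (same position on the IQ metric)
print(z_score(85, COMPOSITE_MEAN, COMPOSITE_SD))  # -1.0 (one SD below the mean)
```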

Percentile Ranks

Percentile ranks illustrate where an individual falls relative to the rest of the standardization sample.
They are derived from how many standard deviations a score lies above or below the mean.
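If scores are assumed to follow a normal distribution, a percentile rank can be approximated directly from the mean and standard deviation. The sketch below uses Python's standard-library NormalDist; the function name is an assumption, and published tests report percentiles from their own norm tables rather than from this formula.

```python
from statistics import NormalDist


def percentile_rank(score: float, mean: float = 100, sd: float = 15) -> float:
    """Approximate percentage of the normative sample scoring at or below `score`,
    assuming normally distributed scores."""
    return NormalDist(mu=mean, sigma=sd).cdf(score) * 100


for score in (70, 85, 100, 115, 130):
    print(score, round(percentile_rank(score), 1))
# 70 2.3
# 85 15.9
# 100 50.0
# 115 84.1
# 130 97.7
```

The same function works for subtest scaled scores by passing mean=10 and sd=3.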

Standardized Testing

The results of a test are reliable and valid only to the extent that the test was administered according to standardized procedures.
What is standardized:
- Environmental factors
- Test instructions
- Acceptable responses
- Scoring procedures
- Test procedures
- Using norms to convert scores

Test Score, “True Score” and Error

Sources of error:
- Within the test (items don’t perfectly tap into and consistently measure the construct, reliability and validity aren’t perfect)
- The examiner (deviations from standardized administration and scoring, mistakes)
- The test-taker (moments of distraction/inattention, poor sleep, feeling stressed about a significant life issue, feeling anxious about test-taking, forgot glasses)
- The testing environment (fire alarm goes off, noisy children using adjacent corridor causing distraction)
We try to reduce error by establishing rapport, following standardized processes, and trying to ensure an appropriate testing environment – but we can never entirely eliminate error.
A person’s score on a psychometric test represents a “snapshot” of their performance at that time.
Purely theoretically (i.e., not taking into account practice effects), if they were to take the test again and again, their score would fluctuate.
This pattern of different scores would be assumed to fall in a normal distribution with the “true” test score being at the peak/ the mean.
So, a person’s test score = their hypothetical “true score” plus error.
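The thought experiment of taking the test again and again can be simulated. This is a minimal sketch assuming a hypothetical true score of 100 and normally distributed error with a standard deviation of 5 points; both numbers are invented for illustration and do not come from any real test.

```python
import random
from statistics import mean


def simulate_observed_scores(true_score: float, error_sd: float,
                             n_administrations: int, seed: int = 0) -> list[float]:
    """Observed score = hypothetical true score + random error, repeated many times.

    Ignores practice effects, as in the purely theoretical case described above.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_administrations)]


scores = simulate_observed_scores(true_score=100, error_sd=5, n_administrations=10_000)
print(round(min(scores)), round(max(scores)))  # individual "snapshots" vary widely
print(round(mean(scores), 1))                  # the average settles close to the true score
```

Any single administration is one draw from this distribution, which is why a single score is treated as an estimate rather than an exact value.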

Structure of Common IQ Tests

Each question is called an item – points are awarded for correct/appropriate responses.
Individual tasks are called subtests, each composed of many items (e.g., Digit Span, Vocabulary).
Subtests that are designed to measure aspects of the same broad area (e.g., fluid reasoning) are clustered together into composite scores.
A set number of subtests from across the composites are combined into a summary score, often called the Full Scale IQ (FSIQ).
Conceptually, this structure can be understood in alignment with CHC theory (though tests vary in how well they map onto CHC).

What Results Might Look Like

Numbers of correct items are added up to raw scores.
Raw scores are converted to standard scores by comparing them with normative data for the examinee’s age.
Subtest standard scores that measure the same area are added together to make a “sum of scaled scores”, and then this sum is converted to composite scores.
The FSIQ isn’t always the “full scale” – on the WISC-V, it is derived from only 7 of the 10 core subtests.
When interpreting test scores, we focus on the broadest measure, as it includes a wider range of tasks and is therefore more robust against error.
The FSIQ is the most reliable summary of overall ability – if there are highs and lows, interpreting the composites may be more meaningful.
Subtest scores may be interpreted with caution; item-level results are too narrow to interpret, though they can sometimes provide interesting qualitative observations.
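The scoring chain described in this section (item points, raw scores, scaled scores, sum of scaled scores, composite score) can be sketched as a series of table lookups. Every table and subtest name below is HYPOTHETICAL and invented purely for illustration; real tests use publisher norm tables that differ by subtest and by the examinee's age.

```python
from bisect import bisect_right

# HYPOTHETICAL norm table for one age band: minimum raw score for each scaled score.
RAW_CUTOFFS = [0, 5, 10, 15, 20, 25, 30]
SCALED_FOR_BAND = [1, 4, 7, 10, 13, 16, 19]

# HYPOTHETICAL conversion from a sum of two scaled scores to a composite score.
SUM_TO_COMPOSITE = {14: 82, 17: 91, 20: 100, 23: 109, 26: 118}


def raw_to_scaled(raw: int) -> int:
    """Look up the scaled score for a raw score in the invented age-band table."""
    if raw < 0:
        raise ValueError("raw score cannot be negative")
    band = bisect_right(RAW_CUTOFFS, raw) - 1
    return SCALED_FOR_BAND[band]


# Points awarded per item on two made-up subtests.
item_points = {
    "Subtest A": [1] * 17 + [0] * 8,  # raw score 17
    "Subtest B": [2] * 11 + [0] * 4,  # raw score 22
}

raw_scores = {name: sum(points) for name, points in item_points.items()}
scaled_scores = {name: raw_to_scaled(raw) for name, raw in raw_scores.items()}
sum_of_scaled = sum(scaled_scores.values())
composite = SUM_TO_COMPOSITE[sum_of_scaled]

print(raw_scores)     # {'Subtest A': 17, 'Subtest B': 22}
print(scaled_scores)  # {'Subtest A': 10, 'Subtest B': 13}
print(sum_of_scaled)  # 23
print(composite)      # 109
```

Using one raw-to-scaled table for both subtests is a simplification to keep the sketch short.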

Use of IQ Tests

Part of the diagnostic criteria for Intellectual Developmental Disorder (Intellectual Disability).
Can give us an idea of an individual’s cognitive strengths and weaknesses.
Identifying giftedness, or even average ability, can indicate when somebody is underachieving.
Can be useful in differential diagnosis.
In bureaucratic systems, they can enable people to qualify for support where resources are limited.
Findings can bring insight to clients and those who support them, and can inform suitable intervention.

Important Note

Standardized tests (including IQ tests) should only ever be one small part of an assessment process.
Results should never be interpreted without the context you get from interviews, observations, and other collateral data sources.
Tests can give us new insights and can confirm our hypotheses with more “objective” data.
But diagnoses, interpretations, and recommendations are made based on a combination of professional skills and clinical judgment, not on test results alone, and test results do not have the final say on clinical decision making.

Nonverbal IQ Tests

The Army Mental Tests (Beta) consisted of nonverbal tasks to assess recruits with low literacy or English skills (e.g., mazes, picture sequences, puzzles).
Truly nonverbal tests were designed to provide a fairer and more valid measure of cognitive ability for people who are disadvantaged on language-loaded tests (e.g. d/Deaf, non-English speakers, speech or language difficulties).
They have minimal to no spoken instructions or required responses.
However, they measure a narrower set of cognitive abilities than language-loaded tests can.

Examples of Nonverbal IQ Tests

Multi-dimensional tests:
- UNIT2 – Universal Nonverbal Intelligence Test – 2nd Edition
- Leiter-3 – Leiter International Performance Scale – 3rd Edition
- CTONI-2 – Comprehensive Test of Nonverbal Intelligence – 2nd Edition
Unidimensional tests:
- WNV – Wechsler Nonverbal Scale of Ability
- TONI-4 – Test of Nonverbal Intelligence – 4th Edition
None of these tests have Australian norms.

UNIT2 Example

For ages 5:0 to 21:11
In Queensland, it is the nonverbal test most commonly used in schools, as it is easier to use than the Leiter-3 and more comprehensive than some others.
Administration is completely nonverbal, although you use words to help the examinee settle into the session and to explain the gestures you will be using.
Devised based on the concept of fairness: it is language-free, it measures multiple indexes rather than one, there is minimal need for previously acquired knowledge, it has minimal emphasis on timed tasks, and it contains varied response modes.
High reliability and validity, including with several populations (cultural, language, Deaf/HoH)
Developed with input from representatives of many cultures (expert bias panels)
Six subtests, across three domains of cognitive ability (Memory, Reasoning, Quantitative)
Each domain has two subtests: one symbolic, and one non-symbolic. Examinees may draw on their existing knowledge or language for symbolic subtests.

Language and Cultural Bias

C-LTC (Culture-Language Test Classification) Framework and C-LIM (Culture-Language Interpretative Matrices) by Flanagan et al. (2007)