PYB309 Intelligence Testing Notes

Intelligence Testing

Lecture Overview

Prevailing models:
- Cattell-Horn-Carroll (CHC) theory
- Multiple Intelligences
- Triarchic theory
- PASS model
Cultural bias
High stakes decisions
Ethical issues
History of IQ testing
Development of IQ tests: How they are developed, reliability, what they measure and how, what they look like etc
Uses of IQ tests: When are they used clinically, when might they be beneficial
Risks of IQ tests
What is intelligence?

What is Intelligence?

David Wechsler, 1944: "The aggregate or global capacity of the individual to act purposefully, to think rationally and deal effectively with his environment."
Jagannath Prasad Das, 1984: "The ability to plan and structure one’s behavior with an end in view."
Robert Sternberg, 1985: "Mental activity directed toward purposive adaptation to, and selection and shaping of, real-world environments relevant to one’s life"
John Wasserman, 2018: "In spite of over a century of research, the study of intelligence remains controversial for its social applications and implications"

Early Concepts

Charles Spearman (1904): The "g" factor – that intelligence is a singular construct
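Alfred Binet (1905): First practical intelligence scale (Binet-Simon). The "intelligence quotient" was later defined (William Stern, 1912) as a child’s mental age on the test, divided by their chronological age, x 100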
Raymond Cattell (1963):
- "Crystallised intelligence" – learned procedures and knowledge
- "Fluid intelligence" – abstract reasoning on novel tasks
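
The historical ratio form of the intelligence quotient is simple arithmetic, so a short sketch may help. It assumes the test result is expressed as a "mental age" in years, the conventional way the ratio is written. The function name ratio_iq is hypothetical and used only for illustration. Modern tests use deviation scores rather than this ratio.

```python
def ratio_iq(mental_age_years: float, chronological_age_years: float) -> float:
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age_years <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_years / chronological_age_years


# A 10-year-old who performs like a typical 12-year-old
print(ratio_iq(12, 10))  # 120.0
# A 10-year-old who performs like a typical 8-year-old
print(ratio_iq(8, 10))   # 80.0
```

The ratio stops making sense in adulthood, when test performance no longer rises steadily with age. This is one reason later tests moved to deviation-based scores.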

Major Theories of Intelligence

Cattell-Horn-Carroll (CHC theory)
Sternberg’s Triarchic Theory of Successful Intelligence
Gardner’s Multiple Intelligences
Cognitive Processing theories (PASS model)

Cattell-Horn-Carroll Theory (CHC Theory)

Incorporates Cattell and Horn’s theory of fluid and crystallized intelligence (without g).
Based on John Carroll’s (1993) "three-stratum theory" (with g).
Developed and confirmed through factor analysis.
Currently the most widely accepted theory of cognitive abilities.
Strengths:
- Evidence-based/data-based; widespread application.
- Useful in guiding assessment for a specific learning disability or a comprehensive assessment of cognitive ability.
- Ever-evolving and therefore open to new discoveries/understandings.
Weaknesses:
- Complicated; can be confusing for clinicians and clients, including in terms of writing clear, concise, helpful reports.
- Challenging to view some of the abilities identified as relevant to what we commonly understand to be "intelligence."
- Ever-evolving; therefore, it is difficult to know its current version.

Multiple Intelligences

Howard Gardner (1983) recognized that there are many ways people show "intelligences" beyond those valued by Western societies.
He believed that these areas were distinct from each other – not theoretically related.
Gardner’s multiple intelligences include:
- Linguistic intelligence
- Logical-mathematical intelligence
- Musical intelligence
- Visual-spatial intelligence
- Bodily-kinaesthetic intelligence
- Naturalist intelligence
- Interpersonal intelligence
- Intrapersonal intelligence
Strengths:
- Not just about "book smarts" – recognizes people with a wide range of capabilities/talents.
- Has informed different teaching approaches.
Weaknesses:
- Are these "intelligences" better described as "talents" or "skills"?
- Are these intelligences truly independent of each other?
- Limited supporting research evidence (though difficult to measure).
- Difficult to quantify performance; assessments take place over extremely long periods of time, and it is questionable whether anything approaching objective scoring is even possible. (Sternberg, 1991)
- Often conflated with the prevailing myth of “learning styles.”

Sternberg’s Triarchic Theory of Successful Intelligence

Sternberg believed schools focus too much on analytical and memory abilities and not enough on creative and practical abilities.
He believed all three needed to function together for someone to use their intelligence successfully.
Three dimensions of success:
1. Componential – internal processes, e.g., metacognition, planning/organizing, memory retrieval, knowledge acquisition
2. Experiential – how well people connect their internal world to external reality – applying insights, synthesizing, dealing with novel problems, automatizing
3. Contextual – how well people adapt to, select, and shape their environments; "street smarts"
Also three areas associated with successful intelligence: analytical, creative, and practical abilities
- Analytical abilities: Useful in analyzing and evaluating options; problem-solving skills
- Creative abilities: Use of experience in ways that foster insight and new ideas
- Practical abilities: Use of tacit knowledge to adapt to changing contexts in everyday life
Strengths:
- Combines internal aspects of intelligence (e.g., problem-solving and reasoning skills) with external aspects of intelligence (e.g., experience, practice).
- Focus on real-world success.
- Traits can more easily be measured (cf. Gardner).
- Suggests that many real-life intelligent decisions are not measured by current standardized tests.
Weaknesses:
- Limited information about how the componential, experiential, and contextual dimensions relate to one another.
- Are these dimensions distinct or interrelated?
- Is there a mixing of personality traits (e.g., confidence, sociability) with intelligence?

PASS Theory

An alternative to the idea of g, emphasizing psychological processes
PASS = Planning-Attention-Simultaneous-Successive Processing theory
- Planning = cognitive control, knowledge, intentionality, self-regulation
- Attention = focused cognitive activity
- Simultaneous processing = perception of stimuli as a whole, including the ability to integrate words into a meaningful idea
- Successive processing = making a decision based on stimuli in a sequence
These work together when doing intellectual tasks – some stronger than others, depending on the task
Strengths:
- Based on neuropsychological theory about information processing.
- Some IQ tests are based on this model, e.g., Naglieri’s Cognitive Assessment System (CAS2) and, in part, Kaufman’s KABC-II.
- The authors claim that factor-analytic studies of performance on the Cognitive Assessment System (CAS) show reasonable support for the PASS model.
Weaknesses:
- Independent authors have claimed that the PASS model is not supported by factor analysis of CAS results.
- The Planning and Attention factors are highly correlated ($r = .99$).
- Are the components identified in the PASS model actually what is being measured in the CAS2?

Important Note on IQ Tests

Access to and use of standardized IQ tests are restricted to registered psychologists.
Only registered psychologists can purchase the tests directly from the publisher. They agree to a contract of use, including keeping the test materials secure and confidential.
In addition to being protected by copyright law, the content of IQ tests is considered a trade secret.

Super Quick Summary of the History of IQ tests

Stanford-Binet intelligence scale developed during the era of emerging interest in developmental theory and intelligence theory (early 1900s)
World War I (1914-1918) created a need to identify which soldiers were suited to different roles, hence the Army Mental Tests (Alpha and Beta) were created (developed in 1917; published in 1920)
David Wechsler, a psychology academic, worked as a tester during the war. He developed his first test (the Wechsler-Bellevue Intelligence Scale, 1939) by ‘cherry-picking’ the most statistically and clinically useful subtests from several different existing tests.
Wechsler’s first test had a Verbal IQ and a Performance IQ and used a deviation-based IQ like we do now.
Unfortunately, as these tests became more widespread, they became a tool for eugenics. In the historical context either side of World War II, IQ tests contributed to forced sterilization, institutionalization, racial segregation, dehumanization, and genocide.
Although we are more cautious of these things today, we must always remember not to use IQ tests as a tool for discrimination and harm, and speak up when they are used this way.

The Flynn Effect

Perplexing observation that population average IQ scores gradually increase by around 0.33 IQ points per year
First observed when the Raven’s Progressive Matrices were routinely given to all 18-year-old army recruits over a long period of time
Some differences between countries
Surprisingly, the increase is stronger for fluid rather than crystallized intelligence
There is some evidence that the Flynn Effect could plateau
Potential causes: education, familiarity with testing conditions generally, changes to family life (technology, smaller families, more learning opportunities)
Life expectancy, infant mortality, and height have also improved over the same period
Has some unintended consequences – e.g., accuracy of diagnosis, criminal culpability
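
The consequence for diagnosis can be seen with simple arithmetic: the older a test's norms, the more scores obtained against them will tend to be inflated. The sketch below uses the 0.33 points per year figure from these notes. The function names and the example score are hypothetical, and this is an illustration of the arithmetic only, not a recommended clinical procedure.

```python
FLYNN_POINTS_PER_YEAR = 0.33  # rate quoted in these notes; actual rates vary by country and era


def expected_inflation(years_since_norming: float,
                       rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Approximate number of IQ points by which outdated norms inflate scores."""
    if years_since_norming < 0:
        raise ValueError("years_since_norming cannot be negative")
    return rate * years_since_norming


def score_against_current_norms(obtained_score: float,
                                years_since_norming: float,
                                rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Rough estimate of the score the same performance would earn on current norms."""
    return obtained_score - expected_inflation(years_since_norming, rate)


# Hypothetical example: a score of 72 on a test normed 20 years ago
inflation = expected_inflation(20)
print(f"Expected inflation: {inflation:.1f} points")
print(f"Estimate against current norms: {score_against_current_norms(72, 20):.1f}")
```

In this hypothetical case the obtained score sits just above 70, while the estimate against current norms sits below it, which is why outdated norms matter for decisions that rely on cut-off scores.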

Measurement Terminology

Norm-referenced standardized tests:
- Constructed by professional test makers
- Normed on a representative sample from the population for which the test is intended
- Involve fixed (standardized) procedures for administration and scoring

Test Norms: How are they Developed?

When developing the test, items are chosen that have good psychometric properties and that together produce a range of performance (i.e., easier versus more difficult items).
The finalized test is administered to an appropriately sized, representative sample.
Individuals’ raw scores are converted into standardized scaled or composite scores.
Individual subtest scores usually have a mean of 10 and a standard deviation of 3 on most of our commonly used tests.
Composite scores (e.g., IQ) usually have a mean of 100 with standard deviation of 15.
These scores reflect position above or below the mean.
Standardized scores are normally distributed.
We can compare how an individual is positioned in relation to the representative, standardized sample.
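
As a rough sketch of what "position above or below the mean" looks like numerically, the code below converts a raw score to a z-score and then rescales it to the two metrics described above (mean 10, SD 3 and mean 100, SD 15). The norm-group mean and SD in the example are made up for illustration. Published tests use age-based norm tables rather than a single linear formula, so treat this as a simplification.

```python
def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a raw score from the norm-group mean, in SD units."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (raw - norm_mean) / norm_sd


def to_scaled_score(z: float) -> float:
    """Subtest metric: mean 10, standard deviation 3."""
    return 10 + 3 * z


def to_composite_score(z: float) -> float:
    """Composite metric (e.g., IQ): mean 100, standard deviation 15."""
    return 100 + 15 * z


# Hypothetical norm group for one age band: raw-score mean 28, SD 6
z = z_score(raw=34, norm_mean=28, norm_sd=6)
print(z)                      # 1.0 (one SD above the mean)
print(to_scaled_score(z))     # 13.0
print(to_composite_score(z))  # 115.0
```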

Percentile Ranks

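Figure (not reproduced): normal curve with labelled regions between 1 and 2 SD below the mean, between 1 and 2 SD above the mean, beyond 3 SD below the mean, and beyond 3 SD above the mean.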
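
A percentile rank is the percentage of the normative sample scoring at or below a given score. Assuming scores follow a normal distribution with mean 100 and SD 15, percentile ranks can be approximated with Python's standard-library NormalDist, as in this minimal sketch. The function names are hypothetical, and test manuals supply their own percentile tables.

```python
from statistics import NormalDist

IQ_DISTRIBUTION = NormalDist(mu=100, sigma=15)  # composite-score metric


def percentile_rank(score: float, dist: NormalDist = IQ_DISTRIBUTION) -> float:
    """Percentage of the population expected to score at or below `score`."""
    return 100 * dist.cdf(score)


def percent_between(low: float, high: float,
                    dist: NormalDist = IQ_DISTRIBUTION) -> float:
    """Percentage of the population expected to score between `low` and `high`."""
    return 100 * (dist.cdf(high) - dist.cdf(low))


# Percentile ranks at each SD step from -3 SD to +3 SD
for score in (55, 70, 85, 100, 115, 130, 145):
    print(f"score {score}: percentile rank {percentile_rank(score):.1f}")

# Share of people between 1 and 2 SD above the mean (scores 115 to 130)
print(f"{percent_between(115, 130):.1f}% score between 115 and 130")
```

Because the curve is symmetric, the band between 1 and 2 SD below the mean (70 to 85) contains the same share of people as the band between 1 and 2 SD above it.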

Standardized Testing

The results of a test are reliable and valid only to the extent that the test was administered according to standardized procedures.
What is standardized?
- Environmental factors
- Test instructions
- Acceptable responses
- Scoring procedures
- Test procedures
- Using norms to convert scores

Test Score, “True Score” and Error

Sources of error:
- Within the test: e.g., items don’t perfectly tap into and consistently measure the construct, reliability and validity aren’t perfect
- The examiner: e.g., deviations from standardized administration and scoring, mistakes
- The test-taker: e.g., moments of distraction/inattention, poor sleep, feeling stressed about a significant life issue, feeling anxious about test-taking, forgot glasses
- The testing environment: e.g., fire alarm goes off, noisy children using adjacent corridor causing distraction
We try to reduce error by establishing rapport, following standardized processes, and trying to ensure an appropriate testing environment – but we can never entirely eliminate error
A person’s score on a psychometric test represents a “snapshot” of their performance at that time
Purely theoretically (i.e., not taking into account practice effects), if they were to take the test again and again, their score would fluctuate
This pattern of different scores would be assumed to fall in a normal distribution with the “true” test score being at the peak/the mean
So, a person’s test score = their hypothetical “true score” plus error
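
In classical test theory, the size of this error is usually summarised by the standard error of measurement (SEM = SD x sqrt(1 - reliability)), which can be used to put a confidence interval around an obtained score. The sketch below assumes a reliability coefficient of 0.96, a made-up value for illustration. It builds a simple interval around the obtained score, and published tests may compute their intervals differently.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), from classical test theory."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def confidence_interval(obtained_score: float, sd: float = 15,
                        reliability: float = 0.96,  # hypothetical value
                        z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval (z = 1.96) around an obtained score."""
    sem = standard_error_of_measurement(sd, reliability)
    return obtained_score - z * sem, obtained_score + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.96)
low, high = confidence_interval(100)
print(f"SEM: {sem:.1f} points")
print(f"95% interval for an obtained score of 100: {low:.1f} to {high:.1f}")
```

The less reliable the test, the larger the SEM and the wider the interval, which is one reason a single number should not be over-interpreted.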

Structure of Common IQ Tests

Each question is called an item – points are awarded for correct/appropriate responses
Individual tasks are called subtests, which are made up of many items. Examples: Digit Span (a measure of working memory involving remembering and manipulating numbers), or Vocabulary (a measure of verbal comprehension where examinees need to define words)
Subtests that are designed to measure aspects of the same broad area (e.g., fluid reasoning) are clustered together into composite scores.
A set number of the subtests across the composites are combined into a summary score, often called the Full Scale IQ (FSIQ)
Conceptually, this structure can be understood in alignment with CHC theory (though tests vary in how well they map onto CHC)

What Results Might Look Like

The points for correct items on each subtest are added up to give a raw score
Raw scores are converted to standard scores by comparing them with normative data for the examinee’s age.
Subtest standard scores that measure the same area are added together to make a “sum of scaled scores”, and then this sum is converted to composite scores
The FSIQ isn’t always the “full scale” – on the WISC-V, it is derived from only 7 of the 10 core subtests
When interpreting test scores, we start with the broadest measure, as it draws on a wider range of tasks and is therefore more robust against error. The FSIQ is the most reliable summary of overall ability, but if there are marked highs and lows across the composites, interpreting the composites may be more meaningful. Subtest scores should be interpreted with caution. Item-level results are too narrow to interpret, though they may sometimes give us interesting qualitative observations.
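
The step from a sum of scaled scores to a composite can be sketched as follows. Real tests use empirically derived lookup tables for this conversion. The version below is a linear approximation that assumes every subtest has a mean of 10 and SD of 3, and that the subtests share a single average intercorrelation. That correlation is a hypothetical parameter and is not taken from any test manual.

```python
import math


def composite_from_scaled(scaled_scores: list[float],
                          mean_intercorrelation: float) -> int:
    """Approximate a composite (mean 100, SD 15) from subtest scaled scores.

    Assumes each scaled score has mean 10 and SD 3, and that all pairs of
    subtests correlate at `mean_intercorrelation` (a simplifying assumption).
    """
    k = len(scaled_scores)
    if k == 0:
        raise ValueError("at least one scaled score is required")
    total = sum(scaled_scores)  # the "sum of scaled scores"
    expected_total = 10 * k
    # SD of a sum of k equally correlated variables, each with SD 3
    sd_total = 3 * math.sqrt(k + k * (k - 1) * mean_intercorrelation)
    z = (total - expected_total) / sd_total
    return round(100 + 15 * z)


# Two subtests from the same composite, with a hypothetical correlation of 0.5
print(composite_from_scaled([12, 14], mean_intercorrelation=0.5))  # 117
print(composite_from_scaled([10, 10], mean_intercorrelation=0.5))  # 100
```

The composite (117) is further from the mean than a plain average of the two subtests would suggest (scaled score 13, roughly equivalent to 115), because doing well on two imperfectly correlated tasks is rarer than doing well on one.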

With All Their Problems, What Use do IQ Tests Have?

They are part of the diagnostic criteria for Intellectual Developmental Disorder (Intellectual Disability)
They can give us an idea of an individual’s cognitive strengths and weaknesses
Identifying giftedness, or even average ability, can indicate when somebody is underachieving
They can be useful in differential diagnosis, such as ruling out particular hypotheses
In bureaucratic systems, they can enable people to qualify for support where resources are limited
Findings can bring insight to clients and those who support them and can inform suitable intervention

Important Note

Standardized tests (including IQ tests) should only ever be one small part of an assessment process
Results should never be interpreted without the context you get from interviews, observations, and other collateral data sources
Tests can give us new insights and can confirm our hypotheses with more “objective” data
But diagnoses, interpretations, and recommendations are made based on a combination of professional skills and clinical judgment, not on test results alone, and test results do not have the final say on clinical decision making

Nonverbal IQ Tests

The Army Mental Tests (Beta) consisted of nonverbal tasks to assess recruits with low literacy or English skills, e.g., mazes, picture sequences, puzzles, etc. These formed the basis of future nonverbal IQ test tasks.
Truly nonverbal tests were designed to provide a fairer and more valid measure of cognitive ability for people who are disadvantaged on language-loaded tests (e.g., d/Deaf, non-English speakers, speech or language difficulties)
They have minimal to no spoken instructions or required spoken responses
However, they measure a narrower set of cognitive abilities than language-loaded tests can

Nonverbal IQ Tests: Examples

Multi-dimensional tests (i.e., measure more than one composite area)
- UNIT2 – Universal Nonverbal Intelligence Test – 2nd Edition
- Leiter-3 – Leiter International Performance Scale – 3rd Edition
- CTONI-2 – Comprehensive Test of Nonverbal Intelligence – 2nd Edition
Unidimensional tests (i.e., only give you one summary score)
- WNV – Wechsler Nonverbal Scale of Ability
- TONI-4 – Test of Nonverbal Intelligence – 4th Edition
Note: None of these tests have Australian norms.

Example: UNIT2

For ages 5:0 to 21:11
In Queensland, most commonly used in schools as it is easier than the Leiter-3 and more comprehensive than some others
Administration is entirely nonverbal, although the examiner uses words to help the examinee settle into the session and to explain the gestures that will be used.
Devised based on the concept of fairness: it is language-free, it measures multiple indexes rather than one, there is minimal need for previously acquired knowledge, it has minimal emphasis on timed tasks, and it contains varied response modes.
High reliability and validity, including with several populations (cultural, language, Deaf/HoH)
Developed with input from representatives of many cultures (expert bias panels)
Six subtests, across three domains of cognitive ability (Memory, Reasoning, Quantitative)
Each domain has two subtests: one symbolic, and one non-symbolic. Examinees may draw on their existing knowledge or language for symbolic subtests.

Language and Cultural Bias

C-LTC (Culture-Language Test Classification) Framework and C-LIM (Culture-Language Interpretative Matrices) by Flanagan et al. (2007)