Intelligence Testing Notes
Lecture Overview
- Definition of intelligence.
- Prevailing models: Cattell-Horn-Carroll (CHC) theory, Multiple Intelligences, Triarchic theory, PASS model.
- History of IQ testing, development, reliability, measurement.
- Clinical and beneficial uses of IQ tests.
- Risks of IQ tests: cultural bias, high-stakes decisions, ethical issues.
What is Intelligence?
- David Wechsler (1944): "The aggregate or global capacity of the individual to act purposefully, to think rationally and deal effectively with his environment."
- Robert Sternberg (1985): "Mental activity directed toward purposive adaptation to, and selection and shaping of, real-world environments relevant to one’s life."
- Jagannath Prasad Das (1984): "The ability to plan and structure one’s behavior with an end in view."
- John Wasserman (2018): "In spite of over a century of research, the study of intelligence remains controversial for its social applications and implications."
Early Concepts
- "g" factor (Charles Spearman, 1904): Intelligence is a singular construct.
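- Intelligence quotient (built on Alfred Binet’s 1905 mental-age scale; the ratio itself is usually credited to William Stern, 1912): A child’s mental age divided by their chronological age, multiplied by 100: $\mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100$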
- Fluid intelligence (Raymond Cattell, 1963): Abstract reasoning on novel tasks.
- Crystallized intelligence (Raymond Cattell, 1963): Learned procedures and knowledge.
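As a worked illustration of the ratio formula above, here is a minimal Python sketch. The function name `ratio_iq` and the example ages are illustrative assumptions, not taken from any published test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical example: an 8-year-old performing like a typical 10-year-old.
print(ratio_iq(10, 8))  # 125.0

# Performing exactly at age level always gives 100.
print(ratio_iq(8, 8))   # 100.0
```

Because chronological age keeps rising while mental age levels off, this ratio becomes hard to interpret for adults, which is one reason later tests moved to deviation-based scores.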
Major Theories of Intelligence
- Cattell-Horn-Carroll (CHC) Theory
- Sternberg’s Triarchic Theory of Successful Intelligence
- Gardner’s Multiple Intelligences
- Cognitive Processing Theories (PASS model)
Cattell-Horn-Carroll (CHC) Theory
- Incorporates Cattell and Horn’s theory of fluid and crystallized intelligence (without g).
- Based on John Carroll’s (1993) "three-stratum theory" (with g).
- Developed and confirmed through factor analysis.
- Currently the most widely accepted theory of cognitive abilities.
- Strengths:
- Evidence-based/data-based.
- Widespread application.
- Useful in guiding assessment for specific learning disabilities or comprehensive cognitive ability assessments.
- Ever-evolving.
- Weaknesses:
- Complicated and confusing for clinicians and clients.
- Challenging to view some abilities as relevant to "intelligence."
- Ever-evolving, making it difficult to know the current version.
Multiple Intelligences
- Howard Gardner (1983) recognized that there are many ways people show “intelligences” beyond those valued by Western societies.
- These areas were distinct from each other – not theoretically related.
- Gardner’s multiple intelligences include:
- Linguistic intelligence
- Logical-mathematical intelligence
- Musical intelligence
- Visual-spatial intelligence
- Bodily-kinaesthetic intelligence
- Naturalist intelligence
- Interpersonal intelligence
- Intrapersonal intelligence
- Strengths:
- Recognizes a wide range of capabilities/talents, not just "book smarts."
- Has informed different teaching approaches.
- Weaknesses:
- Are these "intelligences" better described as "talents" or "skills"?
- Are these intelligences truly independent of each other?
- Limited supporting research evidence (though difficult to measure).
- Difficult to quantify performance.
- Often conflated with the prevailing myth of "learning styles."
- Sternberg (1991): "It is very difficult, if not impossible, to quantify performance [on these measures]; assessments take place over extremely long periods of time, and it is questionable whether anything approaching objective scoring is even possible."
Sternberg’s Triarchic Theory of Successful Intelligence
- Sternberg believed schools focus too much on analytical and memory abilities and not enough on creative and practical abilities; analytical, creative, and practical abilities all need to function together for someone to use their intelligence successfully.
- 3 dimensions of success:
- Componential (internal processes): metacognition, planning/organizing, memory retrieval, knowledge acquisition.
- Experiential: how well people connect their internal world to external reality – applying insights, synthesizing, dealing with novel problems, automatizing.
- Contextual: how well people adapt to, select, and shape their environments; "street smarts."
- Three areas associated with successful intelligence:
- Analytical abilities: Useful in analyzing and evaluating options; problem-solving skills.
- Creative abilities: Use of experience in ways that foster insight and new ideas.
- Practical abilities: Use of tacit knowledge to adapt to changing contexts in everyday life.
- Strengths:
- Combines internal aspects of intelligence (e.g., problem-solving and reasoning skills) with external aspects (e.g., experience, practice).
- Focus on real-world success.
- Traits can more easily be measured.
- Suggests that many real-life intelligent decisions are not measured by current standardized tests.
- Weaknesses:
- Limited information about how the componential, experiential and contextual dimensions relate to one another.
- Are these dimensions distinct, or interrelated?
- Is there a mixing of personality traits (e.g., confidence, sociability) with intelligence?
PASS Theory (Das, Naglieri & Kirby, 1994)
- An alternative to the idea of g, emphasizing psychological processes.
- PASS = Planning-Attention-Simultaneous-Successive Processing theory
- Planning = cognitive control, knowledge, intentionality, self-regulation
- Attention = focused cognitive activity
- Simultaneous processing = perception of stimuli as a whole, including the ability to integrate words into a meaningful idea
- Successive processing = making a decision based on stimuli in a sequence
- These work together when doing intellectual tasks – some stronger than others, depending on the task
- Strengths:
- Based on neuropsychological theory about information processing.
- Some IQ tests are based on this model (Naglieri’s Cognitive Assessment System – CAS2, and Kaufman’s KABC-II).
- Factor-analytic studies of performance on the Cognitive Assessment System (CAS) show reasonable support for the PASS model (according to the authors).
- Weaknesses:
- Independent authors have claimed that the PASS model is not supported by factor analysis of CAS results.
- The Planning and Attention factors are highly correlated (r = .99).
- Are the components identified in the PASS model actually what is being measured in the CAS2?
Important Note on IQ Tests
- Access to and use of standardized IQ tests are restricted to registered psychologists.
- Only registered psychologists can purchase the tests, directly from the publisher. They agree to a contract of use, which includes keeping the test materials secure and confidential.
- In addition to being protected by copyright law, the content of IQ tests is considered a trade secret.
History of IQ Tests
- Stanford-Binet intelligence scale developed during the era of emerging interest in developmental theory and intelligence theory (early 1900s).
- World War I (1914–1918) created a need to identify which soldiers were suited to different roles, hence the Army Mental Tests were created (developed in 1917; published in 1920).
- David Wechsler developed his first test (the Wechsler-Bellevue Intelligence Scale, 1939) by ‘cherry-picking’ the most statistically and clinically useful subtests from several existing tests; it included a Verbal IQ and a Performance IQ, and used a deviation-based IQ.
- Use of IQ tests contributed to forced sterilization, institutionalization, racial segregation, dehumanization, and genocide in the historical context either side of World War II.
- We must never use IQ tests as a tool for discrimination or harm, and we must speak up when they are used this way.
The Flynn Effect
- Population average IQ scores gradually increase by around 0.33 IQ points per year.
- First observed when the Raven’s Progressive Matrices were routinely given to all 18-year-old army recruits over a long period of time.
- Some differences between countries.
- The increase is stronger for fluid rather than crystallized intelligence.
- There is some evidence that the Flynn Effect could plateau.
- Potential causes: education, familiarity with testing conditions generally, changes to family life (technology, smaller families, more learning opportunities).
- Life expectancy, infant mortality, and height have also improved over the same period.
- Has some unintended consequences: scores obtained on outdated norms tend to be inflated, which can affect the accuracy of diagnosis and assessments of criminal culpability.
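To make the consequence of ageing norms concrete, the following minimal Python sketch estimates how far scores might drift upward when a test's norms are old. It assumes the roughly 0.33 points-per-year rate quoted above holds linearly, which is a simplification given the country differences and possible plateau noted in this section. The names `FLYNN_POINTS_PER_YEAR` and `expected_norm_inflation` are hypothetical.

```python
FLYNN_POINTS_PER_YEAR = 0.33  # approximate rate quoted in these notes


def expected_norm_inflation(years_since_norming: float,
                            rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Rough estimate of score inflation from outdated norms.

    Assumes a constant linear rate, which is an illustrative
    simplification rather than an established correction method.
    """
    if years_since_norming < 0:
        raise ValueError("years_since_norming must be non-negative")
    return round(years_since_norming * rate, 1)


# Hypothetical example: a test normed 20 years before it is administered.
print(expected_norm_inflation(20))  # 6.6
```

A drift of this size matters most near cut-off scores, where a few points can change whether a diagnostic criterion appears to be met. Whether any adjustment should be applied in practice is a clinical and legal judgment that this sketch does not settle.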
Measurement Terminology
- Norm-referenced standardized tests
- Constructed by professional test makers.
- Normed on a representative sample from the population for which the test is intended.
- Involve fixed (standardized) procedures for administration and scoring.
Test Norms: How are they Developed?
- Items with good psychometric properties are chosen so that, together, they produce a range of performance (i.e., easier versus more difficult items).
- The finalized test is administered to an appropriately-sized, representative sample.
- Individuals’ raw scores are converted into standardized scaled or composite scores.
- Individual subtest scores usually have a mean of 10 and a standard deviation of 3.
- Composite scores (e.g. IQ) usually have a mean of 100 with standard deviation of 15.
- These scores reflect position above or below the mean.
- Standardized scores are normally distributed.
- We can compare how an individual is positioned in relation to the representative, standardized sample.
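The relationship between the two score metrics described above (mean 10, SD 3 and mean 100, SD 15) can be sketched with z-scores. This is only an illustration of "position above or below the mean": published tests convert scores through age-based norm tables, not through this linear formula. The helper names `to_z` and `from_z` are assumptions.

```python
SCALED = (10, 3)       # subtest scaled scores: mean 10, SD 3
COMPOSITE = (100, 15)  # composite scores: mean 100, SD 15


def to_z(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations a score sits above (+) or below (-) the mean."""
    return (score - mean) / sd


def from_z(z: float, mean: float, sd: float) -> float:
    """Express a z-score on a metric with the given mean and SD."""
    return mean + z * sd


# A scaled score of 13 is one SD above the mean...
z = to_z(13, *SCALED)
print(z)                      # 1.0
# ...which corresponds to 115 on the composite metric.
print(from_z(z, *COMPOSITE))  # 115.0
# A composite score of 85 is one SD below the mean.
print(to_z(85, *COMPOSITE))   # -1.0
```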
Percentile Ranks
- Illustrate where an individual falls relative to the rest of the standardization sample: roughly, the percentage of that sample who scored lower.
- Derived from how many standard deviations the score sits above or below the mean.
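A minimal Python sketch of how a percentile rank follows from a standard score, assuming scores are normally distributed with a mean of 100 and a standard deviation of 15. The function name `percentile_rank` is an assumption; test manuals supply their own percentile tables.

```python
from statistics import NormalDist

# Assumed composite-score distribution: mean 100, SD 15.
IQ_DISTRIBUTION = NormalDist(mu=100, sigma=15)


def percentile_rank(standard_score: float) -> float:
    """Percentage of the normal distribution falling below the given score."""
    return IQ_DISTRIBUTION.cdf(standard_score) * 100


# Scores at -2, -1, 0, +1 and +2 standard deviations from the mean.
for score in (70, 85, 100, 115, 130):
    print(score, round(percentile_rank(score), 1))
# Prints 2.3, 15.9, 50.0, 84.1 and 97.7 respectively.
```

Note that equal steps in standard score do not give equal steps in percentile rank: the 15 points from 100 to 115 span about 34 percentile points, while the 15 points from 115 to 130 span about 14.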
Standardized Testing
- The results of a test are reliable and valid only to the extent that the test was administered according to standardized procedures.
- What is standardized:
- Environmental factors
- Test instructions
- Acceptable responses
- Scoring procedures
- Test procedures
- Using norms to convert scores
Test Score, “True Score” and Error
- Sources of error:
- Within the test (items don’t perfectly tap into and consistently measure the construct, reliability and validity aren’t perfect)
- The examiner (deviations from standardized administration and scoring, mistakes)
- The test-taker (moments of distraction/inattention, poor sleep, stress about a significant life issue, test anxiety, forgotten glasses)
- The testing environment (a fire alarm going off, noisy children in an adjacent corridor)
- We try to reduce error by establishing rapport, following standardized processes, and trying to ensure an appropriate testing environment – but we can never entirely eliminate error.
- A person’s score on a psychometric test represents a “snapshot” of their performance at that time.
- Purely theoretically (i.e., not taking into account practice effects), if they were to take the test again and again, their score would fluctuate.
- This pattern of scores is assumed to follow a normal distribution, with the “true” score at the peak (the mean).
- So, a person’s test score = their hypothetical “true score” plus error.
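The idea that an observed score equals a true score plus error can be illustrated with a small simulation. This is a sketch under stated assumptions: the true score of 100 and the error standard deviation of 3 are invented for illustration, error is assumed to be normally distributed, and practice effects are ignored, as in the thought experiment above. The function name `simulate_observed_scores` is hypothetical.

```python
import random
from statistics import mean, stdev


def simulate_observed_scores(true_score: float, error_sd: float,
                             n_administrations: int, seed: int = 0) -> list:
    """Simulate repeated testing: each observed score = true score + random error."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [true_score + rng.gauss(0, error_sd)
            for _ in range(n_administrations)]


# Hypothetical values: true score 100, error SD of 3 points.
scores = simulate_observed_scores(true_score=100, error_sd=3,
                                  n_administrations=10_000)

print(round(mean(scores), 1))   # close to the true score of 100
print(round(stdev(scores), 1))  # close to the error SD of 3
print(round(min(scores)), round(max(scores)))  # any single snapshot can stray further
```

Any one administration gives only one draw from this distribution, which is why a single score is better reported with a range around it than as an exact value.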
Structure of Common IQ Tests
- Each question is called an item – points are awarded for correct/appropriate responses.
- Individual tasks are called subtests (e.g., Digit Span, Vocabulary), each made up of many items.
- Subtests that are designed to measure aspects of the same broad area (e.g. fluid reasoning) are clustered together into composite scores.
- A set number of subtests from across the composites are combined into a summary score, often called the Full Scale IQ (FSIQ).
- Conceptually, this structure can be understood in alignment with CHC theory (though tests vary in how well they map onto CHC).
What Results Might Look Like
- Points earned on individual items are summed to give a raw score for each subtest.
- Raw scores are converted to standard scores by comparing them with normative data for the examinee’s age.
- Subtest standard scores that measure the same area are added together to make a “sum of scaled scores”, and then this sum is converted to composite scores.
- The FSIQ isn’t always the “full scale” – on the WISC-V, it is derived from only 7 of the 10 core subtests.
- When interpreting test scores, we focus on the broadest measure, as it includes a wider range of tasks and is therefore more robust against error.
- The FSIQ is the most reliable summary of overall ability, but if the composite scores differ markedly from one another, interpreting the composites may be more meaningful.
- Subtests may be considered with caution, and item results are too narrow to interpret, though may give us interesting qualitative observations sometimes.
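The conversion steps described above (raw score, then scaled score, then sum of scaled scores, then composite) can be sketched as two table lookups. Every number in the tables below is invented purely for illustration, as are the subtest names and the function name `composite_from_raw`. Real norm tables are age-specific, far larger, and available only in the publisher's restricted manuals.

```python
# INVENTED lookup tables for one hypothetical age band. Not real norms.
TOY_RAW_TO_SCALED = {
    "Subtest A": {18: 9, 19: 10, 20: 11},
    "Subtest B": {24: 12, 25: 13, 26: 14},
}
TOY_SUM_TO_COMPOSITE = {21: 102, 22: 105, 23: 108, 24: 111}


def composite_from_raw(raw_scores: dict) -> int:
    """Convert raw subtest scores to a composite via two lookups."""
    # Step 1: each raw score becomes a scaled score via the norm table.
    scaled = {name: TOY_RAW_TO_SCALED[name][raw]
              for name, raw in raw_scores.items()}
    # Step 2: scaled scores measuring the same area are added together.
    sum_of_scaled = sum(scaled.values())
    # Step 3: the sum of scaled scores is converted to a composite score.
    return TOY_SUM_TO_COMPOSITE[sum_of_scaled]


# Raw 19 -> scaled 10, raw 25 -> scaled 13, sum 23 -> composite 108.
print(composite_from_raw({"Subtest A": 19, "Subtest B": 25}))  # 108
```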
Use of IQ Tests
- Part of the diagnostic criteria for Intellectual Developmental Disorder (Intellectual Disability).
- Can give us an idea of an individual’s cognitive strengths and weaknesses.
- Identifying giftedness, or even average ability, can indicate when somebody is underachieving.
- Can be useful in differential diagnosis.
- In bureaucratic systems, they can enable people to qualify for support where resources are limited.
- Findings can bring insight to clients and those who support them, and can inform suitable intervention.
Important Note
- Standardized tests (including IQ tests) should only ever be one small part of an assessment process.
- Results should never be interpreted without the context you get from interviews, observations, and other collateral data sources.
- Tests can give us new insights and can confirm our hypotheses with more “objective” data.
- But diagnoses, interpretations, and recommendations are made based on a combination of professional skills and clinical judgment, not on test results alone, and test results do not have the final say on clinical decision making.
Nonverbal IQ Tests
- The Army Mental Tests (Beta) consisted of nonverbal tasks to assess recruits with low literacy or English skills (e.g., mazes, picture sequences, puzzles).
- Truly nonverbal tests were designed to provide a fairer and more valid measure of cognitive ability for people who are disadvantaged on language-loaded tests (e.g. d/Deaf, non-English speakers, speech or language difficulties).
- Have minimal to no spoken instructions or required responses.
- However, they measure a narrower set of cognitive abilities than language-loaded tests can.
Examples of Nonverbal IQ Tests
- Multi-dimensional tests:
- UNIT2 – Universal Nonverbal Intelligence Test – 2nd Edition
- Leiter-3 – Leiter International Performance Scale – 3rd Edition
- CTONI-2 – Comprehensive Test of Nonverbal Intelligence – 2nd Edition
- Unidimensional tests:
- WNV – Wechsler Nonverbal Scale of Ability
- TONI-4 – Test of Nonverbal Intelligence – 4th Edition
- None of these tests have Australian norms.
UNIT2 Example
- For ages 5:0 to 21:11
- In Queensland, most commonly used in schools as it is easier than the Leiter-3, and more comprehensive than some others
- Administration is completely nonverbal although you use words to settle in to the session and to explain what the gestures are that you are going to be using.
- Devised based on the concept of fairness: it is language-free, it measures multiple indexes rather than one, there is minimal need for previously acquired knowledge, it has minimal emphasis on timed tasks, and it contains varied response modes.
- High reliability and validity, including with several populations (cultural, language, Deaf/HoH)
- Developed with input from representatives of many cultures (expert bias panels)
- Six subtests, across three domains of cognitive ability (Memory, Reasoning, Quantitative)
- Each domain has two subtests: one symbolic, and one non-symbolic. Examinees may draw on their existing knowledge or language for symbolic subtests.
Language and Cultural Bias
- C-LTC (Culture-Language Test Classification) Framework and C-LIM (Culture-Language Interpretative Matrices) by Flanagan et al. (2007).