Measurement and Intelligence

Test Standardisation

  • ‘Do we know what the average level of performance is? Do we know what the mean score is? Do we know something about how much scores vary around that mean?’

  • Done by surveying/testing a large number of people, calculating the mean, and then working out how much variation there is around that average

The Normal Distribution

  • Converting data into standardised scores/converting the distribution of scores into a standard normal distribution

  • A normal distribution where the mean is ‘zero’ and the standard deviation is ‘one’

  • For height, you subtract the mean (170 cm) from the height value and divide it by the standard deviation, which is 10. Someone with a height of 180 cm will now have a standardised score of 1 on this standard normal distribution ((180 - 170) / 10 = 1)

  • From that, one can see that they are one standard deviation above the mean, and we can tell how small or large the score is because we know the properties of the standard normal distribution
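The standardised-score calculation above can be sketched in a few lines of code (the mean of 170 cm and standard deviation of 10 cm are the lecture’s example values):

```python
def z_score(value, mean, sd):
    """Convert a raw score to a standardised (z) score."""
    return (value - mean) / sd

# Lecture's height example: mean 170 cm, standard deviation 10 cm.
print(z_score(180, 170, 10))  # 1.0 -> one standard deviation above the mean
```

A z score of 1 means one standard deviation above the mean; a negative z score falls below it.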

Alternate Forms and Split Half Reliability

  • Reliability is the extent to which a measure gives you consistent results on repeated measurements

  • The reliability of intelligence tests can be assessed with different techniques: alternate forms reliability, split half reliability, and test retest reliability

  • Alternate forms reliability involves the evaluation of two different versions of the same test. The scores on the two versions are compared to see if the test is reliable

  • Split half reliability also involves comparing two sets of scores; however, it may not be possible for researchers to develop two versions of the test, so they split one test into two halves. You then evaluate the different parts of the test, comparing performance on the first half with performance on the second half.

    • By cutting the one test in half, we’re assuming that the first half of the test is measuring the same thing as the second half.

    • But, if we then split the test in half and compare performance between the two, the performance could be different––not because the test is not reliable, but because the parts of the test we’re comparing are measuring different things.

    • Well, we can solve this problem by dividing the test in many different ways. There are many different ways we can split the test in half. We could take the odd and even items, or jumble them up and sort them into two parts randomly. Split half reliability effectively does that, statistically. What it will do is jumble up the test in all different possible halves and give us the average correlation of all those halves.

  • Test retest reliability involves getting the same group of people to complete the same test twice.

    • However, if the test reliability is low, it looks like there's no relation between the scores for the first time the test is taken compared to the second time the test is taken.

    • Now, test retest reliability assumes that what we're measuring in the test is stable.

    • Test retest reliability also assumes that any changes in the responses given by people in the test are not due to repeated exposure to the same test.
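The random-splits idea behind split half reliability can be sketched as follows. The item scores and number of resamples below are illustrative only, and a real analysis would usually also apply a Spearman-Brown correction, which is omitted here:

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(item_scores, n_splits=1000, seed=0):
    """Average the half-with-half correlation over many random splits.

    item_scores: one row per person, one column per test item.
    """
    rng = random.Random(seed)
    n_items = len(item_scores[0])
    corrs = []
    for _ in range(n_splits):
        items = list(range(n_items))
        rng.shuffle(items)                       # jumble the items up
        half_a = items[: n_items // 2]
        half_b = items[n_items // 2:]
        totals_a = [sum(person[i] for i in half_a) for person in item_scores]
        totals_b = [sum(person[i] for i in half_b) for person in item_scores]
        corrs.append(pearson(totals_a, totals_b))
    return statistics.mean(corrs)
```

Averaging over many random splits is exactly the lecture’s point: no single odd/even or first-half/second-half split is privileged.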

Validity

  • Assesses the accuracy of the test in measuring what it is meant to measure

Predictive

  • Refers to whether or not the scores on the test match with later outcomes

  • For example, when using an intelligence test to predict job performance, what we would want to do is measure a person’s intelligence at one point, and then some time later, compare their performance in the job to their scores on the intelligence test. If their intelligence test scores predict their job performance, then we would say that it has predictive validity for that purpose.

  • We then use the intelligence test when hiring new people for the job to give us some idea of who might be the best candidate. Then, six months later, we measure the person’s job performance and compare it with their earlier test scores to assess the validity of using the intelligence test to predict job performance

  • Measure someone’s performance on an intelligence test once, and then measure the same person again six months later to see how well they’re doing in their new job = Predictive

Criterion

  • Matches the scores on the test with some other measure—either a previous measure or concurrent measure of the same thing.

  • For example, if we’re interested in seeing whether or not your grades in this course are a valid measure of your academic performance, what we could do is compare your grades from this course with your grades from high school, because your performance in high school is another measure of your academic ability.

  • Typically, students’ grades from high school are taken over a number of years and have been assessed over a range of topics, so they should be a pretty good, stable measure of academic ability

  • Compare someone’s grades from a particular course at university with their grades from high school = Criterion

Construct

  • The idea that when we design a measure for something, the way we’ve designed the measure follows the underlying theory that we think represents the concept of what we’re measuring

  • For example, if we develop an intelligence test on the basis of six underlying constructs that are supported by theory and evidence, we need to assess those six underlying constructs in our test

  • If the test we’ve developed for personality is made up of three factors, we need to measure those three factors. Refers to how well a test maps onto the underlying theory of what we think we’re measuring

  • Assess the six theoretically supported underlying constructs that a newly developed intelligence test is based on = Construct

Test Bias

  • The extent to which everybody has the same chances to do well on the test, which can be dependent on culture and other factors

  • If the test has been developed in one culture and administered in another, people from a culture different from the one where the test was developed may be at a disadvantage

  • If it’s an intelligence test, it’s not that people from a different culture are less intelligent than those from where the test was developed; it’s that the test is biased towards people from one cultural background compared to another.

  • Age is another factor = if you develop a test for adults and give it to children, the children will do badly, but not because they’re not intelligent enough; it’s just that the test is biased towards adults

Origins of Measuring Intelligence

  • The first attempt to assess intelligence was in France, when the education board decided they needed to identify children who were struggling in school and remove them so they didn’t detract from the schooling of other children

  • Alfred Binet and Theodore Simon were two French academics who were interested in measuring children’s intelligence, and they developed a test for children known as the ‘Binet-Simon Scale’

  • American psychologist Lewis Terman was also interested in measuring intelligence, and he took the Binet-Simon measure and adapted it for the American context. He translated and standardised it, introducing the concept of IQ

  • IQ stands for the intelligence quotient and the scale was called the Stanford-Binet scale as he worked at the Stanford Graduate School of Education

  • For example, if a seven year old child passed all the aged seven normed items on the Binet-Simon Scale, they would score a mental age of seven. To work out their IQ, they would divide their mental age of seven by their chronological age, which is also seven, and multiply it by 100. They would then score an IQ of 100, and are performing at their expected level for their age.

  • If a seven year old child was able to answer all the 14-year-old age-normed questions, it would mean that they would have an IQ of 200. The problem with the Stanford-Binet Scale is that the age-normed items only really go up to 16 years of age, but chronological age keeps rising. What happens is, if you gave a person with an IQ of 100 this test every year and graphed how they scored, the measured mental age would increase in lockstep with their chronological age and they would have an IQ of 100 at each time we measured it because their mental age would match their chronological age, so the IQ stays the same.

  • However, when they hit 16 years of age, the measured IQ would actually start decreasing because as their chronological age increases, their measured mental age is at the top of the scale and so doesn’t keep up with their chronological age any more. This is purely a function of the maths calculating IQ this way. It's not that people become less intelligent in their teenage years, despite what their parents might say. It's just because the scale doesn’t have any items beyond age 16. This way of calculating IQ fell out of favour over time, because it wasn't very adaptable to broader age groups.

  • Today when we measure IQ, we're not talking about the ratio of mental age to chronological age. What we do is we compare performance on an IQ test to the standardised data we have about performance on that test. If someone scored 115 on an IQ test, that's one standard deviation above the mean. IQ scores by definition have a mean of 100, and a standard deviation of 15. Over time, because people’s average performance on the test gradually increases, we have to re-standardise the test so that the average is always back to 100. This is how we work with IQ tests now. It's literally on the basis of the standard normal curve.
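The two ways of calculating IQ described above can be sketched side by side; the raw-score mean and standard deviation in the deviation example are invented standardisation values:

```python
def ratio_iq(mental_age, chronological_age):
    """Early Stanford-Binet method: mental age over chronological age."""
    return mental_age / chronological_age * 100

def deviation_iq(raw_score, norm_mean, norm_sd):
    """Modern method: standardise the raw score against the norming
    sample, then rescale to a mean of 100 and standard deviation of 15."""
    z = (raw_score - norm_mean) / norm_sd
    return 100 + 15 * z

print(ratio_iq(7, 7))           # 100.0: the seven-year-old at expected level
print(ratio_iq(14, 7))          # 200.0: seven-year-old passing 14-year-old items
print(deviation_iq(58, 50, 8))  # 115.0: one SD above an assumed raw mean of 50, SD 8
```

The deviation method has no upper age ceiling, which is why it replaced the ratio method.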

Contemporary Tests of Intelligence

  • American psychologist David Wechsler developed the Wechsler Adult Intelligence Scale for adults in 1955, known as the WAIS

  • In the contemporary age, we are up to the WAIS-IV, and there are separate versions for children and adults, as well as for different age groups

  • People’s performance on the test is then compared to the standardised information. On the completion of the test, they receive a score that represents the overall or full-scale IQ, which is broken down into verbal and performance IQ

  • Verbal IQ includes verbal comprehension and working memory

  • Performance IQ includes perceptual organisation and processing speed.

  • These indicators provide people with fairly detailed information about their performance across a number of different domains from the test. They are typically administered by trained test administrators.

  • Typically these tests assess things like digit span or the ability to keep a series of numbers in working memory, which is a component of verbal IQ
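As a toy sketch, a forward digit-span trial could look like the following; real administration is verbal, adaptive, and done by a trained tester, so this is only an illustration:

```python
import random

def digit_span_trial(length, seed=None):
    """Generate one digit-span item: a random digit sequence to recall."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(length)]

def score_recall(sequence, response):
    """A trial is passed only if the digits are recalled in exact order."""
    return sequence == response

seq = digit_span_trial(5, seed=1)
print(seq, score_recall(seq, list(seq)))
```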

What is Intelligence?

  • Psychologists agree on what it is they’re measuring when it comes to intelligence: a person’s ability to learn and remember information, to recognise concepts and their relations, and to apply the information to their own behaviour in an adaptive way

  • Multiple intelligences is the perspective where we have a number of sub-skills. “Look, I’m not real good at math, but I’ve got really good emotional intelligence and I can connect with people”

General Intelligence

  • General intelligence is the underlying basis of all our abilities. For example, if someone is good at maths, they’re probably going to be good at music, because both are driven by the same thing

  • Spearman proposed the idea of the Two Factor Theory of Intelligence, and he thought that people’s performance on tests was a function of two factors. One is the ‘G’ factor (general intelligence) and the other is the ‘S’ factor, which is specific intellectual abilities

  • Spearman thought that general intelligence influences our general performance on all mental tasks whereas ‘S’ is unique individual abilities on a particular task

  • Eduction of relations = being able to compare items, look at what is consistent across them (for example, a set of shapes), and work out what the rule is

  • Eduction of correlates = doesn’t require the ‘G’

Components of Intelligence

  • Factor analysis = an advanced statistical technique

  • Imagine that we have a couple of tests and we measure people’s performance on those tests, and we plot the scores on a graph as a correlation. We could fit a line that’s designed to summarise those scores while reducing the error as much as possible. We can see how much error there is by looking at where the individual responses fall in relation to that line. As the scores get higher, the points of data get further and further away from the line: the scores clump together at the low end but spread apart at the high end. This might suggest that using a single correlation or factor is not the best way to describe the data.

  • What factor analysis does is it looks at whether one, two, or even three factors better describe the data. It goes up to any number it needs to describe the data. How you know when it’s adequately describing the data is that the variation around those factors is reduced to a statistical minimum. It’s not eliminated, but there is an optimal number where adding more won’t get rid of any more error. In actual fact, it can induce more error. So, we can statistically tell what number of factors we need to represent these scores.

  • The idea is, if we give people a bunch of intelligence tests and do factor analysis on their performance, we should be able to work out how many factors we need to describe their performance. These statistical factors map onto how many factors intelligence has. If we can use a single line to represent their performance, then there’s a single factor, and we might call it “G”, for general intelligence. If we need seven lines to represent their performance on the tests, then there are seven underlying factors to intelligence. Using factor analysis, we can work out, statistically, the number of factors we need to describe intelligence.
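One common heuristic for choosing the number of factors, the Kaiser criterion (keep factors whose correlation-matrix eigenvalues exceed 1), can be sketched with numpy. The test-score data below are invented, and real factor analysis involves more than this eigenvalue count:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 200 people sit 6 tests. The first 3 tests share one
# latent ability, the last 3 share another, plus measurement noise.
ability_a = rng.normal(size=200)
ability_b = rng.normal(size=200)
scores = np.column_stack(
    [ability_a + 0.5 * rng.normal(size=200) for _ in range(3)]
    + [ability_b + 0.5 * rng.normal(size=200) for _ in range(3)]
)

# Eigenvalues of the 6x6 correlation matrix between the tests.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
n_factors = int(np.sum(eigenvalues > 1))
print(f"suggested number of factors: {n_factors}")
```

With two latent abilities built into the data, two eigenvalues stand well above 1, matching the lecture’s point that the statistics reveal how many underlying factors are needed.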